Re: [PATCH][testsuite] Mark as UNSUPPORTED tests that don't fit into tiny memory model

2014-12-18 Thread Rainer Orth
Andreas Schwab sch...@linux-m68k.org writes:

 domi...@lps.ens.fr (Dominique Dhumieres) writes:

 Here it is. Committed as r218662 to get things going again.

 The same change should be done for libitm and libatomic.

 And boehm-gc.

Right, fixed as follows and installed on mainline after testing on
i386-pc-solaris2.11 and sparc-sun-solaris2.11.

I must say I'm less than impressed how this patch tested actually.

Rainer


2014-12-17  Rainer Orth  r...@cebitec.uni-bielefeld.de

* testsuite/lib/boehm-gc.exp: Load target-utils.exp.

# HG changeset patch
# Parent db99d18efa0cce3ad66a0cff02fedf946af2a842
Include target-utils.exp in boehm-gc testing

diff --git a/boehm-gc/testsuite/lib/boehm-gc.exp b/boehm-gc/testsuite/lib/boehm-gc.exp
--- a/boehm-gc/testsuite/lib/boehm-gc.exp
+++ b/boehm-gc/testsuite/lib/boehm-gc.exp
@@ -27,6 +27,7 @@ load_gcc_lib wrapper.exp
 load_gcc_lib target-supports.exp
 # For dg-skip-if.
 load_gcc_lib target-supports-dg.exp
+load_gcc_lib target-utils.exp
 # For ${tool}_exit.
 load_gcc_lib gcc-defs.exp
 # For prune_gcc_output.

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH][testsuite] Mark as UNSUPPORTED tests that don't fit into tiny memory model

2014-12-18 Thread Kyrill Tkachov


On 18/12/14 09:43, Rainer Orth wrote:

Andreas Schwab sch...@linux-m68k.org writes:


domi...@lps.ens.fr (Dominique Dhumieres) writes:


Here it is. Committed as r218662 to get things going again.

The same change should be done for libitm and libatomic.

And boehm-gc.

Right, fixed as follows and installed on mainline after testing on
i386-pc-solaris2.11 and sparc-sun-solaris2.11.

I must say I'm less than impressed how this patch tested actually.

Hi Rainer,

Sorry about this.
I had grepped for includes of gcc-defs.exp in lib*, which missed boehm-gc.
Thanks for fixing this, I was away earlier this week.

I'll make sure to check the libraries explicitly if I do any testsuite 
refactoring work in the future

Kyrill



Rainer


2014-12-17  Rainer Orth  r...@cebitec.uni-bielefeld.de

* testsuite/lib/boehm-gc.exp: Load target-utils.exp.





Re: [PATCH][testsuite] Mark as UNSUPPORTED tests that don't fit into tiny memory model

2014-12-18 Thread Rainer Orth
Hi Kyrill,

 Sorry about this.
 I had grepped for includes of gcc-defs.exp in lib*, which missed boehm-gc.
 Thanks for fixing this, I was away earlier this week.

 I'll make sure to check the libraries explicitly if I do any testsuite
 refactoring work in the future

great, thanks.  I know all too well that it needs a large amount of this
do get away from the current copy-and-paste programming pervasive
throughout the testsuite.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[testsuite, i386] Change mpx effective-target test into link test

2014-12-18 Thread Rainer Orth
Currently, the new gcc.dg/lto/chkp-privatize test FAILs on Solaris/x86
with gas and ld for 64-bit:

UNRESOLVED: gcc.dg/lto/chkp-privatize 
c_lto_chkp-privatize_0.o-c_lto_chkp-privatize_1.o execute  -fPIC -flto 
-flto-partition=max -fcheck-pointer-bounds -mmpx 
FAIL: gcc.dg/lto/chkp-privatize 
c_lto_chkp-privatize_0.o-c_lto_chkp-privatize_1.o link,  -fPIC -flto 
-flto-partition=max -fcheck-pointer-bounds -mmpx 

output is:
ld: fatal: relocation error: file c_lto_chkp-privatize_0.o: section 
[4].rela.text: invalid relocation type: 0x28
ld: fatal: symbol referencing errors
collect2: error: ld returned 1 exit status

This relocation type 0x28 (40) is R_X86_64_PLT32_BND, which Solaris 10
and 11 ld know nothing about.

Given that there's no test whatsoever in gcc/configure.ac that the whole
toolchain supports the necessary mnemonics and relocs, the logical place
seems to be the testsuite.

The following patch does just that, turning the mpx effective-target
test into a link test, thus checking both assembler and linker used.

Tested with the appropriate runtest invocations on i386-pc-solaris2.11
with as/ld (both 32 and 64-bit tests unsupported due to missing
assembler support), gas/ld (32-bit tests pass, 64-bit tests unsupported
due to missing linker support), and x86_64-unknown-linux-gnu (both 32
and 64-bit tests pass).

I wonder what's the point of having mpx-dg.exp at all: I'd rather move
the single test to target-supports.exp, avoiding the whole flurry of
mpx-dg.exp inclusions all over gcc/testsuite.  Ilya?  However that may
be, it can be done as a followup.

Ok for mainline?

Rainer


2014-12-18  Rainer Orth  r...@cebitec.uni-bielefeld.de

* lib/mpx-dg.exp (check_effective_target_mpx): Change into link test.
Add main.

# HG changeset patch
# Parent bf453dbd938f27cf7eced4b5df3c29f5e452a55f
Change mpx effective-target test into link test

diff --git a/gcc/testsuite/lib/mpx-dg.exp b/gcc/testsuite/lib/mpx-dg.exp
--- a/gcc/testsuite/lib/mpx-dg.exp
+++ b/gcc/testsuite/lib/mpx-dg.exp
@@ -18,7 +18,8 @@
 # error-free for trivial code, 0 otherwise.
 
 proc check_effective_target_mpx {} {
-return [check_no_compiler_messages mpx object {
+return [check_no_compiler_messages mpx executable {
 	int *foo (int *arg) { return arg; }
+	int main (void) { return foo ((void *)0) == 0; }
 } -fcheck-pointer-bounds -mmpx]
 }

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[patch 1/2][ARM]: New CPU support for Marvell Whitney

2014-12-18 Thread Xingxing Pan

Hi,

This patch contains Marvell Whitney core's pipeline description.
Test on arm-linux-gnueabi and no new regression are found.

Is it OK for trunk?

Regards,
Xingxing


 2014-12-18 Xingxing Pan xxing...@marvell.com

 * config/arm/arm-cores.def: Add new core marvell-whitney.
 * config/arm/arm-protos.h:
 (marvell_whitney_vector_element_size_is_byte): Declare.
 (marvell_whitney_non_shift_with_shift_operand): Ditto.
 * config/arm/arm-tables.opt: Regenerated.
 * config/arm/arm-tune.md: Regenerated.
 * config/arm/arm.c (arm_marvell_whitney_tune): New structure.
 (arm_issue_rate): Add marvell_whitney.
 (marvell_whitney_vector_element_size_is_byte): New function.
 (marvell_whitney_non_shift_with_shift_operand): Ditto.
 * config/arm/arm.md: Include marvell-whitney.md.
 (generic_sched): Add marvell_whitney.
 (generic_vfp): Ditto.
 * config/arm/bpabi.h (BE8_LINK_SPEC): Add marvell-whitney.
 * config/arm/t-arm (MD_INCLUDES): Add marvell-whitney.md.
 * config/arm/marvell-whitney.md: New file.
 * doc/invoke.texi: Document marvell-whitney.

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 423ee9e..b0ffbe1 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -159,6 +159,7 @@ ARM_CORE(cortex-m7,   cortexm7, 
cortexm7, 7EM,

FL_LDSCHED, cortex_m7)
  ARM_CORE(cortex-m4, cortexm4, cortexm4, 7EM, 
FL_LDSCHED, v7m)
  ARM_CORE(cortex-m3, cortexm3, cortexm3, 7M, 
FL_LDSCHED, v7m)
  ARM_CORE(marvell-pj4,   marvell_pj4, marvell_pj4, 
 7A,  FL_LDSCHED, 9e)

+ARM_CORE(marvell-whitney,marvell_whitney, marvell_whitney,
7A,  FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, marvell_whitney)

  /* V7 big.LITTLE implementations */
  ARM_CORE(cortex-a15.cortex-a7, cortexa15cortexa7, cortexa7, 7A,
FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15)
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 20cfa9f..e86db1e 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -231,6 +231,9 @@ extern void arm_order_regs_for_local_alloc (void);

  extern int arm_max_conditional_execute ();

+extern bool marvell_whitney_vector_element_size_is_byte (rtx insn);
+extern bool marvell_whitney_non_shift_with_shift_operand (rtx insn);
+
  /* Vectorizer cost model implementation.  */
  struct cpu_vec_costs {
const int scalar_stmt_cost;   /* Cost of any scalar operation, 
excluding

diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 9b1886e..3371ce3 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -298,6 +298,9 @@ EnumValue
  Enum(processor_type) String(marvell-pj4) Value(marvell_pj4)

  EnumValue
+Enum(processor_type) String(marvell-whitney) Value(marvell_whitney)
+
+EnumValue
  Enum(processor_type) String(cortex-a15.cortex-a7) 
Value(cortexa15cortexa7)


  EnumValue
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index d300c51..c73c33c 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -28,9 +28,10 @@

cortexm1smallmultiply,cortexm0smallmultiply,cortexm0plussmallmultiply,
genericv7a,cortexa5,cortexa7,
cortexa8,cortexa9,cortexa12,
-   cortexa15,cortexa17,cortexr4,cortexr4f,
-   cortexr5,cortexr7,cortexm7,
-   cortexm4,cortexm3,marvell_pj4,
-   cortexa15cortexa7,cortexa17cortexa7,cortexa53,
-   cortexa57,cortexa57cortexa53
+   cortexa15,cortexa17,cortexr4,
+   cortexr4f,cortexr5,cortexr7,
+   cortexm7,cortexm4,cortexm3,
+   marvell_pj4,marvell_whitney,cortexa15cortexa7,
+   cortexa17cortexa7,cortexa53,cortexa57,
+   cortexa57cortexa53
(const (symbol_ref ((enum attr_tune) arm_tune
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 0ec526b..183da4c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1914,6 +1914,25 @@ const struct tune_params arm_cortex_a9_tune =
8   /* Maximum insns to 
inline memset.  */

  };

+const struct tune_params arm_marvell_whitney_tune =
+{
+  arm_9e_rtx_costs,
+  cortexa9_extra_costs,
+  cortex_a9_sched_adjust_cost,
+  1,   /* Constant limit.  */
+  5,   /* Max cond insns.  */
+  ARM_PREFETCH_BENEFICIAL(4,32,32),
+  false,   /* Prefer constant pool.  */
+  arm_default_branch_cost,
+  false,   /* Prefer LDRD/STRD.  */
+  {true, true},/* Prefer non 
short circuit.  */

+  arm_default_vec_cost,/* Vectorizer costs.  */
+  false,/* Prefer Neon for
64-bits bitops.  */
+  false, false, 

Re: [PATCH 4/4] OpenMP 4.0 offloading to Intel MIC: non-fallback testing

2014-12-18 Thread Jakub Jelinek
On Wed, Dec 17, 2014 at 11:48:01PM +0100, Thomas Schwinge wrote:
 I have another suggestion: on gomp-4_0-branch, we had already started
 using a libgomp-test-support.exp file for a similar purpose.  I now
 changed your code on gomp-4_0-branch in r218845 as follows (though, not
 very much tested).  The advantage here is that people who are not using
 GCC build-tree testing, but are directly invoking runtest, then don't
 have to set various environment variables, but instead just have to copy
 this one libgomp-test-support.exp file from the build tree to the testing
 tree.  Does this make sense?
 
 commit de01639bf47e29a20959771e587fb6f30c372a45
 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
 Date:   Wed Dec 17 22:45:05 2014 +
 
 libgomp: Route offloading data through the libgomp-test-support.exp file.
 
   libgomp/
   * testsuite/Makefile.am: Don't export OFFLOAD_TARGETS,
   OFFLOAD_ADDITIONAL_OPTIONS, and OFFLOAD_ADDITIONAL_LIB_PATHS...
   * testsuite/libgomp-test-support.exp.in: ..., and instead set
   offload_targets, offload_additional_options, and
   offload_additional_lib_paths here.  Update all users.

LGTM.

Jakub


Re: OpenACC middle end changes

2014-12-18 Thread Thomas Schwinge
Hi!

On Thu, 13 Nov 2014 19:09:49 +0100, Jakub Jelinek ja...@redhat.com wrote:
 On Thu, Nov 13, 2014 at 05:59:11PM +0100, Thomas Schwinge wrote:
  --- gcc/builtins.c
  +++ gcc/builtins.c

  +/* Expand OpenACC acc_on_device.
  +
  +   This has to happen late (that is, not in early folding; 
  expand_builtin_*,
  +   rather than fold_builtin_*), as we have to act differently for host and
  +   acceleration device (ACCEL_COMPILER conditional).  */
  +
  +static rtx
  +expand_builtin_acc_on_device (tree exp, rtx target ATTRIBUTE_UNUSED)
  +{
  +  if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
  +return NULL_RTX;
  +
  +  tree arg, v1, v2, ret;
  +  location_t loc;
  +
  +  arg = CALL_EXPR_ARG (exp, 0);
  +  arg = builtin_save_expr (arg);
  +  loc = EXPR_LOCATION (exp);
  +
  +  /* Build: (arg == v1 || arg == v2) ? 1 : 0.  */
  +
  +#ifdef ACCEL_COMPILER
  +  v1 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_not_host */ 3);
  +  v2 = build_int_cst (TREE_TYPE (arg), ACCEL_COMPILER_acc_device);
  +#else
  +  v1 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_none */ 0);
  +  v2 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_host */ 2);
  +#endif
  +
  +  v1 = fold_build2_loc (loc, EQ_EXPR, integer_type_node, arg, v1);
  +  v2 = fold_build2_loc (loc, EQ_EXPR, integer_type_node, arg, v2);
  +
  +  /* Can't use TRUTH_ORIF_EXPR, as that is not supported by
  + expand_expr_real*.  */
  +  ret = fold_build3_loc (loc, COND_EXPR, integer_type_node, v1, v1, v2);
  +  ret = fold_build3_loc (loc, COND_EXPR, integer_type_node,
  +ret, integer_one_node, integer_zero_node);
  +
  +  return expand_normal (ret);
 
 If you can't fold it late (which is indeed a problem for -O0),
 then I'd suggest to implement this more RTL-ish.
 So, avoid the builtin_save_expr, instead
   rtx op = expand_normal (arg);
 Don't build v1/v2 as trees (and, please fix the TODOs), but rtxes,

(acc_device_* TODOs already resolved earlier on.)

 just
   rtx v1 = GEN_INT (...);
   rtx v2 = GEN_INT (...);
   machine_mode mode = TYPE_MODE (TREE_TYPE (arg));
   rtx ret = gen_reg_rtx (TYPE_MODE (integer_type_node));
   emit_move_insn (ret, const0_rtx);
   rtx_code_label *done_label = gen_label_rtx ();
   emit_cmp_and_jump_insns (op, v1, NE, NULL_RTX, mode,
  false, done_label, PROB_EVEN);
   emit_cmp_and_jump_insns (op, v2, NE, NULL_RTX, mode,
  false, done_label, PROB_EVEN);
   emit_move_insn (ret, const1_rtx);
   emit_label (done_label);
   return ret;
 or similar.

Thanks for the review/suggestion/code!

 Note, it would still be worthwhile to fold the builtin, at least
 when optimizing, after IPA.  Dunno if we have some property you can check,
 and Richard B. could suggest where it would be most appropriate (if GIMPLE
 guarded match.pd entry, or what), gimple_fold, etc.

I'll make a note to have a look at that later on.

 I bet I should handle omp_is_initial_device (); similarly.

Yeah.

Committed to gomp-4_0-branch in r218858:

commit da5ad5aec1c0f9b230ecb2dc00620a5598de5066
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Thu Dec 18 10:42:30 2014 +

OpenACC acc_on_device: Make builtin expansion more RTXy.

gcc/
* builtins.c (expand_builtin_acc_on_device): Make more RTXy.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@218858 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  5 +
 gcc/builtins.c | 44 +---
 2 files changed, 26 insertions(+), 23 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index b370616..a3650c5 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2014-12-18  Thomas Schwinge  tho...@codesourcery.com
+   Jakub Jelinek  ja...@redhat.com
+
+   * builtins.c (expand_builtin_acc_on_device): Make more RTXy.
+
 2014-12-17  Thomas Schwinge  tho...@codesourcery.com
Bernd Schmidt  ber...@codesourcery.com
 
diff --git gcc/builtins.c gcc/builtins.c
index fcf3f53..e946521 100644
--- gcc/builtins.c
+++ gcc/builtins.c
@@ -5889,38 +5889,36 @@ expand_stack_save (void)
acceleration device (ACCEL_COMPILER conditional).  */
 
 static rtx
-expand_builtin_acc_on_device (tree exp, rtx target ATTRIBUTE_UNUSED)
+expand_builtin_acc_on_device (tree exp, rtx target)
 {
   if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
 return NULL_RTX;
 
-  tree arg, v1, v2, ret;
-  location_t loc;
-
-  arg = CALL_EXPR_ARG (exp, 0);
-  arg = builtin_save_expr (arg);
-  loc = EXPR_LOCATION (exp);
-
-  /* Build: (arg == v1 || arg == v2) ? 1 : 0.  */
+  tree arg = CALL_EXPR_ARG (exp, 0);
 
+  /* Return (arg == v1 || arg == v2) ? 1 : 0.  */
+  machine_mode v_mode = TYPE_MODE (TREE_TYPE (arg));
+  rtx v = expand_normal (arg), v1, v2;
 #ifdef ACCEL_COMPILER
-  v1 = build_int_cst (TREE_TYPE (arg), GOMP_DEVICE_NOT_HOST);
-  v2 = build_int_cst (TREE_TYPE (arg), 

Re: OpenACC middle end changes

2014-12-18 Thread Thomas Schwinge
Hi Jakub!

Many thanks for the review comments!  The very most have been addresed,
here are just a few comments.  If you feel strongly/differently about
any, I'll address those, too.


On Thu, 13 Nov 2014 19:09:49 +0100, Jakub Jelinek ja...@redhat.com wrote:
 On Thu, Nov 13, 2014 at 05:59:11PM +0100, Thomas Schwinge wrote:
  --- gcc/gimple-pretty-print.c
  +++ gcc/gimple-pretty-print.c
  @@ -1136,18 +1136,21 @@ dump_gimple_omp_for (pretty_printer *buffer, gimple 
  gs, int spc, int flags)
  case GF_OMP_FOR_KIND_FOR:
kind = ;
break;
  -   case GF_OMP_FOR_KIND_SIMD:
  - kind =  simd;
  - break;
  -   case GF_OMP_FOR_KIND_CILKSIMD:
  - kind =  cilksimd;
  - break;
  case GF_OMP_FOR_KIND_DISTRIBUTE:
kind =  distribute;
break;
  case GF_OMP_FOR_KIND_CILKFOR:
kind =  _Cilk_for;
break;
  +   case GF_OMP_FOR_KIND_OACC_LOOP:
  + kind =  oacc_loop;
  + break;
  +   case GF_OMP_FOR_KIND_SIMD:
  + kind =  simd;
  + break;
  +   case GF_OMP_FOR_KIND_CILKSIMD:
  + kind =  cilksimd;
  + break;
 
 Why the reshuffling?  The result isn't alphabetically sorted
 anyway.  I'd just add new stuff at the end ;)

It's the order in which the GF_OMP_FOR_KIND_* are defined.  At least for
my mind ;-) that makes it very easy to grasp that all of them are
covered.


  @@ -1684,7 +1718,12 @@ gimple_copy (gimple stmt)
gimple_try_set_cleanup (copy, new_seq);
break;
   
  +   case GIMPLE_OACC_KERNELS:
  +   case GIMPLE_OACC_PARALLEL:
  +  gcc_unreachable ();
  +
  case GIMPLE_OMP_FOR:
  + gcc_assert (!is_gimple_omp_oacc_specifically (stmt));
new_seq = gimple_seq_copy (gimple_omp_for_pre_body (stmt));
gimple_omp_for_set_pre_body (copy, new_seq);
t = unshare_expr (gimple_omp_for_clauses (stmt));
  @@ -1754,6 +1793,7 @@ gimple_copy (gimple stmt)
  case GIMPLE_OMP_TASKGROUP:
  case GIMPLE_OMP_ORDERED:
  copy_omp_body:
  + gcc_assert (!is_gimple_omp_oacc_specifically (stmt));
new_seq = gimple_seq_copy (gimple_omp_body (stmt));
gimple_omp_set_body (copy, new_seq);
break;
 
 Why are you so afraid to support oacc in gimple_copy?

Not afraid, but have never run into this, so didn't want to claim such
code paths have been tested.  Anyway, with GIMPLE_OACC_* now merged into
GIMPLE_OMP_TARGET, this should -- famous last words?  ;-) -- now just
work, so I removed all these asserts.  (And, no doubt, usage of
assert/gcc_unreachable was not exactly correct, for code paths that could
actually legitimately be reached; probably should've used sorry instead.)


  +/* GIMPLE_OACC_PARALLEL */
  +struct GTY((tag(GSS_OMP_PARALLEL_LAYOUT)))
  +  gimple_statement_oacc_parallel : public 
  gimple_statement_omp_parallel_layout
  +{
  +/* No extra fields; adds invariant:
  + stmt-code == GIMPLE_OACC_PARALLEL.  */
 
 Formatting, there is too big indentation.  That said, this will be replaced
 with GF_OMP_TARGET_KIND_*, right?
 
  +static inline tree
  +gimple_oacc_kernels_clauses (const_gimple gs)
  +{
  +  const gimple_statement_oacc_kernels *oacc_kernels_stmt =
  +as_a const gimple_statement_oacc_kernels * (gs);
 
 Wrong formatting (many times, = goes on the second line.
 Seems you probably just copied it from preexisting bad formatting,
 we should ping Dave Malcolm to fix up his scripts.

Yes, yes, and yes.  (And, now gone.)


  +/* Return true if STMT is any of the OpenACC types specifically.  */
  +
  +static inline bool
  +is_gimple_omp_oacc_specifically (const_gimple stmt)
 
 Why not is_gimple_oacc or gimple_oacc_p ?

The idea is to make it clear in the name that STMT must be an OMP one.
Now renamed to the shorter is_gimple_omp_oacc.


  @@ -99,15 +103,16 @@ enum gimplify_omp_var_data
   
   enum omp_region_type
   {
  -  ORT_WORKSHARE = 0,
  -  ORT_SIMD = 1,
  -  ORT_PARALLEL = 2,
  -  ORT_COMBINED_PARALLEL = 3,
  -  ORT_TASK = 4,
  -  ORT_UNTIED_TASK = 5,
  -  ORT_TEAMS = 8,
  -  ORT_TARGET_DATA = 16,
  -  ORT_TARGET = 32
  +  /* An undefined region type.  */
  +  ORT_INVALID = 0,
  +
  +  ORT_WORKSHARE,
  +  ORT_SIMD,
  +  ORT_PARALLEL,
  +  ORT_COMBINED_PARALLEL,
  +  ORT_TASK,
  +  ORT_TEAMS,
  +  ORT_TARGET
   };
 
 If you want to shift from bitmasks in the enum
 to extra on the side bits (why?), then combined
 for parallel is another thing.

Right, but I've now dropped (reverted) this and further gimplification
changes.  Maybe this is material for the next stage 1, but maybe not
useful enough.


 [...]

  +static inline tree
  +lookup_reduction (const char *id, omp_context *ctx)
 
 Can't you use oacc_ in the name of OpenACC specific functions?

 [...]

Review comments for OpenACC reduction code have been addressed by Cesar.
(Thanks!)


  case OMP_CLAUSE_DEFAULT:
  + gcc_assert (!is_gimple_omp_oacc_specifically (ctx-stmt));
ctx-default_kind = OMP_CLAUSE_DEFAULT_KIND (c);
break;
 
 For clauses which are not parsed for OpenACC 

Re: [C++ Patch] PR 60955

2014-12-18 Thread Paolo Carlini

Hi,

On 12/17/2014 09:37 PM, Jason Merrill wrote:
I'm uncomfortable with setting TREE_NO_WARNING on a decl just because 
we don't want a warning for one particular use of it.  How about 
suppressing warnings across the call to build_static_cast?
Sure. The below uses the c_inhibit_evaluation_warnings mechanism and 
passes testing. I wondered if in such cases we could alternately use the 
warning_sentinel mechanism (moved to cp-tree.h of course) ?


Thanks,
Paolo.

///
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 218857)
+++ cp/semantics.c  (working copy)
@@ -1660,7 +1660,9 @@ force_paren_expr (tree expr)
  tree type = unlowered_expr_type (expr);
  bool rval = !!(kind  clk_rvalueref);
  type = cp_build_reference_type (type, rval);
+ ++c_inhibit_evaluation_warnings;
  expr = build_static_cast (type, expr, tf_error);
+ --c_inhibit_evaluation_warnings;
  if (expr != error_mark_node)
REF_PARENTHESIZED_P (expr) = true;
}
Index: cp/typeck.c
===
--- cp/typeck.c (revision 218857)
+++ cp/typeck.c (working copy)
@@ -6047,7 +6047,8 @@ cxx_mark_addressable (tree exp)
  (address of explicit register variable %qD requested, x);
return false;
  }
-   else if (extra_warnings)
+   else if (extra_warnings
+ c_inhibit_evaluation_warnings == 0)
  warning
(OPT_Wextra, address requested for %qD, which is declared 
%register%, x);
  }
Index: testsuite/g++.dg/warn/register-parm-1.C
===
--- testsuite/g++.dg/warn/register-parm-1.C (revision 0)
+++ testsuite/g++.dg/warn/register-parm-1.C (working copy)
@@ -0,0 +1,9 @@
+// PR c++/60955
+// { dg-options -Wextra }
+
+unsigned int erroneous_warning(register int a) {
+if ((a)  0xff) return 1; else return 0;
+}
+unsigned int no_erroneous_warning(register int a) {
+if (a  0xff) return 1; else return 0;
+}


Re: [testsuite, i386] Change mpx effective-target test into link test

2014-12-18 Thread Uros Bizjak
On Thu, Dec 18, 2014 at 11:12 AM, Rainer Orth
r...@cebitec.uni-bielefeld.de wrote:
 Currently, the new gcc.dg/lto/chkp-privatize test FAILs on Solaris/x86
 with gas and ld for 64-bit:

 UNRESOLVED: gcc.dg/lto/chkp-privatize 
 c_lto_chkp-privatize_0.o-c_lto_chkp-privatize_1.o execute  -fPIC -flto 
 -flto-partition=max -fcheck-pointer-bounds -mmpx
 FAIL: gcc.dg/lto/chkp-privatize 
 c_lto_chkp-privatize_0.o-c_lto_chkp-privatize_1.o link,  -fPIC -flto 
 -flto-partition=max -fcheck-pointer-bounds -mmpx

 output is:
 ld: fatal: relocation error: file c_lto_chkp-privatize_0.o: section 
 [4].rela.text: invalid relocation type: 0x28
 ld: fatal: symbol referencing errors
 collect2: error: ld returned 1 exit status

 This relocation type 0x28 (40) is R_X86_64_PLT32_BND, which Solaris 10
 and 11 ld know nothing about.

 Given that there's no test whatsoever in gcc/configure.ac that the whole
 toolchain supports the necessary mnemonics and relocs, the logical place
 seems to be the testsuite.

 The following patch does just that, turning the mpx effective-target
 test into a link test, thus checking both assembler and linker used.

 Tested with the appropriate runtest invocations on i386-pc-solaris2.11
 with as/ld (both 32 and 64-bit tests unsupported due to missing
 assembler support), gas/ld (32-bit tests pass, 64-bit tests unsupported
 due to missing linker support), and x86_64-unknown-linux-gnu (both 32
 and 64-bit tests pass).

 I wonder what's the point of having mpx-dg.exp at all: I'd rather move
 the single test to target-supports.exp, avoiding the whole flurry of
 mpx-dg.exp inclusions all over gcc/testsuite.  Ilya?  However that may
 be, it can be done as a followup.

 Ok for mainline?

 Rainer


 2014-12-18  Rainer Orth  r...@cebitec.uni-bielefeld.de

 * lib/mpx-dg.exp (check_effective_target_mpx): Change into link test.
 Add main.

OK.

Thanks,
Uros.


Re: OpenACC middle end changes

2014-12-18 Thread Jakub Jelinek
On Thu, Dec 18, 2014 at 11:46:00AM +0100, Thomas Schwinge wrote:
  just
rtx v1 = GEN_INT (...);
rtx v2 = GEN_INT (...);
machine_mode mode = TYPE_MODE (TREE_TYPE (arg));
rtx ret = gen_reg_rtx (TYPE_MODE (integer_type_node));
emit_move_insn (ret, const0_rtx);
rtx_code_label *done_label = gen_label_rtx ();
emit_cmp_and_jump_insns (op, v1, NE, NULL_RTX, mode,
 false, done_label, PROB_EVEN);
emit_cmp_and_jump_insns (op, v2, NE, NULL_RTX, mode,
 false, done_label, PROB_EVEN);
emit_move_insn (ret, const1_rtx);
emit_label (done_label);
return ret;
  or similar.
 
 Thanks for the review/suggestion/code!

Note, as I found later, emit_cmp_and_jump_insns is good enough only
for certain modes on certain architectures (in particular, for
cases where can_compare_p returns true).
So it is better to use do_compare_rtx_and_jump instead of
emit_cmp_and_jump_insns, because it handles also the cases which
emit_cmp_and_jump_insns silently mishandles.  You'll need to reorder
the arguments a little bit and add one NULL_RTX argument.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63848#c4

Jakub


Re: OpenACC middle end changes

2014-12-18 Thread Jakub Jelinek
On Thu, Dec 18, 2014 at 12:07:01PM +0100, Thomas Schwinge wrote:
   - case GF_OMP_FOR_KIND_SIMD:
   -   kind =  simd;
   -   break;
   - case GF_OMP_FOR_KIND_CILKSIMD:
   -   kind =  cilksimd;
   -   break;
 case GF_OMP_FOR_KIND_DISTRIBUTE:
   kind =  distribute;
   break;
 case GF_OMP_FOR_KIND_CILKFOR:
   kind =  _Cilk_for;
   break;
   + case GF_OMP_FOR_KIND_OACC_LOOP:
   +   kind =  oacc_loop;
   +   break;
   + case GF_OMP_FOR_KIND_SIMD:
   +   kind =  simd;
   +   break;
   + case GF_OMP_FOR_KIND_CILKSIMD:
   +   kind =  cilksimd;
   +   break;
  
  Why the reshuffling?  The result isn't alphabetically sorted
  anyway.  I'd just add new stuff at the end ;)
 
 It's the order in which the GF_OMP_FOR_KIND_* are defined.  At least for
 my mind ;-) that makes it very easy to grasp that all of them are
 covered.

Ok.

   +/* Return true if STMT is any of the OpenACC types specifically.  */
   +
   +static inline bool
   +is_gimple_omp_oacc_specifically (const_gimple stmt)
  
  Why not is_gimple_oacc or gimple_oacc_p ?
 
 The idea is to make it clear in the name that STMT must be an OMP one.
 Now renamed to the shorter is_gimple_omp_oacc.

Ok.

  If you want to shift from bitmasks in the enum
  to extra on the side bits (why?), then combined
  for parallel is another thing.
 
 Right, but I've now dropped (reverted) this and further gimplification
 changes.  Maybe this is material for the next stage 1, but maybe not
 useful enough.

Ack.

 Do you want me to repost the OpenACC Middle End changes patch, or would
 you be OK with reviewing the code on gomp-4_0-branch, diffing against the
 last trunk merge point, 0fcfaa33cbf333ac69cc2b01a7277e5272ff8a3d,
 r218679?

So, is what is on the gomp-4_0-branch now all that you'd like to merge to
trunk now?  Has it been tested on nvptx?  I guess we should test it with
XeonPhi offloading too to make sure it doesn't break.
And then you or together with your coworkers should write the summary
ChangeLogs (i.e. what changed compared to trunk in a single giant entry
for each ChangeLog file as opposed to many ChangeLog.gomp change entries).

Jakub


Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic

2014-12-18 Thread Kyrill Tkachov


On 17/12/14 00:04, Joseph Myers wrote:

On Mon, 15 Dec 2014, James Greenhalgh wrote:


@@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a)
return __builtin_aarch64_sqrtv4sf (a);
  }
  
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))

+vsqrt_f64 (float64x1_t a)
+{
+  return (float64x1_t) { __builtin_sqrt (a[0]) };
+}

Hi Kyrill,

Does this introduce an implicit need to link against a maths library if
we want arm_neon.h to work correctly? If so, I think we need to take a
different approach.

At O0 I've started to see:

undefined reference to `sqrt' 

When checking a large arm_neon.h testcase.

It does seem strange that the mid-end would convert a __builtin_sqrt back
to a library call at O0 when the target has an optab for it, so perhaps
there is a bug there to go hunt?

__builtin_sqrt has the same semantics as the sqrt library function.  This
includes setting errno for negative arguments (other than -0 and -NaN).
The semantics also include that it's always OK to expand to a call to that
library function (generally, __builtin_foo may always expand to a call to
foo, if there is such a library function).


Thanks for the explanation.
I see that there are some intrinsics implemented in terms of 
__builtin_fabsf, presumable they can be at 'risk' too of having a 
library call emitted?


Kyrill








Re: OpenACC middle end changes

2014-12-18 Thread Jakub Jelinek
On Thu, Dec 18, 2014 at 12:38:53PM +0100, Jakub Jelinek wrote:
 So, is what is on the gomp-4_0-branch now all that you'd like to merge to
 trunk now?  Has it been tested on nvptx?  I guess we should test it with
 XeonPhi offloading too to make sure it doesn't break.
 And then you or together with your coworkers should write the summary
 ChangeLogs (i.e. what changed compared to trunk in a single giant entry
 for each ChangeLog file as opposed to many ChangeLog.gomp change entries).

Also, it would be nice to update wiki/Offloading to give details on all the
steps how to configure nvptx offloading (how to grab nvptx-newlib,
nvptx-tools, how to configure nvptx-none compiler, in what order to build
those etc.).

Jakub


[PATCH] Fix -fsanitize=float-cast-overflow with C FE (PR sanitizer/64344)

2014-12-18 Thread Jakub Jelinek
On Tue, Dec 16, 2014 at 06:58:47PM +, Joseph Myers wrote:
 On Fri, 12 Dec 2014, Jakub Jelinek wrote:
 
  Hi!
  
  -fsanitize=float-cast-overflow sanitization is done in convert.c and calls
  there save_expr.  Unfortunately, save_expr is a no-go for the C FE, we need
  c_save_expr, but as convert.c is shared by all FEs, the only way to arrange
  that would be a new langhook.  This patch attempts to fix it the same way
  as PR54428 did (the other save_expr in c-convert.c).
  
  Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
 
 OK.

Unfortunately this broke stuff, as reported by Tobias.

The problems are:
1) sometimes convert is called after c_fully_fold, so we want
   in_late_binary_op set in that case and use save_expr instead of c_save_expr
2) if not in_late_binary_op, ubsan_instrument_float_cast wants to use the
   result of c_save_expr in comparisons (ok) and also as argument to
   a function call; but c_fully_fold will handle the former, but not look
   into CALL_EXPR arguments, so those need to be c_fully_folded
and, preexisting problems, where constant is required, things didn't work
well, because while the check was optimized out, we ended up with
COMPOUND_EXPR of 0 and some constant.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2014-12-18  Jakub Jelinek  ja...@redhat.com

PR sanitizer/64344
* ubsan.h (ubsan_instrument_float_cast): Add ARG argument.
* ubsan.c (ubsan_instrument_float_cast): Add ARG argument, pass
it to libubsan handler instead of EXPR.  Fold comparisons earlier,
if the result is integer_zerop, return NULL_TREE.
* convert.c (convert_to_integer): Pass expr as ARG.
c/
* c-typeck.c (convert_for_assignment, c_finish_return): For
-fsanitize=float-cast-overflow casts from REAL_TYPE to integer/enum
types also set in_late_binary_op around convert call.
* c-convert.c (convert): For -fsanitize=float-cast-overflow REAL_TYPE
to integral type casts, if not in_late_binary_op, pass c_fully_fold
result on expr as last argument to ubsan_instrument_float_cast,
if in_late_binary_op, don't use c_save_expr but save_expr.
testsuite/
* c-c++-common/ubsan/pr64344-1.c: New test.
* c-c++-common/ubsan/pr64344-2.c: New test.

--- gcc/ubsan.h.jj  2014-12-03 16:33:05.0 +0100
+++ gcc/ubsan.h 2014-12-18 09:57:12.934732565 +0100
@@ -47,7 +47,7 @@ extern tree ubsan_type_descriptor (tree,
 extern tree ubsan_encode_value (tree, bool = false);
 extern bool is_ubsan_builtin_p (tree);
 extern tree ubsan_build_overflow_builtin (tree_code, location_t, tree, tree, 
tree);
-extern tree ubsan_instrument_float_cast (location_t, tree, tree);
+extern tree ubsan_instrument_float_cast (location_t, tree, tree, tree);
 extern tree ubsan_get_source_location_type (void);
 
 #endif  /* GCC_UBSAN_H  */
--- gcc/ubsan.c.jj  2014-12-03 16:33:05.0 +0100
+++ gcc/ubsan.c 2014-12-18 10:47:00.045039211 +0100
@@ -1252,10 +1252,11 @@ instrument_bool_enum_load (gimple_stmt_i
 }
 
 /* Instrument float point-to-integer conversion.  TYPE is an integer type of
-   destination, EXPR is floating-point expression.  */
+   destination, EXPR is floating-point expression.  ARG is what to pass
+   the libubsan call as value, often EXPR itself.  */
 
 tree
-ubsan_instrument_float_cast (location_t loc, tree type, tree expr)
+ubsan_instrument_float_cast (location_t loc, tree type, tree expr, tree arg)
 {
   tree expr_type = TREE_TYPE (expr);
   tree t, tt, fn, min, max;
@@ -1348,6 +1349,12 @@ ubsan_instrument_float_cast (location_t
   else
 return NULL_TREE;
 
+  t = fold_build2 (UNLE_EXPR, boolean_type_node, expr, min);
+  tt = fold_build2 (UNGE_EXPR, boolean_type_node, expr, max);
+  t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, t, tt);
+  if (integer_zerop (t))
+return NULL_TREE;
+
   if (flag_sanitize_undefined_trap_on_error)
 fn = build_call_expr_loc (loc, builtin_decl_explicit (BUILT_IN_TRAP), 0);
   else
@@ -1364,14 +1371,10 @@ ubsan_instrument_float_cast (location_t
   fn = builtin_decl_explicit (bcode);
   fn = build_call_expr_loc (loc, fn, 2,
build_fold_addr_expr_loc (loc, data),
-   ubsan_encode_value (expr, false));
+   ubsan_encode_value (arg, false));
 }
 
-  t = fold_build2 (UNLE_EXPR, boolean_type_node, expr, min);
-  tt = fold_build2 (UNGE_EXPR, boolean_type_node, expr, max);
-  return fold_build3 (COND_EXPR, void_type_node,
- fold_build2 (TRUTH_OR_EXPR, boolean_type_node, t, tt),
- fn, integer_zero_node);
+  return fold_build3 (COND_EXPR, void_type_node, t, fn, integer_zero_node);
 }
 
 /* Instrument values passed to function arguments with nonnull attribute.  */
--- gcc/convert.c.jj2014-12-12 16:32:29.0 +0100
+++ gcc/convert.c   2014-12-18 09:57:27.742475996 +0100
@@ 

Re: OpenACC middle end changes

2014-12-18 Thread Jakub Jelinek
On Thu, Dec 18, 2014 at 01:02:22PM +0100, Jakub Jelinek wrote:
 On Thu, Dec 18, 2014 at 12:38:53PM +0100, Jakub Jelinek wrote:
  So, is what is on the gomp-4_0-branch now all that you'd like to merge to
  trunk now?  Has it been tested on nvptx?  I guess we should test it with
  XeonPhi offloading too to make sure it doesn't break.
  And then you or together with your coworkers should write the summary
  ChangeLogs (i.e. what changed compared to trunk in a single giant entry
  for each ChangeLog file as opposed to many ChangeLog.gomp change entries).
 
 Also, it would be nice to update wiki/Offloading to give details on all the
 steps how to configure nvptx offloading (how to grab nvptx-newlib,
 nvptx-tools, how to configure nvptx-none compiler, in what order to build
 those etc.).

FYI, just tried to build gomp-4_0-branch with:
../configure --build=x86_64-intelmicemul-linux-gnu 
--host=x86_64-intelmicemul-linux-gnu --target=x86_64-intelmicemul-linux-gnu 
--enable-as-accelerator-for=x86_64-pc-linux-gnu --disable-bootstrap
make -j16
and the build failed with:
../../gcc/builtins.c: In function ‘rtx_def* expand_builtin_acc_on_device(tree, 
rtx)’:
../../gcc/builtins.c:5904:17: error: ‘ACCEL_COMPILER_acc_device’ was not 
declared in this scope
   v2 = GEN_INT (ACCEL_COMPILER_acc_device);
 ^
../../gcc/rtl.h:3186:51: note: in definition of macro ‘GEN_INT’
 #define GEN_INT(N)  gen_rtx_CONST_INT (VOIDmode, (N))
   ^

Where is ACCEL_COMPILER_acc_device macro supposed to be defined?

Jakub


Re: OpenACC middle end changes

2014-12-18 Thread Thomas Schwinge
Hi Jakub!

On Thu, 18 Dec 2014 13:15:38 +0100, Jakub Jelinek ja...@redhat.com wrote:
 On Thu, Dec 18, 2014 at 01:02:22PM +0100, Jakub Jelinek wrote:
  On Thu, Dec 18, 2014 at 12:38:53PM +0100, Jakub Jelinek wrote:
   So, is what is on the gomp-4_0-branch now all that you'd like to merge to
   trunk now?

Basically, yes.  Only basically, because there are still a few
unaddressed review issues in the front ends -- which I'll look into now.
(Meaning really: now.)  :-)


Doing the merge as one big commit on trunk will be the easiest
approach, of course.  Is that OK, or is there any requirement to single
out any of the changes, such as the
libgomp/testsuite/libgomp-test-support.exp file just discussed, or the
libgomp »Offloading and Multi Processing Runtime Library« renaming, or
anything else?


   Has it been tested on nvptx?

I have always been testing on gomp-4_0-branch with ACC_DEVICE_TYPE=host
and ACC_DEVICE_TYPE=host_nonshm, plus with ACC_DEVICE_TYPE=nvidia in an
internal branch.  This branch corresponds to gomp-4_0-branch, but also
includes a few patches related to offloading that Bernd has posted for
trunk approval, but has not yet gotten approved.


   I guess we should test it with
   XeonPhi offloading too to make sure it doesn't break.

Right.  Do you happen to be set up for such testing?  I have not yet
managed to properly change my build/test scripts for
x86_64-intelmicemul-linux-gnu.


   And then you or together with your coworkers should write the summary
   ChangeLogs (i.e. what changed compared to trunk in a single giant entry
   for each ChangeLog file as opposed to many ChangeLog.gomp change entries).

Right, I'll do that once it's time to merge -- otherwise it'll be too
cumbersome to keep those up to date.


  Also, it would be nice to update wiki/Offloading to give details on all the
  steps how to configure nvptx offloading (how to grab nvptx-newlib,
  nvptx-tools, how to configure nvptx-none compiler, in what order to build
  those etc.).

Right.


 FYI, just tried to build gomp-4_0-branch with:
 ../configure --build=x86_64-intelmicemul-linux-gnu 
 --host=x86_64-intelmicemul-linux-gnu --target=x86_64-intelmicemul-linux-gnu 
 --enable-as-accelerator-for=x86_64-pc-linux-gnu --disable-bootstrap
 make -j16
 and the build failed with:
 ../../gcc/builtins.c: In function ‘rtx_def* 
 expand_builtin_acc_on_device(tree, rtx)’:
 ../../gcc/builtins.c:5904:17: error: ‘ACCEL_COMPILER_acc_device’ was not 
 declared in this scope
v2 = GEN_INT (ACCEL_COMPILER_acc_device);
  ^
 ../../gcc/rtl.h:3186:51: note: in definition of macro ‘GEN_INT’
  #define GEN_INT(N)  gen_rtx_CONST_INT (VOIDmode, (N))
^
 
 Where is ACCEL_COMPILER_acc_device macro supposed to be defined?

From b6781092de7cc9fc8c24600815e0a1223e1241f5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge tho...@codesourcery.com
Date: Wed, 17 Dec 2014 10:10:53 +0100
Subject: [PATCH] Intel MIC offloading.

---
 gcc/config.gcc |  1 +
 gcc/config/i386/intelmic-offload.h | 35 +++
 2 files changed, 36 insertions(+)
 create mode 100644 gcc/config/i386/intelmic-offload.h

diff --git gcc/config.gcc gcc/config.gcc
index 8541274..faad47d 100644
--- gcc/config.gcc
+++ gcc/config.gcc
@@ -2906,6 +2906,7 @@ esac
 case ${target} in
 *-intelmic-* | *-intelmicemul-*)
tmake_file=${tmake_file} i386/t-intelmic
+   tm_file=${tm_file} i386/intelmic-offload.h
;;
 esac
 
diff --git gcc/config/i386/intelmic-offload.h gcc/config/i386/intelmic-offload.h
new file mode 100644
index 000..dc346c7
--- /dev/null
+++ gcc/config/i386/intelmic-offload.h
@@ -0,0 +1,35 @@
+/* Definitions for Intel MIC offloading.
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   http://www.gnu.org/licenses/.  */
+
+#ifndef INTELMIC_OFFLOAD_H
+#define INTELMIC_OFFLOAD_H
+
+/* Support for OpenACC acc_on_device.  */
+
+#include gomp-constants.h
+
+#define ACCEL_COMPILER_acc_device GOMP_DEVICE_INTEL_MIC
+
+#endif
-- 
1.9.1


Grüße,
 Thomas



Re: OpenACC middle end changes

2014-12-18 Thread Jakub Jelinek
On Thu, Dec 18, 2014 at 01:24:20PM +0100, Thomas Schwinge wrote:
 Hi Jakub!
 
 On Thu, 18 Dec 2014 13:15:38 +0100, Jakub Jelinek ja...@redhat.com wrote:
  On Thu, Dec 18, 2014 at 01:02:22PM +0100, Jakub Jelinek wrote:
   On Thu, Dec 18, 2014 at 12:38:53PM +0100, Jakub Jelinek wrote:
So, is what is on the gomp-4_0-branch now all that you'd like to merge 
to
trunk now?
 
 Basically, yes.  Only basically, because there are still a few
 unaddressed review issues in the front ends -- which I'll look into now.
 (Meaning really: now.)  :-)
 
 
 Doing the merge as one big commit on trunk will be the easiest
 approach, of course.  Is that OK, or is there any requirement to single
 out any of the changes, such as the
 libgomp/testsuite/libgomp-test-support.exp file just discussed, or the
 libgomp »Offloading and Multi Processing Runtime Library« renaming, or
 anything else?

Doing one big merge is ok with me.  If one needs to bisect something, they
can look at the gomp-4_0-branch.

Has it been tested on nvptx?
 
 I have always been testing on gomp-4_0-branch with ACC_DEVICE_TYPE=host
 and ACC_DEVICE_TYPE=host_nonshm, plus with ACC_DEVICE_TYPE=nvidia in an
 internal branch.  This branch corresponds to gomp-4_0-branch, but also
 includes a few patches related to offloading that Bernd has posted for
 trunk approval, but has not yet gotten approved.

Do you have a list of them (URLs)?  Have they been pinged?
Are they show stoppers for the offloading, or just some tests fail because
of that?

I guess we should test it with
XeonPhi offloading too to make sure it doesn't break.
 
 Right.  Do you happen to be set up for such testing?  I have not yet
 managed to properly change my build/test scripts for
 x86_64-intelmicemul-linux-gnu.

Anyone with x86_64-linux should be able to test that (i.e. the emulation),
for real offloading (that ix x86_64-intelmic-linux-gnu) you supposedly need
2 libraries, some kernel module, some distro installed on the offloading
device and most importantly the hw.

 --- gcc/config.gcc
 +++ gcc/config.gcc
 @@ -2906,6 +2906,7 @@ esac
  case ${target} in
  *-intelmic-* | *-intelmicemul-*)
   tmake_file=${tmake_file} i386/t-intelmic
 + tm_file=${tm_file} i386/intelmic-offload.h
   ;;
  esac
  
 diff --git gcc/config/i386/intelmic-offload.h 
 gcc/config/i386/intelmic-offload.h
 new file mode 100644
 index 000..dc346c7
 --- /dev/null
 +++ gcc/config/i386/intelmic-offload.h
 @@ -0,0 +1,35 @@
 +/* Definitions for Intel MIC offloading.
 +
 +   Copyright (C) 2014 Free Software Foundation, Inc.
 +
 +   This file is part of GCC.
 +
 +   GCC is free software; you can redistribute it and/or modify
 +   it under the terms of the GNU General Public License as published by
 +   the Free Software Foundation; either version 3, or (at your option)
 +   any later version.
 +
 +   GCC is distributed in the hope that it will be useful,
 +   but WITHOUT ANY WARRANTY; without even the implied warranty of
 +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 +   GNU General Public License for more details.
 +
 +   Under Section 7 of GPL version 3, you are granted additional
 +   permissions described in the GCC Runtime Library Exception, version
 +   3.1, as published by the Free Software Foundation.
 +
 +   You should have received a copy of the GNU General Public License and
 +   a copy of the GCC Runtime Library Exception along with this program;
 +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 +   http://www.gnu.org/licenses/.  */
 +
 +#ifndef INTELMIC_OFFLOAD_H
 +#define INTELMIC_OFFLOAD_H
 +
 +/* Support for OpenACC acc_on_device.  */
 +
 +#include gomp-constants.h
 +
 +#define ACCEL_COMPILER_acc_device GOMP_DEVICE_INTEL_MIC
 +
 +#endif

LGTM.

Jakub


Re: OpenACC middle end changes

2014-12-18 Thread Jakub Jelinek
On Thu, Dec 18, 2014 at 01:31:45PM +0100, Jakub Jelinek wrote:
  --- gcc/config.gcc
  +++ gcc/config.gcc
  @@ -2906,6 +2906,7 @@ esac
   case ${target} in
   *-intelmic-* | *-intelmicemul-*)
  tmake_file=${tmake_file} i386/t-intelmic
  +   tm_file=${tm_file} i386/intelmic-offload.h
  ;;
   esac
   
  +#define ACCEL_COMPILER_acc_device GOMP_DEVICE_INTEL_MIC
  +
  +#endif

Oh, and where is this defined for nvptx-none target?

Jakub


[PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread H.J. Lu
The Linux kernel never passes floating point arguments around, vararg
functions or not. Hence no vector registers are ever used when calling a
vararg function.  But gcc still dutifully emits an xor %eax,%eax before
each and every call of a vararg function.  Since no callee use that for
anything, these instructions are redundant.

This patch adds the -mskip-rax-setup option to skip setting up RAX
register when SSE is disabled and there are no variable arguments passed
in vector registers.  Since RAX register is used to avoid unnecessarily
saving vector registers on stack when passing variable arguments, the
impacts of this option are callees may waste some stack space, misbehave
or jump to a random location.  GCC 4.4 or newer don't those issues,
regardless the RAX register value since they don't check the RAX register
value when SSE is disabled, regardless the RAX register value:

https://gcc.gnu.org/ml/gcc-patches/2008-09/msg00127.html

I used it on kernel 3.17.7:

   textdata bss   dec hexfilename
11493571  2271232  5926912  19691715 12c78c3 vmlinux.skip-rax
11517879  2271232  5926912  19716023 12cd7b7 vmlinux.orig

It removed 14309 redundant xor %eax,%eax instructions and saved about
27KB.  I am currently running the new kernel without any problem.  OK
for trunk?

Thanks.

H.J.
---
gcc/

* config/i386/i386.c (ix86_expand_call): Skip setting up RAX
register for -mskip-rax-setup when there are no parameters
passed in vector registers.
* config/i386/i386.opt (mskip-rax-setup): New option.
* doc/invoke.texi: Document -mskip-rax-setup.

gcc/testsuite/

* gcc.target/i386/amd64-abi-7.c: New tests.
* gcc.target/i386/amd64-abi-8.c: Likwise.
* gcc.target/i386/amd64-abi-9.c: Likwise.
---
 gcc/ChangeLog   |  8 +
 gcc/config/i386/i386.c  |  7 -
 gcc/config/i386/i386.opt|  4 +++
 gcc/doc/invoke.texi | 13 
 gcc/testsuite/ChangeLog |  6 
 gcc/testsuite/gcc.target/i386/amd64-abi-7.c | 46 +
 gcc/testsuite/gcc.target/i386/amd64-abi-8.c | 18 +++
 gcc/testsuite/gcc.target/i386/amd64-abi-9.c | 18 +++
 8 files changed, 119 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/amd64-abi-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amd64-abi-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/amd64-abi-9.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 24a252a..de7907a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2014-12-18  H.J. Lu  hongjiu...@intel.com
+
+   * config/i386/i386.c (ix86_expand_call): Skip setting up RAX
+   register for -mskip-rax-setup when there are no parameters
+   passed in vector registers.
+   * config/i386/i386.opt (mskip-rax-setup): New option.
+   * doc/invoke.texi: Document -mskip-rax-setup.
+
 2014-12-18  Martin Liska  mli...@suse.cz
 
PR ipa/64146
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 17ef751..122a350 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -25461,7 +25461,12 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
}
 }
 
-  if (TARGET_64BIT  INTVAL (callarg2) = 0)
+  /* Skip setting up RAX register for -mskip-rax-setup when there are no
+ parameters passed in vector registers.  */
+  if (TARGET_64BIT
+   (INTVAL (callarg2)  0
+ || (INTVAL (callarg2) == 0
+  (TARGET_SSE || !flag_skip_rax_setup
 {
   rtx al = gen_rtx_REG (QImode, AX_REG);
   emit_move_insn (al, callarg2);
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 3d54bfa..6dc4da2 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -831,6 +831,10 @@ Target Report Var(flag_nop_mcount) Init(0)
 Generate mcount/__fentry__ calls as nops. To activate they need to be
 patched in.
 
+mskip-rax-setup
+Target Report Var(flag_skip_rax_setup) Init(0)
+Skip setting up RAX register when passing variable arguments.
+
 m8bit-idiv
 Target Report Mask(USE_8BIT_IDIV) Save
 Expand 32bit/64bit integer divide into 8bit unsigned integer divide with 
run-time check
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 15068da..33a7ed2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16256,6 +16256,19 @@ the profiling functions as nops. This is useful when 
they
 should be patched in later dynamically. This is likely only
 useful together with @option{-mrecord-mcount}.
 
+@item -mskip-rax-setup
+@itemx -mno-skip-rax-setup
+@opindex mskip-rax-setup
+When generating code for the x86-64 architecture with SSE extensions
+disabled, @option{-skip-rax-setup} can be used to skip setting up RAX
+register when there are no variable arguments passed in vector registers.
+
+@strong{Warning:} Since RAX register is used to avoid unnecessarily
+saving vector 

Re: OpenACC middle end changes

2014-12-18 Thread Thomas Schwinge
Hi Jakub!

On Thu, 18 Dec 2014 12:33:11 +0100, Jakub Jelinek ja...@redhat.com wrote:
 On Thu, Dec 18, 2014 at 11:46:00AM +0100, Thomas Schwinge wrote:
   just
 rtx v1 = GEN_INT (...);
 rtx v2 = GEN_INT (...);
 machine_mode mode = TYPE_MODE (TREE_TYPE (arg));
 rtx ret = gen_reg_rtx (TYPE_MODE (integer_type_node));
 emit_move_insn (ret, const0_rtx);
 rtx_code_label *done_label = gen_label_rtx ();
 emit_cmp_and_jump_insns (op, v1, NE, NULL_RTX, mode,
false, done_label, PROB_EVEN);
 emit_cmp_and_jump_insns (op, v2, NE, NULL_RTX, mode,
false, done_label, PROB_EVEN);
 emit_move_insn (ret, const1_rtx);
 emit_label (done_label);
 return ret;
   or similar.
  
  Thanks for the review/suggestion/code!
 
 Note, as I found later, emit_cmp_and_jump_insns is good enough only
 for certain modes on certain architectures (in particular, for
 cases where can_compare_p returns true).
 So it is better to use do_compare_rtx_and_jump instead of
 emit_cmp_and_jump_insns, because it handles also the cases which
 emit_cmp_and_jump_insns silently mishandles.  You'll need to reorder
 the arguments a little bit and add one NULL_RTX argument.
 See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63848#c4

Thanks again; committed to gomp-4_0-branch in r218862:

commit a58e1475324e6dd6c34a95883f5efc854e204fde
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Thu Dec 18 13:13:06 2014 +

OpenACC acc_on_device: Harden builtin expansion.

gcc/
* builtins.c (expand_builtin_acc_on_device): Use
do_compare_rtx_and_jump instead of emit_cmp_and_jump_insns.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@218862 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp | 3 +++
 gcc/builtins.c | 8 
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index a3650c5..1e6df5f 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,6 +1,9 @@
 2014-12-18  Thomas Schwinge  tho...@codesourcery.com
Jakub Jelinek  ja...@redhat.com
 
+   * builtins.c (expand_builtin_acc_on_device): Use
+   do_compare_rtx_and_jump instead of emit_cmp_and_jump_insns.
+
* builtins.c (expand_builtin_acc_on_device): Make more RTXy.
 
 2014-12-17  Thomas Schwinge  tho...@codesourcery.com
diff --git gcc/builtins.c gcc/builtins.c
index e946521..33025a5 100644
--- gcc/builtins.c
+++ gcc/builtins.c
@@ -5911,10 +5911,10 @@ expand_builtin_acc_on_device (tree exp, rtx target)
 target = gen_reg_rtx (target_mode);
   emit_move_insn (target, const0_rtx);
   rtx_code_label *done_label = gen_label_rtx ();
-  emit_cmp_and_jump_insns (v, v1, NE, NULL_RTX, v_mode,
-  false, done_label, PROB_EVEN);
-  emit_cmp_and_jump_insns (v, v2, NE, NULL_RTX, v_mode,
-  false, done_label, PROB_EVEN);
+  do_compare_rtx_and_jump (v, v1, NE, false, v_mode, NULL_RTX,
+  NULL_RTX, done_label, PROB_EVEN);
+  do_compare_rtx_and_jump (v, v2, NE, false, v_mode, NULL_RTX,
+  NULL_RTX, done_label, PROB_EVEN);
   emit_move_insn (target, const1_rtx);
   emit_label (done_label);
 


Grüße,
 Thomas


pgpybie4WoKNR.pgp
Description: PGP signature


Re: OpenACC middle end changes

2014-12-18 Thread Thomas Schwinge
Hi Jakub!

On Thu, 18 Dec 2014 13:36:16 +0100, Jakub Jelinek ja...@redhat.com wrote:
 On Thu, Dec 18, 2014 at 01:31:45PM +0100, Jakub Jelinek wrote:
   --- gcc/config.gcc
   +++ gcc/config.gcc
   @@ -2906,6 +2906,7 @@ esac
case ${target} in
*-intelmic-* | *-intelmicemul-*)
 tmake_file=${tmake_file} i386/t-intelmic
   + tm_file=${tm_file} i386/intelmic-offload.h
 ;;
esac

   +#define ACCEL_COMPILER_acc_device GOMP_DEVICE_INTEL_MIC
   +
   +#endif
 
 Oh, and where is this defined for nvptx-none target?

One of the pending commits (together with the nvptx mkoffload); I now
applied the following to gomp-4_0-branch in r218863:

commit c3f63a62eb4332f651be5fd377d2d289c8c949f5
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Thu Dec 18 13:13:17 2014 +

Support for OpenACC acc_on_device in offloading configurations.

gcc/
* config/i386/intelmic-offload.h: New file.
* config/nvptx/offload.h: Likewise.
* config.gcc *-intelmic-*, *-intelmicemul-*, nvptx-*: Point to
them via tm_file.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@218863 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  7 +++
 gcc/config.gcc |  2 ++
 gcc/config/i386/intelmic-offload.h | 35 +++
 gcc/config/nvptx/offload.h | 35 +++
 4 files changed, 79 insertions(+)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 1e6df5f..a744ebf 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,4 +1,11 @@
 2014-12-18  Thomas Schwinge  tho...@codesourcery.com
+
+   * config/i386/intelmic-offload.h: New file.
+   * config/nvptx/offload.h: Likewise.
+   * config.gcc *-intelmic-*, *-intelmicemul-*, nvptx-*: Point to
+   them via tm_file.
+
+2014-12-18  Thomas Schwinge  tho...@codesourcery.com
Jakub Jelinek  ja...@redhat.com
 
* builtins.c (expand_builtin_acc_on_device): Use
diff --git gcc/config.gcc gcc/config.gcc
index 8541274..1e453e9 100644
--- gcc/config.gcc
+++ gcc/config.gcc
@@ -2178,6 +2178,7 @@ nios2-*-*)
 nvptx-*)
tm_file=${tm_file} newlib-stdint.h
tmake_file=nvptx/t-nvptx
+   tm_file=${tm_file} nvptx/offload.h
;;
 pdp11-*-*)
tm_file=${tm_file} newlib-stdint.h
@@ -2906,6 +2907,7 @@ esac
 case ${target} in
 *-intelmic-* | *-intelmicemul-*)
tmake_file=${tmake_file} i386/t-intelmic
+   tm_file=${tm_file} i386/intelmic-offload.h
;;
 esac
 
diff --git gcc/config/i386/intelmic-offload.h gcc/config/i386/intelmic-offload.h
new file mode 100644
index 000..bea18ed
--- /dev/null
+++ gcc/config/i386/intelmic-offload.h
@@ -0,0 +1,35 @@
+/* Support for Intel MIC offloading.
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   http://www.gnu.org/licenses/.  */
+
+#ifndef INTELMIC_OFFLOAD_H
+#define INTELMIC_OFFLOAD_H
+
+/* Support for OpenACC acc_on_device.  */
+
+#include gomp-constants.h
+
+#define ACCEL_COMPILER_acc_device GOMP_DEVICE_INTEL_MIC
+
+#endif
diff --git gcc/config/nvptx/offload.h gcc/config/nvptx/offload.h
new file mode 100644
index 000..63f9a02
--- /dev/null
+++ gcc/config/nvptx/offload.h
@@ -0,0 +1,35 @@
+/* Support for Nvidia PTX offloading.
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software 

Re: [PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread Uros Bizjak
On Thu, Dec 18, 2014 at 2:11 PM, H.J. Lu hongjiu...@intel.com wrote:
 The Linux kernel never passes floating point arguments around, vararg
 functions or not. Hence no vector registers are ever used when calling a
 vararg function.  But gcc still dutifully emits an xor %eax,%eax before
 each and every call of a vararg function.  Since no callee use that for
 anything, these instructions are redundant.

 This patch adds the -mskip-rax-setup option to skip setting up RAX
 register when SSE is disabled and there are no variable arguments passed
 in vector registers.  Since RAX register is used to avoid unnecessarily
 saving vector registers on stack when passing variable arguments, the
 impacts of this option are callees may waste some stack space, misbehave
 or jump to a random location.  GCC 4.4 or newer don't those issues,
 regardless the RAX register value since they don't check the RAX register
 value when SSE is disabled, regardless the RAX register value:

 https://gcc.gnu.org/ml/gcc-patches/2008-09/msg00127.html

 I used it on kernel 3.17.7:

textdata bss   dec hexfilename
 11493571  2271232  5926912  19691715 12c78c3 vmlinux.skip-rax
 11517879  2271232  5926912  19716023 12cd7b7 vmlinux.orig

 It removed 14309 redundant xor %eax,%eax instructions and saved about
 27KB.  I am currently running the new kernel without any problem.  OK
 for trunk?

How about skipping RAX setup unconditionally for !TARGET_SSE? Please
see ix86_conditional_register_usage, where SSE registers are squashed
for !TARGET_SSE, so it is not possible to use them even in the inline
asm.

Uros.


Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic

2014-12-18 Thread Joseph Myers
On Thu, 18 Dec 2014, Kyrill Tkachov wrote:

 I see that there are some intrinsics implemented in terms of __builtin_fabsf,
 presumable they can be at 'risk' too of having a library call emitted?

The semantics of fabsf/fabs/fabsl are to clear the sign bit, never raise 
any exceptions (even for signaling NaN arguments, subject to any 
limitations imposed by calling conventions / floating-point load 
conventions on the processor) and never set errno, as per IEEE 754-2008 
(and the C bindings thereto, TS 18661-1:2014).

So there is certainly no need to call a library function for 
__builtin_fabsf, but I don't know whether the implementation guarantees 
never to generate such a call.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Fix for PR64330

2014-12-18 Thread Martin Liška

Hello.

Following patch is fix for PR64330, is pre-approved by Honza and I'm going to 
install it.
No regression was seen on x86_64-linux-pc.

Thanks,
Martin
From 6f9ed17a2897be6a26e2c64f8e581771c216e41b Mon Sep 17 00:00:00 2001
From: mliska mli...@suse.cz
Date: Wed, 17 Dec 2014 19:11:48 +0100
Subject: [PATCH] Fix for PR64330.

gcc/ChangeLog:

2014-12-17  Martin Liska  mli...@suse.cz

	PR tree-optimization/64330
	* ipa-icf.c (sem_variable::parse): Add checking
	for externally visible symbols and do not introduce
	an alias for an external declaration.
---
 gcc/ipa-icf.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index b193200..ad106e5 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -1131,10 +1131,14 @@ sem_variable::parse (varpool_node *node, bitmap_obstack *stack)
   tree decl = node-decl;
 
   bool readonly = TYPE_P (decl) ? TYPE_READONLY (decl) : TREE_READONLY (decl);
-  bool can_handle = readonly  (DECL_VIRTUAL_P (decl)
- || !TREE_ADDRESSABLE (decl));
+  if (!readonly)
+return NULL;
+
+  bool can_handle = DECL_VIRTUAL_P (decl)
+		|| flag_merge_constants = 2
+		|| (!TREE_ADDRESSABLE (decl)  !node-externally_visible);
 
-  if (!can_handle)
+  if (!can_handle || DECL_EXTERNAL (decl))
 return NULL;
 
   tree ctor = ctor_for_folding (decl);
-- 
2.1.2



Re: [PATCH] IPA ICF: refactoring + fix for PR ipa/63569

2014-12-18 Thread Martin Liška

On 12/17/2014 04:23 PM, Richard Biener wrote:

On Wed, Dec 17, 2014 at 12:17 PM, Martin Liška mli...@suse.cz wrote:

On 12/11/2014 01:37 PM, Richard Biener wrote:


On Wed, Dec 10, 2014 at 1:18 PM, Martin Liška mli...@suse.cz wrote:


Hello.

As suggested by Richard, I split compare_operand functions to various
functions
related to a specific comparison. Apart from that I added fast check for
volatility flag that caused miscompilation mentioned in PR63569.

Patch can bootstrap on x86_64-linux-pc without any regression seen and I
was
able to build Firefox with LTO.

Ready for trunk?



Hmm, I don't think the dispatch to compare_memory_operand is at the
correct place.  It should be called from places where currently
compare_operand is called and it should recurse to compare_operand.
That is, it is more high-level.

Can you please fix the volatile issue separately?  It's also not necessary
to do that check on every operand but just on memory operands.



Hello Richard.

I hope the following patch is the finally the right approach we want to do
;)
In the first patch, I did just refactoring for high-level memory operand
comparison and the second of fixes problem connected to missing volatile
attribute comparison.

Patch was retested and can bootstrap.

What do you think about it?


+bool
+func_checker::compare_cst_or_decl (tree t1, tree t2)
+{
...
+case INTEGER_CST:
+  {
+   ret = compatible_types_p (TREE_TYPE (t1), TREE_TYPE (t2))
+  wi::to_offset  (t1) == wi::to_offset  (t2);
+
+   return return_with_debug (ret);
+  }
+case COMPLEX_CST:
+case VECTOR_CST:
+case STRING_CST:
+case REAL_CST:
+  {
+   ret = operand_equal_p (t1, t2, OEP_ONLY_CONST);
+   return return_with_debug (ret);

why does the type matter for INTEGER_CSTs but not for other constants?
Also comparing INTEGER_CSTs via to_offset is certainly wrong.  Please
use tree_int_cst_equal_p (t1, t2) instead, or operand_equal_p which would
end up calling tree_int_cst_equal_p.  That said, I'd have commonized all
_CST nodes by

   ret = compatible_types_p ()  operand_equal_p (...);

+case CONST_DECL:
+case BIT_FIELD_REF:
+  {

I'm quite sure BIT_FIELD_REF is spurious here.

Now to the meat ...

+  tree load1 = get_base_loadstore (t1, false);
+  tree load2 = get_base_loadstore (t2, false);
+
+  if (load1  load2  !compare_memory_operand (t1, t2))
+   return return_false_with_msg (memory operands are different);
+  else if ((load1  !load2) || (!load1  load2))
+   return return_false_with_msg ();
+

and the similar code for assigns.  The way you introduce the
unpack_handled_component flag to get_base_loadstore makes
it treat x.field as non-memory operand.  That's wrong.  I wonder
why you think you needed that.  Why does

   tree load1= get_base_loadstore (t1);

not work?  Btw, I'd have avoided get_base_loadstore and simply did

   ao_ref_init (r1, t1);
   ao_ref_init (r2, t2);
   tree base1 = ao_ref_base (r1);
   tree base2 = ao_ref_base (r2);
   if ((DECL_P (base1) || (TREE_CODE (base1) == MEM_REF || TREE_CODE
(base1) == TARGET_MEM_REF))
   (DECL_P (base2) || (...))
...

or rather put this into compare_memory_operand and call that on all operands
that may be memory (TREE_CODE () != SSA_NAME  !is_gimple_min_invariant ()).

I also miss where you handle memory operands on the LHS for calls
and for assignments.

The compare_ssa_name refactoring part is ok to install.

Can you please fix the volatile bug before the refactoring as it looks like
we're going to iterate some more here unless I can find the time to give
it a quick try myself.


Hi.

Sure, there's minimalistic patch which fixes the PR.
Can I install this change?

I'm going to apply your notes that are orthogonal to the PR.

Thanks,
Martin



Thanks,
Richard.


Thank you,
Martin



Thanks,
Richard.


Thanks,
Martin





From b984eaae263eedc8beb2413f4739d3574f46d63d Mon Sep 17 00:00:00 2001
From: mliska mli...@suse.cz
Date: Thu, 18 Dec 2014 14:16:12 +0100
Subject: [PATCH 1/2] Fix for PR ipa/63569.

gcc/testsuite/ChangeLog:

2014-12-18  Martin Liska  mli...@suse.cz

	PR ipa/63569
	* gcc.dg/ipa/pr63569.c: New test.

gcc/ChangeLog:

2014-12-18  Martin Liska  mli...@suse.cz

	PR ipa/63569
	* ipa-icf-gimple.c (func_checker::compare_operand): Add missing
	comparison for volatile flag.
---
 gcc/ipa-icf-gimple.c   |  3 +++
 gcc/testsuite/gcc.dg/ipa/pr63569.c | 32 
 2 files changed, 35 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr63569.c

diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index ec0290a..fa2c353 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -230,6 +230,9 @@ func_checker::compare_operand (tree t1, tree t2)
   tree tt1 = TREE_TYPE (t1);
   tree tt2 = TREE_TYPE (t2);
 
+  if (TREE_THIS_VOLATILE (t1) != TREE_THIS_VOLATILE (t2))
+return return_false_with_msg (different operand volatility);
+
   if 

Re: [PATCH 2/3] Extended if-conversion

2014-12-18 Thread Yuri Rumyantsev
 SSA 
 names
 required for the predicates are defined upward - and the 
 complex CFG
 is squashed to a single basic-block, thus the defs will 
 dominate the
 inserted code if you insert after labels just like for the 
 other case.
 Or what am I missing?  (flattening of the basic-blocks of 
 course needs
 to happen in dominator order - but I guess that happens 
 already?)

 I'd like the extended PHI handling to be enablable by a flag 
 even
 for !force-vectorization - I've seen cases with 3 PHI args 
 multiple
 times that would have been nice to vectorize.  I suggest to
 add -ftree-loop-if-convert-aggressive for this.  We can do this 
 as
 followup, but please rename the local flag_force_vectorize flag
 to something less looking like a flag, like simply 'aggressive'.

 Otherwise patch 2 looks ok to me.

 Richard.


 ChangeLog:

 2014-10-24  Yuri Rumyantsev  ysrum...@gmail.com

 * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
 FLAG_FORCE_VECTORIZE instead of loop flag.
 (if_convertible_bb_p): Allow bb has more than 2 predecessors if
 FLAG_FORCE_VECTORIZE is true.
 (if_convertible_bb_p): Delete check that bb has at least one
 non-critical incoming edge.
 (phi_has_two_different_args): New function.
 (is_cond_scalar_reduction): Add argument EXTENDED to choose 
 access
 to phi arguments. Invoke phi_has_two_different_args to get phi
 arguments if EXTENDED is true. Change check that block
 containing reduction statement candidate is predecessor
 of phi-block since phi may have more than two arguments.
 (convert_scalar_cond_reduction): Add argument BEFORE to insert
 statement before/after gsi point.
 (predicate_scalar_phi): Add argument false (which means 
 non-extended
 predication) to call of is_cond_scalar_reduction. Add argument
 true (which correspondent to argument BEFORE) to call of
 convert_scalar_cond_reduction.
 (get_predicate_for_edge): New function.
 (predicate_arbitrary_scalar_phi): New function.
 (predicate_extended_scalar_phi): New function.
 (find_insertion_point): New function.
 (predicate_all_scalar_phis): Add two boolean variables 
 EXTENDED and
 BEFORE. Initialize EXTENDED to true if BB containing phi has 
 more
 than 2 predecessors or both incoming edges are critical. Invoke
 find_phi_replacement_condition and predicate_scalar_phi or
 find_insertion_point and predicate_extended_scalar_phi 
 depending on
 EXTENDED value.
 (insert_gimplified_predicates): Add check that non-predicated 
 block
 may have statements to insert. Insert predicate of BB just 
 after label
 if FLAG_FORCE_VECTORIZE is true.
 (tree_if_conversion): Add initialization of 
 FLAG_FORCE_VECTORIZE which
 is copy of inner or outer loop field force_vectorize.


patch.20141218
Description: Binary data


Pragma parsing (was: [PATCH] OpenACC for C++ front end)

2014-12-18 Thread Thomas Schwinge
Hi!

On Thu, 13 Nov 2014 14:02:37 +0100, Jakub Jelinek ja...@redhat.com wrote:
   void
   init_pragma (void)
   {
  +  if (flag_openacc)
  +{
  +  const int n_oacc_pragmas
  +   = sizeof (oacc_pragmas) / sizeof (*oacc_pragmas);
  +  int i;
  +
  +  for (i = 0; i  n_oacc_pragmas; ++i)
  +   cpp_register_deferred_pragma (parse_in, acc, oacc_pragmas[i].name,
  + oacc_pragmas[i].id, true, true);
  +}
  +
 if (flag_openmp)
   {
 const int n_omp_pragmas = sizeof (omp_pragmas) / sizeof 
  (*omp_pragmas);
 
 Is -fopenmp -fopenacc tested not to run out of number of supported pragmas
 by libcpp?

We're running (some limited amount of) combined OpenACC/OpenMP compile
testing, yes.


  -/* All clauses defined by OpenMP 2.5, 3.0, 3.1 and 4.0.
  +/* All clauses defined by OpenACC 2.0, and OpenMP 2.5, 3.0, 3.1, and 4.0.
  Used internally by both C and C++ parsers.  */
   typedef enum pragma_omp_clause {
 PRAGMA_OMP_CLAUSE_NONE = 0,
   
 PRAGMA_OMP_CLAUSE_ALIGNED,
  +  PRAGMA_OMP_CLAUSE_ASYNC,
 PRAGMA_OMP_CLAUSE_COLLAPSE,
  +  PRAGMA_OMP_CLAUSE_COPY,
 PRAGMA_OMP_CLAUSE_COPYIN,
  +  PRAGMA_OMP_CLAUSE_COPYOUT,
 PRAGMA_OMP_CLAUSE_COPYPRIVATE,
  +  PRAGMA_OMP_CLAUSE_CREATE,
 PRAGMA_OMP_CLAUSE_DEFAULT,
  +  PRAGMA_OMP_CLAUSE_DELETE,
 PRAGMA_OMP_CLAUSE_DEPEND,
 PRAGMA_OMP_CLAUSE_DEVICE,
  +  PRAGMA_OMP_CLAUSE_DEVICEPTR,
 PRAGMA_OMP_CLAUSE_DIST_SCHEDULE,
 PRAGMA_OMP_CLAUSE_FINAL,
 PRAGMA_OMP_CLAUSE_FIRSTPRIVATE,
 PRAGMA_OMP_CLAUSE_FOR,
 PRAGMA_OMP_CLAUSE_FROM,
  +  PRAGMA_OMP_CLAUSE_HOST,
 PRAGMA_OMP_CLAUSE_IF,
 PRAGMA_OMP_CLAUSE_INBRANCH,
 PRAGMA_OMP_CLAUSE_LASTPRIVATE,
  @@ -90,16 +106,24 @@ typedef enum pragma_omp_clause {
 PRAGMA_OMP_CLAUSE_MERGEABLE,
 PRAGMA_OMP_CLAUSE_NOTINBRANCH,
 PRAGMA_OMP_CLAUSE_NOWAIT,
  +  PRAGMA_OMP_CLAUSE_NUM_GANGS,
 PRAGMA_OMP_CLAUSE_NUM_TEAMS,
 PRAGMA_OMP_CLAUSE_NUM_THREADS,
  +  PRAGMA_OMP_CLAUSE_NUM_WORKERS,
 PRAGMA_OMP_CLAUSE_ORDERED,
 PRAGMA_OMP_CLAUSE_PARALLEL,
  +  PRAGMA_OMP_CLAUSE_PRESENT,
  +  PRAGMA_OMP_CLAUSE_PRESENT_OR_COPY,
  +  PRAGMA_OMP_CLAUSE_PRESENT_OR_COPYIN,
  +  PRAGMA_OMP_CLAUSE_PRESENT_OR_COPYOUT,
  +  PRAGMA_OMP_CLAUSE_PRESENT_OR_CREATE,
 PRAGMA_OMP_CLAUSE_PRIVATE,
 PRAGMA_OMP_CLAUSE_PROC_BIND,
 PRAGMA_OMP_CLAUSE_REDUCTION,
 PRAGMA_OMP_CLAUSE_SAFELEN,
 PRAGMA_OMP_CLAUSE_SCHEDULE,
 PRAGMA_OMP_CLAUSE_SECTIONS,
  +  PRAGMA_OMP_CLAUSE_SELF,
 PRAGMA_OMP_CLAUSE_SHARED,
 PRAGMA_OMP_CLAUSE_SIMDLEN,
 PRAGMA_OMP_CLAUSE_TASKGROUP,
  @@ -107,6 +131,8 @@ typedef enum pragma_omp_clause {
 PRAGMA_OMP_CLAUSE_TO,
 PRAGMA_OMP_CLAUSE_UNIFORM,
 PRAGMA_OMP_CLAUSE_UNTIED,
  +  PRAGMA_OMP_CLAUSE_VECTOR_LENGTH,
  +  PRAGMA_OMP_CLAUSE_WAIT,
 
 Like for CILK, I'd strongly prefer if for the clauses that are
 specific to OpenACC only you'd use PRAGMA_OACC_CLAUSE_* instead,
 and put them after the PRAGMA_CILK_* enum values.
 If you want to have PRAGMA_OACC_CLAUSE_ aliases also for the
 clauses shared in between OpenMP and OpenACC, feel free to add
 aliases like Cilk+ has them.  It is unfortunately lots of new clauses
 and we are getting close to the 64 clauses limit :( when they are used
 in bitmasks.

I still don't like this very much, because we'll then get a mess of
PRAGMA_OMP_CLAUSE_* intermixed with PRAGMA_OACC_CLAUSE_*.  I understand,
for example, PRAGMA_OMP_CLAUSE_REDUCTION to just be a numeric
representation of the reduction string -- and then, it doesn't make
too much sense to me to express this both as PRAGMA_OMP_CLAUSE_REDUCTION
and PRAGMA_OACC_CLAUSE_REDUCTION, or, similarly, to switch back and forth
between PRAGMA_OMP_CLAUSE_* and PRAGMA_OACC_CLAUSE_*.  Yet, that's just
the point that I'm making, so I'll defer if there's no consensus to be
found here.


Chung-Lin also had a suggestion to make; what do you think of this?

--- c-family/c-common.h (revision 442892)
+++ c-family/c-common.h (working copy)
@@ -1076,138 +1076,17 @@ extern void pp_dir_change (cpp_reader *, const cha
 extern bool check_missing_format_attribute (tree, tree);
 
 /* In c-omp.c  */
-#if HOST_BITS_PER_WIDE_INT = 64
-typedef unsigned HOST_WIDE_INT omp_clause_mask;
-# define OMP_CLAUSE_MASK_1 ((omp_clause_mask) 1)
-#else
-struct omp_clause_mask
+#include bitset
+typedef std::bitset128 omp_clause_mask;
+/* Provide '' before full transition to set() method.  */
+static inline omp_clause_mask
+operator  (const omp_clause_mask a, const omp_clause_mask b)
 {
-  inline omp_clause_mask ();
-  inline omp_clause_mask (unsigned HOST_WIDE_INT l);
-  inline omp_clause_mask (unsigned HOST_WIDE_INT l,
- unsigned HOST_WIDE_INT h);
-  inline omp_clause_mask operator = (omp_clause_mask);
-  inline omp_clause_mask operator |= (omp_clause_mask);
-  inline omp_clause_mask operator ~ () const;
-  inline omp_clause_mask operator  (omp_clause_mask) const;
-  inline omp_clause_mask operator | (omp_clause_mask) 

Re: [PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread Uros Bizjak
On Thu, Dec 18, 2014 at 2:24 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Thu, Dec 18, 2014 at 2:11 PM, H.J. Lu hongjiu...@intel.com wrote:
 The Linux kernel never passes floating point arguments around, vararg
 functions or not. Hence no vector registers are ever used when calling a
 vararg function.  But gcc still dutifully emits an xor %eax,%eax before
 each and every call of a vararg function.  Since no callee use that for
 anything, these instructions are redundant.

 This patch adds the -mskip-rax-setup option to skip setting up RAX
 register when SSE is disabled and there are no variable arguments passed
 in vector registers.  Since RAX register is used to avoid unnecessarily
 saving vector registers on stack when passing variable arguments, the
 impacts of this option are callees may waste some stack space, misbehave
 or jump to a random location.  GCC 4.4 or newer don't those issues,
 regardless the RAX register value since they don't check the RAX register
 value when SSE is disabled, regardless the RAX register value:

 https://gcc.gnu.org/ml/gcc-patches/2008-09/msg00127.html

 I used it on kernel 3.17.7:

textdata bss   dec hexfilename
 11493571  2271232  5926912  19691715 12c78c3 vmlinux.skip-rax
 11517879  2271232  5926912  19716023 12cd7b7 vmlinux.orig

 It removed 14309 redundant xor %eax,%eax instructions and saved about
 27KB.  I am currently running the new kernel without any problem.  OK
 for trunk?

 How about skipping RAX setup unconditionally for !TARGET_SSE? Please
 see ix86_conditional_register_usage, where SSE registers are squashed
 for !TARGET_SSE, so it is not possible to use them even in the inline
 asm.

... when -ffreestanding is in effect, of course.

Uros.


Re: [PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread Jakub Jelinek
On Thu, Dec 18, 2014 at 02:24:06PM +0100, Uros Bizjak wrote:
  It removed 14309 redundant xor %eax,%eax instructions and saved about
  27KB.  I am currently running the new kernel without any problem.  OK
  for trunk?
 
 How about skipping RAX setup unconditionally for !TARGET_SSE? Please
 see ix86_conditional_register_usage, where SSE registers are squashed
 for !TARGET_SSE, so it is not possible to use them even in the inline
 asm.

I'd say a problem is if a -mno-sse TU calls a vararg function (obviously
it can't pass any float/double arguments) to a function in a TU compiled
with -msse2 or higher where the stdarg pass can't figure out anything, say
#include stdarg.h
extern void bar (int, va_list);
void
foo (int x, ...)
{
  va_list ap;
  va_start (ap, x);
  bar (x, ap);
  va_end (ap);
}
If foo is compiled with gcc 4.4 and earlier, it might crash when called from
-mno-sse caller that would not xor %eax,%eax.  If foo is compiled with gcc
4.5? and higher, then it might just randomly save all the xmm registers to
stack (as the test is %al != 0, I think it will be more likely that it will
save it unnecessarily than not).
So I view H.J.'s new option as a user guarantee the callee will be also 
-mno-sse.

Jakub


Re: [PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread H.J. Lu
On Thu, Dec 18, 2014 at 5:51 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Thu, Dec 18, 2014 at 02:24:06PM +0100, Uros Bizjak wrote:
  It removed 14309 redundant xor %eax,%eax instructions and saved about
  27KB.  I am currently running the new kernel without any problem.  OK
  for trunk?

 How about skipping RAX setup unconditionally for !TARGET_SSE? Please
 see ix86_conditional_register_usage, where SSE registers are squashed
 for !TARGET_SSE, so it is not possible to use them even in the inline
 asm.

 I'd say a problem is if a -mno-sse TU calls a vararg function (obviously
 it can't pass any float/double arguments) to a function in a TU compiled
 with -msse2 or higher where the stdarg pass can't figure out anything, say
 #include stdarg.h
 extern void bar (int, va_list);
 void
 foo (int x, ...)
 {
   va_list ap;
   va_start (ap, x);
   bar (x, ap);
   va_end (ap);
 }
 If foo is compiled with gcc 4.4 and earlier, it might crash when called from

This was checked into GCC 4.4:

commit d5d9458afdae9056a9624ae9c332dbcdc2f383be
Author: jakub jakub@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Tue Sep 2 19:49:41 2008 +

  * config/i386/i386.c (X86_64_VARARGS_SIZE): Removed.
  (setup_incoming_varargs_64): Assume cum != NULL. Set/check
  ix86_varargs_gpr_size and ix86_varargs_fpr_size.  Use
  ix86_varargs_gpr_size instead of X86_64_REGPARM_MAX.
  Don't set ix86_save_varrargs_registers.
  (ix86_setup_incoming_varargs): Assume cum != NULL.
  (ix86_va_start): Check ix86_varargs_gpr_size and
  ix86_varargs_fpr_size instead of cfun-va_list_gpr_size and
  cfun-va_list_fpr_size, respectively.  Subtract 8*X86_64_REGPARM_MAX
  from frame pointer if ix86_varargs_gpr_size == 0.
  (ix86_compute_frame_layout): Updated.
  * config/i386/i386.h (ix86_save_varrargs_registers): Removed.
  (ix86_varargs_gpr_size): Define.
  (ix86_varargs_fpr_size): Likewise.
  (machine_function): Remove save_varrargs_registers.
  Add varargs_gpr_size and varargs_fpr_size.

  * gcc.target/i386/amd64-abi-3.c: New test.
  * gcc.target/i386/amd64-abi-4.c: Likewise.
  * gcc.target/i386/amd64-abi-5.c: Likewise.
  * gcc.target/i386/amd64-abi-6.c: Likewise.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@139910
138bc75d-0d04-0410-961f-82ee72b054a4

 -mno-sse caller that would not xor %eax,%eax.  If foo is compiled with gcc
 4.5? and higher, then it might just randomly save all the xmm registers to

GCC 4.4 is fine.

 stack (as the test is %al != 0, I think it will be more likely that it will
 save it unnecessarily than not).
 So I view H.J.'s new option as a user guarantee the callee will be also 
 -mno-sse.


If foo may be compiled by GCC 4.3 or older, we can't skip setting up
RAX in foo's
callers.  Such an option is useful.


-- 
H.J.


Re: [PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread Uros Bizjak
On Thu, Dec 18, 2014 at 2:49 PM, Uros Bizjak ubiz...@gmail.com wrote:

 The Linux kernel never passes floating point arguments around, vararg
 functions or not. Hence no vector registers are ever used when calling a
 vararg function.  But gcc still dutifully emits an xor %eax,%eax before
 each and every call of a vararg function.  Since no callee use that for
 anything, these instructions are redundant.

 This patch adds the -mskip-rax-setup option to skip setting up RAX
 register when SSE is disabled and there are no variable arguments passed
 in vector registers.  Since RAX register is used to avoid unnecessarily
 saving vector registers on stack when passing variable arguments, the
 impacts of this option are callees may waste some stack space, misbehave
 or jump to a random location.  GCC 4.4 or newer don't those issues,
 regardless the RAX register value since they don't check the RAX register
 value when SSE is disabled, regardless the RAX register value:

 https://gcc.gnu.org/ml/gcc-patches/2008-09/msg00127.html

 I used it on kernel 3.17.7:

textdata bss   dec hexfilename
 11493571  2271232  5926912  19691715 12c78c3 vmlinux.skip-rax
 11517879  2271232  5926912  19716023 12cd7b7 vmlinux.orig

 It removed 14309 redundant xor %eax,%eax instructions and saved about
 27KB.  I am currently running the new kernel without any problem.  OK
 for trunk?

 How about skipping RAX setup unconditionally for !TARGET_SSE? Please
 see ix86_conditional_register_usage, where SSE registers are squashed
 for !TARGET_SSE, so it is not possible to use them even in the inline
 asm.

 ... when -ffreestanding is in effect, of course.

Ops, this is not the unconditional default kernel compile flag. It is
defined only for 32bit builds, where:

# temporary until string.h is fixed
KBUILD_CFLAGS += -ffreestanding

Yes, it looks to me that new option is the way to go.

Uros.


Re: [PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread H.J. Lu
On Thu, Dec 18, 2014 at 6:03 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Thu, Dec 18, 2014 at 2:49 PM, Uros Bizjak ubiz...@gmail.com wrote:

 The Linux kernel never passes floating point arguments around, vararg
 functions or not. Hence no vector registers are ever used when calling a
 vararg function.  But gcc still dutifully emits an xor %eax,%eax before
 each and every call of a vararg function.  Since no callee use that for
 anything, these instructions are redundant.

 This patch adds the -mskip-rax-setup option to skip setting up RAX
 register when SSE is disabled and there are no variable arguments passed
 in vector registers.  Since RAX register is used to avoid unnecessarily
 saving vector registers on stack when passing variable arguments, the
 impacts of this option are callees may waste some stack space, misbehave
 or jump to a random location.  GCC 4.4 or newer don't those issues,
 regardless the RAX register value since they don't check the RAX register
 value when SSE is disabled, regardless the RAX register value:

 https://gcc.gnu.org/ml/gcc-patches/2008-09/msg00127.html

 I used it on kernel 3.17.7:

textdata bss   dec hexfilename
 11493571  2271232  5926912  19691715 12c78c3 vmlinux.skip-rax
 11517879  2271232  5926912  19716023 12cd7b7 vmlinux.orig

 It removed 14309 redundant xor %eax,%eax instructions and saved about
 27KB.  I am currently running the new kernel without any problem.  OK
 for trunk?

 How about skipping RAX setup unconditionally for !TARGET_SSE? Please
 see ix86_conditional_register_usage, where SSE registers are squashed
 for !TARGET_SSE, so it is not possible to use them even in the inline
 asm.

 ... when -ffreestanding is in effect, of course.

 Ops, this is not the unconditional default kernel compile flag. It is
 defined only for 32bit builds, where:

 # temporary until string.h is fixed
 KBUILD_CFLAGS += -ffreestanding

 Yes, it looks to me that new option is the way to go.

Is this an OK?

Some really old gcc versions used an indirect jump based on the eax input
and they didn't zero extend first.  So with those compilers, you could actually
jump to a random location.  You can enable it only when you can compile
everything with a newer GCC.

Thanks.

-- 
H.J.


Re: [PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread Uros Bizjak
On Thu, Dec 18, 2014 at 3:09 PM, H.J. Lu hjl.to...@gmail.com wrote:

 The Linux kernel never passes floating point arguments around, vararg
 functions or not. Hence no vector registers are ever used when calling a
 vararg function.  But gcc still dutifully emits an xor %eax,%eax before
 each and every call of a vararg function.  Since no callee use that for
 anything, these instructions are redundant.

 This patch adds the -mskip-rax-setup option to skip setting up RAX
 register when SSE is disabled and there are no variable arguments passed
 in vector registers.  Since RAX register is used to avoid unnecessarily
 saving vector registers on stack when passing variable arguments, the
 impacts of this option are callees may waste some stack space, misbehave
 or jump to a random location.  GCC 4.4 or newer don't those issues,
 regardless the RAX register value since they don't check the RAX register
 value when SSE is disabled, regardless the RAX register value:

 https://gcc.gnu.org/ml/gcc-patches/2008-09/msg00127.html

 I used it on kernel 3.17.7:

textdata bss   dec hexfilename
 11493571  2271232  5926912  19691715 12c78c3 vmlinux.skip-rax
 11517879  2271232  5926912  19716023 12cd7b7 vmlinux.orig

 It removed 14309 redundant xor %eax,%eax instructions and saved about
 27KB.  I am currently running the new kernel without any problem.  OK
 for trunk?

 How about skipping RAX setup unconditionally for !TARGET_SSE? Please
 see ix86_conditional_register_usage, where SSE registers are squashed
 for !TARGET_SSE, so it is not possible to use them even in the inline
 asm.

 ... when -ffreestanding is in effect, of course.

 Ops, this is not the unconditional default kernel compile flag. It is
 defined only for 32bit builds, where:

 # temporary until string.h is fixed
 KBUILD_CFLAGS += -ffreestanding

 Yes, it looks to me that new option is the way to go.

 Is this an OK?

In principle, I'm OK with the patch approach, but let's wait for
eventual comments from Linux people.

 Some really old gcc versions used an indirect jump based on the eax input
 and they didn't zero extend first.  So with those compilers, you could 
 actually
 jump to a random location.  You can enable it only when you can compile
 everything with a newer GCC.

Uros.


Re: [C++ Patch] PR 60955

2014-12-18 Thread Jason Merrill

On 12/18/2014 06:17 AM, Paolo Carlini wrote:

Sure. The below uses the c_inhibit_evaluation_warnings mechanism and
passes testing. I wondered if in such cases we could alternately use the
warning_sentinel mechanism (moved to cp-tree.h of course) ?


That would make sense to me.

Jason




Re: OpenACC middle end changes

2014-12-18 Thread Jakub Jelinek
On Thu, Dec 18, 2014 at 12:07:01PM +0100, Thomas Schwinge wrote:
 Many thanks for the review comments!  The very most have been addresed,
 here are just a few comments.  If you feel strongly/differently about
 any, I'll address those, too.

So, with your latest change both compilers build:
mkdir objmic; cd objmic
../configure --build=x86_64-intelmicemul-linux-gnu 
--host=x86_64-intelmicemul-linux-gnu --target=x86_64-intelmicemul-linux-gnu 
--enable-as-accelerator-for=x86_64-pc-linux-gnu --disable-bootstrap
make -j16
make DESTDIR=`pwd`/../objinst install
mkdir ../obj; cd ../obj
../configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu 
--target=x86_64-pc-linux-gnu 
--enable-offload-targets=x86_64-intelmicemul-linux-gnu=/usr/src/gomp-4.0/objmic 
--disable-bootstrap
make -j16

But there are issues during make check.
I first did:
make -j16 -k check RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} gomp.exp 
goacc.exp goacc-gomp.exp'
and that shows:
Making a new config file...
echo set tmpdir /usr/src/gomp-4.0/obj/gcc/testsuite  ./site.tmp
rm -rf testsuite/gcc-parallel
rm -rf testsuite/g++-parallel
rm -rf testsuite/gfortran-parallel
rm -rf testsuite/objc-parallel
mkdir: cannot create directory ‘testsuite’: File exists
make[1]: Entering directory '/usr/src/gomp-4.0/obj/gcc'
make[1]: Entering directory '/usr/src/gomp-4.0/obj/gcc'
make[1]: Entering directory '/usr/src/gomp-4.0/obj/gcc'
make[1]: Entering directory '/usr/src/gomp-4.0/obj/gcc'
mkdir: cannot create directory ‘plugin’: File exists
mkdir: cannot create directory ‘plugin’mkdir: : File existscannot create 
directory ‘plugin’
: File exists
mkdir: cannot create directory ‘plugin’: File exists
Makefile:3787: recipe for target 'check-parallel-gcc_1' failed
make[1]: [check-parallel-gcc_1] Error 1 (ignored)
Makefile:3787: recipe for target 'check-parallel-gcc_2' failed
make[1]: [check-parallel-gcc_2] Error 1 (ignored)
Makefile:3787: recipe for target 'check-parallel-gcc_3' failed
make[1]: [check-parallel-gcc_3] Error 1 (ignored)
Makefile:3787: recipe for target 'check-parallel-gcc_4' failed
make[1]: [check-parallel-gcc_4] Error 1 (ignored)

Clearly preexisting problem even on trunk, so not a show stopper for this.

And in libgomp the testing fails completely:

Making check in testsuite
make[1]: Entering directory 
'/usr/src/gomp-4.0/obj/x86_64-pc-linux-gnu/libgomp/testsuite'
make  check-DEJAGNU
make[2]: Entering directory 
'/usr/src/gomp-4.0/obj/x86_64-pc-linux-gnu/libgomp/testsuite'
Making a new site.exp file...
srcdir=`CDPATH=${ZSH_VERSION+.}:  cd ../../../../libgomp/testsuite  pwd`; 
export srcdir; \
EXPECT=expect; export EXPECT; \
runtest=runtest ; \
if /bin/sh -c $runtest --version  /dev/null 21; then \
  exit_status=0; l='libgomp'; for tool in $l; do \
if $runtest  --tool $tool --srcdir $srcdir ; \
then :; else exit_status=1; fi; \
  done; \
else echo WARNING: could not find \`runtest' 12; :;\
fi; \
exit $exit_status
WARNING: Couldn't find the global config file.
ERROR: tcl error sourcing libgomp-test-support.exp.
can't read (target_alias): no such variable
while executing
set offload_additional_options  
-B/usr/src/gomp-4.0/objmic/libexec/gcc/$(target_alias)/$(gcc_version) 
-B/usr/src/gomp-4.0/objmic/bin
(file libgomp-test-support.exp line 5)
invoked from within
source libgomp-test-support.exp
(uplevel body line 1)
invoked from within
uplevel #0 source libgomp-test-support.exp
invoked from within
catch uplevel #0 source $file
Makefile:277: recipe for target 'check-DEJAGNU' failed
make[2]: *** [check-DEJAGNU] Error 1
make[2]: Leaving directory 
'/usr/src/gomp-4.0/obj/x86_64-pc-linux-gnu/libgomp/testsuite'
Makefile:314: recipe for target 'check-am' failed
make[1]: *** [check-am] Error 2
make[1]: Leaving directory 
'/usr/src/gomp-4.0/obj/x86_64-pc-linux-gnu/libgomp/testsuite'
Makefile:856: recipe for target 'check-recursive' failed
make: *** [check-recursive] Error 1

So clearly the *.exp files need to be taught where to look for 
libgomp-test-support.exp.

Jakub


Re: Pragma parsing (was: [PATCH] OpenACC for C++ front end)

2014-12-18 Thread Jakub Jelinek
On Thu, Dec 18, 2014 at 02:48:32PM +0100, Thomas Schwinge wrote:
  Like for CILK, I'd strongly prefer if for the clauses that are
  specific to OpenACC only you'd use PRAGMA_OACC_CLAUSE_* instead,
  and put them after the PRAGMA_CILK_* enum values.
  If you want to have PRAGMA_OACC_CLAUSE_ aliases also for the
  clauses shared in between OpenMP and OpenACC, feel free to add
  aliases like Cilk+ has them.  It is unfortunately lots of new clauses
  and we are getting close to the 64 clauses limit :( when they are used
  in bitmasks.
 
 I still don't like this very much, because we'll then get a mess of
 PRAGMA_OMP_CLAUSE_* intermixed with PRAGMA_OACC_CLAUSE_*.  I understand,
 for example, PRAGMA_OMP_CLAUSE_REDUCTION to just be a numeric
 representation of the reduction string -- and then, it doesn't make
 too much sense to me to express this both as PRAGMA_OMP_CLAUSE_REDUCTION
 and PRAGMA_OACC_CLAUSE_REDUCTION, or, similarly, to switch back and forth
 between PRAGMA_OMP_CLAUSE_* and PRAGMA_OACC_CLAUSE_*.  Yet, that's just
 the point that I'm making, so I'll defer if there's no consensus to be
 found here.

The point is that we now have lots of clauses, and making it clear what from
those clauses are Cilk+, what are OpenACC, what are OpenMP will help with
code readability.  Especially if some clauses diverge in their meaning
between different standards.

 Chung-Lin also had a suggestion to make; what do you think of this?

We don't need it yet, and I'm not very keen on making GCC internals use STL
more than it does, HWI is certainly more efficient than bitset.
So I'd defer any decision here until we have to.

Jakub


Re: [PATCH, ARM] attribute target (thumb,arm) [6/6] - [7/7]

2014-12-18 Thread Christian Bruel

Hello Ramana,

I don't know if you have started to look at it, but the attribute 
support fails after upgrading.


This patch aims to catch up on the changes around the fipa_ra 
-masm-syntax-unified options since the initial posting. They were not 
tested/supported with the attribute, and of course generated new 
failures after an update, as they needs resetting depending on thumb mode.


It also fixes various minor thinkos, one that prevented global flags 
(e.g -fschedule_insns) depending on thumb mode to be correctly reset on 
situation were -mthumb was passed on the command line and thumb/arm 
attribute was used alternatively. The other is around the -mflip-thumb 
option setting.


Last, a minor fix in the test attr_arm-err.c that had false failures 
with conflicting -march.


Aside, a funny thing: The ACLE doc 
(https://gcc.gnu.org/onlinedocs/gcc/ARM-C-Language-Extensions-_0028ACLE_0029.html#ARM-C-Language-Extensions-_0028ACLE_0029) 
already as a visionary description of the attribute. I might have missed 
something but this time the doc comes earlier than the implementation 
:-) without the historical background.


Best Regards

Christian

2014-12-14  Christian Bruel  christian.br...@st.com

	* config/arm/arm.c (arm_option_override_internal): add opts_set param. Use it to set 
	restrict_it. Handle flag_ipa_ra and inline_asm_unified.
	(arm_valid_target_attribute_tree): Add opts_set param.
	Initialize init_optimize.
	(init_optimize): New static variable.
	(thumb_flipper): Set.
	( arm_valid_target_attribute_p): Rewrite.
	config/arm/arm-protos.h	(arm_valid_target_attribute_tree): New opts_set param
	* config/arm/arm-c.c (arm_pragma_target_parse): Likewiese.
	* config/arm/arm.opt (inline_asm_unified): Save.

2014-12-14  Christian Bruel  christian.br...@st.com

* gcc.target/arm/attr_arm-err.c: Check conflicting -march options.

diff '--exclude=*~' '--exclude=.svn' -ru a/gcc/gcc/config/arm/arm.c b/gcc/gcc/config/arm/arm.c
--- a/gcc/gcc/config/arm/arm.c	2014-12-18 14:36:06.0 +0100
+++ b/gcc/gcc/config/arm/arm.c	2014-12-18 14:35:12.0 +0100
@@ -2625,9 +2625,16 @@
 }
 }
 
+/* True if -mflip-thumb should next add an attribute for the default
+   mode, false if it should next add an attribute for the opposite mode.  */
+static GTY(()) bool thumb_flipper;
+
+static GTY(()) tree init_optimize;
+
 /* Reset options between modes that the user has specified.  */
 static void
-arm_option_override_internal (struct gcc_options *opts)
+arm_option_override_internal (struct gcc_options *opts,
+			  struct gcc_options *opts_set)
 {
   if (TREE_TARGET_THUMB (opts)  !(insn_flags  FL_THUMB))
 {
@@ -2646,13 +2653,13 @@
   if (TREE_TARGET_THUMB (opts)  TARGET_CALLEE_INTERWORKING)
 opts-x_target_flags |= MASK_INTERWORK;
 
-  if (restrict_default)
+  if (! opts_set-x_arm_restrict_it)
 opts-x_arm_restrict_it = arm_arch8;
 
   if (!TREE_TARGET_THUMB2 (opts))
 opts-x_arm_restrict_it = 0;
 
-  if (TREE_TARGET_THUMB1 (opts)  opts-x_flag_schedule_insns)
+  if (TREE_TARGET_THUMB1 (opts))
 {
   /* Don't warn since it's on by default in -O2.  */
   opts-x_flag_schedule_insns = 0;
@@ -2663,9 +2670,21 @@
   if (optimize_function_for_size_p (cfun)  TREE_TARGET_THUMB2 (opts))
 opts-x_flag_shrink_wrap = false;
 
+  /* In Thumb1 mode, we emit the epilogue in RTL, but the last insn
+ - epilogue_insns - does not accurately model the corresponding insns
+ emitted in the asm file.  In particular, see the comment in thumb_exit
+ 'Find out how many of the (return) argument registers we can corrupt'.
+ As a consequence, the epilogue may clobber registers without fipa-ra
+ finding out about it.  Therefore, disable fipa-ra in Thumb1 mode.
+ TODO: Accurately model clobbers for epilogue_insns and reenable
+ fipa-ra.  */
+  if (TREE_TARGET_THUMB1 (opts))
+opts-x_flag_ipa_ra = 0;
+
   /* Thumb2 inline assembly code should always use unified syntax.
  This will apply to ARM and Thumb1 eventually.  */
-  opts-x_inline_asm_unified = TREE_TARGET_THUMB2 (opts);
+  if (TREE_TARGET_THUMB2 (opts))
+  opts-x_inline_asm_unified = 1;
 }
 
 /* Fix up any incompatible options that the user has specified.  */
@@ -3127,27 +3146,17 @@
   if (target_slow_flash_data)
 arm_disable_literal_pool = true;
 
-  /* Override flags, but not the user's one.  */
-  restrict_default = (arm_restrict_it == 2);
-
   /* Disable scheduling fusion by default if it's not armv7 processor
  or doesn't prefer ldrd/strd.  */
   if (flag_schedule_fusion == 2
(!arm_arch7 || !current_tune-prefer_ldrd_strd))
 flag_schedule_fusion = 0;
 
-  /* In Thumb1 mode, we emit the epilogue in RTL, but the last insn
- - epilogue_insns - does not accurately model the corresponding insns
- emitted in the asm file.  In particular, see the comment in thumb_exit
- 'Find out how many of the (return) argument registers we can corrupt'.
- As a consequence, the epilogue may clobber 

libgomp offloading testing (was: OpenACC middle end changes)

2014-12-18 Thread Thomas Schwinge
Hi Jakub!

On Thu, 18 Dec 2014 15:20:42 +0100, Jakub Jelinek ja...@redhat.com wrote:
 So, with your latest change both compilers build:
 mkdir objmic; cd objmic
 ../configure --build=x86_64-intelmicemul-linux-gnu 
 --host=x86_64-intelmicemul-linux-gnu --target=x86_64-intelmicemul-linux-gnu 
 --enable-as-accelerator-for=x86_64-pc-linux-gnu --disable-bootstrap
 make -j16
 make DESTDIR=`pwd`/../objinst install
 mkdir ../obj; cd ../obj
 ../configure --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu 
 --target=x86_64-pc-linux-gnu 
 --enable-offload-targets=x86_64-intelmicemul-linux-gnu=/usr/src/gomp-4.0/objmic
  --disable-bootstrap
 make -j16

Thanks; I'll look into reproducing such a build.


 And in libgomp the testing fails completely:

What happens, in my understanding, is:

 Making check in testsuite
 make[1]: Entering directory 
 '/usr/src/gomp-4.0/obj/x86_64-pc-linux-gnu/libgomp/testsuite'
 make  check-DEJAGNU
 make[2]: Entering directory 
 '/usr/src/gomp-4.0/obj/x86_64-pc-linux-gnu/libgomp/testsuite'
 Making a new site.exp file...
 srcdir=`CDPATH=${ZSH_VERSION+.}:  cd ../../../../libgomp/testsuite  
 pwd`; export srcdir; \
 EXPECT=expect; export EXPECT; \
 runtest=runtest ; \
 if /bin/sh -c $runtest --version  /dev/null 21; then \
   exit_status=0; l='libgomp'; for tool in $l; do \
 if $runtest  --tool $tool --srcdir $srcdir ; \
 then :; else exit_status=1; fi; \
   done; \
 else echo WARNING: could not find \`runtest' 12; :;\
 fi; \
 exit $exit_status
 WARNING: Couldn't find the global config file.
 ERROR: tcl error sourcing libgomp-test-support.exp.
 can't read (target_alias): no such variable

The variable $(target_alias) is not available when in the line:

 while executing
 set offload_additional_options  
 -B/usr/src/gomp-4.0/objmic/libexec/gcc/$(target_alias)/$(gcc_version) 
 -B/usr/src/gomp-4.0/objmic/bin

... in the file:

 (file libgomp-test-support.exp line 5)
 invoked from within
 source libgomp-test-support.exp

... it is being parsed.

Should target_alias and gcc_version be instantiated (AC_SUBST) by
Autoconf already, when creating the libgomp-test-support.exp file from
libgomp/testsuite/libgomp-test-support.exp.in?  Or, should those be
written (in libgomp/plugin/configfrag.ac) in TCL syntax, and evaluated
only once libgomp-test-support.exp is sourced?  target_alias is being
provided in site.exp (as generated by libgomp/testsuite/Makefile), but
gcc_version is not.

 (uplevel body line 1)
 invoked from within
 uplevel #0 source libgomp-test-support.exp
 invoked from within
 catch uplevel #0 source $file
 Makefile:277: recipe for target 'check-DEJAGNU' failed
 make[2]: *** [check-DEJAGNU] Error 1
 make[2]: Leaving directory 
 '/usr/src/gomp-4.0/obj/x86_64-pc-linux-gnu/libgomp/testsuite'
 Makefile:314: recipe for target 'check-am' failed
 make[1]: *** [check-am] Error 2
 make[1]: Leaving directory 
 '/usr/src/gomp-4.0/obj/x86_64-pc-linux-gnu/libgomp/testsuite'
 Makefile:856: recipe for target 'check-recursive' failed
 make: *** [check-recursive] Error 1

 So clearly the *.exp files need to be taught where to look for 
 libgomp-test-support.exp.

That's not the problem, if I'm understanding correctly.


Grüße,
 Thomas


pgpQUlVKSWcfG.pgp
Description: PGP signature


[patch c++]: Fix PR/61198: Crash when selecting specializations through aliases.

2014-12-18 Thread Kai Tietz
Hi,

this patch adds handling of aliases within templates and tries to
resolve specialization for them.

ChangeLog

2014-12-18  Kai Tietz  kti...@redhat.com

PR c++/61198
* pt.c (retrieve_specialization): Handle using.

Tested on x86_64-w64-mingw32.  Ok for apply?

Regards,
Kai

ChangeLog testsuite

2014-12-18  Kai Tietz  kti...@redhat.com

PR c++/61198
* g++.dg/template/using29.C: New file.

// { dg-do compile }

templateint herp, typename derp_t
struct broken
{
templatetypename target_t
using rebind = brokenherp, target_
};

templatetypename derp_t
struct broken2, derp_t
{
templatetypename target_t
using rebind = broken2, target_t;
};

int main(int argc, char **argv)
{
broken2, float::rebinddouble u;

return 0;
}

Index: pt.c
===
--- pt.c(Revision 218832)
+++ pt.c(Arbeitskopie)
@@ -1033,6 +1033,8 @@ optimize_specialization_lookup_p (tree tmpl)
 static tree
 retrieve_specialization (tree tmpl, tree args, hashval_t hash)
 {
+  tree tmpl_parms;
+
   if (tmpl == NULL_TREE)
 return NULL_TREE;

@@ -1044,10 +1046,20 @@ retrieve_specialization (tree tmpl, tree args, has

   /* There should be as many levels of arguments as there are
  levels of parameters.  */
-  gcc_assert (TMPL_ARGS_DEPTH (args)
-  == (TREE_CODE (tmpl) == TEMPLATE_DECL
-  ? TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (tmpl))
-  : template_class_depth (DECL_CONTEXT (tmpl;
+  if (TREE_CODE (tmpl) == TEMPLATE_DECL)
+{
+  tmpl_parms = DECL_TEMPLATE_PARMS (tmpl);
+  if (TMPL_ARGS_DEPTH (args)  TMPL_PARMS_DEPTH (tmpl_parms))
+args = get_innermost_template_args
+(args, TMPL_PARMS_DEPTH (tmpl_parms));
+  gcc_assert (TMPL_ARGS_DEPTH (args)
+  == TMPL_PARMS_DEPTH (tmpl_parms));
+}
+  else
+{
+  tmpl_parms = DECL_CONTEXT (tmpl);
+  gcc_assert (TMPL_ARGS_DEPTH (args) == template_class_depth (tmpl_parms));
+}

   if (optimize_specialization_lookup_p (tmpl))
 {


Re: [PATCH 4/4] OpenMP 4.0 offloading to Intel MIC: non-fallback testing

2014-12-18 Thread Thomas Schwinge
Hi!

On Thu, 30 Oct 2014 14:40:01 +0300, Ilya Verbin iver...@gmail.com wrote:
 This patch allows to run non-fallback 'make check-target-libgomp'.  It passes 
 to
 the host compiler additional -B options with the paths to the offload 
 compilers,
 since non-installed host compiler doesn't know where to find mkoffload tools.
 Also in case of intelmic offload targets it appends paths to liboffloadmic 
 lib.

 --- a/libgomp/configure.ac
 +++ b/libgomp/configure.ac
 @@ -280,9 +280,13 @@ else
multilib_arg=
  fi
  
 +# Get accel target and path to install tree of accel compiler
 +offload_additional_options=
 +offload_additional_lib_paths=
  offload_targets=
  if test x$enable_offload_targets != x; then
for tgt in `echo $enable_offload_targets | sed -e 's#,# #g'`; do
 +tgt_dir=`echo $tgt | grep '=' | sed 's/.*=//'`
  tgt=`echo $tgt | sed 's/=.*//'`
  case $tgt in
*-intelmic-* | *-intelmicemul-*)
 @@ -295,10 +299,20 @@ if test x$enable_offload_targets != x; then
  else
offload_targets=$offload_targets,$tgt_name
  fi
 +if test x$tgt_dir != x; then
 +  offload_additional_options=$offload_additional_options 
 -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin
 +  
 offload_additional_lib_paths=$offload_additional_lib_paths:$tgt_dir/lib64:$tgt_dir/lib
 +else
 +  offload_additional_options=$offload_additional_options 
 -B\$(libexecdir)/gcc/\$(target_alias)/\$(gcc_version) -B\$(bindir)
 +  
 offload_additional_lib_paths=$offload_additional_lib_paths:$toolexeclibdir
 +fi
done
  fi

Hmm, maybe I'm seeing a problem where there isn't one, but in case I'm
not: how will this work if there is more than one offloading compiler
configured?  Won't you get conflicting -B paths added to
offload_additional_options in this case?

 --- a/libgomp/testsuite/lib/libgomp.exp
 +++ b/libgomp/testsuite/lib/libgomp.exp
 @@ -169,6 +186,10 @@ proc libgomp_init { args } {
  
  # Disable color diagnostics
  lappend ALWAYS_CFLAGS additional_flags=-fdiagnostics-color=never
 +
 +# Required to support non-fallback testing of '#pragma omp target'.
 +# Help GCC to find target mkoffload.
 +lappend ALWAYS_CFLAGS additional_flags=${offload_additional_options}
  }

What we're doing in OpenACC offloading testing (gomp-4_0-branch), is in
the libgomp.oacc-c/c.exp (etc.) file cycle through all the available
offloading devices, and then in there -- I think -- such options should
be set, that are specific to one particular offloading device/compiler?


Grüße,
 Thomas


signature.asc
Description: PGP signature


Re: [PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P

2014-12-18 Thread Kyrill Tkachov

Ping.

Thanks,
Kyrill

On 11/12/14 15:06, Kyrill Tkachov wrote:

Ping.
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00340.html

Thanks,
Kyrill

On 04/12/14 09:19, Kyrill Tkachov wrote:

On 02/12/14 22:58, Ramana Radhakrishnan wrote:

On Tue, Nov 11, 2014 at 11:55 AM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

Hi all,

This is the arm implementation of the macro fusion hook.
It tries to fuse movw+movt operations together. It also tries to take lo_sum
RTXs into account since those generate movt instructions as well.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?
if (current_tune-fuseable_ops  ARM_FUSE_MOVW_MOVT)
+{
+  /* We are trying to fuse
+ movw imm / movt imm
+ instructions as a group that gets scheduled together.  */
+

A comment here about the insn structure would be useful.

Done. It's similar to the aarch64 adrp+add case. It does make it easier
to read, thanks.

2014-12-04  Kyrylo Tkachov  kyrylo.tkac...@arm.com\

 * config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
 * config/arm/arm.c (arm_macro_fusion_p): New function.
 (arm_macro_fusion_pair_p): Likewise.
 (TARGET_SCHED_MACRO_FUSION_P): Define.
 (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
 (ARM_FUSE_NOTHING): Likewise.
 (ARM_FUSE_MOVW_MOVT): Likewise.
 (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
 arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
 arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
 arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
 arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune
 arm_cortex_a5_tune): Specify fuseable_ops value.


+  set_dest = SET_DEST (curr_set);
+  if (GET_CODE (set_dest) == ZERO_EXTRACT)
+{
+  if (CONST_INT_P (SET_SRC (curr_set))
+   CONST_INT_P (SET_SRC (prev_set))
+   REG_P (XEXP (set_dest, 0))
+   REG_P (SET_DEST (prev_set))
+   REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set)))
+return true;
+}
+  else if (GET_CODE (SET_SRC (curr_set)) == LO_SUM
+REG_P (SET_DEST (curr_set))
+REG_P (SET_DEST (prev_set))
+GET_CODE (SET_SRC (prev_set)) == HIGH
+REGNO (SET_DEST (curr_set)) == REGNO (SET_DEST (prev_set)))
+{
+  return true;
+}

Can we add a fast path exit to be

if (GET_MODE (set_dest) != SImode)
 return false;

Done, but if/when we extend the function to handle more fusion cases it
will need to be
refactored, since we will want to just bail out of this MOVW+MOVT case
rather than the whole function.


I did think whether we wanted to use reg_overlap_mentioned_p as that
may simplify the logic a bit but that's  overkill here as we still
want to restrict it to the cases above.

Otherwise OK.

Here's the updated patch. I've tested on arm-none-eabi and made sure
that the
fusion still happens on the benchmarks I looked at.
Ok?

Thanks,
Kyrill


Ramana





+}
+  return false;
Thanks,
Kyrill

2014-11-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

   * config/arm/arm-protos.h (tune_params): Add fuseable_ops field.
   * config/arm/arm.c (arm_macro_fusion_p): New function.
   (arm_macro_fusion_pair_p): Likewise.
   (TARGET_SCHED_MACRO_FUSION_P): Define.
   (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise.
   (ARM_FUSE_NOTHING): Likewise.
   (ARM_FUSE_MOVW_MOVT): Likewise.
   (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune,
   arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune,
   arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune,
   arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune,
   arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune
   arm_cortex_a5_tune): Specify fuseable_ops value.








RE: [PATCH] Disable -fuse-caller-save when -pg is active

2014-12-18 Thread Radovan Obradovic

Patch has been tested with DejaGnu gcc test suite on mips32r2 cross compiler 
and bootstrapped and tested on
x86_64 native compiler.

Radovan


From: Jeff Law [l...@redhat.com]
Sent: Monday, November 17, 2014 1:18 PM
To: Radovan Obradovic; gcc-patches@gcc.gnu.org
Cc: Petar Jovanovic
Subject: Re: [PATCH] Disable -fuse-caller-save when -pg is active

On 11/14/14 10:10, Radovan Obradovic wrote:
 Thank you for the quick reply.

 Please repost after updating to test HAVE_prologue and
 HAVE_epilogue and adding a testcase.

 I have managed to reproduce the problem on the small test case on
 mips32, but the test is architecture independent and should probably
 fail on many other ports without this patch. The optimization is also
 disabled if macros HAVE_prologue and HAVE_epilogue are not defined or
 have false value.
Thanks.  This looks good.  I forgot to ask in my prior message, has this
patch been bootstrapped and regression tested?  If so, on what platform?

Jeff


Re: OpenACC middle end changes

2014-12-18 Thread Thomas Schwinge
Hi Jakub!

On Thu, 13 Nov 2014 19:09:49 +0100, Jakub Jelinek ja...@redhat.com wrote:
  --- gcc/builtins.c
  +++ gcc/builtins.c

  +static rtx
  +expand_builtin_acc_on_device (tree exp, rtx target ATTRIBUTE_UNUSED)
  +{
  +  if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE))
  +return NULL_RTX;
  +
  +  tree arg, v1, v2, ret;
  +  location_t loc;
  +
  +  arg = CALL_EXPR_ARG (exp, 0);
  +  arg = builtin_save_expr (arg);
  +  loc = EXPR_LOCATION (exp);
  +
  +  /* Build: (arg == v1 || arg == v2) ? 1 : 0.  */
  +
  +#ifdef ACCEL_COMPILER
  +  v1 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_not_host */ 3);
  +  v2 = build_int_cst (TREE_TYPE (arg), ACCEL_COMPILER_acc_device);
  +#else
  +  v1 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_none */ 0);
  +  v2 = build_int_cst (TREE_TYPE (arg), /* TODO: acc_device_host */ 2);
  +#endif
  +
  +  v1 = fold_build2_loc (loc, EQ_EXPR, integer_type_node, arg, v1);
  +  v2 = fold_build2_loc (loc, EQ_EXPR, integer_type_node, arg, v2);
  +
  +  /* Can't use TRUTH_ORIF_EXPR, as that is not supported by
  + expand_expr_real*.  */
  +  ret = fold_build3_loc (loc, COND_EXPR, integer_type_node, v1, v1, v2);
  +  ret = fold_build3_loc (loc, COND_EXPR, integer_type_node,
  +ret, integer_one_node, integer_zero_node);
  +
  +  return expand_normal (ret);
 
 If you can't fold it late (which is indeed a problem for -O0),
 then I'd suggest to implement this more RTL-ish.
 So, avoid the builtin_save_expr, instead
   rtx op = expand_normal (arg);
 Don't build v1/v2 as trees (and, please fix the TODOs), but rtxes,
 just
   rtx v1 = GEN_INT (...);
   rtx v2 = GEN_INT (...);
   machine_mode mode = TYPE_MODE (TREE_TYPE (arg));
   rtx ret = gen_reg_rtx (TYPE_MODE (integer_type_node));
   emit_move_insn (ret, const0_rtx);
   rtx_code_label *done_label = gen_label_rtx ();
   emit_cmp_and_jump_insns (op, v1, NE, NULL_RTX, mode,
  false, done_label, PROB_EVEN);
   emit_cmp_and_jump_insns (op, v2, NE, NULL_RTX, mode,
  false, done_label, PROB_EVEN);
   emit_move_insn (ret, const1_rtx);
   emit_label (done_label);
   return ret;
 or similar.

;-) Yes, similar, as I've now found; committed to gomp-4_0-branch in
r218869:

commit cb37a039eb7a7375d074bc092457349312c5a2e2
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Thu Dec 18 16:07:23 2014 +

OpenACC acc_on_device: Fix logic error introduced in an earlier change.

... but which didn't show up in testing until after a libgomp rebuild, 
because
of the caching of the acc_on_device builtin that is being done in
libgomp/oacc-init.c:acc_on_device.

gcc/
* builtins.c (expand_builtin_acc_on_device): Fix logic error.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@218869 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp | 2 ++
 gcc/builtins.c | 8 
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index a744ebf..a21fd92 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,7 @@
 2014-12-18  Thomas Schwinge  tho...@codesourcery.com
 
+   * builtins.c (expand_builtin_acc_on_device): Fix logic error.
+
* config/i386/intelmic-offload.h: New file.
* config/nvptx/offload.h: Likewise.
* config.gcc *-intelmic-*, *-intelmicemul-*, nvptx-*: Point to
diff --git gcc/builtins.c gcc/builtins.c
index 33025a5..6891229 100644
--- gcc/builtins.c
+++ gcc/builtins.c
@@ -5909,13 +5909,13 @@ expand_builtin_acc_on_device (tree exp, rtx target)
   machine_mode target_mode = TYPE_MODE (integer_type_node);
   if (!REG_P (target) || GET_MODE (target) != target_mode)
 target = gen_reg_rtx (target_mode);
-  emit_move_insn (target, const0_rtx);
+  emit_move_insn (target, const1_rtx);
   rtx_code_label *done_label = gen_label_rtx ();
-  do_compare_rtx_and_jump (v, v1, NE, false, v_mode, NULL_RTX,
+  do_compare_rtx_and_jump (v, v1, EQ, false, v_mode, NULL_RTX,
   NULL_RTX, done_label, PROB_EVEN);
-  do_compare_rtx_and_jump (v, v2, NE, false, v_mode, NULL_RTX,
+  do_compare_rtx_and_jump (v, v2, EQ, false, v_mode, NULL_RTX,
   NULL_RTX, done_label, PROB_EVEN);
-  emit_move_insn (target, const1_rtx);
+  emit_move_insn (target, const0_rtx);
   emit_label (done_label);
 
   return target;


Grüße,
 Thomas


pgp6P0PultR2w.pgp
Description: PGP signature


Re: [C++ Patch] PR 60955

2014-12-18 Thread Paolo Carlini

Hi,

On 12/18/2014 03:20 PM, Jason Merrill wrote:

On 12/18/2014 06:17 AM, Paolo Carlini wrote:

Sure. The below uses the c_inhibit_evaluation_warnings mechanism and
passes testing. I wondered if in such cases we could alternately use the
warning_sentinel mechanism (moved to cp-tree.h of course) ?

That would make sense to me.

Good. Then I'm finishing testing the below.

Thanks,
Paolo.


Index: cp/cp-tree.h
===
--- cp/cp-tree.h(revision 218857)
+++ cp/cp-tree.h(working copy)
@@ -1149,6 +1149,18 @@ struct processing_template_decl_sentinel
   }
 };
 
+/* RAII sentinel to disable certain warnings during template substitution
+   and elsewhere.  */
+
+struct warning_sentinel
+{
+  int flag;
+  int val;
+  warning_sentinel(int flag, bool suppress=true)
+: flag(flag), val(flag) { if (suppress) flag = 0; }
+  ~warning_sentinel() { flag = val; }
+};
+
 /* The cached class binding level, from the most recently exited
class, or NULL if none.  */
 
Index: cp/pt.c
===
--- cp/pt.c (revision 218857)
+++ cp/pt.c (working copy)
@@ -14438,16 +14438,6 @@ tsubst_non_call_postfix_expression (tree t, tree a
   return t;
 }
 
-/* Sentinel to disable certain warnings during template substitution.  */
-
-struct warning_sentinel {
-  int flag;
-  int val;
-  warning_sentinel(int flag, bool suppress=true)
-: flag(flag), val(flag) { if (suppress) flag = 0; }
-  ~warning_sentinel() { flag = val; }
-};
-
 /* Like tsubst but deals with expressions and performs semantic
analysis.  FUNCTION_P is true if T is the F in F (ARGS).  */
 
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 218857)
+++ cp/semantics.c  (working copy)
@@ -1660,6 +1660,7 @@ force_paren_expr (tree expr)
  tree type = unlowered_expr_type (expr);
  bool rval = !!(kind  clk_rvalueref);
  type = cp_build_reference_type (type, rval);
+ warning_sentinel s (extra_warnings);
  expr = build_static_cast (type, expr, tf_error);
  if (expr != error_mark_node)
REF_PARENTHESIZED_P (expr) = true;
Index: testsuite/g++.dg/warn/register-parm-1.C
===
--- testsuite/g++.dg/warn/register-parm-1.C (revision 0)
+++ testsuite/g++.dg/warn/register-parm-1.C (working copy)
@@ -0,0 +1,9 @@
+// PR c++/60955
+// { dg-options -Wextra }
+
+unsigned int erroneous_warning(register int a) {
+if ((a)  0xff) return 1; else return 0;
+}
+unsigned int no_erroneous_warning(register int a) {
+if (a  0xff) return 1; else return 0;
+}


Re: [PATCH 2/2] [C++] pr31397 - implement -Wsuggest-override

2014-12-18 Thread Trevor Saunders
On Wed, Dec 17, 2014 at 04:21:31PM -0500, Jason Merrill wrote:
 On 11/27/2014 01:28 AM, tsaund...@mozilla.com wrote:
 +  if (warn_override  DECL_VIRTUAL_P (decl)  !DECL_OVERRIDE_P (decl)
 
 Why check DECL_VIRTUAL_P here?

I think it was to avoid warning when decl is hiding a non virtual method
on the parent class, but I guess that won't happen anyway? so I'll
remove it.

Trev

 
 +warning_at(DECL_SOURCE_LOCATION (decl), OPT_Wsuggest_override,
 
 Space before the (.
 
 Jason
 


Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting

2014-12-18 Thread Jiong Wang


On 17/12/14 15:54, Richard Biener wrote:

On Mon, Dec 15, 2014 at 4:29 PM, Jiong Wang jiong.w...@arm.com wrote:

On 15/12/14 15:28, Jiong Wang wrote:

On 04/12/14 19:32, Jiong Wang wrote:

On 04/12/14 11:07, Richard Biener wrote:


On Thu, Dec 4, 2014 at 12:07 PM, Richard Biener
richard.guent...@gmail.com wrote:

On Thu, Dec 4, 2014 at 12:00 PM, Jiong Wang jiong.w...@arm.com wrote:


which means re-associate the constant imm with the virtual frame
pointer.

transform

 RA - fixed_reg + RC
 RD - MEM (RA + const_offset)

  into:

 RA - fixed_reg + const_offset
 RD - MEM (RA + RC)

then RA - fixed_reg + const_offset is actually loop invariant, so the
later
RTL GCSE PRE pass could catch it and do the hoisting, and thus
ameliorate
what tree
level ivopts could not sort out.
There is a LIM pass after gimple ivopts - if the invariantness is
already
visible there why not handle it there similar to the special-cases in
rewrite_bittest and rewrite_reciprocal?

maybe, needs further check.


And of course similar tricks could be applied on the RTL level to
RTL invariant motion?

Thanks. I just checked the code, yes, loop invariant motion pass
is the natural place to integrate such multi-insns invariant analysis
trick.

those code could be integrated into loop-invariant.c cleanly, but
then I found although the invariant detected successfully but it's not
moved
out of loop because of cost issue, and looks like the patch committed to
fix PR33928
is trying to prevent such cheap address be hoisted to reduce register
pressure.


 805   /* ??? Try to determine cheapness of address computation.
Unfortunately
 806  the address cost is only a relative measure, we can't
really compare
 807  it with any absolute number, but only with other address
costs.
 808  But here we don't have any other addresses, so compare
with a magic
 809  number anyway.  It has to be large enough to not regress
PR33928
 810  (by avoiding to move reg+8,reg+16,reg+24 invariants),
but small
 811  enough to not regress 410.bwaves either (by still moving
reg+reg
 812  invariants).
 813  See
http://gcc.gnu.org/ml/gcc-patches/2009-10/msg01210.html .  */
 814   inv-cheap_address = address_cost (SET_SRC (set),
word_mode,
 815  ADDR_SPACE_GENERIC,
speed)  3;


I think that maybe necessary for x86 which is short of register, while
for RISC,
it may not be that necessary, especially the whole register pressure is
not big.

currently, what I could think of is for this transformation below, we
should
increase the costs:
 A
==
 RA - virtual_stack_var + RC
 RD - MEM (RA + const_offset)

  into:

B
==
 RA - virtual_stack_var + const_offset  --B
 RD - MEM (RA + RC)

because the cost is not that cheap, if there is not re-assocation of
virtual_stack_var
with const_offset, then lra elimination will create another instruction
to hold the
elimination result, so format A will actually be

 RT - real_stack_pointer + elimination_offset
 RA - RT + RC
 RD - MEM (RA + const_offset)

so, the re-assocation and later hoisting of invariant B could actually
save two instructions
in the loop, this is why there are 15% perf gap for bzip2 under some
situation.

updated patch.

moved this instruction shuffling trick to rtl loop invariant pass.
as described above, this patch tries to transform A to B, so that
after the transformation:

 * RA - virtual_stack_var + const_offset  could be hoisted out of the
loop
 * easy the work of lra elimination as virtual_stack_var is associated
with const_offset
   that the elimination offset could be combined with const_offset
automatically.

current rtl loop invariant pass treat reg - reg + off as cheap address,
while although
reg - virtual_stack_var + offset fall into the same format, but it's
not that cheap as
we could also save one lra elimination instruction. so this patch will
mark
reg - virtual_stack_var + offset transformed from A to be expensive, so
that it could be
hoisted later.

after patch, pr62173 is fixed on powerpc64, while *still not on aarch64*.
because there are one
glitch in aarch64_legitimize_address which cause unnecessary complex
instructions sequences
generated when legitimize some addresses. and if we fix that, we will get
cheaper address for
those cases which is generally good, and the cheaper address will cause
tree level IVOPT do
more IVOPT optimization which is generally good also, but from the speck2k
result, there
are actually around 1% code size regression on two cases, the reason is
for target support
post-index address, doing IVOPT may not always be the best choice because
we lost the advantage
of using post-index addressing.

on aarch64, for the following testcase, the ivopted version is complexer
than not ivopted version.

   while (oargc--) *(nargv++) 

Re: [PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread H. Peter Anvin
On 12/18/2014 06:12 AM, Uros Bizjak wrote:

 # temporary until string.h is fixed
 KBUILD_CFLAGS += -ffreestanding

 Yes, it looks to me that new option is the way to go.

 Is this an OK?
 
 In principle, I'm OK with the patch approach, but let's wait for
 eventual comments from Linux people.
 

Acked-by: H. Peter Anvin h...@linux.intel.com

H.J. already coordinated with us; we are more than happy with this approach.

Thank you!

-hpa




Aproved patches

2014-12-18 Thread Ricardo Sardano Guanciale
Please, can somebody help me? Where do I find GCC approved/accepted patch?

Is there some patch directory in repository, if is please where?

Thanks and best regards,

Ricardo




Re: [PATCH][ARM][doc] Remove mention of Advanced RISC Machines

2014-12-18 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00947.html

Thanks,
Kyrill
On 10/12/14 16:50, Kyrill Tkachov wrote:

Hi all,

The company ARM is not Advanced RISC Machines anymore and anyway the ARM
architecture is distinct from the company.
This patch adjusts the documentation accordingly.

Ok?

Thanks,
Kyrill

2014-12-10  Kyrylo Tkachov  kyrylo.tkac...@arm.com

  * doc/invoke.texi (ARM options): Remove mention of Advanced RISC
  Machines.





Re: [PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread H.J. Lu
On Thu, Dec 18, 2014 at 9:08 AM, H. Peter Anvin h...@zytor.com wrote:
 On 12/18/2014 06:12 AM, Uros Bizjak wrote:

 # temporary until string.h is fixed
 KBUILD_CFLAGS += -ffreestanding

 Yes, it looks to me that new option is the way to go.

 Is this an OK?

 In principle, I'm OK with the patch approach, but let's wait for
 eventual comments from Linux people.


 Acked-by: H. Peter Anvin h...@linux.intel.com

 H.J. already coordinated with us; we are more than happy with this approach.

 Thank you!

I am checking it in now.

Thanks.

-- 
H.J.


Re: [C++ Patch] PR 60955

2014-12-18 Thread Jason Merrill

On 12/18/2014 11:31 AM, Paolo Carlini wrote:

+ warning_sentinel s (extra_warnings);


Let's add a comment about which warning we're avoiding here.  OK with 
that change.


Jason



Re: [PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread Uros Bizjak
On Thu, Dec 18, 2014 at 6:08 PM, H. Peter Anvin h...@zytor.com wrote:
 On 12/18/2014 06:12 AM, Uros Bizjak wrote:

 # temporary until string.h is fixed
 KBUILD_CFLAGS += -ffreestanding

 Yes, it looks to me that new option is the way to go.

 Is this an OK?

 In principle, I'm OK with the patch approach, but let's wait for
 eventual comments from Linux people.


 Acked-by: H. Peter Anvin h...@linux.intel.com

 H.J. already coordinated with us; we are more than happy with this approach.

Great, I'm glad that this specialized option has uses.

Patch is OK.

Thanks,
Uros.


Re: [patch c++]: Fix PR/61198: Crash when selecting specializations through aliases.

2014-12-18 Thread Jason Merrill

On 12/18/2014 10:10 AM, Kai Tietz wrote:

+  if (TMPL_ARGS_DEPTH (args)  TMPL_PARMS_DEPTH (tmpl_parms))
+args = get_innermost_template_args
+(args, TMPL_PARMS_DEPTH (tmpl_parms));


It seems unlikely to be correct to arbitrarily strip off outer template 
args.  Where did the mismatch come from?


Jason



Re: Aproved patches

2014-12-18 Thread Jakub Jelinek
On Thu, Dec 18, 2014 at 03:13:32PM -0200, Ricardo Sardano Guanciale wrote:
 Please, can somebody help me? Where do I find GCC approved/accepted patch?
 
 Is there some patch directory in repository, if is please where?

Approved patches are usually just committed to the SVN repository,
there is no extra repository for patches.

Jakub


Re: [PATCH][ARM][doc] Remove mention of Advanced RISC Machines

2014-12-18 Thread James Greenhalgh
On Thu, Dec 18, 2014 at 05:18:18PM +, Kyrill Tkachov wrote:
 Ping.
 https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00947.html

This patch almost certainly falls under the obvious rule:

  will the person who objects to my work the most be able to
   find a fault with my fix?

ARM hasn't stood for Advanced RISC Machines since 1998, so I can't see
even your most fervent nemesis objecting to this one!

Cheers,
James

 
 Thanks,
 Kyrill
 On 10/12/14 16:50, Kyrill Tkachov wrote:
  Hi all,
 
  The company ARM is not Advanced RISC Machines anymore and anyway the ARM
  architecture is distinct from the company.
  This patch adjusts the documentation accordingly.
 
  Ok?
 
  Thanks,
  Kyrill
 
  2014-12-10  Kyrylo Tkachov  kyrylo.tkac...@arm.com
 
* doc/invoke.texi (ARM options): Remove mention of Advanced RISC
Machines.
 
 
 


Re: [PATCH] IPA ICF: refactoring + fix for PR ipa/63569

2014-12-18 Thread Martin Liška

On 12/17/2014 04:23 PM, Richard Biener wrote:

On Wed, Dec 17, 2014 at 12:17 PM, Martin Liška mli...@suse.cz wrote:

On 12/11/2014 01:37 PM, Richard Biener wrote:


On Wed, Dec 10, 2014 at 1:18 PM, Martin Liška mli...@suse.cz wrote:


Hello.

As suggested by Richard, I split compare_operand functions to various
functions
related to a specific comparison. Apart from that I added fast check for
volatility flag that caused miscompilation mentioned in PR63569.

Patch can bootstrap on x86_64-linux-pc without any regression seen and I
was
able to build Firefox with LTO.

Ready for trunk?



Hmm, I don't think the dispatch to compare_memory_operand is at the
correct place.  It should be called from places where currently
compare_operand is called and it should recurse to compare_operand.
That is, it is more high-level.

Can you please fix the volatile issue separately?  It's also not necessary
to do that check on every operand but just on memory operands.



Hello Richard.

I hope the following patch is the finally the right approach we want to do
;)
In the first patch, I did just refactoring for high-level memory operand
comparison and the second of fixes problem connected to missing volatile
attribute comparison.

Patch was retested and can bootstrap.

What do you think about it?


+bool
+func_checker::compare_cst_or_decl (tree t1, tree t2)
+{
...
+case INTEGER_CST:
+  {
+   ret = compatible_types_p (TREE_TYPE (t1), TREE_TYPE (t2))
+  wi::to_offset  (t1) == wi::to_offset  (t2);
+
+   return return_with_debug (ret);
+  }
+case COMPLEX_CST:
+case VECTOR_CST:
+case STRING_CST:
+case REAL_CST:
+  {
+   ret = operand_equal_p (t1, t2, OEP_ONLY_CONST);
+   return return_with_debug (ret);

why does the type matter for INTEGER_CSTs but not for other constants?
Also comparing INTEGER_CSTs via to_offset is certainly wrong.  Please
use tree_int_cst_equal_p (t1, t2) instead, or operand_equal_p which would
end up calling tree_int_cst_equal_p.  That said, I'd have commonized all
_CST nodes by

   ret = compatible_types_p ()  operand_equal_p (...);

+case CONST_DECL:
+case BIT_FIELD_REF:
+  {

I'm quite sure BIT_FIELD_REF is spurious here.

Now to the meat ...

+  tree load1 = get_base_loadstore (t1, false);
+  tree load2 = get_base_loadstore (t2, false);
+
+  if (load1  load2  !compare_memory_operand (t1, t2))
+   return return_false_with_msg (memory operands are different);
+  else if ((load1  !load2) || (!load1  load2))
+   return return_false_with_msg ();
+

and the similar code for assigns.  The way you introduce the
unpack_handled_component flag to get_base_loadstore makes
it treat x.field as non-memory operand.  That's wrong.  I wonder
why you think you needed that.  Why does

   tree load1= get_base_loadstore (t1);

not work?  Btw, I'd have avoided get_base_loadstore and simply did

   ao_ref_init (r1, t1);
   ao_ref_init (r2, t2);
   tree base1 = ao_ref_base (r1);
   tree base2 = ao_ref_base (r2);
   if ((DECL_P (base1) || (TREE_CODE (base1) == MEM_REF || TREE_CODE
(base1) == TARGET_MEM_REF))
   (DECL_P (base2) || (...))
...

or rather put this into compare_memory_operand and call that on all operands
that may be memory (TREE_CODE () != SSA_NAME  !is_gimple_min_invariant ()).

I also miss where you handle memory operands on the LHS for calls
and for assignments.

The compare_ssa_name refactoring part is ok to install.

Can you please fix the volatile bug before the refactoring as it looks like
we're going to iterate some more here unless I can find the time to give
it a quick try myself.

Thanks,
Richard.



Hi.

There's second part of the patch set.

Thanks,
Martin


Thank you,
Martin



Thanks,
Richard.


Thanks,
Martin





From 288fb5fa6c78cf5a323c7667b180aed2b4d7e346 Mon Sep 17 00:00:00 2001
From: mliska mli...@suse.cz
Date: Thu, 18 Dec 2014 14:22:51 +0100
Subject: [PATCH 2/2] IPA ICF: compare_operand is split to multiple functions.

gcc/ChangeLog

2014-12-12  Martin Liska  mli...@suse.cz

	* ipa-icf-gimple.c (func_checker::compare_ssa_name): Enhance SSA
	name comparison.
	(func_checker::compare_memory_operand): New function.
	(func_checker::compare_operand): Split case to newly
	added functions.
	(func_checker::compare_cst_or_decl): New function.
	(func_checker::compare_gimple_call): Identify
	memory operands.
	(func_checker::compare_gimple_assign): Likewise.
	* ipa-icf-gimple.h: New function.
---
 gcc/ipa-icf-gimple.c | 243 ---
 gcc/ipa-icf-gimple.h |   9 +-
 2 files changed, 142 insertions(+), 110 deletions(-)

diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index fa2c353..2530a19 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -110,6 +110,9 @@ func_checker::~func_checker ()
 bool
 func_checker::compare_ssa_name (tree t1, tree t2)
 {
+  gcc_assert (TREE_CODE (t1) == SSA_NAME);
+  gcc_assert (TREE_CODE (t2) == SSA_NAME);
+
   

Re: [PATCH 4/4] OpenMP 4.0 offloading to Intel MIC: non-fallback testing

2014-12-18 Thread Ilya Verbin
Hi,

2014-12-18 16:27 GMT+01:00 Thomas Schwinge tho...@codesourcery.com:
 Hi!

 On Thu, 30 Oct 2014 14:40:01 +0300, Ilya Verbin iver...@gmail.com wrote:
 This patch allows to run non-fallback 'make check-target-libgomp'.  It 
 passes to
 the host compiler additional -B options with the paths to the offload 
 compilers,
 since non-installed host compiler doesn't know where to find mkoffload tools.
 Also in case of intelmic offload targets it appends paths to liboffloadmic 
 lib.

 --- a/libgomp/configure.ac
 +++ b/libgomp/configure.ac
 @@ -280,9 +280,13 @@ else
multilib_arg=
  fi

 +# Get accel target and path to install tree of accel compiler
 +offload_additional_options=
 +offload_additional_lib_paths=
  offload_targets=
  if test x$enable_offload_targets != x; then
for tgt in `echo $enable_offload_targets | sed -e 's#,# #g'`; do
 +tgt_dir=`echo $tgt | grep '=' | sed 's/.*=//'`
  tgt=`echo $tgt | sed 's/=.*//'`
  case $tgt in
*-intelmic-* | *-intelmicemul-*)
 @@ -295,10 +299,20 @@ if test x$enable_offload_targets != x; then
  else
offload_targets=$offload_targets,$tgt_name
  fi
 +if test x$tgt_dir != x; then
 +  offload_additional_options=$offload_additional_options 
 -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin
 +  
 offload_additional_lib_paths=$offload_additional_lib_paths:$tgt_dir/lib64:$tgt_dir/lib
 +else
 +  offload_additional_options=$offload_additional_options 
 -B\$(libexecdir)/gcc/\$(target_alias)/\$(gcc_version) -B\$(bindir)
 +  
 offload_additional_lib_paths=$offload_additional_lib_paths:$toolexeclibdir
 +fi
done
  fi

 Hmm, maybe I'm seeing a problem where there isn't one, but in case I'm
 not: how will this work if there is more than one offloading compiler
 configured?  Won't you get conflicting -B paths added to
 offload_additional_options in this case?

 --- a/libgomp/testsuite/lib/libgomp.exp
 +++ b/libgomp/testsuite/lib/libgomp.exp
 @@ -169,6 +186,10 @@ proc libgomp_init { args } {

  # Disable color diagnostics
  lappend ALWAYS_CFLAGS additional_flags=-fdiagnostics-color=never
 +
 +# Required to support non-fallback testing of '#pragma omp target'.
 +# Help GCC to find target mkoffload.
 +lappend ALWAYS_CFLAGS additional_flags=${offload_additional_options}
  }

 What we're doing in OpenACC offloading testing (gomp-4_0-branch), is in
 the libgomp.oacc-c/c.exp (etc.) file cycle through all the available
 offloading devices, and then in there -- I think -- such options should
 be set, that are specific to one particular offloading device/compiler?


 Grüße,
  Thomas

I have not tested a compiler, configured to support 2 different
offloading targets, so there might be some corner cases.

In this place I don't see any problems, at least for the case with
installed offloading compilers.
One -B allows to find mkoffload in lto-wrapper:compile_offload_image.
This function tries to open all paths + '/accel/target_name/mkoffload'
suffix. So, there should be no conflicts.
Another -B allows mkoffload to find target driver. It tries to open
'host_name-accel-target_name-gcc', so, there also should be no
conflicts.

However, I still did not tried to enable 'make check' with
non-installed offloading compilers. Probably such target specific
paths can help there.

  -- Ilya


Re: [PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread H.J. Lu
On Thu, Dec 18, 2014 at 9:23 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Thu, Dec 18, 2014 at 6:08 PM, H. Peter Anvin h...@zytor.com wrote:
 On 12/18/2014 06:12 AM, Uros Bizjak wrote:

 # temporary until string.h is fixed
 KBUILD_CFLAGS += -ffreestanding

 Yes, it looks to me that new option is the way to go.

 Is this an OK?

 In principle, I'm OK with the patch approach, but let's wait for
 eventual comments from Linux people.


 Acked-by: H. Peter Anvin h...@linux.intel.com

 H.J. already coordinated with us; we are more than happy with this approach.

 Great, I'm glad that this specialized option has uses.

 Patch is OK.

It is in now.  Here is the GCC 5 change entry.  OK to install?

Peter, please feel free to use my kernel patch or create a different
one.

Thanks.

-- 
H.J.
---
Index: gcc-5/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
retrieving revision 1.52
diff -u -p -r1.52 changes.html
--- gcc-5/changes.html 15 Dec 2014 19:55:08 - 1.52
+++ gcc-5/changes.html 18 Dec 2014 17:42:18 -
@@ -444,6 +444,10 @@ void operator delete[] (void *, std::siz
  place of the __fentry__ or mcount call, so that a call per function
  can be later patched in. This can be used for low overhead tracing or
  hot code patching./li
+ liThe new code-mskip-rax-setup/code option to skip setting
+ up RAX register when SSE is disabled and there are no variable
+ arguments passed in vector registers.  This can be used to
+ optimize Linux kernel./li
   /ul

 h3 id=shSH/h3


Re: [PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread H. Peter Anvin
On 12/18/2014 09:43 AM, H.J. Lu wrote:
 
 Peter, please feel free to use my kernel patch or create a different
 one.
 

Great, thanks!

-hpa




Re: [PATCH 4/4] OpenMP 4.0 offloading to Intel MIC: non-fallback testing

2014-12-18 Thread Jakub Jelinek
On Thu, Dec 18, 2014 at 06:41:18PM +0100, Ilya Verbin wrote:
  What we're doing in OpenACC offloading testing (gomp-4_0-branch), is in
  the libgomp.oacc-c/c.exp (etc.) file cycle through all the available
  offloading devices, and then in there -- I think -- such options should
  be set, that are specific to one particular offloading device/compiler?
 
 I have not tested a compiler, configured to support 2 different
 offloading targets, so there might be some corner cases.
 
 In this place I don't see any problems, at least for the case with
 installed offloading compilers.
 One -B allows to find mkoffload in lto-wrapper:compile_offload_image.
 This function tries to open all paths + '/accel/target_name/mkoffload'
 suffix. So, there should be no conflicts.
 Another -B allows mkoffload to find target driver. It tries to open
 'host_name-accel-target_name-gcc', so, there also should be no
 conflicts.
 
 However, I still did not tried to enable 'make check' with
 non-installed offloading compilers. Probably such target specific
 paths can help there.

Yeah, generally we want to be able to support all 3 (or 4 etc.) compilers
installed into the same DESTDIR and prefix, so extra -B arguments in there
should not break anything, for each of the offloading compilers you'd just
add what -B options it needs.

Jakub


Re: [C++ Patch] PR 60955

2014-12-18 Thread Paolo Carlini

Hi,

On 12/18/2014 06:21 PM, Jason Merrill wrote:

On 12/18/2014 11:31 AM, Paolo Carlini wrote:

+  warning_sentinel s (extra_warnings);


Let's add a comment about which warning we're avoiding here.  OK with 
that change.
Thanks. I'm attaching what I just committed. This is a regression, I 
suppose the patch is ok for 4_9-branch too, right?


Thanks,
Paolo.

/
/cp
2014-12-18  Paolo Carlini  paolo.carl...@oracle.com

PR c++/60955
* pt.c (struct warning_sentinel): Move it...
* cp-tree.h: ... here.
* semantics.c (force_paren_expr): Use it.

/testsuite
2014-12-18  Paolo Carlini  paolo.carl...@oracle.com

PR c++/60955
* g++.dg/warn/register-parm-1.C: New.
Index: cp/cp-tree.h
===
--- cp/cp-tree.h(revision 218870)
+++ cp/cp-tree.h(working copy)
@@ -1149,6 +1149,18 @@ struct processing_template_decl_sentinel
   }
 };
 
+/* RAII sentinel to disable certain warnings during template substitution
+   and elsewhere.  */
+
+struct warning_sentinel
+{
+  int flag;
+  int val;
+  warning_sentinel(int flag, bool suppress=true)
+: flag(flag), val(flag) { if (suppress) flag = 0; }
+  ~warning_sentinel() { flag = val; }
+};
+
 /* The cached class binding level, from the most recently exited
class, or NULL if none.  */
 
Index: cp/pt.c
===
--- cp/pt.c (revision 218870)
+++ cp/pt.c (working copy)
@@ -14438,16 +14438,6 @@ tsubst_non_call_postfix_expression (tree t, tree a
   return t;
 }
 
-/* Sentinel to disable certain warnings during template substitution.  */
-
-struct warning_sentinel {
-  int flag;
-  int val;
-  warning_sentinel(int flag, bool suppress=true)
-: flag(flag), val(flag) { if (suppress) flag = 0; }
-  ~warning_sentinel() { flag = val; }
-};
-
 /* Like tsubst but deals with expressions and performs semantic
analysis.  FUNCTION_P is true if T is the F in F (ARGS).  */
 
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 218870)
+++ cp/semantics.c  (working copy)
@@ -1660,6 +1660,9 @@ force_paren_expr (tree expr)
  tree type = unlowered_expr_type (expr);
  bool rval = !!(kind  clk_rvalueref);
  type = cp_build_reference_type (type, rval);
+ /* This inhibits warnings in, eg, cxx_mark_addressable
+(c++/60955).  */
+ warning_sentinel s (extra_warnings);
  expr = build_static_cast (type, expr, tf_error);
  if (expr != error_mark_node)
REF_PARENTHESIZED_P (expr) = true;
Index: testsuite/g++.dg/warn/register-parm-1.C
===
--- testsuite/g++.dg/warn/register-parm-1.C (revision 0)
+++ testsuite/g++.dg/warn/register-parm-1.C (working copy)
@@ -0,0 +1,9 @@
+// PR c++/60955
+// { dg-options -Wextra }
+
+unsigned int erroneous_warning(register int a) {
+if ((a)  0xff) return 1; else return 0;
+}
+unsigned int no_erroneous_warning(register int a) {
+if (a  0xff) return 1; else return 0;
+}


Re: Pragma parsing

2014-12-18 Thread Thomas Schwinge
Hi Jakub!

On Thu, 18 Dec 2014 15:29:42 +0100, Jakub Jelinek ja...@redhat.com wrote:
 On Thu, Dec 18, 2014 at 02:48:32PM +0100, Thomas Schwinge wrote:
   Like for CILK, I'd strongly prefer if for the clauses that are
   specific to OpenACC only you'd use PRAGMA_OACC_CLAUSE_* instead,
   and put them after the PRAGMA_CILK_* enum values.
   If you want to have PRAGMA_OACC_CLAUSE_ aliases also for the
   clauses shared in between OpenMP and OpenACC, feel free to add
   aliases like Cilk+ has them.  It is unfortunately lots of new clauses
   and we are getting close to the 64 clauses limit :( when they are used
   in bitmasks.

OK, so there is this limit.  But, I fail to understand how merely moving
the OpenACC-only PRAGMA_*_CLAUSE_* to the end of enum pragma_omp_clause
will help overcome that?  Or have I now completely confused myself, and
I'm not even understanding the problem anymore?  :-|

The only way this would make sense (in my confused mind) is, if we then
did PRAGMA_OACC_CLAUSE_REDUCTION = PRAGMA_OMP_CLAUSE_REDUCTION for the
clauses whose names exist in both standards, and
PRAGMA_OACC_CLAUSE_PRESENT = [some PRAGMA_OMP_CLAUSE_* whose name does
not exist in OpenACC], taking care that no two PRAGMA_OACC_CLAUSE_* are
assigned the same enum value.  Wouldn't this become very cumbersome for
future maintenance, however?

  I still don't like this very much, because we'll then get a mess of
  PRAGMA_OMP_CLAUSE_* intermixed with PRAGMA_OACC_CLAUSE_*.  I understand,
  for example, PRAGMA_OMP_CLAUSE_REDUCTION to just be a numeric
  representation of the reduction string -- and then, it doesn't make
  too much sense to me to express this both as PRAGMA_OMP_CLAUSE_REDUCTION
  and PRAGMA_OACC_CLAUSE_REDUCTION, or, similarly, to switch back and forth
  between PRAGMA_OMP_CLAUSE_* and PRAGMA_OACC_CLAUSE_*.  Yet, that's just
  the point that I'm making, so I'll defer if there's no consensus to be
  found here.
 
 The point is that we now have lots of clauses, and making it clear what from
 those clauses are Cilk+, what are OpenACC, what are OpenMP will help with
 code readability.

The idea then is, I guess, to split parsing of OpenACC's
PRAGMA_OACC_CLAUSE_* out of gcc/c/c-parser.c:c_parser_omp_clause_name
into a new c_parser_oacc_clause_name, just like we already have a
separate c_parser_oacc_all_clauses and c_parser_omp_all_clauses?  (The
motivation for the latter is that this is where the disambiguation
between the (possibly) different semantics of OpenACC and OpenMP clauses
happens, that is, where the names PRAGMA_OMP_CLAUSE_* are resolved to
actual gcc/tree-core.h:enum omp_clause_code's OMP_CLAUSE_*.  For example,
there a OpenACC copy clause is translated into a OMP_CLAUSE_MAP with
tofrom kind.)

 Especially if some clauses diverge in their meaning
 between different standards.

When you say clauses here, do you mean gcc/c-family/c-pragma.h:enum
pragma_omp_clause's PRAGMA_*_CLAUSE_*, or gcc/tree-core.h:enum
omp_clause_code's OMP_CLAUSE_*, or both?  That is, are you suggesting
that we should also be adding new OACC_CLAUSE_* next to the existing
OMP_CLAUSE_* in gcc/tree-core.h:enum omp_clause_code?  (Which would then
require separate handling all over the middle end?)  I thought the
(long-term) goal is to unfiy all this even more?  Or, as above, I have
now completely confused myself, and I'm not even understanding the
problem anymore?  :-|

Or, alternatively, wouldn't it also work if we did have a separate
gcc/c-family/c-pragma.h:enum pragma_oacc_clause (instead of adding to
enum pragma_omp_clause), and then the OpenACC clauses at the parsing
level live in their own namespace?  (The same should then be done for
Cilk+, too, even though the problem there is not as severe as for
OpenACC, as its only two handful PRAGMA_CILK_CLAUSE_*.)  This means we
can't support combined OpenACC/OpenMP pragmas -- but that doesn't make
much sense anyway, so no problem.  But I'm not sure that is actually the
approach you've been suggesting?


Grüße,
 Thomas


pgpdeo0OMWERo.pgp
Description: PGP signature


Re: Pragma parsing

2014-12-18 Thread Jakub Jelinek
On Thu, Dec 18, 2014 at 07:00:09PM +0100, Thomas Schwinge wrote:
 OK, so there is this limit.  But, I fail to understand how merely moving
 the OpenACC-only PRAGMA_*_CLAUSE_* to the end of enum pragma_omp_clause
 will help overcome that?  Or have I now completely confused myself, and
 I'm not even understanding the problem anymore?  :-|
 
 The only way this would make sense (in my confused mind) is, if we then
 did PRAGMA_OACC_CLAUSE_REDUCTION = PRAGMA_OMP_CLAUSE_REDUCTION for the
 clauses whose names exist in both standards, and
 PRAGMA_OACC_CLAUSE_PRESENT = [some PRAGMA_OMP_CLAUSE_* whose name does
 not exist in OpenACC], taking care that no two PRAGMA_OACC_CLAUSE_* are
 assigned the same enum value.  Wouldn't this become very cumbersome for
 future maintenance, however?

Well, if you e.g. put PRAGMA_OMP_CLAUSE_* that isn't relevant to OpenACC
first, then pragmas shared by both, then OpenACC pragmas, then perhaps you
could use biased bitmasks for the OpenACC constructs.  Anyway, guess keep
the ordering as is.

   I still don't like this very much, because we'll then get a mess of
   PRAGMA_OMP_CLAUSE_* intermixed with PRAGMA_OACC_CLAUSE_*.  I understand,
   for example, PRAGMA_OMP_CLAUSE_REDUCTION to just be a numeric
   representation of the reduction string -- and then, it doesn't make
   too much sense to me to express this both as PRAGMA_OMP_CLAUSE_REDUCTION
   and PRAGMA_OACC_CLAUSE_REDUCTION, or, similarly, to switch back and forth
   between PRAGMA_OMP_CLAUSE_* and PRAGMA_OACC_CLAUSE_*.  Yet, that's just
   the point that I'm making, so I'll defer if there's no consensus to be
   found here.
  
  The point is that we now have lots of clauses, and making it clear what from
  those clauses are Cilk+, what are OpenACC, what are OpenMP will help with
  code readability.
 
 The idea then is, I guess, to split parsing of OpenACC's
 PRAGMA_OACC_CLAUSE_* out of gcc/c/c-parser.c:c_parser_omp_clause_name
 into a new c_parser_oacc_clause_name, just like we already have a
 separate c_parser_oacc_all_clauses and c_parser_omp_all_clauses?  (The
 motivation for the latter is that this is where the disambiguation
 between the (possibly) different semantics of OpenACC and OpenMP clauses
 happens, that is, where the names PRAGMA_OMP_CLAUSE_* are resolved to
 actual gcc/tree-core.h:enum omp_clause_code's OMP_CLAUSE_*.  For example,
 there a OpenACC copy clause is translated into a OMP_CLAUSE_MAP with
 tofrom kind.)

I don't think that is needed.  It is the same parsing, so it can be shared,
just use PRAGMA_OMP_* for OpenMP clauses or clauses shared between OpenMP
and OpenACC and PRAGMA_OACC_* for OpenACC specific pragmas in the code
that works with both, and in OpenACC specific FE code the PRAGMA_OACC_*
aliases to PRAGMA_OMP_* values for the shared clauses.
I just want to be able to clearly see what is OpenACC specific and what is
not, similarly how Cilk+ already does it.

Jakub


Re: [patch c++]: Fix PR/61198: Crash when selecting specializations through aliases.

2014-12-18 Thread Kai Tietz
2014-12-18 18:26 GMT+01:00 Jason Merrill ja...@redhat.com:
 On 12/18/2014 10:10 AM, Kai Tietz wrote:

 +  if (TMPL_ARGS_DEPTH (args)  TMPL_PARMS_DEPTH (tmpl_parms))
 +args = get_innermost_template_args
 +(args, TMPL_PARMS_DEPTH (tmpl_parms));


 It seems unlikely to be correct to arbitrarily strip off outer template
 args.  Where did the mismatch come from?

 Jason


Hmm, we are comming from #2 tsubst_decl (cp/pt.c:11278)
#3 instantiate_template_1 (cp/pt.c:15945)
#4 instantiate_template (cp/pt.c:15995)
#5 instantiate_alias_template (cp/pt.c:11827)
#6: lookup_template_class_1
#7: lookup_template_class
#8: finish_template_type
#9: cp_parser_template_id


Well, in general I would have assumed to be able to get alias decl of
tmpl. Wasn't able to find a simple way to get it. So, by looking into
source I found that most cases handling args  tmpl-args by using
inner_most_template_args instead. Defaulting to
innermost_template_args looks indeed a bit wrong, but seemed to work.

Kai


Including a file from include/ in gcc/*.h (was: [gomp4] Use include/gomp-constants.h more actively)

2014-12-18 Thread Thomas Schwinge
Hi!

On Wed, 17 Dec 2014 23:26:53 +0100, I wrote:
 Committed to gomp-4_0-branch in r218840:
 
 commit febcd8dfdb10fa80edff0880973d1915ca2fef74
 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
 Date:   Wed Dec 17 22:26:24 2014 +
 
 Use include/gomp-constants.h more actively.

 diff --git gcc/tree-core.h gcc/tree-core.h
 index 743bc0d..fc61b88 100644
 --- gcc/tree-core.h
 +++ gcc/tree-core.h
 @@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
  #include alias.h
  #include flags.h
  #include symtab.h
 +#include gomp-constants.h
  
  /* This file contains all the data structures that define the 'tree' type.
 There are no accessor macros nor functions in this file. Only the

Is it actually OK to #include gomp-constants.h (living in
[GCC]/include/) from gcc/tree-core.h?  Isn't the tree-core.h file getting
installed, for later use by plugins -- which then need to be able to find
gomp-constants, too, but that is not being installed?

 --- include/gomp-constants.h
 +++ include/gomp-constants.h


Grüße,
 Thomas


pgpwBcuIrVxlU.pgp
Description: PGP signature


Re: Including a file from include/ in gcc/*.h (was: [gomp4] Use include/gomp-constants.h more actively)

2014-12-18 Thread Jakub Jelinek
On Thu, Dec 18, 2014 at 07:25:03PM +0100, Thomas Schwinge wrote:
 Hi!
 
 On Wed, 17 Dec 2014 23:26:53 +0100, I wrote:
  Committed to gomp-4_0-branch in r218840:
  
  commit febcd8dfdb10fa80edff0880973d1915ca2fef74
  Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
  Date:   Wed Dec 17 22:26:24 2014 +
  
  Use include/gomp-constants.h more actively.
 
  diff --git gcc/tree-core.h gcc/tree-core.h
  index 743bc0d..fc61b88 100644
  --- gcc/tree-core.h
  +++ gcc/tree-core.h
  @@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
   #include alias.h
   #include flags.h
   #include symtab.h
  +#include gomp-constants.h
   
   /* This file contains all the data structures that define the 'tree' type.
  There are no accessor macros nor functions in this file. Only the
 
 Is it actually OK to #include gomp-constants.h (living in
 [GCC]/include/) from gcc/tree-core.h?  Isn't the tree-core.h file getting
 installed, for later use by plugins -- which then need to be able to find
 gomp-constants, too, but that is not being installed?

Generally, it must be possible to include include/ headers from gcc/
headers, even when they are used by plugins.  Otherwise system.h including
libiberty.h and safe-ctype.h etc. wouldn't work.  Perhaps you need to add
gomp-constants.h to some Makefile variable or something, look at how is
safe-ctype.h etc. handled.

That said, including gomp-constants.h from tree-core.h is I think very much
against all the Andrew's efforts to flatten headers (which is something I'm
not very happy with generally, but in this case, I think only the very few
files that really need the constants should include it).

Jakub


Re: [PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread Rasmus Villemoes
On Thu, Dec 18 2014, H. Peter Anvin h...@zytor.com wrote:

 On 12/18/2014 09:43 AM, H.J. Lu wrote:
 
 Peter, please feel free to use my kernel patch or create a different
 one.
 

 Great, thanks!


Thanks for accepting this idea, and especially to H.J. for already
having done all the work.

Minor thing: If it's not too late, I'd appreciate a 'Suggested-by' or
similar mention in the kernel change log.

Thanks,
Rasmus


Re: [PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread H. Peter Anvin
On 12/18/2014 10:37 AM, Rasmus Villemoes wrote:
 
 Minor thing: If it's not too late, I'd appreciate a 'Suggested-by' or
 similar mention in the kernel change log.
 

I think we can get that.

-hpa




Re: [RFC, PATCH, fortran] PR fortran/60255 Deferred character length

2014-12-18 Thread Andre Vehreschild
Hi all,

here is my next try on proposing a patch for the issue in pr60255. It took me
quite some time to understand the intricacies with handling variables
associated in a select type. I think I got most of the issues fixed now:

- Added generation of _len component for each unlimited polymorphic pointer.
- Removed (my own) _len component creation routine.
- Removed the double underscore in get_len_component().
- Associating an unlimited polymorphic entity to a deferred char array now lets
  the deferred char array use the actual string length from the '_len'
  component of the unlimited polymorphic entity for the charlen instead of the
  size component of the vptr.
- Removed: Generating a special vtab name for deferred strings. A deferred
  string assigned to the unlimited polymorphic entity is now stored as having 
  charlen zero again.
- Basic support for char array arrays (No stuttering here) in u-poly variables.

Bootstraps ok on x86_64-linux-gnu. Comparing regtests I get a difference in
unlimited_polymorphic_2.f90 that I don't understand yet. May be that is only,
because one error message disappeared.

Attached is the full patch for trunk and a delta patch for those of you who
already have my pr60255_3 added.

I don't provide a changelog entry yet, because I think review will find some
issues still to fix. So, comments welcome!

Regards,
Andre

On Tue, 9 Dec 2014 14:16:05 +0100
Dominique d'Humières domi...@lps.ens.fr wrote:

 Dear Andre,
 
 The patch causes an ICE for the test gfortran.dg/unlimited_polymorphic_1.f03:
 
 f951: internal compiler error: in gfc_add_component_ref, at
 fortran/class.c:236
 
 f951: internal compiler error: Abort trap: 6
 gfc: internal compiler error: Abort trap: 6 (program f951)
 Abort
 
 Reduced test for which the ICE is triggered by ‘len(w)'
 
 MODULE m
 
 contains
   subroutine bar (arg, res)
 class(*) :: arg
 character(100) :: res
 select type (w = arg)
   type is (character(*))
 write (res, '(I2)') len(w)
 end select
   end subroutine
 
 END MODULE
 
 Note that with your patch at
 https://gcc.gnu.org/ml/fortran/2014-08/msg00022.html, I get the same ICE for
 the Mikael’s test at https://gcc.gnu.org/ml/fortran/2014-08/msg00055.html
 (before your patch for pr60255, it used to give a wrong length: 80 instead of
 20 AFAICR).
 
 Note that the assert at fortran/class.c:236 is also triggered for pr61115.
 
 Thanks for working on these issues,
 
 Dominique
 
  On 8 December 2014 at 18:38, Andre Vehreschild ve...@gmx.de wrote:
  Hi all,
  
  please find attached a more elaborate patch for pr60255. I totally agree
  that my first attempt was just scratching the surface of the work needed.
  
  This patch also is *not* complete, but because I am really new to gfortran
  patching, I don't want to present a final patch only to learn then, that I
  have violated design rules, common practice or the like. Therefore please
  comment and direct me to any sources/ideas to improve the patch.
  
  Topic:
  The pr 60255 is about assigning a char array to an unlimited polymorphic
  entity. In the comments the concern about the lost length information is
  raised. The patch adds a _len component to the unlimited polymorphic entity
  (after _data and _vtab) and adds an assignment of the string length to _len
  when a string is pointer assigned to the unlimited poly entity.
  Furthermore is the intrinsic len(unlimited poly pointing to a string)
  resolved to give the _len component.
  
  Yet missing:
  - assign _len component back to deferred char array length component
  - transport length along chains of unlimited poly entities, i.e., a = b;
  c = a where all objects are unlimited poly and b is a string.
  - allocate() in this context
  
  Patch dependencies:
  none
  
  Comments, concerns, candy welcome!
  
  Regards,
 Andre
 
 


-- 
Andre Vehreschild * Kreuzherrenstr. 8 * 52062 Aachen
Tel.: +49 241 9291018 * Email: ve...@gmx.de 
diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index 29e31e1..f5a815c 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -34,6 +34,12 @@ along with GCC; see the file COPYING3.  If not see
  (pointer/allocatable/dimension/...).
 * _vptr: A pointer to the vtable entry (see below) of the dynamic type.
 
+Only for unlimited polymorphic classes:
+* _len:  An integer(4) to store the string length when the unlimited
+ polymorphic pointer is used to point to a char array. The '_len'
+ component will be zero when no character array is stored in
+ '_data'.
+
For each derived type we set up a vtable entry, i.e. a structure with the
following fields:
 * _hash: A hash value serving as a unique identifier for this type.
@@ -544,10 +550,41 @@ gfc_intrinsic_hash_value (gfc_typespec *ts)
 }
 
 
+/* Get the _len component from a class/derived object storing a string.  */
+
+gfc_expr *
+gfc_get_len_component (gfc_expr *e)
+{
+  gfc_expr 

Re: [PATCH 0/9] Enable LRA on SH

2014-12-18 Thread Vladimir Makarov

On 2014-12-17 7:56 PM, Kaz Kojima wrote:

This patch series is to make SH target use LRA and is discussed
in the PR target/55212:

[SH] Switch to LRA
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212

I think there are issues which should be solved before defaulting
LRA on SH and the helps from the experts are needed.  The series
are constructed as

[PATCH 1/9] LRA: Take account implicit usage of pseudo reg in mem arg
[PATCH 2/9] LRA: Swap base_term and index_term for some case
[PATCH 3/9] Add TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV target macro
[PATCH 4/9] Add TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT target macro
[PATCH 5/9] [SH] Add -mlra option
[PATCH 6/9] [SH] Miscellaneous changes for LRA
[PATCH 7/9] [SH] Modify movsi_ie movsf_ie patterns for LRA
[PATCH 8/9] [SH] Add splitter to addsi3_compact
[PATCH 9/9] [SH] Split QI/HImode load/store via r0

Patch 1-4 touch generic files and 5-9 are SH specific.  1-3, 6 and 7
are for getting rid of ICEs.  4, 8 and 9 are to fix some serious
code regressions.  The patches are tested with bootstrap and the top
level make -k check on i686-pc-linux-gnu with no new failures.
-mlra makes ~0.3% code size regression on the current sh-lra branch.
It solves a few known failures and causes a few new failures with
make -k check on sh4-unknown-linux-gnu.



Thank you very much for your work on porting LRA to SH!  I know it
is quite a challenge to do any RA work for this target.  And sorry,
I had no time to pay attention to this work.  To be honest I thought
that SH will be one of the last target ported to LRA.  It is a surprise
for me, that porting SH to LRA does not need many changes in LRA itself.

I'll try to review the first 4 patches as soon as possible although I'll 
be on 2 week vacation since tomorrow.




Re: [PATCH 0/9] Enable LRA on SH

2014-12-18 Thread Oleg Endo
On Thu, 2014-12-18 at 13:42 -0500, Vladimir Makarov wrote:

 Thank you very much for your work on porting LRA to SH!  I know it
 is quite a challenge to do any RA work for this target.  

The R0-ness (both, GP reg r0 and FP reg fr0) of the ISA taxes the
patience sometimes.  Too often, I've been thinking about hiding the
r0/fr0 regs from the RA process and do some custom things
pre-RA/post-RA ... hopefully LRA will stop me from thinking more about
that :)

Cheers,
Oleg



Re: [PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread Mike Stump
On Dec 18, 2014, at 9:43 AM, H.J. Lu hjl.to...@gmail.com wrote:
 + arguments passed in vector registers.  This can be used to
 + optimize Linux kernel./li

English, to optimize _the_ Linux kernel?


Re: [debug-early] reuse variable DIEs and fix their context

2014-12-18 Thread Aldy Hernandez

Hi Jason.

It's embarrassing that I just got to this now.  I hope you don't repay 
the favor and take as long responding to me :(.


On 09/12/14 08:15, Jason Merrill wrote:

On 09/11/2014 08:51 PM, Aldy Hernandez wrote:



   /* Generate hidden aliases for Java.  */
-  if (candidates)
+  if (java_hidden_aliases)
 {
-  build_java_method_aliases (candidates);
-  delete candidates;
+  build_java_method_aliases (java_hidden_aliases);
+  delete java_hidden_aliases;
 }


Didn't it work to move this before finalize?  I think the VTV stuff is
all that really needs to come after it, and that can move out of the
front end if this hook is a problem (which I don't really think it is).


I can't move the call to build_java_method_aliases until the compilation 
proper has run because said function iterates through all the functions 
with FOR_EACH_FUNCTION, and the set of available functions at parse time 
and after the optimization passes are quite different, presumably 
because we've pruned all the unused functions.


If I move the call to build_java_method_aliases before the optimization 
passes, we end up emitting many more hidden aliases which cause 
handle_alias_pairs() in cgraphunit.c to bark with:


error: 'void _ZGAN8__JArrayC4Ev()' aliased to external symbol 
'_ZN8__JArrayC4Ev'


So we either leave building Java method aliases after the optimization 
passes, or we somehow tighten the candidate selection in 
collect_candidates_for_java_method_aliases():


  if (DECL_CLASS_SCOPE_P (fndecl)
   TYPE_FOR_JAVA (DECL_CONTEXT (fndecl))
   TARGET_USE_LOCAL_THUNK_ALIAS_P (fndecl))

What do you suggest?

I can, however, move all the other non-VTV stuff to the parser in a 
subsequent patch, while you respond to this query.


Thanks.
Aldy


Re: [PATCH 0/9] Enable LRA on SH

2014-12-18 Thread Mike Stump
On Dec 18, 2014, at 10:42 AM, Vladimir Makarov vmaka...@redhat.com wrote:
 Thank you very much for your work on porting LRA to SH!

I’ve ported my target to LRA, and I have not found any LRA bits I needed to 
change.


Re: [PATCH] X86-64: Add -mskip-rax-setup

2014-12-18 Thread H.J. Lu
On Thu, Dec 18, 2014 at 11:22 AM, Mike Stump mikest...@comcast.net wrote:
 On Dec 18, 2014, at 9:43 AM, H.J. Lu hjl.to...@gmail.com wrote:
 + arguments passed in vector registers.  This can be used to
 + optimize Linux kernel./li

 English, to optimize _the_ Linux kernel?

Updated.

-- 
H.J.
---
Index: gcc-5/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
retrieving revision 1.52
diff -u -p -r1.52 changes.html
--- gcc-5/changes.html 15 Dec 2014 19:55:08 - 1.52
+++ gcc-5/changes.html 18 Dec 2014 19:35:28 -
@@ -444,6 +444,10 @@ void operator delete[] (void *, std::siz
  place of the __fentry__ or mcount call, so that a call per function
  can be later patched in. This can be used for low overhead tracing or
  hot code patching./li
+ liThe new code-mskip-rax-setup/code option to skip setting
+ up RAX register when SSE is disabled and there are no variable
+ arguments passed in vector registers.  This can be used to
+ optimize the Linux kernel./li
   /ul

 h3 id=shSH/h3


Re: [PATCH 2/4] New data structure for cgraph_summary introduced.

2014-12-18 Thread Jan Hubicka
 2014-12-08  Martin Liska  mli...@suse.cz
 
   * cgraph.h (symbol_table::allocate_cgraph_symbol): Summary UID
   is filled up.
   * symbol-summary.h: New file.
   * gengtype.c (open_base_files): Add symbol-summary.h.
   * toplev.c (general_init): Call constructor of symbol_table.
 ---
  gcc/cgraph.h |   8 ++
  gcc/gengtype.c   |   4 +-
  gcc/symbol-summary.h | 281 
 +++
  gcc/toplev.c |   3 +-
  4 files changed, 293 insertions(+), 3 deletions(-)
  create mode 100644 gcc/symbol-summary.h
 
 diff --git a/gcc/cgraph.h b/gcc/cgraph.h
 index a5c5f56..1664bd7 100644
 --- a/gcc/cgraph.h
 +++ b/gcc/cgraph.h
 @@ -1237,6 +1237,8 @@ public:
int count_materialization_scale;
/* Unique id of the node.  */
int uid;
 +  /* Summary unique id of the node.  */
 +  int summary_uid;

Hmm, can't we just use uid here?  I guess the only difference is that summary
uid is not kept dense, unlike the uid.
This should not propagate into much of trouble simply because the summary 
datastructure
should safely remove the summaires via removal hook.
 +
 +/* We want to pass just pointer types as argument for function_summary
 +   template class.  */
 +
 +template class T
 +class function_summary
 +{
 +private:
 +  function_summary();
 +};
 +
 +template class T
 +class GTY((user)) function_summary T *

Eventually I would like this to allow attaching summaries to variables (and 
symbols in general),
too. But currently we do not have use for it, so we can care about this later.

The patch is OK.  Preferrably with summary_uid replaced by uid if it is easily 
doable.

Honza


Re: [PATCH 3/4] First usage of cgraph_summary in ipa-prop pass.

2014-12-18 Thread Jan Hubicka
 
 gcc/lto/ChangeLog:
 
 2014-12-08  Martin Liska  mli...@suse.cz
 
   * lto-partition.c: Include of symbol-summary.h is added.
   * lto-symtab.c: Likewise.
   * lto.c: Likewise.

I must say I am not friend of flattening. Hope this will be resolved for 5.1
 
 gcc/ChangeLog:
 
 2014-12-08  Martin Liska  mli...@suse.cz
 
   * auto-profile.c: Include of symtab-summary.h is added.
   * cgraph.c: Likewise.
   * cgraphbuild.c: Likewise.
   * cgraphclones.c: Likewise.
   * cgraphunit.c: Likewise.
   * ipa-cp.c: Likewise.
   * ipa-devirt.c: Likewise.
   * ipa-icf.c: Likewise.
   * ipa-inline-analysis.c (evaluate_properties_for_edge): New
   ipa_node_params_d data structure is used.
   (inline_node_duplication_hook): Likewise.
   (estimate_function_body_sizes): Likewise.
   (remap_edge_change_prob): Likewise.
   (inline_merge_summary): Likewise.
   * ipa-inline-transform.c: Include of symtab-summary.h is added.
   * ipa-inline.c (early_inliner): New ipa_node_params_d data structure
   is used.
   * ipa-polymorphic-call.c: Include of symtab-summary.h is added.
   * ipa-profile.c: Include of symtab-summary.h is added.
   * ipa-prop.c (ipa_propagate_indirect_call_infos): New ipa_node_params_d
   data structure is used.
   (ipa_node_params::~ipa_node_params): New function.
   (ipa_free_all_node_params): Destruction is simplified.
   (ipa_node_removal_hook): Removed.
   (ipa_add_new_function): Renamed from ipa_node_duplication_hook.
   (ipa_node_params_t::duplicate): New function.
   (ipa_register_cgraph_hooks): Few hooks are removed.
   (ipa_unregister_cgraph_hooks): Likewise.
   (ipa_prop_write_jump_functions): New ipa_node_params_d is used.
   * ipa-prop.h (struct ipa_node_params): Destructor instroduced for
   the structure.
   (ipa_check_create_node_params): Vector for ipa_node_params is replaced
   with function_summary.
   * ipa-split.c: Include of symtab-summary.h is added.
   * ipa-utils.c: Include of symtab-summary.h is added.
   * ipa.c: Include of symtab-summary.h is added.
   * omp-low.c: Include of symtab-summary.h is added.
   * tree-inline.c: Include of symtab-summary.h is added.
   * tree-sra.c: Include of symtab-summary.h is added.
   * tree-ssa-pre.c: Include of symtab-summary.h is added.

OK.
Honza


Re: [PATCH 4/4] Data structure is used for inline_summary struct.

2014-12-18 Thread Jan Hubicka
 
 2014-12-08  Martin Liska  mli...@suse.cz
 
   * lto-partition.c (add_symbol_to_partition_1): New inline_summary_d
   is used.
   (undo_partition): Likewise.
   (lto_balanced_map): Likewise.
 
 gcc/ChangeLog:
 
 2014-12-08  Martin Liska  mli...@suse.cz
 
   * cgraphunit.c (symbol_table::process_new_functions): New 
 inline_summary_d
   is used.
   * ipa-cp.c (ipcp_cloning_candidate_p): Likewise.
   (devirtualization_time_bonus): Likewise.
   (estimate_local_effects): Likewise.
   (ipcp_propagate_stage): Likewise.
   * ipa-inline-analysis.c (evaluate_conditions_for_known_args): Likewise.
   (evaluate_properties_for_edge): Likewise.
   (inline_summary_alloc): Likewise.
   (reset_inline_summary): New inline_summary argument is introduced.
   (inline_summary_t::remove): New function.
   (inline_summary_t::duplicate): Likewise.
   (dump_inline_edge_summary): New inline_summary_d is used.
   (dump_inline_summary): Likewise.
   (estimate_function_body_sizes): Likewise.
   (compute_inline_parameters): Likewise.
   (estimate_edge_devirt_benefit): Likewise.
   (estimate_node_size_and_time): Likewise.
   (inline_update_callee_summaries): Likewise.
   (inline_merge_summary): Likewise.
   (inline_update_overall_summary): Likewise.
   (simple_edge_hints): Likewise.
   (do_estimate_edge_time): Likewise.
   (estimate_time_after_inlining): Likewise.
   (estimate_size_after_inlining): Likewise.
   (do_estimate_growth): Likewise.
   (growth_likely_positive): Likewise.
   (inline_generate_summary): Likewise.
   (inline_read_section): Likewise.
   (inline_read_summary): Likewise.
   (inline_write_summary): Likewise.
   (inline_free_summary): Likewise.
   * ipa-inline-transform.c (clone_inlined_nodes): Likewise.
   (inline_call): Likewise.
   * ipa-inline.c (caller_growth_limits): Likewise.
   (can_inline_edge_p): Likewise.
   (want_early_inline_function_p): Likewise.
   (compute_uninlined_call_time): Likewise.
   (compute_inlined_call_time): Likewise.
   (big_speedup_p): Likewise.
   (want_inline_small_function_p): Likewise.
   (edge_badness): Likewise.
   (update_caller_keys): Likewise.
   (update_callee_keys): Likewise.
   (recursive_inlining): Likewise.
   (inline_small_functions): Likewise.
   (inline_to_all_callers): Likewise.
   (dump_overall_stats): Likewise.
   (early_inline_small_functions): Likewise.
   * ipa-inline.h: New class inline_summary_t replaces
   vecinline_summary_t.
   * ipa-split.c (execute_split_functions): New inline_summary_d is used.
   * ipa.c (walk_polymorphic_call_targets): Likewise.
   * tree-sra.c (ipa_sra_preliminary_function_checks): Likewise.

Patch is OK. I only changed mind about inline_summary_d.  It does not look good 
used
with inline_summary_d-get() especially because _d prefix was used in 
C++ication days
for type names only.
Probably inline_summaries-get() is better.

OK with this change.

Thanks,
Honza


[debug-early] C++: moving more things before the compilation proper

2014-12-18 Thread Aldy Hernandez

Hi Jason.

As promised, I am moving most everything out of 
cxx_post_compilation_parsing_cleanups() which is the hook that will be 
called after the compilation proper has run 
(symtab-finalize_compilation_unit()).


Apart from the Java hidden alias support which I just asked you about, 
and the VTV stuff which will remain in the aforementioned hook, the only 
remaining bits are:


  input_location = locus_at_end_of_parsing;

#ifdef ENABLE_CHECKING
  validate_conversion_obstack ();
#endif /* ENABLE_CHECKING */

I am keeping the input_location bit because the final VTV stuff has some 
error() calls which presumably will need a location.


And I am keeping validate_conversion_obstack() because I *think* the VTV 
stuff will create calls?  If not, let me know and I can move this bit too.


In any case, things are pretty deserted in 
cxx_post_compilation_parsing_cleanups(), which should make everybody 
happy :-).


I am committing the following to the branch.  Let me know if you have 
any suggestions.


Thanks.
Aldy
commit 0cf41add70fad5defc4ff07b76f95e3aaadc1b05
Author: Aldy Hernandez al...@redhat.com
Date:   Thu Dec 18 11:47:19 2014 -0800

* cp/decl2.c (cxx_post_compilation_parsing_cleanups): Move some
final checks...
(c_parse_final_cleanups): ...here.

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index f9d1028..6441ab2 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -4723,6 +4723,21 @@ c_parse_final_cleanups (void)
  pending_statics-length (),
  EMIT_DEBUG_EARLY);
 }
+
+  perform_deferred_noexcept_checks ();
+
+  finish_repo ();
+
+  /* The entire file is now complete.  If requested, dump everything
+ to a file.  */
+  dump_tu ();
+
+  if (flag_detailed_statistics)
+{
+  dump_tree_statistics ();
+  dump_time_statistics ();
+}
+
   timevar_stop (TV_PHASE_DBGINFO);
   timevar_start (TV_PHASE_PARSING);
 }
@@ -4746,8 +4761,6 @@ cxx_post_compilation_parsing_cleanups (void)
   vtv_generate_init_routine ();
 }
 
-  perform_deferred_noexcept_checks ();
-
   /* Generate hidden aliases for Java.  */
   if (java_hidden_aliases)
 {
@@ -4755,17 +4768,6 @@ cxx_post_compilation_parsing_cleanups (void)
   delete java_hidden_aliases;
 }
 
-  finish_repo ();
-
-  /* The entire file is now complete.  If requested, dump everything
- to a file.  */
-  dump_tu ();
-
-  if (flag_detailed_statistics)
-{
-  dump_tree_statistics ();
-  dump_time_statistics ();
-}
   input_location = locus_at_end_of_parsing;
 
 #ifdef ENABLE_CHECKING


[PATCH, rs6000] PR59708, Fix operand ordering for 128-bit andc/orc

2014-12-18 Thread Pat Haugen
The following patch fixes testsuite failures 
builtin-arith-overflow-[10,11].c. In the case where 
rs6000_split_logical() was being called the operands were swapped such 
that the wrong operand was being complemented.


Bootstrap/tested on powerpc64-linux with no new regressions. Ok for 
trunk (and 4.9/4.8 after bootstrap/regtest)?


2014-12-18  Pat Haugen  pthau...@us.ibm.com

* config/rs6000/rs6000.md (boolcmode3_internal1): Fix operand
ordering.


Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md	(revision 218827)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -7554,19 +7554,21 @@ (define_insn_and_split *boolmode3_int
 	 (const_string 16])
 
 ;; 128-bit ANDC/ORC
+;;   In the case where rs6000_split_logical is called, the NOT'd operand
+;;   must be opnd1 in order for the split insns to be recognized.
 (define_insn_and_split *boolcmode3_internal1
   [(set (match_operand:BOOL_128 0 vlogical_operand =BOOL_REGS_OUTPUT)
 	(match_operator:BOOL_128 3 boolean_operator
 	 [(not:BOOL_128
-	   (match_operand:BOOL_128 2 vlogical_operand BOOL_REGS_OP1))
-	  (match_operand:BOOL_128 1 vlogical_operand BOOL_REGS_OP2)]))]
+	   (match_operand:BOOL_128 1 vlogical_operand BOOL_REGS_OP1))
+	  (match_operand:BOOL_128 2 vlogical_operand BOOL_REGS_OP2)]))]
   TARGET_P8_VECTOR || (GET_CODE (operands[3]) == AND)
 {
   if (TARGET_VSX  vsx_register_operand (operands[0], MODEmode))
-return xxl%q3 %x0,%x1,%x2;
+return xxl%q3 %x0,%x2,%x1;
 
   if (TARGET_ALTIVEC  altivec_register_operand (operands[0], MODEmode))
-return v%q3 %0,%1,%2;
+return v%q3 %0,%2,%1;
 
   return #;
 }


patch to solve PR64291

2014-12-18 Thread Vladimir Makarov

  The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64291

  It is a bug in a new rematerialization subpass of LRA.

The patch was bootstrapped on x86/x86-64.

Committed as rev. 218874.

2014-12-18  Vladimir Makarov  vmaka...@redhat.com

PR rtl-optimization/64291
* lra-remat.c (bad_for_rematerialization_p): Add UNPSEC_VLOATILE.
(create_cands): Process only output reload insn with potential
cands.

2014-12-18  Vladimir Makarov  vmaka...@redhat.com

PR rtl-optimization/64291
* testsuite/gcc.target/i386/pr64291-[12].c: New tests.

Index: lra-remat.c
===
--- lra-remat.c (revision 218685)
+++ lra-remat.c (working copy)
@@ -350,12 +350,12 @@ finish_cand_table (void)
 
 
 
-/* Return true if X contains memory or UNSPEC.  We can not just check
-   insn operands as memory or unspec might be not an operand itself
-   but contain an operand.  Insn with memory access is not profitable
-   for rematerialization.  Rematerialization of UNSPEC might result in
-   wrong code generation as the UNPEC effect is unknown
-   (e.g. generating a label).  */
+/* Return true if X contains memory or some UNSPEC.  We can not just
+   check insn operands as memory or unspec might be not an operand
+   itself but contain an operand.  Insn with memory access is not
+   profitable for rematerialization.  Rematerialization of UNSPEC
+   might result in wrong code generation as the UNPEC effect is
+   unknown (e.g. generating a label).  */
 static bool
 bad_for_rematerialization_p (rtx x)
 {
@@ -363,7 +363,7 @@ bad_for_rematerialization_p (rtx x)
   const char *fmt;
   enum rtx_code code;
 
-  if (MEM_P (x) || GET_CODE (x) == UNSPEC)
+  if (MEM_P (x) || GET_CODE (x) == UNSPEC || GET_CODE (x) == UNSPEC_VOLATILE)
 return true;
   code = GET_CODE (x);
   fmt = GET_RTX_FORMAT (code);
@@ -406,7 +406,7 @@ operand_to_remat (rtx_insn *insn)
 if (reg-regno == STACK_POINTER_REGNUM  frame_pointer_needed)
   return -1;
 else if (reg-type == OP_OUT  ! reg-subreg_p
-find_regno_note (insn, REG_UNUSED, reg-regno) == NULL)
+ find_regno_note (insn, REG_UNUSED, reg-regno) == NULL)
   {
/* We permits only one spilled reg.  */
if (found_reg != NULL)
@@ -508,11 +508,14 @@ create_cands (void)
 
if ((set = single_set (insn)) != NULL
 REG_P (SET_SRC (set))  REG_P (SET_DEST (set))
-(src_regno = REGNO (SET_SRC (set))) = FIRST_PSEUDO_REGISTER
+((src_regno = REGNO (SET_SRC (set)))
+   = lra_constraint_new_regno_start)
 (dst_regno = REGNO (SET_DEST (set))) = FIRST_PSEUDO_REGISTER
 reg_renumber[dst_regno]  0
 (insn2 = regno_potential_cand[src_regno].insn) != NULL
 BLOCK_FOR_INSN (insn2) == BLOCK_FOR_INSN (insn))
+ /* It is an output reload insn after insn can be
+rematerialized (potential candidate).  */
  create_cand (insn2, regno_potential_cand[src_regno].nop, dst_regno);
if (nop  0)
  goto fail;
Index: testsuite/gcc.target/i386/pr64291-1.c
===
--- testsuite/gcc.target/i386/pr64291-1.c   (revision 0)
+++ testsuite/gcc.target/i386/pr64291-1.c   (working copy)
@@ -0,0 +1,51 @@
+/* { dg-options -O2 } */
+/* { dg-additional-sources pr64291-2.c } */
+/* { dg-do run } */
+void f(void*,...);
+void g(void*,long,long);
+int nnn=0;
+long test=0;
+
+typedef struct
+{
+  int _mp_size;
+  unsigned long *_mp_d;
+} __mpz_struct;
+typedef __mpz_struct mpz_t[1];
+
+int main ()
+{
+  mpz_t n, d;
+  long nn, dn;
+  unsigned long *np, *dup, *dnp, *qp;
+  long alloc, itch;
+
+  f (n);
+  f (d);
+  qp = (unsigned long*)__builtin_alloca(4099*8) + 1;
+  dnp = (unsigned long*)__builtin_alloca (2049*8);
+  alloc = 1;
+  for (test = 0; test  1; test++)
+{
+  dn = d-_mp_size;
+  dup = d-_mp_d;
+  f (dnp, dup, dn);
+  dnp[dn - 1] |= 1UL63;
+  f (0);
+  nn = nnn;
+  np = n-_mp_d;
+  qp[-1] = -757136820;
+  qp[nn - dn + 1] = 14883681;
+  f (0);
+  if (dn = 6)
+   f (0);
+  itch = nn + 1;
+  if (itch + 1 alloc)
+   {
+ g(0,alloc*8,(itch+1)*8);
+ alloc = itch + 1;
+   }
+  f (np, nn);
+}
+  return 0;
+}
Index: testsuite/gcc.target/i386/pr64291-2.c
===
--- testsuite/gcc.target/i386/pr64291-2.c   (revision 0)
+++ testsuite/gcc.target/i386/pr64291-2.c   (working copy)
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+extern void abort (void);
+void f(void*p,...){}
+void g(void*p,long a,long b){if (a!=8) abort();}


Re: [RFC PATCH 3/9] Add TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV target macro

2014-12-18 Thread Vladimir Makarov

On 2014-12-17 8:00 PM, Kaz Kojima wrote:

This was discussed in PR55212

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c25
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c58
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c69

and is to fix another ICEs in assign_by_spills.
This patch introduces a new target hook to disable replacing with
memory equivalence if the macro returns true.
The problem happens for r0 again.  SH ISA has many instructions with
r0-parity.  The index register is r0 only, only r0 can be used as
QI/HImode load/store target/source, and so on.
It seems that replacing mem equiv makes register pressure high
especially on r0 and fails in spilling.  Also replacing mem equiv
doesn't win CSiBE in average on this target.  Please see the above
trail #c69 for CSiBE numbers.

I think the one design goal of LRA is to avoid target macros if possible.
There might be some way to handle the above situation without target
macros, though it's over my head.



I don't think it will be easy to solve this problem by following reg 
pressure.  I guess R0 will be in the same pressure reg class.  I believe 
some hook is an adequate solution here.



--
* lra-constraints.c (get_equiv): Don't return memory equivalence
when targetm.cannot_substitute_mem_equiv_p is true.
* target.def (cannot_substitute_mem_equiv_p): New hook.
* config/sh/sh.c (sh_cannot_substitute_mem_equiv_p): New function.
(TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P): Define.
* doc/tm.texi.in (TARGET_CANNOT_SUBSTITUTE_MEM_EQUIV_P): New hook.
* doc/tm.texi: Regenerate.



LRA part of the patch is ok.  I found one typo in and the speed on 
avarage working sets.  (average).


But I'd recommend to try different hook implementation checking r0 in 
the address as still equiv memory substitution might be profitable for 
SH in some cases.





Re: [RFC PATCH 2/9] LRA: Swap base_term and index_term for some case

2014-12-18 Thread Vladimir Makarov

On 2014-12-17 7:59 PM, Kaz Kojima wrote:

This was discussed in PR55212

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c52
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c53
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c55

and is to fix ICE in assign_by_spills.
In that case, rtlanal.c:decompose_mem_address guesses base and index
regs when INDEX_REG_CLASS is assigned to that base candidate already.
This will be OK for almost targets but is bad for SH which has only
one register r0 in INDEX_REG_CLASS.  The patch fix-ups it with
swapping these candidates.

--
* lra-constraints.c (process_address_1): Swap base_term and
index_term if INDEX_REG_CLASS is assigned to base_term already
when INDEX_REG_CLASS is a single register class.



The patch is ok.  Although I will probably try to generalize it in future.

Thanks, again.





Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting

2014-12-18 Thread Jiong Wang
2014-12-18 17:00 GMT+00:00 Jiong Wang jiong.w...@arm.com:
 ok for trunk?

 gcc/
 PR62173
  loop-invariant.c.c (expensive_addr): New hash_table.
  (need_expensive_addr_check_p): New bool.
  (find_exits): Rename to find_exists_and_reshuffle.
  Support re-shuffle instructions for better loop invariant hoisting.

another question is, is it safe to re-use REG_DEAD info here without
calling df_note_add_problem and df_analysis first?
am I using those info passed down from the previous pass which
calculated these info and maybe broken?


Re: [PATCH 0/9] Enable LRA on SH

2014-12-18 Thread Vladimir Makarov

On 2014-12-18 2:26 PM, Mike Stump wrote:

On Dec 18, 2014, at 10:42 AM, Vladimir Makarov vmaka...@redhat.com wrote:

Thank you very much for your work on porting LRA to SH!


I’ve ported my target to LRA, and I have not found any LRA bits I needed to 
change.



Thanks, Mike.  When I did a few first ports, it never happened to me. 
First steps are hard but next ones are easier.


Back in Cygnus times SH was always a problem for reload and RA.  A lot 
of regs but one reg is used for many purposes, tiny and mode dependent 
displacements.  A lot of optimizations were implemented specifically for 
SH.  SH optimizations were one reason why reload became so complicated.




Re: [RFC PATCH 4/9] Add TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT target macro

2014-12-18 Thread Vladimir Makarov

On 2014-12-17 8:00 PM, Kaz Kojima wrote:

This patch adds a target macro discussed in PR55212

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c76
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c78

is to avoid bad codes on SH.
SH has very limited base+displacement addressing and it looks
that the old reload has some mechanism so to split


Yes, that is probably last thing of reload missed in LRA.  As I remember 
Joern wrote this very long ago specifically for SH.  postreload pass 
also has some code to deal with different displacements and their costs 
(if there are no constraints on the constants).



   reg := mem[base + bad_displacement]
to
   new_base := base + (bad_displacement - legitimate_displacement)
   reg := mem[new_base + legitimate_displacement]
The new target macro is to give same effect on LRA.
This helps SH especially for the address like stack_pointer + offset
where offset has 4-bit except for SH2A.
I've tried to define this target macro for ARM thumb and found
only one CSiBE test replaypc-0.4.0.preproc/build-ndx is improved
at 4 bytes with -Os.  I guess that 8-bit offset field for sp+offset
addressing on ARM thumb is very good for this issue.
Again it would be better if something can cover this case without
new target macros.
--
* lra-constraints.c (process_address_1): Try if target can split
displacement with targetm.legitimize_address_displacement.
* target.def (legitimize_address_displacement): New hook.
* targhooks.c (default_legitimize_address_displacement): New function.
* targhooks.h (default_legitimize_address_displacement): Declare.
* config/sh/sh.c (sh_legitimize_address_displacement): New function.
(TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT): Define.
* doc/tm.texi.in (TARGET_LEGITIMIZE_ADDRESS_DISPLACEMENT): New hook.
* doc/tm.texi: Regenerate.



It is an interesting solution.

The patch is ok for me.

One more thing though.  The patches you propose affect very sensitive 
parts of LRA.  They might break other ports currently using LRA.  If it 
happens, please, revert a breaking patch as I can not be a help next 2 
weeks.




Re: [RFC PATCH 1/9] LRA: Take account implicit usage of pseudo reg in mem arg

2014-12-18 Thread Vladimir Makarov

On 2014-12-17 7:57 PM, Kaz Kojima wrote:

This patch is discussed in PR55212

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c47
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c48
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55212#c49

and is to avoid segfault in remove_pseudos.
In the problematic case in #c47, a call_insn has a note for an argument
via memory:
   (expr_list:DF (use (mem:DF (reg/f:SI 699) [0  S8 A32]))
and this usage of pseudo 699 was missed by LRA.  spill_hard_reg[699]
doesn't have correct value and causes a segfault in remove_pseudos.

--
* lra.c (lra_update_insn_regno_info): Take account of arguments
via memory which could be implicit usage of pseudo regs.



It could be a solution but I believe the problem should be solved in 
some other way.  The patch will generate a correct code but it might 
also generate worse code.  The regs mentioned in insn data are always 
processed for liveness and adding the pseudo can make live range longer 
potentially worsening allocation of other pseudos.


  So I guess we need to find another solution for the problem.  I 
suspect spill_hard_reg is undefined as the pseudo 699 is present only in 
CALL_INSN_USAGE.  Otherwise, spill_hard_reg would be NULL.  A solution 
would be to figure out why there is only one reference, what does it 
mean, and what we expect at the end of RA instead of pseudo (e.g. 
anything, const or some register).  And add code correspondingly in 
remove_pseudos for case nrefs==0 or scratch pseudo (cases when 
spill_hard_reg is not initialized).  May be such code is the result of 
wrong transformations (e.g. incomplete reg equiv substitution chain), 
then it should be fixed somewhere else.


It would be nice to have a small test case for this problem with 
compiler options to reproduce it.




Re: [PATCH] Fix -fsanitize=float-cast-overflow with C FE (PR sanitizer/64344)

2014-12-18 Thread Joseph Myers
On Thu, 18 Dec 2014, Jakub Jelinek wrote:

 c/
   * c-typeck.c (convert_for_assignment, c_finish_return): For
   -fsanitize=float-cast-overflow casts from REAL_TYPE to integer/enum
   types also set in_late_binary_op around convert call.
   * c-convert.c (convert): For -fsanitize=float-cast-overflow REAL_TYPE
   to integral type casts, if not in_late_binary_op, pass c_fully_fold
   result on expr as last argument to ubsan_instrument_float_cast,
   if in_late_binary_op, don't use c_save_expr but save_expr.

The C front-end changes are OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


patch fixing a typo in LRA

2014-12-18 Thread Vladimir Makarov
  The following patch fixes a typo in usage of register_move_cost.  For 
most targets, the change does not matter as the cost function is 
symmetric relative to args.


  The patch was bootstrapped on x86-64.

  Committed as rev. 218875.

2014-12-18  Vladimir Makarov  vmaka...@redhat.com

* lra-constraints.c (lra-constraints.c): Exchange places of sclass
and dclass.
Index: lra-constraints.c
===
--- lra-constraints.c   (revision 218835)
+++ lra-constraints.c   (working copy)
@@ -3197,7 +3197,7 @@ simple_move_p (void)
   (sclass = get_op_class (src)) != NO_REGS
  /* The backend guarantees that register moves of cost 2
 never need reloads.  */
-  targetm.register_move_cost (GET_MODE (src), dclass, sclass) == 2);
+  targetm.register_move_cost (GET_MODE (src), sclass, dclass) == 2);
  }
 
 /* Swap operands NOP and NOP + 1. */


Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting

2014-12-18 Thread Segher Boessenkool
On Thu, Dec 18, 2014 at 05:00:01PM +, Jiong Wang wrote:
 On 17/12/14 15:54, Richard Biener wrote:
 ick.  I realize we don't have SSA form on RTL but doesn't DF provide
 at least some help in looking up definition statements for pseudos?
 In fact we want to restrict the transform to single-use pseudos, thus
 hopefully it can at least tell us that... (maybe not and this is what
 LOG_LINKS are for in combine...?)  At least loop-invariant alreadly
 computes df_chain with DF_UD_CHAIN which seems exactly what
 is needed (apart from maybe getting at single-use info).
 
 thanks very much for these inspiring questions.
 
 yes, we want to restrict the transformation on single-use pseudo only,
 and it's better the transformation could re-use existed info and helper
 function to avoid increase compile time. but I haven't found anything I
 can reuse at the stage the transformation happen.
 
 the info similar as LOG_LINKS is what I want, but maybe simpler. I'd study
 the code about build LOG_LINKS, and try to see if we can do some factor out.

LOG_LINKs in combine are just historical.  combine should be converted
to use DF fully.

LOG_LINKs have nothing to do with single use; they point from the _first_
use to its corresponding def.

You might want to look at what fwprop does instead.


Segher


C++ PATCH for c++/64251 (unnecessary template instantiation)

2014-12-18 Thread Jason Merrill
Here we were seeing errors because of template instantiation triggered 
by mark_used within fold_non_dependent_expr.  We don't want to 
instantiate templates based on uses in other uninstantiated templates, 
and we were already trying to avoid that; let's add another check to 
avoid it.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 619400bca71f10d05e6adfaaef53c67997fb3f57
Author: Jason Merrill ja...@redhat.com
Date:   Thu Dec 18 12:10:38 2014 -0500

	PR c++/64251
	* decl2.c (mark_used): Don't mark if in_template_function.

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index b2123f2..69201b0 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -5017,7 +5017,7 @@ mark_used (tree decl, tsubst_flags_t complain)
   --function_depth;
 }
 
-  if (processing_template_decl)
+  if (processing_template_decl || in_template_function ())
 return true;
 
   /* Check this too in case we're within instantiate_non_dependent_expr.  */
diff --git a/gcc/testsuite/g++.dg/template/non-dependent14.C b/gcc/testsuite/g++.dg/template/non-dependent14.C
new file mode 100644
index 000..b257d9b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/non-dependent14.C
@@ -0,0 +1,7 @@
+// PR c++/64251
+
+class DictionaryValue {};
+template typename T void CreateValue(T) {
+  DictionaryValue(0);
+  CreateValue(0);
+}


C++ PATCH for c++/64352 (decltype and SFINAE)

2014-12-18 Thread Jason Merrill
We were just missing a case where we want to pass complain down to avoid 
a hard error in SFINAE context.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 83bea8460ce8e14e5b9d1c0dc33a0d22b63b566d
Author: Jason Merrill ja...@redhat.com
Date:   Thu Dec 18 12:42:34 2014 -0500

	PR c++/64352
	* pt.c (tsubst_copy_and_build): Pass complain to mark_used.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 8a663d9..3ddee25 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15125,7 +15125,7 @@ tsubst_copy_and_build (tree t,
 
 	/* Remember that there was a reference to this entity.  */
 	if (DECL_P (function))
-	  mark_used (function);
+	  mark_used (function, complain);
 
 	/* Put back tf_decltype for the actual call.  */
 	complain |= decltype_flag;
diff --git a/gcc/testsuite/g++.dg/cpp0x/deleted9.C b/gcc/testsuite/g++.dg/cpp0x/deleted9.C
new file mode 100644
index 000..af97be7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/deleted9.C
@@ -0,0 +1,31 @@
+// PR c++/64352
+// { dg-do compile { target c++11 } }
+
+templatebool B struct bool_type
+{ static constexpr bool value = B; };
+
+using true_type = bool_typetrue;
+using false_type = bool_typefalse;
+
+templatetypename T T declval();
+
+templatetypename... struct void_ { using type = void; };
+templatetypename... I using void_t = typename void_I...::type;
+
+templatetypename _Tp, typename = void
+struct _Has_addressof_free: false_type { };
+
+templatetypename _Tp
+struct _Has_addressof_free
+_Tp, void_tdecltype( operator(declvalconst _Tp()) )
+: true_type { };
+
+struct foo {};
+void operator(foo) = delete;
+
+int main()
+{
+static_assert( !_Has_addressof_freeint::value,  );
+// error: use of deleted function 'void operator(foo)'
+static_assert( !_Has_addressof_freefoo::value,  );
+}


Re: [patch c++]: Fix PR/61198: Crash when selecting specializations through aliases.

2014-12-18 Thread Jason Merrill

On 12/18/2014 01:16 PM, Kai Tietz wrote:

Well, in general I would have assumed to be able to get alias decl of
tmpl. Wasn't able to find a simple way to get it. So, by looking into
source I found that most cases handling args  tmpl-args by using
inner_most_template_args instead. Defaulting to
innermost_template_args looks indeed a bit wrong, but seemed to work.


It's wrong, it just papers over the bug.  There shouldn't be a mismatch 
at this point.


The problem seems to be that most_general_template isn't actually 
returning the most general template, so we're trying to instantiate the 
partially-instantiated alias template with a full set of arguments: thus 
the mismatch.


Jason



Re: flatten expr.h

2014-12-18 Thread Prathamesh Kulkarni
Hi,
I had a discussion with Michael, and would like to put this patch on
hold for now till we have tree.h and rtl.h flattening
patches checked in to avoid conflicts.

Thankyou,
Prathamesh



On 16 December 2014 at 17:13, Prathamesh Kulkarni
prathamesh.kulka...@linaro.org wrote:
 Hi,
 The attached patch flattens expr.h.
 Moved stmt.c prototypes to stmt.h and expmed.c prototypes
 to expmed.h. Rest of the patch involves restructuring includes.

 Thank you,
 Prathamesh


Speedup and cleanup hash-table.h

2014-12-18 Thread Jan Hubicka
Hi,
this patch started as experiment moving hash_table_mod1 inline because it shows
high in streaming profiles and it represents a branch-less code that is good
to schedule to surrounding instructions.
While looking at hash-tab.h I however noticed that it was not updated very
well while it was moved from libiberty.h.  It still assumes lack of 64bit
integers to be possible and also it uses abort() calls instead of gcc_asserts.
Fixed thus.

Bootstrapped/regtested x86_64-linux, also checked that cc1plus binary
shirnks by about 15kb despite the extra inlining. Will commit it shortly.

Honza

* hash-table.h (struct pointer_hash): Fix formating.
(hash_table_higher_prime_index): Declare pure.
(hash_table_mod2, hash_table_mod1, mul_mod): Move inline;
assume that uint64_t always exists.
(hash_tableDescriptor, Allocator, false): Use gcc_checking_assert.
(hash_tableDescriptor, Allocator, false::expand ()): Fix formating.
(hash_tableDescriptor, Allocator, false::clear_slot (value_type 
**slot)):
Use checking assert.
* hash-table.c: Remvoe #if 0 code.
(hash_table_higher_prime_index): Use gcc_assert.
(mul_mod, hash-table_mod1, hash_table_mod2): move to hash-table.h

Index: hash-table.h
===
--- hash-table.h(revision 218795)
+++ hash-table.h(working copy)
@@ -282,7 +282,8 @@ struct pointer_hash : typed_noop_remove
 
   static inline hashval_t hash (const value_type );
 
-  static inline bool equal (const value_type existing, const compare_type 
candidate);
+  static inline bool equal (const value_type existing,
+   const compare_type candidate);
 };
 
 template typename Type
@@ -388,9 +389,54 @@ extern struct prime_ent const prime_tab[
 
 /* Functions for computing hash table indexes.  */
 
-extern unsigned int hash_table_higher_prime_index (unsigned long n);
-extern hashval_t hash_table_mod1 (hashval_t hash, unsigned int index);
-extern hashval_t hash_table_mod2 (hashval_t hash, unsigned int index);
+extern unsigned int hash_table_higher_prime_index (unsigned long n)
+   ATTRIBUTE_PURE;
+
+/* Return X % Y using multiplicative inverse values INV and SHIFT.
+
+   The multiplicative inverses computed above are for 32-bit types,
+   and requires that we be able to compute a highpart multiply.
+
+   FIX: I am not at all convinced that
+ 3 loads, 2 multiplications, 3 shifts, and 3 additions
+   will be faster than
+ 1 load and 1 modulus
+   on modern systems running a compiler.  */
+
+inline hashval_t
+mul_mod (hashval_t x, hashval_t y, hashval_t inv, int shift)
+{
+   hashval_t t1, t2, t3, t4, q, r;
+
+   t1 = ((uint64_t)x * inv)  32;
+   t2 = x - t1;
+   t3 = t2  1;
+   t4 = t1 + t3;
+   q  = t4  shift;
+   r  = x - (q * y);
+
+   return r;
+}
+
+/* Compute the primary table index for HASH given current prime index.  */
+
+inline hashval_t
+hash_table_mod1 (hashval_t hash, unsigned int index)
+{
+  const struct prime_ent *p = prime_tab[index];
+  gcc_checking_assert (sizeof (hashval_t) * CHAR_BIT = 32);
+return mul_mod (hash, p-prime, p-inv, p-shift);
+}
+
+/* Compute the secondary table index for HASH given current prime index.  */
+
+inline hashval_t
+hash_table_mod2 (hashval_t hash, unsigned int index)
+{
+  const struct prime_ent *p = prime_tab[index];
+  gcc_checking_assert (sizeof (hashval_t) * CHAR_BIT = 32);
+  return 1 + mul_mod (hash, p-prime - 2, p-inv_m2, p-shift);
+}
 
 /* The below is some template meta programming to decide if we should use the
hash table partial specialization that directly stores value_type instead of
@@ -748,8 +794,7 @@ hash_tableDescriptor, Allocator, false
 
   if (*slot == HTAB_EMPTY_ENTRY)
 return slot;
-  else if (*slot == HTAB_DELETED_ENTRY)
-abort ();
+  gcc_checking_assert (*slot != HTAB_DELETED_ENTRY);
 
   hash2 = hash_table_mod2 (hash, m_size_prime_index);
   for (;;)
@@ -761,8 +806,7 @@ hash_tableDescriptor, Allocator, false
   slot = m_entries + index;
   if (*slot == HTAB_EMPTY_ENTRY)
 return slot;
-  else if (*slot == HTAB_DELETED_ENTRY)
-abort ();
+  gcc_checking_assert (*slot != HTAB_DELETED_ENTRY);
 }
 }
 
@@ -773,7 +817,7 @@ hash_tableDescriptor, Allocator, false
table entries is changed.  If memory allocation fails, this function
will abort.  */
 
- templatetypename Descriptor, templatetypename Type class Allocator
+templatetypename Descriptor, templatetypename Type class Allocator
 void
 hash_tableDescriptor, Allocator, false::expand ()
 {
@@ -862,9 +906,9 @@ templatetypename Descriptor, templatet
 void
 hash_tableDescriptor, Allocator, false::clear_slot (value_type **slot)
 {
-  if (slot  m_entries || slot = m_entries + size ()
-  || *slot == HTAB_EMPTY_ENTRY || *slot == HTAB_DELETED_ENTRY)
-abort ();
+  gcc_checking_assert (!(slot  m_entries || slot = m_entries + size ()
+

C++ PATCH for c++/64105 (ICE on generic lambda with -std=c++11)

2014-12-18 Thread Jason Merrill
Since generic lambdas don't seem to work properly in C++11 mode, let's 
upgrade the pedwarn to an error.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 6c7d0ebb35b4918be3200625ef0cbb1816ddd7e9
Author: Jason Merrill ja...@redhat.com
Date:   Thu Dec 18 16:56:33 2014 -0500

	PR c++/64105
	* parser.c (cp_parser_simple_type_specifier): Make auto parameter
	before -std=c++14 an error.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 0e7ba7a..8ff16ed 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -14862,23 +14862,26 @@ cp_parser_simple_type_specifier (cp_parser* parser,
   maybe_warn_cpp0x (CPP0X_AUTO);
   if (parser-auto_is_implicit_function_template_parm_p)
 	{
-	  type = synthesize_implicit_template_parm (parser);
+	  if (cxx_dialect = cxx14)
+	type = synthesize_implicit_template_parm (parser);
+	  else
+	type = error_mark_node;
 
 	  if (current_class_type  LAMBDA_TYPE_P (current_class_type))
 	{
 	  if (cxx_dialect  cxx14)
-		pedwarn (location_of (type), 0,
+		error_at (token-location,
 			 use of %auto% in lambda parameter declaration 
 			 only available with 
 			 -std=c++14 or -std=gnu++14);
 	}
 	  else if (cxx_dialect  cxx14)
-	pedwarn (location_of (type), 0,
+	error_at (token-location,
 		 use of %auto% in parameter declaration 
 		 only available with 
 		 -std=c++14 or -std=gnu++14);
 	  else
-	pedwarn (location_of (type), OPT_Wpedantic,
+	pedwarn (token-location, OPT_Wpedantic,
 		 ISO C++ forbids use of %auto% in parameter 
 		 declaration);
 	}
@@ -14971,6 +14974,9 @@ cp_parser_simple_type_specifier (cp_parser* parser,
   /* Consume the token.  */
   cp_lexer_consume_token (parser-lexer);
 
+  if (type == error_mark_node)
+	return error_mark_node;
+
   /* There is no valid C++ program where a non-template type is
 	 followed by a .  That usually indicates that the user thought
 	 that the type was a template.  */
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic2.C b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic2.C
new file mode 100644
index 000..2808aa6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic2.C
@@ -0,0 +1,22 @@
+// PR c++/64105
+// This test was ICEing on pre-C++14 mode.
+
+#include functional
+
+using F = std::functionvoid(void);
+
+struct X
+{
+  template typename T
+  static void f(T t)
+  {
+g(t);
+  }
+
+  static void g(F) {}
+};
+
+int main()
+{
+  X::f([](auto... xs){});	// { dg-error   { target { ! cxx14 } } }
+};


Flatten tree.h and tree-core.h

2014-12-18 Thread Michael Collison
This patch flattens tree.h and tree-core.h. This work is part of the GCC 
Re-Architecture effort being led by Andrew MacLeod.


I removed the includes in tree.h and tree-core.h except for the include 
of tree-core.h in tree.h.


I modified genattrtab.c, genautomata.c, genemit.c, gengtype.c, 
gengtype.c, genoptinit.c, genoutput.c,
genpeep.c, genpreds.c, and optc-save-gen-awk to include the the 
necessary include files removed from

tree.h and tree-core.h when generating their respective files.

All other changes include the necessary include files removed from 
tree.h and tree-core.h. Note the patches modifies all the front-ends.


I bootstrapped on x86 with all languages. I also bootstrapped on all 
targets listed in contrib/config-list.mk with c and c++ enabled.


Is this okay for trunk?

2014-12-18  Michael Collison  michael.colli...@linaro.org

* genattrtab.c (write_header): Include hash-set.h, machmode.h,
vec.h, double-int.h, input.h, alias.h, symtab.h, options.h
tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating
insn-attrtab.c.
* genautomata.c (main) : Include hash-set.h, macInclude hash-set.h, 
machmode.h,

vec.h, double-int.h, input.h, alias.h, symtab.h, options.h
tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating
insn-automata.c.
* genemit.c (main): Include hash-set.h, machmode.h,
vec.h, double-int.h, input.h, alias.h, symtab.h, options.h
tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating
insn-emit.c.
* gengtype.c (open_base_files): Include hash-set.h, machmode.h,
vec.h, double-int.h, input.h, alias.h, symtab.h, options.h
tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating
gtype-desc.c.
* genopinit.c (main): Include hash-set.h, machmode.h,
vec.h, double-int.h, input.h, alias.h, symtab.h, options.h
tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating
insn-opinit.c.
* genoutput.c (output_prologue): Include hash-set.h, machmode.h,
vec.h, double-int.h, input.h, alias.h, symtab.h, options.h
tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating
insn-output.c.
* genpeep.c (main): Include hash-set.h, machmode.h,
vec.h, double-int.h, input.h, alias.h, symtab.h, options.h
tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating
insn-peep.c.
* genpreds.c (write_insn_preds_c): Include hash-set.h, machmode.h,
vec.h, double-int.h, input.h, alias.h, symtab.h, options.h
tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating
insn-preds.c.
* optc-save-gen-awk: Include hash-set.h, machmode.h,
vec.h, double-int.h, input.h, alias.h, symtab.h, options.h
tree-core.h, fold-const.h, wide-int.h, and inchash.h when generating
options-save.c.
* opth-gen.awk: Change include guard from GCC_C_COMMON_H to 
GCC_C_COMMON_C

when generating options.h.

2014-12-18  Michael Collison  michael.colli...@linaro.org

* ada/gcc-interface/cuintp.c: Include hash-set.h, machmode.h,
vec.h, double-int.h, input.h, alias.h, symtab.h,
tree-core.h, fold-const.h, wide-int.h, and inchash.h due to
flattening of tree.h.
* ada/gcc-interface/decl.c: ditto.
* ada/gcc-interface/misc.c: ditto.
* ada/gcc-interface/targtyps.c: Include hash-set.h, machmode.h,
vec.h, double-int.h, input.h, alias.h, symtab.h, options.h,
tree-core.h, fold-const.h, wide-int.h, and inchash.h due to
flattening of tree.h.
* ada/gcc-interface/trans.c: Include hash-set.h, machmode.h,
vec.h, double-int.h, input.h, alias.h, symtab.h, real.h,
tree-core.h, fold-const.h, wide-int.h, inchash.h due to
flattening of tree.h.
* ada/gcc-interface/utils.c: Include hash-set.h, machmode.h,
vec.h, double-int.h, input.h, alias.h, symtab.h,
tree-core.h, fold-const.h, wide-int.h, and inchash.h due to
flattening of tree.h.
* ada/gcc-interface/utils2.c: ditto.
* alias.c: Include hash-set.h, machmode.h,
vec.h, double-int.h, input.h, alias.h, symtab.h, options.h
tree-core.h, fold-const.h, wide-int.h, and inchash.h due to
flattening of tree.h.
* asan.c: ditto.
* attribs.c: ditto.
* auto-inc-dec.c: ditto.
* auto-profile.c: ditto
* bb-reorder.c: ditto.
* bt-load.c: Include symtab.h due to flattening of tree.h.
* builtins.c: Include hash-set.h, machmode.h,
vec.h, double-int.h, input.h, alias.h, symtab.h, options.h
tree-core.h, fold-const.h, wide-int.h, and inchash.h due to
flattening of tree.h.
* c/c-array-notation.c: ditto.
* c/c-aux-info.c: ditto.
* c/c-convert.c: ditto.
* c/c-decl.c: ditto.
* c/c-errors.c: ditto.
* c/c-lang.c: dittoxs.
* c/c-objc-common.c: ditto.
* c/c-parser.c: ditto.
* c/c-typeck.c: Include hash-set.h, machmode.h,
vec.h, double-int.h, input.h, alias.h, symtab.h, options.h
tree-core.h, fold-const.h, wide-int.h, inchash.h, real.h and

Re: Speedup and cleanup hash-table.h

2014-12-18 Thread Andi Kleen
Jan Hubicka hubi...@ucw.cz writes:

 Hi,
 this patch started as experiment moving hash_table_mod1 inline because it 
 shows
 high in streaming profiles and it represents a branch-less code that is good
 to schedule to surrounding instructions.

FWIW with a good modern hash function it shouldn't be needed to have
prime hash table sizes anymore. Without that just a power of two size 
can be used, so it would be just a mask.

I considered this last time I messed with hashes, but I didn't actually
see this function as beening hot.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [PATCH] -fsanitize-recover=list

2014-12-18 Thread Alexey Samsonov
Hi Jakub,

On Tue, Nov 18, 2014 at 11:36 AM, Alexey Samsonov samso...@google.com wrote:

 On Mon, Nov 17, 2014 at 11:53 PM, Jakub Jelinek ja...@redhat.com wrote:
  On Mon, Nov 17, 2014 at 11:39:47PM -0800, Alexey Samsonov wrote:
   That is not what I think we've agreed on and what is implemented in GCC.
   -fsanitize-recover only enables recovery of the undefined plus
   undefined-like sanitizers, in particular it doesn't enable recover from
   kernel-address, because -fsanitize-recover should be a deprecated option
   and kernel-address never used it before.
 
  Hm, indeed, I messed it up. In the older thread we agree that plain
  -f(no-)sanitize-recover
  should be a deprecated synonym for the current meaning of these flags,
  which only specify recoverability for UBSan-related checks :-/
 
  Has GCC already shipped with existing implementation? I'm just curious
  if it's convenient to have flags that would enable/disable recovery for all
  sanitizers (analogous to -Werror flags which exist in a standalone form, 
  but
  may accept specific warnings, so that one can write
$(CC) foo.cc -Wno-error -Werror=sign-compare
 
  Well, no sanitizer is on by default, so you can just use the same
  list for -fno-sanitize-recover= or -fsanitize-recover= as for -fsanitize=
  if you want.

 Yeah, but it may look somewhat redundant. Also, re-using the exact
 same list may lead to diagnostic messages (if you write
 -fsanitize=unreachable,null -fsanitize-recover=unreachable,null).

Reviving this thread.

What do you think of the following idea:
1) we keep -fsanitize-recover and -fno-sanitize-recover as
deprecated synonyms
for -f(no-)?sanitize=ubsan-like checks
2) we introduce -fsanitize-recover=list and
-fno-sanitize-recover=list, where list may contain
specific sanitizers (address) or group of sanitizers (undefined).
3) we introduce group of sanitizers called all, which is, well, all
available sanitizers. It can *not* be
used in -fsanitize=all flag, but can be used to easily disable all
previously enabled sanitizers (-fno-sanitize=all),
or to enable/disable recovery for all enabled sanitizers
(-fsanitize-recover=all / -fno-sanitize-recover=all).

This would be a handy shortcut, and will save users from copying the
same list twice for
-fsanitize= and -fsanitize-recover= flags, and from potential
diagnostic problems I mention.



   So, in GCC -fsanitize-recover stands for
   -fsanitize-recover=undefined,float-divide-by-zero,float-cast-overflow
   and -fno-sanitize-recover stands for
   -fno-sanitize-recover=undefined,float-divide-by-zero,float-cast-overflow
  
* -fsanitize-recover=list: Enable recovery for all selected checks
or group of checks. It is forbidden to list unrecoverable sanitizers
here (e.g., -fsanitize-recover=address will produce an error).
  
   We only error on
   -fsanitize-recover=address
   -fsanitize-recover=thread
   -fsanitize-recover=leak
 
  Right, it's a good idea to error on sanitizer kinds we don't have
  recovery for. I will
  add the errors for TSan/MSan/LSan etc.
 
   but not say on
   -fsanitize-recover=unreachable
   which is part of undefined; unreachable isn't recoverable silently.
   Likewise -fsanitize-recover=return.  Otherwise one couldn't use
   -fsanitize-recover=undefined which is useful.
 
  Can't this be fixed by checking the actual values of -fsanitize-recover= 
  flag
  before expanding the sanitizer groups (i.e. turning undefined into
  null,unreachable,return,)?
 
  We should probably be able to distinguish between 
  -fsanitize-recover=undefined,
  and -fsanitize-recover=unreachable,another checks from undefined.
  In the first case we can enable recovery
  for all parts of undefined that support it. In the second, we can
  produce an error as user
  explicitly tried to enable recovery for sanitizer that doesn't support it.
 
  But in that case you would need to diagnose it already when parsing the
  option, or add code that would remember what bits have been set from other
  option sets.
  In the former case, you'd diagnose even
  -fsanitize-recover=unreachable -fno-sanitize-recover=undefined
  even when in that case unreachable in the end is not recoverable.
  We don't error out for
  -fsanitize-recover=address -fno-sanitize-recover=address
  because in the end address is not recovered.

 OK, that's a good question: should we diagnose -fsanitize-recover=address
 if it was later disabled by -fno-sanitize-recover=address? There is an 
 argument
 for doing this: -fsanitize-recover=address is unsupported, so this
 flag shouldn't
 have been provided in the first place. It makes much more sense to
 delete it rather
 than override it. It couldn't be passed down from some global
 project-wide configuration
 (like CFLAGS), because it wouldn't work anywhere.

 The only case where overriding might come in handy is re-using the same list 
 for
 -fsanitize= and -fsanitize-recover= flags that you mention:
 # SANITIZERS is a list of sanitizers we want to enable.
 

  1   2   >