date:20150925

[Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-25 Thread Ajit Kumar Agarwal

I have made the following changes in the estimate_reg_pressure_cost function 
used 
by the loop invariant and IVOPTS. 

Earlier the estimate_reg_pressure cost uses the cost of n_new variables that 
are generated by the Loop Invariant
 and IVOPTS. These are not sufficient for register pressure calculation. The 
register pressure cost calculation should
use the n_new + n_old (numbers) to consider the cost. n_old is the register  
used inside the loops and the effect of
 n_new new variables generated by loop invariant and IVOPTS on register 
pressure is based on how the new
variables impact on register used inside the loops. The increase or decrease in 
register pressure is due to the impact
of new variables on the register used  inside the loops. The register-register 
move cost or the spill cost should consider
the cost associated with register used and the new variables generated. The 
movement  of new variables increases or 
decreases the register pressure, which is based on  overall cost of n_new + 
n_old variables.

The increase and decrease in register pressure is based on the overall cost of 
n_new + n_old as the changes in the 
register pressure caused due to new variables is based on how the changes 
behave with respect to the register used 
in the loops.

Thus the register pressure caused to new variables is based on the new 
variables and its impact on register used inside
 the loops and thus consider the overall  cost of n_new + n_old.

Bootstrap for i386 and reg tested on i386 with the change is fine.

SPEC CPU 2000 benchmarks are run and there is following impact on the 
performance
and code size.

ratio with the optimization vs ratio without optimization for INT benchmarks
(3807.632 vs 3804.661)

ratio with the optimization vs ratio without optimization for FP benchmarks
( 4668.743 vs 4778.741)

Code size reduction with respect to FP SPEC CPU 2000 benchmarks

Number of instruction with optimization = 1094117
Number of instruction without optimization = 1094659

Reduction in number of instruction with the optimization = 542 instruction.

[Patch,optimization]: Optimized changes in the estimate
 register pressure cost.

Earlier the estimate_reg_pressure cost uses the cost of n_new variables that
are generated by the Loop Invariant and IVOPTS. These are not sufficient for
register pressure calculation. The register pressure cost calculation should
use the n_new + n_old (numbers) to consider the cost. n_old is the register
used inside the loops and the affect of n_new new variables generated by
loop invariant and IVOPTS on register pressure is based on how the new
variables impact on register used inside the loops.

ChangeLog:
2015-09-26  Ajit Agarwal  

* cfgloopanal.c (estimate_reg_pressure_cost) : Add changes
to consider the n_new plus n_old in the register pressure
cost.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit



0001-Patch-optimization-Optimized-changes-in-the-estimate.patch
Description: 0001-Patch-optimization-Optimized-changes-in-the-estimate.patch

Re: (patch,rfc) s/gimple/gimple */

2015-09-25 Thread Trevor Saunders

On Thu, Sep 24, 2015 at 11:31:40AM +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Sat, 19 Sep 2015 20:55:35 -0400, Trevor Saunders  
> wrote:
> > On Fri, Sep 18, 2015 at 09:32:37AM -0600, Jeff Law wrote:
> > > On 09/18/2015 07:32 AM, Trevor Saunders wrote:
> > > >On Wed, Sep 16, 2015 at 03:11:14PM -0400, David Malcolm wrote:
> > > >>On Wed, 2015-09-16 at 09:16 -0400, Trevor Saunders wrote:
> > > >>>I gave changing from gimple to gimple * a shot last week.
> 
> > ok, its committed now :)
> 
> [...]/source-gcc/gcc/tree-object-size.c:62:13: warning: 'bool 
> plus_stmt_object_size(object_size_info*, tree, gimple)' declared 'static' but 
> never defined [-Wunused-function]
>  static bool plus_stmt_object_size (struct object_size_info *, tree, 
> gimple);
>  ^
> [...]/source-gcc/gcc/tree-object-size.c:63:13: warning: 'bool 
> cond_expr_object_size(object_size_info*, tree, gimple)' declared 'static' but 
> never defined [-Wunused-function]
>  static bool cond_expr_object_size (struct object_size_info *, tree, 
> gimple);
>  ^
> 
> Not sure why your automation didn't catch these?  Anyway, in r228080 I
> now committed these additional changes (as obvious):

well, it wasn't that automated ;) but this does point out some
interesting things.

> diff --git gcc/tree-object-size.c gcc/tree-object-size.c
> index f76f160..230761b 100644
> --- gcc/tree-object-size.c
> +++ gcc/tree-object-size.c
> @@ -59,8 +59,8 @@ static void collect_object_sizes_for (struct 
> object_size_info *, tree);
>  static void expr_object_size (struct object_size_info *, tree, tree);
>  static bool merge_object_sizes (struct object_size_info *, tree, tree,
>   unsigned HOST_WIDE_INT);
> -static bool plus_stmt_object_size (struct object_size_info *, tree, gimple);
> -static bool cond_expr_object_size (struct object_size_info *, tree, gimple);
> +static bool plus_stmt_object_size (struct object_size_info *, tree, gimple 
> *);
> +static bool cond_expr_object_size (struct object_size_info *, tree, gimple 
> *);

so, these forward decls aren't actually needed, and I'd be tempted to
just remove them, however I was curious why they didn't generate a
warning, and I found we regressed the warning for undefined declared
static functions.  Consider this C++ test case:

static bool bar(int);

static bool
bar(int *p)
{
return p;
}

int main()
{
int x;
bar();
return 0;
}

g++ (Debian 5.2.1-17) 5.2.1 20150911
/tmp/test.cc:1:13: warning: ‘bool bar(int)’ declared ‘static’ but never defined 
[-Wunused-function]
 static bool bar(int);
 ^

xgcc (GCC) 6.0.0 20150920 (experimental)
 bool bar(int*) int main()
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data>
  Assembling functions:
 bool bar(int*) int main()
Execution times (seconds)
 phase setup :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 (50%) wall  
  1381 kB (76%) ggc
 phase parsing   :   0.00 ( 0%) usr   0.01 (100%) sys   0.00 ( 0%) wall 
330 kB (18%) ggc
 phase opt and generate  :   0.01 (100%) usr   0.00 ( 0%) sys   0.01 (50%) wall 
 95 kB ( 5%) ggc
 parser (global) :   0.00 ( 0%) usr   0.01 (100%) sys   0.00 ( 0%) wall 
311 kB (17%) ggc
 initialize rtl  :   0.01 (100%) usr   0.00 ( 0%) sys   0.01 (50%) wall 
 12 kB ( 1%) ggc
 TOTAL :   0.01 0.01 0.02   
1818 kB
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.

I guess I'll try and file a bug about that soon :)

Trev

Re: [PATCH] Add new hooks ASM_OUTPUT_START_FUNCTION_HEADER ...

2015-09-25 Thread Bernd Schmidt


On 09/24/2015 03:48 PM, Dominik Vogt wrote:

Hm, I wonder whether wrapping all these section switches in
assemble_start/end_function in ".machine" pseudoops (that's what
we need the hooks for; similar to .arch for ix86) has any real
effect.


I don't think I follow what you're trying to say here?


Bernd

[Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs hook for AArch64

2015-09-25 Thread James Greenhalgh


Hi,

This patch is a simple prototype showing how a target might choose
to implement TARGET_COSTS_IFCVT_NOCE_IS_PROFITABLE_P.  It has not been
tuned, tested or looked at in any meaningful way.

While the patch is in need of more detailed analysis it is sufficient to
serve as an indication of what direction I was aiming for with this
patch set.

Clearly this is not OK for trunk without further work, but I thought I'd
include it as an afterthought for the costs rework.

Thanks,
James

---
2015-09-26  James Greenhalgh  

* config/aarch64/aarch64.c
(aarch64_additional_branch_cost_for_probability): New.
(aarch64_ifcvt_noce_profitable_p): Likewise.
(TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 4fa6a4e..6a753e9 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -76,6 +76,8 @@
 #include "sched-int.h"
 #include "cortex-a57-fma-steering.h"
 #include "target-globals.h"
+#include "ifcvt.h"
+#include "math.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -13362,6 +13364,7 @@ aarch64_unspec_may_trap_p (const_rtx x, unsigned flags)
   return default_unspec_may_trap_p (x, flags);
 }
 
+
 /* Implement TARGET_PROMOTED_TYPE to promote __fp16 to float.  */
 static tree
 aarch64_promoted_type (const_tree t)
@@ -13370,6 +13373,117 @@ aarch64_promoted_type (const_tree t)
 return float_type_node;
   return NULL_TREE;
 }
+
+/* Get an additional cost roughly mapping to the cycles cost of a
+   mispredicted branch, scaled by how predictable we think that branch
+   is.  Note that in the AArch64 back-end we try to map COSTS_N_INSNS (1)
+   to the latency of the cheapest instruction, with other instructions
+   scaled from that.
+
+   PREDICTABILITY is derived from the computed branch taken probability
+   and will be in the range [0..1], where 0.0 is a branch which should be
+   considered unpredictable, and 1.0 is a branch which is predictable.
+
+   Branches which are more predictable are less profitable to if-convert and
+   should return smaller values.  Branches which are more predictable are more
+   profitable to if-convert and should return higher values.
+
+   FORNOW: use math.h's log2 and take and return floats.
+   TODO: Design a function which gives more meaningful results.  */
+
+static float
+aarch64_additional_branch_cost_for_probability (float predictability)
+{
+  /* Scale to be in range [1..2], then take log2 of it.  Use this to
+ pick a point between two -mcpu (well, one day) dependent values.  */
+  unsigned int lower_bound = COSTS_N_INSNS (2);
+  unsigned int upper_bound = COSTS_N_INSNS (6);
+
+  return log2 (predictability  + 1.0f)
+	 * (upper_bound - lower_bound) + lower_bound;
+}
+
+/* Prototype only, return TRUE if SEQ is a profitable transformation of
+   the basic block structure defined in IF_INFO.
+
+   TODO: Design and analyze how this function should actually behave.
+   This is just guesswork.  */
+
+static bool
+aarch64_ifcvt_noce_profitable_p (rtx_insn *seq,
+ struct noce_if_info *if_info)
+{
+  bool speed_p
+= optimize_bb_for_speed_p (if_info->test_bb);
+
+  float sequence_cost (seq_cost (seq, speed_p));
+
+  /* We know that when expanding an if-convert sequence for AArch64 we
+ will generate a number of redundant comparisons.  Account for that
+ by slightly modifying the numbers.  */
+  sequence_cost /= 1.5f;
+
+  float then_cost (if_info->then_cost);
+  float else_cost = 0.0;
+
+  if (if_info->else_bb)
+else_cost = if_info->else_cost;
+
+  /* Get the weighting factors.  */
+  edge te = EDGE_SUCC (if_info->test_bb, 0);
+  float taken_probability = ((float) te->probability) / REG_BR_PROB_BASE;
+
+  /* We have to reverse the branch taken probability if we have a
+ then_else_reversed branch structure.  We want to get this correct
+ so we scale the cost of the correct branch.  */
+  if (if_info->then_else_reversed)
+taken_probability = 1.0f - taken_probability;
+
+  /* For branch_probability, we don't care which branch we are
+ considering, we just need a value in the range [0..0.5].  */
+  float branch_probability = taken_probability;
+  if (branch_probability > 0.5f)
+branch_probability = 1.0f - branch_probability;
+
+  /* Taken_probability is in range [0.0..0.5].  Scale to be in
+ range [0.0..1.0], and subtract from 1.0 so a taken_probability of 0.5
+ gives a predictability of 0.0.  */
+  float predictability = 1.0 - (2.0 * branch_probability);
+
+  float weighted_average_cost = ((taken_probability
+  * (then_cost - else_cost)) + then_cost);
+
+  if (!if_info->else_bb)
+weighted_average_cost /= 2.0f;
+  float branch_cost
+= aarch64_additional_branch_cost_for_probability (predictability);
+
+  float estimate_unconverted_cost = weighted_average_cost + branch_cost;
+
+  bool judgement = sequence_cost <= estimate_unconverted_cost;
+
+  if

Re: Elimitate duplication of get_catalogs in different abi

2015-09-25 Thread Jonathan Wakely


On 23/09/15 21:28 +0200, François Dumont wrote:

On 05/09/2015 23:02, François Dumont wrote:

On 22/08/2015 14:24, Daniel Krügler wrote:

2015-08-21 23:11 GMT+02:00 François Dumont :

I think I found a better way to handle this problem. It is c++locale.cc
that needs to be built with --fimplicit-templates. I even think that the
*_cow.cc file do not need this option but as I don't know what is the
drawback of this option I kept it. I also explicitely used the file name
c++locale.cc even if it is an alias to a configurable source file.  I
guess there must be some variable to use no ?

With this patch there are 6 additional symbols. I guess I need to
declare those in the scripts even if it is for internal library usage,
right ?

I would expect that the new Catalog_info definition either has deleted
or properly (user-)defined copy constructor and copy assignment
operator.


- Daniel


This type is used in C++98 so I need to make those private, not deleted.

With this change, is the patch ok to commit ?

François



What about this patch ?

I am still uncomfortable in exposing those implementation details in the
versionned symbols but I don't know how to do otherwise. Do you want me
to push this code in std::__detail namespace ?


I think because the types are only used internally in the library we
don't need to export them. The other code inside the shared library
can refer to those symbols without them being exported.

That way users can't see their names (because they're not in any
public headers) and can't use the symbols (because they're not
exported) so they're pure internal implementation details.

I tested it briefly and it seems to work, so if you can confirm it
still works then the patch is OK without the changes to gnu.ver

Thanks.

Re: [PATCH] Clear variables with stale SSA_NAME_RANGE_INFO (PR tree-optimization/67690)

2015-09-25 Thread Marek Polacek

On Fri, Sep 25, 2015 at 03:49:34PM +0200, Marek Polacek wrote:
> On Fri, Sep 25, 2015 at 09:29:30AM +0200, Richard Biener wrote:
> > On Thu, 24 Sep 2015, Marek Polacek wrote:
> > 
> > > As Richi said in 
> > > ,
> > > using recorded SSA name range infos in VRP is likely to expose errors in 
> > > the
> > > ranges.  This PR is such a case.  As discussed in the PR, after tail 
> > > merging
> > > via PRE the range infos cannot be relied upon anymore, so we need to clear
> > > them.
> > > 
> > > Since tree-ssa-ifcombine.c already had code to clean up the flow data in 
> > > a BB,
> > > I've factored it out to a common function.
> > > 
> > > Bootstrapped/regtested on x86_64-linux, ok for trunk and 5?
> > 
> > I believe for tail-merge you also need to clear range info on
> > PHI defs in the BB.  For ifcombine this wasn't necessary (no PHI nodes
> > in the relevant CFG), but it's ok to extend the new 
> > reset_flow_sensitive_info_in_bb function to also reset PHI defs.
> 
> All right.
>  
> > Ok with that change.
> 
> Since I'm not completely sure if I did the right thing here, could you
> please have another look at the new function?

And this is gcc-5 variant of the patch, bootstrapped/regtested on x86_64-linux.

2015-09-25  Marek Polacek  

PR tree-optimization/67690
* tree-ssa-ifcombine.c (pass_tree_ifcombine::execute): Call
reset_flow_sensitive_info_in_bb.
* tree-ssa-tail-merge.c: Include "stringpool.h" and "tree-ssanames.h".
(replace_block_by): Call reset_flow_sensitive_info_in_bb.
* tree-ssanames.c: Include "gimple-iterator.h".
(reset_flow_sensitive_info_in_bb): New function.
* tree-ssanames.h (reset_flow_sensitive_info_in_bb): Declare.

* gcc.dg/torture/pr67690.c: New test.

--- gcc/testsuite/gcc.dg/torture/pr67690.c
+++ gcc/testsuite/gcc.dg/torture/pr67690.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+const int c1 = 1;
+const int c2 = 2;
+
+int
+check (int i)
+{
+  int j;
+  if (i >= 0)
+j = c2 - i;
+  else
+j = c2 - i;
+  return c2 - c1 + 1 > j;
+}
+
+int invoke (int *pi) __attribute__ ((noinline,noclone));
+int
+invoke (int *pi)
+{
+  return check (*pi);
+}
+
+int
+main ()
+{
+  int i = c1;
+  int ret = invoke ();
+  if (!ret)
+__builtin_abort ();
+  return 0;
+}
--- gcc/tree-ssa-ifcombine.c
+++ gcc/tree-ssa-ifcombine.c
@@ -806,16 +806,7 @@ pass_tree_ifcombine::execute (function *fun)
  {
/* Clear range info from all stmts in BB which is now executed
   conditional on a always true/false condition.  */
-   for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
-!gsi_end_p (gsi); gsi_next ())
- {
-   gimple stmt = gsi_stmt (gsi);
-   ssa_op_iter i;
-   tree op;
-   FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF)
- reset_flow_sensitive_info (op);
- }
-
+   reset_flow_sensitive_info_in_bb (bb);
cfg_changed |= true;
  }
 }
--- gcc/tree-ssa-tail-merge.c
+++ gcc/tree-ssa-tail-merge.c
@@ -235,6 +235,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "tree-pass.h"
 #include "trans-mem.h"
+#include "stringpool.h"
+#include "tree-ssanames.h"
 
 /* Describes a group of bbs with the same successors.  The successor bbs are
cached in succs, and the successor edge flags are cached in succ_flags.
@@ -1558,6 +1560,10 @@ replace_block_by (basic_block bb1, basic_block bb2)
   e2->probability = GCOV_COMPUTE_SCALE (e2->count, out_sum);
 }
 
+  /* Clear range info from all stmts in BB2 -- this transformation
+ could make them out of date.  */
+  reset_flow_sensitive_info_in_bb (bb2);
+
   /* Do updates that use bb1, before deleting bb1.  */
   release_last_vdef (bb1);
   same_succ_flush_bb (bb1);
--- gcc/tree-ssanames.c
+++ gcc/tree-ssanames.c
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "is-a.h"
 #include "gimple.h"
 #include "gimple-ssa.h"
+#include "gimple-iterator.h"
 #include "tree-phinodes.h"
 #include "ssa-iterators.h"
 #include "stringpool.h"
@@ -563,6 +564,27 @@ reset_flow_sensitive_info (tree name)
 SSA_NAME_RANGE_INFO (name) = NULL;
 }
 
+/* Clear all flow sensitive data from all statements and PHI definitions
+   in BB.  */
+
+void
+reset_flow_sensitive_info_in_bb (basic_block bb)
+{
+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+   gsi_next ())
+{
+  gimple stmt = gsi_stmt (gsi);
+  ssa_op_iter i;
+  def_operand_p def;
+  FOR_EACH_PHI_OR_STMT_DEF (def, stmt, i, SSA_OP_DEF)
+   {
+ tree var = DEF_FROM_PTR (def);
+ if (TREE_CODE (var) != SSA_NAME)
+   continue;
+ reset_flow_sensitive_info (var);
+   }
+}
+}
 
 /* Release all the SSA_NAMEs created by STMT.  */
 
--- gcc/tree-ssanames.h
+++

libgomp: Guard all offload_images/num_offload_images access by register_lock (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks)

2015-09-25 Thread Thomas Schwinge

Hi!

It's still Friday afternoon -- so please bear with me once again...

On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin  wrote:
> On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > of mutex guarding gomp_target_init (which is using pthread_once guaranteed
> > to be run just once) vs. concurrent GOMP_offload_register calls
> > (if those are run from ctors, then I guess something like dl_load_lock
> > ensures at least on glibc that multiple GOMP_offload_register calls aren't
> > performed at the same time) in accessing/reallocating offload_images
> > and num_offload_images and the lack of support to register further
> > images after the gomp_target_init call (if you dlopen further shared
> > libraries) is really bad.  And it would be really nice to support the
> > unloading.

> Here is the latest patch for libgomp and mic plugin.

What about the scenario where one thread is inside
GOMP_offload_register_ver/GOMP_offload_register (say, due to opening a
shared library with such a mkoffload-generated constructor) and is
modifying offload_images with register_lock held, and another thread is
inside a GOMP_target* construct -> gomp_init_device and is accessing
offload_images without register_lock held?  Or, why isn't that a
reachable scenario?

Would the following patch (untested) do the right thing (locking added to
gomp_init_device and gomp_unload_device)?  We can then also remove the
is_register_lock parameter from gomp_load_image_to_device, and simplify
the code.

commit 702090223a8d43e7371d6cbfc9d8a39e3c5c2986
Author: Thomas Schwinge 
Date:   Fri Sep 25 17:37:41 2015 +0200

libgomp: Guard all offload_images/num_offload_images access by register_lock
---
 libgomp/target.c |   25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git libgomp/target.c libgomp/target.c
index 758ece5..1fbbe31 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -640,12 +640,13 @@ gomp_update (struct gomp_device_descr *devicep, size_t 
mapnum, void **hostaddrs,
 /* Load image pointed by TARGET_DATA to the device, specified by DEVICEP.
And insert to splay tree the mapping between addresses from HOST_TABLE and
from loaded target image.  We rely in the host and device compiler
-   emitting variable and functions in the same order.  */
+   emitting variable and functions in the same order.
+
+   The device must be locked, and REGISTER_LOCK must be held.  */
 
 static void
 gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
-  const void *host_table, const void *target_data,
-  bool is_register_lock)
+  const void *host_table, const void *target_data)
 {
   void **host_func_table = ((void ***) host_table)[0];
   void **host_funcs_end  = ((void ***) host_table)[1];
@@ -668,8 +669,7 @@ gomp_load_image_to_device (struct gomp_device_descr 
*devicep, unsigned version,
   if (num_target_entries != num_funcs + num_vars)
 {
   gomp_mutex_unlock (>lock);
-  if (is_register_lock)
-   gomp_mutex_unlock (_lock);
+  gomp_mutex_unlock (_lock);
   gomp_fatal ("Cannot map target functions or variables"
  " (expected %u, have %u)", num_funcs + num_vars,
  num_target_entries);
@@ -710,8 +710,7 @@ gomp_load_image_to_device (struct gomp_device_descr 
*devicep, unsigned version,
  != (uintptr_t) host_var_table[i * 2 + 1])
{
  gomp_mutex_unlock (>lock);
- if (is_register_lock)
-   gomp_mutex_unlock (_lock);
+ gomp_mutex_unlock (_lock);
  gomp_fatal ("Can't map target variables (size mismatch)");
}
 
@@ -733,7 +732,8 @@ gomp_load_image_to_device (struct gomp_device_descr 
*devicep, unsigned version,
 }
 
 /* Unload the mappings described by target_data from device DEVICE_P.
-   The device must be locked.   */
+
+   The device must be locked.  */
 
 static void
 gomp_unload_image_from_device (struct gomp_device_descr *devicep,
@@ -810,7 +810,7 @@ GOMP_offload_register_ver (unsigned version, const void 
*host_table,
   gomp_mutex_lock (>lock);
   if (devicep->type == target_type && devicep->is_initialized)
gomp_load_image_to_device (devicep, version,
-  host_table, target_data, true);
+  host_table, target_data);
   gomp_mutex_unlock (>lock);
 }
 
@@ -886,14 +886,15 @@ gomp_init_device (struct gomp_device_descr *devicep)
   devicep->init_device_func (devicep->target_id);
 
   /* Load to device all images registered by the moment.  */
+  gomp_mutex_lock (_lock);
   for (i = 0; i < num_offload_images; i++)
 {
   struct offload_image_descr *image = _images[i];
   if (image->type == devicep->type)
gomp_load_image_to_device (devicep, image->version,
-

[Patch ifcvt 1/3] Factor out cost calculations from noce cases

2015-09-25 Thread James Greenhalgh


Hi,

In this patch we try to pull out the cost calculations used by the
no-conditional-execution if-convert functions. We want to replicate the
logic of the current cost decisions, but to phrase it in a way which
can be pulled out as common. To preserve the current behaviour as best
as we can, this means asking the common question, "is a magic_number
less than or equal to branch_cost". Clearly this is not the question
we want to be asking longer term, but this preserves existing target
behaviour.

This is imperfect for a few reasons.

First, some of the more ambitious noce if-convert functions have a
(slightly) more complicated cost-model. This means that we have to jump
through hoops to present the cost calculation in the common form. These
hoops are not very big, but it does make the logic seem a bit... weird.

Second, because our long term goal is to hand the cost calculation off
to the target and make it better reflect a meaningful question, we must
first build the candidate ifcvt sequence for comparison. This will cause
a slight compile time regression as we now generate more sequences before
bailing out (each of which needs a cost calculation).

On the other hand, it should be clear from this point what we have to do
to lift this out to a target hook which can do a smart job, and I think
this fits with the overall direction we intend to take.

Bootstrapped and checked on x86_64-none-linux-gnu, aarch64-none-linux-gnu
and arm-none-linux-gnueabihf without issue. Comparison of Spec2000/Spec2006
code generation for these three targets showed no changes.

OK?

Thanks,
James

---
2015-09-26  James Greenhalgh  

* ifcvt.c (noce_if_info): Add a magic_number field :-(.
(noce_is_profitable_p): New.
(noce_try_store_flag_constants): Move cost calculation
to after sequence generation, factor it out to noce_is_profitable_p.
(noce_try_addcc): Likewise.
(noce_try_store_flag_mask): Likewise.
(noce_try_cmove): Likewise.
(noce_try_cmove_arith): Likewise.
(noce_try_sign_mask): Add comment regarding cost calculations.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 157a716..e89d567 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -820,6 +820,14 @@ struct noce_if_info
 
   /* Estimated cost of the particular branch instruction.  */
   unsigned int branch_cost;
+
+  /* For if-convert transformations, the legacy way to decide whether
+ the transformation should be applied is a comparison of a magic
+ number against BRANCH_COST.  Ultimately, this should go away, but
+ to avoid regressing targets this field encodes that number so the
+ profitability analysis can remain unchanged.  */
+  unsigned int magic_number;
+
 };
 
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
@@ -836,6 +844,19 @@ static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
 static int noce_try_minmax (struct noce_if_info *);
 static int noce_try_abs (struct noce_if_info *);
 static int noce_try_sign_mask (struct noce_if_info *);
+static bool noce_is_profitable_p (rtx_insn *, struct noce_if_info *);
+
+/* Given SEQ, which is a sequence we might want to generate after
+   if-conversion, and a basic-block structure in IF_INFO which represents
+   the code generation before if-conversion, return TRUE if this would
+   be a profitable transformation.  */
+
+static bool
+noce_is_profitable_p (rtx_insn *seq ATTRIBUTE_UNUSED,
+		  struct noce_if_info *if_info)
+{
+  return (if_info->branch_cost >= if_info->magic_number);
+}
 
 /* Helper function for noce_try_store_flag*.  */
 
@@ -1192,8 +1213,13 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
   HOST_WIDE_INT itrue, ifalse, diff, tmp;
   int normalize;
   bool can_reverse;
+  bool no_cost_model = false;
   machine_mode mode = GET_MODE (if_info->x);;
   rtx common = NULL_RTX;
+  /* ??? There are paths through this function from which no cost function
+ is checked before conversion.  Maintain that behaviour by setting
+ the magic number used by noce_is_profitable_p to zero.  */
+  if_info->magic_number = 0;
 
   rtx a = if_info->a;
   rtx b = if_info->b;
@@ -1204,9 +1230,9 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
   && CONST_INT_P (XEXP (a, 1))
   && CONST_INT_P (XEXP (b, 1))
   && rtx_equal_p (XEXP (a, 0), XEXP (b, 0))
-  && noce_operand_ok (XEXP (a, 0))
-  && if_info->branch_cost >= 2)
+  && noce_operand_ok (XEXP (a, 0)))
 {
+  if_info->magic_number = 2;
   common = XEXP (a, 0);
   a = XEXP (a, 1);
   b = XEXP (b, 1);
@@ -1278,23 +1304,33 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	  else
 	gcc_unreachable ();
 	}
-  else if (ifalse == 0 && exact_log2 (itrue) >= 0
-	   && (STORE_FLAG_VALUE == 1
-		   || if_info->branch_cost >= 2))
-	normalize = 1;
-  else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse
-	   &&

[Patch ifcvt 3/3] Create a new target hook for deciding profitability of noce if-conversion

2015-09-25 Thread James Greenhalgh


Hi,

This patch introduces a new costs hook for deciding on the profitability
of an if-conversion candidate. We defer as much as possible for this
decision to the target, permitting the target to vary the outcome based
on the specific behaviours of a branch predictor in addition to any other
target-specific knowledge that might be available.

I had hoped to keep more of this generic, using rtx_costs and an additional
branch weighting factor to come up with a common formula, but that
proves troublesome for AArch64 where the expansion of multiple conditional
moves generates multiple redundant comparisons, which we know will be
later cleaned up.

As a target would have to make a judgement on how much of the new sequence
to cost, and can probably only do that reasonably with the old sequence as
context, I just expose both parts to the target and allow them to implement
whatever they feel best.

Bootstrapped on aarch64-none-linux-gnu, arm-none-linux-gnueabihf and
x86_64-none-linux-gnu with no issues, and checked code generation on these
platforms to ensure it has not changed.

OK?

Thanks,
James

---
2015-09-26  James Greenhalgh  

* target.def (costs): New hook vector.
(ifcvt_noce_profitable_p): New hook.
* doc/tm.texi.in: Document it.
* doc/tm.texi: Regenerate.
* targhooks.h (default_ifcvt_noce_profitable_p): New.
* targhooks.c (default_ifcvt_noce_profitable_p): New.
* ifcvt.c (noce_profitable_p): Use new target hook.
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index eb495a8..b169d7c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6190,6 +6190,12 @@ true for well-predicted branches. On many architectures the
 @code{BRANCH_COST} can be reduced then.
 @end defmac
 
+@deftypefn {Target Hook} bool TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P (rtx_insn *@var{seq}, struct noce_if_info *@var{info})
+This hook should return TRUE if converting the IF-THEN-ELSE blocks
+  described in INFO with the if-converted sequence SEQ is expected to
+  be profitable.
+@end deftypefn
+
 Here are additional macros which do not specify precise relative costs,
 but only that certain actions are more expensive than GCC would
 ordinarily expect.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 92835c1..4765ec9 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4575,6 +4575,8 @@ true for well-predicted branches. On many architectures the
 @code{BRANCH_COST} can be reduced then.
 @end defmac
 
+@hook TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P
+
 Here are additional macros which do not specify precise relative costs,
 but only that certain actions are more expensive than GCC would
 ordinarily expect.
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index d7fc523..e5e76bc 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -794,7 +794,7 @@ static bool
 noce_is_profitable_p (rtx_insn *seq ATTRIBUTE_UNUSED,
 		  struct noce_if_info *if_info)
 {
-  return (if_info->branch_cost >= if_info->magic_number);
+  return targetm.costs.ifcvt_noce_profitable_p (seq, if_info);
 }
 
 /* Helper function for noce_try_store_flag*.  */
diff --git a/gcc/target.def b/gcc/target.def
index f330709..996f31d 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5876,6 +5876,21 @@ DEFHOOK
 
 HOOK_VECTOR_END (mode_switching)
 
+/* Cost functions.  */
+#undef HOOK_PREFIX
+#define HOOK_PREFIX "TARGET_COSTS_"
+HOOK_VECTOR (TARGET_COSTS_, costs)
+
+DEFHOOK
+(ifcvt_noce_profitable_p,
+ "This hook should return TRUE if converting the IF-THEN-ELSE blocks\n\
+  described in INFO with the if-converted sequence SEQ is expected to\n\
+  be profitable.",
+ bool, (rtx_insn *seq, struct noce_if_info *info),
+ default_ifcvt_noce_profitable_p)
+
+HOOK_VECTOR_END (costs)
+
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_"
 
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 7238c8f..7b6dbe8 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 #include "stringpool.h"
 #include "tree-ssanames.h"
+#include "ifcvt.h"
 
 
 bool
@@ -1922,4 +1923,14 @@ can_use_doloop_if_innermost (const widest_int &, const widest_int &,
   return loop_depth == 1;
 }
 
+/* For the default implementation, match the legacy logic by simply
+   comparing the estimated branch cost against a magic number.  */
+
+bool
+default_ifcvt_noce_profitable_p (rtx_insn *seq ATTRIBUTE_UNUSED,
+ struct noce_if_info *if_info)
+{
+  return (if_info->branch_cost >= if_info->magic_number);
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 5ae991d..076d513 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -240,4 +240,7 @@ extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE
 		  tree type ATTRIBUTE_UNUSED,
 		  int *pretend_arg_size ATTRIBUTE_UNUSED,
 		  int second_time ATTRIBUTE_UNUSED);
+
+extern bool default_ifcvt_noce_profitable_p (rtx_insn *,
+	 struct

Re: [PATCH 2/5] Reimplement diagnostic_show_locus, introducing rich_location classes (v2)

2015-09-25 Thread Dodji Seketeli

Manuel López-Ibáñez  a écrit:

> On 25 September 2015 at 10:51, Dodji Seketeli  wrote:
>> The line-map parts are OK to me too, but I have no power on those.  So I
>
> You are listed as "line map" maintainer in MAINTAINERS. I rooted for
> you! :)

Right, I meant the libcpp parts (which are not line-map.h), sorry :-)

Cheers,

-- 
Dodji

Re: using scratchpads to enhance RTL-level if-conversion: the new patch is almost ready to be prepared for merging to trunk, but not 100% ready yet

2015-09-25 Thread Jeff Law


On 09/24/2015 09:19 AM, Abe wrote:


On 9/24/15 1:07 AM, Jeff Law wrote:


So what that means is the presence or absence of debug information is
causing a difference in

 > the code you generate.  That is (of course) bad and indicates a bug
of some kind in your code.


I haven't put your code under a debugger or anything like that, but
this does jump out:



+rtx_insn* temp_src_insn = BB_HEAD (then_bb);
+if (temp_src_insn && NOTE_INSN_BASIC_BLOCK_P (temp_src_insn))
+  temp_src_insn = NEXT_INSN (temp_src_insn); /* skip over a
start-of-BB note */



What if BB_HEAD (then_bb) is a DEBUG_INSN with -g enabled, but is a
NOTE_INSN_BASIC_BLOCK when -g is not enabled.  That could cause
this kind of failure.



Thanks very much!  I`m just checking email right before heading out to
surgery, so I can`t experiment on changing the above code right now, but
I`ll be sure to do so next week.

After some thinking, I thought of this question: would "-g"/"-g3"/etc.
cause a DEBUG_INSN to be present in the RTL as the first insn in the BB,
just before the start-of-BB note?  In other words, rather than only
checking to see if I should skip copying the first insn in the BB, would
checking {the first insn, and if that`s not a NOTE, then check the
second insn} and skipping over whichever one is the first "hit" for
"NOTE" be a safe strategy?
I'd be surprised if we had DEBUG_INSNs as the first insn in a block 
(before the note), but it can't hurt to verify.  I believe cfgrtl checks 
for proper location of the block note.  But maybe I'm mis-remembering. 
It might not be a bad idea to run the verification code immediately 
after your transformation (for testing purposes only).


Otherwise, you're just going to have to slog through and find out why 
the codegen is different.  These kind of bugs are usually painful to 
track down, but it's usually worth the time spent in the end.


jeff

Re: [PATCH 4/5] Implement tree expression tracking in C FE (v2)

2015-09-25 Thread David Malcolm

On Fri, 2015-09-25 at 16:06 +0200, Dodji Seketeli wrote:
> Hello,
> 
> Similarly to a comment I made on the previous patch of the series,
> 
> +++ b/libcpp/include/line-map.h
> 
> [...]
> 
>  struct GTY(()) location_adhoc_data {
>source_location locus;
> +  source_range src_range;
>void * GTY((skip)) data;
>  };
> 
> Could we just change the type of locus and make it be a source_range
> instead?  With the right converting constructor (in the source_range
> class) that takes a source_location, the amount of churn should
> hopefully be minimized, or maybe I am missing something ...

Thanks.

I think that the above is one place where we *would* want both locus and
src_range.

One key idea within this patch is for source_location/location_t to be
able to track both a point/caret/locus and a range containing it.   The
idea is to stash the range information within the ad-hoc table.

For the most simple expressions involving just one token, locus ==
src_range.m_start, but in the most general case, locus,
src_range.m_start and src_range.m_finish are all different from each
other; consider this expression:

  foo && bar
  ^~

(i.e. where the caret/locus is at the first '&' and the range starts at
the 'f' of "foo" and finishes at the 'r' of "bar").

As noted elsewhere, we might try to pack short ranges directly into the
32 bits of source_location:
 https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01826.html
which would avoid having to use ad-hoc for such short ranges; ideally
most tokens.  I'm experimenting with that (I don't have it fully working
yet).

> [...]
> 
> diff --git a/libcpp/line-map.c b/libcpp/line-map.c
> 
> [...]
> 
> +source_range
> +get_range_from_adhoc_loc (struct line_maps *set, source_location loc)
> +{
> 
> Please add a comment for this function.

(nods)

> [...]
> 
> diff --git a/gcc/tree.c b/gcc/tree.c
> 
> +void
> +set_source_range (tree *expr, location_t start, location_t finish)
> 
> Please add a comment for this function and its overloads.

(nods)

> [...]
> 
> +void
> +set_c_expr_source_range (c_expr *expr,
> +  location_t start, location_t finish)
> +{
> 
> Likewise.

(nods)

> +location_t
> +set_block (location_t loc, tree block)
> +{
> 
> Likewise.  I am wondering if we shouldn't even change the name of this
> function to better suit what it does: associate a tree to a location.

"associate_tree_with_location" ?

> +  source_range src_range;
> +  if (IS_ADHOC_LOC (loc))
> +/* FIXME: can we update in-place?  */
> +src_range = get_range_from_adhoc_loc (line_table, loc);
> 
> Hmmh, at the moment, I don't think we can update an entry of the adhoc
> map that associates {location, range, tree} as all three components
> contribute to the hash value of the entry.  A new triplet means a new
> entry.
> 
> My understanding is that initially the two elements of the pair
> {location, tree} were contributing to the hash value because the same
> location could very well belong to different blocks.

Ah; thanks.

> +  else
> +src_range = source_range::from_location (loc);
> +
> +  return COMBINE_LOCATION_DATA (line_table, loc, src_range, block);
> +}
> 
> [...]
> 
> @@ -6091,6 +6112,10 @@ c_parser_conditional_expression (c_parser *parser, 
> struct c_expr *after,
>  
>if (c_parser_next_token_is_not (parser, CPP_QUERY))
>  return cond;
> +  if (cond.value != error_mark_node)
> +start = cond.src_range.m_start;
> +  else
> +start = UNKNOWN_LOCATION;
> 
> I think that "getting the start range of a c_expr" is an operation
> that is generally useful; and it's also generally useful for that
> operation to handle cases where the tree carried by the c_expr can be
> an error_mark_node.  So maybe it would be appropriate to have a getter
> function for that operation and use it here instead.

Good idea; thanks.  That will be helpful as I try other representations.

> You would then use that operation in the other places of this patch
> that are getting c_expr::src_range.start -- by the way, those other
> places are not handling the error_mark_node case like the above.
> 
> [...]
> 
> +++ b/gcc/c/c-tree.h
> @@ -132,6 +132,9 @@ struct c_expr
>   The type of an enum constant is a plain integer type, but this
>   field will be the enum type.  */
>tree original_type;
> +
> +  /* FIXME.  */
> +  source_range src_range;
>  };
> 
> Why a FIXME here?

To remind myself before posting the patches that I need to give the
field a descriptive comment, explaining what purpose it serves.

Oops :)

It probably should read something like this:

  /* The source range of this C expression.  This might
 be thought of as redundant, since ranges for
 expressions can be stored in a location_t, but
 not all tree nodes in expressions have a location_t.

 Consider this source code:  

int test (int foo)
{
  return foo * 100;
^^^   ^^^
   }

During C parsing, the ranges for "foo" and

[PATCH 1/2] s/390: Implement "target" attribute.

2015-09-25 Thread Dominik Vogt

On Fri, Sep 25, 2015 at 02:59:41PM +0100, Dominik Vogt wrote:
> The following set of two patches implements the function
> __attribute__ ((target("..."))) and the corresponding #pragma GCC
> target("...") on S/390.  It comes with certain limitations:
> 
>  * It is not possible to change any options that affect the ABI or
>the definition of target macros by using the attribute (vx,
>htm, zarch and others).  Some of them are still supported but
>unable to change the definition of the corresponding target macros.
>In these cases, the pragma has to be used.  One reason for this
>is that it is not possible to change the definition of the target
>macros with the attribute, but the implementation of some features
>relies on them.
> 
>  * Even with the pragma it is not possible to switch between zarch
>and esa architecture because internal data typed would have to be
>changed at Gcc run time.
> 
> The second patch contains a long term change in the interface with
> the assembler.  Currently, the compiler wrapper passes the same
> -march= and -mtune= options to the compiler and the assembler.
> The patch makes this obsolete by emitting ".machine" and
> ".machinemode" dirctives to the top of the assembly language file.
> The old way ist still supported but may be removed once the
> ".machine" feature is supported by all as versions in the field.
> 
> The second patch depends on the first one, and both require the
> (latest) change proposed in this thread:
> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01546.html

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.opt (s390_arch_string): Remove.
(s390_tune_string): Likewise.
(s390_cost_pointer): Add Variable.
(s390_arch_specified): Add TargetVariable.
(s390_tune_specified): Likewise.
(s390_tune_flags): Likewise.
(s390_arch_flags): Save option.
(march=): Likewise.
(mbackchain): Likewise.
(mdebug): Likewise.
(mesa): Likewise.
(mhard-dfp): Likewise.
(mhard-float): Likewise.
(mlong-double-128): Likewise.
(mlong-double-64): Likewise.
(mhtm): Likewise.
(mvx): Likewise.
(mpacked-stack): Likewise.
(msmall-exec): Likewise.
(msoft-float): Likewise.
(mstack-guard=): Likewise.
(mstack-size=): Likewise.
(mtune=): Likewise.
(mmvcle): Likewise.
(mzvector): Likewise.
(mzarch): Likewise.
(mbranch-cost=): Likewise.
(mwarn-dynamicstack): Likewise.
(mwarn-framesize=): Likewise.
(mwarn-dynamicstack): Allow mno-warn-dynamicstack.
(mwarn-framesize=): Convert to UInteger (negative values are rejected
now).
* config/s390/s390-c.c (s390_cpu_cpp_builtins_internal): Split setting
macros changeable through the GCC target pragma into a separate
function.
(s390_cpu_cpp_builtins): Likewise.
(s390_pragma_target_parse): New function, implement GCC target pragma
if enabled.
(s390_register_target_pragmas): Register s390_pragma_target_parse if
available.
* common/config/s390/s390-common.c (s390_handle_option):
Export.
Move setting s390_arch_flags to s390.c.
Remove s390_tune_flags.
Allow 0 as argument to -mstack-size (switch to default value).
Allow 0 as argument to -mstack-guard (switch off).
Remove now unnecessary explicit parsing code for -mwarn-framesize.
* config/s390/s390-protos.h (s390_handle_option): Export.
(s390_valid_target_attribute_tree): Export.
(s390_reset_previous_fndecl): Export.
* config/s390/s390-builtins.def: Use new macro B_GROUP to mark the start
and end of HTM and VX builtins.
* config/s390/s390-builtins.h (B_GROUP): Use macro.
* config/s390/s390-opts.h: Add comment about processor_type usage.
* config/s390/s390.h (TARGET_CPU_IEEE_FLOAT_P): New macro.
(TARGET_CPU_ZARCH_P): Likewise.
(TARGET_CPU_LONG_DISPLACEMENT_P): Likewise
(TARGET_CPU_EXTIMM_P): Likewise
(TARGET_CPU_DFP_P): Likewise
(TARGET_CPU_Z10_P): Likewise
(TARGET_CPU_Z196_P): Likewise
(TARGET_CPU_ZEC12_P): Likewise
(TARGET_CPU_HTM_P): Likewise
(TARGET_CPU_Z13_P): Likewise
(TARGET_CPU_VX_P): Likewise
(TARGET_HARD_FLOAT_P): Likewise
(TARGET_LONG_DISPLACEMENT_P): Likewise
(TARGET_EXTIMM_P): Likewise
(TARGET_DFP_P): Likewise
(TARGET_Z10_P): Likewise
(TARGET_Z196_P): Likewise
(TARGET_ZEC12_P): Likewise
(TARGET_HTM_P): Likewise
(TARGET_Z13_P): Likewise
(TARGET_VX_P): Likewise
(TARGET_CPU_EXTIMM): Fix indentation
(TARGET_CPU_DFP): Likewise.
(TARGET_CPU_Z10): Likewise.
(TARGET_CPU_Z196): Likewise.
(TARGET_CPU_ZEC12): Likewise.

[Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.

2015-09-25 Thread James Greenhalgh

Hi,

In relation to the patch I put up for review a few weeks ago to teach
RTL if-convert to handle multiple sets in a basic block [1], I was
asking about a sensible cost model to use. There was some consensus at
Cauldron that what should be done in this situation is to introduce a
target hook that delegates answering the question to the target.

This patch series introduces that new target hook to provide cost
decisions for the RTL ifcvt pass.

The idea is to give the target full visibility of the proposed
transformation, and allow it to respond as to whether if-conversion in that
way is profitable.

In order to preserve current behaviour across targets, we will need the
default implementation to keep to the strategy of simply comparing branch
cost against a magic number. Patch 1/3 performs this refactoring, which is
a bit hairy in some corner cases.

Patch 2/3 is a simple code move, pulling the definition of the if_info
structure used by RTL if-convert in to ifcvt.h where it can be included
by targets.

Patch 3/3 then introduces the new target hook, with the same default
behaviour as was previously in noce_is_profitable_p.

The series has been bootstrapped on ARM, AArch64 and x86_64 targets, and
I've verified with Spec2000 and Spec2006 runs that there are no code
generation differences for any of these three targets after the patch.

I also gave ultrasparc3 a quick go, from what I could see, I changed the
register allocation for the floating-point condition code registers.
Presumably this is a side effect of first constructing RTXen that I then
discard. I didn't see anything which looked like more frequent reloads or
substantial code generation changes, though I'm not familiar with the
intricacies of the Sparc condition registers :).

I've included a patch 4/3, to give an example of what a target might want
to do with this hook. It needs work for tuning and deciding how the function
should actually behave, but works if it is thought of as more of a
strawman/prototype than a patch submission.

Are parts 1, 2 and 3 OK?

Thanks,
James

[1]: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00781.html

---
[Patch ifcvt 1/3] Factor out cost calculations from noce cases

2015-09-26  James Greenhalgh  

* ifcvt.c (noce_if_info): Add a magic_number field :-(.
(noce_is_profitable_p): New.
(noce_try_store_flag_constants): Move cost calculation
to after sequence generation, factor it out to noce_is_profitable_p.
(noce_try_addcc): Likewise.
(noce_try_store_flag_mask): Likewise.
(noce_try_cmove): Likewise.
(noce_try_cmove_arith): Likewise.
(noce_try_sign_mask): Add comment regarding cost calculations.

[Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h

2015-09-26  James Greenhalgh  

* ifcvt.c (noce_if_info): Move to...
* ifcvt.h (noce_if_info): ...Here.

[Patch ifcvt 3/3] Create a new target hook for deciding profitability
of noce if-conversion

2015-09-26  James Greenhalgh  

* target.def (costs): New hook vector.
(ifcvt_noce_profitable_p): New hook.
* doc/tm.texi.in: Document it.
* doc/tm.texi: Regenerate.
* targhooks.h (default_ifcvt_noce_profitable_p): New.
* targhooks.c (default_ifcvt_noce_profitable_p): New.
* ifcvt.c (noce_profitable_p): Use new target hook.

[Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs
hook for AArch64

2015-09-26  James Greenhalgh  

* config/aarch64/aarch64.c
(aarch64_additional_branch_cost_for_probability): New.
(aarch64_ifcvt_noce_profitable_p): Likewise.
(TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise.

Re: Elimitate duplication of get_catalogs in different abi

2015-09-25 Thread Jonathan Wakely


On 25/09/15 16:08 +0100, Jonathan Wakely wrote:

On 23/09/15 21:28 +0200, François Dumont wrote:

On 05/09/2015 23:02, François Dumont wrote:

On 22/08/2015 14:24, Daniel Krügler wrote:

2015-08-21 23:11 GMT+02:00 François Dumont :

I think I found a better way to handle this problem. It is c++locale.cc
that needs to be built with --fimplicit-templates. I even think that the
*_cow.cc file do not need this option but as I don't know what is the
drawback of this option I kept it. I also explicitely used the file name
c++locale.cc even if it is an alias to a configurable source file.  I
guess there must be some variable to use no ?

With this patch there are 6 additional symbols. I guess I need to
declare those in the scripts even if it is for internal library usage,
right ?

I would expect that the new Catalog_info definition either has deleted
or properly (user-)defined copy constructor and copy assignment
operator.


- Daniel


This type is used in C++98 so I need to make those private, not deleted.

With this change, is the patch ok to commit ?

François



What about this patch ?

I am still uncomfortable in exposing those implementation details in the
versionned symbols but I don't know how to do otherwise. Do you want me
to push this code in std::__detail namespace ?


I think because the types are only used internally in the library we
don't need to export them. The other code inside the shared library
can refer to those symbols without them being exported.

That way users can't see their names (because they're not in any
public headers) and can't use the symbols (because they're not
exported) so they're pure internal implementation details.

I tested it briefly and it seems to work, so if you can confirm it
still works then the patch is OK without the changes to gnu.ver


Oh, the problem is that the symbols are matched by patterns in the
_GLIBCXX_3.4 version, so get exported with that version instead. Gah.

In that case your patch would not have worked on Solaris anyway, as
the SOlaris linker gives an error if a symbol matches patterns in more
than one symbol version.

Let me try to adjust the gnu.ver script to make this work ...

Dead code in move_stmt_r

2015-09-25 Thread Bernd Schmidt

I noticed this a while ago while trying to defer parts of omp expansion 
until after LTO. move_tree_r, which is used when moving code into omp 
child functions, has code not to remap variables inside an OMP 
directive. However, by the time we get here, these directives should all 
have been expanded already, and the code appears to be dead.


Hence, the following patch. Bootstrapped and tested on x86_64-linux, ok?


Bernd
	* tree-cfg.c (move_stmt_r): Replace dead code with an assert.
	Remove name from now unused argument.

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 807d96f..5712e8d 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -6557,7 +6557,7 @@ move_stmt_eh_region_tree_nr (tree old_t_nr, struct move_stmt_d *p)
statement.  */
 
 static tree
-move_stmt_r (gimple_stmt_iterator *gsi_p, bool *handled_ops_p,
+move_stmt_r (gimple_stmt_iterator *gsi_p, bool *,
 	 struct walk_stmt_info *wi)
 {
   struct move_stmt_d *p = (struct move_stmt_d *) wi->info;
@@ -6619,21 +6619,7 @@ move_stmt_r (gimple_stmt_iterator *gsi_p, bool *handled_ops_p,
 case GIMPLE_OMP_CONTINUE:
   break;
 default:
-  if (is_gimple_omp (stmt))
-	{
-	  /* Do not remap variables inside OMP directives.  Variables
-	 referenced in clauses and directive header belong to the
-	 parent function and should not be moved into the child
-	 function.  */
-	  bool save_remap_decls_p = p->remap_decls_p;
-	  p->remap_decls_p = false;
-	  *handled_ops_p = true;
-
-	  walk_gimple_seq_mod (gimple_omp_body_ptr (stmt), move_stmt_r,
-			   move_stmt_op, wi);
-
-	  p->remap_decls_p = save_remap_decls_p;
-	}
+  gcc_assert (!is_gimple_omp (stmt));
   break;
 }

Re: [PATCH 2/4 v2] bb-reorder: Add the "simple" algorithm

2015-09-25 Thread Peter Bergner

On Fri, 2015-09-25 at 09:16 -0500, Segher Boessenkool wrote:
>   (reorder_basic_blocks): Choose between the STC and the simple
>   algorithms (always choose the former).
[snip]
@@ -2274,7 +2444,10 @@ reorder_basic_blocks (void)
>set_edge_can_fallthru_flag ();
>mark_dfs_back_edges ();
> 
> -  reorder_basic_blocks_software_trace_cache ();
> +  if (1)
> +reorder_basic_blocks_software_trace_cache ();
> +  else
> +reorder_basic_blocks_simple ();

Did you write the code this way because you're thinking of allowing
either reorder function to be called in the future?

Peter

Re: [PATCH 2/4 v2] bb-reorder: Add the "simple" algorithm

2015-09-25 Thread Segher Boessenkool

On Fri, Sep 25, 2015 at 10:59:37AM -0500, Peter Bergner wrote:
> On Fri, 2015-09-25 at 09:16 -0500, Segher Boessenkool wrote:
> > (reorder_basic_blocks): Choose between the STC and the simple
> > algorithms (always choose the former).
> [snip]
> @@ -2274,7 +2444,10 @@ reorder_basic_blocks (void)
> >set_edge_can_fallthru_flag ();
> >mark_dfs_back_edges ();
> > 
> > -  reorder_basic_blocks_software_trace_cache ();
> > +  if (1)
> > +reorder_basic_blocks_software_trace_cache ();
> > +  else
> > +reorder_basic_blocks_simple ();
> 
> Did you write the code this way because you're thinking of allowing
> either reorder function to be called in the future?

I _wrote_ it as "if (0)" actually, to test the new algo ;-)

The next patch (3/4) wires up a new compiler flag to choose between the
two algorithms.


Segher

Re: [PATCH 0/5] RFC: Overhaul of diagnostics (v2)

2015-09-25 Thread Jeff Law


On 09/24/2015 02:15 AM, Richard Biener wrote:

On Thu, Sep 24, 2015 at 2:25 AM, David Malcolm  wrote:

LTO code does support ad-hoc locations but they are "restored" only
when reading function bodies and stmts (by means of COMBINE_LOCATION_DATA).


The obvious simplification would be, as you suggest, to not bother
storing range information with LTO, falling back to just the existing
representation.  Then there's no need to extend LTO to serialize ad-hoc
data; simply store the underlying locus into the bit stream.  I think
that this happens already: lto-streamer-out.c calls expand_location and
stores the result, so presumably any ad-hoc location_t values made by
the v2 patches would have dropped their range data there when I ran the
test suite.


Yep.  We only preserve BLOCKs, so if you don't add extra code to
preserve ranges they'll be "dropped".
Right, but as David pointed out, most of the interesting uses for ranges 
(at least right now) are in the front-end diagnostics.  So losing them 
at this point isn't a major loss IMHO.


jeff

Re: [PATCH 4/5] Implement tree expression tracking in C FE (v2)

2015-09-25 Thread Dodji Seketeli

David Malcolm  a écrit:

> On Fri, 2015-09-25 at 16:06 +0200, Dodji Seketeli wrote:
>> Hello,
>> 
>> Similarly to a comment I made on the previous patch of the series,
>> 
>> +++ b/libcpp/include/line-map.h
>> 
>> [...]
>> 
>>  struct GTY(()) location_adhoc_data {
>>source_location locus;
>> +  source_range src_range;
>>void * GTY((skip)) data;
>>  };
>> 
>> Could we just change the type of locus and make it be a source_range
>> instead?  With the right converting constructor (in the source_range
>> class) that takes a source_location, the amount of churn should
>> hopefully be minimized, or maybe I am missing something ...
>
> Thanks.
>
> I think that the above is one place where we *would* want both locus and
> src_range.

Right, as opposed to the token case where conceptually, all we need is
the begining and the end of the token.  My confusion.

[...]

> As noted elsewhere, we might try to pack short ranges directly into the
> 32 bits of source_location:
>  https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01826.html
> which would avoid having to use ad-hoc for such short ranges; ideally
> most tokens.  I'm experimenting with that (I don't have it fully working
> yet).

Right.  My personal inclination would be to make the general case of
storing all ranges in this adhoc data structure, or, even, into another
on-the-side data structure, similar to the adhoc one, but dedicated to
range associated to points, as you see fit.

Then when that works, consider the optimization of stuffing short ranges
into the 32 bits of source_location, depending on the memory profiles we
get.  But it's your call :-)

[...]

>> +location_t
>> +set_block (location_t loc, tree block)
>> +{
>> 
>> Likewise.  I am wondering if we shouldn't even change the name of this
>> function to better suit what it does: associate a tree to a location.
>
> "associate_tree_with_location" ?

If you find it cool, I am cool :-)

[...]

>> +++ b/gcc/c/c-tree.h
>> @@ -132,6 +132,9 @@ struct c_expr
>>   The type of an enum constant is a plain integer type, but this
>>   field will be the enum type.  */
>>tree original_type;
>> +
>> +  /* FIXME.  */
>> +  source_range src_range;
>>  };
>> 
>> Why a FIXME here?
>
> To remind myself before posting the patches that I need to give the
> field a descriptive comment, explaining what purpose it serves.
>
> Oops :)
>
> It probably should read something like this:
>
>   /* The source range of this C expression.  This might
>  be thought of as redundant, since ranges for
>  expressions can be stored in a location_t, but
>  not all tree nodes in expressions have a location_t.
>
>  Consider this source code:  
>
>   int test (int foo)
>   {
> return foo * 100;
> ^^^   ^^^
>}
>
> During C parsing, the ranges for "foo" and "100" are
> stored within this field of c_expr, but after parsing
> to GENERIC, all we have is a VAR_DECL and an
> INTEGER_CST (the former's location is in at the top of the
> function, and the latter has no location).
>
> For consistency, we store ranges for all expressions
> in this field, not just those that don't have a
> location_t. */
>   source_range src_range;

Great, thanks.

[...]

> [BTW, I'm about to disappear on a vacation from tomorrow until October
> 6th, and will be away from email and computers]

Thanks for the heads-up.

Cheers,

-- 
Dodji

Re: [PATCH] Clear variables with stale SSA_NAME_RANGE_INFO (PR tree-optimization/67690)

2015-09-25 Thread Richard Biener

On September 25, 2015 3:49:34 PM GMT+02:00, Marek Polacek  
wrote:
>On Fri, Sep 25, 2015 at 09:29:30AM +0200, Richard Biener wrote:
>> On Thu, 24 Sep 2015, Marek Polacek wrote:
>> 
>> > As Richi said in
>,
>> > using recorded SSA name range infos in VRP is likely to expose
>errors in the
>> > ranges.  This PR is such a case.  As discussed in the PR, after
>tail merging
>> > via PRE the range infos cannot be relied upon anymore, so we need
>to clear
>> > them.
>> > 
>> > Since tree-ssa-ifcombine.c already had code to clean up the flow
>data in a BB,
>> > I've factored it out to a common function.
>> > 
>> > Bootstrapped/regtested on x86_64-linux, ok for trunk and 5?
>> 
>> I believe for tail-merge you also need to clear range info on
>> PHI defs in the BB.  For ifcombine this wasn't necessary (no PHI
>nodes
>> in the relevant CFG), but it's ok to extend the new 
>> reset_flow_sensitive_info_in_bb function to also reset PHI defs.
>
>All right.
> 
>> Ok with that change.
>
>Since I'm not completely sure if I did the right thing here, could you
>please have another look at the new function?

Doesn't work that way.  You need to iterate over the PHI sequence separately 
via gsi_start_phis(bb), etc.

>Bootstrapped/regtested on x86_64-linux, ok for trunk/5?
>
>2015-09-25  Marek Polacek  
>
>   PR tree-optimization/67690
>   * tree-ssa-ifcombine.c (pass_tree_ifcombine::execute): Call
>   reset_flow_sensitive_info_in_bb.
>   * tree-ssa-tail-merge.c (replace_block_by): Likewise.
>   * tree-ssanames.c: Include "gimple-iterator.h".
>   (reset_flow_sensitive_info_in_bb): New function.
>   * tree-ssanames.h (reset_flow_sensitive_info_in_bb): Declare.
>
>   * gcc.dg/torture/pr67690.c: New test.
>
>diff --git gcc/testsuite/gcc.dg/torture/pr67690.c
>gcc/testsuite/gcc.dg/torture/pr67690.c
>index e69de29..491de51 100644
>--- gcc/testsuite/gcc.dg/torture/pr67690.c
>+++ gcc/testsuite/gcc.dg/torture/pr67690.c
>@@ -0,0 +1,32 @@
>+/* { dg-do run } */
>+
>+const int c1 = 1;
>+const int c2 = 2;
>+
>+int
>+check (int i)
>+{
>+  int j;
>+  if (i >= 0)
>+j = c2 - i;
>+  else
>+j = c2 - i;
>+  return c2 - c1 + 1 > j;
>+}
>+
>+int invoke (int *pi) __attribute__ ((noinline,noclone));
>+int
>+invoke (int *pi)
>+{
>+  return check (*pi);
>+}
>+
>+int
>+main ()
>+{
>+  int i = c1;
>+  int ret = invoke ();
>+  if (!ret)
>+__builtin_abort ();
>+  return 0;
>+}
>diff --git gcc/tree-ssa-ifcombine.c gcc/tree-ssa-ifcombine.c
>index 9f04174..66be430 100644
>--- gcc/tree-ssa-ifcombine.c
>+++ gcc/tree-ssa-ifcombine.c
>@@ -769,16 +769,7 @@ pass_tree_ifcombine::execute (function *fun)
> {
>   /* Clear range info from all stmts in BB which is now executed
>  conditional on a always true/false condition.  */
>-  for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
>-   !gsi_end_p (gsi); gsi_next ())
>-{
>-  gimple *stmt = gsi_stmt (gsi);
>-  ssa_op_iter i;
>-  tree op;
>-  FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF)
>-reset_flow_sensitive_info (op);
>-}
>-
>+  reset_flow_sensitive_info_in_bb (bb);
>   cfg_changed |= true;
> }
> }
>diff --git gcc/tree-ssa-tail-merge.c gcc/tree-ssa-tail-merge.c
>index 0ce59e8..487961e 100644
>--- gcc/tree-ssa-tail-merge.c
>+++ gcc/tree-ssa-tail-merge.c
>@@ -1534,6 +1534,10 @@ replace_block_by (basic_block bb1, basic_block
>bb2)
>   e2->probability = GCOV_COMPUTE_SCALE (e2->count, out_sum);
> }
> 
>+  /* Clear range info from all stmts in BB2 -- this transformation
>+ could make them out of date.  */
>+  reset_flow_sensitive_info_in_bb (bb2);
>+
>   /* Do updates that use bb1, before deleting bb1.  */
>   release_last_vdef (bb1);
>   same_succ_flush_bb (bb1);
>diff --git gcc/tree-ssanames.c gcc/tree-ssanames.c
>index 4199290..ef21137 100644
>--- gcc/tree-ssanames.c
>+++ gcc/tree-ssanames.c
>@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
> #include "backend.h"
> #include "tree.h"
> #include "gimple.h"
>+#include "gimple-iterator.h"
> #include "hard-reg-set.h"
> #include "ssa.h"
> #include "alias.h"
>@@ -544,6 +545,27 @@ reset_flow_sensitive_info (tree name)
> SSA_NAME_RANGE_INFO (name) = NULL;
> }
> 
>+/* Clear all flow sensitive data from all statements and PHI
>definitions
>+   in BB.  */
>+
>+void
>+reset_flow_sensitive_info_in_bb (basic_block bb)
>+{
>+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
>+   gsi_next ())
>+{
>+  gimple *stmt = gsi_stmt (gsi);
>+  ssa_op_iter i;
>+  def_operand_p def;
>+  FOR_EACH_PHI_OR_STMT_DEF (def, stmt, i, SSA_OP_DEF)
>+  {
>+tree var = DEF_FROM_PTR (def);
>+if (TREE_CODE (var) != SSA_NAME)
>+  continue;
>+reset_flow_sensitive_info (var);
>+  }
>+}
>+}
>

Re: [PATCH] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead

2015-09-25 Thread Jan Hubicka

> > What has changed between then and now? Also, do we not use
> > estimates/heuristics when not using a profile?
> 
> Nothing has changed - splitting effectively never kicked in without a
> profile. Honza and I had discussed the idea that static profile
> heuristics could eventually be used to distinguish hot vs cold bbs,

Yep, the basic idea was to move EH clenaups to the cold section for start.  The
cleanup code can get surprisingly large for some C++ apps.

> but that hasn't happened. What I didn't notice until recently was the
> size increase in the .o files from varasm adding in unnecessary
> sections and labels when this option was on. Unless and until static

Perhaps we also may just teach varasm to not output those when function is not
split.  I am happy with the patch as it is because it is pointless to run the
code when no splitting happens.

Honza
> heuristics are used to distinguish cold bbs in
> probably_never_executed_bb_p, I don't think it makes sense to do
> anything finer grained that just disabling the option.
> 
> Teresa
> 
> >
> >
> > Bernd
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

[PATCH, Mips] Compact branch/delay slot optimization.

2015-09-25 Thread Simon Dardis

Hello,

The following patch adds three small optimizations related to compact branches 
for MIPSR6: 

When the result of a load is used by a delay slot branch immediately 
afterwards, undo the
delay slot branch scheduling to hide the pipeline bubble if safe and use a 
compact branch
instead.

Undo delay slot scheduling if an orphaned high-part relocation is in a delay 
slot and use a
compact branch is used instead.

Undo delay slot scheduling in the case where a forbidden slot hazard is 
immediately followed
by a delay slot branch. This would cause a nop to be inserted otherwise.

No regressions. OK to apply?

Thanks,
Simon

gcc/
* config/mips/mips.c: (mips_break_sequence): New function. 
  (mips_reorg_process_insns) Use it. Use compact branches in selected
  situations.

gcc/testsuite/
* gcc.target/mips/split-ds-sequence.c: Test for the above.

Index: config/mips/mips.c
===
--- config/mips/mips.c  (revision 227676)
+++ config/mips/mips.c  (working copy)
@@ -16973,6 +16973,23 @@
   }
 }
 
+/* Remove a SEQUENCE and replace it with the delay slot instruction
+   followed by the branch and return the instruction in the delay slot.
+   Return the first of the two new instructions.
+   Subroutine of mips_reorg_process_insns.  */
+
+static rtx_insn *
+mips_break_sequence (rtx_insn * insn)
+{
+  rtx_insn * before = PREV_INSN (insn);
+  rtx_insn * branch = SEQ_BEGIN (insn);
+  rtx_insn * ds = SEQ_END (insn);
+  remove_insn (insn);
+  add_insn_after (ds, before, NULL);
+  add_insn_after (branch, ds, NULL);
+  return ds;
+}
+
 /* Go through the instruction stream and insert nops where necessary.
Also delete any high-part relocations whose partnering low parts
are now all dead.  See if the whole function can then be put into
@@ -17065,6 +17082,66 @@
{
  if (GET_CODE (PATTERN (insn)) == SEQUENCE)
{
+ rtx_insn * next_active = next_active_insn (insn);
+ /* Undo delay slots to avoid bubbles if the next instruction can
+be placed in a forbidden slot or the cost of adding an
+explicit NOP in a forbidden slot is OK.  */
+ if (TARGET_CB_MAYBE
+ && INSN_P (SEQ_BEGIN (insn))
+ && INSN_P (SEQ_END (insn))
+ && ((next_active
+  && INSN_P (next_active)
+  && GET_CODE (PATTERN (next_active)) != SEQUENCE
+  && get_attr_can_delay (next_active) == CAN_DELAY_YES)
+ || !optimize_size))
+   {
+ /* To hide a potential pipeline bubble, if we scan backwards
+from the current SEQUENCE and find that there is a load
+of a value that is used in the CTI and there are no
+dependencies between the CTI and instruction in the delay
+slot, break the sequence so the load delay is hidden.  */
+ HARD_REG_SET uses;
+ CLEAR_HARD_REG_SET (uses);
+ note_uses ( (SEQ_BEGIN (insn)), record_hard_reg_uses,
+);
+ HARD_REG_SET delay_sets;
+ CLEAR_HARD_REG_SET (delay_sets);
+ note_stores (PATTERN (SEQ_END (insn)), record_hard_reg_sets,
+  _sets);
+
+ rtx prev = prev_active_insn (insn);
+ if (prev
+ && GET_CODE (PATTERN (prev)) == SET
+ && MEM_P (SET_SRC (PATTERN (prev
+   {
+ HARD_REG_SET sets;
+ CLEAR_HARD_REG_SET (sets);
+ note_stores (PATTERN (prev), record_hard_reg_sets,
+  );
+
+ /* Re-order if safe.  */
+ if (!hard_reg_set_intersect_p (delay_sets, uses)
+ && hard_reg_set_intersect_p (uses, sets))
+   {
+ next_insn = mips_break_sequence (insn);
+ /* Need to process the hazards of the newly
+introduced instructions.  */
+ continue;
+   }
+   }
+
+ /* If we find an orphaned high-part relocation in a delay
+slot then we can convert to a compact branch and get
+the orphaned high part deleted.  */
+ if (mips_orphaned_high_part_p (, SEQ_END (insn)))
+   {
+ next_insn = mips_break_sequence (insn);
+ /* Need to process the hazards of the newly
+introduced instructions.  */
+ continue;
+   }
+   }
+
  /* If we find an orphaned high-part relocation in a delay

Re: [Patch 1/2 AArch64/ARM] Give AArch64 ROR (Immediate) a new type attribute

2015-09-25 Thread Marcus Shawcroft

On 25 September 2015 at 14:19, James Greenhalgh
 wrote:

> 2015-09-25  James Greenhalgh  
>
> * config/arm/types.md (type): Add rotate_imm.
> * config/aarch64/aarch64.md (*ror3_insn): Split out the
> ROR immediate case.
> (*rorsi3_insn_uxtw): Likewise.
> * config/aarch64/thunderx.md (thunderx_shift): Add rotate_imm.
> * config/arm/cortex-a53.md (cortex_a53_alu_shift): Add rotate_imm.
> * config/arm/cortex-a57.md (cortex_a53_alu): Add rotate_imm.

OK with me.  /Marcus

libgomp: Compile-time error for non-portable gomp_mutex_t initialization (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks)

2015-09-25 Thread Thomas Schwinge

Hi!

It's Friday afternoon -- but anyway, is the following analysis correct?

On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin  wrote:
> On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > of mutex guarding gomp_target_init (which is using pthread_once guaranteed
> > to be run just once) vs. concurrent GOMP_offload_register calls
> > (if those are run from ctors, then I guess something like dl_load_lock
> > ensures at least on glibc that multiple GOMP_offload_register calls aren't
> > performed at the same time) in accessing/reallocating offload_images
> > and num_offload_images and the lack of support to register further
> > images after the gomp_target_init call (if you dlopen further shared
> > libraries) is really bad.  And it would be really nice to support the
> > unloading.

> Here is the latest patch for libgomp and mic plugin.

> libgomp/

>   * target.c (register_lock): New mutex for offload image registration.

>   (GOMP_offload_register): Add mutex lock.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -49,6 +49,9 @@ static void gomp_target_init (void);
>  /* The whole initialization code for offloading plugins is only run one.  */
>  static pthread_once_t gomp_is_initialized = PTHREAD_ONCE_INIT;
>  
> +/* Mutex for offload image registration.  */
> +static gomp_mutex_t register_lock;
> +
>  /* This structure describes an offload image.
> It contains type of the target device, pointer to host table descriptor, 
> and
> pointer to target data.  */

No gomp_mutex_init call for register_lock has been added -- there is no
sensible place to put it, because...

> @@ -654,6 +727,18 @@ void
>  GOMP_offload_register (void *host_table, enum offload_target_type 
> target_type,
>  void *target_data)
>  {
> +  int i;
> +  gomp_mutex_lock (_lock);
> +
> +  /* Load image to all initialized devices.  */
> +  for (i = 0; i < num_devices; i++)
> +{
> +  struct gomp_device_descr *devicep = [i];
> +  if (devicep->type == target_type && devicep->is_initialized)
> + gomp_offload_image_to_device (devicep, host_table, target_data);
> +}
> +
> +  /* Insert image to array of pending images.  */
>offload_images = gomp_realloc (offload_images,
>(num_offload_images + 1)
>* sizeof (struct offload_image_descr));
> @@ -663,74 +748,105 @@ GOMP_offload_register (void *host_table, enum 
> offload_target_type target_type,
>offload_images[num_offload_images].target_data = target_data;
>  
>num_offload_images++;
> +  gomp_mutex_unlock (_lock);
>  }

... it's used in this function, which is invoked from
__attributed__((constructor)) functions generated by
gcc/config/*/*mkoffload.c, and there is no guaranteed ordering for
constructor functions [... -- see source code comment added in the
following patch]:

commit 81d25f393d9b0fdca309354e6a61f707ff403fe2
Author: Thomas Schwinge 
Date:   Fri Sep 25 16:41:29 2015 +0200

libgomp: Compile-time error for non-portable gomp_mutex_t initialization

libgomp/
* target.c [PLUGIN_SUPPORT]: Compile-time error if
!GOMP_MUTEX_INIT_0.
---
 libgomp/target.c |   10 ++
 1 file changed, 10 insertions(+)

diff --git libgomp/target.c libgomp/target.c
index 758ece5..49cb395 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -52,6 +52,16 @@ static pthread_once_t gomp_is_initialized = 
PTHREAD_ONCE_INIT;
 /* Mutex for offload image registration.  */
 static gomp_mutex_t register_lock;
 
+#if !GOMP_MUTEX_INIT_0
+# error Missing initialization for gomp_mutex_t register_lock.
+/* The problem is: gomp_mutex_t register_lock is used in
+   GOMP_offload_register_ver/GOMP_offload_register, which is called from a
+   __attribute__((constructor)) function (see the mkoffload files), so due to
+   non-deterministic constructor ordering, we can't have an initialization
+   constructor for gomp_mutex_t register_lock, such as
+   critical.c:initialize_critical, for example.  */
+#endif
+
 /* This structure describes an offload image.
It contains type of the target device, pointer to host table descriptor, and
pointer to target data.  */

I guess we don't have to worry about this non-portable gomp_mutex_t usage
right now; OK to commit the patch to at least document it (patch not yet
tested)?


Grüße,
 Thomas


signature.asc
Description: PGP signature

[Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h

2015-09-25 Thread James Greenhalgh


Simple code move. We're going to allow targets to work with this
information, so pull it somewhere they can see it.

No issues building toolchains after this transformation.

OK?

Thanks,
James

---
2015-09-26  James Greenhalgh  

* ifcvt.c (noce_if_info): Move to...
* ifcvt.h (noce_if_info): ...Here.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index e89d567..d7fc523 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -769,67 +769,6 @@ cond_exec_process_if_block (ce_if_block * ce_info,
   return FALSE;
 }
 
-/* Used by noce_process_if_block to communicate with its subroutines.
-
-   The subroutines know that A and B may be evaluated freely.  They
-   know that X is a register.  They should insert new instructions
-   before cond_earliest.  */
-
-struct noce_if_info
-{
-  /* The basic blocks that make up the IF-THEN-{ELSE-,}JOIN block.  */
-  basic_block test_bb, then_bb, else_bb, join_bb;
-
-  /* The jump that ends TEST_BB.  */
-  rtx_insn *jump;
-
-  /* The jump condition.  */
-  rtx cond;
-
-  /* New insns should be inserted before this one.  */
-  rtx_insn *cond_earliest;
-
-  /* Insns in the THEN and ELSE block.  There is always just this
- one insns in those blocks.  The insns are single_set insns.
- If there was no ELSE block, INSN_B is the last insn before
- COND_EARLIEST, or NULL_RTX.  In the former case, the insn
- operands are still valid, as if INSN_B was moved down below
- the jump.  */
-  rtx_insn *insn_a, *insn_b;
-
-  /* The SET_SRC of INSN_A and INSN_B.  */
-  rtx a, b;
-
-  /* The SET_DEST of INSN_A.  */
-  rtx x;
-
-  /* True if this if block is not canonical.  In the canonical form of
- if blocks, the THEN_BB is the block reached via the fallthru edge
- from TEST_BB.  For the noce transformations, we allow the symmetric
- form as well.  */
-  bool then_else_reversed;
-
-  /* True if the contents of then_bb and else_bb are a
- simple single set instruction.  */
-  bool then_simple;
-  bool else_simple;
-
-  /* The total rtx cost of the instructions in then_bb and else_bb.  */
-  unsigned int then_cost;
-  unsigned int else_cost;
-
-  /* Estimated cost of the particular branch instruction.  */
-  unsigned int branch_cost;
-
-  /* For if-convert transformations, the legacy way to decide whether
- the transformation should be applied is a comparison of a magic
- number against BRANCH_COST.  Ultimately, this should go away, but
- to avoid regressing targets this field encodes that number so the
- profitability analysis can remain unchanged.  */
-  unsigned int magic_number;
-
-};
-
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
 static int noce_try_move (struct noce_if_info *);
 static int noce_try_store_flag (struct noce_if_info *);
diff --git a/gcc/ifcvt.h b/gcc/ifcvt.h
index 3e3dc5b..f1c2dc9 100644
--- a/gcc/ifcvt.h
+++ b/gcc/ifcvt.h
@@ -40,4 +40,64 @@ struct ce_if_block
   int pass;/* Pass number.  */
 };
 
+/* Used by noce_process_if_block to communicate with its subroutines.
+
+   The subroutines know that A and B may be evaluated freely.  They
+   know that X is a register.  They should insert new instructions
+   before cond_earliest.  */
+
+struct noce_if_info
+{
+  /* The basic blocks that make up the IF-THEN-{ELSE-,}JOIN block.  */
+  basic_block test_bb, then_bb, else_bb, join_bb;
+
+  /* The jump that ends TEST_BB.  */
+  rtx_insn *jump;
+
+  /* The jump condition.  */
+  rtx cond;
+
+  /* New insns should be inserted before this one.  */
+  rtx_insn *cond_earliest;
+
+  /* Insns in the THEN and ELSE block.  There is always just this
+ one insns in those blocks.  The insns are single_set insns.
+ If there was no ELSE block, INSN_B is the last insn before
+ COND_EARLIEST, or NULL_RTX.  In the former case, the insn
+ operands are still valid, as if INSN_B was moved down below
+ the jump.  */
+  rtx_insn *insn_a, *insn_b;
+
+  /* The SET_SRC of INSN_A and INSN_B.  */
+  rtx a, b;
+
+  /* The SET_DEST of INSN_A.  */
+  rtx x;
+
+  /* True if this if block is not canonical.  In the canonical form of
+ if blocks, the THEN_BB is the block reached via the fallthru edge
+ from TEST_BB.  For the noce transformations, we allow the symmetric
+ form as well.  */
+  bool then_else_reversed;
+
+  /* True if the contents of then_bb and else_bb are a
+ simple single set instruction.  */
+  bool then_simple;
+  bool else_simple;
+
+  /* The total rtx cost of the instructions in then_bb and else_bb.  */
+  unsigned int then_cost;
+  unsigned int else_cost;
+
+  /* Estimated cost of the particular branch instruction.  */
+  unsigned int branch_cost;
+
+  /* For some if-convert transformations, the canonical way to decide
+ whether the transformation should be applied is a comparison of
+ a magic number against BRANCH_COST.  Ultimately, this should go
+ away, but to avoid regressing targets this field encodes that
+ number so the

Re: libgomp: Compile-time error for non-portable gomp_mutex_t initialization (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks)

2015-09-25 Thread Jakub Jelinek

On Fri, Sep 25, 2015 at 05:04:47PM +0200, Thomas Schwinge wrote:
> Hi!
> 
> It's Friday afternoon -- but anyway, is the following analysis correct?
> 
> On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin  wrote:
> > On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > > of mutex guarding gomp_target_init (which is using pthread_once guaranteed
> > > to be run just once) vs. concurrent GOMP_offload_register calls
> > > (if those are run from ctors, then I guess something like dl_load_lock
> > > ensures at least on glibc that multiple GOMP_offload_register calls aren't
> > > performed at the same time) in accessing/reallocating offload_images
> > > and num_offload_images and the lack of support to register further
> > > images after the gomp_target_init call (if you dlopen further shared
> > > libraries) is really bad.  And it would be really nice to support the
> > > unloading.
> 
> > Here is the latest patch for libgomp and mic plugin.
> 
> > libgomp/
> 
> > * target.c (register_lock): New mutex for offload image registration.
> 
> > (GOMP_offload_register): Add mutex lock.

That is definitely wrong.  You'd totally break --disable-linux-futex support
on linux and bootstrap on e.g. Solaris and various other pthread targets.
At least for ELF and dynamic linking, shared libraries that contain
constructors that call GOMP_offload_register* should have DT_NEEDED libgomp
and thus libgomp's constructors should be run before the constructors of
the libraries that call GOMP_offload_register*.

For the targets without known zero initializer for gomp_mutex_lock, either
there is an option to use pthread_once to make sure it is initialized once,
or there is an option to define a macro like GOMP_MUTEX_INITIALIZER,
defined to PTHREAD_MUTEX_INITIALIZER in config/posix/mutex.h and to
{ 0 } in config/linux/mutex.h and something like {} or whatever in
config/rtems/mutex.h.  Then for the non-automatic non-heap
gomp_mutex_t's you could just initialize them in their initializers
with GOMP_MUTEX_INITIALIZER.

Jakub

Re: [RFC] Try vector as a new representation for vector masks

2015-09-25 Thread Ilya Enkovich

2015-09-23 16:53 GMT+03:00 Richard Biener :
> On Wed, Sep 23, 2015 at 3:41 PM, Ilya Enkovich  wrote:
>> 2015-09-18 16:40 GMT+03:00 Ilya Enkovich :
>>> 2015-09-18 15:22 GMT+03:00 Richard Biener :

 I was thinking about targets not supporting generating vec
 (of whatever mode) from a comparison directly but only via
 a COND_EXPR.
>>>
>>> Where may these direct comparisons come from? Vectorizer never
>>> generates unsupported statements. It means we get them from
>>> gimplifier? So touch optabs in gimplifier to avoid direct comparisons?
>>> Actually vect lowering checks if we are able to make comparison and
>>> expand also uses vec_cond to expand vector comparison, so probably we
>>> may live with them.
>>>

 Not sure if we are always talking about the same thing for
 "bool patterns".  I'd remove bool patterns completely, IMHO
 they are not necessary at all.
>>>
>>> I refer to transformations made by vect_recog_bool_pattern. Don't see
>>> how to remove them completely for targets not supporting comparison
>>> vectorization.
>>>

 I think we do allow this, just the vectorizer doesn't expect it.  In the 
 long
 run I want to get rid of the GENERIC exprs in both COND_EXPR and
 VEC_COND_EXPR.  Just didn't have the time to do this...
>>>
>>> That would be nice. As a first step I'd like to support optabs for
>>> VEC_COND_EXPR directly using vec.
>>>
>>> Thanks,
>>> Ilya
>>>

 Richard.
>>
>> Hi Richard,
>>
>> Do you think we have enough confidence approach is working and we may
>> start integrating it into trunk? What would be integration plan then?
>
> I'm still worried about the vec vector size vs. element size
> issue (well, somewhat).

Yeah, I hit another problem related to element size in vec lowering.
It uses inner type sizes in expand_vector_piecewise and bool vector
expand goes in a wrong way. There were also other places with similar
problems and therefore I want to try to use bools of different sizes
and see how it goes. Also having different sized bools may be useful
to represent masks pack/unpack in scalar code.

>
> Otherwise the integration plan would be
>
>  1) put in the vector GIMPLE type support and change the vector
> comparison type IL requirement to be vector,
> fixing all fallout
>
>  2) get support for directly expanding vector comparisons to
> vector and make use of that from the x86 backend
>
>  3) make the vectorizer generate the above if supported
>
> I think independent improvements are
>
>  1) remove (most) of the bool patterns from the vectorizer
>
>  2) make VEC_COND_EXPR not have a GENERIC comparison embedded

Sounds great!

Ilya

>
> (same for COND_EXPR?)
>
> Richard.
>
>> Thanks,
>> Ilya

Re: [RFC PATCH] parse #pragma GCC diagnostic in libcpp

2015-09-25 Thread Dodji Seketeli

Manuel López-Ibáñez  writes:

> Currently, #pragma GCC diagnostic is handled entirely by the FE. This
> has several drawbacks:
>
> * PR c++/53431 - C++ preprocessor ignores #pragma GCC diagnostic: The
> C++ parser lexes (and preprocesses) before handling the pragmas.
>
> * PR 53920 - "gcc -E" does not honor #pragma GCC diagnostic ignored
> "-Wunused-macro": Because -E does not invoke the FE code that parses
> the FE pragmas.
>
> * PR 64698 - preprocessor ignores #pragma GCC diagnostic when using
> -save-temps. Same issue as above.
>
> The following patch moves the handling of #pragma GCC diagnostic to
> libcpp but keeps the interface with the diagnostic machinery in the FE
> by using a call-back function.
>
> One serious problem with this approach is that the preprocessor will
> delete the pragmas from the preprocessed output, thus '-E',
> '-save-temps'  will not contain the pragmas and compiling the
> preprocessed file will trigger the warnings that they were meant to
> suppress.  Any ideas how to prevent libcpp from deleting the #pragmas?

[...]

Joseph Myers  writes:

> On Sun, 20 Sep 2015, Manuel López-Ibáñez wrote:

>> > I don't see any other way to fix these PRs, but I don't know how to
>> > keep the pragmas from being deleted by the preprocessor.
>
> I'd suppose you want a new type of pragma, that acts like a combination of 
> a deferred one and one for which a handler is called immediately by 
> libcpp.  libcpp would call the handler but also create a CPP_PRAGMA token.  
> The front-end code calling pragma handlers would need to know to do 
> nothing with such pragmas; the token would only be for textual 
> preprocessor output.

Right.  I was thinking about something along those lines as well.

So in concrete terms, in libcpp, once do_pragma() has handled the
pragma, it does two things that are interesting to us:

  * If the pragma is an internal one -- one that is handled immediately by
libcpp, then it keeps going, chewing away at more tokens.

  * If the pragma is a deffered one -- one that is to be handled later
by the FE, it updates pfile->directive_result.type to CPP_PRAGMA
(and sets up some ancillary state there too).

The caller of do_pragma(), which is destringize_and_run() then detects
that pfile->directive_result.type is set, and then puts the tokens of
the pragma back into the input stream again.  So next time the FE
requests more tokens, it's going to get the same pragma tokens.

So, maybe you could alter pragma_entry::is_deferred; change it into a
flag which type is an enum that says how the the pragma is to be
handled; either internally and its tokens shouldn't be visible to the FE
(this is what the current pragma_entry::is_internal means), internally
and the tokens would be visible to the FE, or deferred.

Then do do_pragma() would be adjusted to change the if (p->is_deferred)
clause to allow the third handling kind I just talked about.

I hope this helps.

Cheers,

-- 
Dodji

Re: Dead code in move_stmt_r

2015-09-25 Thread Jeff Law


On 09/25/2015 09:41 AM, Bernd Schmidt wrote:

I noticed this a while ago while trying to defer parts of omp expansion
until after LTO. move_tree_r, which is used when moving code into omp
child functions, has code not to remap variables inside an OMP
directive. However, by the time we get here, these directives should all
have been expanded already, and the code appears to be dead.

Hence, the following patch. Bootstrapped and tested on x86_64-linux, ok?

OK.
jeff

Re: [PATCH 3/5] Implement token range tracking within libcpp and the C FE (v2)

2015-09-25 Thread Dodji Seketeli

David Malcolm  a écrit:

> On Fri, 2015-09-25 at 11:13 +0200, Dodji Seketeli wrote:
[...]

>> Funny; I first overlooked this comment of you, and then when reading the
>> patch, I asked myself "why keep the existing location_t" ?  I mean, in
>> here:
>> 
>>  /* A preprocessing token.  This has been carefully packed and should
>> -   occupy 16 bytes on 32-bit hosts and 24 bytes on 64-bit hosts.  */
>> +   occupy 16 bytes on 32-bit hosts and 24 bytes on 64-bit hosts.
>> +   FIXME: the above comment is no longer true with this patch.  */
>>  struct GTY(()) cpp_token {
>>source_location src_loc;  /* Location of first char of token.  */
>> +  source_range src_range;   /* Source range covered by the token.  
>> */
>>ENUM_BITFIELD(cpp_ttype) type : CHAR_BIT;  /* token type */
>>unsigned short flags; /* flags - see above */
>>  
>> You could just change the type of src_loc and make it be a source_range.
>
> Interesting idea.
>
> For the general case of expressions, I want a location to mean a
> caret/point plus a range that contains it, but for tokens, the
> caret/point is always at the start of the range.  So maybe a src_range
> would do here.
>
> That said, in patches 3 and 4 of this kit I'm experimenting with
> representation; as I said in the blurb for patch 3: "See the
> cover-letter for this patch kit which describes how we might go back to
> using just a location_t, and stashing the range inside the location_t.
> I'm doing it this way for now to allow for more flexibility as I
> benchmark and explore implementation options."

Right.

> So patches 3 and 4 are rather more experimental than patches 1,2 and 5,
> as I find out what the different representations do to the time
> consumption of the compiler.

Understood.

> I like the idea of "source_location" and "location_t" becoming a
> representation of "(point/caret + range)" rather than just a
> point/caret, since this means that we can pass location_t around as
> before, but then we can extract ranges from them, and all of the
> existing diagnostics ought to "automagically" gain underlines "for
> free".

Me too.

>  I'm experimenting with ways to try to do that efficiently, with
> strategies for packing things into the 32-bits compactly; see e.g.:
>  https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01826.html
>
> If so, then cpp_token's src_loc would remain a source_location;
> source_location itself becomes richer.
>
>> Source range could take a converting constructor, that takes a
>> source_location, so that the existing code that does "cpp_token.src_loc
>> = a_source_location;" keeps working.
>> 
>> But then, in the previous patch of the series, I see:
>> 
>> +/* A range of source locations.
>> +
>> +   Ranges are closed:
>> +   m_start is the first location within the range,
>> +   m_finish is the last location within the range.
>> +
>> +   We may need a more compact way to store these, but for now,
>> +   let's do it the simple way, as a pair.  */
>> +struct GTY(()) source_range
>> +{
>> +  source_location m_start;
>> +  source_location m_finish;
>> +
>> +  void debug (const char *msg) const;
>> +
>> +  /* We avoid using constructors, since various structs that
>> + don't yet have constructors will embed instances of
>> + source_range.  */
>> +
>> 
>> But what if we define a default constructor for that one (as well as one
>> that takes a source_location)?  Wouldn't that work for the embedding
>> case that you talk about in that comment?
>
> Perhaps, but I worry that it would lead to a cascade, where suddenly
> we'd need constructors for various other types.  I can try it, I
> guess.

If it leads to a cascade, then don't bother.  My point was precisely to
try to avoid the churn, while limiting the amount of data size increase
for cpp_token in general.

As you implied, if we can just stay with a source_location that carries
the information of a pointer plus a range, even better.

> [BTW, I'm about to disappear on a vacation from tomorrow until October
> 6th, and away from computers]

Thanks for the heads-up.

-- 
Dodji

Re: New post-LTO OpenACC pass

2015-09-25 Thread Nathan Sidwell


On 09/25/15 06:28, Bernd Schmidt wrote:



This is the c-c++-common/goacc/acc_on_device-2.c testcase. Is that expected to
be handled? If I change it to use __builtin_acc_on_device, I can step right into

Breakpoint 8, fold_call_stmt (stmt=0x70736e10, ignore=false) at
../../git/gcc/builtins.c:12277
12277  tree ret = NULL_TREE;

Maybe you were compiling without optimization? In that case
expand_builtin_acc_on_device (which already exists) should still end up doing
the right thing. In no case should you see a RTL call to a function, that
indicates that something else went wrong.


I think I was reading more into the std than it intended, as it claims 
on_deveice should evaluate 'to a constant'.  (no mention of 'when optimizing'). 
 It can't mean 'be useable in integral-constant-expression, as at the point we 
 need those, one doesn't know the value it should be.


thinking about it, I don't think a user can tell.  the case I had in mind (and 
have used it for), is something like


on_device (nvidia)  ? asm ("NVIDIA specific asm") : c-expr

and for that to work, one must turn the optimzer on to get the dead code 
removal, regardless of where on_device expands.  So my goal of  getting it 
expanded regardless of optimization level is not needed --- indeed getting it 
expanded in fold_call_stmt will mean the body of expand_on_device can go away (I 
think).


From the POV of what the programmer really cares about is that when optimizing 
the compiler  knows how to fold it.



Can you send me the patch you tried (and possibly a testcase you expect to be
handled), I'll see if I can find out what's going on.


Thanks!  When things didn't work, I tried getting it workong on the gomp4 
branch, as I new what to expect there.  So the patch is for that branch.


The fails I observed are:

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/gang-static-2.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0 
execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/gang-static-2.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2 
execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/if-1.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/gang-static-2.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0 
execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/gang-static-2.c 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2 
execution test



the diff I have is attached -- as you can see it's 'experimental'.

nathan
Index: builtins.c
===
--- builtins.c	(revision 228094)
+++ builtins.c	(working copy)
@@ -5866,6 +5866,8 @@ expand_stack_save (void)
 static rtx
 expand_builtin_acc_on_device (tree exp, rtx target)
 {
+   gcc_unreachable ();
+  
 #ifndef ACCEL_COMPILER
   gcc_assert (!get_oacc_fn_attrib (current_function_decl));
 #endif
@@ -10272,6 +10274,27 @@ fold_builtin_1 (location_t loc, tree fnd
 	return build_empty_stmt (loc);
   break;
 
+case BUILT_IN_ACC_ON_DEVICE:
+  /* Don't fold on_device until we know which compiler is active.  */
+  if (symtab->state == EXPANSION)
+	{
+	  unsigned val_host = GOMP_DEVICE_HOST;
+	  unsigned val_dev = GOMP_DEVICE_NONE;
+
+#ifdef ACCEL_COMPILER
+	  val_host = GOMP_DEVICE_NOT_HOST;
+	  val_dev = ACCEL_COMPILER_acc_device;
+#endif
+	  tree host = build2 (EQ_EXPR, boolean_type_node, arg0,
+			  build_int_cst (integer_type_node, val_host));
+	  tree dev = build2 (EQ_EXPR, boolean_type_node, arg0,
+			 build_int_cst (integer_type_node, val_dev));
+
+	  tree result = build2 (TRUTH_OR_EXPR, boolean_type_node, host, dev);
+	  return fold_convert (integer_type_node, result);
+	}
+  break;
+
 default:
   break;
 }
Index: omp-low.c
===
--- omp-low.c	(revision 228094)
+++ omp-low.c	(working copy)
@@ -14725,21 +14725,20 @@ static void
 oacc_xform_on_device (gcall *call)
 {
   tree arg = gimple_call_arg (call, 0);
-  unsigned val = GOMP_DEVICE_HOST;
-	  
-#ifdef ACCEL_COMPILER
-  val = GOMP_DEVICE_NOT_HOST;
-#endif
-  tree result = build2 (EQ_EXPR, boolean_type_node, arg,
-			build_int_cst (integer_type_node, val));
+  unsigned val_host = GOMP_DEVICE_HOST;
+  unsigned val_dev = GOMP_DEVICE_NONE;
+
 #ifdef ACCEL_COMPILER
-  {
-tree dev  = build2 (EQ_EXPR, boolean_type_node, arg,
-			build_int_cst (integer_type_node,
-   ACCEL_COMPILER_acc_device));
-result = build2 (TRUTH_OR_EXPR, boolean_type_node, result, dev);
-  }
+  val_host = GOMP_DEVICE_NOT_HOST;
+  val_dev = ACCEL_COMPILER_acc_device;
 #endif
+
+  tree host = build2 (EQ_EXPR, boolean_type_node, arg,
+

Re: [gomp4] Another oacc reduction simplification

2015-09-25 Thread Nathan Sidwell


On 09/24/15 16:32, Cesar Philippidis wrote:

On 09/22/2015 08:29 AM, Nathan Sidwell wrote:


1) Don't have a fake gang reduction outside of worker & vector loops.
Deal with the receiver object directly.  I.e. 'ref_to_res' need not be a
null pointer for vector and worker loops.


What happens when there is no receiver object. E.g. a reduction inside a
routine? Specifically, inside lower_oacc_reductions, your doing this:

/* This is the outermost construct with this reduction,
   see if there's a mapping for it.  */
if (maybe_lookup_field (orig, outer))
  ref_to_res = build_receiver_ref (orig, false, outer);

That's going to ICE inside a routine.


Is it?  the 'maybe_lookup' should protect against that.  do you have a testcase?

nathan

Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67465: Handle pairs of complex+simple blocks and empty blocks more gracefully

2015-09-25 Thread Rainer Orth

Hi Kyrill,

> Bootstrapped and tested on aarch64 and x86_64.
> Rainer, could you please try this patch in combination with the one I sent
> earlier at:
> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00815.html

it took me quite a bit, but I've now regtested those two patches: with
them both applied, the sparc-sun-solaris2.12 build succeeds and the two
gcc.c-torture/execute/20071216-1.c failures are gone.

So, from a SPARC POV the patches are good to go.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67465: Handle pairs of complex+simple blocks and empty blocks more gracefully

2015-09-25 Thread Kyrill Tkachov


Hi Rainer,

On 25/09/15 11:57, Rainer Orth wrote:

Hi Kyrill,


Bootstrapped and tested on aarch64 and x86_64.
Rainer, could you please try this patch in combination with the one I sent
earlier at:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00815.html

it took me quite a bit, but I've now regtested those two patches: with
them both applied, the sparc-sun-solaris2.12 build succeeds and the two
gcc.c-torture/execute/20071216-1.c failures are gone.

So, from a SPARC POV the patches are good to go.


Phew, thanks a lot!

So, in conclusion the patches I'd like approval for are:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01306.html
and
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00815.html


Kyrill



Thanks.
 Rainer

Re: [PATCH, fortran] Revival of AUTOMATIC patch

2015-09-25 Thread Jim MacArthur

On Thu, Sep 24, 2015 at 10:58:41PM +0200, FX wrote:
> > I think I appreciate what you are trying to do here.  I don't intend to 
> > sound
> > negative here, but if the keyword AUTOMATIC does nothing
> 
> The testcase given is not an example of useful AUTOMATIC. I think it is 
> meant to be used to oppose an implied SAVE attribute, e.g. a variable with 
> explicit initialization or the BIND attribute. Indeed, in the case of 
> implied SAVE by initialization, there it is a little bit more work because 
> you have to move the initialization to the executable part of the code. But 
> that’s not impossible.

The automatic_1 test case was only intended to demonstrate that AUTOMATIC has 
an effect, not a useful one. I don't have the option of being able to 
rewrite all our source code, so I am trying to make a compiler which mimics 
some older proprietary ones; I understand that these features may not be 
useful to someone writing new Fortran code.

> 
> All in all I’m skeptical of adding even more old language extensions with 
> little demand when we have a hard time filling up gaps in the standard. Each 
> addition adds to maintainance load, especially as they might not interact 
> too well with more modern features. (For example coarrays or BIND attribute, 
> which were not around when AUTOMATIC was in use.)
> 
> I don’t find any request for this feature in the whole bugzilla database.

That's understandable. We'll maintain this feature in our own delta. I felt it 
was in the spirit of open source to offer it in case it was useful.

Thanks for taking the time to review it.

Jim

[ubsan PATCH] Clear up unnecessary code

2015-09-25 Thread Marek Polacek

As discussed earlier, this patch removes now useless code and adds
tests to ensure we don't regress in diagnostics.

Bootstrapped/regtested on x86_64-linux, applying to trunk.

2015-09-25  Marek Polacek  

* c-ubsan.c (ubsan_instrument_division): Remove unnecessary code.
(ubsan_instrument_shift): Likewise.

* c-c++-common/ubsan/bounds-11.c: New test.
* c-c++-common/ubsan/bounds-12.c: New test.

diff --git gcc/c-family/c-ubsan.c gcc/c-family/c-ubsan.c
index d2bc264..672762c 100644
--- gcc/c-family/c-ubsan.c
+++ gcc/c-family/c-ubsan.c
@@ -89,20 +89,7 @@ ubsan_instrument_division (location_t loc, tree op0, tree 
op1)
 return NULL_TREE;
 
   /* In case we have a SAVE_EXPR in a conditional context, we need to
- make sure it gets evaluated before the condition.  If the OP0 is
- an instrumented array reference, mark it as having side effects so
- it's not folded away.  */
-  if (flag_sanitize & SANITIZE_BOUNDS)
-{
-  tree xop0 = op0;
-  while (CONVERT_EXPR_P (xop0))
-   xop0 = TREE_OPERAND (xop0, 0);
-  if (TREE_CODE (xop0) == ARRAY_REF)
-   {
- TREE_SIDE_EFFECTS (xop0) = 1;
- TREE_SIDE_EFFECTS (op0) = 1;
-   }
-}
+ make sure it gets evaluated before the condition.  */
   t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), unshare_expr (op0), t);
   t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), unshare_expr (op1), t);
   if (flag_sanitize_undefined_trap_on_error)
@@ -187,20 +174,7 @@ ubsan_instrument_shift (location_t loc, enum tree_code 
code,
 return NULL_TREE;
 
   /* In case we have a SAVE_EXPR in a conditional context, we need to
- make sure it gets evaluated before the condition.  If the OP0 is
- an instrumented array reference, mark it as having side effects so
- it's not folded away.  */
-  if (flag_sanitize & SANITIZE_BOUNDS)
-{
-  tree xop0 = op0;
-  while (CONVERT_EXPR_P (xop0))
-   xop0 = TREE_OPERAND (xop0, 0);
-  if (TREE_CODE (xop0) == ARRAY_REF)
-   {
- TREE_SIDE_EFFECTS (xop0) = 1;
- TREE_SIDE_EFFECTS (op0) = 1;
-   }
-}
+ make sure it gets evaluated before the condition.  */
   t = fold_build2 (COMPOUND_EXPR, TREE_TYPE (t), unshare_expr (op0), t);
   t = fold_build2 (TRUTH_OR_EXPR, boolean_type_node, t,
   tt ? tt : integer_zero_node);
diff --git gcc/testsuite/c-c++-common/ubsan/bounds-11.c 
gcc/testsuite/c-c++-common/ubsan/bounds-11.c
index e69de29..c3e0f22 100644
--- gcc/testsuite/c-c++-common/ubsan/bounds-11.c
+++ gcc/testsuite/c-c++-common/ubsan/bounds-11.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+
+struct S
+{
+  unsigned long a[1];
+  int l;
+};
+
+static inline unsigned long
+fn (const struct S *s, int i)
+{
+  return s->a[i] / i;
+}
+
+int
+main ()
+{
+  struct S s;
+  fn (, 1);
+}
+
+/* { dg-output "index 1 out of bounds for type 'long unsigned int \\\[1\\\]'" 
} */
diff --git gcc/testsuite/c-c++-common/ubsan/bounds-12.c 
gcc/testsuite/c-c++-common/ubsan/bounds-12.c
index e69de29..3cd3a4a 100644
--- gcc/testsuite/c-c++-common/ubsan/bounds-12.c
+++ gcc/testsuite/c-c++-common/ubsan/bounds-12.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+
+struct S
+{
+  unsigned long a[1];
+  int l;
+};
+
+static inline unsigned long
+fn (const struct S *s, int i)
+{
+  return s->a[i] << i;
+}
+
+int
+main ()
+{
+  struct S s;
+  fn (, 1);
+}
+
+/* { dg-output "index 1 out of bounds for type 'long unsigned int \\\[1\\\]'" 
} */

Marek

Re: [RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom

2015-09-25 Thread Kyrill Tkachov



On 25/09/15 11:15, Richard Biener wrote:

On Fri, 25 Sep 2015, Kyrill Tkachov wrote:


On 25/09/15 10:49, Richard Biener wrote:

On Fri, 25 Sep 2015, Kyrill Tkachov wrote:


Hi all,

On 23/09/15 11:10, Richard Biener wrote:

On Wed, 23 Sep 2015, Kyrill Tkachov wrote:


On 23/09/15 10:09, Pinski, Andrew wrote:

On Sep 23, 2015, at 1:59 AM, Kyrill Tkachov

wrote:



On 22/09/15 20:31, Jeff Law wrote:

On 09/22/2015 07:36 AM, Kyrill Tkachov wrote:
Hi all,
Unfortunately, I see a testsuite regression with this patch:
FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<"

The reduced part of that test is:
void
test1 (int x, unsigned u)
{
   if ((1U << x) != 64
   || (2 << x) != u
   || (x << x) != 384
   || (3 << x) == 9
   || (x << 14) != 98304U
   || (1 << x) == 14
   || (3 << 2) != 12)
 __builtin_abort ();
}

The patched ifcombine pass works more or less as expected and
produces
fewer basic blocks.
Before this patch a relevant part of the ifcombine dump for
test1
is:
;;   basic block 2, loop depth 0, count 0, freq 1, maybe
hot
   if (x_1(D) != 6)
 goto ;
   else
 goto ;

;;   basic block 3, loop depth 0, count 0, freq 9996, maybe
hot
   _2 = 2 << x_1(D);
   _3 = (unsigned intD.10) _2;
   if (_3 != u_4(D))
 goto ;
   else
 goto ;


After this patch it is:
;;   basic block 2, loop depth 0, count 0, freq 1, maybe
hot
   _2 = 2 << x_1(D);
   _3 = (unsigned intD.10) _2;
   _9 = _3 != u_4(D);
   _10 = x_1(D) != 6;
   _11 = _9 | _10;
   if (_11 != 0)
 goto ;
   else
 goto ;

The second form ends up generating worse codegen however, and
the
badness starts with the dom1 pass.
In the unpatched case it manages to deduce that x must be 6 by
the
time
it reaches basic block 3 and
uses that information to eliminate the shift in "_2 = 2 <<
x_1(D)"
from
basic block 3
In the patched case it is unable to make that call, I think
because
the
x != 6 condition is IORed
with another test.

I'm not familiar with the internals of the dom pass, so I'm
not
sure
where to go looking for a fix for this.
Is the ifcombine change a step in the right direction? If so,
what
would
need to be done to fix the issue with
the dom pass?

I don't see how you can reasonably fix this in DOM.  if _9 or
_10 is
true, then _11 is true.  But we can't reasonably record any kind
of
equivalence for _9 or _10 individually.

If the statement
_11 = _9 | _10;

Were changed to

_11 = _9 & _10;

Then we could record something useful about _9 and _10.



I suppose what we want is to not combine basic blocks if the
sequence
and conditions of the basic blocks are
such that dom can potentially exploit them, but how do we
express
that?

I don't think there's going to be a way to directly express
that.
You
could essentially claim that TRUTH_OR is more expensive than
TRUTH_AND
because of the impact on DOM, but that in and of itself may not
resolve
the situation either.

I think the question we need to answer is whether or not your
changes
are generally better, even if there's specific instances where
they
make
things worse.  If the benefits outweigh the negatives then we
can
xfail
that test.

Ok, I'll investigate and benchmark some more.
Andrew, this transformation to ifcombine (together with the
restriction
that the inner condition block
has to be a single comparison) was added by you with r204194.
Is there a particular reason for that restriction and why it is
applied to
the inner block and not either?

My reasoning at the time was there might be an "expensive"
instruction
or
one that might trap (I did not check to see if the other part of the
code
was detecting that).
The outer block did not need any checks as we have something like
...
If (a)
  If (b)

Or

If (a)
  Goto f
else if (b)
 
Else
{
F:

}

And there was no need to check what was before the if (a) part just
what
is
in between the two ifs.

Ah, because the code in outer_cond_bb would have to be executed anyway
whether
we perform the conversion or not, right?

All ifcombine transforms make the outer condition unconditionally
true/false thus the check should have been on whether the outer
cond BB is "empty".  Which would solve your problem, right?

Note that other transforms (bit test recognition) don't care (sth
we might want to fix?).

In general this needs a better cost function, maybe simply use
estimate_num_insns with speed estimates and compare against a
new --param.

So I've played around with that code and I think I'd like to make it a bit
more intricate. Just comparing against estimate_num_insns is too
coarse-grained and
is just a comparison by a magic number number and I've been struggling to
make
this
aggressive enough without pulling in too much code into the unconditional
path.

As far as aarch64 is concerned I want to make this transformation more
aggressive when
it can facilitate

Re: [PR64164] drop copyrename, integrate into expand

2015-09-25 Thread Richard Biener

On Wed, Sep 23, 2015 at 10:07 PM, Alexandre Oliva  wrote:
> On Sep 18, 2015, Alan Lawrence  wrote:
>
>> With the latest git commit 2b27ef197ece54c4573c5a748b0d40076e35412c on
>> branch aoliva/pr64164, I am now able to build a cross toolchain for
>> aarch64 and aarch64_be, and can confirm the ABI failure is fixed on
>> the branch.
>
> Thanks for the confirmation.  I've made one further tweak for cris and
> lm32, dropping the assert that caused build failures for libstdc++
> atomics parms that required more alignment than
> MAX_SUPPORTED_STACK_ALIGNMENT, consolidated the patchset and retested it
> with a more recent baseline (r228019), with native regstraps on
> x86_64-linux-gnu, i686-linux-gnu, powerpc64-linux-gnu,
> powerpc64le-linux-gnu, and cross toolchain builds for the following 73
> platforms: aarch64_be-elf aarch64-elf arm-eabi armeb-eabihf
> arm-symbianelf avr-elf bfin-elf c6x-elf cr16-elf cris-elf crisv32-elf
> epiphany-elf fido-elf fr30-elf frv-elf ft32-elf h8300-elf i686-elf
> ia64-elf iq2000-elf lm32-elf m32c-elf m32r-elf m32rle-elf m68k-elf
> mcore-elf mep-elf microblaze-elf mips64el-elf mips64-elf mips64orion-elf
> mips64vr-elf mipsel-elf mipsisa32-elfoabi mipsisa64-elfoabi
> mipsisa64r2el-elf mipsisa64r2-sde-elf mipsisa64sb1-elf
> mipsisa64sr71k-elf mipstx39-elf mn10300-elf moxie-elf msp430-elf
> nds32be-elf nds32le-elf nios2-elf pdp11-aout powerpc-eabialtivec
> powerpc-eabi powerpc-eabisimaltivec powerpc-eabisim powerpc-eabispe
> powerpcle-eabi powerpcle-eabisim powerpcle-elf powerpc-xilinx-eabi
> ppc64-eabi ppc-eabi ppc-elf rl78-elf rx-elf sh64-elf sh-elf
> sh-superh-elf sparc64-elf sparc-elf sparc-leon-elf spu-elf v850e-elf
> v850-elf visium-elf xstormy16-elf xtensa-elf.  Not all of them succeeded
> in building, but those that didn't failed at the very same spots before
> and after this patch.
>
>
> This patch doesn't really add much functionality.  It rather
> reimplements a lot of the ugly and fragile stuff I put in in the
> previous big patchset in a far more robust and pleasant way.  It fixes a
> number of regressions in the process, mainly because, instead of
> modifying assign_parms so as to let cfgexpand do part of its job, it
> reverts all of the RTL assignment for parameters and results to
> assign_parms.  cfgexpand now leaves the RTL assignment of partitions
> containing default defs or parms and results to assign_parms, and
> assign_parms uses a single callback, set_parm_rtl, to tell cfgexpand the
> assignment for the partition containing the default def of each
> parameter.
>
> This required introducing default defs for all parms and results, even
> if unused; we could refrain from creating them, and refrain from
> initializing those parameters (at least when optimizing), but that would
> require messing with the fragile bits in assign_parms again, and it
> would bring little benefit, since RTL optimization will likely notice
> the initialization is unused and drop it anyway.  Besides, adding the
> default defs was actually needed to fix a regression in the previous
> patch, and even with the current patch it helps make sure we don't
> assign more than one default def to the same SSA partition (the previous
> patch attempted to do that, but there was a bug, fixed in the current
> patch).  Having unused default defs makes it easier for us to decide
> whether to use an entry_value rtx for the initial debug insn of a parm.
> We track partitions holding default defs for parms and results with a
> bitmap; we used to have a bitmap that tracked partitions holding default
> defs, but it was unused!  I just renamed it and repurposed it.
>
> I've also added checking asserts to set_rtl, to verify that, when we
> expect a REG, we get a REG, and that it has the expected mode.  set_rtl
> was also adjusted to record anonymous SSA names or their base types in
> attrs of REGs or MEMs, respectively, so that code that relied on the
> attrs to detect properties of the decl types no longer regress just
> because we no longer generate decls for anonymous SSA names.  Since
> there were prior uses of types in MEM attrs, that was expected to go
> smoothly, but I was surprised at how smoothly adding SSA names to REG
> attrs went.  No adjustments required!
>
> I also tightened a bit the conditions for coalescing: we used to require
> the same canonical type; I've added tests for same alignment
> requirements, and for same signedness.  OTOH, I've added a few more
> coalesce candidates for RESULT_DECLs and the newly-added default defs of
> parms and results.
>
> Other relevant changes were in mode promotion.  TYPE_MODE would often
> return BLKmode for some vector types, which was fine for some return
> decl RTL with PARALLEL, but that didn't quite work for SSA partitions.
> There were other cases of mode promotion of result decls that failed the
> asserts in set_rtl, that revealed promote_decl_mode didn't call
> promote_function_mode as expected for results.
>
>

Re: [committed, PATCH] Change IA MCU processor from iamcu to lakemount

2015-09-25 Thread H.J. Lu

On Thu, Sep 24, 2015 at 5:19 PM, H.J. Lu  wrote:
> The first IA MCU processor will be Lakemount.  This patch changes IA MCU
> processor name from iamcu to lakemount.
>
> Tested on Linux/x86-64 with -m32.  Checked into trunk.
>
> H.J.
> --
> gcc/
>
> * config.gcc (x86_archs): Replace iamcu with lakemount.
> (with_cpu): Likewise.
> (with_arch): Likewise.
> * doc/invoke.texi: Likewise.
> * config/i386/i386-c.c (ix86_target_macros_internal): Replace
> PROCESSOR_IAMCU with PROCESSOR_LAKEMOUNT.  Replace
> __tune_iamcu__ with __tune_lakemount__.
> * config/i386/i386.c (iamcu_cost): Renamed to ...
> (lakemount_cost): This.
> (m_IAMCU): Renamed to ...
> (m_LAKEMOUNT): This.
> (initial_ix86_arch_features): Replace m_IAMCU with m_LAKEMOUNT.
> (processor_target_table): Replace "iamcu" with "lakemount".
> (processor_alias_table): Likewise.
> (ix86_issue_rate): Replace PROCESSOR_IAMCU with
> PROCESSOR_LAKEMOUNT.
> (ix86_adjust_cost): Likewise.
> (ia32_multipass_dfa_lookahead): Likewise.
> * config/i386/i386.h (processor_type): Likewise.
> * config/i386/x86-tune.def: Replace m_IAMCU with m_LAKEMOUNT.
>
> gcc/testsuite/
>
> * gcc.target/i386/pr66749.c (dg-options): Replace -mtune=iamcu
> with -mtune=lakemount.
> * gcc.target/i386/pr66821.c (dg-options): Likewise.
> * gcc.target/i386/pr67329.c (dg-options): Likewise.

It should be Lakemont, not Lakemount.  Fixed on trunk.


-- 
H.J.
From 018a22de7bebda282008ec847eff89fc0cb986b5 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 25 Sep 2015 04:12:05 -0700
Subject: [PATCH] Rename IA MCU processor lakemount to lakemont

IA MCU processor name is lakemont, not lakemount.

gcc/

	* config.gcc (x86_archs): Replace lakemount with lakemont.
	(with_cpu): Likewise.
	(with_arch): Likewise.
	* config/i386/i386-c.c (ix86_target_macros_internal): Replace
	PROCESSOR_LAKEMOUNT with PROCESSOR_LAKEMONT.  Replace
	__tune_lakemount__ with __tune_lakemont__.
	* config/i386/i386.c (lakemount_cost): Renamed to ...
	(lakemont_cost): This.
	(m_LAKEMOUNT): Renamed to ...
	(m_LAKEMONT): This.
	(initial_ix86_arch_features): Replace m_LAKEMOUNT with m_LAKEMONT.
	(processor_target_table): Replace "lakemount" with "lakemont".
	(processor_alias_table): Likewise.
	(ix86_issue_rate): Replace PROCESSOR_LAKEMOUNT with
	PROCESSOR_LAKEMONT.
	(ix86_adjust_cost): Likewise.
	(ia32_multipass_dfa_lookahead): Likewise.
	* config/i386/i386.h (processor_type): Likewise.
	* config/i386/x86-tune.def: Replace m_LAKEMOUNT with m_LAKEMONT.
	* doc/invoke.texi: Replace lakemount with lakemont.  Replace
	Lakemount with Lakemont.

gcc/testsuite/

	* gcc.target/i386/pr66749.c (dg-options): Replace
	-mtune=lakemount with -mtune=lakemont.
	* gcc.target/i386/pr66821.c (dg-options): Likewise.
	* gcc.target/i386/pr67329.c (dg-options): Likewise.
---
 gcc/ChangeLog   | 24 
 gcc/config.gcc  |  6 +++---
 gcc/config/i386/i386-c.c|  6 +++---
 gcc/config/i386/i386.c  | 16 
 gcc/config/i386/i386.h  |  2 +-
 gcc/config/i386/x86-tune.def| 26 +-
 gcc/doc/invoke.texi |  4 ++--
 gcc/testsuite/ChangeLog |  7 +++
 gcc/testsuite/gcc.target/i386/pr66749.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr66821.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr67329.c |  2 +-
 11 files changed, 64 insertions(+), 33 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9a6ea3b..fbc353d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,27 @@
+2015-09-25  H.J. Lu  
+
+	* config.gcc (x86_archs): Replace lakemount with lakemont.
+	(with_cpu): Likewise.
+	(with_arch): Likewise.
+	* config/i386/i386-c.c (ix86_target_macros_internal): Replace
+	PROCESSOR_LAKEMOUNT with PROCESSOR_LAKEMONT.  Replace
+	__tune_lakemount__ with __tune_lakemont__.
+	* config/i386/i386.c (lakemount_cost): Renamed to ...
+	(lakemont_cost): This.
+	(m_LAKEMOUNT): Renamed to ...
+	(m_LAKEMONT): This.
+	(initial_ix86_arch_features): Replace m_LAKEMOUNT with m_LAKEMONT.
+	(processor_target_table): Replace "lakemount" with "lakemont".
+	(processor_alias_table): Likewise.
+	(ix86_issue_rate): Replace PROCESSOR_LAKEMOUNT with
+	PROCESSOR_LAKEMONT.
+	(ix86_adjust_cost): Likewise.
+	(ia32_multipass_dfa_lookahead): Likewise.
+	* config/i386/i386.h (processor_type): Likewise.
+	* config/i386/x86-tune.def: Replace m_LAKEMOUNT with m_LAKEMONT.
+	* doc/invoke.texi: Replace lakemount with lakemont.  Replace
+	Lakemount with Lakemont.
+
 2015-09-24  H.J. Lu  
 
 	* config.gcc (x86_archs): Replace iamcu with lakemount.
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 41814b8..c4c11f9 100644
--- a/gcc/config.gcc

Re: [PATCH v2][GCC] Algorithmic optimization in match and simplify

2015-09-25 Thread Andre Vieira


On 17/09/15 10:46, Richard Biener wrote:

On Thu, Sep 3, 2015 at 1:11 PM, Andre Vieira
 wrote:

On 01/09/15 15:01, Richard Biener wrote:


On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira
 wrote:


Hi Marc,

On 28/08/15 19:07, Marc Glisse wrote:



(not a review, I haven't even read the whole patch)

On Fri, 28 Aug 2015, Andre Vieira wrote:


2015-08-03  Andre Vieira  

* match.pd: Added new patterns:
  ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
  (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>}
C1)




+(for op0 (rshift rshift lshift lshift bit_and bit_and)
+ op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
+ op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)

You can nest for-loops, it seems clearer as:
(for op0 (rshift lshift bit_and)
 (for op1 (bit_ior bit_xor)
  op2 (bit_xor bit_ior)




Will do, thank you for pointing it out.




+(simplify
+ (op2:c
+  (op1:c
+   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)

I suspect you will want more :s (single_use) and less :c
(canonicalization
should put constants in second position).


I can't find the definition of :s (single_use).



Sorry for that - didn't get along updating it yet :/  It restricts
matching to
sub-expressions that have a single-use.  So

+  a &= 0xd123;
+  unsigned short tem = a ^ 0x6040;
+  a = tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071).  */
... use of tem ...

we shouldn't do the simplifcation here because the expression
(a & 0x123) ^ 0x6040 is kept live by 'tem'.


GCC internals do point out
that canonicalization does put constants in the second position, didnt
see
that first. Thank you for pointing it out.


+   C1 = wi::bit_and_not (C1,C2);

Space after ','.


Will do.


Having wide_int_storage in many places is surprising, I can't find
similar
code anywhere else in gcc.




I tried looking for examples of something similar, I think I ended up
using
wide_int because I was able to convert easily to and from it and it has
the
"mask" and "wide_int_to_tree" functions. I welcome any suggestions on
what I
should be using here for integer constant transformations and
comparisons.



Using wide-ints is fine, but you shouldn't need 'wide_int_storage'
constructors - those
are indeed odd.  Is it just for the initializers of wide-ints?

+wide_int zero_mask = wi::zero (prec);
+wide_int C0 = wide_int_storage (@1);
+wide_int C1 = wide_int_storage (@2);
+wide_int C2 = wide_int_storage (@3);
...
+   zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false,
prec));

tree_to_uhwi (@1) should do the trick as well

+   C1 = wi::bit_and_not (C1,C2);
+   cst_emit = wi::bit_or (C1, C2);

the ops should be replacable with @2 and @3, the store to C1 obviously not
(but you can use a tree temporary and use wide_int_to_tree here to avoid
the back-and-forth for the case where C1 is not assigned to).

Note that transforms only doing association are prone to endless recursion
in case some other pattern does the reverse op...

Richard.



BR,
Andre




Thank you for all the comments, see reworked version:

Two new algorithmic optimisations:
   1.((X op0 C0) op1 C1) op2 C2)
 with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2
 zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0)
 and 0's otherwise.
 if (op1 == '^') C1 &= ~C2 (Only changed if actually emitted)
 if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2)
 if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2)
   2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)


This patch does two algorithmic optimisations that target patterns like:
(((x << 24) | 0x00FF) ^ 0xFF00) and ((x ^ 0x4002) >> 1) |
0x8000.

The transformation uses the knowledge of which bits are zero after
operations like (X {&,<<,(unsigned)>>}) to combine constants, reducing
run-time operations.
The two examples above would be transformed into (X << 24) ^ 0x and
(X >> 1) ^ 0xa001 respectively.

The second transformation enables more applications of the first. Also some
targets may benefit from delaying shift operations. I am aware that such an
optimization, in combination with one or more optimizations that cause the
reverse transformation, may lead to an infinite loop. Though such behavior
has not been detected during regression testing and bootstrapping on
aarch64.


+/* (X bit_op C0) rshift C1 -> (X rshift C0) bit_op (C0 rshift C1) */
+(for bit_op (bit_ior bit_xor bit_and)
+(simplify
+ (rshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
+ (bit_op
+  (rshift @0 @2)
+  { wide_int_to_tree (type, wi::rshift (@1, @2, TYPE_SIGN (type))); })))
+
+/* (X bit_op C0) lshift C1 -> (X lshift C0) bit_op (C0 lshift C1) */
+(for bit_op (bit_ior bit_xor bit_and)
+(simplify
+ (lshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
+ (bit_op
+  (lshift @0 @2)
+

Re: [RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom

2015-09-25 Thread Richard Biener

On Fri, 25 Sep 2015, Kyrill Tkachov wrote:

> 
> On 25/09/15 11:15, Richard Biener wrote:
> > On Fri, 25 Sep 2015, Kyrill Tkachov wrote:
> > 
> > > On 25/09/15 10:49, Richard Biener wrote:
> > > > On Fri, 25 Sep 2015, Kyrill Tkachov wrote:
> > > > 
> > > > > Hi all,
> > > > > 
> > > > > On 23/09/15 11:10, Richard Biener wrote:
> > > > > > On Wed, 23 Sep 2015, Kyrill Tkachov wrote:
> > > > > > 
> > > > > > > On 23/09/15 10:09, Pinski, Andrew wrote:
> > > > > > > > > On Sep 23, 2015, at 1:59 AM, Kyrill Tkachov
> > > > > > > > > 
> > > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > On 22/09/15 20:31, Jeff Law wrote:
> > > > > > > > > > > On 09/22/2015 07:36 AM, Kyrill Tkachov wrote:
> > > > > > > > > > > Hi all,
> > > > > > > > > > > Unfortunately, I see a testsuite regression with this
> > > > > > > > > > > patch:
> > > > > > > > > > > FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<"
> > > > > > > > > > > 
> > > > > > > > > > > The reduced part of that test is:
> > > > > > > > > > > void
> > > > > > > > > > > test1 (int x, unsigned u)
> > > > > > > > > > > {
> > > > > > > > > > >if ((1U << x) != 64
> > > > > > > > > > >|| (2 << x) != u
> > > > > > > > > > >|| (x << x) != 384
> > > > > > > > > > >|| (3 << x) == 9
> > > > > > > > > > >|| (x << 14) != 98304U
> > > > > > > > > > >|| (1 << x) == 14
> > > > > > > > > > >|| (3 << 2) != 12)
> > > > > > > > > > >  __builtin_abort ();
> > > > > > > > > > > }
> > > > > > > > > > > 
> > > > > > > > > > > The patched ifcombine pass works more or less as expected
> > > > > > > > > > > and
> > > > > > > > > > > produces
> > > > > > > > > > > fewer basic blocks.
> > > > > > > > > > > Before this patch a relevant part of the ifcombine dump
> > > > > > > > > > > for
> > > > > > > > > > > test1
> > > > > > > > > > > is:
> > > > > > > > > > > ;;   basic block 2, loop depth 0, count 0, freq 1,
> > > > > > > > > > > maybe
> > > > > > > > > > > hot
> > > > > > > > > > >if (x_1(D) != 6)
> > > > > > > > > > >  goto ;
> > > > > > > > > > >else
> > > > > > > > > > >  goto ;
> > > > > > > > > > > 
> > > > > > > > > > > ;;   basic block 3, loop depth 0, count 0, freq 9996,
> > > > > > > > > > > maybe
> > > > > > > > > > > hot
> > > > > > > > > > >_2 = 2 << x_1(D);
> > > > > > > > > > >_3 = (unsigned intD.10) _2;
> > > > > > > > > > >if (_3 != u_4(D))
> > > > > > > > > > >  goto ;
> > > > > > > > > > >else
> > > > > > > > > > >  goto ;
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > After this patch it is:
> > > > > > > > > > > ;;   basic block 2, loop depth 0, count 0, freq 1,
> > > > > > > > > > > maybe
> > > > > > > > > > > hot
> > > > > > > > > > >_2 = 2 << x_1(D);
> > > > > > > > > > >_3 = (unsigned intD.10) _2;
> > > > > > > > > > >_9 = _3 != u_4(D);
> > > > > > > > > > >_10 = x_1(D) != 6;
> > > > > > > > > > >_11 = _9 | _10;
> > > > > > > > > > >if (_11 != 0)
> > > > > > > > > > >  goto ;
> > > > > > > > > > >else
> > > > > > > > > > >  goto ;
> > > > > > > > > > > 
> > > > > > > > > > > The second form ends up generating worse codegen however,
> > > > > > > > > > > and
> > > > > > > > > > > the
> > > > > > > > > > > badness starts with the dom1 pass.
> > > > > > > > > > > In the unpatched case it manages to deduce that x must be
> > > > > > > > > > > 6 by
> > > > > > > > > > > the
> > > > > > > > > > > time
> > > > > > > > > > > it reaches basic block 3 and
> > > > > > > > > > > uses that information to eliminate the shift in "_2 = 2 <<
> > > > > > > > > > > x_1(D)"
> > > > > > > > > > > from
> > > > > > > > > > > basic block 3
> > > > > > > > > > > In the patched case it is unable to make that call, I
> > > > > > > > > > > think
> > > > > > > > > > > because
> > > > > > > > > > > the
> > > > > > > > > > > x != 6 condition is IORed
> > > > > > > > > > > with another test.
> > > > > > > > > > > 
> > > > > > > > > > > I'm not familiar with the internals of the dom pass, so
> > > > > > > > > > > I'm
> > > > > > > > > > > not
> > > > > > > > > > > sure
> > > > > > > > > > > where to go looking for a fix for this.
> > > > > > > > > > > Is the ifcombine change a step in the right direction? If
> > > > > > > > > > > so,
> > > > > > > > > > > what
> > > > > > > > > > > would
> > > > > > > > > > > need to be done to fix the issue with
> > > > > > > > > > > the dom pass?
> > > > > > > > > > I don't see how you can reasonably fix this in DOM.  if _9
> > > > > > > > > > or
> > > > > > > > > > _10 is
> > > > > > > > > > true, then _11 is true.  But we can't reasonably record any
> > > > > > > > > > kind
> > > > > > > > > > of
> > > > > > > > > > equivalence for _9 or _10 individually.
> > > > > > > > > > 
> > > > > > > > > > If the statement
> > > > > > > > > > _11

Re: [gcc-5-branch][PATCH][AARCH64]Fix for branch offsets over 1 MiB

2015-09-25 Thread Andre Vieira


Ping.

On 11/09/15 18:15, Andre Vieira wrote:

Conditional branches have a maximum range of [-1048576, 1048572]. Any
destination further away can not be reached by these.
To be able to have conditional branches in very large functions, we
invert the condition and change the destination to jump over an
unconditional branch to the original, far away, destination.

This patch backports the fix from trunk to the gcc-5-branch.
The original patch is at:
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01493.html

gcc/ChangeLog:
2015-09-09  Andre Vieira  

Backport from mainline:
2015-08-27  Ramana Radhakrishnan  
Andre Vieira  

* config/aarch64/aarch64.md (*condjump): Handle functions > 1 MiB.
(*cb1): Likewise.
(*tb1): Likewise.
(*cb1): Likewise.
* config/aarch64/iterators.md (inv_cb): New code attribute.
(inv_tb): Likewise.
* config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
* config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.


gcc/testsuite/ChangeLog:
2015-09-09  Andre Vieira  

Backport from mainline:
2015-08-27  Andre Vieira  

* gcc.target/aarch64/long_branch_1.c: New test.

Re: [v3 PATCH] Avoid creating dangling references in case of nested tuples for tuple constructors that construct from other tuples.

2015-09-25 Thread Jonathan Wakely


On 25/09/15 12:38 +0300, Ville Voutilainen wrote:

Tested on Linux-PPC64.

2015-09-25  Ville Voutilainen  

   Avoid creating dangling references in case of nested tuples
   for tuple constructors that construct from other tuples.
   * include/std/tuple (_TC::_NonNestedTuple): New.
   * include/std/tuple (tuple::_TNTC): New.
   * include/std/tuple (tuple(const tuple<_UElements...>&),
   tuple(tuple<_UElements...>&&): Use _TNTC.
   * testsuite/20_util/tuple/cons/nested_tuple_construct.cc: New.


OK, I guess, though it makes my head hurt!

Thanks.

Re: [PATCH 1/5] Testsuite: add dg-{begin|end}-multiline-output commands

2015-09-25 Thread Jeff Law


On 09/22/2015 03:26 PM, David Malcolm wrote:

This patch is essentially identical to v1 here:
   https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00729.html
The only change is in the ChangeLog, moving the libgo.exp
ChangeLog entry into gcc/testsuite/ChangeLog, analogous to
where Ian put it when introducing the file in r167407.

OK for trunk?

Blurb from v1 follows:

This patch adds an easy way to write tests for expected multiline
output.  For example we can test carets and underlines for
a particular diagnostic with:

/* { dg-begin-multiline-output "" }
  typedef struct _GMutex GMutex;
 ^~~
{ dg-end-multiline-output "" } */

It is used extensively by the rest of the patch kit.

multiline.exp is used by prune.exp; hence we need to load it before
prune.exp via *load_gcc_lib* for the testsuites of the various
non-"gcc" support libraries (e.g. boehm-gc).

gcc/testsuite/ChangeLog:
* lib/multiline.exp: New file.
* lib/prune.exp: Load multiline.exp.
(prune_gcc_output): Call into multiline.exp to handle any
multiline output directives.
* lib/libgo.exp: Load multiline.exp before prune.exp, using
load_gcc_lib.

boehm-gc/ChangeLog:
* testsuite/lib/boehm-gc.exp: Load multiline.exp before
prune.exp, using load_gcc_lib.

libatomic/ChangeLog:
* testsuite/lib/libatomic.exp: Load multiline.exp before
prune.exp, using load_gcc_lib.

libgomp/ChangeLog:
* testsuite/lib/libgomp.exp: Load multiline.exp before prune.exp,
using load_gcc_lib.

libitm/ChangeLog:
* testsuite/lib/libitm.exp: Load multiline.exp before prune.exp,
using load_gcc_lib.

libvtv/ChangeLog:
* testsuite/lib/libvtv.exp: Load multiline.exp before prune.exp,
using load_gcc_lib.
This stalled due to the dejagnu version discussion, which itself has 
stalled :(


I think the only issue was the loading of prune.exp and until we've 
jumped to the latest dejagnu, using load_gcc_lib is the approved way to 
deal with that problem.


S.

Approved.  Hopefully we'll be able to clean up the load_gcc_lib mess in 
the near future, but I don't see a good reason to continue to hold up 
this patch.


jeff

Re: patch for PR61578

2015-09-25 Thread Jeff Law


On 09/24/2015 02:41 PM, Vladimir Makarov wrote:

   The following patch solves the 2nd case of

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61578

   I did a lot of benchmarking of different heuristics in hard reg cost
propagation in IRA.  This is the best what I found.  The patch improves
stably code size of SPEC2000 and its score although it is not that
significant.

   The patch was tested and bootstrapped on x86-64.

   Committed as rev. 228097.

2015-09-24  Vladimir Makarov  

 PR target/61578
 * ira-color.c (update_allocno_cost): Add parameter.
 (update_costs_from_allocno): Decrease conflict cost.  Pass the new
 parameter.
FWIW, I did a quick scan for some of those old BZs.  None were improved. 
 Sigh.


jeff

Re: patch for PR61578

2015-09-25 Thread Vladimir Makarov


On 09/25/2015 12:57 PM, Jeff Law wrote:

On 09/24/2015 02:41 PM, Vladimir Makarov wrote:

   The following patch solves the 2nd case of

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61578

   I did a lot of benchmarking of different heuristics in hard reg cost
propagation in IRA.  This is the best what I found.  The patch improves
stably code size of SPEC2000 and its score although it is not that
significant.

   The patch was tested and bootstrapped on x86-64.

   Committed as rev. 228097.

2015-09-24  Vladimir Makarov  

 PR target/61578
 * ira-color.c (update_allocno_cost): Add parameter.
 (update_costs_from_allocno): Decrease conflict cost. Pass 
the new

 parameter.
FWIW, I did a quick scan for some of those old BZs.  None were 
improved.  Sigh.


Thanks for checking them, Jeff.  I don't think the work on PR61578 has 
been finished.  I guess people will present other cases which should be 
improved and may be their solutions will solve the other PRs.

Re: CSE pass prevents loop-invariant motion

2015-09-25 Thread Jeff Law


On 09/24/2015 12:29 AM, Robin Dapp wrote:

On 09/15/2015 05:25 PM, Jeff Law wrote:

On 09/15/2015 06:11 AM, Robin Dapp wrote:

Hi,

recently, I came across a problem that keeps a load instruction
in a loop although it is loop-invariant.

[..]

You might want to check your costing model -- cprop is supposed to
look at costs to help determine when propagations are useful.

jeff



I had a look at cprop and did not find any immediate references to
cost structures or functions. When or how is cprop supposed to decide
when to take action? Currently, it looks to me as if it would always
try to propagate without checking costs.

Recently, I found this
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01321.html where
Zhenqiang Chen introduces a cost check in cprop to prevent it from
undoing invariant motion's work. AFAIK it has never gone upstream.

Something similar might also help here. I'll try to adapt it to my
problem and report back.
That patch was picked up by Kugan and a variant thereof installed back 
in June on the trunk.  See cprop.c::try_replace_reg.


jeff

Re: [PATCH] v3 of diagnostic_show_locus and rich_location (was Re: [PATCH 2/5] Reimplement diagnostic_show_locus, introducing rich_location classes (v2))

2015-09-25 Thread Manuel López-Ibáñez

On 25 September 2015 at 22:11, David Malcolm  wrote:
>>
>> +  if (0)
>> +show_ruler (context, line_width, m_x_offset);
>>
>> This should probably be removed from the final code to be committed.
>
> FWIW, the ruler is very helpful to me when debugging the locus-printing
> (e.g. when adding fix-it-hints), and if we remove that if (0) call, we
> get:
>
> warning: ‘void show_ruler(diagnostic_context*, int, int)’ defined but
> not used [-Wunused-function]
>
> which will break bootstrap, so perhaps it instead should be an option?
> "-fdiagnostics-show-ruler" or somesuch?
>
> I don't know that it would be helpful to end-users though.

Functions that are useful only for debugging GCC usually start with
debug_* and have special attribute annotation (grep ^debug_) which
prevents those kinds of warnings (or the optimizers being too smart
and removing them).

Cheers,

Manuel.

[fortran, committed] Add PR55603 testcase

2015-09-25 Thread Mikael Morin


Hello,

PR55603 seems to have been fixed at some point.
I have just committed the test and I'm about to close the PR.

Mikael

Index: gcc/testsuite/gfortran.dg/allocatable_function_9.f90
===
--- gcc/testsuite/gfortran.dg/allocatable_function_9.f90	(rÃ©vision 0)
+++ gcc/testsuite/gfortran.dg/allocatable_function_9.f90	(rÃ©vision 228151)
@@ -0,0 +1,17 @@
+! { dg-do run }
+!
+! PR fortran/55603
+! Check that the allocatable result is properly freed after use.
+!
+! Contributed by Damian Rouson 
+
+  type foo
+  end type
+  type(foo) a
+  a = bar()
+contains
+  function bar()
+type(foo), allocatable :: bar
+allocate(bar)
+  end function
+end
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(rÃ©vision 228150)
+++ gcc/testsuite/ChangeLog	(rÃ©vision 228151)
@@ -1,3 +1,8 @@
+2015-09-25  Mikael Morin  
+
+	PR fortran/55603
+	* gfortran.dg/allocatable_function_9.f90: New.
+
 2015-09-25  Oleg Endo  
 
 	PR target/67675

[PATCH] fix PR67700

2015-09-25 Thread Sebastian Pop

The patch makes the detection of scop parameters in parameter_index_in_region a
bit more conservative by discarding scalar variables defined in function of data
references defined in the scop.

2015-09-25  Aditya Kumar  
Sebastian Pop  

PR tree-optimization/67700
* graphite-sese-to-poly.c (parameter_index_in_region): Call
invariant_in_sese_p_rec.
(extract_affine): Same.
(rewrite_cross_bb_scalar_deps): Call update_ssa.
* sese.c (invariant_in_sese_p_rec): Export.  Handle vdefs and 
vuses.
* sese.h (invariant_in_sese_p_rec): Declare.

* testsuite/gcc.dg/graphite/run-id-pr67700.c: New.
---
 gcc/graphite-sese-to-poly.c|  8 +-
 gcc/sese.c | 14 --
 gcc/sese.h |  1 +
 gcc/testsuite/gcc.dg/graphite/run-id-pr67700.c | 36 ++
 4 files changed, 56 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/run-id-pr67700.c

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 09a2f91..3b8dd56 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -758,6 +758,9 @@ parameter_index_in_region (tree name, sese region)
   if (TREE_CODE (TREE_TYPE (name)) != INTEGER_TYPE)
 return -1;
 
+  if (!invariant_in_sese_p_rec (name, region))
+return -1;
+
   i = parameter_index_in_region_1 (name, region);
   if (i != -1)
 return i;
@@ -813,7 +816,8 @@ extract_affine (scop_p s, tree e, __isl_take isl_space 
*space)
   break;
 
 case SSA_NAME:
-  gcc_assert (-1 != parameter_index_in_region_1 (e, SCOP_REGION (s)));
+  gcc_assert (-1 != parameter_index_in_region_1 (e, s->region)
+ || !invariant_in_sese_p_rec (e, s->region));
   res = extract_affine_name (s, e, space);
   break;
 
@@ -2462,6 +2466,8 @@ rewrite_cross_bb_scalar_deps (scop_p scop, 
gimple_stmt_iterator *gsi)
def, use_stmt);
   }
 
+  update_ssa (TODO_update_ssa);
+
   return res;
 }
 
diff --git a/gcc/sese.c b/gcc/sese.c
index db8c629..2050e2d 100644
--- a/gcc/sese.c
+++ b/gcc/sese.c
@@ -760,9 +760,10 @@ set_ifsese_condition (ifsese if_region, tree condition)
   gsi_insert_after (, cond_stmt, GSI_NEW_STMT);
 }
 
-/* Return false if T is completely defined outside REGION.  */
+/* Return true when T is defined outside REGION or when no definitions are
+   variant in REGION.  */
 
-static bool
+bool
 invariant_in_sese_p_rec (tree t, sese region)
 {
   ssa_op_iter iter;
@@ -776,9 +777,18 @@ invariant_in_sese_p_rec (tree t, sese region)
   || gimple_code (stmt) == GIMPLE_CALL)
 return false;
 
+  /* VDEF is variant when it is in the region.  */
+  if (tree vdef = gimple_vdef (stmt))
+return false;
+
+  /* A VUSE may or may not be variant following the VDEFs.  */
+  if (tree vuse = gimple_vuse (stmt))
+return invariant_in_sese_p_rec (vuse, region);
+
   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
 {
   tree use = USE_FROM_PTR (use_p);
+
   if (!defined_in_sese_p (use, region))
continue;
 
diff --git a/gcc/sese.h b/gcc/sese.h
index 6ce5cc8..0d13d87 100644
--- a/gcc/sese.h
+++ b/gcc/sese.h
@@ -64,6 +64,7 @@ extern edge copy_bb_and_scalar_dependences (basic_block, 
sese, edge,
vec , bool *);
 extern struct loop *outermost_loop_in_sese (sese, basic_block);
 extern tree scalar_evolution_in_region (sese, loop_p, tree);
+extern bool invariant_in_sese_p_rec (tree, sese);
 
 /* Check that SESE contains LOOP.  */
 
diff --git a/gcc/testsuite/gcc.dg/graphite/run-id-pr67700.c 
b/gcc/testsuite/gcc.dg/graphite/run-id-pr67700.c
new file mode 100644
index 000..81d9e7f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/run-id-pr67700.c
@@ -0,0 +1,36 @@
+#include 
+#include 
+
+struct abc {
+  int a[81];
+} *abcd;
+
+#define FPMATH_SSE 2
+int global;
+
+void __attribute__ ((noinline)) foo()
+{
+  int pos = 0;
+  int i;
+
+  if (!((global & FPMATH_SSE) != 0))
+for (i = 8; i <= 15; i++)
+  abcd->a[pos++] = i;
+
+  for (i = 29; i <= 36; i++)
+abcd->a[pos++] = i;
+}
+
+int main()
+{
+  int i;
+  abcd = (struct abc*) malloc (sizeof (struct abc));
+  for (i = 0; i <= 80; i++)
+abcd->a[i] = 0;
+
+  foo();
+
+  assert (abcd->a[8] == 29);
+
+  return 0;
+}
-- 
2.1.0.243.g30d45f7

Re: [PATCH] v3 of diagnostic_show_locus and rich_location (was Re: [PATCH 2/5] Reimplement diagnostic_show_locus, introducing rich_location classes (v2))

2015-09-25 Thread Manuel López-Ibáñez

On 25 September 2015 at 22:18, Manuel López-Ibáñez
 wrote:
> On 25 September 2015 at 22:11, David Malcolm  wrote:


   context->last_location = diagnostic_location (diagnostic, 0);
-  expanded_location s0 = diagnostic_expand_location (diagnostic, 0);
-  expanded_location s1 = { };
-  /* Zero-initialized. This is checked later by
diagnostic_print_caret_line.  */

-  if (diagnostic_location (diagnostic, 1) > BUILTINS_LOCATION)
-s1 = diagnostic_expand_location (diagnostic, 1);
+  if (context->frontend_calls_diagnostic_print_caret_line_p)
+{
+  /* The GCC < 6 routine. */
+  expanded_location s0 = diagnostic_expand_location (diagnostic, 0);
+  expanded_location s1 = { };
+  /* Zero-initialized. This is checked later by
+ diagnostic_print_caret_line.  */
+
+  if (diagnostic_num_locations (diagnostic) >= 2)
+s1 = diagnostic->message.m_richloc->get_range (1)->m_start;

-  diagnostic_print_caret_line (context, s0, s1,
-   context->caret_chars[0],
-   context->caret_chars[1]);
+  diagnostic_print_caret_line (context, s0, s1,
+   context->caret_chars[0],
+   context->caret_chars[1]);
+}
+  else
+/* The GCC >= 6 routine.  */
+diagnostic_print_ranges (context, diagnostic);
 }


I haven't had time to look at the patch in detail, so please excuse me
if this is answered elsewhere.

Why do you need this hack? The whole point of moving Fortran to the
common machinery is to not have this duplication.

Can't the new code print one caret without ranges ever? Something like:

error: expected ';'
  }
   ^

If it can, then the function responsible for doing that can be called
by Fortran and it should replace diagnostic_print_caret_line.

Or is it that the new diagnostic_print_ranges cannot print multiple
carets in the same line? Like this

error: error at (1) and (2)
  adfadfafd asdfdaffa
   12

If this is the case, this is a missing functionality that
diagnostic_print_caret_line already has and that was ready to be used
in C/C++. See example at  (O) here:
https://gcc.gnu.org/wiki/Better_Diagnostics

In my mind, it should be possible for Fortran to pass to the
diagnostics machinery two locations with range width 1 (or 0,
depending how you want to represent a range that covers exactly one
char) and get a caret line like the example above. Why is this not
possible?

Cheers,

Manuel.

Re: [PATCH] v3 of diagnostic_show_locus and rich_location (was Re: [PATCH 2/5] Reimplement diagnostic_show_locus, introducing rich_location classes (v2))

2015-09-25 Thread David Malcolm

On Fri, 2015-09-25 at 22:39 +0200, Manuel López-Ibáñez wrote:
> On 25 September 2015 at 22:18, Manuel López-Ibáñez
>  wrote:
> > On 25 September 2015 at 22:11, David Malcolm  wrote:
> 
> 
>context->last_location = diagnostic_location (diagnostic, 0);
> -  expanded_location s0 = diagnostic_expand_location (diagnostic, 0);
> -  expanded_location s1 = { };
> -  /* Zero-initialized. This is checked later by
> diagnostic_print_caret_line.  */
> 
> -  if (diagnostic_location (diagnostic, 1) > BUILTINS_LOCATION)
> -s1 = diagnostic_expand_location (diagnostic, 1);
> +  if (context->frontend_calls_diagnostic_print_caret_line_p)
> +{
> +  /* The GCC < 6 routine. */
> +  expanded_location s0 = diagnostic_expand_location (diagnostic, 0);
> +  expanded_location s1 = { };
> +  /* Zero-initialized. This is checked later by
> + diagnostic_print_caret_line.  */
> +
> +  if (diagnostic_num_locations (diagnostic) >= 2)
> +s1 = diagnostic->message.m_richloc->get_range (1)->m_start;
> 
> -  diagnostic_print_caret_line (context, s0, s1,
> -   context->caret_chars[0],
> -   context->caret_chars[1]);
> +  diagnostic_print_caret_line (context, s0, s1,
> +   context->caret_chars[0],
> +   context->caret_chars[1]);
> +}
> +  else
> +/* The GCC >= 6 routine.  */
> +diagnostic_print_ranges (context, diagnostic);
>  }
> 
> 
> I haven't had time to look at the patch in detail, so please excuse me
> if this is answered elsewhere.

(nods; the discussion has gotten large).

> Why do you need this hack? The whole point of moving Fortran to the
> common machinery is to not have this duplication.

I attempted to address this in:
  https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01700.html
where I said:

  * The Fortran frontend has its own logic for printing multiple
  locations, repeatedly calling in to diagnostic_print_caret_line.
  I hope the new printing logic is suitable for use by Fortran, but I
  wanted to keep the job of "introducing range-capable printing logic"
  separate from that of "updating Fortran diagnostics to use it",
  since I'm not very familiar with Fortran, and what is desirable
  there.  Hence to faithfully preserve the existing behavior, I
  introduced a flag into the diagnostic_context:
"frontend_calls_diagnostic_print_caret_line_p"
  which is set by the Fortran frontend, and makes diagnostic_show_locus
  use the existing printing logic.  Hopefully that's acceptable,
  say, as a migration path.

My recollection is that I saw that the Fortran frontend has logic for
calling into diagnostic_print_caret_line, noticed that the fortran
testsuite has dg- assertions about finding specific messages, and I got
worried that they embed assumptions about how the old printer worked.
Hence I wanted to avoid touching that for the first version, and so in
this patch it's a hybrid of the old Fortran printing code with the new
representation for multiple locations.

Maybe that's a cop-out.  Would you prefer that the patch goes all the
way, and that I attempt to eliminate all calls to
diagnostic_print_caret_line from the Fortran FE, and eliminate the old
implementation?  (either now, or as a followup patch?)  I may need
assistance with that; I suspect that some of the dg- assertions in the
Fortran test suite may need updating.

> Can't the new code print one caret without ranges ever? Something like:
> 
> error: expected ';'
>   }
>^

It can handle that just fine.  See the examples in line-map.h in the
patch for the kinds of things that a rich_location can represent.

> If it can, then the function responsible for doing that can be called
> by Fortran and it should replace diagnostic_print_caret_line.
> 
> Or is it that the new diagnostic_print_ranges cannot print multiple
> carets in the same line? Like this
> 
> error: error at (1) and (2)
>   adfadfafd asdfdaffa
>12

It can do that too; again see the big comment in line-map.h

> If this is the case, this is a missing functionality that
> diagnostic_print_caret_line already has and that was ready to be used
> in C/C++.

We're good, I believe.

> See example at  (O) here:
> https://gcc.gnu.org/wiki/Better_Diagnostics

Pasting it here:

foo.cc: 3:17,3:22: warning: missing braces around initializer for ‘int
[2]’ [-Wmissing-braces]
int a[2][2] = { 0, 1 , 2, 3 }; // { dg-warning "" }
^^ 
{}

It can support printing carets at both locations.  For the braces line,
I'd prefer to do those by explicitly adding a fixit-hints API.  I have a
followup patch to do that; see
  https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00732.html
for an earlier version of said patch, which in fact uses that example in
the unit tests "test_fixit_insert".  Although to be fair, I did that
with a single range rather than a pair of carets:

int a[2][2] = { 0, 1 , 2, 3 };

Re: [PATCH] fortran/67614 -- Detect an invalid NULL in an arithmetic if

2015-09-25 Thread Jerry DeLisle

On 09/25/2015 12:18 PM, Steve Kargl wrote:
> The attached patch has been built and regression tested on
> x86_64-*-freebsd.  There were no regression.
> 
> If the scalar-numeric-expr in an arithmetic-if is ar
> reference to NULL(), this is an invalid expression.
> By resolving the expression, then entity is correctly
> identified by EXPR_NULL.  Thus, the patch prevents 
> an ICE when a messed up non-scalar-numeric-expr is
> later translated in SSA form.
> 


ok for trunk.

Thanks,

Jerry

Re: [PATCH] v3 of diagnostic_show_locus and rich_location (was Re: [PATCH 2/5] Reimplement diagnostic_show_locus, introducing rich_location classes (v2))

2015-09-25 Thread Manuel López-Ibáñez

+   If SHOW_CARET_P is true, then the range should be rendered with
+   a caret at its starting location.  This
+   is for use by the Fortran frontend, for implementing the
+   "%C" and "%L" format codes.  */
+
+void
+rich_location::set_range (unsigned int idx, source_range src_range,
+  bool show_caret_p, bool overwrite_loc_p)

I do not understand when is this show_caret_p used by Fortran given
the diagnostic_show_locus code mentioned earlier.

Related to this:

inline void set_location (unsigned int idx, location_t loc, bool caret_p)

is always called with the last parameter 'true' (boolean parameters
are always almost bad API). Do you really need this parameter?

+/* Overwrite the range within this text_info's rich_location.
+   For use e.g. when implementing "+" in client format decoders.  */

If we got rid of '+' we would not need this extra work. Also '+'
breaks #pragma diagnostics. Not the fault of your patch, but it just
shows that technical debt keeps accumulating.
https://gcc.gnu.org/wiki/Partial_Transitions

Cheers,

Manuel.

Re: [PATCH] fortran/67525 -- fix ICE in SELECT TYPE

2015-09-25 Thread Jerry DeLisle

On 09/25/2015 12:13 PM, Steve Kargl wrote:
> The follwoing patch has been built and regression 
> tested on x86_64-*-freebsd.  There were no regression.
> 
> The patch removes an assert, which allows gfortran
> to detect an invalid selector in SELECT TYPE statement.
> 
OK, thanks for patch!

Jerry

a patch for PR61578

2015-09-25 Thread Vladimir Makarov

The following patch is for s390 regression introduced by the first patch 
for PR61578:


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61578

The patch was bootstrapped and tested on x86-64.

Committed as rev. 228154.

2015-09-25  Vladimir Makarov  

PR target/61578
* lra-constarints.c (match_reload): Check presence of the input 
pseudo

in the output operand.

Index: lra-constraints.c
===
--- lra-constraints.c	(revision 228097)
+++ lra-constraints.c	(working copy)
@@ -945,6 +945,12 @@ match_reload (signed char out, signed ch
 	= (ins[1] < 0 && REG_P (in_rtx)
 	   && (int) REGNO (in_rtx) < lra_new_regno_start
 	   && find_regno_note (curr_insn, REG_DEAD, REGNO (in_rtx))
+	   /* We can not use the same value if the pseudo is mentioned
+	  in the output, e.g. as an address part in memory,
+	  becuase output reload will actually extend the pseudo
+	  liveness.  We don't care about eliminable hard regs here
+	  as we are interesting only in pseudos.  */
+	   && (out < 0 || regno_use_in (REGNO (in_rtx), out_rtx) == NULL_RTX)
 	   ? lra_create_new_reg (inmode, in_rtx, goal_class, "")
 	   : lra_create_new_reg_with_unique_value (outmode, out_rtx,
 		   goal_class, ""));

Re: [PATCH] v3 of diagnostic_show_locus and rich_location (was Re: [PATCH 2/5] Reimplement diagnostic_show_locus, introducing rich_location classes (v2))

2015-09-25 Thread David Malcolm

On Fri, 2015-09-25 at 23:13 +0200, Manuel López-Ibáñez wrote:
> +   If SHOW_CARET_P is true, then the range should be rendered with
> +   a caret at its starting location.  This
> +   is for use by the Fortran frontend, for implementing the
> +   "%C" and "%L" format codes.  */
> +
> +void
> +rich_location::set_range (unsigned int idx, source_range src_range,
> +  bool show_caret_p, bool overwrite_loc_p)
> 
> I do not understand when is this show_caret_p used by Fortran given
> the diagnostic_show_locus code mentioned earlier.

The patch is something of a hybrid: on the one hand it's using the new
rich_location class for storing multiple locations for a diagnostic (and
this replaces the existing way we did this in struct text_info), but on
the other hand, for Fortran, it's using the old printing code.

rich_location::set_range exists to ensure that the %C and %L codes used
by Fortran (and "+" in the C family of FEs) can write back into the
rich_location instance, faithfully emulating the old code that wrote
back to
struct text_info's:
  location_t locations[MAX_LOCATIONS_PER_MESSAGE];

(that array is replaced in the patch by a rich_location *, pointing back
at the rich_location in the diagnostic_info).

> Related to this:
> 
> inline void set_location (unsigned int idx, location_t loc, bool caret_p)
> 
> is always called with the last parameter 'true' (boolean parameters
> are always almost bad API). Do you really need this parameter?

Ah, OK.  Maybe not there.

> +/* Overwrite the range within this text_info's rich_location.
> +   For use e.g. when implementing "+" in client format decoders.  */
> 
> If we got rid of '+' we would not need this extra work. Also '+'
> breaks #pragma diagnostics. Not the fault of your patch, but it just
> shows that technical debt keeps accumulating.
> https://gcc.gnu.org/wiki/Partial_Transitions

(nods)   That "+" thing was one of the surprises I ran into when working
on this, and is the reason that it isn't a :
  const rich_location *
but just a:
  rich_location *
given that formatting the diagnostic text can lead to the location being
modified.  I'm just emulating/supporting the existing behavior.

Dave

[PATCH] fortran/67616 -- Fix ICE within a BLOCK construct

2015-09-25 Thread Steve Kargl

The attached patch has been built and regression
tested on x86_64-*-freebsd.  No regression occurred.

The patch removes a conditional within an assert() 
that triggers when a BLOCK construct is encountered.

OK to commit?

2015-09-25  Steven G. Kargl  

PR fortran/67616
* primary.c (gfc_match_structure_constructor): Remove a condition
in an assert() that is not valid within a BLOCK-END BLOCK construct.

2015-09-25  Steven G. Kargl  

PR fortran/67616
* gfortran.dg/pr67616.f90: New test.
-- 
Steve
Index: fortran/primary.c
===
--- fortran/primary.c	(revision 228061)
+++ fortran/primary.c	(working copy)
@@ -2703,8 +2703,7 @@ gfc_match_structure_constructor (gfc_sym
   e->symtree = symtree;
   e->expr_type = EXPR_FUNCTION;
 
-  gcc_assert (sym->attr.flavor == FL_DERIVED
-	  && symtree->n.sym->attr.flavor == FL_PROCEDURE);
+  gcc_assert (sym->attr.flavor == FL_DERIVED);
   e->value.function.esym = sym;
   e->symtree->n.sym->attr.generic = 1;
 
Index: testsuite/gfortran.dg/pr67616.f90
===
--- testsuite/gfortran.dg/pr67616.f90	(revision 0)
+++ testsuite/gfortran.dg/pr67616.f90	(working copy)
@@ -0,0 +1,13 @@
+! { dg-do compile }
+! PR fortran/67616
+! Original code contributed by Gerhard Steinmetz 
+program p
+   type t
+   end type
+   type(t) :: y
+   data y /t()/
+   block
+  type(t) :: x
+  data x /t()/  ! Prior to patch, this would ICE.
+   end block
+end

Re: libgomp: Guard all offload_images/num_offload_images access by register_lock (was: libgomp nvptx plugin: rework initialisation and support the proposed load/unload hooks)

2015-09-25 Thread Ilya Verbin

On Fri, Sep 25, 2015 at 18:21:27 +0200, Thomas Schwinge wrote:
> On Thu, 26 Mar 2015 23:41:30 +0300, Ilya Verbin  wrote:
> > On Thu, Mar 26, 2015 at 13:09:19 +0100, Jakub Jelinek wrote:
> > > the current code is majorly broken.  As I've said earlier, e.g. the lack
> > > of mutex guarding gomp_target_init (which is using pthread_once guaranteed
> > > to be run just once) vs. concurrent GOMP_offload_register calls
> > > (if those are run from ctors, then I guess something like dl_load_lock
> > > ensures at least on glibc that multiple GOMP_offload_register calls aren't
> > > performed at the same time) in accessing/reallocating offload_images
> > > and num_offload_images and the lack of support to register further
> > > images after the gomp_target_init call (if you dlopen further shared
> > > libraries) is really bad.  And it would be really nice to support the
> > > unloading.
> 
> > Here is the latest patch for libgomp and mic plugin.
> 
> What about the scenario where one thread is inside
> GOMP_offload_register_ver/GOMP_offload_register (say, due to opening a
> shared library with such a mkoffload-generated constructor) and is
> modifying offload_images with register_lock held, and another thread is
> inside a GOMP_target* construct -> gomp_init_device and is accessing
> offload_images without register_lock held?  Or, why isn't that a
> reachable scenario?
> 
> Would the following patch (untested) do the right thing (locking added to
> gomp_init_device and gomp_unload_device)?  We can then also remove the
> is_register_lock parameter from gomp_load_image_to_device, and simplify
> the code.

Looks like you're right, and this scenario is possible.

  -- Ilya

Re: [PATCH] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead

2015-09-25 Thread Andi Kleen

Teresa Johnson  writes:

> This patch unsets -freorder-blocks-and-partition when -fprofile-use
> is not specified. Function splitting was not actually being performed
> in that case, as probably_never_executed_bb_p does not distinguish
> any basic blocks as being cold vs hot when there is no profile data.

Actually I'm experimenting with a patch to fix that by allowing
function splitting even without profile feed back. See PR66890
which has the patch. I would prefer to keep and fix it.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only

Re: [PATCH] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead

2015-09-25 Thread Teresa Johnson

On Fri, Sep 25, 2015 at 7:34 AM, Jan Hubicka  wrote:
>> > What has changed between then and now? Also, do we not use
>> > estimates/heuristics when not using a profile?
>>
>> Nothing has changed - splitting effectively never kicked in without a
>> profile. Honza and I had discussed the idea that static profile
>> heuristics could eventually be used to distinguish hot vs cold bbs,
>
> Yep, the basic idea was to move EH clenaups to the cold section for start.  
> The
> cleanup code can get surprisingly large for some C++ apps.
>
>> but that hasn't happened. What I didn't notice until recently was the
>> size increase in the .o files from varasm adding in unnecessary
>> sections and labels when this option was on. Unless and until static
>
> Perhaps we also may just teach varasm to not output those when function is not
> split.  I am happy with the patch as it is because it is pointless to run the
> code when no splitting happens.

Right, that will need to happen if splitting is tuned for static
frequencies. For now I committed this patch.

I also fixed the old change log entry that Bernd pointed out.

Thanks,
Teresa

>
> Honza
>> heuristics are used to distinguish cold bbs in
>> probably_never_executed_bb_p, I don't think it makes sense to do
>> anything finer grained that just disabling the option.
>>
>> Teresa
>>
>> >
>> >
>> > Bernd
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [PATCH, fortran] Revival of AUTOMATIC patch

2015-09-25 Thread Paul Richard Thomas

Dear Jim, FX and Jerry,

Frankly, I would accept the patch with the proviso that:
(i) It is hidden behand a  gfc_notify_std (GFC_STD_GNU, "...;
(ii) The feature is set in conflict with the new features that FX
mentions, especially coarrays and bind C; and
(iii) As FX says, a good look at the testcases.

It is a rather standalone patch and so should be quite
self-maintaining. I suspect that it might even be quite useful to
optimise code for devices such as the Intel phi by dint of explicitly
preventing memory swopping.

If the other maintainers are happy with the above provisos, I suggest
Jim that you make the necessary changes and resubmit. If you are not
in a position to do this, I could apply myself to it on a timescale of
the next few weeks.

With best regards

Paul

On 25 September 2015 at 16:26, FX  wrote:
>>> All in all I’m skeptical of adding even more old language extensions with
>>> little demand when we have a hard time filling up gaps in the standard. Each
>>> addition adds to maintainance load, especially as they might not interact
>>> too well with more modern features. (For example coarrays or BIND attribute,
>>> which were not around when AUTOMATIC was in use.)
>>>
>>> I don’t find any request for this feature in the whole bugzilla database.
>>
>> That's understandable. We'll maintain this feature in our own delta. I felt 
>> it
>> was in the spirit of open source to offer it in case it was useful.
>>
>> Thanks for taking the time to review it.
>
> I definitely appreciate your contribution! And because it is now archived in 
> the mailing-list archives, if people are interested in the future they can 
> definitely pick it up. It is a rather “standalone” patch, I don’t think it 
> would bitrot fast.
>
> But maybe other developers feel differently about it, in which case we’ll 
> have a more “technical” review (mostly of the testcases needed, I think).
>
> Thanks again,
> FX

-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx

Re: [PATCH] DWARF support for AIX v5

2015-09-25 Thread Richard Henderson

On 09/25/2015 11:59 AM, David Edelsohn wrote:
> * dwarf2out.c (XCOFF_DEBUGGING_INFO): Default 0 definition.
> (HAVE_XCOFF_DWARF_EXTRAS): Default to 0 definition.
> (output_fde): Don't output length for debug_frame on AIX.
> (output_call_frame_info): Don't output length for debug_frame on AIX.
> (have_macinfo): Force to False for XCOFF_DEBUGGING_INFO and not
> HAVE_XCOFF_DWARF_EXTRAS.
> (add_AT_loc_list): Return early if XCOFF_DEBUGGING_INFO and not
> HAVE_XCOFF_DWARF_EXTRAS.
> (output_compilation_unit_header): Don't output length on AIX.
> (output_pubnames): Don't output length on AIX.
> (output_aranges): Delete argument. Compute length locally. Don't
> output length on AIX.
> (output_line_info): Don't output length on AIX.
> (dwarf2out_finish): Don't compute aranges_length.
> * dwarf2asm.c (XCOFF_DEBUGGING_INFO): Default 0 definition.
> (dw2_asm_output_nstring): Emit .byte not .ascii on AIX.
> * config/rs6000/rs6000.c (rs6000_output_dwrf_dtprel): Emit correct
> symbol decoration for AIX.
> (rs6000_xcoff_debug_unwind_info): New.
> (rs6000_xcoff_asm_named_section): Emit .dwsect pseudo-op
> for SECTION_DEBUG.
> (rs6000_xcoff_declare_function_name): Emit different
> .function pseudo-op when DWARF2_DEBUG. Don't call
> xcoffout_declare_function for DWARF2_DEBUG.
> * config/rs6000/xcoff.h (TARGET_DEBUG_UNWIND_INFO):
> Redefine.
> * config/rs6000/aix71.h (DWARF2_DEBUGGING_INFO): Define.
> (PREFERRED_DEBUGGING_TYPE): Define.
> (DEBUG_INFO_SECTION): Define.
> (DEBUG_ABBREV_SECTION): Define.
> (DEBUG_ARANGES_SECTION): Define.
> (DEBUG_LINE_SECTION): Define.
> (DEBUG_PUBNAMES_SECTION): Define.
> (DEBUG_PUBTYPES_SECTION): Define.
> (DEBUG_STR_SECTION): Define.
> (DEBUG_RANGES_SECTION): Define.

Ok.


r~

Re: using scratchpads to enhance RTL-level if-conversion: the new patch is almost ready to be prepared for merging to trunk, but not 100% ready yet

2015-09-25 Thread Segher Boessenkool

On Fri, Sep 25, 2015 at 10:09:23AM -0600, Jeff Law wrote:
> >>So what that means is the presence or absence of debug information is
> >>causing a difference in
> > > the code you generate.  That is (of course) bad and indicates a bug
> >of some kind in your code.
> >
> >>I haven't put your code under a debugger or anything like that, but
> >>this does jump out:
> >
> >>+rtx_insn* temp_src_insn = BB_HEAD (then_bb);
> >>+if (temp_src_insn && NOTE_INSN_BASIC_BLOCK_P (temp_src_insn))
> >>+  temp_src_insn = NEXT_INSN (temp_src_insn); /* skip over a
> >>start-of-BB note */
> >
> >>What if BB_HEAD (then_bb) is a DEBUG_INSN with -g enabled, but is a
> >>NOTE_INSN_BASIC_BLOCK when -g is not enabled.  That could cause
> >>this kind of failure.

I haven't looked at the full code, but NEXT_INSN is even more suspicious
(you also need to skip the debug insns).

> Otherwise, you're just going to have to slog through and find out why 
> the codegen is different.  These kind of bugs are usually painful to 
> track down, but it's usually worth the time spent in the end.

Compile with -dap (directly with cc1 or with -S), find some difference
in the generated asm, see what the RTL insn number for that was (that's
what -dp is for), find where the difference came from (from the dumpfiles,
that's -da; you probably already know what pass is the culprit ;-) )


Segher

[PATCH] fortran/67614 -- Detect an invalid NULL in an arithmetic if

2015-09-25 Thread Steve Kargl

The attached patch has been built and regression tested on
x86_64-*-freebsd.  There were no regression.

If the scalar-numeric-expr in an arithmetic-if is ar
reference to NULL(), this is an invalid expression.
By resolving the expression, then entity is correctly
identified by EXPR_NULL.  Thus, the patch prevents 
an ICE when a messed up non-scalar-numeric-expr is
later translated in SSA form.

2015-09-25  Steven G. Kargl  

PR fortran/67614
* resolve.c (gfc_resolve_code): Prevent ICE for invalid EXPR_NULL.

2015-09-25  Steven G. Kargl  

PR fortran/67614
* gfortran.dg/pr67614.f90: New test.

-- 
Steve
Index: fortran/resolve.c
===
--- fortran/resolve.c	(revision 228061)
+++ fortran/resolve.c	(working copy)
@@ -10380,10 +10380,14 @@ gfc_resolve_code (gfc_code *code, gfc_na
 	  {
 	gfc_expr *e = code->expr1;
 
+	gfc_resolve_expr (e);
+	if (e->expr_type == EXPR_NULL)
+	  gfc_error ("Invalid NULL at %L", >where);
+
 	if (t && (e->rank > 0
 		  || !(e->ts.type == BT_REAL || e->ts.type == BT_INTEGER)))
 	  gfc_error ("Arithmetic IF statement at %L requires a scalar "
-			 "REAL or INTEGER expression", >expr1->where);
+			 "REAL or INTEGER expression", >where);
 
 	resolve_branch (code->label1, code);
 	resolve_branch (code->label2, code);
Index: testsuite/gfortran.dg/pr67614.f90
===
--- testsuite/gfortran.dg/pr67614.f90	(revision 0)
+++ testsuite/gfortran.dg/pr67614.f90	(working copy)
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! { dg-options "-std=legacy" }
+! PR fortran/67614
+!
+program foo
+   implicit none
+   integer, pointer :: z
+   if (null(z)) 10, 20, 30! { dg-error "Invalid NULL" }
+10 continue
+20 continue
+30 continue
+end program foo

[gomp4.1] Doacross tweaks

2015-09-25 Thread Jakub Jelinek

Hi!

The latest spec proposals change the meaning of the ordered(n) clause
if collapse > 1 to count the number of source loops.
The ordered >= collapse requirement is something I'm pushing for even when
it is not in the spec (yet).
And, finally, another proposed change is that the + or - signs in the sink:
vector are significant (primarily for unsigned iterators).

Tested on x86_64-linux, committed to gomp-4_1-branch.

2015-09-25  Jakub Jelinek  

* tree.h (OMP_CLAUSE_DEPEND_SINK_NEGATIVE): Define.
* omp-low.c (extract_omp_for_data, expand_omp_for_init_counts,
expand_omp_ordered_source, expand_omp_ordered_sink,
expand_omp_for_ordered_loops, expand_omp_for_generic): Adjust
for fd->ordered being newly 0 when it used to be 0 before,
but otherwise being equal to fd->collapse - 1 + old fd->ordered.
expand_omp_ordered_source_sink): Likewise.  Warn for lexically
later iteration sink, or for sink on iteration never in iteration
space.  Use OMP_CLAUSE_DEPEND_SINK_NEGATIVE.
(lower_depend_clauses): Assert not seeing sink/source depend kinds.
* tree-pretty-print.c (dump_omp_clause): Use
OMP_CLAUSE_DEPEND_SINK_NEGATIVE for the +/- sign of sink vector
constants.
gcc/c/
* c-parser.c (c_parser_omp_clause_depend_sink): Put - sign
into OMP_CLAUSE_DEPEND_SINK_NEGATIVE instead of negating the number.
(c_parser_omp_for_loop): Adjust for ordered clause counting
source loops before collapsing instead of after it.  Require
ordered clause parameter to be >= collapse parameter.
gcc/cp/
* parser.c (cp_parser_omp_clause_depend_sink): Put - sign
into OMP_CLAUSE_DEPEND_SINK_NEGATIVE instead of negating the number.
(cp_parser_omp_for_loop): Adjust for ordered clause counting
source loops before collapsing instead of after it.  Require
ordered clause parameter to be >= collapse parameter.
* pt.c (tsubst_omp_clause_decl): Copy
OMP_CLAUSE_DEPEND_SINK_NEGATIVE flag.
gcc/testsuite/
* c-c++-common/gomp/sink-1.c (foo): Adjust for new meaning of
ordered clause on collapse > 1 loops.
* gcc.dg/gomp/sink-fold-2.c (funk): Remove xfails and adjust
for new diagnostics wording and only warnings, no errors.
libgomp/
* testsuite/libgomp.c/doacross-1.c (main): Adjust for new meaning of
ordered clause on collapse > 1 loops.

--- gcc/tree.h.jj   2015-09-03 16:36:19.0 +0200
+++ gcc/tree.h  2015-09-25 13:40:16.739807842 +0200
@@ -1446,6 +1446,9 @@ extern void protected_set_expr_location
 #define OMP_CLAUSE_DEPEND_KIND(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, 
OMP_CLAUSE_DEPEND)->omp_clause.subcode.depend_kind)
 
+#define OMP_CLAUSE_DEPEND_SINK_NEGATIVE(NODE) \
+  TREE_PUBLIC (TREE_LIST_CHECK (NODE))
+
 #define OMP_CLAUSE_MAP_KIND(NODE) \
   ((enum gomp_map_kind) OMP_CLAUSE_SUBCODE_CHECK (NODE, 
OMP_CLAUSE_MAP)->omp_clause.subcode.map_kind)
 #define OMP_CLAUSE_SET_MAP_KIND(NODE, MAP_KIND) \
--- gcc/omp-low.c.jj2015-09-24 20:20:32.0 +0200
+++ gcc/omp-low.c   2015-09-25 18:17:13.331812912 +0200
@@ -565,7 +565,7 @@ extract_omp_for_data (gomp_for *for_stmt
 ? integer_zero_node : integer_one_node;
 }
 
-  int cnt = fd->collapse + (fd->ordered > 0 ? fd->ordered - 1 : 0);
+  int cnt = fd->ordered ? fd->ordered : fd->collapse;
   for (i = 0; i < cnt; i++)
 {
   if (i == 0 && fd->collapse == 1 && (fd->ordered == 0 || loops == NULL))
@@ -6761,7 +6761,7 @@ expand_omp_for_init_counts (struct omp_f
   return;
 }
 
-  for (i = fd->collapse; i < fd->collapse + fd->ordered - 1; i++)
+  for (i = fd->collapse; i < fd->ordered; i++)
 {
   tree itype = TREE_TYPE (fd->loops[i].v);
   counts[i] = NULL_TREE;
@@ -6770,12 +6770,12 @@ expand_omp_for_init_counts (struct omp_f
   fold_convert (itype, fd->loops[i].n2));
   if (t && integer_zerop (t))
{
- for (i = fd->collapse; i < fd->collapse + fd->ordered - 1; i++)
+ for (i = fd->collapse; i < fd->ordered; i++)
counts[i] = build_int_cst (type, 0);
  break;
}
 }
-  for (i = 0; i < fd->collapse + (fd->ordered ? fd->ordered - 1 : 0); i++)
+  for (i = 0; i < (fd->ordered ? fd->ordered : fd->collapse); i++)
 {
   tree itype = TREE_TYPE (fd->loops[i].v);
 
@@ -7074,8 +7074,7 @@ expand_omp_ordered_source (gimple_stmt_i
   enum built_in_function source_ix = BUILT_IN_GOMP_DOACROSS_POST;
   gimple g
 = gimple_build_call (builtin_decl_explicit (source_ix), 1,
-build_fold_addr_expr (counts[fd->collapse
- + fd->ordered - 1]));
+build_fold_addr_expr (counts[fd->ordered]));
   gimple_set_location (g, loc);
   gsi_insert_before (gsi, g, GSI_SAME_STMT);
 }
@@ -7091,13 +7090,38 @@ expand_omp_ordered_sink

[PATCH] v3 of diagnostic_show_locus and rich_location (was Re: [PATCH 2/5] Reimplement diagnostic_show_locus, introducing rich_location classes (v2))

2015-09-25 Thread David Malcolm

On Fri, 2015-09-25 at 10:51 +0200, Dodji Seketeli wrote:
> Hello David,
> 
> I like this!  Thank you very much for working on this.

Thanks for the review.

> I think this patch is in great shape, and once we agree on some of the
> nits I have commented on below, I think it should go in. Oh, it also
> needs the first patch (1/5, dejagnu first) to go in first, as this one
> depends on it.  I defer to the dejagnu maintainers for that one.

Indeed.  Jeff just approved it, fwiw.

> The line-map parts are OK to me too, but I have no power on those.  So I
> defer to the FE maintainers for that.  The diagnostics parts of the
> Fortran, C++ and C FE look good to me too; these are just well contained
> mechanical adjustments, but I defer to FE maintainers for the final
> word.
> 
> Please find my comments below.

Updated patch attached.

> [...]
> 
> diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c
> 
> [...]
> 
> +/* A class to inject colorization codes when printing the diagnostic locus,
> +   tracking state as it goes.  */
> +
> +class colorizer
> +{
> 
> [...]
> 
> +  void set_state (int state);
> +  void begin_state (int state);
> +  void finish_state (int state);
> 
> I think the concept of state could use a little bit of explanation, at
> least to say that there are the same number of states, as the number
> of ranges.  And that the 'state' argument to these functions really is
> the range index.

Here's the revised comment I put in the attached patch:

+/* A class to inject colorization codes when printing the diagnostic locus.
+
+   It has one kind of colorization for each of:
+ - normal text
+ - range 0 (the "primary location")
+ - range 1
+ - range 2
+
+   The class caches the lookup of the color codes for the above.
+
+   The class also has responsibility for tracking which of the above is
+   active, filtering out unnecessary changes.  This allows layout::print_line
+   to simply request a colorization code for *every* character it prints
+   thorough this class, and have the filtering be done for it here.  */

Hopefully that comment explains the possible states the colorizer can
have.

FWIW I have a follow-up patch to add support for fix-it hints, so they
might be another kind of colorization state.
(see https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00732.html for the
earlier version of said patch, in v1 of the kit).

> Also, I am thinking that there should maybe be a layout::state type,
> which would have two notional properties (for now): range_index and
> draw_caret_p. So that this function:
> 
> +bool
> +layout::get_state_at_point (/* Inputs.  */
> + int row, int column,
> + int first_non_ws, int last_non_ws,
> + /* Outputs.  */
> + int *out_range_idx,
> + bool *out_draw_caret_p)
> 
> Would take just one output parameter, e.g, a reference to
> layout::state.

Fixed, though I called it "struct point_state", given that it's coming
from get_state_at_point.  I passed it by pointer, since AFAIK our coding
standards don't yet approve of the use of references in the codebase
(outside of places where we need them e.g. container classes).

I also added a unit test for a rich_location with two caret locations
(mimicking one of the Fortran examples), to give us coverage for this
case:

+void test_multiple_carets (void)
+{
+#if 0
+   x = x + y /* { dg-warning "8: test" } */
+/* { dg-begin-multiline-output "" }
+x = x + y
+A   B
+   { dg-end-multiline-output "" } */
+#endif
+}

where the "A" and "B" as caret chars are coming from new code in the
show_locus unittest plugin.

> +layout::layout (diagnostic_context * context,
> + const diagnostic_info *diagnostic)
> 
> [...]
> 
> +  if (loc_range->m_finish.file != m_exploc.file)
> + continue;
> +  if (loc_range->m_show_caret_p)
> + if (loc_range->m_finish.file != m_exploc.file)
> +   continue;
> 
> I think the second if clause is redundant.

Good catch; thanks.  The second if clause was meant to be testing
m_caret.file.  Fixed in the updated patch.

> 
> +  if (0)
> +show_ruler (context, line_width, m_x_offset);
> 
> This should probably be removed from the final code to be committed.

FWIW, the ruler is very helpful to me when debugging the locus-printing
(e.g. when adding fix-it-hints), and if we remove that if (0) call, we
get:

warning: ‘void show_ruler(diagnostic_context*, int, int)’ defined but
not used [-Wunused-function]

which will break bootstrap, so perhaps it instead should be an option?
"-fdiagnostics-show-ruler" or somesuch?

I don't know that it would be helpful to end-users though.

I'd prefer to just keep it in the code with the
  if (0)
as-is, since it's useful "scaffolding" for hacking on the code.

> [...]
> 
> +/* Get the column beyond the rightmost one that could contain a caret or
> +   range marker, given that we stop

[PATCH] DWARF support for AIX v5

2015-09-25 Thread David Edelsohn

This version adds extra tests for HAVE_XCOFF_DWARF_EXTRAS.  I placed
the default in dwarf2out.c instead of defaults.h.

Because eh_frame is internal to GCC with its own section, I emit the
length, but inhibit the length for AIX debug_frame section.

This allows DWARF debugging to work on older AIX 7.1 systems within
the limitations of the available sections.  It also allows it to build
and test on a wider variety of AIX 7.1 systems.

I also changed the TLS decorations to use a switch statement, as suggested.

Thanks, David

* dwarf2out.c (XCOFF_DEBUGGING_INFO): Default 0 definition.
(HAVE_XCOFF_DWARF_EXTRAS): Default to 0 definition.
(output_fde): Don't output length for debug_frame on AIX.
(output_call_frame_info): Don't output length for debug_frame on AIX.
(have_macinfo): Force to False for XCOFF_DEBUGGING_INFO and not
HAVE_XCOFF_DWARF_EXTRAS.
(add_AT_loc_list): Return early if XCOFF_DEBUGGING_INFO and not
HAVE_XCOFF_DWARF_EXTRAS.
(output_compilation_unit_header): Don't output length on AIX.
(output_pubnames): Don't output length on AIX.
(output_aranges): Delete argument. Compute length locally. Don't
output length on AIX.
(output_line_info): Don't output length on AIX.
(dwarf2out_finish): Don't compute aranges_length.
* dwarf2asm.c (XCOFF_DEBUGGING_INFO): Default 0 definition.
(dw2_asm_output_nstring): Emit .byte not .ascii on AIX.
* config/rs6000/rs6000.c (rs6000_output_dwrf_dtprel): Emit correct
symbol decoration for AIX.
(rs6000_xcoff_debug_unwind_info): New.
(rs6000_xcoff_asm_named_section): Emit .dwsect pseudo-op
for SECTION_DEBUG.
(rs6000_xcoff_declare_function_name): Emit different
.function pseudo-op when DWARF2_DEBUG. Don't call
xcoffout_declare_function for DWARF2_DEBUG.
* config/rs6000/xcoff.h (TARGET_DEBUG_UNWIND_INFO):
Redefine.
* config/rs6000/aix71.h (DWARF2_DEBUGGING_INFO): Define.
(PREFERRED_DEBUGGING_TYPE): Define.
(DEBUG_INFO_SECTION): Define.
(DEBUG_ABBREV_SECTION): Define.
(DEBUG_ARANGES_SECTION): Define.
(DEBUG_LINE_SECTION): Define.
(DEBUG_PUBNAMES_SECTION): Define.
(DEBUG_PUBTYPES_SECTION): Define.
(DEBUG_STR_SECTION): Define.
(DEBUG_RANGES_SECTION): Define.

Index: dwarf2out.c
===
--- dwarf2out.c (revision 228137)
+++ dwarf2out.c (working copy)
@@ -108,6 +108,14 @@ static rtx_insn *last_var_location_insn;
 static rtx_insn *cached_next_real_insn;
 static void dwarf2out_decl (tree);

+#ifndef XCOFF_DEBUGGING_INFO
+#define XCOFF_DEBUGGING_INFO 0
+#endif
+
+#ifndef HAVE_XCOFF_DWARF_EXTRAS
+#define HAVE_XCOFF_DWARF_EXTRAS 0
+#endif
+
 #ifdef VMS_DEBUGGING_INFO
 int vms_file_stats_name (const char *, long long *, long *, char *, int *);

@@ -594,11 +602,14 @@ output_fde (dw_fde_ref fde, bool for_eh, bool seco
  for_eh + j);
   ASM_GENERATE_INTERNAL_LABEL (l1, FDE_AFTER_SIZE_LABEL, for_eh + j);
   ASM_GENERATE_INTERNAL_LABEL (l2, FDE_END_LABEL, for_eh + j);
-  if (DWARF_INITIAL_LENGTH_SIZE - DWARF_OFFSET_SIZE == 4 && !for_eh)
-dw2_asm_output_data (4, 0x, "Initial length escape value"
-" indicating 64-bit DWARF extension");
-  dw2_asm_output_delta (for_eh ? 4 : DWARF_OFFSET_SIZE, l2, l1,
-   "FDE Length");
+  if (!XCOFF_DEBUGGING_INFO || for_eh)
+{
+  if (DWARF_INITIAL_LENGTH_SIZE - DWARF_OFFSET_SIZE == 4 && !for_eh)
+   dw2_asm_output_data (4, 0x, "Initial length escape value"
+" indicating 64-bit DWARF extension");
+  dw2_asm_output_delta (for_eh ? 4 : DWARF_OFFSET_SIZE, l2, l1,
+   "FDE Length");
+}
   ASM_OUTPUT_LABEL (asm_out_file, l1);

   if (for_eh)
@@ -794,11 +805,14 @@ output_call_frame_info (int for_eh)
   /* Output the CIE.  */
   ASM_GENERATE_INTERNAL_LABEL (l1, CIE_AFTER_SIZE_LABEL, for_eh);
   ASM_GENERATE_INTERNAL_LABEL (l2, CIE_END_LABEL, for_eh);
-  if (DWARF_INITIAL_LENGTH_SIZE - DWARF_OFFSET_SIZE == 4 && !for_eh)
-dw2_asm_output_data (4, 0x,
-  "Initial length escape value indicating 64-bit DWARF extension");
-  dw2_asm_output_delta (for_eh ? 4 : DWARF_OFFSET_SIZE, l2, l1,
-   "Length of Common Information Entry");
+  if (!XCOFF_DEBUGGING_INFO || for_eh)
+{
+  if (DWARF_INITIAL_LENGTH_SIZE - DWARF_OFFSET_SIZE == 4 && !for_eh)
+   dw2_asm_output_data (4, 0x,
+ "Initial length escape value indicating 64-bit DWARF extension");
+  dw2_asm_output_delta (for_eh ? 4 : DWARF_OFFSET_SIZE, l2, l1,
+   "Length of Common Information Entry");
+}
   ASM_OUTPUT_LABEL (asm_out_file, l1);

   /* Now that the CIE pointer is PC-relative for EH,
@@ -2995,7 +3009,8 @@ static GTY

Re: [ASAN/KSAN/TSAN doc patch] Fix broken sanitizer links in docs/invoke.texi

2015-09-25 Thread Tobias Burnus


Tobias Burnus wrote:

I intent to install it for GCC 6, 5 and 4.9.


I have now committed it to the trunk and the two branches - including a 
wording change (When setting it to -> When set to; plus adding a second 
space after the end-of-sentence period.)


Tobias

[GOOGLE] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead

2015-09-25 Thread Teresa Johnson

This patch backports r228136 from trunk to Google/4_9. The only
difference is that I have added the option disabling to our
Google-specific routine for removing profile options when the
specified profile gcda file is not found.

Passes regression tests. Ok for google/4_9?

Thanks,
Teresa

2015-09-25  Teresa Johnson  

Google ref b/24265250
* opts.c (finish_options): Unset -freorder-blocks-and-partition
if not using profile.
(set_profile_use_options): Unset -freorder-blocks-and-partition
if input profile not found and it wasn't explicitly set.

Index: opts.c
===
--- opts.c  (revision 228063)
+++ opts.c  (working copy)
@@ -787,6 +787,16 @@ finish_options (struct gcc_options *opts, struct g
 }

   if (opts->x_flag_reorder_blocks_and_partition
+  && !opts_set->x_flag_profile_use)
+opts->x_flag_reorder_blocks_and_partition = 0;
+
+  /* Disable -freorder-blocks-and-partition when -fprofile-use is not in
+ effect. Function splitting was not actually being performed in that case,
+ as probably_never_executed_bb_p does not distinguish any basic blocks as
+ being cold vs hot when there is no profile data. Leaving it enabled,
+ however, causes the assembly code generator to create (empty) cold
+ sections and labels, leading to unnecessary size overhead.  */
+  if (opts->x_flag_reorder_blocks_and_partition
   && !opts_set->x_flag_reorder_functions)
 opts->x_flag_reorder_functions = 1;

@@ -1365,6 +1375,8 @@ set_profile_use_options (struct gcc_options *opts,
(PARAM_MAX_INLINE_INSNS_AUTO,
  default_param_value (PARAM_MAX_INLINE_INSNS_AUTO),
 opts->x_param_values, opts_set->x_param_values);
+  if (!opts_set->x_flag_reorder_blocks_and_partition)
+opts->x_flag_reorder_blocks_and_partition = false;
}

   if (!opts_set->x_flag_branch_probabilities || reset)



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: using scratchpads to enhance RTL-level if-conversion: the new patch is almost ready to be prepared for merging to trunk, but not 100% ready yet

2015-09-25 Thread Jeff Law


On 09/25/2015 01:18 PM, Segher Boessenkool wrote:

On Fri, Sep 25, 2015 at 10:09:23AM -0600, Jeff Law wrote:

So what that means is the presence or absence of debug information is
causing a difference in
the code you generate.  That is (of course) bad and indicates a bug

of some kind in your code.


I haven't put your code under a debugger or anything like that, but
this does jump out:



+rtx_insn* temp_src_insn = BB_HEAD (then_bb);
+if (temp_src_insn && NOTE_INSN_BASIC_BLOCK_P (temp_src_insn))
+  temp_src_insn = NEXT_INSN (temp_src_insn); /* skip over a
start-of-BB note */



What if BB_HEAD (then_bb) is a DEBUG_INSN with -g enabled, but is a
NOTE_INSN_BASIC_BLOCK when -g is not enabled.  That could cause
this kind of failure.


I haven't looked at the full code, but NEXT_INSN is even more suspicious
(you also need to skip the debug insns).

Yes, you're absolutely right. :-)

jeff

[PATCH] fortran/67525 -- fix ICE in SELECT TYPE

2015-09-25 Thread Steve Kargl

The follwoing patch has been built and regression 
tested on x86_64-*-freebsd.  There were no regression.

The patch removes an assert, which allows gfortran
to detect an invalid selector in SELECT TYPE statement.

2015-09-25  Steven G. Kargl  

PR fortran/67525
* parse.c (match_deferred_characteristics): Remove an assert, which
allows an invalid SELECT TYPE selector to be detected.

 
2015-09-25  Steven G. Kargl  

PR fortran/67525
* gfortran.dg/pr67525.f90: New test.

-- 
Steve
Index: fortran/parse.c
===
--- fortran/parse.c	(revision 228061)
+++ fortran/parse.c	(working copy)
@@ -3113,15 +3113,18 @@ match_deferred_characteristics (gfc_type
 static void
 check_function_result_typed (void)
 {
-  gfc_typespec* ts = _current_ns->proc_name->result->ts;
+  gfc_typespec ts;
 
   gcc_assert (gfc_current_state () == COMP_FUNCTION);
-  gcc_assert (ts->type != BT_UNKNOWN);
+
+  if (!gfc_current_ns->proc_name->result) return;
+
+  ts = gfc_current_ns->proc_name->result->ts;
 
   /* Check type-parameters, at the moment only CHARACTER lengths possible.  */
   /* TODO:  Extend when KIND type parameters are implemented.  */
-  if (ts->type == BT_CHARACTER && ts->u.cl && ts->u.cl->length)
-gfc_expr_check_typed (ts->u.cl->length, gfc_current_ns, true);
+  if (ts.type == BT_CHARACTER && ts.u.cl && ts.u.cl->length)
+gfc_expr_check_typed (ts.u.cl->length, gfc_current_ns, true);
 }
 
 
Index: testsuite/gfortran.dg/pr67525.f90
===
--- testsuite/gfortran.dg/pr67525.f90	(revision 0)
+++ testsuite/gfortran.dg/pr67525.f90	(working copy)
@@ -0,0 +1,18 @@
+! { dg-do compile }
+! PR fortran/67525
+! Code contributed by Gerhard Steinmetz
+!
+real function f(x)
+   select type (x) ! { dg-error "shall be polymorphic" }
+   end select
+end function f
+
+real function g(x)
+   select type (x=>null()) ! { dg-error "shall be polymorphic" }
+   end select
+end function g
+
+subroutine a(x)
+   select type (x) ! { dg-error "shall be polymorphic" }
+   end select
+end subroutine a

Re: [Patch, fortran] PR40054 and PR63921 - Implement pointer function assignment - redux

2015-09-25 Thread Mikael Morin


Hello Paul,

Le 25/09/2015 14:21, Paul Richard Thomas a écrit :

Dear Mikael, dear all,

Please find attached a revised version of the patch that, I believe,
addresses all the comments. The patch is very much improved and these
improvements are verified by the two extra testcases. Thanks a
million!

Most of the effort involved in preparing this revised patch was
associated with getting rid of ICEs/segfaults triggered by error
recovery. The error handling in resolve_ptr_fcn_assign is still a bit
clumsy but its behaviour is more consistent.

Bootstraps and regtests on FC21/x86_64 - OK for trunk?

Cheers

Paul

2015-09-25  Paul Thomas  

 * decl.c (get_proc_name): Return if statement function is
 found.
 * expr.c (gfc_check_vardef_context): Add error return for
 derived type expression lacking the derived type itself.
 * io.c (next_char_not_space): Change tab warning to warning now
 to prevent locus being lost.

This has disappeared?


 * match.c (gfc_match_ptr_fcn_assign): New function.
 * match.h : Add prototype for gfc_match_ptr_fcn_assign.
 * parse.c : Add static flag 'in_specification_block'.
 (decode_statement): If in specification block match a statement
 function, then, if standard embraces F2008 and no error arising
 from statement function matching, try to match pointer function
 assignment.
 (parse_interface): Set 'in_specification_block' on exiting from
 parse_spec.
 (parse_spec): Set and then reset 'in_specification_block'.
 (gfc_parse_file): Set 'in_specification_block'.
 * resolve.c (get_temp_from_expr): Extend to include functions
 and array constructors as rvalues..
 (resolve_ptr_fcn_assign): New function.
 (gfc_resolve_code): Call it on finding a pointer function as an
 lvalue. If valid or on error, go back to start of resolve_code.
 * symbol.c (gfc_add_procedure): Add a sentence to the error to
 flag up the ambiguity between a statement function and pointer
 function assignment at the end of the specification block.




Index: gcc/fortran/parse.c
===
*** gcc/fortran/parse.c (revision 227854)
--- gcc/fortran/parse.c (working copy)



*** decode_statement (void)
*** 356,362 

match (NULL, gfc_match_assignment, ST_ASSIGNMENT);
match (NULL, gfc_match_pointer_assignment, ST_POINTER_ASSIGNMENT);
!   match (NULL, gfc_match_st_function, ST_STATEMENT_FUNCTION);

match (NULL, gfc_match_data_decl, ST_DATA_DECL);
match (NULL, gfc_match_enumerator_def, ST_ENUMERATOR);
--- 357,375 

match (NULL, gfc_match_assignment, ST_ASSIGNMENT);
match (NULL, gfc_match_pointer_assignment, ST_POINTER_ASSIGNMENT);
!
!   if (in_specification_block)
! {
!   m = match_word (NULL, gfc_match_st_function, _locus);
!   if (m == MATCH_YES)
!   return ST_STATEMENT_FUNCTION;
! }
!
!   if (!(in_specification_block && m == MATCH_ERROR)
!   && !gfc_notification_std (GFC_STD_F2008))
! {
!   match (NULL, gfc_match_ptr_fcn_assign, ST_ASSIGNMENT);
! }
I think that for better error reporting (avoid unclassifiable 
statement), the gfc_notification_std can be dropped, as there is a 
specific gfc_notify_std guarding resolution.


Same for the rest of the condition.  gfc_match_ptr_fcn_assign carrefully 
restores existing errors upon failure, so I would rather use it more often.


So, can you try removing the condition completely (and use the match 
macro above again)?  that should improve errors in ptr_func_assign_2, 
and hopefully not regress.

If it does regress, let's keep it as is.




Index: gcc/fortran/resolve.c
===
*** gcc/fortran/resolve.c   (revision 227854)
--- gcc/fortran/resolve.c   (working copy)
*** generate_component_assignments (gfc_code
*** 10133,10138 
--- 10141,10205 
  }


+ /* F2008: Pointer function assignments are of the form:
+   ptr_fcn (args) = expr
+This function breaks these assignments into two statements:
+   temporary_pointer => ptr_fcn(args)
+   temporary_pointer = expr  */
+
+ static bool
+ resolve_ptr_fcn_assign (gfc_code **code, gfc_namespace *ns)
+ {
+   gfc_expr *tmp_ptr_expr;
+   gfc_code *this_code;
+   gfc_component *comp;
+   gfc_symbol *s;
+
+   if ((*code)->expr1->expr_type != EXPR_FUNCTION)
+ return false;
+
+   /* Even if standard does not support this feature, continue to build
+  the two statements to avoid upsetting frontend_passes.c.  */
+   gfc_notify_std (GFC_STD_F2008, "Pointer procedure assignment at "
+ "%L", &(*code)->loc);
+
+   comp = gfc_get_proc_ptr_comp ((*code)->expr1);
+
+   if (comp)
+ s = comp->ts.interface;
+   else
+ s = (*code)->expr1->symtree->n.sym;
+
+   if (s == NULL || !s->result->attr.pointer)
+ {
+   gfc_error ("F2008: The function result at %L must have "
+

Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67465: Handle pairs of complex+simple blocks and empty blocks more gracefully

2015-09-25 Thread Jeff Law


On 09/25/2015 05:06 AM, Kyrill Tkachov wrote:

Hi Rainer,

On 25/09/15 11:57, Rainer Orth wrote:

Hi Kyrill,


Bootstrapped and tested on aarch64 and x86_64.
Rainer, could you please try this patch in combination with the one I
sent
earlier at:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00815.html

it took me quite a bit, but I've now regtested those two patches: with
them both applied, the sparc-sun-solaris2.12 build succeeds and the two
gcc.c-torture/execute/20071216-1.c failures are gone.

So, from a SPARC POV the patches are good to go.


Phew, thanks a lot!

So, in conclusion the patches I'd like approval for are:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01306.html
and
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00815.html
These are OK.  Thanks for taking the time to work with Rainer and sort 
out the sparc issues.  It's greatly appreciated.


Jeff

Re: [PATCH] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead

2015-09-25 Thread Jeff Law


On 09/25/2015 10:58 AM, Teresa Johnson wrote:

Woops, we crossed wires. I just committed this patch. Would you like
me to revert it?
Leave it.  If Andi can include a reversion if he can pound his work 
around 66890 into submission.  But I think it'd need some of the 
varasm.c work Jan hinted at.


jeff

Re: Elimitate duplication of get_catalogs in different abi

2015-09-25 Thread Jonathan Wakely


On 25/09/15 16:10 +0100, Jonathan Wakely wrote:

On 25/09/15 16:08 +0100, Jonathan Wakely wrote:

On 23/09/15 21:28 +0200, François Dumont wrote:

On 05/09/2015 23:02, François Dumont wrote:

On 22/08/2015 14:24, Daniel Krügler wrote:

2015-08-21 23:11 GMT+02:00 François Dumont :

I think I found a better way to handle this problem. It is c++locale.cc
that needs to be built with --fimplicit-templates. I even think that the
*_cow.cc file do not need this option but as I don't know what is the
drawback of this option I kept it. I also explicitely used the file name
c++locale.cc even if it is an alias to a configurable source file.  I
guess there must be some variable to use no ?

With this patch there are 6 additional symbols. I guess I need to
declare those in the scripts even if it is for internal library usage,
right ?

I would expect that the new Catalog_info definition either has deleted
or properly (user-)defined copy constructor and copy assignment
operator.


- Daniel


This type is used in C++98 so I need to make those private, not deleted.

With this change, is the patch ok to commit ?

François



What about this patch ?

I am still uncomfortable in exposing those implementation details in the
versionned symbols but I don't know how to do otherwise. Do you want me
to push this code in std::__detail namespace ?


I think because the types are only used internally in the library we
don't need to export them. The other code inside the shared library
can refer to those symbols without them being exported.

That way users can't see their names (because they're not in any
public headers) and can't use the symbols (because they're not
exported) so they're pure internal implementation details.

I tested it briefly and it seems to work, so if you can confirm it
still works then the patch is OK without the changes to gnu.ver


Oh, the problem is that the symbols are matched by patterns in the
_GLIBCXX_3.4 version, so get exported with that version instead. Gah.

In that case your patch would not have worked on Solaris anyway, as
the SOlaris linker gives an error if a symbol matches patterns in more
than one symbol version.

Let me try to adjust the gnu.ver script to make this work ...


This should do it ...

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index d42cd37..c761052 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -24,7 +24,7 @@ GLIBCXX_3.4 {
 # Names inside the 'extern' block are demangled names.
 extern "C++"
 {
-  std::[A-Z]*;
+  std::[ABD-Z]*;
   std::a[a-c]*;
   std::ad[a-n]*;
   std::ad[p-z]*;
@@ -106,7 +106,7 @@ GLIBCXX_3.4 {
 # std::istringstream*;
   std::istrstream*;
   std::i[t-z]*;
-  std::[A-Zj-k]*;
+  std::[j-k]*;
 # std::length_error::l*;
 # std::length_error::~l*;
   std::locale::[A-Za-e]*;
@@ -132,9 +132,8 @@ GLIBCXX_3.4 {
 # std::logic_error::l*;
   std::logic_error::what*;
 # std::logic_error::~l*;
-# std::[A-Zm-r]*;
-# std::[A-Zm]*;
-  std::[A-Z]*;
+# std::[m-r]*;
+# std::[m]*;
   std::messages[^_]*;
 # std::messages_byname*;
   std::money_*;
@@ -175,11 +174,13 @@ GLIBCXX_3.4 {
 # std::t[i-n]*;
   std::tr1::h[^a]*;
   std::t[s-z]*;
-# std::[A-Zu-z]*;
+# std::[u-z]*;
 # std::underflow_error::u*;
 # std::underflow_error::~u*;
   std::unexpected*;
-  std::[A-Zv-z]*;
+  std::valarray*;
+  # std::vector*
+  std::[w-z]*;
   std::_List_node_base::hook*;
   std::_List_node_base::swap*;
   std::_List_node_base::unhook*;

Re: [PATCH] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead

2015-09-25 Thread Teresa Johnson

Woops, we crossed wires. I just committed this patch. Would you like
me to revert it?

Teresa

On Fri, Sep 25, 2015 at 9:57 AM, Andi Kleen  wrote:
> Teresa Johnson  writes:
>
>> This patch unsets -freorder-blocks-and-partition when -fprofile-use
>> is not specified. Function splitting was not actually being performed
>> in that case, as probably_never_executed_bb_p does not distinguish
>> any basic blocks as being cold vs hot when there is no profile data.
>
> Actually I'm experimenting with a patch to fix that by allowing
> function splitting even without profile feed back. See PR66890
> which has the patch. I would prefer to keep and fix it.
>
> -Andi
>
>
> --
> a...@linux.intel.com -- Speaking for myself only



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Re: [PATCH, fortran] Revival of AUTOMATIC patch

2015-09-25 Thread Steve Kargl

On Fri, Sep 25, 2015 at 07:23:53PM +0200, Paul Richard Thomas wrote:
> 
> Frankly, I would accept the patch with the proviso that:
> (i) It is hidden behand a  gfc_notify_std (GFC_STD_GNU, "...;
> (ii) The feature is set in conflict with the new features that FX
> mentions, especially coarrays and bind C; and

Did we also need to add Cray pointers to the list of possible
conflicts?  I can't remember the details of the implementation,
but I do recall that Asher added a number of conflicts for 
F95+ attributes.

-- 
Steve

Re: [PATCH] Disable -fno-reorder-blocks-and-partition if no -fprofile-use to avoid unnecessary overhead

2015-09-25 Thread Andi Kleen

On Fri, Sep 25, 2015 at 11:01:47AM -0600, Jeff Law wrote:
> On 09/25/2015 10:58 AM, Teresa Johnson wrote:
> >Woops, we crossed wires. I just committed this patch. Would you like
> >me to revert it?
> Leave it.  If Andi can include a reversion if he can pound his work
> around 66890 into submission. 

The patch is stable, was just gathering more data.

> But I think it'd need some of the
> varasm.c work Jan hinted at.

The varasm work should be only needed if no function splitting is done,
right? With my patch most functions do function splitting.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.

Re: using scratchpads to enhance RTL-level if-conversion: the new patch is almost ready to be prepared for merging to trunk, but not 100% ready yet

2015-09-25 Thread Jeff Law


On 09/25/2015 11:13 AM, Abe Skolnik wrote:

From: Jeff Law 



Sent: Friday, September 25, 2015 11:09 AM




I'd be surprised if we had DEBUG_INSNs as the first insn in a block
(before the note), but it can't hurt to verify.  I believe cfgrtl checks
for proper location of the block note.  But maybe I'm mis-remembering.
It might not be a bad idea to run the verification code immediately
after your transformation (for testing purposes only).


Thanks for the advice.


By "run the verification code", did you mean:

   * call some code whereby "cfgrtl checks for proper location of the block 
note"?

   * re-use some other already-existing-in-GCC code?

   * write some new verification code of my own?
There's several routines for verfication of the CFG and various RTL 
artifacts in cfgrtl.c.  I'm not offhand sure of the main entry point for 
those routines.  I believe they're typically run by the pass manager.






By "after your transformation", did you mean:

   * just before my code does "goto success;", which jumps to a point
 in "noce_process_if_block" that was already there and is shared
 between several different RTL-level if conversions?
At whatever point you've finished your transformation of a single 
IF-THEN-ELSE block.  That way if something was wrong, you'll know 
precisely which IF-THEN-ELSE block is getting messed up.


Since it's just for debugging purposes, you can afford the time to do 
the checking this often.  Obviously that wouldn't be part of the 
official patch.




   * well after the relevant "success:" but before returning?

   * both of the preceding?

   * "other"?
WHenver you've totally completed transforming an IF-THEN-ELSE block, but 
before any other transformations are started.


jeff

Re: [PATCH] v3 of diagnostic_show_locus and rich_location (was Re: [PATCH 2/5] Reimplement diagnostic_show_locus, introducing rich_location classes (v2))

2015-09-25 Thread Manuel López-Ibáñez

On 25 September 2015 at 23:24, David Malcolm  wrote:
> On Fri, 2015-09-25 at 23:13 +0200, Manuel López-Ibáñez wrote:
>> +   If SHOW_CARET_P is true, then the range should be rendered with
>> +   a caret at its starting location.  This
>> +   is for use by the Fortran frontend, for implementing the
>> +   "%C" and "%L" format codes.  */
>> +
>> +void
>> +rich_location::set_range (unsigned int idx, source_range src_range,
>> +  bool show_caret_p, bool overwrite_loc_p)
>>
>> I do not understand when is this show_caret_p used by Fortran given
>> the diagnostic_show_locus code mentioned earlier.

[...]
> rich_location::set_range exists to ensure that the %C and %L codes used
> by Fortran (and "+" in the C family of FEs) can write back into the
> rich_location instance, faithfully emulating the old code that wrote
> back to
> struct text_info's:
>   location_t locations[MAX_LOCATIONS_PER_MESSAGE];

Why Fortran cannot use text->set_location like the other FEs? This way
you do not need set_range at all. In fact, you do:

+source_range range
+  = source_range::from_location (
+  linemap_position_for_loc_and_offset (line_table,
+   loc->lb->location,
+   offset));
+text->set_range (loc_num, range, true);

But I guess this doesn't actually create a range like ^~~~ but as single ^.

The other issue that confuses me is that show_caret_p is always true
when reaching this function via the pretty-printer. Thus, show_caret_p
is also used by C/C++. In fact, I'm not sure when it can be false.

Cheers,

Manuel.

Re: [PATCH] v3 of diagnostic_show_locus and rich_location (was Re: [PATCH 2/5] Reimplement diagnostic_show_locus, introducing rich_location classes (v2))

2015-09-25 Thread Manuel López-Ibáñez

On 25 September 2015 at 23:15, David Malcolm  wrote:
> My recollection is that I saw that the Fortran frontend has logic for
> calling into diagnostic_print_caret_line, noticed that the fortran
> testsuite has dg- assertions about finding specific messages, and I got
> worried that they embed assumptions about how the old printer worked.
> Hence I wanted to avoid touching that for the first version, and so in
> this patch it's a hybrid of the old Fortran printing code with the new
> representation for multiple locations.

It is quite simple, one you understand the logic. Fortran has three
types of output:

(a) # [name]:[locus]:
#
#some code
#  1
# Error: Some error at (1)

which can call the same function used by other FEs to print the caret
line (I call the caret line, the line that contains the caret
character/ranges, 1 in this case).

(b) # [name]:[locus]:
#
#   some code and some more code
#  1   2
# Error: Some error at (1) and (2)

which according to what you explained should also be possible by
calling diagnostic_show_locus with the appropriate location info and

(c) # [name]:[locus]:
#
#   some code
#  1
# [name]:[locus2]:
#
#   some other code
# 2
# Error: Some error at (1) and (2)
# or

which was implemented by calling diagnostic_show_locus with just the
location of 1, then calling diagnostic_print_caret_line with just the
expanded_location of 2. I could have just called diagnostic_show_locus
also to print 2 by overriding diagnostic->location[0] =
diagnostic->location[1] and caret_char[0] = caret_char[1], but that
seemed a bit hackish and more expensive (but perhaps less confusing?).

If you have a function that you can call with one or more
location_t/expanded_location  (or something that can be converted from
a location_t) and pass explicitly the caret_char, then you just need
to call that function with the right parameters to get the second part
of (c). Otherwise, you may simply temporarily do caret_char[0] =
caret_char[1], before calling the same function that prints the
caret-line for (a).

> Maybe that's a cop-out.  Would you prefer that the patch goes all the
> way, and that I attempt to eliminate all calls to
> diagnostic_print_caret_line from the Fortran FE, and eliminate the old
> implementation?  (either now, or as a followup patch?)  I may need
> assistance with that; I suspect that some of the dg- assertions in the
> Fortran test suite may need updating.

There is only one call! I just think this hack is really not necessary
(in fact, it seems more complicated than the alternatives outlined
above). And I'm afraid that once it goes in, it will stay there
forever. You are in a far better position than the Fortran devs to
understand how to call your new interfaces to get the output you
desire.

Cheers,

Manuel.

Re: [RFC] Try vector as a new representation for vector masks

2015-09-25 Thread Richard Biener

On Thu, Sep 24, 2015 at 6:37 PM, Richard Henderson  wrote:
> On 09/24/2015 01:09 AM, Richard Biener wrote:
>> Both are basically a (target) restriction on how we should expand a 
>> conditional
>> move (and its condition).  It's techincally convenient to tie both together 
>> by
>> having them in the same statement but it's also techincally very incovenient
>> in other places.  I'd say for targets where
>>
>> tem_1 = a_2 < b_3;
>> res_4 = tem_1 ? c_5 : d_6;
>> res_7 = tem_1 ? x_8 : z_9;
>>
>> presents a serious issue ("re-using" the flags register) out-of-SSA should
>> duplicate the conditionals so that TER can do its job (and RTL expansion
>> should use TER to get at the flags setter).
>
> Sure it's a target restriction, but it's an extremely common one.  Essentially
> all of our production platforms have it.  What do we gain by adding some sort
> of target hook for this?

A cleaner IL, no GENERIC expression tree building in GIMPLE (I guess that's
sth Andrew needs for his GIMPLE types project as well), less awkward
special-casing of comparisons based on context in code like genmatch.c
or in value-numbering.

>> I imagine that if we expand the above to adjacent statements the CPUs can
>> re-use the condition code.
>
> Sure, but IMO it should be the job of RTL CSE to make that decision, after all
> of the uses (and clobbers) of the flags register have been exposed.
>
>> To me where the condition is in GIMPLE is an implementation detail and the
>> inconveniences outweight the benefits.
>
> Why is a 3-operand gimple statement fine, but a 4-operand gimple statement
> inconvenient?

The inconvenience is not the number of operands but that we have two operation
codes and that we compute two values but only have an SSA name def for one
of them.  Oh, and did I mention that second operation is GENERIC?

So one way to clean things up would be to no longer use a GIMPLE_ASSIGN for
x = a < b ? c : d but instead use a GIMPLE_COND and give that a SSA def
for the result, using the true/false label operand places for 'c' and 'd'.

That still wouldn't get the compare a SSA def but at least it would get rid of
the 2nd operator code and the GENERIC expression operand.

>From the GIMPLE side forcing out the comparison to a separate stmt looks
more obvious and if we're considering doing a different thing then we may as
well think of how to represent predicating arbitrary stmts or how to explicitely
model condition codes in GIMPLE.

It kind of looks like we want a GIMPLE PARALLEL ... (we already have
a GIMPLE stmt with multiple defs - GIMPLE_ASM)

Richard.

>
> r~

[gomp4, committed] Skip inner loops in oacc kernels region

2015-09-25 Thread Tom de Vries


Hi,

this patch fixes an ICE when trying to parallelize inner loops in an 
oacc kernels region.


The patch fixes it by not trying to parallelize those inner loops.

Committed to gomp-4_0-branch.

Thanks,
- Tom
Skip inner loops in oacc kernels region

2015-09-24  Tom de Vries  

	* tree-parloops.c (parallelize_loops): Skip inner loops in oacc kernels
	region.

	* gfortran.dg/goacc/kernels-loop-inner.f95: New test.
---
 .../gfortran.dg/goacc/kernels-loop-inner.f95   | 23 ++
 gcc/tree-parloops.c|  8 +++-
 2 files changed, 30 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
new file mode 100644
index 000..4db3a50
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
@@ -0,0 +1,23 @@
+! { dg-additional-options "-O2" }
+! { dg-additional-options "-ftree-parallelize-loops=32" }
+
+program main
+   implicit none
+
+   integer :: a(100,100), b(100,100)
+   integer :: i, j, d
+
+   !$acc kernels
+   do i=1,100
+ do j=1,100
+   a(i,j) = 1
+   b(i,j) = 2
+   a(i,j) = a(i,j) + b(i,j)
+ end do
+   end do
+   !$acc end kernels
+
+   d = sum(a)
+
+   print *,d
+end program main
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 04ec254..5d93bc2 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2792,7 +2792,8 @@ parallelize_loops (bool oacc_kernels_p)
 {
   if (loop == skip_loop)
 	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
+	  if (!loop->in_oacc_kernels_region
+	  && dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file,
 		 "Skipping loop %d as inner loop of parallelized loop\n",
 		 loop->num);
@@ -2810,6 +2811,10 @@ parallelize_loops (bool oacc_kernels_p)
 	  if (!loop->in_oacc_kernels_region)
 	continue;
 
+	  /* Don't try to parallelize inner loops in an oacc kernels region.  */
+	  if (loop->inner)
+	skip_loop = loop->inner;
+
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file,
 		 "Trying loop %d with header bb %d in oacc kernels region\n",
@@ -2892,6 +2897,7 @@ parallelize_loops (bool oacc_kernels_p)
 	}
 
   changed = true;
+  /* Skip inner loop(s) of parallelized loop.  */
   skip_loop = loop->inner;
   if (dump_file && (dump_flags & TDF_DETAILS))
   {
-- 
1.9.1

New Turkish PO file for 'gcc' (version 5.2.0)

2015-09-25 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Turkish team of translators.  The file is available at:

http://translationproject.org/latest/gcc/tr.po

(This file, 'gcc-5.2.0.tr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

[ASAN/KSAN/TSAN doc patch] Fix broken sanitizer links in docs/invoke.texi

2015-09-25 Thread Tobias Burnus

Dear all,

the attached patch fixes the broken sanitizer links, coming from Google's
move from code.google.com to GitHub. I additionally added "help=1" to the
ASAN options as that shows the really supported options of the installed
ASAN library (the web page can be newer/older or just incomplete) and it
also works without internet access.

I intent to install it for GCC 6, 5 and 4.9.

Gerald: The gcc-4.8/changes.html is also affected. Does it make sense to
update the links there as well?

Comments, suggestions?

Cheers,

Tobias
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 844d254..fc49bfe 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -5857,25 +5857,26 @@ many times it is given.  This is mainly intended to be used with
 Enable AddressSanitizer, a fast memory error detector.
 Memory access instructions are instrumented to detect
 out-of-bounds and use-after-free bugs.
-See @uref{http://code.google.com/p/address-sanitizer/} for
+See @uref{https://github.com/google/sanitizers/wiki/AddressSanitizer} for
 more details.  The run-time behavior can be influenced using the
-@env{ASAN_OPTIONS} environment variable; see
-@url{https://code.google.com/p/address-sanitizer/wiki/Flags#Run-time_flags} for
-a list of supported options.
+@env{ASAN_OPTIONS} environment variable. When setting it to @code{help=1},
+the available options are shown at startup of the instrumended program. See
+@url{https://github.com/google/sanitizers/wiki/AddressSanitizerFlags#run-time-flags}
+for a list of supported options.
 
 @item -fsanitize=kernel-address
 @opindex fsanitize=kernel-address
 Enable AddressSanitizer for Linux kernel.
-See @uref{http://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel} for more details.
+See @uref{https://github.com/google/kasan/wiki} for more details.
 
 @item -fsanitize=thread
 @opindex fsanitize=thread
 Enable ThreadSanitizer, a fast data race detector.
 Memory access instructions are instrumented to detect
-data race bugs.  See @uref{http://code.google.com/p/thread-sanitizer/} for more
+data race bugs.  See @uref{https://github.com/google/sanitizers/wiki#threadsanitizer} for more
 details. The run-time behavior can be influenced using the @env{TSAN_OPTIONS}
 environment variable; see
-@url{https://code.google.com/p/thread-sanitizer/wiki/Flags} for a list of
+@url{https://github.com/google/sanitizers/wiki/ThreadSanitizerFlags} for a list of
 supported options.
 
 @item -fsanitize=leak
@@ -5885,7 +5886,7 @@ This option only matters for linking of executables and if neither
 @option{-fsanitize=address} nor @option{-fsanitize=thread} is used.  In that
 case the executable is linked against a library that overrides @code{malloc}
 and other allocator functions.  See
-@uref{https://code.google.com/p/address-sanitizer/wiki/LeakSanitizer} for more
+@uref{https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer} for more
 details.  The run-time behavior can be influenced using the
 @env{LSAN_OPTIONS} environment variable.

Re: [PATCH 2/5] Reimplement diagnostic_show_locus, introducing rich_location classes (v2)

2015-09-25 Thread Dodji Seketeli

Hello David,

I like this!  Thank you very much for working on this.

I think this patch is in great shape, and once we agree on some of the
nits I have commented on below, I think it should go in. Oh, it also
needs the first patch (1/5, dejagnu first) to go in first, as this one
depends on it.  I defer to the dejagnu maintainers for that one.

The line-map parts are OK to me too, but I have no power on those.  So I
defer to the FE maintainers for that.  The diagnostics parts of the
Fortran, C++ and C FE look good to me too; these are just well contained
mechanical adjustments, but I defer to FE maintainers for the final
word.

Please find my comments below.

[...]

diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c

[...]

+/* A class to inject colorization codes when printing the diagnostic locus,
+   tracking state as it goes.  */
+
+class colorizer
+{

[...]

+  void set_state (int state);
+  void begin_state (int state);
+  void finish_state (int state);

I think the concept of state could use a little bit of explanation, at
least to say that there are the same number of states, as the number
of ranges.  And that the 'state' argument to these functions really is
the range index.

Also, I am thinking that there should maybe be a layout::state type,
which would have two notional properties (for now): range_index and
draw_caret_p. So that this function:

+bool
+layout::get_state_at_point (/* Inputs.  */
+   int row, int column,
+   int first_non_ws, int last_non_ws,
+   /* Outputs.  */
+   int *out_range_idx,
+   bool *out_draw_caret_p)

Would take just one output parameter, e.g, a reference to
layout::state.


+layout::layout (diagnostic_context * context,
+   const diagnostic_info *diagnostic)

[...]

+  if (loc_range->m_finish.file != m_exploc.file)
+   continue;
+  if (loc_range->m_show_caret_p)
+   if (loc_range->m_finish.file != m_exploc.file)
+ continue;

I think the second if clause is redundant.

+  if (0)
+show_ruler (context, line_width, m_x_offset);

This should probably be removed from the final code to be committed.

[...]

+/* Get the column beyond the rightmost one that could contain a caret or
+   range marker, given that we stop rendering at trailing whitespace.  */
+
+int
+layout::get_x_bound_for_row (int row, int caret_column,
+int last_non_ws)

Please describe what the parameters mean here, especially last_non_ws.
I had to read its code to know that last_non_ws was the *column* of
the last non white space character.

[...]

+void
+layout::print_line (int row)

This function is neat.  I like it! :-)

[...]

 void
 diagnostic_show_locus (diagnostic_context * context,
   const diagnostic_info *diagnostic)
@@ -75,16 +710,25 @@ diagnostic_show_locus (diagnostic_context * context,
 return;

+  /* The GCC 5 routine. */

I'd say the GCC <= 5 routine ;-)

+  else
+/* The GCC 6 routine.  */

And here, the GCC > 5 routine.

I would be surprised to see this patch in particular incur any
noticeable increase in time and space consumption, but, have you noticed
anythying related to that during bootstrap?

Cheers,

-- 
Dodji

Re: [PATCH] Clear variables with stale SSA_NAME_RANGE_INFO (PR tree-optimization/67690)

2015-09-25 Thread Richard Biener

On Thu, 24 Sep 2015, Marek Polacek wrote:

> As Richi said in ,
> using recorded SSA name range infos in VRP is likely to expose errors in the
> ranges.  This PR is such a case.  As discussed in the PR, after tail merging
> via PRE the range infos cannot be relied upon anymore, so we need to clear
> them.
> 
> Since tree-ssa-ifcombine.c already had code to clean up the flow data in a BB,
> I've factored it out to a common function.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk and 5?

I believe for tail-merge you also need to clear range info on
PHI defs in the BB.  For ifcombine this wasn't necessary (no PHI nodes
in the relevant CFG), but it's ok to extend the new 
reset_flow_sensitive_info_in_bb function to also reset PHI defs.

Ok with that change.

Thanks,
Richard.

> 2015-09-24  Marek Polacek  
> 
>   PR tree-optimization/67690
>   * tree-ssa-ifcombine.c (pass_tree_ifcombine::execute): Call
>   reset_flow_sensitive_info_in_bb.
>   * tree-ssa-tail-merge.c (replace_block_by): Likewise.
>   * tree-ssanames.c: Include "gimple-iterator.h".
>   (reset_flow_sensitive_info_in_bb): New function.
>   * tree-ssanames.h (reset_flow_sensitive_info_in_bb): Declare.
> 
>   * gcc.dg/torture/pr67690.c: New test.
> 
> diff --git gcc/testsuite/gcc.dg/torture/pr67690.c 
> gcc/testsuite/gcc.dg/torture/pr67690.c
> index e69de29..491de51 100644
> --- gcc/testsuite/gcc.dg/torture/pr67690.c
> +++ gcc/testsuite/gcc.dg/torture/pr67690.c
> @@ -0,0 +1,32 @@
> +/* { dg-do run } */
> +
> +const int c1 = 1;
> +const int c2 = 2;
> +
> +int
> +check (int i)
> +{
> +  int j;
> +  if (i >= 0)
> +j = c2 - i;
> +  else
> +j = c2 - i;
> +  return c2 - c1 + 1 > j;
> +}
> +
> +int invoke (int *pi) __attribute__ ((noinline,noclone));
> +int
> +invoke (int *pi)
> +{
> +  return check (*pi);
> +}
> +
> +int
> +main ()
> +{
> +  int i = c1;
> +  int ret = invoke ();
> +  if (!ret)
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git gcc/tree-ssa-ifcombine.c gcc/tree-ssa-ifcombine.c
> index 9f04174..66be430 100644
> --- gcc/tree-ssa-ifcombine.c
> +++ gcc/tree-ssa-ifcombine.c
> @@ -769,16 +769,7 @@ pass_tree_ifcombine::execute (function *fun)
> {
>   /* Clear range info from all stmts in BB which is now executed
>  conditional on a always true/false condition.  */
> - for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> -  !gsi_end_p (gsi); gsi_next ())
> -   {
> - gimple *stmt = gsi_stmt (gsi);
> - ssa_op_iter i;
> - tree op;
> - FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF)
> -   reset_flow_sensitive_info (op);
> -   }
> -
> + reset_flow_sensitive_info_in_bb (bb);
>   cfg_changed |= true;
> }
>  }
> diff --git gcc/tree-ssa-tail-merge.c gcc/tree-ssa-tail-merge.c
> index 0ce59e8..487961e 100644
> --- gcc/tree-ssa-tail-merge.c
> +++ gcc/tree-ssa-tail-merge.c
> @@ -1534,6 +1534,10 @@ replace_block_by (basic_block bb1, basic_block bb2)
>e2->probability = GCOV_COMPUTE_SCALE (e2->count, out_sum);
>  }
>  
> +  /* Clear range info from all stmts in BB2 -- this transformation
> + could make them out of date.  */
> +  reset_flow_sensitive_info_in_bb (bb2);
> +
>/* Do updates that use bb1, before deleting bb1.  */
>release_last_vdef (bb1);
>same_succ_flush_bb (bb1);
> diff --git gcc/tree-ssanames.c gcc/tree-ssanames.c
> index 4199290..5393865 100644
> --- gcc/tree-ssanames.c
> +++ gcc/tree-ssanames.c
> @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "backend.h"
>  #include "tree.h"
>  #include "gimple.h"
> +#include "gimple-iterator.h"
>  #include "hard-reg-set.h"
>  #include "ssa.h"
>  #include "alias.h"
> @@ -544,6 +545,21 @@ reset_flow_sensitive_info (tree name)
>  SSA_NAME_RANGE_INFO (name) = NULL;
>  }
>  
> +/* Clear all flow sensitive data from all statements in BB.  */
> +
> +void
> +reset_flow_sensitive_info_in_bb (basic_block bb)
> +{
> +  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
> +   gsi_next ())
> +{
> +  gimple *stmt = gsi_stmt (gsi);
> +  ssa_op_iter i;
> +  tree op;
> +  FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF)
> + reset_flow_sensitive_info (op);
> +}
> +}
>  
>  /* Release all the SSA_NAMEs created by STMT.  */
>  
> diff --git gcc/tree-ssanames.h gcc/tree-ssanames.h
> index 22ff609..5688ca5 100644
> --- gcc/tree-ssanames.h
> +++ gcc/tree-ssanames.h
> @@ -95,6 +95,7 @@ extern tree duplicate_ssa_name_fn (struct function *, tree, 
> gimple *);
>  extern void duplicate_ssa_name_range_info (tree, enum value_range_type,
>  struct range_info_def *);
>  extern void reset_flow_sensitive_info (tree);
> +extern void reset_flow_sensitive_info_in_bb (basic_block);
>  extern void

[Patch 1/2 AArch64/ARM] Give AArch64 ROR (Immediate) a new type attribute

2015-09-25 Thread James Greenhalgh


Hi,

This patch splits the "shift_imm" type attribute used by AArch64 in
two - giving rotate_imm and shift_imm.

We then apply this transform across the AArch64 pipeline descriptions
which have modelling for shift_imm (cortex-a53, cortex-a57, thunderx).
This should give no functional change to these models.

Bootstrapped and tested on aarch64-none-linux-gnu, and
arm-none-linux-gnueabihf with no issues.

OK?

Thanks,
James

---
2015-09-25  James Greenhalgh  

* config/arm/types.md (type): Add rotate_imm.
* config/aarch64/aarch64.md (*ror3_insn): Split out the
ROR immediate case.
(*rorsi3_insn_uxtw): Likewise.
* config/aarch64/thunderx.md (thunderx_shift): Add rotate_imm.
* config/arm/cortex-a53.md (cortex_a53_alu_shift): Add rotate_imm.
* config/arm/cortex-a57.md (cortex_a53_alu): Add rotate_imm.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 78b9ae2..4f7323c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3807,13 +3807,15 @@
 
 ;; Rotate right
 (define_insn "*ror3_insn"
-  [(set (match_operand:GPI 0 "register_operand" "=r")
-(rotatert:GPI
-  (match_operand:GPI 1 "register_operand" "r")
-  (match_operand:QI 2 "aarch64_reg_or_shift_imm_" "rUs")))]
+  [(set (match_operand:GPI 0 "register_operand" "=r,r")
+ (rotatert:GPI
+   (match_operand:GPI 1 "register_operand" "r,r")
+   (match_operand:QI 2 "aarch64_reg_or_shift_imm_" "r,Us")))]
   ""
-  "ror\\t%0, %1, %2"
-  [(set_attr "type" "shift_reg")]
+  "@
+   ror\\t%0, %1, %2
+   ror\\t%0, %1, %2"
+  [(set_attr "type" "shift_reg, rotate_imm")]
 )
 
 ;; zero_extend version of above
@@ -3902,7 +3904,7 @@
   operands[3] = GEN_INT ( - UINTVAL (operands[2]));
   return "ror\\t%0, %1, %3";
 }
-  [(set_attr "type" "shift_imm")]
+  [(set_attr "type" "rotate_imm")]
 )
 
 ;; zero_extend version of the above
@@ -3916,7 +3918,7 @@
   operands[3] = GEN_INT (32 - UINTVAL (operands[2]));
   return "ror\\t%w0, %w1, %3";
 }
-  [(set_attr "type" "shift_imm")]
+  [(set_attr "type" "rotate_imm")]
 )
 
 (define_insn "*_ashl"
diff --git a/gcc/config/aarch64/thunderx.md b/gcc/config/aarch64/thunderx.md
index cf96368..3dae963 100644
--- a/gcc/config/aarch64/thunderx.md
+++ b/gcc/config/aarch64/thunderx.md
@@ -39,7 +39,7 @@
 
 (define_insn_reservation "thunderx_shift" 1
   (and (eq_attr "tune" "thunderx")
-   (eq_attr "type" "bfm,extend,shift_imm,shift_reg,rbit,rev"))
+   (eq_attr "type" "bfm,extend,rotate_imm,shift_imm,shift_reg,rbit,rev"))
   "thunderx_pipe0 | thunderx_pipe1")
 
 
diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index db572f6..3fa0625 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53.md
@@ -76,7 +76,7 @@
 alu_sreg,alus_sreg,logic_reg,logics_reg,\
 adc_imm,adcs_imm,adc_reg,adcs_reg,\
 adr,bfm,csel,clz,rbit,rev,alu_dsp_reg,\
-shift_imm,shift_reg,\
+rotate_imm,shift_imm,shift_reg,\
 mov_imm,mov_reg,mvn_imm,mvn_reg,\
 mrs,multiple,no_insn"))
   "cortex_a53_slot_any")
diff --git a/gcc/config/arm/cortex-a57.md b/gcc/config/arm/cortex-a57.md
index a32c848..d6ce440 100644
--- a/gcc/config/arm/cortex-a57.md
+++ b/gcc/config/arm/cortex-a57.md
@@ -296,7 +296,7 @@
 			alu_sreg,alus_sreg,logic_reg,logics_reg,\
 			adc_imm,adcs_imm,adc_reg,adcs_reg,\
 			adr,bfm,clz,rbit,rev,alu_dsp_reg,\
-			shift_imm,shift_reg,\
+			rotate_imm,shift_imm,shift_reg,\
 			mov_imm,mov_reg,\
 			mvn_imm,mvn_reg,\
 			mrs,multiple,no_insn"))
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index ec609ae..534be74 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -120,6 +120,7 @@
 ;final output, thus having no impact on scheduling.
 ; rbit   reverse bits.
 ; revreverse bytes.
+; rotate_imm rotate by immediate.
 ; sdiv   signed division.
 ; shift_imm  simple shift operation (LSL, LSR, ASR, ROR) with an
 ;immediate.
@@ -627,6 +628,7 @@
   nop,\
   rbit,\
   rev,\
+  rotate_imm,\
   sdiv,\
   shift_imm,\
   shift_reg,\

[Patch 2/2 ARM/AArch64] Add a new Cortex-A53 scheduling model

2015-09-25 Thread James Greenhalgh


Hi,

This patch introduces a new scheduling model for Cortex-A53.

Bootstrapped and tested on arm-none-linux-gnueabi and aarch64-none-linux-gnu
and checked with a variety of popular benchmarking and microbenchmarking
suites to show a benefit.

OK?

Thanks,
James

---
2015-09-25  James Greenhalgh  

* config/arm/aarch-common-protos.h
(aarch_accumulator_forwarding): New.
(aarch_forward_to_shift_is_not_shifted_reg): Likewise.
* config/arm/aarch-common.c (aarch_accumulator_forwarding): New.
(aarch_forward_to_shift_is_not_shifted_reg): Liekwise.
* config/arm/cortex-a53.md: Rewrite.

diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 29f7c99..348ae74 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -23,7 +23,9 @@
 #ifndef GCC_AARCH_COMMON_PROTOS_H
 #define GCC_AARCH_COMMON_PROTOS_H
 
+extern int aarch_accumulator_forwarding (rtx_insn *, rtx_insn *);
 extern int aarch_crypto_can_dual_issue (rtx_insn *, rtx_insn *);
+extern int aarch_forward_to_shift_is_not_shifted_reg (rtx_insn *, rtx_insn *);
 extern bool aarch_rev16_p (rtx);
 extern bool aarch_rev16_shleft_mask_imm_p (rtx, machine_mode);
 extern bool aarch_rev16_shright_mask_imm_p (rtx, machine_mode);
diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index 5dd8222..43579d8 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -394,6 +394,112 @@ arm_mac_accumulator_is_result (rtx producer, rtx consumer)
   && !reg_overlap_mentioned_p (result, op1));
 }
 
+/* Return non-zero if the destination of PRODUCER feeds the accumulator
+   operand of an MLA-like operation.  */
+
+int
+aarch_accumulator_forwarding (rtx_insn *producer, rtx_insn *consumer)
+{
+  rtx producer_set = single_set (producer);
+  rtx consumer_set = single_set (consumer);
+
+  /* We are looking for a SET feeding a SET.  */
+  if (!producer_set || !consumer_set)
+return 0;
+
+  rtx dest = SET_DEST (producer_set);
+  rtx mla = SET_SRC (consumer_set);
+
+  /* We're looking for a register SET.  */
+  if (!REG_P (dest))
+return 0;
+
+  rtx accumulator;
+
+  /* Strip a zero_extend.  */
+  if (GET_CODE (mla) == ZERO_EXTEND)
+mla = XEXP (mla, 0);
+
+  switch (GET_CODE (mla))
+{
+case PLUS:
+  /* Possibly an MADD.  */
+  if (GET_CODE (XEXP (mla, 0)) == MULT)
+	accumulator = XEXP (mla, 1);
+  else
+	return 0;
+  break;
+case MINUS:
+  /* Possibly an MSUB.  */
+  if (GET_CODE (XEXP (mla, 1)) == MULT)
+	accumulator = XEXP (mla, 0);
+  else
+	return 0;
+  break;
+case FMA:
+	{
+	  /* Possibly an FMADD/FMSUB/FNMADD/FNMSUB.  */
+	  if (REG_P (XEXP (mla, 1))
+	  && REG_P (XEXP (mla, 2))
+	  && (REG_P (XEXP (mla, 0))
+		  || GET_CODE (XEXP (mla, 0)) == NEG))
+
+	{
+	  /* FMADD/FMSUB.  */
+	  accumulator = XEXP (mla, 2);
+	}
+	  else if (REG_P (XEXP (mla, 1))
+		   && GET_CODE (XEXP (mla, 2)) == NEG
+		   && (REG_P (XEXP (mla, 0))
+		   || GET_CODE (XEXP (mla, 0)) == NEG))
+	{
+	  /* FNMADD/FNMSUB.  */
+	  accumulator = XEXP (XEXP (mla, 2), 0);
+	}
+	  else
+	return 0;
+	  break;
+	}
+  default:
+	/* Not an MLA-like operation.  */
+	return 0;
+}
+
+  return (REGNO (dest) == REGNO (accumulator));
+}
+
+/* Return nonzero if the CONSUMER instruction is some sort of
+   arithmetic or logic + shift operation, and the register we are
+   writing in PRODUCER is not used in a register shift by register
+   operation.  */
+
+int
+aarch_forward_to_shift_is_not_shifted_reg (rtx_insn *producer,
+	   rtx_insn *consumer)
+{
+  rtx value, op;
+  rtx early_op;
+
+  if (!arm_get_set_operands (producer, consumer, , ))
+return 0;
+
+  if ((early_op = arm_find_shift_sub_rtx (op)))
+{
+  if (REG_P (early_op))
+	early_op = op;
+
+  /* Any other canonicalisation of a shift is a shift-by-constant
+	 so we don't care.  */
+  if (GET_CODE (early_op) == ASHIFT)
+	return (!REG_P (XEXP (early_op, 0))
+		|| !REG_P (XEXP (early_op, 1)));
+  else
+	return 1;
+}
+
+  return 0;
+}
+
 /* Return non-zero if the consumer (a multiply-accumulate instruction)
has an accumulator dependency on the result of the producer (a
multiplication instruction) and no other dependency on that result.  */
diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index 3fa0625..f7dc7e9 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53.md
@@ -22,345 +22,699 @@
 (define_automaton "cortex_a53")
 
 
-;; Functional units.
+;; General-purpose functional units.
 
 
-;; There are two main integer execution pipelines, described as
-;; slot 0 and issue slot 1.
+;; We use slot0 and slot1 to model constraints on which

[Patch 0/2 ARM/AArch64] Add a new Cortex-A53 scheduling model

2015-09-25 Thread James Greenhalgh

Hi,

This two patch series updates our scheduling model for the Cortex-A53
processor.

Patch 1/2 splits out the "shift_imm" type attribute used by the AArch64
target to give us two types - rotate_imm and shift_imm.

Patch 2/2 introduces the new scheduling model.

I've bootstrapped and tested the patches in series and individually
on both arm-none-linux-gnueabihf and aarch64-none-linux-gnu targets,
and I've checked the performance against a set of popular benchmark
suites to show a benefit.

OK for trunk?

Thanks,
James

---
[Patch 1/2 AArch64/ARM] Give AArch64 ROR (Immediate) a new type attribute

2015-09-25  James Greenhalgh  

* config/arm/types.md (type): Add rotate_imm.
* config/aarch64/aarch64.md (*ror3_insn): Split out the
ROR immediate case.
(*rorsi3_insn_uxtw): Likewise.
* config/aarch64/thunderx.md (thunderx_shift): Add rotate_imm.
* config/arm/cortex-a53.md (cortex_a53_alu_shift): Add rotate_imm.
* config/arm/cortex-a57.md (cortex_a53_alu): Add rotate_imm.

[Patch 2/2 ARM/AArch64] Add a new Cortex-A53 scheduling model

2015-09-25  James Greenhalgh  

* config/arm/aarch-common-protos.h
(aarch_accumulator_forwarding): New.
(aarch_forward_to_shift_is_not_shifted_reg): Likewise.
* config/arm/aarch-common.c (aarch_accumulator_forwarding): New.
(aarch_forward_to_shift_is_not_shifted_reg): Liekwise.
* config/arm/cortex-a53.md: Rewrite.

[PATCH][AArch64] Improve add immediate expansion

2015-09-25 Thread Wilco Dijkstra

This patch improves add immediate expansion by always expanding complex adds 
into a move immediate
and an add. This enables CSE of all complex immediates. A separate split 
pattern enables combine to
emit a 2-instruction add sequence for single use cases.

Bootstrapped & regression tested on AArch64.

OK for commit?

ChangeLog:
2015-09-25  Wilco Dijkstra  

* gcc/config/aarch64/aarch64.md (add3):
Block early expansion into 2 add instructions.
(add3_pluslong): New pattern to combine complex
immediates into 2 additions.

---
 gcc/config/aarch64/aarch64.md | 54 ---
 1 file changed, 35 insertions(+), 19 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 78b9ae2..13d9257 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1542,30 +1542,46 @@
  (match_operand:GPI 2 "aarch64_pluslong_operand" "")))]
   ""
   "
-  if (! aarch64_plus_operand (operands[2], VOIDmode))
+  if (!aarch64_plus_operand (operands[2], VOIDmode))
 {
-  HOST_WIDE_INT imm = INTVAL (operands[2]);
-
-  if (aarch64_move_imm (imm, mode) && can_create_pseudo_p ())
-{
+  if (can_create_pseudo_p ())
+   {
  rtx tmp = gen_reg_rtx (mode);
  emit_move_insn (tmp, operands[2]);
  operands[2] = tmp;
-}
+   }
   else
-{
- rtx subtarget = ((optimize && can_create_pseudo_p ())
-  ? gen_reg_rtx (mode) : operands[0]);
-
- if (imm < 0)
-   imm = -(-imm & ~0xfff);
- else
-   imm &= ~0xfff;
-
- emit_insn (gen_add3 (subtarget, operands[1], GEN_INT (imm)));
- operands[1] = subtarget;
- operands[2] = GEN_INT (INTVAL (operands[2]) - imm);
-}
+   {
+ HOST_WIDE_INT imm = INTVAL (operands[2]);
+ imm = imm >= 0 ? imm & 0xfff : -(-imm & 0xfff);
+ emit_insn (gen_add3 (operands[0], operands[1],
+GEN_INT (INTVAL (operands[2]) - imm)));
+ operands[1] = operands[0];
+ operands[2] = GEN_INT (imm);
+   }
+}
+  "
+)
+
+;; Find add with a 2-instruction immediate and merge into 2 add instructions.
+
+(define_insn_and_split "*add3_pluslong"
+  [(set
+(match_operand:GPI 0 "register_operand" "")
+(plus:GPI (match_operand:GPI 1 "register_operand" "")
+ (match_operand:GPI 2 "aarch64_pluslong_operand" "")))]
+  "!aarch64_plus_operand (operands[2], VOIDmode)
+   && !aarch64_move_imm (INTVAL (operands[2]), mode)"
+  "#"
+  "&& true"
+  [(set (match_dup 0) (plus:GPI (match_dup 1) (match_dup 3)))
+   (set (match_dup 0) (plus:GPI (match_dup 0) (match_dup 4)))]
+  "
+{
+  HOST_WIDE_INT imm = INTVAL (operands[2]);
+  imm = imm >= 0 ? imm & 0xfff : -(-imm & 0xfff);
+  operands[3] = GEN_INT (INTVAL (operands[2]) - imm);
+  operands[4] = GEN_INT (imm);
 }
   "
 )
-- 
1.9.1

Re: New post-LTO OpenACC pass

2015-09-25 Thread Bernd Schmidt


On 09/25/2015 12:56 PM, Nathan Sidwell wrote:

On 09/25/15 06:28, Bernd Schmidt wrote:

Can you send me the patch you tried (and possibly a testcase you
expect to be
handled), I'll see if I can find out what's going on.


Thanks!  When things didn't work, I tried getting it workong on the
gomp4 branch, as I new what to expect there.  So the patch is for that
branch.

The fails I observed are:

FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/if-1.c
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none
execution test


Ok, I tried to compile this one. When using -O for host cc1 and ptx 
lto1, I see fold_builtin_1 being executed with state == EXPANSION.


In host cc1:

10294 return fold_convert (integer_type_node, result);
(gdb) p result
$16 = 
(gdb) pge
warning: Expression is not an assignment (and might have no effect)
2 == 2 || 2 == 0

In ptx lto1:

(gdb) p result
$1 = 
(gdb) pge
warning: Expression is not an assignment (and might have no effect)
2 == 4 || 2 == 5

I'm not really sure about the logic, but are the results maybe switched 
(returning false on the device and true on the host)?


I think the reason you're seeing calls to acc_on_device when not 
optimizing is this code:


5931	  /* When not optimizing, generate calls to library functions for a 
certain

5932 set of builtins.  */
5933  if (!optimize
5934  && !called_as_built_in (fndecl)
5935  && fcode != BUILT_IN_FORK
[...]

which should probably have the acc_on_device code added to the list.


Bernd

Re: PR pretty-print/67567 do not pass NULL as a string

2015-09-25 Thread Dodji Seketeli

Manuel López-Ibáñez  a écrit:

> 2015-09-15  Manuel López-Ibáñez  
>
> PR pretty-print/67567
> * resolve.c (resolve_fl_procedure): Work-around when iface->module
> == NULL.

This is OK, thanks.

-- 
Dodji

Re: [PATCH v2][GCC] Algorithmic optimization in match and simplify

2015-09-25 Thread Richard Biener

On Fri, Sep 25, 2015 at 1:30 PM, Andre Vieira
 wrote:
> On 17/09/15 10:46, Richard Biener wrote:
>>
>> On Thu, Sep 3, 2015 at 1:11 PM, Andre Vieira
>>  wrote:
>>>
>>> On 01/09/15 15:01, Richard Biener wrote:


 On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira
  wrote:
>
>
> Hi Marc,
>
> On 28/08/15 19:07, Marc Glisse wrote:
>>
>>
>>
>> (not a review, I haven't even read the whole patch)
>>
>> On Fri, 28 Aug 2015, Andre Vieira wrote:
>>
>>> 2015-08-03  Andre Vieira  
>>>
>>> * match.pd: Added new patterns:
>>>   ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>>>   (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>}
>>> C1)
>>
>>
>>
>>
>> +(for op0 (rshift rshift lshift lshift bit_and bit_and)
>> + op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
>> + op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)
>>
>> You can nest for-loops, it seems clearer as:
>> (for op0 (rshift lshift bit_and)
>>  (for op1 (bit_ior bit_xor)
>>   op2 (bit_xor bit_ior)
>
>
>
>
> Will do, thank you for pointing it out.
>>
>>
>>
>>
>> +(simplify
>> + (op2:c
>> +  (op1:c
>> +   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>>
>> I suspect you will want more :s (single_use) and less :c
>> (canonicalization
>> should put constants in second position).
>>
> I can't find the definition of :s (single_use).



 Sorry for that - didn't get along updating it yet :/  It restricts
 matching to
 sub-expressions that have a single-use.  So

 +  a &= 0xd123;
 +  unsigned short tem = a ^ 0x6040;
 +  a = tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071).  */
 ... use of tem ...

 we shouldn't do the simplifcation here because the expression
 (a & 0x123) ^ 0x6040 is kept live by 'tem'.

> GCC internals do point out
> that canonicalization does put constants in the second position, didnt
> see
> that first. Thank you for pointing it out.
>
>> +   C1 = wi::bit_and_not (C1,C2);
>>
>> Space after ','.
>>
> Will do.
>
>> Having wide_int_storage in many places is surprising, I can't find
>> similar
>> code anywhere else in gcc.
>>
>>
>
> I tried looking for examples of something similar, I think I ended up
> using
> wide_int because I was able to convert easily to and from it and it has
> the
> "mask" and "wide_int_to_tree" functions. I welcome any suggestions on
> what I
> should be using here for integer constant transformations and
> comparisons.



 Using wide-ints is fine, but you shouldn't need 'wide_int_storage'
 constructors - those
 are indeed odd.  Is it just for the initializers of wide-ints?

 +wide_int zero_mask = wi::zero (prec);
 +wide_int C0 = wide_int_storage (@1);
 +wide_int C1 = wide_int_storage (@2);
 +wide_int C2 = wide_int_storage (@3);
 ...
 +   zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false,
 prec));

 tree_to_uhwi (@1) should do the trick as well

 +   C1 = wi::bit_and_not (C1,C2);
 +   cst_emit = wi::bit_or (C1, C2);

 the ops should be replacable with @2 and @3, the store to C1 obviously
 not
 (but you can use a tree temporary and use wide_int_to_tree here to avoid
 the back-and-forth for the case where C1 is not assigned to).

 Note that transforms only doing association are prone to endless
 recursion
 in case some other pattern does the reverse op...

 Richard.


> BR,
> Andre
>

>>> Thank you for all the comments, see reworked version:
>>>
>>> Two new algorithmic optimisations:
>>>1.((X op0 C0) op1 C1) op2 C2)
>>>  with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2
>>>  zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0)
>>>  and 0's otherwise.
>>>  if (op1 == '^') C1 &= ~C2 (Only changed if actually emitted)
>>>  if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2)
>>>  if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2)
>>>2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>>
>>>
>>> This patch does two algorithmic optimisations that target patterns like:
>>> (((x << 24) | 0x00FF) ^ 0xFF00) and ((x ^ 0x4002) >> 1) |
>>> 0x8000.
>>>
>>> The transformation uses the knowledge of which bits are zero after
>>> operations like (X {&,<<,(unsigned)>>}) to combine constants, reducing
>>> run-time operations.
>>> The two examples above would be transformed into (X << 24) ^ 0x
>>> and

Re: [Patch, fortran] PR40054 and PR63921 - Implement pointer function assignment - redux

2015-09-25 Thread Paul Richard Thomas

Dear Mikael, dear all,

Please find attached a revised version of the patch that, I believe,
addresses all the comments. The patch is very much improved and these
improvements are verified by the two extra testcases. Thanks a
million!

Most of the effort involved in preparing this revised patch was
associated with getting rid of ICEs/segfaults triggered by error
recovery. The error handling in resolve_ptr_fcn_assign is still a bit
clumsy but its behaviour is more consistent.

Bootstraps and regtests on FC21/x86_64 - OK for trunk?

Cheers

Paul

2015-09-25  Paul Thomas  

* decl.c (get_proc_name): Return if statement function is
found.
* expr.c (gfc_check_vardef_context): Add error return for
derived type expression lacking the derived type itself.
* io.c (next_char_not_space): Change tab warning to warning now
to prevent locus being lost.
* match.c (gfc_match_ptr_fcn_assign): New function.
* match.h : Add prototype for gfc_match_ptr_fcn_assign.
* parse.c : Add static flag 'in_specification_block'.
(decode_statement): If in specification block match a statement
function, then, if standard embraces F2008 and no error arising
from statement function matching, try to match pointer function
assignment.
(parse_interface): Set 'in_specification_block' on exiting from
parse_spec.
(parse_spec): Set and then reset 'in_specification_block'.
(gfc_parse_file): Set 'in_specification_block'.
* resolve.c (get_temp_from_expr): Extend to include functions
and array constructors as rvalues..
(resolve_ptr_fcn_assign): New function.
(gfc_resolve_code): Call it on finding a pointer function as an
lvalue. If valid or on error, go back to start of resolve_code.
* symbol.c (gfc_add_procedure): Add a sentence to the error to
flag up the ambiguity between a statement function and pointer
function assignment at the end of the specification block.

2015-09-25  Paul Thomas  

* gfortran.dg/fmt_tab_1.f90: Change from run to compile and set
standard as legacy.
* gfortran.dg/function_types_3.f90: Change error message to
"Type inaccessible"
* gfortran.dg/ptr_func_assign_1.f08: New test.
* gfortran.dg/ptr_func_assign_2.f08: New test.

2015-09-25  Mikael Morin  

* gfortran.dg/ptr_func_assign_3.f08: New test.
* gfortran.dg/ptr_func_assign_4.f08: New test.

On 18 September 2015 at 10:36, Paul Richard Thomas
 wrote:
> Dear Mikael,
>
> Thank you very much for the review. I'll give consideration to your
> remarks over the weekend. You will have guessed from the comment that
> I too was uneasy about forcing the break. As for your last remark,
> yes, the code rewriting is indeed in the wrong place. It should be
> rather easy to accomplish both the checks and defined assignments.
>
> Thanks again
>
> Paul
>
> On 17 September 2015 at 15:43, Mikael Morin  wrote:
>> Le 06/09/2015 18:40, Paul Richard Thomas a écrit :
>>>
>>> It helps to attach the patch :-)
>>>
>>> Paul
>>>
>>> On 6 September 2015 at 13:42, Paul Richard Thomas
>>>  wrote:

 Dear All,

 The attached patch more or less implements the assignment of
 expressions to the result of a pointer function. To wit:

 my_ptr_fcn (arg1, arg2...) = expr

 arg1 would usually be the target, pointed to by the function. The
 patch parses these statements and resolves them into:

 temp_ptr => my_ptr_fcn (arg1, arg2...)
 temp_ptr = expr

 I say more or less implemented because I have ducked one of the
 headaches here. At the end of the specification block, there is an
 ambiguity between statement functions and pointer function
 assignments. I do not even try to resolve this ambiguity and require
 that there be at least one other type of executable statement before
 these beasts. This can undoubtedly be fixed but the effort seems to me
 to be unwarranted at the present time.

 This version of the patch extends the coverage of allowed rvalues to
 any legal expression. Also, all the problems with error.c have been
 dealt with by Manuel's patch.

 I am grateful to Dominique for reminding me of PR40054 and pointing
 out PR63921. After a remark of his on #gfortran, I fixed the checking
 of the standard to pick up all the offending lines with F2003 and
 earlier.

 Bootstraps and regtests on FC21/x86_64 - OK for trunk?

>> Hello Paul,
>>
>> I'm mostly concerned about the position where the code rewriting happens.
>> Details below.
>>
>> Mikael
>>
>>
>>>
>>> submit_2.diff
>>>
>>
>>> Index: gcc/fortran/parse.c
>>> ===
>>> *** gcc/fortran/parse.c (revision 227508)
>>> --- gcc/fortran/parse.c (working copy)
>>> *** decode_statement

Re: [RFC] PR tree-optimization/67628: Make tree ifcombine more symmetric and interactions with dom

2015-09-25 Thread Kyrill Tkachov



On 25/09/15 12:35, Richard Biener wrote:

On Fri, 25 Sep 2015, Kyrill Tkachov wrote:


On 25/09/15 11:15, Richard Biener wrote:

On Fri, 25 Sep 2015, Kyrill Tkachov wrote:


On 25/09/15 10:49, Richard Biener wrote:

On Fri, 25 Sep 2015, Kyrill Tkachov wrote:


Hi all,

On 23/09/15 11:10, Richard Biener wrote:

On Wed, 23 Sep 2015, Kyrill Tkachov wrote:


On 23/09/15 10:09, Pinski, Andrew wrote:

On Sep 23, 2015, at 1:59 AM, Kyrill Tkachov

wrote:



On 22/09/15 20:31, Jeff Law wrote:

On 09/22/2015 07:36 AM, Kyrill Tkachov wrote:
Hi all,
Unfortunately, I see a testsuite regression with this
patch:
FAIL: gcc.dg/pr66299-2.c scan-tree-dump-not optimized "<<"

The reduced part of that test is:
void
test1 (int x, unsigned u)
{
if ((1U << x) != 64
|| (2 << x) != u
|| (x << x) != 384
|| (3 << x) == 9
|| (x << 14) != 98304U
|| (1 << x) == 14
|| (3 << 2) != 12)
  __builtin_abort ();
}

The patched ifcombine pass works more or less as expected
and
produces
fewer basic blocks.
Before this patch a relevant part of the ifcombine dump
for
test1
is:
;;   basic block 2, loop depth 0, count 0, freq 1,
maybe
hot
if (x_1(D) != 6)
  goto ;
else
  goto ;

;;   basic block 3, loop depth 0, count 0, freq 9996,
maybe
hot
_2 = 2 << x_1(D);
_3 = (unsigned intD.10) _2;
if (_3 != u_4(D))
  goto ;
else
  goto ;


After this patch it is:
;;   basic block 2, loop depth 0, count 0, freq 1,
maybe
hot
_2 = 2 << x_1(D);
_3 = (unsigned intD.10) _2;
_9 = _3 != u_4(D);
_10 = x_1(D) != 6;
_11 = _9 | _10;
if (_11 != 0)
  goto ;
else
  goto ;

The second form ends up generating worse codegen however,
and
the
badness starts with the dom1 pass.
In the unpatched case it manages to deduce that x must be
6 by
the
time
it reaches basic block 3 and
uses that information to eliminate the shift in "_2 = 2 <<
x_1(D)"
from
basic block 3
In the patched case it is unable to make that call, I
think
because
the
x != 6 condition is IORed
with another test.

I'm not familiar with the internals of the dom pass, so
I'm
not
sure
where to go looking for a fix for this.
Is the ifcombine change a step in the right direction? If
so,
what
would
need to be done to fix the issue with
the dom pass?

I don't see how you can reasonably fix this in DOM.  if _9
or
_10 is
true, then _11 is true.  But we can't reasonably record any
kind
of
equivalence for _9 or _10 individually.

If the statement
_11 = _9 | _10;

Were changed to

_11 = _9 & _10;

Then we could record something useful about _9 and _10.



I suppose what we want is to not combine basic blocks if
the
sequence
and conditions of the basic blocks are
such that dom can potentially exploit them, but how do we
express
that?

I don't think there's going to be a way to directly express
that.
You
could essentially claim that TRUTH_OR is more expensive than
TRUTH_AND
because of the impact on DOM, but that in and of itself may
not
resolve
the situation either.

I think the question we need to answer is whether or not
your
changes
are generally better, even if there's specific instances
where
they
make
things worse.  If the benefits outweigh the negatives then
we
can
xfail
that test.

Ok, I'll investigate and benchmark some more.
Andrew, this transformation to ifcombine (together with the
restriction
that the inner condition block
has to be a single comparison) was added by you with r204194.
Is there a particular reason for that restriction and why it
is
applied to
the inner block and not either?

My reasoning at the time was there might be an "expensive"
instruction
or
one that might trap (I did not check to see if the other part of
the
code
was detecting that).
The outer block did not need any checks as we have something
like
...
If (a)
   If (b)

Or

If (a)
   Goto f
else if (b)
  
Else
{
F:

}

And there was no need to check what was before the if (a) part
just
what
is
in between the two ifs.

Ah, because the code in outer_cond_bb would have to be executed
anyway
whether
we perform the conversion or not, right?

All ifcombine transforms make the outer condition unconditionally
true/false thus the check should have been on whether the outer
cond BB is "empty".  Which would solve your problem, right?

Note that other transforms (bit test recognition) don't care (sth
we might want to fix?).

In general this needs a better cost function, maybe simply use
estimate_num_insns with speed estimates and compare against a
new --param.

So I've played around with that code and I think I'd like to make it a
bit
more intricate. Just comparing against estimate_num_insns is too
coarse-grained and
is just a comparison by a magic number number and I've been struggling
to
make
this
aggressive enough without pulling in too much code into the

Re: [gcc-5-branch][PATCH][AARCH64]Fix for branch offsets over 1 MiB

2015-09-25 Thread James Greenhalgh

On Fri, Sep 11, 2015 at 06:15:23PM +0100, Andre Vieira wrote:
> Conditional branches have a maximum range of [-1048576, 1048572]. Any 
> destination further away can not be reached by these.
> To be able to have conditional branches in very large functions, we 
> invert the condition and change the destination to jump over an 
> unconditional branch to the original, far away, destination.
> 
> This patch backports the fix from trunk to the gcc-5-branch.
> The original patch is at:
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01493.html

OK.

Thanks,
James

> gcc/ChangeLog:
> 2015-09-09  Andre Vieira  
> 
>Backport from mainline:
>2015-08-27  Ramana Radhakrishnan  
>Andre Vieira  
> 
>* config/aarch64/aarch64.md (*condjump): Handle functions > 1 MiB.
>(*cb1): Likewise.
>(*tb1): Likewise.
>(*cb1): Likewise.
>* config/aarch64/iterators.md (inv_cb): New code attribute.
>(inv_tb): Likewise.
>* config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
>* config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.
> 
> 
> gcc/testsuite/ChangeLog:
> 2015-09-09  Andre Vieira  
> 
>Backport from mainline:
>2015-08-27  Andre Vieira  
> 
>* gcc.target/aarch64/long_branch_1.c: New test.

Re: New post-LTO OpenACC pass

2015-09-25 Thread Bernd Schmidt


On 09/25/2015 02:30 PM, Bernd Schmidt wrote:


(gdb) p result
$1 = 
(gdb) pge
warning: Expression is not an assignment (and might have no effect)
2 == 4 || 2 == 5

I'm not really sure about the logic, but are the results maybe switched
(returning false on the device and true on the host)?


Eh, no, the testcase seems to want to know if it's running on the host, 
so that appears OK. But AFAICS it's doing the right thing. Stepping into 
libgomp:


182   else if (acc_device_type (acc_dev->type) == acc_device_host)
(gdb) p acc_dev->type
$1 = OFFLOAD_TARGET_TYPE_HOST
(gdb) next
184   fn (hostaddrs);

It's not running the offloaded version, so the testcase I think should fail.


Bernd

1 2 >

1 - 100 of 127 matches

Mail list logo