Re: [PATCH][RFC] Remove warning for SET VOIDmode -> BLKmode.

2015-11-06 Thread Jeff Law

On 11/06/2015 04:12 PM, Bernd Schmidt wrote:

On 11/06/2015 10:45 PM, Jeff Law wrote:

On 11/03/2015 10:16 AM, Dominik Vogt wrote:

* genrecog.c (validate_pattern): Allow "set VOIDmode -> BLKmode"
without
warnings.

First, for reference, I was initially concerned the patterns were bogus,
but these are reasonably canonical ways to implement memset and the like.


I still think it would be better to fix the patterns by inserting an
UNSPEC. It's really not clear what set (BLK) (VOID) is supposed to mean.
Let's queue cleaning that up for GCC 7, unless you want to try and get 
to it over the weekend. I don't know how many targets are affected offhand.


Jeff


Re: [PATCH][RFC] Remove warning for SET VOIDmode -> BLKmode.

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 10:45 PM, Jeff Law wrote:

On 11/03/2015 10:16 AM, Dominik Vogt wrote:

* genrecog.c (validate_pattern): Allow "set VOIDmode -> BLKmode"
without
warnings.

First, for reference, I was initially concerned the patterns were bogus,
but these are reasonably canonical ways to implement memset and the like.


I still think it would be better to fix the patterns by inserting an 
UNSPEC. It's really not clear what set (BLK) (VOID) is supposed to mean.



Bernd



[PATCH 1/2] [graphite] do not create unnecessary dimensions in scop scattering

2015-11-06 Thread Sebastian Pop
   * graphite-sese-to-poly.c (build_pbb_scattering_polyhedrons): Remove.
   (build_pbb_minimal_scattering_polyhedrons): New.
   (build_scop_scattering): Remove.
   (build_scop_minimal_scattering): New.
   (build_scop_scattering): Call 
build_pbb_minimal_scattering_polyhedrons.
   (build_poly_scop): Call build_scop_minimal_scattering.
---
 gcc/graphite-sese-to-poly.c | 190 +++-
 1 file changed, 136 insertions(+), 54 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index fa33c50..394a281 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -231,51 +231,78 @@ isl_id_for_pbb (scop_p s, poly_bb_p pbb)
| 0   0   1   0   0   0   0   0  -5  = 0  */
 
 static void
-build_pbb_scattering_polyhedrons (isl_aff *static_sched,
- poly_bb_p pbb)
+build_pbb_minimal_scattering_polyhedrons (isl_aff *static_sched, poly_bb_p pbb,
+ int *sequence_dims,
+ int nb_sequence_dim)
 {
-  isl_val *val;
+  int local_dim = isl_set_dim (pbb->domain, isl_dim_set);
+
+  /* Remove a sequence dimension if irrelevant to domain of current pbb.  */
+  int actual_nb_dim = 0;
+  for (int i = 0; i < nb_sequence_dim; i++)
+if (sequence_dims[i] <= local_dim)
+  actual_nb_dim++;
+
+  /* Build an array that combines sequence dimensions and loops dimensions 
info.
+ This is used later to compute the static scattering polyhedrons.  */
+  bool *sequence_and_loop_dims = NULL;
+  if (local_dim + actual_nb_dim > 0)
+{
+  sequence_and_loop_dims = XNEWVEC (bool, local_dim + actual_nb_dim);
 
-  int scattering_dimensions = isl_set_dim (pbb->domain, isl_dim_set) * 2 + 1;
+  int i = 0, j = 0;
+  for (; i < local_dim; i++)
+   {
+ if (sequence_dims && sequence_dims[j] == i)
+   {
+ /* True for sequence dimension.  */
+ sequence_and_loop_dims[i + j] = true;
+ j++;
+   }
+ /* False for loop dimension.  */
+ sequence_and_loop_dims[i + j] = false;
+   }
+  /* Fake loops make things shifted by one.  */
+  if (sequence_dims && sequence_dims[j] == i)
+   sequence_and_loop_dims[i + j] = true;
+}
 
+  int scattering_dimensions = local_dim + actual_nb_dim;
   isl_space *dc = isl_set_get_space (pbb->domain);
-  isl_space *dm = isl_space_add_dims (isl_space_from_domain (dc),
- isl_dim_out, scattering_dimensions);
+  isl_space *dm = isl_space_add_dims (isl_space_from_domain (dc), isl_dim_out,
+ scattering_dimensions);
   pbb->schedule = isl_map_universe (dm);
 
+  int k = 0;
   for (int i = 0; i < scattering_dimensions; i++)
 {
-  /* Textual order inside this loop.  */
-  if ((i % 2) == 0)
+  if (!sequence_and_loop_dims[i])
{
- isl_constraint *c = isl_equality_alloc
- (isl_local_space_from_space (isl_map_get_space (pbb->schedule)));
-
- val = isl_aff_get_coefficient_val (static_sched, isl_dim_in, i / 2);
- gcc_assert (val && isl_val_is_int (val));
-
- val = isl_val_neg (val);
- c = isl_constraint_set_constant_val (c, val);
- c = isl_constraint_set_coefficient_si (c, isl_dim_out, i, 1);
- pbb->schedule = isl_map_add_constraint (pbb->schedule, c);
-   }
-
-  /* Iterations of this loop.  */
-  else /* if ((i % 2) == 1) */
-   {
- int loop = (i - 1) / 2;
- pbb->schedule = isl_map_equate (pbb->schedule, isl_dim_in, loop,
+ /* Iterations of this loop - loop dimension.  */
+ pbb->schedule = isl_map_equate (pbb->schedule, isl_dim_in, k,
  isl_dim_out, i);
+ k++;
+ continue;
}
+
+  /* Textual order inside this loop - sequence dimension.  */
+  isl_space *s = isl_map_get_space (pbb->schedule);
+  isl_local_space *ls = isl_local_space_from_space (s);
+  isl_constraint *c = isl_equality_alloc (ls);
+  isl_val *val = isl_aff_get_coefficient_val (static_sched, isl_dim_in, k);
+  gcc_assert (val && isl_val_is_int (val));
+  val = isl_val_neg (val);
+  c = isl_constraint_set_constant_val (c, val);
+  c = isl_constraint_set_coefficient_si (c, isl_dim_out, i, 1);
+  pbb->schedule = isl_map_add_constraint (pbb->schedule, c);
 }
 
+  XDELETEVEC (sequence_and_loop_dims);
   pbb->transformed = isl_map_copy (pbb->schedule);
 }
 
-/* Build for BB the static schedule.
-
-   The static schedule is a Dewey numbering of the abstract syntax
-   tree: http://en.wikipedia.org/wiki/Dewey_Decimal_Classification
+/* Build the static schedule for BB.  This function minimizes the number of
+   dimensions used for pbb sequences.
 
The following example informally defines the static schedule:
 
@@ -283,40 

Re: [libgomp] task scheduler rewrite and task priorities implementation

2015-11-06 Thread Aldy Hernandez

On 10/29/2015 08:46 AM, Aldy Hernandez wrote:

Yo!

As promised, here is the work implementing a new API for the task
scheduler, rewriting the scheduler to fit into this new API, and
implementing the task priorities that are in OpenMP > 4.1.  There are
also lots of cleanups and documentation.


After some back and forth with Jakub, here is my final answer :).

I removed the priority_node allocation overhead in favor of keeping the 
data in gomp_task with some fancy offsetof() tricks.  There is now no 
measurable performance difference for the common case (0 priority tasks) 
in tweaked versions of nqueens-1.c, sort-1.c, and strassen.f90.


I also chased down a memory leak, but otherwise, the patch is mostly the 
same.


Tested on x86-64 Linux with 4 cores.

Approved by Jakub.

Committed to gomp-4_5-branch.  Jakub, I'm sure you can pull it from 
there to mainline when you deem appropriate.


Enjoy.
Aldy

commit d37f644faa16f40b05cc4cd90f9dd6f6b868b5ea
Author: Aldy Hernandez 
Date:   Fri Nov 6 14:57:29 2015 -0800

* Makefile.am (libgomp_la_SOURCES): Add priority_queue.c.
* Makefile.in: Regenerate.
* libgomp.h: Shuffle prototypes and forward definitions around so
priority queues can be defined.
(struct gomp_task): Remove children, next_child, prev_child,
next_queue, prev_queue, next_taskgroup, prev_taskgroup.
Add pnode field.
(struct gomp_taskgroup): Remove children.
Add taskgroup_queue.
(struct gomp_team): Change task_queue type to a priority queue.
(splay_compare): Define inline.
(priority_queue_offset): New.
(priority_node_to_task): New.
(task_to_priority_node): New.
* oacc-mem.c: Do not include splay-tree.h.
* priority_queue.c: New file.
* priority_queue.h: New file.
* splay-tree.c: Do not include splay-tree.h.
(splay_tree_foreach_internal): New.
(splay_tree_foreach): New.
* splay-tree.h: Become re-entrant if splay_tree_prefix is defined.
(splay_tree_callback): Define typedef.
* target.c (splay_compare): Move to libgomp.h.
* task.c (gomp_init_task): Initialize children_queue.
(gomp_clear_parent_in_list): New.
(gomp_clear_parent_in_tree): New.
(gomp_clear_parent): Handle priorities.
(GOMP_task): Same.
(gomp_create_target_task): Use priority queues.
(verify_children_queue): Remove.
(priority_list_upgrade_task): New.
(priority_queue_upgrade_task): New.
(verify_task_queue): Remove.
(priority_list_downgrade_task): New.
(priority_queue_downgrade_task): New.
(gomp_task_run_pre): Use priority queues.
Abstract code out to priority_queue_downgrade_task.
(gomp_task_run_post_handle_dependers): Use priority queues.
(gomp_task_run_post_remove_parent): Same.
(gomp_task_run_post_remove_taskgroup): Same.
(gomp_barrier_handle_tasks): Same.
(GOMP_taskwait): Same.
(gomp_task_maybe_wait_for_dependencies): Same.  Abstract code to
priority-queue_upgrade_task.
(GOMP_taskgroup_start): Use priority queues.
(GOMP_taskgroup_end): Same.
* taskloop.c (GOMP_taskloop): Handle priorities.
* team.c (gomp_new_team): Call priority_queue_init.
(free_team): Call priority_queue_free.
* testsuite/libgomp.c/priority.c: New test.

diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 5411278..a3e1c2b 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -63,7 +63,7 @@ libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c 
error.c iter.c \
task.c team.c work.c lock.c mutex.c proc.c sem.c bar.c ptrlock.c \
time.c fortran.c affinity.c target.c splay-tree.c libgomp-plugin.c \
oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c oacc-async.c \
-   oacc-plugin.c oacc-cuda.c
+   oacc-plugin.c oacc-cuda.c priority_queue.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 79745ce..7a1c976 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -168,7 +168,7 @@ am_libgomp_la_OBJECTS = alloc.lo barrier.lo critical.lo 
env.lo \
fortran.lo affinity.lo target.lo splay-tree.lo \
libgomp-plugin.lo oacc-parallel.lo oacc-host.lo oacc-init.lo \
oacc-mem.lo oacc-async.lo oacc-plugin.lo oacc-cuda.lo \
-   $(am__objects_1)
+   priority_queue.lo $(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
 DEFAULT_INCLUDES = -I.@am__isrc@
 depcomp = $(SHELL) $(top_srcdir)/../depcomp
@@ -415,7 +415,7 @@ libgomp_la_SOURCES = alloc.c barrier.c critical.c env.c 
error.c iter.c \
bar.c ptrlock.c time.c fortran.c affinity.c target.c \
splay-tree.c libgomp-plugin.c oacc-parallel.c oacc-host.c \
oacc-init.c oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c \
-   $(am__append_2)
+   

Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67749: Do not emit separate SET insn in IF-ELSE case

2015-11-06 Thread Jeff Law

On 11/03/2015 02:09 AM, Kyrill Tkachov wrote:

Hi Jeff,

On 02/11/15 22:46, Jeff Law wrote:

On 10/27/2015 08:49 AM, Kyrill Tkachov wrote:

Hi all,

This patch fixes the gcc.dg/ifcvt-2.c test for x86_64 where we were
failing to if-convert. This was because in my patch at
https://gcc.gnu.org/viewcvs/gcc?view=revision=228194 which
tried to emit a SET to move the source of insn_a or insn_b (that came
from the test block) into a temporary. A SET however, is not always
enough. For example, on x86_64 in order for the resulting insn to be
recognised it frequently needs to be in PARALLEL with a (clobber
(reg:CC FLAGS_REG)). This leads to the insn not being recognised.

I don't think it affects the approach you've chosen, but I'm
mentioning it for future reference.

gcse.c has some helper code (that probably ought to move into a more
generic file) that will test for this situation.  Search for the
instances of recog.  It essentially does something like

emit_insn (gen_rtx_SET (...))

And tries to recognize the result to determine if it's valid.


you mean compute_can_copy and can_copy_p? I was not aware of those.
Interesting, they look like they
can be useful in places indeed. I'll keep them in mind for any future work.

Yes.  Those.




What happens in the case were noce_emit_bb returns a failure? We've
modified the original insns to use the new pseudos.  Won't this result
in the original pseudo's uses using undefined values?


We should be fine because we don't modify the original insns. We create
a copy of them i.e.
rtx_insn *copy_of_a = as_a  (copy_rtx (insn_a));

and modify the SET_DEST of that. The original insn should still remain
intact if any step in
noce_try_cmove_arith fails, so we can revert back to the original sequence.

In that case, OK for the trunk.

Thanks,
jeff



[PATCH] Allow vrp to thread across backedges using FSM threader

2015-11-06 Thread Jeff Law


This is in preparation for removing a blob of code in the old threader 
that knew how to thread across backedges.


Essentially this just allows us to try FSM threading in VRP's instance 
of threader.  So certain jump threading happens earlier in the pipeline.


It also tightens up the ssa-dom-thread-7 test a bit.  It was previously 
just counting the number of jump threads registered.  Now it tracks the 
number of jump threads actually realized.  I verified the actually 
realized jump threads were the same before/after -- several just moved 
from DOM1 to VRP1.


This does not effect 68198, which will be the top of my todo list after 
a few more small cleanups take place prior to stage1 close.


Bootstrapped and regression tested on x86-64-linux-gnu.  Installed on 
the trunk.




Jeff
commit f6a3449159568d1b352e9b3d49b1df0ba8b25a6f
Author: law 
Date:   Fri Nov 6 23:26:20 2015 +

[PATCH] Allow vrp to thread across backedges using FSM threader

* cfg-flags.def (IGNORE): New edge flag.
* tree-vrp.c (identify_jump_threads): Mark and clear edges
scheduled for removal with EDGE_IGNORE around call into
jump threader.  Do no thread across edges with EDGE_IGNORE,
but do allow threading across those with EDGE_DFS_BACK.

* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust to look for
realized jump threads.
* gcc.dg/tree-ssa-pr66752-3.c: Look in vrp1 dump for jump
threads rather than dom1 dump.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@229902 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index e2f588b..552c51b 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2015-11-06  Jeff Law 
+
+   * cfg-flags.def (IGNORE): New edge flag.
+   * tree-vrp.c (identify_jump_threads): Mark and clear edges
+   scheduled for removal with EDGE_IGNORE around call into
+   jump threader.  Do no thread across edges with EDGE_IGNORE,
+   but do allow threading across those with EDGE_DFS_BACK.
+
 2015-11-06  David Wohlferd  
 
* doc/md.texi (multi-alternative constraints): Don't document
diff --git a/gcc/cfg-flags.def b/gcc/cfg-flags.def
index eedcd69..e2bfbed 100644
--- a/gcc/cfg-flags.def
+++ b/gcc/cfg-flags.def
@@ -78,7 +78,7 @@ DEF_BASIC_BLOCK_FLAG(RTL, 9)
 DEF_BASIC_BLOCK_FLAG(FORWARDER_BLOCK, 10)
 
 /* Set on blocks that cannot be threaded through.
-   Only used in cfgcleanup.c.  */
+   Only used for jump threading.  */
 DEF_BASIC_BLOCK_FLAG(NONTHREADABLE_BLOCK, 11)
 
 /* Set on blocks that were modified in some way.  This bit is set in
@@ -177,6 +177,11 @@ DEF_EDGE_FLAG(TM_UNINSTRUMENTED, 15)
 /* Abort (over) edge out of a GIMPLE_TRANSACTION statement.  */
 DEF_EDGE_FLAG(TM_ABORT, 16)
 
+/* An edge we should ignore.  It should be entirely local to
+   passes.  ie, it is never set on any edge upon the completion
+   of any pass.  */
+DEF_EDGE_FLAG(IGNORE, 17)
+
 #endif
 
 /*
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 80221c1..af4a738 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2015-11-06  Jeff Law  
+
+   * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust to look for
+   realized jump threads.
+   * gcc.dg/tree-ssa-pr66752-3.c: Look in vrp1 dump for jump
+   threads rather than dom1 dump.
+
 2015-11-06  Michael Collison  
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
index f15b598..577a489 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66752-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dom1-details -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fdump-tree-vrp1-details -fdump-tree-optimized" } */
 
 extern int status, pt;
 extern int count;
@@ -33,7 +33,7 @@ foo (int N, int c, int b, int *a)
 }
 
 /* There are 3 FSM jump threading opportunities.  */
-/* { dg-final { scan-tree-dump-times "FSM" 3 "dom1"} } */
+/* { dg-final { scan-tree-dump-times "FSM" 3 "vrp1"} } */
 
 /* There should be no assignments or references to FLAG.  */
 /* { dg-final { scan-tree-dump-not "flag" "optimized"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index 445f250..ac12b6c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,6 +1,8 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dom1-details" } */
-/* { dg-final { scan-tree-dump-times "FSM" 38 "dom1" } } */
+/* { dg-options "-O2 -fdump-tree-vrp1-stats -fdump-tree-dom1-stats 
-fdump-tree-dom2-stats" } */
+/* { dg-final { scan-tree-dump "Jumps threaded: 7"  "vrp1" } } */
+/* { 

[gomp4] revert fortran declare changes

2015-11-06 Thread Cesar Philippidis
This patch reverts the declare cleanups I introduced in
. Notably, I
deleted resolve_omp_duplicate_list and resolve_oacc_declare_map.

Jim, I also reverted my changes to declare-2.f95. Please make the error
messages consistent with the rest of the FE going forward. Also, note
that the outer for loop in here

+  for (list = OMP_LIST_LINK; list <= OMP_LIST_LINK; list++)
+   for (n = oc->clauses->lists[list]; n; n = n->next)
+ {
+   if (n->sym->mark)
+ gfc_error ("Symbol %qs present on multiple clauses at %L",
+n->sym->name, );
+   else
+ n->sym->mark = 1;
+ }

is unnecessary. I'm not sure if that you intended to check for other
lists or what.

I've applied this patch to gomp-4_0-branch.

Cesar
2015-11-06  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (gfc_match_oacc_declare): Revert error message changes.
	(resolve_omp_duplicate_list): Delete.
	(resolve_oacc_declare_map): Delete.
	(gfc_resolve_oacc_declare): Scan map clauses in place.

	gcc/testsuite/
	* gfortran.dg/goacc/declare-2.f95: Update expected errors.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 484add8..1572fdb 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1445,7 +1445,7 @@ gfc_match_oacc_declare (void)
   gfc_omp_clauses *c;
   gfc_omp_namelist *n;
   gfc_namespace *ns = gfc_current_ns;
-  gfc_oacc_declare *new_oc;
+  gfc_oacc_declare *new_oc, *oc;
   bool module_var = false;
 
   if (gfc_match_omp_clauses (, OACC_DECLARE_CLAUSES, 0, false, false, true)
@@ -1467,8 +1467,8 @@ gfc_match_oacc_declare (void)
 	  if (n->u.map_op != OMP_MAP_FORCE_ALLOC
 	  && n->u.map_op != OMP_MAP_FORCE_TO)
 	{
-	  gfc_error ("Invalid clause in module with $!ACC DECLARE at %L",
-			 >where);
+	  gfc_error ("Invalid clause in module with "
+			 "$!ACC DECLARE at %C");
 	  return MATCH_ERROR;
 	}
 
@@ -1477,29 +1477,29 @@ gfc_match_oacc_declare (void)
 
   if (ns->proc_name->attr.oacc_function)
 	{
-	  gfc_error ("Invalid declare in routine with $!ACC DECLARE at %C");
+	  gfc_error ("Invalid declare in routine with " "$!ACC DECLARE at %C");
 	  return MATCH_ERROR;
 	}
 
   if (s->attr.in_common)
 	{
-	  gfc_error ("Variable in a common block with $!ACC DECLARE at %L",
-		 >where);
+	  gfc_error ("Unsupported: variable in a common block with "
+		 "$!ACC DECLARE at %C");
 	  return MATCH_ERROR;
 	}
 
   if (s->attr.use_assoc)
 	{
-	  gfc_error ("Variable is USE-associated with $!ACC DECLARE at %L",
-		 >where);
+	  gfc_error ("Unsupported: variable is USE-associated with "
+		 "$!ACC DECLARE at %C");
 	  return MATCH_ERROR;
 	}
 
   if ((s->attr.dimension || s->attr.codimension)
 	  && s->attr.dummy && s->as->type != AS_EXPLICIT)
 	{
-	  gfc_error ("Assumed-size dummy array with $!ACC DECLARE at %L",
-		 >where);
+	  gfc_error ("Unsupported: assumed-size dummy array with "
+		 "$!ACC DECLARE at %C");
 	  return MATCH_ERROR;
 	}
 
@@ -1527,6 +1527,37 @@ gfc_match_oacc_declare (void)
   new_oc->module_var = module_var;
   new_oc->clauses = c;
   new_oc->where = gfc_current_locus;
+
+  for (oc = new_oc; oc; oc = oc->next)
+{
+  c = oc->clauses;
+  for (n = c->lists[OMP_LIST_MAP]; n != NULL; n = n->next)
+	n->sym->mark = 0;
+}
+
+  for (oc = new_oc; oc; oc = oc->next)
+{
+  c = oc->clauses;
+  for (n = c->lists[OMP_LIST_MAP]; n != NULL; n = n->next)
+	{
+	  if (n->sym->mark)
+	{
+	  gfc_error ("Symbol %qs present on multiple clauses at %C",
+			 n->sym->name);
+	  return MATCH_ERROR;
+	}
+	  else
+	n->sym->mark = 1;
+	}
+}
+
+  for (oc = new_oc; oc; oc = oc->next)
+{
+  c = oc->clauses;
+  for (n = c->lists[OMP_LIST_MAP]; n != NULL; n = n->next)
+	n->sym->mark = 1;
+}
+
   ns->oacc_declare = new_oc;
 
   return MATCH_YES;
@@ -3151,41 +3182,6 @@ resolve_omp_udr_clause (gfc_omp_namelist *n, gfc_namespace *ns,
   return copy;
 }
 
-/* Check if a variable appears in multiple clauses.  */
-
-static void
-resolve_omp_duplicate_list (gfc_omp_namelist *clause_list, bool openacc,
-			int list)
-{
-  gfc_omp_namelist *n;
-  const char *error_msg = "Symbol %qs present on multiple clauses at %L";
-
-  /* OpenACC reduction clauses are compatible with everything.  We only
- need to check if a reduction variable is used more than once.  */
-  if (openacc && list == OMP_LIST_REDUCTION)
-{
-  hash_set reductions;
-
-  for (n = clause_list; n; n = n->next)
-	{
-	  if (reductions.contains (n->sym))
-	gfc_error (error_msg, n->sym->name, >where);
-	  else
-	reductions.add (n->sym);
-	}
-
-  return;
-}
-
-  /* Ensure that variables are only used in one clause.  */
-  for (n = clause_list; n; n = n->next)
-{
-  if (n->sym->mark)
-	gfc_error (error_msg, n->sym->name, >where);
-  else
-	n->sym->mark = 1;
-}
-}
 
 

Re: [PATCH] i386: Use the STC bb-reorder algorithm at -Os (PR67864)

2015-11-06 Thread Segher Boessenkool
Adding x86 maintainer, ping?

On Fri, Oct 16, 2015 at 05:53:41AM -0700, Segher Boessenkool wrote:
> For x86, STC still gives better results for optimise-for-size than
> "simple" does.  So use STC at -Os as well.
> 
> Is this okay for trunk?
> 
> 
> Segher
> 
> 
> 2015-10-16  Segher Boessenkool  
> 
>   PR rtl-optimization/67864
>   * common/config/i386/i386-common.c (ix86_option_optimization_table)
>   : Use REORDER_BLOCKS_ALGORITHM_STC
>   at -Os and up.
> 
> ---
>  gcc/common/config/i386/i386-common.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/gcc/common/config/i386/i386-common.c 
> b/gcc/common/config/i386/i386-common.c
> index 79b2472..bb9f29c 100644
> --- a/gcc/common/config/i386/i386-common.c
> +++ b/gcc/common/config/i386/i386-common.c
> @@ -1011,6 +1011,9 @@ static const struct default_options 
> ix86_option_optimization_table[] =
>  { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
>  /* Enable function splitting at -O2 and higher.  */
>  { OPT_LEVELS_2_PLUS, OPT_freorder_blocks_and_partition, NULL, 1 },
> +/* The STC algorithm produces the smallest code at -Os, for x86.  */
> +{ OPT_LEVELS_2_PLUS, OPT_freorder_blocks_algorithm_, NULL,
> +  REORDER_BLOCKS_ALGORITHM_STC },
>  /* Turn off -fschedule-insns by default.  It tends to make the
> problem with not enough registers even worse.  */
>  { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
> -- 
> 2.4.3


Re: regrename: Fix for earlyclobber operands

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 08:30 PM, Jeff Law wrote:

* regrename.c (record_out_operands): Terminate earlyclobbered
operands here.

Do you want to terminate those chains after step 6 is complete so that
the earlyclobber output conflicts with all the other outputs?


I'm not sure what you mean by your question. This patch is about 
terminating live registers that are overwritten by earlyclobber output 
operands. We open a new chain for every output, ec or not, and since 
they are live at the same time, they are all considered conflicting.



Bernd


[gomp4] backport trunk FE changes

2015-11-06 Thread Cesar Philippidis
I've applied this patch to gomp-4_0-branch which backports most of my
front end changes from trunk. Note that I found a regression while
testing, which is also present in trunk. It looks like
kernels-acc-loop-reduction.c is failing because I'm incorrectly
propagating the reduction variable to both to the kernels and loop
constructs for combined 'acc kernels loop'. The problem here is that
kernels don't support the reduction clause. I'll fix that next week.

Cesar
2015-11-06  Cesar Philippidis  

	gcc/c-family/
	* c-omp.c (c_oacc_split_loop_clauses): Make TILE, GANG, WORKER, VECTOR,
	AUTO, SEQ, INDEPENDENT and PRIVATE loop clauses.  Associate REDUCTION
	clauses with parallel and kernels and loops.

	gcc/c/
	* c-parser.c (c_parser_omp_clause_default): Replace only_none with
	is_oacc argument.
	(c_parser_oacc_shape_clause): Allow pointers arguments to gang static.
	(c_parser_oacc_clause_tile): Backport cleanups from trunnk.
	(c_parser_oacc_all_clauses): Likewise, update call to
	c_parser_omp_clause_default.
	(c_parser_omp_all_clauses): Update call to c_parser_omp_clause_default.

	gcc/cp/
	* parser.c (cp_parser_oacc_shape_clause): Allow pointers arguments to
	gang static.
	(cp_parser_oacc_clause_tile): Backport cleanups from trunnk.
	(cp_parser_omp_clause_default): Replace is_omp argument with is_oacc.
	(cp_parser_oacc_all_clauses): Likewise, update call to
	c_parser_omp_clause_{default,tile}.
	(cp_parser_omp_all_clauses): Update call to
	c_parser_omp_clause_default.
	(OACC_PARALLEL_CLAUSE_MASK): Remove PRAGMA_OACC_CLAUSE_GANG.
	* pt.c (tsubst_omp_clauses):
	* semantics.c (finish_omp_clauses):

	gcc/testsuite/
	* c-c++-common/goacc/combined-directives.c: New test.
	* c-c++-common/goacc/loop-clauses.c: New test.
	* c-c++-common/goacc/loop-shape.c: More test cases.
	* c-c++-common/goacc/loop-tile-k1.c: Update error messages.
	* c-c++-common/goacc/loop-tile-p1.c: Likewise.
	* c-c++-common/goacc/tile.c: New test.

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index 67d9da0..8411814 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -694,13 +694,12 @@ c_finish_omp_for (location_t locus, enum tree_code code, tree declv,
 /* This function splits clauses for OpenACC combined loop
constructs.  OpenACC combined loop constructs are:
#pragma acc kernels loop
-   #pragma acc parallel loop
-*/
+   #pragma acc parallel loop  */
 
 tree
 c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses)
 {
-  tree next, loop_clauses;
+  tree next, loop_clauses, t;
 
   loop_clauses = *not_loop_clauses = NULL_TREE;
   for (; clauses ; clauses = next)
@@ -709,27 +708,29 @@ c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses)
 
   switch (OMP_CLAUSE_CODE (clauses))
 {
+	  /* Loop clauses.  */
 	case OMP_CLAUSE_COLLAPSE:
-	case OMP_CLAUSE_REDUCTION:
+	case OMP_CLAUSE_TILE:
 	case OMP_CLAUSE_GANG:
 	case OMP_CLAUSE_WORKER:
 	case OMP_CLAUSE_VECTOR:
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
+	case OMP_CLAUSE_INDEPENDENT:
+	case OMP_CLAUSE_PRIVATE:
 	  OMP_CLAUSE_CHAIN (clauses) = loop_clauses;
 	  loop_clauses = clauses;
 	  break;
 
-	case OMP_CLAUSE_PRIVATE:
-	  {
-	tree nc = build_omp_clause (OMP_CLAUSE_LOCATION (clauses),
-	OMP_CLAUSE_CODE (clauses));
-	OMP_CLAUSE_DECL (nc) = OMP_CLAUSE_DECL (clauses);
-	OMP_CLAUSE_CHAIN (nc) = loop_clauses;
-	loop_clauses = nc;
-	  }
-	  /* FALLTHRU */
+	  /* Reductions belong in both constructs.  */
+	case OMP_CLAUSE_REDUCTION:
+	  t = copy_node (clauses);
+	  OMP_CLAUSE_CHAIN (t) = loop_clauses;
+	  loop_clauses = t;
+
+	  /* FIXME: device_type */
 
+	  /* Parallel/kernels clauses.  */
 	default:
 	  OMP_CLAUSE_CHAIN (clauses) = *not_loop_clauses;
 	  *not_loop_clauses = clauses;
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index fa70055..96c1bdc 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -10627,11 +10627,13 @@ c_parser_omp_clause_copyprivate (c_parser *parser, tree list)
 }
 
 /* OpenMP 2.5:
-   default ( shared | none ) */
+   default ( shared | none )
+
+   OpenACC 2.0:
+   default (none) */
 
 static tree
-c_parser_omp_clause_default (c_parser *parser, tree list,
-			 bool only_none = false)
+c_parser_omp_clause_default (c_parser *parser, tree list, bool is_oacc)
 {
   enum omp_clause_default_kind kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
   location_t loc = c_parser_peek_token (parser)->location;
@@ -10652,7 +10654,7 @@ c_parser_omp_clause_default (c_parser *parser, tree list,
 	  break;
 
 	case 's':
-	  if (strcmp ("shared", p) != 0 || only_none)
+	  if (strcmp ("shared", p) != 0 || is_oacc)
 	goto invalid_kind;
 	  kind = OMP_CLAUSE_DEFAULT_SHARED;
 	  break;
@@ -10666,7 +10668,7 @@ c_parser_omp_clause_default (c_parser *parser, tree list,
   else
 {
 invalid_kind:
-  if (only_none)
+  if (is_oacc)
 	c_parser_error (parser, "expected %");
   else
 	c_parser_error (parser, "expected % or %");
@@ -11326,7 +11328,10 @@ 

[PATCH 2/2] add original schedule to scop

2015-11-06 Thread Sebastian Pop
From: Abderrazek Zaafrani 

* graphite-optimize-isl.c (optimize_isl): Call isl_union_map_is_equal.
* graphite-poly.c (new_scop): Initialize original_schedule.
(free_scop): Free original_schedule.
* graphite-poly.h (struct scop): Add field original_schedule.
* graphite-sese-to-poly.c (build_scop_original_schedule): New.
(build_poly_scop): Call build_scop_original_schedule.
---
 gcc/graphite-optimize-isl.c | 20 ++
 gcc/graphite-poly.c |  2 ++
 gcc/graphite-poly.h |  3 +++
 gcc/graphite-sese-to-poly.c | 50 +
 4 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 0d85975..c09264b 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -441,11 +441,23 @@ optimize_isl (scop_p scop)
 #else
   isl_union_map *schedule_map = get_schedule_map (schedule);
 #endif
-  apply_schedule_map_to_scop (scop, schedule_map);
 
-  isl_schedule_free (schedule);
-  isl_union_map_free (schedule_map);
-  return true;
+  if (isl_union_map_is_equal (scop->original_schedule, schedule_map))
+{
+  if (dump_file && dump_flags)
+   fprintf (dump_file, "\nISL schedule same as original schedule\n");
+
+  isl_schedule_free (schedule);
+  isl_union_map_free (schedule_map);
+  return false;
+}
+  else
+{
+  apply_schedule_map_to_scop (scop, schedule_map);
+  isl_schedule_free (schedule);
+  isl_union_map_free (schedule_map);
+  return true;
+}
 }
 
 #endif /* HAVE_isl */
diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index 2aa40c0..36c3061 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -305,6 +305,7 @@ new_scop (edge entry, edge exit)
   scop->must_waw_no_source = NULL;
   scop->may_waw_no_source = NULL;
   scop_set_region (scop, region);
+  scop->original_schedule = NULL;
   scop->pbbs.create (3);
   scop->poly_scop_p = false;
   scop->drs.create (3);
@@ -341,6 +342,7 @@ free_scop (scop_p scop)
   isl_union_map_free (scop->may_waw);
   isl_union_map_free (scop->must_waw_no_source);
   isl_union_map_free (scop->may_waw_no_source);
+  isl_union_map_free (scop->original_schedule);
   XDELETE (scop);
 }
 
diff --git a/gcc/graphite-poly.h b/gcc/graphite-poly.h
index 5298f85..b35431a 100644
--- a/gcc/graphite-poly.h
+++ b/gcc/graphite-poly.h
@@ -436,6 +436,9 @@ struct scop
 *must_war, *may_war, *must_war_no_source, *may_war_no_source,
 *must_waw, *may_waw, *must_waw_no_source, *may_waw_no_source;
 
+  /* Original schedule of the SCoP.  */
+  isl_union_map *original_schedule;
+
   /* True when the scop has been converted to its polyhedral
  representation.  */
   bool poly_scop_p;
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 394a281..ba45199 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -426,6 +426,55 @@ build_scop_minimal_scattering (scop_p scop)
   isl_aff_free (static_sched);
 }
 
+/* Build the original schedule showing the orginal order of execution
+   of statement instances.
+
+   The following example shows the original schedule:
+
+   for (i: ...)
+ {
+   for (j: ...)
+ {
+   A
+ }
+   B
+ }
+   C
+   for (i: ...)
+ {
+   D
+ }
+
+   Static schedules for A to D expressed in a union map:
+
+   { S_A[i0, i1] -> [i0, i1]; S_B[i0] -> [i0]; S_C[] -> []; S_9[i0] -> [i0] }
+
+*/
+
+static void
+build_scop_original_schedule (scop_p scop)
+{
+  isl_space *space = isl_set_get_space (scop->param_context);
+  isl_union_map *res = isl_union_map_empty (space);
+
+  int i;
+  poly_bb_p pbb;
+  FOR_EACH_VEC_ELT (scop->pbbs, i, pbb)
+{
+  int nb_dimensions = isl_set_dim (pbb->domain, isl_dim_set);
+  isl_space *dc = isl_set_get_space (pbb->domain);
+  isl_space *dm = isl_space_add_dims (isl_space_from_domain (dc),
+ isl_dim_out, nb_dimensions);
+  isl_map *mp = isl_map_universe (dm);
+  for (int i = 0; i < nb_dimensions; i++)
+   mp = isl_map_equate (mp, isl_dim_in, i, isl_dim_out, i);
+
+  res = isl_union_map_add_map (res, mp);
+}
+  scop->original_schedule = res;
+}
+
+
 static isl_pw_aff *extract_affine (scop_p, tree, __isl_take isl_space *space);
 
 /* Extract an affine expression from the chain of recurrence E.  */
@@ -1799,6 +1848,7 @@ build_poly_scop (scop_p scop)
 
   build_scop_drs (scop);
   build_scop_minimal_scattering (scop);
+  build_scop_original_schedule (scop);
 
   /* This SCoP has been translated to the polyhedral
  representation.  */
-- 
1.9.1



libgo patch committed: Fix unexported embedded structs

2015-11-06 Thread Ian Lance Taylor
There has been a long-standing discrepancy between the gc and the
gccgo Go compilers in their handling of the type descriptors for
unexported embedded structs.  gccgo correctly records a PkgPath for
the package where those embedded structs are defined.  gc does not.
The Go libraries are written expect the gc behaviour, which causes
them to both be slightly incorrect and to fail in some cases when used
with gccgo.  This was recently sorted out in the gc library.  I'm
bringing the patch over into gccgo now so that the compilers can
finally be consistent.

This patch adds three gc changes to libgo:
https://golang.org/cl/14010, https:/golang.org/cl/14011, and
https://golang.org/cl/14012.  The main Go bug for this is
https://golang.org/issue/7247 .  It has also been reported against
gccgo as https://gcc.gnu.org/PR66138 .

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline and GCC 5 branch.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 229880)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-10c1d6756ed1dcc814c49921c2a5e27f4677e0e6
+012ab5cb2ef1c26e8023ce90d3a2bba174da7b30
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/encoding/json/decode_test.go
===
--- libgo/go/encoding/json/decode_test.go   (revision 229832)
+++ libgo/go/encoding/json/decode_test.go   (working copy)
@@ -118,6 +118,7 @@ type Top struct {
Loop
Embed0p // has Point with X, Y, used
Embed0q // has Point with Z, used
+   embed   // contains exported field
 }
 
 type Embed0 struct {
@@ -148,6 +149,10 @@ type Embed0q struct {
Point
 }
 
+type embed struct {
+   Q int
+}
+
 type Loop struct {
Loop1 int `json:",omitempty"`
Loop2 int `json:",omitempty"`
@@ -331,7 +336,8 @@ var unmarshalTests = []unmarshalTest{
"Loop2": 14,
"X": 15,
"Y": 16,
-   "Z": 17
+   "Z": 17,
+   "Q": 18
}`,
ptr: new(Top),
out: Top{
@@ -361,6 +367,9 @@ var unmarshalTests = []unmarshalTest{
Embed0q: Embed0q{
Point: Point{Z: 17},
},
+   embed: embed{
+   Q: 18,
+   },
},
},
{
@@ -507,12 +516,15 @@ func TestMarshalEmbeds(t *testing.T) {
Embed0q: Embed0q{
Point: Point{Z: 17},
},
+   embed: embed{
+   Q: 18,
+   },
}
b, err := Marshal(top)
if err != nil {
t.Fatal(err)
}
-   want := 
"{\"Level0\":1,\"Level1b\":2,\"Level1c\":3,\"Level1a\":5,\"LEVEL1B\":6,\"e\":{\"Level1a\":8,\"Level1b\":9,\"Level1c\":10,\"Level1d\":11,\"x\":12},\"Loop1\":13,\"Loop2\":14,\"X\":15,\"Y\":16,\"Z\":17}"
+   want := 
"{\"Level0\":1,\"Level1b\":2,\"Level1c\":3,\"Level1a\":5,\"LEVEL1B\":6,\"e\":{\"Level1a\":8,\"Level1b\":9,\"Level1c\":10,\"Level1d\":11,\"x\":12},\"Loop1\":13,\"Loop2\":14,\"X\":15,\"Y\":16,\"Z\":17,\"Q\":18}"
if string(b) != want {
t.Errorf("Wrong marshal result.\n got: %q\nwant: %q", b, want)
}
Index: libgo/go/encoding/json/encode.go
===
--- libgo/go/encoding/json/encode.go(revision 229832)
+++ libgo/go/encoding/json/encode.go(working copy)
@@ -1022,7 +1022,7 @@ func typeFields(t reflect.Type) []field
// Scan f.typ for fields to include.
for i := 0; i < f.typ.NumField(); i++ {
sf := f.typ.Field(i)
-   if sf.PkgPath != "" { // unexported
+   if sf.PkgPath != "" && !sf.Anonymous { // 
unexported
continue
}
tag := sf.Tag.Get("json")
Index: libgo/go/encoding/xml/marshal_test.go
===
--- libgo/go/encoding/xml/marshal_test.go   (revision 229832)
+++ libgo/go/encoding/xml/marshal_test.go   (working copy)
@@ -139,6 +139,7 @@ type EmbedA struct {
EmbedC
EmbedB EmbedB
FieldA string
+   embedD
 }
 
 type EmbedB struct {
@@ -153,6 +154,11 @@ type EmbedC struct {
FieldC  string
 }
 
+type embedD struct {
+   fieldD string
+   FieldE string // Promoted and visible when embedD is embedded.
+}
+
 type NameCasing struct {
XMLName struct{} `xml:"casing"`
Xy  string
@@ -711,6 +717,9 

Re: [PATCH v3 1/2] [PR debug/67192] Fix C loops' back-jump location

2015-11-06 Thread Jeff Law

On 11/04/2015 09:17 AM, Andreas Arnez wrote:

Since r223098 ("Implement -Wmisleading-indentation") the backward-jump
generated for a C while- or for-loop can get the wrong line number.
This is because the check for misleading indentation peeks ahead one
token, advancing input_location to after the loop, and then
c_finish_loop() creates the back-jump and calls add_stmt(), which
assigns input_location to the statement by default.

This patch swaps the check for misleading indentation with the finishing
of the loop, such that input_location still has the right value at the
time of any invocations of add_stmt().  Note that this does not fully
cover all cases where the back-jump gets the wrong location.

gcc/c/ChangeLog:

PR debug/67192
* c-parser.c (c_parser_while_statement): Finish the loop before
parsing ahead for misleading indentation.
(c_parser_for_statement): Likewise.

gcc/testsuite/ChangeLog:

PR debug/67192
* gcc.dg/guality/pr67192.c: New test.

OK.

You might consider testing C++ on the same tests.  I wouldn't be 
surprised if it shows the same problem.


jeff



Re: Move const char * -> int/fp folds to fold-const-call.c

2015-11-06 Thread Jeff Law

On 11/06/2015 08:22 AM, Richard Sandiford wrote:

This patch moves folds that deal with constant string arguments and
return a constant integer or floating-point value.  For example, it
handles strcmp ("foo", "bar") but not strstr ("foobar", "bar"),
which wouldn't currently be accepted by the gimple folders.

The builtins.c folding for strlen (via c_strlen) is a bit more general
than what the fold-const-call.c code does (and more general than we need
for the gimple folders).  I've therefore left it as-is, even though it
partially duplicates the new code.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* builtins.c (fold_builtin_nan): Delete.
(fold_builtin_memcmp): Remove case where both arguments are constant.
(fold_builtin_strcmp, fold_builtin_strncmp): Likewise.
(fold_builtin_strspn, fold_builtin_strcspn): Likewise.
(fold_builtin_1): Remove BUILT_IN_NAN* handling.
* fold-const-call.c: Include fold-const.h.
(host_size_t_cst_p): New function.
(build_cmp_result, fold_const_builtin_nan): Likewise.
(fold_const_call_1): New function, split out from...
(fold_const_call): ...here (for all three interfaces).  Handle
constant nan, nans, strlen, strcmp, strncmp, strspn and strcspn.


OK.

jeff





[PATCH] Remove more backedge threading support

2015-11-06 Thread Jeff Law


This removes the last bits of code to thread through backedges using the 
old threader.   Behaviour is essentially the same, except some shuffling 
of code to ensure we always fall back to the FSM threader if a backedge 
is found.


Bootstrapped and regression tested on x86_64-linux-gnu.  Installing on 
the trunk momentarily.  This does not affect 68198.


Next step in the cleanup is removing the bits that handle updating the 
CFG/SSA graph for threads across backedges :-)


Jeff
commit 5de480fca9f5b76d13033a274699b9557f5fe66c
Author: law 
Date:   Sat Nov 7 06:31:14 2015 +

[PATCH] Remove more backedge threading support

* tree-ssa-threadedge.c (dummy_simplify): Remove.
(thread_around_empty_blocks): Remove backedge_seen_p argument.
If we thread to a backedge, then return false.  Update recursive
call to eliminate backedge_seen_p argument.
(thread_through_normal_block): Remove backedge_seen_p argument.
Remove backedge_seen_p argument from calls to
thread_around_empty_blocks.  Remove checks on backedge_seen_p.
If we thread to a backedge, then return 0.
(thread_across_edge): Remove bookkeeping for backedge_seen.  Don't
pass it to thread_through_normal_block or thread_through_empty_blocks.
For joiner handling, if we see a backedge, do not try normal
threading.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@229911 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 90fbe5f..78dc3f0 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,18 @@
+2015-11-06  Jeff Law 
+
+   * tree-ssa-threadedge.c (dummy_simplify): Remove.
+   (thread_around_empty_blocks): Remove backedge_seen_p argument.
+   If we thread to a backedge, then return false.  Update recursive
+   call to eliminate backedge_seen_p argument.
+   (thread_through_normal_block): Remove backedge_seen_p argument.
+   Remove backedge_seen_p argument from calls to
+   thread_around_empty_blocks.  Remove checks on backedge_seen_p.
+   If we thread to a backedge, then return 0.
+   (thread_across_edge): Remove bookkeeping for backedge_seen.  Don't
+   pass it to thread_through_normal_block or thread_through_empty_blocks.
+   For joiner handling, if we see a backedge, do not try normal
+   threading.
+
 2015-11-06  Abderrazek Zaafrani  
 
* graphite-optimize-isl.c (optimize_isl): Call isl_union_map_is_equal.
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 9379198..971fc52 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -376,17 +376,6 @@ record_temporary_equivalences_from_stmts_at_dest (edge e,
   return stmt;
 }
 
-/* Once we have passed a backedge in the CFG when threading, we do not want to
-   utilize edge equivalences for simplification purpose.  They are no longer
-   necessarily valid.  We use this callback rather than the ones provided by
-   DOM/VRP to achieve that effect.  */
-static tree
-dummy_simplify (gimple *stmt1 ATTRIBUTE_UNUSED, gimple *stmt2 ATTRIBUTE_UNUSED,
-   class avail_exprs_stack *avail_exprs_stack ATTRIBUTE_UNUSED)
-{
-  return NULL_TREE;
-}
-
 /* Simplify the control statement at the end of the block E->dest.
 
To avoid allocating memory unnecessarily, a scratch GIMPLE_COND
@@ -396,7 +385,7 @@ dummy_simplify (gimple *stmt1 ATTRIBUTE_UNUSED, gimple 
*stmt2 ATTRIBUTE_UNUSED,
a condition using pass specific information.
 
Return the simplified condition or NULL if simplification could
-   not be performed. 
+   not be performed.
 
The available expression table is referenced via AVAIL_EXPRS_STACK.  */
 
@@ -707,7 +696,7 @@ propagate_threaded_block_debug_into (basic_block dest, 
basic_block src)
return false.
 
DUMMY_COND, HANDLE_DOMINATING_ASSERTS and SIMPLIFY are used to
-   try and simplify the condition at the end of TAKEN_EDGE->dest. 
+   try and simplify the condition at the end of TAKEN_EDGE->dest.
 
The available expression table is referenced via AVAIL_EXPRS_STACK.  */
 
@@ -718,8 +707,7 @@ thread_around_empty_blocks (edge taken_edge,
bool handle_dominating_asserts,
pfn_simplify simplify,
bitmap visited,
-   vec *path,
-   bool *backedge_seen_p)
+   vec *path)
 {
   basic_block bb = taken_edge->dest;
   gimple_stmt_iterator gsi;
@@ -754,23 +742,23 @@ thread_around_empty_blocks (edge taken_edge,
   if (single_succ_p (bb))
{
  taken_edge = single_succ_edge (bb);
+
+ if ((taken_edge->flags & EDGE_DFS_BACK) != 0)
+   return false;
+
  if (!bitmap_bit_p (visited, taken_edge->dest->index))
{
  jump_thread_edge *x
= new 

[PATCH/RFC] C++ FE: expression ranges (work in progress)

2015-11-06 Thread David Malcolm
Caveat: this patch is a work-in-progress, but I thought it was worth
posting to check that the concept is OK.

This patch builds on top of the patch kit:
"[PATCH 00/10] Overhaul of diagnostics (v5)"
  https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02536.html
of which patches 1-4 are now in trunk.

Note that the above kit generalizes the meaning of location_t to mean
a combination of both a caret location *and* a source range.

This patch is analogous to:
"[PATCH 06/10] Track expression ranges in C frontend"
  https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02535.html

in that it adds range information to the various expressions, but this
time for the C++ frontend.

The benefit here would be e.g. to highlight the LHS and RHS of a bad
binary expression:

error: no match for ‘operator+’ (operand types are ‘int’ and ‘s’)
   return (first_function_with_a_very_long_name ()
   ~~~
   + second_function_with_a_very_long_name ());
   ^ 

where exactly which is the LHS and RHS can be unclear when dealing with
complicated nesting of expressions.

As with the C frontend, there's an issue with tree nodes that
don't have locations: VAR_DECL, INTEGER_CST, etc:

  int test (int foo)
  {
return foo * 100;
   ^^^   ^^^
  }

where we'd like to access the source spelling ranges of the expressions
during parsing, so that we can use them when reporting parser errors.

I resolved this for the C frontend by augmenting struct c_expr.
However, the C++ parser works purely on "tree".  Hence this patch
introduces a new class "cp_expr", which bundles a tree and a location_t,
with implicit conversion between tree and cp_expr via a ctor,
an operator tree () etc, effectively preserving location information for
things like VAR_DECL during parsing.  (I first attempted explicit
conversions, but it turned out to be a huge task; doing it implicitly
made it doable).  Some explicit "expr.get_value ()" method calls are
needed for variadic calls e.g.:

error_at (id_expr_token->location,
  "local variable %qD may not appear in this context",
  -   decl);
  +   decl.get_value ());

since operator tree () can't be used there.

For tree nodes with a location, the location_t is always the same
as the EXPR_LOCATION of node (and redundant).
For other tree nodes, the location_t is of interest (and is asserted
to not be UNKNOWN_LOCATION).

Most of the time, the cp_expr's location is rather redundant, but
it's essential at the leaves of the parse tree, where it's used for
preserving the locations of the tokens as they become identifiers,
constants, etc: an extra 32-bit value is being stored and passed around
with the tree ptrs.

The new tests were shameless copied from the C FE testsuite, fixing
them up to reflect some apparent differences in caret location between
C and C++ for some expression, and adding some C++-specific constructs
(perhaps it could be done by referencing the C tests in
g++.dg/plugin/plugin.exp, with relative paths, rather than copying them?)

This isn't ready yet:
* it has FIXMEs
* although it seems to bootstrap, it has about 100 regressions in
  g++.sum, and a lot of regressions in obj-c++.sum
* I haven't done performance testing yet

That said, is this the right direction?

Also, in cp_parser_new_expression I attempted to generate meaningful
ranges e.g.:

  int *foo = new int[100];
 ^~~~

but it seems to be hard to do this, due to the trailing optional
components; I found myself wanting to ask the lexer for the last
token it consumed (in particular, I'm interested in the
location_t of that token).  Is this a reasonable thing to add to the
lexer?

Thanks

gcc/ChangeLog:
* convert.c (convert_to_integer): In integer-to-integer
conversions, preserve the location.

gcc/cp/ChangeLog:
* call.c (build_new_op_1): Set the location of the result
of cp_build_unary_op.
* cp-tree.h (class cp_expr): New class.
(perform_koenig_lookup): Convert return type and param from tree
to cp_expr.
(finish_increment_expr): Likewise.
(finish_unary_op_expr): Likewise.
(finish_id_expression): Likewise for return type.
(build_class_member_access_expr): Likewise for param.
(finish_class_member_access_expr): Likewise.
(build_x_unary_op): Likewise.
(build_c_cast): Likewise.
(build_x_modify_expr): Likewise for return type.
* name-lookup.c (lookup_arg_dependent_1): Likewise.
(lookup_arg_dependent): Likewise; also for local "ret".
* name-lookup.h (lookup_arg_dependent): Likewise for return type.
* parser.c (struct cp_parser_expression_stack_entry): Likewise
for field "lhs".
(cp_parser_identifier): Likewise for return type.  Use cp_expr
ctor to preserve the token's location.
(cp_parser_string_literal): Likewise, 

Re: [PATCH] x86 interrupt attribute

2015-11-06 Thread Jeff Law

On 11/06/2015 03:19 PM, H.J. Lu wrote:

On Fri, Nov 6, 2015 at 7:31 AM, Uros Bizjak  wrote:

On Fri, Nov 6, 2015 at 3:07 PM, Yulia Koval  wrote:

Hi,

I updated and reposted the patch. Regtested/bootstraped on
x86_64/Linux and i686/Linux. Ok for trunk?


This version still emits insns from ix86_function_arg, so NAK.

Uros.


Hi Uros,

The ix86_function_arg change only applies the interrupt function body,
which has fixed parameters and won't be called by any functions directly.
Uros was pretty clear that generating code from within ix86_function_arg 
was a NAK from his side (and I'd agree with that assessment from Uros). 
 Generating code like that from within FUNCTION_ARG is pretty bad.


Can you arrange to load up the arguments from within the prologue so 
that they look like they were passed in?  Alternately you have to make 
FUNCTION_ARG return 0 for those cases and arrange to define all the 
other argument passing offsets appropriately so those arguments can be 
found in the stack.


Jeff


Re: regrename: Fix for earlyclobber operands

2015-11-06 Thread Jeff Law

On 11/06/2015 04:04 PM, Bernd Schmidt wrote:

On 11/06/2015 08:30 PM, Jeff Law wrote:

* regrename.c (record_out_operands): Terminate earlyclobbered
operands here.

Do you want to terminate those chains after step 6 is complete so that
the earlyclobber output conflicts with all the other outputs?


I'm not sure what you mean by your question. This patch is about
terminating live registers that are overwritten by earlyclobber output
operands. We open a new chain for every output, ec or not, and since
they are live at the same time, they are all considered conflicting.

I must have completely mis-understood the code.

OK for the trunk.

jeff


Re: [PATCH v3 2/2] [PR debug/67192] Further fix C loops' back-jump location

2015-11-06 Thread Jeff Law

On 11/04/2015 09:18 AM, Andreas Arnez wrote:

After parsing an unconditional "while"- or "for"-loop, the C front-end
generates a backward-goto statement and implicitly sets its location to
the current input_location.  But in some cases the parser peeks ahead
first, such that input_location already points to the line after the
loop and the generated backward-goto gets the wrong line number.

One way this can occur is with a loop body consisting of an "if"
statement, because then the parser peeks for an optional "else" before
finishing the loop.

Another way occurred after r223098 ("Implement
-Wmisleading-indentation"), even with a loop body enclosed in braces.
This was because the check for misleading indentation always peeks ahead
one token as well.

This patch avoids the use of input_location and sets the location of the
backward-goto to the start of the loop body instead, or, if there is no
loop body, to the start of the loop.

gcc/c/ChangeLog:

PR debug/67192
* c-typeck.c (c_finish_loop): For unconditional loops, set the
location of the backward-goto to the start of the loop body.

gcc/testsuite/ChangeLog:

PR debug/67192
* gcc.dg/guality/pr67192.c (f3, f4): New functions.
(main): Invoke them.
Also OK.  And please consider using those tests with the C++ compiler to 
see if it's suffering from the same problem.


Thanks again!

jeff



Re: [hsa 7/12] Disabling the vectorizer for GPU kernels/functions

2015-11-06 Thread Richard Biener
On Thu, 5 Nov 2015, Martin Jambor wrote:

> Hi,
> 
> in the previous email I wrote we need to "change behavior" of a few
> optimization passes.  One was the flattening of GPU functions and the
> other two are in the patch below.  It all comes to that, at the
> moment, we need to switch off the vectorizer (only for the GPU
> functions, of course).
> 
> We are actually quite close to being able to handle gimple vector
> input in HSA back-end but not all the way yet, and before allowing the
> vectorizer again, we will have to make sure it never produces vectors
> bigger than 128bits (in GPU functions).

Hmm.  I'd rather have this modify
DECL_FUNCTION_SPECIFIC_OPTIMIZATION of the hsa function to get this
effect.  I think I mentioned this to the OACC guys as well for a
similar needs of them.

Richard.

> Thanks,
> 
> Martin
> 
> 
> 2015-11-05  Martin Jambor  
> 
>   * tree-ssa-loop.c: Include cgraph.c, symbol-summary.c and hsa.h.
>   (pass_vectorize::gate): Do not run on HSA functions.
>   * tree-vectorizer.c: Include symbol-summary.c and hsa.h.
>   (pass_slp_vectorize::gate): Do not run on HSA functions.
> 
> diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
> index 8ecd140..0d119e2 100644
> --- a/gcc/tree-ssa-loop.c
> +++ b/gcc/tree-ssa-loop.c
> @@ -35,6 +35,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-inline.h"
>  #include "tree-scalar-evolution.h"
>  #include "tree-vectorizer.h"
> +#include "cgraph.h"
> +#include "symbol-summary.h"
> +#include "hsa.h"
>  
>  
>  /* A pass making sure loops are fixed up.  */
> @@ -257,7 +260,8 @@ public:
>/* opt_pass methods: */
>virtual bool gate (function *fun)
>  {
> -  return flag_tree_loop_vectorize || fun->has_force_vectorize_loops;
> +  return (flag_tree_loop_vectorize || fun->has_force_vectorize_loops)
> + && !hsa_gpu_implementation_p (fun->decl);
>  }
>  
>virtual unsigned int execute (function *);
> diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> index b80a8dd..366138c 100644
> --- a/gcc/tree-vectorizer.c
> +++ b/gcc/tree-vectorizer.c
> @@ -75,6 +75,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-propagate.h"
>  #include "dbgcnt.h"
>  #include "tree-scalar-evolution.h"
> +#include "symbol-summary.h"
> +#include "hsa.h"
>  
>  
>  /* Loop or bb location.  */
> @@ -675,7 +677,10 @@ public:
>  
>/* opt_pass methods: */
>opt_pass * clone () { return new pass_slp_vectorize (m_ctxt); }
> -  virtual bool gate (function *) { return flag_tree_slp_vectorize != 0; }
> +  virtual bool gate (function *fun)
> +  {
> +return flag_tree_slp_vectorize && !hsa_gpu_implementation_p (fun->decl);
> +  }
>virtual unsigned int execute (function *);
>  
>  }; // class pass_slp_vectorize
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [hsa 9/12] Small alloc-pool fix

2015-11-06 Thread Richard Biener
On Fri, 6 Nov 2015, Martin Liška wrote:

> On 11/06/2015 10:00 AM, Richard Biener wrote:
> > On Thu, 5 Nov 2015, Martin Jambor wrote:
> > 
> >> Hi,
> >>
> >> we use C++ new operators based on alloc-pools a lot in the subsequent
> >> patches and realized that on the current trunk, such new operators
> >> would needlessly call the placement ::new operator within the allocate
> >> method of pool-alloc.  Fixed below by providing a new allocation
> >> method which does not call placement new, which is only safe to use
> >> from within a new operator.
> >>
> >> The patch also fixes the slightly weird two parameter operator new
> >> (which we do not use in HSA backend) so that it does not do the same.
> > 
> 
> Hi.
> 
> > Why do you need to add the pointer variant then?
> 
> You are right, we originally used the variant in the branch, but it was 
> eventually
> left.
> 
> > 
> > Also isn't the issue with allocate() that it does
> > 
> > return ::new (m_allocator.allocate ()) T ();
> > 
> > which 1) value-initializes and 2) doesn't even work with types like
> > 
> > struct T { T(int); };
> > 
> > thus types without a default constructor.
> 
> You are right, it produces compilation error.
> 
> > 
> > I think the allocator was poorly C++-ified without updating the
> > specification for the cases it is supposed to handle.  And now
> > we have C++ uses that are not working because the allocator is
> > broken.
> > 
> > An incrementally better version (w/o fixing the issue with
> > types w/o default constructor) is
> > 
> > return ::new (m_allocator.allocate ()) T;
> 
> I've tried that, and it also calls default ctor:
> 
> ../../gcc/alloc-pool.h: In instantiation of ‘T* 
> object_allocator::allocate() [with T = et_occ]’:
> ../../gcc/alloc-pool.h:531:22:   required from ‘void* operator new(size_t, 
> object_allocator&) [with T = et_occ; size_t = long unsigned int]’
> ../../gcc/et-forest.c:449:46:   required from here
> ../../gcc/et-forest.c:58:3: error: ‘et_occ::et_occ()’ is private
>et_occ ();
>^
> In file included from ../../gcc/et-forest.c:28:0:
> ../../gcc/alloc-pool.h:483:44: error: within this context
>  return ::new (m_allocator.allocate ()) T;

Yes, but it does slightly cheaper initialization of PODs

> 
> > 
> > thus default-initialize which does no initialization for PODs (without
> > array members...) which is what the old pool allocator did.
> 
> I'm not so familiar with differences related to PODs.
> 
> > 
> > To fix the new operator (how do you even call that?  does it allow
> > specifying constructor args and thus work without a default constructor?)
> > it should indeed use an allocation method not performing the placement
> > new.  But I'd call it allocate_raw rather than vallocate.
> 
> For situations where do not have a default ctor, one should you the 
> helper method defined at the end of alloc-pool.h:
> 
> template 
> inline void *
> operator new (size_t, object_allocator )
> {
>   return a.allocate ();
> }
> 
> For instance:
> et_occ *nw = new (et_occurrences) et_occ (2);

Oh, so it uses placement new syntax...  works for me.

> or as used in the HSA branch:
> 
> /* New operator to allocate convert instruction from pool alloc.  */
> 
> void *
> hsa_insn_cvt::operator new (size_t)
> {
>   return hsa_allocp_inst_cvt->allocate_raw ();
> }
> 
> and
> 
> cvtinsn = new hsa_insn_cvt (reg, *ptmp2);
> 
> 
> I attached patch where I rename the method as suggested.

Ok.

Thanks,
Richard.

> Thanks,
> Martin
> 
> > 
> > Thanks.
> > Richard.
> > 
> >> Thanks,
> >>
> >> Martin
> >>
> >>
> >> 2015-11-05  Martin Liska  
> >>Martin Jambor  
> >>
> >>* alloc-pool.h (object_allocator::vallocate): New method.
> >>(operator new): Call vallocate instead of allocate.
> >>(operator new): New operator.
> >>
> >>
> >> diff --git a/gcc/alloc-pool.h b/gcc/alloc-pool.h
> >> index 0dc05cd..46b6550 100644
> >> --- a/gcc/alloc-pool.h
> >> +++ b/gcc/alloc-pool.h
> >> @@ -483,6 +483,12 @@ public:
> >>  return ::new (m_allocator.allocate ()) T ();
> >>}
> >>  
> >> +  inline void *
> >> +  vallocate () ATTRIBUTE_MALLOC
> >> +  {
> >> +return m_allocator.allocate ();
> >> +  }
> >> +
> >>inline void
> >>remove (T *object)
> >>{
> >> @@ -523,12 +529,19 @@ struct alloc_pool_descriptor
> >>  };
> >>  
> >>  /* Helper for classes that do not provide default ctor.  */
> >> -
> >>  template 
> >>  inline void *
> >>  operator new (size_t, object_allocator )
> >>  {
> >> -  return a.allocate ();
> >> +  return a.vallocate ();
> >> +}
> >> +
> >> +/* Helper for classes that do not provide default ctor.  */
> >> +template 
> >> +inline void *
> >> +operator new (size_t, object_allocator *a)
> >> +{
> >> +  return a->vallocate ();
> >>  }
> >>  
> >>  /* Hashtable mapping alloc_pool names to descriptors.  */
> >>
> >>
> > 
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 

Re: [PATCH] Add configure flag for operator new (std::nothrow)

2015-11-06 Thread Pedro Alves
On 11/06/2015 01:56 AM, Jonathan Wakely wrote:
> On 5 November 2015 at 23:31, Daniel Gutson

>> The issue is, as I understand it, to do the actual work of operator
>> new, i.e. allocate memory. It should force
>> us to copy most of the code of the original code of operator new,
>> which may change on new versions of the
>> STL, forcing us to keep updated.
> 
> It can just call malloc, and the replacement operator delete can call free.
> 
> That is very unlikely to need to change (which is corroborated by the
> fact that the default definitions in libsupc++ change very rarely).

Or perhaps libsupc++ could provide the default operator new under
a __default_operator_new alias or some such, so that the user-defined
replacement can fallback to calling it.  Likewise for op delete.

Thanks,
Pedro Alves



Re: Merge of HSA branch

2015-11-06 Thread Bernd Schmidt

On 11/05/2015 10:51 PM, Martin Jambor wrote:

Individual changes are described in slightly more detail in their
respective messages.  If you are interested in how the HSAIL
generation works in general, I encourage you to have a look at my
Cauldron slides or presentation, only very few things have changed as
far as the general principles are concerned.  Let me just quickly stress
here that we do acceleration within a single compiler, as opposed to
LTO-ways of all the other accelerator teams.


Realistically we're probably not going to reject this work, but I still 
want to ask whether the approach was acked by the community before you 
started. I'm really not exactly thrilled about having two different 
classes of backends in the compiler, and two different ways of handling 
offloading.



I also acknowledge that we should add HSA-specific tests to the GCC
testsuite but we are only now looking at how to do that and will
welcome any guidance in this regard.


Yeah, I was looking for any kind of new test, because...


the class of OpenMP loops we can handle well is small,


I'd appreciate more information on what this means. Any examples or 
performance numbers?



Bernd


Re: [PATCH] PR driver/67613 - spell suggestions for misspelled command line options

2015-11-06 Thread Bernd Schmidt

On 11/04/2015 03:53 PM, David Malcolm wrote:

This patch adds hints to the option-not-found error in the driver,
using the Levenshtein distance implementation posted here:
"[PATCH 0/2] Levenshtein-based suggestions (v3)"
   https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03379.html

It splits out the identifier-based implementation into a new
spellcheck-tree.c, keeping the core in spellcheck.c, since the tree
checking code needed for IDENTIFIER_POINTER etc isn't available in
the driver.


Nice. Ok.


Bernd


Re: [PATCH 6/6] Make SRA replace constant-pool loads

2015-11-06 Thread Eric Botcazou
> Hmm, can you clarify, do you mean I should *not* replace constant pool
> values with their DECL_INITIAL? The attempt to substitute in the
> initial value is what leads to most of the problems. For example, in
> gnat/opt31.adb, create_access finds this expression accessing *.LC0:
> 
> MEM[(interfaces__unsigned_8[(sizetype)  opt31__messages_t___XUP>.P_BOUNDS->LB0: opt31__messages_t___XUP>.P_BOUNDS->UB0 >=  opt31__messages_t___XUP>.P_BOUNDS->LB0 ? (sizetype)  struct opt31__messages_t___XUP>.P_BOUNDS->UB0 : (sizetype)
> .P_BOUNDS->LB0 +
> 4294967295] *)&*.LC0][1 ...]{lb: 1 sz: 1}
> 
> this is an ARRAY_RANGE_REF of a MEM_REF of an ADDR_EXPR of *.LC0. So
> far I haven't extended subst_constant_pool_initial to handle
> ARRAY_RANGE_REFs, as it can't even handle this MEM_REF:
> 
> MEM[(interfaces__unsigned_8[(sizetype)  opt31__messages_t___XUP>.P_BOUNDS->LB0: opt31__messages_t___XUP>.P_BOUNDS->UB0 >=  opt31__messages_t___XUP>.P_BOUNDS->LB0 ? (sizetype)  struct opt31__messages_t___XUP>.P_BOUNDS->UB0 : (sizetype)
> .P_BOUNDS->LB0 +
> 4294967295] *)&*.LC0]
> 
> because the type here has size:
> 
> MIN_EXPR <_GLOBAL.SZ2.ada_opt31 ( opt31__messages_t___XUP>.P_BOUNDS->UB0,  opt31__messages_t___XUP>.P_BOUNDS->LB0), 17179869176>
> 
> inside the MEM_REF of the ADDR_EXPR is *.LC0, whose DECL_INITIAL is a
> 4-element array (fine). Sadly while the MEM_REF
> type_contains_placeholder_p, the type of the outer ARRAY_RANGE_REF
> does not

FWIW you are allowed to punt on this kind of complex expressions that appear 
only in Ada.  New optimizations are sort of allowed to work on the C family of 
languages first, and be extended or not to the rest of languages afterwards.

> One possibility is that this whole construct, ARRAY_RANGE_REF that it
> is, should mark *.LC0 in cannot_scalarize_away_bitmap.

ARRAY_RANGE_REF is only used in Ada so you can do that for now (unless this 
introduces regressions in the gnat.dg testsuite but I doubt it).

-- 
Eric Botcazou


Re: [PATCH] Fix PR68067

2015-11-06 Thread Alan Lawrence

On 28/10/15 13:38, Richard Biener wrote:


Applied as follows.

Bootstrapped / tested on x86_64-unknown-linux-gnu.

Richard.

2015-10-28  Richard Biener  

* fold-const.c (negate_expr_p): Adjust the division case to
properly avoid introducing undefined overflow.
(fold_negate_expr): Likewise.


Since this we've been seeing an ICE compiling polynom.c from 254.gap in SPEC2000 
on aarch64-linux-gnu with -O3 -ffast-math -mcpu=cortex-a53 (or -Ofast 
-mcpu=cortex-a53), on both native (bootstrapped and --disable-bootstrap) and 
cross-linux builds.


A number of options prevent the ICE, e.g. any of -fno-thread-jumps, 
-fno-strict-overflow, -fdump-tree-alias or -fdump-tree-ealias (!). Similarly, 
dropping the -mcpu=cortex-a53, or changing to -mcpu=cortex-a57.


(I have a recent build in a chroot for which -fno-strict-overflow does *not* fix 
the ICE but haven't yet figured out exactly what the difference in the chroot 
environment is.)


Moreover, preprocessing in a separate step (i.e. piping preprocessed output via 
a file with -E), also avoids the ICE. (This is hindering my efforts to reduce 
the testcase!).  So my hypothesis is that this is a front-end/preprocessor bug, 
rather than anything directly due to this commit.


The error message in full (line refs from that commit, r229479) is:
=
../spec2000/benchspec/CINT2000/254.gap/src/polynom.c: In function 
‘NormalizeCoeffsListx’:
../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: incompatible 
types in PHI argument 0

 TypHandle NormalizeCoeffsListx ( hdC )
   ^
long int

int

../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: location 
references block not in block tree

l1_279 = PHI <1(28), l1_299(33)>
../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: invalid PHI 
argument


../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: internal compiler 
error: tree check: expected class ‘type’, have ‘declaration’ (namespace_decl) in 
useless_type_conversion_p, at gimple-expr.c:84
0xd110ef tree_class_check_failed(tree_node const*, tree_code_class, char const*, 
int, char const*)

../../gcc-fsf/gcc/tree.c:9643
0x82561b tree_class_check
../../gcc-fsf/gcc/tree.h:3042
0x82561b useless_type_conversion_p(tree_node*, tree_node*)
../../gcc-fsf/gcc/gimple-expr.c:84
0xaca043 verify_gimple_phi
../../gcc-fsf/gcc/tree-cfg.c:4673
0xaca043 verify_gimple_in_cfg(function*, bool)
../../gcc-fsf/gcc/tree-cfg.c:4967
0x9c2e0b execute_function_todo
../../gcc-fsf/gcc/passes.c:1967
0x9c360b do_per_function
../../gcc-fsf/gcc/passes.c:1659
0x9c3807 execute_todo
../../gcc-fsf/gcc/passes.c:2022
Please submit a full bug report,
with preprocessed source if appropriate.
=
which looks like an "incompatible types from PHI argument" from a first call to 
verify_gimple_phi, then a second call to verify_gimple_phi prints "invalid phi 
argument" and ICEs in the test just before possibly printing a second 
incompatible_types message.



--Alan



Re: Merge of HSA branch

2015-11-06 Thread Richard Biener
On Fri, 6 Nov 2015, Bernd Schmidt wrote:

> On 11/05/2015 10:51 PM, Martin Jambor wrote:
> > Individual changes are described in slightly more detail in their
> > respective messages.  If you are interested in how the HSAIL
> > generation works in general, I encourage you to have a look at my
> > Cauldron slides or presentation, only very few things have changed as
> > far as the general principles are concerned.  Let me just quickly stress
> > here that we do acceleration within a single compiler, as opposed to
> > LTO-ways of all the other accelerator teams.
> 
> Realistically we're probably not going to reject this work, but I still want
> to ask whether the approach was acked by the community before you started. I'm
> really not exactly thrilled about having two different classes of backends in
> the compiler, and two different ways of handling offloading.

Realistically the other approaches werent acked either (well, implicitely
by review).  Not doing an RTL backend for NVPTX would have simplified
your life as well.  Not doing an RTL backend practically means not
going the LTO way as you couldn't easily even build a target without
RTL pieces (not sure how big a "dummy" RTL target would be).

Richard.

> > I also acknowledge that we should add HSA-specific tests to the GCC
> > testsuite but we are only now looking at how to do that and will
> > welcome any guidance in this regard.
> 
> Yeah, I was looking for any kind of new test, because...
>
> > the class of OpenMP loops we can handle well is small,
> 
> I'd appreciate more information on what this means. Any examples or
> performance numbers?
> 
> 
> Bernd
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: Re: [PATCH] Fix PRs 66502 and 67167

2015-11-06 Thread Jiong Wang



On 21/08/15 10:47, Jiong Wang wrote:

Richard Biener writes:


I see the following ICE:

t.c:13:1: internal compiler error: in decompose_normal_address, at
rtlanal.c:6090
  }
  ^
0xc94a37 decompose_normal_address
 /space/rguenther/tramp3d/trunk/gcc/rtlanal.c:6090
0xc94d25 decompose_address(address_info*, rtx_def**, machine_mode,
unsigned char, rtx_code)
 /space/rguenther/tramp3d/trunk/gcc/rtlanal.c:6167
0xc94dc3 decompose_mem_address(address_info*, rtx_def*)
 /space/rguenther/tramp3d/trunk/gcc/rtlanal.c:6187
0xb61149 process_address_1
 /space/rguenther/tramp3d/trunk/gcc/lra-constraints.c:2867
0xb61c4e process_address
 /space/rguenther/tramp3d/trunk/gcc/lra-constraints.c:3124
0xb62607 curr_insn_transform
 /space/rguenther/tramp3d/trunk/gcc/lra-constraints.c:3419
0xb65250 lra_constraints(bool)
 /space/rguenther/tramp3d/trunk/gcc/lra-constraints.c:4421

that looks like a latent issue to me in an area of GCC I am not
familiar with.  I suggest to open a bugreport and CC Vladimir.

Thanks for the info. Done https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67305


Richard,

  Though the ICE itself is caused by one latent bug in ARM backend 
(PR67305), while my
further double check shows there is performance regression since this 
patch. The regression
should have been caused by other gcc latent bugs in tree-vrp pass. 
Bugzilla created to track


  Thanks.

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68234

Regards,
Jiong





The r226850 change caused us to eliminate an induction variable
early (I suspect IVOPTs would have done this later anyway, but
I did not verify that):

Replaced redundant PHI node defining bl_2 with c_1
Replaced c_1 + 1 with bl_15 in all uses of c_16 = c_1 + 1;
Removing dead stmt c_16 = c_1 + 1;
Removing dead stmt bl_2 = PHI <0(2), bl_15(3)>

Thanks,
Richard.


   Thanks.





Re: [hsa 9/12] Small alloc-pool fix

2015-11-06 Thread Richard Biener
On Thu, 5 Nov 2015, Martin Jambor wrote:

> Hi,
> 
> we use C++ new operators based on alloc-pools a lot in the subsequent
> patches and realized that on the current trunk, such new operators
> would needlessly call the placement ::new operator within the allocate
> method of pool-alloc.  Fixed below by providing a new allocation
> method which does not call placement new, which is only safe to use
> from within a new operator.
> 
> The patch also fixes the slightly weird two parameter operator new
> (which we do not use in HSA backend) so that it does not do the same.

Why do you need to add the pointer variant then?

Also isn't the issue with allocate() that it does

return ::new (m_allocator.allocate ()) T ();

which 1) value-initializes and 2) doesn't even work with types like

struct T { T(int); };

thus types without a default constructor.

I think the allocator was poorly C++-ified without updating the
specification for the cases it is supposed to handle.  And now
we have C++ uses that are not working because the allocator is
broken.

An incrementally better version (w/o fixing the issue with
types w/o default constructor) is

return ::new (m_allocator.allocate ()) T;

thus default-initialize which does no initialization for PODs (without
array members...) which is what the old pool allocator did.

To fix the new operator (how do you even call that?  does it allow
specifying constructor args and thus work without a default constructor?)
it should indeed use an allocation method not performing the placement
new.  But I'd call it allocate_raw rather than vallocate.

Thanks.
Richard.

> Thanks,
> 
> Martin
> 
> 
> 2015-11-05  Martin Liska  
>   Martin Jambor  
> 
>   * alloc-pool.h (object_allocator::vallocate): New method.
>   (operator new): Call vallocate instead of allocate.
>   (operator new): New operator.
> 
> 
> diff --git a/gcc/alloc-pool.h b/gcc/alloc-pool.h
> index 0dc05cd..46b6550 100644
> --- a/gcc/alloc-pool.h
> +++ b/gcc/alloc-pool.h
> @@ -483,6 +483,12 @@ public:
>  return ::new (m_allocator.allocate ()) T ();
>}
>  
> +  inline void *
> +  vallocate () ATTRIBUTE_MALLOC
> +  {
> +return m_allocator.allocate ();
> +  }
> +
>inline void
>remove (T *object)
>{
> @@ -523,12 +529,19 @@ struct alloc_pool_descriptor
>  };
>  
>  /* Helper for classes that do not provide default ctor.  */
> -
>  template 
>  inline void *
>  operator new (size_t, object_allocator )
>  {
> -  return a.allocate ();
> +  return a.vallocate ();
> +}
> +
> +/* Helper for classes that do not provide default ctor.  */
> +template 
> +inline void *
> +operator new (size_t, object_allocator *a)
> +{
> +  return a->vallocate ();
>  }
>  
>  /* Hashtable mapping alloc_pool names to descriptors.  */
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH] Fix PR ipa/68035

2015-11-06 Thread Martin Liška
Hello.

Following patch triggers hash calculation of items (functions and variables)
in situations where LTO mode is not utilized.

Patch survives regression tests and bootstraps on x86_64-linux-pc.

Ready for trunk?
Thanks,
Martin
>From 62266e21a89777c6dbd680f7c87f15abe603c024 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 5 Nov 2015 18:31:31 +0100
Subject: [PATCH] Fix PR ipa/68035

gcc/testsuite/ChangeLog:

2015-11-05  Martin Liska  

	* gcc.dg/ipa/pr68035.c: New test.

gcc/ChangeLog:

2015-11-05  Martin Liska  

	PR ipa/68035
	* ipa-icf.c (sem_item_optimizer::build_graph): Force building
	of a hash value for an item if we are not running in LTO mode.
---
 gcc/ipa-icf.c  |   4 ++
 gcc/testsuite/gcc.dg/ipa/pr68035.c | 108 +
 2 files changed, 112 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr68035.c

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 7bb3af5..09c42a1 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -2744,6 +2744,10 @@ sem_item_optimizer::build_graph (void)
 {
   sem_item *item = m_items[i];
   m_symtab_node_map.put (item->node, item);
+
+  /* Initialize hash values if we are not in LTO mode.  */
+  if (!in_lto_p)
+	item->get_hash ();
 }
 
   for (unsigned i = 0; i < m_items.length (); i++)
diff --git a/gcc/testsuite/gcc.dg/ipa/pr68035.c b/gcc/testsuite/gcc.dg/ipa/pr68035.c
new file mode 100644
index 000..a8cb779
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr68035.c
@@ -0,0 +1,108 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-icf"  } */
+
+static const unsigned short list_0[] = { 777, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_1[] = { 0, 777, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_2[] = { 0, 1, 777, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_3[] = { 0, 1, 2, 777, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_4[] = { 0, 1, 2, 3, 777, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_5[] = { 0, 1, 2, 3, 4, 777, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_6[] = { 0, 1, 2, 3, 4, 5, 777, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_7[] = { 0, 1, 2, 3, 4, 5, 6, 777, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_8[] = { 0, 1, 2, 3, 4, 5, 6, 7, 777, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_9[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 777, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_10[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 777, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_11[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 777, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_12[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 777, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_13[] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 777, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49 };
+static const unsigned short list_14[] = { 0, 1, 

[PATCH] Make BB vectorizer work on sub-BBs

2015-11-06 Thread Richard Biener

The following patch makes the BB vectorizer not only handle BB heads
(until the first stmt with a data reference it cannot handle) but
arbitrary regions in a BB separated by such stmts.

This improves the number of BB vectorizations from 469 to 556
in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray 
1x481.wrf failing both patched and unpatched (have to update my
config used for such experiments it seems ...)

Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.

I'm currently re-testing for a cosmetic change I made when writing
the changelog.

I expected (and there are) some issues with compile-time.  Left
is unpatched and right is patched.

'403.gcc': 00:00:54 (54)  | '403.gcc': 00:00:55 (55)
'483.xalancbmk': 00:02:20 (140)   | '483.xalancbmk': 00:02:24 (144)
'416.gamess': 00:02:36 (156)  | '416.gamess': 00:02:37 (157)
'435.gromacs': 00:00:18 (18)  | '435.gromacs': 00:00:19 (19)
'447.dealII': 00:01:31 (91)   | '447.dealII': 00:01:33 (93)
'453.povray': 00:04:54 (294)  | '453.povray': 00:08:54 (534)
'454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52)
'481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119)

other benchmarks are unchanged.  I'm double-checking now that a followup
patch I have which re-implements BB vectorization dependence checking
fixes this (that's the only quadraticness I know of).

Richard.

2015-11-06  Richard Biener  

* tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
members.
(vect_stmt_in_region_p): Declare.
* tree-vect-slp.c (new_bb_vec_info): Work on a region.
(destroy_bb_vec_info): Likewise.
(vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
(vect_get_and_check_slp_defs): Likewise.
(vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
(vect_slp_bb): Likewise.
* tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
in terms of vect_stmt_in_region_p.
(vect_pattern_recog): Iterate over the BB region.
* tree-vect-stmts.c (vect_is_simple_use): Use vect_stmt_in_region_p.
* tree-vectorizer.c (vect_stmt_in_region_p): New function.
(pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.

* config/i386/i386.c: Include gimple-iterator.h.
* config/aarch64/aarch64.c: Likewise.

* gcc.dg/vect/bb-slp-38.c: New testcase.

Index: gcc/tree-vectorizer.h
===
*** gcc/tree-vectorizer.h.orig  2015-11-05 09:52:00.640227178 +0100
--- gcc/tree-vectorizer.h   2015-11-05 13:20:58.385786476 +0100
*** nested_in_vect_loop_p (struct loop *loop
*** 390,395 
--- 390,397 
  typedef struct _bb_vec_info : public vec_info
  {
basic_block bb;
+   gimple_stmt_iterator region_begin;
+   gimple_stmt_iterator region_end;
  } *bb_vec_info;
  
  #define BB_VINFO_BB(B)   (B)->bb
*** void vect_pattern_recog (vec_info *);
*** 1085,1089 
--- 1087,1092 
  /* In tree-vectorizer.c.  */
  unsigned vectorize_loops (void);
  void vect_destroy_datarefs (vec_info *);
+ bool vect_stmt_in_region_p (vec_info *, gimple *);
  
  #endif  /* GCC_TREE_VECTORIZER_H  */
Index: gcc/tree-vect-slp.c
===
*** gcc/tree-vect-slp.c.orig2015-11-05 09:52:00.640227178 +0100
--- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100
*** vect_get_and_check_slp_defs (vec_info *v
*** 209,215 
unsigned int i, number_of_oprnds;
gimple *def_stmt;
enum vect_def_type dt = vect_uninitialized_def;
-   struct loop *loop = NULL;
bool pattern = false;
slp_oprnd_info oprnd_info;
int first_op_idx = 1;
--- 209,214 
*** vect_get_and_check_slp_defs (vec_info *v
*** 218,226 
bool first = stmt_num == 0;
bool second = stmt_num == 1;
  
-   if (is_a  (vinfo))
- loop = LOOP_VINFO_LOOP (as_a  (vinfo));
- 
if (is_gimple_call (stmt))
  {
number_of_oprnds = gimple_call_num_args (stmt);
--- 217,222 
*** again:
*** 276,286 
   from the pattern.  Check that all the stmts of the node are in the
   pattern.  */
if (def_stmt && gimple_bb (def_stmt)
!   && ((is_a  (vinfo)
!  && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
! || (is_a  (vinfo)
! && gimple_bb (def_stmt) == as_a  (vinfo)->bb
! && gimple_code (def_stmt) != GIMPLE_PHI))
&& vinfo_for_stmt (def_stmt)
&& STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
  && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
--- 272,278 
   from the pattern.  Check that all the stmts of the node are in the
   pattern.  */

Re: [hsa 9/12] Small alloc-pool fix

2015-11-06 Thread Martin Liška
On 11/06/2015 10:00 AM, Richard Biener wrote:
> On Thu, 5 Nov 2015, Martin Jambor wrote:
> 
>> Hi,
>>
>> we use C++ new operators based on alloc-pools a lot in the subsequent
>> patches and realized that on the current trunk, such new operators
>> would needlessly call the placement ::new operator within the allocate
>> method of pool-alloc.  Fixed below by providing a new allocation
>> method which does not call placement new, which is only safe to use
>> from within a new operator.
>>
>> The patch also fixes the slightly weird two parameter operator new
>> (which we do not use in HSA backend) so that it does not do the same.
> 

Hi.

> Why do you need to add the pointer variant then?

You are right, we originally used the variant in the branch, but it was 
eventually
left.

> 
> Also isn't the issue with allocate() that it does
> 
> return ::new (m_allocator.allocate ()) T ();
> 
> which 1) value-initializes and 2) doesn't even work with types like
> 
> struct T { T(int); };
> 
> thus types without a default constructor.

You are right, it produces compilation error.

> 
> I think the allocator was poorly C++-ified without updating the
> specification for the cases it is supposed to handle.  And now
> we have C++ uses that are not working because the allocator is
> broken.
> 
> An incrementally better version (w/o fixing the issue with
> types w/o default constructor) is
> 
> return ::new (m_allocator.allocate ()) T;

I've tried that, and it also calls default ctor:

../../gcc/alloc-pool.h: In instantiation of ‘T* object_allocator::allocate() 
[with T = et_occ]’:
../../gcc/alloc-pool.h:531:22:   required from ‘void* operator new(size_t, 
object_allocator&) [with T = et_occ; size_t = long unsigned int]’
../../gcc/et-forest.c:449:46:   required from here
../../gcc/et-forest.c:58:3: error: ‘et_occ::et_occ()’ is private
   et_occ ();
   ^
In file included from ../../gcc/et-forest.c:28:0:
../../gcc/alloc-pool.h:483:44: error: within this context
 return ::new (m_allocator.allocate ()) T;


> 
> thus default-initialize which does no initialization for PODs (without
> array members...) which is what the old pool allocator did.

I'm not so familiar with differences related to PODs.

> 
> To fix the new operator (how do you even call that?  does it allow
> specifying constructor args and thus work without a default constructor?)
> it should indeed use an allocation method not performing the placement
> new.  But I'd call it allocate_raw rather than vallocate.

For situations where do not have a default ctor, one should you the helper 
method defined
at the end of alloc-pool.h:

template 
inline void *
operator new (size_t, object_allocator )
{
  return a.allocate ();
}

For instance:
et_occ *nw = new (et_occurrences) et_occ (2);

or as used in the HSA branch:

/* New operator to allocate convert instruction from pool alloc.  */

void *
hsa_insn_cvt::operator new (size_t)
{
  return hsa_allocp_inst_cvt->allocate_raw ();
}

and

cvtinsn = new hsa_insn_cvt (reg, *ptmp2);


I attached patch where I rename the method as suggested.

Thanks,
Martin

> 
> Thanks.
> Richard.
> 
>> Thanks,
>>
>> Martin
>>
>>
>> 2015-11-05  Martin Liska  
>>  Martin Jambor  
>>
>>  * alloc-pool.h (object_allocator::vallocate): New method.
>>  (operator new): Call vallocate instead of allocate.
>>  (operator new): New operator.
>>
>>
>> diff --git a/gcc/alloc-pool.h b/gcc/alloc-pool.h
>> index 0dc05cd..46b6550 100644
>> --- a/gcc/alloc-pool.h
>> +++ b/gcc/alloc-pool.h
>> @@ -483,6 +483,12 @@ public:
>>  return ::new (m_allocator.allocate ()) T ();
>>}
>>  
>> +  inline void *
>> +  vallocate () ATTRIBUTE_MALLOC
>> +  {
>> +return m_allocator.allocate ();
>> +  }
>> +
>>inline void
>>remove (T *object)
>>{
>> @@ -523,12 +529,19 @@ struct alloc_pool_descriptor
>>  };
>>  
>>  /* Helper for classes that do not provide default ctor.  */
>> -
>>  template 
>>  inline void *
>>  operator new (size_t, object_allocator )
>>  {
>> -  return a.allocate ();
>> +  return a.vallocate ();
>> +}
>> +
>> +/* Helper for classes that do not provide default ctor.  */
>> +template 
>> +inline void *
>> +operator new (size_t, object_allocator *a)
>> +{
>> +  return a->vallocate ();
>>  }
>>  
>>  /* Hashtable mapping alloc_pool names to descriptors.  */
>>
>>
> 

diff --git a/gcc/alloc-pool.h b/gcc/alloc-pool.h
index 0dc05cd..8b8c023 100644
--- a/gcc/alloc-pool.h
+++ b/gcc/alloc-pool.h
@@ -477,11 +477,22 @@ public:
 m_allocator.release_if_empty ();
   }
 
+  /* Allocate memory for instance of type T and call a default constructor.  */
+
   inline T *
   allocate () ATTRIBUTE_MALLOC
   {
 return ::new (m_allocator.allocate ()) T ();
   }
+  /* Allocate memory for instance of type T and return void * that
+ could be used in situations where a default constructor is not provided
+ by the class T.  */
+
+  inline void *
+  allocate_raw () ATTRIBUTE_MALLOC
+  {
+ 

[PATCH] Fix transform_to_exit_first_loop_alt with -g

2015-11-06 Thread Tom de Vries

Hi,

This patch fixes a problem with -g compilation in 
transform_to_exit_first_loop_alt.


Consider test-case test.c:
...
void
f (int *a, int n)
{
  int i;
  for (i = 0; i < n; ++i)
a[i] = 1;
}
...

If we add a "checking_verify_ssa (true, true)" call at the end of 
transform_to_exit_first_loop_alt, and we compile with "-g -O2 
-ftree-parallelize-loops=4", we run into this ICE:

...
test.c: In function ‘f’:
test.c:2:1: error: definition in block 5 does not dominate use in block 13
for SSA_NAME: i_10 in statement:
# DEBUG i => i_10
test.c:2:1: internal compiler error: verify_ssa failed
...

Before transform_to_exit_first_loop_alt, the loop looks like:
...
  :

  :
  # ivtmp_22 = PHI <0(11), ivtmp_23(7)>
  i_13 = ivtmp_22;
  # DEBUG i => i_13
  _5 = (long unsigned int) i_13;
  _6 = _5 * 4;
  _8 = a_7(D) + _6;
  *_8 = 1;
  i_10 = i_13 + 1;
  # DEBUG i => i_10
  # DEBUG i => i_10
  if (ivtmp_22 < _1)
goto ;
  else
goto ;

  :
  ivtmp_23 = ivtmp_22 + 1;
  goto ;
...


And after transform_to_exit_first_loop_alt, it looks like:
...
  :
  goto ;

  :
  # ivtmp_22 = PHI 
  i_13 = ivtmp_22;
  # DEBUG i => i_13
  _5 = (long unsigned int) i_13;
  _6 = _5 * 4;
  _8 = a_7(D) + _6;
  *_8 = 1;
  i_10 = i_13 + 1;
  goto ;

  :
  # ivtmp_25 = PHI 
  # DEBUG i => i_10
  # DEBUG i => i_10
  if (ivtmp_25 < _2)
goto ;
  else
goto ;

  :
  ivtmp_23 = ivtmp_22 + 1;
  goto ;
...

The ICE triggers because the use of i_10 in debug insn 'DEBUG i => i_10' 
in bb 13 is no longer dominated by the defition of i_10 in bb 5.


The patch fixes the ICE by ensuring that 
gimple_split_block_before_cond_jump really splits before cond_jump, 
instead of after the last nondebug insn before cond_jump, as it does 
now. This behaviour also better matches the rtl implementation of the 
cfghook. Btw, note that the only user of cfghook 
split_block_before_cond_jump is transform_to_exit_first_loop_alt.


[ A similar fix for an openacc variant of this ICE was committed on the 
gomp-4_0-branch: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00060.html ]


Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom
Fix transform_to_exit_first_loop_alt with -g

2015-11-06  Tom de Vries  

	* tree-cfg.c (gimple_split_block_before_cond_jump): Split before
	cond_jump, instead of split after last nondebug insn before cond_jump.
	* tree-parloops.c (transform_to_exit_first_loop_alt): Verify ssa before
	returning.
---
 gcc/tree-cfg.c  | 2 +-
 gcc/tree-parloops.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index cfed3c2..5d98eec 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -5783,7 +5783,7 @@ gimple_split_block_before_cond_jump (basic_block bb)
   if (gimple_code (last) != GIMPLE_COND
   && gimple_code (last) != GIMPLE_SWITCH)
 return NULL;
-  gsi_prev_nondebug ();
+  gsi_prev ();
   split_point = gsi_stmt (gsi);
   return split_block (bb, split_point)->dest;
 }
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 6c85634..3d41275 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1737,6 +1737,8 @@ transform_to_exit_first_loop_alt (struct loop *loop,
   /* Recalculate dominance info.  */
   free_dominance_info (CDI_DOMINATORS);
   calculate_dominance_info (CDI_DOMINATORS);
+
+  checking_verify_ssa (true, true);
 }
 
 /* Tries to moves the exit condition of LOOP to the beginning of its header
-- 
1.9.1



Re: [hsa 5/12] New HSA-related GCC options

2015-11-06 Thread Richard Biener
On Thu, 5 Nov 2015, Martin Jambor wrote:

> Hi,
> 
> the following small part of the merge deals with new options.  It adds
> four independent things:
> 
> 1) flag_disable_hsa is used by code in opts.c (in the first patch) to
>remember whether HSA has been explicitely disabled on the compiler
>command line.

But I don't see any way to disable it on the command line?  (no switch?)

> 2) -Whsa is a new warning we emit whenever we fail to produce HSAIL
>for some source code.  It is on by default but of course only
>emitted by HSAIL generating code so should never affect anybody who
>does not use HSA-enabled compiler and OpenMP 4 device constructs.
> 
> We have found the following two additions very useful for debugging on
> the branch but will understand if they are not deemed suitable for
> trunk and will gladly remove them:
> 
> 3) -fdisable-hsa-gridification disables the gridification process to
>ease experimenting with dynamic parallelism.  With this option,
>HSAIL is always generated from the CPU-intended gimple.

So this sounds like sth a user should never do which means
it shouln't be a switch (but a parameter or removed).

> 4) Parameter hsa-gen-debug-stores will be obsolete once HSA run-time
>supports debugging traps.  Before that, we have to do with
>debugging stores to memory at defined places, which however can
>cost speed in benchmarks.  So they are only enabled with this
>parameter.  We decided to make it a parameter rather than a switch
>to emphasize the fact it will go away and to possibly allow us
>select different levels of verbosity of the stores in the future).

You miss documentation in invoke.texi for new switches and parameters.

> Thanks,
> 
> Martin
> 
> 
> 2015-11-05  Martin Jambor  
> 
>   * common.opt (disable_hsa): New variable.
>   (-Whsa): New warning.
>   (-fdisable-hsa-gridification): New option.
>   * params.def (PARAM_HSA_GEN_DEBUG_STORES): New parameter.
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 961a1b6..9cb52db 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -223,6 +223,10 @@ unsigned int flag_sanitize_recover = SANITIZE_UNDEFINED 
> | SANITIZE_NONDEFAULT |
>  Variable
>  bool dump_base_name_prefixed = false
>  
> +; Flag whether HSA generation has been explicitely disabled
> +Variable
> +bool flag_disable_hsa = false
> +
>  ###
>  Driver
>  
> @@ -577,6 +581,10 @@ Wfree-nonheap-object
>  Common Var(warn_free_nonheap_object) Init(1) Warning
>  Warn when attempting to free a non-heap object.
>  
> +Whsa
> +Common Var(warn_hsa) Init(1) Warning
> +Warn when a function cannot be expanded to HSAIL.
> +
>  Winline
>  Common Var(warn_inline) Warning
>  Warn when an inlined function cannot be inlined.
> @@ -1107,6 +1115,10 @@ fdiagnostics-show-location=
>  Common Joined RejectNegative Enum(diagnostic_prefixing_rule)
>  -fdiagnostics-show-location=[once|every-line]How often to emit 
> source location at the beginning of line-wrapped diagnostics.
>  
> +fdisable-hsa-gridification
> +Common Report Var(flag_disable_hsa_gridification)
> +Disable HSA gridification for OMP pragmas
> +
>  ; Required for these enum values.
>  SourceInclude
>  pretty-print.h
> diff --git a/gcc/params.def b/gcc/params.def
> index c5d96e7..86911e2 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -1177,6 +1177,11 @@ DEFPARAM (PARAM_MAX_SSA_NAME_QUERY_DEPTH,
> "Maximum recursion depth allowed when querying a property of an"
> " SSA name.",
> 2, 1, 0)
> +
> +DEFPARAM (PARAM_HSA_GEN_DEBUG_STORES,
> +   "hsa-gen-debug-stores",
> +   "Level of hsa debug stores verbosity",
> +   0, 0, 1)
>  /*
>  
>  Local variables:
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [Patch ifcvt] Teach RTL ifcvt to handle multiple simple set instructions

2015-11-06 Thread Christophe Lyon
On 4 November 2015 at 16:37, James Greenhalgh  wrote:
>
> On Wed, Nov 04, 2015 at 12:04:19PM +0100, Bernd Schmidt wrote:
>> On 10/30/2015 07:03 PM, James Greenhalgh wrote:
>> >+ i = tmp_i; <- Should be cleaned up
>>
>> Maybe reword as "Subsequent passes are expected to clean up the
>> extra moves", otherwise it sounds like a TODO item.
>>
>> >+   read back in anotyher SET, as might occur in a swap idiom or
>>
>> Typo.
>>
>> >+  if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
>> >+{
>> >+  /* The write to targets[i] is only live until the read
>> >+ here.  As the condition codes match, we can propagate
>> >+ the set to here.  */
>> >+   new_val = SET_SRC (single_set (unmodified_insns[i]));
>> >+}
>>
>> Shouldn't use braces around single statements (also goes for the
>> surrounding for loop).
>>
>> >+  /* We must have at least one real insn to convert, or there will
>> >+ be trouble!  */
>> >+  unsigned count = 0;
>>
>> The comment seems a bit strange in this context - I think it's left
>> over from the earlier version?
>>
>> As far as I'm concerned this is otherwise ok.
>
> Thanks,
>
> I've updated the patch with those issues addressed. As the cost model was
> controversial in an earlier revision, I'll leave this on list for 24 hours
> and, if nobody jumps in to object, commit it tomorrow.
>
> I've bootstrapped and tested the updated patch on x86_64-none-linux-gnu
> just to check that I got the braces right, with no issues.
>

The new test does not pass on some ARM configurations, I filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68232

Christophe.

> Thanks,
> James
>
> ---
> gcc/
>
> 2015-11-04  James Greenhalgh  
>
> * ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New.
> (noce_convert_multiple_sets): Likewise.
> (noce_process_if_block): Call them.
>
> gcc/testsuite/
>
> 2015-11-04  James Greenhalgh  
>
> * gcc.dg/ifcvt-4.c: New.
>


Re: [PATCH] PR67518 and PR53852 -- add testcase.

2015-11-06 Thread Paul Richard Thomas
Dear Joost,

These two testcases look fine to me. PR53852 is marked as being a
4.9,5,6 regression. If I have understood correctly, it has only been
fixed on trunk. Do you know if there is any intention to fix it on the
other branches?

OK for trunk and, subject to the previous question being answered
affirmatively, for 4.9/5 branches.

Thanks

Paul

On 6 November 2015 at 08:04, VandeVondele  Joost
 wrote:
>> Attached testcases for two previously fixed PRs (and thanks to Dominique who 
>> was quicker for PR67982).
> ping ?



-- 
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx


RE: [PATCH] PR67518 and PR53852 -- add testcase.

2015-11-06 Thread VandeVondele Joost
Thanks Paul. I believe PR53852 won't be fixed on 4.9/5 as it seems to depend on 
the recent graphite cleanup work and recent isl. As such I'll commit to trunk 
only.

Re: [PATCH] Fix PR68067

2015-11-06 Thread Richard Biener
On Fri, 6 Nov 2015, Alan Lawrence wrote:

> On 28/10/15 13:38, Richard Biener wrote:
> > 
> > Applied as follows.
> > 
> > Bootstrapped / tested on x86_64-unknown-linux-gnu.
> > 
> > Richard.
> > 
> > 2015-10-28  Richard Biener  
> > 
> > * fold-const.c (negate_expr_p): Adjust the division case to
> > properly avoid introducing undefined overflow.
> > (fold_negate_expr): Likewise.
> 
> Since this we've been seeing an ICE compiling polynom.c from 254.gap in
> SPEC2000 on aarch64-linux-gnu with -O3 -ffast-math -mcpu=cortex-a53 (or -Ofast
> -mcpu=cortex-a53), on both native (bootstrapped and --disable-bootstrap) and
> cross-linux builds.
> 
> A number of options prevent the ICE, e.g. any of -fno-thread-jumps,
> -fno-strict-overflow, -fdump-tree-alias or -fdump-tree-ealias (!). Similarly,
> dropping the -mcpu=cortex-a53, or changing to -mcpu=cortex-a57.
> 
> (I have a recent build in a chroot for which -fno-strict-overflow does *not*
> fix the ICE but haven't yet figured out exactly what the difference in the
> chroot environment is.)
> 
> Moreover, preprocessing in a separate step (i.e. piping preprocessed output
> via a file with -E), also avoids the ICE. (This is hindering my efforts to
> reduce the testcase!).  So my hypothesis is that this is a
> front-end/preprocessor bug, rather than anything directly due to this commit.
> 
> The error message in full (line refs from that commit, r229479) is:
> =
> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c: In function
> ‘NormalizeCoeffsListx’:
> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error:
> incompatible types in PHI argument 0
>  TypHandle NormalizeCoeffsListx ( hdC )
>^
> long int
> 
> int
> 
> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: location
> references block not in block tree
> l1_279 = PHI <1(28), l1_299(33)>

^^^

this is the error to look at!  It means that the GC heap will be corrupted
quite easily.

> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: invalid
> PHI argument

which means this could be a followup error.  We do have a bugreport (or 
two) about similar issues that were tracked down to different patches.

Somebody needs to sit down and debug this properly ;)

> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: internal compiler
> error: tree check: expected class ‘type’, have ‘declaration’ (namespace_decl)
> in useless_type_conversion_p, at gimple-expr.c:84
> 0xd110ef tree_class_check_failed(tree_node const*, tree_code_class, char
> const*, int, char const*)
> ../../gcc-fsf/gcc/tree.c:9643
> 0x82561b tree_class_check
> ../../gcc-fsf/gcc/tree.h:3042
> 0x82561b useless_type_conversion_p(tree_node*, tree_node*)
> ../../gcc-fsf/gcc/gimple-expr.c:84
> 0xaca043 verify_gimple_phi
> ../../gcc-fsf/gcc/tree-cfg.c:4673
> 0xaca043 verify_gimple_in_cfg(function*, bool)
> ../../gcc-fsf/gcc/tree-cfg.c:4967
> 0x9c2e0b execute_function_todo
> ../../gcc-fsf/gcc/passes.c:1967

Interesting would be for which pass this happens - just print
*pass at this point.

> 0x9c360b do_per_function
> ../../gcc-fsf/gcc/passes.c:1659
> 0x9c3807 execute_todo
> ../../gcc-fsf/gcc/passes.c:2022
> Please submit a full bug report,
> with preprocessed source if appropriate.
> =
> which looks like an "incompatible types from PHI argument" from a first call
> to verify_gimple_phi, then a second call to verify_gimple_phi prints "invalid
> phi argument" and ICEs in the test just before possibly printing a second
> incompatible_types message.
> 
> 
> --Alan
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

[gomp4] Re: [1/3] OpenACC reductions

2015-11-06 Thread Thomas Schwinge
Hi Nathan!

On Mon, 2 Nov 2015 11:18:37 -0500, Nathan Sidwell  wrote:
> This is the core execution bits of OpenACC reductions.

> One thing not handled by this patch are reductions of variables of reference 
> type.  We have an implementation on gomp4 branch [...]

Trying to keep the existing code on gomp-4_0-branch alive, I merged your
trunk r229767 into gomp-4_0-branch in r229835.  To avoid regressions in
libgomp reduction execution tests, I had to apply one hack; please have a
look.  For your easier review, here is the merge commit in two variants,
first displayed as a three-way diff by Git's --cc option:

commit 2b76127eebddb59d45e5f068324e14efe77bb05c
Merge: bed2efe 641a0fa
Author: tschwinge 
Date:   Fri Nov 6 09:33:40 2015 +

svn merge -r 229764:229767 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@229835 
138bc75d-0d04-0410-961f-82ee72b054a4


 gcc/ChangeLog   | 28 +++-
 gcc/omp-low.c   | 58 ++---
 gcc/targhooks.h |  2 +-
 3 files changed, 67 insertions(+), 21 deletions(-)

diff --cc gcc/omp-low.c
index debedb1,6a0915b..da574a9
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@@ -5441,14 -5306,25 +5441,28 @@@ lower_oacc_reductions (location_t loc, 
  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION)
{
tree orig = OMP_CLAUSE_DECL (c);
-   tree var = OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c);
 -  tree var = maybe_lookup_decl (orig, ctx);
++  tree var;
tree ref_to_res = NULL_TREE;
-   
+   tree incoming, outgoing;
+ 
+   enum tree_code rcode = OMP_CLAUSE_REDUCTION_CODE (c);
+   if (rcode == MINUS_EXPR)
+ rcode = PLUS_EXPR;
+   else if (rcode == TRUTH_ANDIF_EXPR)
+ rcode = BIT_AND_EXPR;
+   else if (rcode == TRUTH_ORIF_EXPR)
+ rcode = BIT_IOR_EXPR;
+   tree op = build_int_cst (unsigned_type_node, rcode);
+ 
++  var = OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c);
 +  if (!var)
 +var = maybe_lookup_decl (orig, ctx);
if (!var)
  var = orig;
+   gcc_assert (!is_reference (var));
  
+   incoming = outgoing = var;
+   
if (!inner)
  {
/* See if an outer construct also reduces this variable.  */
@@@ -5490,24 -5365,22 +5503,31 @@@
   see if there's a mapping for it.  */
if (gimple_code (outer->stmt) == GIMPLE_OMP_TARGET
&& maybe_lookup_field (orig, outer))
- ref_to_res = build_receiver_ref (orig, false, outer);
+ {
+   ref_to_res = build_receiver_ref (orig, false, outer);
+   if (is_reference (orig))
+ ref_to_res = build_simple_mem_ref (ref_to_res);
  
+   outgoing = var;
+   incoming = omp_reduction_init_op (loc, rcode, TREE_TYPE (var));
+ }
++  /* This is enabled on trunk, but has been disabled in the merge of
++ trunk r229767 into gomp-4_0-branch, as otherwise there were a
++ lot of regressions in libgomp reduction execution tests.  It is
++ unclear if the problem is in the tests themselves, or here, or
++ elsewhere.  Given the usage of "var =
++ OMP_CLAUSE_REDUCTION_PRIVATE_DECL (c)" on gomp-4_0-branch, maybe
++ we have to consider that here, too, instead of "orig"?  */
++#if 0
+   else
+ incoming = outgoing = orig;
++#endif
+ 
  has_outer_reduction:;
  }
-   gcc_assert (!is_reference (var));
+ 
if (!ref_to_res)
  ref_to_res = integer_zero_node;
-   else if (is_reference (orig))
- ref_to_res = build_simple_mem_ref (ref_to_res);
- 
-   enum tree_code rcode = OMP_CLAUSE_REDUCTION_CODE (c);
-   if (rcode == MINUS_EXPR)
- rcode = PLUS_EXPR;
-   else if (rcode == TRUTH_ANDIF_EXPR)
- rcode = BIT_AND_EXPR;
-   else if (rcode == TRUTH_ORIF_EXPR)
- rcode = BIT_IOR_EXPR;
-   tree op = build_int_cst (unsigned_type_node, rcode);
  
/* Determine position in reduction buffer, which may be used
   by target.  */
diff --cc gcc/targhooks.h
index f8efe47a,c34e4ae..4a4496a
--- gcc/targhooks.h
+++ gcc/targhooks.h
@@@ -109,10 -109,9 +109,10 @@@ extern void default_finish_cost (void *
  extern void default_destroy_cost_data (void *);
  
  /* OpenACC hooks.  */
- extern void default_goacc_reduction (gcall *);
  extern bool default_goacc_validate_dims (tree, int [], int);
 +extern unsigned default_goacc_dim_limit (unsigned);
  extern bool default_goacc_fork_join (gcall *, const int [], bool);
+ extern void default_goacc_reduction (gcall *);
  
  /* These are here, and not in hooks.[ch], because not all users of
 hooks.h include tm.h, and thus we don't have CUMULATIVE_ARGS.  */

..., and second, as a "plain patch" 

[PATCH][ARM] PR 68143 Properly update memory offsets when expanding setmem

2015-11-06 Thread Kyrill Tkachov

Hi all,

In this wrong-code PR the vector setmem expansion and 
arm_block_set_aligned_vect in particular
use the wrong offset when calling adjust_automodify_address. In the attached 
testcase during the
initial zeroing out we get two V16QI stores, but they both are recorded by 
adjust_automodify_address
as modifying x+0 rather than x+0 and x+12 (the total size to be written is 28).

This led to the scheduling pass moving the store from "x.g = 2;" to before the 
zeroing stores.

This patch fixes the problem by keeping track of the offset to which stores are 
emitted and
passing it to adjust_automodify_address as appropriate.

From inspection I see arm_block_set_unaligned_vect also has this issue so I 
performed the same
fix in that function as well.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

This bug appears on GCC 5 too and I'm currently testing this patch there.
Ok to backport to GCC 5 as well?

Thanks,
Kyrill

2015-11-06  Kyrylo Tkachov  

PR target/68143
* config/arm/arm.c (arm_block_set_unaligned_vect): Keep track of
offset from dstbase and use it appropriately in
adjust_automodify_address.
(arm_block_set_aligned_vect): Likewise.

2015-11-06  Kyrylo Tkachov  

PR target/68143
* gcc.target/arm/pr68143_1.c: New test.
commit 78c6989a7af1df672ea227057180d79d717ed5f3
Author: Kyrylo Tkachov 
Date:   Wed Oct 28 17:29:18 2015 +

[ARM] Properly update memory offsets when expanding setmem

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 66e8afc..adf3143 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29268,7 +29268,7 @@ arm_block_set_unaligned_vect (rtx dstbase,
   rtx (*gen_func) (rtx, rtx);
   machine_mode mode;
   unsigned HOST_WIDE_INT v = value;
-
+  unsigned int offset = 0;
   gcc_assert ((align & 0x3) != 0);
   nelt_v8 = GET_MODE_NUNITS (V8QImode);
   nelt_v16 = GET_MODE_NUNITS (V16QImode);
@@ -29289,7 +29289,7 @@ arm_block_set_unaligned_vect (rtx dstbase,
 return false;
 
   dst = copy_addr_to_reg (XEXP (dstbase, 0));
-  mem = adjust_automodify_address (dstbase, mode, dst, 0);
+  mem = adjust_automodify_address (dstbase, mode, dst, offset);
 
   v = sext_hwi (v, BITS_PER_WORD);
   val_elt = GEN_INT (v);
@@ -29306,7 +29306,11 @@ arm_block_set_unaligned_vect (rtx dstbase,
 {
   emit_insn ((*gen_func) (mem, reg));
   if (i + 2 * nelt_mode <= length)
-	emit_insn (gen_add2_insn (dst, GEN_INT (nelt_mode)));
+	{
+	  emit_insn (gen_add2_insn (dst, GEN_INT (nelt_mode)));
+	  offset += nelt_mode;
+	  mem = adjust_automodify_address (dstbase, mode, dst, offset);
+	}
 }
 
   /* If there are not less than nelt_v8 bytes leftover, we must be in
@@ -29317,6 +29321,9 @@ arm_block_set_unaligned_vect (rtx dstbase,
   if (i + nelt_v8 < length)
 {
   emit_insn (gen_add2_insn (dst, GEN_INT (length - i)));
+  offset += length - i;
+  mem = adjust_automodify_address (dstbase, mode, dst, offset);
+
   /* We are shifting bytes back, set the alignment accordingly.  */
   if ((length & 1) != 0 && align >= 2)
 	set_mem_align (mem, BITS_PER_UNIT);
@@ -29327,12 +29334,13 @@ arm_block_set_unaligned_vect (rtx dstbase,
   else if (i < length && i + nelt_v8 >= length)
 {
   if (mode == V16QImode)
-	{
-	  reg = gen_lowpart (V8QImode, reg);
-	  mem = adjust_automodify_address (dstbase, V8QImode, dst, 0);
-	}
+	reg = gen_lowpart (V8QImode, reg);
+
   emit_insn (gen_add2_insn (dst, GEN_INT ((length - i)
 	  + (nelt_mode - nelt_v8;
+  offset += (length - i) + (nelt_mode - nelt_v8);
+  mem = adjust_automodify_address (dstbase, V8QImode, dst, offset);
+
   /* We are shifting bytes back, set the alignment accordingly.  */
   if ((length & 1) != 0 && align >= 2)
 	set_mem_align (mem, BITS_PER_UNIT);
@@ -29359,6 +29367,7 @@ arm_block_set_aligned_vect (rtx dstbase,
   rtx rval[MAX_VECT_LEN];
   machine_mode mode;
   unsigned HOST_WIDE_INT v = value;
+  unsigned int offset = 0;
 
   gcc_assert ((align & 0x3) == 0);
   nelt_v8 = GET_MODE_NUNITS (V8QImode);
@@ -29390,14 +29399,15 @@ arm_block_set_aligned_vect (rtx dstbase,
   /* Handle first 16 bytes specially using vst1:v16qi instruction.  */
   if (mode == V16QImode)
 {
-  mem = adjust_automodify_address (dstbase, mode, dst, 0);
+  mem = adjust_automodify_address (dstbase, mode, dst, offset);
   emit_insn (gen_movmisalignv16qi (mem, reg));
   i += nelt_mode;
   /* Handle (8, 16) bytes leftover using vst1:v16qi again.  */
   if (i + nelt_v8 < length && i + nelt_v16 > length)
 	{
 	  emit_insn (gen_add2_insn (dst, GEN_INT (length - nelt_mode)));
-	  mem = adjust_automodify_address (dstbase, mode, dst, 0);
+	  offset += length - nelt_mode;
+	  mem = adjust_automodify_address (dstbase, mode, dst, offset);
 	  /* We are shifting bytes back, set the alignment accordingly.  */
 	  if ((length & 0x3) == 0)
 	set_mem_align (mem, 

Re: [patch] fix regrename pass to ensure renamings produce valid insns

2015-11-06 Thread Bernd Schmidt

On 06/17/2015 07:11 PM, Sandra Loosemore wrote:


Index: gcc/regrename.c
===
--- gcc/regrename.c (revision 224532)
+++ gcc/regrename.c (working copy)
@@ -942,19 +942,22 @@ regrename_do_replace (struct du_head *he
int reg_ptr = REG_POINTER (*chain->loc);

if (DEBUG_INSN_P (chain->insn) && REGNO (*chain->loc) != base_regno)
-   INSN_VAR_LOCATION_LOC (chain->insn) = gen_rtx_UNKNOWN_VAR_LOC ();
+   validate_change (chain->insn, &(INSN_VAR_LOCATION_LOC (chain->insn)),
+gen_rtx_UNKNOWN_VAR_LOC (), true);
else
{
- *chain->loc = gen_raw_REG (GET_MODE (*chain->loc), reg);
+ validate_change (chain->insn, chain->loc,
+  gen_raw_REG (GET_MODE (*chain->loc), reg), true);
  if (regno >= FIRST_PSEUDO_REGISTER)
ORIGINAL_REGNO (*chain->loc) = regno;
  REG_ATTRS (*chain->loc) = attr;


With a patch I'm working on that uses the renamer more often, I found 
that this is causing compare-debug failures. Validating changes to 
debug_insns (the INSN_VAR_LOCATION_LOC in particular) can apparently fail.


The following fix was bootstrapped and tested with -frename-registers 
enabled at -O1 on x86_64-linux. Ok?



Bernd

	* regrename.c (regrename_do_replace): Do not validate changes to
	debug insns.

Index: gcc/regrename.c
===
--- gcc/regrename.c	(revision 229049)
+++ gcc/regrename.c	(working copy)
@@ -946,10 +951,7 @@ regrename_do_replace (struct du_head *he
   struct reg_attrs *attr = REG_ATTRS (*chain->loc);
   int reg_ptr = REG_POINTER (*chain->loc);
 
-  if (DEBUG_INSN_P (chain->insn) && REGNO (*chain->loc) != base_regno)
-	validate_change (chain->insn, &(INSN_VAR_LOCATION_LOC (chain->insn)),
-			 gen_rtx_UNKNOWN_VAR_LOC (), true);
-  else
+  if (!DEBUG_INSN_P (chain->insn))
 	{
 	  validate_change (chain->insn, chain->loc, 
 			   gen_raw_REG (GET_MODE (*chain->loc), reg), true);
@@ -963,6 +965,16 @@ regrename_do_replace (struct du_head *he
   if (!apply_change_group ())
 return false;
 
+  for (chain = head->first; chain; chain = chain->next_use)
+if (DEBUG_INSN_P (chain->insn))
+  {
+	if (REGNO (*chain->loc) != base_regno)
+	  INSN_VAR_LOCATION_LOC (chain->insn) = gen_rtx_UNKNOWN_VAR_LOC ();
+	else
+	  *chain->loc = gen_raw_REG (GET_MODE (*chain->loc), reg);
+	df_insn_rescan (chain->insn);
+  }
+
   mode = GET_MODE (*head->first->loc);
   head->regno = reg;
   head->nregs = hard_regno_nregs[reg][mode];


[gomp4] Re: [3/3] OpenACC reductions

2015-11-06 Thread Thomas Schwinge
Hi Nathan!

On Mon, 2 Nov 2015 11:38:47 -0500, Nathan Sidwell  wrote:
> This patch are the initial set of tests.  The libgomp tests use an idiom of 
> summing thread identifiers and then checking the expected set of threads 
> participated.  They are all derived from the loop tests I recently added for 
> the 
> execution model itself.
> 
> The fortran test was duplicated in both the gfortran testsuite and the 
> libgomp 
> testsuite.   I deleted it from the former.  It was slightly bogus as it asked 
> for a vector-length of 40, and appeared to be working by accident by not 
> actually partitioning the loop.  I fixed that up

On gomp-4_0-branch, you had modified/XFAILed (ICE) that test in r228955,

-- which still needs to be resolved, so I left that as-is, that is, did
not delete the gcc/testsuite/gfortran.dg/goacc/reduction-2.f95 file in
the merge commit.

> and reworked it to avoid 
> needing a reduction on a reference variable.  Reference handling will be a 
> later 
> patch.

As that is -- apparently -- functional on gomp-4_0-branch, I also left
the libgomp/testsuite/libgomp.oacc-fortran/reduction-5.f90 file as-is;
it's also doing more elaborate testing in its gomp-4_0-branch variant.

Merged your trunk r229769 and r229770 into gomp-4_0-branch in r229837,
effectively just adding your new libgomp testsuite files unmodified:

commit a222b569f0234d219fec69cd13b66446f664440d
Merge: 089a022 06d6724
Author: tschwinge 
Date:   Fri Nov 6 09:40:44 2015 +

svn merge -r 229768:229770 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@229837 
138bc75d-0d04-0410-961f-82ee72b054a4

 gcc/testsuite/ChangeLog|  4 ++
 libgomp/ChangeLog  | 11 
 .../libgomp.oacc-c-c++-common/loop-red-g-1.c   | 54 
 .../libgomp.oacc-c-c++-common/loop-red-gwv-1.c | 56 
 .../libgomp.oacc-c-c++-common/loop-red-v-1.c   | 56 
 .../libgomp.oacc-c-c++-common/loop-red-v-2.c   | 59 ++
 .../libgomp.oacc-c-c++-common/loop-red-w-1.c   | 54 
 .../libgomp.oacc-c-c++-common/loop-red-w-2.c   | 57 +
 .../libgomp.oacc-c-c++-common/loop-red-wv-1.c  | 54 
 9 files changed, 405 insertions(+)


Grüße
 Thomas


signature.asc
Description: PGP signature


[gomp4] Re: [2/3] OpenACC reductions

2015-11-06 Thread Thomas Schwinge
Hi Nathan!

On Wed, 4 Nov 2015 11:59:28 -0500, Nathan Sidwell  wrote:
> [PTX backend pieces of OpenACC reduction handling]

Merged your trunk r229768 into gomp-4_0-branch in r229836:

commit 089a0224af68e30b55f42734de48adc645eb7370
Merge: 2b76127 78a78aa
Author: tschwinge 
Date:   Fri Nov 6 09:38:10 2015 +

svn merge -r 229767:229768 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@229836 
138bc75d-0d04-0410-961f-82ee72b054a4

 gcc/ChangeLog|  23 +++
 gcc/config/nvptx/nvptx.c | 169 +++
 2 files changed, 107 insertions(+), 85 deletions(-)

I hope I did the right thing replacing the existing code on
gomp-4_0-branch with what you committed to trunk: in particular, the
nvptx_lockless_update and nvptx_goacc_reduction_init functions.  That is,
in the merge commit, I effectively applied the following patch
(gomp-4_0-branch before vs. after):

--- gcc/ChangeLog
+++ gcc/ChangeLog
[...]
--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -57,21 +57,22 @@
[#include directives reshuffled]
@@ -104,19 +105,18 @@ struct tree_hasher : ggc_cache_ptr_hash
 static GTY((cache)) hash_table *declared_fndecls_htab;
 static GTY((cache)) hash_table *needed_fndecls_htab;
 
-/* Size of buffer needed to broadcast across workers.  This is used
-   for both worker-neutering and worker broadcasting.   It is shared
-   by all functions emitted.  The buffer is placed in shared memory.
-   It'd be nice if PTX supported common blocks, because then this
-   could be shared across TUs (taking the largest size).  */
+/* Buffer needed to broadcast across workers.  This is used for both
+   worker-neutering and worker broadcasting.  It is shared by all
+   functions emitted.  The buffer is placed in shared memory.  It'd be
+   nice if PTX supported common blocks, because then this could be
+   shared across TUs (taking the largest size).  */
 static unsigned worker_bcast_size;
 static unsigned worker_bcast_align;
 #define worker_bcast_name "__worker_bcast"
 static GTY(()) rtx worker_bcast_sym;
 
-/* Size of buffer needed for worker reductions.  This has to be
-   distinct from the worker broadcast array, as both may be live
-   concurrently.  */
+/* Buffer needed for worker reductions.  This has to be distinct from
+   the worker broadcast array, as both may be live concurrently.  */
 static unsigned worker_red_size;
 static unsigned worker_red_align;
 #define worker_red_name "__worker_red"
@@ -3977,8 +3977,8 @@ nvptx_file_end (void)
 {
   /* Define the reduction buffer.  */
 
-  worker_red_size = (worker_red_size + worker_red_align - 1)
-   & ~(worker_red_align - 1);
+  worker_red_size = ((worker_red_size + worker_red_align - 1)
+& ~(worker_red_align - 1));
   
   fprintf (asm_out_file, "// BEGIN VAR DEF: %s\n", worker_red_name);
   fprintf (asm_out_file, ".shared .align %d .u8 %s[%d];\n",
@@ -3986,7 +3986,7 @@ nvptx_file_end (void)
   worker_red_name, worker_red_size);
 }
 }
-
+
 /* Expander for the shuffle builtins.  */
 
 static rtx
@@ -4046,6 +4046,10 @@ nvptx_expand_worker_addr (tree exp, rtx target,
   return target;
 }
 
+/* Expand the CMP_SWAP PTX builtins.  We have our own versions that do
+   not require taking the address of any object, other than the memory
+   cell being operated on.  */
+
 static rtx
 nvptx_expand_cmp_swap (tree exp, rtx target,
   machine_mode ARG_UNUSED (m), int ARG_UNUSED (ignore))
@@ -4096,7 +4100,7 @@ static GTY(()) tree 
nvptx_builtin_decls[NVPTX_BUILTIN_MAX];
 /* Return the NVPTX builtin for CODE.  */
 
 static tree
-nvptx_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
+nvptx_builtin_decl (unsigned code, bool ARG_UNUSED (initialize_p))
 {
   if (code >= NVPTX_BUILTIN_MAX)
 return error_mark_node;
@@ -4110,10 +4114,10 @@ static void
 nvptx_init_builtins (void)
 {
 #define DEF(ID, NAME, T)   \
-  (nvptx_builtin_decls[NVPTX_BUILTIN_ ## ID] = \
-   add_builtin_function ("__builtin_nvptx_" NAME,  \
-build_function_type_list T,\
-NVPTX_BUILTIN_ ## ID, BUILT_IN_MD, NULL, NULL))
+  (nvptx_builtin_decls[NVPTX_BUILTIN_ ## ID]   \
+   = add_builtin_function ("__builtin_nvptx_" NAME,\
+  build_function_type_list T,  \
+  NVPTX_BUILTIN_ ## ID, BUILT_IN_MD, NULL, NULL))
 #define ST sizetype
 #define UINT unsigned_type_node
 #define LLUINT long_long_unsigned_type_node
@@ -4140,7 +4144,7 @@ nvptx_init_builtins (void)
IGNORE is nonzero if the value is to be ignored.  */
 
 static rtx
-nvptx_expand_builtin (tree exp, rtx target, rtx subtarget 

Re: Merge of HSA branch

2015-11-06 Thread Martin Liška
On 11/06/2015 11:12 AM, Bernd Schmidt wrote:
> On 11/05/2015 10:51 PM, Martin Jambor wrote:
>> Individual changes are described in slightly more detail in their
>> respective messages.  If you are interested in how the HSAIL
>> generation works in general, I encourage you to have a look at my
>> Cauldron slides or presentation, only very few things have changed as
>> far as the general principles are concerned.  Let me just quickly stress
>> here that we do acceleration within a single compiler, as opposed to
>> LTO-ways of all the other accelerator teams.
> 
> Realistically we're probably not going to reject this work, but I still want 
> to ask whether the approach was acked by the community before you started. 
> I'm really not exactly thrilled about having two different classes of 
> backends in the compiler, and two different ways of handling offloading.
> 
>> I also acknowledge that we should add HSA-specific tests to the GCC
>> testsuite but we are only now looking at how to do that and will
>> welcome any guidance in this regard.
> 
> Yeah, I was looking for any kind of new test, because...
> 
>> the class of OpenMP loops we can handle well is small,
> 
> I'd appreciate more information on what this means. Any examples or 
> performance numbers?

Hello.

As mentioned by Martin Jambor, it was explained during his speech at the 
Cauldron this year.
It can be easily explained on the following simple case:

#pragma omp target teams
#pragma omp distribute parallel for private(j)
   for (j=0; j 
> 
> Bernd



regrename: Fix for earlyclobber operands

2015-11-06 Thread Bernd Schmidt
I have a patch which makes use of the renamer more often, and this 
exposed a bug with earlyclobber operands. The code that does the 
terminate_write step has the following comment:


  /* Step 5: Close open chains that overlap writes.  Similar to
 step 2, we hide in-out operands, since we do not want to
 close these chains.  We also hide earlyclobber operands,
 since we've opened chains for them in step 1, and earlier
 chains they would overlap with must have been closed at
 the previous insn at the latest, as such operands cannot
 possibly overlap with any input operands.  */

That's all right as far as it goes, but the problem is that this means 
there isn't a terminate_write step for earlyclobbers.


The following seems like the simplest possible fix. It was bootstrapped 
and tested with -frename-registers enabled at -O1 on x86_64-linux. Ok?


(Incidentally there are some avx tests that fail if they are renamed, 
apparently because the scan-assembler doesn't allow register numbers 
like %zmm10. avx512bw-vptestmb-1.c is one of those).



Bernd
	* regrename.c (record_out_operands): Terminate earlyclobbered
	operands here.

Index: gcc/regrename.c
===
--- gcc/regrename.c	(revision 229049)
+++ gcc/regrename.c	(working copy)
@@ -1513,6 +1525,8 @@ record_out_operands (rtx_insn *insn, boo
 	cur_operand = insn_info->op_info + i;
 
   prev_open = open_chains;
+  if (earlyclobber)
+	scan_rtx (insn, loc, cl, terminate_write, OP_OUT);
   scan_rtx (insn, loc, cl, mark_write, OP_OUT);
 
   /* ??? Many targets have output constraints on the SET_DEST


Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 12:43 AM, Abe wrote:

Feedback from Bernd has also been applied.


But inconsistently, and I think not quite in the way I meant it in one case.


-/* Return true if a write into MEM may trap or fault.  */
-
  static bool
  noce_mem_write_may_trap_or_fault_p (const_rtx mem)
  {
-  rtx addr;
-
if (MEM_READONLY_P (mem))
  return true;

-  if (may_trap_or_fault_p (mem))
-return true;
-
+  rtx addr;
addr = XEXP (mem, 0);

/* Call target hook to avoid the effects of -fpic etc  */
@@ -2881,6 +2883,18 @@ noce_mem_write_may_trap_or_fault_p (const_rtx mem)
return false;
  }

+/* Return true if a write into MEM may trap or fault
+   without scratchpad support.  */
+
+static bool
+unsafe_address_p (const_rtx mem)
+{
+  if (may_trap_or_fault_p (mem))
+return true;
+
+  return noce_mem_write_may_trap_or_fault_p (mem);
+}


The naming seems backwards from what I suggested in terms of naming. You 
still haven't explained why you want to modify this function, or call 
the limited one even when generating scratchpads. I asked about this 
last time.



  static basic_block
-find_if_header (basic_block test_bb, int pass)
+find_if_header (basic_block test_bb, int pass, bool just_sz_spad)
  {


Arguments need documentation.


+DEFPARAM (PARAM_FORCE_ENABLE_RTL_IFCVT_SPADS,
+ "force-enable-rtl-ifcvt-spads",
+ "Force-enable the use of scratchpads in RTL if conversion, "
+ "overriding the target and the profile data or lack thereof.",
+ 0, 0, 1)
+

+DEFHOOKPOD
+(rtl_ifcvt_scratchpad_control,
+"*",
+enum rtl_ifcvt_spads_ctl_enum, rtl_ifcvt_spads_as_per_profile)
+
+DEFHOOK
+(rtl_ifcvt_get_spad,
+ "*",
+ rtx, (unsigned short size),
+ default_rtl_ifcvt_get_spad)


That moves the problematic bit in a target hook rather than fixing it. 
Two target hooks and a param is probably a bit much for a change like 
this. Which target do you actually want this for? It would help to 
understand why you're doing all this.



+enum rtl_ifcvt_spads_ctl_enum {


Names are still too verbose ("enum" shouldn't be part of it).


+rtx default_rtl_ifcvt_get_spad (unsigned short size)
+{
+  return assign_stack_local (BLKmode, size, 0);
+}


Formatting problem, here and in a few other places. I didn't fully read 
the patch this time around.


I'm probably not reviewing further patches because I don't see this 
progressing to a state where it's acceptable. Others may do so, but as 
far as I'm concerned the patch is rejected.



Bernd


Re: Merge of HSA branch

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 11:30 AM, Richard Biener wrote:

On Fri, 6 Nov 2015, Bernd Schmidt wrote:


Realistically we're probably not going to reject this work, but I still want
to ask whether the approach was acked by the community before you started. I'm
really not exactly thrilled about having two different classes of backends in
the compiler, and two different ways of handling offloading.


Realistically the other approaches werent acked either (well, implicitely
by review).


I think the LTO approach was discussed beforehand. As far as I remember 
(and Jakub may correct me) it was considered for intelmic, and Jakub had 
considerable input on it. I heard that it came up at the 2013 Cauldron.
Writing an rtl backend is the default thing to do for gcc and I would 
expect any other approach to be discussed beforehand.



Not doing an RTL backend for NVPTX would have simplified
your life as well.


I'm not convinced about this. At least I just had to turn off the 
register allocator, not write a new one.



Bernd


Re: [PATCH] Make BB vectorizer work on sub-BBs

2015-11-06 Thread Richard Biener
On Fri, 6 Nov 2015, Richard Biener wrote:

> 
> The following patch makes the BB vectorizer not only handle BB heads
> (until the first stmt with a data reference it cannot handle) but
> arbitrary regions in a BB separated by such stmts.
> 
> This improves the number of BB vectorizations from 469 to 556
> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray 
> 1x481.wrf failing both patched and unpatched (have to update my
> config used for such experiments it seems ...)
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
> 
> I'm currently re-testing for a cosmetic change I made when writing
> the changelog.
> 
> I expected (and there are) some issues with compile-time.  Left
> is unpatched and right is patched.
> 
> '403.gcc': 00:00:54 (54)  | '403.gcc': 00:00:55 (55)
> '483.xalancbmk': 00:02:20 (140)   | '483.xalancbmk': 00:02:24 (144)
> '416.gamess': 00:02:36 (156)  | '416.gamess': 00:02:37 (157)
> '435.gromacs': 00:00:18 (18)  | '435.gromacs': 00:00:19 (19)
> '447.dealII': 00:01:31 (91)   | '447.dealII': 00:01:33 (93)
> '453.povray': 00:04:54 (294)  | '453.povray': 00:08:54 (534)
> '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52)
> '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119)
> 
> other benchmarks are unchanged.  I'm double-checking now that a followup
> patch I have which re-implements BB vectorization dependence checking
> fixes this (that's the only quadraticness I know of).

Fixes all but

'453.povray': 00:04:54 (294)  | '453.povray': 00:06:46 (406)

it even improves compile-time on some:

'464.h264ref': 00:00:26 (26)  | '464.h264ref': 00:00:21 (21)

it also increases the number of vectorized BBs to 722.

Needs some work still though.

Richard.

> Richard.
> 
> 2015-11-06  Richard Biener  
> 
>   * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
>   members.
>   (vect_stmt_in_region_p): Declare.
>   * tree-vect-slp.c (new_bb_vec_info): Work on a region.
>   (destroy_bb_vec_info): Likewise.
>   (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
>   (vect_get_and_check_slp_defs): Likewise.
>   (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
>   (vect_slp_bb): Likewise.
>   * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
>   in terms of vect_stmt_in_region_p.
>   (vect_pattern_recog): Iterate over the BB region.
>   * tree-vect-stmts.c (vect_is_simple_use): Use vect_stmt_in_region_p.
>   * tree-vectorizer.c (vect_stmt_in_region_p): New function.
>   (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.
> 
>   * config/i386/i386.c: Include gimple-iterator.h.
>   * config/aarch64/aarch64.c: Likewise.
> 
>   * gcc.dg/vect/bb-slp-38.c: New testcase.
> 
> Index: gcc/tree-vectorizer.h
> ===
> *** gcc/tree-vectorizer.h.orig2015-11-05 09:52:00.640227178 +0100
> --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100
> *** nested_in_vect_loop_p (struct loop *loop
> *** 390,395 
> --- 390,397 
>   typedef struct _bb_vec_info : public vec_info
>   {
> basic_block bb;
> +   gimple_stmt_iterator region_begin;
> +   gimple_stmt_iterator region_end;
>   } *bb_vec_info;
>   
>   #define BB_VINFO_BB(B)   (B)->bb
> *** void vect_pattern_recog (vec_info *);
> *** 1085,1089 
> --- 1087,1092 
>   /* In tree-vectorizer.c.  */
>   unsigned vectorize_loops (void);
>   void vect_destroy_datarefs (vec_info *);
> + bool vect_stmt_in_region_p (vec_info *, gimple *);
>   
>   #endif  /* GCC_TREE_VECTORIZER_H  */
> Index: gcc/tree-vect-slp.c
> ===
> *** gcc/tree-vect-slp.c.orig  2015-11-05 09:52:00.640227178 +0100
> --- gcc/tree-vect-slp.c   2015-11-06 10:22:56.707880233 +0100
> *** vect_get_and_check_slp_defs (vec_info *v
> *** 209,215 
> unsigned int i, number_of_oprnds;
> gimple *def_stmt;
> enum vect_def_type dt = vect_uninitialized_def;
> -   struct loop *loop = NULL;
> bool pattern = false;
> slp_oprnd_info oprnd_info;
> int first_op_idx = 1;
> --- 209,214 
> *** vect_get_and_check_slp_defs (vec_info *v
> *** 218,226 
> bool first = stmt_num == 0;
> bool second = stmt_num == 1;
>   
> -   if (is_a  (vinfo))
> - loop = LOOP_VINFO_LOOP (as_a  (vinfo));
> - 
> if (is_gimple_call (stmt))
>   {
> number_of_oprnds = gimple_call_num_args (stmt);
> --- 217,222 
> *** again:
> *** 276,286 
>from the pattern.  Check that all the stmts of the node are in the
>pattern.  */
> if (def_stmt && gimple_bb (def_stmt)
> !   && ((is_a  

regrename: don't overflow insn_rr_info

2015-11-06 Thread Bernd Schmidt
This one is a fix for something that could currently only affect c6x, 
but I have code that exposes it on i386.


When optionally gathering operand info in regrename, we can overflow the 
array in certain situations. This can occur when we have a situation 
where a value is constructed in multiple small registers and then 
accessed as a larger one (CDImode in the testcase I have). In that case 
we enter the "superset" path, which fails the involved chains, but the 
smaller pieces still all get seen by record_operand_use, and there may 
be more of them than MAX_REGS_PER_ADDRESS.


The following fixes it. Bootstrapped and tested  with -frename-registers 
enabled at -O1 on x86_64-linux. Ok?



Bernd
	* regrename.c (record_operand_use): Keep track of failed operands
	and stop appending if we see any.
	* regrename.h (struct operand_rr_info): Add a failed field and shrink
	n_chains to short.

Index: gcc/regrename.c
===
--- gcc/regrename.c	(revision 229049)
+++ gcc/regrename.c	(working copy)
@@ -204,8 +204,13 @@ mark_conflict (struct du_head *chains, u
 static void
 record_operand_use (struct du_head *head, struct du_chain *this_du)
 {
-  if (cur_operand == NULL)
+  if (cur_operand == NULL || cur_operand->failed)
 return;
+  if (head->cannot_rename)
+{
+  cur_operand->failed = true;
+  return;
+}
   gcc_assert (cur_operand->n_chains < MAX_REGS_PER_ADDRESS);
   cur_operand->heads[cur_operand->n_chains] = head;
   cur_operand->chains[cur_operand->n_chains++] = this_du;
Index: gcc/regrename.h
===
--- gcc/regrename.h	(revision 229049)
+++ gcc/regrename.h	(working copy)
@@ -68,7 +71,8 @@ struct du_chain
 struct operand_rr_info
 {
   /* The number of chains recorded for this operand.  */
-  int n_chains;
+  short n_chains;
+  bool failed;
   /* Holds either the chain for the operand itself, or for the registers in
  a memory operand.  */
   struct du_chain *chains[MAX_REGS_PER_ADDRESS];


Re: regrename: don't overflow insn_rr_info

2015-11-06 Thread Ramana Radhakrishnan


On 06/11/15 11:08, Bernd Schmidt wrote:
> This one is a fix for something that could currently only affect c6x, but I 
> have code that exposes it on i386.
> 
> When optionally gathering operand info in regrename, we can overflow the 
> array in certain situations. This can occur when we have a situation where a 
> value is constructed in multiple small registers and then accessed as a 
> larger one (CDImode in the testcase I have). In that case we enter the 
> "superset" path, which fails the involved chains, but the smaller pieces 
> still all get seen by record_operand_use, and there may be more of them than 
> MAX_REGS_PER_ADDRESS.
> 
> The following fixes it. Bootstrapped and tested  with -frename-registers 
> enabled at -O1 on x86_64-linux. Ok?
> 
> 
> Bernd

This sounds like it will fix http://gcc.gnu.org/PR66785 ...

Ramana


Re: [PATCH] Fix PR68067

2015-11-06 Thread Richard Biener
On Fri, 6 Nov 2015, Alan Lawrence wrote:

> On 06/11/15 10:39, Richard Biener wrote:
> > > ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error:
> > > location
> > > references block not in block tree
> > > l1_279 = PHI <1(28), l1_299(33)>
> > 
> > ^^^
> > 
> > this is the error to look at!  It means that the GC heap will be corrupted
> > quite easily.
> 
> Thanks, I'll have a go at that.
> 
> -fdump-tree-alias is also suspicious, hinting that that does more than just
> print. (How many bugs here??)

Well, it only allocates some more heap memory for extra debug verbosity.
I think this can happen with other dumps as well.

> > Interesting would be for which pass this happens - just print
> > *pass at this point.
> 
> FWIW - unswitch. (A long time after -fdump-tree-alias!)

Ah, unswitch again... :/

> Cheers, Alan
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: libgo patch committed: Update to Go 1.5 release

2015-11-06 Thread Rainer Orth
Ian Lance Taylor  writes:

> I have committed a patch to libgo to update it to the Go 1.5 release.
>
> As usual for libgo updates, the actual patch is too large to attach to
> this e-mail message.  I've attached the changes to the gccgo-specific
> files.
>
> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> to mainline.
>
> This may cause trouble on non-GNU/Linux operating systems.  Please let
> me know about any problems you encounter.

It does indeed (first tried on i386-pc-solaris2.10):

* 

/vol/gcc/src/hg/trunk/local/libgo/runtime/go-varargs.c: In function 
'__go_ioctl':
/vol/gcc/src/hg/trunk/local/libgo/runtime/go-varargs.c:63:10: error: implicit 
declaration of function 'ioctl' [-Werror=implicit-function-declaration]
   return ioctl (d, request, arg);
  ^

  Needs , the following patch works:

diff --git a/libgo/runtime/go-varargs.c b/libgo/runtime/go-varargs.c
--- a/libgo/runtime/go-varargs.c
+++ b/libgo/runtime/go-varargs.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* The syscall package calls C functions.  The Go compiler can not
represent a C varargs functions.  On some systems it's important

* 

/vol/gcc/src/hg/trunk/local/libgo/go/syscall/exec_bsd.go:107:7: error: 
incompatible types in assignment (cannot use type int as type Pid_t)
r1 = raw_getpid()
   ^

I can cast to Pid_t and this works.  The underlying error to me seems
that raw_getpid the in the generated libcalls.go is wrong, casting
c_getpid return value to int while pid_t can be long.

* 

/vol/gcc/src/hg/trunk/local/libgo/go/net/hook_cloexec.go:13:70: error: 
reference to undefined identifier 'syscall.Accept4'
  accept4Func func(int, int) (int, syscall.Sockaddr, error) = syscall.Accept4
  ^
  
No accept4 on Solaris (and certainly other systems, thence configure
test), but used unconditionally.

* 

/vol/gcc/src/hg/trunk/local/libgo/go/net/sendfile_solaris.go:78:22: error: 
reference to undefined identifier 'syscall.Sendfile'
   n, err1 := syscall.Sendfile(dst, src, , n)
  ^

Only in go/syscall/libcall_linux.go!?

* 

/vol/gcc/src/hg/trunk/local/libgo/go/net/tcpsockopt_solaris.go:34:103: error: 
reference to undefined identifier 'syscall.TCP_KEEPALIVE_THRESHOLD'
  return os.NewSyscallError("setsockopt", syscall.SetsockoptInt(fd.sysfd, 
syscall.IPPROTO_TCP, syscall.TCP_KEEPALIVE_THRESHOLD, msecs))

   ^
  
Not in Solaris 10, only Solaris 11 and 12 have it.

  Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [openacc] tile, independent, default, private and firstprivate support in c/++

2015-11-06 Thread Nathan Sidwell

On 11/06/15 01:50, Jakub Jelinek wrote:

On Thu, Nov 05, 2015 at 06:10:49PM -0800, Cesar Philippidis wrote:

I've applied this patch to trunk. It also includes the fortran and
template changes. Note that there is a new regression in
gfortran.dg/goacc/combined_loop.f90. Basically, the gimplifier is
complaining about reduction variables appearing in multiple clauses.
E.g. 'acc parallel reduction(+:var) copy(var)'. Nathan's upcoming
gimplifier changes should address that.


If you are relying on the OMP_CLAUSE_MAP_PRIVATE flag that I've added
on gomp-4_1-branch and then removed yesterday, feel free to re-add it,
but of course never set it for OpenMP, just OpenACC constructs
(so for OpenMP keep the gimplifier assertion, for OpenACC set it).



FWIW not noticed a problem with my firstprivate reworking, rebased ontop 
yesterday's  openmp merge


nathan

--
Nathan Sidwell


Re: [gomp4 06/14] omp-low: copy omp_data_o to shared memory on NVPTX

2015-11-06 Thread Jakub Jelinek
On Fri, Nov 06, 2015 at 03:00:30PM +0100, Bernd Schmidt wrote:
> >Sanity-checked by running the libgomp testsuite.  I realize the #ifdef in
> >internal-fn.c is not appropriate: it's there to make the patch smaller, I'll
> >replace it with a target hook if otherwise this approach is ok.
> 
> FWIW, no objections from me regarding the approach.

As I said on IRC, I fear it is not a general solution, and will try to write
a testcase that demonstrates that soon.
That said, as a temporary partial workaround it might be acceptable, but
1) there really should be a target hook
2) it should never be emitted when not in target regions (declare target
   functions or when inside of OpenMP target region)
3) it should be folded away as soon as possible for the non-PTX
   targets (both host and say XeonPhi), so in the openacc pass shortly post 
   IPA (or look at if something that will come with the HSA merge could help 
there)

Jakub


Re: [gomp4 05/14] omp-low: set 'omp target entrypoint' only on entypoints

2015-11-06 Thread Bernd Schmidt

On 10/30/2015 05:44 PM, Alexander Monakov wrote:

+  /* Ignore "omp target entrypoint" here: OpenMP target region functions are
+ called from gomp_nvptx_main.  The corresponding kernel entry is emitted
+ from write_omp_entry.  */
  }


I'm probably confused, but didn't we agree that this should be changed 
so that the entry point isn't gomp_nvptx_main but instead something that 
wraps a call to that function?


This patch creates a new "omp target entrypoint" annotation that appears 
not to be used - it would be better to just not annotate a function if 
it's not going to need entrypoint treatment. IMO a single type of 
attribute should be sufficient for that.



Bernd



Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check

2015-11-06 Thread Nikolai Bozhenov

On 11/06/2015 08:09 PM, Kyrill Tkachov wrote:


On 06/11/15 17:07, Nikolai Bozhenov wrote:

On 11/06/2015 04:46 PM, Ramana Radhakrishnan wrote:

Hi!

I faced the same issue but I had somewhat different RTL for the 
consumer:


 (insn 20 15 21 2 (set (reg/i:SI 0 r0)
 (minus:SI (subreg:SI (reg:DI 117) 4)
 (mult:SI (reg:SI 123)
 (reg:SI 114 gasman.c:4 48 {*mulsi3subsi})

where (reg:DI 117) is produced by umulsidi3_v6 instruction. Is it
really true that (subreg:SI (reg:DI 117) 4) may be forwarded in one
cycle in this case?
If the accumulator can be forwarded (i.e. a SImode register), there 
isn't a reason why a subreg:SI (reg:DI) will not get forwarded.


The subreg:SI is an artifact before register allocation, thus it's a 
representation issue that the patch is fixing here unless I 
misunderstand your question.



I mean, in my example it is not the multiplication result that is
forwarded but its upper part. So, shouldn't we check that offset in a
subreg expression is zero? Or is it ok to forward only the upper part
of a multiplication?


Could you please post the full RTL instruction we're talking about 
here as it appears in the scheduler dump?

So that we're all on the same page about which case we're talking about.



Sure. I'm talking about a sequence like this:

(insn 8 13 11 2 (set (reg:DI 117)
(mult:DI (zero_extend:DI (reg:SI 116))
(zero_extend:DI (reg:SI 118 test.c:2 54 {*umulsidi3_v6}
 (expr_list:REG_DEAD (reg:SI 118)
(expr_list:REG_DEAD (reg:SI 116)
(expr_list:REG_EQUAL (mult:DI (zero_extend:DI (reg:SI 116))
(const_int 1374400 [0x14f8c0]))
(nil)
(insn 11 8 12 2 (set (reg:DI 120)
(mult:DI (zero_extend:DI (subreg:SI (reg:DI 117) 4))
(zero_extend:DI (reg:SI 121 test.c:2 54 {*umulsidi3_v6}
 (expr_list:REG_DEAD (reg:SI 121)
(expr_list:REG_EQUAL (mult:DI (zero_extend:DI (subreg:SI 
(reg:DI 117) 4))

(const_int 3435973837 [0xcccd]))
(nil
(insn 12 11 20 2 (set (reg:SI 114)
(lshiftrt:SI (subreg:SI (reg:DI 120) 4)
(const_int 3 [0x3]))) test.c:2 126 {*arm_shiftsi3}
 (expr_list:REG_DEAD (reg:DI 120)
(expr_list:REG_EQUAL (udiv:SI (subreg:SI (reg:DI 117) 4)
(const_int 10 [0xa]))
(nil
(insn 20 12 21 2 (set (reg/i:SI 0 r0)
(minus:SI (subreg:SI (reg:DI 117) 4)
(mult:SI (reg:SI 123)
(reg:SI 114 test.c:3 48 {*mulsi3subsi}
 (expr_list:REG_DEAD (reg:SI 123)
(expr_list:REG_DEAD (reg:DI 117)
(expr_list:REG_DEAD (reg:SI 114)
(nil)

The last insn is a MLA-like operation which is tested by
aarch_accumulator_forwarding if it is a valid forwarding consumer.

Thanks,
Nikolai


Re: regrename: don't overflow insn_rr_info

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 12:17 PM, Ramana Radhakrishnan wrote:

On 06/11/15 11:08, Bernd Schmidt wrote:

This one is a fix for something that could currently only affect c6x, but I 
have code that exposes it on i386.

When optionally gathering operand info in regrename, we can overflow the array in certain 
situations. This can occur when we have a situation where a value is constructed in 
multiple small registers and then accessed as a larger one (CDImode in the testcase I 
have). In that case we enter the "superset" path, which fails the involved 
chains, but the smaller pieces still all get seen by record_operand_use, and there may be 
more of them than MAX_REGS_PER_ADDRESS.

The following fixes it. Bootstrapped and tested  with -frename-registers 
enabled at -O1 on x86_64-linux. Ok?


Bernd


This sounds like it will fix http://gcc.gnu.org/PR66785 ...


Ah, I didn't realize something else was using this functionality:

gcc/config/aarch64/cortex-a57-fma-steering.c
1025:  regrename_init (true);

Yeah, the description of that bug makes it sound like the same issue.


Bernd


RE: [PATCH, MIPS, PR/61114] Migrate to reduc_..._scal optabs.

2015-11-06 Thread Simon Dardis
Committed r229844.

Thanks,
Simon

> -Original Message-
> From: Moore, Catherine [mailto:catherine_mo...@mentor.com]
> Sent: 03 November 2015 14:09
> To: Simon Dardis; Alan Lawrence; Matthew Fortune
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH, MIPS, PR/61114] Migrate to reduc_..._scal optabs.
> 
> 
> 
> > -Original Message-
> > From: Simon Dardis [mailto:simon.dar...@imgtec.com]
> > Sent: Wednesday, October 07, 2015 6:51 AM
> > To: Alan Lawrence; Matthew Fortune; Moore, Catherine
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH, MIPS, PR/61114] Migrate to reduc_..._scal optabs.
> >
> > On the change from smin/smax it was a deliberate change as I managed
> > to confuse myself of the mode patterns, correct version follows.
> > Reverted back to VWHB for smax/smin. Stylistic point addressed.
> >
> > No new regression, ok for commit?
> >
> 
> Yes, OK to commit.  Sorry for the delay in review.
> Catherine
> 
> >
> > Index: config/mips/loongson.md
> >
> ==
> > =
> > --- config/mips/loongson.md (revision 228282)
> > +++ config/mips/loongson.md (working copy)
> > @@ -852,58 +852,66 @@
> >"dsrl\t%0,%1,%2"
> >[(set_attr "type" "fcvt")])
> >
> > -(define_expand "reduc_uplus_"
> > -  [(match_operand:VWH 0 "register_operand" "")
> > -   (match_operand:VWH 1 "register_operand" "")]
> > +(define_insn "vec_loongson_extract_lo_"
> > +  [(set (match_operand: 0 "register_operand" "=r")
> > +(vec_select:
> > +  (match_operand:VWHB 1 "register_operand" "f")
> > +  (parallel [(const_int 0)])))]
> >"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> > -{
> > -  mips_expand_vec_reduc (operands[0], operands[1],
> gen_add3);
> > -  DONE;
> > -})
> > +  "mfc1\t%0,%1"
> > +  [(set_attr "type" "mfc")])
> >
> > -; ??? Given that we're not describing a widening reduction, we should
> > -; not have separate optabs for signed and unsigned.
> > -(define_expand "reduc_splus_"
> > -  [(match_operand:VWHB 0 "register_operand" "")
> > +(define_expand "reduc_plus_scal_"
> > +  [(match_operand: 0 "register_operand" "")
> > (match_operand:VWHB 1 "register_operand" "")]
> >"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> >  {
> > -  emit_insn (gen_reduc_uplus_(operands[0], operands[1]));
> > +  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
> > + mips_expand_vec_reduc (tmp, operands[1], gen_add3);
> emit_insn
> > + (gen_vec_loongson_extract_lo_ (operands[0], tmp));
> >DONE;
> >  })
> >
> > -(define_expand "reduc_smax_"
> > -  [(match_operand:VWHB 0 "register_operand" "")
> > +(define_expand "reduc_smax_scal_"
> > +  [(match_operand: 0 "register_operand" "")
> > (match_operand:VWHB 1 "register_operand" "")]
> >"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> >  {
> > -  mips_expand_vec_reduc (operands[0], operands[1],
> gen_smax3);
> > +  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
> > + mips_expand_vec_reduc (tmp, operands[1], gen_smax3);
> > + emit_insn (gen_vec_loongson_extract_lo_ (operands[0], tmp));
> >DONE;
> >  })
> >
> > -(define_expand "reduc_smin_"
> > -  [(match_operand:VWHB 0 "register_operand" "")
> > +(define_expand "reduc_smin_scal_"
> > +  [(match_operand: 0 "register_operand" "")
> > (match_operand:VWHB 1 "register_operand" "")]
> >"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> >  {
> > -  mips_expand_vec_reduc (operands[0], operands[1],
> gen_smin3);
> > +  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
> > + mips_expand_vec_reduc (tmp, operands[1], gen_smin3);
> > + emit_insn (gen_vec_loongson_extract_lo_ (operands[0], tmp));
> >DONE;
> >  })
> >
> > -(define_expand "reduc_umax_"
> > -  [(match_operand:VB 0 "register_operand" "")
> > +(define_expand "reduc_umax_scal_"
> > +  [(match_operand: 0 "register_operand" "")
> > (match_operand:VB 1 "register_operand" "")]
> >"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> >  {
> > -  mips_expand_vec_reduc (operands[0], operands[1],
> gen_umax3);
> > +  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
> > + mips_expand_vec_reduc (tmp, operands[1], gen_umax3);
> > + emit_insn (gen_vec_loongson_extract_lo_ (operands[0], tmp));
> >DONE;
> >  })
> >
> > -(define_expand "reduc_umin_"
> > -  [(match_operand:VB 0 "register_operand" "")
> > +(define_expand "reduc_umin_scal_"
> > +  [(match_operand: 0 "register_operand" "")
> > (match_operand:VB 1 "register_operand" "")]
> >"TARGET_HARD_FLOAT && TARGET_LOONGSON_VECTORS"
> >  {
> > -  mips_expand_vec_reduc (operands[0], operands[1],
> gen_umin3);
> > +  rtx tmp = gen_reg_rtx (GET_MODE (operands[1]));
> > + mips_expand_vec_reduc (tmp, operands[1], gen_umin3);
> > + emit_insn (gen_vec_loongson_extract_lo_ (operands[0], tmp));
> >DONE;
> >  })
> >
> >
> > -Original Message-
> > From: Alan Lawrence [mailto:alan.lawre...@arm.com]
> > Sent: 06 October 2015 11:12
> > To: Simon Dardis; Matthew Fortune; Moore, Catherine
> > Cc: gcc-patches@gcc.gnu.org
> > 

Re: [PATCH] Fix PR68067

2015-11-06 Thread Alan Lawrence

On 06/11/15 10:39, Richard Biener wrote:

../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: location
references block not in block tree
l1_279 = PHI <1(28), l1_299(33)>


^^^

this is the error to look at!  It means that the GC heap will be corrupted
quite easily.


Thanks, I'll have a go at that.

-fdump-tree-alias is also suspicious, hinting that that does more than just 
print. (How many bugs here??)



Interesting would be for which pass this happens - just print
*pass at this point.


FWIW - unswitch. (A long time after -fdump-tree-alias!)

Cheers, Alan



Re: [PATCH] Fix PR ipa/68035

2015-11-06 Thread Richard Biener
On Fri, Nov 6, 2015 at 10:10 AM, Martin Liška  wrote:
> Hello.
>
> Following patch triggers hash calculation of items (functions and variables)
> in situations where LTO mode is not utilized.
>
> Patch survives regression tests and bootstraps on x86_64-linux-pc.

Why does that make a difference?  Do we have direct ->hash users
that should have used get_hash ()?

> Ready for trunk?
> Thanks,
> Martin


Re: [PATCH PR52272]Be smart when adding iv candidates

2015-11-06 Thread Richard Biener
On Wed, Nov 4, 2015 at 11:18 AM, Bin Cheng  wrote:
> Hi,
> PR52272 reported a performance regression in spec2006/410.bwaves once GCC is
> prevented from representing address of one memory object using address of
> another memory object.  Also as I commented in that PR, we have two possible
> fixes for this:
> 1) Improve how TMR.base is deduced, so that we can represent addr of mem obj
> using another one, while not breaking PR50955.
> 2) Add iv candidates with base object stripped.  In this way, we use the
> common base-stripped part to represent all address expressions, in the form
> of [base_1 + common], [base_2 + common], ..., [base_n + common].
>
> In terms of code generation, method 2) is at least as good as 1), actually
> better in my opinion.  The problem of 2) is we need to tell when iv
> candidates should be added for the common part and when shouldn't.  This
> issue can be generalized and described as: We know IVO tries to add
> candidates by deriving from iv uses.  One disadvantage is that candidates
> are derived from iv use independently.  It doesn't take common sub
> expression among different iv uses into consideration.  As a result,
> candidate for common sub expression is not added, while many useless
> candidates are added.
>
> As a matter of fact, candidate derived from iv use is useful only if it's
> common enough and could be shared among different uses.  A candidate is most
> likely useless if it's derived from a single use and could not be shared by
> others.  This patch works in this way by firstly recording all kinds
> candidates derived from iv uses, then adding candidates for common ones.
>
> The patch improves 410.bwaves by 3-4% on x86_64.  I also saw regression for
> 400.perlbench and small regression for 401.bzip on x86_64, but I can confirm
> they are false alarms caused by align issues.
> For aarch64, fp cases are obviously improved for both spec2000 and spec2006.
> Also the patch causes 2-3% regression for 459.GemsFDTD, which I think is
> another irrelevant issue caused by heuristic candidate selecting algorithm.
> Unfortunately, I don't have fix to it currently.
>
> This patch may add more candidates in some cases, but generally candidates
> number is smaller because we don't need to add useless candidates now.
> Statistic data shows there are quite fewer loops with more than 30
> candidates when building spec2k6 on x86_64 using this patch.
>
> Bootstrap and test on x86_64.  I will re-test it against latest trunk on
> AArch64.  Is it OK?

+inline bool
+iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
+  const iv_common_cand *ccand2)
+{
+  return ccand1->hash == ccand2->hash
+&& operand_equal_p (ccand1->base, ccand2->base, 0)
+&& operand_equal_p (ccand1->step, ccand2->step, 0)
+&& TYPE_PRECISION (TREE_TYPE (ccand1->base))
+ == TYPE_PRECISION (TREE_TYPE (ccand2->base));

I'm wondering on the TYPE_PRECISION check.  a) why is that needed?
and b) what kind of tree is base so that it is safe to inspect TYPE_PRECISION
unconditionally?

+  slot = data->iv_common_cand_tab->find_slot (, INSERT);
+  if (*slot == NULL)
+{
+  *slot = XNEW (struct iv_common_cand);

allocate from the IV obstack instead?  I see we do a lot of heap allocations
in IVOPTs, so we can improve that as followup as well.

We probably should empty the obstack after each processed loop.

Thanks,
Richard.


> Thanks,
> bin
>
> 2015-11-03  Bin Cheng  
>
> PR tree-optimization/52272
> * tree-ssa-loop-ivopts.c (struct iv_common_cand): New struct.
> (struct iv_common_cand_hasher): New struct.
> (iv_common_cand_hasher::hash): New function.
> (iv_common_cand_hasher::equal): New function.
> (struct ivopts_data): New fields, iv_common_cand_tab and
> iv_common_cands.
> (tree_ssa_iv_optimize_init): Initialize above fields.
> (record_common_cand, common_cand_cmp): New functions.
> (add_iv_candidate_derived_from_uses): New function.
> (add_iv_candidate_for_use): Record iv_common_cands derived from
> iv use in hash table, instead of adding candidates directly.
> (add_iv_candidate_for_uses): Call
> add_iv_candidate_derived_from_uses.
> (record_important_candidates): Add important candidates to iv uses'
> related_cands.  Always keep related_cands for future use.
> (try_add_cand_for): Use iv uses' related_cands.
> (free_loop_data, tree_ssa_iv_optimize_finalize): Release new fields
> in struct ivopts_data, iv_common_cand_tab and iv_common_cands.


Remove noce_mem_write_may_trap_or_fault_p in ifcvt

2015-11-06 Thread Bernd Schmidt
The ifcvt scratchpad patch had some modifications to the function 
noce_mem_write_may_trap_or_fault_p in ifcvt, but as far as I can tell, 
that function itself makes no sense whatsoever. It returns true for 
MEM_READONLY_P which is sensible, but then it also goes on an unreliable 
search through the address, looking for SYMBOL_REFs that are in 
decl_readonly_section. Needless to say, this won't ever find anything on 
targets that don't allow symbolic addresses, and is not reliable even on 
targets that do. The MEM_READONLY_P test must suffice; if there's a 
reason why it doesn't, we'd need to figure out why and possibly disable 
if-conversion of stores more thoroughly.


As a sanity check I bootstrapped and tested the first of the two 
attached patches, which changes the "return true" paths to 
gcc_unreachable. Since that passed, I propose the second of the two 
attached patches, which removes the function and replaces it with a 
simpler check. Ok if that passes too?



Bernd
Index: gcc/ifcvt.c
===
--- gcc/ifcvt.c	(revision 229049)
+++ gcc/ifcvt.c	(working copy)
@@ -2868,11 +2868,12 @@ noce_mem_write_may_trap_or_fault_p (cons
 	  return false;
 	break;
   case LABEL_REF:
-	return true;
+	gcc_unreachable ();
   case SYMBOL_REF:
 	if (SYMBOL_REF_DECL (addr)
 	&& decl_readonly_section (SYMBOL_REF_DECL (addr), 0))
-	  return true;
+	  gcc_unreachable ();
+
 	return false;
   default:
 	return false;
	* ifcvt.c (noce_mem_write_may_trap_or_fault_p): Delete function.
	(noce_process_if_block): Don't call it, test MEM_READONLY_P and
	may_trap_or_fault_p instead.

Index: gcc/ifcvt.c
===
--- gcc/ifcvt.c	(revision 229049)
+++ gcc/ifcvt.c	(working copy)
@@ -2828,59 +2828,6 @@ noce_operand_ok (const_rtx op)
   return ! may_trap_p (op);
 }
 
-/* Return true if a write into MEM may trap or fault.  */
-
-static bool
-noce_mem_write_may_trap_or_fault_p (const_rtx mem)
-{
-  rtx addr;
-
-  if (MEM_READONLY_P (mem))
-return true;
-
-  if (may_trap_or_fault_p (mem))
-return true;
-
-  addr = XEXP (mem, 0);
-
-  /* Call target hook to avoid the effects of -fpic etc  */
-  addr = targetm.delegitimize_address (addr);
-
-  while (addr)
-switch (GET_CODE (addr))
-  {
-  case CONST:
-  case PRE_DEC:
-  case PRE_INC:
-  case POST_DEC:
-  case POST_INC:
-  case POST_MODIFY:
-	addr = XEXP (addr, 0);
-	break;
-  case LO_SUM:
-  case PRE_MODIFY:
-	addr = XEXP (addr, 1);
-	break;
-  case PLUS:
-	if (CONST_INT_P (XEXP (addr, 1)))
-	  addr = XEXP (addr, 0);
-	else
-	  return false;
-	break;
-  case LABEL_REF:
-	return true;
-  case SYMBOL_REF:
-	if (SYMBOL_REF_DECL (addr)
-	&& decl_readonly_section (SYMBOL_REF_DECL (addr), 0))
-	  return true;
-	return false;
-  default:
-	return false;
-  }
-
-  return false;
-}
-
 /* Return whether we can use store speculation for MEM.  TOP_BB is the
basic block above the conditional block where we are considering
doing the speculative store.  We look for whether MEM is set
@@ -3210,7 +3157,8 @@ noce_process_if_block (struct noce_if_in
 	 THEN and ELSE arms, then we can go ahead with the conversion;
 	 either the program is broken, or the condition is always
 	 false such that the other memory is selected.  */
-  if (noce_mem_write_may_trap_or_fault_p (orig_x))
+  if (may_trap_or_fault_p (orig_x)
+	  || MEM_READONLY_P (orig_x))
 	return FALSE;
 
   /* Avoid store speculation: given "if (...) x = a" where x is a


[Patch] Change to argument promotion in fixed conversion library calls

2015-11-06 Thread Steve Ellcey
This is part one of a two part patch I want to checkin in order to improve
argument passing on MIPS.  Right now MIPS defines TARGET_PROMOTE_PROTOTYPES
as true and so short and char arguments are getting promoted in the caller and
callee functions.  The ABI requires callers to do the promotion and so
I want to remove the definition of TARGET_PROMOTE_PROTOTYPES which will
get rid of the promotion done in the callee.  This matches what LLVM and
the MIPS Greenhills compilers do.

When I made this change I had one regression in the GCC testsuite
(gcc.dg/fixed-point/convert-sat.c).  I tracked this down to the 
fact that emit_library_call_value_1 does not do any argument promotion
because it does not have the original tree type information for library
calls.  It only knows about modes.  I can't change emit_library_call_value_1
to do the promotion because it does not know whether to do a signed or
unsigned promotion, but expand_fixed_convert could do the conversion
before calling emit_library_call_value_1 and that is what this patch does.

Note that if a target uses one of the default argument promotion functions
like default_promote_function_mode or
default_promote_function_mode_always_promote, this change will have no
affect because they call promote_mode and if promote_mode is called with
a NULL_TREE as the first argument it returns the original mode.  For target
specific promote functions, this change should just cause the promotion
to be done in expand_fixed_convert instead of in emit_library_call_value_1
and with the proper setting of unsignedp.

This patch by itself will not fix my MIPS bug because it uses one of the
default promote functions.  Part two of this patch will be to define the
TARGET_PROMOTE_FUNCTION_MODE macro on MIPS so that it promotes QI and HI
to SI even if the type tree is NULL and then everything will work.

The 'real' long term fix for this problem is to have tree types for builtin
functions so the proper promotions can always be done but that is a fairly
large change that I am not willing to tackle right now and it could probably
not be done in time for GCC 6.0 anyway.

See https://gcc.gnu.org/ml/gcc/2015-10/msg00234.html for more discussion
of this change.

Is this part of the patch OK to checkin?  Tested on mips-mti-linux-gnu.

Steve Ellcey
sell...@imgtec.com


2015-11-06  Steve Ellcey  

* optabs.c (expand_fixed_convert): Call promote_function_mode.


diff --git a/gcc/optabs.c b/gcc/optabs.c
index fdcdc6a..964e92a 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -4880,6 +4880,24 @@ expand_fixed_convert (rtx to, rtx from, int uintp, int 
satp)
   libfunc = convert_optab_libfunc (tab, to_mode, from_mode);
   gcc_assert (libfunc);
 
+  if (SCALAR_INT_MODE_P (from_mode))
+{
+  /*  If we need to promote the integer function argument we need to do
+ it here instead of inside emit_library_call_value because here we
+ know if we should be doing a signed or unsigned promotion.  */
+
+  machine_mode arg_mode;
+  int unsigned_p = 0;
+
+  arg_mode = promote_function_mode (NULL_TREE, from_mode,
+   _p, NULL_TREE, 0);
+  if (arg_mode != from_mode)
+   {
+ from = convert_to_mode (arg_mode, from, uintp);
+ from_mode = arg_mode;
+   }
+}
+
   start_sequence ();
   value = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST, to_mode,
   1, from, from_mode);


Re: [OpenACC] declare directive

2015-11-06 Thread Jakub Jelinek
On Fri, Nov 06, 2015 at 10:08:26AM -0600, James Norris wrote:
> 2015-10-27  James Norris  
>   Joseph Myers  
> 
>   gcc/
>   * c-family/c-pragma.c (oacc_pragmas): Add entry for declare directive. 
>   * c-family/c-pragma.h (enum pragma_kind): Add PRAGMA_OACC_DECLARE.

c-family/, c/, cp/ have all their own ChangeLog files, and the entries
that go in there should not have the path prefixes.

> +  for (t = clauses; t; t = OMP_CLAUSE_CHAIN (t))
> +{
> +  location_t loc = OMP_CLAUSE_LOCATION (t);
> +  tree decl = OMP_CLAUSE_DECL (t);
> +  tree devres = NULL_TREE;
> +  if (!DECL_P (decl))
> + {
> +   error_at (loc, "subarray in %<#pragma acc declare%>");

Is that the OpenACC term that you use everywhere?  In OpenMP
those are array sections.

> + case GOMP_MAP_LINK:
> +   if (!global_bindings_p () && !DECL_EXTERNAL (decl))
> + {
> +   error_at (loc,
> + "%qD must be a global variable in"
> + "%<#pragma acc declare link%>",
> + decl);
> +   error = true;
> +   continue;

What about TREE_STATIC?

> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index fa34858..fbd4a69 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -157,6 +157,7 @@ struct gimplify_omp_ctx
>bool target_map_scalars_firstprivate;
>bool target_map_pointers_as_0len_arrays;
>bool target_firstprivatize_array_bases;
> +  gomp_target *declare_returns;
>  };

This declare_returns stuff looks broken.
What exactly do you want to achieve?  Why do you record it
in gimplify_omp_ctx, but then only look at it in gimplify_body?
The way you abuse gimplify_omp_ctx for that is just too ugly.
All the gimplification code expects that when you enter some OpenMP/OpenACC
construct, you create new context if it needs one, then process the body
of the construct, then pop it up.  The declare_returns stuff
violates all of this.  Do you want to enqueue all the statements
at the end of the body?  Then just stick it into some gimple_seq
outside of the gimplify_omp_ctx, and process in there.  Or if you
want to process it when leaving some construct, arrange for that.

> @@ -5841,6 +5863,8 @@ omp_default_clause (struct gimplify_omp_ctx *ctx, tree 
> decl,
>flags |= GOVD_FIRSTPRIVATE;
>break;
>  case OMP_CLAUSE_DEFAULT_UNSPECIFIED:
> +  if (is_global_var (decl) && device_resident_p (decl))
> + flags |= GOVD_MAP_TO_ONLY | GOVD_MAP;

I don't think you want to do this except for (selected or all?)
OpenACC contexts.  Say, I don't see why one couldn't e.g. try to mix
OpenMP host parallelization or tasking with OpenACC offloading,
and that affecting in weird way OpenMP semantics.

> +  /* Any clauses that affect the state of a variable prior
> + to return are saved and dealt with after the body has
> + been gimplified.  */
> +
> +  gimplify_scan_omp_clauses (, pre_p, ORT_TARGET_DATA,
> +  OACC_DECLARE);
> +
> +  c = gimplify_omp_ctxp;
> +  gimplify_omp_ctxp = c->outer_context;
> +  delete_omp_context (c);

Why don't you call gimplify_adjust_omp_clauses instead?

> +  stmt = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_DECLARE,
> +   clauses);
> +  gimplify_omp_ctxp->declare_returns = stmt;

But the above is the main thing I have trouble with.
What happens if you have multiple #pragma acc declare directives?
Is the stmt from the other ones lost?
I'd expect something like gimple_seq in there instead and pushing
stmts into the sequence.
I don't know what is the semantics of the construct, if it is something
close to say target data in OpenMP or acc data in OpenACC, the addition
of code to unmap the variables is performed using a cleanup, otherwise
how do you handle exceptions, or goto and all other kinds of abnormal
returns from the function.

> @@ -160,6 +151,25 @@ varpool_node::get_create (tree decl)
>node->force_output = 1;
>  #endif
>  }
> +}
> +
> +/* Return varpool node assigned to DECL.  Create new one when needed.  */
> +varpool_node *
> +varpool_node::get_create (tree decl)
> +{
> +  varpool_node *node = varpool_node::get (decl);
> +  gcc_checking_assert (TREE_CODE (decl) == VAR_DECL);
> +  if (node)
> +{
> +  if (!node->offloadable)
> + make_offloadable (node, decl);

I don't like this change at all, that is significant extra work
every time somebody calls this function.
Just do what we do on the OpenMP side, if you add some "omp declare target"
or similar attribute and the varpool node for it has been already created,
set offloadable in the FE.

Jakub


Re: [Patch] Change to argument promotion in fixed conversion library calls

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 08:04 PM, Steve Ellcey  wrote:

When I made this change I had one regression in the GCC testsuite
(gcc.dg/fixed-point/convert-sat.c).  I tracked this down to the
fact that emit_library_call_value_1 does not do any argument promotion
because it does not have the original tree type information for library
calls.  It only knows about modes.  I can't change emit_library_call_value_1
to do the promotion because it does not know whether to do a signed or
unsigned promotion, but expand_fixed_convert could do the conversion
before calling emit_library_call_value_1 and that is what this patch does.


Hmm, difficult. I can see how there would be a problem, but considering 
how many calls to emit_library_call_* we have, I'm slightly worried 
whether this is really is a good approach.


On the other hand, there seems to be precedent in this file:

  if (GET_MODE_PRECISION (GET_MODE (from)) < GET_MODE_PRECISION 
(SImode))

from = convert_to_mode (SImode, from, unsignedp);


The 'real' long term fix for this problem is to have tree types for builtin
functions so the proper promotions can always be done but that is a fairly
large change that I am not willing to tackle right now and it could probably
not be done in time for GCC 6.0 anyway.


Yeah, but I agree that this is the real fix. We should aim to get rid of 
the emit_library_call functions.



+  if (SCALAR_INT_MODE_P (from_mode))
+{
+  /*  If we need to promote the integer function argument we need to do


Extra space at the start of the comment.


+ it here instead of inside emit_library_call_value because here we
+ know if we should be doing a signed or unsigned promotion.  */
+
+  machine_mode arg_mode;
+  int unsigned_p = 0;
+
+  arg_mode = promote_function_mode (NULL_TREE, from_mode,
+   _p, NULL_TREE, 0);
+  if (arg_mode != from_mode)
+   {
+ from = convert_to_mode (arg_mode, from, uintp);
+ from_mode = arg_mode;
+   }
+}


Move this into a separate function (prepare_libcall_arg)? I'll think 
about it over the weekend and let others chime in if they want, but I 
think I'll probably end up approving it with that change.



Bernd


Handle internal functions in is_tm_pure_call

2015-11-06 Thread Richard Sandiford
The upcoming changes to use internal functions for things like sqrt
caused a failure in gcc.dg/tm/20100610.c, because we were trying to get
call flags from the null gimple_call_fn of an IFN_SQRT call.  We've been
making fairly heavy use of internal functions for a while now so I think
this might be latent.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* trans-mem.c (is_tm_pure_call): Use gimple_call_flags for
internal functions.

diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c
index 45bc759..4583bd5 100644
--- a/gcc/trans-mem.c
+++ b/gcc/trans-mem.c
@@ -266,6 +266,9 @@ is_tm_safe (const_tree x)
 static bool
 is_tm_pure_call (gimple *call)
 {
+  if (gimple_call_internal_p (call))
+return (gimple_call_flags (call) & (ECF_CONST | ECF_TM_PURE)) != 0;
+
   tree fn = gimple_call_fn (call);
 
   if (TREE_CODE (fn) == ADDR_EXPR)



Re: libgo patch committed: Update to Go 1.5 release

2015-11-06 Thread Ian Lance Taylor
On Fri, Nov 6, 2015 at 5:01 AM, Rainer Orth  
wrote:
> Ian Lance Taylor  writes:
>
>> I have committed a patch to libgo to update it to the Go 1.5 release.
>>
>> As usual for libgo updates, the actual patch is too large to attach to
>> this e-mail message.  I've attached the changes to the gccgo-specific
>> files.
>>
>> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
>> to mainline.
>>
>> This may cause trouble on non-GNU/Linux operating systems.  Please let
>> me know about any problems you encounter.
>
> It does indeed (first tried on i386-pc-solaris2.10):
>
> *
>
> /vol/gcc/src/hg/trunk/local/libgo/runtime/go-varargs.c: In function 
> '__go_ioctl':
> /vol/gcc/src/hg/trunk/local/libgo/runtime/go-varargs.c:63:10: error: implicit 
> declaration of function 'ioctl' [-Werror=implicit-function-declaration]
>return ioctl (d, request, arg);
>   ^
>
>   Needs , the following patch works:
>
>
>
> *
>
> /vol/gcc/src/hg/trunk/local/libgo/go/syscall/exec_bsd.go:107:7: error: 
> incompatible types in assignment (cannot use type int as type Pid_t)
> r1 = raw_getpid()
>^
>
> I can cast to Pid_t and this works.  The underlying error to me seems
> that raw_getpid the in the generated libcalls.go is wrong, casting
> c_getpid return value to int while pid_t can be long.
>
> *
>
> /vol/gcc/src/hg/trunk/local/libgo/go/net/hook_cloexec.go:13:70: error: 
> reference to undefined identifier 'syscall.Accept4'
>   accept4Func func(int, int) (int, syscall.Sockaddr, error) = syscall.Accept4
>   ^
>
> No accept4 on Solaris (and certainly other systems, thence configure
> test), but used unconditionally.
>
> *
>
> /vol/gcc/src/hg/trunk/local/libgo/go/net/sendfile_solaris.go:78:22: error: 
> reference to undefined identifier 'syscall.Sendfile'
>n, err1 := syscall.Sendfile(dst, src, , n)
>   ^
>
> Only in go/syscall/libcall_linux.go!?
>
> *
>
> /vol/gcc/src/hg/trunk/local/libgo/go/net/tcpsockopt_solaris.go:34:103: error: 
> reference to undefined identifier 'syscall.TCP_KEEPALIVE_THRESHOLD'
>   return os.NewSyscallError("setsockopt", syscall.SetsockoptInt(fd.sysfd, 
> syscall.IPPROTO_TCP, syscall.TCP_KEEPALIVE_THRESHOLD, msecs))
>
>^
>
> Not in Solaris 10, only Solaris 11 and 12 have it.

Thanks for the notes.  I committed this patch to address these problems.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 229832)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-47f256e1ed527b2eb4041acf90d33e6abc5e1685
+10c1d6756ed1dcc814c49921c2a5e27f4677e0e6
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/Makefile.am
===
--- libgo/Makefile.am   (revision 229832)
+++ libgo/Makefile.am   (working copy)
@@ -787,10 +787,14 @@ endif
 endif
 
 if LIBGO_IS_LINUX
-go_net_cloexec_file = go/net/sock_cloexec.go
+go_net_cloexec_file = go/net/sock_cloexec.go go/net/hook_cloexec.go
+else
+if LIBGO_IS_FREEBSD
+go_net_cloexec_file = go/net/sock_cloexec.go go/net/hook_cloexec.go
 else
 go_net_cloexec_file = go/net/sys_cloexec.go
 endif
+endif
 
 if LIBGO_IS_OPENBSD
 go_net_tcpsockopt_file = go/net/tcpsockopt_openbsd.go
@@ -825,7 +829,6 @@ go_net_common_files = \
go/net/file.go \
go/net/file_unix.go \
go/net/hook.go \
-   go/net/hook_cloexec.go \
go/net/hook_unix.go \
go/net/hosts.go \
go/net/interface.go \
@@ -1985,6 +1988,12 @@ else
 syscall_exec_test_file =
 endif
 
+if LIBGO_IS_LINUX
+syscall_os_file =
+else
+syscall_os_file = go/syscall/libcall_bsd.go
+endif
+
 go_base_syscall_files = \
go/syscall/env_unix.go \
go/syscall/syscall_errno.go \
@@ -2003,6 +2012,7 @@ go_base_syscall_files = \
$(syscall_sleep_file) \
$(syscall_errstr_file) \
$(syscall_size_file) \
+   $(syscall_os_file) \
$(syscall_socket_file) \
$(syscall_socket_os_file) \
$(syscall_socket_type_file) \
Index: libgo/go/net/tcpsockopt_solaris.go
===
--- libgo/go/net/tcpsockopt_solaris.go  (revision 229832)
+++ libgo/go/net/tcpsockopt_solaris.go  (working copy)
@@ -1,7 +1,9 @@
-// Copyright 2015 The Go Authors.  All rights reserved.
+// Copyright 2013 The Go Authors. All rights reserved.
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
+// TCP socket options for solaris
+
 package net
 
 import (
@@ -10,26 +12,16 @@ import (
"time"
 )
 
+// Set keep alive period.
 func setKeepAlivePeriod(fd *netFD, d time.Duration) error {
if err := fd.incref(); err != nil {
return err
 

Re: Remove noce_mem_write_may_trap_or_fault_p in ifcvt

2015-11-06 Thread Jeff Law

On 11/06/2015 11:44 AM, Bernd Schmidt wrote:

On 11/06/2015 07:39 PM, Jeff Law wrote:

Given the name "..may_trap_or_fault_p" ISTM that its mode of operation
should be to return true (the safe value) unless we can prove the write
will not fault.  The more cases we can prove true, the better AFAICT.

The PLUS case looks totally wrong.  Though it could possibly be made
correct by looking for [sp,fp,ap] + offset addresses and verifying we're
doing a mis-aligned write.  We'd probably also need some kind of
sensible verification that the offset isn't too large/small.


I'm guessing this is already covered by the call to may_trap_or_fault_p.
Yea, it looks like may_trap_or_fault_p tries to handle unaligned and 
wacky offsets for sp,fp,ap relative addresses.


So maybe what noce_mem_write_may_trap_or_fault_p is really trying to do 
is take objects where may_trap_or_fault_p returns false, but which in 
fact may fault if we write that memory?  ie, working around lameness in 
may_trap_or_fault_p not knowing the context (ie read vs write).




The only additional thing that this function tries to prove is that the
mem isn't readonly. IMO either MEM_READONLY_P is sufficient for that
(and my patches operate under that assumption), or it isn't sufficient
and no amount of checking the address is going to make the function useful.
Right.  I think we've come to agreement on what 
noce_mem_write_may_trap_or_fault_p is trying to to.


The problem I have is I don't think we can assume that the lack of 
MEM_READONLY_P is sufficient to determine that a particular memory 
reference is not to readonly memory.   ie, when MEM_READONLY_P is set, 
the object must be readonly, when it is not set, we know nothing.



I think what we really want here is to fix may_trap_or_fault_p to be 
safe.  It's returning "false" in cases where it really ought be 
returning "true".  Then we just use may_trap_or_fault_p, dropping 
noce_mem_write_may_trap_or_fault_p.


Jeff



Re: [Patch] Change to argument promotion in fixed conversion library calls

2015-11-06 Thread Steve Ellcey
On Fri, 2015-11-06 at 20:14 +0100, Bernd Schmidt wrote:

> 
> > +  if (SCALAR_INT_MODE_P (from_mode))
> > +{
> > +  /*  If we need to promote the integer function argument we need to do
> 
> Extra space at the start of the comment.
> 
> > + it here instead of inside emit_library_call_value because here we
> > + know if we should be doing a signed or unsigned promotion.  */
> > +
> > +  machine_mode arg_mode;
> > +  int unsigned_p = 0;
> > +
> > +  arg_mode = promote_function_mode (NULL_TREE, from_mode,
> > +   _p, NULL_TREE, 0);
> > +  if (arg_mode != from_mode)
> > +   {
> > + from = convert_to_mode (arg_mode, from, uintp);
> > + from_mode = arg_mode;
> > +   }
> > +}
> 
> Move this into a separate function (prepare_libcall_arg)? I'll think 
> about it over the weekend and let others chime in if they want, but I 
> think I'll probably end up approving it with that change.
> 
> 
> Bernd

Are you thinking of a simple function that is called on all targets or a
target specific function?  Maybe a target specific function would be
safer.

We could have:

from = targetm.prepare_libcall_arg (from, uintp);
from_mode = GET_MODE (from);

And have the default prepare_libcall_arg simply return the original
'from' rtx value.  We could also pass some other arguments, maybe the
library call rtx (fun) and an integer to specify what argument this is
(1, 2, 3, ...) if we wanted to try and generalize it even more.

Steve Ellcey
sell...@imgtec.com





Re: Remove noce_mem_write_may_trap_or_fault_p in ifcvt

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 08:20 PM, Jeff Law wrote:

So maybe what noce_mem_write_may_trap_or_fault_p is really trying to do
is take objects where may_trap_or_fault_p returns false, but which in
fact may fault if we write that memory?  ie, working around lameness in
may_trap_or_fault_p not knowing the context (ie read vs write).


Yes exactly, that is the aim - detect extra possibilities for faults 
knowing that the context is a write.



I think what we really want here is to fix may_trap_or_fault_p to be
safe.  It's returning "false" in cases where it really ought be
returning "true".  Then we just use may_trap_or_fault_p, dropping
noce_mem_write_may_trap_or_fault_p.


Well, I think if MEM_READONLY_P is insufficient (and I guess people 
could cast away const - bother), then the current 
noce_mem_write_may_trap_or_fault_p should be simplified to "return 
true;" and eliminated.  We could try to special case stack accesses 
within a known limit later on.


Maybe it doesn't even matter given that we call noce_can_store_speculate 
immediately after this test.



Bernd


Re: regrename: Fix for earlyclobber operands

2015-11-06 Thread Jeff Law

On 11/06/2015 03:54 AM, Bernd Schmidt wrote:

I have a patch which makes use of the renamer more often, and this
exposed a bug with earlyclobber operands. The code that does the
terminate_write step has the following comment:

   /* Step 5: Close open chains that overlap writes.  Similar to
  step 2, we hide in-out operands, since we do not want to
  close these chains.  We also hide earlyclobber operands,
  since we've opened chains for them in step 1, and earlier
  chains they would overlap with must have been closed at
  the previous insn at the latest, as such operands cannot
  possibly overlap with any input operands.  */

That's all right as far as it goes, but the problem is that this means
there isn't a terminate_write step for earlyclobbers.

The following seems like the simplest possible fix. It was bootstrapped
and tested with -frename-registers enabled at -O1 on x86_64-linux. Ok?

(Incidentally there are some avx tests that fail if they are renamed,
apparently because the scan-assembler doesn't allow register numbers
like %zmm10. avx512bw-vptestmb-1.c is one of those).


Bernd

rr-ec-term.diff


* regrename.c (record_out_operands): Terminate earlyclobbered
operands here.
Do you want to terminate those chains after step 6 is complete so that 
the earlyclobber output conflicts with all the other outputs?


Isn't that how we typically do things for earlyclobbers in life 
analysis?  earlyclobbers birth before the insn so they conflict with the 
inputs and die after the insn so they conflict with the outputs?


Or am I fundamentally wrong in even thinking about this in a manner 
similar to lifetime analysis?


Jeff



Re: [Patch] Change to argument promotion in fixed conversion library calls

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 08:27 PM, Steve Ellcey wrote:


Are you thinking of a simple function that is called on all targets or a
target specific function?  Maybe a target specific function would be
safer.


No, I think just what you have there is probably sufficient.


Bernd


Re: [OpenACC] declare directive

2015-11-06 Thread Jakub Jelinek
On Fri, Nov 06, 2015 at 08:03:52PM +0100, Jakub Jelinek wrote:
> What exactly do you want to achieve?  Why do you record it
> in gimplify_omp_ctx, but then only look at it in gimplify_body?
> The way you abuse gimplify_omp_ctx for that is just too ugly.
> All the gimplification code expects that when you enter some OpenMP/OpenACC
> construct, you create new context if it needs one, then process the body
> of the construct, then pop it up.  The declare_returns stuff
> violates all of this.  Do you want to enqueue all the statements
> at the end of the body?  Then just stick it into some gimple_seq
> outside of the gimplify_omp_ctx, and process in there.  Or if you
> want to process it when leaving some construct, arrange for that.

If the unmap operation is supposed to sit where the corresponding variable
goes out of scope, supposedly you would need to find out in which bind
(gimplify_ctx (not omp!)) the variable is declared and arrange for the
statement to be added as a cleanup of that region.
If you queue everything to the end of function, I'm afraid with e.g.
void foo (void)
{
  {
char buf[2048];
#acc declare whatever(buf)
// ...
  }
  {
char buf2[2048];
#acc declare whatever(buf2)
// ...
  }
}
both vars will be assigned overlapping ranges and you might run into
trouble trying to map buf2 while buf is still mapped.

Jakub


Re: Handle internal functions in is_tm_pure_call

2015-11-06 Thread Jeff Law

On 11/06/2015 12:15 PM, Richard Sandiford wrote:

The upcoming changes to use internal functions for things like sqrt
caused a failure in gcc.dg/tm/20100610.c, because we were trying to get
call flags from the null gimple_call_fn of an IFN_SQRT call.  We've been
making fairly heavy use of internal functions for a while now so I think
this might be latent.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* trans-mem.c (is_tm_pure_call): Use gimple_call_flags for
internal functions.

OK.
jeff



Re: OpenACC declare directive updates

2015-11-06 Thread Jakub Jelinek
On Wed, Nov 04, 2015 at 06:32:00AM -0600, James Norris wrote:
> +/* Node in the linked list used for storing !$oacc declare constructs.  */
> +
> +typedef struct gfc_oacc_declare
> +{
> +  struct gfc_oacc_declare *next;
> +  bool module_var;
> +  gfc_omp_clauses *clauses;
> +  gfc_omp_clauses *return_clauses;
> +}
> +gfc_oacc_declare;
> +
> +#define gfc_get_oacc_declare() XCNEW (gfc_oacc_declare)
> +
> +
>  /* Node in the linked list used for storing !$omp declare simd constructs.  
> */
>  
>  typedef struct gfc_omp_declare_simd
> @@ -1644,7 +1668,7 @@ typedef struct gfc_namespace
>struct gfc_data *data, *old_data;
>  
>/* !$ACC DECLARE clauses.  */
> -  gfc_omp_clauses *oacc_declare_clauses;
> +  struct gfc_oacc_declare *oacc_declare_clauses;

This should be renamed now that it doesn't hold just clauses,
perhaps just oacc_declare?  Also, no need to use struct keyword.

> @@ -2857,6 +2957,42 @@ oacc_compatible_clauses (gfc_omp_clauses *clauses, int 
> list,
>return false;
>  }
>  
> +/* Check if a variable appears in multiple clauses.  */
> +
> +static void
> +resolve_omp_duplicate_list (gfc_omp_namelist *clause_list, bool openacc,
> + int list)
> +{
> +  gfc_omp_namelist *n;
> +  const char *error_msg = "Symbol %qs present on multiple clauses at %L";
> +
> +  /* OpenACC reduction clauses are compatible with everything.  We only
> + need to check if a reduction variable is used more than once.  */
> +  if (openacc && list == OMP_LIST_REDUCTION)
> +{
> +  hash_set reductions;
> +
> +  for (n = clause_list; n; n = n->next)
> + {
> +   if (reductions.contains (n->sym))
> + gfc_error (error_msg, n->sym->name, >expr->where);
> +   else
> + reductions.add (n->sym);
> + }
> +
> +  return;
> +}
> +
> +  /* Ensure that variables are only used in one clause.  */
> +  for (n = clause_list; n; n = n->next)
> +{
> +  if (n->sym->mark)
> + gfc_error (error_msg, n->sym->name, >expr->where);
> +  else
> + n->sym->mark = 1;
> +}
> +}

You are readding a function that has been rejected, OMP_LIST_REDUCTION
is handled differently now.

Jakub


Re: [RFA] Do not use libiberty's getpagesize on Android

2015-11-06 Thread DJ Delorie

> libiberty/ChangeLog:
> 
> * configure.ac: Set AC_CV_FUNC_GETPAGESIZE to "yes" on
> Android hosts.
> * configure: Regenerate.
> 
> OK to apply?

Ok.


Re: Unreviewed patch

2015-11-06 Thread Jeff Law

On 11/06/2015 06:29 AM, Rainer Orth wrote:

The following patch has remained unrevied for a month:

[build] Support init priority on Solaris
 https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00716.html

It needs build and ia64 maintainers and someone familiar with the init
priority support to review.
This is OK.  Thanks for pinging -- I didn't have it in my queue of 
things to look at, so it'd definitely fallen through the cracks.

jeff



Re: [PING] Re: [PATCH] c++/67913, 67917 - fix new expression with wrong number of elements

2015-11-06 Thread Jason Merrill

OK.

Jason


Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 03:10 PM, Sebastian Pop wrote:

On Fri, Nov 6, 2015 at 2:56 AM, Bernd Schmidt  wrote:

Formatting problem, here and in a few other places. I didn't fully read the
patch this time around.

I'm probably not reviewing further patches because I don't see this
progressing to a state where it's acceptable. Others may do so, but as far
as I'm concerned the patch is rejected.


Bernd,
I would like to ask you to focus on the technical part, and provide a
review only based on technical reasons.
Please ignore all formatting changes: I will help address all those changes.
I will send a patch addressing all the comments you had in the current review.


As long as this just has allocation from the normal stack frame as its 
only strategy, I consider it unacceptable (and I think Richard B voiced 
the same opinion). If you want a half-finished redzone allocator, I can 
send you a patch.



Bernd



Add -fno-math-errno to gcc.dg/lto/20110201-1_0.c

2015-11-06 Thread Richard Sandiford
At the moment the ECF_* flags for a gimple call to a built-in
function are derived from the function decl, which in turn is
derived from the global command-line options.  So if the compiler
is run with -fno-math-errno, we always assume functions don't set
errno, regardless of local optimization options.  Similarly if the
compiler is run with -fmath-errno, we always assume functions set errno.

This shows up in gcc.dg/lto/20110201-1_0.c, where we compile
the file with -O0 and use -O2 -ffast-math for a specific function.
-O2 -ffast-math is enough for us to convert cabs to sqrt as hoped,
but because of the global -fmath-errno setting, we assume that the
call to sqrt is not pure or const and create vops for it.  This makes
it appear to the gimple code that a simple sqrt optab isn't enough.

Later patches move more decisions about maths functions to gimple
and think that in this case we should use:

y = sqrt (x);
if (!(x >= 0))
sqrt (x); // to set errno.

This is being tracked as PR68235.  For now the patch adds
-fno-math-errno to the dg-options for this test.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard

gcc/testsuite/
PR tree-optimization/68235
* gcc.dg/lto/20110201-1_0.c: Add -fno-math-errno.

diff --git a/gcc/testsuite/gcc.dg/lto/20110201-1_0.c 
b/gcc/testsuite/gcc.dg/lto/20110201-1_0.c
index 068dddc..2144f07 100644
--- a/gcc/testsuite/gcc.dg/lto/20110201-1_0.c
+++ b/gcc/testsuite/gcc.dg/lto/20110201-1_0.c
@@ -1,6 +1,6 @@
 /* { dg-lto-do run } */
-/* { dg-lto-options { { -O0 -flto } } } */
-/* { dg-lto-options { "-O0 -flto -mfloat-abi=softfp -mfpu=neon-vfpv4" } { 
target arm*-*-* } } */
+/* { dg-lto-options { { -O0 -flto -fno-math-errno } } } */
+/* { dg-lto-options { "-O0 -flto -fno-math-errno -mfloat-abi=softfp 
-mfpu=neon-vfpv4" } { target arm*-*-* } } */
 /* { dg-require-linker-plugin "" } */
 /* { dg-require-effective-target sqrt_insn } */
 



Don't treat rint as setting errno

2015-11-06 Thread Richard Sandiford
builtins.def says that rint sets errno, but it looks like this might
be a mistake.  C99 says that rint doesn't set errno and the builtins.c
expansion code doesn't try to keep errno up to date.

Perhaps this was because earlier versions of POSIX said that
rint sets errno on overflow:

http://pubs.opengroup.org/onlinepubs/009695399/functions/rintf.html

However, this is another instance of the observation that "rounding
functions could never overflow" (because anything using exponents
that large is already integral).  The page above also says that
differences with C99 are unintentional and the ERANGE clause has
been removed from later versions of POSIX:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/rint.html

Also, the version of POSIX that lists ERANGE for rint does the same
for nearbyint:

http://pubs.opengroup.org/onlinepubs/009695399/functions/nearbyintf.html

and we already treat nearbyint as not setting errno.  This too has been
clarified in later versions of POSIX:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/nearbyint.html

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* builtins.def (BUILTIN_RINT, BUILTIN_RINTF, BUILTIN_RINTL): Use
ATTR_MATHFN_FPROUNDING rather than ATTR_MATHFN_FPROUNDING_ERRNO.

diff --git a/gcc/builtins.def b/gcc/builtins.def
index 886b45c..076da40 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -475,9 +475,9 @@ DEF_C99_BUILTIN(BUILT_IN_REMAINDERL, "remainderl", 
BT_FN_LONGDOUBLE_LONG
 DEF_C99_BUILTIN(BUILT_IN_REMQUO, "remquo", 
BT_FN_DOUBLE_DOUBLE_DOUBLE_INTPTR, ATTR_MATHFN_FPROUNDING_STORE)
 DEF_C99_BUILTIN(BUILT_IN_REMQUOF, "remquof", 
BT_FN_FLOAT_FLOAT_FLOAT_INTPTR, ATTR_MATHFN_FPROUNDING_STORE)
 DEF_C99_BUILTIN(BUILT_IN_REMQUOL, "remquol", 
BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE_INTPTR, ATTR_MATHFN_FPROUNDING_STORE)
-DEF_C99_BUILTIN(BUILT_IN_RINT, "rint", BT_FN_DOUBLE_DOUBLE, 
ATTR_MATHFN_FPROUNDING_ERRNO)
-DEF_C99_BUILTIN(BUILT_IN_RINTF, "rintf", BT_FN_FLOAT_FLOAT, 
ATTR_MATHFN_FPROUNDING_ERRNO)
-DEF_C99_BUILTIN(BUILT_IN_RINTL, "rintl", BT_FN_LONGDOUBLE_LONGDOUBLE, 
ATTR_MATHFN_FPROUNDING_ERRNO)
+DEF_C99_BUILTIN(BUILT_IN_RINT, "rint", BT_FN_DOUBLE_DOUBLE, 
ATTR_MATHFN_FPROUNDING)
+DEF_C99_BUILTIN(BUILT_IN_RINTF, "rintf", BT_FN_FLOAT_FLOAT, 
ATTR_MATHFN_FPROUNDING)
+DEF_C99_BUILTIN(BUILT_IN_RINTL, "rintl", BT_FN_LONGDOUBLE_LONGDOUBLE, 
ATTR_MATHFN_FPROUNDING)
 DEF_C99_BUILTIN(BUILT_IN_ROUND, "round", BT_FN_DOUBLE_DOUBLE, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_ROUNDF, "roundf", BT_FN_FLOAT_FLOAT, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_ROUNDL, "roundl", 
BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)



Move c_getstr to fold-const.c

2015-11-06 Thread Richard Sandiford
Upcoming patches to fold-const-call.c want to use c_getstr, which is
currently defined in builtins.c.  The function doesn't really do anything
related to built-ins, and I'd rather not make fold-const-call.c depend
on builtins.c and builtins.c depend on fold-const-call.c, so this patch
moves the function to fold-const.c instead.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* builtins.h (c_getstr): Move to...
* fold-const.h (c_getstr): ...here.
* builtins.c (c_getstr): Move to...
* fold-const.c (c_getstr): ...here.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 8f0717c..69c56e7 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -616,27 +616,6 @@ c_strlen (tree src, int only_value)
   return ssize_int (strlen (ptr + offset));
 }
 
-/* Return a char pointer for a C string if it is a string constant
-   or sum of string constant and integer constant.  */
-
-const char *
-c_getstr (tree src)
-{
-  tree offset_node;
-
-  src = string_constant (src, _node);
-  if (src == 0)
-return 0;
-
-  if (offset_node == 0)
-return TREE_STRING_POINTER (src);
-  else if (!tree_fits_uhwi_p (offset_node)
-  || compare_tree_int (offset_node, TREE_STRING_LENGTH (src) - 1) > 0)
-return 0;
-
-  return TREE_STRING_POINTER (src) + tree_to_uhwi (offset_node);
-}
-
 /* Return a constant integer corresponding to target reading
GET_MODE_BITSIZE (MODE) bits from string constant STR.  */
 
diff --git a/gcc/builtins.h b/gcc/builtins.h
index cce9e75..b039632 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -87,7 +87,6 @@ extern bool is_simple_builtin (tree);
 extern bool is_inexpensive_builtin (tree);
 
 extern bool readonly_data_expr (tree exp);
-extern const char *c_getstr (tree);
 extern bool init_target_chars (void);
 extern unsigned HOST_WIDE_INT target_newline;
 extern unsigned HOST_WIDE_INT target_percent;
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index ee9b349..3b6e898 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -14397,3 +14397,24 @@ fold_build_pointer_plus_hwi_loc (location_t loc, tree 
ptr, HOST_WIDE_INT off)
   return fold_build2_loc (loc, POINTER_PLUS_EXPR, TREE_TYPE (ptr),
  ptr, size_int (off));
 }
+
+/* Return a char pointer for a C string if it is a string constant
+   or sum of string constant and integer constant.  */
+
+const char *
+c_getstr (tree src)
+{
+  tree offset_node;
+
+  src = string_constant (src, _node);
+  if (src == 0)
+return 0;
+
+  if (offset_node == 0)
+return TREE_STRING_POINTER (src);
+  else if (!tree_fits_uhwi_p (offset_node)
+  || compare_tree_int (offset_node, TREE_STRING_LENGTH (src) - 1) > 0)
+return 0;
+
+  return TREE_STRING_POINTER (src) + tree_to_uhwi (offset_node);
+}
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index 97d18cf..94a21b7 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -180,6 +180,7 @@ extern tree exact_inverse (tree, tree);
 extern tree const_unop (enum tree_code, tree, tree);
 extern tree const_binop (enum tree_code, tree, tree, tree);
 extern bool negate_mathfn_p (enum built_in_function);
+extern const char *c_getstr (tree);
 
 /* Return OFF converted to a pointer offset type suitable as offset for
POINTER_PLUS_EXPR.  Use location LOC for this conversion.  */



[gomp4, committed] Revert "Add dom_walker::walk_until"

2015-11-06 Thread Tom de Vries

Hi,

this patch reverts the "Add dom_walker::walk_until" patch.

The dom_walker::walk_until functionality is no longer required now that 
we've reverted pass_dominator::sese_mode_p.


Committed to gomp-4_0-branch.

Thanks,
- Tom
Revert "Add dom_walker::walk_until"

2015-11-06  Tom de Vries  

	revert:
	2015-10-12  Tom de Vries  

	* domwalk.c (dom_walker::walk): Rename to ...
	(dom_walker::walk_until): ... this.  Add and handle until and
	until_inclusive parameters.
	(dom_walker::walk): Reimplement using dom_walker::walk_until.
	* domwalk.h (dom_walker::walk_until): Declare.
---
 gcc/domwalk.c | 32 +---
 gcc/domwalk.h |  2 --
 2 files changed, 5 insertions(+), 29 deletions(-)

diff --git a/gcc/domwalk.c b/gcc/domwalk.c
index 6a205f0..167fc38 100644
--- a/gcc/domwalk.c
+++ b/gcc/domwalk.c
@@ -143,18 +143,11 @@ cmp_bb_postorder (const void *a, const void *b)
 }
 
 /* Recursively walk the dominator tree.
-   BB is the basic block we are currently visiting.  UNTIL is a basic_block that
-   is the root of a subtree that we won't visit.  If UNTIL_INCLUSIVE, we visit
-   UNTIL, but not it's children.  Otherwise don't visit UNTIL and its
-   children.  */
+   BB is the basic block we are currently visiting.  */
 
 void
-dom_walker::walk_until (basic_block bb, basic_block until, bool until_inclusive)
+dom_walker::walk (basic_block bb)
 {
-  bool skip_self = (bb == until && !until_inclusive);
-  if (skip_self)
-return;
-
   basic_block dest;
   basic_block *worklist = XNEWVEC (basic_block,
    n_basic_blocks_for_fn (cfun) * 2);
@@ -188,15 +181,9 @@ dom_walker::walk_until (basic_block bb, basic_block until, bool until_inclusive)
 	  worklist[sp++] = NULL;
 
 	  int saved_sp = sp;
-	  bool skip_children = bb == until && until_inclusive;
-	  if (!skip_children)
-	for (dest = first_dom_son (m_dom_direction, bb);
-		 dest; dest = next_dom_son (m_dom_direction, dest))
-	  {
-		bool skip_child = (dest == until && !until_inclusive);
-		if (!skip_child)
-		  worklist[sp++] = dest;
-	  }
+	  for (dest = first_dom_son (m_dom_direction, bb);
+	   dest; dest = next_dom_son (m_dom_direction, dest))
+	worklist[sp++] = dest;
 	  if (m_dom_direction == CDI_DOMINATORS)
 	switch (sp - saved_sp)
 	  {
@@ -230,12 +217,3 @@ dom_walker::walk_until (basic_block bb, basic_block until, bool until_inclusive)
 }
   free (worklist);
 }
-
-/* Recursively walk the dominator tree.
-   BB is the basic block we are currently visiting.  */
-
-void
-dom_walker::walk (basic_block bb)
-{
-  walk_until (bb, NULL, true);
-}
diff --git a/gcc/domwalk.h b/gcc/domwalk.h
index 71e6075..71a7c47 100644
--- a/gcc/domwalk.h
+++ b/gcc/domwalk.h
@@ -34,8 +34,6 @@ public:
 
   /* Walk the dominator tree.  */
   void walk (basic_block);
-  /* Walk a part of the dominator tree.  */
-  void walk_until (basic_block, basic_block, bool);
 
   /* Function to call before the recursive walk of the dominator children.  */
   virtual void before_dom_children (basic_block) {}
-- 
1.9.1



[gomp4, committed] Fix double word typo in tree-inline.c

2015-11-06 Thread Tom de Vries

Hi,

reverting a patch in tree-inline.c in gomp-4_0-branch exposed a typo 
already fixed on trunk.  This patch fixes that.


Committed to gomp-4_0-branch.

Thanks,
- Tom

2015-11-06  Tom de Vries  

	backport from trunk:
	2015-07-12  Aldy Hernandez  

	* tree-inline.c: Fix double word typos.
---
 gcc/tree-inline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 3d06e6e..884131f 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4540,7 +4540,7 @@ expand_call_inline (basic_block bb, gimple *stmt, copy_body_data *id)
   id->src_cfun = DECL_STRUCT_FUNCTION (fn);
   id->call_stmt = stmt;
 
-  /* If the the src function contains an IFN_VA_ARG, then so will the dst
+  /* If the src function contains an IFN_VA_ARG, then so will the dst
  function after inlining.  */
   if ((id->src_cfun->curr_properties & PROP_gimple_lva) == 0)
 {
-- 
1.9.1



Handle constant fp classifications in fold-const-call.c

2015-11-06 Thread Richard Sandiford
Move the constant "is finite", "is infinite" and "is nan" queries
to fold-const-call.c.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* builtins.c (fold_builtin_classify): Move constant cases to...
* fold-const-call.c (fold_const_call_ss): ...here.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 69c56e7..6eefd54 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -8018,7 +8018,6 @@ static tree
 fold_builtin_classify (location_t loc, tree fndecl, tree arg, int 
builtin_index)
 {
   tree type = TREE_TYPE (TREE_TYPE (fndecl));
-  REAL_VALUE_TYPE r;
 
   if (!validate_arg (arg, REAL_TYPE))
 return NULL_TREE;
@@ -8029,16 +8028,6 @@ fold_builtin_classify (location_t loc, tree fndecl, tree 
arg, int builtin_index)
   if (!HONOR_INFINITIES (arg))
return omit_one_operand_loc (loc, type, integer_zero_node, arg);
 
-  if (TREE_CODE (arg) == REAL_CST)
-   {
- r = TREE_REAL_CST (arg);
- if (real_isinf ())
-   return real_compare (GT_EXPR, , )
-  ? integer_one_node : integer_minus_one_node;
- else
-   return integer_zero_node;
-   }
-
   return NULL_TREE;
 
 case BUILT_IN_ISINF_SIGN:
@@ -8078,24 +8067,12 @@ fold_builtin_classify (location_t loc, tree fndecl, 
tree arg, int builtin_index)
  && !HONOR_INFINITIES (arg))
return omit_one_operand_loc (loc, type, integer_one_node, arg);
 
-  if (TREE_CODE (arg) == REAL_CST)
-   {
- r = TREE_REAL_CST (arg);
- return real_isfinite () ? integer_one_node : integer_zero_node;
-   }
-
   return NULL_TREE;
 
 case BUILT_IN_ISNAN:
   if (!HONOR_NANS (arg))
return omit_one_operand_loc (loc, type, integer_zero_node, arg);
 
-  if (TREE_CODE (arg) == REAL_CST)
-   {
- r = TREE_REAL_CST (arg);
- return real_isnan () ? integer_one_node : integer_zero_node;
-   }
-
   arg = builtin_save_expr (arg);
   return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg);
 
diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index 5af2c63..c277d2b 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -736,6 +736,31 @@ fold_const_call_ss (wide_int *result, built_in_function fn,
   /* Not yet folded to a constant.  */
   return false;
 
+CASE_FLT_FN (BUILT_IN_FINITE):
+case BUILT_IN_FINITED32:
+case BUILT_IN_FINITED64:
+case BUILT_IN_FINITED128:
+case BUILT_IN_ISFINITE:
+  *result = wi::shwi (real_isfinite (arg) ? 1 : 0, precision);
+  return true;
+
+CASE_FLT_FN (BUILT_IN_ISINF):
+case BUILT_IN_ISINFD32:
+case BUILT_IN_ISINFD64:
+case BUILT_IN_ISINFD128:
+  if (real_isinf (arg))
+   *result = wi::shwi (arg->sign ? -1 : 1, precision);
+  else
+   *result = wi::shwi (0, precision);
+  return true;
+
+CASE_FLT_FN (BUILT_IN_ISNAN):
+case BUILT_IN_ISNAND32:
+case BUILT_IN_ISNAND64:
+case BUILT_IN_ISNAND128:
+  *result = wi::shwi (real_isnan (arg) ? 1 : 0, precision);
+  return true;
+
 default:
   return false;
 }



Move constant bitop and bswap folds to fold-const-call.c

2015-11-06 Thread Richard Sandiford
The only folds left in builtins.c were for constants, so we can remove
the builtins.c handling entirely.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* builtins.c (fold_builtin_bitop, fold_builtin_bswap): Delete.
(fold_builtin_1): Don't call them.
* fold-const-call.c: Include tm.h.
(fold_const_call_ss): New variant for integer-to-integer folds.
(fold_const_call): Call it.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 6eefd54..3f7fe3b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -148,7 +148,6 @@ static tree rewrite_call_expr (location_t, tree, int, tree, 
int, ...);
 static bool validate_arg (const_tree, enum tree_code code);
 static rtx expand_builtin_fabs (tree, rtx, rtx);
 static rtx expand_builtin_signbit (tree, rtx);
-static tree fold_builtin_bitop (tree, tree);
 static tree fold_builtin_strchr (location_t, tree, tree, tree);
 static tree fold_builtin_memchr (location_t, tree, tree, tree, tree);
 static tree fold_builtin_memcmp (location_t, tree, tree, tree);
@@ -7332,99 +7331,6 @@ fold_builtin_sincos (location_t loc,
 fold_build1_loc (loc, REALPART_EXPR, type, call)));
 }
 
-/* Fold function call to builtin ffs, clz, ctz, popcount and parity
-   and their long and long long variants (i.e. ffsl and ffsll).  ARG is
-   the argument to the call.  Return NULL_TREE if no simplification can
-   be made.  */
-
-static tree
-fold_builtin_bitop (tree fndecl, tree arg)
-{
-  if (!validate_arg (arg, INTEGER_TYPE))
-return NULL_TREE;
-
-  /* Optimize for constant argument.  */
-  if (TREE_CODE (arg) == INTEGER_CST && !TREE_OVERFLOW (arg))
-{
-  tree type = TREE_TYPE (arg);
-  int result;
-
-  switch (DECL_FUNCTION_CODE (fndecl))
-   {
-   CASE_INT_FN (BUILT_IN_FFS):
- result = wi::ffs (arg);
- break;
-
-   CASE_INT_FN (BUILT_IN_CLZ):
- if (wi::ne_p (arg, 0))
-   result = wi::clz (arg);
- else if (! CLZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), result))
-   result = TYPE_PRECISION (type);
- break;
-
-   CASE_INT_FN (BUILT_IN_CTZ):
- if (wi::ne_p (arg, 0))
-   result = wi::ctz (arg);
- else if (! CTZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), result))
-   result = TYPE_PRECISION (type);
- break;
-
-   CASE_INT_FN (BUILT_IN_CLRSB):
- result = wi::clrsb (arg);
- break;
-
-   CASE_INT_FN (BUILT_IN_POPCOUNT):
- result = wi::popcount (arg);
- break;
-
-   CASE_INT_FN (BUILT_IN_PARITY):
- result = wi::parity (arg);
- break;
-
-   default:
- gcc_unreachable ();
-   }
-
-  return build_int_cst (TREE_TYPE (TREE_TYPE (fndecl)), result);
-}
-
-  return NULL_TREE;
-}
-
-/* Fold function call to builtin_bswap and the short, long and long long
-   variants.  Return NULL_TREE if no simplification can be made.  */
-static tree
-fold_builtin_bswap (tree fndecl, tree arg)
-{
-  if (! validate_arg (arg, INTEGER_TYPE))
-return NULL_TREE;
-
-  /* Optimize constant value.  */
-  if (TREE_CODE (arg) == INTEGER_CST && !TREE_OVERFLOW (arg))
-{
-  tree type = TREE_TYPE (TREE_TYPE (fndecl));
-
-  switch (DECL_FUNCTION_CODE (fndecl))
-   {
- case BUILT_IN_BSWAP16:
- case BUILT_IN_BSWAP32:
- case BUILT_IN_BSWAP64:
-   {
- signop sgn = TYPE_SIGN (type);
- tree result =
-   wide_int_to_tree (type,
- wide_int::from (arg, TYPE_PRECISION (type),
- sgn).bswap ());
- return result;
-   }
-   default:
- gcc_unreachable ();
-   }
-}
-
-  return NULL_TREE;
-}
-
 /* Fold function call to builtin memchr.  ARG1, ARG2 and LEN are the
arguments to the call, and TYPE is its return type.
Return NULL_TREE if no simplification can be made.  */
@@ -8364,19 +8270,6 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
 CASE_FLT_FN (BUILT_IN_NANS):
   return fold_builtin_nan (arg0, type, false);
 
-case BUILT_IN_BSWAP16:
-case BUILT_IN_BSWAP32:
-case BUILT_IN_BSWAP64:
-  return fold_builtin_bswap (fndecl, arg0);
-
-CASE_INT_FN (BUILT_IN_FFS):
-CASE_INT_FN (BUILT_IN_CLZ):
-CASE_INT_FN (BUILT_IN_CTZ):
-CASE_INT_FN (BUILT_IN_CLRSB):
-CASE_INT_FN (BUILT_IN_POPCOUNT):
-CASE_INT_FN (BUILT_IN_PARITY):
-  return fold_builtin_bitop (fndecl, arg0);
-
 case BUILT_IN_ISASCII:
   return fold_builtin_isascii (loc, arg0);
 
diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index c277d2b..48e05a9 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stor-layout.h"
 #include "options.h"
 #include "fold-const-call.h"
+#include "tm.h" /* For 

Re: Move c_getstr to fold-const.c

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 04:05 PM, Richard Sandiford wrote:

Upcoming patches to fold-const-call.c want to use c_getstr, which is
currently defined in builtins.c.  The function doesn't really do anything
related to built-ins, and I'd rather not make fold-const-call.c depend
on builtins.c and builtins.c depend on fold-const-call.c, so this patch
moves the function to fold-const.c instead.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?


Sure.


Bernd


Re: Don't treat rint as setting errno

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 04:02 PM, Richard Sandiford wrote:

builtins.def says that rint sets errno, but it looks like this might
be a mistake.  C99 says that rint doesn't set errno and the builtins.c
expansion code doesn't try to keep errno up to date.


[snip explanation of the history]

FWIW the manpage has this to say:

"SUSv2 and POSIX.1-2001 contain text about overflow (which might set 
errno to ERANGE, or raise an FE_OVERFLOW exception).  In practice, the 
result cannot overflow on any current machine, so this error-handling 
stuff is just  nonsense."



Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?


Ok.


Bernd


Small C++ PATCH to clean up non-type template parameter handling

2015-11-06 Thread Jason Merrill
The old code here laments not being able to reuse the CONST_DECL created 
by process_template_parm, but we can get to it from the TEMPLATE_PARM_INDEX.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 02af5bca9bd6dbd3080bef614e411c56303d3c66
Author: Jason Merrill 
Date:   Thu Nov 5 16:20:12 2015 -0500

	* pt.c (push_inline_template_parms_recursive): Don't recreate the
	CONST_DECL.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 45eda3a..bfea8e2 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -441,21 +441,8 @@ push_inline_template_parms_recursive (tree parmlist, int levels)
 	  break;
 
 	case PARM_DECL:
-	  {
-	/* Make a CONST_DECL as is done in process_template_parm.
-	   It is ugly that we recreate this here; the original
-	   version built in process_template_parm is no longer
-	   available.  */
-	tree decl = build_decl (DECL_SOURCE_LOCATION (parm),
-CONST_DECL, DECL_NAME (parm),
-TREE_TYPE (parm));
-	DECL_ARTIFICIAL (decl) = 1;
-	TREE_CONSTANT (decl) = 1;
-	TREE_READONLY (decl) = 1;
-	DECL_INITIAL (decl) = DECL_INITIAL (parm);
-	SET_DECL_TEMPLATE_PARM_P (decl);
-	pushdecl (decl);
-	  }
+	  /* Push the CONST_DECL.  */
+	  pushdecl (TEMPLATE_PARM_DECL (DECL_INITIAL (parm)));
 	  break;
 
 	default:


Re: [PING 2] [PATCH] c++/67942 - diagnose placement new buffer overflow

2015-11-06 Thread Martin Sebor

On 11/06/2015 05:55 AM, Rainer Orth wrote:

Martin Sebor  writes:


If we use gcc_checking_assert it won't fire in release builds; let's go
with that.


Okay. Attached is an updated patch with that change.


Unfortunately, this breaks i386-pc-solaris2.10 bootstrap:

/vol/gcc/src/hg/trunk/local/gcc/cp/init.c: In function 'void 
warn_placement_new_too_small(tree, tree, tree, tree)':
/vol/gcc/src/hg/trunk/local/gcc/cp/init.c:2454:17: error: format '%lu' expects 
argument of type 'long unsigned int', but argument 5 has type 'long long 
unsigned int' [-Werror=format=]
   bytes_avail);
  ^

Printing an unsigned HOST_WIDE_INT with %lu in one case, but %wu in the
other seems like a simple typo, so the following fixes bootstrap for me:


Yes, that was a typo. Sorry about that and thanks to ktkachov for
committing the fix!

Martin


Move #undef DEF_INTERNAL_FN to internal-fn.def

2015-11-06 Thread Richard Sandiford
In practice the definition of DEF_INTERNAL_FN is never reused after
including internal-fn.def, so we might as well #undef it there.

This becomes more obvious with a later patch that adds other
DEF_INTERNAL_* directives, such as DEF_INTERNAL_OPTAB_FN.
If the includer doesn't care about the information carried in
these new directives, it can simply leave the macro undefined
and internals.def will provide a definition that forwards to
DEF_INTERNAL_FN.  It doesn't make much sense for includers to have
to #undef macros that are defined by internals.def and it seems overly
complicated to get internals.def to undef macros only in the cases
where it provided a definition.  Instead I went with the approach of
#undeffing all the DEF_INTERNAL_* macros unconditionally.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* internal-fn.def: #undef DEF_INTERNAL_FN at the end.
* internal-fn.c: Don't undef it here.
* tree-core.h: Likewise.

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index a7da373..f9f7746 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -43,7 +43,6 @@ along with GCC; see the file COPYING3.  If not see
 const char *const internal_fn_name_array[] = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) #CODE,
 #include "internal-fn.def"
-#undef DEF_INTERNAL_FN
   ""
 };
 
@@ -51,7 +50,6 @@ const char *const internal_fn_name_array[] = {
 const int internal_fn_flags_array[] = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) FLAGS,
 #include "internal-fn.def"
-#undef DEF_INTERNAL_FN
   0
 };
 
@@ -65,7 +63,6 @@ init_internal_fns ()
   if (FNSPEC) internal_fn_fnspec_array[IFN_##CODE] = \
 build_string ((int) sizeof (FNSPEC), FNSPEC ? FNSPEC : "");
 #include "internal-fn.def"
-#undef DEF_INTERNAL_FN
   internal_fn_fnspec_array[IFN_LAST] = 0;
 }
 
@@ -2054,7 +2051,6 @@ expand_GOACC_LOOP (gcall *stmt ATTRIBUTE_UNUSED)
 static void (*const internal_fn_expanders[]) (gcall *) = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) expand_##CODE,
 #include "internal-fn.def"
-#undef DEF_INTERNAL_FN
   0
 };
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 78266d9..a2da2dd 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -83,3 +83,5 @@ DEF_INTERNAL_FN (GOACC_DIM_POS, ECF_PURE | ECF_NOTHROW | 
ECF_LEAF, ".")
 
 /* OpenACC looping abstraction.  See internal-fn.h for usage.  */
 DEF_INTERNAL_FN (GOACC_LOOP, ECF_PURE | ECF_NOTHROW, NULL)
+
+#undef DEF_INTERNAL_FN
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 6b17da7..1c6976e 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -768,7 +768,6 @@ enum annot_expr_kind {
 enum internal_fn {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) IFN_##CODE,
 #include "internal-fn.def"
-#undef DEF_INTERNAL_FN
   IFN_LAST
 };
 



Move const char * -> int/fp folds to fold-const-call.c

2015-11-06 Thread Richard Sandiford
This patch moves folds that deal with constant string arguments and
return a constant integer or floating-point value.  For example, it
handles strcmp ("foo", "bar") but not strstr ("foobar", "bar"),
which wouldn't currently be accepted by the gimple folders.

The builtins.c folding for strlen (via c_strlen) is a bit more general
than what the fold-const-call.c code does (and more general than we need
for the gimple folders).  I've therefore left it as-is, even though it
partially duplicates the new code.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/
* builtins.c (fold_builtin_nan): Delete.
(fold_builtin_memcmp): Remove case where both arguments are constant.
(fold_builtin_strcmp, fold_builtin_strncmp): Likewise.
(fold_builtin_strspn, fold_builtin_strcspn): Likewise.
(fold_builtin_1): Remove BUILT_IN_NAN* handling.
* fold-const-call.c: Include fold-const.h.
(host_size_t_cst_p): New function.
(build_cmp_result, fold_const_builtin_nan): Likewise.
(fold_const_call_1): New function, split out from...
(fold_const_call): ...here (for all three interfaces).  Handle
constant nan, nans, strlen, strcmp, strncmp, strspn and strcspn.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 3f7fe3b..add9fc8 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -143,7 +143,6 @@ static tree fold_builtin_constant_p (tree);
 static tree fold_builtin_classify_type (tree);
 static tree fold_builtin_strlen (location_t, tree, tree);
 static tree fold_builtin_inf (location_t, tree, int);
-static tree fold_builtin_nan (tree, tree, int);
 static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
 static bool validate_arg (const_tree, enum tree_code code);
 static rtx expand_builtin_fabs (tree, rtx, rtx);
@@ -7264,26 +7263,6 @@ fold_builtin_inf (location_t loc, tree type, int warn)
   return build_real (type, real);
 }
 
-/* Fold a call to __builtin_nan or __builtin_nans with argument ARG.  */
-
-static tree
-fold_builtin_nan (tree arg, tree type, int quiet)
-{
-  REAL_VALUE_TYPE real;
-  const char *str;
-
-  if (!validate_arg (arg, POINTER_TYPE))
-return NULL_TREE;
-  str = c_getstr (arg);
-  if (!str)
-return NULL_TREE;
-
-  if (!real_nan (, str, quiet, TYPE_MODE (type)))
-return NULL_TREE;
-
-  return build_real (type, real);
-}
-
 /* Fold function call to builtin sincos, sincosf, or sincosl.  Return
NULL_TREE if no simplification can be made.  */
 
@@ -7378,8 +7357,6 @@ fold_builtin_memchr (location_t loc, tree arg1, tree 
arg2, tree len, tree type)
 static tree
 fold_builtin_memcmp (location_t loc, tree arg1, tree arg2, tree len)
 {
-  const char *p1, *p2;
-
   if (!validate_arg (arg1, POINTER_TYPE)
   || !validate_arg (arg2, POINTER_TYPE)
   || !validate_arg (len, INTEGER_TYPE))
@@ -7394,25 +7371,6 @@ fold_builtin_memcmp (location_t loc, tree arg1, tree 
arg2, tree len)
   if (operand_equal_p (arg1, arg2, 0))
 return omit_one_operand_loc (loc, integer_type_node, integer_zero_node, 
len);
 
-  p1 = c_getstr (arg1);
-  p2 = c_getstr (arg2);
-
-  /* If all arguments are constant, and the value of len is not greater
- than the lengths of arg1 and arg2, evaluate at compile-time.  */
-  if (tree_fits_uhwi_p (len) && p1 && p2
-  && compare_tree_int (len, strlen (p1) + 1) <= 0
-  && compare_tree_int (len, strlen (p2) + 1) <= 0)
-{
-  const int r = memcmp (p1, p2, tree_to_uhwi (len));
-
-  if (r > 0)
-   return integer_one_node;
-  else if (r < 0)
-   return integer_minus_one_node;
-  else
-   return integer_zero_node;
-}
-
   /* If len parameter is one, return an expression corresponding to
  (*(const unsigned char*)arg1 - (const unsigned char*)arg2).  */
   if (tree_fits_uhwi_p (len) && tree_to_uhwi (len) == 1)
@@ -7445,8 +7403,6 @@ fold_builtin_memcmp (location_t loc, tree arg1, tree 
arg2, tree len)
 static tree
 fold_builtin_strcmp (location_t loc, tree arg1, tree arg2)
 {
-  const char *p1, *p2;
-
   if (!validate_arg (arg1, POINTER_TYPE)
   || !validate_arg (arg2, POINTER_TYPE))
 return NULL_TREE;
@@ -7455,21 +7411,8 @@ fold_builtin_strcmp (location_t loc, tree arg1, tree 
arg2)
   if (operand_equal_p (arg1, arg2, 0))
 return integer_zero_node;
 
-  p1 = c_getstr (arg1);
-  p2 = c_getstr (arg2);
-
-  if (p1 && p2)
-{
-  const int i = strcmp (p1, p2);
-  if (i < 0)
-   return integer_minus_one_node;
-  else if (i > 0)
-   return integer_one_node;
-  else
-   return integer_zero_node;
-}
-
   /* If the second arg is "", return *(const unsigned char*)arg1.  */
+  const char *p2 = c_getstr (arg2);
   if (p2 && *p2 == '\0')
 {
   tree cst_uchar_node = build_type_variant (unsigned_char_type_node, 1, 0);
@@ -7484,6 +7427,7 @@ fold_builtin_strcmp (location_t loc, tree arg1, tree arg2)
 }
 
   /* If the first arg is "", return -*(const 

Re: [PATCH] x86 interrupt attribute

2015-11-06 Thread Uros Bizjak
On Fri, Nov 6, 2015 at 3:07 PM, Yulia Koval  wrote:
> Hi,
>
> I updated and reposted the patch. Regtested/bootstraped on
> x86_64/Linux and i686/Linux. Ok for trunk?

This version still emits insns from ix86_function_arg, so NAK.

Uros.


Re: [PATCH] gcc/config.gcc: fix typo for powerpc e6500 cpu_is_64bit

2015-11-06 Thread David Edelsohn
2015-11-06  Arnout Vandecappelle  
 * gcc/config.gcc: fix typo for powerpc e6500 cpu_is_64bit

For GCC, please don't send ChangeLog entries as diffs.

Applied.  Thanks.

- David


Move #undef DEF_BUILTIN* to builtins.def

2015-11-06 Thread Richard Sandiford
I was confused at first why tree-core.h was undefining DEF_BUILTIN_CHKP
before defining it, then undefining it again after including builtins.def.
This is because builtins.def provides a default definition of
DEF_BUILTIN_CHKP, but leaves it up to the caller to undefine it where
necessary.  Similarly to the previous internal-fn.def patch, it seems
more obvious for builtins.def to #undef things unconditionally.

One argument might have been that keeping preprocessor stuff
out of the .def files makes it easier for non-cpp parsers.  In practice
though we already have #ifs and multiline #defines, so single-line #undefs
should be easy in comparison.

Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
OK to install?

Thanks,
Richard


gcc/ada/
* gcc-interface/utils.c: Don't undef DEF_BUILTIN.

gcc/c-family/
* c-common.c: Don't undef DEF_BUILTIN.

gcc/jit/
* jit-builtins.c: Don't undef DEF_BUILTIN.

gcc/lto/
* lto-lang.c: Don't undef DEF_BUILTIN.

gcc/
* builtins.def: #undef DEF_BUILTIN and DEF_BUILTIN_CHKP
* builtins.c, genmatch.c, tree-core.h: Don't undef them here.

diff --git a/gcc/ada/gcc-interface/utils.c b/gcc/ada/gcc-interface/utils.c
index 8617a87..3b893b8 100644
--- a/gcc/ada/gcc-interface/utils.c
+++ b/gcc/ada/gcc-interface/utils.c
@@ -6040,7 +6040,6 @@ install_builtin_functions (void)
BOTH_P, FALLBACK_P, NONANSI_P,   \
built_in_attributes[(int) ATTRS], IMPLICIT);
 #include "builtins.def"
-#undef DEF_BUILTIN
 }
 
 /* --- *
diff --git a/gcc/builtins.c b/gcc/builtins.c
index add9fc8..ad661c1 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -78,7 +78,6 @@ const char * built_in_names[(int) END_BUILTINS] =
 {
 #include "builtins.def"
 };
-#undef DEF_BUILTIN
 
 /* Setup an array of builtin_info_type, make sure each element decl is
initialized to NULL_TREE.  */
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 076da40..ed850df 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -945,3 +945,6 @@ DEF_GCC_BUILTIN (BUILT_IN_LINE, "LINE", BT_FN_INT, 
ATTR_NOTHROW_LEAF_LIST)
 
 /* Pointer Bounds Checker builtins.  */
 #include "chkp-builtins.def"
+
+#undef DEF_BUILTIN_CHKP
+#undef DEF_BUILTIN
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index c87704b..b4663ce 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -5735,7 +5735,6 @@ c_define_builtins (tree va_list_ref_type_node, tree 
va_list_arg_type_node)
   BOTH_P, FALLBACK_P, NONANSI_P,   \
   built_in_attributes[(int) ATTRS], IMPLICIT);
 #include "builtins.def"
-#undef DEF_BUILTIN
 
   targetm.init_builtins ();
 
diff --git a/gcc/genmatch.c b/gcc/genmatch.c
index b5a0fff..241a628 100644
--- a/gcc/genmatch.c
+++ b/gcc/genmatch.c
@@ -208,7 +208,6 @@ enum built_in_function {
 #include "builtins.def"
 END_BUILTINS
 };
-#undef DEF_BUILTIN
 
 /* Return true if CODE represents a commutative tree code.  Otherwise
return false.  */
@@ -4598,7 +4597,6 @@ add_operator (VIEW_CONVERT2, "VIEW_CONVERT2", 
"tcc_unary", 1);
 #define DEF_BUILTIN(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND) \
   add_builtin (ENUM, # ENUM);
 #include "builtins.def"
-#undef DEF_BUILTIN
 
   /* Parse ahead!  */
   parser p (r);
diff --git a/gcc/jit/jit-builtins.c b/gcc/jit/jit-builtins.c
index b28a5de..63ff5af 100644
--- a/gcc/jit/jit-builtins.c
+++ b/gcc/jit/jit-builtins.c
@@ -62,7 +62,6 @@ static const struct builtin_data builtin_data[] =
 {
 #include "builtins.def"
 };
-#undef DEF_BUILTIN
 
 /* Helper function for find_builtin_by_name.  */
 
diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
index be317a4..4805c2a 100644
--- a/gcc/lto/lto-lang.c
+++ b/gcc/lto/lto-lang.c
@@ -731,7 +731,6 @@ lto_define_builtins (tree va_list_ref_type_node 
ATTRIBUTE_UNUSED,
 builtin_types[(int) LIBTYPE], BOTH_P, FALLBACK_P,  \
 NONANSI_P, built_in_attributes[(int) ATTRS], IMPLICIT);
 #include "builtins.def"
-#undef DEF_BUILTIN
 }
 
 static GTY(()) tree registered_builtin_types;
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 1c6976e..bd4e629 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -159,13 +159,10 @@ enum built_in_function {
 
   BEGIN_CHKP_BUILTINS,
 
-#undef DEF_BUILTIN
 #define DEF_BUILTIN(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND)
-#undef DEF_BUILTIN_CHKP
 #define DEF_BUILTIN_CHKP(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND) \
   ENUM##_CHKP = ENUM + BEGIN_CHKP_BUILTINS + 1,
 #include "builtins.def"
-#undef DEF_BUILTIN_CHKP
 
   END_CHKP_BUILTINS = BEGIN_CHKP_BUILTINS * 2 + 1,
 
@@ -186,7 +183,6 @@ enum built_in_function {
   /* Upper bound on non-language-specific builtins.  */
   END_BUILTINS
 };
-#undef DEF_BUILTIN
 
 /* Tree code classes.  Each tree_code has an associated code class
represented by a TREE_CODE_CLASS.  */



Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-06 Thread Sebastian Pop
On Fri, Nov 6, 2015 at 6:32 AM, Bernd Schmidt  wrote:
> On 11/06/2015 03:10 PM, Sebastian Pop wrote:
>>
>> On Fri, Nov 6, 2015 at 2:56 AM, Bernd Schmidt  wrote:
>>>
>>> Formatting problem, here and in a few other places. I didn't fully read
>>> the
>>> patch this time around.
>>>
>>> I'm probably not reviewing further patches because I don't see this
>>> progressing to a state where it's acceptable. Others may do so, but as
>>> far
>>> as I'm concerned the patch is rejected.
>>
>>
>> Bernd,
>> I would like to ask you to focus on the technical part, and provide a
>> review only based on technical reasons.
>> Please ignore all formatting changes: I will help address all those
>> changes.
>> I will send a patch addressing all the comments you had in the current
>> review.
>
>
> As long as this just has allocation from the normal stack frame as its only
> strategy, I consider it unacceptable (and I think Richard B voiced the same

Understood.

> opinion). If you want a half-finished redzone allocator, I can send you a
> patch.

Yes please.  Let's get it work.

Thanks,
Sebastian


C++ PATCH for non-type constrained-type-specifiers

2015-11-06 Thread Jason Merrill
I started looking at allowing non-type constrained-type-specifiers in 
auto deduction and then realized that we didn't handle them in function 
parameters either.  Fixing that brought home to me the oddity of having 
a type-specifier stand in for a non-type.  Mind weighing in on that on 
the core reflector?


I also wonder why we have two different ways of expressing a 
constrained-type-specifier in the implementation: a constrained 
template-parameter (TYPE_DECL, is_constrained_parameter) and a 
constrained auto (TEMPLATE_TYPE_PARM).  Why not represent them the same way?


This patch doesn't mess with this duality, but extends 
equivalent_placeholder_constraints to deal with both kinds, and then 
uses that for comparing constrained-type-specifiers.  And also handles 
non-type constrained-type-specifiers in abbreviated function templates.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 030c2ba7fc87d48604176697c8dcf1bf48da
Author: Jason Merrill 
Date:   Fri Nov 6 00:17:57 2015 -0500

	Support non-type constrained-type-specifiers.
	* parser.c (check_type_concept): Remove.
	(cp_parser_maybe_constrained_type_specifier): Don't call it.
	(synthesize_implicit_template_parm): Handle non-type and template
	template parameters.  Also compare extra args.  Return the decl.
	(cp_parser_template_argument): Handle constrained-type-specifiers for
	non-type template parameters.
	(finish_constrained_template_template_parm): Split out from
	cp_parser_constrained_template_template_parm.
	(cp_parser_nonclass_name): Move some logic into
	cp_parser_maybe_concept_name.
	(cp_parser_init_declarator): Fix error recovery.
	(get_concept_from_constraint): Remove.
	(cp_parser_simple_type_specifier): Adjust for
	synthesize_implicit_template_parm returning the decl.
	* constraint.cc (placeholder_extract_concept_and_args)
	(equivalent_placeholder_constraints): Also handle TYPE_DECL
	constrained parms.

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index a1fbf17..c6eaf75 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -1379,12 +1379,21 @@ make_constrained_auto (tree con, tree args)
   return decl;
 }
 
-/* Given the predicate constraint T from a placeholder type, extract its
-   TMPL and ARGS.  */
+/* Given the predicate constraint T from a constrained-type-specifier, extract
+   its TMPL and ARGS.  FIXME why do we need two different forms of
+   constrained-type-specifier?  */
 
 void
 placeholder_extract_concept_and_args (tree t, tree , tree )
 {
+  if (TREE_CODE (t) == TYPE_DECL)
+{
+  /* A constrained parameter.  */
+  tmpl = DECL_TI_TEMPLATE (CONSTRAINED_PARM_CONCEPT (t));
+  args = CONSTRAINED_PARM_EXTRA_ARGS (t);
+  return;
+}
+
   gcc_assert (TREE_CODE (t) == PRED_CONSTR);
   t = PRED_CONSTR_EXPR (t);
   gcc_assert (TREE_CODE (t) == CALL_EXPR
@@ -1418,9 +1427,10 @@ placeholder_extract_concept_and_args (tree t, tree , tree )
 bool
 equivalent_placeholder_constraints (tree c1, tree c2)
 {
-  if (TREE_CODE (c1) == TEMPLATE_TYPE_PARM)
+  if (c1 && TREE_CODE (c1) == TEMPLATE_TYPE_PARM)
+/* A constrained auto.  */
 c1 = PLACEHOLDER_TYPE_CONSTRAINTS (c1);
-  if (TREE_CODE (c2) == TEMPLATE_TYPE_PARM)
+  if (c2 && TREE_CODE (c2) == TEMPLATE_TYPE_PARM)
 c2 = PLACEHOLDER_TYPE_CONSTRAINTS (c2);
 
   if (c1 == c2)
@@ -1434,14 +1444,21 @@ equivalent_placeholder_constraints (tree c1, tree c2)
 
   if (t1 != t2)
 return false;
-  int len = TREE_VEC_LENGTH (a1);
-  if (len != TREE_VEC_LENGTH (a2))
-return false;
+
   /* Skip the first argument to avoid infinite recursion on the
  placeholder auto itself.  */
-  for (int i = len-1; i > 0; --i)
-if (!cp_tree_equal (TREE_VEC_ELT (a1, i),
-			TREE_VEC_ELT (a2, i)))
+  bool skip1 = (TREE_CODE (c1) == PRED_CONSTR);
+  bool skip2 = (TREE_CODE (c2) == PRED_CONSTR);
+
+  int len1 = (a1 ? TREE_VEC_LENGTH (a1) : 0) - skip1;
+  int len2 = (a2 ? TREE_VEC_LENGTH (a2) : 0) - skip2;
+
+  if (len1 != len2)
+return false;
+
+  for (int i = 0; i < len1; ++i)
+if (!cp_tree_equal (TREE_VEC_ELT (a1, i + skip1),
+			TREE_VEC_ELT (a2, i + skip2)))
   return false;
   return true;
 }
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index c6f5729..d1f4970 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -13871,18 +13871,9 @@ cp_parser_constrained_type_template_parm (cp_parser *parser,
 return error_mark_node;
 }
 
-/* Finish parsing/processing a template template parameter by borrowing
-   the template parameter list from the prototype parameter.  */
-
 static tree
-cp_parser_constrained_template_template_parm (cp_parser *parser,
-  tree proto,
-  tree id,
-  cp_parameter_declarator *parmdecl)
+finish_constrained_template_template_parm (tree proto, tree id)
 {
-  if 

Re: Handle constant fp classifications in fold-const-call.c

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 04:08 PM, Richard Sandiford wrote:

Move the constant "is finite", "is infinite" and "is nan" queries
to fold-const-call.c.


I kind of missed it when that file was added, but I'm assuming it's the 
correct direction for things to move. Ok.



Bernd


[Obivous][testsuite,aarch64/arm] Fix typo

2015-11-06 Thread Christophe Lyon
Hi,

I've committed the attached fix for a comment in
gcc.target/aarch64/advsimd-intrinsics/vtbX.c.
(as revision 229858)

Christophe.
2015-11-06  Christophe Lyon  

* gcc.target/aarch64/advsimd-intrinsics/vtbX.c: Fix typos in
comment.

===
--- gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtbX.c  (revision 
229857)
+++ gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vtbX.c  (revision 
229858)
@@ -147,8 +147,8 @@
   VDUP(vector, , uint, u, 8, 8, 2);
   VDUP(vector, , poly, p, 8, 8, 2);
 
-  /* To ensure coverage, add some indexes larger than 8,16 and 32
- except: lane 0 (by 6), lane 1 (by 8) and lane 2 (by 9).  */
+  /* To ensure coverage, add some indexes larger than 8, 16 and 32
+ except: lane 0 (by 10), lane 4 (by 20) and lane 5 (by 40).  */
   VSET_LANE(vector, , int, s, 8, 8, 0, 10);
   VSET_LANE(vector, , int, s, 8, 8, 4, 20);
   VSET_LANE(vector, , int, s, 8, 8, 5, 40);


Re: [OpenACC] declare directive

2015-11-06 Thread James Norris

Thomas,

I've added the issues that you pointed out. Attached
is the updated patch and ChangeLog.

Thanks for taking the time for your review!

Jakub,

Could you queue this patch ahead of my other one for
today? If it doesn't pass review, I'd like to work
the weekend and have the updated patch ready for
you when you walk in on your Monday.

Thank you,
Jim


On 11/04/2015 10:49 AM, Thomas Schwinge wrote:

Hi Jim!

On Tue, 3 Nov 2015 10:31:32 -0600, James Norris  
wrote:

On 10/27/2015 03:18 PM, James Norris wrote:

  This patch adds the processing of OpenACC declare directive in C
  and C++. (Note: Support in Fortran is already in trunk.)


..., and a patch adjusting some Fortran front end things is awaiting
review,
.


  I've revised the patch since I originally submitted it for review
  (https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02967.html). The
  revision is due to Jakub and et al OpenMP 4.5 work in the area of
  'omp declare target'. I now exploit that functionality and have
  revised the patch accordingly.


Oh, wow, you could remove a lot of code!


Yes, I missed that patch when it entered into the code base. My bad.



Just a superficial review on your patch; patch re-ordered a bit for
review.


--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def



+DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_INT_UINT, BT_VOID, BT_PTR, BT_INT, BT_UINT)



--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def



+DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_INT_UINT, BT_VOID, BT_PTR, BT_INT, BT_UINT)



--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def



+DEF_GOACC_BUILTIN (BUILT_IN_GOACC_STATIC, "GOACC_register_static",
+   BT_FN_VOID_PTR_INT_UINT, ATTR_NOTHROW_LIST)



--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map



+GOACC_register_static;


I think these changes can be dropped -- assuming you have not
unintentionally dropped the GOACC_register_static function/usage in your
v2 patch.


Will fix.




--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c



@@ -830,6 +830,7 @@ const struct attribute_spec c_common_attribute_table[] =



+  { "oacc declare",   0, -1, true,  false, false, NULL, false },


As far as I can tell, nothing is setting this attribute anymore in your
v2 patch, so I guess this and all handling code ("lookup_attribute",
"remove_attribute") can also be dropped?


Will fix.




--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c



  /* OpenACC 2.0:
+   # pragma acc declare oacc-data-clause[optseq] new-line
+*/
+
+#define OACC_DECLARE_CLAUSE_MASK\
+( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPY)\
+| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)\
+| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT)\
+| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)\
+| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)\
+| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT)\
+| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_LINK)\


For uniformity, please use/add a new alias "PRAGMA_OACC_CLAUSE_* =
PRAGMA_OMP_CLAUSE_LINK" instead of using PRAGMA_OMP_CLAUSE_* here, and
also in c_parser_oacc_data_clause, c_parser_oacc_all_clauses, and in the
C++ front end.


Will fix.




+| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT)\
+| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPY)\
+| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYIN)\
+| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYOUT)\
+| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_CREATE) )
+
+static void
+c_parser_oacc_declare (c_parser *parser)
+{



--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c



+static tree
+cp_parser_oacc_declare (cp_parser *parser, cp_token *pragma_tok)
+{
+  [...]
+  tree prev_attr = lookup_attribute ("oacc declare",
+ DECL_ATTRIBUTES (decl));


Per my comment above, this would always be NULL_TREE.  The C front end is
different?


Will fix.




+
+  if (OMP_CLAUSE_CODE (t) == OMP_CLAUSE_LINK)
+id = get_identifier ("omp declare target link");
+  else
+id = get_identifier ("omp declare target");
+
+  if (prev_attr)
+{
+  tree p = TREE_VALUE (prev_attr);
+  tree cl = TREE_VALUE (p);
+
+  if (!devres && OMP_CLAUSE_MAP_KIND (cl) != GOMP_MAP_DEVICE_RESIDENT)
+{
+  error_at (loc, "variable %qD used more than once with "
+"%<#pragma acc declare%>", decl);
+  inform (OMP_CLAUSE_LOCATION (TREE_VALUE (p)),
+  "previous directive was here");
+  error = true;
+  continue;
+}
+}



--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15314,6 +15314,17 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl,
add_stmt (t);
break;

+case 

  1   2   3   >