[PATCH, PR tree-optimization/68766] Remove all LOOP_VECTORIZED calls

2015-12-07 Thread Ilya Enkovich
Hi,

This patch ensures LOOP_VECTORIZED calls are still processed when debug 
counters are used for the vectorizer.  Bootstrapped and regtested on 
x86_64-unknown-linux-gnu.  Patch was approved in tracker [1] and applied to trunk.
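
The idea behind the fix can be sketched outside of GCC.  The snippet below 
is only a toy model, not the vectorizer code: struct counter, 
counter_allows and walk_loops are hypothetical names standing in for the 
debug counter and the loop walk.  When the counter stops the walk early, 
the remaining loops are never inspected, so the walk conservatively asks 
for the cleanup pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a debug counter: permits LIMIT events.  */
struct counter { int used; int limit; };

bool
counter_allows (struct counter *c)
{
  return c->used++ < c->limit;
}

/* Walk N loops.  IFCVT[i] is true if loop i has an if-converted copy.
   Returns the number of loops processed.  *NEEDS_CLEANUP is set if an
   if-converted loop was seen, or if the counter cut the walk short and
   some loops were therefore never inspected.  */
int
walk_loops (const bool *ifcvt, int n, struct counter *c, bool *needs_cleanup)
{
  assert (n >= 0);
  int processed = 0;
  *needs_cleanup = false;
  for (int i = 0; i < n; i++)
    {
      if (!counter_allows (c))
        {
          /* Loops i..n-1 are skipped; request the finalization pass
             so any if-converted loop among them is still visited.  */
          *needs_cleanup = true;
          break;
        }
      if (ifcvt[i])
        *needs_cleanup = true;
      processed++;
    }
  return processed;
}
```

With a limit of 1 and an if-converted loop in third position, the walk 
stops after one loop but still reports that cleanup is needed.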

Thanks,
Ilya

[1] - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68766#c3
--
gcc/

2015-12-08  Ilya Enkovich  

PR tree-optimization/68766
* tree-vectorizer.c (vectorize_loops): Check for
if-converted loops when debug counters are used.

gcc/testsuite/

2015-12-08  Ilya Enkovich  

PR tree-optimization/68766
* gcc.dg/pr68766.c: New test.


diff --git a/gcc/testsuite/gcc.dg/pr68766.c b/gcc/testsuite/gcc.dg/pr68766.c
new file mode 100644
index 000..a0d549b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68766.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fdbg-cnt=vect_loop:1" } */
+/* { dg-additional-options "-mavx2" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-prune-output "dbg_cnt 'vect_loop' set to 1" } */
+
+int a, b, g, h;
+int c[58];
+int d[58];
+int fn1() {
+  for (; g; g++)
+if (a)
+  c[g] = b;
+}
+
+int fn2() {
+  fn1();
+  for (; h; h++)
+d[h] = 0;
+}
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index b721c56..c496c4b 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -536,7 +536,13 @@ vectorize_loops (void)
  continue;
 
 if (!dbg_cnt (vect_loop))
- break;
+ {
+   /* We may miss some if-converted loops due to
+  debug counter.  Set any_ifcvt_loops to visit
+  them at finalization.  */
+   any_ifcvt_loops = true;
+   break;
+ }
 
gimple *loop_vectorized_call = vect_loop_vectorized_call (loop);
if (loop_vectorized_call)


[rl78] fix far addressing etc

2015-12-07 Thread DJ Delorie

Various fixes for far memory addressing (and large programs in general).  
Committed.

* config/rl78/constraints.md (Wfr): Change to be a non-memory
constraint.
* config/rl78/rl78-protos.h (rl78_one_far_p): Declare.
* config/rl78/rl78.c (rl78_one_far_p): Define.
* config/rl78/rl78-virt.md (movqi_virt): Fix far memory
alternatives.
(movhi_virt): Likewise.
(zero_extendqihi2_virt): Likewise.
(extendqihi2_virt): Likewise.
(add3_virt): Likewise.
(sub3_virt): Likewise.
(andqi3_virt): Likewise.
(iorqi3_virt): Likewise.
(xorqi3_virt): Likewise.
* config/rl78/rl78-real.md (bf,br): Use long forms to avoid reloc
overflow in large files.

Index: gcc/config/rl78/constraints.md
===
--- gcc/config/rl78/constraints.md  (revision 231385)
+++ gcc/config/rl78/constraints.md  (working copy)
@@ -361,13 +361,13 @@
 (define_memory_constraint "Ws1"
   "es:word8[SP]"
   (match_test "(rl78_es_addr (op) && satisfies_constraint_Cs1 (rl78_es_base 
(op)))
|| satisfies_constraint_Cs1 (op)")
   )
 
-(define_memory_constraint "Wfr"
+(define_constraint "Wfr"
   "ES/CS far pointer"
   (and (match_code "mem")
(match_test "rl78_far_p (op)"))
   )
 
 (define_memory_constraint "Wsa"
Index: gcc/config/rl78/rl78-protos.h
===
--- gcc/config/rl78/rl78-protos.h   (revision 231385)
+++ gcc/config/rl78/rl78-protos.h   (working copy)
@@ -51,6 +51,8 @@ bool  rl78_flags_already_set (rtx, rtx);
 void   rl78_output_symbol_ref (FILE *, rtx);
 void   rl78_output_labelref (FILE *, const char *);
 intrl78_saddr_p (rtx x);
 intrl78_sfr_p (rtx x);
 void   rl78_output_aligned_common (FILE *, tree, const char *,
int, int, int);
+
+intrl78_one_far_p (rtx *operands, int num_operands);
Index: gcc/config/rl78/rl78-real.md
===
--- gcc/config/rl78/rl78-real.md(revision 231385)
+++ gcc/config/rl78/rl78-real.md(working copy)
@@ -586,25 +586,25 @@
(if_then_else (eq (and (reg:QI A_REG)
   (match_operand 0 "immediate_operand" "n"))
  (const_int 0))
  (label_ref (match_operand 1 "" ""))
  (pc)))]
   ""
-  "bf\tA.%B0, $%1"
+  "bt\tA.%B0, $1f\n\tbr !!%1\n\t1:"
   [(set (attr "update_Z") (const_string "clobber"))]
 )
 
 (define_insn "bt"
   [(set (pc)
(if_then_else (ne (and (reg:QI A_REG)
   (match_operand 0 "immediate_operand" "n"))
  (const_int 0))
  (label_ref (match_operand 1 "" ""))
  (pc)))]
   ""
-  "bt\tA.%B0, $%1"
+  "bf\tA.%B0, $1f\n\tbr !!%1\n\t1:"
   [(set (attr "update_Z") (const_string "clobber"))]
 )
 
 ;; NOTE: These peepholes are fragile.  They rely upon GCC generating
 ;; a specific sequence on insns, based upon examination of test code.
 ;; Improvements to GCC or using code other than the test code can result
Index: gcc/config/rl78/rl78-virt.md
===
--- gcc/config/rl78/rl78-virt.md(revision 231385)
+++ gcc/config/rl78/rl78-virt.md(working copy)
@@ -39,14 +39,14 @@
   "rl78_virt_insns_ok ()"
   "v.mov %0, %1"
   [(set_attr "valloc" "op1")]
 )
 
 (define_insn "*movqi_virt"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=vY,v,Wfr")
-   (match_operand1 "general_operand" "vInt8J,YWfr,vInt8J"))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=vY,v,*Wfr,Y,*Wfr,*Wfr")
+   (match_operand1 "general_operand" 
"vInt8JY,*Wfr,vInt8J,*Wfr,Y,*Wfr"))]
   "rl78_virt_insns_ok ()"
   "v.mov %0, %1"
   [(set_attr "valloc" "op1")]
 )
 
 (define_insn "*movhi_virt_mm"
@@ -55,33 +55,33 @@
   "rl78_virt_insns_ok ()"
   "v.movw %0, %1"
   [(set_attr "valloc" "op1")]
 )
 
 (define_insn "*movhi_virt"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=vS,  Y,   v,   Wfr")
-   (match_operand:HI 1 "general_operand"  "viYS, viS, Wfr, vi"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=vS,  Y,   v,   *Wfr")
+   (match_operand:HI 1 "general_operand"  "viYS, viS, *Wfr, vi"))]
   "rl78_virt_insns_ok ()"
   "v.movw %0, %1"
   [(set_attr "valloc" "op1")]
 )
 
 ;;-- Conversions 
 
 (define_insn "*zero_extendqihi2_virt"
-  [(set (match_operand:HI 0 "rl78_nonfar_nonimm_operand" "=vm")
-   (zero_extend:HI (match_operand:QI 1 "general_operand" "vim")))]
-  "rl78_virt_insns_ok ()"
+  [(set (match_operand:HI 0 "rl78_nonfar_nonimm_operand" 
"=vY,*Wfr")
+   (zero_extend:HI (match_operand:QI 1 "general_operand" "vim,viY")))]
+  

[RFA] [PATCH] [PR tree-optimization/68619] Avoid direct cfg cleanups in tree-ssa-dom.c [3/3]

2015-12-07 Thread Jeff Law


And testcases.  One from the BZ.  Two ICEs that showed up during 
development, one case where we optimize better now than before, and one 
case where we missed an optimization during development that's since 
been fixed.


commit f5b74ee83944177f0a1b98ca577343e45aa35584
Author: Jeff Law 
Date:   Mon Dec 7 22:43:53 2015 -0700

PR tree-optimization/68619
* gcc.dg/tree-ssa/pr68619-1.c: New test.
* gcc.dg/tree-ssa/pr68619-2.c: New test.
* gcc.dg/tree-ssa/pr68619-3.c: New test.
* gcc.dg/tree-ssa/pr68619-4.c: New test.
* gcc.dg/tree-ssa/pr68619-5.c: New test.

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 4b1b1a3..0cb09e1 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,12 @@
+2015-12-05  Jeff Law  
+
+   PR tree-optimization/68619
+   * gcc.dg/tree-ssa/pr68619-1.c: New test.
+   * gcc.dg/tree-ssa/pr68619-2.c: New test.
+   * gcc.dg/tree-ssa/pr68619-3.c: New test.
+   * gcc.dg/tree-ssa/pr68619-4.c: New test.
+   * gcc.dg/tree-ssa/pr68619-5.c: New test.
+
 2015-12-02  Jeff Law  
 
* gcc.dg/tree-ssa/reassoc-43.c: New test.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68619-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr68619-1.c
new file mode 100644
index 000..3e988de
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68619-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -w" } */
+
+extern void fn2(int);
+int a, b, c;
+void fn1() {
+  int d;
+  for (; b; b++) {
+a = 7;
+for (; a;) {
+jump:
+  fn2(d ?: c);
+  d = 0;
+}
+d = c;
+if (c)
+  goto jump;
+  }
+  goto jump;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c
new file mode 100644
index 000..cca706e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c
@@ -0,0 +1,92 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dom2-details -w" } */
+
+typedef union tree_node *tree;
+struct gcc_options
+{
+  int x_flag_finite_math_only;
+};
+extern struct gcc_options global_options;
+enum mode_class
+{ MODE_RANDOM, MODE_CC, MODE_INT, MODE_PARTIAL_INT, MODE_FRACT, MODE_UFRACT,
+  MODE_ACCUM, MODE_UACCUM, MODE_FLOAT, MODE_DECIMAL_FLOAT, MODE_COMPLEX_INT,
+  MODE_COMPLEX_FLOAT, MODE_VECTOR_INT, MODE_VECTOR_FRACT,
+  MODE_VECTOR_UFRACT, MODE_VECTOR_ACCUM, MODE_VECTOR_UACCUM,
+  MODE_VECTOR_FLOAT, MAX_MODE_CLASS
+};
+extern const unsigned char mode_class[27];
+extern const unsigned char mode_inner[27];
+struct real_value
+{
+};
+struct real_format
+{
+  unsigned char has_inf;
+};
+extern const struct real_format *real_format_for_mode[5 -
+ 2 + 1 + 15 - 10 + 1];
+struct tree_type
+{
+};
+union tree_node
+{
+  int code;
+  int mode;
+  struct tree_type type;
+};
+tree
+omp_reduction_init (tree clause, tree type)
+{
+  if type)->code) == 64))
+{
+  struct real_value max;
+  if (mode_class[type))->code) ==
+ 32 ?
+ vector_type_mode (type)
+ : (type)->mode)]) ==
+MODE_VECTOR_FLOAT)
+   &&
+   ((real_format_for_mode
+ [((mode_class[((mode_class[type))->code) ==
+ 32 ?
+ vector_type_mode (type)
+ : (type)->mode)]) ==
+12) ? (type))->code)
+==
+32 ?
+vector_type_mode
+(type)
+: (type)->mode))
+   : (mode_inner[type))->code) ==
+  32 ?
+  vector_type_mode (type)
+  : (type)->mode)])]) ==
+   12)
+  ? (mode_class[type))->code) ==
+ 32 ? vector_type_mode (type)
+ : (type)->mode)]) ==
+12) ? (type))->code) ==
+32 ?
+vector_type_mode (type)
+: (type)->mode)) : (mode_inner
+[type))->code) ==
+  32 ?
+  vector_type_mode (type)
+  : (type)->mode)])) - 10) +
+ (5 - 2 +
+  1))
+  : mode_class
+[type))->code) ==
+  32 ? vector_type_mode (type) : (type)->mode)]) ==
+   12) ? (type))->code) ==
+   32 ? 

[RFA] [PATCH] [PR tree-optimization/68619] Avoid direct cfg cleanups in tree-ssa-dom.c [2/3]

2015-12-07 Thread Jeff Law


This patch tweaks tree-ssa-dom.c to use the new capability in the dom 
walker.  Additionally:


The code to remove jump threading paths now runs after the walk is 
finished rather than when the conditional is optimized.  The code which 
optimizes conditionals replaces the condition with a true/false 
condition and clears EDGE_EXECUTABLE.


When we try to find equivalences from PHIs, we ignore a PHI arg 
associated with an unexecutable edge.


When we look for blocks that have a single incoming edge, ignoring loop 
edges, we ignore edges that are not executable.


That's enough to get all the benefits of the current implementation on 
the trunk, and slightly better code than the trunk in certain cases.
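
To illustrate the PHI point above, here is a small standalone sketch.  It 
is a toy model and not the code in record_equivalences_from_phis: struct 
phi_arg and phi_common_value are hypothetical names.  Arguments arriving 
over edges that are no longer executable are skipped, so a PHI can 
collapse to a single value even if a dead edge carries something else.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one PHI argument: the incoming value and
   whether its incoming edge is still considered executable.  */
struct phi_arg { int value; bool edge_executable; };

/* Return true and store the common value in *OUT if all arguments on
   executable edges carry the same value.  Arguments on non-executable
   edges are ignored.  Returns false if the values differ or if no
   executable argument exists.  */
bool
phi_common_value (const struct phi_arg *args, int n, int *out)
{
  assert (n >= 0);
  bool seen = false;
  int common = 0;
  for (int i = 0; i < n; i++)
    {
      if (!args[i].edge_executable)
        continue;
      if (!seen)
        {
          common = args[i].value;
          seen = true;
        }
      else if (args[i].value != common)
        return false;
    }
  if (seen)
    *out = common;
  return seen;
}
```

For the arguments (5, executable), (7, not executable), (5, executable) 
the function reports the common value 5; if the middle edge were 
executable it would report no equivalence.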


Obviously if we twiddle the member names in patch #1, then this patch 
will need corresponding trivial updates.


Jeff
commit 89a7f78005a5ec4788383ecd44474c85103693b5
Author: Jeff Law 
Date:   Mon Dec 7 22:43:06 2015 -0700

PR tree-optimization/68619
* tree-ssa-dom.c (pass_dominator::execute): Use new methods
from dom_walker to handle unreachable code.  If a block has an
unreachable edge, remove all jump threads through any successor
of the affected block.
(dom_opt_dom_walker::thread_across_edge): Do not thread across
edges without EDGE_EXECUTABLE set.
(record_equivalences_from_phis): Ignore alternatives if the edge
does not have EDGE_EXECUTABLE set.
(single_incoming_edge_ignoring_loop_edges): Similarly.
(dom_opt_dom_walker::before_dom_children): Use new methods from
dom_walker to handle unreachable code.
(dom_opt_dom_walker::after_dom_children): Similarly.
(optimize_stmt): If a gimple_code has a compile-time constant
condition, clear EDGE_EXECUTABLE on the non-taken edges.  Also
change the condition to true/false as necessary.

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index aeb726c..c48951e 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -609,8 +609,39 @@ pass_dominator::execute (function *fun)
   dom_opt_dom_walker walker (CDI_DOMINATORS,
 const_and_copies,
 avail_exprs_stack);
+  walker.init_edge_executable (cfun);
   walker.walk (fun->cfg->x_entry_block_ptr);
 
+  /* Look for blocks where we cleared EDGE_EXECUTABLE on an outgoing
+ edge.  When found, remove jump threads which contain any outgoing
+ edge from the affected block.  */
+  if (cfg_altered)
+{
+  FOR_EACH_BB_FN (bb, fun)
+   {
+ edge_iterator ei;
+ edge e;
+
+ /* First see if there are any edges without EDGE_EXECUTABLE
+set.  */
+ bool found = false;
+ FOR_EACH_EDGE (e, ei, bb->succs)
+   {
+ if ((e->flags & EDGE_EXECUTABLE) == 0)
+   {
+ found = true;
+ break;
+   }
+   }
+
+ /* If there were any such edges found, then remove jump threads
+containing any edge leaving BB.  */
+ if (found)
+   FOR_EACH_EDGE (e, ei, bb->succs)
+ remove_jump_threads_including (e);
+   }
+}
+
   {
 gimple_stmt_iterator gsi;
 basic_block bb;
@@ -893,6 +924,11 @@ record_temporary_equivalences (edge e,
 void
 dom_opt_dom_walker::thread_across_edge (edge e)
 {
+  /* If E is not executable, then there's no reason to bother
+ threading across it.  */
+  if ((e->flags & EDGE_EXECUTABLE) == 0)
+return;
+
   if (! m_dummy_cond)
 m_dummy_cond =
 gimple_build_cond (NE_EXPR,
@@ -951,6 +987,11 @@ record_equivalences_from_phis (basic_block bb)
  if (lhs == t)
continue;
 
+ /* If the associated edge is not marked as executable, then it
+can be ignored.  */
+ if ((gimple_phi_arg_edge (phi, i)->flags & EDGE_EXECUTABLE) == 0)
+   continue;
+
  t = dom_valueize (t);
 
  /* If we have not processed an alternative yet, then set
@@ -997,6 +1038,10 @@ single_incoming_edge_ignoring_loop_edges (basic_block bb)
   if (dominated_by_p (CDI_DOMINATORS, e->src, e->dest))
continue;
 
+  /* We can safely ignore edges that are not executable.  */
+  if ((e->flags & EDGE_EXECUTABLE) == 0)
+   continue;
+
   /* If we have already seen a non-loop edge, then we must have
 multiple incoming non-loop edges and thus we return NULL.  */
   if (retval)
@@ -1307,6 +1352,14 @@ dom_opt_dom_walker::before_dom_children (basic_block bb)
   m_avail_exprs_stack->push_marker ();
   m_const_and_copies->push_marker ();
 
+  /* If BB is not reachable, then propagate that property to edges, but
+ do not process this block any further.  */
+  if (!this->bb_reachable (cfun, bb))
+{
+  this->propagate_unreachable_to_edges (bb, dump_file, dump_flags);
+  return;
+}
+
   record_equivalences_from_incoming_edge (bb, 

[RFA] [PATCH] [PR tree-optimization/68619] Avoid direct cfg cleanups in tree-ssa-dom.c [1/3]

2015-12-07 Thread Jeff Law


First in the series.  This merely refactors code from tree-ssa-sccvn.c 
into domwalk.c so that other walkers can use it as they see fit.



There's an initialization function which sets all edges to executable.

There's a test function to see if a block is reachable.

There's a propagation function to propagate the unreachable property to 
edges.


Finally a function to clear m_unreachable_dom.  I consider this a wart. 
 Essentially that data member contains the highest unreachable block in 
the dominator tree.  Once we've finished processing that block's 
children, we need to clear the member.  Ideally clients wouldn't need to 
call this member function.
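
For readers who want to see how the pieces described above fit together, 
here is a standalone toy model in plain C.  It is a sketch under 
simplifying assumptions, not the domwalk.c code: the struct layouts, the 
precomputed is_backedge flag (standing in for a dominance query) and the 
name mark_unreachable are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EDGES 4

struct block;

struct edge
{
  struct block *src;
  struct block *dest;
  bool executable;
  bool is_backedge;	/* True if SRC is dominated by DEST.  */
};

struct block
{
  struct edge *preds[MAX_EDGES];
  struct edge *succs[MAX_EDGES];
  int n_preds, n_succs;
  bool is_entry;
};

/* Highest unreachable block seen so far in the dominator walk.  */
struct walker { struct block *unreachable_dom; };

/* BB is reachable if we are not below an unreachable block and BB is
   the entry block or has an executable non-backedge predecessor.  */
bool
bb_reachable (const struct walker *w, const struct block *bb)
{
  if (w->unreachable_dom)
    return false;
  bool reachable = bb->is_entry;
  for (int i = 0; i < bb->n_preds; i++)
    if (!bb->preds[i]->is_backedge && bb->preds[i]->executable)
      reachable = true;
  return reachable;
}

/* BB is unreachable: clear its outgoing edges and incoming backedges,
   and remember BB if it is the highest unreachable block.  */
void
mark_unreachable (struct walker *w, struct block *bb)
{
  assert (bb != NULL);
  for (int i = 0; i < bb->n_succs; i++)
    bb->succs[i]->executable = false;
  for (int i = 0; i < bb->n_preds; i++)
    if (bb->preds[i]->is_backedge)
      bb->preds[i]->executable = false;
  if (!w->unreachable_dom)
    w->unreachable_dom = bb;
}

/* Called when the walk has finished BB's dominator children.  */
void
maybe_clear_unreachable_dom (struct walker *w, struct block *bb)
{
  if (w->unreachable_dom == bb)
    w->unreachable_dom = NULL;
}
```

The wart is visible in the last function: the walker has to be told when 
it leaves the subtree of the recorded block, otherwise every later block 
would be treated as unreachable.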


Bikeshedding on the member names is encouraged.  And if someone has a 
clean, simple way to ensure that the m_unreachable_dom member gets 
cleared regardless of whether or not a client has its own 
after_dom_children callback, I'd love to hear it.


OK for trunk?


Jeff
commit 5e53fefae0373768b98d51d5746d43f36cecbe2a
Author: Jeff Law 
Date:   Mon Dec 7 11:32:58 2015 -0700

* domwalk.h (dom_walker::init_edge_executable): New method.
(dom_walker::maybe_clear_unreachable_dom): Likewise.
(dom_walker::bb_reachable): Likewise.
(dom_walker::propagate_unreachable_to_edges): Likewise.
(dom_walker::m_unreachable_dom): New private data member.
* domwalk.c: Include dumpfile.h.
(dom_walker::init_edge_executable): New method.
(dom_walker::maybe_clear_unreachable_dom): Likewise.
(dom_walker::bb_reachable): Likewise.  Factored out of
tree-ssa-sccvn.c
(dom_walker::propagate_unreachable_to_edges): Likewise.
* tree-ssa-sccvn.c (sccvn_dom_walker::unreachable_dom): Remove
private data member.
(sccvn_dom_walker::after_dom_children): Use methods from dom_walker
class.
(sccvn_dom_walker::before_dom_children): Similarly.
(run_scc_vn): Likewise.

diff --git a/gcc/domwalk.c b/gcc/domwalk.c
index 167fc38..feb6478 100644
--- a/gcc/domwalk.c
+++ b/gcc/domwalk.c
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "backend.h"
 #include "cfganal.h"
 #include "domwalk.h"
+#include "dumpfile.h"
 
 /* This file implements a generic walker for dominator trees.
 
@@ -142,6 +143,93 @@ cmp_bb_postorder (const void *a, const void *b)
   return 1;
 }
 
+/* Mark all edges in the CFG as possibly being executable.  */
+
+void
+dom_walker::init_edge_executable (struct function *fun)
+{
+  basic_block bb;
+  FOR_ALL_BB_FN (bb, fun)
+{
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, bb->succs)
+   e->flags |= EDGE_EXECUTABLE;
+}
+}
+
+/* Return TRUE if BB is reachable, false otherwise.  */
+
+bool
+dom_walker::bb_reachable (struct function *fun, basic_block bb)
+{
+  /* If any of the predecessor edges that do not come from blocks dominated
+ by us are still marked as possibly executable consider this block
+ reachable.  */
+  bool reachable = false;
+  if (!m_unreachable_dom)
+{
+  reachable = bb == ENTRY_BLOCK_PTR_FOR_FN (fun);
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, bb->preds)
+   if (!dominated_by_p (CDI_DOMINATORS, e->src, bb))
+ reachable |= (e->flags & EDGE_EXECUTABLE);
+}
+
+  return reachable;
+}
+
+/* BB has been determined to be unreachable.  Propagate that property
+   to incoming and outgoing edges of BB as appropriate.  */
+
+void
+dom_walker::propagate_unreachable_to_edges (basic_block bb,
+   FILE *dump_file,
+   int dump_flags)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, "Marking all outgoing edges of unreachable "
+"BB %d as not executable\n", bb->index);
+
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, bb->succs)
+e->flags &= ~EDGE_EXECUTABLE;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+{
+  if (dominated_by_p (CDI_DOMINATORS, e->src, bb))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "Marking backedge from BB %d into "
+"unreachable BB %d as not executable\n",
+e->src->index, bb->index);
+ e->flags &= ~EDGE_EXECUTABLE;
+   }
+}
+
+  if (!m_unreachable_dom)
+m_unreachable_dom = bb;
+}
+
+/* When we propagate the unreachable property to edges, we
+   also arrange to track the highest block in the dominator
+   walk which was unreachable.  We can use that to identify
+   more unreachable blocks.
+
+   When we finish processing the dominator children for that
+   highest unreachable block, we need to make sure to clear
+   that recorded highest block unreachable block in the
+   dominator tree.  */
+
+void
+dom_walker::maybe_clear_unreachable_dom (basic_block bb)
+{
+  if (m_unreachable_dom == bb)
+m_unreachable_dom = NULL;
+}
+
 /* Recursively walk the 

[RFA] [PATCH] [PR tree-optimization/68619] Avoid direct cfg cleanups in tree-ssa-dom.c [0/3]

2015-12-07 Thread Jeff Law
Richi and I have been discussing revamping slightly how DOM handles 
conditionals which it detects are always true or always false.


During gcc6 stage1 I added code to allow DOM to clean them up 
immediately, primarily to avoid the waste of having the threader handle 
those cases.  It was also believed that by cleaning things up during the 
DOM walk we could realize some secondary benefits (certain PHIs become 
more likely to collapse down to a const/copy which can then be propagated).


That code causes an interesting problem as shown by 68619.  Essentially 
the CFG has 3 loops, one is a natural loop, the other two are irreducible.


DOM finds conditionals which it can optimize to true/false.  It removes 
the unreachable edges and everything seems perfect.  Except that removal 
of those edges causes the irreducible loops to become reducible.  This is a 
good thing, except


Now we have two new natural loops, which triggers a checking failure 
because we haven't set up loop structures for the newly exposed natural 
loops.


Richi's suggestion (before this problem was reported) was to have DOM 
leave the CFG alone, but otherwise optimize as-if the edges had been 
removed.  Final removal of the edges would be left to cfg_cleanup.  He 
also pointed me at SCCVN which does something similar.


This change essentially has DOM working in the same way as SCCVN.  The 
change is broken into 3 parts.


1. Refactor the code from tree-ssa-sccvn.c into domwalk.c  Essentially 
it's 4 new member functions that a dominator walker can optionally use 
to improve its behaviour when the pass might make certain edges 
unexecutable.  I need someone to review these changes.  If you've got a 
better name for the member functions, certainly pass them along.  I'm 
not particularly happy with maybe_clear_unreachable_dom.  It feels like 
an internal implementation detail has leaked out, but I'm not really 
sure how to fix it, so any suggestions there are certainly welcome.


2. Use the new member functions in tree-ssa-dom.c.  It's pretty simple 
stuff.


3. New tests.  One is the actual 68619 testcase.  Two ICEs for minor 
bugs found during development/testing, one case where we optimize better 
now than before, one for a missed optimization during development.


The patchset as a whole has been bootstrapped and regression tested on 
x86_64-linux-gnu.


Jeff


Re: [PATCH] Fix new sancov tests

2015-12-07 Thread Jakub Jelinek
On Sun, Dec 06, 2015 at 09:56:32AM +0100, Dmitry Vyukov wrote:

> --- gcc.dg/sancov/sancov.exp (revision 231328)
> +++ gcc.dg/sancov/sancov.exp (working copy)
> @@ -18,6 +18,7 @@
> 
>  load_lib gcc-dg.exp
>  load_lib torture-options.exp
> +load_lib asan-dg.exp
> 
>  dg-init
>  torture-init
> @@ -31,7 +32,11 @@
>   { -O2 -g } \
>   { -O3 -g } ]
> 
> -gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] "" ""
> +gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/basic*.c]] "" ""
> 
> +if [check_effective_target_fsanitize_address] {
> +  gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/asan*.c]] "" ""
> +}
> +

I don't like this, it is bad enough vect.exp works this way, let's not add
further tests depending on test names.
So, either just load_lib asan-dg.exp and change
/* { dg-do compile } */
to
/* { dg-do compile { target fsanitize_address } } */
or avoid the load_lib and add check_effective_target_fsanitize_address
variant that checks compilation with -fsanitize=address of trivial program
instead of linking, put it into lib/target-supports.exp and use it
in dg-do compile.

Jakub


Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment

2015-12-07 Thread Matthew Wahab

On 07/12/15 10:06, Tobias Burnus wrote:

I wrote:

I wonder whether using

__asm__ __volatile__ ("":::"memory");

would be sufficient as it has a way lower overhead than
__sync_synchronize().


Namely, something like the attached patch.

Regarding the original patch submission: Is there a reason that you didn't
include the test case of Deepak from 
https://gcc.gnu.org/ml/fortran/2015-04/msg00062.html
It should work as -fcoarray=lib -lcaf_single "dg-do run" test.

Tobias



I don't know anything about Fortran or coarrays and I'm curious whether this affects 
architectures with weak memory models. Is the barrier only needed to stop reordering 
by the compiler or does it also need to stop reordering by the hardware?
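
For reference, the two constructs being compared look like this in plain 
C (GCC-specific syntax).  This is only an illustrative sketch: payload, 
ready and the function names are hypothetical, and it does not say which 
barrier the coarray code actually needs.  The empty asm with a "memory" 
clobber only stops the compiler from moving memory accesses across it and 
emits no instruction; __sync_synchronize () is documented as a full 
memory barrier.

```c
#include <assert.h>

int payload;
int ready;

/* Compiler-only barrier: emits no instruction, but the compiler may
   not move memory accesses across it.  It does not by itself
   constrain reordering done by the hardware.  */
static inline void
compiler_barrier (void)
{
  __asm__ __volatile__ ("" ::: "memory");
}

/* Full memory barrier via the legacy GCC builtin.  */
static inline void
full_barrier (void)
{
  __sync_synchronize ();
}

/* Store the data, then the flag, with only a compiler barrier.  */
void
publish_compiler_only (int v)
{
  payload = v;
  compiler_barrier ();
  ready = 1;
}

/* Same pattern, but with a full barrier between the stores.  */
void
publish_full (int v)
{
  payload = v;
  full_barrier ();
  ready = 1;
}

/* Read the data after the flag has been observed.  */
int
consume (void)
{
  assert (ready);
  compiler_barrier ();
  return payload;
}
```

In a single thread both variants behave identically; the difference only 
matters when another thread or image observes the two stores.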


Matthew




Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment

2015-12-07 Thread Alessandro Fanfarillo
Your patch fixes the issues. Attached are the patch, test case and changelog.

Thanks!

2015-12-07 11:06 GMT+01:00 Tobias Burnus :
> I wrote:
>> I wonder whether using
>>
>> __asm__ __volatile__ ("":::"memory");
>>
>> would be sufficient as it has a way lower overhead than
>> __sync_synchronize().
>
> Namely, something like the attached patch.
>
> Regarding the original patch submission: Is there a reason that you didn't
> include the test case of Deepak from 
> https://gcc.gnu.org/ml/fortran/2015-04/msg00062.html
> It should work as -fcoarray=lib -lcaf_single "dg-do run" test.
>
> Tobias
commit 69e650945454491bbaf81513a1eed10b5b6b0f45
Author: Alessandro Fanfarillo 
Date:   Mon Dec 7 15:46:10 2015 +0100

Introducing __asm__ __volatile__ ("":::"memory") after image control 
statements, send and get

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 21efe44..25ff311 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -1222,6 +1222,15 @@ gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, 
tree lhs, tree lhs_kind,
   se->expr = res_var;
   if (array_expr->ts.type == BT_CHARACTER)
 se->string_length = argse.string_length;
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+   gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+   tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (>pre, tmp);
+
 }
 
 
@@ -1390,6 +1399,15 @@ conv_caf_send (gfc_code *code) {
   gfc_add_expr_to_block (, tmp);
   gfc_add_block_to_block (, _se.post);
   gfc_add_block_to_block (, _se.post);
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+   gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+   tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (, tmp);
+
   return gfc_finish_block ();
 }
 
diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 3df483a..b7e1faa 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -818,6 +818,15 @@ gfc_trans_lock_unlock (gfc_code *code, gfc_exec_op op)
   errmsg, errmsg_len);
   gfc_add_expr_to_block (, tmp);
 
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+   gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+   tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+
+  gfc_add_expr_to_block (, tmp);
+
   if (stat2 != NULL_TREE)
gfc_add_modify (, stat2,
fold_convert (TREE_TYPE (stat2), stat));
@@ -995,6 +1004,14 @@ gfc_trans_event_post_wait (gfc_code *code, gfc_exec_op op)
   errmsg, errmsg_len);
   gfc_add_expr_to_block (, tmp);
 
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+   gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+   tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (, tmp);
+
   if (stat2 != NULL_TREE)
 gfc_add_modify (, stat2, fold_convert (TREE_TYPE (stat2), stat));
 
@@ -1080,6 +1097,18 @@ gfc_trans_sync (gfc_code *code, gfc_exec_op type)
   fold_convert (integer_type_node, images));
 }
 
+  /* Per F2008, 8.5.1, a SYNC MEMORY is implied by calling the
+ image control statements SYNC IMAGES and SYNC ALL.  */
+  if (flag_coarray == GFC_FCOARRAY_LIB)
+{
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+   gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+   tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (, tmp);
+}
+
   if (flag_coarray != GFC_FCOARRAY_LIB)
 {
   /* Set STAT to zero.  */
diff --git a/gcc/fortran/trans.c b/gcc/fortran/trans.c
index 001db41..1993743 100644
--- a/gcc/fortran/trans.c
+++ b/gcc/fortran/trans.c
@@ -746,6 +746,14 @@ gfc_allocate_using_lib (stmtblock_t * block, tree pointer, 
tree size,
 TREE_TYPE (pointer), pointer,
 fold_convert ( TREE_TYPE (pointer), tmp));
   

[PATCH] Fix Changelog entry and add pr66896.C

2015-12-07 Thread Martin Liška
Hi.

As Jakub pointed out in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66896#c15, 
I forgot
to add a test-case to both GCC-5-branch and trunk.

May I please install the suggested patch to both these branches?
Thanks,
Martin
>From 7df3eaa59c4b6ee9f011f35ee480e022fe77e0b3 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 7 Dec 2015 16:00:31 +0100
Subject: [PATCH] Fix ChangelogEntry and add pr66896.C.

---
 gcc/testsuite/ChangeLog|  2 +-
 gcc/testsuite/g++.dg/ipa/pr66896.C | 22 ++
 2 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr66896.C

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index ca604d2..7106276 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -10570,7 +10570,7 @@
 
 2015-07-16  Martin Liska  
 
-	* g++.dg/ipa/pr66896.c: New test.
+	* g++.dg/ipa/pr66896.C: New test.
 
 2015-07-16  Richard Biener  
 
diff --git a/gcc/testsuite/g++.dg/ipa/pr66896.C b/gcc/testsuite/g++.dg/ipa/pr66896.C
new file mode 100644
index 000..236537a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr66896.C
@@ -0,0 +1,22 @@
+// PR ipa/66896
+// { dg-do compile }
+
+void f2 (void *);
+void f3 ();
+
+struct A
+{
+  int *a;
+  A ();
+  ~A () { a3 (); }
+  int a1 (int * p) { if (!p) f3 (); f2 (p); }
+  void a3 () { if (*a) a1 (a); }
+};
+
+struct B : A {~B () { a3 ();}};
+
+struct F {};
+
+struct G : F {B g;};
+
+void foo () {G g;}
-- 
2.6.3



Re: [PATCH] Fix new sancov tests

2015-12-07 Thread Dmitry Vyukov
On Mon, Dec 7, 2015 at 4:20 PM, Jakub Jelinek  wrote:
> On Mon, Dec 07, 2015 at 04:16:02PM +0100, Dmitry Vyukov wrote:
>> Index: ChangeLog
>> ===
>> --- ChangeLog (revision 231362)
>> +++ ChangeLog (working copy)
>> @@ -1,3 +1,7 @@
>> +2015-12-06  Dmitry Vyukov  
>> +
>> + * gcc.dg/sancov/asan.c: Don't run when asan is not available.
>
> The ChangeLog entry should also contain the other change:
> * gcc.dg/sancov/sancov.exp: Load asan-dg.exp.
>
> Ok with that change.


Committed as 231364 with updated ChangeLog:

* gcc.dg/sancov/sancov.exp: Load asan-dg.exp.
* gcc.dg/sancov/asan.c: Don't run when asan is not available.

Thanks!


Re: [PATCH] Fix new sancov tests

2015-12-07 Thread Dmitry Vyukov
On Mon, Dec 7, 2015 at 2:56 PM, Nathan Sidwell  wrote:
> On 12/06/15 03:56, Dmitry Vyukov wrote:
>>
>> Hello,
>>
>> Sancov tests submitted in 231296 assume that asan is supported on all
>> platforms.
>> This patch fixes that assumption.
>
>
>>* gcc.target/powerpc/recip-sqrtf.c: New test.
>> Index: gcc.dg/sancov/sancov.exp
>> ===
>
>
>> -gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] "" ""
>> +gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/basic*.c]] "" ""
>>
>> +if [check_effective_target_fsanitize_address] {
>> +  gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/asan*.c]] "" ""
>> +}
>> +
>>   torture-finish
>>   dg-finish
>>
>
> Thanks for addressing this.  FWIW I think canonical form in .exp files is to
> place something like
>
> if { ! [is this for me?] } then {
>  return
> }
>
> as early as possible (before the dg-init call)


Only asan.c is not for me, but other tests still need to be executed.


Re: [gomp-nvptx 6/9] nvptx libgcc: rewrite in C

2015-12-07 Thread Nathan Sidwell

On 12/01/15 18:52, Bernd Schmidt wrote:

What exactly is the problem with having asm files? I'm asking because this...

On 12/01/2015 04:28 PM, Alexander Monakov wrote:

+/* __shared__ char *__nvptx_stacks[32];  */
+asm ("// BEGIN GLOBAL VAR DEF: __nvptx_stacks");
+asm (".visible .shared .u64 __nvptx_stacks[32];");
+
+/* __shared__ unsigned __nvptx_uni[32];  */
+asm ("// BEGIN GLOBAL VAR DEF: __nvptx_uni");
+asm (".visible .shared .u32 __nvptx_uni[32];");


... doesn't look great to me. This is better done in assembly directly IMO.


The decl reworking I recently committed has a 'TODO: this would be a good place 
to check for a .shared section' in it.  That would seem a better place to 
augment and allow the above with a regular __attribute__((section...)).


nathan



Re: [AArch64] Rework ARMv8.1 command line options.

2015-12-07 Thread James Greenhalgh
On Mon, Dec 07, 2015 at 11:09:52AM +, Matthew Wahab wrote:
> Ping. Updated patch attached.

This is OK, thanks.

James

> 
> Matthew
> 
> On 27/11/15 09:23, Matthew Wahab wrote:
> >On 24/11/15 15:22, James Greenhalgh wrote:
> > > On Mon, Nov 16, 2015 at 04:31:32PM +, Matthew Wahab wrote:
> > >>
> > >> The command line options for target selection allow ARMv8.1 extensions
> > >> to be individually enabled/disabled. They also allow the extensions to
> > >> be enabled with -march=armv8-a. This doesn't reflect the ARMv8.1
> > >> architecture which requires all extensions to be enabled and doesn't make
> > >> them available for ARMv8.
> > >>
> > >> This patch removes the options for the individual ARMv8.1 extensions
> > >> except for +lse. This means that setting -march=armv8.1-a will enable
> > >> all extensions required by ARMv8.1 and that the ARMv8.1 extensions can't
> > >> be used with -march=armv8.
> >
> > > I think I mentioned it in another review, but this patch seems a good 
> > > place
> > > to solve the problem. Could you please update the documentation to explain
> > > what you've written above. As it stands I find myself confused by which
> > > features GCC will make available at -march=armv8-a and -march=armv8.1-a.
> >
> >Attached is a patch with the documentation for the AArch64 -march option
> >reworked to try to make it clearer what the -march=armv8.1-a option will
> >do. Extensions with feature modifiers (+crc, +lse) are explicitly stated
> >as being enabled by -march=armv8.1-a. Extensions without feature
> >modifiers (RDMA, PAN, LOR) are treated as part of the generic 'ARMv8.1
> >architecture extension' term in the description of -march=armv8.1-a.
> >
> >I've also rearranged the -march section, to put the description of the
> >values for -march together and reworded the description of the
> >-march=native option.
> >
> >Matthew
> >
> >2015-11-26  Matthew Wahab  
> >
> > * config/aarch64/aarch64-options-extensions.def: Remove
> > AARCH64_FL_RDMA from "fp" and "simd".  Remove "pan", "lor",
> > "rdma".
> > * config/aarch64/aarch64.h (AARCH64_FL_PAN): Remove.
> > (AARCH64_FL_LOR): Remove.
> > (AARCH64_FL_RDMA): Remove.
> > (AARCH64_FL_V8_1): New.
> > (AARCH64_FL_FOR_AARCH8_1): Replace AARCH64_FL_PAN, AARCH64_FL_LOR
> > and AARCH64_FL_RDMA with AARCH64_FL_V8_1.
> > (AARCH64_ISA_RDMA): Replace AARCH64_FL_RDMA with AARCH64_FL_V8_1.
> > * doc/invoke.texi (AArch64 -march): Rewrite initial paragraph and
> > section on -march=native.  Group descriptions of permitted
> > architecture names together.  Expand description of
> > -march=armv8.1-a.
> > (AArch64 -mtune): Slightly rework section on -march=native.
> > (AArch64 -mcpu): Slightly rework section on -march=native.
> > (AArch64 Feature Modifiers): Remove "pan", "lor" and "rdma".
> > State that -march=armv8.1-a enables "crc" and "lse".
> >
> 



[PATCH] gcc/config/tilegx/tilegx.md: Compare only 32-bit values for 32-bit comparing

2015-12-07 Thread Chen Gang
From 358ae2453a4b965adaf9e684220b7461f719a568 Mon Sep 17 00:00:00 2001
From: Chen Gang 
Date: Mon, 7 Dec 2015 21:29:20 +0800
Subject: [PATCH] gcc/config/tilegx/tilegx.md: Compare only 32-bit values for 
32-bit comparing

For __builtin_mul_overflow(), a 32-bit comparison must really compare only
the 32-bit values. Comparing the full 64-bit values instead causes a
logic error.

This fix lowers performance for 32-bit comparisons (it needs more
instructions, and they cannot be bundled), but it is correct and fixes the
related issue.
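
As a rough illustration of the problem described above, here is a minimal C
sketch. It assumes that a 32-bit value lives in a 64-bit register whose upper
half may hold stale bits, and that the xor / bfextu 0..31 / compare sequence
in the patch amounts to masking the difference down to its low 32 bits. The
function names are hypothetical and exist only for this example; they are not
part of the tilegx port.

```c
#include <stdint.h>

/* Hypothetical model of a full-width (DImode-style) "not equal":
   every bit of the 64-bit register takes part in the comparison.  */
static int cmpne_di(uint64_t a, uint64_t b)
{
  return a != b;
}

/* Hypothetical model of the 32-bit (SImode-style) "not equal":
   xor the registers, keep only bits 0..31 of the difference,
   then test the result against zero.  */
static int cmpne_si(uint64_t a, uint64_t b)
{
  uint64_t diff = a ^ b;        /* differing bits, full width */
  diff &= UINT64_C(0xFFFFFFFF); /* discard the upper 32 bits  */
  return diff != 0;
}
```

With a = 0x0000000100000005 and b = 0x5, the full-width compare reports the
values as different, while the 32-bit compare treats them as equal because
their low halves match.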

2015-12-07  Chen Gang  
---
 gcc/config/tilegx/tilegx.md | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/gcc/config/tilegx/tilegx.md b/gcc/config/tilegx/tilegx.md
index 944953c..b694dbd 100644
--- a/gcc/config/tilegx/tilegx.md
+++ b/gcc/config/tilegx/tilegx.md
@@ -1650,22 +1650,40 @@
 })
 
  
-(define_insn "insn_cmpne_"
-  [(set (match_operand:I48MODE2 0 "register_operand" "=r")
-   (ne:I48MODE2 (match_operand:I48MODE 1 "reg_or_0_operand" "rO")
-    (match_operand:I48MODE 2 "reg_or_cint_operand" "rO")))]
+(define_insn "insn_cmpne_di"
+  [(set (match_operand:I48MODE 0 "register_operand" "=r")
+   (ne:I48MODE (match_operand:DI 1 "reg_or_0_operand" "rO")
+    (match_operand:DI 2 "reg_or_cint_operand" "rO")))]
   ""
   "cmpne\t%0, %r1, %r2")
  
-(define_insn "insn_cmpeq_"
-  [(set (match_operand:I48MODE2 0 "register_operand" "=r,r")
-   (eq:I48MODE2 (match_operand:I48MODE 1 "reg_or_0_operand" "%rO,rO")
-    (match_operand:I48MODE 2 "reg_or_cint_operand" "I,rO")))]
+(define_insn "insn_cmpne_si"
+  [(set (match_operand:I48MODE 0 "register_operand" "=r")
+   (ne:I48MODE (match_operand:SI 1 "reg_or_0_operand" "rO")
+    (match_operand:SI 2 "reg_or_cint_operand" "rO")))]
+  ""
+  "xor\t%0, %r1, %r2; bfextu\t%0, %0, 0, 31; cmpne\t%0, %0, zero"
+  [(set_attr "type" "cannot_bundle")])
+
+(define_insn "insn_cmpeq_di"
+  [(set (match_operand:I48MODE 0 "register_operand" "=r,r")
+   (eq:I48MODE (match_operand:DI 1 "reg_or_0_operand" "rO,rO")
+    (match_operand:DI 2 "reg_or_cint_operand" "I,rO")))]
   ""
   "@
    cmpeqi\t%0, %r1, %2
    cmpeq\t%0, %r1, %r2")
 
+(define_insn "insn_cmpeq_si"
+  [(set (match_operand:I48MODE 0 "register_operand" "=r,r")
+   (eq:I48MODE (match_operand:SI 1 "reg_or_0_operand" "rO,rO")
+    (match_operand:SI 2 "reg_or_cint_operand" "I,rO")))]
+  ""
+  "@
+   xori\t%0, %r1, %2; bfextu\t%0, %0, 0, 31; cmpeqi\t%0, %0, 0
+   xor\t%0, %r1, %r2; bfextu\t%0, %0, 0, 31; cmpeqi\t%0, %0, 0"
+  [(set_attr "type" "cannot_bundle")])
+
 (define_insn "insn_cmplts_"
   [(set (match_operand:I48MODE2 0 "register_operand" "=r,r")
    (lt:I48MODE2 (match_operand:I48MODE 1 "reg_or_0_operand" "rO,rO")
-- 
1.9.3

  

0001-gcc-config-tilegx-tilegx.md-Compare-only-32-bit-valu.patch
Description: Binary data


Re: [PATCH] Fix new sancov tests

2015-12-07 Thread Dmitry Vyukov
On Mon, Dec 7, 2015 at 3:09 PM, Jakub Jelinek  wrote:
> On Sun, Dec 06, 2015 at 09:56:32AM +0100, Dmitry Vyukov wrote:
>
>> --- gcc.dg/sancov/sancov.exp (revision 231328)
>> +++ gcc.dg/sancov/sancov.exp (working copy)
>> @@ -18,6 +18,7 @@
>>
>>  load_lib gcc-dg.exp
>>  load_lib torture-options.exp
>> +load_lib asan-dg.exp
>>
>>  dg-init
>>  torture-init
>> @@ -31,7 +32,11 @@
>>   { -O2 -g } \
>>   { -O3 -g } ]
>>
>> -gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] "" ""
>> +gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/basic*.c]] "" ""
>>
>> +if [check_effective_target_fsanitize_address] {
>> +  gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/asan*.c]] "" ""
>> +}
>> +
>
> I don't like this, it is bad enough vect.exp works this way, let's not add
> further tests depending on test names.
> So, either just load_lib asan-dg.exp and change
> /* { dg-do compile } */
> to
> /* { dg-do compile { target fsanitize_address } } */
> or avoid the load_lib and add check_effective_target_fsanitize_address
> variant that checks compilation with -fsanitize=address of trivial program
> instead of linking, put it into lib/target-supports.exp and use it
> in dg-do compile.


Did the first option. Please take another look:


Index: ChangeLog
===
--- ChangeLog (revision 231362)
+++ ChangeLog (working copy)
@@ -1,3 +1,7 @@
+2015-12-06  Dmitry Vyukov  
+
+ * gcc.dg/sancov/asan.c: Don't run when asan is not available.
+
 2015-12-07  Nathan Sidwell  

  * gcc.target/nvptx/decl-init.c: New.
Index: gcc.dg/sancov/asan.c
===
--- gcc.dg/sancov/asan.c (revision 231362)
+++ gcc.dg/sancov/asan.c (working copy)
@@ -3,7 +3,7 @@
  - coverage does not instrument asan-emitted basic blocks
  - asan considers coverage callback as "nonfreeing" (thus 1 asan store
callback.  */
-/* { dg-do compile } */
+/* { dg-do compile { target fsanitize_address } } */
 /* { dg-options "-fsanitize-coverage=trace-pc -fsanitize=address -fdump-tree-optimized" } */

 void foo(volatile int *a, int *b)
Index: gcc.dg/sancov/sancov.exp
===
--- gcc.dg/sancov/sancov.exp (revision 231362)
+++ gcc.dg/sancov/sancov.exp (working copy)
@@ -17,6 +17,7 @@
 # .

 load_lib gcc-dg.exp
+load_lib asan-dg.exp
 load_lib torture-options.exp

 dg-init


RE: [PATCH] [ARC] Add support for atomic memory built-in.

2015-12-07 Thread Claudiu Zissulescu
Hi,

> AFAICT, you use hardware synchronisation instruction for MEMMODEL_SEQ_CST,
> and compiler memory barriers for all other memory models (except
> MEMMODEL_RELAXED).  That makes no sense; either the platform needs
> explicit instructions for memory coherency, or it doesn't.

Indeed, we deliberately misused the sync primitive to compensate for the lack 
of a data memory barrier (dmb) primitive in the early SMP-HS cores. I have now 
checked, and we can safely use the dmb primitive for all HS cores present today 
(no old HS IP without dmb is out there). Hence, I've reworked the patch 
(attached), removing the old sync/software memory barrier combinations and 
using the newer dmb instruction instead.

Tested with dg.exp (when passing -matomic on the gcc command line, the atomic 
tests are also successfully executed).

Thanks,
Claudiu


0001-ARC-Add-support-for-atomic-memory-built-in.patch
Description: 0001-ARC-Add-support-for-atomic-memory-built-in.patch


Re: C PATCH for c/68668 (grokdeclarator and wrong type of PARM_DECL)

2015-12-07 Thread Marek Polacek
On Thu, Dec 03, 2015 at 09:40:29PM +, Joseph Myers wrote:
> On Thu, 3 Dec 2015, Marek Polacek wrote:
> 
> > > I think you also need to decrement orig_qual_indirect, which counts the 
> > > number of levels of array type derivation from orig_qual_type.
> > 
> > Thus:
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> > 
> > 2015-12-03  Marek Polacek  
> > 
> > PR c/68668
> > * c-decl.c (grokdeclarator): When creating a PARM_DECL of ARRAY_TYPE,
> > use TREE_TYPE of orig_qual_type.  Decrement ORIG_QUAL_INDIRECT.
> 
> On further consideration:
> 
> Removing one level of array type derivation from type means it is one 
> fewer levels indirect from the original version of orig_qual_type.  So I 
> think you should actually decrement orig_qual_indirect without changing 
> orig_qual_type.  But, if orig_qual_indirect is indirect, in that case you 
> may get better results from changing orig_qual_type without decrementing 
> orig_qual_indirect.

I think I don't quite understand the last part of this, but here's the patch
anyway.  Having added more testcases, I noticed that my last version indeed
wasn't really correct -- thanks for spotting that.
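
For readers following along, the language rule behind this code path can be
checked with a small C11 sketch: a parameter declared with an array type is
adjusted to a pointer to the element type, and the const from the typedef'd
array ends up on the pointed-to type. The typedefs are taken from the new
test; the helper function names are hypothetical and are used only here.

```c
typedef const int T[];
typedef const int U[1];

/* Hypothetical helper: returns 1 if the parameter, declared as
   "array of const int", was adjusted to "pointer to const int".  */
static int param_is_ptr_to_const_int(T p)
{
  return _Generic(p, const int *: 1, default: 0);
}

/* Hypothetical helper: returns 1 if the parameter, declared as
   "array of 2 U", was adjusted to "pointer to array of 1 const int".  */
static int param_is_ptr_to_const_array(U p[2])
{
  return _Generic(p, const int (*)[1]: 1, default: 0);
}
```

This only shows the types the front end is expected to produce; it does not
exercise orig_qual_type or orig_qual_indirect, which are internal to
grokdeclarator.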

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-12-07  Marek Polacek  

PR c/68668
* c-decl.c (grokdeclarator): If ORIG_QUAL_INDIRECT is indirect, use
TREE_TYPE of ORIG_QUAL_TYPE, otherwise decrement ORIG_QUAL_INDIRECT.

* gcc.dg/pr68668.c: New test.

diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index 9ad8219..6a85514 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -6417,6 +6417,13 @@ grokdeclarator (const struct c_declarator *declarator,
  {
/* Transfer const-ness of array into that of type pointed to.  */
type = TREE_TYPE (type);
+   if (orig_qual_type != NULL_TREE)
+ {
+   if (orig_qual_indirect != 0)
+ orig_qual_type = TREE_TYPE (orig_qual_type);
+   else
+ orig_qual_indirect--;
+ }
if (type_quals)
  type = c_build_qualified_type (type, type_quals, orig_qual_type,
 orig_qual_indirect);
diff --git gcc/testsuite/gcc.dg/pr68668.c gcc/testsuite/gcc.dg/pr68668.c
index e69de29..d013aa9 100644
--- gcc/testsuite/gcc.dg/pr68668.c
+++ gcc/testsuite/gcc.dg/pr68668.c
@@ -0,0 +1,53 @@
+/* PR c/68668 */
+/* { dg-do compile } */
+
+typedef const int T[];
+typedef const int U[1];
+
+int
+fn1 (T p)
+{
+  return p[0];
+}
+
+int
+fn2 (U p[2])
+{
+  return p[0][0];
+}
+
+int
+fn3 (U p[2][3])
+{
+  return p[0][0][0];
+}
+
+int
+fn4 (U *p)
+{
+  return p[0][0];
+}
+
+int
+fn5 (U (*p)[1])
+{
+  return (*p)[0][0];
+}
+
+int
+fn6 (U (*p)[1][2])
+{
+  return (*p)[0][0][0];
+}
+
+int
+fn7 (U **p)
+{
+  return p[0][0][0];
+}
+
+int
+fn8 (U (**p)[1])
+{
+  return (*p)[0][0][0];
+}

Marek


Re: [PATCH] Fix -Werror= handling for Joined warnings, add a few missing Warning keywords (PRs c/48088, c/68657)

2015-12-07 Thread Bernd Schmidt

On 12/07/2015 02:44 PM, Jakub Jelinek wrote:


So like this?



+/* Perform diagnostics for read_cmdline_option and control_warning_option
+   functions.  Returns true if an error has been diagnosed.  */


Let's document arguments; for the ones identical to read_cmdline_option 
an explicit pointer there is sufficient, but errors is new.



@@ -1332,8 +1348,8 @@ get_option_state (struct gcc_options *op
 used by -Werror= and #pragma GCC diagnostic.  */

  void
-control_warning_option (unsigned int opt_index, int kind, bool imply,
-   location_t loc, unsigned int lang_mask,
+control_warning_option (unsigned int opt_index, int kind, const char *arg,
+   bool imply, location_t loc, unsigned int lang_mask,


This also needs an update to the function comment.

Other than that I'm ok with this. This area could probably be 
restructured a bit but for now I think this is good enough.



Bernd


Re: [PATCH] gcc/config/tilegx/tilegx.md: Compare only 32-bit values for 32-bit comparing

2015-12-07 Thread Richard Henderson
On 12/07/2015 06:53 AM, Chen Gang wrote:
> -(define_insn "insn_cmpne_"
> -  [(set (match_operand:I48MODE2 0 "register_operand" "=r")
> - (ne:I48MODE2 (match_operand:I48MODE 1 "reg_or_0_operand" "rO")
> -  (match_operand:I48MODE 2 "reg_or_cint_operand" "rO")))]
> +(define_insn "insn_cmpne_di"
> +  [(set (match_operand:I48MODE 0 "register_operand" "=r")
> + (ne:I48MODE (match_operand:DI 1 "reg_or_0_operand" "rO")
> +  (match_operand:DI 2 "reg_or_cint_operand" "rO")))]
>""
>"cmpne\t%0, %r1, %r2")
>   
> -(define_insn "insn_cmpeq_"
> -  [(set (match_operand:I48MODE2 0 "register_operand" "=r,r")
> - (eq:I48MODE2 (match_operand:I48MODE 1 "reg_or_0_operand" "%rO,rO")
> -  (match_operand:I48MODE 2 "reg_or_cint_operand" "I,rO")))]
> +(define_insn "insn_cmpne_si"
> +  [(set (match_operand:I48MODE 0 "register_operand" "=r")
> + (ne:I48MODE (match_operand:SI 1 "reg_or_0_operand" "rO")
> +  (match_operand:SI 2 "reg_or_cint_operand" "rO")))]
> +  ""
> +  "xor\t%0, %r1, %r2; bfextu\t%0, %0, 0, 31; cmpne\t%0, %0, zero"
> +  [(set_attr "type" "cannot_bundle")])
> +

The preferred solution is to remove SImode comparisons entirely, so that the
middle-end extends the data itself.

In addition, you might experiment with removing the SImode result of the
comparisons here.  We don't have them for Alpha (only DImode result), and we
don't miss them; when SImode results are required they are created via subregs.


r~


Re: [PATCH] Fix -Werror= handling for Joined warnings, add a few missing Warning keywords (PRs c/48088, c/68657)

2015-12-07 Thread Jakub Jelinek
On Mon, Dec 07, 2015 at 11:24:02AM +0100, Bernd Schmidt wrote:
> On 12/04/2015 08:36 PM, Jakub Jelinek wrote:
> >On Fri, Dec 04, 2015 at 06:19:19PM +, Manuel López-Ibáñez wrote:
> >>My guess is that the first error_at should use arg instead of
> >>option->opt_text to be equivalent. Of course, ideally, this code would
> >>not be duplicated, but rather merged "somehow".
> >
> >Consider that fixed.  As for duplication, as one operates on
> >cl_decoded_option and the other not etc., this is harder, plus
> >the missing and non-int cases are IMHO short enough that it is not worth
> >trying hard to avoid the duplication.
> >For the enum case which is larger, it is maybe worth adding
> >a helper routine for it, which would need probably only
> >location_t loc, const struct cl_enum *e, const char *opt, unsigned int 
> >lang_mask
> >arguments.  Can try that on Monday.
> 
> Maybe you can split the error printing code out of read_cmdline_option. For
> the original patch I noticed the duplication but figured it was not enough
> to really worry about, but for the error handling I think we should make an
> effort.

So like this?

2015-12-07  Jakub Jelinek  

PR c/48088
PR c/68657
* common.opt (Wframe-larger-than=): Add Warning.
* opts.h (control_warning_option): Add ARG argument.
* opts-common.c (cmdline_handle_error): New function.
(read_cmdline_option): Use it.
(control_warning_option): Likewise.  Add ARG argument.
If non-NULL, decode it if needed and pass through
to handle_generated_option.  Handle CLVC_ENUM like
CLVC_BOOLEAN.
* opts.c (common_handle_option): Adjust control_warning_option
caller.
(enable_warning_as_error): Likewise.
c-family/
* c.opt (Wfloat-conversion, Wsign-conversion): Add Warning.
* c-pragma.c (handle_pragma_diagnostic): Adjust
control_warning_option caller.
ada/
* gcc-interface/trans.c (Pragma_to_gnu): Adjust
control_warning_option caller.
testsuite/
* c-c++-common/pr68657-1.c: New test.
* c-c++-common/pr68657-2.c: New test.
* c-c++-common/pr68657-3.c: New test.

--- gcc/common.opt.jj   2015-12-06 12:20:38.496706114 +0100
+++ gcc/common.opt  2015-12-07 13:56:34.167539666 +0100
@@ -581,7 +581,7 @@ Common Var(flag_fatal_errors)
 Exit on the first error occurred.
 
 Wframe-larger-than=
-Common RejectNegative Joined UInteger
+Common RejectNegative Joined UInteger Warning
 -Wframe-larger-than=   Warn if a function's stack frame requires more 
than  bytes.
 
 Wfree-nonheap-object
--- gcc/opts.h.jj   2015-12-04 18:55:34.150960576 +0100
+++ gcc/opts.h  2015-12-07 13:56:34.189539357 +0100
@@ -363,7 +363,7 @@ extern void read_cmdline_option (struct
 const struct cl_option_handlers *handlers,
 diagnostic_context *dc);
 extern void control_warning_option (unsigned int opt_index, int kind,
-   bool imply, location_t loc,
+   const char *arg, bool imply, location_t loc,
unsigned int lang_mask,
const struct cl_option_handlers *handlers,
struct gcc_options *opts,
--- gcc/opts-common.c.jj2015-12-04 18:55:34.097961330 +0100
+++ gcc/opts-common.c   2015-12-07 14:34:01.714160852 +0100
@@ -1021,62 +1021,38 @@ generate_option_input_file (const char *
   decoded->errors = 0;
 }
 
-/* Handle the switch DECODED (location LOC) for the language indicated
-   by LANG_MASK, using the handlers in *HANDLERS and setting fields in
-   OPTS and OPTS_SET and using diagnostic context DC (if not NULL) for
-   diagnostic options.  */
+/* Perform diagnostics for read_cmdline_option and control_warning_option
+   functions.  Returns true if an error has been diagnosed.  */
 
-void
-read_cmdline_option (struct gcc_options *opts,
-struct gcc_options *opts_set,
-struct cl_decoded_option *decoded,
-location_t loc,
-unsigned int lang_mask,
-const struct cl_option_handlers *handlers,
-diagnostic_context *dc)
+static bool
+cmdline_handle_error (location_t loc, const struct cl_option *option,
+ const char *opt, const char *arg, int errors,
+ unsigned int lang_mask)
 {
-  const struct cl_option *option;
-  const char *opt = decoded->orig_option_with_args_text;
-
-  if (decoded->warn_message)
-warning_at (loc, 0, decoded->warn_message, opt);
-
-  if (decoded->opt_index == OPT_SPECIAL_unknown)
-{
-  if (handlers->unknown_option_callback (decoded))
-   error_at (loc, "unrecognized command line option %qs", decoded->arg);
-  return;
-}
-
-  if (decoded->opt_index == OPT_SPECIAL_ignore)
-return;
-
-  option = 

Re: [PATCH] Fix libgfortran build on hppa64-hp-hpux11.11

2015-12-07 Thread Ian Lance Taylor
John David Anglin  writes:

> 2015-12-05  John David Anglin  
>
>   PR 68115/libfortran
>   * configure.ac: Set libbacktrace_cv_sys_sync to no on hppa*-*-hpux*.
>   * configure: Regenerate.
>   * elf.c (backtrace_initialize): Cast __sync_bool_compare_and_swap call
>   to void.

This is OK.

Thanks.

Ian


Re: [gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-07 Thread Nathan Sidwell

On 12/01/15 11:01, Bernd Schmidt wrote:

On 12/01/2015 04:28 PM, Alexander Monakov wrote:

I'm taking a different approach.  I want to execute all insns in all warp
members, while ensuring that effect (on global and local state) is the same
as if any single thread was executing that instruction.  Most instructions
automatically satisfy that: if threads have the same state, then executing an
arithmetic instruction, normal memory load/store, etc. keep local state the
same in all threads.

The two exception insn categories are atomics and calls.  For calls, we can
demand recursively that they uphold this execution model, until we reach
runtime-provided "syscalls": malloc/free/vprintf.  Those we can handle like
atomics.


Didn't we also conclude that address-taking (let's say for stack addresses) is
also an operation that does not result in the same state?

Have you tried to use the mechanism used for OpenACC? IMO that would be a good
first step - get things working with fewer changes, and then look into
optimizing them (ideally for OpenMP and OpenACC both).


I would have thought the right approach would be to augment the existing 
neutering code to insert predication (instead of branch-around) using a 
heuristic as to which is the better choice.


nathan


Re: [PATCH] Fix new sancov tests

2015-12-07 Thread Jakub Jelinek
On Mon, Dec 07, 2015 at 04:16:02PM +0100, Dmitry Vyukov wrote:
> Index: ChangeLog
> ===
> --- ChangeLog (revision 231362)
> +++ ChangeLog (working copy)
> @@ -1,3 +1,7 @@
> +2015-12-06  Dmitry Vyukov  
> +
> + * gcc.dg/sancov/asan.c: Don't run when asan is not available.

The ChangeLog entry should also contain the other change:
* gcc.dg/sancov/sancov.exp: Load asan-dg.exp.

Ok with that change.

Jakub


Re: [PATCH] Fix new sancov tests

2015-12-07 Thread Nathan Sidwell

On 12/06/15 03:56, Dmitry Vyukov wrote:

Hello,

Sancov tests submitted in 231296 assume that asan is supported on all platforms.
This patch fixes that assumption.



   * gcc.target/powerpc/recip-sqrtf.c: New test.
Index: gcc.dg/sancov/sancov.exp
===



-gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] "" ""
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/basic*.c]] "" ""

+if [check_effective_target_fsanitize_address] {
+  gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/asan*.c]] "" ""
+}
+
  torture-finish
  dg-finish



Thanks for addressing this.  FWIW I think canonical form in .exp files is to 
place something like


if { ! [is this for me?] } then {
 return
}

as early as possible (before the dg-init call)

nathan


[PTX]

2015-12-07 Thread Nathan Sidwell
Alex pointed me at his ptx patch 
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03393.html but meanwhile I'd 
reorganized that part of the PTX backend.


This patch implements Alex's patch in the new code base.  There are some minor 
additions.

1) we also look inside vector types.
2) we only limit to Pmode, if the type's mode is BLKmode.  Those are the structs 
where we can't tell whether a pointer initialization will happen.


I think there's still some cleanup to do on the init emission: now that the 
object size is always a multiple of the element size, we don't have to deal 
with any trailing partial elements.
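
To make the chunk-size selection easier to follow, here is a small standalone
C sketch of the arithmetic the patch uses (OR in the largest accepted size,
then extract the lowest set bit). The function name and the plain unsigned
parameters are assumptions for illustration only; the real code works on
GCC's machine modes, and max_size is assumed to be a power of two.

```c
/* Hypothetical stand-in for the computation in
   nvptx_assemble_decl_begin: return the largest power of two that
   divides elt_size, capped at max_size.  max_size must itself be a
   power of two (8 for a 64-bit Pmode or DImode).  */
static unsigned chunk_size(unsigned elt_size, unsigned max_size)
{
  elt_size |= max_size;        /* guarantees a set bit at or below the cap */
  return elt_size & -elt_size; /* extract the lowest set bit               */
}
```

With a cap of 8, a 6-byte struct gives a chunk size of 2 and a packed 5-byte
struct gives 1, which lines up with the .u16 and .u8 arrays the new test
scans for.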



nathan
2015-12-07  Nathan Sidwell  

	gcc/
	* config/nvptx/nvptx.c (nvptx_assemble_decl_begin): Look inside
	complex and vector types.  Cope with packed structs.

	gcc/testsuite/
	* gcc.target/nvptx/decl-init.c: New.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 231361)
+++ config/nvptx/nvptx.c	(working copy)
@@ -1643,17 +1644,24 @@ nvptx_assemble_decl_begin (FILE *file, c
   while (TREE_CODE (type) == ARRAY_TYPE)
 type = TREE_TYPE (type);
 
-  if (!INTEGRAL_TYPE_P (type) && !SCALAR_FLOAT_TYPE_P (type))
-type = ptr_type_node;
+  if (TREE_CODE (type) == VECTOR_TYPE
+  || TREE_CODE (type) == COMPLEX_TYPE)
+/* Neither vector nor complex types can contain the other.  */
+type = TREE_TYPE (type);
+
   unsigned elt_size = int_size_in_bytes (type);
-  if (elt_size > UNITS_PER_WORD)
-{
-  type = ptr_type_node;
-  elt_size = int_size_in_bytes (type);
-}
+
+  /* Largest mode we're prepared to accept.  For BLKmode types we
+ don't know if it'll contain pointer constants, so have to choose
+ pointer size, otherwise we can choose DImode.  */
+  machine_mode elt_mode = TYPE_MODE (type) == BLKmode ? Pmode : DImode;
+
+  elt_size |= GET_MODE_SIZE (elt_mode);
+  elt_size &= -elt_size; /* Extract LSB set.  */
+  elt_mode = mode_for_size (elt_size * BITS_PER_UNIT, MODE_INT, 0);
 
   decl_chunk_size = elt_size;
-  decl_chunk_mode = int_mode_for_mode (TYPE_MODE (type));
+  decl_chunk_mode = elt_mode;
   decl_offset = 0;
   init_part = 0;
 
Index: testsuite/gcc.target/nvptx/decl-init.c
===
--- testsuite/gcc.target/nvptx/decl-init.c	(revision 0)
+++ testsuite/gcc.target/nvptx/decl-init.c	(working copy)
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wno-long-long" } */
+
+__extension__ _Complex float cf = 1.0f + 2.0if;
+__extension__ _Complex double cd = 3.0 + 4.0i;
+
+long long la[2] = 
+  {0x0102030405060708ll,
+   0x1112131415161718ll};
+
+struct six 
+{
+  char a;
+  short b, c;
+};
+
+struct six six1 = {1, 2, 3};
+struct six six2[2] = {{4, 5, 6}, {7, 8, 9}};
+
+struct __attribute__((packed)) five 
+{
+  char a;
+  int b;
+};
+struct five five1 = {10, 11};
+struct five five2[2] = {{12, 13}, {14, 15}};
+
+int  __attribute__((vector_size(16))) vi = {16, 17, 18, 19};
+
+/* dg-final { scan-assembler ".align 4 .u32 cf\\\[2\\\] = { 1065353216, 1073741824 };" } } */
+/* dg-final { scan-assembler ".align 8 .u64 df\\\[2\\\] = { 4613937818241073152, 4616189618054758400 };" } } */
+/* dg-final { scan-assembler ".align 8 .u64 la\\\[2\\\] = { 72623859790382856, 1230066625199609624 };" } } */
+/* dg-final { scan-assembler ".align 2 .u16 six1\\\[3\\\] = { 1, 2, 3 };" } } */
+/* dg-final { scan-assembler ".align 2 .u16 six2\\\[6\\\] = { 4, 5, 6, 7, 8, 9 };" } } */
+/* dg-final { scan-assembler ".align 1 .u8 five1\\\[5\\\] = { 10, 11, 0, 0, 0 };" } } */
+/* dg-final { scan-assembler ".align 1 .u8 five2\\\[10\\\] = { 12, 13, 0, 0, 0, 14, 15, 0, 0, 0 };" } } */
+/* dg-final { scan-assembler ".align 8 .u32 vi\\\[4\\\] = { 16, 17, 18, 19 };" } } */


Re: [PATCH 5/7][Testsuite] Support ARMv8.1 ARM tests.

2015-12-07 Thread Matthew Wahab

On 27/11/15 17:11, Matthew Wahab wrote:

On 27/11/15 13:44, Christophe Lyon wrote:

On 26/11/15 16:02, Matthew Wahab wrote:



This patch adds ARMv8.1 support to GCC Dejagnu, to allow ARM
tests to specify targets and to set up command line options.
It builds on the ARMv8.1 target support added for AArch64 tests, partly
reworking that support to take into account the different configurations
that tests may be run under.



I may be mistaken, but -mfpu=neon-fp-armv8 and -mfloat-abi=softfp are not
supported by aarch64-gcc. So it seems to me that
check_effective_target_arm_v8_1a_neon_ok_nocache will not always work
for aarch64 after your patch.



Or does it work because no option is needed and thus "" always
matches and thus the loop always exits after the first iteration
on aarch64?


Yes, the idea is that the empty string will make the function first try
'-march=armv8.1-a' without any other flag. That will work for AArch64 because it
doesn't need any other option.


Maybe a more accurate comment would help remembering that, in case
-mfpu option becomes necessary for aarch64.



Agreed, it's worth having a comment to explain what the 'foreach' construct is 
doing.

Matthew


I've added a comment to the foreach construct, to make it clearer what
it's doing.

Matthew

testsuite/
2015-12-07  Matthew Wahab  

* lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Update
comment.  Use check_effective_target_arm_v8_1a_neon_ok to select
the command line options.
(check_effective_target_arm_v8_1a_neon_ok_nocache): Update initial
test to allow ARM targets.  Select and record a working set of
command line options.
(check_effective_target_arm_v8_1a_neon_hw): Add tests for ARM
targets.

>From 7e2cd1ef475a5c7f4a4722b9ba32bd46e3b30eae Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 9 Oct 2015 17:38:12 +0100
Subject: [PATCH 5/7] [Testsuite] Support ARMv8.1 NEON on ARM.

---
 gcc/testsuite/lib/target-supports.exp | 60 ++-
 1 file changed, 45 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4e349e9..6dfb6f6 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2816,14 +2816,15 @@ proc add_options_for_arm_v8_neon { flags } {
 return "$flags $et_arm_v8_neon_flags -march=armv8-a"
 }
 
-# Add the options needed for ARMv8.1 Adv.SIMD.
+# Add the options needed for ARMv8.1 Adv.SIMD.  Also adds the ARMv8 NEON
+# options for AArch64 and for ARM.
 
 proc add_options_for_arm_v8_1a_neon { flags } {
-if { [istarget aarch64*-*-*] } {
-	return "$flags -march=armv8.1-a"
-} else {
+if { ! [check_effective_target_arm_v8_1a_neon_ok] } {
 	return "$flags"
 }
+global et_arm_v8_1a_neon_flags
+return "$flags $et_arm_v8_1a_neon_flags -march=armv8.1-a"
 }
 
 proc add_options_for_arm_crc { flags } {
@@ -3271,17 +3272,33 @@ proc check_effective_target_arm_neonv2_hw { } {
 }
 
 # Return 1 if the target supports the ARMv8.1 Adv.SIMD extension, 0
-# otherwise.  The test is valid for AArch64.
+# otherwise.  The test is valid for AArch64 and ARM.  Record the command
+# line options that needed.
 
 proc check_effective_target_arm_v8_1a_neon_ok_nocache { } {
-if { ![istarget aarch64*-*-*] } {
-	return 0
+global et_arm_v8_1a_neon_flags
+set et_arm_v8_1a_neon_flags ""
+
+if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
+	return 0;
 }
-return [check_no_compiler_messages_nocache arm_v8_1a_neon_ok assembly {
-	#if !defined (__ARM_FEATURE_QRDMX)
-	#error "__ARM_FEATURE_QRDMX not defined"
-	#endif
-} [add_options_for_arm_v8_1a_neon ""]]
+
+# Iterate through sets of options to find the compiler flags that
+# need to be added to the -march option.  Start with the empty set
+# since AArch64 only needs the -march setting.
+foreach flags {"" "-mfpu=neon-fp-armv8" "-mfloat-abi=softfp" \
+		   "-mfpu=neon-fp-armv8 -mfloat-abi=softfp"} {
+	if { [check_no_compiler_messages_nocache arm_v8_1a_neon_ok object {
+	#if !defined (__ARM_FEATURE_QRDMX)
+	#error "__ARM_FEATURE_QRDMX not defined"
+	#endif
+	} "$flags -march=armv8.1-a"] } {
+	set et_arm_v8_1a_neon_flags "$flags -march=armv8.1-a"
+	return 1
+	}
+}
+
+return 0;
 }
 
 proc check_effective_target_arm_v8_1a_neon_ok { } {
@@ -3308,16 +3325,17 @@ proc check_effective_target_arm_v8_neon_hw { } {
 }
 
 # Return 1 if the target supports executing the ARMv8.1 Adv.SIMD extension, 0
-# otherwise.  The test is valid for AArch64.
+# otherwise.  The test is valid for AArch64 and ARM.
 
 proc check_effective_target_arm_v8_1a_neon_hw { } {
 if { ![check_effective_target_arm_v8_1a_neon_ok] } {
 	return 0;
 }
-return [check_runtime_nocache arm_v8_1a_neon_hw_available {
+return [check_runtime arm_v8_1a_neon_hw_available {
 	int
 	main 

Re: C PATCH for c/68668 (grokdeclarator and wrong type of PARM_DECL)

2015-12-07 Thread Joseph Myers
On Mon, 7 Dec 2015, Marek Polacek wrote:

> Anyway, here's the version with == 0.  Thanks,
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2015-12-07  Marek Polacek  
> 
>   PR c/68668
>   * c-decl.c (grokdeclarator): If ORIG_QUAL_INDIRECT is indirect, use
>   TREE_TYPE of ORIG_QUAL_TYPE, otherwise decrement ORIG_QUAL_INDIRECT.
> 
>   * gcc.dg/pr68668.c: New test.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PTX] no return fns

2015-12-07 Thread Nathan Sidwell

On 12/07/15 11:18, Nathan Sidwell wrote:

calls to no return fns can cause problems with the PTX JIT.  It doesn't
understand their no-return nature and can erroneously think there are unexitable
loops (depending on the precise placement of bbs).  It can get so upset it
segfaults.

gcc.dg/pr68671.c started causing this last week, with what looked like
incomplete jump threading.


Bernd, I meant to ask if you recalled previous tests you failed because of this 
problem?  I'm retesting

 gcc.c-torture/compile/920723-1.c
 gcc.c-torture/compile/pr33855.c
 gcc.c-torture/execute/981019-1.c

to see if they are culprits.

nathan


Re: [PTX] no return fns

2015-12-07 Thread Alexander Monakov
Hello Nathan,

On Mon, 7 Dec 2015, Nathan Sidwell wrote:
> This patch changes call emission to look for a noreturn note and emit a trap
> insn after the call.  The JIT  no longer explodes.

I think there's a potential issue with the patch: when the noreturn function
has a non-void return value, your patch places 'trap' between 'call' and
'ld.param' insns.  That violates the PTX specification, which demands:

All st.param instructions used for passing arguments to function call must
immediately precede the corresponding call instruction and ld.param
instruction used for collecting return value must immediately follow the
call instruction without any control flow alteration.
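
To make the ordering concern concrete, here is a minimal sketch of an emission order that would keep ld.param directly after the call and place the trap last. The helper name demo_emit_call_tail and the simplified instruction text are assumptions for illustration; this is not the nvptx back end's code, and the thread does not say which fix was chosen.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a simplified call tail into BUF.
   Returns 0 on success, -1 if BUF is too small.  */
static int
demo_emit_call_tail (char *buf, size_t size, const char *callee,
                     int has_result, int is_noreturn)
{
  size_t used = 0;
  int n;

  assert (buf != NULL && callee != NULL && size > 0);

  n = snprintf (buf, size, "\t\tcall %s;\n", callee);
  if (n < 0 || (size_t) n >= size)
    return -1;
  used = (size_t) n;

  /* The return-value load stays adjacent to the call.  */
  if (has_result)
    {
      n = snprintf (buf + used, size - used,
                    "\t\tld.param.u32 %%value, [%%retval_in];\n");
      if (n < 0 || (size_t) n >= size - used)
        return -1;
      used += (size_t) n;
    }

  /* Only then mark the point as unreachable for the JIT.  */
  if (is_noreturn)
    {
      n = snprintf (buf + used, size - used, "\t\ttrap; // (noreturn)\n");
      if (n < 0 || (size_t) n >= size - used)
        return -1;
    }

  return 0;
}
```

With has_result and is_noreturn both set, the output lists call, then ld.param, then trap; with has_result clear, the trap follows the call directly.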

Alexander


Re: [PATCH 1/7][ARM] Add support for ARMv8.1.

2015-12-07 Thread Matthew Wahab

Ping. Updated patch attached.
Matthew

On 26/11/15 15:55, Matthew Wahab wrote:

Hello,


ARMv8.1 includes an extension to ARM which adds two Adv.SIMD
instructions, vqrdmlah and vqrdmlsh. This patch set adds support for
ARMv8.1 and for the new instructions, enabling the architecture with
-march=armv8.1-a. The new instructions are enabled when both ARMv8.1
and suitable fpu options are set, for instance with -march=armv8.1-a
-mfpu=neon-fp-armv8 -mfloat-abi=hard.

This patch set adds the command line options and internal feature
macros. Following patches
- enable multilib support for ARMv8.1,
- add patterns for the new instructions,
- add the ACLE feature macro for the ARMv8.1 extensions,
- extend target support in the testsuite to ARMv8.1,
- add the ACLE intrinsics for vqrdml{as}h and
- add the ACLE intrinsics for vqrdml{as}h_lane.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Is this ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/arm-arches.def: Add "armv8.1-a" and "armv8.1-a+crc".
 * config/arm/arm-protos.h (FL2_ARCH8_1): New.
 (FL2_FOR_ARCH8_1A): New.
 * config/arm/arm-tables.opt: Regenerate.
 * config/arm/arm.c (arm_arch8_1): New.
 (arm_option_override): Set arm_arch8_1.
 * config/arm/arm.h (TARGET_NEON_RDMA): New.
 (arm_arch8_1): Declare.
 * doc/invoke.texi (ARM Options, -march): Add "armv8.1-a" and
 "armv8.1-a+crc".
 (ARM Options, -mfpu): Fix a typo.


>From 65bcf9a875fd31f6201e64cbbd4fdfb0b8f4719e Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 11:31:25 +0100
Subject: [PATCH 1/7] [ARM] Add ARMv8.1 architecture flags and options.

Change-Id: I6bb0c7f020613a1a17e40bccc28b00c30d644c70
---
 gcc/config/arm/arm-arches.def |  5 +
 gcc/config/arm/arm-protos.h   |  3 +++
 gcc/config/arm/arm-tables.opt | 10 --
 gcc/config/arm/arm.c  |  4 
 gcc/config/arm/arm.h  |  6 ++
 gcc/doc/invoke.texi   |  6 +++---
 6 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index ddf6c3c..6c83153 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -57,6 +57,11 @@ ARM_ARCH("armv7-m", cortexm3,	7M,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_FOR_
 ARM_ARCH("armv7e-m", cortexm4,  7EM,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC |	  FL_FOR_ARCH7EM))
 ARM_ARCH("armv8-a", cortexa53,  8A,	ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A))
 ARM_ARCH("armv8-a+crc",cortexa53, 8A,   ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A))
+ARM_ARCH ("armv8.1-a", cortexa53,  8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8A,  FL2_FOR_ARCH8_1A))
+ARM_ARCH ("armv8.1-a+crc",cortexa53, 8A,
+	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
+			 FL2_FOR_ARCH8_1A))
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
 ARM_ARCH("iwmmxt2", iwmmxt2,5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index e7328e7..d649e86 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -387,6 +387,8 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_IWMMXT2(1 << 30)   /* "Intel Wireless MMX2 technology".  */
 #define FL_ARCH6KZ(1 << 31)   /* ARMv6KZ architecture.  */
 
+#define FL2_ARCH8_1   (1 << 0)	  /* Architecture 8.1.  */
+
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE		(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
 			 | FL_CO_PROC)
@@ -415,6 +417,7 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH7M	(FL_FOR_ARCH7 | FL_THUMB_DIV)
 #define FL_FOR_ARCH7EM  (FL_FOR_ARCH7M | FL_ARCH7EM)
 #define FL_FOR_ARCH8A	(FL_FOR_ARCH7VE | FL_ARCH8)
+#define FL2_FOR_ARCH8_1A	FL2_ARCH8_1
 
 /* There are too many feature bits to fit in a single word so the set of cpu and
fpu capabilities is a structure.  A feature set is created and manipulated
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 48aac41..db17f6e 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -416,10 +416,16 @@ EnumValue
 Enum(arm_arch) String(armv8-a+crc) Value(26)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt) Value(27)
+Enum(arm_arch) String(armv8.1-a) Value(27)
 
 EnumValue
-Enum(arm_arch) String(iwmmxt2) Value(28)
+Enum(arm_arch) String(armv8.1-a+crc) Value(28)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt) Value(29)
+
+EnumValue
+Enum(arm_arch) String(iwmmxt2) Value(30)
 
 Enum
 Name(arm_fpu) Type(int)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3588b83..f89411e 100644
--- a/gcc/config/arm/arm.c
+++ 

Re: [patch] Fix PR middle-end/68291 & 68292

2015-12-07 Thread Eric Botcazou
> Ok. Although thinking about your comment in the PR about not making such
> vectors gimple registers I wonder what the effects of that would be.

First of all it's a bit painful to do because is_gimple_reg_type is defined 
inline in gimple-expr.h and adding TYPE_MODE in there causes a compilation 
failure for a bunch of files.  More seriously, this can probably be seen as a 
real layering violation so I'm not sure this would be a progress.

-- 
Eric Botcazou


[PATCH] Add testcase for c++/68116

2015-12-07 Thread Marek Polacek
This testcase used to ICE, but compiles fine since the C++ delayed folding
merge.  I'd like to add it to the testsuite and close the PR.

Tested on x86_64-linux, ok for trunk?

2015-12-07  Marek Polacek  

PR c++/68116
* g++.dg/cpp0x/pr68116.C: New test.

diff --git gcc/testsuite/g++.dg/cpp0x/pr68116.C 
gcc/testsuite/g++.dg/cpp0x/pr68116.C
index e69de29..04ed901 100644
--- gcc/testsuite/g++.dg/cpp0x/pr68116.C
+++ gcc/testsuite/g++.dg/cpp0x/pr68116.C
@@ -0,0 +1,12 @@
+// PR c++/68116
+// { dg-do compile { target c++11 } }
+
+class C {
+  void foo ();
+  typedef void (C::*T) (int);
+  static T b[];
+};
+C::T C::b[]
+{
+  T (::foo)
+};

Marek


Re: [PATCH] New version of libmpx with new memmove wrapper

2015-12-07 Thread Ilya Enkovich
2015-12-06 22:41 GMT+03:00 Aleksandra Tsvetkova :
> Fixed all.
> Now there are no new fails on spec2000

If you made some fix in your algorithm to pass SPEC benchmarks, you
need to extend your tests to cover this fix.

Thanks,
Ilya


Re: [PATCH 3/7][ARM] Add patterns for new instructions

2015-12-07 Thread Matthew Wahab

Ping. Updated patch attached.
Matthew

On 26/11/15 16:00, Matthew Wahab wrote:

Hello,

This patch adds patterns for the instructions, vqrdmlah and vqrdmlsh,
introduced in the ARMv8.1 architecture. The instructions are made
available when -march=armv8.1-a is enabled with suitable fpu settings,
such as -mfpu=neon-fp-armv8 -mfloat-abi=hard.
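
For readers unfamiliar with the two instructions, the following is a scalar sketch of what one 16-bit lane computes, as I read the architecture's description: a saturating, rounding, doubling multiply whose high half is added to (vqrdmlah) or subtracted from (vqrdmlsh) the accumulator. The function names are hypothetical and the model is an assumption to be checked against the architecture manual, not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by 2^16, avoiding a right shift of a negative value.  */
static int64_t
demo_floor_div_65536 (int64_t v)
{
  return v >= 0 ? v / 65536 : -((-v + 65535) / 65536);
}

/* Scalar model of one 16-bit lane.  ACCUMULATE nonzero models
   vqrdmlah, zero models vqrdmlsh.  */
static int16_t
demo_qrdml_lane_s16 (int16_t acc, int16_t b, int16_t c, int accumulate)
{
  int64_t product = 2 * (int64_t) b * (int64_t) c;
  int64_t wide = (int64_t) acc * 65536
                 + (accumulate ? product : -product)
                 + 32768;   /* rounding constant */
  int64_t result = demo_floor_div_65536 (wide);

  /* Saturate to the int16_t range.  */
  if (result > INT16_MAX)
    result = INT16_MAX;
  if (result < INT16_MIN)
    result = INT16_MIN;
  assert (result >= INT16_MIN && result <= INT16_MAX);
  return (int16_t) result;
}
```

The saturation step matters for inputs such as INT16_MIN * INT16_MIN, where the doubled product does not fit in the lane.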

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/iterators.md (VQRDMLH_AS): New.
 (neon_rdma_as): New.
 * config/arm/neon.md
 (neon_vqrdmlh): New.
 (neon_vqrdmlh_lane): New.
 * config/arm/unspecs.md (UNSPEC_VQRDMLAH): New.
 (UNSPEC_VQRDMLSH): New.



>From 8b69bae2f0057be09d3cbe3fe3c29155085e260d Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 17 Jun 2015 12:00:50 +0100
Subject: [PATCH 3/7] [ARM] Add patterns for new instructions.

Change-Id: Ia84c345019c7beda2d3c6c39074043d2e005347a
---
 gcc/config/arm/iterators.md |  5 +
 gcc/config/arm/neon.md  | 45 +
 gcc/config/arm/unspecs.md   |  2 ++
 3 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 6a54125..c7a6880 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -362,6 +362,8 @@
 (define_int_iterator CRYPTO_SELECTING [UNSPEC_SHA1C UNSPEC_SHA1M
UNSPEC_SHA1P])
 
+(define_int_iterator VQRDMLH_AS [UNSPEC_VQRDMLAH UNSPEC_VQRDMLSH])
+
 ;;
 ;; Mode attributes
 ;;
@@ -831,3 +833,6 @@
(simple_return " && use_simple_return_p ()")])
 (define_code_attr return_cond_true [(return " && USE_RETURN_INSN (TRUE)")
(simple_return " && use_simple_return_p ()")])
+
+;; Attributes for VQRDMLAH/VQRDMLSH
+(define_int_attr neon_rdma_as [(UNSPEC_VQRDMLAH "a") (UNSPEC_VQRDMLSH "s")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 62fb6da..844ef5e 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2014,6 +2014,18 @@
   [(set_attr "type" "neon_sat_mul_")]
 )
 
+;; vqrdmlah, vqrdmlsh
+(define_insn "neon_vqrdmlh"
+  [(set (match_operand:VMDQI 0 "s_register_operand" "=w")
+	(unspec:VMDQI [(match_operand:VMDQI 1 "s_register_operand" "0")
+		   (match_operand:VMDQI 2 "s_register_operand" "w")
+		   (match_operand:VMDQI 3 "s_register_operand" "w")]
+		  VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+  "vqrdmlh.\t%0, %2, %3"
+  [(set_attr "type" "neon_sat_mla__long")]
+)
+
 (define_insn "neon_vqdmlal"
   [(set (match_operand: 0 "s_register_operand" "=w")
 (unspec: [(match_operand: 1 "s_register_operand" "0")
@@ -3176,6 +3188,39 @@ if (BYTES_BIG_ENDIAN)
   [(set_attr "type" "neon_sat_mul__scalar_q")]
 )
 
+;; vqrdmlah_lane, vqrdmlsh_lane
+(define_insn "neon_vqrdmlh_lane"
+  [(set (match_operand:VMQI 0 "s_register_operand" "=w")
+	(unspec:VMQI [(match_operand:VMQI 1 "s_register_operand" "0")
+		  (match_operand:VMQI 2 "s_register_operand" "w")
+		  (match_operand: 3 "s_register_operand"
+	  "")
+		  (match_operand:SI 4 "immediate_operand" "i")]
+		 VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+{
+  return
+   "vqrdmlh.\t%q0, %q2, %P3[%c4]";
+}
+  [(set_attr "type" "neon_mla__scalar")]
+)
+
+(define_insn "neon_vqrdmlh_lane"
+  [(set (match_operand:VMDI 0 "s_register_operand" "=w")
+	(unspec:VMDI [(match_operand:VMDI 1 "s_register_operand" "0")
+		  (match_operand:VMDI 2 "s_register_operand" "w")
+		  (match_operand:VMDI 3 "s_register_operand"
+	  "")
+		  (match_operand:SI 4 "immediate_operand" "i")]
+		 VQRDMLH_AS))]
+  "TARGET_NEON_RDMA"
+{
+  return
+   "vqrdmlh.\t%P0, %P2, %P3[%c4]";
+}
+  [(set_attr "type" "neon_mla__scalar")]
+)
+
 (define_insn "neon_vmla_lane"
   [(set (match_operand:VMD 0 "s_register_operand" "=w")
 	(unspec:VMD [(match_operand:VMD 1 "s_register_operand" "0")
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 67acafd..ffe703c 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -360,5 +360,7 @@
   UNSPEC_NVRINTX
   UNSPEC_NVRINTA
   UNSPEC_NVRINTN
+  UNSPEC_VQRDMLAH
+  UNSPEC_VQRDMLSH
 ])
 
-- 
2.1.4



[PTX] no return fns

2015-12-07 Thread Nathan Sidwell
calls to no return fns can cause problems with the PTX JIT.  It doesn't 
understand their no-return nature and can erroneously think there are unexitable 
loops (depending on the precise placement of bbs).  It can get so upset it 
segfaults.


gcc.dg/pr68671.c started causing this last week, with what looked like 
incomplete jump threading.


This patch changes call emission to look for a noreturn note and emit a trap 
insn after the call.  The JIT  no longer explodes.


nathan

2015-12-07  Nathan Sidwell  

	gcc/
	* config/nvptx/nvptx.c (nvptx_output_call_insn): Emit trap after no
	return call.

	gcc/testsuite/
	* gcc.target/nvptx/abort.c: New.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 231362)
+++ config/nvptx/nvptx.c	(working copy)
@@ -1890,6 +1890,13 @@ nvptx_output_call_insn (rtx_insn *insn,
 }
   fprintf (asm_out_file, ";\n");
 
+  if (find_reg_note (insn, REG_NORETURN, NULL))
+/* No return functions confuse the PTX JIT, as it doesn't realize
+   the flow control barrier they imply.  It can seg fault if it
+   encounters what looks like an unexitable loop.  Emit a trailing
+   trap, which it does grok.  */
+fprintf (asm_out_file, "\t\ttrap; // (noreturn)\n");
+
   return result != NULL_RTX ? "\tld.param%t0\t%0, [%%retval_in];\n\t}" : "}";
 }
 
Index: testsuite/gcc.target/nvptx/abort.c
===
--- testsuite/gcc.target/nvptx/abort.c	(revision 0)
+++ testsuite/gcc.target/nvptx/abort.c	(working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile} */
+/* Annotate no return functions with a trailing 'trap'.  */
+
+extern void abort ();
+
+int main (int argc, char **argv)
+{
+  if (argc > 2)
+abort ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "call abort;\[\r\n\t \]+trap;" } } */


Re: [PTX] no return fns

2015-12-07 Thread Bernd Schmidt

On 12/07/2015 06:03 PM, Nathan Sidwell wrote:

On 12/07/15 11:18, Nathan Sidwell wrote:

calls to no return fns can cause problems with the PTX JIT.  It doesn't
understand their no-return nature and can erroneously think there are
unexitable
loops (depending on the precise placement of bbs).  It can get so
upset it
segfaults.

gcc.dg/pr68671.c started causing this last week, with what looked like
incomplete jump threading.


Bernd, I meant to ask if you recalled previous tests you failed because
of this problem?  I'm retesting
  gcc.c-torture/compile/920723-1.c
  gcc.c-torture/compile/pr33855.c
  gcc.c-torture/execute/981019-1.c


I don't completely recall what reasons there were for ptxas crashes (or 
if we indeed reliably figured out the causes). Uninitialized registers 
come to mind, but I don't think I recall issues with noreturn.



Bernd



Re: [PTX] no return fns

2015-12-07 Thread Bernd Schmidt

On 12/07/2015 06:34 PM, Nathan Sidwell wrote:


Aren't noreturn fns required to be void?  It certainly doesn't make
sense for them to do otherwise.


The documentation says "it makes no sense" for them to have a type other 
than void, but I don't think that translates into a requirement. I 
suppose you could imagine a situation where you call various functions 
through a given function pointer type, and one of them doesn't return 
and could be marked as such.
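
A small sketch of that situation, with hypothetical names throughout: every handler must have the same int-returning type to fit the function pointer, yet one of them never returns and is marked accordingly.

```c
#include <assert.h>
#include <stdlib.h>

/* Common type for all handlers called through the table.  */
typedef int (*demo_handler) (int);

static int
demo_double (int x)
{
  return 2 * x;
}

/* Must return int to match demo_handler, but never returns.  */
static int demo_fatal (int code) __attribute__ ((noreturn));

static int
demo_fatal (int code)
{
  exit (code);
}

static const demo_handler demo_table[] = { demo_double, demo_fatal };

/* Call a handler through the common pointer type.  */
static int
demo_dispatch (demo_handler h, int arg)
{
  assert (h != NULL);
  return h (arg);
}
```

A direct call to demo_fatal is then a call to a noreturn function with a non-void return type, which is the case the trap placement has to cope with.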



Bernd



Re: [gomp4] Fix Fortran deviceptr

2015-12-07 Thread Cesar Philippidis
On 12/06/2015 06:52 AM, James Norris wrote:

> This patch fixes some runtime issues when dealing with
> the deviceptr clause in Fortran. There were some corner
> cases that were not being dealt with correctly, and the
> patch resolves these. Also a new set of test cases has
> been added.

Which corner cases?

> diff --git a/libgomp/ChangeLog.gomp b/libgomp/ChangeLog.gomp
> index a2f1c31..791aa4c 100644
> --- a/libgomp/ChangeLog.gomp
> +++ b/libgomp/ChangeLog.gomp
> @@ -1,3 +1,10 @@
> +2015-12-06  James Norris  
> +
> + * oacc-parallel.c (GOACC_parallel_keyed, GOACC_data_start):
> + Handle Fortran deviceptr clause combination.
> + * testsuite/libgomp.oacc-fortran/deviceptr-1.f90: New test.
> + * testsuite/libgomp.oacc-fortran/declare-1.f90: Remove erroneous test.
> +
>  2015-12-05  Chung-Lin Tang  
>  
>   * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
> diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
> index a4b2c01..a606152 100644
> --- a/libgomp/oacc-parallel.c
> +++ b/libgomp/oacc-parallel.c
> @@ -99,18 +99,37 @@ GOACC_parallel_keyed (int device, void (*fn) (void *),
>thr = goacc_thread ();
>acc_dev = thr->dev;
>  
> -  for (i = 0; i < (signed)(mapnum - 1); i++)
> +  for (i = 0; i < mapnum; i++)
>  {
>unsigned short kind1 = kinds[i] & 0xff;
> -  unsigned short kind2 = kinds[i+1] & 0xff;
>  
>/* Handle Fortran deviceptr clause.  */
> -  if ((kind1 == GOMP_MAP_FORCE_DEVICEPTR && kind2 == GOMP_MAP_POINTER)
> -&& (sizes[i + 1] == 0)
> -&& (hostaddrs[i] == *(void **)hostaddrs[i + 1]))
> +  if (kind1 == GOMP_MAP_FORCE_DEVICEPTR)
>   {
> -   kinds[i+1] = kinds[i];
> -   sizes[i+1] = sizeof (void *);
> +   unsigned short kind2;
> +
> +   if (i < (signed)mapnum - 1)
> + kind2 = kinds[i + 1] & 0xff;
> +   else
> + kind2 = 0x;
> +
> +   if (sizes[i] == sizeof (void *))
> + continue;
> +
> +   /* At this point, we're dealing with a Fortran deviceptr.
> +  If the next element is not what we're expecting, then
> +  this is an instance of where the deviceptr variable was
> +  not used within the region and the pointer was removed
> +  by the gimplifier.  */
> +   if (kind2 == GOMP_MAP_POINTER
> +   && sizes[i + 1] == 0
> +   && hostaddrs[i] == *(void **)hostaddrs[i + 1])
> + {
> +   kinds[i+1] = kinds[i];
> +   sizes[i+1] = sizeof (void *);
> + }
> +
> +   /* Invalidate the entry.  */
> hostaddrs[i] = NULL;
>   }
>  }
> @@ -254,18 +273,38 @@ GOACC_data_start (int device, size_t mapnum,
>struct goacc_thread *thr = goacc_thread ();
>struct gomp_device_descr *acc_dev = thr->dev;
>  
> -  for (i = 0; i < (signed)(mapnum - 1); i++)
> +  for (i = 0; i < mapnum; i++)
>  {
>unsigned short kind1 = kinds[i] & 0xff;
> -  unsigned short kind2 = kinds[i+1] & 0xff;
>  
>/* Handle Fortran deviceptr clause.  */
> -  if ((kind1 == GOMP_MAP_FORCE_DEVICEPTR && kind2 == GOMP_MAP_POINTER)
> -&& (sizes[i + 1] == 0)
> -&& (hostaddrs[i] == *(void **)hostaddrs[i + 1]))
> +  if (kind1 == GOMP_MAP_FORCE_DEVICEPTR)
>   {
> -   kinds[i+1] = kinds[i];
> -   sizes[i+1] = sizeof (void *);
> +   unsigned short kind2;
> +
> +   if (i < (signed)mapnum - 1)
> + kind2 = kinds[i + 1] & 0xff;
> +   else
> + kind2 = 0x;
> +
> +   /* If the size is right, skip it.  */
> +   if (sizes[i] == sizeof (void *))
> + continue;
> +
> +   /* At this point, we're dealing with a Fortran deviceptr.
> +  If the next element is not what we're expecting, then
> +  this is an instance of where the deviceptr variable was
> +  not used within the region and the pointer was removed
> +  by the gimplifier.  */
> +   if (kind2 == GOMP_MAP_POINTER
> +   && sizes[i + 1] == 0
> +   && hostaddrs[i] == *(void **)hostaddrs[i + 1])
> + {
> +   kinds[i+1] = kinds[i];
> +   sizes[i+1] = sizeof (void *);
> + }
> +
> +   /* Invalidate the entry.  */
> hostaddrs[i] = NULL;
>   }
>  }

Two observations:

 1. Why is deviceptr so special that gomp_map_vars can't handle it
directly?

 2. It appears that deviceptr code in GOACC_parallel_keyed is mostly
identical to GOACC_data_start. Can you put that duplicate code into
a function? That would be easier to maintain in the long run.
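
As a rough illustration of the second point, the shared loop could look something like the sketch below. The function name and the DEMO_ constants are hypothetical stand-ins (the real map kinds come from libgomp's headers and are not reproduced here), and the sketch checks for a following entry directly instead of using a sentinel kind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libgomp map kinds.  */
enum { DEMO_MAP_POINTER = 1, DEMO_MAP_FORCE_DEVICEPTR = 2 };

/* Shared fixup for the Fortran deviceptr clause, callable from both
   entry points.  */
static void
demo_fixup_fortran_deviceptr (size_t mapnum, void **hostaddrs,
                              size_t *sizes, unsigned short *kinds)
{
  assert (mapnum == 0 || (hostaddrs && sizes && kinds));

  for (size_t i = 0; i < mapnum; i++)
    {
      if ((kinds[i] & 0xff) != DEMO_MAP_FORCE_DEVICEPTR)
        continue;

      /* If the size is right, skip it.  */
      if (sizes[i] == sizeof (void *))
        continue;

      /* Only inspect the next entry when there is one.  */
      if (i + 1 < mapnum
          && (kinds[i + 1] & 0xff) == DEMO_MAP_POINTER
          && sizes[i + 1] == 0
          && hostaddrs[i] == *(void **) hostaddrs[i + 1])
        {
          kinds[i + 1] = kinds[i];
          sizes[i + 1] = sizeof (void *);
        }

      /* Invalidate the entry.  */
      hostaddrs[i] = NULL;
    }
}
```

Both GOACC_parallel_keyed and GOACC_data_start would then reduce to a single call with their own mapnum, hostaddrs, sizes and kinds arguments.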

> diff --git a/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90 
> b/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
> index 430cd24..e781878 100644
> --- a/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
> @@ -1,6 +1,4 @@
>  ! { dg-do run  { target openacc_nvidia_accel_selected } }
> -! 

RE: [Patch,microblaze]: Instruction prefetch optimization for microblaze.

2015-12-07 Thread Ajit Kumar Agarwal


-----Original Message-----
From: Michael Eager [mailto:ea...@eagerm.com] 
Sent: Thursday, December 03, 2015 7:27 PM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,microblaze]: Instruction prefetch optimization for 
microblaze.

On 12/01/2015 12:49 AM, Ajit Kumar Agarwal wrote:
> The changes are made in this patch for the instruction prefetch optimizations 
> for Microblaze.
>
> Reg tested for Microblaze target.
>
> The changes are made for instruction prefetch optimizations for 
> Microblaze. The "wic" microblaze instruction is the instruction 
> prefetch instruction. The instruction prefetch optimization is done to 
> generate the iprefetch instruction at the call site fall through path. 
> This optimization is enabled with  microblaze target flag mxl-prefetch. The 
> purpose of adding the flags is that selection of "wic" instruction should be 
> enabled in the reconfigurable design and the selection is not enabled by 
> default.
>
> ChangeLog:
> 2015-12-01  Ajit Agarwal  
>
>   * config/microblaze/microblaze.c
>   (get_branch_target): New.
>   (insert_wic_for_ilb_runout): New.
>   (insert_wic): New.
>   (microblaze_machine_dependent_reorg): New.
>   (TARGET_MACHINE_DEPENDENT_REORG): Define macro.
>   * config/microblaze/microblaze.md
>   (UNSPEC_IPREFETCH): Define.
>   (iprefetch): New pattern
>   * config/microblaze/microblaze.opt
>   (mxl-prefetch): New flag.
>
> Signed-off-by:Ajit Agarwal ajit...@xilinx.com
>
>
> Thanks & Regards
> Ajit
>

>>+  rtx_insn *insn, *before_4 = 0, *before_16 = 0;
>>+  int addr = 0, length, first_addr = -1;
>>+  int wic_addr0 = 128 * 4, wic_addr1 = 128 * 4;

>>Especially when there are initializers, I prefer to see each variable 
>>declared on a separate line.  If the meaning of a variable is not clear (and 
>>most of these are not), include a comment before the declaration.

>>+if (first_addr == -1)
>>+  first_addr = INSN_ADDRESSES (INSN_UID (insn));

>>Can be moved to initialize first_addr.

>>+addr = INSN_ADDRESSES (INSN_UID (insn)) - first_addr;

>>Is "addr" an address or offset?  If the latter, use a more descriptive name.


>>+if (before_4 == 0 && addr + length >= 4 * 4)
>>+  before_4 = insn;
...

>>Please add comments to describe what you are doing here.  What are before_4 
>>and before_16?  What are all these conditions testing?


>>+  loop_optimizer_finalize();

>>Space before parens.

All the above comments are incorporated. Updated patch is attached.

Regtested for Microblaze target. 

Mibench/EEMBC benchmarks were run on the hardware with mxl-prefetch enabled, and
the runs go through fine with the "wic" instruction generated.

[Patch,microblaze]: Instruction prefetch optimization for microblaze.

The changes are made for instruction prefetch optimizations for Microblaze.
The "wic" microblaze instruction is the instruction prefetch instruction.
The instruction prefetch optimization is done to generate the iprefetch
instruction at the call site fall through path. This optimization is enabled
with microblaze target flag mxl-prefetch. The purpose of adding the flags is
that selection of "wic" instruction should be enabled in the reconfigurable
design and the selection is not enabled by default.

ChangeLog:
2015-12-07  Ajit Agarwal  

* config/microblaze/microblaze.c
(get_branch_target): New.
(insert_wic_for_ilb_runout): New.
(insert_wic): New.
(microblaze_machine_dependent_reorg): New.
(TARGET_MACHINE_DEPENDENT_REORG): Define macro.
* config/microblaze/microblaze.md
(UNSPEC_IPREFETCH): Define.
(iprefetch): New pattern
* config/microblaze/microblaze.opt
(mxl-prefetch): New flag.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com

Thanks & Regards
Ajit

-- 
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


iprefetch.patch
Description: iprefetch.patch


Re: [PATCH 4/7][ARM] Add ACLE feature macro for ARMv8.1 instructions.

2015-12-07 Thread Matthew Wahab

Ping. Updated patch attached.
Matthew


On 26/11/15 16:01, Matthew Wahab wrote:

Hello,

This patch adds the feature macro __ARM_FEATURE_QRDMX to indicate the
presence of the ARMv8.1 instructions vqrdmlah and vqrdmlsh. It is
defined when the instructions are available, as it is when
-march=armv8.1-a is enabled with suitable fpu options.
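
User code can test the macro in the usual way. The sketch below is only an illustration with a hypothetical function name; it reports whether the translation unit was compiled with the instructions available.

```c
/* Returns 1 when the compiler advertises vqrdmlah/vqrdmlsh via the
   feature macro, 0 otherwise.  */
static int
demo_have_qrdmx (void)
{
#if defined (__ARM_FEATURE_QRDMX)
  return 1;
#else
  return 0;
#endif
}
```

On a target or option set without the extension the function simply returns 0, so callers can fall back to other code.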

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/arm-c.c (arm_cpu_builtins): Define __ARM_FEATURE_QRDMX.



>From 721586aad45f7f75a0c198517602125c9d8f76f2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 17 Jun 2015 13:25:09 +0100
Subject: [PATCH 4/7] [ARM] Add __ARM_FEATURE_QRDMX

Change-Id: I26cde507e8844a731e4fd857fbd30bf87f213f89
---
 gcc/config/arm/arm-c.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 7dee28e..62c9304 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -68,6 +68,9 @@ arm_cpu_builtins (struct cpp_reader* pfile)
 
   def_or_undef_macro (pfile, "__ARM_FEATURE_UNALIGNED", unaligned_access);
 
+  if (TARGET_NEON_RDMA)
+builtin_define ("__ARM_FEATURE_QRDMX");
+
   if (TARGET_CRC32)
 builtin_define ("__ARM_FEATURE_CRC32");
 
-- 
2.1.4



Re: C PATCH for c/68668 (grokdeclarator and wrong type of PARM_DECL)

2015-12-07 Thread Joseph Myers
On Mon, 7 Dec 2015, Marek Polacek wrote:

> + if (orig_qual_indirect != 0)
> +   orig_qual_type = TREE_TYPE (orig_qual_type);
> + else
> +   orig_qual_indirect--;

For optimal results for debug info, I think that should be == 0 (i.e. 
preserve orig_qual_type, which may be a typedef, if possible - if the 
parameter is an array of orig_qual_type), rather than != 0.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: patch to fix PR68349

2015-12-07 Thread Vladimir Makarov

On 12/04/2015 06:52 PM, H.J. Lu wrote:

On Fri, Dec 4, 2015 at 11:26 AM, Vladimir Makarov  wrote:

   The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68349

   The patch was tested and bootstrapped on x86/x86-64.

  Committed as rev. 231300.

unsigned long strlen();
^^^

I got

./xgcc -B./ -O2
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.target/i386/pr68349.c
-S -m32
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.target/i386/pr68349.c:6:15:
warning: conflicting types for built-in function ‘strlen’
  unsigned long strlen();

Shouldn't strlen be renamed?


Thanks for reporting this, H.J.

I've committed the following patch:

Index: ChangeLog
===
--- ChangeLog   (revision 231368)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2015-12-07  Vladimir Makarov  
+
+   * gcc.target/i386/pr68349.c (strlen): Rename to my_strlen.
+
 2015-12-07  Nathan Sidwell  

* gcc.target/nvptx/abort.c: New.
Index: gcc.target/i386/pr68349.c
===
--- gcc.target/i386/pr68349.c   (revision 231300)
+++ gcc.target/i386/pr68349.c   (working copy)
@@ -3,7 +3,7 @@
 /* { dg-options "-O2" } */

 int a, b;
-unsigned long strlen();
+unsigned long my_strlen();
 typedef struct sHyphenNode {
   char sepcnts[0];
   struct sHyphenNode *Daughters[];
@@ -12,7 +12,7 @@
 PHyphenNode c;
 void DoHyphens_Field_1() {
   char d[300], e[300];
-  int z, f, l = strlen();
+  int z, f, l = my_strlen();
   for (; z;)
 ;
   for (; l; z++) {



Re: [patch] Fix PR middle-end/68291 & 68292

2015-12-07 Thread Richard Biener
On December 7, 2015 5:42:02 PM GMT+01:00, Eric Botcazou  
wrote:
>> Ok. Although thinking about your comment in the PR about not making
>such
>> vectors gimple registers I wonder what the effects of that would be.
>
>First of all it's a bit painful to do because is_gimple_reg_type is
>defined 
>inline in gimple-expr.h and adding TYPE_MODE in there causes a
>compilation 
>failure for a bunch of files.  More seriously, this can probably be
>seen as a 
>real layering violation so I'm not sure this would be a progress.

Yeah, it would also have quite some impact on optimization.  Note that if only 
specific decls are involved they can be made non-registers via 
DECL_GIMPLE_REG_P.

Richard.




Re: C PATCH for c/68668 (grokdeclarator and wrong type of PARM_DECL)

2015-12-07 Thread Marek Polacek
On Mon, Dec 07, 2015 at 04:05:11PM +, Joseph Myers wrote:
> On Mon, 7 Dec 2015, Marek Polacek wrote:
> 
> > +   if (orig_qual_indirect != 0)
> > + orig_qual_type = TREE_TYPE (orig_qual_type);
> > +   else
> > + orig_qual_indirect--;
> 
> For optimal results for debug info, I think that should be == 0 (i.e. 
> preserve orig_qual_type, which may be a typedef, if possible - if the 
> parameter is an array of orig_qual_type), rather than != 0.

I'm confused now.  If orig_qual_indirect != 0, it is indirect, right?
Earlier you said:
"if orig_qual_indirect is indirect, in that case you 
may get better results from changing orig_qual_type without decrementing 
orig_qual_indirect."
Isn't that something else than this version with == 0?

Anyway, here's the version with == 0.  Thanks,

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-12-07  Marek Polacek  

PR c/68668
* c-decl.c (grokdeclarator): If ORIG_QUAL_INDIRECT is indirect, use
TREE_TYPE of ORIG_QUAL_TYPE, otherwise decrement ORIG_QUAL_INDIRECT.

* gcc.dg/pr68668.c: New test.
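
As background for the parameter adjustment being discussed, this sketch (C11, hypothetical function names) shows the types the parameters end up with when they are declared through typedefs of const arrays: the array becomes a pointer and the const-ness stays on the element type.

```c
#include <assert.h>

typedef const int T[];
typedef const int U[1];

/* "T p" is adjusted to "const int *p".  */
static int
demo_param_is_pointer_to_const (T p)
{
  assert (p != 0);
  return _Generic (p, const int *: 1, default: 0);
}

/* "U p[2]" is adjusted to a pointer to an array of one const int.  */
static int
demo_param_is_pointer_to_const_array (U p[2])
{
  assert (p != 0);
  return _Generic (p, const int (*)[1]: 1, default: 0);
}
```

Both functions select the first _Generic association, which is the adjusted type the front end has to record for the PARM_DECL.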

diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index 9ad8219..6a85514 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -6417,6 +6417,13 @@ grokdeclarator (const struct c_declarator *declarator,
  {
/* Transfer const-ness of array into that of type pointed to.  */
type = TREE_TYPE (type);
+   if (orig_qual_type != NULL_TREE)
+ {
+   if (orig_qual_indirect == 0)
+ orig_qual_type = TREE_TYPE (orig_qual_type);
+   else
+ orig_qual_indirect--;
+ }
if (type_quals)
  type = c_build_qualified_type (type, type_quals, orig_qual_type,
 orig_qual_indirect);
diff --git gcc/testsuite/gcc.dg/pr68668.c gcc/testsuite/gcc.dg/pr68668.c
index e69de29..d013aa9 100644
--- gcc/testsuite/gcc.dg/pr68668.c
+++ gcc/testsuite/gcc.dg/pr68668.c
@@ -0,0 +1,53 @@
+/* PR c/68668 */
+/* { dg-do compile } */
+
+typedef const int T[];
+typedef const int U[1];
+
+int
+fn1 (T p)
+{
+  return p[0];
+}
+
+int
+fn2 (U p[2])
+{
+  return p[0][0];
+}
+
+int
+fn3 (U p[2][3])
+{
+  return p[0][0][0];
+}
+
+int
+fn4 (U *p)
+{
+  return p[0][0];
+}
+
+int
+fn5 (U (*p)[1])
+{
+  return (*p)[0][0];
+}
+
+int
+fn6 (U (*p)[1][2])
+{
+  return (*p)[0][0][0];
+}
+
+int
+fn7 (U **p)
+{
+  return p[0][0][0];
+}
+
+int
+fn8 (U (**p)[1])
+{
+  return (*p)[0][0][0];
+}

Marek


Re: [PATCH 2/7][ARM] Multilib support for ARMv8.1.

2015-12-07 Thread Matthew Wahab

Ping. Updated patch attached.
Matthew

On 26/11/15 15:58, Matthew Wahab wrote:

This patch sets up multilib support for ARMv8.1, treating it as a
synonym for ARMv8. Since ARMv8.1 integer, FP or SIMD
instructions are only generated for the new, instruction-specific
intrinsics, mapping to ARMv8 rather than adding a new multilib variant
is sufficient.

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/t-aprofile: Make "armv8.1-a" and "armv8.1-a+crc"
 matches for "armv8-a".



>From c5c0f983e03135fe0cde29077353b429c0c502a2 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 23 Oct 2015 09:37:12 +0100
Subject: [PATCH 2/7] [ARM] Multilib support for ARMv8.1

Change-Id: I65ee77768e22452ac15452cf6d4fdec3079ef852
---
 gcc/config/arm/t-aprofile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index cf34161..b23f1bc 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -98,6 +98,8 @@ MULTILIB_MATCHES   += march?armv8-a=mcpu?xgene1
 
 # Arch Matches
 MULTILIB_MATCHES   += march?armv8-a=march?armv8-a+crc
+MULTILIB_MATCHES   += march?armv8-a=march?armv8.1-a
+MULTILIB_MATCHES   += march?armv8-a=march?armv8.1-a+crc
 
 # FPU matches
 MULTILIB_MATCHES   += mfpu?vfpv3-d16=mfpu?vfpv3
-- 
2.1.4



Re: [PATCH 7/7][ARM] Add ACLE intrinsics vqrdmlah_lane and vqrdmlsh_lane

2015-12-07 Thread Matthew Wahab

Ping. Updated patch attached.
Matthew

On 26/11/15 16:10, Matthew Wahab wrote:

Attached the missing patch.
Matthew

On 26/11/15 16:04, Matthew Wahab wrote:

Hello,

This patch adds the ACLE intrinsics for the instructions introduced in
ARMv8.1. It adds the vqrdmlah_lane and vqrdmlsh_lane forms of the
intrinsics to the arm_neon.h header, together with the ARM builtins
used to implement them. The intrinsics are available when
-march=armv8.1-a is enabled together with appropriate fpu options.
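
As an informal illustration of what the lane forms compute, here is a
portable scalar model in C.  This is only a sketch, not the arm_neon.h
implementation: the names sat16, model_sqrdmlah16 and
model_vqrdmlah_lane_s16 are made up for this example, and the arithmetic
follows my reading of the SQRDMLAH description (doubling multiply, round,
accumulate, keep the high half, saturate).  It assumes that >> on a negative
value is an arithmetic shift, as it is with GCC.

```c
#include <stdint.h>

/* Clamp a wide intermediate value to the int16_t range.  */
int16_t
sat16 (int64_t v)
{
  if (v > INT16_MAX)
    return INT16_MAX;
  if (v < INT16_MIN)
    return INT16_MIN;
  return (int16_t) v;
}

/* Hypothetical scalar model of one 16-bit SQRDMLAH element:
   sat ((a * 2^16 + 2 * b * c + 2^15) >> 16).  */
int16_t
model_sqrdmlah16 (int16_t a, int16_t b, int16_t c)
{
  int64_t accum = (int64_t) a * 65536 + 2 * (int64_t) b * c + 0x8000;
  /* Assumes arithmetic right shift for negative ACCUM.  */
  return sat16 (accum >> 16);
}

/* Hypothetical model of vqrdmlah_lane_s16: every element of B is
   multiplied by the single element C[LANE] and accumulated into A.  */
void
model_vqrdmlah_lane_s16 (int16_t out[4], const int16_t a[4],
			 const int16_t b[4], const int16_t c[4], int lane)
{
  for (int i = 0; i < 4; i++)
    out[i] = model_sqrdmlah16 (a[i], b[i], c[lane]);
}
```

The point of the lane form is that only c[lane] takes part in the
multiplication; the other elements of c are ignored.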

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/arm_neon.h (vqrdmlahq_lane_s16): New.
 (vqrdmlahq_lane_s32): New.
 (vqrdmlah_lane_s16): New.
 (vqrdmlah_lane_s32): New.
 (vqrdmlshq_lane_s16): New.
 (vqrdmlshq_lane_s32): New.
 (vqrdmlsh_lane_s16): New.
 (vqrdmlsh_lane_s32): New.
 * config/arm/arm_neon_builtins.def: Add "vqrdmlah_lane" and
 "vqrdmlsh_lane".





From 9928f1e8e30c500933fa68f95311cf0f78dd6712 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 16:22:34 +0100
Subject: [PATCH 7/7] [ARM] Add neon intrinsics vqrdmlah_lane, vqrdmlsh_lane.

Change-Id: Ia0ab4bbe683af2d019d18a34302a7b9798193a79
---
 gcc/config/arm/arm_neon.h| 50 
 gcc/config/arm/arm_neon_builtins.def |  2 ++
 2 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index b617f80..ed50253 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -7096,6 +7096,56 @@ vqrdmulh_lane_s32 (int32x2_t __a, int32x2_t __b, const int __c)
   return (int32x2_t)__builtin_neon_vqrdmulh_lanev2si (__a, __b, __c);
 }
 
+#ifdef __ARM_FEATURE_QRDMX
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlah_lanev8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlah_lanev4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlah_lanev4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlah_lanev2si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_lane_s16 (int16x8_t __a, int16x8_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlsh_lanev8hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_lane_s32 (int32x4_t __a, int32x4_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlsh_lanev4si (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_lane_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c, const int __d)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlsh_lanev4hi (__a, __b, __c, __d);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_lane_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c, const int __d)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlsh_lanev2si (__a, __b, __c, __d);
+}
+#endif
+
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vmul_n_s16 (int16x4_t __a, int16_t __b)
 {
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 8d5c0ca..1fdb2a8 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -60,6 +60,8 @@ VAR4 (BINOP, vqdmulh_n, v4hi, v2si, v8hi, v4si)
 VAR4 (BINOP, vqrdmulh_n, v4hi, v2si, v8hi, v4si)
 VAR4 (SETLANE, vqdmulh_lane, v4hi, v2si, v8hi, v4si)
 VAR4 (SETLANE, vqrdmulh_lane, v4hi, v2si, v8hi, v4si)
+VAR4 (MAC_LANE, vqrdmlah_lane, v4hi, v2si, v8hi, v4si)
+VAR4 (MAC_LANE, vqrdmlsh_lane, v4hi, v2si, v8hi, v4si)
 VAR2 (BINOP, vqdmull, v4hi, v2si)
 VAR8 (BINOP, vshls, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
 VAR8 (BINOP, vshlu, v8qi, v4hi, v2si, di, v16qi, v8hi, v4si, v2di)
-- 
2.1.4



Re: [PATCH 6/7][ARM] Add ACLE intrinsics vqrdmlah and vqrdmlsh

2015-12-07 Thread Matthew Wahab

Ping. Updated patch attached.
Matthew

On 26/11/15 16:04, Matthew Wahab wrote:

Hello,

This patch adds the ACLE intrinsics for the instructions introduced in
ARMv8.1. It adds the vqrdmlah and vqrdmlsh forms of the intrinsics to
the arm_neon.h header, together with the ARM builtins used to implement
them. The intrinsics are available when -march=armv8.1-a is enabled
together with appropriate fpu options.
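
A minimal sketch of how user code might guard on the feature macro that
the header change below tests.  The wrapper name qrdmlsh4 and the fallback
ref_sqrdmlsh16 are hypothetical, made up for this example; the fallback
reflects my understanding of the SQRDMLSH arithmetic and assumes an
arithmetic right shift for negative values.  Only vld1_s16, vst1_s16 and
vqrdmlsh_s16 are real arm_neon.h names.

```c
#include <stdint.h>

#if defined (__ARM_NEON) && defined (__ARM_FEATURE_QRDMX)
#include <arm_neon.h>
#define HAVE_QRDMX 1
#else
#define HAVE_QRDMX 0
#endif

/* Hypothetical scalar fallback for one 16-bit element:
   sat ((a * 2^16 - 2 * b * c + 2^15) >> 16).  */
int16_t
ref_sqrdmlsh16 (int16_t a, int16_t b, int16_t c)
{
  int64_t accum = (int64_t) a * 65536 - 2 * (int64_t) b * c + 0x8000;
  int64_t r = accum >> 16;	/* Assumes arithmetic shift.  */
  if (r > INT16_MAX)
    return INT16_MAX;
  if (r < INT16_MIN)
    return INT16_MIN;
  return (int16_t) r;
}

/* acc[i] = saturating rounding doubling multiply-subtract of b[i] * c[i].
   Uses the new intrinsic when the compiler advertises it, otherwise
   the scalar fallback above.  */
void
qrdmlsh4 (int16_t acc[4], const int16_t b[4], const int16_t c[4])
{
#if HAVE_QRDMX
  vst1_s16 (acc, vqrdmlsh_s16 (vld1_s16 (acc), vld1_s16 (b), vld1_s16 (c)));
#else
  for (int i = 0; i < 4; i++)
    acc[i] = ref_sqrdmlsh16 (acc[i], b[i], c[i]);
#endif
}
```

Built without -march=armv8.1-a (or on a non-ARM host) this takes the scalar
path, so the same source can be used for a quick comparison of the two.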

Tested the series for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.1 emulator. Also tested arm-none-linux-gnueabihf with native
bootstrap and make check.

Ok for trunk?
Matthew

gcc/
2015-11-26  Matthew Wahab  

 * config/arm/arm_neon.h (vqrdmlah_s16, vqrdmlah_s32): New.
 (vqrdmlahq_s16, vqrdmlahq_s32): New.
 (vqrdmlsh_s16, vqrdmlsh_s32): New.
 (vqrdmlshq_s16, vqrdmlshq_s32): New.
 * config/arm/arm_neon_builtins.def: Add "vqrdmlah" and "vqrdmlsh".



From 1844027592d818e0de53a3da904ae6bfe1aef534 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Tue, 1 Sep 2015 16:21:44 +0100
Subject: [PATCH 6/7] [ARM] Add neon intrinsics vqrdmlah, vqrdmlsh.

Change-Id: Ic40ff4d477f36ec01714c68e3b83b66208c7958b
---
 gcc/config/arm/arm_neon.h| 50 
 gcc/config/arm/arm_neon_builtins.def |  2 ++
 2 files changed, 52 insertions(+)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 0a33d21..b617f80 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -1158,6 +1158,56 @@ vqrdmulhq_s32 (int32x4_t __a, int32x4_t __b)
   return (int32x4_t)__builtin_neon_vqrdmulhv4si (__a, __b);
 }
 
+#ifdef __ARM_FEATURE_QRDMX
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlah_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlahv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlah_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlahv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlahq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlahv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlahq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlahv4si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
+vqrdmlsh_s16 (int16x4_t __a, int16x4_t __b, int16x4_t __c)
+{
+  return (int16x4_t)__builtin_neon_vqrdmlshv4hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vqrdmlsh_s32 (int32x2_t __a, int32x2_t __b, int32x2_t __c)
+{
+  return (int32x2_t)__builtin_neon_vqrdmlshv2si (__a, __b, __c);
+}
+
+__extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
+vqrdmlshq_s16 (int16x8_t __a, int16x8_t __b, int16x8_t __c)
+{
+  return (int16x8_t)__builtin_neon_vqrdmlshv8hi (__a, __b, __c);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vqrdmlshq_s32 (int32x4_t __a, int32x4_t __b, int32x4_t __c)
+{
+  return (int32x4_t)__builtin_neon_vqrdmlshv4si (__a, __b, __c);
+}
+#endif
+
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmull_s8 (int8x8_t __a, int8x8_t __b)
 {
diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def
index 0b719df..8d5c0ca 100644
--- a/gcc/config/arm/arm_neon_builtins.def
+++ b/gcc/config/arm/arm_neon_builtins.def
@@ -45,6 +45,8 @@ VAR4 (BINOP, vqdmulh, v4hi, v2si, v8hi, v4si)
 VAR4 (BINOP, vqrdmulh, v4hi, v2si, v8hi, v4si)
 VAR2 (TERNOP, vqdmlal, v4hi, v2si)
 VAR2 (TERNOP, vqdmlsl, v4hi, v2si)
+VAR4 (TERNOP, vqrdmlah, v4hi, v2si, v8hi, v4si)
+VAR4 (TERNOP, vqrdmlsh, v4hi, v2si, v8hi, v4si)
 VAR3 (BINOP, vmullp, v8qi, v4hi, v2si)
 VAR3 (BINOP, vmulls, v8qi, v4hi, v2si)
 VAR3 (BINOP, vmullu, v8qi, v4hi, v2si)
-- 
2.1.4



Re: [PATCH] Adjust vect-widen-mult-const-[su]16.c for r226675

2015-12-07 Thread Bill Schmidt
Hi Richi,

I was afraid this would break x86.  Unfortunately, your proposed patch
didn't change any output for me: I still see 6 and 8 instances of
"pattern recognized".

Bill

On Mon, 2015-12-07 at 11:50 +0100, Richard Biener wrote:
> On Fri, Dec 4, 2015 at 8:51 PM, Bill Schmidt
>  wrote:
> > Since r226675, we have been seeing these failures:
> >
> > FAIL: gcc.dg/vect/vect-widen-mult-const-s16.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "pattern recognized" 2
> > FAIL: gcc.dg/vect/vect-widen-mult-const-s16.c scan-tree-dump-times vect
> > "pattern recognized" 2
> > FAIL: gcc.dg/vect/vect-widen-mult-const-u16.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "pattern recognized" 2
> > FAIL: gcc.dg/vect/vect-widen-mult-const-u16.c scan-tree-dump-times vect
> > "pattern recognized" 2
> >
> > Comparing the vect-details dumps from r226674 to r226675, I see these as
> > the reason:
> >
> > 63a64,66
> >> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
> >>  note: vect_recog_mult_pattern: detected:
> >> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
> >>  note: patt_47 = _6 << 2;
> >> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
> >>  note: pattern recognized: patt_47 = _6 << 2;
> > 70a74,76
> >> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
> >>  note: vect_recog_mult_pattern: detected:
> >> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
> >>  note: patt_40 = _6 << 1;
> >> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
> >>  note: pattern recognized: patt_40 = _6 << 1;
> >
> > 747a754,756
> >> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
> >>  note: vect_recog_mult_pattern: detected:
> >> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
> >>  note: patt_47 = _6 << 2;
> >> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
> >>  note: pattern recognized: patt_47 = _6 << 2;
> > 754a764,766
> >> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
> >>  note: vect_recog_mult_pattern: detected:
> >> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
> >>  note: patt_40 = _6 << 1;
> >> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
> >>  note: pattern recognized: patt_40 = _6 << 1;
> >
> > These seem to be precisely what's expected, given the nature of the patch,
> > which is looking for these opportunities.  So it's likely that we should
> > just change
> >
> > /* { dg-final { scan-tree-dump-times "pattern recognized" 2
> > "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
> >
> > to
> >
> > /* { dg-final { scan-tree-dump-times "pattern recognized" 6
> > "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
> >
> > and similarly for the unsigned case.  The following patch does this.
> > However, I wanted to run this by Venkat since this was apparently not
> > detected when his patch went in.  This doesn't appear to be a
> > target-specific issue, and most targets support
> > vect_widen_mult_hi_to_si_pattern, so I'm not sure why this wasn't fixed
> > with the original patch.  Will this change break on any other targets
> > for some reason?
> >
> > Tested on powerpc64le-unknown-linux-gnu.  Ok for trunk?
> 
> Hmm.  That will FAIL on x86_64 though because it can handle multiplication
> natively.  I think the pattern recognition is simply bogus as it fails to
> detect the stmt is already part of the widen-mult pattern?  In fact, pattern
> recognition looping over all pattern functions even if one already matched
> on the very same stmt looks bogus to me.
> 
> Does the (untested)
> 
> Index: gcc/tree-vect-patterns.c
> ===
> --- gcc/tree-vect-patterns.c(revision 231357)
> +++ gcc/tree-vect-patterns.c(working copy)
> @@ -3791,7 +3791,7 @@ vect_mark_pattern_stmts (gimple *orig_st
> This function also does some bookkeeping, as explained in the 
> documentation
> for vect_recog_pattern.  */
> 
> -static void
> +static bool
>  vect_pattern_recog_1 (vect_recog_func_ptr vect_recog_func,
>   gimple_stmt_iterator si,
>   vec *stmts_to_replace)
> @@ -3809,7 +3809,7 @@ vect_pattern_recog_1 (vect_recog_func_pt
>stmts_to_replace->quick_push (stmt);
>pattern_stmt = (* vect_recog_func) (stmts_to_replace, _in, _out);
>if (!pattern_stmt)
> -return;
> +return false;
> 
>stmt = stmts_to_replace->last ();
>stmt_info = vinfo_for_stmt (stmt);
> @@ 

Re: [PTX] no return fns

2015-12-07 Thread Nathan Sidwell

On 12/07/15 12:08, Alexander Monakov wrote:
> Hello Nathan,
>
> On Mon, 7 Dec 2015, Nathan Sidwell wrote:
> > This patch changes call emission to look for a noreturn note and emit a trap
> > insn after the call.  The JIT  no longer explodes.
>
> I think there's a potential issue with the patch: when the noreturn function
> has a non-void return value, your patch places 'trap' between 'call' and

Aren't noreturn fns required to be void?  It certainly doesn't make sense for
them to do otherwise.

> 'ld.param' insns.  That violates the PTX specification, which demands:
>
>  All st.param instructions used for passing arguments to function call must
>  immediately precede the corresponding call instruction and ld.param
>  instruction used for collecting return value must immediately follow the
>  call instruction without any control flow alteration.

so I don't think this can happen ...

nathan


Re: -fstrict-aliasing fixes 6/6: permit inlining of comdats

2015-12-07 Thread Richard Biener
On Fri, 4 Dec 2015, Jan Hubicka wrote:

> > 
> > I wonder if you can split out the re-naming at this stage.  Further
> > comments below.
> 
> OK, I will commit the renaming and ipa-icf fix separately.
> > 
> > > Bootstrapped/regtested x86_64-linux, OK?
> > > 
> > > I will work on some testcases for the ICF and fold-const that would lead
> > > to wrong code if alias sets was ignored early.
> > 
> > Would be nice to have a wrong-code testcase go with the commit.
> > 
> > > Honza
> > >   * fold-const.c (operand_equal_p): Before inlining do not permit
> > >   transformations that would break with strict aliasing.
> > >   * ipa-inline.c (can_inline_edge_p) Use merged_comdat.
> > >   * ipa-inline-transform.c (inline_call): When inlining merged comdat do
> > >   not drop strict_aliasing flag of caller.
> > >   * cgraphclones.c (cgraph_node::create_clone): Use merged_comdat.
> > >   * cgraph.c (cgraph_node::dump): Dump merged_comdat.
> > >   * ipa-icf.c (sem_function::merge): Drop merged_comdat when merging
> > >   comdat and non-comdat.
> > >   * cgraph.h (cgraph_node): Rename merged to merged_comdat.
> > >   * ipa-inline-analysis.c (simple_edge_hints): Check both merged_comdat
> > >   and icf_merged.
> > > 
> > >   * lto-symtab.c (lto_cgraph_replace_node): Update code computing
> > >   merged_comdat.
> > > Index: fold-const.c
> > > ===
> > > --- fold-const.c  (revision 231239)
> > > +++ fold-const.c  (working copy)
> > > @@ -2987,7 +2987,7 @@ operand_equal_p (const_tree arg0, const_
> > >  flags)))
> > >   return 0;
> > > /* Verify that accesses are TBAA compatible.  */
> > > -   if (flag_strict_aliasing
> > > +   if ((flag_strict_aliasing || !cfun->after_inlining)
> > > && (!alias_ptr_types_compatible_p
> > >   (TREE_TYPE (TREE_OPERAND (arg0, 1)),
> > >TREE_TYPE (TREE_OPERAND (arg1, 1)))
> > 
> > Sooo  first of all the code is broken anyway as it guards
> > the restrict checking (MR_DEPENDENCE_*) stuff with flag_strict_aliasing
> > (ick).  Second, I wouldn't mind if we drop the flag_strict_aliasing
> > check altogether; a cfun->after_inlining check makes me just too
> > nervous.
> 
> OK, I will drop the check separately, too.  
> Next stage1 we need to look into code merging across alias classes. ipa-icf
> scores are currently 40% down compared to GCC 5 at Firefox.
> > 
> > So your logic relies on the fact that the -fno-strict-aliasing was
> > not necessary on copy A if copy B was compiled without that flag
> > because otherwise copy B would invoke undefined behavior?
> 
> Yes.
> > 
> > This means it's a language semantics thing but you simply look at
> > whether it's "comdat"?  Shouldn't this use some ODR thing instead?
> 
> It is the definition of COMDAT. COMDAT functions are output in every unit
> that uses them, and no matter which body wins, the linking is correct.  Only
> C++ produces comdat functions, so they all comply with the ODR, and we can
> rely on the fact that all function bodies should be equivalent on a source
> level.
> > 
> > Also as undefined behavior only applies at runtime consider copy A
> > (with -fno-strict-aliasing) is used in contexts where undefined
> > behavior would occur while copy B not.  Say,
> > 
> > int foo (int *p, short *q)
> > {
> >   *p = 1;
> >   return *q;
> > }
> > 
> > and the copy A use is foo (, ) while the copy B use foo (, ).
> > 
> > Yes, the case is lame here as we'd miscompile this in copy B and
> > comdat makes us eventually use that copy for A.  But if we don't
> > manage to miscompile this without inlining there isn't any undefined
> > behavior (at runtime) you can rely on.
> 
> Well, it is ODR violation in this case :)
> > 
> > Just want to know whether you thought about the above cases, I would
> > declare them invalid but I am not sure the C++ standard agrees here.
> 
> Well, not exactly of the case mentioned above, but I still think that this is
> safe (ugly, too). An alternative is to keep around the bodies until after
> inlining.  I have infrastructure for that in my tree, but it is hard to tune:
> first, the alternative function body may have different symbol references
> (as a result of different early inlining) which may not be resolved in the
> current binary, so we cannot use it at all. Second, keeping many alternatives
> of every body around makes code size estimates in the inliner go crazy.

/me inserts his usual "partitioning to the rescue" comment...

;)

Richard.
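
To make the foo example quoted above concrete, here is a small sketch in C.
The arguments of the two calls were lost from the quoted text in this
archive, so the two callers below (use_distinct and use_aliased, both names
invented here) are only assumptions about the kind of contexts meant: one
where p and q point to different objects, and one where q points into the
int that p points to.

```c
/* The function from the quoted discussion.  */
int
foo (int *p, short *q)
{
  *p = 1;
  return *q;
}

/* Hypothetical "copy B" style use: p and q refer to distinct objects,
   so the result is well defined with or without -fstrict-aliasing.  */
int
use_distinct (void)
{
  int x = 0;
  short y = 7;
  return foo (&x, &y);
}

/* Hypothetical "copy A" style use: q aliases the storage of p.  This
   only has a dependable meaning when foo is compiled with
   -fno-strict-aliasing; under -fstrict-aliasing the load of *q may be
   assumed not to see the store to *p.  */
int
use_aliased (void)
{
  int x = 0;
  return foo (&x, (short *) &x);
}
```

If a comdat copy of foo built with strict aliasing is the one the linker
keeps, use_aliased ends up running code that was optimized under an
assumption its own translation unit had switched off, which is the
situation being argued about.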


Re: [PATCH][1/2] Fix PR68553

2015-12-07 Thread Richard Biener
On Fri, 4 Dec 2015, Alan Lawrence wrote:

> On 04/12/15 17:46, Ramana Radhakrishnan wrote:
> > 
> > 
> > On 04/12/15 16:04, Richard Biener wrote:
> > > On December 4, 2015 4:32:33 PM GMT+01:00, Alan Lawrence
> > >  wrote:
> > > > On 27/11/15 08:30, Richard Biener wrote:
> > > > > 
> > > > > This is part 1 of a fix for PR68553 which shows that some targets
> > > > > cannot can_vec_perm_p on an identity permutation.  I chose to fix
> > > > > this in the vectorizer by detecting the identity itself but with
> > > > > the current structure of vect_transform_slp_perm_load this is
> > > > > somewhat awkward.  Thus the following no-op patch simplifies it
> > > > > greatly (from the times it was restricted to do interleaving-kind
> > > > > of permutes).  It turned out to not be 100% no-op as we now can
> > > > > handle non-adjacent source operands so I split it out from the
> > > > > actual fix.
> > > > > 
> > > > > The two adjusted testcases no longer fail to vectorize because
> > > > > of "need three vectors" but unadjusted would fail because there
> > > > > are simply not enough scalar iterations in the loop.  I adjusted
> > > > > that and now we vectorize it just fine (running into PR68559
> > > > > which I filed).
> > > > > 
> > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> > > > > 
> > > > > Richard.
> > > > > 
> > > > > 2015-11-27  Richard Biener  
> > > > > 
> > > > >   PR tree-optimization/68553
> > > > >   * tree-vect-slp.c (vect_get_mask_element): Remove.
> > > > >   (vect_transform_slp_perm_load): Implement in a simpler way.
> > > > > 
> > > > >   * gcc.dg/vect/pr45752.c: Adjust.
> > > > >   * gcc.dg/vect/slp-perm-4.c: Likewise.
> > > > 
> > > > On aarch64 and ARM targets, this causes
> > > > 
> > > > PASS->FAIL: gcc.dg/vect/O3-pr36098.c scan-tree-dump-times vect
> > > > "vectorizing stmts using SLP" 0
> > > > 
> > > > That is, we now vectorize using SLP, when previously we did not.
> > > > 
> > > > On aarch64 (and I expect ARM too), previously we used a VEC_LOAD_LANES,
> > > > without unrolling, but now we unroll * 4, and vectorize using 3 loads and
> > > > permutes:
> > > 
> > > Happens on x86_64 as well with at least SSE4.1.  Unfortunately we'll have
> > > to start introducing much more fine-grained target-supports for vect_perm
> > > to reliably guard all targets.
> > 
> > I don't know enough about SSE4.1 to know whether it's a problem there or
> > not. This is an actual regression on AArch64 and ARM and not just a testism,
> > you now get :
> > 
> > .L5:
> >  ldr q0, [x5, 16]
> >  add x4, x4, 48
> >  ldr q1, [x5, 32]
> >  add w6, w6, 1
> >  ldr q4, [x5, 48]
> >  cmp w3, w6
> >  ldr q2, [x5], 64
> >  orr v3.16b, v0.16b, v0.16b
> >  orr v5.16b, v4.16b, v4.16b
> >  orr v4.16b, v1.16b, v1.16b
> >  tbl v0.16b, {v0.16b - v1.16b}, v6.16b
> >  tbl v2.16b, {v2.16b - v3.16b}, v7.16b
> >  tbl v4.16b, {v4.16b - v5.16b}, v16.16b
> >  str q0, [x4, -32]
> >  str q2, [x4, -48]
> >  str q4, [x4, -16]
> >  bhi .L5
> > 
> > instead of
> > 
> > .L5:
> >  ld4 {v4.4s - v7.4s}, [x7], 64
> >  add w4, w4, 1
> >  cmp w3, w4
> >  orr v1.16b, v4.16b, v4.16b
> >  orr v2.16b, v5.16b, v5.16b
> >  orr v3.16b, v6.16b, v6.16b
> >  st3 {v1.4s - v3.4s}, [x6], 48
> >  bhi .L5
> > 
> > LD4 and ST3 do all the permutes without needing actual permute instructions
> > - a strategy that favours generic permutes avoiding the load_lanes case is
> > likely to be more expensive on most implementations. I think it is worth
> > a PR at least.
> > 
> > regards
> > Ramana
> > 
> 
> Yes, quite right. PR 68707.

Thanks - I will think of something.  Note that it's not at all obvious
the 2nd variant is cheaper because in the load-lane variant you have
a larger unrolling factor plus extra peeling due to the gap (ld4).
This means that loops not iterating much at runtime are likely pessimized
compared to the SLP variant.  Not sure where the actual cut-off would be.

Btw, is a ld3/st3 pair actually cheaper than three ld/st (without
extra permutation)?  It's shorter code, but is it faster?

Thanks,
Richard.


RE: [PATCH][ARC] Refurbish emitting DWARF2 for epilogue.

2015-12-07 Thread Claudiu Zissulescu
Hi Joern,

> > +  insn = emit_insn (gen_blockage ());
> 
> Is this actually part of the patch to fix cfi generation?

This instruction prevents the delay branch scheduler from speculatively using
epilogue instructions to fill the delay slots, which would otherwise trigger
an assert during dwarf2cfi execution. This behavior can be observed with the
dg.exp/pr49994-1.s test.

On closer inspection of the patch, I've reworked it in a more generic
fashion (attached), where the blockage guards the entire epilogue expansion
process.

It may be questionable if emitting blockage in epilogue is part of the cfi 
refactoring of the epilogue. However, without it we may still get errors in 
dwarf2cfi.

Thanks,
Claudiu


0001-Refurbish-emitting-DWARF2-related-information-when-e.patch
Description: 0001-Refurbish-emitting-DWARF2-related-information-when-e.patch


Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment

2015-12-07 Thread Tobias Burnus
I wrote:
> I wonder whether using
>
> __asm__ __volatile__ ("":::"memory");
>
> would be sufficient as it has a way lower overhead than 
> __sync_synchronize().

Namely, something like the attached patch.
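
For readers less familiar with the two options, a small C sketch of the
difference (GCC extensions; the function names are invented for this
example).  The empty asm with a "memory" clobber only stops the compiler
from caching or reordering memory accesses across it and emits no
instruction, whereas __sync_synchronize () additionally emits a full
hardware memory barrier.

```c
/* Compiler-only barrier: no instruction is emitted, but GCC must assume
   all memory may have been read or written at this point.  */
static inline void
compiler_barrier (void)
{
  __asm__ __volatile__ ("" ::: "memory");
}

/* Full barrier: also orders the accesses as seen by other processors.  */
static inline void
hardware_barrier (void)
{
  __sync_synchronize ();
}

/* Store DATA, then FLAG; the barrier keeps the compiler from moving the
   store to *flag above the store to *data.  */
int
publish (int *data, int *flag, int value)
{
  *data = value;
  compiler_barrier ();
  *flag = 1;
  return *data;
}

/* Same, but with the heavier full barrier.  */
int
publish_fenced (int *data, int *flag, int value)
{
  *data = value;
  hardware_barrier ();
  *flag = 1;
  return *data;
}
```

Whether the compiler-only form is enough depends on what the coarray
library itself guarantees around its calls, which is the open question here.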

Regarding the original patch submission: Is there a reason that you didn't
include Deepak's test case from
https://gcc.gnu.org/ml/fortran/2015-04/msg00062.html ?
It should work as a -fcoarray=lib -lcaf_single "dg-do run" test.

Tobias
 trans-intrinsic.c |   18 ++
 trans-stmt.c  |   29 +
 trans.c   |   16 
 3 files changed, 63 insertions(+)

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 21efe44..04ba3ea 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -1222,6 +1222,15 @@ gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, tree lhs, tree lhs_kind,
   se->expr = res_var;
   if (array_expr->ts.type == BT_CHARACTER)
 se->string_length = argse.string_length;
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (>pre, tmp);
+
 }
 
 
@@ -1390,6 +1399,15 @@ conv_caf_send (gfc_code *code) {
   gfc_add_expr_to_block (, tmp);
   gfc_add_block_to_block (, _se.post);
   gfc_add_block_to_block (, _se.post);
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (, tmp);
+
   return gfc_finish_block ();
 }
 
diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 3df483a..b7e1faa 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -818,6 +818,15 @@ gfc_trans_lock_unlock (gfc_code *code, gfc_exec_op op)
    errmsg, errmsg_len);
   gfc_add_expr_to_block (, tmp);
 
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+
+  gfc_add_expr_to_block (, tmp);
+
   if (stat2 != NULL_TREE)
 	gfc_add_modify (, stat2,
 			fold_convert (TREE_TYPE (stat2), stat));
@@ -995,6 +1004,14 @@ gfc_trans_event_post_wait (gfc_code *code, gfc_exec_op op)
 			   errmsg, errmsg_len);
   gfc_add_expr_to_block (, tmp);
 
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (, tmp);
+
   if (stat2 != NULL_TREE)
 gfc_add_modify (, stat2, fold_convert (TREE_TYPE (stat2), stat));
 
@@ -1080,6 +1097,18 @@ gfc_trans_sync (gfc_code *code, gfc_exec_op type)
 			   fold_convert (integer_type_node, images));
 }
 
+  /* Per F2008, 8.5.1, a SYNC MEMORY is implied by calling the
+ image control statements SYNC IMAGES and SYNC ALL.  */
+  if (flag_coarray == GFC_FCOARRAY_LIB)
+{
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+			gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+			tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (, tmp);
+}
+
   if (flag_coarray != GFC_FCOARRAY_LIB)
 {
   /* Set STAT to zero.  */
diff --git a/gcc/fortran/trans.c b/gcc/fortran/trans.c
index 001db41..1993743 100644
--- a/gcc/fortran/trans.c
+++ b/gcc/fortran/trans.c
@@ -746,6 +746,14 @@ gfc_allocate_using_lib (stmtblock_t * block, tree pointer, tree size,
 			 TREE_TYPE (pointer), pointer,
 			 fold_convert ( TREE_TYPE (pointer), tmp));
   gfc_add_expr_to_block (block, tmp);
+
+  /* It guarantees memory consistency within the same segment */
+  tmp = gfc_build_string_const (strlen ("memory")+1, "memory"),
+  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
+		gfc_build_string_const (1, ""), NULL_TREE, NULL_TREE,
+		tree_cons (NULL_TREE, tmp, NULL_TREE), NULL_TREE);
+  ASM_VOLATILE_P (tmp) = 1;
+  gfc_add_expr_to_block (block, tmp);
 }
 
 
@@ -1356,6 +1364,14 @@ gfc_deallocate_with_status (tree pointer, tree status, tree errmsg,
 	 token, pstat, errmsg, 

[PING] [ARM] Use vector wide add for mixed-mode adds

2015-12-07 Thread Michael Collison

Ping. Originally posted here:

https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03440.html

Regards,

Michael Collison


Re: -fstrict-aliasing fixes 6/6: permit inlining of comdats

2015-12-07 Thread Richard Biener
On Fri, 4 Dec 2015, Jan Hubicka wrote:

> Hi,
> this is the patch for fold-const.c. Can you think of some testcase for the
> MR_DEPENDENCE_CLIQUE comparison? I am not that familiar with the code to
> be able to construct it :(

With ICF it would involve a variant using restrict args vs.
non-restrict args.  For other optimizers it's more difficult
to construct a testcase that would fail.  But if any alias related
compare is necessary in operand_equal_p then the dependence check
is required as well.

> Bootstrapped/regtested x86_64-linux, OK?

Ok.

Thanks,
Richard.

> Honza
> 
>   * fold-const.c (operand_equal_p): Do not use flag_strict_aliasing.
> Index: fold-const.c
> ===
> --- fold-const.c  (revision 231290)
> +++ fold-const.c  (working copy)
> @@ -2987,14 +2987,13 @@ operand_equal_p (const_tree arg0, const_
>  flags)))
>   return 0;
> /* Verify that accesses are TBAA compatible.  */
> -   if (flag_strict_aliasing
> -   && (!alias_ptr_types_compatible_p
> - (TREE_TYPE (TREE_OPERAND (arg0, 1)),
> -  TREE_TYPE (TREE_OPERAND (arg1, 1)))
> -   || (MR_DEPENDENCE_CLIQUE (arg0)
> -   != MR_DEPENDENCE_CLIQUE (arg1))
> -   || (MR_DEPENDENCE_BASE (arg0)
> -   != MR_DEPENDENCE_BASE (arg1
> +   if (!alias_ptr_types_compatible_p
> +  (TREE_TYPE (TREE_OPERAND (arg0, 1)),
> +   TREE_TYPE (TREE_OPERAND (arg1, 1)))
> +   || (MR_DEPENDENCE_CLIQUE (arg0)
> +   != MR_DEPENDENCE_CLIQUE (arg1))
> +   || (MR_DEPENDENCE_BASE (arg0)
> +   != MR_DEPENDENCE_BASE (arg1)))
>   return 0;
>/* Verify that alignment is compatible.  */
>if (TYPE_ALIGN (TREE_TYPE (arg0))


[patch] Fix PR middle-end/68291 & 68292

2015-12-07 Thread Eric Botcazou
Hi,

it's a couple of regressions in the C testsuite present on SPARC 64-bit and 
coming from the new coalescing code which fails to handle vector types with 
BLKmode that are returned in multiple registers.  The code assigns a BLKmode 
REG to the RESULT_DECL of the function in expand_function_start and this later 
causes expand_function_end to choke.

As discussed with Alexandre in the audit trail, the attached minimal fix just 
prevents the problematic BLKmode REG from being generated, which appears to be 
sufficient to restore the nominal operating mode.

Tested on x86-64/Linux and SPARC64/Solaris, OK for the mainline?


2015-12-07  Eric Botcazou  

PR middle-end/68291
PR middle-end/68292
* cfgexpand.c (set_rtl): Always accept PARALLELs with BLKmode for
SSA names based on RESULT_DECLs.
* function.c (expand_function_start): Do not create BLKmode REGs
for GIMPLE registers when coalescing is enabled.

-- 
Eric Botcazou
Index: cfgexpand.c
===
--- cfgexpand.c	(revision 231318)
+++ cfgexpand.c	(working copy)
@@ -184,10 +184,15 @@ set_rtl (tree t, rtx x)
   || SUBREG_P (XEXP (x, 0)))
   && (REG_P (XEXP (x, 1))
   || SUBREG_P (XEXP (x, 1))))
+			  /* We need to accept PARALLELs for RESUT_DECLs
+ because of vector types with BLKmode returned
+ in multiple registers, but they are supposed
+ to be uncoalesced.  */
 			  || (GET_CODE (x) == PARALLEL
   && SSAVAR (t)
   && TREE_CODE (SSAVAR (t)) == RESULT_DECL
-  && !flag_tree_coalesce_vars))
+  && (GET_MODE (x) == BLKmode
+  || !flag_tree_coalesce_vars)))
 			   : (MEM_P (x) || x == pc_rtx
 			  || (GET_CODE (x) == CONCAT
   && MEM_P (XEXP (x, 0))
Index: function.c
===
--- function.c	(revision 231318)
+++ function.c	(working copy)
@@ -5148,15 +5148,16 @@ expand_function_start (tree subr)
   /* Compute the return values into a pseudo reg, which we will copy
 	 into the true return register after the cleanups are done.  */
   tree return_type = TREE_TYPE (res);
-  /* If we may coalesce this result, make sure it has the expected
-	 mode.  */
-  if (flag_tree_coalesce_vars && is_gimple_reg (res))
-	{
-	  tree def = ssa_default_def (cfun, res);
-	  gcc_assert (def);
-	  machine_mode mode = promote_ssa_mode (def, NULL);
-	  set_parm_rtl (res, gen_reg_rtx (mode));
-	}
+
+  /* If we may coalesce this result, make sure it has the expected mode
+	 in case it was promoted.  But we need not bother about BLKmode.  */
+  machine_mode promoted_mode
+	= flag_tree_coalesce_vars && is_gimple_reg (res)
+	  ? promote_ssa_mode (ssa_default_def (cfun, res), NULL)
+	  : BLKmode;
+
+  if (promoted_mode != BLKmode)
+	set_parm_rtl (res, gen_reg_rtx (promoted_mode));
   else if (TYPE_MODE (return_type) != BLKmode
 	   && targetm.calls.return_in_msb (return_type))
 	/* expand_function_end will insert the appropriate padding in


Re: [PATCH 2/2] Fix minor glitches with basic asm

2015-12-07 Thread Richard Biener
On Sun, 6 Dec 2015, Bernd Edlinger wrote:

> 
> Hi,
> 
> while looking at the handling of basic asm statements
> I noticed two minor glitches, which I want to fix now.
> 
> Secondly there is a wrong check in shorten_branches in final.c
> 
> Here we check if GET_CODE (body) == ASM_INPUT, that is
> never true, because GET_CODE (body) == SEQUENCE here.
> The right object to check is PATTERN (inner_insn).
> 
> 
> Boot-strapped and reg-tested on x86_64-pc-linux-gnu,
> OK for trunk?

Ok.

Richard.


Re: [PATCH 1/2] Fix minor glitches with basic asm

2015-12-07 Thread Richard Biener
On Sun, 6 Dec 2015, Bernd Edlinger wrote:

> 
> Hi,
> 
> while looking at the handling of basic asm statements
> I noticed two minor glitches, which I want to fix now.
> 
> First there is a missing check in compare_gimple_asm in ipa-icf-gimple.c
> 
> Here we check whether two asm statements are exactly identical, but
> it is possible that one is a basic asm and the other is an extended
> asm with zero operands. Even if both have the same string, that string
> means something slightly different when % or { } appear in it.
> 
> example:
> 
> asm("%"); // OK
> asm("%":); // error: invalid 'asm': invalid %-code
> 
> 
> Boot-strapped and reg-tested on x86_64-pc-linux-gnu,
> OK for trunk?

Ok.

Richard.
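
To make the basic/extended distinction concrete, a small sketch that should assemble on any target whose assembler accepts C-style comments (an assumption; GNU as does).  The templates contain only an assembler comment, so no instruction is emitted.  The function names are made up for illustration.

```c
#include <assert.h>

/* Basic asm: no colon, so '%' is handed to the assembler unchanged.  */
int
basic_asm_percent (void)
{
  __asm__ ("/* basic asm: 100% literal */");
  return 1;
}

/* Extended asm with zero operands: the colon makes '%' special, so a
   literal percent sign has to be written as '%%'.  */
int
extended_asm_percent (void)
{
  __asm__ ("/* extended asm: 100%% literal */" : );
  return 2;
}

/* Runs both flavours; the return values only show that both ran.  */
int
asm_flavours (void)
{
  int r = basic_asm_percent () + extended_asm_percent ();
  assert (r == 3);
  return r;
}
```

Writing a single '%' in the second template would be rejected at compile time, which is why two asm statements with the same string are not necessarily interchangeable.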


Re: [Fortran, Patch] Memory sync after coarray image control statements and assignment

2015-12-07 Thread Alessandro Fanfarillo
Hi,

2015-12-07 8:20 GMT+01:00 Tobias Burnus :
> Always - or only with optimization?
>

Only with optimization.

> I wonder whether using
>
> __asm__ __volatile__ ("":::"memory");
>
> would be sufficient as it has a way lower overhead than
> __sync_synchronize().
>
>
> That would be something like:
>
>   r = build_stmt (input_location, ASM_EXPR, string,
>   output_operands, input_operands,
>   clobbers, labels);
>   ASM_VOLATILE_P (r) = 1;
>
> with string = "", output_operands = NULL_TREE, input_operands = NULL_TREE,
> clobbers = "memory" and labels = NULL_TREE.  (Except that string+clobbers
> are trees and not char[].)
>

I'm going to try it. Thanks.
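
At the C level the two candidates look roughly like the sketch below.  The helper names and the shared_flag variable are assumptions for illustration only; the Fortran front end would build the equivalent trees directly rather than go through C source.

```c
#include <assert.h>

static int shared_flag;

/* Compiler-only barrier: emits no instruction, but the "memory" clobber
   keeps the compiler from caching memory values in registers across it
   or moving loads and stores over it.  */
static inline void
compiler_barrier (void)
{
  __asm__ __volatile__ ("" ::: "memory");
}

/* Full barrier: additionally orders the accesses at the hardware level.  */
static inline void
full_barrier (void)
{
  __sync_synchronize ();
}

/* Stores VALUE, then reads it back after a compiler barrier.  */
int
publish_and_read (int value)
{
  shared_flag = value;
  compiler_barrier ();        /* the store above is not sunk below this */
  int seen = shared_flag;     /* re-read from memory, not a cached copy */
  full_barrier ();
  assert (seen == value);
  return seen;
}
```

Whether the cheaper compiler-only barrier is enough depends on what the coarray library already guarantees about ordering between images; the sketch does not settle that question.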


Re: [patch] Fix PR middle-end/68291 & 68292

2015-12-07 Thread Bernd Schmidt

On 12/07/2015 10:35 AM, Eric Botcazou wrote:

As discussed with Alexandre in the audit trail, the attached minimal fix just
prevents the problematic BLKmode REG from being generated, which appears to be
sufficient to restore the nominal operating mode.




PR middle-end/68291
PR middle-end/68292
* cfgexpand.c (set_rtl): Always accept PARALLELs with BLKmode for
SSA names based on RESULT_DECLs.
* function.c (expand_function_start): Do not create BLKmode REGs
for GIMPLE registers when coalescing is enabled.



Ok. Although thinking about your comment in the PR about not making such 
vectors gimple registers I wonder what the effects of that would be.



Bernd


[PATCH][ARM] PR target/68648: Fold NOT of CONST_INT in andsi_iorsi3_notsi splitter

2015-12-07 Thread Kyrill Tkachov

Hi all,

In this PR we ICE because during post-reload splitting we generate the insn:
(insn 27 26 11 2 (set (reg:SI 0 r0 [orig:121 D.4992 ] [121])
(and:SI (not:SI (const_int 1 [0x1]))
(reg:SI 0 r0 [orig:121 D.4992 ] [121])))
 (nil))


The splitter at fault is andsi_iorsi3_notsi, which accepts a const_int in
operands[3] and outputs (not (match_dup 3)). It should really try to
constant-fold the result first.  This patch does that by calling
simplify_gen_unary, which generates the complement of operands[3] if it's a
register, or the appropriate const_int rtx holding the correctly folded
result, which will still properly match the arm bic-immediate instruction.
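
In plain C terms the arithmetic is as in the sketch below.  It only models the values involved, not the RTL; both function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Value computed by the two-instruction sequence "orr; bic":
   (op1 | op2) & ~op3.  */
uint32_t
andsi_iorsi3_notsi_value (uint32_t op1, uint32_t op2, uint32_t op3)
{
  uint32_t t = op1 | op2;     /* orr */
  return t & ~op3;            /* bic */
}

/* Same value with the complement of a constant op3 folded beforehand,
   so the AND sees a plain constant rather than a NOT of a constant.  */
uint32_t
andsi_folded_value (uint32_t op1, uint32_t op2, uint32_t op3_const)
{
  uint32_t folded = ~op3_const;       /* e.g. ~1 == 0xfffffffe */
  uint32_t r = (op1 | op2) & folded;
  assert (r == andsi_iorsi3_notsi_value (op1, op2, op3_const));
  return r;
}
```

For op3 == 1 the folded constant is 0xfffffffe, i.e. the const_int -2 in SImode, instead of the (not:SI (const_int 1)) seen in the insn above.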

Bootstrapped and tested on arm-none-eabi.

Is this ok for trunk?

This appears on GCC 4.9 and GCC 5 and I'll be testing the fix there as well.
Ok for those branches if testing is successful?

Thanks,
Kyrill

2015-12-07  Kyrylo Tkachov  

PR target/68648
* config/arm/arm.md (*andsi_iorsi3_notsi): Try to simplify
the complement of operands[3] during splitting.

2015-12-07  Kyrylo Tkachov  

PR target/68648
* gcc.c-torture/execute/pr68648.c: New test.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 2b48bbaf034b286d723536ec2aa6fe0f9b312911..dfb75c5f11c66c6b4a34ff3071b5a0957c3512cb 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3274,8 +3274,22 @@ (define_insn_and_split "*andsi_iorsi3_notsi"
   "#"   ; "orr%?\\t%0, %1, %2\;bic%?\\t%0, %0, %3"
   "&& reload_completed"
   [(set (match_dup 0) (ior:SI (match_dup 1) (match_dup 2)))
-   (set (match_dup 0) (and:SI (not:SI (match_dup 3)) (match_dup 0)))]
-  ""
+   (set (match_dup 0) (and:SI (match_dup 4) (match_dup 5)))]
+  {
+ /* If operands[3] is a constant make sure to fold the NOT into it
+	to avoid creating a NOT of a CONST_INT.  */
+rtx not_rtx = simplify_gen_unary (NOT, SImode, operands[3], SImode);
+if (CONST_INT_P (not_rtx))
+  {
+	operands[4] = operands[0];
+	operands[5] = not_rtx;
+  }
+else
+  {
+	operands[5] = operands[0];
+	operands[4] = not_rtx;
+  }
+  }
   [(set_attr "length" "8")
(set_attr "ce_count" "2")
(set_attr "predicable" "yes")
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68648.c b/gcc/testsuite/gcc.c-torture/execute/pr68648.c
new file mode 100644
index ..fc66806a99a0abef7bd517ae2f5200b387e69ce4
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr68648.c
@@ -0,0 +1,20 @@
+int __attribute__ ((noinline))
+foo (void)
+{
+  return 123;
+}
+
+int __attribute__ ((noinline))
+bar (void)
+{
+  int c = 1;
+  c |= 4294967295 ^ (foo () | 4073709551608);
+  return c;
+}
+
+int
+main ()
+{
+  if (bar () != 0x83fd4005)
+__builtin_abort ();
+}


Re: [PATCH] enable loop fusion on isl-15

2015-12-07 Thread Richard Biener
On Fri, Dec 4, 2015 at 8:59 PM, Sebastian Paul Pop  wrote:
> I would highly recommend updating the required version of ISL to isl-0.15:
> that would simplify the existing code, removing a lot of code under
> "#ifdef old ISL", and allow us to fully transition to schedule_trees
> instead of dealing with the old antiquated union_maps in the scheduler.
> The result is faster compilation time.

Hmm.  I think we agreed to raise the requirement to ISL 0.14.  OTOH the plan
was to make graphite enabled by -O3 [-fprofile-use] by default which would
mean making ISL a hard host requirement.  That raises the barrier on making
the version requirement stricter ...

Sebastian, we're well into stage3 already - what are your plans / progress with
the defaulting of GRAPHITE?  (compile-time / performance numbers though
I see ICEs still popping up - a good thing in some sense as it looks like
GRAPHITE gets testing).

Thanks,
Richard.

> Thanks,
> Sebastian
>
> -Original Message-
> From: Mike Stump [mailto:mikest...@comcast.net]
> Sent: Friday, December 04, 2015 12:03 PM
> To: Alan Lawrence
> Cc: Sebastian Pop; seb...@gmail.com; gcc-patches@gcc.gnu.org;
> hiradi...@msn.com
> Subject: Re: [PATCH] enable loop fusion on isl-15
>
> On Dec 4, 2015, at 5:13 AM, Alan Lawrence  wrote:
>> On 05/11/15 21:43, Sebastian Pop wrote:
>>>* graphite-optimize-isl.c (optimize_isl): Call
>>>isl_options_set_schedule_maximize_band_depth.
>>>
>>>* gcc.dg/graphite/fuse-1.c: New.
>>>* gcc.dg/graphite/fuse-2.c: New.
>>>* gcc.dg/graphite/interchange-13.c: Remove bogus check.
>>
>> I note that the test
>>
>> scan-tree-dump-times forwprop4 "gimple_simplified to[^\\n]*\\^ 12" 1
>>
>> FAILs under isl-0.14, with which GCC can still be built and generally
>> claims to work.
>>
>> Is it worth trying to detect this in the testsuite, so we can XFAIL it? By
>> which I mean, is there a reasonable testsuite mechanism by which we could do
>> that?
>
> You can permanently ignore it by updating to 0.15?  I don't see the
> advantage of bothering to finesse this too much.  I don't know of a way to
> detect 14 v 15 other than this test case, but, if you do that, you can't use
> that result to gate this test case.  If one wanted to engineer in a way, one
> would expose the isl version via a preprocessor symbol (built in), and then
> the test case would use that to gate it.  If we had to fix it, I think I'd
> prefer we just raise the isl version to 15 or later and be done with it.
>


Re: [PATCH PR68542]

2015-12-07 Thread Yuri Rumyantsev
Richard!

Here is middle-end part of patch with changes proposed by you.

Is it OK for trunk?

Thanks.
Yuri.

ChangeLog:
2015-12-07  Yuri Rumyantsev  

PR middle-end/68542
* fold-const.c (fold_relational_const): Add handling of vector
comparison with boolean result.
* tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
comparison of vector operands with boolean result for EQ/NE only.
(verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
(verify_gimple_cond): Likewise.
* tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
combining for non-compatible vector types.
* tree-vrp.c (register_edge_assert_for): VRP does not track ranges for
vector types.



2015-12-04 18:07 GMT+03:00 Yuri Rumyantsev :
> Hi Richard.
>
> Thanks a lot for your review.
> Below are my answers.
>
> You asked why I inserted additional check to
> ++ b/gcc/tree-ssa-forwprop.c
> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
> tree_code code, tree type,
>
>gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>
> +  /* Do not perform combining if types are not compatible.  */
> +  if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
> +  && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
> +return NULL_TREE;
> +
>
> again, how does this happen?
>
> This is because without it I hit the following assert in fold_convert_loc:
>   gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
>  && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>
> since it tries to convert vector of bool to scalar bool.
> Here is essential part of call-stack:
>
> #0  internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
> at ../../gcc/diagnostic.c:1259
> #1  0x01743ada in fancy_abort (
> file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
> function=0x184b9d0  tree_node*)::__FUNCTION__> "fold_convert_loc") at
> ../../gcc/diagnostic.c:1332
> #2  0x009c8330 in fold_convert_loc (loc=0, type=0x718a9d20,
> arg=0x71a7f488) at ../../gcc/fold-const.c:2216
> #3  0x009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
> type=0x718a9d20, op0=0x71a7f460, op1=0x718c2000,
> op2=0x718c2030) at ../../gcc/fold-const.c:11453
> #4  0x009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
> type=0x718a9d20, op0=0x71a7f460, op1=0x718c2000,
> op2=0x718c2030) at ../../gcc/fold-const.c:12394
> #5  0x009d870c in fold_binary_op_with_conditional_arg (loc=0,
> code=EQ_EXPR, type=0x718a9d20, op0=0x71a7f460,
> op1=0x71a48780, cond=0x71a7f460, arg=0x71a48780,
> cond_first_p=1) at ../../gcc/fold-const.c:6465
> #6  0x009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
> type=0x718a9d20, op0=0x71a7f460, op1=0x71a48780)
> at ../../gcc/fold-const.c:9211
> #7  0x00ecb8fa in combine_cond_expr_cond (stmt=0x71a487d0,
> code=EQ_EXPR, type=0x718a9d20, op0=0x71a7f460,
> op1=0x71a48780, invariant_only=true)
> at ../../gcc/tree-ssa-forwprop.c:382
>
>
> Secondly, I did not follow your idea of implementing the GCC Vector Extension
> for vector comparison with a bool result, since such an extension depends
> completely on the comparison context. E.g. for your example, the result type
> of the comparison depends on its use: for an if-comparison it is scalar,
> but for c = (a==b) the result type is a vector. I don't think that this
> is reasonable for the current release.
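
For context, this is how the existing GCC vector extension behaves today, as far as I know: a comparison of two vectors is element-wise and yields a vector mask, never a scalar.  The sketch shows that, plus a hand-written reduction to a single truth value.  It does not implement the extension under discussion; the type and function names are assumptions.

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Vector context: the result is itself a vector, with -1 in lanes that
   compare equal and 0 elsewhere.  */
v4si
lanes_equal (v4si a, v4si b)
{
  return a == b;
}

/* Scalar context: reduce the mask by hand to get one truth value.  */
int
all_lanes_equal (v4si a, v4si b)
{
  v4si m = lanes_equal (a, b);
  int all = 1;
  for (int i = 0; i < 4; i++)
    {
      assert (m[i] == 0 || m[i] == -1);
      all &= (m[i] != 0);
    }
  return all;
}
```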
>
> And finally about AMD performance. I checked that this transformation
> works for "-march=bdver4" option and regression for 481.wrf must
> disappear too.
>
> Thanks.
> Yuri.
>
> 2015-12-04 15:18 GMT+03:00 Richard Biener :
>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev  wrote:
>>> Hi All,
>>>
>>> Here is a patch for the 481.wrf performance regression for avx2, which is a
>>> slightly modified mask store optimization. This transformation allows
>>> unpredication of a semi-hammock containing masked stores; in other
>>> words, if we have a loop like
>>> for (i=0; i>>   if (c[i]) {
>>> p1[i] += 1;
>>> p2[i] = p3[i] +2;
>>>   }
>>>
>>> then it will be transformed to
>>>if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>>>  vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, 
>>> mask__ifc__42.18_165);
>>>  vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>>>  MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, 
>>> vect__12.22_172);
>>>  vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, 
>>> mask__ifc__42.18_165);
>>>  vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>>>  MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, 
>>> vect__19.28_184);
>>>}
>>> i.e. it will put all computations related to masked stores into the semi-hammock.
>>>
>>> Bootstrapping and regression testing did not show any new failures.
>>
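
For reference, a plain C sketch of the loop shape described above and of what skipping a fully masked-off block amounts to.  The bound n, the block size VF and both function names are assumptions for illustration (the loop bound is not legible in the quoted text); the real transformation works on MASK_LOAD/MASK_STORE in GIMPLE, not on C source.

```c
#include <assert.h>

/* Scalar form: both stores happen only where c[i] is nonzero.  */
void
cond_update (int n, const int *c, int *p1, int *p2, const int *p3)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    if (c[i])
      {
        p1[i] += 1;
        p2[i] = p3[i] + 2;
      }
}

enum { VF = 8 };  /* assumed vector length, matching the 8-lane mask */

/* Blocked form: if no lane of a block is active, the loads, adds and
   stores for that block are skipped altogether.  */
void
cond_update_blocked (int n, const int *c, int *p1, int *p2, const int *p3)
{
  int i = 0;
  for (; i + VF <= n; i += VF)
    {
      int any = 0;
      for (int k = 0; k < VF; k++)
        any |= (c[i + k] != 0);
      if (!any)
        continue;               /* whole block masked off */
      for (int k = 0; k < VF; k++)
        if (c[i + k])
          {
            p1[i + k] += 1;
            p2[i + k] = p3[i + k] + 2;
          }
    }
  cond_update (n - i, c + i, p1 + i, p2 + i, p3 + i);  /* scalar tail */
}
```

The payoff is in loops where the condition is rarely true, since masked loads and stores with an all-zero mask still cost time on the hardware in question.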

[hsa 1/10] Configury changes and new options

2015-12-07 Thread Martin Jambor
Hi,

this patch contains changes to the configuration mechanism and offload
bits, so that users can build compilers with HSA support. It plays
nicely with other accelerators despite using an altogether different
implementation approach.  I have also added to it definitions of the
new options and parameters, since at least one hunk in common.opt is
highly related.  -fdisable-hsa-gridification has disappeared, othrwise
very little has changed since the last submission.

With this patch, the user can request HSA support by including the
string "hsa" among the requested accelerators in
--enable-offload-targets.  This will cause the compiler to start
producing HSAIL for target OpenMP constructs/functions and the hsa
libgomp plugin to be built.  Because the plugin needs to use the HSA
run-time library, I have introduced the option --with-hsa-runtime (and
the more precise --with-hsa-runtime-include and --with-hsa-runtime-lib)
to help find it.  The open-source HSA runtime available on GitHub is
binary compatible with the closed-source one.  The closed-source runtime,
however, also contains the finalizer and so needs to be used for all
practical purposes.  I am regularly asking AMD to keep their promise and
open source the finalizer too.

One catch, however, is that there is no offload compiler for HSA, so
the wrapper should not attempt to look for it (that is what the hunk
in lto-wrapper.c does).  Also, when HSA is the only configured
accelerator, it is wasteful to output LTO sections with byte-code, so
in that case the ENABLE_OFFLOADING macro is not defined.

Finally, when the compiler has been configured for HSA but the user
disables it by omitting it in the -foffload compiler option, we need
to observe that decision.  That is what the opts.c hunk does.

As far as the options are concerned, the patch adds a new warning, -Whsa,
which we emit whenever we fail to produce HSAIL for some source code.  It is
on by default, but warnings are of course only emitted by HSAIL
generating code, so it will never affect anybody who does not use both an
HSA-enabled compiler and OpenMP 4 device constructs.

Then there is a new parameter hsa-gen-debug-stores, which will be
obsolete once the HSA run-time supports debugging traps.  Until then, we
have to make do with debugging stores to memory at defined places, which
however can cost speed in benchmarks.  So we only enable them with
this parameter.  We decided to make it a parameter rather than a
switch to emphasize the fact that it will go away, and to possibly allow us
to select different levels of verbosity of the stores in the future.

Any feedback is very appreciated,

Martin



2015-12-04  Martin Jambor  

gcc/
* Makefile.in (OBJS): Add new source files.
(GTFILES): Add hsa.c.
* config.in (ENABLE_HSA): New.
* configure.ac: Treat hsa differently from other accelerators.
(OFFLOAD_TARGETS): Define ENABLE_OFFLOADING according to
$enable_offloading.
(ENABLE_HSA): Define ENABLE_HSA according to $enable_hsa.
* doc/install.texi (Configuration): Document --with-hsa-runtime,
--with-hsa-runtime-include and --with-hsa-runtime-lib.
* lto-wrapper.c (compile_images_for_offload_targets): Do not attempt
to invoke offload compiler for hsa accelerator.
* opts.c (common_handle_option): Determine whether HSA offloading
should be performed.
* common.opt (disable_hsa): New variable.
(-Whsa): New warning.
* doc/invoke.texi (-Whsa): Document.
(hsa-gen-debug-stores): Likewise.
* params.def (PARAM_HSA_GEN_DEBUG_STORES): New parameter.

libgomp/plugin/
* Makefrag.am: Add HSA plugin requirements.
* configfrag.ac (HSA_RUNTIME_INCLUDE): New variable.
(HSA_RUNTIME_LIB): Likewise.
(HSA_RUNTIME_CPPFLAGS): Likewise.
(HSA_RUNTIME_INCLUDE): New substitution.
(HSA_RUNTIME_LIB): Likewise.
(HSA_RUNTIME_LDFLAGS): Likewise.
(hsa-runtime): New configure option.
(hsa-runtime-include): Likewise.
(hsa-runtime-lib): Likewise.
(PLUGIN_HSA): New substitution variable.
Fill HSA_RUNTIME_INCLUDE and HSA_RUNTIME_LIB according to the new
configure options.
(PLUGIN_HSA_CPPFLAGS): Likewise.
(PLUGIN_HSA_LDFLAGS): Likewise.
(PLUGIN_HSA_LIBS): Likewise.
Check that we have access to HSA run-time.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index bee2879..5fe73a7 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1296,6 +1296,11 @@ OBJS = \
graphite-sese-to-poly.o \
gtype-desc.o \
haifa-sched.o \
+   hsa.o \
+   hsa-gen.o \
+   hsa-regalloc.o \
+   hsa-brig.o \
+   hsa-dump.o \
hw-doloop.o \
hwint.o \
ifcvt.o \
@@ -1320,6 +1325,7 @@ OBJS = \
ipa-icf.o \
ipa-icf-gimple.o \
ipa-reference.o \
+   ipa-hsa.o \
ipa-ref.o \
ipa-utils.o \
ipa.o \
@@ -2401,6 +2407,7 @@ GTFILES = 

[hsa 6/10] Pass manager changes

2015-12-07 Thread Martin Jambor
Hi,

the pass manager changes required for HSA have already been committed
to trunk so all that remains are these additions to the pass pipeline.

Thanks,

Martin


2015-12-04  Martin Jambor  
Martin Liska  

* passes.def: Schedule pass_ipa_hsa and pass_gen_hsail.
* tree-pass.h (make_pass_gen_hsail): Declare.
(make_pass_ipa_hsa): Likewise.


diff --git a/gcc/passes.def b/gcc/passes.def
index 28cb4c1..0f0f36d 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -144,6 +144,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_ipa_cp);
   NEXT_PASS (pass_ipa_cdtor_merge);
   NEXT_PASS (pass_target_clone);
+  NEXT_PASS (pass_ipa_hsa);
   NEXT_PASS (pass_ipa_inline);
   NEXT_PASS (pass_ipa_pure_const);
   NEXT_PASS (pass_ipa_reference);
@@ -377,6 +378,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_nrv);
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
+  NEXT_PASS (pass_gen_hsail);
 
   NEXT_PASS (pass_expand);
 
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 9704918..30127d4 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -467,6 +467,7 @@ extern gimple_opt_pass *make_pass_ubsan (gcc::context 
*ctxt);
 extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_oacc_kernels2 (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_gen_hsail (gcc::context *ctxt);
 
 /* IPA Passes */
 extern simple_ipa_opt_pass *make_pass_ipa_lower_emutls (gcc::context *ctxt);
@@ -491,6 +492,7 @@ extern ipa_opt_pass_d *make_pass_ipa_cp (gcc::context 
*ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_icf (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_devirt (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_reference (gcc::context *ctxt);
+extern ipa_opt_pass_d *make_pass_ipa_hsa (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt);


[hsa 7/10] IPA-HSA pass

2015-12-07 Thread Martin Jambor
Hi,

when a target construct is gridified, the HSA GPU function is
associated with the CPU function throughout the compilation, so that
they can be registered as a pair in libgomp.

Ungridified target constructs and, more importantly, "pragma omp
declare target" marked functions emerge out of OMP expansion as one
gimple function for both the host and the accelerator. However, at
some point we need to create a special HSA function representation so
that we can modify the behavior of a (very) few optimization passes for
them.

Both are done by the following new IPA pass, which creates new HSA
clones in these cases.  Moreover, it redirects the appropriate call
graph edges to be in between HSA implementations, marks HSA clones
with the flatten attribute to minimize any call overhead (which is
much more significant on GPUs) and makes sure both the CPU and GPU
functions are coupled together and remain in the same LTO partition so
that they can be registered together with libgomp.
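
To illustrate the two kinds of source-level input mentioned above, here is a small sketch with one "declare target" function and one target construct that calls it.  The function names are made up, and nothing here is specific to HSA: without OpenMP enabled the pragmas are ignored and everything runs on the host.

```c
#include <assert.h>

/* A function marked for the device as well as the host.  */
#pragma omp declare target
int
scale (int x)
{
  return 2 * x;
}
#pragma omp end declare target

/* A target construct containing a parallel loop with a reduction.  */
int
sum_scaled (const int *a, int n)
{
  int s = 0;
  assert (n >= 0);
#pragma omp target map(to: a[0:n]) map(tofrom: s)
#pragma omp parallel for reduction(+: s)
  for (int i = 0; i < n; i++)
    s += scale (a[i]);
  return s;
}
```

In the scheme described here, scale would be the sort of function that gets an HSA clone, with the call inside the target region redirected to that clone.  Whether this particular loop is gridified is not something the sketch can show.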

Thanks,

Martin

2015-12-04  Martin Liska  
Martin Jambor  

* ipa-hsa.c: New file.
* lto-section-in.c (lto_section_name): Add hsa section name.
* lto-streamer.h (lto_section_type): Add hsa section.
* lto-partition.c: Include "hsa.h"
(add_symbol_to_partition_1): Put hsa implementations in the
same partition as host implementations.
* timevar.def (TV_IPA_HSA): New.

diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
new file mode 100644
index 000..5b3e563
--- /dev/null
+++ b/gcc/ipa-hsa.c
@@ -0,0 +1,329 @@
+/* Callgraph based analysis of static variables.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   Contributed by Martin Liska 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Interprocedural HSA pass is responsible for creation of HSA clones.
+   For all these HSA clones, we emit HSAIL instructions and pass processing
+   is terminated.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "is-a.h"
+#include "hash-set.h"
+#include "vec.h"
+#include "tree.h"
+#include "tree-pass.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "dumpfile.h"
+#include "gimple-pretty-print.h"
+#include "tree-streamer.h"
+#include "stringpool.h"
+#include "cgraph.h"
+#include "print-tree.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+
+namespace {
+
+/* If NODE is not versionable, warn about not emitting HSAIL and return false.
+   Otherwise return true.  */
+
+static bool
+check_warn_node_versionable (cgraph_node *node)
+{
+  if (!node->local.versionable)
+{
+  warning_at (EXPR_LOCATION (node->decl), OPT_Whsa,
+ "could not emit HSAIL for function %s: function cannot be "
+ "cloned", node->name ());
+  return false;
+}
+  return true;
+}
+
+/* The function creates HSA clones for all functions that were either
+   marked as HSA kernels or are callable HSA functions.  Apart from that,
+   we redirect all edges that come from an HSA clone and end in another
+   HSA clone to connect these two functions.  */
+
+static unsigned int
+process_hsa_functions (void)
+{
+  struct cgraph_node *node;
+
+  if (hsa_summaries == NULL)
+hsa_summaries = new hsa_summary_t (symtab);
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+{
+  hsa_function_summary *s = hsa_summaries->get (node);
+
+  /* A linked function is skipped.  */
+  if (s->m_binded_function != NULL)
+   continue;
+
+  if (s->m_kind != HSA_NONE)
+   {
+ if (!check_warn_node_versionable (node))
+   continue;
+ cgraph_node *clone = node->create_virtual_clone
+   (vec <cgraph_edge *> (), NULL, NULL, "hsa");
+ TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
+
+ clone->force_output = true;
+ hsa_summaries->link_functions (clone, node, s->m_kind, false);
+
+ if (dump_file)
+   fprintf (dump_file, "Created a new HSA clone: %s, type: %s\n",
+clone->name (),
+s->m_kind == HSA_KERNEL ? "kernel" : "function");
+   }
+  else if (hsa_callable_function_p (node->decl))
+   {
+ if (!check_warn_node_versionable (node))
+   continue;
+ cgraph_node *clone = node->create_virtual_clone
+   (vec <cgraph_edge *> (), NULL, NULL, "hsa");
+ 

Re: [PATCH] Fix -Werror= handling for Joined warnings, add a few missing Warning keywords (PRs c/48088, c/68657)

2015-12-07 Thread Bernd Schmidt

On 12/04/2015 08:36 PM, Jakub Jelinek wrote:

On Fri, Dec 04, 2015 at 06:19:19PM +, Manuel López-Ibáñez wrote:

My guess is that the first error_at should use arg instead of
option->opt_text to be equivalent. Of course, ideally, this code would
not be duplicated, but rather merged "somehow".


Consider that fixed.  As for duplication, as one operates on
cl_decoded_option and the other not etc., this is harder, plus
the missing and non-int cases are IMHO short enough that it is not worth
trying hard to avoid the duplication.
For the enum case which is larger, it is maybe worth adding
a helper routine for it, which would need probably only
location_t loc, const struct cl_enum *e, const char *opt, unsigned int lang_mask
arguments.  Can try that on Monday.


Maybe you can split the error printing code out of read_cmdline_option. 
For the original patch I noticed the duplication but figured it was not 
enough to really worry about, but for the error handling I think we 
should make an effort.



Bernd



Re: [PATCH] Fix missing range information for "%q+D" format code

2015-12-07 Thread Bernd Schmidt

On 12/04/2015 10:09 PM, David Malcolm wrote:

Updated patch to comment attached, which rewrites things to clarify the
meaning of SHOW_CARET_P.


I guess this is OK for now. I think I'll go play with the Fortran 
frontend a bit to see what exactly is going on with its use of set_range 
there, I still think that function is a little clunky.



Bernd



Re: [PATCH] Adjust vect-widen-mult-const-[su]16.c for r226675

2015-12-07 Thread Richard Biener
On Fri, Dec 4, 2015 at 8:51 PM, Bill Schmidt
 wrote:
> Since r226675, we have been seeing these failures:
>
> FAIL: gcc.dg/vect/vect-widen-mult-const-s16.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "pattern recognized" 2
> FAIL: gcc.dg/vect/vect-widen-mult-const-s16.c scan-tree-dump-times vect
> "pattern recognized" 2
> FAIL: gcc.dg/vect/vect-widen-mult-const-u16.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "pattern recognized" 2
> FAIL: gcc.dg/vect/vect-widen-mult-const-u16.c scan-tree-dump-times vect
> "pattern recognized" 2
>
> Comparing the vect-details dumps from r226674 to r226675, I see these as
> the reason:
>
> 63a64,66
>> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
>>  note: vect_recog_mult_pattern: detected:
>> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
>>  note: patt_47 = _6 << 2;
>> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
>>  note: pattern recognized: patt_47 = _6 << 2;
> 70a74,76
>> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
>>  note: vect_recog_mult_pattern: detected:
>> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
>>  note: patt_40 = _6 << 1;
>> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
>>  note: pattern recognized: patt_40 = _6 << 1;
>
> 747a754,756
>> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
>>  note: vect_recog_mult_pattern: detected:
>> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
>>  note: patt_47 = _6 << 2;
>> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
>>  note: pattern recognized: patt_47 = _6 << 2;
> 754a764,766
>> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
>>  note: vect_recog_mult_pattern: detected:
>> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
>>  note: patt_40 = _6 << 1;
>> /home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
>>  note: pattern recognized: patt_40 = _6 << 1;
>
> These seem to be precisely what's expected, given the nature of the patch,
> which is looking for these opportunities.  So it's likely that we should
> just change
>
> /* { dg-final { scan-tree-dump-times "pattern recognized" 2
> "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
>
> to
>
> /* { dg-final { scan-tree-dump-times "pattern recognized" 6
> "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
>
> and similarly for the unsigned case.  The following patch does this.
> However, I wanted to run this by Venkat since this was apparently not
> detected when his patch went in.  This doesn't appear to be a
> target-specific issue, and most targets support
> vect_widen_mult_hi_to_si_pattern, so I'm not sure why this wasn't fixed
> with the original patch.  Will this change break on any other targets
> for some reason?
>
> Tested on powerpc64le-unknown-linux-gnu.  Ok for trunk?

Hmm.  That will FAIL on x86_64 though because it can handle multiplication
natively.  I think the pattern recognition is simply bogus, as it fails to detect
that the stmt is already part of the widen-mult pattern?  In fact, pattern
recognition looping over all pattern functions even if one already matched on
the very same stmt looks bogus to me.
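
The control-flow point can be shown with a toy dispatcher.  Everything in it is hypothetical (statements are plain ints and the two recognizers are arbitrary predicates); it only contrasts "try every recognizer" with "stop at the first match".

```c
#include <assert.h>
#include <stddef.h>

/* A recognizer returns nonzero when it matched the statement.  */
typedef int (*recog_fn) (int stmt);

static int recog_even (int stmt) { return stmt % 2 == 0; }
static int recog_mult4 (int stmt) { return stmt % 4 == 0; }

static const recog_fn recognizers[] = { recog_even, recog_mult4 };

/* Returns how many recognizers fire on STMT.  With STOP_AT_FIRST set,
   the loop leaves after the first match, so the count is at most 1.  */
int
count_matches (int stmt, int stop_at_first)
{
  int matches = 0;
  for (size_t i = 0; i < sizeof recognizers / sizeof recognizers[0]; i++)
    if (recognizers[i] (stmt))
      {
        matches++;
        if (stop_at_first)
          break;
      }
  assert (!stop_at_first || matches <= 1);
  return matches;
}
```

With stop_at_first clear, a statement that satisfies both predicates is reported twice, which is the analogue of the extra "pattern recognized" lines in the dumps quoted above.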

Does the (untested)

Index: gcc/tree-vect-patterns.c
===
--- gcc/tree-vect-patterns.c(revision 231357)
+++ gcc/tree-vect-patterns.c(working copy)
@@ -3791,7 +3791,7 @@ vect_mark_pattern_stmts (gimple *orig_st
This function also does some bookkeeping, as explained in the documentation
for vect_recog_pattern.  */

-static void
+static bool
 vect_pattern_recog_1 (vect_recog_func_ptr vect_recog_func,
  gimple_stmt_iterator si,
  vec<gimple *> *stmts_to_replace)
@@ -3809,7 +3809,7 @@ vect_pattern_recog_1 (vect_recog_func_pt
   stmts_to_replace->quick_push (stmt);
   pattern_stmt = (* vect_recog_func) (stmts_to_replace, &type_in, &type_out);
   if (!pattern_stmt)
-return;
+return false;

   stmt = stmts_to_replace->last ();
   stmt_info = vinfo_for_stmt (stmt);
@@ -3831,13 +3831,13 @@ vect_pattern_recog_1 (vect_recog_func_pt
   /* Check target support  */
   type_in = get_vectype_for_scalar_type (type_in);
   if (!type_in)
-   return;
+   return false;
   if (type_out)
type_out = get_vectype_for_scalar_type (type_out);
   else
type_out = type_in;
   if (!type_out)
-   return;
+   return false;
   pattern_vectype = type_out;

   if (is_gimple_assign 
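
The concern raised in the reply above (every pattern function is tried even after one has already matched the same stmt) can be illustrated with a minimal sketch.  This is not the vectorizer's code: the recognizer type, the two recognizers and the integer "stmt" are hypothetical stand-ins, and the sketch only assumes that a bool-returning recognition step lets the caller stop at the first match.

```c
#include <stddef.h>

/* Hypothetical recognizer type: returns nonzero when it matches STMT.
   A plain int stands in for a real statement.  */
typedef int (*recog_fn) (int stmt);

/* Two made-up recognizers whose match sets overlap on even numbers.  */
static int recog_widen_mult (int stmt) { return stmt % 2 == 0; }
static int recog_mult (int stmt) { return stmt % 2 == 0 || stmt % 3 == 0; }

static recog_fn recognizers[] = { recog_widen_mult, recog_mult };

/* Try each recognizer on STMT in order and stop at the first one that
   matches.  Return its index, or -1 if none matched.  */
static int
first_matching_pattern (int stmt)
{
  size_t n = sizeof recognizers / sizeof recognizers[0];
  for (size_t i = 0; i < n; i++)
    if (recognizers[i] (stmt))
      return (int) i;
  return -1;
}
```

With this shape, a stmt that both recognizers accept is reported once, by the earlier recognizer, rather than once per recognizer.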

Re: [AArch64] Rework ARMv8.1 command line options.

2015-12-07 Thread Matthew Wahab

Ping. Updated patch attached.

Matthew

On 27/11/15 09:23, Matthew Wahab wrote:

On 24/11/15 15:22, James Greenhalgh wrote:
 > On Mon, Nov 16, 2015 at 04:31:32PM +, Matthew Wahab wrote:
 >>
 >> The command line options for target selection allow ARMv8.1 extensions
 >> to be individually enabled/disabled. They also allow the extensions to
 >> be enabled with -march=armv8-a. This doesn't reflect the ARMv8.1
 >> architecture which requires all extensions to be enabled and doesn't make
 >> them available for ARMv8.
 >>
 >> This patch removes the options for the individual ARMv8.1 extensions
 >> except for +lse. This means that setting -march=armv8.1-a will enable
 >> all extensions required by ARMv8.1 and that the ARMv8.1 extensions can't
 >> be used with -march=armv8.

 > I think I mentioned it in another review, but this patch seems a good place
 > to solve the problem. Could you please update the documentation to explain
 > what you've written above. As it stands I find myself confused by which
 > features GCC will make available at -march=armv8-a and -march=armv8.1-a.

Attached is a patch with the documentation for the AArch64 -march option
reworked to try to make it clearer what the -march=armv8.1-a option will
do. Extensions with feature modifiers (+crc, +lse) are explicitly stated
as being enabled by -march=armv8.1-a. Extensions without feature
modifiers (RDMA, PAN, LOR) are treated as part of the generic 'ARMv8.1
architecture extension' term in the description of -march=armv8.1-a.

I've also rearranged the -march section, to put the description of the
values for -march together and reworded the description of the
-march=native option.
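
As a rough illustration of the flag rework described above, the sketch below folds the extensions without feature modifiers (RDMA, PAN, LOR) into a single architecture bit, while +crc and +lse keep their own bits.  The macro names and bit positions are assumptions chosen for the example; the real definitions are in aarch64.h.

```c
/* Hypothetical bit assignments, for illustration only.  */
#define FL_FP    (1u << 0)
#define FL_SIMD  (1u << 1)
#define FL_CRC   (1u << 2)
#define FL_LSE   (1u << 3)
#define FL_V8_1  (1u << 4)	/* RDMA, PAN and LOR as one unit.  */

/* Flags implied by each architecture level in this sketch.  */
#define FL_FOR_ARCH8    (FL_FP | FL_SIMD)
#define FL_FOR_ARCH8_1  (FL_FOR_ARCH8 | FL_CRC | FL_LSE | FL_V8_1)

/* RDMA is available only when the ARMv8.1 architecture bit is set; there
   is no separate modifier that could enable it on plain ARMv8.  */
static int
isa_has_rdma (unsigned flags)
{
  return (flags & FL_V8_1) != 0;
}
```

In this model, adding +lse to an ARMv8 flag set turns on only the LSE bit and leaves RDMA unavailable.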

Matthew

2015-11-26  Matthew Wahab  

 * config/aarch64/aarch64-option-extensions.def: Remove
 AARCH64_FL_RDMA from "fp" and "simd".  Remove "pan", "lor",
 "rdma".
 * config/aarch64/aarch64.h (AARCH64_FL_PAN): Remove.
 (AARCH64_FL_LOR): Remove.
 (AARCH64_FL_RDMA): Remove.
 (AARCH64_FL_V8_1): New.
 (AARCH64_FL_FOR_AARCH8_1): Replace AARCH64_FL_PAN, AARCH64_FL_LOR
 and AARCH64_FL_RDMA with AARCH64_FL_V8_1.
 (AARCH64_ISA_RDMA): Replace AARCH64_FL_RDMA with AARCH64_FL_V8_1.
 * doc/invoke.texi (AArch64 -march): Rewrite initial paragraph and
 section on -march=native.  Group descriptions of permitted
 architecture names together.  Expand description of
 -march=armv8.1-a.
 (AArch64 -mtune): Slightly rework section on -march=native.
 (AArch64 -mcpu): Slightly rework section on -march=native.
 (AArch64 Feature Modifiers): Remove "pan", "lor" and "rdma".
 State that -march=armv8.1-a enables "crc" and "lse".



From 498323fc1992cd75070e86f195d4bba09a5e02e0 Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Fri, 30 Oct 2015 10:32:59 +
Subject: [PATCH] [AArch64] Rework ARMv8.1 command line options.

Change-Id: Ib9053719f45980255a3d7727e226a53d9f214049
---
 gcc/config/aarch64/aarch64-option-extensions.def |  9 ++---
 gcc/config/aarch64/aarch64.h |  9 ++---
 gcc/doc/invoke.texi  | 47 
 3 files changed, 30 insertions(+), 35 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index b261a0f..4f1d535 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -34,11 +34,10 @@
should contain a whitespace-separated list of the strings in 'Features'
that are required.  Their order is not important.  */
 
-AARCH64_OPT_EXTENSION("fp",	AARCH64_FL_FP,  AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, "fp")
-AARCH64_OPT_EXTENSION("simd",	AARCH64_FL_FPSIMD,  AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA,   "asimd")
+AARCH64_OPT_EXTENSION ("fp", AARCH64_FL_FP,
+		   AARCH64_FL_FPSIMD | AARCH64_FL_CRYPTO, "fp")
+AARCH64_OPT_EXTENSION ("simd", AARCH64_FL_FPSIMD,
+		   AARCH64_FL_SIMD | AARCH64_FL_CRYPTO, "asimd")
 AARCH64_OPT_EXTENSION("crypto",	AARCH64_FL_CRYPTO | AARCH64_FL_FPSIMD,  AARCH64_FL_CRYPTO,   "aes pmull sha1 sha2")
 AARCH64_OPT_EXTENSION("crc",	AARCH64_FL_CRC, AARCH64_FL_CRC,"crc32")
 AARCH64_OPT_EXTENSION("lse",	AARCH64_FL_LSE, AARCH64_FL_LSE,"lse")
-AARCH64_OPT_EXTENSION("pan",	AARCH64_FL_PAN,		AARCH64_FL_PAN,		"pan")
-AARCH64_OPT_EXTENSION("lor",	AARCH64_FL_LOR,		AARCH64_FL_LOR,		"lor")
-AARCH64_OPT_EXTENSION("rdma",	AARCH64_FL_RDMA | AARCH64_FL_FPSIMD,	AARCH64_FL_RDMA,	"rdma")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 68c006f..06345f0 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -134,9 +134,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_CRC        (1 << 3)  /* Has CRC.  */
 /* ARMv8.1 

[hsa 0/10] Merge of HSA branch

2015-12-07 Thread Martin Jambor
Hi,

I'm sorry it took me more than a month to come up with another round
of patches aiming at merging the HSA branch into the trunk.  Keeping
up to date with the latest changes in the OpenMP 4.5 area was
strenuous and we have discovered and fixed a few bugs as I intensified
my testing efforts.

While those are the main areas where this patch set differs from the
previous one, I have of course addressed the feedback I got the last
time, including implementing device-specific OpenMP target arguments,
moving kernel grid size from gimple class fields to new artificial
clauses and disabling the vectorizer for HSA functions using
DECL_FUNCTION_SPECIFIC_OPTIMIZATION rather than extra code in
respective pass gates.

Because I have not been able to come up with any solution to failing
libgomp/testsuite/libgomp.c++/target-2.C, I have disabled use of
dynamic parallelism in this merge (I keep it on the branch) and
therefore entirely rely on the gridification process to run loops on
the accelerator, because gridified constructs do not have this issue
(passing private symbols by reference).

HSA tests are still missing; I would need some guidance as to how to
best implement them (especially to test gridification, which of course
does not happen for other accelerators).  There are no failing
testcases if HSA is not configured.  If it is, there are some, all of
which fall into one of the following categories:

  1) HSA cannot compile a function for one reason or another (most
 common cause is inability of HSA to take an address of a function
 or make an indirect call) and gives a warning, which is regarded
 as an "excess error" by dejagnu.

  2) When HSA is not emitted for a function, libgomp runs a host
 fallback instead of it.  When the test queries
 omp_is_initial_device and asserts it returns false, the test
 fails.

  3) There are still a few failing OpenACC tests, but those just
 should not be run.

Of course, the patch set bootstraps fine on x86_64-linux with or
without configured HSA.

Any feedback is welcome.  Thanks,

Martin


[hsa 5/10] OpenMP lowering/expansion changes (gridification)

2015-12-07 Thread Martin Jambor
Hi,

the patch in this email contains the changes to make our OpenMP
lowering and expansion machinery produce GPU kernels for a certain
limited class of loops.  The plan is to make that class quite a bit
bigger, but only the following is ready for submission now.

Basically, whenever the compiler configured for HSAIL generation
encounters the following pattern:

  #pragma omp target
  #pragma omp teams thread_limit(workgroup_size) // thread_limit is optional
  #pragma omp distribute parallel for firstprivate(n,j) private(i) other_sharing_clauses()
    for (i = j + 1; i < n; i += 3)
      some_loop_body


it creates a copy of the entire target body and expands it slightly
differently for concurrent execution on a GPU.  Note that both teams
and distribute constructs are mandatory.  Moreover, currently the
distribute has to be in a combined statement with the inner for
construct.  And there are quite a few other restrictions which I hope
to alleviate over the next year, most notably reductions and collapse
clause now prevent gridification (see the new function
target_follows_gridifiable_pattern to find out what exactly the
restrictions are).
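
For reference, a complete function matching the pattern above might look like the following sketch.  The function name, the map clause and the thread_limit value of 64 are assumptions added to make the example self-contained; whether a given compiler actually gridifies it depends on the restrictions just mentioned.  Built without OpenMP support the pragmas are ignored and the loop simply runs on the host.

```c
/* Store I into A[I] for every third index starting at J + 1.
   The three constructs follow the target / teams / distribute parallel
   for shape described above.  */
void
fill_every_third (int *a, int n, int j)
{
  int i;
#pragma omp target map(tofrom: a[0:n])
#pragma omp teams thread_limit(64)
#pragma omp distribute parallel for firstprivate(n, j) private(i)
  for (i = j + 1; i < n; i += 3)
    a[i] = i;
}
```

Calling fill_every_third (a, 10, 0) on a zeroed array touches indices 1, 4 and 7 and leaves the other elements alone.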

The first phase of the "gridification" process is run before omp
"scanning" phase.  We look for the pattern above, and if we encounter
one, we copy its entire body into a new gimple statement
GIMPLE_OMP_GPUKERNEL.  Within it, we mark the teams, distribute and
parallel constructs with a new flag "kernel_phony."  This flag will
then make OMP lowering phase process their sharing clauses like usual,
but the statements representing the constructs will be removed at
lowering (and thus will never be expanded).  The resulting wasteful
repackaging of data is nicely cleaned by our optimizers even at -O1.

At expansion time, we identify gomp_target statements with a kernel
and expand the kernel into a special function, with the loop
represented by the GPU grid and not control flow.  Afterwards, the
normal body of the target is expanded as usual.  Finally, we need to
take the grid dimensions stored within new fields of the target
statement by the first phase, store them in a structure and pass them
in a device-specific argument to GOMP_target_ext.
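
To make the grid-size idea concrete, here is a minimal sketch of how the number of work-items could be derived from the bounds of a loop such as "for (i = j + 1; i < n; i += 3)".  The structure and both function names are hypothetical and much simpler than whatever the patch passes to the runtime; only the arithmetic is the point.

```c
#include <assert.h>

/* Hypothetical one-dimensional launch description.  */
struct grid_dims
{
  unsigned grid_size;	/* Total number of work-items.  */
  unsigned group_size;	/* Work-items per work-group.  */
};

/* Number of iterations of "for (i = lo; i < hi; i += step)" with a
   positive STEP.  */
unsigned
iteration_count (int lo, int hi, int step)
{
  assert (step > 0);
  if (hi <= lo)
    return 0;
  return (unsigned) ((hi - lo + step - 1) / step);
}

/* Describe a grid with one work-item per iteration of the example loop
   "for (i = j + 1; i < n; i += 3)".  */
struct grid_dims
make_grid (int j, int n, unsigned thread_limit)
{
  struct grid_dims g;
  g.grid_size = iteration_count (j + 1, n, 3);
  g.group_size = thread_limit;
  return g;
}
```

The computation has to happen before the target statement, which is why statements feeding the loop bounds must be available at that point.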

The patch thus also implements the compiler part of device-specific
target arguments as discussed on the mailing list and on IRC.

Originally, when I started with the above pattern matching, I did not
allow any other gimple statements in between the respective omp
constructs.  That however proved to be too restrictive for two
reasons.  First, statements in pre-bodies of both distribute and for
loops needed to be accounted for when calculating the kernel grid size
(which is done before the target statement itself) and second, Fortran
parameter dereferences happily result in interleaving statements when
there were none in the user source code.

Therefore, I allow register-type stores to local non-addressable
variables in pre-bodies and also in between the OMP constructs.  All
of them are copied in front of the target statement and either used
for grid size calculation or removed as useless by later
optimizations.

I hope that I have eventually managed to write the gridification in a
way that interferes very little with the rest of the OMP pipeline and
yet re-implements only the bare minimum of functionality that is
already there.  Any feedback is of course still very welcome.

Thanks,

Martin


2015-12-04  Martin Jambor  

* builtin-types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
* fortran/types.def (BT_FN_VOID_UINT_PTR_INT_PTR): New.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_INT_INT): Removed.
(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_UINT_PTR_PTR): New.
* gimple-low.c (lower_stmt): Also handle GIMPLE_OMP_GPUKERNEL.
* gimple-pretty-print.c (dump_gimple_omp_for): Also handle
GF_OMP_FOR_KIND_KERNEL_BODY.
(dump_gimple_omp_block): Also handle GIMPLE_OMP_GPUKERNEL.
(pp_gimple_stmt_1): Likewise.
* gimple-walk.c (walk_gimple_stmt): Likewise.
* gimple.c (gimple_build_omp_gpukernel): New function.
(gimple_copy): Also handle GIMPLE_OMP_GPUKERNEL.
* gimple.def (GIMPLE_OMP_TEAMS): Moved into its own layout.
(GIMPLE_OMP_GPUKERNEL): New.
* gimple.h (gf_mask): Added GF_OMP_FOR_KIND_KERNEL_BODY.
(gomp_for): New field kernel_phony.
(gimple_statement_omp_parallel_layout): Likewise.
(gimple_statement_omp_single_layout): Updated comments.
(gomp_teams): New field kernel_phony.
(gimple_build_omp_gpukernel): Declare.
(gimple_has_substatements): Also handle GIMPLE_OMP_GPUKERNEL.
(gimple_omp_for_kernel_phony): New.
(gimple_omp_for_set_kernel_phony): Likewise.
(gimple_omp_parallel_kernel_phony): Likewise.

[hsa 8/10] HSAIL BRIG description header file (and a steering committee request)

2015-12-07 Thread Martin Jambor
Hi,

the following patch adds a BRIG (binary representation of HSAIL)
representation description.  It is within a single header file
describing the binary structures and constants of the format.

The file comes from the HSA Foundation (I have only added the
HSA_BRIG_FORMAT_H macro and check and removed some weird comments
which are not present in proposed future versions of the file) and is
licensed under "University of Illinois/NCSA Open Source License."

The license is "GPL-compatible" according to FSF
(http://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses)
so I believe we can have it in GCC.  Nevertheless, it is not GPL and
there is no copyright assignment for it, but the situation is
hopefully analogous to some other libraries that have their upstream
elsewhere but we ship them as part of the GCC.

I would therefore like to ask the GCC steering committee for
permission to add this file to GCC (and update it as HSA standard
evolves).  Please let me know if there is something more I need to do
in this regard.

Thanks,

Martin


2015-12-04  Martin Jambor  

* hsa-brig-format.h: New file.

diff --git a/gcc/hsa-brig-format.h b/gcc/hsa-brig-format.h
new file mode 100644
index 000..6e2fe75
--- /dev/null
+++ b/gcc/hsa-brig-format.h
@@ -0,0 +1,1277 @@
+// University of Illinois/NCSA
+// Open Source License
+//
+// Copyright (c) 2013-2015, Advanced Micro Devices, Inc.
+// All rights reserved.
+//
+// Developed by:
+//
+// HSA Team
+//
+// Advanced Micro Devices, Inc
+//
+// www.amd.com
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy of
+// this software and associated documentation files (the "Software"), to deal with
+// the Software without restriction, including without limitation the rights to
+// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+// of the Software, and to permit persons to whom the Software is furnished to do
+// so, subject to the following conditions:
+//
+// * Redistributions of source code must retain the above copyright notice,
+//   this list of conditions and the following disclaimers.
+//
+// * Redistributions in binary form must reproduce the above copyright notice,
+//   this list of conditions and the following disclaimers in the
+//   documentation and/or other materials provided with the distribution.
+//
+// * Neither the names of the HSA Team, University of Illinois at
+//   Urbana-Champaign, nor the names of its contributors may be used to
+//   endorse or promote products derived from this Software without specific
+//   prior written permission.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+// FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
+// CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE
+// SOFTWARE.
+
+#ifndef HSA_BRIG_FORMAT_H
+#define HSA_BRIG_FORMAT_H
+
+typedef uint32_t BrigVersion32_t;
+
+enum BrigVersion {
+
+BRIG_VERSION_HSAIL_MAJOR = 1,
+BRIG_VERSION_HSAIL_MINOR = 0,
+BRIG_VERSION_BRIG_MAJOR  = 1,
+BRIG_VERSION_BRIG_MINOR  = 0
+};
+
+typedef uint8_t BrigAlignment8_t;
+
+typedef uint8_t BrigAllocation8_t;
+
+typedef uint8_t BrigAluModifier8_t;
+
+typedef uint8_t BrigAtomicOperation8_t;
+
+typedef uint32_t BrigCodeOffset32_t;
+
+typedef uint8_t BrigCompareOperation8_t;
+
+typedef uint16_t BrigControlDirective16_t;
+
+typedef uint32_t BrigDataOffset32_t;
+
+typedef BrigDataOffset32_t BrigDataOffsetCodeList32_t;
+
+typedef BrigDataOffset32_t BrigDataOffsetOperandList32_t;
+
+typedef BrigDataOffset32_t BrigDataOffsetString32_t;
+
+typedef uint8_t BrigExecutableModifier8_t;
+
+typedef uint8_t BrigImageChannelOrder8_t;
+
+typedef uint8_t BrigImageChannelType8_t;
+
+typedef uint8_t BrigImageGeometry8_t;
+
+typedef uint8_t BrigImageQuery8_t;
+
+typedef uint16_t BrigKind16_t;
+
+typedef uint8_t BrigLinkage8_t;
+
+typedef uint8_t BrigMachineModel8_t;
+
+typedef uint8_t BrigMemoryModifier8_t;
+
+typedef uint8_t BrigMemoryOrder8_t;
+
+typedef uint8_t BrigMemoryScope8_t;
+
+typedef uint16_t BrigOpcode16_t;
+
+typedef uint32_t BrigOperandOffset32_t;
+
+typedef uint8_t BrigPack8_t;
+
+typedef uint8_t BrigProfile8_t;
+
+typedef uint16_t BrigRegisterKind16_t;
+
+typedef uint8_t BrigRound8_t;
+
+typedef uint8_t BrigSamplerAddressing8_t;
+
+typedef uint8_t BrigSamplerCoordNormalization8_t;
+
+typedef uint8_t BrigSamplerFilter8_t;
+
+typedef uint8_t BrigSamplerQuery8_t;
+
+typedef uint32_t BrigSectionIndex32_t;
+
+typedef uint8_t BrigSegCvtModifier8_t;
+
+typedef uint8_t BrigSegment8_t;
+
+typedef uint32_t BrigStringOffset32_t;
+
+typedef uint16_t BrigType16_t;
+

[hsa 2/10] Modifications to libgomp proper

2015-12-07 Thread Martin Jambor
Hi,

The patch below contains all changes to libgomp files except for the
hsa plugin (which is in the following patch).

The changes can roughly be divided into three categories.  First, it
contains changes that are necessary to support shared-memory
devices.  In the majority of cases this means treating them like the host
fallback because there is no need to copy, host malloc can be used for
allocating etc.  It also means that GOMP_target_ext and
gomp_target_task_fn should not be remapping arguments but should pass
to the plugin the same thing the host fallback function would receive.

Second, because the GCC HSA backend often does not emit HSAIL for functions
it knows it cannot handle, these two functions need to gracefully
handle the case when there is no device implementation of a particular
function available by doing host fallback too.
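
The first two categories can be summarized as a small decision function.  The following is only a sketch under simplifying assumptions: the device descriptor, the enum and all names are hypothetical and do not mirror libgomp's actual data structures, and the sketch falls back to the host whenever no device function exists.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much simplified device descriptor.  */
struct device
{
  bool shared_memory;		/* Device sees host memory directly.  */
  void (*device_fn) (void *);	/* NULL if no device code was emitted.  */
};

enum run_mode
{
  RUN_HOST_FALLBACK,		/* Run the host version of the function.  */
  RUN_DEVICE_HOST_ADDRS,	/* Run on device, pass host addresses.  */
  RUN_DEVICE_REMAPPED		/* Run on device, remap/copy data first.  */
};

/* Placeholder kernel used only to have a non-NULL function pointer.  */
static void example_kernel (void *data) { (void) data; }

/* Decide how a target region should be run on DEV.  */
static enum run_mode
choose_run_mode (const struct device *dev)
{
  if (dev == NULL || dev->device_fn == NULL)
    return RUN_HOST_FALLBACK;
  if (dev->shared_memory)
    return RUN_DEVICE_HOST_ADDRS;
  return RUN_DEVICE_REMAPPED;
}
```

A shared-memory device with an available kernel therefore receives exactly what the host fallback would have received, with no remapping step in between.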

Third, the patch implements the libgomp part of the device-specific
arguments passed to GOMP_target as requested by Jakub (well, some are
actually for all devices but that is what we call them).  Because of
nowait target constructs, the arguments have proliferated into tasking
too, as did firstprivate copies.
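
One way such device-specific arguments could be packed into a pointer-sized word is sketched below: a device number in the low bits, a flag saying that the value lives in the next array element, an argument identifier, and a small inline value in the upper bits.  The macro names, field widths and helper functions are assumptions for illustration and should not be read as the definitions in gomp-constants.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bits 0-6 device, bit 7 "value follows in the next
   element", bits 8-15 argument id, bits 16 and up an inline value.  */
#define ARG_DEVICE_MASK      ((1u << 7) - 1)
#define ARG_SUBSEQUENT_PARAM (1u << 7)
#define ARG_ID_MASK          (((1u << 8) - 1) << 8)
#define ARG_VALUE_SHIFT      16

#define ARG_DEVICE_ALL       0u
#define ARG_ID_NUM_TEAMS     (1u << 8)
#define ARG_ID_THREAD_LIMIT  (2u << 8)

/* Pack DEVICE, ID and a small VALUE into a single word.  */
static uintptr_t
encode_inline_arg (unsigned device, unsigned id, unsigned value)
{
  return ((uintptr_t) value << ARG_VALUE_SHIFT)
	 | (id & ARG_ID_MASK) | (device & ARG_DEVICE_MASK);
}

/* Accessors for the individual fields of an encoded argument.  */
static unsigned
arg_device (uintptr_t arg)
{
  return (unsigned) (arg & ARG_DEVICE_MASK);
}

static unsigned
arg_id (uintptr_t arg)
{
  return (unsigned) (arg & ARG_ID_MASK);
}

static bool
arg_value_follows (uintptr_t arg)
{
  return (arg & ARG_SUBSEQUENT_PARAM) != 0;
}

static unsigned
arg_inline_value (uintptr_t arg)
{
  return (unsigned) (arg >> ARG_VALUE_SHIFT);
}
```

A runtime walking an array of such words can skip entries whose device field matches neither "all devices" nor the device it is launching on.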

Any feedback will be greatly appreciated,

Martin


2015-12-04  Martin Jambor  
Martin Liska  

include/
* gomp-constants.h (GOMP_DEVICE_HSA): New macro.
(GOMP_VERSION_HSA): Likewise.
(GOMP_TARGET_ARG_DEVICE_MASK): Likewise.
(GOMP_TARGET_ARG_DEVICE_ALL): Likewise.
(GOMP_TARGET_ARG_SUBSEQUENT_PARAM): Likewise.
(GOMP_TARGET_ARG_ID_MASK): Likewise.
(GOMP_TARGET_ARG_NUM_TEAMS): Likewise.
(GOMP_TARGET_ARG_THREAD_LIMIT): Likewise.
(GOMP_TARGET_ARG_VALUE_SHIFT): Likewise.
(GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES): Likewise.
(GOMP_kernel_launch_attributes): New type.
(GOMP_hsa_kernel_dispatch): New type.

libgomp/
* libgomp-plugin.h (offload_target_type): New element
OFFLOAD_TARGET_TYPE_HSA.
* libgomp.h (gomp_target_task): New field args.
(bool gomp_create_target_task): Updated.
(gomp_device_descr): Extra parameter of run_func and async_run_func,
new field can_run_func.
* libgomp_g.h (GOMP_target_ext): Change prototype.
* oacc-host.c (host_run): Added a new parameter args.
* target.c (gomp_target_fallback_firstprivate): New function.
(gomp_target_fallback_firstprivate): Use
gomp_target_fallback_firstprivate.
(gomp_get_target_fn_addr): Allow returning NULL for shared memory
devices.
(GOMP_target): Do host fallback for all shared memory devices.  Do not
pass any args to plugins.
(GOMP_target_ext): Add new parameter args.  Allow host fallback if
device shares memory.  Do not remap data if device has shared memory.
(gomp_target_task_fn): Likewise.  Also treat shared memory devices
like host fallback for mappings.
(GOMP_target_data): Treat shared memory devices like host fallback.
(GOMP_target_data_ext): Likewise.
(GOMP_target_update): Likewise.
(GOMP_target_update_ext): Likewise.  Also pass NULL as args to
gomp_create_target_task.
(GOMP_target_enter_exit_data): Likewise.
(omp_target_alloc): Treat shared memory devices like host fallback.
(omp_target_free): Likewise.
(omp_target_is_present): Likewise.
(omp_target_memcpy): Likewise.
(omp_target_memcpy_rect): Likewise.
(omp_target_associate_ptr): Likewise.
(gomp_load_plugin_for_device): Also load can_run.
* task.c (GOMP_PLUGIN_target_task_completion): Free
firstprivate_copies.
(gomp_create_target_task): Accept new argument args and store it to
ttask.

liboffloadmic/plugin
* libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_async_run): New unused
parameter.
(GOMP_OFFLOAD_run): Likewise.

diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index dffd631..1dae474 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -176,6 +176,7 @@ enum gomp_map_kind
 #define GOMP_DEVICE_NOT_HOST   4
 #define GOMP_DEVICE_NVIDIA_PTX 5
 #define GOMP_DEVICE_INTEL_MIC  6
+#define GOMP_DEVICE_HSA        7
 
 #define GOMP_DEVICE_ICV            -1
 #define GOMP_DEVICE_HOST_FALLBACK  -2
@@ -201,6 +202,7 @@ enum gomp_map_kind
 #define GOMP_VERSION   0
 #define GOMP_VERSION_NVIDIA_PTX 1
 #define GOMP_VERSION_INTEL_MIC 0
+#define GOMP_VERSION_HSA 0
 
 #define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV))
 #define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0x)
@@ -228,4 +230,74 @@ enum gomp_map_kind
 #define GOMP_LAUNCH_OP(X) (((X) >> GOMP_LAUNCH_OP_SHIFT) & 0x)
 #define GOMP_LAUNCH_OP_MAX 0x
 
+/* Bitmask to apply in order to find out the intended device of a target
+   argument.  */
+#define GOMP_TARGET_ARG_DEVICE_MASK((1 << 7) 

[hsa 3/10] HSA libgomp plugin

2015-12-07 Thread Martin Jambor
Hi,

the patch below adds the HSA-specific plugin for libgomp.  The plugin
implements the interface mandated by libgomp and takes care of finding
any available HSA devices, finalizing HSAIL code and running it on
HSA-capable GPUs.  The plugin does not really implement any data
movement functions (it implements them with a fatal error call)
because memory is shared in HSA environments and the previous patch
has modified libgomp proper not to call those functions on devices
with this capability.

The changes since the last submission include version checks,
receiving grid sizes through a device-specific parameter and support
for asynchronous execution.

Any feedback will be greatly appreciated,

Martin


2015-12-04  Martin Jambor  
Martin Liska  

* plugin/plugin-hsa.c: New file.

diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
new file mode 100644
index 000..b132954
--- /dev/null
+++ b/libgomp/plugin/plugin-hsa.c
@@ -0,0 +1,1449 @@
+/* Plugin for HSAIL execution.
+
+   Copyright (C) 2013-2015 Free Software Foundation, Inc.
+
+   Contributed by Martin Jambor  and
+   Martin Liska .
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#include 
+#include 
+#include 
+#include 
+#include "libgomp-plugin.h"
+#include "gomp-constants.h"
+#include "hsa.h"
+#include "hsa_ext_finalize.h"
+#include "dlfcn.h"
+
+/* Part of the libgomp plugin interface.  Return the name of the accelerator,
+   which is "hsa".  */
+
+const char *
+GOMP_OFFLOAD_get_name (void)
+{
+  return "hsa";
+}
+
+/* Part of the libgomp plugin interface.  Return the specific capabilities the
+   HSA accelerator has.  */
+
+unsigned int
+GOMP_OFFLOAD_get_caps (void)
+{
+  return GOMP_OFFLOAD_CAP_SHARED_MEM | GOMP_OFFLOAD_CAP_OPENMP_400;
+}
+
+/* Part of the libgomp plugin interface.  Identify as HSA accelerator.  */
+
+int
+GOMP_OFFLOAD_get_type (void)
+{
+  return OFFLOAD_TARGET_TYPE_HSA;
+}
+
+/* Return the libgomp version number we're compatible with.  There is
+   no requirement for cross-version compatibility.  */
+
+unsigned
+GOMP_OFFLOAD_version (void)
+{
+  return GOMP_VERSION;
+}
+
+/* Flag to decide whether to print information about what is going on to stderr.
+   Set in init_debug depending on environment variables.  */
+
+static bool debug;
+
+/* Flag to decide if the runtime should suppress a possible fallback to host
+   execution.  */
+
+static bool suppress_host_fallback;
+
+/* Initialize debug and suppress_host_fallback according to the environment.  */
+
+static void
+init_enviroment_variables (void)
+{
+  if (getenv ("HSA_DEBUG"))
+debug = true;
+  else
+debug = false;
+
+  if (getenv ("HSA_SUPPRESS_HOST_FALLBACK"))
+suppress_host_fallback = true;
+  else
+suppress_host_fallback = false;
+}
+
+/* Print a logging message with PREFIX to stderr if HSA_DEBUG value
+   is set to true.  */
+
+#define HSA_LOG(prefix, ...) \
+  do \
+  { \
+if (debug) \
+  { \
+   fprintf (stderr, prefix); \
+   fprintf (stderr, __VA_ARGS__); \
+  } \
+  } \
+  while (false);
+
+/* Print a debugging message to stderr.  */
+
+#define HSA_DEBUG(...) HSA_LOG ("HSA debug: ", __VA_ARGS__)
+
+/* Print a warning message to stderr.  */
+
+#define HSA_WARNING(...) HSA_LOG ("HSA warning: ", __VA_ARGS__)
+
+/* Print HSA warning STR with an HSA STATUS code.  */
+
+static void
+hsa_warn (const char *str, hsa_status_t status)
+{
+  if (!debug)
+return;
+
+  const char* hsa_error;
+  hsa_status_string (status, &hsa_error);
+
+  unsigned l = strlen (hsa_error);
+
+  char *err = GOMP_PLUGIN_malloc (sizeof (char) * l);
+  memcpy (err, hsa_error, l - 1);
+  err[l - 1] = '\0';
+
+  fprintf (stderr, "HSA warning: %s (%s)\n", str, err);
+
+  free (err);
+}
+
+/* Report a fatal error STR together with the HSA error corresponding to STATUS
+   and terminate execution of the current process.  */
+
+static void
+hsa_fatal (const char *str, hsa_status_t status)
+{
+  const 

[hsa 4/10] Merge of HSA branch

2015-12-07 Thread Martin Jambor
Subject: Make copy_gimple_seq_and_replace_locals copy seqs in omp clauses

Hi,

this is https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00477.html with
the early return requested by Jakub.  Please refer to that previous
email for an explanation of why it is necessary.

Thanks,

2015-12-03  Martin Jambor  

* tree-inline.c (duplicate_remap_omp_clause_seq): New function.
(replace_locals_op): Duplicate gimple sequences in OMP clauses.

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index ebab189..dea23c7 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -5116,6 +5116,8 @@ mark_local_labels_stmt (gimple_stmt_iterator *gsip,
   return NULL_TREE;
 }
 
+static gimple_seq duplicate_remap_omp_clause_seq (gimple_seq seq,
+ struct walk_stmt_info *wi);
 
 /* Called via walk_gimple_seq by copy_gimple_seq_and_replace_local.
Using the splay_tree pointed to by ST (which is really a `splay_tree'),
@@ -5160,6 +5162,35 @@ replace_locals_op (tree *tp, int *walk_subtrees, void *data)
  TREE_OPERAND (expr, 3) = NULL_TREE;
}
 }
+  else if (TREE_CODE (expr) == OMP_CLAUSE)
+{
+  /* Before the omplower pass completes, some OMP clauses can contain
+sequences that are neither copied by gimple_seq_copy nor walked by
+walk_gimple_seq.  To make copy_gimple_seq_and_replace_locals work even
+in those situations, we have to copy and process them explicitly.  */
+
+  if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_LASTPRIVATE)
+   {
+ gimple_seq seq = OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (expr);
+ seq = duplicate_remap_omp_clause_seq (seq, wi);
+ OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (expr) = seq;
+   }
+  else if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_LINEAR)
+   {
+ gimple_seq seq = OMP_CLAUSE_LINEAR_GIMPLE_SEQ (expr);
+ seq = duplicate_remap_omp_clause_seq (seq, wi);
+ OMP_CLAUSE_LINEAR_GIMPLE_SEQ (expr) = seq;
+   }
+  else if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_REDUCTION)
+   {
+ gimple_seq seq = OMP_CLAUSE_REDUCTION_GIMPLE_INIT (expr);
+ seq = duplicate_remap_omp_clause_seq (seq, wi);
+ OMP_CLAUSE_REDUCTION_GIMPLE_INIT (expr) = seq;
+ seq = OMP_CLAUSE_REDUCTION_GIMPLE_MERGE (expr);
+ seq = duplicate_remap_omp_clause_seq (seq, wi);
+ OMP_CLAUSE_REDUCTION_GIMPLE_MERGE (expr) = seq;
+   }
+}
 
   /* Keep iterating.  */
   return NULL_TREE;
@@ -5200,6 +5231,21 @@ replace_locals_stmt (gimple_stmt_iterator *gsip,
   return NULL_TREE;
 }
 
+/* Create a copy of SEQ and remap all decls in it.  */
+
+static gimple_seq
+duplicate_remap_omp_clause_seq (gimple_seq seq, struct walk_stmt_info *wi)
+{
+  if (!seq)
+return NULL;
+
+  /* If there are any labels in OMP sequences, they can be only referred to in
+ the sequence itself and therefore we can do both here.  */
+  walk_gimple_seq (seq, mark_local_labels_stmt, NULL, wi);
+  gimple_seq copy = gimple_seq_copy (seq);
+  walk_gimple_seq (copy, replace_locals_stmt, replace_locals_op, wi);
+  return copy;
+}
 
 /* Copies everything in SEQ and replaces variables and labels local to
current_function_decl.  */
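
The need for an explicit copy can be shown with a toy model: if a sequence hangs off a clause and the generic copy routine never visits it, the copy and the original would end up sharing statements.  The sketch below deep-copies a singly linked "sequence" and remaps each element.  The types, the remap callback and the function names are hypothetical and only loosely mirror duplicate_remap_omp_clause_seq, including its early return for an empty sequence.

```c
#include <stdlib.h>

/* Toy statement: one "variable" number and a link to the next stmt.  */
struct stmt
{
  int var;
  struct stmt *next;
};

/* Example remapping used for illustration: shift every variable by 100.  */
static int remap_add_100 (int v) { return v + 100; }

/* Return a freshly allocated copy of SEQ with every variable passed
   through REMAP.  The original sequence is left untouched.  */
static struct stmt *
duplicate_remap_seq (const struct stmt *seq, int (*remap) (int))
{
  struct stmt *head = NULL;
  struct stmt **tail = &head;

  if (!seq)
    return NULL;

  for (; seq; seq = seq->next)
    {
      struct stmt *copy = malloc (sizeof *copy);
      if (!copy)
	abort ();
      copy->var = remap (seq->var);
      copy->next = NULL;
      *tail = copy;
      tail = &copy->next;
    }
  return head;
}
```

After the call, changes made to the copy (such as replacing local variables) no longer leak into the sequence attached to the original clause.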


[hsa 10/10] HSA register allocator

2015-12-07 Thread Martin Jambor
Hi,

Because the HSA backend is not based on RTL, we need our own register
allocator, and it is in this patch.  The allocator has been written by
Michael Matz and I have
put it into a separate email so that I can add him to CC, because he
is much better suited to answer any questions or review comments.

Thanks,

Martin


2015-12-04  Michael Matz 
Martin Jambor  

* hsa-regalloc.c: New file.

diff --git a/gcc/hsa-regalloc.c b/gcc/hsa-regalloc.c
new file mode 100644
index 000..9db4c1d
--- /dev/null
+++ b/gcc/hsa-regalloc.c
@@ -0,0 +1,719 @@
+/* HSAIL IL Register allocation and out-of-SSA.
+   Copyright (C) 2013-15 Free Software Foundation, Inc.
+   Contributed by Michael Matz 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "is-a.h"
+#include "vec.h"
+#include "tree.h"
+#include "dominance.h"
+#include "cfg.h"
+#include "cfganal.h"
+#include "function.h"
+#include "bitmap.h"
+#include "dumpfile.h"
+#include "cgraph.h"
+#include "print-tree.h"
+#include "cfghooks.h"
+#include "symbol-summary.h"
+#include "hsa.h"
+
+
+/* Process a PHI node PHI of basic block BB as a part of naive out-of-SSA.  */
+
+static void
+naive_process_phi (hsa_insn_phi *phi)
+{
+  unsigned count = phi->operand_count ();
+  for (unsigned i = 0; i < count; i++)
+{
+  gcc_checking_assert (phi->get_op (i));
+  hsa_op_base *op = phi->get_op (i);
+  hsa_bb *hbb;
+  edge e;
+
+  if (!op)
+   break;
+
+  e = EDGE_PRED (phi->m_bb, i);
+  if (single_succ_p (e->src))
+   hbb = hsa_bb_for_bb (e->src);
+  else
+   {
+ basic_block old_dest = e->dest;
+ hbb = hsa_init_new_bb (split_edge (e));
+
+ /* If switch insn used this edge, fix jump table.  */
+ hsa_bb *source = hsa_bb_for_bb (e->src);
+ hsa_insn_sbr *sbr;
+ if (source->m_last_insn
+ && (sbr = dyn_cast <hsa_insn_sbr *> (source->m_last_insn)))
+   sbr->replace_all_labels (old_dest, hbb->m_bb);
+   }
+
+  hsa_build_append_simple_mov (phi->m_dest, op, hbb);
+}
+}
+
+/* Naive out-of SSA.  */
+
+static void
+naive_outof_ssa (void)
+{
+  basic_block bb;
+
+  hsa_cfun->m_in_ssa = false;
+
+  FOR_ALL_BB_FN (bb, cfun)
+  {
+hsa_bb *hbb = hsa_bb_for_bb (bb);
+hsa_insn_phi *phi;
+
+for (phi = hbb->m_first_phi;
+phi;
+phi = phi->m_next ? as_a <hsa_insn_phi *> (phi->m_next): NULL)
+  naive_process_phi (phi);
+
+/* Zap PHI nodes, they will be deallocated when everything else will.  */
+hbb->m_first_phi = NULL;
+hbb->m_last_phi = NULL;
+  }
+}
+
+/* Return register class number for the given HSA TYPE.  0 means the 'c' one
+   bit register class, 1 means 's' 32 bit class, 2 stands for 'd' 64 bit class
+   and 3 for 'q' 128 bit class.  */
+
+static int
+m_reg_class_for_type (BrigType16_t type)
+{
+  switch (type)
+{
+case BRIG_TYPE_B1:
+  return 0;
+
+case BRIG_TYPE_U8:
+case BRIG_TYPE_U16:
+case BRIG_TYPE_U32:
+case BRIG_TYPE_S8:
+case BRIG_TYPE_S16:
+case BRIG_TYPE_S32:
+case BRIG_TYPE_F16:
+case BRIG_TYPE_F32:
+case BRIG_TYPE_B8:
+case BRIG_TYPE_B16:
+case BRIG_TYPE_B32:
+case BRIG_TYPE_U8X4:
+case BRIG_TYPE_S8X4:
+case BRIG_TYPE_U16X2:
+case BRIG_TYPE_S16X2:
+case BRIG_TYPE_F16X2:
+  return 1;
+
+case BRIG_TYPE_U64:
+case BRIG_TYPE_S64:
+case BRIG_TYPE_F64:
+case BRIG_TYPE_B64:
+case BRIG_TYPE_U8X8:
+case BRIG_TYPE_S8X8:
+case BRIG_TYPE_U16X4:
+case BRIG_TYPE_S16X4:
+case BRIG_TYPE_F16X4:
+case BRIG_TYPE_U32X2:
+case BRIG_TYPE_S32X2:
+case BRIG_TYPE_F32X2:
+  return 2;
+
+case BRIG_TYPE_B128:
+case BRIG_TYPE_U8X16:
+case BRIG_TYPE_S8X16:
+case BRIG_TYPE_U16X8:
+case BRIG_TYPE_S16X8:
+case BRIG_TYPE_F16X8:
+case BRIG_TYPE_U32X4:
+case BRIG_TYPE_U64X2:
+case BRIG_TYPE_S32X4:
+case BRIG_TYPE_S64X2:
+case BRIG_TYPE_F32X4:
+case BRIG_TYPE_F64X2:
+  return 3;
+
+default:
+  gcc_unreachable ();
+}
+}
+
+/* If the Ith operands of INSN is or contains a register (in an address),
+   return the address of that register operand.  If not return NULL.  */
+
+static hsa_op_reg **
+insn_reg_addr (hsa_insn_basic *insn, int i)
+{
+  hsa_op_base *op = insn->get_op (i);
+  if 
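
[The patch text is cut off here in the archive.]  As a reading aid only, here is a minimal sketch of the register-class numbering described in the comment above m_reg_class_for_type: class 0 is the one-bit 'c' class, 1 is the 32-bit 's' class, 2 is the 64-bit 'd' class and 3 is the 128-bit 'q' class.  The function names and the idea of keying on a total bit width (rather than on BrigType16_t values) are assumptions made for illustration; they are not part of hsa-regalloc.c.

```c
/* Hypothetical helper: map the total width in bits of an HSA value to the
   register class number used in the patch comment.  Returns -1 for widths
   that the comment does not cover.  */
int toy_reg_class_for_bits(unsigned bits)
{
  switch (bits) {
  case 1:
    return 0;               /* 'c': one-bit control registers */
  case 8:
  case 16:
  case 32:
    return 1;               /* 's': everything up to 32 bits, packed or not */
  case 64:
    return 2;               /* 'd': 64-bit scalars and packed types */
  case 128:
    return 3;               /* 'q': 128-bit packed types */
  default:
    return -1;              /* not described by the source comment */
  }
}

/* Hypothetical helper: the letter the comment associates with each class.  */
char toy_reg_class_letter(int cls)
{
  static const char letters[] = "csdq";
  return (cls >= 0 && cls < 4) ? letters[cls] : '?';
}
```

Under these assumptions a packed type such as U16X4 (4 x 16 = 64 bits) lands in class 2, matching the case list in the patch.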

Re: [PATCH] [C FE] Fold trivial exprs that refer to const vars

2015-12-07 Thread Marek Polacek
On Sun, Dec 06, 2015 at 11:50:15PM -0500, Patrick Palka wrote:
> diff --git a/gcc/c/c-fold.c b/gcc/c/c-fold.c
> index c554e17..ab0b37f 100644
> --- a/gcc/c/c-fold.c
> +++ b/gcc/c/c-fold.c
> @@ -88,6 +88,7 @@ c_fully_fold (tree expr, bool in_init, bool *maybe_const)
>  }
>ret = c_fully_fold_internal (expr, in_init, maybe_const,
>  &maybe_const_itself, false);
> +  ret = decl_constant_value_for_optimization (ret);

Sorry, I don't think you can just do this.  Because for e.g.
  const int x = 7;
  x++;
we'd turn this into
  7++;
, right?  And I'm sure that's going to ICE in gimplifier.

Marek


Re: [PATCH] [C FE] Fold trivial exprs that refer to const vars

2015-12-07 Thread Patrick Palka
On Sun, Dec 6, 2015 at 11:50 PM, Patrick Palka  wrote:
> There is a minor inconsistency in the folding behavior within the C
> frontend.  The C frontend does not currently fold the expression "x",
> where x is a const int, yet the FE does fold the expression "x + 0".
>
> This happens because decl_constant_value is called in c_fully_fold only
> while recursing over the operands of the expression being folded, i.e.
> there is no top-level call to decl_constant_value to handle the case
> where the expression being folded happens to be a singular expression
> such as "x", as opposed to "x + 5" (where x is a const variable).
>
> To fix this inconsistency, this patch calls decl_constant_value in
> c_fully_fold after folding the given expression.
>
> Bootstrap + regtest in progress on x86_64-pc-linux-gnu, OK to commit if
> testing succeeds?

It just occurred to me that this change is not completely safe because
calling c_fully_fold on an lvalue can now return an rvalue. Callers of
c_fully_fold are not prepared to handle this.  Indeed, this patch
causes a couple of regressions in the handling of asm() memory operands
due to this implicit lvalue-rvalue conversion.
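
To make the lvalue/rvalue distinction concrete, here is a small sketch in plain C (not GCC internals; the function names are made up for illustration).  Replacing x by its value is harmless where only the value is used, but not where the object itself is needed, which is the situation an asm() memory operand or "x++" creates.

```c
static const int x = 7;

/* Rvalue use: only the value of x matters, so a front end may replace
   the expression by the constant 7 without changing behaviour.  */
int toy_value_plus_zero(void)
{
  return x + 0;
}

/* Lvalue use: the object x itself is needed.  If "x" were replaced by
   the constant 7 here, there would be nothing left to take the address
   of, which is the kind of breakage described above.  */
const int *toy_address(void)
{
  return &x;
}
```

Both functions are valid C today; the point is that only the first one stays valid if every occurrence of x is blindly turned into 7.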


Re: [PATCH] [C FE] Fold trivial exprs that refer to const vars

2015-12-07 Thread Joseph Myers
On Mon, 7 Dec 2015, Patrick Palka wrote:

> To fix this inconsistency, this patch calls decl_constant_value in
> c_fully_fold after folding the given expression.

The aim should be to eliminate decl_constant_value use here once all 
folding optimizations are also done on GIMPLE (and generally reduce the 
amount of folding done in the front end), not to use it in more cases.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] [C FE] Fold trivial exprs that refer to const vars

2015-12-07 Thread Patrick Palka
On Mon, Dec 7, 2015 at 7:20 AM, Marek Polacek  wrote:
> On Sun, Dec 06, 2015 at 11:50:15PM -0500, Patrick Palka wrote:
>> diff --git a/gcc/c/c-fold.c b/gcc/c/c-fold.c
>> index c554e17..ab0b37f 100644
>> --- a/gcc/c/c-fold.c
>> +++ b/gcc/c/c-fold.c
>> @@ -88,6 +88,7 @@ c_fully_fold (tree expr, bool in_init, bool *maybe_const)
>>  }
>>ret = c_fully_fold_internal (expr, in_init, maybe_const,
>>  &maybe_const_itself, false);
>> +  ret = decl_constant_value_for_optimization (ret);
>
> Sorry, I don't think you can just do this.  Because for e.g.
>   const int x = 7;
>   x++;
> we'd turn this into
>   7++;
> , right?  And I'm sure that's going to ICE in gimplifier.

Yes, looks like it.


RE: [PATCH : RL78] Disable interrupts during hardware multiplication routines

2015-12-07 Thread Kaushik Phatak
Hi DJ,
Please find attached an updated patch which tries to address the points raised 
by you in my earlier attempt,
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01729.html

1. Added an option for -msave.. and -mno-save..
   The default will be to save the MDUC registers for the g13 target in the ISR.

2. As an optimization, it will check for usage of mul and divmod routines
   before saving/restoring.

3. I have eliminated the special insns used earlier and am now directly
   generating movhi/movqi for the mem reference, also setting them as volatile.

4. Updated and fixed the issue in invoke.texi.
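
Points 1 and 2 combine into a single yes/no decision per function.  A minimal sketch of that decision follows, based on how MUST_SAVE_MDUC_REGISTER and the "!crtl->is_leaf || check_mduc_usage ()" test read in the patch below.  The struct and its field names are hypothetical stand-ins for compiler state, not rl78.c identifiers.

```c
#include <stdbool.h>

/* Hypothetical summary of the facts the prologue code consults.  */
struct toy_func_info {
  bool save_option_enabled;  /* saving has not been turned off by option */
  bool is_interrupt;         /* function is an interrupt handler */
  bool target_is_g13;        /* G13 hardware multiplier/divider in use */
  bool is_leaf;              /* function calls nothing else */
  bool uses_mul_or_divmod;   /* body contains a G13 mul/divmod insn */
};

/* Save the MDUC registers only in G13 interrupt handlers, and only when
   the handler could clobber them: either it calls other code (so we
   cannot tell) or it uses the multiply/divide patterns itself.  */
bool toy_must_save_mduc(const struct toy_func_info *f)
{
  if (!f->save_option_enabled || !f->is_interrupt || !f->target_is_g13)
    return false;
  return !f->is_leaf || f->uses_mul_or_divmod;
}
```

A leaf handler that never multiplies or divides therefore pays nothing, which appears to be the optimization point 2 is after.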

This has been regression tested for -mg13 -msim.
The only glitch I observed was that the list file printed the address of the MDUC
registers in decimal and not in hex, for example,
mov a, !983272 is displayed instead of
mov a, !0xF00E8
However, objdump generates these addresses correctly in hex along with
their register name references (, etc.)
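
For anyone cross-checking the listing, the two spellings denote the same address: 983272 decimal is 0xF00E8.  A tiny sketch of formatting an address the way the message would prefer to see it (the helper name is hypothetical and is not part of the RL78 port):

```c
#include <stdio.h>

/* Write ADDR into BUF as an absolute-address operand in upper-case hex,
   e.g. "!0xF00E8".  Returns the number of characters snprintf would
   have written.  */
int toy_format_addr(char *buf, size_t len, unsigned long addr)
{
  return snprintf(buf, len, "!0x%lX", addr);
}
```

Calling toy_format_addr with 983272 and with 0xF00E8 yields the same string, so the decimal output in the list file is a cosmetic issue rather than a wrong address.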

Please let me know if this updated patch is OK.

Best Regards,
Kaushik

gcc/ChangeLog
2015-12-07  Kaushik Phatak  

* config/rl78/rl78.c (rl78_expand_prologue): Save the MDUC related
registers in all interrupt handlers if necessary.
(rl78_option_override): Add warning.
(MUST_SAVE_MDUC_REGISTER): New macro.
(rl78_expand_epilogue): Restore the MDUC registers if necessary.
* config/rl78/rl78.c (check_mduc_usage): New function.
* config/rl78/rl78.opt (msave-mduc-in-interrupts): New option.
(mno-save-mduc-in-interrupts): New option.
* doc/invoke.texi (@item -msave-mduc-in-interrupts): New item.
(@item -mno-save-mduc-in-interrupts): New item

Index: gcc/config/rl78/rl78.c
===
--- gcc/config/rl78/rl78.c  (revision 2871)
+++ gcc/config/rl78/rl78.c  (working copy)
@@ -342,6 +342,10 @@
 #undef  TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE rl78_option_override
 
+#define MUST_SAVE_MDUC_REGISTER \
+  (!TARGET_NO_SAVE_MDUC_REGISTER\
+   && (is_interrupt_func (NULL_TREE)) && RL78_MUL_G13)
+
 static void
 rl78_option_override (void)
 {
@@ -366,6 +370,9 @@
 /* Address spaces are currently only supported by C.  */
 error ("-mes0 can only be used with C");
 
+  if (TARGET_SAVE_MDUC_REGISTER && !(TARGET_G13 || RL78_MUL_G13))
+warning (0, "mduc registers only saved for G13 target");
+
   switch (rl78_cpu_type)
 {
 case CPU_UNINIT:
@@ -1307,6 +1314,27 @@
   return (lookup_attribute ("naked", DECL_ATTRIBUTES (current_function_decl)) != NULL_TREE);
 }
 
+/* Check if the block uses mul/div insns.  */
+int
+check_mduc_usage ()
+{
+  rtx insn;
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+  {
+FOR_BB_INSNS (bb, insn)
+{
+  if (recog_memoized (insn) == CODE_FOR_udivmodsi4_g13
+  || recog_memoized (insn) == CODE_FOR_mulhi3_g13
+  || recog_memoized (insn) == CODE_FOR_mulsi3_g13)
+{
+  return 1;
+}
+}
+  }
+  return 0;
+}
+
 /* Expand the function prologue (from the prologue pattern).  */
 void
 rl78_expand_prologue (void)
@@ -1318,7 +1346,7 @@
 
   if (rl78_is_naked_func ())
 return;
-
+ 
   /* Always re-compute the frame info - the register usage may have changed.  */
   rl78_compute_frame_info ();
 
@@ -1371,6 +1399,46 @@
   F (emit_insn (gen_push (ax)));
 }
 
+  /* Save MDUC register inside interrupt routine.  */
+  if (MUST_SAVE_MDUC_REGISTER && (!crtl->is_leaf || check_mduc_usage ()))
+{
+  rtx mem_mduc;
+
+  mem_mduc = gen_rtx_MEM (QImode, GEN_INT (0xf00e8));
+  MEM_VOLATILE_P (mem_mduc) = 1;   
+  emit_insn (gen_movqi (gen_rtx_REG (QImode, A_REG), mem_mduc));
+  emit_insn (gen_push (gen_rtx_REG (HImode, AX_REG)));
+
+  mem_mduc = gen_rtx_MEM (HImode, GEN_INT (0x0));
+  MEM_VOLATILE_P (mem_mduc) = 1;   
+  emit_insn (gen_movhi (gen_rtx_REG (HImode, AX_REG), mem_mduc));
+  emit_insn (gen_push (gen_rtx_REG (HImode, AX_REG)));
+
+  mem_mduc = gen_rtx_MEM (HImode, GEN_INT (0x2));
+  MEM_VOLATILE_P (mem_mduc) = 1;   
+  emit_insn (gen_movhi (gen_rtx_REG (HImode, AX_REG), mem_mduc));
+  emit_insn (gen_push (gen_rtx_REG (HImode, AX_REG)));
+
+  mem_mduc = gen_rtx_MEM (HImode, GEN_INT (0x4));
+  MEM_VOLATILE_P (mem_mduc) = 1;   
+  emit_insn (gen_movhi (gen_rtx_REG (HImode, AX_REG), mem_mduc));
+  emit_insn (gen_push (gen_rtx_REG (HImode, AX_REG)));
+
+  mem_mduc = gen_rtx_MEM (HImode, GEN_INT (0x6));
+  MEM_VOLATILE_P (mem_mduc) = 1;   
+  emit_insn (gen_movhi (gen_rtx_REG (HImode, AX_REG), mem_mduc));
+  emit_insn (gen_push (gen_rtx_REG (HImode, AX_REG)));
+
+  mem_mduc = gen_rtx_MEM (HImode, GEN_INT (0xf00e0));
+  MEM_VOLATILE_P (mem_mduc) = 1;   
+  emit_insn (gen_movhi (gen_rtx_REG (HImode, AX_REG), mem_mduc));
+ 

Re: [hsa 0/10] Merge of HSA branch

2015-12-07 Thread Jakub Jelinek
On Mon, Dec 07, 2015 at 12:17:58PM +0100, Martin Jambor wrote:
> Because I have not been able to come up with any solution to failing
> libgomp/testsuite/libgomp.c++/target-2.C, I have disabled use of
> dynamic parallelism in this merge (I keep it on the branch) and
> therefore entirely rely on the gridification process to run loops on
> the accelerator, because gridified constructs do not have this issue
> (passing private symbols by reference).

I'm fine with not doing it in this series, but I'd strongly prefer
if dynamic parallelism is added for GCC 6.1.  Even for PTX we'll need
some IPA analysis on what functions might run in the various OpenMP
contexts (teams, parallel, simd) and what functions contain such
directives, and let the backends (or HSA) do something based on that
for sharing of the vars, or other properties of the function code
generation.

> HSA tests are still missing, I would need some guidance as to how to
> best implement them (especially to test gridification which of course
> does not happen for other accelerators).  There are no failing
> testcases if HSA is not configured.  If it is, there are some, all of
> which fall into one of the following categories:
> 
>   1) HSA cannot compile a function for one reason or another (most
>  common cause is inability of HSA to take an address of a function
>  or make an indirect call) and gives a warning, which is regarded
>  as an "excess error" by dejagnu.

It would be good if there is a -W* switch to turn such warnings off.
Not just for the purposes of dejagnu libgomp testing, but say one
might try to compile a program primarily say for XeonPhi or PTX offloading,
but have HSA enabled too, but care primarily about the former two, etc.

>   2) When HSA is not emitted for a function, libgomp runs a host
>  fallback instead of it.  When the test queries
>  omp_is_initial_device and asserts it returns false, the test
>  fails.

Do you have examples of which tests fall into this category?

In any case, it will be needed to also update the wiki page with details on
how to build the HSA support in, what are the prerequisites etc.

Jakub


[PATCH][contrib] Update download_prerequisites to ISL 0.15.

2015-12-07 Thread Alan Lawrence
Since r229889, we now have tests that pass only with ISL 0.15. Although ISL 0.15
is not a requirement, it seems we should make it easy to build the compiler that
way.

Other opinions?

Thanks,
Alan

contrib/ChangeLog:

* download_prerequisites: Update ISL version to 0.15.
---
 contrib/download_prerequisites | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/download_prerequisites b/contrib/download_prerequisites
index 6940330..a685a1d 100755
--- a/contrib/download_prerequisites
+++ b/contrib/download_prerequisites
@@ -48,7 +48,7 @@ ln -sf $MPC mpc || exit 1
 
 # Necessary to build GCC with the Graphite loop optimizations.
 if [ "$GRAPHITE_LOOP_OPT" = "yes" ] ; then
-  ISL=isl-0.14
+  ISL=isl-0.15
 
   wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$ISL.tar.bz2 || exit 1
   tar xjf $ISL.tar.bz2  || exit 1
-- 
1.9.1



Re: [PATCH][contrib] Update download_prerequisites to ISL 0.15.

2015-12-07 Thread Markus Trippelsdorf
On 2015.12.07 at 11:52 +, Alan Lawrence wrote:
> Since r229889, we now have tests that pass only with ISL 0.15. Although ISL 0.15
> is not a requirement, it seems we should make it easy to build the compiler that
> way.

This is already fixed by r231329.

-- 
Markus


Re: [PATCH][contrib] Update download_prerequisites to ISL 0.15.

2015-12-07 Thread Alan Lawrence

On 07/12/15 11:54, Markus Trippelsdorf wrote:
> On 2015.12.07 at 11:52 +, Alan Lawrence wrote:
>> Since r229889, we now have tests that pass only with ISL 0.15. Although ISL 0.15
>> is not a requirement, it seems we should make it easy to build the compiler that
>> way.
>
> This is already fixed by r231329.

Ah, yes, sorry, that'd been sitting in my outbox since Friday but I hadn't sent
it. Thanks and sorry for the noise.

--Alan



Re: [PATCH][contrib] Update download_prerequisites to ISL 0.15.

2015-12-07 Thread Richard Biener
On Mon, Dec 7, 2015 at 12:52 PM, Alan Lawrence  wrote:
> Since r229889, we now have tests that pass only with ISL 0.15. Although ISL 0.15
> is not a requirement, it seems we should make it easy to build the compiler that
> way.
>
> Other opinions?

Ok.

Richard.

> Thanks,
> Alan
>
> contrib/ChangeLog:
>
> * download_prerequisites: Update ISL version to 0.15.
> ---
>  contrib/download_prerequisites | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/contrib/download_prerequisites b/contrib/download_prerequisites
> index 6940330..a685a1d 100755
> --- a/contrib/download_prerequisites
> +++ b/contrib/download_prerequisites
> @@ -48,7 +48,7 @@ ln -sf $MPC mpc || exit 1
>
>  # Necessary to build GCC with the Graphite loop optimizations.
>  if [ "$GRAPHITE_LOOP_OPT" = "yes" ] ; then
> -  ISL=isl-0.14
> +  ISL=isl-0.15
>
>wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$ISL.tar.bz2 || exit 1
>tar xjf $ISL.tar.bz2  || exit 1
> --
> 1.9.1
>


[PTX] return type emission

2015-12-07 Thread Nathan Sidwell
This patch merges the two places where we handle return type emission.  In a 
similar way to write_one_arg, we now have write_return.
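
As a rough picture of the shape of the merged helper, here is a self-contained sketch in plain C.  It is a simplification written for illustration, not the nvptx.c code: the toy_ names, the four-value mode enum, and the use of a character buffer instead of std::stringstream are all assumptions, and the argument promotion step from the patch is left out.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, much reduced stand-in for machine_mode.  */
enum toy_mode { TOY_VOID, TOY_SI, TOY_DI, TOY_BLK };

/* Assumption: only the two integer modes are returned in a register.  */
static bool toy_return_in_reg_p(enum toy_mode m)
{
  return m == TOY_SI || m == TOY_DI;
}

static const char *toy_ptx_type(enum toy_mode m)
{
  return m == TOY_DI ? ".u64" : ".u32";
}

/* One helper for both call sites.  For a prototype (FOR_PROTO) it emits
   the "%out_retval" parameter when the value comes back in a register;
   for a function body it declares "%retval" using RET_MODE.  In both
   cases it reports whether the value is returned in memory.  */
bool toy_write_return(char *buf, size_t len, bool for_proto,
                      enum toy_mode type_mode, enum toy_mode ret_mode)
{
  bool return_in_mem = type_mode != TOY_VOID
                       && !toy_return_in_reg_p(type_mode);

  buf[0] = '\0';
  if (for_proto) {
    if (!return_in_mem && type_mode != TOY_VOID)
      snprintf(buf, len, "(.param%s %%out_retval) ", toy_ptx_type(type_mode));
  } else if (ret_mode != TOY_VOID) {
    snprintf(buf, len, "\t.reg%s %%retval;\n", toy_ptx_type(ret_mode));
  }
  return return_in_mem;
}
```

The benefit is that the "returned in memory?" computation lives in one place, so the prototype and the function prologue cannot silently disagree about it.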


This patch shows that the C++ NRV patch I created some time back is at best 
incomplete, because it's not being considered in the prototype emission. 
Something that I've been wondering about for a little while.


So, back to figuring out that one now ...

nathan
2015-12-07  Nathan Sidwell  

	* config/nvptx/nvptx.c (write_return): New.
	(write_fn_proto, nvptx_declare_function_name): Call it.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 231372)
+++ config/nvptx/nvptx.c	(working copy)
@@ -452,6 +452,33 @@ write_one_arg (std::stringstream , int
   return argno + 1;
 }
 
+static bool
+write_return (std::stringstream &s, bool for_proto, tree type,
+	  machine_mode ret_mode)
+{
+  machine_mode mode = TYPE_MODE (type);
+  bool return_in_mem = mode != VOIDmode && !RETURN_IN_REG_P (mode);
+
+  mode = arg_promotion (mode);
+  if (for_proto)
+{
+  if (!return_in_mem && mode != VOIDmode)
+	s << "(.param" << nvptx_ptx_type_from_mode (mode, false)
+	  << " %out_retval) ";
+}
+  else
+{
+  /* Prologue.  C++11 ABI causes us to return a reference to the
+	 passed in pointer for return_in_mem.  */
+  ret_mode = arg_promotion (ret_mode);
+  if (ret_mode != VOIDmode)
+	s << "\t.reg" << nvptx_ptx_type_from_mode (ret_mode, false)
+	  << " %retval;\n";
+}
+
+  return return_in_mem;
+}
+
 /* Look for attributes in ATTRS that would indicate we must write a function
as a .entry kernel rather than a .func.  Return true if one is found.  */
 
@@ -520,19 +547,7 @@ write_fn_proto (std::stringstream , bo
   tree result_type = TREE_TYPE (fntype);
 
   /* Declare the result.  */
-  bool return_in_mem = false;
-  if (TYPE_MODE (result_type) != VOIDmode)
-{
-  machine_mode mode = TYPE_MODE (result_type);
-  if (!RETURN_IN_REG_P (mode))
-	return_in_mem = true;
-  else
-	{
-	  mode = arg_promotion (mode);
-	  s << "(.param" << nvptx_ptx_type_from_mode (mode, false)
-	<< " %out_retval) ";
-	}
-}
+  bool return_in_mem = write_return (s, true, result_type, VOIDmode);
 
   s << name;
 
@@ -725,8 +740,8 @@ nvptx_declare_function_name (FILE *file,
   write_fn_proto (s, true, name, decl);
   s << "{\n";
 
-  bool return_in_mem = (TYPE_MODE (result_type) != VOIDmode
-			&& !RETURN_IN_REG_P (TYPE_MODE (result_type)));
+  bool return_in_mem = write_return (s, false, result_type,
+ (machine_mode)cfun->machine->ret_reg_mode);
   if (return_in_mem)
 argno = write_one_arg (s, 0, argno, ptr_type_node, true);
   
@@ -755,16 +770,6 @@ nvptx_declare_function_name (FILE *file,
 
   fprintf (file, "%s", s.str().c_str());
 
-  /* C++11 ABI causes us to return a reference to the passed in
- pointer for return_in_mem.  */
-  if (cfun->machine->ret_reg_mode != VOIDmode)
-{
-  machine_mode mode = arg_promotion
-	((machine_mode)cfun->machine->ret_reg_mode);
-  fprintf (file, "\t.reg%s %%retval;\n",
-	   nvptx_ptx_type_from_mode (mode, false));
-}
-
   fprintf (file, "\t.reg.u%d %s;\n", GET_MODE_BITSIZE (Pmode),
 	   reg_names[OUTGOING_STATIC_CHAIN_REGNUM]);
   


Re: [PATCH] [C FE] Fold trivial exprs that refer to const vars

2015-12-07 Thread Patrick Palka
On Mon, Dec 7, 2015 at 7:23 AM, Joseph Myers  wrote:
> On Mon, 7 Dec 2015, Patrick Palka wrote:
>
>> To fix this inconsistency, this patch calls decl_constant_value in
>> c_fully_fold after folding the given expression.
>
> The aim should be to eliminate decl_constant_value use here once all
> folding optimizations are also done on GIMPLE (and generally reduce the
> amount of folding done in the front end), not to use it in more cases.

I see, that makes sense. For now I filed PR 68764 to track this
particular issue.

>
> --
> Joseph S. Myers
> jos...@codesourcery.com


[arm-embedded][PATCHv2, ARM, libgcc] New aeabi_idiv function for armv6-m

2015-12-07 Thread Thomas Preud'homme
We decided to apply this to ARM/embedded-5-branch.

Best regards,

Thomas

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Andre Vieira
> Sent: Wednesday, October 28, 2015 1:03 AM
> To: gcc-patches@gcc.gnu.org
> Subject: Re: [PING][PATCHv2, ARM, libgcc] New aeabi_idiv function for
> armv6-m
> 
> Ping.
> 
> BR,
> Andre
> 
> On 13/10/15 18:01, Andre Vieira wrote:
> > This patch ports the aeabi_idiv routine from Linaro Cortex-Strings
> > (https://git.linaro.org/toolchain/cortex-strings.git), which was
> > contributed by ARM under Free BSD license.
> >
> > The new aeabi_idiv routine is used to replace the one in
> > libgcc/config/arm/lib1funcs.S. This replacement happens within the
> > Thumb1 wrapper. The new routine is under LGPLv3 license.
> >
> > The main advantage of this version is that it can improve the
> > performance of the aeabi_idiv function for Thumb1. This solution will
> > also increase the code size. So it will only be used if
> > __OPTIMIZE_SIZE__ is not defined.
> >
> > Make check passed for armv6-m.
> >
> > libgcc/ChangeLog:
> > 2015-08-10  Hale Wang  
> >   Andre Vieira  
> >
> > * config/arm/lib1funcs.S: Add new wrapper.
> >





C++ PATCH for c++/68464 (ICE with constexpr and delayed folding)

2015-12-07 Thread Jason Merrill
This testcase was failing because the NRV optimization changing VAR_DECL 
to RESULT_DECL was confusing the constexpr code.  Fixed by moving 
delayed folding to below NRV modifications.  I also needed to move the 
pre-genericize plugin earlier so that folding doesn't break the 
expression ranges plugin.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 95e8cb9f98c39a7e198ce95e0cb3ba1e57de959c
Author: Jason Merrill 
Date:   Mon Dec 7 13:26:33 2015 -0500

	PR c++/68464
	* cp-gimplify.c (cp_fold): Don't assume X has TREE_TYPE.
	(cp_genericize): Don't do cp_fold_r here.
	(cp_fold_function): New.
	* cp-tree.h: Declare it.
	* decl.c (finish_function): Call it and the pre-genericize plugin
	before NRV processing.

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 177e271..373c9e1 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -991,6 +991,15 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data)
   return NULL;
 }
 
+/* Fold ALL the trees!  FIXME we should be able to remove this, but
+   apparently that still causes optimization regressions.  */
+
+void
+cp_fold_function (tree fndecl)
+{
+  cp_walk_tree (&DECL_SAVED_TREE (fndecl), cp_fold_r, NULL, NULL);
+}
+
 /* Perform any pre-gimplification lowering of C++ front end trees to
GENERIC.  */
 
@@ -1475,10 +1484,6 @@ cp_genericize (tree fndecl)
 {
   tree t;
 
-  /* Fold ALL the trees!  FIXME we should be able to remove this, but
- apparently that still causes optimization regressions.  */
-  cp_walk_tree (&DECL_SAVED_TREE (fndecl), cp_fold_r, NULL, NULL);
-
   /* Fix up the types of parms passed by invisible reference.  */
   for (t = DECL_ARGUMENTS (fndecl); t; t = DECL_CHAIN (t))
 if (TREE_ADDRESSABLE (TREE_TYPE (t)))
@@ -1936,11 +1941,11 @@ cp_fold (tree x)
   location_t loc;
   bool rval_ops = true;
 
-  if (!x || error_operand_p (x))
+  if (!x || x == error_mark_node)
 return x;
 
   if (processing_template_decl
-  || (EXPR_P (x) && !TREE_TYPE (x)))
+  || (EXPR_P (x) && (!TREE_TYPE (x) || TREE_TYPE (x) == error_mark_node)))
 return x;
 
   /* Don't bother to cache DECLs or constants.  */
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 1b2563d..6190f4e 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6812,6 +6812,7 @@ extern tree cxx_omp_clause_dtor			(tree, tree);
 extern void cxx_omp_finish_clause		(tree, gimple_seq *);
 extern bool cxx_omp_privatize_by_reference	(const_tree);
 extern bool cxx_omp_disregard_value_expr	(tree, bool);
+extern void cp_fold_function			(tree);
 extern tree cp_fully_fold			(tree);
 
 /* in name-lookup.c */
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 0af7bd4..62636c9 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -14580,6 +14580,14 @@ finish_function (int flags)
  the NRV transformation.   */
   maybe_save_function_definition (fndecl);
 
+  /* Invoke the pre-genericize plugin before we start munging things.  */
+  if (!processing_template_decl)
+invoke_plugin_callbacks (PLUGIN_PRE_GENERICIZE, fndecl);
+
+  /* Perform delayed folding before NRV transformation.  */
+  if (!processing_template_decl)
+cp_fold_function (fndecl);
+
   /* Set up the named return value optimization, if we can.  Candidate
  variables are selected in check_return_expr.  */
   if (current_function_return_value)
@@ -14686,7 +14694,6 @@ finish_function (int flags)
   if (!processing_template_decl)
 {
   struct language_function *f = DECL_SAVED_FUNCTION_DATA (fndecl);
-  invoke_plugin_callbacks (PLUGIN_PRE_GENERICIZE, fndecl);
   cp_genericize (fndecl);
   /* Clear out the bits we don't need.  */
   f->x_current_class_ptr = NULL;


C++ PATCH for c++/68170 (-fconcepts and template friend)

2015-12-07 Thread Jason Merrill
The code for recognizing a constrained partial specialization of a class 
was incorrectly treating the template friend as a partial specialization 
because "class A" was finding the injected-class-name from the enclosing 
scope.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 1155b92a5e8151ba85a35661089972658074607b
Author: Jason Merrill 
Date:   Mon Dec 7 10:02:29 2015 -0500

	PR c++/68170
	* pt.c (maybe_new_partial_specialization): The injected-class-name
	is not a new partial specialization.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 22dcee2..60cc94c 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -855,6 +855,10 @@ maybe_new_partial_specialization (tree type)
   if (!current_template_parms)
 return NULL_TREE;
 
+  // The injected-class-name is not a new partial specialization.
+  if (DECL_SELF_REFERENCE_P (TYPE_NAME (type)))
+	return NULL_TREE;
+
   // If the constraints are not the same as those of the primary
   // then, we can probably create a new specialization.
   tree type_constr = current_template_constraints ();
diff --git a/gcc/testsuite/g++.dg/template/friend60.C b/gcc/testsuite/g++.dg/template/friend60.C
new file mode 100644
index 000..5ba9ab2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/friend60.C
@@ -0,0 +1,13 @@
+// PR c++/68170
+
+template< typename T >
+class A
+{
+};
+
+template<>
+class A< void >
+{
+  template< typename X >
+  friend class A;
+};


Re: [Patch] Fix bug for frame instructions in annulled delay slots

2015-12-07 Thread Bernd Schmidt

On 12/07/2015 07:54 PM, Steve Ellcey  wrote:
>   if (must_annul)
> -   used_annul = 1;
> +   {
> + /* Frame related instructions cannot go into annulled delay
> +slots, it messes up the dwarf info.  */
> + if (RTX_FRAME_RELATED_P (trial))
> +   return;

Don't you need to use break rather than return?

> + else if (!RTX_FRAME_RELATED_P (trial) \

Stray backslash.

Other than that I think this is OK. There are some preexisting tests for 
frame related insns already in this code.



Bernd



Re: [Patch] Fix bug for frame instructions in annulled delay slots

2015-12-07 Thread Jeff Law

On 12/07/2015 12:28 PM, Bernd Schmidt wrote:
> On 12/07/2015 07:54 PM, Steve Ellcey  wrote:
>> if (must_annul)
>> -used_annul = 1;
>> +{
>> +  /* Frame related instructions cannot go into annulled delay
>> + slots, it messes up the dwarf info.  */
>> +  if (RTX_FRAME_RELATED_P (trial))
>> +return;
>
> Don't you need to use break rather than return?
>
>> +  else if (!RTX_FRAME_RELATED_P (trial) \
>
> Stray backslash.
>
> Other than that I think this is OK. There are some preexisting tests for
> frame related insns already in this code.

Also note there's probably port cleanup that could happen once this goes 
in.  IIRC the PA port (for example) explicitly disallows frame related 
insns from many (most, all?) delay slots.  Other targets may be doing 
something similar.


jeff


Re: [Patch] Fix bug for frame instructions in annulled delay slots

2015-12-07 Thread Steve Ellcey
On Mon, 2015-12-07 at 20:28 +0100, Bernd Schmidt wrote:
> On 12/07/2015 07:54 PM, Steve Ellcey  wrote:
> >   if (must_annul)
> > -   used_annul = 1;
> > +   {
> > + /* Frame related instructions cannot go into annulled delay
> > +slots, it messes up the dwarf info.  */
> > + if (RTX_FRAME_RELATED_P (trial))
> > +   return;
> 
> Don't you need to use break rather than return?

I am not sure about this.  There is an earlier if statement in the loop
that does a 'return' instead of a break (or continue) and there is a 
return in the 'else' part of the if that sets must_annul.  Both of these
are inside the loop that looks at all the instructions in the sequence
'seq'.  I think the code is looking at all the instructions in the
sequence and if any of them fail one of the tests in the loop (in this
case require annulling) then we can't handle any of the instructions in
the sequence and so we return immediately without putting any of the
instructions from 'seq' in the delay slot.  I believe a frame related
instruction in an annulled branch needs to be handled that way too.
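
The all-or-nothing reading described above can be sketched in a few lines of plain C.  This is an illustration of the control flow only, not the reorg.c code: the struct, its fields, and the function name are hypothetical, and the real loop checks several more conditions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-insn summary.  */
struct toy_insn {
  bool frame_related;  /* would carry RTX_FRAME_RELATED_P */
  bool needs_annul;    /* may only go into an annulled delay slot */
};

/* Return how many insns of SEQ are accepted for the delay slots: either
   all N, or 0 if any single insn fails a test.  Using "return 0" rather
   than "break" is what makes the decision all-or-nothing; a break would
   fall through to code that uses the insns accepted so far.  */
size_t toy_accept_sequence(const struct toy_insn *seq, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (seq[i].needs_annul && seq[i].frame_related)
      return 0;
  return n;
}
```

With this shape, one frame-related insn that would need annulling causes the whole sequence to be rejected, which matches the behaviour of the existing early returns in the loop as described above.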

> 
> > + else if (!RTX_FRAME_RELATED_P (trial) \
> 
> Stray backslash.

That is easily fixed.

> Other than that I think this is OK. There are some preexisting tests for 
> frame related insns already in this code.
> 
> 
> Bernd
> 





Re: -fstrict-aliasing fixes 5/6: make type system independent of flag_strict_aliasing

2015-12-07 Thread Jan Hubicka
> > > Bootstrapped/regtested x86_64-linux and also lto-bootstraped. Looks OK?
> > > 
> > >   * alias.c (alias_set_subset_of, alias_sets_conflict_p,
> > >   objects_must_conflict_p): Short circuit for !flag_strict_aliasing
> > >   (get_alias_set): Remove flag_strict_aliasing check.
> > >   (new_alias_set): Likewise.
> > 
> > Not clear whether it's this patch specifically or another one in the series,
> > but the compiler now hangs on simple Ada code it used to compile instantly.
> > 
> > A couple of testcases are attached.  It looks like the compiler is now stuck in
> > get_alias_set endlessly pushing references onto a vector.
> uhm, sorry. I will take a look.
The problem is with the type:
(gdb) p debug_tree (p)
 
sizes-gimplified public visited unsigned DI
size  constant visited 64>
unit size  constant visited 8>
align 64 symtab 0 alias set -1 canonical type 0x76af02a0
pointer_to_this >

It is a pointer type that points to itself. Does this make sense in Ada?  If so, we
will need to add a recursion guard into the loop and put the alias set into
voidptr_alias_set.  It looks more like a frontend bug to me; I cannot think
of a use for this beast.
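
For illustration, one way such a guard could look, sketched on a toy type representation rather than on GCC trees.  The struct, the function name, and the choice of a two-speed cycle check are assumptions; in the compiler the "cycle found" case would presumably be mapped to voidptr_alias_set as suggested above.

```c
#include <stddef.h>

/* Hypothetical type node: POINTEE is NULL for a non-pointer type.  */
struct toy_type {
  const struct toy_type *pointee;
};

/* Follow the pointer chain starting at T down to the innermost
   non-pointer type.  Returns NULL instead of looping forever when the
   chain is cyclic, e.g. a pointer type whose pointee is itself.  SLOW
   advances at half the speed of FAST, so in a cycle FAST eventually
   catches up with it.  */
const struct toy_type *toy_strip_pointers(const struct toy_type *t)
{
  const struct toy_type *slow = t;
  const struct toy_type *fast = t;
  int step = 0;

  while (fast->pointee != NULL) {
    fast = fast->pointee;
    if (fast == slow)
      return NULL;              /* recursion guard tripped */
    if (++step % 2 == 0)
      slow = slow->pointee;
  }
  return fast;
}
```

A self-referential pointer type is detected on the first iteration, while ordinary chains such as pointer-to-pointer-to-int are walked as before.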

Honza


Re: [PATCH] Adjust vect-widen-mult-const-[su]16.c for r226675

2015-12-07 Thread Bill Schmidt
On Mon, 2015-12-07 at 19:17 +0100, Richard Biener wrote:
> On December 7, 2015 6:21:36 PM GMT+01:00, Bill Schmidt 
>  wrote:
> >Hi Richi,
> >
> >I was afraid this would break X86.  Unfortunately, your proposed patch
> >didn't change any output for me.  I am still seeing 6 and 8 instances of
> >"pattern recognized".
> 
> 
> Hmm, can you open a PR and attach vectorizer dumps?

PR68776.  Thanks!

Bill

> 
> Thanks,
> Richard.
> >Bill
> >
> >On Mon, 2015-12-07 at 11:50 +0100, Richard Biener wrote:
> >> On Fri, Dec 4, 2015 at 8:51 PM, Bill Schmidt
> >>  wrote:
> >> > Since r226675, we have been seeing these failures:
> >> >
> >> > FAIL: gcc.dg/vect/vect-widen-mult-const-s16.c -flto
> >-ffat-lto-objects
> >> > scan-tree-dump-times vect "pattern recognized" 2
> >> > FAIL: gcc.dg/vect/vect-widen-mult-const-s16.c scan-tree-dump-times
> >vect
> >> > "pattern recognized" 2
> >> > FAIL: gcc.dg/vect/vect-widen-mult-const-u16.c -flto
> >-ffat-lto-objects
> >> > scan-tree-dump-times vect "pattern recognized" 2
> >> > FAIL: gcc.dg/vect/vect-widen-mult-const-u16.c scan-tree-dump-times
> >vect
> >> > "pattern recognized" 2
> >> >
> >> > Comparing the vect-details dumps from r226674 to r226675, I see
> >these as
> >> > the reason:
> >> >
> >> > 63a64,66
> >> >>
> >/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
> >note: vect_recog_mult_pattern: detected:
> >> >>
> >/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
> >note: patt_47 = _6 << 2;
> >> >>
> >/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
> >note: pattern recognized: patt_47 = _6 << 2;
> >> > 70a74,76
> >> >>
> >/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
> >note: vect_recog_mult_pattern: detected:
> >> >>
> >/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
> >note: patt_40 = _6 << 1;
> >> >>
> >/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:16:3:
> >note: pattern recognized: patt_40 = _6 << 1;
> >> >
> >> > 747a754,756
> >> >>
> >/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
> >note: vect_recog_mult_pattern: detected:
> >> >>
> >/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
> >note: patt_47 = _6 << 2;
> >> >>
> >/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
> >note: pattern recognized: patt_47 = _6 << 2;
> >> > 754a764,766
> >> >>
> >/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
> >note: vect_recog_mult_pattern: detected:
> >> >>
> >/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
> >note: patt_40 = _6 << 1;
> >> >>
> >/home/wschmidt/gcc/gcc-mainline-base/gcc/testsuite/gcc.dg/vect/vect-widen-mult-const-s16.c:31:3:
> >note: pattern recognized: patt_40 = _6 << 1;
> >> >
> >> > These seem precisely what's expected, given the nature of the
> >patch,
> >> > which is looking for these opportunities.  So it's likely that we
> >should
> >> > just change
> >> >
> >> > /* { dg-final { scan-tree-dump-times "pattern recognized" 2
> >> > "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
> >> >
> >> > to
> >> >
> >> > /* { dg-final { scan-tree-dump-times "pattern recognized" 6
> >> > "vect" { target vect_widen_mult_hi_to_si_pattern } } } */
> >> >
> >> > and similarly for the unsigned case.  The following patch does
> >this.
> >> > However, I wanted to run this by Venkat since this was apparently
> >not
> >> > detected when his patch went in.  This doesn't appear to be a
> >> > target-specific issue, and most targets support
> >> > vect_widen_mult_hi_to_si_pattern, so I'm not sure why this wasn't
> >fixed
> >> > with the original patch.  Will this change break on any other
> >targets
> >> > for some reason?
> >> >
> >> > Tested on powerpc64le-unknown-linux-gnu.  Ok for trunk?
> >> 
> >> Hmm.  That will FAIL on x86_64 though because it can handle
> >multiplication
> >> natively.  I think the pattern recognition is simply bogus as it
> >fails to detect
> >> the stmt is already part of the widen-mult pattern?  In fact, pattern
> >> recognition
> >> looping over all pattern functions even if one already matched on the
> >very
> >> same stmt looks bogus to me.
> >> 
> >> Does the (untested)
> >> 
> >> Index: gcc/tree-vect-patterns.c
> >> ===
> >> --- gcc/tree-vect-patterns.c(revision 231357)
> >> +++ gcc/tree-vect-patterns.c(working copy)
> >> @@ -3791,7 +3791,7 @@ vect_mark_pattern_stmts (gimple *orig_st
> >> This function also does some bookkeeping, as explained in the
> >documentation
> >> 
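
For readers following the dumps quoted above: the extra "pattern recognized" lines carry statements of the form patt_47 = _6 << 2, which suggests that vect_recog_mult_pattern is rewriting a multiplication by a power-of-two constant as a left shift.  The sketch below is not the testsuite source; the function names are hypothetical, and it uses the unsigned 16-bit case so that the shift is well defined in ISO C.  It only illustrates that the two forms compute the same widened result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel, not the testsuite source: widen each 16-bit
   element to 32 bits and multiply by the constant 4.  */
void
widen_mult_by_4 (const unsigned short *in, unsigned int *out, size_t n)
{
  assert (n == 0 || (in != NULL && out != NULL));
  for (size_t i = 0; i < n; i++)
    out[i] = (unsigned int) in[i] * 4u;
}

/* The same computation in shift form, mirroring the shape of the
   "patt_47 = _6 << 2" statement seen in the dump.  */
void
widen_shift_by_2 (const unsigned short *in, unsigned int *out, size_t n)
{
  assert (n == 0 || (in != NULL && out != NULL));
  for (size_t i = 0; i < n; i++)
    out[i] = (unsigned int) in[i] << 2;
}
```

Calling both functions on the same input array should fill the two output arrays with identical values.  Whether a given target reports the multiplication as a widen-mult pattern, a shift pattern, or both is exactly the target-dependent question raised in the thread, and this sketch does not answer it.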

Re: [PATCH 3b/4][AArch64] Add scheduling model for Exynos M1

2015-12-07 Thread Evandro Menezes

On 12/04/2015 03:25 AM, Kyrill Tkachov wrote:

This is ok arm-wise, sorry for the delay.
Make sure to regenerate and commit the updated config/arm/arm-tune.md hunk
when committing the patch.


Checked in as r231378.

Thank you,

--
Evandro Menezes



Re: [Patch] Fix bug for frame instructions in annulled delay slots

2015-12-07 Thread Bernd Schmidt

On 12/07/2015 08:43 PM, Steve Ellcey wrote:

I am not sure about this.  There is an earlier if statement in the loop
that does a 'return' instead of a break (or continue) and there is a
return in the 'else' part of the if that sets must_annul.  Both of these
are inside the loop that looks at all the instructions in the sequence
'seq'.  I think the code is looking at all the instructions in the
sequence and if any of them fail one of the tests in the loop (in this
case require annulling) then we can't handle any of the instructions in
the sequence and so we return immediately without putting any of the
instructions from 'seq' in the delay slot.  I believe a frame related
instruction in an annulled branch needs to be handled that way too.
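
The all-or-nothing behaviour described in the paragraph above can be sketched in a few lines of C.  This is a simplified model and not the reorg.c code: struct toy_insn, its frame_related field, and steal_all_or_nothing are hypothetical names invented for illustration.  The point is only that a single failing instruction rejects the whole sequence through an early return, rather than being skipped with break or continue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an instruction; not GCC's rtx_insn.  */
struct toy_insn
{
  bool frame_related;
};

/* Return the number of instructions taken from SEQ for the delay
   list: either all N of them or none.  A frame-related instruction
   under an annulled branch rejects the entire sequence.  */
size_t
steal_all_or_nothing (const struct toy_insn *seq, size_t n, bool must_annul)
{
  assert (n == 0 || seq != NULL);
  for (size_t i = 0; i < n; i++)
    if (must_annul && seq[i].frame_related)
      return 0;		/* Give up on the whole sequence.  */
  return n;
}
```

With must_annul set, a sequence that contains one frame-related instruction yields 0, while the same sequence with must_annul clear yields n.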


Ah, I think I was looking at the other function that has the same 
must_annul test (steal_delay_list_from_fallthrough). The patch is ok 
without the backslash. Maybe the other function ought to be changed as 
well though?



Bernd

