date:20151217

Hi,

We decided to apply the following patch to the ARM embedded 5 branch.

Best regards,

Thomas

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Thursday, December 17, 2015 4:08 PM
> To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan;
> Kyrylo Tkachov
> Subject: [PATCH, ARM 5/6] Add support for MOVT/MOVW to ARMv8-M
> Baseline
> 
> Hi,
> 
> This patch is part of a patch series to add support for ARMv8-M[1] to GCC.
> This specific patch makes the compiler start generating code with the
> new MOVT/MOVW instructions for ARMv8-M Baseline.
> 
> [1] For a quick overview of ARMv8-M please refer to the initial cover
> letter.
> 
> ChangeLog entry is as follows:
> 
> 
> *** gcc/ChangeLog ***
> 
> 2015-11-13  Thomas Preud'homme  
> 
> * config/arm/arm.h (TARGET_HAVE_MOVT): Include ARMv8-M as
> having MOVT.
> * config/arm/arm.c (arm_arch_name): (const_ok_for_op): Check
> MOVT/MOVW
> availability with TARGET_HAVE_MOVT.
> (thumb_legitimate_constant_p): Legalize high part of a label_ref as a
> constant.
> (thumb1_rtx_costs): Also return 0 if setting a half word constant and
> movw is available.
> (thumb1_size_rtx_costs): Make set of half word constant also cost 1
> extra instruction if MOVW is available.  Make constant with bottom
> half
> word zero cost 2 instruction if MOVW is available.
> * config/arm/arm.md (define_attr "arch"): Add v8mb.
> (define_attr "arch_enabled"): Set to yes if arch value is v8mb and
> target is ARMv8-M Baseline.
> * config/arm/thumb1.md (thumb1_movdi_insn): Add ARMv8-M
> Baseline only
> alternative for constants satisfying j constraint.
> (thumb1_movsi_insn): Likewise.
> (movsi splitter for K alternative): Tighten condition to not trigger
> if movt is available and j constraint is satisfied.
> (Pe immediate splitter): Likewise.
> (thumb1_movhi_insn): Add ARMv8-M Baseline only alternative for
> constant fitting in a halfword to use movw.
> * doc/sourcebuild.texi (arm_thumb1_movt_ko): Document new
> ARM
> effective target.
> 
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2015-11-13  Thomas Preud'homme  
> 
> * lib/target-supports.exp
> (check_effective_target_arm_thumb1_movt_ko):
> Define effective target.
> * gcc.target/arm/pr42574.c: Require arm_thumb1_movt_ko instead
> of
> arm_thumb1_ok as effective target to exclude ARMv8-M Baseline.
> 
> 
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index ff3cfcd..015df50 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -261,7 +261,7 @@ extern void
> (*arm_lang_output_object_attributes_hook)(void);
>  #define TARGET_HAVE_LDACQ(TARGET_ARM_ARCH >= 8 &&
> arm_arch_notm)
> 
>  /* Nonzero if this chip provides the movw and movt instructions.  */
> -#define TARGET_HAVE_MOVT (arm_arch_thumb2)
> +#define TARGET_HAVE_MOVT (arm_arch_thumb2 || arm_arch8)
> 
>  /* Nonzero if integer division instructions supported.  */
>  #define TARGET_IDIV  ((TARGET_ARM && arm_arch_arm_hwdiv) \
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 51d501e..d832309 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -8158,6 +8158,12 @@ arm_legitimate_constant_p_1
> (machine_mode, rtx x)
>  static bool
>  thumb_legitimate_constant_p (machine_mode mode
> ATTRIBUTE_UNUSED, rtx x)
>  {
> +  /* Splitters for TARGET_USE_MOVT call arm_emit_movpair which
> creates high
> + RTX.  These RTX must therefore be allowed for Thumb-1 so that
> when run
> + for ARMv8-M baseline or later the result is valid.  */
> +  if (TARGET_HAVE_MOVT && GET_CODE (x) == HIGH)
> +x = XEXP (x, 0);
> +
>return (CONST_INT_P (x)
> || CONST_DOUBLE_P (x)
> || CONSTANT_ADDRESS_P (x)
> @@ -8244,7 +8250,8 @@ thumb1_rtx_costs (rtx x, enum rtx_code code,
> enum rtx_code outer)
>  case CONST_INT:
>if (outer == SET)
>   {
> -   if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256)
> +   if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256
> +   || (TARGET_HAVE_MOVT && !(INTVAL (x) & 0xffff0000)))
>   return 0;
> if (thumb_shiftable_const (INTVAL (x)))
>   return COSTS_N_INSNS (2);
> @@ -8994,16 +9001,24 @@ thumb1_size_rtx_costs (rtx x, enum
> rtx_code code, enum rtx_code outer)
>the mode.  */
>words = ARM_NUM_INTS (GET_MODE_SIZE (GET_MODE (SET_DEST
> (x))));
>return COSTS_N_INSNS (words)
> -  + COSTS_N_INSNS (1) * (satisfies_constraint_J (SET_SRC (x))
> - || satisfies_constraint_K (SET_SRC (x))
> -/* thumb1_movdi_insn.  */
> - || ((words > 1) && MEM_P (SET_SRC
>

Re: Last testcase for PR middle-end/25140

On Wed, Dec 16, 2015 at 8:22 PM, Jan Hubicka  wrote:
> Hi,
> I checked the ipa-pta and pta implementations and these seem to work just
> fine in the presence of aliases because get_constraint_for_ssa_var already
> looks into the alias targets.
>
> This patch adds a testcase I constructed.  Since I am done with auditing
> *alias*.c for variable aliases I will close the PR (after 10 years, yay)

Note that alias-2.c still fails on some targets (miscompiled in RTL
opt somewhere).
PR68832

Richard.

> Honza
>
> PR middle-end/25140
> * gcc.c-torture/execute/alias-4.c: New testcase.
> Index: testsuite/gcc.c-torture/execute/alias-4.c
> ===
> --- testsuite/gcc.c-torture/execute/alias-4.c   (revision 0)
> +++ testsuite/gcc.c-torture/execute/alias-4.c   (revision 0)
> @@ -0,0 +1,19 @@
> +/* { dg-require-alias "" } */
> +int a = 1;
> +extern int b __attribute__ ((alias ("a")));
> +int c = 1;
> +extern int d __attribute__ ((alias ("c")));
> +main (int argc)
> +{
> +  int *p;
> +  int *q;
> +  if (argc)
> +    p = &a, q = &b;
> +  else
> +    p = &c, q = &d;
> +  *p = 1;
> +  *q = 2;
> +  if (*p == 1)
> +__builtin_abort ();
> +  return 0;
> +}

[PATCH 2/5] Fix more asymmetric comparison functions

Some more symmetry fixes.  These were detected manually (not via
automatic analysis by SortChecker), so I've put them in a separate patch.
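
For readers who have not seen this class of bug, here is a minimal sketch of
the pattern being fixed.  The struct and function names are made up for
illustration and are not GCC code.  A comparator that returns 1 whenever the
first element is not smaller claims both a > b and b > a for equal elements;
the symmetric version returns 0 for ties.

```c
#include <assert.h>

/* Hypothetical stand-in for the records being sorted; not a GCC type.  */
struct mode_like
{
  int counter;
};

/* Asymmetric: for equal counters both cmp (a, b) and cmp (b, a) return 1.  */
static int
cmp_asymmetric (const void *a, const void *b)
{
  const struct mode_like *m = (const struct mode_like *) a;
  const struct mode_like *n = (const struct mode_like *) b;

  if (m->counter < n->counter)
    return -1;
  else
    return 1;
}

/* Symmetric: swapping the arguments negates the result, ties give 0.  */
static int
cmp_symmetric (const void *a, const void *b)
{
  const struct mode_like *m = (const struct mode_like *) a;
  const struct mode_like *n = (const struct mode_like *) b;

  if (m->counter < n->counter)
    return -1;
  else if (m->counter > n->counter)
    return 1;
  return 0;
}

/* Nonzero if CMP yields opposite signs when its arguments are swapped.  */
static int
symmetric_on_pair_p (int (*cmp) (const void *, const void *),
                     const void *a, const void *b)
{
  int ab = cmp (a, b);
  int ba = cmp (b, a);
  return (ab > 0) - (ab < 0) == -((ba > 0) - (ba < 0));
}

/* Exercise both comparators on a pair with equal keys.  */
static void
check_tie (void)
{
  struct mode_like x = { 7 }, y = { 7 };
  assert (!symmetric_on_pair_p (cmp_asymmetric, &x, &y));
  assert (symmetric_on_pair_p (cmp_symmetric, &x, &y));
}
```

Calling check_tie () from a test driver exercises the tie case; elements with
distinct keys pass the symmetry check under either comparator.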

Cc-ing
* Alexandre for sel_rank_for_schedule
* Ben for cmp_modes
* Jakub for range_entry_cmp
* Richard for sort_bbs_in_loop_postorder_cmp, 
sort_locs_in_loop_postorder_cmp, find_ref_loc_in_loop_cmp and 
dr_group_sort_cmp


/Yury
From 5716669d0b88265ee610ad139a0dc4152d1c20f3 Mon Sep 17 00:00:00 2001
From: Yury Gribov 
Date: Sat, 12 Dec 2015 10:27:45 +0300
Subject: [PATCH 2/5] Fix more asymmetric comparison functions.

2015-12-17  Yury Gribov  

	* genmodes.c (cmp_modes): Make symmetric.
	* sel-sched.c (sel_rank_for_schedule): Ditto.
	* tree-ssa-loop-im.c (sort_bbs_in_loop_postorder_cmp):
	(sort_locs_in_loop_postorder_cmp):
	(find_ref_loc_in_loop_cmp): Check invariant.
	* tree-ssa-reassoc.c (range_entry_cmp): Make symmetric.
	* tree-vect-data-refs.c (dr_group_sort_cmp): Ditto.
---
 gcc/genmodes.c|  6 --
 gcc/sel-sched.c   |  4 +++-
 gcc/tree-ssa-loop-im.c| 19 +++
 gcc/tree-ssa-reassoc.c|  8 +++-
 gcc/tree-vect-data-refs.c |  7 ---
 5 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/gcc/genmodes.c b/gcc/genmodes.c
index 15d62a0..f78a4da 100644
--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
@@ -813,8 +813,9 @@ cmp_modes (const void *a, const void *b)
 {
   if (m->counter < n->counter)
 	return -1;
-  else
+  else if (m->counter > n->counter)
 	return 1;
+  return 0;
 }
 
   if (m->component->bytesize > n->component->bytesize)
@@ -829,8 +830,9 @@ cmp_modes (const void *a, const void *b)
 
   if (m->counter < n->counter)
 return -1;
-  else
+  else if (m->counter > n->counter)
 return 1;
+  return 0;
 }
 
 static void
diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index aebc2d9..c6efe9b 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -3343,7 +3343,9 @@ sel_rank_for_schedule (const void *x, const void *y)
   tmp2_insn = EXPR_INSN_RTX (tmp2);
 
   /* Schedule debug insns as early as possible.  */
-  if (DEBUG_INSN_P (tmp_insn) && !DEBUG_INSN_P (tmp2_insn))
+  if (DEBUG_INSN_P (tmp_insn) && DEBUG_INSN_P (tmp2_insn))
+return 0;
+  else if (DEBUG_INSN_P (tmp_insn))
 return -1;
   else if (DEBUG_INSN_P (tmp2_insn))
 return 1;
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 9b1b815..b53a490 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -1504,7 +1504,11 @@ sort_bbs_in_loop_postorder_cmp (const void *bb1_, const void *bb2_)
   struct loop *loop2 = bb2->loop_father;
   if (loop1->num == loop2->num)
 return 0;
-  return bb_loop_postorder[loop1->num] < bb_loop_postorder[loop2->num] ? -1 : 1;
+  gcc_assert(bb_loop_postorder[loop1->num] != bb_loop_postorder[loop2->num]);
+  if (bb_loop_postorder[loop1->num] < bb_loop_postorder[loop2->num])
+return -1;
+  else
+return 1;
 }
 
 /* qsort sort function to sort ref locs after their loop fathers postorder.  */
@@ -1518,7 +1522,11 @@ sort_locs_in_loop_postorder_cmp (const void *loc1_, const void *loc2_)
   struct loop *loop2 = gimple_bb (loc2->stmt)->loop_father;
   if (loop1->num == loop2->num)
 return 0;
-  return bb_loop_postorder[loop1->num] < bb_loop_postorder[loop2->num] ? -1 : 1;
+  gcc_assert(bb_loop_postorder[loop1->num] != bb_loop_postorder[loop2->num]);
+  if (bb_loop_postorder[loop1->num] < bb_loop_postorder[loop2->num])
+return -1;
+  else
+return 1;
 }
 
 /* Gathers memory references in loops.  */
@@ -1625,8 +1633,11 @@ find_ref_loc_in_loop_cmp (const void *loop_, const void *loc_)
   if (loop->num  == loc_loop->num
   || flow_loop_nested_p (loop, loc_loop))
 return 0;
-  return (bb_loop_postorder[loop->num] < bb_loop_postorder[loc_loop->num]
-	  ? -1 : 1);
+  gcc_assert(bb_loop_postorder[loop->num] != bb_loop_postorder[loc_loop->num]);
+  if (bb_loop_postorder[loop->num] < bb_loop_postorder[loc_loop->num])
+return -1;
+  else
+return 1;
 }
 
 /* Iterates over all locations of REF in LOOP and its subloops calling
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index e54700e..472c8b1 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -2018,11 +2018,9 @@ range_entry_cmp (const void *a, const void *b)
 
   if (p->idx < q->idx)
 return -1;
-  else
-{
-  gcc_checking_assert (p->idx > q->idx);
-  return 1;
-}
+  else if (p->idx > q->idx)
+return 1;
+  return 0;
 }
 
 /* Helper routine of optimize_range_test.
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 4c566c8..7755aaa 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -2643,9 +2643,10 @@ dr_group_sort_cmp (const void *dra_, const void *drb_)
 
   /* Then sort after DR_INIT.  In case of identical DRs sort after stmt UID.  */
   cmp = tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb));
-  if (cmp == 0)
-return gimple_uid (DR_STMT (dra)) <

[PATCH 5/5] Fix intransitive comparison in dr_group_sort_cmp


That's an interesting one.  The original comparison function assumes that
operand_equal_p (a, b) is true iff compare_tree (a, b) == 0.
Unfortunately that does not hold (the two functions were written
independently by different authors).

This causes a subtle violation of transitivity.

I believe removing operand_equal_p should preserve the intended semantics
(same approach taken in another comparison function in this file - 
comp_dr_with_seg_len_pair).
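
To make the failure mode concrete, here is a toy model (all names
hypothetical, nothing here is the vectorizer code).  The "equality" test and
the ordering function disagree, and the ordering key is only consulted when
the equality test says the elements differ.  Three elements are then enough
to form a cycle, and dropping the guard restores a plain lexicographic order.

```c
#include <assert.h>

/* Hypothetical record: BASE is the sort key, TAG drives a separately
   written "equality" test, UID is the final tie-breaker.  */
struct ref
{
  int base;
  int tag;
  int uid;
};

/* Stand-in for an equality predicate that does not agree with
   compare_base.  */
static int
loosely_equal_p (const struct ref *a, const struct ref *b)
{
  return a->tag == b->tag;
}

static int
compare_base (const struct ref *a, const struct ref *b)
{
  return (a->base > b->base) - (a->base < b->base);
}

/* Buggy shape: the ordering key is used only when the unrelated
   equality predicate reports a difference.  */
static int
cmp_guarded (const struct ref *a, const struct ref *b)
{
  if (!loosely_equal_p (a, b))
    {
      int cmp = compare_base (a, b);
      if (cmp != 0)
        return cmp;
    }
  return (a->uid > b->uid) - (a->uid < b->uid);
}

/* Fixed shape: always use the ordering function, then the tie-breaker.  */
static int
cmp_unguarded (const struct ref *a, const struct ref *b)
{
  int cmp = compare_base (a, b);
  if (cmp != 0)
    return cmp;
  return (a->uid > b->uid) - (a->uid < b->uid);
}

static void
check_cycle (void)
{
  struct ref a = { 1, 0, 3 }, b = { 3, 0, 1 }, c = { 2, 1, 2 };

  /* Guarded version: B < A and A < C, yet C < B.  */
  assert (cmp_guarded (&b, &a) < 0);
  assert (cmp_guarded (&a, &c) < 0);
  assert (cmp_guarded (&c, &b) < 0);

  /* Unguarded version: A < C < B, consistently.  */
  assert (cmp_unguarded (&a, &c) < 0);
  assert (cmp_unguarded (&c, &b) < 0);
  assert (cmp_unguarded (&a, &b) < 0);
}
```

Whether compare_tree alone preserves the intended semantics in the real
function is exactly the question for the authors; the sketch only shows why
the guarded shape is unsafe.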


Cc-ing Cong Hou and Richard who are the authors.

/Yury
From 7fb1fd8b2027a3a3e2d914f8bd000fe53bffe110 Mon Sep 17 00:00:00 2001
From: Yury Gribov 
Date: Sun, 13 Dec 2015 01:30:22 +0300
Subject: [PATCH 5/5] Fix intransitive comparison in dr_group_sort_cmp.

2015-12-17  Yury Gribov  

	* tree-vect-data-refs.c (dr_group_sort_cmp):
	Make transitive.

Error message:
Dec 10 22:28:59 yugr-ubuntu1404 : cc1plus[23983]: qsort: comparison function is not transitive (comparison function 0xddbbf0 (/home/yugr/build/gcc-4.9.3-patched-bootstrap/gcc/cc1plus+9dbbf0), called from 0xddd233 (/home/yugr/build/gcc-4.9.3-patched-bootstrap/gcc/cc1plus+9dd233), cmdline is "/home/yugr/build/gcc-4.9.3-patched-bootstrap/gcc/testsuite/g++/../../cc1plus -quiet -nostdinc++ -I /home/yugr/build/gcc-4.9.3-patched-bootstrap/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu -I /home/yugr/build/gcc-4.9.3-patched-bootstrap/x86_64-unknown-linux-gnu/libstdc++-v3/include -I /home/yugr/src/gcc-4.9.3-patched/libstdc++-v3/libsupc++ -I /home/yugr/src/gcc-4.9.3-patched/libstdc++-v3/include/backward -I /home/yugr/src/gcc-4.9.3-patched/libstdc++-v3/testsuite/util -imultiarch x86_64-linux-gnu -iprefix /home/yugr/install/gcc-4.9.3/lib/gcc/x86_64-unknown-linux-gnu/4.9.3/ -isystem /home/yugr/build/gcc-4.9.3-patched-bootstrap/gcc/testsuite/g++/../../include -isystem /home/yugr/build/gcc-4.9.3-patched-bootstrap/gcc/testsuite/g++/../../include-fixed -D_GNU_SOURCE /home/yugr/src/gcc-4.9.3-patched/gcc/testsuite/g++.dg/vect/pr43771.cc -quiet -dumpbase pr43771.cc -msse2 -mtune=generic -march=x86-64 -auxbase-strip pr43771.s -O2 -std=c++1y -fno-diagnostics-show-caret -fdiagnostics-color=never -fmessage-length=0 -ftree-vectorize -fvect-cost-model=unlimited -fdump-tree-vect-details -o pr43771.s")
---
 gcc/tree-vect-data-refs.c | 39 +--
 1 file changed, 13 insertions(+), 26 deletions(-)

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 7755aaa..e69875a 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -2604,42 +2604,29 @@ dr_group_sort_cmp (const void *dra_, const void *drb_)
 return loopa->num < loopb->num ? -1 : 1;
 
   /* Ordering of DRs according to base.  */
-  if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0))
-{
-  cmp = compare_tree (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb));
-  if (cmp != 0)
-return cmp;
-}
+  cmp = compare_tree (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb));
+  if (cmp != 0)
+return cmp;
 
   /* And according to DR_OFFSET.  */
-  if (!dr_equal_offsets_p (dra, drb))
-{
-  cmp = compare_tree (DR_OFFSET (dra), DR_OFFSET (drb));
-  if (cmp != 0)
-return cmp;
-}
+  cmp = compare_tree (DR_OFFSET (dra), DR_OFFSET (drb));
+  if (cmp != 0)
+return cmp;
 
   /* Put reads before writes.  */
   if (DR_IS_READ (dra) != DR_IS_READ (drb))
 return DR_IS_READ (dra) ? -1 : 1;
 
   /* Then sort after access size.  */
-  if (!operand_equal_p (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dra))),
-			TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (drb))), 0))
-{
-  cmp = compare_tree (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dra))),
-  TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (drb))));
-  if (cmp != 0)
-return cmp;
-}
+  cmp = compare_tree (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dra))),
+  TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (drb))));
+  if (cmp != 0)
+return cmp;
 
   /* And after step.  */
-  if (!operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0))
-{
-  cmp = compare_tree (DR_STEP (dra), DR_STEP (drb));
-  if (cmp != 0)
-return cmp;
-}
+  cmp = compare_tree (DR_STEP (dra), DR_STEP (drb));
+  if (cmp != 0)
+return cmp;
 
   /* Then sort after DR_INIT.  In case of identical DRs sort after stmt UID.  */
   cmp = tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb));
-- 
1.9.1

[PATCH, ARM 4/6] Factor out MOVW/MOVT availability and desirability checks

Hi,

This patch is part of a patch series to add support for ARMv8-M[1] to GCC. This 
specific patch factors out the checks for MOVW/MOVT availability and whether to 
use it. To this end, the new macro TARGET_HAVE_MOVT is introduced and code is 
modified to use it or the existing TARGET_USE_MOVT as needed.

[1] For a quick overview of ARMv8-M please refer to the initial cover letter.
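
For readers less familiar with the instruction pair: MOVW loads a 16-bit
immediate into the low half of a register and clears the high half, while
MOVT writes the high half and leaves the low half untouched.  The following
is a rough model in plain C, with hypothetical helper names, of the
"fits in a single MOVW" test and of how a 32-bit constant splits across the
pair.  It is a sketch of the idea, not compiler code.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if VALUE fits the 16-bit immediate of a single MOVW.  */
static int
fits_movw_p (uint32_t value)
{
  return (value & 0xffff0000u) == 0;
}

/* Model of MOVW: set the low half, clear the high half.  */
static uint32_t
model_movw (uint16_t imm)
{
  return imm;
}

/* Model of MOVT: replace the high half, keep the low half.  */
static uint32_t
model_movt (uint32_t reg, uint16_t imm)
{
  return (reg & 0x0000ffffu) | ((uint32_t) imm << 16);
}

/* Build VALUE the way a MOVW/MOVT pair would.  */
static uint32_t
load_with_movw_movt (uint32_t value)
{
  uint32_t reg = model_movw ((uint16_t) (value & 0xffffu));
  return model_movt (reg, (uint16_t) (value >> 16));
}

static void
check_movw_movt (void)
{
  assert (fits_movw_p (0x1234u));
  assert (!fits_movw_p (0x12345678u));
  assert (load_with_movw_movt (0x12345678u) == 0x12345678u);
}
```

In these terms, availability asks whether the instructions exist on the
target at all, while desirability additionally asks whether the pair should
be preferred over a literal-pool load.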

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2015-11-09  Thomas Preud'homme  

* config/arm/arm.h (TARGET_USE_MOVT): Check MOVT/MOVW availability
with TARGET_HAVE_MOVT.
(TARGET_HAVE_MOVT): Define.
* config/arm/arm.c (const_ok_for_op): Check MOVT/MOVW
availability with TARGET_HAVE_MOVT.
* config/arm/arm.md (arm_movt): Use TARGET_HAVE_MOVT to check movt
availability.
(addsi splitter): Use TARGET_USE_MOVT to check whether to use
movt + movw.
(symbol_refs movsi splitter): Remove TARGET_32BIT check.
(arm_movtas_ze): Use TARGET_HAVE_MOVT to check movt availability.
* config/arm/constraints.md (define_constraint "j"): Use
TARGET_HAVE_MOVT to check movt availability.


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index fed3205..1831d01 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -233,7 +233,7 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 
 /* Should MOVW/MOVT be used in preference to a constant pool.  */
 #define TARGET_USE_MOVT \
-  (arm_arch_thumb2 \
+  (TARGET_HAVE_MOVT \
&& (arm_disable_literal_pool \
|| (!optimize_size && !current_tune->prefer_constant_pool)))
 
@@ -268,6 +268,9 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 /* Nonzero if this chip supports load-acquire and store-release.  */
 #define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8)
 
+/* Nonzero if this chip provides the movw and movt instructions.  */
+#define TARGET_HAVE_MOVT   (arm_arch_thumb2)
+
 /* Nonzero if integer division instructions supported.  */
 #define TARGET_IDIV((TARGET_ARM && arm_arch_arm_hwdiv) \
 || (TARGET_THUMB2 && arm_arch_thumb_hwdiv))
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 62287bc..ec5197a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3851,7 +3851,7 @@ const_ok_for_op (HOST_WIDE_INT i, enum rtx_code code)
 {
 case SET:
   /* See if we can use movw.  */
-  if (arm_arch_thumb2 && (i & 0xffff0000) == 0)
+  if (TARGET_HAVE_MOVT && (i & 0xffff0000) == 0)
return 1;
   else
/* Otherwise, try mvn.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 8ebb1bf..78dafa0 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5736,7 +5736,7 @@
   [(set (match_operand:SI 0 "nonimmediate_operand" "=r")
(lo_sum:SI (match_operand:SI 1 "nonimmediate_operand" "0")
   (match_operand:SI 2 "general_operand"  "i")))]
-  "arm_arch_thumb2 && arm_valid_symbolic_address_p (operands[2])"
+  "TARGET_HAVE_MOVT && arm_valid_symbolic_address_p (operands[2])"
   "movt%?\t%0, #:upper16:%c2"
   [(set_attr "predicable" "yes")
(set_attr "predicable_short_it" "no")
@@ -5796,8 +5796,7 @@
   [(set (match_operand:SI 0 "arm_general_register_operand" "")
(const:SI (plus:SI (match_operand:SI 1 "general_operand" "")
   (match_operand:SI 2 "const_int_operand" ""))))]
-  "TARGET_THUMB2
-   && arm_disable_literal_pool
+  "TARGET_USE_MOVT
&& reload_completed
&& GET_CODE (operands[1]) == SYMBOL_REF"
   [(clobber (const_int 0))]
@@ -5827,8 +5826,7 @@
 (define_split
   [(set (match_operand:SI 0 "arm_general_register_operand" "")
(match_operand:SI 1 "general_operand" ""))]
-  "TARGET_32BIT
-   && TARGET_USE_MOVT && GET_CODE (operands[1]) == SYMBOL_REF
+  "TARGET_USE_MOVT && GET_CODE (operands[1]) == SYMBOL_REF
&& !flag_pic && !target_word_relocations
&& !arm_tls_referenced_p (operands[1])"
   [(clobber (const_int 0))]
@@ -11030,7 +11028,7 @@
(const_int 16)
(const_int 16))
 (match_operand:SI 1 "const_int_operand" ""))]
-  "arm_arch_thumb2"
+  "TARGET_HAVE_MOVT"
   "movt%?\t%0, %L1"
  [(set_attr "predicable" "yes")
   (set_attr "predicable_short_it" "no")
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index d01a918..838e031 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -67,7 +67,7 @@
 
 (define_constraint "j"
  "A constant suitable for a MOVW instruction. (ARM/Thumb-2)"
- (and (match_test "TARGET_32BIT && arm_arch_thumb2")
+ (and (match_test "TARGET_HAVE_MOVT")
   (ior (and (match_code "high")
(match_test "arm_valid_symbolic_address_p (XEXP (op, 0))"))
   (and (match_code "const_int")


Testing:

* Toolchain was built successfully with the following multilib list: 
armv6-m,armv7-m,armv7e-m,cortex-m7. The

Re: [PATCH, testsuite] Fix PR68629: attr-simd-3.c failure on arm-none-eabi targets

2015-12-17 Jakub Jelinek

On Thu, Dec 17, 2015 at 09:02:39AM +0100, Thomas Schwinge wrote:
> On Tue, 15 Dec 2015 17:44:59 +0100, I wrote:
> > On Wed, 9 Dec 2015 17:56:13 +0800, "Thomas Preud'homme" 
> >  wrote:
> > > c-c++-common/attr-simd-3.c fails to compile on arm-none-eabi targets due 
> > > to -fcilkplus needing -pthread which is not available for those targets. 
> > > This patch solves this issue by adding a condition to the cilkplus 
> > > effective target that compiling with -fcilkplus succeeds and requires 
> > > cilkplus as an effective target for attr-simd-3.c testcase.
> > 
> > > PR testsuite/68629
> > > * lib/target-supports.exp (check_effective_target_cilkplus): Also
> > > check that compiling with -fcilkplus does not give an error.
> > > * c-c++-common/attr-simd-3.c: Require cilkplus effective target.
> > 
> > > --- a/gcc/testsuite/lib/target-supports.exp
> > > +++ b/gcc/testsuite/lib/target-supports.exp
> > > @@ -1432,7 +1432,12 @@ proc check_effective_target_cilkplus { } {
> > >  if { [istarget avr-*-*] } {
> > >   return 0;
> > >  }
> > > -return 1
> > > +return [ check_no_compiler_messages_nocache fcilkplus_available 
> > > executable {
> > > + #ifdef __cplusplus
> > > + extern "C"
> > > + #endif
> > > + int dummy;
> > > + } "-fcilkplus" ]
> > >  }

That change has been obviously bad.  If anything, you want to make it
compile time only, i.e. check_no_compiler_messages_nocache fcilkplus_available 
assembly
Just look at cilk-plus.exp:
It checks check_effective_target_cilkplus, and performs lots of tests if
it returns true, and then checks check_libcilkrts_available and performs
further tests.
So, if any use of -fcilkplus fails on your target, then putting it
into check_effective_target_cilkplus is fine, you won't lose any Cilk+
testing that way.  Otherwise, if the failure is conditional on the
constructs used, say array notation is fine but _Cilk_for is not, then even
that is wrong.

In any case, IMHO the attr-simd-3.c test just should be moved into
c-c++-common/cilk-plus/SE/ directory.

Jakub

RE: [PATCH, testsuite] Fix PR68629: attr-simd-3.c failure on arm-none-eabi targets

Hi,

> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Thursday, December 17, 2015 4:26 PM
> > >
> > > > --- a/gcc/testsuite/lib/target-supports.exp
> > > > +++ b/gcc/testsuite/lib/target-supports.exp
> > > > @@ -1432,7 +1432,12 @@ proc check_effective_target_cilkplus { } {
> > > >  if { [istarget avr-*-*] } {
> > > > return 0;
> > > >  }
> > > > -return 1
> > > > +return [ check_no_compiler_messages_nocache
> fcilkplus_available executable {
> > > > +   #ifdef __cplusplus
> > > > +   extern "C"
> > > > +   #endif
> > > > +   int dummy;
> > > > +   } "-fcilkplus" ]
> > > >  }
> 
> That change has been obviously bad.  If anything, you want to make it
> compile time only, i.e. check_no_compiler_messages_nocache
> fcilkplus_available assembly

Indeed, I failed to parse the space and didn't realize the kind of testing 
could be selected.

> Just look at cilk-plus.exp:
> It checks check_effective_target_cilkplus, and performs lots of tests if
> it returns true, and then checks check_libcilkrts_available and performs
> further tests.
> So, if any use of -fcilkplus fails on your target, then putting it
> into check_effective_target_cilkplus is fine, you won't lose any Cilk+
> testing that way.  Otherwise, if it is conditional say only some constructs,
> say array notation is fine, but _Cilk_for is not, then even that is wrong.

Ok. When I saw the very small list of targets for which the condition returned 
true, I thought the goal was only to check if the target *could* support 
cilkplus and that actual support was tested by cilk-plus.exp. I'll revert this 
commit and prepare a patch to add arm in that list.

> 
> In any case, IMHO the attr-simd-3.c test just should be moved into
> c-c++-common/cilk-plus/SE/ directory.

That was my thought initially but then I changed my mind, thinking that the 
test was placed there for a reason. I'll prepare a third patch to do that.

My apologies for the breakage.

Best regards,

Thomas

RE: [PATCH, testsuite] Fix PR68629: attr-simd-3.c failure on arm-none-eabi targets

Reverted now.

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Wednesday, December 09, 2015 5:56 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH, testsuite] Fix PR68629: attr-simd-3.c failure on arm-
> none-eabi targets
> 
> c-c++-common/attr-simd-3.c fails to compile on arm-none-eabi targets
> due to -fcilkplus needing -pthread which is not available for those targets.
> This patch solves this issue by adding a condition to the cilkplus effective
> target that compiling with -fcilkplus succeeds and requires cilkplus as an
> effective target for attr-simd-3.c testcase.
> 
> ChangeLog entry is as follows:
> 
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2015-12-08  Thomas Preud'homme  
> 
> PR testsuite/68629
> * lib/target-supports.exp (check_effective_target_cilkplus): Also
> check that compiling with -fcilkplus does not give an error.
> * c-c++-common/attr-simd-3.c: Require cilkplus effective target.
> 
> 
> diff --git a/gcc/testsuite/c-c++-common/attr-simd-3.c b/gcc/testsuite/c-
> c++-common/attr-simd-3.c
> index d61ba82..1970c67 100644
> --- a/gcc/testsuite/c-c++-common/attr-simd-3.c
> +++ b/gcc/testsuite/c-c++-common/attr-simd-3.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-require-effective-target "cilkplus" } */
>  /* { dg-options "-fcilkplus" } */
>  /* { dg-prune-output "undeclared here \\(not in a
> function\\)|\[^\n\r\]* was not declared in this scope" } */
> 
> diff --git a/gcc/testsuite/lib/target-supports.exp
> b/gcc/testsuite/lib/target-supports.exp
> index 4e349e9..95b903c 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -1432,7 +1432,12 @@ proc check_effective_target_cilkplus { } {
>  if { [istarget avr-*-*] } {
>   return 0;
>  }
> -return 1
> +return [ check_no_compiler_messages_nocache fcilkplus_available
> executable {
> + #ifdef __cplusplus
> + extern "C"
> + #endif
> + int dummy;
> + } "-fcilkplus" ]
>  }
> 
>  proc check_linker_plugin_available { } {
> 
> 
> Testsuite shows no regression when run with
>   + an arm-none-eabi GCC cross-compiler targeting Cortex-M3
>   + a bootstrapped x86_64-linux-gnu GCC native compiler
> 
> Is this ok for trunk?
> 
> Best regards,
> 
> Thomas
>

[PATCH 3/5] "Fix" intransitive comparison in reload_pseudo_compare_func

This patch fixes an intransitive comparison in reload_pseudo_compare_func.
Imagine the following situation:
1) bitmap_bit_p is unset for A and B but set for C
2) A < B (due to early ira_reg_class_max_nregs comparison)
3) B < C (due to following regno_assign_info comparison)

It may then easily happen that A > C (due to regno_assign_info 
comparison) which violates the transitivity requirement of a total order.
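
The situation can be reproduced in a toy model.  The struct below is
hypothetical and only mirrors the roles of the real data: "nregs" stands for
the early ira_reg_class_max_nregs key, "freq" for the later
regno_assign_info key, and "flagged" for the bitmap bit.  A brute-force
search over triples then finds the violation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pseudo-register record; not the LRA data structures.  */
struct pseudo
{
  int nregs;    /* first key, consulted only when both flags are clear */
  int freq;     /* second key */
  int flagged;  /* plays the role of the bitmap bit */
};

/* Comparator whose first key is skipped when either element is flagged.  */
static int
cmp_conditional (const struct pseudo *a, const struct pseudo *b)
{
  if (a->nregs != b->nregs && !a->flagged && !b->flagged)
    return (a->nregs > b->nregs) - (a->nregs < b->nregs);
  return (a->freq > b->freq) - (a->freq < b->freq);
}

/* Search for i, j, k with v[i] < v[j] and v[j] < v[k] but not v[i] < v[k].
   Return 1 and store the indices in OUT if found, 0 otherwise.  */
static int
find_intransitive_triple (const struct pseudo *v, size_t n, size_t out[3])
{
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      for (size_t k = 0; k < n; k++)
        if (cmp_conditional (&v[i], &v[j]) < 0
            && cmp_conditional (&v[j], &v[k]) < 0
            && cmp_conditional (&v[i], &v[k]) >= 0)
          {
            out[0] = i;
            out[1] = j;
            out[2] = k;
            return 1;
          }
  return 0;
}

static void
check_scenario (void)
{
  /* A and B are unflagged, C is flagged.  */
  struct pseudo v[3] = { { 1, 5, 0 }, { 2, 1, 0 }, { 1, 3, 1 } };
  size_t t[3];

  assert (cmp_conditional (&v[0], &v[1]) < 0);  /* A < B via nregs */
  assert (cmp_conditional (&v[1], &v[2]) < 0);  /* B < C via freq */
  assert (cmp_conditional (&v[0], &v[2]) > 0);  /* A > C via freq */
  assert (find_intransitive_triple (v, 3, t));
}
```

The cubic search is only practical for small arrays, but it is a cheap way
to confirm a suspected triple.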


Unfortunately I'm not sure how to properly fix this so Cc-ing Vladimir 
for help.


/Yury
From 83da5d11c4f013dd14c1ea0c1722c108d80f58ed Mon Sep 17 00:00:00 2001
From: Yury Gribov 
Date: Sat, 12 Dec 2015 10:08:45 +0300
Subject: [PATCH 3/5] "Fix" intransitive comparison in
 reload_pseudo_compare_func.

2015-12-17  Yury Gribov  

	* lra-assigns.c (reload_pseudo_compare_func):
	Make transitive.

Error message:
Dec 10 00:33:18 yugr-ubuntu1404 : cc1plus[612]: qsort: comparison function is not transitive (comparison function 0x87bc50 (/home/yugr/build/gcc-4.9.3-patched-bootstrap/gcc/cc1plus+47bc50), called from 0x87d25c (/home/yugr/build/gcc-4.9.3-patched-bootstrap/gcc/cc1plus+47d25c), cmdline is "/home/yugr/build/gcc-4.9.3-patched-bootstrap/./gcc/cc1plus -quiet -nostdinc++ -I . -I /home/yugr/src/gcc-4.9.3-patched/libsanitizer/tsan -I .. -I /home/yugr/src/gcc-4.9.3-patched/libsanitizer -I /home/yugr/src/gcc-4.9.3-patched/libsanitizer/include -I ../../libstdc++-v3/include -I ../../libstdc++-v3/include/x86_64-unknown-linux-gnu -I /home/yugr/src/gcc-4.9.3-patched/libsanitizer/../libstdc++-v3/libsupc++ -imultiarch x86_64-linux-gnu -iprefix /home/yugr/build/gcc-4.9.3-patched-bootstrap/gcc/../lib/gcc/x86_64-unknown-linux-gnu/4.9.3/ -isystem /home/yugr/build/gcc-4.9.3-patched-bootstrap/./gcc/include -isystem /home/yugr/build/gcc-4.9.3-patched-bootstrap/./gcc/include-fixed -MD .libs/tsan_interface_atomic.d -MF .deps/tsan_interface_atomic.Tpo -MP -MT tsan_interface_atomic.lo -D_GNU_SOURCE -D _GNU_SOURCE -D _DEBUG -D __STDC_CONSTANT_MACROS -D __STDC_FORMAT_MACROS -D __STDC_LIMIT_MACROS -D _GNU_SOURCE -D PIC -isystem /home/yugr/install/gcc-4.9.3/x86_64-unknown-linux-gnu/include -isystem /home/yugr/install/gcc-4.9.3/x86_64-unknown-linux-gnu/sys-include /home/yugr/src/gcc-4.9.3-patched/libsanitizer/tsan/tsan_interface_atomic.cc -quiet -dumpbase tsan_interface_atomic.cc -mtune=generic -march=x86-64 -auxbase-strip .libs/tsan_interface_atomic.o -g -O2 -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wpedantic -Wno-long-long -Wno-variadic-macros -fno-builtin -fno-exceptions -fno-rtti -fomit-frame-pointer -funwind-tables -fvisibility=hidden -fPIC -o /tmp/cc3IPd7A.s")
---
 gcc/lra-assigns.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
index 2a9ff21..94f3e66 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -208,12 +208,7 @@ reload_pseudo_compare_func (const void *v1p, const void *v2p)
 return diff;
   if ((diff
= (ira_reg_class_max_nregs[cl2][lra_reg_info[r2].biggest_mode]
-	  - ira_reg_class_max_nregs[cl1][lra_reg_info[r1].biggest_mode])) != 0
-  /* The code below executes rarely as nregs == 1 in most cases.
-	 So we should not worry about using faster data structures to
-	 check reload pseudos.  */
-  && ! bitmap_bit_p (_reload_pseudos, r1)
-  && ! bitmap_bit_p (_reload_pseudos, r2))
+	  - ira_reg_class_max_nregs[cl1][lra_reg_info[r1].biggest_mode])) != 0)
 return diff;
   if ((diff = (regno_assign_info[regno_assign_info[r2].first].freq
 	   - regno_assign_info[regno_assign_info[r1].first].freq)) != 0)
-- 
1.9.1

[PATCH 4/5] Fix intransitive comparison in compare_access_positions

Another intransitive comparison, this time in compare_access_positions.
Buggy scenario:

1) A and B are ints of equal precision so we return 0
2) C is REAL and thus can compare differently to A and B
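
A toy model of the scenario, with hypothetical types that do not correspond
to GCC trees: two integral types of equal precision compare equal before the
UID tie-breaker is reached, while a third, non-integral type is ordered
against each of them by UID and may land between them.  The second
comparator is just one consistent ordering for this toy model (class first,
then precision, then UID); it is not what the patch does.

```c
#include <assert.h>

/* Hypothetical type descriptor; not the GCC tree representation.  */
struct type_like
{
  int integral;   /* nonzero for integer types */
  int precision;  /* meaningful only when INTEGRAL is set */
  int uid;        /* unique id used to stabilize the sort */
};

/* Buggy shape: two integral types of equal precision compare equal
   without ever reaching the UID tie-breaker.  */
static int
cmp_early_zero (const struct type_like *a, const struct type_like *b)
{
  if (a->integral && b->integral)
    return b->precision - a->precision;
  return a->uid - b->uid;
}

/* One consistent alternative for this toy model: integral types first,
   then bigger precision first, then UID.  */
static int
cmp_lexicographic (const struct type_like *a, const struct type_like *b)
{
  if (!a->integral != !b->integral)
    return !!b->integral - !!a->integral;
  if (a->integral && a->precision != b->precision)
    return b->precision - a->precision;
  return a->uid - b->uid;
}

static void
check_equal_pair (void)
{
  struct type_like a = { 1, 32, 1 }, b = { 1, 32, 3 }, c = { 0, 0, 2 };

  /* A and B compare equal, yet C sorts between them.  */
  assert (cmp_early_zero (&a, &b) == 0);
  assert (cmp_early_zero (&a, &c) < 0);
  assert (cmp_early_zero (&b, &c) > 0);

  /* Lexicographic version: A < B < C, consistently.  */
  assert (cmp_lexicographic (&a, &b) < 0);
  assert (cmp_lexicographic (&b, &c) < 0);
  assert (cmp_lexicographic (&a, &c) < 0);
}
```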

Cc-ing Martin who's the original author.

/Yury
From 6f3930ad81945f6b5d7aecfdda16089547a592d3 Mon Sep 17 00:00:00 2001
From: Yury Gribov 
Date: Sat, 12 Dec 2015 10:39:15 +0300
Subject: [PATCH 4/5] Fix intransitive comparison in compare_access_positions.

2015-12-17  Yury Gribov  

	* tree-sra.c (compare_access_positions):
	Make transitive.

Error message:
Dec 10 23:51:43 yugr-ubuntu1404 : f951[31364]: qsort: comparison function is not transitive (comparison function 0x9aa8e0 (/home/yugr/build/gcc-4.9.3-patched-bootstrap/gcc/f951+5aa8e0), called from 0x9afeda (/home/yugr/build/gcc-4.9.3-patched-bootstrap/gcc/f951+5afeda), cmdline is "/home/yugr/build/gcc-4.9.3-patched-bootstrap/gcc/testsuite/gfortran/../../f951 /home/yugr/src/gcc-4.9.3-patched/gcc/testsuite/gfortran.fortran-torture/execute/intrinsic_set_exponent.f90 -quiet -dumpbase intrinsic_set_exponent.f90 -mtune=generic -march=x86-64 -auxbase intrinsic_set_exponent -O1 -w -fno-diagnostics-show-caret -fdiagnostics-color=never -fintrinsic-modules-path /home/yugr/install/gcc-4.9.3/lib/gcc/x86_64-unknown-linux-gnu/4.9.3/finclude -o /tmp/ccwhVAn9.s")
---
 gcc/tree-sra.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index c4fea5b..5028850 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -1432,6 +1432,13 @@ scan_function (void)
   return ret;
 }
 
+static int
+imprecise_int_type_p (const tree type)
+{
+  return INTEGRAL_TYPE_P (type)
+&& (TREE_INT_CST_LOW (TYPE_SIZE (type)) != TYPE_PRECISION (type));
+}
+
 /* Helper of QSORT function. There are pointers to accesses in the array.  An
access is considered smaller than another if it has smaller offset or if the
offsets are the same but is size is bigger. */
@@ -1471,16 +1478,15 @@ compare_access_positions (const void *a, const void *b)
 	return -1;
   /* Put the integral type with the bigger precision first.  */
   else if (INTEGRAL_TYPE_P (f1->type)
-	   && INTEGRAL_TYPE_P (f2->type))
+	   && INTEGRAL_TYPE_P (f2->type)
+	   && TYPE_PRECISION (f2->type) != TYPE_PRECISION (f1->type))
 	return TYPE_PRECISION (f2->type) - TYPE_PRECISION (f1->type);
   /* Put any integral type with non-full precision last.  */
-  else if (INTEGRAL_TYPE_P (f1->type)
-	   && (TREE_INT_CST_LOW (TYPE_SIZE (f1->type))
-		   != TYPE_PRECISION (f1->type)))
+  else if (imprecise_int_type_p (f1->type)
+	   && !imprecise_int_type_p (f2->type))
 	return 1;
-  else if (INTEGRAL_TYPE_P (f2->type)
-	   && (TREE_INT_CST_LOW (TYPE_SIZE (f2->type))
-		   != TYPE_PRECISION (f2->type)))
+  else if (!imprecise_int_type_p (f1->type)
+	   && imprecise_int_type_p (f2->type))
 	return -1;
   /* Stabilize the sort.  */
   return TYPE_UID (f1->type) - TYPE_UID (f2->type);
-- 
1.9.1
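
To make the reported failure concrete, here is a heavily simplified model of the tail of compare_access_positions before and after the patch. The `struct ty` type, its fields and the helper names are assumptions made for this sketch, not GCC data structures, and the earlier checks of the real function are omitted. The sketch only exhibits one triple on which the old logic is intransitive; it makes no claim about other inputs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a GCC type node.  */
struct ty
{
  bool integral;   /* Models INTEGRAL_TYPE_P.  */
  int size;        /* Models TYPE_SIZE in bits.  */
  int precision;   /* Models TYPE_PRECISION.  */
  int uid;         /* Models TYPE_UID.  */
};

bool
imprecise_int_p (const struct ty *t)
{
  return t->integral && t->size != t->precision;
}

/* Tail of the comparison as it was before the patch.  */
int
cmp_old (const struct ty *a, const struct ty *b)
{
  if (a->integral && b->integral)
    return b->precision - a->precision;  /* 0 even for distinct types.  */
  else if (imprecise_int_p (a))
    return 1;
  else if (imprecise_int_p (b))
    return -1;
  return a->uid - b->uid;
}

/* Tail of the comparison as rewritten by the patch.  */
int
cmp_new (const struct ty *a, const struct ty *b)
{
  if (a->integral && b->integral && b->precision != a->precision)
    return b->precision - a->precision;
  else if (imprecise_int_p (a) && !imprecise_int_p (b))
    return 1;
  else if (!imprecise_int_p (a) && imprecise_int_p (b))
    return -1;
  return a->uid - b->uid;
}

/* True if x <= y and y <= z hold but x <= z does not.  */
bool
breaks_transitivity (int (*cmp) (const struct ty *, const struct ty *),
                     const struct ty *x, const struct ty *y,
                     const struct ty *z)
{
  return cmp (x, y) <= 0 && cmp (y, z) <= 0 && cmp (x, z) > 0;
}
```

For example, take two distinct 32-bit integral types with UIDs 1 and 3 and a non-integral type with UID 2. `cmp_old` reports the two integral types as equal, while the UID tie-break places them on opposite sides of the third type; `cmp_new` orders the same three elements consistently by UID.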

Re: [PATCH 1/4] gcc/arc: Fix warning in test




On 16/12/15 00:15, Andrew Burgess wrote:

> Missing function declaration causes a warning, that results in test
> failure.

Ah, this test was affected when the default language was changed to
gnu11 in October last year.

> gcc/testsuite/ChangeLog:
>
> * gcc.target/arc/jump-around-jump.c (rtc_set_time): Declare.

Thanks, I've checked this in.

[PATCH 1/5] Fix asymmetric comparison functions


Some obvious symmetry fixes.

Cc-ing
* Andrey (Belevantsev) for bb_top_order_comparator
* Andrew (MacLeod) for compare_case_labels
* Andrew (Pinski) for resort_field_decl_cmp
* Diego for pair_cmp
* Geoff for resort_method_name_cmp
* Jakub for compare_case_labels
* Jason for method_name_cmp
* Richard for insert_phi_nodes_compare_var_infos, compare_case_labels
* Steven for cmp_v_in_regset_pool

/Yury
From bf924dca4ccc3f8640438400e923a4c508e898e0 Mon Sep 17 00:00:00 2001
From: Yury Gribov 
Date: Sat, 12 Dec 2015 09:51:54 +0300
Subject: [PATCH 1/5] Fix asymmetric comparison functions.

Qsort requires the user-defined comparison function to define
a total order. One requirement for this is symmetry, i.e.
returning inverse results when the two elements are swapped.
This patch fixes comparison functions to satisfy this
condition.

2015-12-17  Yury Gribov  

	* c-family/c-common.c (resort_field_decl_cmp):
	Make symmetric.
	* cp/class.c (method_name_cmp): Ditto.
	(resort_method_name_cmp): Ditto.
	* fortran/interface.c (pair_cmp): Ditto.
	* gimple.c (compare_case_labels): Ditto.
	* tree-into-ssa.c (insert_phi_nodes_compare_var_infos):
	Ditto.
	* tree-vrp.c (compare_case_labels): Ditto.
	* sel-sched-ir.c (cmp_v_in_regset_pool): Ditto.
	(bb_top_order_comparator): Ditto.
---
 gcc/c-family/c-common.c |  4 +++-
 gcc/cp/class.c  | 10 ++
 gcc/fortran/interface.c |  6 +-
 gcc/gimple.c|  4 +++-
 gcc/sel-sched-ir.c  |  5 +++--
 gcc/tree-into-ssa.c |  5 +
 gcc/tree-vrp.c  |  4 +++-
 7 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 9bc02fc..eecdfb5 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -9956,8 +9956,10 @@ resort_field_decl_cmp (const void *x_p, const void *y_p)
 resort_data.new_value (&d2, resort_data.cookie);
 if (d1 < d2)
   return -1;
+if (d1 > d2)
+  return 1;
   }
-  return 1;
+  return 0;
 }
 
 /* Resort DECL_SORTED_FIELDS because pointers have been reordered.  */
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 216a301..3a740d2 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -2188,9 +2188,9 @@ method_name_cmp (const void* m1_p, const void* m2_p)
 return -1;
   if (*m2 == NULL_TREE)
 return 1;
-  if (DECL_NAME (OVL_CURRENT (*m1)) < DECL_NAME (OVL_CURRENT (*m2)))
-return -1;
-  return 1;
+  tree d1 = DECL_NAME (OVL_CURRENT (*m1));
+  tree d2 = DECL_NAME (OVL_CURRENT (*m2));
+  return d1 < d2 ? -1 : d1 > d2 ? 1 : 0;
 }
 
 /* This routine compares two fields like method_name_cmp but using the
@@ -2214,8 +2214,10 @@ resort_method_name_cmp (const void* m1_p, const void* m2_p)
 resort_data.new_value (&d2, resort_data.cookie);
 if (d1 < d2)
   return -1;
+if (d1 > d2)
+  return 1;
   }
-  return 1;
+  return 0;
 }
 
 /* Resort TYPE_METHOD_VEC because pointers have been reordered.  */
diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index bfd5d36..e4b93c8 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -3109,7 +3109,11 @@ pair_cmp (const void *p1, const void *p2)
 }
   if (a2->expr->expr_type != EXPR_VARIABLE)
 return 1;
-  return a1->expr->symtree->n.sym < a2->expr->symtree->n.sym;
+  if (a1->expr->symtree->n.sym < a2->expr->symtree->n.sym)
+return 1;
+  if (a1->expr->symtree->n.sym > a2->expr->symtree->n.sym)
+return -1;
+  return 0;
 }
 
 
diff --git a/gcc/gimple.c b/gcc/gimple.c
index bf552a7..51f515e 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2774,7 +2774,9 @@ compare_case_labels (const void *p1, const void *p2)
   const_tree const case2 = *(const_tree const*)p2;
 
   /* The 'default' case label always goes first.  */
-  if (!CASE_LOW (case1))
+  if (!CASE_LOW (case1) && !CASE_LOW (case2))
+return 0;
+  else if (!CASE_LOW (case1))
 return -1;
   else if (!CASE_LOW (case2))
 return 1;
diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index 2a9aa10..2f53d22 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -959,7 +959,7 @@ cmp_v_in_regset_pool (const void *x, const void *xx)
 return 1;
   else if (r1 < r2)
 return -1;
-  gcc_unreachable ();
+  return 0;
 }
 
 /* Free the regset pool possibly checking for memory leaks.  */
@@ -5935,8 +5935,9 @@ bb_top_order_comparator (const void *x, const void *y)
  bbs with greater number should go earlier.  */
   if (rev_top_order_index[bb1->index] > rev_top_order_index[bb2->index])
 return -1;
-  else
+  else if (rev_top_order_index[bb1->index] < rev_top_order_index[bb2->index])
 return 1;
+  return 0;
 }
 
 /* Create a region for LOOP and return its number.  If we don't want
diff --git a/gcc/tree-into-ssa.c b/gcc/tree-into-ssa.c
index 5486d5c..f3b8c02 100644
--- a/gcc/tree-into-ssa.c
+++ b/gcc/tree-into-ssa.c
@@ -1041,10 +1041,7 @@ insert_phi_nodes_compare_var_infos (const void *a, const void *b)
 {
   const var_info *defa = *(var_info * const *)a;

[arm-embedded][PATCH, ARM 4/6] Factor out MOVW/MOVT availability and desirability checks

Hi,

We decided to apply the following patch to the ARM embedded 5 branch.

Best regards,

Thomas

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Thursday, December 17, 2015 3:59 PM
> To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan;
> Kyrylo Tkachov
> Subject: [PATCH, ARM 4/6] Factor out MOVW/MOVT availability and
> desirability checks
> 
> Hi,
> 
> This patch is part of a patch series to add support for ARMv8-M[1] to GCC.
> This specific patch factors out the checks for MOVW/MOVT availability
> and whether to use it. To this end, the new macro TARGET_HAVE_MOVT
> is introduced and code is modified to use it or the existing
> TARGET_USE_MOVT as needed.
> 
> [1] For a quick overview of ARMv8-M please refer to the initial cover
> letter.
> 
> ChangeLog entry is as follows:
> 
> 
> *** gcc/ChangeLog ***
> 
> 2015-11-09  Thomas Preud'homme  
> 
> * config/arm/arm.h (TARGET_USE_MOVT): Check MOVT/MOVW
> availability
> with TARGET_HAVE_MOVT.
> (TARGET_HAVE_MOVT): Define.
> * config/arm/arm.c (const_ok_for_op): Check MOVT/MOVW
> availability with TARGET_HAVE_MOVT.
> * config/arm/arm.md (arm_movt): Use TARGET_HAVE_MOVT to
> check movt
> availability.
> (addsi splitter): Use TARGET_USE_MOVT to check whether to use
> movt + movw.
> (symbol_refs movsi splitter): Remove TARGET_32BIT check.
> (arm_movtas_ze): Use TARGET_HAVE_MOVT to check movt
> availability.
> * config/arm/constraints.md (define_constraint "j"): Use
> TARGET_HAVE_MOVT to check movt availability.
> 
> 
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index fed3205..1831d01 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -233,7 +233,7 @@ extern void
> (*arm_lang_output_object_attributes_hook)(void);
> 
>  /* Should MOVW/MOVT be used in preference to a constant pool.  */
>  #define TARGET_USE_MOVT \
> -  (arm_arch_thumb2 \
> +  (TARGET_HAVE_MOVT \
> && (arm_disable_literal_pool \
> || (!optimize_size && !current_tune->prefer_constant_pool)))
> 
> @@ -268,6 +268,9 @@ extern void
> (*arm_lang_output_object_attributes_hook)(void);
>  /* Nonzero if this chip supports load-acquire and store-release.  */
>  #define TARGET_HAVE_LDACQ (TARGET_ARM_ARCH >= 8)
> 
> +/* Nonzero if this chip provides the movw and movt instructions.  */
> +#define TARGET_HAVE_MOVT (arm_arch_thumb2)
> +
>  /* Nonzero if integer division instructions supported.  */
>  #define TARGET_IDIV  ((TARGET_ARM && arm_arch_arm_hwdiv) \
>|| (TARGET_THUMB2 &&
> arm_arch_thumb_hwdiv))
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 62287bc..ec5197a 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -3851,7 +3851,7 @@ const_ok_for_op (HOST_WIDE_INT i, enum
> rtx_code code)
>  {
>  case SET:
>/* See if we can use movw.  */
> -  if (arm_arch_thumb2 && (i & 0xffff0000) == 0)
> +  if (TARGET_HAVE_MOVT && (i & 0xffff0000) == 0)
>   return 1;
>else
>   /* Otherwise, try mvn.  */
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 8ebb1bf..78dafa0 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -5736,7 +5736,7 @@
>[(set (match_operand:SI 0 "nonimmediate_operand" "=r")
>   (lo_sum:SI (match_operand:SI 1 "nonimmediate_operand" "0")
>  (match_operand:SI 2 "general_operand"  "i")))]
> -  "arm_arch_thumb2 && arm_valid_symbolic_address_p (operands[2])"
> +  "TARGET_HAVE_MOVT && arm_valid_symbolic_address_p
> (operands[2])"
>"movt%?\t%0, #:upper16:%c2"
>[(set_attr "predicable" "yes")
> (set_attr "predicable_short_it" "no")
> @@ -5796,8 +5796,7 @@
>[(set (match_operand:SI 0 "arm_general_register_operand" "")
>   (const:SI (plus:SI (match_operand:SI 1 "general_operand" "")
>  (match_operand:SI 2 "const_int_operand"
> ""))))]
> -  "TARGET_THUMB2
> -   && arm_disable_literal_pool
> +  "TARGET_USE_MOVT
> && reload_completed
> && GET_CODE (operands[1]) == SYMBOL_REF"
>[(clobber (const_int 0))]
> @@ -5827,8 +5826,7 @@
>  (define_split
>[(set (match_operand:SI 0 "arm_general_register_operand" "")
> (match_operand:SI 1 "general_operand" ""))]
> -  "TARGET_32BIT
> -   && TARGET_USE_MOVT && GET_CODE (operands[1]) ==
> SYMBOL_REF
> +  "TARGET_USE_MOVT && GET_CODE (operands[1]) == SYMBOL_REF
> && !flag_pic && !target_word_relocations
> && !arm_tls_referenced_p (operands[1])"
>[(clobber (const_int 0))]
> @@ -11030,7 +11028,7 @@
> (const_int 16)
> (const_int 16))
>  (match_operand:SI 1 "const_int_operand" ""))]
> -  "arm_arch_thumb2"
> +  "TARGET_HAVE_MOVT"
>"movt%?\t%0, %L1"
>   [(set_attr "predicable" "yes")
>(set_attr

[PATCH, ARM 5/6] Add support for MOVT/MOVW to ARMv8-M Baseline

Hi,

This patch is part of a patch series to add support for ARMv8-M[1] to GCC. This 
specific patch makes the compiler start generating code with the new MOVT/MOVW 
instructions for ARMv8-M Baseline.

[1] For a quick overview of ARMv8-M please refer to the initial cover letter.
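
As a rough illustration of what the two instructions do (a sketch in plain C, not compiler code; all helper names are made up for this example): MOVW writes a 16-bit immediate into the low half of a register and zeroes the high half, and MOVT then writes a 16-bit immediate into the high half while leaving the low half unchanged. A 32-bit constant therefore needs MOVW alone when its top 16 bits are zero, and a MOVW/MOVT pair otherwise.

```c
#include <stdint.h>

/* Immediate that MOVW would carry: the low 16 bits of VALUE.  */
uint16_t
movw_imm (uint32_t value)
{
  return (uint16_t) (value & 0xffffu);
}

/* Immediate that MOVT would carry: the high 16 bits of VALUE.  */
uint16_t
movt_imm (uint32_t value)
{
  return (uint16_t) (value >> 16);
}

/* Nonzero if VALUE can be loaded by MOVW alone (top half is zero).  */
int
movw_only_p (uint32_t value)
{
  return (value & 0xffff0000u) == 0;
}

/* Model the register contents after "movw rd, #LO" then "movt rd, #HI".  */
uint32_t
apply_movw_movt (uint16_t lo, uint16_t hi)
{
  uint32_t reg = lo;                                /* MOVW: zero-extend.  */
  reg = (reg & 0xffffu) | ((uint32_t) hi << 16);    /* MOVT: set top half.  */
  return reg;
}
```

With these helpers, the constant 0x12345678 splits into a MOVW immediate of 0x5678 and a MOVT immediate of 0x1234, whereas 0x1234 passes the `movw_only_p` test and needs a single instruction.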

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2015-11-13  Thomas Preud'homme  

* config/arm/arm.h (TARGET_HAVE_MOVT): Include ARMv8-M as having MOVT.
* config/arm/arm.c (arm_arch_name): (const_ok_for_op): Check MOVT/MOVW
availability with TARGET_HAVE_MOVT.
(thumb_legitimate_constant_p): Legalize high part of a label_ref as a
constant.
(thumb1_rtx_costs): Also return 0 if setting a half word constant and
movw is available.
(thumb1_size_rtx_costs): Make set of half word constant also cost 1
extra instruction if MOVW is available.  Make constant with bottom half
word zero cost 2 instruction if MOVW is available.
* config/arm/arm.md (define_attr "arch"): Add v8mb.
(define_attr "arch_enabled"): Set to yes if arch value is v8mb and
target is ARMv8-M Baseline.
* config/arm/thumb1.md (thumb1_movdi_insn): Add ARMv8-M Baseline only
alternative for constants satisfying j constraint.
(thumb1_movsi_insn): Likewise.
(movsi splitter for K alternative): Tighten condition to not trigger
if movt is available and j constraint is satisfied.
(Pe immediate splitter): Likewise.
(thumb1_movhi_insn): Add ARMv8-M Baseline only alternative for
constant fitting in an halfword to use movw.
* doc/sourcebuild.texi (arm_thumb1_movt_ko): Document new ARM
effective target.


*** gcc/testsuite/ChangeLog ***

2015-11-13  Thomas Preud'homme  

* lib/target-supports.exp (check_effective_target_arm_thumb1_movt_ko):
Define effective target.
* gcc.target/arm/pr42574.c: Require arm_thumb1_movt_ko instead of
arm_thumb1_ok as effective target to exclude ARMv8-M Baseline.


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index ff3cfcd..015df50 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -261,7 +261,7 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8 && arm_arch_notm)
 
 /* Nonzero if this chip provides the movw and movt instructions.  */
-#define TARGET_HAVE_MOVT   (arm_arch_thumb2)
+#define TARGET_HAVE_MOVT   (arm_arch_thumb2 || arm_arch8)
 
 /* Nonzero if integer division instructions supported.  */
 #define TARGET_IDIV ((TARGET_ARM && arm_arch_arm_hwdiv) \
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 51d501e..d832309 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8158,6 +8158,12 @@ arm_legitimate_constant_p_1 (machine_mode, rtx x)
 static bool
 thumb_legitimate_constant_p (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
 {
+  /* Splitters for TARGET_USE_MOVT call arm_emit_movpair which creates high
+ RTX.  These RTX must therefore be allowed for Thumb-1 so that when run
+ for ARMv8-M baseline or later the result is valid.  */
+  if (TARGET_HAVE_MOVT && GET_CODE (x) == HIGH)
+x = XEXP (x, 0);
+
   return (CONST_INT_P (x)
  || CONST_DOUBLE_P (x)
  || CONSTANT_ADDRESS_P (x)
@@ -8244,7 +8250,8 @@ thumb1_rtx_costs (rtx x, enum rtx_code code, enum 
rtx_code outer)
 case CONST_INT:
   if (outer == SET)
{
- if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256)
+ if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256
+ || (TARGET_HAVE_MOVT && !(INTVAL (x) & 0xffff0000)))
return 0;
  if (thumb_shiftable_const (INTVAL (x)))
return COSTS_N_INSNS (2);
@@ -8994,16 +9001,24 @@ thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum 
rtx_code outer)
 the mode.  */
   words = ARM_NUM_INTS (GET_MODE_SIZE (GET_MODE (SET_DEST (x))));
   return COSTS_N_INSNS (words)
-+ COSTS_N_INSNS (1) * (satisfies_constraint_J (SET_SRC (x))
-   || satisfies_constraint_K (SET_SRC (x))
-  /* thumb1_movdi_insn.  */
-   || ((words > 1) && MEM_P (SET_SRC (x))));
++ COSTS_N_INSNS (1)
+  * (satisfies_constraint_J (SET_SRC (x))
+ || satisfies_constraint_K (SET_SRC (x))
+/* Too big immediate for 2byte mov, using movt.  */
+ || ((unsigned HOST_WIDE_INT) INTVAL (SET_SRC (x)) >= 256
+ && TARGET_HAVE_MOVT
+ && satisfies_constraint_j (SET_SRC (x)))
+/* thumb1_movdi_insn.  */
+ || ((words > 1) && MEM_P (SET_SRC (x))));
 
 case CONST_INT:
   if (outer == SET)
 {
   if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256)

[PATCH, ARM 6/6] Add support for CB(N)Z and (U|S)DIV to ARMv8-M Baseline

Hi,

This patch is part of a patch series to add support for ARMv8-M[1] to GCC. This 
specific patch makes the compiler start generating code with the new CB(N)Z and 
(U|S)DIV instructions for ARMv8-M Baseline.

The div insn templates are shared with ARM and Thumb-2 by allowing the %? 
punctuation character for Thumb-1. This is safe because the compiler would 
fault in arm_print_condition if a condition code is not handled by a branch in 
Thumb-1.

Unfortunately, cbz cannot be shared with cbranchsi4 because that would lead to 
worse code for Thumb-1. Choosing cb(n)z over the other alternatives for 
cbranchsi4 depends on the distance between the target and pc, which leads 
insn-attrtab to evaluate the minimum length of this pattern as 2 because it 
cannot compute the distance statically. It would be possible to determine 
statically that this alternative is not available for non-ARMv8-M Thumb-1 
targets, but genattrtab is not currently capable of doing so, so this is left 
for a later patch.


[1] For a quick overview of ARMv8-M please refer to the initial cover letter.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2015-11-13  Thomas Preud'homme  

* config/arm/arm.c (arm_print_operand_punct_valid_p): Make %? valid
for Thumb-1.
* config/arm/arm.h (TARGET_HAVE_CBZ): Define.
(TARGET_IDIV): Set for all Thumb targets provided they have hardware
divide feature.
* config/arm/thumb1.md (thumb1_cbz): New insn.


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 015df50..247f144 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -263,9 +263,12 @@ extern void 
(*arm_lang_output_object_attributes_hook)(void);
 /* Nonzero if this chip provides the movw and movt instructions.  */
 #define TARGET_HAVE_MOVT   (arm_arch_thumb2 || arm_arch8)
 
+/* Nonzero if this chip provides the cb{n}z instruction.  */
+#define TARGET_HAVE_CBZ    (arm_arch_thumb2 || arm_arch8)
+
 /* Nonzero if integer division instructions supported.  */
 #define TARGET_IDIV ((TARGET_ARM && arm_arch_arm_hwdiv) \
-|| (TARGET_THUMB2 && arm_arch_thumb_hwdiv))
+|| (TARGET_THUMB && arm_arch_thumb_hwdiv))
 
 /* Nonzero if disallow volatile memory access in IT block.  */
 #define TARGET_NO_VOLATILE_CE  (arm_arch_no_volatile_ce)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d832309..5ef3a1d 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22568,7 +22568,7 @@ arm_print_operand_punct_valid_p (unsigned char code)
 {
   return (code == '@' || code == '|' || code == '.'
  || code == '(' || code == ')' || code == '#'
- || (TARGET_32BIT && (code == '?'))
+ || code == '?'
  || (TARGET_THUMB2 && (code == '!'))
  || (TARGET_THUMB && (code == '_')));
 }
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 7e3bcb4..074b267 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -973,6 +973,92 @@
   DONE;
 })
 
+;; A pattern for the cb(n)z instruction added in ARMv8-M baseline profile,
+;; adapted from cbranchsi4_insn.  Modifying cbranchsi4_insn instead leads to
+;; code generation difference for ARMv6-M because the minimum length of the
+;; instruction becomes 2 even for it due to a limitation in genattrtab's
+;; handling of pc in the length condition.
+(define_insn "thumb1_cbz"
+  [(set (pc) (if_then_else
+ (match_operator 0 "equality_operator"
+  [(match_operand:SI 1 "s_register_operand" "l")
+   (const_int 0)])
+ (label_ref (match_operand 2 "" ""))
+ (pc)))]
+  "TARGET_THUMB1 && TARGET_HAVE_MOVT"
+{
+  if (get_attr_length (insn) == 2)
+{
+  if (GET_CODE (operands[0]) == EQ)
+   return "cbz\t%1, %l2";
+  else
+   return "cbnz\t%1, %l2";
+}
+  else
+{
+  rtx t = cfun->machine->thumb1_cc_insn;
+  if (t != NULL_RTX)
+   {
+ if (!rtx_equal_p (cfun->machine->thumb1_cc_op0, operands[1])
+ || !rtx_equal_p (cfun->machine->thumb1_cc_op1, operands[2]))
+   t = NULL_RTX;
+ if (cfun->machine->thumb1_cc_mode == CC_NOOVmode)
+   {
+ if (!noov_comparison_operator (operands[0], VOIDmode))
+   t = NULL_RTX;
+   }
+ else if (cfun->machine->thumb1_cc_mode != CCmode)
+   t = NULL_RTX;
+   }
+  if (t == NULL_RTX)
+   {
+ output_asm_insn ("cmp\t%1, #0", operands);
+ cfun->machine->thumb1_cc_insn = insn;
+ cfun->machine->thumb1_cc_op0 = operands[1];
+ cfun->machine->thumb1_cc_op1 = operands[2];
+ cfun->machine->thumb1_cc_mode = CCmode;
+   }
+  else
+   /* Ensure we emit the right type of condition code on the jump.  */
+   XEXP (operands[0], 0) = gen_rtx_REG (cfun->machine->thumb1_cc_mode,
+

[arm-embedded][PATCH, ARM 6/6] Add support for CB(N)Z and (U|S)DIV to ARMv8-M Baseline

Hi,

We decided to apply the following patch to the ARM embedded 5 branch.

Best regards,

Thomas

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Thursday, December 17, 2015 4:18 PM
> To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan;
> Kyrylo Tkachov
> Subject: [PATCH, ARM 6/6] Add support for CB(N)Z and (U|S)DIV to
> ARMv8-M Baseline
> 
> Hi,
> 
> This patch is part of a patch series to add support for ARMv8-M[1] to GCC.
> This specific patch makes the compiler start generating code with the
> new CB(N)Z and (U|S)DIV instructions for ARMv8-M Baseline.
> 
> The div insn templates are shared with ARM and Thumb-2 by allowing the
> %? punctuation character for Thumb-1. This is safe because the compiler
> would fault in arm_print_condition if a condition code is not handled by
> a branch in Thumb-1.
> 
> Unfortunately, cbz cannot be shared with cbranchsi4 because that would
> lead to worse code for Thumb-1. Choosing cb(n)z over the other
> alternatives for cbranchsi4 depends on the distance between the target
> and pc, which leads insn-attrtab to evaluate the minimum length of this
> pattern as 2 because it cannot compute the distance statically. It would
> be possible to determine statically that this alternative is not
> available for non-ARMv8-M Thumb-1 targets, but genattrtab is not
> currently capable of doing so, so this is left for a later patch.
> 
> 
> [1] For a quick overview of ARMv8-M please refer to the initial cover
> letter.
> 
> ChangeLog entry is as follows:
> 
> 
> *** gcc/ChangeLog ***
> 
> 2015-11-13  Thomas Preud'homme  
> 
> * config/arm/arm.c (arm_print_operand_punct_valid_p): Make %?
> valid
> for Thumb-1.
> * config/arm/arm.h (TARGET_HAVE_CBZ): Define.
> (TARGET_IDIV): Set for all Thumb targets provided they have
> hardware
> divide feature.
> * config/arm/thumb1.md (thumb1_cbz): New insn.
> 
> 
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 015df50..247f144 100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -263,9 +263,12 @@ extern void
> (*arm_lang_output_object_attributes_hook)(void);
>  /* Nonzero if this chip provides the movw and movt instructions.  */
>  #define TARGET_HAVE_MOVT (arm_arch_thumb2 || arm_arch8)
> 
> +/* Nonzero if this chip provides the cb{n}z instruction.  */
> +#define TARGET_HAVE_CBZ  (arm_arch_thumb2 || arm_arch8)
> +
>  /* Nonzero if integer division instructions supported.  */
>  #define TARGET_IDIV  ((TARGET_ARM && arm_arch_arm_hwdiv) \
> -  || (TARGET_THUMB2 &&
> arm_arch_thumb_hwdiv))
> +  || (TARGET_THUMB &&
> arm_arch_thumb_hwdiv))
> 
>  /* Nonzero if disallow volatile memory access in IT block.  */
>  #define TARGET_NO_VOLATILE_CE
>   (arm_arch_no_volatile_ce)
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index d832309..5ef3a1d 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -22568,7 +22568,7 @@ arm_print_operand_punct_valid_p
> (unsigned char code)
>  {
>return (code == '@' || code == '|' || code == '.'
> || code == '(' || code == ')' || code == '#'
> -   || (TARGET_32BIT && (code == '?'))
> +   || code == '?'
> || (TARGET_THUMB2 && (code == '!'))
> || (TARGET_THUMB && (code == '_')));
>  }
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index 7e3bcb4..074b267 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -973,6 +973,92 @@
>DONE;
>  })
> 
> +;; A pattern for the cb(n)z instruction added in ARMv8-M baseline
> profile,
> +;; adapted from cbranchsi4_insn.  Modifying cbranchsi4_insn instead
> leads to
> +;; code generation difference for ARMv6-M because the minimum
> length of the
> +;; instruction becomes 2 even for it due to a limitation in genattrtab's
> +;; handling of pc in the length condition.
> +(define_insn "thumb1_cbz"
> +  [(set (pc) (if_then_else
> +   (match_operator 0 "equality_operator"
> +[(match_operand:SI 1 "s_register_operand" "l")
> + (const_int 0)])
> +   (label_ref (match_operand 2 "" ""))
> +   (pc)))]
> +  "TARGET_THUMB1 && TARGET_HAVE_MOVT"
> +{
> +  if (get_attr_length (insn) == 2)
> +{
> +  if (GET_CODE (operands[0]) == EQ)
> + return "cbz\t%1, %l2";
> +  else
> + return "cbnz\t%1, %l2";
> +}
> +  else
> +{
> +  rtx t = cfun->machine->thumb1_cc_insn;
> +  if (t != NULL_RTX)
> + {
> +   if (!rtx_equal_p (cfun->machine->thumb1_cc_op0, operands[1])
> +   || !rtx_equal_p (cfun->machine->thumb1_cc_op1,
> operands[2]))
> + t = NULL_RTX;
> +   if (cfun->machine->thumb1_cc_mode == CC_NOOVmode)
> + {
> +   if (!noov_comparison_operator (operands[0], VOIDmode))
> +

Re: [build] Only support -gstabs on Mac OS X if assembler supports it (PR target/67973)

2015-12-17 Thread Mike Stump

On Dec 16, 2015, at 11:29 PM, Rainer Orth  wrote:
> Here's what I came up with.  Tested with the appropriate runtest
> invocations both in a tree with the Xcode 7/LLVM as without stabs
> support, where the tests come out UNSUPPORTED, and another one with the
> Xcode 6.4/gas as with stabs, where they PASS.
> 
> I've left alone two testcases using -gstabs* which are guaranteed to
> work without the keyword:
> 
>   gcc.target/powerpc/stabs-attrib-vect-darwin.c
>   gcc.target/s390/20041216-1.c
> 
> In case the current test for stabs (checking if one can compile/assemble
> with -gstabs) isn't enough on some of the targets currently listed
> explicitly, it could easily be augmented.
> 
> Ok for mainline?

Ok.

[PATCH 0/5] Fix qsort comparison functions


Hi all,

This patchset fixes bugs in comparison functions used in qsort(3). 
The standard requires comparison functions to satisfy certain 
symmetry/transitivity axioms ("total ordering" in 
http://pubs.opengroup.org/onlinepubs/009695399/functions/qsort.html). 
Violating them triggers undefined behavior, which can e.g. cause qsort to 
produce invalid results (or even crash; see 
https://bugzilla.samba.org/show_bug.cgi?id=3959).


Most of the patches are pretty obvious except for no. 3, for which I 
failed to devise a behavior-preserving fix.  I've Cc-ed the original 
authors in the hope they'll be able to help.


I've verified all patches on x86_64-pc-linux-gnu (bootstrap + regression 
test).


NB: Bugs were found with SortChecker tool 
(https://github.com/yugr/sortcheck).


/Yury
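
For readers unfamiliar with the two axioms, the following is a minimal sketch of brute-force checks for them on a small sample of ints. It is not how SortChecker works and is not taken from the patches; the comparator and checker names are invented for this illustration. `cmp_asymmetric` mirrors the "never returns 0" shape fixed in patch 1, and `cmp_tolerant` is a textbook intransitive comparator.

```c
#include <stddef.h>

typedef int (*cmp_fn) (const void *, const void *);

static int
sign (int v)
{
  return (v > 0) - (v < 0);
}

/* Broken: never returns 0, so equal elements are "greater" both ways.  */
int
cmp_asymmetric (const void *a, const void *b)
{
  int x = *(const int *) a, y = *(const int *) b;
  return x < y ? -1 : 1;
}

/* Broken: values within 1 of each other count as equal, which is not
   transitive (1 ~ 2 and 2 ~ 3, but 1 < 3).  Assumes no overflow.  */
int
cmp_tolerant (const void *a, const void *b)
{
  int x = *(const int *) a, y = *(const int *) b;
  if (x - y > 1)
    return 1;
  if (y - x > 1)
    return -1;
  return 0;
}

/* Correct three-way comparison.  */
int
cmp_ok (const void *a, const void *b)
{
  int x = *(const int *) a, y = *(const int *) b;
  return (x > y) - (x < y);
}

/* 1 if sign (cmp (a, b)) == -sign (cmp (b, a)) for every pair in V.  */
int
is_symmetric (cmp_fn cmp, const int *v, size_t n)
{
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      if (sign (cmp (&v[i], &v[j])) != -sign (cmp (&v[j], &v[i])))
        return 0;
  return 1;
}

/* 1 if a <= b and b <= c imply a <= c for every triple in V.  */
int
is_transitive (cmp_fn cmp, const int *v, size_t n)
{
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      for (size_t k = 0; k < n; k++)
        if (cmp (&v[i], &v[j]) <= 0 && cmp (&v[j], &v[k]) <= 0
            && cmp (&v[i], &v[k]) > 0)
          return 0;
  return 1;
}
```

On the sample { 1, 2, 3 }, `cmp_asymmetric` fails the symmetry check, `cmp_tolerant` fails the transitivity check, and `cmp_ok` passes both. Passing on a sample does not prove a comparator correct in general; it only rules out violations among the elements tested.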

Re: [PATCH 0/4] [ARC] Collection Of Bug Fixes




On 16/12/15 00:15, Andrew Burgess wrote:

> This is a collection of 4 bug fix patches for arc.  All 4 patches are
> really stand-alone, I've only grouped them together as they all only
> affect arc.

Note for future postings:  ChangeLog entries are supposed to appear as
plain text, not as diff.

Re: [PATCH PR68906]

2015-12-17 Thread Yuri Rumyantsev

Richard,

Here is the modified patch, which only checks that the exit block belongs to the loop.

Bootstrapping and regression testing were successful.
Is it OK for trunk?

ChangeLog:
2014-12-17  Yuri Rumyantsev  

PR tree-optimization/68906
* tree-ssa-loop-unswitch.c (tree_unswitch_outer_loop): Add a check
that an exit block belongs to LOOP.

gcc/testsuite/ChangeLog
* gcc.dg/torture/pr68906.c: New test.
.
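
The check discussed in this thread can be pictured with a toy model. The structs below only borrow the field names `src` and `loop_father` from the quoted diff; their layout, and the helper `exit_belongs_to_loop`, are assumptions for this sketch rather than GCC's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy loop tree node: OUTER is the enclosing loop, or NULL.  */
struct loop
{
  struct loop *outer;
};

/* Toy basic block: LOOP_FATHER is the innermost loop containing it.  */
struct basic_block
{
  struct loop *loop_father;
};

/* Toy CFG edge from SRC to DEST.  */
struct edge
{
  struct basic_block *src;
  struct basic_block *dest;
};

/* True if EXIT exists and leaves from a block whose innermost loop is
   LOOP itself, i.e. not from a block inside a nested inner loop.  */
bool
exit_belongs_to_loop (const struct edge *exit, const struct loop *loop)
{
  return exit != NULL && exit->src->loop_father == loop;
}
```

In this model an exit edge whose source block sits in an inner loop is rejected for the outer loop even if it is the only edge leaving the outer loop, which corresponds to the case the added condition is meant to filter out.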

2015-12-16 17:49 GMT+03:00 Richard Biener :
> On Wed, Dec 16, 2015 at 3:36 PM, Yuri Rumyantsev  wrote:
>> Richard,
>>
>> Here is updated patch which includes (1) a test on exit proposed by
>> you and (2) another test from PR68021 which is caught by new check on
>> counted loop. Outer-loop unswitching is not performed for both new
>> tests.
>
> As said I don't think
>
>/* If the loop is not expected to iterate, there is no need
>for unswitching.  */
> -  iterations = estimated_loop_iterations_int (loop);
> -  if (iterations >= 0 && iterations <= 1)
> +  niters = number_of_latch_executions (loop);
> +  if (!niters || chrec_contains_undetermined (niters))
>  {
>
> is good.  We do want to retain the estimated_loop_iterations check
> as it takes into account profile data while yours lets through all
> counted loops.
>
> Also I don't see why SCEV needs to be able to analyze the IV to check
> for validity.
>
> Can you please split the patch into the part I suggested (which is ok)
> and the rest?
>
> Thanks,
> Richard.
>
>>
>> Bootstrapping and regression testing did not show any new failures.
>>
>> Is it OK for trunk.
>>
>> ChangeLog:
>> 2014-12-16  Yuri Rumyantsev  
>>
>> PR tree-optimization/68021
>> PR tree-optimization/68906
>> * tree-ssa-loop-unswitch.c : Include couple header files.
>> (tree_unswitch_outer_loop): Add check that an exit is not inside inner
>> loop, use number_of_latch_executions to detect non-iterated loops.
>>
>> gcc/testsuite/ChangeLog
>> * gcc.dg/torture/pr68021.c: New test.
>> * gcc.dg/torture/pr68906.c: Likewise.
>>
>> 2015-12-16 15:51 GMT+03:00 Richard Biener :
>>> On Wed, Dec 16, 2015 at 1:14 PM, Yuri Rumyantsev  wrote:
>>>> Hi All,
>>>>
>>>> Here is simple patch which cures the issue with outer-loop unswitching
>>>> - added invocation of number_of_latch_executions() to reject
>>>> unswitching for non-iterated loops.
>>>>
>>>> Bootstrapping and regression testing did not show any new failures.
>>>> Is it OK for trunk?
>>>
>>> No, that looks like just papering over the issue.
>>>
>>> The issue (with the 2nd testcase at least) is that single_exit () accepts
>>> an exit from the inner loop.
>>>
>>> Index: gcc/tree-ssa-loop-unswitch.c
>>> ===
>>> --- gcc/tree-ssa-loop-unswitch.c(revision 231686)
>>> +++ gcc/tree-ssa-loop-unswitch.c(working copy)
>>> @@ -431,7 +431,7 @@ tree_unswitch_outer_loop (struct loop *l
>>>  return false;
>>>/* Accept loops with single exit only.  */
>>>exit = single_exit (loop);
>>> -  if (!exit)
>>> +  if (!exit || exit->src->loop_father != loop)
>>>  return false;
>>>/* Check that phi argument of exit edge is not defined inside loop.  */
>>>if (!check_exit_phi (loop))
>>>
>>> fixes the runtime testcase for me (not suitable for the testsuite due
>>> to the infinite
>>> looping though).
>>>
>>> Can you please bootstrap/test the above with your testcase?  The above 
>>> patch is
>>> ok if it passes testing (no time myself right now)
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> ChangeLog:
>>>>
>>>> 2014-12-16  Yuri Rumyantsev  
>>>>
>>>> PR tree-optimization/68906
>>>> * tree-ssa-loop-unswitch.c : Include couple header files.
>>>> (tree_unswitch_outer_loop): Use number_of_latch_executions
>>>> to reject non-iterated loops.
>>>>
>>>> gcc/testsuite/ChangeLog
>>>> * gcc.dg/torture/pr68906.c: New test.


patch.2
Description: Binary data

[arm-embedded][PATCH, GCC/ARM, 2/3] Error out for incompatible ARM multilibs

Hi,

We decided to apply the following patch to the ARM embedded 5 branch.

Best regards,

Thomas

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Wednesday, December 16, 2015 7:59 PM
> To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan;
> Kyrylo Tkachov
> Subject: [PATCH, GCC/ARM, 2/3] Error out for incompatible ARM
> multilibs
> 
> Currently in config.gcc, only the first multilib in a multilib list is
> checked for validity and the following elements are ignored due to the
> break, which in shell breaks out of the enclosing loop. A loop is also
> done over the multilib list elements despite no combination being legal.
> This patch reworks the code to address both issues.
> 
> ChangeLog entry is as follows:
> 
> 
> 2015-11-24  Thomas Preud'homme  
> 
> * config.gcc: Error out when conflicting multilib is detected.  Do not
> loop over multilibs since no combination is legal.
> 
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 59aee2c..be3c720 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -3772,38 +3772,40 @@ case "${target}" in
>   # Add extra multilibs
>   if test "x$with_multilib_list" != x; then
>   arm_multilibs=`echo $with_multilib_list | sed -e
> 's/,/ /g'`
> - for arm_multilib in ${arm_multilibs}; do
> - case ${arm_multilib} in
> - aprofile)
> + case ${arm_multilibs} in
> + aprofile)
>   # Note that arm/t-aprofile is a
>   # stand-alone make file fragment to be
>   # used only with itself.  We do not
>   # specifically use the
>   # TM_MULTILIB_OPTION framework
> because
>   # this shorthand is more
> - # pragmatic. Additionally it is only
> - # designed to work without any
> - # with-cpu, with-arch with-mode
> + # pragmatic.
> + tmake_profile_file="arm/t-aprofile"
> + ;;
> + default)
> + ;;
> + *)
> + echo "Error: --with-multilib-
> list=${with_multilib_list} not supported." 1>&2
> + exit 1
> + ;;
> + esac
> +
> + if test "x${tmake_profile_file}" != x ; then
> + # arm/t-aprofile is only designed to work
> + # without any with-cpu, with-arch, with-
> mode,
>   # with-fpu or with-float options.
> - if test "x$with_arch" != x \
> - || test "x$with_cpu" != x \
> - || test "x$with_float" != x \
> - || test "x$with_fpu" != x \
> - || test "x$with_mode" != x ;
> then
> - echo "Error: You cannot use
> any of --with-arch/cpu/fpu/float/mode with --with-multilib-list=aprofile"
> 1>&2
> - exit 1
> - fi
> - tmake_file="${tmake_file}
> arm/t-aprofile"
> - break
> - ;;
> - default)
> - ;;
> - *)
> - echo "Error: --with-multilib-
> list=${with_multilib_list} not supported." 1>&2
> - exit 1
> - ;;
> - esac
> - done
> + if test "x$with_arch" != x \
> + || test "x$with_cpu" != x \
> + || test "x$with_float" != x \
> + || test "x$with_fpu" != x \
> + || test "x$with_mode" != x ; then
> + echo "Error: You cannot use any of --
> with-arch/cpu/fpu/float/mode with --with-multilib-list=${arm_multilib}"
> 1>&2
> + exit 1
> + fi
> +
> + tmake_file="${tmake_file}
> ${tmake_profile_file}"
> + fi
>   fi
>   ;;
> 
> 
> Tested with the following multilib lists:
>   + foo -> "Error: --with-multilib-list=foo not supported" as expected
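
The difference between the old and the reworked check can be modelled
outside of shell. The following is a minimal sketch in C, not the config.gcc
code itself: the function names are hypothetical, and the separate check on
--with-arch/cpu/fpu/float/mode is left out. It assumes the old loop visits
elements in order and stops at "aprofile", as the removed lines of the diff
do.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Model of the old loop (hypothetical name): elements are visited in
   order, and reaching "aprofile" leaves the loop, so any element after
   it is never validated.  */
bool
old_check_accepts (const char *list)
{
  char buf[256];

  assert (list != NULL);
  if (strlen (list) >= sizeof buf)
    return false;
  strcpy (buf, list);

  for (char *elt = strtok (buf, ","); elt != NULL; elt = strtok (NULL, ","))
    {
      if (strcmp (elt, "aprofile") == 0)
        break;                  /* Like the shell break: exits the loop.  */
      if (strcmp (elt, "default") != 0)
        return false;           /* Unknown element: configure errors out.  */
    }
  return true;
}

/* Model of the reworked code (hypothetical name): the whole list must be
   exactly one supported value, since no combination is legal.  */
bool
new_check_accepts (const char *list)
{
  assert (list != NULL);
  return strcmp (list, "aprofile") == 0 || strcmp (list, "default") == 0;
}
```

With this model, "aprofile,foo" is accepted by the old check but rejected by
the new one, while "foo" is rejected by both.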

[PATCH][AARCH64][NEON] Enabling V*HFmode simd immediate loads.

2015-12-17 Thread Bilyan Borisov


This patch adds support for loading vector 16-bit floating-point immediates
(modes V*HF) using a movi instruction. We leverage the existing code that
checks for an 8-bit pattern in a 64/128-bit splatted version of the
concatenated bit-pattern representations of the individual constant elements
of the vector. This lets us load a variety of constants, since the movi
instruction also comes with an immediate left-shift encoding of up to 24 bits
(in multiples of 8). A new testcase checks for the presence of movi
instructions and for correctness of results.

Tested on aarch64-none-elf, aarch64_be-none-elf, bootstrapped on
aarch64-none-linux-gnu.
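
To illustrate the idea on a single element, here is a simplified sketch in
C. It is not the code in aarch64_simd_valid_immediate: the helper name is
hypothetical, it looks at one 16-bit element only, and it considers just the
two shifts (0 and 8) that fit inside a halfword, ignoring the wider patterns
and the inverted (mvn) forms that the real code also tries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: report whether a 16-bit element
   pattern can be written as an 8-bit immediate shifted left by 0 or 8.
   On success, store the immediate and the shift amount.  */
bool
fp16_bits_fit_movi (uint16_t bits, uint8_t *imm8, unsigned *shift)
{
  assert (imm8 && shift);
  for (unsigned s = 0; s <= 8; s += 8)
    if ((bits & ~(0xffu << s)) == 0)
      {
        /* All set bits lie inside the byte selected by this shift.  */
        *imm8 = (uint8_t) (bits >> s);
        *shift = s;
        return true;
      }
  return false;
}
```

For example, the FP16 bit pattern of 1.0 is 0x3c00, which fits as 0x3c with
a left shift of 8, whereas 1.125 is 0x3c80 and needs bits in both bytes, so
this simplified check rejects it.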

---

gcc/

2015-XX-XX  Bilyan Borisov  

* config/aarch64/aarch64.c (aarch64_simd_container_mode): Added HFmode
cases.
(aarch64_vect_float_const_representable_p): Updated comment.
(aarch64_simd_valid_immediate): Added support for V*HF arguments.
(aarch64_output_simd_mov_immediate): Added check for HFmode.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov  

* gcc.target/aarch64/fp16/f16_mov_immediate_simd_1.c: New.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ae4cfb336a827a63a6baadefcb5646a9dbfb7523..bb6fce0a829d634a7694710e8a8c9a1c3e841abd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10250,6 +10250,8 @@ aarch64_simd_container_mode (machine_mode mode, unsigned width)
 	return V2DFmode;
 	  case SFmode:
 	return V4SFmode;
+	  case HFmode:
+	return V8HFmode;
 	  case SImode:
 	return V4SImode;
 	  case HImode:
@@ -10266,6 +10268,8 @@ aarch64_simd_container_mode (machine_mode mode, unsigned width)
 	  {
 	  case SFmode:
 	return V2SFmode;
+	  case HFmode:
+	return V4HFmode;
 	  case SImode:
 	return V2SImode;
 	  case HImode:
@@ -10469,7 +10473,12 @@ sizetochar (int size)
 /* Return true iff x is a uniform vector of floating-point
constants, and the constant can be represented in
quarter-precision form.  Note, as aarch64_float_const_representable
-   rejects both +0.0 and -0.0, we will also reject +0.0 and -0.0.  */
+   rejects both +0.0 and -0.0, we will also reject +0.0 and -0.0.
+   Also note that this won't ever be called for V*HFmode vectors,
+   since in aarch64_simd_valid_immediate () we check for the mode
+   and handle these vector types differently from other floating
+   point vector modes.  */
+
 static bool
 aarch64_vect_float_const_representable_p (rtx x)
 {
@@ -10505,7 +10514,10 @@ aarch64_simd_valid_immediate (rtx op, machine_mode mode, bool inverse,
   unsigned int invmask = inverse ? 0xff : 0;
   int eshift, emvn;
 
-  if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
+  /* Ignore V*HFmode vectors, they are handled below with the integer
+ code.  */
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT
+  && GET_MODE_INNER (mode) != HFmode)
 {
   if (! (aarch64_simd_imm_zero_p (op, mode)
 	 || aarch64_vect_float_const_representable_p (op)))
@@ -10530,15 +10542,26 @@ aarch64_simd_valid_immediate (rtx op, machine_mode mode, bool inverse,
   rtx el = CONST_VECTOR_ELT (op, BYTES_BIG_ENDIAN ? (n_elts - 1 - i) : i);
   unsigned HOST_WIDE_INT elpart;
 
-  gcc_assert (CONST_INT_P (el));
-  elpart = INTVAL (el);
+  if (CONST_INT_P (el))
+	elpart = INTVAL (el);
+  /* Convert HFmode vector element to bit pattern.  Logic below will catch
+	 most common constants since for FP16 the sign and exponent are in the
+	 top 6 bits and a movi with a left shift of 8 will catch all powers
+	 of 2 that fit in a 16 bit floating point, and the 2 extra bits left
+	 for the mantissa can cover some more non-power of 2 constants.  With
+	 a 0 left shift, we can cover constants of the form 1.xxx since we have
+	 8 bits only for the mantissa.  */
+  else if (CONST_DOUBLE_P (el) && GET_MODE_INNER (mode) == HFmode)
+	elpart =
+	  real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (el), HFmode);
+  else
+gcc_unreachable ();
 
   for (unsigned int byte = 0; byte < innersize; byte++)
 	{
 	  bytes[idx++] = (elpart & 0xff) ^ invmask;
 	  elpart >>= BITS_PER_UNIT;
 	}
-
 }
 
   /* Sanity check.  */
@@ -11913,7 +11936,10 @@ aarch64_output_simd_mov_immediate (rtx const_vector,
   lane_count = width / info.element_width;
 
   mode = GET_MODE_INNER (mode);
-  if (GET_MODE_CLASS (mode) == MODE_FLOAT)
+  /* We handle HFmode vectors separately from the other floating point
+ vector modes.  See aarch64_simd_valid_immediate (), but in short
+ we use a movi instruction rather than a fmov.  */
+  if (GET_MODE_CLASS (mode) == MODE_FLOAT && mode != HFmode)
 {
   gcc_assert (info.shift == 0 && ! info.mvn);
   /* For FP zero change it to a CONST_INT 0 and use the integer SIMD
diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/f16_mov_immediate_simd_1.c

[arm-embedded][PATCH, ARM, 3/3] Add multilib support for bare-metal ARM architectures

Hi,

We decided to apply the following patch to the ARM embedded 5 branch.

Best regards,

Thomas

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Wednesday, December 16, 2015 8:04 PM
> To: 'Ramana Radhakrishnan'; Richard Earnshaw; Kyrylo Tkachov; gcc-
> patches
> Cc: Jasmin J.
> Subject: [PATCH, ARM, 3/3] Add multilib support for bare-metal ARM
> architectures
> 
> Hi Ramana,
> 
> As suggested in your initial answer to this thread, we updated the
> multilib patch provided in ARM's embedded branch to be up to date
> with regard to the CPUs supported in GCC. As to the need to modify
> Makefile.in and configure.ac, this is because the patch aims to give the
> user control over which multilibs are built. To this effect, it takes a
> list of architectures at configure time, and that list needs to be passed
> down to the t-baremetal Makefile to set the multilib variables appropriately.
> 
> ChangeLog entry is as follows:
> 
> 
> *** gcc/ChangeLog ***
> 
> 2015-12-15  Thomas Preud'homme  
> 
> * Makefile.in (with_multilib_list): New variables substituted by
> configure.
> * config.gcc: Handle bare-metal multilibs in --with-multilib-list
> option.
> * config/arm/t-baremetal: New file.
> * configure.ac (with_multilib_list): New AC_SUBST.
> * configure: Regenerate.
> * doc/install.texi (--with-multilib-list): Update description for
> arm*-*-* targets to mention bare-metal multilibs.
> 
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index
> 1f698798aa2df3f44d6b3a478bb4bf48e9fa7372..18b790afa114aa7580be06
> 62d3ac9ffbc94e919d 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -546,6 +546,7 @@ lang_opt_files=@lang_opt_files@ $(srcdir)/c-
> family/c.opt $(srcdir)/common.opt
>  lang_specs_files=@lang_specs_files@
>  lang_tree_files=@lang_tree_files@
>  target_cpu_default=@target_cpu_default@
> +with_multilib_list=@with_multilib_list@
>  OBJC_BOEHM_GC=@objc_boehm_gc@
>  extra_modes_file=@extra_modes_file@
>  extra_opt_files=@extra_opt_files@
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index
> af948b5e203f6b4f53dfca38e9d02d060d00c97b..d8098ed3cefacd00cb1059
> 0db1ec86d48e9fcdbc 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -3787,15 +3787,25 @@ case "${target}" in
>   default)
>   ;;
>   *)
> - echo "Error: --with-multilib-
> list=${with_multilib_list} not supported." 1>&2
> - exit 1
> + for arm_multilib in ${arm_multilibs}; do
> + case ${arm_multilib} in
> + armv6-m | armv7-m | armv7e-m
> | armv7-r | armv8-m.base | armv8-m.main)
> +
>   tmake_profile_file="arm/t-baremetal"
> + ;;
> + *)
> + echo "Error: --with-
> multilib-list=${with_multilib_list} not supported." 1>&2
> + exit 1
> + ;;
> + esac
> + done
>   ;;
>   esac
> 
>   if test "x${tmake_profile_file}" != x ; then
> - # arm/t-aprofile is only designed to work
> - # without any with-cpu, with-arch, with-
> mode,
> - # with-fpu or with-float options.
> + # arm/t-aprofile and arm/t-baremetal are
> only
> + # designed to work without any with-cpu,
> + # with-arch, with-mode, with-fpu or
> with-float
> + # options.
>   if test "x$with_arch" != x \
>   || test "x$with_cpu" != x \
>   || test "x$with_float" != x \
> diff --git a/gcc/config/arm/t-baremetal b/gcc/config/arm/t-baremetal
> new file mode 100644
> index
> ..ffd29815e6ec22c747e777
> 47ed9b69e0ae21b63a
> --- /dev/null
> +++ b/gcc/config/arm/t-baremetal
> @@ -0,0 +1,130 @@
> +# A set of predefined MULTILIB which can be used for different ARM
> targets.
> +# Via the configure option --with-multilib-list, user can customize the
> +# final MULTILIB implementation.
> +
> +comma := ,
> +
> +with_multilib_list := $(subst $(comma), ,$(with_multilib_list))
> +
> +MULTILIB_OPTIONS   = mthumb/marm
> +MULTILIB_DIRNAMES  = thumb arm
> +MULTILIB_OPTIONS  += march=armv6s-m/march=armv7-
> m/march=armv7e-m/march=armv7/march=armv8-
> m.base/march=armv8-m.main
> +MULTILIB_DIRNAMES += armv6-m armv7-m armv7e-m armv7-ar

Re: [PATCH][combine] PR rtl-optimization/68651 Try changing rtx from (r + r) to (r << 1) to aid recognition



On 16/12/15 17:28, Jeff Law wrote:

On 12/16/2015 10:00 AM, Kyrill Tkachov wrote:


On 16/12/15 12:18, Bernd Schmidt wrote:

On 12/15/2015 05:21 PM, Kyrill Tkachov wrote:

Then for the shift pattern in the MD file we'd have to
dynamically select the scheduling type depending on whether or
not the shift amount is 1 and the costs line up?


Yes. This isn't unusual, take a look at i386.md where you have a
lot of switches on attr type to decide which string to print.



I'm just worried that if we take this idea to its logical conclusion,
we have to add a new canonicalisation rule: "all (plus x x)
expressions shall be expressed as (ashift x 1)". Such a rule seems
too specific to me and all targets would have to special-case it in
their MD patterns and costs if they ever wanted to treat an add and a
shift differently. In this particular case we'd have to
conditionalise the scheduling string selection on a particular CPU
tuning and the shift amount, which will make the pattern much harder
to read. To implement this properly we'd also have to

That's not terribly unusual.  And we've done those kinds of canonicalization
rules before -- most recently, to deal with issues in combine, we settled on
canonicalization rules for ashift vs mult.  While there was fallout, it's
manageable.





The price we pay when trying these substitutions is an iteration
over the rtx with FOR_EACH_SUBRTX_PTR. recog gets called only if
that iteration actually performed a substitution of x + x into x
<< 1. Is that too high a price to pay? (I'm not familiar with the
performance characteristics of the FOR_EACH_SUBRTX machinery)


It depends on how many of these transforms we are going to try; it
 also feels very hackish, trying to work around the core design of
the combiner. IMO it would be better for machine descriptions to
work with the pass rather than against it.



Perhaps I'm lacking the historical context, but what is the core
design of the combiner? Why should the backend have to jump through
these hoops if it already communicates to the midend (through correct
rtx costs) that a shift is more expensive than a plus? I'd be more
inclined to agree that this is perhaps a limitation in recog rather
than combine, but still not a backend problem.

The historical design of combine is pretty simple.

Use data dependence to substitute the definition of an operand in a use
of the operand. Essentially create bigger blobs of RTL. Canonicalize and
simplify that larger blob of RTL, then try to match it with a pattern in
the backend.

Note that costing didn't enter the picture. The assumption was that if
the combination succeeds, then it's profitable (fewer insns).  We haven't 
generally encouraged trying to match multiple forms of equivalent expressions, 
instead we declare a canonical form and make sure combine uses it.




Thanks for the explanation.




If you can somehow arrange for the (plus x x) to be turned into a
shift while substituting that might be yet another approach to
try.



I did investigate where else we could make this transformation. For
the zero_extend+shift case (the ubfiz instruction from the testcase
in my original submission) we could fix this by modifying
make_extraction to convert its argument to a shift from (plus x x)
as, in that context, shifts are undoubtedly more likely to simplify
with the various extraction operations that it's trying to perform.

Note that canonicalizing (plus x x) to (ashift x 1) is consistent with the 
canonicalization we do for (mult x C) to (ashift x log2 (C)) where C is an 
exact power of two.

When we made that change consistently (there were cases where we instead 
preferred MULT in the past), we had to fix some backends, but the fallout 
wasn't terrible.

I would think from a representational standpoint canonicalizing (plus x x) to 
(ashift x 1) would be generally a good thing.
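
As an illustration of the two rules side by side, here is a toy sketch in C.
It is only a model under stated assumptions: the node type and every name in
it are hypothetical and unrelated to GCC's real rtx, it only looks at the
top-level node, and it only treats (plus r r) where both operands are the
same register.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy expression node; hypothetical, unrelated to GCC's real rtx.  */
enum toy_code { TOY_REG, TOY_CONST, TOY_PLUS, TOY_MULT, TOY_ASHIFT };

struct toy_expr
{
  enum toy_code code;
  int value;                      /* Register number or constant value.  */
  struct toy_expr *op0, *op1;     /* Operands of binary codes.  */
};

/* Return log2 (v) if v is an exact power of two, else -1.  */
int
toy_exact_log2 (int v)
{
  if (v <= 0 || (v & (v - 1)) != 0)
    return -1;
  int n = 0;
  while ((v >>= 1) != 0)
    n++;
  return n;
}

/* Rewrite the top-level node in place into shift form when it is
   (plus r r) or (mult x C) with C a power of two greater than 1.
   SHIFT_AMOUNT is caller-provided storage for the new constant operand.
   Return true if a rewrite happened.  */
bool
toy_canonicalize (struct toy_expr *e, struct toy_expr *shift_amount)
{
  assert (e != 0 && shift_amount != 0);
  int amount = -1;

  if (e->code == TOY_PLUS
      && e->op0->code == TOY_REG && e->op1->code == TOY_REG
      && e->op0->value == e->op1->value)
    amount = 1;
  else if (e->code == TOY_MULT && e->op1->code == TOY_CONST)
    amount = toy_exact_log2 (e->op1->value);

  if (amount <= 0)
    return false;               /* Not a candidate; leave it alone.  */

  shift_amount->code = TOY_CONST;
  shift_amount->value = amount;
  shift_amount->op0 = shift_amount->op1 = 0;
  e->code = TOY_ASHIFT;
  e->op1 = shift_amount;
  return true;
}
```

In this model both (plus r3 r3) and (mult r3 2) end up as (ashift r3 1), so
a later matcher only has to know about one form.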



Ok. Gathering from the above, it's combine's job to canonicalise, so
implementing this approach would simply be a matter of adding the
transformation to combine_simplify_rtx. At least that does the trick for my
testcases, and the backend can decide whether to emit the add instruction
for the shift-by-one case.

If there's consensus on this approach I'll propose a patch for that.

Thanks for all your inputs,
Kyrill




jeff

Re: [PATCH, PR67627][RFC] broken libatomic multilib parallel build

2015-12-17 Thread Szabolcs Nagy


On 16/12/15 17:06, Jeff Law wrote:

On 12/04/2015 05:39 AM, Szabolcs Nagy wrote:

As described in pr other/67627, the all-multi target can be
built in parallel with the %_.lo targets which generate make
dependencies that are parsed during the build of all-multi.

gcc -MD does not generate the makefile dependencies in an
atomic way so make can fail if it concurrently parses those
half-written files.
(not observed on x86, but happens on arm native builds.)

this workaround forces all-multi to only run after the *_.lo
targets are done, but there might be a better solution using
automake properly. (automake should know about the generated
make dependency files that are included into the makefile so
no manual tinkering is needed to get the right build order,
but i don't know how to do that.)

2015-12-04  Szabolcs Nagy  

 PR other/67627
 * Makefile.am (all-multi): Add dependency.
 * Makefile.in: Regenerate.

So looking at the patch, it looks like you're adding
a dependency in Makefile.am to pass it through to
Makefile.in, which is fine.

So I think you just need to replicate that fix across
the other libraries which have this problem.



i don't see other libraries that use all-multi and
include auto-dependency generated makefiles as well.

only libatomic has both.

is it ok to commit and backport?
(gcc-5 and 4.9 have the same issue)

Re: [Patch] S/390: Simplify vector conditionals

2015-12-17 Thread Robin Dapp

Hi,

the attached patch renames the constm1_operand predicate to
all_ones_operand and introduces a check for int mode.
It should be applied on top of the last patch ([Patch] S/390: Simplify
vector conditionals).

Regtested on s390.

Regards
 Robin

gcc/ChangeLog:

2015-12-15 Robin Dapp 

* config/s390/predicates.md: Change and rename
constm1_operand to all_ones_operand
* config/s390/s390.c (s390_expand_vcond): Use all_ones_operand
* config/s390/vector.md: Likewise
diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index 5c462c4..02a1e4e 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -29,9 +29,10 @@
   (and (match_code "const_int,const_wide_int,const_double,const_vector")
(match_test "op == CONST0_RTX (mode)")))
 
-;; Return true if OP an all ones operand (int/float/vector).
-(define_predicate "constm1_operand"
-  (and (match_code "const_int,const_wide_int,const_double,const_vector")
+;; Return true if OP an all ones operand (int/vector).
+(define_predicate "all_ones_operand"
+  (and (match_code "const_int, const_wide_int, const_vector")
+   (match_test "INTEGRAL_MODE_P (GET_MODE (op))")
(match_test "op == CONSTM1_RTX (mode)")))
 
 ;; Return true if OP is a 4 bit mask operand
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 529b884..d58c243 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -6169,7 +6169,7 @@ s390_expand_vcond (rtx target, rtx then, rtx els,
 	}
 
   /* if x < 0 ? -1 : 0 or if x >= 0 ? 0 : -1 */
-  else if (constm1_operand (negop, target_mode))
+  else if (all_ones_operand (negop, target_mode))
 	{
 	  rtx res = expand_simple_binop (cmp_mode, ASHIFTRT, cmp_op1,
 	 GEN_INT (shift), target,
@@ -6199,7 +6199,7 @@ s390_expand_vcond (rtx target, rtx then, rtx els,
 
   /* If the results are supposed to be either -1 or 0 we are done
  since this is what our compare instructions generate anyway.  */
-  if (constm1_operand (then, GET_MODE (then))
+  if (all_ones_operand (then, GET_MODE (then))
   && const0_operand (els, GET_MODE (els)))
 {
   emit_move_insn (target, gen_rtx_SUBREG (target_mode,
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index f6a85c8..cd9407a 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1107,7 +1107,7 @@
 	 (eq (match_operand: 3 "register_operand" "")
 	 (match_operand:V 4 "const0_operand" ""))
 	 (match_operand:V 1 "const0_operand" "")
-	 (match_operand:V 2 "constm1_operand" "")))]
+	 (match_operand:V 2 "all_ones_operand" "")))]
   "TARGET_VX"
   [(set (match_dup 0) (match_dup 3))]
 {
@@ -1120,7 +1120,7 @@
 	(if_then_else:V
 	 (eq (match_operand: 3 "register_operand" "")
 	 (match_operand:V 4 "const0_operand" ""))
-	 (match_operand:V 1 "constm1_operand" "")
+	 (match_operand:V 1 "all_ones_operand" "")
 	 (match_operand:V 2 "const0_operand" "")))]
   "TARGET_VX"
   [(set (match_dup 0) (not:V (match_dup 3)))]
@@ -1134,7 +1134,7 @@
 	(if_then_else:V
 	 (ne (match_operand: 3 "register_operand" "")
 	 (match_operand:V 4 "const0_operand" ""))
-	 (match_operand:V 1 "constm1_operand" "")
+	 (match_operand:V 1 "all_ones_operand" "")
 	 (match_operand:V 2 "const0_operand" "")))]
   "TARGET_VX"
   [(set (match_dup 0) (match_dup 3))]
@@ -1149,7 +1149,7 @@
 	 (ne (match_operand: 3 "register_operand" "")
 	 (match_operand:V 4 "const0_operand" ""))
 	 (match_operand:V 1 "const0_operand" "")
-	 (match_operand:V 2 "constm1_operand" "")))]
+	 (match_operand:V 2 "all_ones_operand" "")))]
   "TARGET_VX"
   [(set (match_dup 0) (not:V (match_dup 3)))]
 {
@@ -1185,7 +1185,7 @@
   [(set (match_operand:V 0 "register_operand" "=v")
 	(if_then_else:V
 	 (eq (match_operand: 3 "register_operand" "v")
-	 (match_operand: 4 "constm1_operand" ""))
+	 (match_operand: 4 "all_ones_operand" ""))
 	 (match_operand:V 1 "register_operand" "v")
 	 (match_operand:V 2 "register_operand" "v")))]
   "TARGET_VX"
@@ -1197,7 +1197,7 @@
   [(set (match_operand:V 0 "register_operand" "=v")
 	(if_then_else:V
 	 (eq (not: (match_operand: 3 "register_operand" "v"))
-	 (match_operand: 4 "constm1_operand" ""))
+	 (match_operand: 4 "all_ones_operand" ""))
 	 (match_operand:V 1 "register_operand" "v")
 	 (match_operand:V 2 "register_operand" "v")))]
   "TARGET_VX"

[PATCH] Remove unused modified_noreturn_calls


Bootstrapped on x86_64-unknown-linux-gnu, applied.

Richard.

2015-12-16  Richard Biener  

* gimple-ssa.h (struct gimple_df): Remove modified_noreturn_calls
field.
* tree-ssa.c (delete_tree_ssa): Do not zero it.

Index: gcc/gimple-ssa.h
===
--- gcc/gimple-ssa.h(revision 231696)
+++ gcc/gimple-ssa.h(working copy)
@@ -44,6 +44,9 @@ struct tm_restart_hasher : ggc_ptr_hash<
   }
 };
 
+extern void gt_ggc_mx (gimple *&);
+extern void gt_pch_nx (gimple *&);
+
 struct ssa_name_hasher : ggc_ptr_hash
 {
   /* Hash a tree in a uid_decl_map.  */
@@ -67,13 +70,6 @@ struct ssa_name_hasher : ggc_ptr_hash *modified_noreturn_calls;
-
   /* Array of all SSA_NAMEs used in the function.  */
   vec *ssa_names;
 
Index: gcc/tree-ssa.c
===
--- gcc/tree-ssa.c  (revision 231696)
+++ gcc/tree-ssa.c  (working copy)
@@ -1124,7 +1124,6 @@ delete_tree_ssa (struct function *fn)
   if (fn->gimple_df->decls_to_pointers != NULL)
 delete fn->gimple_df->decls_to_pointers;
   fn->gimple_df->decls_to_pointers = NULL;
-  fn->gimple_df->modified_noreturn_calls = NULL;
   fn->gimple_df = NULL;
 
   /* We no longer need the edge variable maps.  */

Re: [PATCH] Fix some blockers of PR c++/24666 (arrays decay to pointers too early)

2015-12-17 Thread Paolo Carlini


Hi,

On 16/12/2015 23:10, Patrick Palka wrote:

gcc/cp/ChangeLog:

PR c++/59878
* typeck.c (convert_for_initialization): Don't perform an early
decaying conversion if converting to a class type.

gcc/testsuite/ChangeLog:

PR c++/59878
* g++.dg/conversion/pr59878.C: New test.
Nit: note that the actual bug number is 59879, not 59878. Can you please 
correct all those 8 to 9?


Thanks a lot!
Paolo.

RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2015-12-17 Thread Ajit Kumar Agarwal

Hello Jeff and Richard:

Here is the summary of the FDO (Feedback Directed Optimization) performance
results.

SPEC CPU2000 INT benchmarks.
a) FDO + Splitting Paths enabled + tracer enabled
 Geomean Score = 3907.751673.
b) FDO + No Splitting Paths + tracer enabled
 Geomean Score = 3895.191536.

SPEC CPU2000 FP benchmarks.
a) FDO + Splitting Paths enabled + tracer enabled
 Geomean Score = 4793.321963
b) FDO + No Splitting Paths + tracer enabled
 Geomean Score = 4770.855467

The gains are highest with both Split Paths and the tracer pass enabled,
compared to the tracer pass alone (No Split Paths + tracer enabled). The
Split Paths pass is very much required.

Thanks & Regards
Ajit

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Ajit Kumar Agarwal
Sent: Wednesday, December 16, 2015 3:44 PM
To: Richard Biener
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: RE: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation



-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Richard Biener
Sent: Wednesday, December 16, 2015 3:27 PM
To: Ajit Kumar Agarwal
Cc: Jeff Law; GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli 
Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On Wed, Dec 16, 2015 at 8:43 AM, Ajit Kumar Agarwal 
 wrote:
> Hello Jeff:
>
> Here is more of the data you asked for.
>
> SPEC FP benchmarks.
> a) No Path Splitting + tracer enabled
> Geomean Score =  4749.726.
> b) Path Splitting enabled + tracer enabled.
> Geomean Score =  4781.655.
>
> Conclusion: With both Path Splitting and tracer enabled we got maximum gains. 
> I think we need to have Path Splitting pass.
>
> SPEC INT benchmarks.
> a) Path Splitting enabled + tracer not enabled.
> Geomean Score =  3745.193.
> b) No Path Splitting + tracer enabled.
> Geomean Score = 3738.558.
> c) Path Splitting enabled + tracer enabled.
> Geomean Score = 3742.833.

>>I suppose with SPEC you mean SPEC CPU 2006?

The performance data is with respect to SPEC CPU 2000 benchmarks.

>>Can you disclose the architecture you did the measurements on and the compile 
>>flags you used otherwise?

Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz 
cpu cores   : 10
cache size  : 25600 KB

I used -O3 and enabled the tracer with -ftracer.

Thanks & Regards
Ajit
>>Note that tracer does a very good job only when paired with FDO so can you 
>>re-run SPEC with FDO and compare with path-splitting enabled on top of that?


Thanks,
Richard.

> Conclusion: We are getting more gains with Path Splitting as compared to 
> tracer. With both Path Splitting and tracer enabled we are also getting  
> gains.
> I think we should have Path Splitting pass.
>
> One more observation: Richard's concern is the creation of multiple
> exits with Splitting paths through duplication. My observation is that the
> tracer pass also creates multiple exits through duplication. I
> don’t think that’s an issue in practice, considering the gains we
> are getting with Splitting paths through more PRE, CSE and DCE.
>
> Thanks & Regards
> Ajit
>
>
>
>
> -Original Message-
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Wednesday, December 16, 2015 5:20 AM
> To: Richard Biener
> Cc: Ajit Kumar Agarwal; GCC Patches; Vinod Kathail; Shail Aditya 
> Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on 
> tree ssa representation
>
> On 12/11/2015 03:05 AM, Richard Biener wrote:
>> On Thu, Dec 10, 2015 at 9:08 PM, Jeff Law  wrote:
>>> On 12/03/2015 07:38 AM, Richard Biener wrote:

>>>> This pass is now enabled by default with -Os but has no limits on
>>>> the amount of stmts it copies.
>>>
>>> The more statements it copies, the more likely it is that the path
>>> splitting will turn out to be useful!  It's counter-intuitive.
>>
>> Well, it's still not appropriate for -Os (nor -O2 I think).  -ftracer 
>> is enabled with -fprofile-use (but it is also properly driven to only 
>> trace hot paths) and otherwise not by default at any optimization level.
> Definitely not appropriate for -Os.  But as I mentioned, I really want to 
> look at the tracer code as it may totally subsume path splitting.
>
>>
>> Don't see how this would work for the CFG pattern it operates on 
>> unless you duplicate the exit condition into that new block creating 
>> an even more obfuscated CFG.
> Agreed, I don't see any way to fix the multiple exit problem.  Then again, 
> this all runs after the tree loop optimizer, so I'm not sure how big of an 
> issue it is in practice.
>
>
>>> It was only after I approved this code after twiddling it for Ajit 
>>> that I came across Honza's

Re: [PATCH] Fix PR68707, 67323

2015-12-17 Thread Alan Lawrence


On 16/12/15 15:01, Richard Biener wrote:


The following patch adds a heuristic to prefer store/load-lanes
over SLP when vectorizing.  Compared to the variant attached to
the PR I made the STMT_VINFO_STRIDED_P behavior explicit (matching
what you've tested).


Not sure I follow this. Compared to the variant attached to the PR - we will now 
attempt to use load-lanes, if (say) all of the loads are strided, even if we 
know we don't support load-lanes (for any of them). That sounds the wrong way 
around and I think rather different to what you proposed earlier? (At the least, 
the debug message "can use load/store lanes" is potentially misleading, that's 
not necessarily the case!)


There are arguments that we want to do less SLP, generally, on ARM/AArch64 but I 
think Wilco's permute cost patch 
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01469.html is a better way of 
achieving that?


Just my gut feeling at this point - I haven't evaluated this version of the 
patch on any benchmarks etc...


Thanks, Alan

[arm-embedded][PATCH, ARM, 1/3] Document --with-multilib-list for arm*-*-* targets

Hi,

We decided to apply the following patch to the ARM embedded 5 branch.

Best regards,

Thomas

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Wednesday, December 16, 2015 7:56 PM
> To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan;
> Kyrylo Tkachov
> Subject: [PATCH, ARM, 1/3] Document --with-multilib-list for arm*-*-*
> targets
> 
> Currently, the documentation for --with-multilib-list in
> gcc/doc/install.texi only mentions sh*-*-* and x86-64-*-linux* targets.
> However, arm*-*-* targets also support this option. This patch adds
> documentation for the meaning of this option for arm*-*-* targets.
> 
> ChangeLog entry is as follows:
> 
> 
> *** gcc/ChangeLog ***
> 
> 2015-12-09  Thomas Preud'homme  
> 
> * doc/install.texi (--with-multilib-list): Describe the meaning of the
> option for arm*-*-* targets.
> 
> 
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index 57399ed..2c93eb0 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -1102,9 +1102,19 @@ sysv, aix.
>  @item --with-multilib-list=@var{list}
>  @itemx --without-multilib-list
>  Specify what multilibs to build.
> -Currently only implemented for sh*-*-* and x86-64-*-linux*.
> +Currently only implemented for arm*-*-*, sh*-*-* and x86-64-*-linux*.
> 
>  @table @code
> +@item arm*-*-*
> +@var{list} is either @code{default} or @code{aprofile}.  Specifying
> +@code{default} is equivalent to omitting this option while specifying
> +@code{aprofile} builds multilibs for each combination of ISA (@code{-
> marm} or
> +@code{-mthumb}), architecture (@code{-march=armv7-a}, @code{-
> march=armv7ve},
> +or @code{-march=armv8-a}), FPU available (none, @code{-
> mfpu=vfpv3-d16},
> +@code{neon}, @code{vfpv4-d16}, @code{neon-vfpv4} or
> @code{neon-fp-armv8}
> +depending on architecture) and floating-point ABI (@code{-mfloat-
> abi=softfp}
> +or @code{-mfloat-abi=hard}).
> +
>  @item sh*-*-*
>  @var{list} is a comma separated list of CPU names.  These must be of
> the
>  form @code{sh*} or @code{m*} (in which case they match the compiler
> option
> 
> 
> The PDF builds fine from the updated file and looks as expected.
> 
> Is this ok for trunk?
> 
> Best regards,
> 
> Thomas

[PATCH, ARM 7/6] Enable atomics for ARMv8-M Mainline

Hi,

This patch is part of a patch series to add support for ARMv8-M[1] to GCC. This
specific patch enables atomics for ARMv8-M Mainline. No change is needed to
existing patterns since the Thumb-2 backend can already handle them fine.

[1] For a quick overview of ARMv8-M please refer to the initial cover letter.
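
For context, the kind of source code that benefits is ordinary C11 atomics
with acquire/release ordering. The sketch below is only an illustration: the
mailbox and its function names are hypothetical, and which instructions GCC
selects for these accesses on ARMv8-M Mainline is an assumption that this
patch description does not spell out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-slot mailbox; the names are illustrative only.  */
static int payload;
static atomic_bool ready;

/* Writer side: store the data, then set the flag with release ordering
   so that the data write is visible before the flag.  */
void
publish (int value)
{
  payload = value;
  atomic_store_explicit (&ready, true, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above, so
   seeing the flag set guarantees the payload is visible too.  */
bool
try_consume (int *out)
{
  assert (out != 0);
  if (!atomic_load_explicit (&ready, memory_order_acquire))
    return false;
  *out = payload;
  return true;
}
```

A target where TARGET_HAVE_LDACQ is nonzero is one where the compiler may use
load-acquire and store-release instructions for such accesses instead of
separate barriers.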


ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2015-12-17  Thomas Preud'homme  

* config/arm/arm.h (TARGET_HAVE_LDACQ): Enable for ARMv8-M Mainline.


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 
1f79c37b5c36a410a2d500ba92c62a5ba4ca1178..fa2a6fb03ffd2ca53bfb7e7c8f03022b626880e0
 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -258,7 +258,7 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 || arm_arch7) && arm_arch_notm)
 
 /* Nonzero if this chip supports load-acquire and store-release.  */
-#define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8 && arm_arch_notm)
+#define TARGET_HAVE_LDACQ  (TARGET_ARM_ARCH >= 8 && TARGET_32BIT)
 
 /* Nonzero if this chip provides the movw and movt instructions.  */
 #define TARGET_HAVE_MOVT   (arm_arch_thumb2 || arm_arch8)


Testing:

* Toolchain was built successfully with and without the ARMv8-M support patches 
with the following multilib list: armv6-m,armv7-m,armv7e-m,cortex-m7. The code 
generation for crtbegin.o, crtend.o, crti.o, crtn.o, libgcc.a, libgcov.a, 
libc.a, libg.a, libgloss-linux.a, libm.a, libnosys.a, librdimon.a, librdpmon.a, 
libstdc++.a and libsupc++.a is unchanged for all these targets. 

* GCC also showed no testsuite regression when targeting ARMv8-M Baseline
compared to ARMv6-M on ARM Fast Models, and when targeting ARMv6-M and ARMv7-M
(compared to without the patch).
* GCC was bootstrapped successfully targeting Thumb-1 and targeting Thumb-2.

Is this ok for stage3?

Best regards,

Thomas
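
For context, the load-acquire and store-release instructions that
TARGET_HAVE_LDACQ gates are what a compiler can use to implement acquire
and release atomics. A minimal source-level sketch, assuming a C11
toolchain that provides <stdatomic.h>; publish and try_consume are
hypothetical names that are not part of the patch, and whether LDA/STL
are actually emitted depends on the target and options:

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* ordinary data handed from writer to reader */
static atomic_bool ready;  /* flag published with release semantics */

/* Writer: store the data, then publish the flag with a store-release so
   the payload write cannot be reordered after it.  */
void
publish (int value)
{
  payload = value;
  atomic_store_explicit (&ready, true, memory_order_release);
}

/* Reader: a load-acquire of the flag orders the later payload read.
   Returns false if nothing has been published yet.  */
bool
try_consume (int *out)
{
  if (!atomic_load_explicit (&ready, memory_order_acquire))
    return false;
  *out = payload;
  return true;
}
```

The release store pairs with the acquire load, so a reader that sees the
flag set also sees the payload written before it.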

Re: [PATCH] PR target/68937: i686: -fno-plt produces wrong code (maybe only with tailcall

2015-12-17 Thread Uros Bizjak

On Thu, Dec 17, 2015 at 12:29 AM, H.J. Lu  wrote:
> Since sibcall never returns, we can only use call-clobbered register
> as GOT base.  Otherwise, callee-saved register used as GOT base won't
> be properly restored.
>
> Tested on x86-64 with -m32.  OK for trunk?

You don't have to add explicit clobber for members of "CLOBBERED_REGS"
class, and register_no_elim_operand predicate should be used with "U"
constraint. Also, please introduce new predicate, similar to how
GOT_memory_operand is defined and handled.

Uros.

>
> H.J.
> ---
> gcc/
>
> PR target/68937
> * config/i386/i386.c (ix86_function_ok_for_sibcall): Count
> call via GOT slot as indirect call.
> (ix86_expand_call): Mark PIC register used for sibcall as
> call-clobbered.
> * config/i386/i386.md (*sibcall_GOT_32): New pattern.
> (*sibcall_value_GOT_32): Likewise.
>
> gcc/testsuite/
>
> PR target/68937
> * gcc.target/i386/pr68937-1.c: New test.
> * gcc.target/i386/pr68937-2.c: Likewise.
> * gcc.target/i386/pr68937-3.c: Likewise.
> * gcc.target/i386/pr68937-4.c: Likewise.
> ---
>  gcc/config/i386/i386.c| 15 ---
>  gcc/config/i386/i386.md   | 45 
> +++
>  gcc/testsuite/gcc.target/i386/pr68937-1.c | 13 +
>  gcc/testsuite/gcc.target/i386/pr68937-2.c | 13 +
>  gcc/testsuite/gcc.target/i386/pr68937-3.c | 13 +
>  gcc/testsuite/gcc.target/i386/pr68937-4.c | 13 +
>  6 files changed, 109 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-4.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index cecea24..ebc9d09 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -6723,8 +6723,10 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
>/* If this call is indirect, we'll need to be able to use a
>  call-clobbered register for the address of the target function.
>  Make sure that all such registers are not used for passing
> -parameters.  Note that DLLIMPORT functions are indirect.  */
> +parameters.  Note that DLLIMPORT functions and call via GOT
> +slot are indirect.  */
>if (!decl
> + || (flag_pic && !flag_plt)
>   || (TARGET_DLLIMPORT_DECL_ATTRIBUTES && DECL_DLLIMPORT_P (decl)))
> {
>   /* Check if regparm >= 3 since arg_reg_available is set to
> @@ -27019,8 +27021,8 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx 
> callarg1,
>   rtx callarg2,
>   rtx pop, bool sibcall)
>  {
> -  rtx vec[3];
> -  rtx use = NULL, call;
> +  rtx vec[4];
> +  rtx use = NULL, call, clobber = NULL;
>unsigned int vec_len = 0;
>
>if (pop == const0_rtx)
> @@ -27075,6 +27077,10 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx 
> callarg1,
>   fnaddr = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, addr),
>UNSPEC_GOT);
>   fnaddr = gen_rtx_CONST (Pmode, fnaddr);
> + /* Since sibcall never returns, mark PIC register as
> +call-clobbered.  */
> + if (sibcall)
> +   clobber = pic_offset_table_rtx;
>   fnaddr = gen_rtx_PLUS (Pmode, pic_offset_table_rtx,
>  fnaddr);
> }
> @@ -27151,6 +27157,9 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx 
> callarg1,
>  }
>vec[vec_len++] = call;
>
> +  if (clobber)
> +vec[vec_len++] = gen_rtx_CLOBBER (VOIDmode, clobber);
> +
>if (pop)
>  {
>pop = gen_rtx_PLUS (Pmode, stack_pointer_rtx, pop);
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 49b2216..65c1534 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -11865,6 +11865,28 @@
>"* return ix86_output_call_insn (insn, operands[0]);"
>[(set_attr "type" "call")])
>
> +;; Since sibcall never returns, we can only use call-clobbered register
> +;; as GOT base.
> +(define_insn "*sibcall_GOT_32"
> +  [(call (mem:QI
> +  (mem:SI (plus:SI
> +(match_operand:SI 0 "register_operand" "U")
> +(const:SI
> +  (unspec:SI [(match_operand:SI 1 "symbol_operand")]
> +   UNSPEC_GOT)
> +(match_operand 2))
> +   (clobber (match_dup 0))]
> +  "!TARGET_MACHO && !TARGET_64BIT && SIBLING_CALL_P (insn)"
> +{
> +  rtx fnaddr = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, operands[1]),
> +  UNSPEC_GOT);
> +  fnaddr = gen_rtx_CONST (Pmode, fnaddr);
> +  fnaddr = gen_rtx_PLUS (Pmode, operands[0], fnaddr);
> +  fnaddr =

Re: [PATCH 2/4] gcc/arc: Remove load_update_operand predicate




On 16/12/15 00:15, Andrew Burgess wrote:
  


* config/arc/arc.md (*loadqi_update): Use 'memory_operand' and fix
RTL pattern to include the plus.
(*load_zeroextendqisi_update): Likewise.
(*load_signextendqisi_update): Likewise.
(*loadhi_update): Likewise.
(*load_zeroextendhisi_update): Likewise.
(*load_signextendhisi_update): Likewise.
(*loadsi_update): Likewise.
(*loadsf_update): Likewise.
* config/arc/predicates.md (load_update_operand): Delete.


Store_update_operand has the very same problem, so it would make sense
to fix that in the same check-in.

FWIW, while using "memory_operand" makes for simple source code, it
introduces duplicated checks (and they are appropriate only because the
update and some of the non-update addressing modes on the ARC are
different modes of the same encoding).
It checks that the inside of the MEM is a valid memory address, which is
redundant with the pattern-provided checks that there's a plus with
appropriate base and index/update operand inside.

Also, by using memory_operand, you are adding a check to reject volatile
memory operands during most optimization passes.
Note that ARC's move_src_operand and move_dest_operand are fine with
volatile MEMs irrespective of the setting of volatile_ok.
Problems with volatiles can really only be expected if there are multiple
MEMs in a single pattern that might alias, arithmetic on MEM that might
result from simplifications using a different set of MEMs, or if the
machine instructions that a pattern corresponds to are intrinsically
unsuitable for volatile.
Needlessly rejecting volatile MEMs will reduce optimization potential;
this tends to be more visible on embedded platforms with embedded code
than with typical workstation code.

An alternative with less reliance on this addressing-mode orthogonality,
no change of volatile behaviour, and maybe a slightly faster compiler,
would be to just have an "update_operand" or "any_mem_operand" predicate
checking that the inside is a MEM, and leave the address processing
entirely to the instruction pattern and its operand predicates.

Re: [PATCH 1/5] Fix asymmetric comparison functions

2015-12-17 Thread Andrey Belevantsev


On 17.12.2015 15:13, Yury Gribov wrote:
> On 12/17/2015 02:58 PM, Andrey Belevantsev wrote:
>> Hello,
>>
>> On 17.12.2015 11:58, Yury Gribov wrote:
>>> Some obvious symmetry fixes.
>>>
>>> Cc-ing
>>> * Andrey (Belevantsev) for bb_top_order_comparator
>>
>> Here, as Jakub mentioned, we assume that the argument addresses will
>> never be equal,
>
> The problem is that this is not guaranteed.

Well, if the consensus is that this is indeed the case, you're free to
change both places as you suggest.

Yours,
Andrey

>> thus that would always be different basic blocks (the
>> comparator is used for providing a custom sort over loop body bbs) and
>> you don't need a return 0 there.  You can put there gcc_unreachable
>> instead as in ...
>>
>>> * Andrew (MacLeod) for compare_case_labels
>>> * Andrew (Pinski) for resort_field_decl_cmp
>>> * Diego for pair_cmp
>>> * Geoff for resort_method_name_cmp
>>> * Jakub for compare_case_labels
>>> * Jason for method_name_cmp
>>> * Richard for insert_phi_nodes_compare_var_infos, compare_case_labels
>>> * Steven for cmp_v_in_regset_pool
>>
>> ... this case -- here gcc_unreachable () marks that we're sorting pool
>> pointers and their values are always different.  Please do not remove it.
>
> Same here.
>
> /Yury
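
To illustrate the symmetry requirement under discussion: a comparator
handed to qsort should give opposite signs for cmp (a, b) and cmp (b, a),
and return 0 when an element is compared with itself. A minimal sketch,
not taken from the patch; struct item and both comparators are
hypothetical:

```c
#include <stdlib.h>

struct item
{
  int key;  /* primary sort key, may tie between items */
  int uid;  /* unique id, used as the final disambiguator */
};

/* Asymmetric: assumes keys never tie, so on a tie it returns 1 no matter
   which way round the arguments are passed.  */
int
cmp_asymmetric (const void *pa, const void *pb)
{
  const struct item *a = pa, *b = pb;
  return a->key < b->key ? -1 : 1;
}

/* Symmetric: ties on key are broken by uid, and 0 is returned only when
   both key and uid are equal.  */
int
cmp_symmetric (const void *pa, const void *pb)
{
  const struct item *a = pa, *b = pb;
  if (a->key != b->key)
    return a->key < b->key ? -1 : 1;
  if (a->uid != b->uid)
    return a->uid < b->uid ? -1 : 1;
  return 0;
}
```

With two items that tie on key, cmp_asymmetric claims that each one
sorts after the other, while cmp_symmetric gives consistent answers in
both directions.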

Re: [PATCH] PR target/68937: i686: -fno-plt produces wrong code (maybe only with tailcall

On Thu, Dec 17, 2015 at 2:04 AM, Uros Bizjak  wrote:
> On Thu, Dec 17, 2015 at 12:29 AM, H.J. Lu  wrote:
>> Since sibcall never returns, we can only use call-clobbered register
>> as GOT base.  Otherwise, callee-saved register used as GOT base won't
>> be properly restored.
>>
>> Tested on x86-64 with -m32.  OK for trunk?
>
> You don't have to add explicit clobber for members of "CLOBBERED_REGS"
> class, and register_no_elim_operand predicate should be used with "U"
> constraint. Also, please introduce new predicate, similar to how
> GOT_memory_operand is defined and handled.
>

Here is the updated patch.  There is a predicate already,
sibcall_memory_operand.  It allows any registers to
be used as GOT base, which is the root of our problem.
This patch removes GOT slot from it and handles
sibcall over GOT slot with *sibcall_GOT_32 and
*sibcall_value_GOT_32 patterns.  Since I need to
expose constraints on GOT base register to RA,
I have to use 2 operands, GOT base and function
symbol, to describe sibcall over 32-bit GOT slot.

OK for master if there is no regression.

Thanks.

-- 
H.J.
From e055e1ea71353897aa7ce4b38a5186c8b64ddc7c Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 16 Dec 2015 12:34:57 -0800
Subject: [PATCH] Use call-clobbered register for sibcall via GOT

Since sibcall never returns, we can only use call-clobbered register
as GOT base.  Otherwise, callee-saved register used as GOT base won't
be properly restored.

gcc/

	PR target/68937
	* config/i386/i386.c (ix86_function_ok_for_sibcall): Count
	call via GOT slot as indirect call.
	* config/i386/i386.md (*sibcall_GOT_32): New pattern.
	(*sibcall_value_GOT_32): Likewise.
	* config/i386/predicates.md (sibcall_memory_operand): Remove
	GOT slot.

gcc/testsuite/

	PR target/68937
	* gcc.target/i386/pr68937-1.c: New test.
	* gcc.target/i386/pr68937-2.c: Likewise.
	* gcc.target/i386/pr68937-3.c: Likewise.
	* gcc.target/i386/pr68937-4.c: Likewise.
	* gcc.target/i386/pr68937-5.c: Likewise.
---
 gcc/config/i386/i386.c|  4 ++-
 gcc/config/i386/i386.md   | 43 +++
 gcc/config/i386/predicates.md | 10 +++
 gcc/testsuite/gcc.target/i386/pr68937-1.c | 13 ++
 gcc/testsuite/gcc.target/i386/pr68937-2.c | 13 ++
 gcc/testsuite/gcc.target/i386/pr68937-3.c | 13 ++
 gcc/testsuite/gcc.target/i386/pr68937-4.c | 13 ++
 gcc/testsuite/gcc.target/i386/pr68937-5.c |  9 +++
 8 files changed, 111 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-5.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cecea24..0e2bec3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6723,8 +6723,10 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
   /* If this call is indirect, we'll need to be able to use a
 	 call-clobbered register for the address of the target function.
 	 Make sure that all such registers are not used for passing
-	 parameters.  Note that DLLIMPORT functions are indirect.  */
+	 parameters.  Note that DLLIMPORT functions and call via GOT
+	 slot are indirect.  */
   if (!decl
+	  || (flag_pic && !flag_plt)
 	  || (TARGET_DLLIMPORT_DECL_ATTRIBUTES && DECL_DLLIMPORT_P (decl)))
 	{
 	  /* Check if regparm >= 3 since arg_reg_available is set to
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 49b2216..7c62586 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11865,6 +11865,27 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
+;; Since sibcall never returns, we can only use call-clobbered register
+;; as GOT base.
+(define_insn "*sibcall_GOT_32"
+  [(call (mem:QI
+	   (mem:SI (plus:SI
+		 (match_operand:SI 0 "register_no_elim_operand" "U")
+		 (const:SI
+		   (unspec:SI [(match_operand:SI 1 "symbol_operand")]
+			UNSPEC_GOT)
+	 (match_operand 2))]
+  "!TARGET_MACHO && !TARGET_64BIT && SIBLING_CALL_P (insn)"
+{
+  rtx fnaddr = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, operands[1]),
+			   UNSPEC_GOT);
+  fnaddr = gen_rtx_CONST (Pmode, fnaddr);
+  fnaddr = gen_rtx_PLUS (Pmode, operands[0], fnaddr);
+  fnaddr = gen_const_mem (Pmode, fnaddr);
+  return ix86_output_call_insn (insn, fnaddr);
+}
+  [(set_attr "type" "call")])
+
 (define_insn "*sibcall"
   [(call (mem:QI (match_operand:W 0 "sibcall_insn_operand" "UBsBz"))
 	 (match_operand 1))]
@@ -12042,6 +12063,28 @@
   "* return ix86_output_call_insn (insn, operands[1]);"
   [(set_attr "type" "callv")])
 
+;; Since sibcall never returns, we can only use call-clobbered register
+;; as GOT base.
+(define_insn "*sibcall_value_GOT_32"

Re: [PATCH] Fix some blockers of PR c++/24666 (arrays decay to pointers too early)

2015-12-17 Thread Patrick Palka


On Thu, 17 Dec 2015, Paolo Carlini wrote:

> Hi,
>
> On 16/12/2015 23:10, Patrick Palka wrote:
>> gcc/cp/ChangeLog:
>>
>> PR c++/59878
>> * typeck.c (convert_for_initialization): Don't perform an early
>> decaying conversion if converting to a class type.
>>
>> gcc/testsuite/ChangeLog:
>>
>> PR c++/59878
>> * g++.dg/conversion/pr59878.C: New test.
>
> Nit: note that the actual bug number is 59879, not 59878. Can you please
> correct all those 8 to 9?


Sorry about that... Going to correct this with the following patch after
a quick regtest:

--- 8< ---

Subject: [PATCH] Fix wrong PR references

PR c++/59878 -> PR c++/59879
---
 gcc/cp/ChangeLog  |  2 +-
 gcc/testsuite/ChangeLog   |  4 ++--
 gcc/testsuite/g++.dg/conversion/pr59878.C | 25 -
 gcc/testsuite/g++.dg/conversion/pr59879.C | 25 +
 4 files changed, 28 insertions(+), 28 deletions(-)
 delete mode 100644 gcc/testsuite/g++.dg/conversion/pr59878.C
 create mode 100644 gcc/testsuite/g++.dg/conversion/pr59879.C

diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 14292e9..a192f00 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -2,7 +2,7 @@

PR c++/16333
PR c++/41426
-   PR c++/59878
+   PR c++/59879
PR c++/66895
* typeck.c (convert_for_initialization): Don't perform an early
decaying conversion if converting to a class type.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index fafa8cc..7386f6b 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -37,11 +37,11 @@

PR c++/16333
PR c++/41426
-   PR c++/59878
+   PR c++/59879
PR c++/66895
* g++.dg/conversion/pr16333.C: New test.
* g++.dg/conversion/pr41426.C: New test.
-   * g++.dg/conversion/pr59878.C: New test.
+   * g++.dg/conversion/pr59879.C: New test.
* g++.dg/conversion/pr66895.C: New test.

 2015-12-16  Martin Sebor  
diff --git a/gcc/testsuite/g++.dg/conversion/pr59878.C 
b/gcc/testsuite/g++.dg/conversion/pr59878.C
deleted file mode 100644
index ed567fe..000
--- a/gcc/testsuite/g++.dg/conversion/pr59878.C
+++ /dev/null
@@ -1,25 +0,0 @@
-// PR c++/59878
-
-struct Test {
- template 
- Test(const char ()[N]) {}
-};
-
-Test test() {
- return "test1";
-}
-
-void test2(Test arg = "test12") {}
-
-template 
-void test3(T arg = "test123") {}
-
-template 
-void test4(const T  = "test123") {}
-
-int main() {
- test();
- test2();
- test3();
- test4();
-}
diff --git a/gcc/testsuite/g++.dg/conversion/pr59879.C 
b/gcc/testsuite/g++.dg/conversion/pr59879.C
new file mode 100644
index 000..7bd5b99
--- /dev/null
+++ b/gcc/testsuite/g++.dg/conversion/pr59879.C
@@ -0,0 +1,25 @@
+// PR c++/59879
+
+struct Test {
+ template 
+ Test(const char ()[N]) {}
+};
+
+Test test() {
+ return "test1";
+}
+
+void test2(Test arg = "test12") {}
+
+template 
+void test3(T arg = "test123") {}
+
+template 
+void test4(const T  = "test123") {}
+
+int main() {
+ test();
+ test2();
+ test3();
+ test4();
+}
--
2.7.0.rc0.50.g1470d8f.dirty

[PATCH] Fix PR68946


This fixes PR68946.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2015-12-17  Richard Biener  

PR tree-optimization/68946
* tree-vect-slp.c (vect_slp_analyze_node_operations): Push
SLP def type to stmt operands one stmt at a time.

* gcc.dg/torture/pr68946.c: New testcase.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 231745)
+++ gcc/tree-vect-slp.c (working copy)
@@ -2221,12 +2250,6 @@ vect_slp_analyze_node_operations (slp_tr
 if (!vect_slp_analyze_node_operations (child))
   return false;
 
-  /* Push SLP node def-type to stmts.  */
-  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
-if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
-  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (child), j, stmt)
-   STMT_VINFO_DEF_TYPE (vinfo_for_stmt (stmt)) = SLP_TREE_DEF_TYPE (child);
-
   bool res = true;
   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt)
 {
@@ -2234,19 +2257,21 @@ vect_slp_analyze_node_operations (slp_tr
   gcc_assert (stmt_info);
   gcc_assert (STMT_SLP_TYPE (stmt_info) != loop_vect);
 
-  if (!vect_analyze_stmt (stmt, , node))
-   {
- res = false;
- break;
-   }
+  /* Push SLP node def-type to stmt operands.  */
+  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), j, child)
+   if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
+ STMT_VINFO_DEF_TYPE (vinfo_for_stmt (SLP_TREE_SCALAR_STMTS 
(child)[i]))
+   = SLP_TREE_DEF_TYPE (child);
+  res = vect_analyze_stmt (stmt, , node);
+  /* Restore def-types.  */
+  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), j, child)
+   if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
+ STMT_VINFO_DEF_TYPE (vinfo_for_stmt (SLP_TREE_SCALAR_STMTS 
(child)[i]))
+   = vect_internal_def;
+  if (! res)
+   break;
 }
 
-  /* Restore stmt def-types.  */
-  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
-if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
-  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (child), j, stmt)
-   STMT_VINFO_DEF_TYPE (vinfo_for_stmt (stmt)) = vect_internal_def;
-
   return res;
 }
 
Index: gcc/testsuite/gcc.dg/torture/pr68946.c
===
--- gcc/testsuite/gcc.dg/torture/pr68946.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr68946.c  (working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fno-vect-cost-model" } */
+
+int printf (const char *, ...);
+
+int a, b, g;
+short c, e, h, i;
+int f[8];
+void fn1() {
+short j;
+for (; a;) {
+   printf("%d", g);
+   b = 7;
+   for (; b >= 0; b--) {
+   i = 1;
+   short k = f[b];
+   e = k ? k : 3;
+   j = (i && (c |= e)) << 3;
+   int l = j, m = 0;
+   h = l < 0 || l >> m;
+   f[b] = h;
+   }
+}
+}

[ptx] annotate 2 tests

2015-12-17 Thread Nathan Sidwell


These two tests require label values,  annotated thusly.

nathan
2015-12-17  Nathan Sidwell  

	* c-c++-common/Wunused-var-13.c: Requires label values.
	* gcc.dg/torture/pr46216.c: Likewise.

Index: c-c++-common/Wunused-var-13.c
===
--- c-c++-common/Wunused-var-13.c	(revision 231757)
+++ c-c++-common/Wunused-var-13.c	(working copy)
@@ -1,6 +1,7 @@
 /* PR c/46015 */
 /* { dg-options "-Wunused" } */
 /* { dg-do compile } */
+/* { dg-require-effective-target label_values } */
 
 int
 f1 (int i)
Index: gcc.dg/torture/pr46216.c
===
--- gcc.dg/torture/pr46216.c	(revision 231757)
+++ gcc.dg/torture/pr46216.c	(working copy)
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target label_values } */
 
 typedef int Embryo_Cell;
 int
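
For readers unfamiliar with the feature these tests depend on: label
values are the GNU C extension that takes the address of a label with &&
and jumps to it with goto *. A minimal sketch, assuming a compiler that
supports the extension (presumably not the case for ptx, hence the
annotation); dispatch is a hypothetical name, not code from either test:

```c
/* Uses the GNU C "labels as values" extension: &&label yields a void *
   and "goto *ptr" jumps to it.  */
int
dispatch (int op, int x)
{
  static void *const table[] = { &&do_inc, &&do_dbl };

  if (op < 0 || op > 1)
    return x;            /* unknown op: leave x unchanged */
  goto *table[op];

do_inc:
  return x + 1;
do_dbl:
  return x * 2;
}
```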

Re: [PATCH] PR target/68937: i686: -fno-plt produces wrong code (maybe only with tailcall

2015-12-17 Thread Uros Bizjak

On Thu, Dec 17, 2015 at 2:00 PM, H.J. Lu  wrote:
> On Thu, Dec 17, 2015 at 2:04 AM, Uros Bizjak  wrote:
>> On Thu, Dec 17, 2015 at 12:29 AM, H.J. Lu  wrote:
>>> Since sibcall never returns, we can only use call-clobbered register
>>> as GOT base.  Otherwise, callee-saved register used as GOT base won't
>>> be properly restored.
>>>
>>> Tested on x86-64 with -m32.  OK for trunk?
>>
>> You don't have to add explicit clobber for members of "CLOBBERED_REGS"
>> class, and register_no_elim_operand predicate should be used with "U"
>> constraint. Also, please introduce new predicate, similar to how
>> GOT_memory_operand is defined and handled.
>>
>
> Here is the updated patch.  There is a predicate already,
> sibcall_memory_operand.  It allows any registers to
> be used as GOT base, which is the root of our problem.
> This patch removes GOT slot from it and handles
> sibcall over GOT slot with *sibcall_GOT_32 and
> *sibcall_value_GOT_32 patterns.  Since I need to
> expose constraints on GOT base register to RA,
> I have to use 2 operands, GOT base and function
> symbol, to describe sibcall over 32-bit GOT slot.

Please use

   (mem:SI (plus:SI
 (match_operand:SI 0 "register_no_elim_operand" "U")
 (match_operand:SI 1 "GOT32_symbol_operand")))
...

to avoid manual rebuild of the operand.

Uros.

RE: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS

2015-12-17 Thread Wilco Dijkstra

James Greenhalgh wrote:
> On Wed, Dec 16, 2015 at 01:05:21PM +, Wilco Dijkstra wrote:
> > James Greenhalgh wrote:
> > > On Tue, Dec 15, 2015 at 10:54:49AM +, Wilco Dijkstra wrote:
> > > > ping
> > > >
> > > > > -Original Message-
> > > > > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> > > > > Sent: 06 November 2015 20:06
> > > > > To: 'gcc-patches@gcc.gnu.org'
> > > > > Subject: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > > > >
> > > > > This patch adds support for the TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> > > > > hook. When the cost of GENERAL_REGS and FP_REGS is identical, the
> > > > > register allocator always uses ALL_REGS even when it has a much
> > > > > higher cost. The hook changes the class to either FP_REGS or
> > > > > GENERAL_REGS depending on the mode of the register. This results in
> > > > > better register allocation overall, fewer spills and reduced
> > > > > codesize - particularly in SPEC2006 gamess.
> > > > >
> > > > > GCC regression passes with several minor fixes.
> > > > >
> > > > > OK for commit?
> > > > >
> > > > > ChangeLog:
> > > > > 2015-11-06  Wilco Dijkstra  
> > > > >
> > > > >   * gcc/config/aarch64/aarch64.c
> > > > >   (TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): New define.
> > > > >   (aarch64_ira_change_pseudo_allocno_class): New function.
> > > > >   * gcc/testsuite/gcc.target/aarch64/cvtf_1.c: Build with -O2.
> > > > >   * gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > >   (test_corners_sisd_di): Improve force to SIMD register.
> > > > >   (test_corners_sisd_si): Likewise.
> > > > >   * gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c: Build with 
> > > > > -O2.
> > > > >   * gcc/testsuite/gcc.target/aarch64/vect-ld1r-compile-fp.c:
> > > > >   Remove scan-assembler check for ldr.
> > >
> > > Drop the gcc/ from the ChangeLog.
> > >
> > > > > --
> > > > >  gcc/config/aarch64/aarch64.c   | 22 
> > > > > ++
> > > > >  gcc/testsuite/gcc.target/aarch64/cvtf_1.c  |  2 +-
> > > > >  gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c  |  4 ++--
> > > > >  gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c |  2 +-
> > > > >  .../gcc.target/aarch64/vect-ld1r-compile-fp.c  |  1 -
> > >
> > > These testsuite changes concern me a bit, and you don't mention them
> > > beyond saying they are minor fixes...
> >
> > Well any changes to register allocator preferencing would cause fallout in
> > tests that are assuming which register is allocated, especially if they use
> > nasty inline assembler hacks to do so...
> 
> Sure, but the testcases here each operate on data that should live in
> FP_REGS given the initial conditions that the nasty hacks try to mimic -
> that's what makes the regressions notable.
> 
> >
> > > > >  #define FCVTDEF(ftype,itype) \
> > > > >  void \
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c 
> > > > > b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > index 363f554..8465c89 100644
> > > > > --- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > +++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c
> > > > > @@ -186,9 +186,9 @@ test_corners_sisd_di (Int64x1 b)
> > > > >  {
> > > > >force_simd_di (b);
> > > > >b = b >> 63;
> > > > > +  force_simd_di (b);
> > > > >b = b >> 0;
> > > > >b += b >> 65; /* { dg-warning "right shift count >= width of type" 
> > > > > } */
> > > > > -  force_simd_di (b);
> > >
> > > This one I don't understand, but seems to say that we've decided to move
> > > b out of FP_REGS after getting it in there for b = b << 63; ? So this is
> > > another register allocator regression?
> >
> > No, basically the register allocator is now making better decisions as to
> > where to allocate integer variables. It will only allocate them to FP
> > registers if they are primarily used by other FP operations. The
> > force_simd_di inline assembler tries to mimic FP uses, and if there are
> > enough of them at the right places then everything works as expected.  If
> > however you do 3 consecutive integer operations then the allocator will now
> > correctly prefer to allocate them to the integer registers (while previously
> > it wouldn't, which is inefficient).
> 
> I'm not sure I understand this argument in the abstract (though I believe
> it for some of the supported cores for the AArch64 target). At an abstract
> level, given a set of operations which can execute in either FP_REGS or
> GENERAL_REGS and initial and post conditions that allocate all input and
> output registers from those operations to FP_REGS, I would expect those
> operations to take place using FP_REGS? Your patch seems to break this
> expectation?

No my patch doesn't break that expectation. The goal is that if the cost of 
allocating to either integer or FP registers is the same, we prefer the most
natural register

Re: libgcc: unwind-ia64.c without malloc/free


On 12/17/2015 12:17 AM, Bernd Edlinger wrote:
> this is just an idea, how to avoid use of malloc in unwind-ia64.c.
>
> [...]
>
> What do you think?

Not worth worrying about IMO. I think ia64 is dead and best left to rest
in maintenance mode.

Bernd

Re: libgcc: unwind-ia64.c without malloc/free


On 12/17/2015 06:17 AM, Bernd Schmidt wrote:
> On 12/17/2015 12:17 AM, Bernd Edlinger wrote:
>> this is just an idea, how to avoid use of malloc in unwind-ia64.c.
>>
>> [...]
>>
>> What do you think?
>
> Not worth worrying about IMO. I think ia64 is dead and best left to rest
> in maintenance mode.

Agreed.  And in general using alloca is a problem waiting to happen
unless you can prove there's no way to blow out the stack.  I can't
count the number of problems of that nature we've fixed in glibc over
the last 5 years when the hackers realized that was a great attack vector.


jeff
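
One common way to get most of the benefit of stack allocation without an
unbounded alloca is a fixed-size stack buffer with a heap fallback. A
minimal sketch, assuming the caller can tolerate a malloc for large
requests (which the unwinder case above specifically wants to avoid);
checksum_copy and STACK_LIMIT are hypothetical names:

```c
#include <stdlib.h>
#include <string.h>

enum { STACK_LIMIT = 256 };  /* largest request served from the stack */

/* Copies N bytes from SRC into scratch space and returns their sum, or
   -1 if the heap fallback fails.  Stack usage is bounded by STACK_LIMIT
   regardless of N.  */
long
checksum_copy (const unsigned char *src, size_t n)
{
  unsigned char small[STACK_LIMIT];
  unsigned char *buf = small;
  long sum = 0;

  if (n > sizeof small)
    {
      buf = malloc (n);
      if (!buf)
        return -1;
    }

  memcpy (buf, src, n);
  for (size_t i = 0; i < n; i++)
    sum += buf[i];

  if (buf != small)
    free (buf);
  return sum;
}
```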

Re: [PATCH] Remove unused modified_noreturn_calls

On Thu, 17 Dec 2015, Bernd Schmidt wrote:

> On 12/17/2015 10:59 AM, Richard Biener wrote:
> > 
> > +extern void gt_ggc_mx (gimple *&);
> > +extern void gt_pch_nx (gimple *&);
> > +
> 
> This doesn't occur in the ChangeLog - unrelated change?

Not unrelated, it's required to make gtype-desc.c compile.  See
other occurrences of these forward-decls.  They are needed from
hash_map/table.

Took me quite a while to figure out ;)

Richard.

Re: [PATCH][ARC] Refurbish and extend builtin function support for ARC




On 17/12/15 09:31, Claudiu Zissulescu wrote:
> Please find a new patch that refurbishes and extends the builtin function
> support for ARC. I also added a number of builtins for ARCv2 architecture, and
> a number of tests.
>
> Ok to commit?
>
> gcc/
> 2015-12-14  Claudiu Zissulescu
>
> ...
>
> 	(VUNSPEC_DEXCL_NORES, VUNSPEC_LR_HIGH): Remove

Typo: missing a period.
Otherwise, this is OK.

Although, I think the regular builtin part of arc_expand_builtin could be
simpler if you increased the size of xop by one, and put target into xop[0].

This would then lend itself to further simplification if we had something
to common these switches on the number of arguments to pass to GEN_FCN
strewn over various parts and ports of gcc.
Like:

rtx_insn *
apply_GEN_FCN (enum insn_code icode, rtx *arg)
{
  switch (insn_data[icode].n_generator_args)
{
case 0:
  return GEN_FCN (icode) ();
case 1:
  return GEN_FCN (icode) (arg[0]);
...
}
}

This could be generated by one of the generator programs so that the
switch has as many cases as required to cover the full range of
insn_data[icode].n_generator_args.
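
A self-contained sketch of the dispatch idea, outside of GCC: each table
entry records how many arguments its generator takes, and one helper
switches on that count. All names here (gen_entry, gen_table, apply_gen)
are hypothetical stand-ins for insn_data and GEN_FCN, and plain ints
stand in for rtx:

```c
/* Generator signatures for each supported argument count.  */
typedef int (*gen0_fn) (void);
typedef int (*gen1_fn) (int);
typedef int (*gen2_fn) (int, int);

struct gen_entry
{
  int n_generator_args;  /* how many arguments the generator takes */
  void (*fn) (void);     /* generic pointer, cast back before the call */
};

static int gen_seven (void) { return 7; }
static int gen_neg (int a) { return -a; }
static int gen_add (int a, int b) { return a + b; }

static const struct gen_entry gen_table[] = {
  { 0, (void (*) (void)) gen_seven },
  { 1, (void (*) (void)) gen_neg },
  { 2, (void (*) (void)) gen_add },
};

/* Calls generator ICODE with as many elements of ARG as it expects.
   Returns -1 for an argument count the switch does not cover.  */
int
apply_gen (int icode, const int *arg)
{
  const struct gen_entry *e = &gen_table[icode];

  switch (e->n_generator_args)
    {
    case 0:
      return ((gen0_fn) e->fn) ();
    case 1:
      return ((gen1_fn) e->fn) (arg[0]);
    case 2:
      return ((gen2_fn) e->fn) (arg[0], arg[1]);
    default:
      return -1;
    }
}
```

Callers then fill one argument array and make a single call, instead of
repeating the switch at every use site.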

Re: [PATCH] Remove unused modified_noreturn_calls


On 12/17/2015 10:59 AM, Richard Biener wrote:
> +extern void gt_ggc_mx (gimple *&);
> +extern void gt_pch_nx (gimple *&);
> +

This doesn't occur in the ChangeLog - unrelated change?

Bernd

Re: [PATCHES, PING*5] Enhance standard DWARF for Ada

2015-12-17 Thread Pierre-Marie de Rodat


On 12/16/2015 10:30 PM, Jason Merrill wrote:
> OK with those changes.

All changes done, and all patches pushed. Thank you very much!!

--
Pierre-Marie de Rodat

Re: [PATCH][combine] Check WORD_REGISTER_OPERATIONS normally rather than through preprocessor

2015-12-17 Thread Segher Boessenkool

Hi Kyrill,

On Tue, Dec 15, 2015 at 05:07:41PM +, Kyrill Tkachov wrote:
> As part of the war on conditional compilation here's an #if check on 
> WORD_REGISTER_OPERATIONS that
> seems to have been missed out.
> 
> Bootstrapped and tested on arm, aarch64, x86_64.
> 
> Is it still ok to commit these kinds of conditional compilation conversions?

You could say it is a bugfix, a missed case in the conversion ;-)

> diff --git a/gcc/combine.c b/gcc/combine.c
> index 8601d8983ce345e2129dd047b3520d98c0582842..0658a6dbc6df6862df662bc7842c13ed06b36b04 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -11488,10 +11488,10 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
>/* Try a few ways of applying the same transformation to both operands.  */
>while (1)
>  {
> -#if !WORD_REGISTER_OPERATIONS
>/* The test below this one won't handle SIGN_EXTENDs on these machines,
>so check specially.  */
> -  if (code != GTU && code != GEU && code != LTU && code != LEU
> +  if (!WORD_REGISTER_OPERATIONS && code != GTU && code != GEU
> +   && code != LTU && code != LEU

Please keep all the code != together, i.e.

+  if (!WORD_REGISTER_OPERATIONS
+ && code != GTU && code != GEU && code != LTU && code != LEU

Okay with that change.


Segher
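
The shape of such a conversion, as a minimal standalone sketch: the macro
is always defined to 0 or 1, so the test can be an ordinary condition
that the compiler folds away, and both arms keep being parsed and
type-checked on every target. The default definition, the enum and the
function name below are assumptions for illustration, not GCC's own
definitions:

```c
/* Assumed default for this sketch; in the compiler the target headers
   would provide the value.  */
#ifndef WORD_REGISTER_OPERATIONS
#define WORD_REGISTER_OPERATIONS 0
#endif

enum cmp_code { EQ, NE, GTU, GEU, LTU, LEU };

/* Returns 1 when the special SIGN_EXTEND check applies: only on targets
   without word register operations, and only for comparisons that are
   not unsigned orderings.  */
int
needs_sign_extend_check (enum cmp_code code)
{
  if (!WORD_REGISTER_OPERATIONS
      && code != GTU && code != GEU && code != LTU && code != LEU)
    return 1;
  return 0;
}
```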

Re: [Fortran, Patch] (RFC, Coarray) Implement TS18508's EVENTS

2015-12-17 Thread Steve Kargl

On Thu, Dec 17, 2015 at 01:22:06PM +0100, Alessandro Fanfarillo wrote:
> 
> I've noticed that this patch has been applied only on trunk and not on
> the gcc-5-branch. Is it a problem to include EVENTS in gcc-5?
> 

No problem.  When I applied the EVENTS patch to trunk,
the 5.3 release was being prepared.  I was going to
wait for a week or two after 5.3 came out, then apply
the patch.  Now that you have commit access, feel
free to backport the patch.  Remember to post the
patch that you commit to both the fortran and gcc-patches
lists.

-- 
Steve

Re: [PATCH 4/5] Fix intransitive comparison in compare_access_positions

2015-12-17 Thread Martin Jambor

Hi,

On Thu, Dec 17, 2015 at 12:02:11PM +0300, Yury Gribov wrote:
> Another intransitive comparison in reload_pseudo_compare_func. Buggy
> scenario:
> 1) A and B are ints of equal precision so we return 0
> 2) C is REAL and thus can compare differently to A and B
> 
> Cc-ing Martin who's the original author.

I cannot approve it but I also do not object to this change.
Thanks,

Martin

> 
> /Yury

> From 6f3930ad81945f6b5d7aecfdda16089547a592d3 Mon Sep 17 00:00:00 2001
> From: Yury Gribov 
> Date: Sat, 12 Dec 2015 10:39:15 +0300
> Subject: [PATCH 4/5] Fix intransitive comparison in compare_access_positions.
> 
> 2015-12-17  Yury Gribov  
> 
>   * tree-sra.c (compare_access_positions):
>   Make transitive.
>
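
The failure mode described in the quoted scenario can be reproduced with
a small standalone sketch. This is not the tree-sra.c code; struct
access, its fields and both comparators are hypothetical:

```c
enum kind { KIND_INT, KIND_REAL };

struct access
{
  enum kind kind;
  int precision;
  int uid;        /* unique id, used here as a fallback key */
};

/* Intransitive: two ints are ordered by precision only (so equal
   precision means "equal"), while any pair involving a real is ordered
   by uid.  */
int
cmp_intransitive (const struct access *a, const struct access *b)
{
  if (a->kind == KIND_INT && b->kind == KIND_INT)
    return (a->precision > b->precision) - (a->precision < b->precision);
  return (a->uid > b->uid) - (a->uid < b->uid);
}

/* Transitive: one lexicographic key (kind, precision, uid) is applied to
   every pair.  */
int
cmp_transitive (const struct access *a, const struct access *b)
{
  if (a->kind != b->kind)
    return (a->kind > b->kind) - (a->kind < b->kind);
  if (a->precision != b->precision)
    return (a->precision > b->precision) - (a->precision < b->precision);
  return (a->uid > b->uid) - (a->uid < b->uid);
}
```

With A and B ints of equal precision and C a real whose uid lies between
theirs, cmp_intransitive reports A equal to B but also A before C and C
before B, which no total order can satisfy.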

Re: [PATCH 5/5] Fix intransitive comparison in dr_group_sort_cmp

On Thu, 17 Dec 2015, Yury Gribov wrote:

> On 12/17/2015 02:57 PM, Richard Biener wrote:
> > On Thu, 17 Dec 2015, Yury Gribov wrote:
> > 
> > > That's an interesting one. The original comparison function assumes that
> > > operand_equal_p(a,b) is true iff compare_tree(a, b) == 0.
> > > Unfortunately that's not true (functions are written by different
> > > authors).
> > > 
> > > This causes subtle violation of transitiveness.
> > > 
> > > I believe removing operand_equal_p should preserve the intended semantics
> > > (same approach taken in another comparison function in this file -
> > > comp_dr_with_seg_len_pair).
> > > 
> > > Cc-ing Cong Hou and Richard who are the authors.
> > 
> > I don't think the patch is good.  compare_tree really doesn't expect
> > equal elements (and it returning zero is bad or a bug).
> 
> Hm but that's how it's used in other comparator in this file
> (comp_dr_with_seg_len_pair).

But for sure

  switch (code)
{
/* For const values, we can just use hash values for comparisons.  */
case INTEGER_CST:
case REAL_CST:
case FIXED_CST:
case STRING_CST:
case COMPLEX_CST:
case VECTOR_CST:
  {
hashval_t h1 = iterative_hash_expr (t1, 0);
hashval_t h2 = iterative_hash_expr (t2, 0);
if (h1 != h2)
  return h1 < h2 ? -1 : 1;
break;
  }

doesn't detect un-equality correctly (it assumes the hash is 
collision-free).

Also note that operator== of dr_with_seg_len again also uses
operand_equal_p (plus compare_tree).

IMHO compare_tree should be cleaned up with respect to what
trees we expect here (no REAL_CSTs for example) and properly
do comparisons.

> > But it's also
> > "lazy" in that it will return 0 when it hopes a further disambiguation
> > inside dr_group_sort_cmp on a different field will eventually lead to
> > a non-zero compare_tree.
> > 
> > So eventually if compare_tree returns zero we have to fall back to the
> > final disambiguator using gimple_uid.
> >
> > That said, I'd like to see the testcase where you observe an
> > intransitive comparison.
> 
> Let me dig my debugging logs (I'll send detailed repro tomorrow).

Thanks.

Richard.
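
The hash-collision concern in the message above (the constant cases only return a non-zero result when the two hashes differ) can be illustrated outside GCC. This is a minimal sketch, assuming a deliberately weak toy hash over strings; toy_hash, hash_only_cmp and hash_then_value_cmp are hypothetical names and do not correspond to iterative_hash_expr or compare_tree.

```c
#include <assert.h>
#include <string.h>

/* Deliberately weak toy hash so that collisions are easy to produce;
   it stands in for a real expression hash only by analogy.  */
static unsigned
toy_hash (const char *s)
{
  assert (s != NULL);
  unsigned h = 0;
  while (*s)
    h += (unsigned char) *s++;
  return h;
}

/* Orders by hash only: returns 0 for any two strings whose hashes
   collide, even when the strings themselves differ.  */
static int
hash_only_cmp (const char *a, const char *b)
{
  unsigned h1 = toy_hash (a);
  unsigned h2 = toy_hash (b);
  if (h1 != h2)
    return h1 < h2 ? -1 : 1;
  return 0;
}

/* Same ordering, but a hash tie is broken by comparing the contents,
   so 0 is returned only for genuinely equal strings.  */
static int
hash_then_value_cmp (const char *a, const char *b)
{
  int c = hash_only_cmp (a, b);
  if (c != 0)
    return c;
  c = strcmp (a, b);
  return (c > 0) - (c < 0);
}
```

Here "ab" and "ba" hash to the same value, so the hash-only comparator calls them equal while the tie-breaking variant still separates them.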

Re: [Fortran, Patch] (RFC, Coarray) Implement TS18508's EVENTS

2015-12-17 Thread Alessandro Fanfarillo

Great! Thanks.

2015-12-17 15:57 GMT+01:00 Steve Kargl :
> On Thu, Dec 17, 2015 at 01:22:06PM +0100, Alessandro Fanfarillo wrote:
>>
>> I've noticed that this patch has been applied only on trunk and not on
>> the gcc-5-branch. Is it a problem to include EVENTS in gcc-5?
>>
>
> No problem.  When I applied the EVENTS patch to trunk,
> the 5.3 release was being prepared.  I was going to
> wait for a week or two after 5.3 came out, then apply
> the patch.  Now that you have commit access, feel
> free to back port the patch.  Remember to post the
> patch that you commit to both the fortran and gcc-patches
> list.
>
> --
> Steve

Re: [PATCH 0/2] obsolete some old targets

2015-12-17 Thread Trevor Saunders

On Tue, Dec 15, 2015 at 03:25:18PM -0700, Jeff Law wrote:
> On 12/15/2015 03:02 PM, Trevor Saunders wrote:
> >>
> >>Can you mark interix as obsolete?  It hasn't even built for a long time.
> >
> >  Sure, I can do that if you want, I just wasn't sure before you wanted
> >  to.
> Please do.  I know we've been round and round on that one before, but given
> it hasn't been building since 2012, I think obsoleting is appropriate.

ok, I committed these two patches and a third obsoleting interix.  Given
it's mechanically the same as these, I took this as approval to go ahead
with it and save you a second of review time.  If you object, obviously
we can change that.

> Fixing it wouldn't be hard, it just doesn't seem worth the effort.

agreed

Trev

> 
> jeff

Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes



On 17/12/15 15:58, Bernd Schmidt wrote:

On 12/17/2015 04:36 PM, Kyrill Tkachov wrote:

The documentation on RTL canonical forms in md.texi says:

"Equality comparisons of a group of bits (usually a single bit) with zero
  will be written using @code{zero_extract} rather than the equivalent
  @code{and} or @code{sign_extract} operations. "

However, this is not always followed in combine. If it's trying to optimise
a comparison against zero of a bitmask that is the mode mask of some mode
(255 for QImode and 65535 for HImode in the testcases of this patch)
it will instead create a subreg to that shorter mode.


I suspect that this is an oversight in the documentation, and if given two 
choices the simpler form is intended to be the canonical one.


it ends up trying to make a QImode comparison against zero, for which
targets like
aarch64 have no pattern.


So, can you define a pattern for it...


To get the benefit on aarch64 this needs patch 1/2 that adds an aarch64
pattern
for comparing a zero_extract with zero.


... instead of this one?



Yes, I had investigated that approach and it has the same effect (on aarch64).
My motivation for this approach was to try avoiding defining multiple patterns 
for what should
be equivalent expressions. But if the short subreg form is intended to be the 
canonical form...


What do people think of this approach?
I hope this just enforces the already documented canonicalisation rules
with minimal(none?) negative
fallout.


I'm not so sure about this. Other ports have QImode comparisons and I would 
want to see some evidence that there are no code quality regressions. This is 
not stage 3 material in any case.



Well, this patch still produces the QImode comparison if the target has a 
QImode comparison
(the have_insn_for check in the simplify_comparison hunk).
As I said, the effects on arm and aarch64 were strictly beneficial.
On x86_64 I saw no codegen difference on SPEC2006.
If this is considered too risky at this stage I can propose a QImode pattern for
aarch64 instead to isolate this fix to that backend.

Thanks,
Kyrill



Bernd

Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes


On 12/17/2015 08:58 AM, Bernd Schmidt wrote:

On 12/17/2015 04:36 PM, Kyrill Tkachov wrote:

The documentation on RTL canonical forms in md.texi says:

"Equality comparisons of a group of bits (usually a single bit) with zero
  will be written using @code{zero_extract} rather than the equivalent
  @code{and} or @code{sign_extract} operations. "

However, this is not always followed in combine. If it's trying to
optimise
a comparison against zero of a bitmask that is the mode mask of some mode
(255 for QImode and 65535 for HImode in the testcases of this patch)
it will instead create a subreg to that shorter mode.


I suspect that this is an oversight in the documentation, and if given
two choices the simpler form is intended to be the canonical one.
It's also the case that sometimes a SUBREG is preferred because it 
conveys that certain bits are "don't care".  In theory this may allow 
things to optimize better.


However, in practice, I'm not sure that's regularly the case because 
various passes are weak in trying to exploit the semantics of the SUBREG 
and passes are generally pretty strong in their handling of zero_extract 
and friends.


IIRC I actually bumped against this in the gcc-5 cycle when fixing some 
suboptimal code generation issues.  I think it was BZ15184.  I'd check 
the archives for Dec 2014 and Jan 2015.  There may be a mention of this 
issue in there from me (I can recall bumping into it, but can't recall 
if I ever mentioned it publicly or if I ever submitted the change to 
prefer the zero_extract form over the subreg form).


Jeff

PATCH: PR target/66232: -fPIC -fno-plt -mx32 fails to generate indirect branch via GOT

Since Pmode is 64-bit with -maddress-mode=long for x32, indirect call
via GOT slot doesn't need zero_extend.  This patch limits *call_got_x32
and *call_value_got_x32 patterns to 32-bit Pmode, adds *call_got_x32_long
and *call_value_got_x32_long for 64-bit Pmode.

OK for trunk if there is no regression?


H.J.
---
gcc/

PR target/66232
* config/i386/i386.md (*call_got_x32): Limited to 32-bit Pmode.
(*call_value_got_x32): Likewise.
(*call_got_x32_long): New pattern.
(*call_value_got_x32_long): Likewise.

gcc/testsuite/

PR target/66232
* gcc.target/i386/pr66232-10.c: New test.
* gcc.target/i386/pr66232-11.c: Likewise.
* gcc.target/i386/pr66232-12.c: Likewise.
* gcc.target/i386/pr66232-13.c: Likewise.
---
 gcc/config/i386/i386.md| 19 +--
 gcc/testsuite/gcc.target/i386/pr66232-10.c | 13 +
 gcc/testsuite/gcc.target/i386/pr66232-11.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr66232-12.c | 13 +
 gcc/testsuite/gcc.target/i386/pr66232-13.c | 13 +
 5 files changed, 70 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-13.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 49b2216..dc61050 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11861,7 +11861,14 @@
   [(call (mem:QI (zero_extend:DI
   (match_operand:SI 0 "GOT_memory_operand" "Bg")))
 (match_operand 1))]
-  "TARGET_X32"
+  "TARGET_X32 && Pmode == SImode"
+  "* return ix86_output_call_insn (insn, operands[0]);"
+  [(set_attr "type" "call")])
+
+(define_insn "*call_got_x32_long"
+  [(call (mem:QI (match_operand:DI 0 "GOT_memory_operand" "Bg"))
+(match_operand 1))]
+  "TARGET_X32 && Pmode == DImode"
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
@@ -12038,7 +12045,15 @@
(zero_extend:DI
  (match_operand:SI 1 "GOT_memory_operand" "Bg")))
  (match_operand 2)))]
-  "TARGET_X32"
+  "TARGET_X32 && Pmode == SImode"
+  "* return ix86_output_call_insn (insn, operands[1]);"
+  [(set_attr "type" "callv")])
+
+(define_insn "*call_value_got_x32_long"
+  [(set (match_operand 0)
+   (call (mem:QI (match_operand:DI 1 "GOT_memory_operand" "Bg"))
+ (match_operand 2)))]
+  "TARGET_X32 && Pmode == DImode"
   "* return ix86_output_call_insn (insn, operands[1]);"
   [(set_attr "type" "callv")])
 
diff --git a/gcc/testsuite/gcc.target/i386/pr66232-10.c b/gcc/testsuite/gcc.target/i386/pr66232-10.c
new file mode 100644
index 000..c4e9157
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66232-10.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-require-effective-target maybe_x32 } */
+/* { dg-options "-O2 -mx32 -fpic -fno-plt -maddress-mode=long" } */
+
+extern void bar (void);
+
+void
+foo (void)
+{
+  bar ();
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*.bar@GOTPCREL" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr66232-11.c b/gcc/testsuite/gcc.target/i386/pr66232-11.c
new file mode 100644
index 000..05794af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66232-11.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-require-effective-target maybe_x32 } */
+/* { dg-options "-O2 -mx32 -fpic -fno-plt -maddress-mode=long" } */
+
+extern void bar (void);
+
+int
+foo (void)
+{
+  bar ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]*.bar@GOTPCREL" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr66232-12.c b/gcc/testsuite/gcc.target/i386/pr66232-12.c
new file mode 100644
index 000..313b9e4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66232-12.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-require-effective-target maybe_x32 } */
+/* { dg-options "-O2 -mx32 -fpic -fno-plt -maddress-mode=long" } */
+
+extern int bar (void);
+
+int
+foo (void)
+{
+  return bar ();
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*.bar@GOTPCREL" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr66232-13.c b/gcc/testsuite/gcc.target/i386/pr66232-13.c
new file mode 100644
index 000..50a12cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr66232-13.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-require-effective-target maybe_x32 } */
+/* { dg-options "-O2 -mx32 -fpic -fno-plt -maddress-mode=long" } */
+
+extern int bar (void);
+
+int
+foo (void)
+{
+  return bar () + 1;
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]*.bar@GOTPCREL" } } */
-- 
2.5.0

C++ PATCH for c++/67550 (wrong value for reference to const class var)

2015-12-17 Thread Jason Merrill

In my rework of decl_constant_value and kin, I enabled its use in more 
places, which revealed a problem: when it is allowing non-constexpr 
aggregate initializers, we need to double-check that we aren't returning 
something that had initializers stripped out in split_nonconstant_init.


Tested x86_64-pc-linux-gnu, applying to trunk and 5.
commit 230c77258443c76da9837a078f0326fee3311f02
Author: Jason Merrill 
Date:   Thu Dec 17 09:56:20 2015 -0500

	PR c++/67550
	* init.c (constant_value_1): Don't return a CONSTRUCTOR missing
	non-constant elements.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index a08f7d7..b7f10a1 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -2093,6 +2093,11 @@ constant_value_1 (tree decl, bool strict_p, bool return_aggregate_cst_ok_p)
 	  && (TREE_CODE (init) == CONSTRUCTOR
 		  || TREE_CODE (init) == STRING_CST)))
 	break;
+  /* Don't return a CONSTRUCTOR for a variable with partial run-time
+	 initialization, since it doesn't represent the entire value.  */
+  if (TREE_CODE (init) == CONSTRUCTOR
+	  && !DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl))
+	break;
   decl = unshare_expr (init);
 }
   return decl;
diff --git a/gcc/testsuite/g++.dg/init/aggr13.C b/gcc/testsuite/g++.dg/init/aggr13.C
new file mode 100644
index 000..08248a6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/aggr13.C
@@ -0,0 +1,17 @@
+// PR c++/67550
+// { dg-do run }
+
+struct S {
+  int x;
+  int y;
+};
+int foo() { return 1; }
+
+int main() {
+  S const data[] = {{0, foo()}};
+
+  S data2[] = {data[0]};
+
+  if (!data2[0].y)
+__builtin_abort();
+}

Re: [Patch, avr] Provide correct memory move costs

2015-12-17 Thread Denis Chertykov

2015-12-16 10:08 GMT+03:00 Senthil Kumar Selvaraj
:
> Hi,
>
>   When analyzing code size regressions for AVR for top-of-trunk, I
>   found a few cases where aggressive inlining (by the middle-end)
>   of functions containing calls to memcpy was bloating up the code.
>
>   Turns out that the AVR backend has MOVE_MAX set to 4 (unchanged from the
>   original commit), when it really should be 1, as the AVRs can only
>   move a single byte between reg and memory in a single instruction.
>   Setting it to 4 causes the middle end to underestimate the
>   cost of memcpys with a compile time constant length parameter, as it
>   thinks a 4 byte copy's cost is only a single instruction.
>
>   Just setting MOVE_MAX to 1 makes the middle end too conservative
>   though, and causes a bunch of regression tests to fail, as lots of
>   optimizations fail to pass the code size increase threshold check,
> even when not optimizing for size.
>
>   Instead, the below patch sets MOVE_MAX_PIECES to 2, and implements a
>   target hook that tells the middle-end to use load/store insns for
>   memory moves up to two bytes. Also, the patch sets MOVE_RATIO to 3 when
>   optimizing for speed, so that moves up to 4 bytes will occur through
>   load/store sequences, like it does now.
>
>   With this, only a couple of regression tests fail. uninit-19.c fails
>   because it thinks only non-pic code won't inline a function, but the
>   cost computation prevents inlining for AVRs. The test passes if
>   the optimization level is increased to -O3.
>
> strlenopt-8.c has an XPASS and a FAIL because a previous pass issued
> a builtin_memcpy instead of a MEM assignment. Execution still passes.
>
>   I'll continue running more tests to see if there are other performance
>   related consequences.
>
>   Is this ok? If ok, could someone commit please? I don't have commit
>   access.
>
> Regards
> Senthil
>
> gcc/ChangeLog
>
> 2015-12-16  Senthil Kumar Selvaraj  
>
> * config/avr/avr.h (MOVE_MAX): Set value to 1.
> (MOVE_MAX_PIECES): Define.
> (MOVE_RATIO): Define.
> * config/avr/avr.c (TARGET_USE_BY_PIECES_INFRASTRUCTURE_P):
> Provide target hook.
> (avr_use_by_pieces_infrastructure_p): New function.

Committed.

Denis.
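
The interaction between the piece size and the move ratio described in the quoted message can be modeled roughly as follows. This is a simplified sketch, not GCC's actual cost code or the new AVR hook: pieces_needed and expand_inline_p are hypothetical helpers, and the rule "expand inline while the piece count is below the ratio" is an assumption made for illustration. The values 2 (MOVE_MAX_PIECES) and 3 (MOVE_RATIO when optimizing for speed) are taken from the message.

```c
#include <assert.h>

/* Number of load/store pairs needed to copy SIZE bytes when the largest
   single piece is MAX_PIECE bytes (a power of two) and smaller
   power-of-two pieces handle the remainder.  */
static unsigned
pieces_needed (unsigned size, unsigned max_piece)
{
  assert (max_piece != 0 && (max_piece & (max_piece - 1)) == 0);
  unsigned n = 0;
  for (unsigned piece = max_piece; piece != 0; piece /= 2)
    {
      n += size / piece;
      size %= piece;
    }
  return n;
}

/* Assumed decision rule: expand the copy inline only while the piece
   count stays below RATIO.  */
static int
expand_inline_p (unsigned size, unsigned max_piece, unsigned ratio)
{
  return pieces_needed (size, max_piece) < ratio;
}
```

Under this model a 4-byte copy with 2-byte pieces needs two pieces and is expanded inline with a ratio of 3, while a 5-byte copy needs three and is not. With a maximum piece of 4, the same 4-byte copy counts as a single piece, which is the underestimate the message describes.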

[COMMITTED] Add myself to MAINTAINERS (Write After Approval)

2015-12-17 Thread Andris Pavenis


Just committed as revision 231774.

Andris

Index: MAINTAINERS
===
--- MAINTAINERS (revision 231774)
+++ MAINTAINERS (working copy)
@@ -525,6 +525,7 @@
  Patrick Palka
  Seongbae Park
  Devang Patel
+Andris Pavenis
  Fernando Pereira
  Kaushik Phatak
  Nicolas Pitre
Index: ChangeLog
===
--- ChangeLog   (revision 231774)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2015-12-17  Andris Pavenis
+
+   * MAINTAINERS (Write After Approval): Add Myself.
+
  2015-12-17  Nathan Sidwell

 * config/isl.m4 (ISL_CHECK_VERSION): Add gmp libs.

Re: [PATCH 0/2] obsolete some old targets

2015-12-17 Thread Trevor Saunders

On Thu, Dec 17, 2015 at 03:36:18PM +0100, Kamil Rytarowski wrote:
> Hi,
> 
> I talked with devs and it will be better to just keep it removed and focus on 
> native NetBSD with NetBSD userland.
> 
> Actually nobody seems to be interested in the Debian/NetBSD distribution.

that's what I thought from googling, so I'll just go ahead and commit
these patches.

Trev

> 
> Thanks,
> 
> > Sent: Thursday, December 17, 2015 at 3:24 PM
> > From: "Trevor Saunders" 
> > To: "Kamil Rytarowski" 
> > Cc: tbsaunde+...@tbsaunde.org
> > Subject: Re: [PATCH 0/2] obsolete some old targets
> >
> > On Thu, Dec 17, 2015 at 12:37:47PM +0100, Kamil Rytarowski wrote:
> > > -BEGIN PGP SIGNED MESSAGE-
> > > Hash: SHA256
> > > 
> > > I want to keep knetbsd alive. My application for FSF is still ongoing.
> > > 
> > > Please hold on.
> > 
> > Well, These patches aren't going to make resurrecting knetbsd
> > significantly harder, there isn't even a significant amount of knetbsd
> > specific code in gcc, so even removing it should be easily reverted
> > should that become desirable.  On the other hand it doesn't seem like
> > removing the knetbsd specific code will help much at the moment either.
> > So I guess it doesn't really matter one way or another to me.
> > 
> > Trev
> > 
> > > 
> > > Thanks
> > > 
> > > On 15.12.2015 04:55, tbsaunde+...@tbsaunde.org wrote:
> > > > From: Trevor Saunders 
> > > > 
> > > > Hi,
> > > > 
> > > > http://gcc.gnu.org/ml/gcc-patches/2015-12/msg00365.html reminded me
> > > > I hadn't gotten around to marking *-knetbsd and openbsd 2/3
> > > > obsolete as I offered to do back in the spring.
> > > > 
> > > > I tested I could still build on x86_64-linux-gnu, and could only
> > > > cross compile to i386-openbsd2 i386-openbsd3 and
> > > > x86_64_64-knetbsd-gnu with --enable-obsolete.  Given how late in
> > > > the cycle we are I'm not sure if we should remove these targets as
> > > > soon as stage 1 opens, but we might as well obsolete them I guess,
> > > > ok to commit?
> > > > 
> > > > Trev
> > > > 
> > > > 
> > > > Trevor Saunders (2): mark *-knetbsd-* as obsolete obsolete openbsd
> > > > 2.0 and 3.X
> > > > 
> > > > gcc/config.gcc | 4 +++- 1 file changed, 3 insertions(+), 1
> > > > deletion(-)
> > > > 
> > > 
> > > -BEGIN PGP SIGNATURE-
> > > Version: GnuPG v2
> > > 
> > > iQIcBAEBCAAGBQJWcp6JAAoJEEuzCOmwLnZs6IEQAKzugPu0CurmIRNyLR6oyTd3
> > > sTTt/ffzD3RibyJEIVjTBC5tfOFcnS2Mi57TRdN5lDfyF1gwsPpvcY5Ce+WTjnHf
> > > 4Npi/SDego2HPQka5laeJv/MJdBrc7f5bncowcicrZMvo1QImYA4BFQuRk3rMSWj
> > > y31GUlTlP7yQFQ0FSXGFegkEZ7J/LqYmW+piSMhqEcqnRD6FJgGNwGPIngdQ3HvE
> > > w4z37n1Bs8qD9P6AW0D3YZfvDKn7GbGGTVq3uk1MI78hivXdCgXPyY3qnhVCmTjj
> > > 2dAX2h0Tl5aYBbVseO2ecPm/U7BnOYQBACnysnNjh3TLBzIjoXrt1Sao1m2aywj/
> > > f1+LUS2ySknZKidJRNGO/IqrhDIG2Qgmrn2MQDofTCFIwcrvZkt2wRqjBhf7IaCc
> > > Y5o7/emwj+dbfbPQNvu7RS6kFtOS4JXgs8b8D3oXHc9D9BNWYWEu5XSIWK+1HwwF
> > > 3wMcqZoZdqDFm1swM1XjOFpMjengq4AY8HAEROnj1p1qG4LhFKD84qFnELpEDowa
> > > leG9B+l9yoJQVi2GgZA8XmE7gT54oHu+pqlL7N/FgMNRS1rg4YUmAF6DOWl9cWm+
> > > NAdugbI+6VDUcvhgtrPIUv378Zn2jSUwzdl+hFp9C+jrwsc0KQN8Sg3a1wX3e8yf
> > > 0nsnHzcG0ulJnBPTDdEN
> > > =RMIO
> > > -END PGP SIGNATURE-
> >

[PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes


Hi all,

The documentation on RTL canonical forms in md.texi says:

"Equality comparisons of a group of bits (usually a single bit) with zero
 will be written using @code{zero_extract} rather than the equivalent
 @code{and} or @code{sign_extract} operations. "

However, this is not always followed in combine. If it's trying to optimise
a comparison against zero of a bitmask that is the mode mask of some mode
(255 for QImode and 65535 for HImode in the testcases of this patch)
it will instead create a subreg to that shorter mode.
This means that for the example:
int
f255 (int x)
{
  if (x & 255)
return 1;
  return x;
}

it ends up trying to make a QImode comparison against zero, for which targets 
like
aarch64 have no pattern.
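
The three ways of expressing this test (an AND with the mode mask, a truncation to the shorter mode, and a zero-extend extraction of the low bits) agree at the C level, which is why combine is free to pick among them. A minimal sketch with hypothetical helper names; zero_extract_bits only mimics what a zero_extract describes and is not a GCC function.

```c
#include <assert.h>
#include <stdint.h>

/* Extract LEN bits of X starting at bit POS, zero-extended: the
   operation a zero_extract describes.  */
static uint64_t
zero_extract_bits (uint64_t x, unsigned len, unsigned pos)
{
  assert (len >= 1 && len < 64 && pos < 64 && len + pos <= 64);
  return (x >> pos) & ((UINT64_C (1) << len) - 1);
}

/* AND-immediate form: x & 255.  */
static int
low_byte_nonzero_and (uint32_t x)
{
  return (x & 255) != 0;
}

/* Truncation form: the analogue of a QImode subreg compared with 0.  */
static int
low_byte_nonzero_trunc (uint32_t x)
{
  return (uint8_t) x != 0;
}

/* Extraction form: 8 bits starting at bit 0.  */
static int
low_byte_nonzero_extract (uint32_t x)
{
  return zero_extract_bits (x, 8, 0) != 0;
}
```

All three predicates return the same result for every input; the question in the thread is only which RTL shape should be canonical.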

This patch attempts to fix this in two places in combine.
First is simplify_comparison when handling the and-bitmask case.
Currently it will call gen_lowpart_or_truncate on the argument to produce the 
short subreg.
With this patch we don't do that when comparing against zero.
This way the and-bitmask form is preserved for make_extraction later on to 
convert
into a zero_extract.
The second place is in make_extraction itself where it tries to avoid creating 
a zero_extract,
but the canonicalisation rules and the function comment for make_extraction say 
that it should
try hard to create a zero_extract when inside a comparison in particular
(" IN_COMPARE is nonzero if we are in a COMPARE.  This means that a
   ZERO_EXTRACT should be built even for bits starting at bit 0.")

With this patch for the testcases:
int
f255 (int x)
{
  if (x & 255)
return 1;
  return x;
}

int
foo (long x)
{
   return ((short) x != 0) ? x : 1;
}

we now generate for aarch64 at -O2:
f255:
tst x0, 255
csinc   w0, w0, wzr, eq
ret

and
foo:
tst x0, 65535
csinc   x0, x0, xzr, ne
ret


instead of the previous:
f255:
and w1, w0, 255
cmp w1, wzr
csinc   w0, w0, wzr, eq
ret

foo:
sxth    w1, w0
cmp w1, wzr
csinc   x0, x0, xzr, ne
ret


Bootstrapped and tested on arm, aarch64, x86_64.
To get the benefit on aarch64 this needs patch 1/2 that adds an aarch64 pattern
for comparing a zero_extract with zero.
On aarch64 this greatly increases the usage of the TST instruction by about 54% 
on SPEC2006.
Performance-wise there were no regressions and slight improvements on SPECINT 
that may just
be above normal noise (overall 0.5% improvement).
On arm it makes very little difference (arm already defines QI and HImode 
comparisons against zero)
but makes more use of the lsrs-immediate instruction in place of the arm tst 
instruction, which has
a shorter encoding in Thumb2 state.
On x86_64 I saw no difference in code size for SPEC2006 on my setup.

What do people think of this approach?
I hope this just enforces the already documented canonicalisation rules with 
minimal(none?) negative
fallout.

Thanks,
Kyrill


2015-12-17  Kyrylo Tkachov  

PR rtl-optimization/68796
* combine.c (make_extraction): Don't try to avoid the extraction if
inside a compare.
(simplify_comparison): Don't truncate to lowpart if comparing against
zero and target doesn't have a native compare instruction in the
required short mode.

2015-12-17  Kyrylo Tkachov  

PR rtl-optimization/68796
* gcc.target/aarch64/tst_5.c: New test.
* gcc.target/aarch64/tst_6.c: Likewise.
diff --git a/gcc/combine.c b/gcc/combine.c
index 8601d8983ce345e2129dd047b3520d98c0582842..345e63f9a05f2310a5c9e5b239ed069d22565d1c 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7337,10 +7337,13 @@ make_extraction (machine_mode mode, rtx inner, HOST_WIDE_INT pos,
  low-order bit and this is either not in the destination or we have the
  appropriate STRICT_LOW_PART operation available.
 
+ Don't do this if we are inside a comparison, as the canonicalization
+ rules call for a zero_extract form.
  For MEM, we can avoid an extract if the field starts on an appropriate
  boundary and we can change the mode of the memory reference.  */
 
   if (tmode != BLKmode
+  && !in_compare
   && ((pos_rtx == 0 && (pos % BITS_PER_WORD) == 0
 	   && !MEM_P (inner)
 	   && (inner_mode == tmode
@@ -12108,14 +12111,19 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
 
 	 unless TRULY_NOOP_TRUNCATION allows it or the register is
 	 known to hold a value of the required mode the
-	 transformation is invalid.  */
+	 transformation is invalid.
+	 If the target does not have a compare instruction of that mode
+	 don't do this when comparing against 0 since the canonicalization
+	 rules require such an operation to be represented as a
+	 zero_extract, which make_extraction will produce later on.  */
 	  if ((equality_comparison_p || unsigned_comparison_p)
 	  && CONST_INT_P (XEXP (op0, 1))
 	  && (i =

[PATCH][AArch64][1/2] PR rtl-optimization/68796 Add compare-of-zero_extract pattern


Hi all,

In this PR I'm trying to increase the use of the aarch64 instruction TST that 
performs a
bitwise AND with a bitmask and compares the result with zero.
GCC has many ways of representing these operations in RTL. Depending on the 
mask, the target
and the context it might be an AND-immediate, a ZERO_EXTRACT or a ZERO_EXTEND 
of a subreg.

aarch64.md already contains a pattern for the compare with and-immediate case, 
which is the most
general form of this, but it doesn't match in many common cases.

The documentation on canonicalization in md.texi says:
"Equality comparisons of a group of bits (usually a single bit) with zero
 will be written using @code{zero_extract} rather than the equivalent
 @code{and} or @code{sign_extract} operations. "

This means that we should define a compare with a zero-extract pattern in 
aarch64,
which is what this patch does. It's fairly simple: it constructs the TST mask 
from
the operands of the zero_extract and updates the SELECT_CC_MODE implementation 
to
assign the correct CC_NZ mode to such comparisons.  Note that this is valid only
for equality comparisons against zero.

So for the testcase:
int
f1 (int x)
{
  if (x & 1)
return 1;
  return x;
}

we now generate:
f1:
tst x0, 1
csinc   w0, w0, wzr, eq
ret

instead of the previous:
f1:
and w1, w0, 1
cmp w1, wzr
csinc   w0, w0, wzr, eq
ret


and for the testcase:
int
f2 (long x)
{
   return ((short) x >= 0) ? x : 0;
}

we now generate:
f2:
tst x0, 32768
csel    x0, x0, xzr, eq
ret

instead of:
f2:
sxth    w1, w0
cmp w1, wzr
csel    x0, x0, xzr, ge
ret

i.e. we test the sign bit rather than perform the full comparison with zero.
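
The mask construction and the sign-bit equivalence can be checked in plain C. This is a sketch under stated assumptions: mask_from_width_pos mirrors the arithmetic of the patch's mask helper on ordinary integers rather than on rtx operands, and the two predicates are hypothetical names. The cast to short relies on the usual modulo behaviour for out-of-range values, which is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Bitmask selecting WIDTH bits starting at bit POS.  WIDTH must be
   below 64 so that the shift is well defined.  */
static uint64_t
mask_from_width_pos (unsigned width, unsigned pos)
{
  assert (width >= 1 && width < 64 && pos < 64 && width + pos <= 64);
  return ((UINT64_C (1) << width) - 1) << pos;
}

/* Full comparison: truncate to 16 bits, then compare with zero.  */
static int
nonneg_by_compare (long x)
{
  return (short) x >= 0;
}

/* Sign-bit test: a one-bit extraction at position 15 compared with 0.  */
static int
nonneg_by_sign_bit (long x)
{
  return ((uint64_t) x & mask_from_width_pos (1, 15)) == 0;
}
```

A width of 1 at position 15 gives 32768, the immediate in the tst instruction shown for f2, and both predicates agree on whether the low halfword is non-negative.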

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2015-12-17  Kyrylo Tkachov  

PR rtl-optimization/68796
* config/aarch64/aarch64.md (*and3nr_compare0_zextract):
New pattern.
* config/aarch64/aarch64.c (aarch64_select_cc_mode): Handle
ZERO_EXTRACT comparison with zero.
(aarch64_mask_from_zextract_ops): New function.
* config/aarch64/aarch64-protos.h (aarch64_mask_from_zextract_ops):
New prototype.

2015-12-17  Kyrylo Tkachov  

PR rtl-optimization/68796
* gcc.target/aarch64/tst_3.c: New test.
* gcc.target/aarch64/tst_4.c: Likewise.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 87d6eb1358845527d7068550925949802a7e48e2..febca98d38d5f09c97b0f79adc55bb29eca217b9 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -330,6 +330,7 @@ int aarch64_uxt_size (int, HOST_WIDE_INT);
 int aarch64_vec_fpconst_pow_of_2 (rtx);
 rtx aarch64_final_eh_return_addr (void);
 rtx aarch64_legitimize_reload_address (rtx *, machine_mode, int, int, int);
+rtx aarch64_mask_from_zextract_ops (rtx, rtx);
 const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
 rtx aarch64_simd_gen_const_vector_dup (machine_mode, int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cb8955d5d6c909e8179bb1ab8203eb165f55e4b6..58a9fc68f391162ed9847d7fb79d70d3ee9919f5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4147,7 +4147,9 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
   && y == const0_rtx
   && (code == EQ || code == NE || code == LT || code == GE)
   && (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS || GET_CODE (x) == AND
-	  || GET_CODE (x) == NEG))
+	  || GET_CODE (x) == NEG
+	  || (GET_CODE (x) == ZERO_EXTRACT && CONST_INT_P (XEXP (x, 1))
+	  && CONST_INT_P (XEXP (x, 2)))))
 return CC_NZmode;
 
   /* A compare with a shifted operand.  Because of canonicalization,
@@ -10757,6 +10759,21 @@ aarch64_simd_imm_zero_p (rtx x, machine_mode mode)
   return x == CONST0_RTX (mode);
 }
 
+
+/* Return the bitmask CONST_INT to select the bits required by a zero extract
+   operation of width WIDTH at bit position POS.  */
+
+rtx
+aarch64_mask_from_zextract_ops (rtx width, rtx pos)
+{
+  gcc_assert (CONST_INT_P (width));
+  gcc_assert (CONST_INT_P (pos));
+
+  unsigned HOST_WIDE_INT mask
+= ((unsigned HOST_WIDE_INT)1 << UINTVAL (width)) - 1;
+  return GEN_INT (mask << UINTVAL (pos));
+}
+
 bool
 aarch64_simd_imm_scalar_p (rtx x, machine_mode mode ATTRIBUTE_UNUSED)
 {
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4604fd2588be87944a72224dccb3dfb32e42a1ad..fd2b3ef64f1736545948eb49e5ac6dfbd206e3e9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3698,6 +3698,28 @@ (define_insn "*and3nr_compare0"
   [(set_attr "type" "logics_reg,logics_imm")]
 )
 
+(define_insn "*and3nr_compare0_zextract"
+  [(set (reg:CC_NZ CC_REGNUM)
+	(compare:CC_NZ
+	 (zero_extract:GPI (match_operand:GPI 0 "register_operand" "r")
+

Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes


On 12/17/2015 04:36 PM, Kyrill Tkachov wrote:

The documentation on RTL canonical forms in md.texi says:

"Equality comparisons of a group of bits (usually a single bit) with zero
  will be written using @code{zero_extract} rather than the equivalent
  @code{and} or @code{sign_extract} operations. "

However, this is not always followed in combine. If it's trying to optimise
a comparison against zero of a bitmask that is the mode mask of some mode
(255 for QImode and 65535 for HImode in the testcases of this patch)
it will instead create a subreg to that shorter mode.


I suspect that this is an oversight in the documentation, and if given 
two choices the simpler form is intended to be the canonical one.



it ends up trying to make a QImode comparison against zero, for which
targets like
aarch64 have no pattern.


So, can you define a pattern for it...


To get the benefit on aarch64 this needs patch 1/2 that adds an aarch64
pattern
for comparing a zero_extract with zero.


... instead of this one?


What do people think of this approach?
I hope this just enforces the already documented canonicalisation rules
with minimal(none?) negative
fallout.


I'm not so sure about this. Other ports have QImode comparisons and I 
would want to see some evidence that there are no code quality 
regressions. This is not stage 3 material in any case.



Bernd

Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes



On 17/12/15 16:12, Bernd Schmidt wrote:

On 12/17/2015 05:10 PM, Kyrill Tkachov wrote:

Well, this patch still produces the QImode comparison if the target has
a QImode comparison
(the have_insn_for check in the simplify_comparison hunk).


Ok, I didn't look that closely because I had doubts about the approach. This 
kind of check also goes somewhat against the principles of just producing 
canonical forms of RTL.



One could argue that if the target has (or advertises having) a native
QImode register comparison then it's objectively a simplification to
transform a comparison in a wider mode to a comparison in the shorter mode.

If, however, the target doesn't have such an instruction (like aarch64,
which doesn't have QImode registers) then truncating the wider mode to
QImode through a subreg is not less complex than a zero_extract, as both
will involve some form of extracting/masking the desired QImode bits. So
picking a canonical form there makes sense, and the documentation already
specifies the zero_extract form as the canonical one.
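
As a rough illustration of why the two forms are interchangeable for a
comparison against zero, here is a small C++ sketch (not GCC code; the
function names are made up for this example). One function masks in the
wide type, which corresponds to the zero_extract / AND-with-255 view, and
the other truncates to a narrow type, which corresponds to the QImode
subreg view.

```cpp
#include <cassert>
#include <cstdint>

// Test the low 8 bits by masking in the wide type. This mirrors the
// zero_extract (or AND with 255) form of the comparison.
bool low_byte_is_zero_masked(std::uint32_t x) {
    return (x & 0xffu) == 0;
}

// Test the low 8 bits by truncating to a narrow type. This mirrors the
// QImode subreg form of the comparison.
bool low_byte_is_zero_truncated(std::uint32_t x) {
    return static_cast<std::uint8_t>(x) == 0;
}
```

Both functions give the same answer for every input, so which one a target
prefers comes down to which instruction pattern it actually provides.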

Would be nice to get a definite clarification on whether the subreg form is
indeed the canonical one. Then we can document it and I can just add a
QI/HImode compare pattern to aarch64.

Thanks,
Kyrill



Bernd

C++ PATCH for c++/67576 (multiple evaluation of typeid operand)

2015-12-17 Thread Jason Merrill

When I changed build_typeid to take the address of a polymorphic operand 
rather than using the lvalue directly, I forgot the parallel change from 
stabilize_reference to save_expr.


Tested x86_64-pc-linux-gnu, applying to trunk and 5.
commit 5361caf55040d2a15b5ebb5ff0fc1e3e605dba9c
Author: Jason Merrill 
Date:   Thu Dec 17 00:10:20 2015 -0500

	PR c++/67576
	PR c++/25466
	* rtti.c (build_typeid): Use save_expr, not stabilize_reference.

diff --git a/gcc/cp/rtti.c b/gcc/cp/rtti.c
index b397b55..f42b1cb 100644
--- a/gcc/cp/rtti.c
+++ b/gcc/cp/rtti.c
@@ -332,7 +332,7 @@ build_typeid (tree exp, tsubst_flags_t complain)
   /* So we need to look into the vtable of the type of exp.
  Make sure it isn't a null lvalue.  */
   exp = cp_build_addr_expr (exp, complain);
-  exp = stabilize_reference (exp);
+  exp = save_expr (exp);
   cond = cp_convert (boolean_type_node, exp, complain);
   exp = cp_build_indirect_ref (exp, RO_NULL, complain);
 }
diff --git a/gcc/testsuite/g++.dg/rtti/typeid11.C b/gcc/testsuite/g++.dg/rtti/typeid11.C
new file mode 100644
index 000..384b0f4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/rtti/typeid11.C
@@ -0,0 +1,16 @@
+// { dg-do run }
+
+#include <typeinfo>
+
+struct Base { virtual void foo() {} }; // polymorphic
+
+int main()
+{
+  Base b;
+  Base *ary[] = { &b, &b, &b };
+
+  int iter = 0;
+  typeid(*ary[iter++]);
+  if (iter != 1)	// should be 1
+__builtin_abort();	// but 2
+}
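
For readers who want to try the behaviour outside the testsuite, the
following is a standalone sketch in the spirit of the test above (the
function and type names are illustrative, not taken from the patch). It
assumes only what the test itself checks: the operand of typeid, when it is
a glvalue of polymorphic type, is evaluated, and it must be evaluated
exactly once.

```cpp
#include <cassert>
#include <typeinfo>

struct Base { virtual ~Base() = default; };  // polymorphic
struct Derived : Base {};

// Returns the value of the index after typeid has evaluated an operand
// with a side effect. A correct compiler evaluates the operand once.
int count_typeid_operand_evaluations() {
    Derived d;
    Base *objects[] = { &d, &d, &d };
    int index = 0;
    const std::type_info &info = typeid(*objects[index++]);
    (void)info;  // only the side effect matters here
    return index;
}

// Sanity check that typeid on a polymorphic glvalue sees the dynamic type.
bool typeid_sees_dynamic_type() {
    Derived d;
    Base *p = &d;
    return typeid(*p) == typeid(Derived);
}
```

With the bug described in this thread the first function would return 2
instead of 1.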

Re: [PATCH] Fix PR68707, 67323

2015-12-17 Thread Alan Lawrence


On 17/12/15 10:46, Richard Biener wrote:

On Thu, 17 Dec 2015, Alan Lawrence wrote:


On 16/12/15 15:01, Richard Biener wrote:


The following patch adds a heuristic to prefer store/load-lanes
over SLP when vectorizing.  Compared to the variant attached to
the PR I made the STMT_VINFO_STRIDED_P behavior explicit (matching
what you've tested).


Not sure I follow this. Compared to the variant attached to the PR - we will
now attempt to use load-lanes, if (say) all of the loads are strided, even if
we know we don't support load-lanes (for any of them). That sounds the wrong
way around and I think rather different to what you proposed earlier? (At the
least, the debug message "can use load/store lanes" is potentially misleading,
that's not necessarily the case!)


Ah, indeed.  Note that the whole thing is still guarded by the check
that we can use store-lanes for the store.

I can also do it the other way around (as previously proposed) which
would change outcome for slp-perm-11.c.  That proposal would not reject
the SLP if there were any strided grouped loads involved.


Indeed; the STMT_VINFO_STRIDED_P || !vect_load_lanes_supported approach (as on 
PR68707) vectorizes slp-perm-11.c with SLP, which works much better than the 
!STMT_VINFO_STRIDED_P && !vect_load_lanes_supported, which tries to use st2 (and 
only sort-of works - you get an st2 output, but no ld2, and lots of faff).
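
To make the difference between the two conditions concrete, here is a toy
C++ model (not vectorizer code; all names are invented for the example). It
assumes that the condition is evaluated per load group as a "keep SLP"
test, that the SLP instance is cancelled in favour of load/store-lanes only
if no load group triggers it, and that the store-lanes check on the store
has already passed.

```cpp
#include <cassert>
#include <vector>

// One grouped load in an SLP instance, reduced to the two properties
// discussed in this thread.
struct LoadGroup {
    bool strided;               // stands in for STMT_VINFO_STRIDED_P
    bool load_lanes_supported;  // stands in for vect_load_lanes_supported
};

// Variant from the PR: keep SLP as soon as one load is strided or cannot
// use load-lanes. Returns true if SLP would be cancelled.
bool cancel_slp_pr_variant(const std::vector<LoadGroup> &loads) {
    for (const LoadGroup &load : loads)
        if (load.strided || !load.load_lanes_supported)
            return false;
    return true;
}

// Variant from the posted patch: keep SLP only if some load is both
// non-strided and unable to use load-lanes.
bool cancel_slp_posted_variant(const std::vector<LoadGroup> &loads) {
    for (const LoadGroup &load : loads)
        if (!load.strided && !load.load_lanes_supported)
            return false;
    return true;
}
```

In this model, an instance whose loads are all strided and cannot use
load-lanes keeps SLP under the first variant but is cancelled under the
second, which is the case objected to above.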


I think I move for the patch from PR68707, therefore. (Ramana - any thoughts?)


Btw, another option is to push the decision past full SLP analysis
and thus make the decision globally for all SLP instances - currently
SLP instances are cancelled on a one-by-one basis meaning we might
do SLP plus load/store-lanes in the same loop.


I don't see anything inherently wrong with doing both in the same loop. On 
simple loops, I suspect we'll do better committing to one strategy or the other 
(tho really it's only the VF required I think?), but then, on such simple loops, 
there are probably not very many SLP instances!



Maybe we have to go all the way to implementing a better vectorization
cost hook just for the permutations - the SLP path in theory knows
exactly which ones it will generate.


Yes, I think this sounds like a good plan for GCC 7. It doesn't require 
constructing an entire stmt (if you are concerned about the cost of that), and 
on most targets, probably integrates fairly easily with the 
expand_vec_perm_const hooks.


--Alan

Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes


On 12/17/2015 08:58 AM, Bernd Schmidt wrote:


I suspect that this is an oversight in the documentation, and if given
two choices the simpler form is intended to be the canonical one.

The other BZ I was looking at in this space was 15596.  It's PPC, but
shows a generic weakness in how we identify extractions and insertions.
Fixing it would probably help all the ports that have relatively
strong methods to set/clear a series of bits in the middle of a word.


It feels like combine has all the information necessary to improve 
things, but the overall combiner flow and APIs are extremely uncooperative.


jeff

[PTX] Reorder hard regs

2015-12-17 Thread Nathan Sidwell

This reorders the hardregs to be a contiguous block, and names them somewhat
more conventionally.  (I had considered %sp, %fp etc, but went with the longer
names).


nathan
2015-12-17  Nathan Sidwell  

	* config/nvptx/nvptx.h (NVPTX_RETURN_REGNUM, FRAME_POINTER_REGNUM,
	ARG_POINTER_REGNUM, STATIC_CHAIN_REGNUM): Renumber.
	(REGISTER_NAMES): Update and rename.
	(FIXED_REGISTERS, CALL_USED_REGISTERS): Update.
	(enum_reg_class, REG_CLASS_NAMES, REG_CLASS_CONTENTS): Reformat.

Index: config/nvptx/nvptx.h
===
--- config/nvptx/nvptx.h	(revision 231769)
+++ config/nvptx/nvptx.h	(working copy)
@@ -78,19 +78,15 @@
 #define PTRDIFF_TYPE (TARGET_ABI64 ? "long int" : "int")
 
 #define POINTER_SIZE (TARGET_ABI64 ? 64 : 32)
-
 #define Pmode (TARGET_ABI64 ? DImode : SImode)
 
 /* Registers.  Since ptx is a virtual target, we just define a few
-   hard registers for special purposes and leave pseudos unallocated.  */
-
-#define FIRST_PSEUDO_REGISTER 16
-/* We have to have some available hard registers, to keep gcc setup
+   hard registers for special purposes and leave pseudos unallocated.
+   We have to have some available hard registers, to keep gcc setup
happy.  */
-#define FIXED_REGISTERS	\
-  { 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1 }
-#define CALL_USED_REGISTERS\
-  { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }
+#define FIRST_PSEUDO_REGISTER 16
+#define FIXED_REGISTERS	{ 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
+#define CALL_USED_REGISTERS { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }
 
 #define HARD_REGNO_NREGS(REG, MODE)		\
   ((void)(REG), (void)(MODE), 1)
@@ -100,32 +96,13 @@
  ((void)(REG), (void)(MODE), true)
 
 /* Register Classes.  */
-
-enum reg_class
-  {
-NO_REGS,
-ALL_REGS,
-LIM_REG_CLASSES
-  };
-
+enum reg_class {  NO_REGS,ALL_REGS,	LIM_REG_CLASSES };
+#define REG_CLASS_NAMES{ "NO_REGS",  "ALL_REGS" }
+#define REG_CLASS_CONTENTS { { 0x }, { 0x } }
 #define N_REG_CLASSES (int) LIM_REG_CLASSES
 
-#define REG_CLASS_NAMES {	  \
-"NO_REGS",			  \
-"ALL_REGS" }
-
-#define REG_CLASS_CONTENTS	\
-{\
-  /* NO_REGS.  */		\
-  { 0x },			\
-  /* ALL_REGS.  */		\
-  { 0x },			\
-}
-
 #define GENERAL_REGS ALL_REGS
-
 #define REGNO_REG_CLASS(R) ((void)(R), ALL_REGS)
-
 #define BASE_REG_CLASS ALL_REGS
 #define INDEX_REG_CLASS NO_REGS
 
@@ -151,17 +128,16 @@ enum reg_class
 #define FRAME_GROWS_DOWNWARD 0
 #define STACK_GROWS_DOWNWARD 1
 
+#define NVPTX_RETURN_REGNUM 0
 #define STACK_POINTER_REGNUM 1
-#define NVPTX_RETURN_REGNUM 4
-#define FRAME_POINTER_REGNUM 15
-#define ARG_POINTER_REGNUM 14
-
-#define STATIC_CHAIN_REGNUM 12
+#define FRAME_POINTER_REGNUM 2
+#define ARG_POINTER_REGNUM 3
+#define STATIC_CHAIN_REGNUM 4
 
 #define REGISTER_NAMES			\
   {	\
-"%hr0", "%outargs", "%hfp", "%hr3", "%retval", "%hr5", "%hr6", "%hr7",	\
-"%hr8", "%hr9", "%hr10", "%hr11", "%chain_in", "%hr13", "%argp", "%frame" \
+"%value", "%stack", "%frame", "%args", "%chain", "%hr5", "%hr6", "%hr7", \
+"%hr8", "%hr9", "%hr10", "%hr11", "%hr12", "%hr13", "%hr14", "%hr15" \
   }
 
 #define FIRST_PARM_OFFSET(FNDECL) ((void)(FNDECL), 0)
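
The new layout can be checked with a small standalone C++ sketch. It copies
the register numbers, the fixed-register table and the names from the patch
above into ordinary constants (the kXxx identifiers and helper functions
are invented for this example and are not part of nvptx.h) and verifies
that the tables have one entry per hard register and that the five special
registers form the leading contiguous, fixed block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Values copied from the patched nvptx.h; the constant names are made up.
constexpr std::size_t kFirstPseudoRegister = 16;
constexpr std::size_t kReturnRegnum = 0;
constexpr std::size_t kStackPointerRegnum = 1;
constexpr std::size_t kFramePointerRegnum = 2;
constexpr std::size_t kArgPointerRegnum = 3;
constexpr std::size_t kStaticChainRegnum = 4;

constexpr int kFixedRegisters[] =
    { 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };

constexpr const char *kRegisterNames[] = {
    "%value", "%stack", "%frame", "%args", "%chain", "%hr5", "%hr6", "%hr7",
    "%hr8", "%hr9", "%hr10", "%hr11", "%hr12", "%hr13", "%hr14", "%hr15"
};

template <typename T, std::size_t N>
constexpr std::size_t array_length(const T (&)[N]) { return N; }

// Both tables must describe exactly the hard registers.
static_assert(array_length(kFixedRegisters) == kFirstPseudoRegister,
              "FIXED_REGISTERS needs one entry per hard register");
static_assert(array_length(kRegisterNames) == kFirstPseudoRegister,
              "REGISTER_NAMES needs one entry per hard register");

// The special-purpose registers occupy numbers 0..4 and are all fixed.
constexpr bool special_registers_are_fixed() {
    return kFixedRegisters[kReturnRegnum] == 1
        && kFixedRegisters[kStackPointerRegnum] == 1
        && kFixedRegisters[kFramePointerRegnum] == 1
        && kFixedRegisters[kArgPointerRegnum] == 1
        && kFixedRegisters[kStaticChainRegnum] == 1;
}
static_assert(special_registers_are_fixed(), "special registers must be fixed");

const char *register_name(std::size_t regno) {
    return kRegisterNames[regno];
}
```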

Re: [PATCH] PR target/68937: i686: -fno-plt produces wrong code (maybe only with tailcall

On Thu, Dec 17, 2015 at 5:42 AM, Uros Bizjak  wrote:
> On Thu, Dec 17, 2015 at 2:00 PM, H.J. Lu  wrote:
>> On Thu, Dec 17, 2015 at 2:04 AM, Uros Bizjak  wrote:
>>> On Thu, Dec 17, 2015 at 12:29 AM, H.J. Lu  wrote:
>>>> Since sibcall never returns, we can only use call-clobbered register
>>>> as GOT base.  Otherwise, callee-saved register used as GOT base won't
>>>> be properly restored.
>>>>
>>>> Tested on x86-64 with -m32.  OK for trunk?
>>>
>>> You don't have to add explicit clobber for members of "CLOBBERED_REGS"
>>> class, and register_no_elim_operand predicate should be used with "U"
>>> constraint. Also, please introduce new predicate, similar to how
>>> GOT_memory_operand is defined and handled.
>>>
>>
>> Here is the updated patch.  There is a predicate already,
>> sibcall_memory_operand.  It allows any registers to
>> be as GOT base, which is the root of our problem.
>> This patch removes GOT slot from it and handles
>> sibcall over GOT slot with *sibcall_GOT_32 and
>> *sibcall_value_GOT_32 patterns.  Since I need to
>> expose constraints on GOT base register to RA,
>> I have to use 2 operands, GOT base and function
>> symbol, to describe sibcall over 32-bit GOT slot.
>
> Please use
>
>(mem:SI (plus:SI
>  (match_operand:SI 0 "register_no_elim_operand" "U")
>  (match_operand:SI 1 "GOT32_symbol_operand")))
> ...
>
> to avoid manual rebuild of the operand.
>

Is this OK?

Thanks.

-- 
H.J.
From 9a5818415f9de92454ee555e8d8c3bd675fe30dd Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 16 Dec 2015 12:34:57 -0800
Subject: [PATCH] Use call-clobbered register for sibcall via GOT

Since sibcall never returns, we can only use call-clobbered register
as GOT base.  Otherwise, callee-saved register used as GOT base won't
be properly restored.

gcc/

	PR target/68937
	* config/i386/i386.c (ix86_function_ok_for_sibcall): Count
	call via GOT slot as indirect call.
	* config/i386/i386.md (*sibcall_GOT_32): New pattern.
	(*sibcall_value_GOT_32): Likewise.
	* config/i386/predicates.md (sibcall_memory_operand): Remove
	GOT slot.
	(GOT32_symbol_operand): New predicate.

gcc/testsuite/

	PR target/68937
	* gcc.target/i386/pr68937-1.c: New test.
	* gcc.target/i386/pr68937-2.c: Likewise.
	* gcc.target/i386/pr68937-3.c: Likewise.
	* gcc.target/i386/pr68937-4.c: Likewise.
	* gcc.target/i386/pr68937-5.c: Likewise.
---
 gcc/config/i386/i386.c|  4 +++-
 gcc/config/i386/i386.md   | 33 +++
 gcc/config/i386/predicates.md | 16 +--
 gcc/testsuite/gcc.target/i386/pr68937-1.c | 13 
 gcc/testsuite/gcc.target/i386/pr68937-2.c | 13 
 gcc/testsuite/gcc.target/i386/pr68937-3.c | 13 
 gcc/testsuite/gcc.target/i386/pr68937-4.c | 13 
 gcc/testsuite/gcc.target/i386/pr68937-5.c |  9 +
 8 files changed, 107 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-5.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cecea24..0e2bec3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6723,8 +6723,10 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
   /* If this call is indirect, we'll need to be able to use a
 	 call-clobbered register for the address of the target function.
 	 Make sure that all such registers are not used for passing
-	 parameters.  Note that DLLIMPORT functions are indirect.  */
+	 parameters.  Note that DLLIMPORT functions and call via GOT
+	 slot are indirect.  */
   if (!decl
+	  || (flag_pic && !flag_plt)
 	  || (TARGET_DLLIMPORT_DECL_ATTRIBUTES && DECL_DLLIMPORT_P (decl)))
 	{
 	  /* Check if regparm >= 3 since arg_reg_available is set to
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 49b2216..6ab8eaa 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11865,6 +11865,22 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
+;; Since sibcall never returns, we can only use call-clobbered register
+;; as GOT base.
+(define_insn "*sibcall_GOT_32"
+  [(call (mem:QI
+	   (mem:SI (plus:SI
+		 (match_operand:SI 0 "register_no_elim_operand" "U")
+		 (match_operand:SI 1 "GOT32_symbol_operand"))))
+	 (match_operand 2))]
+  "!TARGET_MACHO && !TARGET_64BIT && SIBLING_CALL_P (insn)"
+{
+  rtx fnaddr = gen_rtx_PLUS (Pmode, operands[0], operands[1]);
+  fnaddr = gen_const_mem (Pmode, fnaddr);
+  return ix86_output_call_insn (insn, fnaddr);
+}
+  [(set_attr "type" "call")])
+
 (define_insn "*sibcall"
   [(call (mem:QI (match_operand:W 0

Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes


On 12/17/2015 05:10 PM, Kyrill Tkachov wrote:

Well, this patch still produces the QImode comparison if the target has
a QImode comparison (the have_insn_for check in the simplify_comparison hunk).


Ok, I didn't look that closely because I had doubts about the approach. 
This kind of check also goes somewhat against the principles of just 
producing canonical forms of RTL.



Bernd

Re: [PATCH] PR target/68937: i686: -fno-plt produces wrong code (maybe only with tailcall

On Thu, Dec 17, 2015 at 7:50 AM, H.J. Lu  wrote:
> On Thu, Dec 17, 2015 at 5:42 AM, Uros Bizjak  wrote:
>> On Thu, Dec 17, 2015 at 2:00 PM, H.J. Lu  wrote:
>>> On Thu, Dec 17, 2015 at 2:04 AM, Uros Bizjak  wrote:
>>>> On Thu, Dec 17, 2015 at 12:29 AM, H.J. Lu  wrote:
>>>>> Since sibcall never returns, we can only use call-clobbered register
>>>>> as GOT base.  Otherwise, callee-saved register used as GOT base won't
>>>>> be properly restored.
>>>>>
>>>>> Tested on x86-64 with -m32.  OK for trunk?
>>>>
>>>> You don't have to add explicit clobber for members of "CLOBBERED_REGS"
>>>> class, and register_no_elim_operand predicate should be used with "U"
>>>> constraint. Also, please introduce new predicate, similar to how
>>>> GOT_memory_operand is defined and handled.
>>>>
>>>
>>> Here is the updated patch.  There is a predicate already,
>>> sibcall_memory_operand.  It allows any registers to
>>> be as GOT base, which is the root of our problem.
>>> This patch removes GOT slot from it and handles
>>> sibcall over GOT slot with *sibcall_GOT_32 and
>>> *sibcall_value_GOT_32 patterns.  Since I need to
>>> expose constraints on GOT base register to RA,
>>> I have to use 2 operands, GOT base and function
>>> symbol, to describe sibcall over 32-bit GOT slot.
>>
>> Please use
>>
>>(mem:SI (plus:SI
>>  (match_operand:SI 0 "register_no_elim_operand" "U")
>>  (match_operand:SI 1 "GOT32_symbol_operand")))
>> ...
>>
>> to avoid manual rebuild of the operand.
>>
>
> Is this OK?
>

An updated patch to allow sibcall_memory_operand for RTL
expansion.  OK for trunk if there is no regression?

Thanks.


-- 
H.J.
From dffd3a70b9788174f9b279ff27bf72dbc2384659 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Wed, 16 Dec 2015 12:34:57 -0800
Subject: [PATCH] Use call-clobbered register for sibcall via GOT

Since sibcall never returns, we can only use call-clobbered register as
GOT base.  Otherwise, callee-saved register used as GOT base won't be
properly restored.  sibcall_memory_operand is changed to allow 32-bit
GOT slot only with pseudo register as GOT base for RTL expansion.  2
new patterns, *sibcall_GOT_32 and *sibcall_value_GOT_32, are added to
expose GOT base register to register allocator so that call-clobbered
register will be used for GOT base.
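
The extra condition this patch adds to ix86_function_ok_for_sibcall can be
paraphrased as a small predicate. The C++ below is only a simplified model
of that condition under the assumption that a call counts as indirect when
there is no decl, when compiling PIC without a PLT, or when the callee is
dllimport; the struct and function names are hypothetical and do not exist
in i386.c.

```cpp
#include <cassert>

// Hypothetical stand-ins for the compiler state the real check consults.
struct CompileFlags {
    bool pic;  // models flag_pic
    bool plt;  // models flag_plt
};

struct Callee {
    bool has_decl;   // models a non-null decl
    bool dllimport;  // models the DLLIMPORT case
};

// True if the sibcall needs a call-clobbered register for the target
// address, i.e. the call is treated as indirect.
bool sibcall_is_indirect(const Callee &callee, const CompileFlags &flags) {
    return !callee.has_decl
        || (flags.pic && !flags.plt)  // call goes through a GOT slot
        || callee.dllimport;
}
```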

gcc/

	PR target/68937
	* config/i386/i386.c (ix86_function_ok_for_sibcall): Count
	call via GOT slot as indirect call.
	* config/i386/i386.md (*sibcall_GOT_32): New pattern.
	(*sibcall_value_GOT_32): Likewise.
	* config/i386/predicates.md (sibcall_memory_operand): Allow
	32-bit GOT slot only with pseudo register as GOT base.
	(GOT32_symbol_operand): New predicate.

gcc/testsuite/

	PR target/68937
	* gcc.target/i386/pr68937-1.c: New test.
	* gcc.target/i386/pr68937-2.c: Likewise.
	* gcc.target/i386/pr68937-3.c: Likewise.
	* gcc.target/i386/pr68937-4.c: Likewise.
	* gcc.target/i386/pr68937-5.c: Likewise.
---
 gcc/config/i386/i386.c|  4 +++-
 gcc/config/i386/i386.md   | 33 +++
 gcc/config/i386/predicates.md | 12 +++
 gcc/testsuite/gcc.target/i386/pr68937-1.c | 13 
 gcc/testsuite/gcc.target/i386/pr68937-2.c | 13 
 gcc/testsuite/gcc.target/i386/pr68937-3.c | 13 
 gcc/testsuite/gcc.target/i386/pr68937-4.c | 13 
 gcc/testsuite/gcc.target/i386/pr68937-5.c |  9 +
 8 files changed, 109 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68937-5.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cecea24..0e2bec3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6723,8 +6723,10 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
   /* If this call is indirect, we'll need to be able to use a
 	 call-clobbered register for the address of the target function.
 	 Make sure that all such registers are not used for passing
-	 parameters.  Note that DLLIMPORT functions are indirect.  */
+	 parameters.  Note that DLLIMPORT functions and call via GOT
+	 slot are indirect.  */
   if (!decl
+	  || (flag_pic && !flag_plt)
 	  || (TARGET_DLLIMPORT_DECL_ATTRIBUTES && DECL_DLLIMPORT_P (decl)))
 	{
 	  /* Check if regparm >= 3 since arg_reg_available is set to
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 49b2216..6ab8eaa 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11865,6 +11865,22 @@
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
+;; Since sibcall never returns, we can only use call-clobbered register
+;;

Re: [PATCH] Fix PR68852



On 14/12/15 15:14, Richard Biener wrote:

The following fixes PR68852 - so I finally needed to sit down and
fix the "build-from-scalars" hack in the SLP vectorizer by pretending
we'd have a sane vectorizer IL.  Basically I now mark the SLP node
with a proper vect_def_type but I have to push that down to the
stmt-info level whenever sth would look at it.

It's a bit ugly but not too much yet ;)

Anyway, the proper fix is to have a sane data structure, nothing for
GCC 6 though.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Verified SPEC CPU 2006 is happy with the patch.


Unfortunately it's not very happy on aarch64 ;)
416.gamess and the trans.fppized.f in particular ICEs after this patch with

trans.fppized.f:2086:0:

   SUBROUTINE TRFMCX(NPRINT,ICORBS,IORBS,IORB,DOFOCK,DOEXCH,


internal compiler error: in vect_analyze_stmt, at tree-vect-stmts.c:8013
0xd34d1b vect_analyze_stmt(gimple*, bool*, _slp_tree*)
$SRC/tree-vect-stmts.c:8013
0xd4b64a vect_slp_analyze_node_operations
$SRC/tree-vect-slp.c:2237
0xd4b533 vect_slp_analyze_node_operations
$SRC/tree-vect-slp.c:2221
0xd4b533 vect_slp_analyze_node_operations
$SRC/tree-vect-slp.c:2221
0xd4b533 vect_slp_analyze_node_operations
$SRC/tree-vect-slp.c:2221
0xd4b533 vect_slp_analyze_node_operations
$SRC/tree-vect-slp.c:2221
0xd4f7dc vect_slp_analyze_operations(vec<_slp_instance*, va_heap, vl_ptr>, 
void*)
$SRC/tree-vect-slp.c:2269
0xd546a0 vect_slp_analyze_bb_1
$SRC/tree-vect-slp.c:2543
0xd546a0 vect_slp_bb(basic_block_def*)
$SRC/tree-vect-slp.c:2630
0xd56985 execute
$SRC/tree-vectorizer.c:759
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

when using the flags
-mcpu=cortex-a53+crypto -save-temps -Ofast -fomit-frame-pointer 
-fno-aggressive-loop-optimizations

I'll open a bug report to keep track of it.

Thanks,
Kyrill


Richard.

2015-12-14  Richard Biener  

PR tree-optimization/68852
* tree-vectorizer.h (struct _slp_tree): Add def_type member.
(SLP_TREE_DEF_TYPE): New accessor.
* tree-vect-stmts.c (vect_is_simple_use): Remove BB vectorization
hack.
* tree-vect-slp.c (vect_create_new_slp_node): Initialize
SLP_TREE_DEF_TYPE.
(vect_build_slp_tree): When a node is to be built up from scalars
do not push a NULL as child but instead set its def_type to
vect_external_def.
(vect_analyze_slp_cost_1): Check for child def-type instead
of NULL.
(vect_detect_hybrid_slp_stmts): Likewise.
(vect_bb_slp_scalar_cost): Likewise.
(vect_get_slp_defs): Likewise.
(vect_slp_analyze_node_operations): Likewise.  Before
processing node push the children def-types to the underlying
stmts vinfo and restore it afterwards.
(vect_schedule_slp_instance): Likewise.
(vect_slp_analyze_bb_1): Do not mark stmts not in SLP instances
as not vectorizable.

* g++.dg/torture/pr68852.C: New testcase.

Index: gcc/tree-vectorizer.h
===
*** gcc/tree-vectorizer.h   (revision 231552)
--- gcc/tree-vectorizer.h   (working copy)
*** struct _slp_tree {
*** 107,112 
--- 107,114 
 unsigned int vec_stmts_size;
 /* Whether the scalar computations use two different operators.  */
 bool two_operators;
+   /* The DEF type of this node.  */
+   enum vect_def_type def_type;
   };
   
   
*** typedef struct _slp_instance {

*** 139,144 
--- 141,147 
   #define SLP_TREE_NUMBER_OF_VEC_STMTS(S)  (S)->vec_stmts_size
   #define SLP_TREE_LOAD_PERMUTATION(S) (S)->load_permutation
   #define SLP_TREE_TWO_OPERATORS(S) (S)->two_operators
+ #define SLP_TREE_DEF_TYPE(S)   (S)->def_type
   
   
   
Index: gcc/tree-vect-stmts.c

===
*** gcc/tree-vect-stmts.c   (revision 231552)
--- gcc/tree-vect-stmts.c   (working copy)
*** vect_is_simple_use (tree operand, vec_in
*** 8649,8658 
 else
   {
 stmt_vec_info stmt_vinfo = vinfo_for_stmt (*def_stmt);
!   if (is_a <bb_vec_info> (vinfo) && !STMT_VINFO_VECTORIZABLE (stmt_vinfo))
!   *dt = vect_external_def;
!   else
!   *dt = STMT_VINFO_DEF_TYPE (stmt_vinfo);
   }
   
 if (dump_enabled_p ())

--- 8652,8658 
 else
   {
 stmt_vec_info stmt_vinfo = vinfo_for_stmt (*def_stmt);
!   *dt = STMT_VINFO_DEF_TYPE (stmt_vinfo);
   }
   
 if (dump_enabled_p ())

Index: gcc/testsuite/g++.dg/torture/pr68852.C
===
--- gcc/testsuite/g++.dg/torture/pr68852.C  (revision 0)
+++

Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes


On 12/17/2015 10:04 AM, Kyrill Tkachov wrote:


In this case, I'm expecting a QImode compare with zero to map down to
the aarch64 TST reg, #255 instruction which
definitely zeroes out any bits outside of QImode (as it is a bitwise AND
with a bitmask),
so zero_extract is the more correct expression here, no?

It's more about the semantics of the code and how it interacts with RTL
generation, optimization and analysis than it is with the final assembly
generated by the backend that drives SUBREG vs zero_extract.


The backend assembly code generator is free to implement stricter
semantics (such as defining all the bits for a paradoxical subreg), but
the rest of the compiler cannot depend on those stricter semantics.


The easiest way to think about the subreg case here is that it's used 
when we've got a narrow object that we want to view in a wider mode, but 
we don't actually care about the upper bits.  The widening is merely to 
make the mode match another operand.



zero_extract is still the canonical form.  subreg is a specialized form 
for cases where the upper bits are "don't care" values.  This should 
probably be documented as the current state of the world.


I think it's an open question whether or not to drop the subreg form and 
always use zero-extract.  I've certainly seen cases where the former is 
*supposed* to allow better code generation, but in fact actually gets in 
the way resulting in poorer code generation.


Jeff

Re: [PATCH][AArch64][1/2] PR rtl-optimization/68796 Add compare-of-zero_extract pattern


Hi James,

On 17/12/15 17:24, James Greenhalgh wrote:

On Thu, Dec 17, 2015 at 03:36:40PM +, Kyrill Tkachov wrote:

2015-12-17  Kyrylo Tkachov  

 PR rtl-optimization/68796
 * config/aarch64/aarch64.md (*and<mode>3nr_compare0_zextract):
 New pattern.
 * config/aarch64/aarch64.c (aarch64_select_cc_mode): Handle
 ZERO_EXTRACT comparison with zero.
 (aarch64_mask_from_zextract_ops): New function.
 * config/aarch64/aarch64-protos.h (aarch64_mask_from_zextract_ops):
 New prototype.

2015-12-17  Kyrylo Tkachov  

 PR rtl-optimization/68796
 * gcc.target/aarch64/tst_3.c: New test.
 * gcc.target/aarch64/tst_4.c: Likewise.

Two comments.


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 87d6eb1358845527d7068550925949802a7e48e2..febca98d38d5f09c97b0f79adc55bb29eca217b9 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -330,6 +330,7 @@ int aarch64_uxt_size (int, HOST_WIDE_INT);
  int aarch64_vec_fpconst_pow_of_2 (rtx);
  rtx aarch64_final_eh_return_addr (void);
  rtx aarch64_legitimize_reload_address (rtx *, machine_mode, int, int, int);
+rtx aarch64_mask_from_zextract_ops (rtx, rtx);
  const char *aarch64_output_move_struct (rtx *operands);
  rtx aarch64_return_addr (int, rtx);
  rtx aarch64_simd_gen_const_vector_dup (machine_mode, int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cb8955d5d6c909e8179bb1ab8203eb165f55e4b6..58a9fc68f391162ed9847d7fb79d70d3ee9919f5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4147,7 +4147,9 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
&& y == const0_rtx
&& (code == EQ || code == NE || code == LT || code == GE)
&& (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS || GET_CODE (x) == AND
- || GET_CODE (x) == NEG))
+ || GET_CODE (x) == NEG
+ || (GET_CODE (x) == ZERO_EXTRACT && CONST_INT_P (XEXP (x, 1))
+ && CONST_INT_P (XEXP (x, 2)))))
  return CC_NZmode;
  
/* A compare with a shifted operand.  Because of canonicalization,

@@ -10757,6 +10759,21 @@ aarch64_simd_imm_zero_p (rtx x, machine_mode mode)
return x == CONST0_RTX (mode);
  }
  
+

+/* Return the bitmask CONST_INT to select the bits required by a zero extract
+   operation of width WIDTH at bit position POS.  */
+
+rtx
+aarch64_mask_from_zextract_ops (rtx width, rtx pos)
+{

It is up to you, but would this not more naturally be:

   unsigned HOST_WIDE_INT
   aarch64_mask_from_zextract_ops (rtx width, rtx pos)

Given how it gets used elsewhere?


It gets used in exactly two places, once in the condition of the pattern
where we have to extract its UINTVAL and once when outputting the assembly
string where we want the rtx wrapper around it to assign it to operands[1],
so I'd argue it's a 50-50 choice.
So I'll leave it as it is unless you have a strong preference.


+  gcc_assert (CONST_INT_P (width));
+  gcc_assert (CONST_INT_P (pos));
+
+  unsigned HOST_WIDE_INT mask
+= ((unsigned HOST_WIDE_INT)1 << UINTVAL (width)) - 1;

Space between (unsigned HOST_WIDE_INT) and 1.



Consider it done.
Thanks,
Kyrill


+  return GEN_INT (mask << UINTVAL (pos));
+}
+
  bool
  aarch64_simd_imm_scalar_p (rtx x, machine_mode mode ATTRIBUTE_UNUSED)
  {

Otherwise, this is OK.

Thanks,
James
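
For reference, the mask computation under review can be tried in isolation.
The sketch below is a plain C++ restatement using integers instead of rtx
operands, so it is not the aarch64 function itself; the guard for a width
of 64 is an addition of this example to keep the shift well defined.

```cpp
#include <cassert>
#include <cstdint>

// Build the bitmask selecting WIDTH bits starting at bit position POS,
// i.e. ((1 << width) - 1) << pos, using 64-bit unsigned arithmetic.
std::uint64_t mask_from_zextract_ops(unsigned width, unsigned pos) {
    // Shifting a 64-bit value by 64 is undefined, so handle the full-width
    // case separately (this guard is not in the patch under review).
    const std::uint64_t low_bits =
        (width >= 64) ? ~std::uint64_t{0}
                      : ((std::uint64_t{1} << width) - 1);
    return low_bits << pos;
}
```

A width of 8 at position 0 gives 0xff and a width of 16 at position 0 gives
0xffff, matching the QImode and HImode masks mentioned earlier in the
thread.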

Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes



On 17/12/15 17:27, Segher Boessenkool wrote:

On Thu, Dec 17, 2015 at 05:12:16PM +0100, Bernd Schmidt wrote:

On 12/17/2015 05:10 PM, Kyrill Tkachov wrote:

Well, this patch still produces the QImode comparison if the target has
a QImode comparison (the have_insn_for check in the simplify_comparison hunk).

Ok, I didn't look that closely because I had doubts about the approach.
This kind of check also goes somewhat against the principles of just
producing canonical forms of RTL.

The canonicalisation rules exist so that optimisers only need to match
one form instead of several, and machine descriptions only need to
describe one form instead of several.  For this bitmasking case it
perversely forces you to describe the same instruction in many ways,
for many targets.  This is what the change_zero_ext was about as well.

It's not so easy to fix for the compare case.  Maybe the idea of making
genrecog make code that recognises more forms of the same insn will work
out.  GCC 7 in any case...


Perhaps I had underestimated how involved this issue is :)
So if I want to improve the aarch64 situation for GCC 6,
would the recommended course of action be to just define the
QI and HImode compare against zero patterns?

Note that I think the make_extraction hunk from my patch is in line
with the function comment of make_extraction that says:
"   IN_COMPARE is nonzero if we are in a COMPARE.  This means that a
ZERO_EXTRACT should be built even for bits starting at bit 0."

whereas the condition that I'm adding "&& !in_compare" is explicitly trying
to avoid an extraction.

But anyway, if this has the potential to cause negative fallout that I
had not anticipated, it can wait for later.

Thanks,
Kyrill



Segher

config-list.mk and obsoleted configurations (was: [BUILDROBOT] "error: null argument where non-null required" on multiple targets)

2015-12-17 Thread Jan-Benedict Glaw

On Thu, 2015-12-17 11:05:42 -0700, Jeff Law  wrote:
> On 12/16/2015 03:46 AM, Jan-Benedict Glaw wrote:
> > Shall I bisect one of the cases anew, with the "Test value of
> > _GLIBCXX_USE_C99_WCHAR not whether it is defined" patch that
> > uncovered it, applied? Starting with some arbitrary old revision?
> Yes.  I'd really like to see config-list.mk working again.  The
> first step is always building a test the developers can easily work
> with.

Will do. Have a good starting point?

  Oh, there are some targets that were obsoleted today. I think the
OpenBSD3 and the two knetbsd configurations will need an
--enable-obsolete. I suggest this (untested) patch:

contrib/
2015-12-17  Jan-Benedict Glaw  

* config-list.mk (LIST): Add --enable-obsolete to recently obsoleted
targets x86_64-knetbsd-gnu, i686-knetbsd-gnu and i686-openbsd3.0 .

diff --git a/contrib/ChangeLog b/contrib/ChangeLog
index 8d39e68..ab8060b 100644
--- a/contrib/ChangeLog
+++ b/contrib/ChangeLog
@@ -1,3 +1,8 @@
+2015-12-17  Jan-Benedict Glaw  
+
+   * config-list.mk (LIST): Add --enable-obsolete to recently obsoleted
+   targets x86_64-knetbsd-gnu, i686-knetbsd-gnu and i686-openbsd3.0 .
+
 2015-12-06  Tobias Burnus  
 
* download_prerequisites: Download ISL 0.15 instead of 0.14.
diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index f0e39d6..0f15464 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -28,7 +28,8 @@ LIST = aarch64-elf aarch64-linux-gnu \
   hppa64-hpux11.0OPT-enable-sjlj-exceptions=yes hppa2.0-hpux11.9 \
   i686-pc-linux-gnu i686-apple-darwin i686-apple-darwin9 i686-apple-darwin10 \
   i486-freebsd4 i686-freebsd6 i686-kfreebsd-gnu \
-  i686-netbsdelf9 i686-knetbsd-gnu i686-openbsd i686-openbsd3.0 \
+  i686-netbsdelf9 i686-knetbsd-gnuOPT-enable-obsolete \
+  i686-openbsd i686-openbsd3.0OPT-enable-obsolete \
   i686-elf i686-kopensolaris-gnu i686-symbolics-gnu i686-pc-msdosdjgpp \
   i686-lynxos i686-nto-qnx \
   i686-rtems i686-solaris2.10 i686-wrs-vxworks \
@@ -74,7 +75,7 @@ LIST = aarch64-elf aarch64-linux-gnu \
   vax-netbsdelf vax-openbsd visium-elf x86_64-apple-darwin \
   x86_64-pc-linux-gnuOPT-with-fpmath=avx \
   x86_64-elfOPT-with-fpmath=sse x86_64-freebsd6 x86_64-netbsd \
-  x86_64-knetbsd-gnu x86_64-w64-mingw32 \
+  x86_64-knetbsd-gnuOPT-enable-obsolete x86_64-w64-mingw32 \
   x86_64-mingw32OPT-enable-sjlj-exceptions=yes xstormy16-elf xtensa-elf \
   xtensa-linux \
   i686-interix3OPT-enable-obsolete
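
The entries touched here follow the list's "OPT" naming convention. As an
illustration only, the C++ sketch below splits such an entry into a target
name and configure options. It assumes that each "OPT" marker introduces
one option and that the text after it gains one more leading dash, so that
"OPT-enable-obsolete" becomes "--enable-obsolete"; that reading of the
makefile, and every name in the code, is an assumption of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ConfigEntry {
    std::string target;                // e.g. "i686-openbsd3.0"
    std::vector<std::string> options;  // e.g. { "--enable-obsolete" }
};

// Split a config-list.mk style entry at every "OPT" marker.
ConfigEntry split_config_entry(const std::string &entry) {
    const std::string marker = "OPT";
    ConfigEntry result;
    std::size_t pos = entry.find(marker);
    result.target = entry.substr(0, pos);  // whole string if no marker
    while (pos != std::string::npos) {
        const std::size_t start = pos + marker.size();
        pos = entry.find(marker, start);
        const std::size_t length =
            (pos == std::string::npos) ? std::string::npos : pos - start;
        // The text after the marker already starts with '-', so one more
        // dash yields the usual "--option" spelling.
        result.options.push_back("-" + entry.substr(start, length));
    }
    return result;
}
```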



Best regards, JBG


Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes

2015-12-17 Thread Segher Boessenkool

On Thu, Dec 17, 2015 at 05:12:16PM +0100, Bernd Schmidt wrote:
> On 12/17/2015 05:10 PM, Kyrill Tkachov wrote:
> >Well, this patch still produces the QImode comparison if the target has
> >a QImode comparison
> >(the have_insn_for check in the simplify_comparison hunk).
> 
> Ok, I didn't look that closely because I had doubts about the approach. 
> This kind of check also goes somewhat against the principles of just 
> producing canonical forms of RTL.

The canonicalisation rules exist so that optimisers only need to match
one form instead of several, and machine descriptions only need to
describe one form instead of several.  For this bitmasking case it
perversely forces you to describe the same instruction in many ways,
for many targets.  This is what the change_zero_ext was about as well.

It's not so easy to fix for the compare case.  Maybe the idea of making
genrecog make code that recognises more forms of the same insn will work
out.  GCC 7 in any case...

Segher

Re: ipa-cp heuristics fixes

2015-12-17 Thread Jakub Jelinek

On Wed, Dec 16, 2015 at 08:15:12PM +0100, Jan Hubicka wrote:
> just to summarize a discussion on IRC. The problem is that we produce debug
> statements for eliminated arguments only in ipa-sra and ipa-split, while we
> don't do anything for cgraph clones. This is a problem on release branches,
> too.
> 
> It seems we have all the necessary logic, but the callee modification code
> from ipa-split should be moved to tree_function_versioning (which is used
> by both ipa-split and the cgraph clone mechanism) and the caller
> modification copied to cgraph_edge::redirect_call_stmt_to_callee.
> 
> I am trying to do that. It seems a bit difficult as the caller and callee
> modifications are tied together and I do not know how chaining of
> transformations is going to work.

Ok, so here is a WIP patch changing the functions you wanted, untested so
far.

I've been looking at 3 testcases (attached), -1.c and -3.c with -g -O2,
and -2.c with -g -O3.
The -3.c one is a copy of the test we have for the ipa-split debug info
stuff, before/after the patch we generate the same stuff.
The -2.c testcase is for the (new, or now much more often taken) path of
ipa-cp; the patch arranges for proper debug info in that case
(but I'm really surprised that, when the function is already cloned, nothing
figures out that the clone is always called with the same constant
passed to arg8, and the argument isn't removed and replaced by a constant).
-1.c is a testcase for the IPA-SRA path, where we unfortunately end up with
a slight regression (on the IL size; in the end we generate the same
assembly):
+  # DEBUG D#8 s=> arg8
+  # DEBUG arg8 => D#8
   # DEBUG arg8 => 7
with the patch.  On that testcase, arg8 is used, but it is always passed
value 7 (similarly to -2.c testcase) and in that case we really don't
need/want the decl_debug_args stuff, it is unnecessary, it is enough to say
in the callee that arg8 is 7.  Nothing on the caller side sets the magic
corresponding D# debug expr decl anyway.
Either tree_versioning is too low-level for the debug info addition, or
we need to figure out how to tell it if a constant will be always passed
to some argument and what that constant will be, so that we'd emit
always the # DEBUG arg8 => constant in that case instead of the source bind
stuff (but then figure out what has added that and avoid duplication too).

And then there is another thing, but best to be handled somewhere in
dwarf2out.c or in the debugger.  The arguments are printed in pretty random
order:
#0  foo (arg7=arg7@entry=30, arg8=arg8@entry=7, arg6=6, arg5=5, arg4=4, arg3=3, 
arg2=2, arg1=1) at pr68860-2.c:15
So, either the debugger for functions with abstract origins should look at
the order of arguments in the abstract origin and ignore the order in the
particular instantiation, or dwarf2out.c should sort the
DW_TAG_formal_parameter entries so that, if at all possible, they match the
order specified in the source.

--- gcc/ipa-split.c.jj  2015-12-10 11:14:00.0 +0100
+++ gcc/ipa-split.c 2015-12-17 18:21:39.402036180 +0100
@@ -1209,7 +1209,6 @@ split_function (basic_block return_bb, s
   gimple *last_stmt = NULL;
   unsigned int i;
   tree arg, ddef;
-  vec<tree, va_gc> **debug_args = NULL;
 
   if (dump_file)
 {
@@ -1432,73 +1431,38 @@ split_function (basic_block return_bb, s
  vector to say for debug info that if parameter parm had been passed,
  it would have value parm_Y(D).  */
   if (args_to_skip)
-for (parm = DECL_ARGUMENTS (current_function_decl), num = 0;
-parm; parm = DECL_CHAIN (parm), num++)
-  if (bitmap_bit_p (args_to_skip, num)
- && is_gimple_reg (parm))
-   {
- tree ddecl;
- gimple *def_temp;
-
- /* This needs to be done even without MAY_HAVE_DEBUG_STMTS,
-otherwise if it didn't exist before, we'd end up with
-different SSA_NAME_VERSIONs between -g and -g0.  */
- arg = get_or_create_ssa_default_def (cfun, parm);
- if (!MAY_HAVE_DEBUG_STMTS)
-   continue;
-
- if (debug_args == NULL)
-   debug_args = decl_debug_args_insert (node->decl);
- ddecl = make_node (DEBUG_EXPR_DECL);
- DECL_ARTIFICIAL (ddecl) = 1;
- TREE_TYPE (ddecl) = TREE_TYPE (parm);
- DECL_MODE (ddecl) = DECL_MODE (parm);
- vec_safe_push (*debug_args, DECL_ORIGIN (parm));
- vec_safe_push (*debug_args, ddecl);
- def_temp = gimple_build_debug_bind (ddecl, unshare_expr (arg),
- call);
- gsi_insert_after (&gsi, def_temp, GSI_NEW_STMT);
-   }
-  /* And on the callee side, add
- DEBUG D#Y s=> parm
- DEBUG var => D#Y
- stmts to the first bb where var is a VAR_DECL created for the
- optimized away parameter in DECL_INITIAL block.  This hints
- in the debug info that var (whole DECL_ORIGIN is the parm PARM_DECL)
- is optimized away, but could be looked up at the call site
- as value of D#X there.  */
-  if (debug_args !=

Re: [PATCH] C FE: improvements to ranges of bad return values


On 12/16/2015 07:19 PM, David Malcolm wrote:

In the C FE, c_parser_statement_after_labels passes "xloc" to
c_finish_return, which is the location of the first token
within the returned expression.

Hence we don't get a full underline for the following:

diagnostic-range-bad-return.c:34:10: warning: function returns address of local 
variable [-Wreturn-local-addr]
return _local;
   ^

This feels like a bug; this patch fixes it to use the location of
the expr if available, and to fall back to xloc otherwise, giving
us underlining of the full expression:

diagnostic-range-bad-return.c:34:10: warning: function returns address of local 
variable [-Wreturn-local-addr]
return _local;
   ^~~

The testcase also adds some coverage for underlining the
"return" token for the cases where we're warning about the
erroneous presence/absence of a return value.

As an additional tweak, it struck me that we could be more
user-friendly for these latter diagnostics by issuing a note
about where the function was declared, so this patch also adds
an inform for these cases:

diagnostic-range-bad-return.c: In function 'missing_return_value':
diagnostic-range-bad-return.c:31:3: warning: 'return' with no value, in 
function returning non-void
return; /* { dg-warning "'return' with no value, in function returning 
non-void" } */
^~

diagnostic-range-bad-return.c:29:5: note: declared here
  int missing_return_value (void)
  ^~~~

(ideally we'd put the underline on the return type, but that location
isn't captured)

This latter part of the patch is an enhancement rather than a
bugfix, though FWIW, and I'm not sure I can argue this with a
straight face, the tweak was posted as part of:
   "[PATCH 16/22] C/C++ frontend: use tree ranges in various diagnostics"
in https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00745.html
during stage 1.  Hopefully low risk, and a small usability improvement;
but if this is pushing it, it'd be simple to split this up and only
do the bug fix.

Successfully bootstrapped on x86_64-pc-linux-gnu;
adds 12 PASS results to gcc.sum.

OK for trunk for stage 3?

gcc/c/ChangeLog:
* c-parser.c (c_parser_statement_after_labels): When calling
c_finish_return, Use the return expression's location if it has
one, falling back to the location of the first token within it.
* c-typeck.c (c_finish_return): When issuing warnings about
the incorrect presence/absence of a return value, issue a note
showing the declaration of the function.

This is fine.  I think the first is pretty easy to justify.  The second 
is harder.  Again I think it's very low risk and has user-visible benefits.


Jeff

Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes


On 12/17/2015 09:26 AM, Kyrill Tkachov wrote:

One could argue that if the target has (or advertises having) a native
QImode register comparison then it's objectively a simplification to
transform a comparison in a wider mode
to a comparison in the shorter mode.

Generally true.

The most commonly cited exception is any port that defines 
WORD_REGISTER_OPERATIONS.  However, I would be comfortable with the idea 
that defining QImode comparisons on a target with 
WORD_REGISTER_OPERATIONS is a pretty explicit indication that it wants 
to try and shorten comparisons for one reason or another.







If, however, the target doesn't have such an instruction (like aarch64
doesn't have QImode registers) then
truncating the wider mode to QImode through a subreg is not less complex
than a zero_extract, as both will
involve some form of extracting/masking the desired QImode bits. So
picking a canonical form there makes sense,
and the documentation already specifies the zero_extract form as the
canonical.

Would be nice to get a definite clarification on whether the subreg form
is indeed the canonical one.

The subreg style "extension" isn't really an extension.  It is a way to 
say that we want to look at the object in a wider mode, but we don't 
actually care about the upper bits.  It's generally expected that the 
subreg won't result in the generation of any code.


A zero extract defines all the bits.

In theory the optimizers can use a SUBREG just like they could a REG, 
which should enable additional optimization.  In practice I don't think 
that's been as true as we'd like.


jeff

Re: [PATCH][AArch64][1/2] PR rtl-optimization/68796 Add compare-of-zero_extract pattern

2015-12-17 Thread James Greenhalgh

On Thu, Dec 17, 2015 at 03:36:40PM +0000, Kyrill Tkachov wrote:
> 2015-12-17  Kyrylo Tkachov  
> 
> PR rtl-optimization/68796
> * config/aarch64/aarch64.md (*and<mode>3nr_compare0_zextract):
> New pattern.
> * config/aarch64/aarch64.c (aarch64_select_cc_mode): Handle
> ZERO_EXTRACT comparison with zero.
> (aarch64_mask_from_zextract_ops): New function.
> * config/aarch64/aarch64-protos.h (aarch64_mask_from_zextract_ops):
> New prototype.
> 
> 2015-12-17  Kyrylo Tkachov  
> 
> PR rtl-optimization/68796
> * gcc.target/aarch64/tst_3.c: New test.
> * gcc.target/aarch64/tst_4.c: Likewise.

Two comments.

> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 87d6eb1358845527d7068550925949802a7e48e2..febca98d38d5f09c97b0f79adc55bb29eca217b9 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -330,6 +330,7 @@ int aarch64_uxt_size (int, HOST_WIDE_INT);
>  int aarch64_vec_fpconst_pow_of_2 (rtx);
>  rtx aarch64_final_eh_return_addr (void);
>  rtx aarch64_legitimize_reload_address (rtx *, machine_mode, int, int, int);
> +rtx aarch64_mask_from_zextract_ops (rtx, rtx);
>  const char *aarch64_output_move_struct (rtx *operands);
>  rtx aarch64_return_addr (int, rtx);
>  rtx aarch64_simd_gen_const_vector_dup (machine_mode, int);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index cb8955d5d6c909e8179bb1ab8203eb165f55e4b6..58a9fc68f391162ed9847d7fb79d70d3ee9919f5 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -4147,7 +4147,9 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
>&& y == const0_rtx
>&& (code == EQ || code == NE || code == LT || code == GE)
>&& (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS || GET_CODE (x) == AND
> -   || GET_CODE (x) == NEG))
> +   || GET_CODE (x) == NEG
> +   || (GET_CODE (x) == ZERO_EXTRACT && CONST_INT_P (XEXP (x, 1))
> +   && CONST_INT_P (XEXP (x, 2)))))
>  return CC_NZmode;
>  
>/* A compare with a shifted operand.  Because of canonicalization,
> @@ -10757,6 +10759,21 @@ aarch64_simd_imm_zero_p (rtx x, machine_mode mode)
>return x == CONST0_RTX (mode);
>  }
>  
> +
> +/* Return the bitmask CONST_INT to select the bits required by a zero extract
> +   operation of width WIDTH at bit position POS.  */
> +
> +rtx
> +aarch64_mask_from_zextract_ops (rtx width, rtx pos)
> +{

It is up to you, but would this not more naturally be:

  unsigned HOST_WIDE_INT
  aarch64_mask_from_zextract_ops (rtx width, rtx pos)

Given how it gets used elsewhere?

> +  gcc_assert (CONST_INT_P (width));
> +  gcc_assert (CONST_INT_P (pos));
> +
> +  unsigned HOST_WIDE_INT mask
> += ((unsigned HOST_WIDE_INT)1 << UINTVAL (width)) - 1;

Space between (unsigned HOST_WIDE_INT) and 1.

> +  return GEN_INT (mask << UINTVAL (pos));
> +}
> +
>  bool
>  aarch64_simd_imm_scalar_p (rtx x, machine_mode mode ATTRIBUTE_UNUSED)
>  {

Otherwise, this is OK.

Thanks,
James
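
For reference, the mask computation under review can be sketched on plain integers. This is a minimal illustration, assuming a 64-bit stand-in for unsigned HOST_WIDE_INT and ordinary unsigned arguments instead of CONST_INT rtxes; mask_from_zextract_ops is a hypothetical name, not the function in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Return the bitmask selecting WIDTH bits starting at bit position POS.
   uint64_t stands in for unsigned HOST_WIDE_INT (an assumption), and the
   operands are plain integers rather than CONST_INT rtxes.  */
static uint64_t
mask_from_zextract_ops (unsigned width, unsigned pos)
{
  /* A shift by 64 would be undefined, so this sketch only accepts
     widths below 64 and fields that fit in the 64-bit word.  */
  assert (width >= 1 && width < 64);
  assert (pos + width <= 64);

  uint64_t mask = ((uint64_t) 1 << width) - 1;
  return mask << pos;
}
```

Returning the integer directly, as suggested in the review, leaves it to the caller to wrap the value in a CONST_INT when an rtx is needed.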

Re: [PATCH] PR c++/68795: fix uninitialized close_paren_loc in cp_parser_postfix_expression


On 12/17/2015 07:32 PM, David Malcolm wrote:

+   if (close_paren_loc)


close_paren_loc != UNKNOWN_LOCATION - it's very confusing otherwise.


Bernd

Re: [BUILDROBOT] "error: null argument where non-null required" on multiple targets


On 12/16/2015 03:46 AM, Jan-Benedict Glaw wrote:

On Tue, 2015-12-15 10:43:58 -0700, Jeff Law  wrote:

On 12/14/2015 01:07 PM, Jan-Benedict Glaw wrote:

On Mon, 2015-12-14 18:54:28 +0000, Moore, Catherine wrote:

avr-rtems            http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478544
mipsel-elf           http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478844
mipsisa64r2-sde-elf  http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478855
mipsisa64sb1-elf     http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478865
mips-rtems           http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478877
powerpc-eabialtivec  http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478922
powerpc-eabispe      http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478932
powerpc-rtems        http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478956
ppc-elf              http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=478968
sh-superh-elf        http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=479077


Is there an easy way to reproduce the MIPS problems that you
reported?  I don't seem to be able to do it with a cross-compiler
targeting mipsel-elf.


What's your build compiler? For these builds, where it showed up, I'm
using a freshly compiled HEAD/master version. So basically, compile a
current GCC for your build machine:

Right.  This is something that only shows up when using the trunk to build
the crosses.

When I looked, I thought I bisected it to the delayed folding work.


Shall I bisect one of the cases anew, with the "Test value of
_GLIBCXX_USE_C99_WCHAR not whether it is defined" patch that uncovered
it, applied? Starting with some arbitrary old revision?

Yes.  I'd really like to see config-list.mk working again.  The first 
step is always building a test the developers can easily work with.



jeff

Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes


On 12/17/2015 06:44 PM, Kyrill Tkachov wrote:

Perhaps I had underestimated how involved this issue is :)
So if I want to improve the aarch64 situation for GCC 6,
would the recommended course of action be to just define the
QI and HImode compare against zero patterns?


For GCC 6 I think this is the only approach.


Bernd

[PATCH] PR c++/68795: fix uninitialized close_paren_loc in cp_parser_postfix_expression

2015-12-17 Thread David Malcolm

cp_parser_parenthesized_expression_list can leave *close_paren_loc
untouched if an error occurs; specifically when following this goto:

7402      if (expr == error_mark_node)
7403        goto skip_comma;

which can lead to cp_parser_postfix_expression attempting to
use uninitialized data for the finishing location of a
parenthesized expression.

The attached patch fixes this by having cp_parser_postfix_expression
initialize the underlying location to UNKNOWN_LOCATION, and only use
it if it's been written to.
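
The pattern can be sketched in isolation as follows. This is a minimal illustration, not the parser code: location_t and UNKNOWN_LOCATION are local stand-ins, and parse_list and finish_location are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the real types (assumptions for this sketch).  */
typedef unsigned int location_t;
#define UNKNOWN_LOCATION ((location_t) 0)

/* Hypothetical callee: writes *CLOSE_LOC only when parsing succeeds,
   mirroring an error path that skips the store.  */
static bool
parse_list (bool fail, location_t *close_loc)
{
  if (fail)
    return false;               /* *CLOSE_LOC is left untouched.  */
  if (close_loc)
    *close_loc = 42;            /* arbitrary location for illustration */
  return true;
}

/* Hypothetical caller: initialize the out parameter to a sentinel and
   only use it if the callee actually wrote to it.  */
static location_t
finish_location (bool fail, location_t start)
{
  assert (start != UNKNOWN_LOCATION);

  location_t close_loc = UNKNOWN_LOCATION;
  parse_list (fail, &close_loc);

  if (close_loc != UNKNOWN_LOCATION)
    return close_loc;
  return start;                 /* fall back when nothing was written */
}
```

Without the initializer, the error path would read an indeterminate value, which is the kind of use valgrind reports.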

Verified the fix manually by compiling
  g++.old-deja/g++.ns/invalid1.C
before and after under valgrind.

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

gcc/cp/ChangeLog:
* parser.c (cp_parser_postfix_expression): Initialize
close_paren_loc to UNKNOWN_LOCATION; only use it if
it has been written to by
cp_parser_parenthesized_expression_list.
(cp_parser_postfix_dot_deref_expression): Likewise.
(cp_parser_parenthesized_expression_list): Document the behavior
with respect to the CLOSE_PAREN_LOC param.
---
 gcc/cp/parser.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index a420cf1..56dfe42 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -6664,7 +6664,7 @@ cp_parser_postfix_expression (cp_parser *parser, bool 
address_p, bool cast_p,
bool saved_non_integral_constant_expression_p = false;
tsubst_flags_t complain = complain_flags (decltype_p);
vec<tree, va_gc> *args;
-   location_t close_paren_loc;
+   location_t close_paren_loc = UNKNOWN_LOCATION;
 
 is_member_access = false;
 
@@ -6826,10 +6826,13 @@ cp_parser_postfix_expression (cp_parser *parser, bool 
address_p, bool cast_p,
koenig_p,
complain);
 
-   location_t combined_loc = make_location (token->location,
-start_loc,
-close_paren_loc);
-   postfix_expression.set_location (combined_loc);
+   if (close_paren_loc)
+ {
+   location_t combined_loc = make_location (token->location,
+start_loc,
+close_paren_loc);
+   postfix_expression.set_location (combined_loc);
+ }
 
/* The POSTFIX_EXPRESSION is certainly no longer an id.  */
idk = CP_ID_KIND_NONE;
@@ -7298,7 +7301,10 @@ cp_parser_postfix_dot_deref_expression (cp_parser 
*parser,
plain identifier argument, normal_attr for an attribute that wants
an expression, or non_attr if we aren't parsing an attribute list.  If
NON_CONSTANT_P is non-NULL, *NON_CONSTANT_P indicates whether or
-   not all of the expressions in the list were constant.  */
+   not all of the expressions in the list were constant.
+   If CLOSE_PAREN_LOC is non-NULL, and no errors occur, then *CLOSE_PAREN_LOC
+   will be written to with the location of the closing parenthesis.  If
+   an error occurs, it may or may not be written to.  */
 
 static vec<tree, va_gc> *
 cp_parser_parenthesized_expression_list (cp_parser* parser,
-- 
1.8.5.3

[PATCH] [graphite] replace ISL with isl

2015-12-17 Thread Sebastian Pop

---
 Makefile.in   |  2 +-
 Makefile.tpl  |  2 +-
 config/isl.m4 |  2 +-
 configure | 10 +++---
 configure.ac  | 14 
 contrib/download_prerequisites|  2 +-
 gcc/Makefile.in   |  2 +-
 gcc/common.opt|  2 +-
 gcc/configure |  8 ++---
 gcc/configure.ac  |  8 ++---
 gcc/doc/install.texi  |  8 ++---
 gcc/doc/invoke.texi   |  4 +--
 gcc/graphite-isl-ast-to-gimple.c  | 47 +--
 gcc/graphite-scop-detection.c |  4 +--
 gcc/graphite-sese-to-poly.c   |  6 ++--
 gcc/graphite.c|  4 +--
 gcc/graphite.h|  2 +-
 gcc/params.def|  2 +-
 gcc/testsuite/gcc.dg/graphite/fuse-1.c|  4 +--
 gcc/testsuite/gcc.dg/graphite/fuse-2.c|  4 +--
 gcc/testsuite/gcc.dg/graphite/interchange-1.c |  2 +-
 gcc/testsuite/gcc.dg/graphite/pr35356-1.c |  2 +-
 gcc/toplev.c  |  2 +-
 23 files changed, 69 insertions(+), 74 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index cb62c35..e9b5950 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -312,7 +312,7 @@ NORMAL_TARGET_EXPORTS = \
 HOST_GMPLIBS = @gmplibs@
 HOST_GMPINC = @gmpinc@
 
-# Where to find ISL
+# Where to find isl
 HOST_ISLLIBS = @isllibs@
 HOST_ISLINC = @islinc@
 
diff --git a/Makefile.tpl b/Makefile.tpl
index 693e4d5..f7bb77e 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -315,7 +315,7 @@ NORMAL_TARGET_EXPORTS = \
 HOST_GMPLIBS = @gmplibs@
 HOST_GMPINC = @gmpinc@
 
-# Where to find ISL
+# Where to find isl
 HOST_ISLLIBS = @isllibs@
 HOST_ISLINC = @islinc@
 
diff --git a/config/isl.m4 b/config/isl.m4
index e4e4aab..86ccb94 100644
--- a/config/isl.m4
+++ b/config/isl.m4
@@ -94,7 +94,7 @@ AC_DEFUN([ISL_REQUESTED],
 
 # ISL_CHECK_VERSION ISL_CHECK_VERSION ()
 # 
-# Test that ISL contains functionality added to the minimum expected version.
+# Test whether isl contains functionality added to the minimum expected 
version.
 AC_DEFUN([ISL_CHECK_VERSION],
 [
   if test "${ENABLE_ISL_CHECK}" = yes ; then
diff --git a/configure b/configure
index c3c5cb0..f5786ed 100755
--- a/configure
+++ b/configure
@@ -1549,7 +1549,7 @@ Optional Packages:
   --with-boot-libs=LIBS   libraries for stage2 and later
   --with-boot-ldflags=FLAGS
   linker flags for stage2 and later
-  --with-isl=PATH Specify prefix directory for the installed ISL
+  --with-isl=PATH Specify prefix directory for the installed isl
   package. Equivalent to
   --with-isl-include=PATH/include plus
   --with-isl-lib=PATH/lib
@@ -5943,7 +5943,7 @@ fi
 
 
 
-# GCC GRAPHITE dependency ISL.
+# GCC GRAPHITE dependency isl.
 # Basic setup is inlined here, actual checks are in config/isl.m4
 
 
@@ -5956,7 +5956,7 @@ fi
 # Treat --without-isl as a request to disable
 # GRAPHITE support and skip all following checks.
 if test "x$with_isl" != "xno"; then
-  # Check for ISL
+  # Check for isl
 
 
 # Check whether --with-isl-include was given.
@@ -6079,13 +6079,13 @@ $as_echo "recommended isl version is 0.15, minimum 
required isl version 0.14 is
 && test "x${isllibs}" = x \
 && test "x${islinc}" = x ; then
 
-as_fn_error "Unable to find a usable ISL.  See config.log for details." 
"$LINENO" 5
+as_fn_error "Unable to find a usable isl.  See config.log for details." 
"$LINENO" 5
   fi
 
 
 fi
 
-# If the ISL check failed, disable builds of in-tree variant of ISL
+# If the isl check failed, disable builds of in-tree variant of isl
 if test "x$with_isl" = xno ||
test "x$gcc_cv_isl" = xno; then
   noconfigdirs="$noconfigdirs isl"
diff --git a/configure.ac b/configure.ac
index a6998ff..a719e03 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1773,31 +1773,31 @@ AC_ARG_WITH(boot-ldflags,
  fi])
 AC_SUBST(poststage1_ldflags)
 
-# GCC GRAPHITE dependency ISL.
+# GCC GRAPHITE dependency isl.
 # Basic setup is inlined here, actual checks are in config/isl.m4
 
 AC_ARG_WITH(isl,
   [AS_HELP_STRING(
[--with-isl=PATH],
-   [Specify prefix directory for the installed ISL package.
+   [Specify prefix directory for the installed isl package.
 Equivalent to --with-isl-include=PATH/include
 plus --with-isl-lib=PATH/lib])])
 
 # Treat --without-isl as a request to disable
 # GRAPHITE support and skip all following checks.
 if test "x$with_isl" != "xno"; then
-  # Check for ISL
+  # Check for isl
   dnl Provide configure switches and initialize islinc & isllibs
   dnl with user input.
   ISL_INIT_FLAGS
-

Re: [PATCH][combine][RFC][2/2] PR rtl-optimization/68796: Perfer zero_extract comparison against zero rather than unsupported shorter modes

Hi Jeff,

On 17/12/15 16:59, Jeff Law wrote:

On 12/17/2015 09:26 AM, Kyrill Tkachov wrote:

One could argue that if the target has (or advertises having) a native
QImode register comparison then it's objectively a simplification to
transform a comparison in a wider mode
to a comparison in the shorter mode.

Generally true.

The most commonly cited exception is any port that defines WORD_REGISTER_OPERATIONS. However, I would be comfortable with the idea that defining QImode comparisons on a target with WORD_REGISTER_OPERATIONS is a pretty explicit indication
that it wants to try and shorten comparisons for one reason or another.

I was investigating WORD_REGISTER_OPERATIONS as part of this. But we can't
define it for aarch64.
In any case, aarch64 doesn't have QImode registers so I thought we'd try to
avoid creating them.

If, however, the target doesn't have such an instruction (like aarch64
doesn't have QImode registers) then
truncating the wider mode to QImode through a subreg is not less complex
than a zero_extract, as both will
involve some form of extracting/masking the desired QImode bits. So
picking a canonical form there makes sense,
and the documentation already specifies the zero_extract form as the
canonical.

Would be nice to get a definite clarification on whether the subreg form
is indeed the canonical one.

The subreg style "extension" isn't really an extension. It is a way to say that we want to look at the object in a wider mode, but we don't actually care about the upper bits. It's generally expected that the subreg won't result in the
generation of any code.

A zero extract defines all the bits.

In this case, I'm expecting a QImode compare with zero to map down to the
aarch64 TST reg, #255 instruction which
definitely zeroes out any bits outside of QImode (as it is a bitwise AND with a
bitmask),
so zero_extract is the more correct expression here, no?
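
To make the two forms concrete at the C level: the sketch below compares the low byte of a register value against zero once by masking (roughly what a TST reg, #255 computes) and once by truncation. The function names are hypothetical and the code only illustrates that both tests agree on the value; it says nothing about which RTL form combine should prefer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masking form: AND with 0xff, then compare with zero.  All bits
   outside the low byte are explicitly defined as zero.  */
static bool
low_byte_zero_by_mask (uint64_t reg)
{
  return (reg & 0xff) == 0;
}

/* Truncation form: look at the value in an 8-bit type and compare
   with zero, ignoring the upper bits.  */
static bool
low_byte_zero_by_truncation (uint64_t reg)
{
  return (uint8_t) reg == 0;
}

/* Hypothetical helper: both forms must give the same answer.  */
static void
check_forms_agree (uint64_t reg)
{
  assert (low_byte_zero_by_mask (reg) == low_byte_zero_by_truncation (reg));
}
```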

In theory the optimizers can use a SUBREG just like they could a REG, which
should enable additional optimization. In practice I don't think that's been
as true as we'd like.

jeff

Re: [PATCH] PR target/68937: i686: -fno-plt produces wrong code (maybe only with tailcall

On Thu, Dec 17, 2015 at 8:11 AM, H.J. Lu  wrote:
> On Thu, Dec 17, 2015 at 7:50 AM, H.J. Lu  wrote:
>> On Thu, Dec 17, 2015 at 5:42 AM, Uros Bizjak  wrote:
>>> On Thu, Dec 17, 2015 at 2:00 PM, H.J. Lu  wrote:
>>>> On Thu, Dec 17, 2015 at 2:04 AM, Uros Bizjak  wrote:
>>>>> On Thu, Dec 17, 2015 at 12:29 AM, H.J. Lu  wrote:
>>>>>> Since sibcall never returns, we can only use call-clobbered register
>>>>>> as GOT base.  Otherwise, callee-saved register used as GOT base won't
>>>>>> be properly restored.
>>>>>>
>>>>>> Tested on x86-64 with -m32.  OK for trunk?
>>>>>
>>>>> You don't have to add explicit clobber for members of "CLOBBERED_REGS"
>>>>> class, and register_no_elim_operand predicate should be used with "U"
>>>>> constraint. Also, please introduce new predicate, similar to how
>>>>> GOT_memory_operand is defined and handled.
>>>>>
>>>>
>>>> Here is the updated patch.  There is a predicate already,
>>>> sibcall_memory_operand.  It allows any registers to
>>>> be as GOT base, which is the root of our problem.
>>>> This patch removes GOT slot from it and handles
>>>> sibcall over GOT slot with *sibcall_GOT_32 and
>>>> *sibcall_value_GOT_32 patterns.  Since I need to
>>>> expose constraints on GOT base register to RA,
>>>> I have to use 2 operands, GOT base and function
>>>> symbol, to describe sibcall over 32-bit GOT slot.
>>>
>>> Please use
>>>
>>>(mem:SI (plus:SI
>>>  (match_operand:SI 0 "register_no_elim_operand" "U")
>>>  (match_operand:SI 1 "GOT32_symbol_operand")))
>>> ...
>>>
>>> to avoid manual rebuild of the operand.
>>>
>>
>> Is this OK?
>>
>
> An updated patch to allow sibcall_memory_operand for RTL
> expansion.  OK for trunk if there is no regression?
>

There are no regressions on x86-64 with -m32.  OK for trunk?

-- 
H.J.

stop IPA wrapping 'main'

2015-12-17 Thread Nathan Sidwell

gcc.dg/20031102-1.c now causes some 'surprising' optimization behaviour.  It is 
essentially


int FooBar(void)
{
 ... stuff
  return 0;
}

int main(void)
{
  return FooBar();
}


What happens is  that FooBar gets inlined into main, and then ipa-icf notices 
FooBar and main have identical bodies.  It chooses to have FooBar tail call 
main, which results in a surprising  call of 'main'.   On PTX this is 
particularly unfortunate because we have to emit a single prototype for main 
with the regular argc and argv arguments (the backend gets around 'int main 
(void)' by faking the additional 2 args).  But that fails here because the tail 
call doesn't match the prototype.


Anyway, picking 'main' as the source function struck me as a poor choice, hence 
the attached patch.  It picks the second function of a congruent set, if the 
first is 'main'.  Note that even on, say x86-linux, we emit a tail call rather 
than an alias for the included testcase.
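
The selection rule can be sketched on plain strings. This is a minimal illustration under stated assumptions: the congruence class is modelled as an array of function names, strcmp against "main" stands in for the MAIN_NAME_P check, and pick_merge_source is a hypothetical helper, not code from ipa-icf.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the member to use as the merge source for a
   class of COUNT equivalent functions: the first member, unless that
   is 'main', in which case the second one is used.  The name
   comparison is a stand-in for MAIN_NAME_P (an assumption).  */
static size_t
pick_merge_source (const char *const *members, size_t count)
{
  /* Classes with a single member are skipped before this point.  */
  assert (count >= 2);

  return strcmp (members[0], "main") == 0 ? 1 : 0;
}
```

Every other member, including 'main' when it was skipped, is then merged into the chosen source.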


I removed the gcc_assert, as the vector indexing operator already checks the 
subscript is within range.


Alternatively I could probably just fixup the testcase to make FooBar 
uninlinable, as I suspect that might have been the original intent.


tested on x86_64-linux and ptx-none.

nathan
2015-12-17  Nathan Sidwell  

	gcc/
	* ipa-icf.c (sem_item_optimizer::merge): Don't pick 'main' as the
	source function.

	gcc/testsuite/
	* gcc.dg/ipa/ipa-icf-merge-1.c: New.
	
Index: ipa-icf.c
===
--- ipa-icf.c	(revision 231770)
+++ ipa-icf.c	(working copy)
@@ -3398,14 +3398,20 @@ sem_item_optimizer::merge_classes (unsig
 	if (c->members.length () == 1)
 	  continue;
 
-	gcc_assert (c->members.length ());
-
 	sem_item *source = c->members[0];
 
-	for (unsigned int j = 1; j < c->members.length (); j++)
+	if (MAIN_NAME_P (DECL_NAME (source->decl)))
+	  /* If merge via wrappers, picking main as the target can be
+	 problematic.  */
+	  source = c->members[1];
+
+	for (unsigned int j = 0; j < c->members.length (); j++)
 	  {
 	sem_item *alias = c->members[j];
 
+	if (alias == source)
+	  continue;
+
 	if (dump_file)
 	  {
 		fprintf (dump_file, "Semantic equality hit:%s->%s\n",
Index: testsuite/gcc.dg/ipa/ipa-icf-merge-1.c
===
--- testsuite/gcc.dg/ipa/ipa-icf-merge-1.c	(revision 0)
+++ testsuite/gcc.dg/ipa/ipa-icf-merge-1.c	(working copy)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -fdump-ipa-icf" } */
+
+/* Picking 'main' as a candidate target for equivalent functions is not a
+   good idea.  */
+
+int baz (int);
+
+int foo ()
+{
+  return baz (baz (0));
+}
+
+
+int main ()
+{
+  return baz (baz (0));
+}
+
+/* Notice the two functions are the same.  */
+/* { dg-final { scan-ipa-dump "Semantic equality hit:foo->main" "icf" } } */
+
+/* Make sure we don't tail call main.  */
+/* { dg-final { scan-ipa-dump-not "= main \\(\\);" "icf" } } */
+
+/* Make sure we tail call foo.  */
+/* { dg-final { scan-ipa-dump "= foo \\(\\);" "icf" } } */

Re: [PATCH] Fix PR c++/68831 (superfluous -Waddress warning for C++ delete)

2015-12-17 Thread Patrick Palka

On Thu, Dec 10, 2015 at 6:54 PM, Patrick Palka  wrote:
> Is this OK to commit if bootstrap + regtest on x86_64 succeeds?
>
> gcc/cp/ChangeLog:
>
> PR c++/68831
> * init.c (build_delete): Use a warning sentinel to disable
> -Waddress warnings when building the conditional that tests
> if the operand is NULL.
>
> gcc/testsuite/ChangeLog:
>
> PR c++/68831
> * g++.dg/pr68831.C: New test.

Ping.

> ---
>  gcc/cp/init.c  |  1 +
>  gcc/testsuite/g++.dg/pr68831.C | 10 ++
>  2 files changed, 11 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/pr68831.C
>
> diff --git a/gcc/cp/init.c b/gcc/cp/init.c
> index 5ecf9fb..2fffc61 100644
> --- a/gcc/cp/init.c
> +++ b/gcc/cp/init.c
> @@ -4439,6 +4439,7 @@ build_delete (tree otype, tree addr, 
> special_function_kind auto_delete,
>else
> {
>   /* Handle deleting a null pointer.  */
> + warning_sentinel s (warn_address);
>   ifexp = fold (cp_build_binary_op (input_location,
> NE_EXPR, addr, nullptr_node,
> complain));
> diff --git a/gcc/testsuite/g++.dg/pr68831.C b/gcc/testsuite/g++.dg/pr68831.C
> new file mode 100644
> index 000..8d32819
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr68831.C
> @@ -0,0 +1,10 @@
> +// PR c++/68831
> +// { dg-options "-Waddress" }
> +
> +class DenseMap {
> +public:
> +  ~DenseMap();
> +};
> +extern const DenseMap 
> +void foo() { delete  }
> +
> --
> 2.6.4.491.gda30757.dirty
>
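
For readers unfamiliar with the idiom, the one-line change relies on a scoped guard: the warning flag is cleared while the null check is built and restored when the guard goes out of scope. Below is a minimal standalone sketch of that pattern. It is not the GCC source; the global `warn_address` here is a plain stand-in, and `build_null_check_quietly` is a hypothetical name used only for illustration.

```cpp
#include <cassert>

// Stand-in for the compiler's global flag; nonzero means -Waddress is on.
static int warn_address = 1;

// Scoped guard: clears the flag on construction, restores it on destruction.
struct warning_sentinel
{
  int &flag;
  int saved;
  explicit warning_sentinel (int &f) : flag (f), saved (f) { flag = 0; }
  ~warning_sentinel () { flag = saved; }
  warning_sentinel (const warning_sentinel &) = delete;
  warning_sentinel &operator= (const warning_sentinel &) = delete;
};

// Hypothetical helper mirroring the shape of the patched branch: anything
// built while `s' is alive sees the warning as disabled.
static bool build_null_check_quietly ()
{
  warning_sentinel s (warn_address);
  assert (warn_address == 0);      // suppressed inside the scope
  return warn_address != 0;        // would a warning be issued here?
}
```

Because the restore happens in the destructor, the flag returns to its previous value on every exit path from the scope, which is why a single declaration is enough in the patch.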

Re: config-list.mk and obsoleted configurations


On 12/17/2015 11:34 AM, Jan-Benedict Glaw wrote:

On Thu, 2015-12-17 11:05:42 -0700, Jeff Law  wrote:

On 12/16/2015 03:46 AM, Jan-Benedict Glaw wrote:

Shall I bisect one of the cases anew, with the "Test value of
_GLIBCXX_USE_C99_WCHAR not whether it is defined" patch that
uncovered it, applied? Starting with some arbitrary old revision?

Yes.  I'd really like to see config-list.mk working again.  The
first step is always building a test the developers can easily work
with.


Will do. Have a good starting point?

The biggest problem is the breakage around either USE_C99_WCHAR or
delayed folding.  I think I counted 30+ targets that were affected.


Once that's settled, I suspect anything remaining will be pretty minor.

I'd disable interix completely.

Not sure what to do with avr-rtems at this point.


Oh, there are some targets that were obsoleted today. I think the
OpenBSD3 and the two knetbsd configurations will need an
--enable-obsolete. I suggest this (untested) patch:

contrib/
2015-12-17  Jan-Benedict Glaw  

* config-list.mk (LIST): Add --enable-obsolete to recently obsoleted
targets x86_64-knetbsd-gnu, i686-knetbsd-gnu and i686-openbsd3.0.

Seems fine to me once it's gone through whatever testing you want to do.

jeff

Re: config-list.mk and obsoleted configurations


On 12/17/2015 11:58 AM, Jan-Benedict Glaw wrote:

On Thu, 2015-12-17 11:39:24 -0700, Jeff Law  wrote:

On 12/17/2015 11:34 AM, Jan-Benedict Glaw wrote:

On Thu, 2015-12-17 11:05:42 -0700, Jeff Law  wrote:

On 12/16/2015 03:46 AM, Jan-Benedict Glaw wrote:

Shall I bisect one of the cases anew, with the "Test value of
_GLIBCXX_USE_C99_WCHAR not whether it is defined" patch that
uncovered it, applied? Starting with some arbitrary old revision?

Yes.  I'd really like to see config-list.mk working again.  The
first step is always building a test the developers can easily work
with.


Will do. Have a good starting point?

The biggest problem is the breakage around either USE_C99_WCHAR or delayed
folding.  I think I counted 30+ targets that were affected.


It's probably delayed folding; seems the USE_C99_WCHAR stuff only
uncovers it, doesn't it?


Once that's settled, I suspect anything remaining will be pretty minor.

I'd disable interix completely.


Seems to be not hard to fix. Breaks with:

I know, but it's not worth fixing IMHO.  Interix has been a dead product
for a long time.  We almost got rid of it several years ago, but someone 
objected and said they'd maintain it.  I asked Trevor to put it back on 
the deprecated list a little while ago.


AFAICT it hasn't been building since 2012.  I fixed some of the problems 
a few months ago, but just can't really justify anyone's time to figure 
out which way to #define this away to preserve prior behaviour and to 
continue to keep it working over time.


Not sure what to do with avr-rtems at this point.


My buildrobot just fails at the very same USE_C99_WCHAR issue right
now. Is there something more hidden, later on in the build?

avr-rtems has deeper issues, which superficially look like the delayed
folding problem you're seeing, but aren't the same problem.


Essentially avr-rtems's definitions of various standard types are all 
conditional on flags with a default that is NULL.  Those are ultimately 
passed to one of the str* functions and GCC throws a warning/failure.


There's no way to fold those down to a constant (or even to prove the
NULL case couldn't happen, IIRC).  So even once the current delayed
folding issue gets fixed, avr-rtems will remain broken.


It's also unclear how long avr-rtems will be around.  I get the sense 
it's on its last legs -- and given we have both avr and rtems coverage 
via other targets, I don't think building avr-rtems is really all that 
helpful.


Jeff

Re: [PATCH] IRA: Fix % constraint modifier handling on disabled alternatives.

2015-12-17 Thread Vladimir Makarov


On 12/14/2015 08:05 AM, Andreas Krebbel wrote:

Hi,

the constraint modifier % applies to all the alternatives of a pattern
and hence is mostly added to the first constraint of an operand.  IRA
currently ignores it if the alternative with the % gets disabled by
using the `enabled' attribute or if it is not among the preferred
alternatives.

Fixed with the attached patch by moving the % check to the first loop
which walks unconditionally over all the constraints.
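
A toy model of the difference, for illustration only: this is not the IRA code, and the function names and the comma-separated constraint representation are assumptions made for the sketch. The first function reproduces the described behaviour, where the % is missed if its alternative is disabled; the second scans every alternative unconditionally.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split an operand constraint such as "%0,r" into one string per alternative.
static std::vector<std::string> split_alternatives (const std::string &c)
{
  std::vector<std::string> alts;
  std::string cur;
  for (char ch : c)
    {
      if (ch == ',')
        {
          alts.push_back (cur);
          cur.clear ();
        }
      else
        cur += ch;
    }
  alts.push_back (cur);
  return alts;
}

// Described problem: '%' is only noticed when its alternative is enabled.
static bool commutative_enabled_only (const std::string &c,
                                      const std::vector<bool> &enabled)
{
  std::vector<std::string> alts = split_alternatives (c);
  assert (alts.size () == enabled.size ());
  for (std::size_t i = 0; i < alts.size (); ++i)
    if (enabled[i] && alts[i].find ('%') != std::string::npos)
      return true;
  return false;
}

// Described fix: walk all constraints regardless of enabled state.
static bool commutative_any (const std::string &c)
{
  return c.find ('%') != std::string::npos;
}
```

With the constraint "%0,r" and only the second alternative enabled, the first function reports the operand as non-commutative while the second still sees the %.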

Ok for mainline?



Yes, Andreas.

Thanks for working on this issue.

Re: PATCH: PR target/66232: -fPIC -fno-plt -mx32 fails to generate indirect branch via GOT