Re: [02/13] Replace handle_cache_entry with new interface

2015-06-24 Thread Richard Sandiford
Jeff Law l...@redhat.com writes:
 On 06/16/2015 02:45 AM, Richard Sandiford wrote:
 As described in the covering note, this patch replaces handle_cache_entry
 with a new function keep_cache_entry.  It also ensures that elements are
 deleted using the proper function, so that m_n_deleted is updated.

 I couldn't tell whether the unusual name of the function
 (gt_cleare_cache) is deliberate or not, but I left it be.
 Short-hand for clear_entry or something similar?

Yeah, could be.

 gcc/ada/
  * gcc-interface/decl.c (value_annotation_hasher::handle_cache_entry):
  Delete.
  (value_annotation_hasher::keep_cache_entry): New function.
  * gcc-interface/utils.c (pad_type_hasher::handle_cache_entry):
  Delete.
  (pad_type_hasher::keep_cache_entry): New function.

 gcc/
  * hash-table.h (hash_table): Add gt_cleare_cache as a friend.
  (gt_cleare_cache): Check here for deleted and empty entries.
  Replace handle_cache_entry with a call to keep_cache_entry.
  * hash-traits.h (ggc_cache_hasher::handle_cache_entry): Delete.
  (ggc_cache_hasher::keep_cache_entry): New function.
  * trans-mem.c (tm_wrapper_hasher::handle_cache_entry): Delete.
  (tm_wrapper_hasher::keep_cache_entry): New function.
  * tree.h (tree_decl_map_cache_hasher::handle_cache_entry): Delete.
  (tree_vec_map_cache_hasher::keep_cache_entry): New function.
  * tree.c (type_cache_hasher::handle_cache_entry): Delete.
  (type_cache_hasher::keep_cache_entry): New function.
  (tree_vec_map_cache_hasher::handle_cache_entry): Delete.
  (tree_vec_map_cache_hasher::keep_cache_entry): New function.
  * ubsan.c (tree_type_map_cache_hasher::handle_cache_entry): Delete.
  (tree_type_map_cache_hasher::keep_cache_entry): New function.
  * varasm.c (tm_clone_hasher::handle_cache_entry): Delete.
  (tm_clone_hasher::keep_cache_entry): New function.
  * config/i386/i386.c (dllimport_hasher::handle_cache_entry): Delete.
  (dllimport_hasher::keep_cache_entry): New function.
 So for all the keep_cache_entry functions, I guess they're trivial 
 enough that a function comment probably isn't needed.

Yeah.  For cases like this where the function is implementing a defined
interface (described in hash-table.h), I think it's better to only have
comments for implementations that are doing something non-obvious.

 Presumably no good way to share the trivial implementation?

Probably not without sharing the other parts of the traits in some way.
That might be another possible cleanup :-)
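
For reference, a minimal sketch (mine, not lifted from the patch) of what
one of these trivial implementations amounts to -- the exact plumbing in
hash-traits.h may differ slightly:

/* Decide whether a cached element survives a GC cycle; for most of the
   cache hashers touched here the answer is simply "keep it while it is
   still marked".  */
static bool
keep_cache_entry (tree &e)
{
  return ggc_marked_p (e);
}

Sharing it would indeed mean hoisting that one-liner into a common base
(or mixin) that the cache hashers inherit from, together with the rest of
the shared traits.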

Thanks,
Richard



[gomp4, committed] Add replace_uses_in_dominated_bbs

2015-06-24 Thread Tom de Vries

Hi,

this patch factors out a new function replace_uses_in_dominated_bbs
out of rewrite_virtuals_into_loop_closed_ssa.

Committed to gomp-4_0-branch.

Thanks,
- Tom
Add replace_uses_in_dominated_bbs

2015-06-18  Tom de Vries  t...@codesourcery.com

	* tree-ssa-loop-manip.c (replace_uses_in_dominated_bbs): Factor out
	of ...
	(rewrite_virtuals_into_loop_closed_ssa): ... here.
---
 gcc/tree-ssa-loop-manip.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 9c558ca..0d2c972 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -588,6 +588,21 @@ replace_uses_in_bbs_by (tree name, tree val, bitmap bbs)
 }
 }
 
+/* Replace uses of OLD_VAL with NEW_VAL in bbs dominated by BB.  */
+
+static void
+replace_uses_in_dominated_bbs (tree old_val, tree new_val, basic_block bb)
+{
+  bitmap dominated = BITMAP_ALLOC (NULL);
+
+  bitmap_get_dominated_by (CDI_DOMINATORS, bb, dominated);
+  bitmap_set_bit (dominated, bb->index);
+
+  replace_uses_in_bbs_by (old_val, new_val, dominated);
+
+  BITMAP_FREE (dominated);
+}
+
 /* Ensure a virtual phi is present in the exit block, if LOOP contains a vdef.
In other words, ensure loop-closed ssa normal form for virtuals.  */
 
@@ -635,17 +650,8 @@ rewrite_virtuals_into_loop_closed_ssa (struct loop *loop)
 
   tree res_new = copy_ssa_name (final_loop, NULL);
   gphi *nphi = create_phi_node (res_new, exit->dest);
-
-  /* Gather the bbs dominated by the exit block.  */
-  bitmap exit_dominated = BITMAP_ALLOC (NULL);
-  bitmap_get_dominated_by (CDI_DOMINATORS, exit->dest, exit_dominated);
-  bitmap_set_bit (exit_dominated, exit->dest->index);
-
-  replace_uses_in_bbs_by (final_loop, res_new, exit_dominated);
-
+  replace_uses_in_dominated_bbs (final_loop, res_new, exit->dest);
   add_phi_arg (nphi, final_loop, exit, UNKNOWN_LOCATION);
-
-  BITMAP_FREE (exit_dominated);
 }
 
 /* Check invariants of the loop closed ssa form for the USE in BB.  */
-- 
1.9.1



[gomp4, committed] Add get_virtual_phi

2015-06-24 Thread Tom de Vries

Hi,

this patch factors new function get_virtual_phi out of 
rewrite_virtuals_into_loop_closed_ssa.


Committed to gomp-4_0-branch.

Thanks,
- Tom
Add get_virtual_phi

2015-06-18  Tom de Vries  t...@codesourcery.com

	* tree-ssa-loop-manip.c (get_virtual_phi): Factor out of ...
	(rewrite_virtuals_into_loop_closed_ssa): ... here.
---
 gcc/tree-ssa-loop-manip.c | 44 
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 0d2c972..b7c3676 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -603,6 +603,24 @@ replace_uses_in_dominated_bbs (tree old_val, tree new_val, basic_block bb)
   BITMAP_FREE (dominated);
 }
 
+/* Return the virtual phi in BB.  */
+
+static gphi *
+get_virtual_phi (basic_block bb)
+{
+  for (gphi_iterator gsi = gsi_start_phis (bb);
+   !gsi_end_p (gsi);
+   gsi_next (gsi))
+{
+  gphi *phi = gsi.phi ();
+
+  if (virtual_operand_p (PHI_RESULT (phi)))
+	return phi;
+}
+
+  return NULL;
+}
+
 /* Ensure a virtual phi is present in the exit block, if LOOP contains a vdef.
In other words, ensure loop-closed ssa normal form for virtuals.  */
 
@@ -612,35 +630,13 @@ rewrite_virtuals_into_loop_closed_ssa (struct loop *loop)
   gphi *phi;
   edge exit = single_dom_exit (loop);
 
-  phi = NULL;
-  for (gphi_iterator gsi = gsi_start_phis (loop->header);
-   !gsi_end_p (gsi);
-   gsi_next (gsi))
-{
-  if (virtual_operand_p (PHI_RESULT (gsi.phi ())))
-	{
-	  phi = gsi.phi ();
-	  break;
-	}
-}
-
+  phi = get_virtual_phi (loop->header);
   if (phi == NULL)
 return;
 
   tree final_loop = PHI_ARG_DEF_FROM_EDGE (phi, single_succ_edge (loop->latch));
 
-  phi = NULL;
-  for (gphi_iterator gsi = gsi_start_phis (exit->dest);
-   !gsi_end_p (gsi);
-   gsi_next (gsi))
-{
-  if (virtual_operand_p (PHI_RESULT (gsi.phi ())))
-	{
-	  phi = gsi.phi ();
-	  break;
-	}
-}
-
+  phi = get_virtual_phi (exit->dest);
   if (phi != NULL)
 {
   tree final_exit = PHI_ARG_DEF_FROM_EDGE (phi, exit);
-- 
1.9.1



[PATCH 3/8] S/390: Disable effect of support_vector_misalignment hook for non-z13 targets.

2015-06-24 Thread Andreas Krebbel
gcc/ChangeLog:

* config/s390/s390.c (s390_support_vector_misalignment): Call
default implementation for !TARGET_VX.
---
 gcc/config/s390/s390.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 859ed68..80a2c89 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -13728,7 +13728,11 @@ s390_support_vector_misalignment (machine_mode mode 
ATTRIBUTE_UNUSED,
  int misalignment ATTRIBUTE_UNUSED,
  bool is_packed ATTRIBUTE_UNUSED)
 {
-  return true;
+  if (TARGET_VX)
+    return true;
+
+  return default_builtin_support_vector_misalignment (mode, type, misalignment,
+						      is_packed);
 }
 
 /* The vector ABI requires vector types to be aligned on an 8 byte
-- 
1.7.9.5



[PATCH 2/8] S/390: Limit legitimate_constant_p changes to TARGET_VX.

2015-06-24 Thread Andreas Krebbel
gcc/ChangeLog:

* config/s390/s390.c (s390_legitimate_constant_p): Add
  TARGET_VX check.
---
 gcc/config/s390/s390.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 934f7c0..859ed68 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -3611,7 +3611,7 @@ legitimate_pic_operand_p (rtx op)
 static bool
 s390_legitimate_constant_p (machine_mode mode, rtx op)
 {
-  if (VECTOR_MODE_P (mode) && GET_CODE (op) == CONST_VECTOR)
+  if (TARGET_VX && VECTOR_MODE_P (mode) && GET_CODE (op) == CONST_VECTOR)
 {
   if (GET_MODE_SIZE (mode) != 16)
return 0;
-- 
1.7.9.5



[PATCH 8/8] S/390: Switch mode attribute to bhfgq for vec scatter patterns.

2015-06-24 Thread Andreas Krebbel
This fixes the mode attribute used in the vec scatter insn
definitions.  vec_scatter_element<mode>_<non_vec_int> and
vec_scatter_element<V_HW_64:mode>_SI were using the <gf> mode attribute
which does not support vector modes.

gcc/ChangeLog:

* config/s390/vx-builtins.md
("vec_scatter_element<mode>_<non_vec_int>")
("vec_scatter_element<V_HW_64:mode>_SI"): Replace <gf> mode
attribute with <bhfgq>.
---
 gcc/config/s390/vx-builtins.md |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index e306ee8..35ada13 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -414,7 +414,7 @@
(unspec:non_vec [(match_operand:V_HW_640 
register_operand  v)
   (match_dup 3)] UNSPEC_VEC_EXTRACT))]
  "TARGET_VX && !TARGET_64BIT"
-  "vsce<V_HW_64:gf>\t%v0,%O2(%v1,%R2),%3"
+  "vsce<V_HW_64:bhfgq>\t%v0,%O2(%v1,%R2),%3"
  [(set_attr "op_type" "VRV")])
 
 ; Element size and target adress size is the same
@@ -428,7 +428,7 @@
(unspec:non_vec [(match_operand:V_HW_32_64 0 register_operand  
v)
   (match_dup 3)] UNSPEC_VEC_EXTRACT))]
  "TARGET_VX"
-  "vsce<gf>\t%v0,%O2(%v1,%R2),%3"
+  "vsce<bhfgq>\t%v0,%O2(%v1,%R2),%3"
  [(set_attr "op_type" "VRV")])
 
 ; Depending on the address size we have to expand a different pattern.
-- 
1.7.9.5



[PATCH 5/8] S/390: Remove internal builtins from vecintrin.h.

2015-06-24 Thread Andreas Krebbel
This patch removes a couple of builtin definitions from the
vecintrin.h file which are supposed to be used only internally.

gcc/ChangeLog:

* config/s390/vecintrin.h: Remove internal builtins.
---
 gcc/config/s390/vecintrin.h |   35 ---
 1 file changed, 35 deletions(-)

diff --git a/gcc/config/s390/vecintrin.h b/gcc/config/s390/vecintrin.h
index 95851f4..2e26e3a 100644
--- a/gcc/config/s390/vecintrin.h
+++ b/gcc/config/s390/vecintrin.h
@@ -160,9 +160,6 @@ vec_any_numeric (__vector double a)
 #define vec_packs __builtin_s390_vec_packs
 #define vec_packs_cc __builtin_s390_vec_packs_cc
 #define vec_packsu __builtin_s390_vec_packsu
-#define vec_packsu_u16 __builtin_s390_vec_packsu_u16
-#define vec_packsu_u32 __builtin_s390_vec_packsu_u32
-#define vec_packsu_u64 __builtin_s390_vec_packsu_u64
 #define vec_packsu_cc __builtin_s390_vec_packsu_cc
 #define vec_perm __builtin_s390_vec_perm
 #define vec_permi __builtin_s390_vec_permi
@@ -179,42 +176,12 @@ vec_any_numeric (__vector double a)
 #define vec_and __builtin_s390_vec_and
 #define vec_andc __builtin_s390_vec_andc
 #define vec_avg __builtin_s390_vec_avg
-#define vec_all_eqv16qi __builtin_vec_all_eqv16qi
-#define vec_all_eqv8hi __builtin_vec_all_eqv8hi
-#define vec_all_eqv4si __builtin_vec_all_eqv4si
-#define vec_all_eqv2di __builtin_vec_all_eqv2di
-#define vec_all_eqv2df __builtin_vec_all_eqv2df
-#define vec_all_gev16qi __builtin_vec_all_gev16qi
-#define vec_all_geuv16qi __builtin_vec_all_geuv16qi
-#define vec_all_gev8hi __builtin_vec_all_gev8hi
-#define vec_all_geuv8hi __builtin_vec_all_geuv8hi
-#define vec_all_gev4si __builtin_vec_all_gev4si
-#define vec_all_geuv4si __builtin_vec_all_geuv4si
-#define vec_all_gev2di __builtin_vec_all_gev2di
-#define vec_all_geuv2di __builtin_vec_all_geuv2di
-#define vec_all_gev2df __builtin_vec_all_gev2df
-#define vec_all_gtv2df __builtin_vec_all_gtv2df
 #define vec_all_eq __builtin_s390_vec_all_eq
 #define vec_all_ne __builtin_s390_vec_all_ne
 #define vec_all_ge __builtin_s390_vec_all_ge
 #define vec_all_gt __builtin_s390_vec_all_gt
 #define vec_all_le __builtin_s390_vec_all_le
 #define vec_all_lt __builtin_s390_vec_all_lt
-#define vec_any_eqv16qi __builtin_vec_any_eqv16qi
-#define vec_any_eqv8hi __builtin_vec_any_eqv8hi
-#define vec_any_eqv4si __builtin_vec_any_eqv4si
-#define vec_any_eqv2di __builtin_vec_any_eqv2di
-#define vec_any_eqv2df __builtin_vec_any_eqv2df
-#define vec_any_gev16qi __builtin_vec_any_gev16qi
-#define vec_any_geuv16qi __builtin_vec_any_geuv16qi
-#define vec_any_gev8hi __builtin_vec_any_gev8hi
-#define vec_any_geuv8hi __builtin_vec_any_geuv8hi
-#define vec_any_gev4si __builtin_vec_any_gev4si
-#define vec_any_geuv4si __builtin_vec_any_geuv4si
-#define vec_any_gev2di __builtin_vec_any_gev2di
-#define vec_any_geuv2di __builtin_vec_any_geuv2di
-#define vec_any_gev2df __builtin_vec_any_gev2df
-#define vec_any_gtv2df __builtin_vec_any_gtv2df
 #define vec_any_eq __builtin_s390_vec_any_eq
 #define vec_any_ne __builtin_s390_vec_any_ne
 #define vec_any_ge __builtin_s390_vec_any_ge
@@ -233,9 +200,7 @@ vec_any_numeric (__vector double a)
 #define vec_gfmsum_accum __builtin_s390_vec_gfmsum_accum
 #define vec_abs __builtin_s390_vec_abs
 #define vec_max __builtin_s390_vec_max
-#define vec_max_dbl __builtin_s390_vec_max_dbl
 #define vec_min __builtin_s390_vec_min
-#define vec_min_dbl __builtin_s390_vec_min_dbl
 #define vec_mladd __builtin_s390_vec_mladd
 #define vec_mhadd __builtin_s390_vec_mhadd
 #define vec_meadd __builtin_s390_vec_meadd
-- 
1.7.9.5



[PATCH 0/8] S/390: z13 support fixes and improvements

2015-06-24 Thread Andreas Krebbel
Hi,

the following patchset consists of minor improvements and fixes.  The
most notable change is the conditional builtin creation patch which
tries to limit a lot of the builtin initialization work to -march=z13.

It also includes the GNU vector ABI attribute patch from the original
series which I didn't commit until now.

They have been tested on head with --with-arch=z13 without
regressions.

Committed to mainline.

Bye,

-Andreas-

Andreas Krebbel (8):
  S/390 Vector ABI GNU Attribute.
  S/390: Limit legitimate_constant_p changes to TARGET_VX.
  S/390: Disable effect of support_vector_misalignment hook for non-z13
targets.
  S/390: Fix s390_secondary_reload register class check.
  S/390: Remove internal builtins from vecintrin.h.
  S/390: Make builtin creation conditional.
  S/390: Add proper comments to vpopct builtins for automated testsuite
generation.
  S/390: Switch mode attribute to bhfgq for vec scatter patterns.

 gcc/config/s390/s390-builtin-types.def |  585 +-
 gcc/config/s390/s390-builtins.def  | 1231 ++--
 gcc/config/s390/s390-builtins.h|   37 +-
 gcc/config/s390/s390-c.c   |   11 +-
 gcc/config/s390/s390.c |  243 +++-
 gcc/config/s390/vecintrin.h|   35 -
 gcc/config/s390/vx-builtins.md |4 +-
 gcc/configure  |   36 +
 gcc/configure.ac   |7 +
 gcc/testsuite/gcc.target/s390/vector/vec-abi-1.c   |1 +
 .../gcc.target/s390/vector/vec-abi-attr-1.c|   18 +
 .../gcc.target/s390/vector/vec-abi-attr-2.c|   53 +
 .../gcc.target/s390/vector/vec-abi-attr-3.c|   18 +
 .../gcc.target/s390/vector/vec-abi-attr-4.c|   17 +
 .../gcc.target/s390/vector/vec-abi-attr-5.c|   19 +
 .../gcc.target/s390/vector/vec-abi-attr-6.c|   24 +
 16 files changed, 1336 insertions(+), 1003 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-2.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-3.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-4.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-5.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-6.c

-- 
1.7.9.5



Re: [PATCH][testsuite] Fix TORTURE_OPTIONS overriding

2015-06-24 Thread Richard Biener
On Tue, 23 Jun 2015, James Greenhalgh wrote:

 
 On Thu, Jun 18, 2015 at 11:10:01AM +0100, Richard Biener wrote:
 
  Currently when doing
 
  make check-gcc RUNTESTFLAGS=TORTURE_OPTIONS=\\\{ -O3 } { -O2 }\\\
  dg-torture.exp
 
  you get -O3 and -O2 but also the two LTO torture option combinations.
  That's undesired (those are the most expensive anyway).  The following
  patch avoids this by setting LTO_TORTURE_OPTIONS only when
  TORTURE_OPTIONS isn't specified.
 
  Tested with and without TORTURE_OPTIONS for C and fortran tortures.
 
  Seems the instruction in c-torture.exp how to override TORTURE_OPTIONS
  is off, RUNTESTFLAGS=TORTURE_OPTIONS=\\\{ { -O3 } { -O2 } }\\\
  certainly doesn't do what it should.
 
 This patch causes issues for ARM and AArch64 cross multilib
 testing. There are two issues, one is that we now clobber
 gcc_force_conventional_output after setting it in the conditional this patch
 moved (hits all targets, see the new x86-64 failures like pr61848.c).
 
 The other is that we no longer protect environment settings before calling
 check_effective_target_lto, which results in our cross --specs files no
 longer being on the path.
 
 I've fixed these issues by rearranging the file again, but I'm not
 sure if what I've done is sensible and does not cause other issues. This
 seems to bring back the tests I'd lost overnight, and doesn't cause
 issues elsewhere.
 
 I've run some cross-tests to ensure this brings back the missing tests,
 and a full x86-64 testrun to make sure I haven't dropped any from there.
 
 OK for trunk?

Ok.

Thanks,
Richard.

 Thanks,
 James
 
 ---
 2015-06-23  James Greenhalgh  james.greenha...@arm.com
 
 * lib/c-torture.exp: Don't call check_effective_target_lto
   before setting up environment correctly.
 * lib/gcc-dg.exp: Likewise, and protect
   gcc_force_conventional_output.
 
 

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)


Re: [PATCH 4.8] PR66306 Fix reload problem with commutative operands

2015-06-24 Thread Andreas Krebbel
On 06/16/2015 07:40 PM, Ulrich Weigand wrote:
 Andreas Krebbel wrote:
 
 this fixes a reload problem with match_dup's on commutative operands.

 Bootstrapped and regtested on x86-64, ppc64, and s390x.

 Ok?

 Bye,

 -Andreas-

 2015-06-11  Andreas Krebbel  kreb...@linux.vnet.ibm.com

  PR rtl-optimization/66306
  * reload.c (find_reloads): Swap the match_dup info for
  commutative operands.
 
 This does indeed appear to be broken, and the fix looks good to me.
 
 However, I'm not clear why this should be a 4.8 only patch ... the
 same problem seems to be still there on mainline, right?
 
 Patch is OK for mainline if it passes regression tests there.

I've committed the patch after successful testing on PPC64 and s390x. I 
couldn't get reload working
on x86_64 quickly.

Bye,

-Andreas-



[ARM] Correct spelling of references to ARMv6KZ

2015-06-24 Thread Matthew Wahab

Hello,

GCC supports ARM architecture ARMv6KZ but refers to it as ARMv6ZK. This is made
visible by the command line option -march=armv6zk and by the predefined macro
__ARM_ARCH_6ZK__.

This patch corrects the spelling internally and adds -march=armv6kz. To preserve
existing behaviour, -march=armv6zk is kept as an alias of -march=armv6kz and
both __ARM_ARCH_6KZ__ and __ARM_ARCH_6ZK__ macros are defined for the
architecture.
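
As a small illustration (mine, not part of the patch) of what this means
for user code probing the architecture macros:

/* Hypothetical user code: with -march=armv6kz (or the armv6zk alias) both
   spellings are defined after this patch, so either check succeeds.  The
   HAVE_ARCH_6KZ name is made up for this example.  */
#if defined (__ARM_ARCH_6KZ__) || defined (__ARM_ARCH_6ZK__)
# define HAVE_ARCH_6KZ 1
#else
# define HAVE_ARCH_6KZ 0
#endif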

Use of -march=armv6kz will need to wait for binutils to be updated; a patch has
been submitted (https://sourceware.org/ml/binutils/2015-06/msg00236.html). Use
of the existing spelling, -march=armv6zk, still works with current binutils.

Tested arm-none-linux-gnueabihf with check-gcc.

Ok for trunk?
Matthew

gcc/
2015-06-24  Matthew Wahab  matthew.wa...@arm.com

* config/arm/arm-arches.def: Add armv6kz. Replace 6ZK with 6KZ
and FL_FOR_ARCH6ZK with FL_FOR_ARCH6KZ.
* config/arm/arm-c.c (arm_cpu_builtins): Emit __ARM_ARCH_6ZK__
for armv6kz targets.
* config/arm/arm-cores.def: Replace 6ZK with 6KZ.
* config/arm/arm-protos.h (FL_ARCH6KZ): New.
(FL_FOR_ARCH6ZK): Remove.
(FL_FOR_ARCH6KZ): New.
(arm_arch6kz): New declaration.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm.c (arm_arch6kz): New.
(arm_option_override): Set arm_arch6kz.
* config/arm/arm.h (BASE_ARCH_6ZK): Rename to BASE_ARCH_6KZ.
* config/arm/driver-arm.c: Add armv6kz.
* doc/invoke.texi: Replace armv6zk with armv6kz and
armv6zkt2 with armv6kzt2.
diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 840c1ff..3dafaa5 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -44,7 +44,8 @@ ARM_ARCH(armv6,   arm1136js,  6,   FL_CO_PROC | FL_FOR_ARCH6)
 ARM_ARCH(armv6j,  arm1136js,  6J,  FL_CO_PROC | FL_FOR_ARCH6J)
 ARM_ARCH(armv6k,  mpcore,	6K,  FL_CO_PROC | FL_FOR_ARCH6K)
 ARM_ARCH(armv6z,  arm1176jzs, 6Z,  FL_CO_PROC | FL_FOR_ARCH6Z)
-ARM_ARCH(armv6zk, arm1176jzs, 6ZK, FL_CO_PROC | FL_FOR_ARCH6ZK)
+ARM_ARCH(armv6kz, arm1176jzs, 6KZ, FL_CO_PROC | FL_FOR_ARCH6KZ)
+ARM_ARCH(armv6zk, arm1176jzs, 6KZ, FL_CO_PROC | FL_FOR_ARCH6KZ)
 ARM_ARCH(armv6t2, arm1156t2s, 6T2, FL_CO_PROC | FL_FOR_ARCH6T2)
 ARM_ARCH(armv6-m, cortexm1,	6M,			  FL_FOR_ARCH6M)
 ARM_ARCH(armv6s-m, cortexm1,	6M,			  FL_FOR_ARCH6M)
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 6aa59ad..e2d458c 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -169,6 +169,11 @@ arm_cpu_builtins (struct cpp_reader* pfile, int flags)
 }
   if (arm_arch_iwmmxt2)
     builtin_define ("__IWMMXT2__");
+  /* ARMv6KZ was originally identified as the misspelled __ARM_ARCH_6ZK__.  To
+     preserve the existing behaviour, the misspelled feature macro must still be
+     defined.  */
+  if (arm_arch6kz)
+    builtin_define ("__ARM_ARCH_6ZK__");
   if (TARGET_AAPCS_BASED)
 {
   if (arm_pcs_default == ARM_PCS_AAPCS_VFP)
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 103c314..9d47fcf 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -125,8 +125,8 @@ ARM_CORE(arm1026ej-s,	arm1026ejs, arm1026ejs,	5TEJ, FL_LDSCHED, 9e)
 /* V6 Architecture Processors */
 ARM_CORE(arm1136j-s,		arm1136js, arm1136js,		6J,  FL_LDSCHED, 9e)
 ARM_CORE(arm1136jf-s,		arm1136jfs, arm1136jfs,		6J,  FL_LDSCHED | FL_VFPV2, 9e)
-ARM_CORE(arm1176jz-s,		arm1176jzs, arm1176jzs,		6ZK, FL_LDSCHED, 9e)
-ARM_CORE(arm1176jzf-s,	arm1176jzfs, arm1176jzfs,	6ZK, FL_LDSCHED | FL_VFPV2, 9e)
+ARM_CORE(arm1176jz-s,		arm1176jzs, arm1176jzs,		6KZ, FL_LDSCHED, 9e)
+ARM_CORE(arm1176jzf-s,	arm1176jzfs, arm1176jzfs,	6KZ, FL_LDSCHED | FL_VFPV2, 9e)
 ARM_CORE(mpcorenovfp,		mpcorenovfp, mpcorenovfp,	6K,  FL_LDSCHED, 9e)
 ARM_CORE(mpcore,		mpcore, mpcore,			6K,  FL_LDSCHED | FL_VFPV2, 9e)
 ARM_CORE(arm1156t2-s,		arm1156t2s, arm1156t2s,		6T2, FL_LDSCHED, v6t2)
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 62f91ef..7aae934 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -382,6 +382,7 @@ extern bool arm_is_constant_pool_ref (rtx);
 
 #define FL_IWMMXT (1 << 29)	  /* XScale v2 or Intel Wireless MMX technology.  */
 #define FL_IWMMXT2 (1 << 30)   /* Intel Wireless MMX2 technology.  */
+#define FL_ARCH6KZ (1 << 31)   /* ARMv6KZ architecture.  */
 
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE		(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
@@ -401,7 +402,7 @@ extern bool arm_is_constant_pool_ref (rtx);
 #define FL_FOR_ARCH6J	FL_FOR_ARCH6
 #define FL_FOR_ARCH6K	(FL_FOR_ARCH6 | FL_ARCH6K)
 #define FL_FOR_ARCH6Z	FL_FOR_ARCH6
-#define FL_FOR_ARCH6ZK	FL_FOR_ARCH6K
+#define FL_FOR_ARCH6KZ	(FL_FOR_ARCH6K | FL_ARCH6KZ)
 #define FL_FOR_ARCH6T2	(FL_FOR_ARCH6 | FL_THUMB2)
 

[PATCH 4/8] S/390: Fix s390_secondary_reload register class check.

2015-06-24 Thread Andreas Krebbel
The current check does not work as expected with mixed register
classes and also does not handle NO_REGS correctly.

gcc/ChangeLog:

* config/s390/s390.c (s390_secondary_reload): Fix check for
GENERAL_REGS register class.
---
 gcc/config/s390/s390.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 80a2c89..a5aab5d 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -4045,7 +4045,7 @@ s390_secondary_reload (bool in_p, rtx x, reg_class_t 
rclass_i,
   if (MEM_P (x)
       && s390_loadrelative_operand_p (XEXP (x, 0), NULL, NULL)
       && (mode == QImode
- || !reg_classes_intersect_p (GENERAL_REGS, rclass)
+ || !reg_class_subset_p (rclass, GENERAL_REGS)
	  || GET_MODE_SIZE (mode) > UNITS_PER_WORD
	  || !s390_check_symref_alignment (XEXP (x, 0),
					   GET_MODE_SIZE (mode))))
-- 
1.7.9.5



[PATCH 1/8] S/390 Vector ABI GNU Attribute.

2015-06-24 Thread Andreas Krebbel
With this patch .gnu_attribute is used to mark binaries with a vector
ABI tag.  This is required since the z13 vector support breaks the ABI
of existing vector_size attribute generated vector types:

1. vector_size(16) and bigger vectors are aligned to 8 byte
boundaries (formerly vectors were always naturally aligned)

2. vector_size(16) or smaller vectors are passed via VR if available
or by value on the stack (formerly vectors were passed on the stack by
reference).

The .gnu_attribute will be used by ld to emit a warning if binaries
with incompatible ABIs are being linked together:
https://sourceware.org/ml/binutils/2015-04/msg00316.html

And it will be used by GDB to perform inferior function calls using a
vector ABI which fits to the binary being debugged:
https://sourceware.org/ml/gdb-patches/2015-04/msg00833.html

The current implementation tries to only set the attribute if the
vector types are really used in ABI relevant contexts in order to
avoid false positives during linking.

However, this unfortunately has some limitations like in the following
case where an ABI relevant context cannot be detected properly:

typedef int __attribute__((vector_size(16))) v4si;
struct A
{
  char x;
  v4si y;
};
char a[sizeof(struct A)];

The number of elements in a depends on the ABI (24 with -mvx and 32
with -mno-vx).  However, the implementation is not able to detect this
since the struct type is not used anywhere else and consequently does
not survive until the checking code is able to see it.
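
For completeness, the 24 vs. 32 above is just the layout arithmetic; a
small stand-alone illustration (mine, not part of the patch):

/* With -mvx the new ABI caps the in-struct alignment of the 16-byte
   vector at 8, so y sits at offset 8 and sizeof (struct A) == 8 + 16 == 24.
   With -mno-vx the old ABI keeps the natural 16-byte alignment, so y sits
   at offset 16 and sizeof (struct A) == 16 + 16 == 32.  */
#include <stdio.h>
#include <stddef.h>

typedef int __attribute__((vector_size(16))) v4si;
struct A
{
  char x;
  v4si y;
};

int
main (void)
{
  printf ("offsetof y = %zu, sizeof (struct A) = %zu\n",
          offsetof (struct A, y), sizeof (struct A));
  return 0;
}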

Ideas about how to improve the implementation without creating too
many false positives are welcome.

In particular we do not want to set the attribute for local uses of
vector types as they would be natural for ifunc optimizations.

gcc/
* config/s390/s390.c (s390_vector_abi): New variable definition.
(s390_check_type_for_vector_abi): New function.
(TARGET_ASM_FILE_END): New macro definition.
(s390_asm_file_end): New function.
(s390_function_arg): Call s390_check_type_for_vector_abi.
(s390_gimplify_va_arg): Likewise.
* configure: Regenerate.
* configure.ac: Check for .gnu_attribute Binutils feature.

gcc/testsuite/
* gcc.target/s390/vector/vec-abi-1.c: Add gnu attribute check.
* gcc.target/s390/vector/vec-abi-attr-1.c: New test.
* gcc.target/s390/vector/vec-abi-attr-2.c: New test.
* gcc.target/s390/vector/vec-abi-attr-3.c: New test.
* gcc.target/s390/vector/vec-abi-attr-4.c: New test.
* gcc.target/s390/vector/vec-abi-attr-5.c: New test.
* gcc.target/s390/vector/vec-abi-attr-6.c: New test.
---
 gcc/config/s390/s390.c |  121 
 gcc/configure  |   36 ++
 gcc/configure.ac   |7 ++
 gcc/testsuite/gcc.target/s390/vector/vec-abi-1.c   |1 +
 .../gcc.target/s390/vector/vec-abi-attr-1.c|   18 +++
 .../gcc.target/s390/vector/vec-abi-attr-2.c|   53 +
 .../gcc.target/s390/vector/vec-abi-attr-3.c|   18 +++
 .../gcc.target/s390/vector/vec-abi-attr-4.c|   17 +++
 .../gcc.target/s390/vector/vec-abi-attr-5.c|   19 +++
 .../gcc.target/s390/vector/vec-abi-attr-6.c|   24 
 10 files changed, 314 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-2.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-3.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-4.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-5.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-6.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index d6ed179..934f7c0 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -461,6 +461,97 @@ struct GTY(()) machine_function
 #define PREDICT_DISTANCE (TARGET_Z10 ? 384 : 2048)
 
 
+/* Indicate which ABI has been used for passing vector args.
+   0 - no vector type arguments have been passed where the ABI is relevant
+   1 - the old ABI has been used
+   2 - a vector type argument has been passed either in a vector register
+   or on the stack by value  */
+static int s390_vector_abi = 0;
+
+/* Set the vector ABI marker if TYPE is subject to the vector ABI
+   switch.  The vector ABI affects only vector data types.  There are
+   two aspects of the vector ABI relevant here:
+
+   1. vectors >= 16 bytes have an alignment of 8 bytes with the new
+   ABI and natural alignment with the old.
+
+   2. vector <= 16 bytes are passed in VRs or by value on the stack
+   with the new ABI but by reference on the stack with the old.
+
+   If ARG_P is true TYPE is used for a function argument or return
+   value.  The ABI marker then is set for all vector data types.  If
+   ARG_P is false only type 1 vectors are being checked.  */
+

[PATCH 7/8] S/390: Add proper comments to vpopct builtins for automated testsuite generation.

2015-06-24 Thread Andreas Krebbel
This is a comment only change which is supposed to be used by the
autogenerated tests I run for the builtins.

gcc/ChangeLog:

* config/s390/s390-builtins.def: Fix vpopct instruction comments.
---
 gcc/config/s390/s390-builtins.def |   26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/gcc/config/s390/s390-builtins.def 
b/gcc/config/s390/s390-builtins.def
index 17f9c85..0a24da9 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -1774,19 +1774,19 @@ OB_DEF_VAR (s390_vec_or_dbl_c,  s390_vo,
0,
 B_DEF  (s390_vo,iorv16qi3,  0, 
 B_VX,   0,  BT_FN_UV16QI_UV16QI_UV16QI)
 
 OB_DEF (s390_vec_popcnt,s390_vec_popcnt_s8, 
s390_vec_popcnt_u64,B_VX,   BT_FN_OV4SI_OV4SI)
-OB_DEF_VAR (s390_vec_popcnt_s8, s390_vpopctb,   0, 
 BT_OV_UV16QI_V16QI)
-OB_DEF_VAR (s390_vec_popcnt_u8, s390_vpopctb,   0, 
 BT_OV_UV16QI_UV16QI)
-OB_DEF_VAR (s390_vec_popcnt_s16,s390_vpopcth,   0, 
 BT_OV_UV8HI_V8HI)
-OB_DEF_VAR (s390_vec_popcnt_u16,s390_vpopcth,   0, 
 BT_OV_UV8HI_UV8HI)
-OB_DEF_VAR (s390_vec_popcnt_s32,s390_vpopctf,   0, 
 BT_OV_UV4SI_V4SI)
-OB_DEF_VAR (s390_vec_popcnt_u32,s390_vpopctf,   0, 
 BT_OV_UV4SI_UV4SI)
-OB_DEF_VAR (s390_vec_popcnt_s64,s390_vpopctg,   0, 
 BT_OV_UV2DI_V2DI)
-OB_DEF_VAR (s390_vec_popcnt_u64,s390_vpopctg,   0, 
 BT_OV_UV2DI_UV2DI)
-
-B_DEF  (s390_vpopctb,   popcountv16qi2, 0, 
 B_VX,   0,  BT_FN_UV16QI_UV16QI)
-B_DEF  (s390_vpopcth,   popcountv8hi2,  0, 
 B_VX,   0,  BT_FN_UV8HI_UV8HI)
-B_DEF  (s390_vpopctf,   popcountv4si2,  0, 
 B_VX,   0,  BT_FN_UV4SI_UV4SI)
-B_DEF  (s390_vpopctg,   popcountv2di2,  0, 
 B_VX,   0,  BT_FN_UV2DI_UV2DI)
+OB_DEF_VAR (s390_vec_popcnt_s8, s390_vpopctb,   0, 
 BT_OV_UV16QI_V16QI)  /* vpopct */
+OB_DEF_VAR (s390_vec_popcnt_u8, s390_vpopctb,   0, 
 BT_OV_UV16QI_UV16QI) /* vpopct */
+OB_DEF_VAR (s390_vec_popcnt_s16,s390_vpopcth,   0, 
 BT_OV_UV8HI_V8HI)/* vpopct */
+OB_DEF_VAR (s390_vec_popcnt_u16,s390_vpopcth,   0, 
 BT_OV_UV8HI_UV8HI)   /* vpopct */
+OB_DEF_VAR (s390_vec_popcnt_s32,s390_vpopctf,   0, 
 BT_OV_UV4SI_V4SI)/* vpopct vsumb */
+OB_DEF_VAR (s390_vec_popcnt_u32,s390_vpopctf,   0, 
 BT_OV_UV4SI_UV4SI)   /* vpopct vsumb */
+OB_DEF_VAR (s390_vec_popcnt_s64,s390_vpopctg,   0, 
 BT_OV_UV2DI_V2DI)/* vpopct vsumb vsumgf */
+OB_DEF_VAR (s390_vec_popcnt_u64,s390_vpopctg,   0, 
 BT_OV_UV2DI_UV2DI)   /* vpopct vsumb vsumgf */
+
+B_DEF  (s390_vpopctb,   popcountv16qi2, 0, 
 B_VX,   0,  BT_FN_UV16QI_UV16QI) /* vpopct */
+B_DEF  (s390_vpopcth,   popcountv8hi2,  0, 
 B_VX,   0,  BT_FN_UV8HI_UV8HI)   /* vpopct */
+B_DEF  (s390_vpopctf,   popcountv4si2,  0, 
 B_VX,   0,  BT_FN_UV4SI_UV4SI)   /* vpopct vsumb */
+B_DEF  (s390_vpopctg,   popcountv2di2,  0, 
 B_VX,   0,  BT_FN_UV2DI_UV2DI)   /* vpopct vsumb 
vsumgf */
 
 OB_DEF (s390_vec_rl,s390_vec_rl_u8, s390_vec_rl_s64,   
 B_VX,   BT_FN_OV4SI_OV4SI_OV4SI)
 OB_DEF_VAR (s390_vec_rl_u8, s390_verllvb,   0, 
 BT_OV_UV16QI_UV16QI_UV16QI)
-- 
1.7.9.5



[gomp4, committed] Add bitmap_get_dominated_by

2015-06-24 Thread Tom de Vries

Hi,

this patch adds bitmap_get_dominated_by, a version of get_dominated_by 
that returns a bitmap rather than a vector.


Committed to gomp-4_0-branch.

Thanks,
- Tom
Add bitmap_get_dominated_by

2015-06-18  Tom de Vries  t...@codesourcery.com

	* dominance.c (bitmap_get_dominated_by): New function.
	* dominance.h (bitmap_get_dominated_by): Declare.
	* tree-ssa-loop-manip.c (rewrite_virtuals_into_loop_closed_ssa): Use
	bitmap_get_dominated_by.
---
 gcc/dominance.c   | 21 +
 gcc/dominance.h   |  1 +
 gcc/tree-ssa-loop-manip.c | 10 +-
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/gcc/dominance.c b/gcc/dominance.c
index 09c8c90..4b35ec4 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -757,6 +757,27 @@ set_immediate_dominator (enum cdi_direction dir, basic_block bb,
 dom_computed[dir_index] = DOM_NO_FAST_QUERY;
 }
 
+/* Returns in BBS the list of basic blocks immediately dominated by BB, in the
+   direction DIR.  As get_dominated_by, but returns result as a bitmap.  */
+
+void
+bitmap_get_dominated_by (enum cdi_direction dir, basic_block bb, bitmap bbs)
+{
+  unsigned int dir_index = dom_convert_dir_to_idx (dir);
+  struct et_node *node = bb->dom[dir_index], *son = node->son, *ason;
+
+  bitmap_clear (bbs);
+
+  gcc_checking_assert (dom_computed[dir_index]);
+
+  if (!son)
+    return;
+
+  bitmap_set_bit (bbs, ((basic_block) son->data)->index);
+  for (ason = son->right; ason != son; ason = ason->right)
+    bitmap_set_bit (bbs, ((basic_block) ason->data)->index);
+}
+
 /* Returns the list of basic blocks immediately dominated by BB, in the
direction DIR.  */
 vec<basic_block> 
diff --git a/gcc/dominance.h b/gcc/dominance.h
index 37e138b..0a1a13e 100644
--- a/gcc/dominance.h
+++ b/gcc/dominance.h
@@ -41,6 +41,7 @@ extern void free_dominance_info (enum cdi_direction);
 extern basic_block get_immediate_dominator (enum cdi_direction, basic_block);
 extern void set_immediate_dominator (enum cdi_direction, basic_block,
  basic_block);
+extern void bitmap_get_dominated_by (enum cdi_direction, basic_block, bitmap);
 extern vec<basic_block> get_dominated_by (enum cdi_direction, basic_block);
 extern vec<basic_block> get_dominated_by_region (enum cdi_direction,
 			 basic_block *,
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 1150e6c..9c558ca 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -638,16 +638,8 @@ rewrite_virtuals_into_loop_closed_ssa (struct loop *loop)
 
   /* Gather the bbs dominated by the exit block.  */
   bitmap exit_dominated = BITMAP_ALLOC (NULL);
+  bitmap_get_dominated_by (CDI_DOMINATORS, exit->dest, exit_dominated);
   bitmap_set_bit (exit_dominated, exit->dest->index);
-  vec<basic_block> exit_dominated_vec
-    = get_dominated_by (CDI_DOMINATORS, exit->dest);
-
-  int i;
-  basic_block dom_bb;
-  FOR_EACH_VEC_ELT (exit_dominated_vec, i, dom_bb)
-    bitmap_set_bit (exit_dominated, dom_bb->index);
-
-  exit_dominated_vec.release ();
 
   replace_uses_in_bbs_by (final_loop, res_new, exit_dominated);
 
-- 
1.9.1



Re: [VRP] Improve value ranges for unsigned division

2015-06-24 Thread Kugan


On 23/06/15 01:09, Richard Biener wrote:
 On Sat, Jun 20, 2015 at 9:12 AM, Kugan
 kugan.vivekanandara...@linaro.org wrote:
 As discussed in PR64130, this patch improves the VRP value ranges for
 unsigned division.

 Bootstrapped and regression tested on x86_64-linux-gnu and regression
 tested on arm-none-linux-gnu with no new regression.

 Is this OK for trunk?
 
 Hum, the patch is at least incomplete not covering the
 cmp == -1 case in the max value computation, no?

Thanks for the review. Attached patch adds this as well.

 
 Also I wonder if we have two VR_RANGEs as you require
 the code using extract_range_from_multiplicative_op_1 isn't
 better suited and already handles the case properly?


I tried with this approach.  But for a value range of vr1 whose min or max
is zero, extract_range_from_multiplicative_op_1 will not be able to
infer a range.

If we check for vr1 containing zero before calling
extract_range_from_multiplicative_op_1, then it is not going to get the
value range for the test case in the PR (which is 2305843009213693951 / a
where a has a range [0, 4294967295]).
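
To make the expected range concrete, here is the arithmetic (mine, and it
assumes the test constant is 0x1ffffffffL, which is what the scanned range
below corresponds to):

/* Dividend 0x1ffffffff (8589934591); divisor a has range [0, 4294967295].
   At the division a == 0 can be excluded, so the divisor range is
   effectively [1, 4294967295] and the quotient range follows from
   dividing by its max and by its min.  */
#include <stdio.h>

int
main (void)
{
  unsigned long long dividend = 0x1ffffffffULL;
  printf ("[%llu, %llu]\n",
          dividend / 4294967295ULL,   /* min: 2 */
          dividend / 1ULL);           /* max: 8589934591 */
  return 0;
}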


Thanks,
Kugan



 Richard.
 
 Thanks,
 Kugan

 gcc/ChangeLog:

 2015-06-20  Kugan Vivekanandarajah  kug...@linaro.org

 PR middle-end/64130
 * tree-vrp.c (extract_range_from_binary_expr_1): For unsigned
 division, compute minimum when value ranges for dividend and
 divisor are available.


 gcc/testsuite/ChangeLog:

 2015-06-20  Kugan Vivekanandarajah  kug...@linaro.org

 PR middle-end/64130
 * gcc.dg/tree-ssa/pr64130.c: New test.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64130.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr64130.c
index e69de29..9e96abb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr64130.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64130.c
@@ -0,0 +1,11 @@
+
+/* { dg-do compile } */
+/* { dg-options -O2 -fdump-tree-vrp1 } */
+
+int funsigned(unsigned a)
+{
+  return 0x1ffffffffL / a == 0;
+}
+
+/* { dg-final { scan-tree-dump ": \\\[2, 8589934591\\\]" "vrp1" } } */
+
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index b517363..0b8fb31 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -3151,14 +3151,33 @@ extract_range_from_binary_expr_1 (value_range_t *vr,
 and all numbers from min to 0 for negative min.  */
  cmp = compare_values (vr0.max, zero);
  if (cmp == -1)
-   max = zero;
+   {
+                 /* When vr0.max < 0, vr1.min != 0 and value
+                    ranges for dividend and divisor are available.  */
+                 if (vr1.type == VR_RANGE
+                     && !symbolic_range_p (vr0)
+                     && !symbolic_range_p (vr1)
+                     && !compare_values (vr1.min, zero))
+   max = int_const_binop (code, vr0.max, vr1.min);
+ else
+   max = zero;
+   }
  else if (cmp == 0 || cmp == 1)
max = vr0.max;
  else
type = VR_VARYING;
  cmp = compare_values (vr0.min, zero);
  if (cmp == 1)
-   min = zero;
+   {
+                 /* For unsigned division when value ranges for dividend
+                    and divisor are available.  */
+                 if (vr1.type == VR_RANGE
+                     && !symbolic_range_p (vr0)
+                     && !symbolic_range_p (vr1))
+   min = int_const_binop (code, vr0.min, vr1.max);
+ else
+   min = zero;
+   }
  else if (cmp == 0 || cmp == -1)
min = vr0.min;
  else


Re: [05/13] Add nofree_ptr_hash

2015-06-24 Thread Richard Sandiford
Jeff Law l...@redhat.com writes:
 On 06/16/2015 02:55 AM, Richard Sandiford wrote:
 This patch stops pointer_hash from inheriting typed_noop_remove and
 instead creates a new class nofree_ptr_hash that inherits from both.
 It then updates all uses of typed_noop_remove (which are all pointers)
 and pointer_hash so that they use this new class instead.

 gcc/
  * hash-table.h: Update comments.
  * hash-traits.h (pointer_hash): Don't inherit from typed_noop_remove.
  (nofree_ptr_hash): New class.
  * asan.c (asan_mem_ref_hasher): Inherit from nofree_ptr_hash rather
   than typed_noop_remove.  Remove redundant typedefs.
  * attribs.c (attribute_hasher): Likewise.
  * cfg.c (bb_copy_hasher): Likewise.
  * cselib.c (cselib_hasher): Likewise.
  * dse.c (invariant_group_base_hasher): Likewise.
  * dwarf2cfi.c (trace_info_hasher): Likewise.
  * dwarf2out.c (macinfo_entry_hasher): Likewise.
  (comdat_type_hasher, loc_list_hasher): Likewise.
  * gcse.c (pre_ldst_expr_hasher): Likewise.
  * genmatch.c (id_base): Likewise.
  * genrecog.c (test_pattern_hasher): Likewise.
  * gimple-ssa-strength-reduction.c (cand_chain_hasher): Likewise.
  * haifa-sched.c (delay_i1_hasher): Likewise.
  * hard-reg-set.h (simplifiable_subregs_hasher): Likewise.
  * ipa-icf.h (congruence_class_group_hash): Likewise.
  * ipa-profile.c (histogram_hash): Likewise.
  * ira-color.c (allocno_hard_regs_hasher): Likewise.
  * lto-streamer.h (string_slot_hasher): Likewise.
  * lto-streamer.c (tree_entry_hasher): Likewise.
  * plugin.c (event_hasher): Likewise.
  * postreload-gcse.c (expr_hasher): Likewise.
  * store-motion.c (st_expr_hasher): Likewise.
  * tree-sra.c (uid_decl_hasher): Likewise.
  * tree-ssa-coalesce.c (coalesce_pair_hasher): Likewise.
  (ssa_name_var_hash): Likewise.
  * tree-ssa-live.c (tree_int_map_hasher): Likewise.
  * tree-ssa-loop-im.c (mem_ref_hasher): Likewise.
  * tree-ssa-pre.c (pre_expr_d): Likewise.
  * tree-ssa-sccvn.c (vn_nary_op_hasher): Likewise.
  * vtable-verify.h (registration_hasher): Likewise.
  * vtable-verify.c (vtbl_map_hasher): Likewise.
  * config/arm/arm.c (libcall_hasher): Likewise.
  * config/i386/winnt.c (wrapped_symbol_hasher): Likewise.
  * config/ia64/ia64.c (bundle_state_hasher): Likewise.
  * config/sol2.c (comdat_entry_hasher): Likewise.
  * fold-const.c (fold): Use nofree_ptr_hash instead of pointer_hash.
  (print_fold_checksum, fold_checksum_tree): Likewise.
  (debug_fold_checksum, fold_build1_stat_loc): Likewise.
  (fold_build2_stat_loc, fold_build3_stat_loc): Likewise.
  (fold_build_call_array_loc): Likewise.
  * tree-ssa-ccp.c (gimple_htab): Likewise.
  * tree-browser.c (tree_upper_hasher): Inherit from nofree_ptr_hash
  rather than pointer_type.

 gcc/c/
  * c-decl.c (detect_field_duplicates_hash): Use nofree_ptr_hash
  instead of pointer_hash.
  (detect_field_duplicates): Likewise.

 gcc/cp/
  * class.c (fixed_type_or_null_ref_ht): Inherit from nofree_ptr_hash
  rather than pointer_hash.
  (fixed_type_or_null): Use nofree_ptr_hash instead of pointer_hash.
  * semantics.c (nrv_data): Likewise.
  * tree.c (verify_stmt_tree_r, verify_stmt_tree): Likewise.

 gcc/java/
  * jcf-io.c (charstar_hash): Inherit from nofree_ptr_hash rather
   than typed_noop_remove.  Remove redundant typedefs.

 gcc/lto/
  * lto.c (tree_scc_hasher): Inherit from nofree_ptr_hash rather
   than typed_noop_remove.  Remove redundant typedefs.

 gcc/objc/
  * objc-act.c (decl_name_hash): Inherit from nofree_ptr_hash rather
   than typed_noop_remove.  Remove redundant typedefs.

 libcc1/
  * plugin.cc (string_hasher): Inherit from nofree_ptr_hash rather
   than typed_noop_remove.  Remove redundant typedefs.
  (plugin_context): Use nofree_ptr_hash rather than pointer_hash.
  (plugin_context::mark): Likewise.
 So are we allowing multiple inheritance in GCC?  It seems like that's 
 what we've got for nofree_ptr_hash.  Is there a better way to achieve 
 what you're trying to do, or do you think this use ought to fall under 
 some kind of exception?


 Index: gcc/haifa-sched.c
 ===
 --- gcc/haifa-sched.c2015-06-16 09:53:47.338092692 +0100
 +++ gcc/haifa-sched.c2015-06-16 09:53:47.322092878 +0100
 @@ -614,9 +614,8 @@ struct delay_pair

   /* Helpers for delay hashing.  */

 -struct delay_i1_hasher : typed_noop_remove <delay_pair>
 +struct delay_i1_hasher : nofree_ptr_hash <delay_pair>
   {
 -  typedef delay_pair *value_type;
 typedef void *compare_type;
 static inline hashval_t hash (const delay_pair *);
 static inline bool equal (const delay_pair *, const void *);
 Did you keep compare_type intentionally?  Similarly for the changes in 
 hard-reg-set.h, tree-ssa-loop-im.c, and tree-ssa-sccvn.c.  

[Patch ARM] Fix PR target/63408

2015-06-24 Thread Ramana Radhakrishnan

Hi,

The attached patch fixes PR target/63408 and adds a regression test
for the same. The problem is essentially that
vfp3_const_double_for_fract_bits() needs to be aware that negative
values cannot be used in this context.

Tested with a bootstrap and regression test run on armhf. Applied to trunk.

Will apply to 5 after regression testing there and 4.9 after it unfreezes.

Ramana




2015-06-24  Ramana Radhakrishnan  ramana.radhakrish...@arm.com

PR target/63408
* config/arm/arm.c (vfp3_const_double_for_fract_bits): Disable
for negative numbers.

2015-06-24  Ramana Radhakrishnan  ramana.radhakrish...@arm.com

PR target/63408
* gcc.target/arm/pr63408.c: New test.
commit 9e6fb32c5ba143912c1114a59af0114500c5bc31
Author: Ramana Radhakrishnan ramana.radhakrish...@arm.com
Date:   Tue Jun 23 17:04:40 2015 +0100

Fix PR target/63408

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4fea1a6..4a284ec 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29812,7 +29812,8 @@ vfp3_const_double_for_fract_bits (rtx operand)
 return 0;
   
   REAL_VALUE_FROM_CONST_DOUBLE (r0, operand);
-  if (exact_real_inverse (DFmode, &r0))
+  if (exact_real_inverse (DFmode, &r0)
+      && !REAL_VALUE_NEGATIVE (r0))
 {
   if (exact_real_truncate (DFmode, r0))
{
diff --git a/gcc/testsuite/gcc.target/arm/pr63408.c 
b/gcc/testsuite/gcc.target/arm/pr63408.c
new file mode 100644
index 000..a23b2a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr63408.c
@@ -0,0 +1,25 @@
+
+/* { dg-do run }  */
+/* { dg-options "-O2" } */
+void abort (void) __attribute__ ((noreturn));
+float __attribute__((noinline))
+f(float a, int b)
+{
+  return a - (((float)b / 0x7fff) * 100);
+}
+
+
+int
+main (void)
+{
+  float a[] = { 100.0, 0.0, 0.0};
+  int b[] = { 0x7fff, 0x7fff/100.0f, -0x7fff / 100.0f};
+  float c[] = { 0.0, -1.0, 1.0 };
+  int i;
+
+  for (i = 0; i < (sizeof(a) / sizeof (float)); i++)
+if (f (a[i], b[i]) != c[i])
+   abort ();
+
+  return 0;
+}


Re: [PATCH/AARCH64] Update ThunderX schedule model

2015-06-24 Thread James Greenhalgh
On Tue, Jun 23, 2015 at 10:00:21PM +0100, Andrew Pinski wrote:
 Hi,
   This patch updates the schedule model to be more accurate and model
 SIMD and fp instructions that I had missed out when I had the last
 patch.
 
 OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.

These TBL descriptions look a little large...

 +;; 64bit TBL is emulated and takes 160 cycles
 +(define_insn_reservation "thunderx_tbl" 160
 +  (and (eq_attr "tune" "thunderx")
 +   (eq_attr "type" "neon_tbl1"))
 +  "(thunderx_pipe1+thunderx_pipe0)*160")
 +
 +;; 128bit TBL is emulated and takes 320 cycles
 +(define_insn_reservation "thunderx_tblq" 320
 +  (and (eq_attr "tune" "thunderx")
 +   (eq_attr "type" "neon_tbl1_q"))
 +  "(thunderx_pipe1+thunderx_pipe0)*320")
  

Is there really value in modelling this as taking up 320 cycles and
blocking both execution units for the same? Surely you can achieve much
the same effect with smaller numbers... The difference modelling complete
machine reservation for 320 cycles and 10 cycles should have next to no
effect on performance - if these are blocking for so long it shouldn't
matter where you place them in the instruction stream. 

On the other hand, the thunderx scheduler is generally well behaved
as it splits execution units to their own automata - certainly when
compared to the 60,000 states of the combined Cortex-A53 scheduler,
and this patch as it stands does not bloat it by much:

Automaton `thunderx_main'
  323 NDFA states,657 NDFA arcs
  323 DFA states, 657 DFA arcs
  323 minimal DFA states, 657 minimal DFA arcs
  213 all insns  9 insn equivalence classes
0 locked states
  661 transition comb vector els,  2907 trans table els: use comb vect
 2907 min delay table els, compression factor 1

Compared with setting both of these reservations to

 +  (thunderx_pipe1+thunderx_pipe0)*10)

Which gives:

Automaton `thunderx_main'
   13 NDFA states, 36 NDFA arcs
   13 DFA states,  36 DFA arcs
   13 minimal DFA states,  36 minimal DFA arcs
  213 all insns  8 insn equivalence classes
0 locked states
   39 transition comb vector els,   104 trans table els: use comb vect
  104 min delay table els, compression factor 2

So I think that while you could consider revisiting the TBL reservations;
given the above statistics, this patch is OK for trunk.

Thanks,
James

 ChangeLog:
 
  * config/aarch64/thunderx.md (thunderx_shift): Add rbit and rev.
 (thunderx_crc32): New reservation.
 (thunderx_fmov): Add fcsel, ffarithd and ffariths.
 (thunderx_fabs): New reservation.
 (thunderx_fcsel): New reservation.
 (thunderx_fcmp): New reservation.
 (thunderx_fsqrtd): Correct latency.
 (thunderx_frint): Add f_cvt.
 (thunderx_f_cvt): Remove f_cvt.
 (thunderx_simd_fp_store): Add neon_store1_one_lane
 and neon_store1_one_lane_q.
 (thunderx_neon_ld1): New reservation.
 (thunderx_neon_move): Add neon_dup.
 neon_ins, neon_from_gp, neon_to_gp,
 neon_abs, neon_neg,
 neon_fp_neg_s, and neon_fp_abs_s.
 (thunderx_neon_move_q): Add neon_dup_q,
 neon_ins_q, neon_from_gp_q, neon_to_gp_q,
 neon_abs_q, neon_neg_q,
 neon_fp_neg_s_q, neon_fp_neg_d_q,
 neon_fp_abs_s_q, and neon_fp_abs_d_q.
 (thunderx_neon_add): Add neon_arith_acc, neon_rev, neon_fp_abd_s,
 neon_fp_abd_d, and neon_fp_reduc_minmax_s.
 (thunderx_neon_add_q): Add neon_fp_abd_s_q, neon_fp_abd_d_q,
 neon_arith_acc_q, neon_rev_q,
 neon_fp_reduc_minmax_s_q, and neon_fp_reduc_minmax_d_q.
 (thunderx_neon_mult): New reservation.
 (thunderx_neon_mult_q): New reservation.
 (thunderx_crypto_aese): New reservation.
 (thunderx_crypto_aesmc): New reservation.
 (bypasses): Add bypass to thunderx_neon_mult_q.
 (thunderx_tbl): New reservation.
 (thunderx_tblq): New reservation.

 Index: config/aarch64/thunderx.md
 ===
 --- config/aarch64/thunderx.md(revision 224856)
 +++ config/aarch64/thunderx.md(working copy)
 @@ -39,7 +39,7 @@ (define_insn_reservation thunderx_add
  
  (define_insn_reservation thunderx_shift 1
(and (eq_attr tune thunderx)
 -   (eq_attr type bfm,extend,shift_imm,shift_reg))
 +   (eq_attr type bfm,extend,shift_imm,shift_reg,rbit,rev))
thunderx_pipe0 | thunderx_pipe1)
  
  
 @@ -66,12 +66,18 @@ (define_insn_reservation thunderx_mul
 (eq_attr type mul,muls,mla,mlas,clz,smull,umull,smlal,umlal))
thunderx_pipe1 + thunderx_mult)
  
 -;; Multiply high instructions take an extra cycle and cause the muliply unit 
 to
 -;; be busy for an extra cycle.
 +;; crcb,crch,crcw is 4 cycles and can only happen on pipe 1
  
 -;(define_insn_reservation thunderx_mul_high 5
 +(define_insn_reservation thunderx_crc32 4
 +  (and (eq_attr tune thunderx)
 +   (eq_attr type crc))
 +  thunderx_pipe1 + thunderx_mult)
 +
 +;; crcx is 5 cycles and only happen on pipe 1
 +;(define_insn_reservation thunderx_crc64 5
  ;  (and (eq_attr tune thunderx)
 -;   (eq_attr type 

Re: PING: Re: [patch] PR debug/66482: Do not ICE in gen_formal_parameter_die

2015-06-24 Thread Richard Biener
On Tue, Jun 23, 2015 at 6:08 PM, Aldy Hernandez al...@redhat.com wrote:
 On 06/12/2015 10:07 AM, Aldy Hernandez wrote:

 Hi.

 This is now a P2, as it is causing a secondary target bootstrap to fail
 (s390).

Ok.

Thanks,
Richard.

 Aldy

 Sigh.  I must say my head is spinning with this testcase and what we do
 with it (-O3), even prior to the debug-early work:

 void f(int p) {}
 int g() {
void f(int p);
g();
return 0;
 }

 The inliner recursively inlines this function up to a certain depth, but
 the useless inlining gets cleaned up shortly afterwards.  However, the
 BLOCK_SOURCE_LOCATIONs are still set throughout, which is technically
 correct.

 Eventually late dwarf gets a hold of all this and we end up calling
 dwarf2out_abstract_function to build debug info for the abstract
 instance of a function for which we have already generated a DIE.
 Basically, a similar issue to what we encountered for template parameter
 packs.  Or at least, that's my understanding, because as I've said, I
 admit to being slightly confused here.

 Since technically this is all going away when we remove
 dwarf2out_abstract_function, I suggest we remove the assert and avoid
 sudden death.  It's not like the we generated useful debugging for this
 testcase anyhow.

 Aldy




Re: [PATCH][RFC] Add FRE in pass_vectorize

2015-06-24 Thread Richard Biener
On Tue, 23 Jun 2015, Jeff Law wrote:

 On 06/10/2015 08:02 AM, Richard Biener wrote:
  
  The following patch adds FRE after vectorization which is needed
  for IVOPTs to remove redundant PHI nodes (well, I'm testing a
  patch for FRE that will do it already there).
 Redundant or degenerates which should be propagated?

Redundant, basically two IVs with the same initial value and same step.
IVOPTs can deal with this if the initial values and the step are already
same enough - the vectorizer can end up generating redundant huge
expressions for both.
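
As a toy illustration (mine, not from the vectorizer) of the kind of
redundancy meant here -- two induction variables with the same initial
value and the same step, which a value-numbering pass can collapse into
one:

void
f (int *a, const int *b, int n)
{
  int i = 0, j = 0;     /* same initial value */
  while (i < n)
    {
      a[i] = b[j];
      i += 1;           /* same step */
      j += 1;
    }
}

The vectorizer can generate the equivalent situation with much larger
initial-value and step expressions, which is what IVOPTs then fails to
see through.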

 I believe Alan Lawrence has run into similar issues (unpropagated degenerates)
 with his changes to make loop header copying more aggressive.  Threading will
 also create them.  The phi-only propagator may be the solution.  It ought to
 be cheaper than FRE.

Yes, but that's unrelated (see above).

  The patch also makes FRE preserve loop-closed SSA form and thus
  make it suitable for use in the loop pipeline.
 Loop optimizations will tend to create opportunities for redundancy
 elimination, so the ability to use FRE in the loop pipeline seems like a good
 thing.  We ran into this in RTL land, so I'm not surprised to see it occurring
 in the gimple optimizers and thus I'm not opposed to running FRE in the loop
 pipeline.
 
 
 
  
  With the placement in the vectorizer sub-pass FRE will effectively
  be enabled by -O3 only (well, or if one requests loop vectorization).
  I've considered placing it after complete_unroll instead but that
  would enable it at -O1 already.  I have no strong opinion on the
  exact placement, but it should help all passes between vectorizing
  and ivopts for vectorized loops.
 For -O3/vectorization it seems like a no-brainer.  -O1 less so.  IIRC we
 conditionalize -frerun-cse-after-loop on -O2 which seems more appropriate than
 doing it with -O1.
 
  
  Any other suggestions on pass placement?  I can of course key
  that FRE run on -O3 explicitely.  Not sure if we at this point
  want to start playing fancy games like setting a property
  when a pass (likely) generated redundancies that are worth
  fixing up and then key FRE on that one (it gets harder and
  less predictable what transforms are run on code).
 RTL CSE is bloody expensive and so many times I wanted the ability to know a
 bit about what the loop optimizer had done (or not done) so that I could
 conditionally skip the second CSE pass.   We never built that, but it's
 something I've wanted for decades.

Hmm, ok.  We can abuse pass properties for this but I don't think
they are a scalable fit.  Not sure if we'd like to go full way
adding sth like PROP_want_ccp PROP_want_copyprop PROP_want_cse, etc.
(any others?).  And whether FRE would then catch a PROP_want_copyprop
because it also can do copy propagation.

Eventually we'll just end up setting PROP_want_* from every pass...
(like we schedule a CFG cleanup from nearly every pass that did
anything).

Going a bit further here, esp. in the loop context, would be to
have the basic cleanups be region-based.  Because given a big
function with many loops and just one vectorized it would be
enough to cleanup the vectorized loop (yes, and in theory
all downstream effects, but that's probably secondary and not
so important).  It's not too difficult to make FRE run on
a MEME region, the interesting part, engineering-wise, is to
really make it O(size of MEME region) - that is, eliminate
things like O(num_ssa_names) or O(n_basic_blocks) setup cost.

And then there is the possibility of making passes generate less
need to perform cleanups after them - like in the present case
with the redundant IVs, making the redundancy more apparent by
CSEing the initial value and step during vectorizer code generation.
I'm playing with the idea of adding a simple CSE machinery to
the gimple_build () interface (aka match-and-simplify).  It
eventually invokes (well, not currently, but that can be fixed)
maybe_push_res_to_seq which is a good place to maintain a
table of already generated expressions.  That of course only
works if you either always append to the same sequence or at least
insert at the same place.
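
A toy sketch (mine, not GCC code) of the kind of CSE table meant here: the
builder remembers every (code, op0, op1) it has already emitted and hands
back the earlier temporary instead of generating the expression again.

#include <map>
#include <string>
#include <tuple>
#include <vector>

struct expr_builder
{
  std::vector<std::string> seq;                          /* generated stmts */
  std::map<std::tuple<std::string, int, int>, int> cse;  /* expr -> temp id */

  int build (const std::string &code, int op0, int op1)
  {
    auto key = std::make_tuple (code, op0, op1);
    auto it = cse.find (key);
    if (it != cse.end ())
      return it->second;               /* already generated: reuse it */
    int id = (int) seq.size ();
    seq.push_back ("_t" + std::to_string (id) + " = " + code
                   + " (_t" + std::to_string (op0)
                   + ", _t" + std::to_string (op1) + ")");
    cse.emplace (key, id);
    return id;
  }
};

As noted above, this only stays valid as long as everything is appended to
the same sequence (or inserted at the same place), so earlier results still
dominate later uses.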

I'm now back to match-and-simplify and will pursue that last idea
a bit (also wanting it for SCCVN itself).

Richard.

 Jeff
 
 

-- 
Richard Biener rguent...@suse.de
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)


Re: [patch] fix regrename pass to ensure renamings produce valid insns

2015-06-24 Thread Ramana Radhakrishnan



On 24/06/15 02:00, Sandra Loosemore wrote:

On 06/18/2015 11:32 AM, Eric Botcazou wrote:

The attached patch teaches regrename to validate insns affected by each
register renaming before making the change.  I can see at least two
other ways to handle this -- earlier, by rejecting renamings that result
in invalid instructions when it's searching for the best renaming; or
later, by validating the entire set of renamings as a group instead of
incrementally for each one -- but doing it all in regname_do_replace
seems least disruptive and risky in terms of the existing code.


OK, but the patch looks incomplete, rename_chains should be adjusted
as well,
i.e. regrename_do_replace should now return a boolean.


Like this?  I tested this on nios2 and x86_64-linux-gnu, as before, plus
built for aarch64-linux-gnu and ran the gcc testsuite.


Hopefully that was built with --with-cpu=cortex-a57 to enable the 
renaming pass ?


Ramana



The c6x back end also calls regrename_do_replace.  I am not set up to
build or test on that target, and Bernd told me off-list that it would
never fail on that target anyway so I have left that code alone.

-Sandra


[gomp4, committed] Move rewrite_virtuals_into_loop_closed_ssa to tree-ssa-loop-manip.c

2015-06-24 Thread Tom de Vries

Hi,

this patch moves rewrite_virtuals_into_loop_closed_ssa to 
tree-ssa-loop-manip.c, as requested here: 
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01264.html .


Thanks,
- Tom
Move rewrite_virtuals_into_loop_closed_ssa to tree-ssa-loop-manip.c

2015-06-18  Tom de Vries  t...@codesourcery.com

	* tree-parloops.c (replace_uses_in_bbs_by)
	(rewrite_virtuals_into_loop_closed_ssa): Move to ...
	* tree-ssa-loop-manip.c: here.
	* tree-ssa-loop-manip.h (rewrite_virtuals_into_loop_closed_ssa):
	Declare.
---
 gcc/tree-parloops.c   | 87 ---
 gcc/tree-ssa-loop-manip.c | 87 +++
 gcc/tree-ssa-loop-manip.h |  1 +
 3 files changed, 88 insertions(+), 87 deletions(-)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 0661b78..a9d8c2a 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1507,93 +1507,6 @@ replace_uses_in_bb_by (tree name, tree val, basic_block bb)
 }
 }
 
-/* Replace uses of NAME by VAL in blocks BBS.  */
-
-static void
-replace_uses_in_bbs_by (tree name, tree val, bitmap bbs)
-{
-  gimple use_stmt;
-  imm_use_iterator imm_iter;
-
-  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, name)
-{
-  if (!bitmap_bit_p (bbs, gimple_bb (use_stmt)-index))
-	continue;
-
-  use_operand_p use_p;
-  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-	SET_USE (use_p, val);
-}
-}
-
-/* Ensure a virtual phi is present in the exit block, if LOOP contains a vdef.
-   In other words, ensure loop-closed ssa normal form for virtuals.  */
-
-static void
-rewrite_virtuals_into_loop_closed_ssa (struct loop *loop)
-{
-  gphi *phi;
-  edge exit = single_dom_exit (loop);
-
-  phi = NULL;
-  for (gphi_iterator gsi = gsi_start_phis (loop-header);
-   !gsi_end_p (gsi);
-   gsi_next (gsi))
-{
-  if (virtual_operand_p (PHI_RESULT (gsi.phi (
-	{
-	  phi = gsi.phi ();
-	  break;
-	}
-}
-
-  if (phi == NULL)
-return;
-
-  tree final_loop = PHI_ARG_DEF_FROM_EDGE (phi, single_succ_edge (loop-latch));
-
-  phi = NULL;
-  for (gphi_iterator gsi = gsi_start_phis (exit-dest);
-   !gsi_end_p (gsi);
-   gsi_next (gsi))
-{
-  if (virtual_operand_p (PHI_RESULT (gsi.phi (
-	{
-	  phi = gsi.phi ();
-	  break;
-	}
-}
-
-  if (phi != NULL)
-{
-  tree final_exit = PHI_ARG_DEF_FROM_EDGE (phi, exit);
-  gcc_assert (operand_equal_p (final_loop, final_exit, 0));
-  return;
-}
-
-  tree res_new = copy_ssa_name (final_loop, NULL);
-  gphi *nphi = create_phi_node (res_new, exit-dest);
-
-  /* Gather the bbs dominated by the exit block.  */
-  bitmap exit_dominated = BITMAP_ALLOC (NULL);
-  bitmap_set_bit (exit_dominated, exit-dest-index);
-  vecbasic_block exit_dominated_vec
-= get_dominated_by (CDI_DOMINATORS, exit-dest);
-
-  int i;
-  basic_block dom_bb;
-  FOR_EACH_VEC_ELT (exit_dominated_vec, i, dom_bb)
-bitmap_set_bit (exit_dominated, dom_bb-index);
-
-  exit_dominated_vec.release ();
-
-  replace_uses_in_bbs_by (final_loop, res_new, exit_dominated);
-
-  add_phi_arg (nphi, final_loop, exit, UNKNOWN_LOCATION);
-
-  BITMAP_FREE (exit_dominated);
-}
-
 /* Do transformation from:
 
  bb preheader:
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 228fac6..1150e6c 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -569,6 +569,93 @@ rewrite_into_loop_closed_ssa (bitmap changed_bbs, unsigned update_flag)
   free (use_blocks);
 }
 
+/* Replace uses of NAME by VAL in blocks BBS.  */
+
+static void
+replace_uses_in_bbs_by (tree name, tree val, bitmap bbs)
+{
+  gimple use_stmt;
+  imm_use_iterator imm_iter;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, name)
+{
+  if (!bitmap_bit_p (bbs, gimple_bb (use_stmt)-index))
+	continue;
+
+  use_operand_p use_p;
+  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
+	SET_USE (use_p, val);
+}
+}
+
+/* Ensure a virtual phi is present in the exit block, if LOOP contains a vdef.
+   In other words, ensure loop-closed ssa normal form for virtuals.  */
+
+void
+rewrite_virtuals_into_loop_closed_ssa (struct loop *loop)
+{
+  gphi *phi;
+  edge exit = single_dom_exit (loop);
+
+  phi = NULL;
+  for (gphi_iterator gsi = gsi_start_phis (loop-header);
+   !gsi_end_p (gsi);
+   gsi_next (gsi))
+{
+  if (virtual_operand_p (PHI_RESULT (gsi.phi (
+	{
+	  phi = gsi.phi ();
+	  break;
+	}
+}
+
+  if (phi == NULL)
+return;
+
+  tree final_loop = PHI_ARG_DEF_FROM_EDGE (phi, single_succ_edge (loop-latch));
+
+  phi = NULL;
+  for (gphi_iterator gsi = gsi_start_phis (exit-dest);
+   !gsi_end_p (gsi);
+   gsi_next (gsi))
+{
+  if (virtual_operand_p (PHI_RESULT (gsi.phi (
+	{
+	  phi = gsi.phi ();
+	  break;
+	}
+}
+
+  if (phi != NULL)
+{
+  tree final_exit = PHI_ARG_DEF_FROM_EDGE (phi, exit);
+  gcc_assert (operand_equal_p (final_loop, final_exit, 0));
+  return;
+}
+
+  tree res_new = copy_ssa_name 

Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Biener
On Tue, Jun 23, 2015 at 11:27 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Tue, 23 Jun 2015, Richard Sandiford wrote:

 +/* Vector comparisons are defined to produce all-one or all-zero results.
 */
 +(simplify
 + (vec_cond @0 integer_all_onesp@1 integer_zerop@2)
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +   (convert @0)))


 I am trying to understand why the test tree_nop_conversion_p is the right
 one (at least for the transformations not using VIEW_CONVERT_EXPR). By
 definition of VEC_COND_EXPR, type and TREE_TYPE (@0) are both integer vector
 types of the same size and number of elements. It thus seems like a
 conversion is always fine. For vectors, tree_nop_conversion_p apparently
 only checks that they have the same mode (quite often VOIDmode I guess).

The only conversion we seem to allow is changing the signed vector from
the comparison result to an unsigned vector (same number of elements
and same mode of the elements).  That is, a check using
TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (@0)) would probably
be better (well, technically a TYPE_VECTOR_SUBPARTS && element
mode compare should be better as generic vectors might not have a vector mode).

I'm fine with using tree_nop_conversion_p for now.

 +/* We could instead convert all instances of the vec_cond to negate,
 +   but that isn't necessarily a win on its own.  */

so p ? 1 : 0 -> -p?  Why isn't that a win on its own?  It looks more compact
at least ;)  It would also simplify the patterns below.
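
For concreteness, the identity these patterns rely on, written with GNU C
vector extensions (an illustrative sketch, not part of the patch; it only
assumes that vector comparisons produce all-zero/all-one lanes):

typedef int v4si __attribute__ ((vector_size (16)));

v4si
count_with_and (v4si count, v4si a, v4si b)
{
  v4si one = { 1, 1, 1, 1 };
  v4si mask = a > b;            /* each lane is 0 or -1 */
  return count + (mask & one);  /* roughly what the count-reduction loop computes */
}

v4si
count_with_sub (v4si count, v4si a, v4si b)
{
  v4si mask = a > b;
  return count - mask;          /* same result, without the AND */
}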

I'm missing a comment on the transform done by the following patterns.

 +(simplify
 + (plus:c @3 (vec_cond @0 integer_each_onep@1 integer_zerop@2))
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +  (minus @3 (convert @0
 +
 +(simplify
 + (plus:c @3 (view_convert_expr


 Aren't we supposed to drop _expr in match.pd?

Yes.  I probably should adjust genmatch.c to reject the _expr variants ;)


 +(vec_cond @0 integer_each_onep@1 integer_zerop@2)))
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +  (minus @3 (convert @0
 +
 +(simplify
 + (minus @3 (vec_cond @0 integer_each_onep@1 integer_zerop@2))
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +  (plus @3 (convert @0
 +
 +(simplify
 + (minus @3 (view_convert_expr
 +   (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +  (plus @3 (convert @0
 +

Generally for sign-conversions of vectors you should use view_convert.

The above also hints at missing conditional view_convert support
and a way to iterate over commutative vs. non-commutative ops so
we could write

(for op (plus:c minus)
 rop (minus plus)
  (op @3 (view_convert? (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
   (rop @3 (view_convert @0)

I'll see implementing that.

Richard.


 /* Simplifications of comparisons.  */

 /* We can simplify a logical negation of a comparison to the
 Index: gcc/testsuite/gcc.target/aarch64/vect-add-sub-cond.c
 ===
 --- /dev/null   2015-06-02 17:27:28.541944012 +0100
 +++ gcc/testsuite/gcc.target/aarch64/vect-add-sub-cond.c2015-06-23
 12:06:27.120203685 +0100
 @@ -0,0 +1,94 @@
 +/* Make sure that vector comparison results are not unnecessarily ANDed
 +   with vectors of 1.  */
 +/* { dg-do compile } */
 +/* { dg-options -O2 -ftree-vectorize } */
 +
 +#define COUNT1(X) if (X) count += 1
 +#define COUNT2(X) if (X) count -= 1
 +#define COUNT3(X) count += (X)
 +#define COUNT4(X) count -= (X)
 +
 +#define COND1(X) (X)
 +#define COND2(X) ((X) ? 1 : 0)
 +#define COND3(X) ((X) ? -1 : 0)
 +#define COND4(X) ((X) ? 0 : 1)
 +#define COND5(X) ((X) ? 0 : -1)
 +
 +#define TEST_LT(X, Y) ((X) < (Y))
 +#define TEST_LE(X, Y) ((X) <= (Y))
 +#define TEST_GT(X, Y) ((X) > (Y))
 +#define TEST_GE(X, Y) ((X) >= (Y))
 +#define TEST_EQ(X, Y) ((X) == (Y))
 +#define TEST_NE(X, Y) ((X) != (Y))
 +
 +#define COUNT_LOOP(ID, TYPE, CMP_ARRAY, TEST, COUNT) \
 +  TYPE \
 +  reduc_##ID (__typeof__ (CMP_ARRAY[0]) x) \
 +  { \
 +TYPE count = 0; \
 +for (unsigned int i = 0; i < 1024; ++i) \
 +  COUNT (TEST (CMP_ARRAY[i], x)); \
 +return count; \
 +  }
 +
 +#define COND_LOOP(ID, ARRAY, CMP_ARRAY, TEST, COND) \
 +  void \
 +  plus_##ID (__typeof__ (CMP_ARRAY[0]) x) \
 +  { \
 +for (unsigned int i = 0; i < 1024; ++i) \
 +  ARRAY[i] += COND (TEST (CMP_ARRAY[i], x)); \
 +  } \
 +  void \
 +  plusc_##ID (void) \
 +  { \
 +for (unsigned int i = 0; i < 1024; ++i) \
 +  ARRAY[i] += COND (TEST (CMP_ARRAY[i], 10)); \
 +  } \
 +  void \
 +  minus_##ID (__typeof__ (CMP_ARRAY[0]) x) \
 +  { \
 +for (unsigned int i = 0; i < 1024; ++i) \
 +  ARRAY[i] -= COND (TEST (CMP_ARRAY[i], x)); \
 +  } \
 +  void \
 +  minusc_##ID (void) \
 +  { \
 +for (unsigned int i = 0; i < 1024; ++i) \
 +  ARRAY[i] += COND (TEST (CMP_ARRAY[i], 1)); \
 +  }
 +
 +#define ALL_LOOPS(ID, ARRAY, 

[PATCH] Support conditional view_convert in match.pd

2015-06-24 Thread Richard Biener

Tested on match.pd (same code generation) and a toy pattern.  Will apply
after bootstrap  regtest on x86_64-unknown-linux-gnu.

Richard.

2015-06-24  Richard Biener  rguent...@suse.de

* genmatch.c (enum tree_code): Add VIEW_CONVERT[012].
(main): Likewise.
(lower_opt_convert): Support lowering of conditional view_convert.
(parser::parse_operation): Likewise.
(parser::parse_for): Likewise.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 224888)
+++ gcc/genmatch.c  (working copy)
@@ -161,6 +161,9 @@ enum tree_code {
 CONVERT0,
 CONVERT1,
 CONVERT2,
+VIEW_CONVERT0,
+VIEW_CONVERT1,
+VIEW_CONVERT2,
 MAX_TREE_CODES
 };
 #undef DEFTREECODE
@@ -749,12 +752,14 @@ lower_commutative (simplify *s, vecsimp
children if STRIP, else replace them with an unconditional convert.  */
 
 operand *
-lower_opt_convert (operand *o, enum tree_code oper, bool strip)
+lower_opt_convert (operand *o, enum tree_code oper,
+  enum tree_code to_oper, bool strip)
 {
   if (capture *c = dyn_castcapture * (o))
 {
   if (c-what)
-   return new capture (c-where, lower_opt_convert (c-what, oper, strip));
+   return new capture (c-where,
+   lower_opt_convert (c-what, oper, to_oper, strip));
   else
return c;
 }
@@ -766,16 +771,18 @@ lower_opt_convert (operand *o, enum tree
   if (*e-operation == oper)
 {
   if (strip)
-   return lower_opt_convert (e-ops[0], oper, strip);
+   return lower_opt_convert (e-ops[0], oper, to_oper, strip);
 
-  expr *ne = new expr (get_operator (CONVERT_EXPR));
-  ne-append_op (lower_opt_convert (e-ops[0], oper, strip));
+  expr *ne = new expr (to_oper == CONVERT_EXPR
+  ? get_operator (CONVERT_EXPR)
+  : get_operator (VIEW_CONVERT_EXPR));
+  ne-append_op (lower_opt_convert (e-ops[0], oper, to_oper, strip));
   return ne;
 }
 
   expr *ne = new expr (e-operation, e-is_commutative);
   for (unsigned i = 0; i  e-ops.length (); ++i)
-ne-append_op (lower_opt_convert (e-ops[i], oper, strip));
+ne-append_op (lower_opt_convert (e-ops[i], oper, to_oper, strip));
 
   return ne;
 }
@@ -818,20 +825,28 @@ lower_opt_convert (operand *o)
 
   v1.safe_push (o);
 
-  enum tree_code opers[] = { CONVERT0, CONVERT1, CONVERT2 };
+  enum tree_code opers[]
+= { CONVERT0, CONVERT_EXPR,
+   CONVERT1, CONVERT_EXPR,
+   CONVERT2, CONVERT_EXPR,
+   VIEW_CONVERT0, VIEW_CONVERT_EXPR,
+   VIEW_CONVERT1, VIEW_CONVERT_EXPR,
+   VIEW_CONVERT2, VIEW_CONVERT_EXPR };
 
   /* Conditional converts are lowered to a pattern with the
  conversion and one without.  The three different conditional
  convert codes are lowered separately.  */
 
-  for (unsigned i = 0; i  3; ++i)
+  for (unsigned i = 0; i  sizeof (opers) / sizeof (enum tree_code); i += 2)
 {
   v2 = vNULL;
   for (unsigned j = 0; j  v1.length (); ++j)
if (has_opt_convert (v1[j], opers[i]))
  {
-   v2.safe_push (lower_opt_convert (v1[j], opers[i], false));
-   v2.safe_push (lower_opt_convert (v1[j], opers[i], true));
+   v2.safe_push (lower_opt_convert (v1[j],
+opers[i], opers[i+1], false));
+   v2.safe_push (lower_opt_convert (v1[j],
+opers[i], opers[i+1], true));
  }
 
   if (v2 != vNULL)
@@ -2890,14 +2905,22 @@ parser::parse_operation ()
   const cpp_token *token = peek ();
   if (strcmp (id, convert0) == 0)
 fatal_at (id_tok, use 'convert?' here);
+  else if (strcmp (id, view_convert0) == 0)
+fatal_at (id_tok, use 'view_convert?' here);
   if (token-type == CPP_QUERY
!(token-flags  PREV_WHITE))
 {
   if (strcmp (id, convert) == 0)
id = convert0;
-  else if (strcmp  (id, convert1) == 0)
+  else if (strcmp (id, convert1) == 0)
;
-  else if (strcmp  (id, convert2) == 0)
+  else if (strcmp (id, convert2) == 0)
+   ;
+  else if (strcmp (id, view_convert) == 0)
+   id = view_convert0;
+  else if (strcmp (id, view_convert1) == 0)
+   ;
+  else if (strcmp (id, view_convert2) == 0)
;
   else
fatal_at (id_tok, non-convert operator conditionalized);
@@ -2907,8 +2930,10 @@ parser::parse_operation ()
  match expression);
   eat_token (CPP_QUERY);
 }
-  else if (strcmp  (id, convert1) == 0
-  || strcmp  (id, convert2) == 0)
+  else if (strcmp (id, convert1) == 0
+  || strcmp (id, convert2) == 0
+  || strcmp (id, view_convert1) == 0
+  || strcmp (id, view_convert2) == 0)
 fatal_at (id_tok, expected '?' after conditional operator);
   id_base *op = get_operator (id);
   if (!op)
@@ -3325,7 +3350,9 @@ parser::parse_for (source_location)
  id_base *idb = get_operator 

Re: [PATCH 2/3][AArch64 nofp] Clarify docs for +nofp/-mgeneral-regs-only

2015-06-24 Thread James Greenhalgh
On Tue, Jun 23, 2015 at 05:03:13PM +0100, Alan Lawrence wrote:
 James Greenhalgh wrote:
snip

 To my eye, beginning a sentence in lowercase looks very odd in pdf, and still a
 bit odd in html. Have changed to "That is..."?
 
 Tested with make pdf && make html.
 
 gcc/ChangeLog (unchanged):
 
   * doc/invoke.texi: Clarify AArch64 feature modifiers (no)fp, (no)simd
   and (no)crypto.

OK.

Thanks,
James

 diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
 index 
 d8e982c3aa338819df3785696c493a66c1f5b674..0579bf2ecf993bb56987e0bb9686925537ab61e3
  100644
 --- a/gcc/doc/invoke.texi
 +++ b/gcc/doc/invoke.texi
 @@ -12359,7 +12359,10 @@ Generate big-endian code.  This is the default when 
 GCC is configured for an
  
  @item -mgeneral-regs-only
  @opindex mgeneral-regs-only
 -Generate code which uses only the general registers.
 +Generate code which uses only the general-purpose registers.  This is equivalent
 +to feature modifier @option{nofp} of @option{-march} or @option{-mcpu}, except
 +that @option{-mgeneral-regs-only} takes precedence over any conflicting feature
 +modifier regardless of sequence.
  
  @item -mlittle-endian
  @opindex mlittle-endian
 @@ -12498,20 +12501,22 @@ over the appropriate part of this option.
  @subsubsection @option{-march} and @option{-mcpu} Feature Modifiers
  @cindex @option{-march} feature modifiers
  @cindex @option{-mcpu} feature modifiers
 -Feature modifiers used with @option{-march} and @option{-mcpu} can be one
 -the following:
 +Feature modifiers used with @option{-march} and @option{-mcpu} can be any of
 +the following and their inverses @option{no@var{feature}}:
  
  @table @samp
  @item crc
  Enable CRC extension.
  @item crypto
 -Enable Crypto extension.  This implies Advanced SIMD is enabled.
 +Enable Crypto extension.  This also enables Advanced SIMD and floating-point
 +instructions.
  @item fp
 -Enable floating-point instructions.
 +Enable floating-point instructions.  This is on by default for all possible
 +values for options @option{-march} and @option{-mcpu}.
  @item simd
 -Enable Advanced SIMD instructions.  This implies floating-point instructions
 -are enabled.  This is the default for all current possible values for options
 -@option{-march} and @option{-mcpu=}.
 +Enable Advanced SIMD instructions.  This also enables floating-point
 +instructions.  This is on by default for all possible values for options
 +@option{-march} and @option{-mcpu}.
  @item lse
  Enable Large System Extension instructions.
  @item pan
 @@ -12522,6 +12527,10 @@ Enable Limited Ordering Regions support.
  Enable ARMv8.1 Advanced SIMD instructions.
  @end table
  
 +That is, @option{crypto} implies @option{simd} implies @option{fp}.
 +Conversely, @option{nofp} (or equivalently, @option{-mgeneral-regs-only})
 +implies @option{nosimd} implies @option{nocrypto}.
 +
  @node Adapteva Epiphany Options
  @subsection Adapteva Epiphany Options
  



Re: [PATCH] Fix PR c++/30044

2015-06-24 Thread Patrick Palka
On Wed, Jun 24, 2015 at 5:08 AM, Markus Trippelsdorf
mar...@trippelsdorf.de wrote:
 On 2015.06.23 at 19:40 -0400, Patrick Palka wrote:
 On Tue, Jun 23, 2015 at 12:38 AM, Jason Merrill ja...@redhat.com wrote:
  On 06/15/2015 02:32 PM, Patrick Palka wrote:
 
  On Mon, Jun 15, 2015 at 2:05 PM, Jason Merrill ja...@redhat.com wrote:
 
  Any reason not to use grow_tree_vec?
 
 
  Doing so causes a lot of ICEs in the testsuite.  I think it's because
  grow_tree_vec invalidates the older parameter_vec which some trees may
  still be holding a reference to in their DECL_TEMPLATE_PARMS field.
 
 
  Hmm, that's unfortunate, as doing it this way means we get a bunch of
  garbage TREE_VECs in the process.  But I guess the patch is OK as is.

 Yeah, though I can't think of a simple way to work around this -- any
 solution I think of seems to require a change in the representation of
 current_template_parms, something that would be quite invasive
 Will commit the patch shortly.

 Your patch causes LLVM build to hang on the attached testcase. (I killed
 gcc after ~10 minutes compile time.)

 perf shows:
   23.03%  cc1plus  cc1plus  [.] comp_template_parms
   19.41%  cc1plus  cc1plus  [.] structural_comptypes
   16.28%  cc1plus  cc1plus  [.] cp_type_quals
   15.89%  cc1plus  cc1plus  [.] comp_template_parms_position
   14.01%  cc1plus  cc1plus  [.] comp_type_attributes
6.58%  cc1plus  cc1plus  [.] comptypes
 ...

 To reproduce just run:
  g++ -c -O3 -std=c++11 gtest-all.ii

Thanks.  I don't think infinite recursion is going on.  Rather, it
seems that this patch causes a quadratic slowdown (in the number of
template template parameters in a parameter list and in the number of
partial specializations of a template) in the structural_comptypes -
comp_template_parms - comptypes loop when comparing two
TEMPLATE_TEMPLATE_PARMs to find the canonical template template
parameter of a partial specialization.  The test case has a good
amount of mechanical partial specializations of templates with big
parameter lists containing lots of template template parameters so
it's very sensitive to this quadratic slowdown.

To compare two template template parameters for structural equality,
structural_comptypes must compare their DECL_TEMPLATE_PARMS for
structural equality.  Since the patch gives the DECL_TEMPLATE_PARMS
field a level containing all previously declared template parameters
in the parameter list it's defined in, this comparison becomes
recursive and quadratic if all the parameters of the template are
template template parameters which is what the test has starting at
line 48518.

In the meantime I will revert this patch since I won't be able to find
a solution in time.

What should be done about the PR?  I suppose I should reopen it...


 --
 Markus


C PATCH to use VAR_P

2015-06-24 Thread Marek Polacek
Similarly to what Gaby did in 2013 for C++
(https://gcc.gnu.org/ml/gcc-patches/2013-03/msg01271.html), this patch
makes the c/ and c-family/ code use VAR_P rather than

  TREE_CODE (t) == VAR_DECL

(This is on top of the previous patch with is_global_var.)

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-06-24  Marek Polacek  pola...@redhat.com

* array-notation-common.c: Use VAR_P throughout.
* c-ada-spec.c: Likewise.
* c-common.c: Likewise.
* c-format.c: Likewise.
* c-gimplify.c: Likewise.
* c-omp.c: Likewise.
* c-pragma.c: Likewise.
* c-pretty-print.c: Likewise.
* cilk.c: Likewise.

* c-array-notation.c: Use VAR_P throughout.
* c-decl.c: Likewise.
* c-objc-common.c: Likewise.
* c-parser.c: Likewise.
* c-typeck.c: Likewise.

diff --git gcc/c-family/array-notation-common.c 
gcc/c-family/array-notation-common.c
index d60ec3f..f517424 100644
--- gcc/c-family/array-notation-common.c
+++ gcc/c-family/array-notation-common.c
@@ -231,7 +231,7 @@ find_rank (location_t loc, tree orig_expr, tree expr, bool 
ignore_builtin_fn,
   || TREE_CODE (ii_tree) == INDIRECT_REF)
ii_tree = TREE_OPERAND (ii_tree, 0);
  else if (TREE_CODE (ii_tree) == PARM_DECL
-  || TREE_CODE (ii_tree) == VAR_DECL)
+  || VAR_P (ii_tree))
break;
  else
gcc_unreachable ();
diff --git gcc/c-family/c-ada-spec.c gcc/c-family/c-ada-spec.c
index ab29f86..ef3c5e3 100644
--- gcc/c-family/c-ada-spec.c
+++ gcc/c-family/c-ada-spec.c
@@ -2826,7 +2826,7 @@ print_ada_declaration (pretty_printer *buffer, tree t, 
tree type, int spc)
 }
   else
 {
-  if (TREE_CODE (t) == VAR_DECL
+  if (VAR_P (t)
   decl_name
   *IDENTIFIER_POINTER (decl_name) == '_')
return 0;
diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index d315854..d7ccf0e 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -1620,7 +1620,7 @@ decl_constant_value_for_optimization (tree exp)
 gcc_unreachable ();
 
   if (!optimize
-  || TREE_CODE (exp) != VAR_DECL
+  || !VAR_P (exp)
   || TREE_CODE (TREE_TYPE (exp)) == ARRAY_TYPE
   || DECL_MODE (exp) == BLKmode)
 return exp;
@@ -6952,7 +6952,7 @@ handle_nocommon_attribute (tree *node, tree name,
   tree ARG_UNUSED (args),
   int ARG_UNUSED (flags), bool *no_add_attrs)
 {
-  if (TREE_CODE (*node) == VAR_DECL)
+  if (VAR_P (*node))
 DECL_COMMON (*node) = 0;
   else
 {
@@ -6970,7 +6970,7 @@ static tree
 handle_common_attribute (tree *node, tree name, tree ARG_UNUSED (args),
 int ARG_UNUSED (flags), bool *no_add_attrs)
 {
-  if (TREE_CODE (*node) == VAR_DECL)
+  if (VAR_P (*node))
 DECL_COMMON (*node) = 1;
   else
 {
@@ -7349,12 +7349,12 @@ handle_used_attribute (tree *pnode, tree name, tree 
ARG_UNUSED (args),
   tree node = *pnode;
 
   if (TREE_CODE (node) == FUNCTION_DECL
-  || (TREE_CODE (node) == VAR_DECL  TREE_STATIC (node))
+  || (VAR_P (node)  TREE_STATIC (node))
   || (TREE_CODE (node) == TYPE_DECL))
 {
   TREE_USED (node) = 1;
   DECL_PRESERVE_P (node) = 1;
-  if (TREE_CODE (node) == VAR_DECL)
+  if (VAR_P (node))
DECL_READ_P (node) = 1;
 }
   else
@@ -7378,14 +7378,13 @@ handle_unused_attribute (tree *node, tree name, tree 
ARG_UNUSED (args),
   tree decl = *node;
 
   if (TREE_CODE (decl) == PARM_DECL
- || TREE_CODE (decl) == VAR_DECL
+ || VAR_P (decl)
  || TREE_CODE (decl) == FUNCTION_DECL
  || TREE_CODE (decl) == LABEL_DECL
  || TREE_CODE (decl) == TYPE_DECL)
{
  TREE_USED (decl) = 1;
- if (TREE_CODE (decl) == VAR_DECL
- || TREE_CODE (decl) == PARM_DECL)
+ if (VAR_P (decl) || TREE_CODE (decl) == PARM_DECL)
DECL_READ_P (decl) = 1;
}
   else
@@ -7913,7 +7912,7 @@ handle_section_attribute (tree *node, tree ARG_UNUSED 
(name), tree args,
   goto fail;
 }
 
-  if (TREE_CODE (decl) == VAR_DECL
+  if (VAR_P (decl)
current_function_decl != NULL_TREE
!TREE_STATIC (decl))
 {
@@ -7932,7 +7931,7 @@ handle_section_attribute (tree *node, tree ARG_UNUSED 
(name), tree args,
   goto fail;
 }
 
-  if (TREE_CODE (decl) == VAR_DECL
+  if (VAR_P (decl)
!targetm.have_tls  targetm.emutls.tmpl_section
DECL_THREAD_LOCAL_P (decl))
 {
@@ -8223,7 +8222,7 @@ handle_alias_ifunc_attribute (bool is_alias, tree *node, 
tree name, tree args,
   tree decl = *node;
 
   if (TREE_CODE (decl) != FUNCTION_DECL
-   (!is_alias || TREE_CODE (decl) != VAR_DECL))
+   (!is_alias || !VAR_P (decl)))
 {
   warning (OPT_Wattributes, %qE attribute ignored, name);
   *no_add_attrs = true;
@@ -8518,7 +8517,7 @@ c_determine_visibility (tree decl)
  

Re: C PATCH to use VAR_P

2015-06-24 Thread Uros Bizjak
Hello!

 Similarly to what Gaby did in 2013 for C++
 (https://gcc.gnu.org/ml/gcc-patches/2013-03/msg01271.html), this patch
 makes the c/ and c-family/ code use VAR_P rather than

   TREE_CODE (t) == VAR_DECL

 (This is on top of the previous patch with is_global_var.)

You could also use VAR_OR_FUNCTION_DECL, e.g. in the part below.

Uros.

@@ -7378,14 +7378,13 @@ handle_unused_attribute (tree *node, tree
name, tree ARG_UNUSED (args),
   tree decl = *node;

   if (TREE_CODE (decl) == PARM_DECL
-  || TREE_CODE (decl) == VAR_DECL
+  || VAR_P (decl)
   || TREE_CODE (decl) == FUNCTION_DECL
   || TREE_CODE (decl) == LABEL_DECL
   || TREE_CODE (decl) == TYPE_DECL)


Re: [gomp4.1] Add new versions of GOMP_target{,_data,_update} and GOMP_target_enter_exit_data

2015-06-24 Thread Jakub Jelinek
On Tue, Jun 23, 2015 at 02:40:43PM +0300, Ilya Verbin wrote:
 On Sat, Jun 20, 2015 at 00:35:14 +0300, Ilya Verbin wrote:
  Given that a mapped variable in 4.1 can have different kinds across nested data
  regions, we need to store map-type not only for each var, but also for each
  structured mapping.  Here is my WIP patch, is it sane? :)
  Attached testcase works OK on the device with non-shared memory.
 
 A bit updated version with a fix for GOMP_MAP_TO_PSET.
 make check-target-libgomp passed.

Thinking about this more, for always modifier this isn't really sufficient.
Consider:
void
foo (int *p)
{
  #pragma omp target data (alloc:p[0:32])
  {
#pragma omp target data (always, from:p[7:9])
{
  ...
}
  }
}
If all we record is the corresponding splay_tree and the flags
(from/always_from), then this would try to copy from the device
the whole array section, rather than just the small portion of it.
So, supposedly in addition to the splay_tree for always from case we also
need to remember e.g. [relative offset, length] within the splay tree
object.
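
(Making the arithmetic concrete - an illustrative note, assuming 4-byte int:)

/* target data map of p[0:32]      -> one splay-tree entry, 128 bytes at p
   inner "always, from" of p[7:9]  -> should copy back only 9 ints, i.e.
                                      relative offset 7*4 == 28, length 9*4 == 36
   so an [offset, length] pair has to be recorded alongside the splay-tree
   reference, not just a from/always_from flag for the whole entry.  */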

Jakub


Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Biener
On Wed, Jun 24, 2015 at 2:28 PM, Richard Sandiford
richard.sandif...@arm.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 On Wed, Jun 24, 2015 at 1:10 PM, Richard Sandiford
 richard.sandif...@arm.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 I'm fine with using tree_nop_conversion_p for now.

 I like the suggestion about checking TYPE_VECTOR_SUBPARTS and the element
 mode.  How about:

  (if (VECTOR_INTEGER_TYPE_P (type)
TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE 
 (@0))
(TYPE_MODE (TREE_TYPE (type))
   == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)

 (But is it really OK to be adding more mode-based compatibility checks?
 I thought you were hoping to move away from modes in the middle end.)

 The TYPE_MODE check makes the VECTOR_INTEGER_TYPE_P check redundant
 (the type of a comparison is always a signed vector integer type).

 OK, will just use VECTOR_TYPE_P then.

 Given we're in a VEC_COND_EXPR that's redundant as well.

 Hmm, but is it really guaranteed in:

  (plus:c @3 (view_convert (vec_cond @0 integer_each_onep@1 integer_zerop@2)))

 that the @3 and the view_convert are also vectors?  I thought we allowed
 view_converts from vector to non-vector types.

Hmm, true.

 +/* We could instead convert all instances of the vec_cond to negate,
 +   but that isn't necessarily a win on its own.  */

 so p ? 1 : 0 -> -p?  Why isn't that a win on its own?  It looks
 more compact
 at least ;)  It would also simplify the patterns below.

 In the past I've dealt with processors where arithmetic wasn't handled
 as efficiently as logical ops.  Seems like an especial risk for 64-bit
 elements, from a quick scan of the i386 scheduling models.

 But then expansion could undo this ...

 So do the inverse fold and convert (neg (cond)) to (vec_cond cond 1 0)?
 Is there precedent for doing that kind of thing?

 Expanding it as this, yes.  Whether there is precedence no idea, but
 surely the expand_unop path could, if there is no optab for neg:vector_mode,
 try expanding as vec_cond .. 1 0.

 Yeah, that part isn't the problem.  It's when there is an implementation
 of (neg ...) (which I'd hope all real integer vector architectures would
 support) but it's not as efficient as the (and ...) that most targets
 would use for a (vec_cond ... 0).

I would suppose that a single-operand op (neg) is always better than a
two-operand (and) one.  But you of course never know...

 There is precedence for different
 expansion paths dependent on optabs (or even rtx cost?).  Of course
 expand_unop doesn't get the original tree ops (expand_expr.c does,
 where some special-casing using get_gimple_for_expr is).  Not sure
 if expand_unop would get 'cond' in a form where it can recognize
 the result is either -1 or 0.

 It just seems inconsistent to have the optabs machinery try to detect
 this ad-hoc combination opportunity while still leaving the vcond optab
 to handle more arbitrary cases, like (vec_cond (eq x y) 0xbeef 0).
 The vcond optabs would still have the logic needed to produce the
 right code, but we'd be circumventing it and trying to reimplement
 one particular case in a different way.

That's true.  One could also leave it to combine / simplify_rtx and
thus rtx_cost.  But that's true of all of the match.pd stuff you add, no?

Richard.

 Thanks,
 Richard



Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Sandiford
Richard Biener richard.guent...@gmail.com writes:
 On Tue, Jun 23, 2015 at 11:27 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Tue, 23 Jun 2015, Richard Sandiford wrote:

 +/* Vector comparisons are defined to produce all-one or all-zero results.
 */
 +(simplify
 + (vec_cond @0 integer_all_onesp@1 integer_zerop@2)
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +   (convert @0)))


 I am trying to understand why the test tree_nop_conversion_p is the right
 one (at least for the transformations not using VIEW_CONVERT_EXPR). By
 definition of VEC_COND_EXPR, type and TREE_TYPE (@0) are both integer vector
 types of the same size and number of elements. It thus seems like a
 conversion is always fine. For vectors, tree_nop_conversion_p apparently
 only checks that they have the same mode (quite often VOIDmode I guess).

 The only conversion we seem to allow is changing the signed vector from
 the comparison result to an unsigned vector (same number of elements
 and same mode of the elements).  That is, a check using
 TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (@0)) would probably
 be better (well, technically a TYPE_VECTOR_SUBPARTS && element
 mode compare should be better as generic vectors might not have a vector 
 mode).

OK.  The reason I was being paranoid was that I couldn't see anywhere
where we enforced that the vector condition in a VEC_COND had to have
the same element width as the values being selected.  tree-cfg.c
only checks that rhs2 and rhs3 are compatible with the result.
There doesn't seem to be any checking of rhs1 vs. the other types.
So I wasn't sure whether anything stopped us from, e.g., comparing two
V4HIs and using the result to select between two V4SIs.

 I'm fine with using tree_nop_conversion_p for now.

I like the suggestion about checking TYPE_VECTOR_SUBPARTS and the element
mode.  How about:

 (if (VECTOR_INTEGER_TYPE_P (type)
   TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
   (TYPE_MODE (TREE_TYPE (type))
  == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)

(But is it really OK to be adding more mode-based compatibility checks?
I thought you were hoping to move away from modes in the middle end.)

 +/* We could instead convert all instances of the vec_cond to negate,
 +   but that isn't necessarily a win on its own.  */

 so p ? 1 : 0 -> -p?  Why isn't that a win on its own?  It looks more compact
 at least ;)  It would also simplify the patterns below.

In the past I've dealt with processors where arithmetic wasn't handled
as efficiently as logical ops.  Seems like an especial risk for 64-bit
elements, from a quick scan of the i386 scheduling models.

 I'm missing a comment on the transform done by the following patterns.

Heh.  The comment was supposed to be describing all four at once.
I originally had them bunched together without whitespace, but it
looked bad.

 +(simplify
 + (plus:c @3 (vec_cond @0 integer_each_onep@1 integer_zerop@2))
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +  (minus @3 (convert @0
 +
 +(simplify
 + (plus:c @3 (view_convert_expr


 Aren't we supposed to drop _expr in match.pd?

 Yes.  I probably should adjust genmatch.c to reject the _expr variants ;)

OK.

 +(vec_cond @0 integer_each_onep@1 integer_zerop@2)))
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +  (minus @3 (convert @0
 +
 +(simplify
 + (minus @3 (vec_cond @0 integer_each_onep@1 integer_zerop@2))
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +  (plus @3 (convert @0
 +
 +(simplify
 + (minus @3 (view_convert_expr
 +   (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +  (plus @3 (convert @0
 +

 Generally for sign-conversions of vectors you should use view_convert.

OK.

 The above also hints at missing conditional view_convert support
 and a way to iterate over commutative vs. non-commutative ops so
 we could write

 (for op (plus:c minus)
  rop (minus plus)
   (op @3 (view_convert? (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
(rop @3 (view_convert @0)

 I'll see implementing that.

Looks good. :-)

I also realised later that:

/* Vector comparisons are defined to produce all-one or all-zero results.  */
(simplify
 (vec_cond @0 integer_all_onesp@1 integer_zerop@2)
 (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
   (convert @0)))

is redundant with some fold-const.c code.

Thanks,
Richard



Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Biener
On Wed, Jun 24, 2015 at 11:57 AM, Richard Sandiford
richard.sandif...@arm.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 On Tue, Jun 23, 2015 at 11:27 PM, Marc Glisse marc.gli...@inria.fr wrote:
 On Tue, 23 Jun 2015, Richard Sandiford wrote:

 +/* Vector comparisons are defined to produce all-one or all-zero results.
 */
 +(simplify
 + (vec_cond @0 integer_all_onesp@1 integer_zerop@2)
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +   (convert @0)))


 I am trying to understand why the test tree_nop_conversion_p is the right
 one (at least for the transformations not using VIEW_CONVERT_EXPR). By
 definition of VEC_COND_EXPR, type and TREE_TYPE (@0) are both integer vector
 types of the same size and number of elements. It thus seems like a
 conversion is always fine. For vectors, tree_nop_conversion_p apparently
 only checks that they have the same mode (quite often VOIDmode I guess).

 The only conversion we seem to allow is changing the signed vector from
 the comparison result to an unsigned vector (same number of elements
 and same mode of the elements).  That is, a check using
 TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (@0)) would probably
 be better (well, technically a TYPE_VECTOR_SUBPARTS && element
 mode compare should be better as generic vectors might not have a vector 
 mode).

 OK.  The reason I was being paranoid was that I couldn't see anywhere
 where we enforced that the vector condition in a VEC_COND had to have
 the same element width as the values being selected.

We don't require that indeed.

  tree-cfg.c
 only checks that rhs2 and rhs3 are compatible with the result.
 There doesn't seem to be any checking of rhs1 vs. the other types.
 So I wasn't sure whether anything stopped us from, e.g., comparing two
 V4HIs and using the result to select between two V4SIs.

Nothing does (or should).

 I'm fine with using tree_nop_conversion_p for now.

 I like the suggestion about checking TYPE_VECTOR_SUBPARTS and the element
 mode.  How about:

  (if (VECTOR_INTEGER_TYPE_P (type)
TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
(TYPE_MODE (TREE_TYPE (type))
   == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)

 (But is it really OK to be adding more mode-based compatibility checks?
 I thought you were hoping to move away from modes in the middle end.)

The TYPE_MODE check makes the VECTOR_INTEGER_TYPE_P check redundant
(the type of a comparison is always a signed vector integer type).
Yes, mode-based
checks are ok.  I don't see us moving away from them.

 +/* We could instead convert all instances of the vec_cond to negate,
 +   but that isn't necessarily a win on its own.  */

 so p ? 1 : 0 -> -p?  Why isn't that a win on its own?  It looks more compact
 at least ;)  It would also simplify the patterns below.

 In the past I've dealt with processors where arithmetic wasn't handled
 as efficiently as logical ops.  Seems like an especial risk for 64-bit
 elements, from a quick scan of the i386 scheduling models.

But then expansion could undo this ...

 I'm missing a comment on the transform done by the following patterns.

 Heh.  The comment was supposed to be describing all four at once.
 I originally had them bunched together without whitespace, but it
 looked bad.

 +(simplify
 + (plus:c @3 (vec_cond @0 integer_each_onep@1 integer_zerop@2))
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +  (minus @3 (convert @0
 +
 +(simplify
 + (plus:c @3 (view_convert_expr


 Aren't we supposed to drop _expr in match.pd?

 Yes.  I probably should adjust genmatch.c to reject the _expr variants ;)

 OK.

 +(vec_cond @0 integer_each_onep@1 integer_zerop@2)))
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +  (minus @3 (convert @0
 +
 +(simplify
 + (minus @3 (vec_cond @0 integer_each_onep@1 integer_zerop@2))
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +  (plus @3 (convert @0
 +
 +(simplify
 + (minus @3 (view_convert_expr
 +   (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
 + (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
 +  (plus @3 (convert @0
 +

 Generally for sign-conversions of vectors you should use view_convert.

 OK.

 The above also hints at missing conditional view_convert support
 and a way to iterate over commutative vs. non-commutative ops so
 we could write

 (for op (plus:c minus)
  rop (minus plus)
   (op @3 (view_convert? (vec_cond @0 integer_each_onep@1 integer_zerop@2)))
   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
(rop @3 (view_convert @0)

 I'll see implementing that.

 Looks good. :-)

 I also realised later that:

 /* Vector comparisons are defined to produce all-one or all-zero results.  */
 (simplify
  (vec_cond @0 integer_all_onesp@1 integer_zerop@2)
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
(convert @0)))

 is redundant with some fold-const.c code.

If so then you should remove the fold-const.c at the time you add the 

Re: [PATCH 3/3][AArch64 nofp] Fix another ICE with +nofp/-mgeneral-regs-only

2015-06-24 Thread James Greenhalgh
On Tue, Jun 23, 2015 at 05:03:28PM +0100, Alan Lawrence wrote:
 This fixes another ICE, obtained with the attached testcase - yes, there was a
 way to get hold of a float, without passing an argument or going through
 movsf/movdf!
 
 Bootstrapped + check-gcc on aarch64-none-linux-gnu.
 
 gcc/ChangeLog:
 
   * config/aarch64/aarch64.md (optabfcvt_targetGPF:mode2):
   Condition on TARGET_FLOAT.
 
 gcc/testsuite/ChangeLog:
 
   * gcc.target/aarch64/mgeneral-regs_3.c: New.

OK.

Thanks,
James

 diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
 index 
 99cefece8093791ccf17cb071a4e9997bda8fd89..bcaafda5ea46f136dc90f34aa8f2dfaddabd09f5
  100644
 --- a/gcc/config/aarch64/aarch64.md
 +++ b/gcc/config/aarch64/aarch64.md
 @@ -4106,7 +4106,7 @@
  (define_insn optabfcvt_targetGPF:mode2
[(set (match_operand:GPF 0 register_operand =w,w)
  (FLOATUORS:GPF (match_operand:FCVT_TARGET 1 register_operand 
 w,r)))]
 -  
 +  TARGET_FLOAT
@
 su_optabcvtf\t%GPF:s0, %s1
 su_optabcvtf\t%GPF:s0, %w11
 diff --git a/gcc/testsuite/gcc.target/aarch64/mgeneral-regs_3.c 
 b/gcc/testsuite/gcc.target/aarch64/mgeneral-regs_3.c
 new file mode 100644
 index 
 ..225d9eaa45530d88315a146f3fae72d86fe66373
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/aarch64/mgeneral-regs_3.c
 @@ -0,0 +1,11 @@
 +/* { dg-options -mgeneral-regs-only -O2 } */
 +
 +extern void abort (void);
 +
 +int
 +test (int i, ...)
 +{
 +  float f = (float) i; /* { dg-error '-mgeneral-regs-only' is incompatible 
 with floating point code } */
 +  if (f != f) abort ();
 +  return 2;
 +}



C PATCH to use is_global_var

2015-06-24 Thread Marek Polacek
This patch makes the C FE use the predicate is_global_var in place of direct

  TREE_STATIC (t) || DECL_EXTERNAL (t)

It should improve readability a bit and make predicates easier to follow.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-06-24  Marek Polacek  pola...@redhat.com

* c-common.c (handle_no_reorder_attribute): Use is_global_var.
* cilk.c (extract_free_variables): Likewise.

* c-decl.c: Use is_global_var throughout.
* c-parser.c: Likewise.
* c-typeck.c: Likewise.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index dee6550..d315854 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -7446,8 +7446,7 @@ handle_no_reorder_attribute (tree *pnode,
 {
   tree node = *pnode;
 
-  if (!VAR_OR_FUNCTION_DECL_P (node)
-!(TREE_STATIC (node) || DECL_EXTERNAL (node)))
+  if (!VAR_OR_FUNCTION_DECL_P (node)  !is_global_var (node))
 {
   warning (OPT_Wattributes,
%qE attribute only affects top level objects,
diff --git gcc/c-family/cilk.c gcc/c-family/cilk.c
index c38e05f..347e4b9 100644
--- gcc/c-family/cilk.c
+++ gcc/c-family/cilk.c
@@ -1063,7 +1063,7 @@ extract_free_variables (tree t, struct wrapper_data *wd,
TREE_ADDRESSABLE (t) = 1;
 case VAR_DECL:
 case PARM_DECL:
-  if (!TREE_STATIC (t)  !DECL_EXTERNAL (t))
+  if (!is_global_var (t))
add_variable (wd, t, how);
   return;
 
diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index fc1fdf9..ab54db9 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -2650,9 +2650,8 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, 
tree oldtype)
  tree_code_size (TREE_CODE (olddecl)) - sizeof (struct 
tree_decl_common));
  olddecl-decl_with_vis.symtab_node = snode;
 
- if ((DECL_EXTERNAL (olddecl)
-  || TREE_PUBLIC (olddecl)
-  || TREE_STATIC (olddecl))
+ if ((is_global_var (olddecl)
+  || TREE_PUBLIC (olddecl))
   DECL_SECTION_NAME (newdecl) != NULL)
set_decl_section_name (olddecl, DECL_SECTION_NAME (newdecl));
 
@@ -4395,7 +4394,7 @@ c_decl_attributes (tree *node, tree attributes, int flags)
   /* Add implicit omp declare target attribute if requested.  */
   if (current_omp_declare_target_attribute
((TREE_CODE (*node) == VAR_DECL
-   (TREE_STATIC (*node) || DECL_EXTERNAL (*node)))
+   is_global_var (*node))
  || TREE_CODE (*node) == FUNCTION_DECL))
 {
   if (TREE_CODE (*node) == VAR_DECL
@@ -4794,8 +4793,7 @@ finish_decl (tree decl, location_t init_loc, tree init,
   TREE_TYPE (decl) = error_mark_node;
 }
 
-  if ((DECL_EXTERNAL (decl) || TREE_STATIC (decl))
-  DECL_SIZE (decl) != 0)
+  if (is_global_var (decl)  DECL_SIZE (decl) != 0)
{
  if (TREE_CODE (DECL_SIZE (decl)) == INTEGER_CST)
constant_expression_warning (DECL_SIZE (decl));
@@ -4911,8 +4909,7 @@ finish_decl (tree decl, location_t init_loc, tree init,
{
  /* Recompute the RTL of a local array now
 if it used to be an incomplete type.  */
- if (was_incomplete
-  !TREE_STATIC (decl)  !DECL_EXTERNAL (decl))
+ if (was_incomplete  !is_global_var (decl))
{
  /* If we used it already as memory, it must stay in memory.  */
  TREE_ADDRESSABLE (decl) = TREE_USED (decl);
diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index e0ab0a1..f4d18bd 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -14769,7 +14769,7 @@ c_parser_omp_threadprivate (c_parser *parser)
error_at (loc, %qD is not a variable, v);
   else if (TREE_USED (v)  !C_DECL_THREADPRIVATE_P (v))
error_at (loc, %qE declared %threadprivate% after first use, v);
-  else if (! TREE_STATIC (v)  ! DECL_EXTERNAL (v))
+  else if (! is_global_var (v))
error_at (loc, automatic variable %qE cannot be %threadprivate%, v);
   else if (TREE_TYPE (v) == error_mark_node)
;
diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index aeb1043..3dc1f07 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -4380,7 +4380,7 @@ c_mark_addressable (tree exp)
if (C_DECL_REGISTER (x)
 DECL_NONLOCAL (x))
  {
-   if (TREE_PUBLIC (x) || TREE_STATIC (x) || DECL_EXTERNAL (x))
+   if (TREE_PUBLIC (x) || is_global_var (x))
  {
error
  (global register variable %qD used in nested function, x);
@@ -4390,7 +4390,7 @@ c_mark_addressable (tree exp)
  }
else if (C_DECL_REGISTER (x))
  {
-   if (TREE_PUBLIC (x) || TREE_STATIC (x) || DECL_EXTERNAL (x))
+   if (TREE_PUBLIC (x) || is_global_var (x))
  error (address of global register variable %qD requested, x);
else
  error (address of register variable %qD requested, x);
@@ -9470,8 +9470,7 @@ c_finish_return (location_t loc, tree 

[gomp4.1] Add affinity query routines

2015-06-24 Thread Jakub Jelinek
Hi!

This got enacted earlier this week, a couple of routines to query
the affinity.

2015-06-24  Jakub Jelinek  ja...@redhat.com

* omp.h.in (omp_get_num_places, omp_get_place_num_procs,
omp_get_place_proc_ids, omp_get_place_num,
omp_get_partition_num_places, omp_get_partition_place_nums): New
prototypes.
* omp_lib.f90.in (omp_get_num_places, omp_get_place_num_procs,
omp_get_place_proc_ids, omp_get_place_num,
omp_get_partition_num_places, omp_get_partition_place_nums): New
interfaces.
* omp_lib.h.in (omp_get_num_places, omp_get_place_num_procs,
omp_get_place_proc_ids, omp_get_place_num,
omp_get_partition_num_places, omp_get_partition_place_nums): New
externs.
* libgomp.map (OMP_4.1): Export omp_get_num_places,
omp_get_place_num_procs, omp_get_place_proc_ids, omp_get_place_num,
omp_get_partition_num_places, omp_get_partition_place_nums,
omp_get_num_places_, omp_get_place_num_procs_,
omp_get_place_num_procs_8_, omp_get_place_proc_ids_,
omp_get_place_proc_ids_8_, omp_get_place_num_,
omp_get_partition_num_places_, omp_get_partition_place_nums_
and omp_get_partition_place_nums_8_.
* libgomp.h (gomp_get_place_proc_ids_8): New prototype.
* env.c (omp_get_num_places, omp_get_place_num,
omp_get_partition_num_places, omp_get_partition_place_nums): New
functions, add ialias for them.
* fortran.c (omp_get_num_places, omp_get_place_num, 
omp_get_partition_num_places, omp_get_partition_place_nums,
omp_get_place_num_procs, omp_get_place_proc_ids): New
ialias_redirects.
(omp_get_num_places_, omp_get_place_num_procs_,
omp_get_place_num_procs_8_, omp_get_place_proc_ids_,
omp_get_place_proc_ids_8_, omp_get_place_num_,
omp_get_partition_num_places_, omp_get_partition_place_nums_,
omp_get_partition_place_nums_8_): New functions.
* config/linux/affinity.c (omp_get_place_num_procs,
omp_get_place_proc_ids, gomp_get_place_proc_ids_8): New functions.
* config/posix/affinity.c (omp_get_place_num_procs,
omp_get_place_proc_ids, gomp_get_place_proc_ids_8): New functions.
* testsuite/libgomp.c/affinity-2.c: New test.
* testsuite/libgomp.fortran/affinity1.f90: New test.
* testsuite/libgomp.fortran/affinity2.f90: New test.

--- libgomp/omp.h.in.jj 2015-06-12 16:45:16.0 +0200
+++ libgomp/omp.h.in2015-06-23 19:24:15.056053879 +0200
@@ -125,6 +125,12 @@ extern int omp_in_final (void) __GOMP_NO
 
 extern int omp_get_cancellation (void) __GOMP_NOTHROW;
 extern omp_proc_bind_t omp_get_proc_bind (void) __GOMP_NOTHROW;
+extern int omp_get_num_places (void) __GOMP_NOTHROW;
+extern int omp_get_place_num_procs (int) __GOMP_NOTHROW;
+extern void omp_get_place_proc_ids (int, int *) __GOMP_NOTHROW;
+extern int omp_get_place_num (void) __GOMP_NOTHROW;
+extern int omp_get_partition_num_places (void) __GOMP_NOTHROW;
+extern void omp_get_partition_place_nums (int *) __GOMP_NOTHROW;
 
 extern void omp_set_default_device (int) __GOMP_NOTHROW;
 extern int omp_get_default_device (void) __GOMP_NOTHROW;
--- libgomp/omp_lib.f90.in.jj   2015-06-12 17:34:56.0 +0200
+++ libgomp/omp_lib.f90.in  2015-06-24 11:49:40.159360460 +0200
@@ -330,6 +330,58 @@
   end function omp_get_proc_bind
 end interface
 
+interface
+  function omp_get_num_places ()
+integer (4) :: omp_get_num_places
+  end function omp_get_num_places
+end interface
+
+interface omp_get_place_num_procs
+  function omp_get_place_num_procs (place_num)
+integer (4), intent(in) :: place_num
+integer (4) :: omp_get_place_num_procs
+  end function omp_get_place_num_procs
+
+  function omp_get_place_num_procs_8 (place_num)
+integer (8), intent(in) :: place_num
+integer (4) :: omp_get_place_num_procs_8
+  end function omp_get_place_num_procs_8
+end interface
+
+interface omp_get_place_proc_ids
+  subroutine omp_get_place_proc_ids (place_num, ids)
+integer (4), intent(in) :: place_num
+integer (4), intent(out) :: ids(*)
+  end subroutine omp_get_place_proc_ids
+
+  subroutine omp_get_place_proc_ids_8 (place_num, ids)
+integer (8), intent(in) :: place_num
+integer (8), intent(out) :: ids(*)
+  end subroutine omp_get_place_proc_ids_8
+end interface
+
+interface
+  function omp_get_place_num ()
+integer (4) :: omp_get_place_num
+  end function omp_get_place_num
+end interface
+
+interface
+  function omp_get_partition_num_places ()
+integer (4) :: omp_get_partition_num_places
+  end function omp_get_partition_num_places
+end interface
+
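
For reference, a minimal C usage sketch of the new query routines (purely
illustrative, based only on the prototypes added above; build with -fopenmp):

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  int nplaces = omp_get_num_places ();
  printf ("%d places, initial thread on place %d, partition has %d places\n",
          nplaces, omp_get_place_num (), omp_get_partition_num_places ());
  for (int i = 0; i < nplaces; i++)
    {
      int nprocs = omp_get_place_num_procs (i);
      int *ids = (int *) malloc (nprocs * sizeof (int));
      omp_get_place_proc_ids (i, ids);
      printf ("place %d:", i);
      for (int j = 0; j < nprocs; j++)
        printf (" %d", ids[j]);
      printf ("\n");
      free (ids);
    }
  return 0;
}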
+   

Re: [06/12] Consolidate string hashers

2015-06-24 Thread Mikhail Maltsev
On 23.06.2015 17:49, Richard Sandiford wrote:
 This patch replaces various string hashers with a single copy
 in hash-traits.h.

(snip)

 Index: gcc/config/alpha/alpha.c
 ===
 --- gcc/config/alpha/alpha.c  2015-06-23 15:48:30.751788389 +0100
 +++ gcc/config/alpha/alpha.c  2015-06-23 15:48:30.747788453 +0100
 @@ -4808,13 +4808,7 @@ alpha_multipass_dfa_lookahead (void)
  
  struct GTY(()) alpha_links;
  
 -struct string_traits : default_hashmap_traits
 -{
 -  static bool equal_keys (const char *const a, const char *const b)
 -  {
 -return strcmp (a, b) == 0;
 -  }
 -};
 +typedef simple_hashmap_traits <nofree_string_hash> string_traits;
  

I remember that when we briefly discussed unification of string traits,
I looked through GCC code and this one seemed weird to me: it does not
reimplement the hash function. I.e. the pointer value is used as hash. I
wonder, is it intentional or not? This could actually work if strings
are interned (but in that case there is no need to compare them, because
comparing pointers would be enough).
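
(To spell out the concern - a hypothetical C illustration, not the GCC code:
with the pointer value as hash but strcmp as equality, two equal strings at
different addresses normally land in different buckets, so a lookup only
succeeds when the exact same pointer is passed in, i.e. when strings are
interned - and then the strcmp is redundant.)

/* hash (s) == (uintptr_t) s, equal (a, b) == (strcmp (a, b) == 0)

   char *a = strdup ("foo");   inserted into bucket hash (a) % N
   char *b = strdup ("foo");   looked up in bucket hash (b) % N
                               almost certainly a different bucket, so the
                               equal () callback never even gets a chance.  */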

-- 
Regards,
Mikhail Maltsev


Re: New type-based pool allocator code miscompiled due to aliasing issue?

2015-06-24 Thread Martin Liška

On 06/23/2015 09:44 PM, Pat Haugen wrote:

On 06/18/2015 06:10 AM, Richard Biener wrote:

You are right that we should call ::new just for classes that have 
m_ignore_type_size == false.
I've come up with following patch, that I tested slightly:

diff --git a/gcc/alloc-pool.h b/gcc/alloc-pool.h
index 1785df5..7da5f7a 100644
--- a/gcc/alloc-pool.h
+++ b/gcc/alloc-pool.h
@@ -412,8 +412,16 @@ pool_allocatorT::allocate ()
  #endif
VALGRIND_DISCARD (VALGRIND_MAKE_MEM_UNDEFINED (header, size));

+  T *ptr = (T *)header;
+
/* Call default constructor.  */
-  return (T *)(header);
+  if (!m_ignore_type_size)
+{
+  memset (header + sizeof (T), 0, m_extra_size);
+  return ::new (ptr) T;
+}
+  else
+return ptr;
  }

  /* Puts PTR back on POOL's free list.  */

Would it be suitable?

Suitable with the memset removed, yes.

What's the status of this patch? I have a couple spec regression testers that 
have been unable to build GCC due to this issue, specifically the sched-deps.c 
change. The above patch (with memset removed) does result in a successful build.

Thanks,
Pat



Hello.

I'm finishing a new patch that will do the job in a more suitable way.

Martin


[PATCH IRA] save a bitmap check

2015-06-24 Thread Zhouyi Zhou

In function assign_hard_reg, checking the bit of conflict_a in 
consideration_allocno_bitmap is unnecessary, because when retry_p is 
false, conflicting objects are always inside of the same loop_node
(this is ensured in function process_bb_node_lives which marks the
living objects to death near the end of that function).

   

Bootstrap and regtest scheduled on x86_64 GNU/Linux
Signed-off-by: Zhouyi Zhou yizhouz...@ict.ac.cn
---
 gcc/ChangeLog   | 4 
 gcc/ira-color.c | 6 ++
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d1f82b2..07605ae 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2015-06-24  Zhouyi Zhou  yizhouz...@ict.ac.cn
+
+   * ira-color.c (assign_hard_reg): save a bitmap check
+   
 2015-06-24  Andreas Krebbel  kreb...@linux.vnet.ibm.com
 
PR rtl-optimization/66306
diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 6c53507..d7776d6 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -1733,14 +1733,12 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
  /* Reload can give another class so we need to check all
 allocnos.  */
  if (!retry_p
-  (!bitmap_bit_p (consideration_allocno_bitmap,
-ALLOCNO_NUM (conflict_a))
- || ((!ALLOCNO_ASSIGNED_P (conflict_a)
+  ((!ALLOCNO_ASSIGNED_P (conflict_a)
   || ALLOCNO_HARD_REGNO (conflict_a)  0)
   !(hard_reg_set_intersect_p
   (profitable_hard_regs,
ALLOCNO_COLOR_DATA
-   (conflict_a)-profitable_hard_regs)
+   (conflict_a)-profitable_hard_regs
continue;
  conflict_aclass = ALLOCNO_CLASS (conflict_a);
  ira_assert (ira_reg_classes_intersect_p
-- 
1.9.1


Re: [PATCH 1/8] S/390 Vector ABI GNU Attribute.

2015-06-24 Thread Richard Biener
On Wed, Jun 24, 2015 at 8:57 AM, Andreas Krebbel
kreb...@linux.vnet.ibm.com wrote:
 With this patch .gnu_attribute is used to mark binaries with a vector
 ABI tag.  This is required since the z13 vector support breaks the ABI
 of existing vector_size attribute generated vector types:

 1. vector_size(16) and bigger vectors are aligned to 8 byte
 boundaries (formerly vectors were always naturally aligned)

 2. vector_size(16) or smaller vectors are passed via VR if available
 or by value on the stack (formerly vector were passed on the stack by
 reference).

 The .gnu_attribute will be used by ld to emit a warning if binaries
 with incompatible ABIs are being linked together:
 https://sourceware.org/ml/binutils/2015-04/msg00316.html

 And it will be used by GDB to perform inferior function calls using a
 vector ABI which fits to the binary being debugged:
 https://sourceware.org/ml/gdb-patches/2015-04/msg00833.html

 The current implementation tries to only set the attribute if the
 vector types are really used in ABI relevant contexts in order to
 avoid false positives during linking.

 However, this unfortunately has some limitations like in the following
 case where an ABI relevant context cannot be detected properly:

 typedef int __attribute__((vector_size(16))) v4si;
 struct A
 {
   char x;
   v4si y;
 };
 char a[sizeof(struct A)];

 The number of elements in a depends on the ABI (24 with -mvx and 32
 with -mno-vx).  However, the implementation is not able to detect this
 since the struct type is not used anywhere else and consequently does
 not survive until the checking code is able to see it.

 Ideas about how to improve the implementation without creating too
many false positives are welcome.

I'd be more conservative and instead hook into
targetm.vector_mode_supported_p (and thus vector_type_mode).

Yes, it will trip on local vector types.  But I can't see how you
can avoid this in general without seeing the whole program.

If I'd do it retro-actively I'd reverse the flag and instead mark units
which use generic non-z13 vectors...

Note that other targets simply emit -Wpsabi warnings here:

 gcc t.c -S -m32
t.c: In function ‘foo’:
t.c:4:1: warning: SSE vector return without SSE enabled changes the
ABI [-Wpsabi]
 {
 ^
t.c:3:6: note: The ABI for passing parameters with 16-byte alignment
has changed in GCC 4.6
 v4si foo (v4si x)
  ^
t.c:3:6: warning: SSE vector argument without SSE enabled changes the
ABI [-Wpsabi]

for

typedef int v4si __attribute__((vector_size(16)));

v4si foo (v4si x)
{
  return x;
}

on i?86 without -msse2.  So you could as well do that - warn for vector
type uses on non-z13 and be done with that.

Richard.

 In particular we do not want to set the attribute for local uses of
 vector types as they would be natural for ifunc optimizations.

 gcc/
 * config/s390/s390.c (s390_vector_abi): New variable definition.
 (s390_check_type_for_vector_abi): New function.
 (TARGET_ASM_FILE_END): New macro definition.
 (s390_asm_file_end): New function.
 (s390_function_arg): Call s390_check_type_for_vector_abi.
 (s390_gimplify_va_arg): Likewise.
 * configure: Regenerate.
 * configure.ac: Check for .gnu_attribute Binutils feature.

 gcc/testsuite/
 * gcc.target/s390/vector/vec-abi-1.c: Add gnu attribute check.
 * gcc.target/s390/vector/vec-abi-attr-1.c: New test.
 * gcc.target/s390/vector/vec-abi-attr-2.c: New test.
 * gcc.target/s390/vector/vec-abi-attr-3.c: New test.
 * gcc.target/s390/vector/vec-abi-attr-4.c: New test.
 * gcc.target/s390/vector/vec-abi-attr-5.c: New test.
 * gcc.target/s390/vector/vec-abi-attr-6.c: New test.
 ---
  gcc/config/s390/s390.c |  121 
 
  gcc/configure  |   36 ++
  gcc/configure.ac   |7 ++
  gcc/testsuite/gcc.target/s390/vector/vec-abi-1.c   |1 +
  .../gcc.target/s390/vector/vec-abi-attr-1.c|   18 +++
  .../gcc.target/s390/vector/vec-abi-attr-2.c|   53 +
  .../gcc.target/s390/vector/vec-abi-attr-3.c|   18 +++
  .../gcc.target/s390/vector/vec-abi-attr-4.c|   17 +++
  .../gcc.target/s390/vector/vec-abi-attr-5.c|   19 +++
  .../gcc.target/s390/vector/vec-abi-attr-6.c|   24 
  10 files changed, 314 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-1.c
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-2.c
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-3.c
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-4.c
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-5.c
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-6.c

 diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
 index d6ed179..934f7c0 100644
 --- 

Re: [PATCH] PR c++/65750

2015-06-24 Thread Paolo Carlini

Hi,

On 04/14/2015 11:34 PM, Jason Merrill wrote:

On 04/14/2015 05:27 PM, Adam Butcher wrote:

On 2015-04-10 15:57, Adam Butcher wrote:

+  cp_lexer_consume_token (parser->lexer);


Actually there should be two of these as the 'auto' isn't consumed yet.

OK.


I'm finishing retesting the amended patch and, if everything goes well, 
I will apply it to trunk, as approved by Jason (only additional minor 
tweak: testcase in cpp0x instead of cpp1y).


What about gcc-5-branch? It's a regression.

Thanks,
Paolo.


Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Biener
On Wed, Jun 24, 2015 at 1:10 PM, Richard Sandiford
richard.sandif...@arm.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 I'm fine with using tree_nop_conversion_p for now.

 I like the suggestion about checking TYPE_VECTOR_SUBPARTS and the element
 mode.  How about:

  (if (VECTOR_INTEGER_TYPE_P (type)
       && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
       && (TYPE_MODE (TREE_TYPE (type))
           == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)))))

 (But is it really OK to be adding more mode-based compatibility checks?
 I thought you were hoping to move away from modes in the middle end.)

 The TYPE_MODE check makes the VECTOR_INTEGER_TYPE_P check redundant
 (the type of a comparison is always a signed vector integer type).

 OK, will just use VECTOR_TYPE_P then.

Given we're in a VEC_COND_EXPR that's redundant as well.

 +/* We could instead convert all instances of the vec_cond to negate,
 +   but that isn't necessarily a win on its own.  */

 so p ? 1 : 0 -> -p?  Why isn't that a win on its own?  It looks more 
 compact
 at least ;)  It would also simplify the patterns below.

 In the past I've dealt with processors where arithmetic wasn't handled
 as efficiently as logical ops.  Seems like an especial risk for 64-bit
 elements, from a quick scan of the i386 scheduling models.

 But then expansion could undo this ...

 So do the inverse fold and convert (neg (cond)) to (vec_cond cond 1 0)?
 Is there precedent for doing that kind of thing?

Expanding it as this, yes.  Whether there is precedence no idea, but
surely the expand_unop path could, if there is no optab for neg:vector_mode,
try expanding as vec_cond .. 1 0.  There is precedence for different
expansion paths dependent on optabs (or even rtx cost?).  Of course
expand_unop doesn't get the original tree ops (expand_expr.c does,
where some special-casing using get_gimple_for_expr is).  Not sure
if expand_unop would get 'cond' in a form where it can recognize
the result is either -1 or 0.

 I also realised later that:

 /* Vector comparisons are defined to produce all-one or all-zero results.  
 */
 (simplify
  (vec_cond @0 integer_all_onesp@1 integer_zerop@2)
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
(convert @0)))

 is redundant with some fold-const.c code.

 If so then you should remove the fold-const.c at the time you add the 
 pattern.

 Can I just drop that part of the patch instead?  The fold-const.c
 code handles COND_EXPR and VEC_COND_EXPR analogously, so I'd have
 to move COND_EXPR at the same time.  And then the natural follow-on
 would be: why not move the other COND_EXPR and VEC_COND_EXPR folds too? :-)

Yes, why not? ;)  But sure, you can also drop the case for now.

 Note that ISTR code performing exactly the opposite transform in
 fold-const.c ...

 That's another reason why I'm worried about just doing the (negate ...)
 thing without knowing whether the negate can be folded into anything else.

I'm not aware of anything here.

Richard.

 Thanks,
 Richard



Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Sandiford
Richard Biener richard.guent...@gmail.com writes:
 I'm fine with using tree_nop_conversion_p for now.

 I like the suggestion about checking TYPE_VECTOR_SUBPARTS and the element
 mode.  How about:

 (if (VECTOR_INTEGER_TYPE_P (type)
      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
      && (TYPE_MODE (TREE_TYPE (type))
          == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)))))

 (But is it really OK to be adding more mode-based compatibility checks?
 I thought you were hoping to move away from modes in the middle end.)

 The TYPE_MODE check makes the VECTOR_INTEGER_TYPE_P check redundant
 (the type of a comparison is always a signed vector integer type).

OK, will just use VECTOR_TYPE_P then.

 +/* We could instead convert all instances of the vec_cond to negate,
 +   but that isn't necessarily a win on its own.  */

 so p ? 1 : 0 -> -p?  Why isn't that a win on its own?  It looks more compact
 at least ;)  It would also simplify the patterns below.

 In the past I've dealt with processors where arithmetic wasn't handled
 as efficiently as logical ops.  Seems like an especial risk for 64-bit
 elements, from a quick scan of the i386 scheduling models.

 But then expansion could undo this ...

So do the inverse fold and convert (neg (cond)) to (vec_cond cond 1 0)?
Is there precedent for doing that kind of thing?

 I also realised later that:

 /* Vector comparisons are defined to produce all-one or all-zero results.  */
 (simplify
  (vec_cond @0 integer_all_onesp@1 integer_zerop@2)
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
(convert @0)))

 is redundant with some fold-const.c code.

 If so then you should remove the fold-const.c at the time you add the pattern.

Can I just drop that part of the patch instead?  The fold-const.c
code handles COND_EXPR and VEC_COND_EXPR analogously, so I'd have
to move COND_EXPR at the same time.  And then the natural follow-on
would be: why not move the other COND_EXPR and VEC_COND_EXPR folds too? :-)

 Note that ISTR code performing exactly the opposite transform in
 fold-const.c ...

That's another reason why I'm worried about just doing the (negate ...)
thing without knowing whether the negate can be folded into anything else.

Thanks,
Richard



Re: C PATCH to use VAR_P

2015-06-24 Thread Marek Polacek
On Wed, Jun 24, 2015 at 02:37:30PM +0200, Uros Bizjak wrote:
 Hello!
 
  Similarly to what Gaby did in 2013 for C++
  (https://gcc.gnu.org/ml/gcc-patches/2013-03/msg01271.html), this patch
  makes the c/ and c-family/ code use VAR_P rather than
 
TREE_CODE (t) == VAR_DECL
 
  (This is on top of the previous patch with is_global_var.)
 
 You could also use VAR_OR_FUNCTION_DECL, e.g. in the part below.
 
Sure, I thought I had dealt with VAR_OR_FUNCTION_DECL_P in
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01797.html, but
I must have missed this.  Thanks,


Marek


Re: [06/12] Consolidate string hashers

2015-06-24 Thread Richard Sandiford
Mikhail Maltsev malts...@gmail.com writes:
 On 23.06.2015 17:49, Richard Sandiford wrote:
 Index: gcc/config/alpha/alpha.c
 ===
 --- gcc/config/alpha/alpha.c 2015-06-23 15:48:30.751788389 +0100
 +++ gcc/config/alpha/alpha.c 2015-06-23 15:48:30.747788453 +0100
 @@ -4808,13 +4808,7 @@ alpha_multipass_dfa_lookahead (void)
  
  struct GTY(()) alpha_links;
  
 -struct string_traits : default_hashmap_traits
 -{
 -  static bool equal_keys (const char *const a, const char *const b)
 -  {
 -return strcmp (a, b) == 0;
 -  }
 -};
 +typedef simple_hashmap_traits <nofree_string_hash> string_traits;
  

 I remember that when we briefly discussed unification of string traits,
 I looked through GCC code and this one seemed weird to me: it does not
 reimplement the hash function. I.e. the pointer value is used as hash. I
 wonder, is it intentional or not? This could actually work if strings
 are interned (but in that case there is no need to compare them, because
 comparing pointers would be enough).

I think it was accidental.  The code originally used splay trees and
so didn't need to provide a hash.

SYMBOL_REF names are unique, like you say, so pointer equality should be
enough.  Even then though, htab_hash_string ought to give a better hash
than the pointer value (as well as giving a stable order, although that
isn't important here).  So IMO the patch as it stands is still an
improvement: we're keeping the existing comparison function but adding
a better hasher.

If the series goes anywhere I might look at adding a dedicated interned
string hasher that sits in between pointer_hash and string_hash.
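
For illustration, such a hasher could look roughly like this -- just a
sketch, assuming interned strings are unique and that libiberty's
hashval_t / htab_hash_string are available; none of it is from the series
itself:

  /* Hash interned C strings by contents (stable, well distributed),
     but compare by pointer identity, relying on uniqueness.  */
  struct interned_string_hash
  {
    typedef const char *value_type;
    typedef const char *compare_type;

    static hashval_t hash (const char *s) { return htab_hash_string (s); }
    static bool equal (const char *a, const char *b) { return a == b; }
  };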

Thanks,
Richard



Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Sandiford
Richard Biener richard.guent...@gmail.com writes:
 On Wed, Jun 24, 2015 at 1:10 PM, Richard Sandiford
 richard.sandif...@arm.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 I'm fine with using tree_nop_conversion_p for now.

 I like the suggestion about checking TYPE_VECTOR_SUBPARTS and the element
 mode.  How about:

  (if (VECTOR_INTEGER_TYPE_P (type)
       && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
       && (TYPE_MODE (TREE_TYPE (type))
           == TYPE_MODE (TREE_TYPE (TREE_TYPE (@0)))))

 (But is it really OK to be adding more mode-based compatibility checks?
 I thought you were hoping to move away from modes in the middle end.)

 The TYPE_MODE check makes the VECTOR_INTEGER_TYPE_P check redundant
 (the type of a comparison is always a signed vector integer type).

 OK, will just use VECTOR_TYPE_P then.

 Given we're in a VEC_COND_EXPR that's redundant as well.

Hmm, but is it really guaranteed in:

 (plus:c @3 (view_convert (vec_cond @0 integer_each_onep@1 integer_zerop@2)))

that the @3 and the view_convert are also vectors?  I thought we allowed
view_converts from vector to non-vector types.

 +/* We could instead convert all instances of the vec_cond to negate,
 +   but that isn't necessarily a win on its own.  */

 so p ? 1 : 0 -> -p?  Why isn't that a win on its own?  It looks
 more compact
 at least ;)  It would also simplify the patterns below.

 In the past I've dealt with processors where arithmetic wasn't handled
 as efficiently as logical ops.  Seems like an especial risk for 64-bit
 elements, from a quick scan of the i386 scheduling models.

 But then expansion could undo this ...

 So do the inverse fold and convert (neg (cond)) to (vec_cond cond 1 0)?
 Is there precedent for doing that kind of thing?

 Expanding it as this, yes.  Whether there is precedence no idea, but
 surely the expand_unop path could, if there is no optab for neg:vector_mode,
 try expanding as vec_cond .. 1 0.

Yeah, that part isn't the problem.  It's when there is an implementation
of (neg ...) (which I'd hope all real integer vector architectures would
support) but it's not as efficient as the (and ...) that most targets
would use for a (vec_cond ... 0).
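
(For what it's worth, the equivalence in question can be written with the
generic vector extensions -- purely illustrative, not part of the patch:

  typedef int v4si __attribute__ ((vector_size (16)));

  v4si
  count_step (v4si a, v4si b, v4si acc)
  {
    v4si mask = a == b;   /* each lane is all-ones (-1) or all-zeros  */
    /* acc + (mask & 1) and acc - mask both add 1 per matching lane;
       the question is which form targets expand more cheaply.  */
    return acc - mask;
  }

so the (and ...) and the (neg ...) forms compute the same value.)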

 There is precedence for different
 expansion paths dependent on optabs (or even rtx cost?).  Of course
 expand_unop doesn't get the original tree ops (expand_expr.c does,
 where some special-casing using get_gimple_for_expr is).  Not sure
 if expand_unop would get 'cond' in a form where it can recognize
 the result is either -1 or 0.

It just seems inconsistent to have the optabs machinery try to detect
this ad-hoc combination opportunity while still leaving the vcond optab
to handle more arbitrary cases, like (vec_cond (eq x y) 0xbeef 0).
The vcond optabs would still have the logic needed to produce the
right code, but we'd be circumventing it and trying to reimplement
one particular case in a different way.

Thanks,
Richard



Re: [PATCH] config/bfin/bfin.c (hwloop_optimize): Set JUMP_LABEL() after emit jump_insn

2015-06-24 Thread Chen Gang
On 6/24/15 12:25, Jeff Law wrote:
 On 06/20/2015 04:48 AM, Chen Gang wrote:
 JUMP_LABEL() must be defined after optimization has completed. In this
 case, we are still in the middle of optimization and almost finished, so
 there is no later chance to set JUMP_LABEL(). The related issue is Bug 65803.

 2015-06-20  Chen Gang  gang.chen.5...@gmail.com

 * config/bfin/bfin.c (hwloop_optimize): Set JUMP_LABEL() after
 emit jump_insn.
 Thanks.  I've reduced the testcase from pr65803 and committed the changes to 
 the trunk along with the reduced testcase.
 
 I tested the bfin port lightly -- just confirmed that it'd build newlib as a 
 sanity test.
 
 Actual committed patch is attached for archival purposes.
 

OK, thanks. I shall continue working on the other bfin bugs (which I found
while building the Linux kernel with allmodconfig). I shall try to finish
one bug within this month (by 2015-06-30).

After finishing the bfin bugs (there may be several left), I shall try the
tilegx testsuite with qemu (I have almost finished qemu tilegx linux-user,
with much help from the qemu members). I shall try to start within the next
month.

And sorry for disappearing for several months:

 - I had to spend more of my time on qemu tilegx (I had already delayed it
   too much to bear).

 - I am not very familiar with gcc internals, so if I do not spend enough
   time on an issue, I will probably just send spam (or, even worse, hide
   the issues instead of solving them).


Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed


RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-06-24 Thread Evandro Menezes
Philipp,

I think that execute_cse_reciprocals_1() applies only when the denominator is 
known at compile-time, otherwise the division stays.  It doesn't seem to know 
whether the target supports the approximate reciprocal or not.
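
(For reference, the refinement being discussed is the classic
Newton-Raphson reciprocal step r <- r * (2 - y*r); on AArch64 frecps
computes the 2 - y*r part.  A purely illustrative scalar sketch, not the
patch itself:

  /* Refine a hardware reciprocal estimate and form x/y as x * (1/y).  */
  static double
  approx_divide (double x, double y, double estimate, int steps)
  {
    double r = estimate;            /* e.g. the frecpe result  */
    for (int i = 0; i < steps; i++)
      r = r * (2.0 - y * r);        /* frecps-style step  */
    return x * r;
  }

)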

Cheers,

-- 
Evandro Menezes  Austin, TX


 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On
 Behalf Of Dr. Philipp Tomsich
 Sent: Wednesday, June 24, 2015 15:08
 To: Evandro Menezes
 Cc: Benedikt Huber; gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
 estimation in -ffast-math
 
 Evandro,
 
 Shouldn't ‘execute_cse_reciprocals_1’ take care of this, once the reciprocal-
 division is implemented?
 Do you think there’s additional work needed to catch all cases/opportunities?
 
 Best,
 Philipp.
 
  On 24 Jun 2015, at 20:19, Evandro Menezes e.mene...@samsung.com wrote:
 
  Benedikt,
 
  Are you developing the reciprocal approximation just for 1/x proper or for
 any division, as in x/y = x * 1/y?
 
  Thank you,
 
  --
  Evandro Menezes  Austin, TX
 
 
  -Original Message-
  From: Benedikt Huber [mailto:benedikt.hu...@theobroma-systems.com]
  Sent: Wednesday, June 24, 2015 12:11
  To: Dr. Philipp Tomsich
  Cc: Evandro Menezes; gcc-patches@gcc.gnu.org
  Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root
  (rsqrt) estimation in -ffast-math
 
  Evandro,
 
  Yes, we also have the 1/x approximation.
  However we do not have the test cases yet, and it also would need
  some clean up.
  I am going to provide a patch for that soon (say next week).
  Also, for this optimization we have *not* yet found a benchmark with
  significant improvements.
 
  Best Regards,
  Benedikt
 
 
  On 24 Jun 2015, at 18:52, Dr. Philipp Tomsich
  philipp.tomsich@theobroma-
  systems.com wrote:
 
  Evandro,
 
  We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar)
  reciprocal
  sqrt.
 
  Also, the “reciprocal divide” patches are floating around in various
  of our git-tree, but aren’t ready for public consumption, yet… I’ll
  leave Benedikt to comment on potential timelines for getting that
  pushed
  out.
 
  Best,
  Philipp.
 
  On 24 Jun 2015, at 18:42, Evandro Menezes e.mene...@samsung.com wrote:
 
  Benedikt,
 
  You beat me to it! :-)  Do you have the implementation for dividing
  using the Newton series as well?
 
   I'm not sure that the series is always a win for all data types and on
   all processors.  It would be useful to allow each AArch64 processor
  to enable this or not depending on the data type.  BTW, do you have
  some tests showing the speed up?
 
  Thank you,
 
  --
  Evandro Menezes  Austin, TX
 
  -Original Message-
  From: gcc-patches-ow...@gcc.gnu.org
  [mailto:gcc-patches-ow...@gcc.gnu.org]
  On
  Behalf Of Benedikt Huber
  Sent: Thursday, June 18, 2015 7:04
  To: gcc-patches@gcc.gnu.org
  Cc: benedikt.hu...@theobroma-systems.com;
  philipp.tomsich@theobroma- systems.com
  Subject: [PATCH] [aarch64] Implemented reciprocal square root
  (rsqrt) estimation in -ffast-math
 
   AArch64 offers the instructions frsqrte and frsqrts, for rsqrt
   estimation and a Newton-Raphson step, respectively.
  There are ARMv8 implementations where this is faster than using
  fdiv and rsqrt.
  It runs three steps for double and two steps for float to achieve
  the
  needed
  precision.
 
  There is one caveat and open question.
   Since -ffast-math enables flush to zero, intermediate values
  between approximation steps will be flushed to zero if they are
 denormal.
  E.g. This happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
  The test cases pass, but it is unclear to me whether this is
  expected behavior with -ffast-math.
 
  The patch applies to commit:
  svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
 
  Please consider including this patch.
  Thank you and best regards,
  Benedikt Huber
 
  Benedikt Huber (1):
  2015-06-15  Benedikt Huber  benedikt.hu...@theobroma-systems.com
 
  gcc/ChangeLog|   9 +++
  gcc/config/aarch64/aarch64-builtins.c|  60 
  gcc/config/aarch64/aarch64-protos.h  |   2 +
  gcc/config/aarch64/aarch64-simd.md   |  27 
  gcc/config/aarch64/aarch64.c |  63 +
  gcc/config/aarch64/aarch64.md|   3 +
  gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113
  +++
  7 files changed, 277 insertions(+) create mode 100644
  gcc/testsuite/gcc.target/aarch64/rsqrt.c
 
  --
  1.9.1
  Mail Attachment.eml
 
 
 



[C++ Patch] PR 51911

2015-06-24 Thread Paolo Carlini

Hi,

the patch below implements the requirements quite literally. It does the 
check after the cp_parser_new_initializer call, which I think in general 
makes for better error recovery. The wording of the diagnostic definitely 
needs a review, though (more concise?). Tested x86_64-linux.


Thanks,
Paolo.

///
/cp
2015-06-24  Paolo Carlini  paolo.carl...@oracle.com

PR c++/51911
* parser.c (cp_parser_new_expression): Enforce 5.3.4/2.

/testsuite
2015-06-24  Paolo Carlini  paolo.carl...@oracle.com

PR c++/51911
* g++.dg/cpp0x/new-auto1.C: New.
Index: cp/parser.c
===
--- cp/parser.c (revision 224918)
+++ cp/parser.c (working copy)
@@ -7457,6 +7457,7 @@ cp_parser_new_expression (cp_parser* parser)
   vec<tree, va_gc> *initializer;
   tree nelts = NULL_TREE;
   tree ret;
+  cp_token *token;
 
   /* Look for the optional `::' operator.  */
   global_scope_p
@@ -7482,7 +7483,6 @@ cp_parser_new_expression (cp_parser* parser)
  type-id.  */
   if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN))
 {
-  cp_token *token;
   const char *saved_message = parser->type_definition_forbidden_message;
 
   /* Consume the `('.  */
@@ -7513,9 +7513,11 @@ cp_parser_new_expression (cp_parser* parser)
   else
 type = cp_parser_new_type_id (parser, nelts);
 
+  token = cp_lexer_peek_token (parser->lexer);
+
   /* If the next token is a `(' or '{', then we have a new-initializer.  */
-  if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)
-      || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
+  if (token->type == CPP_OPEN_PAREN
+      || token->type == CPP_OPEN_BRACE)
 initializer = cp_parser_new_initializer (parser);
   else
 initializer = NULL;
@@ -7524,6 +7526,19 @@ cp_parser_new_expression (cp_parser* parser)
  expression.  */
   if (cp_parser_non_integral_constant_expression (parser, NIC_NEW))
 ret = error_mark_node;
+  /* 5.3.4/2: If the auto type-specifier appears in the type-specifier-seq
+ of a new-type-id or type-id of a new-expression, the new-expression shall
+ contain a new-initializer of the form ( assignment-expression ).  */
+  else if (type_uses_auto (type)
+	   && (token->type != CPP_OPEN_PAREN
+	       || vec_safe_length (initializer) != 1
+	       || BRACE_ENCLOSED_INITIALIZER_P ((*initializer)[0])))
+{
+  error_at (token-location,
+   initialization of new-expression for type %auto% 
+   requires exactly one parenthesized expression);
+  ret = error_mark_node;
+}
   else
 {
   /* Create a representation of the new-expression.  */
Index: testsuite/g++.dg/cpp0x/new-auto1.C
===
--- testsuite/g++.dg/cpp0x/new-auto1.C  (revision 0)
+++ testsuite/g++.dg/cpp0x/new-auto1.C  (working copy)
@@ -0,0 +1,9 @@
+// PR c++/51911
+// { dg-do compile { target c++11 } }
+
+#include initializer_list
+
+int main()
+{
+  auto foo = new auto {3, 4, 5};  // { dg-error initialization }
+}


[PATCH] Do not constrain on REAL_TYPE

2015-06-24 Thread Aditya Kumar
From: Aditya Kumar aditya...@samsung.com

gcc/ChangeLog:

2015-06-24  Aditya Kumar  aditya...@samsung.com
Sebastian Pop s@samsung.com

* graphite-sese-to-poly.c (parameter_index_in_region): Discard 
REAL_TYPE parameters.
(scan_tree_for_params): Handle REAL_CST in scan_tree_for_params.
(add_conditions_to_domain): Do not constrain on REAL_TYPE.

---
 gcc/graphite-sese-to-poly.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 271c499..5b37796 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -796,6 +796,9 @@ parameter_index_in_region (tree name, sese region)
 
   gcc_assert (SESE_ADD_PARAMS (region));
 
+  /* Cannot constrain on REAL_TYPE parameters.  */
+  if (TREE_CODE (TREE_TYPE (name)) == REAL_TYPE)
+return -1;
   i = SESE_PARAMS (region).length ();
   SESE_PARAMS (region).safe_push (name);
   return i;
@@ -915,6 +918,7 @@ scan_tree_for_params (sese s, tree e)
 
 case INTEGER_CST:
 case ADDR_EXPR:
+case REAL_CST:
   break;
 
default:
@@ -1194,6 +1198,10 @@ add_conditions_to_domain (poly_bb_p pbb)
   {
   case GIMPLE_COND:
  {
+/* Don't constrain on REAL_TYPE.  */
+   if (TREE_CODE (TREE_TYPE (gimple_cond_lhs (stmt))) == REAL_TYPE)
+  break;
+
gcond *cond_stmt = as_a gcond * (stmt);
enum tree_code code = gimple_cond_code (cond_stmt);
 
-- 
2.1.0.243.g30d45f7



Re: [Patch, C++, PR65882] Check tf_warning flag in build_new_op_1

2015-06-24 Thread Mikhail Maltsev
On 06/24/2015 06:52 PM, Christophe Lyon wrote:
 Hi Mikhail,
 
 In the gcc-5-branch, I can see that your new inhibit-warn-2.C test
 fails (targets ARM and AArch64).
 
 I can see this error message in g++.log:
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:
 In function 'void fn1()':
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:29:3:
 error: 'typename A(Ftypename C template-parameter-1-1
 ::type::value || B:: value)::type D::operator=(Expr) [with Expr =
 int; typename A(Ftypename C template-parameter-1-1
 ::type::value || B:: value)::type = int]' is private
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:35:7:
 error: within this context
 
 Christophe.
 
Oops. Sorry for that, it seems that I messed up with my testing box and
the backport did not actually get regtested :(.

The problem is caused by difference in wording of diagnostics. GCC 6
gives an error on line 35 and a note on line 29:

$ ./cc1plus ~/gcc/src/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C
 void fn1()
/home/miyuki/gcc/src/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:35:7:
error: 'typename A(Ftypename C template-parameter-1-1
::type::value || B:: value)::type D::operator=(Expr) [with Expr =
int; typename A(Ftypename C template-parameter-1-1 ::type::value
|| B:: value)::type = int]' is private within this context
   opt = 0;
/home/miyuki/gcc/src/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:29:3:
note: declared private here
   operator=(Expr);

GCC 5 gives two errors:

/home/miyuki/gcc/src/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:29:3:
error: 'typename A(Ftypename C template-parameter-1-1
::type::value || B:: value)::type D::operator=(Expr) [with Expr =
int; typename A(Ftypename C template-parameter-1-1 ::type::value
|| B:: value)::type = int]' is private
   operator=(Expr);
/home/miyuki/gcc/src/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:35:7:
error: within this context
   opt = 0;

It can probably be fixed like this:

diff --git a/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C
b/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C
index cb16b4c..f658c1d 100644
--- a/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C
+++ b/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C
@@ -26,11 +26,11 @@ class D
 {
   template class Expr
   typename AFtypename CExpr::type::value || B::value::type
-  operator=(Expr); // { dg-message declared }
+  operator=(Expr); // { dg-message private }
 };

 void fn1()
 {
   D opt;
-  opt = 0; // { dg-error private }
+  opt = 0; // { dg-error this context }
 }

But I am not sure, what should I do in this case. Maybe it is better to
remove the failing testcase from GCC 5 branch (provided that
inhibit-warn-1.C tests a fix for the same bug and does not fail)?

-- 
Regards,
Mikhail Maltsev


Re: pr66345.c size_t assumption bug

2015-06-24 Thread DJ Delorie

 OK.

Thanks, committed.


Re: [C/C++ PATCH] PR c++/66572. Fix Wlogical-op false positive

2015-06-24 Thread Mikhail Maltsev
On 23.06.2015 22:49, Marek Polacek wrote:
 On Sat, Jun 20, 2015 at 03:02:06AM +0300, Mikhail Maltsev wrote:
 -  /* We do not warn for constants because they are typical of macro
 - expansions that test for features.  */
 -  if (CONSTANT_CLASS_P (op_left) || CONSTANT_CLASS_P (op_right))
 +  /* We do not warn for literal constants because they are typical of macro
 + expansions that test for features.  Likewise, we do not warn for
 + const-qualified and constexpr variables which are initialized by 
 constant
 + expressions, because they can come from e.g. type_traits or similar 
 user
 + code.  */
 +  if (TREE_CONSTANT (op_left) || TREE_CONSTANT (op_right))
  return;
 
 That looks wrong, because with TREE_CONSTANT we'd warn in C but not in C++
 for the following:
 
 const int a = 4;
 void
 f (void)
 {
   const int b = 4;
   static const int c = 5;
   if (a && a) {}
   if (b && b) {}
   if (c && c) {}
 }
 
Actually for this case the patch silences the warning both for C and
C++. It's interesting that Clang warns like this:

test.c:7:10: warning: use of logical '&&' with constant operand
[-Wconstant-logical-operand]

It does not warn for my testcase with templates. It also does not warn
about:

void
bar (const int parm_a)
{
  const bool a = parm_a;
  if (a && a) {}
  if (a || a) {}
  if (parm_a && parm_a) {}
  if (parm_a || parm_a) {}
}

EDG does not give any warnings at all (in all 3 testcases).

 Note that const-qualified types are checked using TYPE_READONLY.
Yes, but I think we should warn about const-qualified types like in the
example above (and in your recent patch).

 
 But I'm not even sure that the warning in the original testcase in the PR
 is bogus; you won't get any warning when using e.g.
   foounsigned, signed();
 in main().

Maybe my snippet does not express clearly enough what it was supposed to
express. I actually meant something like this:

  template<class _U1, class _U2, class = typename
           enable_if<__and_<is_convertible<_U1, _T1>,
                            is_convertible<_U2, _T2>>::value>::type>
    constexpr pair(pair<_U1, _U2>&& __p)
    : first(std::forward<_U1>(__p.first)),
      second(std::forward<_U2>(__p.second)) { }

(it's std::pair move constructor)
It is perfectly possible that the user will construct an std::pair<T, T>
object from an std::pair<U, U>. In this case we get an and of two
identical is_convertible instantiations. The difference is that here
there is a clever __and_ template which helps to avoid warnings. Well,
at least I now know a good way to suppress them in my code :).

Though I still think that this warning is bogus. Probably the correct
(and the hard) way to check templates is to compare ASTs of the operands
before any substitutions.

But for now I could try to implement an idea, which I mentioned in the
bug report: add a new flag to enum tsubst_flags, and set it when we
check ASTs which depend on parameters of a template being instantiated
(we already have similar checks for macro expansions). What do you think
about such an approach?

-- 
Regards,
Mikhail Maltsev


Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Sandiford
 There is precedence for different
 expansion paths dependent on optabs (or even rtx cost?).  Of course
 expand_unop doesn't get the original tree ops (expand_expr.c does,
 where some special-casing using get_gimple_for_expr is).  Not sure
 if expand_unop would get 'cond' in a form where it can recognize
 the result is either -1 or 0.

 It just seems inconsistent to have the optabs machinery try to detect
 this ad-hoc combination opportunity while still leaving the vcond optab
 to handle more arbitrary cases, like (vec_cond (eq x y) 0xbeef 0).
 The vcond optabs would still have the logic needed to produce the
 right code, but we'd be circumventing it and trying to reimplement
 one particular case in a different way.

 That's true.  One could also leave it to combine / simplify_rtx and
 thus rtx_cost.  But that's true of all of the match.pd stuff you add, no?

It's probably true of most match.pd stuff in general though :-)
One advantage of match.pd of course is that it works across
block boundaries.

The difference between the stuff I added and converting vec_cond_expr
to negate is that the stuff I added avoids the vec_cond_expr altogether
and so ought to be an unequivocal win.  Replacing vec_cond_expr with
negate just rewrites it into another (arguably more surprising) form.

Thanks,
Richard



Re: [PATCH 1/8] S/390 Vector ABI GNU Attribute.

2015-06-24 Thread Andreas Krebbel
On 06/24/2015 12:14 PM, Richard Biener wrote:
 On Wed, Jun 24, 2015 at 8:57 AM, Andreas Krebbel
 Ideas about how to improve the implementation without creating too
 many false positives are welcome.
 
 I'd be more conservative and instead hook into
 targetm.vector_mode_supported_p (and thus vector_type_mode).
 
 Yes, it will trip on local vector types.  But I can't see how you
 can avoid this in general without seeing the whole program.

This would mean that the GNU ABI marker would be set for code which makes
use of one of the builtins locally to optimize a special case wrapped in a
runtime check. This is what I was actually trying to avoid.
We do have some optimizations for Glibc. However, these are all written in
assembler, which would not trigger the ABI flag to be set. So for Glibc the
more conservative approach would be no problem so far.

The current implementation has the hole that an ABI-relevant vector type
usage might not be detected if the type is only used in a sizeof construct
and nowhere else. One question is how big that problem actually is; another
is whether there are more cases which might slip through. If it turns out
to be too risky, I will probably have to go with one of the more
conservative approaches :(

 If I'd do it retro-actively I'd reverse the flag and instead mark units
 which use generic non-z13 vectors...
 
 Note that other targets simply emit -Wpsabi warnings here:
 
 gcc t.c -S -m32
 t.c: In function ‘foo’:
 t.c:4:1: warning: SSE vector return without SSE enabled changes the
 ABI [-Wpsabi]
  {
  ^
 t.c:3:6: note: The ABI for passing parameters with 16-byte alignment
 has changed in GCC 4.6
  v4si foo (v4si x)
   ^
 t.c:3:6: warning: SSE vector argument without SSE enabled changes the
 ABI [-Wpsabi]
 
 for
 
 typedef int v4si __attribute__((vector_size(16)));
 
 v4si foo (v4si x)
 {
   return x;
 }
 
 on i?86 without -msse2.  So you could as well do that - warn for vector
 type uses on non-z13 and be done with that.

Yes. I've seen this and plan to implement it on top of the other mechanism.
But we would still need something like the GNU ABI marker for GDB.

Bye,

-Andreas-


 
 Richard.
 
 In particular we do not want to set the attribute for local uses of
 vector types as they would be natural for ifunc optimizations.

 gcc/
 * config/s390/s390.c (s390_vector_abi): New variable definition.
 (s390_check_type_for_vector_abi): New function.
 (TARGET_ASM_FILE_END): New macro definition.
 (s390_asm_file_end): New function.
 (s390_function_arg): Call s390_check_type_for_vector_abi.
 (s390_gimplify_va_arg): Likewise.
 * configure: Regenerate.
 * configure.ac: Check for .gnu_attribute Binutils feature.

 gcc/testsuite/
 * gcc.target/s390/vector/vec-abi-1.c: Add gnu attribute check.
 * gcc.target/s390/vector/vec-abi-attr-1.c: New test.
 * gcc.target/s390/vector/vec-abi-attr-2.c: New test.
 * gcc.target/s390/vector/vec-abi-attr-3.c: New test.
 * gcc.target/s390/vector/vec-abi-attr-4.c: New test.
 * gcc.target/s390/vector/vec-abi-attr-5.c: New test.
 * gcc.target/s390/vector/vec-abi-attr-6.c: New test.
 ---
  gcc/config/s390/s390.c |  121 
 
  gcc/configure  |   36 ++
  gcc/configure.ac   |7 ++
  gcc/testsuite/gcc.target/s390/vector/vec-abi-1.c   |1 +
  .../gcc.target/s390/vector/vec-abi-attr-1.c|   18 +++
  .../gcc.target/s390/vector/vec-abi-attr-2.c|   53 +
  .../gcc.target/s390/vector/vec-abi-attr-3.c|   18 +++
  .../gcc.target/s390/vector/vec-abi-attr-4.c|   17 +++
  .../gcc.target/s390/vector/vec-abi-attr-5.c|   19 +++
  .../gcc.target/s390/vector/vec-abi-attr-6.c|   24 
  10 files changed, 314 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-1.c
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-2.c
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-3.c
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-4.c
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-5.c
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-abi-attr-6.c

 diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
 index d6ed179..934f7c0 100644
 --- a/gcc/config/s390/s390.c
 +++ b/gcc/config/s390/s390.c
 @@ -461,6 +461,97 @@ struct GTY(()) machine_function
  #define PREDICT_DISTANCE (TARGET_Z10 ? 384 : 2048)


 +/* Indicate which ABI has been used for passing vector args.
 +   0 - no vector type arguments have been passed where the ABI is relevant
 +   1 - the old ABI has been used
 +   2 - a vector type argument has been passed either in a vector register
 +   or on the stack by value  */
 +static int s390_vector_abi = 0;
 +
 +/* Set the vector ABI marker if TYPE is 

Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-24 Thread Bernd Schmidt

On 06/19/2015 03:45 PM, Jakub Jelinek wrote:


If the loop remains in the IL (isn't optimized away as unreachable or
isn't removed, e.g. as a non-loop - say if it contains a noreturn call),
the flags on struct loop should be still there.  For the loop clauses
(reduction always, and private/lastprivate if addressable etc.) for
OpenMP simd / Cilk+ simd we use special arrays indexed by internal
functions, which then during vectorization are shrunk (but in theory could
be expanded too) to the right vectorization factor if vectorized, of course
accesses within the loop vectorized using SIMD, and if not vectorized,
shrunk to 1 element.


I'd appreciate if you could describe that mechanism in more detail. As 
far as I can tell it is very poorly commented and documented in the 
code. I mean, it doesn't even follow the minimal coding standards of 
describing function inputs:


/* Helper function of lower_rec_input_clauses, used for #pragma omp simd
   privatization.  */

static bool
lower_rec_simd_input_clauses (tree new_var, omp_context *ctx, int max_vf,
  tree idx, tree lane, tree ivar, tree lvar)


Bernd



Re: [gomp4] Preserve NVPTX reconvergence points

2015-06-24 Thread Jakub Jelinek
On Wed, Jun 24, 2015 at 03:11:04PM +0200, Bernd Schmidt wrote:
 On 06/19/2015 03:45 PM, Jakub Jelinek wrote:
 
 If the loop remains in the IL (isn't optimized away as unreachable or
 isn't removed, e.g. as a non-loop - say if it contains a noreturn call),
 the flags on struct loop should be still there.  For the loop clauses
 (reduction always, and private/lastprivate if addressable etc.) for
 OpenMP simd / Cilk+ simd we use special arrays indexed by internal
 functions, which then during vectorization are shrunk (but in theory could
 be expanded too) to the right vectorization factor if vectorized, of course
 accesses within the loop vectorized using SIMD, and if not vectorized,
 shrunk to 1 element.
 
 I'd appreciate if you could describe that mechanism in more detail. As far
 as I can tell it is very poorly commented and documented in the code. I
 mean, it doesn't even follow the minimal coding standards of describing
 function inputs:
 
 /* Helper function of lower_rec_input_clauses, used for #pragma omp simd
privatization.  */
 
 static bool
 lower_rec_simd_input_clauses (tree new_var, omp_context *ctx, int max_vf,
 tree idx, tree lane, tree ivar, tree lvar)

Here is the theory behind it:
https://gcc.gnu.org/ml/gcc-patches/2013-04/msg01661.html
In the end it is using internal functions instead of uglified builtins.
I'd suggest you look at some of the libgomp.c/simd*.c tests, say
with -O2 -mavx2 -fdump-tree-{omplower,ssa,ifcvt,vect,optimized}
to see how it is lowered and expanded.  I assume #pragma omp simd roughly
corresponds to #pragma acc loop vector, maxvf for PTX vectorization is
supposedly 32 (warp size).  For SIMD vectorization, if the vectorization
fails, the arrays are shrunk to 1 element, otherwise they are shrunk to the
vectorization factor, and later optimizations if they aren't really
addressable optimized using FRE and other memory optimizations so that they
don't touch memory unless really needed.
For the PTX style vectorization (parallelization between threads in a warp),
I'd say you would always shrink to 1 element again, but such variables would
be local to each of the threads in the warp (or another possibility is
shared arrays of size 32 indexed by %tid.x  31), while addressable variables
without such magic type would be shared among all threads; non-addressable
variables (SSA_NAMEs) depending on where they are used.
You'd need to transform reductions (which are right now represented as
another loop, from 0 to an internal function, so easily recognizable) into
the PTX reductions.  Also, lastprivate is now an access to the array using
last lane internal function, dunno what that corresponds to in PTX
(perhaps also a reduction where all but the thread executing the last
iteration say or in 0 and the remaining thread ors in the lastprivate value).
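
(A minimal example of the kind suggested above for dumping -- illustrative
only; the exact options are just one possible choice:

  /* Compile with e.g. -O2 -fopenmp-simd -mavx2 -fdump-tree-omplower
     -fdump-tree-vect to see the per-lane arrays, the internal functions
     and how they are shrunk after vectorization.  */
  int
  foo (int *a, int n, int *plast)
  {
    int sum = 0, last = 0;
  #pragma omp simd reduction (+:sum) lastprivate (last)
    for (int i = 0; i < n; i++)
      {
        sum += a[i];
        last = a[i];
      }
    *plast = last;
    return sum;
  }

shows both a reduction and a lastprivate clause on one simd loop.)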

Jakub


Re: [patch] fix regrename pass to ensure renamings produce valid insns

2015-06-24 Thread Sandra Loosemore

On 06/24/2015 01:58 AM, Ramana Radhakrishnan wrote:



On 24/06/15 02:00, Sandra Loosemore wrote:

On 06/18/2015 11:32 AM, Eric Botcazou wrote:

The attached patch teaches regrename to validate insns affected by each
register renaming before making the change.  I can see at least two
other ways to handle this -- earlier, by rejecting renamings that
result
in invalid instructions when it's searching for the best renaming; or
later, by validating the entire set of renamings as a group instead of
incrementally for each one -- but doing it all in regname_do_replace
seems least disruptive and risky in terms of the existing code.


OK, but the patch looks incomplete, rename_chains should be adjusted
as well,
i.e. regrename_do_replace should now return a boolean.


Like this?  I tested this on nios2 and x86_64-linux-gnu, as before, plus
built for aarch64-linux-gnu and ran the gcc testsuite.


Hopefully that was built with --with-cpu=cortex-a57 to enable the
renaming pass ?


No, sorry.  I was assuming there were compile-only unit tests for this 
pass that automatically add the right options to enable it.  I don't 
know that I can actually run cortex-a57 code (I was struggling with a 
flaky test harness as it was).


-Sandra



Re: [PATCH, i386] Fix `misaligned_operand' predicate.

2015-06-24 Thread Uros Bizjak
On Wed, Jun 24, 2015 at 4:35 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,

 Patch in the bottom uses proper check of valid memory
 in `misaligned_operand' predicate.

 gcc/
 * config/i386/predicates.md (misaligned_operand): Properly
 check if operand is memory.

 Bootstrapped and reg-tested.

 Is it ok for trunk?

I have reviewed the uses of misaligned_operand predicate, and AFAICS
they always operate after the check for memory_operand. So, there is
no point to re-check it with full memory_operand predicate.

Please introduce another predicate for legitimate misaligned memory
operand, perhaps named misaligned_memory_operand.

Uros.

 --
 Thanks,  K

 diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
 index 4e45246..7d6ae77 100644
 --- a/gcc/config/i386/predicates.md
 +++ b/gcc/config/i386/predicates.md
 @@ -1365,7 +1365,7 @@

  ;; Return true if OP is misaligned memory operand
  (define_predicate "misaligned_operand"
 -  (and (match_code "mem")
 +  (and (match_operand 0 "memory_operand")
        (match_test "MEM_ALIGN (op) < GET_MODE_ALIGNMENT (mode)")))

  ;; Return true if OP is a emms operation, known to be a PARALLEL.


[patch committed SH] Fix PR target/66563

2015-06-24 Thread Kaz Kojima
The attached patch is to fix PR target/66563 which is a 4.9/5/6
regression.  These newer compilers can CSE some expressions on
the sequences for getting GOT.  The target should make sure it
won't happen.  See PR target/66563 for details.
Tested on sh4-unknown-linux-gnu and committed on trunk.
I'll backport it to 5 later and to 4.9 when it reopens.

Regards,
kaz
--
2015-06-24  Kaz Kojima  kkoj...@gcc.gnu.org

PR target/66563
* config/sh/sh.md (GOTaddr2picreg): Add a new operand for
an additional element of the unspec vector.  Modify indices
of operands.
(builtin_setjmp_receiver): Pass const0_rtx to gen_GOTaddr2picreg.
* config/sh/sh.c (prepare_move_operands): Pass incremented
const_int to gen_GOTaddr2picreg.
(sh_expand_prologue): Pass const0_rtx to gen_GOTaddr2picreg.

diff --git a/config/sh/sh.c b/config/sh/sh.c
index 6f03206..2c247b1 100644
--- a/config/sh/sh.c
+++ b/config/sh/sh.c
@@ -1845,12 +1845,13 @@ prepare_move_operands (rtx operands[], machine_mode 
mode)
  || tls_kind == TLS_MODEL_LOCAL_DYNAMIC
  || tls_kind == TLS_MODEL_INITIAL_EXEC))
{
+ static int got_labelno;
  /* Don't schedule insns for getting GOT address when
 the first scheduling is enabled, to avoid spill
 failures for R0.  */
  if (flag_schedule_insns)
emit_insn (gen_blockage ());
- emit_insn (gen_GOTaddr2picreg ());
+ emit_insn (gen_GOTaddr2picreg (GEN_INT (++got_labelno)));
  emit_use (gen_rtx_REG (SImode, PIC_REG));
  if (flag_schedule_insns)
emit_insn (gen_blockage ());
@@ -7958,7 +7959,7 @@ sh_expand_prologue (void)
 }
 
   if (flag_pic  df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM))
-emit_insn (gen_GOTaddr2picreg ());
+emit_insn (gen_GOTaddr2picreg (const0_rtx));
 
   if (SHMEDIA_REGS_STACK_ADJUST ())
 {
diff --git a/config/sh/sh.md b/config/sh/sh.md
index e88d249..43cd949 100644
--- a/config/sh/sh.md
+++ b/config/sh/sh.md
@@ -10592,12 +10592,18 @@ label:
   [(set_attr in_delay_slot no)
(set_attr type arith)])
 
+;; Loads of the GOTPC relocation values must not be optimized away
+;; by e.g. any kind of CSE and must stay as they are.  Although there
+;; are other various ways to ensure this, we use an artificial counter
+;; operand to generate unique symbols.
 (define_expand GOTaddr2picreg
   [(set (reg:SI R0_REG)
-   (unspec:SI [(const:SI (unspec:SI [(match_dup 1)] UNSPEC_PIC))]
-  UNSPEC_MOVA))
-   (set (match_dup 0) (const:SI (unspec:SI [(match_dup 1)] UNSPEC_PIC)))
-   (set (match_dup 0) (plus:SI (match_dup 0) (reg:SI R0_REG)))]
+   (unspec:SI [(const:SI (unspec:SI [(match_dup 2)
+ (match_operand:SI 0  )]
+UNSPEC_PIC))] UNSPEC_MOVA))
+   (set (match_dup 1)
+   (const:SI (unspec:SI [(match_dup 2) (match_dup 0)] UNSPEC_PIC)))
+   (set (match_dup 1) (plus:SI (match_dup 1) (reg:SI R0_REG)))]
   
 {
   if (TARGET_VXWORKS_RTP)
@@ -10608,8 +10614,8 @@ label:
   DONE;
 }
 
-  operands[0] = gen_rtx_REG (Pmode, PIC_REG);
-  operands[1] = gen_rtx_SYMBOL_REF (VOIDmode, GOT_SYMBOL_NAME);
+  operands[1] = gen_rtx_REG (Pmode, PIC_REG);
+  operands[2] = gen_rtx_SYMBOL_REF (VOIDmode, GOT_SYMBOL_NAME);
 
   if (TARGET_SHMEDIA)
 {
@@ -10618,23 +10624,23 @@ label:
   rtx lab = PATTERN (gen_call_site ());
   rtx insn, equiv;
 
-  equiv = operands[1];
-  operands[1] = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, operands[1], lab),
+  equiv = operands[2];
+  operands[2] = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, operands[2], lab),
UNSPEC_PCREL_SYMOFF);
-  operands[1] = gen_rtx_CONST (Pmode, operands[1]);
+  operands[2] = gen_rtx_CONST (Pmode, operands[2]);
 
   if (Pmode == SImode)
{
- emit_insn (gen_movsi_const (pic, operands[1]));
+ emit_insn (gen_movsi_const (pic, operands[2]));
  emit_insn (gen_ptrel_si (tr, pic, copy_rtx (lab)));
}
   else
{
- emit_insn (gen_movdi_const (pic, operands[1]));
+ emit_insn (gen_movdi_const (pic, operands[2]));
  emit_insn (gen_ptrel_di (tr, pic, copy_rtx (lab)));
}
 
-  insn = emit_move_insn (operands[0], tr);
+  insn = emit_move_insn (operands[1], tr);
 
   set_unique_reg_note (insn, REG_EQUAL, equiv);
 
@@ -10688,7 +10694,7 @@ label:
   [(match_operand 0  )]
   flag_pic
 {
-  emit_insn (gen_GOTaddr2picreg ());
+  emit_insn (gen_GOTaddr2picreg (const0_rtx));
   DONE;
 })
 


[PATCH] i386: Do not modify existing RTL (PR66412)

2015-06-24 Thread Segher Boessenkool
A few define_split's in the i386 backend modify RTL in place.  This does
not work.  This patch fixes all cases that do PUT_MODE on existing RTL.

Bootstrapped and tested; no regressions.  Is this okay for trunk?

Hrm, this wants the testcase in that PR added I suppose.  Will send
it separately.


Segher


2015-06-24  Segher Boessenkool  seg...@kernel.crashing.org

* config/i386/i386.md (various splitters): Use copy_rtx before
doing PUT_MODE on operands.

---
 gcc/config/i386/i386.md |   16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d75b2e1..5425cec 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10865,7 +10865,10 @@
(const_int 0)))]
   
   [(set (match_dup 0) (match_dup 1))]
-  PUT_MODE (operands[1], QImode);)
+{
+  operands[1] = copy_rtx (operands[1]);
+  PUT_MODE (operands[1], QImode);
+})
 
 (define_split
   [(set (strict_low_part (match_operand:QI 0 nonimmediate_operand))
@@ -10874,7 +10877,10 @@
(const_int 0)))]
   
   [(set (match_dup 0) (match_dup 1))]
-  PUT_MODE (operands[1], QImode);)
+{
+  operands[1] = copy_rtx (operands[1]);
+  PUT_MODE (operands[1], QImode);
+})
 
 (define_split
   [(set (match_operand:QI 0 nonimmediate_operand)
@@ -11031,7 +11037,10 @@
(if_then_else (match_dup 0)
  (label_ref (match_dup 1))
  (pc)))]
-  PUT_MODE (operands[0], VOIDmode);)
+{
+  operands[0] = copy_rtx (operands[0]);
+  PUT_MODE (operands[0], VOIDmode);
+})
 
 (define_split
   [(set (pc)
@@ -17298,6 +17307,7 @@
   operands[1] = gen_lowpart (SImode, operands[1]);
   if (GET_CODE (operands[3]) != ASHIFT)
 operands[2] = gen_lowpart (SImode, operands[2]);
+  operands[3] = copy_rtx (operands[3]);
   PUT_MODE (operands[3], SImode);
 })
 
-- 
1.7.10.4



Re: [RS6000 1/7] Hide insns not needing to be public

2015-06-24 Thread David Edelsohn
On Tue, Jun 23, 2015 at 8:50 PM, Alan Modra amo...@gmail.com wrote:
 * config/rs6000/rs6000.md (addsi3_high, bswaphi2_internal,
 ashldi3_internal5, ashldi3_internal8): Prefix with '*'.

This patch is okay.

The rotate changes need to be discussed and coordinated with Segher.

The cost changes are okay in theory, but really should be applied in
conjunction with the rtx_cost improvements that you are discussing
with Jeff.

Thanks, David


[Patch ARM] Fixup testsuite noise with various multilibs in arm.exp

2015-06-24 Thread Ramana Radhakrishnan
Pretty much self-explanatory. This fixes up a significant amount of 
noise in a testsuite run with an -mfloat-abi=soft, -mfpu=fp-armv8 
variant in the multilib flags provided for the testsuite run.


Tested arm-none-eabi cross with a set of multilibs and applied to trunk.

regards
ramana

2015-06-24  Ramana Radhakrishnan  ramana.radhakrish...@arm.com

* gcc.target/arm/fixed_float_conversion.c: Skip for inappropriate
  multilibs.
* gcc.target/arm/memset-inline-10.c: Likewise.
* gcc.target/arm/pr58784.c: Likewise.
* gcc.target/arm/pr59985.C: Likewise.
* gcc.target/arm/vfp-1.c: Likewise and test only for the non fma cases.


commit 6187bfc6ca02ec16f8443188f740958079d8e6ea
Author: Ramana Radhakrishnan ramana.radhakrish...@arm.com
Date:   Wed Jun 24 14:33:51 2015 +0100

more noise.

diff --git a/gcc/testsuite/gcc.target/arm/fixed_float_conversion.c 
b/gcc/testsuite/gcc.target/arm/fixed_float_conversion.c
index 078b103..05ccd14 100644
--- a/gcc/testsuite/gcc.target/arm/fixed_float_conversion.c
+++ b/gcc/testsuite/gcc.target/arm/fixed_float_conversion.c
@@ -3,6 +3,7 @@
 /* { dg-require-effective-target arm_vfp3_ok } */
 /* { dg-options -O1 } */
 /* { dg-add-options arm_vfp3 } */
+/* { dg-skip-if need fp instructions { *-*-* } { -mfloat-abi=soft } {  } 
} */
 
 float
 fixed_to_float (int i)
diff --git a/gcc/testsuite/gcc.target/arm/memset-inline-10.c 
b/gcc/testsuite/gcc.target/arm/memset-inline-10.c
index d3b777c..c1087c8 100644
--- a/gcc/testsuite/gcc.target/arm/memset-inline-10.c
+++ b/gcc/testsuite/gcc.target/arm/memset-inline-10.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options -march=armv7-a -mfloat-abi=hard -mfpu=neon -O2 } */
+/* { dg-skip-if need SIMD instructions { *-*-* } { -mfloat-abi=soft } {  
} } */
+/* { dg-skip-if need SIMD instructions { *-*-* } { -mfpu=vfp* } {  } } */
 
 #define BUF 100
 long a[BUF];
diff --git a/gcc/testsuite/gcc.target/arm/pr58784.c 
b/gcc/testsuite/gcc.target/arm/pr58784.c
index 4ee3ef5..29a0f73 100644
--- a/gcc/testsuite/gcc.target/arm/pr58784.c
+++ b/gcc/testsuite/gcc.target/arm/pr58784.c
@@ -1,6 +1,8 @@
 /* { dg-do compile } */
 /* { dg-skip-if incompatible options { arm_thumb1 } { * } {  } } */
 /* { dg-options -march=armv7-a -mfloat-abi=hard -mfpu=neon -marm -O2 } */
+/* { dg-skip-if need hardfp ABI { *-*-* } { -mfloat-abi=soft } {  } } */
+
 
 typedef struct __attribute__ ((__packed__))
 {
diff --git a/gcc/testsuite/gcc.target/arm/pr59985.C 
b/gcc/testsuite/gcc.target/arm/pr59985.C
index 1351c48..97d5915 100644
--- a/gcc/testsuite/gcc.target/arm/pr59985.C
+++ b/gcc/testsuite/gcc.target/arm/pr59985.C
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-skip-if incompatible options { arm_thumb1 } { * } {  } } */
 /* { dg-options -g -fcompare-debug -O2 -march=armv7-a -mtune=cortex-a9 
-mfpu=vfpv3-d16 -mfloat-abi=hard } */
+/* { dg-skip-if need hardfp abi { *-*-* } { -mfloat-abi=soft } {  } } */
 
 extern void *f1 (unsigned long, unsigned long);
 extern const struct line_map *f2 (void *, int, unsigned int, const char *, 
unsigned int);
diff --git a/gcc/testsuite/gcc.target/arm/vfp-1.c 
b/gcc/testsuite/gcc.target/arm/vfp-1.c
index b6bb7be..9aa5302 100644
--- a/gcc/testsuite/gcc.target/arm/vfp-1.c
+++ b/gcc/testsuite/gcc.target/arm/vfp-1.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options -O2 -mfpu=vfp -mfloat-abi=softfp } */
+/* { dg-options -O2 -mfpu=vfp -mfloat-abi=softfp -ffp-contract=off } */
 /* { dg-require-effective-target arm_vfp_ok } */
+/* { dg-skip-if need fp instructions { *-*-* } { -mfloat-abi=soft } {  } 
} */
 
 extern float fabsf (float);
 extern float sqrtf (float);


[PATCH, committed] Fix warning

2015-06-24 Thread Ilya Enkovich
Hi,

I've committed this patch to fix a warning for mpx-bootstrap.

/export/users/aguskov/MPX/git_branch/source/gcc/tree.h:2858:51: error: 
'vectype' may be used uninitialized in this function 
[-Werror=maybe-uninitialized]
 tree_check_failed (__t, __f, __l, __g, __c, 0);
   ^
/export/users/aguskov/MPX/git_branch/source/gcc/tree-vect-slp.c:483:8: note: 
'vectype' was declared here
   tree vectype, scalar_type, first_op1 = NULL_TREE;
^

Thanks,
Ilya
--
2015-06-24  Ilya Enkovich  enkovich@gmail.com

* tree-vect-slp.c (vect_build_slp_tree_1): Init vectype.


diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 91ddc0f..bbc7d13 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -480,7 +480,7 @@ vect_build_slp_tree_1 (loop_vec_info loop_vinfo, 
bb_vec_info bb_vinfo,
   enum tree_code first_cond_code = ERROR_MARK;
   tree lhs;
   bool need_same_oprnds = false;
-  tree vectype, scalar_type, first_op1 = NULL_TREE;
+  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;
   optab optab;
   int icode;
   machine_mode optab_op2_mode;


Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Richard Biener
On Wed, Jun 24, 2015 at 3:37 PM, Richard Sandiford
richard.sandif...@arm.com wrote:
 There is precedence for different
 expansion paths dependent on optabs (or even rtx cost?).  Of course
 expand_unop doesn't get the original tree ops (expand_expr.c does,
 where some special-casing using get_gimple_for_expr is).  Not sure
 if expand_unop would get 'cond' in a form where it can recognize
 the result is either -1 or 0.

 It just seems inconsistent to have the optabs machinery try to detect
 this ad-hoc combination opportunity while still leaving the vcond optab
 to handle more arbitrary cases, like (vec_cond (eq x y) 0xbeef 0).
 The vcond optabs would still have the logic needed to produce the
 right code, but we'd be circumventing it and trying to reimplement
 one particular case in a different way.

 That's true.  One could also leave it to combine / simplify_rtx and
 thus rtx_cost.  But that's true of all of the match.pd stuff you add, no?

 It's probably true of most match.pd stuff in general though :-)
 One advantage of match.pd of course is that it works across
 block boundaries.

 The difference between the stuff I added and converting vec_cond_expr
 to negate is that the stuff I added avoids the vec_cond_expr altogether
 and so ought to be an unequivocal win.  Replacing vec_cond_expr with
 negate just rewrites it into another (arguably more surprising) form.

True.  Btw, conditional view_convert is now in trunk so you can at least
merge both plus:c patterns and both minus patterns.

Richard.

 Thanks,
 Richard



[PATCH, i386] Fix `misaligned_operand' predicate.

2015-06-24 Thread Kirill Yukhin
Hello,

Patch in the bottom uses proper check of valid memory
in `misaligned_operand' predicate.

gcc/
* config/i386/predicates.md (misaligned_operand): Properly
check if operand is memory.

Bootstrapped and reg-tested.

Is it ok for trunk?

--
Thanks,  K

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 4e45246..7d6ae77 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1365,7 +1365,7 @@

 ;; Return true if OP is misaligned memory operand
 (define_predicate misaligned_operand
-  (and (match_code mem)
+  (and (match_operand 0 memory_operand)
(match_test MEM_ALIGN (op)  GET_MODE_ALIGNMENT (mode

 ;; Return true if OP is a emms operation, known to be a PARALLEL.


[gomp4] Additional tests for declare directive and fixes.

2015-06-24 Thread James Norris

Hi!

The following patch adds additional testing of the declare directive
and fixes for issues that arose from the testing.


Committed to gomp-4_0-branch.

Jim
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index e7df751..bcbd163 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1767,12 +1767,15 @@ finish_oacc_declare (tree fnbody, tree decls)
 	break;
 }
 
-  stmt = make_node (OACC_DECLARE);
-  TREE_TYPE (stmt) = void_type_node;
-  OACC_DECLARE_CLAUSES (stmt) = ret_clauses;
-  SET_EXPR_LOCATION (stmt, loc);
+  if (ret_clauses)
+{
+  stmt = make_node (OACC_DECLARE);
+  TREE_TYPE (stmt) = void_type_node;
+  OACC_DECLARE_CLAUSES (stmt) = ret_clauses;
+  SET_EXPR_LOCATION (stmt, loc);
 
-  tsi_link_before (i, stmt, TSI_CONTINUE_LINKING);
+  tsi_link_before (i, stmt, TSI_CONTINUE_LINKING);
+}
 
   DECL_ATTRIBUTES (fndecl)
 	  = remove_attribute (oacc declare, DECL_ATTRIBUTES (fndecl));
@@ -12812,6 +12815,14 @@ c_parser_oacc_declare (c_parser *parser)
 	  error = true;
 	  continue;
 	}
+	  else if (TREE_PUBLIC (decl))
+	{
+	  error_at (loc,
+			invalid use of %global% variable %qD 
+			in %#pragma acc declare%, decl);
+	  error = true;
+	  continue;
+	}
 	  break;
 	}
 
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 15da51e..a35f599 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -14343,7 +14343,17 @@ finish_oacc_declare (tree fndecl, tree decls)
  {
 	t = tsi_stmt (i);
 	if (TREE_CODE (t) == BIND_EXPR)
-	  list = BIND_EXPR_BODY (t);
+	  {
+	list = BIND_EXPR_BODY (t);
+	if (TREE_CODE (list) != STATEMENT_LIST)
+	  {
+		stmt = list;
+		list = alloc_stmt_list ();
+		BIND_EXPR_BODY (t) = list;
+		i = tsi_start (list);
+		tsi_link_after (i, stmt, TSI_CONTINUE_LINKING);
+	  }
+	  }
   }
 
   if (clauses)
@@ -14371,11 +14381,11 @@ finish_oacc_declare (tree fndecl, tree decls)
 	}
 	}
 
-	if (!found)
-	  {
-	i = tsi_start (list);
-	tsi_link_before (i, stmt, TSI_CONTINUE_LINKING);
-	  }
+  if (!found)
+	{
+	  i = tsi_start (list);
+	  tsi_link_before (i, stmt, TSI_CONTINUE_LINKING);
+	}
 }
 
 while (oacc_returns)
@@ -14405,18 +14415,21 @@ finish_oacc_declare (tree fndecl, tree decls)
 	free (r);
  }
 
-  for (i = tsi_start (list); !tsi_end_p (i); tsi_next (i))
+  if (ret_clauses)
 {
-  if (tsi_end_p (i))
-	break;
-}
+  for (i = tsi_start (list); !tsi_end_p (i); tsi_next (i))
+	{
+	  if (tsi_end_p (i))
+	break;
+	}
 
-  stmt = make_node (OACC_DECLARE);
-  TREE_TYPE (stmt) = void_type_node;
-  OMP_STANDALONE_CLAUSES (stmt) = ret_clauses;
-  SET_EXPR_LOCATION (stmt, loc);
+  stmt = make_node (OACC_DECLARE);
+  TREE_TYPE (stmt) = void_type_node;
+  OMP_STANDALONE_CLAUSES (stmt) = ret_clauses;
+  SET_EXPR_LOCATION (stmt, loc);
 
-  tsi_link_before (i, stmt, TSI_CONTINUE_LINKING);
+  tsi_link_before (i, stmt, TSI_CONTINUE_LINKING);
+}
 
   DECL_ATTRIBUTES (fndecl)
 	  = remove_attribute (oacc declare, DECL_ATTRIBUTES (fndecl));
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 78bcb0a1..41fb35e 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -32123,6 +32123,14 @@ cp_parser_oacc_declare (cp_parser *parser, cp_token *pragma_tok)
 	  error = true;
 	  continue;
 	}
+	  else if (TREE_PUBLIC (decl))
+	{
+	  error_at (loc,
+			invalid use of %global% variable %qD 
+			in %#pragma acc declare%, decl);
+	  error = true;
+	  continue;
+	}
 	  break;
 	}
 
diff --git a/gcc/testsuite/c-c++-common/goacc/declare-2.c b/gcc/testsuite/c-c++-common/goacc/declare-2.c
index ce12463..7979f0c 100644
--- a/gcc/testsuite/c-c++-common/goacc/declare-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/declare-2.c
@@ -63,4 +63,6 @@ f (void)
 
   extern int ve6;
 #pragma acc declare present_or_create(ve6) /* { dg-error invalid use of } */
+
+#pragma acc declare present (v9) /* { dg-error invalid use of } */
 }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-1.c
index 59cfe51..584b921 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-1.c
@@ -4,6 +4,26 @@
 #include stdlib.h
 #include stdio.h
 
+#define N 8
+
+void
+subr1 (int *a)
+{
+  int f[N];
+#pragma acc declare copy (f)
+
+#pragma acc parallel copy (a[0:N])
+  {
+int i;
+
+for (i = 0; i  N; i++)
+  {
+	f[i] = a[i];
+	a[i] = f[i] + f[i];
+  }
+  }
+}
+
 int b[8];
 #pragma acc declare create (b)
 
@@ -13,7 +33,6 @@ int d[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
 int
 main (int argc, char **argv)
 {
-  const int N = 8;
   int a[N];
   int e[N];
 #pragma acc declare create (e)
@@ -61,5 +80,18 @@ main (int argc, char **argv)
 	abort ();
 }
 
+  for (i = 0; i  N; i++)
+{
+  a[i] = 1234;
+}
+
+  subr1 (a[0]);
+
+  for (i = 0; i  N; i++)
+{
+  if (a[i] != 1234 * 2)
+	abort ();
+}
+
   return 0;
 }
diff --git 

Re: [PATCH 1/3][AArch64 nofp] Fix ICEs with +nofp/-mgeneral-regs-only and improve error messages

2015-06-24 Thread James Greenhalgh
On Tue, Jun 23, 2015 at 05:02:46PM +0100, Alan Lawrence wrote:
 James Greenhalgh wrote:
 
 Bootstrap + check-gcc on aarch64-none-linux-gnu.
 
 (ChangeLog's identical to v1)
 
 gcc/ChangeLog:
 
   * config/aarch64/aarch64-protos.h (aarch64_err_no_fpadvsimd): New.
 
   * config/aarch64/aarch64.md (movmode/GPF, movtf): Use
   aarch64_err_no_fpadvsimd.
 
   * config/aarch64/aarch64.c (aarch64_err_no_fpadvsimd): New.
   (aarch64_layout_arg, aarch64_init_cumulative_args): Use
   aarch64_err_no_fpadvsimd if !TARGET_FLOAT and we need FP regs.
   (aarch64_expand_builtin_va_start, aarch64_setup_incoming_varargs):
   Turn error into assert, test TARGET_FLOAT.
   (aarch64_gimplify_va_arg_expr): Use aarch64_err_no_fpadvsimd, test
   TARGET_FLOAT.
 
 gcc/testsuite/ChangeLog:
 
   * gcc.target/aarch64/mgeneral-regs_1.c: New file.
   * gcc.target/aarch64/mgeneral-regs_2.c: New file.
   * gcc.target/aarch64/nofp_1.c: New file.


OK.

Thanks,
James

 diff --git a/gcc/config/aarch64/aarch64-protos.h 
 b/gcc/config/aarch64/aarch64-protos.h
 index 
 965a11b7bee188819796e2b17017a87dca80..ac92c5924a4cfc5941fe8eeb31281e18bd21a5a0
  100644
 --- a/gcc/config/aarch64/aarch64-protos.h
 +++ b/gcc/config/aarch64/aarch64-protos.h
 @@ -259,6 +259,7 @@ unsigned aarch64_dbx_register_number (unsigned);
  unsigned aarch64_trampoline_size (void);
  void aarch64_asm_output_labelref (FILE *, const char *);
  void aarch64_elf_asm_named_section (const char *, unsigned, tree);
 +void aarch64_err_no_fpadvsimd (machine_mode, const char *);
  void aarch64_expand_epilogue (bool);
  void aarch64_expand_mov_immediate (rtx, rtx);
  void aarch64_expand_prologue (void);
 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
 index 
 a79bb6a96572799181a5bff3c3818e294f87cb7a..3193a15970e5524e0f3a8a5505baea5582e55731
  100644
 --- a/gcc/config/aarch64/aarch64.c
 +++ b/gcc/config/aarch64/aarch64.c
 @@ -522,6 +522,16 @@ static const char * const aarch64_condition_codes[] =
hi, ls, ge, lt, gt, le, al, nv
  };
  
 +void
 +aarch64_err_no_fpadvsimd (machine_mode mode, const char *msg)
 +{
 +  const char *mc = FLOAT_MODE_P (mode) ? floating-point : vector;
 +  if (TARGET_GENERAL_REGS_ONLY)
 +error (%qs is incompatible with %s %s, -mgeneral-regs-only, mc, msg);
 +  else
 +error (%qs feature modifier is incompatible with %s %s, +nofp, mc, 
 msg);
 +}
 +
  static unsigned int
  aarch64_min_divisions_for_recip_mul (enum machine_mode mode)
  {
 @@ -1772,6 +1782,9 @@ aarch64_layout_arg (cumulative_args_t pcum_v, 
 machine_mode mode,
   and homogenous short-vector aggregates (HVA).  */
if (allocate_nvrn)
  {
 +  if (!TARGET_FLOAT)
 + aarch64_err_no_fpadvsimd (mode, argument);
 +
if (nvrn + nregs = NUM_FP_ARG_REGS)
   {
 pcum-aapcs_nextnvrn = nvrn + nregs;
 @@ -1898,6 +1911,17 @@ aarch64_init_cumulative_args (CUMULATIVE_ARGS *pcum,
pcum-aapcs_stack_words = 0;
pcum-aapcs_stack_size = 0;
  
 +  if (!TARGET_FLOAT
 +   fndecl  TREE_PUBLIC (fndecl)
 +   fntype  fntype != error_mark_node)
 +{
 +  const_tree type = TREE_TYPE (fntype);
 +  machine_mode mode ATTRIBUTE_UNUSED; /* To pass pointer as argument.  */
 +  int nregs ATTRIBUTE_UNUSED; /* Likewise.  */
 +  if (aarch64_vfp_is_call_or_return_candidate (TYPE_MODE (type), type,
 +mode, nregs, NULL))
 + aarch64_err_no_fpadvsimd (TYPE_MODE (type), return type);
 +}
return;
  }
  
 @@ -7557,9 +7581,7 @@ aarch64_expand_builtin_va_start (tree valist, rtx 
 nextarg ATTRIBUTE_UNUSED)
  
if (!TARGET_FLOAT)
  {
 -  if (cum-aapcs_nvrn  0)
 - sorry (%qs and floating point or vector arguments,
 --mgeneral-regs-only);
 +  gcc_assert (cum-aapcs_nvrn == 0);
vr_save_area_size = 0;
  }
  
 @@ -7666,8 +7688,7 @@ aarch64_gimplify_va_arg_expr (tree valist, tree type, 
 gimple_seq *pre_p,
  {
/* TYPE passed in fp/simd registers.  */
if (!TARGET_FLOAT)
 - sorry (%qs and floating point or vector arguments,
 --mgeneral-regs-only);
 + aarch64_err_no_fpadvsimd (mode, varargs);
  
f_top = build3 (COMPONENT_REF, TREE_TYPE (f_vrtop),
 unshare_expr (valist), f_vrtop, NULL_TREE);
 @@ -7904,9 +7925,7 @@ aarch64_setup_incoming_varargs (cumulative_args_t 
 cum_v, machine_mode mode,
  
if (!TARGET_FLOAT)
  {
 -  if (local_cum.aapcs_nvrn  0)
 - sorry (%qs and floating point or vector arguments,
 --mgeneral-regs-only);
 +  gcc_assert (local_cum.aapcs_nvrn == 0);
vr_saved = 0;
  }
  
 diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
 index 
 1efe57c91b10e47ab7511d089f7b4bb53f18f06e..a9c41fdfee64784591a4a7360652d7da3e7f90d1
  100644
 --- a/gcc/config/aarch64/aarch64.md
 +++ b/gcc/config/aarch64/aarch64.md
 @@ -979,16 +979,16 @@
[(set (match_operand:GPF 0 

Re: [C++17] Implement N3928 - Extending static_assert

2015-06-24 Thread Ed Smith-Rowland

On 06/17/2015 03:22 PM, Jason Merrill wrote:

On 06/17/2015 01:53 PM, Ed Smith-Rowland wrote:

I tried the obvious: an error message with %qE and got 'false'.
constexpr values are evaluated early on.

Is there a possibility that late folding could help or is that
completely different?


Late folding could help, but I think handling it in libcpp (by 
actually stringizing the argument) would work better.


Jason


OK, Doing the bit with libcpp and getting a better message is taking 
longer than I thought it would.

I'm going ahead with what I had back in May.
I'm still working on something better - it'll just take a hot minute.

Meanwhile, this takes care of an annoyance for many.

Committed 224903.

Ed

cp/

2015-06-24  Edward Smith-Rowland  3dw...@verizon.net

Implement N3928 - Extending static_assert
* parser.c (cp_parser_static_assert): Support static_assert with
no message string.  Supply an empty string in this case.
* semantics.c (finish_static_assert): Don't try to print a message if
the message string is empty.


testsuite/

2015-06-24  Edward Smith-Rowland  3dw...@verizon.net

Implement N3928 - Extending static_assert
* g++.dg/cpp0x/static_assert8.C: Adjust.
* g++.dg/cpp0x/static_assert12.C: New.
* g++.dg/cpp0x/static_assert13.C: New.
* g++.dg/cpp1y/static_assert1.C: New.
* g++.dg/cpp1y/static_assert2.C: New.
* g++.dg/cpp1z/static_assert-nomsg.C: New.

Index: cp/parser.c
===
--- cp/parser.c (revision 224897)
+++ cp/parser.c (working copy)
@@ -12173,6 +12173,7 @@
 
static_assert-declaration:
  static_assert ( constant-expression , string-literal ) ; 
+ static_assert ( constant-expression ) ; (C++1Z)
 
If MEMBER_P, this static_assert is a class member.  */
 
@@ -12210,20 +12211,35 @@
/*allow_non_constant_p=*/true,
/*non_constant_p=*/dummy);
 
-  /* Parse the separating `,'.  */
-  cp_parser_require (parser, CPP_COMMA, RT_COMMA);
+  if (cp_lexer_peek_token (parser-lexer)-type == CPP_CLOSE_PAREN)
+{
+  if (cxx_dialect  cxx1z)
+   pedwarn (input_location, OPT_Wpedantic,
+static_assert without a message 
+only available with -std=c++1z or -std=gnu++1z);
+  /* Eat the ')'  */
+  cp_lexer_consume_token (parser-lexer);
+  message = build_string (1, );
+  TREE_TYPE (message) = char_array_type_node;
+  fix_string_type (message);
+}
+  else
+{
+  /* Parse the separating `,'.  */
+  cp_parser_require (parser, CPP_COMMA, RT_COMMA);
 
-  /* Parse the string-literal message.  */
-  message = cp_parser_string_literal (parser, 
-  /*translate=*/false,
-  /*wide_ok=*/true);
+  /* Parse the string-literal message.  */
+  message = cp_parser_string_literal (parser, 
+ /*translate=*/false,
+ /*wide_ok=*/true);
 
-  /* A `)' completes the static assertion.  */
-  if (!cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-cp_parser_skip_to_closing_parenthesis (parser, 
-   /*recovering=*/true, 
-   /*or_comma=*/false,
-  /*consume_paren=*/true);
+  /* A `)' completes the static assertion.  */
+  if (!cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
+   cp_parser_skip_to_closing_parenthesis (parser, 
+   /*recovering=*/true, 
+   /*or_comma=*/false,
+  /*consume_paren=*/true);
+}
 
   /* A semicolon terminates the declaration.  */
   cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 224897)
+++ cp/semantics.c  (working copy)
@@ -7174,8 +7174,17 @@
   input_location = location;
   if (TREE_CODE (condition) == INTEGER_CST 
integer_zerop (condition))
-/* Report the error. */
-error (static assertion failed: %s, TREE_STRING_POINTER (message));
+   {
+ int sz = TREE_INT_CST_LOW (TYPE_SIZE_UNIT
+(TREE_TYPE (TREE_TYPE (message;
+ int len = TREE_STRING_LENGTH (message) / sz - 1;
+  /* Report the error. */
+ if (len == 0)
+error (static assertion failed);
+ else
+error (static assertion failed: %s,
+  TREE_STRING_POINTER (message));
+   }
   else if (condition  condition != error_mark_node)
{
  error (non-constant condition for static assertion);
Index: 

[patch] PR debug/66653: avoid late_global_decl on decl_type_context()s

2015-06-24 Thread Aldy Hernandez
The problem here is that we are trying to call 
dwarf2out_late_global_decl() on a static variable in a template which 
has a type of TEMPLATE_TYPE_PARM:


template typename T class A
{
  static __thread T a;
};

We are calling late_global_decl because we are about to remove the 
unused static from the symbol table:


  /* See if the debugger can use anything before the DECL
 passes away.  Perhaps it can notice a DECL that is now a
 constant and can tag the early DIE with an appropriate
 attribute.

 Otherwise, this is the last chance the debug_hooks have
 at looking at optimized away DECLs, since
 late_global_decl will subsequently be called from the
 contents of the now pruned symbol table.  */
  if (!decl_function_context (node-decl))
(*debug_hooks-late_global_decl) (node-decl);

Since gen_type_die_with_usage() cannot handle TEMPLATE_TYPE_PARMs we ICE.

I think we need to avoid calling late_global_decl on DECL's for which 
decl_type_context() is true, similarly to what we do for the call to 
early_global_decl in rest_of_decl_compilation:


   !decl_function_context (decl)
   !current_function_decl
   DECL_SOURCE_LOCATION (decl) != BUILTINS_LOCATION
   !decl_type_context (decl))
(*debug_hooks-early_global_decl) (decl);

Presumably the old code did not run into this problem because the 
TEMPLATE_TYPE_PARMs had been lowered by the time dwarf2out_decl was 
called, but here we are calling late_global_decl relatively early.


The attached patch fixes the problem.

Tested with --enable-languages=all.  Ada had other issues, so I skipped it.

OK for mainline?
commit 302f9976c53aa09e431bd54f37dbfeaa2c6b2acc
Author: Aldy Hernandez al...@redhat.com
Date:   Wed Jun 24 20:04:09 2015 -0700

PR debug/66653
* cgraphunit.c (analyze_functions): Do not call
debug_hooks-late_global_decl when decl_type_context.

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 066a155..d2974ad 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1149,7 +1149,8 @@ analyze_functions (bool first_time)
 at looking at optimized away DECLs, since
 late_global_decl will subsequently be called from the
 contents of the now pruned symbol table.  */
- if (!decl_function_context (node-decl))
+ if (!decl_function_context (node-decl)
+  !decl_type_context (node-decl))
(*debug_hooks-late_global_decl) (node-decl);
 
  node-remove ();
diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/pr66653.C 
b/gcc/testsuite/g++.dg/debug/dwarf2/pr66653.C
new file mode 100644
index 000..bcaaf88
--- /dev/null
+++ b/gcc/testsuite/g++.dg/debug/dwarf2/pr66653.C
@@ -0,0 +1,8 @@
+// PR debug/54508
+// { dg-do compile }
+// { dg-options -g }
+
+template typename T class A
+{
+  static __thread T a;
+};


Re: [PATCH][GSoC] Extend shared_ptr to support arrays

2015-06-24 Thread Tim Shen
On Wed, Jun 24, 2015 at 10:33 AM, Fan You youfan.n...@gmail.com wrote:
 Hi,

 Here is the revised patch including all the test case.

 This can also be seen at https://github.com/Noeyfan/gcc-1 on branch
 shared_arrays

 Any comments?

I ran `git diff c7248656569bb0b4549f5c1ed347f7e028a15664
90aff5632fd9f3044d53ce190ae99fb69c41ce49`.

To systematically detect consecutive spaces (to convert them to tabs),
I'll just simply do:
`egrep ^\t* {8} shared_ptr*`

-   = typename conditionalis_array_Tp::value, _Array_Deleter,
_Normal_Deleter::type;
+   = typename conditionalis_array_Tp
+   ::value, _Array_Deleter, _Normal_Deleter::type;
Tabs. Also, I personally prefer to put '::value' on the same line as
is_array_Tp.

-  using __base_type = __shared_ptrelement_type;
+  using __Base_type = __shared_ptrelement_type;
_Base_type, not __Base_type. Also, the most commonly used one is _Base:

...src/gcc/libstdc++-v3 % grep -r '_Base[a-zA-Z_0-9]*' . -o | grep
':.*$' -o|sort|uniq -c
   2350 :_Base
  1 :_Base_biteset
 62 :_Base_bitset
120 :_Base_const_iterator
 20 :_Base_const_local_iterator
  4 :_Based
177 :_Base_iterator
  1 :_Base_Iterator
  8 :_Base_local_iterator
 21 :_Base_manager
133 :_Base_ptr
  9 :_Base_ref
  2 :_BaseSequence
173 :_Base_type
  3 :_BaseType

-   : __base_type(__p, _Deleter_type())
+   : __Base_type(__p, _Deleter_type())
Please be aware of tabs.

-  templatetypename _Tp1, typename = _Convertible_Tp1*
+  templatetypename _Tp1, typename = _Compatible_Tp1
__shared_ptr(__shared_ptr__libfund_v1_Tp1, _Lp __r) noexcept
-: __base_type(std::move(__r))
+: __Base_type(static_casttypename __shared_ptr
+  __libfund_v1_Tp1::__Base_type(std::move(__r)))
static_casttypename __shared_ptr__libfund_v1_Tp1::__Base_type(__r)
is enough, since std::move is actually a static_cast to rvalue reference.

Alternatively, you may define a template alias for the static_cast, if
you find it too long.

-   operator=(const __shared_ptr__libfund_v1_Tp1, _Lp __r) noexcept
+   operator=(const __shared_ptr_Tp1, _Lp __r) noexcept
Why?

templatetypename _Tp
  inline bool
  operator(const shared_ptr_Tp __a, nullptr_t) noexcept
- {
+ {
using _Tp_RE = typename remove_extent_Tp::type;
-   return std::less_Tp_RE()(__a.get(), nullptr);
+   return std::less_Tp_RE()(__a.get(), nullptr);
  }
using _Tp_RE = typename shared_ptr_Tp::element_type;



-- 
Regards,
Tim Shen


Re: [patch 4/5] Remove cgraph.h dependence on hard-reg-set.h

2015-06-24 Thread Jeff Law

On 06/16/2015 11:20 AM, Andrew MacLeod wrote:

cgraph.h requires hard-reg-set.h in order to compile simply because the
cgraph_rtl_info structure contains a HARD_REG_SET element.

All accesses to this structure are already handled by returning a
pointer to the structure within the cgraph_node.  By moving the
definition of struct cgraph_rtl_info into rtl.h and maintaining a pointer
to it instead of the structure within cgraph_node,  the compilation
requirement on hard-reg-set.h can be completely removed when including
cgraph.h.  This will hopefully help prevent bringing hard-reg-set and
tm.h into a number of source files.

  The structure in rtl.h is protected by  checking for HARD_CONST (which
many other things in rtl.h do). This is mostly so generator files won't
trip over them.  2 source files needed adjustment because they didn't
include hard-reg-set.h before rtl.h.  I guess they never referenced the
other things protected by HARD_CONST in the file.  This ordering issue
should shortly be resolved by an include grouping.

Bootstraps on x86_64-unknown-linux-gnu with no new regressions. Also
passes all the targets in config-list.mk

OK for trunk?

OK.
jeff



Re: [PATCH IRA] save a bitmap check

2015-06-24 Thread Jeff Law

On 06/24/2015 03:54 AM, Zhouyi Zhou wrote:


In function assign_hard_reg, checking the bit of conflict_a in
consideration_allocno_bitmap is unnecessary, because when retry_p is
false, conflicting objects are always inside of the same loop_node
(this is ensured in function process_bb_node_lives, which marks the
live objects as dead near the end of that function).



Bootstrap and regtest scheduled on x86_64 GNU/Linux
Signed-off-by: Zhouyi Zhou yizhouz...@ict.ac.cn
---
  gcc/ChangeLog   | 4 
  gcc/ira-color.c | 6 ++
  2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d1f82b2..07605ae 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2015-06-24  Zhouyi Zhou  yizhouz...@ict.ac.cn
+
+   * ira-color.c (assign_hard_reg): save a bitmap check
My concern here is the invariant you're exploiting to eliminate the 
redundant bitmap check is far from obvious and there's no good way I can 
see to ensure that invariant remains invariant.


Without some solid performance data indicating this is a notable 
compile-time improvement, I don't think it's a wise idea.


If it does turn out that this is a noteworthy compile-time improvement, 
then you would need a comment before this conditional explaining in 
detail why we don't need to check for conflict's allocno in 
consideration_allocno_bitmap.


Jeff


Re: [05/13] Add nofree_ptr_hash

2015-06-24 Thread Jeff Law

On 06/24/2015 02:23 AM, Richard Sandiford wrote:

Jeff Law l...@redhat.com writes:

So I'm holding off on approving this one pending further discussion of
the use of multiple inheritance for nofree_ptr_hash.


I thought that might be controversial. :-)  My two main defences are:

1) This is multiple inheritance of traits classes, which all just have
static member functions, rather than multiple inheritance of data-
carrying classes.  It's really just a union of two separate groups
of functions.
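
Concretely, the shape under discussion is no more than this (illustrative
names only, not the real traits from hash-traits.h):

#include <stdint.h>

/* Two stateless traits classes: only static member functions, no data.  */
struct pointer_hash_ops
{
  static unsigned hash (const void *p) { return (unsigned) (uintptr_t) p; }
  static bool equal (const void *a, const void *b) { return a == b; }
};

struct no_remove_ops
{
  static void remove (const void *) { /* nothing to free */ }
};

/* The derived traits class carries no state of its own; multiple
   inheritance here just merges the two groups of static functions
   into a single scope.  */
struct nofree_pointer_ops : pointer_hash_ops, no_remove_ops {};
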
As I was thinking about this during review I almost convinced myself 
that multiple inheritance from traits classes ought to be acceptable.


As you state, they don't carry data and we're just getting a union of 
their functions.  One could probably even argue that traits classes by 
their nature are designed to be composed with other traits and classes.


I'm (obviously) not as well versed in this stuff as I ought to be, hence 
my conservatism.  It'd be real helpful if folks with more real world 
experience in this space could chime in on the pros/cons of this approach.


If we do go forward, ISTM updating our coding conventions to codify this 
exception to the avoid-MI rule would be wise.  And my inclination is to go 
forward, but let's give other folks a chance to chime in.



Jeff


Re: [PATCH][RFC] Add FRE in pass_vectorize

2015-06-24 Thread Jeff Law

On 06/24/2015 01:59 AM, Richard Biener wrote:

Redundant, basically two IVs with the same initial value and same step.
IVOPTs can deal with this if the initial values and the step are already
same enough - the vectorizer can end up generating redundant huge
expressions for both.
Ah, so yes, this is a totally different issue than Alan and I are 
discussing.



RTL CSE is bloody expensive and so many times I wanted the ability to know a
bit about what the loop optimizer had done (or not done) so that I could
conditionally skip the second CSE pass.   We never built that, but it's
something I've wanted for decades.


Hmm, ok.  We can abuse pass properties for this but I don't think
they are a scalable fit.  Not sure if we'd like to go the full way,
adding something like PROP_want_ccp, PROP_want_copyprop, PROP_want_cse, etc.
(any others?).  And whether FRE would then catch a PROP_want_copyprop
because it also can do copy propagation.
And that's why we haven't pushed hard on this issue -- it doesn't scale 
and to make it scale requires rethinking the basics of the pass manager.




Going a bit further here, esp. in the loop context, would be to
have the basic cleanups be region-based.  Because given a big
function with many loops and just one vectorized it would be
enough to cleanup the vectorized loop (yes, and in theory
all downstream effects, but that's probably secondary and not
so important).  It's not too difficult to make FRE run on
a MEME region; the interesting part, engineering-wise, is to
really make it O(size of MEME region) - that is, eliminate
things like O(num_ssa_names) or O(n_basic_blocks) setup cost.
I had a long talk with some of the SGI compiler guys many years ago 
about region-based optimizations.  It was something they had been trying 
to bring into their compiler for years, but never got it working to a 
point where they were happy with it.  While they didn't show me the 
code, they indicated the changes were highly invasive -- and all the 
code had been #ifdef'd out because it just didn't work.  Naturally it 
was all bitrotting.









And then there is the possibility of making passes generate less
needs to perform cleanups after them - like in the present case
with the redundant IVs, make them more apparently redundant by
CSEing the initial value and step during vectorizer code generation.
I'm playing with the idea of adding a simple CSE machinery to
the gimple_build () interface (aka match-and-simplify).  It
eventually invokes (well, not currently, but that can be fixed)
maybe_push_res_to_seq which is a good place to maintain a
table of already generated expressions.  That of course only
works if you either always append to the same sequence or at least
insert at the same place.
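
As a toy illustration of that idea (nothing GCC-specific; the container and
names are made up), the builder keeps a table keyed on the operands of each
expression it is asked to emit and hands back the previously created
temporary when the same expression is requested again:

#include <map>
#include <string>
#include <tuple>

struct seq_builder
{
  int next_tmp = 0;
  std::map<std::tuple<std::string, int, int>, int> cse_table;

  /* Emit CODE (OP0, OP1) into the sequence, reusing an earlier result
     if the identical expression was already generated.  */
  int emit (const std::string &code, int op0, int op1)
  {
    auto key = std::make_tuple (code, op0, op1);
    auto it = cse_table.find (key);
    if (it != cse_table.end ())
      return it->second;        /* redundant: reuse the old temporary */
    int tmp = next_tmp++;       /* otherwise build a new statement ... */
    cse_table[key] = tmp;       /* ... and remember it for later calls */
    return tmp;
  }
};
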
As you know we've gone back and forth on this in the past.  It's always 
a trade-off.  I still ponder from time to time putting the simple CSE 
and cprop bits back into the SSA rewriting phase to avoid generating all 
kinds of garbage that just needs to be cleaned up later -- particularly 
for incremental SSA updates.




Jeff


Re: [PATCH] i386: Do not modify existing RTL (PR66412)

2015-06-24 Thread Jeff Law

On 06/24/2015 05:29 PM, Segher Boessenkool wrote:

A few define_split's in the i386 backend modify RTL in place.  This does
not work.  This patch fixes all cases that do PUT_MODE on existing RTL.

Bootstrapped and tested; no regressions.  Is this okay for trunk?

Hrm, this wants the testcase in that PR added I suppose.  Will send
it separately.


Segher


2015-06-24  Segher Boessenkool  seg...@kernel.crashing.org

* config/i386/i386.md (various splitters): Use copy_rtx before
doing PUT_MODE on operands.
Are the copies really needed?  If we're slamming a mode into an 
ix86_comparison_operator, we should be safe since those can't be shared. 
 Copying is just wasteful.


Jeff


Re: [Patch SRA] Fix PR66119 by calling get_move_ratio in SRA

2015-06-24 Thread Jeff Law

On 06/23/2015 09:42 AM, James Greenhalgh wrote:


On Tue, Jun 23, 2015 at 09:52:01AM +0100, Jakub Jelinek wrote:

On Tue, Jun 23, 2015 at 09:18:52AM +0100, James Greenhalgh wrote:

This patch fixes the issue by always calling get_move_ratio in the SRA
code, ensuring that an up-to-date value is used.

Unfortunately, this means we have to use 0 as a sentinel value for
the parameter - indicating no user override of the feature - and
therefore cannot use it to disable scalarization. However, there
are other ways to disable scalarization (-fno-tree-sra), so this is not
a great loss.


You can handle even that.



snip


   enum compiler_param param
 = optimize_function_for_size_p (cfun)
   ? PARAM_SRA_MAX_SCALARIZATION_SIZE_SIZE
   : PARAM_SRA_MAX_SCALARIZATION_SIZE_SPEED;
   unsigned max_scalarization_size = PARAM_VALUE (param) * BITS_PER_UNIT;
   if (!max_scalarization_size  !global_options_set.x_param_values[param])

Then it will handle explicit --param sra-max-scalarization-size-Os*=0
differently from implicit 0.
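
(In other words, with the respun patch something like
--param sra-max-scalarization-size-Ospeed=0 should still disable
scalarization explicitly, while simply omitting the param now falls back
to the get_move_ratio target default instead of a hard-coded value.)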


Ah hah! OK, I've respun the patch removing this extra justification in
the documentation and reshuffling the logic a little.


OT, shouldn't max_scalarization_size be at least unsigned HOST_WIDE_INT,
so that it doesn't overflow for larger values (0x4000 etc.)?
Probably need some cast in the multiplication to avoid UB in the compiler.


I've increased the size of max_scalarization_size to a UHWI in this spin.

Bootstrapped and tested on AArch64 and x86-64 with no issues and checked
to see the PR is fixed.

OK for trunk, and gcc-5 in a few days?

Thanks,
James

---
gcc/

2015-06-23  James Greenhalgh  james.greenha...@arm.com

PR tree-optimization/66119
* toplev.c (process_options): Don't set up default values for
the sra_max_scalarization_size_{speed,size} parameters.
* tree-sra (analyze_all_variable_accesses): If no values
have been set for the sra_max_scalarization_size_{speed,size}
parameters, call get_move_ratio to get target defaults.

Any testcase for this change?

OK with a testcase.

jeff



Re: [PATCH v2] Rerun loop-header-copying just before vectorization

2015-06-24 Thread Jeff Law

On 06/19/2015 11:32 AM, Alan Lawrence wrote:

This is a respin of
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02139.html . Changes are:

* Separate the two passes by descending from a common base class,
allowing different predicates;
* Test flag_tree_vectorize, and loop-force_vectorize/dont_vectorize
- this fixes the test failing before;
* Simplify the check for code after exit edge;
* Revert unnecessary changes to pass_tree_loop_init::execute;
* Revert change to slp-perm-7 test (following fix by Marc Glisse)
So FWIW, if you don't want to make this a separate pass, you'd probably 
want the code which allows us to run the phi-only propagator as a 
subroutine to propagate and eliminate those degenerate PHIs.  I posted 
it a year or two ago, but went a different direction to solve whatever 
issue I was looking at.


I'm comfortable with this as a separate pass and relying on cfg cleanups 
to handle this stuff for us as this implementation of your patch 
currently does.





Bootstrapped + check-gcc on aarch64 and x86_64 (linux).

gcc/ChangeLog:

 * tree-pass.h (make_pass_ch_vect): New.
 * passes.def: Add pass_ch_vect just before pass_if_conversion.

 * tree-ssa-loop-ch.c (pass_ch_base, pass_ch_vect, pass_data_ch_vect,
 pass_ch::process_loop_p): New.
 (pass_ch): Extend pass_ch_base.

 (pass_ch::execute): Move all but loop_optimizer_init/finalize to...
 (pass_ch_base::execute): ...here.

gcc/testsuite/ChangeLog:

 * gcc.dg/vect/vect-strided-a-u16-i4.c (main1): Narrow scope of
 unsigned x,y,z,w.
 * gcc.dg/vect/vect-ifcvt-11.c: New.
Can you add a function comment to ch_base::copy_headers.  I know it 
didn't have one before, but it really should have one.


I'd also add a comment to the execute methods.  pass_ch initializes and 
finalizes loop structures while pass_ch_vect::execute assumes the loop 
structures are already initialized and finalization is assumed to be 
handled earlier in the call chain.


I'd also suggest a comment to the process_loop_p method.


+
+  /* Apply copying if the exit block looks to have code after it.  */
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, exit-src-succs)
+if (!loop_exit_edge_p (loop, e)
+e-dest != loop-header
+e-dest != loop-latch)
+  return true; /* Block with exit edge has code after it.  */
Don't put comments on the same line as code.  Instead I'd suggest 
describing the CFG pattern your looking for as part of the comment 
before the loop over the edges.



With those comment fixes, this is OK for the trunk.

jeff


RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-06-24 Thread Kumar, Venkataramanan
Hi, 

If I understand correctly, the current implementation replaces 

fdiv 
fsqrt

 by  
 frsqrte
for i=0 to 3
fmul
frsqrts  
fmul

So I think the gains depend on the latency of the frsqrts insn; a rough
scalar model of one refinement step is sketched below.
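
For illustration (textbook recurrence only; FRSQRTS(a,b) computes
(3 - a*b)/2 and y0 stands for the frsqrte estimate, so the helper names
are mine, not the patch's):

/* One Newton-Raphson step for 1/sqrt(x): fmul, frsqrts, fmul, each
   depending on the result of the previous operation.  */
static double
rsqrt_step (double x, double y0)
{
  double t = x * y0;                 /* fmul    */
  double s = (3.0 - t * y0) / 2.0;   /* frsqrts */
  return y0 * s;                     /* fmul    */
}

/* Three such steps are chained for double (two for float), so the
   critical path is roughly 3 * (fmul + frsqrts + fmul) on top of the
   initial frsqrte.  */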

I see the patch has patterns for the vector versions of frsqrts, but does not
enable them?

Regards,
Venkat.

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Dr. Philipp Tomsich
 Sent: Wednesday, June 24, 2015 10:22 PM
 To: Evandro Menezes
 Cc: Benedikt Huber; gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
 estimation in -ffast-math
 
 Evandro,
 
 We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar) reciprocal
 sqrt.
 
 Also, the “reciprocal divide” patches are floating around in various of our 
 git-
 tree, but aren’t ready for public consumption, yet… I’ll leave Benedikt to
 comment on potential timelines for getting that pushed out.
 
 Best,
 Philipp.
 
  On 24 Jun 2015, at 18:42, Evandro Menezes e.mene...@samsung.com
 wrote:
 
  Benedikt,
 
  You beat me to it! :-)  Do you have the implementation for dividing
  using the Newton series as well?
 
  I'm not sure that the series is always for all data types and on all
  processors.  It would be useful to allow each AArch64 processor to
  enable this or not depending on the data type.  BTW, do you have some
  tests showing the speed up?
 
  Thank you,
 
  --
  Evandro Menezes  Austin, TX
 
  -Original Message-
  From: gcc-patches-ow...@gcc.gnu.org
  [mailto:gcc-patches-ow...@gcc.gnu.org]
  On
  Behalf Of Benedikt Huber
  Sent: Thursday, June 18, 2015 7:04
  To: gcc-patches@gcc.gnu.org
  Cc: benedikt.hu...@theobroma-systems.com;
 philipp.tomsich@theobroma-
  systems.com
  Subject: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
  estimation in -ffast-math
 
  AArch64 offers the instructions frsqrte and frsqrts, for rsqrt
  estimation
  and
  a Newton-Raphson step, respectively.
  There are ARMv8 implementations where this is faster than using fdiv
  and rsqrt.
  It runs three steps for double and two steps for float to achieve the
  needed
  precision.
 
  There is one caveat and open question.
  Since -ffast-math enables flush-to-zero, intermediate values between
  approximation steps will be flushed to zero if they are denormal.
  E.g. this happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
  The test cases pass, but it is unclear to me whether this is expected
  behavior with -ffast-math.
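
(For concreteness, rough numbers for that case, assuming the squared
estimate is formed as an intermediate: for x = DBL_MAX ~ 1.8e308 the
estimate is y0 ~ 7.5e-155, so y0*y0 ~ 5.6e-309 is below DBL_MIN ~ 2.2e-308;
with flush-to-zero it becomes 0.0 and the first step degenerates to
y1 = y0 * (3 - x*0)/2 = 1.5*y0 instead of converging.)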
 
  The patch applies to commit:
  svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
 
  Please consider including this patch.
  Thank you and best regards,
  Benedikt Huber
 
  Benedikt Huber (1):
   2015-06-15  Benedikt Huber  benedikt.huber@theobroma-
 systems.com
 
  gcc/ChangeLog|   9 +++
  gcc/config/aarch64/aarch64-builtins.c|  60 
  gcc/config/aarch64/aarch64-protos.h  |   2 +
  gcc/config/aarch64/aarch64-simd.md   |  27 
  gcc/config/aarch64/aarch64.c |  63 +
  gcc/config/aarch64/aarch64.md|   3 +
  gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113
  +++
  7 files changed, 277 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
 
  --
  1.9.1
  Mail Attachment.eml



Re: Remove redundant AND from count reduction loop

2015-06-24 Thread Jeff Law

On 06/24/2015 05:43 AM, Richard Biener wrote:


Note that ISTR code performing exactly the opposite transform in
fold-const.c ...


That's another reason why I'm worried about just doing the (negate ...)
thing without knowing whether the negate can be folded into anything else.


I'm not aware of anything here.
It's worth looking at -- I've certainly seen cases where we end up 
infinite recursion because we've got a transformation in one place (say 
match.pd) and its inverse elsewhere (fold-const.c).


Jeff


Re: [patch] fix regrename pass to ensure renamings produce valid insns

2015-06-24 Thread Eric Botcazou
 Like this?  I tested this on nios2 and x86_64-linux-gnu, as before, plus
 built for aarch64-linux-gnu and ran the gcc testsuite.

Yes, the patch is OK, modulo...

 The c6x back end also calls regrename_do_replace.  I am not set up to
 build or test on that target, and Bernd told me off-list that it would
 never fail on that target anyway so I have left that code alone.

... Bernd has obviously the final say here, but it would be better to add an 
assertion that it indeed did not fail (just build the cc1 as a sanity check).

Thanks for adding the missing head comment to regrename_do_replace.

-- 
Eric Botcazou


RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-06-24 Thread Evandro Menezes
Benedikt,

You beat me to it! :-)  Do you have the implementation for dividing using
the Newton series as well?

I'm not sure that the series is always profitable for all data types and on all
processors.  It would be useful to allow each AArch64 processor to enable
this or not depending on the data type.  BTW, do you have some tests showing
the speed up?

Thank you,

-- 
Evandro Menezes  Austin, TX

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
On
 Behalf Of Benedikt Huber
 Sent: Thursday, June 18, 2015 7:04
 To: gcc-patches@gcc.gnu.org
 Cc: benedikt.hu...@theobroma-systems.com; philipp.tomsich@theobroma-
 systems.com
 Subject: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
 estimation in -ffast-math
 
  AArch64 offers the instructions frsqrte and frsqrts, for rsqrt estimation
and
 a Newton-Raphson step, respectively.
 There are ARMv8 implementations where this is faster than using fdiv and
 rsqrt.
 It runs three steps for double and two steps for float to achieve the
needed
 precision.
 
 There is one caveat and open question.
 Since -ffast-math enables flush to zero intermediate values between
 approximation steps will be flushed to zero if they are denormal.
 E.g. This happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
 The test cases pass, but it is unclear to me whether this is expected
 behavior with -ffast-math.
 
 The patch applies to commit:
 svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
 
 Please consider including this patch.
 Thank you and best regards,
 Benedikt Huber
 
 Benedikt Huber (1):
   2015-06-15  Benedikt Huber  benedikt.hu...@theobroma-systems.com
 
  gcc/ChangeLog|   9 +++
  gcc/config/aarch64/aarch64-builtins.c|  60 
  gcc/config/aarch64/aarch64-protos.h  |   2 +
  gcc/config/aarch64/aarch64-simd.md   |  27 
  gcc/config/aarch64/aarch64.c |  63 +
  gcc/config/aarch64/aarch64.md|   3 +
  gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113
 +++
  7 files changed, 277 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
 
 --
 1.9.1
---BeginMessage---
   * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and rsqrtf.
   * config/aarch64/aarch64-protos.h: Declare.
   * config/aarch64/aarch64-simd.md: Matching expressions for frsqrte
and frsqrts.
   * config/aarch64/aarch64.c: New functions. Emit rsqrt estimation code
in fast math mode.
   * config/aarch64/aarch64.md: Added enum entry.
   * testsuite/gcc.target/aarch64/rsqrt.c: Tests for single and double.
---
 gcc/ChangeLog|   9 +++
 gcc/config/aarch64/aarch64-builtins.c|  60 
 gcc/config/aarch64/aarch64-protos.h  |   2 +
 gcc/config/aarch64/aarch64-simd.md   |  27 
 gcc/config/aarch64/aarch64.c |  63 +
 gcc/config/aarch64/aarch64.md|   3 +
 gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113
+++
 7 files changed, 277 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index c9b156f..690ebba 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2015-06-15  Benedikt Huber  benedikt.hu...@theobroma-systems.com
+
+   * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and rsqrtf.
+   * config/aarch64/aarch64-protos.h: Declare.
+   * config/aarch64/aarch64-simd.md: Matching expressions for frsqrte
and frsqrts.
+   * config/aarch64/aarch64.c: New functions. Emit rsqrt estimation
code in fast math mode.
+   * config/aarch64/aarch64.md: Added enum entry.
+   * testsuite/gcc.target/aarch64/rsqrt.c: Tests for single and double.
+
 2015-06-14  Richard Sandiford  richard.sandif...@arm.com
 
* rtl.h (classify_insn): Declare.
diff --git a/gcc/config/aarch64/aarch64-builtins.c
b/gcc/config/aarch64/aarch64-builtins.c
index f7a39ec..484bb84 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -342,6 +342,8 @@ enum aarch64_builtins
   AARCH64_BUILTIN_GET_FPSR,
   AARCH64_BUILTIN_SET_FPSR,
 
+  AARCH64_BUILTIN_RSQRT,
+  AARCH64_BUILTIN_RSQRTF,
   AARCH64_SIMD_BUILTIN_BASE,
   AARCH64_SIMD_BUILTIN_LANE_CHECK,
 #include aarch64-simd-builtins.def
@@ -831,6 +833,32 @@ aarch64_init_crc32_builtins ()
 }
 
 void
+aarch64_add_builtin_rsqrt (void)
+{
+  tree fndecl = NULL;
+  tree ftype = NULL;
+  ftype = build_function_type_list (double_type_node, double_type_node,
NULL_TREE);
+
+  fndecl = add_builtin_function (__builtin_aarch64_rsqrt,
+ ftype,
+ AARCH64_BUILTIN_RSQRT,
+ BUILT_IN_MD,
+ NULL,
+ NULL_TREE);
+  aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT] = 

Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-06-24 Thread Dr. Philipp Tomsich
Evandro,

We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar) reciprocal sqrt.

Also, the “reciprocal divide” patches are floating around in various of our 
git-tree, but 
aren’t ready for public consumption, yet… I’ll leave Benedikt to comment on 
potential 
timelines for getting that pushed out.

Best,
Philipp.

 On 24 Jun 2015, at 18:42, Evandro Menezes e.mene...@samsung.com wrote:
 
 Benedikt,
 
 You beat me to it! :-)  Do you have the implementation for dividing using
 the Newton series as well?
 
 I'm not sure that the series is always for all data types and on all
 processors.  It would be useful to allow each AArch64 processor to enable
 this or not depending on the data type.  BTW, do you have some tests showing
 the speed up?
 
 Thank you,
 
 -- 
 Evandro Menezes  Austin, TX
 
 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
 On
 Behalf Of Benedikt Huber
 Sent: Thursday, June 18, 2015 7:04
 To: gcc-patches@gcc.gnu.org
 Cc: benedikt.hu...@theobroma-systems.com; philipp.tomsich@theobroma-
 systems.com
 Subject: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
 estimation in -ffast-math
 
  AArch64 offers the instructions frsqrte and frsqrts, for rsqrt estimation
 and
 a Newton-Raphson step, respectively.
 There are ARMv8 implementations where this is faster than using fdiv and
 rsqrt.
 It runs three steps for double and two steps for float to achieve the
 needed
 precision.
 
 There is one caveat and open question.
 Since -ffast-math enables flush to zero intermediate values between
 approximation steps will be flushed to zero if they are denormal.
 E.g. This happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
 The test cases pass, but it is unclear to me whether this is expected
 behavior with -ffast-math.
 
 The patch applies to commit:
 svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
 
 Please consider including this patch.
 Thank you and best regards,
 Benedikt Huber
 
 Benedikt Huber (1):
  2015-06-15  Benedikt Huber  benedikt.hu...@theobroma-systems.com
 
 gcc/ChangeLog|   9 +++
 gcc/config/aarch64/aarch64-builtins.c|  60 
 gcc/config/aarch64/aarch64-protos.h  |   2 +
 gcc/config/aarch64/aarch64-simd.md   |  27 
 gcc/config/aarch64/aarch64.c |  63 +
 gcc/config/aarch64/aarch64.md|   3 +
 gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113
 +++
 7 files changed, 277 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
 
 --
 1.9.1
 Mail Attachment.eml



Re: C PATCH to use is_global_var

2015-06-24 Thread Jeff Law

On 06/24/2015 04:22 AM, Marek Polacek wrote:

This patch makes the C FE use the predicate is_global_var in place of direct

   TREE_STATIC (t) || DECL_EXTERNAL (t)

It should improve readability a bit and make predicates easier to follow.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-06-24  Marek Polacek  pola...@redhat.com

* c-common.c (handle_no_reorder_attribute): Use is_global_var.
* cilk.c (extract_free_variables): Likewise.

* c-decl.c: Use is_global_var throughout.
* c-parser.c: Likewise.
* c-typeck.c: Likewise.
OK.  If you find other places where you can use is_global_var to replace 
the TREE_STATIC || DECL_EXTERNAL check, consider them pre-approved.


jeff



Re: fix PR46029: reimplement if conversion of loads and stores

2015-06-24 Thread Ramana Radhakrishnan



On 12/06/15 21:50, Abe Skolnik wrote:

Hi everybody!

In the current implementation of if conversion, loads and stores are
if-converted in a thread-unsafe way:

   * loads were always executed, even when they should not have been.
 Some source code could be rendered invalid due to null pointers
 that were OK in the original program because they were never
 dereferenced.

   * writes were if-converted via load/maybe-modify/store, which
 renders some code multithreading-unsafe.

This patch reimplements if-conversion of loads and stores in a safe
way using a scratchpad allocated by the compiler on the stack (a small
C sketch of the resulting shape follows the two points below):

   * loads are done through an indirection, reading either the correct
 data from the correct source [if the condition is true] or reading
 from the scratchpad and later ignoring this read result [if the
 condition is false].

   * writes are also done through an indirection, writing either to the
 correct destination [if the condition is true] or to the
 scratchpad [if the condition is false].
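
For illustration, the resulting shape is roughly the following (a
hand-written C sketch of the idea, not the actual GIMPLE the pass builds):

void
foo (int *a, const int *b, const int *c, int n)
{
  int scratch = 0;                     /* compiler-allocated pad on the stack */
  for (int i = 0; i < n; i++)
    {
      /* Original, thread-safe as written:
           if (c[i])
             a[i] = b[i];                                    */
      const int *rd = c[i] ? &b[i] : (const int *) &scratch;
      int *wr = c[i] ? &a[i] : &scratch;
      /* Exactly one load and one store per iteration; when c[i] is
         false neither a[i] nor b[i] is touched, so no new traps and
         no stores to memory another thread might own.  */
      *wr = *rd;
    }
}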

Vectorization of if-cvt-stores-vect-ifcvt-18.c disabled because the
old if-conversion resulted in unsafe code that could fail under
multithreading even though the as-written code _was_ thread-safe.

Passed regression testing and bootstrap on amd64-linux.
Is this OK to commit to trunk?


I can't approve or reject but this certainly looks like an improvement 
compared to where we are as we get rid of the data races.


The only gotcha I can think of with this approach is that it introduces 
false dependencies that would cause unnecessary write-after-write 
hazards with the writes to the scratchpad when you unroll the loop - but 
that's not necessarily worse than where we are today.


Some fun stats from a previous Friday afternoon poke at this without 
doing any benchmarking as such.


In a bootstrap with BOOT_CFLAGS=-O2 -ftree-loop-if-convert-stores and 
one without it, I see about 12.20% more csel's on an AArch64 bootstrap 
(goes from 7898 to 8862) vs plain old -O2.


And I did see the one case in libquantum get sorted with this, though 
the performance results were funny, let's say (+5% in one case, -1.5% on 
another core); I haven't analysed it deeply yet but it does look 
interesting.


regards
Ramana




Regards,

Abe




2015-06-12  Sebastian Pop  s@samsung.com
 Abe Skolnik  a.skol...@samsung.com

PR tree-optimization/46029
* tree-data-ref.c (struct data_ref_loc_d): Moved...
(get_references_in_stmt): Exported.
* tree-data-ref.h (struct data_ref_loc_d): ... here.
(get_references_in_stmt): Declared.

* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
* tree-if-conv.c (struct ifc_dr): Removed.
(IFC_DR): Removed.
(DR_WRITTEN_AT_LEAST_ONCE): Removed.
(DR_RW_UNCONDITIONALLY): Removed.
(memrefs_read_or_written_unconditionally): Removed.
(write_memrefs_written_at_least_once): Removed.
(ifcvt_could_trap_p): Does not take refs parameter anymore.
(ifcvt_memrefs_wont_trap): Removed.
(has_non_addressable_refs): New.
(if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs.
Removed use of refs.
(if_convertible_stmt_p): Removed use of refs.
(if_convertible_gimple_assign_stmt_p): Same.
(if_convertible_loop_p_1): Removed use of refs.  Remove initialization
of dr-aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
(insert_address_of): New.
(create_scratchpad): New.
(create_indirect_cond_expr): New.
(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
parameter for scratch_pad.
(combine_blocks): Same.
(tree_if_conversion): Same.

testsuite/
* g++.dg/tree-ssa/ifc-pr46029.C: New.
* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
* gcc.dg/tree-ssa/ifc-8.c: New.
* gcc.dg/tree-ssa/ifc-9.c: New.
* gcc.dg/tree-ssa/ifc-10.c: New.
* gcc.dg/tree-ssa/ifc-11.c: New.
* gcc.dg/tree-ssa/ifc-12.c: New.
* gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled.
* gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c: New.
---
  gcc/ChangeLog  |  28 ++
  gcc/doc/invoke.texi|  18 +-
  gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C|  76 
  gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c |  17 +
  gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c |  16 +
  gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c |  13 +
  gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c  |  19 +-
  gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c  |  29 ++
  gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c  |  17 +
  .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c  |  10 +-
  .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c  |  46 +++
  gcc/tree-data-ref.c 

Re: [PATCH] i386: Do not modify existing RTL (PR66412)

2015-06-24 Thread Jeff Law

On 06/24/2015 09:40 PM, Jeff Law wrote:

On 06/24/2015 05:29 PM, Segher Boessenkool wrote:

A few define_split's in the i386 backend modify RTL in place.  This does
not work.  This patch fixes all cases that do PUT_MODE on existing RTL.

Bootstrapped and tested; no regressions.  Is this okay for trunk?

Hrm, this wants the testcase in that PR added I suppose.  Will send
it separately.


Segher


2015-06-24  Segher Boessenkool  seg...@kernel.crashing.org

* config/i386/i386.md (various splitters): Use copy_rtx before
doing PUT_MODE on operands.

Are the copies really needed?  If we're slamming a mode into an
ix86_comparison_operator, we should be safe since those can't be shared.
  Copying is just wasteful.
It might be worth verifying that something else hasn't created shared 
RTL in violation of the RTL sharing assumptions.


Jeff


Re: [patch] fix regrename pass to ensure renamings produce valid insns

2015-06-24 Thread Jeff Law

On 06/23/2015 07:00 PM, Sandra Loosemore wrote:

On 06/18/2015 11:32 AM, Eric Botcazou wrote:

The attached patch teaches regrename to validate insns affected by each
register renaming before making the change.  I can see at least two
other ways to handle this -- earlier, by rejecting renamings that result
in invalid instructions when it's searching for the best renaming; or
later, by validating the entire set of renamings as a group instead of
incrementally for each one -- but doing it all in regname_do_replace
seems least disruptive and risky in terms of the existing code.


OK, but the patch looks incomplete, rename_chains should be adjusted
as well,
i.e. regrename_do_replace should now return a boolean.


Like this?  I tested this on nios2 and x86_64-linux-gnu, as before, plus
built for aarch64-linux-gnu and ran the gcc testsuite.

The c6x back end also calls regrename_do_replace.  I am not set up to
build or test on that target, and Bernd told me off-list that it would
never fail on that target anyway so I have left that code alone.

-Sandra

regrename-2.log


2015-06-23  Chung-Lin Tangclt...@codesourcery.com
Sandra Loosemoresan...@codesourcery.com

gcc/
* regrename.h (regrename_do_replace): Change to return bool.
* regrename.c (rename_chains): Check return value of
regrename_do_replace.
(regrename_do_replace): Re-validate the modified insns and
return bool status.
* config/aarch64/cortex-a57-fma-steering.c (rename_single_chain):
Update to match rename_chains changes.
As Eric mentioned, please put an assert to verify that the call from the 
c6x backend never fails.


The regrename and ARM bits are fine.

Do you have a testcase that you can add to the suite?  If so it'd be 
appreciated if you could include that too.


Approved with the c6x assert if a testcase isn't available or 
exceedingly difficult to produce.


jeff



Re: [PATCH] i386: Do not modify existing RTL (PR66412)

2015-06-24 Thread Segher Boessenkool
On Wed, Jun 24, 2015 at 09:40:28PM -0600, Jeff Law wrote:
 On 06/24/2015 05:29 PM, Segher Boessenkool wrote:
 A few define_split's in the i386 backend modify RTL in place.  This does
 not work.  This patch fixes all cases that do PUT_MODE on existing RTL.

  * config/i386/i386.md (various splitters): Use copy_rtx before
  doing PUT_MODE on operands.
 Are the copies really needed?  If we're slamming a mode into an 
 ix86_comparison_operator, we should be safe since those can't be shared. 
  Copying is just wasteful.

combine still holds pointers to the old rtx, which is what is causing
the problem in the PR (it does always unshare things in the end, but
it does not make copies while it's working).  Either those few splitters
need to do the copy (and some already do), or combine has to do the copy
always, which would be more wasteful.

It has always been this way as far as I see?  Am I missing something?

[ I see i386 also does PUT_CODE in a few more splitters, hrm. ]


Segher


Re: C++ PATCH for c++/66501 (wrong code with array move assignment)

2015-06-24 Thread Jason Merrill

On 06/23/2015 10:05 AM, Jason Merrill wrote:

build_vec_init was assuming that if a class has a trivial copy
assignment, then an array assignment is trivial.  But overload
resolution might not choose the copy assignment operator.  So this patch
changes build_vec_init to check for any non-trivial assignment operator.


On further consideration, it occurred to me that is_trivially_xible 
gives the precise answer we want, so I'm changing build_vec_init to use it.


On 4.9 we don't have is_trivially_xible, so I'm doing a simpler check, 
just adding TYPE_HAS_COMPLEX_MOVE_ASSIGN to the mix.


Tested x86_64-pc-linux-gnu, applying to trunk, 5 and 4.9.


commit d4d071b1f6552bfe57a1ed9e27de028580958afd
Author: Jason Merrill ja...@redhat.com
Date:   Tue Jun 23 22:02:30 2015 -0400

	PR c++/66501
	* init.c (vec_copy_assign_is_trivial): New.
	(build_vec_init): Use it.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 957a7a4..04c09d8 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3367,6 +3367,18 @@ get_temp_regvar (tree type, tree init)
   return decl;
 }
 
+/* Subroutine of build_vec_init.  Returns true if assigning to an array of
+   INNER_ELT_TYPE from INIT is trivial.  */
+
+static bool
+vec_copy_assign_is_trivial (tree inner_elt_type, tree init)
+{
+  tree fromtype = inner_elt_type;
+  if (real_lvalue_p (init))
+fromtype = cp_build_reference_type (fromtype, /*rval*/false);
+  return is_trivially_xible (MODIFY_EXPR, inner_elt_type, fromtype);
+}
+
 /* `build_vec_init' returns tree structure that performs
initialization of a vector of aggregate types.
 
@@ -3443,8 +3455,7 @@ build_vec_init (tree base, tree maxindex, tree init,
TREE_CODE (atype) == ARRAY_TYPE
TREE_CONSTANT (maxindex)
(from_array == 2
-	  ? (!CLASS_TYPE_P (inner_elt_type)
-	 || !TYPE_HAS_COMPLEX_COPY_ASSIGN (inner_elt_type))
+	  ? vec_copy_assign_is_trivial (inner_elt_type, init)
 	  : !TYPE_NEEDS_CONSTRUCTING (type))
((TREE_CODE (init) == CONSTRUCTOR
 	   /* Don't do this if the CONSTRUCTOR might contain something
diff --git a/gcc/testsuite/g++.dg/cpp0x/rv-array1.C b/gcc/testsuite/g++.dg/cpp0x/rv-array1.C
new file mode 100644
index 000..9075764
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/rv-array1.C
@@ -0,0 +1,55 @@
+// PR c++/66501
+// { dg-do run { target c++11 } }
+
+int total_size;
+
+struct Object
+{
+  int size = 0;
+
+  Object () = default;
+
+  ~Object () {
+total_size -= size;
+  }
+
+  Object (const Object ) = delete;
+  Object  operator= (const Object ) = delete;
+
+  Object (Object  b) {
+size = b.size;
+b.size = 0;
+  }
+
+  Object  operator= (Object  b) {
+if (this !=  b) {
+  total_size -= size;
+  size = b.size;
+  b.size = 0;
+}
+return * this;
+  }
+
+  void grow () {
+size ++;
+total_size ++;
+  }
+};
+
+struct Container {
+  Object objects[2];
+};
+
+int main (void)
+{
+  Container container;
+
+  // grow some objects in the container
+  for (auto  object : container.objects)
+object.grow ();
+
+  // now empty it
+  container = Container ();
+
+  return total_size;
+}
commit 1c9edbe05d9e9437bcb7b3f621809461399aefe0
Author: Jason Merrill ja...@redhat.com
Date:   Tue Jun 23 22:02:30 2015 -0400

	PR c++/66501
	* init.c (vec_copy_assign_is_trivial): New.
	(build_vec_init): Use it.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 5cb7fc4..09a897f 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3379,6 +3379,21 @@ get_temp_regvar (tree type, tree init)
   return decl;
 }
 
+/* Subroutine of build_vec_init.  Returns true if assigning to an array of
+   INNER_ELT_TYPE from INIT is trivial.  */
+
+static bool
+vec_copy_assign_is_trivial (tree inner_elt_type, tree init)
+{
+  if (!CLASS_TYPE_P (inner_elt_type))
+return true;
+  if (cxx_dialect = cxx11
+   !real_lvalue_p (init)
+   type_has_move_assign (inner_elt_type))
+return !TYPE_HAS_COMPLEX_MOVE_ASSIGN (inner_elt_type);
+  return TYPE_HAS_TRIVIAL_COPY_ASSIGN (inner_elt_type);
+}
+
 /* `build_vec_init' returns tree structure that performs
initialization of a vector of aggregate types.
 
@@ -3460,8 +3475,7 @@ build_vec_init (tree base, tree maxindex, tree init,
       && TREE_CODE (atype) == ARRAY_TYPE
       && TREE_CONSTANT (maxindex)
       && (from_array == 2
-	  ? (!CLASS_TYPE_P (inner_elt_type)
-	     || !TYPE_HAS_COMPLEX_COPY_ASSIGN (inner_elt_type))
+	  ? vec_copy_assign_is_trivial (inner_elt_type, init)
 	  : !TYPE_NEEDS_CONSTRUCTING (type))
       && ((TREE_CODE (init) == CONSTRUCTOR
 	   /* Don't do this if the CONSTRUCTOR might contain something
diff --git a/gcc/testsuite/g++.dg/cpp0x/rv-array1.C b/gcc/testsuite/g++.dg/cpp0x/rv-array1.C
new file mode 100644
index 000..9075764
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/rv-array1.C
@@ -0,0 +1,55 @@
+// PR c++/66501
+// { dg-do run { target c++11 } }
+
+int total_size;
+
+struct Object
+{
+  int size = 0;
+
+  Object () = default;
+
+  ~Object () {
+    total_size -= size;
+  }
+
+ 
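
[For intuition, a stand-alone sketch -- not part of either commit -- of the distinction the new helper draws, phrased with std::is_trivially_assignable, the library-level analogue of the front end's is_trivially_xible (MODIFY_EXPR, ...).  The element type below has a trivial copy assignment but a user-provided move assignment, so assigning an array of it from an rvalue must not take the memcpy-style fast path.]

  #include <type_traits>

  struct T {
    T &operator= (const T &) = default;   // trivial copy assignment
    T &operator= (T &&);                   // user-provided move assignment
  };

  // Assigning from an lvalue selects the trivial copy assignment ...
  static_assert (std::is_trivially_assignable<T &, const T &>::value, "");
  // ... but assigning from an rvalue selects the non-trivial move assignment.
  static_assert (!std::is_trivially_assignable<T &, T &&>::value, "");

  int main () {}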

Re: [Patch, C++, PR65882] Check tf_warning flag in build_new_op_1

2015-06-24 Thread Christophe Lyon
On 22 June 2015 at 18:59, Jason Merrill ja...@redhat.com wrote:
 On 06/19/2015 08:23 PM, Mikhail Maltsev wrote:

 I see that version 5.2 is set as target milestone for this bug. Should I
 backport the patch?


 Please.

 Jason


Hi Mikhail,

In the gcc-5-branch, I can see that your new inhibit-warn-2.C test
fails (targets ARM and AArch64).

I can see this error message in g++.log:
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:
In function 'void fn1()':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:29:3:
error: 'typename A(Ftypename C template-parameter-1-1
::type::value || B:: value)::type D::operator=(Expr) [with Expr =
int; typename A(Ftypename C template-parameter-1-1
::type::value || B:: value)::type = int]' is private
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/g++.dg/diagnostic/inhibit-warn-2.C:35:7:
error: within this context

Christophe.


RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-06-24 Thread Evandro Menezes
Benedikt,

Are you developing the reciprocal approximation just for 1/x proper or for any 
division, as in x/y = x * 1/y?

Thank you,

-- 
Evandro Menezes  Austin, TX


 -Original Message-
 From: Benedikt Huber [mailto:benedikt.hu...@theobroma-systems.com]
 Sent: Wednesday, June 24, 2015 12:11
 To: Dr. Philipp Tomsich
 Cc: Evandro Menezes; gcc-patches@gcc.gnu.org
 Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
 estimation in -ffast-math
 
 Evandro,
 
 Yes, we also have the 1/x approximation.
 However we do not have the test cases yet, and it also would need some clean
 up.
 I am going to provide a patch for that soon (say next week).
 Also, for this optimization we have *not* yet found a benchmark with
 significant improvements.
 
 Best Regards,
 Benedikt
 
 
  On 24 Jun 2015, at 18:52, Dr. Philipp Tomsich philipp.tomsich@theobroma-
 systems.com wrote:
 
  Evandro,
 
  We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar) reciprocal
 sqrt.
 
  Also, the “reciprocal divide” patches are floating around in various
  of our git-tree, but aren’t ready for public consumption, yet… I’ll
  leave Benedikt to comment on potential timelines for getting that pushed
 out.
 
  Best,
  Philipp.
 
  On 24 Jun 2015, at 18:42, Evandro Menezes e.mene...@samsung.com wrote:
 
  Benedikt,
 
  You beat me to it! :-)  Do you have the implementation for dividing
  using the Newton series as well?
 
  I'm not sure that the series is always profitable for all data types and on all
  processors.  It would be useful to allow each AArch64 processor to
  enable this or not depending on the data type.  BTW, do you have some
  tests showing the speed up?
 
  Thank you,
 
  --
  Evandro Menezes  Austin, TX
 
  -Original Message-
  From: gcc-patches-ow...@gcc.gnu.org
  [mailto:gcc-patches-ow...@gcc.gnu.org]
  On
  Behalf Of Benedikt Huber
  Sent: Thursday, June 18, 2015 7:04
  To: gcc-patches@gcc.gnu.org
  Cc: benedikt.hu...@theobroma-systems.com; philipp.tomsich@theobroma-
  systems.com
  Subject: [PATCH] [aarch64] Implemented reciprocal square root
  (rsqrt) estimation in -ffast-math
 
   AArch64 offers the instructions frsqrte and frsqrts, for rsqrt
  estimation
  and
  a Newton-Raphson step, respectively.
  There are ARMv8 implementations where this is faster than using fdiv
  and rsqrt.
  It runs three steps for double and two steps for float to achieve
  the
  needed
  precision.
 
  There is one caveat and open question.
   Since -ffast-math enables flush-to-zero, intermediate values between
   approximation steps will be flushed to zero if they are denormal.
  E.g. This happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
  The test cases pass, but it is unclear to me whether this is
  expected behavior with -ffast-math.
 
  The patch applies to commit:
  svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
 
  Please consider including this patch.
  Thank you and best regards,
  Benedikt Huber
 
  Benedikt Huber (1):
  2015-06-15  Benedikt Huber  benedikt.hu...@theobroma-systems.com
 
  gcc/ChangeLog|   9 +++
  gcc/config/aarch64/aarch64-builtins.c|  60 
  gcc/config/aarch64/aarch64-protos.h  |   2 +
  gcc/config/aarch64/aarch64-simd.md   |  27 
  gcc/config/aarch64/aarch64.c |  63 +
  gcc/config/aarch64/aarch64.md|   3 +
  gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113
  +++
  7 files changed, 277 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
 
  --
  1.9.1
  Mail Attachment.eml
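
[A minimal scalar sketch, not taken from the patch, of the refinement the frsqrte/frsqrts pair performs.  The bit trick below merely stands in for the frsqrte estimate, and each step mirrors frsqrts, which computes (3 - a*b)/2; two steps for float follow the patch description, double would use three.]

  #include <cstdint>
  #include <cstring>

  // One Newton-Raphson step for y ~ 1/sqrt(x):  y' = y * (3 - x*y*y) / 2.
  static float rsqrt_step (float x, float y)
  {
    return y * (3.0f - x * y * y) * 0.5f;
  }

  static float rsqrt_approx (float x)
  {
    std::uint32_t i;
    std::memcpy (&i, &x, sizeof i);
    i = 0x5f3759df - (i >> 1);       // crude initial estimate (stand-in for frsqrte)
    float y;
    std::memcpy (&y, &i, sizeof y);
    y = rsqrt_step (x, y);           // first refinement step
    y = rsqrt_step (x, y);           // second refinement step
    return y;
  }

  int main () { return rsqrt_approx (4.0f) > 0.49f ? 0 : 1; }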
 




[PR fortran/66528] unbalanced IF/ENDIF with -fmax-errors=1 causes invalid free

2015-06-24 Thread Manuel López-Ibáñez
The problem is that diagnostic_action_after_output tries to delete the
active pretty-printer, which in turn tries to delete its output_buffer.
That buffer is normally dynamically allocated via placement new, but the
output_buffer used by Fortran's error_buffer is statically allocated.
Being statically allocated greatly simplifies pushing/popping several
instances of error_buffer.

The solution I found is to reset the active output_buffer back to the
default one before calling diagnostic_action_after_output. This is a
bit ugly because that function does use the output_buffer; however, at
the point where Fortran calls it, both buffers are in an equivalent
state, so there is no visible difference.
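
Sketched out, the resulting order in gfc_error_check is (condensed from the patch below):

  pretty_printer *pp = global_dc->printer;
  output_buffer *tmp_buffer = pp->buffer;
  pp->buffer = pp_error_buffer;   /* print through the static error buffer */
  pp_really_flush (pp);
  ++errorcount;
  pp->buffer = tmp_buffer;        /* restore the default, heap-allocated buffer */
  diagnostic_action_after_output (global_dc, DK_ERROR);  /* now safe to act */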


Bootstrapped and regression tested on x86_64-linux-gnu.

2015-06-24  Manuel López-Ibáñez  m...@gcc.gnu.org

PR fortran/66528
* gfortran.dg/maxerrors.f90: New test.

gcc/fortran/ChangeLog:

2015-06-24  Manuel López-Ibáñez  m...@gcc.gnu.org

PR fortran/66528
* error.c (gfc_warning_check): Restore the default output_buffer
before calling diagnostic_action_after_output.
(gfc_error_check): Likewise.
(gfc_diagnostics_init): Add comment.
Index: gcc/testsuite/gfortran.dg/maxerrors.f90
===
--- gcc/testsuite/gfortran.dg/maxerrors.f90 (revision 0)
+++ gcc/testsuite/gfortran.dg/maxerrors.f90 (revision 0)
@@ -0,0 +1,12 @@
+! { dg-do compile }
+! { dg-options "-fmax-errors=1" }
+! PR66528
+! { dg-prune-output "compilation terminated" }
+program main
+  read (*,*) n
+  if (n>0) then
+    print *,"foo"
+  end ! { dg-error "END IF statement expected" }
+  print *,"bar"
+end program main
+
Index: gcc/fortran/error.c
===
--- gcc/fortran/error.c (revision 224844)
+++ gcc/fortran/error.c (working copy)
@@ -1247,24 +1247,23 @@ gfc_clear_warning (void)
If so, print the warning.  */
 
 void
 gfc_warning_check (void)
 {
-  /* This is for the new diagnostics machinery.  */
   if (! gfc_output_buffer_empty_p (pp_warning_buffer))
 {
   pretty_printer *pp = global_dc->printer;
   output_buffer *tmp_buffer = pp->buffer;
   pp->buffer = pp_warning_buffer;
   pp_really_flush (pp);
   warningcount += warningcount_buffered;
   werrorcount += werrorcount_buffered;
   gcc_assert (warningcount_buffered + werrorcount_buffered == 1);
+  pp->buffer = tmp_buffer;
   diagnostic_action_after_output (global_dc, 
  warningcount_buffered 
  ? DK_WARNING : DK_ERROR);
-  pp->buffer = tmp_buffer;
 }
 }
 
 
 /* Issue an error.  */
@@ -1379,12 +1378,12 @@ gfc_error_check (void)
   output_buffer *tmp_buffer = pp->buffer;
   pp->buffer = pp_error_buffer;
   pp_really_flush (pp);
   ++errorcount;
   gcc_assert (gfc_output_buffer_empty_p (pp_error_buffer));
-  diagnostic_action_after_output (global_dc, DK_ERROR);
   pp->buffer = tmp_buffer;
+  diagnostic_action_after_output (global_dc, DK_ERROR);
   return true;
 }
 
   return false;
 }
@@ -1470,10 +1469,12 @@ gfc_diagnostics_init (void)
   diagnostic_format_decoder (global_dc) = gfc_format_decoder;
   global_dc->caret_chars[0] = '1';
   global_dc->caret_chars[1] = '2';
   pp_warning_buffer = new (XNEW (output_buffer)) output_buffer ();
   pp_warning_buffer->flush_p = false;
+  /* pp_error_buffer is statically allocated.  This simplifies memory
+     management when using gfc_push/pop_error.  */
   pp_error_buffer = &(error_buffer.buffer);
   pp_error_buffer->flush_p = false;
 }
 
 void


Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

2015-06-24 Thread Benedikt Huber
Evandro,

Yes, we also have the 1/x approximation.
However we do not have the test cases yet, and it also would need some clean up.
I am going to provide a patch for that soon (say next week).
Also, for this optimization we have *not* yet found a benchmark with 
significant improvements.

Best Regards,
Benedikt


 On 24 Jun 2015, at 18:52, Dr. Philipp Tomsich 
 philipp.toms...@theobroma-systems.com wrote:
 
 Evandro,
 
 We’ve seen a 28% speed-up on gromacs in SPECfp for the (scalar) reciprocal 
 sqrt.
 
 Also, the “reciprocal divide” patches are floating around in various of our 
 git-tree, but
 aren’t ready for public consumption, yet… I’ll leave Benedikt to comment on 
 potential
 timelines for getting that pushed out.
 
 Best,
 Philipp.
 
 On 24 Jun 2015, at 18:42, Evandro Menezes e.mene...@samsung.com wrote:
 
 Benedikt,
 
 You beat me to it! :-)  Do you have the implementation for dividing using
 the Newton series as well?
 
 I'm not sure that the series is always profitable for all data types and on all
 processors.  It would be useful to allow each AArch64 processor to enable
 this or not depending on the data type.  BTW, do you have some tests showing
 the speed up?
 
 Thank you,
 
 --
 Evandro Menezes  Austin, TX
 
 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
 On
 Behalf Of Benedikt Huber
 Sent: Thursday, June 18, 2015 7:04
 To: gcc-patches@gcc.gnu.org
 Cc: benedikt.hu...@theobroma-systems.com; philipp.tomsich@theobroma-
 systems.com
 Subject: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt)
 estimation in -ffast-math
 
  AArch64 offers the instructions frsqrte and frsqrts, for rsqrt estimation
 and
 a Newton-Raphson step, respectively.
 There are ARMv8 implementations where this is faster than using fdiv and
 rsqrt.
 It runs three steps for double and two steps for float to achieve the
 needed
 precision.
 
 There is one caveat and open question.
  Since -ffast-math enables flush-to-zero, intermediate values between
  approximation steps will be flushed to zero if they are denormal.
 E.g. This happens in the case of rsqrt (DBL_MAX) and rsqrtf (FLT_MAX).
 The test cases pass, but it is unclear to me whether this is expected
 behavior with -ffast-math.
 
 The patch applies to commit:
 svn+ssh://gcc.gnu.org/svn/gcc/trunk@224470
 
 Please consider including this patch.
 Thank you and best regards,
 Benedikt Huber
 
 Benedikt Huber (1):
 2015-06-15  Benedikt Huber  benedikt.hu...@theobroma-systems.com
 
 gcc/ChangeLog|   9 +++
 gcc/config/aarch64/aarch64-builtins.c|  60 
 gcc/config/aarch64/aarch64-protos.h  |   2 +
 gcc/config/aarch64/aarch64-simd.md   |  27 
 gcc/config/aarch64/aarch64.c |  63 +
 gcc/config/aarch64/aarch64.md|   3 +
 gcc/testsuite/gcc.target/aarch64/rsqrt.c | 113
 +++
 7 files changed, 277 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/rsqrt.c
 
 --
 1.9.1
 Mail Attachment.eml
 



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: fix PR46029: reimplement if conversion of loads and stores

2015-06-24 Thread Jeff Law

On 06/22/2015 10:27 AM, Alan Lawrence wrote:



My main thought concerns the direction we are travelling here. A major
reason why we do if-conversion is to enable vectorization. Is this
targeted at gathering/scattering loads? Following vectorization,
different elements of the vector being loaded/stored may have to go
to/from the scratchpad or to/from main memory.

Or, are we aiming at the case where the predicate or address are
invariant? That seems unlikely - loop unswitching would be better for
the predicate; loading from an address, we'd just peel and hoist;
storing, this'd result in the address holding the last value written, at
exit from the loop, a curious idiom. Where the predicate/address is
invariant across the vector? (!)

Or, are we aiming at non-vectorized code?
I think we're aiming at correctness issues, particularly WRT not 
allowing the optimizers to introduce new data races for C11/C++11.





Re. creation of scratchpads:
(1) Should the '64' byte size be the result of scanning the
function, for the largest data size to which we store? (ideally,
conditionally store!)
I suspect most functions have conditional stores, but far fewer have 
conditional stores that we'd like to if-convert.  ISTM that if we can 
lazily allocate the scratchpad that'd be best.   If this were an RTL 
pass, then I'd say query the backend for the widest mode store insn and 
use that to size the scratchpad.  We may have something similar we can 
do in gimple without resorting querying insn backend capabilities. 
Perhaps walking the in-scope addressable variables or somesuch.




(2) Allocating only once per function: if we had one scratchpad per
loop, it could/would live inside the test of gimple_build_call_internal
(IFN_LOOP_VECTORIZED, ...).  Otherwise, if we if-convert one or more
loops in the function, but then fail to vectorize them, we'll leave the
scratchpad around for later phases to clean up. Is that OK?
If the scratchpad is local to a function, then I'd expect we'd clean it 
up just like any other unused local.  Once it's a global, then all bets 
are off.


Anyway, I probably should just look at the patch before making more 
comments.


jeff



Re: C PATCH to use is_global_var

2015-06-24 Thread Joseph Myers
On Wed, 24 Jun 2015, Marek Polacek wrote:

 diff --git gcc/c/c-decl.c gcc/c/c-decl.c
 index fc1fdf9..ab54db9 100644
 --- gcc/c/c-decl.c
 +++ gcc/c/c-decl.c
 @@ -2650,9 +2650,8 @@ merge_decls (tree newdecl, tree olddecl, tree newtype, 
 tree oldtype)
 tree_code_size (TREE_CODE (olddecl)) - sizeof (struct 
 tree_decl_common));
 olddecl-decl_with_vis.symtab_node = snode;
  
 -   if ((DECL_EXTERNAL (olddecl)
 -|| TREE_PUBLIC (olddecl)
 -|| TREE_STATIC (olddecl))
 +   if ((is_global_var (olddecl)
 +|| TREE_PUBLIC (olddecl))
 	   && DECL_SECTION_NAME (newdecl) != NULL)
   set_decl_section_name (olddecl, DECL_SECTION_NAME (newdecl));
  

At least this case covers both FUNCTION_DECL and VAR_DECL.  If 
is_global_var is appropriate for functions as well as variables, I think 
it should be renamed (and have its comment updated to explain what it 
means for functions).

-- 
Joseph S. Myers
jos...@codesourcery.com
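
[For reference, is_global_var reduces to a storage-class check along these lines -- a sketch from memory, not quoted verbatim from tree.h -- which is why it happens to give the same answer for FUNCTION_DECLs as for VAR_DECLs; the name, though, only suggests variables, hence the rename request above.]

  static inline bool
  is_global_var (const_tree t)
  {
    return TREE_STATIC (t) || DECL_EXTERNAL (t);
  }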


Re: C PATCH to use VAR_P

2015-06-24 Thread Jeff Law

On 06/24/2015 06:45 AM, Marek Polacek wrote:

On Wed, Jun 24, 2015 at 02:37:30PM +0200, Uros Bizjak wrote:

Hello!


Similarly to what Gaby did in 2013 for C++
(https://gcc.gnu.org/ml/gcc-patches/2013-03/msg01271.html), this patch
makes the c/ and c-family/ code use VAR_P rather than

   TREE_CODE (t) == VAR_DECL

(This is on top of the previous patch with is_global_var.)


You could also use VAR_OR_FUNCTION_DECL, e.g. in the part below.


Sure, I thought I had dealt with VAR_OR_FUNCTION_DECL_P in
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01797.html, but
I must have missed this.  Thanks,

Consider that follow-up approved as well.
jeff


Re: C PATCH to use VAR_P

2015-06-24 Thread Jeff Law

On 06/24/2015 06:25 AM, Marek Polacek wrote:

Similarly to what Gaby did in 2013 for C++
(https://gcc.gnu.org/ml/gcc-patches/2013-03/msg01271.html), this patch
makes the c/ and c-family/ code use VAR_P rather than

   TREE_CODE (t) == VAR_DECL

(This is on top of the previous patch with is_global_var.)

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-06-24  Marek Polacek  pola...@redhat.com

* array-notation-common.c: Use VAR_P throughout.
* c-ada-spec.c: Likewise.
* c-common.c: Likewise.
* c-format.c: Likewise.
* c-gimplify.c: Likewise.
* c-omp.c: Likewise.
* c-pragma.c: Likewise.
* c-pretty-print.c: Likewise.
* cilk.c: Likewise.

* c-array-notation.c: Use VAR_P throughout.
* c-decl.c: Likewise.
* c-objc-common.c: Likewise.
* c-parser.c: Likewise.
* c-typeck.c: Likewise.
I spot checked mostly for VAR_P vs !VAR_P correctness and everything 
looked correct.  OK for the trunk.  Consider any follow-ups to use VAR_P 
in a similar way pre-approved.


jeff
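
[Side note for skimmers: VAR_P is just a predicate macro over TREE_CODE, so the patch is a mechanical substitution -- roughly the following, paraphrasing tree.h rather than quoting it.]

  /* tree.h, approximately (sketch, not verbatim).  */
  #define VAR_P(NODE)  (TREE_CODE (NODE) == VAR_DECL)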



Re: [02/13] Replace handle_cache_entry with new interface

2015-06-24 Thread Jeff Law

On 06/24/2015 02:16 AM, Richard Sandiford wrote:

So for all the keep_cache_entry functions, I guess they're trivial
enough that a function comment probably isn't needed.


Yeah.  For cases like this where the function is implementing a defined
interface (described in hash-table.h), I think it's better to only have
comments for implementations that are doing something non-obvious.

That works for me.




Presumably no good way to share the trivial implementation?


Probably not without sharing the other parts of the traits in some way.
That might be another possible cleanup :-)
I'll let you decide whether or not to pursue.  I'd like to hope that ICF 
would help us here.


jeff
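
[For readers unfamiliar with the hash-table.h cache interface being discussed: a "trivial" keep_cache_entry typically reduces to a one-line GC query, something like the sketch below, written from memory rather than copied from the patch.]

  /* Keep a cached element for as long as the garbage collector still
     considers it live.  */
  static bool
  keep_cache_entry (tree &e)
  {
    return ggc_marked_p (e);
  }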


Re: [patch] fix regrename pass to ensure renamings produce valid insns

2015-06-24 Thread Eric Botcazou
 Yes, the patch is OK, modulo...

But you also need the approval of an ARM maintainer.

-- 
Eric Botcazou


Re: fix PR46029: reimplement if conversion of loads and stores

2015-06-24 Thread Jeff Law

On 06/24/2015 10:50 AM, Ramana Radhakrishnan wrote:



On 12/06/15 21:50, Abe Skolnik wrote:

Hi everybody!

In the current implementation of if conversion, loads and stores are
if-converted in a thread-unsafe way:

   * loads were always executed, even when they should have not been.
 Some source code could be rendered invalid due to null pointers
 that were OK in the original program because they were never
 dereferenced.

   * writes were if-converted via load/maybe-modify/store, which
 renders some code multithreading-unsafe.

This patch reimplements if-conversion of loads and stores in a safe
way using a scratchpad allocated by the compiler on the stack:

   * loads are done through an indirection, reading either the correct
 data from the correct source [if the condition is true] or reading
 from the scratchpad and later ignoring this read result [if the
 condition is false].

   * writes are also done through an indirection, writing either to the
 correct destination [if the condition is true] or to the
 scratchpad [if the condition is false].

Vectorization of if-cvt-stores-vect-ifcvt-18.c disabled because the
old if-conversion resulted in unsafe code that could fail under
multithreading even though the as-written code _was_ thread-safe.

Passed regression testing and bootstrap on amd64-linux.
Is this OK to commit to trunk?


I can't approve or reject but this certainly looks like an improvement
compared to where we are as we get rid of the data races.
Right.  I was going to assume the primary purpose is to address 
correctness issues, not increase the amount of if-conversion for 
optimization purposes.


I have a couple of high level concerns around the scratchpad usage 
(aliasing, write-write hazards), but until I dig into the patch I don't 
know if they're real issues or not.



Jeff
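
[A minimal sketch, not taken from the patch, of the store-through-indirection idiom described above: the store always executes, but its address is selected so that the not-taken iterations only ever touch a compiler-provided scratchpad.  The names scratch and f below are made up for illustration.]

  /* Conditional store rewritten so that the store itself is unconditional;
     only the address depends on the predicate, so out[i] is never written
     when cond[i] is false and no new data race is introduced.  */
  static int scratch[16];   /* stand-in for the 64-byte per-function scratchpad */

  void
  f (int *out, const int *in, const int *cond, int n)
  {
    for (int i = 0; i < n; i++)
      {
        int *dst = cond[i] ? &out[i] : scratch;
        *dst = in[i] * 2;   /* always stores, to a safe address */
      }
    /* Loads can be handled the same way: read through a selected address
       and discard the value when the predicate is false.  */
  }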

