Re: [Driver] Add support for -fuse-ld=lld

2016-06-23 Thread Davide Italiano
+ HJ who wrote the code for the option originally.

On Thu, Jun 23, 2016 at 9:01 PM, Davide Italiano  wrote:
> LLVM currently ships with a new ELF linker http://lld.llvm.org/.
> I experiment a lot with gcc and lld so it would be nice if
> -fuse-ld=lld is supported (considering the linker is now mature enough
> to link large C/C++ applications).
>
> Also, IMHO, -fuse-ld should be a generic facility which accepts other
> linkers (as long as they follow the convention ld.<name>), and should
> also support an absolute path, e.g. -fuse-ld=/usr/local/bin/ld.mylinker.
> Probably outside of the scope of this patch, but I thought worth
> mentioning.
>
> Thanks,
>
> --
> Davide
>
> From 323c23d79c91d7dcee2f29b9ced8c1c00703d346 Mon Sep 17 00:00:00 2001
> From: Davide Italiano 
> Date: Thu, 23 Jun 2016 20:51:53 -0700
> Subject: [PATCH] Driver: Add support for -fuse-ld=lld.
>
> * collect2.c  (main): Support -fuse-ld=lld.
>
> * common.opt: Add fuse-ld=lld
>
> * doc/invoke.texi:  Document -fuse-ld=lld
>
> * opts.c: Ignore -fuse-ld=lld
> ---
>  gcc/collect2.c      | 11 ++++++++---
>  gcc/common.opt      |  4 ++++
>  gcc/doc/invoke.texi |  4 ++++
>  gcc/opts.c          |  1 +
>  4 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/collect2.c b/gcc/collect2.c
> index bffac80..6a8387c 100644
> --- a/gcc/collect2.c
> +++ b/gcc/collect2.c
> @@ -831,6 +831,7 @@ main (int argc, char **argv)
>USE_PLUGIN_LD,
>USE_GOLD_LD,
>USE_BFD_LD,
> +  USE_LLD_LD,
>USE_LD_MAX
>  } selected_linker = USE_DEFAULT_LD;
>static const char *const ld_suffixes[USE_LD_MAX] =
> @@ -838,7 +839,8 @@ main (int argc, char **argv)
>"ld",
>PLUGIN_LD_SUFFIX,
>"ld.gold",
> -  "ld.bfd"
> +  "ld.bfd",
> +  "ld.lld"
>  };
>static const char *const real_ld_suffix = "real-ld";
>static const char *const collect_ld_suffix = "collect-ld";
> @@ -1004,6 +1006,8 @@ main (int argc, char **argv)
>selected_linker = USE_BFD_LD;
>  else if (strcmp (argv[i], "-fuse-ld=gold") == 0)
>selected_linker = USE_GOLD_LD;
> +  else if (strcmp (argv[i], "-fuse-ld=lld") == 0)
> +selected_linker = USE_LLD_LD;
>
>  #ifdef COLLECT_EXPORT_LIST
>  /* These flags are position independent, although their order
> @@ -1093,7 +1097,8 @@ main (int argc, char **argv)
>/* Maybe we know the right file to use (if not cross).  */
>ld_file_name = 0;
>  #ifdef DEFAULT_LINKER
> -  if (selected_linker == USE_BFD_LD || selected_linker == USE_GOLD_LD)
> +  if (selected_linker == USE_BFD_LD || selected_linker == USE_GOLD_LD ||
> +  selected_linker == USE_LLD_LD)
>  {
>char *linker_name;
>  # ifdef HOST_EXECUTABLE_SUFFIX
> @@ -1307,7 +1312,7 @@ main (int argc, char **argv)
>else if (!use_collect_ld
> && strncmp (arg, "-fuse-ld=", 9) == 0)
>  {
> -  /* Do not pass -fuse-ld={bfd|gold} to the linker. */
> +  /* Do not pass -fuse-ld={bfd|gold|lld} to the linker. */
>ld1--;
>ld2--;
>  }
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 5d90385..2a95a1f 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2536,6 +2536,10 @@ fuse-ld=gold
>  Common Driver Negative(fuse-ld=bfd)
>  Use the gold linker instead of the default linker.
>
> +fuse-ld=lld
> +Common Driver Negative(fuse-ld=lld)
> +Use the lld LLVM linker instead of the default linker.
> +
>  fuse-linker-plugin
>  Common Undocumented Var(flag_use_linker_plugin)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 2c87c53..4b8acff 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -10651,6 +10651,10 @@ Use the @command{bfd} linker instead of the default linker.
>  @opindex fuse-ld=gold
>  Use the @command{gold} linker instead of the default linker.
>
> +@item -fuse-ld=lld
> +@opindex fuse-ld=lld
> +Use the LLVM @command{lld} linker instead of the default linker.
> +
>  @cindex Libraries
>  @item -l@var{library}
>  @itemx -l @var{library}
> diff --git a/gcc/opts.c b/gcc/opts.c
> index 7406210..f2c86f7 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -2178,6 +2178,7 @@ common_handle_option (struct gcc_options *opts,
>
>  case OPT_fuse_ld_bfd:
>  case OPT_fuse_ld_gold:
> +case OPT_fuse_ld_lld:
>  case OPT_fuse_linker_plugin:
>/* No-op. Used by the driver and passed to us because it starts with 
> f.*/
>break;
> --
> 2.5.5


[Driver] Add support for -fuse-ld=lld

2016-06-23 Thread Davide Italiano
LLVM currently ships with a new ELF linker http://lld.llvm.org/.
I experiment a lot with gcc and lld so it would be nice if
-fuse-ld=lld is supported (considering the linker is now mature enough
to link large C/C++ applications).

Also, IMHO, -fuse-ld should be a generic facility which accepts other
linkers (as long as they follow the convention ld.<name>), and should
also support an absolute path, e.g. -fuse-ld=/usr/local/bin/ld.mylinker.
Probably outside of the scope of this patch, but I thought worth
mentioning.
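
To make the suggestion concrete, here is a rough sketch of how such a generic lookup might map the option argument to a program name. It is not part of the patch and not GCC code: `resolve_fuse_ld` is a hypothetical helper, and it assumes the same ld.<name> naming that ld.gold, ld.bfd and ld.lld already follow. A real driver would also have to search the program path, which is left out here.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not GCC code.  ARG is the text that follows
   "-fuse-ld=".  Writes the linker program name to BUF and returns 0,
   or returns -1 if ARG is empty or the result does not fit in SIZE.  */
static int
resolve_fuse_ld (const char *arg, char *buf, size_t size)
{
  int n;

  if (arg == NULL || arg[0] == '\0')
    return -1;

  if (arg[0] == '/')
    /* Absolute path: use it exactly as given.  */
    n = snprintf (buf, size, "%s", arg);
  else
    /* Short name: apply the ld.<name> convention.  */
    n = snprintf (buf, size, "ld.%s", arg);

  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

With this sketch, an argument of "lld" resolves to "ld.lld", while "/usr/local/bin/ld.mylinker" is passed through unchanged.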

Thanks,

--
Davide

>From 323c23d79c91d7dcee2f29b9ced8c1c00703d346 Mon Sep 17 00:00:00 2001
From: Davide Italiano 
Date: Thu, 23 Jun 2016 20:51:53 -0700
Subject: [PATCH] Driver: Add support for -fuse-ld=lld.

* collect2.c  (main): Support -fuse-ld=lld.

* common.opt: Add fuse-ld=lld

* doc/invoke.texi:  Document -fuse-ld=lld

* opts.c: Ignore -fuse-ld=lld
---
 gcc/collect2.c      | 11 ++++++++---
 gcc/common.opt      |  4 ++++
 gcc/doc/invoke.texi |  4 ++++
 gcc/opts.c          |  1 +
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/gcc/collect2.c b/gcc/collect2.c
index bffac80..6a8387c 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -831,6 +831,7 @@ main (int argc, char **argv)
   USE_PLUGIN_LD,
   USE_GOLD_LD,
   USE_BFD_LD,
+  USE_LLD_LD,
   USE_LD_MAX
 } selected_linker = USE_DEFAULT_LD;
   static const char *const ld_suffixes[USE_LD_MAX] =
@@ -838,7 +839,8 @@ main (int argc, char **argv)
   "ld",
   PLUGIN_LD_SUFFIX,
   "ld.gold",
-  "ld.bfd"
+  "ld.bfd",
+  "ld.lld"
 };
   static const char *const real_ld_suffix = "real-ld";
   static const char *const collect_ld_suffix = "collect-ld";
@@ -1004,6 +1006,8 @@ main (int argc, char **argv)
   selected_linker = USE_BFD_LD;
 else if (strcmp (argv[i], "-fuse-ld=gold") == 0)
   selected_linker = USE_GOLD_LD;
+  else if (strcmp (argv[i], "-fuse-ld=lld") == 0)
+selected_linker = USE_LLD_LD;

 #ifdef COLLECT_EXPORT_LIST
 /* These flags are position independent, although their order
@@ -1093,7 +1097,8 @@ main (int argc, char **argv)
   /* Maybe we know the right file to use (if not cross).  */
   ld_file_name = 0;
 #ifdef DEFAULT_LINKER
-  if (selected_linker == USE_BFD_LD || selected_linker == USE_GOLD_LD)
+  if (selected_linker == USE_BFD_LD || selected_linker == USE_GOLD_LD ||
+  selected_linker == USE_LLD_LD)
 {
   char *linker_name;
 # ifdef HOST_EXECUTABLE_SUFFIX
@@ -1307,7 +1312,7 @@ main (int argc, char **argv)
   else if (!use_collect_ld
&& strncmp (arg, "-fuse-ld=", 9) == 0)
 {
-  /* Do not pass -fuse-ld={bfd|gold} to the linker. */
+  /* Do not pass -fuse-ld={bfd|gold|lld} to the linker. */
   ld1--;
   ld2--;
 }
diff --git a/gcc/common.opt b/gcc/common.opt
index 5d90385..2a95a1f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2536,6 +2536,10 @@ fuse-ld=gold
 Common Driver Negative(fuse-ld=bfd)
 Use the gold linker instead of the default linker.

+fuse-ld=lld
+Common Driver Negative(fuse-ld=lld)
+Use the lld LLVM linker instead of the default linker.
+
 fuse-linker-plugin
 Common Undocumented Var(flag_use_linker_plugin)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2c87c53..4b8acff 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10651,6 +10651,10 @@ Use the @command{bfd} linker instead of the default linker.
 @opindex fuse-ld=gold
 Use the @command{gold} linker instead of the default linker.

+@item -fuse-ld=lld
+@opindex fuse-ld=lld
+Use the LLVM @command{lld} linker instead of the default linker.
+
 @cindex Libraries
 @item -l@var{library}
 @itemx -l @var{library}
diff --git a/gcc/opts.c b/gcc/opts.c
index 7406210..f2c86f7 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2178,6 +2178,7 @@ common_handle_option (struct gcc_options *opts,

 case OPT_fuse_ld_bfd:
 case OPT_fuse_ld_gold:
+case OPT_fuse_ld_lld:
 case OPT_fuse_linker_plugin:
   /* No-op. Used by the driver and passed to us because it starts with f.*/
   break;
-- 
2.5.5
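
For readers who do not have collect2.c open, the mechanism the patch extends can be sketched as a standalone program fragment. This is a simplified illustration, not the collect2.c code: the enumerators and suffix strings mirror the diff above, "plugin-ld" is only a placeholder for PLUGIN_LD_SUFFIX, and `select_linker` and `linker_suffix` are hypothetical helper names.

```c
#include <string.h>

/* Mirrors the enum in the patch.  USE_LD_MAX must stay last because it
   sizes the suffix table below.  */
enum linker_kind
{
  USE_DEFAULT_LD,
  USE_PLUGIN_LD,
  USE_GOLD_LD,
  USE_BFD_LD,
  USE_LLD_LD,
  USE_LD_MAX
};

/* One suffix per enumerator, in the same order.  "plugin-ld" stands in
   for PLUGIN_LD_SUFFIX, whose real value depends on the configuration.  */
static const char *const ld_suffixes[USE_LD_MAX] =
{
  "ld",
  "plugin-ld",
  "ld.gold",
  "ld.bfd",
  "ld.lld"
};

/* Hypothetical helper: scan the arguments the way the patched loop
   does.  The last recognized -fuse-ld= option wins.  */
static enum linker_kind
select_linker (int argc, const char *const *argv)
{
  enum linker_kind selected = USE_DEFAULT_LD;

  for (int i = 1; i < argc; i++)
    {
      if (strcmp (argv[i], "-fuse-ld=bfd") == 0)
        selected = USE_BFD_LD;
      else if (strcmp (argv[i], "-fuse-ld=gold") == 0)
        selected = USE_GOLD_LD;
      else if (strcmp (argv[i], "-fuse-ld=lld") == 0)
        selected = USE_LLD_LD;
    }
  return selected;
}

/* Hypothetical helper: the program-name suffix for the chosen linker.  */
static const char *
linker_suffix (enum linker_kind kind)
{
  return ld_suffixes[kind];
}
```

The point of the sketch is that adding a linker means touching the enum and the table together; if one is extended without the other, the index and the suffix no longer line up.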

[PATCHv2q, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-23 Thread Bill Schmidt
Hi,

Once more with feeling...  I've revised my v2 patch to rename the
functions to __builtin_<name>q rather than __builtin_<name>f128 to avoid
the collision with Joseph's work.  I've also corrected the formatting
problems that Segher noted with my previous attempt.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with
no regressions.  Is this ok for trunk, and eventually for 6.2?

Thanks for your patience as we worked through the details of getting
this right for GCC 6 and 7!

(I've had some possible mailer problems with previous patches, so
I will try both inserting the patch and providing an attachment.  Please
let me know which (if either) is preferable.)

Thanks,
Bill


[gcc]

2016-06-23  Bill Schmidt  

* config/rs6000/rs6000-builtin.def (BU_FLOAT128_2): New #define.
(BU_FLOAT128_1): Likewise.
(FABSQ): Likewise.
(COPYSIGNQ): Likewise.
(RS6000_BUILTIN_NANQ): Likewise.
(RS6000_BUILTIN_NANSQ): Likewise.
(RS6000_BUILTIN_INFQ): Likewise.
(RS6000_BUILTIN_HUGE_VALQ): Likewise.
* config/rs6000/rs6000.c (rs6000_fold_builtin): New prototype.
(TARGET_FOLD_BUILTIN): New #define.
(rs6000_builtin_mask_calculate): Add TARGET_FLOAT128 entry.
(rs6000_invalid_builtin): Add handling for RS6000_BTM_FLOAT128.
(rs6000_fold_builtin): New target hook implementation, handling
folding of 128-bit NaNs and infinities.
(rs6000_init_builtins): Initialize const_str_type_node; ensure all
entries are filled in to avoid problems during bootstrap
self-test; define builtins for 128-bit NaNs and infinities.
(rs6000_opt_mask): Add entry for float128.
* config/rs6000/rs6000.h (RS6000_BTM_FLOAT128): New #define.
(RS6000_BTM_COMMON): Include RS6000_BTM_FLOAT128.
(rs6000_builtin_type_index): Add RS6000_BTI_const_str.
(const_str_type_node): New #define.
* config/rs6000/rs6000.md (copysign3 for IEEE128): Convert
to a define_expand that dispatches to either copysign3_soft
or copysign3_hard.
(copysign3_hard): Rename from copysign3.
(copysign3_soft): New define_insn.
* doc/extend.texi: Document new builtins.

[gcc/testsuite]

2016-06-23  Bill Schmidt  

* gcc.target/powerpc/abs128-1.c: New.
* gcc.target/powerpc/copysign128-1.c: New.
* gcc.target/powerpc/inf128-1.c: New.
* gcc.target/powerpc/nan128-1.c: New.


Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 237619)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -652,7 +652,23 @@
 | RS6000_BTC_BINARY),  \
CODE_FOR_ ## ICODE) /* ICODE */
 
+/* IEEE 128-bit floating-point builtins.  */
+#define BU_FLOAT128_2(ENUM, NAME, ATTR, ICODE)  \
+  RS6000_BUILTIN_2 (MISC_BUILTIN_ ## ENUM,  /* ENUM */  \
+"__builtin_" NAME,  /* NAME */  \
+   RS6000_BTM_FLOAT128,/* MASK */  \
+   (RS6000_BTC_ ## ATTR/* ATTR */  \
+| RS6000_BTC_BINARY),  \
+   CODE_FOR_ ## ICODE) /* ICODE */
 
+#define BU_FLOAT128_1(ENUM, NAME, ATTR, ICODE)  \
+  RS6000_BUILTIN_1 (MISC_BUILTIN_ ## ENUM,  /* ENUM */  \
+"__builtin_" NAME,  /* NAME */  \
+   RS6000_BTM_FLOAT128,/* MASK */  \
+   (RS6000_BTC_ ## ATTR/* ATTR */  \
+| RS6000_BTC_UNARY),   \
+   CODE_FOR_ ## ICODE) /* ICODE */
+
 /* Miscellaneous builtins for instructions added in ISA 3.0.  These
instructions don't require either the DFP or VSX options, just the basic
ISA 3.0 enablement since they operate on general purpose registers.  */
@@ -1814,6 +1830,11 @@ BU_P9V_OVERLOAD_1 (VPRTYBD,  "vprtybd")
 BU_P9V_OVERLOAD_1 (VPRTYBQ,"vprtybq")
 BU_P9V_OVERLOAD_1 (VPRTYBW,"vprtybw")
 
+/* 1 argument IEEE 128-bit floating-point functions.  */
+BU_FLOAT128_1 (FABSQ,  "fabsq",   CONST, abskf2)
+
+/* 2 argument IEEE 128-bit floating-point functions.  */
+BU_FLOAT128_2 (COPYSIGNQ,  "copysignq",   CONST, copysignkf3)
 
 /* 1 argument crypto functions.  */
 BU_CRYPTO_1 (VSBOX,"vsbox",  CONST, crypto_vsbox)
@@ -2191,6 +2212,18 @@ BU_SPECIAL_X (RS6000_BUILTIN_CPU_IS, "__builtin_cp
 BU_SPECIAL_X (RS6000_BUILTIN_CPU_SUPPORTS, "__builtin_cpu_supports",
  RS6000_BTM_ALWAYS, RS6000_BTC_MISC)
 
+BU_SPECIAL_X (RS6000_BUILTIN_NANQ, "__builtin_nanq",
+

Re: [PATCHv2, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-23 Thread Bill Schmidt
> 
> On Jun 23, 2016, at 5:41 PM, Joseph Myers  wrote:
> 
> On Thu, 23 Jun 2016, Bill Schmidt wrote:
> 
>> After discussing with the glibc folks, I'd like to propose that this patch
>> be altered to use the 'q' suffix for the builtin names.  That way we won't
>> have a naming conflict with Joseph's patch in the short term, and we'll
>> be able to stage the movement on trunk to the f128 support.
>> 
>> I've been informed that there are other packages/libraries that assume 
>> the 'q' suffix, so we will need both anyway.  For the time being, we can
>> use #defines for glibc using GCC 6 to define the f128 functions to be
>> the q functions.  We'll plan to normalize to use the arch-neutral f128
>> builtins after the 6.2 push completes.
> 
> Those #defines in glibc would be needed anyway for __float128 functions in 
> glibc for x86 to support GCC versions before GCC 7 (which x86 support I'm 
> minded to look at adding once the __float128 functions for powerpc64le are 
> in; adding them for a new architecture shouldn't be hard once the first 
> architecture is done).
> 
> The 'q' suffix should be considered legacy (just like the __float128 
> name!), but if it doesn't conflict with the generic support it's 
> essentially a target maintainer matter.  As I said in my patch submission 
> for the generic functions, I don't know if target built-in functions can 
> be made into aliases for generic ones, but that's what's desirable in 
> optimization terms once the generic ones are in, so that optimizations 
> apply equally to both.

Agreed!

> 
> (I'm presuming that eventually we *will* enable all built-in function and 
> other optimizations, that currently are just for float / double / long 
> double, for the new types and their corresponding TS 18661-3 functions as 
> well - that's just lower priority since it's not at all on the critical 
> path for enabling support for the new type, unlike this minimal set of 
> functions.)

Yes, that's the way we feel too -- it needs to happen, but further down the
road once we're out of this bottleneck.

> 
> Do those packages assuming 'q' expect more than the minimal built-in 
> functions (i.e., do they want libquadmath)?  I've noted before that while 
> libquadmath is not the way forward for libm support for __float128 (that's 
> *f128 functions in glibc), and while libquadmath is missing the past few 
> years' improvements to glibc libm, enabling it for powerpc64le (once 
> you've got built-in functions and complex arithmetic functions in libgcc) 
> would allow you to test that the back-end __float128 support works for a 
> substantial body of code with __float128 arithmetic.  (While there is 
> no libquadmath testsuite, Paul Murphy's recent work should make it much 
> easier to run the glibc libm-test with libquadmath than it was when 
> libquadmath was first added.)

Actually libquadmath is what I was thinking of -- there may be more, but
I'm not very tuned into this side of things.  Once we have the full set of
__float128 support in place, we should be able to do some of this kind
of testing.

Thanks!
Bill

> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com
> 



Re: [PATCHv2, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-23 Thread Bill Schmidt
Thanks, I'll make these changes and re-spin.  Not sure what was up
with my tabs...

> On Jun 23, 2016, at 6:49 PM, Segher Boessenkool  
> wrote:
> 
> Hi Bill,
> 
> Some little things about the patch...
> 
> On Thu, Jun 23, 2016 at 04:44:27PM -0500, Bill Schmidt wrote:
>> We no longer have a half-clever implementation to construct an infinity
>> inside vector registers, or the full-clever one that Segher proposed in
>> response. :)  We can try to add that support later if desired.
> 
> For posterity:
> 
> Use   vspltisw A,N ; vsrw B,A,A ; vslo D,B,A   to create in D:
> N=-16  _______  (ieee128 -Inf)
> N=-17  7fff_______  (ieee128 +Inf)

For the latter, it was N=-15 IIRC?  Anyway -17 is an illegal value for vspltisw.
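
The recollection can be checked with a small software model of the three instructions. This is only a sketch under my reading of the ISA semantics (vspltisw splats a sign-extended 5-bit immediate, vsrw shifts each word by the low 5 bits of the matching word, vslo shifts left by the byte count in bits 121:124), and the type and function names are made up for the illustration. Under that model N=-16 leaves 0xffff in the top halfword (-Inf) and N=-15 leaves 0x7fff (+Inf), with all other bits zero.

```c
#include <stdint.h>

/* A 128-bit vector register as four 32-bit words; w[0] is the most
   significant word (big-endian element order).  */
typedef struct { uint32_t w[4]; } vec128;

/* vspltisw: sign-extend the 5-bit immediate (-16..15) and splat it.  */
static vec128
vspltisw (int sim)
{
  vec128 r;
  for (int i = 0; i < 4; i++)
    r.w[i] = (uint32_t) (int32_t) sim;
  return r;
}

/* vsrw: shift each word of A right (logically) by the low 5 bits of
   the matching word of B.  */
static vec128
vsrw (vec128 a, vec128 b)
{
  vec128 r;
  for (int i = 0; i < 4; i++)
    r.w[i] = a.w[i] >> (b.w[i] & 31);
  return r;
}

/* vslo: shift A left by whole bytes; the count is bits 121:124 of B,
   i.e. bits 3..6 of its least significant byte.  */
static vec128
vslo (vec128 a, vec128 b)
{
  unsigned int nbytes = (b.w[3] >> 3) & 0xf;
  uint8_t bytes[16];
  vec128 r;

  /* Flatten A into bytes, most significant first.  */
  for (int i = 0; i < 16; i++)
    bytes[i] = (uint8_t) (a.w[i / 4] >> (24 - 8 * (i % 4)));

  /* Rebuild each word from the bytes NBYTES further down.  */
  for (int i = 0; i < 4; i++)
    {
      r.w[i] = 0;
      for (int j = 0; j < 4; j++)
        {
          unsigned int src = 4 * i + j + nbytes;
          uint8_t byte = src < 16 ? bytes[src] : 0;
          r.w[i] = (r.w[i] << 8) | byte;
        }
    }
  return r;
}

/* The sequence from the thread: vspltisw A,N ; vsrw B,A,A ; vslo D,B,A.  */
static vec128
splat_shift_sequence (int n)
{
  vec128 a = vspltisw (n);
  vec128 b = vsrw (a, a);
  return vslo (b, a);
}
```

In both cases the byte count taken from A is 14, so only the low halfword of the shifted word survives and lands at the top of the register.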

> 
>> @@ -35569,6 +35639,7 @@ static struct rs6000_opt_mask const rs6000_builtin
>>   { "hard-dfp",   RS6000_BTM_DFP,false, false },
>>   { "hard-float", RS6000_BTM_HARD_FLOAT, false, false },
>>   { "long-double-128",RS6000_BTM_LDBL128,false, false },
>> +  { "float128",  RS6000_BTM_FLOAT128,   false, false },
> 
> The previous entries use tabs for indentation.
> 
>> --- gcc/config/rs6000/rs6000.h   (revision 237619)
>> +++ gcc/config/rs6000/rs6000.h   (working copy)
>> @@ -2689,6 +2689,7 @@ extern int frame_pointer_needed;
>> #define RS6000_BTM_HARD_FLOATMASK_SOFT_FLOAT /* Hardware floating 
>> point.  */
>> #define RS6000_BTM_LDBL128   MASK_MULTIPLE   /* 128-bit long double.  */
>> #define RS6000_BTM_64BIT MASK_64BIT  /* 64-bit addressing.  */
>> +#define RS6000_BTM_FLOAT128 MASK_P9_VECTOR  /* IEEE 128-bit float.  */
> 
> Here, too.
> 
>> @@ -2705,7 +2706,8 @@ extern int frame_pointer_needed;
>>   | RS6000_BTM_CELL  \
>>   | RS6000_BTM_DFP   \
>>   | RS6000_BTM_HARD_FLOAT\
>> - | RS6000_BTM_LDBL128)
>> + | RS6000_BTM_LDBL128   \
>> + | RS6000_BTM_FLOAT128)
> 
> And here.  And more later.  Let's try to stick to one style, at least
> locally.
> 
>> --- gcc/config/rs6000/rs6000.md  (revision 237619)
>> +++ gcc/config/rs6000/rs6000.md  (working copy)
>> @@ -13326,7 +13326,25 @@
>>"xssqrtqp %0,%1"
>>   [(set_attr "type" "vecdiv")])
>> 
>> -(define_insn "copysign3"
>> +(define_expand "copysign3"
>> +  [(use (match_operand:IEEE128 0 "altivec_register_operand" ""))
>> +   (use (match_operand:IEEE128 1 "altivec_register_operand" ""))
>> +   (use (match_operand:IEEE128 2 "altivec_register_operand" ""))]
> 
> The "" is not needed.
> 
>> +  "FLOAT128_IEEE_P (mode)"
>> +{
>> +  if (TARGET_FLOAT128_HW)
>> +emit_insn (gen_copysign3_hard (operands[0], operands[1],
>> + operands[2]));
> 
> Tabbing here...
> 
>> +  else
>> +{
>> +  rtx tmp = gen_reg_rtx (mode);
>> +  emit_insn (gen_copysign3_soft (operands[0], operands[1],
>> +   operands[2], tmp));
> 
> ... and here is completely broken.
> 
>> @@ -13336,6 +13354,18 @@
>>"xscpsgnqp %0,%2,%1"
>>   [(set_attr "type" "vecsimple")])
>> 
>> +(define_insn "copysign3_soft"
>> +  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
>> +(unspec:IEEE128
>> + [(match_operand:IEEE128 1 "altivec_register_operand" "v")
>> +  (match_operand:IEEE128 2 "altivec_register_operand" "v")
>> +  (match_operand:IEEE128 3 "altivec_register_operand" "+v")]
>> + UNSPEC_COPYSIGN))]
>> +  "!TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
>> +   "xscpsgndp %x3,%x2,%x1\n\txxpermdi %x0,%x3,%x1,1"
> 
> Two machine insns in a template should be separated by \; not \n\t .
> 
>> +Additional built-in functions are available for the 64-bit PowerPC
>> +family of processors, for efficient use of 128-bit floating point
>> +(@code{__float128}) values.
>> +
>> +The following floating-point built-in functions are always available.  All
>> +of them implement the function that is part of the name.
> 
> "Always"?  Not just with -mfloat128?  And it needs VMX?

Sorry, pasto from the 386 docs.  I meant to change that, sorry.

Bill

> 
> 
> Segher
> 



[PATCH] Enable non-PIC noplt tests on 32-bit x86 target

2016-06-23 Thread H.J. Lu
Since non-PIC noplt works on 32-bit x86 target now with assembler/linker
support, enable non-PIC noplt tests on 32-bit x86 target.  main in
noplt-2.c and noplt-4.c are renamed to bar to avoid stack re-alignment
in main for 32-bit target, which disables tailcall optimization.

Tested on x86.  OK for trunk?

H.J.
---
* gcc.target/i386/noplt-1.c: Don't disable for ia32.  Scan for
ia32 if R_386_GOT32X relocation is supported.
* gcc.target/i386/noplt-3.c: Likewise.
* gcc.target/i386/noplt-2.c: Likewise.
(main): Renamed to ...
(bar): This.
* gcc.target/i386/noplt-4.c: Likewise.
(main): Renamed to ...
(bar): This.
* gcc.target/i386/pr67400-3.c: Don't disable for ia32.
* gcc.target/i386/pr67400-5.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/noplt-1.c   | 5 +++--
 gcc/testsuite/gcc.target/i386/noplt-2.c   | 7 ++++---
 gcc/testsuite/gcc.target/i386/noplt-3.c   | 5 +++--
 gcc/testsuite/gcc.target/i386/noplt-4.c   | 7 ++++---
 gcc/testsuite/gcc.target/i386/pr67400-3.c | 2 +-
 gcc/testsuite/gcc.target/i386/pr67400-5.c | 2 +-
 6 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/noplt-1.c b/gcc/testsuite/gcc.target/i386/noplt-1.c
index cc04bf5..f099a38 100644
--- a/gcc/testsuite/gcc.target/i386/noplt-1.c
+++ b/gcc/testsuite/gcc.target/i386/noplt-1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
+/* { dg-do compile { target *-*-linux* } } */
 /* { dg-options "-fno-pic" } */
 
 __attribute__ ((noplt))
@@ -10,4 +10,5 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*.foo@GOTPCREL" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*.foo@GOT" { target { ia32 && got32x_reloc } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/noplt-2.c b/gcc/testsuite/gcc.target/i386/noplt-2.c
index 54e33f4..9548b81 100644
--- a/gcc/testsuite/gcc.target/i386/noplt-2.c
+++ b/gcc/testsuite/gcc.target/i386/noplt-2.c
@@ -1,13 +1,14 @@
-/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
+/* { dg-do compile { target *-*-linux* } } */
 /* { dg-options "-O2 -fno-pic" } */
 
 
 __attribute__ ((noplt))
 int foo();
 
-int main()
+int bar()
 {
   return foo();
 }
 
-/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*.foo@GOTPCREL" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*.foo@GOT" { target { ia32 && got32x_reloc } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/noplt-3.c b/gcc/testsuite/gcc.target/i386/noplt-3.c
index 14e6b6b..436c0d1 100644
--- a/gcc/testsuite/gcc.target/i386/noplt-3.c
+++ b/gcc/testsuite/gcc.target/i386/noplt-3.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
+/* { dg-do compile { target *-*-linux* } } */
 /* { dg-options "-fno-pic -fno-plt" } */
 
 void foo();
@@ -9,4 +9,5 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*.foo@GOTPCREL" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*.foo@GOT" { target { ia32 && got32x_reloc } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/noplt-4.c b/gcc/testsuite/gcc.target/i386/noplt-4.c
index 9907347..b89fcf0 100644
--- a/gcc/testsuite/gcc.target/i386/noplt-4.c
+++ b/gcc/testsuite/gcc.target/i386/noplt-4.c
@@ -1,11 +1,12 @@
-/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
+/* { dg-do compile { target *-*-linux* } } */
 /* { dg-options "-O2 -fno-pic -fno-plt" } */
 
 int foo();
 
-int main()
+int bar()
 {
   return foo();
 }
 
-/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*.foo@GOTPCREL" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*.foo@GOT" { target { ia32 && got32x_reloc } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr67400-3.c b/gcc/testsuite/gcc.target/i386/pr67400-3.c
index 649c980..fd2f209 100644
--- a/gcc/testsuite/gcc.target/i386/pr67400-3.c
+++ b/gcc/testsuite/gcc.target/i386/pr67400-3.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
+/* { dg-do compile { target *-*-linux* } } */
 /* { dg-options "-O2 -fno-pic -fno-plt" } */
 
 static void
diff --git a/gcc/testsuite/gcc.target/i386/pr67400-5.c b/gcc/testsuite/gcc.target/i386/pr67400-5.c
index 2d26a68..9bb98dc 100644
--- a/gcc/testsuite/gcc.target/i386/pr67400-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr67400-5.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
+/* { dg-do compile { target *-*-linux* } } */
 /* { dg-options "-O2 -fno-pic -fno-plt" } */
 
 extern void foo (void);
-- 
2.5.5



Re: [PATCHv2, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-23 Thread Segher Boessenkool
Hi Bill,

Some little things about the patch...

On Thu, Jun 23, 2016 at 04:44:27PM -0500, Bill Schmidt wrote:
> We no longer have a half-clever implementation to construct an infinity
> inside vector registers, or the full-clever one that Segher proposed in
> response. :)  We can try to add that support later if desired.

For posterity:

Use   vspltisw A,N ; vsrw B,A,A ; vslo D,B,A   to create in D:
N=-16  _______  (ieee128 -Inf)
N=-17  7fff_______  (ieee128 +Inf)

> @@ -35569,6 +35639,7 @@ static struct rs6000_opt_mask const rs6000_builtin
>{ "hard-dfp",   RS6000_BTM_DFP,false, false },
>{ "hard-float", RS6000_BTM_HARD_FLOAT, false, false },
>{ "long-double-128",RS6000_BTM_LDBL128,false, false },
> +  { "float128",  RS6000_BTM_FLOAT128,   false, false },

The previous entries use tabs for indentation.

> --- gcc/config/rs6000/rs6000.h(revision 237619)
> +++ gcc/config/rs6000/rs6000.h(working copy)
> @@ -2689,6 +2689,7 @@ extern int frame_pointer_needed;
>  #define RS6000_BTM_HARD_FLOATMASK_SOFT_FLOAT /* Hardware floating 
> point.  */
>  #define RS6000_BTM_LDBL128   MASK_MULTIPLE   /* 128-bit long double.  */
>  #define RS6000_BTM_64BIT MASK_64BIT  /* 64-bit addressing.  */
> +#define RS6000_BTM_FLOAT128 MASK_P9_VECTOR  /* IEEE 128-bit float.  */

Here, too.

> @@ -2705,7 +2706,8 @@ extern int frame_pointer_needed;
>| RS6000_BTM_CELL  \
>| RS6000_BTM_DFP   \
>| RS6000_BTM_HARD_FLOAT\
> -  | RS6000_BTM_LDBL128)
> +  | RS6000_BTM_LDBL128   \
> +  | RS6000_BTM_FLOAT128)

And here.  And more later.  Let's try to stick to one style, at least
locally.

> --- gcc/config/rs6000/rs6000.md   (revision 237619)
> +++ gcc/config/rs6000/rs6000.md   (working copy)
> @@ -13326,7 +13326,25 @@
> "xssqrtqp %0,%1"
>[(set_attr "type" "vecdiv")])
>  
> -(define_insn "copysign3"
> +(define_expand "copysign3"
> +  [(use (match_operand:IEEE128 0 "altivec_register_operand" ""))
> +   (use (match_operand:IEEE128 1 "altivec_register_operand" ""))
> +   (use (match_operand:IEEE128 2 "altivec_register_operand" ""))]

The "" is not needed.

> +  "FLOAT128_IEEE_P (mode)"
> +{
> +  if (TARGET_FLOAT128_HW)
> +emit_insn (gen_copysign3_hard (operands[0], operands[1],
> +  operands[2]));

Tabbing here...

> +  else
> +{
> +  rtx tmp = gen_reg_rtx (mode);
> +  emit_insn (gen_copysign3_soft (operands[0], operands[1],
> +operands[2], tmp));

... and here is completely broken.

> @@ -13336,6 +13354,18 @@
> "xscpsgnqp %0,%2,%1"
>[(set_attr "type" "vecsimple")])
>  
> +(define_insn "copysign3_soft"
> +  [(set (match_operand:IEEE128 0 "altivec_register_operand" "=v")
> + (unspec:IEEE128
> +  [(match_operand:IEEE128 1 "altivec_register_operand" "v")
> +   (match_operand:IEEE128 2 "altivec_register_operand" "v")
> +   (match_operand:IEEE128 3 "altivec_register_operand" "+v")]
> +  UNSPEC_COPYSIGN))]
> +  "!TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
> +   "xscpsgndp %x3,%x2,%x1\n\txxpermdi %x0,%x3,%x1,1"

Two machine insns in a template should be separated by \; not \n\t .
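
For readers unfamiliar with the two instructions in the quoted template, the data flow can be modelled on the raw bits. This is a sketch based on my reading of the instructions (xscpsgndp takes the sign from its first source and the rest from its second, on the upper doubleword; xxpermdi with selector 1 pairs the upper doubleword of one source with the lower doubleword of the other). The struct and function names are invented for the example and do not appear in the patch.

```c
#include <stdint.h>

/* An IEEE 128-bit value as two 64-bit halves.  The sign bit is the top
   bit of HI; LO holds only fraction bits.  */
typedef struct { uint64_t hi, lo; } ieee128_bits;

#define SIGN64 UINT64_C (0x8000000000000000)

/* Return X with its sign replaced by the sign of Y, working only on
   the bit patterns.  */
static ieee128_bits
copysign128_bits (ieee128_bits x, ieee128_bits y)
{
  ieee128_bits r;

  /* Step 1 (the xscpsgndp part): build a temporary upper doubleword
     with the sign of Y and the remaining 63 bits of X.  */
  uint64_t tmp_hi = (x.hi & ~SIGN64) | (y.hi & SIGN64);

  /* Step 2 (the xxpermdi part): combine the temporary upper doubleword
     with the untouched lower doubleword of X.  */
  r.hi = tmp_hi;
  r.lo = x.lo;
  return r;
}
```

Only the upper doubleword ever changes, which is why a double-precision sign copy is enough even though the value is 128 bits wide.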

> +Additional built-in functions are available for the 64-bit PowerPC
> +family of processors, for efficient use of 128-bit floating point
> +(@code{__float128}) values.
> +
> +The following floating-point built-in functions are always available.  All
> +of them implement the function that is part of the name.

"Always"?  Not just with -mfloat128?  And it needs VMX?


Segher


Re: [PATCHv2, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-23 Thread Joseph Myers
On Thu, 23 Jun 2016, Bill Schmidt wrote:

> After discussing with the glibc folks, I'd like to propose that this patch
> be altered to use the 'q' suffix for the builtin names.  That way we won't
> have a naming conflict with Joseph's patch in the short term, and we'll
> be able to stage the movement on trunk to the f128 support.
> 
> I've been informed that there are other packages/libraries that assume 
> the 'q' suffix, so we will need both anyway.  For the time being, we can
> use #defines for glibc using GCC 6 to define the f128 functions to be
> the q functions.  We'll plan to normalize to use the arch-neutral f128
> builtins after the 6.2 push completes.

Those #defines in glibc would be needed anyway for __float128 functions in 
glibc for x86 to support GCC versions before GCC 7 (which x86 support I'm 
minded to look at adding once the __float128 functions for powerpc64le are 
in; adding them for a new architecture shouldn't be hard once the first 
architecture is done).
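
A minimal sketch of what such #defines could look like, for illustration only. It is not taken from glibc: the version test, the `f128_t` typedef and the `magnitude_with_sign_of` helper are assumptions of this example, and it presumes a compiler that provides either the 'q' built-ins (older GCC on a target that has them) or the generic f128 built-ins (GCC 7 and later).

```c
/* Hypothetical compatibility shim, not glibc code.  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ < 7
/* Older GCC: only the target-specific 'q' built-ins exist, so map the
   f128 names onto them.  */
typedef __float128 f128_t;
# define __builtin_fabsf128 __builtin_fabsq
# define __builtin_copysignf128 __builtin_copysignq
#else
/* Newer compilers are assumed to provide the generic built-ins.  */
typedef _Float128 f128_t;
#endif

/* Code below this point uses only the f128 spelling.  */
static f128_t
magnitude_with_sign_of (f128_t x, f128_t y)
{
  return __builtin_copysignf128 (__builtin_fabsf128 (x), y);
}
```

The rest of the code base then uses one spelling, and the shim can be dropped once the oldest supported compiler has the generic built-ins.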

The 'q' suffix should be considered legacy (just like the __float128 
name!), but if it doesn't conflict with the generic support it's 
essentially a target maintainer matter.  As I said in my patch submission 
for the generic functions, I don't know if target built-in functions can 
be made into aliases for generic ones, but that's what's desirable in 
optimization terms once the generic ones are in, so that optimizations 
apply equally to both.

(I'm presuming that eventually we *will* enable all built-in function and 
other optimizations, that currently are just for float / double / long 
double, for the new types and their corresponding TS 18661-3 functions as 
well - that's just lower priority since it's not at all on the critical 
path for enabling support for the new type, unlike this minimal set of 
functions.)

Do those packages assuming 'q' expect more than the minimal built-in 
functions (i.e., do they want libquadmath)?  I've noted before that while 
libquadmath is not the way forward for libm support for __float128 (that's 
*f128 functions in glibc), and while libquadmath is missing the past few 
years' improvements to glibc libm, enabling it for powerpc64le (once 
you've got built-in functions and complex arithmetic functions in libgcc) 
would allow you to test that the back-end __float128 support works for a 
substantial body of code with __float128 arithmetic.  (While there is 
no libquadmath testsuite, Paul Murphy's recent work should make it much 
easier to run the glibc libm-test with libquadmath than it was when 
libquadmath was first added.)

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH], Add PowerPC ISA 3.0 lxsihzx, lxsibzx, stxsihx, stxsibx support

2016-06-23 Thread Michael Meissner
PowerPC ISA 3.0 adds new instructions (LXSIHZX, LXSIBZX, STXSIHX, and STXSIBX)
that allow you to load and zero extend byte and half word values from memory
and to store them back.

This patch is similar in spirit to the patch I wrote years ago for power7 that
generates LFIWAX, LFIWZX, and STFIWX when loading up 32-bit integers to convert
to floating point, and converting floating point to 32-bit integers.

At some point it would be nice to allow various small integers directly into
the floating/vector registers, but I suspect that will take some amount of
effort to implement and tune.  So this patch adds support to avoid using direct
move when converting between small integers and floating point.

If you are curious, out of the 29 SPEC CPU 2006 benchmarks, there are 8
benchmarks (perlbench, cactusADM, gobmk, povray, h264ref, omnetpp, wrf, and
sphinx3) that load small integers from memory and convert them to
floating point.

There are 3 benchmarks (cactusADM, povray, and wrf) that convert floating point
to small integers and store the result.
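
As a rough illustration (not taken from the patch or its test cases), source
code of the following shape produces the conversions described above.  The
function names are made up for this sketch, and whether the compiler actually
selects LXSIBZX/LXSIHZX or STXSIBX/STXSIHX for them depends on the target
options in effect.

```cpp
#include <cstdint>

// Load an 8-bit or 16-bit unsigned integer from memory and convert it
// to floating point (the "load and zero extend" direction).
double u8_to_double(const std::uint8_t *p)
{
  return static_cast<double>(*p);
}

double u16_to_double(const std::uint16_t *p)
{
  return static_cast<double>(*p);
}

// Convert floating point to a 16-bit integer and store the result.
// The value is truncated toward zero; it must be representable in
// std::int16_t after truncation.
void double_to_i16(double value, std::int16_t *out)
{
  *out = static_cast<std::int16_t>(value);
}
```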

I have done a bootstrap and make check on a power8 little endian system and
there were no regressions.  Are these patches ok to check into the trunk,
and after a burn-in period, check them into the GCC 6.2 branch?

[gcc]
2016-06-23  Michael Meissner  

* config/rs6000/vsx.md (UNSPEC_P9_MEMORY): New unspec to support
loading and storing byte/half-word values in the vector registers.
(vsx_sign_extend_hi_): Enable the generator function.
(p9_lxsizx): New insns to load zero-extended bytes and
half-words on ISA 3.0 to the vector registers.
(p9_stxsizx): New insns to store zero-extended bytes and
half-words on ISA 3.0 from the vector registers.
* config/rs6000/rs6000.md (FP_ISA3): New iterator to optimize
converting char/half-word items to floating point on ISA 3.0.
(float2): On ISA 3.0 generate the lxsihzx
and lxsibzx instructions if we are converting an 8-bit or 16-bit
item from memory to floating point.
(float2_internal): Likewise.
(floatuns2): Likewise.
(floatuns2_internal): Likewise.
(fix_trunc2): On ISA 3.0 generate the stxsihx
and stxsibx instructions to store floating point values converted
to 8 or 16-bit integers.
(fixuns_truncsi2): Likewise.

[gcc/testsuite]
2016-06-23  Michael Meissner  

* gcc.target/powerpc/p9-fpcvt-1.c: New test to test ISA 3.0 load
byte/half-word to vector registers and store byte/half-word from
vector register instructions.
* gcc.target/powerpc/p9-fpcvt-2.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 237716)
+++ gcc/config/rs6000/vsx.md (.../gcc/config/rs6000) (working copy)
@@ -293,6 +293,7 @@ (define_c_enum "unspec"
UNSPEC_VSX_XVCVDPSXDS
UNSPEC_VSX_XVCVDPUXDS
UNSPEC_VSX_SIGN_EXTEND
+   UNSPEC_P9_MEMORY
   ])
 
 ;; VSX moves
@@ -2705,7 +2706,7 @@ (define_insn "vsx_sign_extend_qi_"
   "vextsb2 %0,%1"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "*vsx_sign_extend_hi_"
+(define_insn "vsx_sign_extend_hi_"
   [(set (match_operand:VSINT_84 0 "vsx_register_operand" "=v")
(unspec:VSINT_84
 [(match_operand:V8HI 1 "vsx_register_operand" "v")]
@@ -2721,3 +2722,24 @@ (define_insn "*vsx_sign_extend_si_v2di"
   "TARGET_P9_VECTOR"
   "vextsw2d %0,%1"
   [(set_attr "type" "vecsimple")])
+
+
+;; ISA 3.0 memory operations
+(define_insn "p9_lxsizx"
+  [(set (match_operand:DI 0 "vsx_register_operand" "=wi")
+   (unspec:DI [(zero_extend:DI
+(match_operand:QHI 1 "indexed_or_indirect_operand" "Z"))]
+  UNSPEC_P9_MEMORY))]
+  "TARGET_P9_VECTOR"
+  "lxsizx %x0,%y1"
+  [(set_attr "type" "fpload")])
+
+(define_insn "p9_stxsix"
+  [(set (match_operand:QHI 0 "reg_or_indexed_operand" "=r,Z")
+   (unspec:QHI [(match_operand:DI 1 "vsx_register_operand" "wi,wi")]
+   UNSPEC_P9_MEMORY))]
+  "TARGET_P9_VECTOR"
+  "@
+   mfvsrd %0,%x1
+   stxsix %x1,%y0"
+  [(set_attr "type" "mffgpr,fpstore")])
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 237716)
+++ gcc/config/rs6000/rs6000.md (.../gcc/config/rs6000) (working copy)
@@ -506,6 +506,12 @@ (define_mode_iterator FLOAT128 [(KF "TAR
(IF "TARGET_FLOAT128")
(TF "TARGET_LONG_DOUBLE_128")])
 
+; Iterator for 

Re: [PATCHv2, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-23 Thread Bill Schmidt
After discussing with the glibc folks, I'd like to propose that this patch
be altered to use the 'q' suffix for the builtin names.  That way we won't
have a naming conflict with Joseph's patch in the short term, and we'll
be able to stage the movement on trunk to the f128 support.

I've been informed that there are other packages/libraries that assume 
the 'q' suffix, so we will need both anyway.  For the time being, we can
use #defines for glibc using GCC 6 to define the f128 functions to be
the q functions.  We'll plan to normalize to use the arch-neutral f128
builtins after the 6.2 push completes.
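
A minimal sketch of the #define approach, for illustration only.  The names
demo_fabsq and demo_fabsf128 are made-up stand-ins, not the real builtins,
and long double is used so the sketch compiles on any target, whether or not
it has __float128 support.

```cpp
// Hypothetical stand-in for a 'q'-suffixed builtin.  It forwards to the
// long double builtin so that the sketch does not depend on __float128.
inline long double demo_fabsq(long double x)
{
  return __builtin_fabsl(x);
}

// The f128 spelling is only a macro alias for the q spelling, so code
// written against the f128 name resolves to the q function.
#define demo_fabsf128(x) demo_fabsq(x)
```

Code that later moves to a compiler providing the f128 name directly would
simply drop the macro.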

If this is acceptable, I'll respin the patch with the new names and we
can move ahead.

Thanks,
Bill

> On Jun 23, 2016, at 4:57 PM, Bill Schmidt  wrote:
> 
> So, I wasn't quite clear here... this is what I want to be able to put in 
> 6.2.  Normally we would put it upstream in trunk for burn-in, and then
> backport after a bit.
> 
> Unfortunately we are going to have a naming conflict with Joseph's
> patch to add the _Floatn, etc., builtin support.  So we can't put this
> directly upstream.
> 
> We are in a bit of a quandary with timing to get this upstream for 6.2.
> It seems for trunk we need to redevelop the patch after Joseph's
> patch lands.  It is probably not too hard to make that all work, but I
> need to understand it better.  Many portions of this patch won't be
> appropriate for trunk, unless we choose to rename all the functions
> to use a q suffix instead of f128.  I'm not sure how our glibc fellows
> would feel about a change in naming, but probably not excited.
> 
> So, not quite sure where to go with this in order to get the code in
> 6.2 before it closes.  Open to advice.
> 
> Bill
> 
>> On Jun 23, 2016, at 4:44 PM, Bill Schmidt  
>> wrote:
>> 
>> Hi,
>> 
>> This is a revision of my previous patch, correcting two issues.  The inff128
>> and huge_valf128 builtins now participate in folding so they are suitable
>> for use in static initializers, and the new builtins are now documented.
>> We no longer have a half-clever implementation to construct an infinity
>> inside vector registers, or the full-clever one that Segher proposed in
>> response. :)  We can try to add that support later if desired.
>> 
>> Regstrap in progress for powerpc64le-unknown-linux-gnu.  Provided
>> there are no regressions, is this ok?
>> 
>> Thanks,
>> Bill
>> 
>> 
>> [gcc]
>> 
>> 2016-06-23  Bill Schmidt  
>> 
>>  * config/rs6000/rs6000-builtin.def (BU_FLOAT128_2): New #define.
>>  (BU_FLOAT128_1): Likewise.
>>  (FABSF128): Likewise.
>>  (COPYSIGNF128): Likewise.
>>  (RS6000_BUILTIN_NANF128): Likewise.
>>  (RS6000_BUILTIN_NANSF128): Likewise.
>>  (RS6000_BUILTIN_INFF128): Likewise.
>>  (RS6000_BUILTIN_HUGE_VALF128): Likewise.
>>  * config/rs6000/rs6000.c (rs6000_fold_builtin): New prototype.
>>  (TARGET_FOLD_BUILTIN): New #define.
>>  (rs6000_builtin_mask_calculate): Add TARGET_FLOAT128 entry.
>>  (rs6000_invalid_builtin): Add handling for RS6000_BTM_FLOAT128.
>>  (rs6000_fold_builtin): New target hook implementation, handling
>>  folding of 128-bit NaNs and infinities.
>>  (rs6000_init_builtins): Initialize const_str_type_node; ensure all
>>  entries are filled in to avoid problems during bootstrap
>>  self-test; define builtins for 128-bit NaNs and infinities.
>>  (rs6000_opt_mask): Add entry for float128.
>>  * config/rs6000/rs6000.h (RS6000_BTM_FLOAT128): New #define.
>>  (RS6000_BTM_COMMON): Include RS6000_BTM_FLOAT128.
>>  (rs6000_builtin_type_index): Add RS6000_BTI_const_str.
>>  (const_str_type_node): New #define.
>>  * config/rs6000/rs6000.md (copysign3 for IEEE128): Convert
>>  to a define_expand that dispatches to either copysign3_soft
>>  or copysign3_hard.
>>  (copysign3_hard): Rename from copysign3.
>>  (copysign3_soft): New define_insn.
>>  * doc/extend.texi: Document new builtins.
>> 
>> [gcc/testsuite]
>> 
>> 2016-06-23  Bill Schmidt  
>> 
>>  * gcc.target/powerpc/abs128-1.c: New.
>>  * gcc.target/powerpc/copysign128-1.c: New.
>>  * gcc.target/powerpc/inf128-1.c: New.
>>  * gcc.target/powerpc/nan128-1.c: New.
>> 
>> 
>> Index: gcc/config/rs6000/rs6000-builtin.def
>> ===
>> --- gcc/config/rs6000/rs6000-builtin.def (revision 237619)
>> +++ gcc/config/rs6000/rs6000-builtin.def (working copy)
>> @@ -652,7 +652,23 @@
>>   | RS6000_BTC_BINARY),  \
>>  CODE_FOR_ ## ICODE) /* ICODE */
>> 
>> +/* IEEE 128-bit floating-point builtins.  */
>> +#define BU_FLOAT128_2(ENUM, NAME, ATTR, ICODE)  \
>> +  RS6000_BUILTIN_2 (MISC_BUILTIN_ ## ENUM,  /* ENUM */  \
>> +

Re: [PATCHv2, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-23 Thread Bill Schmidt
So, I wasn't quite clear here... this is what I want to be able to put in 
6.2.  Normally we would put it upstream in trunk for burn-in, and then
backport after a bit.

Unfortunately we are going to have a naming conflict with Joseph's
patch to add the _Floatn, etc., builtin support.  So we can't put this
directly upstream.

We are in a bit of a quandary with timing to get this upstream for 6.2.
It seems for trunk we need to redevelop the patch after Joseph's
patch lands.  It is probably not too hard to make that all work, but I
need to understand it better.  Many portions of this patch won't be
appropriate for trunk, unless we choose to rename all the functions
to use a q suffix instead of f128.  I'm not sure how our glibc fellows
would feel about a change in naming, but probably not excited.

So, not quite sure where to go with this in order to get the code in
6.2 before it closes.  Open to advice.

Bill

> On Jun 23, 2016, at 4:44 PM, Bill Schmidt  wrote:
> 
> Hi,
> 
> This is a revision of my previous patch, correcting two issues.  The inff128
> and huge_valf128 builtins now participate in folding so they are suitable
> for use in static initializers, and the new builtins are now documented.
> We no longer have a half-clever implementation to construct an infinity
> inside vector registers, or the full-clever one that Segher proposed in
> response. :)  We can try to add that support later if desired.
> 
> Regstrap in progress for powerpc64le-unknown-linux-gnu.  Provided
> there are no regressions, is this ok?
> 
> Thanks,
> Bill
> 
> 
> [gcc]
> 
> 2016-06-23  Bill Schmidt  
> 
>   * config/rs6000/rs6000-builtin.def (BU_FLOAT128_2): New #define.
>   (BU_FLOAT128_1): Likewise.
>   (FABSF128): Likewise.
>   (COPYSIGNF128): Likewise.
>   (RS6000_BUILTIN_NANF128): Likewise.
>   (RS6000_BUILTIN_NANSF128): Likewise.
>   (RS6000_BUILTIN_INFF128): Likewise.
>   (RS6000_BUILTIN_HUGE_VALF128): Likewise.
>   * config/rs6000/rs6000.c (rs6000_fold_builtin): New prototype.
>   (TARGET_FOLD_BUILTIN): New #define.
>   (rs6000_builtin_mask_calculate): Add TARGET_FLOAT128 entry.
>   (rs6000_invalid_builtin): Add handling for RS6000_BTM_FLOAT128.
>   (rs6000_fold_builtin): New target hook implementation, handling
>   folding of 128-bit NaNs and infinities.
>   (rs6000_init_builtins): Initialize const_str_type_node; ensure all
>   entries are filled in to avoid problems during bootstrap
>   self-test; define builtins for 128-bit NaNs and infinities.
>   (rs6000_opt_mask): Add entry for float128.
>   * config/rs6000/rs6000.h (RS6000_BTM_FLOAT128): New #define.
>   (RS6000_BTM_COMMON): Include RS6000_BTM_FLOAT128.
>   (rs6000_builtin_type_index): Add RS6000_BTI_const_str.
>   (const_str_type_node): New #define.
>   * config/rs6000/rs6000.md (copysign3 for IEEE128): Convert
>   to a define_expand that dispatches to either copysign3_soft
>   or copysign3_hard.
>   (copysign3_hard): Rename from copysign3.
>   (copysign3_soft): New define_insn.
>   * doc/extend.texi: Document new builtins.
> 
> [gcc/testsuite]
> 
> 2016-06-23  Bill Schmidt  
> 
>   * gcc.target/powerpc/abs128-1.c: New.
>   * gcc.target/powerpc/copysign128-1.c: New.
>   * gcc.target/powerpc/inf128-1.c: New.
>   * gcc.target/powerpc/nan128-1.c: New.
> 
> 
> Index: gcc/config/rs6000/rs6000-builtin.def
> ===
> --- gcc/config/rs6000/rs6000-builtin.def  (revision 237619)
> +++ gcc/config/rs6000/rs6000-builtin.def  (working copy)
> @@ -652,7 +652,23 @@
>| RS6000_BTC_BINARY),  \
>   CODE_FOR_ ## ICODE) /* ICODE */
> 
> +/* IEEE 128-bit floating-point builtins.  */
> +#define BU_FLOAT128_2(ENUM, NAME, ATTR, ICODE)  \
> +  RS6000_BUILTIN_2 (MISC_BUILTIN_ ## ENUM,  /* ENUM */  \
> +"__builtin_" NAME,  /* NAME */  \
> + RS6000_BTM_FLOAT128,/* MASK */  \
> + (RS6000_BTC_ ## ATTR/* ATTR */  \
> +  | RS6000_BTC_BINARY),  \
> + CODE_FOR_ ## ICODE) /* ICODE */
> 
> +#define BU_FLOAT128_1(ENUM, NAME, ATTR, ICODE)  \
> +  RS6000_BUILTIN_1 (MISC_BUILTIN_ ## ENUM,  /* ENUM */  \
> +"__builtin_" NAME,  /* NAME */  \
> + RS6000_BTM_FLOAT128,/* MASK */  \
> + (RS6000_BTC_ ## ATTR/* ATTR */  \
> +  | RS6000_BTC_UNARY),   \
> + CODE_FOR_ ## ICODE) /* ICODE */
> +
> /* 

[PATCHv2, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-23 Thread Bill Schmidt
Hi,

This is a revision of my previous patch, correcting two issues.  The inff128
and huge_valf128 builtins now participate in folding so they are suitable
for use in static initializers, and the new builtins are now documented.
We no longer have a half-clever implementation to construct an infinity
inside vector registers, or the full-clever one that Segher proposed in
response. :)  We can try to add that support later if desired.
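
To illustrate what folding buys, here is a small sketch using the existing
double counterparts (__builtin_inf, __builtin_huge_val, __builtin_nan), since
the new 128-bit builtins are only available with this patch on powerpc.  The
constant names are made up for the example, and IEEE double is assumed.

```cpp
// Objects with static storage duration.  constexpr forces the
// initializers to be evaluated at compile time, which only works
// because the compiler folds these builtins to constants.
constexpr double kInfinity = __builtin_inf();
constexpr double kHugeVal = __builtin_huge_val();
constexpr double kQuietNaN = __builtin_nan("");

static_assert(kInfinity > 1.0e308, "infinity exceeds every finite double");
static_assert(kHugeVal == kInfinity, "HUGE_VAL is infinity for IEEE double");

// A NaN is the only value that compares unequal to itself.
inline bool is_nan(double x)
{
  return x != x;
}
```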

Regstrap in progress for powerpc64le-unknown-linux-gnu.  Provided
there are no regressions, is this ok?

Thanks,
Bill


[gcc]

2016-06-23  Bill Schmidt  

* config/rs6000/rs6000-builtin.def (BU_FLOAT128_2): New #define.
(BU_FLOAT128_1): Likewise.
(FABSF128): Likewise.
(COPYSIGNF128): Likewise.
(RS6000_BUILTIN_NANF128): Likewise.
(RS6000_BUILTIN_NANSF128): Likewise.
(RS6000_BUILTIN_INFF128): Likewise.
(RS6000_BUILTIN_HUGE_VALF128): Likewise.
* config/rs6000/rs6000.c (rs6000_fold_builtin): New prototype.
(TARGET_FOLD_BUILTIN): New #define.
(rs6000_builtin_mask_calculate): Add TARGET_FLOAT128 entry.
(rs6000_invalid_builtin): Add handling for RS6000_BTM_FLOAT128.
(rs6000_fold_builtin): New target hook implementation, handling
folding of 128-bit NaNs and infinities.
(rs6000_init_builtins): Initialize const_str_type_node; ensure all
entries are filled in to avoid problems during bootstrap
self-test; define builtins for 128-bit NaNs and infinities.
(rs6000_opt_mask): Add entry for float128.
* config/rs6000/rs6000.h (RS6000_BTM_FLOAT128): New #define.
(RS6000_BTM_COMMON): Include RS6000_BTM_FLOAT128.
(rs6000_builtin_type_index): Add RS6000_BTI_const_str.
(const_str_type_node): New #define.
* config/rs6000/rs6000.md (copysign3 for IEEE128): Convert
to a define_expand that dispatches to either copysign3_soft
or copysign3_hard.
(copysign3_hard): Rename from copysign3.
(copysign3_soft): New define_insn.
* doc/extend.texi: Document new builtins.

[gcc/testsuite]

2016-06-23  Bill Schmidt  

* gcc.target/powerpc/abs128-1.c: New.
* gcc.target/powerpc/copysign128-1.c: New.
* gcc.target/powerpc/inf128-1.c: New.
* gcc.target/powerpc/nan128-1.c: New.


Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 237619)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -652,7 +652,23 @@
 | RS6000_BTC_BINARY),  \
CODE_FOR_ ## ICODE) /* ICODE */
 
+/* IEEE 128-bit floating-point builtins.  */
+#define BU_FLOAT128_2(ENUM, NAME, ATTR, ICODE)  \
+  RS6000_BUILTIN_2 (MISC_BUILTIN_ ## ENUM,  /* ENUM */  \
+"__builtin_" NAME,  /* NAME */  \
+   RS6000_BTM_FLOAT128,/* MASK */  \
+   (RS6000_BTC_ ## ATTR/* ATTR */  \
+| RS6000_BTC_BINARY),  \
+   CODE_FOR_ ## ICODE) /* ICODE */
 
+#define BU_FLOAT128_1(ENUM, NAME, ATTR, ICODE)  \
+  RS6000_BUILTIN_1 (MISC_BUILTIN_ ## ENUM,  /* ENUM */  \
+"__builtin_" NAME,  /* NAME */  \
+   RS6000_BTM_FLOAT128,/* MASK */  \
+   (RS6000_BTC_ ## ATTR/* ATTR */  \
+| RS6000_BTC_UNARY),   \
+   CODE_FOR_ ## ICODE) /* ICODE */
+
 /* Miscellaneous builtins for instructions added in ISA 3.0.  These
instructions don't require either the DFP or VSX options, just the basic
ISA 3.0 enablement since they operate on general purpose registers.  */
@@ -1814,6 +1830,11 @@ BU_P9V_OVERLOAD_1 (VPRTYBD,  "vprtybd")
 BU_P9V_OVERLOAD_1 (VPRTYBQ,"vprtybq")
 BU_P9V_OVERLOAD_1 (VPRTYBW,"vprtybw")
 
+/* 1 argument IEEE 128-bit floating-point functions.  */
+BU_FLOAT128_1 (FABSF128,   "fabsf128",   CONST, abskf2)
+
+/* 2 argument IEEE 128-bit floating-point functions.  */
+BU_FLOAT128_2 (COPYSIGNF128,   "copysignf128",   CONST, copysignkf3)
 
 /* 1 argument crypto functions.  */
 BU_CRYPTO_1 (VSBOX,"vsbox",  CONST, crypto_vsbox)
@@ -2191,6 +2212,18 @@ BU_SPECIAL_X (RS6000_BUILTIN_CPU_IS, "__builtin_cp
 BU_SPECIAL_X (RS6000_BUILTIN_CPU_SUPPORTS, "__builtin_cpu_supports",
  RS6000_BTM_ALWAYS, RS6000_BTC_MISC)
 
+BU_SPECIAL_X (RS6000_BUILTIN_NANF128, "__builtin_nanf128",
+ RS6000_BTM_FLOAT128, RS6000_BTC_CONST)
+
+BU_SPECIAL_X (RS6000_BUILTIN_NANSF128, 

Re: [PATCH] c++/60760 - arithmetic on null pointers should not be allowed in constant expressions

2016-06-23 Thread Jason Merrill

On 06/20/2016 10:17 PM, Martin Sebor wrote:

+  && tree_int_cst_equal (lhs, null_pointer_node)
+  && !tree_int_cst_equal (rhs, integer_zero_node))


Not integer_zerop?


+   "invalid conversion involving a null pointer");

...

+   "invalid conversion from %qT to %qT",


The conversion isn't invalid, it just isn't a constant expression.  For 
the null pointer to pointer conversion, does this properly allow 
conversion to void* or to base*?
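
A small sketch of the conversions in question, with made-up type names;
these null pointer conversions are expected to remain usable in constant
expressions, while arithmetic with a nonzero offset should not be:

```cpp
struct Base { };
struct Derived : Base { };

constexpr Derived *null_derived = nullptr;

// Converting a null pointer to a base class pointer or to void* yields
// a null pointer and is fine in a constant expression.
constexpr Base *null_base = null_derived;
constexpr void *null_void = null_derived;

static_assert(null_base == nullptr, "null converts to a null Base*");
static_assert(null_void == nullptr, "null converts to a null void*");

// By contrast, this should be rejected as not a constant expression:
// constexpr int *bad = static_cast<int *>(nullptr) + 1;
```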



+   if (integer_zerop (op))

...

+ else if (!integer_zerop (op))


The second test seems redundant.

Jason



Re: [PATCH] Reject boolean/enum types in last arg of __builtin_*_overflow_p (take 2)

2016-06-23 Thread Jason Merrill
I thought I already approved that patch.  If not, consider this approval.

Jason
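
For readers new to the thread, a brief sketch of the two builtin families
discussed in the quoted message.  The wrapper names are made up, the example
assumes a 32-bit int, and the _p variant is only used when the compiler is
GCC 7 or later (an assumption about where it is available); otherwise the
sketch falls back to the pointer form.

```cpp
// Pointer form: stores the wrapped sum in *sum and returns true when
// the exact result does not fit in the pointed-to type.
bool add_checked(int a, int b, int *sum)
{
  return __builtin_add_overflow(a, b, sum);
}

// Predicate form: only the type of the last argument matters, its
// value is ignored and nothing is stored.
bool add_would_overflow(int a, int b)
{
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 7
  return __builtin_add_overflow_p(a, b, (int) 0);
#else
  int unused;
  return __builtin_add_overflow(a, b, &unused);
#endif
}
```

Because the predicate form takes a value rather than a pointer, a bit-field
can be passed as its last argument, which is why the bit-field question
arises only for the _p builtins.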


On Wed, Jun 22, 2016 at 12:34 AM, Jeff Law  wrote:
> On 06/15/2016 05:47 AM, Jakub Jelinek wrote:
>>
>> On Tue, Jun 14, 2016 at 11:13:28AM -0600, Martin Sebor wrote:

>>>> Here is an untested patch for that.  Except that the middle-end considers
>>>> conversions between BOOLEAN_TYPE and single bit unsigned type as useless,
>>>> so in theory this can't work well, and in practice only if we are lucky
>>>> enough (plus it generates terrible code right now), so we'd probably need
>>>> to come up with a different way of expressing whether the internal fn
>>>> should have a bool/_Bool-ish behavior or not (optional 3rd argument or
>>>> something ugly like that).  Plus add lots of testcases to cover the weirdo
>>>> cases.  Is it really worth it, even when we don't want to support overflow
>>>> into enumeration type and thus will not cover all integral types anyway?
>>>
>>>
>>> If it's cumbersome to get to work I agree that it's not worth
>>> the effort.  Thanks for taking the time to prototype it.
>>
>>
>> Ok, so here is an updated patch.  In addition to diagnostic wording
>> changes
>> this (as also the earlier posted patch) fixes the handling of sub-mode
>> precision, it adds hopefully sufficient testsuite coverage for
>> __builtin_{add,sub,mul}_overflow_p.
>>
>> The only thing I'm unsure about is what to do with bitfield types.
>> For __builtin_{add,sub,mul}_overflow it is not an issue, as one can't take
>> address of a bitfield.  For __builtin_{add,sub,mul}_overflow_p right now,
>> the C FE doesn't promote the last argument in any way, therefore for C
>> the builtin-arith-overflow-p-19.c testcase tests the behavior of bitfield
>> overflows.  The C++ FE even for type-generic builtins promotes the
>> argument
>> to the underlying type (as part of decay_conversion), therefore for C++
>> overflow to bit-fields doesn't work.  Is that acceptable that because the
>> bitfields in the two languages behave generally slightly differently it is
>> ok that it differs even here, or should the C FE promote bitfields to the
>> underlying type for the last argument of
>> __builtin_{add,sub,mul}_overflow_p,
>> or should the C++ FE special case __builtin_{add,sub,mul}_overflow_p and
>> not decay_conversion on the last argument to these, something else?
>>
>> 2016-06-15  Jakub Jelinek  
>>
>> * internal-fn.c (expand_arith_set_overflow): New function.
>> (expand_addsub_overflow, expand_neg_overflow,
>> expand_mul_overflow):
>> Use it.
>> (expand_arith_overflow_result_store): Likewise.  Handle precision
>> smaller than mode precision.
>> * tree-vrp.c (extract_range_basic): For imag part, handle
>> properly signed 1-bit precision result.
>> * doc/extend.texi (__builtin_add_overflow): Document that last
>> argument can't be pointer to enumerated or boolean type.
>> (__builtin_add_overflow_p): Document that last argument can't
>> have enumerated or boolean type.
>>
>> * c-common.c (check_builtin_function_arguments): Require last
>> argument of BUILT_IN_*_OVERFLOW_P to have INTEGER_TYPE type.
>> Adjust wording of diagnostics for BUILT_IN_*_OVERFLOW
>> if the last argument is pointer to enumerated or boolean type.
>>
>> * c-c++-common/builtin-arith-overflow-1.c (generic_wrong_type, f3,
>> f4): Adjust expected diagnostics.
>> * c-c++-common/torture/builtin-arith-overflow.h (TP): New macro.
>> (T): If OVFP is defined, redefine to TP.
>> * c-c++-common/torture/builtin-arith-overflow-12.c: Adjust
>> comment.
>> * c-c++-common/torture/builtin-arith-overflow-p-1.c: New test.
>> * c-c++-common/torture/builtin-arith-overflow-p-2.c: New test.
>> * c-c++-common/torture/builtin-arith-overflow-p-3.c: New test.
>> * c-c++-common/torture/builtin-arith-overflow-p-4.c: New test.
>> * c-c++-common/torture/builtin-arith-overflow-p-5.c: New test.
>> * c-c++-common/torture/builtin-arith-overflow-p-6.c: New test.
>> * c-c++-common/torture/builtin-arith-overflow-p-7.c: New test.
>> * c-c++-common/torture/builtin-arith-overflow-p-8.c: New test.
>> * c-c++-common/torture/builtin-arith-overflow-p-9.c: New test.
>> * c-c++-common/torture/builtin-arith-overflow-p-10.c: New test.
>> * c-c++-common/torture/builtin-arith-overflow-p-11.c: New test.
>> * c-c++-common/torture/builtin-arith-overflow-p-12.c: New test.
>> * c-c++-common/torture/builtin-arith-overflow-p-13.c: New test.
>> * c-c++-common/torture/builtin-arith-overflow-p-14.c: New test.
>> * c-c++-common/torture/builtin-arith-overflow-p-15.c: New test.
>> * c-c++-common/torture/builtin-arith-overflow-p-16.c: New test.
>> * 

Re: [PATCH, PR71602] Give error for invalid va_list argument to va_arg

2016-06-23 Thread Jason Merrill
On Thu, Jun 23, 2016 at 1:27 PM, Tom de Vries  wrote:
> Hi,
>
> this patch fixes PR71602, a 6/7 regression.
>
> Consider this test-case:
> ...
> __builtin_va_list *pap;
>
> void
> fn1 (void)
> {
>  __builtin_va_arg(pap, double);
> }
> ...
>
> The testcase is invalid, because we're not passing a va_list as first
> argument of va_arg, but a va_list*.
>
> When compiling for x86_64 -m64, we run into the second assert in this
> snippet from build_va_arg:
> ...
> {
>   /* Case 2b: va_list is pointer to array elem type.  */
>   gcc_assert (POINTER_TYPE_P (va_type));
>   gcc_assert (TREE_TYPE (va_type) == TREE_TYPE (canon_va_type));
>
>   /* Don't take the address.  We've already got '&ap'.  */
>   ;
> }
> ...
>
> At that point, va_type and canon_va_type are:
> ...
> (gdb) call debug_generic_expr (va_type)
> struct [1] *
> (gdb) call debug_generic_expr (canon_va_type)
> struct [1]
> ...
>
> so TREE_TYPE (va_type) and TREE_TYPE (canon_va_type) are not equal:
> ...
> (gdb) call debug_generic_expr (va_type.typed.type)
> struct [1]
> (gdb) call debug_generic_expr (canon_va_type.typed.type)
> struct
> ...
>
> Given the semantics of the target hook:
> ...
> Target Hook: tree TARGET_CANONICAL_VA_LIST_TYPE (tree type)
>
> This hook returns the va_list type of the calling convention specified
> by the type of type. If type is not a valid va_list type, it returns
> NULL_TREE.
> ...
> one could argue that canonical_va_list_type should return NULL_TREE for a
> va_list*, which would fix the ICE. But the current implementation seems to
> rely on canonical_va_list_type to return va_list for a va_list* argument.

It does seem like it's the job of canonical_va_list_type to detect an
invalid argument.  Why not fix that?
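
For contrast with the invalid test case quoted above, here is a sketch of the
valid way to work through a pointer to a va_list: the pointer is dereferenced
before it reaches va_arg.  The function names are made up for the example.

```cpp
#include <cstdarg>

// Valid: the helper receives a va_list* and passes *pap, which has
// type va_list, as the first argument of va_arg.
static double next_double(va_list *pap)
{
  return va_arg(*pap, double);
}

// Sums COUNT double arguments using the helper above.
double sum_doubles(int count, ...)
{
  va_list ap;
  double total = 0.0;
  va_start(ap, count);
  for (int i = 0; i < count; ++i)
    total += next_double(&ap);
  va_end(ap);
  return total;
}
```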

> The patch fixes the ICE by making the valid va_list check in build_va_arg
> more precise, by taking into account the non-strict behavior of
> canonical_va_list_type.

If you do need to check this here, is there a reason you need to pass
in a callback rather than use lang_hooks.types_compatible_p?

Jason


Fix zero size debug array swap noexcept qualification

2016-06-23 Thread François Dumont
The debug mode array was simply forgotten when the zero-size swap method
was fixed as part of the swappable traits implementation.


* include/debug/array (array<>::swap): Fix noexcept qualification for
zero-size array.

Tested under Linux x86_64 debug mode.
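
The idea behind the fix can be sketched independently of libstdc++.  The
traits below are hand-written for illustration and are not the library's
_AT_Type::_Is_nothrow_swappable; they only show why a zero-size array can
always promise a non-throwing swap, whatever the element type does.

```cpp
#include <cstddef>
#include <utility>

// An element type whose swap is allowed to throw.
struct ThrowingSwap { };
void swap(ThrowingSwap &, ThrowingSwap &) noexcept(false) { }

namespace sketch {
using std::swap;

// True when an unqualified swap of two T lvalues is noexcept.
template <typename T>
struct is_nothrow_swappable_sketch
{
  static constexpr bool value
    = noexcept(swap(std::declval<T &>(), std::declval<T &>()));
};

// A zero-size array has no elements to swap, so its swap cannot throw
// regardless of T; otherwise the element type decides.
template <typename T, std::size_t N>
struct array_swap_traits
{
  static constexpr bool nothrow
    = N == 0 || is_nothrow_swappable_sketch<T>::value;
};
} // namespace sketch

static_assert(sketch::array_swap_traits<int, 4>::nothrow, "");
static_assert(!sketch::array_swap_traits<ThrowingSwap, 4>::nothrow, "");
static_assert(sketch::array_swap_traits<ThrowingSwap, 0>::nothrow, "");
```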

François
Index: include/debug/array
===
--- include/debug/array	(revision 237614)
+++ include/debug/array	(working copy)
@@ -86,7 +86,7 @@
 
   void
   swap(array& __other)
-  noexcept(__is_nothrow_swappable<_Tp>::value)
+  noexcept(_AT_Type::_Is_nothrow_swappable::value)
   { std::swap_ranges(begin(), end(), __other.begin()); }
 
   // Iterators.
@@ -168,9 +168,8 @@
   at(size_type __n)
   {
 	if (__n >= _Nm)
-	  std::__throw_out_of_range_fmt(__N("array::at: __n "
-"(which is %zu) >= _Nm "
-	"(which is %zu)"),
+	  std::__throw_out_of_range_fmt(__N("array::at: __n (which is %zu) "
+	">= _Nm (which is %zu)"),
 	__n, _Nm);
 	return _AT_Type::_S_ref(_M_elems, __n);
   }



[PATCH, testsuite]: Use dg-additional-flags in vect testsuite

2016-06-23 Thread Uros Bizjak
Hello!

2016-06-23  Uros Bizjak  

* g++.dg/vect/pr33834_2.cc: Use dg-additional-options instead of
dg-options and remove default vector testsuite compile flags.
* g++.dg/vect/pr33860a.cc: Ditto.
* g++.dg/vect/pr45470-a.cc: Ditto.
* g++.dg/vect/pr45470-b.cc: Ditto.
* g++.dg/vect/pr60896.cc: Ditto.
* gcc.dg/vect/no-tree-pre-pr45241.c: Ditto.
* gcc.dg/vect/pr18308.c: Ditto.
* gcc.dg/vect/pr24049.c: Ditto.
* gcc.dg/vect/pr33373.c: Ditto.
* gcc.dg/vect/pr36228.c: Ditto.
* gcc.dg/vect/pr42395.c: Ditto.
* gcc.dg/vect/pr42604.c: Ditto.
* gcc.dg/vect/pr46663.c: Ditto.
* gcc.dg/vect/pr48765.c: Ditto.
* gcc.dg/vect/pr49093.c: Ditto.
* gcc.dg/vect/pr49352.c: Ditto.
* gcc.dg/vect/pr52298.c: Ditto.
* gcc.dg/vect/pr52870.c: Ditto.
* gcc.dg/vect/pr53185.c: Ditto.
* gcc.dg/vect/pr53773.c: Ditto.
* gcc.dg/vect/pr56695.c: Ditto.
* gcc.dg/vect/pr62171.c: Ditto.
* gcc.dg/vect/pr63530.c: Ditto.
* gcc.dg/vect/pr68339.c: Ditto.
* gcc.dg/vect/pr71259.c: Ditto.
* gcc.dg/vect/vect-82_64.c: Ditto.
* gcc.dg/vect/vect-83_64.c: Ditto.
* gcc.dg/vect/vect-debug-pr41926.c: Ditto.
* gcc.dg/vect/vect-shift-2-big-array.c: Ditto.
* gcc.dg/vect/vect-shift-2.c: Ditto.
* gfortran.dg/vect/fast-math-mgrid-resid.f: Ditto.
* gfortran.dg/vect/pr39318.f90: Ditto.
* gfortran.dg/vect/pr45714-a.f: Ditto.
* gfortran.dg/vect/pr45714-b.f: Ditto.
* gfortran.dg/vect/pr46213.f90: Ditto.

Tested on x86_64-linux-gnu {,-m32}  and committed to mainline SVN.

Uros.
Index: g++.dg/vect/pr33834_2.cc
===
--- g++.dg/vect/pr33834_2.cc(revision 237739)
+++ g++.dg/vect/pr33834_2.cc(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -ftree-vectorize" } */
+/* { dg-additional-options "-O3" } */
 
 /* Testcase by Martin Michlmayr  */
 
Index: g++.dg/vect/pr33860a.cc
===
--- g++.dg/vect/pr33860a.cc (revision 237739)
+++ g++.dg/vect/pr33860a.cc (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-Wno-psabi" { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */
+/* { dg-additional-options "-Wno-psabi" { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */
 
 /* Testcase by Martin Michlmayr  */
 
Index: g++.dg/vect/pr45470-a.cc
===
--- g++.dg/vect/pr45470-a.cc(revision 237739)
+++ g++.dg/vect/pr45470-a.cc(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -ftree-vectorize -fnon-call-exceptions" } */
+/* { dg-additional-options "-O1 -fnon-call-exceptions" } */
 
 struct A
 {
Index: g++.dg/vect/pr45470-b.cc
===
--- g++.dg/vect/pr45470-b.cc(revision 237739)
+++ g++.dg/vect/pr45470-b.cc(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -ftree-vectorize -fno-vect-cost-model -fnon-call-exceptions" } */
+/* { dg-additional-options "-O1 -fnon-call-exceptions" } */
 
 template < typename _Tp > struct new_allocator
 {
Index: g++.dg/vect/pr60896.cc
===
--- g++.dg/vect/pr60896.cc  (revision 237739)
+++ g++.dg/vect/pr60896.cc  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3" } */
+/* { dg-additional-options "-O3" } */
 
 struct A
 {
Index: gcc.dg/vect/no-tree-pre-pr45241.c
===
--- gcc.dg/vect/no-tree-pre-pr45241.c   (revision 237739)
+++ gcc.dg/vect/no-tree-pre-pr45241.c   (working copy)
@@ -1,6 +1,5 @@
 /* PR tree-optimization/45241 */
 /* { dg-do compile } */
-/* { dg-options "-ftree-vectorize" } */
 
 int
 foo (short x)
Index: gcc.dg/vect/pr18308.c
===
--- gcc.dg/vect/pr18308.c   (revision 237739)
+++ gcc.dg/vect/pr18308.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -ftree-vectorize -funroll-loops" } */
+/* { dg-additional-options "-O -funroll-loops" } */
 void foo();
 
 void bar(int j)
Index: gcc.dg/vect/pr24049.c
===
--- gcc.dg/vect/pr24049.c   (revision 237739)
+++ gcc.dg/vect/pr24049.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -ftree-vectorize --param ggc-min-heapsize=0 --param ggc-min-expand=0" } */
+/* { dg-additional-options "-O1 --param ggc-min-heapsize=0 --param ggc-min-expand=0" } */
 
 void unscrunch (unsigned char *, int *);
 
Index: gcc.dg/vect/pr33373.c
===
--- gcc.dg/vect/pr33373.c   (revision 237739)
+++ gcc.dg/vect/pr33373.c   (working copy)
@@ -1,5 

RFC (attributes): PATCH for c++/50800 to set affects_type_identity for may_alias

2016-06-23 Thread Jason Merrill
My earlier patch for 50800 fixed the ICE by consistently stripping 
non-mangled attributes from template arguments, and mangling those that 
affect type identity.  At the C++ meeting this week someone pointed out 
to me that this is a real problem for x86 vector code, which relies on 
may_alias semantics: if may_alias is stripped from __m128, users can't 
use templates with vectors.
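
For background, a minimal sketch of what may_alias semantics provide,
assuming a compiler that supports the GNU attribute syntax and a 32-bit IEEE
float.  The typedef and function names are made up for the example.

```cpp
#include <cstdint>

static_assert(sizeof(float) == sizeof(std::uint32_t), "32-bit float assumed");

// Accesses through this type are not subject to type-based alias
// analysis, much like accesses through a character type.
typedef std::uint32_t __attribute__((__may_alias__)) aliasing_u32;

// Reads the object representation of a float through the may_alias
// type.  Without the attribute this access would violate the aliasing
// rules.
std::uint32_t float_bits(const float *f)
{
  return *reinterpret_cast<const aliasing_u32 *>(f);
}
```

If the attribute is dropped when such a type is used as a template argument,
code inside the template loses this guarantee, which is the problem for
vector types like __m128.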


So, it seems that the solution is to mangle may_alias by saying that it 
affects type identity.  But since we still want to be able to convert 
back and forth, I thought that it would make sense to treat the 
may_alias version of a type as a variant, rather than a new distinct 
type.  So the first patch creates a new category of attributes that are 
treated as type variants.


An alternative patch just sets affects_type_identity and adjusts the C++ 
front end to allow conversion between pointers to add or discard may_alias.


Thoughts?

Tested x86_64-pc-linux-gnu.
commit b9722b2721f8e3901a7343b9c373d37a9e4ecefd
Author: Jason Merrill 
Date:   Tue Jun 21 22:26:38 2016 +0300

	PR c++/50800 - may_alias and templates

gcc/
	* tree-core.h (ATTRIBUTE_TYPE_VARIANT): New enumerator.
	* tree.c (comp_type_attributes): Check variant_attribute_p.
	(build_type_attribute_qual_variant): Handle variant attributes.
	(check_attribute_qualified_type, get_attribute_qualified_type): New.
	* tree.h: Declare get_attribute_qualified_type.
	* attribs.c (variant_attribute_p, remove_variant_attributes): New.
	* attribs.h: Declare them.
gcc/c-family/
	* c-common.c (c_common_attribute_table): may_alias affects type
	identity.
gcc/cp/
	* tree.c (apply_identity_attributes): No longer static.
	* mangle.c (canonicalize_for_substitution): Call it.
	* cp-tree.h: Declare it.
	* typeck.c (underlying_type_name): New.
	(structural_comptypes): Do structurally compare arithmetic types.

diff --git a/gcc/attribs.c b/gcc/attribs.c
index 9a88621..c36f4b4 100644
--- a/gcc/attribs.c
+++ b/gcc/attribs.c
@@ -690,3 +690,43 @@ make_attribute (const char *name, const char *arg_name, tree chain)
   attr = tree_cons (attr_name, attr_args, chain);
   return attr;
 }
+
+/* True iff ATTR is a type attribute that should be treated as creating a
+   variant of a base type, rather than a completely distinct type.  */
+
+bool
+variant_attribute_p (const_tree attr)
+{
+  const attribute_spec *s = lookup_attribute_spec (TREE_PURPOSE (attr));
+  return s && s->affects_type_identity == ATTRIBUTE_TYPE_VARIANT;
+}
+
+/* Return either ATTRS or a list of attributes without any attributes for which
+   variant_attribute_p is true.  Does not modify ATTRs.  */
+
+tree
+remove_variant_attributes (tree attrs)
+{
+  tree last = NULL_TREE;
+  for (tree a = attrs; a; a = TREE_CHAIN (a))
+if (variant_attribute_p (a))
+  last = a;
+  if (!last)
+return attrs;
+  tree l = NULL_TREE;
+  tree *p = &l;
+  for (tree a = attrs; a; a = TREE_CHAIN (a))
+{
+  if (a == last)
+	{
+	  *p = TREE_CHAIN (a);
+	  break;
+	}
+  if (!variant_attribute_p (a))
+	{
+	  *p = build_tree_list (TREE_PURPOSE (a), TREE_VALUE (a));
+	  p = &TREE_CHAIN (*p);
+	}
+}
+  return l;
+}
diff --git a/gcc/attribs.h b/gcc/attribs.h
index 23d3043..9a4d4e0 100644
--- a/gcc/attribs.h
+++ b/gcc/attribs.h
@@ -41,4 +41,7 @@ extern tree make_attribute (const char *, const char *, tree);
 extern struct scoped_attributes* register_scoped_attributes (const struct attribute_spec *,
 			 const char *);
 
+extern bool variant_attribute_p (const_tree);
+extern tree remove_variant_attributes (tree);
+
 #endif // GCC_ATTRIBS_H
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 8f21fd1..96e97c4 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -757,7 +757,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			  handle_nonnull_attribute, false },
   { "nothrow",0, 0, true,  false, false,
 			  handle_nothrow_attribute, false },
-  { "may_alias",	  0, 0, false, true, false, NULL, false },
+  { "may_alias",	  0, 0, false, true, false, NULL,
+			  ATTRIBUTE_TYPE_VARIANT },
   { "cleanup",		  1, 1, true, false, false,
 			  handle_cleanup_attribute, false },
   { "warn_unused_result", 0, 0, false, true, true,
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 5b87bb3..418b650 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6508,6 +6508,7 @@ extern bool class_tmpl_impl_spec_p		(const_tree);
 extern int zero_init_p(const_tree);
 extern bool check_abi_tag_redeclaration		(const_tree, const_tree, const_tree);
 extern bool check_abi_tag_args			(tree, tree);
+extern tree apply_identity_attributes		(tree, tree, bool * = NULL);
 extern tree strip_typedefs			(tree, bool * = NULL);
 extern tree strip_typedefs_expr			(tree, bool * = NULL);
 extern tree copy_binfo(tree, tree, tree,
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 

[RFT PATCH, i386]: Use options for 32bit targets when checking for 32bit as/ld features

2016-06-23 Thread Uros Bizjak
Hello!

This patch uses options for 32bit targets when checking for 32bit
as/ld features. The patch also groups together these tests, so they
can reuse a couple of option variables.

2016-06-23  Uros Bizjak  

* configure.ac (HAVE_AS_GOTOFF_IN_DATA): Use $as_ix86_gas_32_opt to
assemble for 32bit target.
(HAVE_AS_IX86_TLSGDPLT): Use $as_ix86_gas_32_opt to assemble
and $ld_ix86_gld_32_opt to link for 32bit target.
(HAVE_AS_IX86_TLSLDMPLT): Ditto.
* configure: Regenerate.

Bootstrapped on x86_64-linux-gnu.

Rainer, can you please test this patch on an x86 Solaris target?

Uros.
Index: configure
===
--- configure   (revision 237739)
+++ configure   (working copy)
@@ -25707,52 +25707,6 @@
 
 fi
 
-# These two are used unconditionally by i386.[ch]; it is to be defined
-# to 1 if the feature is present, 0 otherwise.
-as_ix86_gotoff_in_data_opt=
-if test x$gas = xyes; then
-  as_ix86_gotoff_in_data_opt="--32"
-fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for GOTOFF in data" >&5
-$as_echo_n "checking assembler for GOTOFF in data... " >&6; }
-if test "${gcc_cv_as_ix86_gotoff_in_data+set}" = set; then :
-  $as_echo_n "(cached) " >&6
-else
-  gcc_cv_as_ix86_gotoff_in_data=no
-if test $in_tree_gas = yes; then
-if test $gcc_cv_gas_vers -ge `expr \( \( 2 \* 1000 \) + 11 \) \* 1000 + 0`
-  then gcc_cv_as_ix86_gotoff_in_data=yes
-fi
-  elif test x$gcc_cv_as != x; then
-$as_echo ' .text
-.L0:
-   nop
-   .data
-   .long .L0@GOTOFF' > conftest.s
-if { ac_try='$gcc_cv_as $gcc_cv_as_flags $as_ix86_gotoff_in_data_opt -o conftest.o conftest.s >&5'
-  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
-  (eval $ac_try) 2>&5
-  ac_status=$?
-  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-  test $ac_status = 0; }; }
-then
-   gcc_cv_as_ix86_gotoff_in_data=yes
-else
-  echo "configure: failed program was" >&5
-  cat conftest.s >&5
-fi
-rm -f conftest.o conftest.s
-  fi
-fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_ix86_gotoff_in_data" >&5
-$as_echo "$gcc_cv_as_ix86_gotoff_in_data" >&6; }
-
-
-cat >>confdefs.h <<_ACEOF
-#define HAVE_AS_GOTOFF_IN_DATA `if test $gcc_cv_as_ix86_gotoff_in_data = yes; then echo 1; else echo 0; fi`
-_ACEOF
-
-
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for rep and lock prefix" >&5
 $as_echo_n "checking assembler for rep and lock prefix... " >&6; }
 if test "${gcc_cv_as_ix86_rep_lock_prefix+set}" = set; then :
@@ -25821,6 +25775,18 @@
 
 fi
 
+# Enforce 32-bit output with gas and gld.
+if test x$gas = xyes; then
+  as_ix86_gas_32_opt="--32"
+fi
+if echo "$ld_ver" | grep GNU > /dev/null; then
+  if $gcc_cv_ld -V 2>/dev/null | grep elf_i386_sol2 > /dev/null; then
+ld_ix86_gld_32_opt="-melf_i386_sol2"
+  else
+ld_ix86_gld_32_opt="-melf_i386"
+  fi
+fi
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for R_386_TLS_GD_PLT reloc" >&5
 $as_echo_n "checking assembler for R_386_TLS_GD_PLT reloc... " >&6; }
 if test "${gcc_cv_as_ix86_tlsgdplt+set}" = set; then :
@@ -25829,7 +25795,7 @@
   gcc_cv_as_ix86_tlsgdplt=no
   if test x$gcc_cv_as != x; then
 $as_echo 'calltls_gd@tlsgdplt' > conftest.s
-if { ac_try='$gcc_cv_as $gcc_cv_as_flags  -o conftest.o conftest.s >&5'
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags $as_ix86_gas_32_opt -o conftest.o conftest.s >&5'
   { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
   (eval $ac_try) 2>&5
   ac_status=$?
@@ -25837,7 +25803,7 @@
   test $ac_status = 0; }; }
 then
if test x$gcc_cv_ld != x \
-&& $gcc_cv_ld -o conftest conftest.o -G > /dev/null 2>&1; then
+&& $gcc_cv_ld $ld_ix86_gld_32_opt -o conftest conftest.o -G > /dev/null 2>&1; then
   gcc_cv_as_ix86_tlsgdplt=yes
 fi
 rm -f conftest
@@ -25861,6 +25827,7 @@
 tls_ld:
.section .text,"ax",@progbits
 calltls_ld@tlsldmplt'
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for R_386_TLS_LDM_PLT reloc" >&5
 $as_echo_n "checking assembler for R_386_TLS_LDM_PLT reloc... " >&6; }
 if test "${gcc_cv_as_ix86_tlsldmplt+set}" = set; then :
@@ -25869,7 +25836,7 @@
   gcc_cv_as_ix86_tlsldmplt=no
   if test x$gcc_cv_as != x; then
 $as_echo "$conftest_s" > conftest.s
-if { ac_try='$gcc_cv_as $gcc_cv_as_flags  -o conftest.o conftest.s >&5'
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags $as_ix86_gas_32_opt -o conftest.o conftest.s >&5'
   { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
   (eval $ac_try) 2>&5
   ac_status=$?
@@ -25877,7 +25844,7 @@
   test $ac_status = 0; }; }
 then
if test x$gcc_cv_ld != x \
-&& $gcc_cv_ld -o conftest conftest.o -G > /dev/null 2>&1; then
+&& $gcc_cv_ld $ld_ix86_gld_32_opt -o 

Re: [Patch, avr] Fix PR 71151

2016-06-23 Thread Mike Stump
On Jun 23, 2016, at 9:16 AM, Georg-Johann Lay  wrote:
> Maybe even check during configure whether an appropriate version of 
> Binutils is used?
>> That would be nice, but is it ok to add target specific conditions to
>> configure.ac?
> 
> We already have avr-specific tests in gcc/configure.ac

Further, the only point of configure is to have target specific tests in it.  
:-)

Re: move increase_alignment from simple to regular ipa pass

2016-06-23 Thread Prathamesh Kulkarni
On 17 June 2016 at 19:52, Prathamesh Kulkarni
 wrote:
> On 14 June 2016 at 18:31, Prathamesh Kulkarni
>  wrote:
>> On 13 June 2016 at 16:13, Jan Hubicka  wrote:
 diff --git a/gcc/cgraph.h b/gcc/cgraph.h
 index ecafe63..41ac408 100644
 --- a/gcc/cgraph.h
 +++ b/gcc/cgraph.h
 @@ -1874,6 +1874,9 @@ public:
   if we did not do any inter-procedural code movement.  */
unsigned used_by_single_function : 1;

 +  /* Set if -fsection-anchors is set.  */
 +  unsigned section_anchor : 1;
 +
  private:
/* Assemble thunks and aliases associated to varpool node.  */
void assemble_aliases (void);
 diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
 index 4bfcad7..e75d5c0 100644
 --- a/gcc/cgraphunit.c
 +++ b/gcc/cgraphunit.c
 @@ -800,6 +800,9 @@ varpool_node::finalize_decl (tree decl)
   it is available to notice_global_symbol.  */
node->definition = true;
notice_global_symbol (decl);
 +
 +  node->section_anchor = flag_section_anchors;
 +
if (TREE_THIS_VOLATILE (decl) || DECL_PRESERVE_P (decl)
/* Traditionally we do not eliminate static variables when not
optimizing and when not doing toplevel reoder.  */
 diff --git a/gcc/common.opt b/gcc/common.opt
 index f0d7196..e497795 100644
 --- a/gcc/common.opt
 +++ b/gcc/common.opt
 @@ -1590,6 +1590,10 @@ fira-algorithm=
  Common Joined RejectNegative Enum(ira_algorithm) Var(flag_ira_algorithm) 
 Init(IRA_ALGORITHM_CB) Optimization
  -fira-algorithm=[CB|priority] Set the used IRA algorithm.

 +fipa-increase_alignment
 +Common Report Var(flag_ipa_increase_alignment) Init(0) Optimization
 +Option to gate increase_alignment ipa pass.
 +
  Enum
  Name(ira_algorithm) Type(enum ira_algorithm) UnknownError(unknown IRA 
 algorithm %qs)

 @@ -2133,7 +2137,7 @@ Common Report Var(flag_sched_dep_count_heuristic) 
 Init(1) Optimization
  Enable the dependent count heuristic in the scheduler.

  fsection-anchors
 -Common Report Var(flag_section_anchors) Optimization
 +Common Report Var(flag_section_anchors)
  Access data in the same section from shared anchor points.

  fsee
 diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
 index a0db3a4..1482566 100644
 --- a/gcc/config/aarch64/aarch64.c
 +++ b/gcc/config/aarch64/aarch64.c
 @@ -8252,6 +8252,8 @@ aarch64_override_options (void)

aarch64_register_fma_steering ();

 +  /* Enable increase_alignment pass.  */
 +  flag_ipa_increase_alignment = 1;
>>>
>>> I would rather enable it always on targets that do support anchors.
>> AFAIK aarch64 supports section anchors.
 diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c
 index ce9e146..7f09f3a 100644
 --- a/gcc/lto/lto-symtab.c
 +++ b/gcc/lto/lto-symtab.c
 @@ -342,6 +342,13 @@ lto_symtab_merge (symtab_node *prevailing, 
 symtab_node *entry)
   The type compatibility checks or the completing of types has properly
   dealt with most issues.  */

 +  /* ??? is this assert necessary ?  */
 +  varpool_node *v_prevailing = dyn_cast<varpool_node *> (prevailing);
 +  varpool_node *v_entry = dyn_cast<varpool_node *> (entry);
 +  gcc_assert (v_prevailing && v_entry);
 +  /* section_anchor of prevailing_decl wins.  */
 +  v_entry->section_anchor = v_prevailing->section_anchor;
 +
>>> Other flags are merged in lto_varpool_replace_node so please move this 
>>> there.
>> Ah indeed, thanks for the pointers.
>> I wonder though if we need to set
>> prevailing_node->section_anchor = vnode->section_anchor ?
>> IIUC, the function merges flags from vnode into prevailing_node
>> and removes vnode. However we want prevailing_node->section_anchor
>> to always take precedence.
 +/* Return true if alignment should be increased for this vnode.
 +   This is done if every function that references/referring to vnode
 +   has flag_tree_loop_vectorize set.  */
 +
 +static bool
 +increase_alignment_p (varpool_node *vnode)
 +{
 +  ipa_ref *ref;
 +
 +  for (int i = 0; vnode->iterate_reference (i, ref); i++)
 +if (cgraph_node *cnode = dyn_cast<cgraph_node *> (ref->referred))
 +  {
 + struct cl_optimization *opts = opts_for_fn (cnode->decl);
 + if (!opts->x_flag_tree_loop_vectorize)
 +   return false;
 +  }
>>>
>>> Taking the address of a function that has the vectorizer enabled probably
>>> doesn't imply a need to increase the alignment of that var. So please drop
>>> the loop.
>>>
>>> You only want functions that read/write or take the address of the symbol.
>>> But on the other hand, you need to walk all aliases of the symbol by
>>> call_for_symbol_and_aliases
 +
 +  for (int i = 0; vnode->iterate_referring (i, ref); i++)

Add minimal _FloatN, _FloatNx built-in functions

2016-06-23 Thread Joseph Myers
This patch, relative to a tree with
 (pending
review) applied, adds a minimal set of built-in functions for the new
_FloatN and _FloatNx types.

The functions added are __builtin_fabs*, __builtin_copysign*,
__builtin_huge_val*, __builtin_inf*, __builtin_nan* and
__builtin_nans* (where * = fN or fNx).  That is, 42 new entries are
added to the enum of built-in functions and the associated array of
decls, where not all of them are actually supported on any one target.

These functions are believed to be sufficient for libgcc (complex
multiplication and division use __builtin_huge_val*,
__builtin_copysign* and __builtin_fabs*) and for glibc (which also
depends on complex multiplication from libgcc, as well as using such
functions itself).  The basic target-independent support for folding /
expanding calls to these built-in functions is wired up, so those for
constants can be used in static initializers, and the fabs and
copysign built-ins can always be expanded to bit-manipulation inline
(for any format setting signbit_ro and signbit_rw, which covers all
formats supported for _FloatN and _FloatNx), although insn patterns
for fabs (abs<mode>2) and copysign (copysign<mode>3) will be used when
available and may result in more optimal code.

The complex multiplication and division functions in libgcc rely on
predefined macros (defined with -fbuilding-libgcc) to say what the
built-in function suffixes to use with a particular mode are.  This
patch updates that code accordingly, where previously it involved a
hack supposing that machine-specific suffixes for constants were also
suffixes for built-in functions.

As with the main _FloatN / _FloatNx patch, this patch does not update
code dealing only with optimizations that currently has cases only
covering float, double and long double, though some such cases are
straightforward and may be covered in a followup patch.

The functions are defined with DEF_GCC_BUILTIN, so calls to the TS
18661-3 functions such as fabsf128 and copysignf128, without the
__builtin_, will not be optimized.  As noted in the original _FloatN /
_FloatNx patch submission, in principle the bulk of the libm functions
that have built-in versions should have those versions extended to
cover the new types, but that would require more consideration of the
effects of increasing the size of the enum and initializing many more
functions at startup.

I don't know whether target-specific built-in functions can readily be
made into aliases for target-independent functions, but if they can,
it would make sense to do so for the x86 and ia64 *q functions
corresponding to these, so that they can benefit from the
architecture-independent folding logic and from any optimizations
enabled for these functions in future, and so that less
target-specific code is needed to support them.

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  OK to
commit (the non-C-front-end parts)?

gcc:
2016-06-23  Joseph Myers  

* tree.h (CASE_FLT_FN_FLOATN_NX, float16_type_node)
(float32_type_node, float64_type_node, float32x_type_node)
(float128x_type_node): New macros.
* builtin-types.def (BT_FLOAT16, BT_FLOAT32, BT_FLOAT64)
(BT_FLOAT128, BT_FLOAT32X, BT_FLOAT64X, BT_FLOAT128X)
(BT_FN_FLOAT16, BT_FN_FLOAT32, BT_FN_FLOAT64, BT_FN_FLOAT128)
(BT_FN_FLOAT32X, BT_FN_FLOAT64X, BT_FN_FLOAT128X)
(BT_FN_FLOAT16_FLOAT16, BT_FN_FLOAT32_FLOAT32)
(BT_FN_FLOAT64_FLOAT64, BT_FN_FLOAT128_FLOAT128)
(BT_FN_FLOAT32X_FLOAT32X, BT_FN_FLOAT64X_FLOAT64X)
(BT_FN_FLOAT128X_FLOAT128X, BT_FN_FLOAT16_CONST_STRING)
(BT_FN_FLOAT32_CONST_STRING, BT_FN_FLOAT64_CONST_STRING)
(BT_FN_FLOAT128_CONST_STRING, BT_FN_FLOAT32X_CONST_STRING)
(BT_FN_FLOAT64X_CONST_STRING, BT_FN_FLOAT128X_CONST_STRING)
(BT_FN_FLOAT16_FLOAT16_FLOAT16, BT_FN_FLOAT32_FLOAT32_FLOAT32)
(BT_FN_FLOAT64_FLOAT64_FLOAT64, BT_FN_FLOAT128_FLOAT128_FLOAT128)
(BT_FN_FLOAT32X_FLOAT32X_FLOAT32X)
(BT_FN_FLOAT64X_FLOAT64X_FLOAT64X)
(BT_FN_FLOAT128X_FLOAT128X_FLOAT128X): New type definitions.
* builtins.def (DEF_GCC_FLOATN_NX_BUILTINS): New macro.
(copysign, fabs, huge_val, inf, nan, nans): Use it.
* builtins.c (expand_builtin): Use CASE_FLT_FN_FLOATN_NX for fabs
and copysign.
(fold_builtin_0): Use CASE_FLT_FN_FLOATN_NX for inf and huge_val.
(fold_builtin_1): Use CASE_FLT_FN_FLOATN_NX for fabs.
* doc/extend.texi (Other Builtins): Document these built-in
functions.
* fold-const-call.c (fold_const_call): Use CASE_FLT_FN_FLOATN_NX
for nan and nans.

gcc/c-family:
2016-06-23  Joseph Myers  

* c-family/c-cppbuiltin.c (c_cpp_builtins): Check _FloatN and
_FloatNx types for suffixes for built-in functions.

gcc/testsuite:
2016-06-23  Joseph Myers  

[PATCH, testsuite]: Skip PR71488 testcases for non-sse4 targets

2016-06-23 Thread Uros Bizjak
2016-06-23  Uros Bizjak  

PR tree-optimization/71488
* gcc.target/i386/i386.exp (check_effective_target_sse4): Move to ...
* lib/target-supports.exp: ... here.
(check_sse4_hw_available): New procedure.
(check_effective_target_sse4_runtime): Ditto.
* g++.dg/pr71488.C (dg-additional-options): Use -msse4 instead of
-march=westmere for sse4_runtime targets.
* gcc.dg/vect/vect-bool-cmp.c: Include "tree-vect.h".
(dg-additional-options): Use for sse4_runtime targets.
(main): Call check_vect ().
(dg-final): Perform scan only for sse4_runtime targets.

Tested on x86_64-linux-gnu {,-m32} AVX target and SSE2-only target.

Committed to mainline SVN.

Uros.
Index: g++.dg/pr71488.C
===
--- g++.dg/pr71488.C(revision 237733)
+++ g++.dg/pr71488.C(working copy)
@@ -1,7 +1,7 @@
-// PR middle-end/71488
+// PR tree-optimization/71488
 // { dg-do run }
 // { dg-options "-O3 -std=c++11" }
-// { dg-additional-options "-march=westmere" { target i?86-*-* x86_64-*-* } }
+// { dg-additional-options "-msse4" { target sse4_runtime } }
 // { dg-require-effective-target c++11 }
 
 #include 
Index: gcc.dg/vect/vect-bool-cmp.c
===
--- gcc.dg/vect/vect-bool-cmp.c (revision 237733)
+++ gcc.dg/vect/vect-bool-cmp.c (working copy)
@@ -1,8 +1,10 @@
-/* PR71488 */
+/* PR tree-optimization/71488 */
 /* { dg-require-effective-target vect_int } */
 /* { dg-require-effective-target vect_pack_trunc } */
-/* { dg-additional-options "-msse4" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-msse4" { target sse4_runtime } } */
 
+#include "tree-vect.h"
+
 int i1, i2;
 
 void __attribute__((noclone,noinline))
@@ -199,6 +201,8 @@
   long long l2[32];
   int i;
 
+  check_vect ();
+
   for (i = 0; i < 32; i++)
 {
   l2[i] = i2[i] = s2[i] = i % 2;
@@ -249,4 +253,4 @@
   check (res, ne);
 }
 
-/* { dg-final { scan-tree-dump-times "VECTORIZED" 18 "vect" { target { i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "VECTORIZED" 18 "vect" { target sse4_runtime } } } */
Index: gcc.target/i386/i386.exp
===
--- gcc.target/i386/i386.exp(revision 237733)
+++ gcc.target/i386/i386.exp(working copy)
@@ -76,20 +76,6 @@
 } "-O2 -mssse3" ]
 }
 
-# Return 1 if sse4 instructions can be compiled.
-proc check_effective_target_sse4 { } {
-return [check_no_compiler_messages sse4.1 object {
-   typedef long long __m128i __attribute__ ((__vector_size__ (16)));
-   typedef int __v4si __attribute__ ((__vector_size__ (16)));
-
-   __m128i _mm_mullo_epi32 (__m128i __X, __m128i __Y)
-   {
-   return (__m128i) __builtin_ia32_pmulld128 ((__v4si)__X,
-  (__v4si)__Y);
-   }
-} "-O2 -msse4.1" ]
-}
-
 # Return 1 if aes instructions can be compiled.
 proc check_effective_target_aes { } {
 return [check_no_compiler_messages aes object {
Index: lib/target-supports.exp
===
--- lib/target-supports.exp (revision 237733)
+++ lib/target-supports.exp (working copy)
@@ -1608,6 +1608,29 @@
 }]
 }
 
+# Return 1 if the target supports executing SSE4 instructions, 0
+# otherwise.  Cache the result.
+
+proc check_sse4_hw_available { } {
+return [check_cached_effective_target sse4_hw_available {
+   # If this is not the right target then we can skip the test.
+   if { !([istarget x86_64-*-*] || [istarget i?86-*-*]) } {
+   expr 0
+   } else {
+   check_runtime_nocache sse4_hw_available {
+   #include "cpuid.h"
+   int main ()
+   {
+ unsigned int eax, ebx, ecx, edx;
+ if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
+   return !(ecx & bit_SSE4_2);
+ return 1;
+   }
+   } ""
+   }
+}]
+}
+
 # Return 1 if the target supports executing AVX instructions, 0
 # otherwise.  Cache the result.
 
@@ -1654,6 +1677,17 @@
 return 0
 }
 
+# Return 1 if the target supports running SSE4 executables, 0 otherwise.
+
+proc check_effective_target_sse4_runtime { } {
+if { [check_effective_target_sse4]
+&& [check_sse4_hw_available]
+&& [check_sse_os_support_available] } {
+   return 1
+}
+return 0
+}
+
 # Return 1 if the target supports running AVX executables, 0 otherwise.
 
 proc check_effective_target_avx_runtime { } {
@@ -6390,6 +6424,20 @@
 } "-O2 -msse2" ]
 }
 
+# Return 1 if sse4.1 instructions can be compiled.
+proc check_effective_target_sse4 { } {
+return [check_no_compiler_messages sse4.1 object {
+   typedef long long __m128i __attribute__ ((__vector_size__ (16)));
+   typedef int __v4si __attribute__ ((__vector_size__ (16)));
+
+   __m128i 

Re: [PING^5][PATCHv2, ARM, libgcc] New aeabi_idiv function for armv6-m

2016-06-23 Thread Andre Vieira (lists)
Ping.

On 08/06/16 15:35, Andre Vieira (lists) wrote:
> Ping.
> 
> On 19/05/16 11:19, Andre Vieira (lists) wrote:
>> Ping for GCC-7, patch applies cleanly, passed make check for cortex-m0.
>>
>> Might be worth mentioning that this patch has been used in three
>> releases of the GNU ARM embedded toolchain, using GCC versions 4.9 and
>> 5, and no issues have been reported so far.
>>
>> On 25/01/16 17:15, Andre Vieira (lists) wrote:
>>> Ping.
>>>
>>> On 27/10/15 17:03, Andre Vieira wrote:
 Ping.

 BR,
 Andre

 On 13/10/15 18:01, Andre Vieira wrote:
> This patch ports the aeabi_idiv routine from Linaro Cortex-Strings
> (https://git.linaro.org/toolchain/cortex-strings.git), which was
> contributed by ARM under Free BSD license.
>
> The new aeabi_idiv routine is used to replace the one in
> libgcc/config/arm/lib1funcs.S. This replacement happens within the
> Thumb1 wrapper. The new routine is under LGPLv3 license.
>
> The main advantage of this version is that it can improve the
> performance of the aeabi_idiv function for Thumb1. This solution will
> also increase the code size. So it will only be used if
> __OPTIMIZE_SIZE__ is not defined.
>
> Make check passed for armv6-m.
>
> libgcc/ChangeLog:
> 2015-08-10  Hale Wang  
>   Andre Vieira  
>
> * config/arm/lib1funcs.S: Add new wrapper.
>
>>>
>>
> 



[PATCH] Fix big-endian bswap

2016-06-23 Thread Wilco Dijkstra
This patch fixes a bug in the bswap pass.  In big-endian, BIT_FIELD_REF uses
big-endian bit numbering, so we need to adjust the bit position.
The existing version could potentially generate incorrect code; however, GCC
doesn't emit a BIT_FIELD_REF to access the low byte in a register, so the
symbolic number never matches in big-endian.

The test gcc.dg/optimize-bswapsi-4.c now passes on AArch64, no other changes.
OK for commit?

ChangeLog:
2016-06-23  Wilco Dijkstra  

* gcc/tree-ssa-math-opts.c (find_bswap_or_nop_1): Adjust bit numbering
for big-endian BIT_FIELD_REF.

--
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 513ef0b3f4eb29a35eae8a0eb14ee8f8c24fcfd9..d31c12fd818a713ca3d251b9464015b147235bbe 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2307,6 +2307,10 @@ find_bswap_or_nop_1 (gimple *stmt, struct symbolic_number *n, int limit)
  && bitsize % BITS_PER_UNIT == 0
  && init_symbolic_number (n, TREE_OPERAND (rhs1, 0)))
{
+ /* Handle big-endian bit numbering in BIT_FIELD_REF.  */
+ if (BYTES_BIG_ENDIAN)
+   bitpos = TYPE_PRECISION (n->type) - bitpos - bitsize;
+
  /* Shift.  */
  if (!do_shift_rotate (RSHIFT_EXPR, n, bitpos))
return NULL;



Re: [PATCH] i386: Access external function via GOT slot for -fno-plt

2016-06-23 Thread H.J. Lu
On Thu, Jun 23, 2016 at 9:23 AM, Uros Bizjak  wrote:
> On Thu, Jun 23, 2016 at 6:08 PM, H.J. Lu  wrote:
>
>> Here is the updated patch.  OK for trunk?
>>
>> PR target/66232
>> PR target/67400
>> * configure.ac (as_ix86_tls_ldm_opt): Renamed to ...
>> (as_ix86_gas_opt): This.
>> (ld_ix86_tls_ldm_opt): Renamed to ...
>> (ld_ix86_gld_opt): This.
>> (R_386_TLS_LDM reloc): Updated.
>> (R_386_GOT32X reloc): New assembler/linker check.
>> (HAVE_AS_IX86_GOT32X): New.  Defined to 1 if 32-bit assembler and
>> linker support "jmp *_start@GOT" and "cmpl $0, bar@GOT".  Otherwise,
>> defined to 0.
>> * config.in: Regenerated.
>> * configure: Likewise.
>> * config/i386/i386.c (ix86_force_load_from_GOT_p): Return
>> true if HAVE_AS_IX86_GOT32X is 1 in 32-bit mode.
>> (ix86_legitimate_address_p): Allow UNSPEC_GOT for -fno-plt
>> if ix86_force_load_from_GOT_p returns true.
>> (ix86_print_operand_address_as): Also support UNSPEC_GOT if
>> ix86_force_load_from_GOT_p returns true.
>> (ix86_expand_move): Generate UNSPEC_GOT in 32-bit mode to load
>> the external function address via the GOT slot.
>> (ix86_nopic_noplt_attribute_p): Check both TARGET_64BIT and
>> HAVE_AS_IX86_GOT32X before returning false.
>> (ix86_output_call_insn): Generate "%!jmp/call\t*%p0@GOT" in
>> 32-bit mode if ix86_nopic_noplt_attribute_p returns true.
>>
>> gcc/testsuite/
>>
>> PR target/66232
>> PR target/67400
>> * gcc.target/i386/pr66232-14.c: New file.
>> * gcc.target/i386/pr66232-15.c: Likewise.
>> * gcc.target/i386/pr66232-16.c: Likewise.
>> * gcc.target/i386/pr66232-17.c: Likewise.
>> * gcc.target/i386/pr67400-1.c: Don't disable for ia32.  Scan for
>> ia32 if R_386_GOT32X relocation is supported.
>> * gcc.target/i386/pr67400-2.c: Likewise.
>> * gcc.target/i386/pr67400-3.c: Likewise.
>> * gcc.target/i386/pr67400-4.c: Likewise.
>> * gcc.target/i386/pr67400-6.c: Likewise.
>> * gcc.target/i386/pr67400-7.c: Likewise.
>> * lib/target-supports.exp (check_effective_target_got32x_reloc):
>> New.
>
> OK with a nit below.
>
> Thanks,
> Uros.
>
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -4164,13 +4164,13 @@ tls_ld:
>
>  # Enforce 32-bit output with gas and gld.
>  if test x$gas = xyes; then
> -  as_ix86_tls_ldm_opt="--32"
> +  as_ix86_gas_opt="--32"
>  fi
>  if echo "$ld_ver" | grep GNU > /dev/null; then
>if $gcc_cv_ld -V 2>/dev/null | grep elf_i386_sol2 > /dev/null; then
> -ld_ix86_tls_ldm_opt="-melf_i386_sol2"
> +ld_ix86_gld_opt="-melf_i386_sol2"
>else
> -ld_ix86_tls_ldm_opt="-melf_i386"
> +ld_ix86_gld_opt="-melf_i386"
>fi
>  fi
>  conftest_s='
>
> I'd like to suggest better names, perhaps as_ix86_gas_32_opt and
> ld_ix86_gld_32_opt to mark that they are intended for 32bit targets.
> But it is up to you, we can also live with above names.

I checked it with the name change suggested above.

Thanks.


-- 
H.J.


Re: [PATCH] i386: Access external function via GOT slot for -fno-plt

2016-06-23 Thread Uros Bizjak
On Thu, Jun 23, 2016 at 6:08 PM, H.J. Lu  wrote:

> Here is the updated patch.  OK for trunk?
>
> PR target/66232
> PR target/67400
> * configure.ac (as_ix86_tls_ldm_opt): Renamed to ...
> (as_ix86_gas_opt): This.
> (ld_ix86_tls_ldm_opt): Renamed to ...
> (ld_ix86_gld_opt): This.
> (R_386_TLS_LDM reloc): Updated.
> (R_386_GOT32X reloc): New assembler/linker check.
> (HAVE_AS_IX86_GOT32X): New.  Defined to 1 if 32-bit assembler and
> linker support "jmp *_start@GOT" and "cmpl $0, bar@GOT".  Otherwise,
> defined to 0.
> * config.in: Regenerated.
> * configure: Likewise.
> * config/i386/i386.c (ix86_force_load_from_GOT_p): Return
> true if HAVE_AS_IX86_GOT32X is 1 in 32-bit mode.
> (ix86_legitimate_address_p): Allow UNSPEC_GOT for -fno-plt
> if ix86_force_load_from_GOT_p returns true.
> (ix86_print_operand_address_as): Also support UNSPEC_GOT if
> ix86_force_load_from_GOT_p returns true.
> (ix86_expand_move): Generate UNSPEC_GOT in 32-bit mode to load
> the external function address via the GOT slot.
> (ix86_nopic_noplt_attribute_p): Check both TARGET_64BIT and
> HAVE_AS_IX86_GOT32X before returning false.
> (ix86_output_call_insn): Generate "%!jmp/call\t*%p0@GOT" in
> 32-bit mode if ix86_nopic_noplt_attribute_p returns true.
>
> gcc/testsuite/
>
> PR target/66232
> PR target/67400
> * gcc.target/i386/pr66232-14.c: New file.
> * gcc.target/i386/pr66232-15.c: Likewise.
> * gcc.target/i386/pr66232-16.c: Likewise.
> * gcc.target/i386/pr66232-17.c: Likewise.
> * gcc.target/i386/pr67400-1.c: Don't disable for ia32.  Scan for
> ia32 if R_386_GOT32X relocation is supported.
> * gcc.target/i386/pr67400-2.c: Likewise.
> * gcc.target/i386/pr67400-3.c: Likewise.
> * gcc.target/i386/pr67400-4.c: Likewise.
> * gcc.target/i386/pr67400-6.c: Likewise.
> * gcc.target/i386/pr67400-7.c: Likewise.
> * lib/target-supports.exp (check_effective_target_got32x_reloc):
> New.

OK with a nit below.

Thanks,
Uros.

--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -4164,13 +4164,13 @@ tls_ld:

 # Enforce 32-bit output with gas and gld.
 if test x$gas = xyes; then
-  as_ix86_tls_ldm_opt="--32"
+  as_ix86_gas_opt="--32"
 fi
 if echo "$ld_ver" | grep GNU > /dev/null; then
   if $gcc_cv_ld -V 2>/dev/null | grep elf_i386_sol2 > /dev/null; then
-ld_ix86_tls_ldm_opt="-melf_i386_sol2"
+ld_ix86_gld_opt="-melf_i386_sol2"
   else
-ld_ix86_tls_ldm_opt="-melf_i386"
+ld_ix86_gld_opt="-melf_i386"
   fi
 fi
 conftest_s='

I'd like to suggest better names, perhaps as_ix86_gas_32_opt and
ld_ix86_gld_32_opt to mark that they are intended for 32bit targets.
But it is up to you, we can also live with above names.


Re: [Patch, avr] Fix PR 71151

2016-06-23 Thread Georg-Johann Lay

Senthil Kumar Selvaraj wrote:

Georg-Johann Lay writes:


Senthil Kumar Selvaraj wrote:

Senthil Kumar Selvaraj writes:


Georg-Johann Lay writes:


Senthil Kumar Selvaraj wrote:

Hi,

  [set JUMP_TABLES_IN_TEXT_SECTION to 1]

I added tests that use linker relaxation and discovered a relaxation bug
in binutils 2.26 (and later) that messes up symbol values in the
presence of alignment directives. I'm working on that right now -
hopefully, it'll get backported to the release branch.

Once that gets upstream, I'll resend the patch - with more tests, and
incorporating your comments.


There were two binutils bugs (PR ld/20221 and ld/20254) that were
blocking this patch - on enabling relaxation, jump tables were
getting corrupted. Both of the issues are now fixed, and the fixes
are in master and 2.26 branch.
Should we mention in the release notes that Binutils >= 2.26 is needed 
for avr-gcc >= 6 ?


Yes, we should document it. binutils 2.25 would probably work too, as
the bugs were introduced only in binutils 2.26. I'll check and send a patch.
Maybe even check during configure whether an appropriate version of 
Binutils is used?


That would be nice, but is it ok to add target specific conditions to
configure.ac?


We already have avr-specific tests in gcc/configure.ac which test 
whether -mrmw and --mlink-relax are supported by as or not.  This is 
then used in gen-avr-mmcu-specs.c:


#ifdef HAVE_AS_AVR_MLINK_RELAX_OPTION
  fprintf (f, "*asm_relax:\n\t%s\n\n", ASM_RELAX_SPEC);
#endif // have avr-as --mlink-relax

#ifdef HAVE_AS_AVR_MRMW_OPTION
  fprintf (f, "*asm_rmw:\n%s\n\n", rmw
   ? "\t%{!mno-rmw: -mrmw}"
   : "\t%{mrmw}");
#endif // have avr-as -mrmw


Johann


Re: [PATCH] i386: Access external function via GOT slot for -fno-plt

2016-06-23 Thread H.J. Lu
On Thu, Jun 23, 2016 at 5:33 AM, Uros Bizjak  wrote:
> On Thu, Jun 23, 2016 at 1:45 PM, H.J. Lu  wrote:
>> i386 psABI has been updated to clarify that R_386_GOT32X and R_386_GOT32
>> relocations can be used to access GOT without base register when PIC is
>> disabled:
>>
>> https://groups.google.com/forum/#!topic/ia32-abi/awsRSvJOJfs
>>
>> 32-bit x86 assembler and linker from binutils 2.26.1 and 2.27 support
>>
>> call/jmp *_start@GOT
>> cmpl $0, bar@GOT
>>
>> for both normal and IFUNC functions.  We check if 32-bit x86 assembler
>> and linker have the fix for:
>>
>> https://sourceware.org/bugzilla/show_bug.cgi?id=20244
>>
>> before accessing external function via GOT slot for -fno-plt in both PIC
>> and non-PIC modes.
>>
>> Tested on i686 and x86-64.  OK for trunk?
>>
>>
>> H.J.
>> 
>> PR target/66232
>> PR target/67400
>> * configure.ac (HAVE_LD_IX86_GOT32X_RELOC): New.  Defined to 1
>> if 32-bit assembler and linker support "jmp *_start@GOT" and
>> "movl $0, bar@GOT".  Otherwise, defined to 0.
>> * config.in: Regenerated.
>> * configure: Likewise.
>> * config/i386/i386.c (ix86_force_load_from_GOT_p): Return
>> true if HAVE_LD_IX86_GOT32X_RELOC is 1 in 32-bit mode.
>> (ix86_legitimate_address_p): Allow UNSPEC_GOT for -fno-plt
>> if ix86_force_load_from_GOT_p returns true.
>> (ix86_print_operand_address_as): Also support UNSPEC_GOT if
>> ix86_force_load_from_GOT_p returns true.
>> (ix86_expand_move): Generate UNSPEC_GOT in 32-bit mode to load
>> the external function address via the GOT slot.
>> (ix86_nopic_noplt_attribute_p): Check HAVE_LD_IX86_GOT32X_RELOC
>> == 0 before returning false in 32-bit mode.
>> (ix86_output_call_insn): Generate "%!jmp/call\t*%p0@GOT" in
>> 32-bit mode if ix86_nopic_noplt_attribute_p returns true.
>>
>> gcc/testsuite/
>>
>> PR target/66232
>> PR target/67400
>> * gcc.target/i386/pr66232-14.c: New file.
>> * gcc.target/i386/pr66232-15.c: Likewise.
>> * gcc.target/i386/pr66232-16.c: Likewise.
>> * gcc.target/i386/pr66232-17.c: Likewise.
>> * gcc.target/i386/pr67400-1.c: Don't disable for ia32.  Scan for
>> ia32 and if R_386_GOT32X relocation is supported.
>> * gcc.target/i386/pr67400-2.c: Likewise.
>> * gcc.target/i386/pr67400-3.c: Likewise.
>> * gcc.target/i386/pr67400-4.c: Likewise.
>> * gcc.target/i386/pr67400-6.c: Likewise.
>> * gcc.target/i386/pr67400-7.c: Likewise.
>> * lib/target-supports.exp (check_effective_target_got32x_reloc):
>> New.
>> ---
>>  gcc/config.in  |  9 +-
>>  gcc/config/i386/i386.c | 35 
>>  gcc/configure  | 50 
>> +
>>  gcc/configure.ac   | 42 
>>  gcc/testsuite/gcc.target/i386/pr66232-14.c | 13 
>>  gcc/testsuite/gcc.target/i386/pr66232-15.c | 14 
>>  gcc/testsuite/gcc.target/i386/pr66232-16.c | 13 
>>  gcc/testsuite/gcc.target/i386/pr66232-17.c | 13 
>>  gcc/testsuite/gcc.target/i386/pr67400-1.c  |  8 +++--
>>  gcc/testsuite/gcc.target/i386/pr67400-2.c  |  8 +++--
>>  gcc/testsuite/gcc.target/i386/pr67400-3.c  |  3 +-
>>  gcc/testsuite/gcc.target/i386/pr67400-4.c  |  5 +--
>>  gcc/testsuite/gcc.target/i386/pr67400-6.c  |  8 +++--
>>  gcc/testsuite/gcc.target/i386/pr67400-7.c  |  6 ++--
>>  gcc/testsuite/lib/target-supports.exp  | 51 
>> ++
>>  15 files changed, 256 insertions(+), 22 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-14.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-15.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-16.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-17.c
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 9c7b015..a2dcf36 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -15125,7 +15125,8 @@ darwin_local_data_pic (rtx disp)
>>  bool
>>  ix86_force_load_from_GOT_p (rtx x)
>>  {
>> -  return (TARGET_64BIT && !TARGET_PECOFF && !TARGET_MACHO
>> +  return ((TARGET_64BIT || HAVE_LD_IX86_GOT32X_RELOC)
>> + && !TARGET_PECOFF && !TARGET_MACHO
>>   && !flag_plt && !flag_pic
>>   && ix86_cmodel != CM_LARGE
>>   && GET_CODE (x) == SYMBOL_REF
>> @@ -15606,6 +15607,14 @@ ix86_legitimate_address_p (machine_mode, rtx addr, 
>> bool strict)
>>  used.  While ABI specify also 32bit relocations, we don't 
>> produce
>>  them at all and use IP relative instead.  */
>>   case UNSPEC_GOT:
>> +   gcc_assert (flag_pic
>> +   || ix86_force_load_from_GOT_p (XVECEXP (XEXP (disp, 
>> 0), 0, 

Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls

2016-06-23 Thread Cesar Philippidis
On 06/17/2016 07:42 AM, Jakub Jelinek wrote:
> On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
>> The second set of changes involves teaching the gimplifier to error when
>> it detects a function call to a non-acc routine inside an OpenACC
>> offloaded region. Actually, I relaxed non-acc routines by excluding
>> calls to builtin functions, including those prefixed with _gfortran_.
>> Nvptx does have a newlib c library, and it also has a subset of
>> libgfortran. Still, this solution is probably not optimal.
> 
> I don't really like that, hardcoding prefixes or whatever is available
> (you have quite some subset of libc, libm etc. available too) in the
> compiler looks very hackish.  What is wrong with complaining during
> linking of the offloaded code?

Wouldn't the error get reported multiple times then, i.e. once per
target? Then again, maybe this error could have been restrained to the
host compiler.

Anyway, this patch now reduces that error to a warning. Furthermore,
that warning is being thrown in lower_omp_1 instead of
gimplify_call_expr because the latter is called multiple times and that
causes duplicate warnings. The only bit of fallout I had with this
change was with the fortran FE's usage of BUILT_IN_EXPECT in
gfc_{un}likely. Since these are generated implicitly by the FE, I just
added an oacc_function attribute to those calls when flag_openacc is set.
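
For readers following along, here is a minimal sketch of the situation the warning is about: a function called from an OpenACC offloaded region and marked with an acc routine directive. The names square and sum_squares are made up for illustration and are not part of the patch. Without -fopenacc the pragmas are ignored and the code simply runs on the host.

```c
#include <assert.h>

/* Marked as an acc routine, so calling it from an offloaded region is
   expected to be accepted.  Dropping the pragma gives the kind of call
   the warning described above is aimed at.  */
#pragma acc routine seq
int
square (int x)
{
  return x * x;
}

/* Hypothetical caller containing an OpenACC offloaded loop.  */
int
sum_squares (int n)
{
  assert (n >= 0);
  int s = 0;
#pragma acc parallel loop reduction(+:s)
  for (int i = 0; i < n; i++)
    s += square (i);
  return s;
}
```

Calling sum_squares (4) adds 0, 1, 4 and 9.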

>> Next, I had to modify the openacc header files in libgomp to mark
>> acc_on_device as an acc routine. Unfortunately, this meant that I had to
>> build the openacc.mod module for gfortran with -fopenacc. But doing
>> that caused gcc to stream offloaded code to the openacc.o object
>> file. So, I've updated the behavior of flag_generate_offload such that
>> minus one indicates that the user specified -foffload=disable, and that
>> will prevent gcc from streaming offloaded lto code. The alternative was
>> to hack libtool to build libgomp with -foffload=disable.
> 
> This also looks wrong.  I'd say the right thing is when loading modules
> that have OpenACC bits set in it (and also OpenMP bits, I admit I haven't
> handled this well) into CU with the corresponding flags unset (-fopenacc,
> -fopenmp, -fopenmp-simd here, depending on which bit it is), then
> IMHO the module loading code should just ignore it, pretend it wasn't there.
> Similarly e.g. to how lto1 with -g0 should ignore debug statements that
> could be in the LTO inputs.

This required two changes. First, I had to teach lto-cgraph.c how to
report an error rather than fail an assert when partitions are missing
decls. Second, I taught the lto wrapper how to stream offloaded code on
the absence of -fopen*. The only kink with this approach is that I had
to build libgomp/openacc.f90 with -frandom-seed=1 to prevent lto related
bootstrap failures.

By the way, Thomas, I've added

 #pragma acc routine(__builtin_acc_on_device) seq

to openacc.h. Is this OK, or should I just modify the various
libgomp.oacc-c-c++-common/loop* tests to use that pragma directly? Or
another option is to have the compiler add that attribute directly. I
don't think we're really expecting the end user to use
__builtin_acc_on_device directly since this is a gcc-ism.

Cesar
2016-06-23  Cesar Philippidis  

	gcc/
	* lto-cgraph.c (input_overwrite_node): Error on missing symbols.
	(input_varpool_node): Likewise.
	* lto-wrapper.c (compile_images_for_offload_targets): Don't stream
	offloaded images without -fopenacc, -fopenmp or -fopenmp-simd.
	(run_gcc): Set flag_openacc, flag_openmp, and flag_openmp_simd.
	* omp-low.c (lower_omp_1): Emit a warning when calling a function
	that doesn't have an oacc_function attribute from an OpenACC offloaded
	region.
	(oacc_loop_fixed_partitions): Consider SEQ loops when checking
	parallelism.

	gcc/fortran/
	* gfortran.h (enum oacc_function): New enum.
	(oacc_function_types): Declare.
	(symbol_attribute): Add oacc_function field.
	(gfc_intrinsic_sym): Likewise.
	(add_omp_offloading_attributes): Declare.
	* intrinsic.c (add_sym): Initialize oacc_function to zero.
	(gfc_intrinsic_sub_interface): Set attr.oacc_function to
	OACC_FUNCTION_SEQ in the resolved symbol when appropriate.
	* module.c (oacc_function): New DECL_MIO_NAME.
	(mio_symbol_attribute): Set attr->oacc_function.
	* openmp.c (gfc_oacc_routine_dims): Change return type to oacc_function.
	(gfc_match_oacc_routine): Permit named 'acc routine' directives on
	intrinsic procedures.  Update call to gfc_oacc_routine_dims.
	* symbol.c (oacc_function_types): Define.
	* trans-decl.c (add_omp_offloading_attributes): New function.
	(add_attributes_to_decl): Use it.
	* trans.c (gfc_unlikely): Mark calls to BUILT_IN_EXPECT as 'acc routines'
	when flag_openacc is set.
	(gfc_likely): Likewise.

	gcc/testsuite/
	* c-c++-common/goacc/kernels-1.c: Add warnings to calls to
	__builtin_abort.
	* c-c++-common/goacc/parallel-1.c: Likewise.
	* c-c++-common/goacc/routine-3.c: Add coverage for acc seq 

Re: IRA costs tweaks, PR 56069

2016-06-23 Thread Jeff Law

On 04/27/2016 05:33 AM, Bernd Schmidt wrote:

On 04/27/2016 06:02 AM, Jeff Law wrote:

AFAICT the sra-1.c expects to see the incremented value and I'm at a
loss to understand what's really going on here.  Can you give more
details?


Yeah, maybe my first impression wasn't very accurate.

When I try to run gdb manually, it just crashes:

(gdb) show version
GNU gdb (Gentoo 7.10.1 vanilla) 7.10.1
(gdb) b 43
Breakpoint 1 at 0x40059b: file sra-1.c, line 43.
(gdb) run
Starting program: /local/src/egcs/bscommit/gcc/a.out

Breakpoint 1, f3 (k=) at sra-1.c:43
43  bar (a.j);/* { dg-final { gdb-test 43 "a.j" "14" } } */
(gdb) p a.j
Segmentation fault (core dumped)


[ ... ]




I don't really understand the var-tracking stuff too well, so no idea
where to go from here. I suppose I'm withdrawing my patch.
Based on the above, there's some kind of GDB bug.  So your patch may 
still be a good thing.


I did a build on F23 which has effectively the same version of gdb and 
can reproduce the gdb segfault.  It also reproduces on F24 which has 
gdb-7.11.1


AFAICT gdb thinks the value of "a" has been optimized out, but goes 
absolutely bananas and segfaults if you try to examine a field within "a".


[law@torsion gcc]$ ./xgcc -B./ -g sra-1.c -O2
[law@torsion gcc]$ gdb ./a.out
GNU gdb (GDB) Fedora 7.10.1-30.fc23
[ ... ]
(gdb) b 43
Breakpoint 1 at 0x40056b: file sra-1.c, line 43.
(gdb) r
Starting program: /opt/notnfs/law/gcc-testing/obj/gcc/a.out
Missing separate debuginfos, use: dnf debuginfo-install 
glibc-2.22-10.fc23.x86_64


Breakpoint 1, f3 (k=) at sra-1.c:43
43bar (a.j);/* { dg-final { gdb-test 43 "a.j" "14" } 
} */

(gdb) p a
$1 = <optimized out>
(gdb) p a.i
Segmentation fault (core dumped)


My gdb skills are far too rusty to take this further.  I've filed an 
upstream report (BZ20295 in the gdb tracker).  Probably not much we can 
do until the GDB side gets fixed.


I'm going to attach this to 56069 for future reference.
jeff






Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue

2016-06-23 Thread Dominik Vogt
Third version of the patch.  Changes:

 * Corrected a typo in a test case comment.
 * Verify that stack variable alignment does not force the frame
   pointer into existence (with -fomit-frame-pointer)

The test should hopefully run on all targets.  Tested on s390,
s390x biarch, x86_64.  The only open question I'm aware of is the
stack-usage-2.c test.  I guess foo3() will not generate

  stack usage might be ... bytes

on any target anymore, and using alloca() with a constant size
results in "unbounded".  It's unclear to me whether that message
is ever generated, and if so, how to trigger it.

> >>diff --git a/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c 
> >>b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
> >>new file mode 100644
> >>index 000..e06a16c
> >>--- /dev/null
> >>+++ b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
> >>@@ -0,0 +1,14 @@
> >>+/* Verify that run time aligned local variables are aloocated in the 
> >>prologue
> >>+   in one pass together with normal local variables.  */
> >>+/* { dg-do compile } */
> >>+/* { dg-options "-O0" } */
> >>+
> >>+extern void bar (void *, void *, void *);
> >>+void foo (void)
> >>+{
> >>+  int i;
> >>+  __attribute__ ((aligned(65536))) char runtime_aligned_1[512];
> >>+  __attribute__ ((aligned(32768))) char runtime_aligned_2[1024];
> >>+  bar (&i, &runtime_aligned_1, &runtime_aligned_2);
> >>+}
> >>+/* { dg-final { scan-assembler-times "cfi_def_cfa_offset" 2 { target { 
> >>s390*-*-* } } } } */
> >
> >I've no idea how to test this on other targets, or how to express
> >the test in a target independent way.  The scan-assembler-times
> >does not work on x86_64.
> I wonder if you could force -fomit-frame-pointer and see if we still
> end up with a frame pointer?
> 
> jeff

On Fri, May 06, 2016 at 10:37:47AM +0100, Dominik Vogt wrote:
> Updated version of the patch described below.  Apart from fixing a
> bug and adding a test, the new logic is now used always, for all
> targets.  The discussion of the original patch starts here:
>
> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03052.html
>
> The new patch has been bootstrapped and regression tested on s390,
> s390x and x86_64, but please check the questions/comments in the
> follow up message.
>
> On Wed, Nov 25, 2015 at 01:56:10PM +0100, Dominik Vogt wrote:
> > The attached patch fixes a warning during Linux kernel compilation
> > on S/390 due to -mwarn-dynamicstack and runtime alignment of stack
> > variables with constant size causing cfun->calls_alloca to be set
> > (even if alloca is not used at all).  The patched code places
> > constant size runtime aligned variables in the "virtual stack
> > vars" area instead of creating a "virtual stack dynamic" area.
> >
> > This behaviour is activated by defining
> >
> >   #define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1
> >
> > in the backend; otherwise the old logic is used.
> >
> > The kernel uses runtime alignment for the page structure (aligned
> > to 16 bytes), and apart from triggering the alloca warning
> > (-mwarn-dynamicstack), the current GCC also generates inefficient
> > code like
> >
> >   aghi %r15,-160  # prologue: create stack frame
> >   lgr %r11,%r15   # prologue: generate frame pointer
> >   aghi %r15,-32   # space for dynamic stack
> >
> > which could be simplified to
> >
> >   aghi %r15,-192
> >
> > (if later optimization passes are able to get rid of the frame
> > pointer).  Is there a specific reason why the patched behaviour
> > shouldn't be used for all platforms?
> >
> > --
> >
> > As the placement of runtime aligned stack variables with constant
> > size is done completely in the middleend, I don't see a way to fix
> > this in the backend.
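
To make the case concrete, here is a small sketch of a constant-size local whose requested alignment may exceed what the ABI guarantees for the stack, so the compiler has to align it at run time. The function names and the 64-byte figure are assumptions chosen for illustration (the kernel case quoted above uses 16 bytes on S/390); nothing here is taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if P is aligned to ALIGN, which must be a power of 2.  */
int
is_aligned (const void *p, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return ((uintptr_t) p & (align - 1)) == 0;
}

/* Hypothetical example: no alloca anywhere, only a fixed-size buffer
   with a large alignment request.  */
int
aligned_local_ok (void)
{
  __attribute__ ((aligned (64))) char page_like[32];
  page_like[0] = 1;
  return is_aligned (page_like, 64);
}
```

Compiling something like this with -O0 and looking at the prologue should show whether the stack pointer is adjusted once or twice on a given target.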

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* cfgexpand.c (expand_stack_vars): Implement dynamic stack space
allocation in the prologue.
* explow.c (get_dynamic_stack_base): New function to return an address
expression for the dynamic stack base.
(get_dynamic_stack_size): New function to do the required dynamic stack
space size calculations.
(allocate_dynamic_stack_space): Use new functions.
(align_dynamic_address): Move some code from
allocate_dynamic_stack_space to new function.
* explow.h (get_dynamic_stack_base, get_dynamic_stack_size): Export.
gcc/testsuite/ChangeLog

* gcc.target/s390/warn-dynamicstack-1.c: New test.
* gcc.dg/stack-usage-2.c (foo3): Adapt expected warning.
stack-layout-dynamic-1.c: New test.
From 8e48e4a8ff063e7894a375724ed5eddb57018c03 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 25 Nov 2015 09:31:19 +0100
Subject: [PATCH] Allocate constant size dynamic stack space in the
 prologue ...

... and place it in the virtual stack vars area, if the platform supports it.
On S/390 this saves adjusting the stack pointer twice and forcing the frame
pointer into existence.  It also removes the warning with -mwarn-dynamicstack
that is 

[RFC] __atomic_compare_exchange* optimizations (PR middle-end/66867)

2016-06-23 Thread Jakub Jelinek
Hi!

This PR is about 2 issues with the *atomic_compare_exchange* APIs, which
didn't exist with __sync_*_compare_and_swap:
1) the APIs make the expected argument addressable, although it is very
   common it is an automatic variable that is addressable only because of
   these APIs
2) for the fear that expected might be a pointer to memory accessed by
   multiple threads, the store of the oldvar to that location is only
   conditional (if the compare and swap failed) - while again for the
   common case when it is a local otherwise non-addressable automatic
   var, it can be stored unconditionally.
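
For reference, the pattern both points refer to looks roughly like the sketch below: a compare-and-swap retry loop where the expected value lives in an automatic variable whose address is taken only to satisfy the builtin. The helper name fetch_add_cas is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: atomically add DELTA to *P with a CAS loop and
   return the previous value.  */
int
fetch_add_cas (int *p, int delta)
{
  assert (p != 0);
  /* `expected' is a plain automatic variable; it is addressable only
     because the builtin wants a pointer to it.  */
  int expected = __atomic_load_n (p, __ATOMIC_RELAXED);
  while (!__atomic_compare_exchange_n (p, &expected, expected + delta,
                                       false, __ATOMIC_SEQ_CST,
                                       __ATOMIC_RELAXED))
    {
      /* On failure the builtin has stored the current value of *P into
         `expected', so the loop just retries with the fresh value.  */
    }
  return expected;
}
```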

To resolve this, we effectively need a call (or some other stmt) that
returns two values.  We need that also for the __builtin_*_overflow*
builtins and have solved it by returning from an internal-fn call
a complex int value, where REALPART_EXPR of it is one result and
IMAGPART_EXPR the other (bool-ish) result.
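
As a reminder of the precedent mentioned here, the overflow builtins also produce two results at the source level: the wrapped sum through a pointer and an overflow flag as the return value. A minimal sketch, with checked_add as a made-up wrapper name:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical wrapper: store A + B in *SUM and return true if the
   addition overflowed.  */
bool
checked_add (int a, int b, int *sum)
{
  assert (sum != 0);
  return __builtin_add_overflow (a, b, sum);
}
```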

The following patch handles it the same, by folding
__atomic_compare_exchange_N early to an internal call (with conditional
store in the IL), and then later on if the expected var becomes
non-addressable and is rewritten into SSA, optimizing the conditional store
into unconditional (that is the gimple-fold.c part).
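
The shape of the folded call can be modelled by hand as below. This is only an analogue written for illustration: it uses a struct where GCC's internal call returns a complex integer, and the names cas_result and cas_by_value are assumptions, not anything in the patch. The point is that expected is passed by value and both results come back together.

```c
#include <assert.h>
#include <stdbool.h>

/* Hand-written stand-in for the two-result value.  */
struct cas_result
{
  int oldval;    /* value observed in memory (the REALPART_EXPR role) */
  bool success;  /* whether the swap happened (the IMAGPART_EXPR role) */
};

struct cas_result
cas_by_value (int *p, int expected, int desired)
{
  assert (p != 0);
  struct cas_result r;
  r.success = __atomic_compare_exchange_n (p, &expected, desired, false,
                                           __ATOMIC_SEQ_CST,
                                           __ATOMIC_SEQ_CST);
  /* On success `expected' is unchanged and equals the old value; on
     failure the builtin stored the observed value into it.  Either way
     it now holds what was in memory.  */
  r.oldval = expected;
  return r;
}
```

A caller that keeps its own expected variable can then assign r.oldval to it unconditionally, which is the store the gimple-fold.c part is trying to get to.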

Thinking about this again, there could be another option - keep
__atomic_compare_exchange_N in the IL, but under certain conditions (similar
to what the patch uses in fold_builtin_atomic_compare_exchange) for these
builtins ignore &var on the second argument, and if we actually turn var
into non-addressable, convert the builtin call similarly to what
fold_builtin_atomic_compare_exchange does in the patch (except the store
would be non-conditional then; the gimple-fold.c part wouldn't be needed
then).

Any preferences?  This version has been bootstrapped/regtested on
x86_64-linux and i686-linux.  Attached are various testcases I've been using
to see if the generated code improved (tried x86_64, powerpc64le, s390x and
aarch64).  E.g. on x86_64-linux, in the first testcase at -O2 the
improvement in f1/f2 is removal of dead
   movl$0, -4(%rsp)
in f4
-   movl$0, -4(%rsp)
lock; cmpxchgl  %edx, (%rdi)
-   je  .L7
-   movl%eax, -4(%rsp)
-.L7:
-   movl-4(%rsp), %eax
etc.

2016-06-23  Jakub Jelinek  

PR middle-end/66867
* builtins.c: Include gimplify.h.
(expand_ifn_atomic_compare_exchange_into_call,
expand_ifn_atomic_compare_exchange,
fold_builtin_atomic_compare_exchange): New functions.
(fold_builtin_varargs): Handle BUILT_IN_ATOMIC_COMPARE_EXCHANGE_*.
* internal-fn.c (expand_ATOMIC_COMPARE_EXCHANGE): New function.
* tree.h (build_call_expr_internal_loc): Rename to ...
(build_call_expr_internal_loc_array): ... this.  Fix up type of
last argument.
* internal-fn.def (ATOMIC_COMPARE_EXCHANGE): New internal fn.
* predict.c (expr_expected_value_1): Handle IMAGPART_EXPR of
ATOMIC_COMPARE_EXCHANGE result.
* builtins.h (expand_ifn_atomic_compare_exchange): New prototype.
* gimple-fold.c (fold_ifn_atomic_compare_exchange): New function.
(gimple_fold_call): Handle IFN_ATOMIC_COMPARE_EXCHANGE.

* gfortran.dg/coarray_atomic_4.f90: Add -O0 to dg-options.

--- gcc/builtins.c.jj   2016-06-08 21:01:25.0 +0200
+++ gcc/builtins.c  2016-06-23 09:17:51.053713986 +0200
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
 #include "gimple-fold.h"
+#include "gimplify.h"
 
 
 struct target_builtins default_target_builtins;
@@ -5158,6 +5159,123 @@ expand_builtin_atomic_compare_exchange (
   return target;
 }
 
+/* Helper function for expand_ifn_atomic_compare_exchange - expand
+   internal ATOMIC_COMPARE_EXCHANGE call into __atomic_compare_exchange_N
+   call.  The weak parameter must be dropped to match the expected parameter
+   list and the expected argument changed from value to pointer to memory
+   slot.  */
+
+static void
+expand_ifn_atomic_compare_exchange_into_call (gcall *call, machine_mode mode)
+{
+  unsigned int z;
+  vec<tree, va_gc> *vec;
+
+  vec_alloc (vec, 5);
+  vec->quick_push (gimple_call_arg (call, 0));
+  tree expected = gimple_call_arg (call, 1);
+  rtx x = assign_stack_temp_for_type (mode, GET_MODE_SIZE (mode),
+ TREE_TYPE (expected));
+  rtx expd = expand_expr (expected, x, mode, EXPAND_NORMAL);
+  if (expd != x)
+emit_move_insn (x, expd);
+  tree v = make_tree (TREE_TYPE (expected), x);
+  vec->quick_push (build1 (ADDR_EXPR,
+  build_pointer_type (TREE_TYPE (expected)), v));
+  vec->quick_push (gimple_call_arg (call, 2));
+  /* Skip the boolean weak parameter.  */
+  for (z = 4; z < 6; z++)
+vec->quick_push (gimple_call_arg (call, z));
+  built_in_function fncode
+= (built_in_function) ((int) BUILT_IN_ATOMIC_COMPARE_EXCHANGE_1
+  + 

Re: [PATCH] Handle undefined extern vars in output_in_order

2016-06-23 Thread Alexander Monakov
Hi,

I've discovered that this assert in my patch was too restrictive:

+  if (DECL_HAS_VALUE_EXPR_P (pv->decl))
+   {
+ gcc_checking_assert (lookup_attribute ("omp declare target link",
+DECL_ATTRIBUTES (pv->decl)));

Testing for the nvptx target uncovered that there's another case where a
global variable would have a value expr: emutls.  Sorry for not spotting it
earlier (but at least the new assert did its job).  I think we should always
skip here over decls that have value-exprs, just like hard-reg vars are
skipped.  The following patch does that.  Is this still OK?
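
The decision the patch makes per variable can be summarized with a small stand-alone model. The struct and its field names are simplified assumptions for illustration and do not match GCC's varpool_node; only the three-way outcome follows the patch below.

```c
#include <assert.h>
#include <stdbool.h>

enum order_kind { ORDER_UNDEFINED = 0, ORDER_VAR, ORDER_VAR_UNDEF };

/* Simplified stand-in for a varpool node.  */
struct var_model
{
  bool definition;      /* has a definition in this unit */
  bool hard_register;   /* models DECL_HARD_REGISTER */
  bool has_value_expr;  /* models DECL_HAS_VALUE_EXPR_P */
};

/* Return the slot kind for V; ORDER_UNDEFINED means "skip it".  */
enum order_kind
classify_var (const struct var_model *v)
{
  assert (v != 0);
  if (v->hard_register || v->has_value_expr)
    return ORDER_UNDEFINED;
  return v->definition ? ORDER_VAR : ORDER_VAR_UNDEF;
}
```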

(bootstrapped/regtested on x86-64)

Alexander

* cgraphunit.c (cgraph_order_sort_kind): New entry ORDER_VAR_UNDEF.
(output_in_order): Loop over undefined variables too.  Output them
via assemble_undefined_decl.  Skip variables that correspond to hard
registers or have value-exprs.
* varpool.c (symbol_table::output_variables): Handle undefined
variables together with defined ones.
 
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 4bfcad7..e30fe6e 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2141,6 +2141,7 @@ enum cgraph_order_sort_kind
   ORDER_UNDEFINED = 0,
   ORDER_FUNCTION,
   ORDER_VAR,
+  ORDER_VAR_UNDEF,
   ORDER_ASM
 };
 
@@ -2187,16 +2188,20 @@ output_in_order (bool no_reorder)
}
 }
 
-  FOR_EACH_DEFINED_VARIABLE (pv)
-if (!DECL_EXTERNAL (pv->decl))
-  {
-   if (no_reorder && !pv->no_reorder)
-   continue;
-   i = pv->order;
-   gcc_assert (nodes[i].kind == ORDER_UNDEFINED);
-   nodes[i].kind = ORDER_VAR;
-   nodes[i].u.v = pv;
-  }
+  /* There is a similar loop in symbol_table::output_variables.
+ Please keep them in sync.  */
+  FOR_EACH_VARIABLE (pv)
+{
+  if (no_reorder && !pv->no_reorder)
+   continue;
+  if (DECL_HARD_REGISTER (pv->decl)
+ || DECL_HAS_VALUE_EXPR_P (pv->decl))
+   continue;
+  i = pv->order;
+  gcc_assert (nodes[i].kind == ORDER_UNDEFINED);
+  nodes[i].kind = pv->definition ? ORDER_VAR : ORDER_VAR_UNDEF;
+  nodes[i].u.v = pv;
+}
 
   for (pa = symtab->first_asm_symbol (); pa; pa = pa->next)
 {
@@ -,16 +2227,13 @@ output_in_order (bool no_reorder)
  break;
 
case ORDER_VAR:
-#ifdef ACCEL_COMPILER
- /* Do not assemble "omp declare target link" vars.  */
- if (DECL_HAS_VALUE_EXPR_P (nodes[i].u.v->decl)
- && lookup_attribute ("omp declare target link",
-  DECL_ATTRIBUTES (nodes[i].u.v->decl)))
-   break;
-#endif
  nodes[i].u.v->assemble_decl ();
  break;
 
+   case ORDER_VAR_UNDEF:
+ assemble_undefined_decl (nodes[i].u.v->decl);
+ break;
+
case ORDER_ASM:
  assemble_asm (nodes[i].u.a->asm_str);
  break;
diff --git a/gcc/varpool.c b/gcc/varpool.c
index ab615fa..e5f991e 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -733,11 +733,6 @@ symbol_table::output_variables (void)
 
   timevar_push (TV_VAROUT);
 
-  FOR_EACH_VARIABLE (node)
-if (!node->definition
-   && !DECL_HAS_VALUE_EXPR_P (node->decl)
-   && !DECL_HARD_REGISTER (node->decl))
-  assemble_undefined_decl (node->decl);
   FOR_EACH_DEFINED_VARIABLE (node)
 {
   /* Handled in output_in_order.  */
@@ -747,20 +742,19 @@ symbol_table::output_variables (void)
   node->finalize_named_section_flags ();
 }
 
-  FOR_EACH_DEFINED_VARIABLE (node)
+  /* There is a similar loop in output_in_order.  Please keep them in sync.  */
+  FOR_EACH_VARIABLE (node)
 {
   /* Handled in output_in_order.  */
   if (node->no_reorder)
continue;
-#ifdef ACCEL_COMPILER
-  /* Do not assemble "omp declare target link" vars.  */
-  if (DECL_HAS_VALUE_EXPR_P (node->decl)
- && lookup_attribute ("omp declare target link",
-  DECL_ATTRIBUTES (node->decl)))
+  if (DECL_HARD_REGISTER (node->decl)
+ || DECL_HAS_VALUE_EXPR_P (node->decl))
continue;
-#endif
-  if (node->assemble_decl ())
-changed = true;
+  if (node->definition)
+   changed |= node->assemble_decl ();
+  else
+   assemble_undefined_decl (node->decl);
 }
   timevar_pop (TV_VAROUT);
   return changed;



[PATCH 1/3] Add gcc-auto-profile script

2016-06-23 Thread Andi Kleen
From: ak 

Using autofdo is currently somewhat difficult. It requires using the
model-specific branches-taken event, which differs between CPUs.
The example shown in the manual requires a specially patched version of
perf that is nonstandard, and will likely not work everywhere.

This patch adds a new gcc-auto-profile script that figures out the
correct event and runs perf.
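
The first step in picking the event is identifying the CPU. The generator script in this patch does that in Python by reading /proc/cpuinfo; the sketch below shows the same idea in C, purely for illustration. The function name cpu_key_from_text is hypothetical, and reading the file is left to the caller so the parsing can be tried on a plain string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "VENDOR-FAMILY-MODEL" (model in upper-case
   hex) from text in /proc/cpuinfo format.  Returns 1 on success and 0
   if a field is missing or OUT is too small.  */
int
cpu_key_from_text (const char *text, char *out, size_t outlen)
{
  assert (text != NULL && out != NULL && outlen > 0);
  char vendor[64] = "";
  int fam = -1, model = -1, v;

  for (const char *line = text; line != NULL && *line != '\0'; )
    {
      if (vendor[0] == '\0')
        sscanf (line, "vendor_id : %63s", vendor);
      if (fam < 0 && sscanf (line, "cpu family : %d", &v) == 1)
        fam = v;
      /* "model name" does not match: after "model" the format wants ':'.  */
      if (model < 0 && sscanf (line, "model : %d", &v) == 1)
        model = v;
      const char *nl = strchr (line, '\n');
      line = nl ? nl + 1 : NULL;
    }

  if (vendor[0] == '\0' || fam < 0 || model < 0)
    return 0;
  int n = snprintf (out, outlen, "%s-%d-%X", vendor, fam, model);
  return n > 0 && (size_t) n < outlen;
}
```

The resulting key would then be looked up in a table of per-model event encodings, which is the part the generated script carries.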

This is needed to actually make use of autofdo in a generic way
in the build system and in the test suite.

Since maintaining the script would be somewhat tedious (needs changes
every time a new CPU comes out) I auto generated it from the online
Intel event database. The script to do that is in contrib and can be
rerun.

Right now configure does not test whether perf works. Such a test
would vary depending on the build and target system, and since perf
currently doesn't work under virtualization and needs an up-to-date
kernel, it may often fail in common distribution build setups.

So far the script is not installed.

gcc/:
2016-06-23  Andi Kleen  

* config/i386/gcc-auto-profile: New file.

contrib/:

2016-06-23  Andi Kleen  

* gen_autofdo_event.py: New file to regenerate
gcc-auto-profile.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@237731 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 contrib/ChangeLog|   5 ++
 contrib/gen_autofdo_event.py | 155 +++
 gcc/ChangeLog|   4 +
 gcc/config/i386/gcc-auto-profile |  70 ++
 4 files changed, 234 insertions(+)
 create mode 100755 contrib/gen_autofdo_event.py
 create mode 100755 gcc/config/i386/gcc-auto-profile

diff --git a/contrib/ChangeLog b/contrib/ChangeLog
index 8e6823d..07019c2 100644
--- a/contrib/ChangeLog
+++ b/contrib/ChangeLog
@@ -1,3 +1,8 @@
+2016-06-23  Andi Kleen  
+
+   * gen_autofdo_event.py: New file to regenerate
+   gcc-auto-profile.
+
 2016-06-21  Trevor Saunders  
 
* config-list.mk: Stop testing mep-elf.
diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
new file mode 100755
index 000..3865cbb
--- /dev/null
+++ b/contrib/gen_autofdo_event.py
@@ -0,0 +1,155 @@
+#!/usr/bin/python
+# Generate Intel taken branches Linux perf event script for autofdo profiling.
+
+# Copyright (C) 2016 Free Software Foundation, Inc.
+#
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+#
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.  */
+
+# Run it with perf record -b -e EVENT program ...
+# The Linux Kernel needs to support the PMU of the current CPU, and
+# It will likely not work in VMs.
+# Add --all to print for all cpus, otherwise for current cpu.
+# Add --script to generate shell script to run correct event.
+#
+# Requires internet (https) access. This may require setting up a proxy
+# with export https_proxy=...
+#
+import urllib2
+import sys
+import json
+import argparse
+import collections
+
+baseurl = "https://download.01.org/perfmon"
+
+target_events = (u'BR_INST_RETIRED.NEAR_TAKEN',
+ u'BR_INST_EXEC.TAKEN',
+ u'BR_INST_RETIRED.TAKEN_JCC',
+ u'BR_INST_TYPE_RETIRED.COND_TAKEN')
+
+ap = argparse.ArgumentParser()
+ap.add_argument('--all', '-a', help='Print for all CPUs', action='store_true')
+ap.add_argument('--script', help='Generate shell script', action='store_true')
+args = ap.parse_args()
+
+eventmap = collections.defaultdict(list)
+
+def get_cpu_str():
+with open('/proc/cpuinfo', 'r') as c:
+vendor, fam, model = None, None, None
+for j in c:
+n = j.split()
+if n[0] == 'vendor_id':
+vendor = n[2]
+elif n[0] == 'model' and n[1] == ':':
+model = int(n[2])
+elif n[0] == 'cpu' and n[1] == 'family':
+fam = int(n[3])
+if vendor and fam and model:
+return "%s-%d-%X" % (vendor, fam, model), model
+return None, None
+
+def find_event(eventurl, model):
+print >>sys.stderr, "Downloading", eventurl
+u = urllib2.urlopen(eventurl)
+events = json.loads(u.read())
+u.close()
+
+found = 0
+for j in events:
+if j[u'EventName'] in target_events:
+event = "cpu/event=%s,umask=%s/" % (j[u'EventCode'], j[u'UMask'])
+if u'PEBS' in j and j[u'PEBS'] > 0:
+  

Committed version of autofdo testing patches

2016-06-23 Thread Andi Kleen
I committed the autofdo bootstrap and testing patches now.

I did some small changes to the patchkit, so I'm reposting the final
committed version:

- Addressed Jeff's comments
- Fixed the grep code in the scripts
- Unsupported tooling is now reported as unsupported
- Unified tree bootstrap is tested and works
- I fixed some problems I had introduced in the Tcl test code; now
all the profiling test cases run again. Currently there are some new
failures when autofdo is available, due to differences from the
instrumented bootstrap; I'll address those separately.

-Andi



[PATCH 2/3] Run profile feedback tests with autofdo

2016-06-23 Thread Andi Kleen
From: ak 

Extend the existing bprob and tree-prof tests to also run with autofdo.
The test runtimes are really a bit too short for autofdo, but it's
a reasonable sanity check.

This only works natively for now.

dejagnu doesn't seem to support a wrapper for unix tests, so I had
to open code running these tests.  That should be ok due to the
native run restrictions.

gcc/testsuite/:

2016-06-23  Andi Kleen  

* g++.dg/bprob/bprob.exp: Support autofdo.
* g++.dg/tree-prof/tree-prof.exp: dito.
* gcc.dg/tree-prof/tree-prof.exp: dito.
* gcc.misc-tests/bprob.exp: dito.
* gfortran.dg/prof/prof.exp: dito.
* lib/profopt.exp: dito.
* lib/target-supports.exp: Check for autofdo.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@237732 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog  | 10 
 gcc/testsuite/g++.dg/bprob/bprob.exp |  8 +++
 gcc/testsuite/g++.dg/tree-prof/tree-prof.exp |  8 +++
 gcc/testsuite/gcc.dg/tree-prof/tree-prof.exp |  8 +++
 gcc/testsuite/gcc.misc-tests/bprob.exp   |  7 +++
 gcc/testsuite/gfortran.dg/prof/prof.exp  |  7 +++
 gcc/testsuite/lib/profopt.exp| 82 +++-
 gcc/testsuite/lib/target-supports.exp| 37 +
 8 files changed, 164 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 287baf6..55f8dbf 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,13 @@
+2016-06-23  Andi Kleen  
+
+   * g++.dg/bprob/bprob.exp: Support autofdo.
+   * g++.dg/tree-prof/tree-prof.exp: dito.
+   * gcc.dg/tree-prof/tree-prof.exp: dito.
+   * gcc.misc-tests/bprob.exp: dito.
+   * gfortran.dg/prof/prof.exp: dito.
+   * lib/profopt.exp: dito.
+   * lib/target-supports.exp: Check for autofdo.
+
 2016-06-23  Martin Liska  
 
* gcc.dg/pr71619.c: New test.
diff --git a/gcc/testsuite/g++.dg/bprob/bprob.exp 
b/gcc/testsuite/g++.dg/bprob/bprob.exp
index d07..4818298 100644
--- a/gcc/testsuite/g++.dg/bprob/bprob.exp
+++ b/gcc/testsuite/g++.dg/bprob/bprob.exp
@@ -53,6 +53,7 @@ if $tracelevel then {
 
 set profile_options "-fprofile-arcs"
 set feedback_options "-fbranch-probabilities"
+set profile_wrapper ""
 
 # Main loop.
 foreach profile_option $profile_options feedback_option $feedback_options {
@@ -65,4 +66,11 @@ foreach profile_option $profile_options feedback_option $feedback_options {
 }
 }
 
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.C]] {
+if ![runtest_file_p $runtests $src] then {
+continue
+}
+auto-profopt-execute $src
+}
+
 set PROFOPT_OPTIONS $bprob_save_profopt_options
diff --git a/gcc/testsuite/g++.dg/tree-prof/tree-prof.exp b/gcc/testsuite/g++.dg/tree-prof/tree-prof.exp
index 7a4b5cb..26ee0b3 100644
--- a/gcc/testsuite/g++.dg/tree-prof/tree-prof.exp
+++ b/gcc/testsuite/g++.dg/tree-prof/tree-prof.exp
@@ -44,6 +44,7 @@ set PROFOPT_OPTIONS [list {}]
 # profile data.
 set profile_option "-fprofile-generate -D_PROFILE_GENERATE"
 set feedback_option "-fprofile-use -D_PROFILE_USE"
+set profile_wrapper ""
 
 foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.C]] {
 # If we're only testing specific files and this isn't one of them, skip it.
@@ -53,4 +54,11 @@ foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.C]] {
 profopt-execute $src
 }
 
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.C]] {
+if ![runtest_file_p $runtests $src] then {
+continue
+}
+auto-profopt-execute $src
+}
+
 set PROFOPT_OPTIONS $treeprof_save_profopt_options
diff --git a/gcc/testsuite/gcc.dg/tree-prof/tree-prof.exp b/gcc/testsuite/gcc.dg/tree-prof/tree-prof.exp
index 650ad8d..aaccf19 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/tree-prof.exp
+++ b/gcc/testsuite/gcc.dg/tree-prof/tree-prof.exp
@@ -44,6 +44,7 @@ set PROFOPT_OPTIONS [list {}]
 # profile data.
 set profile_option "-fprofile-generate -D_PROFILE_GENERATE"
 set feedback_option "-fprofile-use -D_PROFILE_USE"
+set profile_wrapper ""
 
 foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] {
 # If we're only testing specific files and this isn't one of them, skip it.
@@ -53,4 +54,11 @@ foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] {
 profopt-execute $src
 }
 
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] {
+if ![runtest_file_p $runtests $src] then {
+continue
+}
+auto-profopt-execute $src
+}
+
 set PROFOPT_OPTIONS $treeprof_save_profopt_options
diff --git a/gcc/testsuite/gcc.misc-tests/bprob.exp b/gcc/testsuite/gcc.misc-tests/bprob.exp
index 52dcb1f..132bfe3 100644
--- a/gcc/testsuite/gcc.misc-tests/bprob.exp
+++ b/gcc/testsuite/gcc.misc-tests/bprob.exp
@@ -41,6 +41,7 @@ load_lib profopt.exp
 set bprob_save_profopt_options $PROFOPT_OPTIONS
 set PROFOPT_OPTIONS [list { -O2 

Re: [PATCH] Print column numbers in inclusion trace consistently.

2016-06-23 Thread David Malcolm
On Tue, 2016-06-21 at 21:09 -0600, Jeff Law wrote:
> On 06/03/2016 05:24 AM, Marcin Baczyński wrote:
> > Hi,
> > the patch below fixes PR/42014. Although the fix itself seems easy
> > enough,
> > I have a problem with the test. Is there a way to match the output
> > before
> > the "warning:" line? dg-{begin,end}-multiline-output doesn't do the
> > job, or
> > at least I don't know how to convince it.
> > 
> > Bootstrapped on x86_64 linux.
> > 
> > Thanks,
> > Marcin
> > 
> > 
> > gcc/ChangeLog:
> > 
> >PR/42014
> > 
> >* diagnostic.c (diagnostic_report_current_module): Print column
> > numbers
> > for all mentioned files if context->show_column.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >PR/42014
> > 
> >* gcc.dg/inclusion-trace-column.i: New test.
> The change itself seems reasonable.  You might contact David Malcolm 
> (dmalc...@redhat.com) directly to see if he's got any ideas on how to
> convince the multi-line test to do what you want.  Let's hold off 
> installing the fix until we've got the testsuite issue sorted out.

You could turn up the verbosity level to debug things, by running
something like:

 make check-gcc RUNTESTFLAGS="-v -v -v -v dg.exp=inclusion-trace-column.i"

(multiline.exp prints various things at verbosity level 3 and 4, iirc;
in particular, it can show you the regexp it's looking for).

Maybe a tabs vs spaces issue?

Dave



Implement C _FloatN, _FloatNx types [version 3]

2016-06-23 Thread Joseph Myers
[The only changes in version 3 of the patch are additions to the
testcases to provide more thorough test coverage for comparisons on
the new types, both the C operators == != < <= > >= and the
__builtin_is* comparisons.]


ISO/IEC TS 18661-3:2015 defines C bindings to IEEE interchange and
extended types, in the form of _FloatN and _FloatNx type names with
corresponding fN/FN and fNx/FNx constant suffixes and FLTN_* / FLTNX_*
<float.h> macros.  This patch implements support for this feature in
GCC.

The _FloatN types, for N = 16, 32, 64 or >= 128 and a multiple of 32,
are types encoded according to the corresponding IEEE interchange
format (endianness unspecified; may use either the NaN conventions
recommended in IEEE 754-2008, or the MIPS NaN conventions, since the
choice of convention is only an IEEE recommendation, not a
requirement).  The _FloatNx types, for N = 32, 64 and 128, are IEEE
"extended" types: types extending a narrower format with range and
precision at least as big as those specified in IEEE 754 for each
extended type (and with unspecified representation, but still
following IEEE semantics for their values and operations - and with
the set of values being determined by the precision and the maximum
exponent, which means that while Intel "extended" is suitable for
_Float64x, m68k "extended" is not).  These types are always distinct
from and not compatible with each other and the standard floating
types float, double, long double; thus, double, _Float64 and _Float32x
may all have the same ABI, but they are still three distinct types.
The type names may be used with _Complex to construct corresponding
complex types (unlike __float128, which acts more like a typedef name
than a keyword - thus, this patch may be considered to fix PR
c/32187).  The new suffixes can be combined with GNU "i" and "j"
suffixes for constants of complex types (e.g. 1.0if128, 2.0f64i).
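
The distinctness described above can be sketched in C as follows.  This
is only an illustration under stated assumptions: it assumes a compiler
in which this support is present, it uses the predefined macro
__FLT64_MANT_DIG__ as the feature test, and the names TYPE_NAME, F64_ONE
and the three helper functions are invented for the example.  Listing
both double and _Float64 in one _Generic selection is accepted only
because the two types are not compatible with each other.

```c
#include <string.h>

#ifdef __FLT64_MANT_DIG__
/* _Float64 is available: double and _Float64 can both be associations
   because they are distinct, incompatible types.  */
#define TYPE_NAME(x) _Generic((x),      \
    float: "float",                     \
    double: "double",                   \
    long double: "long double",         \
    _Float64: "_Float64",               \
    default: "other")
#define F64_ONE 1.0f64   /* constant with the new f64 suffix */
#else
/* Fallback so the sketch still builds without _Float64 support.  */
#define TYPE_NAME(x) _Generic((x),      \
    float: "float",                     \
    double: "double",                   \
    long double: "long double",         \
    default: "other")
#define F64_ONE 1.0
#endif

/* Spelling of the type of an unsuffixed constant.  */
const char *name_of_plain_constant(void)
{
    return TYPE_NAME(1.0);
}

/* Spelling of the type of the f64-suffixed constant (or of the
   fallback constant when _Float64 is unavailable).  */
const char *name_of_f64_constant(void)
{
    return TYPE_NAME(F64_ONE);
}

/* Nonzero when the two constants above have different types.  */
int float64_is_distinct_from_double(void)
{
    return strcmp(name_of_f64_constant(), name_of_plain_constant()) != 0;
}
```

With the types enabled, float64_is_distinct_from_double () is expected
to return nonzero even on targets where double and _Float64 share the
same representation.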

The set of types supported is implementation-defined.  In this GCC
patch, _Float32 is SFmode if that is suitable; _Float32x and _Float64
are DFmode if that is suitable; _Float128 is TFmode if that is
suitable; _Float64x is XFmode if that is suitable, and otherwise
TFmode if that is suitable.  There is a target hook to override the
choices if necessary.  "Suitable" means both conforming to the
requirements of that type, and supported as a scalar type including in
libgcc.  The ABI is whatever the back end does for scalars of that
mode (but note that _Float32 is passed without promotion in variable
arguments, unlike float).  All the existing issues with exceptions and
rounding modes for existing types apply equally to the new type names.

No GCC port supports a floating-point format suitable for _Float128x.
Although there is HFmode support for ARM and AArch64, use of that for
_Float16 is not enabled.  Supporting _Float16 would require additional
work on the excess precision aspects of TS 18661-3: there are new
values of FLT_EVAL_METHOD, which are not currently supported in GCC,
and FLT_EVAL_METHOD == 0 now means that operations and constants on
types narrower than float are evaluated to the range and precision of
float.  Implementing that, so that _Float16 gets evaluated with excess
range and precision, would involve changes to the excess precision
infrastructure so that the _Float16 case is enabled by default, unlike
the x87 case which is only enabled for -fexcess-precision=standard.
Other differences between _Float16 and __fp16 would also need to be
disentangled.

GCC has some prior support for nonstandard floating-point types in the
form of __float80 and __float128.  Where these were previously types
distinct from long double, they are made by this patch into aliases
for _Float64x / _Float128 if those types have the required properties.

In principle the set of possible _FloatN types is infinite.  This
patch hardcodes the four such types for N <= 128, but with as much
code as possible using loops over types to minimize the number of
places with such hardcoding.  I don't think it's likely any further
such types will be of use in future (or indeed that formats suitable
for _Float128x will actually be implemented).  There is a corner case
that all _FloatN, for N >= 128 and a multiple of 32, should be treated
as keywords even when the corresponding type is not supported; I
intend to deal with that in a followup patch.

Tests are added for various functionality of the new types, mostly
using type-generic headers.  PowerPC maintainers should note that the
tests do not do anything regarding passing special options to enable
support for the types, either for the tests themselves or for the
corresponding effective-target tests.  Thus, to run the _Float128
tests on PowerPC, you will need to add such support, { dg-add-options
float128 } or similar and make sure it affects both the
effective-target tests and the tests themselves.  The complex
arithmetic support in libgcc will also be needed, as otherwise the
associated tests will fail.  (The same would 

Re: [PATCH, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-23 Thread Bill Schmidt

> On Jun 22, 2016, at 8:10 PM, Bill Schmidt  wrote:
> 
>> On Jun 22, 2016, at 6:27 PM, Joseph Myers  wrote:
>> 
>> (b) for trunk, having an insn pattern infkf1 for a built-in function that 
>> loads a constant is not appropriate (other insn patterns to optimize the 
>> architecture-independent built-in functions may well be appropriate).  
>> Rather, if there is a particularly efficient way of generating code to 
>> load a certain constant, the back end should be set up to use that way 
>> whenever that constant occurs (more generally, whenever any constant to 
>> which that efficient way applies occurs) - including for example when it 
>> occurs from folding arithmetic, say (__float128) __builtin_inff (), not 
>> just from __builtin_inff128 ().
> 
> The fact that I hook this built-in directly to a pattern named infkf1
> doesn't seem to preclude anything you suggest.  I named it this way
> on the off-chance that inf<mode>1 becomes a standard pattern in the
> future, in which case I want to generate this constant.  We can 
> always use gen_infkf1 to reuse this code in any other context.  I'm
> not understanding your objection.

Though perhaps your point is the specific use of KF here, which is
inconsistent and should be fixed.  The mode should be "whichever of
TF and KF is IEEE-128" as elsewhere.  I'll fix that.  Probably this
pattern should be moved over to rs6000.md also, where the fabs and
copysign support are.

Bill

> 
> Thanks,
> Bill



Re: [PATCH] i386: Access external function via GOT slot for -fno-plt

2016-06-23 Thread Uros Bizjak
On Thu, Jun 23, 2016 at 1:45 PM, H.J. Lu  wrote:
> i386 psABI has been updated to clarify that R_386_GOT32X and R_386_GOT32
> relocations can be used to access GOT without base register when PIC is
> disabled:
>
> https://groups.google.com/forum/#!topic/ia32-abi/awsRSvJOJfs
>
> 32-bit x86 assembler and linker from binutils 2.26.1 and 2.27 support
>
> call/jmp *_start@GOT
> cmpl $0, bar@GOT
>
> for both normal and IFUNC functions.  We check if 32-bit x86 assembler
> and linker have the fix for:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=20244
>
> before accessing external function via GOT slot for -fno-plt in both PIC
> and non-PIC modes.
>
> Tested on i686 and x86-64.  OK for trunk?
>
>
> H.J.
> 
> PR target/66232
> PR target/67400
> * configure.ac (HAVE_LD_IX86_GOT32X_RELOC): New.  Defined to 1
> if 32-bit assembler and linker support "jmp *_start@GOT" and
> "movl $0, bar@GOT".  Otherwise, defined to 0.
> * config.in: Regenerated.
> * configure: Likewise.
> * config/i386/i386.c (ix86_force_load_from_GOT_p): Return
> true if HAVE_LD_IX86_GOT32X_RELOC is 1 in 32-bit mode.
> (ix86_legitimate_address_p): Allow UNSPEC_GOT for -fno-plt
> if ix86_force_load_from_GOT_p returns true.
> (ix86_print_operand_address_as): Also support UNSPEC_GOT if
> ix86_force_load_from_GOT_p returns true.
> (ix86_expand_move): Generate UNSPEC_GOT in 32-bit mode to load
> the external function address via the GOT slot.
> (ix86_nopic_noplt_attribute_p): Check HAVE_LD_IX86_GOT32X_RELOC
> == 0 before returning false in 32-bit mode.
> (ix86_output_call_insn): Generate "%!jmp/call\t*%p0@GOT" in
> 32-bit mode if ix86_nopic_noplt_attribute_p returns true.
>
> gcc/testsuite/
>
> PR target/66232
> PR target/67400
> * gcc.target/i386/pr66232-14.c: New file.
> * gcc.target/i386/pr66232-15.c: Likewise.
> * gcc.target/i386/pr66232-16.c: Likewise.
> * gcc.target/i386/pr66232-17.c: Likewise.
> * gcc.target/i386/pr67400-1.c: Don't disable for ia32.  Scan for
> ia32 and if R_386_GOT32X relocation is supported.
> * gcc.target/i386/pr67400-2.c: Likewise.
> * gcc.target/i386/pr67400-3.c: Likewise.
> * gcc.target/i386/pr67400-4.c: Likewise.
> * gcc.target/i386/pr67400-6.c: Likewise.
> * gcc.target/i386/pr67400-7.c: Likewise.
> * lib/target-supports.exp (check_effective_target_got32x_reloc):
> New.
> ---
>  gcc/config.in  |  9 +-
>  gcc/config/i386/i386.c | 35 
>  gcc/configure  | 50 +
>  gcc/configure.ac   | 42 
>  gcc/testsuite/gcc.target/i386/pr66232-14.c | 13 
>  gcc/testsuite/gcc.target/i386/pr66232-15.c | 14 
>  gcc/testsuite/gcc.target/i386/pr66232-16.c | 13 
>  gcc/testsuite/gcc.target/i386/pr66232-17.c | 13 
>  gcc/testsuite/gcc.target/i386/pr67400-1.c  |  8 +++--
>  gcc/testsuite/gcc.target/i386/pr67400-2.c  |  8 +++--
>  gcc/testsuite/gcc.target/i386/pr67400-3.c  |  3 +-
>  gcc/testsuite/gcc.target/i386/pr67400-4.c  |  5 +--
>  gcc/testsuite/gcc.target/i386/pr67400-6.c  |  8 +++--
>  gcc/testsuite/gcc.target/i386/pr67400-7.c  |  6 ++--
>  gcc/testsuite/lib/target-supports.exp  | 51 ++
>  15 files changed, 256 insertions(+), 22 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-14.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-15.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-16.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-17.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 9c7b015..a2dcf36 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -15125,7 +15125,8 @@ darwin_local_data_pic (rtx disp)
>  bool
>  ix86_force_load_from_GOT_p (rtx x)
>  {
> -  return (TARGET_64BIT && !TARGET_PECOFF && !TARGET_MACHO
> +  return ((TARGET_64BIT || HAVE_LD_IX86_GOT32X_RELOC)
> + && !TARGET_PECOFF && !TARGET_MACHO
>   && !flag_plt && !flag_pic
>   && ix86_cmodel != CM_LARGE
>   && GET_CODE (x) == SYMBOL_REF
> @@ -15606,6 +15607,14 @@ ix86_legitimate_address_p (machine_mode, rtx addr, bool strict)
>  used.  While ABI specify also 32bit relocations, we don't produce
>  them at all and use IP relative instead.  */
>   case UNSPEC_GOT:
> +   gcc_assert (flag_pic
> +   || ix86_force_load_from_GOT_p (XVECEXP (XEXP (disp, 
> 0), 0, 0)));
> +   if (!TARGET_64BIT)
> + goto is_legitimate_pic;
> +
> +   /* 64bit address unspec.  */
> +   return false;

The above should read like:

  

Re: [patch,avr]: ad PR71151: Make test cases pass on smaller targets.

2016-06-23 Thread Senthil Kumar Selvaraj

Georg-Johann Lay writes:

> On 22.06.2016 19:06, Mike Stump wrote:
>> On Jun 22, 2016, at 7:21 AM, Georg-Johann Lay wrote:
>>>
>>> Some tests for PR71151 assume that the target MCU has a 3-byte PC.  The
>>> tests are failing because the simulator (avrtest) refuses to load the
>>> respective executables if .text exceeds 128KiB, e.g. for -mmcu=atmega128
>>> which has only flash of 128KiB and only a 2-byte PC.
>>>
>>> Hence the tests have to be skipped if the target MCU has no 3-byte PC,
>>> hence a new dg-require-effective-target proc supporting "avr_3byte_pc".
>>>
>>> I added the new proc right after the last check_effective_target_arm_***
>>> so that the test is in ASCII collating order.
>>>
>>> Ok for trunk and v6?
>>
>> No.  Please see target-utils.exp and ensure that the tools generate a
>> stylized message and then add support for that to target-utils.exp.  If you
>> are using binutils, the text should go into a memory segment that will fill
>
> Binutils don't produce a message so there is nothing to scan for.  Hacking 
> binutils is beyond my scope.

binutils doesn't produce a message because

1. The size of text is not device specific right now - IIRC, it's set to
the max flash size for the emulation. I have a partial fix for this -
the text region size can now be set via a linker symbol
(__TEXT_REGION_LENGTH__). I'm planning to patch avr-libc to
automatically set this symbol based on flash size information
present in the device header file.

2. Even if (1) is fixed, the custom section (.foo) is not mapped to
any output section or region in the linker script. The linker can
error out only if the contents overflow a region.

If we have a custom linker script that places .foo in the text region,
and if we set the location counter to the address .foo should be placed,
i.e. something like


.text  :
{
 ...
 *(.fini0)  /* Infinite loop after program termination.  */
 KEEP (*(.fini0))

 . = FOO_START;
 KEEP(*(.foo))
 _etext = . ;
}  > text

and then if we pass -Wl,--defsym=FOO_START=0x20002 when linking, we'll get
the linker to report overflow.

Not sure if it's worth the effort though.

Mike, how about effective targets for sub arch/ISA variants
(avr51/avrxmega6/avrtiny..)? I guess arm has these for thumb1/thumb2/neon/dsp
etc. That would help us skip arch specific test cases, and will help
with testcases like these too - we can infer PC size from the arch.

Regards
Senthil

>
>> when it is too large.  When it does, then binutils will generate one of the
>> messages already handled, then you're done.
>
> avrtest behaves just as if the program under test would call abort.  There are
> at least 2 other AVR simulators; dunno how they would handle the situation.
>
> I don't see how an a-posteriori test could be independent of simulator, 
> independent of board descriptions and all that stuff.
>
> The tests in question don't fail because the program is too big as a result of
> some missed optimization; some code is deliberately placed across a 64KiB or
> 128KiB boundary or beyond 128KiB.  All this is known a priori.
>
> Hence dropping the original patch and proposing a new one that doesn't need 
> extensions to lib.
>
> The new tests just won't put any code at places where we know in advance some 
> simulator might barf.  As the compiler has no idea of exact flash size, the 
> relevant flash property is deduced from ISA properties.
>
> Is this one ok?
>
> Johann
>
>
> gcc/testsuite/
>   PR target/71151
>   * gcc.target/avr/pr71151-common.h (foo): Use macro SECTION_NAME
>   instead of ".foo" for its section name.
>   * gcc.target/avr/pr71151-2.c (SECTION_NAME): Define appropriately
>   depending on MCU's flash size.
>   * gcc.target/avr/pr71151-3.c (SECTION_NAME): Ditto.
>   * gcc.target/avr/pr71151-4.c (SECTION_NAME): Ditto.
>   * gcc.target/avr/pr71151-5.c (SECTION_NAME): Ditto.
>   * gcc.target/avr/pr71151-6.c (SECTION_NAME): Ditto.
>   * gcc.target/avr/pr71151-7.c (SECTION_NAME): Ditto.
>   * gcc.target/avr/pr71151-8.c (SECTION_NAME): Ditto.



[PATCH] i386: Access external function via GOT slot for -fno-plt

2016-06-23 Thread H.J. Lu
i386 psABI has been updated to clarify that R_386_GOT32X and R_386_GOT32
relocations can be used to access GOT without base register when PIC is
disabled:

https://groups.google.com/forum/#!topic/ia32-abi/awsRSvJOJfs

32-bit x86 assembler and linker from binutils 2.26.1 and 2.27 support

call/jmp *_start@GOT
cmpl $0, bar@GOT

for both normal and IFUNC functions.  We check if 32-bit x86 assembler
and linker have the fix for:

https://sourceware.org/bugzilla/show_bug.cgi?id=20244

before accessing external function via GOT slot for -fno-plt in both PIC
and non-PIC modes.
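
The kind of source this affects can be sketched as follows.  The
function name parse_answer and the choice of strtol as the external
callee are assumptions made for illustration; they are not taken from
the new pr66232-*.c tests.  Going by the ChangeLog entry for
ix86_output_call_insn, a 32-bit non-PIC build with -fno-plt and a
binutils that has the fix would be expected to call the external
function indirectly through its GOT slot instead of through a PLT stub.
The sketch behaves the same either way.

```c
#include <stdlib.h>

/* strtol is defined in the C library, so from this translation unit it
   is an external function and the call below is a candidate for the
   GOT-slot access described above.  */
int parse_answer(const char *text)
{
    return (int) strtol(text, NULL, 10);
}
```

To inspect the effect, one could compile this with something like
"gcc -m32 -O2 -fno-plt -S" and look at how the call is emitted; the
exact output depends on the compiler and binutils versions in use.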

Tested on i686 and x86-64.  OK for trunk?


H.J.

PR target/66232
PR target/67400
* configure.ac (HAVE_LD_IX86_GOT32X_RELOC): New.  Defined to 1
if 32-bit assembler and linker support "jmp *_start@GOT" and
"movl $0, bar@GOT".  Otherwise, defined to 0.
* config.in: Regenerated.
* configure: Likewise.
* config/i386/i386.c (ix86_force_load_from_GOT_p): Return
true if HAVE_LD_IX86_GOT32X_RELOC is 1 in 32-bit mode.
(ix86_legitimate_address_p): Allow UNSPEC_GOT for -fno-plt
if ix86_force_load_from_GOT_p returns true.
(ix86_print_operand_address_as): Also support UNSPEC_GOT if
ix86_force_load_from_GOT_p returns true.
(ix86_expand_move): Generate UNSPEC_GOT in 32-bit mode to load
the external function address via the GOT slot.
(ix86_nopic_noplt_attribute_p): Check HAVE_LD_IX86_GOT32X_RELOC
== 0 before returning false in 32-bit mode.
(ix86_output_call_insn): Generate "%!jmp/call\t*%p0@GOT" in
32-bit mode if ix86_nopic_noplt_attribute_p returns true.

gcc/testsuite/

PR target/66232
PR target/67400
* gcc.target/i386/pr66232-14.c: New file.
* gcc.target/i386/pr66232-15.c: Likewise.
* gcc.target/i386/pr66232-16.c: Likewise.
* gcc.target/i386/pr66232-17.c: Likewise.
* gcc.target/i386/pr67400-1.c: Don't disable for ia32.  Scan for
ia32 and if R_386_GOT32X relocation is supported.
* gcc.target/i386/pr67400-2.c: Likewise.
* gcc.target/i386/pr67400-3.c: Likewise.
* gcc.target/i386/pr67400-4.c: Likewise.
* gcc.target/i386/pr67400-6.c: Likewise.
* gcc.target/i386/pr67400-7.c: Likewise.
* lib/target-supports.exp (check_effective_target_got32x_reloc):
New.
---
 gcc/config.in  |  9 +-
 gcc/config/i386/i386.c | 35 
 gcc/configure  | 50 +
 gcc/configure.ac   | 42 
 gcc/testsuite/gcc.target/i386/pr66232-14.c | 13 
 gcc/testsuite/gcc.target/i386/pr66232-15.c | 14 
 gcc/testsuite/gcc.target/i386/pr66232-16.c | 13 
 gcc/testsuite/gcc.target/i386/pr66232-17.c | 13 
 gcc/testsuite/gcc.target/i386/pr67400-1.c  |  8 +++--
 gcc/testsuite/gcc.target/i386/pr67400-2.c  |  8 +++--
 gcc/testsuite/gcc.target/i386/pr67400-3.c  |  3 +-
 gcc/testsuite/gcc.target/i386/pr67400-4.c  |  5 +--
 gcc/testsuite/gcc.target/i386/pr67400-6.c  |  8 +++--
 gcc/testsuite/gcc.target/i386/pr67400-7.c  |  6 ++--
 gcc/testsuite/lib/target-supports.exp  | 51 ++
 15 files changed, 256 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr66232-17.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9c7b015..a2dcf36 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -15125,7 +15125,8 @@ darwin_local_data_pic (rtx disp)
 bool
 ix86_force_load_from_GOT_p (rtx x)
 {
-  return (TARGET_64BIT && !TARGET_PECOFF && !TARGET_MACHO
+  return ((TARGET_64BIT || HAVE_LD_IX86_GOT32X_RELOC)
+ && !TARGET_PECOFF && !TARGET_MACHO
  && !flag_plt && !flag_pic
  && ix86_cmodel != CM_LARGE
  && GET_CODE (x) == SYMBOL_REF
@@ -15606,6 +15607,14 @@ ix86_legitimate_address_p (machine_mode, rtx addr, bool strict)
 used.  While ABI specify also 32bit relocations, we don't produce
 them at all and use IP relative instead.  */
  case UNSPEC_GOT:
+   gcc_assert (flag_pic
+   || ix86_force_load_from_GOT_p (XVECEXP (XEXP (disp, 0), 
0, 0)));
+   if (!TARGET_64BIT)
+ goto is_legitimate_pic;
+
+   /* 64bit address unspec.  */
+   return false;
+
  case UNSPEC_GOTOFF:
gcc_assert (flag_pic);
if (!TARGET_64BIT)
@@ -18194,7 +18203,8 @@ ix86_print_operand_address_as (FILE *file, rtx addr,
   /* Load the external function address via the GOT slot to avoid PLT.  */
   else if (GET_CODE (disp) == CONST
   && 

Re: [PATCH, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-23 Thread Joseph Myers
On Wed, 22 Jun 2016, Bill Schmidt wrote:

> I understand that this is what we want for GCC 7.  My current concern is to
> get my patch included in GCC 6.2, where I can't be polluting common code.
> To get it accepted there, I first need this code approved in mainline.  So I
> am quite willing to move to the architecture-independent ones later, but
> for now I don't see that I have any choice but to seek approval for the
> purely arch-dependent one.

I don't think it's sensible to choose implementation approaches on 
mainline based on possible backports.

It seems clear to me that the architecture-independent approach is the 
right one for mainline.  This may mean the backport has some 
architecture-specific code that's not on mainline (along with code, to 
optimize copysign/fabs expansion, that may be relevant in both places), 
but that shouldn't influence the choice of how to do things on mainline.

It looks rather like this proposed code would not in fact result in 
__builtin_inff128 or __builtin_huge_valf128 suitable for glibc use - 
there's no folding support for them so they wouldn't be usable in static 
initializers.  (This can be worked around, as would be necessary anyway 
for __float128 support in glibc for x86_64 with existing GCC releases - 
some internal header would define __builtin_inff128() to ((__float128) 
__builtin_inf ()) for older compilers.)
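
A minimal sketch of that workaround, under stated assumptions: the macro
name COMPAT_INFF128, the typedef quad_t and the helper function are
invented for this example, __SIZEOF_FLOAT128__ is used as the test for
__float128 being available, and long double is substituted otherwise so
the sketch still builds.  The point is that the cast of __builtin_inf ()
folds to a constant and is therefore usable in a static initializer.

```c
#ifdef __SIZEOF_FLOAT128__
typedef __float128 quad_t;
#else
typedef long double quad_t;   /* fallback when __float128 is unavailable */
#endif

/* Hypothetical compatibility macro for compilers that lack a foldable
   __builtin_inff128 ().  */
#define COMPAT_INFF128() ((quad_t) __builtin_inf ())

/* File-scope initializer: only valid if the expression is a constant.  */
static const quad_t quad_infinity = COMPAT_INFF128 ();

/* Nonzero when the stored value behaves like positive infinity.  */
int quad_infinity_is_infinite(void)
{
    const quad_t huge = 1.0e300;
    return quad_infinity > huge && quad_infinity + 1 == quad_infinity;
}
```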

Until the glibc support is actually tested, reviewed and fully functional, 
it would not surprise me at all if there are other back-end bugs in GCC 6 
that need fixing for it to be usable to build or use such support in 
glibc, or other features needed that we haven't realised are needed.

> The fact that I hook this built-in directly to a pattern named infkf1
> doesn't seem to preclude anything you suggest.  I named it this way
> on the off-chance that inf<mode>1 becomes a standard pattern in the
> future, in which case I want to generate this constant.  We can 
> always use gen_infkf1 to reuse this code in any other context.  I'm
> not understanding your objection.

That expander pattern is not useful given a target-independent built-in 
__builtin_inff128, since it will never be used except by a built-in 
function specifically associated with it.

I don't know what code will be generated for a use of _Float128 infinity, 
from the target-independent code - or, right now, for a use of 
(__float128) __builtin_inf ().  But if it's not the code you want, any 
reasonable fix would not be restricted to the case where __builtin_inff128 
() is used - it would work equally well for any case where that constant 
bit-pattern is wanted in VSX registers.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Do not emit SAVE_EXPR for already assigned SSA_NAMEs (PR71606).

2016-06-23 Thread Eric Botcazou
> The gimplifier has been changed recently to use anonymous SSA_NAMEs instead
> of temporary decls.

But the PR is a regression present since GCC 4.7...

> And the gimplifier uses save_expr (which is a gimplifier function BTW) on
> both not gimplified at all as well as partially gimplified trees.

Are you confounding it with something else?  Because save_expr is definitely 
not a gimplifier function, it's mostly used to build GENERIC trees.  That 
being said, I can imagine it being invoked from the gimplifier, but I'd like 
to see the backtrace.

-- 
Eric Botcazou


Re: [patch,avr]: ad PR71151: Make test cases pass on smaller targets.

2016-06-23 Thread Georg-Johann Lay

On 22.06.2016 19:06, Mike Stump wrote:
> On Jun 22, 2016, at 7:21 AM, Georg-Johann Lay wrote:
>>
>> Some tests for PR71151 assume that the target MCU has a 3-byte PC.  The
>> tests are failing because the simulator (avrtest) refuses to load the
>> respective executables if .text exceeds 128KiB, e.g. for -mmcu=atmega128
>> which has only flash of 128KiB and only a 2-byte PC.
>>
>> Hence the tests have to be skipped if the target MCU has no 3-byte PC,
>> hence a new dg-require-effective-target proc supporting "avr_3byte_pc".
>>
>> I added the new proc right after the last check_effective_target_arm_***
>> so that the test is in ASCII collating order.
>>
>> Ok for trunk and v6?
>
> No.  Please see target-utils.exp and ensure that the tools generate a
> stylized message and then add support for that to target-utils.exp.  If you
> are using binutils, the text should go into a memory segment that will fill

Binutils don't produce a message so there is nothing to scan for.  Hacking
binutils is beyond my scope.

> when it is too large.  When it does, then binutils will generate one of the
> messages already handled, then you're done.


avrtest behaves just as if the program under test would call abort.  There are 
at least 2 other AVR simulators; dunno how they would handle the situation.


I don't see how an a-posteriori test could be independent of simulator, 
independent of board descriptions and all that stuff.


The tests in question don't fail because the program is too big as a result of 
some missed optimization; some code is deliberately placed across a 64KiB or 
128KiB boundary or beyond 128KiB.  All this is known a priori.


Hence dropping the original patch and proposing a new one that doesn't need 
extensions to lib.


The new tests just won't put any code at places where we know in advance some 
simulator might barf.  As the compiler has no idea of exact flash size, the 
relevant flash property is deduced from ISA properties.
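
How the consuming side might look can be sketched as follows.  The
SECTION_NAME selection mirrors the pr71151-3.c hunk of the patch; the
body of foo is a hypothetical stand-in, since the contents of
pr71151-common.h are not shown in this message.

```c
/* Same selection as in the pr71151-3.c hunk: use the specially placed
   section only when the ISA has ELPM, i.e. flash beyond 64 KiB.  */
#ifdef __AVR_HAVE_ELPM__
#define SECTION_NAME ".foo"
#else
#define SECTION_NAME ".text.foo"
#endif

/* Hypothetical stand-in for the function in pr71151-common.h: a dense
   switch, so that a jump table is a plausible code generation choice.  */
__attribute__((section(SECTION_NAME), noinline))
int foo(int selector)
{
    switch (selector) {
    case 0: return 11;
    case 1: return 23;
    case 2: return 37;
    case 3: return 41;
    case 4: return 53;
    case 5: return 67;
    case 6: return 79;
    default: return -1;
    }
}
```

On devices without ELPM the function then lands in an ordinary
.text.* section, so the --section-start option in dg-options has
nothing to relocate.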


Is this one ok?

Johann


gcc/testsuite/
PR target/71151
* gcc.target/avr/pr71151-common.h (foo): Use macro SECTION_NAME
instead of ".foo" for its section name.
* gcc.target/avr/pr71151-2.c (SECTION_NAME): Define appropriately
depending on MCU's flash size.
* gcc.target/avr/pr71151-3.c (SECTION_NAME): Ditto.
* gcc.target/avr/pr71151-4.c (SECTION_NAME): Ditto.
* gcc.target/avr/pr71151-5.c (SECTION_NAME): Ditto.
* gcc.target/avr/pr71151-6.c (SECTION_NAME): Ditto.
* gcc.target/avr/pr71151-7.c (SECTION_NAME): Ditto.
* gcc.target/avr/pr71151-8.c (SECTION_NAME): Ditto.

Index: gcc.target/avr/pr71151-2.c
===
--- gcc.target/avr/pr71151-2.c	(revision 237587)
+++ gcc.target/avr/pr71151-2.c	(working copy)
@@ -5,6 +5,8 @@
flash address for loading jump table entry, 2 byte entry, after
removing the special section placement hook. */
 
+#define SECTION_NAME ".foo"
+
 #include "exit-abort.h"
 #include "pr71151-common.h"
 
Index: gcc.target/avr/pr71151-3.c
===
--- gcc.target/avr/pr71151-3.c	(revision 237587)
+++ gcc.target/avr/pr71151-3.c	(working copy)
@@ -1,10 +1,17 @@
 /* { dg-do run } */
 /* { dg-options "-Os -fno-tree-switch-conversion -ffunction-sections -mno-relax -fdata-sections -Wl,--section-start=.foo=0x1" } */
 
+#ifdef __AVR_HAVE_ELPM__
 /* Make sure jumptables work properly if placed above 64 KB and below 128 KB,
i.e. 3 byte flash address for loading jump table entry and 2 byte jump table
entry, with relaxation disabled, after removing the special section
placement hook. */
+#define SECTION_NAME ".foo"
+#else
+/* No special jump table placement so that avrtest won't abort
+   for, e.g. ATmega64.  */
+#define SECTION_NAME ".text.foo"
+#endif
 
 #include "exit-abort.h"
 #include "pr71151-common.h"
Index: gcc.target/avr/pr71151-4.c
===
--- gcc.target/avr/pr71151-4.c	(revision 237587)
+++ gcc.target/avr/pr71151-4.c	(working copy)
@@ -1,10 +1,17 @@
 /* { dg-do run } */
 /* { dg-options "-Os -fno-tree-switch-conversion -ffunction-sections -fdata-sections -mrelax -Wl,--section-start=.foo=0x1" } */
 
+#ifdef __AVR_HAVE_ELPM__
 /* Make sure jumptables work properly if placed above 64 KB and below 128 KB,
i.e. 3 byte flash address for loading jump table entry and 2 byte jump
table entry, with relaxation enabled, after removing the special section
placement hook. */
+#define SECTION_NAME ".foo"
+#else
+/* No special jump table placement so that avrtest won't abort
+   for, e.g. ATmega64.  */
+#define SECTION_NAME ".text.foo"
+#endif
 
 #include "exit-abort.h"
 #include "pr71151-common.h"
Index: gcc.target/avr/pr71151-5.c
===
--- gcc.target/avr/pr71151-5.c	(revision 237587)
+++ gcc.target/avr/pr71151-5.c	(working copy)
@@ -1,20 +1,23 @@
 /* { dg-do run } */
 

[PATCH] Fix PR rtl-optimization/71634

2016-06-23 Thread Martin Liška
Hello.

The following patch changes the minimum of ira-max-loops-num to 1.
Having the minimum equal to zero does not make much sense.
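
To make the effect of the new minimum concrete, here is a minimal sketch of a
range check over the (default, min, max) triple of a DEFPARAM entry. It is not
the code from params.c: param_value_ok is a made-up name, and the rule that a
maximum not greater than the minimum means "no upper bound" is my reading of
the params.def conventions, stated here as an assumption.

```c
/* Hypothetical range check for a --param value.  MIN_VALUE and MAX_VALUE
   mirror the last two numbers of a DEFPARAM entry.  Assumption: a maximum
   that is not greater than the minimum means there is no upper bound.  */
static int
param_value_ok (int min_value, int max_value, int value)
{
  /* Reject anything below the declared minimum.  */
  if (value < min_value)
    return 0;

  /* Only enforce the maximum when it is a real bound.  */
  if (max_value > min_value && value > max_value)
    return 0;

  return 1;
}
```

With the old entry (min 0, max 0) a value of 0 would be accepted; with the
patched entry (min 1, max 0) it would be rejected, while 1 and anything larger
would still pass.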

Is it ready to install once regression testing on x86_64-linux finishes?

Thanks,
Martin
From e72dafdf3a2a7cfaca4a617fd10e80dd7aae1e91 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 23 Jun 2016 12:52:44 +0200
Subject: [PATCH] Fix PR rtl-optimization/71634

gcc/ChangeLog:

2016-06-23  Martin Liska  

	* params.def: Change min of ira-max-loops-num to 1.
---
 gcc/params.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/params.def b/gcc/params.def
index 894b7f3..1273cc9 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -810,7 +810,7 @@ DEFPARAM (PARAM_SCCVN_MAX_ALIAS_QUERIES_PER_ACCESS,
 DEFPARAM (PARAM_IRA_MAX_LOOPS_NUM,
 	  "ira-max-loops-num",
 	  "Max loops number for regional RA.",
-	  100, 0, 0)
+	  100, 1, 0)
 
 DEFPARAM (PARAM_IRA_MAX_CONFLICT_TABLE_SIZE,
 	  "ira-max-conflict-table-size",
-- 
2.8.4



Re: [PATCH] Do not emit SAVE_EXPR for already assigned SSA_NAMEs (PR71606).

2016-06-23 Thread Jakub Jelinek
On Thu, Jun 23, 2016 at 12:41:53PM +0200, Eric Botcazou wrote:
> > This is candidate patch for the PR, which do not create SAVE_EXPR trees for
> > already assigned SSA_NAMEs.
> > 
> > Patch survives reg on x86_64-linux-gnu.
> > 
> > Thoughts?
> 
> This looks like a layering violation, save_expr is a GENERIC thing so invoking
> it on an SSA_NAME is weird.  How does this happen?

The gimplifier has been changed recently to use anonymous SSA_NAMEs instead
of temporary decls.  And the gimplifier uses save_expr (which is a
gimplifier function BTW) on trees that are not gimplified at all as well as on
partially gimplified ones.

Jakub


[PATCH] Fix PR middle-end/71619

2016-06-23 Thread Martin Liška
Hi.

The following patch restores the hunk that was removed in r237103.

The patch bootstraps and passes regression tests on x86_64-linux-gnu.
It has been approved by Honza.
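
For readers who do not have predict.c open, the reason for the guard can be
sketched in isolation. This is only an illustration: exit_probability is a
made-up helper, the RDIV definition is written from memory rather than copied
from the tree, and PROB_BASE stands in for REG_BR_PROB_BASE with an assumed
value of 10000.

```c
/* Rounded integer division; assumed to match GCC's RDIV macro.  */
#define RDIV(X, Y) (((X) + (Y) / 2) / (Y))

/* Stand-in for REG_BR_PROB_BASE (assumed value).  */
#define PROB_BASE 10000

/* Hypothetical helper: exit-edge probability for a loop predicted to
   iterate NITERCST times, or -1 when no prediction should be made.
   Without the zero check, RDIV would divide by zero.  */
static int
exit_probability (int nitercst)
{
  if (nitercst == 0)
    return -1;

  return RDIV (PROB_BASE, nitercst);
}
```

The new test reaches the zero case via --param=max-predicted-iterations=0;
in the sketch that corresponds to calling exit_probability (0).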

Thanks,
Martin
From e31bef9738193edd46fae118074bd7f241f366c2 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 23 Jun 2016 11:03:27 +0200
Subject: [PATCH] Fix PR middle-end/71619

gcc/ChangeLog:

2016-06-23  Martin Liska  

	PR middle-end/71619
	* predict.c (predict_loops): Revert the hunk that was removed
	in r237103.

gcc/testsuite/ChangeLog:

2016-06-23  Martin Liska  

	* gcc.dg/pr71619.c: New test.
---
 gcc/predict.c  |  6 +-
 gcc/testsuite/gcc.dg/pr71619.c | 11 +++
 2 files changed, 16 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr71619.c

diff --git a/gcc/predict.c b/gcc/predict.c
index 470de8a..d505d9c 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -1769,7 +1769,11 @@ predict_loops (void)
 	  else
 	continue;
 
-	  gcc_checking_assert (nitercst);
+	  /* If the prediction for number of iterations is zero, do not
+	 predict the exit edges.  */
+	  if (nitercst == 0)
+	continue;
+
 	  probability = RDIV (REG_BR_PROB_BASE, nitercst);
 	  predict_edge (ex, predictor, probability);
 	}
diff --git a/gcc/testsuite/gcc.dg/pr71619.c b/gcc/testsuite/gcc.dg/pr71619.c
new file mode 100644
index 000..e1404bc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr71619.c
@@ -0,0 +1,11 @@
+/* PR 71619 */
+
+/* { dg-do compile } */
+/* { dg-options "-O --param=max-predicted-iterations=0" } */
+
+void
+foo ()
+{
+  int count = -10;
+  while (count++);
+}
-- 
2.8.4



Re: [PATCH] Do not emit SAVE_EXPR for already assigned SSA_NAMEs (PR71606).

2016-06-23 Thread Eric Botcazou
> This is candidate patch for the PR, which do not create SAVE_EXPR trees for
> already assigned SSA_NAMEs.
> 
> Patch survives reg on x86_64-linux-gnu.
> 
> Thoughts?

This looks like a layering violation, save_expr is a GENERIC thing so invoking 
it on an SSA_NAME is weird.  How does this happen?

-- 
Eric Botcazou


[PATCH, PR71602] Give error for invalid va_list argument to va_arg

2016-06-23 Thread Tom de Vries

Hi,

this patch fixes PR71602, a 6/7 regression.

Consider this test-case:
...
__builtin_va_list *pap;

void
fn1 (void)
{
 __builtin_va_arg(pap, double);
}
...

The testcase is invalid, because we're not passing a va_list as first 
argument of va_arg, but a va_list*.
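
For contrast, a valid way to use a pointer to a va_list is to dereference it
before handing it to va_arg. The following is only a sketch to illustrate that
point; next_double and sum_doubles are made-up names and are not part of the
patch or the new test.

```c
#include <stdarg.h>

/* Hypothetical helper: fetch one double through a pointer to a va_list.
   The pointer is dereferenced, so va_arg sees a va_list, not a va_list*.  */
static double
next_double (va_list *pap)
{
  return va_arg (*pap, double);
}

/* Sum COUNT variadic double arguments.  */
static double
sum_doubles (int count, ...)
{
  va_list ap;
  double total = 0.0;

  va_start (ap, count);
  for (int i = 0; i < count; i++)
    total += next_double (&ap);
  va_end (ap);

  return total;
}
```

Applied to the test case above, the valid spelling would be
__builtin_va_arg (*pap, double) rather than __builtin_va_arg (pap, double).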


When compiling for x86_64 -m64, we run into the second assert in this 
snippet from build_va_arg:

...
{
  /* Case 2b: va_list is pointer to array elem type.  */
  gcc_assert (POINTER_TYPE_P (va_type));
  gcc_assert (TREE_TYPE (va_type) == TREE_TYPE (canon_va_type));

  /* Don't take the address.  We've already got '&ap'.  */
  ;
}
...

At that point, va_type and canon_va_type are:
...
(gdb) call debug_generic_expr (va_type)
struct [1] *
(gdb) call debug_generic_expr (canon_va_type)
struct [1]
...

so TREE_TYPE (va_type) and TREE_TYPE (canon_va_type) are not equal:
...
(gdb) call debug_generic_expr (va_type.typed.type)
struct [1]
(gdb) call debug_generic_expr (canon_va_type.typed.type)
struct
...

Given the semantics of the target hook:
...
Target Hook: tree TARGET_CANONICAL_VA_LIST_TYPE (tree type)

This hook returns the va_list type of the calling convention 
specified by the type of type. If type is not a valid va_list type, it 
returns NULL_TREE.

...
one could argue that canonical_va_list_type should return NULL_TREE for 
a va_list*, which would fix the ICE. But the current implementation 
seems to rely on canonical_va_list_type to return va_list for a va_list* 
argument.


The patch fixes the ICE by making the valid va_list check in 
build_va_arg more precise, by taking into account the non-strict 
behavior of canonical_va_list_type.


Bootstrapped and reg-tested on x86_64 (-m64 and -m32).

OK for trunk?

Thanks,
- Tom
Give error for invalid va_list argument to va_arg

2016-06-22  Tom de Vries  

	PR c/71602
	* c-common.c (build_va_arg): Add comp_types parameter.  Give error for
	invalid va_list argument.
	* c-common.h (build_va_arg): Add comp_types parameter.

	* c-typeck.c (va_list_comptypes): New function.
	(c_build_va_arg): Add argument to build_va_arg call.

	* call.c (build_x_va_arg): Add argument to build_va_arg call.

	* c-c++-common/va-arg-va-list-type.c: New test.

---
 gcc/c-family/c-common.c  | 34 ++--
 gcc/c-family/c-common.h  |  2 +-
 gcc/c/c-typeck.c | 11 +++-
 gcc/cp/call.c|  6 +++--
 gcc/testsuite/c-c++-common/va-arg-va-list-type.c |  9 +++
 5 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 85f3a03..8d0f335 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -5696,7 +5696,8 @@ build_va_arg_1 (location_t loc, tree type, tree op)
va_arg (EXPR, TYPE) at source location LOC.  */
 
 tree
-build_va_arg (location_t loc, tree expr, tree type)
+build_va_arg (location_t loc, tree expr, tree type,
+	  bool (*comp_types) (tree, tree))
 {
   tree va_type = TREE_TYPE (expr);
   tree canon_va_type = (va_type == error_mark_node
@@ -5712,6 +5713,14 @@ build_va_arg (location_t loc, tree expr, tree type)
   return build_va_arg_1 (loc, type, expr);
 }
 
+  bool valid_va_list
+= ((TREE_CODE (canon_va_type) == ARRAY_TYPE
+	&& TREE_CODE (va_type) == POINTER_TYPE)
+   ? comp_types (TREE_TYPE (canon_va_type), TREE_TYPE (va_type))
+   : comp_types (canon_va_type, va_type));
+  if (!valid_va_list)
+goto fail_first_arg_type;
+
   if (TREE_CODE (canon_va_type) != ARRAY_TYPE)
 {
   /* Case 1: Not an array type.  */
@@ -5724,11 +5733,7 @@ build_va_arg (location_t loc, tree expr, tree type)
   tree canon_expr_type
 	= targetm.canonical_va_list_type (TREE_TYPE (expr));
   if (canon_expr_type == NULL_TREE)
-	{
-	  error_at (loc,
-		"first argument to %<va_arg%> not of type %<va_list%>");
-	  return error_mark_node;
-	}
+	goto fail_first_arg_type;
 
   return build_va_arg_1 (loc, type, expr);
 }
@@ -5797,23 +5802,24 @@ build_va_arg (location_t loc, tree expr, tree type)
   tree canon_expr_type
 	= targetm.canonical_va_list_type (TREE_TYPE (expr));
   if (canon_expr_type == NULL_TREE)
-	{
-	  error_at (loc,
-		"first argument to %<va_arg%> not of type %<va_list%>");
-	  return error_mark_node;
-	}
+	goto fail_first_arg_type;
 }
-  else
+  else if (TREE_CODE (va_type) == POINTER_TYPE)
 {
   /* Case 2b: va_list is pointer to array elem type.  */
-  gcc_assert (POINTER_TYPE_P (va_type));
-  gcc_assert (TREE_TYPE (va_type) == TREE_TYPE (canon_va_type));
 
   /* Don't take the address.  We've already got '&ap'.  */
   ;
 }
+  else
+goto fail_first_arg_type;
 
   return build_va_arg_1 (loc, type, expr);
+
+ fail_first_arg_type:
+  error_at (loc,
+	"first argument to %<va_arg%> not of type %<va_list%>");
+  return error_mark_node;
 }
 
 
diff --git 

Re: [PATCH, i386] Add native detection for VIA C7 and Eden CPUs

2016-06-23 Thread Uros Bizjak
On Thu, Jun 23, 2016 at 10:26 AM, J. Mayer  wrote:
> The following patch adds native detection for C7, Eden "Esther" and
> Eden "Nehemiah" VIA CPUs.
>
> Please CC me to any comment / review / change request.
>
> ---
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 14b8030..55afd8b 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,10 @@
> +2016-06-21  Jocelyn Mayer  
> +
> +   * config/i386/driver-i386.c (host_detect_local_cpu): Set
> +   PROCESSOR_PENTIUMPRO for signature_CENTAUR_ebx family >= 9.
> +   : Pass c7 or nehemiah for
> +   signature_CENTAUR_ebx.

The patch is OK for mainline.

Thanks,
Uros.

>  2016-06-21  Jakub Jelinek  
>
> PR tree-optimization/71588
> diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-
> i386.c
> index 9f44ee8..22a8f28 100644
> --- a/gcc/config/i386/driver-i386.c
> +++ b/gcc/config/i386/driver-i386.c
> @@ -653,10 +653,7 @@ const char *host_detect_local_cpu (int argc, const
> char **argv)
> case 6:
>   if (has_longmode)
> processor = PROCESSOR_K8;
> - else if (model > 9)
> -   /* Use the default detection procedure.  */
> -   ;
> - else if (model == 9)
> + else if (model >= 9)
> processor = PROCESSOR_PENTIUMPRO;
>   else if (model >= 6)
> processor = PROCESSOR_I486;
> @@ -818,15 +815,27 @@ const char *host_detect_local_cpu (int argc,
> const char **argv)
>as all the CPUs below are 32-bit only.  */
> cpu = "x86-64";
>   else if (has_sse3)
> -   /* It is Core Duo.  */
> -   cpu = "pentium-m";
> +   {
> + if (vendor == signature_CENTAUR_ebx)
> +   /* C7 / Eden "Esther" */
> +   cpu = "c7";
> + else
> +   /* It is Core Duo.  */
> +   cpu = "pentium-m";
> +   }
>   else if (has_sse2)
> /* It is Pentium M.  */
> cpu = "pentium-m";
>   else if (has_sse)
> {
>   if (vendor == signature_CENTAUR_ebx)
> -   cpu = "c3-2";
> +   {
> + if (model >= 9)
> +   /* Eden "Nehemiah" */
> +   cpu = "nehemiah";
> + else
> +   cpu = "c3-2";
> +   }
>   else
> /* It is Pentium III.  */
> cpu = "pentium3";
>


Re: [PATCH 2/2][v3] Drop excess size used for run time allocated stack variables.

2016-06-23 Thread Dominik Vogt
On Wed, Jun 22, 2016 at 10:24:02PM -0600, Jeff Law wrote:
> On 05/25/2016 07:32 AM, Dominik Vogt wrote:
> >On Wed, May 25, 2016 at 02:30:54PM +0100, Dominik Vogt wrote:
> >>> On Tue, May 03, 2016 at 03:17:53PM +0100, Dominik Vogt wrote:
>  > Version two of the patch including a test case.
>  >
>  > On Mon, May 02, 2016 at 09:10:25AM -0600, Jeff Law wrote:
> > > > On 04/29/2016 04:12 PM, Dominik Vogt wrote:
> >> > > >The attached patch removes excess stack space allocation with
> >> > > >alloca in some situations.  Please check the commit message in the
> >> > > >patch for details.
>  >
> > > > However, I would strongly recommend some tests, even if they are
> > > > target specific.  You can always copy pr36728-1 into the s390x
> > > > directory and look at size of the generated stack.  Similarly for
> > > > pr50938 for x86.
>  >
>  > However, x86 uses the "else" branch in round_push, i.e. it uses
>  > "virtual_preferred_stack_boundary_rtx" to calculate the number of
>  > bytes to add for stack alignment.  That value is unknown at the
>  > time round_push is called, so the test case fails on such targets,
>  > and I've no idea how to fix this properly.
> >>>
> >>> Third version of the patch with the suggested cleanup in the first
> >>> patch and the functional stuff in the second one.  The first patch
> >>> is based on Jeff's draft with the change suggested by Eric and
> >>> more cleanup added by me.
> >This is the updated functional patch.  Re-tested with limited
> >effort, i.e. tested and bootstrapped on s390x biarch (but did not
> >look for performance regressions compared to version 2 of the
> >patch).
> >
> >Ciao
> >
> >Dominik ^_^  ^_^
> >
> >-- Dominik Vogt IBM Germany
> >
> >
> >0002-v3-ChangeLog
> >
> >
> >gcc/ChangeLog
> >
> > * explow.c (round_push): Use known adjustment.
> > (allocate_dynamic_stack_space): Pass known adjustment to round_push.
> >gcc/testsuite/ChangeLog
> >
> > * gcc.dg/pr50938.c: New test.
> >
> >
> >0002-v3-Drop-excess-size-used-for-run-time-allocated-stack-v.patch
> >
> >
> >From 4296d353e1d153b5b5ee435a44cae6117bf2fff0 Mon Sep 17 00:00:00 2001
> >From: Dominik Vogt 
> >Date: Fri, 29 Apr 2016 08:36:59 +0100
> >Subject: [PATCH 2/2] Drop excess size used for run time allocated stack
> > variables.
> >
> >The present calculation sometimes led to more stack memory being used than
> >necessary with alloca.  First, (STACK_BOUNDARY -1) would be added to the
> >allocated size:
> >
> >  size = plus_constant (Pmode, size, extra);
> >  size = force_operand (size, NULL_RTX);
> >
> >Then round_push was called and added another (STACK_BOUNDARY - 1) before
> >rounding down to a multiple of STACK_BOUNDARY.  On s390x this resulted in
> >adding 14 before rounding down for "x" in the test case pr36728-1.c.
> >
> >round_push() now takes an argument to inform it about what has already been
> >added to size.
> >---
> > gcc/explow.c   | 45 +---
> > gcc/testsuite/gcc.dg/pr50938.c | 52 
> > ++
> > 2 files changed, 79 insertions(+), 18 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.dg/pr50938.c
> >
> >diff --git a/gcc/explow.c b/gcc/explow.c
> >index 09a0330..85596e2 100644
> >--- a/gcc/explow.c
> >+++ b/gcc/explow.c
> >@@ -949,24 +949,30 @@ anti_adjust_stack (rtx adjust)
> > }
> >
> > /* Round the size of a block to be pushed up to the boundary required
> >-   by this machine.  SIZE is the desired size, which need not be constant.  
> >*/
> >+   by this machine.  SIZE is the desired size, which need not be constant.
> >+   ALREADY_ADDED is the number of units that have already been added to SIZE
> >+   for other alignment reasons.
> >+*/
> >
> > static rtx
> >-round_push (rtx size)
> >+round_push (rtx size, int already_added)
> > {
> >-  rtx align_rtx, alignm1_rtx;
> >+  rtx align_rtx, add_rtx;
> >
> >   if (!SUPPORTS_STACK_ALIGNMENT
> >   || crtl->preferred_stack_boundary == MAX_SUPPORTED_STACK_ALIGNMENT)
> > {
> >   int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
> >+  int add;
> >
> >   if (align == 1)
> > return size;
> >
> >+  add = (align > already_added) ? align - already_added - 1 : 0;
> >+
> >   if (CONST_INT_P (size))
> > {
> >-  HOST_WIDE_INT new_size = (INTVAL (size) + align - 1) / align * align;
> >+  HOST_WIDE_INT new_size = (INTVAL (size) + add) / align * align;
> >
> >   if (INTVAL (size) != new_size)
> > size = GEN_INT (new_size);
> So presumably the idea here is when the requested SIZE would require
> allocating additional space to first see if the necessary space is
> already available inside ALREADY_ADDED

Yes.

> and use that rather than rounding size up to an alignment boundary.

Not exactly.  Consider the unpatched code.  At the beginning we
have some amount of space to be allocated on the stack at runtime

Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-06-23 Thread Ilya Enkovich
On 22 Jun 11:42, Jeff Law wrote:
> On 06/22/2016 10:09 AM, Ilya Enkovich wrote:
> 
> >>Given the common structure & duplication I can't help but wonder if a single
> >>function should be used for widening/narrowing.  Ultimately can't you swap
> >>mask_elems/req_elems and always go narrower to wider (using a different
> >>optab for the two different cases)?
> >
> >I think we can't always go in narrower to wider direction because widening
> >uses two optabs and also because of the way insn_data is checked.
> OK.  Thanks for considering.
> 
> >>
> >>I'm guessing Richi's comment about what tree type you're looking at refers
> >>to this and similar instances.  Doesn't this give you the type of the number
> >>of iterations rather than the type of the iteration variable itself?
> >>
> >>
> >
> >Since I build the vector IV myself and compare it with NITERS, I feel it's
> >safe to use the type of NITERS.  Do you expect the NITERS and IV types to differ?
> Since you're comparing to NITERS, it sounds like you've got it right and
> that Richi and I have it wrong.
> 
> It's less a question of whether or not we expect NITERS and IV to have
> different types, but more a realization that there's nothing that inherently
> says they have to be the same.  They probably are the same most of the time,
> but I don't think that's something we can or should necessarily depend on.
> 
> 
> 
> >>>@@ -1791,6 +1870,20 @@ vectorizable_mask_load_store (gimple *stmt,
> >>>gimple_stmt_iterator *gsi,
> >>>   && !useless_type_conversion_p (vectype, rhs_vectype)))
> >>> return false;
> >>>
> >>>+  if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> >>>+{
> >>>+  /* Check that mask conjuction is supported.  */
> >>>+  optab tab;
> >>>+  tab = optab_for_tree_code (BIT_AND_EXPR, vectype, optab_default);
> >>>+  if (!tab || optab_handler (tab, TYPE_MODE (vectype)) ==
> >>>CODE_FOR_nothing)
> >>>+   {
> >>>+ if (dump_enabled_p ())
> >>>+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >>>+"cannot be masked: unsupported mask
> >>>operation\n");
> >>>+ LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> >>>+   }
> >>>+}
> >>
> >>Should the optab querying be in optab-query.c?
> >
> >We always directly call optab_handler for simple operations.  There are
> >dozens of such calls in vectorizer.
> OK.  I would look favorably on a change to move those queries out into
> optabs-query as a separate patch.
> 
> >
> >We don't embed masking capabilities into vectorizer.
> >
> >Actually we don't depend on masking capabilities so much.  We have to mask
> >loads and stores and use can_mask_load_store for that which uses existing
> >optab query.  We also require masking for reductions and use VEC_COND for that
> >(and use existing expand_vec_cond_expr_p).  Other checks are to check if we
> >can build required masks.  So we actually don't expose any new processor
> >masking capabilities to GIMPLE.  I.e. all this works on targets with no
> >rich masking capabilities.  E.g. we can mask loops for quite old SSE targets.
> OK.  I think the key here is that load/store masking already exists and the
> others are either VEC_COND or checking if we can build the mask rather than
> can the operation be masked.  Thanks for clarifying.
> jeff
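
To illustrate the scheme discussed above (a mask obtained by comparing the
vector IV with NITERS, masked loads, and a VEC_COND-style select for the
reduction), here is a scalar emulation. It is a sketch only: VF and masked_sum
are names made up for the illustration, and the code models the idea rather
than anything the vectorizer actually emits.

```c
#define VF 4  /* Assumed vectorization factor for the illustration.  */

/* Scalar emulation of a masked vector reduction.  Each "vector" iteration
   handles VF lanes; a lane is active only while its IV value is below
   NITERS, and inactive lanes contribute 0, as a VEC_COND select would.  */
static int
masked_sum (const int *a, unsigned int niters)
{
  int acc[VF] = { 0 };

  for (unsigned int i = 0; i < niters; i += VF)
    for (unsigned int lane = 0; lane < VF; lane++)
      {
        int mask = (i + lane) < niters;    /* IV compared with NITERS.  */
        int val = mask ? a[i + lane] : 0;  /* Masked load: no access past the end.  */
        acc[lane] += mask ? val : 0;       /* Masked reduction step.  */
      }

  /* Final horizontal reduction over the lanes.  */
  int sum = 0;
  for (unsigned int lane = 0; lane < VF; lane++)
    sum += acc[lane];
  return sum;
}
```

Because the tail iteration is handled by the mask, no separate scalar epilogue
is needed in this model, even when NITERS is not a multiple of VF.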

Here is an updated version with fewer typos and more comments.

Thanks,
Ilya
--
gcc/

2016-05-23  Ilya Enkovich  

* tree-vect-loop.c: Include insn-config.h and recog.h.
(vect_check_required_masks_widening): New.
(vect_check_required_masks_narrowing): New.
(vect_get_masking_iv_elems): New.
(vect_get_masking_iv_type): New.
(vect_get_extreme_masks): New.
(vect_check_required_masks): New.
(vect_analyze_loop_operations): Add vect_check_required_masks
call to compute LOOP_VINFO_CAN_BE_MASKED.
(vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and
LOOP_VINFO_NEED_MASKING before starting over.
(vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and
masking cost.
* tree-vect-stmts.c (can_mask_load_store): New.
(vect_model_load_masking_cost): New.
(vect_model_store_masking_cost): New.
(vect_model_simple_masking_cost): New.
(vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED
and masking cost.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vect_stmt_should_be_masked_for_epilogue): New.
(vect_add_required_mask_for_stmt): New.
(vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED.
* tree-vectorizer.h (vect_model_load_masking_cost): New.
(vect_model_store_masking_cost): New.
(vect_model_simple_masking_cost): New.


diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c75d234..3b50168 100644
--- 

[PATCH/AARCH64] Add rtx_costs routine for vulcan.

2016-06-23 Thread Virendra Pathak
Hi gcc-patches group,

Please find the patch adding an rtx_costs routine for the vulcan CPU.

Tested by building a cross aarch64-linux-gcc, bootstrapping natively on
aarch64-unknown-linux-gnu, and running make check (gcc). This patch
introduces no new regression failures.

Kindly review and merge the patch to trunk if it is okay.
Thanks.


gcc/ChangeLog:

Virendra Pathak  

* config/aarch64/aarch64-cores.def: Update vulcan COSTS.
* config/aarch64/aarch64-cost-tables.h
(vulcan_extra_costs): New variable.
* config/aarch64/aarch64.c
(vulcan_addrcost_table): Likewise.
(vulcan_regmove_cost): Likewise.
(vulcan_vector_cost): Likewise.
(vulcan_branch_cost): Likewise.
(vulcan_tunings): Likewise.




with regards,
Virendra Pathak
From 602d25a25b69b7615e52b03bbaa28919a08b6bd0 Mon Sep 17 00:00:00 2001
From: Virendra Pathak 
Date: Tue, 21 Jun 2016 01:44:38 -0700
Subject: [PATCH] AArch64: Add rtx_costs routine for vulcan.

---
 gcc/config/aarch64/aarch64-cores.def |   2 +-
 gcc/config/aarch64/aarch64-cost-tables.h | 102 +++
 gcc/config/aarch64/aarch64.c |  75 +++
 3 files changed, 178 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index ced8f94..f29d25a 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -51,7 +51,7 @@ AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  
AARCH64_FL_FOR_ARCH8, xge
 
 /* V8.1 Architecture Processors.  */
 
-AARCH64_CORE("vulcan",  vulcan, cortexa57, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | 
AARCH64_FL_CRYPTO, cortexa57, "0x42", "0x516")
+AARCH64_CORE("vulcan",  vulcan, cortexa57, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | 
AARCH64_FL_CRYPTO, vulcan, "0x42", "0x516")
 
 /* V8 big.LITTLE implementations.  */
 
diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
b/gcc/config/aarch64/aarch64-cost-tables.h
index 3a3f519..12ad52e 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -127,6 +127,108 @@ const struct cpu_cost_table thunderx_extra_costs =
   }
 };
 
+const struct cpu_cost_table vulcan_extra_costs =
+{
+  /* ALU */
+  {
+0, /* Arith.  */
+0, /* Logical.  */
+0, /* Shift.  */
+0, /* Shift_reg.  */
+COSTS_N_INSNS (1), /* Arith_shift.  */
+COSTS_N_INSNS (1), /* Arith_shift_reg.  */
+COSTS_N_INSNS (1), /* Log_shift.  */
+COSTS_N_INSNS (1), /* Log_shift_reg.  */
+0, /* Extend.  */
+COSTS_N_INSNS (1), /* Extend_arith.  */
+0, /* Bfi.  */
+0, /* Bfx.  */
+COSTS_N_INSNS (3), /* Clz.  */
+0, /* Rev.  */
+0, /* Non_exec.  */
+true   /* Non_exec_costs_exec.  */
+  },
+  {
+/* MULT SImode */
+{
+  COSTS_N_INSNS (4),   /* Simple.  */
+  COSTS_N_INSNS (4),   /* Flag_setting.  */
+  COSTS_N_INSNS (4),   /* Extend.  */
+  COSTS_N_INSNS (5),   /* Add.  */
+  COSTS_N_INSNS (5),   /* Extend_add.  */
+  COSTS_N_INSNS (18)   /* Idiv.  */
+},
+/* MULT DImode */
+{
+  COSTS_N_INSNS (4),   /* Simple.  */
+  0,   /* Flag_setting.  */
+  COSTS_N_INSNS (4),   /* Extend.  */
+  COSTS_N_INSNS (5),   /* Add.  */
+  COSTS_N_INSNS (5),   /* Extend_add.  */
+  COSTS_N_INSNS (26)   /* Idiv.  */
+}
+  },
+  /* LD/ST */
+  {
+COSTS_N_INSNS (4), /* Load.  */
+COSTS_N_INSNS (4), /* Load_sign_extend.  */
+COSTS_N_INSNS (5), /* Ldrd.  */
+COSTS_N_INSNS (4), /* Ldm_1st.  */
+1, /* Ldm_regs_per_insn_1st.  */
+1, /* Ldm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (4), /* Loadf.  */
+COSTS_N_INSNS (4), /* Loadd.  */
+COSTS_N_INSNS (4), /* Load_unaligned.  */
+0, /* Store.  */
+0, /* Strd.  */
+0, /* Stm_1st.  */
+1, /* Stm_regs_per_insn_1st.  */
+1, /* Stm_regs_per_insn_subsequent.  */
+0, /* Storef.  */
+0, /* Stored.  */
+0, /* Store_unaligned.  */
+COSTS_N_INSNS (1), /* Loadv.  */
+COSTS_N_INSNS (1)  /* Storev.  */
+  },
+  {
+/* FP SFmode */
+{
+  COSTS_N_INSNS (16),  /* Div.  */
+  COSTS_N_INSNS (6),   /* Mult.  */
+  COSTS_N_INSNS (6),   /* Mult_addsub. */
+  COSTS_N_INSNS (6),   /* Fma.  */
+  COSTS_N_INSNS (6),   /* Addsub.  */
+  COSTS_N_INSNS (5),   /* Fpconst. */
+  COSTS_N_INSNS (5),   /* Neg.  */
+  COSTS_N_INSNS (5),   /* Compare.  */
+  COSTS_N_INSNS (7),   /* Widen.  */
+  COSTS_N_INSNS (7),   /* Narrow.  

[PATCH, i386] Add native detection for VIA C7 and Eden CPUs

2016-06-23 Thread J. Mayer
The following patch adds native detection for C7, Eden "Esther" and
Eden "Nehemiah" VIA CPUs.
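
As a rough summary of the selection the second hunk ends up with, here is a
condensed sketch. It is not the driver code: pick_cpu is a made-up function,
is_centaur stands for vendor == signature_CENTAUR_ebx, the long-mode check that
precedes these branches is left out, and the final "unknown" fallback is a
placeholder for the branches that the hunk does not show.

```c
/* Hypothetical condensation of the -march=native name selection touched by
   the patch.  Only the branches visible in the hunk are modelled.  */
static const char *
pick_cpu (int is_centaur, int has_sse3, int has_sse2, int has_sse, int model)
{
  if (has_sse3)
    /* C7 / Eden "Esther" on VIA, Core Duo otherwise.  */
    return is_centaur ? "c7" : "pentium-m";

  if (has_sse2)
    /* Pentium M.  */
    return "pentium-m";

  if (has_sse)
    {
      if (is_centaur)
        /* Eden "Nehemiah" from model 9 on, older C3-2 below that.  */
        return model >= 9 ? "nehemiah" : "c3-2";

      /* Pentium III.  */
      return "pentium3";
    }

  /* Placeholder for the remaining branches not shown in the hunk.  */
  return "unknown";
}
```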

Please CC me on any comments, reviews, or change requests.

---

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 14b8030..55afd8b 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2016-06-21  Jocelyn Mayer  
+
+   * config/i386/driver-i386.c (host_detect_local_cpu): Set
+   PROCESSOR_PENTIUMPRO for signature_CENTAUR_ebx family >= 9.
+   : Pass c7 or nehemiah for
+   signature_CENTAUR_ebx.
+
 2016-06-21  Jakub Jelinek  
 
PR tree-optimization/71588
diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-
i386.c
index 9f44ee8..22a8f28 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -653,10 +653,7 @@ const char *host_detect_local_cpu (int argc, const
char **argv)
case 6:
  if (has_longmode)
processor = PROCESSOR_K8;
- else if (model > 9)
-   /* Use the default detection procedure.  */
-   ;
- else if (model == 9)
+ else if (model >= 9)
processor = PROCESSOR_PENTIUMPRO;
  else if (model >= 6)
processor = PROCESSOR_I486;
@@ -818,15 +815,27 @@ const char *host_detect_local_cpu (int argc,
const char **argv)
   as all the CPUs below are 32-bit only.  */
cpu = "x86-64";
  else if (has_sse3)
-   /* It is Core Duo.  */
-   cpu = "pentium-m";
+   {
+ if (vendor == signature_CENTAUR_ebx)
+   /* C7 / Eden "Esther" */
+   cpu = "c7";
+ else
+   /* It is Core Duo.  */
+   cpu = "pentium-m";
+   }
  else if (has_sse2)
/* It is Pentium M.  */
cpu = "pentium-m";
  else if (has_sse)
{
  if (vendor == signature_CENTAUR_ebx)
-   cpu = "c3-2";
+   {
+ if (model >= 9)
+   /* Eden "Nehemiah" */
+   cpu = "nehemiah";
+ else
+   cpu = "c3-2";
+   }
  else
/* It is Pentium III.  */
cpu = "pentium3";



Re: [PATCH] Prevent LTO wrappers to process a recursive execution

2016-06-23 Thread Martin Liška
On 06/23/2016 06:57 AM, Jeff Law wrote:
> Is this still something you want to pursue?  It looks pretty reasonable and 
> one could make an argument that it's a good idea in and of itself.
> 
> jeff

Yeah, I would like to install the patch :) Can I take your reply as a signal
that it's accepted?

Thanks,
Martin


[PATCH] Do not emit SAVE_EXPR for already assigned SSA_NAMEs (PR71606).

2016-06-23 Thread Martin Liška
Hi.

This is a candidate patch for the PR, which does not create SAVE_EXPR trees for
already assigned SSA_NAMEs.

The patch survives regression testing on x86_64-linux-gnu.

Thoughts?
Thanks,
Martin
From 91d01830302171b5cd53fa2f32cc881b2b50762f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 22 Jun 2016 18:07:55 +0200
Subject: [PATCH] Do not emit SAVE_EXPR for already assigned SSA_NAMEs
 (PR71606).

gcc/ChangeLog:

2016-06-22  Martin Liska  

	PR middle-end/71606
	* tree.c (save_expr): Do not generate SAVE_EXPR if the
	argument is already an assigned SSA_NAME.

gcc/testsuite/ChangeLog:

2016-06-22  Martin Liska  

	* gcc.dg/torture/pr71606.c: New test.
---
 gcc/testsuite/gcc.dg/torture/pr71606.c | 11 +++
 gcc/tree.c |  3 +++
 2 files changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr71606.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr71606.c b/gcc/testsuite/gcc.dg/torture/pr71606.c
new file mode 100644
index 000..b0cc26a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr71606.c
@@ -0,0 +1,11 @@
+_Complex a;
+void fn1 ();
+
+int main () {
+  fn1 (a);
+  return 0;
+}
+
+void fn1 (__complex__ long double p1) {
+  __imag__ p1 = 6.0L;
+}
diff --git a/gcc/tree.c b/gcc/tree.c
index bc60190..344eb61 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -3340,6 +3340,9 @@ save_expr (tree expr)
   tree t = fold (expr);
   tree inner;
 
+  if (TREE_CODE (expr) == SSA_NAME && SSA_NAME_DEF_STMT (expr))
+return t;
+
   /* If the tree evaluates to a constant, then we don't want to hide that
  fact (i.e. this allows further folding, and direct checks for constants).
  However, a read-only object that has side effects cannot be bypassed.
-- 
2.8.4



Re: [PATCH 2/3] Add support for arm*-*-phoenix* targets.

2016-06-23 Thread Jakub Sejdak
How about backporting this to gcc-6 and gcc-5?

2016-06-21 22:10 GMT+02:00 Jeff Law :
> On 06/15/2016 08:22 AM, Kuba Sejdak wrote:
>>
>> Is it ok for trunk? If possible, please merge it also to
>> GCC-6 and GCC-5 branches.
>>
>> 2016-06-15  Jakub Sejdak  
>>
>>* config.gcc: Add support for arm*-*-phoenix* targets.
>>* config/arm/t-phoenix: New.
>>* config/phoenix.h: New.
>>
>> ---
>>  gcc/ChangeLog|  6 ++
>>  gcc/config.gcc   | 11 +++
>>  gcc/config/arm/t-phoenix | 29 +
>>  gcc/config/phoenix.h | 33 +
>>  4 files changed, 79 insertions(+)
>>  create mode 100644 gcc/config/arm/t-phoenix
>>  create mode 100644 gcc/config/phoenix.h
>>
>
>> +arm*-*-phoenix*)
>> +   tm_file="dbxelf.h elfos.h arm/unknown-elf.h arm/elf.h arm/bpabi.h"
>> +   tm_file="${tm_file} newlib-stdint.h phoenix.h"
>> +   tm_file="${tm_file} arm/aout.h arm/arm.h"
>> +   tmake_file="${tmake_file} arm/t-arm arm/t-bpabi arm/t-phoenix"
>
> Do you really need dbxelf.h?  We're trying to get away from stabs, so unless
> there's a strong need, avoid dbxelf.h :-)
>
> OK for the trunk with dbxelf.h removed.
>
> jeff



-- 
Jakub Sejdak
Software Engineer
Phoenix Systems (www.phoesys.com)
+48 608 050 163


Re: [PATCH 3/3] Add support for arm*-*-phoenix* targets in libgcc.

2016-06-23 Thread Jakub Sejdak
How about backporting this to gcc-6 and gcc-5?

2016-06-21 22:11 GMT+02:00 Jeff Law :
> On 06/15/2016 08:22 AM, Kuba Sejdak wrote:
>>
>> Is it ok for trunk? If possible, please merge it also to
>> GCC-6 and GCC-5 branches.
>>
>> 2016-06-15  Jakub Sejdak  
>>
>>* config.host: Add support for arm*-*-phoenix* targets.
>
> OK for the trunk.
>
> jeff
>



-- 
Jakub Sejdak
Software Engineer
Phoenix Systems (www.phoesys.com)
+48 608 050 163


Re: [Patch, avr] Fix PR 71151

2016-06-23 Thread Senthil Kumar Selvaraj

Georg-Johann Lay writes:

> Senthil Kumar Selvaraj wrote:
>> Senthil Kumar Selvaraj writes:
>> 
>>> Georg-Johann Lay writes:
>>>
>>>> Senthil Kumar Selvaraj wrote:
>>>>> Hi,
>>>>>
>>>>>   [set JUMP_TABLES_IN_TEXT_SECTION to 1]
>>>
>>> I added tests that use linker relaxation and discovered a relaxation bug
>>> in binutils 2.26 (and later) that messes up symbol values in the
>>> presence of alignment directives. I'm working on that right now -
>>> hopefully, it'll get backported to the release branch.
>>>
>>> Once that gets upstream, I'll resend the patch - with more tests, and
>>> incorporating your comments.
>>>
>> 
>> There were two binutils bugs (PR ld/20221 and ld/20254) that were
>> blocking this patch - on enabling relaxation, jumptables were
>> getting corrupted. Both of the issues are now fixed, and the fixes
>> are in master and 2.26 branch.
>
> Should we mention in the release notes that Binutils >= 2.26 is needed 
> for avr-gcc >= 6 ?

Yes, we should document it. binutils 2.25 would probably work too, as
the bugs were introduced only in binutils 2.26. I'll check and send a patch.
>
> Maybe even check during configure whether an appropriate version of 
> Binutils is used?

That would be nice, but is it ok to add target specific conditions to
configure.ac?

Regards
Senthil