Re: [PATCH], PR 71667, Fix PowerPC ISA 3.0 DImode Altivec load/store

2016-06-27 Thread Segher Boessenkool
On Mon, Jun 27, 2016 at 08:08:20PM -0400, Michael Meissner wrote:
> This patch fixes PR 71667 that I discovered when trying to build the Spec 2006
> xalancbmk benchmark for the Power9.  The Altivec indexed memory load/stores
> need to go before the D-form (register + offset) load/stores, because they
> have different syntaxes.

I'm not sure what that means?  Could you explain a bit more?

>   PR target/71667
>   * config/rs6000/rs6000.md (movdi_internal32): Swap alternatives
>   for loading Altivec registers so that the indexed case comes
>   first, and the general case (which includes indexed loads) comes
>   later.

So the general case allows indexed loads as well, but that breaks somehow?

>   PR target/71667
>   * g++.dg/pr71667.C: New test for PR 71667.

20599 lines, can you minimize this a bit?  If not, maybe we should just
do without testcase here.


Segher


[PATCH, contrib] download_prerequisites: check for existing symlinks before making new ones

2016-06-27 Thread Eric Gallager
The last time I ran ./contrib/download_prerequisites, I already had
symlinks set up from a previous run of the script, so `ln`
followed the existing symlinks and created the new ones in the
directories to which the symlinks pointed. This patch should fix that
by removing the old symlinks before creating new ones. (For some
reason the `-f` flag to `ln` that was already there wasn't enough for
me.) Tested by running the script and ensuring that the new isl
symlink pointed to the correct directory, and that there were no bad
symlinks in the old isl directory. Could someone commit this trivial
patch for me, or something like it? I don't have write access.
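
To illustrate the check-before-link idea outside the shell script, here is a minimal C++ sketch using POSIX calls. The function name `replace_symlink` is hypothetical and not part of the patch. It mirrors `test -L` followed by `unlink`, then creates the link, and it deliberately refuses to touch anything that is not a symlink.

```cpp
#include <cerrno>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper, for illustration only: make LINK a symlink to TARGET.
// An existing symlink at LINK is removed first, so the new link is never
// created inside the directory the old link pointed to.
// Returns true on success.
bool replace_symlink(const std::string &target, const std::string &link)
{
  struct stat st;
  if (lstat(link.c_str(), &st) == 0)
    {
      // lstat does not follow symlinks, so this inspects LINK itself.
      if (!S_ISLNK(st.st_mode))
        return false;            // a real file or directory: leave it alone
      if (unlink(link.c_str()) != 0)
        return false;
    }
  else if (errno != ENOENT)
    return false;                // unexpected error while inspecting LINK

  return symlink(target.c_str(), link.c_str()) == 0;
}
```

Calling it twice with different targets should leave a single link pointing at the second target, with no stray link inside the first target's directory.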

Thanks,
Eric Gallager
 contrib/download_prerequisites | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/contrib/download_prerequisites b/contrib/download_prerequisites
index 917ee23..6c6e02f 100755
--- a/contrib/download_prerequisites
+++ b/contrib/download_prerequisites
@@ -36,14 +36,17 @@ MPC=mpc-1.0.3
 
 wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$MPFR.tar.bz2 || exit 1
 tar xjf $MPFR.tar.bz2 || exit 1
+if test -L mpfr; then unlink mpfr; fi
 ln -sf $MPFR mpfr || exit 1
 
 wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$GMP.tar.bz2 || exit 1
 tar xjf $GMP.tar.bz2  || exit 1
+if test -L gmp; then unlink gmp; fi
 ln -sf $GMP gmp || exit 1
 
 wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$MPC.tar.gz || exit 1
 tar xzf $MPC.tar.gz || exit 1
+if test -L mpc; then unlink mpc; fi
 ln -sf $MPC mpc || exit 1
 
 # Necessary to build GCC with the Graphite loop optimizations.
@@ -52,5 +55,6 @@ if [ "$GRAPHITE_LOOP_OPT" = "yes" ] ; then
 
   wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$ISL.tar.bz2 || exit 1
   tar xjf $ISL.tar.bz2  || exit 1
+  if test -L isl; then unlink isl; fi
   ln -sf $ISL isl || exit 1
 fi


Re: [PATCH, rs6000] Fix PR target 71656, reload ICE when -mcpu=power9 -mpower9-dform-vector

2016-06-27 Thread Peter Bergner

On 6/27/16 3:21 PM, Segher Boessenkool wrote:
> On Sat, Jun 25, 2016 at 07:14:01PM -0500, Peter Bergner wrote:
> Okay for trunk, okay for 6 later.  One comment:
>
>> +  if (VECTOR_MODE_P (mode)
>> +  && !mode_supports_vsx_dform_quad (mode))
>> +return false;
>>
>>    if (GET_CODE (addr) != PLUS)
>>  return false;
>>
>>    op0 = XEXP (addr, 0);
>> -  if (!base_reg_operand (op0, Pmode))
>> +  if (!REG_P (op0)
>> +  || !INT_REG_OK_FOR_BASE_P (op0, strict))
>>  return false;
>
> Just put these short conditionals on one line each?  It looks silly ;-)

Ok, committed to trunk with that change.  Thanks!

Peter


Re: [PATCH, AARCH64] add qdf24xx tuning structure

2016-06-27 Thread Jim Wilson
On Mon, Jun 13, 2016 at 3:01 AM, Kyrill Tkachov wrote:
> Hi Jim,
>
> On 10/06/16 23:48, Jim Wilson wrote:
>>
>> This adds a tuning structure for qdf24xx.  This was tested with an
>> aarch64-linux bootstrap and a make check, with no regressions.  I also
>> tested it with an x86_64-linux C make check to verify that I didn't
>> break the testsuite for non aarch64 targets.
>
>
> As this also changes code in the arm backend
> it also needs a bootstrap and test on an arm target
> (arm-none-linux-gnueabihf for example).
> Can you please confirm that this passes successfully?

Yes, I forgot to do the bootstrap and make check on arm.

I tried to do that testing, and ran into problems with the armv8-a
assembler warning
    IT blocks containing 32-bit Thumb instructions are deprecated in ARMv8
which messed up my testsuite results so much that they were unusable.
I had to rerun the tests to work around that, and then got distracted
by other problems, but I have now done an armhf bootstrap and make
check with unpatched (cortex-a57) and patched (qdf24xx) trees and got
the same results.

During the delay, the aarch64 tuning structure changed how the recip
square root approx is handled, so I had to make a trivial change to my
patch to compensate for that, and then redo the aarch64 bootstrap to
make sure it was still OK.  The new patch is attached, which is otherwise
the same as the previous patch.  I'm assuming this is still OK to
install, as the previous patch was approved pending test results, but
will wait a bit in case someone wants to object.

Jim
	gcc/
	* config/aarch64/aarch64-cores.def (qdf24xx): Use qdf24xx tuning.
	* config/aarch64/aarch64.c (qdf24xx_addrcost_table,
	qdf24xx_regmove_cost, qdf24xx_tunings): New.
	* config/arm/aarch64-cost-tables.h (qdf24xx_extra_costs): New.
	* config/arm/arm-cores.def (qdf24xx): Use qdf24xx tuning.
	* config/arm/arm.c (arm_qdf24xx_tune): New.

	gcc/testsuite/
	* gcc.dg/asr_div1.c: Add aarch64 specific dg-options.

Index: config/aarch64/aarch64-cores.def
===
--- config/aarch64/aarch64-cores.def	(revision 237800)
+++ config/aarch64/aarch64-cores.def	(working copy)
@@ -46,7 +46,7 @@ AARCH64_CORE("cortex-a57",  cortexa57, cortexa57,
 AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08")
 AARCH64_CORE("cortex-a73",  cortexa73, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, "0x41", "0xd09")
 AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  "0x53", "0x001")
-AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x51", "0x800")
+AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   "0x51", "0x800")
 AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
 AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, xgene1, "0x50", "0x000")
 
Index: config/aarch64/aarch64.c
===
--- config/aarch64/aarch64.c	(revision 237800)
+++ config/aarch64/aarch64.c	(working copy)
@@ -250,6 +250,22 @@ static const struct cpu_addrcost_table xgene1_addr
   0, /* imm_offset  */
 };
 
+static const struct cpu_addrcost_table qdf24xx_addrcost_table =
+{
+{
+  1, /* hi  */
+  0, /* si  */
+  0, /* di  */
+  1, /* ti  */
+},
+  0, /* pre_modify  */
+  0, /* post_modify  */
+  0, /* register_offset  */
+  0, /* register_sextend  */
+  0, /* register_zextend  */
+  0 /* imm_offset  */
+};
+
 static const struct cpu_regmove_cost generic_regmove_cost =
 {
   1, /* GP2GP  */
@@ -308,6 +324,15 @@ static const struct cpu_regmove_cost xgene1_regmov
   2 /* FP2FP  */
 };
 
+static const struct cpu_regmove_cost qdf24xx_regmove_cost =
+{
+  2, /* GP2GP  */
+  /* Avoid the use of int<->fp moves for spilling.  */
+  6, /* GP2FP  */
+  6, /* FP2GP  */
+  4 /* FP2FP  */
+};
+
 /* Generic costs for vector insn classes.  */
 static const struct cpu_vector_cost generic_vector_cost =
 {
@@ -647,6 +672,32 @@ static const struct tune_params xgene1_tunings =
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
 
+static const struct tune_params qdf24xx_tunings =
+{
+  _extra_costs,
+  _addrcost_table,
+  _regmove_cost,
+  _vector_cost,
+  _branch_cost,
+  _approx_modes,
+  4, /* memmov_cost  */
+  4, /* issue_rate  */
+  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
+  16,	/* function_align.  */
+  8,	/* jump_align.  */
+  16,	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  

Re: [PATCH, AARCH64] add qdf24xx tuning structure

2016-06-27 Thread Jim Wilson
On Mon, Jun 13, 2016 at 1:53 AM, James Greenhalgh wrote:
> On Fri, Jun 10, 2016 at 03:48:38PM -0700, Jim Wilson wrote:
>> This adds a tuning structure for qdf24xx.  This was tested with an

> Have you seen my recent patch for Cortex-A57 that changes the costs there
> to be relative to the cost of a floating-point register to floating-point
> register move [1]? I found that gave me a number of
> improvements due to comparisons in the compiler that assume a move in a
> mode is cheap, and other costs will be defined relative to it.
>
> Did you consider that for the qdf24xx costs?

No.  We haven't considered that.  However, we have plans to revisit
some of this tuning work when we get updated docs, so we can take a
look at this then.

Jim


PING Re: [PATCH 0/9] separate shrink-wrapping

2016-06-27 Thread Segher Boessenkool
Ping.

On Wed, Jun 08, 2016 at 01:47:31AM +0000, Segher Boessenkool wrote:
> This patch series introduces separate shrink-wrapping.
> 
> There are many things the prologue/epilogue of a function do, and most of
> those things can be done independently.  For example, most of the time,
> for many targets, the save of callee-saved registers can be done later
> than the "main" prologue.
> 
> Doing so helps quite a bit because the prologue is expensive for functions
> that do not need everything it does done for every path through the
> function; often, the hot paths do not need much at all, e.g. not those
> things the prologue needs to do for the function to call other functions.
> 
> The first patch creates a command-line flag, some hooks, a status flag
> ("is this function wrapped separately", used by later passes), and
> documentation for these things.
> 
> The next six patches are to prevent later passes from mishandling the
> epilogue instructions that now appear before the epilogue: mostly, you
> cannot do much to instructions with a REG_CFA_RESTORE note without
> confusing dwarf2cfi.  The cprop one is for prologue instructions.
> 
> Then, the main patch.  And finally a patch for PowerPC that implements
> separate wrapping for GPRs and LR.
> 
> Tested on powerpc64-linux (-m32/-m64, -mlra/-mno-lra), and on
> powerpc64le-linux.  Previous versions of this series also tested on
> x86_64-linux.
> 
> Is this okay for trunk?
> 
> 
> Segher
> 
> 
> Segher Boessenkool (9):
>   separate shrink-wrap: New command-line flag, status flag, hooks, and doc
>   cfgcleanup: Don't confuse CFI when -fshrink-wrap-separate
>   dce: Don't dead-code delete separately wrapped restores
>   regrename: Don't rename restores
>   regrename: Don't run if function was separately shrink-wrapped
>   sel-sched: Don't mess with register restores
>   cprop: Leave RTX_FRAME_RELATED_P instructions alone
>   shrink-wrap: shrink-wrapping for separate concerns
>   rs6000: Separate shrink-wrapping
> 
>  gcc/cfgcleanup.c   |   5 +
>  gcc/common.opt |   4 +
>  gcc/config/rs6000/rs6000.c | 257 --
>  gcc/dce.c  |   9 +
>  gcc/doc/invoke.texi|  11 +-
>  gcc/doc/tm.texi|  53 
>  gcc/doc/tm.texi.in |  29 ++
>  gcc/emit-rtl.h |   4 +
>  gcc/function.c |  15 +-
>  gcc/regcprop.c |   3 +
>  gcc/regrename.c|  12 +-
>  gcc/sel-sched-ir.c |   1 +
>  gcc/shrink-wrap.c  | 647 
> +
>  gcc/shrink-wrap.h  |   1 +
>  gcc/target.def |  56 
>  15 files changed, 1088 insertions(+), 19 deletions(-)
> 
> -- 
> 1.9.3


[PATCH] rs6000: Fix split of ashdi3_extswsli_dot for memory (PR71670)

2016-06-27 Thread Segher Boessenkool
The splitter for ashdi3_extswsli_dot for cr0 with memory uses emit_insn
gen_ashdi3_extswsli_dot, which does not work because that emits a scratch,
while the splitter runs after reload so there should be a real register
instead.  We can laboriously fix that up, or emit using
gen_ashdi3_extswsli_dot2 instead.  This patch does the latter.

Tested that the new testcase fails without, and works with the patch.
Now bootstrapping.


Segher


2016-06-27  Segher Boessenkool  

PR target/71670
* config/rs6000/rs6000.md (ashdi3_extswsli_dot): Use
gen_ashdi3_extswsli_dot2 instead of gen_ashdi3_extswsli_dot.

gcc/testsuite/
PR target/71670
* gcc.target/powerpc/pr71670.c: New testcase.

---
 gcc/config/rs6000/rs6000.md| 2 +-
 gcc/testsuite/gcc.target/powerpc/pr71670.c | 7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr71670.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 133eef1..39a9cf8 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4087,7 +4087,7 @@ (define_insn_and_split "ashdi3_extswsli_dot"
 
   if (REGNO (cr) == CR0_REGNO)
 {
-  emit_insn (gen_ashdi3_extswsli_dot (dest, src2, shift, cr));
+  emit_insn (gen_ashdi3_extswsli_dot2 (dest, src2, shift, cr));
   DONE;
 }
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr71670.c b/gcc/testsuite/gcc.target/powerpc/pr71670.c
new file mode 100644
index 0000000..18fb627
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr71670.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O1" } */
+
+volatile int a;
+int b;
+void fn1(void) { b + (long)b || a; }
-- 
1.9.3



Re: OpenACC wait clause

2016-06-27 Thread Cesar Philippidis
On 06/27/2016 12:23 PM, Jakub Jelinek wrote:
> On Mon, Jun 27, 2016 at 11:36:26AM -0700, Cesar Philippidis wrote:

>> @@ -630,9 +653,10 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
>>  {
>>gfc_omp_clauses *c = gfc_get_omp_clauses ();
>>locus old_loc;
>> +  bool seen_error = false;
>>  
>>*cp = NULL;
>> -  while (1)
>> +  while (!seen_error)
>>  {
>>if ((first || gfc_match_char (',') != MATCH_YES)
>>&& (needs_space && gfc_match_space () != MATCH_YES))
> 
> Why?
> The main loop is while (1) which has break; as the last statement.
> Instead of setting seen_error, just set
> gfc_current_locus = old_loc;
> if you have already matched successfully something, and then break;
> as you already do.

I made that change.
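
For readers following along, the control flow Jakub suggests can be sketched with a toy parser. This is not the gfortran code: `Parser`, `match`, and `parse_clauses` are hypothetical names, the integer `pos` stands in for gfc_current_locus, and the end-of-input test stands in for gfc_match_omp_eos (). The point it tries to show is that restoring the saved position before `break` makes the existing end-of-statement check fail by itself, so no separate error flag is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Toy parser state; `pos` plays the role of the current locus.
struct Parser
{
  std::string text;
  std::size_t pos;

  explicit Parser(std::string t) : text(std::move(t)), pos(0) {}

  // Consume LIT if the input continues with it.
  bool match(const std::string &lit)
  {
    assert(pos <= text.size());
    if (text.compare(pos, lit.size(), lit) == 0)
      {
        pos += lit.size();
        return true;
      }
    return false;
  }

  bool at_end() const { return pos == text.size(); }
};

// Accepts a sequence of "vector" and "tile(8)" clauses separated by spaces.
bool parse_clauses(Parser &p)
{
  while (true)
    {
      std::size_t old_pos = p.pos;   // remember where this clause started
      p.match(" ");                  // optional separator
      if (p.match("vector"))
        continue;
      if (p.match("tile"))
        {
          if (p.match("(8)"))
            continue;
          // Malformed argument: roll back to before the clause and stop.
          p.pos = old_pos;
          break;
        }
      p.pos = old_pos;               // nothing matched; undo the separator
      break;
    }
  // After a rollback there is unconsumed input, so this check reports
  // the failure without a dedicated error flag.
  return p.at_end();
}
```

With this sketch, "vector tile(8)" is accepted, while "vector tile" is rejected with the position left just after "vector".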

>> @@ -1275,9 +1309,16 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
>>  continue;
>>if ((mask & OMP_CLAUSE_TILE)
>>&& !c->tile_list
>> -  && match_oacc_expr_list ("tile (", &c->tile_list,
>> -   true) == MATCH_YES)
>> -continue;
>> +  && gfc_match ("tile") == MATCH_YES)
>> +{
>> +  if (match_oacc_expr_list (" (", &c->tile_list, true) != MATCH_YES)
>> +{
>> +  seen_error = true;
>> +  break;
>> +}
>> +  needs_space = true;
>> +  continue;
>> +}
>>if ((mask & OMP_CLAUSE_TO)
>>&& gfc_match_omp_variable_list ("to (",
>>   &c->lists[OMP_LIST_TO], false,
> 
> So, tile without ()s is also a valid clause in OpenACC?
> If yes, what do you set in c structure for the existence of the clause?
> If it is not valid, then the above change looks wrong.

No. The tile clause always has a () argument. So I should have used
gfc_match_omp_variable_list ("tile (", &c->tile_list) instead. This
patch fixes that.

>> @@ -1309,10 +1350,13 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
>>&& gfc_match ("vector") == MATCH_YES)
>>  {
>>c->vector = true;
>> -  if (gfc_match (" ( length : %e )", &c->vector_expr) == MATCH_YES
>> -  || gfc_match (" ( %e )", &c->vector_expr) == MATCH_YES)
>> -needs_space = false;
>> -  else
>> +  match m = match_oacc_clause_gwv(c, GOMP_DIM_VECTOR);
> 
> Formatting, space before (.

Fixed.

>> @@ -1348,7 +1402,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
>>break;
>>  }
>>  
>> -  if (gfc_match_omp_eos () != MATCH_YES)
>> +  if (seen_error || gfc_match_omp_eos () != MATCH_YES)
>>  {
>>gfc_free_omp_clauses (c);
>>return MATCH_ERROR;
> 
> Again, IMHO not needed, if you restore the old_loc into gfc_current_locus,
> then gfc_match_omp_eos () will surely fail.

Fixed.

Is this ok for trunk and gcc6?

Cesar

2016-06-27  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (match_oacc_clause_gang): Rename to ...
	(match_oacc_clause_gwv): this.  Add support for OpenACC worker and
	vector clauses.
	(gfc_match_omp_clauses): Use match_oacc_clause_gwv for
	OMP_CLAUSE_{GANG,WORKER,VECTOR}.  Propagate any MATCH_ERRORs for
	invalid OMP_CLAUSE_{ASYNC,WAIT,GANG,WORKER,VECTOR} clauses.
	(gfc_match_oacc_wait): Propagate MATCH_ERROR for invalid
	oacc_expr_lists.  Adjust the first and needs_space arguments to
	gfc_match_omp_clauses.

	gcc/testsuite/
	* gfortran.dg/goacc/asyncwait-2.f95: Updated expected diagnostics.
	* gfortran.dg/goacc/asyncwait-3.f95: Likewise.
	* gfortran.dg/goacc/asyncwait-4.f95: Add test coverage.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index f514866..a691af9 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -396,43 +396,66 @@ cleanup:
 }
 
 static match
-match_oacc_clause_gang (gfc_omp_clauses *cp)
+match_oacc_clause_gwv (gfc_omp_clauses *cp, unsigned gwv)
 {
   match ret = MATCH_YES;
 
   if (gfc_match (" ( ") != MATCH_YES)
 return MATCH_NO;
 
-  /* The gang clause accepts two optional arguments, num and static.
- The num argument may either be explicit (num: ) or
- implicit without ( without num:).  */
-
-  while (ret == MATCH_YES)
+  if (gwv == GOMP_DIM_GANG)
 {
-  if (gfc_match (" static :") == MATCH_YES)
+/* The gang clause accepts two optional arguments, num and static.
+	 The num argument may either be explicit (num: ) or
+	 implicit without ( without num:).  */
+
+  while (ret == MATCH_YES)
 	{
-	  if (cp->gang_static)
-	return MATCH_ERROR;
+	  if (gfc_match (" static :") == MATCH_YES)
+	{
+	  if (cp->gang_static)
+		return MATCH_ERROR;
+	  else
+		cp->gang_static = true;
+	  if (gfc_match_char ('*') == MATCH_YES)
+		cp->gang_static_expr = NULL;
+	  else if (gfc_match (" %e ", &cp->gang_static_expr) != MATCH_YES)
+		return MATCH_ERROR;
+	}
 	  else
-	cp->gang_static = true;
-	  if (gfc_match_char ('*') == MATCH_YES)
-	cp->gang_static_expr = 

[PATCH] Offer suggestions for misspelled --param names.

2016-06-27 Thread David Malcolm
Another use of spellcheck.{c|h}, this time for --param.

Successfully bootstrapped on x86_64-pc-linux-gnu;
adds 4 PASS results to gcc.sum.

OK for trunk?
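
As a rough picture of what a fuzzy lookup like this does, here is a self-contained C++ sketch. It is not the code in spellcheck.c: `edit_distance` and `suggest_name` are hypothetical names, and the "at most half the longer length" cutoff is an assumption made for illustration, not necessarily the rule GCC applies. It computes the Levenshtein distance to each candidate and returns the closest one only when it is plausibly a misspelling.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Classic dynamic-programming Levenshtein distance
// (single-character insert, delete, substitute), using two rows.
std::size_t edit_distance(const std::string &s, const std::string &t)
{
  std::vector<std::size_t> prev(t.size() + 1), cur(t.size() + 1);
  for (std::size_t j = 0; j <= t.size(); ++j)
    prev[j] = j;
  for (std::size_t i = 1; i <= s.size(); ++i)
    {
      cur[0] = i;
      for (std::size_t j = 1; j <= t.size(); ++j)
        {
          std::size_t subst = prev[j - 1] + (s[i - 1] == t[j - 1] ? 0 : 1);
          cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
      std::swap(prev, cur);
    }
  return prev[t.size()];
}

// Hypothetical helper: return the candidate closest to NAME, or an empty
// string when nothing is close enough to be a sensible suggestion.
std::string suggest_name(const std::string &name,
                         const std::vector<std::string> &candidates)
{
  bool found = false;
  std::string best;
  std::size_t best_dist = 0;
  for (const std::string &c : candidates)
    {
      std::size_t d = edit_distance(name, c);
      if (!found || d < best_dist)
        {
          found = true;
          best = c;
          best_dist = d;
        }
    }
  if (!found)
    return "";
  // Assumed cutoff: reject matches that differ in more than half
  // of the longer string.
  std::size_t cutoff = std::max(name.size(), best.size()) / 2;
  return best_dist <= cutoff ? best : "";
}
```

With candidates "max-unroll-times" and "max-inline-insns-single", the input "max-unroll-time" is one edit away from the first and would be suggested, while an input like "zzz" yields no suggestion.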

gcc/ChangeLog:
* opts.c (handle_param): Use find_param_fuzzy to offer suggestions
for misspelled param names.
* params.c: Include spellcheck.h.
(find_param_fuzzy): New function.
* params.h (find_param_fuzzy): New prototype.
* spellcheck.c (struct edit_distance_traits): Move
to...
* spellcheck.h (struct edit_distance_traits):
...here.

gcc/testsuite/ChangeLog:
* gcc.dg/spellcheck-params.c: New testcase.
* gcc.dg/spellcheck-params-2.c: New testcase.
---
 gcc/opts.c |  9 -
 gcc/params.c   | 14 ++
 gcc/params.h   |  1 +
 gcc/spellcheck.c   | 18 --
 gcc/spellcheck.h   | 18 ++
 gcc/testsuite/gcc.dg/spellcheck-params-2.c |  4 
 gcc/testsuite/gcc.dg/spellcheck-params.c   |  4 
 7 files changed, 49 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck-params-2.c
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck-params.c

diff --git a/gcc/opts.c b/gcc/opts.c
index 7406210..f09c520 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2228,7 +2228,14 @@ handle_param (struct gcc_options *opts, struct gcc_options *opts_set,
 
   enum compiler_param index;
   if (!find_param (arg, &index))
-   error_at (loc, "invalid --param name %qs", arg);
+   {
+ const char *suggestion = find_param_fuzzy (arg);
+ if (suggestion)
+   error_at (loc, "invalid --param name %qs; did you mean %qs?",
+ arg, suggestion);
+ else
+   error_at (loc, "invalid --param name %qs", arg);
+   }
   else
{
  if (!param_string_value_p (index, equal + 1, ))
diff --git a/gcc/params.c b/gcc/params.c
index 41660b4..1b5000b 100644
--- a/gcc/params.c
+++ b/gcc/params.c
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "params-enum.h"
 #include "diagnostic-core.h"
+#include "spellcheck.h"
 
 /* An array containing the compiler parameters and their current
values.  */
@@ -142,6 +143,19 @@ find_param (const char *name, enum compiler_param *index)
   return false;
 }
 
+/* Look for the closest match for NAME in the parameter table, returning it
+   if it is a reasonable suggestion for a misspelling.  Return NULL
+   otherwise.  */
+
+const char *
+find_param_fuzzy (const char *name)
+{
+  best_match  bm (name);
+  for (size_t i = 0; i < num_compiler_params; ++i)
+bm.consider (compiler_params[i].option);
+  return bm.get_best_meaningful_candidate ();
+}
+
 /* Return true if param with entry index INDEX should be defined using strings.
If so, return the value corresponding to VALUE_NAME in *VALUE_P.  */
 
diff --git a/gcc/params.h b/gcc/params.h
index 7221ab6..97c8d56 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -89,6 +89,7 @@ enum compiler_param
 };
 
 extern bool find_param (const char *, enum compiler_param *);
+extern const char *find_param_fuzzy (const char *name);
 extern bool param_string_value_p (enum compiler_param, const char *, int *);
 
 /* The value of the parameter given by ENUM.  Not an lvalue.  */
diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c
index 2648f3a..b37b1e4 100644
--- a/gcc/spellcheck.c
+++ b/gcc/spellcheck.c
@@ -121,24 +121,6 @@ levenshtein_distance (const char *s, const char *t)
   return levenshtein_distance (s, strlen (s), t, strlen (t));
 }
 
-/* Specialization of edit_distance_traits for C-style strings.  */
-
-template <>
-struct edit_distance_traits<const char *>
-{
-  static size_t get_length (const char *str)
-  {
-gcc_assert (str);
-return strlen (str);
-  }
-
-  static const char *get_string (const char *str)
-  {
-gcc_assert (str);
-return str;
-  }
-};
-
 /* Given TARGET, a non-NULL string, and CANDIDATES, a non-NULL ptr to
an autovec of non-NULL strings, determine which element within
CANDIDATES has the lowest edit distance to TARGET.  If there are
diff --git a/gcc/spellcheck.h b/gcc/spellcheck.h
index 035f4ac..b48cfbc 100644
--- a/gcc/spellcheck.h
+++ b/gcc/spellcheck.h
@@ -48,6 +48,24 @@ find_closest_string (const char *target,
 template 
 struct edit_distance_traits {};
 
+/* Specialization of edit_distance_traits for C-style strings.  */
+
+template <>
+struct edit_distance_traits<const char *>
+{
+  static size_t get_length (const char *str)
+  {
+gcc_assert (str);
+return strlen (str);
+  }
+
+  static const char *get_string (const char *str)
+  {
+gcc_assert (str);
+return str;
+  }
+};
+
 /* A type for use when determining the best match against a string,
expressed as a template so that we can match against various
string-like types (const char *, frontend identifiers, and preprocessor

Re: [PATCH, rs6000] Scheduling update

2016-06-27 Thread Segher Boessenkool
On Mon, Jun 27, 2016 at 04:46:00PM -0500, Pat Haugen wrote:
> On 06/22/2016 02:10 PM, Segher Boessenkool wrote:
> >> Index: config/rs6000/htm.md
> >> ===
> >> --- config/rs6000/htm.md   (revision 237621)
> >> +++ config/rs6000/htm.md   (working copy)
> >> @@ -72,7 +72,8 @@ (define_insn "*tabort"
> >> (set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))]
> >>"TARGET_HTM"
> >>"tabort. %0"
> >> -  [(set_attr "type" "htm")
> >> +  [(set_attr "type" "htmsimple")
> >> +   (set_attr "power9_alu2" "yes")
> >> (set_attr "length" "4")])
> > 
> > What determines if an insn is htm or htmsimple?
> > 
> htm insns are cracked whereas htmsimple are not.

Sorry, I wasn't clear.  That is what is the difference on p9, sure.
But is there some pattern to this?  Some difference that does not depend
on a specific CPU implementation.

> (rs6000_sched_init): Fix initialization of last_scheduled_insn.
> Initialize divCnt/vec_load_pendulum.

You missed divCnt here :-)

> +(define_insn_reservation "power9-vecdiv" 32
> +  (and (eq_attr "type" "vecdiv")
> +   (eq_attr "size" "!128")
> +   (eq_attr "cpu" "power9"))
> +  "DU_super_power9,VSU_super_power9")

Does that work, the ! ?


This looks much better :-)

Okay for trunk; okay for 6 later.  Thanks,


Segher


Re: [PATCH, rs6000] Scheduling update

2016-06-27 Thread Pat Haugen
On 06/22/2016 02:10 PM, Segher Boessenkool wrote:
>> Index: config/rs6000/htm.md
>> ===
>> --- config/rs6000/htm.md (revision 237621)
>> +++ config/rs6000/htm.md (working copy)
>> @@ -72,7 +72,8 @@ (define_insn "*tabort"
>> (set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))]
>>"TARGET_HTM"
>>"tabort. %0"
>> -  [(set_attr "type" "htm")
>> +  [(set_attr "type" "htmsimple")
>> +   (set_attr "power9_alu2" "yes")
>> (set_attr "length" "4")])
> 
> What determines if an insn is htm or htmsimple?
> 
htm insns are cracked whereas htmsimple are not.


> 
>> +; Quad-precision FP ops, execute in DFU
>> +(define_attr "power9_qp" "no,yes"
>> +  (if_then_else (ior (match_operand:KF 0 "" "")
>> + (match_operand:TF 0 "" "")
>> + (match_operand:KF 1 "" "")
>> + (match_operand:TF 1 "" ""))
>> +(const_string "yes")
>> +(const_string "no")))
> 
> (The "" are not needed I think).
> 
> This perhaps could be better handled with the "size" attribute.
> 
Patch has been modified to annotate 128-bit FP insns with size '128' and 
handled that way.


>> +(define_insn_reservation "power9-load-ext" 6
>> +  (and (eq_attr "type" "load")
>> +   (eq_attr "sign_extend" "yes")
>> +   (eq_attr "update" "no")
>> +   (eq_attr "cpu" "power9"))
>> +  "DU_C2_power9,LSU_power9")
> 
> So you do not describe the units used after the first cycle?  Why is
> that, to keep the size of the automaton down?
> 
Yes, I ran into problems with DFA state explosion when trying to list follow-on 
cycles/unit reservations.

> 
>> +(define_insn_reservation "power9-fpload-double" 4
>> +  (and (eq_attr "type" "fpload")
>> +   (eq_attr "update" "no")
>> +   (match_operand:DF 0 "" "")
>> +   (eq_attr "cpu" "power9"))
>> +  "DU_slice_3_power9,LSU_power9")
> 
> Using match_operand here is asking for trouble.  "size", and you can
> default that for "fpload" insns, and document there that it looks at the
> mode of operands[0] for fpload?
Handled with size '64' additions to fpload insns.


>>  {
>> +  int pos;
>> +  int i;
>> +  rtx_insn *tmp;
> 
> Moving these to an outer scope is really a step back.  The new code could
> just declare them itself; in fact, it should probably be a separate
> function anyway.
> 
Separate function created.


Updated changelog/patch follow, with additional coding style corrections you 
pointed out also made. The diff is against current trunk, am currently 
bootstrap/regtesting on top of the other patch you already reviewed.

Thanks,
Pat


2016-06-27  Pat Haugen  

* config/rs6000/rs6000.md ('type' attribute): Add htmsimple/dfp types.
('size' attribute): Add '128'.
Include power9.md.
(*mov_hardfloat32, *mov_hardfloat64, *movdi_internal32,
*movdi_internal64, *movdf_update1): Set size attribute to '64'.
(add3, sub3, mul3, div3, sqrt2,
copysign3, neg2_hw, abs2_hw, *nabs2_hw,
*fma4_hw, *fms4_hw, *nfma4_hw, *nfms4_hw,
extend2_hw, truncdf2_hw,
*xscvqpwz_, *xscvqpdz_, *xscvdqp_,
*truncdf2_odd): Set size attribute to '128'.
(*cmp_hw): Change type to veccmp and set size attribute to '128'.
* config/rs6000/power6.md (power6-fp): Include dfp type.
* config/rs6000/power7.md (power7-fp): Likewise.
* config/rs6000/power8.md (power8-fp): Likewise.
* config/rs6000/power9.md: New file.
* config/rs6000/t-rs6000 (MD_INCLUDES): Add power9.md.
* config/rs6000/htm.md (*tabort, *tabortc, *tabortci,
*trechkpt, *treclaim, *tsr, *ttest): Change type attribute to
htmsimple.
* config/rs6000/dfp.md (extendsddd2, truncddsd2, extendddtd2,
trunctddd2, a3, addtd3, subdd3, subtd3, muldd3, multd3, divdd3,
divtd3, *cmpdd_internal1, *cmptd_internal1, floatdidd2, floatditd2,
ftruncdd2, fixdddi2, ftrunctd2, fixtddi2, dfp_ddedpd_,
dfp_denbcd_, dfp_dxex_, dfp_diex_, dfp_dscli_,
dfp_dscri_): Change type attribute to dfp.
* config/rs6000/crypto.md (crypto_vshasigma): Change type
attribute to vecsimple.
* config/rs6000/rs6000.c (power9_cost): Update costs, cache size
and prefetch streams.
(rs6000_option_override_internal): Remove temporary code setting
tuning to power8.  Don't set rs6000_sched_groups for power9.
(last_scheduled_insn): Change to rtx_insn *.
(divide_cnt, vec_load_pendulum): New variables.
(rs6000_adjust_cost): Add Power9 to test for store->load separation.
(rs6000_issue_rate): Set issue rate for Power9.
(is_power9_pairable_vec_type): New.
(power9_sched_reorder2): New.
(rs6000_sched_reorder2): Call new function for Power9 specific
reordering.
(insn_must_be_first_in_group): Remove Power9.

Re: [PATCH], Add PowerPC ISA 3.0 lxsihzx, lxsibzx, stxsihx, stxsibx support

2016-06-27 Thread Segher Boessenkool
On Mon, Jun 27, 2016 at 05:37:57PM -0400, Michael Meissner wrote:
> > > @@ -872,7 +878,6 @@ (define_insn_and_split "*zero_extendsi > > (set_attr "dot" "yes")
> > > (set_attr "length" "4,8")])
> > >  
> > > -
> > >  (define_insn "extendqi2"
> > >[(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r")
> > >   (sign_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r")))]
> > 
> > Unrelated whitespace change, please don't.
> 
> I generally try to clean things up as I'm editing code.

But it's not a cleanup.  The double blank lines group related patterns.


Segher


Re: [PATCH, rs6000] Scheduling update

2016-06-27 Thread Segher Boessenkool
On Mon, Jun 27, 2016 at 04:24:39PM -0500, Pat Haugen wrote:
> On 06/27/2016 03:41 PM, Segher Boessenkool wrote:
> >> * config/rs6000/rs6000.md ('type' attribute): Add
> >> > vec_logical,veccmp_fx,vec_extend,vecmove insn types.
> > Those names are a bit irregular (underscore vs. no underscore after "vec",
> > "extend" is called "exts" for integer, "vec_logical" holds no relation to
> > integer "logical").
> 
> I can remove the underscore to match existing types and change extend->exts. 
> As for the vec_logical/integer logical point, not sure I'm understanding, 
> these are the vector forms of and/or/xor/etc.

Oh!  So they were wrongly called "perm" before?  Ha.

If you can easily make the changes, please do, otherwise just postpone
it, we'll go over this a few more times anyway.


Segher


Re: [PATCH], Add PowerPC ISA 3.0 lxsihzx, lxsibzx, stxsihx, stxsibx support

2016-06-27 Thread Michael Meissner
On Mon, Jun 27, 2016 at 03:05:19PM -0500, Segher Boessenkool wrote:
> On Thu, Jun 23, 2016 at 06:37:15PM -0400, Michael Meissner wrote:
> > PowerPC ISA 3.0 adds new instructions (LXSIHZX, LXSIBZX, STXSIHX, and STXSIBX)
> > that allow you to load and zero extend byte and half word values from memory
> > and to store them back.
> 
> Okay for trunk with fixes below; okay for 6 later.
> 
> > @@ -872,7 +878,6 @@ (define_insn_and_split "*zero_extendsi > (set_attr "dot" "yes")
> > (set_attr "length" "4,8")])
> >  
> > -
> >  (define_insn "extendqi2"
> >[(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r")
> > (sign_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r")))]
> 
> Unrelated whitespace change, please don't.

I generally try to clean things up as I'm editing code.

> > @@ -5188,6 +5193,107 @@ (define_insn_and_split "*floatunssidf2_i
> >[(set_attr "length" "20")
> > (set_attr "type" "fp")])
> >  
> > +;; ISA 3.0 adds instructions lxsi[bh]zx to directly load QImode and HImode to
> > +;; vector registers.  At the moment, QI/HImode are not allowed in floating
> > +;; point or vector registers, so we use UNSPEC's to use the load byte and
> > +;; half-word instructions.
> > +
> > +(define_expand "float2"
> > +  [(parallel [(set (match_operand:FP_ISA3 0 "vsx_register_operand" "")
> > +  (float:FP_ISA3
> > +   (match_operand:QHI 1 "input_operand" "")))
> > + (clobber (match_scratch:DI 2 ""))
> > + (clobber (match_scratch:DI 3 ""))])]
> 
> Drop the "" please.

Ok.

> > +  "TARGET_P9_VECTOR && TARGET_DIRECT_MOVE && TARGET_POWERPC64"
> > +{
> > +  if (MEM_P (operands[1]))
> > +operands[1] = rs6000_address_for_fpconvert (operands[1]);
> 
> That function should get a better name, it's not used solely for fpconvert
> anymore.  I'm not asking you to do this now, it should be a separate patch
> anyway ;-)

However, in this case, it is doing the address change for a floating point
conversion.  But changing the names of existing functions that are working tends
to be low on my priority list.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: Ping Re: Implement C _FloatN, _FloatNx types [version 3]

2016-06-27 Thread Segher Boessenkool
On Mon, Jun 27, 2016 at 09:22:54PM +, Joseph Myers wrote:
> > > Ping.  This patch 
> > >  is pending 
> > > review.  Built-in functions are available in the followup patch 
> > > .
> > 
> > I can't ack the patch, but the rs6000 bits of your original patch look fine.
> > I didn't mean to ask you to change those -- I hadn't read the patch
> > and was just commenting on your description of the patch, and I realize
> > now that I wasn't careful in my use of language.  I apologize for the
> > miscommunication there.  We should keep the check on
> > FLOAT128_IEEE_P to determine which mode is __float128 for now,
> > until the whole issue of two 128-bit floats is behind us.
> 
> In that case, consider the relevant part of patch version 1 to be 
> substituted into version 3 for the rest of the review.

The rs6000 part is approved, then.

Thanks,


Segher


Re: [PATCH,rs6000] Add support for Power9 DFP Test Significance Immediate instruction

2016-06-27 Thread Segher Boessenkool
Hi Kelvin,

On Mon, Jun 27, 2016 at 11:47:52AM -0600, Kelvin Nilsen wrote:
> 
> This patch adds built-in function support for the new Power9 dtstsfi
> instruction.
> 
> This has bootstrapped and regression tested on
> powerpc64le-unknown-linux-gnu without regressions.  Is this ok for the
> trunk?  Is this patch ok for gcc-6 after some burn-in time on the trunk?

Okay for trunk with the fixes below.  Okay for 6 later, too.


>   * gcc.target/powerpc/dtstsfi-0.c: New test.
..
>   * gcc.target/powerpc/dtstsfi-79.c: New test.

If you have this many tests, please use a subdir.

>   (rs6000_expand_binop_builtin): Enforce that argument 0 of the exp
>   argument is a 6-bit unsigned literal value if  the icode argument

Duplicated space.

> +(define_expand "dfptstsfi__"
> +  [(set (match_dup 3)
> + (compare:CCFP
> + (unspec:D64_D128 

Trailing space.

> +   [(match_operand:SI 1 "const_int_operand" "n")
> +(match_operand:D64_D128 2 "gpc_reg_operand" "d")]
> +   UNSPEC_DTSTSFI)
> +  (match_dup 4)))
> +   (set (match_operand:SI 0 "register_operand" "")
> + (DFP_TEST:SI (match_dup 3) 

Trailing space.

> +  (const_int 0)))
> +  ]
> +  "TARGET_P9_MISC"
> +{
> +  operands[3] = gen_reg_rtx (CCFPmode);
> +  operands[4] = CONST0_RTX (SImode);

That's just const0_rtx.

> +(define_insn "*dfp_sgnfcnc_"
> +  [(set (match_operand:CCFP 0 "" "=y")
> +(compare:CCFP
> +  (unspec:D64_D128 [(match_operand:SI 1 "const_int_operand" "n")
> +(match_operand:D64_D128 2 "gpc_reg_operand" "d")]
> +  UNSPEC_DTSTSFI)
> +  (match_operand:SI 3 "zero_constant" "j")))]
> +  "TARGET_P9_MISC"
> +  {

Please left-align the {} block.

> +/* If immediate operand is greater than 63, it will behave as if
> + * the value had been 63.  The code generator does not support 
> + * immediate operand values greater than 63. */

No leading * on comment continuation lines.  Two spaces after full stop.
Trailing space.

> +  else if (icode == CODE_FOR_dfptstsfi_eq_dd
> +  || icode == CODE_FOR_dfptstsfi_lt_dd
> +  || icode == CODE_FOR_dfptstsfi_gt_dd 
> +  || icode == CODE_FOR_dfptstsfi_unordered_dd
> +  || icode == CODE_FOR_dfptstsfi_eq_td 
> +  || icode == CODE_FOR_dfptstsfi_lt_td 
> +  || icode == CODE_FOR_dfptstsfi_gt_td 
> +  || icode == CODE_FOR_dfptstsfi_unordered_td)

Many trailing spaces.

> +{
> +  /* Only allow 6-bit unsigned literals.  */
> +  STRIP_NOPS (arg0);
> +  if (TREE_CODE (arg0) != INTEGER_CST
> +   || TREE_INT_CST_LOW (arg0) & ~0x3f)

Use IN_RANGE instead?  This only tests the bits that fit in a host int.
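
To illustrate the concern, here is a standalone sketch (not GCC code; the
wide_const type and both function names are made up for this example).  If a
constant is wider than the word the mask is applied to, a mask test on the
low word alone accepts values whose high bits are set, while a range check
on the whole value rejects them:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a constant that is wider than one host word.  */
struct wide_const
{
  uint64_t low;   /* least significant word */
  uint64_t high;  /* most significant word */
};

/* Mask test that only looks at the low word, in the style of the
   quoted "& ~0x3f" check.  */
static bool
low_word_mask_ok (struct wide_const c)
{
  return (c.low & ~UINT64_C (0x3f)) == 0;
}

/* Range test over the whole value: accept only 0..63.  */
static bool
full_range_ok (struct wide_const c)
{
  return c.high == 0 && c.low <= 63;
}
```

With low = 5 and high = 1 the first check passes and the second one fails,
which is the case the review comment is worried about.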

> +/* Miscellaneous builtins for decimal floating point instructions
> +   added in ISA 3.0.  These instructions don't require the VSX
> +   options, just the basic ISA 3.0 enablement since they operate on
> +   general purpose registers.  */ 

Trailing space.

Thanks,


Segher


Re: Ping Re: Implement C _FloatN, _FloatNx types [version 3]

2016-06-27 Thread Joseph Myers
On Mon, 27 Jun 2016, Bill Schmidt wrote:

> Hi Joseph,
> 
> > On Jun 27, 2016, at 12:21 PM, Joseph Myers  wrote:
> > 
> > Ping.  This patch 
> >  is pending 
> > review.  Built-in functions are available in the followup patch 
> > .
> 
> I can't ack the patch, but the rs6000 bits of your original patch look fine.
> I didn't mean to ask you to change those -- I hadn't read the patch
> and was just commenting on your description of the patch, and I realize
> now that I wasn't careful in my use of language.  I apologize for the
> miscommunication there.  We should keep the check on
> FLOAT128_IEEE_P to determine which mode is __float128 for now,
> until the whole issue of two 128-bit floats is behind us.

In that case, consider the relevant part of patch version 1 to be 
substituted into version 3 for the rest of the review.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, rs6000] Scheduling update

2016-06-27 Thread Pat Haugen
On 06/27/2016 03:41 PM, Segher Boessenkool wrote:
>> * config/rs6000/rs6000.md ('type' attribute): Add
>> > vec_logical,veccmp_fx,vec_extend,vecmove insn types.
> Those names are a bit irregular (underscore vs. no underscore after "vec",
> "extend" is called "exts" for integer, "vec_logical" holds no relation to
> integer "logical").

I can remove the underscore to match existing types and change extend->exts. As 
for the vec_logical/integer logical point, not sure I'm understanding, these 
are the vector forms of and/or/xor/etc.

Thanks,
Pat



Re: [PATCH] Fix FFI return type for closures in the java interpreter

2016-06-27 Thread Tom Tromey
> "Matthew" == Matthew Fortune  writes:

Matthew> I've identified a latent bug in the java interpreter that affects MIPS
Matthew> n32 and n64 ABIs both little and big endian and, I presume, any 64-bit
Matthew> big endian target with int as 32-bit.

Thanks.

Matthew> I mentioned in my earlier post about a possible similar issue in the
Matthew> lang/reflect/natVMProxy.cc code (unbox function) by code inspection. I
Matthew> don't know how to trigger this code but perhaps someone can advise.

It's a bit complicated, and it's been a while since I looked at any of
this, but I think what you want to do is make an InvocationHandler that
handles some method returning "int" (maybe hashCode would work), then
make a Proxy class that wraps it.  Then, make an instance of the proxy
class and call the method.

Matthew> libjava/
Matthew>* interpret-run.cc: Use ffi_arg for FFI integer return types.
Matthew> libjava/testsuite/
Matthew>* libjava.jar/arraysort.java: New file.
Matthew>* libjava.jar/arraysort.jar: New file.
Matthew>* libjava.jar/arraysort.out: New file.
Matthew>* libjava.jar/arraysort.xfail: New file.

This is ok.

Tom


Re: Ping Re: Implement C _FloatN, _FloatNx types [version 3]

2016-06-27 Thread Bill Schmidt
Hi Joseph,

> On Jun 27, 2016, at 12:21 PM, Joseph Myers  wrote:
> 
> Ping.  This patch 
>  is pending 
> review.  Built-in functions are available in the followup patch 
> .

I can't ack the patch, but the rs6000 bits of your original patch look fine.
I didn't mean to ask you to change those -- I hadn't read the patch
and was just commenting on your description of the patch, and I realize
now that I wasn't careful in my use of language.  I apologize for the
miscommunication there.  We should keep the check on
FLOAT128_IEEE_P to determine which mode is __float128 for now,
until the whole issue of two 128-bit floats is behind us.

Best regards,
Bill

> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com
> 



Re: [PATCH] Document behavior of __builtin_*_overflow_p on bitfields

2016-06-27 Thread Joseph Myers
On Mon, 27 Jun 2016, Jakub Jelinek wrote:

> Hi!
> 
> While the docs say that no integral argument promotions are performed, I
> think it is better to make the behavior for bit-fields explicitly
> documented.
> 
> Ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Bug libstdc++/71640] [7 Regression] include/c++/7.0.0/bits/hashtable.h:293:7: error: too many template parameters in template redeclaration

2016-06-27 Thread François Dumont
Trivial attached patch applied to fix this regression. I am surprised 
that gcc had not detected it.


2016-06-27  François Dumont  

PR libstdc++/71640
* include/bits/hashtable.h: Remove _Unique_keysa parameter in _Insert
friend declaration.


On 26/06/2016 18:21, pinskia at gcc dot gnu.org wrote:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71640

Andrew Pinski  changed:

What|Removed |Added

Target Milestone|--- |7.0



Index: include/bits/hashtable.h
===
--- include/bits/hashtable.h	(revision 237802)
+++ include/bits/hashtable.h	(working copy)
@@ -294,7 +294,7 @@
 	   typename _ExtractKeya, typename _Equala,
 	   typename _H1a, typename _H2a, typename _Hasha,
 	   typename _RehashPolicya, typename _Traitsa,
-	   bool _Constant_iteratorsa, bool _Unique_keysa>
+	   bool _Constant_iteratorsa>
 	friend struct __detail::_Insert;
 
 public:


Re: [PATCH, rs6000] Scheduling update

2016-06-27 Thread Segher Boessenkool
On Mon, Jun 27, 2016 at 09:54:09AM -0500, Pat Haugen wrote:
> On 06/22/2016 02:10 PM, Segher Boessenkool wrote:
> > The "power9_alu2" attribute is writing part of the scheduling description
> > inside the machine description proper.  Can this be reduced, maybe by
> > adding an attribute describing something about the insns that makes them
> > be handled by the alu2?  I realise it isn't all so regular :-(
>  
>  
> >> > +; 2 cycle FP ops
> >> > +(define_attr "power9_fp_2cyc" "no,yes"
> >> > +  (cond [(eq_attr "mnemonic" "fabs,fcpsgn,fmr,fmrgow,fnabs,fneg,\
> >> > +  xsabsdp,xscpsgndp,xsnabsdp,xsnegdp,\
> >> > +  xsabsqp,xscpsgnqp,xsnabsqp,xsnegqp")
> >> > + (const_string "yes")]
> >> > +(const_string "no")))
> > Eww.  Can we have an attribute for the FP move instructions, instead?
> > Maybe a value "fpmove" for the "type", even?
> 
> The following patch adds new insn 'type' values that will be used for the 
> Power9 patch to overcome the items listed above. There is no functional 
> change to existing processor types. Bootstrap/regtested on 
> powerpc64/powerpc64le with no new failures. Ok for trunk? Ok for backport to 
> GCC 6 branch after successful bootstrap/regtest there?

Hi Pat,

> * config/rs6000/rs6000.md ('type' attribute): Add
> vec_logical,veccmp_fx,vec_extend,vecmove insn types.

Those names are a bit irregular (underscore vs. no underscore after "vec",
"extend" is called "exts" for integer, "vec_logical" holds no relation to
integer "logical").

That said...  If this makes the power9 scheduling patch better, okay
for trunk and 6 later.  So please wait for a review of that patch.

Thanks,


Segher


Re: [PATCH, rs6000] Fix PR target 71656, reload ICE when -mcpu=power9 -mpower9-dform-vector

2016-06-27 Thread Segher Boessenkool
On Sat, Jun 25, 2016 at 07:14:01PM -0500, Peter Bergner wrote:
> This patch fixes PR71656 by adding support to rs6000_legitimize_reload_address
> for POWER9 vector-dform addresses.  Previously, -mpower9-dform-vector was
> disabled when using reload due to this bug and it was not added to the default
> ISA 3.0 flags.  Now that -mpower9-dform-vector works with reload, I have
> added it to the default ISA 3.0 flags and removed the disabling of the
> option when using -mno-lra.  I also updated rs6000_legitimate_address_p
> so that it too correctly handles quad addresses.
> 
> This has bootstrapped and regtested with no regressions on powerpc64le-linux.
> Ok for trunk?
> 
> This bug also affects the FSF 6 branch.  Ok for that branch after the patch
> has burned in a while on trunk and after the usual bootstrap and regtesting?

Okay for trunk, okay for 6 later.  One comment:

> +  if (VECTOR_MODE_P (mode)
> +  && !mode_supports_vsx_dform_quad (mode))
> +return false;
>  
>if (GET_CODE (addr) != PLUS)
>  return false;
>  
>op0 = XEXP (addr, 0);
> -  if (!base_reg_operand (op0, Pmode))
> +  if (!REG_P (op0)
> +  || !INT_REG_OK_FOR_BASE_P (op0, strict))
>  return false;

Just put these short conditionals on one line each?  It looks silly ;-)

Thanks,


Segher


[PATCH] Fix FFI return type for closures in the java interpreter

2016-06-27 Thread Matthew Fortune
Hi,

I've identified a latent bug in the java interpreter that affects MIPS
n32 and n64 ABIs both little and big endian and, I presume, any 64-bit
big endian target with int as 32-bit.
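
As an illustration of the failure mode (a standalone sketch, not code from
the patch; slot_t stands in for a register-sized return slot such as ffi_arg
on a 64-bit ABI, which is an assumption of this example): storing a 32-bit
value through a narrow pointer only overwrites part of the slot, so whoever
reads the full slot sees stale bytes, and on big endian the value lands in
the wrong half.

```c
#include <stdint.h>
#include <string.h>

/* Assumed register-sized return slot for this sketch.  */
typedef uint64_t slot_t;

/* Pattern representing whatever was in the slot before the store.  */
#define STALE UINT64_C (0xdeadbeefdeadbeef)

/* Store through the narrow type: only 4 of the 8 bytes are written.  */
static slot_t
store_as_int32 (int32_t value)
{
  slot_t slot = STALE;
  memcpy (&slot, &value, sizeof value);
  return slot;
}

/* Store through the slot type: the value is widened and fills the slot.  */
static slot_t
store_as_slot (int32_t value)
{
  slot_t slot = STALE;
  slot = (slot_t) (int64_t) value;
  return slot;
}
```

On either byte order store_as_int32 (42) does not read back as 42 when the
whole slot is read, while store_as_slot (42) does.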

A full description is in my original post:

https://gcc.gnu.org/ml/java-patches/2016-q2/msg00020.html

Patch tested on x86_64-pc-linux-gnu where the new testcase works both
before and after with no regressions. Also tested on mips64el-linux-gnu
where the test fails before and works after with no regressions.

Aurelien provided the test case and I tried to follow an existing test
to integrate it. Let me know if there is anything else I need to do.

I mentioned in my earlier post about a possible similar issue in the
lang/reflect/natVMProxy.cc code (unbox function) by code inspection. I
don't know how to trigger this code but perhaps someone can advise.

OK to commit?

Thanks,
Matthew

libjava/

* interpret-run.cc: Use ffi_arg for FFI integer return types.

libjava/testsuite/

* libjava.jar/arraysort.java: New file.
* libjava.jar/arraysort.jar: New file.
* libjava.jar/arraysort.out: New file.
* libjava.jar/arraysort.xfail: New file.
---
 libjava/interpret-run.cc  |   2 +-
 libjava/testsuite/libjava.jar/arraysort.jar   | Bin 0 -> 1864 bytes
 libjava/testsuite/libjava.jar/arraysort.java  |  44 ++
 libjava/testsuite/libjava.jar/arraysort.out   |  10 ++
 libjava/testsuite/libjava.jar/arraysort.xfail |   1 +
 5 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 libjava/testsuite/libjava.jar/arraysort.jar
 create mode 100644 libjava/testsuite/libjava.jar/arraysort.java
 create mode 100644 libjava/testsuite/libjava.jar/arraysort.out
 create mode 100644 libjava/testsuite/libjava.jar/arraysort.xfail

diff --git a/libjava/interpret-run.cc b/libjava/interpret-run.cc
index a4c2d4d..6be354e 100644
--- a/libjava/interpret-run.cc
+++ b/libjava/interpret-run.cc
@@ -1838,7 +1838,7 @@ details.  */
   return;
 
 insn_ireturn:
-  *(jint *) retp = POPI ();
+  *(ffi_arg *) retp = POPI ();
   return;
 
 insn_return:
diff --git a/libjava/testsuite/libjava.jar/arraysort.jar 
b/libjava/testsuite/libjava.jar/arraysort.jar
new file mode 100644
index 
..ee051e45a836f99563e3e15ce748795d01e87e4e
GIT binary patch
literal 1864
zcmWIWW@h1H00Eo28y;W=l;C8LVeoYgan$wnbJGtE;bdT59F`k?28c^5xEUB(K+3>G
z0MG~#AcuqDY3hc|A-l0JQ1!kM^U9M!>OsNp&~w^S#_!D?W9NbpB}La^B>!6
zf679aMLA>g&6#s+KR(F@shrQeTEeQ8w}Z?^h=dPVkenOk>e|0rMi-+iCS#$T&;
zxb+?@$^5Q+eCgY*AKuMvY}5IuFJ*KjdYf*_

Re: [PATCH], Add PowerPC ISA 3.0 lxsihzx, lxsibzx, stxsihx, stxsibx support

2016-06-27 Thread Segher Boessenkool
On Thu, Jun 23, 2016 at 06:37:15PM -0400, Michael Meissner wrote:
> PowerPC ISA 3.0 adds new instructions (LXSIHZX, LXSIBZX, STXSIHX, and STXSIBX)
> that allow you to load and zero extend byte and half word values from memory
> and to store them back.

Okay for trunk with fixes below; okay for 6 later.

> @@ -872,7 +878,6 @@ (define_insn_and_split "*zero_extendsi (set_attr "dot" "yes")
> (set_attr "length" "4,8")])
>  
> -
>  (define_insn "extendqi2"
>[(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r")
>   (sign_extend:EXTQI (match_operand:QI 1 "gpc_reg_operand" "r")))]

Unrelated whitespace change, please don't.

> @@ -5188,6 +5193,107 @@ (define_insn_and_split "*floatunssidf2_i
>[(set_attr "length" "20")
> (set_attr "type" "fp")])
>  
> +;; ISA 3.0 adds instructions lxsi[bh]zx to directly load QImode and HImode to
> +;; vector registers.  At the moment, QI/HImode are not allowed in floating
> +;; point or vector registers, so we use UNSPEC's to use the load byte and
> +;; half-word instructions.
> +
> +(define_expand "float2"
> +  [(parallel [(set (match_operand:FP_ISA3 0 "vsx_register_operand" "")
> +(float:FP_ISA3
> + (match_operand:QHI 1 "input_operand" "")))
> +   (clobber (match_scratch:DI 2 ""))
> +   (clobber (match_scratch:DI 3 ""))])]

Drop the "" please.

> +  "TARGET_P9_VECTOR && TARGET_DIRECT_MOVE && TARGET_POWERPC64"
> +{
> +  if (MEM_P (operands[1]))
> +operands[1] = rs6000_address_for_fpconvert (operands[1]);

That function should get a better name, it's not used solely for fpconvert
anymore.  I'm not asking you to do this now, it should be a separate patch
anyway ;-)

Thanks,


Segher


Re: OpenACC wait clause

2016-06-27 Thread Jakub Jelinek
On Mon, Jun 27, 2016 at 11:36:26AM -0700, Cesar Philippidis wrote:
> With that aside, this patch should correct the issues you pointed out.
> gfc_match_omp_clauses now uses a common function to parse the gang,
> worker and vector clauses. Also, this patch takes extra care with
> MATCH_ERRORs when parsing the wait and async clauses and the wait
> directive. One oddity I noticed is how the fortran FE accepts directives
> with clauses specified like this
> 
>   !$omp parallel if(.true.)private(x)

That is completely intentional, in that case the whitespace isn't needed to
separate tokens.  It is only a matter of coding style then.
Some people will write
!$omp parallel if(.true.) private(x)
others
!$omp parallel if ( .true. ) , private ( x )
etc.

> Shouldn't there be some sort of separator between if and private? I know
> that it's easy to distinguish the clauses because of the parenthesis,
> but it still looks weird.

If the user wants a separator, they can add one, but the standard doesn't require it.
In C you can also write if(cond)if(cond2)if(cond3){foo(x,y,z);}else{bar(x);}
if you want, similarly Fortran.  OpenMP/Fortran even allows
!$omp endparalleldosimd
and similar, you don't have to use spaces in between end parallel do simd.
For fixed form of course even more whitespace is optional.
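
A small C illustration of the same point (a sketch; both function names are
made up).  The two functions below differ only in optional whitespace and
behave identically:

```c
/* Written without any optional whitespace, on purpose.  */
static int classify(int a,int b){if(a){if(b)return 2;else return 1;}return 0;}

/* The same function with conventional spacing.  */
static int
classify_spaced (int a, int b)
{
  if (a)
    {
      if (b)
        return 2;
      else
        return 1;
    }
  return 0;
}
```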

> @@ -630,9 +653,10 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t 
> mask,
>  {
>gfc_omp_clauses *c = gfc_get_omp_clauses ();
>locus old_loc;
> +  bool seen_error = false;
>  
>*cp = NULL;
> -  while (1)
> +  while (!seen_error)
>  {
>if ((first || gfc_match_char (',') != MATCH_YES)
> && (needs_space && gfc_match_space () != MATCH_YES))

Why?
The main loop is while (1) which has break; as the last statement.
Instead of setting seen_error, just set
gfc_current_locus = old_loc;
if you have already matched successfully something, and then break;
as you already do.

> @@ -1275,9 +1309,16 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t 
> mask,
>   continue;
> if ((mask & OMP_CLAUSE_TILE)
> && !c->tile_list
> -   && match_oacc_expr_list ("tile (", &c->tile_list,
> -true) == MATCH_YES)
> - continue;
> +   && gfc_match ("tile") == MATCH_YES)
> + {
> +   if (match_oacc_expr_list (" (", &c->tile_list, true) != MATCH_YES)
> + {
> +   seen_error = true;
> +   break;
> + }
> +   needs_space = true;
> +   continue;
> + }
> if ((mask & OMP_CLAUSE_TO)
> && gfc_match_omp_variable_list ("to (",
> &c->lists[OMP_LIST_TO], false,

So, tile without ()s is also a valid clause in OpenACC?
If yes, what do you set in c structure for the existence of the clause?
If it is not valid, then the above change looks wrong.

> @@ -1309,10 +1350,13 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t 
> mask,
> && gfc_match ("vector") == MATCH_YES)
>   {
> c->vector = true;
> -   if (gfc_match (" ( length : %e )", &c->vector_expr) == MATCH_YES
> -   || gfc_match (" ( %e )", &c->vector_expr) == MATCH_YES)
> - needs_space = false;
> -   else
> +   match m = match_oacc_clause_gwv(c, GOMP_DIM_VECTOR);

Formatting, space before (.

> @@ -1348,7 +1402,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t 
> mask,
>break;
>  }
>  
> -  if (gfc_match_omp_eos () != MATCH_YES)
> +  if (seen_error || gfc_match_omp_eos () != MATCH_YES)
>  {
>gfc_free_omp_clauses (c);
>return MATCH_ERROR;

Again, IMHO not needed, if you restore the old_loc into gfc_current_locus,
then gfc_match_omp_eos () will surely fail.
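
The pattern can be sketched with a toy parser (standalone C, not gfortran
code; struct parser, match_str and match_clauses are hypothetical names, and
the toy grammar only knows "async" and "wait" with an optional "(digits)"
list).  On a malformed clause the position is rewound to where the clause
started and the loop is left; the end-of-statement check then fails on its
own, so no separate error flag is needed:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct parser
{
  const char *cur;  /* current position in the statement text */
};

static void
skip_spaces (struct parser *p)
{
  while (*p->cur == ' ')
    p->cur++;
}

/* Consume S if the input starts with it.  */
static bool
match_str (struct parser *p, const char *s)
{
  size_t n = strlen (s);
  if (strncmp (p->cur, s, n) != 0)
    return false;
  p->cur += n;
  return true;
}

/* Return true if TEXT consists only of well-formed clauses.  */
static bool
match_clauses (const char *text)
{
  struct parser p = { text };

  for (;;)
    {
      skip_spaces (&p);
      const char *old_loc = p.cur;

      if (match_str (&p, "async"))
        continue;

      if (match_str (&p, "wait"))
        {
          if (!match_str (&p, "("))
            continue;  /* bare "wait" is fine */
          const char *digits = p.cur;
          while (*p.cur >= '0' && *p.cur <= '9')
            p.cur++;
          if (p.cur != digits && match_str (&p, ")"))
            continue;
          /* Malformed list: rewind to the start of the clause.  */
          p.cur = old_loc;
        }
      break;
    }

  /* End-of-statement check; fails whenever we rewound above.  */
  skip_spaces (&p);
  return *p.cur == '\0';
}
```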

Jakub


Re: OpenACC wait clause

2016-06-27 Thread Cesar Philippidis
On 06/24/2016 08:53 AM, Jakub Jelinek wrote:
> On Fri, Jun 24, 2016 at 08:42:49AM -0700, Cesar Philippidis wrote:
 @@ -1328,7 +1328,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, 
 uint64_t mask,
  && gfc_match ("wait") == MATCH_YES)
{
  c->wait = true;
 -match_oacc_expr_list (" (", &c->wait_list, false);
 +if (match_oacc_expr_list (" (", &c->wait_list, false) == MATCH_NO)
 +  needs_space = true;
  continue;
}
  if ((mask & OMP_CLAUSE_WORKER)
>>>
>>> I think it is still problematic.  Most of the parsing fortran FE errors are 
>>> deferred,
>>> meaning that if you don't reject the whole gfc_match_omp_clauses, then no
>>> diagnostics is actually emitted.
>>
>> What exactly is the problem here? Do you want more precise errors or do
>> you want to keep the errors generic and deferred?
> 
> The problem is that if you ignore MATCH_ERROR and don't reject the stmt,
> then invalid source will be accepted, which is not acceptable.
> 
> I admit the Fortran OpenMP/OpenACC error recovery and diagnostics is not
> very good, but it generally follows the model done elsewhere in the Fortran
> FE.  So, I think before changing anything substantial first the bugs in
> there (where MATCH_ERROR is ignored) should be fixed (something that can be
> also backported), and only afterwards we should be discussing some plan to
> improve the infrastructure (like for clauses if there are any non-fatal
> errors in the parsing emit the diagnostics immediately, don't add the
> clauses to the FE IL and don't reject the whole stmt).

That's reasonable. I wasn't considering the gcc6 backport.

Improving the infrastructure is definitely a good idea. I'll need to
clear that with Thomas though. As a heads up, we've been getting a
lot of customer feedback to make the compiler emit more performance
diagnostics in general. Basically, they want to know what type of
parallelization the compiler detected and applied (e.g., which loops are
independent and were vectorized/partitioned across gang, workers and
vectors), along with missed optimizations.

With that aside, this patch should correct the issues you pointed out.
gfc_match_omp_clauses now uses a common function to parse the gang,
worker and vector clauses. Also, this patch takes extra care with
MATCH_ERRORs when parsing the wait and async clauses and the wait
directive. One oddity I noticed is how the fortran FE accepts directives
with clauses specified like this

  !$omp parallel if(.true.)private(x)

Shouldn't there be some sort of separator between if and private? I know
that it's easy to distinguish the clauses because of the parenthesis,
but it still looks weird.

With that oddity aside, is this patch OK for trunk and gcc6?

Cesar
2016-06-27  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (match_oacc_clause_gang): Rename to ...
	(match_oacc_clause_gwv): this.  Add support for OpenACC worker and
	vector clauses.
	(gfc_match_omp_clauses): Use match_oacc_clause_gwv for
	OMP_CLAUSE_{GANG,WORKER,VECTOR}.  Propagate any MATCH_ERRORs for
	invalid OMP_CLAUSE_{ASYNC,WAIT,GANG,WORKER,VECTOR} clauses.
	(gfc_match_oacc_wait): Propagate MATCH_ERROR for invalid
	oacc_expr_lists.  Adjust the first and needs_space arguments to
	gfc_match_omp_clauses.

	gcc/testsuite/
	* gfortran.dg/goacc/asyncwait-2.f95: Updated expected diagnostics.
	* gfortran.dg/goacc/asyncwait-3.f95: Likewise.
	* gfortran.dg/goacc/asyncwait-4.f95: Add test coverage.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index f514866..fa6587e 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -396,43 +396,66 @@ cleanup:
 }
 
 static match
-match_oacc_clause_gang (gfc_omp_clauses *cp)
+match_oacc_clause_gwv (gfc_omp_clauses *cp, unsigned gwv)
 {
   match ret = MATCH_YES;
 
   if (gfc_match (" ( ") != MATCH_YES)
 return MATCH_NO;
 
-  /* The gang clause accepts two optional arguments, num and static.
- The num argument may either be explicit (num: ) or
- implicit without ( without num:).  */
-
-  while (ret == MATCH_YES)
+  if (gwv == GOMP_DIM_GANG)
 {
-  if (gfc_match (" static :") == MATCH_YES)
+/* The gang clause accepts two optional arguments, num and static.
+	 The num argument may either be explicit (num: ) or
+	 implicit without ( without num:).  */
+
+  while (ret == MATCH_YES)
 	{
-	  if (cp->gang_static)
-	return MATCH_ERROR;
+	  if (gfc_match (" static :") == MATCH_YES)
+	{
+	  if (cp->gang_static)
+		return MATCH_ERROR;
+	  else
+		cp->gang_static = true;
+	  if (gfc_match_char ('*') == MATCH_YES)
+		cp->gang_static_expr = NULL;
+	  else if (gfc_match (" %e ", &cp->gang_static_expr) != MATCH_YES)
+		return MATCH_ERROR;
+	}
 	  else
-	cp->gang_static = true;
-	  if (gfc_match_char ('*') == MATCH_YES)
-	cp->gang_static_expr = NULL;
-	  else if (gfc_match (" %e ", &cp->gang_static_expr) != MATCH_YES)
-	return 

Re: Implement loop guard heuristics

2016-06-27 Thread Jan Hubicka
> Jan Hubicka  writes:
> > Hi,
> > this patch implements loop guard heuristics predicting that if one loop is
> > nested in another and controlled by a conditional that conditional is likely
> > false. This helps to get more realistic predictions in larger loop nests.
> 
> Just curious: what's the basis for the condition being likely false?
> (Sorry if this is a well-known heuristic.)

Well, the reason is that if you have a large loop nest there must be some way
to trim it down.  The "science" behind predictors is that they are driven
by real world data. You implement a predictor, collect profile feedback and
then use the analyze_brprob script to tell you how often a given predictor matches.

These values are then fed into predict.def and predict.c uses the Dempster-Shafer
method to combine the hypotheses into a likely outcome.
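
For two predictors, the Dempster-Shafer combination reduces to a short
formula.  A minimal sketch in floating point (the function name is made up,
and this is not the predict.c code, which works on fixed-point
probabilities):

```c
/* Combine two independent estimates P1 and P2 of the probability that
   a branch is taken.  Assumes the two estimates do not flatly
   contradict each other (one being 1.0 and the other 0.0), since that
   would make the denominator zero.  */
static double
combine_predictions (double p1, double p2)
{
  double both_say_taken = p1 * p2;
  double both_say_not_taken = (1.0 - p1) * (1.0 - p2);
  return both_say_taken / (both_say_taken + both_say_not_taken);
}
```

A predictor with hit rate 0.5 carries no information, so combining 0.5 with
0.7 gives 0.7 again, while two agreeing predictors at 0.6 each reinforce
one another and give about 0.69.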

Here are data collected by Martin (Liska) on SPEC.  Loop guard predicts
1109 branches statically, executed 6531156943 times, and the prediction
is correct in 61.88% of cases.

Not the most reliable heuristic, but it did cause quite interesting improvements
this weekend
https://gcc.opensuse.org/SPEC/CINT/sb-vangelis-head-64/recent.html
(gzip seems to get off-noise improvements at least)

HEURISTICS                               BRANCHES   (REL)  HITRATE                 COVERAGE  COVERAGE   (REL)
loop guard with recursion                       3    0.0%   17.17% /  93.91%           7717     7.72K    0.0%
no prediction                               10690   18.6%   33.65% /  85.08%   163062046405   163.06G   15.6%
loop iv compare                                40    0.1%   52.06% /  52.15%        8806870     8.81M    0.0%
early return (on trees)                      6599   11.5%   54.39% /  86.51%    33950850516    33.95G    3.2%
loop guard                                   1109    1.9%   61.88% /  88.38%     6531156943     6.53G    0.6%
opcode values positive (on trees)            3368    5.9%   64.55% /  90.39%    17957835504    17.96G    1.7%
continue                                      505    0.9%   66.66% /  82.85%    10086801881    10.09G    1.0%
call                                        11893   20.7%   67.26% /  92.26%    34994677942    34.99G    3.3%
opcode values nonequal (on trees)            7118   12.4%   67.63% /  81.38%    74854159732    74.85G    7.2%
loop iterations                              2761    4.8%   67.99% /  67.99%   408310259008   408.31G   39.1%
const return                                  271    0.5%   69.39% /  87.09%      301566712   301.57M    0.0%
pointer (on trees)                           5990   10.4%   69.59% /  87.18%    16667684592    16.67G    1.6%
combined                                    57560  100.0%   69.74% /  80.61%  1044875078660     1.04T  100.0%
DS theory                                   29243   50.8%   70.14% /  86.40%   160792553051   160.79G   15.4%
loop exit with recursion                       40    0.1%   72.17% /  92.33%      454740027   454.74M    0.0%
recursive call                                 65    0.1%   75.19% /  76.33%      189825093   189.83M    0.0%
first match                                 17627   30.6%   77.81% /  78.31%   721020479204   721.02G   69.0%
extra loop exit                               139    0.2%   82.80% /  88.17%     1696816070     1.70G    0.2%
loop exit                                    2813    4.9%   85.36% /  87.83%    60086533565    60.09G    5.8%
null return                                   393    0.7%   91.47% /  93.08%     3268678197     3.27G    0.3%
guessed loop iterations                      8084   14.0%   91.73% /  92.49%   242056066564   242.06G   23.2%
guess loop iv compare                         207    0.4%   97.75% /  97.79%     4319186982     4.32G    0.4%
negative return                               277    0.5%   97.94% /  99.23%     1062119028     1.06G    0.1%
Fortran loop preheader                       2536    4.4%   99.81% /  99.88%     6138904177     6.14G    0.6%
noreturn call                                2468    4.3%  100.00% / 100.00%     8232182915     8.23G    0.8%
Fortran fail alloc                            393    0.7%  100.00% / 100.00%            393    393.00    0.0%
Fortran repeated allocation/deallocation      218    0.4%  100.00% / 100.00%            218    218.00    0.0%
Fortran zero-sized array                      677    1.2%  100.00% / 100.00%      112723807   112.72M    0.0%
Fortran overflow                             1282    2.2%  100.00% / 100.00%      175074185   175.07M    0.0%

Loop count: 17886
  avg. # of iter: 12141.20
  median # of iter: 10.00
  avg. (1% cutoff) # of iter: 1038.28
  avg. (5% cutoff) # of iter: 464.53
  avg. (10% cutoff) # of iter: 222.38
  avg. (20% cutoff) # of iter: 64.87
  avg. (30% cutoff) # of iter: 44.06


[PATCH] Document behavior of __builtin_*_overflow_p on bitfields

2016-06-27 Thread Jakub Jelinek
Hi!

While the docs say that no integral argument promotions are performed, I
think it is better to make the behavior for bit-fields explicitly
documented.
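
A small example of what this means in practice (a sketch; the struct and
function names are made up, and the built-in is only assumed to be available
in GCC 7 and later, so other compilers take a hand-written fallback):

```c
struct flags
{
  unsigned int small : 3;  /* can hold 0..7 */
};

/* Return nonzero if A + B does not fit in the 3-bit unsigned
   bit-field, even though it would fit in unsigned int.  */
static int
sum_overflows_3bit (int a, int b)
{
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 7
  struct flags f = { 0 };
  /* Only the type of the third argument matters; its value is ignored.  */
  return __builtin_add_overflow_p (a, b, f.small);
#else
  /* Fallback for compilers without the built-in: check the range by hand.  */
  long long sum = (long long) a + b;
  return sum < 0 || sum > 7;
#endif
}
```

Here 3 + 4 fits, while 4 + 4 is reported as overflowing because the result
is checked against the bit-field's 3-bit precision rather than against
unsigned int.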

Ok for trunk?

2016-06-27  Jakub Jelinek  

* doc/extend.texi (__builtin_add_overflow_p): Clarify behavior when
last argument is a bit-field.

--- gcc/doc/extend.texi.jj  2016-06-25 19:18:39.0 +0200
+++ gcc/doc/extend.texi 2016-06-27 13:58:34.209076739 +0200
@@ -9888,6 +9888,9 @@ cast to the type of the third argument.
 precision result, the built-in functions return false, otherwise they return 
true.
 The value of the third argument is ignored, just the side-effects in the third 
argument
 are evaluated, and no integral argument promotions are performed on the last 
argument.
+If the third argument is a bit-field, the type used for the result cast has the
+precision and signedness of the given bit-field, rather than precision and 
signedness
+of the underlying type.
 
 For example, the following macro can be used to portably check, at
 compile-time, whether or not adding two constant integers will overflow,

Jakub


[PATCH] Fix ICE with __builtin_*_overflow_p on bitfields (PR rtl-optimization/71673)

2016-06-27 Thread Jakub Jelinek
Hi!

On targets which don't support sub-word operations trying OPTAB_DIRECT
AND on e.g. QImode or HImode leads to NULL being returned, so we ICE on
builtin-arith-overflow-p-19.c e.g. on arm, aarch64 or sparc*.
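
The difference between the two strategies can be pictured with a toy model
(standalone C, not the optabs API; the target description and all function
names are hypothetical, and widths are assumed to be 8, 16 or 32).  A
"direct" attempt gives up when the target has no instruction at exactly the
requested width, while a "widen" attempt falls back to a wider operation and
truncates the result:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical target that only has a 32-bit AND instruction.  */
static bool
target_has_and (unsigned bits)
{
  return bits == 32;
}

/* Direct strategy: succeed only if the exact width is supported.  */
static bool
and_direct (unsigned bits, uint32_t a, uint32_t b, uint32_t *out)
{
  if (!target_has_and (bits))
    return false;
  *out = a & b;
  return true;
}

/* Widening strategy: try the requested width, then wider ones, and
   truncate the result back to BITS bits.  */
static bool
and_widen (unsigned bits, uint32_t a, uint32_t b, uint32_t *out)
{
  for (unsigned w = bits; w <= 32; w *= 2)
    if (target_has_and (w))
      {
        uint32_t mask = bits >= 32 ? UINT32_MAX
                                   : (UINT32_C (1) << bits) - 1;
        *out = (a & b) & mask;
        return true;
      }
  return false;
}
```

In this model an 8-bit request fails with and_direct and succeeds with
and_widen, which mirrors the NULL result described above.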

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, tested
with cross-compiler on the testcase, ok for trunk?

2016-06-27  Jakub Jelinek  

PR rtl-optimization/71673
* internal-fn.c (expand_arith_overflow_result_store): Use
OPTAB_LIB_WIDEN instead of OPTAB_DIRECT as last argument to
expand_simple_binop.

--- gcc/internal-fn.c.jj2016-06-24 13:01:58.0 +0200
+++ gcc/internal-fn.c   2016-06-27 13:38:25.753237581 +0200
@@ -454,7 +454,7 @@ expand_arith_overflow_result_store (tree
= immed_wide_int_const (wi::shifted_mask (0, prec, false, tgtprec),
tgtmode);
  lres = expand_simple_binop (tgtmode, AND, res, mask, NULL_RTX,
- true, OPTAB_DIRECT);
+ true, OPTAB_LIB_WIDEN);
}
   else
{

Jakub


[PATCH] Fix ICE with V1DImode ctor (PR middle-end/71626)

2016-06-27 Thread Jakub Jelinek
Hi!

This patch is an attempt to fix ICE on the following testcase.
In output_constant_pool_2 we assume that for vector modes, force_const_mem
must be a CONST_VECTOR, but for the weirdo vector modes with single element
it could very well be just a SUBREG of some other constant.
This isn't enough though, e.g. for -fpic we shouldn't force it into constant
pool constant, because it needs relocation.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-06-27  Jakub Jelinek  

PR middle-end/71626
* output.h (compute_reloc_for_rtx): New prototype.
* varasm.c (output_constant_pool_2): Handle SUBREGs for V1??mode
vectors.
(compute_reloc_for_rtx): No longer static.
* config/i386/i386.c (ix86_expand_vector_move): For V1??mode SUBREG
of scalar constant that needs relocation, force the constant into
the inner mode reg first.

* gcc.c-torture/execute/pr71626-1.c: New test.
* gcc.c-torture/execute/pr71626-2.c: New test.

--- gcc/output.h.jj 2016-01-04 14:55:53.0 +0100
+++ gcc/output.h2016-06-27 12:21:41.529484513 +0200
@@ -349,6 +349,9 @@ extern bool decl_readonly_section (const
given a constant expression.  */
 extern int compute_reloc_for_constant (tree);
 
+/* Similarly, but for RTX.  */
+extern int compute_reloc_for_rtx (const_rtx);
+
 /* User label prefix in effect for this compilation.  */
 extern const char *user_label_prefix;
 
--- gcc/varasm.c.jj 2016-06-10 20:24:03.0 +0200
+++ gcc/varasm.c2016-06-27 12:21:36.062553957 +0200
@@ -3834,6 +3834,17 @@ output_constant_pool_2 (machine_mode mod
 machine_mode submode = GET_MODE_INNER (mode);
unsigned int subalign = MIN (align, GET_MODE_BITSIZE (submode));
 
+   /* For V1??mode, allow SUBREGs.  */
+   if (GET_CODE (x) == SUBREG
+   && GET_MODE_NUNITS (mode) == 1
+   && GET_MODE_BITSIZE (submode) == GET_MODE_BITSIZE (mode)
+   && SUBREG_BYTE (x) == 0
+   && GET_MODE (SUBREG_REG (x)) == submode)
+ {
+   output_constant_pool_2 (submode, SUBREG_REG (x), align);
+   break;
+ }
+
gcc_assert (GET_CODE (x) == CONST_VECTOR);
units = CONST_VECTOR_NUNITS (x);
 
@@ -6637,7 +6648,7 @@ compute_reloc_for_rtx_1 (const_rtx x)
is a mask for which bit 1 indicates a global relocation, and bit 0
indicates a local relocation.  */
 
-static int
+int
 compute_reloc_for_rtx (const_rtx x)
 {
   switch (GET_CODE (x))
--- gcc/config/i386/i386.c.jj   2016-06-24 12:59:29.0 +0200
+++ gcc/config/i386/i386.c  2016-06-27 12:23:15.504290801 +0200
@@ -19605,6 +19605,20 @@ ix86_expand_vector_move (machine_mode mo
   if (push_operand (op0, VOIDmode))
 op0 = emit_move_resolve_push (mode, op0);
 
+  /* For V1XXmode subregs of scalar constants, force the scalar into
+ reg first if it would need relocations.  */
+  if (SUBREG_P (op1)
+  && CONSTANT_P (SUBREG_REG (op1))
+  && VECTOR_MODE_P (mode)
+  && GET_MODE_NUNITS (mode) == 1
+  && GET_MODE (SUBREG_REG (op1)) == GET_MODE_INNER (mode)
+  && compute_reloc_for_rtx (SUBREG_REG (op1)))
+{
+  rtx r = force_reg (GET_MODE (SUBREG_REG (op1)), SUBREG_REG (op1));
+  op1 = simplify_gen_subreg (mode, r, GET_MODE (SUBREG_REG (op1)),
+SUBREG_BYTE (op1));
+}
+
   /* Force constants other than zero into memory.  We do not know how
  the instructions used to build constants modify the upper 64 bits
  of the register, once we have that information we may be able
--- gcc/testsuite/gcc.c-torture/execute/pr71626-1.c.jj  2016-06-27 12:31:51.689755677 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr71626-1.c 2016-06-27 12:31:05.078344165 +0200
@@ -0,0 +1,19 @@
+/* PR middle-end/71626 */
+
+typedef __INTPTR_TYPE__ V __attribute__((__vector_size__(sizeof (__INTPTR_TYPE__))));
+
+__attribute__((noinline, noclone)) V
+foo ()
+{
+  V v = { (__INTPTR_TYPE__) foo };
+  return v;
+}
+
+int
+main ()
+{
+  V v = foo ();
+  if (v[0] != (__INTPTR_TYPE__) foo)
+__builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.c-torture/execute/pr71626-2.c.jj  2016-06-27 12:31:54.837715933 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr71626-2.c 2016-06-27 12:31:47.977802542 +0200
@@ -0,0 +1,4 @@
+/* PR middle-end/71626 */
+/* { dg-additional-options "-fpic" { target fpic } } */
+
+#include "pr71626-1.c"

Jakub


Re: Implement loop guard heuristics

2016-06-27 Thread Richard Sandiford
Jan Hubicka  writes:
> Hi,
> this patch implements loop guard heuristics predicting that if one loop is
> nested in another and controlled by a conditional that conditional is likely
> false. This helps to get more realistic predictions in larger loop nests.

Just curious: what's the basis for the condition being likely false?
(Sorry if this is a well-known heuristic.)

Thanks,
Richard


Re: Importing gnulib into the gcc tree

2016-06-27 Thread Joseph Myers
On Sat, 25 Jun 2016, ayush goel wrote:

> Initially I have just imported the bcopy module from gnulib which will 
> eventually replace gcc’s dependency on libiberty’s bcopy. 

GCC should not depend on bcopy.  Any bcopy use is a bug and it should be 
replaced by memcpy or memmove as appropriate.  The poisoning in system.h 
should prevent such uses from building in the first place.
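
As an illustration of the replacement suggested above, a small sketch (the
helper names are hypothetical, not taken from GCC).  Note that bcopy takes
the source first, whereas memcpy and memmove take the destination first, so
the first two arguments swap when converting a call.

```c
#include <string.h>

/* Overlapping regions: memmove is the appropriate replacement.
   Old style would have been: bcopy (buf + 1, buf, len - 1);  */
static void
shift_left_by_one (char *buf, size_t len)
{
  if (len > 1)
    memmove (buf, buf + 1, len - 1);
}

/* Regions known not to overlap: memcpy is sufficient.
   Old style would have been: bcopy (src, dst, n);  */
static void
copy_disjoint (char *dst, const char *src, size_t n)
{
  memcpy (dst, src, n);
}
```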

You should only import gnulib modules for functionality actually used in 
GCC - remembering that GCC now depends on at least a C++98 compiler and 
library, so it's unlikely that modules for ISO C90 functionality are 
actually relevant on any currently supported host or build system.

For example, GCC uses obstacks, so the gnulib version of obstack is 
appropriate to replace the libiberty version.

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH,rs6000] Add support for Power9 DFP Test Significance Immediate instruction

2016-06-27 Thread Kelvin Nilsen

This patch adds built-in function support for the new Power9 dtstsfi
instruction.

This has bootstrapped and regression tested on
powerpc64le-unknown-linux-gnu without regressions.  Is this ok for the
trunk?  Is this patch ok for gcc-6 after some burn-in time on the trunk?

Thanks.

gcc/testsuite/ChangeLog:

2016-06-27  Kelvin Nilsen  

* gcc.target/powerpc/dtstsfi-0.c: New test.
* gcc.target/powerpc/dtstsfi-1.c: New test.
* gcc.target/powerpc/dtstsfi-10.c: New test.
* gcc.target/powerpc/dtstsfi-11.c: New test.
* gcc.target/powerpc/dtstsfi-12.c: New test.
* gcc.target/powerpc/dtstsfi-13.c: New test.
* gcc.target/powerpc/dtstsfi-14.c: New test.
* gcc.target/powerpc/dtstsfi-15.c: New test.
* gcc.target/powerpc/dtstsfi-16.c: New test.
* gcc.target/powerpc/dtstsfi-17.c: New test.
* gcc.target/powerpc/dtstsfi-18.c: New test.
* gcc.target/powerpc/dtstsfi-19.c: New test.
* gcc.target/powerpc/dtstsfi-2.c: New test.
* gcc.target/powerpc/dtstsfi-20.c: New test.
* gcc.target/powerpc/dtstsfi-21.c: New test.
* gcc.target/powerpc/dtstsfi-22.c: New test.
* gcc.target/powerpc/dtstsfi-23.c: New test.
* gcc.target/powerpc/dtstsfi-24.c: New test.
* gcc.target/powerpc/dtstsfi-25.c: New test.
* gcc.target/powerpc/dtstsfi-26.c: New test.
* gcc.target/powerpc/dtstsfi-27.c: New test.
* gcc.target/powerpc/dtstsfi-28.c: New test.
* gcc.target/powerpc/dtstsfi-29.c: New test.
* gcc.target/powerpc/dtstsfi-3.c: New test.
* gcc.target/powerpc/dtstsfi-30.c: New test.
* gcc.target/powerpc/dtstsfi-31.c: New test.
* gcc.target/powerpc/dtstsfi-32.c: New test.
* gcc.target/powerpc/dtstsfi-33.c: New test.
* gcc.target/powerpc/dtstsfi-34.c: New test.
* gcc.target/powerpc/dtstsfi-35.c: New test.
* gcc.target/powerpc/dtstsfi-36.c: New test.
* gcc.target/powerpc/dtstsfi-37.c: New test.
* gcc.target/powerpc/dtstsfi-38.c: New test.
* gcc.target/powerpc/dtstsfi-39.c: New test.
* gcc.target/powerpc/dtstsfi-4.c: New test.
* gcc.target/powerpc/dtstsfi-40.c: New test.
* gcc.target/powerpc/dtstsfi-41.c: New test.
* gcc.target/powerpc/dtstsfi-42.c: New test.
* gcc.target/powerpc/dtstsfi-43.c: New test.
* gcc.target/powerpc/dtstsfi-44.c: New test.
* gcc.target/powerpc/dtstsfi-45.c: New test.
* gcc.target/powerpc/dtstsfi-46.c: New test.
* gcc.target/powerpc/dtstsfi-47.c: New test.
* gcc.target/powerpc/dtstsfi-48.c: New test.
* gcc.target/powerpc/dtstsfi-49.c: New test.
* gcc.target/powerpc/dtstsfi-5.c: New test.
* gcc.target/powerpc/dtstsfi-50.c: New test.
* gcc.target/powerpc/dtstsfi-51.c: New test.
* gcc.target/powerpc/dtstsfi-52.c: New test.
* gcc.target/powerpc/dtstsfi-53.c: New test.
* gcc.target/powerpc/dtstsfi-54.c: New test.
* gcc.target/powerpc/dtstsfi-55.c: New test.
* gcc.target/powerpc/dtstsfi-56.c: New test.
* gcc.target/powerpc/dtstsfi-57.c: New test.
* gcc.target/powerpc/dtstsfi-58.c: New test.
* gcc.target/powerpc/dtstsfi-59.c: New test.
* gcc.target/powerpc/dtstsfi-6.c: New test.
* gcc.target/powerpc/dtstsfi-60.c: New test.
* gcc.target/powerpc/dtstsfi-61.c: New test.
* gcc.target/powerpc/dtstsfi-62.c: New test.
* gcc.target/powerpc/dtstsfi-63.c: New test.
* gcc.target/powerpc/dtstsfi-64.c: New test.
* gcc.target/powerpc/dtstsfi-65.c: New test.
* gcc.target/powerpc/dtstsfi-66.c: New test.
* gcc.target/powerpc/dtstsfi-67.c: New test.
* gcc.target/powerpc/dtstsfi-68.c: New test.
* gcc.target/powerpc/dtstsfi-69.c: New test.
* gcc.target/powerpc/dtstsfi-7.c: New test.
* gcc.target/powerpc/dtstsfi-70.c: New test.
* gcc.target/powerpc/dtstsfi-71.c: New test.
* gcc.target/powerpc/dtstsfi-72.c: New test.
* gcc.target/powerpc/dtstsfi-73.c: New test.
* gcc.target/powerpc/dtstsfi-74.c: New test.
* gcc.target/powerpc/dtstsfi-75.c: New test.
* gcc.target/powerpc/dtstsfi-76.c: New test.
* gcc.target/powerpc/dtstsfi-77.c: New test.
* gcc.target/powerpc/dtstsfi-78.c: New test.
* gcc.target/powerpc/dtstsfi-79.c: New test.
* gcc.target/powerpc/dtstsfi-8.c: New test.
* gcc.target/powerpc/dtstsfi-9.c: New test.


gcc/ChangeLog:

2016-06-27  Kelvin Nilsen  

* config/rs6000/altivec.md (darn_32): Change the condition to
TARGET_P9_MISC instead of TARGET_MODULO.
(darn_raw): Replace TARGET_MODULO with TARGET_P9_MISC in the
condition expression.
(darn): Replace TARGET_MODULO with TARGET_P9_MISC in the
condition expression.
* 

Re: [PATCH, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-27 Thread Bill Schmidt

> On Jun 27, 2016, at 12:41 PM, Richard Sandiford  
> wrote:
> 
> Joseph Myers  writes:
>> On Wed, 22 Jun 2016, Bill Schmidt wrote:
>>> The fact that I hook this built-in directly to a pattern named infkf1
>>> doesn't seem to preclude anything you suggest.  I named it this way
>>> on the off-chance that inf1 becomes a standard pattern in the
>>> future, in which case I want to generate this constant.  We can 
>>> always use gen_infkf1 to reuse this code in any other context.  I'm
>>> not understanding your objection.
>> 
>> That expander pattern is not useful given a target-independent built-in 
>> __builtin_inff128, since it will never be used except by a built-in 
>> function specifically associated with it.
>> 
>> I don't know what code will be generated for a use of _Float128 infinity, 
>> from the target-independent code - or, right now, for a use of 
>> (__float128) __builtin_inf ().  But if it's not the code you want, any 
>> reasonable fix would not be restricted to the case where __builtin_inff128 
>> () is used - it would work equally well for any case where that constant 
>> bit-pattern is wanted in VSX registers.
> 
> Yeah, I don't think we should have named patterns to generate constants.
> We should send the constant through the normal move patterns and make
> the expander or move define_insns handle them appropriately.

Agreed.  We'll plan on working the various interesting constants into the 
handling of the move insns.

Bill

> 
> Thanks,
> Richard
> 



Re: [PATCH, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-27 Thread Richard Sandiford
Joseph Myers  writes:
> On Wed, 22 Jun 2016, Bill Schmidt wrote:
>> The fact that I hook this built-in directly to a pattern named infkf1
>> doesn't seem to preclude anything you suggest.  I named it this way
>> on the off-chance that inf1 becomes a standard pattern in the
>> future, in which case I want to generate this constant.  We can 
>> always use gen_infkf1 to reuse this code in any other context.  I'm
>> not understanding your objection.
>
> That expander pattern is not useful given a target-independent built-in 
> __builtin_inff128, since it will never be used except by a built-in 
> function specifically associated with it.
>
> I don't know what code will be generated for a use of _Float128 infinity, 
> from the target-independent code - or, right now, for a use of 
> (__float128) __builtin_inf ().  But if it's not the code you want, any 
> reasonable fix would not be restricted to the case where __builtin_inff128 
> () is used - it would work equally well for any case where that constant 
> bit-pattern is wanted in VSX registers.

Yeah, I don't think we should have named patterns to generate constants.
We should send the constant through the normal move patterns and make
the expander or move define_insns handle them appropriately.

Thanks,
Richard


-fopt-info handling

2016-06-27 Thread Ulrich Drepper
The manual says about -fopt-info:

   If OPTIONS is omitted, it defaults to 'all-all', which means
dump all available optimization info from all the passes.

The current implementation (at least in recent GCC 6.1) doesn't follow
that, though.  It just ignores the option in that case.

How about the attached patch?  It is simple and doesn't duplicate the
information about what "all-all" means; instead it lets the option parser
do the hard work.


d-gcc-opt-info
Description: Binary data


Ping Re: Implement C _FloatN, _FloatNx types [version 3]

2016-06-27 Thread Joseph Myers
Ping.  This patch 
 is pending 
review.  Built-in functions are available in the followup patch 
.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, libgcc/ARM 1/6] Fix Thumb-1 only == ARMv6-M & Thumb-2 only == ARMv7-M assumptions

2016-06-27 Thread Thomas Preudhomme
Hi Ramana,

On Wednesday 01 June 2016 10:00:52 Ramana Radhakrishnan wrote:
> 
> From here down to 
> 
> > -#if ((__ARM_ARCH__ > 5) && !defined(__ARM_ARCH_6M__)) \
> > -|| defined(__ARM_ARCH_5E__) || defined(__ARM_ARCH_5TE__) \
> > -|| defined(__ARM_ARCH_5TEJ__)
> > -#define HAVE_ARM_CLZ 1
> > -#endif
> > -
> > 
> >  #ifdef L_clzsi2
> > 
> > -#if defined(__ARM_ARCH_6M__)
> > +#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1
> > 
> >  FUNC_START clzsi2
> >  
> > mov r1, #28
> > mov r3, #1
> > 
> > @@ -1544,7 +1538,7 @@ FUNC_START clzsi2
> > 
> > FUNC_END clzsi2
> >  
> >  #else
> >  ARM_FUNC_START clzsi2
> > 
> > -# if defined(HAVE_ARM_CLZ)
> > +# if defined(__ARM_FEATURE_CLZ)
> > 
> > clz r0, r0
> > RET
> >  
> >  # else
> > 
> > @@ -1568,15 +1562,15 @@ ARM_FUNC_START clzsi2
> > 
> >  .align 2
> >  1:
> >  .byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
> > 
> > -# endif /* !HAVE_ARM_CLZ */
> > +# endif /* !__ARM_FEATURE_CLZ */
> > 
> > FUNC_END clzsi2
> >  
> >  #endif
> >  #endif /* L_clzsi2 */
> >  
> >  #ifdef L_clzdi2
> > 
> > -#if !defined(HAVE_ARM_CLZ)
> > +#if !defined(__ARM_FEATURE_CLZ)
> 
> here should be it's own little patchlet and can  go in separately.

The patch in attachment changes the CLZ availability check in libgcc to test 
ISA supported and architecture version rather than encode a specific list of 
architectures. __ARM_FEATURE_CLZ is not used because its value depends on what 
mode the user is targeting but only the architecture support matters in this 
case. Indeed, the code using CLZ is written in assembler and uses mnemonics 
available both in ARM and Thumb mode so only CLZ availability in one of the 
mode matters.
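
For readers who do not have lib1funcs.S at hand, the following is a rough C
rendering of the table-driven fallback that clzsi2 uses when CLZ is not
available, based only on the fragments visible in the quoted diff (the
initial value 28 and the 16-entry byte table).  The real implementation is
assembly; the function and table names below are invented for this sketch.

```c
#include <stdint.h>

/* Leading-zero count of a 4-bit value, same contents as the .byte table
   in the quoted diff.  */
static const unsigned char clz_nibble_tab[16] =
  { 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 };

/* Software count-leading-zeros for 32-bit values; returns 32 for 0.  */
static unsigned
clz32_soft (uint32_t x)
{
  unsigned n = 28;	/* zeros above the lowest nibble */

  if (x >= (1u << 16)) { x >>= 16; n -= 16; }
  if (x >= (1u << 8))  { x >>= 8;  n -= 8; }
  if (x >= (1u << 4))  { x >>= 4;  n -= 4; }

  /* x now fits in four bits; finish with the table lookup.  */
  return clz_nibble_tab[x] + n;
}
```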

This change was split out from [PATCH, GCC, ARM 1/7] Fix Thumb-1 only == 
ARMv6-M & Thumb-2 only == ARMv7-M assumptions.

ChangeLog entry is as follows:

*** libgcc/ChangeLog ***

2016-06-16  Thomas Preud'homme  

* config/arm/lib1funcs.S (HAVE_ARM_CLZ): Define for ARMv6* or later
and ARMv5t* rather than for a fixed list of architectures.

Looking for code generation changes across a number of combinations of ISAs 
(ARM/Thumb), optimization levels (Os/O2), and architectures (armv4, armv4t, 
armv5, armv5t, armv5te, armv6, armv6j, armv6k, armv6s-m, armv6kz, armv6t2, 
armv6z, armv6zk, armv7, armv7-a, armv7e-m, armv7-m, armv7-r, armv7ve, armv8-a, 
armv8-a+crc, iwmmxt and iwmmxt2) shows that only ARMv5T is impacted (uses CLZ 
now). This is expected because currently HAVE_ARM_CLZ is not defined for this 
architecture while the ARMv7-a/ARMv7-R Architecture Reference Manual [1] 
states that all ARMv5T* architectures have CLZ. ARMv5E should also be impacted 
(not using CLZ anymore) but testing it is difficult since current binutils does 
not support ARMv5E.

[1] Document ARM DDI0406C in http://infocenter.arm.com

Best regards,

Thomas

diff --git a/libgcc/config/arm/lib1funcs.S b/libgcc/config/arm/lib1funcs.S
index 951dcda1c3bf7f323423a3e2813bdf0501653016..c4f061f8196d243159903cac4eb0291d1bf0b1ad 100644
--- a/libgcc/config/arm/lib1funcs.S
+++ b/libgcc/config/arm/lib1funcs.S
@@ -1512,9 +1512,10 @@ LSYM(Lover12):
 
 #endif /* __symbian__ */
 
-#if ((__ARM_ARCH__ > 5) && !defined(__ARM_ARCH_6M__)) \
-|| defined(__ARM_ARCH_5E__) || defined(__ARM_ARCH_5TE__) \
-|| defined(__ARM_ARCH_5TEJ__)
+#if (__ARM_ARCH_ISA_THUMB == 2	\
+ || (__ARM_ARCH_ISA_ARM	\
+	 && (__ARM_ARCH__ > 5	\
+	 || (__ARM_ARCH__ == 5 && __ARM_ARCH_ISA_THUMB))))
 #define HAVE_ARM_CLZ 1
 #endif
 


Re: [PATCH, PR middle-end/71488] Fix vectorization of comparison of booleans

2016-06-27 Thread Ilya Enkovich
Looks like it caused PR71655 and therefore is not so safe :/

Ilya

2016-06-22 17:00 GMT+03:00 Ilya Enkovich :
> 2016-06-21 23:57 GMT+03:00 Jeff Law :
>> On 06/16/2016 05:06 AM, Ilya Enkovich wrote:
>>>
>>> Hi,
>>>
>>> This patch fixes incorrect comparison vectorization for booleans.
>>> The problem is that regular comparison which works for scalars
>>> doesn't work for vectors due to different binary representation.
>>> Also this never works for scalar masks.
>>>
>>> This patch replaces such comparisons with bitwise operations
>>> which work correctly for both vector and scalar masks.
>>>
>>> Bootstrapped and regtested on x86_64-unknown-linux-gnu.  Is it
>>> OK for trunk?  What should be done for gcc-6-branch?  Port this
>>> patch or just restrict vectorization for comparison of booleans?
>>>
>>> Thanks,
>>> Ilya
>>> --
>>> gcc/
>>>
>>> 2016-06-15  Ilya Enkovich  
>>>
>>> PR middle-end/71488
>>> * tree-vect-patterns.c (vect_recog_mask_conversion_pattern):
>>> Support
>>> comparison of boolean vectors.
>>> * tree-vect-stmts.c (vectorizable_comparison): Vectorize
>>> comparison
>>> of boolean vectors using bitwise operations.
>>>
>>> gcc/testsuite/
>>>
>>> 2016-06-15  Ilya Enkovich  
>>>
>>> PR middle-end/71488
>>> * g++.dg/pr71488.C: New test.
>>> * gcc.dg/vect/vect-bool-cmp.c: New test.
>>
>> OK.  Given this is a code generation bug, I'll support porting this patch to
>> the gcc-6 branch.  Is there any reason to think that porting would be more
>> risky than usual?  It looks pretty simple to me, am I missing some subtle
>> dependency?
>
> I don't feel this patch is too risky.  I asked only because simple restriction
> on masks comparison is even more safe.
>
> Thanks,
> Ilya
>
>
>>
>> jeff
>>
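
As a side note on the quoted patch description ("replaces such comparisons
with bitwise operations"): the idea can be sketched on a packed scalar mask
where each bit is one boolean lane.  An ordinary integer comparison of two
such masks says nothing about the individual lanes, whereas the bitwise
forms below are correct per lane.  This is an illustration of the principle
under that assumption, with made-up helper names; it is not the code from
tree-vect-stmts.c.

```c
#include <stdint.h>

/* Each bit of the 8-bit mask is one boolean lane (false = 0, true = 1).  */

static uint8_t
mask_eq (uint8_t a, uint8_t b)	/* a == b, per lane */
{
  return (uint8_t) ~(a ^ b);
}

static uint8_t
mask_ne (uint8_t a, uint8_t b)	/* a != b, per lane */
{
  return (uint8_t) (a ^ b);
}

static uint8_t
mask_lt (uint8_t a, uint8_t b)	/* a < b, i.e. !a && b, per lane */
{
  return (uint8_t) (~a & b);
}

static uint8_t
mask_le (uint8_t a, uint8_t b)	/* a <= b, i.e. !a || b, per lane */
{
  return (uint8_t) (~a | b);
}
```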


Re: [Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory

2016-06-27 Thread Dominik Vogt
On Thu, Jun 09, 2016 at 09:52:39AM -0600, Jeff Law wrote:
> On 06/08/2016 10:47 AM, Jiong Wang wrote:
> >As discussed on the PR
> >
> >  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70751,
> >
> >here is the patch.

This commit has introduced an ICE with s390x, march=z13.  Is it a
backend bug or one in the middleend?

-- x.c --
void foo(int *n, int off, int *m)
{
  int i;

  for(i = 0 ;i <= 3; i++)
{
  n[off + i] = m[2 * i];
  n[off + 7 - i] = m[2 * i + 1];
}
}
-- snip --

  $ gcc -S -O3 -march=z13 x.c   
x.c: In function ‘foo’:
x.c:10:1: internal compiler error: Max. number of generated reload insns per insn is achieved (90)

 }
 ^
0x804c34b7 lra_constraints(bool)
gcc/lra-constraints.c:4468
0x804b085f lra(_IO_FILE*)
gcc/lra.c:2290
0x804690fb do_reload
gcc/ira.c:5384
0x804690fb execute
gcc/ira.c:5568

> >
> >For this particular failure on arm, *arm_movsi_insn has the following
> >operand
> >constraints:
> >operand 0: "=rk,r,r,r,rk,m"
> >  operand 1: "rk, I,K,j,mi,rk"
> >
> >gcc won't explicitly refuse an unmatched CT_MEMORY operand (r235184) if it
> >comes from substitution, so that alternative (alt) 4 got a chance to compete
> >with
> >alt 0, and eventually be the winner as it's with rld_nregs=0 while alt 0 is
> >with rld_nregs=1.
> >
> >I feel it's OK to give alt 4 a chance here, but we should calculate the
> >cost
> >correctly.
> >
> >For alt 4, it should be treated as spill into memory, but currently lra
> >only
> >recognize a spill for pseudo register.   While the spilled rtx for alt 4
> >is a
> >plus after equiv substitution.
> >
> > (plus:SI (reg/f:SI 102 sfp)
> >(const_int 4 [0x4]))
> >
> >This patch thus let lra-constraint cost spill of Non-pseudo as well and
> >fixed
> >the regression.
> >
> >x86_64/aarch64/arm bootstrap and regression OK.
> >arm bootstrapped cc1 is about 0.3% smaller in code size.
> >
> >OK for trunk?
> >
> >gcc/
> >PR rtl-optimization/70751
> >* lra-constraints.c (process_alt_operands): Recognize Non-pseudo
> >spilled into
> >memory.
> >
> I think the change itself is fine, but the comment could use some
> work. Actually I think you just need to remove the first sentence
> (the one referring to BZ70751 and r235184) and keep everything from
> "Suppose a target" onward.
> 
> OK with that fix.
> 
> jeff
> 


Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH, libgcc/ARM 1a/6, ping] Fix Thumb-1 only == ARMv6-M & Thumb-2 only == ARMv7-M assumptions

2016-06-27 Thread Thomas Preudhomme
Ping?

Best regards,

Thomas

On Friday 17 June 2016 18:21:44 Thomas Preudhomme wrote:
> On Wednesday 01 June 2016 10:00:52 Ramana Radhakrishnan wrote:
> > Please fix up the macros, post back and redo the test. Otherwise this
> > is ok from a quick read.
> 
> What about the updated patch in attachment? As for the original patch, I've
> checked that code generation does not change for a number of combinations of
> ISAs (ARM/Thumb), optimization levels (Os/O2), and architectures (armv4,
> armv4t, armv5, armv5t, armv5te, armv6, armv6j, armv6k, armv6s-m, armv6kz,
> armv6t2, armv6z, armv6zk, armv7, armv7-a, armv7e-m, armv7-m, armv7-r,
> armv7ve, armv8-a, armv8-a+crc, iwmmxt and iwmmxt2).
> 
> Note, I renumbered this patch 1a to not make the numbering of other patches
> look strange. The CLZ part is now in patch 1b/7.
> 
> ChangeLog entries are now as follows:
> 
> 
> *** gcc/ChangeLog ***
> 
> 2016-05-23  Thomas Preud'homme  
> 
> * config/arm/elf.h: Use __ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM
> to decide whether to prevent some libgcc routines being included for some
> multilibs rather than __ARM_ARCH_6M__ and add comment to indicate the link
> between this condition and the one in
> libgcc/config/arm/lib1func.S.
> 
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2015-11-10  Thomas Preud'homme  
> 
> * lib/target-supports.exp (check_effective_target_arm_cortex_m): Use
> __ARM_ARCH_ISA_ARM to test for Cortex-M devices.
> 
> 
> *** libgcc/ChangeLog ***
> 
> 2016-06-01  Thomas Preud'homme  
> 
> * config/arm/bpabi-v6m.S: Clarify what architectures is the
> implementation suitable for.
> * config/arm/lib1funcs.S (__prefer_thumb__): Define among other
> cases for all Thumb-1 only targets.
> (NOT_ISA_TARGET_32BIT): Define for Thumb-1 only targets.
> (THUMB_LDIV0): Test for NOT_ISA_TARGET_32BIT rather than
> __ARM_ARCH_6M__.
> (EQUIV): Likewise.
> (ARM_FUNC_ALIAS): Likewise.
> (umodsi3): Add check to __ARM_ARCH_ISA_THUMB != 1 to guard the idiv
> version.
> (modsi3): Likewise.
> (clzsi2): Test for NOT_ISA_TARGET_32BIT rather than __ARM_ARCH_6M__.
> (clzdi2): Likewise.
> (ctzsi2): Likewise.
> (L_interwork_call_via_rX): Test for __ARM_ARCH_ISA_ARM rather than
> __ARM_ARCH_6M__ in guard for checking whether it is defined.
> (final includes): Test for NOT_ISA_TARGET_32BIT rather than
> __ARM_ARCH_6M__ and add comment to indicate the connection between
> this condition and the one in gcc/config/arm/elf.h.
> * config/arm/libunwind.S: Test for __ARM_ARCH_ISA_THUMB and
> __ARM_ARCH_ISA_ARM rather than __ARM_ARCH_6M__.
> * config/arm/t-softfp: Likewise.
> 
> 
> Best regards,
> 
> Thomas



[PATCH, obvious] Update comments for several vectorizator functions

2016-06-27 Thread Ilya Enkovich
Hi,

This patch adds args description for some vectorizer functions.
I'm going to commit it to trunk.

Thanks,
Ilya
--
gcc/

2016-06-27  Ilya Enkovich  

* tree-vect-loop-manip.c (vect_update_ivs_after_vectorizer): Update
comment.
(vect_update_inits_of_drs): Likewise.
(vect_create_cond_for_alias_checks): Likewise.
* tree-vect-loop.c (vect_get_known_peeling_cost): Likewise.


diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index fab5879..c26aa1d 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -1738,6 +1738,10 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, tree niters,
iterates NITERS times, the new epilog loop iterates
NITERS % VECTORIZATION_FACTOR times.
 
+   If CHECK_PROFITABILITY is 1 then profitability check is generated
+   using TH as a cost model profitability threshold of iterations for
+   vectorization.
+
The original loop will later be made to iterate
NITERS / VECTORIZATION_FACTOR times (this value is placed into RATIO).
 
@@ -2000,7 +2004,11 @@ vect_update_inits_of_drs (loop_vec_info loop_vinfo, tree niters)
'niters' is set to the misalignment of one of the data references in the
loop, thereby forcing it to refer to an aligned location at the beginning
of the execution of this loop.  The data reference for which we are
-   peeling is recorded in LOOP_VINFO_UNALIGNED_DR.  */
+   peeling is recorded in LOOP_VINFO_UNALIGNED_DR.
+
+   If CHECK_PROFITABILITY is 1 then profitability check is generated
+   using TH as a cost model profitability threshold of iterations for
+   vectorization.  */
 
 void
 vect_do_peeling_for_alignment (loop_vec_info loop_vinfo, tree ni_name,
@@ -2315,7 +2323,7 @@ vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr)
 
The test generated to check which version of loop is executed
is modified to also check for profitability as indicated by the
-   cost model initially.
+   cost model threshold TH.
 
The versioning precondition(s) are placed in *COND_EXPR and
*COND_EXPR_STMT_LIST.  */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 1146de9..41b9380 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3384,7 +3384,15 @@ vect_get_known_peeling_cost (loop_vec_info loop_vinfo, int peel_iters_prologue,
 
Return the number of iterations required for the vector version of the
loop to be profitable relative to the cost of the scalar version of the
-   loop.  */
+   loop.
+
+   *RET_MIN_PROFITABLE_NITERS is a cost model profitability threshold
+   of iterations for vectorization.  -1 value means loop vectorization
+   is not profitable.  This returned value may be used for dynamic
+   profitability check.
+
+   *RET_MIN_PROFITABLE_ESTIMATE is a profitability threshold to be used
+   for static check against estimated number of iterations.  */
 
 static void
 vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
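
To make the comments above a little more concrete, here is a hand-written
sketch of what a dynamic profitability check amounts to: the scalar loop is
kept for short trip counts, and otherwise a main loop runs
NITERS / VECTORIZATION_FACTOR times followed by an epilogue of
NITERS % VECTORIZATION_FACTOR iterations.  The factor of 4, the function
names and the use of a plain array of lanes are all assumptions made for
illustration; the vectorizer generates this structure itself and does not
look like this source.

```c
#include <stddef.h>

#define VF 4	/* hypothetical vectorization factor */

/* Scalar version, used when the trip count is below the threshold.  */
static long
sum_scalar (const int *a, size_t niters)
{
  long s = 0;
  for (size_t i = 0; i < niters; i++)
    s += a[i];
  return s;
}

/* Versioned loop; TH plays the role of the cost-model threshold.  */
static long
sum_versioned (const int *a, size_t niters, size_t th)
{
  if (niters < th)
    return sum_scalar (a, niters);

  long lane[VF] = { 0 };
  size_t ratio = niters / VF;	/* iterations of the main loop */
  for (size_t i = 0; i < ratio; i++)
    for (size_t l = 0; l < VF; l++)
      lane[l] += a[i * VF + l];

  long s = 0;
  for (size_t l = 0; l < VF; l++)
    s += lane[l];

  /* Epilogue: the remaining niters % VF iterations.  */
  for (size_t i = ratio * VF; i < niters; i++)
    s += a[i];
  return s;
}
```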


Re: [AArch64] ARMv8.2 command line and feature macros support

2016-06-27 Thread Jiong Wang

On 07/06/16 09:46, Jiong Wang wrote:

2016-06-07  Matthew Wahab
Jiong Wang

* config/aarch64/aarch64-arches.def: Add "armv8.2-a".
* config/aarch64/aarch64.h (AARCH64_FL_V8_2): New.
(AARCH64_FL_F16): New.
(AARCH64_FL_FOR_ARCH8_2): New.
(AARCH64_ISA_8_2): New.
(AARCH64_ISA_F16): New.
(TARGET_FP_F16INST): New.
(TARGET_SIMD_F16INST): New.
* config/aarch64/aarch64-option-extensions.def: New entry for 
"fp16".
* config/aarch64/aarch64-c.c (arch64_update_cpp_builtins): 
Conditionally define
__ARM_FEATURE_FP16_SCALAR_ARITHMETIC and 
__ARM_FEATURE_FP16_VECTOR_ARITHMETIC.
* doc/invoke.texi (AArch64 Options): Document "armv8.2-a" and 
"fp16".




This is an updated version of this patch.  The updates are:

  * When enabling "fp16" also enables "fp".
  * When disabling "fp" also disables "fp16".
  * When disabling "fp16" only disables "fp16".
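
The three rules above can be sketched as flag arithmetic: each extension
carries its own flag plus a set of flags to turn on when it is enabled and a
set to turn off when it is disabled, mirroring the FLAGS_ON and FLAGS_OFF
columns of AARCH64_OPT_EXTENSION.  The bit values, type and function names
below are hypothetical and chosen only for this sketch; how the option
parser combines the columns is assumed, not quoted from the patch.

```c
/* Hypothetical feature bits, not the real AARCH64_FL_* values.  */
#define EXT_FP     (1u << 0)
#define EXT_SIMD   (1u << 1)
#define EXT_CRYPTO (1u << 2)
#define EXT_F16    (1u << 3)

struct ext
{
  unsigned flag;	/* the extension's own bit */
  unsigned on;		/* extra bits set when it is enabled */
  unsigned off;		/* extra bits cleared when it is disabled */
};

/* Disabling "fp" also disables "simd", "crypto" and "fp16".  */
static const struct ext ext_fp = { EXT_FP, 0, EXT_SIMD | EXT_CRYPTO | EXT_F16 };
/* Enabling "simd" also enables "fp"; disabling it also disables "crypto".  */
static const struct ext ext_simd = { EXT_SIMD, EXT_FP, EXT_CRYPTO };
/* Enabling "fp16" also enables "fp"; disabling it only disables "fp16".  */
static const struct ext ext_fp16 = { EXT_F16, EXT_FP, 0 };

static unsigned
ext_enable (unsigned flags, const struct ext *e)
{
  return flags | e->flag | e->on;
}

static unsigned
ext_disable (unsigned flags, const struct ext *e)
{
  return flags & ~(e->flag | e->off);
}
```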

OK for trunk?

2016-06-27  Matthew Wahab  
Jiong Wang  

* config/aarch64/aarch64-arches.def: Add "armv8.2-a".
* config/aarch64/aarch64.h (AARCH64_FL_V8_2): New.
(AARCH64_FL_F16): New.
(AARCH64_FL_FOR_ARCH8_2): New.
(AARCH64_ISA_8_2): New.
(AARCH64_ISA_F16): New.
(TARGET_FP_F16INST): New.
(TARGET_SIMD_F16INST): New.
* config/aarch64/aarch64-option-extensions.def ("fp16"): New entry.
("fp"): Disabling "fp" also disables "fp16".
* config/aarch64/aarch64-c.c (arch64_update_cpp_builtins): 
Conditionally define
__ARM_FEATURE_FP16_SCALAR_ARITHMETIC and 
__ARM_FEATURE_FP16_VECTOR_ARITHMETIC.
* doc/invoke.texi (AArch64 Options): Document "armv8.2-a" and "fp16".


diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
index 1e9d90b..7dcf140 100644
--- a/gcc/config/aarch64/aarch64-arches.def
+++ b/gcc/config/aarch64/aarch64-arches.def
@@ -32,4 +32,5 @@
 
 AARCH64_ARCH("armv8-a",	  generic,	 8A,	8,  AARCH64_FL_FOR_ARCH8)
 AARCH64_ARCH("armv8.1-a", generic,	 8_1A,	8,  AARCH64_FL_FOR_ARCH8_1)
+AARCH64_ARCH("armv8.2-a", generic,	 8_2A,	8,  AARCH64_FL_FOR_ARCH8_2)
 
diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index e64dc76..3380ed6 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -95,6 +95,11 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   else
 cpp_undef (pfile, "__ARM_FP");
 
+  aarch64_def_or_undef (TARGET_FP_F16INST,
+			"__ARM_FEATURE_FP16_SCALAR_ARITHMETIC", pfile);
+  aarch64_def_or_undef (TARGET_SIMD_F16INST,
+			"__ARM_FEATURE_FP16_VECTOR_ARITHMETIC", pfile);
+
   aarch64_def_or_undef (TARGET_SIMD, "__ARM_FEATURE_NUMERIC_MAXMIN", pfile);
   aarch64_def_or_undef (TARGET_SIMD, "__ARM_NEON", pfile);
 
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index e8706d1..a10ccf2 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -39,8 +39,8 @@
that are required.  Their order is not important.  */
 
 /* Enabling "fp" just enables "fp".
-   Disabling "fp" also disables "simd", "crypto".  */
-AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | AARCH64_FL_CRYPTO, "fp")
+   Disabling "fp" also disables "simd", "crypto" and "fp16".  */
+AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | AARCH64_FL_CRYPTO | AARCH64_FL_F16, "fp")
 
 /* Enabling "simd" also enables "fp".
Disabling "simd" also disables "crypto".  */
@@ -55,3 +55,7 @@ AARCH64_OPT_EXTENSION("crc", AARCH64_FL_CRC, 0, 0, "crc32")
 
 /* Enabling or disabling "lse" only changes "lse".  */
 AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, "atomics")
+
+/* Enabling "fp16" also enables "fp".
+   Disabling "fp16" just disables "fp16".  */
+AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, 0, "fp16")
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index b15c23f0..d715da7 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -135,6 +135,8 @@ extern unsigned aarch64_architecture_version;
 /* ARMv8.1 architecture extensions.  */
 #define AARCH64_FL_LSE	  (1 << 4)  /* Has Large System Extensions.  */
 #define AARCH64_FL_V8_1	  (1 << 5)  /* Has ARMv8.1 extensions.  */
+#define AARCH64_FL_V8_2	  (1 << 8)  /* Has ARMv8.2 features.  */
+#define AARCH64_FL_F16	  (1 << 9)  /* Has ARMv8.2 FP16 extensions.  */
 
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
@@ -146,6 +148,8 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_FOR_ARCH8   (AARCH64_FL_FPSIMD)
 #define AARCH64_FL_FOR_ARCH8_1			   \
   (AARCH64_FL_FOR_ARCH8 | AARCH64_FL_LSE | AARCH64_FL_CRC | AARCH64_FL_V8_1)
+#define AARCH64_FL_FOR_ARCH8_2			\
+  

Re: [PATCH, rs6000] Scheduling update

2016-06-27 Thread Pat Haugen
On 06/22/2016 02:10 PM, Segher Boessenkool wrote:
> The "power9_alu2" attribute is writing part of the scheduling description
> inside the machine description proper.  Can this be reduced, maybe by
> adding an attribute describing something about the insns that makes them
> be handled by the alu2?  I realise it isn't all so regular :-(
 
 
>> > +; 2 cycle FP ops
>> > +(define_attr "power9_fp_2cyc" "no,yes"
>> > +  (cond [(eq_attr "mnemonic" "fabs,fcpsgn,fmr,fmrgow,fnabs,fneg,\
>> > +xsabsdp,xscpsgndp,xsnabsdp,xsnegdp,\
>> > +xsabsqp,xscpsgnqp,xsnabsqp,xsnegqp")
>> > +   (const_string "yes")]
>> > +(const_string "no")))
> Eww.  Can we have an attribute for the FP move instructions, instead?
> Maybe a value "fpmove" for the "type", even?

The following patch adds new insn 'type' values that will be used for the 
Power9 patch to overcome the items listed above. There is no functional change 
to existing processor types. Bootstrap/regtested on powerpc64/powerpc64le with 
no new failures. Ok for trunk? Ok for backport to GCC 6 branch after successful 
bootstrap/regtest there?

-Pat

2016-06-27  Pat Haugen  

* config/rs6000/rs6000.md ('type' attribute): Add
vec_logical,veccmp_fx,vec_extend,vecmove insn types.
(*abs2_fpr, *nabs2_fpr, *neg2_fpr, *extendsfdf2_fpr,
copysign3_fcpsgn, truncdf2_internal1, neg2_internal,
p8_fmrgow_, pack): Change type to fpsimple.
(*xxsel, copysign3_hard, neg2_hw, abs2_hw,
*nabs2_hw): Change type to vecmove.
(*and3_internal, *bool3_internal, *boolc3_internal,
*boolcc3_internal, *eqv3_internal,
*one_cmpl3_internal, *ieee_128bit_vsx_neg2_internal,
*ieee_128bit_vsx_abs2_internal,
*ieee_128bit_vsx_nabs2_internal, extendkftf2, trunctfkf2,
*ieee128_mfvsrd_64bit, *ieee128_mfvsrd_32bit, *ieee128_mtvsrd_64bit,
*ieee128_mtvsrd_32bit): Change type to vec_logical.
(mov_hardfloat, *mov_hardfloat32, *mov_hardfloat64,
*movdi_internal32, *movdi_internal64): Update insn types.
* config/rs6000/vsx.md (*vsx_le_undo_permute_,
vsx_extract_): Change type to vec_logical.
(*vsx_xxsel, *vsx_xxsel_uns): Change type to vecmove.
(vsx_sign_extend_qi_, *vsx_sign_extend_hi_,
*vsx_sign_extend_si_v2di): Change type to vec_extend.
* config/rs6000/altivec.md (*altivec_mov, *altivec_movti): Change
type to vec_logical.
(*altivec_eq, *altivec_gt, *altivec_gtu,
*altivec_vcmpequ_p, *altivec_vcmpgts_p,
*altivec_vcmpgtu_p): Change type to veccmp_fx.
(*altivec_vsel, *altivec_vsel_uns): Change type to vecmove.
* config/rs6000/dfp.md (*negdd2_fpr, *absdd2_fpr, *nabsdd2_fpr,
negtd2, *abstd2_fpr, *nabstd2_fpr): Change type to fpsimple.
* config/rs6000/40x.md (ppc405-float): Add fpsimple.
* config/rs6000/440.md (ppc440-fp): Add fpsimple.
* config/rs6000/476.md (ppc476-fp): Add fpsimple.
* config/rs6000/601.md (ppc601-fp): Add fpsimple.
* config/rs6000/603.md (ppc603-fp): Add fpsimple.
* config/rs6000/6xx.md (ppc604-fp): Add fpsimple.
* config/rs6000/7xx.md (ppc750-fp): Add fpsimple.
(ppc7400-vecsimple): Add vec_logical, vecmove, veccmp_fx.
* config/rs6000/7450.md (ppc7450-fp): Add fpsimple.
(ppc7450-vecsimple): Add vec_logical, vecmove.
(ppc7450-veccmp): Add veccmp_fx.
* config/rs6000/8540.md (ppc8540_simple_vector): Add vec_logical,
vecmove.
(ppc8540_vector_compare): Add veccmp_fx.
* config/rs6000/a2.md (ppca2-fp): Add fpsimple.
* config/rs6000/cell.md (cell-fp): Add fpsimple.
(cell-vecsimple): Add vec_logical, vecmove.
(cell-veccmp): Add veccmp_fx.
* config/rs6000/e300c2c3.md (ppce300c3_fp): Add fpsimple.
* config/rs6000/e6500.md (e6500_vecsimple): Add vec_logical, vecmove,
veccmp_fx.
* config/rs6000/mpc.md (mpccore-fp): Add fpsimple.
* config/rs6000/power4.md (power4-fp): Add fpsimple.
(power4-vecsimple): Add vec_logical, vecmove.
(power4-veccmp): Add veccmp_fx.
* config/rs6000/power5.md (power5-fp): Add fpsimple.
* config/rs6000/power6.md (power6-fp): Add fpsimple.
(power6-vecsimple): Add vec_logical, vecmove.
(power6-veccmp): Add veccmp_fx.
* config/rs6000/power7.md (power7-fp): Add fpsimple.
(power7-vecsimple): Add vec_logical, vecmove, veccmp_fx.
* config/rs6000/power8.md (power8-fp): Add fpsimple.
(power8-vecsimple): Add vec_logical, vecmove, veccmp_fx.
* config/rs6000/rs64.md (rs64a-fp): Add fpsimple.
* config/rs6000/titan.md (titan_fp): Add fpsimple.
* config/rs6000/xfpu.md (fp-default, fp-addsub-s, fp-addsub-d): Add
fpsimple.
* config/rs6000/rs6000.c (rs6000_adjust_cost): Add TYPE_FPSIMPLE.



Index: 

Determine more IVs to be non-overflowing

2016-06-27 Thread Jan Hubicka
Hi,
this patch makes simple_iv determine more often that an IV cannot overflow.
First I unified the logic in simple_iv with nowrap_type_p because both test
the same thing. Second I added iv_can_overflow_p, which uses the known upper
bound on the number of iterations to see if the IV calculation can overflow.
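
As a rough illustration of the kind of range check described above (not the
patch's code, which works on trees and widest_int), here is a minimal C sketch.
The function name iv_may_overflow and its parameters are hypothetical, the type
limits are assumed to fit in int64_t, and __int128 is a GCC/Clang extension
used only to keep the intermediate products exact.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: return true if an IV that starts somewhere in
   [base_min, base_max] and advances by a step in [step_min, step_max]
   could leave [type_min, type_max] within NIT iterations.  */
bool
iv_may_overflow (int64_t base_min, int64_t base_max,
                 int64_t step_min, int64_t step_max,
                 uint64_t nit, int64_t type_min, int64_t type_max)
{
  __int128 lowest = (__int128) base_min;
  __int128 highest = (__int128) base_max;

  /* Only a negative step can push the IV below its starting value.  */
  if (step_min < 0)
    lowest += (__int128) step_min * nit;
  /* Only a positive step can push the IV above its starting value.  */
  if (step_max > 0)
    highest += (__int128) step_max * nit;

  return lowest < type_min || highest > type_max;
}
```

With base 0, step 1 and 100 iterations in a 32-bit signed type the sketch
reports no overflow; with INT32_MAX + 1 iterations it does.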

One interesting thing is that the ivcanon IV variables that go from niter to 0
are believed to be wrapping.  This is because the type is unsigned and -1 is
then a large number.

It is not specified what overflow means; I suppose one can think of it as
overflow in the calculation, which is sort of what happens in this case.
Inspecting the code, I think both users (niter and ivopts) agree on a different
interpretation, namely that the induction cannot wrap: the sequence produced
is always monotonically increasing or decreasing.
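
To make the two interpretations concrete, here is a small C sketch using a
32-bit unsigned count-down IV.  Both function names are made up for this
illustration; neither exists in GCC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Interpretation 1: evaluate base + step * niter in a wider type and see
   whether it exceeds the type maximum.  A step of -1 is UINT32_MAX here,
   so a count-down loop looks like it overflows.  */
bool
naive_unsigned_overflow (uint32_t base, uint32_t step, uint32_t niter)
{
  uint64_t last = (uint64_t) base + (uint64_t) step * niter;
  return last > UINT32_MAX;
}

/* Interpretation 2: look at the values the IV actually takes and check
   that each one is strictly smaller than the previous one.  */
bool
sequence_is_monotonic_decreasing (uint32_t base, uint32_t step, uint32_t niter)
{
  uint32_t prev = base;
  for (uint32_t i = 0; i < niter; i++)
    {
      uint32_t next = prev + step;  /* unsigned arithmetic is modulo 2^32 */
      if (next >= prev)
        return false;
      prev = next;
    }
  return true;
}
```

For base 5, step (uint32_t) -1 and 5 iterations the first test reports an
overflow while the second confirms the sequence 5, 4, ..., 0 never wraps.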

I would suggest incrementally renaming it to no_wrap_p, adding a comment to
this effect, and handling this case, too.  It will need an update at one place
in ivopts where we determine the direction of iteration:

  /* We need to know that the candidate induction variable does not overflow.
 While more complex analysis may be used to prove this, for now just
 check that the variable appears in the original program and that it
 is computed in a type that guarantees no overflows.  */
  cand_type = TREE_TYPE (cand->iv->base);
  if (cand->pos != IP_ORIGINAL || !nowrap_type_p (cand_type))
return false;
..
  step = int_cst_value (cand->iv->step);
...
  /* Determine the new comparison operator.  */ 
  comp = step < 0 ? GT_EXPR : LT_EXPR;  

(which is the only occurrence of step). I guess we can add a function
iv_direction that will return -1, 0, 1 for monotonically decreasing,
constant/unknown and monotonically increasing inductions.
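
A possible shape for such a helper, as a self-contained C sketch.  This is only
an assumption about how it might look: the real function would take the IV's
tree step and the no-wrap information from the affine_iv, and the enum below
merely stands in for GT_EXPR and LT_EXPR.

```c
#include <stdint.h>

/* Hypothetical sketch: STEP_KNOWN is zero when the step is not a
   compile-time constant, NO_WRAP is nonzero when the IV is known not to
   wrap.  Returns -1, 0 or 1 as described above.  */
int
iv_direction (int step_known, int64_t step, int no_wrap)
{
  if (!step_known || !no_wrap || step == 0)
    return 0;
  return step < 0 ? -1 : 1;
}

/* Stand-ins for the tree codes used in the ivopts snippet.  */
enum cmp_code { CMP_LT, CMP_GT, CMP_NONE };

/* Pick the comparison for the rewritten exit test from the direction.  */
enum cmp_code
comparison_for_direction (int dir)
{
  if (dir < 0)
    return CMP_GT;
  if (dir > 0)
    return CMP_LT;
  return CMP_NONE;
}
```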


Honza

* tree-scalar-evolution.h (iv_can_overflow_p): Declare.
* tree-scalar-evolution.c (iv_can_overflow_p): New function.
(simple_iv): Use it.
* tree-ssa-loop-niter.c (nowrap_type_p): Use ANY_INTEGRAL_TYPE_P

* gcc.dg/tree-ssa/scev-14.c: New testcase.
Index: tree-scalar-evolution.h
===
--- tree-scalar-evolution.h (revision 237798)
+++ tree-scalar-evolution.h (working copy)
@@ -38,6 +38,7 @@ extern unsigned int scev_const_prop (voi
 extern bool expression_expensive_p (tree);
 extern bool simple_iv (struct loop *, struct loop *, tree, struct affine_iv *,
   bool);
+extern bool iv_can_overflow_p (struct loop *, tree, tree, tree);
 extern tree compute_overall_effect_of_inner_loop (struct loop *, tree);
 
 /* Returns the basic block preceding LOOP, or the CFG entry block when
Index: tree-scalar-evolution.c
===
--- tree-scalar-evolution.c (revision 237798)
+++ tree-scalar-evolution.c (working copy)
@@ -3309,6 +3310,70 @@ scev_reset (void)
 }
 }
 
+/* Return true if the IV calculation in TYPE can overflow based on the knowledge
+   of the upper bound on the number of iterations of LOOP, the BASE and STEP
+   of IV.
+
+   We do not use information whether TYPE can overflow so it is safe to
+   use this test even for derived IVs not computed every iteration or
+   hypotetical IVs to be inserted into code.  */
+
+bool
+iv_can_overflow_p (struct loop *loop, tree type, tree base, tree step)
+{
+  widest_int nit, base_min, base_max, step_min, step_max, type_min, type_max;
+  wide_int min, max;
+  tree maxt, mint;
+
+  if (TREE_CODE (base) == INTEGER_CST)
+base_min = base_max = wi::to_widest (base);
+  else if (TREE_CODE (base) == SSA_NAME && !POINTER_TYPE_P (TREE_TYPE (base))
+  && get_range_info (base, &min, &max) == VR_RANGE)
+{
+  base_min = widest_int::from (min, TYPE_SIGN (TREE_TYPE (base)));
+  base_max = widest_int::from (max, TYPE_SIGN (TREE_TYPE (base)));
+}
+  else
+return true;
+
+  if (TREE_CODE (step) == INTEGER_CST)
+step_min = step_max = wi::to_widest (step);
+  else if (TREE_CODE (step) == SSA_NAME && !POINTER_TYPE_P (TREE_TYPE (base))
+  && get_range_info (step, &min, &max) == VR_RANGE)
+{
+  step_min = widest_int::from (min, TYPE_SIGN (TREE_TYPE (step)));
+  step_max = widest_int::from (max, TYPE_SIGN (TREE_TYPE (step)));
+}
+  else
+return true;
+
+  if (!get_max_loop_iterations (loop, &nit))
+/* 10GHz CPU runing one cycle loop will reach 2^60 iterations in 260
+   years.  I won't live long enough to be forced to fix the
+   miscompilation.  Having the limit here will let us to consider 
+   64bit IVs with base 0 and step 1...16 as non-wrapping which makes
+   niter and ivopts go smoother.  */
+nit = ((widest_int)1 << 60);
+
+  if (INTEGRAL_TYPE_P (type))
+{
+  maxt = TYPE_MAX_VALUE (type);
+  mint = TYPE_MIN_VALUE (type);
+}
+  else
+{
+  maxt = 

Re: RFC (attributes): PATCH for c++/50800 to set affects_type_identity for may_alias

2016-06-27 Thread Richard Biener
On Thu, Jun 23, 2016 at 9:39 PM, Jason Merrill  wrote:
> My earlier patch for 50800 fixed the ICE by consistently stripping
> non-mangled attributes from template arguments, and mangling those that
> affect type identity.  At the C++ meeting this week someone pointed out to
> me that this is a real problem for x86 vector code, which relies on
> may_alias semantics: if may_alias is stripped from __m128, users can't use
> templates with vectors.
>
> So, it seems that the solution is to mangle may_alias by saying that it
> affects type identity.  But since we still want to be able to convert back
> and forth, I thought that it would make sense to treat the may_alias version
> of a type as a variant, rather than a new distinct type.  So the first patch
> creates a new category of attributes that are treated as type variants.
>
> An alternative patch just sets affects_type_identity and adjusts the C++
> front end to allow conversion between pointers to add or discard may_alias.
>
> Thoughts?

As may_alias purely affects semantics in the implementation of an API
but not the ABI it shouldn't affect mangling.  In the middle-end we use
TYPE_REF_CAN_ALIAS_ALL and that and the unqualified pointer
share the same canonical type (but it's not a variant type, pointer types
are chained via TYPE_POINTER_TO).
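
For readers less familiar with the attribute, here is a minimal C sketch of
what may_alias changes at the source level.  It assumes a 32-bit IEEE float;
the typedef and function names are invented for the example.  C has no name
mangling, so this shows only the aliasing semantics, not the C++ question
under discussion.

```c
#include <stdint.h>

/* A may_alias variant of uint32_t.  Accesses through this type are not
   subject to type-based alias analysis (GCC/Clang extension).  */
typedef uint32_t __attribute__ ((__may_alias__)) aliasing_u32;

/* Read the bit pattern of a float through the may_alias type.  Doing the
   same through a plain uint32_t pointer would violate the aliasing rules.  */
uint32_t
float_bits (const float *f)
{
  return *(const aliasing_u32 *) f;
}
```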

Not sure if you can make use of the canonicalness in the C++ FE
and maybe drop the attribute early there.

Richard.

> Tested x86_64-pc-linux-gnu.


Re: Fix zero size debug array swap noexcept qualification

2016-06-27 Thread Jonathan Wakely

On 23/06/16 22:22 +0200, François Dumont wrote:
   Debug mode array had simply been forgotten when fixing zero-size 
swap method as part of swappable traits implementation.


   * include/debug/array (array<>::swap): Fix noexcept qualification for
   zero-size array.

   Tested under Linux x86_64 debug mode.


OK for trunk, thanks.



[PATCH] remove unused CTOR_LISTS_DEFINED_EXTERNALLY macro

2016-06-27 Thread tbsaunde+gcc
From: Trevor Saunders 

Hi,

The last target to use this was i386-interix, so since that is gone we
don't need this anymore.

bootstrapped and regtested on x86-linux-gnu, ok?

Trev

libgcc/ChangeLog:

2016-06-27  Trevor Saunders  

* libgcc2.c (SYMBOL__MAIN): Remove checks for
CTOR_LISTS_DEFINED_EXTERNALLY.
---
 libgcc/libgcc2.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
index 6bc9a2f..0a716bf 100644
--- a/libgcc/libgcc2.c
+++ b/libgcc/libgcc2.c
@@ -2309,8 +2309,7 @@ SYMBOL__MAIN (void)
must be in the bss/common section.
 
Long term no port should use those extensions.  But many still do.  */
-#if !defined(__LIBGCC_INIT_SECTION_ASM_OP__) \
-&& !defined(CTOR_LISTS_DEFINED_EXTERNALLY)
+#if !defined(__LIBGCC_INIT_SECTION_ASM_OP__)
 #if defined (TARGET_ASM_CONSTRUCTOR) || defined (USE_COLLECT2)
 func_ptr __CTOR_LIST__[2] = {0, 0};
 func_ptr __DTOR_LIST__[2] = {0, 0};
@@ -2318,6 +2317,6 @@ func_ptr __DTOR_LIST__[2] = {0, 0};
 func_ptr __CTOR_LIST__[2];
 func_ptr __DTOR_LIST__[2];
 #endif
-#endif /* no __LIBGCC_INIT_SECTION_ASM_OP__ and not CTOR_LISTS_DEFINED_EXTERNALLY */
+#endif /* no __LIBGCC_INIT_SECTION_ASM_OP__ */
 #endif /* L_ctors */
 #endif /* LIBGCC2_UNITS_PER_WORD <= MIN_UNITS_PER_WORD */
-- 
2.7.4



[PATCH, CHKP, PR ipa/71624] Fix local.can_change_signature computation for instrumentation thunk callees

2016-06-27 Thread Ilya Enkovich
Hi,

This patch sets local.can_change_signature to false for instrumentation
thunk callees.  We have two reasons for that:
  - We don't support modification of instrumentation thunks
  - We don't actually emit instrumentation thunk and therefore its
signature should be in sync with callee

This patch should prevent incorrect IPA optimizations on instrumented
functions.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  I'm going to apply
it to trunk and gcc-6-branch.

@Honza, do you think this is a valid fix to prevent invalid code modifications?

Thanks,
Ilya
--
gcc/

2016-06-27  Ilya Enkovich  

* ipa-inline-analysis.c (compute_inline_parameters): Set
local.can_change_signature to false for instrumentation
thunk callees.

gcc/testsuite/

2016-06-27  Ilya Enkovich  

* g++.dg/pr71624.C: New test.


diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 5d67218..da29d22 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -3017,6 +3017,16 @@ compute_inline_parameters (struct cgraph_node *node, bool early)
   node->local.can_change_signature = !e;
 }
 }
+   /* Functions called by instrumentation thunk can't change signature
+ because instrumentation thunk modification is not supported.  */
+   if (node->local.can_change_signature)
+for (e = node->callers; e; e = e->next_caller)
+  if (e->caller->thunk.thunk_p
+  && e->caller->thunk.add_pointer_bounds_args)
+{
+  node->local.can_change_signature = false;
+  break;
+}
estimate_function_body_sizes (node, early);
pop_cfun ();
  }
diff --git a/gcc/testsuite/g++.dg/pr71624.C b/gcc/testsuite/g++.dg/pr71624.C
new file mode 100644
index 000..94a75cd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr71624.C
@@ -0,0 +1,35 @@
+/* PR71624 */
+// { dg-do compile { target i?86-*-* x86_64-*-* } }
+/* { dg-options "-fcheck-pointer-bounds -mmpx -O2" } */
+
+class c1
+{
+public:
+  virtual int fn1 () const;
+  int fn2 (const int *) const;
+};
+
+class c2
+{
+  int fn1 ();
+  c1 obj;
+};
+
+int
+c1::fn1 () const
+{
+  return 0;
+}
+
+int
+c1::fn2 (const int *) const
+{
+  return this->fn1 ();
+}
+
+int
+c2::fn1 ()
+{
+  return obj.fn2 (0);
+}
+


Re: [patch, avr,wwwdocs] PR 58655

2016-06-27 Thread Pitchumani Sivanupandi

Ping!

On Wednesday 22 June 2016 12:05 PM, Pitchumani Sivanupandi wrote:

On Tuesday 21 June 2016 09:39 PM, Georg-Johann Lay wrote:

Pitchumani Sivanupandi schrieb:

Attached patches add documentation for -mfract-convert-truncate option
and add that info to release notes (gcc-4.9 changes).

If OK, could someone commit please? I do not have commit access.

Regards,
Pitchumani

gcc/ChangeLog

2016-06-21  Pitchumani Sivanupandi 

PR target/58655
* doc/invoke.texi (AVR Options): Document -mfract-convert-truncate
option.

--- a/wwwdocs/htdocs/gcc-4.9/changes.html
+++ b/wwwdocs/htdocs/gcc-4.9/changes.html
@@ -579,6 +579,14 @@ auto incr(T x) { return x++; }
size when compiling for the M-profile processors.
  
  
+AVR
+
+  
+A new command-line option -mfract-convert-truncate has been added.


 tags around the option.


+It allows compiler to use truncation instead of rounding towards
+0 for fractional int types.


"zero" instead of "0", and it's for fixed-point types, not for int 
types.



+  
+
 IA-32/x86-64
   
 -mfpmath=sse is now implied by
-ffast-math

...

 @emph{Blackfin Options}
 @gccoptlist{-mcpu=@var{cpu}@r{[}-@var{sirevision}@r{]} @gol
@@ -14586,6 +14586,10 @@ sbiw r26, const   ; X -= const
 @opindex mtiny-stack
 Only change the lower 8@tie{}bits of the stack pointer.

+@item -mfract-convert-truncate
+@opindex mfract-convert-truncate
+Allow to use truncation instead of rounding towards 0 for fractional
int types.


Same here: "zero" and "fixed-point".


+
 @item -nodevicelib
 @opindex nodevicelib
 Don't link against AVR-LibC's device specific library
@code{lib.a}.



Thanks Johann.

Updated the patches.

Regards,
Pitchumani

gcc/ChangeLog

2016-06-22  Pitchumani Sivanupandi  

 PR target/58655
 * config/avr/avr.opt (-mfract-convert-truncate): Update description.
 * doc/invoke.texi (AVR Options): Document it.
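
To illustrate the distinction the option text draws between truncation and
rounding towards zero, here is a generic C sketch that narrows a signed Q1.15
value to Q1.7.  It is not the AVR backend's implementation; the function names
and the choice of formats are assumptions made for the example.

```c
#include <stdint.h>

/* Truncation: simply discard the low 8 fractional bits.  For negative
   values this rounds towards minus infinity.  Written as floor division
   to avoid relying on right-shifting a negative value.  */
int8_t
q15_to_q7_truncate (int16_t x)
{
  int q = x / 256;
  if (x % 256 < 0)
    q -= 1;
  return (int8_t) q;
}

/* Rounding towards zero: C integer division already behaves this way.  */
int8_t
q15_to_q7_toward_zero (int16_t x)
{
  return (int8_t) (x / 256);
}
```

The two only differ for negative inputs whose discarded bits are nonzero: for
x = -1, truncation gives -1 while rounding towards zero gives 0.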






Re: RFC (attributes): PATCH for c++/50800 to set affects_type_identity for may_alias

2016-06-27 Thread Florian Weimer

On 06/23/2016 09:39 PM, Jason Merrill wrote:

-/* { dg-final { scan-assembler "_ZN1AIdEC1Ev" } } */
+/* { dg-final { scan-assembler "_ZN1AIU9may_aliasdEC1Ev" } } */


I find this rather worrying.

In glibc, we want to use the may_alias attribute on struct 
sockaddr_storage for POSIX conformance.  We can only do this if it does 
not affect name mangling in application code.


If we define a struct like this:

extern "C" {

struct sockaddr_storage
{
  int ss_family;
  char pad[100];
};

};

would your patch change the assembler name of a C++ function with the 
following declaration


  void get_address(struct sockaddr_storage *);

?

Thanks,
Florian


Re: [ARM][testsuite] Add missing guards to fp16 AdvSIMD tests

2016-06-27 Thread Kyrill Tkachov


On 21/06/16 09:45, Christophe Lyon wrote:

Hi,

I've noticed that some guards were missing on some of the AdvSIMD
tests involving fp16 code.

The attached patch fixes this, although I didn't notice any difference in
validation: I have no configuration where
check_effective_target_arm_neon_fp16_ok fails.

I did locally modify this effective target to always return false to
make sure I covered all the missing guards.

However, I'm not sure when check_effective_target_arm_neon_fp16_ok can
fail? This effective target test was added by Alan Lawrence a few
months ago.

OK?


Ok.
Thanks,
Kyrill


Christophe




Re: [ARM][testsuite] neon-testgen.ml removal

2016-06-27 Thread Christophe Lyon
ping?

On 22 June 2016 at 17:52, Christophe Lyon  wrote:
> Hi,
>
> This is a new attempt at removing neon-testgen.ml and generated files.
>
> Compared to my previous version several months ago:
> - I have recently added testcases to make sure we do not lose coverage
> as described in
> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02922.html
> - I now also remove neon.ml as requested by Kyrylo in
> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01664.html, and moved
> the remaining hand-written tests up to gcc.target/arm.
>
> Doing this, I had to slightly update vst1Q_laneu64-1.c because it's
> now compiled with more pedantic flags and there was a signed/unsigned
> char buffer pointer mismatch.
>
> Sorry, I had to compress the patch, otherwise it's too large and rejected
> by the list server.
>
> OK?
>
> Christophe


Re: [ARM][testsuite] Add missing guards to fp16 AdvSIMD tests

2016-06-27 Thread Christophe Lyon
ping?

On 21 June 2016 at 10:45, Christophe Lyon  wrote:
> Hi,
>
> I've noticed that some guards were missing on some of the AdvSIMD
> tests involving fp16 code.
>
> The attached patch fixes this, although I didn't notice any difference in
> validation: I have no configuration where
> check_effective_target_arm_neon_fp16_ok fails.
>
> I did locally modify this effective target to always return false to
> make sure I covered all the missing guards.
>
> However, I'm not sure when check_effective_target_arm_neon_fp16_ok can
> fail? This effective target test was added by Alan Lawrence a few
> months ago.
>
> OK?
>
> Christophe


[ARM] FP16 ARM Alternative format variants of AAPCS tests.

2016-06-27 Thread Matthew Wahab

Hello,

Tests added for FP16 argument and return values being passed in
registers only check the case when the FP16 IEEE format is used. This
patch adds equivalent tests that also check the behaviour when the
ARM Alternative format is used.
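
For context, the two half-precision formats differ only in how the maximum
exponent is interpreted.  The following C sketch decodes a 16-bit pattern
under either format; fp16_decode and scale_by_pow2 are hypothetical helpers
written for this illustration and are not part of the tests.

```c
#include <math.h>
#include <stdint.h>

/* Multiply X by 2^E using exact doublings and halvings.  */
static double
scale_by_pow2 (double x, int e)
{
  for (; e > 0; e--)
    x *= 2.0;
  for (; e < 0; e++)
    x /= 2.0;
  return x;
}

/* Decode a half-precision bit pattern.  A nonzero ALTERNATIVE selects the
   ARM Alternative format, where exponent 31 encodes ordinary normalized
   values instead of infinities and NaNs.  */
double
fp16_decode (uint16_t bits, int alternative)
{
  int sign = (bits >> 15) & 1;
  int exponent = (bits >> 10) & 0x1f;
  int fraction = bits & 0x3ff;
  double value;

  if (exponent == 0)
    value = scale_by_pow2 ((double) fraction, -24);  /* zero or subnormal */
  else if (exponent == 31 && !alternative)
    value = fraction ? NAN : INFINITY;               /* IEEE specials */
  else
    value = scale_by_pow2 (1.0 + fraction / 1024.0, exponent - 15);

  return sign ? -value : value;
}
```

Under this sketch 0x3C00 decodes to 1.0 in both formats, while 0x7C00 is
infinity in the IEEE format and 65536.0 in the Alternative format.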

This patch depends on the testsuite directives added for the FP16 aapcs
tests at https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01794.html.

Tested arm-none-eabi with cross-compiled make check-gcc and
arm-none-linux-gnueabihf with native make check.

Ok for trunk?
Matthew

testsuite/
2016-06-27  Matthew Wahab  

* gcc.target/arm/fp16-aapcs-3.c: New.
* gcc.target/arm/fp16-aapcs-4.c: New.
* gcc.target/arm/aapcs/vfp22.c: New.
* gcc.target/arm/aapcs/vfp23.c: New.
* gcc.target/arm/aapcs/vfp24.c: New.
* gcc.target/arm/aapcs/vfp25.c: New.

>From 13b0cbec24a3fdeaaf6318acb42c79bf76e3414e Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Wed, 15 Jun 2016 09:22:59 +0100
Subject: [PATCH] [ARM] FP16 ARM Alternative format variants of AAPCS tests.

Tests added for FP16 argument and return values being passed in
registers only check the case when the FP16 IEEE format is used. This
patch adds equivalent tests that also check the behaviour when the
ARM Alternative format is used.

Tested arm-none-eabi with cross-compiled make check and
arm-none-linux-gnueabihf with native make check.

testsuite/
2016-06-27  Matthew Wahab  

	* gcc.target/arm/fp16-aapcs-3.c: New.
	* gcc.target/arm/fp16-aapcs-4.c: New.
	* gcc.target/arm/aapcs/vfp22.c: New.
	* gcc.target/arm/aapcs/vfp23.c: New.
	* gcc.target/arm/aapcs/vfp24.c: New.
	* gcc.target/arm/aapcs/vfp25.c: New.
---
 gcc/testsuite/gcc.target/arm/aapcs/vfp22.c  | 28 +++
 gcc/testsuite/gcc.target/arm/aapcs/vfp23.c  | 30 +
 gcc/testsuite/gcc.target/arm/aapcs/vfp24.c  | 22 +
 gcc/testsuite/gcc.target/arm/aapcs/vfp25.c  | 26 +
 gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c | 21 
 gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c | 21 
 6 files changed, 148 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/aapcs/vfp22.c
 create mode 100644 gcc/testsuite/gcc.target/arm/aapcs/vfp23.c
 create mode 100644 gcc/testsuite/gcc.target/arm/aapcs/vfp24.c
 create mode 100644 gcc/testsuite/gcc.target/arm/aapcs/vfp25.c
 create mode 100644 gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c

diff --git a/gcc/testsuite/gcc.target/arm/aapcs/vfp22.c b/gcc/testsuite/gcc.target/arm/aapcs/vfp22.c
new file mode 100644
index 000..1944bb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/aapcs/vfp22.c
@@ -0,0 +1,28 @@
+/* Test AAPCS layout (VFP variant)  */
+
+/* { dg-do run { target arm_eabi } }  */
+/* { dg-require-effective-target arm_hard_vfp_ok }  */
+/* { dg-require-effective-target arm_fp16_hw }  */
+/* { dg-add-options arm_fp16_alternative }  */
+
+#ifndef IN_FRAMEWORK
+#define VFP
+#define TESTFILE "vfp22.c"
+#include "abitest.h"
+
+#else
+#if defined (__ARM_BIG_ENDIAN)
+ARG (__fp16, 1.0f, S0 + 2)
+#else
+ARG (__fp16, 1.0f, S0)
+#endif
+ARG (float, 2.0f, S1)
+ARG (double, 4.0, D1)
+ARG (float, 2.0f, S4)
+#if defined (__ARM_BIG_ENDIAN)
+ARG (__fp16, 1.0f, S5 + 2)
+#else
+ARG (__fp16, 1.0f, S5)
+#endif
+LAST_ARG (int, 3, R0)
+#endif
diff --git a/gcc/testsuite/gcc.target/arm/aapcs/vfp23.c b/gcc/testsuite/gcc.target/arm/aapcs/vfp23.c
new file mode 100644
index 000..bcacf9f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/aapcs/vfp23.c
@@ -0,0 +1,30 @@
+/* Test AAPCS layout (VFP variant)  */
+
+/* { dg-do run { target arm_eabi } }  */
+/* { dg-require-effective-target arm_hard_vfp_ok }  */
+/* { dg-require-effective-target arm_fp16_hw }  */
+/* { dg-add-options arm_fp16_alternative }  */
+
+#ifndef IN_FRAMEWORK
+#define VFP
+#define TESTFILE "vfp23.c"
+
+__complex__ x = 1.0+2.0i;
+
+#include "abitest.h"
+#else
+#if defined (__ARM_BIG_ENDIAN)
+ARG (__fp16, 1.0f, S0 + 2)
+#else
+ARG (__fp16, 1.0f, S0)
+#endif
+ARG (float, 2.0f, S1)
+ARG (__complex__ double, x, D1)
+ARG (float, 3.0f, S6)
+#if defined (__ARM_BIG_ENDIAN)
+ARG (__fp16, 2.0f, S7 + 2)
+#else
+ARG (__fp16, 2.0f, S7)
+#endif
+LAST_ARG (int, 3, R0)
+#endif
diff --git a/gcc/testsuite/gcc.target/arm/aapcs/vfp24.c b/gcc/testsuite/gcc.target/arm/aapcs/vfp24.c
new file mode 100644
index 000..eac640e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/aapcs/vfp24.c
@@ -0,0 +1,22 @@
+/* Test AAPCS layout (VFP variant)  */
+
+/* { dg-do run { target arm_eabi } }  */
+/* { dg-require-effective-target arm_hard_vfp_ok }  */
+/* { dg-require-effective-target arm_fp16_hw }  */
+/* { dg-add-options arm_fp16_alternative }  */
+
+#ifndef IN_FRAMEWORK
+#define VFP
+#define TESTFILE "vfp24.c"
+
+#define PCSATTR __attribute__((pcs("aapcs")))
+
+#include "abitest.h"

Re: [ARM] Fix, add tests for FP16 aapcs.

2016-06-27 Thread Matthew Wahab

On 10/06/16 15:30, Matthew Wahab wrote:
> On 10/06/16 15:22, Christophe Lyon wrote:
>> On 10 June 2016 at 15:56, Matthew Wahab  wrote:
>>> On 10/06/16 09:32, Christophe Lyon wrote:

 On 9 June 2016 at 17:21, Matthew Wahab  wrote:
>
 It's an improvement, but I'm still seeing a few problems with this patch:
 the vfp* tests are still failing in some of the configurations I test,
 because
 * you force dg-options that contains -mfloat-abi=hard,
 * you check effective-target arm_neon_fp16_hw
 * but you don't call dg-add-options arm_neon_fp16

> I understand now. I still think it would be better to use a list of
> require-effective-targets so I'll try that first and use the arm_neon_fp16
> options if that doesn't work.
>

Sorry for the delay. I've added effective-target requirements to the
tests to check for hard-fp and for VFP (i.e. non-neon) FP16 support. The
directives for the VFP FP16 support are new. I've split them out to a
separate patch, both patches are attached.

The first patch adds:

- effective-target keywords arm_fp16_ok and arm_fp16_hw to check for
  compiler and hardware support for FP16.

- add-options features arm_fp16_ieee and arm_fp16_alternative, to
  enable FP16 IEEE format and FP16 ARM Alternative format support

Note that the existing add-options feature arm_fp16 enables the default
FP16 format (fp16-format=none).

The second patch updates the tests to use these directives. It also
reworks the gcc.target/arm/fp16-aapcs-1.c test to focus on argument
passing and return values, and adds a softfp variant as
fp16-aapcs-2.c.

As before, checked for arm-none-eabi with cross-compiled check-gcc and
arm-linux-gnueabihf with native make check. I also ran the tests for
cross-compiled arm-none-eabi with -mcpu=Cortex-M3.

Ok for trunk?
Matthew

PATCH 1/2 ChangeLog
gcc/
2016-06-27  Matthew Wahab  

* doc/sourcebuild.texi (Effective-Target keywords): Add entries
for arm_fp16_ok and arm_fp16_hw.
(Add Options): Add entries for arm_fp16, arm_fp16_ieee and
arm_fp16_alternative.

testsuite/
2016-06-27  Matthew Wahab  

* lib/target-supports.exp (add_options_for arm_fp16): Reword
comment.
(add_options_for_arm_fp16_ieee): New.
(add_options_for_arm_fp16_alternative): New.
(check_effective_target_arm_fp16_ok_nocache): Add to comment.  Fix a
long-line.
(check_effective_target_arm_fp16_hw): New.

PATCH 2/2 ChangeLog
testsuite/
2016-06-27  Matthew Wahab  

* testsuite/gcc.target/arm/aapcs/neon-vect10.c: Require
-mfloat-abi=hard.  Replace arm_neon_fp16_ok with arm_neon_fp16_hw.
* testsuite/gcc.target/arm/aapcs/neon-vect9.c: Likewise.
* testsuite/gcc.target/arm/aapcs/vfp18.c: Likewise.  Also add
options for ARM FP16 IEEE format.
* testsuite/gcc.target/arm/aapcs/vfp19.c: Likewise.
* testsuite/gcc.target/arm/aapcs/vfp20.c: Likewise.
* testsuite/gcc.target/arm/aapcs/vfp21.c: Likewise.
* testsuite/gcc.target/arm/fp16-aapcs-1.c: Require
-mfloat-abi=hard.  Also simplify the test.
* testsuite/gcc.target/arm/fp16-aapcs-2.c: New.

>From ff46f8397b2ae4ffe3be0027849aa8ff63e9ab9b Mon Sep 17 00:00:00 2001
From: Matthew Wahab 
Date: Mon, 13 Jun 2016 13:30:13 +0100
Subject: [PATCH 1/2] [Testsuite] Selectors and options directives for ARM VFP
 FP16 support.

To support FP16 VFP tests for the ARM backend, this patch adds:

- effective-target keywords arm_fp16_ok and arm_fp16_hw to check for
  compiler and hardware support for FP16.

- add-options features arm_fp16_ieee and arm_fp16_alternative, to
  enable FP16 IEEE format and FP16 ARM Alternative format support

Note that the existing add-options feature arm_fp16 enables the default
FP16 format (fp16-format=none).

gcc/
2016-06-27  Matthew Wahab  

	* doc/sourcebuild.texi (Effective-Target keywords): Add entries
	for arm_fp16_ok and arm_fp16_hw.
	(Add Options): Add entries for arm_fp16, arm_fp16_ieee and
	arm_fp16_alternative.

testsuite/
2016-06-27  Matthew Wahab  

	* lib/target-supports.exp (add_options_for arm_fp16): Reword
	comment.
	(add_options_for_arm_fp16_ieee): New.
	(add_options_for_arm_fp16_alternative): New.
	(effective_target_arm_fp16_ok_nocache): Add to comment.  Fix a
	long-line.
	(effective_target_arm_fp16_hw): New.
---
 gcc/doc/sourcebuild.texi  | 32 +++
 gcc/testsuite/lib/target-supports.exp | 58 ---
 2 files changed, 85 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 95a781d..23d3c3f 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1555,6 +1555,16 @@ options.  Some multilibs may be incompatible with these options.
 ARM 

Re: [PATCH] Allow fwprop to undo vectorization harm (PR68961)

2016-06-27 Thread Richard Biener
On Wed, 15 Jun 2016, Richard Sandiford wrote:

> Richard Biener  writes:
> > With the proposed cost change for vector construction we will end up
> > vectorizing the testcase in PR68961 again (on x86_64 and likely
> > on ppc64le as well after that target gets adjustments).  Currently
> > we can't optimize that away again noticing the direct overlap of
> > argument and return registers.  The obstacle is
> >
> > (insn 7 4 8 2 (set (reg:V2DF 93)
> > (vec_concat:V2DF (reg/v:DF 91 [ a ])
> > (reg/v:DF 92 [ aa ]))) 
> > ...
> > (insn 21 8 24 2 (set (reg:DI 97 [ D.1756 ])
> > (subreg:DI (reg:TI 88 [ D.1756 ]) 0))
> > (insn 24 21 11 2 (set (reg:DI 100 [+8 ])
> > (subreg:DI (reg:TI 88 [ D.1756 ]) 8))
> >
> > which we eventually optimize to DFmode subregs of (reg:V2DF 93).
> >
> > First of all simplify_subreg doesn't handle the subregs of a vec_concat
> > (easy fix below).
> >
> > Then combine doesn't like to simplify the multi-use (it tries some
> > parallel it seems).  So I went to forwprop which eventually manages
> > to do this but throws away the result (reg:DF 91) or (reg:DF 92)
> > because it is not a constant.  Thus I allow arbitrary simplification
> > results for SUBREGs of [VEC_]CONCAT operations.  There doesn't seem
> > to be a magic flag to tell it to restrict to the case where all
> > uses can be simplified or so, nor to restrict simplifications to a REG.
> > But I don't see any undesirable simplifications of (subreg 
> > ([vec_]concat)).
> 
> Adding that as a special case to propagate_rtx feels like a hack though :-)
> I think:
> 
> > Index: gcc/fwprop.c
> > ===
> > *** gcc/fwprop.c(revision 237286)
> > --- gcc/fwprop.c(working copy)
> > *** propagate_rtx (rtx x, machine_mode mode,
> > *** 664,670 
> > || (GET_CODE (new_rtx) == SUBREG
> >   && REG_P (SUBREG_REG (new_rtx))
> >   && (GET_MODE_SIZE (mode)
> > ! <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx))
> >   flags |= PR_CAN_APPEAR;
> > if (!varying_mem_p (new_rtx))
> >   flags |= PR_HANDLE_MEM;
> > --- 664,673 
> > || (GET_CODE (new_rtx) == SUBREG
> >   && REG_P (SUBREG_REG (new_rtx))
> >   && (GET_MODE_SIZE (mode)
> > ! <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx)
> > !   || ((GET_CODE (new_rtx) == VEC_CONCAT
> > !  || GET_CODE (new_rtx) == CONCAT)
> > ! && GET_CODE (x) == SUBREG))
> >   flags |= PR_CAN_APPEAR;
> > if (!varying_mem_p (new_rtx))
> >   flags |= PR_HANDLE_MEM;
> 
> ...this if statement should fundamentally only test new_rtx.
> E.g. we'd want the same thing for any SUBREG inside X.
> 
> How about changing:
> 
>   /* The replacement we made so far is valid, if all of the recursive
>  replacements were valid, or we could simplify everything to
>  a constant.  */
>   return valid_ops || can_appear || CONSTANT_P (tem);
> 
> so that (REG_P (tem) && !HARD_REGISTER_P (tem)) is also valid?
> I suppose that's likely to increase register pressure though,
> if only some uses of new_rtx simplify.  (There again, requiring all
> uses to be replaceable could make hot code the hostage of cold code.)

Yes, my fear was about increased register pressure when not all
uses can be replaced (fwprop doesn't seem to have code to verify or
require that).

I can avoid checking for GET_CODE (x) == SUBREG and instead add a PR_REG
case that restricts REG_P (tem) && !HARD_REGISTER_P (tem) to the
new_rtx == [VEC_]CONCAT case, for example.
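
A rough sketch of that idea follows. It uses toy stand-ins rather than
the real rtx types: every toy_* / TOY_* name, including TOY_PR_REG and the
pseudo-register boundary, is made up for illustration and is not the
actual fwprop.c code. It only models the final validity test, widened so
that a result simplifying to a pseudo register is accepted when the
propagated value is a [VEC_]CONCAT.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for rtx codes; NOT the real GCC definitions.  */
enum toy_code
{
  TOY_REG, TOY_CONST_INT, TOY_SUBREG, TOY_CONCAT, TOY_VEC_CONCAT, TOY_PLUS
};

struct toy_rtx
{
  enum toy_code code;
  int regno;			/* Only meaningful for TOY_REG.  */
};

/* Hypothetical boundary between hard and pseudo registers.  */
#define TOY_FIRST_PSEUDO 64

/* Flag bits modeled loosely on the PR_* flags; TOY_PR_REG is the
   hypothetical new one.  */
#define TOY_PR_CAN_APPEAR 1
#define TOY_PR_REG 2

/* Compute flags from NEW_RTX alone (simplified: the SUBREG-of-REG
   case of the real code is left out).  */
static int
toy_flags_for (const struct toy_rtx *new_rtx)
{
  int flags = 0;
  if (new_rtx->code == TOY_REG || new_rtx->code == TOY_CONST_INT)
    flags |= TOY_PR_CAN_APPEAR;
  if (new_rtx->code == TOY_CONCAT || new_rtx->code == TOY_VEC_CONCAT)
    flags |= TOY_PR_REG;
  return flags;
}

/* Toy equivalent of REG_P (x) && !HARD_REGISTER_P (x).  */
static bool
toy_pseudo_reg_p (const struct toy_rtx *x)
{
  return x->code == TOY_REG && x->regno >= TOY_FIRST_PSEUDO;
}

/* The widened validity test: TEM is the simplified result.  */
static bool
toy_replacement_valid (bool valid_ops, int flags, const struct toy_rtx *tem)
{
  assert (tem != 0);
  bool can_appear = (flags & TOY_PR_CAN_APPEAR) != 0;
  bool is_constant = tem->code == TOY_CONST_INT;
  bool reg_ok = (flags & TOY_PR_REG) != 0 && toy_pseudo_reg_p (tem);
  return valid_ops || can_appear || is_constant || reg_ok;
}
```

With this shape the flag is derived from new_rtx only, so no test of
GET_CODE (x) is needed, and a result that is a hard register or that
comes from some other kind of new_rtx is still rejected.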

Richard.

> Thanks,
> Richard
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)


Re: [PATCH] Backport PowerPC complex __float128 compiler support to GCC 6.x

2016-06-27 Thread Richard Biener
On Wed, 22 Jun 2016, Michael Meissner wrote:

> On Wed, Jun 15, 2016 at 11:01:05AM +0200, Richard Biener wrote:
> > And I don't understand the layout_type change either - it looks to me
> > it could just have used
> > 
>   SET_TYPE_MODE (type, GET_MODE_COMPLEX_MODE (TYPE_MODE (TREE_TYPE (type))));
> > 
> > and be done with it.  To me that looks a lot safer.
> 
> I made this change in the trunk, and now I would like approval for applying
> this code which includes the above change in the GCC 6.2 branch.
> 
> Here is the change for the trunk:
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01489.html
> 
> I tested it on both a big endian power7 and a little endian power8 system
> with no regressions.  Is it ok to apply to the GCC 6.2 branch?

Ok.

Richard.

> [gcc]
> 2016-06-22  Michael Meissner  
> 
>   Back port from trunk
>   2016-06-21  Michael Meissner  
> 
>   * stor-layout.c (layout_type): Move setting complex MODE to
>   layout_type, instead of setting it ahead of time by the caller.
> 
>   Back port from trunk
>   2016-05-11  Alan Modra  
> 
>   * config/rs6000/rs6000.c (is_complex_IBM_long_double,
>   abi_v4_pass_in_fpr): New functions.
>   (rs6000_function_arg_boundary): Exclude complex IBM long double
>   from 64-bit alignment when ABI_V4.
>   (rs6000_function_arg, rs6000_function_arg_advance_1,
>   rs6000_gimplify_va_arg): Use abi_v4_pass_in_fpr.
> 
>   Back port from trunk
>   2016-05-02  Michael Meissner  
> 
>   * machmode.h (mode_complex): Add support to give the complex mode
>   for a given mode.
>   (GET_MODE_COMPLEX_MODE): Likewise.
>   * stor-layout.c (layout_type): For COMPLEX_TYPE, use the mode
>   stored by build_complex_type and gfc_build_complex_type instead of
>   trying to figure out the appropriate mode based on the size. Raise
>   an assertion error, if the type was not set.
>   * genmodes.c (struct mode_data): Add field for the complex type of
>   the given type.
>   (blank_mode): Likewise.
>   (make_complex_modes): Remember the complex mode created in the
>   base type.
>   (emit_mode_complex): Write out the mode_complex array to map a
>   type mode to the complex version.
>   (emit_insn_modes_c): Likewise.
>   * tree.c (build_complex_type): Set the complex type to use before
>   calling layout_type.
>   * config/rs6000/rs6000.c (rs6000_hard_regno_nregs_internal): Add
>   support for __float128 complex datatypes.
>   (rs6000_hard_regno_mode_ok): Likewise.
>   (rs6000_setup_reg_addr_masks): Likewise.
>   (rs6000_complex_function_value): Likewise.
>   * config/rs6000/rs6000.h (FLOAT128_IEEE_P): Likewise.
>   __float128 and __ibm128 complex.
>   (FLOAT128_IBM_P): Likewise.
>   (ALTIVEC_ARG_MAX_RETURN): Likewise.
>   * doc/extend.texi (Additional Floating Types): Document that
>   -mfloat128 must be used to enable __float128.  Document complex
>   __float128 and __ibm128 support.
> 
> [gcc/testsuite]
> 2016-06-22  Michael Meissner  
> 
>   Back port from trunk
>   2016-05-02  Michael Meissner  
> 
>   * gcc.target/powerpc/float128-complex-1.c: New tests for complex
>   __float128.
>   * gcc.target/powerpc/float128-complex-2.c: Likewise.
> 
> 
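
To illustrate why the ChangeLog above maps each mode to its complex mode
through a table instead of guessing from the size, here is a small
sketch. It is not the generated insn-modes code: the toy_* names, the
mode list, and the sizes are a simplified model assumed for illustration.
The point it shows is that two 16-byte scalar modes (IFmode for __ibm128
and KFmode for __float128 on PowerPC) cannot be told apart by size, while
a per-mode table in the spirit of GET_MODE_COMPLEX_MODE can.

```c
#include <assert.h>

/* Toy mode list; NOT the real machine_mode enumeration.  */
enum toy_mode
{
  TOY_VOIDmode,
  TOY_SFmode, TOY_DFmode, TOY_IFmode, TOY_KFmode,	/* Scalar float.  */
  TOY_SCmode, TOY_DCmode, TOY_ICmode, TOY_KCmode,	/* Complex float.  */
  TOY_NUM_MODES
};

/* Size in bytes of each toy mode, in enum order.  */
static const unsigned char toy_mode_size[TOY_NUM_MODES] =
{
  0,			/* VOID */
  4, 8, 16, 16,		/* SF, DF, IF, KF */
  8, 16, 32, 32		/* SC, DC, IC, KC */
};

/* Table mapping a mode to its complex mode, in enum order;
   TOY_VOIDmode means "none".  */
static const enum toy_mode toy_mode_complex[TOY_NUM_MODES] =
{
  TOY_VOIDmode,						/* VOID */
  TOY_SCmode, TOY_DCmode, TOY_ICmode, TOY_KCmode,	/* SF, DF, IF, KF */
  TOY_VOIDmode, TOY_VOIDmode, TOY_VOIDmode, TOY_VOIDmode	/* complex */
};

/* Size-based guess: first complex mode twice as large as the part.
   Ambiguous when two scalar modes share a size.  */
static enum toy_mode
toy_complex_by_size (unsigned part_size)
{
  for (int m = TOY_SCmode; m < TOY_NUM_MODES; m++)
    if (toy_mode_size[m] == 2 * part_size)
      return (enum toy_mode) m;
  return TOY_VOIDmode;
}

/* Table lookup in the spirit of GET_MODE_COMPLEX_MODE.  */
static enum toy_mode
toy_get_mode_complex_mode (enum toy_mode mode)
{
  assert (mode >= 0 && mode < TOY_NUM_MODES);
  return toy_mode_complex[mode];
}
```

In this model the size-based search returns TOY_ICmode for a 16-byte
part regardless of whether the part is IFmode or KFmode, whereas the
table lookup returns TOY_KCmode for TOY_KFmode.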

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)


Re: Fix for PR70926 in Libiberty Demangler (5)

2016-06-27 Thread Marcel Böhme
Hi Jeff,

On 23 Jun 2016, at 4:21 AM, Jeff Law  wrote:
> 
> OK for the trunk.  Please install.
> 
> Sorry for the delays.
> 
> Jeff

I might not have the access rights to commit to trunk.

Best regards
- Marcel