Re: [PATCH] Refactor optimize isl

2015-09-11 Thread Tobias Grosser

On 09/11/2015 07:07 PM, Aditya Kumar wrote:

Updated patch with corrections:

Refactor graphite-optimize-isl.c. Renamed function names, variable names,
etc., and indented the source according to gcc style guidelines.  Modified
comments accordingly. No functional change intended.


Looks reasonable.

Just for history: this code was copied from Polly, which is why the formatting
does not match gcc's style. The relevant file in Polly has evolved since then
and might provide you with ideas on how to improve this file in gcc.

Tobias


Passes regtest and bootstrap on x86_64.

gcc/ChangeLog:

2015-09-10  Aditya Kumar  

 * graphite-optimize-isl.c (get_tile_map): Refactor.
 (get_schedule_for_band): Same.
 (getScheduleForBand): Same.
 (get_prevector_map): Same.
 (get_schedule_for_band_list): Same.
 (get_schedule_map): Same.
 (get_single_map): Same.
 (apply_schedule_map_to_scop): Same.
 (optimize_isl): Same.


---
  gcc/graphite-optimize-isl.c | 416 ++--
  1 file changed, 210 insertions(+), 206 deletions(-)

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 811a510..e891e91 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -50,6 +50,9 @@ along with GCC; see the file COPYING3.  If not see
  #include "params.h"
  #include "dumpfile.h"

+/* Set this to true to disable tiling of nested loops.  */
+static bool disable_tiling = false;
+
  static isl_union_set *
  scop_get_domains (scop_p scop ATTRIBUTE_UNUSED)
  {
@@ -64,152 +67,153 @@ scop_get_domains (scop_p scop ATTRIBUTE_UNUSED)
return res;
  }

-/* getTileMap - Create a map that describes a n-dimensonal tiling.
-
-   getTileMap creates a map from a n-dimensional scattering space into an
+/* get_tile_map - Create a map that describes a n-dimensonal tiling.
+
+   get_tile_map creates a map from a n-dimensional scattering space into an
 2*n-dimensional scattering space. The map describes a rectangular tiling.
-
+
 Example:
- scheduleDimensions = 2, parameterDimensions = 1, tileSize = 32
-
-tileMap := [p0] -> {[s0, s1] -> [t0, t1, s0, s1]:
- t0 % 32 = 0 and t0 <= s0 < t0 + 32 and
- t1 % 32 = 0 and t1 <= s1 < t1 + 32}
-
+ SCHEDULE_DIMENSIONS = 2, PARAMETER_DIMENSIONS = 1, TILE_SIZE = 32
+
+tile_map := [p0] -> {[s0, s1] -> [t0, t1, s0, s1]:
+t0 % 32 = 0 and t0 <= s0 < t0 + 32 and
+t1 % 32 = 0 and t1 <= s1 < t1 + 32}
+
 Before tiling:
-
+
 for (i = 0; i < N; i++)
   for (j = 0; j < M; j++)
-   S(i,j)
-
+   S(i,j)
+
 After tiling:
-
+
 for (t_i = 0; t_i < N; i+=32)
   for (t_j = 0; t_j < M; j+=32)
-   for (i = t_i; i < min(t_i + 32, N); i++)  | Unknown that N % 32 = 0
- for (j = t_j; j < t_j + 32; j++)|   Known that M % 32 = 0
-   S(i,j)
-   */
-
+   for (i = t_i; i < min(t_i + 32, N); i++)  | Unknown that N % 32 = 0
+ for (j = t_j; j < t_j + 32; j++)|   Known that M % 32 = 0
+   S(i,j)
+  */
+
  static isl_basic_map *
-getTileMap (isl_ctx *ctx, int scheduleDimensions, int tileSize)
+get_tile_map (isl_ctx *ctx, int schedule_dimensions, int tile_size)
  {
-  int x;
/* We construct

- tileMap := [p0] -> {[s0, s1] -> [t0, t1, p0, p1, a0, a1]:
-   s0 = a0 * 32 and s0 = p0 and t0 <= p0 < t0 + 32 and
-   s1 = a1 * 32 and s1 = p1 and t1 <= p1 < t1 + 32}
+ tile_map := [p0] -> {[s0, s1] -> [t0, t1, p0, p1, a0, a1]:
+   s0 = a0 * 32 and s0 = p0 and t0 <= p0 < t0 + 32 and
+   s1 = a1 * 32 and s1 = p1 and t1 <= p1 < t1 + 32}

   and project out the auxilary dimensions a0 and a1.  */
-  isl_space *Space = isl_space_alloc (ctx, 0, scheduleDimensions,
- scheduleDimensions * 3);
-  isl_basic_map *tileMap = isl_basic_map_universe (isl_space_copy (Space));
+  isl_space *space
+  = isl_space_alloc (ctx, 0, schedule_dimensions, schedule_dimensions * 3);
+  isl_basic_map *tile_map = isl_basic_map_universe (isl_space_copy (space));

-  isl_local_space *LocalSpace = isl_local_space_from_space (Space);
+  isl_local_space *local_space = isl_local_space_from_space (space);

-  for (x = 0; x < scheduleDimensions; x++)
+  for (int x = 0; x < schedule_dimensions; x++)
  {
int sX = x;
int tX = x;
-  int pX = scheduleDimensions + x;
-  int aX = 2 * scheduleDimensions + x;
+  int pX = schedule_dimensions + x;
+  int aX = 2 * schedule_dimensions + x;

isl_constraint *c;

-  /* sX = aX * tileSize; */
-  c = isl_equality_alloc (isl_local_space_copy (LocalSpace));
+  /* sX = aX * tile_size; */
+  c = isl_equality_alloc (isl_local_space_copy (local_space));
isl_constraint_set_coefficient_si (c, isl_dim_out, sX, 

[gomp4] Update fortran tests.

2015-09-11 Thread James Norris

Hi!

Attached is a patch to update two (2) tests that
were failing because the test cases were incorrect.

Committed after regtesting on x86_64 and powerpc64le.

Thanks!
Jim
Index: lib-12.f90
===================================================================
--- lib-12.f90	(revision 227667)
+++ lib-12.f90	(working copy)
@@ -4,12 +4,14 @@ program main
   use openacc
   implicit none
 
-  integer :: i, n
+  integer :: i, j, n
 
+  j = 0
   n = 100
 
-  !$acc parallel async (0)
+  !$acc parallel async (0) copy (j)
 do i = 1, 100
+  j = j + 1
 end do
   !$acc end parallel
 
Index: lib-13.f90
===================================================================
--- lib-13.f90	(revision 227667)
+++ lib-13.f90	(working copy)
@@ -4,17 +4,22 @@ program main
   use openacc
   implicit none
 
-  integer :: i, j, nprocs
+  integer :: i, j
   integer, parameter :: N = 100
+  integer, parameter :: nprocs = 2
+  integer :: k(nprocs)
 
-  nprocs = 2
+  k(:) = 0
 
-  do j = 1, nprocs
-!$acc parallel async (j)
-  do i = 1, N
-  end do
-!$acc end parallel
-  end do
+  !$acc data copy (k(1:nprocs))
+do j = 1, nprocs
+  !$acc parallel async (j)
+do i = 1, N
+  k(j) = k(j) + 1
+end do
+  !$acc end parallel
+end do
+  !$acc end data
 
   if (acc_async_test (1) .neqv. .TRUE.) call abort
   if (acc_async_test (2) .neqv. .TRUE.) call abort


Re: [PATCH] Remove dead code from graphite-optimize-isl.c

2015-09-11 Thread Tobias Grosser

On 09/11/2015 07:29 PM, Aditya Kumar wrote:

The variable `static bool enable_polly_vector' is always assigned false,
which results in dead code in graphite-optimize-isl.c.
Remove the dead code. No functional change intended.


Fine with me as well. This code is used in Polly to enable outer-loop
vectorization. It was historically disabled in gcc, as we missed heuristics
to drive this. Again, it might be worth looking into Polly's version of this
code, which now works on schedule trees and is now a lot easier to read and
work with.

Best,
Tobias


Re: Use resolution info to get rid of weak symbols

2015-09-11 Thread H.J. Lu
On Sun, May 18, 2014 at 12:38 PM, Jan Hubicka  wrote:
> Hi,
> this patch makes GCC use resolution info to turn COMDAT and WEAK
> symbols into regular symbols based on feedback given by the linker plugin.
> If the resolution says that a given symbol is prevailing, it is possible
> to turn it into a normal symbol, while when the resolution says it
> is prevailed, it is possible to turn it into an external symbol.
>
> Doing so makes the rest of the backend work more smoothly on them.
> We previously did this transformation partly for functions; this patch
> makes it happen for variables too and implements the second
> part (turning the symbol into an external definition).
>
> Bootstrapped/regtested x86_64-linux and tested with libreoffice
> build.  Will commit it shortly.
>
> * ipa.c (update_visibility_by_resolution_info): New function.
> (function_and_variable_visibility): Use it.

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67548

-- 
H.J.


RE: [PATCH, MIPS] Frame header optimization for MIPS O32 ABI

2015-09-11 Thread Steve Ellcey
On Fri, 2015-09-04 at 01:40 -0700, Matthew Fortune wrote:

> A few comments below. I found some of the comments a bit hard to parse but
> have not attempted any rewording. I'd like Catherine to comment too as I
> have barely any experience at the gimple level to know if this accounts
> for any necessary subtleties.

Catherine said she would look at this next week but I have updated the
patch in the mean time to address your comments and give Catherine a
more up-to-date patch to look over.

The only functional change is in callees_functions_use_frame_header (was
called_functions_use_stack), where I check for weak functions and where I
return true if gimple_call_fndecl returned NULL.  I am not sure in what
exact situations gimple_call_fndecl will return NULL, but I did run into
that when testing, so we do the conservative thing in that case and
assume we need the frame header.
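The conservative rule described here can be modeled in a few lines of C (an illustrative sketch only; `struct call_info` and its fields are invented stand-ins for what the pass actually queries via gimple_call_fndecl and DECL_WEAK):

```c
#include <assert.h>
#include <stdbool.h>

/* Invented model of one call site; the real pass inspects gimple
   statements instead.  */
struct call_info
{
  bool fndecl_known;              /* gimple_call_fndecl was non-NULL */
  bool callee_is_weak;            /* definition may change at link time */
  bool callee_needs_frame_header; /* callee writes into the header */
};

/* Conservative decision: keep the O32 frame header unless every callee
   is known, non-weak, and known not to use the header.  */
static bool
callees_use_frame_header (const struct call_info *calls, int n)
{
  for (int i = 0; i < n; i++)
    if (!calls[i].fndecl_known
        || calls[i].callee_is_weak
        || calls[i].callee_needs_frame_header)
      return true;
  return false;
}
```

The point of the design is that any uncertainty (indirect call, weak definition) forces the safe answer, at the cost of missing the optimization.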

I also used your example to add 3 tests in order to verify when the
optimization does and does not happen.

Steve Ellcey
sell...@imgtec.com


2015-09-11  Steve Ellcey  

* config.gcc (mips*-*-*): Add frame-header-opt.o to extra_objs.
* frame-header-opt.c: New file.
* config/mips/mips-proto.h (mips_register_frame_header_opt):
Add prototype.
* config/mips/mips.c (mips_compute_frame_info): Check
optimize_call_stack flag.
(mips_option_override): Register new frame_header_opt pass.
(mips_frame_info, mips_int_mask, mips_shadow_set,
machine_function): Move these types to...
* config/mips/mips.h: here.
(machine_function): Add does_not_use_frame_header and
optimize_call_stack fields.
* config/mips/t-mips (frame-header-opt.o): Add new make rule.
* doc/invoke.texi (-mframe-header-opt, -mno-frame-header-opt):
Document new flags.
* config/mips/mips.opt (mframe-header-opt): Add new option.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 5712547..eea97de 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -420,6 +420,7 @@ microblaze*-*-*)
 mips*-*-*)
cpu_type=mips
extra_headers="loongson.h"
+   extra_objs="frame-header-opt.o"
	extra_options="${extra_options} g.opt fused-madd.opt mips/mips-tables.opt"
;;
 nds32*)
diff --git a/gcc/config/mips/frame-header-opt.c b/gcc/config/mips/frame-header-opt.c
index e69de29..5c4e93c 100644
--- a/gcc/config/mips/frame-header-opt.c
+++ b/gcc/config/mips/frame-header-opt.c
@@ -0,0 +1,221 @@
+/* Analyze functions to determine if calling functions need to allocate
+   stack space (a frame header) for its called functions to write out their
+   arguments on to the stack.  This optimization is only applicable to
+   TARGET_OLDABI targets because calling functions on TARGET_NEWABI targets
+   do not allocate any stack space for arguments (the called function does it
+   if needed).
+
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "context.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "tree-core.h"
+#include "tree-pass.h"
+#include "target.h"
+#include "target-globals.h"
+#include "cfg.h"
+#include "cgraph.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
+
+static unsigned int frame_header_opt (void);
+
+namespace {
+
+const pass_data pass_data_ipa_frame_header_opt =
+{
+  IPA_PASS, /* type */
+  "frame-header-opt", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_CGRAPHOPT, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_ipa_frame_header_opt : public ipa_opt_pass_d
+{
+public:
+  pass_ipa_frame_header_opt (gcc::context *ctxt)
+: ipa_opt_pass_d (pass_data_ipa_frame_header_opt, ctxt,
+  NULL, /* generate_summary */
+  NULL, /* write_summary */
+  NULL, /* read_summary */
+  NULL, /* write_optimization_summary */
+  NULL, /* read_optimization_summary */
+  NULL, /* stmt_fixup */
+  0, /* function_transform_todo_flags_start */
+  NULL, /* 

libbacktrace patch committed: Update dependencies

2015-09-11 Thread Ian Lance Taylor
Although libbacktrace uses automake, it can't use automatic dependency
tracking, because it breaks when using bootstrap-lean (PR 54732).  The
dependencies for sort.lo and stest.lo were never added to the list.
Also, the dependencies for backtrace.lo were not updated for my recent
change to that file.  This patch fixes the problem.  Committed to
mainline.

Ian

2015-09-11  Ian Lance Taylor  

* Makefile.am (backtrace.lo): Depend on internal.h.
(sort.lo, stest.lo): Add explicit dependencies.
* Makefile.in: Rebuild.
Index: Makefile.am
===================================================================
--- Makefile.am (revision 227673)
+++ Makefile.am (working copy)
@@ -116,7 +116,7 @@ endif NATIVE
 
 INCDIR = $(top_srcdir)/../include
 alloc.lo: config.h backtrace.h internal.h
-backtrace.lo: config.h backtrace.h
+backtrace.lo: config.h backtrace.h internal.h
 btest.lo: $(INCDIR)/filenames.h backtrace.h backtrace-supported.h
 dwarf.lo: config.h $(INCDIR)/dwarf2.h $(INCDIR)/dwarf2.def \
$(INCDIR)/filenames.h backtrace.h internal.h
@@ -130,5 +130,7 @@ posix.lo: config.h backtrace.h internal.
 print.lo: config.h backtrace.h internal.h
 read.lo: config.h backtrace.h internal.h
 simple.lo: config.h backtrace.h internal.h
+sort.lo: config.h backtrace.h internal.h
+stest.lo: config.h backtrace.h internal.h
 state.lo: config.h backtrace.h backtrace-supported.h internal.h
 unknown.lo: config.h backtrace.h internal.h


Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time

2015-09-11 Thread Joseph Myers
On Fri, 11 Sep 2015, Jakub Jelinek wrote:

> On Fri, Sep 11, 2015 at 03:26:04PM +, Joseph Myers wrote:
> > On Fri, 11 Sep 2015, Bernd Schmidt wrote:
> > 
> > > I expect you know best what to do in the OpenACC testsuite driver, but
> > > you might want to run the libgomp.exp parts by Jakub. If the testsuite
> > > parts are independent of the rest of the patch, please repost them
> > > separately.
> > 
> > Jakub?  The testsuite changes and the rest of the patch depend on each
> > other.
> 
> So, do I understand well that you'll call GOMP_set_offload_targets from
> constructors of all shared libraries (and the binary) that contain offloaded
> code?  If yes, that is surely going to fail the assertions in there.
> You can dlopen such libraries etc.  What if you link one library with
> -fopenmp=nvptx-none and another one with -fopenmp=x86_64-intelmicemul-linux?

Thomas (I think you're back next week), any comments on how shared
libraries with different offloading selected fit into your design
(including the case where some but not all of the executable / shared
libraries specify -foffload=disable)?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][AArch64] Use preferred aliases for CSNEG, CSINC, CSINV

2015-09-11 Thread Kyrill Tkachov


On 11/09/15 16:31, James Greenhalgh wrote:

On Tue, Sep 01, 2015 at 11:08:10AM +0100, Kyrill Tkachov wrote:

Hi all,

The ARMv8-A reference manual says:
"CNEG <Wd>, <Wn>, <cond>
is equivalent to
CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>)
and is the preferred disassembly when Rn == Rm && cond != '111x'."

That is, when the two input registers are the same we can use the shorter
CNEG mnemonic with the inverse condition instead of the longer CSNEG
instruction. Similarly for the CSINV and CSINC instructions, they have the
shorter CINV and CINC forms. This patch adjusts the output templates to emit
the preferred shorter sequences when possible.

The new mnemonics are just aliases; they map down to the same instruction in
the end, so there are no performance or behaviour implications. But it does
make the assembly a bit more readable IMO, since:
"cneg w27, w9, le"
can simply be read as "if the condition is less or equal, negate w9" instead
of the previous:
"csneg w27, w9, w9, gt", where you have to remember which of the input
registers is negated.


Bootstrapped and tested on aarch64-linux-gnu.
Ok for trunk?
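For reference, the semantics of the aliases under discussion can be sketched in C (illustrative only; `cond` stands for the evaluated condition flags):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CSNEG Rd, Rn, Rm, cond:  Rd = cond ? Rn : -Rm.  */
static int32_t
csneg (int32_t rn, int32_t rm, bool cond)
{
  return cond ? rn : -rm;
}

/* CSINC Rd, Rn, Rm, cond:  Rd = cond ? Rn : Rm + 1.  */
static int32_t
csinc (int32_t rn, int32_t rm, bool cond)
{
  return cond ? rn : rm + 1;
}

/* CNEG Rd, Rn, cond == CSNEG Rd, Rn, Rn, invert(cond):
   negate Rn exactly when the condition holds.  */
static int32_t
cneg (int32_t rn, bool cond)
{
  return csneg (rn, rn, !cond);
}

/* CINC Rd, Rn, cond == CSINC Rd, Rn, Rn, invert(cond):
   increment Rn exactly when the condition holds.  */
static int32_t
cinc (int32_t rn, bool cond)
{
  return csinc (rn, rn, !cond);
}
```

The equivalence only works because both source registers coincide, which is exactly the `rtx_equal_p (operands[2], operands[3])` test in the patch.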

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 77bc7cd..2e4b26c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3090,7 +3090,12 @@ (define_insn "csinc3<mode>_insn"
(const_int 1))
  (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))]
""
-  "csinc\\t%0, %3, %2, %M1"
+  {
+if (rtx_equal_p (operands[2], operands[3]))
+  return "cinc\\t%0, %2, %m1";
+else
+  return "csinc\\t%0, %3, %2, %M1";
+  }
[(set_attr "type" "csel")]
  )

I guess you do it this way rather than just adding a new alternative in
the pattern to avoid any chance of constraining the register allocator, but
would this not be more natural to read as an {r, r, r, 2} alternative, or
similar?


I had not considered this approach and I'm a bit sceptical about how
feasible it is.
If we put the {r,r,r,2} as a second alternative then it will be a purely
more restrictive version of the first alternative and so will never match.
If, however, we put it as the first alternative, we'll be expressing some
preference for allocating the same register for operands 2 and 3, which is
not something we want to do.



If you've given that some thought and decided it doesn't work for you,
then this is OK for trunk.

Given the above
I'll commit this version next week if there are no objections.

Thanks,
Kyrill



Thanks,
James





Re: [patch] libstdc++/67173 Fix filesystem::canonical for Solaris 10.

2015-09-11 Thread Martin Sebor

On 09/11/2015 08:21 AM, Jonathan Wakely wrote:

Solaris 10 doesn't follow POSIX in accepting a null pointer as the
second argument to realpath(), so allocate a buffer for it.


FWIW, the NULL requirement is new in Issue 7. In Issue 6, the behavior
is implementation-defined, and before then it was an error. Solaris 10
claims conformance to SUSv2 and its realpath fails with EINVAL.
Solaris 11 says it conforms to Issue 6 but according to the man page
its realpath already implements the Issue 7 requirement.

I suspect the same problem will come up on other systems so checking
the POSIX version might be better than testing for each OS.

Martin



Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-09-11 Thread Ramana Radhakrishnan

>>> Saying that all reductions have equivalent performance is unlikely to be
>>> true for many platforms.  On PowerPC, for example, a PLUS reduction has
>>> very different cost from a MAX reduction.  If the model isn't
>>> fine-grained enough, let's please be aggressive about fixing it.  I'm
>>> fine if it's a separate patch, but in my mind this shouldn't be allowed
>>> to languish.
>>
>> ...I agree that the general vectoriser cost model could probably be
>> improved, but it seems fairer for that improvement to be done by whoever
>> adds the patterns that need it.
> 
> All right.  But in response to Ramana's comment, are all relevant
> reductions of similar cost on each ARM platform?  Obviously they don't
> have the same cost on different platforms, but the question is whether a
> reduc_plus, reduc_max, etc., has identical cost on each individual
> platform.  If not, ARM may have a concern as well.  I don't know the
> situation for x86 either.

From the Cauldron I have a note that we need to look at the vectorizer cost
model for both the ARM and AArch64 backends and move away from the set of
magic constants that it currently returns.

On AArch32, all the reduc_ patterns are emulated with pair-wise operations,
while on AArch64 they aren't. Thus they aren't likely to be the same cost as
a standard vector arithmetic instruction. What difference this makes in
practice remains to be seen; however, the first step is moving towards the
newer vectorizer cost model interface.

I'll put this on a list of things for us to look at but I'm not sure who/when
will get around to looking at this.

regards
Ramana
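The pair-wise emulation mentioned above can be pictured in scalar C (an illustrative sketch, not backend code):

```c
#include <assert.h>

/* reduc_plus emulated the AArch32 way: add adjacent lane pairs until
   one lane remains (two pair-wise steps for a 4-lane vector).  */
static int
reduc_plus_pairwise (const int v[4])
{
  int p[2] = { v[0] + v[1], v[2] + v[3] }; /* first pair-wise step */
  return p[0] + p[1];                      /* second pair-wise step */
}

static int
max2 (int a, int b)
{
  return a > b ? a : b;
}

/* A max reduction has the same shape but a different (and possibly
   differently priced) lane operation, which is why a single cost
   constant for all reductions is suspect.  */
static int
reduc_max_pairwise (const int v[4])
{
  int p[2] = { max2 (v[0], v[1]), max2 (v[2], v[3]) };
  return max2 (p[0], p[1]);
}
```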


[gcc-5-branch][PATCH][AARCH64]Fix for branch offsets over 1 MiB

2015-09-11 Thread Andre Vieira
Conditional branches have a maximum range of [-1048576, 1048572]. Any
destination further away cannot be reached by these.
To be able to have conditional branches in very large functions, we
invert the condition and change the destination to jump over an
unconditional branch to the original, far away, destination.

This patch backports the fix from trunk to the gcc-5-branch.
The original patch is at:
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01493.html

gcc/ChangeLog:
2015-09-09  Andre Vieira  

  Backport from mainline:
  2015-08-27  Ramana Radhakrishnan  
  Andre Vieira  

  * config/aarch64/aarch64.md (*condjump): Handle functions > 1 MiB.
  (*cb1): Likewise.
  (*tb1): Likewise.
  (*cb1): Likewise.
  * config/aarch64/iterators.md (inv_cb): New code attribute.
  (inv_tb): Likewise.
  * config/aarch64/aarch64.c (aarch64_gen_far_branch): New.
  * config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New.


gcc/testsuite/ChangeLog:
2015-09-09  Andre Vieira  

  Backport from mainline:
  2015-08-27  Andre Vieira  

  * gcc.target/aarch64/long_branch_1.c: New test.
From 5b9f35c3a6dff67328e66e82f33ef3dc732ff5f7 Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Tue, 8 Sep 2015 16:51:14 +0100
Subject: [PATCH] Backport fix for far branches

---
 gcc/config/aarch64/aarch64-protos.h  |  1 +
 gcc/config/aarch64/aarch64.c | 23 ++
 gcc/config/aarch64/aarch64.md| 89 +++
 gcc/config/aarch64/iterators.md  |  6 ++
 gcc/testsuite/gcc.target/aarch64/long_branch_1.c | 91 
 5 files changed, 195 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/long_branch_1.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 59c5824f894cf5dafe93a996180056696518feb4..8669694bfc33cc040baeab5aaddc7a0575071add 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -258,6 +258,7 @@ void aarch64_init_expanders (void);
 void aarch64_print_operand (FILE *, rtx, char);
 void aarch64_print_operand_address (FILE *, rtx);
 void aarch64_emit_call_insn (rtx);
+const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *);
 
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cd20b48d7fd300ab496e67cae0d6e503ac305409..c2e4252cf689a4d7b28890743095536f3a390338 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1029,6 +1029,29 @@ aarch64_split_128bit_move_p (rtx dst, rtx src)
 
 /* Split a complex SIMD combine.  */
 
+/* Generate code to enable conditional branches in functions over 1 MiB.  */
+const char *
+aarch64_gen_far_branch (rtx * operands, int pos_label, const char * dest,
+			const char * branch_format)
+{
+rtx_code_label * tmp_label = gen_label_rtx ();
+char label_buf[256];
+char buffer[128];
+ASM_GENERATE_INTERNAL_LABEL (label_buf, dest,
+ CODE_LABEL_NUMBER (tmp_label));
+const char *label_ptr = targetm.strip_name_encoding (label_buf);
+rtx dest_label = operands[pos_label];
+operands[pos_label] = tmp_label;
+
+snprintf (buffer, sizeof (buffer), "%s%s", branch_format, label_ptr);
+output_asm_insn (buffer, operands);
+
+snprintf (buffer, sizeof (buffer), "b\t%%l%d\n%s:", pos_label, label_ptr);
+operands[pos_label] = dest_label;
+output_asm_insn (buffer, operands);
+return "";
+}
+
 void
 aarch64_split_simd_combine (rtx dst, rtx src1, rtx src2)
 {
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4e7a6d19ac2ff8067912052697ca7a1fc403f910..c3c3beb75b4d4486d5b235e31b4db7bd213c7c7b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -179,6 +179,13 @@
 	 (const_string "no")
 	] (const_string "yes")))
 
+;; Attribute that specifies whether we are dealing with a branch to a
+;; label that is far away, i.e. further away than the maximum/minimum
+;; representable in a signed 21-bits number.
+;; 0 :=: no
+;; 1 :=: yes
+(define_attr "far_branch" "" (const_int 0))
+
 ;; ---
 ;; Pipeline descriptions and scheduling
 ;; ---
@@ -306,8 +313,23 @@
 			   (label_ref (match_operand 2 "" ""))
 			   (pc)))]
   ""
-  "b%m0\\t%l2"
-  [(set_attr "type" "branch")]
+  {
+if (get_attr_length (insn) == 8)
+  return aarch64_gen_far_branch (operands, 2, "Lbcond", "b%M0\\t");
+else
+  return  "b%m0\\t%l2";
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+	(if_then_else (and 

[PATCH] Refactor optimize isl

2015-09-11 Thread Aditya Kumar
Updated patch with corrections:

Refactor graphite-optimize-isl.c. Renamed function names, variable names,
etc., and indented the source according to gcc style guidelines.  Modified
comments accordingly. No functional change intended.

Passes regtest and bootstrap on x86_64.

gcc/ChangeLog:

2015-09-10  Aditya Kumar  

* graphite-optimize-isl.c (get_tile_map): Refactor.
(get_schedule_for_band): Same.
(getScheduleForBand): Same.
(get_prevector_map): Same.
(get_schedule_for_band_list): Same.
(get_schedule_map): Same.
(get_single_map): Same.
(apply_schedule_map_to_scop): Same.
(optimize_isl): Same.


---
 gcc/graphite-optimize-isl.c | 416 ++--
 1 file changed, 210 insertions(+), 206 deletions(-)

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 811a510..e891e91 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -50,6 +50,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "dumpfile.h"
 
+/* Set this to true to disable tiling of nested loops.  */
+static bool disable_tiling = false;
+
 static isl_union_set *
 scop_get_domains (scop_p scop ATTRIBUTE_UNUSED)
 {
@@ -64,152 +67,153 @@ scop_get_domains (scop_p scop ATTRIBUTE_UNUSED)
   return res;
 }
 
-/* getTileMap - Create a map that describes a n-dimensonal tiling.
-  
-   getTileMap creates a map from a n-dimensional scattering space into an
+/* get_tile_map - Create a map that describes a n-dimensonal tiling.
+
+   get_tile_map creates a map from a n-dimensional scattering space into an
2*n-dimensional scattering space. The map describes a rectangular tiling.
-  
+
Example:
- scheduleDimensions = 2, parameterDimensions = 1, tileSize = 32
- 
-tileMap := [p0] -> {[s0, s1] -> [t0, t1, s0, s1]:
- t0 % 32 = 0 and t0 <= s0 < t0 + 32 and
- t1 % 32 = 0 and t1 <= s1 < t1 + 32}
- 
+ SCHEDULE_DIMENSIONS = 2, PARAMETER_DIMENSIONS = 1, TILE_SIZE = 32
+
+tile_map := [p0] -> {[s0, s1] -> [t0, t1, s0, s1]:
+t0 % 32 = 0 and t0 <= s0 < t0 + 32 and
+t1 % 32 = 0 and t1 <= s1 < t1 + 32}
+
Before tiling:
- 
+
for (i = 0; i < N; i++)
  for (j = 0; j < M; j++)
-   S(i,j)
- 
+   S(i,j)
+
After tiling:
- 
+
for (t_i = 0; t_i < N; i+=32)
  for (t_j = 0; t_j < M; j+=32)
-   for (i = t_i; i < min(t_i + 32, N); i++)  | Unknown that N % 32 = 0
- for (j = t_j; j < t_j + 32; j++)|   Known that M % 32 = 0
-   S(i,j)
-   */
- 
+   for (i = t_i; i < min(t_i + 32, N); i++)  | Unknown that N % 32 = 0
+ for (j = t_j; j < t_j + 32; j++)|   Known that M % 32 = 0
+   S(i,j)
+  */
+
 static isl_basic_map *
-getTileMap (isl_ctx *ctx, int scheduleDimensions, int tileSize)
+get_tile_map (isl_ctx *ctx, int schedule_dimensions, int tile_size)
 {
-  int x;
   /* We construct
 
- tileMap := [p0] -> {[s0, s1] -> [t0, t1, p0, p1, a0, a1]:
-   s0 = a0 * 32 and s0 = p0 and t0 <= p0 < t0 + 32 and
-   s1 = a1 * 32 and s1 = p1 and t1 <= p1 < t1 + 32}
+ tile_map := [p0] -> {[s0, s1] -> [t0, t1, p0, p1, a0, a1]:
+   s0 = a0 * 32 and s0 = p0 and t0 <= p0 < t0 + 32 and
+   s1 = a1 * 32 and s1 = p1 and t1 <= p1 < t1 + 32}
 
  and project out the auxilary dimensions a0 and a1.  */
-  isl_space *Space = isl_space_alloc (ctx, 0, scheduleDimensions,
- scheduleDimensions * 3);
-  isl_basic_map *tileMap = isl_basic_map_universe (isl_space_copy (Space));
+  isl_space *space
+  = isl_space_alloc (ctx, 0, schedule_dimensions, schedule_dimensions * 3);
+  isl_basic_map *tile_map = isl_basic_map_universe (isl_space_copy (space));
 
-  isl_local_space *LocalSpace = isl_local_space_from_space (Space);
+  isl_local_space *local_space = isl_local_space_from_space (space);
 
-  for (x = 0; x < scheduleDimensions; x++)
+  for (int x = 0; x < schedule_dimensions; x++)
 {
   int sX = x;
   int tX = x;
-  int pX = scheduleDimensions + x;
-  int aX = 2 * scheduleDimensions + x;
+  int pX = schedule_dimensions + x;
+  int aX = 2 * schedule_dimensions + x;
 
   isl_constraint *c;
 
-  /* sX = aX * tileSize; */
-  c = isl_equality_alloc (isl_local_space_copy (LocalSpace));
+  /* sX = aX * tile_size; */
+  c = isl_equality_alloc (isl_local_space_copy (local_space));
   isl_constraint_set_coefficient_si (c, isl_dim_out, sX, 1);
-  isl_constraint_set_coefficient_si (c, isl_dim_out, aX, -tileSize);
-  tileMap = isl_basic_map_add_constraint (tileMap, c);
+  isl_constraint_set_coefficient_si (c, isl_dim_out, aX, -tile_size);
+  tile_map = isl_basic_map_add_constraint (tile_map, c);
 
   /* pX = sX; */
-  c = 

[PATCH] Remove dead code from graphite-optimize-isl.c

2015-09-11 Thread Aditya Kumar
The variable `static bool enable_polly_vector' is always assigned false,
which results in dead code in graphite-optimize-isl.c.
Remove the dead code. No functional change intended.

Passes bootstrap and regtest.

gcc/ChangeLog:

2015-09-11  Aditya Kumar  

* graphite-optimize-isl.c (get_prevector_map): Delete function.
(get_schedule_for_band_list): Remove dead code.
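For reference, the strip-mining that the deleted get_prevector_map implemented corresponds to the following C sketch (illustrative only; VW stands for the vector width, using 4 as in the removed code's example):

```c
#include <assert.h>

#define N 128
#define VW 4 /* assumed vector width, matching the removed code's example */

/* Strip-mined form of "for (i = 0; i < N; i++) sum += i;": the outer
   tile loop steps by VW and the inner point loop runs exactly VW
   iterations, which is what made the loop trivially vectorizable.  */
static long
sum_strip_mined (void)
{
  long sum = 0;
  for (int it = 0; it < N; it += VW)
    for (int ip = it; ip < it + VW; ip++)
      sum += ip;
  return sum;
}
```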



---
 gcc/graphite-optimize-isl.c | 136 ++--
 1 file changed, 4 insertions(+), 132 deletions(-)

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index e891e91..b2a9a3f 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -204,119 +204,13 @@ get_schedule_for_band (isl_band *band, int *dimensions)
   return isl_union_map_apply_range (partial_schedule, tile_umap);
 }
 
-/* Create a map that pre-vectorizes one scheduling dimension.
-
-   get_prevector_map creates a map that maps each input dimension to the same
-   output dimension, except for the dimension DIM_TO_VECTORIZE.
-   DIM_TO_VECTORIZE is
-   strip mined by 'VECTOR_WIDTH' and the newly created point loop of
-   DIM_TO_VECTORIZE is moved to the innermost level.
-
-   Example (DIM_TO_VECTORIZE=0, SCHEDULE_DIMENSIONS=2,VECTOR_WIDTH=4):
-
-   | Before transformation
-   |
-   | A[i,j] -> [i,j]
-   |
-   | for (i = 0; i < 128; i++)
-   |for (j = 0; j < 128; j++)
-   |  A(i,j);
-
- Prevector map:
- [i,j] -> [it,j,ip] : it % 4 = 0 and it <= ip <= it + 3 and i = ip
-
-   | After transformation:
-   |
-   | A[i,j] -> [it,j,ip] : it % 4 = 0 and it <= ip <= it + 3 and i = ip
-   |
-   | for (it = 0; it < 128; it+=4)
-   |for (j = 0; j < 128; j++)
-   |  for (ip = max(0,it); ip < min(128, it + 3); ip++)
-   |A(ip,j);
-
-   The goal of this transformation is to create a trivially vectorizable loop.
-   This means a parallel loop at the innermost level that has a constant number
-   of iterations corresponding to the target vector width.
-
-   This transformation creates a loop at the innermost level. The loop has a
-   constant number of iterations, if the number of loop iterations at
-   DIM_TO_VECTORIZE can be devided by VECTOR_WIDTH. The default VECTOR_WIDTH is
-   currently constant and not yet target specific. This function does not
-   reason about parallelism.  */
-static isl_map *
-get_prevector_map (isl_ctx *ctx, int dim_to_vectorize, int schedule_dimensions,
-  int vector_width)
-{
-  isl_space *space;
-  isl_local_space *local_space, *local_space_range;
-  isl_set *modulo;
-  isl_map *tiling_map;
-  isl_constraint *c;
-  isl_aff *aff;
-  int point_dimension; /* ip */
-  int tile_dimension;  /* it */
-  isl_val *vector_widthMP;
-  int i;
-
-  /* assert (0 <= DimToVectorize && DimToVectorize < ScheduleDimensions);*/
-
-  space
-  = isl_space_alloc (ctx, 0, schedule_dimensions, schedule_dimensions + 1);
-  tiling_map = isl_map_universe (isl_space_copy (space));
-  local_space = isl_local_space_from_space (space);
-  point_dimension = schedule_dimensions;
-  tile_dimension = dim_to_vectorize;
-
-  /* Create an identity map for everything except DimToVectorize and map
- DimToVectorize to the point loop at the innermost dimension.  */
-  for (i = 0; i < schedule_dimensions; i++)
-{
-  c = isl_equality_alloc (isl_local_space_copy (local_space));
-  isl_constraint_set_coefficient_si (c, isl_dim_in, i, -1);
-
-  if (i == dim_to_vectorize)
-   isl_constraint_set_coefficient_si (c, isl_dim_out, point_dimension, 1);
-  else
-   isl_constraint_set_coefficient_si (c, isl_dim_out, i, 1);
-
-  tiling_map = isl_map_add_constraint (tiling_map, c);
-}
-
-  /* it % 'VectorWidth' = 0  */
-  local_space_range
-  = isl_local_space_range (isl_local_space_copy (local_space));
-  aff = isl_aff_zero_on_domain (local_space_range);
-  aff = isl_aff_set_constant_si (aff, vector_width);
-  aff = isl_aff_set_coefficient_si (aff, isl_dim_in, tile_dimension, 1);
-
-  vector_widthMP = isl_val_int_from_si (ctx, vector_width);
-  aff = isl_aff_mod_val (aff, vector_widthMP);
-  modulo = isl_pw_aff_zero_set (isl_pw_aff_from_aff (aff));
-  tiling_map = isl_map_intersect_range (tiling_map, modulo);
-
-  /* it <= ip */
-  c = isl_inequality_alloc (isl_local_space_copy (local_space));
-  isl_constraint_set_coefficient_si (c, isl_dim_out, tile_dimension, -1);
-  isl_constraint_set_coefficient_si (c, isl_dim_out, point_dimension, 1);
-  tiling_map = isl_map_add_constraint (tiling_map, c);
-
-  /* ip <= it + ('VectorWidth' - 1) */
-  c = isl_inequality_alloc (local_space);
-  isl_constraint_set_coefficient_si (c, isl_dim_out, tile_dimension, 1);
-  isl_constraint_set_coefficient_si (c, isl_dim_out, point_dimension, -1);
-  isl_constraint_set_constant_si (c, vector_width - 1);
-  tiling_map = isl_map_add_constraint (tiling_map, c);
-
-  return tiling_map;
-}
-
-static bool enable_polly_vector = false;
 
 /* 

[testsuite] Link gcc.dg/pie-link.c with -pie

2015-09-11 Thread Rainer Orth
While starting to develop the patch for Solaris PIE support,
the gcc.dg/pie-link.c test succeeded at a point when it shouldn't have,
i.e. before I had PIC crt files.  In its current form, the test doesn't
test what it's supposed to test, namely successfully linking position
independent executables, because it compiles with -fpie, but links as a
regular executable.  The test was added here

2007-06-01  Geoffrey Keating  

* gcc.dg/pie-link.c: New test.

https://gcc.gnu.org/ml/gcc-patches/2007-06/msg00070.html

-pie doesn't make a difference on Darwin, but very much so on Linux and
Solaris, where it is necessary for PIE creation.

Fixed thus, tested on x86_64-unknown-linux-gnu and *-*-solaris2.12 with
my upcoming PIE patch.

Ok for mainline and gcc-5 branch?

Rainer


2015-02-10  Rainer Orth  

* gcc.dg/pie-link.c: Add -pie to dg-options.

# HG changeset patch
# Parent 3e8c1a7c7f81ba581f0dd21ef2fc84ba136ec40d
Link gcc.dg/pie-link.c with -pie

diff --git a/gcc/testsuite/gcc.dg/pie-link.c b/gcc/testsuite/gcc.dg/pie-link.c
--- a/gcc/testsuite/gcc.dg/pie-link.c
+++ b/gcc/testsuite/gcc.dg/pie-link.c
@@ -1,5 +1,5 @@
 /* { dg-do link { target pie } } */
-/* { dg-options "-fpie" } */
+/* { dg-options "-fpie -pie" } */
 
 int main(void)
 {

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [testsuite] Link gcc.dg/pie-link.c with -pie

2015-09-11 Thread Jakub Jelinek
On Fri, Sep 11, 2015 at 11:08:03AM +0200, Rainer Orth wrote:
> While starting to develop the patch for Solaris PIE support,
> the gcc.dg/pie-link.c test succeeded at a point when it shouldn't have,
> i.e. before I had PIC crt files.  In its current form, the test doesn't
> test what it's supposed to test, namely successfully linking position
> independent executables, because it compiles with -fpie, but links as a
> regular executable.  The test was added here
> 
> 2007-06-01  Geoffrey Keating  
> 
>   * gcc.dg/pie-link.c: New test.
> 
> https://gcc.gnu.org/ml/gcc-patches/2007-06/msg00070.html
> 
> -pie doesn't make a difference on Darwin, but very much so on Linux and
> Solaris, where it is necessary for PIE creation.
> 
> Fixed thus, tested on x86_64-unknown-linux-gnu and *-*-solaris2.12 with
> my upcoming PIE patch.
> 
> Ok for mainline and gcc-5 branch?
> 
>   Rainer
> 
> 
> 2015-02-10  Rainer Orth  
> 
>   * gcc.dg/pie-link.c: Add -pie to dg-options.

Ok (if it doesn't work on Darwin, it can be conditionalized on !darwin).

> --- a/gcc/testsuite/gcc.dg/pie-link.c
> +++ b/gcc/testsuite/gcc.dg/pie-link.c
> @@ -1,5 +1,5 @@
>  /* { dg-do link { target pie } } */
> -/* { dg-options "-fpie" } */
> +/* { dg-options "-fpie -pie" } */
>  
>  int main(void)
>  {

Jakub


Re: [PATCH 2/2] shrink-wrap: Rewrite try_shrink_wrapping

2015-09-11 Thread Bernd Schmidt

On 09/10/2015 05:14 PM, Segher Boessenkool wrote:

This patch rewrites the shrink-wrapping algorithm, allowing non-linear
pieces of CFG to be duplicated for use without prologue instead of just
linear pieces.


An example would be good for this kind of patch, also in the comments.



-  add_to_hard_reg_set (, GET_MODE (dreg),
-  REGNO (dreg));
+  add_to_hard_reg_set (, GET_MODE (dreg), REGNO (dreg));
  }


In general please try to avoid mixing formatting changes with functional 
ones. No need to remove these ones from future versions of the patch though.



+  /* If there is more than one predecessor of PRO not dominated by PRO,
+ fail.  We don't have to do this (can split the block), but do this
+ for now (the original code disallowed this, too).


Comments shouldn't reference previous versions. Also, a comment 
describing the why rather than just what is being done would be more 
helpful.


I'm wondering how your new algorithm prevents the prologue from being 
placed inside a loop. Can you have a situation where this picks a 
predecessor that is reachable but not dominated by PRO?



+  int num = (*entry_edge)->probability;
+  int den = REG_BR_PROB_BASE;
+
+
+  if (*entry_edge == orig_entry_edge)
+goto out;


There are a couple of places where there are multiple blank lines. I 
think we avoid that.



+   FOR_EACH_EDGE (e, ei, src->succs)
+ {
+   if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
  {
-   all_set = false;
-   break;
  }


Minor, but I'd prefer a continue rather than an empty { } block.

Other than that it looks pretty good.


Bernd


Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions

2015-09-11 Thread James Greenhalgh
On Fri, Sep 11, 2015 at 10:04:12AM +0100, Ramana Radhakrishnan wrote:
> On Fri, Sep 11, 2015 at 10:53:13AM +0200, Bernd Schmidt wrote:
> > On 09/10/2015 11:11 PM, Jeff Law wrote:
> > >I think that's probably what James is most interested in getting some
> > >ideas around -- the cost model.
> > >
> > >I think the fundamental problem is BRANCH_COST isn't actually relative
> > >to anything other than the default value of "1".  It doesn't directly
> > >correspond to COSTS_N_INSNS or anything else.  So while using
> > >COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually
> > >doesn't.  It's not even clear how a value of 10 relates to a value of 1
> > >other than it's more expensive.
> > >
> > >ifcvt (and others) comparing to magic #s is more than a bit lame.  But
> > >with BRANCH_COST having no meaning relative to anything else I can see
> > >why Richard did things that way.
> > >
> > >In an ideal world we'd find some mapping from BRANCH_COST that relates
> > >to CONST_N_INSNS.  I suspect a simple mapping doesn't necessarily exist
> > >and we'll likely regress targets with any simplistic mapping.  But maybe
> > >now is the time to address that fundamental problem and deal with the
> > >fallout.
> > 
> > I think the right approach if we want to fix this is a new
> > branch_cost_ninsns target hook (maybe with arguments
> > taken_percentage, predictability), and gradually move everything to
> > use that instead of BRANCH_COST.
> 
> Perhaps providing backends with the entire if-then-else block along
> with the above mentioned information being if converted may be another
> approach, it allows the backends to analyse what cases are good to
> if-convert as per the ISA or micro-architecture and what aren't.

I'm not sure how much of this is likely to be target-dependent and how
much can just be abstracted to common ifcvt code reusing rtx_costs.

I've been sketching out a rough idea of a more applicable cost model for
RTL ifcvt, taking into consideration what David mentioned regarding the
talks at cauldron. The question we want to ask is:

Which is preferable between:

  Before:
   (COSTS_N_INSNS cost of the compare+branch insns at the tail of the if BB.
 ??? (possibly) some factor related to BRANCH_COST)
   + weighted cost of then BB.
   + (if needed) weighted cost of else BB.

  After:
   seq_cost the candidate new sequence.
  
The weighted cost of the two BBs should mix in some idea as to the relative
probability that we execute them.

The tough part is figuring out how to (reasonably) factor in branch cost.
The reason that is tough is that BRANCH_COST is used inconsistently. Normally
it is not measured relative to anything, but is compared against magic numbers
for optimizations (each of which are really their own question to be posed
as above).

I don't have a good answer to that, nor a good answer as to what BRANCH_COST
should represent in future. The use in the compiler is sort-of consistent
with a measurement against instruction counts (i.e. a branch cost of 3 means
a branch is equivalent to 3 cheap instructions), but is sometimes just used
as a measure of expensive (a branch cost of >= 2 means that abs should be
expanded using a sequence of bit operations).

I'll look in to how the code in ifcvt starts to look with a modified cost
model and get back to you...

James



Re: [libgo] Use stat_atim.go on Solaris 12+

2015-09-11 Thread Rainer Orth
Ian Lance Taylor  writes:

> On Wed, Aug 26, 2015 at 4:14 AM, Rainer Orth
>  wrote:
>> Solaris 12 changes the stat_[amc]tim members of struct stat from
>> timestruc_t to timespec_t for XPG7 compatibility, thus breaking the libgo
>> build.  The following patch checks for this change and uses the common
>> stat_atim.go if appropriate.
>>
>> Btw., I noticed that go/os/stat_atim.go and stat_dragonfly.go are identical;
>> no idea why that would be useful.
>>
>> Bootstrapped without regressions on i386-pc-solaris2.1[12] and
>> sparc-sun-solaris2.1[12].
>>
>> I had to regenerate aclocal.m4 since for some reason it had been built
>> with automake 1.11.1 instead of the common 1.11.6, thus inhibiting
>> Makefile.in regeneration.
>>
>> Ok for mainline now and the gcc 5 branch after some soak time?
>>
>> Rainer
>>
>>
>> 2015-02-10  Rainer Orth  
>>
>> * configure.ac (have_stat_timespec): Check for timespec_t st_atim
>> in .
>> (HAVE_STAT_TIMESPEC): New conditional.
>> * configure: Regenerate.
>> * Makefile.am [LIBGO_IS_SOLARIS && HAVE_STAT_TIMESPEC]
>> (go_os_stat_file): Use go/os/stat_atim.go.
>> * aclocal.m4: Regenerate.
>> * Makefile.in: Regenerate.
>
> Thanks.  Committed to mainline.
>
> Sorry for the slow review.

Thanks, and no worries: I just wanted to make sure it can make it into
GCC 5.3.

> This is fine to commit to GCC 5 branch.

I'll let it soak on mainline for a week or two and commit to the branch
then.

> stat_atim.go is a gccgo-specific file.  stat_dragonfly.go came in from
> the master Go repository.  Just another thing to straighten out some
> time.

I see: such differences can make things a bit confusing ;-)

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[patch] libstdc++/65092 Allocator-extended constructors for container adaptors.

2015-09-11 Thread Jonathan Wakely

These should have been added as part of finishing C++11 support but I
missed them.

Now we have even more constructors to slow down compilation, yay
allocators.

Tested powerpc64le-linux, committed to trunk.



commit 0a8e02d699f4fe30e44678eb7cbd08bb6c6aed97
Author: Jonathan Wakely 
Date:   Fri Sep 11 10:19:43 2015 +0100

Allocator-extended constructors for container adaptors.

	PR libstdc++/65092
	* include/bits/stl_queue.h (queue, priority_queue): Add
	allocator-extended constructors.
	* include/bits/stl_stack.h (stack): Likewise.
	* testsuite/23_containers/priority_queue/requirements/
	uses_allocator.cc: Test allocator-extended constructors.
	* testsuite/23_containers/queue/requirements/uses_allocator.cc:
	Likewise.
	* testsuite/23_containers/stack/requirements/uses_allocator.cc:
	Likewise.

diff --git a/libstdc++-v3/include/bits/stl_queue.h b/libstdc++-v3/include/bits/stl_queue.h
index 5f8e6fb..f7e5e30 100644
--- a/libstdc++-v3/include/bits/stl_queue.h
+++ b/libstdc++-v3/include/bits/stl_queue.h
@@ -110,6 +110,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 friend bool
 operator<(const queue<_Tp1, _Seq1>&, const queue<_Tp1, _Seq1>&);
 
+#if __cplusplus >= 201103L
+  template<typename _Alloc>
+	using _Uses = typename
+	  enable_if<uses_allocator<_Sequence, _Alloc>::value>::type;
+#endif
+
 public:
   typedef typename _Sequence::value_typevalue_type;
   typedef typename _Sequence::reference reference;
@@ -144,6 +150,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   explicit
   queue(_Sequence&& __c = _Sequence())
   : c(std::move(__c)) { }
+
+  template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
+	explicit
+	queue(const _Alloc& __a)
+	: c(__a) { }
+
+  template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
+	queue(const _Sequence& __c, const _Alloc& __a)
+	: c(__c, __a) { }
+
+  template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
+	queue(_Sequence&& __c, const _Alloc& __a)
+	: c(std::move(__c), __a) { }
+
+  template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
+	queue(const queue& __q, const _Alloc& __a)
+	: c(__q.c, __a) { }
+
+  template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
+	queue(queue&& __q, const _Alloc& __a)
+	: c(std::move(__q.c), __a) { }
 #endif
 
   /**
@@ -378,6 +405,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __glibcxx_class_requires4(_Compare, bool, _Tp, _Tp,
 _BinaryFunctionConcept)
 
+#if __cplusplus >= 201103L
+  template<typename _Alloc>
+	using _Uses = typename
+	  enable_if<uses_allocator<_Sequence, _Alloc>::value>::type;
+#endif
+
 public:
   typedef typename _Sequence::value_typevalue_type;
   typedef typename _Sequence::reference reference;
@@ -412,6 +445,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		 _Sequence&& __s = _Sequence())
   : c(std::move(__s)), comp(__x)
   { std::make_heap(c.begin(), c.end(), comp); }
+
+  template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
+	explicit
+	priority_queue(const _Alloc& __a)
+	: c(__a) { }
+
+  template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
+	priority_queue(const _Compare& __x, const _Alloc& __a)
+	: c(__x, __a) { }
+
+  template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
+	priority_queue(const _Compare& __x, const _Sequence& __c,
+		   const _Alloc& __a)
+	: c(__x, __c, __a) { }
+
+  template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
+	priority_queue(const _Compare& __x, _Sequence&& __c, const _Alloc& __a)
+	: c(__x, std::move(__c), __a) { }
+
+  template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
+	priority_queue(const priority_queue& __q, const _Alloc& __a)
+	: c(__q.c, __a) { }
+
+  template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
+	priority_queue(priority_queue&& __q, const _Alloc& __a)
+	: c(std::move(__q.c), __a) { }
 #endif
 
   /**
diff --git a/libstdc++-v3/include/bits/stl_stack.h b/libstdc++-v3/include/bits/stl_stack.h
index 09dd611..0b54d1a 100644
--- a/libstdc++-v3/include/bits/stl_stack.h
+++ b/libstdc++-v3/include/bits/stl_stack.h
@@ -114,6 +114,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 friend bool
 operator<(const stack<_Tp1, _Seq1>&, const stack<_Tp1, _Seq1>&);
 
+#if __cplusplus >= 201103L
+  template<typename _Alloc>
+	using _Uses = typename
+	  enable_if<uses_allocator<_Sequence, _Alloc>::value>::type;
+#endif
+
 public:
   typedef typename _Sequence::value_typevalue_type;
   typedef typename _Sequence::reference reference;
@@ -142,6 +148,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   explicit
   stack(_Sequence&& __c = _Sequence())
   : c(std::move(__c)) { }
+
+  template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
+	explicit
+	stack(const _Alloc& __a)
+	: c(__a) { }
+
+  template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
+	stack(const _Sequence& __c, const _Alloc& __a)
+	: c(__c, __a) { }
+
+  template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
+	stack(_Sequence&& __c, const _Alloc& __a)
+	: c(std::move(__c), __a) { }
+
+  template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
+	stack(const stack& __q, const _Alloc& __a)
+	: c(__q.c, __a) { }
+
+  template<typename _Alloc, typename _Requires = _Uses<_Alloc>>
+	stack(stack&& __q, const _Alloc& __a)
+	: c(std::move(__q.c), __a) { }
 #endif
 
   /**
diff --git a/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/uses_allocator.cc b/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/uses_allocator.cc
index 75729ff..9419ac8 100644
--- a/libstdc++-v3/testsuite/23_containers/priority_queue/requirements/uses_allocator.cc
+++ 

Re: [RFC AArch64][PR 63304] Handle literal pools for functions > 1 MiB in size.

2015-09-11 Thread Ramana Radhakrishnan
On Thu, Aug 27, 2015 at 03:07:30PM +0100, Marcus Shawcroft wrote:
> On 27 July 2015 at 15:33, Ramana Radhakrishnan
>  wrote:
> 
> >   Ramana Radhakrishnan  
> >
> > PR target/63304
> > * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Handle
> > nopcrelative_literal_loads.
> > (aarch64_classify_address): Likewise.
> > (aarch64_constant_pool_reload_icode): Define.
> > (aarch64_secondary_reload): Handle secondary reloads for
> > literal pools.
> > (aarch64_override_options): Handle nopcrelative_literal_loads.
> > (aarch64_classify_symbol): Handle nopcrelative_literal_loads.
> > * config/aarch64/aarch64.md 
> > (aarch64_reload_movcp):
> > Define.
> > (aarch64_reload_movcp): Likewise.
> > * config/aarch64/aarch64.opt: New option 
> > mnopc-relative-literal-loads
> > * config/aarch64/predicates.md (aarch64_constant_pool_symref): New
> > predicate.
> > * doc/invoke.texi (mnopc-relative-literal-loads): Document.
> 
> This looks OK to me. It needs rebasing, but OK if the rebase is
> trivial.  Default on is fine.  Hold off on the back ports for a couple
> of weeks.
> Cheers
> /Marcus

I didn't want to commit this and run off on holiday.

The rebase required is pretty much for Kyrill's work with saving
and restoring state for the target attributes stuff. So that's simple enough
and been tested ok.

I had forgotten there was a pre-requisite that requires a rebase after Alan's
recent work for F16, I've posted that again here after rebase for
approval.

https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02074.html

movtf is unnecessary as a separate expander. Move this to be with
the standard scalar floating point expanders.

Achieved by adding a new iterator and then using the same.

Tested cross aarch64-none-elf and no regressions.

Ramana

* config/aarch64/aarch.md (mov:GPF_F16): Use GPF_TF_F16.
(movtf): Delete.
* config/aarch64/iterators.md (GPF_TF_F16): New.
(GPF_F16): Delete.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 2522982..58bb04a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1043,8 +1043,8 @@
 })
 
(define_expand "mov<mode>"
-  [(set (match_operand:GPF_F16 0 "nonimmediate_operand" "")
-   (match_operand:GPF_F16 1 "general_operand" ""))]
+  [(set (match_operand:GPF_TF_F16 0 "nonimmediate_operand" "")
+   (match_operand:GPF_TF_F16 1 "general_operand" ""))]
   ""
   {
 if (!TARGET_FLOAT)
@@ -1118,24 +1118,6 @@
  f_loadd,f_stored,load1,store1,mov_reg")]
 )
 
-(define_expand "movtf"
-  [(set (match_operand:TF 0 "nonimmediate_operand" "")
-   (match_operand:TF 1 "general_operand" ""))]
-  ""
-  {
-if (!TARGET_FLOAT)
-  {
-   aarch64_err_no_fpadvsimd (TFmode, "code");
-   FAIL;
-  }
-
-if (GET_CODE (operands[0]) == MEM
-&& ! (GET_CODE (operands[1]) == CONST_DOUBLE
- && aarch64_float_const_zero_rtx_p (operands[1])))
-  operands[1] = force_reg (TFmode, operands[1]);
-  }
-)
-
 (define_insn "*movtf_aarch64"
   [(set (match_operand:TF 0
 "nonimmediate_operand" "=w,?,w ,?r,w,?w,w,m,?r ,Ump,Ump")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 475aa6e..c1a0ce2 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -38,8 +38,8 @@
 ;; Iterator for General Purpose Floating-point registers (32- and 64-bit modes)
 (define_mode_iterator GPF [SF DF])
 
-;; Iterator for General Purpose Float registers, inc __fp16.
-(define_mode_iterator GPF_F16 [HF SF DF])
+;; Iterator for all scalar floating point modes (HF, SF, DF and TF)
+(define_mode_iterator GPF_TF_F16 [HF SF DF TF])
 
 ;; Integer vector modes.
 (define_mode_iterator VDQ_I [V8QI V16QI V4HI V8HI V2SI V4SI V2DI])




Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions

2015-09-11 Thread Kyrill Tkachov


On 10/09/15 22:11, Jeff Law wrote:

On 09/10/2015 12:23 PM, Bernd Schmidt wrote:

  > No testcase provided, as currently I don't know of targets with a high
  > enough branch cost to actually trigger the optimisation.

Hmm, so the code would not actually be used right now? In that case I'll
leave it to others to decide whether we want to apply it. Other than the
points above it looks OK to me.

Some targets have -mbranch-cost to allow overriding the default costing.
   visium has a branch cost of 10!  Several ports have a cost of 6 either
unconditionally or when the branch is not well predicted.

Presumably James is more interested in the ARM/AArch64 targets ;-)

I think that's probably what James is most interested in getting some
ideas around -- the cost model.

I think the fundamental problem is BRANCH_COST isn't actually relative
to anything other than the default value of "1".  It doesn't directly
correspond to COSTS_N_INSNS or anything else.  So while using
COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually
doesn't.  It's not even clear how a value of 10 relates to a value of 1
other than it's more expensive.

ifcvt (and others) comparing to magic #s is more than a bit lame.  But
with BRANCH_COST having no meaning relative to anything else I can see
why Richard did things that way.


Out of interest, what was the intended original meaning
of branch costs if it was not to be relative to instructions?

Thanks,
Kyrill


In an ideal world we'd find some mapping from BRANCH_COST that relates
to CONST_N_INSNS.  I suspect a simple mapping doesn't necessarily exist
and we'll likely regress targets with any simplistic mapping.  But maybe
now is the time to address that fundamental problem and deal with the
fallout.

jeff






Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions

2015-09-11 Thread Bernd Schmidt

On 09/10/2015 11:11 PM, Jeff Law wrote:

I think that's probably what James is most interested in getting some
ideas around -- the cost model.

I think the fundamental problem is BRANCH_COST isn't actually relative
to anything other than the default value of "1".  It doesn't directly
correspond to COSTS_N_INSNS or anything else.  So while using
COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually
doesn't.  It's not even clear how a value of 10 relates to a value of 1
other than it's more expensive.

ifcvt (and others) comparing to magic #s is more than a bit lame.  But
with BRANCH_COST having no meaning relative to anything else I can see
why Richard did things that way.

In an ideal world we'd find some mapping from BRANCH_COST that relates
to CONST_N_INSNS.  I suspect a simple mapping doesn't necessarily exist
and we'll likely regress targets with any simplistic mapping.  But maybe
now is the time to address that fundamental problem and deal with the
fallout.


I think the right approach if we want to fix this is a new 
branch_cost_ninsns target hook (maybe with arguments taken_percentage, 
predictability), and gradually move everything to use that instead of 
BRANCH_COST.



Bernd




Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions

2015-09-11 Thread Ramana Radhakrishnan
On Fri, Sep 11, 2015 at 10:53:13AM +0200, Bernd Schmidt wrote:
> On 09/10/2015 11:11 PM, Jeff Law wrote:
> >I think that's probably what James is most interested in getting some
> >ideas around -- the cost model.
> >
> >I think the fundamental problem is BRANCH_COST isn't actually relative
> >to anything other than the default value of "1".  It doesn't directly
> >correspond to COSTS_N_INSNS or anything else.  So while using
> >COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually
> >doesn't.  It's not even clear how a value of 10 relates to a value of 1
> >other than it's more expensive.
> >
> >ifcvt (and others) comparing to magic #s is more than a bit lame.  But
> >with BRANCH_COST having no meaning relative to anything else I can see
> >why Richard did things that way.
> >
> >In an ideal world we'd find some mapping from BRANCH_COST that relates
> >to CONST_N_INSNS.  I suspect a simple mapping doesn't necessarily exist
> >and we'll likely regress targets with any simplistic mapping.  But maybe
> >now is the time to address that fundamental problem and deal with the
> >fallout.
> 
> I think the right approach if we want to fix this is a new
> branch_cost_ninsns target hook (maybe with arguments
> taken_percentage, predictability), and gradually move everything to
> use that instead of BRANCH_COST.

Perhaps providing backends with the entire if-then-else block along
with the above mentioned information being if converted may be another
approach, it allows the backends to analyse what cases are good to
if-convert as per the ISA or micro-architecture and what aren't.

regards
Ramana



Re: [testsuite] Link gcc.dg/pie-link.c with -pie

2015-09-11 Thread Rainer Orth
Jakub Jelinek  writes:

> On Fri, Sep 11, 2015 at 11:08:03AM +0200, Rainer Orth wrote:
>> While starting to develop the patch for Solaris PIE support,
>> the gcc.dg/pie-link.c test succeeded at a point when it shouldn't have,
>> i.e. before I had PIC crt files.  In its current form, the test doesn't
>> test what it's supposed to test, namely successfully linking position
>> independent executables, because it compiles with -fpie, but links as a
>> regular executable.  The test was added here
>> 
>> 2007-06-01  Geoffrey Keating  
>> 
>>  * gcc.dg/pie-link.c: New test.
>> 
>> https://gcc.gnu.org/ml/gcc-patches/2007-06/msg00070.html
>> 
>> -pie doesn't make a difference on Darwin, but very much so on Linux and
>> Solaris, where it is necessary for PIE creation.
>> 
>> Fixed thus, tested on x86_64-unknown-linux-gnu and *-*-solaris2.12 with
>> my upcoming PIE patch.
>> 
>> Ok for mainline and gcc-5 branch?
>> 
>>  Rainer
>> 
>> 
>> 2015-02-10  Rainer Orth  
>> 
>>  * gcc.dg/pie-link.c: Add -pie to dg-options.
>
> Ok (if it doesn't work on Darwin, it can be conditionalized on !darwin).

I had it in my Darwin builds and it passed, -pie being a no-op on that
target.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [RFC AArch64][PR 63304] Handle literal pools for functions > 1 MiB in size.

2015-09-11 Thread Richard Earnshaw
On 11/09/15 09:48, Ramana Radhakrishnan wrote:
> On Thu, Aug 27, 2015 at 03:07:30PM +0100, Marcus Shawcroft wrote:
>> On 27 July 2015 at 15:33, Ramana Radhakrishnan
>>  wrote:
>>
>>>   Ramana Radhakrishnan  
>>>
>>> PR target/63304
>>> * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Handle
>>> nopcrelative_literal_loads.
>>> (aarch64_classify_address): Likewise.
>>> (aarch64_constant_pool_reload_icode): Define.
>>> (aarch64_secondary_reload): Handle secondary reloads for
>>> literal pools.
>>> (aarch64_override_options): Handle nopcrelative_literal_loads.
>>> (aarch64_classify_symbol): Handle nopcrelative_literal_loads.
>>> * config/aarch64/aarch64.md 
>>> (aarch64_reload_movcp):
>>> Define.
>>> (aarch64_reload_movcp): Likewise.
>>> * config/aarch64/aarch64.opt: New option 
>>> mnopc-relative-literal-loads
>>> * config/aarch64/predicates.md (aarch64_constant_pool_symref): New
>>> predicate.
>>> * doc/invoke.texi (mnopc-relative-literal-loads): Document.
>>
>> This looks OK to me. It needs rebasing, but OK if the rebase is
>> trivial.  Default on is fine.  Hold off on the back ports for a couple
>> of weeks.
>> Cheers
>> /Marcus
> 
> I didn't want to commit this and run off on holiday.
> 
> The rebase required is pretty much for Kyrill's work with saving
> and restoring state for the target attributes stuff. So that's simple enough
> and been tested ok.
> 
> I had forgotten there was a pre-requisite that requires a rebase after Alan's
> recent work for F16, I've posted that again here after rebase for
> approval.
> 
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02074.html
> 
> movtf is unnecessary as a separate expander. Move this to be with
> the standard scalar floating point expanders.
> 
> Achieved by adding a new iterator and then using the same.
> 
> Tested cross aarch64-none-elf and no regressions.
> 
> Ramana
> 
>   * config/aarch64/aarch.md (mov:GPF_F16): Use GPF_TF_F16.
>   (movtf): Delete.
>   * config/aarch64/iterators.md (GPF_TF_F16): New.
>   (GPF_F16): Delete.

This is OK.

R.

> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 2522982..58bb04a 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1043,8 +1043,8 @@
>  })
>  
>  (define_expand "mov<mode>"
> -  [(set (match_operand:GPF_F16 0 "nonimmediate_operand" "")
> - (match_operand:GPF_F16 1 "general_operand" ""))]
> +  [(set (match_operand:GPF_TF_F16 0 "nonimmediate_operand" "")
> + (match_operand:GPF_TF_F16 1 "general_operand" ""))]
>""
>{
>  if (!TARGET_FLOAT)
> @@ -1118,24 +1118,6 @@
>   f_loadd,f_stored,load1,store1,mov_reg")]
>  )
>  
> -(define_expand "movtf"
> -  [(set (match_operand:TF 0 "nonimmediate_operand" "")
> - (match_operand:TF 1 "general_operand" ""))]
> -  ""
> -  {
> -if (!TARGET_FLOAT)
> -  {
> - aarch64_err_no_fpadvsimd (TFmode, "code");
> - FAIL;
> -  }
> -
> -if (GET_CODE (operands[0]) == MEM
> -&& ! (GET_CODE (operands[1]) == CONST_DOUBLE
> -   && aarch64_float_const_zero_rtx_p (operands[1])))
> -  operands[1] = force_reg (TFmode, operands[1]);
> -  }
> -)
> -
>  (define_insn "*movtf_aarch64"
>[(set (match_operand:TF 0
>"nonimmediate_operand" "=w,?,w ,?r,w,?w,w,m,?r ,Ump,Ump")
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 475aa6e..c1a0ce2 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -38,8 +38,8 @@
>  ;; Iterator for General Purpose Floating-point registers (32- and 64-bit 
> modes)
>  (define_mode_iterator GPF [SF DF])
>  
> -;; Iterator for General Purpose Float registers, inc __fp16.
> -(define_mode_iterator GPF_F16 [HF SF DF])
> +;; Iterator for all scalar floating point modes (HF, SF, DF and TF)
> +(define_mode_iterator GPF_TF_F16 [HF SF DF TF])
>  
>  ;; Integer vector modes.
>  (define_mode_iterator VDQ_I [V8QI V16QI V4HI V8HI V2SI V4SI V2DI])
> 
> 



Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67465: Handle pairs of complex+simple blocks and empty blocks more gracefully

2015-09-11 Thread Rainer Orth
Kyrill Tkachov  writes:

> On 10/09/15 12:43, Rainer Orth wrote:
>> Hi Kyrill,
>>
>>> Rainer, could you please check that this patch still fixes the SPARC
>>> regressions?
>> unfortunately, it breaks sparc-sun-solaris2.10 bootstrap: compiling
>> stage2 libiberty/regex.c FAILs:
>>
>>
>
> Thanks for providing the preprocessed file.
> I've reproduced and fixed the ICE in this version of the patch.
> The problem was that I was taking the mode of x before the check
> of whether a and b are MEMs, after which we would change x to an
> address_mode reg,
> thus confusing emit_move_insn.
>
> The fix is to take the mode of x and perform the can_conditionally_move_p 
> check
> after that transformation.
>
> Bootstrapped and tested on aarch64 and x86_64.
> The preprocessed regex.i that Rainer provided now compiles successfully for me
> on a sparc-sun-solaris2.10 stage-1 cross-compiler.
>
> Rainer, thanks for your help so far, could you please try out this patch?

While bootstrap succeeds again, the testsuite regression in
gcc.c-torture/execute/20071216-1.c reoccured.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] [PATCH][ARM] Fix pr63210.c testcase.

2015-09-11 Thread Alex Velenko

Hi,
Committed fsf-trunk r227677, fsf-5 r227678.
Kind regards,
Alex



Re: [PATCH] PR67401: Fix wrong code generated by expand_atomic_compare_and_swap

2015-09-11 Thread Bernd Schmidt

On 09/11/2015 01:21 AM, John David Anglin wrote:

As noted in the PR, expand_atomic_compare_and_swap can generate wrong code
when libcalls are emitted for the sync_compare_and_swap and the result
comparison test.  This is fixed by emitting a move insn to copy the result
rtx of the sync_compare_and_swap libcall to target_oval instead of directly
assigning it.
Could you provide relevant parts of the rtl dumps or (preferably) the
patch you are using to enable the libcall?



Bernd



Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-09-11 Thread Alan Hayward
Hi Bill,

I’d be a bit worried about asking the backend for the cost of a
COND_REDUCTION, as that will rely on the backend understanding the
implementation the vectorizer is using - every time the vectorizer
changed, the backends would need to be updated too. I’m hoping soon to get
together a patch to reduce the stmts produced on the simpler cases, which
would require a different set of costings. I can also imagine further
improvements being added for other special cases over time. Having the
backends understand every variation would be a little cumbersome.

As it stands today, we correctly exit the optimisation if max reduction
isn’t supported in hardware, which is what the cost model is expecting.


If power wanted to use this implementation, then I think it’d probably
need some code in tree-vect-generic.c to implement an emulated max
reduction, which would then require updates to the costs modelling of
anything that uses max reduction (not just cond reduction). All of that is
outside the scope of this patch.


Thanks,
Alan.

On 10/09/2015 23:14, "Bill Schmidt"  wrote:

>Hi Alan,
>
>The cost modeling of the epilogue code seems pretty target-specific ("An
>EQ stmt and an AND stmt, reduction of the max index and a reduction of
>the found values, a broadcast of the max value," resulting in two
>vector_stmts, one vec_to_scalar, and two scalar_to_vecs).  On powerpc,
>this will not represent the cost accurately, and the cost will indeed be
>quite different depending on the mode (logarithmic in the number of
>elements).  I think that you need to create a new entry in
>vect_cost_for_stmt to represent the cost of a COND_REDUCTION, and allow
>each target to calculate the cost appropriately.
>
>(Powerpc doesn't have a max-reduction hardware instruction, but because
>the reduction will be only in the epilogue code, it may still be
>profitable for us to generate the somewhat expensive reduction sequence
>in order to vectorize the loop.  But we definitely want to model it as
>costly in and of itself.  Also, the sequence will produce the maximum
>value in all positions without a separate broadcast.)
>
>Thanks,
>Bill
>
>On Thu, 2015-09-10 at 15:51 +0100, Alan Hayward wrote:
>> Hi,
>> This patch (attached) adds support for vectorizing conditional
>>expressions
>> (PR 65947), for example:
>> 
>> int condition_reduction (int *a, int min_v)
>> {
>>   int last = 0;
>>   for (int i = 0; i < N; i++)
>> if (a[i] < min_v)
>>   last = a[i];
>>   return last;
>> }
>> 
>> To do this the loop is vectorised to create a vector of data results (ie
>> of matching a[i] values). Using an induction variable, an additional
>> vector is added containing the indexes where the matches occurred. In the
>> function epilogue this is reduced to a single max value and then used to
>> index into the vector of data results.
>> When no values are matched in the loop, the indexes vector will contain
>> all zeroes, eventually matching the first entry in the data results
>>vector.
>> 
>> To vectorize successfully, support is required for REDUC_MAX_EXPR. This
>>is
>> supported by aarch64 and arm. On X86 and powerpc, gcc will complain that
>> REDUC_MAX_EXPR is not supported for the required modes, failing the
>> vectorization. On mips it complains that the required vcond expression
>>is
>> not supported. It is suggested the relevant backend experts add the
>> required backend support.
>> 
>> Using a simple testcase based around a large number of N and run on an
>> aarch64 juno board, with the patch in use, the runtime reduced to 0.8 of
>> its original time.
>> 
>> This patch caused binary differences in three spec2006 binaries on
>>aarch64
>> - 416.gamess, 435.gromacs and 456.hmmer. Running them on a juno board
>> showed no improvement or degradation in runtime.
>> 
>> 
>> In the near future I hope to submit a further patch (as PR 66558) which
>> optimises the case where the result is simply the index of the loop, for
>> example:
>> int condition_reduction (int *a, int min_v)
>> {
>>   int last = 0;
>>   for (int i = 0; i < N; i++)
>> if (a[i] < min_v)
>>   last = i;
>>   return last;
>> }
>> In this case a lot of the new code can be optimized away.
>> 
>> I have run check for aarch64, arm and x86 and have seen no regressions.
>> 
>> 
>> Changelog:
>> 
>> 2015-08-28  Alan Hayward 
>> 
>> PR tree-optimization/65947
>> * tree-vect-loop.c
>> (vect_is_simple_reduction_1): Find condition reductions.
>> (vect_model_reduction_cost): Add condition reduction costs.
>> (get_initial_def_for_reduction): Add condition reduction initial
>> var.
>> (vect_create_epilog_for_reduction): Add condition reduction
>>epilog.
>> (vectorizable_reduction): Condition reduction support.
>> * tree-vect-stmts.c
>> (vectorizable_condition): Add vect reduction arg
>> * doc/sourcebuild.texi (Vector-specific attributes): Document
>> 

Re: [PATCH 2/2] shrink-wrap: Rewrite try_shrink_wrapping

2015-09-11 Thread Jiong Wang

Segher Boessenkool writes:

> On Thu, Sep 10, 2015 at 08:14:29AM -0700, Segher Boessenkool wrote:
>> This patch rewrites the shrink-wrapping algorithm, allowing non-linear
>> pieces of CFG to be duplicated for use without prologue instead of just
>> linear pieces.
>
>> Bootstrapped and regression tested on powerpc64-linux.  Is this okay
>> for mainline?
>
> Now also bootstrapped and regression tested on x86_64-linux.

+ AArch64 bootstrapping OK.

A quick check shows > 30% more functions shrink-wrapped during
bootstrapping by the following command:

cd $TOP_BUILD ; find . -name "*.pro_and_epilogue" | xargs grep "Perform.*shrink" | wc -l

-- 
Regards,
Jiong



RE: [PATCH] Remove dead code from graphite-optimize-isl.c

2015-09-11 Thread Sebastian Paul Pop
For the record, the patch LGTM.
Aditya and I discussed this in the morning.

Thanks,
Sebastian

-Original Message-
From: Aditya Kumar [mailto:aditya...@samsung.com] 
Sent: Friday, September 11, 2015 12:30 PM
To: gcc-patches@gcc.gnu.org
Cc: tob...@grosser.es; richard.guent...@gmail.com; aditya...@samsung.com;
s@samsung.com; seb...@gmail.com
Subject: [PATCH] Remove dead code from graphite-optimize-isl.c

The variable `static bool enable_polly_vector' is always assigned to false.
This results in dead code in optimize-isl.c.
Removing the dead code. No functional change intended.

Passes bootstrap and regtest.

gcc/ChangeLog:

2015-09-11  Aditya Kumar  

* graphite-optimize-isl.c (get_prevector_map): Delete function.
(get_schedule_for_band_list): Remove dead code.



---
 gcc/graphite-optimize-isl.c | 136 ++--
 1 file changed, 4 insertions(+), 132 deletions(-)

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index e891e91..b2a9a3f 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -204,119 +204,13 @@ get_schedule_for_band (isl_band *band, int *dimensions)
   return isl_union_map_apply_range (partial_schedule, tile_umap);
 }
 
-/* Create a map that pre-vectorizes one scheduling dimension.
-
-   get_prevector_map creates a map that maps each input dimension to the
same
-   output dimension, except for the dimension DIM_TO_VECTORIZE.
-   DIM_TO_VECTORIZE is
-   strip mined by 'VECTOR_WIDTH' and the newly created point loop of
-   DIM_TO_VECTORIZE is moved to the innermost level.
-
-   Example (DIM_TO_VECTORIZE=0, SCHEDULE_DIMENSIONS=2,VECTOR_WIDTH=4):
-
-   | Before transformation
-   |
-   | A[i,j] -> [i,j]
-   |
-   | for (i = 0; i < 128; i++)
-   |for (j = 0; j < 128; j++)
-   |  A(i,j);
-
- Prevector map:
- [i,j] -> [it,j,ip] : it % 4 = 0 and it <= ip <= it + 3 and i = ip
-
-   | After transformation:
-   |
-   | A[i,j] -> [it,j,ip] : it % 4 = 0 and it <= ip <= it + 3 and i = ip
-   |
-   | for (it = 0; it < 128; it+=4)
-   |for (j = 0; j < 128; j++)
-   |  for (ip = max(0,it); ip < min(128, it + 3); ip++)
-   |A(ip,j);
-
-   The goal of this transformation is to create a trivially vectorizable
loop.
-   This means a parallel loop at the innermost level that has a constant
number
-   of iterations corresponding to the target vector width.
-
-   This transformation creates a loop at the innermost level. The loop has
a
-   constant number of iterations, if the number of loop iterations at
-   DIM_TO_VECTORIZE can be devided by VECTOR_WIDTH. The default
VECTOR_WIDTH is
-   currently constant and not yet target specific. This function does not
-   reason about parallelism.  */
-static isl_map *
-get_prevector_map (isl_ctx *ctx, int dim_to_vectorize, int
schedule_dimensions,
-  int vector_width)
-{
-  isl_space *space;
-  isl_local_space *local_space, *local_space_range;
-  isl_set *modulo;
-  isl_map *tiling_map;
-  isl_constraint *c;
-  isl_aff *aff;
-  int point_dimension; /* ip */
-  int tile_dimension;  /* it */
-  isl_val *vector_widthMP;
-  int i;
-
-  /* assert (0 <= DimToVectorize && DimToVectorize < ScheduleDimensions);*/
-
-  space
-  = isl_space_alloc (ctx, 0, schedule_dimensions, schedule_dimensions +
1);
-  tiling_map = isl_map_universe (isl_space_copy (space));
-  local_space = isl_local_space_from_space (space);
-  point_dimension = schedule_dimensions;
-  tile_dimension = dim_to_vectorize;
-
-  /* Create an identity map for everything except DimToVectorize and map
- DimToVectorize to the point loop at the innermost dimension.  */
-  for (i = 0; i < schedule_dimensions; i++)
-{
-  c = isl_equality_alloc (isl_local_space_copy (local_space));
-  isl_constraint_set_coefficient_si (c, isl_dim_in, i, -1);
-
-  if (i == dim_to_vectorize)
-   isl_constraint_set_coefficient_si (c, isl_dim_out, point_dimension,
1);
-  else
-   isl_constraint_set_coefficient_si (c, isl_dim_out, i, 1);
-
-  tiling_map = isl_map_add_constraint (tiling_map, c);
-}
-
-  /* it % 'VectorWidth' = 0  */
-  local_space_range
-  = isl_local_space_range (isl_local_space_copy (local_space));
-  aff = isl_aff_zero_on_domain (local_space_range);
-  aff = isl_aff_set_constant_si (aff, vector_width);
-  aff = isl_aff_set_coefficient_si (aff, isl_dim_in, tile_dimension, 1);
-
-  vector_widthMP = isl_val_int_from_si (ctx, vector_width);
-  aff = isl_aff_mod_val (aff, vector_widthMP);
-  modulo = isl_pw_aff_zero_set (isl_pw_aff_from_aff (aff));
-  tiling_map = isl_map_intersect_range (tiling_map, modulo);
-
-  /* it <= ip */
-  c = isl_inequality_alloc (isl_local_space_copy (local_space));
-  isl_constraint_set_coefficient_si (c, isl_dim_out, tile_dimension, -1);
-  isl_constraint_set_coefficient_si (c, isl_dim_out, point_dimension, 1);
-  tiling_map = isl_map_add_constraint (tiling_map, c);
-
-  /* 

Re: [C++ Patch] PR 51911 V2 ("G++ accepts new auto { list }")

2015-09-11 Thread Jason Merrill

On 09/11/2015 03:11 PM, Paolo Carlini wrote:

this is a slightly reworked (simplified) version of a patch I sent a
while ago. The issue is that we are not enforcing at all 5.3.4/2 in the
parser, thus we end up rejecting the first test below with a misleading
error message talking about list-initialization (and a wrong location),
because we diagnose it too late like 'auto foo{3, 4, 5};', and simply
accepting the second. Tested x86_64-linux.


Hmm, I think we really ought to accept

  new auto { 2 }

to be consistent with all the other recent changes to treat { elt } like 
(elt); this seems like a piece that was missed from DR 1467.  Do you 
agree, Ville?


Jason



Re: [PATCH] Convert SPARC to LRA

2015-09-11 Thread Richard Henderson
On 09/11/2015 12:43 PM, David Miller wrote:
> From: David Miller 
> Date: Tue, 08 Sep 2015 21:41:15 -0700 (PDT)
> 
>> I'm therefore reasonably confident in these changes, but I will
>> not apply them just yet to give the other sparc maintainers some
>> time to review and give feedback.
> 
> Richard, Eric, any objections?
> 

Nope.


r~


Merge from trunk to gccgo branch

2015-09-11 Thread Ian Lance Taylor
I merged trunk revision 227689 to the gccgo branch.

Ian


Re: [PATCH] Convert SPARC to LRA

2015-09-11 Thread David Miller
From: David Miller 
Date: Tue, 08 Sep 2015 21:41:15 -0700 (PDT)

> I'm therefore reasonably confident in these changes, but I will
> not apply them just yet to give the other sparc maintainers some
> time to review and give feedback.

Richard, Eric, any objections?


[PATCH] v2 shrink-wrap: Rewrite

2015-09-11 Thread Segher Boessenkool
[ v2: patch adjusted after Bernd's comments ]

This patch rewrites the shrink-wrapping algorithm, allowing non-linear
pieces of CFG to be duplicated for use without prologue instead of just
linear pieces.

On PowerPC, this enables shrink-wrapping of about 2%-3% more functions.
I expected more, but in most cases where this would help we cannot yet
shrink-wrap because there are non-volatile registers used, often in the
first block already.

Since with this patch you still get only one prologue, it doesn't do
much either for the case where there are many no-return error paths
(common in an enable-checking compiler build); all those paths end in
a no-return call, and those need a prologue (are not sibling calls).
There are PRs about this.  For shrink-wrapping, because all those
paths want a prologue we put a prologue early in the function, although
none of the "regular" code needs it.

I instrumented things a bit (not in the patch).  We can get about 10%
to 20% more functions shrink-wrapped by allowing multiple edges that
need a prologue inserted (edges to one and the same block); this can be
easily done by just inserting an extra block.  I'll work on this.

Of the blocks chosen to have the prologue inserted, about 70% need a
prologue because there is a call, 25% for other reasons (non-volatile
register sets mostly), and only 5% do not themselves need a prologue.

There are also cases where no block needs a prologue at all, but GCC
thinks the function needs one nevertheless.  This happens for example
if a stack frame was created for an address-taken local variable, but
that variable was optimised away later.  This doesn't happen much in
most cases (one in a thousand or so).  There are some cases (like -pg)
where the compiler forces a stack frame even if nothing uses it.

Shrink-wrapping is run at -O1, and basic block reordering is not.
Shrink-wrapping would often benefit from some simple reordering.  There
are quite a few targets that do not want the STC bbro at all, either;
we should have a simple bbro that runs at -O1 as well, does not increase
code size, and can be used for those targets that do not want STC.

It also would be nice to get rid of the silly games shrink-wrapping
plays (together with function.c) making fake edges for where the
simple_returns should be inserted.  It would simplify a lot of code
if we would (could) just insert them directly.

Bootstrapped and regression tested on powerpc64-linux.  Is this okay
for mainline?


Segher


2015-09-10  Segher Boessenkool  

* shrink-wrap.c (requires_stack_frame_p): Fix formatting.
(dup_block_and_redirect): Delete function.
(can_dup_for_shrink_wrapping): New function.
(fix_fake_fallthrough_edge): New function.
(try_shrink_wrapping): Rewrite function.
(convert_to_simple_return): Call fix_fake_fallthrough_edge.

---
 gcc/shrink-wrap.c | 788 +-
 1 file changed, 422 insertions(+), 366 deletions(-)

diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index d10795a..1387594 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -91,8 +91,7 @@ requires_stack_frame_p (rtx_insn *insn, HARD_REG_SET prologue_used,
   if (!REG_P (dreg))
continue;
 
-  add_to_hard_reg_set (&hardregs, GET_MODE (dreg),
-  REGNO (dreg));
+  add_to_hard_reg_set (&hardregs, GET_MODE (dreg), REGNO (dreg));
 }
   if (hard_reg_set_intersect_p (hardregs, prologue_used))
 return true;
@@ -463,414 +462,469 @@ prepare_shrink_wrap (basic_block entry_block)
   }
 }
 
-/* Create a copy of BB instructions and insert at BEFORE.  Redirect
-   preds of BB to COPY_BB if they don't appear in NEED_PROLOGUE.  */
-static void
-dup_block_and_redirect (basic_block bb, basic_block copy_bb, rtx_insn *before,
-   bitmap_head *need_prologue)
+/* Return whether we can duplicate basic block BB for shrink wrapping.  We
+   cannot if the block cannot be duplicated at all, or if any of its incoming
+   edges are complex and come from a block that does not require a prologue
+   (we cannot redirect such edges), or if the block is too big to copy.
+   PRO is the basic block before which we would put the prologue, MAX_SIZE is
+   the maximum size block we allow to be copied.  */
+
+static bool
+can_dup_for_shrink_wrapping (basic_block bb, basic_block pro, unsigned max_size)
 {
-  edge_iterator ei;
-  edge e;
-  rtx_insn *insn = BB_END (bb);
+  if (!can_duplicate_block_p (bb))
+return false;
 
-  /* We know BB has a single successor, so there is no need to copy a
- simple jump at the end of BB.  */
-  if (simplejump_p (insn))
-insn = PREV_INSN (insn);
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, bb->preds)
+if (e->flags & EDGE_COMPLEX
+   && !dominated_by_p (CDI_DOMINATORS, e->src, pro))
+  return false;
 
-  start_sequence ();
-  duplicate_insn_chain (BB_HEAD (bb), insn);
-  if (dump_file)
-{
-

[gomp4] PTX partition discovery cleanup

2015-09-11 Thread Nathan Sidwell
This preliminary patch changes PTX's loop discovery to use a recursive DFS walk, 
rather than a worklist.


The significant changes from the POC of the next patch I'll be committing are:

1) always insert a 'fork' insn, which keeps the entry to a partitioned region as 
a possible Single-Entry-Single-Exit exit node.


2) don't add the ENTRY bb to the null loop that contains the entire function.

Both #1 and #2 help with the SESE optimization I've been working on.

nathan
2015-09-11  Nathan Sidwell  

	* config/nvptx/nvptx.c (nvptx_emit_forking): Always emit fork.
	(bb_par_t, bb_par_vec_t): Delete.
	(nvptx_find_par): New recursive function. Broken out of ...
	(nvptx_discover_pars): ... here.  Call it.
	(nvptx_optimize_inner): Write to dump_file.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 227683)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -308,8 +308,8 @@ nvptx_emit_forking (unsigned mask, bool
 {
   rtx op = GEN_INT (mask | (is_call << GOMP_DIM_MAX));
   
-  /* Emit fork for worker level.  */
-  if (!is_call && mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))
+  /* Emit fork at all levels, this helps form SESE regions..  */
+  if (!is_call)
 	emit_insn (gen_nvptx_fork (op));
   emit_insn (gen_nvptx_forked (op));
 }
@@ -2591,7 +2591,7 @@ nvptx_discover_pre (basic_block block, i
   return pre_insn;
 }
 
-/*  Dump this parallel and all its inner parallels.  */
+/* Dump this parallel and all its inner parallels.  */
 
 static void
 nvptx_dump_pars (parallel *par, unsigned depth)
@@ -2614,110 +2614,114 @@ nvptx_dump_pars (parallel *par, unsigned
 nvptx_dump_pars (par->next, depth);
 }
 
-typedef std::pair<basic_block, parallel *> bb_par_t;
-typedef auto_vec<bb_par_t> bb_par_vec_t;
-
-/* Walk the BBG looking for fork & join markers.  Construct a
-   loop structure for the function.  MAP is a mapping of basic blocks
-   to head & taiol markers, discoveded when splitting blocks.  This
-   speeds up the discovery.  We rely on the BB visited flag having
-   been cleared when splitting blocks.  */
+/* If BLOCK contains a fork/join marker, process it to create or
+   terminate a loop structure.  Add this block to the current loop,
+   and then walk successor blocks.   */
 
 static parallel *
-nvptx_discover_pars (bb_insn_map_t *map)
+nvptx_find_par (bb_insn_map_t *map, parallel *par, basic_block block)
 {
-  parallel *outer_par = new parallel (0, 0);
-  bb_par_vec_t worklist;
-  basic_block block;
-
-  // Mark entry and exit blocks as visited.
-  block = EXIT_BLOCK_PTR_FOR_FN (cfun);
+  if (block->flags & BB_VISITED)
+return par;
   block->flags |= BB_VISITED;
-  block = ENTRY_BLOCK_PTR_FOR_FN (cfun);
-  worklist.safe_push (bb_par_t (block, outer_par));
 
-  while (worklist.length ())
+  if (rtx_insn **endp = map->get (block))
 {
-  bb_par_t bb_par = worklist.pop ();
-  parallel *l = bb_par.second;
+  rtx_insn *end = *endp;
 
-  block = bb_par.first;
-
-  // Have we met this block?
-  if (block->flags & BB_VISITED)
-	continue;
-  block->flags |= BB_VISITED;
-  
-  rtx_insn **endp = map->get (block);
-  if (endp)
+  /* This is a block head or tail, or return instruction.  */
+  switch (recog_memoized (end))
 	{
-	  rtx_insn *end = *endp;
-	  
-	  /* This is a block head or tail, or return instruction.  */
-	  switch (recog_memoized (end))
-	{
-	case CODE_FOR_return:
-	  /* Return instructions are in their own block, and we
-		 don't need to do anything more.  */
-	  continue;
+	case CODE_FOR_return:
+	  /* Return instructions are in their own block, and we
+	 don't need to do anything more.  */
+	  return par;
 
-	case CODE_FOR_nvptx_forked:
-	  /* Loop head, create a new inner loop and add it into
-		 our parent's child list.  */
-	  {
-		unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0));
-
-		gcc_assert (mask);
-		l = new parallel (l, mask);
-		l->forked_block = block;
-		l->forked_insn = end;
-		if (!(mask & GOMP_DIM_MASK (GOMP_DIM_MAX))
-		&& (mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)))
-		  l->fork_insn
-		= nvptx_discover_pre (block, CODE_FOR_nvptx_fork);
-	  }
-	  break;
+	case CODE_FOR_nvptx_forked:
+	  /* Loop head, create a new inner loop and add it into
+	 our parent's child list.  */
+	  {
+	unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0));
 
-	case CODE_FOR_nvptx_join:
-	  /* A loop tail.  Finish the current loop and return to
-		 parent.  */
-	  {
-		unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0));
-
-		gcc_assert (l->mask == mask);
-		l->join_block = block;
-		l->join_insn = end;
-		if (!(mask & GOMP_DIM_MASK (GOMP_DIM_MAX))
-		&& (mask & GOMP_DIM_MASK (GOMP_DIM_WORKER)))
-		  l->joining_insn
-		= nvptx_discover_pre (block, CODE_FOR_nvptx_joining);
-		l = l->parent;
-	  }
-	  break;
+	gcc_assert 

[gomp4] parallel reduction nested inside data regions

2015-09-11 Thread Cesar Philippidis
This patch corrects the way that build_outer_var_ref deals with data
mappings in acc parallel and kernels when they are nested in some other
construct (i.e. acc data). This issue can be reproduced with an acc
parallel reduction nested inside an acc data region.

I've applied this fix to gomp-4_0-branch.

Cesar
2015-09-11  Cesar Philippidis  

	gcc/
	* omp-low.c (build_outer_var_ref):

	gcc/testsuite/
	* c-c++-common/goacc/parallel-reduction.c: Enclose the parallel
	reduction inside an acc data region.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: Enclose
	one parallel reduction inside a data region.


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 09adea8..ba37372 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1240,6 +1240,8 @@ build_outer_var_ref (tree var, omp_context *ctx)
   if (x == NULL_TREE)
 	x = var;
 }
+  else if (is_oacc_parallel (ctx))
+x = var;
   else if (ctx->outer)
 {
   /* OpenACC may have multiple outer contexts (one per loop).  */
@@ -1256,7 +1258,7 @@ build_outer_var_ref (tree var, omp_context *ctx)
   else
 	x = lookup_decl (var, ctx->outer);
 }
-  else if (is_reference (var) || is_oacc_parallel (ctx)
+  else if (is_reference (var)
 	   || extract_oacc_routine_gwv (current_function_decl) != 0)
 /* This can happen with orphaned constructs.  If var is reference, it is
possible it is shared and as such valid.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/parallel-reduction.c b/gcc/testsuite/c-c++-common/goacc/parallel-reduction.c
index debed55..d7cc947 100644
--- a/gcc/testsuite/c-c++-common/goacc/parallel-reduction.c
+++ b/gcc/testsuite/c-c++-common/goacc/parallel-reduction.c
@@ -2,11 +2,15 @@ int
 main ()
 {
   int sum = 0;
+  int dummy = 0;
 
-#pragma acc parallel num_gangs (10) copy (sum) reduction (+:sum)
+#pragma acc data copy (dummy)
   {
-int v = 5;
-sum += 10 + v;
+#pragma acc parallel num_gangs (10) copy (sum) reduction (+:sum)
+{
+  int v = 5;
+  sum += 10 + v;
+}
   }
 
   return sum;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
index 381d5b6..d328f46 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
@@ -10,10 +10,14 @@ main ()
 {
   int s1 = 0, s2 = 0;
   int i;
+  int dummy = 0;
 
-#pragma acc parallel num_gangs (N) reduction (+:s1)
+#pragma acc data copy (dummy)
   {
-s1++;
+#pragma acc parallel num_gangs (N) reduction (+:s1)
+{
+  s1++;
+}
   }
 
   if (acc_get_device_type () != acc_device_nvidia)


[C++ Patch] PR 51911 V2 ("G++ accepts new auto { list }")

2015-09-11 Thread Paolo Carlini

Hi,

this is a slightly reworked (simplified) version of a patch I sent a 
while ago. The issue is that we are not enforcing at all 5.3.4/2 in the 
parser, thus we end up rejecting the first test below with a misleading 
error message talking about list-initialization (and a wrong location), 
because we diagnose it too late like 'auto foo{3, 4, 5};', and simply 
accepting the second. Tested x86_64-linux.


Thanks,
Paolo.


/cp
2015-09-11  Paolo Carlini  

PR c++/51911
* parser.c (cp_parser_new_expression): Enforce 5.3.4/2.

/testsuite
2015-09-11  Paolo Carlini  

PR c++/51911
* g++.dg/cpp0x/new-auto1.C: New.
Index: cp/parser.c
===
--- cp/parser.c (revision 227690)
+++ cp/parser.c (working copy)
@@ -7591,8 +7591,9 @@ cp_parser_new_expression (cp_parser* parser)
 type = cp_parser_new_type_id (parser, &nelts);
 
   /* If the next token is a `(' or '{', then we have a new-initializer.  */
-  if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN)
-  || cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
+  cp_token *token = cp_lexer_peek_token (parser->lexer);
+  if (token->type == CPP_OPEN_PAREN
+  || token->type == CPP_OPEN_BRACE)
 initializer = cp_parser_new_initializer (parser);
   else
 initializer = NULL;
@@ -7601,6 +7602,18 @@ cp_parser_new_expression (cp_parser* parser)
  expression.  */
   if (cp_parser_non_integral_constant_expression (parser, NIC_NEW))
 ret = error_mark_node;
+  /* 5.3.4/2: "If the auto type-specifier appears in the type-specifier-seq
+ of a new-type-id or type-id of a new-expression, the new-expression shall
+ contain a new-initializer of the form ( assignment-expression )".  */
+  else if (type_uses_auto (type)
+  && (token->type != CPP_OPEN_PAREN
+  || vec_safe_length (initializer) != 1))
+{
+  error_at (token->location,
+   "initialization of new-expression for type %<auto%> "
+   "requires exactly one parenthesized expression");
+  ret = error_mark_node;
+}
   else
 {
   /* Create a representation of the new-expression.  */
Index: testsuite/g++.dg/cpp0x/new-auto1.C
===
--- testsuite/g++.dg/cpp0x/new-auto1.C  (revision 0)
+++ testsuite/g++.dg/cpp0x/new-auto1.C  (working copy)
@@ -0,0 +1,7 @@
+// PR c++/51911
+// { dg-do compile { target c++11 } }
+
+#include 
+
+auto foo = new auto { 3, 4, 5 };  // { dg-error "21:initialization of new-expression for type 'auto'" }
+auto bar = new auto { 3 };  // { dg-error "21:initialization of new-expression for type 'auto'" }


RE: [PATCH] Refactor optimize isl

2015-09-11 Thread Aditya Kumar


-Original Message-
From: Tobias Grosser [mailto:tob...@grosser.es] 
Sent: Friday, September 11, 2015 1:16 PM
To: Aditya Kumar; gcc-patches@gcc.gnu.org
Cc: richard.guent...@gmail.com; s@samsung.com; seb...@gmail.com
Subject: Re: [PATCH] Refactor optimize isl

On 09/11/2015 07:07 PM, Aditya Kumar wrote:
> Updated patch with corrections:
>
> Refactor graphite-optimize-isl.c. Renamed function name, variable 
> names etc., and indented the source according to gcc style guidelines.  
> Modified comments accordingly. No functional change intended.

> Looks reasonable.

> Just for history, this code was copied from Polly this is why the
formatting does not match gcc's style. The relevant file in Polly has been
evolved since then and might provide you with ideas > on how to improve this
file in gcc.

Thanks, I will look into Polly.

-Aditya

> Tobias




Go patch committed: Fix possible out of bounds memcmp

2015-09-11 Thread Ian Lance Taylor
This patch by Chris Manghane fixes the Go frontend to avoid a possible
out of bounds memcmp when looking for a go:nointerface comment.  This
fixes https://golang.org/issue/11577 .  Bootstrapped and ran Go
testsuite on x86_64-unknown-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 227696)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-352617bfe0a880febf5d2a87e89ea439c742ba18
+aea4360ca9c37f8e929f177ae7e42593ee62aa79
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/lex.cc
===
--- gcc/go/gofrontend/lex.cc(revision 227696)
+++ gcc/go/gofrontend/lex.cc(working copy)
@@ -1752,7 +1752,9 @@ Lex::skip_cpp_comment()
   // For field tracking analysis: a //go:nointerface comment means
   // that the next interface method should not be stored in the type
   // descriptor.  This permits it to be discarded if it is not needed.
-  if (this->lineoff_ == 2 && memcmp(p, "go:nointerface", 14) == 0)
+  if (this->lineoff_ == 2
+  && pend - p > 14
+  && memcmp(p, "go:nointerface", 14) == 0)
 this->saw_nointerface_ = true;
 
   while (p < pend)


Re: [Patch] Teach RTL ifcvt to handle multiple simple set instructions

2015-09-11 Thread Jeff Law

On 09/11/2015 02:49 AM, Kyrill Tkachov wrote:


On 10/09/15 22:11, Jeff Law wrote:

On 09/10/2015 12:23 PM, Bernd Schmidt wrote:

  > No testcase provided, as currently I don't know of targets with a high
  > enough branch cost to actually trigger the optimisation.

Hmm, so the code would not actually be used right now? In that case I'll
leave it to others to decide whether we want to apply it. Other than the
points above it looks OK to me.

Some targets have -mbranch-cost to allow overriding the default costing.
   visium has a branch cost of 10!  Several ports have a cost of 6 either
unconditionally or when the branch is not well predicted.

Presumably James is more interested in the ARM/AArch64 targets ;-)

I think that's probably what James is most interested in getting some
ideas around -- the cost model.

I think the fundamental problem is BRANCH_COST isn't actually relative
to anything other than the default value of "1".  It doesn't directly
correspond to COSTS_N_INSNS or anything else.  So while using
COSTS_N_INSNS (BRANCH_COST)) would seem to make sense, it actually
doesn't.  It's not even clear how a value of 10 relates to a value of 1
other than it's more expensive.

ifcvt (and others) comparing to magic #s is more than a bit lame.  But
with BRANCH_COST having no meaning relative to anything else I can see
why Richard did things that way.


Out of interest, what was the intended original meaning
of branch costs if it was not to be relative to instructions?
I don't think it ever had one.  It's self-relative.  A cost of 2 is 
greater than a cost of 1.  No more, no less IIRC.   Lame?  Yes. 
Short-sighted?  Yes.  Should we try to fix it.  Yes.


If you look at how BRANCH_COST actually gets used, AFAIK it's tested 
only against "magic constants", which are themselves lame, short-sighted 
and need to be fixed.


jeff



[gomp4] SESE region neutering

2015-09-11 Thread Nathan Sidwell
One optimization one can apply to the worker & vector neutering that the PTX 
backend does is to neuter Single-Entry Single-Exit regions, rather than 
individual BBs.  An SESE region is a region bounded by a single entry BB and a 
single exit BB.  (A single BB is a trivial SESE region.)  Finding these regions 
is important as it means we can skip straight from the entry block to the exit 
block.


This patch implements that optimization.  As the comment at the head of the code 
says, we first find 'cycle-equivalent' BBs.  These are ones that we determine 
are in the same (set of) loops in the closed graph.  Such equivalent BBs form 
the entry and exit BBs of an SESE region.  Once we've found these, we need to 
find the ones that cover most of the graph -- and delete the ones that are 
consumed by the larger areas.  This is done by a coloring algorithm executed as 
a DFS walk.  One of the properties of SESE regions is that they are always 
strictly nested -- they never partially overlap.  That property is used by the 
coloring algorithm.


Once we've obtained the set of SESE regions, it's a simple matter of applying 
the neutering algorithm, which already accepted an entry and an exit node.


nathan
2015-09-11  Nathan Sidwell  

	Implement SESE block neutering optimization.
	* config/nvptx/nvptx.c (bb_pair_t, bb_pair_vec_t,
	pseudo_node_t): New typedefs.
	(struct bracket): New struct.
	(bracket_vec_t): New typedef.
	(struct bb_sese): New struct.
	(bb_sese::~bb_sese, bb_sese::append, bb_sese::remove): Member fns.
	(BB_SET_SESE, BB_GET_SESE): Local accessors.
	(nvptx_sese_number, nvptx_sese_pseudo, nvptx_sese_color): Worker
	fns for finding SESE regions.
	(nvptx_find_sese): SESE region finder.
	(nvptx_neuter_pars): Find & neuter SESE regions when optimizing.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 227692)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -2724,6 +2724,616 @@ nvptx_discover_pars (bb_insn_map_t *map)
   return par;
 }
 
+/* Analyse a group of BBs within a partitioned region and create N
+   Single-Entry-Single-Exit regions.  Some of those regions will be
+   trivial ones consisting of a single BB.  The blocks of a
+   partitioned region might form a set of disjoint graphs -- because
+   the region encloses a differently partitioned sub region.
+
+   We use the linear time algorithm described in 'Finding Regions Fast:
+   Single Entry Single Exit and control Regions in Linear Time'
+   Johnson, Pearson & Pingali.  That algorithm deals with complete
+   CFGs, where a back edge is inserted from END to START, and thus the
+   problem becomes one of finding equivalent loops.
+
+   In this case we have a partial CFG.  We complete it by redirecting
+   any incoming edge to the graph to be from an arbitrary external BB,
+   and similarly redirecting any outgoing edge to be to that BB.
+   Thus we end up with a closed graph.
+
+   The algorithm works by building a spanning tree of an undirected
+   graph and keeping track of back edges from nodes further from the
+   root in the tree to nodes nearer to the root in the tree.  In the
+   description below, the root is up and the tree grows downwards.
+
+   We avoid having to deal with degenerate back-edges to the same
+   block, by splitting each BB into 3 -- one for input edges, one for
+   the node itself and one for the output edges.  Such back edges are
+   referred to as 'Brackets'.  Cycle equivalent nodes will have the
+   same set of brackets.
+   
+   Determining bracket equivalency is done by maintaining a list of
+   brackets in such a manner that the list length and final bracket
+   uniquely identify the set.
+
+   We use coloring to mark all BBs with cycle equivalency with the
+   same color.  This is the output of the 'Finding Regions Fast'
+   algorithm.  Notice it doesn't actually find the set of nodes within
+   a particular region, just unordered sets of nodes that are the
+   entries and exits of SESE regions.
+   
+   After determining cycle equivalency, we need to find the minimal
+   set of SESE regions.  Do this with a DFS coloring walk of the
+   complete graph.  We're either 'looking' or 'coloring'.  When
+   looking, and we're in the subgraph, we start coloring the color of
+   the current node, and remember that node as the start of the
+   current color's SESE region.  Every time we go to a new node, we
+   decrement the count of nodes with that color.  If it reaches zero,
+   we remember that node as the end of the current color's SESE region
+   and return to 'looking'.  Otherwise we color the node the current
+   color.
+
+   This way we end up with coloring the inside of non-trivial SESE
+   regions with the color of that region.  */
+
+/* A pair of BBs.  We use this to represent SESE regions.  */
+typedef std::pair<basic_block, basic_block> bb_pair_t;
+typedef auto_vec<bb_pair_t> bb_pair_vec_t;
+
+/* A 

Re: [PATCH] v2 shrink-wrap: Rewrite

2015-09-11 Thread Pat Haugen

On 09/11/2015 02:49 PM, Segher Boessenkool wrote:

On PowerPC, this enables shrink-wrapping of about 2%-3% more functions.
I expected more, but in most cases where this would help we cannot yet
shrink-wrap because there are non-volatile registers used, often in the
first block already.
I looked into shrink-wrapping earlier this year and noticed the same.  I 
tried expanding ira.c:split_live_ranges_for_shrink_wrap() to do a more 
general live range splitting than the limited parm_reg->pseudo copies it 
tries splitting now, but the splits ended up getting coalesced back 
together and a non-volatile register chosen.  Unfortunately my digging got 
put on the back burner and I haven't looked at it for a while.


Another issue I saw for PowerPC that prevented shrink-wrapping was the 
case where non-volatile CR's are used somewhere in the function. This 
causes a 'mfcr' to be generated in the prolog to save the non-volatile 
CR's, which currently lists all CR's as used. This kills shrink-wrapping 
since the early exit test will define a CR which then fails the 
shrink-wrap requirement of not defining a register used in the prolog. I 
tried a simple hack of just removing the volatile CR's from the 
'movesi_from_cr' define_insn and it solved the issue, but never 
submitted a patch for just that piece.


-Pat



[PATCH] PR28901 -Wunused-variable ignores unused const initialised variables

2015-09-11 Thread Mark Wielaard
12 years ago it was decided that -Wunused-variable shouldn't warn about
static const variables because some code used const static char rcsid[]
strings which were never used but wanted in the code anyway. But as the
bug points out this hides some real bugs. These days the usage of rcsids
is not very popular anymore. So this patch changes the default to warn
about unused static const variables with -Wunused-variable. And it adds
a new option -Wno-unused-const-variable to turn this warning off. New
testcases are included to test the new warning with -Wunused-variable
and suppressing it with -Wno-unused-const-variable or unused attribute.

gcc/ChangeLog

   PR c/28901
   * common.opt (Wunused-const-variable): New option.
   * toplev.c (check_global_declaration): Check and use
   warn_unused_const_variable.
   * doc/invoke.texi (Warning Options): Add -Wunused-const-variable.
   (-Wunused-variable): Remove non-constant. Implies
   -Wunused-const-variable.
   (-Wunused-const-variable): New.

gcc/testsuite/ChangeLog

   PR c/28901
   * gcc.dg/unused-4.c: Adjust warning for static const.
   * gcc.dg/unused-variable-1.c: New test.
   * gcc.dg/unused-variable-2.c: Likewise.

Tested on x86_64-pc-linux-gnu, no regressions.

---

diff --git a/gcc/common.opt b/gcc/common.opt
index 94d1d88..ae2fe77 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -735,6 +735,10 @@ Wunused-variable
 Common Var(warn_unused_variable) Warning EnabledBy(Wunused)
 Warn when a variable is unused
 
+Wunused-const-variable
+Common Var(warn_unused_const_variable) Warning EnabledBy(Wunused-variable)
+Warn when a const variable is unused
+
 Wcoverage-mismatch
 Common Var(warn_coverage_mismatch) Init(1) Warning
 Warn in case profiles in -fprofile-use do not match
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 518d689..211b9e1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -290,6 +290,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wunsuffixed-float-constants  -Wunused  -Wunused-function @gol
 -Wunused-label  -Wunused-local-typedefs -Wunused-parameter @gol
 -Wno-unused-result -Wunused-value @gol -Wunused-variable @gol
+-Wunused-const-variable @gol
 -Wunused-but-set-parameter -Wunused-but-set-variable @gol
 -Wuseless-cast -Wvariadic-macros -Wvector-operation-performance @gol
 -Wvla -Wvolatile-register-var  -Wwrite-strings @gol
@@ -4143,13 +4144,22 @@ its return value. The default is 
@option{-Wunused-result}.
 @item -Wunused-variable
 @opindex Wunused-variable
 @opindex Wno-unused-variable
-Warn whenever a local variable or non-constant static variable is unused
-aside from its declaration.
+Warn whenever a local or static variable is unused aside from its
+declaration. This option implies @option{-Wunused-const-variable}.
 This warning is enabled by @option{-Wall}.
 
 To suppress this warning use the @code{unused} attribute
 (@pxref{Variable Attributes}).
 
+@item -Wunused-const-variable
+@opindex Wunused-const-variable
+@opindex Wno-unused-const-variable
+Warn whenever a constant static variable is unused aside from its declaration.
+This warning is enabled by @option{-Wunused-variable}.
+
+To suppress this warning use the @code{unused} attribute
+(@pxref{Variable Attributes}).
+
 @item -Wunused-value
 @opindex Wunused-value
 @opindex Wno-unused-value

diff --git a/gcc/testsuite/gcc.dg/unused-4.c b/gcc/testsuite/gcc.dg/unused-4.c
index 99e845f..5323600 100644
--- a/gcc/testsuite/gcc.dg/unused-4.c
+++ b/gcc/testsuite/gcc.dg/unused-4.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-Wunused -O3" } */
 
-static const int i = 0;
+static const int i = 0;	/* { dg-warning "defined but not used" } */
 static void f() { }	/* { dg-warning "defined but not used" } */
 static inline void g() { }
diff --git a/gcc/testsuite/gcc.dg/unused-variable-1.c 
b/gcc/testsuite/gcc.dg/unused-variable-1.c
new file mode 100644
index 000..cb86c3bc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/unused-variable-1.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-Wunused-variable" } */
+
+static int a = 0;		/* { dg-warning "defined but not used" } */
+static const int b = 0;	/* { dg-warning "defined but not used" } */
+static int c __attribute__ ((unused)) = 0;
+static const char rcsid[] __attribute__ ((unused)) = "version-string";
diff --git a/gcc/testsuite/gcc.dg/unused-variable-2.c 
b/gcc/testsuite/gcc.dg/unused-variable-2.c
new file mode 100644
index 000..0496466
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/unused-variable-2.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-Wunused-variable -Wno-unused-const-variable" } */
+
+static int a = 0;		/* { dg-warning "defined but not used" } */
+static const int b = 0;
+static int c __attribute__ ((unused)) = 0;
+static const char rcsid[] = "version-string";
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 926224a..95e4c52 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ 

Re: [PATCH] v2 shrink-wrap: Rewrite

2015-09-11 Thread Pat Haugen

On 09/11/2015 05:40 PM, Segher Boessenkool wrote:

Another issue I saw for PowerPC that prevented shrink-wrapping was the
>case where non-volatile CR's are used somewhere in the function. This
>causes a 'mfcr' to be generated in the prolog to save the non-volatile
>CR's, which currently lists all CR's as used. This kills shrink-wrapping
>since the early exit test will define a CR which then fails the
>shrink-wrap requirement of not defining a register used in the prolog.

You mean it cannot put the prologue on an edge where some reg defined
in the prologue is live?  On PowerPC I see that mostly for r0 (it is used
to move LR through), but it doesn't happen all that often -- it's the last
of the volatile regs in the REG_ALLOC_ORDER already.  I'll pay attention
to the non-volatile CR fields, interesting.
It can't move the prolog down (past) an insn that defines a register 
that is only "used" in the prolog, since the prolog would then be 
referencing the new value in the reg as opposed to what the value was 
coming in to the function; this check is in requires_stack_frame_p(). 
The r0 case you mention is not a concern since r0 is first defined 
before its use in the prolog.

>I
>tried a simple hack of just removing the volatile CR's from the
>'movesi_from_cr' define_insn and it solved the issue, but never
>submitted a patch for just that piece.

We probably need a specialised version for that.  The other issue is,
why do we use those CR fields at all?  Are we saving them over calls
(probably a bad tradeoff), or do we really want so many CR fields
live?
Yes, that was part of why I didn't submit it; I didn't dig enough to 
convince myself it was a safe thing, but thinking back, that define_insn 
may have only been used for the prolog.  That said, a specialized version 
would still be a good idea to prevent surprises down the road if it's 
ever used for non-prolog code that really does want all the CR's listed. 
I agree that the whole non-volatile CR use issue should probably be 
looked at.  As you say, even if we aren't using them across calls (which 
I'm guessing would be a low percentage, if any), is it really buying us 
much over just recomputing the compare?


-Pat



Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables

2015-09-11 Thread Bernd Schmidt

On 09/12/2015 12:12 AM, Mark Wielaard wrote:

12 years ago it was decided that -Wunused-variable shouldn't warn about
static const variables because some code used const static char rcsid[]
strings which were never used but wanted in the code anyway. But as the
bug points out this hides some real bugs. These days the usage of rcsids
is not very popular anymore. So this patch changes the default to warn
about unused static const variables with -Wunused-variable. And it adds
a new option -Wno-unused-const-variable to turn this warning off. New
testcases are included to test the new warning with -Wunused-variable
and suppressing it with -Wno-unused-const-variable or unused attribute.



PR c/28901
* gcc.dg/unused-4.c: Adjust warning for static const.
* gcc.dg/unused-variable-1.c: New test.
* gcc.dg/unused-variable-2.c: Likewise.


Should these go into c-c++-common? Otherwise I'm ok with the patch, 
please wait a few days to see if there are objections to this change 
then commit.



Bernd




Re: [PATCH] v2 shrink-wrap: Rewrite

2015-09-11 Thread Segher Boessenkool
On Fri, Sep 11, 2015 at 05:06:47PM -0500, Pat Haugen wrote:
> On 09/11/2015 02:49 PM, Segher Boessenkool wrote:
> >On PowerPC, this enables shrink-wrapping of about 2%-3% more functions.
> >I expected more, but in most cases where this would help we cannot yet
> >shrink-wrap because there are non-volatile registers used, often in the
> >first block already.
> I looked into shrink-wrapping earlier this year and noticed the same. I 
> tried expanding ira.c:split_live_ranges_for_shrink_wrap() to do a more 
> general live range splitting than the limited parm_reg->pseudo copies it 
> tries splitting now, but the splits ended up getting coalesced back 
> together and a non-volatile register chosen. Unfortunately my digging got put on 
> the back burner and I haven't looked at it for a while.

It isn't an easy problem, lots of tradeoffs :-(  RA is magic.

> Another issue I saw for PowerPC that prevented shrink-wrapping was the 
> case where non-volatile CR's are used somewhere in the function. This 
> causes a 'mfcr' to be generated in the prolog to save the non-volatile 
> CR's, which currently lists all CR's as used. This kills shrink-wrapping 
> since the early exit test will define a CR which then fails the 
> shrink-wrap requirement of not defining a register used in the prolog.

You mean it cannot put the prologue on an edge where some reg defined
in the prologue is live?  On PowerPC I see that mostly for r0 (it is used
to move LR through), but it doesn't happen all that often -- it's the last
of the volatile regs in the REG_ALLOC_ORDER already.  I'll pay attention
to the non-volatile CR fields, interesting.

> I 
> tried a simple hack of just removing the volatile CR's from the 
> 'movesi_from_cr' define_insn and it solved the issue, but never 
> submitted a patch for just that piece.

We probably need a specialised version for that.  The other issue is,
why do we use those CR fields at all?  Are we saving them over calls
(probably a bad tradeoff), or do we really want so many CR fields
live?


Segher


[PATCH] Another small cleanup to the const_and_copies stack

2015-09-11 Thread Jeff Law


While working on the next significant hunk of changes towards 47679, I 
saw a few more obvious cleanups that ought to happen.


First, we have a policy for data members that they ought to be prefixed 
with m_ to denote the extra overhead in accessing those members.  The 
data member in the const_and_copies unwinder stack didn't have the 
prefix.  This patch fixes that oversight and the obvious fallout.


It also adds an additional high level comment for the const_and_copies 
stack and updates an out-of-date comment in tree-ssa-dom.c.


This was extracted out of a larger patch that has bootstrapped and 
regression tested on x86_64.  This patch was quick-strapped to ensure it 
didn't break anything.


This is (of course) related to BZ47679.


Installed on the trunk.

Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 45f3b15..839b53e 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2015-09-11  Jeff Law  
+
+   PR tree-optimization/47679
+   * tree-ssa-dom.c (struct cond_equivalence): Update comment.
+   * tree-ssa-scopedtables.h (class const_and_copies): Prefix data
+   member with m_.  Update inline member functions as necessary.  Add
+   toplevel comment.
+   * tree-ssa-scopedtables.c: Update const_and_copies's member
+   functions to use m_ prefix to access the stack.
+
 2015-09-11  Aditya Kumar  
 
 * graphite-optimize-isl.c (disable_tiling): Remove.
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index f0b19ff..e3eb0db 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -92,8 +92,7 @@ struct cond_equivalence
 };
 
 
-/* Structure for recording edge equivalences as well as any pending
-   edge redirections during the dominator optimizer.
+/* Structure for recording edge equivalences.
 
Computing and storing the edge equivalences instead of creating
them on-demand can save significant amounts of time, particularly
@@ -101,10 +100,7 @@ struct cond_equivalence
 
These structures live for a single iteration of the dominator
optimizer in the edge's AUX field.  At the end of an iteration we
-   free each of these structures and update the AUX field to point
-   to any requested redirection target (the code for updating the
-   CFG and SSA graph for edge redirection expects redirection edge
-   targets to be in the AUX field for each edge.  */
+   free each of these structures.  */
 
 struct edge_info
 {
diff --git a/gcc/tree-ssa-scopedtables.c b/gcc/tree-ssa-scopedtables.c
index 1fea69a..fedd92a 100644
--- a/gcc/tree-ssa-scopedtables.c
+++ b/gcc/tree-ssa-scopedtables.c
@@ -35,11 +35,11 @@ along with GCC; see the file COPYING3.  If not see
 void
 const_and_copies::pop_to_marker (void)
 {
-  while (stack.length () > 0)
+  while (m_stack.length () > 0)
 {
   tree prev_value, dest;
 
-  dest = stack.pop ();
+  dest = m_stack.pop ();
 
   /* A NULL value indicates we should stop unwinding, otherwise
 pop off the next entry as they're recorded in pairs.  */
@@ -55,7 +55,7 @@ const_and_copies::pop_to_marker (void)
  fprintf (dump_file, "\n");
}
 
-  prev_value = stack.pop ();
+  prev_value = m_stack.pop ();
   set_ssa_name_value (dest, prev_value);
 }
 }
@@ -90,9 +90,9 @@ const_and_copies::record_const_or_copy (tree x, tree y, tree 
prev_x)
 }
 
   set_ssa_name_value (x, y);
-  stack.reserve (2);
-  stack.quick_push (prev_x);
-  stack.quick_push (x);
+  m_stack.reserve (2);
+  m_stack.quick_push (prev_x);
+  m_stack.quick_push (x);
 }
 
 /* A new value has been assigned to LHS.  If necessary, invalidate any
@@ -114,16 +114,16 @@ const_and_copies::invalidate (tree lhs)
  then it's a "stop unwinding" marker.  Else the current marker is
  the SSA_NAME with an equivalence and the prior entry in the stack
  is what the current element is equivalent to.  */
-  for (int i = stack.length() - 1; i >= 0; i--)
+  for (int i = m_stack.length() - 1; i >= 0; i--)
 {
   /* Ignore the stop unwinding markers.  */
-  if ((stack)[i] == NULL)
+  if ((m_stack)[i] == NULL)
continue;
 
   /* We want to check the current value of stack[i] to see if
 it matches LHS.  If so, then invalidate.  */
-  if (SSA_NAME_VALUE ((stack)[i]) == lhs)
-   record_const_or_copy ((stack)[i], NULL_TREE);
+  if (SSA_NAME_VALUE ((m_stack)[i]) == lhs)
+   record_const_or_copy ((m_stack)[i], NULL_TREE);
 
   /* Remember, we're dealing with two elements in this case.  */
   i--;
diff --git a/gcc/tree-ssa-scopedtables.h b/gcc/tree-ssa-scopedtables.h
index 13f7ccb..f7d9ca4 100644
--- a/gcc/tree-ssa-scopedtables.h
+++ b/gcc/tree-ssa-scopedtables.h
@@ -20,14 +20,20 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_TREE_SSA_SCOPED_TABLES_H
 #define GCC_TREE_SSA_SCOPED_TABLES_H
 
+/* This class defines an unwindable const/copy equivalence table
+   layered on top of 

Re: [PATCH][PR67476] Add param parloops-schedule

2015-09-11 Thread Jakub Jelinek
On Fri, Sep 11, 2015 at 12:55:00PM +0200, Tom de Vries wrote:
> Hi,
> 
> this patch adds a param parloops-schedule=<0-4>, which sets the omp schedule
> for loops parallelized by parloops.
> 
> The <0-4> maps onto (static, dynamic, guided, auto, runtime).
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk?

I don't really like it, the mapping of the integers to the enum values
is non-obvious and hard to remember.
Perhaps add support for enumeration params if you want this instead?

Jakub


[patch] libstdc++/58265 Implement N4258 noexcept for std::basic_string.

2015-09-11 Thread Jonathan Wakely

This updates the non-COW basic_string to meet the noexcept
requirements in the current draft, which required finishing the
allocator propagation support.

Tested powerpc64le-linux, normal and debug. Committed to trunk.


commit 909cf7116d75d42920e4b4f89ee8ddf04c843a9c
Author: Jonathan Wakely 
Date:   Fri Sep 11 10:54:31 2015 +0100

Implement N4258 noexcept for std::basic_string.

	PR libstdc++/58265
	* doc/xml/manual/intro.xml: Document LWG 2063 and 2064 resolutions.
	* doc/html/manual/bugs.html: Regenerate.
	* include/bits/basic_string.h (basic_string): Implement N4258. Add
	correct exception-specifications and propagate allocators correctly.
	* include/bits/basic_string.tcc (basic_string::swap): Propagate
	allocators correctly.
	* include/debug/string (__gnu_debug::basic_string): Add correct
	exception-specifications and allocator-extended constructors.
	* testsuite/21_strings/basic_string/allocator/char/copy.cc: New.
	* testsuite/21_strings/basic_string/allocator/char/copy_assign.cc:
	New.
	* testsuite/21_strings/basic_string/allocator/char/minimal.cc: New.
	* testsuite/21_strings/basic_string/allocator/char/move.cc: New.
	* testsuite/21_strings/basic_string/allocator/char/move_assign.cc:
	New.
	* testsuite/21_strings/basic_string/allocator/char/noexcept.cc: New.
	* testsuite/21_strings/basic_string/allocator/char/swap.cc: New.
	* testsuite/21_strings/basic_string/allocator/wchar_t/copy.cc: New.
	* testsuite/21_strings/basic_string/allocator/wchar_t/copy_assign.cc:
	New.
	* testsuite/21_strings/basic_string/allocator/wchar_t/minimal.cc: New.
	* testsuite/21_strings/basic_string/allocator/wchar_t/move.cc: New.
	* testsuite/21_strings/basic_string/allocator/wchar_t/move_assign.cc:
	New.
	* testsuite/21_strings/basic_string/allocator/wchar_t/noexcept.cc: New.
	* testsuite/21_strings/basic_string/allocator/wchar_t/swap.cc: New.
	* testsuite/util/testsuite_allocator.h (tracker_allocator): Define
	defaulted assignment operators.

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml b/libstdc++-v3/doc/xml/manual/intro.xml
index 1cc183e..2aa9ba7 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -850,6 +850,18 @@ requirements of the license of GCC.
 Add additional overloads.
 
 
+http://www.w3.org/1999/xlink; xlink:href="../ext/lwg-defects.html#2063">2063:
+	Contradictory requirements for string move assignment
+
+Respect propagation trait for move assignment.
+
+
+http://www.w3.org/1999/xlink; xlink:href="../ext/lwg-defects.html#2064">2064:
+	More noexcept issues in basic_string
+
+Add noexcept to the comparison operators.
+
+
 http://www.w3.org/1999/xlink; xlink:href="../ext/lwg-defects.html#2067">2067:
 	packaged_task should have deleted copy c'tor with const parameter
 
diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index 3226617..e6e7bb5 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -379,9 +379,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*  @brief  Default constructor creates an empty string.
*/
   basic_string()
-#if __cplusplus >= 201103L
-  noexcept(is_nothrow_default_constructible<_Alloc>::value)
-#endif
+  _GLIBCXX_NOEXCEPT_IF(is_nothrow_default_constructible<_Alloc>::value)
   : _M_dataplus(_M_local_data())
   { _M_set_length(0); }
 
@@ -389,7 +387,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*  @brief  Construct an empty string using allocator @a a.
*/
   explicit
-  basic_string(const _Alloc& __a)
+  basic_string(const _Alloc& __a) _GLIBCXX_NOEXCEPT
   : _M_dataplus(_M_local_data(), __a)
   { _M_set_length(0); }
 
@@ -398,7 +396,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*  @param  __str  Source string.
*/
   basic_string(const basic_string& __str)
-  : _M_dataplus(_M_local_data(), __str._M_get_allocator()) // TODO A traits
+  : _M_dataplus(_M_local_data(),
+		_Alloc_traits::_S_select_on_copy(__str._M_get_allocator()))
   { _M_construct(__str._M_data(), __str._M_data() + __str.length()); }
 
   /**
@@ -511,10 +510,25 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   { _M_construct(__str.begin(), __str.end()); }
 
   basic_string(basic_string&& __str, const _Alloc& __a)
+  noexcept(_Alloc_traits::_S_always_equal())
   : _M_dataplus(_M_local_data(), __a)
   {
-	if (__str.get_allocator() == __a)
-	  *this = std::move(__str);
+	if (__str._M_is_local())
+	  {
+	traits_type::copy(_M_local_buf, __str._M_local_buf,
+			  _S_local_capacity + 1);
+	_M_length(__str.length());
+	__str._M_set_length(0);
+	  }
+	else if (_Alloc_traits::_S_always_equal()
+	|| __str.get_allocator() == __a)
+	  {
+	_M_data(__str._M_data());
+	_M_length(__str.length());

Re: [PATCH 04/22] Reimplement diagnostic_show_locus, introducing rich_location classes

2015-09-11 Thread Michael Matz
Hi,

On Thu, 10 Sep 2015, David Malcolm wrote:

> +/* FIXME: (dmalcolm)
> +   This plugin is currently the only user of
> + gcc_rich_location::add_range_with_caption
> +   As such, the symbol is present in libbackend.a, but not in "cc1",
> +   and running the plugin fails with a linker error:
> + ./diagnostic_plugin_test_show_locus.so: undefined symbol: 
> _ZN17gcc_rich_location22add_range_with_captionEjjP18diagnostic_contextPKcz
> +   which c++filt tells us is:
> + ./diagnostic_plugin_test_show_locus.so: undefined symbol: 
> gcc_rich_location::add_range_with_caption(unsigned int, unsigned int, 
> diagnostic_context*, char const*, ...)
> +
> +   I've tried various workarounds (adding DEBUG_FUNCTION to the
> +   method, taking its address), but can't seem to fix it that way.
> +   So as a nasty workaround, the following material is copied
> +   from gcc-rich-location.c: */

You need to make cc1 use _anything_ defined in the source file 
gcc-rich-location.c.  E.g. it could be some global internal variable:

int _force_me_into_cc1_hack;

which you then refer to in e.g. diagnostic-color.c (or from whereever).


Ciao,
Michael.


Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs

2015-09-11 Thread Michael Matz
Hi,

On Thu, 10 Sep 2015, David Malcolm wrote:

> Does anyone know why this was "carefully packed" and to what extent
> this matters?  I'm adding an extra 8 bytes to it (or 4 if we eliminate
> the existing location_t).  As far as I can see, these are
> short-lived, and there are only relative few alive at any time.

The c++ frontend stores _all_ tokens before starting to parse, so the size 
of cp_token is not totally irrelevant.  It still might not matter much, 
though.


Ciao,
Michael.


Re: [PATCH 04/22] Reimplement diagnostic_show_locus, introducing rich_location classes

2015-09-11 Thread Michael Matz
Hi,

On Thu, 10 Sep 2015, David Malcolm wrote:

> +/* A range of source locations.
> +
> +   Ranges are half-open:
> +   m_start is the first location within the range, whereas
> +   m_finish is the first location *after* the range.

I think you eventually decided that they are closed, not half-open, at 
least this:

> +  static source_range from_location (source_location loc)
> +  {
> +source_range result;
> +result.m_start = loc;
> +result.m_finish = loc;

and this:

> +/* Ranges are closed
> +   m_start is the first location within the range, and
> +   m_finish is the last location within the range.  */

suggest so :)


Ciao,
Michael.


Re: [PATCH][PR67476] Add param parloops-schedule

2015-09-11 Thread Tom de Vries

On 11/09/15 15:28, Tom de Vries wrote:

So the definition of param parloop-schedule becomes:
...
DEFPARAMENUM PARAM_PARLOOPS_SCHEDULE,
  "parloops-schedule",
  "Schedule type of omp schedule for loops parallelized by "
  "parloops (static, dynamic, guided, auto, runtime)",
  0, 0, 4, "static", "dynamic", "guided", "auto", "runtime")
...
[ I'll repost the original patch containing this update. ]



This is the patch adding param parloops-schedule, containing the update 
to use DEFPARAMENUM.


OK for trunk if x86_64 bootstrap and reg-test succeeds?

Thanks,
- Tom

Add param parloops-schedule

2015-09-10  Tom de Vries  

	PR tree-optimization/67476
	* doc/invoke.texi (@item parloops-schedule): New item.
	* omp-low.c (expand_omp_for_generic): Handle simple latch.  Add missing
	phis.  Handle original loop.
	* params.def (PARAM_PARLOOPS_SCHEDULE): New DEFPARAMENUM.
	* tree-parloops.c (create_parallel_loop): Handle
	PARAM_PARLOOPS_SCHEDULE.

	* testsuite/libgomp.c/autopar-3.c: New test.
	* testsuite/libgomp.c/autopar-4.c: New test.
	* testsuite/libgomp.c/autopar-5.c: New test.
	* testsuite/libgomp.c/autopar-6.c: New test.
	* testsuite/libgomp.c/autopar-7.c: New test.
	* testsuite/libgomp.c/autopar-8.c: New test.
---
 gcc/doc/invoke.texi |  4 +++
 gcc/omp-low.c   | 57 +++--
 gcc/params.def  |  6 
 gcc/tree-parloops.c | 24 +-
 libgomp/testsuite/libgomp.c/autopar-3.c |  4 +++
 libgomp/testsuite/libgomp.c/autopar-4.c |  4 +++
 libgomp/testsuite/libgomp.c/autopar-5.c |  4 +++
 libgomp/testsuite/libgomp.c/autopar-6.c |  4 +++
 libgomp/testsuite/libgomp.c/autopar-7.c |  4 +++
 libgomp/testsuite/libgomp.c/autopar-8.c |  4 +++
 10 files changed, 111 insertions(+), 4 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-6.c
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-7.c
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-8.c

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 76e5e29..2221795 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11005,6 +11005,10 @@ automaton.  The default is 50.
 Chunk size of omp schedule for loops parallelized by parloops.  The default
 is 0.
 
+@item parloops-schedule
+Schedule type of omp schedule for loops parallelized by parloops (static,
+dynamic, guided, auto, runtime).  The default is static.
+
 @end table
 @end table
 
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 88a5149..4f0498b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -239,6 +239,7 @@ static vec taskreg_contexts;
 
 static void scan_omp (gimple_seq *, omp_context *);
 static tree scan_omp_1_op (tree *, int *, void *);
+static gphi *find_phi_with_arg_on_edge (tree, edge);
 
 #define WALK_SUBSTMTS  \
 case GIMPLE_BIND: \
@@ -6155,7 +6156,9 @@ expand_omp_for_generic (struct omp_region *region,
   if (!broken_loop)
 {
   l2_bb = create_empty_bb (cont_bb);
-  gcc_assert (BRANCH_EDGE (cont_bb)->dest == l1_bb);
+  gcc_assert (BRANCH_EDGE (cont_bb)->dest == l1_bb
+		  || (single_succ_edge (BRANCH_EDGE (cont_bb)->dest)->dest
+		  == l1_bb));
   gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
 }
   else
@@ -6429,8 +6432,12 @@ expand_omp_for_generic (struct omp_region *region,
   remove_edge (e);
 
   make_edge (cont_bb, l2_bb, EDGE_FALSE_VALUE);
-  add_bb_to_loop (l2_bb, cont_bb->loop_father);
   e = find_edge (cont_bb, l1_bb);
+  if (e == NULL)
+	{
+	  e = BRANCH_EDGE (cont_bb);
+	  gcc_assert (single_succ (e->dest) == l1_bb);
+	}
   if (gimple_omp_for_combined_p (fd->for_stmt))
 	{
 	  remove_edge (e);
@@ -6454,7 +6461,45 @@ expand_omp_for_generic (struct omp_region *region,
 	  e->flags = EDGE_FALLTHRU;
 	}
   make_edge (l2_bb, l0_bb, EDGE_TRUE_VALUE);
+}
+
+
+  if (gimple_in_ssa_p (cfun))
+{
+  gphi_iterator psi;
+
+  for (psi = gsi_start_phis (l3_bb); !gsi_end_p (psi); gsi_next (&psi))
+	{
+	  source_location locus;
+	  gphi *nphi;
+	  gphi *exit_phi = psi.phi ();
+
+	  edge l2_to_l3 = find_edge (l2_bb, l3_bb);
+	  tree exit_res = PHI_ARG_DEF_FROM_EDGE (exit_phi, l2_to_l3);
 
+	  basic_block latch = BRANCH_EDGE (cont_bb)->dest;
+	  edge latch_to_l1 = find_edge (latch, l1_bb);
+	  gphi *inner_phi = find_phi_with_arg_on_edge (exit_res, latch_to_l1);
+
+	  tree t = gimple_phi_result (exit_phi);
+	  tree new_res = copy_ssa_name (t, NULL);
+	  nphi = create_phi_node (new_res, l0_bb);
+
+	  edge l0_to_l1 = find_edge (l0_bb, l1_bb);
+	  t = PHI_ARG_DEF_FROM_EDGE (inner_phi, l0_to_l1);
+	  locus = gimple_phi_arg_location_from_edge (inner_phi, l0_to_l1);
+	  edge entry_to_l0 = find_edge (entry_bb, l0_bb);
+	  add_phi_arg (nphi, t, 

[patch] libstdc++/65142 Check read() result in std::random_device.

2015-09-11 Thread Jonathan Wakely

We should not silently ignore a failure to read from the random
device.

Tested powerpc64le-linux, committed to trunk. I'm going to commit this
to the gcc-5 branch too.


commit 2d2f7012dc3744dafef0de94dd845bd190253dbd
Author: Jonathan Wakely 
Date:   Fri Feb 20 17:29:50 2015 +

Check read() result in std::random_device.

	PR libstdc++/65142
	* src/c++11/random.cc (random_device::_M_getval()): Check read result.

diff --git a/libstdc++-v3/src/c++11/random.cc b/libstdc++-v3/src/c++11/random.cc
index edf900f..1d102c7 100644
--- a/libstdc++-v3/src/c++11/random.cc
+++ b/libstdc++-v3/src/c++11/random.cc
@@ -130,13 +130,17 @@ namespace std _GLIBCXX_VISIBILITY(default)
 #endif
 
 result_type __ret;
+
 #ifdef _GLIBCXX_HAVE_UNISTD_H
-read(fileno(static_cast<FILE*>(_M_file)),
-	 static_cast<void*>(&__ret), sizeof(result_type));
+auto e = read(fileno(static_cast<FILE*>(_M_file)),
+		  static_cast<void*>(&__ret), sizeof(result_type));
#else
-std::fread(static_cast<void*>(&__ret), sizeof(result_type),
-	   1, static_cast<FILE*>(_M_file));
+auto e = std::fread(static_cast<void*>(&__ret), sizeof(result_type),
+		1, static_cast<FILE*>(_M_file));
 #endif
+if (e != sizeof(result_type))
+  __throw_runtime_error(__N("random_device could not read enough bytes"));
+
 return __ret;
   }
 
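The pattern being fixed reduces to: never hand back the output buffer unless the read actually filled it. A standalone sketch of that check (not the libstdc++ code itself; `read_random_u32` and the message text are invented for illustration):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <stdexcept>

// Read one 32-bit value from the system random device, throwing on a
// short or failed read instead of silently returning garbage.
std::uint32_t read_random_u32()
{
  std::FILE* f = std::fopen("/dev/urandom", "rb");
  if (!f)
    throw std::runtime_error("cannot open random device");
  std::uint32_t ret = 0;
  // fread returns the number of *items* read (here 0 or 1), while
  // read(2) returns a byte count -- so the two branches of the patch
  // need different notions of "success".
  std::size_t n = std::fread(&ret, sizeof(ret), 1, f);
  std::fclose(f);
  if (n != 1)
    throw std::runtime_error("random_device could not read enough bytes");
  return ret;
}
```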


Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-09-11 Thread Ramana Radhakrishnan
On Fri, Sep 11, 2015 at 2:19 PM, Bill Schmidt
 wrote:
> Hi Alan,
>
> I probably wasn't clear enough.  The implementation in the vectorizer is
> fine and I'm not asking that to change per target.  What I'm objecting
> to is the equivalence between a REDUC_MAX_EXPR and a cost associated
> with vec_to_scalar.  This assumes that the back end will implement a
> REDUC_MAX_EXPR in a specific way that at least some back ends cannot.
> But those back ends should be free to model the cost of the
> REDUC_MAX_EXPR appropriately.  Therefore I am asking for a new
> vect_cost_for_stmt type to represent the cost of a REDUC_MAX_EXPR.  For
> ARM, this cost will be the same as a vec_to_scalar.  For others, it may
> not be; for powerpc, it certainly will not be.



>
> We can produce a perfectly fine sequence for a REDUC_MAX_EXPR during RTL
> expansion, and therefore it is not correct for us to explode this in
> tree-vect-generic.  This would expand the code size without providing
> any significant optimization opportunity, and could reduce the ability
> to, for instance, common REDUC_MAX_EXPRs.  It would also slow down the
> gimple vectorizers.
>
> I apologize if my loose use of language confused the issue.  It isn't
> the whole COND_REDUCTION I'm concerned with, but the REDUC_MAX_EXPRs
> that are used by it.
>
> (The costs in powerpc won't be enormous, but they are definitely
> mode-dependent in a way that vec_to_scalar is not.  We'll need 2*log(n)
> instructions, where n is the number of elements in the mode being
> vectorized.)


IIUC, on AArch64 a reduc_max_expr matches with a single reduction
operation, but on AArch32 Neon a reduc_smax gets implemented as a
sequence of vpmax instructions, which sounds similar to the PowerPC
example.  Thus mapping a reduc_smax expression to the cost of a
vec_to_scalar is probably not right in this particular situation.


regards
Ramana
>
> A secondary concern for powerpc is that REDUC_MAX_EXPR produces a scalar
> that has to be broadcast back to a vector, and the best way to implement
> it for us already has the max value in all positions of a vector.  But
> that is something we should be able to fix with simplify-rtx in the back
> end.
>
> Thanks,
> Bill
>
>
> On Fri, 2015-09-11 at 10:15 +0100, Alan Hayward wrote:
>> Hi Bill,
>>
>> I’d be a bit worried about asking the backend for the cost of a
>> COND_REDUCTION, as that will rely on the backend understanding the
>> implementation the vectorizer is using - every time the vectorizer
>> changed, the backends would need to be updated too. I’m hoping soon to get
>> together a patch to reduce the stmts produced on the simpler cases, which
>> would require a different set of costings. I can also imagine further
>> improvements being added for other special cases over time. Having the
>> backends understand every variation would be a little cumbersome.
>>
>> As it stands today, we correctly exit the optimisation if max reduction
>> isn’t supported in hardware, which is what the cost model is expecting.
>>
>>
>> If power wanted to use this implementation, then I think it’d probably
> >> need some code in tree-vect-generic.c to implement an emulated max
>> reduction, which would then require updates to the costs modelling of
>> anything that uses max reduction (not just cond reduction). All of that is
>> outside the scope of this patch.
>>
>>
>> Thanks,
>> Alan.
>>
>> On 10/09/2015 23:14, "Bill Schmidt"  wrote:
>>
>> >Hi Alan,
>> >
>> >The cost modeling of the epilogue code seems pretty target-specific ("An
>> >EQ stmt and an AND stmt, reduction of the max index and a reduction of
>> >the found values, a broadcast of the max value," resulting in two
>> >vector_stmts, one vec_to_scalar, and two scalar_to_vecs).  On powerpc,
>> >this will not represent the cost accurately, and the cost will indeed be
>> >quite different depending on the mode (logarithmic in the number of
>> >elements).  I think that you need to create a new entry in
>> >vect_cost_for_stmt to represent the cost of a COND_REDUCTION, and allow
>> >each target to calculate the cost appropriately.
>> >
>> >(Powerpc doesn't have a max-reduction hardware instruction, but because
>> >the reduction will be only in the epilogue code, it may still be
>> >profitable for us to generate the somewhat expensive reduction sequence
>> >in order to vectorize the loop.  But we definitely want to model it as
>> >costly in and of itself.  Also, the sequence will produce the maximum
>> >value in all positions without a separate broadcast.)
>> >
>> >Thanks,
>> >Bill
>> >
>> >On Thu, 2015-09-10 at 15:51 +0100, Alan Hayward wrote:
>> >> Hi,
>> >> This patch (attached) adds support for vectorizing conditional
>> >>expressions
>> >> (PR 65947), for example:
>> >>
>> >> int condition_reduction (int *a, int min_v)
>> >> {
>> >>   int last = 0;
>> >>   for (int i = 0; i < N; i++)
>> >> if (a[i] < min_v)
>> >>   last = a[i];
>> >>   return 

[gomp4] Override default target hook

2015-09-11 Thread James Norris

Hi!

The attached patch overrides the hook's default action
for anchored address usage.

Committed after regtest on x86_64 and powerpc64le

Thanks!
Jim
Index: nvptx.c
===
--- nvptx.c	(revision 227683)
+++ nvptx.c	(working copy)
@@ -4340,6 +4340,14 @@ nvptx_goacc_reduction (gimple call)
   gcc_unreachable ();
 }
 }
+
+/* Don't allow use of anchored addresses.  */
+
+static bool
+nvptx_use_anchors_for_symbol (const_rtx ARG_UNUSED (symbol))
+{
+  return false;
+}
 
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE nvptx_option_override
@@ -4452,6 +4460,9 @@ nvptx_goacc_reduction (gimple call)
 #undef TARGET_GOACC_REDUCTION
 #define TARGET_GOACC_REDUCTION nvptx_goacc_reduction
 
+#undef TARGET_USE_ANCHORS_FOR_SYMBOL_P
+#define TARGET_USE_ANCHORS_FOR_SYMBOL_P nvptx_use_anchors_for_symbol
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-nvptx.h"


[PATCH][PR67476] Add param parloops-schedule

2015-09-11 Thread Tom de Vries

Hi,

this patch adds a param parloops-schedule=<0-4>, which sets the omp 
schedule for loops parallelized by parloops.


The <0-4> maps onto <static, dynamic, guided, auto, runtime>.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom
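For context, parloops rewrites an auto-parallelizable loop into the equivalent of an explicit OpenMP worksharing loop, and the new param picks the schedule clause that appears on it. A hand-written equivalent of what a parallelized reduction loop corresponds to (illustrative only, not compiler output; without -fopenmp the pragma is simply ignored and the loop runs serially with the same result):

```cpp
#include <cassert>

// Conceptual source-level equivalent of a parloops-parallelized
// reduction loop; parloops-schedule selects the schedule kind in the
// clause (static here, i.e. the default value 0).
long sum_squares(int n)
{
  long s = 0;
#pragma omp parallel for reduction(+ : s) schedule(static)
  for (int i = 0; i < n; ++i)
    s += static_cast<long>(i) * i;
  return s;
}
```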
Add param parloops-schedule

2015-09-10  Tom de Vries  

	PR tree-optimization/67476
	* doc/invoke.texi (@item parloops-schedule): New item.
	* omp-low.c (expand_omp_for_generic): Handle simple latch.  Add missing
	phis.  Handle original loop.
	* params.def (PARAM_PARLOOPS_SCHEDULE): New DEFPARAM.
	* tree-parloops.c (create_parallel_loop): Handle
	PARAM_PARLOOPS_SCHEDULE.

	* testsuite/libgomp.c/autopar-3.c: New test.
	* testsuite/libgomp.c/autopar-4.c: New test.
	* testsuite/libgomp.c/autopar-5.c: New test.
	* testsuite/libgomp.c/autopar-6.c: New test.
	* testsuite/libgomp.c/autopar-7.c: New test.
	* testsuite/libgomp.c/autopar-8.c: New test.
---
 gcc/doc/invoke.texi |  4 +++
 gcc/omp-low.c   | 57 +++--
 gcc/params.def  |  6 
 gcc/tree-parloops.c | 24 +-
 libgomp/testsuite/libgomp.c/autopar-3.c |  4 +++
 libgomp/testsuite/libgomp.c/autopar-4.c |  4 +++
 libgomp/testsuite/libgomp.c/autopar-5.c |  4 +++
 libgomp/testsuite/libgomp.c/autopar-6.c |  4 +++
 libgomp/testsuite/libgomp.c/autopar-7.c |  4 +++
 libgomp/testsuite/libgomp.c/autopar-8.c |  4 +++
 10 files changed, 111 insertions(+), 4 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-6.c
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-7.c
 create mode 100644 libgomp/testsuite/libgomp.c/autopar-8.c

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 76e5e29..f3b67a8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11005,6 +11005,10 @@ automaton.  The default is 50.
 Chunk size of omp schedule for loops parallelized by parloops.  The default
 is 0.
 
+@item parloops-schedule
+Schedule type of omp schedule for loops parallelized by parloops (0:static,
+1:dynamic, 2:guided, 3:auto, 4:runtime).  The default is 0.
+
 @end table
 @end table
 
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 88a5149..4f0498b 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
 static vec<omp_context *> taskreg_contexts;
 
 static void scan_omp (gimple_seq *, omp_context *);
 static tree scan_omp_1_op (tree *, int *, void *);
+static gphi *find_phi_with_arg_on_edge (tree, edge);
 
 #define WALK_SUBSTMTS  \
 case GIMPLE_BIND: \
@@ -6155,7 +6156,9 @@ expand_omp_for_generic (struct omp_region *region,
   if (!broken_loop)
 {
   l2_bb = create_empty_bb (cont_bb);
-  gcc_assert (BRANCH_EDGE (cont_bb)->dest == l1_bb);
+  gcc_assert (BRANCH_EDGE (cont_bb)->dest == l1_bb
+		  || (single_succ_edge (BRANCH_EDGE (cont_bb)->dest)->dest
+		  == l1_bb));
   gcc_assert (EDGE_COUNT (cont_bb->succs) == 2);
 }
   else
@@ -6429,8 +6432,12 @@ expand_omp_for_generic (struct omp_region *region,
   remove_edge (e);
 
   make_edge (cont_bb, l2_bb, EDGE_FALSE_VALUE);
-  add_bb_to_loop (l2_bb, cont_bb->loop_father);
   e = find_edge (cont_bb, l1_bb);
+  if (e == NULL)
+	{
+	  e = BRANCH_EDGE (cont_bb);
+	  gcc_assert (single_succ (e->dest) == l1_bb);
+	}
   if (gimple_omp_for_combined_p (fd->for_stmt))
 	{
 	  remove_edge (e);
@@ -6454,7 +6461,45 @@ expand_omp_for_generic (struct omp_region *region,
 	  e->flags = EDGE_FALLTHRU;
 	}
   make_edge (l2_bb, l0_bb, EDGE_TRUE_VALUE);
+}
+
+
+  if (gimple_in_ssa_p (cfun))
+{
+  gphi_iterator psi;
+
+	for (psi = gsi_start_phis (l3_bb); !gsi_end_p (psi); gsi_next (&psi))
+	{
+	  source_location locus;
+	  gphi *nphi;
+	  gphi *exit_phi = psi.phi ();
+
+	  edge l2_to_l3 = find_edge (l2_bb, l3_bb);
+	  tree exit_res = PHI_ARG_DEF_FROM_EDGE (exit_phi, l2_to_l3);
 
+	  basic_block latch = BRANCH_EDGE (cont_bb)->dest;
+	  edge latch_to_l1 = find_edge (latch, l1_bb);
+	  gphi *inner_phi = find_phi_with_arg_on_edge (exit_res, latch_to_l1);
+
+	  tree t = gimple_phi_result (exit_phi);
+	  tree new_res = copy_ssa_name (t, NULL);
+	  nphi = create_phi_node (new_res, l0_bb);
+
+	  edge l0_to_l1 = find_edge (l0_bb, l1_bb);
+	  t = PHI_ARG_DEF_FROM_EDGE (inner_phi, l0_to_l1);
+	  locus = gimple_phi_arg_location_from_edge (inner_phi, l0_to_l1);
+	  edge entry_to_l0 = find_edge (entry_bb, l0_bb);
+	  add_phi_arg (nphi, t, entry_to_l0, locus);
+
+	  edge l2_to_l0 = find_edge (l2_bb, l0_bb);
+	  add_phi_arg (nphi, exit_res, l2_to_l0, UNKNOWN_LOCATION);
+
+	  add_phi_arg (inner_phi, new_res, l0_to_l1, UNKNOWN_LOCATION);
+	};
+}
+
+  if (!broken_loop)
+{
   set_immediate_dominator (CDI_DOMINATORS, l2_bb,
 			   recompute_dominator (CDI_DOMINATORS, l2_bb));
   

Re: [PATCH 2/2] shrink-wrap: Rewrite try_shrink_wrapping

2015-09-11 Thread Segher Boessenkool
On Fri, Sep 11, 2015 at 10:24:42AM +0100, Jiong Wang wrote:
> 
> Segher Boessenkool writes:
> 
> > On Thu, Sep 10, 2015 at 08:14:29AM -0700, Segher Boessenkool wrote:
> >> This patch rewrites the shrink-wrapping algorithm, allowing non-linear
> >> pieces of CFG to be duplicated for use without prologue instead of just
> >> linear pieces.
> >
> >> Bootstrapped and regression tested on powerpc64-linux.  Is this okay
> >> for mainline?
> >
> > Now also bootstrapped and regression tested on x86_64-linux.
> 
> + AArch64 bootstrapping OK.

Thank you for testing!

> A quick check shows > 30% more functions shrink-wrapped during
> bootstrapping by the following command:
> 
> cd $TOP_BUILD ; find . -name "*.pro_and_epilogue" | xargs grep 
> "Perform.*shrink" | wc -l

Wow, that is a lot!  But this is mostly the testsuite?  Shorter functions
can be wrapped a whole lot more often.


Segher


Re: [Patch, libstdc++] Add specific error message into exceptions

2015-09-11 Thread Tim Shen
On Mon, Sep 7, 2015 at 4:06 AM, Jonathan Wakely  wrote:
> On 28/08/15 11:23 -0700, Tim Shen wrote:
>>
>> On Fri, Aug 28, 2015 at 8:59 AM, Jonathan Wakely 
>> wrote:
>>>
>>> There seems to be no need to construct a std::string here, just pass a
>>> const char* (see below).
>>
>>
>> To be honest, I wasn't considering performance for a bit, since
>> exceptions are already considered slow by me :P. But yes, we can do
>> less allocations.
>>
>>> I wonder if we want to make this more efficient by adding a private
>>> member to regex_error that would allow information to be appended to
>>> the string, rather then creating a new regex_error with a new string.
>
>
> In case it wasn't clear, I was suggesting to add a private member
> *function* not data member.
>
>> I can add a helper function to _Scanner to construct the exception
>> object for only once. For functions that can't access this helper, use
>> return value for error handling.
>>
>>> I suggest adding another overload that takes a const char* rather than
>>> std::string. The reason is that when using the new ABI this function
>>> will take a std::__cxx11::string, so calling it will allocate memory
>>> for the string data, then that string is passed to the regex_error
>>> constructor which has to convert it internally to an old std::string,
>>> which has to allocate a second time.
>>
>>
>> First, to make it clear: due to _M_get_location_string(), we need
>> dynamic allocation.
>>
>> So is it good to have an owned raw pointer stored in runtime_error,
>> pointing to a heap allocated char chunk, which will be deallocated in
>> regex_error's dtor?
>
>
> No, adding that pointer is an ABI change.
>
> If you can't do it without an ABI change then you will have to lose
> the _M_get_location_string() functionality. It seems non-essential
> anyway.

Ok then, let's not append dynamic location information, but use a
string literal pointer only.


-- 
Regards,
Tim Shen
commit fc3343a2c719049620447f6dc20191e2af4895f6
Author: Tim Shen 
Date:   Thu Aug 27 21:42:40 2015 -0700

PR libstdc++/67361
* include/bits/regex_error.h: Add __throw_regex_error that
supports string.
* include/bits/regex_automaton.h: Add more specific exception
messages.
* include/bits/regex_automaton.tcc: Likewise.
* include/bits/regex_compiler.h: Likewise.
* include/bits/regex_compiler.tcc: Likewise.
* include/bits/regex_scanner.h: Likewise.
* include/bits/regex_scanner.tcc: Likewise.

diff --git a/libstdc++-v3/include/bits/regex_automaton.h 
b/libstdc++-v3/include/bits/regex_automaton.h
index b6ab307..1f672ee 100644
--- a/libstdc++-v3/include/bits/regex_automaton.h
+++ b/libstdc++-v3/include/bits/regex_automaton.h
@@ -327,7 +327,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
this->push_back(std::move(__s));
if (this->size() > _GLIBCXX_REGEX_STATE_LIMIT)
- __throw_regex_error(regex_constants::error_space);
+ __throw_regex_error(
+   regex_constants::error_space,
+   "Number of NFA states exceeds limit. Please use shorter regex "
+   "string, or use smaller brace expression, or make "
+   "_GLIBCXX_REGEX_STATE_LIMIT larger.");
return this->size()-1;
   }
 
diff --git a/libstdc++-v3/include/bits/regex_automaton.tcc 
b/libstdc++-v3/include/bits/regex_automaton.tcc
index f6f63a1..9bb1164 100644
--- a/libstdc++-v3/include/bits/regex_automaton.tcc
+++ b/libstdc++-v3/include/bits/regex_automaton.tcc
@@ -149,7 +149,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _NFA<_TraitsT>::_M_insert_backref(size_t __index)
 {
   if (this->_M_flags & regex_constants::__polynomial)
-   __throw_regex_error(regex_constants::error_complexity);
+   __throw_regex_error(regex_constants::error_complexity,
+   "Unexpected back-reference in polynomial mode.");
   // To figure out whether a backref is valid, a stack is used to store
   // unfinished sub-expressions. For example, when parsing
   // "(a(b)(c\\1(d)))" at '\\1', _M_subexpr_count is 3, indicating that 3
@@ -158,10 +159,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // _M_paren_stack is {1, 3}, for incomplete "(a.." and "(c..". At this
   // time, "\\2" is valid, but "\\1" and "\\3" are not.
   if (__index >= _M_subexpr_count)
-   __throw_regex_error(regex_constants::error_backref);
+   __throw_regex_error(
+ regex_constants::error_backref,
+ "Back-reference index exceeds current sub-expression count.");
   for (auto __it : this->_M_paren_stack)
if (__index == __it)
- __throw_regex_error(regex_constants::error_backref);
+ __throw_regex_error(
+   regex_constants::error_backref,
+   "Back-reference referred to an opened sub-expression.");
   this->_M_has_backref = true;
   _StateT __tmp(_S_opcode_backref);
   
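To see the effect, one only needs a regex_error to escape; a minimal check of that error path (the enriched what() text is implementation-specific, so only code() is inspected here):

```cpp
#include <cassert>
#include <regex>

// An unbalanced parenthesis makes std::regex's constructor throw
// std::regex_error; code() identifies the failure kind, and with this
// patch what() also carries a specific human-readable message.
bool throws_paren_error(const char* pattern)
{
  try {
    std::regex re(pattern);
    return false;                        // pattern compiled fine
  } catch (const std::regex_error& e) {
    return e.code() == std::regex_constants::error_paren;
  }
}
```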

Re: [PATCH][AArch64] Use preferred aliases for CSNEG, CSINC, CSINV

2015-09-11 Thread Andrew Pinski
On Tue, Sep 1, 2015 at 6:08 PM, Kyrill Tkachov  wrote:
> Hi all,
>
> The ARMv8-A reference manual says:
> "CNEG <Wd>, <Wn>, <cond>
> is equivalent to
> CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>)
> and is the preferred disassembly when Rn == Rm && cond != '111x'."
>
> That is, when the two input registers are the same we can use the shorter
> CNEG mnemonic
> with the inverse condition instead of the longer CSNEG instruction.
> Similarly for the
> CSINV and CSINC instructions, they have shorter CINV and CINC forms.
> This patch adjusts the output templates to emit the preferred shorter
> sequences when possible.
>
> The new mnemonics are just aliases, they map down to the same instruction in
> the end, so there
> are no performance or behaviour implications. But it does make the assembly
> a bit more readable
> IMO, since:
> "cneg w27, w9, le"
> can be simply read as "if the condition is less or equal negate w9" instead
> of the previous:
> "csneg w27, w9, w9, gt" where you have to remember which of the input
> registers is negated.
>
>
> Bootstrapped and tested on aarch64-linux-gnu.
> Ok for trunk?

I really think this kind of special casing is not correct and does not
belong in the compiler.  The main reason is that it complicates the back
end more than the benefit of easier-to-read assembly justifies.

Thanks,
Andrew Pinski

>
> Thanks,
> Kyrill
>
> 2015-09-01  Kyrylo Tkachov  
>
> * config/aarch64/aarch64.md (csinc3_insn): Use CINC
> mnemonic when possible.
> (*csinv3_insn): Use CINV mnemonic when possible.
> (csneg3_insn): USE CNEG mnemonic when possible.
>
> 2015-09-01  Kyrylo Tkachov  
>
> * gcc.target/aarch64/abs_1.c: Update scan-assembler checks
> to allow cneg.
> * gcc.target/aarch64/cond_op_imm_1.c: Likewise.  Likewise for cinv.
> * gcc.target/aarch64/mod_2.c: Likewise.
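For reference, the source pattern behind these mnemonics is a conditional negate, which GCC on AArch64 typically emits as a compare plus csneg (printable as cneg under this patch when both source registers coincide); a minimal example:

```cpp
#include <cassert>

// Branchless at the source level: negate x when the flag holds,
// otherwise pass it through unchanged.
int cond_negate(int x, bool negate)
{
  return negate ? -x : x;
}
```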


Re: [PATCH 2/2] shrink-wrap: Rewrite try_shrink_wrapping

2015-09-11 Thread Segher Boessenkool
On Fri, Sep 11, 2015 at 11:19:40AM +0200, Bernd Schmidt wrote:
> On 09/10/2015 05:14 PM, Segher Boessenkool wrote:
> >This patch rewrites the shrink-wrapping algorithm, allowing non-linear
> >pieces of CFG to be duplicated for use without prologue instead of just
> >linear pieces.
> 
> An example would be good for this kind of patch, also in the comments.

I'll make up something for v2.

> >+  /* If there is more than one predecessor of PRO not dominated by PRO,
> >+ fail.  We don't have to do this (can split the block), but do this
> >+ for now (the original code disallowed this, too).
> 
> Comments shouldn't reference previous versions. Also, a comment 
> describing the why rather than just what is being done would be more 
> helpful.

I'll just remove that second sentence, it just describes what we do not
yet do.

> I'm wondering how your new algorithm prevents the prologue from being 
> placed inside a loop. Can you have a situation where this picks a 
> predecessor that is reachable but not dominated by PRO?

It doesn't prevent it!

The prologue will not be _inside_ the loop: there is one prologue, and it
is executed exactly once for any block needing it.  But the code can copy
part of the first iteration of a loop, if there are early exits.  Example
(from the testsuite, pr39943.c, -Os I think):

(_B_egin, _R_eturn, _S_imple_return; edges without arrowhead are down or
to the right, to simplify the diagram):

Block 3 needs a prologue (it has a call), the rest doesn't:



  B   3<--  B   ->3<--
  |   |   | |  |  |   |
  |   v   |   becomes   |  |  v   |
  2---4---  2---5--   4---
  | | |
  R S R


by copying bb 4 to bb 5, and inserting the prologue on the edge 5->3.

> Other than that it looks pretty good.

Thanks, and thanks for the review!  I'll send v2 later today.


Segher


Re: New power of 2 hash policy

2015-09-11 Thread Michael Matz
Hi,

On Thu, 10 Sep 2015, François Dumont wrote:

> Here is a patch to offer an alternative hash policy. This one uses a 
> power-of-2 number of buckets, allowing a faster modulo operation. The 
> benefit is obvious when running the performance test, which I have 
> adapted to use this alternative policy; results fall between the 
> current implementation and the tr1 one, the old std one.
> 
> Of course with this hash policy the lower bits of the hash code are 
> more important. For pointers it would require changing the std::hash 
> implementation to discard the always-zero low bits, as in the patch I 
> proposed some weeks ago.
> 
> What do you think ?

No comment on whether it should be included (except that it seems useful 
to me), but one observation about the patch:

> +1ul << 31,
> +#if __SIZEOF_LONG__ != 8
> +1ul << 32
> +#else

This is wrong, 1ul<<32 is zero on a 32bit machine, and is also the 33rd 
entry in that table, when you want only 32.  Like you also (correctly) 
stop with 1ul<<63 for a 64bit machine.


Ciao,
Michael.
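The win the policy is after comes from strength-reducing the bucket modulo: when the bucket count is a power of two, `h % n` equals `h & (n - 1)`, a single AND instead of a division. A sketch of that mapping (not the proposed libstdc++ policy code itself):

```cpp
#include <cassert>
#include <cstddef>

// Map a hash code to a bucket without dividing; only valid when
// nbuckets is a nonzero power of two.  Because only the low bits of
// the hash survive the mask, hash quality in those bits matters --
// hence the companion std::hash change for pointers.
std::size_t bucket_index(std::size_t h, std::size_t nbuckets)
{
  assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
  return h & (nbuckets - 1);
}
```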

Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-09-11 Thread Bill Schmidt
Hi Alan,

I probably wasn't clear enough.  The implementation in the vectorizer is
fine and I'm not asking that to change per target.  What I'm objecting
to is the equivalence between a REDUC_MAX_EXPR and a cost associated
with vec_to_scalar.  This assumes that the back end will implement a
REDUC_MAX_EXPR in a specific way that at least some back ends cannot.
But those back ends should be free to model the cost of the
REDUC_MAX_EXPR appropriately.  Therefore I am asking for a new
vect_cost_for_stmt type to represent the cost of a REDUC_MAX_EXPR.  For
ARM, this cost will be the same as a vec_to_scalar.  For others, it may
not be; for powerpc, it certainly will not be.

We can produce a perfectly fine sequence for a REDUC_MAX_EXPR during RTL
expansion, and therefore it is not correct for us to explode this in
tree-vect-generic.  This would expand the code size without providing
any significant optimization opportunity, and could reduce the ability
to, for instance, common REDUC_MAX_EXPRs.  It would also slow down the
gimple vectorizers.

I apologize if my loose use of language confused the issue.  It isn't
the whole COND_REDUCTION I'm concerned with, but the REDUC_MAX_EXPRs
that are used by it.

(The costs in powerpc won't be enormous, but they are definitely
mode-dependent in a way that vec_to_scalar is not.  We'll need 2*log(n)
instructions, where n is the number of elements in the mode being
vectorized.)
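The 2*log(n) count comes from the standard halving reduction: each step shuffles the upper half down and takes an elementwise max. A scalar model of that expansion, with each outer iteration standing in for one shuffle plus one vector max:

```cpp
#include <algorithm>
#include <cassert>

// Pairwise max reduction over a power-of-two element count: log2(n)
// fold steps, each combining the upper half into the lower half.
int reduce_max_pow2(int* v, int n)
{
  for (int half = n / 2; half >= 1; half /= 2)   // log2(n) iterations
    for (int i = 0; i < half; ++i)               // one elementwise max each
      v[i] = std::max(v[i], v[i + half]);
  return v[0];
}
```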

A secondary concern for powerpc is that REDUC_MAX_EXPR produces a scalar
that has to be broadcast back to a vector, and the best way to implement
it for us already has the max value in all positions of a vector.  But
that is something we should be able to fix with simplify-rtx in the back
end.

Thanks,
Bill


On Fri, 2015-09-11 at 10:15 +0100, Alan Hayward wrote:
> Hi Bill,
> 
> I’d be a bit worried about asking the backend for the cost of a
> COND_REDUCTION, as that will rely on the backend understanding the
> implementation the vectorizer is using - every time the vectorizer
> changed, the backends would need to be updated too. I’m hoping soon to get
> together a patch to reduce the stmts produced on the simpler cases, which
> would require a different set of costings. I can also imagine further
> improvements being added for other special cases over time. Having the
> backends understand every variation would be a little cumbersome.
> 
> As it stands today, we correctly exit the optimisation if max reduction
> isn’t supported in hardware, which is what the cost model is expecting.
> 
> 
> If power wanted to use this implementation, then I think it’d probably
> need some code in tree-vect-generic.c to implement an emulated max
> reduction, which would then require updates to the costs modelling of
> anything that uses max reduction (not just cond reduction). All of that is
> outside the scope of this patch.
> 
> 
> Thanks,
> Alan.
> 
> On 10/09/2015 23:14, "Bill Schmidt"  wrote:
> 
> >Hi Alan,
> >
> >The cost modeling of the epilogue code seems pretty target-specific ("An
> >EQ stmt and an AND stmt, reduction of the max index and a reduction of
> >the found values, a broadcast of the max value," resulting in two
> >vector_stmts, one vec_to_scalar, and two scalar_to_vecs).  On powerpc,
> >this will not represent the cost accurately, and the cost will indeed be
> >quite different depending on the mode (logarithmic in the number of
> >elements).  I think that you need to create a new entry in
> >vect_cost_for_stmt to represent the cost of a COND_REDUCTION, and allow
> >each target to calculate the cost appropriately.
> >
> >(Powerpc doesn't have a max-reduction hardware instruction, but because
> >the reduction will be only in the epilogue code, it may still be
> >profitable for us to generate the somewhat expensive reduction sequence
> >in order to vectorize the loop.  But we definitely want to model it as
> >costly in and of itself.  Also, the sequence will produce the maximum
> >value in all positions without a separate broadcast.)
> >
> >Thanks,
> >Bill
> >
> >On Thu, 2015-09-10 at 15:51 +0100, Alan Hayward wrote:
> >> Hi,
> >> This patch (attached) adds support for vectorizing conditional
> >>expressions
> >> (PR 65947), for example:
> >> 
> >> int condition_reduction (int *a, int min_v)
> >> {
> >>   int last = 0;
> >>   for (int i = 0; i < N; i++)
> >> if (a[i] < min_v)
> >>   last = a[i];
> >>   return last;
> >> }
> >> 
> >> To do this the loop is vectorised to create a vector of data results (ie
> >> of matching a[i] values). Using an induction variable, an additional
> >> vector is added containing the indexes where the matches occured. In the
> >> function epilogue this is reduced to a single max value and then used to
> >> index into the vector of data results.
> >> When no values are matched in the loop, the indexes vector will contain
> >> all zeroes, eventually matching the first entry in the data results
> >>vector.
> >> 
> >> To vectorize 
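The scheme the (truncated) description outlines can be modeled in scalar code: keep a per-lane vector of matched values and a parallel vector of 1-based match indices, then max-reduce the indices in the epilogue and select the winning lane's value, with index 0 falling back to the initial value. A simplified model under an assumed vectorization factor of 4 (illustrative, not the vectorizer's actual output):

```cpp
#include <cassert>

// Model of the cond-reduction scheme: for VF lanes, track the last
// matching value and its 1-based loop index per lane, then max-reduce
// the index vector and pick that lane's value.  Index 0 means "no
// match anywhere" and falls back to the initial value 0.
int condition_reduction_model(const int* a, int n, int min_v)
{
  const int VF = 4;                     // assumed vectorization factor
  int data[VF] = {0, 0, 0, 0};          // last matching value per lane
  int idx[VF]  = {0, 0, 0, 0};          // its loop index + 1; 0 = none
  for (int i = 0; i < n; ++i)
    {
      int lane = i % VF;
      if (a[i] < min_v)
        {
          data[lane] = a[i];
          idx[lane] = i + 1;
        }
    }
  // Epilogue: max-reduce the indices, then select the matching lane.
  int best = 0, best_lane = 0;
  for (int lane = 0; lane < VF; ++lane)
    if (idx[lane] > best)
      {
        best = idx[lane];
        best_lane = lane;
      }
  return best ? data[best_lane] : 0;
}
```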

[patch] libstdc++/64857 Rationalise PCH headers and 17_intro/headers tests.

2015-09-11 Thread Jonathan Wakely

Ensure that  includes everything, and rename the
17_intro/headers/c++200x directory to c++2011.

Tested powerpc64le-linux, committed to trunk.


commit 3b07b2a428725ba70636e00a1f6ed8a28bb59b44
Author: Jonathan Wakely 
Date:   Thu Jan 29 13:12:22 2015 +

Rationalise PCH headers and 17_intro/headers tests.

	PR libstdc++/64857
	* doc/xml/manual/using.xml: Improve aggregate header documentation.
	* doc/html/manual/*: Regenerate.
	* include/precompiled/extc++.h: Include  for C++11
	and later and include more extension headers.
	* testsuite/17_intro/headers/c++1998/all_attributes.cc: Remove
	redundant header.
	* testsuite/17_intro/headers/c++200x/: Rename to c++2011.
	* testsuite/17_intro/headers/c++2014/all_attributes.cc: Remove
	redundant headers.
	* testsuite/17_intro/headers/c++2014/all_no_exceptions.cc: New.
	* testsuite/17_intro/headers/c++2014/all_no_rtti.cc: New.
	* testsuite/17_intro/headers/c++2014/all_pedantic_errors.cc: New.
	* testsuite/17_intro/headers/c++2014/operator_names.cc: New.
	* testsuite/17_intro/headers/c++2014/stdc++.cc: New.
	* testsuite/17_intro/headers/c++2014/stdc++_multiple_inclusion.cc:
	New.

diff --git a/libstdc++-v3/doc/xml/manual/using.xml b/libstdc++-v3/doc/xml/manual/using.xml
index bad49f2..2c8d179 100644
--- a/libstdc++-v3/doc/xml/manual/using.xml
+++ b/libstdc++-v3/doc/xml/manual/using.xml
@@ -16,7 +16,8 @@
   The standard library conforms to the dialect of C++ specified by the
   -std option passed to the compiler.
   By default, g++ is equivalent to
-  g++ -std=gnu++98.
+  g++ -std=gnu++14 since GCC 6, and
+  g++ -std=gnu++98 for older releases.
 
 
  
@@ -718,7 +719,7 @@ and std::sinl.
 
 There are three base header files that are provided. They can be
 used to precompile the standard headers and extensions into binary
-files that may the be used to speed compiles that use these headers.
+files that may then be used to speed up compilations that use these headers.
 
 
 
@@ -726,7 +727,7 @@ files that may the be used to speed compiles that use these headers.
 
   stdc++.h
 Includes all standard headers. Actual content varies depending on
-language dialect.
+language dialect.
 
 
 
@@ -737,13 +738,14 @@ language dialect.
 
 
 extc++.h
-Includes all of stdtr1c++.h, and adds all the Extension headers.
+Includes all of stdc++.h, and adds all the Extension headers
+(and in C++98 mode also adds all the TR1 headers by including all of
+stdtr1c++.h).
 
 
 
-How to construct a .gch file from one of these base header files.
-
-First, find the include directory for the compiler. One way to do
+To construct a .gch file from one of these base header files,
+first find the include directory for the compiler. One way to do
 this is:
 
 
diff --git a/libstdc++-v3/include/precompiled/extc++.h b/libstdc++-v3/include/precompiled/extc++.h
index de3775b..8883e47 100644
--- a/libstdc++-v3/include/precompiled/extc++.h
+++ b/libstdc++-v3/include/precompiled/extc++.h
@@ -28,15 +28,25 @@
 
 #if __cplusplus < 201103L
 #include 
+#else
+#include 
 #endif
 
 #include 
+#if __cplusplus >= 201103L
+# include 
+#endif
+#include 
 #include 
 #include 
 #include 
 #include 
+#if __cplusplus >= 201103L
+# include 
+#endif
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -45,9 +55,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
+#if __cplusplus >= 201103L
+# include 
+#endif
 #include 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc b/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc
index 7bc7ffe..20107d2 100644
--- a/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc
+++ b/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc
@@ -35,7 +35,6 @@
 #define unused 1
 #endif
 
-#include  // TODO: this is missing from 
 #include 
 
 int
diff --git a/libstdc++-v3/testsuite/17_intro/headers/c++200x/42319.cc b/libstdc++-v3/testsuite/17_intro/headers/c++200x/42319.cc
deleted file mode 100644
index 65afb57..000
--- a/libstdc++-v3/testsuite/17_intro/headers/c++200x/42319.cc
+++ /dev/null
@@ -1,22 +0,0 @@
-// { dg-do compile }
-// { dg-options "-std=gnu++11" }
-
-// Copyright (C) 2009-2015 Free Software Foundation, Inc.
-//
-// This file is part of the GNU ISO C++ Library.  This library is free
-// software; you can redistribute it and/or modify it under the
-// terms of the GNU General Public License as published by the
-// Free Software Foundation; either version 3, or (at your option)
-// any later version.
-
-// This library is distributed in the hope that it will be useful,
-// but WITHOUT ANY WARRANTY; without even the implied warranty of
-// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-// GNU General Public License for more details.
-
-// You should have received a copy of the GNU General Public License along
-// 

Re: [PATCH] Teach genmatch.c to generate single-use restrictions from flags

2015-09-11 Thread Bernd Schmidt

On 07/08/2015 04:39 PM, Richard Biener wrote:


This introduces a :s flag to match expressions which enforces
the expression to have a single-use if(!) the simplified
expression is larger than one statement.


This seems to be missing documentation in match-and-simplify.texi.


Bernd




Re: New power of 2 hash policy

2015-09-11 Thread Jonathan Wakely

On 11/09/15 15:11 +0200, Michael Matz wrote:

Hi,

On Thu, 10 Sep 2015, François Dumont wrote:


Here is a patch to offer an alternative hash policy. This one is
using power of 2 number of buckets allowing a faster modulo operation.
This is obvious when running the performance test that I have adapted to
use this alternative policy. It sits somewhere between the current
implementation and the tr1 one, the old std one.

Of course with this hash policy the lower bits of the hash code are
more important. For pointers it would require changing the std::hash
implementation to remove the low-order zero bits, as in the patch I proposed
some weeks ago.

What do you think ?


No comment on if it should be included (except that it seems useful to
me), but one observation of the patch:


+1ul << 31,
+#if __SIZEOF_LONG__ != 8
+1ul << 32
+#else


This is wrong, 1ul<<32 is zero on a 32bit machine, and is also the 33rd
entry in that table, when you want only 32.  Like you also (correctly)
stop with 1ul<<63 for a 64bit machine.


I'd prefer to see that table disappear completely, replaced by a
constexpr function. We need a static table of prime numbers because
they can't be computed instantly, but we don't need to store powers of
two in the library.

I agree the extension is useful, and would like to see it included,
but I wonder if we can do it without adding any new symbols to the
shared library. We certainly don't need the table, and the few other
functions added to the DSO could probably be defined inline in
headers.



Re: [PATCH][PR67476] Add param parloops-schedule

2015-09-11 Thread Tom de Vries

On 11/09/15 12:57, Jakub Jelinek wrote:

On Fri, Sep 11, 2015 at 12:55:00PM +0200, Tom de Vries wrote:

>Hi,
>
>this patch adds a param parloops-schedule=<0-4>, which sets the omp schedule
>for loops paralellized by parloops.
>
The <0-4> maps onto <static, dynamic, guided, auto, runtime>.
>
>Bootstrapped and reg-tested on x86_64.
>
>OK for trunk?

I don't really like it, the mapping of the integers to the enum values
is non-obvious and hard to remember.
Perhaps add support for enumeration params if you want this instead?



This patch adds handling of a DEFPARAMENUM macro, which is similar to 
the DEFPARAM macro, but allows the values to be named.


So the definition of param parloop-schedule becomes:
...
DEFPARAMENUM (PARAM_PARLOOPS_SCHEDULE,
 "parloops-schedule",
 "Schedule type of omp schedule for loops parallelized by "
 "parloops (static, dynamic, guided, auto, runtime)",
 0, 0, 4, "static", "dynamic", "guided", "auto", "runtime")
...
[ I'll repost the original patch containing this update. ]

OK for trunk if x86_64 bootstrap and reg-test succeeds?

Thanks,
- Tom

Support DEFPARAMENUM in params.def

2015-09-11  Tom de Vries  

	* opts.c (handle_param): Handle case that param arg is a string.
	* params-list.h: Handle DEFPARAMENUM in params.def.
	* params.c (find_param): New function, factored out of ...
	(set_param_value): ... here.
	(get_param_string_value): New function.
	* params.h (struct param_info): Add values field.
	(get_param_string_value): Declare.
---
 gcc/opts.c| 12 ---
 gcc/params-list.h |  3 ++
 gcc/params.c  | 93 +--
 gcc/params.h  |  5 +++
 4 files changed, 85 insertions(+), 28 deletions(-)

diff --git a/gcc/opts.c b/gcc/opts.c
index f1a9acd..47b8b86 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2116,15 +2116,17 @@ handle_param (struct gcc_options *opts, struct gcc_options *opts_set,
 	  arg);
   else
 {
+  *equal = '\0';
+
   value = integral_argument (equal + 1);
   if (value == -1)
+	value = get_param_string_value (arg, equal + 1);
+
+  if (value == -1)
 	error_at (loc, "invalid --param value %qs", equal + 1);
   else
-	{
-	  *equal = '\0';
-	  set_param_value (arg, value,
-			   opts->x_param_values, opts_set->x_param_values);
-	}
+	set_param_value (arg, value,
+			 opts->x_param_values, opts_set->x_param_values);
 }
 
   free (arg);
diff --git a/gcc/params-list.h b/gcc/params-list.h
index ee33ef5..8a19919 100644
--- a/gcc/params-list.h
+++ b/gcc/params-list.h
@@ -19,5 +19,8 @@ along with GCC; see the file COPYING3.  If not see
 
 #define DEFPARAM(enumerator, option, nocmsgid, default, min, max) \
   enumerator,
+#define DEFPARAMENUM(enumerator, option, nocmsgid, default, min, max, ...) \
+  enumerator,
 #include "params.def"
 #undef DEFPARAM
+#undef DEFPARAMENUM
diff --git a/gcc/params.c b/gcc/params.c
index b0bc80b..7eedab8 100644
--- a/gcc/params.c
+++ b/gcc/params.c
@@ -37,12 +37,22 @@ static size_t num_compiler_params;
default values determined.  */
 static bool params_finished;
 
+#define DEFPARAM(ENUM, OPTION, HELP, DEFAULT, MIN, MAX)
+#define DEFPARAMENUM(ENUM, OPTION, HELP, DEFAULT, MIN, MAX, ...)	\
+  static const char *values_ ## ENUM [] = { __VA_ARGS__ };
+#include "params.def"
+#undef DEFPARAMENUM
+#undef DEFPARAM
+
 static const param_info lang_independent_params[] = {
 #define DEFPARAM(ENUM, OPTION, HELP, DEFAULT, MIN, MAX) \
-  { OPTION, DEFAULT, MIN, MAX, HELP },
+  { OPTION, DEFAULT, MIN, MAX, HELP, NULL },
+#define DEFPARAMENUM(ENUM, OPTION, HELP, DEFAULT, MIN, MAX, ...)	\
+  { OPTION, DEFAULT, MIN, MAX, HELP, values_ ## ENUM },
 #include "params.def"
 #undef DEFPARAM
-  { NULL, 0, 0, 0, NULL }
+#undef DEFPARAMENUM
+  { NULL, 0, 0, 0, NULL, NULL }
 };
 
 /* Add the N PARAMS to the current list of compiler parameters.  */
@@ -114,6 +124,45 @@ set_param_value_internal (compiler_param num, int value,
 params_set[i] = true;
 }
 
+/* Return true if it can find the matching entry for NAME in the parameter
+   table, and assign the entry index to INDEX.  Return false otherwise.  */
+
+bool
+find_param (const char *name, size_t *index)
+{
+  for (size_t i = 0; i < num_compiler_params; ++i)
+if (strcmp (compiler_params[i].option, name) == 0)
+  {
+	*index = i;
+	return true;
+  }
+
+  return false;
+}
+
+/* Return the param value for param name VALUE_NAME belonging to param NAME.
+   Return -1 if there is no corresponding param value.  */
+
+int
+get_param_string_value (const char *name, const char *value_name)
+{
+  size_t index;
+  if (!find_param (name, &index))
+return -1;
+
+  param_info *entry = &compiler_params[index];
+  if (entry->value_names == NULL)
+return -1;
+
+  int n = entry->max_value - entry->min_value + 1;
+  int value = entry->min_value;
+  for (int i = 0; i < n ; ++i, ++value)
+if (strcmp (entry->value_names[i], value_name) == 0)
+  return value;
+
+  return -1;
+}

Re: [PATCH 2/2] shrink-wrap: Rewrite try_shrink_wrapping

2015-09-11 Thread Bernd Schmidt

On 09/11/2015 02:36 PM, Segher Boessenkool wrote:


I'm wondering how your new algorithm prevents the prologue from being
placed inside a loop. Can you have a situation where this picks a
predecessor that is reachable but not dominated by PRO?


It doesn't prevent it!

The prologue will not be _inside_ the loop: there is one prologue, and it
is executed exactly once for any block needing it.  But the code can copy
part of the first iteration of a loop, if there are early exits.  Example
(from the testsuite, pr39943.c, -Os I think):


I was wondering if that was the intention. This also ought to be spelled 
out in the comments.



Bernd


Re: New power of 2 hash policy

2015-09-11 Thread Jonathan Wakely

On 11/09/15 14:18 +0100, Jonathan Wakely wrote:

On 11/09/15 15:11 +0200, Michael Matz wrote:

Hi,

On Thu, 10 Sep 2015, François Dumont wrote:


   Here is a patch to offer an alternative hash policy. This one is
using power of 2 number of buckets allowing a faster modulo operation.
This is obvious when running the performance test that I have adapted to
use this alternative policy. It sits somewhere between the current
implementation and the tr1 one, the old std one.

   Of course with this hash policy the lower bits of the hash code are
more important. For pointers it would require changing the std::hash
implementation to remove the low-order zero bits, as in the patch I proposed
some weeks ago.

   What do you think ?


No comment on if it should be included (except that it seems useful to
me), but one observation of the patch:


+1ul << 31,
+#if __SIZEOF_LONG__ != 8
+1ul << 32
+#else


This is wrong, 1ul<<32 is zero on a 32bit machine, and is also the 33rd
entry in that table, when you want only 32.  Like you also (correctly)
stop with 1ul<<63 for a 64bit machine.


I'd prefer to see that table disappear completely, replaced by a
constexpr function. We need a static table of prime numbers because
they can't be computed instantly, but we don't need to store powers of
two in the library.

I agree the extension is useful, and would like to see it included,
but I wonder if we can do it without adding any new symbols to the
shared library. We certainly don't need the table, and the few other
functions added to the DSO could probably be defined inline in
headers.



Also there are several comments that talk about finding "the next prime"
which should talk about powers of two, and the smaller table for fast
lookup of the next "prime" may not be needed for powers of two. There
are fast tricks for finding the next power of two using bitwise
operations.

So I'm in favour of the change in general, but it needs a little bit
of reworking where the prime number code has been copied.


[patch] libstdc++/67173 Fix filesystem::canonical for Solaris 10.

2015-09-11 Thread Jonathan Wakely

Solaris 10 doesn't follow POSIX in accepting a null pointer as the
second argument to realpath(), so allocate a buffer for it.

Tested x86_64-linux, committed to trunk.


commit ed4023452f85f6c745ce473b2503f4e46fb02cd9
Author: Jonathan Wakely 
Date:   Fri Sep 11 15:19:27 2015 +0100

Fix filesystem::canonical on Solaris 10.

	PR libstdc++/67173
	* src/filesystem/ops.cc (filesystem::canonical): Allocate buffer for
	realpath on Solaris 10.

diff --git a/libstdc++-v3/src/filesystem/ops.cc b/libstdc++-v3/src/filesystem/ops.cc
index 661685a..cefb927 100644
--- a/libstdc++-v3/src/filesystem/ops.cc
+++ b/libstdc++-v3/src/filesystem/ops.cc
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include   // PATH_MAX
 #ifdef _GLIBCXX_HAVE_UNISTD_H
 # include 
 # if defined(_GLIBCXX_HAVE_SYS_STAT_H) && defined(_GLIBCXX_HAVE_SYS_TYPES_H)
@@ -97,7 +98,11 @@ fs::canonical(const path& p, const path& base, error_code& ec)
 {
   path can;
 #ifdef _GLIBCXX_USE_REALPATH
-  if (char_ptr rp = char_ptr{::realpath(absolute(p, base).c_str(), nullptr)})
+  char* buffer = nullptr;
+#if defined(__SunOS_5_10) && defined(PATH_MAX)
+  buffer = (char*)::malloc(PATH_MAX);
+#endif
+  if (char_ptr rp = char_ptr{::realpath(absolute(p, base).c_str(), buffer)})
 {
   can.assign(rp.get());
   ec.clear();


Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-09-11 Thread Richard Sandiford
Ramana Radhakrishnan  writes:
> On Fri, Sep 11, 2015 at 2:19 PM, Bill Schmidt
>  wrote:
>> Hi Alan,
>>
>> I probably wasn't clear enough.  The implementation in the vectorizer is
>> fine and I'm not asking that to change per target.  What I'm objecting
>> to is the equivalence between a REDUC_MAX_EXPR and a cost associated
>> with vec_to_scalar.  This assumes that the back end will implement a
>> REDUC_MAX_EXPR in a specific way that at least some back ends cannot.
>> But those back ends should be free to model the cost of the
>> REDUC_MAX_EXPR appropriately.  Therefore I am asking for a new
>> vect_cost_for_stmt type to represent the cost of a REDUC_MAX_EXPR.  For
>> ARM, this cost will be the same as a vec_to_scalar.  For others, it may
>> not be; for powerpc, it certainly will not be.
>>
>> We can produce a perfectly fine sequence for a REDUC_MAX_EXPR during RTL
>> expansion, and therefore it is not correct for us to explode this in
>> tree-vect-generic.  This would expand the code size without providing
>> any significant optimization opportunity, and could reduce the ability
>> to, for instance, common REDUC_MAX_EXPRs.  It would also slow down the
>> gimple vectorizers.
>>
>> I apologize if my loose use of language confused the issue.  It isn't
>> the whole COND_REDUCTION I'm concerned with, but the REDUC_MAX_EXPRs
>> that are used by it.
>>
>> (The costs in powerpc won't be enormous, but they are definitely
>> mode-dependent in a way that vec_to_scalar is not.  We'll need 2*log(n)
>> instructions, where n is the number of elements in the mode being
>> vectorized.)
>
> IIUC, on AArch64 a reduc_max_expr matches with a single reduction
> operation but on AArch32 Neon a reduc_smax gets implemented as a
> sequence of vpmax instructions which sounds similar to the PowerPC
> example as well. Thus mapping a reduc_smax expression to the cost of a
> vec_to_scalar is probably not right in this particular situation.

But AIUI vec_to_scalar exists to represent reduction operations.
(I see it was also used for strided stores.)  So for better or worse,
I think the interface that Alan's patch uses is the defined interface
for measuring the cost of a reduction.

If a backend implemented reduc_umax_scal_optab in current sources,
without Alan's patch, then that optab would be used for a "natural"
unsigned max reduction (i.e. a reduction of a MAX_EXPR with unsigned
inputs).  vec_to_scalar would be used to weigh the cost of the epilogue
reduction statement in that case.

So if defining a new Power pattern might cause Alan's patch to trigger
in cases where the transformation is actually too expensive, I would
expect the same to be true for a natural umax without Alan's patch.
The two cases ought to underestimate the true cost by the same degree.

In other words, whether the cost interface is flexible enough is
definitely interesting but seems orthogonal to this patch.

Thanks,
Richard



Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time

2015-09-11 Thread Bernd Schmidt



On 09/11/2015 04:23 PM, Joseph Myers wrote:

On Thu, 10 Sep 2015, Bernd Schmidt wrote:


On 09/10/2015 03:41 PM, Joseph Myers wrote:

Ping^2.  This patch
 is still
pending review.


No fundamental objections, but I have some questions. Could you describe
what the handling of flags/lang_mask accomplishes in this patch? Would
option handling be simpler if the creation/compilation of the extra file
happened in lto_wrapper (where we already do similar things through
mkoffload)?


The point of the lang_mask handling is that if, say, we're compiling C++
or Fortran code, with options that aren't valid for C, we mustn't pass
those options to cc1 when building the constructor as C code, but we do
still need to pass options valid for C (which might e.g. affect the ABI).

[...]

I don't see lto-wrapper as being any easier as a place to do this; no
doubt lto-wrapper or collect2 could create the file and call back into the
driver to compile it, but I don't see the advantage in doing that over
having the driver (which already has all the relevant information, since
it's coming from the command line rather than inspection of object files
being linked) do it.


The point would be that lto_wrapper already produces such an appropriate 
set of options. But I guess if you're thinking ahead to using this 
filtering in gcc.c for other purposes then that's also a good argument. 
So, patch is ok, but please update the comment for give_switch (document 
the new behaviour and that it depends on a global variable).


I expect you know best what to do in the OpenACC testsuite driver, but 
you might want to run the libgomp.exp parts by Jakub. If the testsuite 
parts are independent of the rest of the patch, please repost them 
separately.



Bernd


Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time

2015-09-11 Thread Joseph Myers
On Thu, 10 Sep 2015, Bernd Schmidt wrote:

> On 09/10/2015 03:41 PM, Joseph Myers wrote:
> > Ping^2.  This patch 
> >  is still 
> > pending review.
> 
> No fundamental objections, but I have some questions. Could you describe
> what the handling of flags/lang_mask accomplishes in this patch? Would
> option handling be simpler if the creation/compilation of the extra file
> happened in lto_wrapper (where we already do similar things through
> mkoffload)?

The point of the lang_mask handling is that if, say, we're compiling C++ 
or Fortran code, with options that aren't valid for C, we mustn't pass 
those options to cc1 when building the constructor as C code, but we do 
still need to pass options valid for C (which might e.g. affect the ABI).

There's an argument that this sort of option filtering should be done more 
generally.  That is, if we have a mixed-language compilation in a single 
call to the driver, it should filter the options so that cc1 gets those 
options applicable for C, cc1plus those applicable to C++, etc., with 
options for inappropriate languages only being diagnosed if none of the 
source files are for that language.  I don't know if that's the right 
thing to do or not, but it's at least plausible.

I don't see lto-wrapper as being any easier as a place to do this; no 
doubt lto-wrapper or collect2 could create the file and call back into the 
driver to compile it, but I don't see the advantage in doing that over 
having the driver (which already has all the relevant information, since 
it's coming from the command line rather than inspection of object files 
being linked) do it.

> I initially thought the information you're giving to
> GOMP_set_offload_targets is already available implicitly, from the calls
> to GOMP_offload_register. But digging through the archives it sounds
> like the problem is that if there's no offloadable code, no offload
> image will be generated. Is that correct?

Yes.  In the message Thomas referred to, "On the other hand, for example, 
for -foffload=nvptx-none, even if user program code doesn't contain any 
offloaded data (and thus the offload machinery has not been run), the user 
program might still contain any executable directives or OpenACC runtime 
library calls, so we'd still like to use the libgomp nvptx plugin.  
However, we currently cannot detect this situation.".

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 2/2] shrink-wrap: Rewrite try_shrink_wrapping

2015-09-11 Thread Jiong Wang

Segher Boessenkool writes:

> On Fri, Sep 11, 2015 at 10:24:42AM +0100, Jiong Wang wrote:
>> 
>> Segher Boessenkool writes:
>> 
>> > On Thu, Sep 10, 2015 at 08:14:29AM -0700, Segher Boessenkool wrote:
>> >> This patch rewrites the shrink-wrapping algorithm, allowing non-linear
>> >> pieces of CFG to be duplicated for use without prologue instead of just
>> >> linear pieces.
>> >
>> >> Bootstrapped and regression tested on powerpc64-linux.  Is this okay
>> >> for mainline?
>> >
>> > Now also bootstrapped and regression tested on x86_64-linux.
>> 
>> + AArch64 bootstrapping OK.
>
> Thank you for testing!
>
>> A quick check shows > 30% more functions shrink-wrapped during
>> bootstrapping by the following command:
>> 
>> cd $TOP_BUILD ; find . -name "*.pro_and_epilogue" | xargs grep 
>> "Perform.*shrink" | wc -l
>
> Wow, that is a lot!  But this is mostly the testsuite?  Shorter functions
> can be wrapped a whole lot more often.

They all come from gcc source code, not from the testsuite, as my bootstrap
command is "make BOOT_CFLAGS=-O2 -fdump-rtl-pro_and_epilogue". The testsuite
itself is not involved in bootstrap.

And I can confirm I get >30% more functions shrink-wrapped by

cd $TOP_BUILD/gcc ; grep "Perform.*shrink" *.pro_and_epilogue | wc -l

This only counts shrink-wraps performed on gcc core source code during the
final stage of bootstrapping. I also did some quick checks; the new
shrink-wrap opportunities come from files like dwarf2out.c, emit-rtl.c,
tree.c, tree-into-ssa.c etc., so they are valid.

I know shrink-wrap is very sensitive to the RTL instruction sequences;
it looks like your rewrite makes it much more friendly to AArch64 :)

-- 
Regards,
Jiong



Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-09-11 Thread Bill Schmidt
On Fri, 2015-09-11 at 15:29 +0100, Richard Sandiford wrote:
> Ramana Radhakrishnan  writes:
> > On Fri, Sep 11, 2015 at 2:19 PM, Bill Schmidt
> >  wrote:
> >> Hi Alan,
> >>
> >> I probably wasn't clear enough.  The implementation in the vectorizer is
> >> fine and I'm not asking that to change per target.  What I'm objecting
> >> to is the equivalence between a REDUC_MAX_EXPR and a cost associated
> >> with vec_to_scalar.  This assumes that the back end will implement a
> >> REDUC_MAX_EXPR in a specific way that at least some back ends cannot.
> >> But those back ends should be free to model the cost of the
> >> REDUC_MAX_EXPR appropriately.  Therefore I am asking for a new
> >> vect_cost_for_stmt type to represent the cost of a REDUC_MAX_EXPR.  For
> >> ARM, this cost will be the same as a vec_to_scalar.  For others, it may
> >> not be; for powerpc, it certainly will not be.
> >>
> >> We can produce a perfectly fine sequence for a REDUC_MAX_EXPR during RTL
> >> expansion, and therefore it is not correct for us to explode this in
> >> tree-vect-generic.  This would expand the code size without providing
> >> any significant optimization opportunity, and could reduce the ability
> >> to, for instance, common REDUC_MAX_EXPRs.  It would also slow down the
> >> gimple vectorizers.
> >>
> >> I apologize if my loose use of language confused the issue.  It isn't
> >> the whole COND_REDUCTION I'm concerned with, but the REDUC_MAX_EXPRs
> >> that are used by it.
> >>
> >> (The costs in powerpc won't be enormous, but they are definitely
> >> mode-dependent in a way that vec_to_scalar is not.  We'll need 2*log(n)
> >> instructions, where n is the number of elements in the mode being
> >> vectorized.)
> >
> > IIUC, on AArch64 a reduc_max_expr matches with a single reduction
> > operation but on AArch32 Neon a reduc_smax gets implemented as a
> > sequence of vpmax instructions which sounds similar to the PowerPC
> > example as well. Thus mapping a reduc_smax expression to the cost of a
> > vec_to_scalar is probably not right in this particular situation.
> 
> But AIUI vec_to_scalar exists to represent reduction operations.
> (I see it was also used for strided stores.)  So for better or worse,
> I think the interface that Alan's patch uses is the defined interface
> for measuring the cost of a reduction.
>
> If a backend implemented reduc_umax_scal_optab in current sources,
> without Alan's patch, then that optab would be used for a "natural"
> unsigned max reduction (i.e. a reduction of a MAX_EXPR with unsigned
> inputs).  vec_to_scalar would be used to weigh the cost of the epilogue
> reduction statement in that case.
> 
> So if defining a new Power pattern might cause Alan's patch to trigger
> in cases where the transformation is actually too expensive, I would
> expect the same to be true for a natural umax without Alan's patch.
> The two cases ought to underestimate the true cost by the same degree.
> 
> In other words, whether the cost interface is flexible enough is
> definitely interesting but seems orthogonal to this patch.

That's a reasonable argument, but is this not a good opportunity to fix
an incorrect assumption in the vectorizer cost model?  I would prefer
for this issue not to get lost on a technicality.

The vectorizer cost model has many small flaws, and we all need to be
mindful of trying to improve it at every opportunity, rather than
allowing it to continue to degrade.  We just had a big discussion about
improving cost models at the last Cauldron, and my request is consistent
with that direction.

Saying that all reductions have equivalent performance is unlikely to be
true for many platforms.  On PowerPC, for example, a PLUS reduction has
very different cost from a MAX reduction.  If the model isn't
fine-grained enough, let's please be aggressive about fixing it.  I'm
fine if it's a separate patch, but in my mind this shouldn't be allowed
to languish.

Thanks,
Bill

> 
> Thanks,
> Richard
> 




Re: [PATCH 04/22] Reimplement diagnostic_show_locus, introducing rich_location classes

2015-09-11 Thread David Malcolm
On Fri, 2015-09-11 at 16:07 +0200, Michael Matz wrote:
> Hi,
> 
> On Thu, 10 Sep 2015, David Malcolm wrote:
> 
> > +/* A range of source locations.
> > +
> > +   Ranges are half-open:
> > +   m_start is the first location within the range, whereas
> > +   m_finish is the first location *after* the range.
> 
> I think you eventually decided that they are closed, not half-open, at 
> least this:

Oops.  Good catch; thanks.  Yes: in an early version of this work they
were half-open, but I found having both endpoints be within the range to
be much more convenient.


> > +  static source_range from_location (source_location loc)
> > +  {
> > +source_range result;
> > +result.m_start = loc;
> > +result.m_finish = loc;
> 
> and this:
> 
> > +/* Ranges are closed
> > +   m_start is the first location within the range, and
> > +   m_finish is the last location within the range.  */
> 
> suggest so :)




Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-09-11 Thread Richard Sandiford
Bill Schmidt  writes:
> On Fri, 2015-09-11 at 15:29 +0100, Richard Sandiford wrote:
>> Ramana Radhakrishnan  writes:
>> > On Fri, Sep 11, 2015 at 2:19 PM, Bill Schmidt
>> >  wrote:
>> >> Hi Alan,
>> >>
>> >> I probably wasn't clear enough.  The implementation in the vectorizer is
>> >> fine and I'm not asking that to change per target.  What I'm objecting
>> >> to is the equivalence between a REDUC_MAX_EXPR and a cost associated
>> >> with vec_to_scalar.  This assumes that the back end will implement a
>> >> REDUC_MAX_EXPR in a specific way that at least some back ends cannot.
>> >> But those back ends should be free to model the cost of the
>> >> REDUC_MAX_EXPR appropriately.  Therefore I am asking for a new
>> >> vect_cost_for_stmt type to represent the cost of a REDUC_MAX_EXPR.  For
>> >> ARM, this cost will be the same as a vec_to_scalar.  For others, it may
>> >> not be; for powerpc, it certainly will not be.
>> >>
>> >> We can produce a perfectly fine sequence for a REDUC_MAX_EXPR during RTL
>> >> expansion, and therefore it is not correct for us to explode this in
>> >> tree-vect-generic.  This would expand the code size without providing
>> >> any significant optimization opportunity, and could reduce the ability
>> >> to, for instance, common REDUC_MAX_EXPRs.  It would also slow down the
>> >> gimple vectorizers.
>> >>
>> >> I apologize if my loose use of language confused the issue.  It isn't
>> >> the whole COND_REDUCTION I'm concerned with, but the REDUC_MAX_EXPRs
>> >> that are used by it.
>> >>
>> >> (The costs in powerpc won't be enormous, but they are definitely
>> >> mode-dependent in a way that vec_to_scalar is not.  We'll need 2*log(n)
>> >> instructions, where n is the number of elements in the mode being
>> >> vectorized.)
>> >
>> > IIUC, on AArch64 a reduc_max_expr matches with a single reduction
>> > operation but on AArch32 Neon a reduc_smax gets implemented as a
>> > sequence of vpmax instructions which sounds similar to the PowerPC
>> > example as well. Thus mapping a reduc_smax expression to the cost of a
>> > vec_to_scalar is probably not right in this particular situation.
>> 
>> But AIUI vec_to_scalar exists to represent reduction operations.
>> (I see it was also used for strided stores.)  So for better or worse,
>> I think the interface that Alan's patch uses is the defined interface
>> for measuring the cost of a reduction.
>>
>> If a backend implemented reduc_umax_scal_optab in current sources,
>> without Alan's patch, then that optab would be used for a "natural"
>> unsigned max reduction (i.e. a reduction of a MAX_EXPR with unsigned
>> inputs).  vec_to_scalar would be used to weigh the cost of the epilogue
>> reduction statement in that case.
>> 
>> So if defining a new Power pattern might cause Alan's patch to trigger
>> in cases where the transformation is actually too expensive, I would
>> expect the same to be true for a natural umax without Alan's patch.
>> The two cases ought to underestimate the true cost by the same degree.
>> 
>> In other words, whether the cost interface is flexible enough is
>> definitely interesting but seems orthogonal to this patch.
>
> That's a reasonable argument, but is this not a good opportunity to fix
> an incorrect assumption in the vectorizer cost model?  I would prefer
> for this issue not to get lost on a technicality.

I think it's more than a technicality though.  I don't think it should be
Alan's responsibility to extend the cost model when (a) his patch uses the
current model in the way that it was intended to be used (at least AIUI) and
(b) in this case, the motivating example for the new model is a pattern
that hasn't been written yet. :-)

So...

> The vectorizer cost model has many small flaws, and we all need to be
> mindful of trying to improve it at every opportunity, rather than
> allowing it to continue to degrade.  We just had a big discussion about
> improving cost models at the last Cauldron, and my request is consistent
> with that direction.
>
> Saying that all reductions have equivalent performance is unlikely to be
> true for many platforms.  On PowerPC, for example, a PLUS reduction has
> very different cost from a MAX reduction.  If the model isn't
> fine-grained enough, let's please be aggressive about fixing it.  I'm
> fine if it's a separate patch, but in my mind this shouldn't be allowed
> to languish.

...I agree that the general vectoriser cost model could probably be
improved, but it seems fairer for that improvement to be done by whoever
adds the patterns that need it.

Thanks,
Richard



Re: [PATCH] PR67401: Fix wrong code generated by expand_atomic_compare_and_swap

2015-09-11 Thread John David Anglin

On 2015-09-11 4:15 AM, Bernd Schmidt wrote:

On 09/11/2015 01:21 AM, John David Anglin wrote:
As noted in the PR, expand_atomic_compare_and_swap can generate wrong 
code when libcalls are emitted
for the sync_compare_and_swap and the result comparison test. This is 
fixed by emitting a move insn to copy
the result rtx of the sync_compare_and_swap libcall to target_oval 
instead of directly assigning it.
Could you provide relevant parts of the rtl dumps or (preferably) the 
patch you are using to enable the libcall?


This can be duplicated with a cross to hppa-unknown-linux-gnu with the 
following change to enable the libcall:


Index: config/pa/pa.c
===
--- config/pa/pa.c  (revision 227689)
+++ config/pa/pa.c  (working copy)
@@ -5737,7 +5737,7 @@
 }

   if (TARGET_SYNC_LIBCALL)
-init_sync_libfuncs (UNITS_PER_WORD);
+init_sync_libfuncs (8);
 }

The relevant rtl from .expand is:

(call_insn 48 47 49 (parallel [
(set (reg:DI 28 %r28)
(call (mem:SI (symbol_ref/v:SI 
("@__sync_val_compare_and_swap_8") [flags 0x41]) [0  S4 A32])

(const_int 64 [0x40])))
(clobber (reg:SI 1 %r1))
(clobber (reg:SI 2 %r2))
(use (const_int 0 [0]))
]) xxx.c:6 -1
 (expr_list:REG_EH_REGION (const_int -2147483648 [0x8000])
(nil))
(expr_list (use (reg:DI 23 %r23))
(expr_list (use (reg:SI 26 %r26))
(expr_list (use (mem:DI (plus:SI (reg/f:SI 93 
virtual-outgoing-args)
(const_int -24 [0xffe8])) [0  
S8 A64]))

(nil)

(insn 49 48 50 (set (reg:SI 128)
(const_int 1 [0x1])) xxx.c:6 -1
 (nil))

(insn 50 49 51 (set (reg:DI 23 %r23)
(reg:DI 123)) xxx.c:6 -1
 (nil))

(insn 51 50 52 (set (reg:DI 25 %r25)
(reg:DI 28 %r28)) xxx.c:6 -1
 (nil))

(call_insn/u 52 51 53 (parallel [
(set (reg:SI 28 %r28)
(call (mem:SI (symbol_ref/v:SI ("@__ucmpdi2") [flags 
0x41]) [0  S4 A32])

(const_int 64 [0x40])))
(clobber (reg:SI 1 %r1))
(clobber (reg:SI 2 %r2))
(use (const_int 0 [0]))
]) xxx.c:6 -1
 (expr_list:REG_EH_REGION (const_int -2147483648 [0x8000])
(nil))
(expr_list (use (reg:DI 23 %r23))
(expr_list (use (reg:DI 25 %r25))
(nil

(jump_insn 53 52 54 (set (pc)
(if_then_else (eq (reg:SI 28 %r28)
(const_int 1 [0x1]))
(label_ref 55)
(pc))) xxx.c:6 -1
 (nil))

(insn 54 53 55 (set (reg:SI 128)
(const_int 0 [0])) xxx.c:6 -1
 (nil))

(code_label 55 54 56 3 "" [0 uses])

(jump_insn 56 55 57 (set (pc)
(if_then_else (ne (reg:SI 128)
(const_int 0 [0]))
(label_ref 58)
(pc))) xxx.c:6 -1
 (nil))

(insn 57 56 58 (set (mem:DI (reg:SI 122) [0  S8 A64])
(reg:DI 28 %r28)) xxx.c:6 -1
 (nil))

(code_label 58 57 59 4 "" [0 uses])

(insn 59 58 0 (set (reg:SI 102)
(reg:SI 128)) xxx.c:6 -1
 (nil))

;; if (_17 != 0)

(jump_insn 60 59 0 (set (pc)
(if_then_else (ne (reg:SI 102)
(const_int 0 [0]))
(label_ref 0)
(pc))) xxx.c:6 -1
 (nil))

The value in reg:DI 28 returned by the __sync_val_compare_and_swap_8 is 
clobbered by the call to __ucmpdi2.  It is needed in insn 57.

Dave

--
John David Anglin  dave.ang...@bell.net



Re: [PATCH][AArch64] Use preferred aliases for CSNEG, CSINC, CSINV

2015-09-11 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00020.html

Thanks,
Kyrill

On 01/09/15 11:08, Kyrill Tkachov wrote:

Hi all,

The ARMv8-A reference manual says:
"CNEG <Wd>, <Wn>, <cond>
is equivalent to
CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>)
and is the preferred disassembly when Rn == Rm && cond != '111x'."

That is, when the two input registers are the same we can use the shorter CNEG 
mnemonic
with the inverse condition instead of the longer CSNEG instruction. Similarly 
for the
CSINV and CSINC instructions, they have shorter CINV and CINC forms.
This patch adjusts the output templates to emit the preferred shorter sequences 
when possible.

The new mnemonics are just aliases, they map down to the same instruction in 
the end, so there
are no performance or behaviour implications. But it does make the assembly a 
bit more readable
IMO, since:
"cneg w27, w9, le"
can be simply read as "if the condition is less or equal negate w9" instead of 
the previous:
"csneg w27, w9, w9, gt" where you have to remember which of the input 
registers is negated.


Bootstrapped and tested on aarch64-linux-gnu.
Ok for trunk?

Thanks,
Kyrill

2015-09-01  Kyrylo Tkachov  

	* config/aarch64/aarch64.md (csinc3<mode>_insn): Use CINC
	mnemonic when possible.
	(*csinv3<mode>_insn): Use CINV mnemonic when possible.
	(csneg3<mode>_insn): Use CNEG mnemonic when possible.

2015-09-01  Kyrylo Tkachov  

  * gcc.target/aarch64/abs_1.c: Update scan-assembler checks
  to allow cneg.
  * gcc.target/aarch64/cond_op_imm_1.c: Likewise.  Likewise for cinv.
  * gcc.target/aarch64/mod_2.c: Likewise.




Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time

2015-09-11 Thread Joseph Myers
On Fri, 11 Sep 2015, Bernd Schmidt wrote:

> I expect you know best what to do in the OpenACC testsuite driver, but you
> might want to run the libgomp.exp parts by Jakub. If the testsuite parts are
> independent of the rest of the patch, please repost them separately.

Jakub?  The testsuite changes and the rest of the patch depend on each 
other.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 21/22] Use Levenshtein distance for various misspellings in C frontend

2015-09-11 Thread Manuel López-Ibáñez

On 10/09/15 22:28, David Malcolm wrote:

There are a couple of FIXMEs here:
* where to call levenshtein_distance_unit_tests


Should this be part of make check? Perhaps a small program that is compiled and 
linked with spellcheck.c? This would be possible if spellcheck.c did not depend 
on tree.h or tm.h, which I doubt it needs to.



* should we attempt error-recovery in c-typeck.c:build_component_ref


I would say yes, but why not leave this discussion to a later patch? The 
current one seems useful enough.



+
+/* Look for the closest match for NAME within the currently valid
+   scopes.
+
+   This finds the identifier with the lowest Levenshtein distance to
+   NAME.  If there are multiple candidates with equal minimal distance,
+   the first one found is returned.  Scopes are searched from innermost
+   outwards, and within a scope in reverse order of declaration, thus
+   benefiting candidates "near" to the current scope.  */
+
+tree
+lookup_name_fuzzy (tree name)
+{
+  gcc_assert (TREE_CODE (name) == IDENTIFIER_NODE);
+
+  c_binding *best_binding = NULL;
+  int best_distance = INT_MAX;
+
+  for (c_scope *scope = current_scope; scope; scope = scope->outer)
+    for (c_binding *binding = scope->bindings; binding;
+	 binding = binding->prev)
+  {
+   if (!binding->id)
+ continue;
+   int dist = levenshtein_distance (name, binding->id);
+   if (dist < best_distance)


I guess 'dist' cannot be negative. Can it be zero? If not, wouldn't it be 
appropriate to exit as soon as it becomes 1?


Is this code discriminating between types and names? That is, what happens for:

typedef int ins;

int foo(void)
{
   int inr;
   inp x;
}


+/* Recursively append candidate IDENTIFIER_NODEs to CANDIDATES.  */
+
+static void
+lookup_field_fuzzy_find_candidates (tree type, tree component,
+				    vec<tree> *candidates)
+{
+  tree field;
+  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
+{
+  if (DECL_NAME (field) == NULL_TREE
+ && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
+ || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
+   {
+ lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
+ component,
+ candidates);
+   }
+
+  if (DECL_NAME (field))
+   candidates->safe_push (field);
+}
+}


This is appending inner-most, isn't it? Thus, given:

struct s{
struct j { int aa; } kk;
int aa;
};

void foo(struct s x)
{
x.ab;
}

it will find s::j::aa before s::aa, no?


  tree
-build_component_ref (location_t loc, tree datum, tree component)
+build_component_ref (location_t loc, tree datum, tree component,
+source_range *ident_range)
  {
tree type = TREE_TYPE (datum);
enum tree_code code = TREE_CODE (type);
@@ -2294,7 +2356,31 @@ build_component_ref (location_t loc, tree datum, tree 
component)

if (!field)
{
- error_at (loc, "%qT has no member named %qE", type, component);
+ if (!ident_range)
+   {
+ error_at (loc, "%qT has no member named %qE",
+   type, component);
+ return error_mark_node;
+   }
+ gcc_rich_location richloc (*ident_range);
+ if (TREE_CODE (datum) == INDIRECT_REF)
+   richloc.add_expr (TREE_OPERAND (datum, 0));
+ else
+   richloc.add_expr (datum);
+ field = lookup_field_fuzzy (type, component);
+ if (field)
+   {
+	      error_at_rich_loc
+		(&richloc,
+		 "%qT has no member named %qE; did you mean %qE?",
+		 type, component, field);
+ /* FIXME: error recovery: should we try to keep going,
+with "field"? (having issued an error, and hence no
+output).  */
+   }
+ else
+	    error_at_rich_loc (&richloc, "%qT has no member named %qE",
+			       type, component);
  return error_mark_node;
}


I don't understand why looking for a candidate or not depends on ident_range.


--- /dev/null
+++ b/gcc/testsuite/gcc.dg/spellcheck.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-fdiagnostics-show-caret" } */
+
+struct foo
+{
+  int foo;
+  int bar;
+  int baz;
+};
+
+int test (struct foo *ptr)
+{
+  return ptr->m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
+
+/* { dg-begin-multiline-output "" }
+   return ptr->m_bar;
+  ~~~  ^
+   { dg-end-multiline-output "" } */
+}
+
+int test2 (void)
+{
+  struct foo instance = {};
+  return instance.m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
+
+/* { dg-begin-multiline-output "" }
+   return instance.m_bar;
+   ^
+   { dg-end-multiline-output "" } */
+}
+
+int64 foo; /* { dg-error "unknown type name 'int64'; did you mean 

Re: [PATCH][AArch64] Use preferred aliases for CSNEG, CSINC, CSINV

2015-09-11 Thread James Greenhalgh
On Tue, Sep 01, 2015 at 11:08:10AM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> The ARMv8-A reference manual says:
> "CNEG <Wd>, <Wn>, <cond>
> is equivalent to
> CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>)
> and is the preferred disassembly when Rn == Rm && cond != '111x'."
> 
> That is, when the two input registers are the same we can use the shorter 
> CNEG mnemonic
> with the inverse condition instead of the longer CSNEG instruction. Similarly 
> for the
> CSINV and CSINC instructions, they have shorter CINV and CINC forms.
> This patch adjusts the output templates to emit the preferred shorter 
> sequences when possible.
> 
> The new mnemonics are just aliases, they map down to the same instruction in 
> the end, so there
> are no performance or behaviour implications. But it does make the assembly a 
> bit more readable
> IMO, since:
> "cneg w27, w9, le"
> can be simply read as "if the condition is less or equal negate w9" instead 
> of the previous:
> "csneg w27, w9, w9, gt" where you have to remember which of the input 
> registers is negated.
> 
> 
> Bootstrapped and tested on aarch64-linux-gnu.
> Ok for trunk?
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 77bc7cd..2e4b26c 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -3090,7 +3090,12 @@ (define_insn "csinc3<mode>_insn"
>   (const_int 1))
> (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))]
>""
> -  "csinc\\t%0, %3, %2, %M1"
> +  {
> +if (rtx_equal_p (operands[2], operands[3]))
> +  return "cinc\\t%0, %2, %m1";
> +else
> +  return "csinc\\t%0, %3, %2, %M1";
> +  }
>[(set_attr "type" "csel")]
>  )

I guess you do it this way rather than just adding a new alternative in
the pattern to avoid any chance of constraining the register allocator, but
would this not be more natural to read as an {r, r, r, 2} alternative, or
similar?

If you've given that some thought and decided it doesn't work for you,
then this is OK for trunk.

Thanks,
James



Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67465: Handle pairs of complex+simple blocks and empty blocks more gracefully

2015-09-11 Thread Kyrill Tkachov


On 11/09/15 09:51, Rainer Orth wrote:

Kyrill Tkachov  writes:


On 10/09/15 12:43, Rainer Orth wrote:

Hi Kyrill,


Rainer, could you please check that this patch still fixes the SPARC
regressions?

unfortunately, it breaks sparc-sun-solaris2.10 bootstrap: compiling
stage2 libiberty/regex.c FAILs:



Thanks for providing the preprocessed file.
I've reproduced and fixed the ICE in this version of the patch.
The problem was that I was taking the mode of x before the check
of whether a and b are MEMs, after which we would change x to an
address_mode reg,
thus confusing emit_move_insn.

The fix is to take the mode of x and perform the can_conditionally_move_p check
after that transformation.

Bootstrapped and tested on aarch64 and x86_64.
The preprocessed regex.i that Rainer provided now compiles successfully for me
on a sparc-sun-solaris2.10 stage-1 cross-compiler.

Rainer, thanks for your help so far, could you please try out this patch?

While bootstrap succeeds again, the testsuite regression in
gcc.c-torture/execute/20071216-1.c reoccured.

Right, so I dug into the RTL dumps and I think this is a separate issue that's 
being exacerbated by my patch.
The code tries to if-convert a block which contains a compare instruction i.e. 
sets the CC register.
Now, bb_valid_for_noce_process_p should have caught this, and in particular 
insn_valid_noce_process_p
which should check that the instruction doesn't set the CC register. However, 
on SPARC the
cc_in_cond returns NULL! This is due to the canonicalize_comparison 
implementation that seems to
remove the CC register from the condition expression and returns something like:
(leu (reg/v:SI 109 [ b ])
(const_int -4096 [0xf000])

Therefore the set_of (cc_in_cond (cond), insn) check doesn't get triggered 
because cc_in_cond returns NULL.
Regardless of how the branch condition got canonicalized, I think we still want 
to reject any insn in the block
that sets a condition code register, so this patch checks the destination of 
every set in the block for a MODE_CC
expression and cancels if-conversion if that's the case.

Oleg pointed me to the older PR 58517 affecting SH which seems similar and I 
think my previous ifcvt patch would expose
this problem more.

Anyway, with this patch the failing SPARC testcase 
gcc.c-torture/execute/20071216-1.c generates the same assembly
as before r227368 and bootstrap and test on aarch64 and x86_64 passes ok for me.

Rainer, could you try this patch on top of the previous patch? 
(https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00689.html)
The two together should fix all of PR 67456, 67464, 67465 and 67481.

Thanks,
Kyrill

2015-09-11  Kyrylo Tkachov  

PR rtl-optimization/67481
* ifcvt.c (contains_ccmode_rtx_p): New function.
(insn_valid_noce_process_p): Use it.


diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 9af3249..090a584 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1838,6 +1838,19 @@ noce_try_cmove (struct noce_if_info *if_info)
   return FALSE;
 }
 
+/* Return true if X contains a condition code mode rtx.  */
+
+static bool
+contains_ccmode_rtx_p (rtx x)
+{
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, x, ALL)
+if (GET_MODE_CLASS (GET_MODE (*iter)) == MODE_CC)
+  return true;
+
+  return false;
+}
+
 /* Helper for bb_valid_for_noce_process_p.  Validate that
the rtx insn INSN is a single set that does not set
the conditional register CC and is in general valid for
@@ -1856,6 +1869,7 @@ insn_valid_noce_process_p (rtx_insn *insn, rtx cc)
   /* Currently support only simple single sets in test_bb.  */
   if (!sset
   || !noce_operand_ok (SET_DEST (sset))
+  || contains_ccmode_rtx_p (SET_DEST (sset))
   || !noce_operand_ok (SET_SRC (sset)))
 return false;
 


Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time

2015-09-11 Thread Jakub Jelinek
On Fri, Sep 11, 2015 at 03:26:04PM +, Joseph Myers wrote:
> On Fri, 11 Sep 2015, Bernd Schmidt wrote:
> 
> > I expect you know best what to do in the OpenACC testsuite driver, but you
> > might want to run the libgomp.exp parts by Jakub. If the testsuite parts are
> > independent of the rest of the patch, please repost them separately.
> 
> Jakub?  The testsuite changes and the rest of the patch depend on each 
> other.

So, do I understand well that you'll call GOMP_set_offload_targets from
constructors of all shared libraries (and the binary) that contain offloaded
code?  If yes, that is surely going to fail the assertions in there.
You can dlopen such libraries etc.  What if you link one library with
-foffload=nvptx-none and another one with -foffload=x86_64-intelmicemul-linux?
Can't the -foffload= string be passed to GOMP_offload_register_ver
(or just derive the list of plugins that should be loaded or at least those
that should be tried first from the list of offloaded data that has been
registered so far)?
I mean, it is also very well possible some program calls omp_get_num_devices
() etc. say from main binary and only then dlopens shared libraries that
contain offloaded regions and then attempt to offload in those shared
libraries.  So, better it should always load all possible plugins, but
perhaps in order determined by what has been registered?

Jakub


Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

2015-09-11 Thread Bill Schmidt
On Fri, 2015-09-11 at 16:28 +0100, Richard Sandiford wrote:
> Bill Schmidt  writes:
> > On Fri, 2015-09-11 at 15:29 +0100, Richard Sandiford wrote:
> >> Ramana Radhakrishnan  writes:
> >> > On Fri, Sep 11, 2015 at 2:19 PM, Bill Schmidt
> >> >  wrote:
> >> >> Hi Alan,
> >> >>
> >> >> I probably wasn't clear enough.  The implementation in the vectorizer is
> >> >> fine and I'm not asking that to change per target.  What I'm objecting
> >> >> to is the equivalence between a REDUC_MAX_EXPR and a cost associated
> >> >> with vec_to_scalar.  This assumes that the back end will implement a
> >> >> REDUC_MAX_EXPR in a specific way that at least some back ends cannot.
> >> >> But those back ends should be free to model the cost of the
> >> >> REDUC_MAX_EXPR appropriately.  Therefore I am asking for a new
> >> >> vect_cost_for_stmt type to represent the cost of a REDUC_MAX_EXPR.  For
> >> >> ARM, this cost will be the same as a vec_to_scalar.  For others, it may
> >> >> not be; for powerpc, it certainly will not be.
> >> >>
> >> >> We can produce a perfectly fine sequence for a REDUC_MAX_EXPR during RTL
> >> >> expansion, and therefore it is not correct for us to explode this in
> >> >> tree-vect-generic.  This would expand the code size without providing
> >> >> any significant optimization opportunity, and could reduce the ability
> >> >> to, for instance, common REDUC_MAX_EXPRs.  It would also slow down the
> >> >> gimple vectorizers.
> >> >>
> >> >> I apologize if my loose use of language confused the issue.  It isn't
> >> >> the whole COND_REDUCTION I'm concerned with, but the REDUC_MAX_EXPRs
> >> >> that are used by it.
> >> >>
> >> >> (The costs in powerpc won't be enormous, but they are definitely
> >> >> mode-dependent in a way that vec_to_scalar is not.  We'll need 2*log(n)
> >> >> instructions, where n is the number of elements in the mode being
> >> >> vectorized.)
> >> >
> >> > IIUC, on AArch64 a reduc_max_expr matches with a single reduction
> >> > operation but on AArch32 Neon a reduc_smax gets implemented as a
> >> > sequence of vpmax instructions which sounds similar to the PowerPC
> >> > example as well. Thus mapping a reduc_smax expression to the cost of a
> >> > vec_to_scalar is probably not right in this particular situation.
> >> 
> >> But AIUI vec_to_scalar exists to represent reduction operations.
> >> (I see it was also used for strided stores.)  So for better or worse,
> >> I think the interface that Alan's patch uses is the defined interface
> >> for measuring the cost of a reduction.
> >>
> >> If a backend implemented reduc_umax_scal_optab in current sources,
> >> without Alan's patch, then that optab would be used for a "natural"
> >> unsigned max reduction (i.e. a reduction of a MAX_EXPR with unsigned
> >> inputs).  vec_to_scalar would be used to weigh the cost of the epilogue
> >> reduction statement in that case.
> >> 
> >> So if defining a new Power pattern might cause Alan's patch to trigger
> >> in cases where the transformation is actually too expensive, I would
> >> expect the same to be true for a natural umax without Alan's patch.
> >> The two cases ought to underestimate the true cost by the same degree.
> >> 
> >> In other words, whether the cost interface is flexible enough is
> >> definitely interesting but seems orthogonal to this patch.
> >
> > That's a reasonable argument, but is this not a good opportunity to fix
> > an incorrect assumption in the vectorizer cost model?  I would prefer
> > for this issue not to get lost on a technicality.
> 
> I think it's more than technicality though.  I don't think it should be
> Alan's responsibility to extend the cost model when (a) his patch uses the
> current model in the way that it was intended to be used (at least AIUI) and
> (b) in this case, the motivating example for the new model is a pattern
> that hasn't been written yet. :-)

Agreed.  However, the original patch description said in essence, this
is good for everybody, powerpc and x86 should go and implement their
patterns and use it.  It turns out not to be so simple, unfortunately.

> 
> So...
> 
> > The vectorizer cost model has many small flaws, and we all need to be
> > mindful of trying to improve it at every opportunity, rather than
> > allowing it to continue to degrade.  We just had a big discussion about
> > improving cost models at the last Cauldron, and my request is consistent
> > with that direction.
> >
> > Saying that all reductions have equivalent performance is unlikely to be
> > true for many platforms.  On PowerPC, for example, a PLUS reduction has
> > very different cost from a MAX reduction.  If the model isn't
> > fine-grained enough, let's please be aggressive about fixing it.  I'm
> > fine if it's a separate patch, but in my mind this shouldn't be allowed
> > to languish.
> 
> ...I agree that the general vectoriser cost model could probably be
> improved, but it 

Re: Openacc launch API

2015-09-11 Thread Nathan Sidwell

Ping?

https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01498.html



On 09/07/15 08:48, Nathan Sidwell wrote:

On 08/25/15 09:29, Nathan Sidwell wrote:

Jakub,

This patch changes the launch API for openacc parallels.  The current scheme
passes the launch dimensions as 3 separate parameters to the GOACC_parallel
function.  This is problematic for a couple of reasons:

1) these must be validated in the host compiler

2) they provide no extension to support a variety of different offload devices
with different geometry requirements.

This patch changes things so that the function tables emitted by (ptx)
mkoffload include the geometry triplet for each function.  This allows them to
be validated and/or manipulated in the offload compiler.  However, this only
works for compile-time known dimensions -- which is a common case.  To deal with
runtime-computed dimensions we have to retain the host-side compiler's
calculation and pass that into the GOACC_parallel function.  We change
GOACC_parallel to take a variadic list of keyed operands ending with a sentinel
marker.  These keyed operands have a slot for expansion to support multiple
different offload devices.

We also extend the functionality of the 'oacc function' internal attribute.
Rather than being a simple marker, it now has a value, which is a TREE_LIST of
the geometry required.  The geometry is held as INTEGER_CSTs on the TREE_VALUE
slots.  Runtime-calculated values are represented by an INTEGER_CST of zero.
We'll also use this representation for 'routines', where the TREE_PURPOSE slot
will be used to indicate the levels at which a routine might spawn a partitioned
loop.  Again, to allow future expansion supporting a number of different offload
devices, this can become a list-of-lists, keyed by an offload device
identifier.  The offload compiler can manipulate this data, and a later patch
will do this within a new oacc-xform pass.

I  did rename the GOACC_parallel entry point to GOACC_parallel_keyed and provide
a forwarding function. However, as the mkoffload data is incompatible, this is
probably overkill.  I've had to increment the (just committed) version number to
detect the change in data representation.  So any attempt to run an old binary
with a new libgomp will fail at the loading point.  We could simply keep the
same 'GOACC_parallel' name and not need any new symbols.  WDYT?


Ping?

https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01498.html

nathan