Re: [PATCH] Fix PR68799

2016-01-15 Thread Richard Biener
On January 15, 2016 11:06:40 PM GMT+01:00, Bill Schmidt wrote:
>Hi,
>
>In straight-line strength reduction, it can sometimes happen that more
>than one conditional candidate can be predicated upon a PHI node P.
>During processing of the first conditional candidate, a new PHI node
>may be introduced in the same block as P, with new computations
>introduced in predecessors of that block.  If this requires splitting
>one or more edges, the PHI statement will change, and as a result it
>will now have a different address in storage than it did before.
>
>A problem arose because we cached the addresses of the PHI statements
>in a mapping from statements to strength-reduction candidates.  If,
>after a PHI statement changes, we refer to it by its new address when
>consulting this mapping, we won't find the associated candidate.
>
>Now, this shouldn't happen, because after creating the candidate table,
>we should always refer to PHIs by the original address at the time of
>the table's creation.  However, I had some sloppy code that caused us
>to look up the PHI statement by its result operand, even though we
>already had a handle to the original cached PHI statement.  When that
>code is used, it can find the moved statement instead, and things go
>wrong.
>
>This patch solves the problem by simply replacing the sloppy code with
>a direct lookup based on the cached PHI statement address, so
>everything remains consistent.
>
>I haven't added a test case because this is a pretty difficult scenario
>to re-create reliably.  The patch is pretty obvious, so I'm not too
>concerned about that.
>
>The problem was found originally by Matthias Klose while doing a
>PGO/LTO build of python-2.7, and reported as PR68799.  Matthias has
>tested the patch for me in his environment, and indeed it fixes the
>problem.  I've bootstrapped and tested on powerpc64le-unknown-linux-gnu
>with no regressions for GCC 5.3.  Currently doing the same for trunk.
>
>Assuming no regressions is this ok for mainline, and thereafter for GCC
>5 and GCC 4.9?

OK.

Thanks,
Richard.

>Thanks,
>Bill
>
>
>2016-01-15  Bill Schmidt  
>
>   PR tree-optimization/68799
>   * gimple-ssa-strength-reduction.c (create_phi_basis): Directly
>   look up phi candidates in the statement-candidate map.
>   (phi_add_costs): Likewise.
>   (record_phi_increments): Likewise.
>   (phi_incr_cost): Likewise.
>   (ncd_with_phi): Likewise.
>   (all_phi_incrs_profitable): Likewise.
>
>
>Index: gcc/gimple-ssa-strength-reduction.c
>===
>--- gcc/gimple-ssa-strength-reduction.c(revision 232394)
>+++ gcc/gimple-ssa-strength-reduction.c(working copy)
>@@ -2267,7 +2267,7 @@ create_phi_basis (slsr_cand_t c, gimple from_phi,
>   slsr_cand_t basis = lookup_cand (c->basis);
>   int nargs = gimple_phi_num_args (from_phi);
>   basic_block phi_bb = gimple_bb (from_phi);
>-  slsr_cand_t phi_cand = base_cand_from_table (gimple_phi_result (from_phi));
>+  slsr_cand_t phi_cand = *stmt_cand_map->get (from_phi);
>   phi_args.create (nargs);
> 
>   /* Process each argument of the existing phi that represents
>@@ -2376,7 +2376,7 @@ phi_add_costs (gimple phi, slsr_cand_t c, int one_
> {
>   unsigned i;
>   int cost = 0;
>-  slsr_cand_t phi_cand = base_cand_from_table (gimple_phi_result (phi));
>+  slsr_cand_t phi_cand = *stmt_cand_map->get (phi);
> 
> /* If we work our way back to a phi that isn't dominated by the hidden
>  basis, this isn't a candidate for replacement.  Indicate this by
>@@ -2587,7 +2587,7 @@ static void
> record_phi_increments (slsr_cand_t basis, gimple phi)
> {
>   unsigned i;
>-  slsr_cand_t phi_cand = base_cand_from_table (gimple_phi_result (phi));
>+  slsr_cand_t phi_cand = *stmt_cand_map->get (phi);
>   
>   for (i = 0; i < gimple_phi_num_args (phi); i++)
> {
>@@ -2658,7 +2658,7 @@ phi_incr_cost (slsr_cand_t c, const widest_int &in
>   unsigned i;
>   int cost = 0;
>   slsr_cand_t basis = lookup_cand (c->basis);
>-  slsr_cand_t phi_cand = base_cand_from_table (gimple_phi_result (phi));
>+  slsr_cand_t phi_cand = *stmt_cand_map->get (phi);
> 
>   for (i = 0; i < gimple_phi_num_args (phi); i++)
> {
>@@ -3002,7 +3002,7 @@ ncd_with_phi (slsr_cand_t c, const widest_int &inc
> {
>   unsigned i;
>   slsr_cand_t basis = lookup_cand (c->basis);
>-  slsr_cand_t phi_cand = base_cand_from_table (gimple_phi_result (phi));
>+  slsr_cand_t phi_cand = *stmt_cand_map->get (phi);
> 
>   for (i = 0; i < gimple_phi_num_args (phi); i++)
> {
>@@ -3212,7 +3212,7 @@ all_phi_incrs_profitable (slsr_cand_t c, gimple ph
> {
>   unsigned i;
>   slsr_cand_t basis = lookup_cand (c->basis);
>-  slsr_cand_t phi_cand = base_cand_from_table (gimple_phi_result (phi));
>+  slsr_cand_t phi_cand = *stmt_cand_map->get (phi);
> 
>   for (i = 0; i < gimple_phi_num_args (phi); i++)
> {
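
The failure mode described in the patch rationale above (a side table keyed by statement address, plus a lookup that goes through the result operand to whatever statement defines it now) can be illustrated with a small stand-alone sketch. Everything here is hypothetical: `Stmt`, `Candidate`, `ToyFunction` and their members are illustration names, not GCC's data structures, and the old object is deliberately kept alive so that its address cannot be reused.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins; these are not GCC's types.
struct Stmt { std::string result_name; };
struct Candidate { int id; };

struct ToyFunction {
  // Result name -> statement object that currently defines it.
  std::unordered_map<std::string, std::unique_ptr<Stmt>> def_of;
  // Replaced statement objects stay alive, so an old address is never reused.
  std::vector<std::unique_ptr<Stmt>> retired;
  // Side table keyed by statement address, filled when the PHI is first seen.
  std::unordered_map<const Stmt*, Candidate> stmt_cand_map;

  // Create a PHI, record its candidate, and hand back the cached address.
  const Stmt* add_phi(const std::string& name, int cand_id) {
    std::unique_ptr<Stmt> s(new Stmt{name});
    const Stmt* key = s.get();
    def_of[name] = std::move(s);
    stmt_cand_map[key] = Candidate{cand_id};
    return key;
  }

  // Models the PHI being rebuilt: same result name, different address.
  void rebuild_phi(const std::string& name) {
    std::unique_ptr<Stmt> fresh(new Stmt{name});
    retired.push_back(std::move(def_of[name]));
    def_of[name] = std::move(fresh);
  }

  // Direct lookup with the address that was cached at table creation.
  const Candidate* lookup_by_stmt(const Stmt* s) const {
    auto it = stmt_cand_map.find(s);
    return it == stmt_cand_map.end() ? nullptr : &it->second;
  }

  // Indirect lookup: result name -> current statement -> table.
  const Candidate* lookup_via_result(const std::string& name) const {
    auto it = def_of.find(name);
    return it == def_of.end() ? nullptr : lookup_by_stmt(it->second.get());
  }
};
```

In this toy model, once `rebuild_phi` has run, `lookup_via_result` reaches the new object, whose address was never entered in the table, while `lookup_by_stmt` with the cached pointer still finds the candidate.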




Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2016-01-15 Thread Jeff Law

On 01/04/2016 07:32 AM, Ajit Kumar Agarwal wrote:



-----Original Message-----
From: Jeff Law [mailto:l...@redhat.com]
Sent: Wednesday, December 23, 2015 12:06 PM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

On 12/11/2015 02:11 AM, Ajit Kumar Agarwal wrote:


Mibench/EEMBC benchmarks (Target Microblaze)

Automotive_qsort1(4.03%), Office_ispell(4.29%),
Office_stringsearch1(3.5%). Telecom_adpcm_d( 1.37%),
ospfv2_lite(1.35%).

I'm having a real tough time reproducing any of these results.
In fact, I'm having a tough time seeing cases where path
splitting even applies to the Mibench/EEMBC benchmarks
mentioned above.



In the very few cases where split-paths might apply, the net
resulting assembly code I get is the same with and without
split-paths.



How consistent are these results?


I am consistently getting the gains for office_ispell,
office_stringsearch1 and telecom_adpcm_d. I ran it again today and we see
gains in the same benchmark tests with the split path changes.


What functions are being affected that in turn impact
performance?


For office_ispell, the affected functions are "Function linit (linit,
funcdef_no=0, decl_uid=2535, cgraph_uid=0, symbol_order=2)" in the
lookup.c file, and "Function checkfile (checkfile, funcdef_no=1,
decl_uid=2478, cgraph_uid=1, symbol_order=4)", "Function correct
(correct, funcdef_no=2, decl_uid=2503, cgraph_uid=2, symbol_order=5)"
and "Function askmode (askmode, funcdef_no=24, decl_uid=2464,
cgraph_uid=24, symbol_order=27)" in the correct.c file.

For office_stringsearch1, the affected function is "Function bmhi_search
(bmhi_search, funcdef_no=1, decl_uid=2178, cgraph_uid=1,
symbol_order=5)" in the bmhisrch.c file.

Can you send me the pre-processed lookup.c, correct.c and bmhi_search.c?

I generated mine using x86 and that may be affecting my ability to 
reproduce your results on the microblaze target.  Looking specifically 
at bmhi_search.c and correct.c, I see they are going to be sensitive to 
the target headers, for example if they use FORTIFY_SOURCE or macros 
for toupper.


In the bmhi_search I'm looking at, I don't see any opportunities for the 
path splitter to do anything.  The CFG just doesn't have the right 
shape.  Again, that may be an artifact of how toupper is implemented in 
the system header files -- hence my request for the cpp output on each 
of the important files.


Jeff


[PATCH, committed] PR diagnostic/68899: fix read-beyond-buffer when printing very wide source lines

2016-01-15 Thread David Malcolm
Our code for printing source code can apply an x-offset when printing very
wide source lines, which attempts to ensure that the caret will be printed
before line-wrapping occurs (it doesn't attempt to prevent line-wrapping,
but the old implementation didn't either).

The current implementation has a trivial bug in which the x-offset is applied
too early, leading to a read past the end of the source line buffer of
up to x-offset bytes.

Fixed thusly.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu;
committed to trunk as r232465 as obvious.

gcc/ChangeLog:
PR diagnostic/68899
* diagnostic-show-locus.c (layout::print_source_line): Move x
offset of line until after call to
get_line_width_without_trailing_whitespace.
---
 gcc/diagnostic-show-locus.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c
index 3ef0052..e323254 100644
--- a/gcc/diagnostic-show-locus.c
+++ b/gcc/diagnostic-show-locus.c
@@ -524,14 +524,13 @@ layout::print_source_line (int row, line_bounds *lbounds_out)
   if (!line)
 return false;
 
-  line += m_x_offset;
-
   m_colorizer.set_normal_text ();
 
   /* We will stop printing the source line at any trailing
  whitespace.  */
   line_width = get_line_width_without_trailing_whitespace (line,
   line_width);
+  line += m_x_offset;
 
   pp_space (m_pp);
   int first_non_ws = INT_MAX;
-- 
1.8.5.3
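
The ordering issue described above (advance the pointer by the x-offset first and then measure `line_width` bytes from the advanced pointer, versus measure first and advance afterwards) can be sketched outside GCC as follows. This is only an illustration under stated assumptions: `width_without_trailing_ws`, `visible_span` and `visible_part` are hypothetical names, `x_offset` is assumed to be non-negative, and the clamp is an addition of this sketch rather than something taken from the patch.

```cpp
#include <cassert>

// Hypothetical helper playing the role of a "width without trailing
// whitespace" routine: inspects line[0..width) and drops trailing blanks.
inline int width_without_trailing_ws(const char* line, int width) {
  while (width > 0) {
    char c = line[width - 1];
    if (c == ' ' || c == '\t' || c == '\r')
      --width;
    else
      break;
  }
  return width;
}

struct visible_span {
  const char* start;  // first byte that would be printed
  int length;         // number of bytes available from start
};

// Measure against the start of the buffer, and only then skip x_offset
// columns.  Advancing `line` before the measurement would make the helper
// inspect bytes up to x_offset past the end of the buffer.
inline visible_span visible_part(const char* line, int line_width,
                                 int x_offset) {
  int width = width_without_trailing_ws(line, line_width);
  if (x_offset > width)  // clamp added by this sketch (assumption)
    x_offset = width;
  return { line + x_offset, width - x_offset };
}
```

With the measurement done first, every byte the helper inspects lies inside `line[0..line_width)`, whatever the offset is.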



RE: Ping Re: Handle Octeon 3 not supporting MIPS paired-single instructions

2016-01-15 Thread Moore, Catherine


> -Original Message-
> From: Myers, Joseph
> Sent: Friday, January 15, 2016 6:28 PM
> To: Andrew Pinski
> Cc: GCC Patches; Moore, Catherine; matthew.fort...@imgtec.com
> Subject: Ping Re: Handle Octeon 3 not supporting MIPS paired-single
> instructions
> 
> On Fri, 8 Jan 2016, Andrew Pinski wrote:
> 
> > On Fri, Jan 8, 2016 at 4:05 PM, Joseph Myers 
> wrote:
> > > The Octeon 3 processor does not support the MIPS paired-single
> > > instructions.  This results in illegal instruction errors in the
> > > testsuite when vectorization tests try to use those instructions.
> > >
> > > This patch teaches the compiler about that lack of support, so that
> > > warnings are given when -mpaired-single (or something implying it)
> > > is used when compiling for such a processor.  I chose to test
> > > TARGET_OCTEON as the simplest conditional; since the older Octeon
> > > processors don't support hard float at all, I don't think the choice
> > > matters for them.  Tests that then failed with the warning were
> > > updated to disable them for Octeon.
> > >
> > > Tested with no regressions for cross to mips64el-linux-gnu (Octeon
> > > 3).  OK to commit?
> >
> > This is ok from my point of view.  I did not think about doing this at
> > the time I added Octeon 3 support.
> 
> Ping for MIPS maintainer review of this patch  patches/2016-01/msg00492.html>.
> 
This is OK, please commit.
Thanks,
Catherine


Re: [PATCH][PR tree-optimization/69270] Exploit VRP information in DOM

2016-01-15 Thread Jakub Jelinek
On Fri, Jan 15, 2016 at 03:32:33PM -0700, Jeff Law wrote:
> +bool
> +ssa_name_has_boolean_range (tree op)
> +{
> +  gcc_assert (TREE_CODE (op) == SSA_NAME);
> +
> +  /* Boolean types always have a range [0..1].  */
> +  if (TREE_CODE (TREE_TYPE (op)) == BOOLEAN_TYPE)
> +return true;
> +
> +  /* An integral type with a single bit of precision.  */
> +  if (INTEGRAL_TYPE_P (TREE_TYPE (op))
> +  && TYPE_UNSIGNED (TREE_TYPE (op))
> +  && TYPE_PRECISION (TREE_TYPE (op)) == 1)
> +return true;
> +
> +  /* An integral type with more precision, but the object
> + only takes on values [0..1] as determined by VRP
> + analysis.  */
> +  if (INTEGRAL_TYPE_P (TREE_TYPE (op))
> +  && (TYPE_PRECISION (TREE_TYPE (op)) > 1
> +   || TYPE_UNSIGNED (TREE_TYPE (op)))

I think this || TYPE_UNSIGNED (TREE_TYPE (op)) is useless.
Because, if TYPE_PRECISION (TREE_TYPE (op)) > 1, then both signed and
unsigned is fine, and if precision is 1, then already the earlier if
handled it, and precision 0 is hopefully invalid.

> +  && wi::eq_p (get_nonzero_bits (op), 1))
> +return true;
> +
> +  return false;
> +}

Jakub
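
The redundancy argument above can be checked mechanically. The sketch below models only the precision and signedness parts of the second and third tests in the quoted function; the `INTEGRAL_TYPE_P` check, common to both forms, is omitted, and the `get_nonzero_bits` comparison is reduced to a boolean. All function names are hypothetical.

```cpp
#include <cassert>

// Second test from the quoted function: unsigned with one bit of precision.
constexpr bool second_test(unsigned precision, bool is_unsigned) {
  return is_unsigned && precision == 1;
}

// Precision/signedness part of the third test as posted.
constexpr bool third_test_posted(unsigned precision, bool is_unsigned) {
  return precision > 1 || is_unsigned;
}

// The same part with the "|| TYPE_UNSIGNED" clause dropped.
constexpr bool third_test_simplified(unsigned precision) {
  return precision > 1;
}

// Result of the second and third tests taken in sequence.
constexpr bool chain(bool simplified, unsigned precision, bool is_unsigned,
                     bool nonzero_bits_is_one) {
  return second_test(precision, is_unsigned)
         || ((simplified ? third_test_simplified(precision)
                         : third_test_posted(precision, is_unsigned))
             && nonzero_bits_is_one);
}

// True if both forms give the same answer for every combination of
// signedness and nonzero-bits outcome over the given precision range.
inline bool forms_agree(unsigned min_precision, unsigned max_precision) {
  for (unsigned p = min_precision; p <= max_precision; ++p)
    for (int u = 0; u < 2; ++u)
      for (int nz = 0; nz < 2; ++nz)
        if (chain(false, p, u != 0, nz != 0)
            != chain(true, p, u != 0, nz != 0))
          return false;
  return true;
}
```

For precisions of 1 and above the two forms agree; they differ only for an unsigned type of precision 0, which is the case the message above hopes is invalid.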


[PATCH] Fix RTL DSE (PR rtl-optimization/68955)

2016-01-15 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled on i686-linux at -O3.
The bug is in DSE record_store, which for group_id < 0 uses mem_addr
set to result of get_addr (base->val_rtx) (plus optional offset),
which is fine for canon_true_dependence with other MEMs in that function,
but we also store that address in store_info.  The problem is if later on
e.g. some read uses the same e.g. hard register as get_addr returned, but
that register contains at that later point a different value. 
canon_true_dependence then happily returns the read does not alias the
store, although it might.
The fix is to store the VALUE (plus optional offset) into
store_info->mem_addr instead, then at some later insn when get_addr is
called on it it will either return the same register or expression (if it
has not changed), or some different one otherwise.

Bootstrapped/regtested on x86_64-linux and i686-linux, I've additionally
performed instrumented x86_64-linux and i686-linux bootstraps/regtests,
which recorded number of locally_deleted and globally_deleted from each
function, once without this patch, once with it.  Across both 64-bit and
32-bit bootstrap/regtest, both unpatched and patched had the same number
151999 of globally_deleted, and unpatched had 74828 locally_deleted, while
patched 74849 locally_deleted.  Thus the patch should, in addition to
actually fixing a real bug, not decrease the number of DSEd stores in any
significant way, and sometimes even improve it.

Ok for trunk?

2016-01-15  Jakub Jelinek  

PR rtl-optimization/68955
* dse.c (record_store): For group_id < 0, ensure mem_addr is not
result of get_addr.

* gcc.dg/torture/pr68955.c: New test.

--- gcc/dse.c.jj2016-01-04 14:55:51.0 +0100
+++ gcc/dse.c   2016-01-15 19:25:31.323384174 +0100
@@ -1653,6 +1653,15 @@ record_store (rtx body, bb_info_t bb_inf
   insn_info->store_rec = store_info;
   store_info->mem = mem;
   store_info->alias_set = spill_alias_set;
+  if (spill_alias_set == 0 && group_id < 0)
+{
+  /* Ensure we store address with VALUE in it, instead of result of
+get_addr on it, otherwise if the registers in get_addr change,
+we will not notice possible aliasing.  */
+  mem_addr = base->val_rtx;
+  if (offset)
+   mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
+}
   store_info->mem_addr = mem_addr;
   store_info->cse_base = base;
   if (width > HOST_BITS_PER_WIDE_INT)
--- gcc/testsuite/gcc.dg/torture/pr68955.c.jj   2016-01-15 19:32:00.347979523 +0100
+++ gcc/testsuite/gcc.dg/torture/pr68955.c  2016-01-15 19:31:39.0 +0100
@@ -0,0 +1,41 @@
+/* PR rtl-optimization/68955 */
+/* { dg-do run } */
+/* { dg-output "ONE1ONE" } */
+
+int a, b, c, d, g, m;
+int i[7][7][5] = { { { 5 } }, { { 5 } },
+  { { 5 }, { 5 }, { 5 }, { 5 }, { 5 }, { -1 } } };
+static int j = 11;
+short e, f, h, k, l;
+
+static void
+foo ()
+{
+  for (; e < 5; e++)
+for (h = 3; h; h--)
+  {
+   for (g = 1; g < 6; g++)
+ {
+   m = c == 0 ? b : b / c;
+   i[e][1][e] = i[1][1][1] | (m & l) && f;
+ }
+   for (k = 0; k < 6; k++)
+ {
+   for (d = 0; d < 6; d++)
+ i[1][e][h] = i[h][k][e] >= l;
+   i[e + 2][h + 3][e] = 6 & l;
+   i[2][1][2] = a;
+   for (; j < 5;)
+ for (;;)
+   ;
+ }
+  }
+}
+
+int
+main ()
+{
+  foo ();
+  __builtin_printf ("ONE%dONE\n", i[1][0][2]);
+  return 0;
+}

Jakub
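
As a loose analogy for the problem described at the top of this message, the following toy model contrasts remembering a store address as "register + offset" text with remembering the value the register held and resolving it again at the time of the read. It is a sketch under strong assumptions: `Machine`, `AddrExpr`, `StoreRecord` and both functions are hypothetical, access widths are ignored, and nothing here reproduces cselib, get_addr or canon_true_dependence.

```cpp
#include <cassert>

// Four toy registers holding plain integer addresses.
struct Machine { long reg[4]; };

// An address written as "register + offset".
struct AddrExpr { int base_reg; long offset; };

// An address remembered as "value + offset", independent of any register.
struct StoreRecord { long base_value; long offset; };

// Naive disambiguation on the textual form: same base register with a
// different offset is taken to mean "no overlap"; anything else is
// conservatively treated as a possible alias.
inline bool may_alias_textual(AddrExpr a, AddrExpr b) {
  if (a.base_reg == b.base_reg)
    return a.offset == b.offset;
  return true;
}

// Resolve the remembered value against the register contents at the time
// of the read, then compare.  If no register holds the value any more,
// assume the accesses may alias.
inline bool may_alias_value(const Machine& m, StoreRecord st, AddrExpr read) {
  for (int r = 0; r < 4; ++r)
    if (m.reg[r] == st.base_value)
      return may_alias_textual(AddrExpr{r, st.offset}, read);
  return true;
}
```

If register 1 holds 100 at the store and is later overwritten with 96, a read from "register 1 + 4" touches address 100 again. The textual comparison still reports no alias, while the value-based record no longer matches register 1 and falls back to the conservative answer.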


[patch] doc/sourcebuild.texi (Directives): Remove extra closing braces.

2016-01-15 Thread Jonathan Wakely

This removes stray closing braces in the docs for dg-error, dg-warning
etc.

OK for trunk?

commit 1cb064263cfcfa14da81585886750f01a5611c7e
Author: Jonathan Wakely 
Date:   Sat Jan 16 00:11:27 2016 +

	* doc/sourcebuild.texi (Directives): Remove extra closing braces.

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 62212d4..3c00569 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1154,7 +1154,7 @@ conditions (which are the same as for @code{dg-skip-if}) are met.
 @subsubsection Verify compiler messages
 
 @table @code
-@item @{ dg-error @var{regexp} [@var{comment} [@{ target/xfail @var{selector} @} [@var{line}] @}]] @}
+@item @{ dg-error @var{regexp} [@var{comment} [@{ target/xfail @var{selector} @} [@var{line}] ]] @}
 This DejaGnu directive appears on a source line that is expected to get
 an error message, or else specifies the source line associated with the
 message.  If there is no message for that line or if the text of that
@@ -1162,7 +1162,7 @@ message is not matched by @var{regexp} then the check fails and
 @var{comment} is included in the @code{FAIL} message.  The check does
 not look for the string @samp{error} unless it is part of @var{regexp}.
 
-@item @{ dg-warning @var{regexp} [@var{comment} [@{ target/xfail @var{selector} @} [@var{line}] @}]] @}
+@item @{ dg-warning @var{regexp} [@var{comment} [@{ target/xfail @var{selector} @} [@var{line}] ]] @}
 This DejaGnu directive appears on a source line that is expected to get
 a warning message, or else specifies the source line associated with the
 message.  If there is no message for that line or if the text of that
@@ -1170,13 +1170,13 @@ message is not matched by @var{regexp} then the check fails and
 @var{comment} is included in the @code{FAIL} message.  The check does
 not look for the string @samp{warning} unless it is part of @var{regexp}.
 
-@item @{ dg-message @var{regexp} [@var{comment} [@{ target/xfail @var{selector} @} [@var{line}] @}]] @}
+@item @{ dg-message @var{regexp} [@var{comment} [@{ target/xfail @var{selector} @} [@var{line}] ]] @}
 The line is expected to get a message other than an error or warning.
 If there is no message for that line or if the text of that message is
 not matched by @var{regexp} then the check fails and @var{comment} is
 included in the @code{FAIL} message.
 
-@item @{ dg-bogus @var{regexp} [@var{comment} [@{ target/xfail @var{selector} @} [@var{line}] @}]] @}
+@item @{ dg-bogus @var{regexp} [@var{comment} [@{ target/xfail @var{selector} @} [@var{line}] ]] @}
 This DejaGnu directive appears on a source line that should not get a
 message matching @var{regexp}, or else specifies the source line
 associated with the bogus message.  It is usually used with @samp{xfail}


Re: [hsa merge 09/10] Majority of the HSA back-end

2016-01-15 Thread Martin Jambor
Hi,

bootstrapping on i686-linux revealed the need for the following simple
patch.  I've run into two types of compilation errors on
powerpc-ibm-aix (no htolenn functions and ASM_GENERATE_INTERNAL_LABEL
somehow expanding to undeclared rs6000_xcoff_strip_dollar).  I plan to
work around them quickly by making most of the contents of hsa-*.c
files compiled only conditionally (and leave potential hsa support on
non-linux platforms for later), but I will not have time to do the
change and test it properly until Monday.

But that will hopefully really be it,

Martin


2016-01-16  Martin Jambor  

* hsa-dump.c (dump_hsa_symbol): Add missing argument cast.

diff --git a/gcc/hsa-dump.c b/gcc/hsa-dump.c
index af79bcb..c5f1f69 100644
--- a/gcc/hsa-dump.c
+++ b/gcc/hsa-dump.c
@@ -720,7 +720,7 @@ dump_hsa_symbol (FILE *f, hsa_symbol *symbol)
   hsa_type_name (symbol->m_type & ~BRIG_TYPE_ARRAY_MASK), name);
 
   if (symbol->m_type & BRIG_TYPE_ARRAY_MASK)
-fprintf (f, "[%lu]", symbol->m_dim);
+fprintf (f, "[%lu]", (unsigned long) symbol->m_dim);
 }
 
 /* Dump textual representation of HSA IL operand OP to file F.  */
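
For readers unfamiliar with this class of error: passing an argument whose type does not match the `%lu` conversion is undefined behaviour for a variadic call, and it shows up on hosts where the argument type is wider than `unsigned long`. The sketch below shows the cast used in the patch next to a `PRIu64` alternative. It assumes a 64-bit unsigned value purely for illustration (the declared type of `m_dim` is not shown in this message), and `format_dim_cast` and `format_dim_prid` are hypothetical names.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Cast the argument to the type that %lu expects.  On hosts where
// unsigned long is 32 bits wide this truncates values above 2^32 - 1.
inline std::string format_dim_cast(std::uint64_t dim) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "[%lu]", (unsigned long) dim);
  return buf;
}

// Alternative: keep the full width and pick the matching conversion.
inline std::string format_dim_prid(std::uint64_t dim) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "[%" PRIu64 "]", dim);
  return buf;
}
```

Either form makes the argument type agree with the format string; the cast is the smaller change, while the macro form preserves the full value on every host.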


Ping Re: Handle Octeon 3 not supporting MIPS paired-single instructions

2016-01-15 Thread Joseph Myers
On Fri, 8 Jan 2016, Andrew Pinski wrote:

> On Fri, Jan 8, 2016 at 4:05 PM, Joseph Myers  wrote:
> > The Octeon 3 processor does not support the MIPS paired-single
> > instructions.  This results in illegal instruction errors in the
> > testsuite when vectorization tests try to use those instructions.
> >
> > This patch teaches the compiler about that lack of support, so that
> > warnings are given when -mpaired-single (or something implying it) is
> > used when compiling for such a processor.  I chose to test
> > TARGET_OCTEON as the simplest conditional; since the older Octeon
> > processors don't support hard float at all, I don't think the choice
> > matters for them.  Tests that then failed with the warning were
> > updated to disable them for Octeon.
> >
> > Tested with no regressions for cross to mips64el-linux-gnu (Octeon
> > 3).  OK to commit?
> 
> This is ok from my point of view.  I did not think about doing this at
> the time I added Octeon 3 support.

Ping for MIPS maintainer review of this patch 
.

-- 
Joseph S. Myers
jos...@codesourcery.com


[patch] libstdc++/69293 Use static assertion for uses-allocator construction

2016-01-15 Thread Jonathan Wakely

The PR is actually due to a defect in the standard, which I reported
today. The reporter said we're missing a check for is_constructible
that would ensure we go to bullet (9.4) and make the example in the PR
ill-formed. I didn't add that check because it's redundant: we don't
need to check is_constructible, we can just try the construction, and
if it fails the program is ill-formed anyway.

However, due to the defect in the standard the example in the PR
*should* be ill-formed, but isn't.  This patch adds a static assertion
making it ill-formed. It would be ill-formed anyway once the defect is
resolved and we implement the resolution, but with the static
assertion we give a better diagnostic, both for
scoped_allocator_adaptor and for ill-formed uses of allocators with
std::tuple.

Tested powerpc64le-linux, committed to trunk.
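
As a rough illustration of the selection being discussed, the sketch below classifies a type by which uses-allocator construction form is available and exposes a `well_formed` flag corresponding to the condition the new static assertion checks. It is a simplified model, not libstdc++'s `__uses_alloc`: the names `uses_alloc_form`, `Plain`, `Leading`, `Trailing`, `Broken` and `A` are all hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// value: 0 = T does not use Alloc,
//        1 = T(allocator_arg, alloc, args...),
//        2 = T(args..., alloc).
// well_formed is false when T claims to use Alloc but accepts neither form.
template <typename T, typename Alloc, typename... Args>
struct uses_alloc_form {
  static constexpr bool uses = std::uses_allocator<T, Alloc>::value;
  static constexpr bool leading =
      std::is_constructible<T, std::allocator_arg_t, Alloc, Args...>::value;
  static constexpr bool trailing =
      std::is_constructible<T, Args..., Alloc>::value;
  static constexpr int value = !uses ? 0 : leading ? 1 : 2;
  static constexpr bool well_formed = !uses || leading || trailing;
};

using A = std::allocator<int>;

// No allocator_type: uses_allocator is false.
struct Plain {};

// Accepts the allocator in the leading position.
struct Leading {
  using allocator_type = A;
  Leading(std::allocator_arg_t, const allocator_type&) {}
};

// Accepts the allocator in the trailing position.
struct Trailing {
  using allocator_type = A;
  explicit Trailing(const allocator_type&) {}
};

// Claims an allocator_type but has no constructor taking one, in the
// spirit of struct X in the new test.
struct Broken {
  using allocator_type = A;
};
```

A real implementation would turn `well_formed` into a `static_assert`, so that a type like `Broken` is rejected with a readable message instead of silently falling into the trailing-allocator form.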


commit ccd75a6477a0be335d8d38fcbcd6047cebaee096
Author: Jonathan Wakely 
Date:   Fri Jan 15 19:48:25 2016 +

Use static assertion for uses-allocator construction

	PR libstdc++/69293
	* include/bits/uses_allocator.h (__uses_alloc): Add
	static assertion that type is constructible from the arguments.
	* testsuite/20_util/scoped_allocator/69293_neg.cc: New.
	* testsuite/20_util/uses_allocator/69293_neg.cc: New.
	* testsuite/20_util/uses_allocator/cons_neg.cc: Adjust dg-error.

diff --git a/libstdc++-v3/include/bits/uses_allocator.h b/libstdc++-v3/include/bits/uses_allocator.h
index 70ba007..b1ff58a 100644
--- a/libstdc++-v3/include/bits/uses_allocator.h
+++ b/libstdc++-v3/include/bits/uses_allocator.h
@@ -85,7 +85,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 is_constructible<_Tp, allocator_arg_t, _Alloc, _Args...>::value,
 __uses_alloc1<_Alloc>,
	__uses_alloc2<_Alloc>>::type
-{ };
+{
+  static_assert(__or_<
+	  is_constructible<_Tp, allocator_arg_t, _Alloc, _Args...>,
+	  is_constructible<_Tp, _Args..., _Alloc>>::value, "construction with"
+	  " an allocator must be possible if uses_allocator is true");
+};
 
   template
 struct __uses_alloc
diff --git a/libstdc++-v3/testsuite/20_util/scoped_allocator/69293_neg.cc b/libstdc++-v3/testsuite/20_util/scoped_allocator/69293_neg.cc
new file mode 100644
index 000..f3b2d87
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/scoped_allocator/69293_neg.cc
@@ -0,0 +1,51 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+// PR libstdc++/69293
+
+#include 
+#include 
+
+using std::allocator;
+using std::allocator_arg_t;
+using std::uses_allocator;
+using std::scoped_allocator_adaptor;
+using std::is_constructible;
+
+struct X
+{
+  using allocator_type = allocator;
+};
+
+using scoped_alloc = scoped_allocator_adaptor, X::allocator_type>;
+using inner_alloc_type = scoped_alloc::inner_allocator_type;
+
+static_assert(uses_allocator{}, "");
+static_assert(!is_constructible{}, "");
+static_assert(!is_constructible{}, "");
+
+void
+test01()
+{
+  scoped_alloc sa;
+  auto p = sa.allocate(1);
+  sa.construct(p);  // this is required to be ill-formed
+  // { dg-error "static assertion failed" "" { target *-*-* } 89 }
+}
diff --git a/libstdc++-v3/testsuite/20_util/uses_allocator/69293_neg.cc b/libstdc++-v3/testsuite/20_util/uses_allocator/69293_neg.cc
new file mode 100644
index 000..19417fc
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/uses_allocator/69293_neg.cc
@@ -0,0 +1,49 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+

Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2016-01-15 Thread Jeff Law

On 01/14/2016 01:55 AM, Jeff Law wrote:
[ Replying to myself again, mostly to make sure we've got these thoughts 
in the archives. ]


Anyway, going back to adpcm_decode, we do end up splitting this path:

  # vpdiff_12 = PHI 
   if (sign_41 != 0)
 goto ;
   else
 goto ;
;;succ:   15
;;16

;;   basic block 15, loop depth 1
;;pred:   14
   valpred_51 = valpred_76 - vpdiff_12;
   goto ;
;;succ:   17

;;   basic block 16, loop depth 1
;;pred:   14
   valpred_52 = vpdiff_12 + valpred_76;
;;succ:   17

;;   basic block 17, loop depth 1
;;pred:   15
;;16
   # valpred_7 = PHI 
   _85 = MAX_EXPR ;
   valpred_13 = MIN_EXPR <_85, 32767>;
   step_53 = stepsizeTable[index_62];
   outp_54 = outp_69 + 2;
   _55 = (short int) valpred_13;
   MEM[base: outp_54, offset: -2B] = _55;
   if (outp_54 != _74)
 goto ;
   else
 goto ;

This doesn't result in anything particularly interesting/good AFAICT. We
propagate valpred_51/52 into the use in the MAX_EXPR in the duplicate
paths, but that doesn't allow any further simplification.
So with the heuristic I'm poking at, this gets rejected.  Essentially it 
doesn't think it's likely to expose CSE/DCE opportunities (and it's 
correct).  The number of statements in predecessor blocks that feed 
operands in the to-be-copied-block is too small relative to the size of 
the to-be-copied-block.





Ajit, can you confirm which of adpcm_code or adpcm_decode where path
splitting is showing a gain?  I suspect it's the former but would like
to make sure so that I can adjust the heuristics properly.
I'd still like to have this answered when you can, Ajit, just to be 100% 
sure that it's the path splitting in adpcm_code that's responsible for the 
improvements you're seeing in adpcm.


jeff


Re: [patch] libstdc++/48891 Use ::isinf and ::isnan if libc defines them

2016-01-15 Thread Jonathan Wakely

On 13/01/16 16:26 +, Jonathan Wakely wrote:

On 08/01/16 13:59 +, Jonathan Wakely wrote:

I'm only checking for those functions for *-*-*gnu* targets, as I
don't know of any other targets where it's an issue. Solaris and
the BSDs don't define those functions. If it affects other targets we
can extend the check to cover them too.


AIX also defines ::isinf and ::isnan, so this extends the configure
check to AIX.

Committed to trunk.

commit 4448720e8e3fabe45e49ee56bc469abe7c4b06e0
Author: Jonathan Wakely 
Date:   Fri Jan 15 14:07:19 2016 +

PR libstdc++/69294 Check for isinf and isnan on AIX

	PR libstdc++/69294
	* acinclude.m4 (GLIBCXX_CHECK_MATH11_PROTO): Check for obsolete isinf
	and isnan on AIX. Quote variables.
	* configure: Regenerate.

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 1e25660..f8dbb95 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -2186,7 +2186,7 @@ AC_DEFUN([GLIBCXX_CHECK_MATH11_PROTO], [
   fi
   AC_MSG_RESULT([$glibcxx_cv_math11_overload])
   ;;
-*-*-*gnu*)
+*-*-*gnu* | *-*-aix*)
   # If  defines the obsolete isinf(double) and isnan(double)
   # functions (instead of or as well as the C99 generic macros) then we
   # can't define std::isinf(double) and std::isnan(double) in 
@@ -3445,9 +3445,9 @@ EOF
   AC_LANG_RESTORE
 
   # Set atomicity_dir to builtins if all but the long long test above passes.
-  if test $glibcxx_cv_atomic_bool = yes \
- && test $glibcxx_cv_atomic_short = yes \
- && test $glibcxx_cv_atomic_int = yes; then
+  if test "$glibcxx_cv_atomic_bool" = yes \
+ && test "$glibcxx_cv_atomic_short" = yes \
+ && test "$glibcxx_cv_atomic_int" = yes; then
 AC_DEFINE(_GLIBCXX_ATOMIC_BUILTINS, 1,
 [Define if the compiler supports C++11 atomics.])
 atomicity_dir=cpu/generic/atomicity_builtins


Re: [PATCH v2] libstdc++: Make certain exceptions transaction_safe.

2016-01-15 Thread Torvald Riegel
On Thu, 2016-01-14 at 17:58 +, Jonathan Wakely wrote:
> On 07/01/16 17:47 +0100, Torvald Riegel wrote:
> >The attached patch makes some exceptions transaction-safe, as required by
> >the Transactional Memory TS.  I believe I addressed all feedback for the
> >previous version of this patch.  In particular, there are now more safety
> >checks for preconditions for this implementation (e.g., that the new
> >allocator is used), and all exceptions declared by the TM TS are now
> >supported (with the exception of tx_exception -- should I add that in a
> >follow-up patch?).
> 
> Yes, that can be a separate patch, as it's adding a new type rather
> than modifying the existing ones to add this TM magic.
> 
> >Thus, the patch adds txnal clones as required.  They are new exported
> >symbols, but not visible to nontransactional code.  The only changes to
> >headers is transaction_safe[_dynamic] annotations where required by the
> >TS, and a few friend declarations.  The annotations are only enabled if
> >a user compiles with -fgnu-tm.  IOW, the changes are pretty much
> >invisible when not using the TM TS.
> 
> Thanks for adding all the clear comments as well. I'm sure we'll all
> be grateful for those when we come to look back at the code.
> 
> >There are also commented-out calls to _ITM_setAssociatedException in the
> >code, which exist to show how we plan to support transaction
> >cancellation through exceptions (which needs some more libitm support
> >and bugfixes on the compiler side).
> >
> >Tested on x86_64-linux and x86-linux.
> >
> >OK?
> 
> Yes, with some minor formatting changes noted below.

Addressed these, fixed a problem with using GLIBCXX_WEAK_DEFINITION
(which is only set on Darwin despite the generic-sounding name -- so
just use __attribute__((weak)) directly), and also updated
testsuite_abi.cc so that it knows about CXXABI_1.3.10.

Approved by Jonathan Wakely.  Committed as r232454.


commit df44fb92fd161282cc6540053cd82177b7c02d51
Author: Torvald Riegel 
Date:   Fri Nov 13 01:00:52 2015 +0100

libstdc++: Make certain exceptions transaction_safe.

diff --git a/libitm/testsuite/libitm.c++/libstdc++-safeexc.C b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C
new file mode 100644
index 000..3e1655e
--- /dev/null
+++ b/libitm/testsuite/libitm.c++/libstdc++-safeexc.C
@@ -0,0 +1,89 @@
+// Tests that the exceptions declared by the TM TS (N4514) as transaction_safe
+// are indeed that.  Thus, this also tests the transactional clones in
+// libstdc++ and libsupc++.
+
+// { dg-do run }
+
+#include 
+#include 
+#include 
+#include 
+
+using namespace std;
+
+template void thrower(const T& t)
+{
+  try
+{
+  atomic_commit
+  {
+	throw t;
+  }
+}
+  catch (T ex)
+{
+  if (ex != t) abort ();
+}
+}
+
+template void thrower1(const string& what)
+{
+  try
+{
+  atomic_commit
+  {
+	throw T ();
+  }
+}
+  catch (T ex)
+{
+  if (what != ex.what()) abort ();
+}
+}
+
+template void thrower2(const string& what)
+{
+  try
+{
+  atomic_commit
+  {
+	throw T (what);
+  }
+}
+  catch (T ex)
+{
+  if (what != ex.what()) abort ();
+}
+}
+
+
+int main ()
+{
+  thrower (23);
+  thrower (23);
+  thrower (23);
+  thrower (23);
+  thrower (23);
+  thrower (23);
+  thrower (42);
+  thrower (42);
+  thrower (42);
+  thrower (42);
+  thrower (23.42);
+  thrower (23.42);
+  thrower (23.42);
+  thrower (0);
+  thrower (0);
+  thrower1 ("std::exception");
+  thrower1 ("std::bad_exception");
+  thrower2 ("test");
+  thrower2 ("test");
+  thrower2 ("test");
+  thrower2 ("test");
+  thrower2 ("test");
+  thrower2 ("test");
+  thrower2 ("test");
+  thrower2 ("test");
+  thrower2 ("test");
+  return 0;
+}
diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index b76e8d5..1e25660 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -2594,6 +2594,8 @@ AC_DEFUN([GLIBCXX_ENABLE_ALLOCATOR], [
   ;;
   esac
 
+  GLIBCXX_CONDITIONAL(ENABLE_ALLOCATOR_NEW,
+		  test $enable_libstdcxx_allocator_flag = new)
   AC_SUBST(ALLOCATOR_H)
   AC_SUBST(ALLOCATOR_NAME)
 ])
@@ -4344,6 +4346,34 @@ dnl
   AC_LANG_RESTORE
 ])
 
+dnl
+dnl Check how size_t is mangled.  Copied from libitm.
+dnl
+AC_DEFUN([GLIBCXX_CHECK_SIZE_T_MANGLING], [
+  AC_CACHE_CHECK([how size_t is mangled],
+ glibcxx_cv_size_t_mangling, [
+AC_TRY_COMPILE([], [extern __SIZE_TYPE__ x; extern unsigned long x;],
+   [glibcxx_cv_size_t_mangling=m], [
+  AC_TRY_COMPILE([], [extern __SIZE_TYPE__ x; extern unsigned int x;],
+ [glibcxx_cv_size_t_mangling=j], [
+AC_TRY_COMPILE([],
+   [extern __SIZE_TYPE__ x; extern unsigned long long x;],
+   [glibcxx_cv_size_t_mangling=y], [
+  AC_TRY_COMPILE([],
+ [extern __SIZE_TYPE__ x; extern unsigned short x;],
+ [glibcxx_cv_size_t_mangling=t],
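
For readers unfamiliar with the check above: it determines which builtin type __SIZE_TYPE__ really is by redeclaring the same variable with each candidate type, and records the matching Itanium C++ ABI mangling letter (m, j, y or t).  A rough standalone sketch of the same decision follows.  It swaps the redeclaration trick for std::is_same, and the function name size_t_mangling is made up for illustration; it is not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper, not part of libstdc++ or libitm.
// Returns the Itanium C++ ABI mangling letter of the builtin type that
// std::size_t aliases, or '?' if it is none of the four candidates
// probed by the configure check.
inline char size_t_mangling()
{
  if (std::is_same<std::size_t, unsigned long>::value)
    return 'm';
  if (std::is_same<std::size_t, unsigned int>::value)
    return 'j';
  if (std::is_same<std::size_t, unsigned long long>::value)
    return 'y';
  if (std::is_same<std::size_t, unsigned short>::value)
    return 't';
  return '?';
}
```

The configure-time version cannot use a library trait, since it must work with a bare compiler, which is presumably why it relies on conflicting extern declarations instead.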
+   

[PATCH] fix gimplification of call parameters (PR cilkplus/69267)

2016-01-15 Thread Ryan Burn
This patch changes the function cilk_gimplify_call_params_in_spawned_fn to use 
gimplify_arg instead of gimplify_expr. It fixes an ICE when calling a function 
with a constructed empty class as the argument.

Bootstrapped and regression tested on x86_64-linux.

2016-01-15  Ryan Burn  

PR cilkplus/69267
* cilk.c (cilk_gimplify_call_params_in_spawned_fn): Change to use
gimplify_arg. Removed superfluous post_p argument.
* c-family.h (cilk_gimplify_call_params_in_spawned_fn): Removed
superfluous post_p argument.
* c-gimplify.c (c_gimplify_expr): Likewise.

gcc/cp/ChangeLog:

2016-01-15  Ryan Burn  

PR cilkplus/69267
* cp-gimplify.c (cilk_cp_gimplify_call_params_in_spawned_fn): Removed
superfluous post_p argument in call to
cilk_gimplify_call_params_in_spawned_fn.

gcc/testsuite/ChangeLog:

2016-01-15  Ryan Burn

PR cilkplus/69267
* g++.dg/cilk-plus/CK/pr69267.cc: New test.





pr69267.diff
Description: Binary data


Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

2016-01-15 Thread Jeff Law

On 01/13/2016 01:10 AM, Jeff Law wrote:


I'm going to focus on adpcm for the moment, in particular adpcm_coder.
It appears the key blocks are:


;;   basic block 14, loop depth 1, count 0, freq 9100, maybe hot
;;prev block 13, next block 15, flags: (NEW, REACHABLE)
;;pred:   12 [100.0%]  (FALLTHRU,EXECUTABLE)
;;13 [100.0%]  (FALLTHRU,EXECUTABLE)
   # valpred_12 = PHI 
   _112 = MAX_EXPR ;
   valpred_18 = MIN_EXPR <_112, 32767>;
   delta_56 = delta_7 | iftmp.1_114;
   _57 = indexTable[delta_56];
   index_58 = _57 + index_107;
   _113 = MIN_EXPR ;
   index_111 = MAX_EXPR <_113, 0>;
   step_59 = stepsizeTable[index_111];
   if (bufferstep_93 != 0)
 goto ;
   else
 goto ;
;;succ:   15 [50.0%]  (TRUE_VALUE,EXECUTABLE)
;;16 [50.0%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 15, loop depth 1, count 0, freq 4550, maybe hot
;;prev block 14, next block 16, flags: (NEW, REACHABLE)
;;pred:   14 [50.0%]  (TRUE_VALUE,EXECUTABLE)
   _60 = delta_56 << 4;
   goto ;
;;succ:   17 [100.0%]  (FALLTHRU,EXECUTABLE)

;;   basic block 16, loop depth 1, count 0, freq 4550, maybe hot
;;prev block 15, next block 17, flags: (NEW, REACHABLE)
;;pred:   14 [50.0%]  (FALSE_VALUE,EXECUTABLE)
   outp_62 = outp_83 + 1;
   _63 = (signed char) delta_56;
   _65 = (signed char) outputbuffer_90;
   _66 = _63 | _65;
   *outp_83 = _66;
;;succ:   17 [100.0%]  (FALLTHRU,EXECUTABLE)

;;   basic block 17, loop depth 1, count 0, freq 9100, maybe hot
;;prev block 16, next block 18, flags: (NEW, REACHABLE)
;;pred:   15 [100.0%]  (FALLTHRU,EXECUTABLE)
;;16 [100.0%]  (FALLTHRU,EXECUTABLE)
   # outp_3 = PHI 
   # outputbuffer_21 = PHI <_60(15), outputbuffer_90(16)>
   _109 = bufferstep_93 ^ 1;
   _98 = _109 & 1;
   ivtmp.11_68 = ivtmp.11_105 + 2;
   if (ivtmp.11_68 != _116)
 goto ;
   else
 goto ;


Block #17 is the join point that we're going to effectively copy into
blocks #15 and #16.  Doing so in turn exposes bufferstep_93 as the
constant 0 in block #16, which in turn allows elimination of a couple
statements in the extended version of block #16 and we propagate the
constant 1 for bufferstep_93 to the top of the loop when reached via
block #16.  So we save a few instructions.  However, I think we're
actually doing a fairly poor job here.

bufferstep is a great example of a flip-flop variable and its value is
statically computable based on the path from the prior loop iteration
which, if exploited would allow the FSM threader to eliminate the
conditional at the end of bb14.  I'm going to have to play with that.

So I've extended DOM & uncprop to pick up the missed propagation 
opportunity, which in turn allows DOM to simplify this function even 
further and hopefully set ourselves up for either unrolling the loop or 
using the FSM threader to eliminate the test on bufferstep completely. 
But those are gcc-7 items.
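
To make the flip-flop point concrete, here is a small sketch of the kind of loop being discussed.  It is not the adpcm source; the function names and the exact masking are assumptions made for illustration.  The first function toggles a flag on every iteration, so its value on entry to an iteration is known from the path taken in the previous one.  The second is what one gets once that knowledge is exploited by hand: the test on the flag disappears.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: packs 4-bit codes two per byte, using a flag that
// flips on every iteration (the "flip-flop" variable).
std::vector<unsigned char> pack_nibbles(const std::vector<int> &deltas)
{
  std::vector<unsigned char> out;
  int bufferstep = 1;
  int outputbuffer = 0;
  for (int delta : deltas)
    {
      if (bufferstep)
        outputbuffer = (delta << 4) & 0xf0;   // remember the high nibble
      else
        out.push_back(static_cast<unsigned char>((delta & 0x0f)
                                                 | outputbuffer));
      bufferstep = !bufferstep;               // value is path-dependent
    }
  return out;
}

// The same computation with the flag eliminated: each trip through the
// loop handles one "flag set" step followed by one "flag clear" step.
std::vector<unsigned char>
pack_nibbles_unrolled(const std::vector<int> &deltas)
{
  std::vector<unsigned char> out;
  for (std::size_t i = 0; i + 1 < deltas.size(); i += 2)
    {
      int high = (deltas[i] << 4) & 0xf0;
      out.push_back(static_cast<unsigned char>((deltas[i + 1] & 0x0f)
                                               | high));
    }
  return out;
}
```

Both versions drop a trailing unpaired code, so they agree on odd-length input as well.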





Anyway, it's late and I want to rip this test apart a bit more and see
how it interacts with the heuristic that I've cobbled together as well
as see what it would take to have DOM or VRP get data on bufferstep_93
on the true path out of BB14 after a path-split.

As I expected, this showed a need for a minor tweak to the heuristic I'm
poking at for path splitting.  Nothing particularly hard, it needs
further work (it's not compile-time efficient right now), but it's good
enough to put away adpcm_coder and continue looking more closely at
adpcm_decoder.


Jeff



Re: [PATCH][PR tree-optimization/69270] Exploit VRP information in DOM

2016-01-15 Thread Jeff Law

On 01/14/2016 11:14 AM, Jeff Law wrote:

On 01/14/2016 12:49 AM, Jakub Jelinek wrote:

On Thu, Jan 14, 2016 at 08:46:43AM +0100, Jakub Jelinek wrote:

On Thu, Jan 14, 2016 at 12:38:52AM -0700, Jeff Law wrote:

+  /* An integral type with more precision, but the object
+ only takes on values [0..1] as determined by VRP
+ analysis.  */
+  wide_int min, max;
+  if (INTEGRAL_TYPE_P (TREE_TYPE (op))
+  && get_range_info (op, &min, &max) == VR_RANGE
+  && wi::eq_p (min, 0)
+  && wi::eq_p (max, 1))
+return true;


You could use and/or:
   if (INTEGRAL_TYPE_P (TREE_TYPE (op))
       && wi::eq_p (get_nonzero_bits (op), 1))
set_range_info for VR_RANGE should usually update also the non-zero bits,
but set_nonzero_bits does not update the recorded range.


Though, that would need to be limited to
TYPE_PRECISION (TREE_TYPE (op)) > 1 or TYPE_UNSIGNED.
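
A standalone sketch of the suggested test, for illustration only.  The IntInfo struct and has_boolean_range are invented here and do not use GCC's tree or wide_int types; they just model "precision, signedness, and a mask of possibly-nonzero bits".  The point of the restriction is that a signed 1-bit type holds 0 and -1, so a nonzero-bits mask of 1 does not mean the value is in [0..1] there.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of what is known about an integral value;
// not GCC's representation.
struct IntInfo
{
  unsigned precision;          // width of the type in bits
  bool is_unsigned;            // true for unsigned types
  std::uint64_t nonzero_bits;  // bits that may be nonzero
};

// True if the nonzero-bits mask proves the value is 0 or 1.
bool has_boolean_range(const IntInfo &v)
{
  // Signed 1-bit types take the values 0 and -1, so exclude them.
  if (!v.is_unsigned && v.precision <= 1)
    return false;
  // Only the low bit may be set.
  return v.nonzero_bits == 1;
}
```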

Quite surprisingly, this does seem to do better fairly often.  Usually
it's just getting more constants into the PHI nodes without further
improvements.  However occasionally I see a PHI that turns into a
constant, which is then propagated to a use where we're able to simplify
some arithmetic/logical.

Unfortunately it doesn't make a bit of difference in the final output,
so something post DOM was picking up these anyway (most likely VRP2).
But I'm a fan of getting stuff optimized earlier in the pipeline when
it's reasonable to do so, and this seems reasonable.

Limiting to TYPE_PRECISION > 1 or TYPE_UNSIGNED ought to be trivial.

So further testing did show some code regressions from this improvement.
Specifically we were clearly better at propagating boolean values
derived from test conditions into PHIs (and ultimately into real
statements as well).  That was the purpose of the patch.


Where we took a small step backwards was the out-of-ssa translation and 
RTL expansion.  A constant argument in a PHI generates a constant load 
at RTL time.  We have uncprop to detect cases where there are already 
objects holding the value we want and just before out-of-ssa we 
un-propagate the constant.  When the object holding the value we want 
coalesces with the LHS of the PHI (which is most of the time) we win.


uncprop wasn't catching these new cases where we'd propagated constants 
more aggressively into PHI nodes.   This patch fixes that problem.
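
As a source-level illustration of what un-propagation buys (a sketch only; the function names are invented and real uncprop works on PHI arguments in GIMPLE, not on source code): in the first function the merge of r receives the constant 0 on the path where flag == 0, which would need a constant load at expansion time.  In the second, the same path reuses flag, which is already known to hold 0 there, so no new constant has to be materialized.

```cpp
#include <cassert>

// The value merged on the "flag == 0" path is a constant.
int pick_const(int flag, int other)
{
  int r;
  if (flag == 0)
    r = 0;        // constant argument at the merge point
  else
    r = other;
  return r;
}

// Same result, but the "flag == 0" path reuses an object that already
// holds the wanted value, which is the effect uncprop aims for.
int pick_uncprop(int flag, int other)
{
  int r;
  if (flag == 0)
    r = flag;     // flag is known to be 0 on this path
  else
    r = other;
  return r;
}
```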


In all, this is a very small improvement in the generated code.  It may 
ultimately prove more useful in the future to drive path partitioning.


There are two small tests.  One verifies we're able to propagate more 
constants per the original intent of the patch.  The other verifies we 
un-propagate as well.


Bootstrapped and regression tested on x86_64.  Installed on the trunk.

jeff


commit 1384b36abcd52a7ac72ca6538afa2aed2e04f8e0
Author: Jeff Law 
Date:   Fri Jan 15 17:15:24 2016 -0500

PR tree-optimization/69270
* tree-ssanames.c (ssa_name_has_boolean_range): Moved here from
tree-ssa-dom.c.  Improve test for [0..1] range from VRP.
* tree-ssa-dom.c (ssa_name_has_boolean_range): Remove.
* tree-ssanames.h (ssa_name_has_boolean_range): Prototype.
* tree-ssa-uncprop.c (associate_equivalences_with_edges): Use
ssa_name_has_boolean_range and constant_boolean_node.

PR tree-optimization/69270
* gcc.dg/tree-ssa/pr69270-2.c: New test.
* gcc.dg/tree-ssa/pr69270-3.c: New test.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index e3dc328..409e981 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2016-01-15  Jeff Law  
+
+   PR tree-optimization/69270
+   * tree-ssanames.c (ssa_name_has_boolean_range): Moved here from
+   tree-ssa-dom.c.  Improve test for [0..1] range from VRP.
+   * tree-ssa-dom.c (ssa_name_has_boolean_range): Remove.
+   * tree-ssanames.h (ssa_name_has_boolean_range): Prototype.
+   * tree-ssa-uncprop.c (associate_equivalences_with_edges): Use
+   ssa_name_has_boolean_range and constant_boolean_node.
+
 2016-01-15  Vladimir Makarov  
 
PR rtl-optimization/69030
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 29291a2..d9a9246 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2016-01-15  Jeff Law  
+
+   PR tree-optimization/69270
+   * gcc.dg/tree-ssa/pr69270-2.c: New test.
+   * gcc.dg/tree-ssa/pr69270-3.c: New test.
+
 2016-01-15  Paul Thomas  
 
PR fortran/49630
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr69270-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr69270-2.c
new file mode 100644
index 000..15c7bdd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr69270-2.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dom3-details -w" } */
+
+/* There should be a reference to usecount that turn into
+   constants.  */
+/* { dg-final { scan-tree-dump-times "Replaced .usecount_\[0-9\]+. with constant .1." 1 "dom3"} } */
+
+/* And an assignment using usecount ought to fold down t

[PATCH] Fix PR68799

2016-01-15 Thread Bill Schmidt
Hi,

In straight-line strength reduction, it can sometimes happen that more
than one conditional candidate can be predicated upon a PHI node P.
During processing of the first conditional candidate, a new PHI node may
be introduced in the same block as P, with new computations introduced
in predecessors of that block.  If this requires splitting one or more
edges, the PHI statement will change, and as a result it will now have a
different address in storage than it did before.

A problem arose because we cached the addresses of the PHI statements in
a mapping from statements to strength-reduction candidates.  If, after a
PHI statement changes, we refer to it by its new address when consulting
this mapping, we won't find the associated candidate.

Now, this shouldn't happen, because after creating the candidate table,
we should always refer to PHIs by the original address at the time of
the table's creation.  However, I had some sloppy code that caused us to
look up the PHI statement by its result operand, even though we already
had a handle to the original cached PHI statement.  When that code is
used, it can find the moved statement instead, and things go wrong.

This patch solves the problem by simply replacing the sloppy code with a
direct lookup based on the cached PHI statement address, so everything
remains consistent.
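
For anyone who wants to see the failure mode in isolation, here is a tiny sketch with stand-in types.  Stmt, Cand, lookup and stale_lookup_demo are invented for illustration and are not the GCC data structures.  A map keyed by statement address misses once the statement has been re-created at a new address, while a lookup through the originally cached handle still hits.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>

struct Stmt { int result; };   // hypothetical stand-in for a PHI statement
struct Cand { int id; };       // hypothetical stand-in for a candidate

using StmtCandMap = std::unordered_map<const Stmt *, Cand>;

// Returns the candidate recorded for STMT, or nullptr if this address
// was never entered into the map.
const Cand *lookup(const StmtCandMap &map, const Stmt *stmt)
{
  auto it = map.find(stmt);
  return it == map.end() ? nullptr : &it->second;
}

// Returns true if a re-created copy of the statement misses in the map
// while the originally cached handle still finds its candidate.
bool stale_lookup_demo()
{
  StmtCandMap map;
  auto phi = std::make_unique<Stmt>(Stmt{7});
  const Stmt *cached = phi.get();     // address at table-creation time
  map.emplace(cached, Cand{1});

  // Model "the PHI statement changes": same contents, new address.
  // Both objects stay alive here so the addresses are distinct.
  auto moved = std::make_unique<Stmt>(*phi);

  bool miss = lookup(map, moved.get()) == nullptr;
  const Cand *hit = lookup(map, cached);
  return miss && hit != nullptr && hit->id == 1;
}
```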

I haven't added a test case because this is a pretty difficult scenario
to re-create reliably.  The patch is pretty obvious, so I'm not too
concerned about that.

The problem was found originally by Matthias Klose while doing a PGO/LTO
build of python-2.7, and reported as PR68799.  Matthias has tested the
patch for me in his environment, and indeed it fixes the problem.  I've
bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions for GCC 5.3.  Currently doing the same for trunk.

Assuming no regressions is this ok for mainline, and thereafter for GCC
5 and GCC 4.9?

Thanks,
Bill


2016-01-15  Bill Schmidt  

PR tree-optimization/68799
* gimple-ssa-strength-reduction.c (create_phi_basis): Directly
look up phi candidates in the statement-candidate map.
(phi_add_costs): Likewise.
(record_phi_increments): Likewise.
(phi_incr_cost): Likewise.
(ncd_with_phi): Likewise.
(all_phi_incrs_profitable): Likewise.


Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 232394)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -2267,7 +2267,7 @@ create_phi_basis (slsr_cand_t c, gimple from_phi,
   slsr_cand_t basis = lookup_cand (c->basis);
   int nargs = gimple_phi_num_args (from_phi);
   basic_block phi_bb = gimple_bb (from_phi);
-  slsr_cand_t phi_cand = base_cand_from_table (gimple_phi_result (from_phi));
+  slsr_cand_t phi_cand = *stmt_cand_map->get (from_phi);
   phi_args.create (nargs);
 
   /* Process each argument of the existing phi that represents
@@ -2376,7 +2376,7 @@ phi_add_costs (gimple phi, slsr_cand_t c, int one_
 {
   unsigned i;
   int cost = 0;
-  slsr_cand_t phi_cand = base_cand_from_table (gimple_phi_result (phi));
+  slsr_cand_t phi_cand = *stmt_cand_map->get (phi);
 
   /* If we work our way back to a phi that isn't dominated by the hidden
  basis, this isn't a candidate for replacement.  Indicate this by
@@ -2587,7 +2587,7 @@ static void
 record_phi_increments (slsr_cand_t basis, gimple phi)
 {
   unsigned i;
-  slsr_cand_t phi_cand = base_cand_from_table (gimple_phi_result (phi));
+  slsr_cand_t phi_cand = *stmt_cand_map->get (phi);
   
   for (i = 0; i < gimple_phi_num_args (phi); i++)
 {
@@ -2658,7 +2658,7 @@ phi_incr_cost (slsr_cand_t c, const widest_int &in
   unsigned i;
   int cost = 0;
   slsr_cand_t basis = lookup_cand (c->basis);
-  slsr_cand_t phi_cand = base_cand_from_table (gimple_phi_result (phi));
+  slsr_cand_t phi_cand = *stmt_cand_map->get (phi);
 
   for (i = 0; i < gimple_phi_num_args (phi); i++)
 {
@@ -3002,7 +3002,7 @@ ncd_with_phi (slsr_cand_t c, const widest_int &inc
 {
   unsigned i;
   slsr_cand_t basis = lookup_cand (c->basis);
-  slsr_cand_t phi_cand = base_cand_from_table (gimple_phi_result (phi));
+  slsr_cand_t phi_cand = *stmt_cand_map->get (phi);
 
   for (i = 0; i < gimple_phi_num_args (phi); i++)
 {
@@ -3212,7 +3212,7 @@ all_phi_incrs_profitable (slsr_cand_t c, gimple ph
 {
   unsigned i;
   slsr_cand_t basis = lookup_cand (c->basis);
-  slsr_cand_t phi_cand = base_cand_from_table (gimple_phi_result (phi));
+  slsr_cand_t phi_cand = *stmt_cand_map->get (phi);
 
   for (i = 0; i < gimple_phi_num_args (phi); i++)
 {




[aarch64] Fix target/69176

2016-01-15 Thread Richard Henderson
See the PR for details, but basically, the plus operations are special so you
can't just split out one of the alternatives to a different pattern.

This merges the two-instruction add case back into the main plus pattern, and
then adds peepholes and splitters to generate the same code as before.
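
The arithmetic behind the two-instruction add, as a standalone sketch (split_add_immediate is a name made up for this illustration, the expression for the low part is the one used in the patch): the immediate is split into a low part of at most 12 bits that keeps the sign of the original, and a remainder that is a multiple of 4096.  Under the assumption that the original constant fits in 24 bits, each half can then be encoded as an add/sub immediate, the high half using the shifted form.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Splits I into {high, low} with high + low == I, |low| <= 0xfff, and
// low carrying the same sign as I.  Illustration only; I must not be
// INT64_MIN, since -I would overflow.
std::pair<std::int64_t, std::int64_t> split_add_immediate(std::int64_t i)
{
  std::int64_t s = (i >= 0 ? i & 0xfff : -(-i & 0xfff));
  return {i - s, s};
}
```

For example, 0x12345 splits into 0x12000 and 0x345, and -0x12345 into -0x12000 and -0x345.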

Ok?


r~
* config/aarch64/aarch64.md (add3): Move long immediate
operands to pseudo only if CSE is expected.  Split long immediate
operands only after reload, and for the stack pointer.
(*add3_pluslong): Remove.
(*addsi3_aarch64, *adddi3_aarch64): Merge into...
(*add3_aarch64): ... here.  Add r/rk/Upl alternative.
(*addsi3_aarch64_uxtw): Add r/rk/Upl alternative.
(*add3 peepholes): New.
(*add3 splitters): New.
* config/aarch64/constraints.md (Upl): New.
* config/aarch64/predicates.md (aarch64_pluslong_strict_immedate): New.


diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f6c8eb1..bde231b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1590,96 +1590,120 @@
 (plus:GPI (match_operand:GPI 1 "register_operand" "")
  (match_operand:GPI 2 "aarch64_pluslong_operand" "")))]
   ""
-  "
-  if (!aarch64_plus_operand (operands[2], VOIDmode))
+{
+  if (aarch64_pluslong_strict_immedate (operands[2], mode))
 {
-  if (can_create_pseudo_p ())
-   {
- rtx tmp = gen_reg_rtx (mode);
- emit_move_insn (tmp, operands[2]);
- operands[2] = tmp;
-   }
-  else
+  /* Give CSE the opportunity to share this constant across additions.  */
+  if (!cse_not_expected && can_create_pseudo_p ())
+operands[2] = force_reg (mode, operands[2]);
+
+  /* Split will refuse to operate on a modification to the stack pointer.
+Aid the prologue and epilogue expanders by splitting this now.  */
+  else if (reload_completed && operands[0] == stack_pointer_rtx)
{
- HOST_WIDE_INT imm = INTVAL (operands[2]);
- imm = imm >= 0 ? imm & 0xfff : -(-imm & 0xfff);
- emit_insn (gen_add3 (operands[0], operands[1],
-GEN_INT (INTVAL (operands[2]) - imm)));
+ HOST_WIDE_INT i = INTVAL (operands[2]);
+ HOST_WIDE_INT s = (i >= 0 ? i & 0xfff : -(-i & 0xfff));
+ emit_insn (gen_rtx_SET (operands[0],
+ gen_rtx_PLUS (mode, operands[1],
+   GEN_INT (i - s))));
  operands[1] = operands[0];
- operands[2] = GEN_INT (imm);
+ operands[2] = GEN_INT (s);
}
 }
-  "
-)
-
-;; Find add with a 2-instruction immediate and merge into 2 add instructions.
-
-(define_insn_and_split "*add3_pluslong"
-  [(set
-(match_operand:GPI 0 "register_operand" "=r")
-(plus:GPI (match_operand:GPI 1 "register_operand" "r")
- (match_operand:GPI 2 "aarch64_pluslong_immediate" "i")))]
-  "!aarch64_plus_operand (operands[2], VOIDmode)
-   && !aarch64_move_imm (INTVAL (operands[2]), mode)"
-  "#"
-  "&& true"
-  [(set (match_dup 0) (plus:GPI (match_dup 1) (match_dup 3)))
-   (set (match_dup 0) (plus:GPI (match_dup 0) (match_dup 4)))]
-  "
-{
-  HOST_WIDE_INT imm = INTVAL (operands[2]);
-  imm = imm >= 0 ? imm & 0xfff : -(-imm & 0xfff);
-  operands[3] = GEN_INT (INTVAL (operands[2]) - imm);
-  operands[4] = GEN_INT (imm);
-}
-  "
-)
+})
 
-(define_insn "*addsi3_aarch64"
+(define_insn "*add3_aarch64"
   [(set
-(match_operand:SI 0 "register_operand" "=rk,rk,w,rk")
-(plus:SI
- (match_operand:SI 1 "register_operand" "%rk,rk,w,rk")
- (match_operand:SI 2 "aarch64_plus_operand" "I,r,w,J")))]
+(match_operand:GPI 0 "register_operand" "=rk,rk,w,rk,r")
+(plus:GPI
+ (match_operand:GPI 1 "register_operand" "%rk,rk,w,rk,rk")
+ (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Upl")))]
   ""
   "@
-  add\\t%w0, %w1, %2
-  add\\t%w0, %w1, %w2
-  add\\t%0.2s, %1.2s, %2.2s
-  sub\\t%w0, %w1, #%n2"
-  [(set_attr "type" "alu_imm,alu_sreg,neon_add,alu_imm")
-   (set_attr "simd" "*,*,yes,*")]
+  add\\t%0, %1, %2
+  add\\t%0, %1, %2
+  add\\t%0, %1, %2
+  sub\\t%0, %1, #%n2
+  #"
+  [(set_attr "type" "alu_imm,alu_sreg,neon_add,alu_imm,multiple")
+   (set_attr "simd" "*,*,yes,*,*")]
 )
 
 ;; zero_extend version of above
 (define_insn "*addsi3_aarch64_uxtw"
   [(set
-(match_operand:DI 0 "register_operand" "=rk,rk,rk")
+(match_operand:DI 0 "register_operand" "=rk,rk,rk,r")
 (zero_extend:DI
- (plus:SI (match_operand:SI 1 "register_operand" "%rk,rk,rk")
-  (match_operand:SI 2 "aarch64_plus_operand" "I,r,J"))))]
+ (plus:SI (match_operand:SI 1 "register_operand" "%rk,rk,rk,rk")
+  (match_operand:SI 2 "aarch64_pluslong_operand" "I,r,J,Upl"))))]
   ""
   "@
   add\\t%w0, %w1, %2
   add\\t%w0, %w1, %w2
-  sub\\t%w0, %w1, #%n2"
-  [(set_attr "type" "alu_imm,alu_sreg,alu_imm")]
+  sub\\t%w

Re: reject decl with incomplete struct/union type in check_global_declaration()

2016-01-15 Thread Joseph Myers
On Fri, 15 Jan 2016, Prathamesh Kulkarni wrote:

> On 15 January 2016 at 03:27, Joseph Myers  wrote:
> > On Thu, 14 Jan 2016, Prathamesh Kulkarni wrote:
> >
> >> Hi,
> >> For test-case containing only the following declaration:
> >> static struct undefined_struct object;
> >> gcc rejects it at -O0 in assemble_variable() with error "storage size
> >> of  is unknown",
> >> however no error is reported when compiled with -O2.
> >
> > Cf bug 24293 (for the -fsyntax-only case) - does this patch fix that?
> Ah this doesn't fix PR24293, it seems analyze_function() doesn't get
> called for -fsyntax-only.
> I don't have a good solution for this. I assume varpool won't be
> populated for -fsyntax-only ?
> And we need to walk over decls with incomplete struct/union types
> after parsing the whole translation unit.
> In the attached patch, I kept a global vec incomplete_record_decls;
> In finish_decl(), if the decl is static, has type struct/union and
> size 0 then it is appended to incomplete_record_decls.
> In c_parser_translation_unit(), iterate over incomplete_record_decls
> and report an error if any decl has size zero.
> The patch passes testsuite.

There's a GNU C extension allowing forward declarations of enums, and it 
seems that

static enum e x;

doesn't get diagnosed either with -fsyntax-only.  Thus I think you should 
cover that case as well.
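
A toy model of the deferred check being discussed, with enums included as suggested.  Everything here (Kind, Decl, TranslationUnit and its members) is invented for illustration and does not reflect the front end's real data structures; it only shows the shape of "record suspicious decls as they are finished, then diagnose whatever is still incomplete once the whole translation unit has been parsed".

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Kind { Struct, Union, Enum, Other };

// Hypothetical declaration record; size 0 models an incomplete type.
struct Decl
{
  std::string name;
  Kind kind;
  bool is_static;
  unsigned long size;
};

class TranslationUnit
{
public:
  // Called when a declaration is finished: remember static decls whose
  // struct/union/enum type is still incomplete at this point.
  void finish_decl(Decl *d)
  {
    bool tagged = d->kind == Kind::Struct || d->kind == Kind::Union
                  || d->kind == Kind::Enum;
    if (d->is_static && tagged && d->size == 0)
      incomplete_decls_.push_back(d);
  }

  // Called after parsing: return the names that are still incomplete.
  // The type may have been completed after the decl was recorded.
  std::vector<std::string> finish() const
  {
    std::vector<std::string> errors;
    for (const Decl *d : incomplete_decls_)
      if (d->size == 0)
        errors.push_back(d->name);
    return errors;
  }

private:
  std::vector<Decl *> incomplete_decls_;
};
```

Deferring the check matters because a later definition of the tag in the same translation unit makes the earlier declaration valid.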

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 2/2] RFE: poisoning of invalid memory blocks and obstacks

2016-01-15 Thread Jeff Law

On 01/15/2016 01:04 PM, David Malcolm wrote:

It was difficult to track down the memory corruption bug fixed by the
previous patch (PR jit/68446).  The following patch attempts to make
it easier to find that kind of thing by adding "poisoning" code:

(A) when memory blocks are returned to the memory_block_pool's free
 list (e.g. by an obstack), fill the content with a garbage value.

(B) When calling
   obstack_free (obstack, NULL);
 which leaves the obstack requiring reinitialization, fill
 the obstack's fields with a garbage value.

in both cases to try to fail faster on use-after-free errors.
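
A minimal sketch of idea (A), for illustration only.  BlockPool is an invented toy, not memory_block_pool; the block size is arbitrary and 0xde is simply the value the patch happens to use.  On release the block is overwritten with the poison byte before it goes back on the free list, so stale readers see obvious garbage rather than plausible old data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Toy fixed-size block pool that poisons blocks when they are released.
class BlockPool
{
public:
  static constexpr std::size_t block_size = 64;
  static constexpr unsigned char poison = 0xde;

  // Hands out a block, reusing a released one when available.
  unsigned char *allocate()
  {
    if (!free_.empty())
      {
        unsigned char *block = free_.back();
        free_.pop_back();
        return block;
      }
    storage_.push_back(std::make_unique<unsigned char[]>(block_size));
    return storage_.back().get();
  }

  // Returns a block to the pool, filling it with the poison value so
  // that use-after-release reads garbage instead of stale contents.
  void release(unsigned char *block)
  {
    std::memset(block, poison, block_size);
    free_.push_back(block);
  }

private:
  std::vector<std::unique_ptr<unsigned char[]>> storage_;  // owns blocks
  std::vector<unsigned char *> free_;                      // free list
};
```

In a real pool the free-list link lives inside the block itself, so the poisoning has to happen before that link is written.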

This patch isn't ready as-is:
- I couldn't see an equivalent of CHECKING_P for libiberty, so
   case (B) would do it even in a production build.

- this interacts badly with Valgrind; the latter emits messages
   about "Invalid write of size 8"
"16 bytes inside a block of size 65,536 alloc'd"
   I think that it merely needs some extra uses of the valgrind
   annotation macros to fix.

- the garbage/poison values I picked were rather arbitrary

That said, it's survived bootstrap&regrtesting on x86_64-pc-linux-gnu
(in conjunction with the previous patch).

Thoughts?

gcc/ChangeLog:
* memory-block.h (memory_block_pool::release): If CHECKING_P,
fill the released block with a poison value.

libiberty/ChangeLog:
* obstack.c (_obstack_free): If OBJ is zero, poison the
obstack to highlight the need for reinitialization.
---
  gcc/memory-block.h  | 3 +++
  libiberty/obstack.c | 5 +
  2 files changed, 8 insertions(+)

diff --git a/gcc/memory-block.h b/gcc/memory-block.h
index d7b96a3..52c17f9 100644
--- a/gcc/memory-block.h
+++ b/gcc/memory-block.h
@@ -66,6 +66,9 @@ inline void
  memory_block_pool::release (void *uncast_block)
  {
block_list *block = new (uncast_block) block_list;
+#if CHECKING_P
+  memset (block, 0xde, block_size);
+#endif

Is there some reason this isn't if (flag_checking) instead of a #if?

As you note, we don't currently have the concept of checking mode for 
libiberty.  If obstacks weren't opaque, we could wrap obstack_free 
inside GCC and handle poisoning there.


We'll definitely want the valgrind annotations.

This feels like something we should add during gcc7's cycle.  Note that 
we may not be the canonical sources for obstacks -- I'm really not sure 
on that one.


jeff




Re: [PATCH 5/5] s390: Add -fsplit-stack support

2016-01-15 Thread Marcin Kościelnicki

On 15/01/16 19:38, Andreas Krebbel wrote:

Marcin,

your implementation looks very good to me. Thanks!

But please be aware that we deprecated the support of g5 and g6 and
intend to remove that code from the back-end with the next GCC version.
So I would prefer if you could remove all the !TARGET_CPU_ZARCH stuff
from the implementation and just error out if split-stack is enabled
with -march g5/g6.  It currently makes the implementation more
complicated and would have to be removed anyway in the future.

Thanks!

https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01854.html


Bye,

-Andreas-




Very well, I'll do that.

Btw, as for dropping support for g5/g6: I've noticed 
s390_function_profiler could also use larl+brasl for -m31 given 
TARGET_CPU_ZARCH.  Should I submit a patch for that?  I'm asking because 
gold with -fsplit-stack needs to know the exact sequence used, so if 
it's going to change after g5/g6 removal, I'd better add it to gold now 
(and make gcc always emit it for non-g5/g6, so that gold won't need to 
look at the old one).


What about the other patches?  #1 and #2 should be ready to go.  I'm not 
sure how I should go about getting #3 and #4 reviewed.  We don't need #3 
anymore once g5/g6 support is removed, but #4 might still be necessary - 
we still have that unconditional jump.


Marcin Kościelnicki


Re: [PATCH] fix #69277 - [6 Regression] ICE mangling a flexible array member

2016-01-15 Thread Jason Merrill

On 01/14/2016 10:01 PM, Martin Sebor wrote:

In anticipation of needing to do something I put together the attached
patch that rolls this change into version 10, letting version 9 and
prior roll it back.  I also mention it in the manual.  What the patch
doesn't do is add a warning.


Looks good, but please also add a warning.  See the use of 
abi_warn_or_compat_version_crosses elsewhere in mangle.c.


Jason



Re: [PATCH] add test for c++/68490 - error initializing a structure with a flexible array member

2016-01-15 Thread Jason Merrill

On 01/14/2016 07:00 PM, Martin Sebor wrote:

Among the bugs fixed by the flexible array patch (r231665) was
c++/68490.  I forgot to include a test for this bug in the commit
so I'm adding it via the attached patch.

(Please let me know if adding new passing tests is considered
trivial and I don't need to request approval for such things.)


Yes, go ahead and add tests without waiting for approval.


+// { dg-options "-Wpedantic -Wno-error=pedantic" }


You shouldn't need the -Wno-error=pedantic, though.

Jason



[PATCH] [graphite] fix pr68692: reinstantiate the copy of internal parameters

2016-01-15 Thread Sebastian Pop
Adding a testcase and reverting this patch:
[PATCH] remove parameter_rename_map

This map was used in the transition to the new scop detection: with the new scop
detection, we do not need this map anymore.

   * graphite-isl-ast-to-gimple.c (gcc_expression_from_isl_ast_expr_id):
   Remove use of parameter_rename_map.
   (copy_def): Remove.
   (copy_internal_parameters): Remove.
   (graphite_regenerate_ast_isl): Remove call to copy_internal_parameters.
   * sese.c (new_sese_info): Do not initialize parameter_rename_map.
   (free_sese_info): Do not free parameter_rename_map.
   (set_rename): Do not use parameter_rename_map.
   (rename_uses): Update call to set_rename.
   (graphite_copy_stmts_from_block): Do not use parameter_rename_map.
   * sese.h (parameter_rename_map_t): Remove.
   (struct sese_info_t): Remove field parameter_rename_map.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@229783 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/graphite-isl-ast-to-gimple.c   | 107 -
 gcc/sese.c |   4 +
 gcc/sese.h |   4 +
 gcc/testsuite/gfortran.dg/graphite/pr68692.f90 |  64 +++
 4 files changed, 178 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/graphite/pr68692.f90

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index d7d0f7a..a8f88ff 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -512,7 +512,11 @@ gcc_expression_from_isl_ast_expr_id (tree type,
  "Could not map isl_id to tree expression");
   isl_ast_expr_free (expr_id);
   tree t = res->second;
-  return fold_convert (type, t);
+  tree *val = region->parameter_rename_map->get(t);
+
+  if (!val)
+   val = &t;
+  return fold_convert (type, *val);
 }
 
 /* Converts an isl_ast_expr_int expression E to a GCC expression tree of
@@ -1475,6 +1479,13 @@ translate_isl_ast_to_gimple::set_rename (tree old_name, tree expr)
   r.safe_push (expr);
   region->rename_map->put (old_name, r);
 }
+
+  tree t;
+  int i;
+  /* For a parameter of a scop we don't want to rename it.  */
+  FOR_EACH_VEC_ELT (region->params, i, t)
+if (old_name == t)
+  region->parameter_rename_map->put(old_name, expr);
 }
 
 /* Return an iterator to the instructions comes last in the execution order.
@@ -2735,6 +2746,14 @@ should_copy_to_new_region (gimple *stmt, sese_info_p region)
   && scev_analyzable_p (lhs, region->region))
 return false;
 
+  /* Do not copy parameters that have been generated in the header of the
+ scop.  */
+  if (is_gimple_assign (stmt)
+  && (lhs = gimple_assign_lhs (stmt))
+  && TREE_CODE (lhs) == SSA_NAME
+  && region->parameter_rename_map->get(lhs))
+return false;
+
   return true;
 }
 
@@ -2800,6 +2819,25 @@ translate_isl_ast_to_gimple::graphite_copy_stmts_from_block (basic_block bb,
   if (codegen_error_p ())
return false;
 
+  /* For each SSA_NAME in the parameter_rename_map rename their usage.  */
+  ssa_op_iter iter;
+  use_operand_p use_p;
+  if (!is_gimple_debug (copy))
+   FOR_EACH_SSA_USE_OPERAND (use_p, copy, iter, SSA_OP_USE)
+ {
+   tree old_name = USE_FROM_PTR (use_p);
+
+   if (TREE_CODE (old_name) != SSA_NAME
+   || SSA_NAME_IS_DEFAULT_DEF (old_name))
+ continue;
+
+   tree *new_expr = region->parameter_rename_map->get (old_name);
+   if (!new_expr)
+ continue;
+
+   replace_exp (use_p, *new_expr);
+ }
+
   update_stmt (copy);
 }
 
@@ -3111,6 +3149,70 @@ translate_isl_ast_to_gimple::scop_to_isl_ast (scop_p scop)
   return ast_isl;
 }
 
+/* Copy def from sese REGION to the newly created TO_REGION. TR is defined by
+   DEF_STMT. GSI points to entry basic block of the TO_REGION.  */
+
+static void
+copy_def (tree tr, gimple *def_stmt, sese_info_p region, sese_info_p to_region,
+ gimple_stmt_iterator *gsi)
+{
+  if (!defined_in_sese_p (tr, region->region))
+return;
+
+  ssa_op_iter iter;
+  use_operand_p use_p;
+  FOR_EACH_SSA_USE_OPERAND (use_p, def_stmt, iter, SSA_OP_USE)
+{
+  tree use_tr = USE_FROM_PTR (use_p);
+
+  /* Do not copy parameters that have been generated in the header of the
+scop.  */
+  if (region->parameter_rename_map->get(use_tr))
+   continue;
+
+  gimple *def_of_use = SSA_NAME_DEF_STMT (use_tr);
+  if (!def_of_use)
+   continue;
+
+  copy_def (use_tr, def_of_use, region, to_region, gsi);
+}
+
+  gimple *copy = gimple_copy (def_stmt);
+  gsi_insert_after (gsi, copy, GSI_NEW_STMT);
+
+  /* Create new names for all the definitions created by COPY and
+ add replacement mappings for each new name.  */
+  def_operand_p def_p;
+  ssa_op_iter op_iter;
+  FOR_EACH_SSA_DEF_OPERAND (def_p, copy, op_iter, SSA_OP_ALL_DEFS)
+{

Re: [PATCH] Fix PR c++/69091 (ICE with operator overload having 'auto' return type)

2016-01-15 Thread Jason Merrill

OK.

Jason


Re: [PATCH] Decrease size of cp_token (PR bootstrap/68271)

2016-01-15 Thread Jason Merrill

OK.

Jason


Re: [Patch, fortran] Bug 68241 - [meta-bug] Deferred-length character - PRs49630, 54070, 60593, 60795, 61147, 63232 and 64324

2016-01-15 Thread Paul Richard Thomas
Dear All,

Following an exchange with Dominique on #gfortran, I fixed PR54070
comment #23. The changes are in trans-array.c and are listed in the
ChangeLogs below.

Committed to trunk as revision 232450. I will wait some weeks before
committing to 5-branch. This patch should make deferred character
lengths a rather more usable feature. They still don't work in common
blocks (PR55735) and there are still problems with them as associate
variables (PR60458). I will endeavour to fix these PRs next.

Thanks, Dominique!

Paul

On 9 January 2016 at 20:33, Paul Richard Thomas
 wrote:
> Dear All,
>
> This is a further instalment of deferred character length fixes. I
> have listed the status of all the deferred length PRs that I know of
> in an attachment. As far as I can see, there are five left that are
> really concerned with deferred character length functionality.
>
> In terms of the number of PRs fixed, this patch is rather less
> impressive than it looks. Essentially four things have been fixed:
> (i) Deferred character length results are passed by reference and so,
> within the procedure itself, they are consistently indirectly
> referenced;
> (ii) The deferred character types are made correctly by indirectly
> referencing the character length;
> (iii) Array references to deferred character arrays use pointer
> arithmetic; and
> (iv) Scalar assignments to unallocated arrays are trapped at runtime
> with -fcheck=mem.
>
> A minor tweak was required to fix PR64324 because deferred length
> characters were being misidentified as assumed length.
>
> The ChangeLog is clear as to what has been done. The only point on
> which I am uncertain is that of making the length parameter of
> deferred character length procedure results TREE_STATIC. This was
> required to make the patch function correctly at any level of
> optimization. Is this the best and/or only way of doing this?
>
> Bootstrapped and regtested on FC21/x86_64 - OK for trunk and, after a
> decent interval, 5 branch?
>
> Cheers
>
> Paul
>
> 2016-01-09  Paul Thomas  
>
> PR fortran/64324
> * resolve.c (check_uop_procedure): Prevent deferred length
> characters from being trapped by assumed length error.
>
> PR fortran/49630
> PR fortran/54070
> PR fortran/60593
> PR fortran/60795
> PR fortran/61147
> PR fortran/64324
> * trans-array.c (gfc_conv_scalarized_array_ref): Pass decl for
> function as well as variable expressions.
> * trans.c (gfc_build_array_ref): Expand logic for setting span
> to include indirect references to character lengths.
> * trans-decl.c (gfc_get_symbol_decl): Ensure that deferred
> result char lengths that are PARM_DECLs are indirectly
> referenced both for directly passed and by reference.
> (create_function_arglist): If the length type is a pointer type
> then store the length as the 'passed_length' and make the char
> length an indirect reference to it.
> (gfc_trans_deferred_vars): If a character length has escaped
> being set as an indirect reference, return it via the 'passed
> length'.
> * trans-expr.c (gfc_conv_procedure_call): The length of
> deferred character length results is set TREE_STATIC and set to
> zero.
> (gfc_trans_assignment_1): Do not fix the rse string_length if
> it is a variable, a parameter or an indirect reference. Add the
> code to trap assignment of scalars to unallocated arrays.
> * trans-stmt.c (gfc_trans_allocate): Remove 'def_str_len' and
> all references to it. Instead, replicate the code to obtain an
> explicitly defined string length and provide a value before
> array allocation so that the dtype is correctly set.
> * trans-types.c (gfc_get_character_type): If the character length
> is a pointer, use the indirect reference.
>
> 2016-01-09  Paul Thomas  
>
> PR fortran/49630
> * gfortran.dg/deferred_character_13.f90: New test for the fix
> of comment 3 of the PR.
>
> PR fortran/54070
> * gfortran.dg/deferred_character_8.f90: New test
> * gfortran.dg/allocate_error_5.f90: New test
>
> PR fortran/60593
> * gfortran.dg/deferred_character_10.f90: New test
>
> PR fortran/60795
> * gfortran.dg/deferred_character_14.f90: New test
>
> PR fortran/61147
> * gfortran.dg/deferred_character_11.f90: New test
>
> PR fortran/64324
> * gfortran.dg/deferred_character_9.f90: New test



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein


Re: [hsa merge 08/10] HSAIL BRIG description header file

2016-01-15 Thread Richard Biener
On January 15, 2016 6:50:51 PM GMT+01:00, Mike Stump  
wrote:
>On Jan 15, 2016, at 2:37 AM, Jakub Jelinek  wrote:
>> HSA Foundation grants express permission to any current Founder,
>Promoter,
>> Supporter Contributor, Academic or Associate member of HSA Foundation
>to
>> copy and redistribute UNMODIFIED versions of this specification
>
>So, this isn’t the GNU way.  We need to get permission from and they
>need to grant us, or they need to sign an assignment or it needs to be
>reimplemented.
>
>They need to ask themselves, if they want us to support their standard
>or not.  Getting permission might take a week to a month, but, it is
>better to go that route.  If they don’t want to grant us what we want,
>then they are likely to want to sue users of our compiler, and in that
>case, we are better not putting it in in the first place.
>
>My vote would be for the SC to nix this until the issue is resolved.

It's a non-copyrightable set of magic numbers associated with identifiers
(which we could even change).

Richard.




Re: [PATCH] Fix -Wformat-security warning in libgfortran

2016-01-15 Thread Paul Richard Thomas
Hi Jakub,

Of course, that's OK; obvious even - good for trunk.

Thanks

Paul

On 15 January 2016 at 21:07, Jakub Jelinek  wrote:
> Hi!
>
> In our gcc package build, libgfortran is built with -Werror=format-security
> and errors on this file.  While it is a false positive, because
> cmdmsg_values[i] for any valid i don't contain % characters, IMNSHO it is
> better to use "%s", msg anyway to make it clear that msg should not be
> interpreted as a format string.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-01-15  Jakub Jelinek  
>
> * intrinsics/execute_command_line.c (set_cmdstat): Use "%s", msg
> instead of msg to avoid -Wformat-security warning.
>
> --- libgfortran/intrinsics/execute_command_line.c.jj 2016-01-04 15:14:11.0 +0100
> +++ libgfortran/intrinsics/execute_command_line.c   2016-01-15 14:47:32.132158422 +0100
> @@ -1,6 +1,6 @@
>  /* Implementation of the EXECUTE_COMMAND_LINE intrinsic.
> Copyright (C) 2009-2016 Free Software Foundation, Inc.
> -   Contributed by François-Xavier Coudert.
> +   Contributed by François-Xavier Coudert.
>
>  This file is part of the GNU Fortran runtime library (libgfortran).
>
> @@ -55,7 +55,7 @@ set_cmdstat (int *cmdstat, int value)
>  #define MSGLEN 200
>char msg[MSGLEN] = "EXECUTE_COMMAND_LINE: ";
>strncat (msg, cmdmsg_values[value], MSGLEN - strlen(msg) - 1);
> -  runtime_error (msg);
> +  runtime_error ("%s", msg);
>  }
>  }
>
>
> Jakub



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein


[PATCH] Decrease size of cp_token (PR bootstrap/68271)

2016-01-15 Thread Jakub Jelinek
Hi!

As discussed in bugzilla some time ago, this patch decreases the size of
cp_token on 64-bit hosts from 24 bytes to 16 bytes and on 32-bit
hosts from 16 bytes to 12.  Since for C++ all tokens are preparsed, this is
quite important.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
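
To illustrate where the saving comes from, here is a minimal sketch using
hypothetical stand-in structs (old_token and new_token are not the real
cp_token, and plain unsigned bit-fields stand in for ENUM_BITFIELD and
BOOL_BITFIELD).  Bit-field layout is implementation-defined, so the exact
sizes are an assumption about typical x86_64 and i686 ABIs: dropping the
8-bit field lets the three flag bits share the first 32-bit storage unit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the old layout: the 8-bit pragma_kind field fills
   the first 32-bit unit, so the three flag bits spill into a second one.  */
struct old_token
{
  unsigned type : 8;
  unsigned keyword : 8;
  unsigned char flags;
  unsigned pragma_kind : 8;
  unsigned implicit_extern_c : 1;
  unsigned error_reported : 1;
  unsigned purged_p : 1;
  unsigned location;
  union { void *value; } u;
};

/* Hypothetical model of the new layout: without pragma_kind the flag bits
   fit next to type, keyword and flags, leaving 5 unused bits.  */
struct new_token
{
  unsigned type : 8;
  unsigned keyword : 8;
  unsigned char flags;
  unsigned implicit_extern_c : 1;
  unsigned error_reported : 1;
  unsigned purged_p : 1;
  unsigned location;
  union { void *value; } u;
};

size_t
old_token_size (void)
{
  return sizeof (struct old_token);
}

size_t
new_token_size (void)
{
  return sizeof (struct new_token);
}
```

Printing both sizes on a given host is a quick way to check whether the
layout assumption holds there.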

2016-01-15  Jakub Jelinek  

PR bootstrap/68271
* parser.h (cp_token): Remove pragma_kind field.  Add comment
with number of unused bits.
* parser.c (eof_token): Remove pragma_kind field initializer.
(cp_lexer_get_preprocessor_token): Don't set pragma_kind
field, don't clear CPP_PRAGMA u.value.
(cp_parser_pragma_kind): New function.
(cp_parser_omp_sections_scope, cp_parser_oacc_kernels_parallel,
cp_parser_omp_construct, cp_parser_initial_pragma,
cp_parser_pragma): Use cp_parser_pragma_kind instead of accessing
pragma_kind field.

* c-pragma.c (c_register_pragma_1): Adjust comment to note that
C++ FE no longer has limit on number of pragmas.

--- gcc/cp/parser.h.jj  2016-01-04 14:55:57.0 +0100
+++ gcc/cp/parser.h 2016-01-15 15:46:53.479338001 +0100
@@ -47,8 +47,6 @@ struct GTY (()) cp_token {
   ENUM_BITFIELD (rid) keyword : 8;
   /* Token flags.  */
   unsigned char flags;
-  /* Identifier for the pragma.  */
-  ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
   /* True if this token is from a context where it is implicitly extern "C" */
   BOOL_BITFIELD implicit_extern_c : 1;
   /* True if an error has already been reported for this token, such as a
@@ -59,6 +57,7 @@ struct GTY (()) cp_token {
  it is no longer a valid token and it should be considered
  deleted.  */
   BOOL_BITFIELD purged_p : 1;
+  /* 5 unused bits.  */
   /* The location at which this token was found.  */
   location_t location;
   /* The value associated with this token, if any.  */
--- gcc/cp/parser.c.jj  2016-01-14 22:31:22.0 +0100
+++ gcc/cp/parser.c 2016-01-15 16:24:42.520089347 +0100
@@ -48,7 +48,7 @@ along with GCC; see the file COPYING3.
 
 static cp_token eof_token =
 {
-  CPP_EOF, RID_MAX, 0, PRAGMA_NONE, false, false, false, 0, { NULL }
+  CPP_EOF, RID_MAX, 0, false, false, false, 0, { NULL }
 };
 
 /* The various kinds of non integral constant we encounter. */
@@ -782,7 +782,6 @@ cp_lexer_get_preprocessor_token (cp_lexe
 = c_lex_with_flags (&token->u.value, &token->location, &token->flags,
lexer == NULL ? 0 : C_LEX_STRING_NO_JOIN);
   token->keyword = RID_MAX;
-  token->pragma_kind = PRAGMA_NONE;
   token->purged_p = false;
   token->error_reported = false;
 
@@ -848,13 +847,6 @@ cp_lexer_get_preprocessor_token (cp_lexe
default:token->keyword = C_RID_CODE (token->u.value);
}
 }
-  else if (token->type == CPP_PRAGMA)
-{
-  /* We smuggled the cpp_token->u.pragma value in an INTEGER_CST.  */
-  token->pragma_kind = ((enum pragma_kind)
-   TREE_INT_CST_LOW (token->u.value));
-  token->u.value = NULL_TREE;
-}
 }
 
 /* Update the globals input_location and the input file stack from TOKEN.  */
@@ -2689,6 +2681,18 @@ cp_parser_is_keyword (cp_token* token, e
   return token->keyword == keyword;
 }
 
+/* Return TOKEN's pragma_kind if it is CPP_PRAGMA, otherwise
+   PRAGMA_NONE.  */
+
+static enum pragma_kind
+cp_parser_pragma_kind (cp_token *token)
+{
+  if (token->type != CPP_PRAGMA)
+return PRAGMA_NONE;
+  /* We smuggled the cpp_token->u.pragma value in an INTEGER_CST.  */
+  return (enum pragma_kind) TREE_INT_CST_LOW (token->u.value);
+}
+
 /* Helper function for cp_parser_error.
Having peeked a token of kind TOK1_KIND that might signify
a conflict marker, peek successor tokens to determine
@@ -33937,7 +33941,8 @@ cp_parser_omp_sections_scope (cp_parser
 
   stmt = push_stmt_list ();
 
-  if (cp_lexer_peek_token (parser->lexer)->pragma_kind != PRAGMA_OMP_SECTION)
+  if (cp_parser_pragma_kind (cp_lexer_peek_token (parser->lexer))
+  != PRAGMA_OMP_SECTION)
 {
   substmt = cp_parser_omp_structured_block (parser);
   substmt = build1 (OMP_SECTION, void_type_node, substmt);
@@ -33952,7 +33957,7 @@ cp_parser_omp_sections_scope (cp_parser
   if (tok->type == CPP_EOF)
break;
 
-  if (tok->pragma_kind == PRAGMA_OMP_SECTION)
+  if (cp_parser_pragma_kind (tok) == PRAGMA_OMP_SECTION)
{
  cp_lexer_consume_token (parser->lexer);
  cp_parser_require_pragma_eol (parser, tok);
@@ -35356,7 +35361,7 @@ cp_parser_oacc_kernels_parallel (cp_pars
 {
   omp_clause_mask mask;
   enum tree_code code;
-  switch (pragma_tok->pragma_kind)
+  switch (cp_parser_pragma_kind (pragma_tok))
 {
 case PRAGMA_OACC_KERNELS:
   strcat (p_name, " kernels");
@@ -36572,7 +36577,7 @@ cp_parser_omp_construct (cp_parser *pars
   char p_name[sizeof "#pragma omp teams distribute parallel for simd"];
   omp_clause_mask mask (0);
 
-  switch (pragma_tok->pragma_kind)
+  sw

Re: [PATCH] Fix warning in adaint.c

2016-01-15 Thread Arnaud Charlet
> I've noticed
> ../../gcc/ada/adaint.c: In function 'char*
> __gnat_locate_exec_on_path(char*)':
> ../../gcc/ada/adaint.c:2799:34: warning: deprecated conversion from
> string constant to 'char*' [-Wwrite-strings]
>if (path_val == NULL) path_val = "";
>   ^
> warning, fixed thusly.
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK, thanks.

> 2016-01-15  Jakub Jelinek  
> 
>   * adaint.c (__gnat_locate_exec_on_path): Use const char * instead
>   of char * for path_val to avoid warnings.
> 
> --- gcc/ada/adaint.c.jj   2015-11-18 11:19:23.412735554 +0100
> +++ gcc/ada/adaint.c  2016-01-15 14:23:31.029079447 +0100
> @@ -2791,7 +2791,7 @@ __gnat_locate_exec_on_path (char *exec_n
>WS2SC (apath_val, wapath_val, EXPAND_BUFFER_SIZE);
>  
>  #else
> -  char *path_val = getenv ("PATH");
> +  const char *path_val = getenv ("PATH");
>  
>/* If PATH is not defined, proceed with __gnat_locate_exec anyway, so we
>can
>   find files that contain directory names.  */
> 
>   Jakub
> 


[PATCH] Fix -Wformat-security warning in libgfortran

2016-01-15 Thread Jakub Jelinek
Hi!

In our gcc package build, libgfortran is built with -Werror=format-security
and errors on this file.  While it is a false positive, because
cmdmsg_values[i] for any valid i don't contain % characters, IMNSHO it is
better to use "%s", msg anyway to make it clear that msg should not be
interpreted as a format string.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
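
As a small standalone illustration of the idiom (format_message is a
hypothetical helper, not part of libgfortran), passing the text through
"%s" guarantees that any % characters in it are copied as data rather
than parsed as conversion specifications:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy MSG into BUF as plain data.  Because MSG is
   an argument to "%s" and not the format itself, a stray '%' in MSG is
   harmless.  Returns the number of characters snprintf would write.  */
int
format_message (char *buf, size_t size, const char *msg)
{
  return snprintf (buf, size, "%s", msg);
}
```

With a call such as format_message (buf, sizeof buf, "100%d done"), the
"%d" ends up in the buffer literally instead of consuming a nonexistent
argument.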

2016-01-15  Jakub Jelinek  

* intrinsics/execute_command_line.c (set_cmdstat): Use "%s", msg
instead of msg to avoid -Wformat-security warning.

--- libgfortran/intrinsics/execute_command_line.c.jj 2016-01-04 15:14:11.0 +0100
+++ libgfortran/intrinsics/execute_command_line.c   2016-01-15 14:47:32.132158422 +0100
@@ -1,6 +1,6 @@
 /* Implementation of the EXECUTE_COMMAND_LINE intrinsic.
Copyright (C) 2009-2016 Free Software Foundation, Inc.
-   Contributed by François-Xavier Coudert.
+   Contributed by François-Xavier Coudert.
 
 This file is part of the GNU Fortran runtime library (libgfortran).
 
@@ -55,7 +55,7 @@ set_cmdstat (int *cmdstat, int value)
 #define MSGLEN 200
   char msg[MSGLEN] = "EXECUTE_COMMAND_LINE: ";
   strncat (msg, cmdmsg_values[value], MSGLEN - strlen(msg) - 1);
-  runtime_error (msg);
+  runtime_error ("%s", msg);
 }
 }
 

Jakub


[PATCH] Fix warning in adaint.c

2016-01-15 Thread Jakub Jelinek
Hi!

I've noticed
../../gcc/ada/adaint.c: In function 'char* __gnat_locate_exec_on_path(char*)':
../../gcc/ada/adaint.c:2799:34: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
   if (path_val == NULL) path_val = "";
  ^
warning, fixed thusly.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
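
For reference, a minimal sketch of the pattern outside adaint.c
(env_or_empty is a hypothetical name): since the variable is only read,
declaring it const char * lets it point at either the getenv result or a
string literal without the conversion warning.

```c
#include <stdlib.h>

/* Hypothetical helper: return the value of environment variable NAME,
   or an empty string when it is unset.  The result is const-qualified
   because it may point at a string literal.  */
const char *
env_or_empty (const char *name)
{
  const char *val = getenv (name);

  if (val == NULL)
    val = "";
  return val;
}
```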

2016-01-15  Jakub Jelinek  

* adaint.c (__gnat_locate_exec_on_path): Use const char * instead
of char * for path_val to avoid warnings.

--- gcc/ada/adaint.c.jj 2015-11-18 11:19:23.412735554 +0100
+++ gcc/ada/adaint.c 2016-01-15 14:23:31.029079447 +0100
@@ -2791,7 +2791,7 @@ __gnat_locate_exec_on_path (char *exec_n
   WS2SC (apath_val, wapath_val, EXPAND_BUFFER_SIZE);
 
 #else
-  char *path_val = getenv ("PATH");
+  const char *path_val = getenv ("PATH");
 
   /* If PATH is not defined, proceed with __gnat_locate_exec anyway, so we can
  find files that contain directory names.  */

Jakub


Re: [hsa 2/10] Modifications to libgomp proper

2016-01-15 Thread Jakub Jelinek
On Tue, Jan 12, 2016 at 03:23:32PM +0100, Jakub Jelinek wrote:
> But looking at GOMP_PLUGIN_target_task_completion, I see we have a bug in
> there,
>   gomp_mutex_lock (&team->task_lock);
>   if (ttask->state == GOMP_TARGET_TASK_READY_TO_RUN)
> {
>   ttask->state = GOMP_TARGET_TASK_FINISHED;
>   gomp_mutex_unlock (&team->task_lock);
> }
>   ttask->state = GOMP_TARGET_TASK_FINISHED;
>   gomp_target_task_completion (team, task);
>   gomp_mutex_unlock (&team->task_lock);
> there was meant to be I think return; after the first unlock, otherwise
> it doubly unlocks the same lock, and performs gomp_target_task_completion
> without the lock held, which may cause great havoc.

I've bootstrapped/regtested this on x86_64-linux and i686-linux (no
offloading), and regtested on x86_64-linux -> x86_64-intelmicemul-linux
offloading, and then tested also with sleep (1) added in between gomp_mutex_lock
and the preceding gomp_target_task_fn call, both without and with the fix.
Without the fix, with the extra sleeps, 3 target-3*.c tests FAILed (crashed
or hung forever); with the fix everything was ok.

Installed on the trunk.
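
The shape of the fix can be shown with a toy model (toy_lock, toy_task and
toy_task_completion are hypothetical names, not libgomp code).  The toy
lock asserts on a double unlock, which is the failure the missing return
caused; with the early return each path releases the lock exactly once and
the completion work only runs with the lock held.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that traps misuse instead of providing real mutual exclusion.  */
struct toy_lock { bool held; };

static void
toy_lock_acquire (struct toy_lock *l)
{
  assert (!l->held);
  l->held = true;
}

static void
toy_lock_release (struct toy_lock *l)
{
  assert (l->held);	/* Fires on a double unlock.  */
  l->held = false;
}

enum toy_state { TOY_READY_TO_RUN, TOY_RUNNING, TOY_FINISHED };

struct toy_task
{
  struct toy_lock lock;
  enum toy_state state;
  int completions;	/* Counts runs of the completion work.  */
};

void
toy_task_completion (struct toy_task *t)
{
  toy_lock_acquire (&t->lock);
  if (t->state == TOY_READY_TO_RUN)
    {
      t->state = TOY_FINISHED;
      toy_lock_release (&t->lock);
      return;	/* Without this, the code below unlocks a second time.  */
    }
  t->state = TOY_FINISHED;
  t->completions++;	/* Stands in for the real completion call.  */
  toy_lock_release (&t->lock);
}
```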

2016-01-15  Jakub Jelinek  

* task.c (GOMP_PLUGIN_target_task_completion): Add missing return.

--- libgomp/task.c.jj   2016-01-04 14:38:59.0 +0100
+++ libgomp/task.c  2016-01-15 00:11:08.851909133 +0100
@@ -579,6 +579,7 @@ GOMP_PLUGIN_target_task_completion (void
 {
   ttask->state = GOMP_TARGET_TASK_FINISHED;
   gomp_mutex_unlock (&team->task_lock);
+  return;
 }
   ttask->state = GOMP_TARGET_TASK_FINISHED;
   gomp_target_task_completion (team, task);


Jakub


Re: [PATCH, PR68976] Use reaching def phi arg in sese_add_exit_phis_edge

2016-01-15 Thread Sebastian Pop
On Fri, Jan 15, 2016 at 11:19 AM, Sebastian Pop  wrote:
> On Fri, Jan 15, 2016 at 7:58 AM, Tom de Vries  wrote:
>> During scop detection/canonicalize_loop_closed_ssa_form, an exit phi is
>> introduced in the loop for _24:
>> ...
>>   :
>>   # _58 = PHI <_24(22)>
>> ...
>> Note that _24 is not defined in the loop, but before it. AFAIU the header
>> comment of canonicalize_loop_closed_ssa_form, this phi is not needed. That
>> might be the root cause of the bug,
>
> I think that may be the problem, as it is invariant in the loops, so
> it is considered to be a parameter of the scop.
> Let me see if we could avoid adding that phi node in the first place.

I just sent out a patch that implements this.
Thanks Tom for pointing out the issue!


[PATCH] [graphite] fix PR68976: only add loop close phi for names defined in loop

2016-01-15 Thread Sebastian Pop
* graphite-isl-ast-to-gimple.c: Fix comment.
* graphite-scop-detection.c (defined_in_loop_p): New.
(canonicalize_loop_closed_ssa): Do not add close phi nodes for SSA
names not defined in loop.

gcc/testsuite

* gcc.dg/graphite/pr68976.c: New test.
---
 gcc/graphite-isl-ast-to-gimple.c|  4 ++--
 gcc/graphite-scop-detection.c   | 13 -
 gcc/testsuite/gcc.dg/graphite/pr68976.c | 11 +++
 3 files changed, 25 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/pr68976.c

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index df52c49..9509af4 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -492,8 +492,8 @@ private:
 /* Return the tree variable that corresponds to the given isl ast identifier
expression (an isl_ast_expr of type isl_ast_expr_id).
 
-   FIXME: We should replace blind conversation of id's type with derivation
-   of the optimal type when we get the corresponding isl support. Blindly
+   FIXME: We should replace blind conversion of id's type with derivation
+   of the optimal type when we get the corresponding isl support.  Blindly
converting type sizes may be problematic when we switch to smaller
types.  */
 
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 8dfb20a..6ddf7bb 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -336,6 +336,15 @@ make_close_phi_nodes_unique (basic_block bb)
 }
 }
 
+/* Return true when NAME is defined in LOOP.  */
+
+static bool
+defined_in_loop_p (tree name, loop_p loop)
+{
+  gcc_assert (TREE_CODE (name) == SSA_NAME);
+  return loop == loop_containing_stmt (SSA_NAME_DEF_STMT (name));
+}
+
 /* Transforms LOOP to the canonical loop closed SSA form.  */
 
 static void
@@ -376,7 +385,9 @@ canonicalize_loop_closed_ssa (loop_p loop)
use_operand_p use_p;
gphi *close_phi;
 
-   if (TREE_CODE (arg) != SSA_NAME)
+   /* Only add close phi nodes for SSA_NAMEs defined in LOOP.  */
+   if (TREE_CODE (arg) != SSA_NAME
+   || !defined_in_loop_p (arg, loop))
  continue;
 
close_phi = create_phi_node (NULL_TREE, close);
diff --git a/gcc/testsuite/gcc.dg/graphite/pr68976.c b/gcc/testsuite/gcc.dg/graphite/pr68976.c
new file mode 100644
index 000..ae9bf0f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/pr68976.c
@@ -0,0 +1,11 @@
+/* { dg-options "-O2 -floop-nest-optimize" } */
+
+int kw = -1, hv = -1, ju;
+int mc[1];
+void xx(void)
+{
+  for (; kw; ++kw)
+for (; hv; ++hv)
+  for (ju = 0; ju < 2; ++ju)
+mc[kw+1] = mc[0];
+}
-- 
2.5.0



Re: [hsa merge 07/10] IPA-HSA pass

2016-01-15 Thread Jakub Jelinek
On Fri, Jan 15, 2016 at 10:46:32PM +0300, Alexander Monakov wrote:
> On Fri, 15 Jan 2016, Jakub Jelinek wrote:
> 
> > On Fri, Jan 15, 2016 at 10:19:13PM +0300, Alexander Monakov wrote:
> > > Sorry, can you clarify -- what do you mean by "can't offload"?
> > 
> > I meant stuff like setjmp/longjmp, exceptions?, alloca (I know your changes
> > might fix this one), computed goto, non-local goto, and the like, which I
> > believe nvptx doesn't support.
> 
> Right, but such issues are diagnosed as a compile-time error; the run-time
> stage is simply not reached.
> 
> Did you mean that eventually GCC might change and somehow allow compilation to
> run to completion even though offloaded code cannot be fully generated?

Yeah, at least depending on some option, either
downgrade all errors in the offloading compiler into warnings that just
result in the offloading image for the particular accelerator not being
created, or issue errors, but still allow the linking.

Jakub


Re: [hsa merge 07/10] IPA-HSA pass

2016-01-15 Thread Alexander Monakov
On Fri, 15 Jan 2016, Jakub Jelinek wrote:

> On Fri, Jan 15, 2016 at 10:19:13PM +0300, Alexander Monakov wrote:
> > Sorry, can you clarify -- what do you mean by "can't offload"?
> 
> I meant stuff like setjmp/longjmp, exceptions?, alloca (I know your changes
> might fix this one), computed goto, non-local goto, and the like, which I
> believe nvptx doesn't support.

Right, but such issues are diagnosed as a compile-time error; the run-time
stage is simply not reached.

Did you mean that eventually GCC might change and somehow allow compilation to
run to completion even though offloaded code cannot be fully generated?

Thanks.
Alexander


[PATCH 2/2] RFE: poisoning of invalid memory blocks and obstacks

2016-01-15 Thread David Malcolm
It was difficult to track down the memory corruption bug fixed by the
previous patch (PR jit/68446).  The following patch attempts to make
it easier to find that kind of thing by adding "poisoning" code:

(A) when memory blocks are returned to the memory_block_pool's free
list (e.g. by an obstack), fill the content with a garbage value.

(B) When calling
  obstack_free (obstack, NULL);
which leaves the obstack requiring reinitialization, fill
the obstack's fields with a garbage value.

in both cases to try to fail faster for use-after-free errors.
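
As a rough standalone sketch of case (A), assuming a simple free-list pool
(toy_acquire, toy_release, TOY_BLOCK_SIZE and TOY_POISON are hypothetical
names, not the memory_block_pool API): the block is filled with the poison
byte before it is linked into the free list, so a stale user reading it
later sees an obviously bogus pattern.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_BLOCK_SIZE 64
#define TOY_POISON 0xde

struct toy_block { struct toy_block *next; };

static struct toy_block *toy_free_list;

/* Hand out a block, reusing a released one when available.  */
void *
toy_acquire (void)
{
  if (toy_free_list == NULL)
    return malloc (TOY_BLOCK_SIZE);
  struct toy_block *b = toy_free_list;
  toy_free_list = b->next;
  return b;
}

/* Return a block to the pool.  Poison the whole block first, then
   overwrite only the link field needed by the free list.  */
void
toy_release (void *p)
{
  memset (p, TOY_POISON, TOY_BLOCK_SIZE);
  struct toy_block *b = p;
  b->next = toy_free_list;
  toy_free_list = b;
}
```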

This patch isn't ready as-is:
- I couldn't see an equivalent of CHECKING_P for libiberty, so
  case (B) would do it even in a production build.

- this interacts badly with Valgrind; the latter emits messages
  about "Invalid write of size 8"
"16 bytes inside a block of size 65,536 alloc'd"
  I think that it merely needs some extra uses of the valgrind
  annotation macros to fix.

- the garbage/poison values I picked were rather arbitrary

That said, it's survived bootstrap&regrtesting on x86_64-pc-linux-gnu
(in conjunction with the previous patch).

Thoughts?

gcc/ChangeLog:
* memory-block.h (memory_block_pool::release): If CHECKING_P,
fill the released block with a poison value.

libiberty/ChangeLog:
* obstack.c (_obstack_free): If OBJ is zero, poison the
obstack to highlight the need for reinitialization.
---
 gcc/memory-block.h  | 3 +++
 libiberty/obstack.c | 5 +
 2 files changed, 8 insertions(+)

diff --git a/gcc/memory-block.h b/gcc/memory-block.h
index d7b96a3..52c17f9 100644
--- a/gcc/memory-block.h
+++ b/gcc/memory-block.h
@@ -66,6 +66,9 @@ inline void
 memory_block_pool::release (void *uncast_block)
 {
   block_list *block = new (uncast_block) block_list;
+#if CHECKING_P
+  memset (block, 0xde, block_size);
+#endif
   block->m_next = instance.m_blocks;
   instance.m_blocks = block;
 }
diff --git a/libiberty/obstack.c b/libiberty/obstack.c
index 6d8d672..8df5517 100644
--- a/libiberty/obstack.c
+++ b/libiberty/obstack.c
@@ -292,6 +292,11 @@ _obstack_free (struct obstack *h, void *obj)
   else if (obj != 0)
 /* obj is not in any of the chunks! */
 abort ();
+
+  /* If OBJ is zero, the obstack will require reinitialization; poison it.
+ TODO: make this conditional on being a debug build.  */
+  if (obj == 0)
+memset (h, 0xdd, sizeof (struct obstack));
 }
 
 _OBSTACK_SIZE_T
-- 
1.8.5.3



[PATCH 1/2] fix memory chunk corruption for opts_obstack (PR jit/68446)

2016-01-15 Thread David Malcolm
There can be multiple gcc_options instances, each with
a call to
  init_options_struct
matched with a call to
  finalize_options_struct
whereas the
  opts_obstack
is a singleton.  Each gcc_options instance can potentially use the
opts_obstack singleton.

r230264 (aka 25faed340686df8d7bb2242dc8d04285976922b6) fixed
a large memory leak (1.2MB) of the opts_obstack, by making
initialization of the opts_obstack be idempotent
(in init_opts_obstack).

This works if we only have one in-process run of the compiler.
Unfortunately this commit broke much of libgccjit's test suite,
which now fails with memory corruption errors.

The root cause of the breakage is that toplev::finalize cleans up the
opts_obstack using:

  obstack_free (opts_obstack, NULL);

Calling obstack_free with NULL leaves an obstack in an uninitialized
state and hence a reinitialization is required;
libiberty/obstacks.texi has:

  Note that if @var{object} is a null pointer, the result is an
  *uninitialized* obstack.  [my emphasis]

Hence opts_obstack reverts to an uninitialized state - but further
calls to initialize it are rejected by the idempotency code in
init_opts_obstack, and we then attempt to allocate from an
uninitialized obstack.

In particular, the obstack's "chunk" field becomes invalid, but isn't
unset. The underlying 64KB chunk(s) are returned to memory_block_pool's
m_blocks linked-list of 64KB free chunks, and they get reused by other
obstacks e.g. for bitmaps.  However, given that opts_obstack fails
to be reinitialized, opts_obstack.chunk points at a freed chunk.
Hence, on the 2nd iteration of a jit testcase, it gets used to
allocate copies of the options, but this is out of a chunk that's being
used by a different memory_block_pool user, so chaos ensues: we have
64KB chunks of memory being erroneously shared between different
memory-pool users.

The following patch removes idempotency from init_opts_obstack, and
replaces the call to init_opts_obstack from init_options_struct with
an assert that the singleton opts_obstack is already initialized,
adding in the necessary per-compile initialization of opts_obstack
(we already have per-compile cleanup).

Or to put it another way, previously, we had this pattern of calls:

  - for each jit compile:
    - toplev:
      - multiple calls to init_options_struct matched with calls to
        finalize_options_struct; the first call to init_options_struct
        idempotently initializes opts_obstack.
      - obstack_free (&opts_obstack, NULL);
        (leading to corrupt opts_obstack on the 2nd iteration of jit compilation)

and with this patch we instead have this:

  - for each jit compile:
    - toplev:
      - init_opts_obstack
      - multiple calls to init_options_struct matched with calls to
        finalize_options_struct
      - obstack_free (&opts_obstack, NULL);
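
The new call pattern can be modelled with a toy arena (toy_arena and its
functions are hypothetical stand-ins, not the obstack API).  Initialization
is unconditional and happens once per compile, so releasing everything at
the end of one compile never leaves a stale chunk pointer for the next.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_CHUNK_SIZE 256

struct toy_arena { char *chunk; size_t used; bool ready; };

/* Unconditional initialization: no "already initialized" guard.  */
void
toy_arena_init (struct toy_arena *a)
{
  a->chunk = malloc (TOY_CHUNK_SIZE);
  assert (a->chunk != NULL);
  a->used = 0;
  a->ready = true;
}

/* Allocate N bytes; traps if the arena was released and not reinitialized.  */
void *
toy_arena_alloc (struct toy_arena *a, size_t n)
{
  assert (a->ready && a->used + n <= TOY_CHUNK_SIZE);
  void *p = a->chunk + a->used;
  a->used += n;
  return p;
}

/* Analogous to freeing everything: the arena needs a fresh init afterwards.  */
void
toy_arena_release (struct toy_arena *a)
{
  free (a->chunk);
  a->chunk = NULL;
  a->ready = false;
}

/* One "compile": init, use, release.  Returns the bytes used.  */
size_t
toy_compile (struct toy_arena *a)
{
  toy_arena_init (a);
  toy_arena_alloc (a, 16);
  size_t used = a->used;
  toy_arena_release (a);
  return used;
}
```

If toy_arena_init were guarded by a static "done" flag, the second
toy_compile call would skip the init and trip the assert in
toy_arena_alloc, which is the toy analogue of the corruption described
above.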

(I don't like that opts_obstack is global state, but it seems risky
to try to fix that at this stage).

The patch also adds code to reset save_decoded_options and
save_decoded_options_count when freeing opts_obstack, since these
saved options are allocated from out of opts_obstack and hence
also become invalid when it's freed.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu in
conjunction with the followup patch.

Fixes all of the jit test suite, apart from test-threads.c (which is
broken for a different reason).  I've also verified manually under
Valgrind that this also keeps the fix for the large leak reported in
the PR that motivated r230264.

OK for trunk?  (assuming it bootstraps&regrtests by itself)

gcc/ChangeLog:
PR jit/68446
* gcc.c (driver::decode_argv): Add call to
init_opts_obstack before init_options_struct.
* opts.c (init_opts_obstack): Remove idempotency.
(init_options_struct): Replace call to init_opts_obstack
with a gcc_assert to verify that it has already been called.
* toplev.c (toplev::main): Add call to init_opts_obstack before
calls to init_options_struct.
(toplev::finalize): Move cleanup of opts_obstack next to
cleanup of save_decoded_options, clearing the latter, and
save_decoded_options_count.
---
 gcc/gcc.c|  1 +
 gcc/opts.c   | 14 +-
 gcc/toplev.c |  7 ++-
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 319a073..c191fde 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -7191,6 +7191,7 @@ driver::decode_argv (int argc, const char **argv)
   global_init_params ();
   finish_params ();
 
+  init_opts_obstack ();
   init_options_struct (&global_options, &global_options_set);
 
   decode_cmdline_options_to_array (argc, argv,
diff --git a/gcc/opts.c b/gcc/opts.c
index 2add158..9437535 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -266,18 +266,12 @@ add_comma_separated_to_vector (void **pvec, const char *arg)
   *pvec = v;
 }
 
-/* Initialize opts_obstack if not initialized.  */
+/* Initialize opts_obstack.  */
 
 void
 init_opts_obstack (void)
 {
-  static bool opts_o

Patch to fix PR69030

2016-01-15 Thread Vladimir Makarov

  The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69030

  The patch was bootstrapped and tested on x86 and x86-64.

  Committed as rev. 232445.

2016-01-15  Vladimir Makarov  

PR rtl-optimization/69030
* lra-spills.c (remove_pseudos): Check nrefs and make the function
returning bool.
(spill_pseudos): Delete debug insn for dead pseudo.
(lra_spill): Initiate spill_hard_reg and slots memory separately.
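
One detail of making a recursive walker return bool is worth a sketch:
writing "res = walk (child) || res" (call on the left) keeps visiting
every child even after one of them has already reported true, whereas
"res = res || walk (child)" would short-circuit and skip the rest.  The
toy tree below is hypothetical (toy_node and toy_visit are not LRA code);
it negates each nonzero value and reports whether any zero node was seen.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_node
{
  int value;
  struct toy_node *left, *right;
};

/* Negate every nonzero value in the tree rooted at N.  Return true if
   some node had value 0 (such a node is left unchanged).  */
bool
toy_visit (struct toy_node *n)
{
  bool res = false;

  if (n == NULL)
    return res;
  if (n->value == 0)
    return true;
  n->value = -n->value;
  /* The recursive call comes first, so it always runs.  */
  res = toy_visit (n->left) || res;
  res = toy_visit (n->right) || res;
  return res;
}
```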

Index: lra-spills.c
===
--- lra-spills.c	(revision 232143)
+++ lra-spills.c	(working copy)
@@ -396,17 +396,19 @@ assign_stack_slot_num_and_sort_pseudos (
 
 /* Recursively process LOC in INSN and change spilled pseudos to the
corresponding memory or spilled hard reg.  Ignore spilled pseudos
-   created from the scratches.	*/
-static void
+   created from the scratches.  Return true if the pseudo nrefs equal
+   to 0 (don't change the pseudo in this case).  Otherwise return false.  */
+static bool
 remove_pseudos (rtx *loc, rtx_insn *insn)
 {
   int i;
   rtx hard_reg;
   const char *fmt;
   enum rtx_code code;
-
+  bool res = false;
+  
   if (*loc == NULL_RTX)
-return;
+return res;
   code = GET_CODE (*loc);
   if (code == REG && (i = REGNO (*loc)) >= FIRST_PSEUDO_REGISTER
   && lra_get_regno_hard_regno (i) < 0
@@ -416,6 +418,9 @@ remove_pseudos (rtx *loc, rtx_insn *insn
 	 into scratches back.  */
   && ! lra_former_scratch_p (i))
 {
+  if (lra_reg_info[i].nrefs == 0
+	  && pseudo_slots[i].mem == NULL && spill_hard_reg[i] == NULL)
+	return true;
   if ((hard_reg = spill_hard_reg[i]) != NULL_RTX)
 	*loc = copy_rtx (hard_reg);
   else
@@ -425,22 +430,23 @@ remove_pseudos (rtx *loc, rtx_insn *insn
 	false, false, 0, true);
 	  *loc = x != pseudo_slots[i].mem ? x : copy_rtx (x);
 	}
-  return;
+  return res;
 }
 
   fmt = GET_RTX_FORMAT (code);
   for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
 {
   if (fmt[i] == 'e')
-	remove_pseudos (&XEXP (*loc, i), insn);
+	res = remove_pseudos (&XEXP (*loc, i), insn) || res;
   else if (fmt[i] == 'E')
 	{
 	  int j;
 
 	  for (j = XVECLEN (*loc, i) - 1; j >= 0; j--)
-	remove_pseudos (&XVECEXP (*loc, i, j), insn);
+	res = remove_pseudos (&XVECEXP (*loc, i, j), insn) || res;
 	}
 }
+  return res;
 }
 
 /* Convert spilled pseudos into their stack slots or spill hard regs,
@@ -450,7 +456,7 @@ static void
 spill_pseudos (void)
 {
   basic_block bb;
-  rtx_insn *insn;
+  rtx_insn *insn, *curr;
   int i;
   bitmap_head spilled_pseudos, changed_insns;
 
@@ -467,52 +473,70 @@ spill_pseudos (void)
 }
   FOR_EACH_BB_FN (bb, cfun)
 {
-  FOR_BB_INSNS (bb, insn)
-	if (bitmap_bit_p (&changed_insns, INSN_UID (insn)))
-	  {
-	rtx *link_loc, link;
-	remove_pseudos (&PATTERN (insn), insn);
-	if (CALL_P (insn))
-	  remove_pseudos (&CALL_INSN_FUNCTION_USAGE (insn), insn);
-	for (link_loc = &REG_NOTES (insn);
-		 (link = *link_loc) != NULL_RTX;
-		 link_loc = &XEXP (link, 1))
-	  {
-		switch (REG_NOTE_KIND (link))
-		  {
-		  case REG_FRAME_RELATED_EXPR:
-		  case REG_CFA_DEF_CFA:
-		  case REG_CFA_ADJUST_CFA:
-		  case REG_CFA_OFFSET:
-		  case REG_CFA_REGISTER:
-		  case REG_CFA_EXPRESSION:
-		  case REG_CFA_RESTORE:
-		  case REG_CFA_SET_VDRAP:
-		remove_pseudos (&XEXP (link, 0), insn);
-		break;
-		  default:
-		break;
-		  }
-	  }
-	if (lra_dump_file != NULL)
-	  fprintf (lra_dump_file,
-		   "Changing spilled pseudos to memory in insn #%u\n",
-		   INSN_UID (insn));
-	lra_push_insn (insn);
-	if (lra_reg_spill_p || targetm.different_addr_displacement_p ())
-	  lra_set_used_insn_alternative (insn, -1);
-	  }
-	else if (CALL_P (insn))
-	  /* Presence of any pseudo in CALL_INSN_FUNCTION_USAGE does
-	 not affect value of insn_bitmap of the corresponding
-	 lra_reg_info.  That is because we don't need to reload
-	 pseudos in CALL_INSN_FUNCTION_USAGEs.  So if we process
-	 only insns in the insn_bitmap of given pseudo here, we
-	 can miss the pseudo in some
-	 CALL_INSN_FUNCTION_USAGEs.  */
-	  remove_pseudos (&CALL_INSN_FUNCTION_USAGE (insn), insn);
-  bitmap_and_compl_into (df_get_live_in (bb), &spilled_pseudos);
-  bitmap_and_compl_into (df_get_live_out (bb), &spilled_pseudos);
+  FOR_BB_INSNS_SAFE (bb, insn, curr)
+	{
+	  bool removed_pseudo_p = false;
+	  
+	  if (bitmap_bit_p (&changed_insns, INSN_UID (insn)))
+	{
+	  rtx *link_loc, link;
+
+	  removed_pseudo_p = remove_pseudos (&PATTERN (insn), insn);
+	  if (CALL_P (insn)
+		  && remove_pseudos (&CALL_INSN_FUNCTI

Re: [hsa merge 07/10] IPA-HSA pass

2016-01-15 Thread Jakub Jelinek
On Fri, Jan 15, 2016 at 10:19:13PM +0300, Alexander Monakov wrote:
> Sorry, can you clarify -- what do you mean by "can't offload"?

I meant stuff like setjmp/longjmp, exceptions?, alloca (I know your changes
might fix this one), computed goto, non-local goto, and the like, which I
believe nvptx doesn't support.

Jakub


Re: [hsa merge 07/10] IPA-HSA pass

2016-01-15 Thread Alexander Monakov
On Fri, 15 Jan 2016, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 07:38:14PM +0300, Ilya Verbin wrote:
> > On Fri, Jan 15, 2016 at 17:09:54 +0100, Jakub Jelinek wrote:
> > > On Fri, Jan 15, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> > > > How do other accelerators cope with the situation when half of the
> > > > application is compiled with the accelerator disabled?  (Would some of
> > > > their calls to GOMP_target_ext lead to abort?)
> > > 
> > > GOMP_target_ext should never abort (unless internal error), worst case it
> > > just falls back into the host fallback.

Agreed -- the way it aborts today rather than using host fallback looks rather
surprising to me.

> > Wouldn't that lead to hard-to-find problems in case of nonshared memory?
> > I mean when someone expects that all target regions are executed on the
> > device, but in fact some of them are silently executed on the host with
> > a different data environment.
> 
> E.g. for HSA it really shouldn't matter, as it is shared memory accelerator.
> For XeonPhi we hopefully can offload anything.  NVPTX is problematic,
> because it can't offload all the code, 

Sorry, can you clarify -- what do you mean by "can't offload"?

> but if it can be e.g. compile time detected that it will not be possible, it
> can just provide offloaded code for the target.

(as a result of previous confusion I can't follow this part either)

Thanks.
Alexander


Re: [PATCH v2] sanitize paths used in regular expression

2016-01-15 Thread Mike Stump
On Jan 15, 2016, at 10:40 AM, Zachary T Welch  wrote:
> Does this version look better?

Ok.

> I am not sure if this is the right place to put the new helper, so let me know
> if there is a better spot for it.

So, someone that wants to rehome the helper is free to do that.

Re: [PATCH] PR middle-end/67220: GCC fails to properly handle libcall symbol visibility of built functions

2016-01-15 Thread H.J. Lu
On Tue, Oct 20, 2015 at 4:37 PM, Bernd Schmidt  wrote:
> On 10/15/2015 12:37 PM, H.J. Lu wrote:
>>
>> On Thu, Oct 15, 2015 at 1:44 AM, Richard Biener
>>  wrote:
>>>
>>> On Wed, Oct 14, 2015 at 6:21 PM, H.J. Lu  wrote:

 By default, there is no visibility on builtin functions.  When there is
 explicitly declared visibility on the C library function which a builtin
 function fall back on, we should honor the explicit visibility on the
 the C library function.
>
>
>>> Doesn't the C++ FE have the same issue?
>>>
>>
>> Unlike gcc, visibility triggers a warning in g++:
>>
>> memcpy.i:2:14: warning: ‘void* memcpy(void*, const void*, size_t)’:
>> visibility attribute ignored because it conflicts with previous
>> declaration [-Wattributes]
>>   extern void *memcpy(void *dest, const void *src, size_t n)
>>^
>> : note: previous declaration of ‘void* memcpy(void*, const
>> void*, size_t)’
>> [hjl@gnu-tools-1 pr67220]$
>
>
> I see no good reason for C and C++ to have different behaviour here. It
> looks like the C++ frontend sets DECL_VISIBILITY_SPECIFIED to 1 for
> builtins, causing the above behaviour. Cc'ing Jason, but I think the C++
> frontend should be changed not to set D_V_S and have the same changes as the
> C frontend for merging the visibilities.
>

What should we do with C++ front-end?

-- 
H.J.


[PATCH v2] sanitize paths used in regular expression

2016-01-15 Thread Zachary T Welch
Does this version look better?  I am not sure if this is the right place
to put the new helper, so let me know if there is a better spot for it.

gcc/testsuite/lib/
* prune.exp (prune_file_path): Sanitize path used in regex.
(escape_regex_chars): New.

Signed-off-by: Zachary T Welch 
---
 gcc/testsuite/lib/prune.exp | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/lib/prune.exp b/gcc/testsuite/lib/prune.exp
index 8e4c203..fd3c4ea 100644
--- a/gcc/testsuite/lib/prune.exp
+++ b/gcc/testsuite/lib/prune.exp
@@ -73,12 +73,33 @@ proc prune_gcc_output { text } {
 return $text
 }
 
+# escape metacharacters in literal string, so it can be used in regex
+
+proc escape_regex_chars { line } {
+return [string map {"^" "\\^"
+   "$" "\\$"
+   "(" "\\("
+   ")" "\\)"
+   "[" "\\["
+   "]" "\\]"
+   "{" "\\{"
+   "}" "\\}"
+   "." "\\."
+   "\\" "\\\\"
+   "?" "\\?"
+   "+" "\\+"
+   "*" "\\*"
+   "|" "\\|"} $line]
+}
+
 proc prune_file_path { text } {
 global srcdir
 
+set safedir [escape_regex_chars $srcdir]
+regsub -all "$safedir\/"  $text "" text
+
 # Truncate absolute file path into relative path.
-set topdir "[file dirname [file dirname [file dirname $srcdir]]]"
-regsub -all "$srcdir\/" $text "" text
+set topdir "[file dirname [file dirname [file dirname $safedir]]]"
 regsub -all "$topdir\/" $text "" text
 
 return $text
-- 
1.8.1.1



Re: [PATCH 5/5] s390: Add -fsplit-stack support

2016-01-15 Thread Andreas Krebbel
Marcin,

your implementation looks very good to me. Thanks!

But please be aware that we deprecated the support of g5 and g6 and intend to
remove that code from the back-end with the next GCC version.  So I would
prefer if you could remove all the !TARGET_CPU_ZARCH stuff from the
implementation and just error out if split-stack is enabled with -march g5/g6.
It currently makes the implementation more complicated and would have to be
removed anyway in the future.

Thanks!

https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01854.html


Bye,

-Andreas-



On 01/02/2016 08:16 PM, Marcin Kościelnicki wrote:
> libgcc/ChangeLog:
> 
>   * config.host: Use t-stack and t-stack-s390 for s390*-*-linux.
>   * config/s390/morestack.S: New file.
>   * config/s390/t-stack-s390: New file.
>   * generic-morestack.c (__splitstack_find): Add s390-specific code.
> 
> gcc/ChangeLog:
> 
>   * common/config/s390/s390-common.c (s390_supports_split_stack):
>   New function.
>   (TARGET_SUPPORTS_SPLIT_STACK): New macro.
>   * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
>   * config/s390/s390.c (struct machine_function): New field
>   split_stack_varargs_pointer.
>   (s390_split_branches): Don't split split-stack pseudo-insns, rewire
>   split-stack prologue conditional jump instead of splitting it.
>   (s390_chunkify_start): Don't reload const pool register on split-stack
>   prologue conditional jumps.
>   (s390_register_info): Mark r12 as clobbered if it'll be used as temp
>   in s390_emit_prologue.
>   (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
>   vararg pointer.
>   (morestack_ref): New global.
>   (SPLIT_STACK_AVAILABLE): New macro.
>   (s390_expand_split_stack_prologue): New function.
>   (s390_expand_split_stack_call_esa): New function.
>   (s390_expand_split_stack_call_zarch): New function.
>   (s390_live_on_entry): New function.
>   (s390_va_start): Use split-stack vararg pointer if appropriate.
>   (s390_reorg): Lower the split-stack pseudo-insns.
>   (s390_asm_file_end): Emit the split-stack note sections.
>   (TARGET_EXTRA_LIVE_ON_ENTRY): New macro.
>   * config/s390/s390.md: (UNSPEC_STACK_CHECK): New unspec.
>   (UNSPECV_SPLIT_STACK_CALL_ZARCH): New unspec.
>   (UNSPECV_SPLIT_STACK_CALL_ESA): New unspec.
>   (UNSPECV_SPLIT_STACK_SIBCALL): New unspec.
>   (UNSPECV_SPLIT_STACK_MARKER): New unspec.
>   (split_stack_prologue): New expand.
>   (split_stack_call_esa): New insn.
>   (split_stack_call_zarch_*): New insn.
>   (split_stack_cond_call_zarch_*): New insn.
>   (split_stack_space_check): New expand.
>   (split_stack_sibcall_basr): New insn.
>   (split_stack_sibcall_*): New insn.
>   (split_stack_cond_sibcall_*): New insn.
>   (split_stack_marker): New insn.
> ---
>  gcc/ChangeLog|  41 ++
>  gcc/common/config/s390/s390-common.c |  14 +
>  gcc/config/s390/s390-protos.h|   1 +
>  gcc/config/s390/s390.c   | 538 +-
>  gcc/config/s390/s390.md  | 133 +++
>  libgcc/ChangeLog |   7 +
>  libgcc/config.host   |   4 +-
>  libgcc/config/s390/morestack.S   | 718 +++
>  libgcc/config/s390/t-stack-s390  |   2 +
>  libgcc/generic-morestack.c   |   4 +
>  10 files changed, 1454 insertions(+), 8 deletions(-)
>  create mode 100644 libgcc/config/s390/morestack.S
>  create mode 100644 libgcc/config/s390/t-stack-s390
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 4c7046f..a4f4dff 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,5 +1,46 @@
>  2016-01-02  Marcin Kościelnicki  
> 
> + * common/config/s390/s390-common.c (s390_supports_split_stack):
> + New function.
> + (TARGET_SUPPORTS_SPLIT_STACK): New macro.
> + * config/s390/s390-protos.h: Add s390_expand_split_stack_prologue.
> + * config/s390/s390.c (struct machine_function): New field
> + split_stack_varargs_pointer.
> + (s390_split_branches): Don't split split-stack pseudo-insns, rewire
> + split-stack prologue conditional jump instead of splitting it.
> + (s390_chunkify_start): Don't reload const pool register on split-stack
> + prologue conditional jumps.
> + (s390_register_info): Mark r12 as clobbered if it'll be used as temp
> + in s390_emit_prologue.
> + (s390_emit_prologue): Use r12 as temp if r1 is taken by split-stack
> + vararg pointer.
> + (morestack_ref): New global.
> + (SPLIT_STACK_AVAILABLE): New macro.
> + (s390_expand_split_stack_prologue): New function.
> + (s390_expand_split_stack_call_esa): New function.
> + (s390_expand_split_stack_call_zarch): New function.
> + (s390_live_on_entry): New function.
> + (s390_va_start): Use split-stack vararg pointer if appropriate.
> + (s390_reorg): L

Re: Optimise hash_table::empty

2016-01-15 Thread Bernd Schmidt

On 01/15/2016 07:00 PM, Richard Sandiford wrote:

Calling redirect_edge_var_map_empty after each pass was slowing things
down because hash_table::empty () cleared all slots even if the hash
table was already empty.

Tested on x86_64-linux-gnu, where it gives a 1% compile time improvement
for fold-const.ii at -O and -O2.  OK to install?


Ok.


Bernd


[PATCH] Fix PR c++/69091 (ICE with operator overload having 'auto' return type)

2016-01-15 Thread Patrick Palka
The crux of the problem in this PR is that type_dependent_expression_p
returns true for a FUNCTION_DECL that is not actually type-dependent.
This leads tsubst_decl to attempt to perform template argument
substitution on the template arguments of the FUNCTION_DECL, which
do not necessarily correspond with the current template arguments.

In the test case provided in the PR, the FUNCTION_DECL in question
(which is used in a CALL_EXPR built by build_min_non_dep_op_overload
during processing of the template function f()) is the following:

 >
QI
size 
unit size 
align 8 symtab 0 alias set -1 canonical type 0x76a0c738
arg-types 
chain 
chain >>>
pointer_to_this >
addressable public external QI file 
/home/patrick/code/gcc/gcc/testsuite/g++.dg/template/pr69091.C line 8 col 6 
align 8 context 
full-name "auto operator|(Option, OptionsRhs) [with 
ValueType = canine_t; ValueType Value = (canine_t)0u; OptionsRhs = 
Option]"
template-info 0x769ffcc0>

type_dependent_expression_p returns true for this FUNCTION_DECL because
its TREE_TYPE is 'auto'.

Direct calls to operator| do not have this problem because in such a
case the CALL_EXPR_FN of the CALL_EXPR that's built is an OVERLOAD (to a
TEMPLATE_DECL), which tsubst does not touch.

This patch makes it so that type_dependent_expression_p considers a
FUNCTION_DECL with template info to be type-dependent if and only if any
of its template arguments are dependent, similar to how C++14 variable
templates are handled.  Thus for the above FUNCTION_DECL we would return
false because all of its template arguments are non-dependent:

 
elt 1  constant 0>
elt 2 >

Bootstrapped and regtested on x86_64-pc-linux-gnu with no new
regressions, and also tested against Boost.  Is this change OK?

gcc/cp/ChangeLog:

PR c++/69091
* pt.c (type_dependent_expression_p): For a function template
specialization, a type is dependent iff any of its template
arguments are.

gcc/testsuite/ChangeLog:

PR c++/69091
* g++.dg/template/pr69091.C: New test.
---
 gcc/cp/pt.c |  8 
 gcc/testsuite/g++.dg/template/pr69091.C | 25 +
 2 files changed, 29 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/pr69091.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index edec774..403c5ac 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -22759,12 +22759,12 @@ type_dependent_expression_p (tree expression)
  || dependent_scope_p (scope));
 }
 
+  /* A function template specialization is type-dependent if it has any
+ dependent template arguments.  */
   if (TREE_CODE (expression) == FUNCTION_DECL
   && DECL_LANG_SPECIFIC (expression)
-  && DECL_TEMPLATE_INFO (expression)
-  && (any_dependent_template_arguments_p
- (INNERMOST_TEMPLATE_ARGS (DECL_TI_ARGS (expression)))))
-return true;
+  && DECL_TEMPLATE_INFO (expression))
+return any_dependent_template_arguments_p (DECL_TI_ARGS (expression));
 
   if (TREE_CODE (expression) == TEMPLATE_DECL
   && !DECL_TEMPLATE_TEMPLATE_PARM_P (expression))
diff --git a/gcc/testsuite/g++.dg/template/pr69091.C b/gcc/testsuite/g++.dg/template/pr69091.C
new file mode 100644
index 0000000..ec7bb25
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/pr69091.C
@@ -0,0 +1,25 @@
+// PR c++/69091
+// { dg-do compile { target c++14 } }
+
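+template <typename ValueType, ValueType Value>
+struct Option {};
+
+template <typename ValueType, ValueType Value, typename OptionsRhs>
+auto operator|(Option<ValueType, Value>, OptionsRhs) {
+  return Value;
+}
+
+enum canine_t { no, yes };
+Option<canine_t, no> cat;
+Option<canine_t, yes> dog;
+
+template <typename T>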
+void f(T) {
+  cat | dog;
+}
+
+struct A {};
+int main() {
+  f(A{});
+  return 0;
+}
-- 
2.7.0.83.gdfccd77.dirty



Re: [PATCH, rs6000] Add support for __builtin_cpu_is() and __builtin_cpu_supports()

2016-01-15 Thread Peter Bergner
On Thu, 2016-01-14 at 21:50 -0600, Peter Bergner wrote:
> This patch adds support for __builtin_cpu_init(), __builtin_cpu_is() and
> __builtin_cpu_supports() builtins for PowerPC.  We use the same API as the
> x86* builtins of the same name.  These builtins use the new GLIBC 2.23
> feature where we store the AT_PLATFORM, AT_HWCAP and AT_HWCAP2 values in the
> Thread Control Block (TCB) which offers very fast access to these values.

Sorry, I forgot the documentation for the builtins.  Here they are.

Peter

* doc/extend.texi (PowerPC Built-in Functions): Document
__builtin_cpu_init, __builtin_cpu_is and __builtin_cpu_supports.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 232359)
+++ gcc/doc/extend.texi (working copy)
@@ -13527,6 +13527,162 @@
 @node PowerPC Built-in Functions
 @subsection PowerPC Built-in Functions
 
+The following built-in functions are always available and can be used to
+check the PowerPC target platform type:
+
+@deftypefn {Built-in Function} void __builtin_cpu_init (void)
+This function is a @code{nop} on the PowerPC platform and is included solely
+to maintain API compatibility with the x86 builtins.
+@end deftypefn
+
+@deftypefn {Built-in Function} int __builtin_cpu_is (const char *@var{cpuname})
+This function returns a value of @code{1} if the run-time CPU is of type
+@var{cpuname} and returns @code{0} otherwise. The following CPU names can be
+detected:
+
+@table @samp
+@item power9
+IBM POWER9 Server CPU.
+@item power8
+IBM POWER8 Server CPU.
+@item power7
+IBM POWER7 Server CPU.
+@item power6x
+IBM POWER6 Server CPU (RAW mode).
+@item power6
+IBM POWER6 Server CPU (Architected mode).
+@item power5+
+IBM POWER5+ Server CPU.
+@item power5
+IBM POWER5 Server CPU.
+@item ppc970
+IBM 970 Server CPU (ie, Apple G5).
+@item power4
+IBM POWER4 Server CPU.
+@item ppca2
+IBM A2 64-bit Embedded CPU
+@item ppc476
+IBM PowerPC 476FP 32-bit Embedded CPU.
+@item ppc464
+IBM PowerPC 464 32-bit Embedded CPU.
+@item ppc440
+PowerPC 440 32-bit Embedded CPU.
+@item ppc405
+PowerPC 405 32-bit Embedded CPU.
+@item ppc-cell-be
+IBM PowerPC Cell Broadband Engine Architecture CPU.
+@end table
+
+Here is an example:
+@smallexample
+if (__builtin_cpu_is ("power8"))
+  @{
+ do_power8 (); // POWER8 specific implementation.
+  @}
+else
+  @{
+ do_generic (); // Generic implementation.
+  @}
+@end smallexample
+@end deftypefn
+
+@deftypefn {Built-in Function} int __builtin_cpu_supports (const char *@var{feature})
+This function returns a value of @code{1} if the run-time CPU supports the HWCAP
+feature @var{feature} and returns @code{0} otherwise. The following features can be
+detected:
+
+@table @samp
+@item 4xxmac
+4xx CPU has a Multiply Accumulator.
+@item altivec
+CPU has a SIMD/Vector Unit.
+@item arch_2_05
+CPU supports ISA 2.05 (eg, POWER6)
+@item arch_2_06
+CPU supports ISA 2.06 (eg, POWER7)
+@item arch_2_07
+CPU supports ISA 2.07 (eg, POWER8)
+@item arch_3_00
+CPU supports ISA 3.00 (eg, POWER9)
+@item archpmu
+CPU supports the set of compatible performance monitoring events.
+@item booke
+CPU supports the Embedded ISA category.
+@item cellbe
+CPU has a CELL broadband engine.
+@item dfp
+CPU has a decimal floating point unit.
+@item dscr
+CPU supports the data stream control register.
+@item ebb
+CPU supports event-based branching.
+@item efpdouble
+CPU has a SPE double precision floating point unit.
+@item efpsingle
+CPU has a SPE single precision floating point unit.
+@item fpu
+CPU has a floating point unit.
+@item htm
+CPU has hardware transactional memory instructions.
+@item htm-nosc
+Kernel aborts hardware transactions when a syscall is made.
+@item ic_snoop
+CPU supports icache snooping capabilities.
+@item ieee128
+CPU supports 128-bit IEEE binary floating point instructions.
+@item isel
+CPU supports the integer select instruction.
+@item mmu
+CPU has a memory management unit.
+@item notb
+CPU does not have a timebase (eg, 601 and 403gx).
+@item pa6t
+CPU supports the PA Semi 6T CORE ISA.
+@item power4
+CPU supports ISA 2.00 (eg, POWER4)
+@item power5
+CPU supports ISA 2.02 (eg, POWER5)
+@item power5+
+CPU supports ISA 2.03 (eg, POWER5+)
+@item power6x
+CPU supports ISA 2.05 (eg, POWER6) extended opcodes mffgpr and mftgpr.
+@item ppc32
+CPU supports 32-bit mode execution.
+@item ppc601
+CPU supports the old POWER ISA (eg, 601)
+@item ppc64
+CPU supports 64-bit mode execution.
+@item ppcle
+CPU supports a little-endian mode that uses address swizzling.
+@item smt
+CPU supports simultaneous multi-threading.
+@item spe
+CPU has a signal processing extension unit.
+@item tar
+CPU supports the target address register.
+@item true_le
+CPU supports true little-endian mode.
+@item ucache
+CPU has unified I/D cache.
+@item vcrypto
+CPU supports the vector cryptography instructions.
+@item vsx
+CPU supports the vector-scalar extension.
+@end table
+
+Here is an example:
+@smallexample
+if (__b

Re: Optimise hash_table::empty

2016-01-15 Thread Trevor Saunders
On Fri, Jan 15, 2016 at 06:00:10PM +, Richard Sandiford wrote:
> Calling redirect_edge_var_map_empty after each pass was slowing things
> down because hash_table::empty () cleared all slots even if the hash
> table was already empty.
> 
> Tested on x86_64-linux-gnu, where it gives a 1% compile time improvement
> for fold-const.ii at -O and -O2.  OK to install?

I can't ok, but it looks good to me.

Trev



Re: [hsa merge 07/10] IPA-HSA pass

2016-01-15 Thread Ilya Verbin
On Fri, Jan 15, 2016 at 17:45:22 +0100, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 07:38:14PM +0300, Ilya Verbin wrote:
> > On Fri, Jan 15, 2016 at 17:09:54 +0100, Jakub Jelinek wrote:
> > > On Fri, Jan 15, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> > > > How do other accelerators cope with the situation when half of the
> > > > application is compiled with the accelerator disabled?  (Would some of
> > > > their calls to GOMP_target_ext lead to abort?)
> > > 
> > > GOMP_target_ext should never abort (unless internal error), worst case it
> > > just falls back into the host fallback.
> > 
> > Wouldn't that lead to hard-to-find problems in case of nonshared memory?
> > > I mean when someone expects that all target regions are executed on the
> > > device, but in fact some of them are silently executed on the host with
> > > a different data environment.
> 
> E.g. for HSA it really shouldn't matter, as it is shared memory accelerator.
> For XeonPhi we hopefully can offload anything.

As you said, if compilation of the target image fails, with an ICE or
otherwise, host fallback and offloading to other targets should still work:
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00951.html
That patch was not applied, but it can be simulated by -foffload=disable,
I've created a testcase:

$ cat main.c

#pragma omp declare target
int x;
#pragma omp end declare target
extern int foo ();

int main ()
{
  int shared_mem = 0;
  #pragma omp target map (alloc: x, shared_mem)
{
  x = 10;
  shared_mem = 1;
}

  x = 20;
  int r = foo ();
  if (!shared_mem && r != 100)
__builtin_abort ();
  return 0;
}


$ cat liba.c 

#pragma omp declare target
extern int x;
#pragma omp end declare target

int foo ()
{
  int r;
  #pragma omp target map (from: r) map (alloc: x)
r = x * x;
  return r;
}


$ gcc -fopenmp -fPIC -shared liba.c -o liba.so -foffload=disable
$ gcc -fopenmp -L. -la main.c


Currently it prints "libgomp: Target function wasn't mapped", but after this
change:

--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1390,7 +1390,7 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
   splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
   gomp_mutex_unlock (&devicep->lock);
   if (tgt_fn == NULL)
-   gomp_fatal ("Target function wasn't mapped");
+   return NULL;

... it will fail at __builtin_abort, but without -foffload=disable it will pass.
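
To make the hazard concrete, here is a small host-only model of the testcase above.  It is only a sketch under stated assumptions: device, environment, run_target, foo_body and demo are hypothetical names, the map stands in for devicep->mem_map, and nothing here is libgomp code.  Each "data environment" keeps its own copy of x, as on a nonshared-memory accelerator.

```cpp
#include <cassert>
#include <map>
#include <string>

// One copy of 'x' per data environment (host or device).
struct environment { int x; };

typedef int (*target_fn)(const environment &);

// Hypothetical model of a device with its own data and a function map.
struct device {
  environment env;
  std::map<std::string, target_fn> mem_map;

  // Return a null pointer for an unmapped function instead of aborting.
  target_fn lookup(const std::string &name) const {
    std::map<std::string, target_fn>::const_iterator it = mem_map.find(name);
    return it == mem_map.end() ? nullptr : it->second;
  }
};

// Body of the target region in foo (): r = x * x.
int foo_body(const environment &env) { return env.x * env.x; }

// Run on the device when the function is mapped, else fall back to the host.
int run_target(const device &dev, const environment &host,
               const std::string &name, target_fn host_fn) {
  if (target_fn fn = dev.lookup(name))
    return fn(dev.env);
  return host_fn(host);        // silent fallback: sees the host copy of x
}

// Device copy of x is 10, host copy is 20, as in main.c above.
bool demo() {
  device dev = {{10}, {}};
  environment host = {20};
  int r_fallback = run_target(dev, host, "foo", foo_body);  // "foo" unmapped
  dev.mem_map["foo"] = foo_body;
  int r_device = run_target(dev, host, "foo", foo_body);
  return r_fallback == 400 && r_device == 100;
}
```

In this model the fallback yields 400 where the device run yields 100, which corresponds to the r != 100 check in main.c firing once the library is built with -foffload=disable.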

  -- Ilya


[wwwdocs] gcc-6/changes.html: diagnostics, Levenshtein, -Wmisleading-indentation, jit (v2)

2016-01-15 Thread David Malcolm
On Wed, 2016-01-13 at 10:00 -0500, David Malcolm wrote:
> Ping: https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00250.html
> 
> On Wed, 2016-01-06 at 09:50 -0500, David Malcolm wrote:
> > The attached patch adds information on various things to the
> > gcc-6/changes.html page:
> > 
> > * source-range-tracking (the patch merges the description of the string
> > location work into this, and updates the colorization of the example to
> > reflect gcc-6's behavior)
> > * fix-it hints
> > * hints for misspelled member names
> > * -Wmisleading-indentation
> > * jit improvements
> > * hints for misspelled command-line options
> > 
> > OK to commit?
> > Dave

Oops; I forgot to validate it; sorry.

Here's an updated version of the above, which the W3C validator
reports as being clean (fixing various "&" and "<" and a missing
end-tag).

OK to commit?

Dave
Index: htdocs/gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.46
diff -u -p -r1.46 changes.html
--- htdocs/gcc-6/changes.html	22 Dec 2015 19:23:31 -	1.46
+++ htdocs/gcc-6/changes.html	15 Jan 2016 17:58:36 -
@@ -63,13 +63,40 @@ enum {
   oldval __attribute__ ((deprecated ("too old")))
 };
 
-Initial support for precise diagnostic locations within strings:
+Source locations for the C and C++ compilers are now tracked as ranges,
+  rather than just points, making it easier to identify the subexpression
+  of interest within a complicated expression.
+  For example:
 
-format-strings.c:3:14: warning: field width specifier '*' expects a matching 'int' argument [-Wformat=]
+test.cc: In function 'int test(int, int, foo, int, int)':
+test.cc:5:16: error: no match for 'operator*' (operand types are 'int' and 'foo')
+   return p + q * r * s + t;
+  ~~^~~
+
+In addition, there is now initial support for precise diagnostic locations
+within strings:
+
+format-strings.c:3:14: warning: field width specifier '*' expects a matching 'int' argument [-Wformat=]
printf("%*d");
-^
+^
 
-
+Diagnostics can now contain "fix-it hints", which are displayed
+  in context underneath the relevant source code.  For example:
+  
+
+fixits.c: In function 'bad_deref':
+fixits.c:11:13: error: 'ptr' is a pointer; did you mean to use '->'?
+   return ptr.x;
+ ^
+ ->
+
+The C and C++ compilers now offer suggestions for misspelled field names:
+
+spellcheck-fields.cc:52:13: error: 'struct s' has no member named 'colour'; did you mean 'color'?
+   return ptr->colour;
+   ^~
+
+
 New command-line options have been added for the C and C++ compilers:
   
 -Wshift-negative-value warns about left shifting a
@@ -89,8 +116,29 @@ enum {
   depends on the optimization options used.
 -Wduplicated-cond warns about duplicated conditions
 	  in an if-else-if chain.
+-Wmisleading-indentation warns about places where the
+  indentation of the code gives a misleading idea of the block
+  structure of the code to a human reader.  For example, given
+  <a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-1266">CVE-2014-1266</a>:
+
+sslKeyExchange.c: In function 'SSLVerifySignedServerKeyExchange':
+sslKeyExchange.c:631:8: warning: statement is indented as if it were guarded by... [-Wmisleading-indentation]
+goto fail;
+^~~~
+sslKeyExchange.c:629:4: note: ...this 'if' clause, but it is not
+if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
+^~
+
+  This warning is enabled by -Wall.
   
 
+The C and C++ compilers now emit saner error messages if
+  merge-conflict markers are present in a source file.
+
+test.c:3:1: error: version control conflict marker in file
+ <<< HEAD
+ ^~~
+
   
 
 C
@@ -166,8 +214,19 @@ enum {
 
 
 
-
-
+libgccjit
+  
+The driver code is now run in-process within libgccjit,
+  providing a small speed-up of the compilation process.
+The API has gained entrypoints for
+  
+<a href="https://gcc.gnu.org/onlinedocs/jit/topics/performance.html">timing how long was spent in different parts of code</a>,
+<a href="https://gcc.gnu.org/onlinedocs/jit/topics/functions.html#gcc_jit_block_end_with_switch">creating switch statements</a>,
+<a href="https://gcc.gnu.org/onlinedocs/jit/topics/contexts.html#gcc_jit_context_set_bool_allow_unreachable_blocks">allowing unreachable basic blocks in a function</a>, and
+<a href="https://gcc.gnu.org/onlinedocs/jit/topics/contexts.html#gcc_jit_context_add_command_line_option">adding arbitrary command-line options to a compilation</a>.
+  
+
+  
 
 
 New Targets and Target Specific Improvements
@@ -389,6 +448,12 @@ enum {
 Other significant improvements
 
   
+The gcc and g++ driver programs will now
+  provide suggestions for misspelled command line options.
+
+$ gcc -static-libfortran test.f95
+gcc: error: unrecogn

Optimise hash_table::empty

2016-01-15 Thread Richard Sandiford
Calling redirect_edge_var_map_empty after each pass was slowing things
down because hash_table::empty () cleared all slots even if the hash
table was already empty.

Tested on x86_64-linux-gnu, where it gives a 1% compile time improvement
for fold-const.ii at -O and -O2.  OK to install?

Thanks,
Richard


gcc/
* hash-table.h (hash_table::empty): Turn into an inline wrapper
that checks whether the table is already empty.  Rename the
original implementation to...
(hash_table::empty_slow): ...this new private function.
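
The shape of the change can be sketched outside GCC as follows.  This is an illustration under assumptions, not the real hash_table: toy_table, its slot layout, the slow_calls counter and demo are hypothetical, and only the split between an inline empty () and an out-of-line slow path mirrors the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy table: a slot holding 0 is free.  No hashing; enough for the sketch.
class toy_table {
public:
  explicit toy_table(std::size_t size)
    : m_slots(size, 0), m_n_elements(0), m_slow_calls(0) {}

  // Store a non-zero VALUE in the first free slot.
  bool insert(int value) {
    for (int &slot : m_slots)
      if (slot == 0) { slot = value; ++m_n_elements; return true; }
    return false;
  }

  std::size_t elements() const { return m_n_elements; }

  // Cheap inline wrapper: skip the full sweep when nothing is stored.
  void empty() { if (elements()) empty_slow(); }

  // Counter added only so the sketch can observe the slow path.
  std::size_t slow_calls() const { return m_slow_calls; }

private:
  // The original "clear every slot" work, now reached only when needed.
  void empty_slow() {
    std::fill(m_slots.begin(), m_slots.end(), 0);
    m_n_elements = 0;
    ++m_slow_calls;
  }

  std::vector<int> m_slots;
  std::size_t m_n_elements;
  std::size_t m_slow_calls;
};

// Emptying an already empty table must not touch the slots.
bool demo() {
  toy_table t(1024);
  t.empty();                       // no-op: table is already empty
  assert(t.slow_calls() == 0);
  t.insert(42);
  t.empty();                       // real work happens exactly once
  t.empty();                       // no-op again
  return t.slow_calls() == 1 && t.elements() == 0;
}
```

The saving comes from callers that empty the table after every pass while it is almost always empty already.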

diff --git a/gcc/hash-table.h b/gcc/hash-table.h
index 2c52a4a..e925e1e 100644
--- a/gcc/hash-table.h
+++ b/gcc/hash-table.h
@@ -390,8 +390,8 @@ public:
   /* Return the current number of elements in this hash table. */
   size_t elements_with_deleted () const { return m_n_elements; }
 
-  /* This function clears all entries in the given hash table.  */
-  void empty ();
+  /* This function clears all entries in this hash table.  */
+  void empty () { if (elements ()) empty_slow (); }
 
   /* This function clears a specified SLOT in a hash table.  It is
  useful when you've already done the lookup and don't want to do it
@@ -499,6 +499,8 @@ private:
 
   template friend void gt_cleare_cache (hash_table *);
 
+  void empty_slow ();
+
   value_type *alloc_entries (size_t n CXX_MEM_STAT_INFO) const;
   value_type *find_empty_slot_for_expand (hashval_t);
   void expand ();
@@ -755,9 +757,11 @@ hash_table::expand ()
 ggc_free (oentries);
 }
 
+/* Implements empty() in cases where it isn't a no-op.  */
+
 template<typename Descriptor, template<typename Type> class Allocator>
 void
-hash_table<Descriptor, Allocator>::empty ()
+hash_table<Descriptor, Allocator>::empty_slow ()
 {
   size_t size = m_size;
   value_type *entries = m_entries;



Re: [hsa merge 08/10] HSAIL BRIG description header file

2016-01-15 Thread Mike Stump
On Jan 15, 2016, at 2:37 AM, Jakub Jelinek  wrote:
> HSA Foundation grants express permission to any current Founder, Promoter,
> Supporter Contributor, Academic or Associate member of HSA Foundation to
> copy and redistribute UNMODIFIED versions of this specification

So, this isn’t the GNU way.  We need to get permission from them, and they
need to grant it to us; or they need to sign an assignment; or it needs to be
reimplemented.

They need to ask themselves whether they want us to support their standard.
Getting permission might take a week to a month, but it is better to go that
route.  If they don’t want to grant us what we want, then they are likely to
want to sue users of our compiler, and in that case we are better off not
putting it in in the first place.

My vote would be for the SC to nix this until the issue is resolved.

Re: [PATCH] c++/58109 - alignas() fails to compile with constant expression

2016-01-15 Thread Martin Sebor

On 01/12/2016 11:11 AM, Martin Sebor wrote:

On 01/11/2016 10:20 PM, Jason Merrill wrote:

On 12/22/2015 09:32 PM, Martin Sebor wrote:

+  if (is_attribute_p ("aligned", name)
+  || is_attribute_p ("vector_size", name))
+{
+  /* Attribute argument may be a dependent identifier.  */
+  if (tree t = args ? TREE_VALUE (args) : NULL_TREE)
+if (value_dependent_expression_p (t)
+|| type_dependent_expression_p (t))
+  return true;
+}


Instead of this, is_late_template_attribute should be fixed to check
attribute_takes_identifier_p.


attribute_takes_identifier_p() returns false for the aligned
attribute and for vector_size (it returns true only for
attributes cleanup, format, and mode, and none others).

Are you suggesting to also change attribute_takes_identifier_p
to return true for these attributes?  That would likely mean
changes to the C front end as well.


Jason, can you please clarify what you had in mind?  I realize this
isn't as severe as a codegen problem but I'd like to try to wrap it
up in between higher priority tasks.



Thanks
Martin
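
For readers following the thread, here is a minimal sketch of the kind of code
at issue: an alignment specifier whose argument is a value-dependent constant
expression inside a template.  The names aligned_buf and N are made up for
illustration, and this is not the test case from the PR.

```cpp
#include <cassert>
#include <cstddef>

// The alignment argument N is value-dependent, so it can only be
// evaluated when the template is instantiated.
template <std::size_t N>
struct aligned_buf {
  alignas(N) unsigned char data[N];
};

static_assert(alignof(aligned_buf<16>) == 16, "alignment follows N");
static_assert(alignof(aligned_buf<64>) == 64, "alignment follows N");
```

Because the argument cannot be evaluated at parse time, the attribute has to be
treated as one that is applied late, at instantiation, which is what the quoted
check is trying to detect.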




Re: [PATCH] sanitize paths used in regular expression

2016-01-15 Thread Mike Stump
On Jan 15, 2016, at 2:47 AM, David Malcolm  wrote:
> FWIW, I do something similar in multiline.exp's _build_multiline_regex,
> which attempts to have a complete list of metacharacters (though I
> believe some of these are not valid for POSIX filenames);

Only ‘/’ and ‘\0’ are invalid in a filename.  The rest are ok.  ‘/’ is only
invalid in a single component of a path, because / is used as a separator.

>   # We need to escape "^" and other regexp metacharacters.
>   set line [string map {"^" "\\^"
> "(" "\\("
> ")" "\\)"
> "[" "\\["
> "]" "\\]"
> "{" "\\{"
> "}" "\\}"
> "." "\\."
> "\\" "\\\\"
> "?" "\\?"
> "+" "\\+"
> "*" "\\*"
> "|" "\\|"} $line]

Some regexp systems that use ^ also use $.  Tcl does, for example.
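
The same escaping idea, written as a small stand-alone sketch in C++ rather
than Tcl.  The function name escape_regex is hypothetical, and the
metacharacter set is the list quoted above plus "$"; treat that set as an
assumption and adjust it to the regexp dialect actually in use.

```cpp
#include <cassert>
#include <string>

// Prefix every regexp metacharacter in TEXT with a backslash so that
// the result matches TEXT literally.
std::string escape_regex(const std::string &text) {
  // Assumed metacharacter set: the quoted list plus '$'.
  static const std::string meta = "^$()[]{}.\\?+*|";
  std::string out;
  out.reserve(text.size() * 2);
  for (char c : text) {
    if (meta.find(c) != std::string::npos) out.push_back('\\');
    out.push_back(c);
  }
  return out;
}
```

Walking the input once, character by character, means a backslash that was just
inserted is never escaped a second time.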

[PATCH 10/15] rewrite computation of iteration domains

2016-01-15 Thread Sebastian Pop
From: Sebastian Pop 

* graphite-sese-to-poly.c (set_scop_parameter_dim): Remove.
(cleanup_loop_iter_dom): Remove.
(build_loop_iteration_domains): Remove.
(build_scop_context): Remove.
(build_scop_iteration_domain): Remove.
(add_loop_constraints): New.
(build_iteration_domains): New.
(build_poly_scop): Call build_iteration_domains.
---
 gcc/graphite-sese-to-poly.c | 407 +---
 1 file changed, 192 insertions(+), 215 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 78dc2fb..abf18a7 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -431,159 +431,6 @@ extract_affine (scop_p s, tree e, __isl_take isl_space *space)
   return res;
 }
 
-/* Assign dimension for each parameter in SCOP.  */
-
-static void
-set_scop_parameter_dim (scop_p scop)
-{
-  sese_info_p region = scop->scop_info;
-  unsigned nbp = sese_nb_params (region);
-  isl_space *space = isl_space_set_alloc (scop->isl_context, nbp, 0);
-
-  unsigned i;
-  tree e;
-  FOR_EACH_VEC_ELT (region->params, i, e)
-space = isl_space_set_dim_id (space, isl_dim_param, i,
-  isl_id_for_ssa_name (scop, e));
-
-  scop->param_context = isl_set_universe (space);
-}
-
-static inline bool
-cleanup_loop_iter_dom (isl_set *inner, isl_set *outer, isl_space *space, mpz_t g)
-{
-  isl_set_free (inner);
-  isl_set_free (outer);
-  isl_space_free (space);
-  mpz_clear (g);
-  return false;
-}
-
-/* Builds the constraint polyhedra for LOOP in SCOP.  OUTER_PH gives
-   the constraints for the surrounding loops.  */
-
-static bool
-build_loop_iteration_domains (scop_p scop, struct loop *loop,
-  int nb,
- isl_set *outer, isl_set **doms)
-{
-
-  tree nb_iters = number_of_latch_executions (loop);
-  const sese_l& region = scop->scop_info->region;
-  gcc_assert (loop_in_sese_p (loop, region));
-
-  isl_set *inner = isl_set_copy (outer);
-  int pos = isl_set_dim (outer, isl_dim_set);
-  isl_val *v;
-  mpz_t g;
-
-  mpz_init (g);
-
-  inner = isl_set_add_dims (inner, isl_dim_set, 1);
-  isl_space *space = isl_set_get_space (inner);
-
-  /* 0 <= loop_i */
-  isl_constraint *c = isl_inequality_alloc
-  (isl_local_space_from_space (isl_space_copy (space)));
-  c = isl_constraint_set_coefficient_si (c, isl_dim_set, pos, 1);
-  inner = isl_set_coalesce (isl_set_add_constraint (inner, c));
-
-  /* loop_i <= cst_nb_iters */
-  if (TREE_CODE (nb_iters) == INTEGER_CST)
-{
-  c = isl_inequality_alloc
- (isl_local_space_from_space (isl_space_copy (space)));
-  c = isl_constraint_set_coefficient_si (c, isl_dim_set, pos, -1);
-  tree_int_to_gmp (nb_iters, g);
-  v = isl_val_int_from_gmp (scop->isl_context, g);
-  c = isl_constraint_set_constant_val (c, v);
-  inner = isl_set_add_constraint (inner, c);
-}
-
-  /* loop_i <= expr_nb_iters */
-  else if (!chrec_contains_undetermined (nb_iters))
-{
-  isl_pw_aff *aff;
-
-  nb_iters = scalar_evolution_in_region (region, loop, nb_iters);
-
-  /* Bail out as we do not know the scev.  */
-  if (chrec_contains_undetermined (nb_iters))
-   return cleanup_loop_iter_dom (inner, outer, space, g);
-
-  aff = extract_affine (scop, nb_iters, isl_set_get_space (inner));
-  isl_set *valid = isl_pw_aff_nonneg_set (isl_pw_aff_copy (aff));
-  valid = isl_set_project_out (valid, isl_dim_set, 0,
-  isl_set_dim (valid, isl_dim_set));
-
-  if (valid)
-   scop->param_context = isl_set_coalesce
- (isl_set_intersect (scop->param_context, valid));
-
-  isl_local_space *ls = isl_local_space_from_space (isl_space_copy (space));
-  isl_aff *al = isl_aff_set_coefficient_si (isl_aff_zero_on_domain (ls),
-   isl_dim_in, pos, 1);
-  isl_set *le = isl_pw_aff_le_set (isl_pw_aff_from_aff (al),
-  isl_pw_aff_copy (aff));
-  inner = isl_set_intersect (inner, le);
-
-  widest_int nit;
-  if (max_stmt_executions (loop, &nit))
-   {
- /* Insert in the context the constraints from the
-estimation of the number of iterations NIT and the
-symbolic number of iterations (involving parameter
-names) NB_ITERS.  First, build the affine expression
-"NIT - NB_ITERS" and then say that it is positive,
-i.e., NIT approximates NB_ITERS: "NIT >= NB_ITERS".  */
- mpz_t g;
- mpz_init (g);
- wi::to_mpz (nit, g, SIGNED);
- mpz_sub_ui (g, g, 1);
-
- isl_pw_aff *approx
-   = extract_affine_gmp (g, isl_set_get_space (inner));
- isl_set *x = isl_pw_aff_ge_set (approx, aff);
- x = isl_set_project_out (x, isl_dim_set, 0,
-  isl_set_dim (x, isl_dim_set));
-   
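
The two bounds built in the removed code above, "0 <= loop_i" and
"loop_i <= cst_nb_iters", are each encoded as an affine inequality of the form
coefficient * i + constant >= 0.  A minimal sketch of that encoding without
isl follows; the types constraint and iteration_domain and the helper
make_domain are hypothetical names used only for illustration.

```cpp
#include <cassert>

// One affine inequality over a single loop counter i:
//   coefficient * i + constant >= 0
struct constraint {
  long coefficient;
  long constant;
  bool holds(long i) const { return coefficient * i + constant >= 0; }
};

// Iteration domain of a loop with NB_ITERS latch executions:
//   0 <= i         encoded as   1 * i + 0        >= 0
//   i <= nb_iters  encoded as  -1 * i + nb_iters >= 0
struct iteration_domain {
  constraint lower;
  constraint upper;
  bool contains(long i) const { return lower.holds(i) && upper.holds(i); }
};

iteration_domain make_domain(long nb_iters) {
  return iteration_domain{{1, 0}, {-1, nb_iters}};
}
```

The coefficients 1 and -1 match the values passed to
isl_constraint_set_coefficient_si in the code above, and the constant of the
upper bound is the iteration count.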

Re: [AArch64] Remove TODO (redundant type conversions) in arm_neon.h

2016-01-15 Thread James Greenhalgh
On Mon, Jan 11, 2016 at 11:56:50AM +, Jiong Wang wrote:
> There are quite a few redundant type conversions in arm_neon.h.  All of
> them are in intrinsics that take an argument of vector float type and
> return a result of vector unsigned integer type.
> 
> The problem is that we currently support the UNOP and UNOPU qualifiers for
> unary "signed <- signed" and "unsigned <- unsigned" respectively, but we
> lack a unary "unsigned <- signed" qualifier, which this patch adds as
> UNOPUS.
> 
> "vector unsigned int" <- "vector float" should fall into the UNOPUS category.
> 
> I guess this patch also fixes hidden bugs in arm_neon.h that would be
> exposed when -Wconversion is specified, because several builtins return
> types inconsistent with those declared, for example "vcvtas_u32_f32" and
> "vcvtad_u64_f64".
> 
> OK for trunk, or should this wait until stage 1 reopens?
> 

Yes, because of those bugs I'd like to take this now.

OK for trunk.

Thanks,
James

> 2016-01-11  Jiong. Wang  
> 
> gcc/
>   * config/aarch64/aarch64-builtins.c (aarch64_types_unopus_qualifiers):
>   New.
>   (TYPES_UNOPUS): New.
>   * config/aarch64/aarch64-simd-builtins.def (lbtruncuv2sf): Correct
>   builtin type, from UNOP to UNOPUS.
>   (lbtruncuv4sf): Likewise.
>   (lbtruncuv2df): Likewise.
>   (lrounduv2sf): Likewise.
>   (lrounduv4sf): Likewise.
>   (lrounduv2df): Likewise.
>   (lroundusf): Likewise.
>   (lroundudf): Likewise.
>   (lceiluv2sf): Likewise.
>   (lceiluv4sf): Likewise.
>   (lceiluv2df): Likewise.
>   (lceilusf): Likewise.
>   (lceiludf): Likewise.
>   (lflooruv2sf): Likewise.
>   (lflooruv4sf): Likewise.
>   (lflooruv2df): Likewise.
>   (lfloorusf): Likewise.
>   (lfloorudf): Likewise.
>   (lfrintnuv2sf): Likewise.
>   (lfrintnuv4sf): Likewise.
>   (lfrintnuv2df): Likewise.
>   (lfrintnusf): Likewise.
>   (lfrintnudf): Likewise.
>   * config/aarch64/arm_neon.h (vcvt_u32_f32): Remove unnecessary type
>   conversion.
>   (vcvtq_u32_f32): Likewise.
>   (vcvtq_u64_f64): Likewise.
>   (vcvta_u32_f32): Likewise.
>   (vcvtaq_u32_f32): Likewise.
>   (vcvtaq_u64_f64): Likewise.
>   (vcvtm_u32_f32): Likewise.
>   (vcvtmq_u32_f32): Likewise.
>   (vcvtmq_u64_f64): Likewise.
>   (vcvtn_u32_f32): Likewise.
>   (vcvtnq_u32_f32): Likewise.
>   (vcvtnq_u64_f64): Likewise.
>   (vcvtp_u32_f32): Likewise.
>   (vcvtpq_u32_f32): Likewise.
>   (vcvtpq_u64_f64): Likewise.
>   (vcvtmd_u64_f64): Likewise.
>   (vcvtms_u32_f32): Likewise.
>   (vcvtad_u64_f64): Likewise.
>   (vcvtas_u32_f32): Likewise.
>   (vcvtnd_u64_f64): Likewise.
>   (vcvtns_u32_f32): Likewise.
>   (vcvtpd_u64_f64): Likewise.
>   (vcvtps_u32_f32): Likewise.
> 
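
The three qualifier sets described in the quoted text can be modelled in a few
lines.  This is a simplified sketch and not the contents of
aarch64-builtins.c: the enum, the struct, and the helper returns_unsigned are
hypothetical, and only the "result <- argument" pairing is taken from the
description above.

```cpp
#include <cassert>

// Simplified model of a unary builtin's type qualifiers.
enum qualifier { q_none, q_unsigned };

struct unary_qualifiers {
  qualifier result;    // Qualifier of the return type.
  qualifier argument;  // Qualifier of the single argument.
};

constexpr unary_qualifiers UNOP   = {q_none, q_none};          // signed <- signed
constexpr unary_qualifiers UNOPU  = {q_unsigned, q_unsigned};  // unsigned <- unsigned
constexpr unary_qualifiers UNOPUS = {q_unsigned, q_none};      // unsigned <- signed

// True when a builtin described by Q already returns an unsigned type,
// so an intrinsic with an unsigned return type needs no cast around it.
constexpr bool returns_unsigned(unary_qualifiers q) {
  return q.result == q_unsigned;
}
```

Under this model a float-to-unsigned conversion builtin declared with UNOP
returns a signed type and needs a cast in arm_neon.h, while the same builtin
declared with UNOPUS does not.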




Re: [PATCH][AArch64] Handle CSEL of zero_extended operands in rtx costs

2016-01-15 Thread James Greenhalgh
On Mon, Jan 11, 2016 at 04:41:32PM +, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch fixes the test gcc.target/aarch64/pr66776.c for -mcpu=cortex-a53.
> Currently we don't handle the
> (if_then_else (cond) (zero_extend r1) (zero_extend r2))
> form of CSEL, so we end up recursing into the operands of the if_then_else
> and for some CPUs reject the combination. We end up generating two UXTW
> instructions followed by a CSEL rather than a single CSEL on the w-regs.
> Such is the case for -mcpu=cortex-a53.
> 
> This small patch fixes that by catching the zero_extended operands and
> extracting their inner regs properly for further costing in
> aarch64_if_then_else_costs.
> 
> With this patch the aforementioned test now passes with -mcpu=cortex-a53.
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?

OK.

Thanks,
James

> Thanks,
> Kyrill
> 
> 2016-01-11  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64.c (aarch64_if_then_else_costs): Handle
> CSEL of zero_extended registers.



[PATCH 13/15] reinstantiate loop blocking

2016-01-15 Thread Sebastian Pop
* graphite-optimize-isl.c (get_schedule_for_node_st): Add back.
(optimize_isl): Call isl_schedule_map_schedule_node_bottom_up.
* params.def (PARAM_LOOP_BLOCK_TILE_SIZE): Adjust to 32.

gcc/testsuite

* gcc.dg/graphite/block-1.c:
* gcc.dg/graphite/block-5.c:
* gcc.dg/graphite/block-6.c:
* gcc.dg/graphite/block-pr47654.c:
* gcc.dg/graphite/interchange-0.c:
* gcc.dg/graphite/interchange-12.c:
* gcc.dg/graphite/interchange-14.c:
* gcc.dg/graphite/interchange-15.c:
* gcc.dg/graphite/interchange-5.c:
* gcc.dg/graphite/interchange-6.c:
* gcc.dg/graphite/interchange-8.c:
* gcc.dg/graphite/interchange-mvt.c:
* gcc.dg/graphite/uns-block-1.c:
* gcc.dg/graphite/uns-interchange-12.c:
* gcc.dg/graphite/uns-interchange-14.c:
* gcc.dg/graphite/uns-interchange-15.c:
* gcc.dg/graphite/uns-interchange-mvt.c:
* gfortran.dg/graphite/pr14741.f90:
---
 gcc/graphite-optimize-isl.c| 54 ++
 gcc/params.def |  2 +-
 gcc/testsuite/gcc.dg/graphite/block-1.c|  2 +-
 gcc/testsuite/gcc.dg/graphite/block-5.c|  2 +-
 gcc/testsuite/gcc.dg/graphite/block-6.c|  3 +-
 gcc/testsuite/gcc.dg/graphite/block-pr47654.c  |  2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-0.c  |  2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-12.c |  3 +-
 gcc/testsuite/gcc.dg/graphite/interchange-14.c |  3 +-
 gcc/testsuite/gcc.dg/graphite/interchange-15.c |  2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-5.c  |  2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-6.c  |  2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-8.c  |  3 +-
 gcc/testsuite/gcc.dg/graphite/interchange-mvt.c|  2 +-
 gcc/testsuite/gcc.dg/graphite/uns-block-1.c|  2 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c |  3 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c |  3 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c |  2 +-
 .../gcc.dg/graphite/uns-interchange-mvt.c  |  2 +-
 gcc/testsuite/gfortran.dg/graphite/pr14741.f90 |  2 +-
 20 files changed, 74 insertions(+), 24 deletions(-)

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index f385c77..28dc6d4 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -39,6 +39,56 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "graphite.h"
 
+/* get_schedule_for_node_st - Improve schedule for the schedule node.
+   Only Simple loop tiling is considered.  */
+
+static __isl_give isl_schedule_node *
+get_schedule_for_node_st (__isl_take isl_schedule_node *node, void *user)
+{
+  if (user)
+return node;
+
+  if (isl_schedule_node_get_type (node) != isl_schedule_node_band
+  || isl_schedule_node_n_children (node) != 1)
+return node;
+
+  isl_space *space = isl_schedule_node_band_get_space (node);
+  unsigned dims = isl_space_dim (space, isl_dim_set);
+  isl_schedule_node *child = isl_schedule_node_get_child (node, 0);
+  isl_schedule_node_type type = isl_schedule_node_get_type (child);
+  isl_space_free (space);
+  isl_schedule_node_free (child);
+
+  if (type != isl_schedule_node_leaf)
+return node;
+
+  if (dims <= 1 || !isl_schedule_node_band_get_permutable (node))
+{
+  if (dump_file && dump_flags)
+   fprintf (dump_file, "not tiled\n");
+  return node;
+}
+
+  /* Tile loops.  */
+  space = isl_schedule_node_band_get_space (node);
+  isl_multi_val *sizes = isl_multi_val_zero (space);
+  long tile_size = PARAM_VALUE (PARAM_LOOP_BLOCK_TILE_SIZE);
+  isl_ctx *ctx = isl_schedule_node_get_ctx (node);
+
+  for (unsigned i = 0; i < dims; i++)
+{
+  sizes = isl_multi_val_set_val (sizes, i,
+isl_val_int_from_si (ctx, tile_size));
+  if (dump_file && dump_flags)
+   fprintf (dump_file, "tiled by %ld\n", tile_size);
+}
+
+  node = isl_schedule_node_band_tile (node, sizes);
+  node = isl_schedule_node_child (node, 0);
+
+  return node;
+}
+
 static isl_union_set *
 scop_get_domains (scop_p scop)
 {
@@ -83,6 +133,7 @@ optimize_isl (scop_p scop)
   sc = isl_schedule_constraints_set_validity (sc, isl_union_map_copy (validity));
   sc = isl_schedule_constraints_set_coincidence (sc, validity);
 
+  isl_options_set_tile_scale_tile_loops (scop->isl_context, 32);
   isl_options_set_schedule_serialize_sccs (scop->isl_context, 0);
   isl_options_set_schedule_maximize_band_depth (scop->isl_context, 1);
   isl_options_set_schedule_max_constant_term (scop->isl_context, 20);
@@ -95,6 +146,9 @@ optimize_isl (scop_p scop)
   isl_options_set_ast_build_atomic_upper_bound (scop->isl_context, 1);
 
   scop->transformed_schedule = isl_schedule_constraints_compute_schedule (sc);
+  scop->transformed_schedule =
+isl_schedule_map_schedule_node_bottom_up (scop->tra
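
To show what tiling a two-deep loop nest by a fixed tile size does to the
iteration order, here is a hand-written sketch on a matrix transpose.  It does
not use isl or Graphite; the function transpose_tiled and its parameters are
hypothetical, and the tile size is passed in rather than read from
PARAM_LOOP_BLOCK_TILE_SIZE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Transpose an n x n row-major matrix with both loops tiled by TILE
// (TILE must be nonzero).  The two outer loops walk over tiles, the two
// inner loops over the points of one tile; the extra "< n" bound handles
// partial tiles at the edges.
std::vector<int> transpose_tiled(const std::vector<int> &in, std::size_t n,
                                 std::size_t tile) {
  std::vector<int> out(n * n);
  for (std::size_t ii = 0; ii < n; ii += tile)
    for (std::size_t jj = 0; jj < n; jj += tile)
      for (std::size_t i = ii; i < ii + tile && i < n; ++i)
        for (std::size_t j = jj; j < jj + tile && j < n; ++j)
          out[j * n + i] = in[i * n + j];
  return out;
}
```

With a tile size of 32, each inner pair of loops touches at most a 32 x 32
block of both matrices before moving on, which is the locality effect loop
blocking is after.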

[PATCH 08/15] record loops in execution order

2016-01-15 Thread Sebastian Pop
From: Sebastian Pop 

* graphite-scop-detection.c (record_loop_in_sese): New.
(gather_bbs::before_dom_children): Call record_loop_in_sese.
(build_scops): Remove call to build_sese_loop_nests.
* sese.c (sese_record_loop): Remove.
(build_sese_loop_nests): Remove.
(new_sese_info): Remove region->loops.
(free_sese_info): Same.
* sese.h (sese_contains_loop): Same.
(build_sese_loop_nests): Remove.
(sese_contains_loop): Remove.
---
 gcc/graphite-scop-detection.c | 29 ---
 gcc/sese.c| 54 ---
 gcc/sese.h| 10 
 3 files changed, 26 insertions(+), 67 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index e004185..be33be3 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -1874,15 +1874,40 @@ gather_bbs::gather_bbs (cdi_direction direction, scop_p scop)
 {
 }
 
+/* Record in execution order the loops fully contained in the region.  */
+
+static void
+record_loop_in_sese (basic_block bb, sese_info_p region)
+{
+  loop_p father = bb->loop_father;
+  if (loop_in_sese_p (father, region->region))
+{
+  bool found = false;
+  loop_p loop0;
+  int j;
+  FOR_EACH_VEC_ELT (region->loop_nest, j, loop0)
+   if (father == loop0)
+ {
+   found = true;
+   break;
+ }
+  if (!found)
+   region->loop_nest.safe_push (father);
+}
+}
+
 /* Call-back for dom_walk executed before visiting the dominated
blocks.  */
 
 edge
 gather_bbs::before_dom_children (basic_block bb)
 {
-  if (!bb_in_sese_p (bb, scop->scop_info->region))
+  sese_info_p region = scop->scop_info;
+  if (!bb_in_sese_p (bb, region->region))
 return NULL;
 
+  record_loop_in_sese (bb, region);
+  
   gcond *stmt = single_pred_cond_non_loop_exit (bb);
 
   if (stmt)
@@ -1991,8 +2016,6 @@ build_scops (vec<scop_p> *scops)
  continue;
}
 
-  build_sese_loop_nests (scop->scop_info);
-
   find_scop_parameters (scop);
   graphite_dim_t max_dim = PARAM_VALUE (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS);
 
diff --git a/gcc/sese.c b/gcc/sese.c
index b0f54de..2ecff7d 100644
--- a/gcc/sese.c
+++ b/gcc/sese.c
@@ -43,56 +43,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "sese.h"
 #include "tree-ssa-propagate.h"
 
-/* Record LOOP as occurring in REGION.  */
-
-static void
-sese_record_loop (sese_info_p region, loop_p loop)
-{
-  if (sese_contains_loop (region, loop))
-return;
-
-  bitmap_set_bit (region->loops, loop->num);
-  region->loop_nest.safe_push (loop);
-}
-
-/* Build the loop nests contained in REGION.  Returns true when the
-   operation was successful.  */
-
-void
-build_sese_loop_nests (sese_info_p region)
-{
-  unsigned i;
-  basic_block bb;
-  struct loop *loop0, *loop1;
-
-  FOR_EACH_BB_FN (bb, cfun)
-if (bb_in_sese_p (bb, region->region))
-  {
-   struct loop *loop = bb->loop_father;
-
-   /* Only add loops if they are completely contained in the SCoP.  */
-   if (loop->header == bb
-   && bb_in_sese_p (loop->latch, region->region))
- sese_record_loop (region, loop);
-  }
-
-  /* Make sure that the loops in the SESE_LOOP_NEST are ordered.  It
- can be the case that an inner loop is inserted before an outer
- loop.  To avoid this, semi-sort once.  */
-  FOR_EACH_VEC_ELT (region->loop_nest, i, loop0)
-{
-  if (region->loop_nest.length () == i + 1)
-   break;
-
-  loop1 = region->loop_nest[i + 1];
-  if (loop0->num > loop1->num)
-   {
- region->loop_nest[i] = loop1;
- region->loop_nest[i + 1] = loop0;
-   }
-}
-}
-
 /* For a USE in BB, if BB is outside REGION, mark the USE in the
LIVEOUTS set.  */
 
@@ -228,7 +178,6 @@ new_sese_info (edge entry, edge exit)
 
   region->region.entry = entry;
   region->region.exit = exit;
-  region->loops = BITMAP_ALLOC (NULL);
   region->loop_nest.create (3);
   region->params.create (3);
   region->rename_map = new rename_map_t;
@@ -244,9 +193,6 @@ new_sese_info (edge entry, edge exit)
 void
 free_sese_info (sese_info_p region)
 {
-  if (region->loops)
-region->loops = BITMAP_ALLOC (NULL);
-
   region->params.release ();
   region->loop_nest.release ();
 
diff --git a/gcc/sese.h b/gcc/sese.h
index 99df354..f481524 100644
--- a/gcc/sese.h
+++ b/gcc/sese.h
@@ -86,7 +86,6 @@ typedef struct sese_info_t
   rename_map_t *rename_map;
 
   /* Loops completely contained in this SESE.  */
-  bitmap loops;
   vec<loop_p> loop_nest;
 
   /* Basic blocks contained in this SESE.  */
@@ -107,20 +106,11 @@ typedef struct sese_info_t
 extern sese_info_p new_sese_info (edge, edge);
 extern void free_sese_info (sese_info_p);
 extern void sese_insert_phis_for_liveouts (sese_info_p, basic_block, edge, edge);
-extern void build_sese_loop_nests (sese_info_p);
 extern struct loop *outermost_loop_in_sese (se
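
The new record_loop_in_sese above keeps loops in the order in which the walk
first meets them, using a linear search to avoid duplicates.  The same pattern
in a stand-alone form, with a hypothetical helper name record_once and plain
values instead of loop pointers:

```cpp
#include <cassert>
#include <vector>

// Append ITEM to ORDER unless it is already present, so ORDER keeps
// the sequence in which items were first seen.
template <typename T>
void record_once(std::vector<T> &order, const T &item) {
  for (const T &seen : order)
    if (seen == item) return;
  order.push_back(item);
}
```

The linear scan is quadratic in the worst case, which is presumably acceptable
here because the number of loops in a region is small; that is an assumption
about typical inputs, not something stated in the patch.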

[PATCH 12/15] new scop schedule.

2016-01-15 Thread Sebastian Pop
From: Sebastian Pop 

* graphite-dependences.c (scop_get_reads): Do not call
isl_union_map_add_map that is undocumented isl functionality.
(scop_get_must_writes): Same.
(scop_get_may_writes): Same.
(scop_get_original_schedule): Remove.
(scop_get_dependences): Do not call isl_union_map_compute_flow that
is deprecated in isl 0.15.  Instead, use isl_union_access_* interface.
(compute_deps): Remove.
* graphite-isl-ast-to-gimple.c (print_schedule_ast): New.
(debug_schedule_ast): New.
(translate_isl_ast_to_gimple::print_isl_ast_node): Removed.
(translate_isl_ast_to_gimple::get_max_schedule_dimensions): Remove.
(translate_isl_ast_to_gimple::extend_schedule): Remove.
(translate_isl_ast_to_gimple::generate_isl_schedule): Remove.
(translate_isl_ast_to_gimple::set_options): Remove.
(translate_isl_ast_to_gimple::scop_to_isl_ast): Generate code
from scop->transformed_schedule.
(graphite_regenerate_ast_isl): Add more dump.
* graphite-optimize-isl.c (optimize_isl): Set
scop->transformed_schedule.  Check whether schedules are equal.
(apply_poly_transforms): Move here.
* graphite-poly.c (apply_poly_transforms): ... from here.
(free_poly_bb): Static.
(free_scop): Static.
(pbb_number_of_iterations_at_time): Remove.
(print_isl_ast): New.
(debug_isl_ast): New.
(debug_scop_pbb): New.
* graphite-scop-detection.c (print_edge): Move.
(print_sese): Move.
* graphite-sese-to-poly.c (build_pbb_scattering_polyhedrons): Remove.
(build_scop_scattering): Remove.
(create_pw_aff_from_tree): Assert instead of bailing out.
(add_condition_to_pbb): Remove unused code, do not fail.
(add_conditions_to_domain): Same.
(add_conditions_to_constraints): Remove.
(build_scop_context): New.
(add_iter_domain_dimension): New.
(build_iteration_domains): Initialize pbb->iterators.
Call add_conditions_to_domain.
(nested_in): New.
(loop_at): New.
(index_outermost_in_loop): New.
(index_pbb_in_loop): New.
(outermost_pbb_in): New.
(add_in_sequence): New.
(add_outer_projection): New.
(outer_projection_mupa): New.
(add_loop_schedule): New.
(build_schedule_pbb): New.
(build_schedule_loop): New.
(embed_in_surrounding_loops): New.
(build_schedule_loop_nest): New.
(build_original_schedule): New.
(build_poly_scop): Call build_original_schedule.
* graphite.h (free_poly_dr): Remove.
(struct poly_bb): Add iterators.  Remove schedule, transformed, saved.
(free_poly_bb): Remove.
(debug_loop_vec): Remove.
(print_isl_ast): Declare.
(debug_isl_ast): Declare.
(scop_do_interchange): Remove.
(scop_do_strip_mine): Remove.
(scop_do_block): Remove.
(flatten_all_loops): Remove.
(optimize_isl): Remove.
(pbb_number_of_iterations_at_time): Remove.
(debug_scop_pbb): Declare.
(print_schedule_ast): Declare.
(debug_schedule_ast): Declare.
(struct scop): Remove schedule.  Add original_schedule,
transformed_schedule.
(free_gimple_poly_bb): Remove.
(print_generated_program): Remove.
(debug_generated_program): Remove.
(unify_scattering_dimensions): Remove.
* sese.c (print_edge): ... here.
(print_sese): ... here.
(debug_edge): ... here.
(debug_sese): ... here.
* sese.h (print_edge): Declare.
(print_sese): Declare.
(dump_edge): Declare.
(dump_sese): Declare.

gcc/testsuite

* gcc.dg/graphite/block-0.c: Adjust pattern.
* gcc.dg/graphite/block-1.c: Same.
* gcc.dg/graphite/block-5.c: Same.
* gcc.dg/graphite/block-6.c: Same.
* gcc.dg/graphite/block-pr47654.c: Same.
* gcc.dg/graphite/interchange-0.c: Same.
* gcc.dg/graphite/interchange-10.c: Same.
* gcc.dg/graphite/interchange-12.c: Same.
* gcc.dg/graphite/interchange-14.c: Same.
* gcc.dg/graphite/interchange-15.c: Same.
* gcc.dg/graphite/interchange-5.c: Same.
* gcc.dg/graphite/interchange-6.c: Same.
* gcc.dg/graphite/interchange-8.c: Same.
* gcc.dg/graphite/interchange-mvt.c: Same.
* gcc.dg/graphite/pr35356-1.c: Same.
* gcc.dg/graphite/scop-10.c (int toto): Same.
* gcc.dg/graphite/uns-block-1.c: Same.
* gcc.dg/graphite/uns-interchange-12.c: Same.
* gcc.dg/graphite/uns-interchange-14.c: Same.
* gcc.dg/graphite/uns-interchange-15.c: Same.
* gcc.dg/graphite/uns-interchange-mvt.c: Same.
* gfortran.dg/graphite/interchange-3.f90: Same.
* gfortran.dg/graphite/pr14741.f90: Same.
---
 gcc/graphite-dependences.c 

[PATCH 05/15] remove tiling

2016-01-15 Thread Sebastian Pop
From: Sebastian Pop 

We remove all code related to tiling; we will then call isl functionality for
that.

* graphite-isl-ast-to-gimple.c (set_options_for_schedule_tree): Remove.
(translate_isl_ast_to_gimple::scop_to_isl_ast): Call 
set_separate_option.
(graphite_regenerate_ast_isl): Add dump.
* graphite-optimize-isl.c (get_schedule_for_node_st): Remove.
(get_schedule_map_st): Remove.
(get_single_map): Remove.
(apply_schedule_map_to_scop): Remove.
(optimize_isl): Do not use isl_union_maps to build the schedule.
* graphite-poly.c (apply_poly_transforms): Simplify.
(print_isl_set): Use more readable format: ISL_YAML_STYLE_BLOCK.
(print_isl_map): Same.
(print_isl_union_map): Same.
(print_isl_schedule): New.
(debug_isl_schedule): New.
* graphite.h: Declare print_isl_schedule and debug_isl_schedule.

gcc/testsuite

* gcc.dg/graphite/block-0.c: Adjust pattern.
* gcc.dg/graphite/block-1.c: Same.
* gcc.dg/graphite/block-5.c: Same.
* gcc.dg/graphite/block-6.c: Same.
* gcc.dg/graphite/block-pr47654.c: Same.
* gcc.dg/graphite/interchange-0.c: Same.
* gcc.dg/graphite/interchange-1.c: Same.
* gcc.dg/graphite/interchange-10.c: Same.
* gcc.dg/graphite/interchange-11.c: Same.
* gcc.dg/graphite/interchange-12.c: Same.
* gcc.dg/graphite/interchange-13.c: Same.
* gcc.dg/graphite/interchange-14.c: Same.
* gcc.dg/graphite/interchange-15.c: Same.
* gcc.dg/graphite/interchange-16.c: Same.
* gcc.dg/graphite/interchange-2.c: Same.
* gcc.dg/graphite/interchange-3.c: Same.
* gcc.dg/graphite/interchange-4.c: Same.
* gcc.dg/graphite/interchange-5.c: Same.
* gcc.dg/graphite/interchange-6.c: Same.
* gcc.dg/graphite/interchange-7.c: Same.
* gcc.dg/graphite/interchange-8.c: Same.
* gcc.dg/graphite/interchange-9.c: Same.
* gcc.dg/graphite/interchange-mvt.c: Same.
* gcc.dg/graphite/uns-block-1.c: Same.
* gcc.dg/graphite/uns-interchange-12.c: Same.
* gcc.dg/graphite/uns-interchange-14.c: Same.
* gcc.dg/graphite/uns-interchange-15.c: Same.
* gcc.dg/graphite/uns-interchange-9.c: Same.
* gcc.dg/graphite/uns-interchange-mvt.c: Same.
* gfortran.dg/graphite/interchange-3.f90: Same.
* gfortran.dg/graphite/pr14741.f90: Same.
---
 gcc/graphite-isl-ast-to-gimple.c   |  47 +++-
 gcc/graphite-optimize-isl.c| 130 +++--
 gcc/graphite-poly.c|  36 --
 gcc/graphite.h |   2 +
 gcc/testsuite/gcc.dg/graphite/block-0.c|   2 +-
 gcc/testsuite/gcc.dg/graphite/block-1.c|   2 +-
 gcc/testsuite/gcc.dg/graphite/block-5.c|   3 +-
 gcc/testsuite/gcc.dg/graphite/block-6.c|   3 +-
 gcc/testsuite/gcc.dg/graphite/block-pr47654.c  |   2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-0.c  |   2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-1.c  |   7 +-
 gcc/testsuite/gcc.dg/graphite/interchange-10.c |   2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-11.c |   2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-12.c |   2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-13.c |   1 +
 gcc/testsuite/gcc.dg/graphite/interchange-14.c |   2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-15.c |   2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-16.c |   1 +
 gcc/testsuite/gcc.dg/graphite/interchange-2.c  |   2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-3.c  |   2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-4.c  |   2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-5.c  |   2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-6.c  |   2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-7.c  |   2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-8.c  |   2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-9.c  |   2 +-
 gcc/testsuite/gcc.dg/graphite/interchange-mvt.c|   2 +-
 gcc/testsuite/gcc.dg/graphite/uns-block-1.c|   2 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-12.c |   2 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-14.c |   2 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c |   2 +-
 gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c  |   2 +-
 .../gcc.dg/graphite/uns-interchange-mvt.c  |   2 +-
 .../gfortran.dg/graphite/interchange-3.f90 |   2 +-
 gcc/testsuite/gfortran.dg/graphite/pr14741.f90 |   2 +-
 35 files changed, 97 insertions(+), 185 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index dad802f..b0da425 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -300,12 +300,6 @@ class translate_isl_ast_to_gimple
 
   __isl_give isl_union_map *generate_isl_sche

Re: [hsa merge 08/10] HSAIL BRIG description header file

2016-01-15 Thread Martin Jambor
On Fri, Jan 15, 2016 at 01:03:35PM +0100, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 11:37:32AM +0100, Jakub Jelinek wrote:
> > On Fri, Jan 15, 2016 at 11:14:33AM +0100, Martin Jambor wrote:
> > > > Martin, could you ask the HSA Foundation or AMD or whoever if there is
> > > > any way they could remove the second requirement of the license?  It
> > > > adds yet another case where anybody distributing GCC has to list yet
> > > > another copyright notice.
> > > 
> > > I will raise this with the HSA PRM group and perhaps there is a slight
> > > chance that they will change this in the upcoming version of HSAIL.
> > > But it is not going to happen soon enough.
> > 
> > Under what license is
> > http://www.hsafoundation.com/html/Content/PRM/Topics/18_BRIG/_chpStr_BRIG_HSAIL_binary_format.htm
> > ?  Sounds the same as the pdf to me.
> > Unlike the pdf version thereof, you could grab the ... chunks
> > out of this fairly easily with recursive wget and some quick scripting.
> 
> E.g.
> for i in `seq 2 123`; do sed 
> 's/\r$//;s//\n\n/g;s/<\/pre>/\n<\/pre>\n/g;s/ name=[^>]*><\/a>//g' $i | sed -n '/^$/,/^<\/pre>$/{/^<.*pre>$/d;p}'; done
> on downloaded (in the order of appearance in the toc) files, I get
> following, which while it doesn't compile, I suppose some manual reordering
> and if it is needed in C, also e.g. in case of typedef BrigModuleHeader* 
> BrigModule_t; adding
> struct before BrigModuleHeader or turning that struct also into a typedef, 
> might make it work.
> Now the question is if it covers all you care about.
> 

Yes it does.  We have massaged it just a little and it works fine (and
the compiler is also basically the same binary-wise).  So we will
go with the following hsa-brig-format.h (in its old location in gcc/).

Thanks for this input, it really helped,

Martin


/* HSA BRIG (binary representation of HSAIL) 1.0.1 representation description.
   Copyright (C) 2016 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.

GCC is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.

The contents of the file was created by extracting data structures, enum,
typedef and other definitions from HSA Programmer's Reference Manual Version
1.0.1 (http://www.hsafoundation.com/standards/).

HTML version is provided on the following link:
http://www.hsafoundation.com/html/Content/PRM/Topics/PRM_title_page.htm */

#ifndef HSA_BRIG_FORMAT_H
#define HSA_BRIG_FORMAT_H

struct BrigModuleHeader;
typedef uint16_t BrigKind16_t;
typedef uint32_t BrigVersion32_t;

typedef BrigModuleHeader *BrigModule_t;
typedef uint32_t BrigDataOffset32_t;
typedef uint32_t BrigCodeOffset32_t;
typedef uint32_t BrigOperandOffset32_t;
typedef BrigDataOffset32_t BrigDataOffsetString32_t;
typedef BrigDataOffset32_t BrigDataOffsetCodeList32_t;
typedef BrigDataOffset32_t BrigDataOffsetOperandList32_t;
typedef uint8_t BrigAlignment8_t;

enum BrigAlignment
{
  BRIG_ALIGNMENT_NONE = 0,
  BRIG_ALIGNMENT_1 = 1,
  BRIG_ALIGNMENT_2 = 2,
  BRIG_ALIGNMENT_4 = 3,
  BRIG_ALIGNMENT_8 = 4,
  BRIG_ALIGNMENT_16 = 5,
  BRIG_ALIGNMENT_32 = 6,
  BRIG_ALIGNMENT_64 = 7,
  BRIG_ALIGNMENT_128 = 8,
  BRIG_ALIGNMENT_256 = 9
};

typedef uint8_t BrigAllocation8_t;

enum BrigAllocation
{
  BRIG_ALLOCATION_NONE = 0,
  BRIG_ALLOCATION_PROGRAM = 1,
  BRIG_ALLOCATION_AGENT = 2,
  BRIG_ALLOCATION_AUTOMATIC = 3
};

typedef uint8_t BrigAluModifier8_t;

enum BrigAluModifierMask
{
  BRIG_ALU_FTZ = 1
};

typedef uint8_t BrigAtomicOperation8_t;

enum BrigAtomicOperation
{
  BRIG_ATOMIC_ADD = 0,
  BRIG_ATOMIC_AND = 1,
  BRIG_ATOMIC_CAS = 2,
  BRIG_ATOMIC_EXCH = 3,
  BRIG_ATOMIC_LD = 4,
  BRIG_ATOMIC_MAX = 5,
  BRIG_ATOMIC_MIN = 6,
  BRIG_ATOMIC_OR = 7,
  BRIG_ATOMIC_ST = 8,
  BRIG_ATOMIC_SUB = 9,
  BRIG_ATOMIC_WRAPDEC = 10,
  BRIG_ATOMIC_WRAPINC = 11,
  BRIG_ATOMIC_XOR = 12,
  BRIG_ATOMIC_WAIT_EQ = 13,
  BRIG_ATOMIC_WAIT_NE = 14,
  BRIG_ATOMIC_WAIT_LT = 15,
  BRIG_ATOMIC_WAIT_GTE = 16,
  BRIG_ATOMIC_WAITTIMEOUT_EQ = 17,
  BRIG_ATOMIC_WAITTIMEOUT_NE = 18,
  BRIG_ATOMIC_WAITTIMEOUT_LT = 19,
  BRIG_ATOMIC_WAITTIMEOUT_GTE = 20
};

struct BrigBase
{
  uint16_t byteCount;
  BrigKind16_t kind;
};

typedef uint8_t BrigCompareOperation8_t;

enum BrigCompareOperation
{
  BRIG_COMPARE_EQ = 0,
  BRIG_COMPARE_NE = 1,
  BRIG_COMPARE_LT = 2,
  BRIG_COMPARE_LE = 3,
  BRIG_COMPARE_GT = 4,
  BRIG_COMPARE_GE = 5,
  BRIG_COMPARE_EQU = 6,
  BRIG_COMPARE_NEU = 7,
  BRIG_COMPARE_LTU = 8,
  BRIG_COMPARE_LEU = 9,
  BRIG_COMPARE_GTU = 10,
  BRIG_COMPARE_GEU = 11,
  B

[PATCH 11/15] check for unstructured control flow

2016-01-15 Thread Sebastian Pop
From: Sebastian Pop 

* graphite-scop-detection.c (scop_detection::harmful_loop_in_region):
Discard unstructured if-then-else regions.
---
 gcc/graphite-scop-detection.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index a0c630b..f035e0d 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -1078,6 +1078,18 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
  return true;
}
 
+  /* Check for unstructured control flow: CFG not generated by structured
+if-then-else.  */
+  if (bb->succs->length () > 1)
+   {
+ edge e;
+ edge_iterator ei;
+ FOR_EACH_EDGE (e, ei, bb->succs)
+   if (!dominated_by_p (CDI_POST_DOMINATORS, bb, e->dest)
+   && !dominated_by_p (CDI_DOMINATORS, e->dest, bb))
+ return true;
+   }
+
   /* Collect all loops in the current region.  */
   loop_p loop = bb->loop_father;
   if (loop_in_sese_p (loop, scop))
-- 
2.5.0



[PATCH 06/15] fix codegen error exposed by compute isl flow patch

2016-01-15 Thread Sebastian Pop
From: Sebastian Pop 

we used to fail using an iv from a different loop.

* graphite-isl-ast-to-gimple.c (enum phi_node_kind): New.
(class translate_isl_ast_to_gimple): Use phi_node_kind instead of bool.
(is_valid_rename): Same.
(translate_isl_ast_to_gimple::get_rename): Same.
(translate_isl_ast_to_gimple::rename_all_uses): Same.
(translate_isl_ast_to_gimple::rename_uses): Same.
(get_new_name): Check for close_phi nodes.
(copy_loop_phi_args): Use phi_node_kind.
(translate_isl_ast_to_gimple::copy_loop_close_phi_args): Same.
(translate_isl_ast_to_gimple::copy_cond_phi_args): Same.

gcc/testsuite

* gfortran.dg/graphite/interchange-3.f90: Adjust pattern.
---
 gcc/graphite-isl-ast-to-gimple.c   | 48 ++
 .../gfortran.dg/graphite/interchange-3.f90 |  2 +-
 2 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index b0da425..a196419 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -135,6 +135,14 @@ set_separate_option (__isl_take isl_schedule_node *node, void *user)
   return node;
 }
 
+enum phi_node_kind
+{
+  unknown_phi,
+  loop_phi,
+  close_phi,
+  cond_phi
+};
+
 class translate_isl_ast_to_gimple
 {
  public:
@@ -317,14 +325,14 @@ class translate_isl_ast_to_gimple
  SSA form.  */
 
   bool is_valid_rename (tree rename, basic_block def_bb, basic_block use_bb,
-   bool loop_phi, tree old_name, basic_block old_bb) const;
+   phi_node_kind, tree old_name, basic_block old_bb) const;
 
   /* Returns the expression associated to OLD_NAME (which is used in OLD_BB), in
  NEW_BB from RENAME_MAP.  LOOP_PHI is true when we want to rename OLD_NAME
  within a loop PHI instruction.  */
 
   tree get_rename (basic_block new_bb, tree old_name,
-  basic_block old_bb, bool loop_phi) const;
+  basic_block old_bb, phi_node_kind) const;
 
   /* For ops which are scev_analyzeable, we can regenerate a new name from
   its scalar evolution around LOOP.  */
@@ -344,7 +352,7 @@ class translate_isl_ast_to_gimple
  true when we want to rename an OP within a loop PHI instruction.  */
 
   tree get_new_name (basic_block new_bb, tree op,
-basic_block old_bb, bool loop_phi) const;
+basic_block old_bb, phi_node_kind) const;
 
   /* Collect all the operands of NEW_EXPR by recursively visiting each
  operand.  */
@@ -1349,7 +1357,7 @@ phi_uses_name (basic_block bb, tree name)
 bool
 translate_isl_ast_to_gimple::
 is_valid_rename (tree rename, basic_block def_bb, basic_block use_bb,
-bool loop_phi, tree old_name, basic_block old_bb) const
+phi_node_kind phi_kind, tree old_name, basic_block old_bb) const
 {
   /* The def of the rename must either dominate the uses or come from a
  back-edge.  Also the def must respect the loop closed ssa form.  */
@@ -1367,7 +1375,7 @@ is_valid_rename (tree rename, basic_block def_bb, basic_block use_bb,
   if (dominated_by_p (CDI_DOMINATORS, use_bb, def_bb))
 return true;
 
-  if (bb_contains_loop_phi_nodes (use_bb) && loop_phi)
+  if (bb_contains_loop_phi_nodes (use_bb) && phi_kind == loop_phi)
 {
   /* The loop-header dominates the loop-body.  */
   if (!dominated_by_p (CDI_DOMINATORS, def_bb, use_bb))
@@ -1386,14 +1394,13 @@ is_valid_rename (tree rename, basic_block def_bb, basic_block use_bb,
 }
 
 /* Returns the expression associated to OLD_NAME (which is used in OLD_BB), in
-   NEW_BB from RENAME_MAP.  LOOP_PHI is true when we want to rename OLD_NAME
-   within a loop PHI instruction.  */
+   NEW_BB from RENAME_MAP.  PHI_KIND determines the kind of phi node.  */
 
 tree
 translate_isl_ast_to_gimple::get_rename (basic_block new_bb,
 tree old_name,
 basic_block old_bb,
-bool loop_phi) const
+phi_node_kind phi_kind) const
 {
   gcc_assert (TREE_CODE (old_name) == SSA_NAME);
   vec  *renames = region->rename_map->get (old_name);
@@ -1407,7 +1414,9 @@ translate_isl_ast_to_gimple::get_rename (basic_block new_bb,
   if (TREE_CODE (rename) == SSA_NAME)
{
  basic_block bb = gimple_bb (SSA_NAME_DEF_STMT (rename));
- if (is_valid_rename (rename, bb, new_bb, loop_phi, old_name, old_bb))
+ if (is_valid_rename (rename, bb, new_bb, phi_kind, old_name, old_bb)
+ && (phi_kind == close_phi
+ || flow_bb_inside_loop_p (bb->loop_father, new_bb)))
return rename;
  return NULL_TREE;
}
@@ -1435,6 +1444,9 @@ translate_isl_ast_to_gimple::get_rename (basic_block new_bb,
   if (!dominated_by_p (CDI_DOMINATORS, new_bb, t2_bb))
continue;
 
+

[PATCH 02/15] remove unused variable

2016-01-15 Thread Sebastian Pop
From: Sebastian Pop 

2015-12-30  Sebastian Pop  

* graphite-poly.c (new_poly_bb): Remove use of PBB_IS_REDUCTION.
* graphite.h (struct poly_bb): Remove field is_reduction.
(PBB_IS_REDUCTION): Remove.
---
 gcc/graphite-poly.c | 1 -
 gcc/graphite.h  | 4 
 2 files changed, 5 deletions(-)

diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index d188341..428c427 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -164,7 +164,6 @@ new_poly_bb (scop_p scop, gimple_poly_bb_p black_box)
   PBB_SCOP (pbb) = scop;
   pbb_set_black_box (pbb, black_box);
   PBB_DRS (pbb).create (3);
-  PBB_IS_REDUCTION (pbb) = false;
   GBB_PBB ((gimple_poly_bb_p) black_box) = pbb;
 
   return pbb;
diff --git a/gcc/graphite.h b/gcc/graphite.h
index 83f8191..f9af292 100644
--- a/gcc/graphite.h
+++ b/gcc/graphite.h
@@ -281,9 +281,6 @@ struct poly_bb
   /* A copy of the transformed scattering.  */
   isl_map *saved;
 
-  /* True when this PBB contains only a reduction statement.  */
-  bool is_reduction;
-
   /* The last basic block generated for this pbb.  */
   basic_block new_bb;
 };
@@ -291,7 +288,6 @@ struct poly_bb
 #define PBB_BLACK_BOX(PBB) ((gimple_poly_bb_p) PBB->black_box)
 #define PBB_SCOP(PBB) (PBB->scop)
 #define PBB_DRS(PBB) (PBB->drs)
-#define PBB_IS_REDUCTION(PBB) (PBB->is_reduction)
 
 extern poly_bb_p new_poly_bb (scop_p, gimple_poly_bb_p);
 extern void free_poly_bb (poly_bb_p);
-- 
2.5.0



Re: [PATCH][AArch64] Handle compare of zero_extract form of TST-immediate in rtx costs

2016-01-15 Thread James Greenhalgh
On Mon, Jan 11, 2016 at 04:41:22PM +, Kyrill Tkachov wrote:
> Hi all,
> 
> The test gcc.target/aarch64/tst_3.c fails for an explicit -mcpu=cortex-a53
> because we don't handle the recent compare with zero_extract pattern properly
> in rtx costs, so we end up recursing into its operands and end up rejecting
> the combination for some CPUs, generating an AND-immediate followed by a
> comparison against zero, instead of the TST-immediate instruction expected by
> the test.
> 
> This patch adds handling for that pattern so that we properly handle it the
> same ways as an ANDS instruction.  With this patch the aforementioned test
> passes for -mcpu=cortex-a53 as well.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?

OK.

Thanks,
James

> 
> Thanks,
> Kyrill
> 
> 2016-01-11  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64.c (aarch64_rtx_costs, COMPARE case):
> Handle COMPARE of ZERO_EXTRACT against zero form of TST-immediate.



[PATCH 03/15] fix PR68343: disable graphite tests for isl 0.14 or earlier

2016-01-15 Thread Sebastian Pop
From: Aditya Kumar 

The patch disables all optimizations when configuring gcc with isl 0.14 or
earlier.  The next patch makes use of schedule trees, which are only available
in isl 0.15.

ChangeLog:

* Makefile.in: Regenerate.
* Makefile.tpl: Export ISLVER.
* configure: Regenerate.
* config/isl.m4: Detect isl-0.15.

gcc/

* Makefile.in: Set ISLVER in site.exp.
* config.in: Regenerate.
* configure: Regenerate.
* configure.ac: Define HAVE_isl for isl-0.15.
* graphite-isl-ast-to-gimple.c: Remove #ifdefs related to isl-0.15.
* graphite-optimize-isl.c: Same.
* graphite.c: Same.
* graphite.h: Same.
* toplev.c: Same.

gcc/testsuite/

* g++.dg/graphite/graphite.exp: Only run the tests with isl-0.15.
* gcc.dg/graphite/graphite.exp: Same.
* gfortran.dg/graphite/graphite.exp: Same.

libgomp/

* config/isl.m4: New file.
* configure: Regenerate.
* configure.ac: Detect isl-0.15.
* testsuite/Makefile.am: Set ISLVER in libgomp-test-support.exp.
* testsuite/Makefile.in: Regenerate.
* testsuite/libgomp.graphite/graphite.exp: Only run the tests with
isl-0.15.
---
 Makefile.in |   2 +
 Makefile.tpl|   2 +
 config/isl.m4   |  12 ++
 configure   |  29 
 gcc/Makefile.in |   1 +
 gcc/config.in   |   6 -
 gcc/configure   |  43 +
 gcc/configure.ac|  26 +--
 gcc/graphite-isl-ast-to-gimple.c|  18 --
 gcc/graphite-optimize-isl.c | 208 
 gcc/graphite.c  |  12 +-
 gcc/graphite.h  |   9 -
 gcc/testsuite/g++.dg/graphite/graphite.exp  |   5 +
 gcc/testsuite/gcc.dg/graphite/graphite.exp  |   5 +
 gcc/testsuite/gfortran.dg/graphite/graphite.exp |   5 +
 gcc/toplev.c|   6 +-
 libgomp/config/isl.m4   | 158 ++
 libgomp/configure   | 201 ++-
 libgomp/configure.ac|  24 +++
 libgomp/testsuite/Makefile.am   |   2 +
 libgomp/testsuite/Makefile.in   |   3 +
 libgomp/testsuite/libgomp.graphite/graphite.exp |   5 +
 22 files changed, 468 insertions(+), 314 deletions(-)
 create mode 100644 libgomp/config/isl.m4

diff --git a/Makefile.in b/Makefile.in
index e9b5950..d2c5b9f 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -222,6 +222,7 @@ HOST_EXPORTS = \
GMPINC="$(HOST_GMPINC)"; export GMPINC; \
ISLLIBS="$(HOST_ISLLIBS)"; export ISLLIBS; \
ISLINC="$(HOST_ISLINC)"; export ISLINC; \
+   ISLVER="$(HOST_ISLVER)"; export ISLVER; \
LIBELFLIBS="$(HOST_LIBELFLIBS)"; export LIBELFLIBS; \
LIBELFINC="$(HOST_LIBELFINC)"; export LIBELFINC; \
XGCC_FLAGS_FOR_TARGET="$(XGCC_FLAGS_FOR_TARGET)"; export XGCC_FLAGS_FOR_TARGET; \
@@ -315,6 +316,7 @@ HOST_GMPINC = @gmpinc@
 # Where to find isl
 HOST_ISLLIBS = @isllibs@
 HOST_ISLINC = @islinc@
+HOST_ISLVER = @islver@
 
 # Where to find libelf
 HOST_LIBELFLIBS = @libelflibs@
diff --git a/Makefile.tpl b/Makefile.tpl
index f7bb77e..88c2810 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -225,6 +225,7 @@ HOST_EXPORTS = \
GMPINC="$(HOST_GMPINC)"; export GMPINC; \
ISLLIBS="$(HOST_ISLLIBS)"; export ISLLIBS; \
ISLINC="$(HOST_ISLINC)"; export ISLINC; \
+   ISLVER="$(HOST_ISLVER)"; export ISLVER; \
LIBELFLIBS="$(HOST_LIBELFLIBS)"; export LIBELFLIBS; \
LIBELFINC="$(HOST_LIBELFINC)"; export LIBELFINC; \
XGCC_FLAGS_FOR_TARGET="$(XGCC_FLAGS_FOR_TARGET)"; export XGCC_FLAGS_FOR_TARGET; \
@@ -318,6 +319,7 @@ HOST_GMPINC = @gmpinc@
 # Where to find isl
 HOST_ISLLIBS = @isllibs@
 HOST_ISLINC = @islinc@
+HOST_ISLVER = @islver@
 
 # Where to find libelf
 HOST_LIBELFLIBS = @libelflibs@
diff --git a/config/isl.m4 b/config/isl.m4
index 86ccb94..2cfeb46 100644
--- a/config/isl.m4
+++ b/config/isl.m4
@@ -117,6 +117,18 @@ AC_DEFUN([ISL_CHECK_VERSION],
   AC_MSG_RESULT([recommended isl version is 0.15, minimum required isl version 0.14 is deprecated])
 fi
 
+AC_MSG_CHECKING([Checking for isl-0.15])
+AC_TRY_LINK([#include ],
+[isl_options_set_schedule_serialize_sccs (NULL, 0);],
+[ac_has_isl_options_set_schedule_serialize_sccs=yes],
+[ac_has_isl_options_set_schedule_serialize_sccs=no])
+AC_MSG_RESULT($ac_has_isl_options_set_schedule_serialize_sccs)
+
+if test x"$ac_has_isl_options_set_schedule_serialize_sccs" = x"yes"; then
+  islver="0.15"
+  AC_SUBST([islver])
+fi
+
 CFLAGS=$_isl_saved_CFLAGS
 L

[PATCH 07/15] check that all loops are valid in the combined region

2016-01-15 Thread Sebastian Pop
From: Sebastian Pop 

The bug was exposed by rewriting an if condition into an assert in the
computation of the loop iteration domains.

* graphite-scop-detection.c (loop_is_valid_scop): Renamed loop_is_valid_in_scop.
(scop_detection::harmful_stmt_in_region): Renamed harmful_loop_in_region.
Call loop_is_valid_in_scop.
---
 gcc/graphite-scop-detection.c | 56 ---
 1 file changed, 42 insertions(+), 14 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index ad11227..e004185 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -554,7 +554,7 @@ public:
  region of code that can be represented in the polyhedral model.  SCOP
  defines the region we analyse.  */
 
-  bool loop_is_valid_scop (loop_p loop, sese_l scop) const;
+  bool loop_is_valid_in_scop (loop_p loop, sese_l scop) const;
 
   /* Return true when BEGIN is the preheader edge of a loop with a single exit
  END.  */
@@ -597,7 +597,7 @@ public:
  Limit the number of bbs between adjacent loops to
  PARAM_SCOP_MAX_NUM_BBS_BETWEEN_LOOPS.  */
 
-  bool harmful_stmt_in_region (sese_l scop) const;
+  bool harmful_loop_in_region (sese_l scop) const;
 
   /* Return true only when STMT is simple enough for being handled by Graphite.
  This depends on SCOP, as the parameters are initialized relatively to
@@ -777,8 +777,9 @@ scop_detection::merge_sese (sese_l first, sese_l second) const
   if (!second)
 return first;
 
-  DEBUG_PRINT (dp << "[try-merging-sese] s1: "; print_sese (dump_file, first);
-  dp << "[try-merging-sese] s2: ";
+  DEBUG_PRINT (dp << "[scop-detection] try merging sese s1: ";
+  print_sese (dump_file, first);
+  dp << "[scop-detection] try merging sese s2: ";
   print_sese (dump_file, second));
 
   /* Assumption: Both the sese's should be at the same loop depth or one scop
@@ -807,7 +808,7 @@ scop_detection::merge_sese (sese_l first, sese_l second) const
 
   sese_l combined (entry, exit);
 
-  DEBUG_PRINT (dp << "checking combined sese: ";
+  DEBUG_PRINT (dp << "[scop-detection] checking combined sese: ";
   print_sese (dump_file, combined));
 
   /* FIXME: We could iterate to find the dom which dominates pdom, and pdom
@@ -849,7 +850,7 @@ scop_detection::merge_sese (sese_l first, sese_l second) const
 }
 
   /* Analyze all the BBs in new sese.  */
-  if (harmful_stmt_in_region (combined))
+  if (harmful_loop_in_region (combined))
 return invalid_sese;
 
   DEBUG_PRINT (dp << "[merged-sese] s1: "; print_sese (dump_file, combined));
@@ -877,7 +878,7 @@ scop_detection::build_scop_depth (sese_l s, loop_p loop)
   return s;
 }
 
-  if (!loop_is_valid_scop (loop, s2))
+  if (!loop_is_valid_in_scop (loop, s2))
 return build_scop_depth (invalid_sese, loop->next);
 
   return build_scop_breadth (s2, loop);
@@ -954,7 +955,7 @@ scop_detection::can_represent_loop (loop_p loop, sese_l scop)
defines the region we analyse.  */
 
 bool
-scop_detection::loop_is_valid_scop (loop_p loop, sese_l scop) const
+scop_detection::loop_is_valid_in_scop (loop_p loop, sese_l scop) const
 {
   if (!scop)
 return false;
@@ -1008,7 +1009,7 @@ scop_detection::add_scop (sese_l s)
   /* Do not add scops with only one loop.  */
   if (region_has_one_loop (s))
 {
-  DEBUG_PRINT (dp << "[scop-detection-fail] Discarding one loop SCoP.\n";
+  DEBUG_PRINT (dp << "[scop-detection-fail] Discarding one loop SCoP: ";
   print_sese (dump_file, s));
   return;
 }
@@ -1016,7 +1017,7 @@ scop_detection::add_scop (sese_l s)
   if (get_exit_bb (s) == EXIT_BLOCK_PTR_FOR_FN (cfun))
 {
   DEBUG_PRINT (dp << "[scop-detection-fail] "
- << "Discarding SCoP exiting to return.";
+ << "Discarding SCoP exiting to return: ";
   print_sese (dump_file, s));
   return;
 }
@@ -1029,7 +1030,7 @@ scop_detection::add_scop (sese_l s)
   remove_intersecting_scops (s);
 
   scops.safe_push (s);
-  DEBUG_PRINT (dp << "Adding SCoP "; print_sese (dump_file, s));
+  DEBUG_PRINT (dp << "[scop-detection] Adding SCoP: "; print_sese (dump_file, s));
 }
 
 /* Return true when a statement in SCOP cannot be represented by Graphite.
@@ -1038,7 +1039,7 @@ scop_detection::add_scop (sese_l s)
PARAM_SCOP_MAX_NUM_BBS_BETWEEN_LOOPS.  */
 
 bool
-scop_detection::harmful_stmt_in_region (sese_l scop) const
+scop_detection::harmful_loop_in_region (sese_l scop) const
 {
   basic_block exit_bb = get_exit_bb (scop);
   basic_block entry_bb = get_entry_bb (scop);
@@ -1056,6 +1057,7 @@ scop_detection::harmful_stmt_in_region (sese_l scop) const
   = get_dominated_to_depth (CDI_DOMINATORS, entry_bb, depth);
   int i;
   basic_block bb;
+  bitmap loops = BITMAP_ALLOC (NULL);
   FOR_EACH_VEC_ELT (dom, i, bb)
 {
   DEBUG_PRINT (dp << "Visiting bb_" << bb->index << "\n");
@

[PATCH 09/15] fix memory leak in scop-detection

2016-01-15 Thread Sebastian Pop
From: Sebastian Pop 

* graphite-scop-detection.c
(scop_detection::harmful_loop_in_region): Free dom and loops.
(scop_detection::loop_body_is_valid_scop): Free bbs.
---
 gcc/graphite-scop-detection.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index be33be3..a0c630b 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -1088,7 +1088,11 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
 any loop fully contained in the scop: other bbs are checked below
 in loop_is_valid_in_scop.  */
  if (harmful_stmt_in_bb (scop, bb))
-   return true;
+   {
+ dom.release ();
+ BITMAP_FREE (loops);
+ return true;
+   }
}
 
 }
@@ -1104,13 +1108,14 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
 
   if (!loop_is_valid_in_scop (loop, scop))
{
+ dom.release ();
  BITMAP_FREE (loops);
  return true;
}
 }
 
-  BITMAP_FREE (loops);
   dom.release ();
+  BITMAP_FREE (loops);
   return false;
 }
 
@@ -1503,7 +1508,10 @@ scop_detection::loop_body_is_valid_scop (loop_p loop, sese_l scop) const
   basic_block bb = bbs[i];
 
   if (harmful_stmt_in_bb (scop, bb))
-   return false;
+   {
+ free (bbs);
+ return false;
+   }
 }
   free (bbs);
 
-- 
2.5.0
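
The patch above repeats the cleanup on each early return.  As a rough
sketch only, here is the same shape in plain C with a single exit label,
so the frees live in one place.  Everything in it is an assumption made
for illustration: malloc/calloc stand in for GCC's vec and bitmap, and
harmful and harmful_in_region are hypothetical names, not functions from
graphite-scop-detection.c.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for harmful_stmt_in_bb: a negative value plays
   the role of a basic block that cannot be represented.  */
static bool
harmful (int v)
{
  return v < 0;
}

/* Return true when any element of VALS is harmful.  DOM and LOOPS are
   scratch buffers standing in for the dominated-bbs vector and the loops
   bitmap; both are released on every path through the DONE label.  */
static bool
harmful_in_region (const int *vals, size_t n)
{
  bool result = false;
  size_t count = n ? n : 1;	/* Avoid zero-sized allocations.  */
  int *dom = malloc (count * sizeof *dom);
  unsigned char *loops = calloc (count, 1);

  if (!dom || !loops)
    {
      /* Be conservative when scratch memory is unavailable.  */
      result = true;
      goto done;
    }

  for (size_t i = 0; i < n; i++)
    {
      dom[i] = vals[i];
      if (harmful (dom[i]))
	{
	  result = true;
	  goto done;
	}
      loops[i] = 1;
    }

done:
  /* free (NULL) is a no-op, so this is safe after a failed allocation.  */
  free (loops);
  free (dom);
  return result;
}
```

Whether a goto-based exit fits GCC's coding style here is a separate
question; the point is only that one cleanup site is harder to forget
than three.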



Re: [PATCH][AArch64] Properly reject invalid attribute strings

2016-01-15 Thread James Greenhalgh
On Fri, Jan 15, 2016 at 01:39:54PM +, Kyrill Tkachov wrote:
> Hi all,
> 
> A bug in the target attribute parsing logic led to us silently accepting
> attribute strings that did not appear in the attributes table i.e invalid
> attributes.
> 
> This patch fixes that oversight so we now error out on obviously bogus
> strings.

This is ok.

> 2016-01-15  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64.c (aarch64_process_one_target_attr): Return
> false when argument string is not found in the attributes table
> at all.
> 
> 2016-01-15  Kyrylo Tkachov  
> 
> * gcc.target/aarch64/target_attr_17.c: New test.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> e54ce7985f52c6a61b2ef1e3d7f847f22b1a959f..f2e4b45ac0ad1223e8149d1a35782c13f493a740
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -8938,6 +8938,7 @@ aarch64_process_one_target_attr (char *arg_str, const 
> char* pragma_or_attr)
>arg++;
>  }
>const struct aarch64_attribute_info *p_attr;
> +  bool found = false;
>for (p_attr = aarch64_attributes; p_attr->name; p_attr++)
>  {
>/* If the names don't match up, or the user has given an argument
> @@ -8946,6 +8947,7 @@ aarch64_process_one_target_attr (char *arg_str, const 
> char* pragma_or_attr)
>if (strcmp (str_to_check, p_attr->name) != 0)
>   continue;
>  
> +  found = true;
>bool attr_need_arg_p = p_attr->attr_type == aarch64_attr_custom
> || p_attr->attr_type == aarch64_attr_enum;
>  
> @@ -9025,7 +9027,10 @@ aarch64_process_one_target_attr (char *arg_str, const 
> char* pragma_or_attr)
>   }
>  }
>  
> -  return true;
> +  /* If we reached here we either have found an attribute and validated
> + it or didn't match any.  If we matched an attribute but its arguments
> + were malformed we will have returned false already.  */
> +  return found;

I don't like this "found" variable, it normally smells of a refactoring
opportunity. I wonder whether you could clean the function up a bit by
restructuring the logic.

Regardless, this is OK for trunk.

Thanks,
James
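
One possible shape for the restructuring suggested above, as a minimal
sketch and not the actual aarch64.c code: move the table search into a
helper that returns the entry or NULL, so the caller no longer needs a
"found" flag.  The struct, the three-entry table, and the names
find_attr and process_one_attr are all made up for illustration, and the
argument check is far simpler than what aarch64_process_one_target_attr
really does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy attribute description: a name and whether it takes an argument.  */
struct attr_info
{
  const char *name;
  bool needs_arg;
};

/* Toy table, terminated by a NULL name like the real attributes table.  */
static const struct attr_info attrs[] =
{
  { "arch", true },
  { "cpu", true },
  { "general-regs-only", false },
  { NULL, false }
};

/* Return the table entry for NAME, or NULL if there is none.  */
static const struct attr_info *
find_attr (const char *name)
{
  for (const struct attr_info *p = attrs; p->name; p++)
    if (strcmp (name, p->name) == 0)
      return p;
  return NULL;
}

/* Return true if NAME is a known attribute and ARG (NULL when absent)
   matches what the attribute expects.  */
static bool
process_one_attr (const char *name, const char *arg)
{
  const struct attr_info *p = find_attr (name);
  if (p == NULL)
    return false;		/* Unknown attribute string.  */
  if (p->needs_arg != (arg != NULL))
    return false;		/* Missing or unexpected argument.  */
  return true;
}
```

With this split, process_one_attr ("bogus", NULL) fails at the lookup
step and process_one_attr ("cpu", NULL) fails at the argument step, with
no flag carried across the loop.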



Re: [PATCH, PR68976] Use reaching def phi arg in sese_add_exit_phis_edge

2016-01-15 Thread Sebastian Pop
On Fri, Jan 15, 2016 at 7:58 AM, Tom de Vries  wrote:
> During scop detection/canonicalize_loop_closed_ssa_form, an exit phi is
> introduced in the loop for _24:
> ...
>   :
>   # _58 = PHI <_24(22)>
> ...
> Note that _24 is not defined in the loop, but before it. AFAIU the header
> comment of canonicalize_loop_closed_ssa_form, this phi is not needed. That
> might be the root cause of the bug,

I think that may be the problem, as it is invariant in the loops, so
it is considered to be a parameter of the scop.
Let me see if we could avoid adding that phi node in the first place.


[PATCH 04/15] add missing ast node for isl 0.15

2016-01-15 Thread Sebastian Pop
From: Sebastian Pop 

* graphite-isl-ast-to-gimple.c (translate_isl_ast): Also handle
isl_ast_node_mark.
---
 gcc/graphite-isl-ast-to-gimple.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index d143ef7..dad802f 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1242,6 +1242,13 @@ translate_isl_ast_to_gimple::translate_isl_ast (loop_p context_loop,
 case isl_ast_node_block:
   return translate_isl_ast_node_block (context_loop, node,
   next_e, ip);
+case isl_ast_node_mark:
+  {
+   isl_ast_node *n = isl_ast_node_mark_get_node (node);
+   edge e = translate_isl_ast (context_loop, n, next_e, ip);
+   isl_ast_node_free (n);
+   return e;
+  }
 
 default:
   gcc_unreachable ();
-- 
2.5.0



[PATCH 01/15] add more coalescing to simplify constraints

2016-01-15 Thread Sebastian Pop
From: Sebastian Pop 

2015-12-30  Aditya Kumar  
Sebastian Pop  

* graphite-dependences.c (constrain_domain): Add call to isl_*_coalesce.
(add_pdr_constraints): Same.
(scop_get_reads): Same.
(scop_get_must_writes): Same.
(scop_get_may_writes): Same.
(scop_get_original_schedule): Same.
(extend_schedule): Same.
(apply_schedule_on_deps): Same.
(carries_deps): Same.
(compute_deps): Same.
(scop_get_dependences): Same.
* graphite-isl-ast-to-gimple.c
(translate_isl_ast_to_gimple::generate_isl_schedule): Same.
* graphite-optimize-isl.c (get_schedule_for_band): Same.
(get_schedule_for_band_list): Same.
(get_schedule_map): Same.
(apply_schedule_map_to_scop): Same.
* graphite-sese-to-poly.c (build_pbb_scattering_polyhedrons): Same.
(build_loop_iteration_domains): Same.
(add_condition_to_pbb): Same.
(add_param_constraints): Same.
(pdr_add_memory_accesses): Same.
(pdr_add_data_dimensions): Same.
---
 gcc/graphite-dependences.c   | 63 ++--
 gcc/graphite-isl-ast-to-gimple.c |  2 ++
 gcc/graphite-optimize-isl.c  | 12 
 gcc/graphite-sese-to-poly.c  | 28 --
 4 files changed, 56 insertions(+), 49 deletions(-)

diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index 46869d7..ae08059 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -49,7 +49,7 @@ constrain_domain (isl_map *map, isl_set *s)
 
   s = isl_set_set_tuple_id (s, id);
   isl_space_free (d);
-  return isl_map_intersect_domain (map, s);
+  return isl_map_coalesce (isl_map_intersect_domain (map, s));
 }
 
 /* Constrain pdr->accesses with pdr->subscript_sizes and pbb->domain.  */
@@ -59,8 +59,8 @@ add_pdr_constraints (poly_dr_p pdr, poly_bb_p pbb)
 {
   isl_map *x = isl_map_intersect_range (isl_map_copy (pdr->accesses),
isl_set_copy (pdr->subscript_sizes));
-  x = constrain_domain (x, isl_set_copy (pbb->domain));
-  return x;
+  x = isl_map_coalesce (x);
+  return constrain_domain (x, isl_set_copy (pbb->domain));
 }
 
 /* Returns all the memory reads in SCOP.  */
@@ -93,7 +93,7 @@ scop_get_reads (scop_p scop, vec pbbs)
  }
 }
 
-  return res;
+  return isl_union_map_coalesce (res);
 }
 
 /* Returns all the memory must writes in SCOP.  */
@@ -126,7 +126,7 @@ scop_get_must_writes (scop_p scop, vec pbbs)
  }
 }
 
-  return res;
+  return isl_union_map_coalesce (res);
 }
 
 /* Returns all the memory may writes in SCOP.  */
@@ -159,7 +159,7 @@ scop_get_may_writes (scop_p scop, vec pbbs)
  }
 }
 
-  return res;
+  return isl_union_map_coalesce (res);
 }
 
 /* Returns all the original schedules in SCOP.  */
@@ -179,7 +179,7 @@ scop_get_original_schedule (scop_p scop, vec pbbs)
isl_set_copy (pbb->domain)));
 }
 
-  return res;
+  return isl_union_map_coalesce (res);
 }
 
 /* Helper function used on each MAP of a isl_union_map.  Computes the
@@ -242,7 +242,7 @@ extend_schedule (__isl_take isl_union_map *x)
   str.umap = isl_union_map_empty (isl_union_map_get_space (x));
   isl_union_map_foreach_map (x, extend_schedule_1, (void *) &str);
   isl_union_map_free (x);
-  return str.umap;
+  return isl_union_map_coalesce (str.umap);
 }
 
 /* Applies SCHEDULE to the in and out dimensions of the dependences
@@ -252,22 +252,17 @@ static isl_map *
 apply_schedule_on_deps (__isl_keep isl_union_map *schedule,
__isl_keep isl_union_map *deps)
 {
-  isl_map *x;
-  isl_union_map *ux, *trans;
-
-  trans = isl_union_map_copy (schedule);
-  trans = extend_schedule (trans);
-  ux = isl_union_map_copy (deps);
+  isl_union_map *trans = extend_schedule (isl_union_map_copy (schedule));
+  isl_union_map *ux = isl_union_map_copy (deps);
   ux = isl_union_map_apply_domain (ux, isl_union_map_copy (trans));
   ux = isl_union_map_apply_range (ux, trans);
-  if (isl_union_map_is_empty (ux))
-{
-  isl_union_map_free (ux);
-  return NULL;
-}
-  x = isl_map_from_union_map (ux);
+  ux = isl_union_map_coalesce (ux);
+
+  if (!isl_union_map_is_empty (ux))
+return isl_map_from_union_map (ux);
 
-  return x;
+  isl_union_map_free (ux);
+  return NULL;
 }
 
 /* Return true when DEPS is non empty and the intersection of LEX with
@@ -280,25 +275,19 @@ carries_deps (__isl_keep isl_union_map *schedule,
  __isl_keep isl_union_map *deps,
  int depth)
 {
-  bool res;
-  int i;
-  isl_space *space;
-  isl_map *lex, *x;
-  isl_constraint *ineq;
-
   if (isl_union_map_is_empty (deps))
 return false;
 
-  x = apply_schedule_on_deps (schedule, deps);
+  isl_map *x = apply_schedule_on_deps (schedule, deps);
   if (x == NULL)
 return false;
-  space = isl_map_get_space (x);
-  space = isl_space_range (space);
-  lex = isl_map_lex_le (spac

Thoughts on memcmp expansion (PR43052)

2016-01-15 Thread Bernd Schmidt
PR43052 is a PR complaining about how the rep cmpsb expansion that gcc 
uses for memcmp is slower than the library function. As is so often the 
case, if you investigate a bit, you can find a lot of issues with the 
current situation in the compiler.


This PR was accidentally fixed by a patch by Nick which disabled the use 
of cmpstrnsi for memcmp expansion, on the grounds that cmpstrnsi could 
stop looking after seeing a null byte, which would be invalid for 
memcmp, so only cmpmemsi should be used. This fix was for an out-of-tree 
target.


I believe the rep cmpsb sequence used by i386 would actually be valid, 
so we could duplicate the cmpstrn pattern to also match cmpmem and be 
done - but that would then again cause the performance problem described 
in the PR, so it's probably not a good idea.


One question Richard posed in the comments: why aren't we optimizing 
small constant size memcmps other than size 1 to *s == *q? The reason is 
the return value of memcmp, which implies byte-sized operation 
(incidentally, the use of SImode in the cmpmem/cmpstr patterns is really 
odd). It's possible to work around this, but expansion becomes a little 
more tricky (subtract after bswap, maybe). Still, the current code 
generation is lame.
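
To illustrate the return-value problem, here is a small C sketch (not
taken from the patch; load_be32 and memcmp4 are hypothetical names).
memcmp orders by the first differing byte in memory, so a wide compare
has to treat the first byte as the most significant one.  On a
little-endian target that is what a load followed by a bswap would
produce; the sketch uses shifts so that it behaves the same on any host.

```c
#include <stdint.h>

/* Build a 32-bit value whose most significant byte is the first byte in
   memory.  On little-endian hardware this corresponds to a plain load
   followed by a byte swap.  */
static uint32_t
load_be32 (const void *p)
{
  const unsigned char *b = (const unsigned char *) p;
  return ((uint32_t) b[0] << 24) | ((uint32_t) b[1] << 16)
	 | ((uint32_t) b[2] << 8) | (uint32_t) b[3];
}

/* Four-byte comparison whose sign agrees with memcmp (p, q, 4).  The
   result is -1, 0 or 1; memcmp itself only promises the sign.  */
static int
memcmp4 (const void *p, const void *q)
{
  uint32_t a = load_be32 (p);
  uint32_t b = load_be32 (q);
  return (a > b) - (a < b);
}
```

For the byte sequences { 0x01, 0x00, 0x00, 0xff } and
{ 0x02, 0x00, 0x00, 0x00 }, comparing the raw little-endian words would
rank the first one higher because of its trailing 0xff, whereas memcmp4,
like memcmp, ranks it lower.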


So, for gcc-6, I think we shouldn't do anything. The PR is fixed, and 
there's no easy bug-fix that can be done to improve matters. Not sure 
whether to keep the PR open or create a new one for the remaining 
issues. For the next stage1, I'm attaching a proof-of-concept patch that 
does the following:

 * notice if memcmp results are only used for equality comparison
   against zero
 * if so, replace with a different builtin __memcmp_eq
 * Expand __memcmp_eq for small constant sizes with loads and
   comparison, fall back to a memcmp call.

The whole thing could be extended to work for sizes larger than an int, 
along the lines of memcpy expansion controlled by move ratio etc. Thoughts?
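
As a source-level picture of the third bullet, this is roughly what an
equality-only memcmp of four bytes could reduce to.  It is a sketch under
assumptions, not what the attached patch emits: equal4 is a hypothetical
name (it is not the __memcmp_eq builtin), and it returns true on
equality rather than memcmp's zero.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Return true when the four bytes at P and Q are identical.  Only
   equality matters, so byte order is irrelevant and no bswap is needed.
   The fixed-size memcpy calls express possibly unaligned loads without
   violating aliasing rules; compilers typically turn each into a single
   load.  */
static bool
equal4 (const void *p, const void *q)
{
  uint32_t a, b;
  memcpy (&a, p, sizeof a);
  memcpy (&b, q, sizeof b);
  return a == b;
}
```

A caller that writes "if (memcmp (p, q, 4) == 0)" only observes
zero versus nonzero, so it could be served by equal4 (p, q) with no
library call.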



Bernd
Index: gcc/builtins.c
===
--- gcc/builtins.c	(revision 232359)
+++ gcc/builtins.c	(working copy)
@@ -3699,25 +3699,22 @@ expand_cmpstrn_or_cmpmem (insn_code icod
 
 /* Expand expression EXP, which is a call to the memcmp built-in function.
Return NULL_RTX if we failed and the caller should emit a normal call,
-   otherwise try to get the result in TARGET, if convenient.  */
+   otherwise try to get the result in TARGET, if convenient.
+   RESULT_EQ is true if we can relax the returned value to be either zero
+   or nonzero, without caring about the sign.  */
 
 static rtx
-expand_builtin_memcmp (tree exp, rtx target)
+expand_builtin_memcmp (tree exp, rtx target, bool result_eq)
 {
   if (!validate_arglist (exp,
  			 POINTER_TYPE, POINTER_TYPE, INTEGER_TYPE, VOID_TYPE))
 return NULL_RTX;
 
-  /* Note: The cmpstrnsi pattern, if it exists, is not suitable for
- implementing memcmp because it will stop if it encounters two
- zero bytes.  */
-  insn_code icode = direct_optab_handler (cmpmem_optab, SImode);
-  if (icode == CODE_FOR_nothing)
-return NULL_RTX;
-
   tree arg1 = CALL_EXPR_ARG (exp, 0);
   tree arg2 = CALL_EXPR_ARG (exp, 1);
   tree len = CALL_EXPR_ARG (exp, 2);
+  machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
+  location_t loc = EXPR_LOCATION (exp);
 
   unsigned int arg1_align = get_pointer_alignment (arg1) / BITS_PER_UNIT;
   unsigned int arg2_align = get_pointer_alignment (arg2) / BITS_PER_UNIT;
@@ -3725,12 +3722,27 @@ expand_builtin_memcmp (tree exp, rtx tar
   /* If we don't have POINTER_TYPE, call the function.  */
   if (arg1_align == 0 || arg2_align == 0)
 return NULL_RTX;
+  unsigned int min_align = MIN (arg1_align, arg2_align);
+
+  /* Note: The cmpstrnsi pattern, if it exists, is not suitable for
+ implementing memcmp because it will stop if it encounters two
+ zero bytes.  */
+  insn_code icode = direct_optab_handler (cmpmem_optab, SImode);
+
+  rtx arg3_rtx = expand_normal (fold_convert_loc (loc, sizetype, len));
+  machine_mode direct_mode = VOIDmode;
+  if (CONST_INT_P (arg3_rtx))
+direct_mode = mode_for_size (INTVAL (arg3_rtx) * BITS_PER_UNIT,
+ MODE_INT, 1);
+  if (icode == CODE_FOR_nothing
+  && (!result_eq
+	  || direct_mode == VOIDmode
+	  || direct_mode == BLKmode
+	  || GET_MODE_ALIGNMENT (direct_mode) / BITS_PER_UNIT > min_align))
+return NULL_RTX;
 
-  machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
-  location_t loc = EXPR_LOCATION (exp);
   rtx arg1_rtx = get_memory_rtx (arg1, len);
   rtx arg2_rtx = get_memory_rtx (arg2, len);
-  rtx arg3_rtx = expand_normal (fold_convert_loc (loc, sizetype, len));
 
   /* Set MEM_SIZE as appropriate.  */
   if (CONST_INT_P (arg3_rtx))
@@ -3739,6 +3751,27 @@ expand_builtin_memcmp (tree exp, rtx tar
   set_mem_size (arg2_rtx, INTVAL (arg3_rtx));
 }
 
+  if (icode == CODE_FOR_nothing)
+{
+  arg1_rtx = change_address (arg1_rtx, direct_mode, XEXP (arg1_rtx, 0));
+  arg2_rtx = change_

Re: [hsa merge 07/10] IPA-HSA pass

2016-01-15 Thread Jakub Jelinek
On Fri, Jan 15, 2016 at 07:38:14PM +0300, Ilya Verbin wrote:
> On Fri, Jan 15, 2016 at 17:09:54 +0100, Jakub Jelinek wrote:
> > On Fri, Jan 15, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> > > How do other accelerators cope with the situation when half of the
> > > application is compiled with the accelerator disabled?  (Would some of
> > > their calls to GOMP_target_ext lead to abort?)
> > 
> > GOMP_target_ext should never abort (unless internal error), worst case it
> > just falls back into the host fallback.
> 
> Wouldn't that lead to hard-to-find problems in case of nonshared memory?
> I mean when someone expects that all target regions are executed on the device,
> but in fact some of them are silently executed on the host with different data
> environment.

E.g. for HSA it really shouldn't matter, as it is a shared memory accelerator.
For XeonPhi we hopefully can offload anything.  NVPTX is problematic,
because it can't offload all the code, but if it can be e.g. compile time
detected that it will not be possible, it can just provide offloaded code
for the target.

Jakub


Re: [hsa merge 07/10] IPA-HSA pass

2016-01-15 Thread Ilya Verbin
On Fri, Jan 15, 2016 at 17:09:54 +0100, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> > How do other accelerators cope with the situation when half of the
> > application is compiled with the accelerator disabled?  (Would some of
> > their calls to GOMP_target_ext lead to abort?)
> 
> GOMP_target_ext should never abort (unless internal error), worst case it
> just falls back into the host fallback.

Wouldn't that lead to hard-to-find problems in case of nonshared memory?
I mean when someone expects that all target regions are executed on the device,
but in fact some of them are silently executed on the host with different data
environment.
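
To make the concern concrete, here is a minimal C sketch (the function name sum_on_target is only illustrative, it is not from any patch in this thread). Because sum is mapped tofrom, the result reaches the host whether the region runs on a device or in the host fallback. A variable mapped only with to is, as far as I understand the data-mapping rules, where the two cases could diverge: a device with nonshared memory would leave the host copy untouched, while a silent host fallback could modify it.

```c
/* Sum an array inside an OpenMP target region.  When the file is built
   without OpenMP support the pragma is skipped and the loop simply runs
   on the host, which is the same observable behaviour as the host
   fallback discussed above.  */
int sum_on_target(const int *a, int n)
{
    int sum = 0;

#ifdef _OPENMP
    /* tofrom: copy sum to the device and back, so the host sees the
       result regardless of where the region executed.  */
    #pragma omp target map(to: a[0:n]) map(tofrom: sum)
#endif
    for (int i = 0; i < n; i++)
        sum += a[i];

    return sum;
}
```

Keeping every variable that the host reads afterwards in a tofrom (or from) mapping is one way to make the code insensitive to where the region ends up running.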

  -- Ilya


Re: IRA fix for 47992

2016-01-15 Thread Jeff Law

On 01/15/2016 06:42 AM, Bernd Schmidt wrote:

This is a report of a crash in IRA. If you debug it with a sufficiently
old compiler, you'll find that we manage to delete some basic blocks
from within IRA. Later on, reload calls alter_reg for all unallocated
pseudos, including one that only occurs in the deleted blocks. reload
does not notice it is unused, because REG_N_REFS is still 3. We crash in
an IRA callback because the pseudo has no allocno.

Fixed as below. A similar patch cures the problem in gcc-4.6. Adding a
testcase seems pointless - the crashing code was derived from an
existing Fortran testcase with exotic options, and the whole thing only
ever triggered in one gcc version AFAICT. Current gcc can't trigger it
because it's not using reload on x86_64.

Bootstrapped and tested on x86_64-linux, ok?

OK.
jeff



Re: [hsa merge 07/10] IPA-HSA pass

2016-01-15 Thread Jakub Jelinek
On Fri, Jan 15, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> How do other accelerators cope with the situation when half of the
> application is compiled with the accelerator disabled?  (Would some of
> their calls to GOMP_target_ext lead to abort?)

GOMP_target_ext should never abort (unless internal error), worst case it
just falls back into the host fallback.

Jakub


PR68609

2016-01-15 Thread David Edelsohn
My initial implementation of software sqrt based on estimate was
fragile for denormal inputs.  This revised version converts both sqrt
and rsqrt to use Goldschmidt's Algorithm and calculates sqrt through
an iterative correction to a sqrt estimate.

Because sqrt is only profitable for 1 iteration, this patch also
restricts swsqrt to processors that generate a high precision
estimate.
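
For readers unfamiliar with the iteration, here is a minimal scalar sketch in C of Goldschmidt's method as I understand it. It is not the RTL the patch emits: the function and struct names are made up, and the rsqrt estimate is passed in as a parameter to stand in for the hardware estimate instruction. The variable names e, g, h and the constant 0.5 follow the patch.

```c
/* Result pair: approximations of sqrt(x) and 1/sqrt(x).  */
typedef struct {
    double sqrt_val;
    double rsqrt_val;
} gs_result;

/* Goldschmidt iteration.  X is the input, E is an initial estimate of
   1/sqrt(x) (hypothetical stand-in for the hardware estimate), PASSES
   is the number of refinement steps.  */
gs_result goldschmidt(double x, double e, int passes)
{
    double g = e * x;      /* g = sqrt estimate */
    double h = e * 0.5;    /* h = 1/(2*sqrt) estimate */

    for (int i = 0; i < passes; i++) {
        double t = 0.5 - g * h;   /* residual, the nmsub step */
        g = g + g * t;            /* correct the sqrt estimate */
        h = h + h * t;            /* correct the 1/(2*sqrt) estimate */
    }

    gs_result r = { g, 2.0 * h };
    return r;
}
```

Each pass roughly squares the relative error of the estimate, which is why the precision of the initial estimate determines how many passes are needed.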

Bootstrapped on powerpc-ibm-aix7.1.0.0 and powerpc64le-linux.

Thanks, David

PR target/68609
* config/rs6000/rs6000.c (rs6000_emit_msub): Delete.
(rs6000_emit_swsqrt): Convert to Goldschmidt's Algorithm.
* config/rs6000/rs6000.md (sqrt2): Limit swsqrt to high
precision estimate.

Index: rs6000.c
===
--- rs6000.c(revision 232326)
+++ rs6000.c(working copy)
@@ -32769,29 +32769,6 @@
 emit_move_insn (target, dst);
 }

-/* Generate a FMSUB instruction: dst = fma(m1, m2, -a).  */
-
-static void
-rs6000_emit_msub (rtx target, rtx m1, rtx m2, rtx a)
-{
-  machine_mode mode = GET_MODE (target);
-  rtx dst;
-
-  /* Altivec does not support fms directly;
- generate in terms of fma in that case.  */
-  if (optab_handler (fms_optab, mode) != CODE_FOR_nothing)
-dst = expand_ternary_op (mode, fms_optab, m1, m2, a, target, 0);
-  else
-{
-  a = expand_unop (mode, neg_optab, a, NULL_RTX, 0);
-  dst = expand_ternary_op (mode, fma_optab, m1, m2, a, target, 0);
-}
-  gcc_assert (dst != NULL);
-
-  if (dst != target)
-emit_move_insn (target, dst);
-}
-
 /* Generate a FNMSUB instruction: dst = -fma(m1, m2, -a).  */

 static void
@@ -32890,15 +32867,16 @@
 add_reg_note (get_last_insn (), REG_EQUAL, gen_rtx_DIV (mode, n, d));
 }

-/* Newton-Raphson approximation of single/double-precision floating point
-   rsqrt.  Assumes no trapping math and finite arguments.  */
+/* Goldschmidt's Algorithm for single/double-precision floating point
+   sqrt and rsqrt.  Assumes no trapping math and finite arguments.  */

 void
 rs6000_emit_swsqrt (rtx dst, rtx src, bool recip)
 {
   machine_mode mode = GET_MODE (src);
-  rtx x0 = gen_reg_rtx (mode);
-  rtx y = gen_reg_rtx (mode);
+  rtx e = gen_reg_rtx (mode);
+  rtx g = gen_reg_rtx (mode);
+  rtx h = gen_reg_rtx (mode);

   /* Low precision estimates guarantee 5 bits of accuracy.  High
  precision estimates guarantee 14 bits of accuracy.  SFmode
@@ -32909,55 +32887,68 @@
   if (mode == DFmode || mode == V2DFmode)
 passes++;

-  REAL_VALUE_TYPE dconst3_2;
   int i;
-  rtx halfthree;
+  rtx mhalf;
   enum insn_code code = optab_handler (smul_optab, mode);
   insn_gen_fn gen_mul = GEN_FCN (code);

   gcc_assert (code != CODE_FOR_nothing);

-  /* Load up the constant 1.5 either as a scalar, or as a vector.  */
-  real_from_integer (&dconst3_2, VOIDmode, 3, SIGNED);
-  SET_REAL_EXP (&dconst3_2, REAL_EXP (&dconst3_2) - 1);
+  mhalf = rs6000_load_constant_and_splat (mode, dconsthalf);
-  halfthree = rs6000_load_constant_and_splat (mode, dconst3_2);
+  /* e = rsqrt estimate */
+  emit_insn (gen_rtx_SET (e, gen_rtx_UNSPEC (mode, gen_rtvec (1, src),
+UNSPEC_RSQRT)));

-  /* x0 = rsqrt estimate */
-  emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, src),
- UNSPEC_RSQRT)));
-
   /* If (src == 0.0) filter infinity to prevent NaN for sqrt(0.0).  */
   if (!recip)
 {
   rtx zero = force_reg (mode, CONST0_RTX (mode));
-  rtx target = emit_conditional_move (x0, GT, src, zero, mode,
- x0, zero, mode, 0);
-  if (target != x0)
-   emit_move_insn (x0, target);
+  rtx target = emit_conditional_move (e, GT, src, zero, mode,
+ e, zero, mode, 0);
+  if (target != e)
+   emit_move_insn (e, target);
 }

-  /* y = 0.5 * src = 1.5 * src - src -> fewer constants */
-  rs6000_emit_msub (y, src, halfthree, src);
+  /* g = sqrt estimate.  */
+  emit_insn (gen_mul (g, e, src));
+  /* h = 1/(2*sqrt) estimate.  */
+  emit_insn (gen_mul (h, e, mhalf));

-  for (i = 0; i < passes; i++)
+  if (recip)
 {
-  rtx x1 = gen_reg_rtx (mode);
-  rtx u = gen_reg_rtx (mode);
-  rtx v = gen_reg_rtx (mode);
+  if (passes == 1)
+   {
+ rtx t = gen_reg_rtx (mode);
+ rs6000_emit_nmsub (t, g, h, mhalf);
+ /* Apply correction directly to 1/rsqrt estimate.  */
+ rs6000_emit_madd (dst, e, t, e);
+   }
+  else
+   {
+ for (i = 0; i < passes; i++)
+   {
+ rtx t1 = gen_reg_rtx (mode);
+ rtx g1 = gen_reg_rtx (mode);
+ rtx h1 = gen_reg_rtx (mode);

-  /* x1 = x0 * (1.5 - y * (x0 * x0)) */
-  emit_insn (gen_mul (u, x0, x0));
-  rs6000_emit_nmsub (v, y, u, halfthree);
-  emit_insn (gen_mul (x1, x0, v));
-  x0 = x1;
+ rs6000_emit_nmsub (t1, g, h, mhalf);
+  

Re: [hsa merge 07/10] IPA-HSA pass

2016-01-15 Thread Martin Jambor
Hi,

On Fri, Jan 15, 2016 at 04:01:49PM +0100, Jakub Jelinek wrote:
> On Fri, Jan 15, 2016 at 03:53:23PM +0100, Martin Jambor wrote:
> > @@ -317,7 +319,7 @@ public:
> >  bool
> >  pass_ipa_hsa::gate (function *)
> >  {
> > -  return hsa_gen_requested_p () || in_lto_p;
> > +  return hsa_gen_requested_p ();
> >  }
> >  
> >  } // anon namespace
> 
> I actually didn't mean this, I mean more of:
>   return (hsa_gen_requested_p ()
> #ifdef ENABLE_HSA
> || in_lto_p
> #endif
>);
> or so.  Unless you arrange in lto-wrapper or where that if
> HSA is enabled in any LTO input source, then it is enabled also in
> lto1.  If you do that, your change is fine.
> 

This pass only creates HSA specific clones of ungridified target and
parallel regions and functions marked with declare target.  Whether or
not any HSAIL is emitted is then controlled in the hsa-gen pass gate.
The in_lto_p part was in fact a relic of a previous implementation.

So while I agree that making such a change to lto-wrapper would be
beneficial (although then we should limit its activity only to those
nodes which come from enabled units), the change above does not make
the current situation worse.  I will make sure to look into
lto-wrapper but meanwhile I still prefer the new condition.

We have tested the new change: we LTO-compiled code with HSA enabled,
LTO-linked it with HSA disabled, and found that:
  1) if there was no gridified loop, the result was like HSA was
 disabled from the start

  2) if there was a gridified kernel, the compiler compiled the kernel
 for the host but did not register it with libgomp and it ended up
 as an unreachable function.

How do other accelerators cope with the situation when half of the
application is compiled with the accelerator disabled?  (Would some of
their calls to GOMP_target_ext lead to abort?)

Martin


C++ PATCH for c++/69257 (ICE with incomplete deref and asm)

2016-01-15 Thread Jason Merrill
In this testcase, the compiler fails to diagnose trying to use the value 
of an INDIRECT_REF of incomplete type.  Since decay_conversion is 
modeling the lvalue->rvalue conversion here, that seems a logical place 
to complain.


When making that change, I noticed that we were incorrectly calling 
mark_rvalue_use for the array->pointer and function->pointer 
conversions, which are more properly considered lvalue uses, so I've 
fixed that as well.
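
As a small illustration of the distinction (written in C, where the analogous conversions exist; the names twice and decay_demo are invented for this sketch and have nothing to do with the front-end code), the first two initializations only take an address, while the third actually reads a value:

```c
static int twice(int v)
{
    return 2 * v;
}

/* Returns 3 when all three observations hold.  */
int decay_demo(void)
{
    int arr[3] = {10, 20, 30};

    int *p = arr;             /* array-to-pointer: no element is read */
    int (*fp)(int) = twice;   /* function-to-pointer: nothing is called */
    int v = arr[1];           /* lvalue-to-rvalue: the stored value is read */

    return (p == &arr[0]) + (fp == &twice) + (v == 20);
}
```

Only the last case needs the object's value, which is why it is the one that requires a complete type.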


The convert_like_real change was necessary to retain the "initializing 
argument" message for several tests in the testsuite, since we're now 
diagnosing the use of an incomplete type in a different place.


I've also improved the incomplete type diagnostic to use the location of 
the expression, if one is provided.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit efa5f059291139330e0dabaa857932ec08c7a057
Author: Jason Merrill 
Date:   Thu Jan 14 11:50:56 2016 -0500

	PR c++/69257
	* typeck.c (decay_conversion): Don't call mark_rvalue_use for
	array/function-to-pointer conversion.  Call
	complete_type_or_maybe_complain for lvalue-to-rvalue conversion.
	* call.c (convert_like_real): Print call context if
	decay_conversion errors.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index f3f95ef..c05170a 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -6542,7 +6542,16 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum,
 case ck_rvalue:
   expr = decay_conversion (expr, complain);
   if (expr == error_mark_node)
-	return error_mark_node;
+	{
+	  if (complain)
+	{
+	  maybe_print_user_conv_context (convs);
+	  if (fn)
+		inform (DECL_SOURCE_LOCATION (fn),
+			"  initializing argument %P of %qD", argnum, fn);
+	}
+	  return error_mark_node;
+	}
 
   if (! MAYBE_CLASS_TYPE_P (totype))
 	return expr;
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 94267b67..0503c6f 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -1909,11 +1909,10 @@ unlowered_expr_type (const_tree exp)
 
 /* Perform the conversions in [expr] that apply when an lvalue appears
in an rvalue context: the lvalue-to-rvalue, array-to-pointer, and
-   function-to-pointer conversions.  In addition, manifest constants
-   are replaced by their values, and bitfield references are converted
-   to their declared types. Note that this function does not perform the
-   lvalue-to-rvalue conversion for class types. If you need that conversion
-   to for class types, then you probably need to use force_rvalue.
+   function-to-pointer conversions.  In addition, bitfield references are
+   converted to their declared types. Note that this function does not perform
+   the lvalue-to-rvalue conversion for class types. If you need that conversion
+   for class types, then you probably need to use force_rvalue.
 
Although the returned value is being used as an rvalue, this
function does not wrap the returned expression in a
@@ -1933,8 +1932,6 @@ decay_conversion (tree exp,
   if (type == error_mark_node)
 return error_mark_node;
 
-  exp = mark_rvalue_use (exp, loc, reject_builtin);
-
   exp = resolve_nondeduced_context (exp);
   if (type_unknown_p (exp))
 {
@@ -1962,12 +1959,19 @@ decay_conversion (tree exp,
   if (invalid_nonstatic_memfn_p (loc, exp, complain))
 return error_mark_node;
   if (code == FUNCTION_TYPE || is_overloaded_fn (exp))
-return cp_build_addr_expr (exp, complain);
+{
+  exp = mark_lvalue_use (exp);
+  if (reject_builtin && reject_gcc_builtin (exp, loc))
+	return error_mark_node;
+  return cp_build_addr_expr (exp, complain);
+}
   if (code == ARRAY_TYPE)
 {
   tree adr;
   tree ptrtype;
 
+  exp = mark_lvalue_use (exp);
+
   if (INDIRECT_REF_P (exp))
 	return build_nop (build_pointer_type (TREE_TYPE (type)),
 			  TREE_OPERAND (exp, 0));
@@ -2013,6 +2017,9 @@ decay_conversion (tree exp,
   return cp_convert (ptrtype, adr, complain);
 }
 
+  /* Otherwise, it's the lvalue-to-rvalue conversion.  */
+  exp = mark_rvalue_use (exp, loc, reject_builtin);
+
   /* If a bitfield is used in a context where integral promotion
  applies, then the caller is expected to have used
  default_conversion.  That function promotes bitfields correctly
@@ -2032,6 +2039,9 @@ decay_conversion (tree exp,
   if (!CLASS_TYPE_P (type) && cv_qualified_p (type))
 exp = build_nop (cv_unqualified (type), exp);
 
+  if (!complete_type_or_maybe_complain (type, exp, complain))
+return error_mark_node;
+
   return exp;
 }
 
diff --git a/gcc/testsuite/g++.dg/ext/asm13.C b/gcc/testsuite/g++.dg/ext/asm13.C
new file mode 100644
index 000..eece05e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/asm13.C
@@ -0,0 +1,6 @@
+// PR c++/69257
+
+int fn1() {
+  struct S *x;
+  __asm ( "": :"" (*x));	// { dg-error "incomplete" }
+}
commit 0c5ad1f445e3008dcd25b6b6ac6f4cee7efce513
Author: Jason Merrill 
Date:   Thu Jan 14 13:00:25 2016 -0500

	* typeck2.c (cxx_incomplete_type_diagnost

Re: [PATCH] DWARF: add abstract origin links on lexical blocks DIEs

2016-01-15 Thread Richard Biener
On Fri, Jan 15, 2016 at 3:41 PM, Pierre-Marie de Rodat wrote:
> On 01/13/2016 01:17 PM, Richard Biener wrote:
>>
>> I wonder if you can construct a guality testcase that passes with and
>> fails without
>> the patch?
>
>
> I’ve tried to first look at how guality testcases are written (thanks for
> your answers on IRC, by the way :-)) and then how I could write a testcase
> for my fix. It seems there are two ways: match patterns in the assembly file
> or evaluate an expression in GDB.
>
> I already have the testcase I used during development: it’s written in Ada,
> to build with -O2. The way it checks the fix is to see if GDB manages to put
> a breakpoint on the Child2 symbol before executing the program (it cannot
> before my fix and it can afterwards). Oh, and it requires a fairly recent
> GDB version (7.10 looks good).
>
> I managed to get a similar GNU C99 reproducer (it’s attached): the debugging
> information has the pattern that exhibits the bugfix. Namely: while the
> “parent” function is inlined, the “child” function (which is in a block
> inside “parent”) is not. So GDB relies on the DW_TAG_abstract_origin in the
> inlined block to refer to the abstract block that contains the DIE that
> materializes “child“.
>
> However, it looks like there is no way in GDB to refer to C nested functions
> when they are not in the current scope:
>>
>> $ gcc -g -O2 -std=gnu99 nested_fun.c nested_fun_helpers.c
>> $ gdb -n -q ./a.out
>> (gdb) ptype child
>> No symbol "child" in current context.
>> (gdb) ptype nested_fun.parent.child
>> No symbol "nested_fun" in current context.
>
>
> On the other hand, this works with the Ada testcase:
>>
>> (gdb) ptype nested_fun.parent.child
>> type = (false, true)
>
>
> So I’m not sure what to do next: should I do a fragile testcase based on
> scanning the assembly file? (it could break with an optimizer change) create
> a guality testsuite for Ada?

Sounds like a good excuse to add a guality for Ada (which has unique
needs for dwarf).

Richard.

>> Anyway, the patch looks ok to me but please give others a chance to chime
>> in.
>
>
> Sure. Thank you for reviewing!
>
> --
> Pierre-Marie de Rodat


Re: [doc, 5/n] invoke.texi: add new "Program Instrumentation Options" section

2016-01-15 Thread Sandra Loosemore

On 01/15/2016 01:39 AM, Mikhail Maltsev wrote:

On 01/15/2016 05:17 AM, Sandra Loosemore wrote:

This patch consolidates the documentation of GCC options that add runtime 
profiling, error checking, or other instrumentation into a single section.  
Currently these are scattered all over, variously classified as debugging 
options, code generation options, optimization options, etc.

Here is the list of options that I moved into the new section:

@gccoptlist{-p  -pg  -fprofile-arcs --coverage -ftest-coverage @gol

(snip)

The list mentions "-fchecking", but this option is not related to
instrumentation. It just enables consistency checks of the compiler's
internal state, i.e. it is more related to debugging GCC itself.


Yes, you're right -- I was confused by it being buried in the middle of 
the pointer bounds checking options.  :-(  I'll move it back to its 
previous category.


-Sandra



C++ PATCH for c++/68847 (ICE with builtin in template)

2016-01-15 Thread Jason Merrill
The delayed folding code for builtins needs to make sure that the 
expression is instantiated before we try to fold it, because we can get 
here while parsing a template.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 6e41de9cec949bc7c09406b50105afee927a0ae3
Author: Jason Merrill 
Date:   Thu Jan 14 17:25:27 2016 -0500

	PR c++/68847
	* call.c (build_cxx_call): Use fold_non_dependent_expr.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index c05170a..ce87be7 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -7755,7 +7755,7 @@ build_cxx_call (tree fn, int nargs, tree *argarray,
   /* We need to take care that values to BUILT_IN_NORMAL
  are reduced.  */
   for (i = 0; i < nargs; i++)
-	argarray[i] = maybe_constant_value (argarray[i]);
+	argarray[i] = fold_non_dependent_expr (argarray[i]);
 
   if (!check_builtin_function_arguments (fndecl, nargs, argarray))
 	return error_mark_node;
diff --git a/gcc/testsuite/g++.dg/delayedfold/builtin1.C b/gcc/testsuite/g++.dg/delayedfold/builtin1.C
new file mode 100644
index 000..32f4435
--- /dev/null
+++ b/gcc/testsuite/g++.dg/delayedfold/builtin1.C
@@ -0,0 +1,11 @@
+// PR c++/68847
+// { dg-do compile { target cas_int } }
+
+class RegionLock {
+  template  void m_fn1();
+  int spinlock;
+} acquire_zero;
+int acquire_one;
+template  void RegionLock::m_fn1() {
+  __atomic_compare_exchange(&spinlock, &acquire_zero, &acquire_one, false, 2, 2);
+}


[PATCH, i386] Support ANDN in stv pass

2016-01-15 Thread Ilya Enkovich
Hi,

This patch continues resolving the andn regression case in the stv pass
(see https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01017.html).
It adds a new andn pattern, similar to the other bitwise DI patterns
we have for the stv pass.

This improves performance of 462.libquantum benchmark on Haswell
(+2.6% on -O2, +1% on -O3 -flto).

Unfortunately, this patch doesn't enable generation of pandn when the
target doesn't have BMI.  Perhaps a peephole could be used for such targets?
Or we could allow andn and then split it back to and + xor for them.
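
For reference, a short C sketch of the operation being discussed (the function names are illustrative only). It shows the 64-bit andn, the same result computed on two 32-bit halves in the way a doubleword split would, and the and + xor identity mentioned above as a possible fallback for targets without BMI.

```c
#include <stdint.h>

/* andn: bits of B that are not set in A.  */
uint64_t andn64(uint64_t a, uint64_t b)
{
    return ~a & b;
}

/* Same result computed separately on the low and high 32-bit halves.  */
uint64_t andn64_split(uint64_t a, uint64_t b)
{
    uint32_t lo = ~(uint32_t) a & (uint32_t) b;
    uint32_t hi = ~(uint32_t) (a >> 32) & (uint32_t) (b >> 32);
    return ((uint64_t) hi << 32) | lo;
}

/* Same result without a not/andn operation: (a & b) ^ b == ~a & b.  */
uint64_t andn64_and_xor(uint64_t a, uint64_t b)
{
    return (a & b) ^ b;
}
```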

Bootstrapped and regtested on x86_64-unknown-linux-gnu.  OK for trunk?

Thanks,
Ilya
--
gcc/

2016-01-15  Ilya Enkovich  

* config/i386/i386.c (scalar_to_vector_candidate_p): Support
andnot instruction.
(scalar_chain::convert_op): Likewise.
* config/i386/i386.md (*andndi3_doubleword): New.

gcc/testsuite/

2016-01-15  Ilya Enkovich  

* gcc.target/i386/pr65105-5.c: Adjust to andn generation.


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index de41477..a0b0d68 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2815,7 +2815,11 @@ scalar_to_vector_candidate_p (rtx_insn *insn)
   return false;
 }
 
-  if (!REG_P (XEXP (src, 0)) && !MEM_P (XEXP (src, 0)))
+  if (!REG_P (XEXP (src, 0)) && !MEM_P (XEXP (src, 0))
+  /* Check for andnot case.  */
+  && (GET_CODE (src) != AND
+ || GET_CODE (XEXP (src, 0)) != NOT
+ || !REG_P (XEXP (XEXP (src, 0), 0))))
   return false;
 
   if (!REG_P (XEXP (src, 1)) && !MEM_P (XEXP (src, 1)))
@@ -3383,7 +3387,12 @@ scalar_chain::convert_op (rtx *op, rtx_insn *insn)
 {
   *op = copy_rtx_if_shared (*op);
 
-  if (MEM_P (*op))
+  if (GET_CODE (*op) == NOT)
+{
+  convert_op (&XEXP (*op, 0), insn);
+  PUT_MODE (*op, V2DImode);
+}
+  else if (MEM_P (*op))
 {
   rtx tmp = gen_reg_rtx (DImode);
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 71941d0..f16b42a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8645,6 +8645,23 @@
  (clobber (reg:CC FLAGS_REG))])]
   "split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);")
 
+(define_insn_and_split "*andndi3_doubleword"
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
+   (and:DI
+ (not:DI (match_operand:DI 1 "register_operand" "r,r"))
+ (match_operand:DI 2 "nonimmediate_operand" "r,m")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_BMI && !TARGET_64BIT && TARGET_STV && TARGET_SSE"
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (match_dup 0)
+  (and:SI (not:SI (match_dup 1)) (match_dup 2)))
+ (clobber (reg:CC FLAGS_REG))])
+   (parallel [(set (match_dup 3)
+  (and:SI (not:SI (match_dup 4)) (match_dup 5)))
+ (clobber (reg:CC FLAGS_REG))])]
+  "split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);")
+
 (define_insn "*hi_1"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=r,rm,!k")
(any_or:HI
diff --git a/gcc/testsuite/gcc.target/i386/pr65105-5.c b/gcc/testsuite/gcc.target/i386/pr65105-5.c
index 5818c1c..639bbe1 100644
--- a/gcc/testsuite/gcc.target/i386/pr65105-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr65105-5.c
@@ -1,7 +1,7 @@
 /* PR target/pr65105 */
 /* { dg-do compile { target { ia32 } } } */
 /* { dg-options "-O2 -march=core-avx2" } */
-/* { dg-final { scan-assembler "pand" } } */
+/* { dg-final { scan-assembler "pandn" } } */
 /* { dg-final { scan-assembler "pxor" } } */
 /* { dg-final { scan-assembler "ptest" } } */
 


Re: [hsa merge 09/10] Majority of the HSA back-end

2016-01-15 Thread Jakub Jelinek
On Fri, Jan 15, 2016 at 04:08:14PM +0100, Martin Jambor wrote:
> We don't error, apart from issuing a warning we basically ignore them.
> I believe we can do it even in the long term and that it is in fact
> useful because the standard says that the "effect" of these routines
> is "unspecified" if they get called from a target region.
> 
> Perhaps this is even something we should warn about earlier in omp
> lowering/expansion.

Well, only some of the omp.h functions are not allowed to be called from 
the target regions, others are.
For the others that have unspecified behavior, there is always the question
when it is desirable to warn.  In target construct body it might be a
warning candidate, the only possibility that it is not invoking unspec
behavior is if the target construct is not encountered, if it is in dead
code in that body, or just never encountered.
But if you have declare target routine, it is more controversial to warn,
because the routine can be run both on host (where it is fine) and on target
(where it is not), whether it calls the argument e.g. could depend on some
parameter or result of some function (say check whether it is in offloaded
region).

Anyway, thanks for fixing this, the patch is ok for trunk.  And after the
commit you're the maintainer, so it is up to you to review further changes
to it.  Please keep it nicely and consistently formatted in the future ;)

Jakub


Re: [hsa merge 10/10] HSA register allocator

2016-01-15 Thread Martin Jambor
Hi,

On Thu, Jan 14, 2016 at 03:41:34PM +0100, Jakub Jelinek wrote:
> On Wed, Jan 13, 2016 at 06:39:35PM +0100, Martin Jambor wrote:
> > +for (phi = hbb->m_first_phi;
> > +phi;
> > +phi = phi->m_next ? as_a  (phi->m_next): NULL)
> 
> Space before :
> 
> Ok with that change.
> 

I have committed the following patch from Martin to address this and a
few other code style issues.

Thanks,

Martin

2016-01-15  Martin Liska  

* hsa-regalloc.c (naive_outof_ssa): Fixed coding style.
(linear_scan_regalloc): Likewise.
(regalloc): Likewise.
---
 gcc/hsa-regalloc.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/hsa-regalloc.c b/gcc/hsa-regalloc.c
index 5a42beb..f8e83ecf 100644
--- a/gcc/hsa-regalloc.c
+++ b/gcc/hsa-regalloc.c
@@ -90,7 +90,7 @@ naive_outof_ssa (void)
 
 for (phi = hbb->m_first_phi;
 phi;
-phi = phi->m_next ? as_a  (phi->m_next): NULL)
+phi = phi->m_next ? as_a  (phi->m_next) : NULL)
   naive_process_phi (phi);
 
 /* Zap PHI nodes, they will be deallocated when everything else will.  */
@@ -525,7 +525,7 @@ linear_scan_regalloc (struct m_reg_class_desc *classes)
   else
after_end_number = insn_order;
   /* Everything live-out in this BB has at least an end point
- after us. */
+after us.  */
   EXECUTE_IF_SET_IN_BITMAP (hbb->m_liveout, 0, bit, bi)
note_lr_end (ind2reg[bit], after_end_number);
 
@@ -549,7 +549,7 @@ linear_scan_regalloc (struct m_reg_class_desc *classes)
}
 
   /* Everything live-in in this BB has a start point before
- our first insn.  */
+our first insn.  */
   int before_start_number;
   if (hbb->m_first_insn)
before_start_number = hbb->m_first_insn->m_number;
@@ -570,7 +570,7 @@ linear_scan_regalloc (struct m_reg_class_desc *classes)
   are defined at the start of the routine (prologue).  */
if (ind2reg[i]->m_lr_begin == insn_order)
  ind2reg[i]->m_lr_begin = 0;
-   /* All regs that have no use but a def will have lr_end == 0, 
+   /* All regs that have no use but a def will have lr_end == 0,
   they are actually live from def until after the insn they are
   defined in.  */
if (ind2reg[i]->m_lr_end == 0)
@@ -672,7 +672,7 @@ regalloc (void)
   basic_block bb;
   m_reg_class_desc classes[4];
 
-  /* If there are no registers used in the function, exit right away. */
+  /* If there are no registers used in the function, exit right away.  */
   if (hsa_cfun->m_reg_count == 0)
 return;
 
-- 
2.6.4



Re: [hsa merge 09/10] Majority of the HSA back-end

2016-01-15 Thread Martin Jambor
Hi,

thanks Jakub.  Below you'll find a patch, which is mostly work of
Martin Liska, that should address all the review comments.  We then
also went over the "XXX" marks (my bad that I forgot that Michael
uses this mark), removed half of them and turned the rest into TODOs.

Let me just quickly answer two comments as well:

On Thu, Jan 14, 2016 at 03:05:33PM +0100, Jakub Jelinek wrote:
> On Wed, Jan 13, 2016 at 06:39:34PM +0100, Martin Jambor wrote:
>
...
> > +#define HSA_WARN_MEMORY_ROUTINE "OpenMP device memory library routines have " \
> > +  "undefined semantics within target regions, support for HSA ignores them"
> 
> Well, if you don't support them in HSA target regions, you'd better punt and
> not error on them.

We don't error, apart from issuing a warning we basically ignore them.
I believe we can do it even in the long term and that it is in fact
useful because the standard says that the "effect" of these routines
is "unspecified" if they get called from a target region.

Perhaps this is even something we should warn about earlier in omp
lowering/expansion.

...

> > +unsigned
> > +hsa_internal_fn::get_arity ()
> > +{
> > +  switch (m_fn)
> > +{
> > +case IFN_ACOS:
> > +case IFN_ASIN:
> > +case IFN_ATAN:
> > +case IFN_COS:
> > +case IFN_EXP:
> > +case IFN_EXP10:
> > +case IFN_EXP2:
> > +case IFN_EXPM1:
> > +case IFN_LOG:
> > +case IFN_LOG10:
> > +case IFN_LOG1P:
> > +case IFN_LOG2:
> > +case IFN_LOGB:
> > +case IFN_SIGNIFICAND:
> > +case IFN_SIN:
> > +case IFN_SQRT:
> > +case IFN_TAN:
> > +case IFN_CEIL:
> > +case IFN_FLOOR:
> > +case IFN_NEARBYINT:
> > +case IFN_RINT:
> > +case IFN_ROUND:
> > +case IFN_TRUNC:
> > +  return 1;
> > +case IFN_ATAN2:
> > +case IFN_COPYSIGN:
> > +case IFN_FMOD:
> > +case IFN_POW:
> > +case IFN_REMAINDER:
> > +case IFN_SCALB:
> > +case IFN_LDEXP:
> > +  return 2;
> > +  break;
> > +case IFN_CLRSB:
> > +case IFN_CLZ:
> > +case IFN_CTZ:
> > +case IFN_FFS:
> > +case IFN_PARITY:
> > +case IFN_POPCOUNT:
> > +default:
> > +  gcc_unreachable ();
> 
> There are various other IFNs (e.g. for __builtin_{add,sub,mul}_overflow,
> lots of others).  How do you ensure you don't ICE on those?

Martin added a comment explaining this.  This can only be reached when
we already know we are processing a known builtin, filtered by
gen_hsa_insn_for_internal_fn_call.
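
The pattern can be sketched in plain C as follows. All names here are hypothetical and much simpler than the real HSA code: a filter decides which kinds are handled, and the arity lookup is only ever called on kinds that passed the filter, so its default case is a genuine "cannot happen".

```c
#include <assert.h>

/* Hypothetical function kinds; FN_POPCOUNT stands for one that the
   back end does not handle.  */
enum fn_kind { FN_SQRT, FN_POW, FN_POPCOUNT };

/* Filter: true only for kinds the arity lookup knows about.  */
static int fn_is_handled(enum fn_kind k)
{
    return k == FN_SQRT || k == FN_POW;
}

/* Must only be called for kinds accepted by fn_is_handled.  */
static unsigned fn_arity(enum fn_kind k)
{
    switch (k) {
    case FN_SQRT:
        return 1;
    case FN_POW:
        return 2;
    default:
        assert(0 && "unhandled function kind");
        return 0;
    }
}

/* Caller: check the filter first, so the default case is never hit.  */
int arity_or_minus_one(enum fn_kind k)
{
    return fn_is_handled(k) ? (int) fn_arity(k) : -1;
}
```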

Thanks for looking at the code,

Martin

2016-01-15  Martin Liska  
Martin Jambor  

* hsa-brig.c (struct function_linkage_pair): Fix GNU coding style
and replace sprintf with snprintf.
(hsa_brig_section::init): Likewise.
(hsa_brig_section::output): Likewise.
(hsa_brig_section::get_ptr_by_offset): Likewise.
(brig_string_slot_hasher::hash): Likewise.
(brig_string_slot_hasher::equal): Likewise.
(brig_string_slot_hasher::remove): Likewise.
(brig_emit_string): Likewise.
(brig_init): Likewise.
(emit_directive_variable): Likewise.
(emit_function_directives): Likewise.
(emit_bb_label_directive): Likewise.
(emit_immediate_scalar_to_buffer): Likewise.
(hsa_op_immed::emit_to_buffer): Likewise.
(emit_immediate_operand): Likewise.
(emit_address_operand): Likewise.
(emit_memory_insn): Likewise.
(emit_alloca_insn): Likewise.
(emit_cmp_insn): Likewise.
(emit_branch_insn): Likewise.
(emit_switch_insn): Likewise.
(emit_call_insn): Likewise.
(emit_arg_block_insn): Likewise.
(emit_packed_insn): Likewise.
(emit_basic_insn): Likewise.
(hsa_brig_emit_function): Likewise.
(hsa_output_global_variables): Likewise.
(hsa_output_kernels): Likewise.
(hsa_output_libgomp_mapping): Likewise.
(hsa_output_brig): Likewise.
* hsa-dump.c (dump_hsa_immed): Likewise.
(dump_hsa_insn_1): Likewise.
* hsa-gen.c (hsa_symbol::total_byte_size): Likewise.
(hsa_init_simple_builtins): Likewise.
(hsa_init_data_for_cfun): Likewise.
(hsa_type_for_scalar_tree_type): Likewise.
(get_symbol_for_decl): Likewise.
(hsa_get_host_function): Likewise.
(hsa_op_immed::hsa_op_immed): Likewise.
(hsa_insn_mem::hsa_insn_mem): Likewise.
(hsa_insn_atomic::hsa_insn_atomic): Likewise.
(hsa_insn_seg::hsa_insn_seg): Likewise.
(hsa_insn_srctype::hsa_insn_srctype): Likewise.
(process_mem_base): Likewise.
(gen_hsa_insns_for_bitfield): Likewise.
(gen_hsa_insns_for_load): Likewise.
(gen_hsa_insns_for_store): Likewise.
(gen_hsa_insns_for_operation_assignment): Likewise.
(gen_hsa_insns_for_switch_stmt): Likewise.
(get_format_argument_type): Likewise.
(gen_hsa_insns_for_direct_call): Likewise.
(gen_hsa_insn

[PATCH][ARM] PR target/69135: Mark ARMv8 vcvt instructions as unconditional

2016-01-15 Thread Kyrill Tkachov

Hi all,

In this PR the ARMv8 vcvt instructions end up being conditionalised when they 
don't have a conditional form.
Setting the predicable attribute to "no" is not enough; we need to set the 
"conds" attribute to unconditional as well.

Bootstrapped and tested on arm-none-linux-gnueabihf.
Ok for trunk and GCC 5?

Thanks,
Kyrill

2016-01-15  Kyrylo Tkachov  

PR target/69135
* config/arm/vfp.md (lsi2): Set "conds"
attribute to unconditional.  Remove %? from output template.

2016-01-15  Kyrylo Tkachov  

PR target/69135
* gcc.target/arm/pr69135_1.c: New test.
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index cf3b202d2565341745b0c3bd3bc4299e91e86c31..ac5f3b862b5a66227cfa20c36c9f780c743ed853 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -1334,8 +1334,9 @@ (define_insn "lsi2"
 [(match_operand:SDF 1
"register_operand" "")] VCVT)))]
   "TARGET_HARD_FLOAT && TARGET_FPU_ARMV8 "
-  "vcvt%?.32.\\t%0, %1"
+  "vcvt.32.\\t%0, %1"
   [(set_attr "predicable" "no")
+   (set_attr "conds" "unconditional")
(set_attr "type" "f_cvtf2i")]
 )
 
diff --git a/gcc/testsuite/gcc.target/arm/pr69135_1.c b/gcc/testsuite/gcc.target/arm/pr69135_1.c
new file mode 100644
index ..6fb9e0681baed833bd530601bfb94d3d86c3e9f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr69135_1.c
@@ -0,0 +1,44 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_vfp_ok } */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options "-O2 -ffast-math" } */
+/* { dg-add-options arm_v8_vfp } */
+/* { dg-add-options arm_arch_v8a } */
+
+int global;
+
+void
+lceil_float (float x, int b)
+{
+  if (b) global = __builtin_lceilf (x);
+}
+
+void
+lceil_double (double x, int b)
+{
+  if (b) global = __builtin_lceil (x);
+}
+
+void
+lfloor_float (float x, int b)
+{
+  if (b) global =  __builtin_lfloorf (x);
+}
+
+void
+lfloor_double (double x, int b)
+{
+  if (b) global = __builtin_lfloor (x);
+}
+
+void
+lround_float (float x, int b)
+{
+  if (b) global = __builtin_lroundf (x);
+}
+
+void
+lround_double (double x, int b)
+{
+  if (b) global = __builtin_lround (x);
+}


Re: [hsa merge 07/10] IPA-HSA pass

2016-01-15 Thread Jakub Jelinek
On Fri, Jan 15, 2016 at 03:53:23PM +0100, Martin Jambor wrote:
> @@ -317,7 +319,7 @@ public:
>  bool
>  pass_ipa_hsa::gate (function *)
>  {
> -  return hsa_gen_requested_p () || in_lto_p;
> +  return hsa_gen_requested_p ();
>  }
>  
>  } // anon namespace

I actually didn't mean this; I meant something more like:
  return (hsa_gen_requested_p ()
#ifdef ENABLE_HSA
  || in_lto_p
#endif
 );
or so.  Unless you arrange, in lto-wrapper or wherever, that if
HSA is enabled in any LTO input source, it is also enabled in
lto1.  If you do that, your change is fine.

Jakub


[gomp4] implicit non-scalars data mapping in kernels backport

2016-01-15 Thread Cesar Philippidis
I've backported this patch from trunk to gomp-4_0-branch.  It teaches
the gimplifier to inspect the type of the value being pointed to when
deciding what type of implicit data mapping is necessary for a variable.
More discussion of this patch can be found here
.
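
As a rough illustration of the decision described above, here is a small
stand-alone sketch.  It is a toy model under stated assumptions, not the
gimplifier code: the `kind`, `type` and `mapping` names and the
`implicit_kernels_mapping` helper are hypothetical, and only the rule itself
(scalars default to 'copy' under kernels, non-scalars to 'present_or_copy',
judged on the referenced type) comes from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy type model; these names are illustrative, not GCC's tree API.  */
enum kind { K_SCALAR, K_AGGREGATE, K_REFERENCE };

struct type
{
  enum kind kind;
  const struct type *target;	/* Referenced type, for K_REFERENCE.  */
};

enum mapping { MAP_COPY, MAP_PRESENT_OR_COPY };

/* Pick the implicit mapping for a variable in a kernels region.  If the
   variable is passed by reference, classify the type it refers to rather
   than the reference itself.  */
static enum mapping
implicit_kernels_mapping (const struct type *t, bool by_reference)
{
  if (by_reference && t->kind == K_REFERENCE && t->target != NULL)
    t = t->target;

  return t->kind == K_AGGREGATE ? MAP_PRESENT_OR_COPY : MAP_COPY;
}
```

In this model a Fortran dummy array passed by reference is classified by the
array type it refers to, so it gets 'present_or_copy'; without the
dereference the reference itself would be treated as a scalar.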

Cesar
2016-01-15  Cesar Philippidis  

	gcc/
	* gimplify.c (oacc_default_clause): Decode reference and pointer
	types for both kernels and parallel regions.

	libgomp/
	* testsuite/libgomp.oacc-fortran/kernels-data.f90: New test.


diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 17144d1..eda2e9c 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -5994,6 +5994,10 @@ oacc_default_clause (struct gimplify_omp_ctx *ctx, tree decl, unsigned flags)
 {
   const char *rkind;
   bool on_device = false;
+  tree type = TREE_TYPE (decl);
+
+  if (lang_hooks.decls.omp_privatize_by_reference (decl))
+type = TREE_TYPE (type);
 
   if ((ctx->region_type & (ORT_ACC_PARALLEL | ORT_ACC_KERNELS)) != 0
   && is_global_var (decl)
@@ -6012,7 +6016,7 @@ oacc_default_clause (struct gimplify_omp_ctx *ctx, tree decl, unsigned flags)
   /* Scalars are default 'copy' under kernels, non-scalars are default
 	 'present_or_copy'.  */
   flags |= GOVD_MAP;
-  if (!AGGREGATE_TYPE_P (TREE_TYPE (decl)))
+  if (!AGGREGATE_TYPE_P (type))
 	flags |= GOVD_MAP_FORCE;
 
   rkind = "kernels";
@@ -6020,12 +6024,6 @@ oacc_default_clause (struct gimplify_omp_ctx *ctx, tree decl, unsigned flags)
 
 case ORT_ACC_PARALLEL:
   {
-	tree type = TREE_TYPE (decl);
-
-	if (TREE_CODE (type) == REFERENCE_TYPE
-	|| POINTER_TYPE_P (type))
-	  type = TREE_TYPE (type);
-
 	if (on_device || AGGREGATE_TYPE_P (type))
 	  /* Aggregates default to 'present_or_copy'.  */
 	  flags |= GOVD_MAP;
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 b/libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
new file mode 100644
index 000..4afb562
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90
@@ -0,0 +1,50 @@
+! Ensure that a non-scalar dummy arguments which are implicitly used inside
+! offloaded regions are properly mapped using present_or_copy.
+
+! { dg-do run }
+
+program main
+  implicit none
+
+  integer, parameter :: n = 100
+  integer :: array(n), i
+  
+  !$acc data copy(array)
+  call kernels(array, n)
+
+  !$acc update host(array)
+
+  do i = 1, n
+ if (array(i) .ne. i) call abort
+  end do
+
+  call parallel(array, n)
+  !$acc end data
+
+  do i = 1, n
+ if (array(i) .ne. i+i) call abort
+  end do
+end program main
+
+subroutine kernels (array, n)
+  integer, dimension (n) :: array
+  integer :: n, i
+
+  !$acc kernels
+  do i = 1, n
+ array(i) = i
+  end do
+  !$acc end kernels
+end subroutine kernels
+
+
+subroutine parallel (array, n)
+  integer, dimension (n) :: array
+  integer :: n, i
+
+  !$acc parallel
+  do i = 1, n
+ array(i) = i+i
+  end do
+  !$acc end parallel
+end subroutine parallel


Re: [hsa merge 07/10] IPA-HSA pass

2016-01-15 Thread Martin Jambor
On Thu, Jan 14, 2016 at 01:58:58PM +0100, Jakub Jelinek wrote:
> Otherwise LGTM.
> 
>   Jakub

Thanks Jakub, I have committed the following patch from Martin Liska
that addresses your comments.

Martin

2016-01-15  Martin Liska  

* ipa-hsa.c (process_hsa_functions): Fixed coding style.
(ipa_hsa_read_section): Likewise.
(ipa_hsa_read_section): Likewise.
(pass_ipa_hsa::gate): Removed in_lto_p from the condition.
---
 gcc/ipa-hsa.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
index dd47995..769657f 100644
--- a/gcc/ipa-hsa.c
+++ b/gcc/ipa-hsa.c
@@ -86,8 +86,9 @@ process_hsa_functions (void)
{
  if (!check_warn_node_versionable (node))
continue;
- cgraph_node *clone = node->create_virtual_clone
-   (vec  (), NULL, NULL, "hsa");
+ cgraph_node *clone
+   = node->create_virtual_clone (vec  (),
+ NULL, NULL, "hsa");
  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
 
  clone->force_output = true;
@@ -102,8 +103,9 @@ process_hsa_functions (void)
{
  if (!check_warn_node_versionable (node))
continue;
- cgraph_node *clone = node->create_virtual_clone
-   (vec  (), NULL, NULL, "hsa");
+ cgraph_node *clone
+   = node->create_virtual_clone (vec  (),
+ NULL, NULL, "hsa");
  TREE_PUBLIC (clone->decl) = TREE_PUBLIC (node->decl);
 
  if (!cgraph_local_p (node))
@@ -209,8 +211,8 @@ static void
 ipa_hsa_read_section (struct lto_file_decl_data *file_data, const char *data,
   size_t len)
 {
-  const struct lto_function_header *header =
-(const struct lto_function_header *) data;
+  const struct lto_function_header *header
+= (const struct lto_function_header *) data;
   const int cfg_offset = sizeof (struct lto_function_header);
   const int main_offset = cfg_offset + header->cfg_size;
   const int string_offset = main_offset + header->main_size;
@@ -221,9 +223,9 @@ ipa_hsa_read_section (struct lto_file_decl_data *file_data, 
const char *data,
   lto_input_block ib_main ((const char *) data + main_offset,
   header->main_size, file_data->mode_table);
 
-  data_in =
-lto_data_in_create (file_data, (const char *) data + string_offset,
-   header->string_size, vNULL);
+  data_in
+= lto_data_in_create (file_data, (const char *) data + string_offset,
+ header->string_size, vNULL);
   count = streamer_read_uhwi (&ib_main);
 
   for (i = 0; i < count; i++)
@@ -317,7 +319,7 @@ public:
 bool
 pass_ipa_hsa::gate (function *)
 {
-  return hsa_gen_requested_p () || in_lto_p;
+  return hsa_gen_requested_p ();
 }
 
 } // anon namespace
-- 
2.6.4





Re: [hsa merge 05/10] OpenMP lowering/expansion changes (gridification)

2016-01-15 Thread Martin Jambor
Thanks Jakub and Alex,

I have committed the following to the branch to address your comments:

2016-01-15  Martin Jambor  

* gimple.h: Fixed comment of gimple_statement_omp_single_layout
* omp-low.c (get_target_argument_value): Fixed spelling in its
comment.
(push_target_argument_according_to_value): Likewise.
* tree.h (OMP_CLAUSE_GRIDDIM_DIMENSION): Renamed to
OMP_CLAUSE__GRIDDIM__DIMENSION
---
 gcc/gimple.h|  2 +-
 gcc/omp-low.c   | 12 ++--
 gcc/tree-pretty-print.c |  2 +-
 gcc/tree.h  |  5 +
 4 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/gcc/gimple.h b/gcc/gimple.h
index 7eef07c..6d15dab 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -730,7 +730,7 @@ struct GTY((tag("GSS_OMP_CONTINUE")))
   tree control_use;
 };
 
-/* GIMPLE_OMP_SINGLE, GIMPLE_OMP_ORDERED */
+/* GIMPLE_OMP_SINGLE, GIMPLE_OMP_TEAMS, GIMPLE_OMP_ORDERED */
 
 struct GTY((tag("GSS_OMP_SINGLE_LAYOUT")))
   gimple_statement_omp_single_layout : public gimple_statement_omp
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index c534f5c..616c5bd 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12741,7 +12741,7 @@ grid_get_kernel_launch_attributes (gimple_stmt_iterator 
*gsi,
   if (OMP_CLAUSE_CODE (clause) != OMP_CLAUSE__GRIDDIM_)
continue;
 
-  unsigned dim = OMP_CLAUSE_GRIDDIM_DIMENSION (clause);
+  unsigned dim = OMP_CLAUSE__GRIDDIM__DIMENSION (clause);
   max_dim = MAX (dim, max_dim);
 
   grid_insert_store_range_dim (gsi, lattrs,
@@ -12788,7 +12788,7 @@ get_target_argument_identifier (int device, bool 
subseqent_param, int id)
   return fold_convert (ptr_type_node, t);
 }
 
-/* Return a target argument consisiting of DEVICE identifier, value identifier
+/* Return a target argument consisting of DEVICE identifier, value identifier
ID, and the actual VALUE.  */
 
 static tree
@@ -12806,8 +12806,8 @@ get_target_argument_value (gimple_stmt_iterator *gsi, 
int device, int id,
 }
 
 /* If VALUE is an integer constant greater than -2^15 and smaller than 2^15,
-   push one argument to ARGS with bot the DEVICE, ID and VALUE embeded in it,
-   otherwise push an iedntifier (with DEVICE and ID) and the VALUE in two
+   push one argument to ARGS with both the DEVICE, ID and VALUE embedded in it,
+   otherwise push an identifier (with DEVICE and ID) and the VALUE in two
arguments.  */
 
 static void
@@ -17693,7 +17693,7 @@ grid_attempt_target_gridification (gomp_target *target,
ws = build_zero_cst (uint32_type_node);
 
   tree c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE__GRIDDIM_);
-  OMP_CLAUSE_SET_GRIDDIM_DIMENSION (c, (unsigned int) i);
+  OMP_CLAUSE__GRIDDIM__DIMENSION (c) = i;
   OMP_CLAUSE__GRIDDIM__SIZE (c) = gs;
   OMP_CLAUSE__GRIDDIM__GROUP (c) = ws;
   OMP_CLAUSE_CHAIN (c) = gimple_omp_target_clauses (target);
@@ -17749,7 +17749,7 @@ grid_gridify_all_targets (gimple_seq *body_p)
   memset (&wi, 0, sizeof (wi));
   walk_gimple_seq_mod (body_p, grid_gridify_all_targets_stmt, NULL, &wi);
 }
-
+
 
 /* Main entry point.  */
 
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 31cea10..9c13d84 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -944,7 +944,7 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, 
int flags)
 
 case OMP_CLAUSE__GRIDDIM_:
   pp_string (pp, "_griddim_(");
-  pp_unsigned_wide_integer (pp, OMP_CLAUSE_GRIDDIM_DIMENSION (clause));
+  pp_unsigned_wide_integer (pp, OMP_CLAUSE__GRIDDIM__DIMENSION (clause));
   pp_colon (pp);
   dump_generic_node (pp, OMP_CLAUSE__GRIDDIM__SIZE (clause), spc, flags,
 false);
diff --git a/gcc/tree.h b/gcc/tree.h
index e885ea1..9b987bb 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1636,12 +1636,9 @@ extern void protected_set_expr_location (tree, 
location_t);
 #define OMP_CLAUSE_TILE_LIST(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0)
 
-#define OMP_CLAUSE_GRIDDIM_DIMENSION(NODE) \
+#define OMP_CLAUSE__GRIDDIM__DIMENSION(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\
->omp_clause.subcode.dimension)
-#define OMP_CLAUSE_SET_GRIDDIM_DIMENSION(NODE, DIMENSION) \
-  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_)\
-   ->omp_clause.subcode.dimension = (DIMENSION))
 #define OMP_CLAUSE__GRIDDIM__SIZE(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__GRIDDIM_), 0)
 #define OMP_CLAUSE__GRIDDIM__GROUP(NODE) \
-- 
2.6.4



Re: [PATCH] DWARF: add abstract origin links on lexical blocks DIEs

2016-01-15 Thread Pierre-Marie de Rodat

On 01/13/2016 01:17 PM, Richard Biener wrote:

> I wonder if you can construct a guality testcase that passes with and
> fails without the patch?


I’ve tried to first look at how guality testcases are written (thanks 
for your answers on IRC, by the way :-)) and then how I could write a 
testcase for my fix. It seems there are two ways: match patterns in the 
assembly file or evaluate an expression in GDB.


I already have the testcase I used during development: it’s written in 
Ada, to build with -O2. The way it checks the fix is to see if GDB 
manages to put a breakpoint on the Child2 symbol before executing the 
program (it cannot before my fix and it can afterwards). Oh, and it 
requires a fairly recent GDB version (7.10 looks good).


I managed to get a similar GNU C99 reproducer (it’s attached): the 
debugging information has the pattern that exhibits the bugfix. Namely: 
while the “parent” function is inlined, the “child” function (which is 
in a block inside “parent”) is not. So GDB relies on the 
DW_AT_abstract_origin in the inlined block to refer to the abstract 
block that contains the DIE that materializes “child”.
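
To make the lookup pattern concrete, here is a toy model of it in C.  It is
only a sketch under assumptions: `struct die`, its fields and `find_child`
are hypothetical names for illustration and do not correspond to GDB's or
GCC's actual data structures.  The only idea taken from the description
above is that an inlined block with no DIE for “child” of its own can still
reach it through its abstract origin.

```c
#include <stddef.h>
#include <string.h>

/* Toy DIE model; field names are illustrative, not a real DWARF API.  */
struct die
{
  const char *name;			/* NULL for anonymous blocks.  */
  const struct die *abstract_origin;	/* Models the abstract origin link.  */
  const struct die *const *children;	/* NULL-terminated array, or NULL.  */
};

/* Look NAME up among the children of SCOPE.  If it is not found there,
   follow the abstract origin chain and retry in each abstract scope.  */
static const struct die *
find_child (const struct die *scope, const char *name)
{
  for (; scope != NULL; scope = scope->abstract_origin)
    if (scope->children != NULL)
      for (const struct die *const *c = scope->children; *c != NULL; c++)
	if ((*c)->name != NULL && strcmp ((*c)->name, name) == 0)
	  return *c;

  return NULL;
}
```

In this model, an inlined block whose `abstract_origin` is NULL cannot reach
“child” at all, which mirrors the failure seen before the fix.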


However, it looks like there is no way in GDB to refer to C nested 
functions when they are not in the current scope:

$ gcc -g -O2 -std=gnu99 nested_fun.c nested_fun_helpers.c
$ gdb -n -q ./a.out
(gdb) ptype child
No symbol "child" in current context.
(gdb) ptype nested_fun.parent.child
No symbol "nested_fun" in current context.


On the other hand, this works with the Ada testcase:

(gdb) ptype nested_fun.parent.child
type = (false, true)


So I’m not sure what to do next: should I write a fragile testcase based on 
scanning the assembly file (it could break with an optimizer change), or 
create a guality testsuite for Ada?



> Anyway, the patch looks ok to me but please give others a chance to chime in.


Sure. Thank you for reviewing!

--
Pierre-Marie de Rodat
/* { dg-do run } */
/* { dg-options "-O2 -g -std=gnu99" } */

extern void *create (const char *);
extern void destroy (void *);
extern void do_nothing (char);

struct string
{
  const char *data;
  int lb;
  int ub;
};

int
main (void)
{
  void *o1 = create ("foo");

  void
  parent (void)
  {
    {
      void *o2 = create ("bar");

      int
      child (struct string s)
      {
        int i = s.lb;

        if (s.lb <= s.ub)
          while (1)
            {
              char c = s.data[i - s.lb];
              do_nothing (c);
              if (c == 'o')
                return 1;
              if (i == s.ub)
                break;
              ++i;
            }
        return 0;
      }

      int r;

      r = child ((struct string) {"baz", 1, 3});
      r = child ((struct string) {"qux", 2, 4});
      r = child ((struct string) {"foobar", 1, 6});
    }

    do_nothing (0);
  }

  /* { dg-final { gdb-test 56 "type:main::parent::child" "int (struct string)" } } */
  parent ();
  return 0;
}
void *
create (const char *s)
{
  return 0;
}

void
destroy (void *o)
{
  return;
}

void
do_nothing (char c)
{
  return;
}


Re: PR 69246: Invalid REG_ARGS_SIZE for sibcalls

2016-01-15 Thread Bernd Schmidt

On 01/15/2016 03:31 PM, Richard Sandiford wrote:

> The problem in this PR was that we were treating a sibcall as popping
> arguments, leading to a negative REG_ARGS_SIZE.
>
> It doesn't really make sense to treat sibcalls as popping since
> (a) they're deallocating the caller's stack, not ours, and
> (b) there are no optabs for popping sibcalls (any more).
>
> Tested on x86_64-linux-gnu (including an -m32 run and Ada).  OK to install?


Ok.


Bernd


Re: [PATCH] PR target/68991: Add vector_memory_operand and "Bm" constraint

2016-01-15 Thread Jakub Jelinek
On Fri, Jan 15, 2016 at 06:24:42AM -0800, H.J. Lu wrote:
> >> -Ofast -mavx -mno-avx2 -mtune=bdver2
> >>
> >> float *a, *b;
> >> int c, d, e, f;
> >> void
> >> foo (void)
> >> {
> >>   for (; c; c++)
> >> a[c] = 0;
> >>   if (!d)
> >> for (; c < f; c++)
> >>   b[c] = (double) e / b[c];
> >> }
> >>
> >> r232086 vs. r232088 gives.  I don't see significant differences before IRA,
> >> IRA seems to have some cost differences (strange), but the same 
> >> dispositions,
> >> and LRA ends up with all the differences.
> >>
> >
> > That may be due to the difference between define_memory_constraint and
> > define_constraint.  LRA doesn't consider register for define_constraint if
> > memory is true.
> >
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68991#c14

Tracking this in PR69299.  Maybe we really need to have two types of memory
constraints: ones which can, in the worst case, always be satisfied by
reloading their address into an address register, and others which can, in
the worst case, always be satisfied by loading the memory into a temporary
register (for loads) or storing it from a temporary register (for stores).

Jakub


PR 69246: Invalid REG_ARGS_SIZE for sibcalls

2016-01-15 Thread Richard Sandiford
The problem in this PR was that we were treating a sibcall as popping
arguments, leading to a negative REG_ARGS_SIZE.

It doesn't really make sense to treat sibcalls as popping since
(a) they're deallocating the caller's stack, not ours, and
(b) there are no optabs for popping sibcalls (any more).
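
The arithmetic behind the negative value can be sketched with a deliberately
simplified model.  This is not the calls.c code: `args_size_after_call` and
its parameters are hypothetical, and the model assumes that a sibcall pushes
nothing in our own frame because its arguments go into the incoming argument
area.

```c
#include <stdbool.h>

/* Simplified model, not GCC code: bytes of outgoing arguments still
   accounted to our frame after a call.  PUSHED is what we pushed for the
   call and CALLEE_POPS is what the callee removes on return (for example
   a stdcall function).  */
static long
args_size_after_call (long pushed, long callee_pops, bool sibcall)
{
  /* A sibcall's callee pops from our caller's frame, not ours, so it
     contributes nothing to our own bookkeeping.  */
  long n_popped = sibcall ? 0 : callee_pops;

  return pushed - n_popped;
}
```

For a stdcall callee taking one int, a normal call gives 4 - 4 = 0.  For the
sibcall case, subtracting the 4 popped bytes from 0 pushed bytes would give
-4, which is the kind of negative size the PR reports; forcing the popped
amount to zero keeps it at 0.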

Tested on x86_64-linux-gnu (including an -m32 run and Ada).  OK to install?

Thanks,
Richard


gcc/
PR middle-end/69246
* calls.c (emit_call_1): Force n_popped to zero for sibcalls.

gcc/testsuite/
PR middle-end/69246
* gcc.target/i386/pr69246.c: New test.

diff --git a/gcc/calls.c b/gcc/calls.c
index a154934..8f573b8 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -272,12 +272,19 @@ emit_call_1 (rtx funexp, tree fntree ATTRIBUTE_UNUSED, 
tree fndecl ATTRIBUTE_UNU
   rtx rounded_stack_size_rtx = GEN_INT (rounded_stack_size);
   rtx call, funmem, pat;
   int already_popped = 0;
-  HOST_WIDE_INT n_popped
-= targetm.calls.return_pops_args (fndecl, funtype, stack_size);
+  HOST_WIDE_INT n_popped = 0;
+
+  /* Sibling call patterns never pop arguments (no sibcall(_value)_pop
+ patterns exist).  Any popping that the callee does on return will
+ be from our caller's frame rather than ours.  */
+  if (!(ecf_flags & ECF_SIBCALL))
+{
+  n_popped += targetm.calls.return_pops_args (fndecl, funtype, stack_size);
 
 #ifdef CALL_POPS_ARGS
-  n_popped += CALL_POPS_ARGS (*get_cumulative_args (args_so_far));
+  n_popped += CALL_POPS_ARGS (*get_cumulative_args (args_so_far));
 #endif
+}
 
   /* Ensure address is valid.  SYMBOL_REF is already valid, so no need,
  and we don't want to load it into a register as an optimization,
diff --git a/gcc/testsuite/gcc.target/i386/pr69246.c 
b/gcc/testsuite/gcc.target/i386/pr69246.c
new file mode 100644
index 000..e56e691
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr69246.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2" } */
+
+void (__attribute__ ((stdcall)) *a) (int);
+
+void __attribute__ ((stdcall))
+foo (int x)
+{
+  a (x);
+}
+
+int (__attribute__ ((stdcall)) *b) (int);
+
+int __attribute__ ((stdcall))
+bar (int x)
+{
+  return b (x);
+}


