Re: C++ PATCH to remove WITH_CLEANUP_EXPR handling

2017-07-03 Thread Jason Merrill
Absolutely.

On Mon, Jul 3, 2017 at 5:35 AM, Marek Polacek  wrote:
> On Thu, Jun 29, 2017 at 05:44:25PM -0400, Jason Merrill wrote:
>> The C++ front end hasn't generated WITH_CLEANUP_EXPR in a very long
>> time (20+ years?), so there's no need to handle it.
>
> Heh.  Found another one; is this patch ok if it passes testing?
>
> 2017-07-03  Marek Polacek  
>
> * c-warn.c (warn_if_unused_value): Remove WITH_CLEANUP_EXPR handling.
>
> diff --git gcc/c-family/c-warn.c gcc/c-family/c-warn.c
> index 5d67395..b9378c2 100644
> --- gcc/c-family/c-warn.c
> +++ gcc/c-family/c-warn.c
> @@ -465,7 +465,6 @@ warn_if_unused_value (const_tree exp, location_t locus)
>      case TARGET_EXPR:
>      case CALL_EXPR:
>      case TRY_CATCH_EXPR:
> -    case WITH_CLEANUP_EXPR:
>      case EXIT_EXPR:
>      case VA_ARG_EXPR:
>        return false;
>
> Marek


Re: [PATCH] C/C++: add fix-it hints for various missing symbols

2017-07-03 Thread Joseph Myers
Does the changed location fix bug 7356?

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Save and restore EDGE_DFS_BACK in draw_cfg_edges

2017-07-03 Thread Tom de Vries

[was: Re: [PATCH] Add dotfn ]

On 07/03/2017 12:23 PM, Richard Biener wrote:
>> Btw, I think this needs fixing:
>> ...
>> /* Draw all edges in the CFG.  Retreating edges are drawin as not
>>    constraining, this makes the layout of the graph better.
>>    (??? Calling mark_dfs_back may change the compiler's behavior when
>>    dumping, but computing back edges here for ourselves is also not
>>    desirable.)  */
>>
>> static void
>> draw_cfg_edges (pretty_printer *pp, struct function *fun)
>> {
>>   basic_block bb;
>>   mark_dfs_back_edges ();
>>   FOR_ALL_BB_FN (bb, cfun)
>>     draw_cfg_node_succ_edges (pp, fun->funcdef_no, bb);
>> ...
>>
>> We don't want that calling a debug function changes compiler behavior
>> (something I ran into while debugging PR81192).
>>
>> Any suggestion on how to address this? We could allocate a bitmap before and
>> save the edge flag for all edges, and restore afterwards.
>
> Something like that, yes.

This patch implements that approach.
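The save/restore idea can be illustrated outside GCC. This is a minimal sketch, not the patch: `struct demo_edge`, `DEMO_DFS_BACK` and the helper names are hypothetical stand-ins for GCC's edges, the EDGE_DFS_BACK flag and auto_bitmap, and `clobber_dfs_back` merely plays the role of mark_dfs_back_edges. The property that matters is that the restore pass visits edges in the same order as the save pass, so a plain running index identifies each edge.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for GCC's edge type and EDGE_DFS_BACK.  */
#define DEMO_DFS_BACK 0x1u

struct demo_edge
{
  unsigned flags;
};

/* Record, one bit per edge, whether DEMO_DFS_BACK is set.
   The caller owns the returned buffer and must free it.  */
static uint8_t *
save_dfs_back (const struct demo_edge *edges, size_t n)
{
  uint8_t *bits = calloc ((n + 7) / 8, 1);
  if (!bits)
    return NULL;
  for (size_t i = 0; i < n; i++)
    if (edges[i].flags & DEMO_DFS_BACK)
      bits[i / 8] |= (uint8_t) (1u << (i % 8));
  return bits;
}

/* Put DEMO_DFS_BACK back exactly as recorded: set it where it was set,
   clear it where it was clear, and leave all other flag bits alone.  */
static void
restore_dfs_back (struct demo_edge *edges, size_t n, const uint8_t *bits)
{
  for (size_t i = 0; i < n; i++)
    {
      if (bits[i / 8] & (1u << (i % 8)))
        edges[i].flags |= DEMO_DFS_BACK;
      else
        edges[i].flags &= ~DEMO_DFS_BACK;
    }
}

/* Stand-in for mark_dfs_back_edges: changes the flag on every edge,
   clobbering whatever an earlier pass had computed.  */
static void
clobber_dfs_back (struct demo_edge *edges, size_t n)
{
  for (size_t i = 0; i < n; i++)
    edges[i].flags ^= DEMO_DFS_BACK;
}
```

In the patch itself the walk covers both the predecessor and successor lists of each block visited by FOR_EACH_BB_FN, which (since that iterator skips ENTRY and EXIT) also reaches edges leaving ENTRY, at the cost of recording edges between ordinary blocks twice. That looks harmless: mark_dfs_back_edges only changes flags, not the edge lists, so both walks see the same order.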

I've tried it with the PR81192 example and calling DOTFN in tail-merge, 
like this:

1. Just compiling the example without any patches gives a tail-merge
   sigsegv.
2. Compiling with the DOTFN call in tail-merge makes the sigsegv go
   away.
3. Adding this patch makes the sigsegv come back.

OK for trunk if bootstrap and reg-test on x86_64 succeeds?

Thanks,
- Tom
Save and restore EDGE_DFS_BACK in draw_cfg_edges

2017-07-03  Tom de Vries  

	* graph.c (draw_cfg_edges): Save and restore EDGE_DFS_BACK.

---
 gcc/graph.c | 49 +
 1 file changed, 45 insertions(+), 4 deletions(-)

diff --git a/gcc/graph.c b/gcc/graph.c
index 9261732..628769b 100644
--- a/gcc/graph.c
+++ b/gcc/graph.c
@@ -243,19 +243,60 @@ draw_cfg_nodes (pretty_printer *pp, struct function *fun)
 }
 
 /* Draw all edges in the CFG.  Retreating edges are drawin as not
-   constraining, this makes the layout of the graph better.
-   (??? Calling mark_dfs_back may change the compiler's behavior when
-   dumping, but computing back edges here for ourselves is also not
-   desirable.)  */
+   constraining, this makes the layout of the graph better.  */
 
 static void
 draw_cfg_edges (pretty_printer *pp, struct function *fun)
 {
   basic_block bb;
+
+  /* Save EDGE_DFS_BACK flag to dfs_back.  */
+  auto_bitmap dfs_back;
+  edge e;
+  edge_iterator ei;
+  unsigned int idx = 0;
+  FOR_EACH_BB_FN (bb, cfun)
+{
+  FOR_EACH_EDGE (e, ei, bb->preds)
+	{
+	  if (e->flags & EDGE_DFS_BACK)
+	bitmap_set_bit (dfs_back, idx);
+	  idx++;
+	}
+  FOR_EACH_EDGE (e, ei, bb->succs)
+	{
+	  if (e->flags & EDGE_DFS_BACK)
+	bitmap_set_bit (dfs_back, idx);
+	  idx++;
+	}
+}
+
   mark_dfs_back_edges ();
   FOR_ALL_BB_FN (bb, cfun)
 draw_cfg_node_succ_edges (pp, fun->funcdef_no, bb);
 
+  /* Restore EDGE_DFS_BACK flag from dfs_back.  */
+  idx = 0;
+  FOR_EACH_BB_FN (bb, cfun)
+{
+  FOR_EACH_EDGE (e, ei, bb->preds)
+	{
+	  if (bitmap_bit_p (dfs_back, idx))
+	e->flags |= EDGE_DFS_BACK;
+	  else
+	e->flags &= ~EDGE_DFS_BACK;
+	  idx++;
+	}
+  FOR_EACH_EDGE (e, ei, bb->succs)
+	{
+	  if (bitmap_bit_p (dfs_back, idx))
+	e->flags |= EDGE_DFS_BACK;
+	  else
+	e->flags &= ~EDGE_DFS_BACK;
+	  idx++;
+	}
+}
+
   /* Add an invisible edge from ENTRY to EXIT, to improve the graph layout.  */
   pp_printf (pp,
 	 "\tfn_%d_basic_block_%d:s -> fn_%d_basic_block_%d:n "


[PATCH] v2: c/c++: Add fix-it hints for suggested missing #includes

2017-07-03 Thread David Malcolm
On Fri, 2017-06-30 at 09:40 -0600, Jeff Law wrote:
> On 05/26/2017 01:54 PM, David Malcolm wrote:
> > Ping:
> >   https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00321.html
> > 
> > On Thu, 2017-05-04 at 12:36 -0400, David Malcolm wrote:
> > > As of r247522, fix-it-hints can suggest the insertion of new
> > > lines.
> > > 
> > > This patch uses this to implement a new "maybe_add_include_fixit"
> > > function in c-common.c and uses it in the two places where the C
> > > and C++ frontend can suggest missing #include directives. [1]
> > > 
> > > The idea is that the user can then click on the fix-it in an IDE
> > > and have it add the #include for them (or use
> > > -fdiagnostics-generate-patch).
> > > 
> > > Examples can be seen in the test cases.
> > > 
> > > The function attempts to put the #include in a reasonable place:
> > > immediately after the last #include within the file, or at the
> > > top of the file.  It is idempotent, so -fdiagnostics-generate-patch
> > > does the right thing if several such diagnostics are emitted.
> > > 
> > > Successfully bootstrapped on x86_64-pc-linux-gnu.
> > > 
> > > OK for trunk?
> > > 
> > > [1] I'm working on a followup which tweaks another diagnostic so
> > > that it can suggest that a #include was missing, so I'll use it
> > > there as well.
> > > 
> > > gcc/c-family/ChangeLog:
> > >   * c-common.c (try_to_locate_new_include_insertion_point): New
> > >   function.
> > >   (per_file_includes_t): New typedef.
> > >   (added_includes_t): New typedef.
> > >   (added_includes): New variable.
> > >   (maybe_add_include_fixit): New function.
> > >   * c-common.h (maybe_add_include_fixit): New decl.
> > > 
> > > gcc/c/ChangeLog:
> > >   * c-decl.c (implicitly_declare): When suggesting a missing
> > >   #include, provide a fix-it hint.
> > > 
> > > gcc/cp/ChangeLog:
> > >   * name-lookup.c (get_std_name_hint): Add '<' and '>' around
> > >   the header names.
> > >   (maybe_suggest_missing_header): Update for addition of '<' and
> > >   '>' to above.  Provide a fix-it hint.
> > > 
> > > gcc/testsuite/ChangeLog:
> > >   * g++.dg/lookup/missing-std-include-2.C: New test case.
> > >   * gcc.dg/missing-header-fixit-1.c: New test case.
> Generally OK.  But a few comments on how you find the location for
> where to suggest the new #include.
> 
> It looks like you're walking the whole table every time?!?  Shouldn't
> you at least bound things between start of file and the point where
> an error was issued?  ie, if you used an undefined type on line XX, a
> #include after line XX makes no sense to resolve the error.
> 
> I'm not sure how often this will get called when someone does
> something stupid and needs the #include.  But ISTM you're looking for
> two bounds.
> 
> For the "last" case you start at the statement which generated the
> error and walk backwards stopping when you find the last map.
> 
> For the "first" case, you start at the beginning and walk forward to
> find the map, then quit.
> 
> Are those not appropriate here?

Here's an updated version of the patch.

Changed in v2:

* updated try_to_locate_new_include_insertion_point so that it stops
  searching when it reaches the ordinary map containing the location
  of the diagnostic, giving an upper bound to the search (see notes
  in https://gcc.gnu.org/ml/gcc-patches/2017-06/msg02434.html for more
  discussion of this).
* added test coverage for a missing #include within a header (rather than
  the main source file).  The #include is added to the header in this
  case.
* C++: added a couple of fix-it hints to errors that already
  suggested missing includes in the text of the message (for 
  and ); added test coverage for these.
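The placement rule discussed here (after the last #include that precedes the diagnostic, else at the top of the file, and never twice for the same header) can be sketched on plain text. This is an illustration under simplifying assumptions, not the patch: the real code works on line maps (try_to_locate_new_include_insertion_point, added_includes), whereas `include_insertion_line`, `note_include_once` and `struct added_includes` below are hypothetical and operate on an array of source lines. Following Jeff's suggestion, the search starts at the diagnostic's line and walks backwards, so it stops at the first #include it meets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: return the 0-based index of the line before
   which a new #include should be inserted.  The search starts at
   DIAG_LINE (0-based line of the diagnostic) and walks backwards, so
   the first "#include" it meets is the last one preceding the
   diagnostic.  Returns 0 (top of file) if there is none.  */
static size_t
include_insertion_line (const char *const *lines, size_t n_lines,
                        size_t diag_line)
{
  size_t i = diag_line < n_lines ? diag_line : n_lines;
  while (i > 0)
    {
      i--;
      if (strncmp (lines[i], "#include", 8) == 0)
        return i + 1;
    }
  return 0;
}

/* Idempotence: remember which headers were already suggested so that
   a second diagnostic does not add the same #include again.  */
#define MAX_ADDED_INCLUDES 16

struct added_includes
{
  const char *names[MAX_ADDED_INCLUDES];
  size_t count;
};

/* Return 1 if HEADER should be suggested now, 0 if it was already
   suggested (or the fixed-size table is full).  */
static int
note_include_once (struct added_includes *added, const char *header)
{
  for (size_t i = 0; i < added->count; i++)
    if (strcmp (added->names[i], header) == 0)
      return 0;
  if (added->count == MAX_ADDED_INCLUDES)
    return 0;
  added->names[added->count++] = header;
  return 1;
}
```

An #include that appears after the diagnostic's line is never chosen, which is exactly the upper bound that v2 introduces.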

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/c-family/ChangeLog:
* c-common.c (try_to_locate_new_include_insertion_point): New
function.
(per_file_includes_t): New typedef.
(added_includes_t): New typedef.
(added_includes): New variable.
(maybe_add_include_fixit): New function.
* c-common.h (maybe_add_include_fixit): New decl.

gcc/c/ChangeLog:
* c-decl.c (implicitly_declare): When suggesting a missing
#include, provide a fix-it hint.

gcc/cp/ChangeLog:
* name-lookup.c (get_std_name_hint): Add '<' and '>' around
the header names.
(maybe_suggest_missing_header): Update for addition of '<' and '>'
to above.  Provide a fix-it hint.
* pt.c: Include "gcc-rich-location.h".
(listify): Attempt to add fix-it hint for missing
#include .
* rtti.c: Include "gcc-rich-location.h".
(typeid_ok_p): Attempt to add fix-it hint for missing
#include .

gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/missing-initializer_list-include.C: New test case.
* g++.dg/lookup/missing-std-include-2.C: New test case.
* g++.dg/lookup/missing-std-include-3.C: New test 

[patch, libgfortran] Use memcpy in a few more places for eoshift

2017-07-03 Thread Thomas Koenig

Hello world,

attached are a few more speedups for special eoshift cases.  This
time, nothing fancy, just use memcpy for copying in the
contiguous case.
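For readers unfamiliar with the eoshift sources, the change boils down to the pattern below. This is a standalone sketch: `copy_strided` is a hypothetical helper, and the strides are assumed to be byte offsets, as roffset and soffset are in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy LEN elements of SIZE bytes from SRC to DEST, where consecutive
   elements are SOFFSET bytes apart in SRC and ROFFSET bytes apart in
   DEST.  When both strides equal the element size the data is
   contiguous, so a single memcpy replaces LEN small ones.  Returns the
   destination pointer just past the last element written.  */
static char *
copy_strided (char *dest, const char *src, size_t len, size_t size,
              ptrdiff_t roffset, ptrdiff_t soffset)
{
  if (soffset == (ptrdiff_t) size && roffset == (ptrdiff_t) size)
    {
      size_t chunk = size * len;
      memcpy (dest, src, chunk);
      return dest + chunk;
    }
  /* General case: one element at a time.  */
  for (size_t n = 0; n < len; n++)
    {
      memcpy (dest, src, size);
      dest += roffset;
      src += soffset;
    }
  return dest;
}
```

The fast path only applies when both source and destination strides equal the element size, which matches the expectation above that the shift along dimension 1 benefits most.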

I am still looking at eoshift2 (scalar shift, array boundary)
to see if it would be possible to duplicate the speed gains for
eoshift0 (scalar shift, scalar boundary), but it won't hurt
to do this first.  At least the shift along dimension 1
should be faster by about a factor of two.

I have also added a few test cases which test eoshift in all
the variants touched by this patch.

Regression-testing as I write this.  I don't expect anything bad
(because I tested all test cases containing *eoshift*).

OK for trunk if this passes?

Regards

Thomas

2017-06-03  Thomas Koenig  

* intrinsics/eoshift2.c (eoshift2):  Use memcpy
for innermost copy where possible.
* m4/eoshift1.m4 (eoshift1): Likewise.
* m4/eoshift3.m4 (eoshift3): Likewise.
* generated/eoshift1_16.c: Regenerated.
* generated/eoshift1_4.c: Regenerated.
* generated/eoshift1_8.c: Regenerated.
* generated/eoshift3_16.c: Regenerated.
* generated/eoshift3_4.c: Regenerated.
* generated/eoshift3_8.c: Regenerated.

2017-06-03  Thomas Koenig  

* gfortran.dg/eoshift_4.f90:  New test.
* gfortran.dg/eoshift_5.f90:  New test.
* gfortran.dg/eoshift_6.f90:  New test.
Index: intrinsics/eoshift2.c
===
--- intrinsics/eoshift2.c	(Revision 249936)
+++ intrinsics/eoshift2.c	(Arbeitskopie)
@@ -181,12 +181,23 @@ eoshift2 (gfc_array_char *ret, const gfc_array_cha
   src = sptr;
   dest = [-shift * roffset];
 }
-  for (n = 0; n < len; n++)
-{
-  memcpy (dest, src, size);
-  dest += roffset;
-  src += soffset;
-}
+
+  /* If the elements are contiguous, perform a single block move.  */
+  if (soffset == size && roffset == size)
+	{
+	  size_t chunk = size * len;
+	  memcpy (dest, src, chunk);
+	  dest += chunk;
+	}
+  else
+	{
+	  for (n = 0; n < len; n++)
+	{
+	  memcpy (dest, src, size);
+	  dest += roffset;
+	  src += soffset;
+	}
+	}
   if (shift >= 0)
 {
   n = shift;
Index: m4/eoshift1.m4
===
--- m4/eoshift1.m4	(Revision 249936)
+++ m4/eoshift1.m4	(Arbeitskopie)
@@ -184,12 +184,23 @@ eoshift1 (gfc_array_char * const restrict ret,
   src = sptr;
   dest = [delta * roffset];
 }
-  for (n = 0; n < len - delta; n++)
-{
-  memcpy (dest, src, size);
-  dest += roffset;
-  src += soffset;
-}
+
+  /* If the elements are contiguous, perform a single block move.  */
+  if (soffset == size && roffset == size)
+	{
+	  size_t chunk = size * (len - delta);
+	  memcpy (dest, src, chunk);
+	  dest += chunk;
+	}
+  else
+	{
+	  for (n = 0; n < len - delta; n++)
+	{
+	  memcpy (dest, src, size);
+	  dest += roffset;
+	  src += soffset;
+	}
+	}
   if (sh < 0)
 dest = rptr;
   n = delta;
Index: m4/eoshift3.m4
===
--- m4/eoshift3.m4	(Revision 249936)
+++ m4/eoshift3.m4	(Arbeitskopie)
@@ -199,12 +199,24 @@ eoshift3 (gfc_array_char * const restrict ret,
   src = sptr;
   dest = [delta * roffset];
 }
-  for (n = 0; n < len - delta; n++)
-{
-  memcpy (dest, src, size);
-  dest += roffset;
-  src += soffset;
-}
+
+  /* If the elements are contiguous, perform a single block move.  */
+  if (soffset == size && roffset == size)
+	{
+	  size_t chunk = size * (len - delta);
+	  memcpy (dest, src, chunk);
+	  dest += chunk;
+	}
+  else
+	{
+	  for (n = 0; n < len - delta; n++)
+	{
+	  memcpy (dest, src, size);
+	  dest += roffset;
+	  src += soffset;
+	}
+	}
+
   if (sh < 0)
 dest = rptr;
   n = delta;
Index: generated/eoshift1_16.c
===
--- generated/eoshift1_16.c	(Revision 249936)
+++ generated/eoshift1_16.c	(Arbeitskopie)
@@ -183,12 +183,23 @@ eoshift1 (gfc_array_char * const restrict ret,
   src = sptr;
   dest = [delta * roffset];
 }
-  for (n = 0; n < len - delta; n++)
-{
-  memcpy (dest, src, size);
-  dest += roffset;
-  src += soffset;
-}
+
+  /* If the elements are contiguous, perform a single block move.  */
+  if (soffset == size && roffset == size)
+	{
+	  size_t chunk = size * (len - delta);
+	  memcpy (dest, src, chunk);
+	  dest += chunk;
+	}
+  else
+	{
+	  for (n = 0; n < len - delta; n++)
+	{
+	  memcpy (dest, src, size);
+	  dest += roffset;
+	  src += soffset;
+	}
+	}
   if (sh 

Re: fix libcc1 dependencies in toplevel Makefile

2017-07-03 Thread Olivier Hainque
Hi Alex,

(Back from a few days away)

> On 27 Jun 2017, at 21:50, Alexandre Oliva  wrote:
> 
>> I don't quite understand this: we're using the same prerequisite as target
>> libraries, e.g. all-target-libstdc++-v3 or all-target-libbacktrace
> 
> Not quite.  Target libraries have deps on e.g. target-libgcc, look below
> the following comments in Makefile.in:
> 
> # Dependencies for target modules on other target modules are
> # described by lang_env_dependencies; the defaults apply to anything
> # not mentioned there.
> 
> plus, maybe-configure*-target-libgcc depend on maybe-all*-gcc (see above
> those comments).  The precise deps vary per bootstrap level, or
> non-bootstrap.
> 
> But after the proposed patch there are no such deps for libcc1 in the
> bootstrap case, so we might very well attempt to build libcc1 in
> parallel with gcc.  We shouldn't do that.
> 
> But then, it all works out because we only build all-host after
> bootstrap is complete; all-stage* doesn't depend on libcc1 at all.

I think I see.

[...]

> So, would you like to give the automatic figuring out of
> non-bootstrap-on-bootstrap deps in dependencies, and guard them between
> @if gcc-no-bootstrap and @endif (then both configure- and all- libcc1
> deps would be adjusted this way)?  (I'm not saying it should be trivial
> to do or anything like that; I'm not all that familiar with it and I'd
> have to figure it out myself if I were to do it, but I think that would
> be better than adding yet another means of introducing dependencies,
> while leaving another risky dep in place)


I'm willing to study this more and see what can be done
to improve things further. There are still a few details I don't
quite grasp so it'll just take a bit of time.

Thanks a lot for the additional set of extensive comments!

With Kind Regards,

Olivier



Re: [C++ PATCH] "decomposition declaration" -> "structured binding" in C++ diagnostics

2017-07-03 Thread Jason Merrill
On Mon, Jul 3, 2017 at 3:59 PM, Jakub Jelinek  wrote:
> On Mon, Jul 03, 2017 at 03:50:06PM -0400, Jason Merrill wrote:
>> On Mon, Jul 3, 2017 at 12:02 PM, Jakub Jelinek  wrote:
>> > So like this?
>>
>> Looks good, except...
>>
>> > case sc_auto:
>> > - error_at (loc, "decomposition declaration cannot be declared "
>> > + error_at (loc, "structured binding declaration cannot be "
>> > "C++98 %");
>> >   break;
>> > default:
>>
>> This case should just fall into the default gcc_unreachable, we aren't
>> going to get C++98 auto in C++17 mode.
>
> We actually support structured bindings (with a pedwarn that they are only
> available in -std=c++1z) in older std modes (of course, it doesn't make
> much sense in -std=c++98, because you always get either the above error
> or error that the type of the structured binding is not appropriate; but
> for -std=gnu++11 and above it works well).
> If I remove the above error_at and fall through into gcc_unreachable,
> we'll ICE.

Ah, OK.

Jason


Re: [PATCH v2][RFC] Canonize names of attributes.

2017-07-03 Thread Jason Merrill
On Mon, Jul 3, 2017 at 5:52 AM, Martin Liška  wrote:
> On 06/30/2017 09:34 PM, Jason Merrill wrote:
>> On Fri, Jun 30, 2017 at 5:23 AM, Martin Liška  wrote:
>>> This is v2 of the patch, where just names of attributes are canonicalized.
>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> What is the purpose of the new "strict" parameter to cmp_attribs* ?  I
>> don't see any discussion of it.
>
> It's needed for arguments of attribute names, like:
>
> /usr/include/stdio.h:391:62: internal compiler error: in cmp_attribs, at 
> tree.h:5523
>   __THROWNL __attribute__ ((__format__ (__printf__, 3, 4)));
>

Mm.  Although we don't want to automatically canonicalize all
identifier arguments to attributes in the parser, we could still do it
for specific attributes, e.g. in handle_format_attribute or
handle_mode_attribute.
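As background for the `strict` question: GCC accepts each attribute name optionally spelled with `__` before and after it (so `__format__` means `format`), and that is the spelling canonicalization normalizes. A minimal sketch of the rule, assuming it is simply "strip one leading and one trailing `__` when both are present"; `canonicalize_attr_spelling` is a hypothetical helper, not the patch's code. Applying the same helper to one specific argument, such as the archetype in `__format__ (__printf__, 3, 4)`, is the kind of per-attribute handling suggested above for handle_format_attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the canonical spelling of NAME into OUT, which must hold at
   least strlen (NAME) + 1 bytes: "__foo__" becomes "foo"; anything
   else is copied unchanged.  The length check leaves "____" and
   shorter names alone.  */
static void
canonicalize_attr_spelling (const char *name, char *out)
{
  size_t l = strlen (name);
  if (l > 4 && name[0] == '_' && name[1] == '_'
      && name[l - 1] == '_' && name[l - 2] == '_')
    {
      memcpy (out, name + 2, l - 4);
      out[l - 4] = '\0';
    }
  else
    memcpy (out, name, l + 1);
}
```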

Jason


Re: [C++ PATCH] "decomposition declaration" -> "structured binding" in C++ diagnostics

2017-07-03 Thread Jakub Jelinek
On Mon, Jul 03, 2017 at 03:50:06PM -0400, Jason Merrill wrote:
> On Mon, Jul 3, 2017 at 12:02 PM, Jakub Jelinek  wrote:
> > So like this?
> 
> Looks good, except...
> 
> > case sc_auto:
> > - error_at (loc, "decomposition declaration cannot be declared "
> > + error_at (loc, "structured binding declaration cannot be "
> > "C++98 %");
> >   break;
> > default:
> 
> This case should just fall into the default gcc_unreachable, we aren't
> going to get C++98 auto in C++17 mode.

We actually support structured bindings (with a pedwarn that they are only
available in -std=c++1z) in older std modes (of course, it doesn't make
much sense in -std=c++98, because you always get either the above error
or error that the type of the structured binding is not appropriate; but
for -std=gnu++11 and above it works well).
If I remove the above error_at and fall through into gcc_unreachable,
we'll ICE.

Jakub


Re: [C++ PATCH] "decomposition declaration" -> "structured binding" in C++ diagnostics

2017-07-03 Thread Jason Merrill
On Mon, Jul 3, 2017 at 12:02 PM, Jakub Jelinek  wrote:
> So like this?

Looks good, except...

> case sc_auto:
> - error_at (loc, "decomposition declaration cannot be declared "
> + error_at (loc, "structured binding declaration cannot be "
> "C++98 %");
>   break;
> default:

This case should just fall into the default gcc_unreachable, we aren't
going to get C++98 auto in C++17 mode.

Jason


Re: [PATCH] C/C++: add fix-it hints for various missing symbols

2017-07-03 Thread David Malcolm
On Mon, 2017-07-03 at 19:57 +0100, Richard Sandiford wrote:
> [Thanks for all your diagnostic work btw.]
> 
> David Malcolm  writes:
> > clang can also print notes about matching opening symbols
> > e.g. the note here:
> > 
> >   missing-symbol-2.c:25:22: error: expected ']'
> > const char test [42;
> >^
> >   missing-symbol-2.c:25:19: note: to match this '['
> > const char test [42;
> > ^
> > which, although somewhat redundant for this example, seems much
> > more useful if there's non-trivial nesting of constructs, or more
> > than a few lines separating the open/close symbols (e.g. showing a
> > stray "namespace {" that the user forgot to close).
> > 
> > I'd like to implement both of these ideas as followups, but in
> > the meantime, is the fix-it hint patch OK for trunk?
> > (successfully bootstrapped & regrtested on x86_64-pc-linux-gnu)
> 
> Just wondering: how easy would it be to restrict the note to the
> kinds of cases you mention?  TBH I think clang goes in for extra notes
> too much, and it's not always the case that an "expected 'foo'"
> message really is caused by a missing 'foo'.  It'd be great if there
> was some way of making the notes a bit more discerning. :-)

My plan was to only do it for open/close punctuation, i.e.:
  * '(' and ')'
  * '{' and '}'
  * '[' and ']'
  * maybe '<' and '>' in C++
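How the opener's location might be found: a parser already tracks this, but as a hypothetical, text-only illustration, a stack of opener offsets yields the position of the innermost unclosed '(', '[' or '{'. The sketch deliberately ignores '<' and '>', which in C++ cannot be paired without knowing whether they delimit template arguments, and it does not skip string literals or comments.

```c
#include <assert.h>
#include <stddef.h>

/* Return the opening counterpart of a closing punctuator, or 0.  */
static char
opener_for (char close)
{
  switch (close)
    {
    case ')': return '(';
    case ']': return '[';
    case '}': return '{';
    default:  return 0;
    }
}

/* Scan TEXT and return the offset of the innermost opener still
   unclosed at the end, or -1 if everything balances, a mismatched
   closer is found, or the fixed-depth stack overflows.  */
static long
find_unclosed_opener (const char *text)
{
  long stack[64];
  size_t depth = 0;
  for (long i = 0; text[i] != '\0'; i++)
    {
      char c = text[i];
      if (c == '(' || c == '[' || c == '{')
        {
          if (depth == sizeof stack / sizeof stack[0])
            return -1;
          stack[depth++] = i;
        }
      else if (opener_for (c))
        {
          if (depth == 0 || text[stack[depth - 1]] != opener_for (c))
            return -1;
          depth--;
        }
    }
  return depth ? stack[depth - 1] : -1;
}
```

For the unclosed-namespace example in this message, the result is offset 13 on the first line, i.e. column 14, the same location as `test.cc:1:14`.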

> Or maybe do something like restrict the extra note to cases in which
> the opening character is on a different line and use an underlined
> range when the opening character is on the same line?

Good idea: if it's on the same line, use a secondary range; if it's on
a different line, use a note.
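The rule just agreed on (secondary range for an opener on the same line, note otherwise) is small enough to state as code. A sketch under stated assumptions: `struct demo_loc` is a hypothetical file/line/column triple rather than GCC's location_t, and real logic would also have to consider macro expansions, which this ignores.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical simplified location; GCC's location_t is an index into
   line maps, not a struct like this.  */
struct demo_loc
{
  const char *file;
  int line;
  int column;
};

enum open_symbol_display
{
  SHOW_AS_SECONDARY_RANGE,  /* underline the opener in the same excerpt */
  SHOW_AS_NOTE              /* emit a separate "to match this" note */
};

/* Same file and same line: a secondary range keeps the diagnostic
   compact.  Otherwise (another line, or another file such as a header
   with a stray "namespace {"), fall back to a note.  */
static enum open_symbol_display
choose_open_symbol_display (struct demo_loc open, struct demo_loc close)
{
  if (strcmp (open.file, close.file) == 0 && open.line == close.line)
    return SHOW_AS_SECONDARY_RANGE;
  return SHOW_AS_NOTE;
}
```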

The above example would look something like this (with the '[' as a
secondary range):

  missing-symbol-2.c:25:22: error: expected ']'
  const char test [42;
  ~  ^
 ]

which is more compact than the "separate note" approach, whilst (IMHO)
being just as readable.

FWIW diagnostic-show-locus.c can handle widely-separated secondary
ranges within one rich_location, provided they're in the same source
file (see calculate_line_spans, and the start_span callback within
diagnostic_context).

Consider the unclosed namespace here:

$ cat -n test.cc
 1  namespace ns {
 2  
 3  void test ()
 4  {
 5  }

for which we currently emit the rather unhelpful:

$ gcc test.cc
test.cc:5:1: error: expected ‘}’ at end of input
 }
 ^

Printing it via a secondary range using a single rich_location with
just an "error_at_rich_loc" call would print something like:

test.cc:5:1: error: expected ‘}’ at end of input
test.cc:1:14:
 namespace ns {
  ^
test.cc:5:1:
 }
  ^
  }

which works, but I'm not a fan of.

In contrast, with the "if it's on a different line, use a note" approach, we
would print:

test.cc:5:1: error: expected ‘}’ at end of input
 }
  ^
  }
test.cc:1:14: note: to match this '{'
 namespace ns {
  ^

which I think is better (and supports the cases where they're in different 
files (e.g. you have a stray unclosed namespace in a header file, 
somewhere...), or macros are involved, etc)

So I'll have a go at implementing the "is it on a different line" logic you 
suggest.

For reference, clang prints the following for the above case:

test.cc:5:2: error: expected '}'
}
 ^
test.cc:1:14: note: to match this '{'
namespace ns {
 ^

Thinking aloud, maybe it would be better for the fix-it hint to suggest putting 
the '}' on a whole new line.  Might even be good to suggest adding

} // namespace ns

or similar (for this specific case), giving this output:

test.cc:5:1: error: expected ‘}’ at end of input
 }
+} // namespace ns
test.cc:1:14: note: to match this '{'
 namespace ns {
  ^

(only works if the proposed insertion point is on the end of a line, given the 
current restrictions on what our fix-it machinery is capable of - we don't 
currently support splitting a pre-existing line via a fix-it hint)

Thanks.
Dave




Re: [PATCH][PR 57371] Remove useless floating point casts in comparisons

2017-07-03 Thread Marc Glisse

On Mon, 3 Jul 2017, Jeff Law wrote:

> On 07/02/2017 11:03 AM, Yuri Gribov wrote:
>> Hi all,
>>
>> This is initial patch for
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57371 . Per Joseph's
>> suggestion it optimizes
>>   (float)lhs CMP rhs
>>   (double)lhs CMP rhs
>> to
>>   lhs CMP (typeof(x))rhs
>> whenever typeof(x) can be precisely represented by floating-point type
>> (e.g. short by float or int by double) and rhs can be precisely
>> represented by typeof(x).
>>
>> Bootstrapped/regtested on x64. Ok for trunk?
>>
>> I'd like to extend this further in follow-up patches:
>> 1) fold always-false/always-true comparisons e.g.
>>   short x;
>>   (float)x > INT16_MAX;  // Always false
>> 2) get rid of cast in comparisons with zero regardless of typeof(lhs)
>> when -fno-trapping-math:
>>   (float_or_double)lhs CMP 0
>>
>> -Y
>>
>>
>> pr57371-1.patch
>>
>>
>> 2017-07-02  Yury Gribov  
>>
>>   PR tree-optimization/57371
>>   * match.pd: New pattern.
>>   * testsuite/gcc.dg/pr57371-1.c: New test.
>>   * testsuite/gcc.dg/pr57371-2.c: New test.
>>
>> diff -rupN gcc/gcc/match.pd gcc-57371/gcc/match.pd
>> --- gcc/gcc/match.pd  2017-06-29 21:14:57.0 +0200
>> +++ gcc-57371/gcc/match.pd  2017-07-01 09:08:04.0 +0200
>> @@ -2802,7 +2802,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> (simplify
>>  (cmp (sq @0) (sq @1))
>>    (if (! HONOR_NANS (@0))
>> -   (cmp @0 @1))
>> +   (cmp @0 @1)
>> +
>> + /* Get rid of float cast in
>> + (float_type)N CMP M
>> +if N and M are within the range explicitly representable
>> +by float type.
>> +
>> +TODO: fold always true/false comparisons if M is outside valid range.  */
>> + (simplify
>> +  (cmp (float @0) REAL_CST@1)
>> +  (if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (@1)))
>> +   (with
>> +{
>> +  tree itype = TREE_TYPE (@0);
>> +
>> +  const real_format *fmt = REAL_MODE_FORMAT (TYPE_MODE (TREE_TYPE (@1)));
>> +
>> +  const REAL_VALUE_TYPE *rhs = TREE_REAL_CST_PTR (@1);
>> +  bool not_rhs_int_p = false;
>> +  wide_int rhs_int = real_to_integer (rhs, _rhs_int_p, TYPE_PRECISION (itype));
>> +}
>> +(if (!not_rhs_int_p
>> + && !(TYPE_UNSIGNED (itype) && real_isneg (rhs))
>> + && wi::ge_p (rhs_int, wi::min_value (itype), TYPE_SIGN (itype))
>> + && wi::le_p (rhs_int, wi::max_value (itype), TYPE_SIGN (itype))
>> + && TYPE_PRECISION (itype) <= significand_size (fmt))
>> + (cmp @0 { wide_int_to_tree (itype, rhs_int); })
>> +
>> +)
>
> Seems like a nit, but instead of "not_rhs_int_p" use "fail" or something
> like that.  That makes it easier to mentally parse the conditional which
> uses the result.
>
> What happens if @0 is a floating point type?  Based on the variable name
> "itype" and passing TYPE_PRECISION (itype) to real_to_integer, it seems
> like you're expecting @0 to be an integer.  If so, you should verify
> that it really is an integer type.  Seems like a good thing to verify
> with tests as well.


@0 is the argument of a FLOAT_EXPR. verify_gimple_assign_unary guarantees 
that it is INTEGRAL_TYPE_P (or VECTOR_INTEGER_TYPE_P but then the result 
would have to be VECTOR_FLOAT_TYPE_P, and since it gets compared to 
REAL_CST... the test SCALAR_FLOAT_TYPE_P is actually redundant).
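To make the pattern's preconditions concrete for one pair of types, here is a brute-force check, assuming a 16-bit short and an IEEE single-precision float (24-bit significand). `can_drop_cast_short_float` mirrors the three conditions in the pattern (integral constant, within the integer type's range, integer precision no wider than the significand), and `rewrite_preserves_lt` verifies, for every short, that `(float) x < rhs` agrees with `x < (short) rhs`. Only `<` is checked, and both helper names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stdbool.h>

/* The pattern's conditions, specialised to short and float.  */
static bool
can_drop_cast_short_float (float rhs)
{
  return (int) (sizeof (short) * CHAR_BIT) <= FLT_MANT_DIG /* short fits exactly */
         && rhs >= SHRT_MIN && rhs <= SHRT_MAX  /* in range; false for NaN */
         && (float) (int) rhs == rhs;           /* no fractional part */
}

/* Exhaustively compare the original and rewritten comparisons.
   RHS must lie within the range of short so the conversion is
   well defined.  */
static bool
rewrite_preserves_lt (float rhs)
{
  short r = (short) rhs;
  for (int x = SHRT_MIN; x <= SHRT_MAX; x++)
    if (((float) x < rhs) != (x < r))
      return false;
  return true;
}
```

The 0.5f case shows why integrality matters: `(float) 0 < 0.5f` is true, but truncating the constant gives `0 < 0`, which is false.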


--
Marc Glisse


Re: [PATCH] C/C++: add fix-it hints for various missing symbols

2017-07-03 Thread Richard Sandiford
[Thanks for all your diagnostic work btw.]

David Malcolm  writes:
> clang can also print notes about matching opening symbols
> e.g. the note here:
>
>   missing-symbol-2.c:25:22: error: expected ']'
> const char test [42;
>^
>   missing-symbol-2.c:25:19: note: to match this '['
> const char test [42;
> ^
> which, although somewhat redundant for this example, seems much more
> useful if there's non-trivial nesting of constructs, or more than a few
> lines separating the open/close symbols (e.g. showing a stray "namespace {"
> that the user forgot to close).
>
> I'd like to implement both of these ideas as followups, but in
> the meantime, is the fix-it hint patch OK for trunk?
> (successfully bootstrapped & regrtested on x86_64-pc-linux-gnu)

Just wondering: how easy would it be to restrict the note to the kinds
of cases you mention?  TBH I think clang goes in for extra notes too
much, and it's not always the case that an "expected 'foo'" message
really is caused by a missing 'foo'.  It'd be great if there was some
way of making the notes a bit more discerning. :-)

Or maybe do something like restrict the extra note to cases in which the
opening character is on a different line and use an underlined range
when the opening character is on the same line?

Thanks,
Richard


Re: [PATCH][PR 57371] Remove useless floating point casts in comparisons

2017-07-03 Thread Yuri Gribov
On Mon, Jul 3, 2017 at 4:28 PM, Jeff Law  wrote:
> On 07/03/2017 08:52 AM, Joseph Myers wrote:
>> I'd expect much more thorough testcases here, both for cases that get
>> optimized and cases that don't.  You're only testing comparisons with
>> zero.  There should be comparisons with other values, both integer and
>> noninteger, both within the range for which optimizing would be valid and
>> outside it, both inside the range of the integer type and outside it.
>> (To the extent that you don't optimize some cases that would be valid to
>> optimize as discussed in that PR, XFAILed tests, or deferring adding
>> tests, would be reasonable.  But each case identified in that PR as not
>> valid to optimize, or only valid to optimize with -fno-trapping-math,
>> should have corresponding tests that it's not optimized.)
>>
>> Since SCALAR_FLOAT_TYPE_P includes decimal floating-point types, tests
>> with those are desirable as well (in gcc.dg/dfp or c-c++-common/dfp, I
>> suppose).
>>
> Agreed.  I think with better testing this should be able to move forward
> after the technical review.  It's not terribly different conceptually
> than the code in DOM/VRP, except that Yuri's changes work on floating
> point types.
>
> I'm pretty sure DOM's bits could be replaced with a suitable match.pd
> pattern (which IMHO would be a small improvement across multiple axes).
> VRP would be more difficult as the VRP implementation depends on getting
> the value range of the RHS of the conditional.

Joseph, Jeff,

Thanks a lot for your comments. I'll work on updated version and post
it (hopefully) soon.

-Y


Re: [PATCH][PR 57371] Remove useless floating point casts in comparisons

2017-07-03 Thread Yuri Gribov
On Mon, Jul 3, 2017 at 4:38 PM, Jeff Law  wrote:
> On 07/02/2017 11:03 AM, Yuri Gribov wrote:
>> Hi all,
>>
>> This is initial patch for
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57371 . Per Joseph's
>> suggestion it optimizes
>>   (float)lhs CMP rhs
>>   (double)lhs CMP rhs
>> to
>>   lhs CMP (typeof(x))rhs
>> whenever typeof(x) can be precisely represented by floating-point type
>> (e.g. short by float or int by double) and rhs can be precisely
>> represented by typeof(x).
>>
>> Bootstrapped/regtested on x64. Ok for trunk?
>>
>> I'd like to extend this further in follow-up patches:
>> 1) fold always-false/always-true comparisons e.g.
>>   short x;
>>   (float)x > INT16_MAX;  // Always false
>> 2) get rid of cast in comparisons with zero regardless of typeof(lhs)
>> when -fno-trapping-math:
>>   (float_or_double)lhs CMP 0
>>
>> -Y
>>
>>
>> pr57371-1.patch
>>
>>
>> 2017-07-02  Yury Gribov  
>>
>>   PR tree-optimization/57371
>>   * match.pd: New pattern.
>>   * testsuite/gcc.dg/pr57371-1.c: New test.
>>   * testsuite/gcc.dg/pr57371-2.c: New test.
>>
>> diff -rupN gcc/gcc/match.pd gcc-57371/gcc/match.pd
>> --- gcc/gcc/match.pd  2017-06-29 21:14:57.0 +0200
>> +++ gcc-57371/gcc/match.pd2017-07-01 09:08:04.0 +0200
>> @@ -2802,7 +2802,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> (simplify
>>  (cmp (sq @0) (sq @1))
>>(if (! HONOR_NANS (@0))
>> - (cmp @0 @1))
>> + (cmp @0 @1)
>> +
>> + /* Get rid of float cast in
>> + (float_type)N CMP M
>> +if N and M are within the range explicitly representable
>> +by float type.
>> +
>> +TODO: fold always true/false comparisons if M is outside valid range.  */
>> + (simplify
>> +  (cmp (float @0) REAL_CST@1)
>> +  (if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (@1)))
>> +   (with
>> +{
>> +  tree itype = TREE_TYPE (@0);
>> +
>> +  const real_format *fmt = REAL_MODE_FORMAT (TYPE_MODE (TREE_TYPE (@1)));
>> +
>> +  const REAL_VALUE_TYPE *rhs = TREE_REAL_CST_PTR (@1);
>> +  bool not_rhs_int_p = false;
>> +  wide_int rhs_int = real_to_integer (rhs, _rhs_int_p, TYPE_PRECISION (itype));
>> +}
>> +(if (!not_rhs_int_p
>> + && !(TYPE_UNSIGNED (itype) && real_isneg (rhs))
>> + && wi::ge_p (rhs_int, wi::min_value (itype), TYPE_SIGN (itype))
>> + && wi::le_p (rhs_int, wi::max_value (itype), TYPE_SIGN (itype))
>> + && TYPE_PRECISION (itype) <= significand_size (fmt))
>> + (cmp @0 { wide_int_to_tree (itype, rhs_int); })
>> +
>> +)
> Seems like a nit, but instead of "not_rhs_int_p" use "fail" or something
> like that.  That makes it easier to mentally parse the conditional which
> uses the result.

Actually it's even worse than that, it should actually be overflow_p
and for not_rhs_int_p I need to use other APIs.
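The point here is that "not an integer" and "does not fit" are different failures. A minimal plain-C sketch that keeps them apart, assuming a signed target type of 2 to 53 bits so that both bounds are exactly representable as doubles; `classify_constant` and its enum are hypothetical and are not GCC's real_to_integer API. The out-of-range result is where the TODO in the patch (folding always-true/always-false comparisons) would apply.

```c
#include <assert.h>
#include <math.h>   /* only for the INFINITY and NAN macros */
#include <stdint.h>

enum const_class
{
  CONST_FITS,          /* integral and representable in the target type */
  CONST_NOT_INTEGRAL,  /* has a fractional part, or is NaN */
  CONST_OUT_OF_RANGE   /* integral (or infinite) but outside [min, max] */
};

/* Classify V against a signed integer type of PREC bits, 2 <= PREC <= 53.
   On CONST_FITS, store the integer value in *OUT.  */
static enum const_class
classify_constant (double v, int prec, int64_t *out)
{
  if (v != v)
    return CONST_NOT_INTEGRAL;               /* NaN */
  /* Doubles of magnitude >= 2^53 are always integral, so the cast
     round-trip test is only needed (and only safe) for small ones.  */
  if (v > -0x1p62 && v < 0x1p62 && (double) (int64_t) v != v)
    return CONST_NOT_INTEGRAL;
  double max = (double) ((INT64_C (1) << (prec - 1)) - 1);
  double min = -(double) (INT64_C (1) << (prec - 1));
  if (v < min || v > max)
    return CONST_OUT_OF_RANGE;
  *out = (int64_t) v;
  return CONST_FITS;
}
```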

> What happens if @0 is a floating point type?  Based on the variable name
> "itype" and passing TYPE_PRECISION (itype) to real_to_integer, it seems
> like you're expecting @0 to be an integer.  If so, you should verify
> that it really is an integer type.  Seems like a good thing to verify
> with tests as well.

Right.
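A small C++ sketch of the safety conditions the pattern is meant to check may help when writing the extra tests. The helper can_drop_float_cast and its parameters are hypothetical, not match.pd or GCC internals; it assumes N already has an integer type (the check requested above) and that float is IEEE single precision.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not GCC code): may "(float) n CMP m" be rewritten as
// "n CMP (itype) m"?  Mirrors the proposed conditions:
//   - m lies within [type_min, type_max] of n's integer type,
//   - m is an integer value, and
//   - the type's precision fits in float's significand, so every value of
//     the integer type converts to float exactly.
bool can_drop_float_cast(double m, long long type_min, long long type_max,
                         int type_precision) {
  // Range check first: converting an out-of-range double to an integer is
  // undefined behavior in C++.  NaN fails both comparisons and is rejected.
  if (!(m >= static_cast<double>(type_min) &&
        m <= static_cast<double>(type_max)))
    return false;
  if (std::trunc(m) != m)  // e.g. 100.5 is not an integer
    return false;
  // std::numeric_limits<float>::digits is 24 for IEEE single precision.
  return type_precision <= std::numeric_limits<float>::digits;
}
```

The precision check matters because float has a 24-bit significand: (float) 16777217 rounds to 16777216.0f, so for a 32-bit int n holding 16777217, (float) n > 16777216.0f is false while n > 16777216 is true.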

-Y


Re: RFC Kill TYPE_METHODS

2017-07-03 Thread Richard Biener
On July 3, 2017 7:49:32 PM GMT+02:00, Nathan Sidwell  wrote:
>We currently have both TYPE_FIELDS and TYPE_METHODS for RECORD or UNION
>types.
>
>Originally TYPE_FIELDS held the FIELD_DECLS, but the C++ FE puts other
>kinds of things there -- TYPE_DECLs are a favourite.  The C++ FE was the
>only user of TYPE_METHODS, which holds member functions.  AFAICT it is
>still the only generator.
>
>Given that the common code iterating over TYPE_FIELDS must already check
>for non FIELD_DECL things, it seems superfluous for the methods to be on
>a separate list.  A quick grep shows ipa-devirt, c-ada-spec.c and debug
>emission to be the only non C++ FE things that would need a bit of
>cleanup.
>
>Having a single chain of member decls will simplify the C++ FE, as I try
>and merge its class member symbol handling.
>
>Any objections to going down this path?

Go ahead!

>nathan



Re: [PATCH v2] Add no_tail_call attribute

2017-07-03 Thread Yuri Gribov
On Mon, Jul 3, 2017 at 6:03 PM, Jeff Law  wrote:
> On 05/29/2017 11:24 PM, Yuri Gribov wrote:
>> On Mon, May 29, 2017 at 8:14 AM, Yuri Gribov  wrote:
>>> Hi all,
>>>
>>> As discussed in
>>> https://sourceware.org/ml/libc-alpha/2017-01/msg00455.html , some
>>> libdl functions rely on return address to figure out the calling
>>> DSO and then use this information in computation (e.g. output of dlsym
>>> depends on which library called it).
>>>
>>> As reported in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66826 this
>>> may break under tailcall optimization i.e. in cases like
>>>
>>>   return dlsym(...);
>>>
>>> Carlos confirmed that they would prefer to have a GCC attribute to
>>> prevent tailcalls
>>> (https://sourceware.org/ml/libc-alpha/2017-01/msg00502.html) so there
>>> you go.
>>>
>>> This was bootstrapped on x86_64. Given that this is a minor addition,
>>> I only ran newly added regtests. I hope that's enough (full testsuite
>>> would take a week on my notebook...).
>> Added docs, per Alex's suggestion.
>>
>> -Y
>>
>>
>> 0001-Added-no_tail_call-attribute.patch
>>
>>
>> From 1f4590e7a633c6335512b012578bddba7602b3c9 Mon Sep 17 00:00:00 2001
>> From: Yury Gribov 
>> Date: Sun, 28 May 2017 21:02:20 +0100
>> Subject: [PATCH] Added no_tail_call attribute.
>>
>> gcc/
>> 2017-05-29  Yury Gribov  
>>
>>   * cgraphunit.c (cgraph_node::expand_thunk): Prevent
>>   tailcalling functions marked with no_tail_call.
>>   * gcc/doc/extend.texi: Document no_tail_call.
>>   * tree-tailcall.c (find_tail_calls): Ditto.
>>   * tree.c (comp_type_attributes): Treat no_tail_call
>>   mismatch as error.
>>
>> gcc/c-family/
>> 2017-05-29  Yury Gribov  
>>
>>   * c-attribs.c: New attribute.
>>
>> gcc/testsuite/
>> 2017-05-29  Yury Gribov  
>>
>>   * gcc.dg/pr66826-1.c: New test.
>>   * gcc.dg/pr66826-2.c: New test.
> I think a "no_tail_call" attribute is quite reasonable -- more so than
> some asm hack to prevent tail calling.

Thanks! Frankly, I had lost hope on this...

>> ---
>>  gcc/c-family/c-attribs.c |  1 +
>>  gcc/cgraphunit.c |  6 --
>>  gcc/doc/extend.texi  |  7 +++
>>  gcc/testsuite/gcc.dg/pr66826-1.c | 14 ++
>>  gcc/testsuite/gcc.dg/pr66826-2.c |  6 ++
>>  gcc/tree-tailcall.c  |  4 
>>  gcc/tree.c   |  7 ---
>>  7 files changed, 40 insertions(+), 5 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/pr66826-1.c
>>  create mode 100644 gcc/testsuite/gcc.dg/pr66826-2.c
>>
>> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
>> index 695c58c..482db00 100644
>> --- a/gcc/c-family/c-attribs.c
>> +++ b/gcc/c-family/c-attribs.c
>> @@ -345,6 +345,7 @@ const struct attribute_spec c_common_attribute_table[] =
>> handle_bnd_instrument, false },
>>{ "fallthrough", 0, 0, false, false, false,
>> handle_fallthrough_attribute, false },
>> +  { "no_tail_call",   0, 0, false, true, true, NULL, true },
> Is no_tail_call supposed to be attached to the function's decl or type?
>
> ISTM this is most similar to noclone, noinline, no_icf and friends which
> seem to attach the attribute to the decl rather than to the type.

The glibc developers were worried that the attribute would be lost when
taking a pointer to the function
(https://sourceware.org/ml/libc-alpha/2017-01/msg00482.html). I think
their reasoning was that the return address is a shadow argument for
dlsym-like functions, so losing the attribute would cause a (most likely
inadvertent) ABI error.
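To make the decl-versus-type trade-off concrete, here is a toy C++ model; all names are hypothetical, not GCC's tree or gimple API. A call site always knows the function type it calls through (as gimple_call_fntype does) but only knows the callee's decl for direct calls, so an attribute kept on the decl is invisible at an indirect call, which is the glibc concern. Keeping it on the type is also what lets comp_type_attributes flag mismatches, at the cost of differing from noclone/noinline-style decl attributes.

```cpp
#include <cassert>

// Hypothetical, much-simplified model of where a "no_tail_call" marker
// could be recorded (not GCC's tree API).
struct fn_type {
  bool no_tail_call;  // attribute on the function *type*
};

struct fn_decl {
  const fn_type *type;
  bool no_tail_call;  // attribute on the function *declaration*
};

// A call site always knows the type it calls through, but only knows the
// callee's decl for a direct call.
struct call_site {
  const fn_decl *callee;  // nullptr for a call through a function pointer
  const fn_type *fntype;
};

// Decl-based check: an indirect call has no decl, so the marker is lost.
bool tail_call_blocked_by_decl(const call_site &call) {
  return call.callee != nullptr && call.callee->no_tail_call;
}

// Type-based check: the marker travels with the pointer's type.
bool tail_call_blocked_by_type(const call_site &call) {
  return call.fntype->no_tail_call;
}
```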

>>{ NULL, 0, 0, false, false, false, NULL, false }
>>  };
>>
>> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
>> index 4a949ca..e23fd21 100644
>> --- a/gcc/cgraphunit.c
>> +++ b/gcc/cgraphunit.c
>> @@ -1823,6 +1824,7 @@ cgraph_node::expand_thunk (bool output_asm_thunks, bool force_gimple_thunk)
>>callees->call_stmt = call;
>>gimple_call_set_from_thunk (call, true);
>>gimple_call_set_with_bounds (call, instrumentation_clone);
>> +  no_tail_call_p = lookup_attribute ("no_tail_call", TYPE_ATTRIBUTES (gimple_call_fntype (call)));
> And I think the answer to the question above potentially impacts this
> chunk of code.  If we were to change the attribute to apply to the decl,
> then you'd need to look at the decl here rather than its type.
>
>
>> index f586edc..30a6fad 100644
>> --- a/gcc/tree-tailcall.c
>> +++ b/gcc/tree-tailcall.c
>> @@ -601,6 +601,10 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
>>if (m && POINTER_TYPE_P (TREE_TYPE (DECL_RESULT (current_function_decl
>>  return;
>>
>> +  /* See if function does not want to be tailcalled.  */
>> +  if (lookup_attribute ("no_tail_call", TYPE_ATTRIBUTES (gimple_call_fntype (call
>> +return;
> Similarly and perhaps in other locations as 

[PATCH] C/C++: add fix-it hints for various missing symbols

2017-07-03 Thread David Malcolm
This patch improves our C/C++ frontends' handling of missing
symbols, by making c_parser_require and cp_parser_require use
"better" locations for the diagnostic, and insert fix-it hints,
under certain circumstances (see the comments in the patch for
full details).

For example, for this code with a missing semicolon:

  $ cat test.c
  int missing_semicolon (void)
  {
    return 42
  }

trunk currently emits:

  test.c:4:1: error: expected ‘;’ before ‘}’ token
   }
   ^

This patch adds a fix-it hint for the missing semicolon, and puts
the error at the location of the missing semicolon, printing the
followup token as a secondary location:

  test.c:3:12: error: expected ‘;’ before ‘}’ token
     return 42
              ^
              ;
   }
   ~

More examples can be seen in the test cases.
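The core of the location change can be sketched as: report the error, and anchor the fix-it insertion, one column past the end of the previous token instead of at the next token. This is a simplified C++ model with hypothetical types (token_range, insertion_point), not GCC's location_t or rich_location API; it ignores the extra conditions the patch checks and uses ASCII quotes where GCC prints typographic ones.

```cpp
#include <cassert>
#include <string>

// Hypothetical token location: 1-based line, inclusive start/finish columns.
struct token_range {
  int line;
  int start_col;
  int finish_col;
};

struct insertion_point {
  int line;
  int col;
};

// The missing symbol belongs immediately after the previous token.
insertion_point missing_token_location(const token_range &previous) {
  return {previous.line, previous.finish_col + 1};
}

// Format the error at that location rather than at the following token.
std::string format_error(const std::string &file, const token_range &previous,
                         const std::string &expected, const std::string &next) {
  insertion_point p = missing_token_location(previous);
  return file + ":" + std::to_string(p.line) + ":" + std::to_string(p.col) +
         ": error: expected '" + expected + "' before '" + next + "' token";
}
```

For the example above, the previous token "42" occupies columns 10-11 of line 3, so the error lands at 3:12.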

For reference, clang prints the following:

  test.c:3:12: error: expected ';' after return statement
    return 42
             ^
             ;

i.e. describing what syntactic thing came before, which
I think is likely to be more meaningful to the user.

clang can also print notes about matching opening symbols
e.g. the note here:

  missing-symbol-2.c:25:22: error: expected ']'
    const char test [42;
                       ^
  missing-symbol-2.c:25:19: note: to match this '['
    const char test [42;
                    ^

which, although somewhat redundant for this example, seems much more
useful if there's non-trivial nesting of constructs, or more than a few
lines separating the open/close symbols (e.g. showing a stray "namespace {"
that the user forgot to close).
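A "to match this" note needs the location of the unclosed opener, which a stack of pending open symbols provides. A minimal single-line C++17 sketch with hypothetical names; a real parser would track token locations across lines rather than characters in one string.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// An opening symbol that has not been closed yet.
struct open_symbol {
  char open;      // '(', '[' or '{'
  int column;     // 1-based column of the opener
  char expected;  // the closer we are waiting for
};

static char closer_for(char c) {
  switch (c) {
    case '(': return ')';
    case '[': return ']';
    case '{': return '}';
    default:  return '\0';
  }
}

// Scan one line and return the innermost opener still unclosed at its end.
// Stray or mismatched closers are ignored to keep the sketch short.
std::optional<open_symbol> innermost_unclosed(const std::string &line) {
  std::vector<open_symbol> stack;
  for (std::size_t i = 0; i < line.size(); ++i) {
    char c = line[i];
    if (char close = closer_for(c))
      stack.push_back({c, static_cast<int>(i) + 1, close});
    else if (!stack.empty() && c == stack.back().expected)
      stack.pop_back();
  }
  if (stack.empty())
    return std::nullopt;
  return stack.back();
}
```

For "const char test [42;" it reports the '[' at column 17; with the two-space indentation of line 25 in the test case, that is the column 19 shown in clang's note.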

I'd like to implement both of these ideas as followups, but in
the meantime, is the fix-it hint patch OK for trunk?
(successfully bootstrapped & regrtested on x86_64-pc-linux-gnu)

gcc/c-family/ChangeLog:
* c-common.c (c_parse_error): Add RICHLOC param, and use it rather
than implicitly using input_location.
(enum missing_token_insertion_kind): New enum.
(get_missing_token_insertion_kind): New function.
(maybe_suggest_missing_token_insertion): New function.
* c-common.h (c_parse_error): Add RICHLOC param.
(maybe_suggest_missing_token_insertion): New decl.

gcc/c/ChangeLog:
* c-parser.c (struct c_parser): Add "previous_token_loc" field.
(c_parser_consume_token): Set parser->previous_token_loc.
(c_parser_error): Rename to...
(c_parser_error_richloc): ...this, making static, and adding
"richloc" parameter, passing it to the c_parse_error call,
rather than calling c_parser_set_source_position_from_token.
(c_parser_error): Reintroduce, reimplementing in terms of the
above.
(c_parser_require): Add "type_is_unique" param.  Use
c_parser_error_richloc rather than c_parser_error, calling
maybe_suggest_missing_token_insertion.
(c_parser_parms_list_declarator): Override default value of new
"type_is_unique" param to c_parser_require.
(c_parser_asm_statement): Likewise.
* c-parser.h (c_parser_require): Add "type_is_unique" param,
defaulting to true.

gcc/cp/ChangeLog:
* parser.c (cp_parser_error): Add rich_location to call to
c_parse_error.
(get_required_cpp_ttype): New function.
(cp_parser_required_error): Remove calls to cp_parser_error,
instead setting a non-NULL gmsgid, and handling it if set by
calling c_parse_error, potentially with a fix-it hint.

gcc/testsuite/ChangeLog:
* c-c++-common/cilk-plus/AN/parser_errors.c: Update expected
output to reflect changes to reported locations of missing
symbols.
* c-c++-common/cilk-plus/AN/parser_errors2.c: Likewise.
* c-c++-common/cilk-plus/AN/parser_errors3.c: Likewise.
* c-c++-common/cilk-plus/AN/pr61191.c: Likewise.
* c-c++-common/gomp/pr63326.c: Likewise.
* c-c++-common/missing-symbol.c: New test case.
* g++.dg/cpp1y/digit-sep-neg.C: Update expected output to reflect
changes to reported locations of missing symbols.
* g++.dg/cpp1y/pr65202.C: Likewise.
* g++.dg/other/do1.C: Likewise.
* g++.dg/missing-symbol-2.C: New test case.
* g++.dg/parse/error11.C: Update expected output to reflect
changes to reported locations of missing symbols.
* g++.dg/parse/pragma2.C: Likewise.
* g++.dg/template/error11.C: Likewise.
* gcc.dg/missing-symbol-2.c: New test case.
* gcc.dg/missing-symbol-3.c: New test case.
* gcc.dg/noncompile/940112-1.c: Update expected output to reflect
changes to reported locations of missing symbols.
* gcc.dg/noncompile/971104-1.c: Likewise.
* obj-c++.dg/exceptions-6.mm: Likewise.
* obj-c++.dg/pr48187.mm: Likewise.
* objc.dg/exceptions-6.m: Likewise.
---
 gcc/c-family/c-common.c| 176 +-
 gcc/c-family/c-common.h|   7 

[committed] C++: fix "RT_INTERATION" typo

2017-07-03 Thread David Malcolm
r159808 (aka c247dce0f0ab6cbd1ac41cbca6b40b5d46a73f41) introduced
the required_token enum to the C++ frontend, but contained a minor
spelling mistake when expecting an iteration-statement, which this
patch fixes.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

Committed to trunk as r249931, under the "obvious" rule.

gcc/cp/ChangeLog:
* parser.c (enum required_token): Fix spelling of
RT_INTERATION to RT_ITERATION.
(cp_parser_iteration_statement): Likewise.
(cp_parser_required_error): Likewise.
---
 gcc/cp/parser.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4adf9aa..1ee3ffe 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -173,7 +173,7 @@ enum required_token {
   RT_AT_THROW, /* @throw */
 
   RT_SELECT,  /* selection-statement */
-  RT_INTERATION, /* iteration-statement */
+  RT_ITERATION, /* iteration-statement */
   RT_JUMP, /* jump-statement */
   RT_CLASS_KEY, /* class-key */
   RT_CLASS_TYPENAME_TEMPLATE, /* class, typename, or template */
@@ -11929,7 +11929,7 @@ cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep)
   token_indent_info guard_tinfo;
 
   /* Peek at the next token.  */
-  token = cp_parser_require (parser, CPP_KEYWORD, RT_INTERATION);
+  token = cp_parser_require (parser, CPP_KEYWORD, RT_ITERATION);
   if (!token)
 return error_mark_node;
 
@@ -27906,7 +27906,7 @@ cp_parser_required_error (cp_parser *parser,
  case RT_SELECT:
cp_parser_error (parser, "expected selection-statement");
return;
- case RT_INTERATION:
+ case RT_ITERATION:
cp_parser_error (parser, "expected iteration-statement");
return;
  case RT_JUMP:
-- 
1.8.5.3



RFC Kill TYPE_METHODS

2017-07-03 Thread Nathan Sidwell
We currently have both TYPE_FIELDS and TYPE_METHODS for RECORD or UNION 
types.


Originally TYPE_FIELDS held the FIELD_DECLS, but the C++ FE puts other 
kinds of things there -- TYPE_DECLs are a favourite.  The C++ FE was the 
only user of TYPE_METHODS, which holds member functions.  AFAICT it is 
still the only generator.


Given that the common code iterating over TYPE_FIELDS must already check 
for non FIELD_DECL things, it seems superfluous for the methods to be on 
a separate list.  A quick grep shows ipa-devirt, c-ada-spec.c and debug 
emission to be the only non C++ FE things that would need a bit of cleanup.


Having a single chain of member decls will simplify the C++ FE, as I try 
and merge its class member symbol handling.


Any objections to going down this path?
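A toy model of the proposal, assuming nothing about GCC's real tree layout: once member functions sit on the same chain as FIELD_DECLs and TYPE_DECLs, walkers simply filter by kind, exactly as code over TYPE_FIELDS must already do. The types and count_kind are hypothetical.

```cpp
#include <cassert>

// Hypothetical, much-simplified member chain (not GCC's tree API): data
// members, nested types and member functions all hang off one chain.
enum class decl_kind { field_decl, type_decl, function_decl };

struct member_decl {
  decl_kind kind;
  const char *name;
  const member_decl *chain;  // next member, in the spirit of DECL_CHAIN
};

// Code that only wants data members must already skip non-FIELD_DECLs;
// member functions on the same chain are skipped by the same test.
int count_kind(const member_decl *fields, decl_kind kind) {
  int n = 0;
  for (const member_decl *d = fields; d; d = d->chain)
    if (d->kind == kind)
      ++n;
  return n;
}
```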

nathan
--
Nathan Sidwell


Re: [PATCH 1/3] c-family: add name_hint/deferred_diagnostic

2017-07-03 Thread David Malcolm
On Mon, 2017-07-03 at 10:25 -0600, Jeff Law wrote:
> On 05/05/2017 11:51 AM, David Malcolm wrote:
> > In various places we use lookup_name_fuzzy to provide a hint,
> > and can report messages of the form:
> >   error: unknown foo named 'bar'
> > or:
> >   error: unknown foo named 'bar'; did you mean 'SUGGESTION'?
> > 
> > This patch provides a way for lookup_name_fuzzy to provide
> > both the suggestion above, and (optionally) additional hints
> > that can be printed e.g.
> > 
> >   note: did you forget to include ?
> > 
> > This patch provides the mechanism and ports existing users
> > of lookup_name_fuzzy to the new return type.
> > There are no uses of such hints in this patch, but followup
> > patches provide various front-end specific uses of this.
> > 
> > gcc/c-family/ChangeLog:
> > * c-common.h (class deferred_diagnostic): New class.
> > (class name_hint): New class.
> > (lookup_name_fuzzy): Convert return type from const char *
> > to name_hint.  Add location_t param.
> > 
> > gcc/c/ChangeLog:
> > * c-decl.c (implicit_decl_warning): Convert "hint" from
> > const char * to name_hint.  Pass location to
> > lookup_name_fuzzy.  Suppress any deferred diagnostic if the
> > warning was not printed.
> > (undeclared_variable): Likewise for "guessed_id".
> > (lookup_name_fuzzy): Convert return type from const char *
> > to name_hint.  Add location_t param.
> > * c-parser.c (c_parser_declaration_or_fndef): Convert "hint" from
> > const char * to name_hint.  Pass location to lookup_name_fuzzy.
> > (c_parser_parameter_declaration): Pass location to
> > lookup_name_fuzzy.
> > 
> > gcc/cp/ChangeLog:
> > * name-lookup.c (suggest_alternatives_for): Convert
> > "fuzzy_name" from const char * to name_hint, and rename to "hint".
> > Pass location to lookup_name_fuzzy.
> > (lookup_name_fuzzy): Convert return type from const char *
> > to name_hint.  Add location_t param.
> > * parser.c (cp_parser_diagnose_invalid_type_name): Convert
> > "suggestion" from const char * to name_hint, and rename to
> > "hint".  Pass location to lookup_name_fuzzy.
> 
> > ---
> >  gcc/c-family/c-common.h | 121
> > +++-
> >  gcc/c/c-decl.c  |  35 +++---
> >  gcc/c/c-parser.c|  16 ---
> >  gcc/cp/name-lookup.c|  17 +++
> >  gcc/cp/parser.c |  12 ++---
> >  5 files changed, 163 insertions(+), 38 deletions(-)
> > 
> > diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> > index 138a0a6..83c1a68 100644
> > --- a/gcc/c-family/c-common.h
> > +++ b/gcc/c-family/c-common.h
> > @@ -1009,7 +1009,126 @@ enum lookup_name_fuzzy_kind {
> >/* Any name.  */
> >FUZZY_LOOKUP_NAME
> >  };
> > -extern const char *lookup_name_fuzzy (tree, enum lookup_name_fuzzy_kind);
> > +
> > +/* A deferred_diagnostic is a wrapper around optional extra diagnostics
> > +   that we may want to bundle into a name_hint.
> > +
> > +   The emit method is called when no name_hint instances reference
> > +   the deferred_diagnostic.  In the simple case this is when the name_hint
> > +   goes out of scope, but a reference-counting scheme is used to allow
> > +   name_hint instances to be copied.  */
> > +
> > +class deferred_diagnostic
> > +{
> > + public:
> > +  virtual ~deferred_diagnostic () {}
> > +  virtual void emit () = 0;
> > +
> > +  void incref () { m_refcnt++; }
> > +  void decref ()
> > +  {
> > +    if (--m_refcnt == 0)
> > +      {
> > +        if (!m_suppress)
> > +          emit ();
> > +        delete this;
> > +      }
> > +  }
> > +
> > +  location_t get_location () const { return m_loc; }
> > +
> > +  /* Call this if the corresponding warning was not emitted,
> > +     in which case we should also not emit the deferred_diagnostic.  */
> > +  void suppress ()
> > +  {
> > +    m_suppress = true;
> > +  }
> > +
> > + protected:
> > +  deferred_diagnostic (location_t loc)
> > +  : m_refcnt (0), m_loc (loc), m_suppress (false) {}
> > +
> > + private:
> > +  int m_refcnt;
> > +  location_t m_loc;
> > +  bool m_suppress;
> > +};
> So what stands out here is "delete this" and the need for explicit
> reference counting.  Also doesn't that imply that deferred_diagnostic
> objects must be allocated on the heap?  Is there another way to get the
> behavior you want without resorting to something like this?
> 

Thanks for looking at this.

Yes: deferred_diagnostic instances are heap-allocated.  This is because
it's an abstract base class; each concrete subclass is an
implementation detail within the frontends, for isolating the
special-case logic for each different kind of hint, and thus these
concrete subclasses are hidden within the FE code.

My initial implementation of the above had the name_hint class directly
"own" the deferred_diagnostic ptr, with a:
  delete m_deferred;
within name_hint's dtor.

This worked OK, until I encountered places in the C and 
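One possible alternative to the explicit refcount, sketched under the assumption that C++11 is available (GCC itself only required C++98 at the time): keep heap allocation, since the concrete subclasses are private to each front end, but make ownership move-only with std::unique_ptr and emit from the subclass destructor, so there is no refcount and no delete this. The class shapes are hypothetical simplifications, not the patch's API; the global emitted stands in for real diagnostic output.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the diagnostic output stream, so the sketch is testable.
std::vector<std::string> emitted;

// Base class: subclasses emit from their destructor unless suppressed.
class deferred_diagnostic {
 public:
  virtual ~deferred_diagnostic() = default;
  void suppress() { m_suppress = true; }
 protected:
  bool suppressed() const { return m_suppress; }
 private:
  bool m_suppress = false;
};

// Hypothetical FE-private subclass.
class suggest_header : public deferred_diagnostic {
 public:
  explicit suggest_header(std::string header) : m_header(std::move(header)) {}
  ~suggest_header() override {
    if (!suppressed())
      emitted.push_back("note: did you forget to include " + m_header + "?");
  }
 private:
  std::string m_header;
};

// Exactly one name_hint owns the deferred diagnostic at a time; the
// implicitly generated move constructor transfers it, copying is disabled.
class name_hint {
 public:
  name_hint(const char *suggestion,
            std::unique_ptr<deferred_diagnostic> deferred)
      : m_suggestion(suggestion), m_deferred(std::move(deferred)) {}
  const char *suggestion() const { return m_suggestion; }
  void suppress() {
    if (m_deferred)
      m_deferred->suppress();
  }
 private:
  const char *m_suggestion;
  std::unique_ptr<deferred_diagnostic> m_deferred;
};
```

The trade-off is that name_hint can only be moved, not copied, so any code path that currently copies a hint would have to transfer ownership explicitly.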

Re: Revamp loop profile scaling to profile_probability

2017-07-03 Thread Jan Hubicka
> On Sat, Jul 1, 2017 at 7:14 PM, Jan Hubicka  wrote:
> > Hi,
> > this patch makes loop profile scaling use profile_probability.  This
> > is mostly a trivial change except for vect_do_peeling, which seems to scale
> > the profile down and then back up.  This is a bad idea, because things may
> > simply drop to 0.  So I kept that one using integer scaling (because a
> > probability cannot represent a value greater than 1).
> >
> > Bootstrapped/regtested x86_64-linux.
> 
> This likely regressed
> 
> FAIL: gcc.dg/vect/pr79347.c scan-tree-dump-not vect "Invalid sum of "

Oops, thanks. It was a stupid typo in the update (which took me a long while to find).
I will commit the following after regtesting.

Index: tree-vect-loop-manip.c
===
--- tree-vect-loop-manip.c  (revision 249926)
+++ tree-vect-loop-manip.c  (working copy)
@@ -1849,8 +1849,8 @@ vect_do_peeling (loop_vec_info loop_vinf
 get lost if we scale down to 0.  */
  int scale_up = REG_BR_PROB_BASE * REG_BR_PROB_BASE
 / prob_vector.to_reg_br_prob_base ();
- basic_block *bbs = get_loop_body (loop);
- scale_bbs_frequencies_int (bbs, loop->num_nodes, scale_up,
+ basic_block *bbs = get_loop_body (epilog);
+ scale_bbs_frequencies_int (bbs, epilog->num_nodes, scale_up,
 REG_BR_PROB_BASE);
  free (bbs);
}
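Why scaling down and then back up is lossy can be seen with plain integers. This C++ sketch models integer frequency scaling with REG_BR_PROB_BASE = 10000 and the scale_up = REG_BR_PROB_BASE * REG_BR_PROB_BASE / prob formula from the hunk above; the rounding in scale_frequency is an assumption, not a copy of scale_bbs_frequencies_int. Note that scale_up exceeds REG_BR_PROB_BASE, which is why it cannot be expressed as a profile_probability.

```cpp
#include <cassert>

constexpr int REG_BR_PROB_BASE = 10000;  // GCC's probability base

// Simplified integer scaling: freq * num / den, rounded to nearest
// (the exact rounding is an assumption of this sketch).
int scale_frequency(int freq, long long num, long long den) {
  return static_cast<int>((freq * num + den / 2) / den);
}

// Scale down by prob (out of REG_BR_PROB_BASE), then back up with the
// inverse factor, as vect_do_peeling does for the epilogue.
int round_trip(int freq, int prob) {
  int down = scale_frequency(freq, prob, REG_BR_PROB_BASE);
  long long scale_up =
      static_cast<long long>(REG_BR_PROB_BASE) * REG_BR_PROB_BASE / prob;
  return scale_frequency(down, scale_up, REG_BR_PROB_BASE);
}
```

A block with frequency 3 scaled by a 10% probability becomes 0 and stays 0 when scaled back up, while a frequency of 3000 round-trips to 3000.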


Re: [PATCH v2] Add no_tail_call attribute

2017-07-03 Thread Jeff Law
On 05/29/2017 11:24 PM, Yuri Gribov wrote:
> On Mon, May 29, 2017 at 8:14 AM, Yuri Gribov  wrote:
>> Hi all,
>>
>> As discussed in
>> https://sourceware.org/ml/libc-alpha/2017-01/msg00455.html , some
>> libdl functions rely on return address to figure out the calling
>> DSO and then use this information in computation (e.g. output of dlsym
>> depends on which library called it).
>>
>> As reported in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66826 this
>> may break under tailcall optimization i.e. in cases like
>>
>>   return dlsym(...);
>>
>> Carlos confirmed that they would prefer to have a GCC attribute to
>> prevent tailcalls
>> (https://sourceware.org/ml/libc-alpha/2017-01/msg00502.html) so there
>> you go.
>>
>> This was bootstrapped on x86_64. Given that this is a minor addition,
>> I only ran newly added regtests. I hope that's enough (full testsuite
>> would take a week on my notebook...).
> Added docs, per Alex's suggestion.
> 
> -Y
> 
> 
> 0001-Added-no_tail_call-attribute.patch
> 
> 
> From 1f4590e7a633c6335512b012578bddba7602b3c9 Mon Sep 17 00:00:00 2001
> From: Yury Gribov 
> Date: Sun, 28 May 2017 21:02:20 +0100
> Subject: [PATCH] Added no_tail_call attribute.
> 
> gcc/
> 2017-05-29  Yury Gribov  
> 
>   * cgraphunit.c (cgraph_node::expand_thunk): Prevent
>   tailcalling functions marked with no_tail_call.
>   * gcc/doc/extend.texi: Document no_tail_call.
>   * tree-tailcall.c (find_tail_calls): Ditto.
>   * tree.c (comp_type_attributes): Treat no_tail_call
>   mismatch as error.
> 
> gcc/c-family/
> 2017-05-29  Yury Gribov  
> 
>   * c-attribs.c: New attribute.
> 
> gcc/testsuite/
> 2017-05-29  Yury Gribov  
> 
>   * gcc.dg/pr66826-1.c: New test.
>   * gcc.dg/pr66826-2.c: New test.
I think a "no_tail_call" attribute is quite reasonable -- more so than
some asm hack to prevent tail calling.
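The failure mode behind PR 66826 can be modelled without libdl. In this hypothetical C++ sketch each pending return address is represented by the DSO it returns into, and a dlsym-like function identifies its caller from the top entry, as it would from __builtin_return_address (0). A sibling call reuses the caller's frame, so the callee sees the caller's caller. The DSO names used with it ("app", "libfoo.so") are made up.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model (not real libdl code): each pending return address is
// represented by the name of the DSO whose code it returns into.
struct call_stack {
  std::vector<std::string> return_into;

  // Ordinary call: the callee's return address points back into the caller.
  void call_from(const std::string &caller_dso) {
    return_into.push_back(caller_dso);
  }

  // Sibling (tail) call: the caller's frame is reused, so the callee returns
  // straight to the caller's caller and no new return address is pushed.
  void tail_call() {}

  // What a dlsym-like function infers from its return address.
  const std::string &apparent_caller() const { return return_into.back(); }
};
```

With an ordinary call from libfoo.so, apparent_caller () is "libfoo.so"; if libfoo.so's "return dlsym (...);" is compiled as a tail call, it becomes "app", so the lookup is done on behalf of the wrong object.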


> ---
>  gcc/c-family/c-attribs.c |  1 +
>  gcc/cgraphunit.c |  6 --
>  gcc/doc/extend.texi  |  7 +++
>  gcc/testsuite/gcc.dg/pr66826-1.c | 14 ++
>  gcc/testsuite/gcc.dg/pr66826-2.c |  6 ++
>  gcc/tree-tailcall.c  |  4 
>  gcc/tree.c   |  7 ---
>  7 files changed, 40 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr66826-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr66826-2.c
> 
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index 695c58c..482db00 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -345,6 +345,7 @@ const struct attribute_spec c_common_attribute_table[] =
> handle_bnd_instrument, false },
>{ "fallthrough", 0, 0, false, false, false,
> handle_fallthrough_attribute, false },
> +  { "no_tail_call",   0, 0, false, true, true, NULL, true },
Is no_tail_call supposed to be attached to the function's decl or type?

ISTM this is most similar to noclone, noinline, no_icf and friends which
seem to attach the attribute to the decl rather than to the type.



>{ NULL, 0, 0, false, false, false, NULL, false }
>  };
>  
> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index 4a949ca..e23fd21 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -1823,6 +1824,7 @@ cgraph_node::expand_thunk (bool output_asm_thunks, bool force_gimple_thunk)
>callees->call_stmt = call;
>gimple_call_set_from_thunk (call, true);
>gimple_call_set_with_bounds (call, instrumentation_clone);
> +  no_tail_call_p = lookup_attribute ("no_tail_call", TYPE_ATTRIBUTES (gimple_call_fntype (call)));
And I think the answer to the question above potentially impacts this
chunk of code.  If we were to change the attribute to apply to the decl,
then you'd need to look at the decl here rather than its type.


> index f586edc..30a6fad 100644
> --- a/gcc/tree-tailcall.c
> +++ b/gcc/tree-tailcall.c
> @@ -601,6 +601,10 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
>if (m && POINTER_TYPE_P (TREE_TYPE (DECL_RESULT (current_function_decl
>  return;
>  
> +  /* See if function does not want to be tailcalled.  */
> +  if (lookup_attribute ("no_tail_call", TYPE_ATTRIBUTES (gimple_call_fntype (call
> +return;
Similarly and perhaps in other locations as well.

Thoughts?

jeff


[C++ PATCH] classtype_has_nothrow_assign_or_copy_p is confusing

2017-07-03 Thread Nathan Sidwell
I found classtype_has_nothrow_assign_or_copy_p confusing, trying to 
figure out when it should return false and when true.


AFAICT what it's trying to tell you is whether *all* the copy/move ctors or 
copy assignment ops are nothrow, not, as its comment suggests, that it 
has *at least one* nothrow variant.


If there are no assignment ops it returns false, but if there is at least 
one and none of them is a copy-assignment op, it'll return true.  That seems 
wrong (but perhaps that never happens, because the copy-op will be 
implicitly defined).  It's certainly confusing.


Also, the differing checks on copy_fn_p's result (== 0 for assignment ops, 
<= 0 for ctors) are a difference that means nothing -- a negative result 
means 'this is a broken copy/move ctor/assop', which IMHO puts it in the 
'don't care' category.


This patch cleans it up to:

a) just consider well-formed fns (copy_fn_p returns > 0, but could just 
as easily become != 0)


b) only return true if it finds at least one function of interest -- and 
no functions of interest throw.
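The cleaned-up semantics can be checked in isolation with a small C++ model; member_fn and all_copy_fns_nothrow are hypothetical stand-ins, not GCC code. copy_kind follows copy_fn_p's convention: greater than 0 for a well-formed copy/move function, 0 for any other function, negative for a broken one.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a member function as seen by the trait.
struct member_fn {
  int copy_kind;  // like copy_fn_p: >0 copy/move, 0 other, <0 ill-formed
  bool nothrow;   // like TYPE_NOTHROW_P after maybe_instantiate_noexcept
};

// True iff at least one well-formed copy function exists and none throws.
bool all_copy_fns_nothrow(const std::vector<member_fn> &fns) {
  bool saw_copy = false;
  for (const member_fn &fn : fns)
    if (fn.copy_kind > 0) {
      saw_copy = true;
      if (!fn.nothrow)
        return false;
    }
  return saw_copy;
}
```

So a class with no copy functions yields false, non-copy functions never affect the answer, and a broken (negative) copy function is ignored rather than counted.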


This caused no regressions, I'll commit in a few days unless someone 
notices a mistake in the above logic.


nathan
--
Nathan Sidwell
2017-07-03  Nathan Sidwell  

	* semantics.c (classtype_has_nothrow_assign_or_copy_p): Clarify
	semantics, simplify implementation.

Index: semantics.c
===
--- semantics.c	(revision 249925)
+++ semantics.c	(working copy)
@@ -9072,19 +9072,16 @@ finish_decltype_type (tree expr, bool id
 }
 
 /* Called from trait_expr_value to evaluate either __has_nothrow_assign or 
-   __has_nothrow_copy, depending on assign_p.  */
+   __has_nothrow_copy, depending on assign_p.  Returns true iff all
+   the copy {ctor,assign} fns are nothrow.  */
 
 static bool
 classtype_has_nothrow_assign_or_copy_p (tree type, bool assign_p)
 {
-  tree fns;
+  tree fns = NULL_TREE;
 
   if (assign_p)
-{
-  fns = lookup_fnfields_slot (type, cp_assignment_operator_id (NOP_EXPR));
-  if (!fns)
-	return false;
-} 
+fns = lookup_fnfields_slot (type, cp_assignment_operator_id (NOP_EXPR));
   else if (TYPE_HAS_COPY_CTOR (type))
 {
   /* If construction of the copy constructor was postponed, create
@@ -9095,27 +9092,22 @@ classtype_has_nothrow_assign_or_copy_p (
 	lazily_declare_fn (sfk_move_constructor, type);
   fns = CLASSTYPE_CONSTRUCTORS (type);
 }
-  else
-return false;
 
+  bool saw_copy = false;
   for (ovl_iterator iter (fns); iter; ++iter)
 {
   tree fn = *iter;
- 
-  if (assign_p)
+
+  if (copy_fn_p (fn) > 0)
 	{
-	  if (copy_fn_p (fn) == 0)
-	continue;
+	  saw_copy = true;
+	  maybe_instantiate_noexcept (fn);
+	  if (!TYPE_NOTHROW_P (TREE_TYPE (fn)))
+	return false;
 	}
-  else if (copy_fn_p (fn) <= 0)
-	continue;
-
-  maybe_instantiate_noexcept (fn);
-  if (!TYPE_NOTHROW_P (TREE_TYPE (fn)))
-	return false;
 }
 
-  return true;
+  return saw_copy;
 }
 
 /* Actually evaluates the trait.  */


Re: [C++] Fix decomp ICE with invalid initializer (PR c++/81258)

2017-07-03 Thread Nathan Sidwell

On 07/03/2017 12:05 PM, Jakub Jelinek wrote:


Ok.  In the light of the http://gcc.gnu.org/ml/gcc-patches/2017-06/msg02432.html
thread, shouldn't this be structured binding declaration then?
I.e.


Yes, I think that's better, thanks!

nathan
--
Nathan Sidwell


Re: [patch][arm/wwwdocs] Release note update for be8 changes

2017-07-03 Thread Richard Earnshaw (lists)
On 03/07/17 14:23, Richard Earnshaw (lists) wrote:
> The existing code in arm/bpabi.h was quite fragile and relied on matching
> specific CPU and/or architecture names.  The introduction of the option
> format for -mcpu and -march broke that in a way that would be non-trivial
> to fix by updating the list.  The hook in that file was always a pain
> as it required an update here for every new CPU being added
> (easy to miss).
> 
> I've fixed that problem once and for all by adding a new callback into
> the driver to select the correct BE8 behaviour.  This uses features in
> the ISA capabilities list to select whether or not to use BE8 format
> during linking.
> 
> I also noticed that if the user happened to pass both -mbig-endian and
> -mlittle-endian on the command line then the linker spec rules would
> get somewhat confused and potentially do the wrong thing.  I've fixed that
> by marking these options as opposites in the option descriptions.  The
> driver will now automatically suppress overridden options leading to the
> correct desired behavior.
> 
> Whilst fixing this I noticed a couple of anomalous cases in the
> existing BE8 support: we were not generating BE8 format for ARMv6 or
> ARMv7-R targets.  While the ARMv6 status was probably deliberate at
> the time, this is probably not a good idea in the long term as the
> alternative, BE32, has been deprecated by ARM.  After discussion with
> a couple of colleagues I've decided to change this, but to then add an
> option to restore the existing behaviour at the user's option.  So
> this patch introduces two new options (opposites) -mbe8 and -mbe32.
> 
> This is a quiet behavior change, so I'll add a comment to the release
> notes shortly.
> 

And this is the update to wwwdocs.


Index: htdocs/gcc-8/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v
retrieving revision 1.5
diff -p -r1.5 changes.html
*** htdocs/gcc-8/changes.html	20 Jun 2017 10:19:11 -	1.5
--- htdocs/gcc-8/changes.html	30 Jun 2017 15:25:58 -
*** a work-in-progress.
*** 94,99 
--- 94,108 
  setting unless the compiler has been configured with an explicit
  --with-fpu option.

+   
+ The default link behavior for ARMv6 and ARMv7-R targets has been
+ changed to produce BE8 format when generating big-endian images.  A new
+ flag -mbe32 can be used to force the linker to produce
+ legacy BE32 format images.  There is no change of behavior for
+ ARMv6-m and other ARMv7 or later targets: these already defaulted
+ to BE8 format.  This change brings GCC into alignment with other
+ compilers for the ARM architecture.
+   
  
  
  


Re: [patch][arm] Clean up generation of BE8 format images.

2017-07-03 Thread Richard Earnshaw (lists)
On 03/07/17 16:04, Joseph Myers wrote:
> On Mon, 3 Jul 2017, Richard Earnshaw (lists) wrote:
> 
>>  * doc/invoke.texi (ARM Options): Document -mbe8 and -mbe32.
> 
> Should also update the option summary inside @gccoptlist.
> 

Good catch.  Fixed as follows:


* doc/invoke.texi (ARM Options): Add -mbe8 and -mbe32 to option summary.

R.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fb2e51a..04cecf9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -631,6 +631,7 @@ Objective-C and Objective-C++ Dialects}.
 -mapcs-reentrant  -mno-apcs-reentrant @gol
 -msched-prolog  -mno-sched-prolog @gol
 -mlittle-endian  -mbig-endian @gol
+-mbe8 -mbe32 @gol
 -mfloat-abi=@var{name} @gol
 -mfp16-format=@var{name}
 -mthumb-interwork  -mno-thumb-interwork @gol
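For readers unfamiliar with the two formats selected by -mbe8 and -mbe32, here is a simplified C++ model of how a 32-bit word is laid out in a big-endian image: data words are big-endian in both, but BE8 keeps instructions little-endian (the linker byte-swaps code when producing a BE8 image), whereas legacy BE32 stores them big-endian too. It models only byte order in the image file, not the processor's addressing behaviour, and the names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using bytes4 = std::array<std::uint8_t, 4>;

static bytes4 big_endian(std::uint32_t w) {
  return {static_cast<std::uint8_t>(w >> 24), static_cast<std::uint8_t>(w >> 16),
          static_cast<std::uint8_t>(w >> 8), static_cast<std::uint8_t>(w)};
}

static bytes4 little_endian(std::uint32_t w) {
  return {static_cast<std::uint8_t>(w), static_cast<std::uint8_t>(w >> 8),
          static_cast<std::uint8_t>(w >> 16), static_cast<std::uint8_t>(w >> 24)};
}

enum class be_format { be8, be32 };

// Data words are stored big-endian in both formats.
bytes4 data_bytes(std::uint32_t word, be_format) { return big_endian(word); }

// Instruction words: little-endian in BE8, big-endian in BE32.
bytes4 insn_bytes(std::uint32_t insn, be_format f) {
  return f == be_format::be8 ? little_endian(insn) : big_endian(insn);
}
```

For example, the ARM instruction bx lr (0xE12FFF1E) is stored as E1 2F FF 1E in a BE32 image and as 1E FF 2F E1 in a BE8 image.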


Re: [PATCH 3/3] C: hints for missing stdlib includes for macros and types

2017-07-03 Thread Jeff Law
On 05/05/2017 11:51 AM, David Malcolm wrote:
> The C frontend already "knows" about many common functions in
> the C standard library:
> 
>   test.c: In function 'test':
>   test.c:3:3: warning: implicit declaration of function 'printf' [-Wimplicit-function-declaration]
>  printf ("hello world\n");
>  ^~
>   test.c:3:3: warning: incompatible implicit declaration of built-in function 'printf'
>   test.c:3:3: note: include '' or provide a declaration of 'printf'
> 
> and which header file they are in.
> 
> However it doesn't know about various types and macros:
> 
> test.c:1:13: error: 'NULL' undeclared here (not in a function)
>  void *ptr = NULL;
>  ^~~~
> 
> This patch uses the name_hint/deferred_diagnostic machinery to
> add hints for missing C standard library headers for some of the
> most common type and macro names.
> 
> For example, the above becomes:
> test.c:1:13: error: 'NULL' undeclared here (not in a function)
>  void *ptr = NULL;
>  ^~~~
> test.c:1:13: note: 'NULL' is defined in header ''; did you forget to '#include '?
> 
> If the patch to add fix-it hints for missing #includes is approved:
>   https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00321.html
> then it's trivial to add a fix-it hint to the note.
> 
> gcc/c/ChangeLog:
>   * c-decl.c (get_c_name_hint): New function.
>   (class suggest_missing_header): New class.
>   (lookup_name_fuzzy): Call get_c_name_hint and use it to
>   suggest missing headers to the user.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.dg/spellcheck-stdlib.c: New test case.
OK once prereqs are approved.

FWIW, I'm getting a little concerned that we're adding a lot of overhead
to the error paths -- that often doesn't matter.  But sometimes it does
(testcase reduction, errors with machine generated code, etc).
Something to be aware of.


jeff


Re: [PATCH 2/3] C++: provide macro used-before-defined hint (PR c++/72786).

2017-07-03 Thread Jeff Law
On 05/05/2017 11:51 AM, David Malcolm wrote:
> This patch uses the name_hint/deferred_diagnostic to provide
> a message in the C++ frontend if a macro is used before it is defined
> e.g.:
> 
> test.c:6:24: error: expected ‘;’ at end of member declaration
>virtual void clone() const OVERRIDE { }
> ^
>  ;
> test.c:6:30: error: ‘OVERRIDE’ does not name a type
>virtual void clone() const OVERRIDE { }
>   ^~~~
> test.c:6:30: note: the macro ‘OVERRIDE’ had not yet been defined
> test.c:15:0: note: it was later defined here
>  #define OVERRIDE override
> 
> It's possible to do it from the C++ frontend as tokenization happens
> up-front (and hence the macro already exists when the above is parsed);
> I attempted to do it from the C frontend, but because the C frontend only
> tokenizes on-demand during parsing, the macro isn't known about until
> later.
> 
> gcc/cp/ChangeLog:
>   PR c++/72786
>   * name-lookup.c (class macro_use_before_def): New class.
>   (lookup_name_fuzzy): Detect macros that were used before being
>   defined, and report them as such.
> 
> gcc/ChangeLog:
>   PR c++/72786
>   * spellcheck.h (best_match::blithely_get_best_candidate): New
>   accessor.
> 
> gcc/testsuite/ChangeLog:
>   PR c++/72786
>   * g++.dg/spellcheck-macro-ordering-2.C: New test case.
>   * g++.dg/spellcheck-macro-ordering.C: Add dg-message directives
>   for macro used-before-defined.
> 
> libcpp/ChangeLog:
>   PR c++/72786
>   * include/cpplib.h (cpp_macro_definition_location): New decl.
>   * macro.c (cpp_macro_definition_location): New function.
This is fine once the prereq is approved.

jeff


Re: [PATCH 1/3] c-family: add name_hint/deferred_diagnostic

2017-07-03 Thread Jeff Law
On 05/05/2017 11:51 AM, David Malcolm wrote:
> In various places we use lookup_name_fuzzy to provide a hint,
> and can report messages of the form:
>   error: unknown foo named 'bar'
> or:
>   error: unknown foo named 'bar'; did you mean 'SUGGESTION'?
> 
> This patch provides a way for lookup_name_fuzzy to provide
> both the suggestion above, and (optionally) additional hints
> that can be printed e.g.
> 
>   note: did you forget to include ?
> 
> This patch provides the mechanism and ports existing users
> of lookup_name_fuzzy to the new return type.
> There are no uses of such hints in this patch, but followup
> patches provide various front-end specific uses of this.
> 
> gcc/c-family/ChangeLog:
>   * c-common.h (class deferred_diagnostic): New class.
>   (class name_hint): New class.
>   (lookup_name_fuzzy): Convert return type from const char *
>   to name_hint.  Add location_t param.
> 
> gcc/c/ChangeLog:
>   * c-decl.c (implicit_decl_warning): Convert "hint" from
>   const char * to name_hint.  Pass location to
>   lookup_name_fuzzy.  Suppress any deferred diagnostic if the
>   warning was not printed.
>   (undeclared_variable): Likewise for "guessed_id".
>   (lookup_name_fuzzy): Convert return type from const char *
>   to name_hint.  Add location_t param.
>   * c-parser.c (c_parser_declaration_or_fndef): Convert "hint" from
>   const char * to name_hint.  Pass location to lookup_name_fuzzy.
>   (c_parser_parameter_declaration): Pass location to
>   lookup_name_fuzzy.
> 
> gcc/cp/ChangeLog:
>   * name-lookup.c (suggest_alternatives_for): Convert "fuzzy_name" from
>   const char * to name_hint, and rename to "hint".  Pass location to
>   lookup_name_fuzzy.
>   (lookup_name_fuzzy): Convert return type from const char *
>   to name_hint.  Add location_t param.
>   * parser.c (cp_parser_diagnose_invalid_type_name): Convert
>   "suggestion" from const char * to name_hint, and rename to "hint".
>   Pass location to lookup_name_fuzzy.

> ---
>  gcc/c-family/c-common.h | 121 +++-
>  gcc/c/c-decl.c  |  35 +++---
>  gcc/c/c-parser.c|  16 ---
>  gcc/cp/name-lookup.c|  17 +++
>  gcc/cp/parser.c |  12 ++---
>  5 files changed, 163 insertions(+), 38 deletions(-)
> 
> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> index 138a0a6..83c1a68 100644
> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -1009,7 +1009,126 @@ enum lookup_name_fuzzy_kind {
>/* Any name.  */
>FUZZY_LOOKUP_NAME
>  };
> -extern const char *lookup_name_fuzzy (tree, enum lookup_name_fuzzy_kind);
> +
> +/* A deferred_diagnostic is a wrapper around optional extra diagnostics
> +   that we may want to bundle into a name_hint.
> +
> +   The emit method is called when no name_hint instances reference
> +   the deferred_diagnostic.  In the simple case this is when the name_hint
> +   goes out of scope, but a reference-counting scheme is used to allow
> +   name_hint instances to be copied.  */
> +
> +class deferred_diagnostic
> +{
> + public:
> +  virtual ~deferred_diagnostic () {}
> +  virtual void emit () = 0;
> +
> +  void incref () { m_refcnt++; }
> +  void decref ()
> +  {
> +if (--m_refcnt == 0)
> +  {
> + if (!m_suppress)
> +   emit ();
> + delete this;
> +  }
> +  }
> +
> +  location_t get_location () const { return m_loc; }
> +
> +  /* Call this if the corresponding warning was not emitted,
> + in which case we should also not emit the deferred_diagnostic.  */
> +  void suppress ()
> +  {
> +m_suppress = true;
> +  }
> +
> + protected:
> +  deferred_diagnostic (location_t loc)
> +  : m_refcnt (0), m_loc (loc), m_suppress (false) {}
> +
> + private:
> +  int m_refcnt;
> +  location_t m_loc;
> +  bool m_suppress;
> +};
So what stands out here is "delete this" and the need for explicit
reference counting.  Also doesn't that imply that deferred_diagnostic
objects must be allocated on the heap?  Is there another way to get the
behavior you want without resorting to something like this?

Or is your argument that deferred_diagnostic is only used from within
class name_hint and thus the concerns around heap vs stack, explicit
counting, etc are all buried inside the name_hint class?  If so, is
there any reasonable way to restrict the use of deferred_diagnostic to
within the name_hint class?
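To make the ownership question concrete, here is a minimal sketch of how a wrapper in the style of `name_hint` could hide the reference counting, so that callers never call `incref`/`decref` or `delete` themselves. The class names `deferred_diag`, `hint` and `note_diag` are hypothetical simplifications of the patch's `deferred_diagnostic` and `name_hint`; location handling is omitted.

```cpp
#include <cassert>

// Simplified model of deferred_diagnostic: refcounted, must live on the
// heap because the last decref deletes it.
class deferred_diag
{
 public:
  virtual ~deferred_diag () {}
  virtual void emit () = 0;

  void incref () { m_refcnt++; }
  void decref ()
  {
    if (--m_refcnt == 0)
      {
        if (!m_suppress)
          emit ();
        delete this;
      }
  }
  void suppress () { m_suppress = true; }

 protected:
  deferred_diag () : m_refcnt (0), m_suppress (false) {}

 private:
  int m_refcnt;
  bool m_suppress;
};

// Simplified model of name_hint: all refcount traffic happens in its
// constructors, assignment operator and destructor.
class hint
{
 public:
  explicit hint (deferred_diag *d = nullptr) : m_diag (d)
  { if (m_diag) m_diag->incref (); }
  hint (const hint &other) : m_diag (other.m_diag)
  { if (m_diag) m_diag->incref (); }
  hint &operator= (const hint &other)
  {
    // Increment first so that self-assignment is safe.
    if (other.m_diag) other.m_diag->incref ();
    if (m_diag) m_diag->decref ();
    m_diag = other.m_diag;
    return *this;
  }
  ~hint () { if (m_diag) m_diag->decref (); }
  void suppress () { if (m_diag) m_diag->suppress (); }

 private:
  deferred_diag *m_diag;
};

// Example diagnostic that just counts how often it was emitted.
static int emitted = 0;
class note_diag : public deferred_diag
{
 public:
  void emit () override { emitted++; }
};
```

Copies share one heap-allocated diagnostic, which is emitted exactly once when the last copy is destroyed, unless it was suppressed. The heap requirement Jeff points out remains, but in this arrangement it is confined to the argument passed to the wrapper's constructor.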

The rest of the changes seem non-controversial, so I think if we can
sort out the issues with those classes then this will be fine to move
forward.

jeff



Re: [PATCH GCC8][33/33]Fix PR69710/PR68030 by reassociate vect base address and a simple CSE pass

2017-07-03 Thread Jeff Law
On 04/18/2017 04:54 AM, Bin Cheng wrote:
> Hi,
> This is the same patch posted at
> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02000.html,
> after rebase against this patch series.  This patch was blocked because,
> without this patch series, it could generate worse code on targets with
> limited addressing mode support, like AArch64.
> There was some discussion about an alternative fix for the PRs, but after
> thinking twice I think this fix is in the correct direction.  A CSE
> interface is useful to clean up code generated in the vectorizer, and we
> should improve this CSE interface into a region-based one.  For the
> moment, optimal code is not generated on targets like x86; I believe it's
> because the CSE is weak and doesn't cover all basic blocks generated by
> the vectorizer.  The issue should be fixed if region-based CSE is
> implemented.
> Is it OK?
> 
> Thanks,
> bin
> 2017-04-11  Bin Cheng  
> 
>   PR tree-optimization/68030
>   PR tree-optimization/69710
>   * tree-ssa-dom.c (cse_bbs): New function.
>   * tree-ssa-dom.h (cse_bbs): New declaration.
>   * tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref):
>   Re-associate address by splitting constant offset.
>   (vect_create_data_ref_ptr, vect_setup_realignment): Record changed
>   basic block.
>   * tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Record
>   changed basic block.
>   * tree-vectorizer.c (tree-ssa-dom.h): Include header file.
>   (changed_bbs): New variable.
>   (vectorize_loops): Allocate and free CHANGED_BBS.  Call cse_bbs.
>   * tree-vectorizer.h (changed_bbs): New declaration.
> 
So are you still interested in moving this forward Bin?  I know you did
a minor update in response to Michael Meissner's problems.  Is there
another update planned?

The only obvious thing I'd suggest changing in the DOM interface is to
have it continue to walk the dominator tree, but do nothing for blocks
that are not in changed_bbs.  That way you walk blocks in changed_bbs in
dominator order rather than in bb->index order.
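A minimal sketch of the suggested walk order, assuming a toy dominator tree stored as child lists. `toy_domtree` and `walk_changed_in_dom_order` are hypothetical; a real GCC pass would build on `dom_walker` and would also push and pop scoped expression tables around each subtree, which is omitted here.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy dominator tree: children[b] lists the blocks immediately dominated
// by block b.  Block indices need not follow dominance order.
struct toy_domtree
{
  std::vector<std::vector<int>> children;
};

// Preorder walk of the dominator tree from ENTRY.  Blocks outside CHANGED
// are still traversed, so their subtrees are reached, but they are not
// processed.  Returns the processed blocks in visit order.
std::vector<int>
walk_changed_in_dom_order (const toy_domtree &t, int entry,
                           const std::set<int> &changed)
{
  std::vector<int> order;
  std::vector<int> stack {entry};
  while (!stack.empty ())
    {
      int bb = stack.back ();
      stack.pop_back ();
      if (changed.count (bb))
        order.push_back (bb);   // a real pass would CSE this block here
      const std::vector<int> &kids = t.children[bb];
      // Push in reverse so children are visited left to right.
      for (auto it = kids.rbegin (); it != kids.rend (); ++it)
        stack.push_back (*it);
    }
  return order;
}
```

For example, if block 3 dominates block 1, which dominates block 2, the walk processes 3, 1, 2, even though a loop over bb->index would visit 1, 2, 3.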

Jeff



[Patch committed] Bug 81033 - [8 Regression] Revision r249019 breaks bootstrap on darwin

2017-07-03 Thread Dominique d'Humières
Patch

--- ../_clean/gcc/config/darwin.c   2017-01-01 17:39:06.0 +0100
+++ gcc/config/darwin.c 2017-07-03 14:21:19.0 +0200
@@ -3683,11 +3683,9 @@ default_function_sections:
 void
 darwin_function_switched_text_sections (FILE *fp, tree decl, bool new_is_cold)
 {
-  char buf[128];
-  snprintf (buf, 128, "%s%s",new_is_cold?"__cold_sect_of_":"__hot_sect_of_",
-   IDENTIFIER_POINTER (DECL_NAME (decl)));
   /* Make sure we pick up all the relevant quotes etc.  */
-  assemble_name_raw (fp, (const char *) buf);
+  assemble_name_raw (fp, new_is_cold?"__cold_sect_of_":"__hot_sect_of_");
+  assemble_name_raw (fp, IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
   fputs (":\n", fp);
 }
 
committed as revision r249926.

Thanks Jan and Richard for the help.

Dominique



Re: [PATCH] diagnostics: fix end-points of ranges within macros (PR c++/79300)

2017-07-03 Thread Jeff Law
On 02/02/2017 01:53 PM, David Malcolm wrote:
> PR c++/79300 identifies an issue in which diagnostics_show_locus
> prints the wrong end-point for a range within a macro:
> 
>assert ((p + val_size) - buf == encoded_len);
>~^~~~
> 
> as opposed to:
> 
>assert ((p + val_size) - buf == encoded_len);
>~^~
> 
> The caret, start and finish locations of this compound location are
> all virtual locations.
> 
> The root cause is that when diagnostic-show-locus.c's layout ctor
> expands the caret and end-points, it calls
>   linemap_client_expand_location_to_spelling_point
> which (via expand_location_1) unwinds the macro expansions, and
> then calls linemap_expand_location.  Doing so implicitly picks the
> *caret* location for any virtual locations, and so in the above case
> it picks these spelling locations for the three parts of the location:
> 
>assert ((p + val_size) - buf == encoded_len);
>^^  ^
>START|  FINISH
>   CARET
> 
> and so erroneously strips the underlining from the final token, apart
> from its first character.
> 
> The fix is for layout's ctor to indicate that it wants the start/finish
> locations in such a situation, adding a new param to
> linemap_client_expand_location_to_spelling_point, so that
> expand_location_1 can handle this case by extracting the relevant part
> of the unwound compound location, and thus choose:
> 
>assert ((p + val_size) - buf == encoded_len);
>^^^
>START|FINISH
>   CARET
> 
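A minimal sketch of the idea, using a hypothetical `toy_loc` of column numbers in place of real virtual locations. `LOCATION_ASPECT_CARET` is named in the ChangeLog; the `LOCATION_ASPECT_START`/`LOCATION_ASPECT_FINISH` enumerators are assumed here.

```cpp
#include <cassert>

// Modelled on the enum the patch adds to line-map.h (START/FINISH assumed).
enum location_aspect
{
  LOCATION_ASPECT_CARET,
  LOCATION_ASPECT_START,
  LOCATION_ASPECT_FINISH
};

// Hypothetical stand-in for an unwound compound location: three
// spelling columns on one line.
struct toy_loc
{
  int caret, start, finish;
};

// Return the requested part of the location instead of always
// returning the caret.
int
expand_aspect (const toy_loc &loc, location_aspect aspect)
{
  switch (aspect)
    {
    case LOCATION_ASPECT_START:
      return loc.start;
    case LOCATION_ASPECT_FINISH:
      return loc.finish;
    case LOCATION_ASPECT_CARET:
    default:
      return loc.caret;
    }
}
```

When the finish point of a range is itself a token's location, asking for its caret yields the token's first column, which produces the truncated underline described above; asking for its finish yields the token's last column.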
> Successfully bootstrapped on x86_64-pc-linux-gnu.
> 
> OK for stage 4, or should I wait until stage 1?
> 
> gcc/ChangeLog:
>   PR c++/79300
>   * diagnostic-show-locus.c (layout::layout): Use start and finish
>   spelling location for the start and finish of each range.
>   * genmatch.c (linemap_client_expand_location_to_spelling_point):
>   Add unused aspect param.
>   * input.c (expand_location_1): Add "aspect" param, and use it
>   to access the correct part of the location.
>   (expand_location): Pass LOCATION_ASPECT_CARET to new param of
>   expand_location_1.
>   (expand_location_to_spelling_point): Likewise.
>   (linemap_client_expand_location_to_spelling_point): Add "aspect"
>   param, and pass it to expand_location_1.
> 
> gcc/testsuite/ChangeLog:
>   PR c++/79300
>   * c-c++-common/Wmisleading-indentation-3.c (fn_14): Update
>   expected underlining within macro expansion.
>   * c-c++-common/pr70264.c: Likewise.
>   * g++.dg/plugin/diagnostic-test-expressions-1.C
>   (test_within_macro_1): New test.
>   (test_within_macro_2): Likewise.
>   (test_within_macro_3): Likewise.
>   (test_within_macro_4): Likewise.
>   * gcc.dg/format/diagnostic-ranges.c (test_macro_3): Update
>   expected underlining within macro expansion.
>   (test_macro_4): Likewise.
>   * gcc.dg/plugin/diagnostic-test-expressions-1.c
>   (test_within_macro_1): New test.
>   (test_within_macro_2): Likewise.
>   (test_within_macro_3): Likewise.
>   (test_within_macro_4): Likewise.
>   * gcc.dg/spellcheck-fields-2.c (test_macro): Update expected
>   underlining within macro expansion.
> 
> libcpp/ChangeLog:
>   PR c++/79300
>   * include/line-map.h (enum location_aspect): New enum.
>   (linemap_client_expand_location_to_spelling_point): Add
>   enum location_aspect param.
>   * line-map.c (source_range::intersects_line_p): Update for new
>   param of linemap_client_expand_location_to_spelling_point.
>   (rich_location::get_expanded_location): Likewise.
>   (fixit_insert::affects_line_p): Likewise.
So we punted this to gcc-8 stage1.   Now that I've finally looked at it,
it looks good to me.

Sorry for the long wait.

jeff



Re: [C++] Fix decomp ICE with invalid initializer (PR c++/81258)

2017-07-03 Thread Jakub Jelinek
On Fri, Jun 30, 2017 at 01:38:13PM -0400, Nathan Sidwell wrote:
> On 06/30/2017 01:24 PM, Jakub Jelinek wrote:
> 
> > The initializer for structured binding has to be one of:
> > = assignment-expression
> > ( assignment-expression )
> > { assignment-expression }
> > but cp_parser_initializer can parse other forms, with fewer or more
> > expressions in there.  Some cases we caught with various cryptic errors
> > or pedwarns, but others we just ICEd on.
> > 
> > The following patch attempts to check this.
> 
> ok, but ...
> 
> > --- gcc/testsuite/g++.dg/cpp1z/decomp21.C.jj	2017-01-19 17:01:21.0 +0100
> > +++ gcc/testsuite/g++.dg/cpp1z/decomp21.C   2017-06-30 11:07:04.786746784 +0200
> > @@ -12,5 +12,5 @@ foo ()
> > auto [ n, o, p ] { a };
> > auto [ q, r, t ] ( s );
> > auto [ u, v, w ] ( s, );  // { dg-error "expected primary-expression before '.' token" }
> > -  auto [ x, y, z ] ( a );   // { dg-error "expression list treated as compound expression in initializer" "" { target *-*-* } .-1 }
> > +  auto [ x, y, z ] ( a );   // { dg-error "invalid initializer for structured binding" "" { target *-*-* } .-1 }
> >   }
> 
> The .-1 on the final error is actually about the previous statement, not the
> line it's lexically on.  Could you put it on a line on its own, while you're
> there?

Ok.  In the light of the http://gcc.gnu.org/ml/gcc-patches/2017-06/msg02432.html
thread, shouldn't this be structured binding declaration then?
I.e.

2017-07-03  Jakub Jelinek  

PR c++/81258
* parser.c (cp_parser_decomposition_declaration): Diagnose invalid
forms of structured binding initializers.

* g++.dg/cpp1z/decomp21.C (foo): Adjust expected diagnostics.
* g++.dg/cpp1z/decomp30.C: New test.

--- gcc/cp/parser.c.jj  2017-06-30 09:49:25.0 +0200
+++ gcc/cp/parser.c 2017-06-30 11:03:18.526521000 +0200
@@ -13196,6 +13196,16 @@ cp_parser_decomposition_declaration (cp_
   *init_loc = cp_lexer_peek_token (parser->lexer)->location;
   tree initializer = cp_parser_initializer (parser, &is_direct_init,
                                             &non_constant_p);
+  if (initializer == NULL_TREE
+ || (TREE_CODE (initializer) == TREE_LIST
+ && TREE_CHAIN (initializer))
+ || (TREE_CODE (initializer) == CONSTRUCTOR
+ && CONSTRUCTOR_NELTS (initializer) != 1))
+   {
+ error_at (loc, "invalid initializer for structured binding "
+   "declaration");
+ initializer = error_mark_node;
+   }
 
   if (decl != error_mark_node)
{
--- gcc/testsuite/g++.dg/cpp1z/decomp21.C.jj	2017-01-19 17:01:21.0 +0100
+++ gcc/testsuite/g++.dg/cpp1z/decomp21.C   2017-06-30 11:07:04.786746784 +0200
@@ -12,5 +12,6 @@ foo ()
   auto [ n, o, p ] { a };
   auto [ q, r, t ] ( s );
   auto [ u, v, w ] ( s, );  // { dg-error "expected primary-expression before '.' token" }
-  auto [ x, y, z ] ( a );   // { dg-error "expression list treated as compound expression in initializer" "" { target *-*-* } .-1 }
+   // { dg-error "invalid initializer for structured binding declaration" "" { target *-*-* } .-1 }
+  auto [ x, y, z ] ( a );
 }
--- gcc/testsuite/g++.dg/cpp1z/decomp30.C.jj	2017-06-30 11:09:31.934942575 +0200
+++ gcc/testsuite/g++.dg/cpp1z/decomp30.C   2017-06-30 11:09:22.0 +0200
@@ -0,0 +1,12 @@
+// PR c++/81258
+// { dg-options -std=c++1z }
+
+int a[2];
+auto [b, c] (a);
+auto [d, e] { a };
+auto [f, g] = a;
+auto [h, i] ( a, a );  // { dg-error "invalid initializer for structured binding declaration" }
+auto [j, k] { a, a };  // { dg-error "invalid initializer for structured binding declaration" }
+auto [l, m] = { a };   // { dg-error "deducing from brace-enclosed initializer list requires" }
+auto [n, o] {};	// { dg-error "invalid initializer for structured binding declaration" }
+auto [p, q] ();	// { dg-error "invalid initializer for structured binding declaration" }
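For reference, a C++17 sketch exercising the three initializer forms that the new check accepts: `= e`, `( e )` and `{ e }`, each with exactly one expression. The `sum_*` helpers are illustrative only; the rejected forms from decomp30.C are left out because they do not compile.

```cpp
#include <cassert>

// "= assignment-expression": each element is copy-initialized.
int sum_copy_init (const int (&arr)[2])
{
  auto [x, y] = arr;
  return x + y;
}

// "( assignment-expression )": each element is direct-initialized.
int sum_paren_init (const int (&arr)[2])
{
  auto [x, y] (arr);
  return x + y;
}

// "{ assignment-expression }": exactly one element in the braces.
int sum_brace_init (const int (&arr)[2])
{
  auto [x, y] { arr };
  return x + y;
}
```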


Jakub


Re: [C++ PATCH] "decomposition declaration" -> "structured binding" in C++ diagnostics

2017-07-03 Thread Jakub Jelinek
Hi!

On Fri, Jun 30, 2017 at 03:44:50PM -0400, Jason Merrill wrote:
> Well, the term "structured binding" refers to one of the names
> declared by the declaration, not the declaration as a whole, and those
> errors refer to the latter.  We could change "cannot be declared" to
> something else, perhaps just drop the "declared", so e.g. "structured
> binding declaration cannot be %"?  Or "cannot use X
> specifier"?

So like this?
I've used cannot be %<...%>.
For types where it used to be previously
decomposition declaration cannot be declared with type
I'm using
structured binding declaration cannot have type
(not sure if you have other preference in that case).

2017-07-03  Jakub Jelinek  

* parser.c (cp_parser_decomposition_declaration): Replace
decomposition declaration with structured binding in diagnostics.
* decl.c (cp_finish_decomp): Likewise.
(grokdeclarator): Likewise.

* g++.dg/cpp1z/decomp1.C: Expect structured binding instead of
decomposition declaration in diagnostics.
* g++.dg/cpp1z/decomp2.C: Likewise.
* g++.dg/cpp1z/decomp3.C: Likewise.
* g++.dg/cpp1z/decomp4.C: Likewise.
* g++.dg/cpp1z/decomp5.C: Likewise.
* g++.dg/cpp1z/decomp6.C: Likewise.
* g++.dg/cpp1z/decomp7.C: Likewise.
* g++.dg/cpp1z/decomp8.C: Likewise.
* g++.dg/cpp1z/decomp13.C: Likewise.
* g++.dg/cpp1z/decomp14.C: Likewise.
* g++.dg/cpp1z/decomp18.C: Likewise.
* g++.dg/cpp1z/decomp19.C: Likewise.
* g++.dg/cpp1z/decomp22.C: Likewise.
* g++.dg/cpp1z/decomp23.C: Likewise.
* g++.dg/cpp1z/decomp24.C: Likewise.
* g++.dg/cpp1z/decomp25.C: Likewise.
* g++.dg/cpp1z/decomp26.C: Likewise.
* g++.dg/cpp1z/decomp28.C: Likewise.

--- gcc/cp/parser.c.jj  2017-07-03 17:40:13.292479327 +0200
+++ gcc/cp/parser.c 2017-07-03 17:51:20.389823434 +0200
@@ -13150,7 +13150,7 @@ cp_parser_decomposition_declaration (cp_
 }
 
   if (cxx_dialect < cxx1z)
-pedwarn (loc, 0, "decomposition declaration only available with "
+pedwarn (loc, 0, "structured bindings only available with "
 "-std=c++1z or -std=gnu++1z");
 
   tree pushed_scope;
@@ -13199,7 +13199,7 @@ cp_parser_decomposition_declaration (cp_
 
   if (v.is_empty ())
 {
-  error_at (loc, "empty decomposition declaration");
+  error_at (loc, "empty structured binding declaration");
   decl = error_mark_node;
 }
 
--- gcc/cp/decl.c.jj2017-06-30 16:51:54.054985468 +0200
+++ gcc/cp/decl.c   2017-07-03 17:51:57.013383043 +0200
@@ -7486,8 +7486,8 @@ cp_finish_decomp (tree decl, tree first,
 
  if (init == error_mark_node || eltype == error_mark_node)
{
- inform (dloc, "in initialization of decomposition variable %qD",
- v[i]);
+ inform (dloc, "in initialization of structured binding "
+ "variable %qD", v[i]);
  goto error_out;
}
  /* Save the decltype away before reference collapse.  */
@@ -10135,7 +10135,7 @@ grokdeclarator (const cp_declarator *dec
  break;
 
case cdk_decomp:
- name = "decomposition";
+ name = "structured binding";
  break;
 
case cdk_error:
@@ -10589,43 +10589,43 @@ grokdeclarator (const cp_declarator *dec
? declarator->declarator->id_loc : declarator->id_loc);
   if (inlinep)
error_at (declspecs->locations[ds_inline],
- "decomposition declaration cannot be declared %");
+ "structured binding declaration cannot be %");
   if (typedef_p)
error_at (declspecs->locations[ds_typedef],
- "decomposition declaration cannot be declared %");
+ "structured binding declaration cannot be %");
   if (constexpr_p)
-   error_at (declspecs->locations[ds_constexpr], "decomposition "
- "declaration cannot be declared %");
+   error_at (declspecs->locations[ds_constexpr], "structured "
+ "binding declaration cannot be %");
   if (thread_p)
error_at (declspecs->locations[ds_thread],
- "decomposition declaration cannot be declared %qs",
+ "structured binding declaration cannot be %qs",
  declspecs->gnu_thread_keyword_p
  ? "__thread" : "thread_local");
   if (concept_p)
error_at (declspecs->locations[ds_concept],
- "decomposition declaration cannot be declared %");
+ "structured binding declaration cannot be %");
   switch (storage_class)
{
case sc_none:
  break;
case sc_register:
- error_at (loc, "decomposition declaration cannot be declared "
+ error_at (loc, "structured 

Re: [PATCH][PR 57371] Remove useless floating point casts in comparisons

2017-07-03 Thread Jeff Law
On 07/02/2017 11:03 AM, Yuri Gribov wrote:
> Hi all,
> 
> This is initial patch for
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57371 . Per Joseph's
> suggestion it optimizes
>   (float)lhs CMP rhs
>   (double)lhs CMP rhs
> to
>   lhs CMP (typeof(lhs))rhs
> whenever typeof(lhs) can be precisely represented by the floating-point type
> (e.g. short by float or int by double) and rhs can be precisely
> represented by typeof(lhs).
> 
> Bootstrapped/regtested on x64. Ok for trunk?
> 
> I'd like to extend this further in follow-up patches:
> 1) fold always-false/always-true comparisons e.g.
>   short x;
>   (float)x > INT16_MAX;  // Always false
> 2) get rid of cast in comparisons with zero regardless of typeof(lhs)
> when -fno-trapping-math:
>   (float_or_double)lhs CMP 0
> 
> -Y
> 
> 
> pr57371-1.patch
> 
> 
> 2017-07-02  Yury Gribov  
> 
>   PR tree-optimization/57371
>   * match.pd: New pattern.
>   * testsuite/gcc.dg/pr57371-1.c: New test.
>   * testsuite/gcc.dg/pr57371-2.c: New test.
> 
> diff -rupN gcc/gcc/match.pd gcc-57371/gcc/match.pd
> --- gcc/gcc/match.pd  2017-06-29 21:14:57.0 +0200
> +++ gcc-57371/gcc/match.pd2017-07-01 09:08:04.0 +0200
> @@ -2802,7 +2802,35 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (simplify
>  (cmp (sq @0) (sq @1))
>(if (! HONOR_NANS (@0))
> - (cmp @0 @1))
> + (cmp @0 @1)
> +
> + /* Get rid of float cast in
> + (float_type)N CMP M
> +if N and M are within the range explicitly representable
> +by float type.
> +
> +TODO: fold always true/false comparisons if M is outside valid range.  */
> + (simplify
> +  (cmp (float @0) REAL_CST@1)
> +  (if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (@1)))
> +   (with
> +{
> +  tree itype = TREE_TYPE (@0);
> +
> +  const real_format *fmt = REAL_MODE_FORMAT (TYPE_MODE (TREE_TYPE (@1)));
> +
> +  const REAL_VALUE_TYPE *rhs = TREE_REAL_CST_PTR (@1);
> +  bool not_rhs_int_p = false;
> +  wide_int rhs_int = real_to_integer (rhs, &not_rhs_int_p, TYPE_PRECISION (itype));
> +}
> +(if (!not_rhs_int_p
> + && !(TYPE_UNSIGNED (itype) && real_isneg (rhs))
> + && wi::ge_p (rhs_int, wi::min_value (itype), TYPE_SIGN (itype))
> + && wi::le_p (rhs_int, wi::max_value (itype), TYPE_SIGN (itype))
> + && TYPE_PRECISION (itype) <= significand_size (fmt))
> + (cmp @0 { wide_int_to_tree (itype, rhs_int); })
> +
> +)
Seems like a nit, but instead of "not_rhs_int_p" use "fail" or something
like that.  That makes it easier to mentally parse the conditional which
uses the result.

What happens if @0 is a floating point type?  Based on the variable name
"itype" and passing TYPE_PRECISION (itype) to real_to_integer, it seems
like you're expecting @0 to be an integer.  If so, you should verify
that it really is an integer type.  Seems like a good thing to verify
with tests as well.
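A minimal sketch of the validity test under discussion, written as plain C++ over host types rather than as a match.pd pattern. The function name `can_drop_cast` is hypothetical. The `static_assert` corresponds to the request to verify that the operand really has integer type, and the precision comparison mirrors the patch's `TYPE_PRECISION (itype) <= significand_size (fmt)` check. It assumes IEEE binary float/double and does not cover decimal floating-point types.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Return true if "(Float) x CMP rhs" may be rewritten as "x CMP (Int) rhs"
// for a variable x of integer type Int.
template <typename Int, typename Float>
bool
can_drop_cast (Float rhs)
{
  // The operand of the conversion must be an integer type.
  static_assert (std::numeric_limits<Int>::is_integer,
                 "operand of the cast must have integer type");

  // Every Int value must convert to Float exactly: the integer precision
  // (value bits plus sign bit) must fit in the significand.
  const int int_prec = std::numeric_limits<Int>::digits
                       + (std::numeric_limits<Int>::is_signed ? 1 : 0);
  if (int_prec > std::numeric_limits<Float>::digits)
    return false;

  // RHS must be finite and integral ...
  if (!std::isfinite (rhs) || std::trunc (rhs) != rhs)
    return false;

  // ... and lie within Int's range (this also rejects negative values
  // for unsigned Int).
  return rhs >= static_cast<Float> (std::numeric_limits<Int>::min ())
         && rhs <= static_cast<Float> (std::numeric_limits<Int>::max ());
}
```

Comparisons that fail these checks, for example `(float) x > 40000.0f` with `short x`, are simply left alone here; folding them to a constant true/false result is the separate follow-up mentioned in the original mail.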

Jeff




Re: [PATCH][testsuite] Add dg-require-stack-check

2017-07-03 Thread Jeff Law
On 07/03/2017 09:00 AM, Christophe Lyon wrote:
> Hi,
> 
> This is a follow-up to
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01791.html
> 
> This patch adds dg-require-stack-check and updates the tests that use
> dg-options "-fstack-check" to avoid failures on configurations that do
> not support it.
> 
> I merely copied what we currently do to check if visibility flags are
> supported, and cross-tested on aarch64 and arm targets with the
> results I expected.
> 
> This means that my testing does not cover the changes I propose for
> i386 and gnat.
> 
> Is it OK nonetheless?
> 
> Thanks,
> 
> Christophe
> 
> 
> stack-check-et.chlog.txt
> 
> 
> 2017-07-03  Christophe Lyon  
> 
>   * lib/target-supports-dg.exp (dg-require-stack-check): New.
>   * lib/target-supports.exp (check_stack_check_available): New.
>   * g++.dg/other/i386-9.C: Add dg-require-stack-check.
>   * gcc.c-torture/compile/stack-check-1.c: Likewise.
>   * gcc.dg/graphite/run-id-pr47653.c: Likewise.
>   * gcc.dg/pr47443.c: Likewise.
>   * gcc.dg/pr48134.c: Likewise.
>   * gcc.dg/pr70017.c: Likewise.
>   * gcc.target/aarch64/stack-checking.c: Likewise.
>   * gcc.target/arm/stack-checking.c: Likewise.
>   * gcc.target/i386/pr48723.c: Likewise.
>   * gcc.target/i386/pr55672.c: Likewise.
>   * gcc.target/i386/pr67265-2.c: Likewise.
>   * gcc.target/i386/pr67265.c: Likewise.
>   * gnat.dg/opt49.adb: Likewise.
>   * gnat.dg/stack_check1.adb: Likewise.
>   * gnat.dg/stack_check2.adb: Likewise.
>   * gnat.dg/stack_check3.adb: Likewise.
ACK once you address Rainer's comments.  I've got further stack-check
tests in the queue which I'll update once your change goes in.

jeff


Re: [PATCH][PR 57371] Remove useless floating point casts in comparisons

2017-07-03 Thread Jeff Law
On 07/03/2017 08:52 AM, Joseph Myers wrote:
> I'd expect much more thorough testcases here, both for cases that get 
> optimized and cases that don't.  You're only testing comparisons with 
> zero.  There should be comparisons with other values, both integer and 
> noninteger, both within the range for which optimizing would be valid and 
> outside it, both inside the range of the integer type and outside it.  
> (To the extent that you don't optimize some cases that would be valid to 
> optimize as discussed in that PR, XFAILed tests, or deferring adding 
> tests, would be reasonable.  But each case identified in that PR as not 
> valid to optimize, or only valid to optimize with -fno-trapping-math, 
> should have corresponding tests that it's not optimized.)
> 
> Since SCALAR_FLOAT_TYPE_P includes decimal floating-point types, tests 
> with those are desirable as well (in gcc.dg/dfp or c-c++-common/dfp, I 
> suppose).
> 
Agreed.  I think with better testing this should be able to move forward
after the technical review.  It's not terribly different conceptually
than the code in DOM/VRP, except that Yuri's changes work on floating
point types.

I'm pretty sure DOM's bits could be replaced with a suitable match.pd
pattern (which IMHO would be a small improvement across multiple axes).
VRP would be more difficult as the VRP implementation depends on getting
the value range of the RHS of the conditional.

Jeff


Re: [PATCH][testsuite] Add dg-require-stack-check

2017-07-03 Thread Christophe Lyon
On 3 July 2017 at 17:12, Rainer Orth  wrote:
> Hi Christophe,
>
>> This is a follow-up to
>> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01791.html
>>
>> This patch adds dg-require-stack-check and updates the tests that use
>> dg-options "-fstack-check" to avoid failures on configurations that do
>> not support it.
>>
>> I merely copied what we currently do to check if visibility flags are
>> supported, and cross-tested on aarch64 and arm targets with the
>> results I expected.
>>
>> This means that my testing does not cover the changes I propose for
>> i386 and gnat.
>
> better give it a whirl e.g. on an x86 system in the compile farm to
> catch typos and stuff.
>
> Besides, this requires documenting in sourcebuild.texi.

Ha, yes, I keep forgetting about this.

>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH][testsuite] Add dg-require-stack-check

2017-07-03 Thread Rainer Orth
Hi Christophe,

> This is a follow-up to
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01791.html
>
> This patch adds dg-require-stack-check and updates the tests that use
> dg-options "-fstack-check" to avoid failures on configurations that do
> not support it.
>
> I merely copied what we currently do to check if visibility flags are
> supported, and cross-tested on aarch64 and arm targets with the
> results I expected.
>
> This means that my testing does not cover the changes I propose for
> i386 and gnat.

better give it a whirl e.g. on an x86 system in the compile farm to
catch typos and stuff.

Besides, this requires documenting in sourcebuild.texi.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] hash table indentation

2017-07-03 Thread Nathan Sidwell

I noticed a mis-indented line.

applied to trunk.

nathan
--
Nathan Sidwell
2017-07-03  Nathan Sidwell  

	* hash-table.h (hash_table_mod1): Fix indentation.

Index: hash-table.h
===
--- hash-table.h	(revision 249922)
+++ hash-table.h	(working copy)
@@ -325,7 +325,7 @@ hash_table_mod1 (hashval_t hash, unsigne
 {
   const struct prime_ent *p = &prime_tab[index];
   gcc_checking_assert (sizeof (hashval_t) * CHAR_BIT <= 32);
-return mul_mod (hash, p->prime, p->inv, p->shift);
+  return mul_mod (hash, p->prime, p->inv, p->shift);
 }
 
 /* Compute the secondary table index for HASH given current prime index.  */


Re: [patch][arm] Clean up generation of BE8 format images.

2017-07-03 Thread Joseph Myers
On Mon, 3 Jul 2017, Richard Earnshaw (lists) wrote:

>   * doc/invoke.texi (ARM Options): Document -mbe8 and -mbe32.

Should also update the option summary inside @gccoptlist.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH][testsuite] Add dg-require-stack-check

2017-07-03 Thread Christophe Lyon
Hi,

This is a follow-up to
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01791.html

This patch adds dg-require-stack-check and updates the tests that use
dg-options "-fstack-check" to avoid failures on configurations that do
not support it.

I merely copied what we currently do to check if visibility flags are
supported, and cross-tested on aarch64 and arm targets with the
results I expected.

This means that my testing does not cover the changes I propose for
i386 and gnat.

Is it OK nonetheless?

Thanks,

Christophe
2017-07-03  Christophe Lyon  

* lib/target-supports-dg.exp (dg-require-stack-check): New.
* lib/target-supports.exp (check_stack_check_available): New.
* g++.dg/other/i386-9.C: Add dg-require-stack-check.
* gcc.c-torture/compile/stack-check-1.c: Likewise.
* gcc.dg/graphite/run-id-pr47653.c: Likewise.
* gcc.dg/pr47443.c: Likewise.
* gcc.dg/pr48134.c: Likewise.
* gcc.dg/pr70017.c: Likewise.
* gcc.target/aarch64/stack-checking.c: Likewise.
* gcc.target/arm/stack-checking.c: Likewise.
* gcc.target/i386/pr48723.c: Likewise.
* gcc.target/i386/pr55672.c: Likewise.
* gcc.target/i386/pr67265-2.c: Likewise.
* gcc.target/i386/pr67265.c: Likewise.
* gnat.dg/opt49.adb: Likewise.
* gnat.dg/stack_check1.adb: Likewise.
* gnat.dg/stack_check2.adb: Likewise.
* gnat.dg/stack_check3.adb: Likewise.
diff --git a/gcc/testsuite/lib/target-supports-dg.exp b/gcc/testsuite/lib/target-supports-dg.exp
index 6400d64..d50d8b0 100644
--- a/gcc/testsuite/lib/target-supports-dg.exp
+++ b/gcc/testsuite/lib/target-supports-dg.exp
@@ -265,6 +265,21 @@ proc dg-require-linker-plugin { args } {
 }
 }
 
+# If this target does not support the "stack-check" option, skip this
+# test.
+
+proc dg-require-stack-check { args } {
+    set stack_check_available [ check_stack_check_available [lindex $args 1 ] ]
+    if { $stack_check_available == -1 } {
+        upvar name name
+        unresolved "$name"
+    }
+    if { $stack_check_available != 1 } {
+        upvar dg-do-what dg-do-what
+        set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
+    }
+}
+
 # Add any target-specific flags needed for accessing the given list
 # of features.  This must come after all dg-options.
 
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index fe5e777..d19892e 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1063,6 +1063,17 @@ proc check_effective_target_fstack_protector {} {
 } "-fstack-protector"]
 }
 
+# Return 1 if the target supports -fstack-check or -fstack-check=$stack_kind
+proc check_stack_check_available { stack_kind } {
+    if [string match "" $stack_kind] then {
+        set stack_opt "-fstack-check"
+    } else { set stack_opt "-fstack-check=$stack_kind" }
+
+    return [check_no_compiler_messages stack_check executable {
+        int main (void) { return 0; }
+    } "$stack_opt"]
+}
+
 # Return 1 if compilation with -freorder-blocks-and-partition is error-free
 # for trivial code, 0 otherwise.  As some targets (ARM for example) only
 # warn when -fprofile-use is also supplied we test that combination too.
diff --git a/gcc/testsuite/g++.dg/other/i386-9.C b/gcc/testsuite/g++.dg/other/i386-9.C
index 7964057..782cf87 100644
--- a/gcc/testsuite/g++.dg/other/i386-9.C
+++ b/gcc/testsuite/g++.dg/other/i386-9.C
@@ -2,6 +2,7 @@
 // Testcase by Zdenek Sojka 
 
 // { dg-do run { target i?86-*-* x86_64-*-* } }
+/* { dg-require-stack-check "" } */
 // { dg-options "-Os -mpreferred-stack-boundary=5 -fstack-check -fno-omit-frame-pointer" }
 
 int main()
diff --git a/gcc/testsuite/gcc.c-torture/compile/stack-check-1.c b/gcc/testsuite/gcc.c-torture/compile/stack-check-1.c
index 5c99688..2a03f7c 100644
--- a/gcc/testsuite/gcc.c-torture/compile/stack-check-1.c
+++ b/gcc/testsuite/gcc.c-torture/compile/stack-check-1.c
@@ -1,3 +1,4 @@
 /* { dg-require-effective-target untyped_assembly } */
+/* { dg-require-stack-check "" } */
 /* { dg-additional-options "-fstack-check" } */
 #include "20031023-1.c"
diff --git a/gcc/testsuite/gcc.dg/graphite/run-id-pr47653.c b/gcc/testsuite/gcc.dg/graphite/run-id-pr47653.c
index cd9d8eb..ca91af4 100644
--- a/gcc/testsuite/gcc.dg/graphite/run-id-pr47653.c
+++ b/gcc/testsuite/gcc.dg/graphite/run-id-pr47653.c
@@ -1,3 +1,4 @@
+/* { dg-require-stack-check "generic" } */
 /* { dg-options "-O -fstack-check=generic -ftree-pre -fgraphite-identity" } */
 /* nvptx doesn't expose a stack.  */
 /* { dg-skip-if "" { nvptx-*-* } } */
diff --git a/gcc/testsuite/gcc.dg/pr47443.c b/gcc/testsuite/gcc.dg/pr47443.c
index 47abea2..5a5c43f 100644
--- a/gcc/testsuite/gcc.dg/pr47443.c
+++ b/gcc/testsuite/gcc.dg/pr47443.c
@@ -1,5 +1,6 @@
 /* PR tree-optimization/47443 */
 /* { dg-do compile } */
+/* { dg-require-stack-check "generic" } */
 /* { dg-options 

Re: [PATCH][PR 57371] Remove useless floating point casts in comparisons

2017-07-03 Thread Joseph Myers
I'd expect much more thorough testcases here, both for cases that get 
optimized and cases that don't.  You're only testing comparisons with 
zero.  There should be comparisons with other values, both integer and 
noninteger, both within the range for which optimizing would be valid and 
outside it, both inside the range of the integer type and outside it.  
(To the extent that you don't optimize some cases that would be valid to 
optimize as discussed in that PR, XFAILed tests, or deferring adding 
tests, would be reasonable.  But each case identified in that PR as not 
valid to optimize, or only valid to optimize with -fno-trapping-math, 
should have corresponding tests that it's not optimized.)

Since SCALAR_FLOAT_TYPE_P includes decimal floating-point types, tests 
with those are desirable as well (in gcc.dg/dfp or c-c++-common/dfp, I 
suppose).
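To make the point about ranges concrete, here is a small C++ sketch (the helper names are hypothetical; it assumes IEEE binary32 `float`, a 32-bit `int32_t`, and the default round-to-nearest mode). Inside the exactly representable range, a comparison of the converted value can be moved to the integer domain, e.g. `(float) i < 2.5f` is equivalent to `i <= 2`. Just above 2^24, however, 16777217 converts to 16777216.0f, so naively rewriting `(float) i == 16777216.0f` as `i == 16777216` would be wrong.

```cpp
#include <cassert>
#include <cstdint>

// Comparison as written in the source: convert the integer to float, then
// compare.  Storing through a volatile float forces rounding to single
// precision even on targets that evaluate float math in wider registers.
bool eq_via_float (std::int32_t i, float c)
{
  volatile float f = static_cast<float> (i);
  return f == c;
}

// Candidate "optimized" form: compare directly in the integer domain.
bool eq_via_int (std::int32_t i, std::int32_t c)
{
  return i == c;
}

// Ordered comparison against a noninteger constant, as written.
bool lt_via_float (std::int32_t i, float c)
{
  volatile float f = static_cast<float> (i);
  return f < c;
}

// Every integer with magnitude up to 2^24 is exactly representable in IEEE
// binary32 (24-bit significand); beyond it, distinct ints can round to the
// same float.
constexpr std::int32_t kExactLimit = 1 << 24;
```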

-- 
Joseph S. Myers
jos...@codesourcery.com


Fix ICE in update_br_prob_note

2017-07-03 Thread Jan Hubicka
Hi,
this patch fixes two issues triggered by the testcase in PR middle-end/81290.
First, jump threading has a hack that converts frequencies to counts
and computes probabilities from them; the resulting probabilities get the
wrong quality.  Second, force_edge_cold is a bit overzealous when
propagating coldness backward across the CFG.
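The new `guessed ()` and `afdo ()` helpers follow a simple pattern: return a copy of the value with its quality tag overwritten, so that code such as `recompute_probabilities` can mark a probability derived from frequency-based counts as only guessed. A minimal standalone model might look like this; `Probability` and `Quality` are hypothetical simplifications, not GCC's `profile_probability`, which uses a fixed-point value and its own set of quality levels.

```cpp
#include <cassert>

// Hypothetical, simplified quality levels (not GCC's enumeration).
enum class Quality { Guessed, Afdo, Precise };

// Simplified model of a probability that carries a quality tag.
class Probability
{
public:
  Probability (double value, Quality quality)
    : m_value (value), m_quality (quality) {}

  double value () const { return m_value; }
  Quality quality () const { return m_quality; }

  // Return a copy of THIS with the quality set to Guessed; the value is kept.
  Probability guessed () const
  {
    Probability ret = *this;
    ret.m_quality = Quality::Guessed;
    return ret;
  }

  // Return a copy of THIS with the quality set to Afdo; the value is kept.
  Probability afdo () const
  {
    Probability ret = *this;
    ret.m_quality = Quality::Afdo;
    return ret;
  }

private:
  double m_value;
  Quality m_quality;
};
```

As in the patch, the tag is overwritten unconditionally rather than combined with the existing quality.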

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

PR middle-end/81290
* predict.c (force_edge_cold): Be more careful about propagation
backward.
* profile-count.h (profile_probability::guessed,
profile_probability::afdo, profile_count::guessed, profile_count::afdo):
New.
* tree-ssa-threadupdate.c (recompute_probabilities): Result is guessed.

* gcc.c-torture/compile/pr81290.c: New.
Index: predict.c
===
--- predict.c   (revision 249906)
+++ predict.c   (working copy)
@@ -3962,15 +3962,26 @@ force_edge_cold (edge e, bool impossible
  e2->count.apply_scale (count_sum2, count_sum);
e2->probability /= prob_comp;
  }
-  if (current_ir_type () != IR_GIMPLE)
+  if (current_ir_type () != IR_GIMPLE
+ && e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun))
update_br_prob_note (e->src);
 }
   /* If all edges out of e->src are unlikely, the basic block itself
  is unlikely.  */
   else
 {
-  e->probability = profile_probability::always ();
-  if (current_ir_type () != IR_GIMPLE)
+  if (prob_sum == profile_probability::never ())
+e->probability = profile_probability::always ();
+  else
+   {
+ if (impossible)
+   e->probability = profile_probability::never ();
+ /* If BB has some edges out that are not impossible, we can not
+assume that BB itself is.  */
+ impossible = false;
+   }
+  if (current_ir_type () != IR_GIMPLE
+ && e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun))
update_br_prob_note (e->src);
   if (e->src->count == profile_count::zero ())
return;
Index: profile-count.h
===
--- profile-count.h (revision 249907)
+++ profile-count.h (working copy)
@@ -351,6 +351,22 @@ public:
   return profile_probability::always() - *this;
 }
 
+  /* Return THIS with quality dropped to GUESSED.  */
+  profile_probability guessed () const
+{
+  profile_probability ret = *this;
+  ret.m_quality = profile_guessed;
+  return ret;
+}
+
+  /* Return THIS with quality dropped to AFDO.  */
+  profile_probability afdo () const
+{
+  profile_probability ret = *this;
+  ret.m_quality = profile_afdo;
+  return ret;
+}
+
   profile_probability combine_with_freq (int freq1, profile_probability other,
 int freq2) const
 {
@@ -767,6 +783,22 @@ public:
   return ret;
 }
 
+  /* Return THIS with quality dropped to GUESSED.  */
+  profile_count guessed () const
+{
+  profile_count ret = *this;
+  ret.m_quality = profile_guessed;
+  return ret;
+}
+
+  /* Return THIS with quality dropped to AFDO.  */
+  profile_count afdo () const
+{
+  profile_count ret = *this;
+  ret.m_quality = profile_afdo;
+  return ret;
+}
+
   /* Return probability of event with counter THIS within event with counter
  OVERALL.  */
   profile_probability probability_in (const profile_count overall) const
Index: testsuite/gcc.c-torture/compile/pr81290.c
===
--- testsuite/gcc.c-torture/compile/pr81290.c   (revision 0)
+++ testsuite/gcc.c-torture/compile/pr81290.c   (working copy)
@@ -0,0 +1,22 @@
+/* { dg-options "-funroll-loops" } */
+int vz;
+
+void
+ms (int sw, int cm)
+{
+  for (vz = 0; vz < 19; ++vz)
+{
+ fx:
+  sw *= 2;
+}
+
+  for (;;)
+{
+  if (sw != 0)
+for (;;)
+  {
+  }
+  if (1 / 0 && cm != 0)
+goto fx;
+}
+}
Index: tree-ssa-threadupdate.c
===
--- tree-ssa-threadupdate.c (revision 249906)
+++ tree-ssa-threadupdate.c (working copy)
@@ -908,7 +908,7 @@ recompute_probabilities (basic_block bb)
 
   /* Prevent overflow computation due to insane profiles.  */
   if (esucc->count < bb->count)
-   esucc->probability = esucc->count.probability_in (bb->count);
+   esucc->probability = esucc->count.probability_in (bb->count).guessed ();
   else
/* Can happen with missing/guessed probabilities, since we
   may determine that more is flowing along duplicated
@@ -1051,7 +1051,8 @@ freqs_to_counts_path (struct redirection
   if (ein->probability.initialized_p ())
 ein->count = profile_count::from_gcov_type
  (apply_probability (ein->src->frequency * REG_BR_PROB_BASE,
-   

Re: [Patch AArch64 docs] Document the RcPc extension

2017-07-03 Thread Richard Earnshaw (lists)
On 23/06/17 11:21, James Greenhalgh wrote:
> 
> Hi,
> 
> Andrew pointed out that I did not document the new architecture extension
> flag I added for the RcPc extension. This was intentional, as enabling the rcpc
> extension does not change GCC code generation, and is just an assembler flag.
> But for completeness, here is documentation for the new option.
> 
> OK?
> 

OK.

R.

> Thanks,
> James
> 
> ---
> 2017-06-21  James Greenhalgh  
> 
>   * doc/invoke.texi (rcpc architecture extension): Document it.
> 
> 
> 0001-Patch-AArch64-docs-Document-the-RcPc-extension.patch
> 
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 7e7a16a5..db00e51 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -14172,6 +14172,10 @@ Enable Large System Extension instructions.  This is on by default for
>  @option{-march=armv8.1-a}.
>  @item fp16
>  Enable FP16 extension.  This also enables floating-point instructions.
> +@item rcpc
> +Enable the RcPc extension.  This does not change code generation from GCC,
> +but is passed on to the assembler, enabling inline asm statements to use
> +instructions from the RcPc extension.
>  
>  @end table
>  
> 



Re: [PATCH, 2/4] Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin

2017-07-03 Thread Tom de Vries

On 07/03/2017 04:08 PM, Thomas Schwinge wrote:
> Hi!
>
> On Mon, 26 Jun 2017 17:29:11 +0200, Jakub Jelinek  wrote:
> > On Mon, Jun 26, 2017 at 03:26:57PM +, Joseph Myers wrote:
> > > On Mon, 26 Jun 2017, Tom de Vries wrote:
> > > > > 2. Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin
> > > >
> > > > This patch adds handling of:
> > > > - GOMP_OPENACC_NVPTX_SAVE_TEMPS=[01], and
> > > > - GOMP_OPENACC_NVPTX_DISASM=[01]
>
> Why the "OPENACC" in these names?

I took the format from 'GOMP_OPENACC_DIM'.

> Doesn't this debugging aid apply to
> any variant of offloading?

I guess you're right. These environment variables would also be
applicable for, for instance, offloading via OpenMP on nvptx. I'll strip the
'OPENACC_' bit from the variables.

> > > > The filename used for dumping the module is plugin-nvptx..cubin.
>
> Also, I suggest to make these names similar to their controlling options,
> that is: "gomp-nvptx*", for example.

Makes sense, will do.

Thanks,
- Tom


Re: [C++ PATCH] conversion operator names

2017-07-03 Thread Nathan Sidwell

On 07/01/2017 05:40 PM, Andreas Schwab wrote:
> On Jun 30 2017, Nathan Sidwell  wrote:
>
>> * config-lang.in (gtfiles): Add cp/lex.c.
>
> That breaks obj-c++.

Sorry about that.  Turns out I was only building objc, not objc++, so I
didn't notice.  While objcp's config-lang.in claimed the first part
of its gtfiles initializer was the same as cp's version, it had severely
bitrotted, even before my sorting patch to the latter, and was therefore
already working only by accident.  Yet again, specifying the same thing
in two different places leads to breakage.  Let's not do that.


This patch changes objcp's config-lang.in to source cp's variant and 
extract the gtfiles list therefrom.  The fly in that ointment was that 
we source the lang frags from the toplevel build and the gcc dir, so 
srcdir is not consistent.  This patch makes it so by setting srcdir to 
the gcc dir when we include the lang frags from the toplevel.


And the fly in that ointment is that while the following works in bash:
  srcdir=${srcdir}/gcc . $frag
to override srcdir only while frag is being sourced, in plain sh it changes
the current shell's value too, because '.' is a special built-in and POSIX
makes variable assignments preceding a special built-in persist.  A little
bit of explicit saving and restoring is needed.


I committed the attached as sufficiently obvious (and fixed my boot
procedure to include objc++).


nathan

--
Nathan Sidwell
2017-07-03  Nathan Sidwell  

	* configure.ac: Set srcdir when sourcing config-lang.in fragments.
	* configure: Rebuilt.

	* config-lang.in: Source cp/config-lang.in, sort objc++ gtfiles list.

Index: configure
===
--- configure	(revision 249835)
+++ configure	(working copy)
@@ -6166,7 +6166,12 @@ if test -d ${srcdir}/gcc; then
 language=
 lang_requires=
 lang_requires_boot_languages=
-. ${lang_frag}
+# set srcdir during sourcing lang_frag to the gcc dir.
+# Sadly overriding srcdir on the . line doesn't work in plain sh as it
+# pollutes this shell
+saved_srcdir=${srcdir}
+srcdir=${srcdir}/gcc . ${lang_frag}
+srcdir=${saved_srcdir}
 for other in ${lang_requires} ${lang_requires_boot_languages}; do
   case ,${enable_languages}, in
 	*,$other,*) ;;
@@ -6241,7 +6246,10 @@ if test -d ${srcdir}/gcc; then
 subdir_requires=
 boot_language=no
 build_by_default=yes
-. ${lang_frag}
+# set srcdir during sourcing.  See above about save & restore
+saved_srcdir=${srcdir}
+srcdir=${srcdir}/gcc . ${lang_frag}
+srcdir=${saved_srcdir}
 if test x${language} = x; then
   echo "${lang_frag} doesn't set \$language." 1>&2
   exit 1
Index: configure.ac
===
--- configure.ac	(revision 249835)
+++ configure.ac	(working copy)
@@ -1839,7 +1839,12 @@ if test -d ${srcdir}/gcc; then
 language=
 lang_requires=
 lang_requires_boot_languages=
-. ${lang_frag}
+# set srcdir during sourcing lang_frag to the gcc dir.
+# Sadly overriding srcdir on the . line doesn't work in plain sh as it
+# pollutes this shell
+saved_srcdir=${srcdir}
+srcdir=${srcdir}/gcc . ${lang_frag}
+srcdir=${saved_srcdir}
 for other in ${lang_requires} ${lang_requires_boot_languages}; do
   case ,${enable_languages}, in
 	*,$other,*) ;;
@@ -1914,7 +1919,10 @@ if test -d ${srcdir}/gcc; then
 subdir_requires=
 boot_language=no
 build_by_default=yes
-. ${lang_frag}
+# set srcdir during sourcing.  See above about save & restore
+saved_srcdir=${srcdir}
+srcdir=${srcdir}/gcc . ${lang_frag}
+srcdir=${saved_srcdir}
 if test x${language} = x; then
   echo "${lang_frag} doesn't set \$language." 1>&2
   exit 1
Index: gcc/objcp/config-lang.in
===
--- gcc/objcp/config-lang.in	(revision 249835)
+++ gcc/objcp/config-lang.in	(working copy)
@@ -43,8 +43,20 @@ subdir_requires="objc cp"
 # avoid having the GC stuff from that header being added to gtype-cp.h
 # or gtype-objc.h.
 
-# This list is separated in two parts: the first one is identical to
-# the C++ one, the second one contains our ObjC++ additions.
-gtfiles="\$(srcdir)/cp/rtti.c \$(srcdir)/cp/mangle.c \$(srcdir)/cp/name-lookup.h \$(srcdir)/cp/name-lookup.c \$(srcdir)/cp/cp-tree.h \$(srcdir)/cp/decl.h \$(srcdir)/cp/call.c \$(srcdir)/cp/decl.c \$(srcdir)/cp/decl2.c \$(srcdir)/cp/pt.c \$(srcdir)/cp/repo.c \$(srcdir)/cp/semantics.c \$(srcdir)/cp/tree.c \$(srcdir)/cp/parser.h \$(srcdir)/cp/parser.c \$(srcdir)/cp/method.c \$(srcdir)/cp/typeck2.c \$(srcdir)/c-family/c-common.c \$(srcdir)/c-family/c-common.h \$(srcdir)/c-family/c-objc.h \$(srcdir)/c-family/c-lex.c \$(srcdir)/c-family/c-pragma.h \$(srcdir)/c-family/c-pragma.c 

Re: [PATCH, 2/4] Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin

2017-07-03 Thread Jakub Jelinek
On Mon, Jul 03, 2017 at 04:08:10PM +0200, Thomas Schwinge wrote:
> > And IMNSHO GOMP_DEBUG too.
> 
> But why that?  Isn't GOMP_DEBUG just controlling terminal debugging
> output (that you'd also like to see in setuid/setgid programs)?

The output could go into stderr, which could very well be redirected into
some file and some other program could be expecting specific content in
there.  So allowing an attacker to add there other stuff is really
dangerous.  If you want to use GOMP_DEBUG on suid/sgid processes, just
run them under root.

Jakub


Re: [PATCH GCC][3/4]Generalize dead store elimination (or store motion) across loop iterations in predcom

2017-07-03 Thread Bin.Cheng
On Mon, Jul 3, 2017 at 10:38 AM, Richard Biener
 wrote:
> On Tue, Jun 27, 2017 at 12:49 PM, Bin Cheng  wrote:
>> Hi,
>> For the moment, tree-predcom.c only supports invariant/load-loads/store-loads chains.
>> This patch generalizes dead store elimination (or store motion) across loop iterations in
>> the predictive commoning pass by supporting store-store chains.  As the comment in the patch says:
>>
>>Apart from predictive commoning on Load-Load and Store-Load chains, we
>>also support Store-Store chains -- stores killed by other store can be
>>eliminated.  Given below example:
>>
>>  for (i = 0; i < n; i++)
>>{
>>  a[i] = 1;
>>  a[i+2] = 2;
>>}
>>
>>It can be replaced with:
>>
>>  t0 = a[0];
>>  t1 = a[1];
>>  for (i = 0; i < n; i++)
>>{
>>  a[i] = 1;
>>  t2 = 2;
>>  t0 = t1;
>>  t1 = t2;
>>}
>>  a[n] = t0;
>>  a[n+1] = t1;
>>
>>If the loop runs more than one iteration, it can be further simplified into:
>>
>>  for (i = 0; i < n; i++)
>>{
>>  a[i] = 1;
>>}
>>  a[n] = 2;
>>  a[n+1] = 2;
>>
>>The interesting part is this can be viewed either as general store motion
>>or general dead store elimination in either intra/inter-iterations way.
>>
>> There are a number of interesting facts about this enhancement:
>> a) This patch supports dead store elimination for both the across-iteration case
>>    and the single-iteration case.  For the latter, it is dead store elimination.
>> b) There are advantages to supporting dead store elimination in predcom; for
>>    example, it has complete information about memory addresses.  In contrast,
>>    the DSE pass can only handle memory references with exactly the same
>>    memory address expression.
>> c) It's cheap to support store-store chains in predcom based on existing code.
>> d) As commented, the enhancement can be viewed as either generalized dead
>>    store elimination or generalized store motion.  I prefer DSE here.
>>
>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
>
> Looks mostly ok.  I have a few questions though.
>
> +  /* Don't do store elimination if loop has multiple exit edges.  */
> +  bool eliminate_store_p = single_exit (loop) != NULL;
>
> handling this would be an enhancement?  IIRC LIM store-motion handles this
> just fine by emitting code on all exits.
It is an enhancement, with a little more complication: we would
need to set up/record finalizer memory references for the different exit
edges.  I added a TODO description for this (and the following one).  Is it
okay to pick this up in the future?

>
> @@ -1773,6 +2003,9 @@ determine_unroll_factor (vec chains)
>  {
>if (chain->type == CT_INVARIANT)
> continue;
> +  /* Don't unroll when eliminating stores.  */
> +  else if (chain->type == CT_STORE_STORE)
> +   return 1;
>
> this is a hard exit value so we do not handle the case where another chain
> in the loop would want to unroll? (enhancement?)  I'd have expected to do
> the same as for CT_INVARIANT here.
I didn't check what change is needed in case of unrolling.  I am not
sure whether we should prefer unrolling for *load chains or prefer not
unrolling for store-store chains, because unrolling in general increases
loop-carried register pressure for store-store chains, rather than
decreasing register pressure as it does for *load chains.
I was also wondering whether it's possible to restrict unrolling somehow in
order to enable predcom at O2.  BTW, this is not common; it only
happens once in spec2k6 with the factor forced to 1.  So is it okay
as it is now?

>
> +  tree init = ref_at_iteration (dr, (int) 0 - i, );
> +  if (!chain->all_always_accessed && tree_could_trap_p (init))
> +   {
> + gimple_seq_discard (stmts);
> + return false;
>
> so this is the only place that remotely cares for not always performed stores.
> But as-is the patch doesn't seem to avoid speculating stores and thus
> violates the C++ memory model, aka, introduces store-data-races?  The LIM
> store-motion code was fixed to avoid this by keeping track of whether a BB
> has executed to guard the stores done in the compensation code on the loop
> exit.
>
> That said, to "fix" this all && tree_could_trap_p cases would need to be removed
> (or similarly flag vars be introduced).  Speculating loads that do not trap is ok
> (might only introduce false uninit use reports by tools like valgrind).
Hmm, not sure IIUC.  Patch updated, is it correct (though conservative)?

Thanks,
bin
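The store-data-race concern above can be illustrated with a small C++ sketch of moving a conditional store out of a loop. The function names and the `writes` counter are hypothetical instrumentation; this models the flag-guard idea Richard describes for LIM store motion, not its actual code. Naively sinking the store to the loop exit writes `*p` even when no iteration stored to it, a store another thread could observe; guarding the exit store with a flag avoids introducing it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop: *p is written only on iterations where cond[i] holds.
void original (const std::vector<bool> &cond, int *p)
{
  for (std::size_t i = 0; i < cond.size (); ++i)
    if (cond[i])
      *p = static_cast<int> (i);
}

// Naive store motion: load before the loop, store unconditionally after it.
// The final value is the same single-threaded, but *p is always written,
// which is a newly introduced store (a potential data race).
void naive_store_motion (const std::vector<bool> &cond, int *p, int *writes)
{
  int tmp = *p;
  for (std::size_t i = 0; i < cond.size (); ++i)
    if (cond[i])
      tmp = static_cast<int> (i);
  *p = tmp;
  ++*writes;
}

// Flag-guarded store motion: store on exit only if some iteration stored.
void guarded_store_motion (const std::vector<bool> &cond, int *p, int *writes)
{
  int tmp = 0;
  bool stored = false;
  for (std::size_t i = 0; i < cond.size (); ++i)
    if (cond[i])
      {
        tmp = static_cast<int> (i);
        stored = true;
      }
  if (stored)
    {
      *p = tmp;
      ++*writes;
    }
}
```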
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>> 2017-06-21  Bin Cheng  
>>
>> * tree-predcom.c: Revise general description of pass.
>> (enum chain_type): New enum type for store elimination.
>> (struct chain): New field supporting store elimination.
>> (dump_chain): Dump store-stores chain.
>> 
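The store-store chain example from the patch description can be checked with a short C++ sketch (hypothetical function names; `a` must have at least `n + 2` elements). It compares the original loop, the form with rotated temporaries, and the fully simplified form, the last of which is only valid when the loop runs at least two iterations.

```cpp
#include <cassert>
#include <vector>

// Original loop from the patch description.
void original (std::vector<int> &a, int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i] = 1;
      a[i + 2] = 2;
    }
}

// Store-store chain form: the a[i + 2] stores are carried in t0/t1 and only
// the two values that survive the loop are written at the exit.
void store_chain (std::vector<int> &a, int n)
{
  int t0 = a[0];
  int t1 = a[1];
  for (int i = 0; i < n; i++)
    {
      a[i] = 1;
      int t2 = 2;
      t0 = t1;
      t1 = t2;
    }
  a[n] = t0;
  a[n + 1] = t1;
}

// Further simplified form; equivalent only for n >= 2.
void simplified (std::vector<int> &a, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = 1;
  a[n] = 2;
  a[n + 1] = 2;
}
```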

Re: [C++ PATCH] conversion operator names

2017-07-03 Thread Nathan Sidwell

On 06/30/2017 04:24 PM, Jason Merrill wrote:
> Suspense! :)

Only because I wanted to go home :)  I was abusing identifiers during
the streaming process, and had to clean them up anyway.  At that point
it's a simple question of:

  if (IDENTIFIER_CONV_OP_P (t))
    { write (tag_conv_op), write (tree_type (t)) }
  else
    { write (tag_ident), write_string (identifier_pointer (t)) }

with the obvious inverse operations on read back.  Then the streamer's
entirely agnostic about conv op names.
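A toy C++ model of that tagged scheme, under clearly simplified assumptions: all names are hypothetical, it streams a simplified identifier into a byte buffer instead of GCC trees, and it represents a conversion operator by the spelling of its target type, whereas the patch below streams the type node itself and rebuilds the name with `make_conv_op_name`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical tags, analogous to rt_identifier / rt_conv_identifier.
enum Tag : std::uint8_t { tag_ident = 1, tag_conv_op = 2 };

// Toy identifier: an ordinary name, or a conversion operator identified
// by (the spelling of) the type it converts to.
struct Ident
{
  bool conv_op;
  std::string text;  // the name, or the target type for conv_op
};

void write_string (std::vector<std::uint8_t> &out, const std::string &s)
{
  // Toy one-byte length prefix; assumes strings shorter than 256 bytes.
  out.push_back (static_cast<std::uint8_t> (s.size ()));
  out.insert (out.end (), s.begin (), s.end ());
}

std::string read_string (const std::vector<std::uint8_t> &in, std::size_t &pos)
{
  std::size_t len = in[pos++];
  std::string s (in.begin () + pos, in.begin () + pos + len);
  pos += len;
  return s;
}

// Write the tag first, then the payload for that kind of identifier.
void write_ident (std::vector<std::uint8_t> &out, const Ident &id)
{
  out.push_back (id.conv_op ? tag_conv_op : tag_ident);
  write_string (out, id.text);
}

// Inverse operation: the tag says how to interpret the payload.
Ident read_ident (const std::vector<std::uint8_t> &in, std::size_t &pos)
{
  std::uint8_t tag = in[pos++];
  Ident id;
  id.conv_op = (tag == tag_conv_op);
  id.text = read_string (in, pos);
  return id;
}
```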


I attach the patch I committed to the modules branch, if you're really 
curious.


nathan

--
Nathan Sidwell
2017-07-03  Nathan Sidwell  

	gcc/cp/
	* module.c (cpms_{out,in}::start): Don't deal with identifiers
	here.
	(cpms_{out,in}::tree_node): Deal with identifiers specially.

Index: gcc/cp/module.c
===
--- gcc/cp/module.c	(revision 249920)
+++ gcc/cp/module.c	(revision 249921)
@@ -851,6 +851,8 @@ public:
 rt_import,		/* An import. */
 rt_binding,		/* A name-binding.  */
 rt_definition,	/* A definition. */
+rt_identifier,	/* An identifier node.  */
+rt_conv_identifier,	/* A conversion operator name.  */
 rt_trees,		/* Global trees.  */
 rt_type_name,	/* A type name.  */
 rt_typeinfo_var,	/* A typeinfo object.  */
@@ -2368,7 +2370,7 @@ cpms_out::start (tree_code code, tree t)
 	w.u (VL_EXP_OPERAND_LENGTH (t));
   break;
 case IDENTIFIER_NODE:
-  w.str (IDENTIFIER_POINTER (t), IDENTIFIER_LENGTH (t));
+  gcc_unreachable ();
   break;
 case TREE_BINFO:
   w.u (BINFO_N_BASE_BINFOS (t));
@@ -2411,14 +2413,13 @@ cpms_in::start (tree_code code)
 	t = make_node (code);
   break;
 case IDENTIFIER_NODE:
+  gcc_unreachable ();
+  break;
 case STRING_CST:
   {
 	size_t l;
 	const char *str = r.str ();
-	if (code == IDENTIFIER_NODE)
-	  t = get_identifier_with_length (str, l);
-	else
-	  t = build_string (l, str);
+	t = build_string (l, str);
   }
   break;
 case TREE_BINFO:
@@ -3928,6 +3929,27 @@ cpms_out::tree_node (tree t)
   return;
 }
 
+  if (TREE_CODE (t) == IDENTIFIER_NODE)
+{
+  /* An identifier node.  Stream the name or type.  */
+  bool conv_op = IDENTIFIER_CONV_OP_P (t);
+
+  w.u (conv_op ? rt_conv_identifier : rt_identifier);
+  if (conv_op)
+	{
+	  t = TREE_TYPE (t);
+	  tree_node (t);
+	}
+  else
+	w.str (IDENTIFIER_POINTER (t), IDENTIFIER_LENGTH (t));
+  unsigned tag = insert (t);
+  dump () && dump ("Written:%u %sidentifier:%N",
+		   tag, conv_op ? "conv_op_" : "", t);
+  unnest ();
+  return;
+}
+
+  /* Generic node streaming.  */
   tree_code code = TREE_CODE (t);
   tree_code_class klass = TREE_CODE_CLASS (code);
   gcc_assert (rt_tree_base + code < rt_ref_base);
@@ -3935,10 +3957,8 @@ cpms_out::tree_node (tree t)
   unique++;
   w.u (rt_tree_base + code);
 
-  int body = 1;
-  if (code == IDENTIFIER_NODE)
-body = 0;
-  else if (klass == tcc_declaration)
+  bool body = true;
+  if (klass == tcc_declaration)
 {
   /* Write out ctx, name & maybe import reference info.  */
   tree_node (DECL_CONTEXT (t));
@@ -3952,11 +3972,11 @@ cpms_out::tree_node (tree t)
 	  ident_imported_decl (CP_DECL_CONTEXT (t), node_module, t);
 	  dump () && dump ("Writing imported %N@%I", t,
 			   module_name (node_module));
-	  body = -1;
+	  body = false;
 	}
 }
 
-  if (body >= 0)
+  if (body)
 start (code, t);
 
   unsigned tag = insert (t);
@@ -3964,9 +3984,9 @@ cpms_out::tree_node (tree t)
 		   klass == tcc_declaration && DECL_MODULE_EXPORT_P (t)
 		   ? " (exported)": "");
 
-  if (body > 0)
+  if (body)
 tree_node_raw (code, t);
-  else if (body < 0 && TREE_TYPE (t))
+  else if (TREE_TYPE (t))
 {
   tree type = TREE_TYPE (t);
   bool existed;
@@ -4073,6 +4093,25 @@ cpms_in::tree_node ()
   unnest ();
   return res;
 }
+  else if (tag == rt_identifier)
+{
+  size_t l;
+  const char *str = r.str ();
+  tree id = get_identifier_with_length (str, l);
+  tag = insert (id);
+  dump () && dump ("Read:%u identifier:%N", tag, id);
+  unnest ();
+  return id;
+}
+  else if (tag == rt_conv_identifier)
+{
+  tree t = tree_node ();
+  tree id = make_conv_op_name (t);
+  tag = insert (id);
+  dump () && dump ("Read:%u conv_op_identifier:%N", tag, t);
+  unnest ();
+  return id;
+}
   else if (tag < rt_tree_base || tag >= rt_tree_base + MAX_TREE_CODES)
 {
   error (tag < rt_tree_base ? "unexpected key %qd"
@@ -4086,15 +4125,13 @@ cpms_in::tree_node ()
   tree_code_class klass = TREE_CODE_CLASS (code);
   tree t = NULL_TREE;
 
-  int body = 1;
+  bool body = true;
   tree name = NULL_TREE;
   tree ctx = NULL_TREE;
   int node_module = -1;
   int set_module = -1;
 
-  if (code == IDENTIFIER_NODE)
-body = 0;
-  else if (klass == tcc_declaration)
+  if (klass == tcc_declaration)

Re: [PATCH, 2/4] Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin

2017-07-03 Thread Thomas Schwinge
Hi!

On Mon, 26 Jun 2017 17:29:11 +0200, Jakub Jelinek  wrote:
> On Mon, Jun 26, 2017 at 03:26:57PM +, Joseph Myers wrote:
> > On Mon, 26 Jun 2017, Tom de Vries wrote:
> > 
> > > > 2. Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin
> > > 
> > > This patch adds handling of:
> > > - GOMP_OPENACC_NVPTX_SAVE_TEMPS=[01], and
> > > - GOMP_OPENACC_NVPTX_DISASM=[01]

Why the "OPENACC" in these names?  Doesn't this debugging aid apply to
any variant of offloading?

> > > The filename used for dumping the module is plugin-nvptx..cubin.

Also, I suggest to make these names similar to their controlling options,
that is: "gomp-nvptx*", for example.

> > Are you sure this use of getenv and writing to that file is safe for 
> > setuid/setgid programs?  I'd expect you to need to use secure_getenv as in 
> > plugin-hsa.c; certainly for anything that could result in writes to a 
> > file like that.
> 
> Yeah, definitely it should be using secure_getenv/__secure_getenv.

ACK.

> And IMNSHO GOMP_DEBUG too.

But why that?  Isn't GOMP_DEBUG just controlling terminal debugging
output (that you'd also like to see in setuid/setgid programs)?


Grüße
 Thomas
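For illustration, the `secure_getenv` approach suggested above might look like the following C++ sketch. `read_debug_flag`, `lookup_env` and the variable name used in the test are hypothetical, not libgomp code. `secure_getenv` is a glibc extension that returns NULL in secure-execution (e.g. setuid/setgid) processes, so the debugging aid, and any file it would write, is disabled there; the non-glibc fallback to plain `getenv` gives no such protection and is only there to keep the sketch portable.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Look up NAME, refusing to honour it in secure-execution processes
// when the glibc extension secure_getenv is available.
static const char *
lookup_env (const char *name)
{
#if defined(__GLIBC__)
  return secure_getenv (name);
#else
  // Assumption: plain getenv as a fallback; NOT safe for setuid programs.
  return std::getenv (name);
#endif
}

// Return true if the debugging variable NAME is set to exactly "1".
static bool
read_debug_flag (const char *name)
{
  const char *val = lookup_env (name);
  return val != nullptr && std::strcmp (val, "1") == 0;
}
```

In an ordinary (non-setuid) process, setting the variable to "1" makes `read_debug_flag` return true; any other value, or leaving it unset, returns false.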


Re: MAINTAINERS update

2017-07-03 Thread Bernd Schmidt
On 06/11/2017 08:03 PM, Gerald Pfeifer wrote:
> On Tue, 30 May 2017, Bernd Schmidt wrote:
>> On 05/30/2017 09:05 AM, Richard Biener wrote:
>>> This leaves the nvptx and c6x ports without a maintainer.  Do 
>>> you have any recommendations for a successor here?
>> Not really. It would be a shame to lose the C6X port though. If I'm 
>> CC'd on any bug reports I'm prepared to keep it working - if that's
>> considered sufficient, I can readd myself as maintainer.
> 
> I think that would be preferable.  Even if practically it may
> not make a huge difference, people with less background/involvement
> will know who to contact, and having an entire port without maintainer
> just doesn't feel right.

I've done that now.


Bernd
Index: MAINTAINERS
===
--- MAINTAINERS	(revision 249919)
+++ MAINTAINERS	(working copy)
@@ -49,6 +49,7 @@ arm port		Richard Earnshaw	
 avr port		Denis Chertykov		
 bfin port		Jie Zhang		
+c6x port		Bernd Schmidt		
 cris port		Hans-Peter Nilsson	
 epiphany port		Joern Rennecke		
 fr30 port		Nick Clifton		
Index: ChangeLog
===
--- ChangeLog	(revision 249919)
+++ ChangeLog	(working copy)
@@ -1,3 +1,7 @@
+2017-07-03  Bernd Schmidt  
+
+	* MAINTAINERS: Readd myself for c6x.
+
 2017-06-28  Martin Liska  
 
 	PR bootstrap/81217


[PATCH][OBVIOUS] Fix a test-case by adding dg-require.

2017-07-03 Thread Martin Liška
Hi.

I'm going to install the following obvious patch that adds the missing dg-require-ifunc.

Martin

gcc/testsuite/ChangeLog:

2017-07-03  Martin Liska  

* gcc.target/i386/mvc6.c: Add requirement for ifunc.
---
 gcc/testsuite/gcc.target/i386/mvc6.c | 1 +
 1 file changed, 1 insertion(+)


diff --git a/gcc/testsuite/gcc.target/i386/mvc6.c b/gcc/testsuite/gcc.target/i386/mvc6.c
index d584f573328..af631394980 100644
--- a/gcc/testsuite/gcc.target/i386/mvc6.c
+++ b/gcc/testsuite/gcc.target/i386/mvc6.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-ifunc "" } */
 /* { dg-options "-O3" } */
 /* { dg-final { scan-assembler "vpshufb" } } */
 /* { dg-final { scan-assembler "punpcklbw" } } */



Re: [PATCH] Fix removal of ifunc (PR ipa/81214).

2017-07-03 Thread Rainer Orth
Hi Martin,

> The following patch fixes the issue where we do not emit the ifunc and resolver
> for functions that are neither called in a compilation unit nor
> referenced.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> i386.exp tests work on x86_64-linux-gnu.

your patch caused a testsuite regression on various targets:

FAIL: gcc.target/i386/mvc6.c (test for excess errors)
UNRESOLVED: gcc.target/i386/mvc6.c scan-assembler punpcklbw
UNRESOLVED: gcc.target/i386/mvc6.c scan-assembler vpshufb

Excess errors:
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/mvc6.c:8:1: error: the call requires ifunc, which is not supported by this target

I'm seeing it on i386-pc-solaris2.12, Dominique reported it in the PR on
Darwin/x86, and there are also testsuite results on FreeBSD/x86.

Unlike most other __attribute__((target_clones)) tests, this one lacked
{ dg-require-ifunc "" } and didn't need it before.

The following patch fixes this.  Tested with the appropriate runtest
invocation on i386-pc-solaris2.12 and x86_64-pc-linux-gnu, installed on
mainline.

While I was at it, I checked the other testcases with
__attribute__((target_clones)):

g++.dg/ext/mvc1.C   dg-require-ifunc ""
g++.dg/ext/mvc2.C   00 (dg-warning)
g++.dg/ext/mvc3.C   00 (dg-warning)
g++.dg/ext/mvc4.C   dg-require-ifunc ""
gcc.dg/tree-prof/pr66295.c  dg-require-ifunc ""
gcc.target/i386/mvc1.c  dg-require-ifunc ""
gcc.target/i386/mvc2.c  00
gcc.target/i386/mvc3.c  00 (dg-error)
gcc.target/i386/mvc4.c  dg-require-ifunc ""
gcc.target/i386/mvc5.c  dg-require-ifunc ""
gcc.target/i386/mvc6.c  00
gcc.target/i386/mvc7.c  dg-require-ifunc ""
gcc.target/i386/mvc8.c  dg-require-ifunc ""
gcc.target/i386/mvc9.c  dg-require-ifunc ""
gcc.target/i386/pr78419.c   dg-require-ifunc ""
gcc.target/i386/pr80732.c   dg-require-ifunc ""
gcc.target/i386/pr81214.c   dg-require-ifunc ""
gcc.target/powerpc/clone1.c powerpc*-*-linux* && lp64

Of those without dg-require-ifunc, the powerpc one is (sort of) ok since
it's restricted to Linux, and those with dg-warning/dg-error are too
since the warning is emitted before the error about missing ifunc
support.  That leaves us with gcc.target/i386/mvc2.c, which is sort of
weird because it emits no code at all.  No idea if this is intended,
though.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2017-07-03  Rainer Orth  

* gcc.target/i386/mvc6.c: Require ifunc support.

# HG changeset patch
# Parent  69f342cb37ffd9b438e9a09ca3ad5692c2aa1dec
Require ifunc support in gcc.target/i386/mvc6.c

diff --git a/gcc/testsuite/gcc.target/i386/mvc6.c b/gcc/testsuite/gcc.target/i386/mvc6.c
--- a/gcc/testsuite/gcc.target/i386/mvc6.c
+++ b/gcc/testsuite/gcc.target/i386/mvc6.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-ifunc "" } */
 /* { dg-options "-O3" } */
 /* { dg-final { scan-assembler "vpshufb" } } */
 /* { dg-final { scan-assembler "punpcklbw" } } */


[PATCH][2/2] PR60510, reduction chain vectorization w/o SLP

2017-07-03 Thread Richard Biener

The following is the patch enabling non-SLP vectorization of failed SLP
reduction chains.  It simply dissolves the group composing the SLP
reduction chain when vect_analyze_slp fails to detect the SLP and then
fixes up the remaining pieces in reduction vectorization.
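For readers new to the term, here is a C rendering of the loop in the new Fortran testcase. It is illustrative only, and the function names are hypothetical. Each iteration adds into the accumulator twice, once with x[i] * y[i] and once with x[i], which forms a two-statement reduction chain. The inputs are x(i) = i and y(i) = i + 1 for i = 1..1024, and the accumulator starts at zero. Every partial sum is then an integer well below 2^53, so the double arithmetic is exact. The result equals the sum of i^2 + 2i, which is 359488000, the value pr60510.f checks.

```c
#include <stddef.h>

/* One iteration performs two additions into the same accumulator:
   a = (a + x[i] * y[i]) + x[i].  This chain of two statements feeding
   each other is the "reduction chain" shape the testcase exercises.  */
double
reduction_chain (double a, const double *x, const double *y, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a = a + x[i] * y[i] + x[i];
  return a;
}

/* Inputs as in the Fortran testcase (1-based there): x(i) = i and
   y(i) = i + 1 for i = 1..1024, with the accumulator starting at 0.  */
double
pr60510_expected (void)
{
  double x[1024], y[1024];
  for (size_t i = 0; i < 1024; i++)
    {
      x[i] = (double) (i + 1);
      y[i] = (double) (i + 2);
    }
  return reduction_chain (0.0, x, y, 1024);
}
```

Because the sums are exact, any reassociation the vectorizer performs (which -ffast-math permits) cannot change this particular result.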

I've made sure that SPEC CPU 2006 is clean on x86_64 (-Ofast
-march=haswell, test run only) and gathered some statistics:
-fopt-info-vec shows 2220 more vectorized loops (out of a new total
of 13483), which is a nice improvement of 15%.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

One day left to fix fallout before I leave for vacation.

Richard.

2017-07-03  Richard Biener  

PR tree-optimization/60510
* tree-vect-loop.c (vect_create_epilog_for_reduction): Pass in
the scalar reduction PHI and use it.
(vectorizable_reduction): Properly guard the single_defuse_cycle
path for non-SLP reduction chains where we cannot use it.
Rework reduc_def/index and vector type deduction.  Rework
vector operand gathering during reduction op code-gen.
* tree-vect-slp.c (vect_analyze_slp): For failed SLP reduction
chains dissolve the chain and leave it to non-SLP reduction
handling.

* gfortran.dg/vect/pr60510.f: New testcase.

Index: gcc/testsuite/gfortran.dg/vect/pr60510.f
===
--- gcc/testsuite/gfortran.dg/vect/pr60510.f(nonexistent)
+++ gcc/testsuite/gfortran.dg/vect/pr60510.f(working copy)
@@ -0,0 +1,29 @@
+! { dg-do run }
+! { dg-additional-options "-fno-inline -ffast-math" }
+      subroutine foo(a,x,y,n)
+      implicit none
+      integer n,i
+
+      real*8 y(n),x(n),a
+
+      do i=1,n
+         a=a+x(i)*y(i)+x(i)
+      enddo
+
+      return
+      end
+
+      program test
+      real*8 x(1024),y(1024),a
+      do i=1,1024
+        x(i) = i
+        y(i) = i+1
+      enddo
+      call foo(a,x,y,1024)
+      if (a.ne.359488000.0) call abort()
+      end
+! If there's no longer a reduction chain detected this doesn't test what
+! it was supposed to test, vectorizing a reduction chain w/o SLP.
+! { dg-final { scan-tree-dump "reduction chain" "vect" } }
+! We should vectorize the reduction in foo and the induction in test.
+! { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } }
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 249902)
+++ gcc/tree-vect-loop.c(working copy)
@@ -4313,6 +4313,7 @@ get_initial_defs_for_reduction (slp_tree
 
 static void
 vect_create_epilog_for_reduction (vec vect_defs, gimple *stmt,
+ gimple *reduc_def_stmt,
  int ncopies, enum tree_code reduc_code,
  vec reduction_phis,
   int reduc_index, bool double_reduc, 
@@ -4401,9 +4402,8 @@ vect_create_epilog_for_reduction (vec vec_oprnds0;
   auto_vec vec_oprnds1;
+  auto_vec vec_oprnds2;
   auto_vec vect_defs;
   auto_vec phis;
   int vec_num;
@@ -5643,8 +5642,6 @@ vectorizable_reduction (gimple *stmt, gi
   gimple *reduc_stmt = STMT_VINFO_REDUC_DEF (stmt_info);
   if (STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (reduc_stmt)))
reduc_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (reduc_stmt));
-  if (STMT_VINFO_RELEVANT (vinfo_for_stmt (reduc_stmt)) <= vect_used_only_live)
-   single_defuse_cycle = true;
 
   gcc_assert (is_gimple_assign (reduc_stmt));
   for (unsigned k = 1; k < gimple_num_ops (reduc_stmt); ++k)
@@ -5666,6 +5663,17 @@ vectorizable_reduction (gimple *stmt, gi
ncopies = (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
   / TYPE_VECTOR_SUBPARTS (vectype_in));
 
+  use_operand_p use_p;
+  gimple *use_stmt;
+  if (ncopies > 1
+ && (STMT_VINFO_RELEVANT (vinfo_for_stmt (reduc_stmt))
+ <= vect_used_only_live)
+ && single_imm_use (gimple_phi_result (stmt), &use_p, &use_stmt)
+ && (use_stmt == reduc_stmt
+ || (STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt))
+ == reduc_stmt)))
+   single_defuse_cycle = true;
+
   /* Create the destination vector  */
   scalar_dest = gimple_assign_lhs (reduc_stmt);
   vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
@@ -5769,10 +5777,6 @@ vectorizable_reduction (gimple *stmt, gi
 default:
   gcc_unreachable ();
 }
-  /* The default is that the reduction variable is the last in statement.  */
-  int reduc_index = op_type - 1;
-  if (code == MINUS_EXPR)
-reduc_index = 0;
 
   if (code == COND_EXPR && slp_node)
 return false;
@@ -5792,22 +5796,30 @@ vectorizable_reduction (gimple *stmt, gi
  The last use is the reduction variable.  In case of nested cycle this
  assumption is not true: we use reduc_index to record the index of the
   

Re: [PATCH] Use secure_getenv for GOMP_DEBUG

2017-07-03 Thread Tom de Vries

On 07/03/2017 02:26 PM, Franz Sirl wrote:

Am 27.06.17 um 13:10 schrieb Tom de Vries:

--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -39,32 +39,7 @@
  #include 
  #include "libgomp-plugin.h"
  #include "gomp-constants.h"
-
-/* Secure getenv() which returns NULL if running as SUID/SGID.  */
-#ifndef HAVE_SECURE_GETENV
-#ifdef HAVE___SECURE_GETENV
-#define secure_getenv __secure_getenv
-#elif defined (HAVE_UNISTD_H) && defined(HAVE_GETUID) && defined(HAVE_GETEUID) \
-  && defined(HAVE_GETGID) && defined(HAVE_GETEGID)
-
-#include 
-
-/* Implementation of secure_getenv() for targets where it is not provided but
-   we have at least means to test real and effective IDs. */
-
-static char *
-secure_getenv (const char *name)
-{
-  if ((getuid () == geteuid ()) && (getgid () == getegid ()))
-return getenv (name);
-  else
-return NULL;
-}
-
-#else
-#define secure_getenv getenv
-#endif
-#endif
+#include "secure-getenv.h"


Hi,

that should be secure_getenv.h (underscore instead of dash).


Hi Franz,

sorry for the breakage.

Fixed in attached patch.

Committed.

Thanks,
- Tom

Fix secure_getenv.h include in plugin-hsa.c

2017-07-03  Tom de Vries  

	* plugin/plugin-hsa.c: Fix secure_getenv.h include.

---
 libgomp/plugin/plugin-hsa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
index adb07ac..fc08f5d 100644
--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -39,7 +39,7 @@
 #include 
 #include "libgomp-plugin.h"
 #include "gomp-constants.h"
-#include "secure-getenv.h"
+#include "secure_getenv.h"
 
 /* As an HSA runtime is dlopened, following structure defines function
pointers utilized by the HSA plug-in.  */
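The block this series removed from plugin-hsa.c implemented a simple policy: consult the environment only when the real and effective user and group IDs agree, that is, when the process is not running set-user-ID or set-group-ID. The contents of secure_getenv.h are not shown here. The following is therefore only a standalone sketch of that policy. The function name sketch_secure_getenv is hypothetical. With glibc 2.17 or later, the real secure_getenv is available under _GNU_SOURCE. It also returns NULL whenever the kernel marks the process as AT_SECURE, so comparing IDs is only an approximation of it.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical fallback: refuse to read the environment when real and
   effective IDs differ (a set-user-ID or set-group-ID process); otherwise
   behave exactly like getenv.  */
char *
sketch_secure_getenv (const char *name)
{
  if (getuid () == geteuid () && getgid () == getegid ())
    return getenv (name);
  return NULL;
}
```

In an ordinary (non-setuid) process this returns exactly what getenv returns, which is why silently falling back to plain getenv, as the last #else branch of the removed code did, only loses protection on setuid/setgid binaries.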


[patch][arm] Clean up generation of BE8 format images.

2017-07-03 Thread Richard Earnshaw (lists)
The existing code in arm/bpabi.h was quite fragile and relied on matching
specific CPU and/or architecture names.  The introduction of the new option
format for -mcpu and -march broke that in a way that would be non-trivial
to fix by updating the list.  The hook in that file was always a pain,
as every new CPU being added required an update here as well
(easy to miss).

I've fixed that problem once and for all by adding a new callback into
the driver to select the correct BE8 behaviour.  This uses features in
the ISA capabilities list to select whether or not to use BE8 format
during linking.

I also noticed that if the user happened to pass both -mbig-endian and
-mlittle-endian on the command line then the linker spec rules would
get somewhat confused and potentially do the wrong thing.  I've fixed that
by marking these options as opposites in the option descriptions.  The
driver will now automatically suppress overridden options, leading to the
correct behavior.

Whilst fixing this I noticed a couple of anomalous cases in the
existing BE8 support: we were not generating BE8 format for ARMv6 or
ARMv7-R targets.  While the ARMv6 status was probably deliberate at
the time, this is probably not a good idea in the long term as the
alternative, BE32, has been deprecated by ARM.  After discussion with
a couple of colleagues I've decided to change this, but also to add an
option that lets users restore the existing behaviour.  This patch
therefore introduces two new options (opposites): -mbe8 and -mbe32.

This is a quiet behavior change, so I'll add a comment to the release
notes shortly.
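The decision order of the new driver callback can be modelled outside GCC. The sketch below follows the argument spellings documented in the arm_be8_option comment: "little", "big", "be8" and "arch" followed by a name. Little-endian never gets --be8. A forced be8 always gets it. A missing architecture gets nothing. Otherwise the architecture's BE8 capability decides. This is only an illustration, and the function names are hypothetical. The real code looks the architecture up with arm_parse_arch_option_name and tests isa_bit_be8. Here that lookup is replaced by a crude prefix table that treats ARMv6 and later as BE8, reflecting the patch adding isa_bit_be8 to ISA_ARMv6. The real function also asserts on unknown arguments, whereas the sketch ignores them.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the isa_bit_be8 lookup: a simplified prefix
   match meaning "ARMv6 or later".  Not GCC's architecture table.  */
static bool
sketch_arch_uses_be8 (const char *arch)
{
  static const char *const prefixes[] = { "armv6", "armv7", "armv8" };
  for (size_t i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++)
    if (strncmp (arch, prefixes[i], strlen (prefixes[i])) == 0)
      return true;
  return false;
}

/* Returns the linker option to add: "--be8" or "".  */
const char *
sketch_be8_option (int argc, const char **argv, bool big_endian_default)
{
  bool big = big_endian_default;
  bool force = false;
  const char *arch = NULL;

  for (int i = 0; i < argc; i++)
    {
      if (strcmp (argv[i], "little") == 0)
        big = false;
      else if (strcmp (argv[i], "big") == 0)
        big = true;
      else if (strcmp (argv[i], "be8") == 0)
        force = true;
      else if (strcmp (argv[i], "arch") == 0 && i + 1 < argc)
        arch = argv[++i];           /* "arch" takes the next argument.  */
    }

  if (!big)
    return "";                      /* Little-endian: never BE8.  */
  if (force)
    return "--be8";                 /* BE8 explicitly requested.  */
  if (arch == NULL)
    return "";                      /* No usable architecture: do nothing.  */
  return sketch_arch_uses_be8 (arch) ? "--be8" : "";
}
```

Checking endianness first means a later -mlittle-endian always wins, which is also why marking -mbig-endian and -mlittle-endian as opposites (so the driver drops the overridden one) matters for the spec rules.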

* common/config/arm/arm-common.c (arm_be8_option): New function.
* config/arm/arm-isa.h (isa_feature): Add new feature bit isa_bit_be8.
(ISA_ARMv6): Add isa_bit_be8.
* config/arm/arm.h (arm_be8_option): Add prototype.
(BE8_SPEC_FUNCTION): New define.
(EXTRA_SPEC_FUNCTIONS): Add BE8_SPEC_FUNCTION.
* config/arm/arm.opt (mbig-endian): Mark as Negative of mlittle-endian.
(mlittle-endian): Similarly.
(mbe8, mbe32): New options.
* config/arm/bpabi.h (BE8_LINK_SPEC): Call arm_be8_option.
* doc/invoke.texi (ARM Options): Document -mbe8 and -mbe32.
diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c
index d06c39b..b6244d6 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -761,6 +761,63 @@ arm_canon_arch_option (int argc, const char **argv)
   return canonical_arch;
 }
 
+/* If building big-endian on a BE8 target generate a --be8 option for
+   the linker.  Takes four types of option: "little" - little-endian;
+   "big" - big-endian; "be8" - force be8 iff big-endian; and "arch"
+   "" (two arguments) - the target architecture.  The
+   parameter names are generated by the driver from the command-line
+   options.  */
+const char *
+arm_be8_option (int argc, const char **argv)
+{
+  int endian = TARGET_ENDIAN_DEFAULT;
+  const char *arch = NULL;
+  int arg;
+  bool force = false;
+
+  for (arg = 0; arg < argc; arg++)
+{
+  if (strcmp (argv[arg], "little") == 0)
+	endian = 0;
+  else if (strcmp (argv[arg], "big") == 0)
+	endian = 1;
+  else if (strcmp (argv[arg], "be8") == 0)
+	force = true;
+  else if (strcmp (argv[arg], "arch") == 0)
+	{
+	  arg++;
+	  gcc_assert (arg < argc);
+	  arch = argv[arg];
+	}
+  else
+	gcc_unreachable ();
+}
+
+  /* Little endian - no be8 option.  */
+  if (!endian)
+return "";
+
+  if (force)
+return "--be8";
+
+  /* Arch might not be set iff arm_canon_arch (above) detected an
+ error.  Do nothing in that case.  */
+  if (!arch)
+return "";
+
+  const arch_option *selected_arch
+= arm_parse_arch_option_name (all_architectures, "-march", arch);
+
+  /* Similarly if the given arch option was itself invalid.  */
+  if (!selected_arch)
+return "";
+
+  if (check_isa_bits_for (selected_arch->common.isa_bits, isa_bit_be8))
+return "--be8";
+
+  return "";
+}
+
 #undef ARM_CPU_NAME_LENGTH
 
 
diff --git a/gcc/config/arm/arm-isa.h b/gcc/config/arm/arm-isa.h
index 4b5a0f6..c0c2cce 100644
--- a/gcc/config/arm/arm-isa.h
+++ b/gcc/config/arm/arm-isa.h
@@ -40,7 +40,8 @@ enum isa_feature
 isa_bit_ARMv6,	/* Architecture rel 6.  */
 isa_bit_ARMv6k,	/* Architecture rel 6k.  */
 isa_bit_thumb2,	/* Thumb-2.  */
-isa_bit_notm,	/* Instructions that are not present in 'M' profile.  */
+isa_bit_notm,	/* Instructions not present in 'M' profile.  */
+isa_bit_be8,	/* Architecture uses be8 mode in big-endian.  */
 isa_bit_tdiv,	/* Thumb division instructions.  */
 isa_bit_ARMv7em,	/* Architecture rel 7e-m.  */
 isa_bit_ARMv7,	/* Architecture rel 7.  */
@@ -101,7 +102,7 @@ enum isa_feature
 #define ISA_ARMv5e	ISA_ARMv5, isa_bit_ARMv5e
 #define ISA_ARMv5te	ISA_ARMv5e, isa_bit_thumb
 #define ISA_ARMv5tej	ISA_ARMv5te
-#define ISA_ARMv6	ISA_ARMv5te, isa_bit_ARMv6
+#define 

Re: [PATCH 2/3] Simplify wrapped binops

2017-07-03 Thread Richard Biener
On Wed, Jun 28, 2017 at 4:34 PM, Robin Dapp wrote:
>> ideally you'd use a wide-int here and defer the tree allocation to the result
>
> Did that in the attached version.
>
>> So I guess we never run into the outer_op == minus case as the above is
>> clearly wrong for that?
>
> Right, damn, not only was the treatment for this missing but it was
> bogus in the other pattern as well.  Since we are mostly dealing with
> PLUS_EXPR anyways it's probably better to defer the MINUS_EXPR case for
> now.  This will also slim down the patterns a bit.
>
>> try to keep vertical spacing in patterns minimal -- I believe that patterns
>> should be small enough to fit in a terminal window (24 lines).
>
> I find using the expanded wrapped_range condition in the simplification
> somewhat cumbersome, especially because I need the condition to evaluate
> to true by default making the initialization unintuitive.  Yet, I guess
> setting wrapped_range = true was not terribly intuitive either...

+ /* Perform binary operation inside the cast if the constant fits
+and (A + CST)'s range does not wrap.  */
+ (with
+  {
+bool min_ovf = true, max_ovf = false;

While the initialization value doesn't matter (wi::add will overwrite it)
better initialize both to false ;)  Ah, you mean because we want to
transform only if get_range_info returned VR_RANGE.  Indeed somewhat
unintuitive (but still the best variant for now).

+wide_int w1 = @1;
+w1 = w1.from (w1, TYPE_PRECISION (inner_type), TYPE_SIGN
+   (inner_type));

I think wi::from (@1, ) should work as well.

+ (if (!((min_ovf && !max_ovf) || (!min_ovf && max_ovf)) )
+  (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; })))

so I'm still missing a comment on why min_ovf && max_ovf is ok.
The simple-minded would have written

   (if  (! min_ovf && ! max_ovf)
...

I'd like to see testcase(s) with this patch, preferably exactly also for the
case of min_ovf && max_ovf.  That said, consider (long)[0xfffffffe,
0xffffffff] + 2 which should have min_ovf and max_ovf, which results in
[0x0, 0x1] in type unsigned int but [0x100000000, 0x100000001] in type long.

Richard.

> Regards
>  Robin
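Richard's counterexample can be checked in plain C. This assumes 32-bit unsigned int and 64-bit long, with uint32_t and int64_t standing in for them. Adding 2 to the range [0xfffffffe, 0xffffffff] wraps at both ends in the narrow type, so both min_ovf and max_ovf would be set. Yet moving the addition inside the conversion changes the values. This is why the min_ovf && max_ovf case needs a justification and a testcase. The functions are hypothetical illustrations, not match.pd code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original expression: widen A to a signed 64-bit type, then add CST.  */
int64_t
widen_then_add (uint32_t a, uint32_t cst)
{
  return (int64_t) a + (int64_t) cst;
}

/* Candidate folded form: add inside the 32-bit unsigned type (which may
   wrap), then widen the result.  */
int64_t
add_then_widen (uint32_t a, uint32_t cst)
{
  return (int64_t) (uint32_t) (a + cst);
}

/* Does A + CST wrap in the 32-bit unsigned type?  Evaluated at the two
   ends of a value range this gives the min_ovf / max_ovf flags.  */
bool
add_wraps (uint32_t a, uint32_t cst)
{
  return (uint32_t) (a + cst) < a;
}
```

For a = 0xfffffffe the two forms give 0x100000000 and 0, and for a = 0xffffffff they give 0x100000001 and 1. Only when neither end wraps do the two forms agree for every value in the range.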


Re: [RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-07-03 Thread Richard Biener
On Sat, 1 Jul 2017, Marc Glisse wrote:

> On Thu, 22 Jun 2017, Richard Biener wrote:
> 
> > On Thu, 22 Jun 2017, Marc Glisse wrote:
> > 
> > > On Thu, 22 Jun 2017, Richard Biener wrote:
> > > 
> > > > > If we consider pointers as unsigned, with a subtraction that has a
> > > > > signed result with the constraint that overflow is undefined, we
> > > > > cannot model that optimally with just the usual signed/unsigned
> > > > > operations, so I am in favor of POINTER_DIFF, at least in the long
> > > > > run (together with having a signed second argument for POINTER_PLUS,
> > > > > etc). For 64-bit platforms it might have been easier to declare that
> > > > > the upper half (3/4 ?) of the address space doesn't exist...
> > > > 
> > > > I repeatedly thought of POINTER_DIFF_EXPR but adding such a basic tree
> > > > code is quite a big job.
> > > 
> > > Yes :-(
> > > It is probably not realistic to introduce it just to avoid a couple
> > > regressions while fixing a bug.
> > > 
> > > > So we'd have POINTER_DIFF_EXPR take two pointer typed args and produce
> > > > ptrdiff_t.  What's the advantage of having this?
> > > 
> > > It represents q-p with one statement instead of 3 (long)q-(long)p or 4
> > > (long)((ulong)q-(ulong)p). It allows us to stay in the pointer world, so
> > > (q-p)>0 is equivalent to p > > what (undefined) overflow means for pointers.
> > > 
> > > Of course it is hard to know in advance if that's significant or
> > > negligible, maybe size_t finds its way in too many places anyway.
> > 
> > As with all those experiments ...
> > 
> > Well, if I would sell this as a consultant to somebody I'd estimate
> > 3 man months for this work which realistically means you have to
> > start now otherwise you won't make it this stage 1.
> 
> I wrote a quick prototype to see what the fallout would look like.
> Surprisingly, it actually passes bootstrap+testsuite on ppc64el with all
> languages with no regression. Sure, it is probably not a complete
> migration, there are likely a few places still converting to ptrdiff_t
> to perform a regular subtraction, but this seems to indicate that the
> work isn't as bad as using a signed type in pointer_plus_expr for
> instance.

The fold_binary_loc hunk looks dangerous (it'll generate MINUS_EXPR
from POINTER_MINUS_EXPR in some cases I guess).

The tree code needs documenting in tree.def and generic.texi.

Otherwise ok(*).

Thanks,
Richard.

(*) ok, just kidding -- or maybe not


Re: [PATCH] Use secure_getenv for GOMP_DEBUG

2017-07-03 Thread Franz Sirl

Am 27.06.17 um 13:10 schrieb Tom de Vries:

--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -39,32 +39,7 @@
  #include 
  #include "libgomp-plugin.h"
  #include "gomp-constants.h"
-
-/* Secure getenv() which returns NULL if running as SUID/SGID.  */
-#ifndef HAVE_SECURE_GETENV
-#ifdef HAVE___SECURE_GETENV
-#define secure_getenv __secure_getenv
-#elif defined (HAVE_UNISTD_H) && defined(HAVE_GETUID) && defined(HAVE_GETEUID) \
-  && defined(HAVE_GETGID) && defined(HAVE_GETEGID)
-
-#include 
-
-/* Implementation of secure_getenv() for targets where it is not provided but
-   we have at least means to test real and effective IDs. */
-
-static char *
-secure_getenv (const char *name)
-{
-  if ((getuid () == geteuid ()) && (getgid () == getegid ()))
-return getenv (name);
-  else
-return NULL;
-}
-
-#else
-#define secure_getenv getenv
-#endif
-#endif
+#include "secure-getenv.h"


Hi,

that should be secure_getenv.h (underscore instead of dash).

Franz


Fix profile updating on loop-doloop

2017-07-03 Thread Jan Hubicka
Hi,
this patch fixes the PowerPC bootstrap ICE caused by doloop incorrectly
updating the profile.

Committed.

PR bootstrap/81285
* loop-doloop.c (add_test): Update profile.
Index: loop-doloop.c
===
--- loop-doloop.c   (revision 249885)
+++ loop-doloop.c   (working copy)
@@ -347,6 +347,8 @@ add_test (rtx cond, edge *e, basic_block
   rtx op0 = XEXP (cond, 0), op1 = XEXP (cond, 1);
   enum rtx_code code = GET_CODE (cond);
   basic_block bb;
+  /* The jump is supposed to handle an unlikely special case.  */
+  profile_probability prob = profile_probability::guessed_never ();
 
   mode = GET_MODE (XEXP (cond, 0));
   if (mode == VOIDmode)
@@ -357,7 +359,7 @@ add_test (rtx cond, edge *e, basic_block
   op1 = force_operand (op1, NULL_RTX);
   label = block_label (dest);
   do_compare_rtx_and_jump (op0, op1, code, 0, mode, NULL_RTX, NULL, label,
-  profile_probability::uninitialized ());
+  prob);
 
   jump = get_last_insn ();
   if (!jump || !JUMP_P (jump))
@@ -387,12 +389,14 @@ add_test (rtx cond, edge *e, basic_block
 
   JUMP_LABEL (jump) = label;
 
-  /* The jump is supposed to handle an unlikely special case.  */
-  add_int_reg_note (jump, REG_BR_PROB, 0);
-
   LABEL_NUSES (label)++;
 
-  make_edge (bb, dest, (*e)->flags & ~EDGE_FALLTHRU);
+  edge e2 = make_edge (bb, dest, (*e)->flags & ~EDGE_FALLTHRU);
+  e2->probability = prob;
+  e2->count = e2->src->count.apply_probability (prob);
+  (*e)->probability = prob.invert ();
+  (*e)->count = (*e)->count.apply_probability (prob);
+  update_br_prob_note (e2->src);
   return true;
 }
 


Re: [PATCH GCC][13/13]Distribute loop with loop versioning under runtime alias check

2017-07-03 Thread Richard Biener
On Fri, Jun 30, 2017 at 12:43 PM, Bin.Cheng  wrote:
> On Wed, Jun 28, 2017 at 2:09 PM, Bin.Cheng  wrote:
>> On Wed, Jun 28, 2017 at 1:29 PM, Richard Biener
>>  wrote:
>>> On Wed, Jun 28, 2017 at 1:46 PM, Bin.Cheng  wrote:
 On Wed, Jun 28, 2017 at 11:58 AM, Richard Biener
  wrote:
> On Tue, Jun 27, 2017 at 4:07 PM, Bin.Cheng  wrote:
>> On Tue, Jun 27, 2017 at 1:44 PM, Richard Biener
>>  wrote:
>>> On Fri, Jun 23, 2017 at 12:30 PM, Bin.Cheng  
>>> wrote:
 On Tue, Jun 20, 2017 at 10:22 AM, Bin.Cheng  
 wrote:
> On Mon, Jun 12, 2017 at 6:03 PM, Bin Cheng  wrote:
>> Hi,
 Rebased V3 for changes in previous patches.  Bootstap and test on
 x86_64 and aarch64.
>>>
>>> why is ldist-12.c no longer distributed?  your comment says it doesn't
>>> expose more "parallelism" but the point is to reduce memory bandwidth
>>> requirements which it clearly does.
>>>
>>> Likewise for -13.c, -14.c.  -4.c may be a questionable case but the
>>> wording of "parallelism" still confuses me.
>>>
>>> Can you elaborate on that.  Now onto the patch:
>> Given we don't model data locality or memory bandwidth, whether
>> distribution enables loops that can be executed in parallel becomes the
>> major criterion for distribution.  BTW, I think a good memory stream
>> optimization model shouldn't consider small loops as in ldist-12.c,
>> etc., appropriate for distribution.
>
> True.  But what does "parallel" mean here?  ldist-13.c if partitioned into
> two loops can be executed "in parallel"
 So if a loop by itself can be vectorized (or, so to speak, can be executed
 in parallel), we tend not to distribute it into smaller ones.  But there
 is one exception here: if the distributed small loops are recognized
 as builtin functions, we still distribute it.  I assume it's generally
 better to call builtin memory functions than to have GCC vectorize the loop?
>>>
>>> Yes.
>>>
>
>>>
>>> +   Loop distribution is the dual of loop fusion.  It separates statements
>>> +   of a loop (or loop nest) into multiple loops (or loop nests) with the
>>> +   same loop header.  The major goal is to separate statements which may
>>> +   be vectorized from those that can't.  This pass implements distribution
>>> +   in the following steps:
>>>
>>> misses the goal of being a memory stream optimization, not only a
>>> vectorization enabler.  distributing a loop can also reduce register
>>> pressure.
>> I will revise the comment, but as explained, enabling more
>> vectorization is the major criterion for distribution to some extent
>> now.
>
> Yes, I agree -- originally it was written to optimize the stream 
> benchmark IIRC.
 Let's see if any performance drop will be reported against this patch.
 Let's see if we can create a cost model for it.
>>>
>>> Fine.
>> I will run some benchmarks to see if there is breakage.
>>>
>
>>>
>>> You introduce ldist_alias_id in struct loop (probably in 01/n which I
>>> didn't look into yet).  If you don't use that please introduce it separately.
>> Hmm, yes it is introduced in patch [01/n] and set in this patch.
>>
>>>
>>> + /* Be conservative.  If data references are not well analyzed,
>>> +or the two data references have the same base address and
>>> +offset, add dependence and consider it alias to each other.
>>> +In other words, the dependence can not be resolved by
>>> +runtime alias check.  */
>>> + if (!DR_BASE_ADDRESS (dr1) || !DR_BASE_ADDRESS (dr2)
>>> + || !DR_OFFSET (dr1) || !DR_OFFSET (dr2)
>>> + || !DR_INIT (dr1) || !DR_INIT (dr2)
>>> + || !DR_STEP (dr1) || !tree_fits_uhwi_p (DR_STEP (dr1))
>>> + || !DR_STEP (dr2) || !tree_fits_uhwi_p (DR_STEP (dr2))
>>> + || res == 0)
>>>
>>> ISTR a helper that computes whether we can handle a runtime alias check
>>> for a specific case?
>> I guess you mean runtime_alias_check_p that I factored out previously?
>>  Unfortunately, it was factored out for the vectorizer's usage and doesn't
>> fit here straightforwardly.  Shall I try to further generalize the
>> interface in a separate patch, independent of this one?
>
> That would be nice.
>
>>>
>>> +  /* Depend on vectorizer to fold IFN_LOOP_DIST_ALIAS.  */
>>> +  if (flag_tree_loop_vectorize)
>>> +{
>>>

Re: [PATCH GCC][01/13]Introduce internal function IFN_LOOP_DIST_ALIAS

2017-07-03 Thread Richard Biener
On Fri, Jun 30, 2017 at 12:37 PM, Bin.Cheng  wrote:
> On Wed, Jun 28, 2017 at 8:29 AM, Richard Biener
>  wrote:
>> On Tue, Jun 27, 2017 at 6:46 PM, Bin.Cheng  wrote:
>>> On Tue, Jun 27, 2017 at 3:59 PM, Richard Biener
>>>  wrote:
 On June 27, 2017 4:27:17 PM GMT+02:00, "Bin.Cheng"  
 wrote:
>On Tue, Jun 27, 2017 at 1:58 PM, Richard Biener
> wrote:
>> On Fri, Jun 23, 2017 at 12:10 PM, Bin.Cheng wrote:
>>> On Mon, Jun 12, 2017 at 6:02 PM, Bin Cheng wrote:
 Hi,
 I was asked by upstream to split the loop distribution patch into small ones.
 It is hard because the data structures and algorithm are closely coupled together.
 Anyway, this is the patch series with smaller patches.  Basically I tried to
 separate the data structure and bug-fix changes out, with one as the main patch.
 Note I only made the code refactoring necessary to separate the patches; apart
 from that, there is no change against the last version.

 This is the first patch, introducing the new internal function IFN_LOOP_DIST_ALIAS.
 GCC will distribute loops under condition of this function call.

 Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>> Hi,
>>> I need to update this patch fixing an issue in
>>> vect_loop_dist_alias_call.  The previous patch fails to find some
>>> IFN_LOOP_DIST_ALIAS calls.
>>>
>>> Bootstrap and test in series.  Is it OK?
>>
>> So I wonder if we really need to track ldist_alias_id or if we can do sth
>Yes, it is needed because otherwise we might falsely try to
>search for IFN_LOOP_DIST_ALIAS for a normal (not from distribution)
>loop.
>
>> more "general", like tracking a copy_of or origin and then directly
>> go to nearest_common_dominator (loop->header, copy_of->header)
>> to find the controlling condition?
>I tend not to record any pointer in the loop structure; it can easily go
>dangling in a data structure that lives across passes.

 I didn't mean to record a pointer, just rename your field and make it more
 general.  The common dominator thing should still work, no?
>>> I might not be following.  If we record the original loop->num in the
>>> renamed field, nearest_common_dominator can't work because we don't
>>> have basic blocks to start the call?  The original loop could be
>>> eliminated at several points, for example, instantly after
>>> distribution, or folded in vectorizer for other loops distributed from
>>> the original loop.
>>> BTW, setting the copy_of/origin field in loop_version is not enough
>>> for this use case, all generated loops (actually, except the versioned
>>> loop) from distribution need to be set.
>>
>> Of course it would need to be set for all distributed loops.
>>
>> I'm not sure "loop vanishes" is the case to optimize for though.  If the loop
>> is still available then origin->header should work as BB.  If not then can't
>> we conclude, for the purpose of IFN_LOOP_DIST_ALIAS, that the whole
>> region must be dead?  We still have to identify it of course, but it means
>> we can fold stray IFN_LOOP_DIST_ALIAS calls left over after vectorization
>> to whatever we like?
>>

>As far as memory usage is concerned, I actually don't need a whole
>integer to record the loop num.  I can simply restrict the number of
>distributions in one function to at most 256, and record such an id in
>a char field in struct loop?  Does this sound better?

 As said, tracking loop origin sounds useful anyway so I'd rather add and 
 use that somehow.
>>> To be honest, I don't know.  the current field works like a unique
>>> index of distribution operation.  The original loop could be destroyed
>>> at different points thus no longer exists, this makes the recorded
>>> copy_of/origin less meaningful?
>>
>> I think we talked about prologue and epilogue loops to be easier identifiable
>> as so (and as to what "main" loop).  So lets say we have one "origin" field
>> and accompanying flags "origin_is_loop_dist_alias_version",
>> "origin_is_main_loop_of_prologue", etc.?  I can't think of the case where
>> origin would be two things at the same time (you can always walk up
>> the origin tree).
> Hi,
> Here is the updated patch working in this way.  There is still one
> problem with this method.  Consider a distributed loop that is
> if-converted later: the orig loop of the if-converted distributed loop is
> different.  Though we can update orig_loop_num, it's inaccurate and
> one consequence is we need to walk up the dominance tree till entry_block.
> Note if orig_loop_num is not shared, we can stop once basic block goes
> beyond outer loop.
> I didn't introduce flags in this 

Re: [Patch AArch64] Stop generating BSL for simple integer code

2017-07-03 Thread James Greenhalgh
On Wed, Jun 21, 2017 at 11:49:07AM +0100, James Greenhalgh wrote:
> *ping*

*ping*x2

Thanks,
James

> On Mon, Jun 12, 2017 at 02:44:40PM +0100, James Greenhalgh wrote:
> > [Sorry for the re-send. I spotted that the attributes were not right for the
> >  new pattern I was adding. The change between this and the first version was:
> > 
> >   +  [(set_attr "type" "neon_bsl,neon_bsl,neon_bsl,multiple")
> >   +   (set_attr "length" "4,4,4,12")]
> > ]
> > 
> > ---
> > 
> > Hi,
> > 
> > In this testcase, all argument registers and the return register
> > will be general purpose registers:
> > 
> >   long long
> >   foo (long long a, long long b, long long c)
> >   {
> > return ((a ^ b) & c) ^ b;
> >   }
> > 
> > However, due to the implementation of aarch64_simd_bsl_internal
> > we'll match that pattern and emit a BSL, necessitating moving all those
> > arguments and results to the Advanced SIMD registers:
> > 
> > fmov    d2, x0
> > fmov    d0, x2
> > fmov    d1, x1
> > bsl     v0.8b, v2.8b, v1.8b
> > fmov    x0, d0
> > 
> > To fix this, we turn aarch64_simd_bsldi_internal into an insn_and_split that
> > knows to split back to integer operations if the register allocation
> > falls that way.
> > 
> > We could have used an unspec, but then we lose some of the nice
> > simplifications that can be made from explicitly spelling out the semantics
> > of BSL.
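These are the semantics referred to above. For each bit, ((a ^ b) & c) ^ b yields the bit of a where c is 1, because (a ^ b) ^ b = a. Where c is 0 it yields the bit of b, because 0 ^ b = b. The expression is therefore a bitwise select. The two C functions below compute the same value. Their names are hypothetical, and unsigned 64-bit operands are used to sidestep signedness questions. On general-purpose registers the first form is just three ordinary integer operations (eor, and, eor). That is what splitting the DImode pattern back to integer code amounts to.

```c
#include <stdint.h>

/* The expression from the testcase: ((a ^ b) & c) ^ b.  */
uint64_t
bsl_xor_form (uint64_t a, uint64_t b, uint64_t c)
{
  return ((a ^ b) & c) ^ b;
}

/* The same bitwise select written directly: bits of a where c is set,
   bits of b where c is clear.  */
uint64_t
bsl_select_form (uint64_t a, uint64_t b, uint64_t c)
{
  return (a & c) | (b & ~c);
}
```

For example, with a = 0xFF00, b = 0x00FF and c = 0xF0F0, both forms give 0xF00F.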
> > 
> > Bootstrapped on aarch64-none-linux-gnu.
> > 
> > OK?
> > 
> > Thanks,
> > James
> > 
> > ---
> > gcc/
> > 
> > 2017-06-12  James Greenhalgh  
> > 
> > * config/aarch64/aarch64-simd.md
> > (aarch64_simd_bsl_internal): Remove DImode.
> > (*aarch64_simd_bsl_alt): Likewise.
> > (aarch64_simd_bsldi_internal): New.
> > 
> > gcc/testsuite/
> > 
> > 2017-06-12  James Greenhalgh  
> > 
> > * gcc.target/aarch64/no-dimode-bsl.c: New.
> > * gcc.target/aarch64/dimode-bsl.c: New.
> > 
> 
> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> > index c5a86ff..7b6b12f 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -2256,13 +2256,13 @@
> >  ;; in *aarch64_simd_bsl_alt.
> >  
> >  (define_insn "aarch64_simd_bsl_internal"
> > -  [(set (match_operand:VSDQ_I_DI 0 "register_operand" "=w,w,w")
> > -   (xor:VSDQ_I_DI
> > -  (and:VSDQ_I_DI
> > -(xor:VSDQ_I_DI
> > +  [(set (match_operand:VDQ_I 0 "register_operand" "=w,w,w")
> > +   (xor:VDQ_I
> > +  (and:VDQ_I
> > +(xor:VDQ_I
> >(match_operand: 3 "register_operand" "w,0,w")
> > -  (match_operand:VSDQ_I_DI 2 "register_operand" "w,w,0"))
> > -(match_operand:VSDQ_I_DI 1 "register_operand" "0,w,w"))
> > +  (match_operand:VDQ_I 2 "register_operand" "w,w,0"))
> > +(match_operand:VDQ_I 1 "register_operand" "0,w,w"))
> >   (match_dup: 3)
> > ))]
> >"TARGET_SIMD"
> > @@ -2280,14 +2280,14 @@
> >  ;; permutations of commutative operations, we have to have a separate pattern.
> >  
> >  (define_insn "*aarch64_simd_bsl_alt"
> > -  [(set (match_operand:VSDQ_I_DI 0 "register_operand" "=w,w,w")
> > -   (xor:VSDQ_I_DI
> > -  (and:VSDQ_I_DI
> > -(xor:VSDQ_I_DI
> > -  (match_operand:VSDQ_I_DI 3 "register_operand" "w,w,0")
> > -  (match_operand:VSDQ_I_DI 2 "register_operand" "w,0,w"))
> > - (match_operand:VSDQ_I_DI 1 "register_operand" "0,w,w"))
> > - (match_dup:VSDQ_I_DI 2)))]
> > +  [(set (match_operand:VDQ_I 0 "register_operand" "=w,w,w")
> > +   (xor:VDQ_I
> > +  (and:VDQ_I
> > +(xor:VDQ_I
> > +  (match_operand:VDQ_I 3 "register_operand" "w,w,0")
> > +  (match_operand:VDQ_I 2 "register_operand" "w,0,w"))
> > + (match_operand:VDQ_I 1 "register_operand" "0,w,w"))
> > + (match_dup:VDQ_I 2)))]
> >"TARGET_SIMD"
> >"@
> >bsl\\t%0., %3., %2.
> > @@ -2296,6 +2296,45 @@
> >[(set_attr "type" "neon_bsl")]
> >  )
> >  
> > +;; DImode is special, we want to avoid computing operations which are
> > +;; more naturally computed in general purpose registers in the vector
> > +;; registers.  If we do that, we need to move all three operands from general
> > +;; purpose registers to vector registers, then back again.  However, we
> > +;; don't want to make this pattern an UNSPEC as we'd lose scope for
> > +;; optimizations based on the component operations of a BSL.
> > +;;
> > +;; That means we need a splitter back to the individual operations, if they
> > +;; would be better calculated on the integer side.
> > +
> > +(define_insn_and_split "aarch64_simd_bsldi_internal"
> > +  [(set (match_operand:DI 0 "register_operand" "=w,w,w,")
> > +   (xor:DI
> > +  (and:DI
> > +(xor:DI
> > +  (match_operand:DI 3 "register_operand" "w,0,w,r")
> > +  (match_operand:DI 2 "register_operand" "w,w,0,r"))
> > +(match_operand:DI 1 "register_operand" 

Re: [PATCH][AArch64][GCC 6] PR target/79041: Correct -mpc-relative-literal-loads logic in aarch64_classify_symbol

2017-07-03 Thread Yvan Roux
On 27 June 2017 at 13:14, Yvan Roux  wrote:
> Hi Wilco
>
> On 27 June 2017 at 12:53, Wilco Dijkstra  wrote:
>> Hi Yvan,
>>
>>> Here is the backport of Wilco's patch (r237607) along with Kyrill's
>>> one (r244643, which removed the remaining occurrences of
>>> aarch64_nopcrelative_literal_loads).  To fix the issue the original
>>> patch has to be modified, to keep aarch64_pcrelative_literal_loads
>>> test for large models in aarch64_classify_symbol.
>>
>> The patch looks good to me, however I can't approve it.
>
> ok thanks for the review.
>
>>> On trunk and gcc-7-branch the :lo12: relocations are not generated
>>> because of Wilco's fix for pr78733 (r243456 and 243486), but my
>>> understanding is that the bug is still present since compiling
>>> gcc.target/aarch64/pr78733.c with -mcmodel=large brings back the
>>> :lo12: relocations (I'll submit a patch to add the test back if my
>>> understanding is correct).
>>
>> You're right, even though -mpc-relative-literal-loads doesn't make much sense
>> in the large memory model, it seems best to keep the option orthogonal to
>> enable the workaround. I've prepared a patch to fix this on trunk/GCC7.
>> It also adds a test which we should add to your changes to GCC6 too.
>
> OK, I think this is what Kugan proposed earlier today in:
>
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01967.html
>
> I agree that combining -mpc-relative-literal-loads with the large memory
> model doesn't make much sense, but it is what the kernel build system
> uses; if you already handle that in a bigger fix, that's awesome :)

ping?
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01708.html

> Thanks
> Yvan
>
>> Wilco


Re: [Patch AArch64 2/2] Fix memory sizes to load/store patterns

2017-07-03 Thread James Greenhalgh
On Wed, Jun 21, 2017 at 11:50:08AM +0100, James Greenhalgh wrote:
> *ping*

*ping*x2

Thanks,
James

> On Mon, Jun 12, 2017 at 02:54:00PM +0100, James Greenhalgh wrote:
> > 
> > Hi,
> > 
> > There seems to be a partial misconception in the AArch64 backend that
> > load1/load2 referred to the number of registers to load, rather than the
> > number of words to load. This patch fixes that using the new "number of
> > bytes" types added in the previous patch.
> > 
> > That means using the load_16 and store_16 types that were defined in the
> > previous patch for the first time in the AArch64 backend. To ensure
> > continuity for scheduling models, I've just split this out from load_8.
> > Please update your models if this is very wrong!
> > 
> > Bootstrapped on aarch64-none-linux-gnu with no issue.
> > 
> > OK?
> > 
> > Thanks,
> > James
> > 
> > ---
> > 2017-06-12  James Greenhalgh  
> > 
> > * config/aarch64/aarch64.md (movdi_aarch64): Set load/store
> > types correctly.
> > (movti_aarch64): Likewise.
> > (movdf_aarch64): Likewise.
> > (movtf_aarch64): Likewise.
> > (load_pairdi): Likewise.
> > (store_pairdi): Likewise.
> > (load_pairdf): Likewise.
> > (store_pairdf): Likewise.
> > (loadwb_pair_): Likewise.
> > (storewb_pair_): Likewise.
> > (ldr_got_small_): Likewise.
> > (ldr_got_small_28k_): Likewise.
> > (ldr_got_tiny): Likewise.
> > * config/aarch64/iterators.md (ldst_sz): New.
> > (ldpstp_sz): Likewise.
> > * config/aarch64/thunderx.md (thunderx_storepair): Split store_8
> > to store_16.
> > (thunderx_load): Split load_8 to load_16.
> > * config/aarch64/thunderx2t99.md (thunderx2t99_loadpair): Split
> > load_8 to load_16.
> > (thunderx2t99_storepair_basic): Split store_8 to store_16.
> > * config/arm/xgene1.md (xgene1_load_pair): Split load_8 to load_16.
> > (xgene1_store_pair): Split store_8 to store_16.
> > 
> 
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index 11295a6..a1385e3 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -981,7 +981,7 @@
> > DONE;
> >  }"
> >[(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,\
> > - load_4,load_4,store_4,store_4,\
> > + load_8,load_8,store_8,store_8,\
> >   adr,adr,f_mcr,f_mrc,fmov,neon_move")
> > (set_attr "fp" "*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
> > (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")]
> > @@ -1026,7 +1026,8 @@
> > ldr\\t%q0, %1
> > str\\t%q1, %0"
> >[(set_attr "type" "multiple,f_mcr,f_mrc,neon_logic_q, \
> > -load_8,store_8,store_8,f_loadd,f_stored")
> > +load_16,store_16,store_16,\
> > + load_16,store_16")
> > (set_attr "length" "8,8,8,4,4,4,4,4,4")
> > (set_attr "simd" "*,*,*,yes,*,*,*,*,*")
> > (set_attr "fp" "*,*,*,*,*,*,*,yes,yes")]
> > @@ -1121,7 +1122,7 @@
> > str\\t%x1, %0
> > mov\\t%x0, %x1"
> >[(set_attr "type" "neon_move,f_mcr,f_mrc,fmov,fconstd,\
> > - f_loadd,f_stored,load_4,store_4,mov_reg")
> > + f_loadd,f_stored,load_8,store_8,mov_reg")
> > (set_attr "simd" "yes,*,*,*,*,*,*,*,*,*")]
> >  )
> >  
> > @@ -1145,7 +1146,7 @@
> > stp\\t%1, %H1, %0
> > stp\\txzr, xzr, %0"
> >[(set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,neon_move_q,f_mcr,\
> > - f_loadd,f_stored,load_8,store_8,store_8")
> > + f_loadd,f_stored,load_16,store_16,store_16")
> > (set_attr "length" "4,8,8,8,4,4,4,4,4,4,4")
> > (set_attr "simd" "yes,*,*,*,yes,*,*,*,*,*,*")]
> >  )
> > @@ -1209,7 +1210,7 @@
> >"@
> > ldp\\t%x0, %x2, %1
> > ldp\\t%d0, %d2, %1"
> > -  [(set_attr "type" "load_8,neon_load1_2reg")
> > +  [(set_attr "type" "load_16,neon_load1_2reg")
> > (set_attr "fp" "*,yes")]
> >  )
> >  
> > @@ -1244,7 +1245,7 @@
> >"@
> > stp\\t%x1, %x3, %0
> > stp\\t%d1, %d3, %0"
> > -  [(set_attr "type" "store_8,neon_store1_2reg")
> > +  [(set_attr "type" "store_16,neon_store1_2reg")
> > (set_attr "fp" "*,yes")]
> >  )
> >  
> > @@ -1278,7 +1279,7 @@
> >"@
> > ldp\\t%d0, %d2, %1
> > ldp\\t%x0, %x2, %1"
> > -  [(set_attr "type" "neon_load1_2reg,load_8")
> > +  [(set_attr "type" "neon_load1_2reg,load_16")
> > (set_attr "fp" "yes,*")]
> >  )
> >  
> > @@ -1312,7 +1313,7 @@
> >"@
> > stp\\t%d1, %d3, %0
> > stp\\t%x1, %x3, %0"
> > -  [(set_attr "type" "neon_store1_2reg,store_8")
> > +  [(set_attr "type" "neon_store1_2reg,store_16")
> > (set_attr "fp" "yes,*")]
> >  )
> >  
> > @@ -1330,7 +1331,7 @@
> > (match_operand:P 5 "const_int_operand" "n"])]
> >"INTVAL (operands[5]) == GET_MODE_SIZE (mode)"
> >"ldp\\t%2, %3, [%1], %4"
> > -  [(set_attr "type" "load_8")]
> > +  

Re: [6/7] Add a helper for getting the overall alignment of a DR

2017-07-03 Thread Richard Biener
On Mon, Jul 3, 2017 at 9:40 AM, Richard Sandiford
 wrote:
> This combines the information from previous patches to give a guaranteed
> alignment for the DR as a whole.  This should be a bit safer than using
> base_element_aligned, since that only really took the base into account
> (not the init or offset).
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok.

Thanks for cleaning up all this mess...

Richard.

> Richard
>
>
> 2017-07-03  Richard Sandiford  
>
> gcc/
> * tree-data-ref.h (dr_alignment): Declare.
> * tree-data-ref.c (dr_alignment): New function.
> * tree-vectorizer.h (dataref_aux): Remove base_element_aligned.
> * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Don't
> set it.
> * tree-vect-stmts.c (vectorizable_store): Use dr_alignment.
>
> Index: gcc/tree-data-ref.h
> ===
> --- gcc/tree-data-ref.h 2017-07-03 08:17:58.418572314 +0100
> +++ gcc/tree-data-ref.h 2017-07-03 08:18:29.775412176 +0100
> @@ -405,6 +405,16 @@ extern bool compute_all_dependences (vec
>  vec, bool);
>  extern tree find_data_references_in_bb (struct loop *, basic_block,
>  vec *);
> +extern unsigned int dr_alignment (innermost_loop_behavior *);
> +
> +/* Return the alignment in bytes that DR is guaranteed to have at all
> +   times.  */
> +
> +inline unsigned int
> +dr_alignment (data_reference *dr)
> +{
> +  return dr_alignment (&DR_INNERMOST (dr));
> +}
>
>  extern bool dr_may_alias_p (const struct data_reference *,
> const struct data_reference *, bool);
> Index: gcc/tree-data-ref.c
> ===
> --- gcc/tree-data-ref.c 2017-07-03 08:17:58.418572314 +0100
> +++ gcc/tree-data-ref.c 2017-07-03 08:17:59.017546839 +0100
> @@ -4769,6 +4769,30 @@ find_data_references_in_loop (struct loo
>return NULL_TREE;
>  }
>
> +/* Return the alignment in bytes that DRB is guaranteed to have at all
> +   times.  */
> +
> +unsigned int
> +dr_alignment (innermost_loop_behavior *drb)
> +{
> +  /* Get the alignment of BASE_ADDRESS + INIT.  */
> +  unsigned int alignment = drb->base_alignment;
> +  unsigned int misalignment = (drb->base_misalignment
> +  + TREE_INT_CST_LOW (drb->init));
> +  if (misalignment != 0)
> +    alignment = MIN (alignment, misalignment & -misalignment);
> +
> +  /* Cap it to the alignment of OFFSET.  */
> +  if (!integer_zerop (drb->offset))
> +    alignment = MIN (alignment, drb->offset_alignment);
> +
> +  /* Cap it to the alignment of STEP.  */
> +  if (!integer_zerop (drb->step))
> +    alignment = MIN (alignment, drb->step_alignment);
> +
> +  return alignment;
> +}
> +
>  /* Recursive helper function.  */
>
>  static bool
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2017-07-03 08:17:58.419572272 +0100
> +++ gcc/tree-vectorizer.h   2017-07-03 08:18:09.031167838 +0100
> @@ -752,8 +752,6 @@ struct dataref_aux {
>int misalignment;
>/* If true the alignment of base_decl needs to be increased.  */
>bool base_misaligned;
> -  /* If true we know the base is at least vector element alignment aligned.  */
> -  bool base_element_aligned;
>tree base_decl;
>  };
>
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2017-07-03 08:17:58.419572272 +0100
> +++ gcc/tree-vect-data-refs.c   2017-07-03 08:17:59.018546796 +0100
> @@ -731,12 +731,6 @@ vect_compute_data_ref_alignment (struct
>unsigned int base_alignment = drb->base_alignment;
>unsigned int base_misalignment = drb->base_misalignment;
>unsigned HOST_WIDE_INT vector_alignment = TYPE_ALIGN_UNIT (vectype);
> -  unsigned HOST_WIDE_INT element_alignment
> -= TYPE_ALIGN_UNIT (TREE_TYPE (vectype));
> -
> -  if (base_alignment >= element_alignment
> -  && (base_misalignment & (element_alignment - 1)) == 0)
> -DR_VECT_AUX (dr)->base_element_aligned = true;
>
>if (drb->offset_alignment < vector_alignment
>|| !step_preserves_misalignment_p
> @@ -797,7 +791,6 @@ vect_compute_data_ref_alignment (struct
>
>DR_VECT_AUX (dr)->base_decl = base;
>DR_VECT_AUX (dr)->base_misaligned = true;
> -  DR_VECT_AUX (dr)->base_element_aligned = true;
>base_misalignment = 0;
>  }
>unsigned int misalignment = (base_misalignment
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2017-07-03 08:07:42.512875037 +0100
> +++ gcc/tree-vect-stmts.c   2017-07-03 08:17:59.019546754 +0100
> @@ -6359,11 +6359,7 @@ vectorizable_store (gimple *stmt, gimple
> 

Re: [Patch AArch64 docs] Document the RcPc extension

2017-07-03 Thread James Greenhalgh
On Fri, Jun 23, 2017 at 11:21:43AM +0100, James Greenhalgh wrote:
> 
> Hi,
> 
> Andrew pointed out that I did not document the new architecture extension
> flag I added for the RcPc extension. This was intentional, as enabling the rcpc
> extension does not change GCC code generation, and is just an assembler flag.
> But for completeness, here is documentation for the new option.
> 
> OK?

Ping.

Thanks,
James

> 2017-06-21  James Greenhalgh  
> 
>   * doc/invoke.texi (rcpc architecture extension): Document it.
> 

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 7e7a16a5..db00e51 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -14172,6 +14172,10 @@ Enable Large System Extension instructions.  This is on by default for
>  @option{-march=armv8.1-a}.
>  @item fp16
>  Enable FP16 extension.  This also enables floating-point instructions.
> +@item rcpc
> +Enable the RcPc extension.  This does not change code generation from GCC,
> +but is passed on to the assembler, enabling inline asm statements to use
> +instructions from the RcPc extension.
>  
>  @end table
>  



Re: [5/7] Add DR_BASE_ALIGNMENT and DR_BASE_MISALIGNMENT

2017-07-03 Thread Richard Biener
On Mon, Jul 3, 2017 at 9:38 AM, Richard Sandiford
 wrote:
> This patch records the base alignment and misalignment in
> innermost_loop_behavior, to avoid the second-guessing that was
> previously done in vect_compute_data_ref_alignment.  It also makes
> vect_analyze_data_refs use dr_analyze_innermost, instead of having an
> almost-copy of the same code.
>
> I wasn't sure whether the alignments should be measured in bits
> (for consistency with most other interfaces) or in bytes (for consistency
> with DR_ALIGNED_TO, now DR_OFFSET_ALIGNMENT, and with *_ptr_info_alignment).
> I went for bytes because:
>
> - I think in practice most consumers are going to want bytes.
>   E.g. using bytes avoids having to mix TYPE_ALIGN and TYPE_ALIGN_UNIT
>   in vect_compute_data_ref_alignment.
>
> - It means that any bit-level paranoia is dealt with when building
>   the innermost_loop_behavior and doesn't get pushed down to consumers.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok.

Thanks,
Richard.

> Richard
>
>
> 2017-07-03  Richard Sandiford  
>
> gcc/
> * tree-data-ref.h (innermost_loop_behavior): Add base_alignment
> and base_misalignment fields.
> (DR_BASE_ALIGNMENT, DR_BASE_MISALIGNMENT): New macros.
> * tree-data-ref.c: Include builtins.h.
> (dr_analyze_innermost): Set up the new innermost_loop_behavior fields.
> * tree-vectorizer.h (STMT_VINFO_DR_BASE_ALIGNMENT): New macro.
> (STMT_VINFO_DR_BASE_MISALIGNMENT): Likewise.
> * tree-vect-data-refs.c: Include tree-cfg.h.
> (vect_compute_data_ref_alignment): Use the new innermost_loop_behavior
> fields instead of calculating an alignment here.
> (vect_analyze_data_refs): Use dr_analyze_innermost.  Dump the new
> innermost_loop_behavior fields.
>
> Index: gcc/tree-data-ref.h
> ===
> --- gcc/tree-data-ref.h 2017-07-03 07:52:14.194782203 +0100
> +++ gcc/tree-data-ref.h 2017-07-03 07:52:55.920272347 +0100
> @@ -52,6 +52,42 @@ struct innermost_loop_behavior
>tree init;
>tree step;
>
> +  /* BASE_ADDRESS is known to be misaligned by BASE_MISALIGNMENT bytes
> + from an alignment boundary of BASE_ALIGNMENT bytes.  For example,
> + if we had:
> +
> +   struct S __attribute__((aligned(16))) { ... };
> +
> +   char *ptr;
> +   ... *(struct S *) (ptr - 4) ...;
> +
> + the information would be:
> +
> +   base_address:  ptr
> +   base_alignment: 16
> +   base_misalignment:   4
> +   init:   -4
> +
> + where init cancels the base misalignment.  If instead we had a
> + reference to a particular field:
> +
> +   struct S __attribute__((aligned(16))) { ... int f; ... };
> +
> +   char *ptr;
> +   ... ((struct S *) (ptr - 4))->f ...;
> +
> + the information would be:
> +
> +   base_address:  ptr
> +   base_alignment: 16
> +   base_misalignment:   4
> +   init:   -4 + offsetof (S, f)
> +
> + where base_address + init might also be misaligned, and by a different
> + amount from base_address.  */
> +  unsigned int base_alignment;
> +  unsigned int base_misalignment;
> +
>/* The largest power of two that divides OFFSET, capped to a suitably
>   high value if the offset is zero.  This is a byte rather than a bit
>   quantity.  */
> @@ -147,6 +183,8 @@ #define DR_OFFSET(DR)  (DR)-
>  #define DR_INIT(DR)(DR)->innermost.init
>  #define DR_STEP(DR)(DR)->innermost.step
>  #define DR_PTR_INFO(DR)(DR)->alias.ptr_info
> +#define DR_BASE_ALIGNMENT(DR)  (DR)->innermost.base_alignment
> +#define DR_BASE_MISALIGNMENT(DR)   (DR)->innermost.base_misalignment
>  #define DR_OFFSET_ALIGNMENT(DR)(DR)->innermost.offset_alignment
>  #define DR_STEP_ALIGNMENT(DR)  (DR)->innermost.step_alignment
>  #define DR_INNERMOST(DR)   (DR)->innermost
> Index: gcc/tree-data-ref.c
> ===
> --- gcc/tree-data-ref.c 2017-07-03 07:52:14.193782226 +0100
> +++ gcc/tree-data-ref.c 2017-07-03 07:52:55.920272347 +0100
> @@ -94,6 +94,7 @@ Software Foundation; either version 3, o
>  #include "dumpfile.h"
>  #include "tree-affine.h"
>  #include "params.h"
> +#include "builtins.h"
>
>  static struct datadep_stats
>  {
> @@ -802,11 +803,26 @@ dr_analyze_innermost (struct data_refere
>return false;
>  }
>
> +  /* Calculate the alignment and misalignment for the inner reference.  */
> +  unsigned int HOST_WIDE_INT base_misalignment;
> +  unsigned int base_alignment;
> +  get_object_alignment_1 (base, &base_alignment, &base_misalignment);
> +
> +  /* There are no bitfield references remaining in BASE, so the values
> + we got back must be whole bytes.  */
> +  gcc_assert (base_alignment % BITS_PER_UNIT == 0
> + 

Do not merge blocks when profile would be lost

2017-07-03 Thread Jan Hubicka
Hi,
consider function

test ()
{
  do_something_that_will_call_exit ();
  report_catastrophic_failure ();
}

During profile estimation we know that do_something_that_will_call_exit may
end execution, and thus we split BBs and introduce fake edges.  After the fake
edges are removed, however, we merge the BBs, and in that case
report_catastrophic_failure gets a non-zero profile.

Theresa added code to drop profiles when the sum of counts is non-zero, and we
also keep reporting profile inconsistencies.  The following patch fixes this by
not merging the BBs during tree optimization.  After RTL expansion we merge
them, but I think that is quite OK.

Bootstrapped/regtested x86_64-linux, plan to commit it later today.

Honza

* tree-cfgcleanup.c (want_merge_blocks_p): New function.
(cleanup_tree_cfg_bb): Use it.
* profile-count.h (profile_count::ok_for_merging, profile_count::merge):
New functions.
* tree-cfg.c (gimple_merge_blocks): Use profile_count::merge.

Index: tree-cfgcleanup.c
===
--- tree-cfgcleanup.c   (revision 249885)
+++ tree-cfgcleanup.c   (working copy)
@@ -636,6 +636,19 @@ fixup_noreturn_call (gimple *stmt)
   return changed;
 }
 
+/* Return true if we want to merge BB1 and BB2 into a single block.  */
+
+static bool
+want_merge_blocks_p (basic_block bb1, basic_block bb2)
+{
+  if (!can_merge_blocks_p (bb1, bb2))
+    return false;
+  gimple_stmt_iterator gsi = gsi_last_nondebug_bb (bb1);
+  if (gsi_end_p (gsi) || !stmt_can_terminate_bb_p (gsi_stmt (gsi)))
+    return true;
+  return bb1->count.ok_for_merging (bb2->count);
+}
+
 
 /* Tries to cleanup cfg in basic block BB.  Returns true if anything
changes.  */
@@ -652,7 +665,7 @@ cleanup_tree_cfg_bb (basic_block bb)
  This happens when we visit BBs in a non-optimal order and
  avoids quadratic behavior with adjusting stmts BB pointer.  */
   if (single_pred_p (bb)
-  && can_merge_blocks_p (single_pred (bb), bb))
+  && want_merge_blocks_p (single_pred (bb), bb))
 /* But make sure we _do_ visit it.  When we remove unreachable paths
ending in a backedge we fail to mark the destinations predecessors
as changed.  */
@@ -662,7 +675,7 @@ cleanup_tree_cfg_bb (basic_block bb)
  conditional branches (due to the elimination of single-valued PHI
  nodes).  */
   else if (single_succ_p (bb)
-  && can_merge_blocks_p (bb, single_succ (bb)))
+  && want_merge_blocks_p (bb, single_succ (bb)))
 {
   merge_blocks (bb, single_succ (bb));
   return true;
Index: profile-count.h
===
--- profile-count.h (revision 249885)
+++ profile-count.h (working copy)
@@ -565,6 +565,31 @@ public:
   return initialized_p ();
 }
 
+  /* When merging basic blocks, the two different profile counts are unified.
+ Return true if this can be done without losing info about profile.
+ The only case we care about here is when first BB contains something
+ that makes it terminate in a way not visible in CFG.  */
+  bool ok_for_merging (profile_count other) const
+{
+  if (m_quality < profile_adjusted
+ || other.m_quality < profile_adjusted)
+   return true;
+  return !(other < *this);
+}
+
+  /* When merging two BBs with different counts, pick common count that looks
+ most representative.  */
+  profile_count merge (profile_count other) const
+{
+  if (*this == other || !other.initialized_p ()
+ || m_quality > other.m_quality)
+   return *this;
+  if (other.m_quality > m_quality
+ || other > *this)
+   return other;
+  return *this;
+}
+
   /* Basic operations.  */
   bool operator== (const profile_count ) const
 {
Index: tree-cfg.c
===
--- tree-cfg.c  (revision 249887)
+++ tree-cfg.c  (working copy)
@@ -2076,7 +2081,7 @@ gimple_merge_blocks (basic_block a, basi
  profiles.  */
   if (a->loop_father == b->loop_father)
 {
-  a->count = MAX (a->count, b->count);
+  a->count = a->count.merge (b->count);
   a->frequency = MAX (a->frequency, b->frequency);
 }
 


Re: [PATCH] Add dotfn

2017-07-03 Thread Richard Biener
On Mon, 3 Jul 2017, Tom de Vries wrote:

> On 07/03/2017 11:53 AM, Richard Biener wrote:
> > On Mon, 3 Jul 2017, Tom de Vries wrote:
> > 
> > > On 07/03/2017 09:05 AM, Richard Biener wrote:
> > > > On Mon, 3 Jul 2017, Tom de Vries wrote:
> > > > 
> > > > > Hi,
> > > > > 
> > > > > this patch adds a debug function dotfn and a convenience macro DOTFN
> > > > > similar
> > > > > to dot-fn in gdbhooks.py.
> > > > > 
> > > > > It can be used to have the compiler:
> > > > > - dump a control flow graph, or
> > > > > - pop up a control flow graph window
> > > > > at specific moments in the compilation flow, for debugging purposes.
> > > > > 
> > > > > Bootstrapped and reg-tested on x86_64.
> > > > > 
> > > > > Used for debugging PR81192.
> > > > > 
> > > > > OK for trunk?
> > > > 
> > > > Why's dot-fn not enough?  I'd rather extend stuff in gdbhooks.py than
> > > > adding this kind of stuff to gcc itself.
> > > 
> > > When expressing where and when to dump or pop-up a control flow graph,
> > > sometimes it's easier for me to do that in C than in gdb scripting.
> > 
> > Ah, you mean by patching GCC.  Yeah, I can see that this is useful
> > in some cases.  OTOH I had dot-fn this way in my local dev tree for
> > a few years ...
> > 
> > I'm retracting my objection but leave approval to somebody else
> > just to see if we can arrive at any consensus for "advanced"
> > debug stuff in GCC itself.
> > 
> 
> Ack.
> 
> > For my use case the gdb python stuff is now nearly perfect -- apart
> > from the cases where graph generation ICEs (like corrupt loop info).
> 
> I suppose we can make a dotfn variant that calls draw_cfg_nodes_no_loops even
> if loop info is present.

Locally I have

@@ -236,7 +242,8 @@ draw_cfg_nodes_for_loop (pretty_printer
 static void
 draw_cfg_nodes (pretty_printer *pp, struct function *fun)
 {
-  if (loops_for_fn (fun))
+  if (loops_for_fn (fun)
+  && !(loops_for_fn (fun)->state & LOOPS_NEED_FIXUP))
 draw_cfg_nodes_for_loop (pp, fun->funcdef_no, get_loop (fun, 0));
   else
 draw_cfg_nodes_no_loops (pp, fun);

that avoids most of the cases but of course not always.  I suppose
a special dump_flag might work here.  The problem is really
get_loop_body* trusting loop->num_nodes and ICEing when that doesn't 
match.  Using get_loop_body_with_size with n_basic_blocks_for_fn
would avoid that but it isn't a replacement for
get_loop_body_in_bfs_order -- at least with get_loop_body_with_size
we could avoid repeatedly allocating the array in draw_cfg_nodes_for_loop.

Not sure BFS order produces much nicer dot output than DFS order.

> 
> Btw, I think this needs fixing:
> ...
> /* Draw all edges in the CFG.  Retreating edges are drawin as not 
>constraining, this makes the layout of the graph better. 
>(??? Calling mark_dfs_back may change the compiler's behavior when 
>dumping, but computing back edges here for ourselves is also not 
>desirable.)  */
> 
> static void
> draw_cfg_edges (pretty_printer *pp, struct function *fun)
> {
>   basic_block bb;
>   mark_dfs_back_edges ();
>   FOR_ALL_BB_FN (bb, cfun)
> draw_cfg_node_succ_edges (pp, fun->funcdef_no, bb);
> ...
> 
> We don't want calling a debug function to change compiler behavior
> (something I ran into while debugging PR81192).
> 
> Any suggestion on how to address this? We could allocate a bitmap before and
> save the edge flag for all edges, and restore afterwards.

Something like that, yes.

> Thanks,
> - Tom
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Re: [PATCH] Add dotfn

2017-07-03 Thread Tom de Vries
On 07/03/2017 11:53 AM, Richard Biener wrote:
> On Mon, 3 Jul 2017, Tom de Vries wrote:
>
>> On 07/03/2017 09:05 AM, Richard Biener wrote:
>>> On Mon, 3 Jul 2017, Tom de Vries wrote:
>>>
>>>> Hi,
>>>>
>>>> this patch adds a debug function dotfn and a convenience macro DOTFN
>>>> similar to dot-fn in gdbhooks.py.
>>>>
>>>> It can be used to have the compiler:
>>>> - dump a control flow graph, or
>>>> - pop up a control flow graph window
>>>> at specific moments in the compilation flow, for debugging purposes.
>>>>
>>>> Bootstrapped and reg-tested on x86_64.
>>>>
>>>> Used for debugging PR81192.
>>>>
>>>> OK for trunk?
>>>
>>> Why's dot-fn not enough?  I'd rather extend stuff in gdbhooks.py than
>>> adding this kind of stuff to gcc itself.
>>
>> When expressing where and when to dump or pop-up a control flow graph,
>> sometimes it's easier for me to do that in C than in gdb scripting.
>
> Ah, you mean by patching GCC.  Yeah, I can see that this is useful
> in some cases.  OTOH I had dot-fn this way in my local dev tree for
> a few years ...
>
> I'm retracting my objection but leave approval to somebody else
> just to see if we can arrive at any consensus for "advanced"
> debug stuff in GCC itself.

Ack.

> For my use case the gdb python stuff is now nearly perfect -- apart
> from the cases where graph generation ICEs (like corrupt loop info).

I suppose we can make a dotfn variant that calls draw_cfg_nodes_no_loops
even if loop info is present.

Btw, I think this needs fixing:
...
/* Draw all edges in the CFG.  Retreating edges are drawin as not
   constraining, this makes the layout of the graph better.
   (??? Calling mark_dfs_back may change the compiler's behavior when
   dumping, but computing back edges here for ourselves is also not
   desirable.)  */

static void
draw_cfg_edges (pretty_printer *pp, struct function *fun)
{
  basic_block bb;
  mark_dfs_back_edges ();
  FOR_ALL_BB_FN (bb, cfun)
    draw_cfg_node_succ_edges (pp, fun->funcdef_no, bb);
...

We don't want calling a debug function to change compiler behavior
(something I ran into while debugging PR81192).

Any suggestion on how to address this? We could allocate a bitmap before
and save the edge flag for all edges, and restore afterwards.

Thanks,
- Tom


Re: [4/7] Add DR_STEP_ALIGNMENT

2017-07-03 Thread Richard Biener
On Mon, Jul 3, 2017 at 9:35 AM, Richard Sandiford
 wrote:
> A later patch adds base alignment information to innermost_loop_behavior.
> After that, the only remaining piece of alignment information that wasn't
> immediately obvious was the step alignment.  Adding that allows a minor
> simplification to vect_compute_data_ref_alignment, and also potentially
> improves the handling of variable strides for outer loop vectorisation.
> A later patch will also use it to give the alignment of the DR as a whole.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok.

> Richard
>
>
> 2017-07-03  Richard Sandiford  
>
> gcc/
> * tree-data-ref.h (innermost_loop_behavior): Add a step_alignment
> field.
> (DR_STEP_ALIGNMENT): New macro.
> * tree-vectorizer.h (STMT_VINFO_DR_STEP_ALIGNMENT): Likewise.
> * tree-data-ref.c (dr_analyze_innermost): Initialize step_alignment.
> (create_data_ref): Print it.
> * tree-vect-stmts.c (vectorizable_load): Use the step alignment
> to tell whether the step preserves vector (mis)alignment.
> * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Likewise.
> Move the check for an integer step and generalise to all INTEGER_CST.
> (vect_analyze_data_refs): Set DR_STEP_ALIGNMENT when setting DR_STEP.
> Print the outer step alignment.
>
> Index: gcc/tree-data-ref.h
> ===
> --- gcc/tree-data-ref.h 2017-07-03 07:51:31.005161213 +0100
> +++ gcc/tree-data-ref.h 2017-07-03 07:52:14.194782203 +0100
> @@ -56,6 +56,9 @@ struct innermost_loop_behavior
>   high value if the offset is zero.  This is a byte rather than a bit
>   quantity.  */
>unsigned int offset_alignment;
> +
> +  /* Likewise for STEP.  */
> +  unsigned int step_alignment;
>  };
>
>  /* Describes the evolutions of indices of the memory reference.  The indices
> @@ -145,6 +148,7 @@ #define DR_INIT(DR)(DR)-
>  #define DR_STEP(DR)(DR)->innermost.step
>  #define DR_PTR_INFO(DR)(DR)->alias.ptr_info
>  #define DR_OFFSET_ALIGNMENT(DR)(DR)->innermost.offset_alignment
> +#define DR_STEP_ALIGNMENT(DR)  (DR)->innermost.step_alignment
>  #define DR_INNERMOST(DR)   (DR)->innermost
>
>  typedef struct data_reference *data_reference_p;
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2017-07-03 07:51:31.006161241 +0100
> +++ gcc/tree-vectorizer.h   2017-07-03 07:52:14.196782157 +0100
> @@ -709,6 +709,8 @@ #define STMT_VINFO_DR_OFFSET(S)
>  #define STMT_VINFO_DR_STEP(S)  (S)->dr_wrt_vec_loop.step
>  #define STMT_VINFO_DR_OFFSET_ALIGNMENT(S) \
>(S)->dr_wrt_vec_loop.offset_alignment
> +#define STMT_VINFO_DR_STEP_ALIGNMENT(S) \
> +  (S)->dr_wrt_vec_loop.step_alignment
>
>  #define STMT_VINFO_IN_PATTERN_P(S) (S)->in_pattern_p
>  #define STMT_VINFO_RELATED_STMT(S) (S)->related_stmt
> Index: gcc/tree-data-ref.c
> ===
> --- gcc/tree-data-ref.c 2017-07-03 07:51:31.004161185 +0100
> +++ gcc/tree-data-ref.c 2017-07-03 07:52:14.193782226 +0100
> @@ -870,6 +870,7 @@ dr_analyze_innermost (struct data_refere
>drb->init = init;
>drb->step = step;
>drb->offset_alignment = highest_pow2_factor (offset_iv.base);
> +  drb->step_alignment = highest_pow2_factor (step);
>
>if (dump_file && (dump_flags & TDF_DETAILS))
>  fprintf (dump_file, "success.\n");
> @@ -1085,6 +1086,7 @@ create_data_ref (loop_p nest, loop_p loo
>print_generic_expr (dump_file, DR_STEP (dr), TDF_SLIM);
>fprintf (dump_file, "\n\toffset alignment: %d",
>DR_OFFSET_ALIGNMENT (dr));
> +  fprintf (dump_file, "\n\tstep alignment: %d", DR_STEP_ALIGNMENT (dr));
>fprintf (dump_file, "\n\tbase_object: ");
>print_generic_expr (dump_file, DR_BASE_OBJECT (dr), TDF_SLIM);
>fprintf (dump_file, "\n");
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2017-07-03 07:51:05.480852682 +0100
> +++ gcc/tree-vect-stmts.c   2017-07-03 07:52:14.195782180 +0100
> @@ -7294,8 +7294,7 @@ vectorizable_load (gimple *stmt, gimple_
>   nested within an outer-loop that is being vectorized.  */
>
>if (nested_in_vect_loop
> -  && (TREE_INT_CST_LOW (DR_STEP (dr))
> - % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0))
> +  && (DR_STEP_ALIGNMENT (dr) % GET_MODE_SIZE (TYPE_MODE (vectype))) != 0)
>  {
>gcc_assert (alignment_support_scheme != dr_explicit_realign_optimized);
>compute_in_loop = true;
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2017-07-03 
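Below is a minimal sketch of the step-alignment test for a constant step only. It assumes highest_pow2_factor returns the lowest set bit of the step, capped at some maximum. The cap is a hypothetical parameter here. The patch also handles non-constant steps, which this model does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest power of two dividing STEP (a byte count), capped at CAP.
   Every power of two divides 0, so a zero step yields CAP.  */
static uint64_t
step_alignment_of (int64_t step, uint64_t cap)
{
  if (step == 0)
    return cap;
  uint64_t u = step < 0 ? -(uint64_t) step : (uint64_t) step;
  uint64_t factor = u & -u;  /* isolate the lowest set bit */
  return factor < cap ? factor : cap;
}

/* Mirrors the check in vectorizable_load: the step keeps the vector
   (mis)alignment unchanged only if the step alignment is a multiple of
   the vector size.  */
static bool
step_preserves_misalignment_p (int64_t step, uint64_t vector_size,
                               uint64_t cap)
{
  return step_alignment_of (step, cap) % vector_size == 0;
}
```

For example, with 16-byte vectors a step of 32 bytes preserves the misalignment, but a step of 24 bytes does not.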

Re: [PATCH] Add dotfn

2017-07-03 Thread Richard Biener
On Mon, 3 Jul 2017, Tom de Vries wrote:

> On 07/03/2017 09:05 AM, Richard Biener wrote:
> > On Mon, 3 Jul 2017, Tom de Vries wrote:
> > 
> > > Hi,
> > > 
> > > this patch adds a debug function dotfn and a convenience macro DOTFN similar
> > > to dot-fn in gdbhooks.py.
> > > 
> > > It can be used to have the compiler:
> > > - dump a control flow graph, or
> > > - pop up a control flow graph window
> > > at specific moments in the compilation flow, for debugging purposes.
> > > 
> > > Bootstrapped and reg-tested on x86_64.
> > > 
> > > Used for debugging PR81192.
> > > 
> > > OK for trunk?
> > 
> > Why's dot-fn not enough?  I'd rather extend stuff in gdbhooks.py than
> > adding this kind of stuff to gcc itself.
> 
> When expressing where and when to dump or pop-up a control flow graph,
> sometimes it's easier for me to do that in C than in gdb scripting.

Ah, you mean by patching GCC.  Yeah, I can see that this is useful
in some cases.  OTOH I had dot-fn this way in my local dev tree for
a few years ...

I'm retracting my objection but leave approval to somebody else
just to see if we can arrive at any consensus for "advanced"
debug stuff in GCC itself.

For my use case the gdb python stuff is now nearly perfect -- apart
from the cases where graph generation ICEs (like corrupt loop info).

Richard.


Re: [PATCH v2][RFC] Canonize names of attributes.

2017-07-03 Thread Martin Liška
On 06/30/2017 09:34 PM, Jason Merrill wrote:
> On Fri, Jun 30, 2017 at 5:23 AM, Martin Liška  wrote:
>> This is v2 of the patch, where just names of attributes are canonicalized.
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> What is the purpose of the new "strict" parameter to cmp_attribs* ?  I
> don't see any discussion of it.

It's needed for attribute arguments that are themselves names, like:

/usr/include/stdio.h:391:62: internal compiler error: in cmp_attribs, at tree.h:5523
  __THROWNL __attribute__ ((__format__ (__printf__, 3, 4)));

there we need strict to be set to false:

0x8a64e7 cmp_attribs
../../gcc/tree.h:5523
0x8a64e7 cmp_attribs
../../gcc/tree.h:5536
0x8a64e7 convert_format_name_to_system_name
../../gcc/c-family/c-format.c:3966
0x8a6e5c convert_format_name_to_system_name
../../gcc/c-family/c-format.c:338
0x8a6e5c decode_format_attr
../../gcc/c-family/c-format.c:299
0x8aa380 handle_format_attribute(tree_node**, tree_node*, tree_node*, int, bool*)
../../gcc/c-family/c-format.c:4005
0x869d07 decl_attributes(tree_node**, tree_node*, int)
../../gcc/attribs.c:548
0x6c0ee3 cplus_decl_attributes(tree_node**, tree_node*, int)
../../gcc/cp/decl2.c:1407
...

I think it's useful to have name comparison in a single function.
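Below is a hypothetical model of why strict has to be false here. In a non-strict comparison, one surrounding "__" pair is stripped from both names, so "__printf__" matches "printf". A strict (exact) comparison does not match them. attr_names_equal and strip_underscores are illustrative names, not the patch's cmp_attribs.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: drop one surrounding "__" pair from NAME
   (adjusting the pointer and length in place).  */
static void
strip_underscores (const char **name, size_t *len)
{
  if (*len >= 4 && (*name)[0] == '_' && (*name)[1] == '_'
      && (*name)[*len - 2] == '_' && (*name)[*len - 1] == '_')
    {
      *name += 2;
      *len -= 4;
    }
}

/* Hypothetical comparison: STRICT demands an exact match, otherwise both
   names are canonicalized first.  */
static bool
attr_names_equal (const char *attr1, const char *attr2, bool strict)
{
  size_t len1 = strlen (attr1), len2 = strlen (attr2);
  if (!strict)
    {
      strip_underscores (&attr1, &len1);
      strip_underscores (&attr2, &len2);
    }
  return len1 == len2 && memcmp (attr1, attr2, len1) == 0;
}
```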

Martin

> 
> Jason
> 



Re: [3/7] Rename DR_ALIGNED_TO to DR_OFFSET_ALIGNMENT

2017-07-03 Thread Richard Biener
On Mon, Jul 3, 2017 at 9:32 AM, Richard Sandiford
 wrote:
> This patch renames DR_ALIGNED_TO to DR_OFFSET_ALIGNMENT, to avoid
> confusion with the upcoming DR_BASE_ALIGNMENT.  Nothing needed the
> value as a tree, and the value is clipped to BIGGEST_ALIGNMENT
> (maybe it should be MAX_OFILE_ALIGNMENT?) so we might as well use
> an unsigned int instead.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok.

Richard.

> Richard
>
>
> 2017-07-03  Richard Sandiford  
>
> gcc/
> * tree-data-ref.h (innermost_loop_behavior): Replace aligned_to
> with offset_alignment.
> (DR_ALIGNED_TO): Delete.
> (DR_OFFSET_ALIGNMENT): New macro.
> * tree-vectorizer.h (STMT_VINFO_DR_ALIGNED_TO): Delete.
> (STMT_VINFO_DR_OFFSET_ALIGNMENT): New macro.
> * tree-data-ref.c (dr_analyze_innermost): Update after above changes.
> (create_data_ref): Likewise.
> * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Likewise.
> (vect_analyze_data_refs): Likewise.
> * tree-if-conv.c (if_convertible_loop_p_1): Use memset before
> creating dummy innermost behavior.
>
> Index: gcc/tree-data-ref.h
> ===
> --- gcc/tree-data-ref.h 2017-07-03 08:03:19.181500132 +0100
> +++ gcc/tree-data-ref.h 2017-07-03 08:06:19.720107957 +0100
> @@ -52,9 +52,10 @@ struct innermost_loop_behavior
>tree init;
>tree step;
>
> -  /* Alignment information.  ALIGNED_TO is set to the largest power of two
> - that divides OFFSET.  */
> -  tree aligned_to;
> +  /* The largest power of two that divides OFFSET, capped to a suitably
> + high value if the offset is zero.  This is a byte rather than a bit
> + quantity.  */
> +  unsigned int offset_alignment;
>  };
>
>  /* Describes the evolutions of indices of the memory reference.  The indices
> @@ -143,7 +144,7 @@ #define DR_OFFSET(DR)  (DR)-
>  #define DR_INIT(DR)(DR)->innermost.init
>  #define DR_STEP(DR)(DR)->innermost.step
>  #define DR_PTR_INFO(DR)(DR)->alias.ptr_info
> -#define DR_ALIGNED_TO(DR)  (DR)->innermost.aligned_to
> +#define DR_OFFSET_ALIGNMENT(DR)(DR)->innermost.offset_alignment
>  #define DR_INNERMOST(DR)   (DR)->innermost
>
>  typedef struct data_reference *data_reference_p;
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2017-07-03 07:57:56.883079731 +0100
> +++ gcc/tree-vectorizer.h   2017-07-03 08:06:19.721107925 +0100
> @@ -707,7 +707,8 @@ #define STMT_VINFO_DR_BASE_ADDRESS(S)
>  #define STMT_VINFO_DR_INIT(S)  (S)->dr_wrt_vec_loop.init
>  #define STMT_VINFO_DR_OFFSET(S)(S)->dr_wrt_vec_loop.offset
>  #define STMT_VINFO_DR_STEP(S)  (S)->dr_wrt_vec_loop.step
> -#define STMT_VINFO_DR_ALIGNED_TO(S)(S)->dr_wrt_vec_loop.aligned_to
> +#define STMT_VINFO_DR_OFFSET_ALIGNMENT(S) \
> +  (S)->dr_wrt_vec_loop.offset_alignment
>
>  #define STMT_VINFO_IN_PATTERN_P(S) (S)->in_pattern_p
>  #define STMT_VINFO_RELATED_STMT(S) (S)->related_stmt
> Index: gcc/tree-data-ref.c
> ===
> --- gcc/tree-data-ref.c 2017-07-03 08:03:19.181500132 +0100
> +++ gcc/tree-data-ref.c 2017-07-03 08:06:19.720107957 +0100
> @@ -869,7 +869,7 @@ dr_analyze_innermost (struct data_refere
>drb->offset = fold_convert (ssizetype, offset_iv.base);
>drb->init = init;
>drb->step = step;
> -  drb->aligned_to = size_int (highest_pow2_factor (offset_iv.base));
> +  drb->offset_alignment = highest_pow2_factor (offset_iv.base);
>
>if (dump_file && (dump_flags & TDF_DETAILS))
>  fprintf (dump_file, "success.\n");
> @@ -1083,8 +1083,8 @@ create_data_ref (loop_p nest, loop_p loo
>print_generic_expr (dump_file, DR_INIT (dr), TDF_SLIM);
>fprintf (dump_file, "\n\tstep: ");
>print_generic_expr (dump_file, DR_STEP (dr), TDF_SLIM);
> -  fprintf (dump_file, "\n\taligned to: ");
> -  print_generic_expr (dump_file, DR_ALIGNED_TO (dr), TDF_SLIM);
> +  fprintf (dump_file, "\n\toffset alignment: %d",
> +  DR_OFFSET_ALIGNMENT (dr));
>fprintf (dump_file, "\n\tbase_object: ");
>print_generic_expr (dump_file, DR_BASE_OBJECT (dr), TDF_SLIM);
>fprintf (dump_file, "\n");
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2017-07-03 07:57:47.758408141 +0100
> +++ gcc/tree-vect-data-refs.c   2017-07-03 08:06:19.721107925 +0100
> @@ -772,7 +772,7 @@ vect_compute_data_ref_alignment (struct
>
>alignment = TYPE_ALIGN_UNIT (vectype);
>
> -  if ((compare_tree_int (drb->aligned_to, alignment) < 0)
> +  if (drb->offset_alignment < alignment
>|| 

Re: [2/7] Make dr_analyze_innermost operate on innermost_loop_behavior

2017-07-03 Thread Richard Biener
On Mon, Jul 3, 2017 at 9:30 AM, Richard Sandiford
 wrote:
> This means that callers to dr_analyze_innermost don't need a full
> data_reference and don't need to fill in any fields beforehand.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok.

Thanks,
Richard.

> Richard
>
>
> 2017-07-03  Richard Sandiford  
>
> gcc/
> * tree-data-ref.h (dr_analyze_innermost): Replace the dr argument
> with an "innermost_loop_behavior *" and reference tree.
> * tree-data-ref.c (dr_analyze_innermost): Likewise.
> (create_data_ref): Update call accordingly.
> * tree-predcom.c (find_looparound_phi): Likewise.
>
> Index: gcc/tree-data-ref.h
> ===
> --- gcc/tree-data-ref.h 2017-07-03 07:53:58.106558668 +0100
> +++ gcc/tree-data-ref.h 2017-07-03 08:03:19.181500132 +0100
> @@ -322,7 +322,7 @@ #define DDR_DIST_VECT(DDR, I) \
>  #define DDR_REVERSED_P(DDR) (DDR)->reversed_p
>
>
> -bool dr_analyze_innermost (struct data_reference *, struct loop *);
> +bool dr_analyze_innermost (innermost_loop_behavior *, tree, struct loop *);
>  extern bool compute_data_dependences_for_loop (struct loop *, bool,
>vec *,
>vec *,
> Index: gcc/tree-data-ref.c
> ===
> --- gcc/tree-data-ref.c 2017-07-03 07:57:44.485520457 +0100
> +++ gcc/tree-data-ref.c 2017-07-03 08:03:19.181500132 +0100
> @@ -864,13 +864,12 @@ dr_analyze_innermost (struct data_refere
>  fold_convert (ssizetype, base_iv.step),
>  fold_convert (ssizetype, offset_iv.step));
>
> -  DR_BASE_ADDRESS (dr) = canonicalize_base_object_address (base_iv.base);
> +  drb->base_address = canonicalize_base_object_address (base_iv.base);
>
> -  DR_OFFSET (dr) = fold_convert (ssizetype, offset_iv.base);
> -  DR_INIT (dr) = init;
> -  DR_STEP (dr) = step;
> -
> -  DR_ALIGNED_TO (dr) = size_int (highest_pow2_factor (offset_iv.base));
> +  drb->offset = fold_convert (ssizetype, offset_iv.base);
> +  drb->init = init;
> +  drb->step = step;
> +  drb->aligned_to = size_int (highest_pow2_factor (offset_iv.base));
>
>if (dump_file && (dump_flags & TDF_DETAILS))
>  fprintf (dump_file, "success.\n");
> Index: gcc/tree-predcom.c
> ===
> --- gcc/tree-predcom.c  2017-07-03 07:53:58.106558668 +0100
> +++ gcc/tree-predcom.c  2017-07-03 08:03:19.181500132 +0100
> @@ -1149,7 +1149,7 @@ find_looparound_phi (struct loop *loop,
>memset (&init_dr, 0, sizeof (struct data_reference));
>DR_REF (&init_dr) = init_ref;
>DR_STMT (&init_dr) = phi;
> -  if (!dr_analyze_innermost (&init_dr, loop))
> +  if (!dr_analyze_innermost (&DR_INNERMOST (&init_dr), init_ref, loop))
>  return NULL;
>
>if (!valid_initializer_p (&init_dr, ref->distance + 1, root->ref))


Re: [1/7] Use innermost_loop_behavior for outer loop vectorisation

2017-07-03 Thread Richard Biener
On Mon, Jul 3, 2017 at 9:28 AM, Richard Sandiford
 wrote:
> This patch replaces the individual stmt_vinfo dr_* fields with
> an innermost_loop_behavior, so that the changes in later patches
> get picked up automatically.  It also adds a helper function for
> getting the behavior of a data reference wrt the vectorised loop.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok.

Thanks,
Richard.

> Richard
>
>
> 2017-07-03  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (_stmt_vec_info): Replace individual dr_*
> fields with dr_wrt_vec_loop.
> (STMT_VINFO_DR_BASE_ADDRESS, STMT_VINFO_DR_INIT, STMT_VINFO_DR_OFFSET)
> (STMT_VINFO_DR_STEP, STMT_VINFO_DR_ALIGNED_TO): Update accordingly.
> (STMT_VINFO_DR_WRT_VEC_LOOP): New macro.
> (vect_dr_behavior): New function.
> (vect_create_addr_base_for_vector_ref): Remove loop parameter.
> * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use
> vect_dr_behavior.  Use a step_preserves_misalignment_p boolean to
> track whether the step preserves the misalignment.
> (vect_create_addr_base_for_vector_ref): Remove loop parameter.
> Use vect_dr_behavior.
> (vect_setup_realignment): Update call accordingly.
> (vect_create_data_ref_ptr): Likewise.  Use vect_dr_behavior.
> * tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Update
> call to vect_create_addr_base_for_vector_ref.
> (vect_create_cond_for_align_checks): Likewise.
> * tree-vect-patterns.c (vect_recog_bool_pattern): Copy
> STMT_VINFO_DR_WRT_VEC_LOOP as a block.
> (vect_recog_mask_conversion_pattern): Likewise.
> * tree-vect-stmts.c (compare_step_with_zero): Use vect_dr_behavior.
> (new_stmt_vec_info): Remove redundant zeroing.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2017-07-03 07:53:58.15242 +0100
> +++ gcc/tree-vectorizer.h   2017-07-03 07:57:56.883079731 +0100
> @@ -554,11 +554,7 @@ typedef struct _stmt_vec_info {
>
>/* Information about the data-ref relative to this loop
>   nest (the loop that is being considered for vectorization).  */
> -  tree dr_base_address;
> -  tree dr_init;
> -  tree dr_offset;
> -  tree dr_step;
> -  tree dr_aligned_to;
> +  innermost_loop_behavior dr_wrt_vec_loop;
>
>/* For loop PHI nodes, the base and evolution part of it.  This makes sure
>   this information is still available in vect_update_ivs_after_vectorizer
> @@ -706,11 +702,12 @@ #define STMT_VINFO_SIMD_LANE_ACCESS_P(S)
>  #define STMT_VINFO_VEC_REDUCTION_TYPE(S)   (S)->v_reduc_type
>  #define STMT_VINFO_VEC_CONST_COND_REDUC_CODE(S) (S)->const_cond_reduc_code
>
> -#define STMT_VINFO_DR_BASE_ADDRESS(S)  (S)->dr_base_address
> -#define STMT_VINFO_DR_INIT(S)  (S)->dr_init
> -#define STMT_VINFO_DR_OFFSET(S)(S)->dr_offset
> -#define STMT_VINFO_DR_STEP(S)  (S)->dr_step
> -#define STMT_VINFO_DR_ALIGNED_TO(S)(S)->dr_aligned_to
> +#define STMT_VINFO_DR_WRT_VEC_LOOP(S)  (S)->dr_wrt_vec_loop
> +#define STMT_VINFO_DR_BASE_ADDRESS(S)  (S)->dr_wrt_vec_loop.base_address
> +#define STMT_VINFO_DR_INIT(S)  (S)->dr_wrt_vec_loop.init
> +#define STMT_VINFO_DR_OFFSET(S)(S)->dr_wrt_vec_loop.offset
> +#define STMT_VINFO_DR_STEP(S)  (S)->dr_wrt_vec_loop.step
> +#define STMT_VINFO_DR_ALIGNED_TO(S)(S)->dr_wrt_vec_loop.aligned_to
>
>  #define STMT_VINFO_IN_PATTERN_P(S) (S)->in_pattern_p
>  #define STMT_VINFO_RELATED_STMT(S) (S)->related_stmt
> @@ -1012,6 +1009,22 @@ known_alignment_for_access_p (struct dat
>return (DR_MISALIGNMENT (data_ref_info) != DR_MISALIGNMENT_UNKNOWN);
>  }
>
> +/* Return the behavior of DR with respect to the vectorization context
> +   (which for outer loop vectorization might not be the behavior recorded
> +   in DR itself).  */
> +
> +static inline innermost_loop_behavior *
> +vect_dr_behavior (data_reference *dr)
> +{
> +  gimple *stmt = DR_STMT (dr);
> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> +  if (loop_vinfo == NULL
> +  || !nested_in_vect_loop_p (LOOP_VINFO_LOOP (loop_vinfo), stmt))
> +return &DR_INNERMOST (dr);
> +  else
> +return &STMT_VINFO_DR_WRT_VEC_LOOP (stmt_info);
> +}
>
>  /* Return true if the vect cost model is unlimited.  */
>  static inline bool
> @@ -1138,8 +1151,7 @@ extern tree vect_get_new_vect_var (tree,
>  extern tree vect_get_new_ssa_name (tree, enum vect_var_kind,
>const char * = NULL);
>  extern tree vect_create_addr_base_for_vector_ref (gimple *, gimple_seq *,
> - tree, struct loop *,
> -

Re: [PATCH GCC][3/4]Generalize dead store elimination (or store motion) across loop iterations in predcom

2017-07-03 Thread Richard Biener
On Tue, Jun 27, 2017 at 12:49 PM, Bin Cheng  wrote:
> Hi,
> For the moment, tree-predcom.c only supports invariant/load-loads/store-loads
> chains.  This patch generalizes dead store elimination (or store motion) across
> loop iterations in the predictive commoning pass by supporting store-store
> chains.  As the comment in the patch says:
>
>Apart from predictive commoning on Load-Load and Store-Load chains, we
>also support Store-Store chains -- stores killed by other store can be
>eliminated.  Given below example:
>
>  for (i = 0; i < n; i++)
>{
>  a[i] = 1;
>  a[i+2] = 2;
>}
>
>It can be replaced with:
>
>  t0 = a[0];
>  t1 = a[1];
>  for (i = 0; i < n; i++)
>{
>  a[i] = 1;
>  t2 = 2;
>  t0 = t1;
>  t1 = t2;
>}
>  a[n] = t0;
>  a[n+1] = t1;
>
>If the loop runs more than one iteration, it can be further simplified into:
>
>  for (i = 0; i < n; i++)
>{
>  a[i] = 1;
>}
>  a[n] = 2;
>  a[n+1] = 2;
>
>The interesting part is this can be viewed either as general store motion
>or general dead store elimination in either intra/inter-iterations way.
>
> There are a number of interesting facts about this enhancement:
> a) This patch supports dead store elimination for both the across-iteration
>    case and the single-iteration case.  For the latter, it is plain dead
>    store elimination.
> b) There are advantages to supporting dead store elimination in predcom; for
>    example, it has complete information about memory addresses.  By contrast,
>    the DSE pass can only handle memory references with exactly the same
>    memory address expression.
> c) It's cheap to support store-store chains in predcom based on existing code.
> d) As commented, the enhancement can be viewed as either generalized dead
>    store elimination or generalized store motion.  I prefer DSE here.
> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
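The three forms in the quoted comment can be checked against each other with a small C harness. This is a hand-written model of the transformation, not code from the patch. It assumes `a` has room for n + 2 ints.

```c
#include <stddef.h>

/* The loop from the comment: each a[i+2] = 2 store is killed by a later
   a[i] = 1 store, except for the last two.  */
static void
original (int *a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      a[i] = 1;
      a[i + 2] = 2;
    }
}

/* Store-store chain form: the a[i+2] values are carried in t0/t1 and
   only the final two are written after the loop.  */
static void
chained (int *a, size_t n)
{
  int t0 = a[0], t1 = a[1];
  for (size_t i = 0; i < n; i++)
    {
      a[i] = 1;
      int t2 = 2;
      t0 = t1;
      t1 = t2;
    }
  a[n] = t0;
  a[n + 1] = t1;
}

/* Simplified form; only valid when the loop runs at least two times.  */
static void
simplified (int *a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] = 1;
  a[n] = 2;
  a[n + 1] = 2;
}
```

With n == 1, original and simplified leave different values in a[1]. That is why the simplified form needs at least two iterations.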

Looks mostly ok.  I have a few questions though.

+  /* Don't do store elimination if loop has multiple exit edges.  */
+  bool eliminate_store_p = single_exit (loop) != NULL;

handling this would be an enhancement?  IIRC LIM store-motion handles this
just fine by emitting code on all exits.

@@ -1773,6 +2003,9 @@ determine_unroll_factor (vec chains)
 {
   if (chain->type == CT_INVARIANT)
continue;
+  /* Don't unroll when eliminating stores.  */
+  else if (chain->type == CT_STORE_STORE)
+   return 1;

this is a hard exit value so we do not handle the case where another chain
in the loop would want to unroll? (enhancement?)  I'd have expected to do
the same as for CT_INVARIANT here.

+  tree init = ref_at_iteration (dr, (int) 0 - i, );
+  if (!chain->all_always_accessed && tree_could_trap_p (init))
+   {
+ gimple_seq_discard (stmts);
+ return false;

so this is the only place that remotely cares for not always performed stores.
But as-is the patch doesn't seem to avoid speculating stores and thus
violates the C++ memory model, aka, introduces store-data-races?  The LIM
store-motion code was fixed to avoid this by keeping track of whether a BB
has executed to guard the stores done in the compensation code on the loop
exit.

That said, to "fix" this all && tree_could_trap_p cases would need to be removed
(or similarly flag vars be introduced).  Speculating loads that do not
trap is ok (might only introduce false uninit use reports by tools like valgrind).
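Below is a minimal sketch of the flag-based store motion described above. It assumes a single conditional store to *p inside the loop. It models the idea only and is not LIM's or predcom's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Conditional store inside a loop: *P is written only on iterations
   where COND[i] is nonzero.  */
static void
original_loop (int *p, const int *cond, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (cond[i])
      *p = (int) i;
}

/* Store motion with a flag: keep the value in a register and remember
   whether any store executed, so the compensation store on the exit never
   writes *P on a path where the original loop did not.  An unguarded
   "*p = tmp;" after the loop would be a speculative store, i.e. a
   potential data race if another thread owns *P.  */
static void
store_motion_with_flag (int *p, const int *cond, size_t n)
{
  int tmp = 0;          /* only meaningful when STORED is true */
  bool stored = false;
  for (size_t i = 0; i < n; i++)
    if (cond[i])
      {
        tmp = (int) i;
        stored = true;
      }
  if (stored)
    *p = tmp;
}
```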

Thanks,
Richard.

> Thanks,
> bin
> 2017-06-21  Bin Cheng  
>
> * tree-predcom.c: Revise general description of pass.
> (enum chain_type): New enum type for store elimination.
> (struct chain): New field supporting store elimination.
> (dump_chain): Dump store-stores chain.
> (release_chain): Release resources.
> (split_data_refs_to_components): Compute and create components
> containing only stores for elimination.
> (get_chain_last_ref_at): New function.
> (make_invariant_chain): Initialization.
> (make_rooted_chain): Specify chain type in parameter.
> (add_looparound_copies): Skip for store-stores chain.
> (determine_roots_comp): Compute type of chain and pass it to
> make_rooted_chain.
> (initialize_root_vars_store_elim_2): New function.
> (finalize_eliminated_stores): New function.
> (remove_stmt): Handle store for elimination.
> (execute_pred_commoning_chain): Execute predictive commoning on
> store-store chains.
> (determine_unroll_factor): Skip unroll for store-stores chain.
> (prepare_initializers_chain_store_elim): New function.
> (prepare_initializers_chain): Handle store-store chain.
> (prepare_finalizers_chain, prepare_finalizers): New function.
> (tree_predictive_commoning_loop): Return 

Re: C++ PATCH to remove WITH_CLEANUP_EXPR handling

2017-07-03 Thread Marek Polacek
On Thu, Jun 29, 2017 at 05:44:25PM -0400, Jason Merrill wrote:
> The C++ front end hasn't generated WITH_CLEANUP_EXPR in a very long
> time (20+ years?), so there's no need to handle it.

Heh.  Found another one; is this patch ok if it passes testing?

2017-07-03  Marek Polacek  

* c-warn.c (warn_if_unused_value): Remove WITH_CLEANUP_EXPR handling.

diff --git gcc/c-family/c-warn.c gcc/c-family/c-warn.c
index 5d67395..b9378c2 100644
--- gcc/c-family/c-warn.c
+++ gcc/c-family/c-warn.c
@@ -465,7 +465,6 @@ warn_if_unused_value (const_tree exp, location_t locus)
 case TARGET_EXPR:
 case CALL_EXPR:
 case TRY_CATCH_EXPR:
-case WITH_CLEANUP_EXPR:
 case EXIT_EXPR:
 case VA_ARG_EXPR:
   return false;

Marek


Re: [PATCH] Add dotfn

2017-07-03 Thread Tom de Vries

On 07/03/2017 09:05 AM, Richard Biener wrote:
> On Mon, 3 Jul 2017, Tom de Vries wrote:
>
> > Hi,
> >
> > this patch adds a debug function dotfn and a convenience macro DOTFN similar
> > to dot-fn in gdbhooks.py.
> >
> > It can be used to have the compiler:
> > - dump a control flow graph, or
> > - pop up a control flow graph window
> > at specific moments in the compilation flow, for debugging purposes.
> >
> > Bootstrapped and reg-tested on x86_64.
> >
> > Used for debugging PR81192.
> >
> > OK for trunk?
>
> Why's dot-fn not enough?  I'd rather extend stuff in gdbhooks.py than
> adding this kind of stuff to gcc itself.

When expressing where and when to dump or pop-up a control flow graph,
sometimes it's easier for me to do that in C than in gdb scripting.


Thanks,
- Tom


Re: [gomp4] fix an ICE involving assumed-size arrays

2017-07-03 Thread Thomas Schwinge
Hi!

On Tue, 30 Aug 2016 14:55:06 -0700, Cesar Philippidis  wrote:
> Usually a data clause would have OMP_CLAUSE_SIZE set, but not all
> do. In the latter case, lower_omp_target falls back to using the size of the
> type of the variable specified in the data clause. However, in the case
> of assumed-size arrays, the size of the type may be NULL because it's
> undefined. My fix for this problem is to set the size to one byte if
> the size of the type is NULL. This at least gives the runtime
> the opportunity to remap any data already present on the accelerator.
> However, if the data isn't present on the accelerator, this will likely
> result in some sort of segmentation fault on the accelerator.
> 
> The OpenACC spec is not clear on how the compiler should handle
> assumed-size arrays when the user does not provide an explicit data
> clause with a proper subarray. It was tempting to make such implicit
> variables errors, but arguably that would affect usability. Perhaps I
> should add a warning for implicitly used assumed-size arrays?

(I don't know a lot about Fortran assumed-size arrays, but I agree that a
user might expect code to work, like that in the example you added.)
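The size fallback in Cesar's lower_omp_target change can be modelled as shown below. The function and its pointer parameters are hypothetical. A NULL pointer stands for a missing OMP_CLAUSE_SIZE or TYPE_SIZE_UNIT.

```c
#include <stddef.h>

/* Hypothetical model of the map-size fallback chain.  */
static size_t
map_size_for_clause (const size_t *clause_size, const size_t *type_size)
{
  if (clause_size != NULL)
    return *clause_size;     /* explicit size from the data clause */
  if (type_size != NULL)
    return *type_size;       /* size of the variable's type */
  return 1;                  /* assumed-size array: incomplete type */
}
```

The one-byte size only lets the runtime find an existing mapping. It does not describe the array's real extent.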

> I've applied this patch to gomp-4_0-branch. It looks like OpenMP has a
> similar problem.

... which Jakub for  fixed in trunk r243860,
 by
"disallow[ing] explicit or implicit OpenMP mapping of assumed-size
arrays".  So when merging these two changes, I had to apply the following
additional patch, which will need to get resolved some way or another:

--- gcc/fortran/trans-openmp.c
+++ gcc/fortran/trans-openmp.c
@@ -1048,6 +1048,11 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
 
   tree decl = OMP_CLAUSE_DECL (c);
 
+  /* This conflicts with the OpenACC changes done to support assumed-size
+ arrays that are implicitly mapped after enter data directive (see
+ libgomp.oacc-fortran/assumed-size.f90) -- doesn't the same apply to
+ OpenMP, too?  */
+#if 0
   /* Assumed-size arrays can't be mapped implicitly, they have to be
  mapped explicitly using array sections.  */
   if (TREE_CODE (decl) == PARM_DECL
@@ -1061,6 +1066,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
"implicit mapping of assumed size array %qD", decl);
   return;
 }
+#endif
 
   tree c2 = NULL_TREE, c3 = NULL_TREE, c4 = NULL_TREE;
   if (POINTER_TYPE_P (TREE_TYPE (decl)))
--- gcc/testsuite/gfortran.dg/gomp/pr78866-2.f90
+++ gcc/testsuite/gfortran.dg/gomp/pr78866-2.f90
@@ -3,7 +3,8 @@
 
 subroutine pr78866(x)
   integer :: x(*)
-!$omp target   ! { dg-error "implicit mapping of assumed size array" }
+! Regarding the XFAIL, see gcc/fortran/trans-openmp.c:gfc_omp_finish_clause.
+!$omp target   ! { dg-error "implicit mapping of assumed size array" "" { xfail *-*-* } }
   x(1) = 1
 !$omp end target
 end

For reference, here are Cesar's gomp-4_0-branch r239874 changes:

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -16534,6 +16534,12 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
> s = OMP_CLAUSE_SIZE (c);
>   if (s == NULL_TREE)
> s = TYPE_SIZE_UNIT (TREE_TYPE (ovar));
> + /* Fortran assumed-size arrays have zero size because the
> +type is incomplete.  Set the size to one to allow the
> +runtime to remap any existing data that is already
> +present on the accelerator.  */
> + if (s == NULL_TREE)
> +   s = integer_one_node;
>   s = fold_convert (size_type_node, s);
>   purpose = size_int (map_idx++);
>   CONSTRUCTOR_APPEND_ELT (vsize, purpose, s);
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/assumed-size.f90
> @@ -0,0 +1,31 @@
> +! Test if implicitly determined data clauses work with an
> +! assumed-sized array variable.  Note that the array variable, 'a',
> +! has been explicitly copied in and out via acc enter data and acc
> +! exit data, respectively.

(Should add a "dg-do run" directive here?)

> +
> +program test
> +  implicit none
> +
> +  integer, parameter :: n = 100
> +  integer a(n), i
> +
> +  call dtest (a, n)
> +
> +  do i = 1, n
> + if (a(i) /= i) call abort
> +  end do
> +end program test
> +
> +subroutine dtest (a, n)
> +  integer i, n
> +  integer a(*)
> +
> +  !$acc enter data copyin(a(1:n))
> +
> +  !$acc parallel loop
> +  do i = 1, n
> + a(i) = i
> +  end do
> +
> +  !$acc exit data copyout(a(1:n))
> +end subroutine dtest


Grüße
 Thomas


Re: [Libgomp, Fortran] Fix canadian cross build

2017-07-03 Thread Yvan Roux
On 23 June 2017 at 15:44, Yvan Roux  wrote:
> Hello,
>
> Fortran parts of libgomp (omp_lib.mod, openacc.mod, etc...) are
> missing in a canadian cross build, at least when target gfortran
> compiler comes from PATH and not from GFORTRAN_FOR_TARGET.
>
> Back in 2010, an executability test of GFORTRAN was added to fix the libgomp
> build on cygwin, but when GFORTRAN holds only the program name rather than
> its path, "test -x" fails and parts of the library are not built.
>
> This patch fixes the issue by using M4 macro AC_PATH_PROG (which
> returns the absolute name) instead of AC_CHECK_PROG in the function
> defined in config/acx.m4: NCN_STRICT_CHECK_TARGET_TOOLS.  I renamed it
> into NCN_STRICT_PATH_TARGET_TOOLS to keep the semantic used in M4.
>
> Tested by building cross and canadian cross toolchains (host:
> i686-w64-mingw32) for arm-linux-gnueabihf: without the patch the issue
> shows up, and with it libgomp is complete.
>
> ok for trunk ?

ping?

> Thanks
> Yvan
>
> config/ChangeLog
> 2017-06-23  Yvan Roux  
>
> * acx.m4 (NCN_STRICT_CHECK_TARGET_TOOLS): Renamed to ...
> (NCN_STRICT_PATH_TARGET_TOOLS): ... this.  It reflects the replacement
> of AC_CHECK_PROG by AC_PATH_PROG to get the absolute name of the
> program.
> (ACX_CHECK_INSTALLED_TARGET_TOOL): Use renamed function.
>
> ChangeLog
> 2017-06-23  Yvan Roux  
>
> * configure.ac: Use NCN_STRICT_PATH_TARGET_TOOLS instead of
> NCN_STRICT_CHECK_TARGET_TOOLS.
> * configure: Regenerate.


RFC/A: Early predictive commoning pass

2017-07-03 Thread Richard Sandiford
General predictive commoning would play havoc with loop vectorisation,
so the current pass order is clearly the right one.  But running a very
limited form of predictive commoning before vectorisation would allow us
to vectorise things like:

 for (int i = 1; i < n; ++i)
   x[i] = x[i - 1] + 1;

This patch adds an extra pass that is restricted to cases that should
help (or at least not hinder) vectorisation.  It gives some nice
improvements on some internal benchmarks.
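Below is a hand-written C sketch of roughly what predictive commoning does to the x[i] = x[i - 1] + 1 loop above. It assumes n is a size_t and x has at least n elements. The value of x[i - 1] is carried in a register, so the loop no longer loads from x. The stored value then becomes a plain induction that a vectorizer can handle. This illustrates the idea and is not the pass's literal output.

```c
#include <stddef.h>

/* The loop from the RFC: a loop-carried dependence through memory.  */
static void
original (int *x, size_t n)
{
  for (size_t i = 1; i < n; ++i)
    x[i] = x[i - 1] + 1;
}

/* After commoning: the previous value lives in PREV, so each iteration
   only stores, and x[i] == x[0] + i.  */
static void
after_predcom (int *x, size_t n)
{
  if (n < 2)
    return;                 /* avoid reading x[0] when the loop is empty */
  int prev = x[0];
  for (size_t i = 1; i < n; ++i)
    {
      int cur = prev + 1;
      x[i] = cur;
      prev = cur;
    }
}
```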

I compared the output for SPEC 2k6 before and after the patch.  For some
benchmarks it led to a trivial register renaming, but had no effect on
those benchmarks beyond that.  The only benchmark that changed in a
significant way was 416.gamess, where we were able to vectorise some
simple loops that we weren't previously.  None of those loops seem to
be hot though, so there was no measurable difference in the score.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  Thoughts?  Is this
too much of a special case to support a new pass?  OTOH, other compilers
do vectorise the loop above, so it would be nice if we could too...

Richard


2017-07-03  Richard Sandiford  

gcc/
* passes.def (pass_early_predcom): New.
* tree-pass.h (make_pass_early_predcom): Declare.
* tree-predcom.c (MAX_DISTANCE): Turn into an inclusive rather than
exclusive upper bound.
(only_simple_p): New variable.
(max_distance): Likewise.
(add_ref_to_chain): Use MAX_DISTANCE rather than max_distance
and treat it as an inclusive upper bound.  Require the store to
come after the load at the maximum distance if only_simple_p.
(add_looparound_copies): Do nothing if only_simple_p.
(determine_roots_comp): Use MAX_DISTANCE rather than max_distance
and treat it as an inclusive upper bound.  Require the start of
a chain to be a store if only_simple_p.
(determine_unroll_factor): Return 1 if only_simple_p.
(tree_predictive_commoning): Add an early_p parameter.  Set up
only_simple_p and max_distance.
(run_tree_predictive_commoning): Add an early_p parameter.
Update call to tree_predictive_commoning.
(pass_data_early_predcom): New descriptor.
(pass_early_predcom): New class.
(pass_data_predcom::execute): Update call to
run_tree_predictive_commoning.
(make_pass_early_predcom): New function.

gcc/testsuite/
* gnat.dg/vect18.adb: Turn off predictive commoning.
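The MAX_DISTANCE change should not change what the existing pass accepts. An exclusive bound of 4 (or 8) admits the same distances as an inclusive bound of 3 (or 7), and the early pass can then use an inclusive bound of 1. The sketch below shows the two checks, assuming distances are plain unsigned integers. `old_distance_ok` and `new_distance_ok` are hypothetical stand-ins for the tests in add_ref_to_chain and determine_roots_comp; they are not functions in tree-predcom.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Old macro: exclusive upper bound on the distance between references.  */
#define OLD_MAX_DISTANCE(regs) ((regs) < 16 ? 4 : 8)
/* New macro: inclusive upper bound.  */
#define NEW_MAX_DISTANCE(regs) ((regs) < 16 ? 3 : 7)

/* Hypothetical stand-in for the check before the patch.  DIST is the
   number of iterations between two references.  */
bool
old_distance_ok (unsigned dist, int target_avail_regs)
{
  return dist < OLD_MAX_DISTANCE (target_avail_regs);
}

/* Hypothetical stand-in for the check after the patch.  MAX_DISTANCE is
   NEW_MAX_DISTANCE for the normal pass and 1 for the early pass.  */
bool
new_distance_ok (unsigned dist, unsigned max_distance)
{
  return dist <= max_distance;
}
```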

Index: gcc/passes.def
===
--- gcc/passes.def  2017-06-22 12:22:55.989380389 +0100
+++ gcc/passes.def  2017-07-03 09:17:28.626495661 +0100
@@ -290,6 +290,7 @@ along with GCC; see the file COPYING3.
  NEXT_PASS (pass_parallelize_loops, false /* oacc_kernels_p */);
  NEXT_PASS (pass_expand_omp_ssa);
  NEXT_PASS (pass_ch_vect);
+ NEXT_PASS (pass_early_predcom);
  NEXT_PASS (pass_if_conversion);
  /* pass_vectorize must immediately follow pass_if_conversion.
 Please do not add any other passes in between.  */
Index: gcc/tree-pass.h
===
--- gcc/tree-pass.h 2017-06-22 12:22:34.954287935 +0100
+++ gcc/tree-pass.h 2017-07-03 09:17:28.627495621 +0100
@@ -369,6 +369,7 @@ extern gimple_opt_pass *make_pass_tree_l
 extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_early_predcom (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt);
Index: gcc/tree-predcom.c
===
--- gcc/tree-predcom.c  2017-07-03 08:42:47.632532107 +0100
+++ gcc/tree-predcom.c  2017-07-03 09:29:08.744451338 +0100
@@ -218,7 +218,7 @@ Free Software Foundation; either version
 /* The maximum number of iterations between the considered memory
references.  */
 
-#define MAX_DISTANCE (target_avail_regs < 16 ? 4 : 8)
+#define MAX_DISTANCE (target_avail_regs < 16 ? 3 : 7)
 
 /* Data references (or phi nodes that carry data reference values across
loop iterations).  */
@@ -343,6 +343,71 @@ struct component
 
 static hash_map *name_expansions;
 
+/* True if we're running the early predcom pass and should only handle
+   cases that aid vectorization.  Specifically this means that:
+
+   - only CT_INVARIANT and CT_STORE_LOAD chains are used
+   - the maximum distance for a CT_STORE_LOAD chain is 1 iteration,
+ and at that distance the store must come after the load
+   - there's no unrolling or detection of looparound phis.
+

Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop

2017-07-03 Thread Richard Biener
On Mon, 3 Jul 2017, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Mon, 12 Jun 2017, Bin.Cheng wrote:
> >> On Mon, Jun 12, 2017 at 9:19 AM, Richard Sandiford
> >>  wrote:
> >> > "Bin.Cheng"  writes:
> >> >> On Sat, Jun 10, 2017 at 10:40 AM, Richard Sandiford
> >> >>  wrote:
> >> >>> Sorry to return this old patch, but:
> >> >>>
> >> >>> Bin Cheng  writes:
> >>  -/* Calculate the number of iterations under which scalar loop will be
> >>  -   preferred than vectorized loop.  NITERS_PROLOG is the number of
> >>  -   iterations of prolog loop.  If it's integer const, the integer
> >>  -   number is also passed by INT_NITERS_PROLOG.  VF is vector factor;
> >>  -   TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
> >>  -   true.  This function also store upper bound of the result in BOUND.  */
> >>  +/* Calculate the number of iterations above which vectorized loop will be
> >>  +   preferred than scalar loop.  NITERS_PROLOG is the number of iterations
> >>  +   of prolog loop.  If it's integer const, the integer number is also passed
> >>  +   in INT_NITERS_PROLOG.  BOUND_PROLOG is the upper bound (included) of
> >>  +   number of iterations of prolog loop.  VFM1 is vector factor minus one.
> >>  +   If CHECK_PROFITABILITY is true, TH is the threshold below which scalar
> >>  +   (rather than vectorized) loop will be executed.  This function stores
> >>  +   upper bound (included) of the result in BOUND_SCALAR.  */
> >> 
> >>   static tree
> >>   vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
> >>  -  int bound_prolog, int vf, int th, int *bound,
> >>  -  bool check_profitability)
> >>  +  int bound_prolog, int vfm1, int th,
> >>  +  int *bound_scalar, bool check_profitability)
> >>   {
> >> tree type = TREE_TYPE (niters_prolog);
> >> tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
> >>  -  build_int_cst (type, vf));
> >>  +  build_int_cst (type, vfm1));
> >> 
> >>  -  *bound = vf + bound_prolog;
> >>  +  *bound_scalar = vfm1 + bound_prolog;
> >> if (check_profitability)
> >>   {
> >>  -  th++;
> >>  +  /* TH indicates the minimum niters of vectorized loop, while we
> >>  +  compute the maximum niters of scalar loop.  */
> >>  +  th--;
> >> >>>
> >> >>> Are you sure about this last change?  It looks like it should be dropping
> >> >> Hi Richard,
> >> >> Thanks for spotting this.  I vaguely remember I got this from the way
> >> >> how niter/th was checked in previous peeling code, but didn't double
> >> >> check it now.  I tend to believe there is inconsistency about th,
> >> >> especially with comment like:
> >> >>
> >> >>   /* Threshold of number of iterations below which vectorzation will not be
> >> >>  performed. It is calculated from MIN_PROFITABLE_ITERS and
> >> >>  PARAM_MIN_VECT_LOOP_BOUND. */
> >> >>   unsigned int th;
> >> >>
> >> >> I also tend to believe the inconsistency was introduced partly because
> >> >> niters in vectorizer stands for latch_niters + 1, while latch_niters
> >> >> in rest of the compiler.
> >> >>
> >> >> and...,
> >> >>
> >> >>> the increment rather than replacing it with a decrement.
> >> >>>
> >> >>> It looks like the threshold is already the maximum niters for the scalar
> >> >>> loop.  It's set by:
> >> >>>
> >> >>>   min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
> >> >>> * vectorization_factor) - 1);
> >> >>>
> >> >>>   /* Use the cost model only if it is more conservative than user specified
> >> >>>  threshold.  */
> >> >>>   th = (unsigned) min_scalar_loop_bound;
> >> >>>   if (min_profitable_iters
> >> >>>   && (!min_scalar_loop_bound
> >> >>>   || min_profitable_iters > min_scalar_loop_bound))
> >> >>> th = (unsigned) min_profitable_iters;
> >> >>>
> >> >>>   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
> >> >>>
> >> >>> (Note the "- 1" in the min_scalar_loop_bound.  The multiplication result
> >> >>> is the minimum niters for the vector loop.)
> >> >> To be honest, min_scalar_loop_bound is more likely for something's
> >> >> lower bound which is the niters for the vector loop.  If it refers to
> >> >> the niters scalar loop, it is in actuality the "max" value we should
> >> >> use.  I am not quite sure here, partly because I am not a native
> >> >> speaker.
> >> >>
> >> >>>
> >> >>> min_profitable_iters sounds like it _ought_ to 

Re: Revamp loop profile scaling to profile_probability

2017-07-03 Thread Richard Biener
On Sat, Jul 1, 2017 at 7:14 PM, Jan Hubicka  wrote:
> Hi,
> this patch makes loop profile scaling use profile_probability.  This
> is mostly a trivial change, except for vect_do_peeling, which seems to scale
> the profile down and then back up.  This is a bad idea, because things may simply
> drop to 0.  So I kept that one using integer scaling (because a probability
> cannot represent a value greater than 1).
>
> Bootstrapped/regtested x86_64-linux.

This likely regressed

FAIL: gcc.dg/vect/pr79347.c scan-tree-dump-not vect "Invalid sum of "

Richard.

> Honza
> * cfg.c (scale_bbs_frequencies): New function.
> * cfg.h (scale_bbs_frequencies): Declare it.
> * cfgloopanal.c (single_likely_exit): Cleanup.
> * cfgloopmanip.c (scale_loop_frequencies): Take profile_probability
> as parameter.
> (scale_loop_profile): Likewise.
> (loop_version): Likewise.
> (create_empty_loop_on_edge): Update.
> * cfgloopmanip.h (scale_loop_frequencies, scale_loop_profile,
> loopify, loop_version): Update prototypes.
> * modulo-sched.c (sms_schedule): Update.
> * predict.c (unlikely_executed_edge_p): Also check probability.
> (probably_never_executed_edge_p): Fix typo.
> * tree-if-conv.c (version_loop_for_if_conversion): Update.
> * tree-parloops.c (gen_parallel_loop): Update.
> * tree-ssa-loop-ivcanon.c (try_peel_loop): Update.
> * tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Update.
> * tree-ssa-loop-split.c (split_loop): Update.
> * tree-ssa-loop-unswitch.c (tree_unswitch_loop): Update.
> * tree-vect-loop-manip.c (vect_do_peeling): Update.
> (vect_loop_versioning): Update.
> * tree-vect-loop.c (scale_profile_for_vect_loop): Update.
> Index: cfg.c
> ===
> --- cfg.c   (revision 249866)
> +++ cfg.c   (working copy)
> @@ -1051,6 +1051,26 @@ scale_bbs_frequencies_profile_count (bas
>  }
>  }
>
> +/* Multiply the frequencies and counts of all basic blocks in array BBS
> +   of length NBBS, and the counts of their outgoing edges, by
> +   probability P.  */
> +void
> +scale_bbs_frequencies (basic_block *bbs, int nbbs,
> +  profile_probability p)
> +{
> +  int i;
> +  edge e;
> +
> +  for (i = 0; i < nbbs; i++)
> +{
> +  edge_iterator ei;
> +  bbs[i]->frequency = p.apply (bbs[i]->frequency);
> +  bbs[i]->count = bbs[i]->count.apply_probability (p);
> +  FOR_EACH_EDGE (e, ei, bbs[i]->succs)
> +   e->count =  e->count.apply_probability (p);
> +}
> +}
> +
>  /* Helper types for hash tables.  */
>
>  struct htab_bb_copy_original_entry
> Index: cfg.h
> ===
> --- cfg.h   (revision 249866)
> +++ cfg.h   (working copy)
> @@ -109,6 +109,7 @@ extern void scale_bbs_frequencies_gcov_t
>  gcov_type);
>  extern void scale_bbs_frequencies_profile_count (basic_block *, int,
>  profile_count, profile_count);
> +extern void scale_bbs_frequencies (basic_block *, int, profile_probability);
>  extern void initialize_original_copy_tables (void);
>  extern void reset_original_copy_tables (void);
>  extern void free_original_copy_tables (void);
> Index: cfgloopanal.c
> ===
> --- cfgloopanal.c   (revision 249866)
> +++ cfgloopanal.c   (working copy)
> @@ -469,16 +469,12 @@ single_likely_exit (struct loop *loop)
>exits = get_loop_exit_edges (loop);
>FOR_EACH_VEC_ELT (exits, i, ex)
>  {
> -  if (ex->flags & (EDGE_EH | EDGE_ABNORMAL_CALL))
> -   continue;
> -  /* The constant of 5 is set in a way so noreturn calls are
> -ruled out by this test.  The static branch prediction algorithm
> - will not assign such a low probability to conditionals for usual
> - reasons.
> -FIXME: Turn to likely_never_executed  */
> -  if ((profile_status_for_fn (cfun) != PROFILE_ABSENT
> -  && ex->probability < profile_probability::from_reg_br_prob_base (5))
> - || ex->count == profile_count::zero ())
> +  if (probably_never_executed_edge_p (cfun, ex)
> + /* We want to rule out paths to noreturns but not low probabilities
> +resulting from adjustments or combining.
> +FIXME: once we have better quality tracking, make this more
> +robust.  */
> + || ex->probability <= profile_probability::very_unlikely ())
> continue;
>if (!found)
> found = ex;
> Index: cfgloopmanip.c
> ===
> --- cfgloopmanip.c  (revision 249866)
> +++ cfgloopmanip.c  
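The following is a simplified model of the concern about scaling the profile down and then back up. It assumes a fixed-point probability with a hypothetical base of 10000, which is not GCC's profile_probability representation. Once a small count rounds to 0, no later scaling can recover it. A factor above 1 cannot be expressed as a probability at all, which is why the scale-up needs plain integer scaling. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point probability in [0, PROB_BASE]; a simplified
   model, not GCC's profile_probability.  */
#define PROB_BASE 10000

/* Scale COUNT by probability P / PROB_BASE, rounding to nearest.
   P can never exceed PROB_BASE, so this can only scale down.  */
int64_t
apply_probability (int64_t count, int p)
{
  assert (p >= 0 && p <= PROB_BASE);
  return (count * p + PROB_BASE / 2) / PROB_BASE;
}

/* Integer scaling by NUM / DEN, rounding to nearest; NUM / DEN may be
   greater than 1.  */
int64_t
apply_scale (int64_t count, int64_t num, int64_t den)
{
  assert (den > 0);
  return (count * num + den / 2) / den;
}
```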

Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop

2017-07-03 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, 12 Jun 2017, Bin.Cheng wrote:
>> On Mon, Jun 12, 2017 at 9:19 AM, Richard Sandiford
>>  wrote:
>> > "Bin.Cheng"  writes:
>> >> On Sat, Jun 10, 2017 at 10:40 AM, Richard Sandiford
>> >>  wrote:
>> >>> Sorry to return this old patch, but:
>> >>>
>> >>> Bin Cheng  writes:
>>  -/* Calculate the number of iterations under which scalar loop will be
>>  -   preferred than vectorized loop.  NITERS_PROLOG is the number of
>>  -   iterations of prolog loop.  If it's integer const, the integer
>>  -   number is also passed by INT_NITERS_PROLOG.  VF is vector factor;
>>  -   TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
>>  -   true.  This function also store upper bound of the result in BOUND.  */
>>  +/* Calculate the number of iterations above which vectorized loop will be
>>  +   preferred than scalar loop.  NITERS_PROLOG is the number of iterations
>>  +   of prolog loop.  If it's integer const, the integer number is also passed
>>  +   in INT_NITERS_PROLOG.  BOUND_PROLOG is the upper bound (included) of
>>  +   number of iterations of prolog loop.  VFM1 is vector factor minus one.
>>  +   If CHECK_PROFITABILITY is true, TH is the threshold below which scalar
>>  +   (rather than vectorized) loop will be executed.  This function stores
>>  +   upper bound (included) of the result in BOUND_SCALAR.  */
>> 
>>   static tree
>>   vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
>>  -  int bound_prolog, int vf, int th, int *bound,
>>  -  bool check_profitability)
>>  +  int bound_prolog, int vfm1, int th,
>>  +  int *bound_scalar, bool check_profitability)
>>   {
>> tree type = TREE_TYPE (niters_prolog);
>> tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
>>  -  build_int_cst (type, vf));
>>  +  build_int_cst (type, vfm1));
>> 
>>  -  *bound = vf + bound_prolog;
>>  +  *bound_scalar = vfm1 + bound_prolog;
>> if (check_profitability)
>>   {
>>  -  th++;
>>  +  /* TH indicates the minimum niters of vectorized loop, while we
>>  +  compute the maximum niters of scalar loop.  */
>>  +  th--;
>> >>>
>> >>> Are you sure about this last change?  It looks like it should be dropping
>> >> Hi Richard,
>> >> Thanks for spotting this.  I vaguely remember I got this from the way
>> >> how niter/th was checked in previous peeling code, but didn't double
>> >> check it now.  I tend to believe there is inconsistency about th,
>> >> especially with comment like:
>> >>
>> >>   /* Threshold of number of iterations below which vectorzation will not be
>> >>  performed. It is calculated from MIN_PROFITABLE_ITERS and
>> >>  PARAM_MIN_VECT_LOOP_BOUND. */
>> >>   unsigned int th;
>> >>
>> >> I also tend to believe the inconsistency was introduced partly because
>> >> niters in vectorizer stands for latch_niters + 1, while latch_niters
>> >> in rest of the compiler.
>> >>
>> >> and...,
>> >>
>> >>> the increment rather than replacing it with a decrement.
>> >>>
>> >>> It looks like the threshold is already the maximum niters for the scalar
>> >>> loop.  It's set by:
>> >>>
>> >>>   min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
>> >>> * vectorization_factor) - 1);
>> >>>
>> >>>   /* Use the cost model only if it is more conservative than user specified
>> >>>  threshold.  */
>> >>>   th = (unsigned) min_scalar_loop_bound;
>> >>>   if (min_profitable_iters
>> >>>   && (!min_scalar_loop_bound
>> >>>   || min_profitable_iters > min_scalar_loop_bound))
>> >>> th = (unsigned) min_profitable_iters;
>> >>>
>> >>>   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
>> >>>
>> >>> (Note the "- 1" in the min_scalar_loop_bound.  The multiplication result
>> >>> is the minimum niters for the vector loop.)
>> >> To be honest, min_scalar_loop_bound is more likely for something's
>> >> lower bound which is the niters for the vector loop.  If it refers to
>> >> the niters scalar loop, it is in actuality the "max" value we should
>> >> use.  I am not quite sure here, partly because I am not a native
>> >> speaker.
>> >>
>> >>>
>> >>> min_profitable_iters sounds like it _ought_ to be the minimum niters for
>> >>> which the vector loop is used, but vect_estimate_min_profitable_iters
>> >>> instead returns the largest niters for which the scalar loop should be
>> >>> preferred:
>> >>>
>> >>>   /* Cost model disabled.  */
>> >>>   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
>> >>> {
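It may help to work the quoted threshold code through with sample numbers. The sketch below reproduces that computation with plain integers. `compute_th` and `use_vector_loop_p` are hypothetical names. The sample values (a PARAM_MIN_VECT_LOOP_BOUND of 2 and a vectorization factor of 4) are chosen only for illustration. With them, th is 7. Under Richard's reading, the vector loop therefore needs at least 8 = 2 * 4 iterations, and th is already the largest niters for which the scalar loop runs, with no extra increment or decrement needed.

```c
#include <assert.h>

/* Hypothetical sketch of the quoted threshold computation.
   PARAM_MIN_BOUND stands in for PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
   and VF for vectorization_factor.  The result is the largest niters
   for which the scalar loop is still used.  */
unsigned
compute_th (int param_min_bound, int vf, int min_profitable_iters)
{
  int min_scalar_loop_bound = param_min_bound * vf - 1;
  unsigned th = (unsigned) min_scalar_loop_bound;
  /* Use the cost model only if it is more conservative.  */
  if (min_profitable_iters
      && (!min_scalar_loop_bound
          || min_profitable_iters > min_scalar_loop_bound))
    th = (unsigned) min_profitable_iters;
  return th;
}

/* Under that reading, the vector loop is used only when NITERS > TH.  */
int
use_vector_loop_p (unsigned niters, unsigned th)
{
  return niters > th;
}
```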

[7/7] Pool alignment information for common bases

2017-07-03 Thread Richard Sandiford
This patch is a follow-on to the fix for PR81136.  The testcase for that
PR shows that we can (correctly) calculate different base alignments
for two data_references but still tell that their misalignments wrt the
vector size are equal.  This is because we calculate the base alignments
for each dr individually, without looking at the other drs, and in
general the alignment we calculate is only guaranteed if the dr's DR_REF
actually occurs.

This is working as designed, but it does expose a missed opportunity.
We know that if a vectorised loop is reached, all statements in that
loop execute at least once, so it should be safe to pool the alignment
information for all the statements we're vectorising.  The only catch is
that DR_REFs for masked loads and stores only occur if the mask value is
nonzero.  For example, in:

  struct s __attribute__((aligned(32))) {
    int misaligner;
    int array[N];
  };

  int *ptr;
  for (int i = 0; i < n; ++i)
    ptr[i] = c[i] ? ((struct s *) (ptr - 1))->array[i] : 0;

we can only guarantee that ptr points to a "struct s" if at least
one c[i] is true.

This patch adds a DR_IS_CONDITIONAL_IN_STMT flag to record whether
the DR_REF is guaranteed to occur every time that the statement
executes to completion.  It then pools the alignment information
for references that aren't conditional in this sense.
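The pooling idea can be modelled in a few lines. This is a hypothetical sketch, not the patch's code. Base addresses are reduced to small integer ids, and the hash map (vec_base_alignments in the patch) is replaced by an array. Only references that are not conditional in their statement contribute alignment information. Following the ChangeLog, the pooled value is consulted only for unconditional references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified reference: BASE_ID stands in for the base
   address tree, BASE_ALIGNMENT is what this reference's own analysis
   proved (in bytes).  */
struct ref
{
  int base_id;
  unsigned base_alignment;
  bool is_conditional_in_stmt;
};

#define MAX_BASES 16

/* Pooled maximum known alignment per base id; 0 means nothing recorded.  */
struct pool
{
  unsigned best_alignment[MAX_BASES];
};

/* Record the alignment of every unconditional reference, keeping the
   maximum seen for each base.  Conditional references (e.g. masked
   loads and stores) are skipped, since their alignment is only
   guaranteed if the access actually happens.  */
void
record_base_alignments (struct pool *pool, const struct ref *refs, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      const struct ref *r = &refs[i];
      assert (r->base_id >= 0 && r->base_id < MAX_BASES);
      if (r->is_conditional_in_stmt)
        continue;
      if (r->base_alignment > pool->best_alignment[r->base_id])
        pool->best_alignment[r->base_id] = r->base_alignment;
    }
}

/* Alignment to use for R: the pooled value if R is unconditional and
   the pool knows more, otherwise R's own value.  */
unsigned
ref_alignment (const struct pool *pool, const struct ref *r)
{
  unsigned align = r->base_alignment;
  if (!r->is_conditional_in_stmt
      && pool->best_alignment[r->base_id] > align)
    align = pool->best_alignment[r->base_id];
  return align;
}
```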

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2017-07-03  Richard Sandiford  

gcc/
* tree-vectorizer.h: Include tree-hash-traits.h.
(vec_base_alignments): New typedef.
(vec_info): Add a base_alignments field.
(vect_record_base_alignments): Declare.
* tree-data-ref.h (data_reference): Add an is_conditional_in_stmt
field.
(DR_IS_CONDITIONAL_IN_STMT): New macro.
(create_data_ref): Add an is_conditional_in_stmt argument.
* tree-data-ref.c (create_data_ref): Likewise.  Use it to initialize
the is_conditional_in_stmt field.
(data_ref_loc): Add an is_conditional_in_stmt field.
(get_references_in_stmt): Set the is_conditional_in_stmt field.
(find_data_references_in_stmt): Update call to create_data_ref.
(graphite_find_data_references_in_stmt): Likewise.
* tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Likewise.
* tree-vect-data-refs.c (vect_analyze_data_refs): Likewise.
(vect_record_base_alignment): New function.
(vect_record_base_alignments): Likewise.
(vect_compute_data_ref_alignment): Adjust base_addr and aligned_to
for nested statements even if we fail to compute a misalignment.
Use pooled base alignments for unconditional references.
(vect_find_same_alignment_drs): Compare base addresses instead
of base objects.
(vect_analyze_data_refs_alignment): Call vect_record_base_alignments.
* tree-vect-slp.c (vect_slp_analyze_bb_1): Likewise.
(new_bb_vec_info): Initialize base_alignments.
* tree-vect-loop.c (new_loop_vec_info): Likewise.
* tree-vectorizer.c (vect_destroy_datarefs): Release base_alignments.

gcc/testsuite/
* gcc.dg/vect/pr81136.c: Add scan test.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2017-07-03 08:42:50.186422191 +0100
+++ gcc/tree-vectorizer.h   2017-07-03 08:45:24.571165851 +0100
@@ -22,6 +22,7 @@ Software Foundation; either version 3, o
 #define GCC_TREE_VECTORIZER_H
 
 #include "tree-data-ref.h"
+#include "tree-hash-traits.h"
 #include "target.h"
 
 /* Used for naming of new temporaries.  */
@@ -84,6 +85,11 @@ struct stmt_info_for_cost {
 
 typedef vec stmt_vector_for_cost;
 
+/* Maps base addresses to an innermost_loop_behavior that gives the maximum
+   known alignment for that base.  */
+typedef hash_map vec_base_alignments;
+
 /
   SLP
  /
@@ -156,6 +162,10 @@ struct vec_info {
   /* All data references.  */
   vec datarefs;
 
+  /* Maps base addresses to an innermost_loop_behavior that gives the maximum
+ known alignment for that base.  */
+  vec_base_alignments base_alignments;
+
   /* All data dependences.  */
   vec ddrs;
 
@@ -1132,6 +1142,7 @@ extern bool vect_prune_runtime_alias_tes
 extern bool vect_check_gather_scatter (gimple *, loop_vec_info,
   gather_scatter_info *);
 extern bool vect_analyze_data_refs (vec_info *, int *);
+extern void vect_record_base_alignments (vec_info *);
 extern tree vect_create_data_ref_ptr (gimple *, tree, struct loop *, tree,
  tree *, gimple_stmt_iterator *,
  gimple **, bool, bool *,
Index: gcc/tree-data-ref.h

PING: Fwd: SSA range class and removal of VR_ANTI_RANGEs

2017-07-03 Thread Aldy Hernandez
-- Forwarded message --
From: Aldy Hernandez 
Date: Wed, Jun 21, 2017 at 3:01 AM
Subject: Re: SSA range class and removal of VR_ANTI_RANGEs
To: Jakub Jelinek 
Cc: Richard Biener , Andrew MacLeod
, richard.sandif...@linaro.org, gcc-patches
, Martin Sebor 


Hi folks.

The following is another iteration of the SSA range class, taking into
account many of the suggestions posted on this thread, especially the
addition of a memory efficient class for storage, folding non-zero
bits back into the range information, C++ suggestions by Martin, and
some minor suggestions.

Most importantly, I have included an irange_storage class that uses
trailing_wide_ints<5>.  This produces far better results that my
previous incarnation with wide_int[6] :).

The storage class is basically this:

class GTY((variable_size)) irange_storage
{
  friend class irange;
 public:
  /* Maximum number of pairs of ranges allowed.  */
  static const unsigned int max_pairs = 2;
  /* These are the pair of subranges for the irange.  The last
     wide_int allocated is a mask representing which bits in an
     integer are known to be non-zero.  */
  trailing_wide_ints trailing_bounds;
};

Compare this with mainline which has trailing_wide_ints<3>.  The extra
2 items in this patchset chew up two 64-bit words, for an additional
16 bytes per range in SSA_NAME_RANGE_INFO.  No additional storage is
needed for SSA_NAMEs per se.
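Here is the storage arithmetic as a sketch. With max_pairs = 2 there are two bounds per pair plus the nonzero-bits mask, which gives 5 trailing wide_ints against mainline's 3. The text above assumes each extra element takes one 64-bit word, and under that assumption the difference is 16 extra bytes per range. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: trailing wide_ints needed for MAX_PAIRS
   sub-ranges, i.e. two bounds per pair plus one nonzero-bits mask.  */
size_t
num_trailing_ints (size_t max_pairs)
{
  return max_pairs * 2 + 1;
}

/* Hypothetical helper: extra bytes per range relative to a layout with
   BASELINE_INTS elements, assuming each element occupies WORDS_PER_INT
   64-bit words.  */
size_t
extra_bytes_per_range (size_t max_pairs, size_t baseline_ints,
                       size_t words_per_int)
{
  size_t n = num_trailing_ints (max_pairs);
  assert (n >= baseline_ints);
  return (n - baseline_ints) * words_per_int * 8;
}
```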

I looked at Jakub's suggestion of compiling insn-recog.c.  Although I
don't see the 4M SSA_NAME nodes that Jakub sees, I do see a little
over a million when building with:

./cc1plus insn-recog.ii -fno-PIE -O2 -fno-exceptions -fno-rtti
-fasynchronous-unwind-tables  -quiet -fsanitize=address,undefined
-fmem-report

I explored 3 different ways of measuring memory consumption:

1. /usr/bin/time -f "%M" , which measures maximum RSS usage.  This
produced results within the noise.  The RSS usage differed slightly
between runs, with no consistent difference between mainline and
patch.

2. valgrind --tool=massif , no difference.  Perhaps the overhead
of our GC hides any difference?

3. --enable-gather-detailed-mem-stats and -fmem-report ...

Total Allocated before: 2351658176
Total Allocated  after: 2353199328
diff: 1541152 (0.06%)

SSA_NAME nodes allocated: 1026694

AFAICT with -fmem-report, a 2.35 GB compilation consumes about 1.5 MB
more.  This is total usage, and some of this gets cleaned up during GC,
so the total impact is probably less.  Unless there is another
preferred way of measuring memory usage, I think memory is a non-issue
with this approach.
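The -fmem-report figures above work out as shown below. This is plain arithmetic on the reported numbers, and the helper names are hypothetical. The growth is about 0.066% of the baseline, which the message truncates to 0.06%. Suppose, purely for the arithmetic, that all of the growth came from the 16 extra bytes per range. That would correspond to 96,322 ranges, far fewer than the 1,026,694 SSA_NAMEs allocated. This fits with not every SSA_NAME carrying range information, but the report does not break the figure down that way.

```c
#include <assert.h>
#include <stdint.h>

/* Totals reported by -fmem-report above.  */
#define TOTAL_BEFORE 2351658176ULL
#define TOTAL_AFTER  2353199328ULL

/* Hypothetical helper: difference in bytes between the two runs.  */
uint64_t
mem_diff (void)
{
  return TOTAL_AFTER - TOTAL_BEFORE;
}

/* Hypothetical helper: growth relative to the baseline in thousandths
   of a percent, rounded to nearest.  */
uint64_t
growth_thousandths_of_percent (void)
{
  return (mem_diff () * 100000 + TOTAL_BEFORE / 2) / TOTAL_BEFORE;
}
```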

Note, this is even before my pending patch avoiding generation of
useless range information
(https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01068.html).

How does this look?

Aldy
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8ace3c2..5e48d6e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1416,6 +1416,7 @@ OBJS = \
print-rtl-function.o \
print-tree.o \
profile.o \
+   range.o \
read-md.o \
read-rtl.o \
read-rtl-function.o \
@@ -2484,6 +2485,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/gimple.h \
   $(srcdir)/gimple-ssa.h \
   $(srcdir)/tree-chkp.c \
+  $(srcdir)/range.h $(srcdir)/range.c \
   $(srcdir)/tree-ssanames.c $(srcdir)/tree-eh.c $(srcdir)/tree-ssa-address.c \
   $(srcdir)/tree-cfg.c $(srcdir)/tree-ssa-loop-ivopts.c \
   $(srcdir)/tree-dfa.c \
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 4f6c9c4..b5c9eb0 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm_p.h"
 #include "stringpool.h"
 #include "tree-vrp.h"
+#include "range.h"
 #include "tree-ssanames.h"
 #include "expmed.h"
 #include "optabs.h"
@@ -2894,6 +2895,52 @@ builtin_memcpy_read_str (void *data, HOST_WIDE_INT offset,
   return c_readstr (str + offset, mode);
 }
 
+/* If a range IR may have wrapped in such a way that we can guess the
+   range is positive, return TRUE and set PROBABLE_MAX_SIZE.
+   Otherwise, return FALSE and leave PROBABLE_MAX_SIZE unchanged.  */
+
+static bool
+range_may_have_wrapped (irange ir,
+   unsigned HOST_WIDE_INT *probable_max_size)
+{
+  /* Code like:
+
+   signed int n;
+   if (n < 100)
+ {
+   # RANGE [0, 99][0x8000, 0x]
+  _1 = (unsigned) n;
+  memcpy (a, b, _1)
+ }
+
+ produces a range allowing negative values of N.  We can still use
+ the information and make a guess that N is not negative.  */
+  if (ir.num_pairs () != 2
+  || ir.lower_bound () != 0)
+return false;
+
+  const_tree type = ir.get_type ();
+  unsigned precision = TYPE_PRECISION (type);
+  gcc_assert (TYPE_UNSIGNED (type));
+
+  /* Build a range with all 

[6/7] Add a helper for getting the overall alignment of a DR

2017-07-03 Thread Richard Sandiford
This combines the information from previous patches to give a guaranteed
alignment for the DR as a whole.  This should be a bit safer than using
base_element_aligned, since that only really took the base into account
(not the init or offset).
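The logic of the new dr_alignment helper can be exercised outside GCC with plain integers. In this sketch, `struct drb_model` and `model_dr_alignment` are hypothetical stand-ins for innermost_loop_behavior and dr_alignment, with INIT, OFFSET and STEP reduced to integers rather than trees. The key step is `misalignment & -misalignment`. It isolates the lowest set bit, which is the largest power of two dividing the combined misalignment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified stand-in for the alignment-related fields of
   innermost_loop_behavior; all alignments are in bytes.  */
struct drb_model
{
  unsigned base_alignment;     /* power of two */
  unsigned base_misalignment;
  int64_t init;                /* constant byte offset */
  int64_t offset;              /* nonzero if there is a variable offset */
  unsigned offset_alignment;
  int64_t step;
  unsigned step_alignment;
};

static unsigned
min_u (unsigned a, unsigned b)
{
  return a < b ? a : b;
}

/* Hypothetical mirror of the logic of dr_alignment in the patch.  */
unsigned
model_dr_alignment (const struct drb_model *drb)
{
  /* Alignment of BASE_ADDRESS + INIT.  Unsigned wraparound is intended:
     only the low bits of the sum matter.  */
  unsigned alignment = drb->base_alignment;
  unsigned misalignment = drb->base_misalignment + (unsigned) drb->init;
  if (misalignment != 0)
    /* Lowest set bit = largest power of two dividing MISALIGNMENT.  */
    alignment = min_u (alignment, misalignment & -misalignment);

  /* Cap it to the alignment of OFFSET.  */
  if (drb->offset != 0)
    alignment = min_u (alignment, drb->offset_alignment);

  /* Cap it to the alignment of STEP.  */
  if (drb->step != 0)
    alignment = min_u (alignment, drb->step_alignment);

  return alignment;
}
```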

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2017-07-03  Richard Sandiford  

gcc/
* tree-data-ref.h (dr_alignment): Declare.
* tree-data-ref.c (dr_alignment): New function.
* tree-vectorizer.h (dataref_aux): Remove base_element_aligned.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Don't
set it.
* tree-vect-stmts.c (vectorizable_store): Use dr_alignment.

Index: gcc/tree-data-ref.h
===
--- gcc/tree-data-ref.h 2017-07-03 08:17:58.418572314 +0100
+++ gcc/tree-data-ref.h 2017-07-03 08:18:29.775412176 +0100
@@ -405,6 +405,16 @@ extern bool compute_all_dependences (vec
 vec, bool);
 extern tree find_data_references_in_bb (struct loop *, basic_block,
 vec *);
+extern unsigned int dr_alignment (innermost_loop_behavior *);
+
+/* Return the alignment in bytes that DR is guaranteed to have at all
+   times.  */
+
+inline unsigned int
+dr_alignment (data_reference *dr)
+{
+  return dr_alignment (&DR_INNERMOST (dr));
+}
 
 extern bool dr_may_alias_p (const struct data_reference *,
const struct data_reference *, bool);
Index: gcc/tree-data-ref.c
===
--- gcc/tree-data-ref.c 2017-07-03 08:17:58.418572314 +0100
+++ gcc/tree-data-ref.c 2017-07-03 08:17:59.017546839 +0100
@@ -4769,6 +4769,30 @@ find_data_references_in_loop (struct loo
   return NULL_TREE;
 }
 
+/* Return the alignment in bytes that DRB is guaranteed to have at all
+   times.  */
+
+unsigned int
+dr_alignment (innermost_loop_behavior *drb)
+{
+  /* Get the alignment of BASE_ADDRESS + INIT.  */
+  unsigned int alignment = drb->base_alignment;
+  unsigned int misalignment = (drb->base_misalignment
+  + TREE_INT_CST_LOW (drb->init));
+  if (misalignment != 0)
+alignment = MIN (alignment, misalignment & -misalignment);
+
+  /* Cap it to the alignment of OFFSET.  */
+  if (!integer_zerop (drb->offset))
+alignment = MIN (alignment, drb->offset_alignment);
+
+  /* Cap it to the alignment of STEP.  */
+  if (!integer_zerop (drb->step))
+alignment = MIN (alignment, drb->step_alignment);
+
+  return alignment;
+}
+
 /* Recursive helper function.  */
 
 static bool
Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2017-07-03 08:17:58.419572272 +0100
+++ gcc/tree-vectorizer.h   2017-07-03 08:18:09.031167838 +0100
@@ -752,8 +752,6 @@ struct dataref_aux {
   int misalignment;
   /* If true the alignment of base_decl needs to be increased.  */
   bool base_misaligned;
-  /* If true we know the base is at least vector element alignment aligned.  */
-  bool base_element_aligned;
   tree base_decl;
 };
 
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2017-07-03 08:17:58.419572272 +0100
+++ gcc/tree-vect-data-refs.c   2017-07-03 08:17:59.018546796 +0100
@@ -731,12 +731,6 @@ vect_compute_data_ref_alignment (struct
   unsigned int base_alignment = drb->base_alignment;
   unsigned int base_misalignment = drb->base_misalignment;
   unsigned HOST_WIDE_INT vector_alignment = TYPE_ALIGN_UNIT (vectype);
-  unsigned HOST_WIDE_INT element_alignment
-= TYPE_ALIGN_UNIT (TREE_TYPE (vectype));
-
-  if (base_alignment >= element_alignment
-  && (base_misalignment & (element_alignment - 1)) == 0)
-DR_VECT_AUX (dr)->base_element_aligned = true;
 
   if (drb->offset_alignment < vector_alignment
   || !step_preserves_misalignment_p
@@ -797,7 +791,6 @@ vect_compute_data_ref_alignment (struct
 
   DR_VECT_AUX (dr)->base_decl = base;
   DR_VECT_AUX (dr)->base_misaligned = true;
-  DR_VECT_AUX (dr)->base_element_aligned = true;
   base_misalignment = 0;
 }
   unsigned int misalignment = (base_misalignment
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2017-07-03 08:07:42.512875037 +0100
+++ gcc/tree-vect-stmts.c   2017-07-03 08:17:59.019546754 +0100
@@ -6359,11 +6359,7 @@ vectorizable_store (gimple *stmt, gimple
misalign = 0;
  else if (DR_MISALIGNMENT (first_dr) == -1)
{
- if (DR_VECT_AUX (first_dr)->base_element_aligned)
-   align = TYPE_ALIGN_UNIT (elem_type);
- else
-   align = get_object_alignment (DR_REF (first_dr))
-   / BITS_PER_UNIT;
+ 

[5/7] Add DR_BASE_ALIGNMENT and DR_BASE_MISALIGNMENT

2017-07-03 Thread Richard Sandiford
This patch records the base alignment and misalignment in
innermost_loop_behavior, to avoid the second-guessing that was
previously done in vect_compute_data_ref_alignment.  It also makes
vect_analyze_data_refs use dr_analyze_innermost, instead of having an
almost-copy of the same code.

I wasn't sure whether the alignments should be measured in bits
(for consistency with most other interfaces) or in bytes (for consistency
with DR_ALIGNED_TO, now DR_OFFSET_ALIGNMENT, and with *_ptr_info_alignment).
I went for bytes because:

- I think in practice most consumers are going to want bytes.
  E.g. using bytes avoids having to mix TYPE_ALIGN and TYPE_ALIGN_UNIT
  in vect_compute_data_ref_alignment.

- It means that any bit-level paranoia is dealt with when building
  the innermost_loop_behavior and doesn't get pushed down to consumers.
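The sketch below shows what doing the conversion once might look like, assuming BITS_PER_UNIT is 8 as on the targets tested. `alignment_bits_to_bytes` is a hypothetical helper. The patch itself asserts divisibility and divides in place in dr_analyze_innermost.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value for the targets tested; defined here for the sketch.  */
#define BITS_PER_UNIT 8

/* Hypothetical helper: convert a bit-level (alignment, misalignment)
   pair, such as the one get_object_alignment_1 reports, into bytes so
   that later consumers never see bit quantities.  Returns false if
   either value is not a whole number of bytes.  */
bool
alignment_bits_to_bytes (unsigned align_bits, unsigned long long misalign_bits,
                         unsigned *align_bytes,
                         unsigned long long *misalign_bytes)
{
  if (align_bits % BITS_PER_UNIT != 0 || misalign_bits % BITS_PER_UNIT != 0)
    return false;
  *align_bytes = align_bits / BITS_PER_UNIT;
  *misalign_bytes = misalign_bits / BITS_PER_UNIT;
  return true;
}
```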

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2017-07-03  Richard Sandiford  

gcc/
* tree-data-ref.h (innermost_loop_behavior): Add base_alignment
and base_misalignment fields.
(DR_BASE_ALIGNMENT, DR_BASE_MISALIGNMENT): New macros.
* tree-data-ref.c: Include builtins.h.
(dr_analyze_innermost): Set up the new innermost_loop_behavior fields.
* tree-vectorizer.h (STMT_VINFO_DR_BASE_ALIGNMENT): New macro.
(STMT_VINFO_DR_BASE_MISALIGNMENT): Likewise.
* tree-vect-data-refs.c: Include tree-cfg.h.
(vect_compute_data_ref_alignment): Use the new innermost_loop_behavior
fields instead of calculating an alignment here.
(vect_analyze_data_refs): Use dr_analyze_innermost.  Dump the new
innermost_loop_behavior fields.

Index: gcc/tree-data-ref.h
===
--- gcc/tree-data-ref.h 2017-07-03 07:52:14.194782203 +0100
+++ gcc/tree-data-ref.h 2017-07-03 07:52:55.920272347 +0100
@@ -52,6 +52,42 @@ struct innermost_loop_behavior
   tree init;
   tree step;
 
+  /* BASE_ADDRESS is known to be misaligned by BASE_MISALIGNMENT bytes
+ from an alignment boundary of BASE_ALIGNMENT bytes.  For example,
+ if we had:
+
+   struct S __attribute__((aligned(16))) { ... };
+
+   char *ptr;
+   ... *(struct S *) (ptr - 4) ...;
+
+ the information would be:
+
+   base_address:  ptr
+   base_alignment:  16
+   base_misalignment:   4
+   init:   -4
+
+ where init cancels the base misalignment.  If instead we had a
+ reference to a particular field:
+
+   struct S __attribute__((aligned(16))) { ... int f; ... };
+
+   char *ptr;
+   ... ((struct S *) (ptr - 4))->f ...;
+
+ the information would be:
+
+   base_address:  ptr
+   base_alignment:  16
+   base_misalignment:   4
+   init:   -4 + offsetof (S, f)
+
+ where base_address + init might also be misaligned, and by a different
+ amount from base_address.  */
+  unsigned int base_alignment;
+  unsigned int base_misalignment;
+
   /* The largest power of two that divides OFFSET, capped to a suitably
  high value if the offset is zero.  This is a byte rather than a bit
  quantity.  */
@@ -147,6 +183,8 @@ #define DR_OFFSET(DR)  (DR)-
 #define DR_INIT(DR)(DR)->innermost.init
 #define DR_STEP(DR)(DR)->innermost.step
 #define DR_PTR_INFO(DR)(DR)->alias.ptr_info
+#define DR_BASE_ALIGNMENT(DR)  (DR)->innermost.base_alignment
+#define DR_BASE_MISALIGNMENT(DR)   (DR)->innermost.base_misalignment
 #define DR_OFFSET_ALIGNMENT(DR)(DR)->innermost.offset_alignment
 #define DR_STEP_ALIGNMENT(DR)  (DR)->innermost.step_alignment
 #define DR_INNERMOST(DR)   (DR)->innermost
Index: gcc/tree-data-ref.c
===
--- gcc/tree-data-ref.c 2017-07-03 07:52:14.193782226 +0100
+++ gcc/tree-data-ref.c 2017-07-03 07:52:55.920272347 +0100
@@ -94,6 +94,7 @@ Software Foundation; either version 3, o
 #include "dumpfile.h"
 #include "tree-affine.h"
 #include "params.h"
+#include "builtins.h"
 
 static struct datadep_stats
 {
@@ -802,11 +803,26 @@ dr_analyze_innermost (struct data_refere
   return false;
 }
 
+  /* Calculate the alignment and misalignment for the inner reference.  */
+  unsigned int HOST_WIDE_INT base_misalignment;
+  unsigned int base_alignment;
+  get_object_alignment_1 (base, &base_alignment, &base_misalignment);
+
+  /* There are no bitfield references remaining in BASE, so the values
+ we got back must be whole bytes.  */
+  gcc_assert (base_alignment % BITS_PER_UNIT == 0
+ && base_misalignment % BITS_PER_UNIT == 0);
+  base_alignment /= BITS_PER_UNIT;
+  base_misalignment /= BITS_PER_UNIT;
+
   if (TREE_CODE (base) == MEM_REF)
 {
   if (!integer_zerop (TREE_OPERAND (base, 1)))
{
+ /* Subtract MOFF from the base and add it to POFFSET instead.
+Adjust the 
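
The base_alignment and base_misalignment comment in the patch above can be checked with a little modular arithmetic. This sketch assumes power-of-two alignments and uses a hypothetical helper, misalignment_after_init, together with a hypothetical layout for struct S in which f sits at byte offset 8. It computes how far base_address + init lies from a base_alignment boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout for the comment's struct S.  The aligned(16)
   attribute matters for BASE_ALIGNMENT but not for offsetof, so it is
   omitted here.  */
struct S { char pad[8]; int f; };

/* Misalignment, in bytes, of BASE_ADDRESS + INIT from an ALIGN-byte
   boundary, given that BASE_ADDRESS is MISALIGN bytes past such a
   boundary.  ALIGN must be a power of two.  */
static unsigned int
misalignment_after_init (unsigned int align, unsigned int misalign, long init)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  /* Unsigned arithmetic wraps modulo 2^N and ALIGN divides 2^N, so the
     mask yields the correct non-negative residue even for negative INIT.  */
  unsigned long sum = (unsigned long) misalign + (unsigned long) init;
  return (unsigned int) (sum & (align - 1));
}
```

For the comment's first case (alignment 16, misalignment 4, init -4) the result is 0. For the field reference, init is -4 + 8 with this layout and the result is 8, which illustrates the comment's point that base_address + init can be misaligned by a different amount from base_address.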

[4/7] Add DR_STEP_ALIGNMENT

2017-07-03 Thread Richard Sandiford
A later patch adds base alignment information to innermost_loop_behavior.
After that, the only remaining piece of alignment information that wasn't
immediately obvious was the step alignment.  Adding that allows a minor
simplification to vect_compute_data_ref_alignment, and also potentially
improves the handling of variable strides for outer loop vectorisation.
A later patch will also use it to give the alignment of the DR as a whole.
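
A rough illustration of how the step alignment is used in this patch's vectorizable_load hunk: the step preserves the vector (mis)alignment exactly when it is a multiple of the vector size. The sketch assumes a constant step and a power-of-two vector size; the patch itself uses highest_pow2_factor, which also handles non-constant steps. The helper names and the CAP parameter are hypothetical.

```c
#include <assert.h>

/* Largest power of two dividing a constant STEP, in bytes.  A zero step
   is divisible by everything, so return CAP (a hypothetical stand-in for
   the "suitably high value" used for zero).  */
static unsigned int
const_step_alignment (long step, unsigned int cap)
{
  if (step == 0)
    return cap;
  unsigned long u = (unsigned long) step;
  unsigned long low = u & -u;             /* lowest set bit */
  return low > cap ? cap : (unsigned int) low;
}

/* The misalignment seen on each iteration stays the same iff the step is
   a multiple of the vector size; for a power-of-two vector size that is
   exactly "step alignment % vector size == 0".  */
static int
step_preserves_misalignment_p (unsigned int step_align,
                               unsigned int vector_size)
{
  assert (vector_size != 0 && (vector_size & (vector_size - 1)) == 0);
  return step_align % vector_size == 0;
}
```

A 48-byte step has step alignment 16, so it preserves 16-byte misalignment, whereas a 24-byte step (alignment 8) does not.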

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2017-07-03  Richard Sandiford  

gcc/
* tree-data-ref.h (innermost_loop_behavior): Add a step_alignment
field.
(DR_STEP_ALIGNMENT): New macro.
* tree-vectorizer.h (STMT_VINFO_DR_STEP_ALIGNMENT): Likewise.
* tree-data-ref.c (dr_analyze_innermost): Initialize step_alignment.
(create_data_ref): Print it.
* tree-vect-stmts.c (vectorizable_load): Use the step alignment
to tell whether the step preserves vector (mis)alignment.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Likewise.
Move the check for an integer step and generalise to all INTEGER_CST.
(vect_analyze_data_refs): Set DR_STEP_ALIGNMENT when setting DR_STEP.
Print the outer step alignment.

Index: gcc/tree-data-ref.h
===
--- gcc/tree-data-ref.h 2017-07-03 07:51:31.005161213 +0100
+++ gcc/tree-data-ref.h 2017-07-03 07:52:14.194782203 +0100
@@ -56,6 +56,9 @@ struct innermost_loop_behavior
  high value if the offset is zero.  This is a byte rather than a bit
  quantity.  */
   unsigned int offset_alignment;
+
+  /* Likewise for STEP.  */
+  unsigned int step_alignment;
 };
 
 /* Describes the evolutions of indices of the memory reference.  The indices
@@ -145,6 +148,7 @@ #define DR_INIT(DR)(DR)-
 #define DR_STEP(DR)(DR)->innermost.step
 #define DR_PTR_INFO(DR)(DR)->alias.ptr_info
 #define DR_OFFSET_ALIGNMENT(DR)(DR)->innermost.offset_alignment
+#define DR_STEP_ALIGNMENT(DR)  (DR)->innermost.step_alignment
 #define DR_INNERMOST(DR)   (DR)->innermost
 
 typedef struct data_reference *data_reference_p;
Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2017-07-03 07:51:31.006161241 +0100
+++ gcc/tree-vectorizer.h   2017-07-03 07:52:14.196782157 +0100
@@ -709,6 +709,8 @@ #define STMT_VINFO_DR_OFFSET(S)
 #define STMT_VINFO_DR_STEP(S)  (S)->dr_wrt_vec_loop.step
 #define STMT_VINFO_DR_OFFSET_ALIGNMENT(S) \
   (S)->dr_wrt_vec_loop.offset_alignment
+#define STMT_VINFO_DR_STEP_ALIGNMENT(S) \
+  (S)->dr_wrt_vec_loop.step_alignment
 
 #define STMT_VINFO_IN_PATTERN_P(S) (S)->in_pattern_p
 #define STMT_VINFO_RELATED_STMT(S) (S)->related_stmt
Index: gcc/tree-data-ref.c
===
--- gcc/tree-data-ref.c 2017-07-03 07:51:31.004161185 +0100
+++ gcc/tree-data-ref.c 2017-07-03 07:52:14.193782226 +0100
@@ -870,6 +870,7 @@ dr_analyze_innermost (struct data_refere
   drb->init = init;
   drb->step = step;
   drb->offset_alignment = highest_pow2_factor (offset_iv.base);
+  drb->step_alignment = highest_pow2_factor (step);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 fprintf (dump_file, "success.\n");
@@ -1085,6 +1086,7 @@ create_data_ref (loop_p nest, loop_p loo
   print_generic_expr (dump_file, DR_STEP (dr), TDF_SLIM);
   fprintf (dump_file, "\n\toffset alignment: %d",
   DR_OFFSET_ALIGNMENT (dr));
+  fprintf (dump_file, "\n\tstep alignment: %d", DR_STEP_ALIGNMENT (dr));
   fprintf (dump_file, "\n\tbase_object: ");
   print_generic_expr (dump_file, DR_BASE_OBJECT (dr), TDF_SLIM);
   fprintf (dump_file, "\n");
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2017-07-03 07:51:05.480852682 +0100
+++ gcc/tree-vect-stmts.c   2017-07-03 07:52:14.195782180 +0100
@@ -7294,8 +7294,7 @@ vectorizable_load (gimple *stmt, gimple_
  nested within an outer-loop that is being vectorized.  */
 
   if (nested_in_vect_loop
-  && (TREE_INT_CST_LOW (DR_STEP (dr))
- % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0))
+  && (DR_STEP_ALIGNMENT (dr) % GET_MODE_SIZE (TYPE_MODE (vectype))) != 0)
 {
   gcc_assert (alignment_support_scheme != dr_explicit_realign_optimized);
   compute_in_loop = true;
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2017-07-03 07:51:31.006161241 +0100
+++ gcc/tree-vect-data-refs.c   2017-07-03 07:52:14.194782203 +0100
@@ -698,10 +698,9 @@ vect_compute_data_ref_alignment (struct
  divides by the vector size.  */
   else if (nested_in_vect_loop_p (loop, stmt))
 {
-  tree step = DR_STEP (dr);

[3/7] Rename DR_ALIGNED_TO to DR_OFFSET_ALIGNMENT

2017-07-03 Thread Richard Sandiford
This patch renames DR_ALIGNED_TO to DR_OFFSET_ALIGNMENT, to avoid
confusion with the upcoming DR_BASE_ALIGNMENT.  Nothing needed the
value as a tree, and the value is clipped to BIGGEST_ALIGNMENT
(maybe it should be MAX_OFILE_ALIGNMENT?) so we might as well use
an unsigned int instead.
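
A sketch of why an unsigned int is enough for offset_alignment. Once the largest power-of-two factor is clipped to a fixed maximum, it is a small integer, and the test in vect_compute_data_ref_alignment becomes a plain integer comparison. SKETCH_MAX_ALIGN (64) is a hypothetical stand-in for BIGGEST_ALIGNMENT in bytes, and the helpers handle only constant offsets, whereas the real OFFSET is usually a variable expression.

```c
#include <assert.h>

/* Hypothetical cap, standing in for BIGGEST_ALIGNMENT expressed in bytes.  */
#define SKETCH_MAX_ALIGN 64u

/* Largest power of two dividing a constant byte OFFSET, clipped to
   SKETCH_MAX_ALIGN; a zero offset gets the cap.  The result always fits
   in an unsigned int, so no tree is needed to hold it.  */
static unsigned int
offset_alignment (unsigned long offset)
{
  if (offset == 0)
    return SKETCH_MAX_ALIGN;
  unsigned long low = offset & -offset;
  return low > SKETCH_MAX_ALIGN ? SKETCH_MAX_ALIGN : (unsigned int) low;
}

/* Same shape as the rewritten check: an integer comparison instead of
   compare_tree_int.  True if the offset alone might break
   VECTOR_ALIGN-byte alignment.  */
static int
offset_may_misalign_p (unsigned long offset, unsigned int vector_align)
{
  assert (vector_align != 0 && (vector_align & (vector_align - 1)) == 0);
  return offset_alignment (offset) < vector_align;
}
```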

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2017-07-03  Richard Sandiford  

gcc/
* tree-data-ref.h (innermost_loop_behavior): Replace aligned_to
with offset_alignment.
(DR_ALIGNED_TO): Delete.
(DR_OFFSET_ALIGNMENT): New macro.
* tree-vectorizer.h (STMT_VINFO_DR_ALIGNED_TO): Delete.
(STMT_VINFO_DR_OFFSET_ALIGNMENT): New macro.
* tree-data-ref.c (dr_analyze_innermost): Update after above changes.
(create_data_ref): Likewise.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Likewise.
(vect_analyze_data_refs): Likewise.
* tree-if-conv.c (if_convertible_loop_p_1): Use memset before
creating dummy innermost behavior.

Index: gcc/tree-data-ref.h
===
--- gcc/tree-data-ref.h 2017-07-03 08:03:19.181500132 +0100
+++ gcc/tree-data-ref.h 2017-07-03 08:06:19.720107957 +0100
@@ -52,9 +52,10 @@ struct innermost_loop_behavior
   tree init;
   tree step;
 
-  /* Alignment information.  ALIGNED_TO is set to the largest power of two
- that divides OFFSET.  */
-  tree aligned_to;
+  /* The largest power of two that divides OFFSET, capped to a suitably
+ high value if the offset is zero.  This is a byte rather than a bit
+ quantity.  */
+  unsigned int offset_alignment;
 };
 
 /* Describes the evolutions of indices of the memory reference.  The indices
@@ -143,7 +144,7 @@ #define DR_OFFSET(DR)  (DR)-
 #define DR_INIT(DR)(DR)->innermost.init
 #define DR_STEP(DR)(DR)->innermost.step
 #define DR_PTR_INFO(DR)(DR)->alias.ptr_info
-#define DR_ALIGNED_TO(DR)  (DR)->innermost.aligned_to
+#define DR_OFFSET_ALIGNMENT(DR)(DR)->innermost.offset_alignment
 #define DR_INNERMOST(DR)   (DR)->innermost
 
 typedef struct data_reference *data_reference_p;
Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2017-07-03 07:57:56.883079731 +0100
+++ gcc/tree-vectorizer.h   2017-07-03 08:06:19.721107925 +0100
@@ -707,7 +707,8 @@ #define STMT_VINFO_DR_BASE_ADDRESS(S)
 #define STMT_VINFO_DR_INIT(S)  (S)->dr_wrt_vec_loop.init
 #define STMT_VINFO_DR_OFFSET(S)(S)->dr_wrt_vec_loop.offset
 #define STMT_VINFO_DR_STEP(S)  (S)->dr_wrt_vec_loop.step
-#define STMT_VINFO_DR_ALIGNED_TO(S)(S)->dr_wrt_vec_loop.aligned_to
+#define STMT_VINFO_DR_OFFSET_ALIGNMENT(S) \
+  (S)->dr_wrt_vec_loop.offset_alignment
 
 #define STMT_VINFO_IN_PATTERN_P(S) (S)->in_pattern_p
 #define STMT_VINFO_RELATED_STMT(S) (S)->related_stmt
Index: gcc/tree-data-ref.c
===
--- gcc/tree-data-ref.c 2017-07-03 08:03:19.181500132 +0100
+++ gcc/tree-data-ref.c 2017-07-03 08:06:19.720107957 +0100
@@ -869,7 +869,7 @@ dr_analyze_innermost (struct data_refere
   drb->offset = fold_convert (ssizetype, offset_iv.base);
   drb->init = init;
   drb->step = step;
-  drb->aligned_to = size_int (highest_pow2_factor (offset_iv.base));
+  drb->offset_alignment = highest_pow2_factor (offset_iv.base);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 fprintf (dump_file, "success.\n");
@@ -1083,8 +1083,8 @@ create_data_ref (loop_p nest, loop_p loo
   print_generic_expr (dump_file, DR_INIT (dr), TDF_SLIM);
   fprintf (dump_file, "\n\tstep: ");
   print_generic_expr (dump_file, DR_STEP (dr), TDF_SLIM);
-  fprintf (dump_file, "\n\taligned to: ");
-  print_generic_expr (dump_file, DR_ALIGNED_TO (dr), TDF_SLIM);
+  fprintf (dump_file, "\n\toffset alignment: %d",
+  DR_OFFSET_ALIGNMENT (dr));
   fprintf (dump_file, "\n\tbase_object: ");
   print_generic_expr (dump_file, DR_BASE_OBJECT (dr), TDF_SLIM);
   fprintf (dump_file, "\n");
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2017-07-03 07:57:47.758408141 +0100
+++ gcc/tree-vect-data-refs.c   2017-07-03 08:06:19.721107925 +0100
@@ -772,7 +772,7 @@ vect_compute_data_ref_alignment (struct
 
   alignment = TYPE_ALIGN_UNIT (vectype);
 
-  if ((compare_tree_int (drb->aligned_to, alignment) < 0)
+  if (drb->offset_alignment < alignment
   || !step_preserves_misalignment_p)
 {
   if (dump_enabled_p ())
@@ -3412,8 +3412,8 @@ vect_analyze_data_refs (vec_info *vinfo,
{
  DR_OFFSET (newdr) = ssize_int (0);
  DR_STEP (newdr) = 

[1/7] Use innermost_loop_behavior for outer loop vectorisation

2017-07-03 Thread Richard Sandiford
This patch replaces the individual stmt_vinfo dr_* fields with
an innermost_loop_behavior, so that the changes in later patches
get picked up automatically.  It also adds a helper function for
getting the behavior of a data reference wrt the vectorised loop.
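
A simplified model of the new vect_dr_behavior helper, for readers unfamiliar with outer loop vectorization. The structs are hypothetical stand-ins: two booleans replace the STMT_VINFO_LOOP_VINFO and nested_in_vect_loop_p checks. The point is that callers get a pointer to whichever behavior record describes the access relative to the loop being vectorized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for GCC's types.  */
struct behavior
{
  long init;
  long step;
};

struct data_ref
{
  struct behavior innermost;        /* relative to the innermost loop */
};

struct stmt_info
{
  bool has_loop_vinfo;              /* STMT_VINFO_LOOP_VINFO != NULL */
  bool nested_in_vect_loop;         /* nested_in_vect_loop_p (...) */
  struct behavior dr_wrt_vec_loop;  /* relative to the vectorized loop */
};

/* Use the DR's own innermost behavior unless the statement sits in an
   inner loop of an outer loop that is being vectorized.  */
static struct behavior *
dr_behavior (struct data_ref *dr, struct stmt_info *info)
{
  assert (dr != NULL && info != NULL);
  if (!info->has_loop_vinfo || !info->nested_in_vect_loop)
    return &dr->innermost;
  return &info->dr_wrt_vec_loop;
}
```

Returning a pointer lets callers read base_address, init, offset and step through one path, whichever loop the statement sits in.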

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2017-07-03  Richard Sandiford  

gcc/
* tree-vectorizer.h (_stmt_vec_info): Replace individual dr_*
fields with dr_wrt_vec_loop.
(STMT_VINFO_DR_BASE_ADDRESS, STMT_VINFO_DR_INIT, STMT_VINFO_DR_OFFSET)
(STMT_VINFO_DR_STEP, STMT_VINFO_DR_ALIGNED_TO): Update accordingly.
(STMT_VINFO_DR_WRT_VEC_LOOP): New macro.
(vect_dr_behavior): New function.
(vect_create_addr_base_for_vector_ref): Remove loop parameter.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use
vect_dr_behavior.  Use a step_preserves_misalignment_p boolean to
track whether the step preserves the misalignment.
(vect_create_addr_base_for_vector_ref): Remove loop parameter.
Use vect_dr_behavior.
(vect_setup_realignment): Update call accordingly.
(vect_create_data_ref_ptr): Likewise.  Use vect_dr_behavior.
* tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Update
call to vect_create_addr_base_for_vector_ref.
(vect_create_cond_for_align_checks): Likewise.
* tree-vect-patterns.c (vect_recog_bool_pattern): Copy
STMT_VINFO_DR_WRT_VEC_LOOP as a block.
(vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-stmts.c (compare_step_with_zero): Use vect_dr_behavior.
(new_stmt_vec_info): Remove redundant zeroing.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2017-07-03 07:53:58.15242 +0100
+++ gcc/tree-vectorizer.h   2017-07-03 07:57:56.883079731 +0100
@@ -554,11 +554,7 @@ typedef struct _stmt_vec_info {
 
   /* Information about the data-ref relative to this loop
  nest (the loop that is being considered for vectorization).  */
-  tree dr_base_address;
-  tree dr_init;
-  tree dr_offset;
-  tree dr_step;
-  tree dr_aligned_to;
+  innermost_loop_behavior dr_wrt_vec_loop;
 
   /* For loop PHI nodes, the base and evolution part of it.  This makes sure
  this information is still available in vect_update_ivs_after_vectorizer
@@ -706,11 +702,12 @@ #define STMT_VINFO_SIMD_LANE_ACCESS_P(S)
 #define STMT_VINFO_VEC_REDUCTION_TYPE(S)   (S)->v_reduc_type
 #define STMT_VINFO_VEC_CONST_COND_REDUC_CODE(S) (S)->const_cond_reduc_code
 
-#define STMT_VINFO_DR_BASE_ADDRESS(S)  (S)->dr_base_address
-#define STMT_VINFO_DR_INIT(S)  (S)->dr_init
-#define STMT_VINFO_DR_OFFSET(S)(S)->dr_offset
-#define STMT_VINFO_DR_STEP(S)  (S)->dr_step
-#define STMT_VINFO_DR_ALIGNED_TO(S)(S)->dr_aligned_to
+#define STMT_VINFO_DR_WRT_VEC_LOOP(S)  (S)->dr_wrt_vec_loop
+#define STMT_VINFO_DR_BASE_ADDRESS(S)  (S)->dr_wrt_vec_loop.base_address
+#define STMT_VINFO_DR_INIT(S)  (S)->dr_wrt_vec_loop.init
+#define STMT_VINFO_DR_OFFSET(S)(S)->dr_wrt_vec_loop.offset
+#define STMT_VINFO_DR_STEP(S)  (S)->dr_wrt_vec_loop.step
+#define STMT_VINFO_DR_ALIGNED_TO(S)(S)->dr_wrt_vec_loop.aligned_to
 
 #define STMT_VINFO_IN_PATTERN_P(S) (S)->in_pattern_p
 #define STMT_VINFO_RELATED_STMT(S) (S)->related_stmt
@@ -1012,6 +1009,22 @@ known_alignment_for_access_p (struct dat
   return (DR_MISALIGNMENT (data_ref_info) != DR_MISALIGNMENT_UNKNOWN);
 }
 
+/* Return the behavior of DR with respect to the vectorization context
+   (which for outer loop vectorization might not be the behavior recorded
+   in DR itself).  */
+
+static inline innermost_loop_behavior *
+vect_dr_behavior (data_reference *dr)
+{
+  gimple *stmt = DR_STMT (dr);
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  if (loop_vinfo == NULL
+  || !nested_in_vect_loop_p (LOOP_VINFO_LOOP (loop_vinfo), stmt))
+return &DR_INNERMOST (dr);
+  else
+return &STMT_VINFO_DR_WRT_VEC_LOOP (stmt_info);
+}
 
 /* Return true if the vect cost model is unlimited.  */
 static inline bool
@@ -1138,8 +1151,7 @@ extern tree vect_get_new_vect_var (tree,
 extern tree vect_get_new_ssa_name (tree, enum vect_var_kind,
   const char * = NULL);
 extern tree vect_create_addr_base_for_vector_ref (gimple *, gimple_seq *,
- tree, struct loop *,
- tree = NULL_TREE);
+ tree, tree = NULL_TREE);
 
 /* In tree-vect-loop.c.  */
 /* FORNOW: Used in tree-parloops.c.  */
Index: gcc/tree-vect-data-refs.c
===
--- 

[2/7] Make dr_analyze_innermost operate on innermost_loop_behavior

2017-07-03 Thread Richard Sandiford
This means that callers to dr_analyze_innermost don't need a full
data_reference and don't need to fill in any fields beforehand.
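
A sketch of the calling convention this patch moves to: the analysis writes into a caller-provided behavior record and reads only the reference, so no enclosing data_reference has to be built or pre-filled. Everything here is hypothetical and heavily simplified (an affine access base[i * stride + k] stands in for the reference tree). It is not how dr_analyze_innermost actually decomposes addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an affine reference BASE[i * STRIDE + K] whose
   elements are ELT_SIZE bytes; a stand-in for the reference tree.  */
struct affine_ref
{
  const char *base;
  long stride, k, elt_size;
};

/* Output record, loosely modelled on innermost_loop_behavior.  */
struct behavior
{
  const char *base_address;
  long offset, init, step;
};

/* Analyze REF into *DRB.  Everything comes from REF itself, so the
   caller needs no enclosing data-reference object and does not have to
   initialize *DRB first.  Returns false if REF cannot be analyzed.  */
static bool
analyze_innermost (struct behavior *drb, const struct affine_ref *ref)
{
  assert (drb != NULL && ref != NULL);
  if (ref->base == NULL || ref->elt_size <= 0)
    return false;
  drb->base_address = ref->base;
  drb->offset = 0;                          /* no variable offset part */
  drb->init = ref->k * ref->elt_size;       /* constant byte offset */
  drb->step = ref->stride * ref->elt_size;  /* bytes per iteration */
  return true;
}
```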

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2017-07-03  Richard Sandiford  

gcc/
* tree-data-ref.h (dr_analyze_innermost): Replace the dr argument
with an "innermost_loop_behavior *" and reference tree.
* tree-data-ref.c (dr_analyze_innermost): Likewise.
(create_data_ref): Update call accordingly.
* tree-predcom.c (find_looparound_phi): Likewise.

Index: gcc/tree-data-ref.h
===
--- gcc/tree-data-ref.h 2017-07-03 07:53:58.106558668 +0100
+++ gcc/tree-data-ref.h 2017-07-03 08:03:19.181500132 +0100
@@ -322,7 +322,7 @@ #define DDR_DIST_VECT(DDR, I) \
 #define DDR_REVERSED_P(DDR) (DDR)->reversed_p
 
 
-bool dr_analyze_innermost (struct data_reference *, struct loop *);
+bool dr_analyze_innermost (innermost_loop_behavior *, tree, struct loop *);
 extern bool compute_data_dependences_for_loop (struct loop *, bool,
   vec *,
   vec *,
Index: gcc/tree-data-ref.c
===
--- gcc/tree-data-ref.c 2017-07-03 07:57:44.485520457 +0100
+++ gcc/tree-data-ref.c 2017-07-03 08:03:19.181500132 +0100
@@ -864,13 +864,12 @@ dr_analyze_innermost (struct data_refere
 fold_convert (ssizetype, base_iv.step),
 fold_convert (ssizetype, offset_iv.step));
 
-  DR_BASE_ADDRESS (dr) = canonicalize_base_object_address (base_iv.base);
+  drb->base_address = canonicalize_base_object_address (base_iv.base);
 
-  DR_OFFSET (dr) = fold_convert (ssizetype, offset_iv.base);
-  DR_INIT (dr) = init;
-  DR_STEP (dr) = step;
-
-  DR_ALIGNED_TO (dr) = size_int (highest_pow2_factor (offset_iv.base));
+  drb->offset = fold_convert (ssizetype, offset_iv.base);
+  drb->init = init;
+  drb->step = step;
+  drb->aligned_to = size_int (highest_pow2_factor (offset_iv.base));
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 fprintf (dump_file, "success.\n");
Index: gcc/tree-predcom.c
===
--- gcc/tree-predcom.c  2017-07-03 07:53:58.106558668 +0100
+++ gcc/tree-predcom.c  2017-07-03 08:03:19.181500132 +0100
@@ -1149,7 +1149,7 @@ find_looparound_phi (struct loop *loop,
   memset (_dr, 0, sizeof (struct data_reference));
   DR_REF (_dr) = init_ref;
   DR_STMT (_dr) = phi;
-  if (!dr_analyze_innermost (_dr, loop))
+  if (!dr_analyze_innermost (_INNERMOST (_dr), init_ref, loop))
 return NULL;
 
   if (!valid_initializer_p (_dr, ref->distance + 1, root->ref))


Re: [PATCH] Fix PR81249

2017-07-03 Thread Richard Biener
On Fri, 30 Jun 2017, Christophe Lyon wrote:

> Hi Richard,
> 
> 
> On 29 June 2017 at 14:53, Richard Biener  wrote:
> >
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >
> > Richard.
> >
> > 2017-06-29  Richard Biener  
> >
> > PR tree-optimization/81249
> > * tree-vect-loop.c (vect_create_epilog_for_reduction): Convert
> > condition reduction result to original scalar type.
> >
> > * g++.dg/torture/pr81249.C: New testcase.
> >
> 
> I think this patch (r249831) causes a regression on arm / aarch64:
> gcc.dg/vect/pr65947-10.c (internal compiler error)
> gcc.dg/vect/pr65947-10.c -flto -ffat-lto-objects (internal compiler error)
> 
> The regression appears between r249828 and r249831, which seems the
> most likely culprit?

Whoops - sorry.  Testing the following.

Richard.

2017-07-03  Richard Biener  

* tree-vect-loop.c (vect_create_epilog_for_reduction): Revert
back to using VIEW_CONVERT_EXPR.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 249897)
+++ gcc/tree-vect-loop.c(working copy)
@@ -4842,7 +4842,8 @@ vect_create_epilog_for_reduction (vec
> The log says:
> 
> /testsuite/gcc.dg/vect/pr65947-10.c: In function 'condition_reduction':
> /testsuite/gcc.dg/vect/pr65947-10.c:12:1: error: invalid types in nop conversion
> float
> unsigned int
> _47 = (float) _46;
> during GIMPLE pass: vect
> dump file: pr65947-10.c.159t.vect
> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/pr65947-10.c:12:1: internal compiler error: verify_gimple failed
> 0xbb9107 verify_gimple_in_cfg(function*, bool)
> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/tree-cfg.c:5308
> 0xa7735c execute_function_todo
> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/passes.c:1989
> 0xa770f5 execute_todo
> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/passes.c:2043
> Please submit a full bug report,
> 
> Christophe
> 
> 
> > Index: gcc/tree-vect-loop.c
> > ===
> > --- gcc/tree-vect-loop.c(revision 249780)
> > +++ gcc/tree-vect-loop.c(working copy)
> > @@ -4833,12 +4858,9 @@ vect_create_epilog_for_reduction (vec
> >
> >/* Convert the reduced value back to the result type and set as the
> >  result.  */
> > -  tree data_reduc_cast = build1 (VIEW_CONVERT_EXPR, scalar_type,
> > -data_reduc);
> > -  epilog_stmt = gimple_build_assign (new_scalar_dest, data_reduc_cast);
> > -  new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
> > -  gimple_assign_set_lhs (epilog_stmt, new_temp);
> > -  gsi_insert_before (_gsi, epilog_stmt, GSI_SAME_STMT);
> > +  gimple_seq stmts = NULL;
> > +  new_temp = gimple_convert (&stmts, scalar_type, data_reduc);
> > +  gsi_insert_seq_before (_gsi, stmts, GSI_SAME_STMT);
> >scalar_results.safe_push (new_temp);
> >  }
> >else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
> > @@ -4903,6 +4925,11 @@ vect_create_epilog_for_reduction (vec
> >   val = new_val;
> > }
> > }
> > +  /* Convert the reduced value back to the result type and set as the
> > +result.  */
> > +  gimple_seq stmts = NULL;
> > +  val = gimple_convert (&stmts, scalar_type, val);
> > +  gsi_insert_seq_before (_gsi, stmts, GSI_SAME_STMT);
> >scalar_results.safe_push (val);
> >  }
> >
> >
> > Index: gcc/testsuite/g++.dg/torture/pr81249.C
> > ===
> > --- gcc/testsuite/g++.dg/torture/pr81249.C  (nonexistent)
> > +++ gcc/testsuite/g++.dg/torture/pr81249.C  (working copy)
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-mavx2 -mprefer-avx128" { target x86_64-*-* i?86-*-* } } */
> > +
> > +typedef struct rtx_def *rtx;
> > +union rtunion {
> > +rtx rt_rtx;
> > +};
> > +struct rtx_def {
> > +struct {
> > +   rtunion fld[0];
> > +} u;
> > +rtx elem[];
> > +} a;
> > +int b, c, d;
> > +rtx e;
> > +int main() {
> > +for (;;) {
> > +   d = 0;
> > +   for (; d < b; d++)
> > + if (a.elem[d])
> > +   e = a.elem[d]->u.fld[1].rt_rtx;
> > +   if (e)
> > + c = 0;
> > +}
> > +}
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)
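
The verifier error quoted above rejects `_47 = (float) _46;`, a nop conversion from unsigned int to float. As the revert to VIEW_CONVERT_EXPR implies, the reduced value there holds the float's bit pattern in an unsigned int, so a numeric conversion would compute the wrong value even where such a conversion is valid. This C sketch contrasts the two operations; it assumes a 32-bit IEEE 754 float, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumes float is a 32-bit type, as on the targets in the report.  */
static_assert (sizeof (float) == sizeof (uint32_t), "float must be 32 bits");

/* Value conversion: converts the integer's numeric value, which is what
   a (float) cast of an unsigned int denotes.  */
static float
value_convert (uint32_t u)
{
  return (float) u;
}

/* Bit reinterpretation, the C analogue of VIEW_CONVERT_EXPR: the same 32
   bits are read back as a float.  memcpy avoids aliasing problems.  */
static float
view_convert (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}
```

With IEEE 754 floats, view_convert (0x3f800000) yields 1.0f, while value_convert of the same bits yields 1065353216.0f.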


[PATCH][1/2] PR60510, reduction chain vectorization w/o SLP

2017-07-03 Thread Richard Biener

This is the first patch in the series to support reduction chain
vectorization without SLP.  In the PR, a reduction chain is detected
that doesn't qualify for SLP, but vectorization then fails completely
because we cannot do regular reduction vectorization on it.

This patch series aims at fixing this (and at allowing further enhancements
for a few other PRs) by splitting reduction vectorization into two pieces:
vectorizing the reduction PHIs (creating vector defs) and vectorizing
the reduction operation.  This allows reduction operations covering
more than one stmt to be trivially(*) vectorized using the regular
vectorizable_* helpers, as they have access to the reduction PHI vector
defs.  SLP reduction chains as detected just require adjustments to
the code generation phase to support non-SLP operation, as the analysis
phase already figured out that the reduction chain is a valid reduction.

This first patch simply splits the reduction vectorization into
two phases, reduction PHI def generation and the rest.
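
A scalar model of the two pieces described above, under stated assumptions: a hypothetical vectorization factor VF of 4, int elements (so reassociating the additions is exact), and n a multiple of VF. The loop body `s += a[i]; s += b[i];` is a two-statement reduction chain. Phase one creates the vector accumulator that the reduction PHI turns into; phase two vectorizes each statement of the chain as an ordinary lane-wise add on that accumulator; an epilogue then folds the lanes into a scalar. Putting the initial value in lane 0 and zeros elsewhere is one common choice, not necessarily what GCC emits.

```c
#include <assert.h>

#define VF 4  /* hypothetical vectorization factor (lanes per vector) */

/* Vectorized model of: for (i = 0; i < n; i++) { s += a[i]; s += b[i]; }
   starting from s = INIT.  */
static int
chain_sum (const int *a, const int *b, int n, int init)
{
  assert (n % VF == 0);

  /* Phase 1: vector def for the reduction PHI.  */
  int acc[VF] = { 0 };
  acc[0] = init;

  /* Phase 2: each statement of the chain is one lane-wise add that uses
     the accumulator produced by the previous statement.  */
  for (int i = 0; i < n; i += VF)
    {
      for (int l = 0; l < VF; l++)
        acc[l] += a[i + l];
      for (int l = 0; l < VF; l++)
        acc[l] += b[i + l];
    }

  /* Epilogue: reduce the vector accumulator to a scalar.  */
  int s = 0;
  for (int l = 0; l < VF; l++)
    s += acc[l];
  return s;
}
```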

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-07-03  Richard Biener  

* tree-vect-loop.c (vect_analyze_loop_operations): Also analyze
reduction PHIs.
(vect_force_simple_reduction): Record reduction def -> phi mapping.
(vectorizable_reduction): Perform reduction PHI creation when
visiting a reduction PHI and adjust and simplify code generation
phase of the reduction op.  Cache dts, use fold_binary, not fold_build2.
(vect_transform_loop): Visit reduction PHIs.
* tree-vect-slp.c (vect_get_and_check_slp_defs): Record reduction
defs into the SLP tree.
(vect_build_slp_tree): Reduction defs terminate the recursion.
* tree-vect-stmts.c (vect_get_vec_def_for_operand_1): Allow lookup
of reduction defs.
(vect_get_vec_defs_for_stmt_copy): Export.
(vect_get_vec_defs): Likewise.
* tree-vectorizer.h (struct _stmt_vec_info): Amend reduc_def
purpose.
(vect_get_vec_defs_for_stmt_copy): Declare.
(vect_get_vec_defs): Likewise.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 249892)
+++ gcc/tree-vect-loop.c(working copy)
@@ -1778,6 +1778,10 @@ vect_analyze_loop_operations (loop_vec_i
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def
  && ! PURE_SLP_STMT (stmt_info))
 ok = vectorizable_induction (phi, NULL, NULL, NULL);
+ else if ((STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
+   || STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle)
+  && ! PURE_SLP_STMT (stmt_info))
+   ok = vectorizable_reduction (phi, NULL, NULL, NULL);
 }
 
  if (ok && STMT_VINFO_LIVE_P (stmt_info))
@@ -3185,6 +3189,8 @@ vect_force_simple_reduction (loop_vec_in
   stmt_vec_info reduc_def_info = vinfo_for_stmt (phi);
   STMT_VINFO_REDUC_TYPE (reduc_def_info) = v_reduc_type;
   STMT_VINFO_REDUC_DEF (reduc_def_info) = def;
+  reduc_def_info = vinfo_for_stmt (def);
+  STMT_VINFO_REDUC_DEF (reduc_def_info) = phi;
 }
   return def;
 }
@@ -5558,7 +5564,6 @@ vectorizable_reduction (gimple *stmt, gi
 {
   tree vec_dest;
   tree scalar_dest;
-  tree loop_vec_def0 = NULL_TREE, loop_vec_def1 = NULL_TREE;
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
   tree vectype_in = NULL_TREE;
@@ -5576,7 +5581,6 @@ vectorizable_reduction (gimple *stmt, gi
   bool is_simple_use;
   gimple *orig_stmt;
   stmt_vec_info orig_stmt_info;
-  tree expr = NULL_TREE;
   int i;
   int ncopies;
   int epilog_copies;
@@ -5586,6 +5590,7 @@ vectorizable_reduction (gimple *stmt, gi
   gimple *new_stmt = NULL;
   int j;
   tree ops[3];
+  enum vect_def_type dts[3];
   bool nested_cycle = false, found_nested_cycle_def = false;
   gimple *reduc_def_stmt = NULL;
   bool double_reduc = false;
@@ -5598,11 +5603,23 @@ vectorizable_reduction (gimple *stmt, gi
   auto_vec vect_defs;
   auto_vec phis;
   int vec_num;
-  tree def0, def1, tem, op1 = NULL_TREE;
+  tree def0, tem;
   bool first_p = true;
   tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type = NULL_TREE;
   tree cond_reduc_val = NULL_TREE;
 
+  /* Make sure it was already recognized as a reduction computation.  */
+  if (STMT_VINFO_DEF_TYPE (vinfo_for_stmt (stmt)) != vect_reduction_def
+  && STMT_VINFO_DEF_TYPE (vinfo_for_stmt (stmt)) != vect_nested_cycle)
+return false;
+
+  if (nested_in_vect_loop_p (loop, stmt))
+{
+  outer_loop = loop;
+  loop = loop->inner;
+  nested_cycle = true;
+}
+
   /* In case of reduction chain we switch to the first stmt in the chain, but
  we don't update STMT_INFO, since only the last stmt is marked as reduction
  and has reduction 

Re: [PATCH] Add dotfn

2017-07-03 Thread Richard Biener
On Mon, 3 Jul 2017, Tom de Vries wrote:

> Hi,
> 
> this patch adds a debug function dotfn and a convenience macro DOTFN similar
> to dot-fn in gdbhooks.py.
> 
> It can be used to have the compiler:
> - dump a control flow graph, or
> - pop up a control flow graph window
> at specific moments in the compilation flow, for debugging purposes.
> 
> Bootstrapped and reg-tested on x86_64.
> 
> Used for debugging PR81192.
> 
> OK for trunk?

Why's dot-fn not enough?  I'd rather extend stuff in gdbhooks.py than
add this kind of stuff to gcc itself.

Thanks,
Richard.
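
For comparison with dot-fn, a minimal sketch of the kind of Graphviz output such a helper produces. The edge_rec type and the format_edge and cfg_to_dot functions are hypothetical and do not reflect dotfn's actual implementation. Back edges get the standard Graphviz attribute constraint=false so that loops do not distort the top-down layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical edge record: basic-block indices plus a flag saying
   whether the edge is a back (retreating) edge.  */
struct edge_rec
{
  int src, dest;
  int back_p;
};

/* Format one edge as a DOT statement into BUF; returns snprintf's result.  */
static int
format_edge (char *buf, size_t size, const struct edge_rec *e)
{
  return snprintf (buf, size, "  bb%d -> bb%d%s;\n", e->src, e->dest,
                   e->back_p ? " [constraint=false]" : "");
}

/* Print the edges as a Graphviz digraph called NAME.  */
static void
cfg_to_dot (FILE *out, const char *name, const struct edge_rec *edges,
            size_t n)
{
  char line[64];

  /* Keep the sketch simple: NAME is printed unescaped.  */
  assert (out != NULL && strchr (name, '"') == NULL);
  fprintf (out, "digraph \"%s\" {\n", name);
  for (size_t i = 0; i < n; i++)
    {
      format_edge (line, sizeof line, &edges[i]);
      fputs (line, out);
    }
  fputs ("}\n", out);
}
```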

