[PATCH] RFC: machine-readable diagnostic output (PR other/19165)

2018-11-12 Thread David Malcolm
This patch implements a -fdiagnostics-format=json option which
causes diagnostics to be written to stderr in JSON format;
see the documentation in invoke.texi.

Logically-related diagnostics are nested at the JSON level, using
the auto_diagnostic_group mechanism.
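
To illustrate the nesting described above, here is a minimal standalone sketch. It is not the code from diagnostic-format-json.cc: toy_diag, quote and to_json are hypothetical names, and the "kind", "message" and "children" keys are assumed for illustration only (the authoritative description of the format is in invoke.texi).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one diagnostic; a group is modelled as a
// parent diagnostic that owns its logically-related children.
struct toy_diag
{
  std::string kind;
  std::string message;
  std::vector<toy_diag> children;
};

// Quote S as a JSON string.  Only '"' and '\\' are escaped here; a real
// emitter must also escape control characters.
inline std::string
quote (const std::string &s)
{
  std::string out = "\"";
  for (std::size_t i = 0; i < s.size (); ++i)
    {
      if (s[i] == '"' || s[i] == '\\')
        out += '\\';
      out += s[i];
    }
  return out + "\"";
}

// Serialize D, recursing into its children so that related diagnostics
// end up nested inside their parent object.
inline std::string
to_json (const toy_diag &d)
{
  std::string out = "{\"kind\": " + quote (d.kind)
                    + ", \"message\": " + quote (d.message)
                    + ", \"children\": [";
  for (std::size_t i = 0; i < d.children.size (); ++i)
    {
      if (i > 0)
        out += ", ";
      out += to_json (d.children[i]);
    }
  return out + "]}";
}

// Small usage example: a diagnostic without children.
inline void
example_usage ()
{
  toy_diag note = { "note", "declared here", {} };
  assert (to_json (note)
          == "{\"kind\": \"note\", \"message\": \"declared here\", "
             "\"children\": []}");
}
```

A consumer can then walk the "children" arrays instead of guessing from the text output which note belongs to which error.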

gcc/ChangeLog:
PR other/19165
* Makefile.in (OBJS): Move json.o to...
(OBJS-libcommon): ...here and add diagnostic-format-json.o.
* common.opt (fdiagnostics-format=): New option.
(diagnostics_output_format): New enum.
* diagnostic-format-json.cc: New file.
* diagnostic.c (default_diagnostic_final_cb): New function, taken
from start of diagnostic_finish.
(diagnostic_initialize): Initialize final_cb to
default_diagnostic_final_cb.
(diagnostic_finish): Move "being treated as errors" messages to
default_diagnostic_final_cb.  Call any final_cb.
* diagnostic.h (enum diagnostics_output_format): New enum.
(struct diagnostic_context): Add "final_cb".
(diagnostic_output_format_init): New decl.
* doc/invoke.texi (-fdiagnostics-format): New option.
* dwarf2out.c (gen_producer_string): Ignore
OPT_fdiagnostics_format_.
* gcc.c (driver_handle_option): Handle OPT_fdiagnostics_format_.
* lto-wrapper.c (append_diag_options): Ignore it.
* opts.c (common_handle_option): Handle it.
---
 gcc/Makefile.in   |   2 +-
 gcc/common.opt|  17 +++
 gcc/diagnostic-format-json.cc | 265 ++
 gcc/diagnostic.c  |  40 ---
 gcc/diagnostic.h  |  16 +++
 gcc/doc/invoke.texi   |  78 +
 gcc/dwarf2out.c   |   1 +
 gcc/gcc.c |   5 +
 gcc/lto-wrapper.c |   1 +
 gcc/opts.c|   5 +
 10 files changed, 414 insertions(+), 16 deletions(-)
 create mode 100644 gcc/diagnostic-format-json.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 16c9ed6..9534d59 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1395,7 +1395,6 @@ OBJS = \
ira-color.o \
ira-emit.o \
ira-lives.o \
-   json.o \
jump.o \
langhooks.o \
lcm.o \
@@ -1619,6 +1618,7 @@ OBJS = \
 # Objects in libcommon.a, potentially used by all host binaries and with
 # no target dependencies.
 OBJS-libcommon = diagnostic.o diagnostic-color.o diagnostic-show-locus.o \
+   diagnostic-format-json.o json.o \
edit-context.o \
pretty-print.o intl.o \
sbitmap.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 5a5d332..2f669f6 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1273,6 +1273,23 @@ Enum(diagnostic_color_rule) String(always) 
Value(DIAGNOSTICS_COLOR_YES)
 EnumValue
 Enum(diagnostic_color_rule) String(auto) Value(DIAGNOSTICS_COLOR_AUTO)
 
+fdiagnostics-format=
+Common Joined RejectNegative Enum(diagnostics_output_format)
+-fdiagnostics-format=[text|json] Select output format
+
+; Required for these enum values.
+SourceInclude
+diagnostic.h
+
+Enum
+Name(diagnostics_output_format) Type(int)
+
+EnumValue
+Enum(diagnostics_output_format) String(text) 
Value(DIAGNOSTICS_OUTPUT_FORMAT_TEXT)
+
+EnumValue
+Enum(diagnostics_output_format) String(json) 
Value(DIAGNOSTICS_OUTPUT_FORMAT_JSON)
+
 fdiagnostics-parseable-fixits
 Common Var(flag_diagnostics_parseable_fixits)
 Print fix-it hints in machine-readable form.
diff --git a/gcc/diagnostic-format-json.cc b/gcc/diagnostic-format-json.cc
new file mode 100644
index 000..7860696
--- /dev/null
+++ b/gcc/diagnostic-format-json.cc
@@ -0,0 +1,265 @@
+/* JSON output for diagnostics
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by David Malcolm .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic.h"
+#include "json.h"
+
+/* The top-level JSON array of pending diagnostics.  */
+
+static json::array *toplevel_array;
+
+/* The JSON object for the current diagnostic group.  */
+
+static json::object *cur_group;
+
+/* The JSON array for the "children" array within the current diagnostic
+   group.  */
+
+static json::array *cur_children_array;
+
+/* Generate a JSON object for LOC.  */
+
+static json::object *
+json_from_expanded_location (location_t loc)
+{
+  expanded_location ex

Re: [PATCH] Improve -fprofile-report.

2018-11-12 Thread Martin Liška
PING^1

On 11/6/18 3:05 PM, Martin Liška wrote:
> Hi.
> 
> The patch is based on what was discussed on IRC and in the PR.
> Apart from that the reported layout is improved.
> 
> Patch survives regression tests on x86_64-linux-gnu.
> 
> Ready for trunk?
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-11-06  Martin Liska  
> 
>   PR tree-optimization/87885
>   * cfghooks.c (account_profile_record): Rename
>   to ...
>   (profile_record_check_consistency): ... this.
>   Calculate missing num_mismatched_freq_in.
>   (profile_record_account_profile): New function
>   that calculates time and size of a function.
>   * cfghooks.h (struct profile_record): Remove
>   all tuples.
>   (struct cfg_hooks): Remove after_pass flag.
>   (account_profile_record): Rename to ...
>   (profile_record_check_consistency): ... this.
>   (profile_record_account_profile): New.
>   * cfgrtl.c (rtl_account_profile_record): Remove
>   after_pass flag.
>   * passes.c (check_profile_consistency): Do only
>   checking.
>   (account_profile): Calculate size and time of
>   function only.
>   (pass_manager::dump_profile_report): Reformat
>   output.
>   (execute_one_ipa_transform_pass): Call
>   consistency check before clean up and call account_profile
>   after a clean up is done.
>   (execute_one_pass): Call check_profile_consistency and
>   account_profile instead of using after_pass flag.
>   * tree-cfg.c (gimple_account_profile_record): Likewise.
> ---
>  gcc/cfghooks.c |  38 +++--
>  gcc/cfghooks.h |  17 ++--
>  gcc/cfgrtl.c   |  12 ++-
>  gcc/passes.c   | 207 ++---
>  gcc/tree-cfg.c |  11 ++-
>  5 files changed, 161 insertions(+), 124 deletions(-)
> 
> 



Re: [PATCH] Come up with --param asan-stack-small-redzone (PR sanitizer/81715).

2018-11-12 Thread Martin Liška
PING^3

On 10/23/18 11:02 AM, Martin Liška wrote:
> PING^2
> 
> On 10/9/18 10:29 AM, Martin Liška wrote:
>> PING^1
>>
>> On 9/26/18 11:33 AM, Martin Liška wrote:
>>> On 9/25/18 5:53 PM, Jakub Jelinek wrote:
 On Tue, Sep 25, 2018 at 05:26:44PM +0200, Martin Liška wrote:
> The only missing piece is how to implement asan_emit_redzone_payload more
> smartly.
> It means doing memory stores with 8,4,2,1 sizes in order to reduce the
> number of insns.
> Do we have similar code somewhere?

 Yeah, that is a very important optimization.  I wasn't using DImode because
 at least on x86_64 64-bit constants are quite expensive and on several 
 other
 targets even more so, so SImode was a compromise to get size of the 
 prologue
 under control and not very slow.  What I think we want is figure out ranges
>>>
>>> Ah, some time ago, I remember you mentioned the 64-bit constants are 
>>> expensive
>>> (even on x86_64). Btw. it's what clang used for the red zone 
>>> instrumentation.
>>>
 of shadow bytes we want to initialize and the values we want to store 
 there,
 perhaps take also into account strict alignment vs. non-strict alignment,
 and perform kind of store merging for it.  Given that 2 shadow bytes would
 be only used for the very small variables (<=4 bytes in size, so <= 0.5
 bytes of shadow), we'd just need a way to remember the 2 shadow bytes 
 across
 handling adjacent vars and store it together.
>>>
>>> Agreed, it's implemented in the next version of the patch.
>>>

 I think we want to introduce some define for minimum red zone size and use
 it instead of the granularity (granularity is 8 bytes, but minimum red zone
 size if we count into it also the very small variable size is 16 bytes).

> --- a/gcc/asan.h
> +++ b/gcc/asan.h
> @@ -102,6 +102,26 @@ asan_red_zone_size (unsigned int size)
>return c ? 2 * ASAN_RED_ZONE_SIZE - c : ASAN_RED_ZONE_SIZE;
>  }
>  
> +/* Return how much a stack variable occupy on a stack
> +   including a space for redzone.  */
> +
> +static inline unsigned int
> +asan_var_and_redzone_size (unsigned int size)

 The argument needs to be UHWI, otherwise you do a wrong thing for
 say 4GB + 4 bytes long variable.  Ditto the result.

> +{
> +  if (size <= 4)
> +return 16;
> +  else if (size <= 16)
> +return 32;
> +  else if (size <= 128)
> +return 32 + size;
> +  else if (size <= 512)
> +return 64 + size;
> +  else if (size <= 4096)
> +return 128 + size;
> +  else
> +return 256 + size;

 I'd prefer size + const instead of const + size operand order.

> @@ -1125,13 +1125,13 @@ expand_stack_vars (bool (*pred) (size_t), struct 
> stack_vars_data *data)
> && stack_vars[i].size.is_constant ())
>   {
> prev_offset = align_base (prev_offset,
> - MAX (alignb, ASAN_RED_ZONE_SIZE),
> + MAX (alignb, ASAN_SHADOW_GRANULARITY),

 Use that ASAN_MIN_RED_ZONE_SIZE (16) here.

>   !FRAME_GROWS_DOWNWARD);
> tree repr_decl = NULL_TREE;
> +   poly_uint64 size =  asan_var_and_redzone_size 
> (stack_vars[i].size.to_constant ());

 Too long line.  Two spaces instead of one.  Why poly_uint64?
 Plus, perhaps if data->asan_vec is empty (i.e. when assigning the topmost
 automatic variable in a frame), we should ensure that size is at least
 2 * ASAN_RED_ZONE_SIZE (or just 1 * ASAN_RED_ZONE_SIZE). 

> offset
> - = alloc_stack_frame_space (stack_vars[i].size
> -+ ASAN_RED_ZONE_SIZE,
> -MAX (alignb, ASAN_RED_ZONE_SIZE));
> + = alloc_stack_frame_space (size,
> +MAX (alignb, 
> ASAN_SHADOW_GRANULARITY));

 Again, too long line and we want 16 instead of 8 here too.
>  
> data->asan_vec.safe_push (prev_offset);
> /* Allocating a constant amount of space from a constant
> @@ -2254,7 +2254,7 @@ expand_used_vars (void)
>& ~(data.asan_alignb - HOST_WIDE_INT_1)) - sz;
> /* Allocating a constant amount of space from a constant
>starting offset must give a constant result.  */
> -   offset = (alloc_stack_frame_space (redzonesz, ASAN_RED_ZONE_SIZE)
> +   offset = (alloc_stack_frame_space (redzonesz, ASAN_SHADOW_GRANULARITY)

 and here too.

Jakub

>>>
>>> The rest is also implemented as requested. I'm testing Linux kernel now, 
>>> will send
>>> stats to the PR created for it.
>>>
>>> Patch survives testing on x86_64-linux-gnu.
>>>
>>> Martin
>>>
>>
> 
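
For reference, a standalone sketch of the size helper with two of the review comments above applied: a 64-bit unsigned argument and result, and the "size + const" operand order. It is only an illustration, not the committed code; uhwi_t is a hypothetical stand-in for GCC's unsigned HOST_WIDE_INT, and the thresholds are copied from the quoted hunk.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for unsigned HOST_WIDE_INT.
typedef std::uint64_t uhwi_t;

// Return how much space a stack variable of SIZE bytes occupies on the
// stack, including its red zone.  Thresholds follow the quoted patch.
inline uhwi_t
asan_var_and_redzone_size (uhwi_t size)
{
  if (size <= 4)
    return 16;
  else if (size <= 16)
    return 32;
  else if (size <= 128)
    return size + 32;
  else if (size <= 512)
    return size + 64;
  else if (size <= 4096)
    return size + 128;
  else
    return size + 256;
}

// Small usage example covering the smallest bucket and a huge variable.
inline void
example_usage ()
{
  assert (asan_var_and_redzone_size (1) == 16);
  const uhwi_t big = (static_cast<uhwi_t> (1) << 32) + 4;
  assert (asan_var_and_redzone_size (big) == big + 256);
}
```

With a 32-bit unsigned int parameter, the 4GB + 4 byte case would be truncated to 4 and land in the first bucket, which is the problem the review points out.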



Re: [PATCH] More value_range API cleanup

2018-11-12 Thread Richard Biener
On Tue, 13 Nov 2018, Richard Biener wrote:

> On Mon, 12 Nov 2018, Aldy Hernandez wrote:
> 
> > On 11/12/18 7:12 AM, Richard Biener wrote:
> > > 
> > > This mainly tries to rectify the workaround I put in place for ipa-cp.c
> > > needing to build value_range instead of value_range_base for calling
> > > extract_range_from_unary_expr.
> > > 
> > > To make this easier I moved more set_* functions to methods.
> > > 
> > > Then for some reason I chose to fix the rathole of equiv bitmap sharing
> > > after finding at least one real bug
> > 
> > By the way, I've seen that the equiv_add() calls in vr-values.c sometimes 
> > set
> > equivalences for VARYING and UNDEFINED which in theory shouldn't happen.  
> > I've
> > been too chicken to follow that hole.
> > 
> > I think we should assert everywhere that we set the equivalences, that we're
> > not talking about VARYING or UNDEFINED.
> > 
> > > @@ -6168,37 +6172,30 @@ value_range::union_helper (value_range *vr0, const
> > > value_range *vr1)
> > > return;
> > >   }
> > >   -  value_range saved (*vr0);
> > > value_range_kind vr0type = vr0->kind ();
> > > tree vr0min = vr0->min ();
> > > tree vr0max = vr0->max ();
> > > union_ranges (&vr0type, &vr0min, &vr0max,
> > >   vr1->kind (), vr1->min (), vr1->max ());
> > > -  *vr0 = value_range (vr0type, vr0min, vr0max);
> > > -  if (vr0->varying_p ())
> > > +  /* Work on a temporary so we can still use vr0 when union returns
> > > varying.  */
> > > +  value_range tem;
> > > +  tem.set_and_canonicalize (vr0type, vr0min, vr0max);
> > > +  if (tem.varying_p ())
> > 
> > I'm not a big fan of the code duplication in the union chunks.  You're 
> > adding
> > more places that need to be kept in sync.
> > 
> > I think value_range::union_ could be easily coded as:
> > 
> >   value_range_base::union_ (other);
> >   union_helper (this, other);
> >   if (flag_checking)
> > check ();
> > 
> > And have union_helper only deal with the equivalence stuff.
> 
> The tricky part starts in the prologue for
> 
>   if (vr0->undefined_p ())
> {
>   vr0->deep_copy (vr1);
>   return;
> }
> 
> but yes, we probably can factor out a bit more common code
> here.  I'll see to followup with more minor cleanups this
> week (noticed a few details myself).

Like this?  (untested)

Richard.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 266056)
+++ gcc/tree-vrp.c  (working copy)
@@ -6084,6 +6084,39 @@ value_range::intersect (const value_rang
 }
 }
 
+/* Helper for meet operation for value ranges.  Given two value ranges VR0 and
+   VR1, return a range that contains both VR0 and VR1.  This may not be the
+   smallest possible such range.  */
+
+static value_range_base
+value_range_base::union_helper (const value_range_base *vr0,
+   const value_range_base *vr1)
+{
+  value_range_kind vr0type = vr0->kind ();
+  tree vr0min = vr0->min ();
+  tree vr0max = vr0->max ();
+  union_ranges (&vr0type, &vr0min, &vr0max,
+   vr1->kind (), vr1->min (), vr1->max ());
+
+  /* Work on a temporary so we can still use vr0 when union returns varying.  
*/
+  value_range tem;
+  tem.set_and_canonicalize (vr0type, vr0min, vr0max);
+
+  /* Failed to find an efficient meet.  Before giving up and setting
+ the result to VARYING, see if we can at least derive a useful
+ anti-range.  */
+  if (tem.varying_p ()
+  && range_includes_zero_p (vr0) == 0
+  && range_includes_zero_p (vr1) == 0)
+{
+  tem.set_nonnull (vr0->type ());
+  return tem;
+}
+
+  return tem;
+}
+
+
 /* Meet operation for value ranges.  Given two value ranges VR0 and
VR1, store in VR0 a range that contains both VR0 and VR1.  This
may not be the smallest possible such range.  */
@@ -6115,108 +6148,55 @@ value_range_base::union_ (const value_ra
   return;
 }
 
-  value_range saved (*this);
-  value_range_kind vr0type = this->kind ();
-  tree vr0min = this->min ();
-  tree vr0max = this->max ();
-  union_ranges (&vr0type, &vr0min, &vr0max,
-   other->kind (), other->min (), other->max ());
-  *this = value_range_base (vr0type, vr0min, vr0max);
-  if (this->varying_p ())
-{
-  /* Failed to find an efficient meet.  Before giving up and setting
-the result to VARYING, see if we can at least derive a useful
-anti-range.  */
-  if (range_includes_zero_p (&saved) == 0
- && range_includes_zero_p (other) == 0)
-   {
- tree zero = build_int_cst (saved.type (), 0);
- *this = value_range_base (VR_ANTI_RANGE, zero, zero);
- return;
-   }
-
-  this->set_varying ();
-  return;
-}
-  this->set_and_canonicalize (this->kind (), this->min (), this->max ());
+  *this = union_helper (this, other);
 }
 
-/* Meet operation for value ranges.  Given two value ranges VR0 and
-   VR1, store in VR0 a range that contains b

Re: [PATCH] More value_range API cleanup

2018-11-12 Thread Richard Biener
On Mon, 12 Nov 2018, Aldy Hernandez wrote:

> On 11/12/18 7:12 AM, Richard Biener wrote:
> > 
> > This mainly tries to rectify the workaround I put in place for ipa-cp.c
> > needing to build value_range instead of value_range_base for calling
> > extract_range_from_unary_expr.
> > 
> > To make this easier I moved more set_* functions to methods.
> > 
> > Then for some reason I chose to fix the rathole of equiv bitmap sharing
> > after finding at least one real bug
> 
> By the way, I've seen that the equiv_add() calls in vr-values.c sometimes set
> equivalences for VARYING and UNDEFINED which in theory shouldn't happen.  I've
> been too chicken to follow that hole.
> 
> I think we should assert everywhere that we set the equivalences, that we're
> not talking about VARYING or UNDEFINED.
> 
> > @@ -6168,37 +6172,30 @@ value_range::union_helper (value_range *vr0, const
> > value_range *vr1)
> > return;
> >   }
> >   -  value_range saved (*vr0);
> > value_range_kind vr0type = vr0->kind ();
> > tree vr0min = vr0->min ();
> > tree vr0max = vr0->max ();
> > union_ranges (&vr0type, &vr0min, &vr0max,
> > vr1->kind (), vr1->min (), vr1->max ());
> > -  *vr0 = value_range (vr0type, vr0min, vr0max);
> > -  if (vr0->varying_p ())
> > +  /* Work on a temporary so we can still use vr0 when union returns
> > varying.  */
> > +  value_range tem;
> > +  tem.set_and_canonicalize (vr0type, vr0min, vr0max);
> > +  if (tem.varying_p ())
> 
> I'm not a big fan of the code duplication in the union chunks.  You're adding
> more places that need to be kept in sync.
> 
> I think value_range::union_ could be easily coded as:
> 
>   value_range_base::union_ (other);
>   union_helper (this, other);
>   if (flag_checking)
> check ();
> 
> And have union_helper only deal with the equivalence stuff.

The tricky part starts in the prologue for

  if (vr0->undefined_p ())
{
  vr0->deep_copy (vr1);
  return;
}

but yes, we probably can factor out a bit more common code
here.  I'll see to followup with more minor cleanups this
week (noticed a few details myself).

>  Call it
> union_equivs?  You'd have to clear the equivalences if the range just became a
> varying/undefined, as both of those should in theory never have equivalences.
> 
> Also, is there a reason why you implemented value_range_base::union_ but not
> the corresponding for intersect?  I would guess it'd be needed sooner or
> later.

Laziness ;)  But yes.  It was mostly time constraints for hitting Stage1
with the major changes.

Richard.
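
As a rough illustration of the union fallback discussed in this thread (work on a temporary, and derive ~[0, 0] when the meet would otherwise be VARYING but neither input can be zero), here is a toy model on plain integers. It is not GCC code: toy_range, may_include_zero and toy_union are hypothetical names, and the merging step is far simpler than union_ranges.

```cpp
#include <algorithm>
#include <cassert>

// Toy stand-ins for value_range_kind and value_range_base.
enum toy_kind { TOY_UNDEFINED, TOY_RANGE, TOY_ANTI_RANGE, TOY_VARYING };

struct toy_range
{
  toy_kind kind;
  long min, max;  // meaningful only for TOY_RANGE and TOY_ANTI_RANGE
};

// Conservatively answer whether zero may be a member of R.
inline bool
may_include_zero (const toy_range &r)
{
  switch (r.kind)
    {
    case TOY_RANGE:
      return r.min <= 0 && 0 <= r.max;
    case TOY_ANTI_RANGE:
      return !(r.min <= 0 && 0 <= r.max);
    default:
      return true;  // TOY_VARYING, and TOY_UNDEFINED to stay conservative
    }
}

// Return a range containing both A and B; not necessarily the smallest.
inline toy_range
toy_union (const toy_range &a, const toy_range &b)
{
  // Prologue: the union with an undefined (empty) range is the other one.
  if (a.kind == TOY_UNDEFINED)
    return b;
  if (b.kind == TOY_UNDEFINED)
    return a;

  // Work on a temporary so A and B stay usable for the fallback below.
  toy_range tem = { TOY_VARYING, 0, 0 };
  if (a.kind == TOY_RANGE && b.kind == TOY_RANGE)
    {
      const toy_range &lo = a.min <= b.min ? a : b;
      const toy_range &hi = a.min <= b.min ? b : a;
      // Overlapping or adjacent ranges merge into a single range.
      if (hi.min <= lo.max || hi.min - 1 == lo.max)
        tem = toy_range { TOY_RANGE, lo.min, std::max (lo.max, hi.max) };
    }

  // No efficient meet found: before giving up with VARYING, see whether
  // the anti-range ~[0, 0] is still a valid answer.
  if (tem.kind == TOY_VARYING
      && !may_include_zero (a)
      && !may_include_zero (b))
    tem = toy_range { TOY_ANTI_RANGE, 0, 0 };

  return tem;
}

// Small usage example: two disjoint non-zero ranges yield ~[0, 0].
inline void
example_usage ()
{
  toy_range neg = { TOY_RANGE, -5, -1 };
  toy_range pos = { TOY_RANGE, 1, 5 };
  toy_range u = toy_union (neg, pos);
  assert (u.kind == TOY_ANTI_RANGE && u.min == 0 && u.max == 0);
}
```

Returning the result by value, as in the untested follow-up patch, keeps the range logic in one place so that an equivalence-aware wrapper only has to deal with the equiv bitmap.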


GCC 9 Status report (2018-11-13), Stage 3 in effect now

2018-11-12 Thread Richard Biener


Status
======

GCC trunk is open for general bugfixing (Stage 3) now until the end
of Jan 6th after which only regression and documentation fixes will
be possible.

This means we have now started the stabilization phase of GCC 9,
so please start testing the compiler and reporting and fixing
bugs.

Note bugs have not yet been prioritized thoroughly so there's no
meaningful Quality Data yet.


Previous Report
===============

https://gcc.gnu.org/ml/gcc/2018-10/msg00158.html


Re: [PATCH AutoFDO/2]Treat ZERO as common profile probability/count

2018-11-12 Thread Bin.Cheng
On Mon, Nov 5, 2018 at 10:40 PM Jan Hubicka  wrote:
>
> diff --git a/gcc/profile-count.h b/gcc/profile-count.h
> index 4289bc5a004..2b5e3269250 100644
> --- a/gcc/profile-count.h
> +++ b/gcc/profile-count.h
> @@ -218,6 +218,11 @@ public:
>  }
>
>
> +  /* Return true if value is zero.  */
> +  bool never_p () const
> +{
> +  return m_val == 0;
> +}
>/* Return true if value has been initialized.  */
>bool initialized_p () const
>  {
> @@ -288,9 +293,9 @@ public:
>  }
>profile_probability operator+ (const profile_probability &other) const
>  {
> -  if (other == profile_probability::never ())
> +  if (other.never_p ())
> return *this;
> -  if (*this == profile_probability::never ())
> +  if (this->never_p ())
>
> This is not correct change.  If you add guessed 0 to precise 0,
> the result needs to be guessed 0 because we are no longer sure the code
> will not get executed.  This is why all the checks here go explicitly
> to profile_probability::never.
Hmm, so precise 0 means the code can never get executed? I also noticed
that predict.c has many direct assignments of profile_count::zero, such as:
propagation_unlikely_bbs_forward (void)
{
  //...
  bb->count = profile_count::zero ();
  //...
}
This generally promotes profile_count::zero from a lower precision to
precise, but the function name and comment seem to target unlikely-executed
code rather than never-executed code.  Is this inconsistent?

Thanks,
bin

>
> Honza
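
To make the point about guessed versus precise zero concrete, here is a toy model. It is not the profile-count.h code: toy_probability and both add_* functions are hypothetical, and the model assumes that equality compares the quality as well as the value.

```cpp
#include <cassert>

// Lower enumerators mean less reliable information.
enum toy_quality { TOY_GUESSED, TOY_PRECISE };

struct toy_probability
{
  unsigned val;
  toy_quality quality;

  // "Never" is a precise zero: the code is known not to execute.
  static toy_probability never ()
  { return toy_probability { 0, TOY_PRECISE }; }

  // Assumed for this sketch: equality includes the quality.
  bool operator== (const toy_probability &o) const
  { return val == o.val && quality == o.quality; }

  // The predicate from the quoted hunk: true for any zero value.
  bool never_p () const
  { return val == 0; }
};

// Sum the values and keep the weaker of the two qualities.
inline toy_probability
combine (const toy_probability &a, const toy_probability &b)
{
  toy_quality q = a.quality < b.quality ? a.quality : b.quality;
  return toy_probability { a.val + b.val, q };
}

// Shortcut only for precise zero, as the existing checks do.
inline toy_probability
add_checking_precise_never (const toy_probability &a,
                            const toy_probability &b)
{
  if (b == toy_probability::never ())
    return a;
  if (a == toy_probability::never ())
    return b;
  return combine (a, b);
}

// Shortcut for any zero, as in the proposed change.
inline toy_probability
add_checking_any_zero (const toy_probability &a, const toy_probability &b)
{
  if (b.never_p ())
    return a;
  if (a.never_p ())
    return b;
  return combine (a, b);
}

// Small usage example: precise 0 + guessed 0.
inline void
example_usage ()
{
  toy_probability precise0 = toy_probability::never ();
  toy_probability guessed0 = { 0, TOY_GUESSED };
  // The explicit comparison keeps the result guessed ...
  assert (add_checking_precise_never (precise0, guessed0).quality
          == TOY_GUESSED);
  // ... while the never_p () shortcut wrongly reports a precise zero.
  assert (add_checking_any_zero (precise0, guessed0).quality
          == TOY_PRECISE);
}
```

In this model the never_p () variant returns the precise operand unchanged, which is exactly the loss of information the review objects to.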


Bug 52869 - [DR 1207] "this" not being allowed in noexcept clauses

2018-11-12 Thread Umesh Kalappa
Hi All,

The following patch fixes the issue in the subject.

Index: gcc/cp/parser.c
===
--- gcc/cp/parser.c (revision 266026)
+++ gcc/cp/parser.c (working copy)
@@ -24615,6 +24615,8 @@
 {
   tree expr;
   cp_lexer_consume_token (parser->lexer);
+
+  inject_this_parameter (current_class_type, TYPE_UNQUALIFIED);

   if (cp_lexer_peek_token (parser->lexer)->type == CPP_OPEN_PAREN)
{


OK to commit along with the testcase and a ChangeLog update?

Thank you
~Umesh
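
For context, a minimal example of the kind of code DR 1207 is about. It is written for illustration and is not the testcase referred to above; widget and its members are hypothetical names. A compiler without the fix is expected to reject the use of "this" in the noexcept-specifiers.

```cpp
#include <utility>

struct widget
{
  void reset () noexcept;
  void reload ();

  // "this" appears inside the noexcept-specifier of a member function.
  void clear () noexcept (noexcept (this->reset ()));
  void refresh () noexcept (noexcept (this->reload ()));
};

// clear () inherits noexcept from reset (); refresh () may throw because
// reload () may throw.
static_assert (noexcept (std::declval<widget &> ().clear ()),
               "clear () should be noexcept");
static_assert (!noexcept (std::declval<widget &> ().refresh ()),
               "refresh () should not be noexcept");
```

Only declarations are needed here, since the operands of noexcept are unevaluated.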


Re: Fix __gnu_cxx::throw_allocator 2 * O(log(N)) complexity

2018-11-12 Thread François Dumont

Oops, it was not the tested patch. Here it is.

On 11/12/18 7:43 AM, François Dumont wrote:
During a debugging session I noticed that __gnu_cxx::throw_allocator
performs every lookup twice on both insert and erase.

Using the result of map::insert and erasing through the found iterator
avoids this double lookup.


    * include/ext/throw_allocator.h
    (annotate_base::insert(void*, size_t)): Use insert result to check
    for double insert attempt.
    (annotate_base::insert_construct(void*)): Likewise.
    (annotate_base::check_allocated(void*, size_t)): Return found
    iterator.
    (annotate_base::erase(void*, size_t)): Use latter method returned
    iterator.
    (annotate_base::check_constructed(void*, size_t)): Return found
    iterator.
    (annotate_base::erase_construct(void*)): Use latter method returned
    iterator.

Tested under linux x86_64.

Ok to commit ?

François
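
The idea in isolation, as a small standalone sketch rather than the throw_allocator code itself; alloc_map, record, check and forget are hypothetical names. One insert call both detects the duplicate and stores the entry, and erase takes the iterator that the check already found.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <utility>

typedef std::map<void*, std::size_t> alloc_map;

// Record P with SIZE using a single tree lookup: insert reports through
// its bool result whether the key was already present.
inline void
record (alloc_map &m, void *p, std::size_t size)
{
  std::pair<alloc_map::iterator, bool> inserted
    = m.insert (std::make_pair (p, size));
  if (!inserted.second)
    throw std::logic_error ("double insert");
}

// Check that P is known and hand the iterator back to the caller.
inline alloc_map::iterator
check (alloc_map &m, void *p)
{
  alloc_map::iterator found = m.find (p);
  if (found == m.end ())
    throw std::logic_error ("unknown pointer");
  return found;
}

// Erase through the iterator, so no second lookup by key is needed.
inline void
forget (alloc_map &m, void *p)
{ m.erase (check (m, p)); }

// Small usage example: record one pointer and forget it again.
inline void
example_usage ()
{
  alloc_map m;
  int object = 0;
  record (m, &object, sizeof object);
  assert (m.size () == 1);
  forget (m, &object);
  assert (m.empty ());
}
```

On a failed insert the existing element is left untouched and inserted.first points at it, which is what lets the error path log the earlier entry without another find.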



diff --git a/libstdc++-v3/include/ext/throw_allocator.h b/libstdc++-v3/include/ext/throw_allocator.h
index dd7c69e..3a5670e3454 100644
--- a/libstdc++-v3/include/ext/throw_allocator.h
+++ b/libstdc++-v3/include/ext/throw_allocator.h
@@ -87,6 +87,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*/
   struct annotate_base
   {
+  private:
+typedef std::pair		data_type;
+typedef std::map		map_alloc_type;
+typedef map_alloc_type::value_type		entry_type;
+typedef map_alloc_type::const_iterator	const_iterator;
+typedef map_alloc_type::const_reference	const_reference;
+#if __cplusplus >= 201103L
+typedef std::map		map_construct_type;
+#endif
+
+  public:
 annotate_base()
 {
   label();
@@ -104,31 +115,28 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 void
 insert(void* p, size_t size)
 {
+  entry_type entry = make_entry(p, size);
   if (!p)
 	{
 	  std::string error("annotate_base::insert null insert!\n");
-	  log_to_string(error, make_entry(p, size));
+	  log_to_string(error, entry);
 	  std::__throw_logic_error(error.c_str());
 	}
 
-  const_iterator found = map_alloc().find(p);
-  if (found != map_alloc().end())
+  std::pair inserted
+	= map_alloc().insert(entry);
+  if (!inserted.second)
 	{
 	  std::string error("annotate_base::insert double insert!\n");
-	  log_to_string(error, make_entry(p, size));
-	  log_to_string(error, *found);
+	  log_to_string(error, entry);
+	  log_to_string(error, *inserted.first);
 	  std::__throw_logic_error(error.c_str());
 	}
-
-  map_alloc().insert(make_entry(p, size));
 }
 
 void
 erase(void* p, size_t size)
-{
-  check_allocated(p, size);
-  map_alloc().erase(p);
-}
+{ map_alloc().erase(check_allocated(p, size)); }
 
 #if __cplusplus >= 201103L
 void
@@ -140,31 +148,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  std::__throw_logic_error(error.c_str());
 	}
 
-  auto found = map_construct().find(p);
-  if (found != map_construct().end())
+  auto inserted = map_construct().insert(std::make_pair(p, get_label()));
+  if (!inserted.second)
 	{
 	  std::string error("annotate_base::insert_construct double insert!\n");
 	  log_to_string(error, std::make_pair(p, get_label()));
-	  log_to_string(error, *found);
+	  log_to_string(error, *inserted.first);
 	  std::__throw_logic_error(error.c_str());
 	}
-
-  map_construct().insert(std::make_pair(p, get_label()));
 }
 
 void
 erase_construct(void* p)
-{
-  check_constructed(p);
-  map_construct().erase(p);
-}
+{ map_construct().erase(check_constructed(p)); }
 #endif
 
 // See if a particular address and allocation size has been saved.
-inline void
+inline map_alloc_type::iterator
 check_allocated(void* p, size_t size)
 {
-  const_iterator found = map_alloc().find(p);
+  map_alloc_type::iterator found = map_alloc().find(p);
   if (found == map_alloc().end())
 	{
 	  std::string error("annotate_base::check_allocated by value "
@@ -181,6 +184,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  log_to_string(error, *found);
 	  std::__throw_logic_error(error.c_str());
 	}
+
+  return found;
 }
 
 // See if a given label has been allocated.
@@ -256,7 +261,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
 #if __cplusplus >= 201103L
-inline void
+inline map_construct_type::iterator
 check_constructed(void* p)
 {
   auto found = map_construct().find(p);
@@ -267,6 +272,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  log_to_string(error, std::make_pair(p, get_label()));
 	  std::__throw_logic_error(error.c_str());
 	}
+
+  return found;
 }
 
 inline void
@@ -292,15 +299,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
   private:
-typedef std::pair		data_type;
-typedef std::map 		map_alloc_type;
-typedef map_alloc_type::value_type 		entry_type;
-typedef map_alloc_type::const_iterator 		const_iterator;
-typedef map_alloc_type::const_reference 		const_reference;
-#if __cplusplus >= 201103L
-typedef std::map		map_construct_type;
-#endif
-
 friend std

Re: [RS6000] Remove unnecessary rtx_equal_p

2018-11-12 Thread Alan Modra
On Tue, Nov 13, 2018 at 02:16:09PM +1030, Alan Modra wrote:
> REGs are unique.  This patch recognizes that fact, speeding up rs6000
> gcc infinitesimally.  Bootstrapped etc. powerpc64le-linux.  OK?

Ugh, looking over this old patch again, I don't see how I can claim
that regs in pre_inc/pre_dec/pre_modify are unique.  Patch withdrawn.

-- 
Alan Modra
Australia Development Lab, IBM


[C++ PATCH] Implement P0315R4, Lambdas in unevaluated contexts.

2018-11-12 Thread Jason Merrill
When lambdas were added in C++11 they were banned from unevaluated contexts
as a way to avoid needing to deal with them in mangling or SFINAE.  This
proposal lifts that ban with a narrower rule: lambdas never compare as
equivalent (so we don't need to mangle them), and substitution failures
within a lambda are hard errors.  Lambdas appearing in places where types
couldn't previously have been declared introduce various complications; in
particular, it seems likely to mean types with no linkage being used more
broadly, risking ODR violations.  I want to follow up this patch with some
related diagnostics.

Tested x86_64-pc-linux-gnu, applying to trunk.

* decl2.c (min_vis_expr_r): Handle LAMBDA_EXPR.
* mangle.c (write_expression): Handle LAMBDA_EXPR.
* parser.c (cp_parser_lambda_expression): Allow lambdas in
unevaluated context.  Start the tentative firewall sooner.
(cp_parser_lambda_body): Use cp_evaluated.
* pt.c (iterative_hash_template_arg): Handle LAMBDA_EXPR.
(tsubst_function_decl): Substitute a lambda even if it isn't
dependent.
(tsubst_lambda_expr): Use cp_evaluated.  Always complain.
(tsubst_copy_and_build) [LAMBDA_EXPR]: Do nothing if tf_partial.
* semantics.c (begin_class_definition): Allow in template parm list.
* tree.c (strip_typedefs_expr): Pass through LAMBDA_EXPR.
(cp_tree_equal): Handle LAMBDA_EXPR.
---
 gcc/testsuite/g++.dg/cpp2a/lambda-uneval9.h   |  9 
 gcc/cp/decl2.c|  1 +
 gcc/cp/mangle.c   | 10 
 gcc/cp/parser.c   | 22 +---
 gcc/cp/pt.c   | 28 --
 gcc/cp/semantics.c|  6 ---
 gcc/cp/tree.c | 29 +-
 gcc/testsuite/g++.dg/cpp2a/lambda-uneval9.cc  |  3 ++
 .../g++.dg/cpp0x/lambda/lambda-ice6.C |  2 +-
 .../g++.dg/cpp0x/lambda/lambda-sfinae1.C  |  9 ++--
 .../g++.dg/cpp0x/lambda/lambda-uneval.C   |  2 +-
 .../g++.dg/cpp0x/lambda/lambda-uneval2.C  |  5 +-
 gcc/testsuite/g++.dg/cpp2a/lambda-uneval1.C   | 16 ++
 gcc/testsuite/g++.dg/cpp2a/lambda-uneval2.C   | 54 +++
 gcc/testsuite/g++.dg/cpp2a/lambda-uneval3.C   | 12 +
 gcc/testsuite/g++.dg/cpp2a/lambda-uneval4.C   |  8 +++
 gcc/testsuite/g++.dg/cpp2a/lambda-uneval5.C   |  5 ++
 gcc/testsuite/g++.dg/cpp2a/lambda-uneval6.C   | 26 +
 gcc/testsuite/g++.dg/cpp2a/lambda-uneval7.C   | 12 +
 gcc/testsuite/g++.dg/cpp2a/lambda-uneval8.C   | 13 +
 gcc/testsuite/g++.dg/cpp2a/lambda-uneval9.C   | 12 +
 gcc/cp/ChangeLog  | 15 ++
 22 files changed, 262 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval9.h
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval9.cc
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval4.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval5.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval6.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval7.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval8.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval9.C

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 74b9f4ee826..04537417129 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -2288,6 +2288,7 @@ min_vis_expr_r (tree *tp, int */*walk_subtrees*/, void 
*data)
 case DYNAMIC_CAST_EXPR:
 case NEW_EXPR:
 case CONSTRUCTOR:
+case LAMBDA_EXPR:
   tpvis = type_visibility (TREE_TYPE (*tp));
   break;
 
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index b9d8ee20116..64415894bc5 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -3139,6 +3139,16 @@ write_expression (tree expr)
write_expression (val);
   write_char ('E');
 }
+  else if (code == LAMBDA_EXPR)
+{
+  /* [temp.over.link] Two lambda-expressions are never considered
+equivalent.
+
+So just use the closure type mangling.  */
+  write_string ("tl");
+  write_type (LAMBDA_EXPR_CLOSURE (expr));
+  write_char ('E');
+}
   else if (dependent_name (expr))
 {
   write_unqualified_id (dependent_name (expr));
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 0428f6dda90..db0f0338179 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -10175,12 +10175,15 @@ cp_parser_lambda_expression (cp_parser* parser)
 
   LAMBDA_EXPR_LOCATION (lambda_expr) = token->location;
 
-  if (cp_unevaluated_operand)
+  if (cxx_dialect >= cxx2a)
+/* C++20 allows lambdas in unevaluated context.  */;
+  else if (cp_unevaluated_operand)
 {
   if (!token->error_reported)
{
  error_at (LAMBDA_EXPR_

[C++ PATCH] Avoid double substitution with complete explicit template arguments.

2018-11-12 Thread Jason Merrill
Previously, when we got a function template with explicit arguments for all
of the template parameters, we still did "deduction", which of course
couldn't deduce anything, but did other deduction-time checking of
non-dependent conversions and such.  This broke down with the unevaluated
lambdas patch (to follow): substituting into the lambda multiple times, once
to get the function type for deduction and then again to generate the actual
decl, doesn't work, since different substitutions of a lambda produce
different types.  I believe that skipping the initial substitution when we
have all the arguments is still conformant, and produces better diagnostics
for some testcases.

Tested x86_64-pc-linux-gnu, applying to trunk.

* pt.c (fn_type_unification): If we have a full set of explicit
arguments, go straight to substitution.
---
 gcc/cp/pt.c   | 75 +++
 gcc/testsuite/g++.dg/cpp0x/decltype48.C   |  2 +-
 gcc/testsuite/g++.dg/cpp0x/diag1.C|  2 +-
 gcc/testsuite/g++.dg/cpp0x/error4.C   |  2 +-
 gcc/testsuite/g++.dg/cpp0x/pr77655.C  |  2 +-
 .../g++.dg/diagnostic/param-type-mismatch-2.C | 16 ++--
 .../g++.dg/diagnostic/param-type-mismatch.C   | 10 +--
 gcc/cp/ChangeLog  |  5 ++
 8 files changed, 63 insertions(+), 51 deletions(-)

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 0c33c8e1527..f948aef3776 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -19800,6 +19800,11 @@ fn_type_unification (tree fn,
   tsubst_flags_t complain = (explain_p ? tf_warning_or_error : tf_none);
   bool ok;
   static int deduction_depth;
+  /* type_unification_real will pass back any access checks from default
+ template argument substitution.  */
+  vec<deferred_access_check, va_gc> *checks = NULL;
+  /* We don't have all the template args yet.  */
+  bool incomplete = true;
 
   tree orig_fn = fn;
   if (flag_new_inheriting_ctors)
@@ -19857,7 +19862,7 @@ fn_type_unification (tree fn,
 template results in an invalid type, type deduction fails.  */
   int i, len = TREE_VEC_LENGTH (tparms);
   location_t loc = input_location;
-  bool incomplete = false;
+  incomplete = false;
 
   if (explicit_targs == error_mark_node)
goto fail;
@@ -19923,33 +19928,52 @@ fn_type_unification (tree fn,
 }
 }
 
-  if (!push_tinst_level (fn, explicit_targs))
+  if (incomplete)
{
- excessive_deduction_depth = true;
- goto fail;
-   }
-  processing_template_decl += incomplete;
-  input_location = DECL_SOURCE_LOCATION (fn);
-  /* Ignore any access checks; we'll see them again in
-instantiate_template and they might have the wrong
-access path at this point.  */
-  push_deferring_access_checks (dk_deferred);
-  fntype = tsubst (TREE_TYPE (fn), explicit_targs,
-  complain | tf_partial | tf_fndecl_type, NULL_TREE);
-  pop_deferring_access_checks ();
-  input_location = loc;
-  processing_template_decl -= incomplete;
-  pop_tinst_level ();
+ if (!push_tinst_level (fn, explicit_targs))
+   {
+ excessive_deduction_depth = true;
+ goto fail;
+   }
+ ++processing_template_decl;
+ input_location = DECL_SOURCE_LOCATION (fn);
+ /* Ignore any access checks; we'll see them again in
+instantiate_template and they might have the wrong
+access path at this point.  */
+ push_deferring_access_checks (dk_deferred);
+ tsubst_flags_t ecomplain = complain | tf_partial | tf_fndecl_type;
+ fntype = tsubst (TREE_TYPE (fn), explicit_targs, ecomplain, NULL_TREE);
+ pop_deferring_access_checks ();
+ input_location = loc;
+ --processing_template_decl;
+ pop_tinst_level ();
 
-  if (fntype == error_mark_node)
-   goto fail;
+ if (fntype == error_mark_node)
+   goto fail;
+   }
 
   /* Place the explicitly specified arguments in TARGS.  */
   explicit_targs = INNERMOST_TEMPLATE_ARGS (explicit_targs);
   for (i = NUM_TMPL_ARGS (explicit_targs); i--;)
TREE_VEC_ELT (targs, i) = TREE_VEC_ELT (explicit_targs, i);
+  if (!incomplete && CHECKING_P
+ && !NON_DEFAULT_TEMPLATE_ARGS_COUNT (targs))
+   SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT
+ (targs, NUM_TMPL_ARGS (explicit_targs));
+}
+
+  if (return_type && strict != DEDUCE_CALL)
+{
+  tree *new_args = XALLOCAVEC (tree, nargs + 1);
+  new_args[0] = return_type;
+  memcpy (new_args + 1, args, nargs * sizeof (tree));
+  args = new_args;
+  ++nargs;
 }
 
+  if (!incomplete)
+goto deduced;
+
   /* Never do unification on the 'this' parameter.  */
   parms = skip_artificial_parms_for (fn, TYPE_ARG_TYPES (fntype));
 
@@ -19963,14 +19987,7 @@ fn_type_unification (tree fn,
 }
   else if (return_type)
 {
-  tree *new_args;
-
   parms = t

[C++ PATCH] * decl2.c (min_vis_expr_r, expr_visibility): New.

2018-11-12 Thread Jason Merrill
We weren't properly constraining visibility based on names that appear in
the mangled representation of expressions.  This was made more obvious
by the upcoming unevaluated lambdas patch.
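
As a rough illustration of the idea, the following toy model (an assumption for exposition, not GCC's tree representation) computes the visibility of an expression as the most restrictive visibility of any name it mentions. All `toy_` names are hypothetical, and larger enum values mean more restrictive visibility, mirroring how VISIBILITY_ANON is defined above VISIBILITY_INTERNAL:

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for visibility levels; larger means more restrictive.
enum toy_visibility
{
  TOY_DEFAULT = 0,
  TOY_PROTECTED,
  TOY_HIDDEN,
  TOY_INTERNAL,
  TOY_ANON
};

// A toy expression node: the visibility of the name it mentions, plus
// its operands.
struct toy_expr
{
  int vis;
  std::vector<toy_expr> operands;
};

// Walk the expression and keep the most restrictive visibility seen.
inline int
toy_expr_visibility (const toy_expr &e)
{
  int vis = e.vis;
  for (const toy_expr &op : e.operands)
    {
      int opvis = toy_expr_visibility (op);
      if (opvis > vis)
        vis = opvis;
    }
  return vis;
}
```

In this model a single operand with anonymous visibility is enough to constrain the whole expression.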

Tested x86_64-pc-linux-gnu, applying to trunk.

(min_vis_r): Call expr_visibility.
(constrain_visibility_for_template): Likewise.
---
 gcc/cp/decl2.c  | 77 +
 gcc/testsuite/g++.dg/abi/no-linkage-expr1.C | 19 +
 gcc/cp/ChangeLog|  4 ++
 3 files changed, 85 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/abi/no-linkage-expr1.C

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 13c156b947d..74b9f4ee826 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -2238,6 +2238,9 @@ maybe_emit_vtables (tree ctype)
 
 enum { VISIBILITY_ANON = VISIBILITY_INTERNAL+1 };
 
+static int expr_visibility (tree);
+static int type_visibility (tree);
+
 /* walk_tree helper function for type_visibility.  */
 
 static tree
@@ -2257,9 +2260,55 @@ min_vis_r (tree *tp, int *walk_subtrees, void *data)
   else if (CLASS_TYPE_P (*tp)
   && CLASSTYPE_VISIBILITY (*tp) > *vis_p)
 *vis_p = CLASSTYPE_VISIBILITY (*tp);
+  else if (TREE_CODE (*tp) == ARRAY_TYPE
+  && uses_template_parms (TYPE_DOMAIN (*tp)))
+{
+  int evis = expr_visibility (TYPE_MAX_VALUE (TYPE_DOMAIN (*tp)));
+  if (evis > *vis_p)
+   *vis_p = evis;
+}
   return NULL;
 }
 
+/* walk_tree helper function for expr_visibility.  */
+
+static tree
+min_vis_expr_r (tree *tp, int */*walk_subtrees*/, void *data)
+{
+  int *vis_p = (int *)data;
+  int tpvis = VISIBILITY_DEFAULT;
+
+  switch (TREE_CODE (*tp))
+{
+case CAST_EXPR:
+case IMPLICIT_CONV_EXPR:
+case STATIC_CAST_EXPR:
+case REINTERPRET_CAST_EXPR:
+case CONST_CAST_EXPR:
+case DYNAMIC_CAST_EXPR:
+case NEW_EXPR:
+case CONSTRUCTOR:
+  tpvis = type_visibility (TREE_TYPE (*tp));
+  break;
+
+case VAR_DECL:
+case FUNCTION_DECL:
+  if (! TREE_PUBLIC (*tp))
+   tpvis = VISIBILITY_ANON;
+  else
+   tpvis = DECL_VISIBILITY (*tp);
+  break;
+
+default:
+  break;
+}
+
+  if (tpvis > *vis_p)
+*vis_p = tpvis;
+
+  return NULL_TREE;
+}
+
 /* Returns the visibility of TYPE, which is the minimum visibility of its
component types.  */
 
@@ -2271,6 +2320,18 @@ type_visibility (tree type)
   return vis;
 }
 
+/* Returns the visibility of an expression EXPR that appears in the signature
+   of a function template, which is the minimum visibility of names that appear
+   in its mangling.  */
+
+static int
+expr_visibility (tree expr)
+{
+  int vis = VISIBILITY_DEFAULT;
+  cp_walk_tree_without_duplicates (&expr, min_vis_expr_r, &vis);
+  return vis;
+}
+
 /* Limit the visibility of DECL to VISIBILITY, if not explicitly
specified (or if VISIBILITY is static).  If TMPL is true, this
constraint is for a template argument, and takes precedence
@@ -2329,21 +2390,7 @@ constrain_visibility_for_template (tree decl, tree targs)
   if (TYPE_P (arg))
vis = type_visibility (arg);
   else
-   {
- if (REFERENCE_REF_P (arg))
-   arg = TREE_OPERAND (arg, 0);
- if (TREE_TYPE (arg))
-   STRIP_NOPS (arg);
- if (TREE_CODE (arg) == ADDR_EXPR)
-   arg = TREE_OPERAND (arg, 0);
- if (VAR_OR_FUNCTION_DECL_P (arg))
-   {
- if (! TREE_PUBLIC (arg))
-   vis = VISIBILITY_ANON;
- else
-   vis = DECL_VISIBILITY (arg);
-   }
-   }
+   vis = expr_visibility (arg);
   if (vis)
constrain_visibility (decl, vis, true);
 }
diff --git a/gcc/testsuite/g++.dg/abi/no-linkage-expr1.C b/gcc/testsuite/g++.dg/abi/no-linkage-expr1.C
new file mode 100644
index 000..c3b1286ba4c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/no-linkage-expr1.C
@@ -0,0 +1,19 @@
+// { dg-do compile { target c++11 } }
+// { dg-final { scan-assembler-not "weak.*_Z" } }
+
+using P = struct {}*;
+
+template <int N>
+void f(int(*)[((P)0, N)]) {}
+
+template <int N>
+struct A { };
+
+template <int N>
+void g(A<((P)0,N)>) {}
+
+int main()
+{
+  f<1>(0);
+  g<1>({});
+}
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 5497a0829e3..79c162c75b0 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,5 +1,9 @@
 2018-11-12  Jason Merrill  
 
+   * decl2.c (min_vis_expr_r, expr_visibility): New.
+   (min_vis_r): Call expr_visibility.
+   (constrain_visibility_for_template): Likewise.
+
Implement P0722R3, destroying operator delete.
* call.c (std_destroying_delete_t_p, destroying_delete_p): New.
(aligned_deallocation_fn_p, usual_deallocation_fn_p): Use

base-commit: 76b94d4ba654e9af1882865933343d11f5c3b18b
-- 
2.17.2



RFC (branch prediction): PATCH to implement P0479R5, [[likely]] and [[unlikely]].

2018-11-12 Thread Jason Merrill
[[likely]] and [[unlikely]] are equivalent to the GNU hot/cold attributes,
except that they can be applied to arbitrary statements as well as labels;
this is most likely to be useful for marking if/else branches as likely or
unlikely.  Conveniently, PREDICT_EXPR fits the bill nicely as a
representation.
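
For readers who have not seen the attributes in use, here is a small sketch of the two placements described above: on the branches of an if/else and on case labels. The functions `classify` and `dispatch` are hypothetical, and the example assumes a compiler that accepts the C++20 attributes:

```cpp
#include <cassert>

// Hint that the non-negative branch is the common one.
inline int
classify (int x)
{
  if (x >= 0) [[likely]]
    return 1;
  else [[unlikely]]
    return -1;
}

// Hint which case labels are expected to be hot or cold.
inline int
dispatch (int op)
{
  switch (op)
    {
    [[likely]] case 0:
      return 10;
    [[unlikely]] case 1:
      return 20;
    default:
      return 0;
    }
}
```

The attributes are only hints for code layout and branch prediction; they do not change what either function returns.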

I also had to fix marking case labels as hot/cold, which didn't work before;
that in turn required forcing __attribute ((fallthrough)) to apply to the
statement rather than the label.

Tested x86_64-pc-linux-gnu.  Does this seem like a sane implementation
approach to people with more experience with PREDICT_EXPR?

gcc/
* gimplify.c (gimplify_case_label_expr): Handle hot/cold attributes.
gcc/c-family/
* c-lex.c (c_common_has_attribute): Handle likely/unlikely.
gcc/cp/
* parser.c (cp_parser_std_attribute): Handle likely/unlikely.
(cp_parser_statement): Call process_stmt_hotness_attribute.
(cp_parser_label_for_labeled_statement): Apply attributes to case.
* cp-gimplify.c (lookup_hotness_attribute, remove_hotness_attribute)
(process_stmt_hotness_attribute): New.
* decl.c (finish_case_label): Give label in template type void.
* pt.c (tsubst_expr) [CASE_LABEL_EXPR]: Copy attributes.
[PREDICT_EXPR]: Handle.
---
 gcc/cp/cp-tree.h  |  2 +
 gcc/c-family/c-lex.c  |  4 +-
 gcc/cp/cp-gimplify.c  | 42 +
 gcc/cp/decl.c |  2 +-
 gcc/cp/parser.c   | 45 +++
 gcc/cp/pt.c   | 12 +-
 gcc/gimplify.c| 10 -
 gcc/testsuite/g++.dg/cpp2a/attr-likely1.C | 38 +++
 gcc/testsuite/g++.dg/cpp2a/attr-likely2.C | 10 +
 gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C   | 12 ++
 gcc/ChangeLog |  4 ++
 gcc/c-family/ChangeLog|  4 ++
 gcc/cp/ChangeLog  | 12 ++
 13 files changed, 184 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/attr-likely1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/attr-likely2.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index c4d79c0cf7f..c55352ec5ff 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7541,6 +7541,8 @@ extern bool cxx_omp_disregard_value_expr  (tree, bool);
 extern void cp_fold_function   (tree);
 extern tree cp_fully_fold  (tree);
 extern void clear_fold_cache   (void);
+extern tree lookup_hotness_attribute   (tree);
+extern tree process_stmt_hotness_attribute (tree);
 
 /* in name-lookup.c */
 extern tree strip_using_decl(tree);
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index 28a820a2a3d..3cc015083e0 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -356,7 +356,9 @@ c_common_has_attribute (cpp_reader *pfile)
   || is_attribute_p ("nodiscard", attr_name)
   || is_attribute_p ("fallthrough", attr_name))
result = 201603;
- else if (is_attribute_p ("no_unique_address", attr_name))
+ else if (is_attribute_p ("no_unique_address", attr_name)
+  || is_attribute_p ("likely", attr_name)
+  || is_attribute_p ("unlikely", attr_name))
result = 201803;
  if (result)
attr_name = NULL_TREE;
diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index eb761b118a1..f8212187162 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -2674,4 +2674,46 @@ cp_fold (tree x)
   return x;
 }
 
+/* Look up either "hot" or "cold" in attribute list LIST.  */
+
+tree
+lookup_hotness_attribute (tree list)
+{
+  tree attr = lookup_attribute ("hot", list);
+  if (attr)
+return attr;
+  return lookup_attribute ("cold", list);
+}
+
+/* Remove both "hot" and "cold" attributes from LIST.  */
+
+static tree
+remove_hotness_attribute (tree list)
+{
+  return remove_attribute ("hot", remove_attribute ("cold", list));
+}
+
+/* If [[likely]] or [[unlikely]] appear on this statement, turn it into a
+   PREDICT_EXPR.  */
+
+tree
+process_stmt_hotness_attribute (tree std_attrs)
+{
+  if (std_attrs == error_mark_node)
+return std_attrs;
+  if (tree attr = lookup_hotness_attribute (std_attrs))
+{
+  tree name = get_attribute_name (attr);
+  bool hot = is_attribute_p ("hot", name);
+  tree pred = build_predict_expr (hot ? PRED_HOT_LABEL : PRED_COLD_LABEL,
+ hot ? TAKEN : NOT_TAKEN);
+  add_stmt (pred);
+  if (tree other = lookup_hotness_attribute (TREE_CHAIN (attr)))
+   warning (OPT_Wattributes, "ignoring attribute %qE after earlier %qE",
+get_attribute_name (other), name);
+  std_attrs = remove_hotness_attribute (std_attrs);
+}
+  return std

[C++ PATCH] Implement P0722R3, destroying operator delete.

2018-11-12 Thread Jason Merrill
A destroying operator delete takes responsibility for calling the destructor
for the object it is deleting; this is intended to be useful for sized
delete of a class allocated with a trailing buffer, where the compiler can't
know the size of the allocation, and so would pass the wrong size to the
non-destroying sized operator delete.
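
The trailing-buffer use case might look roughly like the sketch below. The class `packet` and its bookkeeping members are hypothetical, and the code is guarded by the `__cpp_impl_destroying_delete` feature-test macro so that it is skipped by compilers without the feature. For simplicity the computed size is only recorded; a real implementation would presumably pass it to a sized deallocation function:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

#ifdef __cpp_impl_destroying_delete
// Hypothetical class whose storage is followed by a trailing buffer.
struct packet
{
  std::size_t payload_size;

  // Illustration-only bookkeeping, not part of the language feature.
  static inline int destroyed = 0;
  static inline std::size_t freed_bytes = 0;

  // Allocate the object together with N bytes of trailing storage.
  static packet *make (std::size_t n)
  {
    void *raw = ::operator new (sizeof (packet) + n);
    packet *p = ::new (raw) packet;
    p->payload_size = n;
    return p;
  }

  ~packet () { ++destroyed; }

  // Destroying operator delete: the delete-expression does not run the
  // destructor, so this function reads the real allocation size from the
  // still-live object and then destroys it itself.
  void operator delete (packet *p, std::destroying_delete_t)
  {
    std::size_t total = sizeof (packet) + p->payload_size;
    p->~packet ();
    freed_bytes = total;
    ::operator delete (static_cast<void *> (p));
  }
};
#endif
```

With this in place, `delete p` on a `packet *` calls the destroying operator delete directly, and the destructor runs exactly once, from inside that function.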

Tested x86_64-pc-linux-gnu, applying to trunk.  Can someone from the
libstdc++ team clean up my libsupc++ change if it should be formatted
differently?

gcc/c-family/
* c-cppbuiltin.c (c_cpp_builtins): Define
__cpp_impl_destroying_delete.
gcc/cp/
* call.c (std_destroying_delete_t_p, destroying_delete_p): New.
(aligned_deallocation_fn_p, usual_deallocation_fn_p): Use
destroying_delete_p.
(build_op_delete_call): Handle destroying delete.
* decl2.c (coerce_delete_type): Handle destroying delete.
* init.c (build_delete): Don't call dtor with destroying delete.
* optimize.c (build_delete_destructor_body): Likewise.
libstdc++-v3/
* libsupc++/new (std::destroying_delete_t): New.
---
 gcc/cp/cp-tree.h  |  3 +-
 gcc/c-family/c-cppbuiltin.c   |  1 +
 gcc/cp/call.c | 45 +++
 gcc/cp/decl.c |  2 +-
 gcc/cp/decl2.c| 32 ++---
 gcc/cp/init.c |  8 +++-
 gcc/cp/optimize.c | 27 +++
 .../g++.dg/cpp2a/destroying-delete1.C | 41 +
 gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C   |  4 ++
 gcc/c-family/ChangeLog|  3 ++
 gcc/cp/ChangeLog  |  9 
 libstdc++-v3/ChangeLog|  4 ++
 libstdc++-v3/libsupc++/new| 12 +
 13 files changed, 174 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/destroying-delete1.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 9c4664c3aa7..c4d79c0cf7f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6127,6 +6127,7 @@ extern tree build_new_op  (location_t, enum tree_code,
 extern tree build_op_call  (tree, vec<tree, va_gc> **,
 tsubst_flags_t);
 extern bool aligned_allocation_fn_p(tree);
+extern tree destroying_delete_p(tree);
 extern bool usual_deallocation_fn_p(tree);
 extern tree build_op_delete_call   (enum tree_code, tree, tree,
 bool, tree, tree,
@@ -6456,7 +6457,7 @@ extern void cplus_decl_attributes (tree *, tree, int);
 extern void finish_anon_union  (tree);
 extern void cxx_post_compilation_parsing_cleanups (void);
 extern tree coerce_new_type(tree, location_t);
-extern tree coerce_delete_type (tree, location_t);
+extern void coerce_delete_type (tree, location_t);
 extern void comdat_linkage (tree);
 extern void determine_visibility   (tree);
 extern void constrain_class_visibility (tree);
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 8dd62158b62..7daa3e33990 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -980,6 +980,7 @@ c_cpp_builtins (cpp_reader *pfile)
  /* Set feature test macros for C++2a.  */
  cpp_define (pfile, "__cpp_conditional_explicit=201806");
  cpp_define (pfile, "__cpp_nontype_template_parameter_class=201806");
+ cpp_define (pfile, "__cpp_impl_destroying_delete=201806");
}
   if (flag_concepts)
cpp_define (pfile, "__cpp_concepts=201507");
diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 6f401567c2e..b668e031d3c 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -6190,6 +6190,31 @@ aligned_allocation_fn_p (tree t)
   return (a && same_type_p (TREE_VALUE (a), align_type_node));
 }
 
+/* True if T is std::destroying_delete_t.  */
+
+static bool
+std_destroying_delete_t_p (tree t)
+{
+  return (TYPE_CONTEXT (t) == std_node
+ && id_equal (TYPE_IDENTIFIER (t), "destroying_delete_t"));
+}
+
+/* A deallocation function with at least two parameters whose second parameter
+   type is of type std::destroying_delete_t is a destroying operator delete. A
+   destroying operator delete shall be a class member function named operator
+   delete. [ Note: Array deletion cannot use a destroying operator
+   delete. --end note ] */
+
+tree
+destroying_delete_p (tree t)
+{
+  tree a = TYPE_ARG_TYPES (TREE_TYPE (t));
+  if (!a || !TREE_CHAIN (a))
+return NULL_TREE;
+  tree type = TREE_VALUE (TREE_CHAIN (a));
+  return std_destroying_delete_t_p (type) ? type : NULL_TREE;
+}
+
 /* Returns true iff T, an element of an OVERLOAD chain, is a usual deallocation
function (3.7.4.2 [basic.stc.dyna

[C++ PATCH] Implement P0780R2, pack expansion in lambda init-capture.

2018-11-12 Thread Jason Merrill
Mostly this was straightforward; the tricky bit was finding, in the
instantiation, the set of capture proxies built when instantiating the
init-capture.  The comment in lookup_init_capture_pack goes into detail.
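
As a user-level illustration of the feature (not part of the patch), the sketch below captures a whole parameter pack through an init-capture and then expands the captured pack inside the lambda body. The helper name `make_summer` is hypothetical, and the fold expression additionally assumes C++17 or later:

```cpp
#include <cassert>
#include <utility>

// Captures a copy of every argument under the pack name xs and returns a
// closure that adds them up.
template <typename... Args>
auto
make_summer (Args... args)
{
  return [...xs = std::move (args)] () { return (0 + ... + xs); };
}
```

Each element of `xs` is a separate capture in the closure, which is why the instantiation needs to find one capture proxy per pack element.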

Tested x86_64-pc-linux-gnu, applying to trunk.

* parser.c (cp_parser_lambda_introducer): Parse pack init-capture.
* pt.c (tsubst_pack_expansion): Handle init-capture packs.
(lookup_init_capture_pack): New.
(tsubst_expr) [DECL_EXPR]: Use it.
(tsubst_lambda_expr): Remember field pack expansions for
init-captures.
---
 gcc/cp/parser.c   | 13 +++
 gcc/cp/pt.c   | 93 ---
 .../g++.dg/cpp2a/lambda-pack-init1.C  | 17 
 gcc/cp/ChangeLog  | 10 ++
 4 files changed, 122 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-pack-init1.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 465ab8fdbae..0428f6dda90 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -10395,6 +10395,17 @@ cp_parser_lambda_introducer (cp_parser* parser, tree lambda_expr)
  continue;
}
 
+  bool init_pack_expansion = false;
+  if (cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS))
+   {
+ location_t loc = cp_lexer_peek_token (parser->lexer)->location;
+ if (cxx_dialect < cxx2a)
+   pedwarn (loc, 0, "pack init-capture only available with "
+"-std=c++2a or -std=gnu++2a");
+ cp_lexer_consume_token (parser->lexer);
+ init_pack_expansion = true;
+   }
+
   /* Remember whether we want to capture as a reference or not.  */
   if (cp_lexer_next_token_is (parser->lexer, CPP_AND))
{
@@ -10438,6 +10449,8 @@ cp_parser_lambda_introducer (cp_parser* parser, tree lambda_expr)
  error ("empty initializer for lambda init-capture");
  capture_init_expr = error_mark_node;
}
+ if (init_pack_expansion)
+   capture_init_expr = make_pack_expansion (capture_init_expr);
}
   else
{
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4cb8238ba12..0c33c8e1527 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -12151,7 +12151,7 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain,
   where it isn't expected).  */
unsubstituted_fn_pack = true;
}
-  else if (is_normal_capture_proxy (parm_pack))
+  else if (is_capture_proxy (parm_pack))
{
  arg_pack = retrieve_local_specialization (parm_pack);
  if (argument_pack_element_is_expansion_p (arg_pack, 0))
@@ -16769,6 +16769,55 @@ tsubst_decomp_names (tree decl, tree pattern_decl, tree args,
   return decl;
 }
 
+/* Return the proper local_specialization for init-capture pack DECL.  */
+
+static tree
+lookup_init_capture_pack (tree decl)
+{
+  /* We handle normal pack captures by forwarding to the specialization of the
+ captured parameter.  We can't do that for pack init-captures; we need them
+ to have their own local_specialization.  We created the individual
+ VAR_DECLs (if any) under build_capture_proxy, and we need to collect them
+ when we process the DECL_EXPR for the pack init-capture in the template.
+ So, how do we find them?  We don't know the capture proxy pack when
+ building the individual resulting proxies, and we don't know the
+ individual proxies when instantiating the pack.  What we have in common is
+ the FIELD_DECL.
+
+ So...when we instantiate the FIELD_DECL, we stick the result in
+ local_specializations.  Then at the DECL_EXPR we look up that result, see
+ how many elements it has, synthesize the names, and look them up.  */
+
+  tree cname = DECL_NAME (decl);
+  tree val = DECL_VALUE_EXPR (decl);
+  tree field = TREE_OPERAND (val, 1);
+  gcc_assert (TREE_CODE (field) == FIELD_DECL);
+  tree fpack = retrieve_local_specialization (field);
+  if (fpack == error_mark_node)
+return error_mark_node;
+
+  int len = 1;
+  tree vec = NULL_TREE;
+  tree r = NULL_TREE;
+  if (TREE_CODE (fpack) == TREE_VEC)
+{
+  len = TREE_VEC_LENGTH (fpack);
+  vec = make_tree_vec (len);
+  r = make_node (NONTYPE_ARGUMENT_PACK);
+  SET_ARGUMENT_PACK_ARGS (r, vec);
+}
+  for (int i = 0; i < len; ++i)
+{
+  tree ename = vec ? make_ith_pack_parameter_name (cname, i) : cname;
+  tree elt = lookup_name_real (ename, 0, 0, true, 0, LOOKUP_NORMAL);
+  if (vec)
+   TREE_VEC_ELT (vec, i) = elt;
+  else
+   r = elt;
+}
+  return r;
+}
+
 /* Like tsubst_copy for expressions, etc. but also does semantic
processing.  */
 
@@ -16854,18 +16903,21 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
/* We're in tsubst_lambda_expr, we've already inserted a new
   capture proxy, so look it up and register it.  */
tree ins

[C++ PATCH] * cp-tree.h (struct cp_evaluated): New.

2018-11-12 Thread Jason Merrill
This patch simplifies the saving/clearing/restoring of
cp_unevaluated_operand and c_inhibit_evaluation_warnings in the presence of
mid-block returns.  This cleanup was motivated by the forthcoming
unevaluated lambdas patch.
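
The pattern can be sketched outside the compiler as follows. This is a toy analogue with hypothetical `toy_` names standing in for the two global counters, not the code from the patch. The guard saves and clears both counters in its constructor and restores them in its destructor, so an early return needs no manual cleanup:

```cpp
#include <cassert>

// Hypothetical stand-ins for the two global counters named above.
static int toy_unevaluated_operand = 0;
static int toy_inhibit_warnings = 0;

// Saves both counters, clears them for the current scope, and restores
// them on scope exit, whichever return path is taken.
struct toy_evaluated
{
  int uneval;
  int inhibit;
  toy_evaluated ()
    : uneval (toy_unevaluated_operand), inhibit (toy_inhibit_warnings)
  { toy_unevaluated_operand = toy_inhibit_warnings = 0; }
  ~toy_evaluated ()
  {
    toy_unevaluated_operand = uneval;
    toy_inhibit_warnings = inhibit;
  }
  toy_evaluated (const toy_evaluated &) = delete;
  toy_evaluated &operator= (const toy_evaluated &) = delete;
};

// Returns what the counters look like inside the guarded region.
inline int
counters_inside_guard (bool early_exit)
{
  toy_evaluated ev;
  if (early_exit)
    return toy_unevaluated_operand;  // mid-block return; destructor still runs
  return toy_unevaluated_operand + toy_inhibit_warnings;
}
```

Compared with explicit save and restore statements, the guard cannot be skipped by a return added later in the middle of the block.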

Tested x86_64-pc-linux-gnu, applying to trunk.

* init.c (get_nsdmi): Use it.
* parser.c (cp_parser_enclosed_template_argument_list): Use it.
* pt.c (coerce_template_parms, tsubst_aggr_type): Use it.
---
 gcc/cp/cp-tree.h | 15 +++
 gcc/cp/init.c|  4 +---
 gcc/cp/parser.c  |  9 +
 gcc/cp/pt.c  | 20 +++-
 gcc/cp/ChangeLog |  7 +++
 5 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 6ca138d4ce6..9c4664c3aa7 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5231,6 +5231,21 @@ struct cp_unevaluated
   ~cp_unevaluated ();
 };
 
+/* The reverse: an RAII class used for nested contexts that are evaluated even
+   if the enclosing context is not.  */
+
+struct cp_evaluated
+{
+  int uneval;
+  int inhibit;
+  cp_evaluated ()
+: uneval(cp_unevaluated_operand), inhibit(c_inhibit_evaluation_warnings)
+  { cp_unevaluated_operand = c_inhibit_evaluation_warnings = 0; }
+  ~cp_evaluated ()
+  { cp_unevaluated_operand = uneval;
+c_inhibit_evaluation_warnings = inhibit; }
+};
+
 /* in pt.c  */
 
 /* These values are used for the `STRICT' parameter to type_unification and
diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 15046b4257b..a17e1608c80 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -569,8 +569,7 @@ get_nsdmi (tree member, bool in_ctor, tsubst_flags_t complain)
}
   else
{
- int un = cp_unevaluated_operand;
- cp_unevaluated_operand = 0;
+ cp_evaluated ev;
 
  location_t sloc = input_location;
  input_location = expr_loc;
@@ -616,7 +615,6 @@ get_nsdmi (tree member, bool in_ctor, tsubst_flags_t complain)
}
 
  input_location = sloc;
- cp_unevaluated_operand = un;
}
 }
   else
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 1766ef418a2..465ab8fdbae 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -27861,8 +27861,6 @@ cp_parser_enclosed_template_argument_list (cp_parser* parser)
   tree saved_qualifying_scope;
   tree saved_object_scope;
   bool saved_greater_than_is_operator_p;
-  int saved_unevaluated_operand;
-  int saved_inhibit_evaluation_warnings;
 
   /* [temp.names]
 
@@ -27879,10 +27877,7 @@ cp_parser_enclosed_template_argument_list (cp_parser* parser)
   saved_object_scope = parser->object_scope;
   /* We need to evaluate the template arguments, even though this
  template-id may be nested within a "sizeof".  */
-  saved_unevaluated_operand = cp_unevaluated_operand;
-  cp_unevaluated_operand = 0;
-  saved_inhibit_evaluation_warnings = c_inhibit_evaluation_warnings;
-  c_inhibit_evaluation_warnings = 0;
+  cp_evaluated ev;
   /* Parse the template-argument-list itself.  */
   if (cp_lexer_next_token_is (parser->lexer, CPP_GREATER)
   || cp_lexer_next_token_is (parser->lexer, CPP_RSHIFT))
@@ -27951,8 +27946,6 @@ cp_parser_enclosed_template_argument_list (cp_parser* parser)
   parser->scope = saved_scope;
   parser->qualifying_scope = saved_qualifying_scope;
   parser->object_scope = saved_object_scope;
-  cp_unevaluated_operand = saved_unevaluated_operand;
-  c_inhibit_evaluation_warnings = saved_inhibit_evaluation_warnings;
 
   return arguments;
 }
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index d4ae76a89f4..4cb8238ba12 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8256,8 +8256,6 @@ coerce_template_parms (tree parms,
   tree inner_args;
   tree new_args;
   tree new_inner_args;
-  int saved_unevaluated_operand;
-  int saved_inhibit_evaluation_warnings;
 
   /* When used as a boolean value, indicates whether this is a
  variadic template parameter list. Since it's an int, we can also
@@ -8374,10 +8372,8 @@ coerce_template_parms (tree parms,
 
   /* We need to evaluate the template arguments, even though this
  template-id may be nested within a "sizeof".  */
-  saved_unevaluated_operand = cp_unevaluated_operand;
-  cp_unevaluated_operand = 0;
-  saved_inhibit_evaluation_warnings = c_inhibit_evaluation_warnings;
-  c_inhibit_evaluation_warnings = 0;
+  cp_evaluated ev;
+
   new_inner_args = make_tree_vec (nparms);
   new_args = add_outermost_template_args (args, new_inner_args);
   int pack_adjust = 0;
@@ -8517,8 +8513,6 @@ coerce_template_parms (tree parms,
lost++;
   TREE_VEC_ELT (new_inner_args, arg_idx - pack_adjust) = arg;
 }
-  cp_unevaluated_operand = saved_unevaluated_operand;
-  c_inhibit_evaluation_warnings = saved_inhibit_evaluation_warnings;
 
   if (missing || arg_idx < nargs - variadic_args_p)
 {
@@ -12655,14 +12649,9 @@ tsubst_aggr_type (tree t,
  tree argvec;
  tree context;
  tree r;
- int saved_unevaluated_operand;
- int saved_inhibit_evaluation_w

[C++ PATCH] Change __cpp_explicit_bool to __cpp_conditional_explicit.

2018-11-12 Thread Jason Merrill
People objected to the old macro name as unclear, so it was changed at the San
Diego meeting.

Tested x86_64-pc-linux-gnu, applying to trunk.

* c-cppbuiltin.c (c_cpp_builtins): Change __cpp_explicit_bool to
__cpp_conditional_explicit.
---
 gcc/c-family/c-cppbuiltin.c | 2 +-
 gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C | 4 
 gcc/c-family/ChangeLog  | 5 +
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index e7f4c669056..8dd62158b62 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -978,7 +978,7 @@ c_cpp_builtins (cpp_reader *pfile)
   if (cxx_dialect > cxx17)
{
  /* Set feature test macros for C++2a.  */
- cpp_define (pfile, "__cpp_explicit_bool=201806");
+ cpp_define (pfile, "__cpp_conditional_explicit=201806");
  cpp_define (pfile, "__cpp_nontype_template_parameter_class=201806");
}
   if (flag_concepts)
diff --git a/gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C b/gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C
index faed6697382..4289bfcfa52 100644
--- a/gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C
+++ b/gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C
@@ -418,6 +418,10 @@
 
 // C++20 features
 
+#if __cpp_conditional_explicit != 201806
+# error "__cpp_conditional_explicit != 201806"
+#endif
+
 #if __cpp_nontype_template_parameter_class != 201806
 # error "__cpp_nontype_template_parameter_class != 201806"
 #endif
diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
index 6ce25c97783..42bb5ca450b 100644
--- a/gcc/c-family/ChangeLog
+++ b/gcc/c-family/ChangeLog
@@ -1,3 +1,8 @@
+2018-11-12  Jason Merrill  
+
+   * c-cppbuiltin.c (c_cpp_builtins): Change __cpp_explicit_bool to
+   __cpp_conditional_explicit.
+
 2018-11-09  Martin Sebor  
 
PR middle-end/81824

base-commit: f6b2026a461fa351cd7b97fd1865696ac0903307
-- 
2.17.2



[RS6000] secondary_reload and find_replacement

2018-11-12 Thread Alan Modra
This patch removes a call only necessary when using reload.  It also
corrects a PRE_DEC address offset.
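
The offset fix amounts to choosing the sign of the adjustment from the address code. A toy sketch of that rule, with hypothetical `toy_` names and plain integers standing in for RTL, might read:

```cpp
#include <cassert>

// Toy stand-ins for the two pre-modify address codes.
enum toy_addr_code { TOY_PRE_INC, TOY_PRE_DEC };

// Signed adjustment applied to the base register before the access:
// +size for pre-increment, -size for pre-decrement.
inline long
toy_pre_modify_delta (toy_addr_code code, long mode_size)
{
  long delta = mode_size;
  if (code == TOY_PRE_DEC)
    delta = -delta;
  return delta;
}
```

Using the unnegated size for both codes would move the base register in the wrong direction for a pre-decrement address.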

Segher preapproved the find_replacement change a few days ago, and
the other change is an obvious bug fix.  Committed rev 266049.

* config/rs6000/rs6000.c (rs6000_secondary_reload_inner): Negate
offset for PRE_DEC.
(rs6000_secondary_reload_gpr): Don't call find_replacement.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9c3e0ea3529..b2b0d4bad3b 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -19738,7 +19738,10 @@ rs6000_secondary_reload_inner (rtx reg, rtx mem, rtx scratch, bool store_p)
 
   if ((addr_mask & RELOAD_REG_PRE_INCDEC) == 0)
{
- emit_insn (gen_add2_insn (op_reg, GEN_INT (GET_MODE_SIZE (mode))));
+ int delta = GET_MODE_SIZE (mode);
+ if (GET_CODE (addr) == PRE_DEC)
+   delta = -delta;
+ emit_insn (gen_add2_insn (op_reg, GEN_INT (delta)));
  new_addr = op_reg;
}
   break;
@@ -19938,17 +19941,6 @@ rs6000_secondary_reload_gpr (rtx reg, rtx mem, rtx scratch, bool store_p)
  && GET_CODE (XEXP (addr, 1)) == PLUS
  && XEXP (XEXP (addr, 1), 0) == XEXP (addr, 0));
   scratch_or_premodify = XEXP (addr, 0);
-  if (!HARD_REGISTER_P (scratch_or_premodify))
-   /* If we have a pseudo here then reload will have arranged
-  to have it replaced, but only in the original insn.
-  Use the replacement here too.  */
-   scratch_or_premodify = find_replacement (&XEXP (addr, 0));
-
-  /* RTL emitted by rs6000_secondary_reload_gpr uses RTL
-expressions from the original insn, without unsharing them.
-Any RTL that points into the original insn will of course
-have register replacements applied.  That is why we don't
-need to look for replacements under the PLUS.  */
   addr = XEXP (addr, 1);
 }
   gcc_assert (GET_CODE (addr) == PLUS || GET_CODE (addr) == LO_SUM);

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Remove unnecessary rtx_equal_p

2018-11-12 Thread Alan Modra
REGs are unique.  This patch recognizes that fact, speeding up rs6000
gcc infinitesimally.  Bootstrapped etc. powerpc64le-linux.  OK?
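
The reasoning can be illustrated with a toy model, which is an assumption for exposition and not GCC's RTL machinery: if a table hands out exactly one node per register number, two nodes describe the same register precisely when their addresses are equal, so a structural comparison is unnecessary. The names `toy_reg` and `toy_get_reg` are hypothetical:

```cpp
#include <cassert>
#include <map>

// Toy register node.
struct toy_reg
{
  unsigned regno;
};

// Hands out exactly one node per register number.  std::map never moves
// its elements, so the returned pointers stay valid and unique.
inline const toy_reg *
toy_get_reg (unsigned regno)
{
  static std::map<unsigned, toy_reg> table;
  auto it = table.find (regno);
  if (it == table.end ())
    it = table.emplace (regno, toy_reg{regno}).first;
  return &it->second;
}
```

Under that uniqueness assumption, `toy_get_reg (3) == toy_get_reg (3)` is a complete equality test.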

* gcc/config/rs6000/rs6000.c (rs6000_legitimate_address_p): Replace
rtx_equal_p call for known REGs with pointer comparison.
(rs6000_secondary_reload_memory): Likewise.
(rs6000_secondary_reload_inner): Likewise.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d3355710d91..9c3e0ea3529 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -9227,7 +9227,7 @@ rs6000_legitimate_address_p (machine_mode mode, rtx x, bool reg_ok_strict)
  reg_ok_strict, false)
  || (!avoiding_indexed_address_p (mode)
  && legitimate_indexed_address_p (XEXP (x, 1), reg_ok_strict)))
-  && rtx_equal_p (XEXP (XEXP (x, 1), 0), XEXP (x, 0)))
+  && XEXP (XEXP (x, 1), 0) == XEXP (x, 0))
 return 1;
   if (reg_offset_p && !quad_offset_p
   && legitimate_lo_sum_address_p (mode, x, reg_ok_strict))
@@ -19011,7 +19011,7 @@ rs6000_secondary_reload_memory (rtx addr,
   plus_arg1 = XEXP (addr, 1);
   if (!base_reg_operand (reg, GET_MODE (reg))
  || GET_CODE (plus_arg1) != PLUS
- || !rtx_equal_p (reg, XEXP (plus_arg1, 0)))
+ || XEXP (plus_arg1, 0) != reg)
{
  fail_msg = "bad PRE_MODIFY";
  extra_cost = -1;
@@ -19748,7 +19748,7 @@ rs6000_secondary_reload_inner (rtx reg, rtx mem, rtx scratch, bool store_p)
   op1 = XEXP (addr, 1);
   if (!base_reg_operand (op0, Pmode)
  || GET_CODE (op1) != PLUS
- || !rtx_equal_p (op0, XEXP (op1, 0)))
+ || XEXP (op1, 0) != op0)
rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p);
 
   if ((addr_mask & RELOAD_REG_PRE_MODIFY) == 0)

-- 
Alan Modra
Australia Development Lab, IBM


[doc, committed] clarify documentation of cache parameters

2018-11-12 Thread Sandra Loosemore

Continuing to whack at trivial doc issues in Bugzilla.

This patch is for PR 59634, which is to clarify that some cache-related 
parameters control attributes of the data cache and not the instruction 
cache.


-Sandra
2018-11-13  Sandra Loosemore  

	PR middle-end/59634

	gcc/
	* doc/invoke.texi (Optimize Options): Clarify that the
	l1-cache-line-size, l1-cache-size, and l2-cache-size parameters
	apply to data cache size.
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 266044)
+++ gcc/doc/invoke.texi	(working copy)
@@ -11164,13 +11164,13 @@ streams being prefetched (see @option{si
 Maximum number of prefetches that can run at the same time.
 
 @item l1-cache-line-size
-The size of cache line in L1 cache, in bytes.
+The size of cache line in L1 data cache, in bytes.
 
 @item l1-cache-size
-The size of L1 cache, in kilobytes.
+The size of L1 data cache, in kilobytes.
 
 @item l2-cache-size
-The size of L2 cache, in kilobytes.
+The size of L2 data cache, in kilobytes.
 
 @item prefetch-dynamic-strides
 Whether the loop array prefetch pass should issue software prefetch hints


[RS6000] Comment fix

2018-11-12 Thread Alan Modra
Committed as obvious, rev 266047.

* config/rs6000/rs6000.c (rs6000_emit_prologue): Comment fix.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1c5c7861552..d3355710d91 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -26997,7 +26997,7 @@ rs6000_emit_prologue (void)
 }
 
   /* If we need to save CR, put it into r12 or r11.  Choose r12 except when
- r12 will be needed by out-of-line gpr restore.  */
+ r12 will be needed by out-of-line gpr save.  */
   cr_save_regno = ((DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)
   && !(strategy & (SAVE_INLINE_GPRS
| SAVE_NOINLINE_GPRS_SAVES_LR))

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Don't put large integer constants in TOC for -mcmodel=medium

2018-11-12 Thread Alan Modra
For -mcmodel=medium we can use toc-relative addressing to access
constants placed in read-only data, which is better since they can be
merged when in .rodata.cst8.

Bootstrapped etc. powerpc64le-linux.  OK?

* config/rs6000/linux64.h (ASM_OUTPUT_SPECIAL_POOL_ENTRY_P): Exclude
integer constants when -mcmodel=medium.

diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
index e6b4fd22d73..0d8e164a598 100644
--- a/gcc/config/rs6000/linux64.h
+++ b/gcc/config/rs6000/linux64.h
@@ -582,8 +582,10 @@ extern int dot_symbols;
we also do this for floating-point constants.  We actually can only
do this if the FP formats of the target and host machines are the
same, but we can't check that since not every file that uses
-   the macros includes real.h.  We also do this when we can write the
-   entry into the TOC and the entry is not larger than a TOC entry.  */
+   the macros includes real.h.  We also do this when we can write an
+   integer into the TOC and the entry is not larger than a TOC entry,
+   but not for -mcmodel=medium where we'll use a toc-relative load for
+   constants outside the TOC.  */
 
 #undef  ASM_OUTPUT_SPECIAL_POOL_ENTRY_P
 #define ASM_OUTPUT_SPECIAL_POOL_ENTRY_P(X, MODE)   \
@@ -593,6 +595,7 @@ extern int dot_symbols;
   && GET_CODE (XEXP (XEXP (X, 0), 0)) == SYMBOL_REF)   \
|| GET_CODE (X) == LABEL_REF\
|| (GET_CODE (X) == CONST_INT   \
+  && TARGET_CMODEL != CMODEL_MEDIUM\
   && GET_MODE_BITSIZE (MODE) <= GET_MODE_BITSIZE (Pmode))  \
|| (GET_CODE (X) == CONST_DOUBLE
\
   && ((TARGET_64BIT\

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Rotate testcase

2018-11-12 Thread Alan Modra
The testcase exercises one of the rotate patterns.  Segher okayed the
testcase a long time ago, and the comments are obvious.
Committed rev 266046.
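
For readers unfamiliar with the idiom the testcase relies on, here is a minimal C sketch (not part of the patch) of a 64-bit rotate-left-by-one followed by a mask.  The name rotl1_and_mask and the mask parameter are illustrative assumptions; the constant used in the real testcase is truncated in this archive, so none is reproduced here.

```c
#include <stdint.h>

/* Rotate X left by one bit, then keep only the bits selected by MASK.
   Illustrative helper; not taken from the GCC testsuite.  */
uint64_t
rotl1_and_mask (uint64_t x, uint64_t mask)
{
  /* (x << 1) moves bits 0..62 up by one; (x >> 63) brings the old
     bit 63 round to bit 0.  */
  uint64_t rotated = (x << 1) | (x >> 63);
  return rotated & mask;
}
```

With an all-ones mask, rotl1_and_mask (0x8000000000000001, ~0) returns 3: the top bit wraps round to bit 0 and the old bit 0 moves to bit 1.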

gcc/
* config/rs6000/predicates.md (logical_const_operand),
(logical_operand): Correct comment.
gcc/testsuite/
* gcc.target/powerpc/rotmask.c: New.

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 805d92ea1f1..1af01935b5e 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -922,7 +922,7 @@ (define_predicate "non_add_cint_operand"
&& !satisfies_constraint_L (op)")))
 
 ;; Return 1 if the operand is a constant that can be used as the operand
-;; of an OR or XOR.
+;; of an AND, OR or XOR.
 (define_predicate "logical_const_operand"
   (match_code "const_int")
 {
@@ -935,7 +935,7 @@ (define_predicate "logical_const_operand"
 })
 
 ;; Return 1 if the operand is a non-special register or a constant that
-;; can be used as the operand of an OR or XOR.
+;; can be used as the operand of an AND, OR or XOR.
 (define_predicate "logical_operand"
   (ior (match_operand 0 "gpc_reg_operand")
(match_operand 0 "logical_const_operand")))
diff --git a/gcc/testsuite/gcc.target/powerpc/rotmask.c 
b/gcc/testsuite/gcc.target/powerpc/rotmask.c
new file mode 100644
index 00000000000..4d1b9174921
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/rotmask.c
@@ -0,0 +1,8 @@
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "rotldi" } } */
+
+unsigned long f (unsigned long x)
+{
+  return ((x << 1) | (x >> 63)) & 0x;
+}

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Hide insn not needing to be public

2018-11-12 Thread Alan Modra
Another obvious patch.  Committed rev 266045.

* config/rs6000/rs6000.md (addsi3_high): Prefix with '*'.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 02e6e084785..02f194c7d33 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -1691,7 +1691,7 @@ (define_insn "*add3"
addis %0,%1,%v2"
   [(set_attr "type" "add")])
 
-(define_insn "addsi3_high"
+(define_insn "*addsi3_high"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=b")
 (plus:SI (match_operand:SI 1 "gpc_reg_operand" "b")
  (high:SI (match_operand 2 "" ""]

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Ignore "c", "l" and "h" for register preference

2018-11-12 Thread Alan Modra
This catches a few places where move insn patterns don't slightly
disparage CTR, LR and VRSAVE regs.  Also fixes the doc for the rs6000
h constraint, and removes an r->cl alternative covered by r->h.

Segher okayed a patch adding "*" like this patch a long time ago.
Somehow I never committed it.  This one does a few more things as
well, but I think it's sufficiently obvious to commit as such.
Bootstrapped etc. powerpc64le-linux and committed rev 266044.

* gcc/doc/md.texi (Machine Constraints): Correct rs6000 h constraint
description.
* config/rs6000/rs6000.md (movsi_internal1): Delete MT%0 case
covered by alternative.
(movcc_internal1): Ignore h for register preference.
(mov_hardfloat64): Likewise.
(mov_softfloat): Ignore c, l, h for register preference.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 16f37dafbb9..02e6e084785 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6842,21 +6842,21 @@ (define_insn "movsi_low"
 ;; STW  STFIWX   STXSIWX  LI   LIS
 ;; #XXLORXXSPLTIB 0   XXSPLTIB -1  VSPLTISW
 ;; XXLXOR 0 XXLORC -1P9 const MTVSRWZ  MFVSRWZ
-;; MF%1 MT%0 MT%0 NOP
+;; MF%1 MT%0 NOP
 (define_insn "*movsi_internal1"
   [(set (match_operand:SI 0 "nonimmediate_operand"
"=r, r,   r,   ?*wI,?*wH,
 m,  ?Z,  ?Z,  r,   r,
 r,  ?*wIwH,  ?*wJwK,  ?*wJwK,  ?*wu,
 ?*wJwK, ?*wH,?*wK,?*wIwH,  ?r,
-r,  *c*l,*h,  *h")
+r,  *h,  *h")
 
(match_operand:SI 1 "input_operand"
"r,  U,   m,   Z,   Z,
 r,  wI,  wH,  I,   L,
 n,  wIwH,O,   wM,  wB,
 O,  wM,  wS,  r,   wIwH,
-*h, r,   r,   0"))]
+*h, r,   0"))]
 
   "gpc_reg_operand (operands[0], SImode)
|| gpc_reg_operand (operands[1], SImode)"
@@ -6883,21 +6883,20 @@ (define_insn "*movsi_internal1"
mfvsrwz %0,%x1
mf%1 %0
mt%0 %1
-   mt%0 %1
nop"
   [(set_attr "type"
"*,  *,   load,fpload,  fpload,
 store,  fpstore, fpstore, *,   *,
 *,  veclogical,  vecsimple,   vecsimple,   vecsimple,
 veclogical, veclogical,  vecsimple,   mffgpr,  mftgpr,
-*,   *,   *,   *")
+*,  *,   *")
 
(set_attr "length"
"4,  4,   4,   4,   4,
 4,  4,   4,   4,   4,
 8,  4,   4,   4,   4,
 4,  4,   8,   4,   4,
-4,  4,   4,   4")])
+4,  4,   4")])
 
 ;; Like movsi, but adjust a SF value to be used in a SI context, i.e.
 ;; (set (reg:SI ...) (subreg:SI (reg:SF ...) 0))
@@ -7175,9 +7174,9 @@ (define_expand "movcc"
 
 (define_insn "*movcc_internal1"
   [(set (match_operand:CC 0 "nonimmediate_operand"
-   "=y,x,?y,y,r,r,r,r,r,*c*l,r,m")
+   "=y,x,?y,y,r,r,r,r, r,*c*l,r,m")
(match_operand:CC 1 "general_operand"
-   " y,r, r,O,x,y,r,I,h,   r,m,r"))]
+   " y,r, r,O,x,y,r,I,*h,   r,m,r"))]
   "register_operand (operands[0], CCmode)
|| register_operand (operands[1], CCmode)"
   "@
@@ -7329,11 +7328,11 @@ (define_insn "movsd_hardfloat"
 ;; LIS  G-const.   F/n-const  NOP
 (define_insn "*mov_softfloat"
   [(set (match_operand:FMOVE32 0 "nonimmediate_operand"
-   "=r, cl,r, r, m, r,
+   "=r, *c*l,  r, r, m, r,
   r, r, r, *h")
 
(match_operand:FMOVE32 1 "input_operand"
-"r, r, h, m, r, I,
+"r, r, *h,m, r, I,
   L, G, Fn,0"))]
 
   "(gpc_reg_operand (operands[0], mode)
@@ -7600,7 +7599,7 @@ (define_insn "*mov_hardfloat64"
(match_operand:FMOVE64 1 "input_operand"
 "d,   m,  d,  wY, ,
  Z,   ,   ,  ,  ,
- r,   YZ, r,  r,  h,
+ r,   YZ, r,  r,  *h,

[doc, committed] make #pragma once documentation easier to find

2018-11-12 Thread Sandra Loosemore
This patch is for PR 47823, which was about the documentation for 
"#pragma once" being buried in a strange location in the CPP manual. 
I've put the primary documentation for it with the other preprocessor 
pragmas in the CPP manual, and added a blurb to the pragmas chapter in 
the GCC manual explaining that preprocessor pragmas are documented in 
the CPP manual.
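
As a reminder of the portable alternative that the relocated text compares against, here is a minimal sketch of an #ifndef guard.  The header body is pasted twice into one translation unit to simulate a double inclusion; DEMO_POINT_H, struct demo_point and demo_point_sum are hypothetical names used only for illustration.

```c
/* First "inclusion" of the hypothetical header demo_point.h.  */
#ifndef DEMO_POINT_H
#define DEMO_POINT_H
struct demo_point { int x; int y; };
#endif /* DEMO_POINT_H */

/* Second "inclusion": the guard macro is already defined, so the
   preprocessor skips the body and the struct is not redefined.  */
#ifndef DEMO_POINT_H
#define DEMO_POINT_H
struct demo_point { int x; int y; };
#endif /* DEMO_POINT_H */

/* Uses the single definition that survived preprocessing.  */
int
demo_point_sum (struct demo_point p)
{
  return p.x + p.y;
}
```

With #pragma once, the header would instead begin with that single directive and need no guard macro, at the cost of portability to preprocessors that do not recognize it.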


-Sandra
2018-11-12  Sandra Loosemore  

	PR preprocessor/47823

	gcc/
	* doc/cpp.texi (Alternatives to Wrapper #ifndef): Move #pragma once
	documentation to...
	(Pragmas): ...here.  
	* doc/extend.texi (Pragmas): Note additional pragmas documented
	in the CPP manual.
Index: gcc/doc/cpp.texi
===
--- gcc/doc/cpp.texi	(revision 266042)
+++ gcc/doc/cpp.texi	(working copy)
@@ -958,10 +958,7 @@ prevent the file from ever being read ag
 @samp{#import} and @samp{#include} to refer to the same header file.
 
 Another way to prevent a header file from being included more than once
-is with the @samp{#pragma once} directive.  If @samp{#pragma once} is
-seen when scanning a header file, that file will never be read again, no
-matter what.
-
+is with the @samp{#pragma once} directive (@pxref{Pragmas}).  
 @samp{#pragma once} does not have the problems that @samp{#import} does,
 but it is not recognized by all preprocessors, so you cannot rely on it
 in a portable program.
@@ -3550,12 +3547,14 @@ idea of the directory containing the cur
 @node Pragmas
 @chapter Pragmas
 
+@cindex pragma directive
+
 The @samp{#pragma} directive is the method specified by the C standard
 for providing additional information to the compiler, beyond what is
 conveyed in the language itself.  The forms of this directive
 (commonly known as @dfn{pragmas}) specified by C standard are prefixed with 
 @code{STDC}.  A C compiler is free to attach any meaning it likes to other 
-pragmas.  All GNU-defined, supported pragmas have been given a
+pragmas.  Most GNU-defined, supported pragmas have been given a
 @code{GCC} prefix.
 
 @cindex @code{_Pragma}
@@ -3658,6 +3657,12 @@ contained in the pragma must be a single
 the @samp{#warning} and @samp{#error} directives, these pragmas can be
 embedded in preprocessor macros using @samp{_Pragma}.
 
+@item #pragma once
+If @code{#pragma once} is seen when scanning a header file, that
+file will never be read again, no matter what.  It is a less-portable
+alternative to using @samp{#ifndef} to guard the contents of header files
+against multiple inclusions.
+
 @end ftable
 
 @node Other Directives
Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi	(revision 266042)
+++ gcc/doc/extend.texi	(working copy)
@@ -22064,6 +22064,10 @@ code originally written for other compil
 we do not recommend the use of pragmas; @xref{Function Attributes},
 for further explanation.
 
+The GNU C preprocessor recognizes several pragmas in addition to the
+compiler pragmas documented here.  Refer to the CPP manual for more
+information.
+
 @menu
 * AArch64 Pragmas::
 * ARM Pragmas::


Re: [RS6000] Don't pass -many to the assembler

2018-11-12 Thread Alan Modra
On Mon, Nov 12, 2018 at 04:34:34PM -0800, Mike Stump wrote:
> On Nov 12, 2018, at 3:13 PM, Alan Modra  wrote:
> > 
> > For people developing new code, it's the right way to go, and
> > especially so for people working on gcc itself.  For people just
> > wanting stuff to compile, not so much.  I fully expect a chorus of
> > *MORON* or worse to come from the likes of the linux kernel rabble.
> 
> So, if you just want to hear people whine...

I'm happy to hear other points of view.  Ignore my hyperbole.

> On darwin, we (darwin, as a platform decision) like all instructions 
> available from the assembler.

OK, fair enough.  Another option is to just disable -many when gcc is
in development, like we enable checking.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] MIPS: Add `-mfix-r5900' option for the R5900 short loop erratum

2018-11-12 Thread Maciej W. Rozycki
On Sun, 11 Nov 2018, Fredrik Noring wrote:

> ../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc:71:1:
>  note: in expansion of macro ‘COMPILER_CHECK’
>71 | COMPILER_CHECK(struct_kernel_stat_sz == sizeof(struct stat));
>   | ^~

 I guess `struct_kernel_stat_sz' and `sizeof(struct stat)' do not match.  
You may try making a preprocessed source with the same GCC invocation 
(possibly with `-dD' added if needed) to see how these items have been 
defined in your build environment.  This may reveal something obvious.
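
For context, a compile-time size check of this kind can be sketched in C11 as below.  This is an illustration only, not the libsanitizer macro: struct demo_stat, demo_stat_sz and demo_stat_size are hypothetical stand-ins for the real structure and constant.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel structure.  */
struct demo_stat
{
  uint64_t dev;
  uint32_t mode;
  uint32_t nlink;
};

/* Hypothetical hard-coded size, playing the role of
   struct_kernel_stat_sz in the failing check.  */
enum { demo_stat_sz = 16 };

/* Compilation stops here if the hard-coded size and the layout seen
   by the compiler disagree.  */
_Static_assert (demo_stat_sz == sizeof (struct demo_stat),
                "demo_stat_sz does not match sizeof (struct demo_stat)");

/* Report the size the compiler actually used.  */
size_t
demo_stat_size (void)
{
  return sizeof (struct demo_stat);
}
```

A mismatch therefore shows up as a build failure rather than a run-time bug, which is why comparing the two definitions in preprocessed source is a sensible first step.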

 Also unless you realise the problem is due to misconfiguration, please 
file it in GCC bugzilla as a GCC 9 regression (since as you say 8.2.0 
builds just fine in your environment).  We don't want things to break with 
new releases.

  Maciej


Re: [PATCH v3 3/3] PR preprocessor/83173: Enhance -fdump-internal-locations output

2018-11-12 Thread David Malcolm
On Mon, 2018-11-12 at 21:13 +0000, Mike Gulick wrote:
> On 11/2/18 5:04 PM, David Malcolm wrote:
> > On Thu, 2018-11-01 at 11:56 -0400, Mike Gulick wrote:
> > > 2017-10-31  Mike Gulick  
> > > 
> > >   PR preprocessor/83173
> > >   * gcc/input.c (dump_location_info): Dump reason and
> > >   included_from fields from line_map_ordinary struct.  Fix
> > >   indentation when location > 5 digits.
> > > 
> > >   * libcpp/location-example.txt: Update example
> > >   -fdump-internal-locations output.
> > > ---
> > >  gcc/input.c |  49 +-
> > >  libcpp/location-example.txt | 333 +-
> > > 
> > > --
> > >  2 files changed, 241 insertions(+), 141 deletions(-)
> > 
> > Sorry about the belated response.  This is a nice enhancement; some
> > nits below.
> > 
> > > diff --git a/gcc/input.c b/gcc/input.c
> > > index a94a010f353..f938a37f20e 100644
> > > --- a/gcc/input.c
> > > +++ b/gcc/input.c
> > > @@ -1075,6 +1075,17 @@ dump_labelled_location_range (FILE
> > > *stream,
> > >fprintf (stream, "\n");
> > >  }
> > >  
> > > +#define NUM_DIGITS(x) ((x) >= 1000000000 ? 10 : \
> > > +(x) >= 100000000 ? 9 : \
> > > +(x) >= 10000000 ? 8 : \
> > > +(x) >= 1000000 ? 7 : \
> > > +(x) >= 100000 ? 6 : \
> > > +(x) >= 10000 ? 5 : \
> > > +(x) >= 1000 ? 4 : \
> > > +(x) >= 100 ? 3 : \
> > > +(x) >= 10 ? 2 : \
> > > +1)
> > 
> > diagnostic-show-locus.c has a function "num_digits" (currently
> > static)
> > and, fwiw, a unit test.  It would be good to share the
> > implementation.
> > 
> 
> I initially tried to use this function by just adding "extern int
> num_digits(int);" into diagnostic-core.h, but that failed to link, so
> it seems
> like diagnostic-show-locus.c is not included in whatever library
> input.c gets
> linked with (I forget which library it was trying to link). 

Both input.o and diagnostic-show-locus.o are in OBJS-libcommon, so I'm
not sure what went wrong.

> Instead I moved
> num_digits and its unit test to diagnostic.c, and added the extern
> definition to
> diagnostic-core.h.  That builds and tests successfully.  Does that
> seem like a
> reasonable way to do this?

Thanks.  That sounds good (maybe put the decl in diagnostic.h rather
than diagnostic-core.h; the latter is used in lots of places, whereas
the former is more about implementation details).
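
As a rough illustration of what a shared helper could look like in place of the NUM_DIGITS macro, here is a minimal loop-based sketch.  It is not copied from diagnostic-show-locus.c; the name count_decimal_digits is hypothetical, and the function assumes a non-negative argument.

```c
/* Return the number of decimal digits needed to print VALUE.
   Assumes VALUE is non-negative; hypothetical helper, not the GCC
   implementation.  */
int
count_decimal_digits (int value)
{
  /* Zero still takes one character to print.  */
  if (value == 0)
    return 1;

  int digits = 0;
  while (value > 0)
    {
      digits++;
      value /= 10;
    }
  return digits;
}
```

Unlike a chain of comparisons, the loop has no per-width constants to keep in sync, and it is easy to unit-test at the boundaries (9, 10, 99999, 100000).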

> > >  /* Write a visualization of the locations in the line_table to
> > > STREAM.  */
> > >  
> > >  void
> > > @@ -1104,6 +1115,35 @@ dump_location_info (FILE *stream)
> > >  map->m_column_and_range_bits - map-
> > > >m_range_bits);
> > >fprintf (stream, "  range bits: %i\n",
> > >  map->m_range_bits);
> > > +  const char * reason;
> > > +  switch (map->reason) {
> > > +  case LC_ENTER:
> > > + reason = "LC_ENTER";
> > > + break;
> > > +  case LC_LEAVE:
> > > + reason = "LC_LEAVE";
> > > + break;
> > > +  case LC_RENAME:
> > > + reason = "LC_RENAME";
> > > + break;
> > > +  case LC_RENAME_VERBATIM:
> > > + reason = "LC_RENAME_VERBATIM";
> > > + break;
> > > +  case LC_ENTER_MACRO:
> > > + reason = "LC_RENAME_MACRO";
> > > + break;
> > > +  default:
> > > + reason = "Unknown";
> > > +  }
> > > +  fprintf (stream, "  reason: %d (%s)\n", map->reason,
> > > reason);
> > > +
> > > +  const line_map_ordinary *includer_map
> > > + = linemap_included_from_linemap (line_table, map);
> > > +  fprintf (stream, "  included from map: %d\n",
> > > +includer_map ? int (includer_map - line_table-
> > > > info_ordinary.maps)
> > > 
> > > +: -1);
> > 
> > I'm not a fan of "-1" here; it's a NULL pointer in the original
> > data.
> > How about "n/a" for that case?
> > 
> 
> That's a good suggestion.  Thanks.
> 
> > > +  fprintf (stream, "  included from location: %d\n",
> > > +linemap_included_from (map));
> > 
> > ...or merging it with this line, for something like:
> > 
> >   included from location: 127 (in ordinary map 2)
> > 
> > vs:
> > 
> >   included from location: 0
> > 
> > [...snip...]
> > 
> > Other than that, this is OK for trunk, assuming your contributor
> > paperwork is in place.
> > 
> > Dave
> > 
> 
> What is the preferred way to re-send this patch?  Should I re-send
> the entire
> patch series as v4, or just an updated version of this single patch?

The latter: just an updated version of the changed patch.  IIRC the
rest is all approved.

> 
> Also, I'm waiting on FSF for assignment paperwork.  I've re-pinged
> them after
> waiting a week.

Thanks.

> Thanks for the feedback and help.
> 
> -Mike


Re: [PATCH 2/9][GCC][AArch64][middle-end] Add rules to strip away unneeded type casts in expressions

2018-11-12 Thread Joseph Myers
On Sun, 11 Nov 2018, Tamar Christina wrote:

> This patch adds a match.pd rule for stripping away the type converts 
> when you're converting to a type that has twice the precision of the 
> current type in the same class, doing a simple math operation on it and 
> converting back to the smaller type.

What types exactly is this meant to apply to?  Floating-point?  Integer?  
Mixtures of those?  (I'm guessing not mixtures, because those would be 
something other than "convert" here.)

For integer types, it's not safe, in that if e.g. F is int and X is 
unsigned long long, you're changing from defined overflow to undefined 
overflow.
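
To make the integer case concrete, here is a minimal sketch of the two forms, with F as int and X as unsigned long long.  The function names are hypothetical and the code only illustrates the semantics; it is not taken from the patch.

```c
/* Before the proposed simplification: widen to unsigned long long,
   add, and narrow again.  The addition wraps modulo 2^64 and so is
   always defined; only the final narrowing is implementation-defined
   when the result does not fit in int.  */
int
add_via_unsigned_wide (int a, int b)
{
  return (int) ((unsigned long long) a + (unsigned long long) b);
}

/* After stripping the conversions: the addition is done in int, so
   an overflowing a + b (for example INT_MAX + 1) is undefined
   behaviour.  */
int
add_narrow (int a, int b)
{
  return a + b;
}
```

The two functions agree whenever the int addition does not overflow; the rewrite is unsafe because it turns the remaining cases from defined into undefined behaviour.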

For floating-point types, using TYPE_PRECISION is suspect (it's not 
wonderfully clear what it means, but it's not the number of significand 
bits) - you need to look at the actual properties of the real format of 
the machine modes in question.

Specifically, see convert.c:convert_to_real_1, the long comment starting 
"Sometimes this transformation is safe (cannot change results through 
affecting double rounding cases) and sometimes it is not.", and the 
associated code calling real_can_shorten_arithmetic.  I think that code in 
convert.c ought to apply to your case of half precision converted to float 
for arithmetic and then converted back to half precision afterwards.  (In 
the case where the excess precision configuration - which depends on 
TARGET_FP_F16INST for AArch64 - says to do arithmetic directly on half 
precision, anyway.)
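
As a rough illustration of why significand width matters here, a commonly cited sufficient condition for double rounding to be harmless for +, -, * and / is that the wider format has at least 2p + 2 significand bits, where p is the precision of the narrower format.  The sketch below encodes only that precision test.  It is an assumption-laden simplification, not the logic of real_can_shorten_arithmetic, and the name double_rounding_harmless is hypothetical.

```c
#include <stdbool.h>

/* Return true if a format with WIDE_P significand bits is wide enough
   that computing a basic operation there and rounding again to
   NARROW_P bits gives the same result as rounding once.  Precision
   test only; other properties of the formats are ignored.  */
bool
double_rounding_harmless (int narrow_p, int wide_p)
{
  return wide_p >= 2 * narrow_p + 2;
}
```

With IEEE precisions, half (11 bits) computed in float (24 bits) passes, as does float computed in double (53 bits), whereas double computed in an x87-style 64-bit extended format does not.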

Now, there are still issues in that convert.c code in the decimal 
floating-point case (see bug 40503).  And I think match.pd is a much 
better place for this sort of thing than convert.c (and 
c-family/c-common.c:shorten_binary_op in the integer case).  But for this 
specific case of binary floating-point conversions, I think the logic in 
convert.c is what should be followed (but moved to match.pd if possible).

This patch is also lacking a testcase, which might show why the existing 
logic in convert.c isn't being applied in whatever case you're concerned 
with.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RS6000] Don't pass -many to the assembler

2018-11-12 Thread Iain Sandoe


> On 13 Nov 2018, at 00:34, Mike Stump  wrote:
> 
> On Nov 12, 2018, at 3:13 PM, Alan Modra  wrote:
>> 
>> For people developing new code, it's the right way to go, and
>> especially so for people working on gcc itself.  For people just
>> wanting stuff to compile, not so much.  I fully expect a chorus of
>> *MORON* or worse to come from the likes of the linux kernel rabble.
> 
> So, if you just want to hear people whine...
> 
> On darwin, we (darwin, as a platform decision) like all instructions 
> available from the assembler.  The assembler and the linker have specialized 
> code to track all instructions used (from which CPU types those instructions 
> come from), and mark the object file according to what is actually used.  We 
> also have FAT binaries as a standard feature and other things to make 
> everything play nicely.  People that use inline assembly are expected to know 
> how to code, because it is an advanced feature, and not need hand holding on 
> how to write the condition that guards the code.  I don't recall seeing any 
> reports of anyone needing any extra help in this matter.  On darwin, there 
> wasn't a .machine for a while, it came later.
> 
> Anyway, I thought about saying that it would be nice if all platforms behaved 
> the same, and ask, what do people think the recommended behavior of all 
> platforms should be?
> 
> Personally I don't have a dog in this, as darwin cannot be changed, it's a 
> platform feature, and personally, I don't write a ton of this type of code.  
> I just provide an alternate POV.  Darwin has APIs to query the architecture 
> and code in the assembler/linker to help manage its decision.  Normal ELF 
> systems, I want to say, usually lack such things.  So, choices it makes 
> aren't necessarily right for others.

Given that we have our own assembler and platform equivalent of -many 
(-force_cpusubtype_ALL) .. I was just watching the thread go by ;) 

Having said that, it would be interesting to know what the recommendation is 
with .machine.

Iain



Re: [RS6000] Don't pass -many to the assembler

2018-11-12 Thread Mike Stump
On Nov 12, 2018, at 3:13 PM, Alan Modra  wrote:
> 
> For people developing new code, it's the right way to go, and
> especially so for people working on gcc itself.  For people just
> wanting stuff to compile, not so much.  I fully expect a chorus of
> *MORON* or worse to come from the likes of the linux kernel rabble.

So, if you just want to hear people whine...

On darwin, we (darwin, as a platform decision) like all instructions available 
from the assembler.  The assembler and the linker have specialized code to 
track all instructions used (from which CPU types those instructions come 
from), and mark the object file according to what is actually used.  We also 
have FAT binaries as a standard feature and other things to make everything 
play nicely.  People that use inline assembly are expected to know how to code, 
because it is an advanced feature, and not need hand holding on how to write 
the condition that guards the code.  I don't recall seeing any reports of 
anyone needing any extra help in this matter.  On darwin, there wasn't a 
.machine for a while, it came later.

Anyway, I thought about saying that it would be nice if all platforms behaved 
the same, and ask, what do people think the recommended behavior of all 
platforms should be?

Personally I don't have a dog in this, as darwin cannot be changed, it's a 
platform feature, and personally, I don't write a ton of this type of code.  I 
just provide an alternate POV.  Darwin has APIs to query the architecture and 
code in the assembler/linker to help manage its decision.  Normal ELF systems, 
I want to say, usually lack such things.  So, choices it makes aren't 
necessarily right for others.

Re: [PATCH 2/6] [RS6000] rs6000_output_indirect_call

2018-11-12 Thread Alan Modra
On Mon, Nov 12, 2018 at 01:44:08PM -0600, Bill Schmidt wrote:
> On 11/6/18 11:37 PM, Alan Modra wrote:
> > +fun, "l" + sibcall);
> 
> It's not at all clear to me what {"l" + sibcall} is doing here.

It's an ancient C programmer's trick, from the days when most
compilers didn't optimize too well.  I think I found it first in the
nethack sources.  :-)

> Whatever it is, it's clever enough that it warrants a comment... :-)
> Does adding "l" to false result in the null string?  Is that
> standard?

"l" results in a "const char*" pointing at 0x6c,0 bytes in memory
(assuming ascii).  Adding "true" to that implicitly converts "true" to
1 and thus a "const char*" pointing at a NUL byte.  All completely
standard, even in that new fangled C++ thingy.

A comment is as much needed as count++; needs /* add one to count. */.
If it bothers people I'll write:  sibcall ? "" : "l".
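
For anyone meeting the idiom for the first time, here is a minimal self-contained sketch of both spellings.  The function names are hypothetical and the code is separate from the patch itself.

```c
#include <stdbool.h>

/* The pointer-arithmetic form: "l" is the two-byte array { 'l', 0 },
   so adding 1 (true) yields a pointer to the terminating NUL, i.e.
   the empty string, and adding 0 (false) leaves it pointing at "l".  */
const char *
link_suffix_trick (bool sibcall)
{
  return "l" + sibcall;
}

/* The more explicit spelling with the same results.  */
const char *
link_suffix_plain (bool sibcall)
{
  return sibcall ? "" : "l";
}
```

Both return "l" for an ordinary call and "" for a sibling call.  Some compilers warn about adding an integer to a string literal, which is one practical argument for the conditional form.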

Hah, even the latest gcc doesn't optimize the conditional expression
down to "l" + sibcall.  Check out the code generated for

const char *
f1 (bool x)
{
  return "a" + x;
}

const char *
f2 (bool x)
{
  return x ? "" : "b";
}

> > --- a/gcc/config/rs6000/rs6000.md
> > +++ b/gcc/config/rs6000/rs6000.md
> > @@ -10540,11 +10540,7 @@ (define_insn "*call_indirect_nonlocal_sysv"
> >else if (INTVAL (operands[2]) & CALL_V4_CLEAR_FP_ARGS)
> >  output_asm_insn ("creqv 6,6,6", operands);
> >
> > -  if (rs6000_speculate_indirect_jumps
> > -  || which_alternative == 1 || which_alternative == 3)
> > -return "b%T0l";
> > -  else
> > -return "crset 2\;beq%T0l-";
> > +  return rs6000_output_indirect_call (operands, 0, false);
> 
> Looks like this breaks Darwin?  This pattern matches for DEFAULT_ABI == 
> ABI_DARWIN
> but rs6000_output_indirect_call will hit gcc_unreachable() for that ABI.  

Hmm, yes, thanks for pointing that one out.  I took too much notice of
the pattern name.

Segher, would you like me to repost the series with accumulated fixes,
I mean before you review the rest of the series?

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] More value_range API cleanup

2018-11-12 Thread Aldy Hernandez

On 11/12/18 7:12 AM, Richard Biener wrote:
> This mainly tries to rectify the workaround I put in place for ipa-cp.c
> needing to build value_range instead of value_range_base for calling
> extract_range_from_unary_expr.
>
> To make this easier I moved more set_* functions to methods.
>
> Then for some reason I chose to fix the rathole of equiv bitmap sharing
> after finding at least one real bug

By the way, I've seen that the equiv_add() calls in vr-values.c 
sometimes set equivalences for VARYING and UNDEFINED which in theory 
shouldn't happen.  I've been too chicken to follow that hole.


I think we should assert everywhere that we set the equivalences, that 
we're not talking about VARYING or UNDEFINED.



> @@ -6168,37 +6172,30 @@ value_range::union_helper (value_range *vr0, const 
> value_range *vr1)
> return;
>   }
>   
> -  value_range saved (*vr0);
>
> value_range_kind vr0type = vr0->kind ();
> tree vr0min = vr0->min ();
> tree vr0max = vr0->max ();
> union_ranges (&vr0type, &vr0min, &vr0max,
> vr1->kind (), vr1->min (), vr1->max ());
> -  *vr0 = value_range (vr0type, vr0min, vr0max);
> -  if (vr0->varying_p ())
> +  /* Work on a temporary so we can still use vr0 when union returns varying.  
> */
> +  value_range tem;
> +  tem.set_and_canonicalize (vr0type, vr0min, vr0max);
> +  if (tem.varying_p ())


I'm not a big fan of the code duplication in the union chunks.  You're 
adding more places that need to be kept in sync.


I think value_range::union_ could be easily coded as:

  value_range_base::union_ (other);
  union_helper (this, other);
  if (flag_checking)
check ();

And have union_helper only deal with the equivalence stuff.  Call it 
union_equivs?  You'd have to clear the equivalences if the range just 
became a varying/undefined, as both of those should in theory never have 
equivalences.


Also, is there a reason why you implemented value_range_base::union_ but 
not the corresponding for intersect?  I would guess it'd be needed 
sooner or later.


Thanks for working on this.
Aldy


Re: [PATCH] Fortran include line fixes and -fdec-include support

2018-11-12 Thread Fritz Reese
On Mon, Nov 12, 2018 at 9:51 AM Jakub Jelinek  wrote:
>
> In fortran97.pdf I read:
> "Except in a character context, blanks are insignificant and may be used 
> freely throughout the program."
> and while we handle that in most cases, we don't allow spaces in INCLUDE
> lines in fixed form, while e.g. ifort does.

I agree with fixing this unconditionally.
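
To illustrate what "blanks are insignificant" means for keyword matching, here is a minimal C sketch.  It is not gfortran's include_line routine: skip_include_keyword is a hypothetical helper that ignores column rules, continuation lines and comments.

```c
#include <ctype.h>
#include <stddef.h>

/* If LINE starts with the keyword INCLUDE, allowing blanks before and
   between its letters and ignoring case, return a pointer just past
   the keyword; otherwise return NULL.  */
const char *
skip_include_keyword (const char *line)
{
  static const char keyword[] = "include";
  size_t matched = 0;

  while (keyword[matched] != '\0')
    {
      /* Blanks are insignificant outside a character context.  */
      while (*line == ' ')
        line++;
      if (tolower ((unsigned char) *line) != keyword[matched])
        return NULL;
      line++;
      matched++;
    }
  return line;
}
```

Under this sketch, "      i n c l u d e 'x.inc'" is recognized and "      print *, 1" is not.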

> Another thing, which I haven't touched in the PR except covering it with a
> testcase is that we allow an INCLUDE line in fixed form to start even in columns
> 1 to 6, while ifort rejects that.  Is say
>  include 'omp_lib.h'
> valid in fixed form?  i in column 6 normally means a continuation line,
> though not sure if anything can in a valid program contain nclude
> followed by character literal.  Shall we reject that, or at least warn that
> it won't be portable?

AFAICT this is unambiguous so I would certainly suggest adding such a
warning (enabled by default).

> The last thing, biggest part of the patch, is that for legacy DEC
> compatibility, the DEC manuals document INCLUDE as a statement, not a line,
> [...]
> This means there can be (as can be seen in the following testcases)
> continuations in both forms, and in fixed form there can be 0 in column 6.

Makes sense to me. I concur with adding -fdec-include to support this
under -fdec.

If we are going to warn for the above and re-do the include matching
anyway, I wonder if we should have also a specific error message for a
labeled include statement? For example,

10include 'include_10.inc'

Will result in the generic 'Unclassifiable statement' error, but ifort
gives "Label on INCLUDE is invalid."

> In order not to duplicate all the handling of continuations, comment
> skipping etc., the patch just adjusts the include_line routine so that it
> signals if the current line is a possible start of a valid INCLUDE statement
> when in -fdec-include mode, and if so, whenever it reads a further line it
> retries to parse it using
> gfc_next_char/gfc_next_char_literal/gfc_gobble_whitespace APIs as an INCLUDE
> stmt.  If it is found not to be a valid INCLUDE statement line or set of
> lines, it returns 0, if it is valid, it returns 1 together with load_file
> like include_line does and clears all the lines containing the INCLUDE
> statement.  If the reading stops because we don't have enough lines, -1 is
> returned and the caller tries again with more lines.

LGTM.

> In addition to the above mentioned question about include in columns 1-6 in
> fixed form, another thing is that we support
>   print *, 'abc''def'
>   print *, "hij""klm"
> which prints abc'def and hij"klm.  Shall we support that for INCLUDE lines
> and INCLUDE statements too?

It appears ifort does also support this. I see no reason not to, as
the feature should be straightforward to implement.
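
A minimal sketch of the doubled-delimiter rule follows, to show how little is involved.  It is a hypothetical helper written in C for illustration, not code from the gfortran scanner; the name decode_char_literal and its return convention are assumptions.

```c
#include <stddef.h>

/* Decode a Fortran-style character literal starting at SRC (delimited
   by ' or ") into OUT, which has room for OUT_SIZE bytes.  A doubled
   delimiter inside the literal stands for one delimiter character.
   Return the number of source characters consumed, or 0 if the
   literal is malformed or OUT is too small.  */
size_t
decode_char_literal (const char *src, char *out, size_t out_size)
{
  char quote = src[0];
  if (out_size == 0 || (quote != '\'' && quote != '"'))
    return 0;

  size_t in = 1, len = 0;
  for (;;)
    {
      char c = src[in];
      if (c == '\0')
        return 0;               /* Unterminated literal.  */
      if (c == quote)
        {
          if (src[in + 1] != quote)
            break;              /* Closing delimiter.  */
          in++;                 /* Doubled: skip the first of the pair.  */
        }
      if (len + 1 >= out_size)
        return 0;               /* No room for this byte plus the NUL.  */
      out[len++] = src[in++];
    }
  out[len] = '\0';
  return in + 1;                /* Count the closing delimiter too.  */
}
```

Applied to 'abc''def' this produces abc'def, and applied to "hij""klm" it produces hij"klm, matching the PRINT behaviour quoted above.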

> Tested on x86_64-linux, ok for trunk if it passes full bootstrap/regtest?

With the above additions it looks ok to me, but I must defer to an
official Fortran reviewer.

---
Fritz


RE: [PATCH v2] MIPS: Default to --with-llsc for the R5900 Linux target as well

2018-11-12 Thread Maciej W. Rozycki
On Fri, 9 Nov 2018, mfort...@gmail.com wrote:

> Maciej: I'm not able to commit this for Fredrik at the moment, would
> you mind doing that for him?

 Sure, I have applied the change now, using your proposed ChangeLog entry.

  Maciej


Re: [RS6000] Don't pass -many to the assembler

2018-11-12 Thread Alan Modra
On Mon, Nov 12, 2018 at 04:17:51PM +0000, Michael Matz wrote:
> Hi,
> 
> On Mon, 12 Nov 2018, Segher Boessenkool wrote:
> 
> > > > Wouldn't this also break compiling code that contains power9 
> > > > instructions but guarded by runtime tests to only be executed on 
> > > > power9 machines?  That seems a valid usecase, and it'd be bad if the 
> > > > assembler fails to compile such.  (You can't use -mcpu=power9 as 
> > > > work around as the other unguarded code is not supposed to be using 
> > > > power9 insns).
> > > 
> > > You'll need to put .machine directives around them.
> > 
> > My worry with that is there may be too much legacy code that does not do 
> > this :-(
> 
> We'll see once we put gcc9 through a distro build.  My worry really only 
> was that the change would result in compile breakage without a sensible 
> solution.  (I'll just give all packages whose build failures prevent gcc9 
> from being the new system compiler to Alan for fixing ;-) ).

Heh.  I've been using the patch (or one like it) myself for over 2
years, but of course I don't tend to compile whole distros.  The
length of time I've had it baking in my tree says something about my
hesitation to post the patch more than anything else.  Note that you
can easily "fix" package build failures by adding -Wa,-many to
CFLAGS.

For people developing new code, it's the right way to go, and
especially so for people working on gcc itself.  For people just
wanting stuff to compile, not so much.  I fully expect a chorus of
*MORON* or worse to come from the likes of the linux kernel rabble.

-- 
Alan Modra
Australia Development Lab, IBM


[wwwdocs] Add D language to news, frontends, release notes, and readings.

2018-11-12 Thread Iain Buclaw
Hi,

As suggested, this adds an announcement of the D front end addition to
the news items on the GCC home page, and from what I can tell, the
relevant pages where the language should get a mention.

Kept the release notes brief for now, will expand later.  Is this OK?

Thanks,
Iain
---
Index: htdocs/frontends.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/frontends.html,v
retrieving revision 1.44
diff -U 3 -r1.44 frontends.html
--- htdocs/frontends.html	30 Sep 2018 14:38:46 -	1.44
+++ htdocs/frontends.html	12 Nov 2018 22:44:02 -
@@ -11,7 +11,7 @@
 
 Currently the main GCC distribution contains front ends for C
 (gcc), C++ (g++), Objective C,
-Fortran, Ada (GNAT), and Go.
+Fortran, Ada (GNAT), Go, and D.
 
 There are several more front ends for different languages that have
 been written for GCC but not yet integrated into the main distribution
Index: htdocs/index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.1106
diff -U 3 -r1.1106 index.html
--- htdocs/index.html	26 Oct 2018 12:30:46 -	1.1106
+++ htdocs/index.html	12 Nov 2018 22:44:02 -
@@ -18,7 +18,7 @@
 C,
 C++,
 Objective-C, Fortran,
-Ada, and Go, as well as libraries for these languages (libstdc++,...).
+Ada, Go, and D, as well as libraries for these languages (libstdc++,...).
 GCC was originally written as the compiler for the http://www.gnu.org/gnu/thegnuproject.html";>GNU operating system.
 The GNU system was developed to be 100% free software, free in the sense
@@ -54,6 +54,12 @@
 News
 
 
+D front end added
+ [2018-10-29]
+ The https://dlang.org";>D programming language front end
+   has been added to GCC.
+   This front end was contributed by Iain Buclaw.
+
 GCC 6.5 released
 [2018-10-26]
 
Index: htdocs/readings.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
retrieving revision 1.306
diff -U 3 -r1.306 readings.html
--- htdocs/readings.html	1 Nov 2018 21:42:00 -	1.306
+++ htdocs/readings.html	12 Nov 2018 22:44:02 -
@@ -589,6 +589,14 @@
 
 
 
+D information
+
+
+  https://dlang.org";>D language homepage
+  https://dlang.org/spec/spec.html";>D language reference
+
+
+
 Modula 3 information
 
 
Index: htdocs/gcc-9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
retrieving revision 1.26
diff -U 3 -r1.26 changes.html
--- htdocs/gcc-9/changes.html	1 Nov 2018 20:34:33 -	1.26
+++ htdocs/gcc-9/changes.html	12 Nov 2018 22:44:03 -
@@ -85,6 +85,13 @@
   TS on Windows.
 
 
+D
+
+  Support for the D programming language has been added to GCC,
+implementing version 2.076 of the language and runtime library.  
+  
+
+
 Fortran
 
   Asynchronous I/O is now fully supported. The program needs to
Index: htdocs/gcc-9/criteria.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/criteria.html,v
retrieving revision 1.5
diff -U 3 -r1.5 criteria.html
--- htdocs/gcc-9/criteria.html	22 Oct 2018 08:33:10 -	1.5
+++ htdocs/gcc-9/criteria.html	12 Nov 2018 22:44:03 -
@@ -34,7 +34,7 @@
 Languages
 
 GCC supports several programming languages, including Ada, C, C++,
-Fortran, Objective-C, Objective-C++, and Go.
+Fortran, Objective-C, Objective-C++, Go, and D.
 For the purposes of making releases,
 however, we will consider primarily C and C++, as those are the
 languages used by the vast majority of users.  Therefore, if below


Re: [PATCH][RFC] Come up with -flive-patching master option.

2018-11-12 Thread Qing Zhao


> On Nov 12, 2018, at 2:53 AM, Martin Liška  wrote:
> 
>> 
>> Okay, I see.
>> 
>> I am also working on a similar option as yours, but make the -flive-patching 
>> as two level control:
>> 
>> +flive-patching
>> +Common RejectNegative Alias(flive-patching=,inline-clone)
>> +
>> +flive-patching=
>> +Common Report Joined RejectNegative Enum(live_patching_level) 
>> Var(flag_live_patching) Init(LIVE_NONE)
>> +-flive-patching=[inline-only-static|inline-clone]  Control 
>> optimizations to provide a safe comp for live-patching purpose.
>> 
>> the implementation for -flive-patching=inline-clone (the default) is exactly 
>> as yours,  the new level -flive-patching=inline-only-static
>> is to only enable inlining of static function for live patching, which is 
>> important for multiple-processes live patching to control memory
>> consumption. 
>> 
>> (please see my 2nd version of the -flive-patching proposal).
>> 
>> I will send out my complete patch in another email.
> 
> Hi, sure, works for me. Let's make 2 level option.

thank you.

I will send the patch tomorrow.

Qing
> 
> Martin



Re: [PATCH] C/C++: add fix-it hints for missing '&' and '*' (PR c++/87850)

2018-11-12 Thread Martin Sebor

On 11/11/2018 02:02 PM, David Malcolm wrote:

On Sun, 2018-11-11 at 11:01 -0700, Martin Sebor wrote:

On 11/10/2018 12:01 AM, Eric Gallager wrote:

On 11/9/18, David Malcolm  wrote:

This patch adds a fix-it hint to various pointer-vs-non-pointer
diagnostics, suggesting the addition of a leading '&' or '*'.

For example, note the ampersand fix-it hint in the following:

demo.c:5:22: error: invalid conversion from 'pthread_key_t' {aka
'unsigned
int'}
   to 'pthread_key_t*' {aka 'unsigned int*'} [-fpermissive]
5 |   pthread_key_create(key, NULL);
  |  ^~~
  |  |
  |  pthread_key_t {aka unsigned int}
  |  &


Having both the type and the fixit underneath the caret looks kind
of confusing


I agree it's rather subtle.  Keeping the diagnostics separate from
the suggested fix should avoid the confusion.


FWIW, the fix-it hint is in a different color (assuming that gcc is
invoked in an environment that prints that...)


I figured it would be, but I'm still not sure it's good design
to be relying on color alone to distinguish between the problem
and the suggested fix.  Especially when they are so close to one
another and the fix is just a single character with no obvious
relationship to the rest of the text on the screen.  In other
warnings there's at least the "did you forget the '@'?" part
to give a clue, even though even there the connection between
the "did you forget" and the & several lines down wouldn't
necessarily be immediately apparent.

I'm not an expert on these things so I'm going strictly by
intuition and my personal bias.  Are there any user interface
guidelines that you like to refer to that speak to this?  (Other
than our own or the blog post you referenced in one of your posts.)

Back in my GUI days, I remember reading articles and even whole
books on how to design good interfaces, including layouts and
the use of color (e.g., for the color-blind).  It seems like
there should be something relevant for the command line as well,
at least some basic principles that we could apply.

Martin


Re: [PATCH v3 3/3] PR preprocessor/83173: Enhance -fdump-internal-locations output

2018-11-12 Thread Mike Gulick
On 11/2/18 5:04 PM, David Malcolm wrote:
> On Thu, 2018-11-01 at 11:56 -0400, Mike Gulick wrote:
>> 2017-10-31  Mike Gulick  
>>
>>  PR preprocessor/83173
>>  * gcc/input.c (dump_location_info): Dump reason and
>>  included_from fields from line_map_ordinary struct.  Fix
>>  indentation when location > 5 digits.
>>
>>  * libcpp/location-example.txt: Update example
>>  -fdump-internal-locations output.
>> ---
>>  gcc/input.c |  49 +-
>>  libcpp/location-example.txt | 333 +-
>> --
>>  2 files changed, 241 insertions(+), 141 deletions(-)
> 
> Sorry about the belated response.  This is a nice enhancement; some
> nits below.
> 
>> diff --git a/gcc/input.c b/gcc/input.c
>> index a94a010f353..f938a37f20e 100644
>> --- a/gcc/input.c
>> +++ b/gcc/input.c
>> @@ -1075,6 +1075,17 @@ dump_labelled_location_range (FILE *stream,
>>fprintf (stream, "\n");
>>  }
>>  
>> +#define NUM_DIGITS(x) ((x) >= 1000000000 ? 10 : \
>> +   (x) >= 100000000 ? 9 : \
>> +   (x) >= 10000000 ? 8 : \
>> +   (x) >= 1000000 ? 7 : \
>> +   (x) >= 100000 ? 6 : \
>> +   (x) >= 10000 ? 5 : \
>> +   (x) >= 1000 ? 4 : \
>> +   (x) >= 100 ? 3 : \
>> +   (x) >= 10 ? 2 : \
>> +   1)
> 
> diagnostic-show-locus.c has a function "num_digits" (currently static)
> and, fwiw, a unit test.  It would be good to share the implementation.
> 

I initially tried to use this function by just adding "extern int
num_digits(int);" into diagnostic-core.h, but that failed to link, so it seems
like diagnostic-show-locus.c is not included in whatever library input.c gets
linked with (I forget which library it was trying to link).  Instead I moved
num_digits and its unit test to diagnostic.c, and added the extern definition to
diagnostic-core.h.  That builds and tests successfully.  Does that seem like a
reasonable way to do this?
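
For reference, a loop-based helper of the kind being shared could look like the sketch below. This is an assumption about its shape, written stand-alone with plain assert; the actual num_digits in GCC may differ in detail (for example by using gcc_assert).

```cpp
#include <cassert>

// Number of decimal digits needed to print VALUE, which must be
// non-negative.  Sketch only; not copied from the GCC sources.
int num_digits(int value)
{
  assert(value >= 0);
  if (value == 0)
    return 1;
  int digits = 0;
  while (value > 0)
    {
      digits++;
      value /= 10;
    }
  return digits;
}
```

Unlike the macro, this evaluates its argument once and needs no table of thresholds.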

>>  /* Write a visualization of the locations in the line_table to
>> STREAM.  */
>>  
>>  void
>> @@ -1104,6 +1115,35 @@ dump_location_info (FILE *stream)
>> map->m_column_and_range_bits - map->m_range_bits);
>>fprintf (stream, "  range bits: %i\n",
>> map->m_range_bits);
>> +  const char * reason;
>> +  switch (map->reason) {
>> +  case LC_ENTER:
>> +reason = "LC_ENTER";
>> +break;
>> +  case LC_LEAVE:
>> +reason = "LC_LEAVE";
>> +break;
>> +  case LC_RENAME:
>> +reason = "LC_RENAME";
>> +break;
>> +  case LC_RENAME_VERBATIM:
>> +reason = "LC_RENAME_VERBATIM";
>> +break;
>> +  case LC_ENTER_MACRO:
>> +reason = "LC_ENTER_MACRO";
>> +break;
>> +  default:
>> +reason = "Unknown";
>> +  }
>> +  fprintf (stream, "  reason: %d (%s)\n", map->reason, reason);
>> +
>> +  const line_map_ordinary *includer_map
>> += linemap_included_from_linemap (line_table, map);
>> +  fprintf (stream, "  included from map: %d\n",
>> +   includer_map ? int (includer_map - line_table->info_ordinary.maps)
>> +   : -1);
> 
> I'm not a fan of "-1" here; it's a NULL pointer in the original data.
> How about "n/a" for that case?
> 

That's a good suggestion.  Thanks.

>> +  fprintf (stream, "  included from location: %d\n",
>> +   linemap_included_from (map));
> 
> ...or merging it with this line, for something like:
> 
>   included from location: 127 (in ordinary map 2)
> 
> vs:
> 
>   included from location: 0
> 
> [...snip...]
> 
> Other than that, this is OK for trunk, assuming your contributor
> paperwork is in place.
> 
> Dave
> 
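
A rough sketch of the merged output line suggested above, assuming a location of 0 means "not included from anywhere" and a negative map index means there is no including map. format_included_from and its parameters are made-up names, not code from input.c.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format the "included from" line of the dump.  LOCATION is the including
// location and MAP_INDEX the index of the including ordinary map, or a
// negative value when there is none.  Hypothetical helper for illustration.
std::string format_included_from(unsigned location, int map_index)
{
  char buf[96];
  int n;
  if (map_index < 0)
    n = std::snprintf(buf, sizeof buf,
                      "  included from location: %u", location);
  else
    n = std::snprintf(buf, sizeof buf,
                      "  included from location: %u (in ordinary map %d)",
                      location, map_index);
  // The buffer is sized generously for any 32-bit values.
  assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
  return buf;
}
```

This avoids printing a separate "-1" line for the NULL case altogether.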

What is the preferred way to re-send this patch?  Should I re-send the entire
patch series as v4, or just an updated version of this single patch?

Also, I'm waiting on FSF for assignment paperwork.  I've re-pinged them after
waiting a week.

Thanks for the feedback and help.

-Mike


Re: [PATCH] RFC: C/C++: print help when a header can't be found

2018-11-12 Thread Martin Sebor

On 11/11/2018 04:33 PM, David Malcolm wrote:

When gcc can't find a header file, it's a hard error that stops the build,
typically requiring the user to mess around with compile flags, Makefiles,
dependencies, and so forth.

Often the exact search paths aren't obvious to the user.  Consider the
case where the include paths are injected via a tool such as pkg-config,
e.g.:

  gcc $(pkg-config --cflags glib-2.0) demo.c

This patch is an attempt at being more helpful for such cases.  Given that
the user can't proceed until the issue is resolved, I think it's reasonable
to default to telling the user as much as possible about what happened.
This patch lists all of the search paths, and any close matches (e.g. for
misspellings).

Without the patch, the current behavior is:

misspelled-header-1.c:1:10: fatal error: test-header.hpp: No such file or 
directory
1 | #include "test-header.hpp"
  |  ^
compilation terminated.

With the patch, the user gets this output:

misspelled-header-1.c:1:10: fatal error: test-header.hpp: No such file or 
directory
1 | #include "test-header.hpp"
  |  ^
misspelled-header-1.c:1:10: note: paths searched:
misspelled-header-1.c:1:10: note:  path: ''
misspelled-header-1.c:1:10: note:   not found: 'test-header.hpp'
misspelled-header-1.c:1:10: note:   close match: 'test-header.h'
1 | #include "test-header.hpp"
  |  ^
  |  "test-header.h"
misspelled-header-1.c:1:10: note:  path: '/usr/include/glib-2.0' (via '-I')
misspelled-header-1.c:1:10: note:   not found: 
'/usr/include/glib-2.0/test-header.hpp'
misspelled-header-1.c:1:10: note:  path: '/usr/lib64/glib-2.0/include' (via 
'-I')
misspelled-header-1.c:1:10: note:   not found: 
'/usr/lib64/glib-2.0/include/test-header.hpp'
misspelled-header-1.c:1:10: note:  path: './include' (system directory)
misspelled-header-1.c:1:10: note:   not found: './include/test-header.hpp'
misspelled-header-1.c:1:10: note:  path: './include-fixed' (system directory)
misspelled-header-1.c:1:10: note:   not found: './include-fixed/test-header.hpp'
misspelled-header-1.c:1:10: note:  path: '/usr/local/include' (system directory)
misspelled-header-1.c:1:10: note:   not found: 
'/usr/local/include/test-header.hpp'
misspelled-header-1.c:1:10: note:  path: '/usr/include' (system directory)
misspelled-header-1.c:1:10: note:   not found: '/usr/include/test-header.hpp'
compilation terminated.

showing the paths that were tried, and why (e.g. the -I paths injected by
the pkg-config invocation), and the .hpp vs .h issue (with a fix-it hint).

It's verbose, but as I said above, the user can't proceed until they
resolve it, so I think being verbose is appropriate here.

Thoughts?


I think printing the directories and especially the near matches
will be very helpful, especially for big projects with lots of -I
options.
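
To illustrate the near-match part, here is a small sketch using plain Levenshtein distance. This is an assumption made for illustration: the patch may well rely on GCC's existing spell-checking code instead, and closest_match with its max_distance cutoff is a hypothetical interface.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Classic Levenshtein distance between A and B, using two rolling rows.
std::size_t edit_distance(const std::string &a, const std::string &b)
{
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j)
    prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i)
    {
      cur[0] = i;
      for (std::size_t j = 1; j <= b.size(); ++j)
        {
          std::size_t subst = prev[j - 1] + (a[i - 1] != b[j - 1] ? 1 : 0);
          cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
        }
      prev.swap(cur);
    }
  // The distance can never exceed the length of the longer string.
  assert(prev[b.size()] <= std::max(a.size(), b.size()));
  return prev[b.size()];
}

// Return the candidate closest to WANTED, or "" if none is within
// MAX_DISTANCE edits.  Ties go to the earliest candidate.
std::string closest_match(const std::string &wanted,
                          const std::vector<std::string> &candidates,
                          std::size_t max_distance)
{
  std::string best;
  std::size_t best_distance = max_distance + 1;
  for (const std::string &candidate : candidates)
    {
      std::size_t d = edit_distance(wanted, candidate);
      if (d < best_distance)
        {
          best_distance = d;
          best = candidate;
        }
    }
  return best;
}
```

With the example above, "test-header.hpp" is two deletions away from "test-header.h", so that file would be offered as the close match.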

The output could be made substantially shorter, less repetitive,
and so easier to read -- basically cut in half -- by avoiding
most of the duplication and collapsing two notes into one, e.g.
like so:

  fatal error: test-header.hpp: No such file or directory
  1 | #include "test-header.hpp"
|  ^
  note: paths searched:
  note: -I '.'
  note:   close match: 'test-header.h'
  1 | #include "test-header.hpp"
|  ^
|  "test-header.h"
  note: -I '/usr/include/glib-2.0'
  note: -I '/usr/lib64/glib-2.0/include'
  note: -isystem './include'
  note: -isystem './include-fixed'
  note: -isystem '/usr/local/include'
  note: -isystem '/usr/include'

or by printing the directories in sections:

  note: -I paths searched:
  note:   '.'
  note:   close match: 'test-header.h'
  1 | #include "test-header.hpp"
|  ^
|  "test-header.h"
  note:   '/usr/include/glib-2.0'
  note:   '/usr/lib64/glib-2.0/include'
  note: -isystem paths searched:
  note:   './include'
  note:   './include-fixed'
  note:   '/usr/local/include'
  note:   '/usr/include'
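
The sectioned layout could be produced by something as simple as the sketch below. search_dir and format_search_notes are made-up names for illustration, and the close-match lines are left out to keep it short.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One directory that was searched, plus the option that introduced it
// (e.g. "-I" or "-isystem").  Hypothetical type for illustration.
struct search_dir
{
  std::string option;
  std::string path;
};

// Emit one heading per run of directories sharing the same option,
// followed by one line per directory.
std::string format_search_notes(const std::vector<search_dir> &dirs)
{
  std::string out;
  const std::string *current = nullptr;
  for (const search_dir &d : dirs)
    {
      assert(!d.option.empty());
      if (!current || *current != d.option)
        {
          out += "note: " + d.option + " paths searched:\n";
          current = &d.option;
        }
      out += "note:   '" + d.path + "'\n";
    }
  return out;
}
```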

Martin



Re: PR fortran/87919 patch for -fno-dec-structure

2018-11-12 Thread Fritz Reese
On Mon, Nov 12, 2018 at 3:42 PM Jakub Jelinek  wrote:
> Ok, so I'll ack it for trunk now, but please give the other Fortran
> maintainers one day to disagree before committing.
> For the release branches, I'd wait two weeks or so before backporting it.
>

Roger that. I'll happily give it some time. Thanks for looking it over.

Fritz


Re: PR fortran/87919 patch for -fno-dec-structure

2018-11-12 Thread Jakub Jelinek
On Mon, Nov 12, 2018 at 03:28:47PM -0500, Fritz Reese wrote:
> Actually, the gcc frontend appears to move -std= before the
> language-specific options before f951 is even executed regardless of
> its location compared to the -fdec flags. I don't know if this is a

That is because:
#define F951_OPTIONS"%(cc1_options) %{J*} \
 %{!nostdinc:-fintrinsic-modules-path finclude%s}\
 %{!fsyntax-only:%(invoke_as)}"
and
static const char *cc1_options =
"%{pg:%{fomit-frame-pointer:%e-pg and -fomit-frame-pointer are incompatible}}\
 %{!iplugindir*:%{fplugin*:%:find-plugindir()}}\
 %1 %{!Q:-quiet} %{!dumpbase:-dumpbase %B} %{d*} %{m*} %{aux-info*}\
 %{fcompare-debug-second:%:compare-debug-auxbase-opt(%b)} \
 %{!fcompare-debug-second:%{c|S:%{o*:-auxbase-strip %*}%{!o*:-auxbase 
%b}}}%{!c:%{!S:-auxbase %b}} \
 %{g*} %{O*} %{W*&pedantic*} %{w} %{std*&ansi&trigraphs}\
 %{v:-version} %{pg:-p} %{p} %{f*} %{undef}\

where %{std*&ansi&trigraphs} comes before %{f*}.
I guess let's not change that behavior.
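
As a toy model of that effect (not the driver's real spec machinery): each spec element emits the matching switches in their command-line order, and the elements themselves are emitted in spec order. The sketch only models the two elements discussed here; reorder_by_spec is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Model of "%{std*} ... %{f*}": all -std switches are emitted first, then
// all -f switches, each group keeping its command-line order.  Switches
// matching neither prefix are dropped in this simplified model.
std::vector<std::string> reorder_by_spec(const std::vector<std::string> &args)
{
  const std::vector<std::string> spec_order = {"-std", "-f"};
  std::vector<std::string> out;
  for (const std::string &prefix : spec_order)
    {
      assert(!prefix.empty());
      for (const std::string &arg : args)
        if (arg.compare(0, prefix.size(), prefix) == 0)
          out.push_back(arg);
    }
  return out;
}
```

So "-fdec -fno-dec -std=legacy" reaches f951 as "-std=legacy -fdec -fno-dec" in this model.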

> bug or if it is by design -- the feeling I get is that the gcc
> frontend processes it first since it is recognized before the flang
> specific options. Therefore, greedily setting the standard options the
> first time flag_dec appears means the standard information is lost and
> I believe your suggestion is correct: the standard flags must be set
> only once in gfc_post_options.
> 
> In fact the new testcase dec_bitwise_ops_3.f90 is a good test of this:
> it uses -fdec -fno-dec -std=legacy to avoid warnings for XOR. With the
> version posted previously, the -std=legacy is overwritten by -fno-dec
> and warnings still appear. Here's what I'd change from the previous
> patch to support this:

LGTM.

> > Anyway, that is all from me, I still don't want to stomp on Fortran
> > maintainer's review (use my global reviewer's rights for that) and
> > thus I'm deferring the review to them.  When committing, please make sure
> > to include Mark's email in the ChangeLog next to yours to credit him.
> 
> Thanks for your comments. I think nobody will feel stomped on since
> maintainers are sparse and busy. I will certainly make note of Mark's
> contributions when committing.

Ok, so I'll ack it for trunk now, but please give the other Fortran
maintainers one day to disagree before committing.
For the release branches, I'd wait two weeks or so before backporting it.

Thanks.

Jakub


[PATCH, fortran] PR fortran/85982 -- Fix ICE on invalid attributes inside DEC structures

2018-11-12 Thread Fritz Reese
All,

The simple patch below (and attached) fixes PR 85982. The issue is an
omission of the macro gfc_comp_struct() which would include DEC
structures in certain attribute checks that are performed for
derived-TYPE declarations in decl.c. In the case described in the PR
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85982) there is an ICE
because the presence of an invalid EXTERNAL attribute leaks through to
resolve_component, invalidating some invariants for objects which are
supposed to be EXTERNAL.
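
In stand-alone form, the idea is roughly the following. The enum is a cut-down stand-in for gfortran's compile states, and comp_struct_p is my assumption of what gfc_comp_struct tests, based on the ChangeLog entry below; state_name mirrors the expression used in the patch.

```cpp
#include <cassert>
#include <string>

// Cut-down stand-in for the parser's compile states (illustration only).
enum compile_state
{
  COMP_NONE,
  COMP_PROGRAM,
  COMP_DERIVED,    // TYPE ... END TYPE
  COMP_STRUCTURE,  // DEC STRUCTURE ... END STRUCTURE
  COMP_MAP         // DEC MAP inside a UNION
};

// Assumed shape of the gfc_comp_struct check: true for every state in
// which component declarations are being parsed.
bool comp_struct_p(compile_state s)
{
  return s == COMP_DERIVED || s == COMP_STRUCTURE || s == COMP_MAP;
}

// Name used in diagnostics, as in the patch: "TYPE" for derived types,
// "STRUCTURE" for the DEC variants.
const char *state_name(compile_state s)
{
  assert(comp_struct_p(s));
  return s == COMP_DERIVED ? "TYPE" : "STRUCTURE";
}
```

The old test compared against COMP_DERIVED only, so the attribute checks were skipped for the other two states.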

This is fairly obvious so I would commit to trunk and backport to
7-branch and 8-branch if nobody sees any issues this week or so.
(Nb. the test case is named dec_structure_28.f90 so as not to conflict
with the pending patch for PR fortran/87919 which adds
dec_structure_{24-27}.f90.)

--
Fritz

>From dc5a072017af29ca1e84b85b0e3a1e6af49a6928 Mon Sep 17 00:00:00 2001
From: Fritz Reese 
Date: Mon, 12 Nov 2018 15:19:39 -0500

Fix ICE due to erroneously accepted component attributes in DEC structures.

gcc/fortran/
* decl.c (match_attr_spec): Lump COMP_STRUCTURE/COMP_MAP into attribute
checking used by TYPE.

gcc/testsuite/
* gfortran.dg/dec_structure_28.f90: New test.
---
 gcc/fortran/decl.c | 17 -
 gcc/testsuite/gfortran.dg/dec_structure_28.f90 | 35 ++
 2 files changed, 46 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/dec_structure_28.f90
index 87c736fb2db..2b294fdf65f 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -5184,15 +5184,18 @@ match_attr_spec (void)
   if (d == DECL_STATIC && seen[DECL_SAVE])
continue;

-  if (gfc_current_state () == COMP_DERIVED
+  if (gfc_comp_struct (gfc_current_state ())
  && d != DECL_DIMENSION && d != DECL_CODIMENSION
  && d != DECL_POINTER   && d != DECL_PRIVATE
  && d != DECL_PUBLIC && d != DECL_CONTIGUOUS && d != DECL_NONE)
{
+ const char* const state_name = (gfc_current_state () == COMP_DERIVED
+ ? "TYPE" : "STRUCTURE");
  if (d == DECL_ALLOCATABLE)
{
  if (!gfc_notify_std (GFC_STD_F2003, "ALLOCATABLE "
-  "attribute at %C in a TYPE definition"))
+  "attribute at %C in a %s definition",
+  state_name))
{
  m = MATCH_ERROR;
  goto cleanup;
@@ -5201,7 +5204,8 @@ match_attr_spec (void)
  else if (d == DECL_KIND)
{
  if (!gfc_notify_std (GFC_STD_F2003, "KIND "
-  "attribute at %C in a TYPE definition"))
+  "attribute at %C in a %s definition",
+  state_name))
{
  m = MATCH_ERROR;
  goto cleanup;
@@ -5225,7 +5229,8 @@ match_attr_spec (void)
  else if (d == DECL_LEN)
{
  if (!gfc_notify_std (GFC_STD_F2003, "LEN "
-  "attribute at %C in a TYPE definition"))
+  "attribute at %C in a %s definition",
+  state_name))
{
  m = MATCH_ERROR;
  goto cleanup;
@@ -5248,8 +5253,8 @@ match_attr_spec (void)
}
  else
{
- gfc_error ("Attribute at %L is not allowed in a TYPE definition",
-&seen_at[d]);
+ gfc_error ("Attribute at %L is not allowed in a %s definition",
+&seen_at[d], state_name);
  m = MATCH_ERROR;
  goto cleanup;
}
diff --git a/gcc/testsuite/gfortran.dg/dec_structure_28.f90
b/gcc/testsuite/gfortran.dg/dec_structure_28.f90
new file mode 100644
index 00000000000..bab08b2d5c3
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/dec_structure_28.f90
@@ -0,0 +1,35 @@
+! { dg-do compile }
+! { dg-options "-fdec-structure -fdec-static" }
+!
+! PR fortran/85982
+!
+! Test a regression wherein some component attributes were erroneously accepted
+! within a DEC structure.
+!
+
+structure /s/
+  integer :: a
+  integer, intent(in) :: b ! { dg-error "is not allowed" }
+  integer, intent(out) :: c ! { dg-error "is not allowed" }
+  integer, intent(inout) :: d ! { dg-error "is not allowed" }
+  integer, dimension(1,1) :: e ! OK
+  integer, external, pointer :: f ! { dg-error "is not allowed" }
+  integer, intrinsic :: f ! { dg-error "is not allowed" }
+  integer, optional :: g ! { dg-error "is not allowed" }
+  integer, parameter :: h ! { dg-error "is not allowed" }
+  integer, protected :: i ! { dg-error "is not allowed" }
+  integer, private :: j ! { dg-error "is not allowed" }
+  integer, static :: k ! { dg-error "is not allowed" }
+  integer, automatic :: l ! { dg-error "is not allowed" }
+  integer, public :: m ! { dg-error "is not

Re: [PATCH] RFC: elide repeated source locations (PR other/84889)

2018-11-12 Thread Martin Sebor

On 11/11/2018 07:43 PM, David Malcolm wrote:

We often emit more than one diagnostic at the same source location.
For example, the C++ frontend can emit many diagnostics at
the same source location when suggesting overload candidates.

For example:

../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C: In function 'int 
test_3(s, t)':
../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C:38:18: error: no match for 
'operator&&' (operand types are 's' and 't')
   38 |   return param_s && param_t;
  |  ^~
../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C:38:18: note: candidate: 
'operator&&(bool, bool)' 
../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C:38:18: note:   no 
known conversion for argument 2 from 't' to 'bool'

This is overly verbose.  Note how the same location has been printed
three times, obscuring the pertinent messages.

This patch adds a new "elide" value to -fdiagnostics-show-location=
and makes it the default (previously it was "once").  With elision
the above is printed as:

../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C: In function 'int 
test_3(s, t)':
../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C:38:18: error: no match for 
'operator&&' (operand types are 's' and 't')
   38 |   return param_s && param_t;
  |  ^~
  = note: candidate: 'operator&&(bool, bool)' 
  = note:   no known conversion for argument 2 from 't' to 'bool'

where the followup notes are printed with a '=' lined up with
the source code margin.

Thoughts?


I agree the long pathname in the notes is at first glance redundant
but I'm not sure about using '=' as a shorthand for it.  I have
written many scripts to parse GCC output to extract all diagnostics
(including notes) and publish those on a Web page somewhere, as I'm
sure others have too.  All those scripts would stop working with
this change and require changes to the build system to work again.
Making those changes can be a substantial undertaking in some
organizations.

Have you considered printing just the file name instead?  Or any
other alternatives?
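
To make the concern concrete, here is a sketch of the kind of line matcher such scripts tend to use. The regular expression and the names are my own illustration, not taken from any particular tool. A conventional "file:line:col: kind: message" line parses, while an elided "= note:" line does not.

```cpp
#include <cassert>
#include <regex>
#include <string>

// One parsed diagnostic line.  Hypothetical type for illustration.
struct diagnostic
{
  std::string file;
  int line;
  int column;
  std::string kind;
  std::string message;
};

// Parse "FILE:LINE:COL: KIND: MESSAGE".  Returns false for anything else,
// including source excerpts, caret lines and elided "= note:" lines.
bool parse_diagnostic(const std::string &text, diagnostic &out)
{
  static const std::regex re(
    R"(^(.+?):([0-9]+):([0-9]+): (fatal error|error|warning|note): (.*)$)");
  std::smatch m;
  if (!std::regex_match(text, m, re))
    return false;
  assert(m.size() == 6);
  out.file = m[1].str();
  out.line = std::stoi(m[2].str());
  out.column = std::stoi(m[3].str());
  out.kind = m[4].str();
  out.message = m[5].str();
  return true;
}
```

A script built this way would silently drop every elided note unless it also learned to attach "= note:" lines to the preceding diagnostic.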

Martin


Re: PR fortran/87919 patch for -fno-dec-structure

2018-11-12 Thread Fritz Reese
On Thu, Nov 8, 2018 at 12:54 PM Jakub Jelinek  wrote:
>
> On Thu, Nov 08, 2018 at 12:09:33PM -0500, Fritz Reese wrote:
> > > What about the
> > >   /* Allow legacy code without warnings.  */
> > >   gfc_option.allow_std |= GFC_STD_F95_OBS | GFC_STD_F95_DEL
> > > | GFC_STD_GNU | GFC_STD_LEGACY;
> > >   gfc_option.warn_std &= ~(GFC_STD_LEGACY | GFC_STD_F95_DEL);
> > > that is done for value, shouldn't set_dec_flags remove those
> > > flags again?  Maybe not the allow_std ones, because those are set already 
> > > by
> > > default, perhaps just the warn_std flags?
> > >
> >
> > Sure. I wasn't convinced about this and how it might interplay with
> > -std= so I left it alone, but I suppose it makes sense to unsuppress
> > the warnings when disabling -fdec.
>
> Perhaps it might be better not to change the allow_std/warn_std flags
> during the option parsing, instead set or clear say flag_dec and
> only when option processing is being finalized (gfc_post_options)
> check if flag_dec is set and set those.  It would change behavior of
> -fdec -std=f2018 and similar though.  Not sure what users expect.
>

Actually, the gcc frontend appears to move -std= before the
language-specific options before f951 is even executed regardless of
its location compared to the -fdec flags. I don't know if this is a
bug or if it is by design -- the feeling I get is that the gcc
frontend processes it first since it is recognized before the flang
specific options. Therefore, greedily setting the standard options the
first time flag_dec appears means the standard information is lost and
I believe your suggestion is correct: the standard flags must be set
only once in gfc_post_options.

In fact the new testcase dec_bitwise_ops_3.f90 is a good test of this:
it uses -fdec -fno-dec -std=legacy to avoid warnings for XOR. With the
version posted previously, the -std=legacy is overwritten by -fno-dec
and warnings still appear. Here's what I'd change from the previous
patch to support this:

diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c
index af89a5d2faf..b7f7360215c 100644
--- a/gcc/fortran/options.c
+++ b/gcc/fortran/options.c
@@ -66,16 +66,6 @@ set_default_std_flags (void)
 static void
 set_dec_flags (int value)
 {
-  /* Allow legacy code without warnings.
- Nb. We do not unset the allowed standards with value == 0 because
- they are set by default in set_default_std_flags.  */
-  if (value)
-gfc_option.allow_std |= GFC_STD_F95_OBS | GFC_STD_F95_DEL
-  | GFC_STD_GNU | GFC_STD_LEGACY;
-
-  SET_BITFLAG (gfc_option.warn_std, !value, GFC_STD_LEGACY);
-  SET_BITFLAG (gfc_option.warn_std, !value, GFC_STD_F95_DEL);
-
   /* Set (or unset) other DEC compatibility extensions.  */
   SET_BITFLAG (flag_dollar_ok, value, value);
   SET_BITFLAG (flag_cray_pointer, value, value);
@@ -85,6 +75,24 @@ set_dec_flags (int value)
   SET_BITFLAG (flag_dec_math, value, value);
 }

+/* Finalize DEC flags.  */
+
+static void
+post_dec_flags (int value)
+{
+  /* Don't warn for legacy code if -fdec is given; however, setting -fno-dec
+ does not force these warnings.  We make one final determination on this
+ at the end because -std= is always set first; thus, we can avoid
+ clobbering the user's desired standard settings in gfc_handle_option
+ e.g. when -fdec and -fno-dec are both given.  */
+  if (value)
+{
+  gfc_option.allow_std |= GFC_STD_F95_OBS | GFC_STD_F95_DEL
+   | GFC_STD_GNU | GFC_STD_LEGACY;
+  gfc_option.warn_std &= ~(GFC_STD_LEGACY | GFC_STD_F95_DEL);
+}
+}
+
 /* Enable (or disable) -finit-local-zero.  */

 static void
@@ -248,6 +256,9 @@ gfc_post_options (const char **pfilename)
   char *source_path;
   int i;

+  /* Finalize DEC flags.  */
+  post_dec_flags (flag_dec);
+
   /* Excess precision other than "fast" requires front-end
  support.  */
   if (flag_excess_precision_cmdline == EXCESS_PRECISION_STANDARD)
@@
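
As a stand-alone model of the hunk above: option handling only records -fdec/-fno-dec, and the standard masks are adjusted once at the end. The bit values, the default masks and the treatment of -std=legacy are simplifying assumptions for illustration; they are not the real GFC_STD_* definitions or gfortran's option code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative bit values only; the real GFC_STD_* constants differ.
enum : unsigned
{
  STD_F95_OBS = 1u << 0,
  STD_F95_DEL = 1u << 1,
  STD_GNU = 1u << 2,
  STD_LEGACY = 1u << 3
};

struct std_options
{
  bool dec;
  unsigned allow_std;
  unsigned warn_std;
};

// Assumed defaults: legacy and deleted features are accepted with a warning.
std_options default_options()
{
  std_options o;
  o.dec = false;
  o.allow_std = STD_F95_OBS | STD_F95_DEL | STD_GNU | STD_LEGACY;
  o.warn_std = STD_F95_DEL | STD_LEGACY;
  return o;
}

// Option parsing only records what was asked for.
void handle_option(std_options &o, const std::string &arg)
{
  if (arg == "-fdec")
    o.dec = true;
  else if (arg == "-fno-dec")
    o.dec = false;
  else if (arg == "-std=legacy")
    o.warn_std = 0;  // Simplified: legacy mode silences the warnings.
  else
    assert(false && "option not modelled in this sketch");
}

// Runs once after all options are parsed, like post_dec_flags in the hunk.
void post_options(std_options &o)
{
  if (o.dec)
    {
      o.allow_std |= STD_F95_OBS | STD_F95_DEL | STD_GNU | STD_LEGACY;
      o.warn_std &= ~(STD_LEGACY | STD_F95_DEL);
    }
}

std_options process(const std::vector<std::string> &args)
{
  std_options o = default_options();
  for (const std::string &arg : args)
    handle_option(o, arg);
  post_options(o);
  return o;
}
```

In this model "-std=legacy -fdec -fno-dec" keeps the legacy setting intact, because -fno-dec no longer touches warn_std during parsing.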

> Directives are only processed in the current file, so it doesn't really
> matter what the included file has as directives.  One could even have the
> included one be with expected dg-error lines and then include it in
> the ones that don't expect any.

Good to know, thanks! In that case, I like your suggestion of reducing
the test cases to includes. See the newly attached patch for
updated cases.

> Anyway, that is all from me, I still don't want to stomp on Fortran
> maintainer's review (use my global reviewer's rights for that) and
> thus I'm deferring the review to them.  When committing, please make sure
> to include Mark's email in the ChangeLog next to yours to credit him.

Thanks for your comments. I think nobody will feel stomped on since
maintainers are sparse and busy. I will certainly make note of Mark's
contributions when committing.

Attached is the latest version, which builds and regtests cleanly on
x86_64-redhat-linux. OK for trunk, 7-branch, and 8-branch?

Fritz

>From 1cae11a88b29fe521e0e6c6c7c1796a7adb34cad Mon Sep 17 00:00:00 200

Re: [PATCH][LRA] Fix PR87899: r264897 cause mis-compiled native arm-linux-gnueabihf toolchain

2018-11-12 Thread Peter Bergner
On 11/12/18 6:25 AM, Renlin Li wrote:
> I tried to build a native arm-linuxeabihf toolchain with the patch.
> But I got the following ICE:

Ok, the issue was a problem in handling the src reg from a register copy.
I thought I could just remove it from the dead_set, but forgot that the
updating of the program points looks at whether the pseudo is live or
not.  The change below on top of the previous patch fixes the ICE for me.
I now add the src reg back into pseudos_live before we process the insn's
input operands so it doesn't trigger a new program point being added.
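
A toy model of the ordering issue, using std::set instead of sparsesets and ignoring hard registers and program points entirely. live_state, mark_used and process_copy_inputs are hypothetical names for illustration, not functions from lra-lives.c.

```cpp
#include <cassert>
#include <set>

struct live_state
{
  std::set<int> live;          // registers currently live
  std::set<int> start_living;  // registers that became live at this insn
};

// Mark REGNO as used: a register that was not live becomes live and is
// recorded in start_living.
void mark_used(live_state &s, int regno)
{
  if (s.live.insert(regno).second)
    s.start_living.insert(regno);
}

// Model of processing the input of a copy from SRC, where SRC was
// temporarily removed from the live set earlier in the insn.  Returns the
// resulting start_living set.  With READD_SRC_FIRST the source is put back
// before the inputs are marked, as in the patch below.
std::set<int> process_copy_inputs(live_state s, int src, bool readd_src_first)
{
  assert(s.live.count(src) == 1);  // SRC is live on entry to the insn.
  s.live.erase(src);               // Earlier step: SRC taken out of the set.
  if (readd_src_first)
    s.live.insert(src);
  s.start_living.clear();
  mark_used(s, src);
  return s.start_living;
}
```

Without the re-add, the source shows up in start_living even though it never stopped being live.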

Renlin and Jeff, can you apply this patch on top of the previous one
and see whether that is better?

Thanks.

Peter


--- gcc/lra-lives.c.orig2018-11-12 14:15:18.257657911 -0600
+++ gcc/lra-lives.c 2018-11-12 14:08:55.978795092 -0600
@@ -934,6 +934,18 @@
  || sparseset_contains_pseudos_p (start_dying))
next_program_point (curr_point, freq);
 
+  /* If we removed the source reg from a simple register copy from the
+live set above, then add it back now so we don't accidentally add
+it to the start_living set below.  */
+  if (ignore_reg != NULL_RTX)
+   {
+ int ignore_regno = REGNO (ignore_reg);
+ if (HARD_REGISTER_NUM_P (ignore_regno))
+   SET_HARD_REG_BIT (hard_regs_live, ignore_regno);
+ else
+   sparseset_set_bit (pseudos_live, ignore_regno);
+   }
+
   sparseset_clear (start_living);
 
   /* Mark each used value as live. */
@@ -959,11 +971,6 @@
 
   sparseset_and_compl (dead_set, start_living, start_dying);
 
-  /* If we removed the source reg from a simple register copy from the
-live set, then it will appear to be dead, but it really isn't.  */
-  if (ignore_reg != NULL_RTX)
-   sparseset_clear_bit (dead_set, REGNO (ignore_reg));
-
   sparseset_clear (start_dying);
 
   /* Mark early clobber outputs dead.  */



Re: [PATCH 2/6] [RS6000] rs6000_output_indirect_call

2018-11-12 Thread Bill Schmidt
On 11/6/18 11:37 PM, Alan Modra wrote:
> Like the last patch for external calls, now handle most assembly code
> for indirect calls in one place.  The patch also merges some insns,
> correcting some !rs6000_speculate_indirect_jumps cases branching to
> LR, which don't require a speculation barrier.
>
>   * config/rs6000/rs6000-protos.h (rs6000_output_indirect_call): Declare.
>   * config/rs6000/rs6000.c (rs6000_output_indirect_call): New function.
>   * config/rs6000/rs6000.md (call_indirect_nonlocal_sysv): Use
>   rs6000_output_indirect_call.
>   (call_value_indirect_nonlocal_sysv, sibcall_nonlocal_sysv): Likewise.
>   (call_indirect_aix, call_value_indirect_aix,
>   call_indirect_elfv2, call_value_indirect_elfv2): Likewise, and
>   handle both speculation and non-speculation cases.
>   (call_indirect_aix_nospec, call_value_indirect_aix_nospec): Delete.
>   (call_indirect_elfv2_nospec, call_value_indirect_elfv2_nospec): Delete.
>
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index f1a421dde16..493cfe6ba2b 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -112,6 +112,7 @@ extern void rs6000_output_function_entry (FILE *, const char *);
>  extern void print_operand (FILE *, rtx, int);
>  extern void print_operand_address (FILE *, rtx);
>  extern const char *rs6000_output_call (rtx *, unsigned int, bool, const char *);
> +extern const char *rs6000_output_indirect_call (rtx *, unsigned int, bool);
>  extern enum rtx_code rs6000_reverse_condition (machine_mode,
>  enum rtx_code);
>  extern rtx rs6000_emit_eqne (machine_mode, rtx, rtx, rtx);
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index b22cae55a0d..bf1551746d5 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -21411,6 +21411,69 @@ rs6000_output_call (rtx *operands, unsigned int fun, bool sibcall,
>return str;
>  }
>
> +/* As above, for indirect calls.  */
> +
> +const char *
> +rs6000_output_indirect_call (rtx *operands, unsigned int fun, bool sibcall)
> +{
> +  /* -Wformat-overflow workaround, without which gcc thinks that %u
> +  might produce 10 digits.  FUN is 0 or 1 as of 2018-03.  */
> +  gcc_assert (fun <= 6);
> +
> +  static char str[144];
> +  const char *ptrload = TARGET_64BIT ? "d" : "wz";
> +
> +  bool speculate = (rs6000_speculate_indirect_jumps
> + || (REG_P (operands[fun])
> + && REGNO (operands[fun]) == LR_REGNO));

Wouldn't hurt to have a comment here, indicating that we only have to generate
inefficient, speculation-inhibiting code when speculation via the count cache
has been disabled by switch, and this doesn't apply to indirect calls via the
link register.  This changes behavior of the code from before, but appears to
be safe and more correct.

> +
> +  if (DEFAULT_ABI == ABI_AIX)
> +{
> +  if (speculate)
> + sprintf (str,
> +  "l%s 2,%%%u\n\t"
> +  "b%%T%ul\n\t"
> +  "l%s 2,%%%u(1)",
> +  ptrload, fun + 2, fun, ptrload, fun + 3);
> +  else
> + sprintf (str,
> +  "crset 2\n\t"
> +  "l%s 2,%%%u\n\t"
> +  "beq%%T%ul-\n\t"
> +  "l%s 2,%%%u(1)",
> +  ptrload, fun + 2, fun, ptrload, fun + 3);
> +}
> +  else if (DEFAULT_ABI == ABI_ELFv2)
> +{
> +  if (speculate)
> + sprintf (str,
> +  "b%%T%ul\n\t"
> +  "l%s 2,%%%u(1)",
> +  fun, ptrload, fun + 2);
> +  else
> + sprintf (str,
> +  "crset 2\n\t"
> +  "beq%%T%ul-\n\t"
> +  "l%s 2,%%%u(1)",
> +  fun, ptrload, fun + 2);
> +}
> +  else if (DEFAULT_ABI == ABI_V4)
> +{
> +  if (speculate)
> + sprintf (str,
> +  "b%%T%u%s",
> +  fun, "l" + sibcall);

It's not at all clear to me what {"l" + sibcall} is doing here.
Whatever it is, it's clever enough that it warrants a comment... :-)
Does adding "l" to false result in the null string?  Is that
standard?
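
On the `"l" + sibcall` question: this is ordinary pointer arithmetic on a string literal, and it is standard C.  `sibcall` is a bool, so it promotes to 0 or 1; `"l" + 0` points at the `l`, and `"l" + 1` points at the terminating NUL, which reads as the empty string.  A minimal standalone sketch (the helper name `link_suffix` is made up for illustration and is not in the patch):

```c
#include <stdbool.h>

/* Hypothetical helper, not part of the patch.  Returns "l" when
   SIBCALL is false and "" when it is true, by stepping the pointer
   past the single character of the literal.  */
static const char *link_suffix(bool sibcall)
{
  return "l" + sibcall;
}
```

So the branch mnemonic gets the link suffix `l` for a normal call and no suffix for a sibcall.  Some compilers may warn about adding an integer to a string literal, which is one more argument for a comment at that spot.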

> +  else
> + sprintf (str,
> +  "crset 2\n\t"
> +  "beq%%T%u%s-%s",
> +  fun, "l" + sibcall, sibcall ? "\n\tb $" : "");

And similar...

> +}
> +  else
> +gcc_unreachable ();
> +  return str;
> +}
> +
>  #if defined (HAVE_GAS_HIDDEN) && !TARGET_MACHO
>  /* Emit an assembler directive to set symbol visibility for DECL to
> VISIBILITY_TYPE.  */
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 52088fdfbdb..9d9e29d12eb 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -10540,11 +10540,7 @@ (define_insn "*call_indirect_nonlocal_sysv"
>else if (INTVAL (operands[2]) & CALL_V4_CLEAR_FP_ARGS)
>  output_asm_insn ("creqv 6,6,6", operands);
>
> -  if (rs6000_speculate_indirect_jumps

[PATCH 1/4] [aarch64/arm] Updating the cost table for xgene1.

2018-11-12 Thread Christoph Muellner
*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner  

* config/arm/aarch-cost-tables.h (xgene1_extra_costs): Update the cost
table for Xgene1.
---
 gcc/config/arm/aarch-cost-tables.h | 88 +++---
 1 file changed, 44 insertions(+), 44 deletions(-)

diff --git a/gcc/config/arm/aarch-cost-tables.h b/gcc/config/arm/aarch-cost-tables.h
index 0bd93ba..2a28347 100644
--- a/gcc/config/arm/aarch-cost-tables.h
+++ b/gcc/config/arm/aarch-cost-tables.h
@@ -440,26 +440,26 @@ const struct cpu_cost_table xgene1_extra_costs =
   {
 0, /* arith.  */
 0, /* logical.  */
-0, /* shift.  */
+COSTS_N_INSNS (1), /* shift.  */
 COSTS_N_INSNS (1), /* shift_reg.  */
-COSTS_N_INSNS (1), /* arith_shift.  */
-COSTS_N_INSNS (1), /* arith_shift_reg.  */
-COSTS_N_INSNS (1), /* log_shift.  */
-COSTS_N_INSNS (1), /* log_shift_reg.  */
-COSTS_N_INSNS (1), /* extend.  */
-0, /* extend_arithm.  */
-COSTS_N_INSNS (1), /* bfi.  */
-COSTS_N_INSNS (1), /* bfx.  */
+COSTS_N_INSNS (2), /* arith_shift.  */
+COSTS_N_INSNS (2), /* arith_shift_reg.  */
+COSTS_N_INSNS (2), /* log_shift.  */
+COSTS_N_INSNS (2), /* log_shift_reg.  */
+0, /* extend.  */
+COSTS_N_INSNS (1), /* extend_arithm.  */
+0, /* bfi.  */
+0, /* bfx.  */
 0, /* clz.  */
-COSTS_N_INSNS (1), /* rev.  */
+0, /* rev.  */
 0, /* non_exec.  */
 true   /* non_exec_costs_exec.  */
   },
   {
 /* MULT SImode */
 {
-  COSTS_N_INSNS (4),   /* simple.  */
-  COSTS_N_INSNS (4),   /* flag_setting.  */
+  COSTS_N_INSNS (3),   /* simple.  */
+  COSTS_N_INSNS (3),   /* flag_setting.  */
   COSTS_N_INSNS (4),   /* extend.  */
   COSTS_N_INSNS (4),   /* add.  */
   COSTS_N_INSNS (4),   /* extend_add.  */
@@ -467,8 +467,8 @@ const struct cpu_cost_table xgene1_extra_costs =
 },
 /* MULT DImode */
 {
-  COSTS_N_INSNS (5),   /* simple.  */
-  0,   /* flag_setting (N/A).  */
+  COSTS_N_INSNS (4),   /* simple.  */
+  COSTS_N_INSNS (4),   /* flag_setting (N/A).  */
   COSTS_N_INSNS (5),   /* extend.  */
   COSTS_N_INSNS (5),   /* add.  */
   COSTS_N_INSNS (5),   /* extend_add.  */
@@ -477,55 +477,55 @@ const struct cpu_cost_table xgene1_extra_costs =
   },
   /* LD/ST */
   {
-COSTS_N_INSNS (5), /* load.  */
-COSTS_N_INSNS (6), /* load_sign_extend.  */
-COSTS_N_INSNS (5), /* ldrd.  */
+COSTS_N_INSNS (4), /* load.  */
+COSTS_N_INSNS (5), /* load_sign_extend.  */
+COSTS_N_INSNS (4), /* ldrd.  */
 COSTS_N_INSNS (5), /* ldm_1st.  */
 1, /* ldm_regs_per_insn_1st.  */
 1, /* ldm_regs_per_insn_subsequent.  */
-COSTS_N_INSNS (10),/* loadf.  */
-COSTS_N_INSNS (10),/* loadd.  */
-COSTS_N_INSNS (5), /* load_unaligned.  */
+COSTS_N_INSNS (9), /* loadf.  */
+COSTS_N_INSNS (9), /* loadd.  */
+0, /* load_unaligned.  */
 0, /* store.  */
 0, /* strd.  */
 0, /* stm_1st.  */
 1, /* stm_regs_per_insn_1st.  */
 1, /* stm_regs_per_insn_subsequent.  */
-0, /* storef.  */
-0, /* stored.  */
+COSTS_N_INSNS (3), /* storef.  */
+COSTS_N_INSNS (3), /* stored.  */
 0, /* store_unaligned.  */
-COSTS_N_INSNS (1), /* loadv.  */
-COSTS_N_INSNS (1)  /* storev.  */
+COSTS_N_INSNS (9), /* loadv.  */
+COSTS_N_INSNS (3)  /* storev.  */
   },
   {
 /* FP SFmode */
 {
-  COSTS_N_INSNS (23),  /* div.  */
-  COSTS_N_INSNS (5),   /* mult.  */
-  COSTS_N_INSNS (5),   /* mult_addsub. */
-  COSTS_N_INSNS (5),   /* fma.  */
-  COSTS_N_INSNS (5),   /* addsub.  */
-  COSTS_N_INSNS (2),   /* fpconst. */
-  COSTS_N_INSNS (3),   /* neg.  */
-  COSTS_N_INSNS (2),   /* compare.  */
-  COSTS_N_INSNS (6),   /* widen.  */
-  COSTS_N_INSNS (6),   /* narrow.  */
+  COSTS_N_INSNS (22),  /* div.  */
+  COSTS_N_INSNS (4),   /* mult.  */
+  COSTS_N_INSNS (4),   /* mult_addsub. */
+  COSTS_N_INSNS (4),   /* fma.  */
+  COSTS_N_INSNS (4),   /* addsub.  */
+  COSTS_N_INSNS (1),   /* fpconst. */
+  COSTS_N_INSNS (4),   /* neg.  */
+  COSTS_N_INSNS (9),   /* compare.  */
+  COSTS_N_INSNS (4),   /* widen.  */
+  COSTS_N_INSNS (4),

[PATCH 4/4] [aarch64] Update xgene1 tuning struct.

2018-11-12 Thread Christoph Muellner
*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner  

* config/aarch64/aarch64.c (xgene1_tunings): Optimize Xgene1 tunings for
GCC 9.
---
 gcc/config/aarch64/aarch64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 903f4e2..f7f88a9 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -944,14 +944,14 @@ static const struct tune_params xgene1_tunings =
   4, /* issue_rate  */
   AARCH64_FUSE_NOTHING, /* fusible_ops  */
   "16",/* function_align.  */
-  "8", /* jump_align.  */
+  "16",/* jump_align.  */
   "16",/* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
   1,   /* vec_reassoc_width.  */
   2,   /* min_div_recip_mul_sf.  */
   2,   /* min_div_recip_mul_df.  */
-  0,   /* max_case_values.  */
+  17,  /* max_case_values.  */
   tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),   /* tune_flags.  */
   &xgene1_prefetch_tune
-- 
2.9.5



[PATCH 2/4] [aarch64] Update xgene1_addrcost_table.

2018-11-12 Thread Christoph Muellner
*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner  

* config/aarch64/aarch64.c (xgene1_addrcost_table): Correct the post
modify costs.
---
 gcc/config/aarch64/aarch64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 815f824..a6bc1fb 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -254,7 +254,7 @@ static const struct cpu_addrcost_table xgene1_addrcost_table =
   1, /* ti  */
 },
   1, /* pre_modify  */
-  0, /* post_modify  */
+  1, /* post_modify  */
   0, /* register_offset  */
   1, /* register_sextend  */
   1, /* register_zextend  */
-- 
2.9.5



[PATCH 3/4] [aarch64] Add xgene1 prefetch tunings.

2018-11-12 Thread Christoph Muellner
*** gcc/ChangeLog ***

2018-xx-xx  Christoph Muellner  

* config/aarch64/aarch64.c (xgene1_tunings): Add Xgene1 specific
prefetch tunings.
---
 gcc/config/aarch64/aarch64.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a6bc1fb..903f4e2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -662,6 +662,17 @@ static const cpu_prefetch_tune tsv110_prefetch_tune =
   -1/* default_opt_level  */
 };
 
+static const cpu_prefetch_tune xgene1_prefetch_tune =
+{
+  8,   /* num_slots  */
+  32,  /* l1_cache_size  */
+  64,  /* l1_cache_line_size  */
+  256, /* l2_cache_size  */
+  true, /* prefetch_dynamic_strides */
+  -1,   /* minimum_stride */
+  -1   /* default_opt_level  */
+};
+
 static const struct tune_params generic_tunings =
 {
   &cortexa57_extra_costs,
@@ -943,7 +954,7 @@ static const struct tune_params xgene1_tunings =
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),   /* tune_flags.  */
-  &generic_prefetch_tune
+  &xgene1_prefetch_tune
 };
 
 static const struct tune_params qdf24xx_tunings =
-- 
2.9.5



[doc, committed] clarify rtl docs about mode of high and lo_sum

2018-11-12 Thread Sandra Loosemore
I've checked in this patch for PR 21110.  As noted in the issue, RTL 
high and lo_sum expressions don't have to be Pmode and are not 
restricted to address operands.
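
As a concrete illustration of the large-immediate case, here is a minimal C sketch of how a 32-bit constant can be split into a high part and a low part and then recombined, the way a `high`/`lo_sum` pair would be on a hypothetical RISC target whose low-part instruction adds a sign-extended 16-bit immediate.  The 16/16 split, the sign extension, and the function names are assumptions made for the example; the real number of bits is machine-dependent, as the documentation says.

```c
#include <stdint.h>

/* High part: the upper 16 bits, rounded so that adding the
   sign-extended low part gives back the original value.  */
static uint32_t high_part(uint32_t value)
{
  return (value + 0x8000u) & 0xffff0000u;
}

/* Low part: the lower 16 bits, interpreted as a signed immediate.  */
static int32_t low_part(uint32_t value)
{
  uint32_t lo = value & 0xffffu;
  return lo >= 0x8000u ? (int32_t)lo - 0x10000 : (int32_t)lo;
}

/* Models (lo_sum (high value) value): high part plus low part.  */
static uint32_t recombine(uint32_t value)
{
  return high_part(value) + (uint32_t)low_part(value);
}
```

Nothing in this sketch depends on the constant being an address, which is the point of the documentation change: the same pairing works for any value of mode @var{m}.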


-Sandra
2018-11-12  Sandra Loosemore  

	PR middle-end/21110

	gcc/
	* doc/rtl.texi (Constants): Clarify that mode of "high" doesn't
	have to be Pmode.
	(Arithmetic): Likewise for "lo_sum".
Index: gcc/doc/rtl.texi
===
--- gcc/doc/rtl.texi	(revision 266034)
+++ gcc/doc/rtl.texi	(working copy)
@@ -1883,14 +1883,14 @@ of relocation operator.  @var{m} should
 
 @findex high
 @item (high:@var{m} @var{exp})
-Represents the high-order bits of @var{exp}, usually a
-@code{symbol_ref}.  The number of bits is machine-dependent and is
+Represents the high-order bits of @var{exp}.  
+The number of bits is machine-dependent and is
 normally the number of bits specified in an instruction that initializes
 the high order bits of a register.  It is used with @code{lo_sum} to
 represent the typical two-instruction sequence used in RISC machines to
-reference a global memory location.
-
-@var{m} should be @code{Pmode}.
+reference large immediate values and/or link-time constants such
+as global memory addresses.  In the latter case, @var{m} is @code{Pmode}
+and @var{exp} is usually a constant expression involving @code{symbol_ref}.
 @end table
 
 @findex CONST0_RTX
@@ -2429,15 +2429,15 @@ saturates at the maximum signed value re
 
 This expression represents the sum of @var{x} and the low-order bits
 of @var{y}.  It is used with @code{high} (@pxref{Constants}) to
-represent the typical two-instruction sequence used in RISC machines
-to reference a global memory location.
+represent the typical two-instruction sequence used in RISC machines to
+reference large immediate values and/or link-time constants such
+as global memory addresses.  In the latter case, @var{m} is @code{Pmode}
+and @var{y} is usually a constant expression involving @code{symbol_ref}.
 
 The number of low order bits is machine-dependent but is
-normally the number of bits in a @code{Pmode} item minus the number of
+normally the number of bits in mode @var{m} minus the number of
 bits set by @code{high}.
 
-@var{m} should be @code{Pmode}.
-
 @findex minus
 @findex ss_minus
 @findex us_minus


RE: PING [PATCH] RX new builtin function

2018-11-12 Thread Sebastian Perta
PING

> -Original Message-
> From: Sebastian Perta 
> Sent: 24 October 2018 18:19
> To: 'gcc-patches@gcc.gnu.org' 
> Cc: 'Nick Clifton' 
> Subject: [PATCH] RX new builtin function
> 
> Hi,
> 
> The following patch adds a new builtin function for rx
> (__builtin_rx_bset) to make it possible for the user to use BSET
> whenever necessary.
> Please note this builtin function is dedicated only to the 32-bit
> variant of BSET (when the destination is a register).
> For the 8-bit variant (when the destination is a memory location)
> another builtin function is necessary.
> 
> The patch also contains a test case, which I added in
> testsuite/gcc.target/rx.
> 
> The patch also modifies extend.texi as necessary.
> 
> Regression test is OK, tested with the following command:
> make -k check-gcc RUNTESTFLAGS=--target_board=rx-sim
> 
> Please find below the changelog entries and patch.
> 
> Best Regards,
> Sebastian
> 
> --- ChangeLog
> 2018-10-23  Sebastian Perta  
> 
>   * config/rx/rx.c (RX_BUILTIN_BSET): New enum.
>   * config/rx/rx.c (rx_init_builtins): Added new builtin for BSET.
>   * config/rx/rx.c (rx_expand_builtin_bit_manip): New function.
>   * config/rx/rx.c (rx_expand_builtin): Added new case for BSET.
>   * doc/extend.texi (RX Built-in Functions): Added declaration for
>   __builtin_rx_bset.
> 
> testsuite/ChangeLog
> 2018-10-23  Sebastian Perta  
> 
>   * gcc.target/rx/testbset.c: New test.
> 
> 
> 
> 
> 
> Index: config/rx/rx.c
> ==
> =
> --- config/rx/rx.c(revision 265425)
> +++ config/rx/rx.c(working copy)
> @@ -2374,6 +2374,7 @@
>RX_BUILTIN_ROUND,
>RX_BUILTIN_SETPSW,
>RX_BUILTIN_WAIT,
> +  RX_BUILTIN_BSET,
>RX_BUILTIN_max
>  };
> 
> @@ -2440,6 +2441,7 @@
>ADD_RX_BUILTIN1 (ROUND,   "round",   intSI, float);
>ADD_RX_BUILTIN1 (REVW,"revw",intSI, intSI);
>ADD_RX_BUILTIN0 (WAIT,"wait",void);
> +  ADD_RX_BUILTIN2 (BSET,"bset",intSI, intSI, intSI);
>  }
> 
>  /* Return the RX builtin for CODE.  */
> @@ -2576,6 +2578,26 @@
>return target;
>  }
> 
> +static rtx
> +rx_expand_builtin_bit_manip(tree exp, rtx target, rtx (* gen_func)(rtx, rtx, rtx))
> +{
> +  rtx arg1 = expand_normal (CALL_EXPR_ARG (exp, 0));
> +  rtx arg2 = expand_normal (CALL_EXPR_ARG (exp, 1));
> +
> +  if (! REG_P (arg1))
> +arg1 = force_reg (SImode, arg1);
> +
> +  if (! REG_P (arg2))
> +arg2 = force_reg (SImode, arg2);
> +
> +  if (target == NULL_RTX || ! REG_P (target))
> + target = gen_reg_rtx (SImode);
> +
> +  emit_insn(gen_func(target, arg2, arg1));
> +
> +  return target;
> +}
> +
>  static int
>  valid_psw_flag (rtx op, const char *which)
>  {
> @@ -2653,6 +2675,7 @@
>  case RX_BUILTIN_REVW:return rx_expand_int_builtin_1_arg
>   (op, target, gen_revw, false);
>  case RX_BUILTIN_WAIT:emit_insn (gen_wait ()); return NULL_RTX;
> + case RX_BUILTIN_BSET:   return rx_expand_builtin_bit_manip(exp, target, gen_bitset);
> 
>  default:
>internal_error ("bad builtin code");
> Index: doc/extend.texi
> ==
> =
> --- doc/extend.texi   (revision 265425)
> +++ doc/extend.texi   (working copy)
> @@ -19635,6 +19635,10 @@
>  Generates the @code{wait} machine instruction.
>  @end deftypefn
> 
> +@deftypefn {Built-in Function}  int __builtin_rx_bset (int, int)
> +Generates the @code{bset} machine instruction.
> +@end deftypefn
> +
>  @node S/390 System z Built-in Functions
>  @subsection S/390 System z Built-in Functions
>  @deftypefn {Built-in Function} int __builtin_tbegin (void*)
> Index: testsuite/ChangeLog
> Index: testsuite/gcc.target/rx/testbset.c
> ==
> =
> --- testsuite/gcc.target/rx/testbset.c(nonexistent)
> +++ testsuite/gcc.target/rx/testbset.c(working copy)
> @@ -0,0 +1,53 @@
> +/* { dg-do run } */
> +
> +#include <stdlib.h>
> +
> +int f1(int a, int b) __attribute((noinline));
> +int f1(int a, int b)
> +{
> + return __builtin_rx_bset (a, b);
> +}
> +
> +int f2(int a) __attribute((noinline));
> +int f2(int a)
> +{
> + return __builtin_rx_bset (a, 1);
> +}
> +
> +int x, y;
> +
> +int f3() __attribute((noinline));
> +int f3()
> +{
> + return __builtin_rx_bset (x, 4);
> +}
> +
> +int f4() __attribute((noinline));
> +int f4()
> +{
> + return __builtin_rx_bset (x, y);
> +}
> +
> +void f5() __attribute((noinline));
> +void f5()
> +{
> + x = __builtin_rx_bset (x, 6);
> +}
> +
> +int main()
> +{
> + if(f1(0xF, 8) != 0x10F)
> + abort();
> + if(f2(0xC) != 0xE)
> + abort();
> + x = 0xF;
> + if(f3() != 0x1F)
> + abort();
> + y = 5;
> + if(f4() != 0x2F)
> + abort();
> + f5();
> + if(x != 0x4F)
> + abort();
> + exit(0);
> +}
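
For readers checking the expected values in the quoted test: they are consistent with BSET setting bit number `b` in `a`.  A portable C model of that behaviour, as a sketch (the name `model_bset` is hypothetical, and masking the bit number to 0..31 is an assumption made here to keep the shift well defined, not something stated in the patch):

```c
/* Portable model of what the quoted test expects from
   __builtin_rx_bset (value, bit): return VALUE with bit BIT set.  */
static int model_bset(int value, int bit)
{
  /* Assumption of this sketch: only the low five bits of BIT
     select the bit to set.  */
  unsigned int mask = 1u << ((unsigned int)bit & 31u);
  return (int)((unsigned int)value | mask);
}
```

With this model, `model_bset (0xF, 8)` is 0x10F and `model_bset (0xC, 1)` is 0xE, matching what the test expects from f1 and f2.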



Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-11-12 Thread Jeff Law
On 11/12/18 10:52 AM, Andrew Stubbs wrote:
> On 12/11/2018 17:20, Segher Boessenkool wrote:
>> If you don't want useless USEs deleted, use UNSPEC_VOLATILE instead?
>> Or actually use the register, i.e. as input to an actually needed
>> instruction.
> 
> They're not useless. If we want to do scalar operations in vector
> registers (and we often do, on this target), then we need to write a "1"
> into the EXEC (vector mask) register.
Presumably you're setting up active lanes or some such.  This may
ultimately be better modeled by ignoring the problem until much later in
the pipeline.

Shortly before assembly output you run a little LCM-like pass to find
optimal points to insert the assignment to the vector register.  It's a
lot like the mode switching stuff or zeroing the upper halves of the AVX
registers to avoid partial register stalls.  The local properties are
different, but these all feel like the same class of problem.
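
To make the idea concrete, here is a toy sketch of only the local (single basic block) part of such a pass: walk the instructions and emit a write to EXEC only when the state an instruction needs differs from the state already in the register.  Everything here is hypothetical (the enum, the function name, the two-state model of EXEC); it is not GCC code, and a real pass would also need the global dataflow step, the LCM part, to place writes across block boundaries.

```c
#include <stddef.h>

/* Toy model, not GCC code: the EXEC state each instruction needs.
   EXEC_ANY means the instruction does not care.  */
enum exec_need { EXEC_ANY, EXEC_ONE, EXEC_ALL };

/* Walk one basic block and count how many EXEC writes are needed if
   a write is emitted only when the required state differs from the
   state currently known to be in the register.  ENTRY is the state
   on block entry (EXEC_ANY means unknown).  */
static size_t count_exec_writes(const enum exec_need *insns, size_t n,
                                enum exec_need entry)
{
  enum exec_need current = entry;
  size_t writes = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (insns[i] == EXEC_ANY || insns[i] == current)
        continue;               /* No write needed for this insn.  */
      writes++;                 /* A write would be inserted here.  */
      current = insns[i];
    }
  return writes;
}
```

Under this model the instruction patterns themselves never mention the write; it is materialised late, which is the property being suggested above.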

> 
> Unless we want to rewrite all scalar operations in terms of vec_merge
> then there's no way to "actually use the register".
I think you need to correctly model it.  If you lie to the compiler
about what's going on, you're going to run into problems.
> 
> I might investigate putting the USE inside an UNSPEC_VOLATILE. That
> would have the advantage of letting combine run again. This feels like a
> future project I'd rather not have block the port submission though.
The gcn_legitimate_combined_insn code isn't really acceptable though.
You need a cleaner solution here.

> 
> If there are two instructions that both have an UNSPEC_VOLATILE, will
> combine coalesce them into one in the combined pattern?
I think you can put a different constant on each.

jeff


Re: [PATCH] detect attribute mismatches in alias declarations (PR 81824)

2018-11-12 Thread Martin Sebor

On 11/12/2018 11:29 AM, Matthew Malcomson wrote:

> Hello Martin,
>
> The new testcase Wattribute-alias.c fails on targets without ifunc
> support (e.g. aarch64-none-elf cross-build).
>
> It seems that just adding a directive `{ dg-require-ifunc "" }` to the
> test file changes the test to unsupported instead of having a fail.
>
> I don't know much about this patch so I don't know if the non-ifunc
> checks would still be useful on such targets.
>
> Would the simple change be OK? or would it be best to split the test
> file into multiple parts to still run the other checks?


I just committed the former change earlier today but splitting
the test would have probably been a better way to go.  Thanks
for reporting it just the same!  If you would prefer to split
the test that would be fine with me.

Martin


> Regards,
> Matthew
>
>
> On 09/11/18 17:33, Martin Sebor wrote:
> >>> +/* Handle the "copy" attribute by copying the set of attributes
> >>> +   from the symbol referenced by ARGS to the declaration of *NODE.  */
> >>> +
> >>> +static tree
> >>> +handle_copy_attribute (tree *node, tree name, tree args,
> >>> +   int flags, bool *no_add_attrs)
> >>> +{
> >>> +  /* Break cycles in circular references.  */
> >>> +  static hash_set attr_copy_visited;
> >> Does this really need to be static?
> >
> > The variable was intended to break cycles in recursive calls to
> > the function for self-referential applications of attribute copy
> > but since the attribute itself is not applied (anymore) such cycles
> > can no longer form.  I have removed the variable and simplified
> > the handlers (there are tests to verify this works correctly).
> >
> >>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> >>> index cfe6a8e..8ffb0cd 100644
> >>> --- a/gcc/doc/extend.texi
> >>> +++ b/gcc/doc/extend.texi
> >>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> >>> index 5c95f67..c027acd 100644
> >>> --- a/gcc/doc/invoke.texi
> >>> +++ b/gcc/doc/invoke.texi
> >> [ ... ]
> >>
> >>> +
> >>> +In C++, the warning is issued when an explicitcspecialization of a primary
> >> "explicitcspecialization" ? :-)
> >>
> >
> > Fixed.
> >
> >>
> >> Looks pretty good.  There's the explicit specialization nit and the
> >> static vs auto question for attr_copy_visited.  Otherwise it's OK.
> >
> > Thanks.  I've retested a revision with the changes discussed here
> > and committed it as r265980.
> >
> > Martin






Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-11-12 Thread Segher Boessenkool
On Mon, Nov 12, 2018 at 05:52:25PM +, Andrew Stubbs wrote:
> On 12/11/2018 17:20, Segher Boessenkool wrote:
> >If you don't want useless USEs deleted, use UNSPEC_VOLATILE instead?
> >Or actually use the register, i.e. as input to an actually needed
> >instruction.
> 
> They're not useless.

> >If combine is changing an X and a USE to just that X if it can, combine
> >is doing a great job!

Actually, it is incorrect to delete a USE.

Please open a PR.  Thanks.


Segher


Re: [PATCH] detect attribute mismatches in alias declarations (PR 81824)

2018-11-12 Thread Matthew Malcomson
Hello Martin,

The new testcase Wattribute-alias.c fails on targets without ifunc 
support (e.g. aarch64-none-elf cross-build).

It seems that just adding a directive `{ dg-require-ifunc "" }` to the 
test file changes the test to unsupported instead of having a fail.

I don't know much about this patch so I don't know if the non-ifunc 
checks would still be useful on such targets.

Would the simple change be OK? or would it be best to split the test 
file into multiple parts to still run the other checks?

Regards,
Matthew


On 09/11/18 17:33, Martin Sebor wrote:
>>> +/* Handle the "copy" attribute by copying the set of attributes
>>> +   from the symbol referenced by ARGS to the declaration of *NODE.  */
>>> +
>>> +static tree
>>> +handle_copy_attribute (tree *node, tree name, tree args,
>>> +   int flags, bool *no_add_attrs)
>>> +{
>>> +  /* Break cycles in circular references.  */
>>> +  static hash_set attr_copy_visited;
>> Does this really need to be static?
>
> The variable was intended to break cycles in recursive calls to
> the function for self-referential applications of attribute copy
> but since the attribute itself is not applied (anymore) such cycles
> can no longer form.  I have removed the variable and simplified
> the handlers (there are tests to verify this works correctly).
>
>>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>>> index cfe6a8e..8ffb0cd 100644
>>> --- a/gcc/doc/extend.texi
>>> +++ b/gcc/doc/extend.texi
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>> index 5c95f67..c027acd 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>> [ ... ]
>>
>>> +
>>> +In C++, the warning is issued when an explicitcspecialization of a primary
>> "explicitcspecialization" ? :-)
>>
>
> Fixed.
>
>>
>> Looks pretty good.  There's the explicit specialization nit and the
>> static vs auto question for attr_copy_visited.  Otherwise it's OK.
>
> Thanks.  I've retested a revision with the changes discussed here
> and committed it as r265980.
>
> Martin



Re: [PATCH 2/3][GCC][AARCH64] Add new -mbranch-protection option to combine pointer signing and BTI

2018-11-12 Thread Sudakshina Das
Hi Sam

On 02/11/18 17:31, Sam Tebbs wrote:
> Hi all,
> 
> The -mbranch-protection option combines the functionality of
> -msign-return-address and the BTI features new in Armv8.5 to better reflect
> their relationship. This new option therefore supersedes and deprecates the
> existing -msign-return-address option.
> 
> -mbranch-protection=[none|standard|] - Turns on different types of branch
> protection available where:
> 
>   * "none": Turn off all types of branch protection
>   * "standard" : Turns on all the types of protection to their respective
> standard levels.
>   *  can be "+" separated protection types:
> 
>   * "bti" : Branch Target Identification Mechanism.
>   * "pac-ret{+leaf+b-key}": Return Address Signing. The default return
> address signing is enabled by signing functions that save the return
> address to memory (non-leaf functions will practically always do this)
> using the a-key. The optional tuning arguments allow the user to
> extend the scope of return address signing to include leaf functions
> and to change the key to b-key. The tuning arguments must proceed the
> protection type "pac-ret".
> 
> Thus -mbranch-protection=standard -> -mbranch-protection=bti+pac-ret.
> 
> Its mapping to -msign-return-address is as follows:
> 
>   * -mbranch-protection=none -> -msign-return-address=none
>   * -mbranch-protection=standard -> -msign-return-address=leaf
>   * -mbranch-protection=pac-ret -> -msign-return-address=non-leaf
>   * -mbranch-protection=pac-ret+leaf -> -msign-return-address=all
> 
> This patch implements the option's skeleton and the "none", "standard" and
> "pac-ret" types (along with its "leaf" subtype).
> 
> The previous patch in this series is here:
> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00103.html
> 
> Bootstrapped successfully and tested on aarch64-none-elf with no regressions.
> 
> OK for trunk?
> 

Thanks for doing this. I am not a maintainer so you will need a
maintainer's approval. The only nit I would add is that it would
be good to have more test coverage, especially for the new parsing
functions that have been added and the errors that are added.

For example, check a few valid and invalid combinations of the options,
like:
-mbranch-protection=pac-ret -mbranch-protection=none //disables everything
-mbranch-protection=leaf  //errors out
-mbranch-protection=none+pac-ret //errors out
... etc
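
As a rough illustration of the acceptance rules such tests would exercise, here is a small standalone C sketch of a validator for a single option argument.  It is hypothetical and is not the parser from the patch: the function names are invented, and it only knows the pieces this patch implements ("none", "standard", and "pac-ret" with its optional "leaf" argument).

```c
#include <stdbool.h>
#include <string.h>

/* True if the LEN characters at P spell exactly NAME.  */
static bool token_is(const char *p, size_t len, const char *name)
{
  return len == strlen(name) && memcmp(p, name, len) == 0;
}

/* Hypothetical check of one -mbranch-protection= argument.
   "none" and "standard" must stand alone; "leaf" is only valid
   after "pac-ret"; empty or unknown tokens are rejected.  */
static bool branch_protection_valid(const char *arg)
{
  bool seen_pac_ret = false, seen_leaf = false, exclusive = false;
  size_t ntokens = 0;

  for (const char *p = arg;;)
    {
      const char *end = strchr(p, '+');
      size_t len = end ? (size_t)(end - p) : strlen(p);

      ntokens++;
      if (token_is(p, len, "none") || token_is(p, len, "standard"))
        exclusive = true;
      else if (token_is(p, len, "pac-ret") && !seen_pac_ret)
        seen_pac_ret = true;
      else if (token_is(p, len, "leaf") && seen_pac_ret && !seen_leaf)
        seen_leaf = true;
      else
        return false;

      if (!end)
        break;
      p = end + 1;
    }
  return !(exclusive && ntokens > 1);
}
```

Under these assumed rules "pac-ret" and "pac-ret+leaf" are accepted, while "leaf" and "none+pac-ret" are rejected, which lines up with the combinations listed above.  How a later option overrides an earlier one on the command line is a separate matter that this sketch does not model.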

Also instead of removing all the old deprecated options, you can keep
one (or a copy of one) to check for the deprecated warning.


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e290128f535f3e6b515bff5a81fae0aa0d1c8baf..07cfe69dc3dd9161a2dd93089ccf52ef251208d2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -15221,13 +15222,18 @@ accessed using a single instruction and emitted after each function.  This
  limits the maximum size of functions to 1MB.  This is enabled by default for
  @option{-mcmodel=tiny}.

-@item -msign-return-address=@var{scope}
-@opindex msign-return-address
-Select the function scope on which return address signing will be applied.
-Permissible values are @samp{none}, which disables return address signing,
-@samp{non-leaf}, which enables pointer signing for functions which are not leaf
-functions, and @samp{all}, which enables pointer signing for all functions.  The
-default value is @samp{none}.
+@item -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}]
+@opindex mbranch-protection
+Select the branch protection features to use.
+@samp{none} is the default and turns off all types of branch protection.
+@samp{standard} turns on all types of branch protection features.  If a feature
+has additional tuning options, then @samp{standard} sets it to its standard
+level.
+@samp{pac-ret[+@var{leaf}]} turns on return address signing to its standard
+level: signing functions that save the return address to memory (non-leaf
+functions will practically always do this) using the a-key.  The optional
+argument @samp{leaf} can be used to extend the signing to include leaf
+functions.

I am not sure if deleting the previous documentation of
-msign-return-address is the way to go. Maybe add a "this has been
deprecated and refer to -mbranch-protection" to its description.

Thanks
Sudi

> gcc/ChangeLog:
> 
> 2018-11-02  Sam Tebbs
> 
>   * config/aarch64/aarch64.c (BRANCH_PROTEC_STR_MAX,
>   aarch64_parse_branch_protection,
>   struct aarch64_branch_protec_type,
>   aarch64_handle_no_branch_protection,
>   aarch64_handle_standard_branch_protection,
>   aarch64_validate_mbranch_protection,
>   aarch64_handle_pac_ret_protection,
>   aarch64_handle_attr_branch_protection,
>   accepted_branch_protection_string,
>   aarch64_pac_ret_subtypes,
>   aarch64_branch_protec_types,
>   aarch64_handle_pac_ret_leaf): Define.
>   (aarch64_override_options_after_change_1): Add ch

Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

2018-11-12 Thread Kyrill Tkachov



On 12/11/18 14:10, Richard Biener wrote:

On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov
 wrote:

On 09/11/18 12:18, Richard Biener wrote:

On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
 wrote:

Hi all,

In this testcase the codegen for VLA SVE is worse than it could be due to
unrolling:

fully_peel_me:
  mov     x1, 5
  ptrue   p1.d, all
  whilelo p0.d, xzr, x1
  ld1d    z0.d, p0/z, [x0]
  fadd    z0.d, z0.d, z0.d
  st1d    z0.d, p0, [x0]
  cntd    x2
  addvl   x3, x0, #1
  whilelo p0.d, x2, x1
  beq     .L1
  ld1d    z0.d, p0/z, [x0, #1, mul vl]
  fadd    z0.d, z0.d, z0.d
  st1d    z0.d, p0, [x3]
  cntw    x2
  incb    x0, all, mul #2
  whilelo p0.d, x2, x1
  beq     .L1
  ld1d    z0.d, p0/z, [x0]
  fadd    z0.d, z0.d, z0.d
  st1d    z0.d, p0, [x0]
.L1:
  ret

In this case, due to the vector-length-agnostic nature of SVE the compiler
doesn't know the loop iteration count.
For such loops we don't want to unroll if we don't end up eliminating
branches, as this just bloats code size and hurts icache performance.

This patch introduces a new unroll-known-loop-iterations-only param that
disables cunroll when the loop iteration count is unknown (SCEV_NOT_KNOWN).
This case occurs much more often for SVE VLA code, but it does help some
Advanced SIMD cases as well where loops with an unknown iteration count are
not unrolled when it doesn't eliminate the branches.
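
The source of the testcase is not shown in this message.  A loop of roughly the following shape would be consistent with the assembly above (a trip count of 5, each element added to itself); this is a reconstruction offered as an assumption, not the actual testcase:

```c
/* Assumed shape of the loop behind the assembly: five doubles,
   each doubled in place.  */
void fully_peel_me(double *x)
{
  for (int i = 0; i < 5; i++)
    x[i] = x[i] + x[i];
}
```

The scalar trip count is a known constant here, but with vector-length-agnostic code the number of vector iterations needed to cover five elements depends on the runtime vector length, which is why the vectorized loop's iteration count is unknown at compile time.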

So for the above testcase we generate now:
fully_peel_me:
  mov     x2, 5
  mov     x3, x2
  mov     x1, 0
  whilelo p0.d, xzr, x2
  ptrue   p1.d, all
.L2:
  ld1d    z0.d, p0/z, [x0, x1, lsl 3]
  fadd    z0.d, z0.d, z0.d
  st1d    z0.d, p0, [x0, x1, lsl 3]
  incd    x1
  whilelo p0.d, x1, x3
  bne     .L2
  ret

Not perfect still, but it's preferable to the original code.
The new param is enabled by default on aarch64 but disabled for other targets, 
leaving their behaviour unchanged
(until other target people experiment with it and set it, if appropriate).

Bootstrapped and tested on aarch64-none-linux-gnu.
Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in 
performance.

Ok for trunk?

Hum.  Why introduce a new --param and not simply key on
flag_peel_loops instead?  That is
enabled by default at -O3 and with FDO but you of course can control
that in your targets
post-option-processing hook.

You mean like this?
It's certainly a simpler patch, but I was just a bit hesitant about making this 
change for all targets :)
But I suppose it's a reasonable change.

No, that change is backward.  What I said is that peeling is already
conditional on
flag_peel_loops and that is enabled by -O3.  So you want to disable
flag_peel_loops for
SVE instead in the target.


Sorry, I got confused by the similarly named functions.
I'm talking about try_unroll_loop_completely when run as part of 
canonicalize_induction_variables i.e. the "ivcanon" pass
(sorry about blaming cunroll here). This doesn't get called through the 
try_unroll_loops_completely path.

try_unroll_loop_completely doesn't get disabled with -fno-peel-loops or 
-fno-unroll-loops.
Maybe disabling peeling inside try_unroll_loop_completely itself when 
!flag_peel_loops is viable?
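
The suggestion above can be pictured with a small standalone model. This is only a sketch of the decision being proposed, not GCC code: LoopSummary, allow_complete_unroll and max_unroll_times are hypothetical names, and the real try_unroll_loop_completely considers much more (size estimates, edge probabilities and so on).

```cpp
// Hypothetical summary of what the middle end knows about a loop.
struct LoopSummary {
  bool exact_niter_known;  // false models the SCEV_NOT_KNOWN case
  unsigned max_niter;      // upper bound on the iteration count
};

// Sketch of the proposed gating: when the exact count is unknown, every
// unrolled copy keeps its exit test, so "complete unrolling" is really
// peeling and is only allowed when peeling is enabled.
bool allow_complete_unroll(const LoopSummary& loop, bool flag_peel_loops,
                           unsigned max_unroll_times) {
  if (loop.max_niter > max_unroll_times)
    return false;            // too large to unroll at all
  if (loop.exact_niter_known)
    return true;             // exit branches can be eliminated
  return flag_peel_loops;    // unknown count: defer to -fpeel-loops
}
```

With this shape, a target that turns off flag_peel_loops in its option-override hook would stop the SVE loop from being peeled, while loops with a known iteration count would be unaffected.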

Thanks,
Kyrill


It might also make sense to have more fine-grained control for this
and allow a target
to say whether it wants to peel a specific loop or not when the
middle-end thinks that
would be profitable.

Can be worth looking at as a follow-up. Do you envisage the target analysing
the gimple statements of the loop to figure out its cost?

Kind-of.  Sth like

   bool targetm.peel_loop (struct loop *);

I have no idea whether you can easily detect a SVE vectorized loop though.
Maybe there's always a special IV or so (the mask?)

Richard.


Thanks,
Kyrill


2018-11-09  Kyrylo Tkachov  

 * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not unroll
 loop when number of iterations is not known and flag_peel_loops is in
 effect.

2018-11-09  Kyrylo Tkachov  

 * gcc.target/aarch64/sve/unroll-1.c: New test.





Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-11-12 Thread Andrew Stubbs

On 12/11/2018 17:20, Segher Boessenkool wrote:

If you don't want useless USEs deleted, use UNSPEC_VOLATILE instead?
Or actually use the register, i.e. as input to an actually needed
instruction.


They're not useless. If we want to do scalar operations in vector 
registers (and we often do, on this target), then we need to write a "1" 
into the EXEC (vector mask) register.


Unless we want to rewrite all scalar operations in terms of vec_merge 
then there's no way to "actually use the register".


There are additional patterns that do scalar operations in scalar 
registers, and therefore do not depend on EXEC, but there is not a 
complete set of instructions for these, so usually we don't use those 
until reload_completed (via splits). I did think of simply disabling 
them until reload_completed, but there are cases where we do want them, 
so that didn't work.


Of course, it's possible that we took a wrong turn early on and ended up 
with a sub-optimal arrangement, but it is where we are.



If combine is changing an X and a USE to just that X if it can, combine
is doing a great job!


Not if the "simpler" instruction is somehow more expensive. And, in our 
case, it isn't the instruction itself that is more expensive, but the 
extra instructions that may (or may not) need to be inserted around it 
later.


I might investigate putting the USE inside an UNSPEC_VOLATILE. That 
would have the advantage of letting combine run again. This feels like a 
future project I'd rather not have block the port submission though.


If there are two instructions that both have an UNSPEC_VOLATILE, will 
combine coalesce them into one in the combined pattern?


Thanks

Andrew


Re: [RS6000] Don't pass -many to the assembler

2018-11-12 Thread Peter Bergner
On 11/12/18 5:49 AM, Alan Modra wrote:
> I'd like to remove -many from the options passed by default to the
> assembler, on the grounds that a gcc bug in instruction selection (eg.
> emitting a power9 insn for -mcpu=power8) is better found at assembly
> time than run time.
> 
> This might annoy people for a while fixing user asm that we didn't
> diagnose previously, but I believe this is the right direction to go.
> Of course, -Wa,-many is available for anyone who just wants their
> dodgy old code to work.

+1

Peter



Re: [doc PATCH] Fix weakref description.

2018-11-12 Thread Michael Ploujnikov
On 2018-11-02 1:59 p.m., Michael Ploujnikov wrote:
> I came across this typo and also added a similar ld invocation for
> illustration purposes as mentioned by Jakub on irc.
> 

After talking to Jakub about it, I went with different terminology.


- Michael
From f14d7315e0dc9c4b6aff6137fd90e4d2595ef9f5 Mon Sep 17 00:00:00 2001
From: Michael Ploujnikov 
Date: Mon, 12 Nov 2018 12:42:37 -0500
Subject: [PATCH] Fix weakref description.

gcc/ChangeLog:

2018-11-12  Michael Ploujnikov  

	* doc/extend.texi: Fix typo in the weakref description.
---
 gcc/doc/extend.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git gcc/doc/extend.texi gcc/doc/extend.texi
index e2b9ee11a54..fc507afe600 100644
--- gcc/doc/extend.texi
+++ gcc/doc/extend.texi
@@ -3619,7 +3619,7 @@ symbol, not necessarily in the same translation unit.
 The effect is equivalent to moving all references to the alias to a
 separate translation unit, renaming the alias to the aliased symbol,
 declaring it as weak, compiling the two separate translation units and
-performing a reloadable link on them.
+performing a link with relocatable output (ie: @code{ld -r}) on them.
 
 At present, a declaration to which @code{weakref} is attached can
 only be @code{static}.
-- 
2.19.1
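
For readers unfamiliar with the attribute being documented, here is a minimal sketch of a typical weakref use, assuming GCC on an ELF target. The names optional_hook, hook_ref and hook_available are made up for illustration; nothing in this file defines optional_hook, so the reference is expected to resolve to a null address at link time.

```cpp
// hook_ref is a static alias for the external symbol "optional_hook".
// Because of weakref, references through hook_ref are weak: linking
// succeeds even if no object file defines optional_hook.
static void hook_ref() __attribute__((weakref("optional_hook")));

// True only if some other translation unit provided optional_hook.
bool hook_available() {
  return &hook_ref != nullptr;
}

// Call the hook when it exists; report whether it was called.
bool call_hook_if_present() {
  if (!hook_available())
    return false;
  hook_ref();
  return true;
}
```

As the documentation text in the patch says, the declaration carrying weakref has to be static.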





Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-11-12 Thread Segher Boessenkool
On Mon, Nov 12, 2018 at 12:53:26PM +, Andrew Stubbs wrote:
> >>+/* Implement TARGET_LEGITIMATE_COMBINED_INSN.
> >>+
> >>+   Return false if the instruction is not appropriate as a combination 
> >>of two
> >>+   or more instructions.  */
> >>+
> >>+bool
> >>+gcn_legitimate_combined_insn (rtx_insn *insn)
> >>+{
> >>+  rtx pat = PATTERN (insn);
> >>+
> >>+  /* The combine pass tends to strip (use (exec)) patterns from insns.  
> >>This
> >>+ means it basically switches everything to use the *_scalar form of 
> >>the
> >>+ instructions, which is not helpful.  So, this function disallows 
> >>such
> >>+ combinations.  Unfortunately, this also disallows combinations of 
> >>genuine
> >>+ scalar-only patterns, but those only come from explicit expand code.
> >>+
> >>+ Possible solutions:
> >>+ - Invent TARGET_LEGITIMIZE_COMBINED_INSN.
> >>+ - Remove all (use (EXEC)) and rely on md_reorg with "exec" 
> >>attribute.
> >>+   */
> >This seems a bit hokey.  Why specifically is combine removing the USE?
> 
> I don't understand combine fully enough to explain it now, although at 
> the time I wrote this, and in a GCC 7 code base, I had followed the code 
> through and observed what it was doing.
> 
> Basically, if you have two patterns that do the same operation, but one 
> has a "parallel" with an additional "use", then combine will tend to 
> prefer the one without the "use". That doesn't stop the code working, 
> but it makes a premature (accidental) decision about instruction 
> selection that we'd prefer to leave to the register allocator.
> 
> I don't recall if it did this to lone instructions, but it would 
> certainly do so when combining two (or more) instructions, and IIRC 
> there are typically plenty of simple moves around that can be easily 
> combined.

If you don't want useless USEs deleted, use UNSPEC_VOLATILE instead?
Or actually use the register, i.e. as input to an actually needed
instruction.

If combine is changing an X and a USE to just that X if it can, combine
is doing a great job!

(combine cannot "combine" one instruction, fwiw; this sometimes could be
useful (so just run simplification on every single instruction, see if
that makes a simpler valid instruction; and indeed a common case where it
can help is if the insn is a parallel and one of the arms of that isn't
needed)).


Segher


Re: [PATCH][DOCS] Fix documentation of __builtin_cpu_is and __builtin_cpu_supports for x86.

2018-11-12 Thread Sandra Loosemore

On 11/12/18 4:46 AM, Martin Liška wrote:

Hi.

The patch adds the missing values for the aforementioned built-ins.

Ready for trunk?
Thanks,
Martin

gcc/ChangeLog:

2018-11-12  Martin Liska  

* doc/extend.texi: Add missing values for __builtin_cpu_is and
__builtin_cpu_supports for x86 target.
---
  gcc/doc/extend.texi | 100 +++-
  1 file changed, 98 insertions(+), 2 deletions(-)



Looks fine to me, although I can't vouch for technical correctness.

-Sandra




Re: [RS6000] Don't pass -many to the assembler

2018-11-12 Thread Michael Matz
Hi,

On Mon, 12 Nov 2018, Segher Boessenkool wrote:

> > > Wouldn't this also break compiling code that contains power9 
> > > instructions but guarded by runtime tests to only be executed on 
> > > power9 machines?  That seems a valid usecase, and it'd be bad if the 
> > > assembler fails to compile such.  (You can't use -mcpu=power9 as 
> > > work around as the other unguarded code is not supposed to be using 
> > > power9 insns).
> > 
> > You'll need to put .machine directives around them.
> 
> My worry with that is there may be too much legacy code that does not do 
> this :-(

We'll see once we put gcc9 through a distro build.  My worry really only 
was that the change would result in compile breakage without a sensible 
solution.  (I'll just give all packages whose build failures prevent gcc9 
from being the new system compiler to Alan for fixing ;-) ).
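
For concreteness, the ".machine directives" approach mentioned in the quoted reply might look roughly like the sketch below for the runtime-guarded case. It assumes GCC on 64-bit PowerPC with GNU as; modsd is used as an example of an ISA 3.0 (power9) instruction, and guarded_mod, mod_power9 and mod_generic are hypothetical names. On other targets only the portable fallback is compiled.

```cpp
#include <cstdint>

// Portable fallback, valid on any CPU.
static std::int64_t mod_generic(std::int64_t a, std::int64_t b) {
  return a % b;
}

#if defined(__powerpc64__) && defined(__GNUC__) && !defined(__clang__)
// The power9-only instruction is bracketed by .machine push/pop so the
// assembler accepts it even when the file is built with -mcpu=power8
// and without -many.
static std::int64_t mod_power9(std::int64_t a, std::int64_t b) {
  std::int64_t r;
  __asm__(".machine push\n\t"
          ".machine power9\n\t"
          "modsd %0,%1,%2\n\t"
          ".machine pop"
          : "=r"(r)
          : "r"(a), "r"(b));
  return r;
}
#endif

// Runtime dispatch: only execute the power9 path on power9 hardware.
std::int64_t guarded_mod(std::int64_t a, std::int64_t b) {
#if defined(__powerpc64__) && defined(__GNUC__) && !defined(__clang__)
  if (__builtin_cpu_supports("arch_3_00"))
    return mod_power9(a, b);
#endif
  return mod_generic(a, b);
}
```

Legacy code that relies on -many being passed implicitly would need this kind of bracketing added, or -Wa,-many on the command line.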


Ciao,
Michael.


Re: [GCC][AArch64] [middle-end][docs] Document the xorsign optab

2018-11-12 Thread Sandra Loosemore

On 11/12/18 5:10 AM, Tamar Christina wrote:

Hi Sandra,


Ok for trunk?

+@cindex @code{xorsign@var{m}3} instruction pattern
+@item @samp{xorsign@var{m}3}
+Target supports an efficient expansion of x * copysign (1.0, y)
+as xorsign (x, y).  Store a value with the magnitude of operand 1
+and the sign of operand 2 into operand 0.  All operands have mode
+@var{m}, which is a scalar or vector floating-point mode.
+
+This pattern is not allowed to @code{FAIL}.
+


Hmmm, needs markup, plus it's a little confusing.  How about describing
it as

Equivalent to @samp{op0 = op1 * copysign (1.0, op2)}: store a value with
the magnitude of operand 1 and the sign of operand 2 into operand 0.
All operands have mode @var{m}, which is a scalar or vector
floating-point mode.

This pattern is not allowed to @code{FAIL}.
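
As an illustration of why the expression x * copysign (1.0, y) is cheap to expand, the sketch below compares it with a pure bit operation on the sign bit. It assumes IEEE 754 binary64 doubles; xorsign_bits and xorsign_ref are names invented for this example and are not part of GCC.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reference form taken from the documentation text.
double xorsign_ref(double x, double y) {
  return x * std::copysign(1.0, y);
}

// Bitwise form: flip the sign bit of x when the sign bit of y is set.
// No multiplication is needed, only an AND and an XOR.
double xorsign_bits(double x, double y) {
  std::uint64_t xb, yb;
  std::memcpy(&xb, &x, sizeof xb);
  std::memcpy(&yb, &y, sizeof yb);
  xb ^= yb & 0x8000000000000000ULL;  // isolate y's sign, xor into x
  double r;
  std::memcpy(&r, &xb, sizeof r);
  return r;
}
```

For finite inputs the two functions agree, which is the property a target pattern for this optab relies on.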


That works for me, updated patch attached.

OK for trunk?


Yes, this is fine.  :-)

-Sandra



Re: [PATCH, GCC, AArch64] Branch Dilution Pass

2018-11-12 Thread Richard Earnshaw (lists)
On 12/11/2018 15:13, Kyrill Tkachov wrote:
> Hi Richard,
> 
> On 12/11/18 14:13, Richard Biener wrote:
>> On Fri, Nov 9, 2018 at 6:23 PM Sudakshina Das  wrote:
>> >
>> > Hi
>> >
>> > I am posting this patch on behalf of Carey (cc'ed). I also have some
>> > review comments that I will make as a reply to this later.
>> >
>> >
>> > This implements a new AArch64 specific back-end pass that helps
>> optimize
>> > branch-dense code, which can be a bottleneck for performance on some
>> Arm
>> > cores. This is achieved by padding out the branch-dense sections of the
>> > instruction stream with nops.
>>
>> Wouldn't this be more suitable for implementing inside the assembler?
>>
> 
> The number of NOPs to insert to get the performance benefits varies from
> core to core,
> I don't think we want to add such CPU-specific optimisation logic to the
> assembler.

Additionally, the compiler has to keep track of branch ranges.  It can't
do this properly if the assembler is emitting more instructions than the
compiler thinks it is.

R.

> 
> Thanks,
> Kyrill
> 
>> > This has proven to show up to a ~2.61% improvement on the Cortex-A72
>> > (SPEC CPU 2006: sjeng).
>> >
>> > The implementation includes the addition of a new RTX instruction class
>> > FILLER_INSN, which has been white listed to allow placement of NOPs
>> > outside of a basic block. This is to allow padding after unconditional
>> > branches. This is favorable so that any performance gained from
>> > diluting branches is not paid straight back via excessive eating of
>> nops.
>> >
>> > It was deemed that a new RTX class was less invasive than modifying
>> > behavior in regards to standard UNSPEC nops.
>> >
>> > ## Command Line Options
>> >
>> > Three new target-specific options are provided:
>> > - mbranch-dilution
>> > - mbranch-dilution-granularity={num}
>> > - mbranch-dilution-max-branches={num}
>> >
>> > A number of cores known to be able to benefit from this pass have been
>> > given default tuning values for their granularity and max-branches.
>> > Each affected core has a very specific granule size and associated
>> > max-branch limit. This is a microarchitecture specific optimization.
>> > Typical usage should be -mbranch-dilution with a specified -mcpu.
>> Cores
>> > with a granularity tuned to 0 will be ignored. Options are provided for
>> > experimentation.
>> >
>> > ## Algorithm and Heuristic
>> >
>> > The pass takes a very simple 'sliding window' approach to the problem.
>> > We crawl through each instruction (starting at the first branch) and
>> > keep track of the number of branches within the current "granule" (or
>> > window). When this exceeds the max-branch value, the pass will dilute
>> > the current granule, inserting nops to push out some of the branches.
>> > The heuristic will favour unconditional branches (for performance
>> > reasons), or branches that are between two other branches (in order to
>> > decrease the likelihood of another dilution call being needed).
>> >
>> > Each branch type required a different method for nop insertion due to
>> > RTL/basic_block restrictions:
>> >
>> > - Returning calls do not end a basic block so can be handled by
>> emitting
>> > a generic nop.
>> > - Unconditional branches must be the end of a basic block, and nops
>> > cannot be outside of a basic block.
>> >    Thus the need for FILLER_INSN, which allows placement outside of a
>> > basic block - and translates to a nop.
>> > - For most conditional branches we've taken a simple approach and only
>> > handle the fallthru edge for simplicity,
>> >    which we do by inserting a "nop block" of nops on the fallthru edge,
>> > mapping that back to the original destination block.
>> > - asm gotos and pcsets are going to be tricky to analyse from a
>> dilution
>> > perspective so are ignored at present.
>> >
>> >
>> > ## Changelog
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> > 2018-11-09  Carey Williams 
>> >
>> > * gcc.target/aarch64/branch-dilution-off.c: New test.
>> > * gcc.target/aarch64/branch-dilution-on.c: New test.
>> >
>> >
>> > gcc/ChangeLog:
>> >
>> > 2018-11-09  Carey Williams 
>> >
>> > * cfgbuild.c (inside_basic_block_p): Add FILLER_INSN case.
>> > * cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN
>> outside
>> > basic blocks.
>> > * config.gcc (extra_objs): Add aarch64-branch-dilution.o.
>> > * config/aarch64/aarch64-branch-dilution.c: New file.
>> > * config/aarch64/aarch64-passes.def (branch-dilution): Register
>> > pass.
>> > * config/aarch64/aarch64-protos.h (struct tune_params): Declare
>> > tuning parameters bdilution_gsize and bdilution_maxb.
>> > (make_pass_branch_dilution): New declaration.
>> > * config/aarch64/aarch64.c (generic_tunings,cortexa35_tunings,
>> > cortexa53_tunings,cortexa57_tunings,cortexa72_tunings,
>> > cortexa73_tunings,exynosm1_tunings,thunderxt88_tunings,
>> > thunderx_tunings,tsv110_tunings,xgen

Re: [RS6000] Don't pass -many to the assembler

2018-11-12 Thread Segher Boessenkool
On Mon, Nov 12, 2018 at 03:52:29PM +0100, Andreas Schwab wrote:
> On Nov 12 2018, Michael Matz  wrote:
> 
> > Wouldn't this also break compiling code that contains power9 instructions 
> > but guarded by runtime tests to only be executed on power9 machines?  That 
> > seems a valid usecase, and it'd be bad if the assembler fails to compile 
> > such.  (You can't use -mcpu=power9 as work around as the other 
> > unguarded code is not supposed to be using power9 insns).
> 
> You'll need to put .machine directives around them.

My worry with that is there may be too much legacy code that does not
do this :-(


Segher


Re: [PATCH] Instrument only selected files (PR gcov-profile/87442).

2018-11-12 Thread Jeff Law
On 11/12/18 12:56 AM, Martin Liška wrote:
> On 11/9/18 11:00 PM, Jeff Law wrote:
>> On 11/8/18 6:42 AM, Martin Liška wrote:
>>> Hi.
>>>
>>> The patch adds the ability to filter which files are instrumented. The
>>> usage is explained in the PR.
>>>
>>> Patch can bootstrap and survives regression tests on x86_64-linux-gnu.
>>>
>>> Ready for trunk?
>>> Thanks,
>>> Martin
>>>
>>> gcc/ChangeLog:
>>>
>>> 2018-11-08  Martin Liska  
>>>
>>> PR gcov-profile/87442
>>> * common.opt: Add -fprofile-filter-files and -fprofile-exclude-files
>>> options.
>>> * doc/invoke.texi: Document them.
>>> * tree-profile.c (parse_profile_filter): New.
>>> (parse_profile_file_filtering): Likewise.
>>> (release_profile_file_filtering): Likewise.
>>> (include_source_file_for_profile): Likewise.
>>> (tree_profiling): Filter source files based on the
>>> newly added options.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2018-11-08  Martin Liska  
>>>
>>> PR gcov-profile/87442
>>> * gcc.dg/profile-filtering-1.c: New test.
>>> * gcc.dg/profile-filtering-2.c: New test.
>> Extra credit if we could also do this on a function level.  I've
>> certainly talked to developers that want finer grained control over what
>> gets instrumented and what doesn't.  This is probably enough to help
>> them, but I'm sure they'll want more :-)
>>
>>
>> OK.
>> jeff
>>
> 
> Hi.
> 
> Jeff, may I consider this as approval of the patch?
Yes.  Sorry I wasn't explicit about that.
jeff
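
To make the file-level filtering in the approved patch concrete, here is a rough standalone model. It assumes each option takes a semicolon-separated list of regular expressions, that matching is an unanchored search, and that exclusion is checked before inclusion; those details, and the names parse_filter and include_source_file, are assumptions for illustration rather than the code in tree-profile.c.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Split "re1;re2;..." into compiled POSIX extended regexes.
std::vector<std::regex> parse_filter(const std::string& spec) {
  std::vector<std::regex> out;
  std::size_t start = 0;
  while (start <= spec.size()) {
    std::size_t end = spec.find(';', start);
    if (end == std::string::npos)
      end = spec.size();
    if (end > start)  // skip empty pieces
      out.emplace_back(spec.substr(start, end - start),
                       std::regex::extended);
    start = end + 1;
  }
  return out;
}

// Decide whether functions from `file` should be instrumented.
bool include_source_file(const std::string& file,
                         const std::vector<std::regex>& filter,
                         const std::vector<std::regex>& exclude) {
  for (const std::regex& re : exclude)
    if (std::regex_search(file, re))
      return false;          // explicitly excluded
  if (filter.empty())
    return true;             // no filter: instrument everything else
  for (const std::regex& re : filter)
    if (std::regex_search(file, re))
      return true;
  return false;
}
```

A function-level variant, as Jeff suggests, could apply the same matching to function names instead of file names.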


Re: [PATCH][LRA] Fix PR87899: r264897 cause mis-compiled native arm-linux-gnueabihf toolchain

2018-11-12 Thread Peter Bergner
On 11/12/18 6:25 AM, Renlin Li wrote:
> I tried to build a native arm-linux-gnueabihf toolchain with the patch.
> But I got the following ICE:

Why can't things ever be easy? :-)  I think we're getting closer though.

Anyway, can you please recompile the failing file but using -save-temps
and send me the resulting preprocessed source file?  Also, can you give
me the gcc configure options you used to build your GCC?  That should
give me enough info to debug this one.  Thanks.

Peter



C++ PATCH to implement C++20 P0634R3, Down with typename!

2018-11-12 Thread Marek Polacek
This patch implements C++20 P0634R3, Down with typename!

which makes 'typename' optional in several contexts specified in [temp.res].

The gist of the patch is in cp_parser_simple_type_specifier, where, if the
context makes typename optional and the id is qualified, we pretend we've
seen the typename keyword.
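
As a small illustration of the kind of declaration affected, the sketch below omits typename in a function return type at namespace scope, which is one of the contexts P0634R3 covers. The function name first_element is made up for this example, and the preprocessor guard (GCC 9 or later in C++20 mode) is an assumption about which compilers accept the shorter form; elsewhere the traditional spelling is used so the code stays valid.

```cpp
#include <vector>

template <typename T>
#if __cplusplus > 201703L && defined(__GNUC__) && !defined(__clang__) \
    && __GNUC__ >= 9
T::value_type            // C++20: 'typename' may be omitted here
#else
typename T::value_type   // earlier standards: 'typename' is required
#endif
first_element(const T& container) {
  return *container.begin();
}
```

Both spellings declare the same function; the change only affects what the parser accepts.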

There's quite a lot of churn because we need to be careful where we want
to make typename optional, and e.g. a flag in cp_parser would be too global.

I'm not sure about some of the bits in typename5.C, not quite sure if the
code is valid, but I didn't have time to investigate deeply and it seems
pretty obscure anyway.  There are preexisting cases when g++ and clang++
disagree.

The resolve_typename_type hunk was to make typename9.C work with -fconcepts.

Bootstrapped/regtested on x86_64-linux.

2018-11-12  Marek Polacek  

Implement P0634R3, Down with typename!
* parser.c (CP_PARSER_FLAGS_TYPENAME_OPTIONAL): New enumerator.
(cp_parser_type_name): Remove declaration.
(cp_parser_postfix_expression): Pass TYPENAME_OPTIONAL_P to
cp_parser_type_id.
(cp_parser_new_type_id): Pass TYPENAME_OPTIONAL_P to
cp_parser_type_specifier_seq.
(cp_parser_lambda_declarator_opt): Pass TYPENAME_OPTIONAL_P
to cp_parser_parameter_declaration_clause.
(cp_parser_condition): Adjust call to cp_parser_declarator.
(cp_parser_simple_declaration): Adjust call to
cp_parser_init_declarator.
(cp_parser_conversion_type_id): Adjust call to
cp_parser_type_specifier_seq.
(cp_parser_default_type_template_argument): Pass TYPENAME_OPTIONAL_P
to cp_parser_type_id.
(cp_parser_template_parameter): Pass TYPENAME_OPTIONAL_P to
cp_parser_parameter_declaration.
(cp_parser_explicit_instantiation): Adjust call to cp_parser_declarator.
(cp_parser_simple_type_specifier): Adjust call to cp_parser_type_name.
(cp_parser_type_name): Remove unused function.
(cp_parser_enum_specifier): Adjust call to cp_parser_type_specifier_seq.
(cp_parser_alias_declaration): Pass TYPENAME_OPTIONAL_P to
cp_parser_type_id.
(cp_parser_init_declarator): New parameter.
(cp_parser_declarator): New parameter.  Use it.
(cp_parser_direct_declarator): Likewise.
(cp_parser_type_id_1): Likewise.
(cp_parser_type_id): Likewise.
(cp_parser_template_type_arg): Adjust call to cp_parser_type_id_1.
(cp_parser_trailing_type_id): Pass TYPENAME_OPTIONAL_P to
cp_parser_type_id_1.
(cp_parser_type_specifier_seq): New parameter.  Set flags to
CP_PARSER_FLAGS_TYPENAME_OPTIONAL.
(cp_parser_parameter_declaration_clause): New parameter.  Use it.
(cp_parser_parameter_declaration_list): Likewise.
(cp_parser_parameter_declaration): Likewise.
(cp_parser_member_declaration): Set flags to
CP_PARSER_FLAGS_TYPENAME_OPTIONAL.
(cp_parser_exception_declaration): Adjust calls to
cp_parser_type_specifier_seq and cp_parser_declarator.
(cp_parser_requirement_parameter_list): Adjust call to
cp_parser_parameter_declaration_clause.
(cp_parser_constructor_declarator_p): Resolve the TYPENAME_TYPE.
(cp_parser_single_declaration): Set flags to
CP_PARSER_FLAGS_TYPENAME_OPTIONAL.  Pass TYPENAME_OPTIONAL_P to
cp_parser_init_declarator.
(cp_parser_cache_defarg): Adjust call to cp_parser_declarator.
(cp_parser_objc_method_tail_params_opt): Adjust call to
cp_parser_parameter_declaration.
(cp_parser_objc_class_ivars): Adjust call to cp_parser_declarator.
(cp_parser_objc_try_catch_finally_statement): Adjust call to
cp_parser_parameter_declaration.
(cp_parser_objc_struct_declaration): Adjust call to
cp_parser_declarator.
(cp_parser_omp_for_loop_init): Adjust calls to
cp_parser_type_specifier_seq and cp_parser_declarator.

* g++.dg/cpp0x/alias-decl-43.C: Adjust dg-error.
* g++.dg/cpp0x/decltype67.C: Only expect error in c++17_down.
* g++.dg/cpp1z/typename1.C: New test.
* g++.dg/cpp2a/typename1.C: New test.
* g++.dg/cpp2a/typename10.C: New test.
* g++.dg/cpp2a/typename11.C: New test.
* g++.dg/cpp2a/typename2.C: New test.
* g++.dg/cpp2a/typename3.C: New test.
* g++.dg/cpp2a/typename4.C: New test.
* g++.dg/cpp2a/typename5.C: New test.
* g++.dg/cpp2a/typename6.C: New test.
* g++.dg/cpp2a/typename7.C: New test.
* g++.dg/cpp2a/typename8.C: New test.
* g++.dg/cpp2a/typename9.C: New test.
* g++.dg/diagnostic/missing-typename.C: Only run the test in
c++17_down.
* g++.dg/other/crash-9.C: Add template disambiguator.
* g++.dg/other/nontype-1.C: Only expect error in c++17_down.
*

[PATCH] PR libstdc++/87963 fix build for 64-bit mingw

2018-11-12 Thread Jonathan Wakely

PR libstdc++/87963
* src/c++17/memory_resource.cc (chunk::_M_bytes): Change type from
unsigned to uint32_t.
(chunk): Fix static assertion for 64-bit targets that aren't LP64.
(bigblock::all_ones): Fix undefined shift.

Tested x86_64-linux, committed to trunk.

commit d4c238672c04397626391ae9a89ebfe76d70eb55
Author: Jonathan Wakely 
Date:   Mon Nov 12 15:16:31 2018 +

PR libstdc++/87963 fix build for 64-bit mingw

PR libstdc++/87963
* src/c++17/memory_resource.cc (chunk::_M_bytes): Change type from
unsigned to uint32_t.
(chunk): Fix static assertion for 64-bit targets that aren't LP64.
(bigblock::all_ones): Fix undefined shift.

diff --git a/libstdc++-v3/src/c++17/memory_resource.cc 
b/libstdc++-v3/src/c++17/memory_resource.cc
index 781bdada381..3595e255889 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -421,7 +421,7 @@ namespace pmr
 // The chunk has space for n blocks, followed by a bitset of size n
 // that begins at address words.
 // This object does not own p or words, the caller will free it.
-chunk(void* p, size_t bytes, void* words, size_t n)
+chunk(void* p, uint32_t bytes, void* words, size_t n)
 : bitset(words, n),
   _M_bytes(bytes),
   _M_p(static_cast<std::byte*>(p))
@@ -442,7 +442,7 @@ namespace pmr
 }
 
 // Allocated size of chunk:
-unsigned _M_bytes = 0;
+uint32_t _M_bytes = 0;
 // Start of allocated chunk:
 std::byte* _M_p = nullptr;
 
@@ -508,12 +508,9 @@ namespace pmr
 { return std::less{}(p, c._M_p); }
   };
 
-#ifdef __LP64__
-  // TODO pad up to 4*sizeof(void*) to avoid splitting across cache lines?
-  static_assert(sizeof(chunk) == (3 * sizeof(void*)), "");
-#else
-  static_assert(sizeof(chunk) == (4 * sizeof(void*)), "");
-#endif
+  // For 64-bit this is 3*sizeof(void*) and for 32-bit it's 4*sizeof(void*).
+  // TODO pad 64-bit to 4*sizeof(void*) to avoid splitting across cache lines?
+  static_assert(sizeof(chunk) == 2 * sizeof(uint32_t) + 2 * sizeof(void*));
 
   // An oversized allocation that doesn't fit in a pool.
   struct big_block
@@ -523,7 +520,7 @@ namespace pmr
 static constexpr unsigned _S_sizebits
   = numeric_limits<size_t>::digits - _S_alignbits;
 // The maximum value that can be stored in _S_size
-static constexpr size_t all_ones = (1ul << _S_sizebits) - 1u;
+static constexpr size_t all_ones = (1ull << _S_sizebits) - 1u;
 // The minimum size of a big block
 static constexpr size_t min = 1u << _S_alignbits;
 


Re: [PATCH, GCC, AArch64] Branch Dilution Pass

2018-11-12 Thread Kyrill Tkachov

Hi Richard,

On 12/11/18 14:13, Richard Biener wrote:

On Fri, Nov 9, 2018 at 6:23 PM Sudakshina Das  wrote:
>
> Hi
>
> I am posting this patch on behalf of Carey (cc'ed). I also have some
> review comments that I will make as a reply to this later.
>
>
> This implements a new AArch64 specific back-end pass that helps optimize
> branch-dense code, which can be a bottleneck for performance on some Arm
> cores. This is achieved by padding out the branch-dense sections of the
> instruction stream with nops.

Wouldn't this be more suitable for implementing inside the assembler?



The number of NOPs to insert to get the performance benefits varies from core 
to core,
I don't think we want to add such CPU-specific optimisation logic to the 
assembler.

Thanks,
Kyrill


> This has proven to show up to a ~2.61% improvement on the Cortex-A72
> (SPEC CPU 2006: sjeng).
>
> The implementation includes the addition of a new RTX instruction class
> FILLER_INSN, which has been white listed to allow placement of NOPs
> outside of a basic block. This is to allow padding after unconditional
> branches. This is favorable so that any performance gained from
> diluting branches is not paid straight back via excessive eating of nops.
>
> It was deemed that a new RTX class was less invasive than modifying
> behavior in regards to standard UNSPEC nops.
>
> ## Command Line Options
>
> Three new target-specific options are provided:
> - mbranch-dilution
> - mbranch-dilution-granularity={num}
> - mbranch-dilution-max-branches={num}
>
> A number of cores known to be able to benefit from this pass have been
> given default tuning values for their granularity and max-branches.
> Each affected core has a very specific granule size and associated
> max-branch limit. This is a microarchitecture specific optimization.
> Typical usage should be -mbranch-dilution with a specified -mcpu. Cores
> with a granularity tuned to 0 will be ignored. Options are provided for
> experimentation.
>
> ## Algorithm and Heuristic
>
> The pass takes a very simple 'sliding window' approach to the problem.
> We crawl through each instruction (starting at the first branch) and
> keep track of the number of branches within the current "granule" (or
> window). When this exceeds the max-branch value, the pass will dilute
> the current granule, inserting nops to push out some of the branches.
> The heuristic will favour unconditional branches (for performance
> reasons), or branches that are between two other branches (in order to
> decrease the likelihood of another dilution call being needed).
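
The sliding-window idea in the quoted description can be sketched on a toy instruction stream. This is a deliberately simplified greedy model and not the pass itself: it pads with nops directly before the branch that would overflow a granule, and ignores the branch-selection heuristic and the basic-block constraints the patch deals with. Insn, dilute and worst_granule are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class Insn { Other, Branch, Nop };

// Count branches in the (up to) n instructions that end just before `end`.
static std::size_t branches_in_window(const std::vector<Insn>& code,
                                      std::size_t end, std::size_t n) {
  std::size_t first = end > n ? end - n : 0;
  auto lo = code.begin() + static_cast<std::ptrdiff_t>(first);
  auto hi = code.begin() + static_cast<std::ptrdiff_t>(end);
  return static_cast<std::size_t>(std::count(lo, hi, Insn::Branch));
}

// Insert nops so that no window of `granule` consecutive instructions
// holds more than `max_branches` branches.
std::vector<Insn> dilute(const std::vector<Insn>& in, std::size_t granule,
                         std::size_t max_branches) {
  if (granule == 0 || max_branches == 0)
    return in;  // granularity 0 means the core is not tuned: do nothing
  std::vector<Insn> out;
  for (Insn insn : in) {
    if (insn == Insn::Branch) {
      // Pad until this branch fits into the granule that ends at it.
      while (branches_in_window(out, out.size(), granule - 1)
             >= max_branches)
        out.push_back(Insn::Nop);
    }
    out.push_back(insn);
  }
  return out;
}

// Largest branch count found in any granule-sized window (for checking).
std::size_t worst_granule(const std::vector<Insn>& code,
                          std::size_t granule) {
  std::size_t worst = 0;
  for (std::size_t end = 1; end <= code.size(); ++end)
    worst = std::max(worst, branches_in_window(code, end, granule));
  return worst;
}
```

In this model, four back-to-back branches with a granule of 4 and a limit of 2 become branch, branch, nop, nop, branch, branch.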
>
> Each branch type required a different method for nop insertion due to
> RTL/basic_block restrictions:
>
> - Returning calls do not end a basic block so can be handled by emitting
> a generic nop.
> - Unconditional branches must be the end of a basic block, and nops
> cannot be outside of a basic block.
>Thus the need for FILLER_INSN, which allows placement outside of a
> basic block - and translates to a nop.
> - For most conditional branches we've taken a simple approach and only
> handle the fallthru edge for simplicity,
>which we do by inserting a "nop block" of nops on the fallthru edge,
> mapping that back to the original destination block.
> - asm gotos and pcsets are going to be tricky to analyse from a dilution
> perspective so are ignored at present.
>
>
> ## Changelog
>
> gcc/testsuite/ChangeLog:
>
> 2018-11-09  Carey Williams 
>
> * gcc.target/aarch64/branch-dilution-off.c: New test.
> * gcc.target/aarch64/branch-dilution-on.c: New test.
>
>
> gcc/ChangeLog:
>
> 2018-11-09  Carey Williams 
>
> * cfgbuild.c (inside_basic_block_p): Add FILLER_INSN case.
> * cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN outside
> basic blocks.
> * config.gcc (extra_objs): Add aarch64-branch-dilution.o.
> * config/aarch64/aarch64-branch-dilution.c: New file.
> * config/aarch64/aarch64-passes.def (branch-dilution): Register
> pass.
> * config/aarch64/aarch64-protos.h (struct tune_params): Declare
> tuning parameters bdilution_gsize and bdilution_maxb.
> (make_pass_branch_dilution): New declaration.
> * config/aarch64/aarch64.c (generic_tunings,cortexa35_tunings,
> cortexa53_tunings,cortexa57_tunings,cortexa72_tunings,
> cortexa73_tunings,exynosm1_tunings,thunderxt88_tunings,
> thunderx_tunings,tsv110_tunings,xgene1_tunings,
> qdf24xx_tunings,saphira_tunings,thunderx2t99_tunings):
> Provide default tunings for bdilution_gsize and bdilution_maxb.
> * config/aarch64/aarch64.md (filler_insn): Define new insn.
> * config/aarch64/aarch64.opt (mbranch-dilution,
> mbranch-dilution-granularity,
> mbranch-dilution-max-branches): Define new branch dilution
> options.
> * config/aarch64/t-aarch64 (aarch64-branch-dilution.c): New rule
> for aarch64-branch-dilution.c.
> * coretypes.h (rtx_fille

Re: [PATCH, GCC, ARM] Enable armv8.5-a and add +sb and +predres for previous ARMv8-a in ARM

2018-11-12 Thread Sudakshina Das
Hi Kyrill

On 09/11/18 18:21, Kyrill Tkachov wrote:
> Hi Sudi,
> 
> On 09/11/18 15:33, Sudakshina Das wrote:
>> Hi
>>
>> This patch adds -march=armv8.5-a to the Arm backend.
>> (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
>>  
>>
>> Armv8.5-A also adds two new security features:
>> - Speculation Barrier instruction
>> - Execution and Data Prediction Restriction Instructions
>> These are made optional to all older Armv8-A versions. Thus we are
>> adding two new options "+sb" and "+predres" to all older Armv8-A. These
>> are passed on to the assembler and have no code generation effects and
>> have already gone in the trunk of binutils.
>>
>> Bootstrapped and regression tested with arm-none-linux-gnueabihf.
>>
>> Is this ok for trunk?
>> Sudi
>>
>> *** gcc/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>> * config/arm/arm-cpus.in (armv8_5, sb, predres): New features.
>> (ARMv8_5a): New fgroup.
>> (armv8.5-a): New arch.
>> (armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a): New
>> options sb and predres.
>> * config/arm/arm-tables.opt: Regenerate.
>> * config/arm/t-aprofile: Add matching rules for -march=armv8.5-a
>> * config/arm/t-arm-elf (all_v8_archs): Add armv8.5-a.
>> * config/arm/t-multilib (v8_5_a_simd_variants): New variable.
>> Add matching rules for -march=armv8.5-a and extensions.
>> * doc/invoke.texi (ARM options): Document -march=armv8.5-a.
>> Add sb and predres to all armv8-a except armv8.5-a.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2018-xx-xx  Sudakshina Das  
>>
>> * gcc.target/arm/multilib.exp: Add some -march=armv8.5-a
>> combination tests.
> 
> 
> 
> This is ok modulo a typo fix below.
> 
> Thanks,
> Kyrill
> 

Thanks. Fixed and committed as r266031.

Sudi

> 
> 
> index 
> 25788ad09851daf41038b1578307bf23b7f34a94..eba038f9d20bc54bef7bdb7fa1c0e7028d954ed7
>  
> 100644
> --- a/gcc/config/arm/t-multilib
> +++ b/gcc/config/arm/t-multilib
> @@ -70,7 +70,8 @@ v8_a_simd_variants    := $(call all_feat_combs, simd 
> crypto)
>   v8_1_a_simd_variants    := $(call all_feat_combs, simd crypto)
>   v8_2_a_simd_variants    := $(call all_feat_combs, simd fp16 fp16fml 
> crypto dotprod)
>   v8_4_a_simd_variants    := $(call all_feat_combs, simd fp16 crypto)
> -v8_r_nosimd_variants    := +crc
> +v8_5_a_simd_variants    := $(call all_feat_combs, simd fp16 crypto)
> +v8_r_nosimd_variants    := +cr5
> 
> 
> Typo, should be +crc
> 
> 
> 



Re: [RS6000] Don't pass -many to the assembler

2018-11-12 Thread Andreas Schwab
On Nov 12 2018, Michael Matz  wrote:

> Wouldn't this also break compiling code that contains power9 instructions 
> but guarded by runtime tests to only be executed on power9 machines?  That 
> seems a valid usecase, and it'd be bad if the assembler fails to compile 
> such.  (You can't use -mcpu=power9 as work around as the other 
> unguarded code is not supposed to be using power9 insns).

You'll need to put .machine directives around them.
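
For the archive, a minimal sketch of what that looks like from C, assuming
GNU as and a 64-bit PowerPC target.  The function name, the have_power9
flag and the choice of darn as the ISA 3.0 instruction are mine, purely for
illustration; on any other target only the fallback path is compiled.

```c
#include <stdint.h>

/* Returns a hardware random number when the caller has already checked,
   at run time, that the CPU implements ISA 3.0 (power9); otherwise
   returns FALLBACK.  HAVE_POWER9 is a hypothetical flag the caller is
   assumed to compute itself.  */
uint64_t
random_or_fallback (int have_power9, uint64_t fallback)
{
#if defined (__powerpc64__)
  if (have_power9)
    {
      uint64_t value;
      /* Switch the assembler to power9 for this one insn only, then
         restore whatever machine the command line selected.  */
      __asm__ volatile (".machine push\n\t"
                        ".machine power9\n\t"
                        "darn %0,1\n\t"
                        ".machine pop"
                        : "=r" (value));
      return value;
    }
#else
  (void) have_power9;
#endif
  return fallback;
}
```

The unguarded code in the same file is still assembled for the -mcpu given
on the command line, so a stray power9 insn elsewhere would still be
diagnosed.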

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[PATCH] Fortran include line fixes and -fdec-include support

2018-11-12 Thread Jakub Jelinek
Hi!

In fortran97.pdf I read:
"Except in a character context, blanks are insignificant and may be used freely 
throughout the program."
and while we handle that in most cases, we don't allow spaces in INCLUDE
lines in fixed form, while e.g. ifort does.
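
To illustrate the blank-insensitive matching, here is a minimal standalone
sketch.  It is not the scanner.c code; the helper name is made up, and it
only handles plain spaces.

```c
#include <ctype.h>
#include <stddef.h>

/* Match KEYWORD (lower case, no blanks) at the start of LINE while
   skipping blanks in LINE, as fixed-form source permits.  Returns a
   pointer just past the keyword, or NULL if LINE does not start
   with it.  */
const char *
match_keyword_fixed_form (const char *line, const char *keyword)
{
  while (*keyword != '\0')
    {
      while (*line == ' ')
        line++;
      if (tolower ((unsigned char) *line) != *keyword)
        return NULL;
      line++;
      keyword++;
    }
  return line;
}
```

With this, both "      INCLUDE 'a.inc'" and "      i n c l u d e 'a.inc'"
are recognised as starting with the include keyword.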

Another thing, which I haven't touched in the PR except covering it with a
testcase, is that we allow an INCLUDE line in fixed form to start even in
columns 1 to 6, while ifort rejects that.  Is say
     include 'omp_lib.h'
valid in fixed form?  i in column 6 normally means a continuation line,
though not sure if anything can in a valid program contain nclude
followed by character literal.  Shall we reject that, or at least warn that
it won't be portable?

The last thing, biggest part of the patch, is that for legacy DEC
compatibility, the DEC manuals document INCLUDE as a statement, not a line,
the
"An INCLUDE line is not a Fortran statement."
and
"An INCLUDE line shall appear on a single source line where a statement may 
appear; it shall be
the only nonblank text on this line other than an optional trailing comment. 
Thus, a statement
label is not allowed."
bullets don't apply, but instead there is:
"The INCLUDE statement takes one of the following forms:"
"An INCLUDE statement can appear anywhere within a scoping unit. The statement
can span more than one source line, but no other statement can appear on the 
same
line. The source line cannot be labeled."

This means there can be (as can be seen in the following testcases)
continuations in both forms, and in fixed form there can be 0 in column 6.

In order not to duplicate all the handling of continuations, comment
skipping etc., the patch just adjusts the include_line routine so that it
signals if the current line is a possible start of a valid INCLUDE statement
when in -fdec-include mode, and if so, whenever it reads a further line it
retries to parse it using
gfc_next_char/gfc_next_char_literal/gfc_gobble_whitespace APIs as an INCLUDE
stmt.  If it is found not to be a valid INCLUDE statement line or set of
lines, it returns 0, if it is valid, it returns 1 together with load_file
like include_line does and clears all the lines containing the INCLUDE
statement.  If the reading stops because we don't have enough lines, -1 is
returned and the caller tries again with more lines.
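
The 0 / 1 / -1 protocol can be sketched in isolation like this.  It is a
toy model only: the "statement" is just the keyword followed by a quoted
name, lines are simply concatenated, and neither continuation markers nor
comments are handled.  Both function names are hypothetical.

```c
#include <string.h>

/* Toy stand-in for the statement parser.  Returns 1 if BUF holds a
   complete INCLUDE, 0 if it can never become one, and -1 if more
   input is needed.  */
int
try_include_stmt (const char *buf)
{
  static const char kw[] = "include";
  size_t i;

  while (*buf == ' ')
    buf++;
  for (i = 0; kw[i] != '\0'; i++)
    {
      if (buf[i] == '\0')
        return -1;              /* Keyword may continue on the next line.  */
      if (buf[i] != kw[i])
        return 0;
    }
  buf += i;
  while (*buf == ' ')
    buf++;
  if (*buf == '\0')
    return -1;                  /* File name still to come.  */
  if (*buf != '\'')
    return 0;
  return strchr (buf + 1, '\'') != NULL ? 1 : -1;
}

/* Feed LINES one at a time until the parser stops asking for more.
   Returns the final verdict and stores the number of lines consumed
   in *USED.  A result of -1 means the input ran out first.  */
int
scan_include (const char *const *lines, size_t nlines, size_t *used)
{
  char buf[256] = "";
  int result = -1;
  size_t n = 0;

  while (result == -1 && n < nlines)
    {
      strncat (buf, lines[n++], sizeof buf - strlen (buf) - 1);
      result = try_include_stmt (buf);
    }
  *used = n;
  return result;
}
```

The point of the tri-state result is that the caller never has to guess:
it only reads another line when the parser says the statement could still
be completed.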

Tested on x86_64-linux, ok for trunk if it passes full bootstrap/regtest?

In addition to the above mentioned question about include in columns 1-6 in
fixed form, another thing is that we support
  print *, 'abc''def'
  print *, "hij""klm"
which prints abc'def and hij"klm.  Shall we support that for INCLUDE lines
and INCLUDE statements too?
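
If we do want that, the unquoting itself is simple.  A minimal sketch,
with a made-up helper name and no claim that this is how the scanner
should structure it:

```c
#include <stddef.h>

/* Copy the character literal starting at SRC (which must point at the
   opening quote) into DST, collapsing each doubled delimiter into one.
   Returns the number of characters written, or -1 if the literal is
   unterminated or DST is too small.  */
int
unquote_literal (const char *src, char *dst, size_t dstsize)
{
  char delim = *src++;
  size_t n = 0;

  if (dstsize == 0 || (delim != '\'' && delim != '"'))
    return -1;
  for (;;)
    {
      if (*src == '\0')
        return -1;
      if (*src == delim)
        {
          if (src[1] != delim)
            break;              /* Closing delimiter.  */
          src++;                /* Doubled delimiter: keep one copy.  */
        }
      if (n + 1 >= dstsize)
        return -1;
      dst[n++] = *src++;
    }
  dst[n] = '\0';
  return (int) n;
}
```

Applied to 'abc''def' this yields abc'def, matching what the PRINT
statements above produce.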

2018-11-12  Jakub Jelinek  
Mark Eggleston  

* lang.opt (fdec-include): New option.
* options.c (set_dec_flags): Set also flag_dec_include.
* scanner.c (include_line): Change return type from bool to int.
In fixed form allow spaces in between include keyword letters.
For -fdec-include, allow in fixed form 0 in column 6.  With
-fdec-include return -1 if the parsed line is not full include
statement and it could be successfully completed on continuation
lines.
(include_stmt): New function.
(load_file): Adjust include_line caller.  If it returns -1, keep
trying include_stmt until it stops returning -1 whenever adding
further line of input.

* gfortran.dg/include_10.f: New test.
* gfortran.dg/include_10.inc: New file.
* gfortran.dg/include_11.f: New test.
* gfortran.dg/include_12.f: New test.
* gfortran.dg/include_13.f90: New test.
* gfortran.dg/gomp/include_1.f: New test.
* gfortran.dg/gomp/include_1.inc: New file.
* gfortran.dg/gomp/include_2.f90: New test.

--- gcc/fortran/lang.opt.jj 2018-07-18 22:57:15.227785894 +0200
+++ gcc/fortran/lang.opt 2018-11-12 09:35:03.185259773 +0100
@@ -440,6 +440,10 @@ fdec
 Fortran Var(flag_dec)
 Enable all DEC language extensions.
 
+fdec-include
+Fortran Var(flag_dec_include)
+Enable legacy parsing of INCLUDE as statement.
+
 fdec-intrinsic-ints
 Fortran Var(flag_dec_intrinsic_ints)
 Enable kind-specific variants of integer intrinsic functions.
--- gcc/fortran/options.c.jj 2018-11-06 18:27:13.828831733 +0100
+++ gcc/fortran/options.c   2018-11-12 09:35:39.515655453 +0100
@@ -68,6 +68,7 @@ set_dec_flags (int value)
   flag_dec_intrinsic_ints |= value;
   flag_dec_static |= value;
   flag_dec_math |= value;
+  flag_dec_include |= value;
 }
 
 
--- gcc/fortran/scanner.c.jj 2018-05-08 13:56:41.691932534 +0200
+++ gcc/fortran/scanner.c   2018-11-12 15:21:51.249391936 +0100
@@ -2135,14 +2135,18 @@ static bool load_file (const char *, con
 /* include_line()-- Checks a line buffer to see if it is an include
line.  If so, we call load_file() recursively to load the included
file.  We never return a syntax error because a statement like
-   "include = 5" is perfec

Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it

2018-11-12 Thread Richard Biener
On Mon, Nov 12, 2018 at 6:21 AM Kugan Vivekanandarajah
 wrote:
>
> Hi Richard,
>
> Thanks for the review.
> On Thu, 8 Nov 2018 at 00:03, Richard Biener  
> wrote:
> >
> > On Fri, Nov 2, 2018 at 10:02 AM Kugan Vivekanandarajah
> >  wrote:
> > >
> > > Hi Richard,
> > > Thanks for the review.
> > > On Tue, 30 Oct 2018 at 01:25, Richard Biener  
> > > wrote:
> > > >
> > > > On Mon, Oct 29, 2018 at 2:06 AM Kugan Vivekanandarajah
> > > >  wrote:
> > > > >
> > > > > Hi Richard and Jeff,
> > > > >
> > > > > Thanks for your comments.
> > > > >
> > > > > On Fri, 26 Oct 2018 at 19:40, Richard Biener 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Oct 26, 2018 at 4:55 AM Jeff Law  wrote:
> > > > > > >
> > > > > > > On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > PR87528 showed a case where libgcc generated popcount is causing
> > > > > > > > regression for Skylake.
> > > > > > > > We also have PR86677 where kernel build is failing because the 
> > > > > > > > kernel
> > > > > > > > does not use the libgcc (when backend is not defining popcount
> > > > > > > > pattern).  While I agree that the kernel should implement its 
> > > > > > > > own
> > > > > > > > functionality when it is not using the libgcc, I am afraid that 
> > > > > > > > the
> > > > > > > > implementation can have the same performance issues reported for
> > > > > > > > Skylake in PR87528.
> > > > > > > >
> > > > > > > > Therefore, I would like to propose that we disable popcount 
> > > > > > > > detection
> > > > > > > > when we don't have a pattern for that. The attached patch 
> > > > > > > > (based on
> > > > > > > > previous discussions) does this.
> > > > > > > >
> > > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu with no 
> > > > > > > > new
> > > > > > > > regressions. We need to disable the popcount* testcases. I will 
> > > > > > > > have
> > > > > > > > > to define an effective_target_with_popcount in
> > > > > > > > gcc/testsuite/lib/target-supports.exp if this patch is OK?
> > > > > > > > Thanks,
> > > > > > > > Kugan
> > > > > > > >
> > > > > > > >
> > > > > > > > gcc/ChangeLog:
> > > > > > > >
> > > > > > > > 2018-10-25  Kugan Vivekanandarajah  
> > > > > > > >
> > > > > > > > * tree-scalar-evolution.c (expression_expensive_p): Make 
> > > > > > > > BUILTIN POPCOUNT
> > > > > > > > as expensive when backend does not define it.
> > > > > > > >
> > > > > > > >
> > > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > >
> > > > > > > > 2018-10-25  Kugan Vivekanandarajah  
> > > > > > > >
> > > > > > > > * gcc.target/aarch64/popcount4.c: New test.
> > > > > > > >
> > > > > > > FWIW, I've been disabling by checking direct_optab_handler 
> > > > > > > elsewhere
> > > > > > > (number_of_iterations_popcount) in my tester.  It may in fact be 
> > > > > > > an old
> > > > > > > patch from you.
> > > > > > >
> > > > > > > Richi argued that it's the kernel team's responsibility to 
> > > > > > > provide a
> > > > > > > popcount since they don't link with libgcc.  And I'm generally in
> > > > > > > agreement with that position, though it does tend to generate some
> > > > > > > friction with the kernel developers.  We also run the real risk 
> > > > > > > of GCC 9
> > > > > > > not being able to build the kernel which, IMHO, would be a 
> > > > > > > disaster from
> > > > > > > a PR standpoint.
> > > > > > >
> > > > > > > I'd like to hear from others here.  I fully realize we're beyond 
> > > > > > > the
> > > > > > > realm of what is strictly technically correct here from a review 
> > > > > > > standpoint.
> > > > > >
> > > > > > As said final value replacement to a library call is probably not 
> > > > > > wanted
> > > > > > for optimization purpose, so adjusting expression_expensive_p is OK 
> > > > > > with
> > > > > > me.  It might not fully solve the (non-)issue in case another 
> > > > > > optimization pass
> > > > > > chooses to materialize niter computation result.
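
For readers of the archive, the kind of loop under discussion looks
roughly like the sketch below.  The function name is made up.  As I
understand the thread, the loop's trip count equals the number of set
bits, so final value replacement can turn the result into a popcount
builtin, which ends up as a libgcc call when the target has no popcount
pattern.

```c
/* Counts set bits by clearing the lowest one on each iteration.  The
   trip count of this loop equals popcount (X).  */
int
count_set_bits (unsigned long x)
{
  int n = 0;

  while (x != 0)
    {
      x &= x - 1;               /* Clear the lowest set bit.  */
      n++;
    }
  return n;
}
```
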
> > > > > >
> > > > > > Few comments on the patch:
> > > > > >
> > > > > > +  tree fndecl = get_callee_fndecl (expr);
> > > > > > +
> > > > > > +  if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == 
> > > > > > BUILT_IN_NORMAL)
> > > > > > +   {
> > > > > > + combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE 
> > > > > > (fndecl));
> > > > > >
> > > > > >   combined_fn cfn = gimple_call_combined_fn (expr);
> > > > > >   switch (cfn)
> > > > > > {
> > > > >
> > > > > Did you mean:
> > > > > combined_fn cfn = get_call_combined_fn (expr);
> > > >
> > > > Yes.
> > > >
> > > > > > ...
> > > > > >
> > > > > > cfn will be CFN_LAST for a non-builtin/internal call.  I know 
> > > > > > Richard is mostly
> > > > > > offline but eventually he knows whether there is a better way to 
> > > > > > query
> > > > > >
> > > > > > +   CASE_CFN_POPCOUNT:
> > > > > > + /* Check if opcode for popcount is available.  */
> > > > > > + if (optab_handler (popcount_optab,

Re: [RS6000] Don't pass -many to the assembler

2018-11-12 Thread Michael Matz
Hi,

On Mon, 12 Nov 2018, Alan Modra wrote:

> I'd like to remove -many from the options passed by default to the 
> assembler, on the grounds that a gcc bug in instruction selection (eg. 
> emitting a power9 insn for -mcpu=power8) is better found at assembly 
> time than run time.
> 
> This might annoy people for a while fixing user asm that we didn't 
> diagnose previously, but I believe this is the right direction to go. Of 
> course, -Wa,-many is available for anyone who just wants their dodgy old 
> code to work.

Wouldn't this also break compiling code that contains power9 instructions 
but guarded by runtime tests to only be executed on power9 machines?  That 
seems a valid usecase, and it'd be bad if the assembler fails to compile 
such.  (You can't use -mcpu=power9 as work around as the other 
unguarded code is not supposed to be using power9 insns).


Ciao,
Michael.

> 
> Bootstrapped etc. powerpc64le-linux.  OK?
> 
>   * config/rs6000/rs6000.h (ASM_CPU_SPEC): Remove -many.
>   * config/rs6000/aix61.h (ASM_CPU_SPEC): Likewise.
>   * config/rs6000/aix71.h (ASM_CPU_SPEC): Likewise.
>   * testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c: Don't use
>   power mnemonics.
> 
> diff --git a/gcc/config/rs6000/aix61.h b/gcc/config/rs6000/aix61.h
> index 353e5d6cfeb..a7a8246bfe3 100644
> --- a/gcc/config/rs6000/aix61.h
> +++ b/gcc/config/rs6000/aix61.h
> @@ -91,8 +91,7 @@ do {
> \
>  %{mcpu=630: -m620} \
>  %{mcpu=970: -m970} \
>  %{mcpu=G5: -m970} \
> -%{mvsx: %{!mcpu*: -mpwr6}} \
> --many"
> +%{mvsx: %{!mcpu*: -mpwr6}}"
>  
>  #undef   ASM_DEFAULT_SPEC
>  #define ASM_DEFAULT_SPEC "-mpwr4"
> diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
> index 2398ed64baa..d2ca8dc275d 100644
> --- a/gcc/config/rs6000/aix71.h
> +++ b/gcc/config/rs6000/aix71.h
> @@ -89,8 +89,7 @@ do {
> \
>   maltivec: -m970; \
>   maix64|mpowerpc64: -mppc64; \
>   : %(asm_default)}; \
> -  :%eMissing -mcpu option in ASM_SPEC_CPU?\n} \
> --many"
> +  :%eMissing -mcpu option in ASM_SPEC_CPU?\n}"
>  
>  #undef   ASM_DEFAULT_SPEC
>  #define ASM_DEFAULT_SPEC "-mpwr4"
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index d75137cf8f5..9d78173a680 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -137,8 +137,7 @@
>   mvsx: -mpower7; \
>   mpowerpc64: -mppc64;: %(asm_default)}; \
>:%eMissing -mcpu option in ASM_SPEC_CPU?\n} \
> -%{mvsx: -mvsx -maltivec; maltivec: -maltivec} \
> --many"
> +%{mvsx: -mvsx -maltivec; maltivec: -maltivec}"
>  
>  #define CPP_DEFAULT_SPEC ""
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c 
> b/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c
> index 14908dba690..eea7f6ffc2e 100644
> --- a/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/ppc32-abi-dfp-1.c
> @@ -45,14 +45,14 @@ __asm__ ("\t.globl\t" #NAME "_asm\n\t"
> \
>#NAME "_asm:\n\t"  \
>"lis 11,gparms@ha\n\t" \
>"la 11,gparms@l(11)\n\t"   \
> -  "st 3,0(11)\n\t"   \
> -  "st 4,4(11)\n\t"   \
> -  "st 5,8(11)\n\t"   \
> -  "st 6,12(11)\n\t"  \
> -  "st 7,16(11)\n\t"  \
> -  "st 8,20(11)\n\t"  \
> -  "st 9,24(11)\n\t"  \
> -  "st 10,28(11)\n\t" \
> +  "stw 3,0(11)\n\t"  \
> +  "stw 4,4(11)\n\t"  \
> +  "stw 5,8(11)\n\t"  \
> +  "stw 6,12(11)\n\t" \
> +  "stw 7,16(11)\n\t" \
> +  "stw 8,20(11)\n\t" \
> +  "stw 9,24(11)\n\t" \
> +  "stw 10,28(11)\n\t"\
>"stfd 1,32(11)\n\t"\
>"stfd 2,40(11)\n\t"\
>"stfd 3,48(11)\n\t"\
> 
> 


Re: RFA: vectorizer patches 2/2: reduction splitting

2018-11-12 Thread Richard Biener
On Sun, Nov 11, 2018 at 9:16 AM Joern Wolfgang Rennecke
 wrote:
>
> It's nice to use the processor's vector arithmetic to good effect, but
> it's all for naught when
> there are too many moves from/to general registers cluttering up the
> loop.  With a
> double-vector reduction variable, the standard final reduction code got
> so awkward that
> the register allocator decided that the reduction variable must live in
> general purpose
> registers, not only after the loop, but across the loop patch.
> Splitting the reduction to force the first step to be done as a vector
> operation
> seemed the obvious solution. The hook was called, but the vectorizer still
> generated the vanilla final reduction code.  It turns out that the
> reduction splitting
> was calculated, but the result not used, and the calculation started anew.
>
> The attached patch fixes this.

That looks quite fragile to me or warrants further cleanups.  Can you
push up the new_phis.length assert further and elide the loop over
the PHIs?  It looks like at the very beginning we are reducing the
PHIs to a single PHI and new_phi_result is the one to look at
(and the vector is updated, but given we replace the PHI with an
assign using new_phi_result instead of the vector would be better).

Richard.

> bootstrapped and regression tested on x86_64-pc-linux-gnu .


Re: RFA: vectorizer patches 1/2 : WIDEN_MULT_PLUS support

2018-11-12 Thread Richard Biener
On Sun, Nov 11, 2018 at 8:21 AM Joern Wolfgang Rennecke
 wrote:
>
> Our target (eSi-RISC) doesn't have DOT_PROD_EXPR or WIDEN_SUM_EXPR
> operations in
> the standard vector modes; however, it has a vectorized WIDEN_MULT_PLUS_EXPR
> implementation with a double-vector output, which works just as well,
> with a little
> help from the compiler - as implemented in these patches.

I guess I already asked this question when WIDEN_MULT_PLUS_EXPR was
introduced - but isn't that fully contained within a DOT_PROD_EXPR?

Some comments on the patch.

+  tree vecotype
+= build_vector_type (otype, GET_MODE_NUNITS (TYPE_MODE (vecitype)));

TYPE_VECTOR_SUBPARTS (vecitype)

You want to pass in the half/full types and use get_vectype_for_scalar_type
which also makes sure the target supports the vector type.

I think you want to extend and re-use supportable_widening_operation
here anyways.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 266008)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -10638,7 +10638,11 @@ vect_get_vector_types_for_stmt (stmt_vec
   scalar_type);

   if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),
-   GET_MODE_SIZE (TYPE_MODE (nunits_vectype))))
+   GET_MODE_SIZE (TYPE_MODE (nunits_vectype)))
+  /* Reductions that use a widening reduction would show
+a mismatch but that's already been checked to be OK.  */
+  && STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def)
+
 return opt_result::failure_at (stmt,
   "not vectorized: different sized vector "
   "types in statement, %T and %T\n",

that change doesn't look good.

> Bootstrapped and regtested on i686-pc-linux-gnu.


Re: [PATCH] Fix ICE with -fopt-info-inline (PR ipa/87955)

2018-11-12 Thread Richard Biener
On Sun, Nov 11, 2018 at 2:33 AM David Malcolm  wrote:
>
> PR ipa/87955 reports a problem I introduced in r265920, where I converted
> the guard in report_inline_failed_reason from using:
>   if (dump_file)
> to using
>   if (dump_enabled_p ()).
> without updating the calls to cl_target_option_print_diff and
> cl_optimization_print_diff, which assume that dump_file is non-NULL.
>
> The functions are auto-generated.  Rather than porting them to the dump
> API, this patch applies the workaround of adding the missing checks on
> dump_file before calling them.
>
> Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
>
> OK for trunk?

OK.
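
For the archive, the shape of the workaround as a standalone sketch.  The
names are hypothetical stand-ins: dump_enabled models dump_enabled_p (),
dump_stream models dump_file, and print_option_diff models an
auto-generated printer that assumes a non-NULL FILE *.

```c
#include <stdbool.h>
#include <stdio.h>

/* Models a generated printer that dereferences FILE unconditionally.  */
static int
print_option_diff (FILE *file, const char *caller, const char *callee)
{
  return fprintf (file, "  %s vs. %s\n", caller, callee);
}

/* Returns the number of characters written to the dump stream, or 0
   when nothing was written.  */
int
report_mismatch (bool dump_enabled, FILE *dump_stream,
                 const char *caller, const char *callee)
{
  if (!dump_enabled)
    return 0;
  /* Dumping can be enabled without a dump stream being open, so the
     FILE *-based printer needs its own guard.  */
  if (dump_stream)
    return print_option_diff (dump_stream, caller, callee);
  return 0;
}
```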

Richard.

> gcc/ChangeLog:
> PR ipa/87955
> * ipa-inline.c (report_inline_failed_reason): Guard calls to
> cl_target_option_print_diff and cl_optimization_print_diff with
> if (dump_file).
>
> gcc/testsuite/ChangeLog:
> PR ipa/87955
> * gcc.target/i386/pr87955.c: New test.
> ---
>  gcc/ipa-inline.c| 14 --
>  gcc/testsuite/gcc.target/i386/pr87955.c | 10 ++
>  2 files changed, 18 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr87955.c
>
> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> index e04ede7..173808a 100644
> --- a/gcc/ipa-inline.c
> +++ b/gcc/ipa-inline.c
> @@ -244,13 +244,15 @@ report_inline_failed_reason (struct cgraph_edge *e)
>e->callee->ultimate_alias_target 
> ()->lto_file_data->file_name);
> }
>if (e->inline_failed == CIF_TARGET_OPTION_MISMATCH)
> -   cl_target_option_print_diff
> -(dump_file, 2, target_opts_for_fn (e->caller->decl),
> -  target_opts_for_fn (e->callee->ultimate_alias_target ()->decl));
> +   if (dump_file)
> + cl_target_option_print_diff
> +   (dump_file, 2, target_opts_for_fn (e->caller->decl),
> +target_opts_for_fn (e->callee->ultimate_alias_target ()->decl));
>if (e->inline_failed == CIF_OPTIMIZATION_MISMATCH)
> -   cl_optimization_print_diff
> - (dump_file, 2, opts_for_fn (e->caller->decl),
> -  opts_for_fn (e->callee->ultimate_alias_target ()->decl));
> +   if (dump_file)
> + cl_optimization_print_diff
> +   (dump_file, 2, opts_for_fn (e->caller->decl),
> +opts_for_fn (e->callee->ultimate_alias_target ()->decl));
>  }
>  }
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr87955.c 
> b/gcc/testsuite/gcc.target/i386/pr87955.c
> new file mode 100644
> index 000..ed87da6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr87955.c
> @@ -0,0 +1,10 @@
> +/* { dg-options "-O2 -fopt-info-inline-missed" } */
> +
> +float a;
> +
> +__attribute__((__target__("fpmath=387")))
> +int b() {
> +  return a;
> +}
> +
> +int c() { return b(); } /* { dg-missed "not inlinable: c/\[0-9\]* -> 
> b/\[0-9\]*, target specific option mismatch" } */
> --
> 1.8.5.3
>


Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-11-12 Thread Richard Biener
On Sat, Nov 10, 2018 at 6:36 AM Segher Boessenkool
 wrote:
>
> On Fri, Nov 09, 2018 at 01:03:55PM -0700, Jeff Law wrote:
> > >> And signed zeroes.  Yeah.  I think it would have to be
> > >> flag_unsafe_math_optimizations + some more.
> > >
> > > Indeed.
> > So we need to give Giuliano some clear guidance on guarding.  This is
> > out of my area of expertise, so looking to y'all to help here.
>
> IMO, it needs flag_unsafe_math_optimizations, as above; and it needs to be
> investigated which (if any) options like flag_signed_zeros it needs in
> addition to that.  It needs an option like that whenever the new expression
> can give a zero with a different sign than the original expression, etc.
> Although it could be said that flag_unsafe_math_optimizations supersedes all
> of that.  It isn't clear.

It indeed isn't clear whether at least some of the other flags make no
sense with -funsafe-math-optimizations.  Still, at least for documentation
purposes, please use !flag_signed_zeros && flag_unsafe_math_optimizations && ...
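
As a reminder of why the signed-zeros guard matters at all, here is a
minimal sketch using a much simpler rewrite than the one in the patch:
replacing x + 0.0 by x.  The helper names are made up, and this says
nothing about whether the sinh/cosh rule itself changes the sign of zero.

```c
#include <math.h>

/* Returns nonzero if X has its sign bit set.  */
int
is_negative (double x)
{
  return signbit (x) != 0;
}

/* Computes x + 0.0 at run time.  In the default rounding mode the sum
   of -0.0 and +0.0 is +0.0, so folding this to plain "x" would change
   the sign of the result for x == -0.0.  */
double
add_zero (double x)
{
  volatile double zero = 0.0;   /* volatile keeps the addition from being folded.  */
  return x + zero;
}
```

The two results compare equal with ==, so only code that inspects the
sign bit (or divides by the result) can tell the difference.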

flag_unsafe_math_optimizations is generally used when there's extra rounding
involved.  Some specific kind of transforms have individual flags and do not
require flag_unsafe_math_optimizations (re-association and contraction
for example).

I'm not sure I would require flag_unsafe_math_optimizations for a 2ulp
error though.

Richard.

>
> Segher


Re: [PATCH, GCC, AArch64] Branch Dilution Pass

2018-11-12 Thread Richard Biener
On Fri, Nov 9, 2018 at 6:23 PM Sudakshina Das  wrote:
>
> Hi
>
> I am posting this patch on behalf of Carey (cc'ed). I also have some
> review comments that I will make as a reply to this later.
>
>
> This implements a new AArch64 specific back-end pass that helps optimize
> branch-dense code, which can be a bottleneck for performance on some Arm
> cores. This is achieved by padding out the branch-dense sections of the
> instruction stream with nops.

Wouldn't this be more suitable for implementing inside the assembler?

> This has proven to show up to a ~2.61% improvement on the Cortex-A72
> (SPEC CPU 2006: sjeng).
>
> The implementation includes the addition of a new RTX instruction class
> FILLER_INSN, which has been white listed to allow placement of NOPs
> outside of a basic block. This is to allow padding after unconditional
> branches. This is favorable so that any performance gained from
> diluting branches is not paid straight back via excessive eating of nops.
>
> It was deemed that a new RTX class was less invasive than modifying
> behavior in regards to standard UNSPEC nops.
>
> ## Command Line Options
>
> Three new target-specific options are provided:
> - mbranch-dilution
> - mbranch-dilution-granularity={num}
> - mbranch-dilution-max-branches={num}
>
> A number of cores known to be able to benefit from this pass have been
> given default tuning values for their granularity and max-branches.
> Each affected core has a very specific granule size and associated
> max-branch limit. This is a microarchitecture specific optimization.
> Typical usage should be -mbranch-dilution with a specified -mcpu. Cores
> with a granularity tuned to 0 will be ignored. Options are provided for
> experimentation.
>
> ## Algorithm and Heuristic
>
> The pass takes a very simple 'sliding window' approach to the problem.
> We crawl through each instruction (starting at the first branch) and
> keep track of the number of branches within the current "granule" (or
> window). When this exceeds the max-branch value, the pass will dilute
> the current granule, inserting nops to push out some of the branches.
> The heuristic will favour unconditional branches (for performance
> reasons), or branches that are between two other branches (in order to
> decrease the likelihood of another dilution call being needed).
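
To make the sliding-window idea concrete, here is a deliberately
simplified sketch.  It is not the pass: it measures the granule in
instruction slots, pads greedily in front of the offending branch, and
ignores the preference for unconditional branches described above.  All
names are hypothetical.

```c
#include <stddef.h>

#define MAX_TRACKED 16

/* Returns the number of nops a greedy padder inserts so that no run of
   GRANULE consecutive instruction slots holds more than MAX_BRANCHES
   branches.  IS_BRANCH[i] is nonzero when instruction i is a branch.
   Returns -1 for parameters this sketch does not handle.  */
long
count_dilution_nops (const unsigned char *is_branch, size_t n,
                     size_t granule, size_t max_branches)
{
  size_t recent[MAX_TRACKED];   /* Output slots of the latest branches.  */
  size_t seen = 0;              /* Branches placed so far.  */
  size_t slot = 0;              /* Next free output slot.  */
  long nops = 0;

  if (max_branches == 0 || max_branches > MAX_TRACKED)
    return -1;
  for (size_t i = 0; i < n; i++)
    {
      if (is_branch[i])
        {
          if (seen >= max_branches)
            {
              /* Slot of the branch MAX_BRANCHES places back.  */
              size_t oldest = recent[seen % max_branches];
              if (slot - oldest < granule)
                {
                  /* Too dense: push this branch out of the window.  */
                  size_t pad = granule - (slot - oldest);
                  nops += (long) pad;
                  slot += pad;
                }
            }
          recent[seen % max_branches] = slot;
          seen++;
        }
      slot++;
    }
  return nops;
}
```

For four back-to-back branches with a granule of 4 slots and at most 2
branches per granule, this pads two nops before the third branch and
none before the fourth.
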
>
> Each branch type required a different method for nop insertion due to
> RTL/basic_block restrictions:
>
> - Returning calls do not end a basic block so can be handled by emitting
> a generic nop.
> - Unconditional branches must be the end of a basic block, and nops
> cannot be outside of a basic block.
>Thus the need for FILLER_INSN, which allows placement outside of a
> basic block - and translates to a nop.
> - For most conditional branches we've taken a simple approach and only
> handle the fallthru edge for simplicity,
>which we do by inserting a "nop block" of nops on the fallthru edge,
> mapping that back to the original destination block.
> - asm gotos and pcsets are going to be tricky to analyse from a dilution
> perspective so are ignored at present.
>
>
> ## Changelog
>
> gcc/testsuite/ChangeLog:
>
> 2018-11-09  Carey Williams  
>
> * gcc.target/aarch64/branch-dilution-off.c: New test.
> * gcc.target/aarch64/branch-dilution-on.c: New test.
>
>
> gcc/ChangeLog:
>
> 2018-11-09  Carey Williams  
>
> * cfgbuild.c (inside_basic_block_p): Add FILLER_INSN case.
> * cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN outside
> basic blocks.
> * config.gcc (extra_objs): Add aarch64-branch-dilution.o.
> * config/aarch64/aarch64-branch-dilution.c: New file.
> * config/aarch64/aarch64-passes.def (branch-dilution): Register
> pass.
> * config/aarch64/aarch64-protos.h (struct tune_params): Declare
> tuning parameters bdilution_gsize and bdilution_maxb.
> (make_pass_branch_dilution): New declaration.
> * config/aarch64/aarch64.c (generic_tunings,cortexa35_tunings,
> cortexa53_tunings,cortexa57_tunings,cortexa72_tunings,
> cortexa73_tunings,exynosm1_tunings,thunderxt88_tunings,
> thunderx_tunings,tsv110_tunings,xgene1_tunings,
> qdf24xx_tunings,saphira_tunings,thunderx2t99_tunings):
> Provide default tunings for bdilution_gsize and bdilution_maxb.
> * config/aarch64/aarch64.md (filler_insn): Define new insn.
> * config/aarch64/aarch64.opt (mbranch-dilution,
> mbranch-dilution-granularity,
> mbranch-dilution-max-branches): Define new branch dilution
> options.
> * config/aarch64/t-aarch64 (aarch64-branch-dilution.c): New rule
> for aarch64-branch-dilution.c.
> * coretypes.h (rtx_filler_insn): New rtx class.
> * doc/invoke.texi (mbranch-dilution,
> mbranch-dilution-granularity,
> mbranch-dilution-max-branches): Document branch dilution
> options.
> * emit-rtl.c (em

Re: [PATCH][cunroll] Add unroll-known-loop-iterations-only param and use it in aarch64

2018-11-12 Thread Richard Biener
On Fri, Nov 9, 2018 at 6:57 PM Kyrill Tkachov
 wrote:
>
> On 09/11/18 12:18, Richard Biener wrote:
> > On Fri, Nov 9, 2018 at 11:47 AM Kyrill Tkachov
> >  wrote:
> >>
> >> Hi all,
> >>
> >> In this testcase the codegen for VLA SVE is worse than it could be due to 
> >> unrolling:
> >>
> >> fully_peel_me:
> >>  mov     x1, 5
> >>  ptrue   p1.d, all
> >>  whilelo p0.d, xzr, x1
> >>  ld1d    z0.d, p0/z, [x0]
> >>  fadd    z0.d, z0.d, z0.d
> >>  st1d    z0.d, p0, [x0]
> >>  cntd    x2
> >>  addvl   x3, x0, #1
> >>  whilelo p0.d, x2, x1
> >>  beq     .L1
> >>  ld1d    z0.d, p0/z, [x0, #1, mul vl]
> >>  fadd    z0.d, z0.d, z0.d
> >>  st1d    z0.d, p0, [x3]
> >>  cntw    x2
> >>  incb    x0, all, mul #2
> >>  whilelo p0.d, x2, x1
> >>  beq     .L1
> >>  ld1d    z0.d, p0/z, [x0]
> >>  fadd    z0.d, z0.d, z0.d
> >>  st1d    z0.d, p0, [x0]
> >> .L1:
> >>  ret
> >>
> >> In this case, due to the vector-length-agnostic nature of SVE the compiler 
> >> doesn't know the loop iteration count.
> >> For such loops we don't want to unroll if we don't end up eliminating 
> >> branches as this just bloats code size
> >> and hurts icache performance.
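
For context, a source loop of roughly the following shape would give
assembly like the above.  This is my reconstruction from the listing (the
constant 5 and the fadd of z0 with itself), not the testcase from the
patch.

```c
/* Doubles the first five elements of X in place.  The scalar trip
   count is a compile-time constant, but with vector-length-agnostic
   SVE code the number of vector iterations is not.  */
void
fully_peel_me (double *x)
{
  for (int i = 0; i < 5; i++)
    x[i] = x[i] + x[i];
}
```
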
> >>
> >> This patch introduces a new unroll-known-loop-iterations-only param that 
> >> disables cunroll when the loop iteration
> >> count is unknown (SCEV_NOT_KNOWN). This case occurs much more often for 
> >> SVE VLA code, but it does help some
> >> Advanced SIMD cases as well where loops with an unknown iteration count 
> >> are not unrolled when it doesn't eliminate
> >> the branches.
> >>
> >> So for the above testcase we generate now:
> >> fully_peel_me:
> >>  mov     x2, 5
> >>  mov     x3, x2
> >>  mov     x1, 0
> >>  whilelo p0.d, xzr, x2
> >>  ptrue   p1.d, all
> >> .L2:
> >>  ld1d    z0.d, p0/z, [x0, x1, lsl 3]
> >>  fadd    z0.d, z0.d, z0.d
> >>  st1d    z0.d, p0, [x0, x1, lsl 3]
> >>  incd    x1
> >>  whilelo p0.d, x1, x3
> >>  bne     .L2
> >>  ret
> >> Not perfect still, but it's preferable to the original code.
> >> The new param is enabled by default on aarch64 but disabled for other 
> >> targets, leaving their behaviour unchanged
> >> (until other target people experiment with it and set it, if appropriate).
> >>
> >> Bootstrapped and tested on aarch64-none-linux-gnu.
> >> Benchmarked on SPEC2017 on a Cortex-A57 and there are no differences in 
> >> performance.
> >>
> >> Ok for trunk?
> >
> > Hum.  Why introduce a new --param and not simply key on
> > flag_peel_loops instead?  That is
> > enabled by default at -O3 and with FDO but you of course can control
> > that in your targets
> > post-option-processing hook.
>
> You mean like this?
> It's certainly a simpler patch, but I was just a bit hesitant about making this
> change for all targets :)
> But I suppose it's a reasonable change.

No, that change is backward.  What I said is that peeling is already
conditional on
flag_peel_loops and that is enabled by -O3.  So you want to disable
flag_peel_loops for
SVE instead in the target.

> >
> > It might also make sense to have more fine-grained control for this
> > and allow a target
> > to say whether it wants to peel a specific loop or not when the
> > middle-end thinks that
> > would be profitable.
>
> Can be worth looking at as a follow-up. Do you envisage the target analysing
> the gimple statements of the loop to figure out its cost?

Kind-of.  Sth like

  bool targetm.peel_loop (struct loop *);

I have no idea whether you can easily detect a SVE vectorized loop though.
Maybe there's always a special IV or so (the mask?)

Richard.

> Thanks,
> Kyrill
>
>
> 2018-11-09  Kyrylo Tkachov  
>
> * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Do not unroll
> loop when number of iterations is not known and flag_peel_loops is in
> effect.
>
> 2018-11-09  Kyrylo Tkachov  
>
> * gcc.target/aarch64/sve/unroll-1.c: New test.
>


[PATCH] Replace sync builtins with atomic builtins

2018-11-12 Thread Janne Blomqvist
The old __sync builtins have been deprecated for a long time now in
favor of the __atomic builtins following the C++11/C11 memory model.
This patch converts libgfortran to use the modern __atomic builtins.

At the same time I weakened the consistency to relaxed for
incrementing and decrementing the counter, and acquire-release when
decrementing to check whether the counter is 0 and the unit can be
freed.  This is similar to e.g. std::shared_ptr in C++.  Jakub, as the
original author of the algorithm, do you concur?
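
For readers following the memory-order reasoning above, here is a minimal sketch of the counting scheme, assuming a compiler that provides the GCC __atomic builtins.  The struct and function names are hypothetical stand-ins, not the libgfortran ones; only the choice of memory orders mirrors the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for gfc_unit; only the counter matters here.  */
struct unit
{
  int waiting;
};

/* Taking a reference needs no ordering with other memory accesses,
   so a relaxed increment is enough.  */
static inline void
unit_inc_waiting (struct unit *u)
{
  (void) __atomic_fetch_add (&u->waiting, 1, __ATOMIC_RELAXED);
}

/* Dropping a reference and checking for zero: acquire-release makes
   earlier writes to the unit visible to whichever thread sees the
   count reach zero and goes on to free it.  Returns true when the
   caller holds the last reference.  */
static inline bool
unit_dec_and_test (struct unit *u)
{
  return __atomic_add_fetch (&u->waiting, -1, __ATOMIC_ACQ_REL) == 0;
}
```

Only the decrement whose result decides whether to free needs the stronger order; this is the same split used by typical shared-pointer reference counts.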

Regtested on x86_64-pc-linux-gnu, Ok for trunk?

libgfortran/ChangeLog:

2018-11-12  Janne Blomqvist  

* acinclude.m4 (LIBGFOR_CHECK_ATOMIC_FETCH_ADD): Rename and test
presence of atomic builtins instead of sync builtins.
* configure.ac (LIBGFOR_CHECK_ATOMIC_FETCH_ADD): Call new test.
* io/io.h (inc_waiting_locked): Use __atomic_fetch_add.
(predec_waiting_locked): Use __atomic_add_fetch.
(dec_waiting_unlocked): Use __atomic_fetch_add.
* config.h.in: Regenerated.
* configure: Regenerated.
---
 libgfortran/acinclude.m4 | 20 ++--
 libgfortran/configure.ac |  4 ++--
 libgfortran/io/io.h  | 24 ++--
 3 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/libgfortran/acinclude.m4 b/libgfortran/acinclude.m4
index dd5429ac0d2..5b0c094e716 100644
--- a/libgfortran/acinclude.m4
+++ b/libgfortran/acinclude.m4
@@ -59,17 +59,17 @@ extern void bar(void) __attribute__((alias("foo")));]],
   [Define to 1 if the target supports __attribute__((alias(...))).])
   fi])
 
-dnl Check whether the target supports __sync_fetch_and_add.
-AC_DEFUN([LIBGFOR_CHECK_SYNC_FETCH_AND_ADD], [
-  AC_CACHE_CHECK([whether the target supports __sync_fetch_and_add],
-libgfor_cv_have_sync_fetch_and_add, [
+dnl Check whether the target supports __atomic_fetch_add.
+AC_DEFUN([LIBGFOR_CHECK_ATOMIC_FETCH_ADD], [
+  AC_CACHE_CHECK([whether the target supports __atomic_fetch_add],
+libgfor_cv_have_atomic_fetch_add, [
   AC_LINK_IFELSE([AC_LANG_PROGRAM([[int foovar = 0;]], [[
-if (foovar <= 0) return __sync_fetch_and_add (&foovar, 1);
-if (foovar > 10) return __sync_add_and_fetch (&foovar, -1);]])],
- libgfor_cv_have_sync_fetch_and_add=yes, 
libgfor_cv_have_sync_fetch_and_add=no)])
-  if test $libgfor_cv_have_sync_fetch_and_add = yes; then
-AC_DEFINE(HAVE_SYNC_FETCH_AND_ADD, 1,
- [Define to 1 if the target supports __sync_fetch_and_add])
+if (foovar <= 0) return __atomic_fetch_add (&foovar, 1, __ATOMIC_ACQ_REL);
+if (foovar > 10) return __atomic_add_fetch (&foovar, -1, 
__ATOMIC_ACQ_REL);]])],
+ libgfor_cv_have_atomic_fetch_add=yes, 
libgfor_cv_have_atomic_fetch_add=no)])
+  if test $libgfor_cv_have_atomic_fetch_add = yes; then
+AC_DEFINE(HAVE_ATOMIC_FETCH_ADD, 1,
+ [Define to 1 if the target supports __atomic_fetch_add])
   fi])
 
 dnl Check for pragma weak.
diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index 76007d38f6f..30ff8734760 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -608,8 +608,8 @@ fi
 LIBGFOR_CHECK_ATTRIBUTE_VISIBILITY
 LIBGFOR_CHECK_ATTRIBUTE_ALIAS
 
-# Check out sync builtins support.
-LIBGFOR_CHECK_SYNC_FETCH_AND_ADD
+# Check out atomic builtins support.
+LIBGFOR_CHECK_ATOMIC_FETCH_ADD
 
 # Check out #pragma weak.
 LIBGFOR_GTHREAD_WEAK
diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index 902eb412848..282c1455763 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -961,8 +961,8 @@ internal_proto(free_ionml);
 static inline void
 inc_waiting_locked (gfc_unit *u)
 {
-#ifdef HAVE_SYNC_FETCH_AND_ADD
-  (void) __sync_fetch_and_add (&u->waiting, 1);
+#ifdef HAVE_ATOMIC_FETCH_ADD
+  (void) __atomic_fetch_add (&u->waiting, 1, __ATOMIC_RELAXED);
 #else
   u->waiting++;
 #endif
@@ -971,8 +971,20 @@ inc_waiting_locked (gfc_unit *u)
 static inline int
 predec_waiting_locked (gfc_unit *u)
 {
-#ifdef HAVE_SYNC_FETCH_AND_ADD
-  return __sync_add_and_fetch (&u->waiting, -1);
+#ifdef HAVE_ATOMIC_FETCH_ADD
+  /* Note that the pattern
+
+ if (predec_waiting_locked (u) == 0)
+ // destroy u
+
+ could be further optimized by making this be an __ATOMIC_RELEASE,
+ and then inserting a
+
+ __atomic_thread_fence (__ATOMIC_ACQUIRE);
+
+ inside the branch before destroying.  But for now, lets keep it
+ simple.  */
+  return __atomic_add_fetch (&u->waiting, -1, __ATOMIC_ACQ_REL);
 #else
   return --u->waiting;
 #endif
@@ -981,8 +993,8 @@ predec_waiting_locked (gfc_unit *u)
 static inline void
 dec_waiting_unlocked (gfc_unit *u)
 {
-#ifdef HAVE_SYNC_FETCH_AND_ADD
-  (void) __sync_fetch_and_add (&u->waiting, -1);
+#ifdef HAVE_ATOMIC_FETCH_ADD
+  (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
 #else
   __gthread_mutex_lock (&unit_lock);
   u->waiting--;
-- 
2.17.1



Re: [PATCH] Simplify floating point comparisons

2018-11-12 Thread Richard Biener
On Fri, Nov 9, 2018 at 6:05 PM Wilco Dijkstra  wrote:
>
> Richard Biener wrote:
> >Marc Glisse wrote:
> >> Let's try with C = DBL_MIN and x = ±DBL_MAX. I don't believe it involves
> >> signed zeros or infinities, just an underflow. First, the result depends on
> >> the rounding mode. And in the default round-to-nearest, both divisions give
> >> 0, and thus compare the same with 0, but we replace that with a sign test 
> >> on
> >> x, where they clearly give opposite answers.
> >>
> >> What would be the proper flag to test to check if we care about underflow?
> >
> > We have none specific so this makes it flag_unsafe_math_optimizations.
>
> Right I have added the unsafe math check again like in the previous version:
>
>
> The patch implements some of the optimizations discussed in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026.
>
> Simplify (C / x >= 0.0) into x >= 0.0 with -funsafe-math-optimizations
> (since C / x can underflow to zero if x is huge, it's not safe otherwise).
> If C is negative the comparison is reversed.
>
>
> Simplify (x * C1) > C2 into x > (C2 / C1) with -funsafe-math-optimizations.
> If C1 is negative the comparison is reversed.
>
> OK for commit?
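
A small sketch of the underflow case discussed above, assuming IEEE double arithmetic in the default round-to-nearest mode.  The helper names are made up for illustration; the volatile store is only there to force the quotient to be rounded to double on targets with excess precision.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Original form: divide, then compare the quotient against zero.  */
static bool
quotient_nonneg (double c, double x)
{
  volatile double q = c / x;	/* Force rounding to double.  */
  return q >= 0.0;
}

/* Simplified form for C > 0: a plain sign test on x.  */
static bool
sign_test (double x)
{
  return x >= 0.0;
}
```

With C = DBL_MIN the quotient underflows to +0.0 for x = DBL_MAX and to -0.0 for x = -DBL_MAX, and both compare >= 0.0, whereas the sign test on x gives opposite answers.  That difference is why the transformation is only acceptable under -funsafe-math-optimizations.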

OK.

Thanks,
Richard.

> ChangeLog
> 2018-11-09  Wilco Dijkstra  
> Jackson Woodruff  
>
> gcc/
> PR 71026/tree-optimization
> * match.pd: Simplify floating point comparisons.
>
> gcc/testsuite/
> PR 71026/tree-optimization
> * gcc.dg/div-cmp-1.c: New test.
> * gcc.dg/div-cmp-2.c: New test.
>
> --
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> 94fbab841f5e36bd33fda849a686fd80886ee1ff..f6c76510f95be2485e5bacd07edab336705cbd25
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -405,6 +405,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (rdiv @0 (negate @1))
>   (rdiv (negate @0) @1))
>
> +(if (flag_unsafe_math_optimizations)
> + /* Simplify (C / x op 0.0) to x op 0.0 for C != 0, C != Inf/Nan.
> +Since C / x may underflow to zero, do this only for unsafe math.  */
> + (for op (lt le gt ge)
> +  neg_op (gt ge lt le)
> +  (simplify
> +   (op (rdiv REAL_CST@0 @1) real_zerop@2)
> +   (if (!HONOR_SIGNED_ZEROS (@1) && !HONOR_INFINITIES (@1))
> +(switch
> + (if (real_less (&dconst0, TREE_REAL_CST_PTR (@0)))
> +  (op @1 @2))
> + /* For C < 0, use the inverted operator.  */
> + (if (real_less (TREE_REAL_CST_PTR (@0), &dconst0))
> +  (neg_op @1 @2)))
> +
>  /* Optimize (X & (-A)) / A where A is a power of 2, to X >> log2(A) */
>  (for div (trunc_div ceil_div floor_div round_div exact_div)
>   (simplify
> @@ -4049,6 +4064,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (rdiv @2 @1))
> (rdiv (op @0 @2) @1)))
>
> + (for cmp (lt le gt ge)
> +  neg_cmp (gt ge lt le)
> +  /* Simplify (x * C1) cmp C2 -> x cmp (C2 / C1), where C1 != 0.  */
> +  (simplify
> +   (cmp (mult @0 REAL_CST@1) REAL_CST@2)
> +   (with
> +{ tree tem = const_binop (RDIV_EXPR, type, @2, @1); }
> +(if (tem
> +&& !(REAL_VALUE_ISINF (TREE_REAL_CST (tem))
> + || (real_zerop (tem) && !real_zerop (@1
> + (switch
> +  (if (real_less (&dconst0, TREE_REAL_CST_PTR (@1)))
> +   (cmp @0 { tem; }))
> +  (if (real_less (TREE_REAL_CST_PTR (@1), &dconst0))
> +   (neg_cmp @0 { tem; })))
> +
>   /* Simplify sqrt(x) * sqrt(y) -> sqrt(x*y).  */
>   (for root (SQRT CBRT)
>(simplify
> diff --git a/gcc/testsuite/gcc.dg/div-cmp-1.c 
> b/gcc/testsuite/gcc.dg/div-cmp-1.c
> new file mode 100644
> index 
> ..cd1a5cd3d6fee5a10e9859ca99b344fa3fdb7f5f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/div-cmp-1.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -funsafe-math-optimizations -fdump-tree-optimized-raw" 
> } */
> +
> +int
> +cmp_mul_1 (float x)
> +{
> +  return x * 3 <= 100;
> +}
> +
> +int
> +cmp_mul_2 (float x)
> +{
> +  return x * -5 > 100;
> +}
> +
> +int
> +div_cmp_1 (float x, float y)
> +{
> +  return x / 3 <= y;
> +}
> +
> +int
> +div_cmp_2 (float x, float y)
> +{
> +  return x / 3 <= 1;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "mult_expr" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "rdiv_expr" "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/div-cmp-2.c 
> b/gcc/testsuite/gcc.dg/div-cmp-2.c
> new file mode 100644
> index 
> ..f4ac42a196a804747d0b578e0aa2131671c8d3cf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/div-cmp-2.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -funsafe-math-optimizations -ffinite-math-only 
> -fdump-tree-optimized-raw" } */
> +
> +int
> +cmp_1 (float x)
> +{
> +  return 5 / x >= 0;
> +}
> +
> +int
> +cmp_2 (float x)
> +{
> +  return 1 / x <= 0;
> +}
> +
> +int
> +cmp_3 (float x)
> +{
> +  return -2 / x >= 0;
> +}
> +
> +int
> +cmp_4 (float x)
> +{
> +  return -5 / x <= 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "rdiv_expr" "optimized" } } */
>

Re: [PATCH][GCC] Make DR_TARGET_ALIGNMENT compile time variable

2018-11-12 Thread Richard Biener
On Fri, Nov 9, 2018 at 5:08 PM Andre Vieira (lists)
 wrote:
>
> On 05/11/18 12:41, Richard Biener wrote:
> > On Mon, Nov 5, 2018 at 1:07 PM Andre Vieira (lists)
> >  wrote:
> >>
> >>
> >> Hi,
> >>
> Hi,
>
> Thank you for the quick response! See inline responses below.
>
> >> This patch enables targets to describe DR_TARGET_ALIGNMENT as a
> >> compile-time variable.  It does so by turning the variable into a
> >> 'poly_uint64'.  This should not affect the current code-generation for
> >> any target.
> >>
> >> We hope to use this in the near future for SVE using the
> >> current_vector_size as the preferred target alignment for vectors.  In
> >> fact I have a patch to do just this, but I am still trying to figure out
> >> whether and when it is beneficial to peel for alignment with a runtime
> >> misalignment.
> >
> > In fact in most cases I have seen the issue is that it's not visible whether
> > peeling will be able to align _all_ references and doing peeling only to
> > align some is hardly beneficial.  To improve things the vectorizer would
> > have to version the loop for the case where peeling can reach alignment
> > for a group of DRs and then vectorize one copy with peeling for alignment
> > and one copy with unaligned accesses.
>
>
> So I have seen code being peeled for alignment even when it only knows
> how to align one of a group (only checked 2 or 3) and I think this may
> still be beneficial in some cases.  I am more worried about cases where
> the number of iterations isn't enough to justify the initial peeling
> cost or when the loop isn't memory bound, i.e. very arithmetic heavy
> loops.  This is a bigger vectorization problem though, that would
> require some kind of cost-model.
>
> >
> >>  The patch I am working on will change the behavior of
> >> auto-vectorization for SVE when building vector-length agnostic code for
> >> targets that benefit from aligned vector loads/stores.  The patch will
> >> result in  the generation of a runtime computation of misalignment and
> >> the construction of a corresponding mask for the first iteration of the
> >> loop.
> >>
> >> I have decided to not offer support for prolog/epilog peeling when the
> >> target alignment is not compile-time constant, as this didn't seem
> >> useful, this is why 'vect_do_peeling' returns early if
> >> DR_TARGET_ALIGNMENT is not constant.
> >>
> >> I bootstrapped and tested this on aarch64 and x86 basically
> >> bootstrapping one target that uses this hook and one that doesn't.
> >>
> >> Is this OK for trunk?
> >
> > The patch looks good but I wonder whether it is really necessary at this
> > point.
>
> The goal of this patch is really to enable future work; on its own it
> does nothing.  I am working on a small target-specific patch to enable
> this for SVE, but I need to do a bit more analysis and benchmarking to
> be able to determine whether it's beneficial, which I will not be able to
> finish before end of stage 1. That is why I split them up and sent this
> one upstream to see if I could get the middle-end change in.

OK, fine with me then.
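
As a rough illustration of the "runtime computation of misalignment" and first-iteration mask mentioned in the quoted proposal, the following sketch shows the arithmetic in plain C.  It is not the vectorizer's code: the function names are hypothetical, the alignment is assumed to be a power of two, and the lane mask is modelled as a bit set of at most 64 lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of elements by which ADDR sits past the previous
   ALIGN_BYTES boundary.  ALIGN_BYTES must be a power of two.  */
static size_t
misalign_in_elems (uintptr_t addr, size_t align_bytes, size_t elem_size)
{
  return (addr & (align_bytes - 1)) / elem_size;
}

/* Mask for the first vector iteration when the access starts at the
   aligned-down address: the leading MISALIGN lanes are switched off,
   the remaining lanes up to LANES are active.  Requires
   MISALIGN < LANES <= 64.  */
static uint64_t
first_iteration_mask (size_t misalign, size_t lanes)
{
  uint64_t all = lanes >= 64 ? ~UINT64_C (0) : (UINT64_C (1) << lanes) - 1;
  uint64_t skipped = (UINT64_C (1) << misalign) - 1;
  return all & ~skipped;
}
```

For a vector-length-agnostic target the alignment itself is only known at run time, which is why it is passed as an ordinary parameter in this sketch.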

Thanks,
Richard.

> >
> > Thanks,
> > Richard.
> >
> >> Cheers,
> >> Andre
> >>
> >> 2018-11-05  Andre Vieira  
> >>
> >> * config/aarch64/aarch64.c 
> >> (aarch64_vectorize_preferred_vector_alignment):
> >> Change return type to poly_uint64.
> >> (aarch64_simd_vector_alignment_reachable): Adapt to preferred 
> >> vector
> >> alignment being a poly int.
> >> * doc/tm.texi (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): 
> >> Change return
> >> type to poly_uint64.
> >> * target.def (default_preferred_vector_alignment): Likewise.
> >> * targhooks.c (default_preferred_vector_alignment): Likewise.
> >> * targhooks.h (default_preferred_vector_alignment): Likewise.
> >> * tree-vect-data-refs.c
> >> (vect_calculate_target_alignment): Likewise.
> >> (vect_compute_data_ref_alignment): Adapt to vector alignment
> >> being a poly int.
> >> (vect_update_misalignment_for_peel): Likewise.
> >> (vect_enhance_data_refs_alignment): Likewise.
> >> (vect_find_same_alignment_drs): Likewise.
> >> (vect_duplicate_ssa_name_ptr_info): Likewise.
> >> (vect_setup_realignment): Likewise.
> >> (vect_can_force_dr_alignment_p): Change alignment parameter type to
> >> poly_uint64.
> >> * tree-vect-loop-manip.c (get_misalign_in_elems): Learn to 
> >> construct a mask
> >> with a compile time variable vector alignment.
> >> (vect_gen_prolog_loop_niters): Adapt to vector alignment being a 
> >> poly int.
> >> (vect_do_peeling): Exit early if vector alignment is not constant.
> >> * tree-vect-stmts.c (ensure_base_align): Adapt to vector alignment 
> >> being a
> >> poly int.
> >> (vectorizable_store): Likewise.
> >> (vectorizable_load): Likewise.
> >> * tree-vectorizer.h (struct dr_vec_info): Make target

Re: [PATCH] Change set_value_range_to_[non]null to not preserve equivs

2018-11-12 Thread Jeff Law
On 11/12/18 4:11 AM, Richard Biener wrote:
> 
> This is a semantic change but AFAICS it shouldn't result in any 
> pessimization.  The behavior of the API is non-obvious.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> 
> Richard.
> 
> 2018-11-12  Richard Biener  
> 
>   * tree-vrp.c (set_value_range_to_nonnull): Clear equiv.
>   (set_value_range_to_null): Likewise.
>   * vr-values.c (vr_values::extract_range_from_comparison):
>   Clear equiv for constant singleton ranges.
No concerns from my side.  When I did my work last year I was trying to
preserve existing semantics, so I didn't really look at places to drop
uses of the equivalence bitmaps.

Jeff


[PATCH] Fix PR87985

2018-11-12 Thread Richard Biener


The following fixes split_constant_offset unbound un-CSEing of
expressions when following SSA def stmts.  Simply limiting it to
single-uses isn't good for consumers so the following instead
limits analysis by implementing a cache.  Note this may still
end up un-CSEing stuff but I didn't want to try inserting
SAVE_EXPRs in split_constant_offset result...  (maybe I should
simply try though...).  Another option would be to give up
when we see several uses of an "interesting" expression, thus
make the hash-map a visited thing instead (but the result would
be somewhat odd I guess).

Anyway, the following preserves existing behavior while fixing
the compile-time issue for the testcase (which doesn't end up
generating anything interesting).
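
To see why a cache bounds the work, here is a toy model, not the tree-data-ref.c code: node N of a DAG uses nodes N-1 and N-2 as operands, much like the a = a + b; b = b + a chain in the testcase.  The function names and the MAX_NODES limit are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 64

/* Expand NODE by following both operands every time it is reached.
   Returns the number of expansions performed.  */
static long
expand_naive (int node)
{
  if (node < 2)
    return 1;
  return 1 + expand_naive (node - 1) + expand_naive (node - 2);
}

/* Same walk, but each node is expanded at most once; SEEN plays the
   role of the cache.  Returns the number of expansions performed.  */
static long
expand_cached (int node, bool *seen)
{
  if (seen[node])
    return 0;
  seen[node] = true;
  if (node < 2)
    return 1;
  return 1 + expand_cached (node - 1, seen) + expand_cached (node - 2, seen);
}
```

The naive walk grows like the Fibonacci numbers in the depth of the chain, while the cached walk is linear in the number of distinct nodes.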

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

From cfe2c14173b8d2fa6e998e9895dce0cdf9b3e00e Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Mon, 12 Nov 2018 14:45:27 +0100
Subject: [PATCH] fix-pr87985

PR middle-end/87985
* tree-data-ref.c (split_constant_offset): Add wrapper
allocating a cache hash-map.
(split_constant_offset_1): Cache results of expanding
expressions from SSA def stmts.

* gcc.dg/pr87985.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/pr87985.c b/gcc/testsuite/gcc.dg/pr87985.c
new file mode 100644
index 000..c0d07ff918f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr87985.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O -ftree-slp-vectorize" } */
+
+char *bar (void);
+__INTPTR_TYPE__ baz (void);
+
+void
+foo (__INTPTR_TYPE__ *q)
+{
+  char *p = bar ();
+  __INTPTR_TYPE__ a = baz ();
+  __INTPTR_TYPE__ b = baz ();
+  int i = 0;
+#define X q[i++] = a; q[i++] = b; a = a + b; b = b + a;
+#define Y X X X X X X X X X X
+#define Z Y Y Y Y Y Y Y Y Y Y
+  Z Z Z Z Z Z Z Z Z Z
+  p[a] = 1;
+  p[b] = 2;
+}
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 6019c6168bf..0617c97eec4 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -95,10 +95,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-affine.h"
 #include "params.h"
 #include "builtins.h"
-#include "stringpool.h"
-#include "tree-vrp.h"
-#include "tree-ssanames.h"
 #include "tree-eh.h"
+#include "ssa.h"
 
 static struct datadep_stats
 {
@@ -584,6 +582,10 @@ debug_ddrs (vec ddrs)
   dump_ddrs (stderr, ddrs);
 }
 
+static void
+split_constant_offset (tree exp, tree *var, tree *off,
+  hash_map > &cache);
+
 /* Helper function for split_constant_offset.  Expresses OP0 CODE OP1
(the type of the result is TYPE) as VAR + OFF, where OFF is a nonzero
constant of type ssizetype, and returns true.  If we cannot do this
@@ -592,7 +594,8 @@ debug_ddrs (vec ddrs)
 
 static bool
 split_constant_offset_1 (tree type, tree op0, enum tree_code code, tree op1,
-tree *var, tree *off)
+tree *var, tree *off,
+hash_map > &cache)
 {
   tree var0, var1;
   tree off0, off1;
@@ -613,8 +616,10 @@ split_constant_offset_1 (tree type, tree op0, enum 
tree_code code, tree op1,
   /* FALLTHROUGH */
 case PLUS_EXPR:
 case MINUS_EXPR:
-  split_constant_offset (op0, &var0, &off0);
-  split_constant_offset (op1, &var1, &off1);
+  split_constant_offset (op0, &var0, &off0, cache);
+  split_constant_offset (op1, &var1, &off1, cache);
+  if (integer_zerop (off0) && integer_zerop (off1))
+   return false;
   *var = fold_build2 (code, type, var0, var1);
   *off = size_binop (ocode, off0, off1);
   return true;
@@ -623,7 +628,9 @@ split_constant_offset_1 (tree type, tree op0, enum 
tree_code code, tree op1,
   if (TREE_CODE (op1) != INTEGER_CST)
return false;
 
-  split_constant_offset (op0, &var0, &off0);
+  split_constant_offset (op0, &var0, &off0, cache);
+  if (integer_zerop (off0))
+   return false;
   *var = fold_build2 (MULT_EXPR, type, var0, op1);
   *off = size_binop (MULT_EXPR, off0, fold_convert (ssizetype, op1));
   return true;
@@ -647,7 +654,7 @@ split_constant_offset_1 (tree type, tree op0, enum 
tree_code code, tree op1,
 
if (poffset)
  {
-   split_constant_offset (poffset, &poffset, &off1);
+   split_constant_offset (poffset, &poffset, &off1, cache);
off0 = size_binop (PLUS_EXPR, off0, off1);
if (POINTER_TYPE_P (TREE_TYPE (base)))
  base = fold_build_pointer_plus (base, poffset);
@@ -691,11 +698,40 @@ split_constant_offset_1 (tree type, tree op0, enum 
tree_code code, tree op1,
if (gimple_code (def_stmt) != GIMPLE_ASSIGN)
  return false;
 
-   var0 = gimple_assign_rhs1 (def_stmt);
subcode = gimple_assign_rhs_code (def_stmt);
+
+   /* We are using a cache to avoid un-CSEing large amounts of code.  */
+   bool use_cache = false;
+   if (!has_single_use (op0)
+   && (subcode == POINTER_PLUS_EXPR
+

Re: [PATCH] Come up with htab_hash_string_vptr and use string-specific if possible.

2018-11-12 Thread Michael Matz
Hi,

On Mon, 12 Nov 2018, Martin Liška wrote:

> > There's no fundamental reason why we can't poison identifiers in other 
> > headers.  Indeed we do in vec.h.  So move the whole thing including 
> > poisoning to hash-table.h?
> 
> That's not feasible as gcc/gcc/genhooks.c files use the function and
> we don't want to include hash-table.h in the generator files.

gencfn-macros.c:#include "hash-table.h"
genmatch.c:#include "hash-table.h"
gentarget-def.c:#include "hash-table.h"

So there's precedent.  The other solution would be to ignore genhooks.c 
(i.e. let it continue using the non-typesafe variant), I'm not very 
worried about wrong uses creeping in there.  It had like one material 
change in the last seven years.

I think I prefer the latter (ignoring the problem).

> So the question is whether it's worth doing that?

Jumping through hoops for generator files seems useless.  But the general 
idea for your type-checking hashers for the compiler proper does seem 
useful.


Ciao,
Michael.

Re: [RS6000] Don't pass -many to the assembler

2018-11-12 Thread Alan Modra
On Mon, Nov 12, 2018 at 10:19:04PM +1030, Alan Modra wrote:
> I'd like to remove -many from the options passed by default to the
> assembler, on the grounds that a gcc bug in instruction selection (eg.
> emitting a power9 insn for -mcpu=power8) is better found at assembly
> time than run time.
> 
> This might annoy people for a while fixing user asm that we didn't
> diagnose previously, but I believe this is the right direction to go.
> Of course, -Wa,-many is available for anyone who just wants their
> dodgy old code to work.
> 
> Bootstrapped etc. powerpc64le-linux.  OK?

I forgot to mention something important.  This exposes a bug with our
target_clones support, in that we don't emit .machine directives when
changing cpu.  eg. gcc.target/powerpc/clone2.c fails with
"unrecognized opcode: `modsd'".

__attribute__((__target__("cpu=..."))) also doesn't emit a .machine
directive before the affected function code.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] combine: Do not combine moves from hard registers

2018-11-12 Thread Segher Boessenkool
On Mon, Nov 12, 2018 at 11:54:37AM +, Sam Tebbs wrote:
> On 11/08/2018 08:34 PM, Segher Boessenkool wrote:
> 
> > On Thu, Nov 08, 2018 at 03:44:44PM +, Sam Tebbs wrote:
> >> Does your patch fix the incorrect generation of "scvtf s1, s1"? I was
> >> looking at the issue as well and don't want to do any overlapping work.
> > I don't know.  Well, there are no incorrect code issues I know of at all
> > now; but you mean that it is taking an instruction more than you would
> > like to see, I suppose?
> 
> Yes, I am referring to the extra instruction being generated. In my
> opinion it is incorrect code generation, since the intention is that it
> shouldn't be generated, and it shouldn't be based on the relevant code and
> patterns implemented.

That is not what incorrect code means; it is a "missed optimisation",
instead.

Anyway, does it now do what you want?

(PR87763 btw)


Segher


Re: [PATCH 21/25] GCN Back-end (part 2/2).

2018-11-12 Thread Andrew Stubbs

On 09/11/2018 19:39, Jeff Law wrote:

+
+/* Generate epilogue.  Called from gen_epilogue during pro_and_epilogue pass.
+
+   See gcn_expand_prologue for stack details.  */
+
+void
+gcn_expand_epilogue (void)

You probably need a barrier in here to ensure that the scheduler doesn't
move an aliased memory reference into the local stack beyond the stack
adjustment.

You're less likely to run into it because you eliminate frame pointers
fairly aggressively, but it's still the right thing to do.


Sorry, I'm not sure I understand what the problem is? How can this 
happen? Surely the scheduler wouldn't change the logic of the code?



+
+/* Implement TARGET_LEGITIMATE_COMBINED_INSN.
+
+   Return false if the instruction is not appropriate as a combination of two
+   or more instructions.  */
+
+bool
+gcn_legitimate_combined_insn (rtx_insn *insn)
+{
+  rtx pat = PATTERN (insn);
+
+  /* The combine pass tends to strip (use (exec)) patterns from insns.  This
+ means it basically switches everything to use the *_scalar form of the
+ instructions, which is not helpful.  So, this function disallows such
+ combinations.  Unfortunately, this also disallows combinations of genuine
+ scalar-only patterns, but those only come from explicit expand code.
+
+ Possible solutions:
+ - Invent TARGET_LEGITIMIZE_COMBINED_INSN.
+ - Remove all (use (EXEC)) and rely on md_reorg with "exec" attribute.
+   */

This seems a bit hokey.  Why specifically is combine removing the USE?


I don't understand combine fully enough to explain it now, although at 
the time I wrote this, and in a GCC 7 code base, I had followed the code 
through and observed what it was doing.


Basically, if you have two patterns that do the same operation, but one 
has a "parallel" with an additional "use", then combine will tend to 
prefer the one without the "use". That doesn't stop the code working, 
but it makes a premature (accidental) decision about instruction 
selection that we'd prefer to leave to the register allocator.


I don't recall if it did this to lone instructions, but it would 
certainly do so when combining two (or more) instructions, and IIRC 
there are typically plenty of simple moves around that can be easily 
combined.



+  /* "Manually Inserted Wait States (NOPs)."
+
+ GCN hardware detects most kinds of register dependencies, but there
+ are some exceptions documented in the ISA manual.  This pass
+ detects the missed cases, and inserts the documented number of NOPs
+ required for correct execution.  */

How unpleasant :(  But if it's what you need to do, so be it.  I'll
assume the compiler is the right place to do this -- though some ports
handle this kind of stuff in the assembler or linker.


We're using an LLVM assembler and linker, so we have tried to use them 
as is, rather than making parallel changes that would prevent GCC 
working with the last numbered release of LLVM (see the work around for 
assembler bugs in the BImode mode instruction).


Expecting the assembler to fix this up would also throw off the 
compiler's offset calculations, and the near/far branch instructions 
have different register requirements it's better for the compiler to 
know about.


The MIPS backend also inserts NOPs in a similar way.

In future, I'd like to have the scheduler insert real instructions into 
these slots, but that's very much on the to-do list.



+/* Disable the "current_vector_size" feature intended for
+   AVX<->SSE switching.  */

Guessing you just copied the comment, you probably want to update it to
not refer to AVX/SSE.


Nope, that means exactly what it says. See the (unresolved) discussion 
around "[PATCH 13/25] Create TARGET_DISABLE_CURRENT_VECTOR_SIZE".


I'll probably move that into a separate patch to commit after the main 
port. It'll suffer poor vectorization in some examples in the mean-time, 
but that patch is not going to be straight-forward.



You probably need to define the safe-speculation stuff
(TARGET_SPECULATION_SAFE_VALUE).


Oh, OK. :-(

I have no idea whether the architecture has those issues or not.


+; "addptr" is the same as "add" except that it must not write to VCC or SCC
+; as a side-effect.  Unfortunately GCN3 does not have a suitable instruction
+; for this, so we use a split to save and restore the condition code.
+; This pattern must use "Sg" instead of "SD" to prevent the compiler
+; assigning VCC as the destination.
+; FIXME: Provide GCN5 implementation

I worry about the save/restore aspects of this.  Haven't we discussed
this somewhere?!?


I think this came up in the SPECIAL_REGNO_P patch discussion. We 
eventually found that the underlying problem was the way the 
save/restore reused pseudoregs.


The "addptr" pattern has been rewritten in my draft V2 patchset. It 
still uses a fixed scratch register, but no longer does save/restore.



Generally I don't see major concerns.  There are some minor things to
fix.  As far as the correctness of the code yo

Re: [PATCH 3/9][GCC][AArch64] Add autovectorization support for Complex instructions

2018-11-12 Thread Tamar Christina
Hi Kyrill,

> Hi Tamar,
> 
> On 11/11/18 10:26, Tamar Christina wrote:
> > Hi All,
> >
> > This patch adds the expander support for supporting autovectorization of 
> > complex number operations
> > such as Complex addition with a rotation along the Argand plane.  This also 
> > adds support for complex
> > FMA.
> >
> > The instructions are described in the ArmARM [1] and are available from 
> > Armv8.3-a onwards.
> >
> > Concretely, this generates
> >
> > f90:
> > mov x3, 0
> > .p2align 3,,7
> > .L2:
> > ldr q0, [x0, x3]
> > ldr q1, [x1, x3]
> > fcadd   v0.2d, v0.2d, v1.2d, #90
> > str q0, [x2, x3]
> > add x3, x3, 16
> > cmp x3, 3200
> > bne .L2
> > ret
> >
> > now instead of
> >
> > f90:
> > mov x4, x1
> > mov x1, x2
> > add x3, x4, 31
> > add x2, x0, 31
> > sub x3, x3, x1
> > sub x2, x2, x1
> > cmp x3, 62
> > mov x3, 62
> > ccmpx2, x3, 0, hi
> > bls .L5
> > mov x2, x4
> > add x3, x0, 3200
> > .p2align 3,,7
> > .L3:
> > ld2 {v2.2d - v3.2d}, [x0], 32
> > ld2 {v4.2d - v5.2d}, [x2], 32
> > cmp x0, x3
> > fsubv0.2d, v2.2d, v5.2d
> > faddv1.2d, v4.2d, v3.2d
> > st2 {v0.2d - v1.2d}, [x1], 32
> > bne .L3
> > ret
> > .L5:
> > add x6, x0, 8
> > add x5, x4, 8
> > add x2, x1, 8
> > mov x3, 0
> > .p2align 3,,7
> > .L2:
> > ldr d1, [x0, x3]
> > ldr d3, [x5, x3]
> > ldr d0, [x6, x3]
> > ldr d2, [x4, x3]
> > fsubd1, d1, d3
> > faddd0, d0, d2
> > str d1, [x1, x3]
> > str d0, [x2, x3]
> > add x3, x3, 16
> > cmp x3, 3200
> > bne .L2
> > ret
> >
> > For complex additions with a 90* rotation along the Argand plane.
> >
> > [1] 
> > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
> >
> > Bootstrap and Regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and 
> > x86_64-pc-linux-gnu
> > are still on going but previous patch showed no regressions.
> >
> > The instructions have also been tested on aarch64-none-elf and 
> > arm-none-eabi on an Armv8.3-a model
> > and -march=Armv8.3-a+fp16 and all tests pass.
> >
> > Ok for trunk?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 2018-11-11  Tamar Christina  
> >
> > * config/aarch64/aarch64-simd.md (aarch64_fcadd,
> > fcadd3, aarch64_fcmla,
> > fcmla4): New.
> > * config/aarch64/aarch64.h (TARGET_COMPLEX): New.
> > * config/aarch64/iterators.md (UNSPEC_FCADD90, UNSPEC_FCADD270,
> > UNSPEC_FCMLA, UNSPEC_FCMLA90, UNSPEC_FCMLA180, UNSPEC_FCMLA270): 
> > New.
> > (FCADD, FCMLA): New.
> > (rot, rotsplit1, rotsplit2): New.
> > * config/arm/types.md (neon_fcadd, neon_fcmla): New.
> >
> > -- 
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index c4be3101fdec930707918106cd7c53cf7584553e..12a91183a98ea23015860c77a97955cb1b30bfbb 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -419,6 +419,63 @@
>   }
>   )
>   
> +;; The fcadd and fcmla patterns are made UNSPEC explicitly due to the
> +;; fact that their usage needs to guarantee that the source vectors are
> +;; contiguous.  It would be wrong to describe the operation without being able
> +;; to describe the permute that is also required, but even if that is done
> +;; the permute would have been created as a LOAD_LANES which means the values
> +;; in the registers are in the wrong order.
> +(define_insn "aarch64_fcadd"
> +  [(set (match_operand:VHSDF 0 "register_operand" "=w")
> + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
> +(match_operand:VHSDF 2 "register_operand" "w")]
> +FCADD))]
> +  "TARGET_COMPLEX"
> +  "fcadd\t%0., %1., %2., #"
> +  [(set_attr "type" "neon_fcadd")]
> +)
> +
> +(define_expand "fcadd3"
> +  [(set (match_operand:VHSDF 0 "register_operand")
> + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
> +(match_operand:VHSDF 2 "register_operand")]
> +FCADD))]
> +  "TARGET_COMPLEX"
> +{
> +  emit_insn (gen_aarch64_fcadd (operands[0], operands[1],
> +operands[2]));
> +  DONE;
> +})
> +
> +(define_insn "aarch64_fcmla"
> +  [(set (match_operand:VHSDF 0 "register_operand" "=w")
> + (plus:VHSDF (match_operand:VHSDF 1 "register_operand" "0")
> + (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand" "w")
> +(match_operand:VHS

Re: [PATCH][LRA] Fix PR87899: r264897 cause mis-compiled native arm-linux-gnueabihf toolchain

2018-11-12 Thread Renlin Li

Hi Peter,

Thanks for the patch! It makes much more sense to me to split those functions, 
and use them separately.

I tried to build a native arm-linux-gnueabihf toolchain with the patch, but I
got the following ICE:

/home/renlin/try-new/./gcc/xgcc -B/home/renlin/try-new/./gcc/ -B/usr/local/arm-none-linux-gnueabihf/bin/ -B/usr/local/arm-none-linux-gnueabihf/lib/ 
-isystem /usr/local/arm-none-linux-gnueabihf/include -isystem /usr/local/arm-none-linux-gnueabihf/sys-include   -fno-checking -O2 -g -O0 -O2  -O2 -g 
-O0 -DIN_GCC-W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition 
-isystem ./include   -fPIC -fno-inline -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector   -fPIC -fno-inline -I. -I. -I../.././gcc 
-I../../../gcc/libgcc -I../../../gcc/libgcc/. -I../../../gcc/libgcc/../gcc -I../../../gcc/libgcc/../include  -DHAVE_CC_TLS  -o _negvdi2_s.o -MT 
_negvdi2_s.o -MD -MP -MF _negvdi2_s.dep -DSHARED -DL_negvdi2 -c ../../../gcc/libgcc/libgcc2.c

0x807eb3 lra(_IO_FILE*)
../../gcc/gcc/lra.c:2497
0x7c2755 do_reload
../../gcc/gcc/ira.c:5469
0x7c2c11 execute
../../gcc/gcc/ira.c:5653
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
make[3]: *** [Makefile:916: _gcov_merge_icall_topn.o] Error 1
make[3]: *** Waiting for unfinished jobs
make[3]: *** [Makefile:916: _gcov_merge_single.o] Error 1
during RTL pass: reload
../../../gcc/libgcc/libgcov-driver.c: In function 
‘gcov_sort_icall_topn_counter’:
../../../gcc/libgcc/libgcov-driver.c:436:1: internal compiler error: in 
remove_some_program_points_and_update_live_ranges, at lra-lives.c:1172
436 | }
| ^
0x829189 remove_some_program_points_and_update_live_ranges
../../gcc/gcc/lra-lives.c:1172
0x829683 compress_live_ranges
../../gcc/gcc/lra-lives.c:1301
0x829d45 lra_create_live_ranges_1
../../gcc/gcc/lra-lives.c:1454
0x829d7d lra_create_live_ranges(bool, bool)
../../gcc/gcc/lra-lives.c:1466
0x807eb3 lra(_IO_FILE*)
../../gcc/gcc/lra.c:2497
0x7c2755 do_reload
../../gcc/gcc/ira.c:5469
0x7c2c11 execute
../../gcc/gcc/ira.c:5653
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.


Regards,
Renlin

On 11/12/2018 04:34 AM, Peter Bergner wrote:

Renlin, Jeff and Vlad: requests and questions for you below...

PR87899 shows another latent LRA bug exposed by my r264897 commit.
In the bugzilla report, we have the following rtl in LRA:

   (insn 1 (set (reg:SI 1 r1) (reg/f:SI 2040)))
...
   (insn 2 (set (mem/f/c:SI (pre_modify:SI (reg:SI 1 r1)
   (plus:SI (reg:SI 1 r1)
(const_int 12
(reg:SI 1048))
   (expr_list:REG_INC (reg:SI 1 r1)))
...
   

My earlier patch now sees the reg copy in insn "1" and correctly skips
adding a conflict between r1 and r2040 due to the copy.  However, insn "2"
updates r1 while r2040 is live across that update, so we should create
a conflict between them.  We currently do not, which leads to us assigning
r1 to one of r2040's reload pseudos, which then gets clobbered by the r1
update in insn "2".

The reason a conflict was never added between r1 and r2040 is that LRA
skips INOUT operands when computing conflicts and so misses the definition
of r1 in insn "2" and so never adds conflicts for it.  The reason the code
skips the INOUT operands is that LRA doesn't want to create new program
points for INOUT operands, since unnecessary program points can slow down
remove_some_program_points_and_update_live_ranges.  This was all fine
before when we had conservative conflict info, but now we cannot ignore
INOUT operands.

The heart of the problem is that the {make,mark}_*_{live,dead} routines
update the liveness, conflict and program point information for operands.
My solution to the problem was to pull out the updating of the program point
info from {make,mark}_*_{live,dead} and have them only update liveness and
conflict information.  I then created a separate function that is used for
updating an operand's program points.  This allowed me to modify the insn
operand scanning to handle all operand types (IN, OUT and INOUT) and always
call the {make,mark}_*_{live,dead} functions for all operand types, while
only calling the new program point update function for IN and OUT operands.

This change then allowed me to remove the hacky handling of conflicts for
reg copies and instead use the more common method of removing the src reg
of a copy from the live set before handling the copy's definition, thereby
skipping the unwanted conflict.  Bonus! :-)
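
The split described above can be illustrated with a toy backward scan in C.
This is only a sketch and not the code in lra-lives.c: every type and function
name below (live_state, mark_live, mark_dead, update_point, scan_insn) is
hypothetical, registers are small array indices, and "program points" are
reduced to a counter.  It shows conflicts being recorded for IN, OUT and INOUT
operands alike, while the point counter only advances for IN and OUT operands,
and a copy dropping its source from the live set before its definition is
handled.

```c
#include <stdbool.h>

#define MAX_REGS 4   /* toy register file; indices are arbitrary */
#define MAX_OPS  3

enum op_type { OP_IN, OP_OUT, OP_INOUT };

struct operand { int reg; enum op_type type; };

struct insn
{
  struct operand ops[MAX_OPS];
  int n_ops;
  bool is_copy;      /* ops[0] is the OUT dest, ops[1] is the IN src */
};

struct live_state
{
  bool live[MAX_REGS];
  bool conflict[MAX_REGS][MAX_REGS];
  int n_points;      /* stand-in for program point bookkeeping */
};

/* Liveness and conflicts only; no program point update here.  */
static void
mark_live (struct live_state *s, int reg)
{
  for (int r = 0; r < MAX_REGS; r++)
    if (r != reg && s->live[r])
      s->conflict[r][reg] = s->conflict[reg][r] = true;
  s->live[reg] = true;
}

static void
mark_dead (struct live_state *s, int reg)
{
  s->live[reg] = false;
}

/* Separate program point update: a no-op for INOUT operands.  */
static void
update_point (struct live_state *s, enum op_type type)
{
  if (type != OP_INOUT)
    s->n_points++;
}

/* Process one insn of a backward scan over a block.  */
void
scan_insn (struct live_state *s, const struct insn *insn)
{
  /* For a copy, drop the source from the live set first so that the
     destination's definition does not conflict with it.  */
  if (insn->is_copy)
    mark_dead (s, insn->ops[1].reg);

  /* Definitions: OUT and INOUT operands conflict with what is live.  */
  for (int i = 0; i < insn->n_ops; i++)
    if (insn->ops[i].type != OP_IN)
      {
        mark_live (s, insn->ops[i].reg);
        update_point (s, insn->ops[i].type);
      }

  /* Pure outputs are dead above the insn.  */
  for (int i = 0; i < insn->n_ops; i++)
    if (insn->ops[i].type == OP_OUT)
      mark_dead (s, insn->ops[i].reg);

  /* Uses: IN and INOUT operands are live above the insn.  */
  for (int i = 0; i < insn->n_ops; i++)
    if (insn->ops[i].type != OP_OUT)
      {
        mark_live (s, insn->ops[i].reg);
        if (insn->ops[i].type == OP_IN)
          update_point (s, OP_IN);
      }
}
```

Feeding this a later use of r2040, then insn "2" (r1 as INOUT, r1048 as IN),
then the copy in insn "1", in that reverse order, records the r1/r2040
conflict at the INOUT operand, whereas the copy on its own does not.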

This passes bootstrap and regtesting on powerpc64le-linux with no regressions.


Re: [PATCH 3/9][GCC][AArch64] Add autovectorization support for Complex instructions

2018-11-12 Thread Kyrill Tkachov

Hi Tamar,

On 11/11/18 10:26, Tamar Christina wrote:

Hi All,

This patch adds expander support for autovectorization of complex number
operations, such as complex addition with a rotation along the Argand plane.
It also adds support for complex FMA.

The instructions are described in the ArmARM [1] and are available from 
Armv8.3-a onwards.

Concretely, this generates

f90:
mov x3, 0
.p2align 3,,7
.L2:
ldr q0, [x0, x3]
ldr q1, [x1, x3]
fcadd   v0.2d, v0.2d, v1.2d, #90
str q0, [x2, x3]
add x3, x3, 16
cmp x3, 3200
bne .L2
ret

now instead of

f90:
mov x4, x1
mov x1, x2
add x3, x4, 31
add x2, x0, 31
sub x3, x3, x1
sub x2, x2, x1
cmp x3, 62
mov x3, 62
ccmp x2, x3, 0, hi
bls .L5
mov x2, x4
add x3, x0, 3200
.p2align 3,,7
.L3:
ld2 {v2.2d - v3.2d}, [x0], 32
ld2 {v4.2d - v5.2d}, [x2], 32
cmp x0, x3
fsub v0.2d, v2.2d, v5.2d
fadd v1.2d, v4.2d, v3.2d
st2 {v0.2d - v1.2d}, [x1], 32
bne .L3
ret
.L5:
add x6, x0, 8
add x5, x4, 8
add x2, x1, 8
mov x3, 0
.p2align 3,,7
.L2:
ldr d1, [x0, x3]
ldr d3, [x5, x3]
ldr d0, [x6, x3]
ldr d2, [x4, x3]
fsub d1, d1, d3
fadd d0, d0, d2
str d1, [x1, x3]
str d0, [x2, x3]
add x3, x3, 16
cmp x3, 3200
bne .L2
ret

For complex additions with a 90-degree rotation along the Argand plane.
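
For reference, a loop of roughly the following shape would be expected to
correspond to the code above.  This is a sketch and not the testcase from the
patch: the element count of 200 is inferred from the loop bound (3200 bytes at
16 bytes per double complex), and the argument order is inferred from the
register use.  The real part of each result is a.re - b.im and the imaginary
part is a.im + b.re, which matches the fsub/fadd pair in the scalar loop.

```c
#include <complex.h>

#define N 200  /* assumed: 3200 bytes / sizeof (double complex) */

/* Add b rotated by 90 degrees (that is, b * i) to a, element by element.  */
void
f90 (const double complex *a, const double complex *b, double complex *c)
{
  for (int i = 0; i < N; i++)
    c[i] = a[i] + b[i] * I;
}
```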

[1] 
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

Bootstrap and regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and
x86_64-pc-linux-gnu are still ongoing, but the previous patch showed no
regressions.

The instructions have also been tested on aarch64-none-elf and arm-none-eabi
on an Armv8.3-a model with -march=Armv8.3-a+fp16, and all tests pass.

Ok for trunk?

Thanks,
Tamar

gcc/ChangeLog:

2018-11-11  Tamar Christina  

* config/aarch64/aarch64-simd.md (aarch64_fcadd,
fcadd3, aarch64_fcmla,
fcmla4): New.
* config/aarch64/aarch64.h (TARGET_COMPLEX): New.
* config/aarch64/iterators.md (UNSPEC_FCADD90, UNSPEC_FCADD270,
UNSPEC_FCMLA, UNSPEC_FCMLA90, UNSPEC_FCMLA180, UNSPEC_FCMLA270): New.
(FCADD, FCMLA): New.
(rot, rotsplit1, rotsplit2): New.
* config/arm/types.md (neon_fcadd, neon_fcmla): New.

--



diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c4be3101fdec930707918106cd7c53cf7584553e..12a91183a98ea23015860c77a97955cb1b30bfbb 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -419,6 +419,63 @@
 }
 )
 
+;; The fcadd and fcmla patterns are made UNSPEC explicitly due to the
+;; fact that their usage needs to guarantee that the source vectors are
+;; contiguous.  It would be wrong to describe the operation without being able
+;; to describe the permute that is also required, but even if that is done
+;; the permute would have been created as a LOAD_LANES which means the values
+;; in the registers are in the wrong order.
+(define_insn "aarch64_fcadd"
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+   (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+  (match_operand:VHSDF 2 "register_operand" "w")]
+  FCADD))]
+  "TARGET_COMPLEX"
+  "fcadd\t%0., %1., %2., #"
+  [(set_attr "type" "neon_fcadd")]
+)
+
+(define_expand "fcadd3"
+  [(set (match_operand:VHSDF 0 "register_operand")
+   (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
+  (match_operand:VHSDF 2 "register_operand")]
+  FCADD))]
+  "TARGET_COMPLEX"
+{
+  emit_insn (gen_aarch64_fcadd (operands[0], operands[1],
+  operands[2]));
+  DONE;
+})
+
+(define_insn "aarch64_fcmla"
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+   (plus:VHSDF (match_operand:VHSDF 1 "register_operand" "0")
+   (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand" "w")
+  (match_operand:VHSDF 3 "register_operand" "w")]
+  FCMLA)))]
+  "TARGET_COMPLEX"
+  "fcmla\t%0., %2., %3., #"
+  [(set_attr "type" "neon_fcmla")]
+)
+
+;; The complex mla operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "fcmla4"
+  [(set (match_operand:VHSDF 0 "register_operand")
+   (plus:VHSDF (match_operand:VHSDF 1 "register_operand")

Re: [PATCH][DOCS] Fix documentation of __builtin_cpu_is and __builtin_cpu_supports for x86.

2018-11-12 Thread Uros Bizjak
> The patch is adding missing values for aforementioned built-ins.
>
> Ready for trunk?
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2018-11-12  Martin Liska  
>
> * doc/extend.texi: Add missing values for __builtin_cpu_is and
> __builtin_cpu_supports for x86 target.

OK.

Thanks,
Uros.

