Re: Deprecate DBX/stabs?

2017-07-21 Thread JonY
On 07/21/2017 01:07 PM, Nathan Sidwell wrote:
> [darwin, cygwin, rx maintainers, you might have an opinion]
> Let's at least deprecate it.  I attach a patch to do so.  With the
> patch, you'll get a note about dbx being deprecated whenever you use
> stabs debugging on a system that prefers stabs (thus both -g and -gstabs
> will warn).  On systems where stabs is not preferred, -gstabs will not
> give you a warning.  The patch survives an x86_64-linux bootstrap.
> 
> A config can choose to override this by defining 'DBX_DEBUG_OK' in the
> build defines.
> 
> I did try build time CPP tricks, but config/rx.h and
> config/i386/darwin.h define it to be a conditional expression.
> 
> AFAICT, the following include files are not used, and could probably be
> binned too:
> config/dbx.h
> config/dbxcoff.h
> config/dbxelf.h
> (+ config/i386/gstabs.h Jim found)
> 
> It looks like DBX is the default for:
> i386/cygming configured for 32-bit or lacking PE_SECREL32_RELOC
> i386/darwin.h for 32-bit target
> rx/rx.h when using AS100 syntax
> 
> nathan

Cygwin GCC has been using --with-dwarf2 for some time now, so it
shouldn't be affected.





Re: [PATCH] New C++ warning option '-Wduplicated-access-specifiers'

2017-07-21 Thread David Malcolm
On Fri, 2017-07-21 at 19:58 +0200, Volker Reichelt wrote:
> On 21 Jul, David Malcolm wrote:
> > On Thu, 2017-07-20 at 18:35 +0200, Volker Reichelt wrote:
> >> Hi,
> >> 
> >> the following patch introduces a new C++ warning option
> >> -Wduplicated-access-specifiers that warns about redundant
> >> access-specifiers in classes, e.g.
> >> 
> >>   class B
> >>   {
> >> public:
> >>   B();
> >> 
> >> private:
> >>   void foo();
> >> private:
> >>   int i;
> >>   };
> >> 
> >> test.cc:8:5: warning: duplicate 'private' access-specifier [-Wduplicated-access-specifiers]
> >>  private:
> >>  ^~~
> >>  ---
> >> test.cc:6:5: note: access-specifier was previously given here
> >>  private:
> >>  ^~~
> > 
> > Thanks for working on this.
> > 
> > I'm not able to formally review, but you did CC me; various comments
> > below throughout.
> > 
> >> The test is implemented by tracking the location of the last
> >> access-specifier together with the access-specifier itself.
> >> The location serves two purposes: the obvious one is to be able to
> >> print the location in the note that comes with the warning.
> >> The second purpose is to be able to distinguish an access-specifier
> >> given by the user from the default of the class type (i.e. public for
> >> 'struct' and private for 'class') where the location is set to
> >> UNKNOWN_LOCATION. The warning is only issued if the user gives the
> >> access-specifier twice, i.e. if the previous location is not
> >> UNKNOWN_LOCATION.
> > 
> > Presumably given
> > 
> > struct foo
> > {
> > public:
> >   /* ... */
> > };
> > 
> > we could issue something like:
> > 
> >   warning: access-specifier 'public' is redundant within 'struct'
> > 
> > for the first; similarly for:
> > 
> > class bar
> > {
> > private:
> > };
> > 
> > we could issue:
> > 
> >   warning: access-specifier 'private' is redundant within 'class'
> > 
> > 
> >> One could actually make this a two-level warning so that at the
> >> higher level the default class-type settings are also taken into
> >> account. Would that be helpful? I could prepare a second patch for
> >> that.
> > 
> > I'm not sure how best to structure it.
> > 
> > FWIW, when I first saw the patch, I wasn't a big fan of the warning,
> > as I like to use access-specifiers to break up the kinds of entities
> > within a class.
> > 
> > For example, our coding guidelines 
> >   https://gcc.gnu.org/codingconventions.html#Class_Form
> > recommend:
> > 
> > "first define all public types,
> > then define all non-public types,
> > then declare all public constructors,
> > ...
> > then declare all non-public member functions, and
> > then declare all non-public member variables."
> > 
> > I find it's useful to put a redundant "private:" between the last two,
> > e.g.:
> > 
> > class baz
> > {
> >  public:
> >   ...
> > 
> >  private:
> >   void some_private_member ();
> > 
> >  private:
> >   int m_some_field;
> > };
> > 
> > to provide a subdivision between the private member functions and the
> > private data fields.
> 
> That can also be seen in our libstdc++ to some extent.
> The other half of the warnings indicate redundant access-specifiers.
> 
> It's up to the user to keep those duplicate access-specifiers as
> subdividers or to use something else (like comments) to do that
> and to switch on the warning for her/his code.
> Because the subdivider usage seems to be relatively common,
> I don't want to enable the warning by -Wall or -Wextra.
> 
> > This might be a violation of our guidelines (though if so, I'm not sure
> > it's explicitly stated), but I find it useful, and the patch would warn
> > about it.
> > 
> > Having said that, looking at e.g. the "jit" subdir, I see lots of
> > places where the warning would fire.  In some of these, the code has a
> > bit of a "smell", so maybe I like the warning after all... in that it
> > can be good for a new warning to draw attention to code that might need
> > work.
> > 
> > Sorry that this is rambling; comments on the patch inline below.
> > 
> >> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> >> OK for trunk?
> >> 
> >> Btw, there are about 50 places in our libstdc++ headers where this
> >> warning triggers. I'll come up with a patch for this later.
> >> 
> >> Regards,
> >> Volker
> >> 
> >> 
> >> 2017-07-20  Volker Reichelt  
> >> 
> >> * doc/invoke.texi (-Wduplicated-access-specifiers): Document new
> >> warning option.
> >> 
> >> Index: gcc/doc/invoke.texi
> >>
> ===
> >> --- gcc/doc/invoke.texi (revision 250356)
> >> +++ gcc/doc/invoke.texi (working copy)
> >> @@ -275,7 +275,7 @@
> >>  -Wdisabled-optimization @gol
> >>  -Wno-discarded-qualifiers  -Wno-discarded-array-qualifiers @gol
> >>  -Wno-div-by-zero  -Wdouble-promotion @gol
> >> --Wduplicated-branches  -Wduplicated-cond @gol
> >> 

Re: libgo patch committed: Call f?statfs64 on GNU/Linux

2017-07-21 Thread Ian Lance Taylor
On Fri, Jul 21, 2017 at 1:58 PM, Andreas Schwab  wrote:
> On Jul 21 2017, Ian Lance Taylor  wrote:
>
>> In libgo we unconditionally set _FILE_OFFSET_BITS to 64 in
>> configure.ac, so we should unconditionally call the statfs64 and
>> fstatfs64 functions rather than statfs/fstatfs.
>
> That should happen automatically when building with
> _FILE_OFFSET_BITS=64.

For C code, yes, because of #include magic.  But these names are being
generated directly by the Go frontend.

>>  These functions should be available on all versions of GNU/Linux
>> since 2.6.
>
> It's a property of glibc, not the kernel.

OK, but, in any case, a long time.

> Wrong patch?

Yes, sorry, not sure what happened.  Correct patch attached.

Ian
commit e1bd9ea4dc16e228164c92a12c5229ddf20f2b50
Author: Ian Lance Taylor 
Date:   Fri Jul 21 11:51:58 2017 -0700

syscall: call f?statfs64 on GNU/Linux

We unconditionally set _FILE_OFFSET_BITS to 64 in configure.ac, so we
should unconditionally call the statfs64 and fstatfs64 functions.
These functions should be available on all versions of GNU/Linux since 2.6.
On 64-bit systems they are aliased to statfs/fstatfs, and on 32-bit
systems they use the 64-bit data structures.

Fixes golang/go#20922

Change-Id: Ibcd1e53b03f3c8db91a3cd8825d06bcf8f43f443
Reviewed-on: https://go-review.googlesource.com/50635
Reviewed-by: Than McIntosh 

diff --git a/libgo/go/syscall/libcall_linux.go 
b/libgo/go/syscall/libcall_linux.go
index b58b2ddd..5f477840 100644
--- a/libgo/go/syscall/libcall_linux.go
+++ b/libgo/go/syscall/libcall_linux.go
@@ -212,7 +212,7 @@ func Accept4(fd int, flags int) (nfd int, sa Sockaddr, err error) {
 //flock(fd _C_int, how _C_int) _C_int
 
 //sys  Fstatfs(fd int, buf *Statfs_t) (err error)
-//fstatfs(fd _C_int, buf *Statfs_t) _C_int
+//fstatfs64(fd _C_int, buf *Statfs_t) _C_int
 
 func Gettid() (tid int) {
r1, _, _ := Syscall(SYS_GETTID, 0, 0, 0)
@@ -360,7 +360,7 @@ func Splice(rfd int, roff *int64, wfd int, woff *int64, len int, flags int) (n i
 }
 
 //sys  Statfs(path string, buf *Statfs_t) (err error)
-//statfs(path *byte, buf *Statfs_t) _C_int
+//statfs64(path *byte, buf *Statfs_t) _C_int
 
 //sys  SyncFileRange(fd int, off int64, n int64, flags int) (err error)
 //sync_file_range(fd _C_int, off Offset_t, n Offset_t, flags _C_uint) _C_int


Re: [PATCH,rs6000] Fine-tune vec_construct direct move cost

2017-07-21 Thread Segher Boessenkool
Hi Bill,

On Fri, Jul 21, 2017 at 10:40:43AM -0500, Bill Schmidt wrote:
> In https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00924.html, I raised the
> vectorization cost for a vec_construct operation that requires direct
> moves between GPRs and VSRs.  The cost equation I substituted has since
> proven to be slightly more conservative than intended, and we're seeing
> some cases of SLP vectorization being avoided that should not be.  This
> patch adjusts the equation to reduce the cost somewhat.
> 
> I've tested this to ensure the cases previously seen are now being
> vectorized again, and done some benchmark testing that shows no measurable
> result, positive or negative.  So this is just minor fine-tuning, but
> still important to get right.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
> Is this ok for trunk?

Sure, thanks!


Segher


Re: [PATCH] scheduler bug fix for AArch64 insn fusing SCHED_GROUP usage

2017-07-21 Thread Jim Wilson
Ping.

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00779.html

On Thu, Jul 13, 2017 at 3:00 PM, Jim Wilson  wrote:
> The AArch64 port uses SCHED_GROUP to mark instructions that get fused
> at issue time, to ensure that they will be issued together.  However,
> in the scheduler, use of a SCHED_GROUP forces all other instructions
> to issue in the next cycle.  This is wrong for AArch64 ports using
> insn fusing which can issue multiple insns per cycle, as aarch64
> SCHED_GROUP insns can all issue in the same cycle, and other insns can
> issue in the same cycle also.
>
> I put a testcase and some info in bug 81434.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81434
>
> The attached patch fixes the problem.  The behavior in pass == 0 is
> the same as now.  All non-sched-group insns are ignored, and all sched
> group insns are checked to see if they need to be queued for a later
> cycle.  The difference is in the second pass, where non-sched-group
> insns are queued for a later cycle only if there is a sched group
> insn that got queued.  Since sched group insns always sort to the top
> of the list of insns to schedule, all sched group insns still get
> scheduled together as before.
>
> This has been tested with an Aarch64 bootstrap and make check.
>
> OK?
>
> Jim


Re: [PATCH/AARCH64] Decimal floating point support for AARCH64

2017-07-21 Thread Peter Bergner
On 7/21/17 4:51 PM, Andrew Pinski wrote:
> So right now TFmode is handled on AARCH64 not as a pair of 64-bit
> registers but rather as one 128-bit register (a qN register; the
> floating-point and SIMD register sets on AARCH64 already overlap).
> So handling TDmode like TFmode is more natural for AARCH64 than for
> most arches.  That means the DFP hw support for _Decimal128 on AARCH64
> would take the values in the qN register rather than in a pair of
> registers.

Ah, lucky you!  Then nevermind. :-)

Peter



Re: [PATCH/AARCH64] Decimal floating point support for AARCH64

2017-07-21 Thread Andrew Pinski
On Fri, Jul 21, 2017 at 2:45 PM, Peter Bergner  wrote:
> On 7/13/17 7:12 PM, Andrew Pinski wrote:
>>   This patch adds Decimal floating point support to aarch64.  It is
>> the base support in that since there is no hardware support for DFP,
>> it just defines the ABI.  The ABI I chose is that _Decimal32 is
>> treated like float, _Decimal64 is treated like double and _Decimal128
>> is treated like long double.  In that they are passed via the floating
>> registers (sN, dN, qN).
> >> Is this an ok ABI?
>
> It depends on whether AARCH ever plans on implementing HW DFP.
> On POWER, we handle things similarly to what you mention above,
> except for one extra constraint for _Decimal128: they must live in
> even/odd register pairs.  This was due to how the instructions were
> implemented in the HW; they required even/odd reg pairs.
>
> If there's zero chance AARCH ever implements HW DFP, then you're
> probably fine with the above, but if you go with the above and HW DFP
> is eventually added, then the HW would need to handle even/odd and
> odd/even register pairs in its instructions...or you'd need to
> add potential prologue/epilogue code to move formal args into
> even/odd regs if the HW demands it.  If there is a non-zero chance
> or you just want to be safe, you could enforce even/odd reg usage
> in the ABI upfront.

So right now TFmode is handled on AARCH64 not as a pair of 64-bit
registers but rather as one 128-bit register (a qN register; the
floating-point and SIMD register sets on AARCH64 already overlap).
So handling TDmode like TFmode is more natural for AARCH64 than for
most arches.  That means the DFP hw support for _Decimal128 on AARCH64
would take the values in the qN register rather than in a pair of
registers.

Thanks,
Andrew Pinski

>
> Peter
>


Re: [PATCH/AARCH64] Decimal floating point support for AARCH64

2017-07-21 Thread Peter Bergner
On 7/13/17 7:12 PM, Andrew Pinski wrote:
>   This patch adds Decimal floating point support to aarch64.  It is
> the base support in that since there is no hardware support for DFP,
> it just defines the ABI.  The ABI I chose is that _Decimal32 is
> treated like float, _Decimal64 is treated like double and _Decimal128
> is treated like long double.  In that they are passed via the floating
> registers (sN, dN, qN).
> Is this an ok ABI?

It depends on whether AARCH ever plans on implementing HW DFP.
On POWER, we handle things similarly to what you mention above,
except for one extra constraint for _Decimal128: they must live in
even/odd register pairs.  This was due to how the instructions were
implemented in the HW; they required even/odd reg pairs.

If there's zero chance AARCH ever implements HW DFP, then you're
probably fine with the above, but if you go with the above and HW DFP
is eventually added, then the HW would need to handle even/odd and
odd/even register pairs in its instructions...or you'd need to
add potential prologue/epilogue code to move formal args into
even/odd regs if the HW demands it.  If there is a non-zero chance
or you just want to be safe, you could enforce even/odd reg usage
in the ABI upfront.

Peter



Re: [PATCH, rs6000] vec_mule and vec_mulo builtin fix

2017-07-21 Thread Segher Boessenkool
On Thu, Jul 20, 2017 at 03:56:01PM -0700, Carl Love wrote:
> The following patch is a reworked patch to fix the bugs in the vec_mule
> and vec_mulo patch that had to be reverted.  The reverted fix added the
> correct mule and mulo instructions along with vectorization support.
> That vectorization support caused vectorization in several testcases to
> fail.  This patch adds the correct instructions for the vec_mule and
> vec_mulo builtins without vectorization.  The goal is to add the
> vectorization support again in a future patch.

Ah, as you explained offline, you removed vec_widen_umult_even_v4si and
friends.  Okay :-)

The patch is okay for trunk then.  Thanks!


Segher


Re: libgo patch committed: Call f?statfs64 on GNU/Linux

2017-07-21 Thread Andreas Schwab
On Jul 21 2017, Ian Lance Taylor  wrote:

> In libgo we unconditionally set _FILE_OFFSET_BITS to 64 in
> configure.ac, so we should unconditionally call the statfs64 and
> fstatfs64 functions rather than statfs/fstatfs.

That should happen automatically when building with
_FILE_OFFSET_BITS=64.

>  These functions should be available on all versions of GNU/Linux
> since 2.6.

It's a property of glibc, not the kernel.

> Index: gcc/go/gofrontend/MERGE
> ===
> --- gcc/go/gofrontend/MERGE   (revision 250436)
> +++ gcc/go/gofrontend/MERGE   (working copy)
> @@ -1,4 +1,4 @@
> -a9f1aeced86691de891fbf2a8c97e848faf1962e
> +b712bacd939466e66972337744983e180849c535
>  
>  The first line of this file holds the git revision number of the last
>  merge done from the gofrontend repository.
> Index: libgo/runtime/go-caller.c
> ===
> --- libgo/runtime/go-caller.c (revision 250406)
> +++ libgo/runtime/go-caller.c (working copy)
> @@ -74,7 +74,7 @@ static void *back_state;
>  
>  /* A lock to control creating back_state.  */
>  
> -static Lock back_state_lock;
> +static uint32 back_state_lock;
>  
>  /* The program arguments.  */
>  

Wrong patch?

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH, rs6000] vec_mule and vec_mulo builtin fix

2017-07-21 Thread Segher Boessenkool
On Thu, Jul 20, 2017 at 03:56:01PM -0700, Carl Love wrote:
> The following patch is a reworked patch to fix the bugs in the vec_mule
> and vec_mulo patch that had to be reverted.  The reverted fix added the
> correct mule and mulo instructions along with vectorization support.
> That vectorization support caused vectorization in several testcases to
> fail.  This patch adds the correct instructions for the vec_mule and
> vec_mulo builtins without vectorization.  The goal is to add the
> vectorization support again in a future patch.

The patch looks fine to me, but so did the previous version...  What
has changed?  It's not clear to me.


Segher


[PATCH] Emit DWARF5 DW_AT_export_symbols for namespaces

2017-07-21 Thread Jakub Jelinek
Hi!

DWARF5 introduced DW_AT_export_symbols that may be preset on
DW_TAG_namespace or DW_TAG_{structure,union,class}_type to signalize
inline or anonymous namespaces or anonymous structures/unions/classes.

What we were emitting instead is an implicit DW_TAG_imported_module
in the outer namespace.

The following patch changes nothing for -gdwarf-4 and below with
-gstrict-dwarf, for -gdwarf-4 and below -gno-strict-dwarf it
just adds DW_AT_export_symbols to inline namespaces (so that interested
consumers can find out it is inline namespace, but consumers not knowing
about DW_AT_export_symbols still have the implicit DW_TAG_imported_module).
In that mode, no changes for anonymous namespaces, because those
are already easily detectable by the consumers (missing DW_AT_name).
For -gdwarf-5 it emits DW_AT_export_symbols on both inline namespaces
and anonymous namespaces and doesn't emit the implicit
DW_TAG_imported_module, which is shorter.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

This patch doesn't do anything about anon struct/union/class.  I've tried
to handle it; the problem is that the ANON_AGGR_TYPE_P flag is set too late,
after the debug hook adds the type DIE.  Any thoughts on how to handle that?
And I wonder what is the counterpart of ANON_AGGR_TYPE_P in the C FE, CCing
Marek on that.

2017-07-21  Jakub Jelinek  

* debug.h (struct gcc_debug_hooks): Add IMPLICIT argument
to imported_module_or_decl hook.
(debug_nothing_tree_tree_tree_bool): Remove.
(debug_nothing_tree_tree_tree_bool_bool): New declaration.
* debug.c (do_nothing_debug_hooks): Use
debug_nothing_tree_tree_tree_bool_bool instead of
debug_nothing_tree_tree_tree_bool.
* vmsdbgout.c (vmsdbg_debug_hooks): Likewise.
* dbxout.c (dbx_debug_hooks, xcoff_debug_hooks): Likewise.
* sdbout.c (sdb_debug_hooks): Likewise.
* dwarf2out.c (dwarf2_lineno_debug_hooks): Likewise.
(gen_namespace_die): Add DW_AT_export_symbols attribute if
langhook wants it.
(dwarf2out_imported_module_or_decl): Add IMPLICIT argument,
if true, -gdwarf-5 and decl will have DW_AT_export_symbols
attribute, don't add anything.
cp/
* cp-objcp-common.c (cp_decl_dwarf_attribute): Handle
DW_AT_export_symbols.
* name-lookup.c (emit_debug_info_using_namespace): Add IMPLICIT
argument, pass it through to the debug hook.
(finish_namespace_using_directive): Adjust
emit_debug_info_using_namespace caller.
(push_namespace): Likewise.  Call it after setting
DECL_NAMESPACE_INLINE_P.
(cp_emit_debug_info_for_using): Pass false as new argument to
the imported_module_or_decl debug hook.
fortran/
* trans-decl.c (gfc_trans_use_stmts): Pass false as new argument to
the imported_module_or_decl debug hook.
ada/
* gcc-interface/utils.c (gnat_write_global_declarations): Pass false
as new argument to the imported_module_or_decl debug hook.
testsuite/
* g++.dg/debug/dwarf2/inline-ns-1.C: New test.
* g++.dg/debug/dwarf2/inline-ns-2.C: New test.

--- gcc/debug.h.jj  2017-02-18 17:11:18.0 +0100
+++ gcc/debug.h 2017-07-21 12:19:09.486576092 +0200
@@ -145,7 +145,8 @@ struct gcc_debug_hooks
 
   /* Debug information for imported modules and declarations.  */
   void (* imported_module_or_decl) (tree decl, tree name,
-   tree context, bool child);
+   tree context, bool child,
+   bool implicit);
 
   /* DECL is an inline function, whose body is present, but which is
  not being output at this point.  */
@@ -206,7 +207,8 @@ extern void debug_nothing_int_int (unsig
 extern void debug_nothing_tree (tree);
 extern void debug_nothing_tree_tree (tree, tree);
 extern void debug_nothing_tree_int (tree, int);
-extern void debug_nothing_tree_tree_tree_bool (tree, tree, tree, bool);
+extern void debug_nothing_tree_tree_tree_bool_bool (tree, tree, tree,
+   bool, bool);
 extern bool debug_true_const_tree (const_tree);
 extern void debug_nothing_rtx_insn (rtx_insn *);
 extern void debug_nothing_rtx_code_label (rtx_code_label *);
--- gcc/debug.c.jj  2017-02-18 17:11:18.0 +0100
+++ gcc/debug.c 2017-07-21 11:53:45.452383785 +0200
@@ -47,7 +47,7 @@ const struct gcc_debug_hooks do_nothing_
   debug_nothing_tree,   /* early_global_decl */
   debug_nothing_tree,   /* late_global_decl */
   debug_nothing_tree_int,   /* type_decl */
-  debug_nothing_tree_tree_tree_bool,/* imported_module_or_decl */
+  debug_nothing_tree_tree_tree_bool_bool,/* imported_module_or_decl */
   debug_nothing_tree,   /* deferred_inline_function */
   debug_nothing_tree,   /* outlining_inline_function */
   

Re: [PATCH][RFA/RFC] Stack clash mitigation patch 05/08

2017-07-21 Thread Segher Boessenkool
On Thu, Jul 20, 2017 at 08:20:52AM -0600, Jeff Law wrote:
> > Can only combine-stack-adjustments do this?  It seems like something
> > many passes could do, and then your new note doesn't help.
> SO far it's only been observed with c-s-a, but further auditing is
> certainly desirable here, particularly with the upcoming changes to the
> generic dynamic alloca handling.
> 
> In the V2 patch only backends would emit unrolled inline alloca/probe
> sequences like what you see above and only for prologues.  Thus there
> were a very limited number of passes to be concerned about.
> 
> In the V3 patch we have unrolled inline probing for the dynamic space as
> well, so this kind of sequence is exposed to everything after
> gimple->rtl expansion.
> 
> Unfortunately, the most thorough checker we have is x86 and on that
> target, because of stack alignment issues, we'll never see a constant
> size in the dynamic space and thus no unrolled inlined alloca/probe
> sequences.
> 
> In reality I suspect that with the hard register references, most passes
> are going to leave those insns alone, but some auditing is necessary.

This is similar to what rs6000 uses stack_tie for.  You want to
prevent a store to the stack (the probe) from being moved after a
later stack pointer update.  By pretending (in the insn pattern)
there is a store to stack with that stack pointer update, nothing
can move stores after it.


Segher


libgo patch committed: Call f?statfs64 on GNU/Linux

2017-07-21 Thread Ian Lance Taylor
In libgo we unconditionally set _FILE_OFFSET_BITS to 64 in
configure.ac, so we should unconditionally call the statfs64 and
fstatfs64 functions rather than statfs/fstatfs.  These functions
should be available on all versions of GNU/Linux since 2.6.  On 64-bit
systems they are aliased to statfs/fstatfs, and on 32-bit systems they
use the 64-bit data structures.  This fixes
https://golang.org/issue/20922.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 250436)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-a9f1aeced86691de891fbf2a8c97e848faf1962e
+b712bacd939466e66972337744983e180849c535
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/go-caller.c
===
--- libgo/runtime/go-caller.c   (revision 250406)
+++ libgo/runtime/go-caller.c   (working copy)
@@ -74,7 +74,7 @@ static void *back_state;
 
 /* A lock to control creating back_state.  */
 
-static Lock back_state_lock;
+static uint32 back_state_lock;
 
 /* The program arguments.  */
 
@@ -85,7 +85,15 @@ extern Slice runtime_get_args(void);
 struct backtrace_state *
 __go_get_backtrace_state ()
 {
-  runtime_lock (&back_state_lock);
+  uint32 set;
+
+  /* We may not have a g here, so we can't use runtime_lock.  */
+  set = 0;
+  while (!__atomic_compare_exchange_n (&back_state_lock, &set, 1, false, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+{
+  runtime_osyield ();
+  set = 0;
+}
   if (back_state == NULL)
 {
   Slice args;
@@ -113,7 +121,7 @@ __go_get_backtrace_state ()
 
   back_state = backtrace_create_state (filename, 1, error_callback, NULL);
 }
-  runtime_unlock (&back_state_lock);
+  __atomic_store_n (&back_state_lock, 0, __ATOMIC_RELEASE);
   return back_state;
 }
 


Re: Deprecate DBX/stabs?

2017-07-21 Thread Iain Sandoe

> On 21 Jul 2017, at 20:54, Jim Wilson  wrote:
> 
> On Fri, Jul 21, 2017 at 12:44 PM, Iain Sandoe  wrote:
>> It ought to be already, in fact anything (powerpc*/x86/x86-64) >= Darwin9 
>> (OS X 10.5) ought to be defaulting to DWARF already, will check that 
>> sometime.
> 
> Yes, they do default to dwarf2.  The comments say pre-darwin9 32-bit
> defaults to stabs.  The question is whether anyone cares about that
> anymore.

From my perspective, as per Mike’s comments, I’d say “let’s move on”,
- Darwin8’s DWARF support is good enough for C at least
- as per my other comments, there remain ways forward for someone who really 
wants to support it (TenFourFox seems still active and I do get a few queries 
per year from folks working with Darwin8).
- deprecation gives other folks a chance to shout if they care.

cheers
Iain

Iain Sandoe
CodeSourcery / Mentor Embedded



Re: Deprecate DBX/stabs?

2017-07-21 Thread Jim Wilson
On Fri, Jul 21, 2017 at 12:44 PM, Iain Sandoe  wrote:
> It ought to be already, in fact anything (powerpc*/x86/x86-64) >= Darwin9 (OS 
> X 10.5) ought to be defaulting to DWARF already, will check that sometime.

Yes, they do default to dwarf2.  The comments say pre-darwin9 32-bit
defaults to stabs.  The question is whether anyone cares about that
anymore.

Jim


Re: Deprecate DBX/stabs?

2017-07-21 Thread Iain Sandoe

> On 21 Jul 2017, at 20:10, Mike Stump  wrote:
> 
> On Jul 21, 2017, at 6:07 AM, Nathan Sidwell  wrote:
>> 
>> [Darwin, cygwin, rx maintainers, you might have an opinion]
> 
> darwin going forward is a DWARF platform, so, shouldn't be a heartache for 
> real folks.

agreed,
in fact the default for current assemblers doesn’t support stabs - and we’ve 
adjusted the specs to deal with that.

>  For ancient machines, ancient compilers might be a requirement.  Generally, 
> I like keeping things; but cleanups and removals are a part of life, and 
> eventually the old should be removed. I'd not stand in the way of such 
> removal.  If we were within 5 years of such a transition point, I'd argue to 
> keep it for at least 5 years.  But, the switch for darwin was Oct 26th, 2007. 
>  10 years I think is a nice cutover point for first tier things.  Beyond 10, 
> and I'd say, you are dragging your feet.  If _all_ the code for DBX were in 
> my port file, I'd be tempted to keep it indefinitely.  It's not, so, that's 
> not possible/reasonable.
> 
> Iain, do you still have the G5s?  :-)  Do they run 8 or 9?  What do you 
> think?  Seem reasonable?

I still have access to i686 and powerpc Darwin8, but testing is extremely 
infrequent.

For the record; anyone wanting modern toolchains on Darwin8 most likely has to 
face building a linker and more modern cctools anyway (the XCode 2.5 set was
extremely flaky from at least 4.6/4.7). 

I’d suspect that a person serious in doing that would likely be willing to 
build GDB which does support x86 Darwin, at least.  FWIW I have a patch that 
makes GDB (at least 7.x) work for PowerPC too.  If anyone cares, ask ;-) (I 
doubt if I’ll try modernising it across the transition to c++ for GDB - since 
available time would prob be better spent elsewhere).

Anyone wanting to build modern GCC on < Darwin8 is going to need to build most 
of the supporting infra too - the (XCode 1.5) linker/assembler there don’t 
support enough weak stuff to be viable for modern c++.  These very old 
platforms are long past the “config && make” stage, supporting them needs a 
degree of commitment and understanding.

-

My G5 ( and most of my other ppc hardware) habitually runs 10.5 (Darwin9) and 
I’ve no motivation to reboot to 10.4 unless to test something more quickly than 
my really ancient G4s can manage.

> The 64_BIT x86 darwin question likely can be flipped to default to dwarf;
> so, even that isn't an issue.

It ought to be already, in fact anything (powerpc*/x86/x86-64) >= Darwin9 (OS X 
10.5) ought to be defaulting to DWARF already, will check that sometime.

cheers,
Iain
Iain Sandoe
CodeSourcery / Mentor Embedded


Re: PING^2: Fwd: SSA range class and removal of VR_ANTI_RANGEs

2017-07-21 Thread Aldy Hernandez
On Mon, Jul 17, 2017 at 6:23 AM, Richard Biener
 wrote:
> On Mon, Jul 17, 2017 at 8:51 AM, Aldy Hernandez  wrote:

>> How does this look?
>
> It's a change that on its own doesn't look worthwhile to me.
>
> So please post the changes that will build ontop of this.  Like removing
> anti-ranges from VRP or your on-demand value-range stuff.
>
> Thanks,
> Richard.

From the looks of it, we can have a variety of VRP ranges that are not
representable at all with an integer range class.  For instance, I
see the following ranges built in set_value_range():

[INT, (nop_expr SSA)]

[INT, (plus_expr SSA INT)]

[(negate_expr SSA), (negate_expr SSA)]

[(plus_expr (negate_expr SSA) INT),
 (plus_expr (negate_expr SSA) INT)]

[SSA, SSA]

So...I guess the first suggestion is out of the question ;-).

Aldy


[C++ PATCH] merge various METHOD_VEC accessors

2017-07-21 Thread Nathan Sidwell

In the old days there was lookup_fnfields_1 which gave you an index.
Then lazy special function creation happened and 
lookup_fnfields_idx_nolazy appeared.  Then users wanted the overloads 
directly so lookup_fnfields_slot and lookup_fnfields_slot_nolazy were born.


Now everybody uses the slot accessors, and the only users of the index 
accessors are the slot accessors themselves.


This patch does the obvious merging, so we only have 
lookup_fnfields_slot{,_nolazy}.  Much simples.


More simples later ...

nathan
--
Nathan Sidwell
2017-07-21  Nathan Sidwell  

	* search.c (lookup_conversion_operator): Return overloads.
	(lookup_fnfields_idx_nolazy): Absorb into ...
	(lookup_fnfields_slot_nolazy): ... here.
	(lookup_fnfields_1): Absorb into ...
	(lookup_fnfields_slot): ... here.

Index: cp/search.c
===
--- cp/search.c	(revision 250437)
+++ cp/search.c	(working copy)
@@ -1530,62 +1530,53 @@ lookup_fnfields (tree xbasetype, tree na
   return rval;
 }
 
-/* Return the index in the CLASSTYPE_METHOD_VEC for CLASS_TYPE
-   corresponding to "operator TYPE ()", or -1 if there is no such
-   operator.  Only CLASS_TYPE itself is searched; this routine does
-   not scan the base classes of CLASS_TYPE.  */
+/* Return the conversion operators in CLASS_TYPE corresponding to
+   "operator TYPE ()".  Only CLASS_TYPE itself is searched; this
+   routine does not scan the base classes of CLASS_TYPE.  */
 
-static int
+static tree
 lookup_conversion_operator (tree class_type, tree type)
 {
-  int tpl_slot = -1;
+  tree tpls = NULL_TREE;
 
   if (TYPE_HAS_CONVERSION (class_type))
 {
-  int i;
-  tree fn;
+  tree fns;
   vec<tree, va_gc> *methods = CLASSTYPE_METHOD_VEC (class_type);
 
-  for (i = CLASSTYPE_FIRST_CONVERSION_SLOT;
-	   vec_safe_iterate (methods, i, &fn); ++i)
+  for (int i = CLASSTYPE_FIRST_CONVERSION_SLOT;
+	   vec_safe_iterate (methods, i, &fns); ++i)
 	{
 	  /* All the conversion operators come near the beginning of
 	 the class.  Therefore, if FN is not a conversion
 	 operator, there is no matching conversion operator in
 	 CLASS_TYPE.  */
-	  fn = OVL_FIRST (fn);
+	  tree fn = OVL_FIRST (fns);
 	  if (!DECL_CONV_FN_P (fn))
 	break;
 
 	  if (TREE_CODE (fn) == TEMPLATE_DECL)
 	/* All the templated conversion functions are on the same
 	   slot, so remember it.  */
-	tpl_slot = i;
+	tpls = fns;
 	  else if (same_type_p (DECL_CONV_FN_TYPE (fn), type))
-	return i;
+	return fns;
 	}
 }
 
-  return tpl_slot;
+  return tpls;
 }
 
-/* TYPE is a class type. Return the index of the fields within
-   the method vector with name NAME, or -1 if no such field exists.
-   Does not lazily declare implicitly-declared member functions.  */
+/* TYPE is a class type. Return the member functions in the method
+   vector with name NAME.  Does not lazily declare implicitly-declared
+   member functions.  */
 
-static int
-lookup_fnfields_idx_nolazy (tree type, tree name)
+tree
+lookup_fnfields_slot_nolazy (tree type, tree name)
 {
-  vec<tree, va_gc> *method_vec;
-  tree fn;
-  size_t i;
-
-  if (!CLASS_TYPE_P (type))
-return -1;
-
-  method_vec = CLASSTYPE_METHOD_VEC (type);
+  vec<tree, va_gc> *method_vec = CLASSTYPE_METHOD_VEC (type);
   if (!method_vec)
-return -1;
+return NULL_TREE;
 
   if (GATHER_STATISTICS)
 n_calls_lookup_fnfields_1++;
@@ -1594,10 +1585,12 @@ lookup_fnfields_idx_nolazy (tree type, t
 return lookup_conversion_operator (type, TREE_TYPE (name));
 
   /* Skip the conversion operators.  */
+  int i;
+  tree fns;
   for (i = CLASSTYPE_FIRST_CONVERSION_SLOT;
-   vec_safe_iterate (method_vec, i, &fn);
+   vec_safe_iterate (method_vec, i, &fns);
++i)
-if (!DECL_CONV_FN_P (OVL_FIRST (fn)))
+if (!DECL_CONV_FN_P (OVL_FIRST (fns)))
   break;
 
   /* If the type is complete, use binary search.  */
@@ -1615,36 +1608,35 @@ lookup_fnfields_idx_nolazy (tree type, t
 	  if (GATHER_STATISTICS)
 	n_outer_fields_searched++;
 
-	  tree tmp = (*method_vec)[i];
-	  tmp = OVL_NAME (tmp);
-	  if (tmp > name)
+	  fns = (*method_vec)[i];
+	  tree fn_name = OVL_NAME (fns);
+	  if (fn_name > name)
 	hi = i;
-	  else if (tmp < name)
+	  else if (fn_name < name)
 	lo = i + 1;
 	  else
-	return i;
+	return fns;
 	}
 }
   else
-for (; vec_safe_iterate (method_vec, i, &fn); ++i)
+for (; vec_safe_iterate (method_vec, i, &fns); ++i)
   {
 	if (GATHER_STATISTICS)
 	  n_outer_fields_searched++;
-	if (OVL_NAME (fn) == name)
-	  return i;
+	if (OVL_NAME (fns) == name)
+	  return fns;
   }
 
-  return -1;
+  return NULL_TREE;
 }
 
-/* TYPE is a class type. Return the index of the fields within
-   the method vector with name NAME, or -1 if no such field exists.  */
+/* TYPE is a class type. Return the overloads in
+   the method vector with name NAME.  Lazily create ctors etc.  */
 
-static int
-lookup_fnfields_1 (tree type, 

[PATCH, RFC] Proposed PowerPC IEEE 128-bit floating point changes

2017-07-21 Thread Michael Meissner
This patch makes some changes to the way the PowerPC handles IEEE 128-bit
floating point, and I wanted to point out where I would like to change the
compiler, but be open to doing it in other ways.

There are 3 changes in this patch, and presumably more work to do beyond this
patch.

The first change is to enable the C language to use _Float128 keyword (but not
__float128) without having to use the -mfloat128 option on power7-power9
systems.  My question is: in the TR that introduced _Float128, is there any
expectation that, outside of the built-in functions we already provide, we
need to provide runtime functions?  Yes, glibc 2.26 will be coming along
shortly, and it should provide most/all of the F128 functions, but distros
won't pick this library up for some time.

I would like to enable it, but I want to avoid the problem that we have with
__float128 in that once the keyword is supported, it is assumed everything is
supported.  GCC and GLIBC run on different cycles (and different people work on
it), and so you typically have to have GCC add support before GLIBC can use it.

We've discovered that boost and libstdc++ both assume the full library support
exists if the __float128 keyword is used.  We will eventually need to tackle
both of these libraries, but we need the full GLIBC support (or libquadmath)
to provide this functionality.

Part of the reason for wanting to enable _Float128 without library support or
an explicit -mfloat128 is the types and built-in functions have to be set up at
compiler startup, and it makes the second change simpler.

The second change is to allow #pragma GCC target and attribute((target(...)))
to enable IEEE 128-bit __float128 on a per module basis (PR 70589).  If I don't
enable _Float128 unconditionally, I've discovered that I couldn't find a way to
allow _Float128 access after compiler startup.

In the second change, I changed how we handle __float128 internally.
Previously, we would create the keyword __float128 if -mfloat128, and we would
create the same type with the undocumented __ieee128 keyword (you need a
keyword to enable the type).  Now, I always create the __ieee128 keyword, and
if -mfloat128 (or pragma/attribute) is enabled, I add a define:

#define __float128 __ieee128

It was a lot simpler to do it this way, rather than trying to keep track of
whether the real keyword was __float128 or __ieee128, and do the appropriate
define or undef.

The third change is to add cpu names 'power7f', 'power8f', and 'power9f' that
enable the appropriate power architecture but also does a -mfloat128.  The
motivation here is you cannot use either #pragma GCC target anywhere or the
target attribute on the function with a target_clones attribute declaration.

The question is do people see the need for -mcpu=power9f?  Or should those be
dropped? Assuming we are keeping those, is there a name people prefer instead
of just using an 'f' suffix.  Note, the GCC infrastructure does not allow a
hyphen in the name in creating configargs.h for --with-cpu=, etc.

At the current time, you cannot use --with-cpu=power8f (it fails in configuring
libstdc++, which says that GLIBC supports __float128), but as GLIBC 2.26
becomes more refined, hopefully it will work someday.

In writing the tests, I discovered two of the float128 tests could have the
target requirements/options tweaked, and I included those changes here.  I also
discovered we had parallel bits for setting up power7, and I used the ISA 2.06
mask in the power7 definition.

So comments on how to proceed?

[gcc]
2017-07-20  Michael Meissner  

PR target/70589
* config.gcc (powerpc*-*-*): Add support for -mcpu=power{7,8,9}f
that combines -mcpu=power{7,8,9} with -mfloat128.
* config/rs6000/rs6000-cpus.def (OTHER_IEEE_SW_MASKS): New mask of
the float128 software/hardware options.
(OTHER_IEEE_HW_MASKS): Likewise.
(power7 cpu): Replace masks with ISA_2_6_MASKS_SERVER.
(power7f): New cpu target that combines a particular machine with
-mfloat128.
(power8f): Likewise.
(power9f): Likewise.
* config/rs6000/rs6000-tables.opt: Regenerate.
* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Add
support for allowing -mfloat128 to be enabled via target pragma or
attribute.  Always use __ieee128 for the IEEE 128-bit floating
point internal type, and if -mfloat128, add a define from
__float128 to __ieee128.
(rs6000_cpu_cpp_builtins): Likewise.
* config/rs6000/rs6000.c (rs6000_init_builtins): Enable _Float128
if -mfloat128-type is enabled (on by default for VSX systems)
rather than -mfloat128 (which controls __float128).
(rs6000_floatn_mode): Likewise.
(rs6000_opt_masks): Allow float128 and float128-hardware target
options to be set.
* doc/invoke.texi (PowerPC options): Document -mcpu=power{7,8,9}f.


Re: Deprecate DBX/stabs?

2017-07-21 Thread Mike Stump
On Jul 21, 2017, at 9:03 AM, Jim Wilson  wrote:
> There is also the matter of SDB_DEBUG which is still supported, and is
> no longer used by anyone, as we have already deprecated all of the
> COFF targets that were using it.  SDB support for C++ is even worse
> than the DBX support.  This should be no problem to deprecate and
> remove.  We could perhaps even just drop it without bothering to
> deprecate it first.

I think that's reasonable.  I haven't used adb and sdb in years.  :-)




Re: Deprecate DBX/stabs?

2017-07-21 Thread Mike Stump
On Jul 21, 2017, at 6:07 AM, Nathan Sidwell  wrote:
> 
> [darwin, cygwin, rx maintainers, you might have an opinion]

darwin going forward is a DWARF platform, so, shouldn't be a heartache for real 
folks.  For ancient machines, ancient compilers might be a requirement.  
Generally, I like keeping things; but cleanups and removals are a part of life, 
and eventually the old should be removed. I'd not stand in the way of such 
removal.  If we were within 5 years of such a transition point, I'd argue to 
keep it for at least 5 years.  But, the switch for darwin was Oct 26th, 2007.  
10 years I think is a nice cutover point for first tier things.  Beyond 10, and 
I'd say, you are dragging your feet.  If _all_ the code for DBX were in my port 
file, I'd be tempted to keep it indefinitely.  It's not, so, that's not 
possible/reasonable.

Iain, do you still have the G5s?  :-)  Do they run 8 or 9?  What do you think?  
Seem reasonable?

The 64_BIT x86 darwin question can likely be flipped to default to dwarf; 
so, even that isn't an issue.



libgo patch committed: don't use runtime_lock in __go_get_backtrace_state

2017-07-21 Thread Ian Lance Taylor
If getSiginfo in libgo does not know how to determine the PC, it will
call runtime_callers. That can happen in a thread that was started by
non-Go code, in which case the TLS variable g will not be set, in
which case runtime_lock will crash.

Avoid the problem by using atomic operations for the lock. This is OK
since creating a backtrace state is fast and never blocks.

The test case is TestCgoExternalThreadSIGPROF in the runtime package
on a system that getSiginfo doesn't handle specially.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu and
powerpc64le-unknown-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 250436)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-a9f1aeced86691de891fbf2a8c97e848faf1962e
+b712bacd939466e66972337744983e180849c535
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/go-caller.c
===
--- libgo/runtime/go-caller.c   (revision 250406)
+++ libgo/runtime/go-caller.c   (working copy)
@@ -74,7 +74,7 @@ static void *back_state;
 
 /* A lock to control creating back_state.  */
 
-static Lock back_state_lock;
+static uint32 back_state_lock;
 
 /* The program arguments.  */
 
@@ -85,7 +85,15 @@ extern Slice runtime_get_args(void);
 struct backtrace_state *
 __go_get_backtrace_state ()
 {
-  runtime_lock (&back_state_lock);
+  uint32 set;
+
+  /* We may not have a g here, so we can't use runtime_lock.  */
+  set = 0;
+  while (!__atomic_compare_exchange_n (&back_state_lock, &set, 1, false,
+				       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+{
+  runtime_osyield ();
+  set = 0;
+}
   if (back_state == NULL)
 {
   Slice args;
@@ -113,7 +121,7 @@ __go_get_backtrace_state ()
 
   back_state = backtrace_create_state (filename, 1, error_callback, NULL);
 }
-  runtime_unlock (&back_state_lock);
+  __atomic_store_n (&back_state_lock, 0, __ATOMIC_RELEASE);
   return back_state;
 }
 


[PATCH] uninitialized memory access vs BIT_INSERT_EXPR

2017-07-21 Thread Andrew Pinski
Hi,
  Due to the way bit-field access lowering is done, we can get an
uninitialized memory load, and this causes the uninitialized warning to
kick in.
The case we have is:
temp_1 = BIT_FIELD_REF <mem, size, position>
temp_2 = BIT_INSERT_EXPR <temp_1, value, position>
BIT_FIELD_REF <mem, size, position> = temp_2

What this patch does is similar to what was done for the case of
BIT_INSERT for uninitialized SSA names (default) but for memory
accesses.
Note: also tested with the bit-field lowering pass added; this and the
fold-const.c/tree-ssa-sccvn.c patch (which was just committed) are
requirements to the lower pass.

OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.

Thanks,
Andrew Pinski
* tree-ssa-uninit.c (warn_uninitialized_vars): Don't warn about memory
accesses where the use is for the first operand of a BIT_INSERT.
Index: tree-ssa-uninit.c
===
--- tree-ssa-uninit.c   (revision 250430)
+++ tree-ssa-uninit.c   (working copy)
@@ -273,6 +273,11 @@ warn_uninitialized_vars (bool warn_possi
  && gimple_has_location (stmt))
{
  tree rhs = gimple_assign_rhs1 (stmt);
+ tree lhs = gimple_assign_lhs (stmt);
+ bool has_bit_insert = false;
+ use_operand_p luse_p;
+ imm_use_iterator liter;
+
  if (TREE_NO_WARNING (rhs))
continue;
 
@@ -300,6 +305,26 @@ warn_uninitialized_vars (bool warn_possi
   ref.offset) <= 0)))
continue;
 
+ /* Do not warn if the access is then used for a BIT_INSERT_EXPR.  */
+ if (TREE_CODE (lhs) == SSA_NAME)
+   FOR_EACH_IMM_USE_FAST (luse_p, liter, lhs)
+ {
+   gimple *use_stmt = USE_STMT (luse_p);
+/* BIT_INSERT_EXPR first operand should not be considered
+  a use for the purpose of uninit warnings.  */
+   if (gassign *ass = dyn_cast  (use_stmt))
+ {
+   if (gimple_assign_rhs_code (ass) == BIT_INSERT_EXPR
+   && luse_p->use == gimple_assign_rhs1_ptr (ass))
+ {
+   has_bit_insert = true;
+   break;
+ }
+ }
+ }
+ if (has_bit_insert)
+   continue;
+
  /* Limit the walking to a constant number of stmts after
 we overcommit quadratic behavior for small functions
 and O(n) behavior.  */


Re: [PATCH][RFA/RFC] Stack clash mitigation patch 07/08 V2

2017-07-21 Thread Wilco Dijkstra
Jeff Law wrote:
    
> Examples please?  We should be probing the outgoing args at the probe
> interval  once the total static frame is greater than 3k.  The dynamic
> space should be probed by generic code.

OK, here are a few simple examples that enable a successful jump of the stack
guard despite -fstack-clash-protection:

int t1(int x)
{
  char arr[3000];
  return arr[x];
}

int t2(int x)
{
  char *p = __builtin_alloca (4050);
  x = t1 (x);
  return p[x];
}

#define ARG32(X) X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X
#define ARG192(X) ARG32(X),ARG32(X),ARG32(X),ARG32(X),ARG32(X),ARG32(X)
void out1(ARG192(__int128));

int t3(int x)
{
  if (x < 1000)
return t1 (x) + 1;

  out1 (ARG192(0));
  return 0;
}


This currently generates:

t1:
sub sp, sp, #3008
add x1, sp, 8
ldrb	w0, [x1, w0, sxtw]
add sp, sp, 3008
ret

t2:
stp x29, x30, [sp, -16]!
mov x1, 4072
add x29, sp, 0
sub sp, sp, #4080
mov x2, sp
str xzr, [sp, x1]
bl  t1
ldrb	w0, [x2, w0, sxtw]
add sp, x29, 0
ldp x29, x30, [sp], 16
ret

t3:
stp x29, x30, [sp, -16]!
cmp w0, 999
add x29, sp, 0
sub sp, sp, #3008
bgt .L15
bl  t1
add w0, w0, 1
.L14:
add sp, x29, 0
ldp x29, x30, [sp], 16
ret

As you can see t2 allocates 4080 bytes on the stack but probes only the top,
leaving 4072 bytes unprobed. When it calls t1, it drops the stack by another
3008 bytes without a probe (as it is allowed up to 3KB), so now we've got a
distance of almost 7KB between 2 probes...

Similarly functions with large outgoing arguments are not correctly probed.
t3 creates 3008 bytes of outgoing area without a probe and then calls t1
which will decrement the stack by another 3008 bytes without a probe.

Both t2 and t3 must probe SP+1024 to ensure the callee can adjust SP by up to
3KB before emitting a probe.


> What we should be doing, per your request is emit an initial probe if we
> know the function is going to require probing of any form.  Then we emit
> probes at 4k intervals.  At least that's how I understood your
> simplification.  So for a 7k stack that's two probes -- one at *sp at
> the start of the prologue then the second after the first 4k is allocated.

There is no benefit in doing an initial probe that way. We don't have to probe
the first 3KB even if the function has a larger stack. So if we have a 6KB
stack, we can drop the stack by 3KB, probe, drop it by another 3KB, then push the
callee-saves. If the stack is larger than 7KB we probe at 7KB, then 11KB etc.

> > I don't understand what last_probe_offset is for, it should always be zero
> > after saving the callee-saves (though currently it is not set correctly). 
> > And it
> > has to be set to a fixed value to limit the maximum outgoing args when doing
> > final_adjust.
> It's supposed to catch the case where neither the initial adjustment nor
> final adjustment in and of themselves require probes, but the
> combination would.

That's not possible - the 2 cases are completely independent given that the
callee-saves always provide an implicit probe (you can't have outgoing arguments
if there is no call, and you can't have a call without callee-saving LR).

> My understanding was that you didn't want that level of trackign around
> the callee saves.  It's also not clear if that will work in the presence
> of separate shrink wrapping.

Shrinkwrapping will be safe once I ensure LR is at the bottom of the
callee-saves. With this fix we know for sure that if there is a call, we've
saved LR at SP+0 before adjusting SP by final_adjust. This means we don't need
to try to figure out where the last probe was - it's always SP+0 if
final_adjust != 0, simplifying things.

> I have no idea what you mean by setting to a fixed value for limit the
> maximum outgoing args.  We don't limit the maximum outgoing args -- we
> probe that space just like any other as we cross 4k boundaries.

No that's not correct - remember we want to reserve 1024 bytes for the outgoing
area. So we must probe if the outgoing area is > 1024, not 4KB (and we need to
probe again at 5KB, 9KB etc).

> We're clearly not communicating well.  Example would probably help.

My t3 example above shows how you can easily jump the stack guard using lots of
outgoing args (but only if you call a different function with no outgoing
args first!).

Wilco


[C++ PATCH] no special cdtor slots

2017-07-21 Thread Nathan Sidwell
ctors and dtors had special slots in the METHOD_VEC, because they used 
not to have proper names.  Now they have unique identifiers, we can 
treat them just as any other function and look them up by 
[cd]tor_identifier.


(the eventual goal is to replace separate METHOD_VEC and FIELD_VEC with
a single map, so don't worry about this making
looking for cdtors slower than an array access right now)


Applied to trunk.

nathan
--
Nathan Sidwell
2017-07-21  Nathan Sidwell  

	Remove special CDtor METHOD_VEC slots.
	* cp-tree.h (CLASSTYPE_CONSTRUCTOR_SLOT,
	CLASSTYPE_DESTRUCTOR_SLOT): Delete.
	(CLASSTYPE_CONSTRUCTORS): Use lookup_fnfields_slot_nolazy.
	(CLASSTYPE_DESTRUCTOR): Likewise.
	* class.c (add_method): Don't use special cdtor slots.
	* search.c (lookup_fnfields_idx_nolazy): Likewise.
	(look_for_overrides_here): Use lookup_fnfields_slot.
	* semantics.c (classtype_has_nothrow_assign_or_copy_p): Likewise.

Index: class.c
===
--- class.c	(revision 250426)
+++ class.c	(working copy)
@@ -1039,50 +1039,39 @@ add_method (tree type, tree method, bool
 	 we're going to end up with an assignment operator at some
 	 point as well.  */
   vec_alloc (method_vec, 8);
-  /* Create slots for constructors and destructors.  */
-  method_vec->quick_push (NULL_TREE);
-  method_vec->quick_push (NULL_TREE);
   CLASSTYPE_METHOD_VEC (type) = method_vec;
 }
 
   /* Maintain TYPE_HAS_USER_CONSTRUCTOR, etc.  */
   grok_special_member_properties (method);
 
-  /* Constructors and destructors go in special slots.  */
-  if (DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (method))
-slot = CLASSTYPE_CONSTRUCTOR_SLOT;
-  else if (DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (method))
-slot = CLASSTYPE_DESTRUCTOR_SLOT;
-  else
-{
-  tree m;
+  tree m;
 
-  insert_p = true;
-  /* See if we already have an entry with this name.  */
-  for (slot = CLASSTYPE_FIRST_CONVERSION_SLOT;
-	   vec_safe_iterate (method_vec, slot, &m);
-	   ++slot)
+  insert_p = true;
+  /* See if we already have an entry with this name.  */
+  for (slot = CLASSTYPE_FIRST_CONVERSION_SLOT;
+   vec_safe_iterate (method_vec, slot, &m);
+   ++slot)
+{
+  m = OVL_FIRST (m);
+  if (template_conv_p)
 	{
-	  m = OVL_FIRST (m);
-	  if (template_conv_p)
-	{
-	  if (TREE_CODE (m) == TEMPLATE_DECL
-		  && DECL_TEMPLATE_CONV_FN_P (m))
-		insert_p = false;
-	  break;
-	}
-	  if (conv_p && !DECL_CONV_FN_P (m))
-	break;
-	  if (DECL_NAME (m) == DECL_NAME (method))
-	{
-	  insert_p = false;
-	  break;
-	}
-	  if (complete_p
-	  && !DECL_CONV_FN_P (m)
-	  && DECL_NAME (m) > DECL_NAME (method))
-	break;
+	  if (TREE_CODE (m) == TEMPLATE_DECL
+	  && DECL_TEMPLATE_CONV_FN_P (m))
+	insert_p = false;
+	  break;
+	}
+  if (conv_p && !DECL_CONV_FN_P (m))
+	break;
+  if (DECL_NAME (m) == DECL_NAME (method))
+	{
+	  insert_p = false;
+	  break;
 	}
+  if (complete_p
+	  && !DECL_CONV_FN_P (m)
+	  && DECL_NAME (m) > DECL_NAME (method))
+	break;
 }
   current_fns = insert_p ? NULL_TREE : (*method_vec)[slot];
 
@@ -1256,7 +1245,7 @@ add_method (tree type, tree method, bool
 
   if (conv_p)
 TYPE_HAS_CONVERSION (type) = 1;
-  else if (slot >= CLASSTYPE_FIRST_CONVERSION_SLOT && !complete_p)
+  else if (!complete_p && !IDENTIFIER_CDTOR_P (DECL_NAME (method)))
 push_class_level_binding (DECL_NAME (method), current_fns);
 
   if (insert_p)
Index: cp-tree.h
===
--- cp-tree.h	(revision 250426)
+++ cp-tree.h	(working copy)
@@ -2148,29 +2148,21 @@ struct GTY(()) lang_type {
and the RECORD_TYPE for the class template otherwise.  */
 #define CLASSTYPE_DECL_LIST(NODE) (LANG_TYPE_CLASS_CHECK (NODE)->decl_list)
 
-/* The slot in the CLASSTYPE_METHOD_VEC where constructors go.  */
-#define CLASSTYPE_CONSTRUCTOR_SLOT 0
-
-/* The slot in the CLASSTYPE_METHOD_VEC where destructors go.  */
-#define CLASSTYPE_DESTRUCTOR_SLOT 1
-
 /* The first slot in the CLASSTYPE_METHOD_VEC where conversion
operators can appear.  */
-#define CLASSTYPE_FIRST_CONVERSION_SLOT 2
+#define CLASSTYPE_FIRST_CONVERSION_SLOT 0
 
 /* A FUNCTION_DECL or OVERLOAD for the constructors for NODE.  These
are the constructors that take an in-charge parameter.  */
 #define CLASSTYPE_CONSTRUCTORS(NODE) \
-  ((*CLASSTYPE_METHOD_VEC (NODE))[CLASSTYPE_CONSTRUCTOR_SLOT])
+  (lookup_fnfields_slot_nolazy (NODE, ctor_identifier))
 
 /* A FUNCTION_DECL for the destructor for NODE.  This is the
destructors that take an in-charge parameter.  If
CLASSTYPE_LAZY_DESTRUCTOR is true, then this entry will be NULL
until the destructor is created with lazily_declare_fn.  */
 #define CLASSTYPE_DESTRUCTOR(NODE) \
-  (CLASSTYPE_METHOD_VEC (NODE)		  \
-   ? (*CLASSTYPE_METHOD_VEC (NODE))[CLASSTYPE_DESTRUCTOR_SLOT]		  \
-   : NULL_TREE)
+  

libgo patch committed: Recognize PPC PC in signal handler

2017-07-21 Thread Ian Lance Taylor
This patch to libgo changes the signal handler for PPC (and PPC64)
GNU/Linux to fetch the PC from the signal context.  Bootstrapped and
ran Go testsuite on x86_64-pc-linux-gnu and
powerpc64le-unknown-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 250433)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-e34cb8dee6c1f215329e0eea79202b48cb83817c
+a9f1aeced86691de891fbf2a8c97e848faf1962e
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/go-signal.c
===
--- libgo/runtime/go-signal.c   (revision 250406)
+++ libgo/runtime/go-signal.c   (working copy)
@@ -215,6 +215,11 @@ getSiginfo(siginfo_t *info, void *contex
ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gregs[REG_EIP];
   #endif
 #endif
+#ifdef __PPC__
+  #ifdef __linux__
+   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.regs->nip;
+  #endif
+#endif
 
if (ret.sigpc == 0) {
// Skip getSiginfo/sighandler/sigtrampgo/sigtramp/handler.


Re: [PATCH,AIX] Enable XCOFF in libbacktrace on AIX

2017-07-21 Thread Ian Lance Taylor
On Mon, May 15, 2017 at 7:24 AM, REIX, Tony  wrote:
> Description:
>  * This patch enables libbacktrace to handle XCOFF on AIX.
>
> Tests:
>  * Fedora25/x86_64 + GCC v7.1.0 : Configure/Build: SUCCESS
>- build made by means of a .spec file based on Fedora gcc-7.0.1-0.12 .spec 
> file
>  ../configure --enable-bootstrap 
> --enable-languages=c,c++,objc,obj-c++,fortran,go,lto --prefix=/usr 
> --mandir=/usr/share/man --infodir=/usr/share/info 
> --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared 
> --enable-threads=posix --enable-checking=release --enable-multilib 
> --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions 
> --enable-gnu-unique-object --enable-linker-build-id 
> --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin 
> --enable-initfini-array --with-isl --enable-libmpx 
> --enable-offload-targets=nvptx-none --without-cuda-driver 
> --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 
> --build=x86_64-redhat-linux
>
> ChangeLog:
>   * libbacktrace/Makefile.am : Add xcoff.c
>   * libbacktrace/Makefile.in : Regenerated
>   * libbacktrace/configure.ac : Add XCOFF output file type
>   * libbacktrace/configure : Regenerated
>   * libbacktrace/fileline.c : Handle AIX procfs tree
>   * libbacktrace/filetype.awk : Add AIX XCOFF type detection
>   * libbacktrace/xcoff.c : New file for handling XCOFF format

Thanks.  Committed with some minor changes, as follows.

Ian


2017-07-21  Tony Reix  

* filetype.awk: Add AIX XCOFF type detection.
* configure.ac: Recognize xcoff format.
* Makefile.am (FORMAT_FILES): Add xcoff.c.
* fileline.c: Include .
(fileline_initialize): Add case for AIX procfs.
* xcoff.c: New file.
* configure, Makefile.in: Rebuild.
Index: Makefile.am
===
--- Makefile.am (revision 250406)
+++ Makefile.am (working copy)
@@ -57,7 +57,8 @@ BACKTRACE_FILES = \
 FORMAT_FILES = \
elf.c \
pecoff.c \
-   unknown.c
+   unknown.c \
+   xcoff.c
 
 VIEW_FILES = \
read.c \
@@ -155,3 +156,5 @@ sort.lo: config.h backtrace.h internal.h
 stest.lo: config.h backtrace.h internal.h
 state.lo: config.h backtrace.h backtrace-supported.h internal.h
 unknown.lo: config.h backtrace.h internal.h
+xcoff.lo: config.h backtrace.h internal.h
+
Index: configure.ac
===
--- configure.ac(revision 250406)
+++ configure.ac(working copy)
@@ -233,6 +233,9 @@ elf*) FORMAT_FILE="elf.lo" ;;
 pecoff) FORMAT_FILE="pecoff.lo"
 backtrace_supports_data=no
;;
+xcoff) FORMAT_FILE="xcoff.lo"
+   backtrace_supports_data=no
+   ;;
 *) AC_MSG_WARN([could not determine output file type])
FORMAT_FILE="unknown.lo"
backtrace_supported=no
Index: fileline.c
===
--- fileline.c  (revision 250406)
+++ fileline.c  (working copy)
@@ -37,6 +37,7 @@ POSSIBILITY OF SUCH DAMAGE.  */
 #include 
 #include 
 #include 
+#include 
 
 #include "backtrace.h"
 #include "internal.h"
@@ -57,6 +58,7 @@ fileline_initialize (struct backtrace_st
   int pass;
   int called_error_callback;
   int descriptor;
+  char buf[64];
 
   if (!state->threaded)
 failed = state->fileline_initialization_failed;
@@ -80,7 +82,7 @@ fileline_initialize (struct backtrace_st
 
   descriptor = -1;
   called_error_callback = 0;
-  for (pass = 0; pass < 4; ++pass)
+  for (pass = 0; pass < 5; ++pass)
 {
   const char *filename;
   int does_not_exist;
@@ -99,6 +101,10 @@ fileline_initialize (struct backtrace_st
case 3:
  filename = "/proc/curproc/file";
  break;
+   case 4:
+ snprintf (buf, sizeof (buf), "/proc/%d/object/a.out", getpid ());
+ filename = buf;
+ break;
default:
  abort ();
}
Index: filetype.awk
===
--- filetype.awk(revision 250406)
+++ filetype.awk(working copy)
@@ -3,3 +3,6 @@
 /\177ELF\002/ { if (NR == 1) { print "elf64"; exit } }
 /\114\001/{ if (NR == 1) { print "pecoff"; exit } }
 /\144\206/{ if (NR == 1) { print "pecoff"; exit } }
+/\001\337/{ if (NR == 1) { print "xcoff"; exit } }
+/\001\367/{ if (NR == 1) { print "xcoff"; exit } }
+
Index: xcoff.c
===
--- xcoff.c (revision 0)
+++ xcoff.c (working copy)
@@ -0,0 +1,76 @@
+/* xcoff.c -- Get debug data from an XCOFF file for backtraces.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+(1) Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+
+(2) 

[C++ PACTH] small add_candidates cleanup

2017-07-21 Thread Nathan Sidwell
I wandered into add_candidates and got confused by !!.  There were a few 
opportunities for other cleanup too, so committed the attached.


nathan
--
Nathan Sidwell
2017-07-21  Nathan Sidwell  

	* call.c (add_candidates): Move decls to initialization.  Don't
	use !!.

Index: call.c
===
--- call.c	(revision 250426)
+++ call.c	(working copy)
@@ -5423,8 +5423,8 @@ add_candidates (tree fns, tree first_arg
 {
   tree ctype;
   const vec<tree, va_gc> *non_static_args;
-  bool check_list_ctor;
-  bool check_converting;
+  bool check_list_ctor = false;
+  bool check_converting = false;
   unification_kind_t strict;
 
   if (!fns)
@@ -5435,7 +5435,7 @@ add_candidates (tree fns, tree first_arg
   if (DECL_CONV_FN_P (fn))
 {
   check_list_ctor = false;
-  check_converting = !!(flags & LOOKUP_ONLYCONVERTING);
+  check_converting = (flags & LOOKUP_ONLYCONVERTING) != 0;
   if (flags & LOOKUP_NO_CONVERSION)
 	/* We're doing return_type(x).  */
 	strict = DEDUCE_CONV;
@@ -5452,18 +5452,13 @@ add_candidates (tree fns, tree first_arg
 {
   if (DECL_CONSTRUCTOR_P (fn))
 	{
-	  check_list_ctor = !!(flags & LOOKUP_LIST_ONLY);
+	  check_list_ctor = (flags & LOOKUP_LIST_ONLY) != 0;
 	  /* For list-initialization we consider explicit constructors
 	 and complain if one is chosen.  */
 	  check_converting
 	= ((flags & (LOOKUP_ONLYCONVERTING|LOOKUP_LIST_INIT_CTOR))
 	   == LOOKUP_ONLYCONVERTING);
 	}
-  else
-	{
-	  check_list_ctor = false;
-	  check_converting = false;
-	}
   strict = DEDUCE_CALL;
   ctype = conversion_path ? BINFO_TYPE (conversion_path) : NULL_TREE;
 }
@@ -5476,9 +5471,6 @@ add_candidates (tree fns, tree first_arg
 
   for (lkp_iterator iter (fns); iter; ++iter)
 {
-  tree fn_first_arg;
-  const vec<tree, va_gc> *fn_args;
-
   fn = *iter;
 
   if (check_converting && DECL_NONCONVERTING_P (fn))
@@ -5486,10 +5478,13 @@ add_candidates (tree fns, tree first_arg
   if (check_list_ctor && !is_list_ctor (fn))
 	continue;
 
-  /* Figure out which set of arguments to use.  */
+  tree fn_first_arg = NULL_TREE;
+  const vec<tree, va_gc> *fn_args = args;
+
   if (DECL_NONSTATIC_MEMBER_FUNCTION_P (fn))
 	{
-	  /* If this function is a non-static member and we didn't get an
+	  /* Figure out where the object arg comes from.  If this
+	 function is a non-static member and we didn't get an
 	 implicit object argument, move it out of args.  */
 	  if (first_arg == NULL_TREE)
 	{
@@ -5506,12 +5501,6 @@ add_candidates (tree fns, tree first_arg
 	  fn_first_arg = first_arg;
 	  fn_args = non_static_args;
 	}
-  else
-	{
-	  /* Otherwise, just use the list of arguments provided.  */
-	  fn_first_arg = NULL_TREE;
-	  fn_args = args;
-	}
 
   if (TREE_CODE (fn) == TEMPLATE_DECL)
 	add_template_candidate (candidates,


Re: [PATCH] New C++ warning option '-Wduplicated-access-specifiers'

2017-07-21 Thread Volker Reichelt
On 21 Jul, David Malcolm wrote:
> On Thu, 2017-07-20 at 18:35 +0200, Volker Reichelt wrote:
>> Hi,
>> 
>> the following patch introduces a new C++ warning option
>> -Wduplicated-access-specifiers that warns about redundant
>> access-specifiers in classes, e.g.
>> 
>>   class B
>>   {
>> public:
>>   B();
>> 
>> private:
>>   void foo();
>> private:
>>   int i;
>>   };
>> 
>> test.cc:8:5: warning: duplicate 'private' access-specifier [
>> -Wduplicated-access-specifiers]
>>  private:
>>  ^~~
>>  ---
>> test.cc:6:5: note: access-specifier was previously given here
>>  private:
>>  ^~~
> 
> Thanks for working on this.
> 
> I'm not able to formally review, but you did CC me; various comments below 
> throughout.
> 
>> The test is implemented by tracking the location of the last
>> access-specifier together with the access-specifier itself.
>> The location serves two purposes: the obvious one is to be able to
>> print the location in the note that comes with the warning.
>> The second purpose is to be able to distinguish an access-specifier
>> given by the user from the default of the class type (i.e. public for
>> 'struct' and private for 'class') where the location is set to
>> UNKNOWN_LOCATION. The warning is only issued if the user gives the
>> access-specifier twice, i.e. if the previous location is not
>> UNKNOWN_LOCATION.
> 
> Presumably given
> 
> struct foo
> {
> public:
>   /* ... */
> };
> 
> we could issue something like:
> 
>   warning: access-specifier 'public' is redundant within 'struct'
> 
> for the first; similarly for:
> 
> class bar
> {
> private:
> };
> 
> we could issue:
> 
>   warning: access-specifier 'private' is redundant within 'class'
> 
> 
>> One could actually make this a two-level warning so that on the
>> higher level also the default class-type settings are taken into
>> account. Would that be helpful? I could prepare a second patch for
>> that.
> 
> I'm not sure how best to structure it.
> 
> FWIW, when I first saw the patch, I wasn't a big fan of the warning, as I 
> like to use access-specifiers to break up the kinds of entities within a 
> class.
> 
> For example, our coding guidelines 
>   https://gcc.gnu.org/codingconventions.html#Class_Form
> recommend:
> 
> "first define all public types,
> then define all non-public types,
> then declare all public constructors,
> ...
> then declare all non-public member functions, and
> then declare all non-public member variables."
> 
> I find it's useful to put a redundant "private:" between the last two,
> e.g.:
> 
> class baz
> {
>  public:
>   ...
> 
>  private:
>   void some_private_member ();
> 
>  private:
>   int m_some_field;
> };
> 
> to provide a subdivision between the private member functions and the
> private data fields.

That's also what can be seen in our libstdc++ headers to some extent:
roughly half of the warnings there are such subdividers, while the
other half indicate genuinely redundant access-specifiers.

It's up to the user to either keep those duplicate access-specifiers
as subdividers, or to use something else (like comments) for that
purpose and switch on the warning for their code.
Because the subdivider usage seems to be relatively common,
I don't want to enable the warning via -Wall or -Wextra.

> This might be a violation of our guidelines (though if so, I'm not sure
> it's explicitly stated), but I find it useful, and the patch would warn
> about it.
> 
> Having said that, looking at e.g. the "jit" subdir, I see lots of
> places where the warning would fire.  In some of these, the code has a
> bit of a "smell", so maybe I like the warning after all... in that it
> can be good for a new warning to draw attention to code that might need
> work.
> 
> Sorry that this is rambling; comments on the patch inline below.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu.
>> OK for trunk?
>> 
>> Btw, there are about 50 places in our libstdc++ headers where this
>> warning triggers. I'll come up with a patch for this later.
>> 
>> Regards,
>> Volker
>> 
>> 
>> 2017-07-20  Volker Reichelt  
>> 
>> * doc/invoke.texi (-Wduplicated-access-specifiers): Document
>> new
>> warning option.
>> 
>> Index: gcc/doc/invoke.texi
>> ===
>> --- gcc/doc/invoke.texi (revision 250356)
>> +++ gcc/doc/invoke.texi (working copy)
>> @@ -275,7 +275,7 @@
>>  -Wdisabled-optimization @gol
>>  -Wno-discarded-qualifiers  -Wno-discarded-array-qualifiers @gol
>>  -Wno-div-by-zero  -Wdouble-promotion @gol
>> --Wduplicated-branches  -Wduplicated-cond @gol
>> +-Wduplicated-access-specifiers  -Wduplicated-branches  -Wduplicated
>> -cond @gol
>>  -Wempty-body  -Wenum-compare  -Wno-endif-labels  -Wexpansion-to
>> -defined @gol
>>  -Werror  -Werror=*  -Wextra-semi  -Wfatal-errors @gol
>>  -Wfloat-equal  -Wformat  -Wformat=2 @gol
>> @@ -5388,6 +5388,12 @@
>>  
>>  This warning is enabled by @option{-Wall}.
>>  
>> +@item -Wduplicated-access-specifiers
>> 
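
For concreteness, a compilable sketch of the case the proposed warning targets; the diagnostic in the comment is hypothetical, since the option is only being proposed in this thread:

```cpp
#include <cassert>

// With the proposed -Wduplicated-access-specifiers, the second
// 'private:' below would be flagged, roughly as:
//   warning: duplicate 'private' access-specifier
// because it repeats the access level that is already in effect.
class B
{
public:
  B() : i(0) {}
  int get() const { return i; }

private:
  void foo() {}   // illustrative member, only here to separate sections
private:          // duplicate: access is already 'private'
  int i;
};
```

The code itself is valid C++ today; only the diagnostic is new.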

libgo patch committed: Use a larger stack size in CgoCallbackGC test

2017-07-21 Thread Ian Lance Taylor
This libgo patch tweaks the CgoCallbackGC test to allocate more stack
space.  This makes it more likely to succeed on a system that does not
support split stacks.  This test is actually not very meaningful for
gccgo at present, but it doesn't hurt to keep running it.
Bootstrapped and ran Go tests on x86_64-pc-linux-gnu and
powerpc64le-unknown-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 250406)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-6572f7e35f962bdb8a7c174920dbb70350b96874
+e34cb8dee6c1f215329e0eea79202b48cb83817c
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/testdata/testprogcgo/callback.go
===
--- libgo/go/runtime/testdata/testprogcgo/callback.go   (revision 250406)
+++ libgo/go/runtime/testdata/testprogcgo/callback.go   (working copy)
@@ -23,7 +23,9 @@ static void foo() {
 pthread_t th;
 pthread_attr_t attr;
pthread_attr_init(&attr);
-pthread_attr_setstacksize(&attr, 256 << 10);
+// For gccgo use a stack size large enough for all the callbacks,
+// in case we are on a platform that does not support -fsplit-stack.
+pthread_attr_setstacksize(&attr, 512 * 1024);
pthread_create(&th, &attr, thr, 0);
 pthread_join(th, 0);
 }


[PING][PATCH][aarch64] Enable ifunc resolver attribute by default

2017-07-21 Thread Steve Ellcey

This is a ping for my patch to enable the ifunc resolver by default for
aarch64.

https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00806.html

Steve Ellcey
sell...@cavium.com


Re: [PATCH] New C++ warning option '-Wduplicated-access-specifiers'

2017-07-21 Thread Martin Sebor

On 07/20/2017 10:35 AM, Volker Reichelt wrote:

Hi,

the following patch introduces a new C++ warning option
-Wduplicated-access-specifiers that warns about redundant
access-specifiers in classes, e.g.

  class B
  {
public:
  B();

private:
  void foo();
private:
  int i;
  };


I'm very fond of warnings that help find real bugs, or even that
provide an opportunity to review code for potential problems or
inefficiencies and suggest a possibility to improve it in some
way (make it clearer, or easier for humans or compilers to find
real bugs in, or faster, etc.), even if the code isn't strictly
incorrect.

In this case I'm having trouble coming up with an example where
the warning would have this effect.  What do you have in mind?
(Duplicate access specifiers tend to crop up in large classes
and/or as a result of preprocessor conditionals.)

Btw., there are examples of poor coding practices where I can
imagine a warning focused on access specifiers being helpful.
For instance, a class whose member functions maintain non-trivial
invariants among its data members should declare the data private
to prevent clients from inadvertently breaking those invariants.
A specific example might be a container class (like string or
vector) that owns the block of memory it allocates.  A warning
that detected when such members are publicly accessible could
help improve encapsulation.  I suspect this would be challenging
to implement and would likely require some non-trivial analysis
in the middle end.

Another example is a class that defines an inaccessible member
that isn't used (e.g., class A { int i; }; that Clang detects
with -Wunused-private-field; or class A { void f () { } };).
I would expect this to be doable in the front end.

Martin
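
Martin's "inaccessible and unused member" example, in compilable form (a sketch; -Wunused-private-field is a Clang option, and GCC has no equivalent at the time of this thread):

```cpp
#include <cassert>

// The kind of code Clang's -Wunused-private-field diagnoses; the
// names here are illustrative only.
class A
{
  int i;   // private, never read or written anywhere
};

class A2
{
  void f() {}   // private member function that is never called
};

// sizeof still accounts for the unused field.
int unused_field_size() { return sizeof(A); }
```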


Re: [PATCH, committed] Fix PR81162

2017-07-21 Thread Bill Schmidt
Hi Richard,

I would like to backport this fix to GCC 5, 6, and 7.  All have passed regstrap on
powerpc64le-linux-gnu (with the test case moved to gcc.dg/ubsan, of course).
Is this ok?

Thanks!

-- Bill


> On Jul 14, 2017, at 1:05 PM, Bill Schmidt  wrote:
> 
> Hi,
> 
> PR81162 identifies a bug in SLSR involving overflow that occurs when
> replacing a NEGATE_EXPR with a PLUS_EXPR.  This is another example
> of an unprofitable transformation that should be skipped anyway,
> hence this simple patch.  Bootstrapped and tested on
> powerpc64le-unknown-linux-gnu, committed.  Test case provided from
> the bug report.
> 
> Thanks,
> Bill
> 
> 
> [gcc]
> 
> 2016-07-14  Bill Schmidt  
> 
>   PR tree-optimization/81162
>   * gimple-ssa-strength-reduction.c (replace_mult_candidate): Don't
>   replace a negate with an add.
> 
> [gcc/testsuite]
> 
> 2016-07-14  Bill Schmidt  
> 
>   PR tree-optimization/81162
>   * gcc.dg/pr81162.c: New file.
> 
> 
> Index: gcc/gimple-ssa-strength-reduction.c
> ===
> --- gcc/gimple-ssa-strength-reduction.c   (revision 250189)
> +++ gcc/gimple-ssa-strength-reduction.c   (working copy)
> @@ -2082,13 +2082,14 @@ replace_mult_candidate (slsr_cand_t c, tree basis_
>  types but allows for safe negation without twisted logic.  */
>   if (wi::fits_shwi_p (bump)
>   && bump.to_shwi () != HOST_WIDE_INT_MIN
> -  /* It is not useful to replace casts, copies, or adds of
> +  /* It is not useful to replace casts, copies, negates, or adds of
>an SSA name and a constant.  */
>   && cand_code != SSA_NAME
>   && !CONVERT_EXPR_CODE_P (cand_code)
>   && cand_code != PLUS_EXPR
>   && cand_code != POINTER_PLUS_EXPR
> -  && cand_code != MINUS_EXPR)
> +  && cand_code != MINUS_EXPR
> +  && cand_code != NEGATE_EXPR)
> {
>   enum tree_code code = PLUS_EXPR;
>   tree bump_tree;
> Index: gcc/testsuite/gcc.dg/pr81162.c
> ===
> --- gcc/testsuite/gcc.dg/pr81162.c(nonexistent)
> +++ gcc/testsuite/gcc.dg/pr81162.c(working copy)
> @@ -0,0 +1,17 @@
> +/* PR tree-optimization/81162 */
> +/* { dg-do run } */
> +/* { dg-options "-fsanitize=undefined -O2" } */
> +
> +short s;
> +int i1 = 1;
> +int i2 = 1;
> +unsigned char uc = 147;
> +
> +int main() {
> +  s = (-uc + 2147483647) << 0;
> +  if (9031239389974324562ULL >= (-((i1 && i2) + uc) ^ -21096) ) {
> +return 0;
> +  } else {
> +return -1;
> +  }
> +}
> 



Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08 V2

2017-07-21 Thread Jeff Law
On 07/20/2017 07:23 AM, Segher Boessenkool wrote:
> Hi Jeff,
> 
> On Tue, Jul 18, 2017 at 11:17:19PM -0600, Jeff Law wrote:
>>
>> The biggest change in this update to patch 01/08 is moving of stack
>> clash protection out of -fstack-check= and into its own option,
>> -fstack-clash-protection.  I believe other issues raised by reviewers
>> have been addressed as well.
> 
> I think the documentation for the new option should say this only
> provides partial protection on targets that do not have fuller protection
> enabled.  Is there some way for users to find out which targets those
> are / if their target is one of those?
Agreed.  I've added a note in the documentation.

Sadly, there's no way short of listing them and keeping that list
up-to-date over time by hand.  If we want to do that, I would suggest
we note the processors with full support as well as those with partial
support using the -fstack-check=specific prologues.

I think we ought to provide that target specific information in the
documentation, though I worry it will get out of date.  Thoughts?

jeff


Re: [PATCH] New C++ warning option '-Wduplicated-access-specifiers'

2017-07-21 Thread Volker Reichelt
On 21 Jul, David Malcolm wrote:
> On Fri, 2017-07-21 at 11:56 -0400, David Malcolm wrote:
>> On Thu, 2017-07-20 at 18:35 +0200, Volker Reichelt wrote:
>> > Hi,
>> > 
>> > the following patch introduces a new C++ warning option
>> > -Wduplicated-access-specifiers that warns about redundant
>> > access-specifiers in classes, e.g.
>> > 
>> >   class B
>> >   {
>> > public:
>> >   B();
>> > 
>> > private:
>> >   void foo();
>> > private:
>> >   int i;
>> >   };
>> > 
>> > test.cc:8:5: warning: duplicate 'private' access-specifier [
>> > -Wduplicated-access-specifiers]
>> >  private:
>> >  ^~~
>> >  ---
>> > test.cc:6:5: note: access-specifier was previously given here
>> >  private:
>> >  ^~~
>> 
>> Thanks for working on this.
>> 
>> I'm not able to formally review, but you did CC me; various comments
>> below throughout.
>> 
>> > The test is implemented by tracking the location of the last
>> > access-specifier together with the access-specifier itself.
>> > The location serves two purposes: the obvious one is to be able to
>> > print the location in the note that comes with the warning.
>> > The second purpose is to be able to distinguish an access-specifier
>> > given by the user from the default of the class type (i.e. public
>> > for
>> > 'struct' and private for 'class') where the location is set to
>> > UNKNOWN_LOCATION. The warning is only issued if the user gives the
>> > access-specifier twice, i.e. if the previous location is not
>> > UNKNOWN_LOCATION.
>> 
>> Presumably given
>> 
>> struct foo
>> {
>> public:
>>   /* ... */
>> };
>> 
>> we could issue something like:
>> 
>>   warning: access-specifier 'public' is redundant within 'struct'
>> 
>> for the first; similarly for:
>> 
>> class bar
>> {
>> private:
>> };
>> 
>> we could issue:
>> 
>>   warning: access-specifier 'private' is redundant within 'class'
>> 
>> 
>> > One could actually make this a two-level warning so that on the
>> > higher level also the default class-type settings are taken into
>> > account. Would that be helpful? I could prepare a second patch for
>> > that.
>> 
>> I'm not sure how best to structure it.
> 
> Maybe combine the two ideas, and call it
>   -Wredundant-access-specifiers
> ?
> 
> Or just "-Waccess-specifiers" ?
> 
> [...snip...]
> 
> Dave

Thanks for having a look at this!

My favorite version would be to use '-Waccess-specifiers=1' for the
original warning and '-Waccess-specifiers=2' for the stricter version
that also checks against the class-type default.

We could then let '-Waccess-specifiers' default to either of the two.
I'm afraid that level 2 will fire way too often for legacy code
(and there might be coding conventions that require an access-specifier
at the beginning of a class/struct), so I'd vote for level 1 as the default.

The last argument is also the reason why I'd like to see a two-level
warning instead of just different error messages (although these are
very helpful for seeing what's really going on with your code).

What do you think?

Regards,
Volker
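
To make the two proposed levels concrete, a compilable sketch (both the '-Waccess-specifiers=N' spelling and the diagnostics in the comments are hypothetical at this point; the code itself is valid C++):

```cpp
#include <cassert>

// Level 1 (the patch as posted) would flag only the explicitly
// repeated specifier; the stricter level 2 would additionally flag
// specifiers that merely restate the class-type default.
struct S
{
public:        // level 2 only: 'public' is already the struct default
  int x;
};

class C
{
public:
  C() : m(2) {}
  int value() const { return m; }
private:
  void helper() {}
private:       // level 1: explicit duplicate of 'private'
  int m;
};

int read_s() { S s; s.x = 5; return s.x; }
```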



Re: [RFC/SCCVN] Handle BIT_INSERT_EXPR in vn_nary_op_eq

2017-07-21 Thread Andrew Pinski
On Wed, Jul 19, 2017 at 11:13 AM, Richard Biener
 wrote:
> On July 19, 2017 6:10:28 PM GMT+02:00, Andrew Pinski  
> wrote:
>>On Mon, Jul 17, 2017 at 3:02 AM, Richard Biener
>> wrote:
>>> On Thu, Jul 13, 2017 at 6:18 AM, Andrew Pinski 
>>wrote:
 On Wed, Jul 12, 2017 at 9:10 PM, Marc Glisse 
>>wrote:
> On Wed, 12 Jul 2017, Andrew Pinski wrote:
>
>> Hi,
>>  Unlike most other expressions, BIT_INSERT_EXPR has an implicit
>> operand of the precision/size of the second operand.  This means
>>if we
>> have an integer constant for the second operand and that compares
>>to
>> the same constant value, vn_nary_op_eq would return that these two
>> expressions are the same.  But in the case I was looking into the
>> integer constants had different types, one with 1 bit precision
>>and
>> the other with 2 bit precision which means the BIT_INSERT_EXPR
>>were
>> not equal at all.
>>
>> This patches the problem by checking to see if BIT_INSERT_EXPR's
>> operand 1's (second operand) type  has different precision to
>>return
>> false.
>>
>> Is this the correct location or should we be checking for this
>> differently?  If this is the correct location, is the patch ok?
>> Bootstrapped and tested on aarch64-linux-gnu with no regressions
>>(and
>> also tested with a few extra patches to expose BIT_INSERT_EXPR).
>>
>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>> * tree-ssa-sccvn.c (vn_nary_op_eq): Check BIT_INSERT_EXPR's
>>operand 1
>> to see if the types precision matches.
>
>
> Hello,
>
> since BIT_INSERT_EXPR is implicitly an expression with 4 arguments,
>>it makes
> sense that we may need a few such special cases. But shouldn't the
>>hash
> function be in sync with the equality comparator? Does
>>operand_equal_p need
> the same?

 The hash function does not need to be exactly the same.  The only
 requirement there is if vn_nary_op_eq returns true then the hash has
 to be the same.  Now we could improve the hash by using the
>>precision
 which will allow us not to compare as much in some cases.

 Yes operand_equal_p needs the same handling; I did not notice that
 until you mention it..
 Right now it does:
 case BIT_INSERT_EXPR:
   return OP_SAME (0) && OP_SAME (1) && OP_SAME (2);
>>>
>>> Aww.  The issue is that operand_equal_p treats INTEGER_CSTs of
>>different
>>> type/precision but the same value as equal.
>>>
>>> Revisiting that, while a good idea, shouldn't block a fix here.  So
>>...
>>>
>>> Index: tree-ssa-sccvn.c
>>> ===
>>> --- tree-ssa-sccvn.c(revision 250159)
>>> +++ tree-ssa-sccvn.c(working copy)
>>> @@ -2636,6 +2636,14 @@ vn_nary_op_eq (const_vn_nary_op_t const
>>>  if (!expressions_equal_p (vno1->op[i], vno2->op[i]))
>>>return false;
>>>
>>> +  /* BIT_INSERT_EXPR has an implicit operand as the type precision
>>> + of op1.  Need to check to make sure they are the same.  */
>>> +  if (vno1->opcode == BIT_INSERT_EXPR)
>>> +if (INTEGRAL_TYPE_P (TREE_TYPE (vno1->op[0]))
>>> +   && TYPE_PRECISION (TREE_TYPE (vno1->op[1]))
>>> +   != TYPE_PRECISION (TREE_TYPE (vno2->op[1])))
>>> +  return false;
>>> +
>>>
>>> the case can be restricted to INTEGER_CST vno1->op[0] I think:
>>>
>>>   if (vno1->opcode == BIT_INSERT_EXPR
>>>   && TREE_CODE (vno1->op[0]) == INTEGER_CST
>>>   && TYPE_PRECISION (
>>>
>>> and yes, operand_equal_p needs a similar fix.  Can you re-post with
>>that added?
>>
>>Here is that with the changes you requested too.
>>
>>> Do you have a testcase?
>>
>>I don't have one which fails with the trunk.  With lowering of
>>bit-fields accesses (which I hope to submit soon; just getting in the
>>required patches first), many testcases fail (bootstrap fails for the
>>same reason too).
>>
>>OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
> In the fold-const.c hunk you need to verify arg1 op0 is an INTEGER_CST.  This 
> is not necessary in sccvn because we passed operand_equal_p first.
>
> OK with that change.

This is what I applied in the end. It is not op0 that needs to be
changed here but rather op1; I had a typo in the fold-const.c check.

Thanks,
Andrew

>
> Richard.
>
>>Thanks,
>>Andrew
>>
>>ChangeLog:
>>* tree-ssa-sccvn.c (vn_nary_op_eq): Check BIT_INSERT_EXPR's operand 1
>>to see if the types precision matches.
>>* fold-const.c (operand_equal_p): Likewise.
>>
>>
>>>
>>> Thanks,
>>> Richard.
>>>
 Thanks,
 Andrew Pinski

>
> --
> Marc Glisse
>
Index: fold-const.c
===
--- fold-const.c(revision 250430)
+++ fold-const.c(working copy)
@@ -3184,9 +3184,18 @@ 

Re: [PATCH][RFA/RFC] Stack clash mitigation patch 08/08 V2

2017-07-21 Thread Jeff Law
On 07/21/2017 07:23 AM, Andreas Krebbel wrote:
> Hi,
> 
> I've used your patch as the base and applied my changes on top.  The
> attached patch is the result, so it is supposed to replace your
> version.  It now also supports emitting a runtime loop.
Thanks a ton!  I'll roll your changes into the V3 patch.

> 
> It bootstraps fine but unfortunately I see an Ada regression which I
> haven't tracked down yet.
> 
>> FAIL: cb1010a
>> FAIL: gnat.dg/stack_check1.adb execution test
>> FAIL: gnat.dg/stack_check1.adb execution test
FWIW, due to time constraints I haven't been testing Ada.  An educated
guess is that this is related to the #define STACK_CHECK_STATIC_BUILTIN
which was never right for s390, but was useful temporarily.  I bet
pulling that out should fix the Ada regression.  One of the design goals
of this work is we're not supposed to affect Ada code generation at all.

There's one slight tweak I think we'll want to make as part of the V3
patch I'm about ready to post.  Specifically, I have introduced PARAMs to
control the size of the guard and the size of the probe interval.

The size of the guard affects if we're going to need static probes.  For
s390 I think that we just replace PROBE_INTERVAL with the PARAM_VALUE
for the guard size when we initialize last_probe_offset.

The remainder of PROBE_INTERVAL uses turn into the PARAM_VALUE for the
probe interval size.

I can cobble those together easily I think.

Thanks again!

Jeff


std::list optimizations

2017-07-21 Thread François Dumont

Hi

Here is a proposal for 2 optimizations in the std::list implementation.

First, an optimization of the move constructor taking an allocator, for 
always-equal allocators. Compared to the version in my previous std::list 
patch, I am now doing it at the std::list level rather than at the 
_List_base level. This way we won't instantiate the insert call and we 
won't check for an empty list when the allocators always compare equal.


2nd optimization: I replace the _S_distance method with the std::distance 
algorithm, which benefits from the nice [begin(), end()) range 
optimization when the cxx11 ABI is being used.


Note that I am proposing the 2 changes in 1 patch to save some 
review time, but I can commit them separately.


Tested under x86_64 Linux normal mode.


* include/bits/stl_list.h
(_List_base<>::_S_distance): Remove.
(_List_impl(_List_impl&&, _Node_alloc_type&&)): New.
(_List_base(_List_base&&, _Node_alloc_type&&)): Use the latter.
(_List_base(_Node_alloc_type&&)): New.
(_List_base<>::_M_distance, _List_base<>::_M_node_count): Move...
(list<>::_M_distance, list<>::_M_node_count): ...here. Replace calls to
_S_distance with calls to std::distance.
(list(list&&, const allocator_type&, true_type)): New.
(list(list&&, const allocator_type&, false_type)): New.
(list(list&&, const allocator_type&)): Adapt to call the latter two.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/stl_list.h b/libstdc++-v3/include/bits/stl_list.h
index cef94f7..aaff500 100644
--- a/libstdc++-v3/include/bits/stl_list.h
+++ b/libstdc++-v3/include/bits/stl_list.h
@@ -364,19 +364,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 	rebind<_List_node<_Tp> >::other _Node_alloc_type;
   typedef __gnu_cxx::__alloc_traits<_Node_alloc_type> _Node_alloc_traits;
 
-  static size_t
-  _S_distance(const __detail::_List_node_base* __first,
-		  const __detail::_List_node_base* __last)
-  {
-	size_t __n = 0;
-	while (__first != __last)
-	  {
-	__first = __first->_M_next;
-	++__n;
-	  }
-	return __n;
-  }
-
   struct _List_impl
   : public _Node_alloc_type
   {
@@ -393,6 +380,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 #if __cplusplus >= 201103L
 	_List_impl(_List_impl&&) = default;
 
+	_List_impl(_List_impl&& __x, _Node_alloc_type&& __a)
+	: _Node_alloc_type(std::move(__a)), _M_node(std::move(__x._M_node))
+	{ }
+
 	_List_impl(_Node_alloc_type&& __a) noexcept
 	: _Node_alloc_type(std::move(__a))
 	{ }
@@ -409,28 +400,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   void _M_inc_size(size_t __n) { _M_impl._M_node._M_size += __n; }
 
   void _M_dec_size(size_t __n) { _M_impl._M_node._M_size -= __n; }
-
-  size_t
-  _M_distance(const __detail::_List_node_base* __first,
-		  const __detail::_List_node_base* __last) const
-  { return _S_distance(__first, __last); }
-
-  // return the stored size
-  size_t _M_node_count() const { return _M_get_size(); }
 #else
   // dummy implementations used when the size is not stored
   size_t _M_get_size() const { return 0; }
   void _M_set_size(size_t) { }
   void _M_inc_size(size_t) { }
   void _M_dec_size(size_t) { }
-  size_t _M_distance(const void*, const void*) const { return 0; }
-
-  // count the number of nodes
-  size_t _M_node_count() const
-  {
-	return _S_distance(_M_impl._M_node._M_next,
-			   std::__addressof(_M_impl._M_node));
-  }
 #endif
 
   typename _Node_alloc_traits::pointer
@@ -466,12 +441,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   _List_base(_List_base&&) = default;
 
   _List_base(_List_base&& __x, _Node_alloc_type&& __a)
+  : _M_impl(std::move(__x), std::move(__a))
+  { }
+
+  _List_base(_Node_alloc_type&& __a)
   : _M_impl(std::move(__a))
-  {
-	if (__x._M_get_Node_allocator() == _M_get_Node_allocator())
-	  _M_move_nodes(std::move(__x));
-	// else caller must move individual elements.
-  }
+  { }
 
   void
   _M_move_nodes(_List_base&& __x)
@@ -616,6 +591,25 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 	}
 #endif
 
+#if _GLIBCXX_USE_CXX11_ABI
+  size_t
+  _M_distance(const_iterator __first, const_iterator __last) const
+  { return std::distance(__first, __last); }
+
+  // return the stored size
+  size_t
+  _M_node_count() const
+  { return this->_M_get_size(); }
+#else
+  // dummy implementations used when the size is not stored
+  size_t _M_distance(const_iterator, const_iterator) const { return 0; }
+
+  // count the number of nodes
+  size_t
+  _M_node_count() const
+  { return std::distance(begin(), end()); }
+#endif
+
 public:
   // [23.2.2.1] construct/copy/destroy
   // (assign() and get_allocator() are also listed in this section)
@@ -718,15 +712,27 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   : _Base(_Node_alloc_type(__a))
   { _M_initialize_dispatch(__x.begin(), __x.end(), __false_type()); }
 
-  list(list&& __x, const allocator_type& __a)
-  
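
As background for the first optimization, a minimal use of std::list's allocator-extended move constructor — the standard constructor whose implementation the patch streamlines for always-equal allocators:

```cpp
#include <cassert>
#include <list>
#include <memory>

// std::allocator<int> is an always-equal allocator, so under the
// proposed optimization the allocator-extended move constructor can
// steal the source's nodes wholesale instead of moving element by
// element.  The constructor itself is standard std::list API.
std::list<int> extended_move()
{
  std::list<int> src{1, 2, 3};
  std::list<int> dst(std::move(src), std::allocator<int>());
  return dst;
}
```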

Re: [PATCH] New C++ warning option '-Wduplicated-access-specifiers'

2017-07-21 Thread David Malcolm
On Fri, 2017-07-21 at 11:56 -0400, David Malcolm wrote:
> On Thu, 2017-07-20 at 18:35 +0200, Volker Reichelt wrote:
> > Hi,
> > 
> > the following patch introduces a new C++ warning option
> > -Wduplicated-access-specifiers that warns about redundant
> > access-specifiers in classes, e.g.
> > 
> >   class B
> >   {
> > public:
> >   B();
> > 
> > private:
> >   void foo();
> > private:
> >   int i;
> >   };
> > 
> > test.cc:8:5: warning: duplicate 'private' access-specifier [
> > -Wduplicated-access-specifiers]
> >  private:
> >  ^~~
> >  ---
> > test.cc:6:5: note: access-specifier was previously given here
> >  private:
> >  ^~~
> 
> Thanks for working on this.
> 
> I'm not able to formally review, but you did CC me; various comments
> below throughout.
> 
> > The test is implemented by tracking the location of the last
> > access-specifier together with the access-specifier itself.
> > The location serves two purposes: the obvious one is to be able to
> > print the location in the note that comes with the warning.
> > The second purpose is to be able to distinguish an access-specifier
> > given by the user from the default of the class type (i.e. public
> > for
> > 'struct' and private for 'class') where the location is set to
> > UNKNOWN_LOCATION. The warning is only issued if the user gives the
> > access-specifier twice, i.e. if the previous location is not
> > UNKNOWN_LOCATION.
> 
> Presumably given
> 
> struct foo
> {
> public:
>   /* ... */
> };
> 
> we could issue something like:
> 
>   warning: access-specifier 'public' is redundant within 'struct'
> 
> for the first; similarly for:
> 
> class bar
> {
> private:
> };
> 
> we could issue:
> 
>   warning: access-specifier 'private' is redundant within 'class'
> 
> 
> > One could actually make this a two-level warning so that on the
> > higher level also the default class-type settings are taken into
> > account. Would that be helpful? I could prepare a second patch for
> > that.
> 
> I'm not sure how best to structure it.

Maybe combine the two ideas, and call it
  -Wredundant-access-specifiers
?

Or just "-Waccess-specifiers" ?

[...snip...]

Dave


Re: [PATCH][AArch64] vec_pack_trunc_ should split after register allocator

2017-07-21 Thread James Greenhalgh
On Thu, Apr 27, 2017 at 05:08:38AM +, Hurugalawadi, Naveen wrote:
> Hi,
> 
> The instruction "vec_pack_trunc_" should be split after register
> allocator for scheduling reasons.  Currently the instruction is marked as type
> multiple, which means it will be scheduled as single-issue.  However, nothing can
> be scheduled with either xtn/xtn2, which is a problem in some cases.

What's the reason for splitting this only after reload? I think we can
split this whenever we like, and that there isn't any benefit in keeping
the pair together?

Am I missing something?

Thanks,
James

> 
> The patch splits the instruction and fixes the issue.
> 
> Please review the patch and let me know if its okay.
> Bootstrapped and Regression tested on aarch64-thunder-linux.
> 
> 2017-04-27  Naveen H.S  
> 
>   * config/aarch64/aarch64-simd.md
>   (aarch64_simd_vec_pack_trunc_hi_): New pattern.
>   (vec_pack_trunc_): Split the instruction pattern.



Re: C PATCH to fix bogus warning with -Wmultistatement-macros (PR c/81364)

2017-07-21 Thread David Malcolm
On Fri, 2017-07-14 at 15:38 +0200, Marek Polacek wrote:
> I think David might be able to approve this one, so CCing.

It's something of a stretch for my "diagnostic messages" maintainer
hat, but I've looked over these changes and they look good to me.

Dave

> On Tue, Jul 11, 2017 at 03:23:16PM +0200, Marek Polacek wrote:
> > This patch fixes a bogus -Wmultistatement-macros warning.  The code
> > didn't
> > notice that what came after a guard such as else was actually
> > wrapped in { }
> > which is a correct use.  This bogus warning only triggered when the
> > body of
> > a conditional was coming from a different expansion than the
> > conditional
> > itself.
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> > 
> > 2017-07-11  Marek Polacek  
> > 
> > PR c/81364
> > * c-parser.c (c_parser_else_body): Don't warn about
> > multistatement
> > macro expansion if the body is in { }.
> > (c_parser_while_statement): Likewise.
> > (c_parser_for_statement): Likewise.
> > 
> > * Wmultistatement-macros-12.c: New test.
> > 
> > diff --git gcc/c/c-parser.c gcc/c/c-parser.c
> > index f8fbc92..7524a73 100644
> > --- gcc/c/c-parser.c
> > +++ gcc/c/c-parser.c
> > @@ -5557,7 +5557,8 @@ c_parser_else_body (c_parser *parser, const
> > token_indent_info _tinfo,
> >  }
> >else
> >  {
> > -  body_loc_after_labels = c_parser_peek_token (parser)
> > ->location;
> > +  if (!c_parser_next_token_is (parser, CPP_OPEN_BRACE))
> > +   body_loc_after_labels = c_parser_peek_token (parser)
> > ->location;
> >c_parser_statement_after_labels (parser, NULL, chain);
> >  }
> >  
> > @@ -5811,6 +5812,7 @@ c_parser_while_statement (c_parser *parser,
> > bool ivdep, bool *if_p)
> >  = get_token_indent_info (c_parser_peek_token (parser));
> >  
> >location_t loc_after_labels;
> > +  bool open_brace = c_parser_next_token_is (parser,
> > CPP_OPEN_BRACE);
> >body = c_parser_c99_block_statement (parser, if_p,
> > _after_labels);
> >c_finish_loop (loc, cond, NULL, body, c_break_label,
> > c_cont_label, true);
> >add_stmt (c_end_compound_stmt (loc, block, flag_isoc99));
> > @@ -5820,7 +5822,7 @@ c_parser_while_statement (c_parser *parser,
> > bool ivdep, bool *if_p)
> >  = get_token_indent_info (c_parser_peek_token (parser));
> >warn_for_misleading_indentation (while_tinfo, body_tinfo,
> > next_tinfo);
> >  
> > -  if (next_tinfo.type != CPP_SEMICOLON)
> > +  if (next_tinfo.type != CPP_SEMICOLON && !open_brace)
> >  warn_for_multistatement_macros (loc_after_labels,
> > next_tinfo.location,
> > while_tinfo.location,
> > RID_WHILE);
> >  
> > @@ -6109,6 +6111,7 @@ c_parser_for_statement (c_parser *parser,
> > bool ivdep, bool *if_p)
> >  = get_token_indent_info (c_parser_peek_token (parser));
> >  
> >location_t loc_after_labels;
> > +  bool open_brace = c_parser_next_token_is (parser,
> > CPP_OPEN_BRACE);
> >body = c_parser_c99_block_statement (parser, if_p,
> > _after_labels);
> >  
> >if (is_foreach_statement)
> > @@ -6122,7 +6125,7 @@ c_parser_for_statement (c_parser *parser,
> > bool ivdep, bool *if_p)
> >  = get_token_indent_info (c_parser_peek_token (parser));
> >warn_for_misleading_indentation (for_tinfo, body_tinfo,
> > next_tinfo);
> >  
> > -  if (next_tinfo.type != CPP_SEMICOLON)
> > +  if (next_tinfo.type != CPP_SEMICOLON && !open_brace)
> >  warn_for_multistatement_macros (loc_after_labels, next_tinfo.location,
> > for_tinfo.location, RID_FOR);
> >  
> > diff --git gcc/testsuite/c-c++-common/Wmultistatement-macros-12.c
> > gcc/testsuite/c-c++-common/Wmultistatement-macros-12.c
> > index e69de29..ac8915c 100644
> > --- gcc/testsuite/c-c++-common/Wmultistatement-macros-12.c
> > +++ gcc/testsuite/c-c++-common/Wmultistatement-macros-12.c
> > @@ -0,0 +1,43 @@
> > +/* PR c/81364 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-Wmultistatement-macros" } */
> > +
> > +#define FOO0 if (1) { } else
> > +#define TST0 \
> > +void bar0 (void) \
> > +{ \
> > +  FOO0 { } /* { dg-bogus "macro expands to multiple statements" } */ \
> > +}
> > +TST0
> > +
> > +#define FOO1 for (;;)
> > +#define TST1 \
> > +void bar1 (void) \
> > +{ \
> > +  FOO1 { } /* { dg-bogus "macro expands to multiple statements" } */ \
> > +}
> > +TST1
> > +
> > +#define FOO2 while (1)
> > +#define TST2 \
> > +void bar2 (void) \
> > +{ \
> > +  FOO2 { } /* { dg-bogus "macro expands to multiple statements" } */ \
> > +}
> > +TST2
> > +
> > +#define FOO3 switch (1)
> > +#define TST3 \
> > +void bar3 (void) \
> > +{ \
> > +  FOO3 { } /* { dg-bogus "macro expands to multiple statements" } */ \
> > +}
> > +TST3
> > +
> > +#define FOO4 if (1)
> > +#define TST4 \
> > +void bar4 (void) \
> > +{ \
> > +  FOO4 { } /* { dg-bogus "macro expands to multiple statements" } */ \
> > +}
> > +TST4
> > 
> > Marek
> 
>   Marek


[PATCH 2/3] Use BUILD_PATH_PREFIX_MAP envvar to transform __FILE__

2017-07-21 Thread Ximin Luo
Use the BUILD_PATH_PREFIX_MAP environment variable when expanding the __FILE__
macro, in the same way that debug-prefix-map works for debugging symbol paths.

This patch follows similar lines to the earlier patch for SOURCE_DATE_EPOCH.
Specifically, we read the environment variable not in libcpp but via a hook
which has an implementation defined in gcc/c-family.  However, to achieve this
is more complex than the earlier patch: we need to share the prefix_map data
structure and associated functions between libcpp and c-family.  Therefore, we
need to move these to libiberty.  (For comparison, the SOURCE_DATE_EPOCH patch
did not need this because time_t et al. are in the standard C library.)

Acknowledgements


Dhole  who wrote the earlier patch for SOURCE_DATE_EPOCH
which saved me a lot of time on figuring out what to edit.

ChangeLogs
----------

gcc/ChangeLog:

2017-07-21  Ximin Luo  

* doc/invoke.texi (Environment Variables): Document form and behaviour
of BUILD_PATH_PREFIX_MAP.

gcc/c-family/ChangeLog:

2017-07-21  Ximin Luo  

* c-common.c (cb_get_build_path_prefix_map): Define new call target.
* c-common.h (cb_get_build_path_prefix_map): Declare call target.
* c-lex.c (init_c_lex): Set the get_build_path_prefix_map callback.

libcpp/ChangeLog:

2017-07-21  Ximin Luo  

* include/cpplib.h (cpp_callbacks): Add get_build_path_prefix_map
callback.
* init.c (cpp_create_reader): Initialise build_path_prefix_map field.
* internal.h (cpp_reader): Add new field build_path_prefix_map.
* macro.c (_cpp_builtin_macro_text): Set the build_path_prefix_map
field if unset and apply it when expanding __FILE__ macros.

gcc/testsuite/ChangeLog:

2017-07-21  Ximin Luo  

* gcc.dg/cpp/build_path_prefix_map-1.c: New test.
* gcc.dg/cpp/build_path_prefix_map-2.c: New test.

Index: gcc-8-20170716/gcc/doc/invoke.texi
===
--- gcc-8-20170716.orig/gcc/doc/invoke.texi
+++ gcc-8-20170716/gcc/doc/invoke.texi
@@ -27197,6 +27197,26 @@ Recognize EUCJP characters.
 If @env{LANG} is not defined, or if it has some other value, then the
 compiler uses @code{mblen} and @code{mbtowc} as defined by the default locale to
 recognize and translate multibyte characters.
+
+@item BUILD_PATH_PREFIX_MAP
+@findex BUILD_PATH_PREFIX_MAP
+If this variable is set, it specifies an ordered map used to transform
+filepaths output in debugging symbols and expansions of the @code{__FILE__}
+macro.  This may be used to achieve fully reproducible output.  In the context
+of running GCC within a higher-level build tool, it is typically more reliable
+than setting command line arguments such as @option{-fdebug-prefix-map} or
+common environment variables such as @env{CFLAGS}, since the build tool may
+save these latter values into other output outside of GCC's control.
+
+The value is of the form
+@samp{@var{dst@r{[0]}}=@var{src@r{[0]}}:@var{dst@r{[1]}}=@var{src@r{[1]}}@r{@dots{}}}.
+If any @var{dst@r{[}i@r{]}} or @var{src@r{[}i@r{]}} contains @code{%}, @code{=}
+or @code{:} characters, they must be replaced with @code{%#}, @code{%+}, and
+@code{%.} respectively.
+
+Whenever GCC emits a filepath that starts with a whole path component matching
+@var{src@r{[}i@r{]}} for some @var{i}, with rightmost @var{i} taking priority,
+the matching part is replaced with @var{dst@r{[}i@r{]}} in the final output.
 @end table
 
 @noindent
Index: gcc-8-20170716/gcc/c-family/c-common.c
===
--- gcc-8-20170716.orig/gcc/c-family/c-common.c
+++ gcc-8-20170716/gcc/c-family/c-common.c
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.
 
 #include "config.h"
 #include "system.h"
+#include "prefix-map.h"
 #include "coretypes.h"
 #include "target.h"
 #include "function.h"
@@ -7905,6 +7906,25 @@ cb_get_source_date_epoch (cpp_reader *pf
   return (time_t) epoch;
 }
 
+/* Read BUILD_PATH_PREFIX_MAP from environment to have deterministic relative
+   paths to replace embedded absolute paths to get reproducible results.
+   Returns NULL if BUILD_PATH_PREFIX_MAP is badly formed.  */
+
+prefix_map **
+cb_get_build_path_prefix_map (cpp_reader *pfile ATTRIBUTE_UNUSED)
+{
+  prefix_map **map = XCNEW (prefix_map *);
+
+  const char *arg = getenv ("BUILD_PATH_PREFIX_MAP");
+  if (!arg || prefix_map_parse (map, arg))
+    return map;
+
+  free (map);
+  error_at (input_location, "environment variable BUILD_PATH_PREFIX_MAP is "
+   "not well formed; see the GCC documentation for more details.");
+  return NULL;
+}
+
 /* Callback for libcpp for offering spelling suggestions for misspelled
directives.  GOAL is an unrecognized string; CANDIDATES is a
NULL-terminated array of candidate strings.  Return the closest
Index: 

[PATCH 1/3] Use BUILD_PATH_PREFIX_MAP envvar for debug-prefix-map

2017-07-21 Thread Ximin Luo
Define the BUILD_PATH_PREFIX_MAP environment variable, and treat it as implicit
-fdebug-prefix-map CLI options specified before any explicit such options.

Much of the generic code for applying and parsing prefix-maps is implemented in
libiberty instead of the dwarf2 parts of the code, in order to make subsequent
patches unrelated to debuginfo easier.

Acknowledgements


Daniel Kahn Gillmor who wrote the patch for r231835, which saved me a lot of
time figuring out what to edit.

HW42 for discussion on the details of the proposal, and for suggesting that we
retain the ability to map the prefix to something other than ".".

Other contributors to the BUILD_PATH_PREFIX_MAP specification, see
https://reproducible-builds.org/specs/build-path-prefix-map/

ChangeLogs
----------

include/ChangeLog:

2017-07-21  Ximin Luo  

* prefix-map.h: New file implementing the BUILD_PATH_PREFIX_MAP
specification; includes code from /gcc/final.c and code adapted from
examples attached to the specification.

libiberty/ChangeLog:

2017-07-21  Ximin Luo  

* prefix-map.c: New file implementing the BUILD_PATH_PREFIX_MAP
specification; includes code from /gcc/final.c and code adapted from
examples attached to the specification.
* Makefile.in: Update for new files.

gcc/ChangeLog:

2017-07-21  Ximin Luo  

* debug.h: Declare add_debug_prefix_map_from_envvar.
* final.c: Define add_debug_prefix_map_from_envvar, and refactor
prefix-map utilities to use equivalent code from libiberty instead.
* opts-global.c: (handle_common_deferred_options): Call
add_debug_prefix_map_from_envvar before processing options.

gcc/testsuite/ChangeLog:

2017-07-21  Ximin Luo  

* lib/gcc-dg.exp: Allow dg-set-compiler-env-var to take only one
argument in which case it unsets the given env var.
* gcc.dg/debug/dwarf2/build_path_prefix_map-1.c: New test.
* gcc.dg/debug/dwarf2/build_path_prefix_map-2.c: New test.

Index: gcc-8-20170716/include/prefix-map.h
===
--- /dev/null
+++ gcc-8-20170716/include/prefix-map.h
@@ -0,0 +1,94 @@
+/* Declarations for manipulating filename prefixes.
+   Written 2017 by Ximin Luo 
+   This code is in the public domain. */
+
+#ifndef _PREFIX_MAP_H
+#define _PREFIX_MAP_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#ifdef HAVE_STDLIB_H
+#include 
+#endif
+
+/* Linked-list of mappings from old prefixes to new prefixes.  */
+
+struct prefix_map
+{
+  const char *old_prefix;
+  const char *new_prefix;
+  size_t old_len;
+  size_t new_len;
+  struct prefix_map *next;
+};
+
+
+/* Find a mapping suitable for the given OLD_NAME in the linked list MAP.
+
+   If a mapping is found, writes a pointer to the non-matching suffix part of
+   OLD_NAME in SUFFIX, and its length in SUF_LEN.
+
+   Returns NULL if there was no suitable mapping.  */
+struct prefix_map *
+prefix_map_find (struct prefix_map *map, const char *old_name,
+const char **suffix, size_t *suf_len);
+
+/* Prepend a prefix map before a given SUFFIX.
+
+   The remapped name is written to NEW_NAME and returned as a const pointer. No
+   allocations are performed; the caller must ensure it can hold at least
+   MAP->NEW_LEN + SUF_LEN + 1 characters.  */
+const char *
+prefix_map_prepend (struct prefix_map *map, char *new_name,
+   const char *suffix, size_t suf_len);
+
+/* Remap a filename.
+
+   Returns OLD_NAME unchanged if there was no remapping, otherwise returns a
+   pointer to newly-allocated memory for the remapped filename.  The memory is
+   allocated by the given ALLOC function, which also determines who is
+   responsible for freeing it.  */
+#define prefix_map_remap_alloc_(map_head, old_name, alloc)                   \
+  __extension__                                                              \
+  ({                                                                         \
+    const char *__suffix;                                                    \
+    size_t __suf_len;                                                        \
+    struct prefix_map *__map;                                                \
+    (__map = prefix_map_find ((map_head), (old_name), &__suffix, &__suf_len))\
+      ? prefix_map_prepend (__map,                                           \
+                            (char *) alloc (__map->new_len + __suf_len + 1), \
+                            __suffix, __suf_len)                             \
+      : (old_name);                                                          \
+  })
+
+/* Remap a filename.
+
+   Returns OLD_NAME unchanged if there was no remapping, otherwise returns a
+   stack-allocated pointer to the newly-remapped filename.  */
+#define 

[PATCH 3/3] When remapping paths, only match whole path components

2017-07-21 Thread Ximin Luo
Change the remapping algorithm so that each old_prefix only matches paths that
have old_prefix as a whole path component prefix.  (A whole path component is a
part of a path that begins and ends at a directory separator or at either end
of the path string.)

This remapping algorithm is more predictable than the old algorithm, because
there is no chance of mappings for one directory interfering with mappings for
other directories.  It has fewer corner cases and is therefore easier
for users to figure out how to set the mapping appropriately.  Therefore, I
believe it is better as a standardised algorithm that other build tools might
like to adopt, and so in our BUILD_PATH_PREFIX_MAP specification we recommend
this algorithm - though we allow others, and explicitly mention GCC's current
algorithm.  But it would be good for GCC to adopt this newer and cleaner one.

(The original idea came from discussions with rustc developers on this topic.)

This does technically break backwards-compatibility, but I was under the
impression that this option was not seen as so critical a feature that this
would be a problem.  Nevertheless, this part is totally independent from
the other patches and may be included or excluded as GCC maintainers desire.

Acknowledgements


Discussions with Michael Woerister and other members of the Rust compiler team
on Github, and discussions with Daniel Shahaf on the rb-general@ mailing list
on lists.reproducible-builds.org.

ChangeLogs
----------

libiberty/ChangeLog:

2017-07-21  Ximin Luo  

* prefix-map.c: When remapping paths, only match whole path components.

Index: gcc-8-20170716/libiberty/prefix-map.c
===
--- gcc-8-20170716.orig/libiberty/prefix-map.c
+++ gcc-8-20170716/libiberty/prefix-map.c
@@ -87,12 +87,22 @@ struct prefix_map *
 prefix_map_find (struct prefix_map *map, const char *old_name,
 const char **suffix, size_t *suf_len)
 {
+  size_t len;
+
   for (; map; map = map->next)
-if (filename_ncmp (old_name, map->old_prefix, map->old_len) == 0)
-  {
-   *suf_len = strlen (*suffix = old_name + map->old_len);
-   break;
-  }
+{
+  len = map->old_len;
+  /* Ignore trailing path separators at the end of old_prefix */
+  while (len > 0 && IS_DIR_SEPARATOR (map->old_prefix[len-1])) len--;
+  /* Check if old_name matches old_prefix at a path component boundary */
+  if (! filename_ncmp (old_name, map->old_prefix, len)
+ && (IS_DIR_SEPARATOR (old_name[len])
+ || old_name[len] == '\0'))
+   {
+ *suf_len = strlen (*suffix = old_name + len);
+ break;
+   }
+}
 
   return map;
 }


[PING^4][PATCH v2] Generate reproducible output independently of the build-path

2017-07-21 Thread Ximin Luo
(Please keep me on CC, I am not subscribed)


Proposal


This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. When
this is set, GCC will treat this as extra implicit "-fdebug-prefix-map=$value"
command-line arguments that precede any explicit ones. This makes the final
binary output reproducible, and also hides the unreproducible value (the source
path prefixes) from CFLAGS et al., which many build tools (understandably)
embed as-is into their build output.

This environment variable also acts on the __FILE__ macro, mapping it in the
same way that debug-prefix-map works for debug symbols. We have seen that
__FILE__ is also a very large source of unreproducibility, and is represented
quite heavily in the 3k+ figure given earlier.

Finally, we tweak the mapping algorithm so that it applies only to whole path
components when matching prefixes. This is justified in further detail in the
patch header. It is an optional part of the patch series and could be dropped
if the GCC maintainers are not convinced by our arguments there.


Background
==========

We have prepared a document that describes how this works in detail, so that
projects can be confident that they are interoperable:

https://reproducible-builds.org/specs/build-path-prefix-map/

The specification is currently in DRAFT status, awaiting some final feedback,
including what the GCC maintainers think about it.

We have written up some more detailed discussions on the topic, including a
thorough justification on why we chose the mechanism of environment variables:

https://wiki.debian.org/ReproducibleBuilds/StandardEnvironmentVariables

The previous iteration of the patch series, essentially the same as the current
re-submission, is here:

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00513.html

An older version, that explains some GCC-specific background, is here:

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00182.html

The current patch series applies cleanly to GCC-8 snapshot 20170716.


Reproducibility testing
=======================

Over the past 3 months, we have tested this patch backported to Debian GCC-6.
Together with a patched dpkg that sets the environment variable appropriately,
it allows us to reproduce ~1800 extra packages.

This is about 6.8% of ~26400 Debian source packages, and just over 1/2 of the
ones whose irreproducibility is due to build-path issues.

https://tests.reproducible-builds.org/debian/issues/unstable/gcc_captures_build_path_issue.html
https://tests.reproducible-builds.org/debian/unstable/index_suite_amd64_stats.html

The first major increase around 2017-04 is due to us deploying this patch. The
next major increase later in 2017-04 is unrelated, due to us deploying a patch
for R. The dip during the last part of 2017-06 is due to unpatched and patched
packages getting out-of-sync partly because of extra admin work around the
Debian stretch release, and we believe that the green will soon return to its
previous high after this situation settles.


Unit testing


I've tested these patches on a Debian unstable x86_64-linux-gnu schroot running
inside a Debian jessie system, on a full-bootstrap build. The output of
contrib/compare_tests is as follows:


gcc-8-20170716$ contrib/compare_tests ../gcc-build-{0,1}
# Comparing directories
## Dir1=../gcc-build-0: 8 sum files
## Dir2=../gcc-build-1: 8 sum files

# Comparing 8 common sum files
## /bin/sh contrib/compare_tests  /tmp/gxx-sum1.13468 /tmp/gxx-sum2.13468
New tests that PASS:

gcc.dg/cpp/build_path_prefix_map-1.c (test for excess errors)
gcc.dg/cpp/build_path_prefix_map-1.c execution test
gcc.dg/cpp/build_path_prefix_map-2.c (test for excess errors)
gcc.dg/cpp/build_path_prefix_map-2.c execution test
gcc.dg/debug/dwarf2/build_path_prefix_map-1.c (test for excess errors)
gcc.dg/debug/dwarf2/build_path_prefix_map-1.c scan-assembler DW_AT_comp_dir: "DWARF2TEST/gcc
gcc.dg/debug/dwarf2/build_path_prefix_map-2.c (test for excess errors)
gcc.dg/debug/dwarf2/build_path_prefix_map-2.c scan-assembler DW_AT_comp_dir: "/

# No differences found in 8 common sum files


I can also provide the full logs on request.


Fuzzing
=======

I've also fuzzed the prefix-map code using AFL with ASAN enabled. Due to how
AFL works I did not fuzz this patch directly but a smaller program with just
the parser and remapper, available here:

https://anonscm.debian.org/cgit/reproducible/build-path-prefix-map-spec.git/tree/consume

Over the course of about ~4k cycles, no crashes were found.

To reproduce, you could run something like:

$ echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
$ make CC=afl-gcc clean reset-fuzz-pecsplit.c fuzz-pecsplit.c


Copyright disclaimer


I've signed a copyright disclaimer and the FSF has this on record. (RT #1209764)



Re: [PATCH] Add AddressSanitizer annotations to std::vector

2017-07-21 Thread Jonathan Wakely

On 05/07/17 21:24 +0100, Jonathan Wakely wrote:

On 05/07/17 20:44 +0100, Yuri Gribov wrote:

On Wed, Jul 5, 2017 at 8:00 PM, Jonathan Wakely  wrote:

This patch adds AddressSanitizer annotations to std::vector, so that
ASan can detect out-of-bounds accesses to the unused capacity of a
vector. e.g.

std::vector<int> v(2);
int* p = v.data();
v.pop_back();
return p[1];  // ERROR

This cannot be detected by Debug Mode, but with these annotations ASan
knows that only v.data()[0] is valid and will give an error.

The annotations are only enabled for std::vector<T, std::allocator<T>> and
only when std::allocator's base class is either malloc_allocator or
new_allocator. For other allocators the memory might not come from the
freestore and so isn't tracked by ASan.


One important issue with enabling this by default is that it may
(will?) break separate sanitization (which is an extremely important
feature in practice). If one part of an application is sanitized but the
other isn't, and some poor std::vector is push_back'ed in the latter and
then accessed in the former, we'll get a false positive because push_back
wouldn't properly annotate memory.


Good point.


Perhaps hide this under a compilation flag (disabled by default)?


If you define _GLIBCXX_SANITIZE_STD_ALLOCATOR to 0 the annotations are
disabled. To make them disabled by default would need some changes, to
use separate macros for "the std::allocator base class can be
sanitized" and "the user wants std::vector to be sanitized".

I'll do that before committing.



Here's what I've committed.  std::vector operations are not annotated
unless _GLIBCXX_SANITIZE_VECTOR is defined.


Tested powerpc64le-linux and x86_64-linux, committed to trunk.


commit 4212dcf491c1187be79c9c905d57eaf4bc17fece
Author: Jonathan Wakely 
Date:   Wed Jul 5 13:25:21 2017 +0100

Add AddressSanitizer annotations to std::vector

	* config/allocator/malloc_allocator_base.h [__SANITIZE_ADDRESS__]
	(_GLIBCXX_SANITIZE_STD_ALLOCATOR): Define.
	* config/allocator/new_allocator_base.h [__SANITIZE_ADDRESS__]
	(_GLIBCXX_SANITIZE_STD_ALLOCATOR): Define.
	* doc/xml/manual/using.xml (_GLIBCXX_SANITIZE_VECTOR): Document macro.
	* include/bits/stl_vector.h [_GLIBCXX_SANITIZE_VECTOR]
	(_Vector_impl::_Asan, _Vector_impl::_Asan::_Reinit)
	(_Vector_impl::_Asan::_Grow, _GLIBCXX_ASAN_ANNOTATE_REINIT)
	(_GLIBCXX_ASAN_ANNOTATE_GROW, _GLIBCXX_ASAN_ANNOTATE_GREW)
	(_GLIBCXX_ASAN_ANNOTATE_SHRINK, _GLIBCXX_ASAN_ANNOTATE_BEFORE_DEALLOC):
	Define annotation helper types and macros.
	(vector::~vector, vector::push_back, vector::pop_back)
	(vector::_M_erase_at_end): Add annotations.
	* include/bits/vector.tcc (vector::reserve, vector::emplace_back)
	(vector::insert, vector::_M_erase, vector::operator=)
	(vector::_M_fill_assign, vector::_M_assign_aux)
	(vector::_M_insert_rval, vector::_M_emplace_aux)
	(vector::_M_insert_aux, vector::_M_realloc_insert)
	(vector::_M_fill_insert, vector::_M_default_append)
	(vector::_M_shrink_to_fit, vector::_M_range_insert): Annotate.

diff --git a/libstdc++-v3/config/allocator/malloc_allocator_base.h b/libstdc++-v3/config/allocator/malloc_allocator_base.h
index b091bbc..54e0837 100644
--- a/libstdc++-v3/config/allocator/malloc_allocator_base.h
+++ b/libstdc++-v3/config/allocator/malloc_allocator_base.h
@@ -52,4 +52,8 @@ namespace std
 # define __allocator_base  __gnu_cxx::malloc_allocator
 #endif
 
+#if defined(__SANITIZE_ADDRESS__) && !defined(_GLIBCXX_SANITIZE_STD_ALLOCATOR)
+# define _GLIBCXX_SANITIZE_STD_ALLOCATOR 1
+#endif
+
 #endif
diff --git a/libstdc++-v3/config/allocator/new_allocator_base.h b/libstdc++-v3/config/allocator/new_allocator_base.h
index 3d2bb67..e776ed3 100644
--- a/libstdc++-v3/config/allocator/new_allocator_base.h
+++ b/libstdc++-v3/config/allocator/new_allocator_base.h
@@ -52,4 +52,8 @@ namespace std
 # define __allocator_base  __gnu_cxx::new_allocator
 #endif
 
+#if defined(__SANITIZE_ADDRESS__) && !defined(_GLIBCXX_SANITIZE_STD_ALLOCATOR)
+# define _GLIBCXX_SANITIZE_STD_ALLOCATOR 1
+#endif
+
 #endif
diff --git a/libstdc++-v3/doc/xml/manual/using.xml b/libstdc++-v3/doc/xml/manual/using.xml
index 5c0e1b9..6ce29fd 100644
--- a/libstdc++-v3/doc/xml/manual/using.xml
+++ b/libstdc++-v3/doc/xml/manual/using.xml
@@ -991,6 +991,24 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 hello.cc -o test.exe
 
 
 
+_GLIBCXX_SANITIZE_VECTOR
+
+  
+	Undefined by default. When defined, std::vector
+operations will be annotated so that AddressSanitizer can detect
+invalid accesses to the unused capacity of a
+std::vector. These annotations are only
+enabled for
+std::vector<T, std::allocator<T>>
+and only when std::allocator is derived from
+new_allocator
+or malloc_allocator. The annotations
+must be present on all vector operations or none, so this macro must
+   

Re: Deprecate DBX/stabs?

2017-07-21 Thread Jim Wilson
On Fri, Jul 21, 2017 at 7:15 AM, David Edelsohn  wrote:
> AIX still uses DBX as the primary debugging format.  AIX supports
> DWARF but the AIX toolchain does not fully interoperate with DWARF
> generated by GCC.

We could still deprecate DBX_DEBUG while leaving XCOFF_DEBUG alone for
now.  This would encourage people to migrate to DWARF2.  We won't be
able to drop dbxout.c until both DBX_DEBUG and XCOFF_DEBUG are dropped
which could be a while, but we can perhaps avoid any new users of
stabs.

I see that avr-*, *-lynx, pre-darwin9 32-bit i686-darwin, *-openbsd,
pdp11-*, vax-*, and cygwin/mingw32 with obsolete assemblers still
default to DBX_DEBUG.  Some of those can be dropped, and the others
can migrate to dwarf.

There is also the matter of SDB_DEBUG which is still supported, and is
no longer used by anyone, as we have already deprecated all of the
COFF targets that were using it.  SDB support for C++ is even worse
than the DBX support.  This should be no problem to deprecate and
remove.  We could perhaps even just drop it without bothering to
deprecate it first.

Jim


Re: [PATCH] New C++ warning option '-Wduplicated-access-specifiers'

2017-07-21 Thread David Malcolm
On Thu, 2017-07-20 at 18:35 +0200, Volker Reichelt wrote:
> Hi,
> 
> the following patch introduces a new C++ warning option
> -Wduplicated-access-specifiers that warns about redundant
> access-specifiers in classes, e.g.
> 
>   class B
>   {
> public:
>   B();
> 
> private:
>   void foo();
> private:
>   int i;
>   };
> 
> test.cc:8:5: warning: duplicate 'private' access-specifier [
> -Wduplicated-access-specifiers]
>  private:
>  ^~~
>  ---
> test.cc:6:5: note: access-specifier was previously given here
>  private:
>  ^~~

Thanks for working on this.

I'm not able to formally review, but you did CC me; various comments below 
throughout.

> The test is implemented by tracking the location of the last
> access-specifier together with the access-specifier itself.
> The location serves two purposes: the obvious one is to be able to
> print the location in the note that comes with the warning.
> The second purpose is to be able to distinguish an access-specifier
> given by the user from the default of the class type (i.e. public for
> 'struct' and private for 'class') where the location is set to
> UNKNOWN_LOCATION. The warning is only issued if the user gives the
> access-specifier twice, i.e. if the previous location is not
> UNKNOWN_LOCATION.

Presumably given

struct foo
{
public:
  /* ... */
};

we could issue something like:

  warning: access-specifier 'public' is redundant within 'struct'

for the first; similarly for:

class bar
{
private:
};

we could issue:

  warning: access-specifier 'private' is redundant within 'class'


> One could actually make this a two-level warning so that on the
> higher level also the default class-type settings are taken into
> account. Would that be helpful? I could prepare a second patch for
> that.

I'm not sure how best to structure it.

FWIW, when I first saw the patch, I wasn't a big fan of the warning, as I like 
to use access-specifiers to break up the kinds of entities within a class.

For example, our coding guidelines 
  https://gcc.gnu.org/codingconventions.html#Class_Form
recommend:

"first define all public types,
then define all non-public types,
then declare all public constructors,
...
then declare all non-public member functions, and
then declare all non-public member variables."

I find it's useful to put a redundant "private:" between the last two,
e.g.:

class baz
{
 public:
  ...

 private:
  void some_private_member ();

 private:
  int m_some_field;
};

to provide a subdivision between the private member functions and the
private data fields.

This might be a violation of our guidelines (though if so, I'm not sure
it's explicitly stated), but I find it useful, and the patch would warn
about it.

Having said that, looking at e.g. the "jit" subdir, I see lots of
places where the warning would fire.  In some of these, the code has a
bit of a "smell", so maybe I like the warning after all... in that it
can be good for a new warning to draw attention to code that might need
work.

Sorry that this is rambling; comments on the patch inline below.

> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> OK for trunk?
> 
> Btw, there are about 50 places in our libstdc++ headers where this
> warning triggers. I'll come up with a patch for this later.
> 
> Regards,
> Volker
> 
> 
> 2017-07-20  Volker Reichelt  
> 
> * doc/invoke.texi (-Wduplicated-access-specifiers): Document
> new
> warning option.
> 
> Index: gcc/doc/invoke.texi
> ===
> --- gcc/doc/invoke.texi (revision 250356)
> +++ gcc/doc/invoke.texi (working copy)
> @@ -275,7 +275,7 @@
>  -Wdisabled-optimization @gol
>  -Wno-discarded-qualifiers  -Wno-discarded-array-qualifiers @gol
>  -Wno-div-by-zero  -Wdouble-promotion @gol
> --Wduplicated-branches  -Wduplicated-cond @gol
> +-Wduplicated-access-specifiers  -Wduplicated-branches  -Wduplicated-cond @gol
>  -Wempty-body  -Wenum-compare  -Wno-endif-labels  -Wexpansion-to-defined @gol
>  -Werror  -Werror=*  -Wextra-semi  -Wfatal-errors @gol
>  -Wfloat-equal  -Wformat  -Wformat=2 @gol
> @@ -5388,6 +5388,12 @@
>  
>  This warning is enabled by @option{-Wall}.
>  
> +@item -Wduplicated-access-specifiers
> +@opindex Wno-duplicated-access-specifiers
> +@opindex Wduplicated-access-specifiers
> +Warn when an access-specifier is redundant because it was already given
> +before.

Presumably this should be marked as C++-specific.

I think it's best to give an example for any warning, though we don't
do that currently.

>  @item -Wduplicated-branches
>  @opindex Wno-duplicated-branches
>  @opindex Wduplicated-branches
> ===
> 
> 
> 2017-07-20  Volker Reichelt  
> 
> * c.opt (Wduplicated-access-specifiers): New C++ warning
> flag.
> 
> Index: gcc/c-family/c.opt
> 

[PATCH v2 2/2] combine successive multiplications by constants

2017-07-21 Thread Alexander Monakov
Previous revision here: https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01090.html

Reassociate X * CST1 * CST2 to X * (CST1 * CST2).

Changed in this revision:
- remove the check for @2 being 0 or -1

* match.pd ((X * CST1) * CST2): Simplify to X * (CST1 * CST2).
testsuite:
* gcc.dg/tree-ssa/assoc-2.c: Enhance.
* gcc.dg/tree-ssa/slsr-4.c: Adjust.

---
 gcc/match.pd| 13 +
 gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c | 13 -
 gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c  |  8 ++--
 3 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 39e1e5c..732b80c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -284,6 +284,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 || mul != wi::min_value (TYPE_PRECISION (type), SIGNED))
  { build_zero_cst (type); })
 
+/* Combine successive multiplications.  Similar to above, but handling
+   overflow is different.  */
+(simplify
+ (mult (mult @0 INTEGER_CST@1) INTEGER_CST@2)
+ (with {
+   bool overflow_p;
+   wide_int mul = wi::mul (@1, @2, TYPE_SIGN (type), &overflow_p);
+  }
+  /* Skip folding on overflow: the only special case is @1 * @2 == -INT_MIN,
+ otherwise undefined overflow implies that @0 must be zero.  */
+  (if (!overflow_p || TYPE_OVERFLOW_WRAPS (type))
+   (mult @0 { wide_int_to_tree (type, mul); }))))
+
 /* Optimize A / A to 1.0 if we don't care about
NaNs or Infinities.  */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
index a92c882..cc0e9d4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
@@ -5,4 +5,15 @@ int f0(int a, int b){
   return a * 33 * b * 55;
 }
 
-/* { dg-final { scan-tree-dump-times "mult_expr" 2 "gimple" } } */
+int f1(int a){
+  a *= 33;
+  return a * 55;
+}
+
+int f2(int a, int b){
+  a *= 33;
+  return a * b * 55;
+}
+
+/* { dg-final { scan-tree-dump-times "mult_expr" 7 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "mult_expr" 5 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c b/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
index 17d7b4c..1e943b7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c
@@ -23,13 +23,9 @@ f (int i)
   foo (y);
 }
 
-/* { dg-final { scan-tree-dump-times "\\* 4" 1 "slsr" } } */
-/* { dg-final { scan-tree-dump-times "\\* 10" 1 "slsr" } } */
-/* { dg-final { scan-tree-dump-times "\\+ 20;" 1 "slsr" } } */
+/* { dg-final { scan-tree-dump-times "\\* 40" 1 "slsr" } } */
 /* { dg-final { scan-tree-dump-times "\\+ 200" 1 "slsr" } } */
-/* { dg-final { scan-tree-dump-times "\\- 16;" 1 "slsr" } } */
 /* { dg-final { scan-tree-dump-times "\\- 160" 1 "slsr" } } */
-/* { dg-final { scan-tree-dump-times "\\* 4" 1 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "\\* 10" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\* 40" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "\\+ 200" 1 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "\\+ 40" 1 "optimized" } } */
-- 
1.8.3.1



[PATCH v2 1/2] match.pd: reassociate multiplications

2017-07-21 Thread Alexander Monakov
Previous revision here: https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00889.html

Reassociate (X * CST) * Y to (X * Y) * CST; this pushes constants in
multiplication chains to the outermost factors, where they can be combined.

Changed in this revision:
- remove !TYPE_OVERFLOW_SANITIZED and !TYPE_SATURATING checks;
 (in previous discussion Richard indicated that introducing false negatives
  in UBSAN by concealing signed overflow is not a concern, and saturating
  types shouldn't appear here because the constant operand should be FIXED_CST)

The checks for @1 being 0 or -1 remain as they are required for correctness,
but since this rule is ordered after the simpler rules that fold X * {0, -1},
those checks are always false at runtime.
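The effect of the rule can be sketched in plain C (illustrative only, not GCC internals): pulling the constants to the outermost position lets constant folding combine them into a single factor. Under wrapping unsigned arithmetic, multiplication is associative and commutative, so the reshaped chain computes the same value with one fewer multiply:

```c
#include <stdint.h>

/* Illustrative sketch of the reassociation: under wrapping unsigned
   arithmetic the inner constants of a multiplication chain can be
   pulled out and pre-folded.  The match.pd rule performs the same
   reshaping on GIMPLE.  */
static uint32_t chain (uint32_t a, uint32_t b)
{
  return a * 33u * b * 55u;        /* three multiplications */
}

static uint32_t reassoc (uint32_t a, uint32_t b)
{
  return (a * b) * (33u * 55u);    /* two: 33u * 55u folds to 1815u */
}
```

This mirrors the f0 testcase above, where the combined patches reduce the multiply count in the optimized dump.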


* match.pd ((X * CST) * Y): Reassociate to (X * Y) * CST.
testsuite/
* gcc.dg/tree-ssa/assoc-2.c: New testcase.

---
 gcc/match.pd| 8 
 gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c | 8 
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 7f5807c..39e1e5c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2213,6 +2213,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (mult @0 integer_minus_onep)
  (negate @0))
 
+/* Reassociate (X * CST) * Y to (X * Y) * CST.  This does not introduce
+   signed overflow for CST != 0 && CST != -1.  */
+(simplify
+ (mult:c (mult:s @0 INTEGER_CST@1) @2)
+ (if (TREE_CODE (@2) != INTEGER_CST
+  && !integer_zerop (@1) && !integer_minus_onep (@1))
+  (mult (mult @0 @2) @1)))
+
 /* True if we can easily extract the real and imaginary parts of a complex
number.  */
 (match compositional_complex
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
new file mode 100644
index 000..a92c882
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-gimple-raw -fdump-tree-optimized-raw" } */
+
+int f0(int a, int b){
+  return a * 33 * b * 55;
+}
+
+/* { dg-final { scan-tree-dump-times "mult_expr" 2 "gimple" } } */
-- 
1.8.3.1



Re: [patch,lto] Fix PR81487

2017-07-21 Thread Georg-Johann Lay

On 21.07.2017 13:41, Richard Biener wrote:

On Thu, Jul 20, 2017 at 3:18 PM, Georg-Johann Lay  wrote:

Hi, this patch fixes some minor problems in lto-plugin:

Some older mingw32 host environments have broken asprintf.
As far as I can tell, the problem is that the mingw asprintf
implementation calls _vsnprintf (NULL, 0, ...) which always
returns -1 as length on the host.

The patch fixes this by using xasprintf from libiberty, with
the additional benefit of easier use and unified error reporting
in the case when there is actually no more memory available, i.e.
it will then use the same error messages as xmalloc.

Moreover, the old implementation calls asprintf as


asprintf (, "%s@0x%x%08x", file->name, lo, hi)


for large archives when the high 32 bits (hi) are non-zero.
This looks like a typo: the high part must come first to yield
a proper 64-bit value.
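To see why the order matters, here is a small stand-alone sketch (the function and variable names are illustrative, not the actual lto-plugin code) of how the formatted string decodes back to a 64-bit offset depending on which half is printed first:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative sketch of the argument-order bug: formatting "%x%08x"
   must receive the high 32 bits first, then the low 32 bits, or the
   string decodes to the wrong 64-bit offset.  */
static uint64_t decode (uint32_t first, uint32_t second)
{
  char buf[32];
  snprintf (buf, sizeof buf, "%x%08x", first, second);
  return strtoull (buf, NULL, 16);
}
```

Passing lo before hi, as the old code did, swaps the halves of the resulting value.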

Bootstrapped & re-tested on x86_64-linux-gnu and also on
mingw32, which previously popped the "ld.exe: asprintf failed" error.

Ok for trunk?


Ok.

I wonder if any of the other plain asprintf calls in GCC are
problematic.  From a quick look they are all inside code
only exercised when dumping/debugging.  But maybe we
should replace those as well.

Richard.



Maybe I just didn't run into such spots yet.  I just started using
a different cross-toolchain for canadian cross builds.  With my old
i386-mingw32 (3.4.5) I never saw such problems, but the new v8
(maybe also v7) no longer gave sane build results, i.e. the generated
cross-compiler to run under mingw just hangs.  It must be somewhere
around tree dump no. 62, but I couldn't even find out which pass
actually hangs.  So I used a 4.9.3 x64 -> mingw cross compiler in
the hope that it works better than good old 3.4.5, which did a very
good job for a really long time.

Johann



lto-plugin/
 PR lto/81487
 * lto-plugin.c (claim_file_handler): Use xasprintf instead of
 asprintf.
 [hi!=0]: Swap hi and lo arguments supplied to xasprintf.




[PATCH,rs6000] Fine-tune vec_construct direct move cost

2017-07-21 Thread Bill Schmidt
Hi,

In https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00924.html, I raised the
vectorization cost for a vec_construct operation that requires direct
moves between GPRs and VSRs.  The cost equation I substituted has since
proven to be slightly more conservative than intended, and we're seeing
some cases of SLP vectorization being avoided that should not be.  This
patch adjusts the equation to reduce the cost somewhat.

I've tested this to ensure the cases previously seen are now being
vectorized again, and done some benchmark testing that shows no measurable
difference, positive or negative.  So this is just minor fine-tuning, but
still important to get right.
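The shape of the cost change can be sketched as follows (the constants are taken from the diff below; this is only a reduced model, not the full rs6000_builtin_vectorization_cost logic, and the function name is made up):

```c
/* Reduced sketch of the vec_construct cost when direct moves between
   GPRs and VSRs are needed: one cost unit per extra element, plus a
   fixed direct-move overhead.  The patch lowers that overhead from 11
   to 5 on pre-Power9 targets; Power9 keeps its cheaper constant of 2.  */
static int
vec_construct_direct_move_cost (int subparts, int p9_vector, int tuned)
{
  if (p9_vector)
    return subparts - 1 + 2;
  return subparts - 1 + (tuned ? 5 : 11);
}
```

For a four-element vector the estimate drops from 14 to 8, which is enough to tip the SLP profitability check back in favor of vectorizing.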

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
Is this ok for trunk?

Thanks,
Bill


2017-07-21  Bill Schmidt  

PR target/80695
* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
Reduce cost estimate for direct moves.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 250426)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -5757,7 +5757,7 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
if (TARGET_P9_VECTOR)
  return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 2;
else
- return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 11;
+ return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 5;
  }
else
  /* V2DFmode doesn't need a direct move.  */



Re: [PATCH] Add RDMA support to Falkor.

2017-07-21 Thread Richard Earnshaw (lists)
On 29/06/17 21:53, Jim Wilson wrote:
> Falkor is an ARMV8-A part, but also includes the RDMA extension from 
> ARMV8.1-A.
> I'd like to enable support for the RDMA instructions when -mcpu=falkor is 
> used,
> and also make the RDMA intrisics available.  To do that, I need to add rdma
> as an architecture extension, and modify a few things to use it.  Binutils
> already supports rdma as an architecture extension.
> 
> I only did the aarch64 port, and not the arm port.  There are no supported
> targets that have the RDMA instructions and also aarch32 support.  There are
> also no aarch32 RDMA testcases.  So there is no way to test it.  It wasn't
> clear whether it was better to add something untested or leave it out.  I 
> chose
> to leave it out for now.
> 
> I also needed a few testcase changes.  There were redundant options being
> added for the RDMA tests that I had to remove as they are now wrong.  Also
> the fact that I only did aarch64 means we need to check both armv8-a+rdma and
> armv8.1-a for the rdma support.
> 
> This was tested with an aarch64 bootstrap and make check.  There were no
> regressions.
> 
> OK?

OK.

R.

> 
> Jim
> 
>   gcc/
>   * config/aarch64/aarch64-cores.def (falkor): Add AARCH64_FL_RDMA.
>   (qdf24xx): Likewise.
>   * config/aarch64/aarch64-options-extensions.def (rdma); New.
>   * config/aarch64/aarch64.h (AARCH64_FL_RDMA): New.
>   (AARCH64_FL_V8_1): Renumber.
>   (AARCH64_FL_FOR_ARCH8_1): Add AARCH64_FL_RDMA.
>   (AARCH64_ISA_RDMA): Use AARCH64_FL_RDMA.
>   * config/aarch64/arm_neon.h: Use +rdma instead of arch=armv8.1-a.
>   * doc/invoke.texi (AArch64 Options): Mention +rmda in -march docs.  Add
>   rdma to feature modifiers list.
> 
>   gcc/testsuite/
>   * lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Delete
>   redundant -march option.
>   (check_effective_target_arm_v8_1a_neon_ok_nocache): Try armv8-a+rdma
>   in addition to armv8.1-a.
> ---
>  gcc/config/aarch64/aarch64-cores.def |  4 ++--
>  gcc/config/aarch64/aarch64-option-extensions.def |  4 
>  gcc/config/aarch64/aarch64.h |  8 +---
>  gcc/config/aarch64/arm_neon.h|  2 +-
>  gcc/doc/invoke.texi  |  5 -
>  gcc/testsuite/lib/target-supports.exp| 18 ++
>  6 files changed, 26 insertions(+), 15 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index f8342ca..b8d0ba6 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -65,8 +65,8 @@ AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8A, 
>  AARCH64_FL_FOR_ARCH
>  AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
> xgene1, 0x50, 0x000, -1)
>  
>  /* Qualcomm ('Q') cores. */
> -AARCH64_CORE("falkor",  falkor,cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0xC00, -1)
> -AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0xC00, -1)
> +AARCH64_CORE("falkor",  falkor,cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, qdf24xx,   0x51, 
> 0xC00, -1)
> +AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, qdf24xx,   0x51, 
> 0xC00, -1)
>  
>  /* Samsung ('S') cores. */
>  AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  0x53, 0x001, -1)
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
> b/gcc/config/aarch64/aarch64-option-extensions.def
> index c0752ce..c4f059a 100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -63,4 +63,8 @@ AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, 
> AARCH64_FL_FP, 0, "fphp asimdhp")
>  /* Enabling or disabling "rcpc" only changes "rcpc".  */
>  AARCH64_OPT_EXTENSION("rcpc", AARCH64_FL_RCPC, 0, 0, "lrcpc")
>  
> +/* Enabling "rdma" also enables "fp", "simd".
> +   Disabling "rdma" just disables "rdma".  */
> +AARCH64_OPT_EXTENSION("rdma", AARCH64_FL_RDMA, AARCH64_FL_FP | 
> AARCH64_FL_SIMD, 0, "rdma")
> +
>  #undef AARCH64_OPT_EXTENSION
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 106cf3a..7f91edb 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -144,7 +144,8 @@ extern unsigned aarch64_architecture_version;
>  #define AARCH64_FL_CRC(1 << 3)   /* Has CRC.  */
>  /* ARMv8.1-A architecture extensions.  */
>  #define AARCH64_FL_LSE (1 << 4)  /* Has Large System Extensions. 
>  */
> -#define AARCH64_FL_V8_1(1 << 5)  /* Has ARMv8.1-A extensions.  */
> +#define AARCH64_FL_RDMA   

[PATCH 7/6] fortran: fix pair_cmp qsort comparator

2017-07-21 Thread Alexander Monakov
Hello,

The final tie-breaker in the pair_cmp comparator looks strange: it
correctly yields zero for equal expr->symtree->n.sym values, but for
unequal values it produces only 0 or 1.  That would be correct for C++
STL-style comparators that compute a "less-than" predicate, but not
for C qsort.

The comment before the function seems to confirm that the intent was to
indeed sort in ascending gfc_symbol order, but the code is doing mostly
the opposite.

Make the comparator properly anti-commutative by returning -1 in the last
tie-breaker when appropriate.
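The contract being restored can be shown with a minimal stand-in (the struct and names below are illustrative, not the real gfc_symbol): a qsort comparator must return a negative, zero, or positive value so that cmp(a,b) == -cmp(b,a) for unequal elements.

```c
#include <stdlib.h>

/* Stand-in for the fixed tie-breaker: sort larger keys first by
   returning -1 / 0 / 1 rather than only 0 / 1, which keeps the
   comparator anti-commutative as qsort requires.  */
struct sym { int id; };

static int sym_cmp (const void *p1, const void *p2)
{
  const struct sym *a = (const struct sym *) p1;
  const struct sym *b = (const struct sym *) p2;
  if (a->id > b->id)
    return -1;
  return a->id < b->id;
}

static void sort_syms (struct sym *v, size_t n)
{
  qsort (v, n, sizeof *v, sym_cmp);
}
```

Without the added `return -1` branch, swapping the arguments of a "greater" pair would yield 0 (claiming equality) instead of the negated result, leaving qsort free to order such pairs arbitrarily.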

Bootstrapped and regtested on x86-64, OK for trunk?

* interface.c (pair_cmp): Fix gfc_symbol comparison.  Adjust comment.
---
 gcc/fortran/interface.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index 6fe0647..13e2bdd 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -3294,7 +3294,7 @@ argpair;
order:
 - p->a->expr == NULL
 - p->a->expr->expr_type != EXPR_VARIABLE
-- growing p->a->expr->symbol.  */
+- by gfc_symbol pointer value (larger first).  */
 
 static int
 pair_cmp (const void *p1, const void *p2)
@@ -3320,6 +3320,8 @@ pair_cmp (const void *p1, const void *p2)
 }
   if (a2->expr->expr_type != EXPR_VARIABLE)
 return 1;
+  if (a1->expr->symtree->n.sym > a2->expr->symtree->n.sym)
+return -1;
   return a1->expr->symtree->n.sym < a2->expr->symtree->n.sym;
 }
 
-- 
1.8.3.1



Re: Deprecate DBX/stabs?

2017-07-21 Thread David Edelsohn
> Nathan Sidwell writes:

> Let's at least deprecate it.  I attach a patch to do so.  With the
> patch, you'll get a note about dbx being deprecated whenever you use
> stabs debugging on a system that prefers stabs (thus both -g and -gstabs
> will warn).  On systems where stabs is not preferred, -gstabs will not
> give you a warning.  The patch survives an x86_64-linux bootstrap.

Absolutely not.

AIX still uses DBX as the primary debugging format.  AIX supports
DWARF but the AIX toolchain does not fully interoperate with DWARF
generated by GCC.

With the extensive use of DBX by AIX and regular patches from me to
fix xcoff stabs debugging, omitting me from the cc list implies that
you really haven't done your homework.

Thanks, David


Re: Deprecate DBX/stabs?

2017-07-21 Thread Nathan Sidwell

On 07/21/2017 09:16 AM, Richard Biener wrote:

On Fri, Jul 21, 2017 at 3:07 PM, Nathan Sidwell  wrote:



+#ifndef DBX_DEBBUG_OK
 ^^^
typo?  The patch doesn't define this anywhere - I suggest to add it to
defaults.h
as 0 and use #if?  Also would need documenting if this is supposed to be a
target macro.


Like this?  I've now included XCOFF, as it's a subset of DBX.  Nothing 
appears to default to it.


nathan
--
Nathan Sidwell
2017-07-21  Nathan Sidwell  

	* defaults.h (DBX_DEBUG_DEPRECATED): New.
	* toplev.c (process_options): Warn about DBX/SDB being deprecated.
	* doc/tm.texi.in (DBX_DEBUG_DEPRECATED): Document.
	* doc/tm.texi: Updated.

Index: defaults.h
===
--- defaults.h	(revision 250426)
+++ defaults.h	(working copy)
@@ -889,6 +889,12 @@ see the files COPYING3 and COPYING.RUNTI
 #define SDB_DEBUGGING_INFO 0
 #endif
 
+/* DBX debugging is deprecated, and will generate a note if you
+   default to it.  */
+#ifndef DBX_DEBUG_DEPRECATED
+#define DBX_DEBUG_DEPRECATED 1
+#endif
+
 /* If more than one debugging type is supported, you must define
PREFERRED_DEBUGGING_TYPE to choose the default.  */
 
Index: doc/tm.texi
===
--- doc/tm.texi	(revision 250426)
+++ doc/tm.texi	(working copy)
@@ -9553,6 +9553,14 @@ user can always get a specific type of o
 
 @c prevent bad page break with this line
 These are specific options for DBX output.
+DBX debug data is deprecated and is expected to be removed.
+
+@defmac DBX_DEBUG_DEPRECATED
+Defined this macro to 1 if GCC should not warn about defaulting to DBX
+or XCOFF debug output.  This is intended to give maintainers notice of
+deprecation, but not be unnecessarily invasive.  Defining this macro is
+a short-term measure.  You need to plan for DBX's removal.
+@end defmac
 
 @defmac DBX_DEBUGGING_INFO
 Define this macro if GCC should produce debugging output for DBX
Index: doc/tm.texi.in
===
--- doc/tm.texi.in	(revision 250426)
+++ doc/tm.texi.in	(working copy)
@@ -6842,6 +6842,14 @@ user can always get a specific type of o
 
 @c prevent bad page break with this line
 These are specific options for DBX output.
+DBX debug data is deprecated and is expected to be removed.
+
+@defmac DBX_DEBUG_DEPRECATED
+Defined this macro to 1 if GCC should not warn about defaulting to DBX
+or XCOFF debug output.  This is intended to give maintainers notice of
+deprecation, but not be unnecessarily invasive.  Defining this macro is
+a short-term measure.  You need to plan for DBX's removal.
+@end defmac
 
 @defmac DBX_DEBUGGING_INFO
 Define this macro if GCC should produce debugging output for DBX
Index: toplev.c
===
--- toplev.c	(revision 250426)
+++ toplev.c	(working copy)
@@ -1413,6 +1413,12 @@ process_options (void)
 	debug_info_level = DINFO_LEVEL_NONE;
 }
 
+  if (DBX_DEBUG_DEPRECATED
+  && write_symbols == PREFERRED_DEBUGGING_TYPE
+  && (PREFERRED_DEBUGGING_TYPE == DBX_DEBUG
+	  || PREFERRED_DEBUGGING_TYPE == XCOFF_DEBUG))
+inform (UNKNOWN_LOCATION, "DBX/XCOFF (stabs) debugging is deprecated");
+
   if (flag_dump_final_insns && !flag_syntax_only && !no_backend)
 {
   FILE *final_output = fopen (flag_dump_final_insns, "w");


Re: [PATCH, PR81430] Use finalize_options in lto1

2017-07-21 Thread Tom de Vries

On 07/21/2017 11:41 AM, Richard Biener wrote:

On Thu, 20 Jul 2017, Tom de Vries wrote:


On 07/20/2017 12:10 PM, Richard Biener wrote:

On Thu, 20 Jul 2017, Tom de Vries wrote:


Hi,

this patch fixes PR81430, an ICE in the libgomp testsuite for both openmp
and
openacc test-cases for x86_64 with nvptx accelerator.

The scenario how we hit the ICE is as follows:
- a testcase is compiled with -O2
- ix86_option_optimization_table enables
OPT_freorder_blocks_and_partition at -O2
- cc1 writes out the flag as part of DECL_FUNCTION_SPECIFIC_OPTIMIZATION
- lto1 reads in the flag as part of DECL_FUNCTION_SPECIFIC_OPTIMIZATION
- lto1 uses the flag, and runs pass_partition_blocks
- pass_partition_blocks ICEs, because it generates code that is not
supported by the nvptx target.

Note that for standalone compilation for single-thread ptx execution, we
don't
attempt to run pass_partition_blocks. This is because for nvptx,
TARGET_HAVE_NAMED_SECTIONS is set to false, and this bit in finish_options
switches off pass_partition_blocks:
...
 /* If the target requested unwind info, then turn off the
partitioning optimization with a different message.  Likewise, if
the target does not support named sections.  */

if (opts->x_flag_reorder_blocks_and_partition
&& (!targetm_common.have_named_sections
|| (opts->x_flag_unwind_tables
&& targetm_common.unwind_tables_default
&& (ui_except == UI_SJLJ || ui_except >= UI_TARGET
  {
if (opts_set->x_flag_reorder_blocks_and_partition)
  inform (loc,
  "-freorder-blocks-and-partition does not work "
  "on this architecture");
opts->x_flag_reorder_blocks_and_partition = 0;
opts->x_flag_reorder_blocks = 1;
  }
...

The patch fixes this by calling finish_options in lto1 after
cl_optimization_restore.

Points for review:
1. I'm uncertain though about the placement of the call. Perhaps it should
be
in cl_optimization_restore, before targetm.override_options_after_change?

2. I think that this is offloading specific, so perhaps this should be
guarded
with lto_stream_offload_p or #ifdef ACCEL_COMPILER or some such.


Hmm, I agree with #2.  I think it conceptually is a LTO stream adjustment
and thus we should do this at the time we stream in the
optimization/target nodes (like we remap modes for example).  Not
sure if it's possible to do this at that point, but it looks like
finish_options takes two option structs and thus we should be able to
call it.



With what parameters? Patch below tries with same option struct, but ...


Do you get the inform note?  I suppose we don't really want that, no?



... I think that way we'll get the inform note (while the previous solution
did not).

I've also tried with a tmp2 memset to 0, but that ran into problems when doing
a maybe_set_param_value.


Use global_options_set?  That should do what the other patch did.



I managed to get it working now.  The variable tmp was only partly 
initialized, which caused the problems when calling 
maybe_set_param_value. I'm now using init_options_struct.


There's no note when using -O2 or "-O2 -freorder-blocks-and-partition".

But when I do "-O2 -foffload=-freorder-blocks-and-partition" I get:
...
lto1: note: '-freorder-blocks-and-partition' does not work on this 
architecture
lto1: note: '-freorder-blocks-and-partition' does not support unwind 
info on this architecture

...

And for "-O0 -foffload=-freorder-blocks-and-partition" I just get:
...
lto1: note: '-freorder-blocks-and-partition' does not work on this 
architecture

...

Thanks,
- Tom
Call finish_options in lto1

---
 gcc/tree-streamer-in.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/tree-streamer-in.c b/gcc/tree-streamer-in.c
index d7b6d22..eb41e75 100644
--- a/gcc/tree-streamer-in.c
+++ b/gcc/tree-streamer-in.c
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-chkp.h"
 #include "gomp-constants.h"
 #include "asan.h"
+#include "opts.h"
 
 
 /* Read a STRING_CST from the string table in DATA_IN using input
@@ -769,6 +770,21 @@ lto_input_ts_function_decl_tree_pointers (struct lto_input_block *ib,
   DECL_FUNCTION_SPECIFIC_TARGET (expr) = stream_read_tree (ib, data_in);
 #endif
   DECL_FUNCTION_SPECIFIC_OPTIMIZATION (expr) = stream_read_tree (ib, data_in);
+#ifdef ACCEL_COMPILER
+  {
+tree opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (expr);
+if (opts)
+  {
+	struct gcc_options tmp;
+	init_options_struct (, NULL);
+	cl_optimization_restore (, TREE_OPTIMIZATION (opts));
+	finish_options (, _options_set, UNKNOWN_LOCATION);
+	opts = build_optimization_node ();
+	finalize_options_struct ();
+	DECL_FUNCTION_SPECIFIC_OPTIMIZATION (expr) = opts;
+  }
+  }
+#endif
 
   /* If the file contains a function with an EH personality set,
  then it was compiled with -fexceptions.  In that case, initialize


Re: [PATCH][RFA/RFC] Stack clash mitigation patch 08/08 V2

2017-07-21 Thread Andreas Krebbel
Hi,

I've used your patch as the base and applied my changes on top.  The
attached patch is the result, so it is supposed to replace your
version.  It now also supports emitting a runtime loop.

It bootstraps fine but unfortunately I see an Ada regression which I
haven't tracked down yet.

> FAIL: cb1010a
> FAIL: gnat.dg/stack_check1.adb execution test
> FAIL: gnat.dg/stack_check1.adb execution test

Bye,

-Andreas-

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index bbae89b..796ca76 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -11040,6 +11040,179 @@ pass_s390_early_mach::execute (function *fun)
 
 } // anon namespace
 
+/* Calculate TARGET = REG + OFFSET as s390_emit_prologue would do it.
+   - push too big immediates to the literal pool and annotate the refs
+   - emit frame related notes for stack pointer changes.  */
+
+static rtx
+s390_prologue_plus_offset (rtx target, rtx reg, rtx offset, bool 
frame_related_p)
+{
+  rtx insn;
+  rtx orig_offset = offset;
+
+  gcc_assert (REG_P (target));
+  gcc_assert (REG_P (reg));
+  gcc_assert (CONST_INT_P (offset));
+
+  if (offset == const0_rtx)   /* lr/lgr */
+{
+  insn = emit_move_insn (target, reg);
+}
+  else if (DISP_IN_RANGE (INTVAL (offset)))   /* la */
+{
+  insn = emit_move_insn (target, gen_rtx_PLUS (Pmode, reg,
+  offset));
+}
+  else
+{
+  if (!satisfies_constraint_K (offset)/* ahi/aghi */
+ && (!TARGET_EXTIMM
+ || (!satisfies_constraint_Op (offset)   /* alfi/algfi */
+ && !satisfies_constraint_On (offset /* slfi/slgfi */
+   offset = force_const_mem (Pmode, offset);
+
+  if (target != reg)
+   {
+ insn = emit_move_insn (target, reg);
+ RTX_FRAME_RELATED_P (insn) = frame_related_p ? 1 : 0;
+   }
+
+  insn = emit_insn (gen_add2_insn (target, offset));
+
+  if (!CONST_INT_P (offset))
+   {
+ annotate_constant_pool_refs ( (insn));
+
+ if (frame_related_p)
+   add_reg_note (insn, REG_FRAME_RELATED_EXPR,
+ gen_rtx_SET (target,
+  gen_rtx_PLUS (Pmode, target,
+orig_offset)));
+   }
+}
+
+  RTX_FRAME_RELATED_P (insn) = frame_related_p ? 1 : 0;
+
+  return insn;
+}
+
+/* Emit a compare instruction with a volatile memory access as stack
+   probe.  It does not waste store tags and does not clobber any
+   registers apart from the condition code.  */
+static void
+s390_emit_stack_probe (rtx addr)
+{
+  rtx tmp = gen_rtx_MEM (Pmode, addr);
+  MEM_VOLATILE_P (tmp) = 1;
+  s390_emit_compare (EQ, gen_rtx_REG (Pmode, 0), tmp);
+}
+
+#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP)
+#if (PROBE_INTERVAL - 4 > 4095)
+#error "S/390: stack probe offset must fit into short discplacement."
+#endif
+
+/* Use a runtime loop if we have to emit more probes than this.  */
+#define MIN_UNROLL_PROBES 3
+
+/* Allocate SIZE bytes of stack space, using TEMP_REG as a temporary
+   if necessary.  LAST_PROBE_OFFSET contains the offset of the closest
+   probe relative to the stack pointer.
+
+   Note that SIZE is negative.
+
+   The return value is true if TEMP_REG has been clobbered.  */
+static bool
+allocate_stack_space (rtx size, HOST_WIDE_INT last_probe_offset,
+ rtx temp_reg)
+{
+  bool temp_reg_clobbered_p = false;
+
+  if (flag_stack_clash_protection)
+{
+  if (last_probe_offset + -INTVAL (size) < PROBE_INTERVAL)
+   dump_stack_clash_frame_info (NO_PROBE_SMALL_FRAME, true);
+  else
+   {
+ rtx offset = GEN_INT (PROBE_INTERVAL - UNITS_PER_LONG);
+ HOST_WIDE_INT rounded_size = -INTVAL (size) & -PROBE_INTERVAL;
+ HOST_WIDE_INT num_probes = rounded_size / PROBE_INTERVAL;
+ HOST_WIDE_INT residual = -INTVAL (size) - rounded_size;
+
+ if (num_probes < MIN_UNROLL_PROBES)
+   {
+ /* Emit unrolled probe statements.  */
+
+ for (unsigned int i = 0; i < num_probes; i++)
+   {
+ s390_prologue_plus_offset (stack_pointer_rtx,
+stack_pointer_rtx,
+GEN_INT (-PROBE_INTERVAL), true);
+ s390_emit_stack_probe (gen_rtx_PLUS (Pmode,
+  stack_pointer_rtx,
+  offset));
+   }
+ dump_stack_clash_frame_info (PROBE_INLINE, residual != 0);
+   }
+ else
+   {
+ /* Emit a loop probing the pages.  */
+
+ rtx_code_label *loop_start_label = gen_label_rtx ();
+
+ /* From now on temp_reg will be the CFA register.  */
+ s390_prologue_plus_offset (temp_reg, 

Re: [Arm] Obsoleting Command line option -mstructure-size-boundary in eabi configurations

2017-07-21 Thread Richard Earnshaw (lists)
On 13/07/17 11:26, Michael Collison wrote:
> Updated per Richard's comments and suggestions.
> 
> Okay for trunk?

OK.  Please can you write up an entry for the release notes (wwwdocs).

R.

> 
> 2017-07-10  Michael Collison  
> 
>   * config/arm/arm.c (arm_option_override): Deprecate
>   use of -mstructure-size-boundary.
>   * config/arm/arm.opt: Deprecate -mstructure-size-boundary.
>   * doc/invoke.texi: Deprecate -mstructure-size-boundary.
> 
> -Original Message-
> From: Richard Earnshaw (lists) [mailto:richard.earns...@arm.com] 
> Sent: Thursday, July 6, 2017 3:17 AM
> To: Michael Collison ; GCC Patches 
> 
> Cc: nd 
> Subject: Re: [Arm] Obsoleting Command line option -mstructure-size-boundary 
> in eabi configurations
> 
> On 06/07/17 06:46, Michael Collison wrote:
>> NetBSD/Arm requires the structure size boundary to stay at 
>> DEFAULT_STRUCTURE_SIZE_BOUNDARY (see config/arm/netbsd-elf.h for 
>> details). This patch disallows -mstructure-size-boundary on netbsd 
>> if the value is not equal to DEFAULT_STRUCTURE_SIZE_BOUNDARY.
>>
>> Okay for trunk?
>>
>> 2017-07-05  Michael Collison  
>>
>>  * config/arm/arm.c (arm_option_override): Disallow
>>  -mstructure-size-boundary on netbsd if value is not
>>  DEFAULT_STRUCTURE_SIZE_BOUNDARY.
>>
>>
> 
> Frankly, I'd rather we moved towards obsoleting this option entirely.
> The origins are from the days of the APCS (note, not AAPCS), when the default 
> was 32 while most of the world expected 8.
> 
> Now that the AAPCS is widely adopted, APCS is obsolete (NetBSD uses
> ATPCS) and NetBSD (the only port not based on AAPCS these days) defaults to 8 
> I can't see why anybody now would be interested in using a different value.
> 
> So let's just mark this option as deprecated (emit a warning if
> 
> global_options_set.x_arm_structure_size_boundary
> 
> is ever set by the user, regardless of value).  Then in GCC 9 we can perhaps 
> remove this code entirely.
> 
> Documentation and release notes will need corresponding updates as well.
> 
> R.
> 
>> pr1556.patch
>>
>>
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 
>> bc1e607..911c272 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -3471,7 +3471,18 @@ arm_option_override (void)
>>  }
>>else
>>  {
>> -  if (arm_structure_size_boundary != 8
>> +  /* Do not allow structure size boundary to be overridden for 
>> + netbsd.  */
>> +
>> +  if ((arm_abi == ARM_ABI_ATPCS)
>> +  && (arm_structure_size_boundary != DEFAULT_STRUCTURE_SIZE_BOUNDARY))
>> +{
>> +  warning (0,
>> +   "option %<-mstructure-size-boundary%> is deprecated for 
>> netbsd; "
>> +   "defaulting to %d",
>> +   DEFAULT_STRUCTURE_SIZE_BOUNDARY);
>> +  arm_structure_size_boundary = DEFAULT_STRUCTURE_SIZE_BOUNDARY;
>> +}
>> +  else if (arm_structure_size_boundary != 8
>>&& arm_structure_size_boundary != 32
>>&& !(ARM_DOUBLEWORD_ALIGN && arm_structure_size_boundary == 64))
>>  {
>>
> 
> 
> pr1556v2.patch
> 
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index c6101ef..b5dbfeb 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -3489,6 +3489,8 @@ arm_option_override (void)
>  }
>else
>  {
> +  warning (0, "option %<-mstructure-size-boundary%> is deprecated");
> +
>if (arm_structure_size_boundary != 8
> && arm_structure_size_boundary != 32
> && !(ARM_DOUBLEWORD_ALIGN && arm_structure_size_boundary == 64))
> diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
> index b6c707b..6060516 100644
> --- a/gcc/config/arm/arm.opt
> +++ b/gcc/config/arm/arm.opt
> @@ -192,7 +192,7 @@ Target RejectNegative Alias(mfloat-abi=, soft) 
> Undocumented
>  
>  mstructure-size-boundary=
>  Target RejectNegative Joined UInteger Var(arm_structure_size_boundary) 
> Init(DEFAULT_STRUCTURE_SIZE_BOUNDARY)
> -Specify the minimum bit alignment of structures.
> +Specify the minimum bit alignment of structures. (Deprecated).
>  
>  mthumb
>  Target Report RejectNegative Negative(marm) Mask(THUMB) Save
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 3e5cee8..cfe5985 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -15685,6 +15685,8 @@ incompatible.  Code compiled with one value cannot 
> necessarily expect to
>  work with code or libraries compiled with another value, if they exchange
>  information using structures or unions.
>  
> +This option is deprecated.
> +
>  @item -mabort-on-noreturn
>  @opindex mabort-on-noreturn
>  Generate a call to the function @code{abort} at the end of a
> 



Re: Deprecate DBX/stabs?

2017-07-21 Thread Richard Biener
On Fri, Jul 21, 2017 at 3:07 PM, Nathan Sidwell  wrote:
> [darwin, cygwin, rx maintainers, you might have an opinion]
>
> On 07/21/2017 01:11 AM, Richard Biener wrote:
>>
>> On July 21, 2017 12:03:58 AM GMT+02:00, Jim Wilson 
>> wrote:
>>>
>>> On Thu, Jul 20, 2017 at 2:00 PM, Nathan Sidwell  wrote:

 With this patch the gdb stabs test results are still awful, but they
>>>
>>> are

 unchanged awfulness.
>>>
>>>
>>> Anyways, your new dbxout.c patch looks good.  And maybe we should
>>> think about deprecating the stabs support some day.
>>
>>
>> I've suggested that multiple times also to be able to get rid of the debug
>> hook interfacing in GCC and emit dwarf directly from FEs where that makes
>> sense.  DWARF should be a superset of other debug formats so it should be
>> possible to build stabs from dwarf.
>
>
> Let's at least deprecate it.  I attach a patch to do so.  With the patch,
> you'll get a note about dbx being deprecated whenever you use stabs
> debugging on a system that prefers stabs (thus both -g and -gstabs will
> warn).  On systems where stabs is not preferred, -gstabs will not give you a
> warning.  The patch survices an x86_64-linux bootstrap.
>
> A config can chose to override thus by defining 'DBX_DEBUG_OK' in the build
> defines.
>
> I did try build time CPP tricks, but config/rx.h and config/i386/darwin.h
> define it to be a conditional expression.
>
> AFAICT, the following include files are not used, and could probably be
> binned too:
> config/dbx.h
> config/dbxcoff.h
> config/dbxelf.h
> (+ configi386/gstabs.h Jim found)
>
> It looks like DBX is the default for:
> i386/cygming configured for 32-bit or lacking PE_SECREL32_RELOC
> i386/darwin.h for 32-bit target
> rx/rx.h when using AS100 syntax

Index: toplev.c
===
--- toplev.c(revision 250424)
+++ toplev.c(working copy)
@@ -1413,6 +1413,12 @@ process_options (void)
debug_info_level = DINFO_LEVEL_NONE;
 }

+#ifndef DBX_DEBBUG_OK
^^^
typo?  The patch doesn't define this anywhere - I suggest to add it to
defaults.h
as 0 and use #if?  Also would need documenting if this is supposed to be a
target macro.

Thanks,
Richard.

> nathan
> --
> Nathan Sidwell


Re: Add support to trace comparison instructions and switch statements

2017-07-21 Thread David Edelsohn
On Fri, Jul 21, 2017 at 1:38 AM, 吴潍浠(此彼)  wrote:
> Hi Jeff
>
> I have signed the copyright assignment, and used the name 'Wish Wu' .
> Should I send you a copy of my assignment ?

Your assignment now is on file in the FSF Copyright Assignment list
where Jeff, I and other maintainers can see it.  We cannot accept
assurances from developers; please do not send copies of copyright
assignments.

Thanks, David

P.S. It normally is unnecessary to send email to both GCC and GCC
Patches mailing lists.


[PATCH][3/n] Fix PR81303

2017-07-21 Thread Richard Biener

When forcing versioning for the vector profitability check I noticed
we use different niters for the peeling check than for the versioning one,
leading to two branches, the second being redundant.  The following makes
them consistent, also using the latch execution count, which is known
not to have overflowed.

Bootstrap / regtest pending on x86_64-unknown-linux-gnu.

Richard.

2017-06-21  Richard Biener  

PR tree-optimization/81303
* tree-vect-loop-manip.c (vect_loop_versioning): Build
profitability check against LOOP_VINFO_NITERSM1.

Index: gcc/tree-vect-loop-manip.c
===
--- gcc/tree-vect-loop-manip.c  (revision 250386)
+++ gcc/tree-vect-loop-manip.c  (working copy)
@@ -2136,7 +2136,7 @@ vect_loop_versioning (loop_vec_info loop
   tree arg;
   profile_probability prob = profile_probability::likely ();
   gimple_seq gimplify_stmt_list = NULL;
-  tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo);
+  tree scalar_loop_iters = LOOP_VINFO_NITERSM1 (loop_vinfo);
   bool version_align = LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo);
   bool version_alias = LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo);
   bool version_niter = LOOP_REQUIRES_VERSIONING_FOR_NITERS (loop_vinfo);
@@ -2144,7 +2144,7 @@ vect_loop_versioning (loop_vec_info loop
   if (check_profitability)
 cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
 build_int_cst (TREE_TYPE (scalar_loop_iters),
-  th));
+   th - 1));
 
   if (version_niter)
 vect_create_cond_for_niters_checks (loop_vinfo, &cond_expr);


Deprecate DBX/stabs?

2017-07-21 Thread Nathan Sidwell

[darwin, cygwin, rx maintainers, you might have an opinion]

On 07/21/2017 01:11 AM, Richard Biener wrote:

On July 21, 2017 12:03:58 AM GMT+02:00, Jim Wilson  
wrote:

On Thu, Jul 20, 2017 at 2:00 PM, Nathan Sidwell  wrote:

With this patch the gdb stabs test results are still awful, but they

are

unchanged awfulness.


Anyways, your new dbxout.c patch looks good.  And maybe we should
think about deprecating the stabs support some day.


I've suggested that multiple times also to be able to get rid of the debug hook 
interfacing in GCC and emit dwarf directly from FEs where that makes sense.  
DWARF should be a superset of other debug formats so it should be possible to 
build stabs from dwarf.


Let's at least deprecate it.  I attach a patch to do so.  With the 
patch, you'll get a note about dbx being deprecated whenever you use 
stabs debugging on a system that prefers stabs (thus both -g and -gstabs 
will warn).  On systems where stabs is not preferred, -gstabs will not 
give you a warning.  The patch survives an x86_64-linux bootstrap.


A config can choose to override this by defining 'DBX_DEBUG_OK' in the 
build defines.


I did try build time CPP tricks, but config/rx.h and 
config/i386/darwin.h define it to be a conditional expression.


AFAICT, the following include files are not used, and could probably be 
binned too:

config/dbx.h
config/dbxcoff.h
config/dbxelf.h
(+ config/i386/gstabs.h Jim found)

It looks like DBX is the default for:
i386/cygming configured for 32-bit or lacking PE_SECREL32_RELOC
i386/darwin.h for 32-bit target
rx/rx.h when using AS100 syntax

nathan
--
Nathan Sidwell
2017-07-21  Nathan Sidwell  

	* toplev.c (process_options): Warn about DBX being deprecated.

Index: toplev.c
===
--- toplev.c	(revision 250424)
+++ toplev.c	(working copy)
@@ -1413,6 +1413,12 @@ process_options (void)
 	debug_info_level = DINFO_LEVEL_NONE;
 }
 
+#ifndef DBX_DEBBUG_OK
+  if (PREFERRED_DEBUGGING_TYPE == DBX_DEBUG
+  && write_symbols == DBX_DEBUG)
+inform (UNKNOWN_LOCATION, "DBX (stabs) debugging is deprecated");
+#endif
+
   if (flag_dump_final_insns && !flag_syntax_only && !no_backend)
 {
   FILE *final_output = fopen (flag_dump_final_insns, "w");


[libgomp] Doc update - TASKLOOP/GOMP_doacross_ description

2017-07-21 Thread Venevtsev, Igor
Hi!

This patch adds an Implementing-TASKLOOP-Construct section and adds a 
description of the GOMP_doacross_ runtime routine family to the 
Implementing-FOR-Construct section.  (I checked that 'make info' and 'make 
html' produce correct doc output.)

2017-07-21  Igor Venevtsev  

* libgomp.texi
(Implementing-TASKLOOP-Construct): New section
(Implementing-FOR-Construct): Add documentation for the GOMP_doacross_
runtime routine family

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 230720f..1f17014 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -3096,6 +3096,7 @@ presented by libgomp.  Only maintainers should need them.
 * Implementing SECTIONS construct::
 * Implementing SINGLE construct::
 * Implementing OpenACC's PARALLEL construct::
+* Implementing TASKLOOP construct::
 @end menu
 
 
@@ -3354,7 +3355,7 @@ Note that while it looks like there is trickiness to 
propagating
 a non-constant STEP, there isn't really.  We're explicitly allowed
 to evaluate it as many times as we want, and any variables involved
 should automatically be handled as PRIVATE or SHARED like any other
-variables.  So the expression should remain evaluable in the 
+variables.  So the expression should remain evaluable in the
 subfunction.  We can also pull it into a local variable if we like,
 but since its supposed to remain unchanged, we can also not if we like.
 
@@ -3367,7 +3368,197 @@ of these routines.
 There are separate routines for handling loops with an ORDERED
 clause.  Bookkeeping for that is non-trivial...
 
+...Yep! But let's try!
 
+Loops with ORDERED clause @strong{with param} '#pragma omp for ordered 
@strong{(N)}' present so-called DOACROSS parallelism
+and are expanded to the @strong{GOMP_doacross_...} family of runtime routines.
+Loops with ORDERED clause @strong{without param} '#pragma omp for ordered' are 
expanded to the @strong{GOMP_loop_ordered_...} routine family.
+This section describes only the @strong{GOMP_doacross_} family so far.
+
+A short introduction to do-across parallelism terms and the synchronisation 
model can be found here:
+@uref{https://en.wikipedia.org/wiki/Loop-level_parallelism#DOACROSS_parallelism}
+
+Another explanation of doacross parallelism by libgomp author Jakub Jelinek 
can be found here:
+@uref{https://developers.redhat.com/blog/2016/03/22/what-is-new-in-openmp-4-5-3/}
+
+OMP v4.5 expresses wait() and post() operations by @strong{#pragma omp ordered 
depend(sink:...)} and @strong{#pragma omp ordered depend(source)}
+constructs.  These constructs are lowered to GOMP_doacross_wait() and 
GOMP_doacross_post() respectively; see the example below.
+
+Here is a DOACROSS loop example from the official OpenMP v4.5 examples 
(@uref{https://github.com/OpenMP/Examples}).
+@*Note that the ordered clause has a parameter: '#pragma omp for 
ordered@strong{(2)}'.
+
+
+@smallexample
+/*
+* @@name:   doacross.2.c
+* @@type:   C
+* @@compilable: yes
+* @@linkable:   no
+* @@expect: success
+*/
+float foo(int i, int j);
+float bar(float a, float b, float c);
+float baz(float b);
+
+void work( int N, int M, float **A, float **B, float **C )
+@{
+  int i, j;
+
+  #pragma omp for ordered(2)
+  for (i=1; i

[AArch64, Patch] Generate MLA when multiply + add vector by scalar

2017-07-21 Thread Jackson Woodruff

Hi all,

This merges vector multiplies and adds into a single mla instruction 
when the multiplication is done by a scalar.


Currently, for the following:

typedef int __attribute__((vector_size(16))) vec;

vec
mla0(vec v0, vec v1, vec v2)
{
  return v0 + v1 * v2[0];
}

vec
mla1(vec v0, vec v1, int v2)
{
  return v0 + v1 * v2;
}

The function `mla0` outputs a multiply accumulate by element 
instruction. `mla1` outputs two vector operations (multiply followed by 
add). That is, we currently have:


mla0:
mlav0.4s, v1.4s, v2.s[0]
ret

mla1:
fmov   s2, w0
mulv1.4s, v1.4s, v2.s[0]
addv0.4s, v1.4s, v0.4s
ret

This patch replaces this with something similar to `mla0`:

mla1:
fmov   s2, w0
mlav0.4s, v1.4s, v2.s[0]

The same transformation is applied to the analogous case of a multiply 
followed by a subtract of vectors with an integer operand on the multiply. 
Testcases are added for both.


Bootstrap and testsuite run on aarch64. OK for trunk?

Jackson

Changelog entry:

gcc/

2017-06-06  Jackson Woodruff 

* config/aarch64/aarch64-simd.md (aarch64_mla_elt_merge,
aarch64_mls_elt_merge, aarch64_fma4_elt_merge,
aarch64_fnma_elt_merge): New define_insns to generate
multiply accumulate instructions for unmerged
multiply add vector instructions.


gcc/testsuite/

2017-06-06  Jackson Woodruff 

* gcc.target/aarch64/simd/vmla_elem_1.c: New.

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
1cb6eeb318716aadacb84a44aa2062d486e0186b..ab1aa5ab84577b3cbddd1eb0e40d29e9b2aa4b42
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1033,6 +1033,18 @@
   [(set_attr "type" "neon_mla__scalar")]
 )
 
+(define_insn "*aarch64_mla_elt_merge"
+  [(set (match_operand:VDQHS 0 "register_operand" "=w")
+   (plus:VDQHS
+ (mult:VDQHS (vec_duplicate:VDQHS
+ (match_operand: 1 "register_operand" "w"))
+   (match_operand:VDQHS 2 "register_operand" "w"))
+ (match_operand:VDQHS 3 "register_operand" "0")))]
+ "TARGET_SIMD"
+ "mla\t%0., %2., %1.[0]"
+  [(set_attr "type" "neon_mla__scalar")]
+)
+
 (define_insn "aarch64_mls"
  [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w")
(minus:VDQ_BHSI (match_operand:VDQ_BHSI 1 "register_operand" "0")
@@ -1080,6 +1092,18 @@
   [(set_attr "type" "neon_mla__scalar")]
 )
 
+(define_insn "*aarch64_mls_elt_merge"
+  [(set (match_operand:VDQHS 0 "register_operand" "=w")
+   (minus:VDQHS
+ (match_operand:VDQHS 1 "register_operand" "0")
+ (mult:VDQHS (vec_duplicate:VDQHS
+ (match_operand: 2 "register_operand" "w"))
+   (match_operand:VDQHS 3 "register_operand" "w"))))]
+  "TARGET_SIMD"
+  "mls\t%0., %3., %2.[0]"
+  [(set_attr "type" "neon_mla__scalar")]
+)
+
 ;; Max/Min operations.
 (define_insn "3"
  [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w")
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmla_elem_1.c 
b/gcc/testsuite/gcc.target/aarch64/simd/vmla_elem_1.c
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..df777581ab43b9b9e20b61f3f8d46193bdfda5fb
 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vmla_elem_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vmla_elem_1.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+typedef short int __attribute__ ((vector_size (16))) v8hi;
+
+v8hi
+mla8hi (v8hi v0, v8hi v1, short int v2)
+{
+  /* { dg-final { scan-assembler "mla\\tv\[0-9\]\+\\.8h, v\[0-9\]\+\\.8h, 
v\[0-9\]\+\\.h\\\[0\\\]" } } */
+  return v0 + v1 * v2;
+}
+
+
+v8hi
+mls8hi (v8hi v0, v8hi v1, short int v2)
+{
+  /* { dg-final { scan-assembler "mls\\tv\[0-9\]\+\\.8h, v\[0-9\]\+\\.8h, 
v\[0-9\]\+\\.h\\\[0\\\]" } } */
+  return v0 - v1 * v2;
+}
+
+typedef short int __attribute__ ((vector_size (8))) v4hi;
+
+v4hi
+mla4hi (v4hi v0, v4hi v1, short int v2)
+{
+  /* { dg-final { scan-assembler "mla\\tv\[0-9\]\+\\.4h, v\[0-9\]\+\\.4h, 
v\[0-9\]\+\\.h\\\[0\\\]" } } */
+  return v0 + v1 * v2;
+}
+
+v4hi
+mls4hi (v4hi v0, v4hi v1, short int v2)
+{
+  /* { dg-final { scan-assembler "mls\\tv\[0-9\]\+\\.4h, v\[0-9\]\+\\.4h, 
v\[0-9\]\+\\.h\\\[0\\\]" } } */
+  return v0 - v1 * v2;
+}
+
+typedef int __attribute__ ((vector_size (16))) v4si;
+
+v4si
+mla4si (v4si v0, v4si v1, int v2)
+{
+  /* { dg-final { scan-assembler "mla\\tv\[0-9\]\+\\.4s, v\[0-9\]\+\\.4s, 
v\[0-9\]\+\\.s\\\[0\\\]" } } */
+  return v0 + v1 * v2;
+}
+
+v4si
+mls4si (v4si v0, v4si v1, int v2)
+{
+  /* { dg-final { scan-assembler "mls\\tv\[0-9\]\+\\.4s, v\[0-9\]\+\\.4s, 
v\[0-9\]\+\\.s\\\[0\\\]" } } */
+  return v0 - v1 * v2;
+}
+
+typedef int __attribute__((vector_size (8))) v2si;
+
+v2si
+mla2si (v2si v0, v2si v1, int v2)
+{
+  /* { dg-final { scan-assembler "mla\\tv\[0-9\]\+\\.2s, v\[0-9\]\+\\.2s, 

Re: [patch,lto] Fix PR81487

2017-07-21 Thread Richard Biener
On Thu, Jul 20, 2017 at 3:18 PM, Georg-Johann Lay  wrote:
> Hi, this patch fixes some minor problems in lto-plugin:
>
> Some older mingw32 host environments have broken asprintf.
> As far as I can tell, the problem is that the mingw asprintf
> implementation calls _vsnprintf (NULL, 0, ...) which always
> returns -1 as length on the host.
>
> The patch fixes this by using xasprintf from libiberty with
> the additional benefit of easier use and unified error reporting
> in the case when there is actually no more memory available, i.e.
> it will use the same error messages as xmalloc does.
>
> Moreover, the old implementation calls asprintf as
>
>> asprintf (, "%s@0x%x%08x", file->name, lo, hi)
>
> for large archives when the high 32 bits (hi) are non-zero.
> This looks like a typo: the high part must come first to yield
> a proper 64-bit value.
>
> Bootstrapped & re-tested on x86_64-linux gnu and also on
> mingw32 which popped the "ld.exe: asprintf failed".
>
> Ok for trunk?

Ok.

I wonder if any of the other plain asprintf calls in GCC are
problematic.  From a quick look they are all inside code
only exercised when dumping/debugging.  But maybe we
should replace those as well.

Richard.

> Johann
>
>
> lto-plugin/
> PR lto/81487
> * lto-plugin.c (claim_file_handler): Use xasprintf instead of
> asprintf.
> [hi!=0]: Swap hi and lo arguments supplied to xasprintf.


[PATCH] Fix PR81500

2017-07-21 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-06-21  Richard Biener  

PR tree-optimization/81500
* tree-vect-loop.c (vect_is_simple_reduction): Properly fail if
we didn't identify a reduction path.

* gcc.dg/torture/pr81500.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 250386)
+++ gcc/tree-vect-loop.c(working copy)
@@ -3243,7 +3243,7 @@ pop:
 }
 
   /* Check whether the reduction path detected is valid.  */
-  bool fail = false;
+  bool fail = path.length () == 0;
   bool neg = false;
   for (unsigned i = 1; i < path.length (); ++i)
 {
@@ -3276,9 +3276,7 @@ pop:
 
   if (dump_enabled_p ())
 {
-  report_vect_op (MSG_MISSED_OPTIMIZATION,
- SSA_NAME_DEF_STMT
-   (USE_FROM_PTR (path[path.length ()-1].second)),
+  report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt,
  "reduction: unknown pattern: ");
 }
 
Index: gcc/testsuite/gcc.dg/torture/pr81500.c
===
--- gcc/testsuite/gcc.dg/torture/pr81500.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr81500.c  (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+
+typedef int a;
+void c(int *b)
+{
+  int d;
+  a e, f, *g, *h = b;
+  for (; d; d--) {
+  f = *g & 1;
+  *h-- = *g-- | e;
+  e = f;
+  }
+}


Re: [RFC][PATCH][AArch64] Cleanup frame pointer usage

2017-07-21 Thread Wilco Dijkstra

       
ping
    
Wilco Dijkstra wrote:
> James Greenhalgh wrote:
>
> > I note this is still marked as an RFC, are you now proposing it as a
> > patch to be merged to trunk?
> 
> Absolutely. It was marked as an RFC to get some comments - I thought it
> may be controversial to separate the frame pointer and frame chain concepts. 
> And this fixes the long-standing bugs caused by changing the global frame
> pointer option to an incorrect value for the leaf function optimization.

Here is a rebased version that should patch without merge issues:

Cleanup frame pointer usage.  Introduce a boolean emit_frame_chain which
determines whether to store FP and LR and setup FP to point at this record.
When the frame pointer is enabled but not strictly required (eg. no use of
alloca), we emit a frame chain in non-leaf functions, but don't use the
frame pointer to access locals.  This results in smaller code and unwind info.

Simplify the logic in aarch64_override_options_after_change_1 () and compute
whether the frame chain is required in aarch64_layout_frame () instead.
As a result aarch64_frame_pointer_required is now redundant.

Convert all callee save/restore functions to use gen_frame_mem.

Bootstrap OK.

ChangeLog:
2017-06-15  Wilco Dijkstra  

    gcc/
    PR middle-end/60580
    * config/aarch64/aarch64.h (aarch64_frame):
 Add emit_frame_chain boolean.
    * config/aarch64/aarch64.c (aarch64_frame_pointer_required)
    Remove.
    (aarch64_layout_frame): Initialise emit_frame_chain.
    (aarch64_pushwb_single_reg): Use gen_frame_mem.
    (aarch64_pop_regs): Likewise.
    (aarch64_gen_load_pair): Likewise.
    (aarch64_save_callee_saves): Likewise.
    (aarch64_restore_callee_saves): Likewise.
    (aarch64_expand_prologue): Use emit_frame_chain.
    (aarch64_can_eliminate): Simplify. When FP needed or outgoing
    arguments are large, eliminate to FP, otherwise SP.
    (aarch64_override_options_after_change_1): Simplify.
    (TARGET_FRAME_POINTER_REQUIRED): Remove define.
--
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
08acdeb52d4083f50a4b44f43fb98009cdcc041f..722c39cfc4d57280d621fb6130e4d9f4d59d1e72
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -591,6 +591,9 @@ struct GTY (()) aarch64_frame
   /* The size of the stack adjustment after saving callee-saves.  */
   HOST_WIDE_INT final_adjust;
 
+  /* Store FP,LR and setup a frame pointer.  */
+  bool emit_frame_chain;
+
   unsigned wb_candidate1;
   unsigned wb_candidate2;
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
fd3005d8056e65cb32c92bbd5eb752c977c885a5..a97b4bbe9dc0f7bccc90a9337519038041241531
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2761,24 +2761,6 @@ aarch64_output_probe_stack_range (rtx reg1, rtx reg2)
   return "";
 }
 
-static bool
-aarch64_frame_pointer_required (void)
-{
-  /* In aarch64_override_options_after_change
- flag_omit_leaf_frame_pointer turns off the frame pointer by
- default.  Turn it back on now if we've not got a leaf
- function.  */
-  if (flag_omit_leaf_frame_pointer
-  && (!crtl->is_leaf || df_regs_ever_live_p (LR_REGNUM)))
-    return true;
-
-  /* Force a frame pointer for EH returns so the return address is at FP+8.  */
-  if (crtl->calls_eh_return)
-    return true;
-
-  return false;
-}
-
 /* Mark the registers that need to be saved by the callee and calculate
    the size of the callee-saved registers area and frame record (both FP
    and LR may be omitted).  */
@@ -2791,6 +2773,18 @@ aarch64_layout_frame (void)
   if (reload_completed && cfun->machine->frame.laid_out)
 return;
 
+  /* Force a frame chain for EH returns so the return address is at FP+8.  */
+  cfun->machine->frame.emit_frame_chain
+    = frame_pointer_needed || crtl->calls_eh_return;
+
+  /* Emit a frame chain if the frame pointer is enabled.
+ If -momit-leaf-frame-pointer is used, do not use a frame chain
+ in leaf functions which do not use LR.  */
+  if (flag_omit_frame_pointer == 2
+  && !(flag_omit_leaf_frame_pointer && crtl->is_leaf
+  && !df_regs_ever_live_p (LR_REGNUM)))
+    cfun->machine->frame.emit_frame_chain = true;
+
 #define SLOT_NOT_REQUIRED (-2)
 #define SLOT_REQUIRED (-1)
 
@@ -2825,7 +2819,7 @@ aarch64_layout_frame (void)
 last_fp_reg = regno;
   }
 
-  if (frame_pointer_needed)
+  if (cfun->machine->frame.emit_frame_chain)
 {
   /* FP and LR are placed in the linkage record.  */
   cfun->machine->frame.reg_offset[R29_REGNUM] = 0;
@@ -2997,7 +2991,7 @@ aarch64_pushwb_single_reg (machine_mode mode, unsigned 
regno,
   reg = gen_rtx_REG (mode, regno);
   mem = gen_rtx_PRE_MODIFY (Pmode, base_rtx,
 plus_constant (Pmode, base_rtx, -adjustment));
-  mem = gen_rtx_MEM (mode, mem);
+  mem = gen_frame_mem (mode, mem);
 
   

Re: [PATCH v3][AArch64] Fix symbol offset limit

2017-07-21 Thread Wilco Dijkstra
    

ping

From: Wilco Dijkstra
Sent: 17 January 2017 15:14
To: Richard Earnshaw; GCC Patches; James Greenhalgh
Cc: nd
Subject: Re: [PATCH v3][AArch64] Fix symbol offset limit
    
Here is v3 of the patch - tree_fits_uhwi_p was necessary to ensure the size of a
declaration is an integer. So the question is whether we should allow
largish offsets outside of the bounds of symbols (v1), no offsets (this 
version), or
small offsets (small negative and positive offsets just outside a symbol are 
common).
The only thing we can't allow is any offset like we currently do...

In aarch64_classify_symbol symbols are allowed full-range offsets on 
relocations.
This means the offset can use all of the +/-4GB offset, leaving no offset 
available
for the symbol itself.  This results in relocation overflow and link-time errors
for simple expressions like _char + 0xff00.

To avoid this, limit the offset to +/-1GB so that the symbol needs to be within 
a
3GB offset from its references.  For the tiny code model use a 64KB offset, 
allowing
most of the 1MB range for code/data between the symbol and its references.
For symbols with a defined size, limit the offset to be within the size of the 
symbol.


ChangeLog:
2017-01-17  Wilco Dijkstra  

    gcc/
    * config/aarch64/aarch64.c (aarch64_classify_symbol):
    Apply reasonable limit to symbol offsets.

    testsuite/
    * gcc.target/aarch64/symbol-range.c (foo): Set new limit.
    * gcc.target/aarch64/symbol-range-tiny.c (foo): Likewise.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
e8d65ead95a3c5730c2ffe64a9e057779819f7b4..f1d54e332dc1cf1ef0bc4b1e46b0ebebe1c4cea4
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9809,6 +9809,8 @@ aarch64_classify_symbol (rtx x, rtx offset)
   if (aarch64_tls_symbol_p (x))
 return aarch64_classify_tls_symbol (x);
 
+  const_tree decl = SYMBOL_REF_DECL (x);
+
   switch (aarch64_cmodel)
 {
 case AARCH64_CMODEL_TINY:
@@ -9817,25 +9819,45 @@ aarch64_classify_symbol (rtx x, rtx offset)
  we have no way of knowing the address of symbol at compile time
  so we can't accurately say if the distance between the PC and
  symbol + offset is outside the addressible range of +/-1M in the
-    TINY code model.  So we rely on images not being greater than
-    1M and cap the offset at 1M and anything beyond 1M will have to
-    be loaded using an alternative mechanism.  Furthermore if the
-    symbol is a weak reference to something that isn't known to
-    resolve to a symbol in this module, then force to memory.  */
+    TINY code model.  So we limit the maximum offset to +/-64KB and
+    assume the offset to the symbol is not larger than +/-(1M - 64KB).
+    Furthermore force to memory if the symbol is a weak reference to
+    something that doesn't resolve to a symbol in this module.  */
   if ((SYMBOL_REF_WEAK (x)
    && !aarch64_symbol_binds_local_p (x))
- || INTVAL (offset) < -1048575 || INTVAL (offset) > 1048575)
+ || !IN_RANGE (INTVAL (offset), -0x10000, 0x10000))
 return SYMBOL_FORCE_TO_MEM;
+
+ /* Limit offset to within the size of a declaration if available.  */
+ if (decl && DECL_P (decl))
+   {
+ const_tree decl_size = DECL_SIZE (decl);
+
+ if (tree_fits_uhwi_p (decl_size)
+ && !IN_RANGE (INTVAL (offset), 0, tree_to_uhwi (decl_size)))
+   return SYMBOL_FORCE_TO_MEM;
+   }
+
   return SYMBOL_TINY_ABSOLUTE;
 
 case AARCH64_CMODEL_SMALL:
   /* Same reasoning as the tiny code model, but the offset cap here is
-    4G.  */
+    1G, allowing +/-3G for the offset to the symbol.  */
   if ((SYMBOL_REF_WEAK (x)
    && !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (INTVAL (offset), HOST_WIDE_INT_C (-4294967263),
-   HOST_WIDE_INT_C (4294967264)))
+ || !IN_RANGE (INTVAL (offset), -0x40000000, 0x40000000))
 return SYMBOL_FORCE_TO_MEM;
+
+ /* Limit offset to within the size of a declaration if available.  */
+ if (decl && DECL_P (decl))
+   {
+ const_tree decl_size = DECL_SIZE (decl);
+
+ if (tree_fits_uhwi_p (decl_size)
+ && !IN_RANGE (INTVAL (offset), 0, tree_to_uhwi (decl_size)))
+   return SYMBOL_FORCE_TO_MEM;
+   }
+
   return SYMBOL_SMALL_ABSOLUTE;
 
 case AARCH64_CMODEL_TINY_PIC:
diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c 
b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
index 
d7e46b059e41f2672b3a1da5506fa8944e752e01..d49ff4dbe5786ef6d343d2b90052c09676dd7fe5
 100644
--- 

Re: [PATCH][AArch64] Improve aarch64_legitimate_constant_p

2017-07-21 Thread Wilco Dijkstra

    
ping
    
This patch further improves aarch64_legitimate_constant_p.  Allow all
integer, floating point and vector constants.  Allow label references
and non-anchor symbols with an immediate offset.  This allows such
constants to be rematerialized, resulting in smaller code and fewer stack
spills.

SPEC2006 codesize reduces by 0.08%, SPEC2017 by 0.13%.

Bootstrap OK, OK for commit?

ChangeLog:
2017-07-07  Wilco Dijkstra  

    * config/aarch64/aarch64.c (aarch64_legitimate_constant_p):
    Return true for more constants, symbols and label references.
    (aarch64_valid_floating_const): Remove unused function.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
a2eca64a9c13e44d223b5552c079ef4e09659e84..810c17416db01681e99a9eb8cc9f5af137ed2054
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10173,49 +10173,46 @@ aarch64_legitimate_pic_operand_p (rtx x)
   return true;
 }
 
-/* Return true if X holds either a quarter-precision or
- floating-point +0.0 constant.  */
-static bool
-aarch64_valid_floating_const (machine_mode mode, rtx x)
-{
-  if (!CONST_DOUBLE_P (x))
-    return false;
-
-  if (aarch64_float_const_zero_rtx_p (x))
-    return true;
-
-  /* We only handle moving 0.0 to a TFmode register.  */
-  if (!(mode == SFmode || mode == DFmode))
-    return false;
-
-  return aarch64_float_const_representable_p (x);
-}
+/* Implement TARGET_LEGITIMATE_CONSTANT_P hook.  Return true for constants
+   that should be rematerialized rather than spilled.  */
 
 static bool
 aarch64_legitimate_constant_p (machine_mode mode, rtx x)
 {
+  /* Support CSE and rematerialization of common constants.  */
+  if (CONST_INT_P (x) || CONST_DOUBLE_P (x) || GET_CODE (x) == CONST_VECTOR)
+    return true;
+
   /* Do not allow vector struct mode constants.  We could support
  0 and -1 easily, but they need support in aarch64-simd.md.  */
-  if (TARGET_SIMD && aarch64_vect_struct_mode_p (mode))
+  if (aarch64_vect_struct_mode_p (mode))
 return false;
 
-  /* This could probably go away because
- we now decompose CONST_INTs according to expand_mov_immediate.  */
-  if ((GET_CODE (x) == CONST_VECTOR
-   && aarch64_simd_valid_immediate (x, mode, false, NULL))
-  || CONST_INT_P (x) || aarch64_valid_floating_const (mode, x))
-   return !targetm.cannot_force_const_mem (mode, x);
+  /* Do not allow wide int constants - this requires support in movti.  */
+  if (CONST_WIDE_INT_P (x))
+    return false;
 
-  if (GET_CODE (x) == HIGH
-  && aarch64_valid_symref (XEXP (x, 0), GET_MODE (XEXP (x, 0
-    return true;
+  /* Do not allow const (plus (anchor_symbol, const_int)).  */
+  if (GET_CODE (x) == CONST && GET_CODE (XEXP (x, 0)) == PLUS)
+  {
+    x = XEXP (XEXP (x, 0), 0);
+    if (SYMBOL_REF_P (x) && SYMBOL_REF_ANCHOR_P (x))
+  return false;
+  }
+
+  if (GET_CODE (x) == HIGH)
+    x = XEXP (x, 0);
 
   /* Treat symbols as constants.  Avoid TLS symbols as they are complex,
  so spilling them is better than rematerialization.  */
   if (SYMBOL_REF_P (x) && !SYMBOL_REF_TLS_MODEL (x))
 return true;
 
-  return aarch64_constant_address_p (x);
+  /* Label references are always constant.  */
+  if (GET_CODE (x) == LABEL_REF)
+    return true;
+
+  return false;
 }
 
 rtx    

Re: [PATCH][AArch64] Fix PR79041

2017-07-21 Thread Wilco Dijkstra

    
ping

    
As described in PR79041, -mcmodel=large -mpc-relative-literal-loads
may be used to avoid generating ADRP/ADD or ADRP/LDR.  However both
trunk and GCC7 may still emit ADRP for some constant pool literals.
Fix this by adding an aarch64_pcrelative_literal_loads check.

OK for trunk/GCC7 backport?

ChangeLog:
2017-06-27  Wilco Dijkstra  

    PR target/79041
    * config/aarch64/aarch64.c (aarch64_classify_symbol):
    Avoid SYMBOL_SMALL_ABSOLUTE.
    * testsuite/gcc.target/aarch64/pr79041-2.c: New test.
--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
060cd8476d2954119daac495ecb059c9be73edbe..329d244e9cf16dbdf849e5dd02b3999caf0cd5a7
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10042,7 +10042,7 @@ aarch64_classify_symbol (rtx x, rtx offset)
   /* This is alright even in PIC code as the constant
  pool reference is always PC relative and within
  the same translation unit.  */
- if (CONSTANT_POOL_ADDRESS_P (x))
+ if (CONSTANT_POOL_ADDRESS_P (x) && !aarch64_pcrelative_literal_loads)
 return SYMBOL_SMALL_ABSOLUTE;
   else
 return SYMBOL_FORCE_TO_MEM;
diff --git a/gcc/testsuite/gcc.target/aarch64/pr79041-2.c 
b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
new file mode 100644
index 
..e7899725bad2b770f8488a07f99792113275bdf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcmodel=large -mpc-relative-literal-loads" } */
+
+__int128
+t (void)
+{
+  return (__int128)1 << 80;
+}
+
+/* { dg-final { scan-assembler "adr" } } */
    

Re: [AARCH64] Disable pc relative literal load irrespective of TARGET_FIX_ERR_A53_84341

2017-07-21 Thread Kugan Vivekanandarajah
Ping ?

Thanks,
Kugan

On 27 June 2017 at 11:20, Kugan Vivekanandarajah
 wrote:
> https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00614.html  added this
> workaround to get kernel building with when TARGET_FIX_ERR_A53_843419
> is enabled.
>
> This was added to support building kernel loadable modules. In kernel,
> when CONFIG_ARM64_ERRATUM_843419 is selected, the relocations needed
> for ADRP/LDR (R_AARCH64_ADR_PREL_PG_HI21 and
> R_AARCH64_ADR_PREL_PG_HI21_NC) are removed from the kernel to avoid
> loading objects with a possibly offending sequence.  Thus, it can only
> support pc-relative literal loads.
>
> However, the following patch was posted to kernel to add
> -mpc-relative-literal-loads
> http://www.spinics.net/lists/arm-kernel/msg476149.html
>
> -mpc-relative-literal-loads is unconditionally added to the kernel
> build as can be seen from:
> https://github.com/torvalds/linux/blob/master/arch/arm64/Makefile
>
> Therefore this patch removes the hunk so that applications like
> SPECcpu2017's 521/621.wrf can be built (with LTO in this case) without
> -mno-pc-relative-literal-loads
>
> Bootstrapped and regression tested on aarch64-linux-gnu with no new 
> regressions.
>
> Is this OK for trunk?
>
> Thanks,
> Kugan
>
> gcc/testsuite/ChangeLog:
>
> 2017-06-27  Kugan Vivekanandarajah  
>
> * gcc.target/aarch64/pr63304_1.c: Remove -mno-fix-cortex-a53-843419.
>
> gcc/ChangeLog:
>
> 2017-06-27  Kugan Vivekanandarajah  
>
> * config/aarch64/aarch64.c (aarch64_override_options_after_change_1):
> Disable pc relative literal load irrespective of TARGET_FIX_ERR_A53_843419
> for default.


[committed, nvptx] Add nvptx_override_options_after_change

2017-07-21 Thread Tom de Vries

Hi,

this patch adds nvptx_override_options_after_change, containing a 
workaround for PR81430.


Tested on x86_64 with nvptx accelerator.

Committed.

Thanks,
- Tom
Add nvptx_override_options_after_change

2017-07-21  Tom de Vries  

	PR lto/81430
	* config/nvptx/nvptx.c (nvptx_override_options_after_change): New
	function.
	(TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE): Define to
	nvptx_override_options_after_change.

---
 gcc/config/nvptx/nvptx.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index d0aa054..a718054 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -207,6 +207,17 @@ nvptx_option_override (void)
 target_flags |= MASK_SOFT_STACK | MASK_UNIFORM_SIMT;
 }
 
+/* Implement TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE.  */
+
+static void
+nvptx_override_options_after_change (void)
+{
+  /* This is a workaround for PR81430 - nvptx acceleration compilation broken
+ because of running pass_partition_blocks.  This should be dealt with in the
+ common code, not in the target.  */
+  flag_reorder_blocks_and_partition = 0;
+}
+
 /* Return a ptx type for MODE.  If PROMOTE, then use .u32 for QImode to
deal with ptx ideosyncracies.  */
 
@@ -5505,6 +5516,9 @@ nvptx_data_alignment (const_tree type, unsigned int basic_align)
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE nvptx_option_override
 
+#undef TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE
+#define TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE nvptx_override_options_after_change
+
 #undef TARGET_ATTRIBUTE_TABLE
 #define TARGET_ATTRIBUTE_TABLE nvptx_attribute_table
 


Re: [PATCH, PR81430] Use finalize_options in lto1

2017-07-21 Thread Richard Biener
On Thu, 20 Jul 2017, Tom de Vries wrote:

> On 07/20/2017 12:10 PM, Richard Biener wrote:
> > On Thu, 20 Jul 2017, Tom de Vries wrote:
> > 
> > > Hi,
> > > 
> > > this patch fixes PR81430, an ICE in the libgomp testsuite for both openmp
> > > and
> > > openacc test-cases for x86_64 with nvptx accelerator.
> > > 
> > > The scenario how we hit the ICE is as follows:
> > > - a testcase is compiled with -O2
> > > - ix86_option_optimization_table enables
> > >OPT_freorder_blocks_and_partition at -O2
> > > - cc1 writes out the flag as part of DECL_FUNCTION_SPECIFIC_OPTIMIZATION
> > > - lto1 reads in the flag as part of DECL_FUNCTION_SPECIFIC_OPTIMIZATION
> > > - lto1 uses the flag, and runs pass_partition_blocks
> > > - pass_partition_blocks ICEs, because it generates code that is not
> > >supported by the nvptx target.
> > > 
> > > Note that for standalone compilation for single-thread ptx execution, we
> > > don't
> > > attempt to run pass_partition_blocks. This is because for nvptx,
> > > TARGET_HAVE_NAMED_SECTIONS is set to false, and this bit in finish_options
> > > switches off pass_partition_blocks:
> > > ...
> > > /* If the target requested unwind info, then turn off the
> > >partitioning optimization with a different message.  Likewise, if
> > >the target does not support named sections.  */
> > > 
> > >if (opts->x_flag_reorder_blocks_and_partition
> > >&& (!targetm_common.have_named_sections
> > >|| (opts->x_flag_unwind_tables
> > >&& targetm_common.unwind_tables_default
> > >&& (ui_except == UI_SJLJ || ui_except >= UI_TARGET))))
> > >  {
> > >if (opts_set->x_flag_reorder_blocks_and_partition)
> > >  inform (loc,
> > >  "-freorder-blocks-and-partition does not work "
> > >  "on this architecture");
> > >opts->x_flag_reorder_blocks_and_partition = 0;
> > >opts->x_flag_reorder_blocks = 1;
> > >  }
> > > ...
> > > 
> > > The patch fixes this by calling finish_options in lto1 after
> > > cl_optimization_restore.
> > > 
> > > Points for review:
> > > 1. I'm uncertain though about the placement of the call. Perhaps it should
> > > be
> > > in cl_optimization_restore, before targetm.override_options_after_change?
> > > 
> > > 2. I think that this is offloading specific, so perhaps this should be
> > > guarded
> > > with lto_stream_offload_p or #ifdef ACCEL_COMPILER or some such.
> > 
> > Hmm, I agree with #2.  I think it conceptually is a LTO stream adjustment
> > and thus we should do this at the time we stream in the
> > optimization/target nodes (like we remap modes for example).  Not
> > sure if it's possible to do this at that point, but it looks like
> > finish_options takes two option structs and thus we should be able to
> > call it.
> > 
> 
> With what parameters? Patch below tries with same option struct, but ...
> 
> > Do you get the inform note?  I suppose we don't really want that, no?
> > 
> 
> ... I think that way we'll get the inform note (while the previous solution
> did not).
> 
> I've also tried with a tmp2 memset to 0, but that ran into problems when doing
> a maybe_set_param_value.

Use global_options_set?  That should do what the other patch did.

> Thanks,
> - Tom
> 
> diff --git a/gcc/tree-streamer-in.c b/gcc/tree-streamer-in.c
> index 7f7ea7f..e0e792b 100644
> --- a/gcc/tree-streamer-in.c
> +++ b/gcc/tree-streamer-in.c
> @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-chkp.h"
>  #include "gomp-constants.h"
>  #include "asan.h"
> +#include "opts.h"
> 
> 
>  /* Read a STRING_CST from the string table in DATA_IN using input
> @@ -769,6 +770,20 @@ lto_input_ts_function_decl_tree_pointers (struct
> lto_input_block *ib,
>DECL_FUNCTION_SPECIFIC_TARGET (expr) = stream_read_tree (ib, data_in);
>  #endif
>DECL_FUNCTION_SPECIFIC_OPTIMIZATION (expr) = stream_read_tree (ib,
> data_in);
> +#ifdef ACCEL_COMPILER
> +  {
> +tree opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (expr);
> +if (opts)
> +  {
> +   struct gcc_options tmp;
> +   cl_optimization_restore (&tmp, TREE_OPTIMIZATION (opts));
> +   finish_options (&tmp, &tmp, DECL_SOURCE_LOCATION (expr));
> +   opts = build_optimization_node (&tmp);
> +
> +   DECL_FUNCTION_SPECIFIC_OPTIMIZATION (expr) = opts;
> +  }
> +  }
> +#endif
> 
>/* If the file contains a function with an EH personality set,
>   then it was compiled with -fexceptions.  In that case, initialize
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] update edge profile info in nvptx.c

2017-07-21 Thread Tom de Vries

On 07/20/2017 03:04 PM, Tom de Vries wrote:

On 07/13/2017 06:53 PM, Cesar Philippidis wrote:

Similarly, for nvptx vector reductions, when it comes time to initialize
the reduction variable, the nvptx BE constructs a branch so that only
vector lanes 1 to vector_length-1 are initialized to the default value
for a given reduction type, where vector lane 0 retains the original
value of the reduction variable. For similar reasons to the gang and
worker reductions, I set the probability of the new edge introduced for
the vector reduction to even.



Hi,

The problem that you describe in abstract terms looks like this concretely:

(gdb) call debug_bb_n (4)
;; basic block 4, loop depth 0, freq 662, maybe hot
;;  prev block 3, next block 16, flags: (VISITED)
;;  pred:   3 [always (guessed)]  (FALLTHRU,EXECUTABLE)
# VUSE <.MEM_61>
# PT = nonlocal unit-escaped null
_18 = MEM[(const struct .omp_data_t.33D.1518 &).omp_data_i_9(D)
   clique 1 base 1].s2D.1519;
# VUSE <.MEMD.1540>
# USE = anything
_72 = GOACC_DIM_POS (2);
if (_72 != 0)
   goto <bb 16>; [100.00%] [count: INV]
else
   goto <bb 17>; [INV] [count: INV]
;;  succ:   16 [always]  (TRUE_VALUE)
;;  17 (FALSE_VALUE)
...

The edge to bb16 has probability 100%. The edge to bb17 has no 
probability set.




Hi,


I.

the patch below fixes the probabilities on the outgoing edges, setting 
them to even:

...
(gdb) call debug_bb_n (4)
;; basic block 4, loop depth 0, freq 662, maybe hot
;;  prev block 3, next block 16, flags: (VISITED)
;;  pred:   3 [always (guessed)]  (FALLTHRU,EXECUTABLE)
# VUSE <.MEM_61>
# PT = nonlocal unit-escaped null
_18 = MEM[(const struct .omp_data_t.33D.1518 &).omp_data_i_9(D) clique 1 
base 1].s2D.1519;

# VUSE <.MEMD.1540>
# USE = anything
_72 = GOACC_DIM_POS (2);
if (_72 != 0)
  goto <bb 16>; [50.00%] [count: INV]
else
  goto <bb 17>; [50.00%] [count: INV]
;;  succ:   16 [50.0% (adjusted)]  (TRUE_VALUE)
;;  17 [50.0% (adjusted)]  (FALSE_VALUE)
...


II.

The quality is 'adjusted'. [ Even() first calls always() which has 
quality precise, and then applies scale(), which downgrades the quality 
from 'precise' to 'adjusted'. ]


The reason for that is explained in this comment AFAIU:
...
   Named probabilities except for never/always are assumed to be
   statically guessed and thus not necessarily accurate.
...

When I look at the definitions of 'adjusted' and 'precise':
...
  /* Profile was originally based on feedback but it was adjusted
 by code duplicating optimization.  It may not precisely reflect the
 particular code path.  */
  profile_adjusted = 2,
  /* Profile was read from profile feedback or determined by accurate
 static method.  */
  profile_precise = 3
...

I wonder: there seem to be two situations in which 'precise' is possible:
- Profile was read from profile feedback
- Profile was determined by accurate static method
But there is only one situation where 'adjusted' is possible:
- Profile was originally based on feedback but it was adjusted by code
  duplicating optimization.
I can imagine as well that we originally have a static method giving 
precise information, and that this information is downgraded by a code 
duplication optimization.


So, should this be instead:
...
  /* Profile was originally based on feedback or accurate static method,
 but it was adjusted by code duplicating optimization.  It may not
 precisely reflect the particular code path.  */
  profile_adjusted = 2,
...
?
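The quality downgrade described above can be isolated in a few lines. The following is a simplified standalone model for illustration only; the enum, struct, and function names are invented and GCC's actual profile_probability class is considerably more involved:

```c
#include <assert.h>

/* Simplified model of the behavior described above: even() is always()
   (quality 'precise') followed by a scale, and scaling downgrades the
   quality to 'adjusted'.  Names and representation are invented.  */

enum quality { QUAL_GUESSED = 1, QUAL_ADJUSTED = 2, QUAL_PRECISE = 3 };

struct prob { int percent; enum quality q; };

static struct prob
model_always (void)
{
  struct prob p = { 100, QUAL_PRECISE };
  return p;
}

static struct prob
model_scale (struct prob p, int num, int den)
{
  p.percent = p.percent * num / den;
  if (p.q == QUAL_PRECISE)
    p.q = QUAL_ADJUSTED;	/* Scaling downgrades 'precise'.  */
  return p;
}

static struct prob
model_even (void)
{
  return model_scale (model_always (), 1, 2);
}
```

Under this model, model_even() yields 50% with quality 'adjusted', matching the dump output above, even though in this case 50/50 is arguably an accurate static fact.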


III.

In this particular case, we insert a conditional jump based on a special 
machine register which gives us the knowledge that both the true and 
false edge are equally likely (in fact, both are executed, with 
different warp enabling mask).


So I think the most accurate representation would be 50/50 'precise', 
but that does not fit with the assumption above.


I'll commit the patch below for now.

But I'm curious if you have any comments.

Thanks,
- Tom

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 6314653..2c427fa 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5149,6 +5149,7 @@ nvptx_goacc_reduction_init (gcall *call)

   /* Fixup flags from call_bb to init_bb.  */
   init_edge->flags ^= EDGE_FALLTHRU | EDGE_TRUE_VALUE;
+  init_edge->probability = profile_probability::even ();

   /* Set the initialization stmts.  */
   gimple_seq init_seq = NULL;
@@ -5164,6 +5165,7 @@ nvptx_goacc_reduction_init (gcall *call)

   /* Create false edge from call_bb to dst_bb.  */
   edge nop_edge = make_edge (call_bb, dst_bb, EDGE_FALSE_VALUE);
+  nop_edge->probability = profile_probability::even ();

   /* Create phi node in dst block.  */
   gphi *phi = create_phi_node (lhs, dst_bb);



Re: [PATCH][1/n] Fix PR81303

2017-07-21 Thread Richard Biener
On Fri, 21 Jul 2017, Bin.Cheng wrote:

> On Fri, Jul 21, 2017 at 8:12 AM, Richard Biener  wrote:
> >
> > The following is sth I noticed when looking at a way to fix PR81303.
> > We happily compute a runtime cost model threshold that executes the
> > vectorized variant even though no vector iteration takes place due
> > to the number of prologue/epilogue iterations.  The following fixes
> > that -- note that if we do not know the prologue/epilogue counts
> > statically they are estimated at vf/2 which means there's still the
> > chance the vector iteration won't execute.  To fix that we'd have to
> > estimate those as vf-1 instead, sth we might consider doing anyway
> > given that we regularly completely peel the epilogues vf-1 times
> > in that case.  Maybe as followup.
> Hi,
> Do we consider disabling epilogue peeling if # of iters is unknown at
> compilation time?

Well, we peel, which, if size doesn't explode, should be profitable
because of better branch prediction.

>  When loop is versioned, could we share epilogue
> with versioned loop?  Epilogue was shared before?

When not versioning we share the epilogue, yes.  With versioning
we don't because we use the versioning code.  It looks like sharing
the epilogue makes the RA's job harder though.

Richard.

> Thanks,
> bin
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> >
> > Richard.
> >
> > 2016-07-21  Richard Biener  
> >
> > PR tree-optimization/81303
> > * tree-vect-loop.c (vect_estimate_min_profitable_iters): Take
> > into account prologue and epilogue iterations when raising
> > min_profitable_iters to sth at least covering one vector iteration.
> >
> > Index: gcc/tree-vect-loop.c
> > ===
> > --- gcc/tree-vect-loop.c(revision 250384)
> > +++ gcc/tree-vect-loop.c(working copy)
> > @@ -3702,8 +3702,9 @@ vect_estimate_min_profitable_iters (loop
> >"  Calculated minimum iters for profitability: %d\n",
> >min_profitable_iters);
> >
> > -  min_profitable_iters =
> > -   min_profitable_iters < vf ? vf : min_profitable_iters;
> > +  /* We want the vectorized loop to execute at least once.  */
> > +  if (min_profitable_iters < (vf + peel_iters_prologue + 
> > peel_iters_epilogue))
> > +min_profitable_iters = vf + peel_iters_prologue + peel_iters_epilogue;
> >
> >if (dump_enabled_p ())
> >  dump_printf_loc (MSG_NOTE, vect_location,
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH][2/n] Fix PR81303

2017-07-21 Thread Richard Biener

This fixes mismatched cost computation in peeling cost estimation by
using the same routine also for costing the no-peeling case (the original
implementation failed to handle the same special cases, such as
groups, and thus always over-estimated the no-peeling cost...).

It doesn't disable peeling for bwaves for me because we still
improve (as expected) cost by 1 (but outside cost increases
significantly).  Factoring in outside cost looks difficult so
maybe instead of a hard compare (13 vs. 14 in this case) we
should require a percentage improvement.

Anyway, this fixes a bug.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-07-21  Richard Biener  

PR tree-optimization/81303
* tree-vect-data-refs.c (vect_get_peeling_costs_all_drs): Pass
in datarefs vector.  Allow NULL dr0 for no peeling cost estimate.
(vect_peeling_hash_get_lowest_cost): Adjust.
(vect_enhance_data_refs_alignment): Likewise.  Use
vect_get_peeling_costs_all_drs to compute the penalty for no
peeling to match up costs.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 250386)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -1159,25 +1159,21 @@ vect_peeling_hash_get_most_frequent (_ve
misalignment will be zero after peeling.  */
 
 static void
-vect_get_peeling_costs_all_drs (struct data_reference *dr0,
+vect_get_peeling_costs_all_drs (vec<data_reference_p> datarefs,
+   struct data_reference *dr0,
unsigned int *inside_cost,
unsigned int *outside_cost,
stmt_vector_for_cost *body_cost_vec,
unsigned int npeel,
bool unknown_misalignment)
 {
-  gimple *stmt = DR_STMT (dr0);
-  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
-  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
-  vec<data_reference_p> datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
-
   unsigned i;
   data_reference *dr;
 
   FOR_EACH_VEC_ELT (datarefs, i, dr)
 {
-  stmt = DR_STMT (dr);
-  stmt_info = vinfo_for_stmt (stmt);
+  gimple *stmt = DR_STMT (dr);
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   /* For interleaving, only the alignment of the first access
  matters.  */
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
@@ -1192,7 +1188,9 @@ vect_get_peeling_costs_all_drs (struct d
 
   int save_misalignment;
   save_misalignment = DR_MISALIGNMENT (dr);
-  if (unknown_misalignment && dr == dr0)
+  if (npeel == 0)
+   ;
+  else if (unknown_misalignment && dr == dr0)
SET_DR_MISALIGNMENT (dr, 0);
   else
vect_update_misalignment_for_peel (dr, dr0, npeel);
@@ -1222,7 +1220,8 @@ vect_peeling_hash_get_lowest_cost (_vect
   body_cost_vec.create (2);
   epilogue_cost_vec.create (2);
 
-  vect_get_peeling_costs_all_drs (elem->dr, &inside_cost, &outside_cost,
+  vect_get_peeling_costs_all_drs (LOOP_VINFO_DATAREFS (loop_vinfo),
+ elem->dr, &inside_cost, &outside_cost,
  &body_cost_vec, elem->npeel, false);
 
   body_cost_vec.release ();
@@ -1651,7 +1650,7 @@ vect_enhance_data_refs_alignment (loop_v
 
   stmt_vector_for_cost dummy;
   dummy.create (2);
-  vect_get_peeling_costs_all_drs (dr0,
+  vect_get_peeling_costs_all_drs (datarefs, dr0,
  &load_inside_cost,
  &load_outside_cost,
  &dummy, vf / 2, true);
@@ -1660,7 +1659,7 @@ vect_enhance_data_refs_alignment (loop_v
   if (first_store)
{
  dummy.create (2);
- vect_get_peeling_costs_all_drs (first_store,
+ vect_get_peeling_costs_all_drs (datarefs, first_store,
  &store_inside_cost,
  &store_outside_cost,
  &dummy, vf / 2, true);
@@ -1744,18 +1743,15 @@ vect_enhance_data_refs_alignment (loop_v
 dr0 = unsupportable_dr;
   else if (do_peeling)
 {
-  /* Calculate the penalty for no peeling, i.e. leaving everything
-unaligned.
-TODO: Adapt vect_get_peeling_costs_all_drs and use here.
+  /* Calculate the penalty for no peeling, i.e. leaving everything as-is.
 TODO: Use nopeel_outside_cost or get rid of it?  */
   unsigned nopeel_inside_cost = 0;
   unsigned nopeel_outside_cost = 0;
 
   stmt_vector_for_cost dummy;
   dummy.create (2);
-  FOR_EACH_VEC_ELT (datarefs, i, dr)
-   vect_get_data_access_cost (dr, &nopeel_inside_cost,
-  &nopeel_outside_cost, &dummy);
+  vect_get_peeling_costs_all_drs (datarefs, NULL, &nopeel_inside_cost,
+ &nopeel_outside_cost, &dummy, 0, false);
   dummy.release ();
 
   /* Add epilogue costs.  As we do not peel for 

Re: [PATCH] trivial cleanup in dwarf2out.c

2017-07-21 Thread Jakub Jelinek
On Fri, Jul 21, 2017 at 10:36:46AM +0200, Ulrich Drepper wrote:
> While looking through dwarf2out.c I came across this if expression where
> supposedly, in case DWARF before 5 is used, the LEB128 encoding is used.
> This of course cannot be the case.  There isn't really a deeper problem
> since the entire block is guarded by a test for at least DWARF 5.
> 
> I propose the following patch.
> 
> [gcc/ChangeLog]
> 
> 2017-07-21  Ulrich Drepper  
> 
>   * dwarf2out.c (output_file_names): Avoid double testing for
>   dwarf_version >= 5.

Ok, thanks.

> --- gcc/dwarf2out.c   2017-07-21 06:15:26.993826963 +0200
> +++ gcc/dwarf2out.c-new   2017-07-21 10:29:03.382742797 +0200
> @@ -11697,7 +11697,7 @@ output_file_names (void)
>output_line_string (str_form, filename0, "File Entry", 0);
> 
>/* Include directory index.  */
> -  if (dwarf_version >= 5 && idx_form != DW_FORM_udata)
> +  if (idx_form != DW_FORM_udata)
>   dw2_asm_output_data (idx_form == DW_FORM_data1 ? 1 : 2,
>0, NULL);
>else

Jakub


[PATCH] trivial cleanup in dwarf2out.c

2017-07-21 Thread Ulrich Drepper
While looking through dwarf2out.c I came across this if expression where
supposedly, in case DWARF before 5 is used, the LEB128 encoding is used.
This of course cannot be the case.  There isn't really a deeper problem
since the entire block is guarded by a test for at least DWARF 5.

I propose the following patch.

[gcc/ChangeLog]

2017-07-21  Ulrich Drepper  

* dwarf2out.c (output_file_names): Avoid double testing for
dwarf_version >= 5.

--- gcc/dwarf2out.c 2017-07-21 06:15:26.993826963 +0200
+++ gcc/dwarf2out.c-new 2017-07-21 10:29:03.382742797 +0200
@@ -11697,7 +11697,7 @@ output_file_names (void)
   output_line_string (str_form, filename0, "File Entry", 0);

   /* Include directory index.  */
-  if (dwarf_version >= 5 && idx_form != DW_FORM_udata)
+  if (idx_form != DW_FORM_udata)
dw2_asm_output_data (idx_form == DW_FORM_data1 ? 1 : 2,
 0, NULL);
   else
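For reference, DW_FORM_udata values are emitted as unsigned LEB128. A minimal encoder for that representation looks like the following; this is a generic sketch, independent of GCC's own output routines:

```c
#include <stddef.h>
#include <stdint.h>

/* Generic unsigned LEB128 encoder (the encoding behind DW_FORM_udata):
   7 value bits per byte, low-order group first, high bit set on every
   byte except the last.  Illustrative sketch only.  */

static size_t
encode_uleb128 (uint64_t value, unsigned char *buf)
{
  size_t n = 0;
  do
    {
      unsigned char byte = value & 0x7f;
      value >>= 7;
      if (value != 0)
	byte |= 0x80;	/* More bytes follow.  */
      buf[n++] = byte;
    }
  while (value != 0);
  return n;
}
```

For example, the classic value 624485 encodes to the three bytes e5 8e 26.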


Re: Enable crossjumping with bb-reordering

2017-07-21 Thread Kyrill Tkachov

Hi Honza,

Just a couple of typo nits.

Cheers,
Kyrill

On 21/07/17 00:38, Jan Hubicka wrote:

Hello,
Caroline disabled crossjumping with bb-reordering in the initial patch
implementing this pass.  According to discussion with Rth it was only because
she was unsure how to prevent crossjumping from mixing up crossing and
non-crossing jumps.  This is easy to do - we only need to avoid merging code
across sections, as done by this patch.

Bootstrapped/regtested x86_64-linux, will commit it tomorrow.

* cfgcleanup.c (flow_find_cross_jump): Do not crossjump across
hot/cold regions.
(try_crossjump_to_edge): Do not punt on partitioned functions.
Index: cfgcleanup.c
===
--- cfgcleanup.c(revision 250378)
+++ cfgcleanup.c(working copy)
@@ -1435,6 +1435,13 @@ flow_find_cross_jump (basic_block bb1, b
if (i1 == BB_HEAD (bb1) || i2 == BB_HEAD (bb2))
break;
  
+  /* Do not turn corssing edge to non-crossing or vice versa after


"crossing"


+reload. */
+  if (BB_PARTITION (BLOCK_FOR_INSN (i1))
+ != BB_PARTITION (BLOCK_FOR_INSN (i2))
+ && reload_completed)
+   break;
+
dir = merge_dir (dir, old_insns_match_p (0, i1, i2));
if (dir == dir_none || (!dir_p && dir != dir_both))
break;
@@ -1958,18 +1965,6 @@ try_crossjump_to_edge (int mode, edge e1
  
newpos1 = newpos2 = NULL;
  
-  /* If we have partitioned hot/cold basic blocks, it is a bad idea

- to try this optimization.
-
- Basic block partitioning may result in some jumps that appear to
- be optimizable (or blocks that appear to be mergeable), but which really
- must be left untouched (they are required to make it safely across
- partition boundaries).  See the comments at the top of
- bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
-
-  if (crtl->has_bb_partition && reload_completed)
-return false;
-
/* Search backward through forwarder blocks.  We don't need to worry
   about multiple entry or chained forwarders, as they will be optimized
   away.  We do this to look past the unconditional jump following a
@@ -2003,6 +1998,11 @@ try_crossjump_to_edge (int mode, edge e1
if (EDGE_COUNT (src1->preds) == 0 || EDGE_COUNT (src2->preds) == 0)
  return false;
  
+  /* Do not turn corssing edge to non-crossing or vice versa after reload.  */


Likewise.


+  if (BB_PARTITION (src1) != BB_PARTITION (src2)
+  && reload_completed)
+return false;
+
/* Look for the common insn sequence, part the first ...  */
if (!outgoing_edges_match (mode, src1, src2))
  return false;
@@ -2024,12 +2024,10 @@ try_crossjump_to_edge (int mode, edge e1
  
if (dir == dir_backward)

  {
-#define SWAP(T, X, Y) do { T tmp = (X); (X) = (Y); (Y) = tmp; } while (0)
-  SWAP (basic_block, osrc1, osrc2);
-  SWAP (basic_block, src1, src2);
-  SWAP (edge, e1, e2);
-  SWAP (rtx_insn *, newpos1, newpos2);
-#undef SWAP
+  std::swap (osrc1, osrc2);
+  std::swap (src1, src2);
+  std::swap (e1, e2);
+  std::swap (newpos1, newpos2);
  }
  
/* Don't proceed with the crossjump unless we found a sufficient number




Re: [PATCH][1/n] Fix PR81303

2017-07-21 Thread Bin.Cheng
On Fri, Jul 21, 2017 at 8:12 AM, Richard Biener  wrote:
>
> The following is sth I noticed when looking at a way to fix PR81303.
> We happily compute a runtime cost model threshold that executes the
> vectorized variant even though no vector iteration takes place due
> to the number of prologue/epilogue iterations.  The following fixes
> that -- note that if we do not know the prologue/epilogue counts
> statically they are estimated at vf/2 which means there's still the
> chance the vector iteration won't execute.  To fix that we'd have to
> estimate those as vf-1 instead, sth we might consider doing anyway
> given that we regularly completely peel the epilogues vf-1 times
> in that case.  Maybe as followup.
Hi,
Do we consider disabling epilogue peeling if # of iters is unknown at
compilation time?  When loop is versioned, could we share epilogue
with versioned loop?  Epilogue was shared before?

Thanks,
bin
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>
> Richard.
>
> 2016-07-21  Richard Biener  
>
> PR tree-optimization/81303
> * tree-vect-loop.c (vect_estimate_min_profitable_iters): Take
> into account prologue and epilogue iterations when raising
> min_profitable_iters to sth at least covering one vector iteration.
>
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c(revision 250384)
> +++ gcc/tree-vect-loop.c(working copy)
> @@ -3702,8 +3702,9 @@ vect_estimate_min_profitable_iters (loop
>"  Calculated minimum iters for profitability: %d\n",
>min_profitable_iters);
>
> -  min_profitable_iters =
> -   min_profitable_iters < vf ? vf : min_profitable_iters;
> +  /* We want the vectorized loop to execute at least once.  */
> +  if (min_profitable_iters < (vf + peel_iters_prologue + 
> peel_iters_epilogue))
> +min_profitable_iters = vf + peel_iters_prologue + peel_iters_epilogue;
>
>if (dump_enabled_p ())
>  dump_printf_loc (MSG_NOTE, vect_location,


[PATCH][1/n] Fix PR81303

2017-07-21 Thread Richard Biener

The following is sth I noticed when looking at a way to fix PR81303.
We happily compute a runtime cost model threshold that executes the
vectorized variant even though no vector iteration takes place due
to the number of prologue/epilogue iterations.  The following fixes
that -- note that if we do not know the prologue/epilogue counts
statically they are estimated at vf/2 which means there's still the
chance the vector iteration won't execute.  To fix that we'd have to
estimate those as vf-1 instead, sth we might consider doing anyway
given that we regularly completely peel the epilogues vf-1 times
in that case.  Maybe as followup.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-07-21  Richard Biener  

PR tree-optimization/81303
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Take
into account prologue and epilogue iterations when raising
min_profitable_iters to sth at least covering one vector iteration.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 250384)
+++ gcc/tree-vect-loop.c(working copy)
@@ -3702,8 +3702,9 @@ vect_estimate_min_profitable_iters (loop
   "  Calculated minimum iters for profitability: %d\n",
   min_profitable_iters);
 
-  min_profitable_iters =
-   min_profitable_iters < vf ? vf : min_profitable_iters;
+  /* We want the vectorized loop to execute at least once.  */
+  if (min_profitable_iters < (vf + peel_iters_prologue + peel_iters_epilogue))
+min_profitable_iters = vf + peel_iters_prologue + peel_iters_epilogue;
 
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
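The floor the hunk above introduces can be restated as a tiny helper. This is a hedged sketch of the logic for illustration, not the GCC code itself:

```c
/* Sketch of the threshold adjustment in the patch above: the runtime
   profitability threshold must cover the prologue and epilogue peel
   iterations plus at least one full vector iteration of length VF,
   otherwise the vectorized body would never execute.  */

static int
adjust_min_profitable_iters (int min_profitable_iters, int vf,
			     int peel_iters_prologue,
			     int peel_iters_epilogue)
{
  int floor = vf + peel_iters_prologue + peel_iters_epilogue;
  return min_profitable_iters < floor ? floor : min_profitable_iters;
}
```

With vf = 4 and one prologue/epilogue iteration each, a computed threshold of 3 would be raised to 6 so that the check cannot select the vector path for a trip count consumed entirely by peeling.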


[PATCH][RFA/RFC] Stack clash mitigation patch 00 V3

2017-07-21 Thread Jeff Law
V3 patches of stack-clash mitigation should land tomorrow after another
round of testing completes.  Key changes this iteration:



  1. For constant sized dynamic allocations we'll allocate/probe up to 4
 STACK_CLASH_PROTECTION_PROBE_INTERVAL regions inline and unrolled.

  2. For larger constant sized dynamic allocations we rotate the loop,
 saving a compare/jump.

  3. blockage insns added to prevent scheduler reordering, particularly
 in the inline/unrolled loop case.

  4. PARAMs to control the assumed size of the guard and the probing
 interval.  Both default to 4k.  Note that the backends may not
 support all possible values for these PARAMs.

 a. The size of the guard helps determine how big of a local static
frame can be allocated without probing on targets that have an
implicit probe in the caller

 b. The interval determines how often we probe once we decide
probing is required.

 c. Backends can override the default values.  aarch64 for example
overrides the guard size

  5. More aarch64 improvements based on discussions with Wilco, Richard
 and Ramana.

 a. Support for a probing interval > 4k.

 b. Assume guard of 64k, with 1k for outgoing arglist.  Thus frames
less than 63k require no probing.

  6. Additional tests for the unrolled inline dynamic case, rotated
 loop case and use of a large guard value to avoid probing
 (x86 and ppc only)
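Item 5b above amounts to a simple frame-size test against the assumed guard. A hypothetical sketch follows; the constants come from the description, but the exact boundary comparison is an assumption, and the real patches may differ:

```c
/* Hypothetical sketch of the decision in 5b: with an assumed 64k guard
   and 1k reserved for the outgoing argument area, static frames smaller
   than 63k need no explicit probes.  The boundary handling here is a
   guess, not taken from the patch series.  */

#define GUARD_SIZE	(64UL * 1024)
#define OUTGOING_ARGS	(1UL * 1024)

static int
frame_needs_probes (unsigned long frame_size)
{
  return frame_size >= GUARD_SIZE - OUTGOING_ARGS;
}
```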


Jeff


[PING][Arm] Obsoleting Command line option -mstructure-size-boundary in eabi configurations

2017-07-21 Thread Michael Collison
Ping. Updated patch posted here:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00723.html