Re: [PATCH] Prevent LTO wrappers to process a recursive execution

2016-06-22 Thread Jeff Law

On 04/25/2016 09:49 AM, Martin Liška wrote:

Hello.

To make the LTO wrappers (gcc-nm, gcc-ar, gcc-ranlib) smarter, I would like to
prevent these wrappers from executing the same binary again. For LTO testing I
symlink ar (nm, ranlib) to these wrappers instead of hacking a build system to
respect the NM (AR, RANLIB) environment variables. The only problem with that
solution is that the wrappers recursively execute themselves, because the first
folder in PATH is set to the location of the wrappers.

The following patch prevents such recursion.

The patch bootstraps on x86_64-linux-gnu.

Ready for trunk?
Thanks,
Martin


0001-Prevent-LTO-wrappers-to-process-a-recursive-executio.patch


From dfe0486ad7babe3d6de349001d4790684dc94bfb Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 22 Apr 2016 17:57:23 +0200
Subject: [PATCH] Prevent LTO wrappers to process a recursive execution

gcc/ChangeLog:

2016-04-22  Martin Liska  

* file-find.c (remove_prefix): New function.
* file-find.h (remove_prefix): Declare the function.
* gcc-ar.c (main): Skip a folder of the wrapper if
a wrapped binary would point to the same file.

Is this still something you want to pursue?  It looks pretty reasonable 
and one could make an argument that it's a good idea in and of itself.


jeff
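
For readers following along, the idea of "skip the wrapper's own folder" can be sketched as below. This is only an illustration under assumptions: drop_dir_from_path is a made-up name, it is not the remove_prefix function from the patch, and it assumes a colon-separated PATH with exact string comparison of directory names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the patch's remove_prefix): return a newly
   allocated copy of PATH_LIST, a colon-separated directory list, with
   every entry equal to SELF_DIR dropped.  Empty leading or trailing
   entries are not preserved.  The caller frees the result.  */
char *
drop_dir_from_path (const char *path_list, const char *self_dir)
{
  assert (path_list != NULL && self_dir != NULL);

  size_t self_len = strlen (self_dir);
  /* The result can never be longer than the input.  */
  char *result = malloc (strlen (path_list) + 1);
  if (result == NULL)
    return NULL;

  char *out = result;
  const char *entry = path_list;
  while (*entry != '\0')
    {
      /* Length of the current entry, up to the next ':' or the end.  */
      size_t len = strcspn (entry, ":");

      /* Copy the entry unless it names the wrapper's own directory.  */
      if (!(len == self_len && strncmp (entry, self_dir, len) == 0))
        {
          if (out != result)
            *out++ = ':';
          memcpy (out, entry, len);
          out += len;
        }

      entry += len;
      if (*entry == ':')
        entry++;
    }
  *out = '\0';
  return result;
}
```

A wrapper could then search the reduced list for the real ar/nm/ranlib, so that a symlinked wrapper placed first in PATH no longer finds itself.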


Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue

2016-06-22 Thread Jeff Law

On 05/06/2016 03:44 AM, Dominik Vogt wrote:

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 21f21c9..4d48afd 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c

...

@@ -1099,8 +1101,10 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)

   /* If there were any, allocate space.  */
   if (large_size > 0)
-   large_base = allocate_dynamic_stack_space (GEN_INT (large_size), 0,
-  large_align, true);
+   {
+ large_allocsize = GEN_INT (large_size);
+ get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);

...

See below.


@@ -1186,6 +1190,18 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
  /* Large alignment is only processed in the last pass.  */
  if (pred)
continue;
+
+ if (large_allocsize && ! large_allocation_done)
+   {
+ /* Allocate space the virtual stack vars area in the prologue.
+  */
+ HOST_WIDE_INT loffset;
+
+ loffset = alloc_stack_frame_space (INTVAL (large_allocsize),
+PREFERRED_STACK_BOUNDARY);


1) Should this use PREFERRED_STACK_BOUNDARY or just STACK_BOUNDARY?
2) Is this the right place for rounding up, or should
   it be done above, maybe in get_dynamic_stack_size?

I think PREFERRED_STACK_BOUNDARY is the correct one to use.

I think rounding in either place is fine.  We'd like to avoid multiple 
roundings, but otherwise I don't think it really matters.


jeff


Not sure whether this is the right


+ large_base = get_dynamic_stack_base (loffset, large_align);
+ large_allocation_done = true;
+   }
  gcc_assert (large_base != NULL);

  large_alloc += alignb - 1;



diff --git a/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
new file mode 100644
index 000..e06a16c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/stack-layout-dynamic-1.c
@@ -0,0 +1,14 @@
+/* Verify that run time aligned local variables are allocated in the prologue
+   in one pass together with normal local variables.  */
+/* { dg-do compile } */
+/* { dg-options "-O0" } */
+
+extern void bar (void *, void *, void *);
+void foo (void)
+{
+  int i;
+  __attribute__ ((aligned(65536))) char runtime_aligned_1[512];
+  __attribute__ ((aligned(32768))) char runtime_aligned_2[1024];
+  bar (&i, &runtime_aligned_1, &runtime_aligned_2);
+}
+/* { dg-final { scan-assembler-times "cfi_def_cfa_offset" 2 { target { s390*-*-* } } } } */


I've no idea how to test this on other targets, or how to express
the test in a target independent way.  The scan-assembler-times
does not work on x86_64.

I wonder if you could force -fomit-frame-pointer and see if we still end 
up with a frame pointer?


jeff



Re: [PATCH v2] Allocate constant size dynamic stack space in the prologue

2016-06-22 Thread Jeff Law

On 05/06/2016 03:37 AM, Dominik Vogt wrote:

Updated version of the patch described below.  Apart from fixing a
bug and adding a test, the new logic is now used always, for all
targets.  The discussion of the original patch starts here:

https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03052.html

The new patch has been bootstrapped and regression tested on s390,
s390x and x86_64, but please check the questions/comments in the
follow up message.

On Wed, Nov 25, 2015 at 01:56:10PM +0100, Dominik Vogt wrote:

> The attached patch fixes a warning during Linux kernel compilation
> on S/390 due to -mwarn-dynamicstack and runtime alignment of stack
> variables with constant size causing cfun->calls_alloca to be set
> (even if alloca is not used at all).  The patched code places
> constant size runtime aligned variables in the "virtual stack
> vars" area instead of creating a "virtual stack dynamic" area.
>
> This behaviour is activated by defining
>
>   #define ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE 1

Is there some reason why we don't just do this unconditionally?  ie, if 
we know the size of dynamic space, why not just always handle that in 
the prologue?  Seems like a useful optimization for a variety of reasons.


Of course if we do this unconditionally, we definitely need to find a 
way to test it better.


In reality, I don't see where/how the patch uses 
ALLOCATE_DYNAMIC_STACK_SPACE_IN_PROLOGUE anyway and it seems to be 
enabled for all targets, which is what I want :-)





>
> in the backend; otherwise the old logic is used.
>
> The kernel uses runtime alignment for the page structure (aligned
> to 16 bytes), and apart from triggering the alloca warning
> (-mwarn-dynamicstack), the current GCC also generates inefficient
> code like
>
>   aghi %r15,-160  # prologue: create stack frame
>   lgr %r11,%r15   # prologue: generate frame pointer
>   aghi %r15,-32   # space for dynamic stack
>
> which could be simplified to
>
>   aghi %r15,-192
>
> (if later optimization passes are able to get rid of the frame
> pointer).  Is there a specific reason why the patched behaviour
> shouldn't be used for all platforms?
>
> --
>
> As the placement of runtime aligned stack variables with constant
> size is done completely in the middleend, I don't see a way to fix
> this in the backend.

Ciao

Dominik ^_^  ^_^

-- Dominik Vogt IBM Germany


0001-v2-ChangeLog


gcc/ChangeLog

* cfgexpand.c (expand_stack_vars): Implement dynamic stack space
allocation in the prologue.
* explow.c (get_dynamic_stack_base): New function to return an address
expression for the dynamic stack base.
(get_dynamic_stack_size): New function to do the required dynamic stack
space size calculations.
(allocate_dynamic_stack_space): Use new functions.
(align_dynamic_address): Move some code from
allocate_dynamic_stack_space to new function.
* explow.h (get_dynamic_stack_base, get_dynamic_stack_size): Export.

gcc/testsuite/ChangeLog

* gcc.target/s390/warn-dynamicstack-1.c: New test.
* gcc.dg/stack-usage-2.c (foo3): Adapt expected warning.
* gcc.dg/stack-layout-dynamic-1.c: New test.


0001-v2-Allocate-constant-size-dynamic-stack-space-in-the-pr.patch


From e76a7e02f7862681d1b5344e64aca1b0a62cdc2c Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 25 Nov 2015 09:31:19 +0100
Subject: [PATCH] Allocate constant size dynamic stack space in the
 prologue ...

... and place it in the virtual stack vars area, if the platform supports it.
On S/390 this saves adjusting the stack pointer twice and forcing the frame
pointer into existence.  It also removes the warning with -mwarn-dynamicstack
that is triggered by cfun->calls_alloca == 1.

This fixes a problem with the Linux kernel which aligns the page structure to
16 bytes at run time using inefficient code and issuing a bogus warning.



@@ -1186,6 +1190,18 @@ expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
  /* Large alignment is only processed in the last pass.  */
  if (pred)
continue;
+
+ if (large_allocsize && ! large_allocation_done)
+   {
+ /* Allocate space the virtual stack vars area in the prologue.
+  */

Line wrapping nit here.  Bring "prologue" down to the next line.

I really like this.  I think the big question is how do we test it.  I 
suspect our bootstrap and regression suite probably don't exercise this 
code in any significant way.


Jeff
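
As background for the discussion above, the usual way to place a constant-size, over-aligned block inside a fixed frame is to reserve a little extra space and round the start address up. The sketch below shows only that general technique. It is an assumption that the patch works along these lines, the names padded_size and align_up are made up, and stack growth direction is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes to reserve so that an ALIGN-aligned block of SIZE bytes fits
   wherever the reserved area happens to start.  ALIGN must be a power
   of two.  */
uint64_t
padded_size (uint64_t size, uint64_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return size + align - 1;
}

/* Round ADDR up to the next multiple of ALIGN (a power of two).  */
uint64_t
align_up (uint64_t addr, uint64_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (addr + align - 1) & ~(align - 1);
}
```

Because both the size and the alignment are compile-time constants in the case the patch targets, the padded size is a constant too, which is what allows it to be folded into the single prologue stack adjustment.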


Re: [PATCH 2/2][v3] Drop excess size used for run time allocated stack variables.

2016-06-22 Thread Jeff Law

On 05/25/2016 07:32 AM, Dominik Vogt wrote:

On Wed, May 25, 2016 at 02:30:54PM +0100, Dominik Vogt wrote:

> On Tue, May 03, 2016 at 03:17:53PM +0100, Dominik Vogt wrote:

> > Version two of the patch including a test case.
> >
> > On Mon, May 02, 2016 at 09:10:25AM -0600, Jeff Law wrote:

> > > On 04/29/2016 04:12 PM, Dominik Vogt wrote:

> > > >The attached patch removes excess stack space allocation with
> > > >alloca in some situations.  Please check the commit message in the
> > > >patch for details.

> >

> > > However, I would strongly recommend some tests, even if they are
> > > target specific.  You can always copy pr36728-1 into the s390x
> > > directory and look at the size of the generated stack.  Similarly for
> > > pr50938 for x86.

> >
> > However, x86 uses the "else" branch in round_push, i.e. it uses
> > "virtual_preferred_stack_boundary_rtx" to calculate the number of
> > bytes to add for stack alignment.  That value is unknown at the
> > time round_push is called, so the test case fails on such targets,
> > and I've no idea how to fix this properly.

>
> Third version of the patch with the suggested cleanup in the first
> patch and the functional stuff in the second one.  The first patch
> is based on Jeff's draft with the change suggested by Eric and
> more cleanup added by me.

This is the updated functional patch.  Re-tested with limited
effort, i.e. tested and bootstrapped on s390x biarch (but did not
look for performance regressions compared to version 2 of the
patch).

Ciao

Dominik ^_^  ^_^

-- Dominik Vogt IBM Germany


0002-v3-ChangeLog


gcc/ChangeLog

* explow.c (round_push): Use known adjustment.
(allocate_dynamic_stack_space): Pass known adjustment to round_push.

gcc/testsuite/ChangeLog

* gcc.dg/pr50938.c: New test.


0002-v3-Drop-excess-size-used-for-run-time-allocated-stack-v.patch


From 4296d353e1d153b5b5ee435a44cae6117bf2fff0 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Fri, 29 Apr 2016 08:36:59 +0100
Subject: [PATCH 2/2] Drop excess size used for run time allocated stack
 variables.

The present calculation sometimes led to more stack memory being used than
necessary with alloca.  First, (STACK_BOUNDARY -1) would be added to the
allocated size:

  size = plus_constant (Pmode, size, extra);
  size = force_operand (size, NULL_RTX);

Then round_push was called and added another (STACK_BOUNDARY - 1) before
rounding down to a multiple of STACK_BOUNDARY.  On s390x this resulted in
adding 14 before rounding down for "x" in the test case pr36728-1.c.

round_push() now takes an argument to inform it about what has already been
added to size.
---
 gcc/explow.c   | 45 +---
 gcc/testsuite/gcc.dg/pr50938.c | 52 ++
 2 files changed, 79 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr50938.c

diff --git a/gcc/explow.c b/gcc/explow.c
index 09a0330..85596e2 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -949,24 +949,30 @@ anti_adjust_stack (rtx adjust)
 }

 /* Round the size of a block to be pushed up to the boundary required
-   by this machine.  SIZE is the desired size, which need not be constant.  */
+   by this machine.  SIZE is the desired size, which need not be constant.
+   ALREADY_ADDED is the number of units that have already been added to SIZE
+   for other alignment reasons.
+*/

 static rtx
-round_push (rtx size)
+round_push (rtx size, int already_added)
 {
-  rtx align_rtx, alignm1_rtx;
+  rtx align_rtx, add_rtx;

   if (!SUPPORTS_STACK_ALIGNMENT
   || crtl->preferred_stack_boundary == MAX_SUPPORTED_STACK_ALIGNMENT)
 {
   int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
+  int add;

   if (align == 1)
return size;

+  add = (align > already_added) ? align - already_added - 1 : 0;
+
   if (CONST_INT_P (size))
{
- HOST_WIDE_INT new_size = (INTVAL (size) + align - 1) / align * align;
+ HOST_WIDE_INT new_size = (INTVAL (size) + add) / align * align;

  if (INTVAL (size) != new_size)
size = GEN_INT (new_size);

So presumably the idea here is when the requested SIZE would require 
allocating additional space to first see if the necessary space is 
already available inside ALREADY_ADDED and use that rather than rounding 
size up to an alignment boundary.


I can see how that works in the sense of avoiding allocating extra 
space.  What I'm struggling with is how do we know the space actually 
allocated is going to have the right alignment?


What am I missing here?

jeff
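
To make the arithmetic in the round_push hunk concrete, here is a small stand-alone model of its CONST_INT case. It is a sketch under assumptions: the function names are made up, all quantities are in bytes, and the caller is assumed to have already added already_added bytes to the size. It shows only how large the result is and says nothing about whether the allocated address ends up suitably aligned, which is the open question above.

```c
#include <assert.h>

/* Pre-patch formula: another ALIGN - 1 is added to a size that already
   includes the caller's extra bytes, then the sum is truncated to a
   multiple of ALIGN.  */
long
round_push_old (long size_plus_extra, long align)
{
  assert (align > 0);
  return (size_plus_extra + align - 1) / align * align;
}

/* Patched formula, mirroring the hunk: only add what ALREADY_ADDED
   does not cover before truncating to a multiple of ALIGN.  */
long
round_push_new (long size_plus_extra, long align, long already_added)
{
  assert (align > 0 && already_added >= 0);
  long add = (align > already_added) ? align - already_added - 1 : 0;
  return (size_plus_extra + add) / align * align;
}
```

With an 8-byte boundary and 7 bytes already added, a 2-byte request arrives as 9: the old formula yields 16, the patched one yields 8.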





[PATCH] Fix source locations of bad enum values (PR c/71610 and PR c/71613)

2016-06-22 Thread David Malcolm

PR c/71613 identifies a problem where we fail to report this enum:

  enum { e1 = LLONG_MIN };

with -pedantic, due to LLONG_MIN being inside a system header.

This patch updates the C and C++ frontends to use the location of the
name as the primary location in the diagnostic, supplying the location
of the value as a secondary location, fixing the issue.

Before:
  $ gcc -c /tmp/test.c -Wpedantic
  /tmp/test.c: In function 'main':
  /tmp/test.c:3:14: warning: ISO C restricts enumerator values to range of 'int' [-Wpedantic]
     enum { c = -30 };
                ^

After:
  $ ./xgcc -B. -c /tmp/test.c -Wpedantic
  /tmp/test.c: In function 'main':
  /tmp/test.c:3:10: warning: ISO C restricts enumerator values to range of 'int' [-Wpedantic]
     enum { c = -30 };
            ^   ~~~

Successfully bootstrapped on x86_64-pc-linux-gnu;
adds 13 PASS results to gcc.sum and 9 PASS results to g++.sum.

OK for trunk?

gcc/c/ChangeLog:
PR c/71610
PR c/71613
* c-decl.c (build_enumerator): Fix description of LOC in comment.
Update diagnostics to use a rich_location at decl_loc, rather than
at loc, adding loc as a secondary range if available.
* c-parser.c (c_parser_enum_specifier): Use the full location of
the expression for value_loc, rather than just the first token.

gcc/cp/ChangeLog:
PR c/71610
PR c/71613
* cp-tree.h (build_enumerator): Add location_t param.
* decl.c (build_enumerator): Add "value_loc" param.  Update
"not an integer constant" diagnostic to use "loc" rather than
input_location, and to add "value_loc" as a secondary range if
available.
* parser.c (cp_parser_enumerator_definition): Extract the
location of the value from the cp_expr for the constant
expression, if any, and pass it to build_enumerator.
* pt.c (tsubst_enum): Extract EXPR_LOCATION of the value,
and pass it to build_enumerator.

gcc/ChangeLog:
PR c/71610
PR c/71613
* diagnostic-core.h (pedwarn_at_rich_loc): New prototype.
* diagnostic.c (pedwarn_at_rich_loc): New function.

gcc/testsuite/ChangeLog:
PR c/71610
PR c/71613
* c-c++-common/pr71610.c: New test case.
* gcc.dg/c90-const-expr-8.c: Update expected column of diagnostic.
* gcc.dg/pr71610-2.c: New test case.
* gcc.dg/pr71613.c: New test case.
---
 gcc/c/c-decl.c  | 32 +---
 gcc/c/c-parser.c| 10 --
 gcc/cp/cp-tree.h|  2 +-
 gcc/cp/decl.c   | 11 ---
 gcc/cp/parser.c |  7 +--
 gcc/cp/pt.c |  3 ++-
 gcc/diagnostic-core.h   |  2 ++
 gcc/diagnostic.c| 12 
 gcc/testsuite/c-c++-common/pr71610.c| 11 +++
 gcc/testsuite/gcc.dg/c90-const-expr-8.c |  2 +-
 gcc/testsuite/gcc.dg/pr71610-2.c| 17 +
 gcc/testsuite/gcc.dg/pr71613.c  | 19 +++
 12 files changed, 107 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr71610.c
 create mode 100644 gcc/testsuite/gcc.dg/pr71610-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr71613.c

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 5c08c59..0d081ff 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -8159,7 +8159,7 @@ finish_enum (tree enumtype, tree values, tree attributes)
 /* Build and install a CONST_DECL for one value of the
current enumeration type (one that was begun with start_enum).
DECL_LOC is the location of the enumerator.
-   LOC is the location of the '=' operator if any, DECL_LOC otherwise.
+   LOC is the location of the value if any, DECL_LOC otherwise.
Return a tree-list containing the CONST_DECL and its value.
Assignment of sequential values by default is handled here.  */
 
@@ -8169,6 +8169,10 @@ build_enumerator (location_t decl_loc, location_t loc,
 {
   tree decl, type;
 
+  rich_location richloc (line_table, decl_loc);
+  if (loc && loc != decl_loc)
+richloc.add_range (loc, false);
+
   /* Validate and default VALUE.  */
 
   if (value != 0)
@@ -8179,8 +8183,10 @@ build_enumerator (location_t decl_loc, location_t loc,
value = 0;
   else if (!INTEGRAL_TYPE_P (TREE_TYPE (value)))
{
- error_at (loc, "enumerator value for %qE is not an integer constant",
-   name);
+ error_at_rich_loc
+   (&richloc,
+"enumerator value for %qE is not an integer constant",
+name);
  value = 0;
}
   else
@@ -8189,14 +8195,17 @@ build_enumerator (location_t decl_loc, location_t loc,
{
  value = c_fully_fold (value, false, NULL);
  if (TREE_CODE (value) == INTEGER_CST)
-   pedwarn (loc, OPT_Wpedantic,
-  

Re: [PATCH, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-22 Thread Bill Schmidt


> On Jun 22, 2016, at 6:27 PM, Joseph Myers  wrote:
> 
> On Wed, 22 Jun 2016, Bill Schmidt wrote:
> 
>> Hi Joseph,
>> 
>> That's indeed preferable for the long term -- given how close we are to the 
>> cutoff for 6.2, though, I'm worried about adding any new dependencies for 
>> getting this upstream.  I'd suggest that we go ahead with reviewing this 
>> patch in the short term, and I'll be happy to work with you later on getting
>> the impedance matching right when they become arch-neutral.
> 
> The architecture-independent built-in functions really aren't hard.  See 
> this patch, on top of my main _FloatN / _FloatNx patch.  (Not written up 
> or fully regression-tested, and I want to add more test coverage, but it 
> seems to work.)  To keep this patch minimal I didn't include updates to a 
> few case statements only needed for optimization, e.g. in 
> tree_call_nonnegative_warnv_p, but when those are changed you get 
> optimizations automatically for these functions that would be harder to 
> get with anything architecture-specific.

I understand that this is what we want for GCC 7.  My current concern is to
get my patch included in GCC 6.2, where I can't be polluting common code.
To get it accepted there, I first need this code approved in mainline.  So I
am quite willing to move to the architecture-independent ones later, but
for now I don't see that I have any choice but to seek approval for the
purely arch-dependent one.

> 
> Regarding your patch:
> 
> (a) for GCC 6 (supposing my patch is used for trunk), it's missing 
> documentation for the new functions (this patch has documentation for the 
> architecture-independent functions);

A very good point -- I totally forgot about this, and will have to add it.

> 
> (b) for trunk, having an insn pattern infkf1 for a built-in function that 
> loads a constant is not appropriate (other insn patterns to optimize the 
> architecture-independent built-in functions may well be appropriate).  
> Rather, if there is a particularly efficient way of generating code to 
> load a certain constant, the back end should be set up to use that way 
> whenever that constant occurs (more generally, whenever any constant to 
> which that efficient way applies occurs) - including for example when it 
> occurs from folding arithmetic, say (__float128) __builtin_inff (), not 
> just from __builtin_inff128 ().

The fact that I hook this built-in directly to a pattern named infkf1
doesn't seem to preclude anything you suggest.  I named it this way
on the off-chance that inf1 becomes a standard pattern in the
future, in which case I want to generate this constant.  We can 
always use gen_infkf1 to reuse this code in any other context.  I'm
not understanding your objection.

Thanks,
Bill



Re: [PATCH, rs6000] Prefer vspltisw/h over xxspltib+instruction when available

2016-06-22 Thread Segher Boessenkool

On Wed, Jun 22, 2016 at 05:29:59PM -0400, Michael Meissner wrote:
> This code should fix the problem.  It does not allow constants in the
> arguments.  Combine will create one of the vec_duplicate patterns with a
> constant integer that will generate VSPLTIS or XXSPLTIB/etc.  I also
> tightened the memory requirements to only allow indexed memory forms
> during/after register allocation, since the instruction only uses indexed
> addressing.
> 
> I bootstrapped the compiler and ran make check with no regressions on a little
> endian power8 system.  Can I check it into trunk, and after an appropriate
> waiting period check it into GCC 6.x if there were no issues?

If this works, that is marvelous.  Okay for trunk and 6 later, thanks!

Some tiny things...

>   * config/rs6000/predicates.md (splat_input_operand): Rework.
>   Don't allow constants, since the caller insns don't support
>   constants.  During and after register allocation, only allow
>   indexed or indirect addresses, and not general addresses.  Only
>   allow modes supported by the hardware.

"caller insns"?

> --- gcc/config/rs6000/rs6000.c(revision 237715)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -6282,10 +6282,7 @@ gen_easy_altivec_constant (rtx op)
> Return the number of instructions needed (1 or 2) into the address pointed
> via NUM_INSNS_PTR.
>  
> -   If NOSPLIT_P, only return true for constants that only generate the XXSPLTIB
> -   instruction and can go in any VSX register.  If !NOSPLIT_P, only return true
> -   for constants that generate XXSPLTIB and need a sign extend operation, which
> -   restricts us to the Altivec registers.
> +   Return the constant that is being split via CONSTANT_PTR.
>  
> Allow either (vec_const [...]) or (vec_duplicate ).  If OP is a valid
> XXSPLTIB constant, return the constant being set via the CONST_PTR

The CONST_PTR in that last line here is a misspelling of CONSTANT_PTR;
this last part should be removed I think?


Segher


Re: [PATCH, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-22 Thread Joseph Myers

On Wed, 22 Jun 2016, Bill Schmidt wrote:

> Hi Joseph,
> 
> That's indeed preferable for the long term -- given how close we are to the 
> cutoff for 6.2, though, I'm worried about adding any new dependencies for 
> getting this upstream.  I'd suggest that we go ahead with reviewing this 
> patch in the short term, and I'll be happy to work with you later on getting
> the impedance matching right when they become arch-neutral.

The architecture-independent built-in functions really aren't hard.  See 
this patch, on top of my main _FloatN / _FloatNx patch.  (Not written up 
or fully regression-tested, and I want to add more test coverage, but it 
seems to work.)  To keep this patch minimal I didn't include updates to a 
few case statements only needed for optimization, e.g. in 
tree_call_nonnegative_warnv_p, but when those are changed you get 
optimizations automatically for these functions that would be harder to 
get with anything architecture-specific.

Regarding your patch:

(a) for GCC 6 (supposing my patch is used for trunk), it's missing 
documentation for the new functions (this patch has documentation for the 
architecture-independent functions);

(b) for trunk, having an insn pattern infkf1 for a built-in function that 
loads a constant is not appropriate (other insn patterns to optimize the 
architecture-independent built-in functions may well be appropriate).  
Rather, if there is a particularly efficient way of generating code to 
load a certain constant, the back end should be set up to use that way 
whenever that constant occurs (more generally, whenever any constant to 
which that efficient way applies occurs) - including for example when it 
occurs from folding arithmetic, say (__float128) __builtin_inff (), not 
just from __builtin_inff128 ().
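
A small illustration of point (b), using double as a stand-in because __float128 is not available on every target (that substitution is an assumption of this sketch, as are the helper names). It relies on the GCC built-ins __builtin_inff and __builtin_inf, and shows the same infinity constant arising once from a conversion that gets folded and once from the dedicated built-in, which is why a pattern tied to only one built-in would miss cases.

```c
#include <assert.h>
#include <float.h>

/* Infinity obtained by converting a float infinity; the compiler is
   expected to fold this to a constant.  */
double
inf_from_conversion (void)
{
  return (double) __builtin_inff ();
}

/* Infinity obtained directly from the double built-in.  */
double
inf_from_builtin (void)
{
  return __builtin_inf ();
}

/* True if X lies outside the finite range of double.  */
int
is_infinite (double x)
{
  return x > DBL_MAX || x < -DBL_MAX;
}
```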

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 7fab9f8..468313c4 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -76,6 +76,27 @@ DEF_PRIMITIVE_TYPE (BT_UNWINDWORD, (*lang_hooks.types.type_for_mode)
 DEF_PRIMITIVE_TYPE (BT_FLOAT, float_type_node)
 DEF_PRIMITIVE_TYPE (BT_DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (BT_LONGDOUBLE, long_double_type_node)
+DEF_PRIMITIVE_TYPE (BT_FLOAT16, (float16_type_node
+? float16_type_node
+: error_mark_node))
+DEF_PRIMITIVE_TYPE (BT_FLOAT32, (float32_type_node
+? float32_type_node
+: error_mark_node))
+DEF_PRIMITIVE_TYPE (BT_FLOAT64, (float64_type_node
+? float64_type_node
+: error_mark_node))
+DEF_PRIMITIVE_TYPE (BT_FLOAT128, (float128_type_node
+ ? float128_type_node
+ : error_mark_node))
+DEF_PRIMITIVE_TYPE (BT_FLOAT32X, (float32x_type_node
+ ? float32x_type_node
+ : error_mark_node))
+DEF_PRIMITIVE_TYPE (BT_FLOAT64X, (float64x_type_node
+ ? float64x_type_node
+ : error_mark_node))
+DEF_PRIMITIVE_TYPE (BT_FLOAT128X, (float128x_type_node
+  ? float128x_type_node
+  : error_mark_node))
 DEF_PRIMITIVE_TYPE (BT_COMPLEX_FLOAT, complex_float_type_node)
 DEF_PRIMITIVE_TYPE (BT_COMPLEX_DOUBLE, complex_double_type_node)
 DEF_PRIMITIVE_TYPE (BT_COMPLEX_LONGDOUBLE, complex_long_double_type_node)
@@ -146,6 +167,13 @@ DEF_FUNCTION_TYPE_0 (BT_FN_DOUBLE, BT_DOUBLE)
distinguish it from two types in sequence, "long" followed by
"double".  */
 DEF_FUNCTION_TYPE_0 (BT_FN_LONGDOUBLE, BT_LONGDOUBLE)
+DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT16, BT_FLOAT16)
+DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT32, BT_FLOAT32)
+DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT64, BT_FLOAT64)
+DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT128, BT_FLOAT128)
+DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT32X, BT_FLOAT32X)
+DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT64X, BT_FLOAT64X)
+DEF_FUNCTION_TYPE_0 (BT_FN_FLOAT128X, BT_FLOAT128X)
 DEF_FUNCTION_TYPE_0 (BT_FN_DFLOAT32, BT_DFLOAT32)
 DEF_FUNCTION_TYPE_0 (BT_FN_DFLOAT64, BT_DFLOAT64)
 DEF_FUNCTION_TYPE_0 (BT_FN_DFLOAT128, BT_DFLOAT128)
@@ -157,6 +185,13 @@ DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT_FLOAT, BT_FLOAT, BT_FLOAT)
 DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_DOUBLE, BT_DOUBLE, BT_DOUBLE)
 DEF_FUNCTION_TYPE_1 (BT_FN_LONGDOUBLE_LONGDOUBLE,
 BT_LONGDOUBLE, BT_LONGDOUBLE)
+DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT16_FLOAT16, BT_FLOAT16, BT_FLOAT16)
+DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32_FLOAT32, BT_FLOAT32, BT_FLOAT32)
+DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT64_FLOAT64, BT_FLOAT64, BT_FLOAT64)
+DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT128_FLOAT128, BT_FLOAT128, BT_FLOAT128)
+DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT32X_FLOAT32X, BT_FLOAT32X, BT_FLOAT32X)
+DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT64X_FLOAT64X, BT_FLOAT64X, BT_FLOAT64X)
+DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT128X_FLOAT128X, BT_FLOAT128X, BT_FLOAT128X)
 

Re: [PATCH] x86-64: Load external function address via GOT slot

2016-06-22 Thread Uros Bizjak

On Tue, Jun 21, 2016 at 9:51 PM, H.J. Lu  wrote:

>> I have attached my version of the patch. It handles all your
>> testcases, plus +1 case. Bootstrap is still running.
>>
>> Does the patch work for you?
>
> It works.

The attached version of the patch was committed to mainline SVN. Regarding
the testcases: for now I have restricted them to compile on non-ia32 targets
only. Let's change them when ia32 support is added (it is a trivial change).

2016-06-23  Uros Bizjak  
H.J. Lu  

PR target/67400
* config/i386/i386-protos.h (ix86_force_load_from_GOT_p): New.
* config/i386/i386.c (ix86_force_load_from_GOT_p): New function.
(ix86_legitimate_constant_p): Do not allow UNSPEC_GOTPCREL if
ix86_force_load_from_GOT_p returns true.
(ix86_legitimate_address_p): Allow UNSPEC_GOTPCREL if
ix86_force_load_from_GOT_p returns true.
(ix86_print_operand_address_as): Support UNSPEC_GOTPCREL if
ix86_force_load_from_GOT_p returns true.
(ix86_expand_move): Load the external function address via the
GOT slot if ix86_force_load_from_GOT_p returns true.
* config/i386/predicates.md (x86_64_immediate_operand): Return
false for SYMBOL_REFs where ix86_force_load_from_GOT_p returns true.
(x86_64_zext_immediate_operand): Ditto.

testsuite/ChangeLog:

2016-06-23  H.J. Lu  

PR target/67400
* gcc.target/i386/pr67400-1.c: New test.
* gcc.target/i386/pr67400-2.c: Likewise.
* gcc.target/i386/pr67400-3.c: Likewise.
* gcc.target/i386/pr67400-4.c: Likewise.
* gcc.target/i386/pr67400-5.c: Likewise.
* gcc.target/i386/pr67400-6.c: Likewise.
* gcc.target/i386/pr67400-7.c: Likewise.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386-protos.h
===
--- config/i386/i386-protos.h   (revision 237716)
+++ config/i386/i386-protos.h   (working copy)
@@ -70,6 +70,7 @@ extern bool ix86_expand_set_or_movmem (rtx, rtx, r
 extern bool constant_address_p (rtx);
 extern bool legitimate_pic_operand_p (rtx);
 extern bool legitimate_pic_address_disp_p (rtx);
+extern bool ix86_force_load_from_GOT_p (rtx);
 extern void print_reg (rtx, int, FILE*);
 extern void ix86_print_operand (FILE *, rtx, int);
 
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 237716)
+++ config/i386/i386.c  (working copy)
@@ -15120,6 +15120,19 @@ darwin_local_data_pic (rtx disp)
  && XINT (disp, 1) == UNSPEC_MACHOPIC_OFFSET);
 }
 
+/* True if operand X should be loaded from GOT.  */
+
+bool
+ix86_force_load_from_GOT_p (rtx x)
+{
+  return (TARGET_64BIT && !TARGET_PECOFF && !TARGET_MACHO
+ && !flag_plt && !flag_pic
+ && ix86_cmodel != CM_LARGE
+ && GET_CODE (x) == SYMBOL_REF
+ && SYMBOL_REF_FUNCTION_P (x)
+ && !SYMBOL_REF_LOCAL_P (x));
+}
+
 /* Determine if a given RTX is a valid constant.  We already know this
satisfies CONSTANT_P.  */
 
@@ -15188,6 +15201,12 @@ ix86_legitimate_constant_p (machine_mode mode, rtx
   if (MACHO_DYNAMIC_NO_PIC_P)
return machopic_symbol_defined_p (x);
 #endif
+
+  /* External function address should be loaded
+via the GOT slot to avoid PLT.  */
+  if (ix86_force_load_from_GOT_p (x))
+   return false;
+
   break;
 
 CASE_CONST_SCALAR_INT:
@@ -15596,6 +15615,9 @@ ix86_legitimate_address_p (machine_mode, rtx addr,
return false;
 
  case UNSPEC_GOTPCREL:
+   if (ix86_force_load_from_GOT_p (XVECEXP (XEXP (disp, 0), 0, 0)))
+ goto is_legitimate_pic;
+   /* FALLTHRU */
  case UNSPEC_PCREL:
gcc_assert (flag_pic);
goto is_legitimate_pic;
@@ -18169,6 +18191,12 @@ ix86_print_operand_address_as (FILE *file, rtx add
fputs ("ds:", file);
  fprintf (file, HOST_WIDE_INT_PRINT_DEC, INTVAL (disp));
}
+  /* Load the external function address via the GOT slot to avoid PLT.  */
+  else if (GET_CODE (disp) == CONST
+  && GET_CODE (XEXP (disp, 0)) == UNSPEC
+  && XINT (XEXP (disp, 0), 1) == UNSPEC_GOTPCREL
+  && ix86_force_load_from_GOT_p (XVECEXP (XEXP (disp, 0), 0, 0)))
+   output_pic_addr_const (file, disp, 0);
   else if (flag_pic)
output_pic_addr_const (file, disp, 0);
   else
@@ -19417,6 +19445,15 @@ ix86_expand_move (machine_mode mode, rtx operands[
 
   if (model)
op1 = legitimize_tls_address (op1, model, true);
+  else if (ix86_force_load_from_GOT_p (op1))
+   {
+ /* Load the external function address via GOT slot to avoid PLT.  */
+ op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1),
+   UNSPEC_GOTPCREL);
+ op1 = gen_rtx_CONST (Pmode, op1);
+ op1 = gen_const_mem (Pmode, op1);
+ 

Re: Implement C _FloatN, _FloatNx types [version 2]

2016-06-22 Thread FX

> Fortran note: the float128_type_node used in the Fortran front end is
> renamed to gfc_float128_type_node, since the semantics are different:
> in particular, if long double has binary128 format, then the new
> language-independent float128_type_node is a distinct type that also
> has binary128 format, but the Fortran node is expected to be NULL in
> that case.  Likewise, Fortran's complex_float128_type_node is renamed
> to gfc_complex_float128_type_node.

Fortran part is OK.


Re: [PATCH 1/3] Add gcc-auto-profile script

2016-06-22 Thread Andi Kleen
On Wed, Jun 22, 2016 at 11:34:05PM +0200, Bernhard Reutner-Fischer wrote:
> >+for m in mod[:-1]:
> >+print "model*:\ %s|\\" % m
> >+print 'model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event)
> >+print '''*)
> >+echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script
> >to update script."
> >+exit 1 ;;'''
> >+print "esac"
> >+print 'exec perf record -e $E -b "$@"'
> 
> Need to quote $E ?

There's never a space in it

> >+if ! grep -q Intel /proc/cpuinfo ] ; then
> 
> I'm surprised this even runs?

It works here

$ grep -q Intel /proc/cpuinfo ] ; echo $?
0

But will fix, thanks.

-Andi


-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH 1/3] Add gcc-auto-profile script

2016-06-22 Thread Bernhard Reutner-Fischer
On June 22, 2016 2:37:04 PM GMT+02:00, Andi Kleen  wrote:
>From: Andi Kleen 
>
>Using autofdo is currently something difficult. It requires using the
>model specific branches taken event, which differs on different CPUs.
>The example shown in the manual requires a special patched version of
>perf that is non standard, and also will likely not work everywhere.
>
>This patch adds a new gcc-auto-profile script that figures out the
>correct event and runs perf.
>
>This is needed to actually make use of autofdo in a generic way
>in the build system and in the test suite.
>
>Since maintaining the script would be somewhat tedious (needs changes
>every time a new CPU comes out) I auto generated it from the online
>Intel event database. The script to do that is in contrib and can be
>rerun.
>
>Right now there is no test if perf works in configure. This
>would vary depending on the build and target system, and since
>it currently doesn't work in virtualization and needs uptodate
>kernel it may often fail in common distribution build setups.
>
>So far the script is not installed.
>
>v2: Remove documentation of gcc-auto-profile, as its not
>installed.
>
>gcc/:
>2016-06-22  Andi Kleen  
>
>   * doc/invoke.texi: Document gcc-auto-profile
>   * config/i386/gcc-auto-profile: New file.
>
>contrib/:
>
>2016-06-22  Andi Kleen  
>
>   * gen_autofdo_event.py: New file to regenerate
>   gcc-auto-profile.
>---
>contrib/gen_autofdo_event.py | 155
>+++
> gcc/config/i386/gcc-auto-profile |  70 ++
> 2 files changed, 225 insertions(+)
> create mode 100755 contrib/gen_autofdo_event.py
> create mode 100755 gcc/config/i386/gcc-auto-profile
>
>diff --git a/contrib/gen_autofdo_event.py
>b/contrib/gen_autofdo_event.py
>new file mode 100755
>index 000..66cd613
>--- /dev/null
>+++ b/contrib/gen_autofdo_event.py
>@@ -0,0 +1,155 @@
>+#!/usr/bin/python
>+# Generate Intel taken branches Linux perf event script for autofdo
>profiling.
>+
>+# Copyright (C) 2016 Free Software Foundation, Inc.
>+#
>+# GCC is free software; you can redistribute it and/or modify it under
>+# the terms of the GNU General Public License as published by the Free
>+# Software Foundation; either version 3, or (at your option) any later
>+# version.
>+#
>+# GCC is distributed in the hope that it will be useful, but WITHOUT
>ANY
>+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
>+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
>License
>+# for more details.
>+#
>+# You should have received a copy of the GNU General Public License
>+# along with GCC; see the file COPYING3.  If not see
>+# <http://www.gnu.org/licenses/>.  */
>+
>+# Run it with perf record -b -e EVENT program ...
>+# The Linux Kernel needs to support the PMU of the current CPU, and
>+# It will likely not work in VMs.
>+# Add --all to print for all cpus, otherwise for current cpu.
>+# Add --script to generate shell script to run correct event.
>+#
>+# Requires internet (https) access. This may require setting up a
>proxy
>+# with export https_proxy=...
>+#
>+import urllib2
>+import sys
>+import json
>+import argparse
>+import collections
>+
>+baseurl = "https://download.01.org/perfmon"
>+
>+target_events = (u'BR_INST_RETIRED.NEAR_TAKEN',
>+ u'BR_INST_EXEC.TAKEN',
>+ u'BR_INST_RETIRED.TAKEN_JCC',
>+ u'BR_INST_TYPE_RETIRED.COND_TAKEN')
>+
>+ap = argparse.ArgumentParser()
>+ap.add_argument('--all', '-a', help='Print for all CPUs',
>action='store_true')
>+ap.add_argument('--script', help='Generate shell script',
>action='store_true')
>+args = ap.parse_args()
>+
>+eventmap = collections.defaultdict(list)
>+
>+def get_cpu_str():
>+with open('/proc/cpuinfo', 'r') as c:
>+vendor, fam, model = None, None, None
>+for j in c:
>+n = j.split()
>+if n[0] == 'vendor_id':
>+vendor = n[2]
>+elif n[0] == 'model' and n[1] == ':':
>+model = int(n[2])
>+elif n[0] == 'cpu' and n[1] == 'family':
>+fam = int(n[3])
>+if vendor and fam and model:
>+return "%s-%d-%X" % (vendor, fam, model), model
>+return None, None
>+
>+def find_event(eventurl, model):
>+print >>sys.stderr, "Downloading", eventurl
>+u = urllib2.urlopen(eventurl)
>+events = json.loads(u.read())
>+u.close()
>+
>+found = 0
>+for j in events:
>+if j[u'EventName'] in target_events:
>+event = "cpu/event=%s,umask=%s/" % (j[u'EventCode'],
>j[u'UMask'])
>+if u'PEBS' in j and j[u'PEBS'] > 0:

I'd have said
if j.get(u'PEBS', 0) > 0:
I.e. don't rely on the default None for keys that are not in the dict, but
use zero and test against that. I think that's more pythonic, but either way.
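
A minimal sketch of the two styles side by side, assuming made-up event
records (the dictionaries and helper names below are illustrative only and
are not taken from the Intel event files or from the patch):

```python
def is_pebs_membership(event):
    # Style used in the patch: test membership first, then compare.
    return 'PEBS' in event and event['PEBS'] > 0


def is_pebs_get(event):
    # Suggested style: a missing key is treated as 0.
    return event.get('PEBS', 0) > 0


# Hypothetical records: one with a PEBS field, one without.
with_pebs = {'EventName': 'BR_INST_RETIRED.NEAR_TAKEN', 'PEBS': 1}
without_pebs = {'EventName': 'BR_INST_EXEC.TAKEN'}

for ev in (with_pebs, without_pebs):
    # Both spellings agree on these records.
    assert is_pebs_membership(ev) == is_pebs_get(ev)
```

Both forms give the same answer for integer values; the second just avoids
looking the key up twice.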

>+event += "p"
>+if args.script:
>+

Re: [PATCH, rs6000] Prefer vspltisw/h over xxspltib+instruction when available

2016-06-22 Thread Michael Meissner
On Wed, Jun 22, 2016 at 09:22:22AM -0500, Segher Boessenkool wrote:
> Don't give up so easily?  ;-)
> 
> The predicate should be tightened, the expander should use a new predicate
> that allows all those other things.  The hardest part is figuring a good
> name for it ;-)

This code should fix the problem.  It does not allow constants in the
arguments.  Combine will create one of the vec_duplicate patterns with a
constant integer that will generate VSPLTIS or XXSPLTIB/etc.  I also
tightened the memory requirements to only allow indexed memory forms
during/after register allocation, since the instruction only uses indexed
addressing.

I bootstrapped the compiler and ran make check with no regressions on a little
endian power8 system.  Can I check it into trunk, and after an appropriate
waiting period check it into GCC 6.x if there were no issues?

[gcc]
2016-06-22  Michael Meissner  
Bill Schmidt  

* config/rs6000/predicates.md (splat_input_operand): Rework.
Don't allow constants, since the caller insns don't support
constants.  During and after register allocation, only allow
indexed or indirect addresses, and not general addresses.  Only
allow modes supported by the hardware.
* config/rs6000/rs6000.c (xxspltib_constant_p): Update usage
comment.  Move check for using VSPLTIS to a common location,
instead of doing it in two different places.

[gcc/testsuite]
2016-06-22  Michael Meissner  
Bill Schmidt  

* gcc.target/powerpc/p9-splat-5.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 237715)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -1056,27 +1056,34 @@ (define_predicate "input_operand"
 
 ;; Return 1 if this operand is a valid input for a vsx_splat insn.
 (define_predicate "splat_input_operand"
-  (match_code "symbol_ref,const,reg,subreg,mem,
-  const_double,const_wide_int,const_vector,const_int")
+  (match_code "reg,subreg,mem")
 {
+  machine_mode vmode;
+
+  if (mode == DFmode)
+vmode = V2DFmode;
+  else if (mode == DImode)
+vmode = V2DImode;
+  else if (mode == SImode && TARGET_P9_VECTOR)
+vmode = V4SImode;
+  else if (mode == SFmode && TARGET_P9_VECTOR)
+vmode = V4SFmode;
+  else
+return false;
+
   if (MEM_P (op))
 {
+  rtx addr = XEXP (op, 0);
+
   if (! volatile_ok && MEM_VOLATILE_P (op))
return 0;
-  if (mode == DFmode)
-   mode = V2DFmode;
-  else if (mode == DImode)
-   mode = V2DImode;
-  else if (mode == SImode && TARGET_P9_VECTOR)
-   mode = V4SImode;
-  else if (mode == SFmode && TARGET_P9_VECTOR)
-   mode = V4SFmode;
+
+  if (reload_in_progress || lra_in_progress || reload_completed)
+   return indexed_or_indirect_address (addr, vmode);
   else
-   gcc_unreachable ();
-  return memory_address_addr_space_p (mode, XEXP (op, 0),
- MEM_ADDR_SPACE (op));
+   return memory_address_addr_space_p (vmode, addr, MEM_ADDR_SPACE (op));
 }
-  return input_operand (op, mode);
+  return gpc_reg_operand (op, mode);
 })
 
 ;; Return true if OP is a non-immediate operand and not an invalid
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 237715)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6282,10 +6282,7 @@ gen_easy_altivec_constant (rtx op)
Return the number of instructions needed (1 or 2) into the address pointed
via NUM_INSNS_PTR.
 
-   If NOSPLIT_P, only return true for constants that only generate the XXSPLTIB
-   instruction and can go in any VSX register.  If !NOSPLIT_P, only return true
-   for constants that generate XXSPLTIB and need a sign extend operation, which
-   restricts us to the Altivec registers.
+   Return the constant that is being split via CONSTANT_PTR.
 
Allow either (vec_const [...]) or (vec_duplicate ).  If OP is a valid
XXSPLTIB constant, return the constant being set via the CONST_PTR
@@ -6355,13 +6352,6 @@ xxspltib_constant_p (rtx op,
  if (value != INTVAL (element))
return false;
}
-
-  /* See if we could generate vspltisw/vspltish directly instead of
-xxspltib + sign extend.  Special case 0/-1 to allow getting
- any VSX register instead of an Altivec register.  */
-  if (!IN_RANGE (value, -1, 0) && EASY_VECTOR_15 (value)
- && (mode == V4SImode || mode == V8HImode))
-   return false;
 }
 
   /* Handle integer constants being loaded into the upper part of 

Re: [PING**2] [PATCH] Fix asm X constraint (PR inline-asm/59155)

2016-06-22 Thread Bernd Edlinger
On 06/22/16 21:51, Jeff Law wrote:
> On 06/19/2016 07:25 AM, Bernd Edlinger wrote:
>> Hi,
>>
>> ping...
>>
>> As this discussion did not make any progress, I just attached
>> the latest version of my patch with the changes that
>> Vladimir proposed.
>>
>> Boot-strapped and reg-tested again on x86_64-linux-gnu.
>> Is it OK for the trunk?
> Well, I don't think we've got any kind of consensus on whether
> this is reasonable or not.
>
> The fundamental issue is that "X" is supposed to accept anything,
> literally anything.  That implies it's really the downstream users of
> those operands that are broken.
>

Hmm...

I think it must be pretty easy to write something in a .md file with the
X constraint that ends up in an ICE, right?

But in an .md file we have much more control on what happens.
That's why I did not propose to change the meaning of "X" in .md files.

And we only have problems with asm statements that use "X" constraints.

Nobody has a use case where literally any kind of RTL operand
is actually useful in a user-written assembler statement.

Please correct me if that is wrong.

But I think we have a use case where "X" means really more possible
registers (i.e. includes sse2, mmx etc.) than "g" (only general
registers).  Otherwise, in the test cases of pr59155 we would not
have any benefit for using "+X" instead of "+g" or "+r".

Does that sound reasonable?


Bernd.


Re: Implement C _FloatN, _FloatNx types [version 2]

2016-06-22 Thread Joseph Myers
Here is patch version 2, updated to apply without conflicts to current 
trunk (there was a conflict with r237714 moving cases over RID_* around) 
but with no other changes.  (In particular, with no further changes to the 
PowerPC choice of mode - in this patch it matches what __float128 already 
does, and if the PowerPC maintainers decide that (FLOAT128_IEEE_P 
(TFmode)) ? TFmode : KFmode as used for *q constants is preferable, that 
can easily be used instead.)

Can we get the non-C-front-end parts (which are actually pretty 
straightforward) reviewed so we can get on to adding appropriate built-in 
functions on top of this, and other followup changes?


ISO/IEC TS 18661-3:2015 defines C bindings to IEEE interchange and
extended types, in the form of _FloatN and _FloatNx type names with
corresponding fN/FN and fNx/FNx constant suffixes and FLTN_* / FLTNX_*
<float.h> macros.  This patch implements support for this feature in
GCC.

The _FloatN types, for N = 16, 32, 64 or >= 128 and a multiple of 32,
are types encoded according to the corresponding IEEE interchange
format (endianness unspecified; may use either the NaN conventions
recommended in IEEE 754-2008, or the MIPS NaN conventions, since the
choice of convention is only an IEEE recommendation, not a
requirement).  The _FloatNx types, for N = 32, 64 and 128, are IEEE
"extended" types: types extending a narrower format with range and
precision at least as big as those specified in IEEE 754 for each
extended type (and with unspecified representation, but still
following IEEE semantics for their values and operations - and with
the set of values being determined by the precision and the maximum
exponent, which means that while Intel "extended" is suitable for
_Float64x, m68k "extended" is not).  These types are always distinct
from and not compatible with each other and the standard floating
types float, double, long double; thus, double, _Float64 and _Float32x
may all have the same ABI, but they are still three distinct types.
The type names may be used with _Complex to construct corresponding
complex types (unlike __float128, which acts more like a typedef name
than a keyword - thus, this patch may be considered to fix PR
c/32187).  The new suffixes can be combined with GNU "i" and "j"
suffixes for constants of complex types (e.g. 1.0if128, 2.0f64i).
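
As a quick illustration of the width rules just described, here is a small
sketch in Python.  The helper names are hypothetical, and the sketch only
encodes which N can name a type at all; it says nothing about which types a
given target actually supports.

```python
def is_interchange_width(n):
    """True if _FloatN is a possible interchange type name for this N."""
    # N = 16, 32, 64, or any multiple of 32 that is >= 128.
    return n in (16, 32, 64) or (n >= 128 and n % 32 == 0)


def is_extended_width(n):
    """True if _FloatNx is a possible extended type name for this N."""
    # Extended types exist only for N = 32, 64 and 128.
    return n in (32, 64, 128)
```

For example, under these rules 160 is a possible interchange width while 80
and 96 are not, and 16 is an interchange width but not an extended one.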

The set of types supported is implementation-defined.  In this GCC
patch, _Float32 is SFmode if that is suitable; _Float32x and _Float64
are DFmode if that is suitable; _Float128 is TFmode if that is
suitable; _Float64x is XFmode if that is suitable, and otherwise
TFmode if that is suitable.  There is a target hook to override the
choices if necessary.  "Suitable" means both conforming to the
requirements of that type, and supported as a scalar type including in
libgcc.  The ABI is whatever the back end does for scalars of that
mode (but note that _Float32 is passed without promotion in variable
arguments, unlike float).  All the existing issues with exceptions and
rounding modes for existing types apply equally to the new type names.

No GCC port supports a floating-point format suitable for _Float128x.
Although there is HFmode support for ARM and AArch64, use of that for
_Float16 is not enabled.  Supporting _Float16 would require additional
work on the excess precision aspects of TS 18661-3: there are new
values of FLT_EVAL_METHOD, which are not currently supported in GCC,
and FLT_EVAL_METHOD == 0 now means that operations and constants on
types narrower than float are evaluated to the range and precision of
float.  Implementing that, so that _Float16 gets evaluated with excess
range and precision, would involve changes to the excess precision
infrastructure so that the _Float16 case is enabled by default, unlike
the x87 case which is only enabled for -fexcess-precision=standard.
Other differences between _Float16 and __fp16 would also need to be
disentangled.

GCC has some prior support for nonstandard floating-point types in the
form of __float80 and __float128.  Where these were previously types
distinct from long double, they are made by this patch into aliases
for _Float64x / _Float128 if those types have the required properties.

In principle the set of possible _FloatN types is infinite.  This
patch hardcodes the four such types for N <= 128, but with as much
code as possible using loops over types to minimize the number of
places with such hardcoding.  I don't think it's likely any further
such types will be of use in future (or indeed that formats suitable
for _Float128x will actually be implemented).  There is a corner case
that all _FloatN, for N >= 128 and a multiple of 32, should be treated
as keywords even when the corresponding type is not supported; I
intend to deal with that in a followup patch.

Tests are added for various functionality of the new types, mostly
using type-generic headers.  PowerPC maintainers should note that the
tests do not do anything regarding passing special options to 

Re: [PATCH 1/2][v3] Drop excess size used for run time allocated stack variables.

2016-06-22 Thread Jeff Law

On 05/25/2016 07:30 AM, Dominik Vogt wrote:

On Tue, May 03, 2016 at 03:17:53PM +0100, Dominik Vogt wrote:

> Version two of the patch including a test case.
>
> On Mon, May 02, 2016 at 09:10:25AM -0600, Jeff Law wrote:

> > On 04/29/2016 04:12 PM, Dominik Vogt wrote:

> > >The attached patch removes excess stack space allocation with
> > >alloca in some situations.  Please check the commit message in the
> > >patch for details.

>

> > However, I would strongly recommend some tests, even if they are
> > target specific.  You can always copy pr36728-1 into the s390x
> > directory and look at size of the generated stack.  Similarly for
> > pr50938 for x86.

>
> However, x86 uses the "else" branch in round_push, i.e. it uses
> "virtual_preferred_stack_boundary_rtx" to calculate the number of
> bytes to add for stack alignment.  That value is unknown at the
> time round_push is called, so the test case fails on such targets,
> and I've no idea how to fix this properly.

Third version of the patch with the suggested cleanup in the first
patch and the functional stuff in the second one.  The first patch
is based on Jeff's draft with the change suggested by Eric and
more cleanup added by me.

Tested and bootstrapped on s390x biarch (but did not look for
performance regressions as the change should be a no-op).

Ciao

Dominik ^_^  ^_^

-- Dominik Vogt IBM Germany


0001-ChangeLog


gcc/ChangeLog0

* explow.c (allocate_dynamic_stack_space): Simplify knowing that
MUST_ALIGN was always true and extra_align is always BITS_PER_UNIT.

I think this meets the spirit of Eric's request to save the comment.  So 
OK for the trunk.  I realize this is kind of self-approving since the 
original cleanup was mine, but Eric signaled he was OK with the cleanup 
as long as the comment was saved.


jeff


Re: [PATCH, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-22 Thread Bill Schmidt
Hi Joseph,

That's indeed preferable for the long term -- given how close we are to the 
cutoff for 6.2, though, I'm worried about adding any new dependencies for 
getting this upstream.  I'd suggest that we go ahead with reviewing this 
patch in the short term, and I'll be happy to work with you later on getting
the impedance matching right when they become arch-neutral.

Thanks!
Bill

> On Jun 22, 2016, at 3:26 PM, Joseph Myers  wrote:
> 
> On Wed, 22 Jun 2016, Bill Schmidt wrote:
> 
>> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
>> regressions.  All new tests pass except for the test for vspltish in
>> the infinity test; this relies on a patch in progress to fix things so we
>> generate that instead of an inferior sequence.  Is this ok for trunk,
>> and for 6.2 after an appropriate burn-in period?
> 
> Depending on how long it takes my _FloatN / _FloatNx patch to get 
> reviewed, it might be better to avoid adding the built-in functions as 
> target-specific on trunk.  Adding them as architecture-independent would 
> be extremely straightforward once the support for the types is there, and 
> then the architecture-specific versions would only be needed for GCC 6.  
> (The insn patterns and tests would still be relevant for trunk.)
> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com
> 



Re: [PATCH, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-22 Thread Joseph Myers
On Wed, 22 Jun 2016, Bill Schmidt wrote:

> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  All new tests pass except for the test for vspltish in
> the infinity test; this relies on a patch in progress to fix things so we
> generate that instead of an inferior sequence.  Is this ok for trunk,
> and for 6.2 after an appropriate burn-in period?

Depending on how long it takes my _FloatN / _FloatNx patch to get 
reviewed, it might be better to avoid adding the built-in functions as 
target-specific on trunk.  Adding them as architecture-independent would 
be extremely straightforward once the support for the types is there, and 
then the architecture-specific versions would only be needed for GCC 6.  
(The insn patterns and tests would still be relevant for trunk.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Add a new target hook to compute the frame layout

2016-06-22 Thread Bernd Edlinger
On 06/22/16 20:49, Jeff Law wrote:
> On 06/21/2016 11:12 PM, Bernd Edlinger wrote:
>
>>
>> What I wanted to say here, is that lra goes thru several iterations,
>> changes something in the register allocation that has an impact on the
>> frame layout, typically 4-5 times, and calls INITIAL_ELIMINATION_OFFSET
>> 3-4 times in a row, and the results must be consistent in each
>> iteration to be usable.
>>
>> So I am open to suggestions, how would you explain this idea in the doc?
> I'm not sure :(  The goal is still the same, you're trying to separate
> the O(n) from the O(1) operations.  So you want the COMPUTE_FRAME_LAYOUT
> hook to be called once for things which don't vary and
> INITIAL_ELIMINATION_OFFSET multiple times for things that do vary.
>
> Thinking more about this, which port has a particularly expensive
> INITIAL_ELIMINATION_OFFSET?
>

I'd bet on mips for instance.


Bernd.


Re: Fix for PR70926 in Libiberty Demangler (5)

2016-06-22 Thread Jeff Law

On 05/26/2016 01:02 AM, Marcel Böhme wrote:

Hi: Pending review.

Best - Marcel


On 3 May 2016, at 10:40 PM, Marcel Böhme  wrote:

Hi,

This fixes four access violations 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70926).

Two of these first read the value of a length variable len from the mangled 
string, then strncpy len characters from the mangled string; more than 
necessary.
The other two read the value of an array index n from the mangled string, which 
can be negative due to an overflow.
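
The two kinds of check can be sketched as follows.  This is a Python
illustration under stated assumptions, not the code in cplus-dem.c: the
helper names are hypothetical and INT_MAX assumes a 32-bit C int.

```python
INT_MAX = 2**31 - 1  # assumption: 32-bit C int

DIGITS = "0123456789"


def consume_count(mangled, pos):
    """Parse decimal digits starting at pos.

    Returns (value, new_pos), or (None, pos) if there are no digits or
    the value would not fit in a C int.
    """
    value = 0
    i = pos
    while i < len(mangled) and mangled[i] in DIGITS:
        value = value * 10 + (ord(mangled[i]) - ord("0"))
        if value > INT_MAX:
            return None, pos  # would overflow (and could go negative) in C
        i += 1
    if i == pos:
        return None, pos  # no digits at all
    return value, i


def read_counted_name(mangled, pos):
    """Read a length-prefixed name such as '3foo'; None if malformed."""
    length, start = consume_count(mangled, pos)
    if length is None or length > len(mangled) - start:
        return None  # length missing, overflowing, or past end of string
    return mangled[start:start + length]
```

The point of the second check is that the length read from the input is
never trusted beyond the characters that are actually left in the string.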

Bootstrapped and regression tested on x86_64-pc-linux-gnu. Test cases added to 
libiberty/testsuite/demangler-expected and checked PR70926 is resolved.

Best regards,
- Marcel

Index: libiberty/ChangeLog
===
--- libiberty/ChangeLog (revision 235801)
+++ libiberty/ChangeLog (working copy)
@@ -1,3 +1,12 @@
+2016-05-03  Marcel Böhme  
+
+   PR c++/70926
+   * cplus-dem.c: Handle large values and overflow when demangling
+   length variables.
+   (demangle_template_value_parm): Read only until end of mangled string.
+   (do_hpacc_template_literal): Likewise.
+   (do_type): Handle overflow when demangling array indices.

OK for the trunk.  Please install.

Sorry for the delays.

Jeff



[PATCH, rs6000] Add minimum __float128 built-in support required for glibc

2016-06-22 Thread Bill Schmidt
Hi,

This patch implements built-ins to support __float128 on 64-bit PowerPC.
This is a minimum set of built-ins required for use by glibc.  The following
six built-ins are supported:

  __builtin_absf128
  __builtin_copysignf128
  __builtin_huge_valf128
  __builtin_inff128
  __builtin_nanf128
  __builtin_nansf128

For the NaNs, I borrowed heavily from a similar patch posted recently for
the ia64 target, which allows the constants to be generated early on.  The
absf128 and copysignf128 built-ins rely on some existing patterns, although
I had to write a soft-float version for the latter.  For inff128, I've used a
four-instruction sequence to generate the bit pattern directly in the vector
register rather than requiring a load from the constant pool, although I've
included that alternative in commentary should it prove to be better in
practice.

New tests are provided to exercise the built-ins.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  All new tests pass except for the test for vspltish in
the infinity test; this relies on a patch in progress to fix things so we
generate that instead of an inferior sequence.  Is this ok for trunk,
and for 6.2 after an appropriate burn-in period?

Thanks!

Bill


[gcc]

2016-06-22  Bill Schmidt  

* config/rs6000/altivec.md (*altivec_vrl): Remove
asterisk from name.
(altivec_vslo_kf_v8hi): New define_insn.
* config/rs6000/rs6000-builtin.def (BU_FLOAT128_2): New #define.
(BU_FLOAT128_1): Likewise.
(BU_FLOAT128_0): Likewise.
(INFF128): New builtin.
(HUGE_VALF128): Likewise.
(FABSF128): Likewise.
(COPYSIGNF128): Likewise.
(RS6000_BUILTIN_NANF128): Likewise.
(RS6000_BUILTIN_NANSF128): Likewise.
* config/rs6000/rs6000.c (rs6000_fold_builtin): New prototype.
(TARGET_FOLD_BUILTIN): New #define.
(rs6000_builtin_mask_calculate): Add TARGET_FLOAT128 entry.
(rs6000_invalid_builtin): Add handling for RS6000_BTM_FLOAT128.
(rs6000_fold_builtin): New target hook implementation, handling
folding of 128-bit NaNs.
(rs6000_init_builtins): Initialize const_str_type_node; ensure all
entries are filled in to avoid problems during bootstrap
self-test; define builtins for 128-bit NaNs.
(rs6000_opt_mask): Add entry for float128.
* config/rs6000/rs6000.h (RS6000_BTM_FLOAT128): New #define.
(RS6000_BTM_COMMON): Include RS6000_BTM_FLOAT128.
(rs6000_builtin_type_index): Add RS6000_BTI_const_str.
(const_str_type_node): New #define.
* config/rs6000/vsx.md (infkf1): New define_expand.

[gcc/testsuite]

2016-06-22  Bill Schmidt  

* gcc.target/powerpc/abs128-1.c: New.
* gcc.target/powerpc/copysign128-1.c: New.
* gcc.target/powerpc/inf128-1.c: New.
* gcc.target/powerpc/inf128-2.c: New.
* gcc.target/powerpc/nan128-1.c: New.


Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md(revision 237619)
+++ gcc/config/rs6000/altivec.md(working copy)
@@ -1608,7 +1608,7 @@
   }"
   [(set_attr "type" "vecperm")])
 
-(define_insn "*altivec_vrl"
+(define_insn "altivec_vrl"
   [(set (match_operand:VI2 0 "register_operand" "=v")
 (rotate:VI2 (match_operand:VI2 1 "register_operand" "v")
(match_operand:VI2 2 "register_operand" "v")))]
@@ -1634,6 +1634,15 @@
   "vslo %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
+(define_insn "altivec_vslo_kf_v8hi"
+  [(set (match_operand:KF 0 "register_operand" "=v")
+(unspec:KF [(match_operand:V8HI 1 "register_operand" "v")
+  (match_operand:V8HI 2 "register_operand" "v")]
+UNSPEC_VSLO))]
+  "TARGET_ALTIVEC"
+  "vslo %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "vslv"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "v")
Index: gcc/config/rs6000/rs6000-builtin.def
===
--- gcc/config/rs6000/rs6000-builtin.def(revision 237619)
+++ gcc/config/rs6000/rs6000-builtin.def(working copy)
@@ -652,7 +652,30 @@
 | RS6000_BTC_BINARY),  \
CODE_FOR_ ## ICODE) /* ICODE */
 
+/* IEEE 128-bit floating-point builtins.  */
+#define BU_FLOAT128_2(ENUM, NAME, ATTR, ICODE)  \
+  RS6000_BUILTIN_2 (MISC_BUILTIN_ ## ENUM,  /* ENUM */  \
+"__builtin_" NAME,  /* NAME */  \
+   RS6000_BTM_FLOAT128,/* MASK */  \
+   (RS6000_BTC_ ## ATTR/* ATTR */  \
+| RS6000_BTC_BINARY),

Debug algorithms

2016-06-22 Thread François Dumont

Hi

Here is, finally, the long-promised patch to introduce Debug
algos, similar to the Debug containers.


Why such an evolution:
- More flexibility: debug algos can be used explicitly without
activating Debug mode.
- Performance: Debug algos can strip the Debug layer from container
iterators and invoke the normal algos. Operations on normal
iterators are faster, and we also benefit from the algo
specializations that sometimes exist for some container iterators (like
the std::deque ones). Also, normal algos now call other normal algos, so
Debug checks won't be done several times.
- It will be easier to implement new Debug checks without being limited
to doing so through some Debug macro.


To do so I introduced a new namespace, __cxx1998_a, used for normal
algos when Debug mode is active. I couldn't reuse __cxx1998 because, with
the current implementation of Debug containers, __cxx1998 is exposed and,
because of ADL, we could then have ambiguity between the Debug and normal
versions of the same algos. I also introduced a __std_a namespace, which
controls the kind of algos used within the library, mostly for container
implementation details.


Patch is compressed as it is quite big.

Tested under Linux x86_64 normal and debug modes.

François




debug_algos.patch.bz2
Description: BZip2 compressed data


Re: [PING**2] [PATCH] Fix asm X constraint (PR inline-asm/59155)

2016-06-22 Thread Jeff Law

On 06/19/2016 07:25 AM, Bernd Edlinger wrote:

Hi,

ping...

As this discussion did not make any progress, I just attached
the latest version of my patch with the changes that
Vladimir proposed.

Boot-strapped and reg-tested again on x86_64-linux-gnu.
Is it OK for the trunk?

Well, I don't think we've got any kind of consensus on whether
this is reasonable or not.


The fundamental issue is that "X" is supposed to accept anything, 
literally anything.  That implies it's really the downstream users of 
those operands that are broken.


Jeff



Re: [PATCH, rs6000] Scheduling update

2016-06-22 Thread Segher Boessenkool
Hi Pat,

On Tue, Jun 21, 2016 at 12:45:26PM -0500, Pat Haugen wrote:
> 2016-06-21  Pat Haugen  
> 
> * config/rs6000/power8.md (power8-fp): Include dfp type.
> * config/rs6000/power6.md (power6-fp): Likewise.

Please put the files in the changelog in some logical order, even if
your patch tool is a bit dumb.

> * config/rs6000/htm.md (various insns): Change type atribute to 
> htmsimple and set power9_alu2 appropriately.

"attribute", trailing space.

The "power9_alu2" attribute is writing part of the scheduling description
inside the machine description proper.  Can this be reduced, maybe by
adding an attribute describing something about the insns that makes them
be handled by the alu2?  I realise it isn't all so regular :-(

> (rs6000_option_override_internal): Remove temporary code setting
> tuning to power8. Don't set rs6000_sched_groups for power9.

Two spaces after full stop.

> (divCnt, vec_load_pendulum): New variables.

camelCase?

> (_rs6000_sched_context, rs6000_init_sched_context,
> rs6000_set_sched_context): Handle context save/restore of new
> variables.

Pre-existing, but we shouldn't use names starting with underscore+lowercase.

> Index: config/rs6000/htm.md
> ===
> --- config/rs6000/htm.md  (revision 237621)
> +++ config/rs6000/htm.md  (working copy)
> @@ -72,7 +72,8 @@ (define_insn "*tabort"
> (set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))]
>"TARGET_HTM"
>"tabort. %0"
> -  [(set_attr "type" "htm")
> +  [(set_attr "type" "htmsimple")
> +   (set_attr "power9_alu2" "yes")
> (set_attr "length" "4")])

What determines if an insn is htm or htmsimple?

> +(define_cpu_unit "x0_power9,x1_power9,xa0_power9,xa1_power9,\
> +   x2_power9,x3_power9,xb0_power9,xb1_power9,
> +   br0_power9,br1_power9" "power9dsp")

One line has a backslash and one does not.  None are needed I think?

> +; The xa0/xa1 units really represent the 3rd dispatch port for a superslice 
> but
> +; are listed as separate units to allow those insns that preclude its use to
> +; still be scheduled two to a superslice while reserving the 3rd slot. The
> +; same applies for xb0/xb1.

Two spaces after a full stop.

> +; Any execution slice dispatch 

Trailing space.

> +; Superslice
> +(define_reservation "DU_super_power9"
> + "x0_power9+x1_power9|x2_power9+x3_power9")

This needs parens around the alternatives?  Or is it superfluous in all
the other cases that use it?

> +(define_reservation "LSU_pair_power9"
> + "lsu0_power9+lsu1_power9|lsu1_power9+lsu2_power9|\
> +  lsu2_power9+lsu3_power9|lsu3_power9+lsu1_power9")

The 3+1 looks strange, please check (we've talked about that).

> +; 2 cycle FP ops
> +(define_attr "power9_fp_2cyc" "no,yes"
> +  (cond [(eq_attr "mnemonic" "fabs,fcpsgn,fmr,fmrgow,fnabs,fneg,\
> +   xsabsdp,xscpsgndp,xsnabsdp,xsnegdp,\
> +   xsabsqp,xscpsgnqp,xsnabsqp,xsnegqp")
> +  (const_string "yes")]
> +(const_string "no")))

Eww.  Can we have an attribute for the FP move instructions, instead?
Maybe a value "fpmove" for the "type", even?

> +; Quad-precision FP ops, execute in DFU
> +(define_attr "power9_qp" "no,yes"
> +  (if_then_else (ior (match_operand:KF 0 "" "")
> + (match_operand:TF 0 "" "")
> + (match_operand:KF 1 "" "")
> + (match_operand:TF 1 "" ""))
> +(const_string "yes")
> +(const_string "no")))

(The "" are not needed I think).

This perhaps could be better handled with the "size" attribute.

> +(define_insn_reservation "power9-load-ext" 6
> +  (and (eq_attr "type" "load")
> +   (eq_attr "sign_extend" "yes")
> +   (eq_attr "update" "no")
> +   (eq_attr "cpu" "power9"))
> +  "DU_C2_power9,LSU_power9")

So you do not describe the units used after the first cycle?  Why is
that, to keep the size of the automaton down?


> +(define_insn_reservation "power9-fpload-double" 4
> +  (and (eq_attr "type" "fpload")
> +   (eq_attr "update" "no")
> +   (match_operand:DF 0 "" "")
> +   (eq_attr "cpu" "power9"))
> +  "DU_slice_3_power9,LSU_power9")

Using match_operand here is asking for trouble.  "size", and you can
default that for "fpload" insns, and document there that it looks at the
mode of operands[0] for fpload?

> +; Store data can issue 2 cycles after AGEN issue, 3 cycles for vector store
> +(define_insn_reservation "power9-store" 0
> +  (and (eq_attr "type" "store")
> +   (not (and (eq_attr "update" "yes")
> +  (eq_attr "indexed" "yes")))
> +   (eq_attr "cpu" "power9"))
> +  "DU_slice_3_power9,LSU_power9")

That should be

+(define_insn_reservation "power9-store" 0
+  (and (eq_attr "type" "store")
+   (eq_attr "update" 

Re: [PATCH] Add a new target hook to compute the frame layout

2016-06-22 Thread Jeff Law

On 06/21/2016 11:12 PM, Bernd Edlinger wrote:



What I wanted to say here is that LRA goes through several iterations,
changes something in the register allocation that has an impact on the
frame layout, typically 4-5 times, and calls INITIAL_ELIMINATION_OFFSET
3-4 times in a row, and the results must be consistent in each
iteration to be usable.

So I am open to suggestions, how would you explain this idea in the doc?

I'm not sure :(  The goal is still the same, you're trying to separate
the O(n) from the O(1) operations.  So you want the COMPUTE_FRAME_LAYOUT 
hook to be called once for things which don't vary and 
INITIAL_ELIMINATION_OFFSET multiple times for things that do vary.


Thinking more about this, which port has a particularly expensive
INITIAL_ELIMINATION_OFFSET?


Jeff
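
The split discussed above (an expensive layout computation done once, followed by several cheap offset queries that must agree with each other) can be sketched in plain C.  This is only a toy model: the names toy_compute_frame_layout and toy_initial_elimination_offset, the register enum and the 8-byte save slot are assumptions made for illustration and are not GCC's actual hook or macro signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register identifiers; not GCC's real register numbers.  */
enum toy_reg { TOY_ARG_POINTER, TOY_FRAME_POINTER, TOY_STACK_POINTER };

/* Cached layout, filled in once per register-allocation iteration.  */
struct toy_frame_layout
{
  long saved_regs_size;   /* Bytes used by callee-saved registers.  */
  long locals_size;       /* Bytes used by local variables.  */
  int valid;              /* Nonzero once the layout has been computed.  */
};

static struct toy_frame_layout toy_frame;

/* The O(n) step: walk all registers once and cache the sizes.  */
void
toy_compute_frame_layout (const int *reg_is_saved, size_t nregs,
                          long locals_size)
{
  long saved = 0;

  for (size_t i = 0; i < nregs; i++)
    if (reg_is_saved[i])
      saved += 8;   /* Assumed: every saved register takes 8 bytes.  */

  toy_frame.saved_regs_size = saved;
  toy_frame.locals_size = locals_size;
  toy_frame.valid = 1;
}

/* The O(1) step: answer any number of queries from the cached layout,
   so repeated calls within one iteration are consistent by construction.  */
long
toy_initial_elimination_offset (enum toy_reg from, enum toy_reg to)
{
  assert (toy_frame.valid);

  if (from == TOY_ARG_POINTER && to == TOY_FRAME_POINTER)
    return toy_frame.saved_regs_size;
  if (from == TOY_FRAME_POINTER && to == TOY_STACK_POINTER)
    return toy_frame.locals_size;
  if (from == TOY_ARG_POINTER && to == TOY_STACK_POINTER)
    return toy_frame.saved_regs_size + toy_frame.locals_size;
  return 0;
}
```

In this model a caller would invoke toy_compute_frame_layout once after each change to the register allocation and then query the offsets as often as needed.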


[PATCH, i386]: Simplify SYMBOL_REF handling in ix86_expand_move

2016-06-22 Thread Uros Bizjak
Hello!

This patch de-duplicates code involving SYMBOL_REF processing in
ix86_expand_move.

No functional changes.

2016-06-22  Uros Bizjak  

* config/i386/i386.c (ix86_expand_move): Simplify SYMBOL_REF handling.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9d36106..fa0ba63 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19388,50 +19388,62 @@ void
 ix86_expand_move (machine_mode mode, rtx operands[])
 {
   rtx op0, op1;
+  rtx tmp, addend = NULL_RTX;
   enum tls_model model;
 
   op0 = operands[0];
   op1 = operands[1];
 
-  if (GET_CODE (op1) == SYMBOL_REF)
+  switch (GET_CODE (op1))
 {
-  rtx tmp;
+case CONST:
+  tmp = XEXP (op1, 0);
 
+  if (GET_CODE (tmp) != PLUS
+ || GET_CODE (XEXP (tmp, 0)) != SYMBOL_REF)
+   break;
+
+  op1 = XEXP (tmp, 0);
+  addend = XEXP (tmp, 1);
+  /* FALLTHRU */
+
+case SYMBOL_REF:
   model = SYMBOL_REF_TLS_MODEL (op1);
-  if (model)
-   {
- op1 = legitimize_tls_address (op1, model, true);
- op1 = force_operand (op1, op0);
- if (op1 == op0)
-   return;
- op1 = convert_to_mode (mode, op1, 1);
-   }
-  else if ((tmp = legitimize_pe_coff_symbol (op1, false)) != NULL_RTX)
-   op1 = tmp;
-}
-  else if (GET_CODE (op1) == CONST
-  && GET_CODE (XEXP (op1, 0)) == PLUS
-  && GET_CODE (XEXP (XEXP (op1, 0), 0)) == SYMBOL_REF)
-{
-  rtx addend = XEXP (XEXP (op1, 0), 1);
-  rtx symbol = XEXP (XEXP (op1, 0), 0);
-  rtx tmp;
 
-  model = SYMBOL_REF_TLS_MODEL (symbol);
   if (model)
-   tmp = legitimize_tls_address (symbol, model, true);
+   op1 = legitimize_tls_address (op1, model, true);
   else
-tmp = legitimize_pe_coff_symbol (symbol, true);
+   {
+ tmp = legitimize_pe_coff_symbol (op1, addend != NULL_RTX);
+ if (tmp)
+   {
+ op1 = tmp;
+ if (!addend)
+   break;
+   }
+ else
+   {
+ op1 = operands[1];
+ break;
+   }
+   }
 
-  if (tmp)
+  if (addend)
{
- tmp = force_operand (tmp, NULL);
- tmp = expand_simple_binop (Pmode, PLUS, tmp, addend,
+ op1 = force_operand (op1, NULL_RTX);
+ op1 = expand_simple_binop (Pmode, PLUS, op1, addend,
 op0, 1, OPTAB_DIRECT);
- if (tmp == op0)
-   return;
- op1 = convert_to_mode (mode, tmp, 1);
}
+  else
+   op1 = force_operand (op1, op0);
+
+  if (op1 == op0)
+   return;
+
+  op1 = convert_to_mode (mode, op1, 1);
+
+default:
+  break;
 }
 
   if ((flag_pic || MACHOPIC_INDIRECT)
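

The control flow of the new switch (peel a constant-wrapped symbol+addend, then fall through into the bare-symbol case) can be modelled outside GCC.  The sketch below is a toy, not the i386 back end: struct toy_rtx, the toy_code values and toy_split_symbol are hypothetical stand-ins for rtx, GET_CODE and XEXP, and it only shows how the two former branches share one code path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for RTL codes.  */
enum toy_code { TOY_SYMBOL_REF, TOY_PLUS, TOY_CONST, TOY_CONST_INT };

/* A tiny expression node with up to two operands.  */
struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op0;
  const struct toy_rtx *op1;
};

/* If X is a bare symbol, or (const (plus symbol addend)), return the
   symbol and store the addend (or NULL) in *ADDEND.  Return NULL for
   anything else.  */
const struct toy_rtx *
toy_split_symbol (const struct toy_rtx *x, const struct toy_rtx **addend)
{
  const struct toy_rtx *tmp;

  assert (x != NULL && addend != NULL);
  *addend = NULL;

  switch (x->code)
    {
    case TOY_CONST:
      tmp = x->op0;
      if (tmp->code != TOY_PLUS || tmp->op0->code != TOY_SYMBOL_REF)
        break;

      x = tmp->op0;
      *addend = tmp->op1;
      /* FALLTHRU */

    case TOY_SYMBOL_REF:
      return x;

    default:
      break;
    }

  return NULL;
}
```

After the split, a single block of code can handle the symbol and apply the addend only when one was found, which is the de-duplication the patch describes.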


Re: [PATCH 3/8] nvptx -muniform-simt

2016-06-22 Thread Alexander Monakov
Ping.

On Mon, 13 Jun 2016, Alexander Monakov wrote:
> On Sun, 12 Jun 2016, Sandra Loosemore wrote:
> > On 06/09/2016 10:53 AM, Alexander Monakov wrote:
> > > +@item -muniform-simt
> > > +@opindex muniform-simt
> > > +Generate code that allows to keep all lanes in each warp active, even when
> > 
> > Allows *what* to keep?  E.g. what is doing the keeping here?  If it is the
> > generated code itself, please rephrase as
> > 
> > Generate code that keeps
> 
> Let me try to expand and rephrase what I meant:
> 
> Allows the compiler to emit code that, at run time, may have all lanes active,
> particularly in those regions of the program where observable effects from
> execution must happen as if one lane is active (outside of SIMD loops).
> 
> But nevertheless generated code can run just like conventionally generated
> code does: with each lane being active/inactive independently, and side
> effects happening from each active lane (inside of SIMD loops).
> 
> Whether it actually runs in the former (let's call it "uniform") or the latter
> ("conventional") way is switchable at run time. The compiler itself is
> responsible for emitting mode changes at SIMD region boundaries.
> 
> Does this help? Below I went with your suggestion, but changed "keeps" to "may
> keep" because that's generally true only outside of SIMD regions.
> 
> > > +observable effects from execution should appear as if only one lane was
> > 
> > s/was/is/
> > 
> > > +active. This is achieved by instrumenting syscalls and atomic instructions in
> > > +a lightweight way that allows to switch behavior at runtime. This code
> > 
> > Same issue here: allows *what* to switch behavior?  (And how would you
> > select which run-time behavior you want?)
> 
> Sorry. This gives the compiler itself a way to emit code that will switch the
> behavior of the subsequently running code.
> 
> > Also, in the snippet above where it is used as a noun, please
> > s/runtime/run time/
> 
> Thanks. Does the following look better?
> 
> @item -muniform-simt
> @opindex muniform-simt
> Generate code that may keep all lanes in each warp active, even when
> observable effects from execution must appear as if only one lane is active.
> This is achieved by instrumenting syscalls and atomic instructions in a
> lightweight way, allowing the compiler to emit code that can switch at run
> time between this and conventional execution modes. This code generation
> variant is used for OpenMP offloading, but the option is exposed on its own
> for the purpose of testing the compiler; to generate code suitable for linking
> into programs using OpenMP offloading, use option @option{-mgomp}.
> 
> Alexander


Re: AW: [PATCH] Add a new target hook to compute the frame layout

2016-06-22 Thread Jeff Law

On 06/22/2016 01:20 AM, Bernd Edlinger wrote:

On 06/21/16 23:29, Jeff Law wrote:


How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?


That I forgot to mention:  INITIAL_FRAME_POINTER_OFFSET is just
a single call, so whenever it is called from lra/reload the frame layout
is really expected to change, and so it does not make a difference if the target
computes the frame layout in TARGET_COMPUTE_FRAME_LAYOUT or in
INITIAL_FRAME_POINTER_OFFSET.

But I do not know of any targets that still use INITIAL_FRAME_POINTER_OFFSET,
and maybe support for this target hook could be discontinued as a follow-up 
patch.

INITIAL_FRAME_POINTER_OFFSET is only defined in 4 ports:

./ft32/ft32.h:#define INITIAL_FRAME_POINTER_OFFSET(DEPTH) (DEPTH) = 0
./m32r/m32r.h:#define INITIAL_FRAME_POINTER_OFFSET(VAR) \
./moxie/moxie.h:#define INITIAL_FRAME_POINTER_OFFSET(DEPTH) (DEPTH) = 0
./vax/vax.h:#define INITIAL_FRAME_POINTER_OFFSET(DEPTH) (DEPTH) = 0;

However, the m32r version is actually #if 0'd out.  So it's really only 
defined in 3 ports and always to "0".  So yea, it'd be a good candidate 
to collapse away.


Jeff



Re: [PATCH] Backport PowerPC complex __float128 compiler support to GCC 6.x

2016-06-22 Thread Michael Meissner
On Wed, Jun 15, 2016 at 11:01:05AM +0200, Richard Biener wrote:
> And I don't understand the layout_type change either - it looks to me
> it could just have used
> 
>   SET_TYPE_MODE (type, GET_MODE_COMPLEX_MODE (TYPE_MODE (TREE_TYPE (type))));
> 
> and be done with it.  To me that looks a lot safer.

I made this change in the trunk, and now I would like approval for applying
this code which includes the above change in the GCC 6.2 branch.

Here is the change for the trunk:
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01489.html

I tested it on both a big endian power7 and a little endian power8 system with
no regressions.  Is it ok to apply to the GCC 6.2 branch?

[gcc]
2016-06-22  Michael Meissner  

Back port from trunk
2016-06-21  Michael Meissner  

* stor-layout.c (layout_type): Move setting complex MODE to
layout_type, instead of setting it ahead of time by the caller.

Back port from trunk
2016-05-11  Alan Modra  

* config/rs6000/rs6000.c (is_complex_IBM_long_double,
abi_v4_pass_in_fpr): New functions.
(rs6000_function_arg_boundary): Exclude complex IBM long double
from 64-bit alignment when ABI_V4.
(rs6000_function_arg, rs6000_function_arg_advance_1,
rs6000_gimplify_va_arg): Use abi_v4_pass_in_fpr.

Back port from trunk
2016-05-02  Michael Meissner  

* machmode.h (mode_complex): Add support to give the complex mode
for a given mode.
(GET_MODE_COMPLEX_MODE): Likewise.
* stor-layout.c (layout_type): For COMPLEX_TYPE, use the mode
stored by build_complex_type and gfc_build_complex_type instead of
trying to figure out the appropriate mode based on the size. Raise
an assertion error, if the type was not set.
* genmodes.c (struct mode_data): Add field for the complex type of
the given type.
(blank_mode): Likewise.
(make_complex_modes): Remember the complex mode created in the
base type.
(emit_mode_complex): Write out the mode_complex array to map a
type mode to the complex version.
(emit_insn_modes_c): Likewise.
* tree.c (build_complex_type): Set the complex type to use before
calling layout_type.
* config/rs6000/rs6000.c (rs6000_hard_regno_nregs_internal): Add
support for __float128 complex datatypes.
(rs6000_hard_regno_mode_ok): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_complex_function_value): Likewise.
* config/rs6000/rs6000.h (FLOAT128_IEEE_P): Likewise.
__float128 and __ibm128 complex.
(FLOAT128_IBM_P): Likewise.
(ALTIVEC_ARG_MAX_RETURN): Likewise.
* doc/extend.texi (Additional Floating Types): Document that
-mfloat128 must be used to enable __float128.  Document complex
__float128 and __ibm128 support.

[gcc/testsuite]
2016-06-22  Michael Meissner  

Back port from trunk
2016-05-02  Michael Meissner  

* gcc.target/powerpc/float128-complex-1.c: New tests for complex
__float128.
* gcc.target/powerpc/float128-complex-2.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 237619)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -1882,7 +1882,7 @@ rs6000_hard_regno_nregs_internal (int re
  128-bit floating point that can go in vector registers, which has VSX
  memory addressing.  */
   if (FP_REGNO_P (regno))
-reg_size = (VECTOR_MEM_VSX_P (mode)
+reg_size = (VECTOR_MEM_VSX_P (mode) || FLOAT128_VECTOR_P (mode)
? UNITS_PER_VSX_WORD
: UNITS_PER_FP_WORD);
 
@@ -1914,6 +1914,9 @@ rs6000_hard_regno_mode_ok (int regno, ma
 {
   int last_regno = regno + rs6000_hard_regno_nregs[mode][regno] - 1;
 
+  if (COMPLEX_MODE_P (mode))
+mode = GET_MODE_INNER (mode);
+
   /* PTImode can only go in GPRs.  Quad word memory operations require even/odd
  register combinations, and use PTImode where we need to deal with quad
  word memory operations.  Don't allow quad words in the argument or frame
@@ -2716,8 +2719,17 @@ rs6000_setup_reg_addr_masks (void)
 
   for (m = 0; m < NUM_MACHINE_MODES; ++m)
 {
-  machine_mode m2 = (machine_mode)m;
-  unsigned short msize = GET_MODE_SIZE (m2);
+  machine_mode m2 = (machine_mode) m;
+  bool complex_p = false;
+  size_t msize;
+
+  if (COMPLEX_MODE_P (m2))
+   {
+ complex_p = true;
+ m2 = GET_MODE_INNER (m2);
+   }
+
+  msize = GET_MODE_SIZE (m2);
 
   /* 

Re: [Patch, avr] Fix PR 71151

2016-06-22 Thread Georg-Johann Lay

Senthil Kumar Selvaraj schrieb:

Senthil Kumar Selvaraj writes:


Georg-Johann Lay writes:


Senthil Kumar Selvaraj schrieb:

Hi,

  [set JUMP_TABLES_IN_TEXT_SECTION to 1]


I added tests that use linker relaxation and discovered a relaxation bug
in binutils 2.26 (and later) that messes up symbol values in the
presence of alignment directives. I'm working on that right now -
hopefully, it'll get backported to the release branch.

Once that gets upstream, I'll resend the patch - with more tests, and
incorporating your comments.



There were two binutils bugs (PR ld/20221 and ld/20254) that were
blocking this patch - on enabling relaxation, jump tables were
getting corrupted. Both of the issues are now fixed, and the fixes
are in master and the 2.26 branch.


Should we mention in the release notes that Binutils >= 2.26 is needed 
for avr-gcc >= 6 ?


Maybe even check during configure whether an appropriate version of 
Binutils is used?


Johann


Re: [patch] preserve DECL_ORIGINAL_TYPE invariant in remap_decl

2016-06-22 Thread Jeff Law

On 06/22/2016 03:12 AM, Eric Botcazou wrote:

Hi,

the invariant is that DECL_ORIGINAL_TYPE (t) != TREE_TYPE (t) as pointer value
if t is a TYPE_DECL.  It's enforced by the DWARF back-end:

  if (DECL_ORIGINAL_TYPE (decl))
{
  type = DECL_ORIGINAL_TYPE (decl);

  if (type == error_mark_node)
return;

  gcc_assert (type != TREE_TYPE (decl));

[...]

  /* Prevent broken recursion; we can't hand off to the same type.  */
  gcc_assert (DECL_ORIGINAL_TYPE (TYPE_NAME (type)) != type);

Unfortunately it can be easily broken in remap_decl:

  /* Remap types, if necessary.  */
  TREE_TYPE (t) = remap_type (TREE_TYPE (t), id);
  if (TREE_CODE (t) == TYPE_DECL)
DECL_ORIGINAL_TYPE (t) = remap_type (DECL_ORIGINAL_TYPE (t), id);

If TREE_TYPE (t) is for example a pointer to a variably-modified type, then
the types are remapped by means of build_pointer_type_for_mode, which means
that they are also canonicalized, so TREE_TYPE (t) == DECL_ORIGINAL_TYPE (t)
after the remapping.  This happens in Ada, but also in C for:

extern void bar (void *) __attribute__((noreturn));

static int foo (int i, unsigned int n)
{
  if (i == 0)
{
  struct S { int a[n]; };
  typedef struct S *ptr;
  ptr p = __builtin_malloc (sizeof (struct S));
  bar (p);
}

  return i > 0 ? 1 : -1;
}

int f1 (int i, unsigned int n)
{
  return foo (i, n);
}

int f2 (int i, unsigned int n)
{
  return foo (i, n);
}

when foo is split into 2 parts at -O2.

This generally goes unnoticed because the inliner sets DECL_ABSTRACT_ORIGIN on
the remapped TYPE_DECL, so gen_typedef_die skips it:

  type_die = new_die (DW_TAG_typedef, context_die, decl);
  origin = decl_ultimate_origin (decl);
  if (origin != NULL)
add_abstract_origin_attribute (type_die, origin);
  else
{
  tree type;

  add_name_and_src_coords_attributes (type_die, decl);
  if (DECL_ORIGINAL_TYPE (decl))
{
  type = DECL_ORIGINAL_TYPE (decl);

  if (type == error_mark_node)
return;

  gcc_assert (type != TREE_TYPE (decl));
  equate_type_number_to_die (TREE_TYPE (decl), type_die);
}

But, in LTO mode, DECL_ABSTRACT_ORIGIN is not streamed so it's another story
and this for example breaks the LTO build of the Ada compiler at -O2 -g.

Hence the attached ad-hoc attempt at preserving the invariant in remap_decl,
which appears to work and is sufficient to fix the aforementioned bootstrap.

Tested on x86_64-suse-linux, OK for the mainline?


2016-06-22  Eric Botcazou  

* tree-inline.c (remap_decl): Preserve DECL_ORIGINAL_TYPE invariant.

Consider a comment pointing back to the relevant dwarf2 code that
enforces the invariant.


OK for the trunk.

jeff
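
The way canonicalization collapses the two types, and one way to restore the invariant, can be modelled outside GCC.  Everything below is a toy: struct toy_type, toy_build_pointer_type and toy_remap_typedef are hypothetical names, and making a distinct copy is just one possible repair, not necessarily what the attached patch does.

```c
#include <assert.h>
#include <stdlib.h>

/* A toy type node: only pointer types are modelled.  */
struct toy_type
{
  const struct toy_type *pointee;
  int is_variant_copy;
};

/* Canonical table: at most one shared pointer type per pointee.  */
#define TOY_MAX_TYPES 16
static struct toy_type *toy_ptr_table[TOY_MAX_TYPES];
static size_t toy_ptr_count;

/* Return the canonical pointer type for POINTEE, creating it on first
   use.  Two calls with the same pointee return the same node.  */
struct toy_type *
toy_build_pointer_type (const struct toy_type *pointee)
{
  for (size_t i = 0; i < toy_ptr_count; i++)
    if (toy_ptr_table[i]->pointee == pointee)
      return toy_ptr_table[i];

  assert (toy_ptr_count < TOY_MAX_TYPES);
  struct toy_type *t = calloc (1, sizeof *t);
  assert (t != NULL);
  t->pointee = pointee;
  toy_ptr_table[toy_ptr_count++] = t;
  return t;
}

/* A toy typedef declaration with its type and its original type.  */
struct toy_typedef
{
  struct toy_type *type;
  struct toy_type *original_type;
};

/* Remap both types of D to point at NEW_POINTEE.  Both go through the
   canonicalizing constructor, so they come back identical; a distinct
   copy is then made to keep original_type != type.  */
void
toy_remap_typedef (struct toy_typedef *d, const struct toy_type *new_pointee)
{
  d->type = toy_build_pointer_type (new_pointee);
  d->original_type = toy_build_pointer_type (new_pointee);

  if (d->type == d->original_type)
    {
      struct toy_type *copy = malloc (sizeof *copy);
      assert (copy != NULL);
      *copy = *d->type;
      copy->is_variant_copy = 1;
      d->original_type = copy;
    }
}
```

Without the final check, the two fields would compare equal as pointers after remapping, which is exactly the condition the DWARF back end asserts against.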


Re: [PATCH 3/3] Add make autoprofiledbootstrap

2016-06-22 Thread Andi Kleen
On Wed, Jun 22, 2016 at 11:51:17AM -0600, Jeff Law wrote:
> On 06/22/2016 06:37 AM, Andi Kleen wrote:
> >From: Andi Kleen 
> >
> >Add support for profiledbootstrap with autofdo. Will be useful
> >to get better testing coverage of autofdo.
> >
> >This requires Linux perf and autofdo to be installed, only
> >really for x86_64 linux on Intel so far.
> I really think we ought to check that and give a sensible error if
> autoprofiledbootstrap is requested.

It will fail at some point in the second stage with "perf not found" or
somesuch.  Is that good enough?

> 
> What happens in a one-tree build for the gas, gdb, binutils
> subdirectories?  Does anything special need to be done for those to
> make sure they build correctly (even if they aren't taking advantage
> of profiling data)?

I haven't tested that explicitly (is there an easy way to do that?)

But in general the perf files are handled in the same way as profile
feedback files for profiledbootstrap. If that works the perf build
should work too.

> 
> I think with a reasonable error if perf isn't available and a sense
> that we're not going to horribly break one-tree-builds this should
> be ready to install.

Thanks Jeff.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [patch,testsuite,avr]: Support dg-require-effective-target avr_3byte_pc (and use it with PR71151 tests).

2016-06-22 Thread Jeff Law

On 06/22/2016 11:44 AM, Georg-Johann Lay wrote:

Jeff Law schrieb:

On 06/22/2016 08:21 AM, Georg-Johann Lay wrote:

Some tests for PR71151 assume that the target MCU has a 3-byte PC.  The
tests are failing because the simulator (avrtest) refuses to load the
respective executables if .text exceeds 128KiB, e.g. for -mmcu=atmega128
which has only flash of 128KiB and only a 2-byte PC.

Hence the tests have to be skipped if the target MCU has no 3-byte PC,
hence a new dg-require-effective-target proc supporting "avr_3byte_pc".

I added the new proc right after the last check_effective_target_arm_***
so that the test is in ASCII collating order.

Ok for trunk and v6?

Would it make more sense to have a generic test around the size of the
PC and/or the size of pointers rather than something AVR specific?

Jeff


At least a test for pointer-size won't work for avr because pointers for
avr are 2-byte, yet there are AVR devices with 3-byte PC.

One of the H8 series might have had similar properties.  H8 stuff is
getting rather fuzzy for me these days.


I vaguely recall code somewhere that detected when the resulting program
wouldn't fit into the address space and gracefully exited with an error.
I think it was the linker that complained and something in dejagnu knew
to look for that error.  It's probably worth digging around a bit to
find that code and see if it can be re-used.


jeff




Re: [PATCH 1/3] Add gcc-auto-profile script

2016-06-22 Thread Jeff Law

On 06/22/2016 11:44 AM, Andi Kleen wrote:

2016-06-22  Andi Kleen  

* config/i386/gcc-auto-profile: New file.

contrib/:

2016-06-22  Andi Kleen  

* gen_autofdo_event.py: New file to regenerate
gcc-auto-profile.

This part looks fine to me and can probably go in independently of the other
bits.  Right?


Right. Is that an approval?

Yes.
jeff



Re: [PATCH 3/3] Add make autoprofiledbootstrap

2016-06-22 Thread Jeff Law

On 06/22/2016 06:37 AM, Andi Kleen wrote:

From: Andi Kleen 

Add support for profiledbootstrap with autofdo. Will be useful
to get better testing coverage of autofdo.

This requires Linux perf and autofdo to be installed, only
really for x86_64 linux on Intel so far.

I really think we ought to check that and give a sensible error if
autoprofiledbootstrap is requested.


What happens in a one-tree build for the gas, gdb, binutils
subdirectories?  Does anything special need to be done for those to make 
sure they build correctly (even if they aren't taking advantage of 
profiling data)?


I think with a reasonable error if perf isn't available and a sense that 
we're not going to horribly break one-tree-builds this should be ready 
to install.

jeff




Re: [patch,testsuite,avr]: Support dg-require-effective-target avr_3byte_pc (and use it with PR71151 tests).

2016-06-22 Thread Georg-Johann Lay

Jeff Law schrieb:

On 06/22/2016 08:21 AM, Georg-Johann Lay wrote:

Some tests for PR71151 assume that the target MCU has a 3-byte PC.  The
tests are failing because the simulator (avrtest) refuses to load the
respective executables if .text exceeds 128KiB, e.g. for -mmcu=atmega128
which has only flash of 128KiB and only a 2-byte PC.

Hence the tests have to be skipped if the target MCU has no 3-byte PC,
hence a new dg-require-effective-target proc supporting "avr_3byte_pc".

I added the new proc right after the last check_effective_target_arm_***
so that the test is in ASCII collating order.

Ok for trunk and v6?

Would it make more sense to have a generic test around the size of the
PC and/or the size of pointers rather than something AVR specific?


Jeff


At least a test for pointer-size won't work for avr because pointers for 
avr are 2-byte, yet there are AVR devices with 3-byte PC.


Dunno if other targets need to distinguish between PC sizes like the avr 
target...  For the vast majority of targets stuff like ptr32plus et al. 
are likely to be the right kinds of filters, but that's not the case for 
avr -- ptr*plus won't do it.


Actually it's a test for whether avrtest simulator is run for 
-mmcu=avr51 cores or -mmcu=avr6 cores, so this is quite target specific. 
 It's just the case that 3-byte-pc is more descriptive for avr people 
and more related to gcc because that feature is tested in many places in 
the avr BE and even shipped to user land as a built-in macro.


If it helps other targets I could rename it to, say, pc17plus for 
"Program Counter is at least 17 bits wide".


Johann
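
The arithmetic behind the "pc17plus" suggestion can be spelled out with a small calculation.  The sketch assumes what the thread implies, that AVR program memory is addressed in 16-bit words, so 128KiB of flash needs 16 PC bits and anything larger needs at least 17; the function names pc_bits_needed and needs_3byte_pc are hypothetical and not part of the testsuite.

```c
#include <assert.h>

/* Number of program-counter bits needed to address FLASH_BYTES of
   program memory, assuming the PC counts 2-byte words.  */
unsigned
pc_bits_needed (unsigned long flash_bytes)
{
  assert (flash_bytes >= 2 && flash_bytes % 2 == 0);

  unsigned long words = flash_bytes / 2;
  unsigned bits = 0;

  /* Smallest BITS such that 2^BITS covers every word address.  */
  while ((1UL << bits) < words)
    bits++;
  return bits;
}

/* Nonzero if a 2-byte (16-bit) PC cannot reach all of the flash.  */
int
needs_3byte_pc (unsigned long flash_bytes)
{
  return pc_bits_needed (flash_bytes) > 16;
}
```

Under that assumption a 128KiB device stays within a 2-byte PC while a 256KiB device does not, which is the distinction the proposed effective-target check is meant to capture.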





Re: [PATCH 1/3] Add gcc-auto-profile script

2016-06-22 Thread Andi Kleen
> > 2016-06-22  Andi Kleen  
> > 
> > * config/i386/gcc-auto-profile: New file.
> > 
> > contrib/:
> > 
> > 2016-06-22  Andi Kleen  
> > 
> > * gen_autofdo_event.py: New file to regenerate
> > gcc-auto-profile.
> This part looks fine to me and can probably go in independently of the other
> bits.  Right?

Right. Is that an approval?

-Andi


Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-06-22 Thread Jeff Law

On 06/22/2016 10:09 AM, Ilya Enkovich wrote:


Given the common structure & duplication I can't help but wonder if a single
function should be used for widening/narrowing.  Ultimately can't you swap
mask_elems/req_elems and always go narrower to wider (using a different
optab for the two different cases)?


I think we can't always go in the narrower-to-wider direction because widening
uses two optabs and also because of the way insn_data is checked.

OK.  Thanks for considering.



I'm guessing Richi's comment about what tree type you're looking at refers
to this and similar instances.  Doesn't this give you the type of the number
of iterations rather than the type of the iteration variable itself?




Since I build the vector IV myself and use it to compare with NITERS, I
feel it's safe to use the type of NITERS.  Do you expect the NITERS and IV
types to differ?

Since you're comparing to NITERS, it sounds like you've got it right and
that Richi and I have it wrong.


It's less a question of whether or not we expect NITERS and IV to have
different types, but more a realization that there's nothing that
inherently says they have to be the same.  They probably are the same
most of the time, but I don't think that's something we can or should
necessarily depend on.





@@ -1791,6 +1870,20 @@ vectorizable_mask_load_store (gimple *stmt,
gimple_stmt_iterator *gsi,
   && !useless_type_conversion_p (vectype, rhs_vectype)))
 return false;

+  if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
+{
+  /* Check that mask conjuction is supported.  */
+  optab tab;
+  tab = optab_for_tree_code (BIT_AND_EXPR, vectype, optab_default);
+  if (!tab || optab_handler (tab, TYPE_MODE (vectype)) ==
CODE_FOR_nothing)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"cannot be masked: unsupported mask
operation\n");
+ LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
+   }
+}


Should the optab querying be in optabs-query.c?


We always directly call optab_handler for simple operations.  There are dozens
of such calls in the vectorizer.

OK.  I would look favorably on a change to move those queries out into
optabs-query as a separate patch.




We don't embed masking capabilities into vectorizer.

Actually we don't depend on masking capabilities so much.  We have to mask
loads and stores and use can_mask_load_store for that which uses existing optab
query.  We also require masking for reductions and use VEC_COND for that
(and use existing expand_vec_cond_expr_p).  Other checks are to check if we
can build required masks.  So we actually don't expose any new processor
masking capabilities to GIMPLE.  I.e. all this works on targets with no
rich masking capabilities.  E.g. we can mask loops for quite old SSE targets.

OK.  I think the key here is that load/store masking already exists and
the others are either VEC_COND or checking if we can build the mask
rather than can the operation be masked.  Thanks for clarifying.

jeff
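
The point that a reduction needs no new masking primitive, only a select between the updated and the previous accumulator, can be illustrated with a scalar emulation.  This is a sketch under assumptions: TOY_VF (a vectorization factor of 4) and toy_masked_sum are hypothetical, and the per-lane ternary merely stands in for what a VEC_COND-style select would do on a real vector.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_VF 4   /* Assumed vectorization factor.  */

/* Sum A[0..NITERS-1] lane by lane without a scalar epilogue.  */
int
toy_masked_sum (const int *a, unsigned long niters)
{
  int acc[TOY_VF] = { 0 };

  assert (a != NULL || niters == 0);

  for (unsigned long base = 0; base < niters; base += TOY_VF)
    for (int lane = 0; lane < TOY_VF; lane++)
      {
        /* Lane is active only while its scalar index is in range.  */
        int active = (base + lane) < niters;

        /* Masked load: inactive lanes never touch memory.  */
        int val = active ? a[base + lane] : 0;

        /* Select: inactive lanes keep their previous accumulator.  */
        int next = acc[lane] + val;
        acc[lane] = active ? next : acc[lane];
      }

  /* Final horizontal reduction over the lanes.  */
  int sum = 0;
  for (int lane = 0; lane < TOY_VF; lane++)
    sum += acc[lane];
  return sum;
}
```

With six elements the second vector iteration has only two active lanes, and the select keeps the other two accumulators unchanged.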


Re: [PATCH 2/3] Run profile feedback tests with autofdo

2016-06-22 Thread Jeff Law

On 06/22/2016 06:37 AM, Andi Kleen wrote:

From: Andi Kleen 

Extend the existing bprob and tree-prof tests to also run with autofdo.
The test runtimes are really a bit too short for autofdo, but it's
a reasonable sanity check.

This only works natively for now.

dejagnu doesn't seem to support a wrapper for unix tests, so I had
to open code running these tests.  That should be ok due to the
native run restrictions.

gcc/testsuite/:

2016-06-22  Andi Kleen  

* g++.dg/bprob/bprob.exp: Support autofdo.
* g++.dg/tree-prof/tree-prof.exp: Ditto.
* gcc.dg/tree-prof/tree-prof.exp: Ditto.
* gcc.misc-tests/bprob.exp: Ditto.
* gfortran.dg/prof/prof.exp: Ditto.
* lib/profopt.exp: Ditto.
* lib/target-supports.exp: Check for autofdo.

I'm generally OK with this as well.  My only concern is that we get
something sensible on targets which don't support autofdo and on systems 
without perf installed.


ISTM the right result is for the autofdo versions of those tests to be
UNSUPPORTED in both of those situations.  Can you confirm that's what
happens in those two cases?  If we do indeed get UNSUPPORTED for the 
autofdo versions of those tests in those cases, then this patch is OK as 
well.


jeff



Re: [PATCH 1/3] Add gcc-auto-profile script

2016-06-22 Thread Jeff Law

On 06/22/2016 06:37 AM, Andi Kleen wrote:

From: Andi Kleen 

Using autofdo is currently somewhat difficult. It requires using the
model-specific branches-taken event, which differs between CPUs.
The example shown in the manual requires a specially patched version of
perf that is non-standard, and also will likely not work everywhere.

This patch adds a new gcc-auto-profile script that figures out the
correct event and runs perf.

This is needed to actually make use of autofdo in a generic way
in the build system and in the test suite.

Since maintaining the script would be somewhat tedious (needs changes
every time a new CPU comes out) I auto generated it from the online
Intel event database. The script to do that is in contrib and can be
rerun.

Right now there is no test in configure for whether perf works. This
would vary depending on the build and target system, and since
it currently doesn't work in virtualization and needs an up-to-date
kernel it may often fail in common distribution build setups.

So far the script is not installed.

v2: Remove documentation of gcc-auto-profile, as it's not
installed.

gcc/:
2016-06-22  Andi Kleen  

* doc/invoke.texi: Document gcc-auto-profile
* config/i386/gcc-auto-profile: New file.

contrib/:

2016-06-22  Andi Kleen  

* gen_autofdo_event.py: New file to regenerate
gcc-auto-profile.

This part looks fine to me and can probably go in independently of the
other bits.  Right?


Jeff



Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-06-22 Thread Jeff Law

On 06/22/2016 09:03 AM, Ilya Enkovich wrote:

2016-06-16 9:26 GMT+03:00 Jeff Law :

On 06/15/2016 05:22 AM, Richard Biener wrote:



You look at TREE_TYPE of LOOP_VINFO_NITERS (loop_vinfo) - I don't think
this is meaningful (if it is, then only by accident).  I think you should
look at the control IV itself, possibly its value range, to determine the
smallest possible type to use.


Can we get an IV that's created after VRP?  If so, then we have to be
prepared for the case where there's no range information on the IV.  At
which point I think using type min/max of the IV is probably the right
fallback.  But I do think we should be looking at range info much more
systematically.

I can't see how TREE_TYPE of the NITERS makes sense either.


I need to build a vector {niters, ..., niters} and compare to it.  Why doesn't
it make sense to choose the same type for the IV?  I agree that choosing a
smaller type may be beneficial.  Shouldn't I look at nb_iterations_upper_bound
then to check if NITERS can be cast to a smaller type?

Isn't TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)) the type of the
constant being used to represent the number of iterations?  That is 
independent of the type of the IV.


Though I guess your argument is that since you're building a vector of 
niters, that indeed what you want is the type of that constant, not the 
type of the IV.  That might be worth a comment in the code :-)


jeff
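
The comparison described in this exchange, a vector IV checked lane by lane against a splat of NITERS, can be emulated with scalar code.  The sketch is an assumption-laden model rather than vectorizer output: TOY_VF, toy_build_mask and toy_masked_add_one are hypothetical names, and both the IV and niters use unsigned long so that the comparison is done in one common type, as argued above.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_VF 4   /* Assumed vectorization factor.  */

/* Build the lane mask for the vector iteration starting at scalar
   index BASE: lane I is active when BASE + I < NITERS, i.e. the IV
   vector {base, base+1, ...} compared against {niters, ..., niters}.  */
void
toy_build_mask (unsigned long base, unsigned long niters, int mask[TOY_VF])
{
  for (int i = 0; i < TOY_VF; i++)
    mask[i] = (base + i) < niters;
}

/* Compute dst[i] = src[i] + 1 for i < NITERS using only masked vector
   iterations; no scalar tail loop is needed.  */
void
toy_masked_add_one (int *dst, const int *src, unsigned long niters)
{
  assert ((dst != NULL && src != NULL) || niters == 0);

  for (unsigned long base = 0; base < niters; base += TOY_VF)
    {
      int mask[TOY_VF];

      toy_build_mask (base, niters, mask);
      for (int i = 0; i < TOY_VF; i++)
        if (mask[i])   /* Masked load and store.  */
          dst[base + i] = src[base + i] + 1;
    }
}
```

The model ignores overflow of base + i near the top of the type's range, which a real implementation would have to rule out, for example through the upper bound on the iteration count mentioned above.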



Re: [patch,testsuite,avr]: Support dg-require-effective-target avr_3byte_pc (and use it with PR71151 tests).

2016-06-22 Thread Mike Stump
On Jun 22, 2016, at 10:06 AM, Mike Stump  wrote:
> Please see target-utils.exp and ensure that the tools generate a stylized 
> message and then add support for that to target-utils.exp.

Also, see return "::unsupported::memory full" in gcc-dg.exp, there is a copy 
there as well.

Re: [patch,testsuite,avr]: Support dg-require-effective-target avr_3byte_pc (and use it with PR71151 tests).

2016-06-22 Thread Mike Stump
On Jun 22, 2016, at 7:21 AM, Georg-Johann Lay  wrote:
> 
> Some tests for PR71151 assume that the target MCU has a 3-byte PC.  The tests 
> are failing because the simulator (avrtest) refuses to load the respective 
> executables if .text exceeds 128KiB, e.g. for -mmcu=atmega128 which has only 
> flash of 128KiB and only a 2-byte PC.
> 
> Hence the tests have to be skipped if the target MCU has no 3-byte PC, hence 
> a new dg-require-effective-target proc supporting "avr_3byte_pc".
> 
> I added the new proc right after the last check_effective_target_arm_*** so 
> that the test is in ASCII collating order.
> 
> Ok for trunk and v6?

No.  Please see target-utils.exp and ensure that the tools generate a stylized 
message and then add support for that to target-utils.exp.  If you are using 
binutils, the text should go into a memory segment that will fill when it is 
too large.  When it does, then binutils will generate one of the messages 
already handled, then you're done.

Re: [patch,testsuite,avr]: Support dg-require-effective-target avr_3byte_pc (and use it with PR71151 tests).

2016-06-22 Thread Jeff Law

On 06/22/2016 08:21 AM, Georg-Johann Lay wrote:

Some tests for PR71151 assume that the target MCU has a 3-byte PC.  The
tests are failing because the simulator (avrtest) refuses to load the
respective executables if .text exceeds 128KiB, e.g. for -mmcu=atmega128
which has only 128KiB of flash and only a 2-byte PC.

Hence the tests have to be skipped if the target MCU has no 3-byte PC,
hence a new dg-require-effective-target proc supporting "avr_3byte_pc".

I added the new proc right after the last check_effective_target_arm_***
so that the test is in ASCII collating order.

Ok for trunk and v6?
Would it make more sense to have a generic test around the size of the 
PC and/or the size of pointers rather than something AVR specific?


Jeff




Re: [PATCH 1/7] SMS remove dependence on doloop: Use loop induction variable analysis in SMS pass

2016-06-22 Thread Jeff Law

On 05/05/2016 12:01 AM, Shiva Chen wrote:

Hi,

SMS transformation changes the kernel loop iteration count.
To do this, the SMS pass finds the register containing the loop count
and generates the instructions to adjust it.

Currently, SMS tries to find count_reg by recognizing the doloop_end
pattern, which means that if the target does not define a doloop_end pattern
or the loop is not suitable for doloop optimization, the SMS pass is not activated.

The patch uses induction variable analysis to find count_reg instead of
looking for the doloop_end pattern.
So these patches need to be bootstrapped and regression tested.  Since 
SMS is not the default on any major platforms, I would recommend first 
hacking SMS to be enabled by default.  That isn't a patch you're going 
to submit, but instead it allows you to do bootstrap and regression 
testing on x86_64, ppc64 or whatever desktop/server machines you have 
access to.


I did that with patch #1 just to see what would happen and I get an 
assert triggered in generate_prolog_epilog:


  gcc_assert (REG_P (sub_reg));

Where sub_reg and count_reg are:

(subreg:SI (reg:DI 146 [ ivtmp.11 ]) 0)


AFAICT (reg:DI 146) is the actual IV, but the test actually occurs in 
SImode:



(insn 80 79 82 7 (parallel [
(set (reg:DI 146 [ ivtmp.11 ])
(plus:DI (reg:DI 146 [ ivtmp.11 ])
(const_int -1 [0x])))
(clobber (reg:CC 17 flags))
]) 212 {*adddi_1}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))

(insn 82 80 83 7 (set (reg:CCGOC 17 flags)
(compare:CCGOC (subreg:SI (reg:DI 146 [ ivtmp.11 ]) 0)
(const_int 0 [0]))) 
../../../../gcc/libstdc++-v3/libsupc++/hash_bytes.cc:54 3 {*cmpsi_ccno_1}

 (nil))

(jump_insn 83 82 84 7 (set (pc)
(if_then_else (ge (reg:CCGOC 17 flags)
(const_int 0 [0]))
(label_ref:DI 81)
(pc))) 
../../../../gcc/libstdc++-v3/libsupc++/hash_bytes.cc:54 635 {*jcc_1}

 (expr_list:REG_DEAD (reg:CCGOC 17 flags)
(int_list:REG_BR_PROB 8500 (nil)))
 -> 81)


So you either need to be filtering out cases where the IV is in a 
different mode than its test or handling SUBREGs better throughout the code.


I haven't looked deeply at any of the patches yet.  I won't until 
they're in better shape from a correctness standpoint.


Jeff
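
A small C sketch of why the mode mismatch matters, assuming a little-endian target where the SImode subreg at offset 0 is the low word.  This is plain C rather than RTL, and the function names are hypothetical; it only models the two possible loop-continue tests.

```c
#include <stdint.h>

/* Loop-continue test on the full 64-bit IV: "iv >= 0" in DImode.  */
int
continues_di (int64_t iv)
{
  return iv >= 0;
}

/* Loop-continue test on the low 32 bits only, which is what a compare
   of (subreg:SI (reg:DI ...) 0) against zero looks at on a
   little-endian target: just the sign bit of the low word.  */
int
continues_si (int64_t iv)
{
  uint32_t low = (uint32_t) (uint64_t) iv;
  return (low & UINT32_C (0x80000000)) == 0;
}
```

The two tests agree for small counts but disagree once the low word has its top bit set, e.g. for an IV value of 0xFFFFFFFF, so code that adjusts the count in one mode while the branch tests another has to account for that.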



Re: [PATCH] Change minimum of -predicted-iterations to 1 (PR71619)

2016-06-22 Thread Martin Liška
On 06/22/2016 04:59 PM, Martin Liška wrote:
> Hello.
> 
> The following patch just changes the minimum value to 1. I don't think a 
> minimum of zero makes sense.
> 
> Running regression tests on x86_64-linux-gnu, ready after they finish?
> 
> Thanks,
> Martin
> 

As mentioned in PR48189, a value of zero should be handled properly.
I'll rework the patch.

Martin


Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-06-22 Thread Ilya Enkovich
2016-06-16 10:08 GMT+03:00 Jeff Law :
> On 05/19/2016 01:42 PM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This patch introduces analysis to determine if loop can be masked
>> (compute LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_REQUIRED_MASKS)
>> and compute how much masking costs.
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2016-05-19  Ilya Enkovich  
>>
>> * tree-vect-loop.c: Include insn-config.h and recog.h.
>> (vect_check_required_masks_widening): New.
>> (vect_check_required_masks_narrowing): New.
>> (vect_get_masking_iv_elems): New.
>> (vect_get_masking_iv_type): New.
>> (vect_get_extreme_masks): New.
>> (vect_check_required_masks): New.
>> (vect_analyze_loop_operations): Add vect_check_required_masks
>> call to compute LOOP_VINFO_CAN_BE_MASKED.
>> (vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and
>> LOOP_VINFO_NEED_MASKING before starting over.
>> (vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and
>> masking cost.
>> * tree-vect-stmts.c (can_mask_load_store): New.
>> (vect_model_load_masking_cost): New.
>> (vect_model_store_masking_cost): New.
>> (vect_model_simple_masking_cost): New.
>> (vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED
>> and masking cost.
>> (vectorizable_simd_clone_call): Likewise.
>> (vectorizable_store): Likewise.
>> (vectorizable_load): Likewise.
>> (vect_stmt_should_be_masked_for_epilogue): New.
>> (vect_add_required_mask_for_stmt): New.
>> (vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED.
>> * tree-vectorizer.h (vect_model_load_masking_cost): New.
>> (vect_model_store_masking_cost): New.
>> (vect_model_simple_masking_cost): New.
>>
>>
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index e25a0ce..31360d3 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-pass.h"
>>  #include "ssa.h"
>>  #include "optabs-tree.h"
>> +#include "insn-config.h"
>> +#include "recog.h" /* FIXME: for insn_data */
>
> Ick :(
>
>
>> +
>> +/* Function vect_check_required_masks_narowing.
>
> narrowing
>
>
>> +
>> +   Return 1 if vector mask of type MASK_TYPE can be narrowed
>> +   to a type having REQ_ELEMS elements in a single vector.  */
>> +
>> +static bool
>> +vect_check_required_masks_narrowing (loop_vec_info loop_vinfo,
>> +tree mask_type, unsigned req_elems)
>
> Given the common structure & duplication I can't help but wonder if a single
> function should be used for widening/narrowing.  Ultimately can't you swap
> mask_elems/req_elems and always go narrower to wider (using a different
> optab for the two different cases)?

I think we can't always go in the narrower-to-wider direction because widening
uses two optabs and also because of the way insn_data is checked.

>
>
>
>
>> +
>> +/* Function vect_get_masking_iv_elems.
>> +
>> +   Return a number of elements in IV used for loop masking.  */
>> +static int
>> +vect_get_masking_iv_elems (loop_vec_info loop_vinfo)
>> +{
>> +  tree iv_type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo));
>
> I'm guessing Richi's comment about what tree type you're looking at refers
> to this and similar instances.  Doesn't this give you the type of the number
> of iterations rather than the type of the iteration variable itself?
>
>

Since I build the vector IV myself and use it to compare with NITERS, I
feel it's safe to use the type of NITERS.  Do you expect the NITERS and IV
types to differ?

>
>
>  +
>>
>> +  if (!expand_vec_cmp_expr_p (iv_vectype, mask_type))
>> +{
>> +  if (dump_enabled_p ())
>> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +"cannot be masked: required vector comparison "
>> +"is not supported.\n");
>> +  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
>> +  return;
>> +}
>
> On a totally unrelated topic, I was speaking with David Malcolm earlier this
> week about how to turn this kind of missed optimization information we
> currently emit into dumps into something more easily consumed by users.
>
> The general issue is that we've got customers that want to understand why
> certain optimizations fire or do not fire.  They're by far more interested
> in the vectorizer than anything else.
>
> We have a sense that much of the information those customers desire is
> sitting in the dump files, but it's buried in there with other stuff that
> isn't generally useful to users.
>
> So we're pondering what it might take to take these glorified fprintf calls
> and turn them into a first class diagnostic that could be emitted to stderr
> or into the dump file depending (of course) on the options passed to GCC.
>
> The reason I 

[ARM][testsuite] neon-testgen.ml removal

2016-06-22 Thread Christophe Lyon
Hi,

This is a new attempt at removing neon-testgen.ml and generated files.

Compared to my previous version several months ago:
- I have recently added testcases to make sure we do not lose coverage
as described in
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02922.html
- I now also remove neon.ml as requested by Kyrylo in
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01664.html, and moved
the remaining hand-written tests up to gcc.target/arm.

Doing this, I had to slightly update vst1Q_laneu64-1.c because it's
now compiled with more pedantic flags and there was a signed/unsigned
char buffer pointer mismatch.

Sorry, I had to compress the patch, otherwise it's too large and rejected
by the list server.

OK?

Christophe
[ARM] neon-testgen.ml, neon.ml and generated files removal.

gcc/
2016-06-17  Christophe Lyon  

* config/arm/neon-testgen.ml: Delete.
* config/arm/neon.ml: Delete.

gcc/testsuite/
2016-06-17  Christophe Lyon  

* gcc.target/arm/neon/polytypes.c: Move to ...
* gcc.target/arm/polytypes.c: ... here.
* gcc.target/arm/neon/pr51534.c: Move to ...
* gcc.target/arm/pr51534.c: ... here.
* gcc.target/arm/neon/vect-vcvt.c: Move to ...
* gcc.target/arm/vect-vcvt.c: ... here.
* gcc.target/arm/neon/vect-vcvtq.c: Move to ...
* gcc.target/arm/vect-vcvtq.c: ... here.
* gcc.target/arm/neon/vfp-shift-a2t2.c: Move to ...
* gcc.target/arm/vfp-shift-a2t2.c: ... here.
* gcc.target/arm/neon/vst1Q_laneu64-1.c: Move to ...
* gcc.target/arm/vst1Q_laneu64-1.c: ... here. Fix foo() prototype.
* gcc.target/arm/neon/neon.exp: Delete.
* gcc.target/arm/neon/*.c: Delete.



neon-testgen.patch.txt.xz
Description: application/force-download


[committed] libcpp: Tweak to missing #include source location

2016-06-22 Thread David Malcolm
This patch tweaks the error message location for missing header files.

Previously these read:

test.c:1:17: fatal error: 404.h: No such file or directory
 #include "404.h"
                 ^
compilation terminated.

With this patch, the pertinent string is underlined:

test.c:1:10: fatal error: 404.h: No such file or directory
 #include "404.h"
          ^~~~~~~
compilation terminated.

Successfully bootstrapped on x86_64-pc-linux-gnu;
adds 12 new PASS results to gcc.sum and 36 new PASS results to
g++.sum.

Committed to trunk as r237715.
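
For readers who want to see where column 10 and the underline width come from, here is a standalone C sketch.  It is not libcpp code; include_name_span and struct span are hypothetical names, and the scan is deliberately naive (first quote or angle bracket on the line).

```c
#include <string.h>

/* 1-based start column and length of a header-name token.  */
struct span { int col; int len; };

/* Locate the "..." or <...> token on an #include line.
   Returns {0, 0} if no complete token is found.  */
struct span
include_name_span (const char *line)
{
  struct span s = { 0, 0 };
  const char *open = strpbrk (line, "\"<");
  if (!open)
    return s;
  char close_ch = (*open == '<') ? '>' : '"';
  const char *close = strchr (open + 1, close_ch);
  if (!close)
    return s;
  s.col = (int) (open - line) + 1;   /* columns are 1-based */
  s.len = (int) (close - open) + 1;  /* include both delimiters */
  return s;
}
```

For the line `#include "404.h"` this yields column 10 and a length of 7, matching the caret plus six tildes in the new diagnostic.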

gcc/testsuite/ChangeLog:
* c-c++-common/missing-header-1.c: New test case.
* c-c++-common/missing-header-2.c: New test case.
* c-c++-common/missing-header-3.c: New test case.
* c-c++-common/missing-header-4.c: New test case.

libcpp/ChangeLog:
* directives.c (do_include_common): Pass on "location" to
_cpp_stack_include.
* errors.c (cpp_diagnostic): Reimplement in terms of...
(cpp_diagnostic_at): New function.
(cpp_error_at): New function.
(cpp_errno_filename): Add "loc" param and use it by using
cpp_error_at rather than cpp_error.
* files.c (find_file_in_dir): Add "loc" param and pass it to
open_file_failed.
(_cpp_find_file): Add "loc" param.  Use it to convert calls to
cpp_error to cpp_error_at, and pass it to find_file_in_dir and
open_file_failed.
(read_file_guts): Add "loc" param.  Use it to convert calls to
cpp_error to cpp_error_at.  Pass it to cpp_errno_filename.
(read_file): Add "loc" param.  Pass it to open_file_failed and
read_file_guts.
(should_stack_file): Add "loc" param.  Pass it to read_file.
(_cpp_stack_file): Add "loc" param.  Pass it to should_stack_file.
(_cpp_stack_include): Add "loc" param.  Pass it to
_cpp_find_file and _cpp_stack_file.
(open_file_failed): Add "loc" param.  Pass it to
cpp_errno_filename.
(_cpp_fake_include): Add 0 as a source_location in call to
_cpp_find_file.
(_cpp_compare_file_date): Likewise.
(cpp_push_include): Likewise for call to _cpp_stack_include.
(cpp_push_default_include): Likewise.
(_cpp_save_file_entries): Likewise for call to open_file_failed.
(_cpp_has_header): Likewise for call to _cpp_find_file.
* include/cpplib.h (cpp_errno_filename): Add source_location
param.
(cpp_error_at): New declaration.
* init.c (cpp_read_main_file): Add 0 as a source_location in calls
to _cpp_find_file and _cpp_stack_file.
* internal.h (_cpp_find_file): Add source_location param.
(_cpp_stack_file): Likewise.
(_cpp_stack_include): Likewise.
---
 gcc/testsuite/c-c++-common/missing-header-1.c |   8 ++
 gcc/testsuite/c-c++-common/missing-header-2.c |   8 ++
 gcc/testsuite/c-c++-common/missing-header-3.c |   8 ++
 gcc/testsuite/c-c++-common/missing-header-4.c |   8 ++
 libcpp/directives.c   |   2 +-
 libcpp/errors.c   |  52 ++---
 libcpp/files.c| 105 +++---
 libcpp/include/cpplib.h   |   7 +-
 libcpp/init.c |   7 +-
 libcpp/internal.h |   7 +-
 10 files changed, 153 insertions(+), 59 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/missing-header-1.c
 create mode 100644 gcc/testsuite/c-c++-common/missing-header-2.c
 create mode 100644 gcc/testsuite/c-c++-common/missing-header-3.c
 create mode 100644 gcc/testsuite/c-c++-common/missing-header-4.c

diff --git a/gcc/testsuite/c-c++-common/missing-header-1.c 
b/gcc/testsuite/c-c++-common/missing-header-1.c
new file mode 100644
index 000..30e92ad
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/missing-header-1.c
@@ -0,0 +1,8 @@
+/* { dg-options "-fdiagnostics-show-caret" } */
+#include "this-file-does-not-exist.h" /* { dg-error "10: this-file-does-not-exist.h: No such file or directory" } */
+
+/* { dg-begin-multiline-output "" }
+ #include "this-file-does-not-exist.h"
+          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
+compilation terminated.
+   { dg-end-multiline-output "" } */
diff --git a/gcc/testsuite/c-c++-common/missing-header-2.c 
b/gcc/testsuite/c-c++-common/missing-header-2.c
new file mode 100644
index 000..a634703
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/missing-header-2.c
@@ -0,0 +1,8 @@
+/* { dg-options "-fdiagnostics-show-caret" } */
+#include <this-file-does-not-exist.h> /* { dg-error "10: this-file-does-not-exist.h: No such file or directory" } */
+
+/* { dg-begin-multiline-output "" }
+ #include <this-file-does-not-exist.h>
+          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
+compilation terminated.
+   { dg-end-multiline-output "" } */
diff --git a/gcc/testsuite/c-c++-common/missing-header-3.c 
b/gcc/testsuite/c-c++-common/missing-header-3.c
new file mode 100644
index 000..4147367
--- /dev/null
+++ 

Re: RFA: Generate normal DWARF DW_LOC descriptors for non integer mode pointers

2016-06-22 Thread Nick Clifton
Hi Jeff,

> I can buy that ;-)   OK with a suitable ChangeLog entry.

Thanks!  Checked in with this changelog entry.

Cheers
  Nick

gcc/ChangeLog
2016-06-22  Nick Clifton  

* dwarf2out.c (scompare_loc_descriptor): Use SCALAR_INT_MODE_P() in
place of GET_MODE_CLASS() == MODE_INT, so that partial integer
modes are accepted as well.
(ucompare_loc_descriptor): Likewise.
(minmax_loc_descriptor): Likewise.
(clz_loc_descriptor): Likewise.
(popcount_loc_descriptor): Likewise.
(bswap_loc_descriptor): Likewise.
(rotate_loc_descriptor): Likewise.
(mem_loc_descriptor): Likewise.
(loc_descriptor): Likewise.


Re: [PATCH, vec-tails 05/10] Check if loop can be masked

2016-06-22 Thread Ilya Enkovich
2016-06-16 9:26 GMT+03:00 Jeff Law :
> On 06/15/2016 05:22 AM, Richard Biener wrote:
>>
>>
>> You look at TREE_TYPE of LOOP_VINFO_NITERS (loop_vinfo) - I don't think
>> this is meaningful (if then only by accident).  I think you should look at
>> the
>> control IV itself, possibly it's value-range, to determine the smallest
>> possible
>> type to use.
>
> Can we get an IV that's created after VRP?  If so, then we have to be
> prepared for the case where there's no range information on the IV.  At
> which point I think using type min/max of the IV is probably the right
> fallback.  But I do think we should be looking at range info much more
> systematically.
>
> I can't see how TREE_TYPE of the NITERS makes sense either.

I need to build a vector {niters, ..., niters} and compare to it.  Why doesn't
it make sense to choose the same type for IV?  I agree that choosing a smaller
type may be beneficial.   Shouldn't I look at nb_iterations_upper_bound then
to check if NITERS can be cast to a smaller type?

Thanks,
Ilya

>
>> Finally we have a related missed optimization opportunity, namely avoiding
>> peeling for gaps if we mask the last load of the group (profitability
>> depends
>> on the overhead of such masking of course as it would be done in the main
>> vectorized loop).
>
> I think that's a specific instance of a more general question -- what
> transformations can be avoided by masking and can we generate costs to
> select between those transformations and masking.  Seems like a follow-up
> item rather than a requirement for this work to go forward to me.
>
> Jeff


[PATCH] Change minimum of -predicted-iterations to 1 (PR71619)

2016-06-22 Thread Martin Liška
Hello.

The following patch just changes the minimum value to 1. I don't think a
minimum of zero makes sense.

Running regression tests on x86_64-linux-gnu, ready after they finish?

Thanks,
Martin
From c677c2bdc0f011dbda013dd396431b3afcbc2ae9 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 22 Jun 2016 14:03:25 +0200
Subject: [PATCH] Change minimum of -predicted-iterations to 1 (PR71619)

gcc/testsuite/ChangeLog:

2016-06-22  Martin Liska  

	* gcc.dg/tree-ssa/pr71619.c: New test.

gcc/ChangeLog:

2016-06-22  Martin Liska  

	PR middle-end/71619
	* params.def: Set minimum value of max-predicted-iterations
	to 1.
---
 gcc/params.def  |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr71619.c | 12 
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr71619.c

diff --git a/gcc/params.def b/gcc/params.def
index 894b7f3..d85c5db 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -422,7 +422,7 @@ DEFPARAM (PARAM_ALIGN_LOOP_ITERATIONS,
 DEFPARAM(PARAM_MAX_PREDICTED_ITERATIONS,
 	 "max-predicted-iterations",
 	 "The maximum number of loop iterations we predict statically.",
-	 100, 0, 0)
+	 100, 1, 0)
 
 /* This parameter controls the probability of builtin_expect. The default
value is 90%. This empirical value is obtained through the weighted
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71619.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71619.c
new file mode 100644
index 000..1e9b3a7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71619.c
@@ -0,0 +1,12 @@
+/* PR 71619 */
+
+/* { dg-do compile } */
+/* { dg-options "-O --param=max-predicted-iterations=0" } */
+/* { dg-error "minimum value of parameter 'max-predicted-iterations' is 1" "" { target *-*-* } 0 } */
+
+void
+foo ()
+{
+  int count = -10;
+  while (count++);
+}
-- 
2.8.4
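
A rough sketch of the range check that the "default, min, max" triple implies.  This is not the params.c implementation; check_param is a hypothetical name, and reading a max that is not above min as "no upper bound" is an assumption made here so that "100, 1, 0" stays meaningful.

```c
/* Hypothetical range check for a numeric --param value.
   Returns 0 if accepted, -1 if below MIN, 1 if above MAX.
   Assumption: MAX <= MIN means "no upper bound", which is how
   the "100, 1, 0" triple (default, min, max) is read here.  */
int
check_param (int value, int min, int max)
{
  if (value < min)
    return -1;
  if (max > min && value > max)
    return 1;
  return 0;
}
```

Under that reading, max-predicted-iterations=0 is rejected once the minimum is 1, while it was accepted with the old "100, 0, 0" triple.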



Re: [patch] Fix problematic debug info for parameters at -O0

2016-06-22 Thread Eric Botcazou
> Do you really need the memset in there to reproduce it?
> Wouldn't asm volatile ("" : : "r" ([0]) : "memory");
> or something similar be enough?  Or if you need to clear something,
> clear much smaller part of the array?

Probably only the first 128 bytes; I will change that.

-- 
Eric Botcazou


Re: [PATCH, rs6000] Prefer vspltisw/h over xxspltib+instruction when available

2016-06-22 Thread Segher Boessenkool
On Tue, Jun 21, 2016 at 06:46:57PM -0500, Bill Schmidt wrote:
> >> When I did this, I ran into a problem with an existing test case.  We end 
> >> up matching the *vsx_splat_v4si_internal pattern instead of falling back 
> >> to the altivec_vspltisw pattern.  The constraints don't match for constant 
> >> input.  To avoid this, I added a pattern ahead of this one that will match 
> >> for VMX output registers and produce the vspltisw as desired.  This 
> >> corrected the failing test and produces the expected code.
> > 
> > Why does the predicate allow constant input, while the constraints do not?
> 
> I have no idea why it was built that way.  The predicate seems to provide for 
> all sorts of things, but this and the subsequent pattern both handle only a 
> subset of the constraints implied by it.  To be honest, I didn't feel 
> competent to try to fix the existing patterns.  Do you have any suggestions 
> for what to do instead?

Don't give up so easily?  ;-)

The predicate should be tightened, the expander should use a new predicate
that allows all those other things.  The hardest part is figuring out a good
name for it ;-)


Segher


[patch,testsuite,avr]: Support dg-require-effective-target avr_3byte_pc (and use it with PR71151 tests).

2016-06-22 Thread Georg-Johann Lay
Some tests for PR71151 assume that the target MCU has a 3-byte PC.  The tests 
are failing because the simulator (avrtest) refuses to load the respective 
executables if .text exceeds 128KiB, e.g. for -mmcu=atmega128 which has only 
128KiB of flash and only a 2-byte PC.


Hence the tests have to be skipped if the target MCU has no 3-byte PC, hence a 
new dg-require-effective-target proc supporting "avr_3byte_pc".


I added the new proc right after the last check_effective_target_arm_*** so 
that the test is in ASCII collating order.


Ok for trunk and v6?

Johann
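
As a back-of-the-envelope illustration of the 128KiB boundary, here is a tiny C sketch.  It assumes the AVR program counter addresses 16-bit words, so a 2-byte PC covers 2^16 words; avr_pc_bytes is a hypothetical helper, not something defined by the compiler or the testsuite.

```c
/* Assumption: the AVR program counter addresses 16-bit words, so a
   2-byte PC reaches 2^16 words = 128 KiB of flash.  */
int
avr_pc_bytes (unsigned long flash_bytes)
{
  const unsigned long two_byte_pc_limit = 65536UL * 2;  /* 128 KiB */
  return flash_bytes <= two_byte_pc_limit ? 2 : 3;
}
```

A device with 128KiB of flash such as the atmega128 mentioned above fits a 2-byte PC; anything larger needs the 3-byte PC that these tests rely on.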


gcc/testsuite/
PR target/71151
* lib/target-supports.exp (check_effective_target_avr_3byte_pc):
New proc.
* gcc.target/avr/pr71151-5.c: Remove code for __AVR_2_BYTE_PC__.
Use dg-require-effective-target avr_3byte_pc.
* gcc.target/avr/pr71151-6.c: Same.
* gcc.target/avr/pr71151-7.c: Same.
* gcc.target/avr/pr71151-8.c: Same.
Index: testsuite/lib/target-supports.exp
===
--- testsuite/lib/target-supports.exp	(revision 237587)
+++ testsuite/lib/target-supports.exp	(working copy)
@@ -3588,6 +3588,16 @@ proc check_effective_target_arm_prefer_l
 }  "-O2 -mthumb" ]
 }
 
+# Return 1 if this is an AVR target with a 3-byte PC.
+
+proc check_effective_target_avr_3byte_pc { } {
+return [ check_no_compiler_messages avr_3byte_pc assembly {
+#if !defined(__AVR_3_BYTE_PC__)
+#error !__AVR_3_BYTE_PC__
+#endif
+}]
+}
+
 # Return 1 if this is a PowerPC target supporting -meabi.
 
 proc check_effective_target_powerpc_eabi_ok { } {
Index: testsuite/gcc.target/avr/pr71151-5.c
===
--- testsuite/gcc.target/avr/pr71151-5.c	(revision 237587)
+++ testsuite/gcc.target/avr/pr71151-5.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-Os -fno-tree-switch-conversion -ffunction-sections -fdata-sections -mno-relax -Wl,--section-start=.foo=0x2" } */
+/* { dg-require-effective-target avr_3byte_pc } */
 
 /* Make sure jumptables work properly if placed above 128 KB, i.e. 3 byte
flash address for loading jump table entry and a jump table entry
@@ -11,10 +12,6 @@
 
 int main()
 {
-	/* Not meant for devices with flash <= 128K */
-#if defined (__AVR_2_BYTE_PC__)
-	exit(0);
-#else
 	foo(5);
 	if (y != 37)
 		abort();
@@ -26,5 +23,4 @@ int main()
 	foo(7);
 	if (y != 98)
 		abort();
-#endif
 }
Index: testsuite/gcc.target/avr/pr71151-6.c
===
--- testsuite/gcc.target/avr/pr71151-6.c	(revision 237587)
+++ testsuite/gcc.target/avr/pr71151-6.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-Os -fno-tree-switch-conversion -ffunction-sections -fdata-sections -mrelax -Wl,--section-start=.foo=0x2" } */
+/* { dg-require-effective-target avr_3byte_pc } */
 
 /* Make sure jumptables work properly if placed above 128 KB, i.e. 3 byte
flash address for loading jump table entry and a jump table entry
@@ -11,10 +12,6 @@
 
 int main()
 {
-	/* Not meant for devices with flash <= 128K */
-#if defined (__AVR_2_BYTE_PC__)
-	exit(0);
-#else
 	foo(5);
 	if (y != 37)
 		abort();
@@ -26,5 +23,4 @@ int main()
 	foo(7);
 	if (y != 98)
 		abort();
-#endif
 }
Index: testsuite/gcc.target/avr/pr71151-7.c
===
--- testsuite/gcc.target/avr/pr71151-7.c	(revision 237587)
+++ testsuite/gcc.target/avr/pr71151-7.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-Os -fno-tree-switch-conversion -ffunction-sections -fdata-sections -mno-relax -Wl,--section-start=.foo=0x1fffa" } */
+/* { dg-require-effective-target avr_3byte_pc } */
 
 /* Make sure jumptables work properly if placed straddling 128 KB i.e
some entries below 128 KB and some above it, with relaxation disabled. */
@@ -9,10 +10,6 @@
 
 int main()
 {
-	/* Not meant for devices with flash <= 128K */
-#if defined (__AVR_2_BYTE_PC__)
-	exit(0);
-#else
 	foo(5);
 	if (y != 37)
 		abort();
@@ -24,5 +21,4 @@ int main()
 	foo(7);
 	if (y != 98)
 		abort();
-#endif
 }
Index: testsuite/gcc.target/avr/pr71151-8.c
===
--- testsuite/gcc.target/avr/pr71151-8.c	(revision 237587)
+++ testsuite/gcc.target/avr/pr71151-8.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-Os -fno-tree-switch-conversion -ffunction-sections -fdata-sections -mrelax -Wl,--section-start=.foo=0x1fffa" } */
+/* { dg-require-effective-target avr_3byte_pc } */
 
 /* Make sure jumptables work properly if placed straddling 128 KB i.e
some entries below 128 KB and some above it, with relaxation disabled. */
@@ -9,10 +10,6 @@
 
 int main()
 {
-	/* Not meant for devices with flash <= 128K */
-#if defined (__AVR_2_BYTE_PC__)
-	exit(0);
-#else
 	foo(5);
 	if (y != 37)
 		abort();
@@ -24,5 +21,4 @@ int main()
 	

Re: [PATCH, vec-tails 04/10] Add masking cost

2016-06-22 Thread Ilya Enkovich
On 16 Jun 00:16, Jeff Law wrote:
> On 05/19/2016 01:40 PM, Ilya Enkovich wrote:
> >Hi,
> >
> >This patch extends vectorizer cost model to include masking cost by
> >adding new cost model locations and new target hook to compute
> >masking cost.
> >
> >Thanks,
> >Ilya
> >--
> >gcc/
> >
> >2016-05-19  Ilya Enkovich  
> >
> > * config/i386/i386.c (ix86_init_cost): Extend costs array.
> > (ix86_add_stmt_masking_cost): New.
> > (ix86_finish_cost): Add masking_prologue_cost and masking_body_cost
> > args.
> > (TARGET_VECTORIZE_ADD_STMT_MASKING_COST): New.
> > * config/i386/i386.h (TARGET_INCREASE_MASK_STORE_COST): New.
> > * config/i386/x86-tune.def (X86_TUNE_INCREASE_MASK_STORE_COST): New.
> > * config/rs6000/rs6000.c (_rs6000_cost_data): Extend cost array.
> > (rs6000_init_cost): Initialize new cost elements.
> > (rs6000_finish_cost): Add masking_prologue_cost and masking_body_cost.
> > * config/spu/spu.c (spu_init_cost): Extend costs array.
> > (spu_finish_cost): Add masking_prologue_cost and masking_body_cost args.
> > * doc/tm.texi.in (TARGET_VECTORIZE_ADD_STMT_MASKING_COST): New.
> > * doc/tm.texi: Regenerated.
> > * target.def (add_stmt_masking_cost): New.
> > (finish_cost): Add masking_prologue_cost and masking_body_cost args.
> > * target.h (enum vect_cost_for_stmt): Add vector_mask_load and
> > vector_mask_store.
> > (enum vect_cost_model_location): Add vect_masking_prologue
> > and vect_masking_body.
> > * targhooks.c (default_builtin_vectorization_cost): Support
> > vector_mask_load and vector_mask_store.
> > (default_init_cost): Extend costs array.
> > (default_add_stmt_masking_cost): New.
> > (default_finish_cost): Add masking_prologue_cost and masking_body_cost
> > args.
> > * targhooks.h (default_add_stmt_masking_cost): New.
> > * tree-vect-loop.c (vect_estimate_min_profitable_iters): Adjust
> > finish_cost call.
> > * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Likewise.
> > * tree-vectorizer.h (add_stmt_masking_cost): New.
> > (finish_cost): Add masking_prologue_cost and masking_body_cost args.
> >
> >
> >diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> >index 9f62089..6c2c364 100644
> >--- a/gcc/config/i386/i386.c
> >+++ b/gcc/config/i386/i386.c
> >@@ -53932,8 +53932,12 @@ ix86_spill_class (reg_class_t rclass, machine_mode 
> >mode)
> > static void *
> > ix86_init_cost (struct loop *)
> > {
> >-  unsigned *cost = XNEWVEC (unsigned, 3);
> >-  cost[vect_prologue] = cost[vect_body] = cost[vect_epilogue] = 0;
> >+  unsigned *cost = XNEWVEC (unsigned, 5);
> >+  cost[vect_prologue] = 0;
> >+  cost[vect_body] = 0;
> >+  cost[vect_epilogue] = 0;
> >+  cost[vect_masking_prologue] = 0;
> >+  cost[vect_masking_body] = 0;
> >   return cost;
> Trivial nit -- no need or desire to use whitespace to line up the
> initializers.   It looks like others may have done this in the duplicated
> instances of finish_cost. But we shouldn't propagate that mistake into the
> init_cost hooks ;-)
> 
> 
> @@ -1115,8 +1117,12 @@ default_get_mask_mode (unsigned nunits, unsigned
> vector_size)
> > void *
> > default_init_cost (struct loop *loop_info ATTRIBUTE_UNUSED)
> > {
> >-  unsigned *cost = XNEWVEC (unsigned, 3);
> >-  cost[vect_prologue] = cost[vect_body] = cost[vect_epilogue] = 0;
> >+  unsigned *cost = XNEWVEC (unsigned, 5);
> >+  cost[vect_prologue] = 0;
> >+  cost[vect_body] = 0;
> >+  cost[vect_epilogue] = 0;
> >+  cost[vect_masking_prologue] = 0;
> >+  cost[vect_masking_body] = 0;
> >   return cost;
> Here too.  There's others.  I won't point them all out.  Please double check
> for this nit in any added code.  You don't have to go back and fix existing
> problems of this nature.
> 
> I don't see anything here I really object to -- Richi and I may disagree on
> the compute-costs once in a single scan vs restarting the scan.  If Richi
> feels strongly about restarting for some reason, I'll defer to him -- he's
> done more work in the vectorizer than myself.
> 
> I'd suggest taking another stab at the docs for the hooks based on Richi's
> question about whether or not the hook returns the cost of the masked
> statement or the cost of masking the statement.
> 
> jeff

Thanks for the review.  Here is an updated version with initializers and
documentation fixed.

Thanks,
Ilya
--
gcc/

2016-05-22  Ilya Enkovich  

* config/i386/i386.c (ix86_init_cost): Extend costs array.
(ix86_add_stmt_masking_cost): New.
(ix86_finish_cost): Add masking_prologue_cost and masking_body_cost
args.
(TARGET_VECTORIZE_ADD_STMT_MASKING_COST): New.
* config/i386/i386.h (TARGET_INCREASE_MASK_STORE_COST): New.
* config/i386/x86-tune.def (X86_TUNE_INCREASE_MASK_STORE_COST): New.
* config/rs6000/rs6000.c (_rs6000_cost_data): Extend cost array.
(rs6000_init_cost): Initialize 

Re: [PATCH PING] boehm-gc: check for execinfo.h directly

2016-06-22 Thread Mike Frysinger
On 21 Jun 2016 21:10, Jeff Law wrote:
> On 06/21/2016 06:59 PM, Mike Frysinger wrote:
> > On 21 Jun 2016 15:46, Jeff Law wrote:
> >
> >> If accepted into upstream Boehm-GC, then this is obviously acceptable in
> >> GCC's copy.
> >
> > so changes can be pushed directly if it's already in upstream ?
> > my original goal is already fixed in upstream, but it's not in
> > gcc's copy ...
> 
> Yes.  Ideally we'd just resync at some point in the near future, but if 
> you want to cherry-pick a fix from upstream so it gets fixed sooner, 
> that's fine.

great, then i'll cherry pick the fix as it was merged upstream,
and pursue the updated version via upstream's github.
-mike
From cc481b795b819595d2d1a5d671448c9894d1f05c Mon Sep 17 00:00:00 2001
From: Mike Frysinger 
Date: Wed, 22 Jun 2016 10:04:18 -0400
Subject: [PATCH] boehm-gc: pull in upstream fix for uClibc

2007-12-18  Hans Boehm  (really Radek Polak)

	* include/gc.h: Don't define GC_HAVE_BUILTIN_BACKTRACE for uclibc.
---
 boehm-gc/ChangeLog| 4 
 boehm-gc/include/gc.h | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/boehm-gc/ChangeLog b/boehm-gc/ChangeLog
index 6896c67b757a..208b18efd4ea 100644
--- a/boehm-gc/ChangeLog
+++ b/boehm-gc/ChangeLog
@@ -1,3 +1,7 @@
+2016-06-22  Mike Frysinger  
+
+	* include/gc.h: Don't define GC_HAVE_BUILTIN_BACKTRACE for uclibc.
+
 2016-03-29  Samuel Thibault  
 
 	* configure.host: Set gc_use_mmap on *-kfreebsd-gnu* and *-gnu*.
diff --git a/boehm-gc/include/gc.h b/boehm-gc/include/gc.h
index 6b38f2d0e6ca..fca98ffb61d5 100644
--- a/boehm-gc/include/gc.h
+++ b/boehm-gc/include/gc.h
@@ -503,7 +503,7 @@ GC_API GC_PTR GC_malloc_atomic_ignore_off_page GC_PROTO((size_t lb));
 #if defined(__linux__) || defined(__GLIBC__)
 # include <features.h>
 # if (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 1 || __GLIBC__ > 2) \
- && !defined(__ia64__)
+ && !defined(__ia64__) && !defined(__UCLIBC__)
 #   ifndef GC_HAVE_BUILTIN_BACKTRACE
 # define GC_HAVE_BUILTIN_BACKTRACE
 #   endif
-- 
2.8.2





Re: [PATCH, PR middle-end/71488] Fix vectorization of comparison of booleans

2016-06-22 Thread Ilya Enkovich
2016-06-21 23:57 GMT+03:00 Jeff Law :
> On 06/16/2016 05:06 AM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This patch fixes incorrect comparison vectorization for booleans.
>> The problem is that regular comparison which works for scalars
>> doesn't work for vectors due to different binary representation.
>> Also this never works for scalar masks.
>>
>> This patch replaces such comparisons with bitwise operations
>> which work correctly for both vector and scalar masks.
>>
>> Bootstrapped and regtested on x86_64-unknown-linux-gnu.  Is it
>> OK for trunk?  What should be done for gcc-6-branch?  Port this
>> patch or just restrict vectorization for comparison of booleans?
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2016-06-15  Ilya Enkovich  
>>
>> PR middle-end/71488
>> * tree-vect-patterns.c (vect_recog_mask_conversion_pattern):
>> Support
>> comparison of boolean vectors.
>> * tree-vect-stmts.c (vectorizable_comparison): Vectorize
>> comparison
>> of boolean vectors using bitwise operations.
>>
>> gcc/testsuite/
>>
>> 2016-06-15  Ilya Enkovich  
>>
>> PR middle-end/71488
>> * g++.dg/pr71488.C: New test.
>> * gcc.dg/vect/vect-bool-cmp.c: New test.
>
> OK.  Given this is a code generation bug, I'll support porting this patch to
> the gcc-6 branch.  Is there any reason to think that porting would be more
> risky than usual?  It looks pretty simple to me, am I missing some subtle
> dependency?

I don't feel this patch is too risky.  I asked only because a simple
restriction on mask comparison is even safer.

Thanks,
Ilya


>
> jeff
>
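
To illustrate the replacement described in the patch summary, here is a
minimal sketch in Python of comparing boolean masks with bitwise
operations only.  It is not the code from tree-vect-stmts.c; the name
mask_compare and the width parameter are made up for this example, and
each bit of an integer stands for one boolean element.

```python
def mask_compare(op, a, b, width):
    """Element-wise comparison of two boolean masks using only bitwise ops.

    a and b are integers whose low `width` bits each hold one boolean.
    The result is a mask of the same width (hypothetical helper).
    """
    full = (1 << width) - 1      # all-ones mask of the given width
    if op == "==":
        res = ~(a ^ b)           # equal where the bits do not differ
    elif op == "!=":
        res = a ^ b              # differ where XOR is set
    elif op == "<":
        res = ~a & b             # only false < true holds
    elif op == "<=":
        res = ~a | b
    elif op == ">":
        res = a & ~b             # only true > false holds
    elif op == ">=":
        res = a | ~b
    else:
        raise ValueError("unsupported comparison: %s" % op)
    return res & full            # drop bits above the mask width
```

The same expressions apply whether the mask is a scalar bitmask or a
vector of all-ones/all-zeros elements, which is why the bitwise form
avoids the representation problem mentioned above.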


Re: [PATCH] Implement -fdiagnostics-parseable-fixits

2016-06-22 Thread Mike Stump
On Jun 21, 2016, at 8:25 PM, David Malcolm  wrote:
> I implemented tests using both -fself-test and DejaGnu.
> For the DejaGnu test coverage, I attempted to implement detection of the
> output strings via existing directives, but after several hours of
> failing, I instead implemented a new "dg-regexp" directive, which doesn't
> expect anything other than the given regexp.  If anyone can see a way
> to implement the tests using existing directives, I'll port to that.

> I need review of the proposed additions to gcc/testsuite/lib at least
> (the rest I believe I can self-approve, but another pair of eyes would
> be appreciated).

Test suite bits look fine; Ok.


Re: [Patch AArch64] Use default_elf_asm_named_section instead of special cased hook

2016-06-22 Thread Andreas Schwab
* config/aarch64/aarch64-protos.h (aarch64_elf_asm_named_section):
Remove declaration.

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index e8c2ac8..3cdd69b 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -371,7 +371,6 @@ unsigned aarch64_dbx_register_number (unsigned);
 unsigned aarch64_trampoline_size (void);
 void aarch64_asm_output_labelref (FILE *, const char *);
 void aarch64_cpu_cpp_builtins (cpp_reader *);
-void aarch64_elf_asm_named_section (const char *, unsigned, tree);
 const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *);
 const char * aarch64_output_probe_stack_range (rtx, rtx);
 void aarch64_err_no_fpadvsimd (machine_mode, const char *);
-- 
2.9.0


-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [patch] Fix problematic debug info for parameters at -O0

2016-06-22 Thread Jakub Jelinek
On Wed, Jun 22, 2016 at 03:16:20PM +0200, Eric Botcazou wrote:
> /* { dg-do run } */
> /* { dg-options "-g" } */
> /* { dg-skip-if "" { *-*-* }  { "*" } { "-O0" } } */
> 
> typedef __UINTPTR_TYPE__ uintptr_t;
> 
> typedef struct { uintptr_t pa; uintptr_t pb; } fatp_t
>   __attribute__ ((aligned (2 * __alignof__ (uintptr_t))));
> 
> __attribute__((noinline, noclone)) void
> clear_stack (void)
> {
>   char a[128 * 1024 + 128];
> 
>   __builtin_memset (a, 0, sizeof (a));

Do you really need the memset in there to reproduce it?
Wouldn't asm volatile ("" : : "r" (&a[0]) : "memory");
or something similar be enough?  Or if you need to clear something,
clear much smaller part of the array?

Jakub


Re: [patch] Fix problematic debug info for parameters at -O0

2016-06-22 Thread Eric Botcazou
> The testcase doesn't necessarily need to FAIL without the patch on x86, it
> is fine if it fails on some PowerPC* or Visium.

Here's what I have installed on mainline and 6 branch (not sure it's worth 
fixing on the aging 5 branch).  The test fails on PowerPC/Linux:

(gdb) b param-5.c:26
Breakpoint 1 at 0x1510: file param-5.c, line 26.
(gdb) run
Starting program: /nfs/tron/work/botcazou/gcc-head/powerpc-linux-gnu/param-5 

Breakpoint 1, foo (
str=, 
count=0) at param-5.c:26
26        count--;  /* BREAK */
(gdb) bt
#0  foo (
str=, 
count=0) at param-5.c:26
#1  0x150c in foo (
str=, count=1)
at param-5.c:24
#2  0x1590 in main () at param-5.c:33


2016-06-22  Eric Botcazou  

* function.c (assign_parm_setup_reg): Prevent sharing in another case.


2016-06-22  Eric Botcazou  

* gcc.dg/guality/param-5.c: New test.

-- 
Eric Botcazou

Index: function.c
===
--- function.c	(revision 237677)
+++ function.c	(working copy)
@@ -3314,6 +3314,8 @@ assign_parm_setup_reg (struct assign_par
 	  set_mem_attributes (parmreg, parm, 1);
 	}
 
+  /* We need to preserve an address based on VIRTUAL_STACK_VARS_REGNUM for
+	 the debug info in case it is not legitimate.  */
   if (GET_MODE (parmreg) != GET_MODE (rtl))
 	{
 	  rtx tempreg = gen_reg_rtx (GET_MODE (rtl));
@@ -3323,7 +3325,8 @@ assign_parm_setup_reg (struct assign_par
 			 all->last_conversion_insn);
 	  emit_move_insn (tempreg, rtl);
 	  tempreg = convert_to_mode (GET_MODE (parmreg), tempreg, unsigned_p);
-	  emit_move_insn (parmreg, tempreg);
+	  emit_move_insn (MEM_P (parmreg) ? copy_rtx (parmreg) : parmreg,
+			  tempreg);
 	  all->first_conversion_insn = get_insns ();
 	  all->last_conversion_insn = get_last_insn ();
 	  end_sequence ();
@@ -3331,7 +3334,7 @@ assign_parm_setup_reg (struct assign_par
 	  did_conversion = true;
 	}
   else
-	emit_move_insn (parmreg, rtl);
+	emit_move_insn (MEM_P (parmreg) ? copy_rtx (parmreg) : parmreg, rtl);
 
   rtl = parmreg;
 
/* { dg-do run } */
/* { dg-options "-g" } */
/* { dg-skip-if "" { *-*-* }  { "*" } { "-O0" } } */

typedef __UINTPTR_TYPE__ uintptr_t;

typedef struct { uintptr_t pa; uintptr_t pb; } fatp_t
  __attribute__ ((aligned (2 * __alignof__ (uintptr_t))));

__attribute__((noinline, noclone)) void
clear_stack (void)
{
  char a[128 * 1024 + 128];

  __builtin_memset (a, 0, sizeof (a));
}

__attribute__((noinline, noclone)) void
foo (fatp_t str, int count)
{
  char a[128 * 1024];

  if (count > 0)
    foo (str, count - 1);
  clear_stack ();
  count--;  /* BREAK */
}

int
main (void)
{
  fatp_t ptr = { 31415927, 27182818 };
  foo (ptr, 1);
  return 0;
}

/* { dg-final { gdb-test 26 "str.pa" "31415927" } } */
/* { dg-final { gdb-test 26 "str.pb" "27182818" } } */


[PATCH 1/3] Add gcc-auto-profile script

2016-06-22 Thread Andi Kleen
From: Andi Kleen 

Using autofdo is currently somewhat difficult. It requires using the
model specific branches taken event, which differs on different CPUs.
The example shown in the manual requires a special patched version of
perf that is non standard, and also will likely not work everywhere.

This patch adds a new gcc-auto-profile script that figures out the
correct event and runs perf.

This is needed to actually make use of autofdo in a generic way
in the build system and in the test suite.

Since maintaining the script would be somewhat tedious (needs changes
every time a new CPU comes out) I auto generated it from the online
Intel event database. The script to do that is in contrib and can be
rerun.

Right now there is no test if perf works in configure. This
would vary depending on the build and target system, and since
it currently doesn't work under virtualization and needs an up-to-date
kernel it may often fail in common distribution build setups.

So far the script is not installed.

v2: Remove documentation of gcc-auto-profile, as it's not
installed.
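
As a rough illustration of what "figuring out the correct event"
involves, here is a minimal Python 3 sketch.  It is not the generated
script: the helper names cpu_string and perf_event are made up for this
example, the sample values used with them are illustrative only, and the
PEBS field is assumed to hold an integer-like value.

```python
def cpu_string(cpuinfo_text):
    """Return ("vendor-family-MODEL", model) from /proc/cpuinfo-style text.

    Returns (None, None) when vendor, family or model cannot be found.
    """
    vendor = fam = model = None
    for line in cpuinfo_text.splitlines():
        n = line.split()
        if len(n) < 3:
            continue                      # too short to carry a value
        if n[0] == "vendor_id":
            vendor = n[2]
        elif n[0] == "model" and n[1] == ":":
            model = int(n[2])             # skips the "model name" line
        elif n[0] == "cpu" and n[1] == "family":
            fam = int(n[3])
        if vendor is not None and fam is not None and model is not None:
            return "%s-%d-%X" % (vendor, fam, model), model
    return None, None


def perf_event(entry):
    """Build a perf event string from one event-table entry (a dict).

    Assumption: entry has "EventCode" and "UMask" keys, and an optional
    integer-like "PEBS" key that selects the precise ("p") variant.
    """
    event = "cpu/event=%s,umask=%s/" % (entry["EventCode"], entry["UMask"])
    if int(entry.get("PEBS", 0)) > 0:
        event += "p"
    return event
```

The resulting string is what would be passed to perf record -b -e, as
described in the header comment of gen_autofdo_event.py.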

gcc/:
2016-06-22  Andi Kleen  

* doc/invoke.texi: Document gcc-auto-profile
* config/i386/gcc-auto-profile: New file.

contrib/:

2016-06-22  Andi Kleen  

* gen_autofdo_event.py: New file to regenerate
gcc-auto-profile.
---
 contrib/gen_autofdo_event.py | 155 +++
 gcc/config/i386/gcc-auto-profile |  70 ++
 2 files changed, 225 insertions(+)
 create mode 100755 contrib/gen_autofdo_event.py
 create mode 100755 gcc/config/i386/gcc-auto-profile

diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
new file mode 100755
index 000..66cd613
--- /dev/null
+++ b/contrib/gen_autofdo_event.py
@@ -0,0 +1,155 @@
+#!/usr/bin/python
+# Generate Intel taken branches Linux perf event script for autofdo profiling.
+
+# Copyright (C) 2016 Free Software Foundation, Inc.
+#
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+#
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.  */
+
+# Run it with perf record -b -e EVENT program ...
+# The Linux Kernel needs to support the PMU of the current CPU, and
+# It will likely not work in VMs.
+# Add --all to print for all cpus, otherwise for current cpu.
+# Add --script to generate shell script to run correct event.
+#
+# Requires internet (https) access. This may require setting up a proxy
+# with export https_proxy=...
+#
+import urllib2
+import sys
+import json
+import argparse
+import collections
+
+baseurl = "https://download.01.org/perfmon"
+
+target_events = (u'BR_INST_RETIRED.NEAR_TAKEN',
+ u'BR_INST_EXEC.TAKEN',
+ u'BR_INST_RETIRED.TAKEN_JCC',
+ u'BR_INST_TYPE_RETIRED.COND_TAKEN')
+
+ap = argparse.ArgumentParser()
+ap.add_argument('--all', '-a', help='Print for all CPUs', action='store_true')
+ap.add_argument('--script', help='Generate shell script', action='store_true')
+args = ap.parse_args()
+
+eventmap = collections.defaultdict(list)
+
+def get_cpu_str():
+    with open('/proc/cpuinfo', 'r') as c:
+        vendor, fam, model = None, None, None
+        for j in c:
+            n = j.split()
+            if n[0] == 'vendor_id':
+                vendor = n[2]
+            elif n[0] == 'model' and n[1] == ':':
+                model = int(n[2])
+            elif n[0] == 'cpu' and n[1] == 'family':
+                fam = int(n[3])
+            if vendor and fam and model:
+                return "%s-%d-%X" % (vendor, fam, model), model
+    return None, None
+
+def find_event(eventurl, model):
+    print >>sys.stderr, "Downloading", eventurl
+    u = urllib2.urlopen(eventurl)
+    events = json.loads(u.read())
+    u.close()
+
+    found = 0
+    for j in events:
+        if j[u'EventName'] in target_events:
+            event = "cpu/event=%s,umask=%s/" % (j[u'EventCode'], j[u'UMask'])
+            if u'PEBS' in j and j[u'PEBS'] > 0:
+                event += "p"
+            if args.script:
+                eventmap[event].append(model)
+            else:
+                print j[u'EventName'], "event for model", model, "is", event
+            found += 1
+    return found
+
+if not args.all:
+    cpu, model = get_cpu_str()
+    if not cpu:
+        sys.exit("Unknown CPU type")
+
+url = baseurl + "/mapfile.csv"
+print >>sys.stderr, "Downloading", url
+u = urllib2.urlopen(url)
+found = 

[PATCH 2/3] Run profile feedback tests with autofdo

2016-06-22 Thread Andi Kleen
From: Andi Kleen 

Extend the existing bprob and tree-prof tests to also run with autofdo.
The test runtimes are really a bit too short for autofdo, but it's
a reasonable sanity check.

This only works natively for now.

dejagnu doesn't seem to support a wrapper for unix tests, so I had
to open code running these tests.  That should be ok due to the
native run restrictions.

gcc/testsuite/:

2016-06-22  Andi Kleen  

* g++.dg/bprob/bprob.exp: Support autofdo.
* g++.dg/tree-prof/tree-prof.exp: Ditto.
* gcc.dg/tree-prof/tree-prof.exp: Ditto.
* gcc.misc-tests/bprob.exp: Ditto.
* gfortran.dg/prof/prof.exp: Ditto.
* lib/profopt.exp: Ditto.
* lib/target-supports.exp: Check for autofdo.
---
 gcc/testsuite/g++.dg/bprob/bprob.exp | 10 
 gcc/testsuite/g++.dg/tree-prof/tree-prof.exp | 10 
 gcc/testsuite/gcc.dg/tree-prof/tree-prof.exp | 10 
 gcc/testsuite/gcc.misc-tests/bprob.exp   | 14 ++
 gcc/testsuite/gfortran.dg/prof/prof.exp  |  9 
 gcc/testsuite/lib/profopt.exp| 68 ++--
 gcc/testsuite/lib/target-supports.exp| 31 +
 7 files changed, 149 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/g++.dg/bprob/bprob.exp 
b/gcc/testsuite/g++.dg/bprob/bprob.exp
index d07..e45d965 100644
--- a/gcc/testsuite/g++.dg/bprob/bprob.exp
+++ b/gcc/testsuite/g++.dg/bprob/bprob.exp
@@ -53,6 +53,7 @@ if $tracelevel then {
 
 set profile_options "-fprofile-arcs"
 set feedback_options "-fbranch-probabilities"
+set profile_wrapper ""
 
 # Main loop.
 foreach profile_option $profile_options feedback_option $feedback_options {
@@ -65,4 +66,13 @@ foreach profile_option $profile_options feedback_option 
$feedback_options {
 }
 }
 
+foreach profile_option $profile_options feedback_option $feedback_options {
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/bprob-*.c]] {
+if ![runtest_file_p $runtests $src] then {
+continue
+}
+auto-profopt-execute $src
+}
+}
+
 set PROFOPT_OPTIONS $bprob_save_profopt_options
diff --git a/gcc/testsuite/g++.dg/tree-prof/tree-prof.exp 
b/gcc/testsuite/g++.dg/tree-prof/tree-prof.exp
index 7a4b5cb..ea08602 100644
--- a/gcc/testsuite/g++.dg/tree-prof/tree-prof.exp
+++ b/gcc/testsuite/g++.dg/tree-prof/tree-prof.exp
@@ -44,6 +44,7 @@ set PROFOPT_OPTIONS [list {}]
 # profile data.
 set profile_option "-fprofile-generate -D_PROFILE_GENERATE"
 set feedback_option "-fprofile-use -D_PROFILE_USE"
+set profile_wrapper ""
 
 foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.C]] {
 # If we're only testing specific files and this isn't one of them, skip it.
@@ -53,4 +54,13 @@ foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.C]] {
 profopt-execute $src
 }
 
+foreach profile_option $profile_options feedback_option $feedback_options {
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/bprob-*.c]] {
+if ![runtest_file_p $runtests $src] then {
+continue
+}
+auto-profopt-execute $src
+}
+}
+
 set PROFOPT_OPTIONS $treeprof_save_profopt_options
diff --git a/gcc/testsuite/gcc.dg/tree-prof/tree-prof.exp 
b/gcc/testsuite/gcc.dg/tree-prof/tree-prof.exp
index 650ad8d..abf7231 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/tree-prof.exp
+++ b/gcc/testsuite/gcc.dg/tree-prof/tree-prof.exp
@@ -44,6 +44,7 @@ set PROFOPT_OPTIONS [list {}]
 # profile data.
 set profile_option "-fprofile-generate -D_PROFILE_GENERATE"
 set feedback_option "-fprofile-use -D_PROFILE_USE"
+set profile_wrapper ""
 
 foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] {
 # If we're only testing specific files and this isn't one of them, skip it.
@@ -53,4 +54,13 @@ foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] {
 profopt-execute $src
 }
 
+foreach profile_option $profile_options feedback_option $feedback_options {
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/bprob-*.c]] {
+if ![runtest_file_p $runtests $src] then {
+continue
+}
+auto-profopt-execute $src
+}
+}
+
 set PROFOPT_OPTIONS $treeprof_save_profopt_options
diff --git a/gcc/testsuite/gcc.misc-tests/bprob.exp 
b/gcc/testsuite/gcc.misc-tests/bprob.exp
index 52dcb1f..f43f011 100644
--- a/gcc/testsuite/gcc.misc-tests/bprob.exp
+++ b/gcc/testsuite/gcc.misc-tests/bprob.exp
@@ -41,6 +41,7 @@ load_lib profopt.exp
 set bprob_save_profopt_options $PROFOPT_OPTIONS
 set PROFOPT_OPTIONS [list { -O2 } { -O3  }]
 
+set profile_wrapper ""
 set profile_options "-fprofile-arcs"
 set feedback_options "-fbranch-probabilities"
 
@@ -54,4 +55,17 @@ foreach profile_option $profile_options feedback_option 
$feedback_options {
 }
 }
 
+if { ! [check_profiling_available "-fauto-profile"] } {
+set PROFOPT_OPTIONS $bprob_save_profopt_options
+return
+}
+
+foreach profile_option $profile_options feedback_option $feedback_options {
+foreach 

Re: [PATCH][AArch64][2/2] (Re)Implement vcopy_lane intrinsics

2016-06-22 Thread Marcus Shawcroft
On 7 June 2016 at 17:56, Kyrill Tkachov  wrote:

> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2016-06-07  Kyrylo Tkachov  
> James Greenhalgh  
>
> * config/aarch64/arm_neon.h (vcopyq_lane_f32, vcopyq_lane_f64,
> vcopyq_lane_p8, vcopyq_lane_p16, vcopyq_lane_s8, vcopyq_lane_s16,
> vcopyq_lane_s32, vcopyq_lane_s64, vcopyq_lane_u8, vcopyq_lane_u16,
> vcopyq_lane_u32, vcopyq_lane_u64): Reimplement in C.
> (vcopy_lane_f32, vcopy_lane_f64, vcopy_lane_p8, vcopy_lane_p16,
> vcopy_lane_s8, vcopy_lane_s16, vcopy_lane_s32, vcopy_lane_s64,
> vcopy_lane_u8, vcopy_lane_u16, vcopy_lane_u32, vcopy_lane_u64,
> vcopy_laneq_f32, vcopy_laneq_f64, vcopy_laneq_p8, vcopy_laneq_p16,
> vcopy_laneq_s8, vcopy_laneq_s16, vcopy_laneq_s32, vcopy_laneq_s64,
> vcopy_laneq_u8, vcopy_laneq_u16, vcopy_laneq_u32, vcopy_laneq_u64,
> vcopyq_laneq_f32, vcopyq_laneq_f64, vcopyq_laneq_p8, vcopyq_laneq_p16,
> vcopyq_laneq_s8, vcopyq_laneq_s16, vcopyq_laneq_s32, vcopyq_laneq_s64,
> vcopyq_laneq_u8, vcopyq_laneq_u16, vcopyq_laneq_u32, vcopyq_laneq_u64):
> New intrinsics.
>
> 2016-06-07  Kyrylo Tkachov  
> James Greenhalgh  
>
> * gcc.target/aarch64/vect_copy_lane_1.c: New test.

OK, thanks /Marcus
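
For readers unfamiliar with these intrinsics, the lane-copy semantics
can be modelled with a minimal Python sketch.  This only models the
behaviour on plain lists; it is not the arm_neon.h implementation, and
using one untyped function for all element types is a simplification
made for this example.

```python
def vcopy_lane(a, lane1, b, lane2):
    """Return a copy of vector a with lane lane1 replaced by b[lane2].

    Vectors are modelled as Python lists; a and b may differ in length,
    which loosely mirrors the lane/laneq variants.
    """
    if not 0 <= lane1 < len(a):
        raise IndexError("lane1 out of range for destination vector")
    if not 0 <= lane2 < len(b):
        raise IndexError("lane2 out of range for source vector")
    result = list(a)             # the inputs are left unmodified
    result[lane1] = b[lane2]
    return result
```

For example, vcopy_lane([1, 2, 3, 4], 1, [10, 20], 0) gives
[1, 10, 3, 4]: only the selected destination lane changes.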


[Ada] Crash on config pragma Component_Alignment

2016-06-22 Thread Arnaud Charlet
Pragma Component_Alignment was not implemented properly and caused a crash
when used in a configuration file due to how it was applied via the scope
table. This patch correctly identifies this case and uses a global variable
Configuration_Component_Alignment to capture the value set during
configuration analysis and applies it in place of the default value.
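
The inheritance rule can be modelled with a minimal Python sketch.  This
is only an illustration of the idea, not the compiler's data structures:
the ScopeStack class and its method names are made up, and the way
set_alignment picks between the global and the top-of-stack entry is an
assumption about the configuration case.

```python
CALIGN_DEFAULT = "Calign_Default"


class ScopeStack:
    """Toy model of a scope stack with an inherited component alignment."""

    def __init__(self):
        self.entries = []
        # Value captured from a configuration-file pragma, if any.
        self.configuration_component_alignment = CALIGN_DEFAULT

    def push_scope(self, entity):
        if self.entries:
            # A nested scope inherits from the enclosing scope.
            inherited = self.entries[-1]["component_alignment"]
        else:
            # The first scope inherits from the configuration pragma.
            inherited = self.configuration_component_alignment
        self.entries.append(
            {"entity": entity, "component_alignment": inherited})

    def set_alignment(self, form):
        """Model pragma Component_Alignment (Form => form)."""
        if self.entries:
            self.entries[-1]["component_alignment"] = form
        else:
            # No scope yet: treat this as the configuration-file case.
            self.configuration_component_alignment = form
```

With this model, a pragma seen before any scope exists affects every
scope pushed afterwards, which is the behaviour the test below checks
by comparing the two -gnatR outputs.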


-- Source --


--  stor_unit.adc

pragma Component_Alignment (Form => Storage_Unit);

--  pack_storage_unit.ads

package Pack_Storage_Unit is
   pragma Component_Alignment (Form => Storage_Unit);

   type Small_Int is new Integer range 0 .. 1;
   for Small_Int'Size use 1;

   type Rec is record
  Comp_1 : Small_Int;
  Comp_2 : Small_Int;
  Comp_3 : Boolean;
  Comp_4 : Integer;
  Comp_5 : Long_Integer;
   end record;
end Pack_Storage_Unit;

--  pack.ads

package Pack is

   type Small_Int is new Integer range 0 .. 1;
   for Small_Int'Size use 1;

   type Rec is record
  Comp_1 : Small_Int;
  Comp_2 : Small_Int;
  Comp_3 : Boolean;
  Comp_4 : Integer;
  Comp_5 : Long_Integer;
   end record;
end Pack;


-- Compilation and output --


$ gcc -c -gnatR pack_storage_unit.ads > output1.txt
$ gcc -c -gnatR pack.ads -gnatec=stor_unit.adc > output2.txt
$ grep -v -F -x -f output1.txt output2.txt
Representation information for unit Pack (spec)
for Rec'Size use 128;
for Rec'Alignment use 8;
   Comp_4 at 4 range  0 .. 31;
   Comp_5 at 8 range  0 .. 63;

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-22  Justin Squirek  

* sem_ch8.adb (Push_Scope): Add a check for when the
scope table is empty to assign the global variable
Configuration_Component_Alignment.
* sem.adb (Do_Analyze): Add Configuration_Component_Alignment
to be assigned when the environment is cleaned instead of the
default.
* sem.ads Add a global variable Configuration_Component_Alignment
to store the value given by pragma Component_Alignment in the
context of a configuration file.
* sem_prag.adb (Analyze_Pragma): Correct the case for
Component_Alignment so that the pragma is verified and add
comments to explain how it is applied to the scope stack.

Index: sem.adb
===
--- sem.adb (revision 237680)
+++ sem.adb (working copy)
@@ -1355,7 +1355,8 @@
  Outer_Generic_Scope := Empty;
  Scope_Suppress  := Suppress_Options;
  Scope_Stack.Table
-   (Scope_Stack.Last).Component_Alignment_Default := Calign_Default;
+   (Scope_Stack.Last).Component_Alignment_Default :=
+ Configuration_Component_Alignment;
  Scope_Stack.Table
(Scope_Stack.Last).Is_Active_Stack_Base := True;
 
Index: sem.ads
===
--- sem.ads (revision 237680)
+++ sem.ads (working copy)
@@ -461,6 +461,11 @@
--  Transient blocks have three associated actions list, to be inserted
--  before and after the block's statements, and as cleanup actions.
 
+   Configuration_Component_Alignment : Component_Alignment_Kind :=
+ Calign_Default;
+   --  Used for handling the pragma Component_Alignment in the context of a
+   --  configuration file.
+
type Scope_Stack_Entry is record
   Entity : Entity_Id;
   --  Entity representing the scope
Index: sem_ch8.adb
===
--- sem_ch8.adb (revision 237693)
+++ sem_ch8.adb (working copy)
@@ -8192,10 +8192,22 @@
  SST.Save_Default_SSO  := Default_SSO;
  SST.Save_Uneval_Old   := Uneval_Old;
 
+ --  Each new scope pushed onto the scope stack inherits the component
+ --  alignment of the previous scope. This emulates the "visibility"
+ --  semantics of pragma Component_Alignment.
+
  if Scope_Stack.Last > Scope_Stack.First then
 SST.Component_Alignment_Default := Scope_Stack.Table
  (Scope_Stack.Last - 1).
Component_Alignment_Default;
+
+ --  Otherwise, this is the first scope being pushed on the scope
+ --  stack. Inherit the component alignment from the configuration
+ --  form of pragma Component_Alignment (if any).
+
+ else
+SST.Component_Alignment_Default :=
+  Configuration_Component_Alignment;
  end if;
 
  SST.Last_Subprogram_Name   := null;
Index: sem_prag.adb
===
--- sem_prag.adb(revision 237693)
+++ sem_prag.adb(working copy)
@@ -12787,9 +12787,21 @@
  ("invalid Form parameter for 

[Ada] Analysis of pragmas containing integer expressions not verified properly

2016-06-22 Thread Arnaud Charlet
If a string is used as an argument instead of an integer,
Check_Arg_Is_OK_Static_Expression with Any_Integer will falsely verify causing
the compiler to halt compilation when the caller acts on the assumption that it
was verified. This patch creates checks so that Any_Integer works properly and
documentation to explain how unresolved types get handled. 


-- Source --


--  static_int_test.adb

pragma C_Pass_By_Copy("JUNK"); --  Expects a static integer expression
procedure Static_Int_Test is
   Another_Error : String := 1;
begin
   null;
end Static_Int_Test;


-- Compilation and output --


$ gnatmake -q -f static_int_test.adb
static_int_test.adb:1:23: expected an integer type
static_int_test.adb:1:23: found a string type
static_int_test.adb:3:30: expected type "Standard.String"
static_int_test.adb:3:30: found type universal integer
gnatmake: "static_int_test.adb" compilation error

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-22  Justin Squirek  

* sem_prag.adb (Check_Expr_Is_OK_Static_Expression): Fix ordering
of if-block and add in a condition to test for errors during
resolution.
* sem_res.adb (Resolution_Failed): Add comment to explain why
the type of a node which failed to resolve is set to the desired
type instead of Any_Type.
* sem_ch8.adb (Analyze_Object_Renaming): Add a check for Any_Type
to prevent crashes on Is_Access_Constant.

Index: sem_prag.adb
===
--- sem_prag.adb(revision 237686)
+++ sem_prag.adb(working copy)
@@ -5060,12 +5060,15 @@
 Analyze_And_Resolve (Expr);
  end if;
 
- if Is_OK_Static_Expression (Expr) then
-return;
+ --  An expression cannot be considered static if its resolution failed
+ --  or if it erroneous. Stop the analysis of the related pragma.
 
- elsif Etype (Expr) = Any_Type then
+ if Etype (Expr) = Any_Type or else Error_Posted (Expr) then
 raise Pragma_Exit;
 
+ elsif Is_OK_Static_Expression (Expr) then
+return;
+
  --  An interesting special case, if we have a string literal and we
  --  are in Ada 83 mode, then we allow it even though it will not be
  --  flagged as static. This allows the use of Ada 95 pragmas like
@@ -5077,12 +5080,6 @@
  then
 return;
 
- --  Static expression that raises Constraint_Error. This has already
- --  been flagged, so just exit from pragma processing.
-
- elsif Is_OK_Static_Expression (Expr) then
-raise Pragma_Exit;
-
  --  Finally, we have a real error
 
  else
Index: sem_res.adb
===
--- sem_res.adb (revision 237680)
+++ sem_res.adb (working copy)
@@ -1974,7 +1974,12 @@
   procedure Resolution_Failed is
   begin
  Patch_Up_Value (N, Typ);
+
+ --  Set the type to the desired one to minimize cascaded errors. Note
+ --  that this is an approximation and does not work in all cases.
+
  Set_Etype (N, Typ);
+
  Debug_A_Exit ("resolving  ", N, " (done, resolution failed)");
  Set_Is_Overloaded (N, False);
 
Index: sem_ch8.adb
===
--- sem_ch8.adb (revision 237680)
+++ sem_ch8.adb (working copy)
@@ -1022,22 +1022,30 @@
 
  Resolve (Nam, T);
 
+ --  Do not perform the legality checks below when the resolution of
+ --  the renaming name failed because the associated type is Any_Type.
+
+ if Etype (Nam) = Any_Type then
+null;
+
  --  Ada 2005 (AI-231): In the case where the type is defined by an
  --  access_definition, the renamed entity shall be of an access-to-
  --  constant type if and only if the access_definition defines an
  --  access-to-constant type. ARM 8.5.1(4)
 
- if Constant_Present (Access_Definition (N))
+ elsif Constant_Present (Access_Definition (N))
and then not Is_Access_Constant (Etype (Nam))
  then
-Error_Msg_N ("(Ada 2005): the renamed object is not "
- & "access-to-constant (RM 8.5.1(6))", N);
+Error_Msg_N
+   ("(Ada 2005): the renamed object is not access-to-constant "
+& "(RM 8.5.1(6))", N);
 
  elsif not Constant_Present (Access_Definition (N))
and then Is_Access_Constant (Etype (Nam))
  then
-Error_Msg_N ("(Ada 2005): the renamed object is not "
- & "access-to-variable (RM 8.5.1(6))", N);
+Error_Msg_N
+  ("(Ada 2005): the renamed object is not access-to-variable "
+   & "(RM 8.5.1(6))", N);
  

[Ada] Spurious error on derived type with unknown discriminants and predicate

2016-06-22 Thread Arnaud Charlet
This patch fixes a spurious error on an instantiation of an unbounded
container, when the element type is a private type with unknown discriminants,
derived from an array subtype with a predicate aspect.

The following must compile quietly:

   gcc -c gpr2-attribute.adb

---
package GPR2 is

   subtype Name_Type is String
 with Dynamic_Predicate => Name_Type'Length > 0;

end GPR2;
---
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

package GPR2.Attribute is

   type Qualified_Name (<>) is private;

   function Create (Name : Name_Type) return Qualified_Name;

private

   type Qualified_Name is new Name_Type;

end GPR2.Attribute;
---
with Ada.Containers.Indefinite_Ordered_Maps;

package body GPR2.Attribute is

   type Def is null record;

   package Attribute_Definitions is new Ada.Containers.Indefinite_Ordered_Maps
 (Qualified_Name, Def);

   function Create (Name : Name_Type) return Qualified_Name is
   begin
  return Qualified_Name (Name);
   end Create;

end GPR2.Attribute;

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-22  Ed Schonberg  

* einfo.ads, einfo.adb (Is_Actual_Subtype): New flag, defined
on subtypes that are created within subprogram bodies to handle
unconstrained composite formals.
* checks.adb (Apply_Predicate_Check): Do not generate a check on
an object whose type is an actual subtype.
* sem_ch6.adb (Set_Actual_Subtypes): Do not generate an
actual subtype for a formal whose base type is private.
Set Is_Actual_Subtype on corresponding entity after analyzing
its declaration.

Index: einfo.adb
===
--- einfo.adb   (revision 237680)
+++ einfo.adb   (working copy)
@@ -607,8 +607,8 @@
 
--Has_Inherited_InvariantsFlag291
--Is_Partial_Invariant_Procedure  Flag292
+   --Is_Actual_Subtype   Flag293
 
-   --(unused)Flag293
--(unused)Flag294
--(unused)Flag295
--(unused)Flag296
@@ -2014,6 +2014,12 @@
   return Flag69 (Id);
end Is_Access_Constant;
 
+   function Is_Actual_Subtype (Id : E) return B is
+   begin
+  pragma Assert (Is_Type (Id));
+  return Flag293 (Id);
+   end Is_Actual_Subtype;
+
function Is_Ada_2005_Only (Id : E) return B is
begin
   return Flag185 (Id);
@@ -5036,6 +5042,12 @@
   Set_Flag69 (Id, V);
end Set_Is_Access_Constant;
 
+   procedure Set_Is_Actual_Subtype (Id : E; V : B := True) is
+   begin
+  pragma Assert (Is_Type (Id));
+  Set_Flag293 (Id, V);
+   end Set_Is_Actual_Subtype;
+
procedure Set_Is_Ada_2005_Only (Id : E; V : B := True) is
begin
   Set_Flag185 (Id, V);
@@ -9186,6 +9198,7 @@
   W ("Is_Abstract_Subprogram",  Flag19  (Id));
   W ("Is_Abstract_Type",Flag146 (Id));
   W ("Is_Access_Constant",  Flag69  (Id));
+  W ("Is_Actual_Subtype",   Flag293 (Id));
   W ("Is_Ada_2005_Only",Flag185 (Id));
   W ("Is_Ada_2012_Only",Flag199 (Id));
   W ("Is_Aliased",  Flag15  (Id));
Index: einfo.ads
===
--- einfo.ads   (revision 237680)
+++ einfo.ads   (working copy)
@@ -2232,6 +2232,10 @@
 --Is_Access_Type (synthesized)
 --   Applies to all entities, true for access types and subtypes
 
+--Is_Actual_Subtype (Flag293)
+--   Defined on all types, true for the generated constrained subtypes
+--   that are built for unconstrained composite actuals.
+
 --Is_Ada_2005_Only (Flag185)
 --   Defined in all entities, true if a valid pragma Ada_05 or Ada_2005
 --   applies to the entity which specifically names the entity, indicating
@@ -7017,6 +7021,7 @@
function Is_Abstract_Subprogram  (Id : E) return B;
function Is_Abstract_Type(Id : E) return B;
function Is_Access_Constant  (Id : E) return B;
+   function Is_Actual_Subtype   (Id : E) return B;
function Is_Ada_2005_Only(Id : E) return B;
function Is_Ada_2012_Only(Id : E) return B;
function Is_Aliased  (Id : E) return B;
@@ -7689,6 +7694,7 @@
procedure Set_Is_Abstract_Subprogram  (Id : E; V : B := True);
procedure Set_Is_Abstract_Type(Id : E; V : B := True);
procedure Set_Is_Access_Constant  (Id : E; V : B := True);
+   procedure Set_Is_Actual_Subtype   (Id : E; V : B := True);
procedure Set_Is_Ada_2005_Only(Id : E; V : B := True);
procedure Set_Is_Ada_2012_Only(Id : E; V : B := True);
procedure Set_Is_Aliased  (Id : E; V : B := True);
@@ -8477,6 

[Ada] Independent tasks and the Fall_Back_Handler

2016-06-22 Thread Arnaud Charlet
This patch fixes a bug in which if a Fall_Back_Handler is installed for the
environment task, independent tasks will call it. The following test should
run quietly:

with Ada.Text_IO;
package body Debug is

   protected body Dbg is
  procedure Termination
(Cause   : in Task_Termination.Cause_Of_Termination;
 Task_Id : in Task_Identification.Task_Id;
 Except  : in Exceptions.Exception_Occurrence) is
  begin
 Text_IO.Put_Line
   (Task_Identification.Image (Task_Id) & " " & Cause'Img);
  end Termination;
   end Dbg;

end Debug;

with Ada.Exceptions,
  Ada.Task_Termination,
  Ada.Task_Identification;

use Ada;

package Debug is

   protected Dbg is
  procedure Termination
(Cause   : in Task_Termination.Cause_Of_Termination;
 Task_Id : in Task_Identification.Task_Id;
 Except  : in Exceptions.Exception_Occurrence);
   end Dbg;

end Debug;

with Ada.Real_Time.Timing_Events,
  Ada.Task_Termination,
  Debug;
use Ada;

procedure Pb_Terminate is
begin
   Task_Termination.Set_Dependents_Fallback_Handler
 (Debug.Dbg.Termination'Access);
end Pb_Terminate;

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-22  Bob Duff  

* s-tassta.adb (Task_Wrapper): Fix handling of Fall_Back_Handler
wrt independent tasks.

Index: s-tassta.adb
===
--- s-tassta.adb(revision 237680)
+++ s-tassta.adb(working copy)
@@ -6,7 +6,7 @@
 --  --
 --  B o d y --
 --  --
--- Copyright (C) 1992-2014, Free Software Foundation, Inc.  --
+-- Copyright (C) 1992-2016, Free Software Foundation, Inc.  --
 --  --
 -- GNARL is free software; you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -1339,7 +1339,13 @@
 
   if Self_ID.Common.Specific_Handler /= null then
  TH := Self_ID.Common.Specific_Handler;
-  else
+
+  --  Independent tasks should not call the Fall_Back_Handler (of the
+  --  environment task), because they are implementation artifacts that
+  --  should be invisible to Ada programs.
+
+  elsif Self_ID.Master_of_Task /= Independent_Task_Level then
+
  --  Look for a fall-back handler following the master relationship
  --  for the task. As specified in ARM C.7.3 par. 9/2, "the fall-back
  --  handler applies only to the dependent tasks of the task". Hence,
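
The control flow the patch introduces can be sketched outside the runtime.
The C model below is only an illustration under stated assumptions: the
struct, the field names, the integer handler ids and the numeric value of
INDEPENDENT_TASK_LEVEL are all hypothetical and are not the GNARL data
structures. It shows the order of the checks: a specific handler wins, an
independent task then gets no handler at all, and only other tasks walk the
master chain looking for a fall-back handler.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not the GNARL runtime: every name here is hypothetical.  */
enum { INDEPENDENT_TASK_LEVEL = 2 };

typedef int handler_id;              /* 0 means "no handler" */

struct task {
    const struct task *parent;       /* master relationship, NULL at the root */
    int master_of_task;              /* INDEPENDENT_TASK_LEVEL marks
                                        runtime-internal tasks */
    handler_id specific_handler;     /* handler set for this task only */
    handler_id fall_back_handler;    /* applies to dependents, not to the
                                        task itself */
};

/* Pick the termination handler for T, mirroring the patched if/elsif.  */
handler_id select_termination_handler(const struct task *t)
{
    assert(t != NULL);

    if (t->specific_handler != 0)
        return t->specific_handler;

    /* Independent tasks are implementation artifacts: no fall-back.  */
    if (t->master_of_task == INDEPENDENT_TASK_LEVEL)
        return 0;

    /* Otherwise follow the master relationship upwards.  */
    for (const struct task *p = t->parent; p != NULL; p = p->parent)
        if (p->fall_back_handler != 0)
            return p->fall_back_handler;

    return 0;
}
```

With an environment task that has fall-back handler 7, an ordinary dependent
task resolves to 7 while a task at the independent level resolves to 0,
which is why the test above is expected to print nothing.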


Re: [PATCH][AArch64][1/2] Add support INS (element) instruction to copy lanes between vectors

2016-06-22 Thread Kyrill Tkachov

Ping.
Richard, Marcus, do you have any feedback on this?

https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00502.html

Thanks,
Kyrill
On 14/06/16 10:36, James Greenhalgh wrote:

On Tue, Jun 07, 2016 at 05:56:47PM +0100, Kyrill Tkachov wrote:

Hi all,

This patch addresses a deficiency we have in handling vector lane-to-lane
moves in the AArch64 backend.  Generally we can use the INS (element)
instruction but, as a user complains in
https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html
we don't. James had a patch adding an appropriate combine pattern some time
ago (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html) but it never
got applied.

This patch is a rebase of that patch that adds the necessary
vec_merge+vec_duplicate+vec_select combine pattern.  I chose to use a
define_insn rather than the define_insn_and_split in that patch that just
deletes the instruction when the source and destination registers are the
same, as I think it's not the combine pattern's job to delete the redundant
instruction but rather some other pass's job. Also, I was not able to create
a testcase where it would make a difference.

Also, this patch doesn't reimplement the vcopy*lane* intrinsics from inline
assembly to a vget_lane+vset_lane combo.  This can be done as a separate
patch on top of this one.

Bootstrapped and tested on aarch64-none-linux-gnu.
Also tested on aarch64_be-none-elf.

Ok for trunk?

This looks OK to me, but as it is based on my code I probably can't
approve it within the spirit of the write access policies (I only have
localized review permission).

Best wait for Richard/Marcus or a global reviewer to take a look.


Thanks,
Kyrill

2016-06-07  James Greenhalgh  
 Kyrylo Tkachov  

 * config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane):
 New define_insn.
  (*aarch64_simd_vec_copy_lane_): Likewise.

Watch your ChangeLog formatting.

Thanks,
James


2016-06-07  James Greenhalgh  
 Kyrylo Tkachov  

 * gcc.target/aarch64/vget_set_lane_1.c: New test.






Re: [PATCH][AArch64][2/2] (Re)Implement vcopy_lane intrinsics

2016-06-22 Thread Kyrill Tkachov

Ping.
Richard, Marcus, do you have any feedback on this?

https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00503.html

Thanks,
Kyrill

On 14/06/16 10:38, James Greenhalgh wrote:

On Tue, Jun 07, 2016 at 05:56:51PM +0100, Kyrill Tkachov wrote:

Hi all,

This is the second part of James's patch from:
https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html
separated out. It reimplements the vcopyq_lane* intrinsics in C and
adds implementations of the other missing vcopy_lane_ intrinsics.

The differences from that patch are in the use of __aarch64_vset_lane_any and
__aarch64_vget_lane_any rather than the typed variants of these that were
used back in 2013 (and don't exist anymore).
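
For readers unfamiliar with the intrinsics, the get-then-set composition
described above can be modelled in portable C. This is only a sketch: the
v2si/v4si structs and the helper names are hypothetical stand-ins for the
NEON vector types and for the internal get/set helpers, and nothing here is
the arm_neon.h code. It mimics the shape of a vcopy_laneq operation, where
one lane of a 128-bit source is written into one lane of a 64-bit
destination and the other destination lanes are left untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-ins for 64-bit and 128-bit vectors of int32_t lanes.
   Hypothetical types for illustration only.  */
typedef struct { int32_t lane[2]; } v2si;
typedef struct { int32_t lane[4]; } v4si;

/* Read lane I of a 128-bit vector.  */
static int32_t get_lane_q(v4si v, int i)
{
    assert(i >= 0 && i < 4);
    return v.lane[i];
}

/* Return a copy of V with lane I replaced by X.  */
static v2si set_lane(int32_t x, v2si v, int i)
{
    assert(i >= 0 && i < 2);
    v.lane[i] = x;
    return v;
}

/* Copy lane LANE2 of B into lane LANE1 of A: a get followed by a set.  */
v2si copy_laneq(v2si a, int lane1, v4si b, int lane2)
{
    return set_lane(get_lane_q(b, lane2), a, lane1);
}
```

Written this way the compiler sees an ordinary lane extract feeding a lane
insert, which is the form a combine pattern for INS (element) can match.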

The testcase is also adjusted for the ABI change in GCC 5 where integer x1
vectors are now passed and returned in SIMD registers.

The vcopy_laneq_f64 test in the testcase is XFAILed for now because GCC
doesn't generate the optimal DUP instruction but instead emits a
UMOV to a scalar register and then an FMOV. This is a GCC 7 regression
tracked by PR 71307 and I think unrelated to this patch.

Bootstrapped and tested on aarch64-none-linux-gnu.  Also tested on
aarch64_be-none-elf.

Ok for trunk?

Again, this looks OK to me, but as it is based on my code I can't approve
it within the spirit of the write access policies. Please wait for Marcus
or Richard to take a look.

Thanks,
James


Thanks,
Kyrill

2016-06-07  Kyrylo Tkachov  
 James Greenhalgh  

 * config/aarch64/arm_neon.h (vcopyq_lane_f32, vcopyq_lane_f64,
 vcopyq_lane_p8, vcopyq_lane_p16, vcopyq_lane_s8, vcopyq_lane_s16,
 vcopyq_lane_s32, vcopyq_lane_s64, vcopyq_lane_u8, vcopyq_lane_u16,
 vcopyq_lane_u32, vcopyq_lane_u64): Reimplement in C.
 (vcopy_lane_f32, vcopy_lane_f64, vcopy_lane_p8, vcopy_lane_p16,
 vcopy_lane_s8, vcopy_lane_s16, vcopy_lane_s32, vcopy_lane_s64,
 vcopy_lane_u8, vcopy_lane_u16, vcopy_lane_u32, vcopy_lane_u64,
 vcopy_laneq_f32, vcopy_laneq_f64, vcopy_laneq_p8, vcopy_laneq_p16,
 vcopy_laneq_s8, vcopy_laneq_s16, vcopy_laneq_s32, vcopy_laneq_s64,
 vcopy_laneq_u8, vcopy_laneq_u16, vcopy_laneq_u32, vcopy_laneq_u64,
 vcopyq_laneq_f32, vcopyq_laneq_f64, vcopyq_laneq_p8, vcopyq_laneq_p16,
 vcopyq_laneq_s8, vcopyq_laneq_s16, vcopyq_laneq_s32, vcopyq_laneq_s64,
 vcopyq_laneq_u8, vcopyq_laneq_u16, vcopyq_laneq_u32, vcopyq_laneq_u64):
 New intrinsics.

2016-06-07  Kyrylo Tkachov  
 James Greenhalgh  

 * gcc.target/aarch64/vect_copy_lane_1.c: New test.




[Ada] Improve and unify warning machinery for address clauses

2016-06-22 Thread Arnaud Charlet
This change moves the rest of the warning machinery for address clauses to
Validate_Address_Clauses, ensuring that all the variants are issued from it.
This affects only absolute address clauses in practice, i.e. address clauses
of the form for I'Address use To_Address (16#_#) and variants thereof.

This automatically brings a couple of improvements: warnings are more accurate
because they take into account the final alignment set by the back-end and they
catch more cases because the back-end sets the alignment of every single type
and object in the program.  The warning also prints the alignment value now.

The following code gives an example of the warnings:

pragma Unsuppress (Alignment_Check);


 1. with System.Storage_Elements; use System.Storage_Elements;
 2.
 3. package P is
 4.
 5.   I : Integer;
 6.   for I'Address use To_Address (16#7FFF_0001#); -- warning
  |
>>> warning: specified address for "I" is inconsistent with alignment
>>> warning: program execution may be erroneous (RM 13.3(27))
>>> warning: alignment of "I" is 4

 7.
 8.   type Rec is record
 9. I : Integer;
10.   end record;
11.
12.   R1 : Rec;
13.   for R1'Address use To_Address (16#7FFF_0001#); -- warning
  |
>>> warning: specified address for "R1" is inconsistent with alignment
>>> warning: program execution may be erroneous (RM 13.3(27))
>>> warning: alignment of "R1" is 4

14.
15.   C : constant System.Address := To_Address (16#7FFF_0001#); -- warning
16.
17.   R2 : Rec;
18.   for R2'Address use C;
  |
>>> warning: specified address for "R2" is inconsistent with alignment
>>> warning: program execution may be erroneous (RM 13.3(27))
>>> warning: alignment of "R2" is 4

19.
20.   R3 : Rec;
21.   for R3'Address use To_Address (16#7FFF_0004#); -- no warning
22.
23. end P;

 23 lines: No errors, 9 warnings
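
The arithmetic behind the warnings in the listing is a simple divisibility
test. The C sketch below is an illustration only (the function name is
hypothetical and this is not how the compiler implements the check): an
absolute address is consistent with an alignment when it is a multiple of
that alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when ADDRESS is a multiple of ALIGNMENT (both in storage units).  */
bool address_is_aligned(uint64_t address, uint64_t alignment)
{
    assert(alignment != 0);
    return address % alignment == 0;
}
```

With an alignment of 4, 16#7FFF_0001# leaves a remainder of 1 and is
flagged, while 16#7FFF_0004# divides evenly and draws no warning.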

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-22  Eric Botcazou  

* sem_util.ads (Address_Value): Declare new function.
* sem_util.adb (Address_Value): New function extracted
unmodified from Apply_Address_Clause_Check, which returns the
underlying value of the expression of an address clause.
* checks.adb (Compile_Time_Bad_Alignment): Delete.
(Apply_Address_Clause_Check): Call Address_Value on
the expression.  Do not issue the main warning here and
issue the secondary warning only when the value of the
expression is not known at compile time.
* sem_ch13.adb (Address_Clause_Check_Record): Add A component and
adjust the description.
(Analyze_Attribute_Definition_Clause): In the case
of an address, move up the code creating an entry in the table of
address clauses.  Also create an entry for an absolute address.
(Validate_Address_Clauses): Issue the warning for absolute
addresses here too.  Tweak condition associated with overlays
for consistency.

Index: checks.adb
===
--- checks.adb  (revision 237687)
+++ checks.adb  (revision 237688)
@@ -638,36 +638,12 @@
   AC   : constant Node_Id:= Address_Clause (E);
   Loc  : constant Source_Ptr := Sloc (AC);
   Typ  : constant Entity_Id  := Etype (E);
-  Aexp : constant Node_Id:= Expression (AC);
 
   Expr : Node_Id;
   --  Address expression (not necessarily the same as Aexp, for example
   --  when Aexp is a reference to a constant, in which case Expr gets
   --  reset to reference the value expression of the constant).
 
-  procedure Compile_Time_Bad_Alignment;
-  --  Post error warnings when alignment is known to be incompatible. Note
-  --  that we do not go as far as inserting a raise of Program_Error since
-  --  this is an erroneous case, and it may happen that we are lucky and an
-  --  underaligned address turns out to be OK after all.
-
-  
-  -- Compile_Time_Bad_Alignment --
-  
-
-  procedure Compile_Time_Bad_Alignment is
-  begin
- if Address_Clause_Overlay_Warnings then
-Error_Msg_FE
-  ("?o?specified address for& may be inconsistent with alignment",
-   Aexp, E);
-Error_Msg_FE
-  ("\?o?program execution may be erroneous (RM 13.3(27))",
-   Aexp, E);
-Set_Address_Warning_Posted (AC);
- end if;
-  end Compile_Time_Bad_Alignment;
-
--  Start of processing for Apply_Address_Clause_Check
 
begin
@@ -690,44 +666,12 @@
 
   --  Obtain expression from address clause
 
-  Expr := Expression (AC);
+  Expr := Address_Value (Expression (AC));
 
-  --  The following loop digs for the real 

[Ada] New implementation of Ada.Containers.Unbounded_Priority_Queues

2016-06-22 Thread Arnaud Charlet
This patch uses O(lg N) algorithms for Unbounded_Priority_Queues.
No expected change in behavior; no test available.

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-22  Bob Duff  

* a-cuprqu.ads, a-cuprqu.adb: Completely rewrite this package. Use
red-black trees, which gives O(lg N) worst-case performance on
Enqueue and Dequeue. The previous version had O(N) Enqueue in
the worst case.
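
To see where a linear worst case comes from in a list-based queue, consider
the generic sketch below. It is not the removed Ada code (which kept extra
links to skip runs of equal priorities); it is a deliberately simplified,
hypothetical singly linked list kept in descending priority order, with
equal priorities served first-in first-out. The function returns how many
nodes it had to walk, which grows with the queue length when each new item
has a lower priority than everything already queued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node type for a priority-ordered list.  */
struct node {
    int priority;
    struct node *next;
};

/* Insert PRIORITY keeping descending order, after any equal priorities.
   Returns the number of nodes walked, which is N in the worst case.  */
size_t sorted_enqueue(struct node **head, int priority)
{
    struct node *n = malloc(sizeof *n);
    size_t walked = 0;

    assert(n != NULL);
    n->priority = priority;

    /* Skip every node that must stay ahead of the new one.  */
    while (*head != NULL && (*head)->priority >= priority) {
        head = &(*head)->next;
        walked++;
    }

    n->next = *head;
    *head = n;
    return walked;
}
```

A balanced tree such as the red-black tree named in the ChangeLog bounds the
same search by the height of the tree instead of the length of the list.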

Index: a-cuprqu.adb
===
--- a-cuprqu.adb(revision 237680)
+++ a-cuprqu.adb(working copy)
@@ -6,7 +6,7 @@
 --  --
 -- B o d y  --
 --  --
---Copyright (C) 2011-2015, Free Software Foundation, Inc.   --
+--Copyright (C) 2011-2016, Free Software Foundation, Inc.   --
 --  --
 -- GNAT is free software;  you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -27,225 +27,8 @@
 -- This unit was originally developed by Matthew J Heaney.  --
 --
 
-with Ada.Unchecked_Deallocation;
-
 package body Ada.Containers.Unbounded_Priority_Queues is
 
-   package body Implementation is
-
-  ---
-  -- Local Subprograms --
-  ---
-
-  function Before_Or_Equal (X, Y : Queue_Priority) return Boolean;
-  --  True if X is before or equal to Y. Equal means both Before(X,Y) and
-  --  Before(Y,X) are False.
-
-  procedure Free is
-new Ada.Unchecked_Deallocation (Node_Type, Node_Access);
-
-  -
-  -- Before_Or_Equal --
-  -
-
-  function Before_Or_Equal (X, Y : Queue_Priority) return Boolean is
-  begin
- return (if Before (X, Y) then True else not Before (Y, X));
-  end Before_Or_Equal;
-
-  -
-  -- Dequeue --
-  -
-
-  procedure Dequeue
-(List: in out List_Type;
- Element : out Queue_Interfaces.Element_Type)
-  is
- H : constant Node_Access := List.Header'Unchecked_Access;
- pragma Assert (List.Length /= 0);
- pragma Assert (List.Header.Next /= H);
- --  List can't be empty; see the barrier
-
- pragma Assert
-   (List.Header.Next.Next = H or else
-Before_Or_Equal (Get_Priority (List.Header.Next.Element),
- Get_Priority (List.Header.Next.Next.Element)));
- --  The first item is before-or-equal to the second
-
- pragma Assert
-   (List.Header.Next.Next_Unequal = H or else
-Before (Get_Priority (List.Header.Next.Element),
-Get_Priority (List.Header.Next.Next_Unequal.Element)));
- --  The first item is before its Next_Unequal item
-
- --  The highest-priority item is always first; just remove it and
- --  return that element.
-
- X : Node_Access := List.Header.Next;
-
-  --  Start of processing for Dequeue
-
-  begin
- Element := X.Element;
- X.Next.Prev := H;
- List.Header.Next := X.Next;
- List.Header.Next_Unequal := X.Next;
- List.Length := List.Length - 1;
- Free (X);
-  end Dequeue;
-
-  procedure Dequeue
-(List : in out List_Type;
- At_Least : Queue_Priority;
- Element  : in out Queue_Interfaces.Element_Type;
- Success  : out Boolean)
-  is
-  begin
- --  This operation dequeues a high priority item if it exists in the
- --  queue. By "high priority" we mean an item whose priority is equal
- --  or greater than the value At_Least. The generic formal operation
- --  Before has the meaning "has higher priority than". To dequeue an
- --  item (meaning that we return True as our Success value), we need
- --  as our predicate the equivalent of "has equal or higher priority
- --  than", but we cannot say that directly, so we require some logical
- --  gymnastics to make it so.
-
- --  If E is the element at the head of the queue, and symbol ">"
- --  refers to the "is higher priority than" function Before, then we
- --  derive our predicate as follows:
- --original: P(E) >= At_Least
- --same as:  not (P(E) < At_Least)
- --same as:  not (At_Least > P(E))
- --same as:  not Before (At_Least, P(E))
-
- --  But that predicate needs to be true in order to successfully
- --  dequeue an item. If it's false, it 

[Ada] Crash on illegal expression in context with predicate

2016-06-22 Thread Arnaud Charlet
This patch fixes a compiler abort on a return statement for a function whose
type is a derived type with a dynamic predicate, when the return expression
has the parent type.

Compiling gpr2-attribute.adb must yield:

   gpr2-attribute.adb:8:14:
 expected type "Qualified_Name" defined at gpr2-attribute.ads:12
   gpr2-attribute.adb:8:14: found type "Standard.String"

---
package GPR2 is

   subtype Name_Type is String
 with Dynamic_Predicate => Name_Type'Length > 0;

end GPR2;
--
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

package GPR2.Attribute is

   type Qualified_Name (<>) is private;

   function Create (Name : Name_Type) return Qualified_Name;

private

   type Qualified_Name is new Name_Type;

end GPR2.Attribute;
--

package body GPR2.Attribute is

   function Create (Name : Name_Type) return Qualified_Name is
   begin
  --  OK: return Qualified_Name (Name);
  --  with below code (missing conversion) GNAT crashes
  return Name;
   end Create;

end GPR2.Attribute;

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-22  Ed Schonberg  

* sem_ch13.adb (Is_Predicate_Static): An inherited predicate
can be static only if it applies to a scalar type.

Index: sem_ch13.adb
===
--- sem_ch13.adb(revision 237680)
+++ sem_ch13.adb(working copy)
@@ -8552,8 +8552,7 @@
 Expression => Expr;
 
 --  If declaration has not been analyzed yet, Insert declaration
---  before freeze node.
---  Insert body after freeze node.
+--  before freeze node.  Insert body itself after freeze node.
 
 if not Analyzed (FDecl) then
Insert_Before_And_Analyze (N, FDecl);
@@ -11644,9 +11643,11 @@
   --  to specify a static predicate for a subtype which is inheriting a
   --  dynamic predicate, so the static predicate validation here ignores
   --  the inherited predicate even if it is dynamic.
+  --  In all cases, a static predicate can only apply to a scalar type.
 
   elsif Nkind (Expr) = N_Function_Call
 and then Is_Predicate_Function (Entity (Name (Expr)))
+and then Is_Scalar_Type (Etype (First_Entity (Entity (Name (Expr)))))
   then
  return True;
 


[Ada] New debug switch -gnatd.o

2016-06-22 Thread Arnaud Charlet
This patch causes -gnatd.o to choose a more conservative elaboration order.
The following test should compile and run quietly.

gnatmake -q -f -gnatd.o -g -O0 elab_indirect_2-main -bargs -p -ws

with Elab_Indirect;
with Elab_Indirect.Child;
package body Elab_Indirect_2 is

   procedure P is
   begin
  null;
   end P;

   procedure Process_Line (Line : String) is
   begin
  Elab_Indirect.Child.Child_Proc;
   end Process_Line;

   procedure Q is
   begin
  Elab_Indirect.Process_Lines (Process_Line'Access);
   end Q;

begin
   Q;
end Elab_Indirect_2;
package Elab_Indirect_2 is

   procedure P;

end Elab_Indirect_2;
procedure Elab_Indirect_2.Main is
begin
   null;
end Elab_Indirect_2.Main;
package body Elab_Indirect is

   procedure Process_Lines
 (Process_Line : access procedure (Line : String)) is
   begin
  Process_Line ("Hello");
   end Process_Lines;

end Elab_Indirect;
package Elab_Indirect is

   procedure Process_Lines
 (Process_Line : access procedure (Line : String));

end Elab_Indirect;
with Text_IO; use Text_IO;
package body Elab_Indirect.Child is

   type String_Ref is access all String;
   Var : String_Ref := new String'("Hello world");

   procedure Child_Proc is
   begin
  if Var.all /= "Hello world" then
 raise Program_Error;
  end if;
   end Child_Proc;

end Elab_Indirect.Child;
package Elab_Indirect.Child is

   procedure Child_Proc;

end Elab_Indirect.Child;

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-22  Bob Duff  

* debug.adb: Document debug switch -gnatd.o.
* sem_elab.adb (Check_Internal_Call): Debug switch -gnatd.o
now causes a more conservative treatment of indirect calls,
treating P'Access as a call to P in more cases. We can't make
this the default, because it breaks common idioms, for example
the soft links.
* sem_util.adb: Add an Assert.

Index: debug.adb
===
--- debug.adb   (revision 237680)
+++ debug.adb   (working copy)
@@ -105,7 +105,7 @@
--  d.l  Use Ada 95 semantics for limited function returns
--  d.m  For -gnatl, print full source only for main unit
--  d.n  Print source file names
-   --  d.o
+   --  d.o  Conservative elaboration order for indirect calls
--  d.p
--  d.q
--  d.r  Enable OK_To_Reorder_Components in non-variant records
@@ -556,6 +556,9 @@
--   compiler has a bug -- these are the files that need to be included
--   in a bug report.
 
+   --  d.o  Conservative elaboration order for indirect calls. This causes
+   --   P'Access to be treated as a call in more cases.
+
--  d.r  Forces the flag OK_To_Reorder_Components to be set in all record
--   base types that have no discriminants.
 
Index: sem_util.adb
===
--- sem_util.adb(revision 237680)
+++ sem_util.adb(working copy)
@@ -6314,6 +6314,7 @@
  Encl_Unit := Library_Unit (Encl_Unit);
   end loop;
 
+  pragma Assert (Nkind (Encl_Unit) = N_Compilation_Unit);
   return Encl_Unit;
end Enclosing_Lib_Unit_Node;
 
Index: sem_elab.adb
===
--- sem_elab.adb(revision 237680)
+++ sem_elab.adb(working copy)
@@ -2139,7 +2139,8 @@
   --  node comes from source.
 
   if Nkind (N) = N_Attribute_Reference
-and then (not Warn_On_Elab_Access or else not Comes_From_Source (N))
+and then ((not Warn_On_Elab_Access and then not Debug_Flag_Dot_O)
+or else not Comes_From_Source (N))
   then
  return;
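
The effect of the changed condition in the sem_elab.adb hunk above is easier
to see as plain boolean functions. The C sketch below transcribes the old and
new tests for "skip this attribute reference"; the function and parameter
names are hypothetical and only the boolean structure is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Old test: skip P'Access unless the elaboration warning is enabled and
   the node comes from source.  */
bool skips_access_old(bool warn_on_elab_access, bool comes_from_source)
{
    return !warn_on_elab_access || !comes_from_source;
}

/* New test: the debug flag keeps source-level P'Access under scrutiny
   even when the warning is off.  */
bool skips_access_new(bool warn_on_elab_access, bool debug_flag_dot_o,
                      bool comes_from_source)
{
    return (!warn_on_elab_access && !debug_flag_dot_o) || !comes_from_source;
}

/* With the debug flag off the two conditions agree on every input.  */
void check_flag_off_is_unchanged(void)
{
    for (int w = 0; w <= 1; w++)
        for (int s = 0; s <= 1; s++)
            assert(skips_access_new(w, false, s) == skips_access_old(w, s));
}
```

The only inputs on which the two differ are those with the debug flag set,
the warning off and a node that comes from source: the old test skipped such
a node, the new one goes on to treat the access as a potential call.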
 


[Ada] Spurious error with predicate on type derived from unconstrained array

2016-06-22 Thread Arnaud Charlet
This patch fixes a spurious error on the compilation of a subprogram whose
formal parameter is derived from an unconstrained array type with a dynamic
predicate aspect.

The following must compile quietly:

   gcc -c gpr2-attribute.adb
   gcc -c -gnata gpr2-attribute.adb

---
package GPR2 is

   subtype Name_Type is String
 with Dynamic_Predicate => Name_Type'Length > 0;
end GPR2;
---
package GPR2.Attribute is

   type Qualified_Name (<>) is private;

   procedure Get (Q_Name : Qualified_Name);

private

   type Qualified_Name is new Name_Type;

end GPR2.Attribute;
--
package body GPR2.Attribute is

   procedure Get (Q_Name : Qualified_Name) is
  N : Name_Type := Name_Type (Q_Name);
   begin
  null;
   end Get;

end GPR2.Attribute;

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-06-22  Ed Schonberg  

* sem_ch6.adb (Set_Actual_Subtypes): If the type of the actual
has predicates, the actual subtype must be frozen properly
because of the generated tests that may follow.  The predicate
may be specified by an explicit aspect, or may be inherited in
a derivation.

Index: sem_ch6.adb
===
--- sem_ch6.adb (revision 237680)
+++ sem_ch6.adb (working copy)
@@ -11308,9 +11308,10 @@
  Freeze_Entity (Defining_Identifier (Decl), N));
 
 --  Ditto if the type has a dynamic predicate, because the
---  generated function will mention the actual subtype.
+--  generated function will mention the actual subtype. The
+--  predicate may come from an explicit aspect or be inherited.
 
-elsif Has_Dynamic_Predicate_Aspect (T) then
+elsif Has_Predicates (T) then
Insert_List_Before_And_Analyze (Decl,
  Freeze_Entity (Defining_Identifier (Decl), N));
 end if;


[PATCH][ARM] Add support for some ARMv8-A cores to driver-arm.c

2016-06-22 Thread Kyrill Tkachov

Hi all,

This patch adds entries to the arm_cpu_table in driver-arm.c to enable it to
perform native CPU detection on some AArch32 ARMv8-A systems. The cores added
are Cortex-A32, Cortex-A35, Cortex-A53, Cortex-A57, Cortex-A72 and Cortex-A73.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2016-06-22  Kyrylo Tkachov  

* config/arm/driver-arm.c (arm_cpu_table): Add entries for cortex-a32,
cortex-a35, cortex-a53, cortex-a57, cortex-a72, cortex-a73.
diff --git a/gcc/config/arm/driver-arm.c b/gcc/config/arm/driver-arm.c
index 95dc9d53b6c179946d62f45b2b0d4a21960405b8..45f2f2a1a1de748b3c3ee551945cfe1b8945bc72 100644
--- a/gcc/config/arm/driver-arm.c
+++ b/gcc/config/arm/driver-arm.c
@@ -46,6 +46,12 @@ static struct vendor_cpu arm_cpu_table[] = {
 {"0xc0d", "armv7ve", "cortex-a12"},
 {"0xc0e", "armv7ve", "cortex-a17"},
 {"0xc0f", "armv7ve", "cortex-a15"},
+{"0xd01", "armv8-a+crc", "cortex-a32"},
+{"0xd04", "armv8-a+crc", "cortex-a35"},
+{"0xd03", "armv8-a+crc", "cortex-a53"},
+{"0xd07", "armv8-a+crc", "cortex-a57"},
+{"0xd08", "armv8-a+crc", "cortex-a72"},
+{"0xd09", "armv8-a+crc", "cortex-a73"},
 {"0xc14", "armv7-r", "cortex-r4"},
 {"0xc15", "armv7-r", "cortex-r5"},
 {"0xc20", "armv6-m", "cortex-m0"},
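
As a rough illustration of how such a table can be consulted, here is a
self-contained C sketch. It assumes the CPU part number has already been
extracted as a string; the struct, the NULL terminator row and the
lookup_cpu_part function are hypothetical and are not the driver-arm.c
code. Only the six rows are taken from the diff above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of a table row: part number, architecture, CPU.  */
struct cpu_entry {
    const char *part_no;
    const char *arch_name;
    const char *cpu_name;
};

/* The six rows added by the patch, terminated by a NULL sentinel.  */
static const struct cpu_entry cpu_table[] = {
    {"0xd01", "armv8-a+crc", "cortex-a32"},
    {"0xd04", "armv8-a+crc", "cortex-a35"},
    {"0xd03", "armv8-a+crc", "cortex-a53"},
    {"0xd07", "armv8-a+crc", "cortex-a57"},
    {"0xd08", "armv8-a+crc", "cortex-a72"},
    {"0xd09", "armv8-a+crc", "cortex-a73"},
    {NULL, NULL, NULL}
};

/* Return the row whose part number equals PART_NO, or NULL if unknown.  */
const struct cpu_entry *lookup_cpu_part(const char *part_no)
{
    assert(part_no != NULL);
    for (const struct cpu_entry *e = cpu_table; e->part_no != NULL; e++)
        if (strcmp(e->part_no, part_no) == 0)
            return e;
    return NULL;
}
```

An unknown part number returns NULL, so a caller can fall back to a generic
default rather than guessing.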


Re: [patch] Fix problematic debug info for parameters at -O0

2016-06-22 Thread Eric Botcazou
> The testcase doesn't necessarily need to FAIL without the patch on x86, it
> is fine if it fails on some PowerPC* or Visium.

Well, the value of a guality test that isn't exercised on x86 is close to 0, 
but I can try on PowerPC indeed.

-- 
Eric Botcazou


Re: [patch] Fix problematic debug info for parameters at -O0

2016-06-22 Thread Jakub Jelinek
On Wed, Jun 22, 2016 at 11:21:44AM +0200, Eric Botcazou wrote:
> > Ok, even for branches I think, but would be nice to have a corresponding
> > guality testcase (perhaps just for -O0 with dg-skip-if) which fails without
> > this patch and succeeds with it.
> 
> Thanks.  The failure mode is that the offset from VIRTUAL_STACK_VARS_REGNUM
> is too large so, on RISC architectures, it is not legitimate and the address
> is rewritten.  Can this be mimicked on x86?

The testcase doesn't necessarily need to FAIL without the patch on x86, it
is fine if it fails on some PowerPC* or Visium.
I'd expect something like:

/* { dg-do run } */
/* { dg-options "-g" } */
/* { dg-skip-if "" { *-*-* }  { "*" } { "-O0" } } */

volatile int v;

rettype
foo (..., sometype arg1, sometype arg2, ...)
{
  // whatever needed to grow the stack frame enough
  v++;
  /* { dg-final { gdb-test  "arg1" "3" } } */
  /* { dg-final { gdb-test  "arg2" "4" } } */
}

int
main ()
{
  foo (..., 3, 4, ...);
  return 0;
}

might be enough.

Jakub


Re: [patch] Fix problematic debug info for parameters at -O0

2016-06-22 Thread Eric Botcazou
> Ok, even for branches I think, but would be nice to have a corresponding
> guality testcase (perhaps just for -O0 with dg-skip-if) which fails without
> this patch and succeeds with it.

Thanks.  The failure mode is that the offset from VIRTUAL_STACK_VARS_REGNUM is 
too large so, on RISC architectures, it is not legitimate and the address is 
rewritten.  Can this be mimicked on x86?

-- 
Eric Botcazou


Re: Unreviewed patches

2016-06-22 Thread Rainer Orth
Hi Jeff,

> On 06/06/2016 02:16 AM, Rainer Orth wrote:
>> The following patches have remained unreviewed for a week:
>>
>>  [gotools, libcc1] Update copyright dates
>> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02307.html
> Everything but the gotools changes is OK.  The master bits for gotools are
> outside of GCC.  Probably best to contact Ian for getting these updated.

I'd already installed the complete patch, based on 

https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00407.html

Since update-copyright.py only updates files with FSF copyright notices,
which are local to the gcc tree, I hope this should be safe.

>> Richard already approved the update-copyright.py changes, but the actual
>> effects on gotools and libcc1 require either maintainer or release
>> manager approval, I believe.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[patch] preserve DECL_ORIGINAL_TYPE invariant in remap_decl

2016-06-22 Thread Eric Botcazou
Hi,

the invariant is that DECL_ORIGINAL_TYPE (t) != TREE_TYPE (t) as pointer value 
if t is a TYPE_DECL.  It's enforced by the DWARF back-end:

  if (DECL_ORIGINAL_TYPE (decl))
{
  type = DECL_ORIGINAL_TYPE (decl);

  if (type == error_mark_node)
return;

  gcc_assert (type != TREE_TYPE (decl));

[...]

  /* Prevent broken recursion; we can't hand off to the same type.  */
  gcc_assert (DECL_ORIGINAL_TYPE (TYPE_NAME (type)) != type);

Unfortunately it can be easily broken in remap_decl:

  /* Remap types, if necessary.  */
  TREE_TYPE (t) = remap_type (TREE_TYPE (t), id);
  if (TREE_CODE (t) == TYPE_DECL)
DECL_ORIGINAL_TYPE (t) = remap_type (DECL_ORIGINAL_TYPE (t), id);

If TREE_TYPE (t) is for example a pointer to a variably-modified type, then 
the types are remapped by means of build_pointer_type_for_mode, which means 
that they are also canonicalized, so TREE_TYPE (t) == DECL_ORIGINAL_TYPE (t) 
after the remapping.  This happens in Ada, but also in C for:

extern void bar (void *) __attribute__((noreturn));

static int foo (int i, unsigned int n)
{
  if (i == 0)
{
  struct S { int a[n]; };
  typedef struct S *ptr;
  ptr p = __builtin_malloc (sizeof (struct S));
  bar (p);
}

  return i > 0 ? 1 : -1;
}

int f1 (int i, unsigned int n)
{
  return foo (i, n);
}

int f2 (int i, unsigned int n)
{
  return foo (i, n);
}

when foo is split into 2 parts at -O2.
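
The mechanism can be reproduced in miniature. The C sketch below is a toy
model under stated assumptions: struct type, build_pointer, variant_copy and
remap_typedef are hypothetical names, far simpler than GCC trees, and are
not the tree-inline.c code. It shows that when pointer types are built
through a canonicalizing constructor, remapping the two fields of a typedef
independently makes them the same node, and that taking a variant copy
restores the "different pointer value" invariant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy type node: hypothetical, far simpler than a GCC tree.  */
struct type {
    struct type *pointee;       /* non-NULL for pointer types */
    struct type *pointer_to;    /* cached canonical pointer to this type */
    struct type *main_variant;  /* variants of one type share this */
};

static struct type *new_type(void)
{
    struct type *t = calloc(1, sizeof *t);
    assert(t != NULL);
    t->main_variant = t;
    return t;
}

/* Canonicalizing constructor: at most one pointer type per pointee.  */
struct type *build_pointer(struct type *pointee)
{
    if (pointee->pointer_to == NULL) {
        pointee->pointer_to = new_type();
        pointee->pointer_to->pointee = pointee;
    }
    return pointee->pointer_to;
}

/* A distinct node that still describes the same type.  */
struct type *variant_copy(const struct type *t)
{
    struct type *v = new_type();
    v->pointee = t->pointee;
    v->main_variant = t->main_variant;
    return v;
}

/* Toy typedef declaration with its two type fields.  */
struct typedef_decl {
    struct type *type;
    struct type *original_type;
};

/* Remap both fields onto NEW_POINTEE, then restore the invariant that
   the two fields are never the same node.  */
void remap_typedef(struct typedef_decl *d, struct type *new_pointee)
{
    d->type = build_pointer(new_pointee);
    d->original_type = build_pointer(new_pointee);

    if (d->original_type == d->type)
        d->original_type = variant_copy(d->type);
}
```

Without the final check both fields would end up pointing at the single
canonical pointer node, which is the situation the assertions in the DWARF
back-end reject.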

This generally goes unnoticed because the inliner sets DECL_ABSTRACT_ORIGIN on 
the remapped TYPE_DECL, so gen_typedef_die skips it:

  type_die = new_die (DW_TAG_typedef, context_die, decl);
  origin = decl_ultimate_origin (decl);
  if (origin != NULL)
add_abstract_origin_attribute (type_die, origin);
  else
{
  tree type;

  add_name_and_src_coords_attributes (type_die, decl);
  if (DECL_ORIGINAL_TYPE (decl))
{
  type = DECL_ORIGINAL_TYPE (decl);

  if (type == error_mark_node)
return;

  gcc_assert (type != TREE_TYPE (decl));
  equate_type_number_to_die (TREE_TYPE (decl), type_die);
}

But, in LTO mode, DECL_ABSTRACT_ORIGIN is not streamed so it's another story 
and this for example breaks the LTO build of the Ada compiler at -O2 -g.

Hence the attached ad-hoc attempt at preserving the invariant in remap_decl, 
which appears to work and is sufficient to fix the aforementioned bootstrap.

Tested on x86_64-suse-linux, OK for the mainline?


2016-06-22  Eric Botcazou  

* tree-inline.c (remap_decl): Preserve DECL_ORIGINAL_TYPE invariant.

-- 
Eric Botcazou
Index: tree-inline.c
===
--- tree-inline.c	(revision 237677)
+++ tree-inline.c	(working copy)
@@ -367,7 +367,18 @@ remap_decl (tree decl, copy_body_data *i
   /* Remap types, if necessary.  */
   TREE_TYPE (t) = remap_type (TREE_TYPE (t), id);
   if (TREE_CODE (t) == TYPE_DECL)
-DECL_ORIGINAL_TYPE (t) = remap_type (DECL_ORIGINAL_TYPE (t), id);
+	{
+	  DECL_ORIGINAL_TYPE (t) = remap_type (DECL_ORIGINAL_TYPE (t), id);
+
+	  /* Preserve the invariant that DECL_ORIGINAL_TYPE != TREE_TYPE.  */
+	  if (DECL_ORIGINAL_TYPE (t) == TREE_TYPE (t))
+	{
+	  tree x = build_variant_type_copy (TREE_TYPE (t));
+	  TYPE_STUB_DECL (x) = TYPE_STUB_DECL (TREE_TYPE (t));
+	  TYPE_NAME (x) = t;
+	  DECL_ORIGINAL_TYPE (t) = x;
+	}
+	}
 
   /* Remap sizes as necessary.  */
   walk_tree (&DECL_SIZE (t), copy_tree_body_r, id, NULL);


Re: [patch] Fix problematic debug info for parameters at -O0

2016-06-22 Thread Jakub Jelinek
On Wed, Jun 22, 2016 at 10:52:32AM +0200, Eric Botcazou wrote:
> the fix for PR middle-end/61268:
>   https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=213002
> changed validize_mem to modify its argument in-place, which in turn means
> that emit_move_insn can do it too.  That's a little surprising, but I guess
> not really problematic in the end, except when the RTX is shared with the
> DECL_RTL of a DECL node, because this DECL_RTL is also used for the debug
> info at -O0; it's typically a MEM based on VIRTUAL_STACK_VARS_REGNUM
> initially but it can be changed into a MEM based on a pseudo which then can
> be assigned a volatile register by RA, in which case the debug info would be
> based on this volatile register instead of the frame pointer, which can
> break backtraces.
> 
> The attached patch fixes one of those cases in assign_parm_setup_reg, which 
> results in broken backtraces for Visium and PowerPC/EABI for some testcase.
> 
> Tested on x86_64-suse-linux, OK for the mainline?  What about the release 
> branches (5 & 6 are affected), is the issue worth fixing there too?
> 
> 
> 2016-06-22  Eric Botcazou  
> 
>   * function.c (assign_parm_setup_reg): Prevent sharing in another case.

Ok, even for branches I think, but would be nice to have a corresponding
guality testcase (perhaps just for -O0 with dg-skip-if) which fails without
this patch and succeeds with it.

Jakub


Re: [PATCH] Drop excess size used for run time allocated stack variables.

2016-06-22 Thread Dominik Vogt
On Tue, Jun 21, 2016 at 04:26:03PM -0600, Jeff Law wrote:
> On 06/21/2016 03:35 AM, Dominik Vogt wrote:
> >What do we do now with the two patches?  At the moment, the
> >functional patch depends on the changes in the cleanup patch, so
> >it cannot be applied on its own.  Options:
> >
> >(with the requested cleanup in the functional patch)
> >
> > 1) Apply both patches as they are now and do further cleanup on
> >top of it.
> > 2) Rewrite the functional patch so that it applies without the
> >cleanup patch and commit it now.
> > 3) Look into the suggested cleanup now and adapt the functional
> >patch to it when its ready.
> >
> >Actually I'd prefer (1) or (2) to just get the functional patch
> >off my desk.  I agree that the cleanup is very useful, but there's
> >not relation between the cleanup and the functional stuff except
> >that they touch the same code.  Having the functional patch
> >applied would simplify further work for me.
> I thought Eric had ack'd the cleanup patch with a comment fix, so
> that can move forward and presumably unblock your functional patch.
> Right?
> 
> So I think the TODO here is for me to fix the comment per Eric's
> review so that you can move forward.  The trick is getting it done
> before I go on PTO at the end of this week :-)

The comment fix is part of the version of the cleanup patch I
posted, but I've removed some more dead code.  I can handle all of
this if I know what to do exactly.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



[patch] Fix problematic debug info for parameters at -O0

2016-06-22 Thread Eric Botcazou
Hi,

the fix for PR middle-end/61268:
  https://gcc.gnu.org/viewcvs/gcc?view=revision=213002
changed validize_mem to modify its argument in-place, which in turn means 
that emit_move_insn can do it too.  That's a little surprising, but I guess 
not really problematic in the end, except when the RTX is shared with the 
DECL_RTL of a DECL node, because this DECL_RTL is also used for the debug info 
at -O0; it's typically a MEM based on VIRTUAL_STACK_VARS_REGNUM initially but 
it can be changed into a MEM based on a pseudo which then can be assigned a 
volatile register by RA, in which case the debug info would be based on this 
volatile register instead of the frame pointer, which can break backtraces.
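
To make the sharing hazard described above concrete, here is a minimal toy sketch in C. It is not GCC code: `toy_mem`, `toy_validize_in_place`, `toy_copy` and `debug_base_after_move` are hypothetical names invented for illustration, and the model only assumes what the text says, namely that the move may rewrite its destination in place and that the same object is later read for the debug info.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a MEM rtx: we only track the base of its address.  */
enum base_reg { FRAME_BASE, PSEUDO_REG };

struct toy_mem { enum base_reg base; };

/* Models a validize_mem-like routine that rewrites its argument in place
   (hypothetical helper, not the real GCC function).  */
static struct toy_mem *toy_validize_in_place (struct toy_mem *m)
{
  assert (m != NULL);
  m->base = PSEUDO_REG;
  return m;
}

/* Models copy_rtx: returns a fresh node with the same contents.  */
static struct toy_mem *toy_copy (const struct toy_mem *m)
{
  struct toy_mem *c = malloc (sizeof *c);
  assert (c != NULL);
  *c = *m;
  return c;
}

/* Returns the base that the "debug info" would see after a move into the
   declaration's MEM, with or without copying the destination first.  */
static enum base_reg debug_base_after_move (int copy_first)
{
  struct toy_mem decl_rtl = { FRAME_BASE };  /* shared with the DECL */
  struct toy_mem *dest = copy_first ? toy_copy (&decl_rtl) : &decl_rtl;

  toy_validize_in_place (dest);   /* the move legitimizes its operand */
  if (dest != &decl_rtl)
    free (dest);
  return decl_rtl.base;           /* what the debug info is based on */
}
```

Without the copy, the shared object ends up based on the pseudo; with the copy, it keeps the frame-based address, which is the effect the `MEM_P (parmreg) ? copy_rtx (parmreg) : parmreg` change in the patch below aims for.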

The attached patch fixes one of those cases in assign_parm_setup_reg, which 
results in broken backtraces for Visium and PowerPC/EABI for some testcase.

Tested on x86_64-suse-linux, OK for the mainline?  What about the release 
branches (5 & 6 are affected), is the issue worth fixing there too?


2016-06-22  Eric Botcazou  

* function.c (assign_parm_setup_reg): Prevent sharing in another case.

-- 
Eric Botcazou

Index: function.c
===
--- function.c	(revision 237677)
+++ function.c	(working copy)
@@ -3314,6 +3314,8 @@ assign_parm_setup_reg (struct assign_par
 	  set_mem_attributes (parmreg, parm, 1);
 	}
 
+  /* We need to preserve an address based on VIRTUAL_STACK_VARS_REGNUM for
+	 the debug info in case it is not legitimate.  */
   if (GET_MODE (parmreg) != GET_MODE (rtl))
 	{
 	  rtx tempreg = gen_reg_rtx (GET_MODE (rtl));
@@ -3323,7 +3325,8 @@ assign_parm_setup_reg (struct assign_par
 			 all->last_conversion_insn);
 	  emit_move_insn (tempreg, rtl);
 	  tempreg = convert_to_mode (GET_MODE (parmreg), tempreg, unsigned_p);
-	  emit_move_insn (parmreg, tempreg);
+	  emit_move_insn (MEM_P (parmreg) ? copy_rtx (parmreg) : parmreg,
+			  tempreg);
 	  all->first_conversion_insn = get_insns ();
 	  all->last_conversion_insn = get_last_insn ();
 	  end_sequence ();
@@ -3331,7 +3334,7 @@ assign_parm_setup_reg (struct assign_par
 	  did_conversion = true;
 	}
   else
-	emit_move_insn (parmreg, rtl);
+	emit_move_insn (MEM_P (parmreg) ? copy_rtx (parmreg) : parmreg, rtl);
 
   rtl = parmreg;
 


Re: [PATCH][AArch64] Add initial support for Cortex-A73

2016-06-22 Thread James Greenhalgh
On Wed, Jun 22, 2016 at 09:12:25AM +0100, Kyrill Tkachov wrote:
> Hi James,
> 
> On 21/06/16 17:38, James Greenhalgh wrote:
> >On Tue, Jun 21, 2016 at 04:55:43PM +0100, Kyrill Tkachov wrote:
> >>Hi all,
> >>
> >>This is a rebase of https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00403.html
> >>on top of Evandro's changes.
> >>Also, to elaborate on the original posting, the initial tuning structure is
> >>based on the Cortex-A57 one but with the issue rate set to 2, FMA steering
> >>turned off and ADRP+LDR fusion enabled.
> >I see you've also chosen to use the generic_branch_cost costs for
> >branches. As you didn't mention it explicitly here, was that intentional?
> >
> 
> Ah, that was copied from the Cortex-a72 tuning. I didn't spend any time
> experimenting with it.  generic_branch_costs should be good enough for the
> initial enablement.  I can change it to cortexa57_branch_cost if you'd like.
> Or we can do it separately later (I suspect Cortex-A72 should use those costs
> too.)

Yes, I'm more than happy for it to be a follow-up. We'll probably need
to revisit the settings for a few of the cores once the if-convert cost
model changes I've been working on go in anyway.

Thanks again,
James
 



Re: [PATCH][AArch64] Add initial support for Cortex-A73

2016-06-22 Thread Kyrill Tkachov

Hi James,

On 21/06/16 17:38, James Greenhalgh wrote:
> On Tue, Jun 21, 2016 at 04:55:43PM +0100, Kyrill Tkachov wrote:
> > Hi all,
> >
> > This is a rebase of https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00403.html
> > on top of Evandro's changes.
> > Also, to elaborate on the original posting, the initial tuning structure is
> > based on the Cortex-A57 one but with the issue rate set to 2, FMA steering
> > turned off and ADRP+LDR fusion enabled.
>
> I see you've also chosen to use the generic_branch_cost costs for
> branches. As you didn't mention it explicitly here, was that intentional?

Ah, that was copied from the Cortex-a72 tuning. I didn't spend any time 
experimenting with it.
generic_branch_costs should be good enough for the initial enablement.
I can change it to cortexa57_branch_cost if you'd like.
Or we can do it separately later (I suspect Cortex-A72 should use those costs 
too.)

> > Is this ok for trunk?
>
> This looks OK to me. Watch out for the conflict with the Broadcom Vulcan
> patch that was committed to trunk earlier today. The merge should be easy.
>
> Thanks for the patch!

Thanks, I'll rebase and commit it today.
Kyrill

> James
>
> > 2016-06-21  Kyrylo Tkachov  
> >
> >  * config/aarch64/aarch64.c (cortexa73_tunings): New struct.
> >  * config/aarch64/aarch64-cores.def (cortex-a73): New entry.
> >  (cortex-a73.cortex-a35): Likewise.
> >  (cortex-a73.cortex-a53): Likewise.
> >  * config/aarch64/aarch64-tune.md: Regenerate.
> >  * doc/invoke.texi (AArch64 Options): Document cortex-a73,
> >  cortex-a73.cortex-a35 and cortex-a73.cortex-a53 arguments to
> >  -mcpu and -mtune.




Re: [PATCH] Add a new target hook to compute the frame layout

2016-06-22 Thread Bernd Edlinger
On 06/21/16 23:29, Jeff Law wrote:
>
> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?

That I forgot to mention:  INITIAL_FRAME_POINTER_OFFSET is just
a single call, so whenever it is called from lra/reload the frame layout
is really expected to change, and so it does not make a difference if the target
computes the frame layout in TARGET_COMPUTE_FRAME_LAYOUT or in
INITIAL_FRAME_POINTER_OFFSET.

But I do not know of any targets that still use INITIAL_FRAME_POINTER_OFFSET,
and maybe support for this target hook could be discontinued as a follow-up 
patch.

What do you think?


Bernd.

Re: [patch, avr,wwwdocs] PR 58655

2016-06-22 Thread Pitchumani Sivanupandi

On Tuesday 21 June 2016 09:39 PM, Georg-Johann Lay wrote:
> Pitchumani Sivanupandi wrote:
> > Attached patches add documentation for -mfract-convert-truncate option
> > and add that info to release notes (gcc-4.9 changes).
> >
> > If OK, could someone commit please? I do not have commit access.
> >
> > Regards,
> > Pitchumani
> >
> > gcc/ChangeLog
> >
> > 2016-06-21  Pitchumani Sivanupandi 
> >
> > PR target/58655
> > * doc/invoke.texi (AVR Options): Document -mfract-convert-truncate
> > option.
> >
> > --- a/wwwdocs/htdocs/gcc-4.9/changes.html
> > +++ b/wwwdocs/htdocs/gcc-4.9/changes.html
> > @@ -579,6 +579,14 @@ auto incr(T x) { return x++; }
> > size when compiling for the M-profile processors.
> >   
> >   
> > +AVR
> > +
> > +  
> > +A new command-line option -mfract-convert-truncate has been added.
>
>  tags around the option.
>
> > +It allows compiler to use truncation instead of rounding towards
> > +0 for fractional int types.
>
> "zero" instead of "0", and it's for fixed-point types, not for int types.
>
> > +  
> > +
> >  IA-32/x86-64
> >    
> >  -mfpmath=sse is now implied by -ffast-math
> >
> > ...
> >
> >  @emph{Blackfin Options}
> >  @gccoptlist{-mcpu=@var{cpu}@r{[}-@var{sirevision}@r{]} @gol
> > @@ -14586,6 +14586,10 @@ sbiw r26, const   ; X -= const
> >  @opindex mtiny-stack
> >  Only change the lower 8@tie{}bits of the stack pointer.
> >
> > +@item -mfract-convert-truncate
> > +@opindex mfract-convert-truncate
> > +Allow to use truncation instead of rounding towards 0 for fractional int types.
>
> Same here: "zero" and "fixed-point".
>
> > +
> >  @item -nodevicelib
> >  @opindex nodevicelib
> >  Don't link against AVR-LibC's device specific library @code{lib.a}.



Thanks Johann.

Updated the patches.

Regards,
Pitchumani

gcc/ChangeLog

2016-06-22  Pitchumani Sivanupandi  

 PR target/58655
 * config/avr/avr.opt (-mfract-convert-truncate): Update description.
 * doc/invoke.texi (AVR Options): Document it.


diff --git a/gcc/config/avr/avr.opt b/gcc/config/avr/avr.opt
index 05aa4b6..1af792b 100644
--- a/gcc/config/avr/avr.opt
+++ b/gcc/config/avr/avr.opt
@@ -97,7 +97,7 @@ Warn if the ISR is misspelled, i.e. without __vector prefix. Enabled by default.
 
 mfract-convert-truncate
 Target Report Mask(FRACT_CONV_TRUNC)
-Allow to use truncation instead of rounding towards 0 for fractional int types.
+Allow to use truncation instead of rounding towards zero for fractional fixed-point types.
 
 nodevicelib
 Driver Target Report RejectNegative
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e000218..040fb6e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -643,8 +643,8 @@ Objective-C and Objective-C++ Dialects}.
 @emph{AVR Options}
 @gccoptlist{-mmcu=@var{mcu} -maccumulate-args -mbranch-cost=@var{cost} @gol
 -mcall-prologues -mint8 -mn_flash=@var{size} -mno-interrupts @gol
--mrelax -mrmw -mstrict-X -mtiny-stack -nodevicelib -Waddr-space-convert @gol
--Wmisspelled-isr}
+-mrelax -mrmw -mstrict-X -mtiny-stack -mfract-convert-truncate -nodevicelib @gol
+-Waddr-space-convert -Wmisspelled-isr}
 
 @emph{Blackfin Options}
 @gccoptlist{-mcpu=@var{cpu}@r{[}-@var{sirevision}@r{]} @gol
@@ -14586,6 +14586,10 @@ sbiw r26, const   ; X -= const
 @opindex mtiny-stack
 Only change the lower 8@tie{}bits of the stack pointer.
 
+@item -mfract-convert-truncate
+@opindex mfract-convert-truncate
+Allow to use truncation instead of rounding towards zero for fractional fixed-point types.
+
 @item -nodevicelib
 @opindex nodevicelib
 Don't link against AVR-LibC's device specific library @code{lib.a}.

--- a/wwwdocs/htdocs/gcc-4.9/changes.html
+++ b/wwwdocs/htdocs/gcc-4.9/changes.html
@@ -579,6 +579,14 @@ auto incr(T x) { return x++; }
size when compiling for the M-profile processors.
  
  
+AVR
+
+  
+A new command-line option -mfract-convert-truncate has been
+added. It allows compiler to use truncation instead of rounding towards
+zero for fractional fixed-point types.
+  
+
 IA-32/x86-64
   
 -mfpmath=sse is now implied by -ffast-math
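
As an illustration of the wording "truncation instead of rounding towards zero", the following C sketch contrasts the two behaviours when a signed fractional value loses its low 8 fractional bits. This is only one plausible reading of the option text and not the AVR back end's code: the function names and the choice of a 16-bit to 8-bit conversion are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Plain truncation: drop the low 8 fractional bits, i.e. floor the value,
   which is what an arithmetic right shift does on two's-complement
   hardware.  Written with division so it does not depend on the
   implementation-defined result of >> on negative operands.  */
static int8_t q15_to_q7_truncate (int16_t x)
{
  int32_t q = x / 256;          /* C99 division truncates toward zero */
  if (x % 256 < 0)
    q -= 1;                     /* correct toward negative infinity */
  assert (q >= INT8_MIN && q <= INT8_MAX);
  return (int8_t) q;
}

/* Rounding towards zero: negative values with discarded non-zero bits
   move up toward zero instead of down.  */
static int8_t q15_to_q7_toward_zero (int16_t x)
{
  int32_t q = x / 256;
  assert (q >= INT8_MIN && q <= INT8_MAX);
  return (int8_t) q;
}
```

The two functions agree for non-negative inputs and for negative inputs whose low 8 bits are zero; they differ by one unit in the last place otherwise, which is the extra correction step that plain truncation skips.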



Re: [PATCH 1/3] Disable libgcj and libgloss for Phoenix-RTOS targets.

2016-06-22 Thread Jakub Sejdak
The whole idea of this patch is to disable those things in newlib, but
they must synchronize this file with GCC.
So if merging this into trunk will be all they need, then I have no
need to merge this into release branches.

2016-06-21 22:08 GMT+02:00 Jeff Law :
> On 06/15/2016 08:22 AM, Kuba Sejdak wrote:
>>
>> This patch disables libgcj and libgloss in main configure.ac for new OS
>> port - Phoenix-RTOS.
>> Those libs are unnecessary to build GCC or newlib for arm-phoenix.
>>
>> Is it ok for trunk? If possible, please merge it also to
>> GCC-6 and GCC-5 branches.
>>
>> 2016-06-15  Jakub Sejdak  
>>
>> * configure.ac: Disable libgcj and libgloss for Phoenix-RTOS targets.
>> * configure: Regenerated.
>
> These are fine for the trunk.  Please go ahead and commit once your SVN
> write access is set up.
>
> We generally don't do feature enablement in release branches.  Jakub, Joseph
> or Richi would have to grant an exception for this to be accepted on the
> release branches.
>
> jeff
>



-- 
Jakub Sejdak
Software Engineer
Phoenix Systems (www.phoesys.com)
+48 608 050 163


Re: [PATCH] Fix ICE with invalid use of flags output operand

2016-06-22 Thread Uros Bizjak
Hello!

> this fixes an ICE that happens when an asm statement tries to print
> the flags output operand.

> gcc:
> 2016-06-11  Bernd Edlinger  
>
> * config/i386/i386.c (print_reg): Emit an error message on attempt to
> print FLAGS_REG.
>
> testsuite:
> 2016-06-11  Bernd Edlinger  
>
> * gcc.target/i386/asm-flag-7.c: New test.

OK.

Thanks,
Uros.


Re: [PATCH 1/2] gcc: Remove unneeded global flag.

2016-06-22 Thread Jakub Jelinek
On Tue, Jun 21, 2016 at 08:55:15PM -0600, Jeff Law wrote:
> user_defined_section_attribute was introduced as part of the hot/cold
> partitioning changes.
> 
> https://gcc.gnu.org/ml/gcc-patches/2004-07/msg01545.html
> 
> 
> What's supposed to happen is hot/cold partitioning is supposed to be turned
> off for the function which has a user defined section attribute.

The flag has been added before we had -funit-at-a-time, and IMNSHO it
couldn't work properly even when it has been introduced, because if you had
void foo (void) __attribute__((section ("...")));
void bar (void)
{
...
}
void foo (void)
{
...
}
then it would mistakenly apply to bar rather than foo.
So, either we can reconstruct whether the current function decl has user
section attribute, then perhaps during expansion we should set the flag
to that or use such a test directly in the gate (e.g. would lookup_attribute
("section", DECL_ATTRIBUTES (current_function_decl)) DTRT?) and drop the
flag, or we need some way to preserve that information per function.
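
The difference between the global flag and a per-function lookup can be sketched with a toy model in C. Nothing here is GCC code: `toy_decl`, `toy_has_attribute`, `toy_see_declaration` and the two `toy_compile_*` functions are hypothetical names, and the assumption that the flag is consumed by the next function body compiled is a simplification made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy function declaration carrying its own attribute list.  */
struct toy_decl {
  const char *name;
  const char *attrs[4];   /* NULL-terminated list of attribute names */
};

/* Per-declaration lookup, in the spirit of calling lookup_attribute on
   the current function's attributes.  */
static bool toy_has_attribute (const struct toy_decl *d, const char *attr)
{
  assert (d != NULL && attr != NULL);
  for (size_t i = 0; i < 4 && d->attrs[i] != NULL; i++)
    if (strcmp (d->attrs[i], attr) == 0)
      return true;
  return false;
}

/* Global-flag scheme: set when any declaration with a section attribute
   is seen, consumed by whichever function body is compiled next.  */
static bool toy_global_flag;

static void toy_see_declaration (const struct toy_decl *d)
{
  if (toy_has_attribute (d, "section"))
    toy_global_flag = true;
}

/* Returns true if partitioning would be disabled for D (flag scheme).  */
static bool toy_compile_with_flag (const struct toy_decl *d)
{
  (void) d;                       /* the flag knows nothing about D */
  bool disabled = toy_global_flag;
  toy_global_flag = false;
  return disabled;
}

/* Returns true if partitioning would be disabled for D (lookup scheme).  */
static bool toy_compile_with_lookup (const struct toy_decl *d)
{
  return toy_has_attribute (d, "section");
}
```

With foo declared first and bar defined before foo, the flag scheme disables partitioning for bar and not for foo, while the lookup scheme gets both right because the answer is derived from the declaration being compiled.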

Jakub