Re: [PATCH, rs6000] Fix AIX test case failures

2018-07-13 Thread David Edelsohn
On Fri, Jul 13, 2018 at 7:15 PM Carl Love  wrote:
>
> On Fri, 2018-07-13 at 16:00 -0500, Segher Boessenkool wrote:
> > On Fri, Jul 13, 2018 at 10:51:24AM -0400, David Edelsohn wrote:
> > > On AIX it would be calling divtc3, but AIX defaults to 64 bit long
> > > double.  Either all of these tests need
> > >
> > > /* { dg-require-effective-target longdouble128 } */
> > >
> > > or
> > >
> > > /* { dg-additional-options "-mlong-double-128" { target powerpc-ibm-aix* } } */
> > >
> > > along with testing for "tc", e.g., bl .__divtc3
> >
> > Which would you prefer David?  (I'd do the former).
> >
> >
> > Segher
> >
>
> Segher, David:
>
> I reworked the patch per the first option that David gave.  The tests
> divkc3-2.c, divkc3-3.c, mulkc3-2.c and mulkc3-3.c pass on Power 9 Linux
> as they did before.  The tests are unsupported on Power8 Linux as they
> were before.  Now, the tests are reported as unsupported on AIX rather
> than failing on AIX.
>
> Please let me know if you both approve the updated patch below.  Thanks
> for the input and help on this.
>
>Carl Love
>
> ---
>
> gcc/testsuite/ChangeLog:
>
> 2018-07-13  Carl Love  
>
> * gcc.target/powerpc/divkc3-2.c: Add dg-require-effective-target
> longdouble128.
> * gcc.target/powerpc/divkc3-3.c: Ditto.
> * gcc.target/powerpc/mulkc3-2.c: Ditto.
> * gcc.target/powerpc/mulkc3-3.c: Ditto.
> * gcc.target/powerpc/fold-vec-mergehl-double.c: Update counts.
> * gcc.target/powerpc/pr85456.c: Make check Linux and AIX specific.
> ---
>  gcc/testsuite/gcc.target/powerpc/divkc3-2.c                | 1 +
>  gcc/testsuite/gcc.target/powerpc/divkc3-3.c                | 1 +
>  gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c | 4 +---
>  gcc/testsuite/gcc.target/powerpc/mulkc3-2.c                | 1 +
>  gcc/testsuite/gcc.target/powerpc/mulkc3-3.c                | 1 +
>  gcc/testsuite/gcc.target/powerpc/pr85456.c                 | 3 ++-
>  6 files changed, 7 insertions(+), 4 deletions(-)

Hi, Carl

This is essentially what I have been testing with today.

This is okay.

Thanks, David


Re: [RFC] Induction variable candidates not sufficiently general

2018-07-13 Thread Bin.Cheng
On Fri, Jul 13, 2018 at 6:04 AM, Kelvin Nilsen  wrote:
> A somewhat old "issue report" pointed me to the code generated for a 4-fold 
> manually unrolled version of the following loop:
>
>>   while (++len != len_limit) /* this is loop */
>>     if (pb[len] != cur[len])
>>       break;
>
> As unrolled, the loop appears as:
>
>> while (++len != len_limit) /* this is loop */ {
>>   if (pb[len] != cur[len])
>>     break;
>>   if (++len == len_limit)  /* unrolled 2nd iteration */
>>     break;
>>   if (pb[len] != cur[len])
>>     break;
>>   if (++len == len_limit)  /* unrolled 3rd iteration */
>>     break;
>>   if (pb[len] != cur[len])
>>     break;
>>   if (++len == len_limit)  /* unrolled 4th iteration */
>>     break;
>>   if (pb[len] != cur[len])
>>     break;
>> }
>
> In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the only 
> induction variable candidates that are being considered are all forms of the 
> len variable.  We are not considering any induction variables to represent
> the address expressions &pb[len] and &cur[len].
>
> I rewrote the source code for this loop to make the addressing expressions 
> more explicit, as in the following:
>
>>   cur++;
>>   while (++pb != last_pb) /* this is loop */ {
>>     if (*pb != *cur)
>>       break;
>>     ++cur;
>>     if (++pb == last_pb)  /* unrolled 2nd iteration */
>>       break;
>>     if (*pb != *cur)
>>       break;
>>     ++cur;
>>     if (++pb == last_pb)  /* unrolled 3rd iteration */
>>       break;
>>     if (*pb != *cur)
>>       break;
>>     ++cur;
>>     if (++pb == last_pb)  /* unrolled 4th iteration */
>>       break;
>>     if (*pb != *cur)
>>       break;
>>     ++cur;
>>   }
>
> Now, gcc does a better job of identifying the "address expression induction 
> variables".  This version of the loop runs about 10% faster than the original 
> on my target architecture.
>
> This would seem to be a textbook pattern for the induction variable analysis.
> Does anyone have any thoughts on the best way to add these candidates to the
> set of induction variables that are considered by tree-ssa-loop-ivopts.c?
>
> Thanks in advance for any suggestions.
>
Hi,
Could you please file a bug with your original slow test code
attached?  I tried to construct a meaningful test case from your code
snippet but was not successful.  There is a difference in the generated
assembly, but it's not that fundamental.  So a bug with a preprocessed
test would be highly appreciated.
I think there are two potential issues in cost computation for such
a case: invariant expressions, and iv uses outside of the loop being
handled as inside uses.

Thanks,
bin
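A simplified, non-unrolled sketch of the two loop shapes discussed in this thread, assuming the loop is entered with len < len_limit and that both buffers hold at least len_limit bytes. The function names are hypothetical. The first form recomputes pb + len and cur + len from the single induction variable len; the second carries both addresses as their own pointer induction variables, which is the shape of Kelvin's manual rewrite.

```c
#include <stddef.h>

/* Index form, as in the original loop: returns the first len after the
   starting value where pb[len] != cur[len], or len_limit if none.  */
static size_t
match_len_index (const unsigned char *pb, const unsigned char *cur,
                 size_t len, size_t len_limit)
{
  while (++len != len_limit)
    if (pb[len] != cur[len])
      break;
  return len;
}

/* Pointer form: the addresses pb + len and cur + len are advanced as
   separate induction variables instead of being recomputed from len.  */
static size_t
match_len_ptr (const unsigned char *pb, const unsigned char *cur,
               size_t len, size_t len_limit)
{
  const unsigned char *p = pb + len;
  const unsigned char *c = cur + len;
  const unsigned char *last = pb + len_limit;
  while (++p != last)
    {
      ++c;
      if (*p != *c)
        break;
    }
  return (size_t) (p - pb);
}
```

Both functions return the same length for the same inputs; for "abcdefgh" against "abcdXfgh" starting at len 0 with a limit of 8, both stop at the mismatch at index 4.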


Re: [PATCH] [v2][aarch64] Avoid tag collisions for loads falkor

2018-07-13 Thread Siddhesh Poyarekar

On 07/13/2018 06:32 PM, Kyrill Tkachov wrote:

This looks good to me modulo a couple of minor comments inline.
You'll still need an approval from a maintainer.


Thanks, I'll send a fixed up version on Monday.

+  for (ause= DF_REG_USE_CHAIN (regno); ause; ause = DF_REF_NEXT_REG (ause))

+    {


Space after ause


OK.


+  /* Falkor does not support SVE vectors.  */
+  gcc_assert (GET_MODE_SIZE (mode).is_constant ());
+


I think this will blow up if someone tries compiling for SVE 
(-march=armv8.2-a+sve for example)
with -mtune=falkor. We don't want to crash then. I believe you just want 
to bail out of the optimisation by returning false.

You should update the comment in tag () to reflect this as well.


OK.

Thanks,
Siddhesh
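A minimal sketch of the suggested change: report failure instead of asserting, so the caller can skip the optimization when the mode size is not a compile-time constant (as with SVE modes under -mtune=falkor). In GCC the size comes from GET_MODE_SIZE (mode) and its .is_constant () check, as in the patch; struct mode_size and get_load_size below are hypothetical stand-ins so the example compiles on its own.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a machine-mode size: fixed for scalar and
   Advanced SIMD modes, runtime-variable for SVE modes.  */
struct mode_size
{
  bool is_constant;
  unsigned bytes;   /* Only meaningful when is_constant is true.  */
};

/* Instead of gcc_assert on a constant size, return false so that the
   caller bails out of the optimization for this load.  */
static bool
get_load_size (struct mode_size size, unsigned *bytes)
{
  if (!size.is_constant)
    return false;
  *bytes = size.bytes;
  return true;
}
```

The caller would then return false from the tagging routine when get_load_size fails, leaving the insn untouched rather than crashing the compiler.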


Re: [PATCH] restore -Warray-bounds for string literals (PR 83776)

2018-07-13 Thread Martin Sebor

On 05/02/2018 01:22 AM, Richard Biener wrote:

On Fri, Jan 26, 2018 at 3:16 AM, Martin Sebor  wrote:

PR tree-optimization/83776 - [6/7/8 Regression] missing
-Warray-bounds indexing past the end of a string literal,
identified a not-so-recent improvement to constant propagation
as the reason for GCC no longer being able to detect out-of-
bounds accesses to string literals.  The root cause is that
the change caused accesses to strings to be transformed into
MEM_REFs that the -Warray-bounds checker isn't prepared to
handle.  A simple example is:

  int h (void)
  {
    const char *p = "1234";
    return p[16];   // missing -Warray-bounds
  }

To fix the regression the attached patch extends the array bounds
checker to handle the small subset of MEM_REF expressions that
refer to string literals but stops short of doing more than
that.  There are outstanding gaps in the detection that the patch
intentionally doesn't handle.  They are either caused by other
regressions (PR 84047) or by other latent bugs/limitations, or
by limitations in the approach I took to try to keep the patch
simple.  I hope to address some of those in a follow-up patch
for GCC 9.


+  gimple *def = SSA_NAME_DEF_STMT (arg);
+  if (!is_gimple_assign (def))
+   {
+ if (tree var = SSA_NAME_VAR (arg))
+   arg = var;

this is never correct.  Whether sth has an underlying VAR_DECL or not
is irrelevant.  It looks like you'll always take the

+  else
+return;

path then anyways.  So why obfuscate the code this way?


I have removed the code.  It was a vestige of something that
didn't pan out and I didn't notice.



+  offset_int ofr[] = {
+   wi::to_offset (fold_convert (ptrdiff_type_node, vr->min)),
+   wi::to_offset (fold_convert (ptrdiff_type_node, vr->max))
+  };

huh.  Do you maybe want to use widest_int for ofr[]?  What's
wrong with wi::to_offset (vr->min)?  Building another intermediate
tree node here looks just bogus.


I need to convert size_type indices to signed to keep their sign
if it's negative and include it in warnings.  I've moved the code
into a conditional where it's used to minimize the cost but other
than that I don't know how else to convert it.



What are you trying to do in this loop anyways?


The loop determines the range of the final index/offset for
a series of POINTER_PLUS_EXPRs.  It handles cases like this:

  int g (int i, int j, int k)
  {
    if (i < 1) i = 1;
    if (j < 1) j = 1;
    if (k < 1) k = 1;

    const char *p0 = "123";
    const char *p1 = p0 + i;
    const char *p2 = p1 + j;
    const char *p3 = p2 + k;

    // p2[3] and p3[1] are out of bounds
    return p0[4] + p1[3] + p2[2] + p3[1];
  }
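A simplified model of the range accumulation described above for g (): each POINTER_PLUS_EXPR contributes a [min, max] offset range, the ranges are summed with saturation, and an access is flagged only when even its smallest possible offset is past the end of the object (or its largest is negative). This is plain C with hypothetical names (struct off_range, sat_add, accumulate_offsets, definitely_out_of_bounds), and INT64_MAX stands in for an unbounded upper end; the actual patch works on offset_int values inside tree-vrp.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* A closed range [lo, hi] of byte offsets.  */
struct off_range
{
  int64_t lo, hi;
};

/* Saturating add, so that summing large ranges cannot overflow.  */
static int64_t
sat_add (int64_t a, int64_t b)
{
  if (b > 0 && a > INT64_MAX - b)
    return INT64_MAX;
  if (b < 0 && a < INT64_MIN - b)
    return INT64_MIN;
  return a + b;
}

/* Accumulate the ranges of N successive pointer additions
   (p1 = p0 + i; p2 = p1 + j; ...).  */
static struct off_range
accumulate_offsets (const struct off_range *steps, int n)
{
  struct off_range total = { 0, 0 };
  for (int k = 0; k < n; ++k)
    {
      total.lo = sat_add (total.lo, steps[k].lo);
      total.hi = sat_add (total.hi, steps[k].hi);
    }
  return total;
}

/* An access BASE[IDX] into an object of SIZE bytes is definitely out of
   bounds if the smallest possible offset is past the last byte, or the
   largest possible offset is negative.  */
static bool
definitely_out_of_bounds (struct off_range base, int64_t idx, int64_t size)
{
  int64_t lo = sat_add (base.lo, idx);
  int64_t hi = sat_add (base.hi, idx);
  return lo >= size || hi < 0;
}
```

For g (), i, j >= 1 gives p2 an offset range of [2, INT64_MAX], so p2[2] starts at offset 4, past the last byte of the 4-byte "123"; in h (), p[16] into the 5-byte "1234" is flagged the same way with a base range of [0, 0].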


I suppose
look at

  p_1 = [i_2];  // already bounds-checked, but with ignore_off_by_one
  ... = MEM[p_1 + CST];

?  But then

+  if (TREE_CODE (varoff) != SSA_NAME)
+   break;

you should at least handle INTEGER_CSTs here?


It's already handled (and set in CSTOFF).  There should be no
more constant offsets after that (I haven't come across any.)



+  if (!vr || vr->type == VR_UNDEFINED || !vr->min || !vr->max)
+   break;

please use positive tests here, VR_RANGE || VR_ANTI_RANGE.  As usual
the anti-range handling looks odd.  Please add comments so we can follow
what you were thinking when writing range merging code.  Even better if you
can stick to use existing code rather than always re-inventing the wheel...


The anti-range handling is to conservatively add
[-MAXOBJSIZE -1, MAXOBJSIZE] to OFFRANGE.  I've added comments
to make it clear.  I'd be more than happy to reuse existing
code if I knew where to find it (if it exists).  It sure would
save me lots of work to have a library of utility functions to
call instead of rolling my own code each time.



I think I commented on earlier variants but this doesn't seem to resemble
any of them.


I've reworked the patch (sorry) to also handle arrays.  For GCC
9 it seems I might as well do both in one go.

Attached is an updated patch with these changes.

Martin
PR tree-optimization/84047 - missing -Warray-bounds on an out-of-bounds index into an array
PR tree-optimization/83776 - missing -Warray-bounds indexing past the end of a string literal

gcc/ChangeLog:

	PR tree-optimization/84047
	PR tree-optimization/83776
	* tree-vrp.c (vrp_prop::check_mem_ref): New function.
	(check_array_bounds): Call it.
	* tree-sra.c (get_access_for_expr): Fail for out-of-bounds
	array offsets.

gcc/testsuite/ChangeLog:

	PR tree-optimization/83776
	PR tree-optimization/84047
	* gcc.dg/Warray-bounds-29.c: New test.
	* gcc.dg/Warray-bounds-30.c: New test.
	* gcc.dg/Warray-bounds-31.c: New test.
	* gcc.dg/Warray-bounds-32.c: New test.

diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-29.c b/gcc/testsuite/gcc.dg/Warray-bounds-29.c
new file mode 100644
index 000..72c5d1c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-29.c
@@ -0,0 +1,150 @@
+/* PR tree-optimization/83776: missing -Warray-bounds indexing 

Re: [PATCH, rs6000] Fix AIX test case failures

2018-07-13 Thread Carl Love
On Fri, 2018-07-13 at 16:00 -0500, Segher Boessenkool wrote:
> On Fri, Jul 13, 2018 at 10:51:24AM -0400, David Edelsohn wrote:
> > On AIX it would be calling divtc3, but AIX defaults to 64 bit long
> > double.  Either all of these tests need
> > 
> > /* { dg-require-effective-target longdouble128 } */
> > 
> > or
> > 
> > /* { dg-additional-options "-mlong-double-128" { target powerpc-ibm-aix* } } */
> > 
> > along with testing for "tc", e.g., bl .__divtc3
> 
> Which would you prefer David?  (I'd do the former).
> 
> 
> Segher
> 

Segher, David:

I reworked the patch per the first option that David gave.  The tests
divkc3-2.c, divkc3-3.c, mulkc3-2.c and mulkc3-3.c pass on Power 9 Linux
as they did before.  The tests are unsupported on Power8 Linux as they
were before.  Now, the tests are reported as unsupported on AIX rather
than failing on AIX.

Please let me know if you both approve the updated patch below.  Thanks
for the input and help on this.

   Carl Love

---

gcc/testsuite/ChangeLog:

2018-07-13  Carl Love  

* gcc.target/powerpc/divkc3-2.c: Add dg-require-effective-target
longdouble128.
* gcc.target/powerpc/divkc3-3.c: Ditto.
* gcc.target/powerpc/mulkc3-2.c: Ditto.
* gcc.target/powerpc/mulkc3-3.c: Ditto.
* gcc.target/powerpc/fold-vec-mergehl-double.c: Update counts.
* gcc.target/powerpc/pr85456.c: Make check Linux and AIX specific.
---
 gcc/testsuite/gcc.target/powerpc/divkc3-2.c                | 1 +
 gcc/testsuite/gcc.target/powerpc/divkc3-3.c                | 1 +
 gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c | 4 +---
 gcc/testsuite/gcc.target/powerpc/mulkc3-2.c                | 1 +
 gcc/testsuite/gcc.target/powerpc/mulkc3-3.c                | 1 +
 gcc/testsuite/gcc.target/powerpc/pr85456.c                 | 3 ++-
 6 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/divkc3-2.c b/gcc/testsuite/gcc.target/powerpc/divkc3-2.c
index d3fcbedac..e34ed40ba 100644
--- a/gcc/testsuite/gcc.target/powerpc/divkc3-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/divkc3-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
 /* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */
 
 /* Check that complex multiply generates the right call when long double is
diff --git a/gcc/testsuite/gcc.target/powerpc/divkc3-3.c b/gcc/testsuite/gcc.target/powerpc/divkc3-3.c
index 45695fef8..c0fda8b24 100644
--- a/gcc/testsuite/gcc.target/powerpc/divkc3-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/divkc3-3.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
 /* { dg-options "-O2 -mpower8-vector -mabi=ibmlongdouble -Wno-psabi" } */
 
 /* Check that complex multiply generates the right call when long double is
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c
index 25f4bc6aa..14f944817 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-mergehl-double.c
@@ -19,7 +19,5 @@ testd_h (vector double vd2, vector double vd3)
   return vec_mergeh (vd2, vd3);
 }
 
-/* vec_merge with doubles tend to just use xxpermdi (3 ea for BE, 1 ea for LE).  */
-/* { dg-final { scan-assembler-times "xxpermdi" 2  { target { powerpc*le-*-* } }} } */
-/* { dg-final { scan-assembler-times "xxpermdi" 6  { target { powerpc-*-* } }     } } */
+/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c b/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c
index 9ba577a0c..eee6de9e2 100644
--- a/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/mulkc3-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
 /* { dg-options "-O2 -mpower8-vector -mabi=ieeelongdouble -Wno-psabi" } */
 
 /* Check that complex multiply generates the right call when long double is
diff --git a/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c b/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c
index db8730158..b6d2bdf73 100644
--- a/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/mulkc3-3.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target longdouble128 } */
 /* { dg-options "-O2 -mpower8-vector -mabi=ibmlongdouble -Wno-psabi" } */
 
 /* Check that complex multiply generates the right call when long double is
diff 

Re: [PATCH] restore -Warray-bounds for string literals (PR 83776)

2018-07-13 Thread Martin Sebor

+/* Checks one MEM_REF in REF, located at LOCATION, for out-of-bounds
+   references to string constants.  If VRP can determine that the array
+   subscript is a constant, check if it is outside valid range.
+   If the array subscript is a RANGE, warn if it is non-overlapping
+   with valid range.
+   IGNORE_OFF_BY_ONE is true if the MEM_REF is inside an ADDR_EXPR.  */

This function doesn't have IGNORE_OFF_BY_ONE as a parameter.  Drop it
from the comment.


In the latest update (yet to be posted) the function uses it.


+  /* Determine the offsets and increment OFFRANGE for the bounds of each.  */
+  while (TREE_CODE (arg) == SSA_NAME)
+{
+  gimple *def = SSA_NAME_DEF_STMT (arg);
+  if (!is_gimple_assign (def))
+   {
+ if (tree var = SSA_NAME_VAR (arg))
+   arg = var;
+ break;
+   }

What's the point of looking at the underlying SSA_NAME_VAR here? I can't
see how that's ever helpful.  You'll always exit the loop at this point
which does something like

if (TREE_CODE (arg) == ADDR_EXPR)
  {
 do something interesting
  }
else
  return;

ISTM that any time you dig into SSA_NAME_VAR (arg) what you're going to
get back is some kind of _DECL node -- I'm not aware of a case where
you're going to get back an ADDR_EXPR.


I have removed the code.  It was a vestige of something that
didn't work out and I didn't notice.


+
+  tree_code code = gimple_assign_rhs_code (def);
+  if (code == POINTER_PLUS_EXPR)
+   {
+ arg = gimple_assign_rhs1 (def);
+ varoff = gimple_assign_rhs2 (def);
+   }
+  else if (code == ASSERT_EXPR)
+   {
+ arg = TREE_OPERAND (gimple_assign_rhs1 (def), 0);
+ continue;
+   }
+  else
+   return;
+
+  if (TREE_CODE (varoff) != SSA_NAME)
+   break;
+
+  vr = get_value_range (varoff);
+  if (!vr || vr->type == VR_UNDEFINED || !vr->min || !vr->max)
+   break;

Doesn't this let VR_ANTI_RANGE through?  Why not instead

  if (!vr || vr->type != VR_RANGE || !vr->min || !vr->max)

?  I'm having trouble convincing myself the subsequent code will DTRT
for an anti-range.


The anti-range code adds PTRDIFF_MIN and PTRDIFF_MAX to
the offset (that's what ARRBOUNDS is initially set to, until
we have found the array that's being dereferenced).

Because of the loop, bailing out for an anti-range could be too
early (the subsequent iterations might compensate for
the conservative anti-range handling and find a bug in
offsets added later).  It's unlikely, but so are all bugs,
so I try to handle even corner cases.


+
+  if (TREE_CODE (vr->min) != INTEGER_CST
+  || TREE_CODE (vr->max) != INTEGER_CST)
+break;
+
+  offset_int ofr[] = {
+   wi::to_offset (fold_convert (ptrdiff_type_node, vr->min)),
+   wi::to_offset (fold_convert (ptrdiff_type_node, vr->max))
+  };
+
+  if (vr->type == VR_RANGE)
+   {
+ if (ofr[0] < ofr[1])
+   {
+ offrange[0] += ofr[0];
+ offrange[1] += ofr[1];
+   }
+ else
+   {
+ offrange[0] += strbounds[0];
+ offrange[1] += strbounds[1];
+   }

When can the ELSE clause above happen for a VR_RANGE?


For a maximum in excess of PTRDIFF_MAX and a non-negative
minimum that's less than that.
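Martin's answer can be illustrated in plain C: reinterpreting an unsigned index range as signed preserves the sign of a negative index, but when only the maximum exceeds PTRDIFF_MAX the converted bounds come out inverted (min > max), which is when a conservative range has to be used instead. This is a sketch assuming size_t and ptrdiff_t have the same width (true on common targets); the helper names are hypothetical, and unlike the patch's ofr[0] < ofr[1] test it only falls back when the bounds are strictly inverted.

```c
#include <stddef.h>
#include <stdint.h>

struct offset_range
{
  ptrdiff_t lo, hi;
};

/* Reinterpret an unsigned size as a signed offset with two's-complement
   wraparound, without relying on an implementation-defined cast.
   Assumes size_t and ptrdiff_t have the same width.  */
static ptrdiff_t
size_to_offset (size_t v)
{
  if (v <= (size_t) PTRDIFF_MAX)
    return (ptrdiff_t) v;
  return (ptrdiff_t) (v - (size_t) PTRDIFF_MAX - 1) + PTRDIFF_MIN;
}

/* Convert an unsigned [min, max] range to signed offsets.  A maximum above
   PTRDIFF_MAX with a minimum below it wraps to an inverted range; fall
   back to the conservative range in that case.  */
static struct offset_range
convert_range (size_t min, size_t max, struct offset_range conservative)
{
  struct offset_range r = { size_to_offset (min), size_to_offset (max) };
  if (r.lo > r.hi)
    return conservative;
  return r;
}
```

A range such as [1, SIZE_MAX] converts to [1, -1] and is replaced by the conservative bounds, while a genuinely negative index range such as [(size_t) -4, (size_t) -1] keeps its sign as [-4, -1].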


+  /* At level 2 check also intermediate offsets.  */
+  int i = 0;
+  if (extrema[i] < -strbounds[1]
+  || extrema[i = 1] > strbounds[1] + eltsize)
+{
+  HOST_WIDE_INT tmpidx = extrema[i].to_shwi () / eltsize.to_shwi ();
+
+  warning_at (location, OPT_Warray_bounds,
+ "intermediate array offset %wi is outside array bounds "
+ "of %qT",
+ tmpidx,  strtype);
+  TREE_NO_WARNING (ref) = 1;
+}
+}

This seems ill-advised.  All that matters is the actual index used in
the dereference.  Intermediate values (ie, address computations) along
the way are uninteresting -- we may form an address out of the bounds of
the array as an intermediate computation.  But the actual memory
reference will be within the range.


The idea is to help detect bugs that cannot be detected otherwise.
Take the example below:

  void g (int i)
  {
    if (i < 1 || 2 < i)
      i = 1;

    const char *p1 = "12" + i;
    const char *p2 = p1 + i;

    extern int x, y;

    x = p2[-4];   // #1: only valid if p2 is out of bounds
    y = p2[0];    // #2: therefore this must be out of bounds
  }

For the dereference at #1 to be valid i must be 2 (i.e.,
the upper bound of its range) and so p2 is therefore out
of bounds.

We may be able to compensate for it and compute the right
address at #1 but if the out-of-bounds value is used in
another dereference that makes a different assumption
(such as #2) one of the two is definitely wrong.  It might
be possible to do something smarter and determine if the
pointer really is used this way but I didn't want to
complicate things too much, so I put the logic under
level 2.
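The arithmetic behind the g (int i) example with p1 = "12" + i can be checked exhaustively, since i is confined to [1, 2] and p2 lands at byte offset 2 * i of the 3-byte array "12". The names below are hypothetical; the sketch only enumerates the cases Martin describes.

```c
#include <stdbool.h>

/* "12" occupies 3 bytes: '1', '2' and the terminating NUL.  */
enum { STR_SIZE = 3 };

/* p1 = "12" + i and p2 = p1 + i, so p2 is at byte offset 2 * i.  */
static int
p2_offset (int i)
{
  return 2 * i;
}

/* A dereference is valid only at offsets 0 .. STR_SIZE - 1.  */
static bool
deref_ok (int offset)
{
  return offset >= 0 && offset < STR_SIZE;
}

/* A pointer may point at most one past the end of the array.  */
static bool
pointer_ok (int offset)
{
  return offset >= 0 && offset <= STR_SIZE;
}

/* True if both p2[-4] (#1) and p2[0] (#2) are valid for this I.  */
static bool
both_valid (int i)
{
  return deref_ok (p2_offset (i) - 4) && deref_ok (p2_offset (i));
}
```

For i == 1 only #2 is valid; for i == 2 only #1 is valid, and p2 itself is then at offset 4, beyond the one-past-the-end position 3. No value of i makes both dereferences valid.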

I 

Re: [PATCH, rs6000] Fix AIX test case failures

2018-07-13 Thread Segher Boessenkool
On Fri, Jul 13, 2018 at 10:51:24AM -0400, David Edelsohn wrote:
> On AIX it would be calling divtc3, but AIX defaults to 64 bit long
> double.  Either all of these tests need
> 
> /* { dg-require-effective-target longdouble128 } */
> 
> or
> 
> /* { dg-additional-options "-mlong-double-128" { target powerpc-ibm-aix* } } */
> 
> along with testing for "tc", e.g., bl .__divtc3

Which would you prefer David?  (I'd do the former).


Segher


[PATCH], Remove undocumented -mtoc-fusion from PowerPC

2018-07-13 Thread Michael Meissner
Back in the days when I was developing the extended fusion support for PowerPC
(-mpower9-fusion), I added a partially implemented option called toc fusion.
The idea was to recognize TOC entries (that normally get split into HIGH/LO_SUM
pairs) early on, and keep the pairs together.  Unfortunately, I messed up the
setting, and you could not actually use -mtoc-fusion without also setting
-mcmodel=medium, since the TOC fusion tests in rs6000.c occurred before the
default code model was set in SUBSUBTARGET_OPTIONS.  However, I stopped doing
fusion work to do other things (basic power9 enablement and IEEE 128-bit
floating point).

While it would be simple to move the tests for TOC fusion to after the location
where the code model is set, I'm thinking that the current code is rather
limited.  Right now, toc fusion replaces each TOC reference with a new insn
that has the scratch register as a clobber.  However, if you have multiple
references to the same variable (such as doing the ++/-- operators) in a basic
block, or references to variables located near a variable you previously
referenced, we will generate multiple ADDIS operations.

I have ideas on how to do a better job of fusion for current and future
machines, using a machine-dependent pass to do fusion optimizations within a
basic block.  So rather than keeping the toc fusion code around (which nobody
used), I would prefer to delete the current code and replace it with better
code as I implement it.

I have tested this on a power8 little endian system with a bootstrap build and
with make check.  There were no regressions.  In addition, I built the full
SPEC CPU 2006 benchmark suite for power9 to make sure I didn't accidentally
delete insns that are used for -mpower9-fusion.  Can I check this into the
trunk?  I don't anticipate that we will need a backport to the FSF GCC 8
branch.

2018-07-13  Michael Meissner  

* config/rs6000/constraints.md (wG constraint): Delete, no longer
used.
* config/rs6000/predicates.md (p9_fusion_reg_operand): Rename
predicate to reflect toc fusion has been deleted.
(toc_fusion_mem_raw): Delete, no longer used.
(toc_fusion_mem_wrapped): Likewise.
* config/rs6000/rs6000-cpus.def (POWERPC_MASKS): Delete toc
fusion mask bit.
* config/rs6000/rs6000-protos.h (fusion_wrap_memory_address):
Delete, no longer used.
* config/rs6000/rs6000.c (struct rs6000_reg_addr): Delete fields
meant to be used for toc fusion.
(rs6000_debug_print_mode): Delete toc fusion debugging.
(rs6000_debug_reg_global): Likewise.
(rs6000_init_hard_regno_mode_ok): Delete setting up fields for toc
fusion and secondary reload support that were never used.
(rs6000_option_override_internal): Delete TOC fusion, that was only
partially defined, and it did not work unless you also used the
-mcmodel= switch.
(rs6000_legitimate_address_p): Delete TOC fusion support.
(rs6000_opt_masks): Likewise.
(fusion_wrap_memory_address): Delete function, no longer used.
(fusion_split_address): Delete TOC fusion support.
* config/rs6000/rs6000.h (TARGET_TOC_FUSION_INT): Delete, no
longer used with toc fusion being deleted.
(TARGET_TOC_FUSION_FP): Likewise.
* config/rs6000/rs6000.md (UNSPEC_FUSION_ADDIS): Delete TOC fusion
UNSPEC.
(toc fusion splitter): Delete TOC fusion support.
(toc_fusionload_): Likewise.
(toc_fusionload_di): Likewise.
(fusion_gpr_load_): Delete generator function, this insn no
longer needs to be named.  Rename predicate to delete TOC fusion.
(fusion_gpr___load): Likewise.
(fusion_gpr___store): Likewise.
(fusion_vsx___load): Likewise.
(fusion_vsx___store): Likewise.
(p9 fusion peephole2s): Rename predicate to delete TOC fusion.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md(revision 262647)
+++ gcc/config/rs6000/constraints.md(working copy)
@@ -157,10 +157,8 @@ (define_memory_constraint "wF"
   "Memory operand suitable for power9 fusion load/stores"
   (match_operand 0 "fusion_addis_mem_combo_load"))
 
-;; Fusion gpr load.
-(define_memory_constraint "wG"
-  "Memory operand suitable for TOC fusion memory references"
-  (match_operand 0 "toc_fusion_mem_wrapped"))
+;; wG is now available.  Previously it was a memory operand suitable for TOC
+;; fusion.
 
 (define_register_constraint "wH" "rs6000_constraints[RS6000_CONSTRAINT_wH]"
   "Altivec register to hold 32-bit integers or NO_REGS.")
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md   

libgo patch committed: Skip zero-sized fields in structs when converting to libffi CIF

2018-07-13 Thread Ian Lance Taylor
The libffi library doesn't understand zero-sized objects.  This patch
to libgo fixes it so that when we see a zero-sized field in a struct,
we just skip it when converting to the libffi data structures. There
is no value to pass in any case, so not telling libffi about the field
doesn't affect anything.  The test case for this is
https://golang.org/cl/123316.  This fixes
https://golang.org/issue/26335.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 262641)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-3f7e72eca3f9221e67c055841d42851aa6a66aff
+db991403fc97854201b3f40492f4f6b9d471cabc
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/ffi.go
===
--- libgo/go/runtime/ffi.go (revision 262540)
+++ libgo/go/runtime/ffi.go (working copy)
@@ -225,11 +225,40 @@ func structToFFI(typ *structtype) *__ffi
return emptyStructToFFI()
}
 
-   fields := make([]*__ffi_type, c+1)
+   fields := make([]*__ffi_type, 0, c+1)
+   checkPad := false
for i, v := range typ.fields {
-   fields[i] = typeToFFI(v.typ)
+   // Skip zero-sized fields; they confuse libffi,
+   // and there is no value to pass in any case.
+   // We do have to check whether the alignment of the
+   // zero-sized field introduces any padding for the
+   // next field.
+   if v.typ.size == 0 {
+   checkPad = true
+   continue
+   }
+
+   if checkPad {
+   off := uintptr(0)
+   for j := i - 1; j >= 0; j-- {
+   if typ.fields[j].typ.size > 0 {
+   off = typ.fields[j].offset() + typ.fields[j].typ.size
+   break
+   }
+   }
+   off += uintptr(v.typ.align) - 1
+   off &^= uintptr(v.typ.align) - 1
+   if off != v.offset() {
+   fields = append(fields, padFFI(v.offset()-off))
+   }
+   checkPad = false
+   }
+
+   fields = append(fields, typeToFFI(v.typ))
}
-   fields[c] = nil
+
+   fields = append(fields, nil)
+
return &__ffi_type{
_type:_FFI_TYPE_STRUCT,
elements: [0],
@@ -302,6 +331,19 @@ func emptyStructToFFI() *__ffi_type {
return &__ffi_type{
_type:_FFI_TYPE_STRUCT,
elements: [0],
+   }
+}
+
+// padFFI returns a padding field of the given size
+func padFFI(size uintptr) *__ffi_type {
+   elements := make([]*__ffi_type, size+1)
+   for i := uintptr(0); i < size; i++ {
+   elements[i] = ffi_type_uint8()
+   }
+   elements[size] = nil
+   return &__ffi_type{
+   _type:_FFI_TYPE_STRUCT,
+   elements: [0],
}
 }
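The layout logic in structToFFI above can be restated in C as a sketch. struct field and struct elem are hypothetical stand-ins for the Go type descriptors and the libffi element list; like the Go code, the loop skips zero-sized fields and, after skipping one, emits explicit padding when the next field does not land at the offset its alignment alone would give it (the role of padFFI).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical field description: size, alignment (a power of two) and
   offset, as structToFFI reads them from the Go type.  */
struct field
{
  size_t size, align, offset;
};

enum elem_kind { ELEM_FIELD, ELEM_PAD };

struct elem
{
  enum elem_kind kind;
  size_t value;   /* Field index for ELEM_FIELD, byte count for ELEM_PAD.  */
};

/* Fill OUT (room for 2 * N entries) and return the number of entries.  */
static size_t
build_elements (const struct field *f, size_t n, struct elem *out)
{
  size_t count = 0;
  bool check_pad = false;
  for (size_t i = 0; i < n; i++)
    {
      if (f[i].size == 0)
        {
          /* Skip zero-sized fields, but remember that their alignment
             may have moved the next field.  */
          check_pad = true;
          continue;
        }
      if (check_pad)
        {
          size_t off = 0;
          /* End of the closest preceding non-empty field.  */
          for (size_t j = i; j-- > 0;)
            if (f[j].size > 0)
              {
                off = f[j].offset + f[j].size;
                break;
              }
          /* Round up to this field's own alignment.  */
          off = (off + f[i].align - 1) & ~(f[i].align - 1);
          if (off != f[i].offset)
            out[count++] = (struct elem) { ELEM_PAD, f[i].offset - off };
          check_pad = false;
        }
      out[count++] = (struct elem) { ELEM_FIELD, i };
    }
  return count;
}
```

For a Go struct { a uint8; z [0]uint64; b uint8 } on a typical 64-bit target, z pushes b to offset 8, so the sketch produces a, then 7 bytes of padding, then b; without the padding element, libffi would place b at offset 1.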
 


Re: [PATCH] x86: Tune Skylake, Cannonlake and Icelake as Haswell

2018-07-13 Thread H.J. Lu
On Fri, Jul 13, 2018 at 9:31 AM, Jan Hubicka  wrote:
>> > We have also noticed that benchmarks on Skylake are not good compared to
>> > Haswell; this nicely explains it.  I think this is a -march=native regression
>> > compared to GCC versions that did not support CPUs beyond Haswell.  So it
>> > would be nice to backport it.
>>
>> Yes, we should.   Here is the patch to backport to GCC 8.  OK for GCC 8 after
>> it has been checked into trunk?
>
> OK,
> Honza
>>
>> Thanks.
>>
>> --
>> H.J.
>
>> From 40a1050b330b421a1f445cb2a40b5a002da2e6d6 Mon Sep 17 00:00:00 2001
>> From: "H.J. Lu" 
>> Date: Mon, 4 Jun 2018 19:16:06 -0700
>> Subject: [PATCH] x86: Tune Skylake, Cannonlake and Icelake as Haswell
>>
>> r259399, which added PROCESSOR_SKYLAKE, disabled many x86 optimizations
>> which are enabled by PROCESSOR_HASWELL.  As a result, -mtune=skylake
>> generates slower code on Skylake than before.  The same also applies
>> to Cannonlake and Icelake tuning.
>>
>> This patch changes -mtune={skylake|cannonlake|icelake} to tune like
>> -mtune=haswell until their tuning is properly adjusted.  It also
>> enables -mprefer-vector-width=256 for -mtune=haswell, which has no
>> impact on codegen when AVX512 isn't enabled.
>>
>> Performance impacts on SPEC CPU 2017 rate with 1 copy using
>>
>> -march=native -mfpmath=sse -O2 -m64
>>
>> are
>>
>> 1. On Broadwell server:
>>
>> 500.perlbench_r   -0.56%
>> 502.gcc_r         -0.18%
>> 505.mcf_r          0.24%
>> 520.omnetpp_r      0.00%
>> 523.xalancbmk_r   -0.32%
>> 525.x264_r        -0.17%
>> 531.deepsjeng_r    0.00%
>> 541.leela_r        0.00%
>> 548.exchange2_r    0.12%
>> 557.xz_r           0.00%
>> Geomean            0.00%
>>
>> 503.bwaves_r       0.00%
>> 507.cactuBSSN_r    0.21%
>> 508.namd_r         0.00%
>> 510.parest_r       0.19%
>> 511.povray_r      -0.48%
>> 519.lbm_r          0.00%
>> 521.wrf_r          0.28%
>> 526.blender_r      0.19%
>> 527.cam4_r         0.39%
>> 538.imagick_r      0.00%
>> 544.nab_r         -0.36%
>> 549.fotonik3d_r    0.51%
>> 554.roms_r         0.00%
>> Geomean            0.17%
>>
>> On Skylake client:
>>
>> 500.perlbench_r    0.96%
>> 502.gcc_r          0.13%
>> 505.mcf_r         -1.03%
>> 520.omnetpp_r     -1.11%
>> 523.xalancbmk_r    1.02%
>> 525.x264_r         0.50%
>> 531.deepsjeng_r    2.97%
>> 541.leela_r        0.50%
>> 548.exchange2_r   -0.95%
>> 557.xz_r           2.41%
>> Geomean            0.56%
>>
>> 503.bwaves_r       0.49%
>> 507.cactuBSSN_r    3.17%
>> 508.namd_r         4.05%
>> 510.parest_r       0.15%
>> 511.povray_r       0.80%
>> 519.lbm_r          3.15%
>> 521.wrf_r         10.56%
>> 526.blender_r      2.97%
>> 527.cam4_r         2.36%
>> 538.imagick_r     46.40%
>> 544.nab_r          2.04%
>> 549.fotonik3d_r    0.00%
>> 554.roms_r         1.27%
>> Geomean            5.49%
>>
>> On Skylake server:
>>
>> 500.perlbench_r    0.71%
>> 502.gcc_r         -0.51%
>> 505.mcf_r         -1.06%
>> 520.omnetpp_r     -0.33%
>> 523.xalancbmk_r   -0.22%
>> 525.x264_r         1.72%
>> 531.deepsjeng_r   -0.26%
>> 541.leela_r        0.57%
>> 548.exchange2_r   -0.75%
>> 557.xz_r          -1.28%
>> Geomean           -0.21%
>>
>> 503.bwaves_r       0.00%
>> 507.cactuBSSN_r    2.66%
>> 508.namd_r         3.67%
>> 510.parest_r       1.25%
>> 511.povray_r       2.26%
>> 519.lbm_r          1.69%
>> 521.wrf_r         11.03%
>> 526.blender_r      3.39%
>> 527.cam4_r         1.69%
>> 538.imagick_r     64.59%
>> 544.nab_r         -0.54%
>> 549.fotonik3d_r    2.68%
>> 554.roms_r         0.00%
>> Geomean            6.19%
>>
>> This patch improves -march=native performance on Skylake up to 60% and
>> leaves -march=native performance unchanged on Haswell.
>>
>> gcc/
>>
>>   Backport from mainline
>>   2018-07-12  H.J. Lu  
>>   Sunil K Pandey  
>>
>>   PR target/84413
>>   * config/i386/i386.c (m_CORE_AVX512): New.
>>   (m_CORE_AVX2): Likewise.
>>   (m_CORE_ALL): Add m_CORE_AVX2.
>>   * config/i386/x86-tune.def: Replace m_HASWELL with m_CORE_AVX2.
>>   Replace m_SKYLAKE_AVX512 with m_CORE_AVX512 on avx256_optimal
>>   and remove the rest of m_SKYLAKE_AVX512.
>>
>> gcc/testsuite/
>>
>>   Backport from mainline
>>   2018-07-12  H.J. Lu  
>>   Sunil K Pandey  
>>
>>   PR target/84413
>>   * gcc.target/i386/pr84413-1.c: New test.
>>   * gcc.target/i386/pr84413-2.c: Likewise.
>>   * gcc.target/i386/pr84413-3.c: Likewise.
>>   * gcc.target/i386/pr84413-4.c: Likewise.

This is the patch I checked into trunk.  I dropped 

Re: [PATCH] Properly unshare TYPE_SIZE_UNIT/DECL_FIELD_OFFSET (PR86216)

2018-07-13 Thread Richard Biener
On July 13, 2018 6:52:26 PM GMT+02:00, Eric Botcazou  
wrote:
>> It breaks Ada bootstrap.  I guess Ada has variable-size types in
>> non-function scope (not sure how TYPE_SIZES_GIMPLIFIED works then
>> though).  That said, r92495 moved the unshare_expr from layout_type
>> to gimplify_one_sizepos.
>
>See gimplify.c:763 and below.

Thanks. In that light the unsharing at the places the FE builds expressions 
using TYPE_SIZE and friends looks like the way to go. 

I still wonder why unsharing in gimplify_one_sizepos is necessary though. Is 
that because even deep unsharing doesn't walk types? 

Richard. 
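The question above (does deep unsharing stop at type nodes?) can be illustrated with a toy model. This is a minimal sketch, assuming an unshare routine that copies expression nodes recursively but shares type nodes instead of descending into them; `toy_expr`, `toy_type` and `toy_unshare` are hypothetical and are not GCC's tree representation or `unshare_expr`.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model (hypothetical, not GCC's trees): an expression node may refer
   to a type node, and a type node carries a size expression, loosely in
   the spirit of TYPE_SIZE_UNIT.  */
struct toy_type;

struct toy_expr {
  int value;               /* payload for leaf nodes */
  struct toy_expr *op0;    /* optional operand */
  struct toy_type *type;   /* type of the expression */
};

struct toy_type {
  struct toy_expr *size;   /* size expression of the type */
};

/* Deep-copy the operand chain, but share type nodes rather than walking
   into them.  The size expression reached through the type therefore
   stays shared between the original and the copy.  */
static struct toy_expr *toy_unshare (const struct toy_expr *e)
{
  if (e == NULL)
    return NULL;
  struct toy_expr *copy = malloc (sizeof *copy);
  assert (copy != NULL);
  copy->value = e->value;
  copy->op0 = toy_unshare (e->op0);
  copy->type = e->type;    /* not copied: type stays shared */
  return copy;
}
```

If something later rewrites the size expression in place (as gimplification does), every user of the type sees the change, which is one reason an explicit unshare at the point where size expressions are used can still be needed.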



Go patch committed: Fix parsing of composite literals with omitted pointer types

2018-07-13 Thread Ian Lance Taylor
This patch to the Go frontend fixes parsing of composite literals with
omitted pointer types.  The frontend could parse omitted pointer
types at the end of the type, but not in the middle, so code like
[]*[][]int{{{1}}} failed.  A test case is in
https://golang.org/cl/123477.  This fixes
https://golang.org/issue/26340.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.
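The failure mode can be modelled with a tiny type graph: to find the type of an inner literal whose type was omitted, the lowering code walks one nesting level per depth step, and an elided pointer in the middle of the type (the `*` in `[]*[][]int`) has to be looked through at each step. This is a minimal sketch, assuming that each depth step strips one pointer and then takes one element type; `struct type`, `deref` and `inner_literal_type` are hypothetical stand-ins, not the gofrontend's `Type` class.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal type graph (hypothetical, not the gofrontend's Type class).  */
enum kind { K_INT, K_POINTER, K_SLICE, K_MAP };

struct type {
  enum kind kind;
  const struct type *elem;   /* pointee, slice element or map value */
};

/* Like Type::deref(): strip one pointer level if there is one.  */
static const struct type *deref (const struct type *t)
{
  return t->kind == K_POINTER ? t->elem : t;
}

/* Walk DEPTH nesting levels from the outer literal's type.  With
   DEREF_EACH_LEVEL set, every level first looks through a pointer (the
   behaviour after the fix); without it, a pointer in the middle of the
   type stops the walk and NULL is returned.  */
static const struct type *
inner_literal_type (const struct type *t, int depth, int deref_each_level)
{
  for (int d = 0; d < depth; ++d)
    {
      if (deref_each_level)
        t = deref (t);
      if (t->kind == K_SLICE || t->kind == K_MAP)
        t = t->elem;
      else
        return NULL;         /* not a composite type at this level */
    }
  return t;
}
```

For `[]*[][]int{{{1}}}`, the innermost `{1}` sits two levels down and should have type `[]int`; only the dereferencing walk reaches it.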

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 262572)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-267686fd1dffbc03e610e9f17dadb4e72c75f18d
+3f7e72eca3f9221e67c055841d42851aa6a66aff
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 262554)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -13666,6 +13666,7 @@ Composite_literal_expression::do_lower(G
 
   for (int depth = 0; depth < this->depth_; ++depth)
 {
+  type = type->deref();
   if (type->array_type() != NULL)
type = type->array_type()->element_type();
   else if (type->map_type() != NULL)


Re: [PATCH, rs6000] Alphabetize prototypes of AltiVec built-in functions in extend.texi

2018-07-13 Thread Segher Boessenkool
On Tue, Jul 10, 2018 at 06:13:50PM -0500, Kelvin Nilsen wrote:
> This patch alphabetizes the list of AltiVec built-in function prototypes that 
> consume about 15 pages of the gcc.pdf file.  As part of the alphabetization 
> effort, certain functions that should not be documented in this section of 
> the manual are separated from the others and moved to the end of the section 
> with comments to explain their role.
> 
> This patch prepares the way for future patches that will remove certain 
> prototypes from this section and will insert certain prototypes that are 
> currently missing from this section.  It also improves readability and 
> maintainability of the section.

I don't think having these things alphabetical is a good idea; there are
much better orderings possible.  But it is a step towards sorting out the
mess that is the current documentation, so okay for trunk!

Thanks for working on this.


Segher


>   * doc/extend.texi (PowerPC AltiVec Built-in Functions):
>   Alphabetize prototypes of built-in functions, separating out
>   built-in functions that are listed in this section but should be
>   described elsewhere.


Re: [PATCH, rs6000] Add missing logical-op interfaces to emmintrin.h

2018-07-13 Thread Segher Boessenkool
Hi!

On Wed, Jul 11, 2018 at 12:26:24PM -0500, Bill Schmidt wrote:
> It was recently brought to our attention that the existing emmintrin.h
> header, which was believed to be feature-complete for SSE2 support, is
> actually missing four logical-op interfaces:
> 
>  _mm_and_si128
>  _mm_andnot_si128
>  _mm_or_si128
>  _mm_xor_si128
> 
> This patch provides those with the obvious implementations, along with
> test cases.  I've bootstrapped it on powerpc64le-linux-gnu (P8, P9)
> and powerpc64-linux-gnu (P7, P8) and tested it with no regressions.
> Is this okay for trunk?
> 
> Although this isn't a regression, it is an oversight that leaves the
> SSE2 support incomplete.  Thus I'd like to ask permission to also
> backport this to gcc-8-branch after a short waiting period.  It's
> passed regstrap on P8 and P9 LE, and P7/P8 BE testing is underway.
> Is that backport okay if testing succeeds?
> 
> [BTW, I'm shepherding this patch on behalf of Steve Munroe.]

This looks fine.  Okay for trunk.  Also okay for 8 (as we discussed, you
probably should check if 8 hasn't diverged from trunk here; it shouldn't
have).

Thanks to both of you,


Segher
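The semantics of the four missing intrinsics are simple bitwise operations on the whole 128-bit value; the one easy to get wrong is `_mm_andnot_si128 (a, b)`, which computes `~a & b` (the first operand is complemented). The sketch below is a portable scalar model using a hypothetical `m128i_model` struct, not the rs6000 `emmintrin.h` code. On Power one would expect these to map onto `vec_and`, `vec_andc`, `vec_or` and `vec_xor`; since `vec_andc (a, b)` computes `a & ~b`, an andnot built on it needs its operands swapped.

```c
#include <assert.h>
#include <stdint.h>

/* Portable scalar model of a 128-bit integer vector (a stand-in for
   __m128i, not the rs6000 definition).  */
typedef struct { uint64_t lo, hi; } m128i_model;

static m128i_model and_si128 (m128i_model a, m128i_model b)
{
  return (m128i_model){ a.lo & b.lo, a.hi & b.hi };
}

/* ANDNOT complements the FIRST operand: result = ~a & b.  */
static m128i_model andnot_si128 (m128i_model a, m128i_model b)
{
  return (m128i_model){ ~a.lo & b.lo, ~a.hi & b.hi };
}

static m128i_model or_si128 (m128i_model a, m128i_model b)
{
  return (m128i_model){ a.lo | b.lo, a.hi | b.hi };
}

static m128i_model xor_si128 (m128i_model a, m128i_model b)
{
  return (m128i_model){ a.lo ^ b.lo, a.hi ^ b.hi };
}
```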


Re: C++ patch ping

2018-07-13 Thread Jakub Jelinek
On Fri, Jul 13, 2018 at 12:24:02PM -0400, Nathan Sidwell wrote:
> On 07/13/2018 09:49 AM, Jakub Jelinek wrote:
> > Hi!
> > 
> > I'd like to ping the following C++ patches:
> > 
> > - PR c++/85515
> >make range for temporaries unspellable during parsing and only
> >turn them into spellable for debug info purposes
> >http://gcc.gnu.org/ml/gcc-patches/2018-07/msg00086.html
> 
> 
> How hard would it be to add the 6 special identifiers to the C++ global
> table via initialize_predefined_identifiers (decl.c) and then use them
> directly in the for range machinery?  repeated get_identifier
> ("string-const") just smells bad.

Probably not too hard, but we have hundreds of other
get_identifier ("string-const") calls in the middle-end, C++ FE, other FEs.
Are those 6 more important than say "abi_tag", "gnu", "begin", "end", "get",
"tuple_size", "tuple_element", and many others?

Is it worth caching those?

Jakub
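The trade-off being discussed is between looking a name up in the identifier table on every use and looking it up once at start-up and keeping the resulting pointer, as a predefined-identifier table does. A minimal sketch with a hypothetical toy table (`toy_get_identifier` uses a linear array and a lookup counter purely for illustration; GCC's `get_identifier` is hash-table based):

```c
#include <assert.h>
#include <string.h>

/* Toy identifier table (hypothetical).  Interning returns the same pointer
   for equal strings, so identifiers can be compared with ==.  */
#define MAX_IDS 64
static const char *id_table[MAX_IDS];
static int n_ids;
static int n_lookups;        /* how often the table has been searched */

static const char *toy_get_identifier (const char *s)
{
  ++n_lookups;
  for (int i = 0; i < n_ids; ++i)
    if (strcmp (id_table[i], s) == 0)
      return id_table[i];
  assert (n_ids < MAX_IDS);
  id_table[n_ids] = s;       /* keeps the caller's pointer; fine for literals */
  return id_table[n_ids++];
}

/* Cached variant: resolve the name once during initialisation and reuse
   the pointer afterwards, so later uses cost no table search.  */
static const char *cached_begin_id;

static void toy_init_predefined_identifiers (void)
{
  cached_begin_id = toy_get_identifier ("begin");
}
```

Each uncached use costs one table search; whether that matters depends on how hot the calling path is, which is the question raised above.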


Re: [PATCH] Properly unshare TYPE_SIZE_UNIT/DECL_FIELD_OFFSET (PR86216)

2018-07-13 Thread Eric Botcazou
> It breaks Ada bootstrap.  I guess Ada has variable-size types in
> non-function scope (not sure how TYPE_SIZES_GIMPLIFIED works then
> though).  That said, r92495 moved the unshare_expr from layout_type
> to gimplify_one_sizepos.

See gimplify.c:763 and below.

-- 
Eric Botcazou


Re: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/6)]

2018-07-13 Thread Jeff Law
On 07/12/2018 11:39 AM, Tamar Christina wrote:
>>> +
>>> +  /* Round size to the nearest multiple of guard_size, and calculate the
>>> + residual as the difference between the original size and the rounded
>>> + size. */
>>> +  HOST_WIDE_INT rounded_size = size & -guard_size;
>>> +  HOST_WIDE_INT residual = size - rounded_size;
>>> +
>>> +  /* We can handle a small number of allocations/probes inline.  Otherwise
>>> + punt to a loop.  */
>>> +  if (rounded_size <= STACK_CLASH_MAX_UNROLL_PAGES * guard_size)
>>> +{
>>> +  for (HOST_WIDE_INT i = 0; i < rounded_size; i += guard_size)
>>> +   {
>>> + aarch64_sub_sp (NULL, temp2, guard_size, true);
>>> + emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
>>> +  STACK_CLASH_CALLER_GUARD));
>>> +   }
>> So the only concern with this code is that it'll be inefficient and
>> possibly incorrect for probe sizes larger than ARITH_FACTOR.
>> Ultimately, that's a case I don't think we really care that much about.
>> I wouldn't lose sleep if the port clamped the requested probing interval
>> at ARITH_FACTOR.
>>
> I'm a bit confused here, the ARITH_FACTOR seems to have to do with the Ada
> stack probing implementation, which isn't used by this new code aside
> from the part that emits the actual probe when doing a variable or large
> allocation in aarch64_output_probe_stack_range.
> 
> Clamping the probing interval at ARITH_FACTOR means we can't do 64KB
> probing intervals. 
It may have been a misunderstanding on my part.  My understanding is
that ARITH_FACTOR represents the largest immediate constant we could
handle in this code using a single insn.  Anything above ARITH_FACTOR
needed a scratch register and I couldn't convince myself that we had a
suitable scratch register available.

But I'm really not well versed on the aarch64 architecture or the
various implementation details in aarch64.c.  So I'm happy to defer to
you and others @ ARM on what's best to do here.
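The rounding and unrolling decision in the quoted hunk can be mirrored in plain C to see which probes get emitted. This is a minimal sketch, assuming a power-of-two guard size and reusing the values that aarch64.h defines for STACK_CLASH_CALLER_GUARD (1024) and STACK_CLASH_MAX_UNROLL_PAGES (4); `plan_probes` and `unrolled_probe_offsets` are hypothetical helpers, not GCC functions. Note that `size & -guard_size` rounds down to a multiple of the guard size rather than to the nearest one.

```c
#include <assert.h>
#include <stdint.h>

#define CALLER_GUARD 1024      /* STACK_CLASH_CALLER_GUARD */
#define MAX_UNROLL_PAGES 4     /* STACK_CLASH_MAX_UNROLL_PAGES */

struct probe_plan {
  int64_t rounded_size;  /* bytes allocated in whole guard-sized steps */
  int64_t residual;      /* bytes left over after those steps */
  int64_t steps;         /* number of guard-sized allocations */
  int use_loop;          /* 1 if the steps would be emitted as a loop */
};

static struct probe_plan plan_probes (int64_t size, int64_t guard_size)
{
  assert (guard_size > 0 && (guard_size & (guard_size - 1)) == 0);
  struct probe_plan p;
  p.rounded_size = size & -guard_size;   /* round DOWN to a multiple */
  p.residual = size - p.rounded_size;
  p.steps = p.rounded_size / guard_size;
  p.use_loop = p.rounded_size > MAX_UNROLL_PAGES * guard_size;
  return p;
}

/* Offsets (relative to the incoming SP, negative = below it) at which the
   unrolled sequence probes: one SP decrement by guard_size, then one probe
   at SP + CALLER_GUARD, per step.  Returns the number of probes recorded.  */
static int unrolled_probe_offsets (int64_t size, int64_t guard_size,
                                   int64_t *out, int max_out)
{
  struct probe_plan p = plan_probes (size, guard_size);
  int n = 0;
  int64_t sp = 0;
  for (int64_t i = 0; i < p.rounded_size && n < max_out; i += guard_size)
    {
      sp -= guard_size;                  /* aarch64_sub_sp */
      out[n++] = sp + CALLER_GUARD;      /* emit_stack_probe */
    }
  return n;
}
```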


>> That can be problematical in a couple cases.  First it can run afoul of
>> combine-stack-adjustments.  Essentially that pass will combine a series
>> of stack adjustments into a single insn so your unrolled probing turns
>> into something like this:
>>
>>   multi-page stack adjust
>>   probe first page
>>   probe second page
>>   probe third page
>>   and so on..
>>
> This is something we actually do want, particularly in the case of 4KB pages
> as the immediates would fit in the store.  It's one of the things we were
> thinking about for future improvements.
> 
>> That violates the design constraint that we never allocate more than a
>> page at a time.
> Would there be a big issue here if we didn't adhere to this constraint?
Yes, because it enlarges a window for exploitation.  Consider signal
delivery occurring after the adjustment but before the probes.  The
signal handler could then be running on a clashed heap/stack.
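The "never allocate more than a page at a time" constraint can be expressed as a check over the sequence of stack adjustments and probes in a prologue, which also shows why combining the unrolled adjustments breaks it. This is a toy check with hypothetical types and names, not the DSO scanner mentioned in this thread; as a simplification, any probe is treated as covering everything allocated since the previous one.

```c
#include <assert.h>
#include <stdint.h>

/* One prologue operation: an SP decrement or a probe (store) at SP+off.  */
enum op_kind { OP_SUB_SP, OP_PROBE };

struct op {
  enum op_kind kind;
  int64_t amount;  /* OP_SUB_SP: bytes allocated; OP_PROBE: offset from SP */
};

/* Return 1 if at no point more than GUARD_SIZE bytes have been allocated
   without an intervening probe.  Simplification: any probe resets the
   window, regardless of where it lands.  */
static int respects_one_page_rule (const struct op *ops, int n,
                                   int64_t guard_size)
{
  int64_t unprobed = 0;
  for (int i = 0; i < n; ++i)
    {
      if (ops[i].kind == OP_SUB_SP)
        {
          unprobed += ops[i].amount;
          if (unprobed > guard_size)
            return 0;
        }
      else
        unprobed = 0;
    }
  return 1;
}
```

The unrolled sequence (adjust, probe, adjust, probe) passes; the combined form (one multi-page adjust followed by several probes) fails, which is the window a signal delivered between the adjustment and the probes could exploit.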

> 
>> Do you happen to have a libc.so and ld.so compiled with this code?  I've
>> got a scanner here which will look at a DSO and flag obviously invalid
>> stack allocations.  If you've got a suitable libc.so and ld.so I can run
>> them through the scanner.
>>
>>
>> Along similar lines, have you run the glibc testsuite with this stuff
>> enabled by default?  That proved useful to find codegen issues,
>> particularly with the register inheritance in the epilogue.
>>
> I'm running one now, I'll send the two binaries once I get the results back
> from the run. Thanks!
Great.  Looking forward to getting those .so files so I can throw them
into the scanner.

>>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>>> index 830f97603487d6ed07c4dc854a7898c4d17894d1..d1ed54c7160ab682c78d5950947608244d293025 100644
>>> --- a/gcc/config/aarch64/aarch64.md
>>> +++ b/gcc/config/aarch64/aarch64.md
>>> @@ -3072,7 +3072,7 @@
>>>  
>>>  (define_insn "cmp"
>>>[(set (reg:CC CC_REGNUM)
>>> -   (compare:CC (match_operand:GPI 0 "register_operand" "r,r,r")
>>> +   (compare:CC (match_operand:GPI 0 "register_operand" "rk,r,r")
>>> (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
>>>""
>>>"@
>> I don't think I needed this, but I can certainly understand how it might
>> be necessary.  The only one I know we needed was the probe_stack_range
>> constraint change.
>>
> It's not strictly needed, but it allows us to do the comparison with the
> stack pointer in the loop without needing to emit sp to a temporary first.
> So it's just a tiny optimization. :)
Understood.  A similar tweak for cmp was done in a patch from
Richard E in his patches for spectre v1 mitigation.  I'm certainly not
objecting :-)


Jeff



Re: C++ patch ping

2018-07-13 Thread Nathan Sidwell

On 07/13/2018 09:49 AM, Jakub Jelinek wrote:


- PR c++/3698, PR c++/86208
   extern_decl_map & TREE_USED fix (plus 2 untested variants)
   http://gcc.gnu.org/ml/gcc-patches/2018-07/msg00084.html


ok, thanks

--
Nathan Sidwell


RE: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/6)]

2018-07-13 Thread Tamar Christina
Hi All,

I'm sending an updated patch which makes sure unwind tables are always disabled
for tests that do sequence checks, so they pass in all configurations.
There's no change to the cover letter or implementation.

Regards,
Tamar

> -Original Message-
> From: Tamar Christina 
> Sent: Thursday, July 12, 2018 18:40
> To: Jeff Law 
> Cc: gcc-patches@gcc.gnu.org; nd ; James Greenhalgh
> ; Richard Earnshaw
> ; Marcus Shawcroft
> 
> Subject: Re: [PATCH][GCC][AArch64] Updated stack-clash implementation
> supporting 64k probes. [patch (1/6)]
> 
> Hi Jeff,
> 
> Thanks for the review!
> 
> The 07/11/2018 18:30, Jeff Law wrote:
> > On 07/11/2018 05:20 AM, Tamar Christina wrote:
> > > Hi All,
> > >
> > > This patch implements the use of the stack clash mitigation for AArch64.
> > > In AArch64 we expect both the probing interval and the guard size to
> > > be 64KB and we enforce them to always be equal.
> > >
> > > We also probe up by 1024 bytes in the general case when a probe is
> > > required.
> > >
> > > AArch64 has the following probing conditions:
> > >
> > >  1) Any allocation less than 63KB requires no probing.  An ABI defined
> > >     safe buffer of 1KB is used and a page size of 64KB is assumed.
> > >
> > >  2) Any allocation larger than 1 page is done in increments of the page
> > >     size and probed up by 1KB, leaving the residuals.
> > >
> > >  3a) Any residual for local arguments that is less than 63KB requires no
> > >      probing.  Essentially this is a sliding window.  The probing range
> > >      determines the ABI safe buffer and the amount to be probed up.
> > >
> > >   b) Any residual for outgoing arguments that is less than 1KB requires
> > >      no probing.  However, to maintain our invariant, anything above or
> > >      equal to 1KB requires a probe.
> > >
> > > Incrementally allocating less than the probing thresholds, e.g.
> > > recursive functions, will not be an issue as the storing of LR counts
> > > as a probe.
> > >
> > >
> > > +-------------------+
> > > |  ABI SAFE REGION  |
> > > +-------------------+ --+
> > > |                   |   |
> > > |                   |   |  maximum amount
> > > |                   |   |  not needing a probe
> > > |                   |   |
> > > |                   |   |  probe offset when
> > > |                   |   |  a probe is required
> > > |                   |   |
> > > +-------------------+ --+  point of first probe
> > > |  ABI SAFE REGION  |
> > > +-------------------+
> > > |                   |
> > > |                   |
> > > |                   |
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > Target was tested with stack clash on and off by default.
> > >
> > > Ok for trunk?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/
> > > 2018-07-11  Jeff Law  
> > >   Richard Sandiford 
> > >   Tamar Christina  
> > >
> > >   PR target/86486
> > >   * config/aarch64/aarch64.md (cmp,
> > >   probe_stack_range): Add k (SP) constraint.
> > >   * config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
> > >   STACK_CLASH_MAX_UNROLL_PAGES): New.
> > >   * config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Emit
> > >   stack probes for stack clash.
> > >   (aarch64_allocate_and_probe_stack_space): New.
> > >   (aarch64_expand_prologue): Use it.
> > >   (aarch64_expand_epilogue): Likewise and update IP regs re-use
> > >   criteria.
> > >   (aarch64_sub_sp): Add emit_move_imm optional param.
> > >
> > > gcc/testsuite/
> > > 2018-07-11  Jeff Law  
> > >   Richard Sandiford 
> > >   Tamar Christina  
> > >
> > >   PR target/86486
> > >   * gcc.target/aarch64/stack-check-12.c: New.
> > >   * gcc.target/aarch64/stack-check-13.c: New.
> > >   * gcc.target/aarch64/stack-check-cfa-1.c: New.
> > >   * gcc.target/aarch64/stack-check-cfa-2.c: New.
> > >   * gcc.target/aarch64/stack-check-prologue-1.c: New.
> > >   * gcc.target/aarch64/stack-check-prologue-10.c: New.
> > >   * gcc.target/aarch64/stack-check-prologue-11.c: New.
> > >   * gcc.target/aarch64/stack-check-prologue-2.c: New.
> > >   * gcc.target/aarch64/stack-check-prologue-3.c: New.
> > >   * 
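The probing conditions listed in the cover letter reduce to a few threshold checks against the 64KB guard size and the 1KB ABI safe buffer. This is a minimal sketch of those thresholds only, assuming "less than 63KB requires no probing" means exactly 63KB does require one; the helper names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define GUARD_SIZE   (64 * 1024)  /* guard size == probing interval */
#define CALLER_GUARD 1024         /* ABI-defined safe buffer */

/* Conditions 1 and 3a: a local allocation (or local residual) below
   GUARD_SIZE - CALLER_GUARD (63KB) needs no probe.  */
static int locals_need_probe (int64_t size)
{
  return size >= GUARD_SIZE - CALLER_GUARD;
}

/* Condition 3b: an outgoing-argument residual of 1KB or more needs a
   probe, so that callees can keep relying on the 1KB buffer.  */
static int outgoing_args_need_probe (int64_t residual)
{
  return residual >= CALLER_GUARD;
}

/* Condition 2: number of page-sized, individually probed steps.  */
static int64_t page_steps (int64_t size)
{
  return size / GUARD_SIZE;
}
```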

Re: [PATCH] Fix part of PR86389

2018-07-13 Thread Sandra Loosemore

On 07/03/2018 07:55 AM, Richard Biener wrote:


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

 From 52aad98947e5cfcb5624ff24f0c557d0029c34fe Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Tue, 3 Jul 2018 14:04:01 +0200
Subject: [PATCH] fix-pr86389

2018-07-03  Richard Biener  

PR ipa/86389
* tree-ssa-structalias.c (find_func_clobbers): Properly
handle indirect calls.

* gcc.dg/torture/pr86389.c: New testcase.


FYI, it looks like this new testcase requires

/* { dg-require-effective-target trampolines } */

as it is failing on a target without nested function support. 
Alternatively, maybe the testcase could be rewritten not to use a nested 
function?  I'm not sure if nested-ness is required to test the bug this 
issue was for.


-Sandra


RE: [PATCH][GCC][AArch64] Ensure that outgoing argument size is at least 8 bytes when alloca and stack-clash. [Patch (3/6)]

2018-07-13 Thread Tamar Christina
Hi All,

I'm sending an updated patch which updates a testcase that hits one of our
corner cases.
This is an assembler scan only update in a testcase.

Regards,
Tamar

> -Original Message-
> From: Tamar Christina 
> Sent: Wednesday, July 11, 2018 12:21
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; James Greenhalgh ;
> Richard Earnshaw ; Marcus Shawcroft
> 
> Subject: [PATCH][GCC][AArch64] Ensure that outgoing argument size is at
> least 8 bytes when alloca and stack-clash. [Patch (3/6)]
> 
> Hi All,
> 
> This patch adds a requirement that the size of the outgoing argument area for
> a function is at least 8 bytes when using stack-clash protection.
> 
> By using this condition we can avoid a check in the alloca code and so have
> smaller and simpler code there.
> 
> A simplified version of the AArch64 stack frames is:
> 
>+---+
>|   |
>|   |
>|   |
>+---+
>|LR |
>+---+
>|FP |
>+---+
>|dynamic allocations|   expanding area which will push the outgoing
>+---+   args down during each allocation.
>|padding|
>+---+
>|outgoing stack args|  safety buffer of 8 bytes (aligned)
>+---+
> 
> By always defining an outgoing argument area, alloca(0) is effectively safe to
> probe at $sp because the reserved buffer is there.  It will never corrupt the
> stack.
> 
> This is also safe for alloca(x) where x is 0 or x % page_size == 0.  The
> former is the same case as alloca(0), while the latter is safe because any
> allocation pushes the outgoing stack args down:
> 
>|FP |
>+---+
>|   |
>|dynamic allocations|   alloca (x)
>|   |
>+---+
>|padding|
>+---+
>|outgoing stack args|  safety buffer of 8 bytes (aligned)
>+---+
> 
> This means that when you probe for the residual, if it's 0 you'll again just
> probe in the outgoing stack args range, whose size we know is non-zero (at
> least 8 bytes).
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> Target was tested with stack clash on and off by default.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/
> 2018-07-11  Tamar Christina  
> 
>   PR target/86486
>   * config/aarch64/aarch64.h (STACK_CLASH_OUTGOING_ARGS,
>   STACK_DYNAMIC_OFFSET): New.
>   * config/aarch64/aarch64.c (aarch64_layout_frame):
>   Update outgoing args size.
>   (aarch64_stack_clash_protection_alloca_probe_range,
>   TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE): New.
> 
> gcc/testsuite/
> 2018-07-11  Tamar Christina  
> 
>   PR target/86486
>   * gcc.target/aarch64/stack-check-alloca-1.c: New.
>   * gcc.target/aarch64/stack-check-alloca-10.c: New.
>   * gcc.target/aarch64/stack-check-alloca-2.c: New.
>   * gcc.target/aarch64/stack-check-alloca-3.c: New.
>   * gcc.target/aarch64/stack-check-alloca-4.c: New.
>   * gcc.target/aarch64/stack-check-alloca-5.c: New.
>   * gcc.target/aarch64/stack-check-alloca-6.c: New.
>   * gcc.target/aarch64/stack-check-alloca-7.c: New.
>   * gcc.target/aarch64/stack-check-alloca-8.c: New.
>   * gcc.target/aarch64/stack-check-alloca-9.c: New.
>   * gcc.target/aarch64/stack-check-alloca.h: New.
>   * gcc.target/aarch64/stack-check-14.c: New.
>   * gcc.target/aarch64/stack-check-15.c: New.
> 
> --
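The invariant in the cover letter is that, with stack-clash protection on and alloca in use, the outgoing argument area is never empty, so a zero residual probe at $sp lands in reserved space. This is a minimal sketch of that rule only, assuming the 8-byte minimum from STACK_CLASH_OUTGOING_ARGS and the flag_stack_clash_protection / calls_alloca conditions visible in the STACK_DYNAMIC_OFFSET hunk (whose full definition is truncated in this archive); `reserved_outgoing_args` and `alloca_residual` are hypothetical helpers, and alignment is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_CLASH_OUTGOING_ARGS 8   /* value from the aarch64.h hunk */

/* Outgoing-argument space actually reserved: raised to the minimum only
   when stack-clash protection is on and the function calls alloca.  */
static int64_t reserved_outgoing_args (int64_t outgoing_args_size,
                                       int stack_clash, int calls_alloca)
{
  if (stack_clash && calls_alloca
      && outgoing_args_size < STACK_CLASH_OUTGOING_ARGS)
    return STACK_CLASH_OUTGOING_ARGS;
  return outgoing_args_size;
}

/* Residual left after allocating X bytes in whole pages; 0 for alloca(0)
   and for any multiple of the page size.  */
static int64_t alloca_residual (int64_t x, int64_t page_size)
{
  return x % page_size;
}
```

When `alloca_residual` is 0 the residual probe falls into the outgoing argument area, and `reserved_outgoing_args` guarantees that area is at least 8 bytes in exactly the configurations where the alloca check is skipped.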
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 1345f0eb171d05e2b833935c0a32f79c3db03f99..e9560b53bd8b5761855561dbf82d9c90cc1c282a 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -88,6 +88,10 @@
before probing has to be done for stack clash protection.  */
 #define STACK_CLASH_CALLER_GUARD 1024
 
+/* This value represents the minimum amount of bytes we expect the function's
+   outgoing arguments to be when stack-clash is enabled.  */
+#define STACK_CLASH_OUTGOING_ARGS 8
+
 /* This value controls how many pages we manually unroll the loop for when
generating stack clash probes.  */
 #define STACK_CLASH_MAX_UNROLL_PAGES 4
@@ -1069,4 +1073,15 @@ extern poly_uint16 aarch64_sve_vg;
 
 #define REGMODE_NATURAL_SIZE(MODE) aarch64_regmode_natural_size (MODE)
 
+/* Allocate the minimum of STACK_CLASH_OUTGOING_ARGS if stack clash protection
+   is enabled for the outgoing arguments.  This is essential as the extra args
+   space allows if to skip a check in alloca.  */
+#undef STACK_DYNAMIC_OFFSET
+#define STACK_DYNAMIC_OFFSET(FUNDECL)	   \
+   ((flag_stack_clash_protection	   \
+ && cfun->calls_alloca		   \
+ && known_lt (crtl->outgoing_args_size, 

RE: [PATCH][GCC][AArch64] Validate and set default parameters for stack-clash. [Patch (3/3)]

2018-07-13 Thread Tamar Christina
Hi All,

I am sending an updated patch which takes into account a
case where the set parameter value would not be safe to call.

No change in the cover letter.

Regards,
Tamar

> -Original Message-
> From: Tamar Christina 
> Sent: Wednesday, July 11, 2018 12:25
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; James Greenhalgh ;
> Richard Earnshaw ; Marcus Shawcroft
> 
> Subject: [PATCH][GCC][AArch64] Validate and set default parameters for
> stack-clash. [Patch (3/3)]
> 
> Hi All,
> 
> This patch defines the default parameters and validation for the aarch64
> stack clash probing interval and guard sizes.  It cleans up the previous
> implementation and ensures that invalid arguments are never present in the
> pipeline for AArch64.  Currently they are only corrected once cc1
> initializes the back-end.
> 
> The default for AArch64 is 64 KB for both of these and we only support 4 KB
> and 64 KB probes.  We also enforce that any value you set here for the
> parameters must be in sync.
> 
> If an invalid value is specified an error will be generated and compilation
> aborted.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> Target was tested with stack clash on and off by default.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/
> 2018-07-11  Tamar Christina  
> 
>   * common/config/aarch64/aarch64-common.c (TARGET_OPTION_DEFAULT_PARAM,
>   aarch64_option_default_param): New.
>   (params.h): Include.
>   (TARGET_OPTION_VALIDATE_PARAM, aarch64_option_validate_param): New.
>   * config/aarch64/aarch64.c (aarch64_override_options_internal): Simplify
>   stack-clash protection validation code.
> 
> --
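The defaulting and validation rules described above (default guard of 64KB, only 4KB or 64KB accepted, probe interval forced equal to the guard size) can be sketched independently of GCC's params machinery. This assumes both parameters are stored as log2 of the size in bytes (12 for 4KB, 16 for 64KB), as the diff's checks suggest; `stack_clash_params` and `apply_aarch64_defaults` are hypothetical and do not use GCC's real param API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the two parameters, stored as log2 of the
   size in bytes.  A guard_size of 0 means "not set by configure".  */
struct stack_clash_params {
  int guard_size;
  int probe_interval;
};

static int guard_size_valid (int value)
{
  return value == 12 || value == 16;   /* 4KB or 64KB */
}

/* Apply the AArch64 defaults, keep the interval in sync with the guard,
   then validate.  Returns 1 on success, 0 after printing a diagnostic.  */
static int apply_aarch64_defaults (struct stack_clash_params *p)
{
  if (p->guard_size == 0)
    p->guard_size = 16;                /* default: 64KB guard */
  p->probe_interval = p->guard_size;   /* the two must always match */
  if (!guard_size_valid (p->guard_size))
    {
      fprintf (stderr, "unsupported stack-clash guard size 2^%d\n",
               p->guard_size);
      return 0;
    }
  return 1;
}
```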
diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 292fb818705d4650113da59a6d88cf2aa7c9e57d..30bbec1380d6db60475f0d770944af98f566773d 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -30,6 +30,7 @@
 #include "opts.h"
 #include "flags.h"
 #include "diagnostic.h"
+#include "params.h"
 
 #ifdef  TARGET_BIG_ENDIAN_DEFAULT
 #undef  TARGET_DEFAULT_TARGET_FLAGS
@@ -41,6 +42,10 @@
 
 #undef	TARGET_OPTION_OPTIMIZATION_TABLE
 #define TARGET_OPTION_OPTIMIZATION_TABLE aarch_option_optimization_table
+#undef TARGET_OPTION_DEFAULT_PARAMS
+#define TARGET_OPTION_DEFAULT_PARAMS aarch64_option_default_params
+#undef TARGET_OPTION_VALIDATE_PARAM
+#define TARGET_OPTION_VALIDATE_PARAM aarch64_option_validate_param
 
 /* Set default optimization options.  */
 static const struct default_options aarch_option_optimization_table[] =
@@ -60,6 +65,48 @@ static const struct default_options aarch_option_optimization_table[] =
 { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
 
+/* Implement target validation TARGET_OPTION_DEFAULT_PARAM.  */
+
+static bool
+aarch64_option_validate_param (const int value, const int param)
+{
+  /* Check that both parameters are the same.  */
+  if (param == (int) PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE)
+{
+  if (value != 12 && value != 16)
+	{
+	error ("only values 12 (4 KB) and 16 (64 KB) are supported for guard "
+		"size.  Given value %d (%llu KB) is out of range.\n",
+		value, (1ULL << value) / 1024ULL);
+	return false;
+	}
+}
+
+  return true;
+}
+
+/* Implement TARGET_OPTION_DEFAULT_PARAMS.  */
+
+static void
+aarch64_option_default_params (void)
+{
+  /* We assume the guard page is 64k.  */
+  int index = (int) PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE;
+  if (!compiler_params[index].configure_value)
+ set_default_param_value (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE, 16);
+
+  int guard_size
+= default_param_value (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
+
+  /* Set the interval parameter to be the same as the guard size.  This way the
+ mid-end code does the right thing for us.  */
+  set_default_param_value (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL,
+			   guard_size);
+
+  /* Validate the options.  */
+  aarch64_option_validate_param (guard_size, index);
+}
+
 /* Implement TARGET_HANDLE_OPTION.
This function handles the target specific options for CPU/target selection.
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f2a4e0b3db62d9da87458bff98c16e8fb876f431..3fe0e47c561dfb7d1abf06a3322b4a6df63c7a21 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10905,19 +10905,7 @@ aarch64_override_options_internal (struct gcc_options *opts)
 			 opts->x_param_values,
 			 global_options_set.x_param_values);
 
-  /* If the user hasn't change it via configure then set the default to 64 KB
- for the backend.  */
-  if (DEFAULT_STK_CLASH_GUARD_SIZE == 0)
-  maybe_set_param_value (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE, 16,
-			opts->x_param_values,
-			global_options_set.x_param_values);
-
-  /* Validate the guard size.  */
   int guard_size = PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
-  if (guard_size != 12 && guard_size != 

Re: [PATCH] x86: Tune Skylake, Cannonlake and Icelake as Haswell

2018-07-13 Thread Jan Hubicka
> > We have also noticed that benchmarks on skylake are not good compared to
> > haswell, this nicely explains it.  I think this is a -march=native regression
> > compared to GCC versions that did not support better CPUs than Haswell.  So
> > it would be nice to backport it.
> 
> Yes, we should.   Here is the patch to backport to GCC 8.  OK for GCC 8 after
> it has been checked into trunk?

OK,
Honza
> 
> Thanks.
> 
> -- 
> H.J.

> From 40a1050b330b421a1f445cb2a40b5a002da2e6d6 Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" 
> Date: Mon, 4 Jun 2018 19:16:06 -0700
> Subject: [PATCH] x86: Tune Skylake, Cannonlake and Icelake as Haswell
> 
> r259399, which added PROCESSOR_SKYLAKE, disabled many x86 optimizations
> which are enabled by PROCESSOR_HASWELL.  As a result, -mtune=skylake
> generates slower code on Skylake than before.  The same also applies
> to Cannonlake and Icelake tuning.
> 
> This patch changes -mtune={skylake|cannonlake|icelake} to tune like
> -mtune=haswell until their tuning is properly adjusted.  It also
> enables -mprefer-vector-width=256 for -mtune=haswell, which has no
> impact on codegen when AVX512 isn't enabled.
> 
> Performance impacts on SPEC CPU 2017 rate with 1 copy using
> 
> -march=native -mfpmath=sse -O2 -m64
> 
> are
> 
> 1. On Broadwell server:
> 
> 500.perlbench_r   -0.56%
> 502.gcc_r -0.18%
> 505.mcf_r 0.24%
> 520.omnetpp_r 0.00%
> 523.xalancbmk_r   -0.32%
> 525.x264_r    -0.17%
> 531.deepsjeng_r   0.00%
> 541.leela_r   0.00%
> 548.exchange2_r   0.12%
> 557.xz_r  0.00%
> Geomean   0.00%
> 
> 503.bwaves_r  0.00%
> 507.cactuBSSN_r   0.21%
> 508.namd_r    0.00%
> 510.parest_r  0.19%
> 511.povray_r  -0.48%
> 519.lbm_r 0.00%
> 521.wrf_r 0.28%
> 526.blender_r 0.19%
> 527.cam4_r    0.39%
> 538.imagick_r 0.00%
> 544.nab_r -0.36%
> 549.fotonik3d_r   0.51%
> 554.roms_r    0.00%
> Geomean   0.17%
> 
> On Skylake client:
> 
> 500.perlbench_r   0.96%
> 502.gcc_r 0.13%
> 505.mcf_r -1.03%
> 520.omnetpp_r -1.11%
> 523.xalancbmk_r   1.02%
> 525.x264_r    0.50%
> 531.deepsjeng_r   2.97%
> 541.leela_r   0.50%
> 548.exchange2_r   -0.95%
> 557.xz_r  2.41%
> Geomean   0.56%
> 
> 503.bwaves_r  0.49%
> 507.cactuBSSN_r   3.17%
> 508.namd_r    4.05%
> 510.parest_r  0.15%
> 511.povray_r  0.80%
> 519.lbm_r 3.15%
> 521.wrf_r 10.56%
> 526.blender_r 2.97%
> 527.cam4_r    2.36%
> 538.imagick_r 46.40%
> 544.nab_r 2.04%
> 549.fotonik3d_r   0.00%
> 554.roms_r    1.27%
> Geomean   5.49%
> 
> On Skylake server:
> 
> 500.perlbench_r   0.71%
> 502.gcc_r -0.51%
> 505.mcf_r -1.06%
> 520.omnetpp_r -0.33%
> 523.xalancbmk_r   -0.22%
> 525.x264_r    1.72%
> 531.deepsjeng_r   -0.26%
> 541.leela_r   0.57%
> 548.exchange2_r   -0.75%
> 557.xz_r  -1.28%
> Geomean   -0.21%
> 
> 503.bwaves_r  0.00%
> 507.cactuBSSN_r   2.66%
> 508.namd_r    3.67%
> 510.parest_r  1.25%
> 511.povray_r  2.26%
> 519.lbm_r 1.69%
> 521.wrf_r 11.03%
> 526.blender_r 3.39%
> 527.cam4_r    1.69%
> 538.imagick_r 64.59%
> 544.nab_r -0.54%
> 549.fotonik3d_r   2.68%
> 554.roms_r    0.00%
> Geomean   6.19%
> 
> This patch improves -march=native performance on Skylake up to 60% and
> leaves -march=native performance unchanged on Haswell.
> 
> gcc/
> 
>   Backport from mainline
>   2018-07-12  H.J. Lu  
>   Sunil K Pandey  
> 
>   PR target/84413
>   * config/i386/i386.c (m_CORE_AVX512): New.
>   (m_CORE_AVX2): Likewise.
>   (m_CORE_ALL): Add m_CORE_AVX2.
>   * config/i386/x86-tune.def: Replace m_HASWELL with m_CORE_AVX2.
>   Replace m_SKYLAKE_AVX512 with m_CORE_AVX512 on avx256_optimal
>   and remove the rest of m_SKYLAKE_AVX512.
> 
> gcc/testsuite/
> 
>   Backport from mainline
>   2018-07-12  H.J. Lu  
>   Sunil K Pandey  
> 
>   PR target/84413
>   * gcc.target/i386/pr84413-1.c: New test.
>   * gcc.target/i386/pr84413-2.c: Likewise.
>   * gcc.target/i386/pr84413-3.c: Likewise.
>   * gcc.target/i386/pr84413-4.c: Likewise.
> ---
>  gcc/config/i386/i386.c|  5 -
>  gcc/config/i386/x86-tune.def  | 26 +++
>  gcc/testsuite/gcc.target/i386/pr84413-1.c | 17 +++
>  
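The x86-tune.def change works because each tuning entry is enabled by a bitmask of processors, so replacing `m_HASWELL` with a wider `m_CORE_AVX2` mask turns the Haswell tunings on for the newer cores too. This is a minimal sketch of that mechanism under stated assumptions: the processor enumeration and the exact membership of the `m_CORE_AVX2` / `m_CORE_AVX512` masks below are hypothetical, chosen to match the commit message (Skylake, Cannonlake and Icelake tuned like Haswell), and may differ from the definitions in i386.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical processor list; the real enum lives in GCC's i386 headers. */
enum processor { P_HASWELL, P_SKYLAKE, P_SKYLAKE_AVX512, P_CANNONLAKE,
                 P_ICELAKE_CLIENT, P_ICELAKE_SERVER };

#define M(p) (UINT64_C(1) << (p))

/* Assumed mask composition, modelled on the ChangeLog names.  */
#define m_HASWELL     M(P_HASWELL)
#define m_SKYLAKE     M(P_SKYLAKE)
#define m_CORE_AVX512 (M(P_SKYLAKE_AVX512) | M(P_CANNONLAKE) \
                       | M(P_ICELAKE_CLIENT) | M(P_ICELAKE_SERVER))
#define m_CORE_AVX2   (m_HASWELL | m_SKYLAKE | m_CORE_AVX512)

/* A tuning entry applies to a processor if its bit is in the mask.  */
static int tune_enabled (uint64_t mask, enum processor p)
{
  return (mask & M(p)) != 0;
}
```

An entry written as `m_HASWELL` applies only to Haswell, while the same entry written as `m_CORE_AVX2` also applies to every processor folded into that mask.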

Re: C++ patch ping

2018-07-13 Thread Nathan Sidwell

On 07/13/2018 09:49 AM, Jakub Jelinek wrote:

Hi!

I'd like to ping the following C++ patches:

- PR c++/85515
   make range for temporaries unspellable during parsing and only
   turn them into spellable for debug info purposes
   http://gcc.gnu.org/ml/gcc-patches/2018-07/msg00086.html



How hard would it be to add the 6 special identifiers to the C++ global 
table via initialize_predefined_identifiers (decl.c) and then use them 
directly in the for range machinery?  repeated get_identifier 
("string-const") just smells bad.


nathan

--
Nathan Sidwell


[ARM/FDPIC v2 20/21] [ARM][testsuite] FDPIC: Adjust pr43698.c to avoid clash with uclibc.

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

uclibc defines bswap_32, so use a different name in this test.

2018-XX-XX  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/pr43698.c (bswap_32): Rename as my_bswap_32.

Change-Id: I2591bd911030814331cabf97ee5cf6cf8124b4f3

diff --git a/gcc/testsuite/gcc.target/arm/pr43698.c 
b/gcc/testsuite/gcc.target/arm/pr43698.c
index 1fc497c..3b5dad0 100644
--- a/gcc/testsuite/gcc.target/arm/pr43698.c
+++ b/gcc/testsuite/gcc.target/arm/pr43698.c
@@ -6,7 +6,7 @@
 
 char do_reverse_endian = 0;
 
-#  define bswap_32(x) \
+#  define my_bswap_32(x) \
   x) & 0xff00) >> 24) | \
(((x) & 0x00ff) >>  8) | \
(((x) & 0xff00) <<  8) | \
@@ -16,7 +16,7 @@ char do_reverse_endian = 0;
   (__extension__ ({ \
   uint64_t __res; \
   if (!do_reverse_endian) {__res = (X); \
-  } else if (sizeof(X) == 4) { __res = bswap_32((X)); \
+  } else if (sizeof(X) == 4) { __res = my_bswap_32((X)); \
   } \
   __res; \
 }))
-- 
2.6.3
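The macro body in the diff above is partly garbled in this archive (the mask constants are truncated), but the rename itself is straightforward: giving the test's own macro a name the C library does not define avoids the clash with uclibc's `bswap_32`. For reference, a conventional 32-bit byte swap under the new name looks like the sketch below; it is an illustration, not the exact text of pr43698.c.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-swap a 32-bit value.  Named my_bswap_32 so it cannot collide with
   a bswap_32 macro or function provided by the C library.  */
#define my_bswap_32(x)                  \
  ((((x) & 0xff000000u) >> 24)          \
   | (((x) & 0x00ff0000u) >>  8)        \
   | (((x) & 0x0000ff00u) <<  8)        \
   | (((x) & 0x000000ffu) << 24))
```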



[ARM/FDPIC v2 21/21] [ARM][testsuite] FDPIC: Skip tests using architecture older than v7

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

Since FDPIC requires an architecture >=7, these tests fail because
they enforce an older version.  They would pass if the compiler didn't
bail out though.

2018-07-13  Christophe Lyon  

* gcc.target/arm/armv6-unaligned-load-ice.c: Add arm_arch
effective-target.
* gcc.target/arm/attr-unaligned-load-ice.c: Likewise.
* gcc.target/arm/attr_arm-err.c: Likewise.
* gcc.target/arm/ftest-armv4-arm.c: Likewise.
* gcc.target/arm/ftest-armv4t-arm.c: Likewise.
* gcc.target/arm/ftest-armv4t-thumb.c: Likewise.
* gcc.target/arm/ftest-armv5t-arm.c: Likewise.
* gcc.target/arm/ftest-armv5t-thumb.c: Likewise.
* gcc.target/arm/ftest-armv5te-arm.c: Likewise.
* gcc.target/arm/ftest-armv5te-thumb.c: Likewise.
* gcc.target/arm/ftest-armv6-arm.c: Likewise.
* gcc.target/arm/ftest-armv6-thumb.c: Likewise.
* gcc.target/arm/ftest-armv6k-arm.c: Likewise.
* gcc.target/arm/ftest-armv6k-thumb.c: Likewise.
* gcc.target/arm/ftest-armv6m-thumb.c: Likewise.
* gcc.target/arm/ftest-armv6t2-arm.c: Likewise.
* gcc.target/arm/ftest-armv6t2-thumb.c: Likewise.
* gcc.target/arm/ftest-armv6z-arm.c: Likewise.
* gcc.target/arm/ftest-armv6z-thumb.c: Likewise.
* gcc.target/arm/g2.c: Likewise.
* gcc.target/arm/macro_defs1.c: Likewise.
* gcc.target/arm/pr59858.c: Likewise.
* gcc.target/arm/pr65647-2.c: Likewise.
* gcc.target/arm/pr79058.c: Likewise.
* gcc.target/arm/pr83712.c: Likewise.
* gcc.target/arm/pragma_arch_switch_2.c: Likewise.
* gcc.target/arm/scd42-1.c: Likewise.
* gcc.target/arm/scd42-2.c: Likewise.
* gcc.target/arm/scd42-3.c: Likewise.

Change-Id: I0845b262b241026561cc52a19ff8bb1659675e49

diff --git a/gcc/testsuite/gcc.target/arm/armv6-unaligned-load-ice.c b/gcc/testsuite/gcc.target/arm/armv6-unaligned-load-ice.c
index 88528f1..4c1568f 100644
--- a/gcc/testsuite/gcc.target/arm/armv6-unaligned-load-ice.c
+++ b/gcc/testsuite/gcc.target/arm/armv6-unaligned-load-ice.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=armv6k" } } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-marm" } { "" } } */
+/* { dg-require-effective-target arm_arch_v6k_ok } */
 /* { dg-options "-mthumb -Os -mfloat-abi=softfp" } */
 /* { dg-add-options arm_arch_v6k } */
 
diff --git a/gcc/testsuite/gcc.target/arm/attr-unaligned-load-ice.c b/gcc/testsuite/gcc.target/arm/attr-unaligned-load-ice.c
index e1ed1c1..2eeb522 100644
--- a/gcc/testsuite/gcc.target/arm/attr-unaligned-load-ice.c
+++ b/gcc/testsuite/gcc.target/arm/attr-unaligned-load-ice.c
@@ -2,6 +2,7 @@
Verify that unaligned_access is correctly with attribute target.  */
 /* { dg-do compile } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=armv6" } } */
+/* { dg-require-effective-target arm_arch_v6_ok } */
 /* { dg-options "-Os -mfloat-abi=softfp -mtp=soft" } */
 /* { dg-add-options arm_arch_v6 } */
 
diff --git a/gcc/testsuite/gcc.target/arm/attr_arm-err.c b/gcc/testsuite/gcc.target/arm/attr_arm-err.c
index 630c06a..d410056 100644
--- a/gcc/testsuite/gcc.target/arm/attr_arm-err.c
+++ b/gcc/testsuite/gcc.target/arm/attr_arm-err.c
@@ -2,6 +2,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_arm_ok } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=armv6-m" } } */
+/* { dg-require-effective-target arm_arch_v6m_ok } */
 /* { dg-add-options arm_arch_v6m } */
 
 int __attribute__((target("arm")))
diff --git a/gcc/testsuite/gcc.target/arm/ftest-armv4-arm.c b/gcc/testsuite/gcc.target/arm/ftest-armv4-arm.c
index 4b48ef8..447a8ec 100644
--- a/gcc/testsuite/gcc.target/arm/ftest-armv4-arm.c
+++ b/gcc/testsuite/gcc.target/arm/ftest-armv4-arm.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=armv4" } } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-mthumb" } { "" } } */
+/* { dg-require-effective-target arm_arch_v4_ok } */
 /* { dg-options "-marm" } */
 /* { dg-add-options arm_arch_v4 } */
 
diff --git a/gcc/testsuite/gcc.target/arm/ftest-armv4t-arm.c b/gcc/testsuite/gcc.target/arm/ftest-armv4t-arm.c
index 016506f..05db533 100644
--- a/gcc/testsuite/gcc.target/arm/ftest-armv4t-arm.c
+++ b/gcc/testsuite/gcc.target/arm/ftest-armv4t-arm.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" } { "-march=armv4t" } } */
 /* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-mthumb" } { "" } } */
+/* { dg-require-effective-target arm_arch_v4t_ok } */
 /* { dg-options "-marm" } */
 /* { dg-add-options arm_arch_v4t } */
 
diff --git 

[ARM/FDPIC v2 19/21] [ARM][testsuite] FDPIC: Enable tests on pie_enabled targets

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

Some tests have the "nonpic" guard, but pass on
arm*-*-uclinuxfdpiceabi because it is in PIE mode by default. Rather
than adding this target to all these tests, add the "pie_enabled"
effective target.
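The combined selector `{ target { nonpic || pie_enabled } }` can be modelled as a small predicate. This sketch assumes, as I recall from `lib/target-supports.exp`, that `nonpic` means `__PIC__` is not defined and `pie_enabled` means `__PIE__` is defined by default; `run_allowed` and `run_allowed_here` are hypothetical names. The only configuration still excluded is PIC code that is not a PIE, such as code built with a plain `-fPIC`.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of { target { nonpic || pie_enabled } }: run when the code is
   not PIC at all, or when it is PIC because it is a PIE.  */
static bool
run_allowed (bool pic, bool pie)
{
  return !pic || pie;
}

/* Same decision for the current compilation, assuming GCC's
   __PIC__ / __PIE__ predefined macros.  */
static bool
run_allowed_here (void)
{
#ifdef __PIC__
  bool pic = true;
#else
  bool pic = false;
#endif
#ifdef __PIE__
  bool pie = true;
#else
  bool pie = false;
#endif
  return run_allowed (pic, pie);
}
```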

2018-XX-XX  Christophe Lyon  

gcc/testsuite/
* g++.dg/cpp0x/noexcept03.C: Add pie_enabled.
* g++.dg/ipa/devirt-c-7.C: Likewise.
* g++.dg/ipa/ivinline-1.C: Likewise.
* g++.dg/ipa/ivinline-2.C: Likewise.
* g++.dg/ipa/ivinline-3.C: Likewise.
* g++.dg/ipa/ivinline-4.C: Likewise.
* g++.dg/ipa/ivinline-5.C: Likewise.
* g++.dg/ipa/ivinline-7.C: Likewise.
* g++.dg/ipa/ivinline-8.C: Likewise.
* g++.dg/ipa/ivinline-9.C: Likewise.
* g++.dg/tls/pr79288.C: Likewise.
* gcc.dg/addr_equal-1.c: Likewise.
* gcc.dg/const-1.c: Likewise.
* gcc.dg/ipa/pure-const-1.c: Likewise.
* gcc.dg/noreturn-8.c: Likewise.
* gcc.dg/pr33826.c: Likewise.
* gcc.dg/torture/ipa-pta-1.c: Likewise.
* gcc.dg/tree-ssa/alias-2.c: Likewise.
* gcc.dg/tree-ssa/ipa-split-5.c: Likewise.
* gcc.dg/tree-ssa/loadpre6.c: Likewise.
* gcc.dg/uninit-19.c: Likewise.

Change-Id: I1a0d836b892c23891f739fccdc467d0f354ab82c

diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept03.C b/gcc/testsuite/g++.dg/cpp0x/noexcept03.C
index 2d37867..906a44d 100644
--- a/gcc/testsuite/g++.dg/cpp0x/noexcept03.C
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept03.C
@@ -1,6 +1,6 @@
 // Runtime test for noexcept-specification.
 // { dg-options "-Wnoexcept" }
-// { dg-do run { target nonpic } }
+// { dg-do run { target { nonpic || pie_enabled } } }
 // { dg-require-effective-target c++11 }
 
 #include 
diff --git a/gcc/testsuite/g++.dg/ipa/devirt-c-7.C b/gcc/testsuite/g++.dg/ipa/devirt-c-7.C
index 2e76cbe..efb65c2 100644
--- a/gcc/testsuite/g++.dg/ipa/devirt-c-7.C
+++ b/gcc/testsuite/g++.dg/ipa/devirt-c-7.C
@@ -1,7 +1,6 @@
 /* Verify that ipa-cp will not get confused by placement new constructing an
object within another one when looking for dynamic type change .  */
-/* { dg-do run } */
-/* { dg-require-effective-target nonpic } */
+/* { dg-do run { target { nonpic || pie_enabled } } } */
 /* { dg-options "-O3 -Wno-attributes"  } */
 
 extern "C" void abort (void);
diff --git a/gcc/testsuite/g++.dg/ipa/ivinline-1.C b/gcc/testsuite/g++.dg/ipa/ivinline-1.C
index 9b10d20..2d988bc 100644
--- a/gcc/testsuite/g++.dg/ipa/ivinline-1.C
+++ b/gcc/testsuite/g++.dg/ipa/ivinline-1.C
@@ -1,6 +1,6 @@
 /* Verify that simple virtual calls are inlined even without early
inlining.  */
-/* { dg-do run { target nonpic } } */
+/* { dg-do run { target { nonpic || pie_enabled } } } */
 /* { dg-options "-O3 -fdump-ipa-inline -fno-early-inlining -fno-ipa-cp"  } */
 
 extern "C" void abort (void);
diff --git a/gcc/testsuite/g++.dg/ipa/ivinline-2.C b/gcc/testsuite/g++.dg/ipa/ivinline-2.C
index 21cd46f..d978638 100644
--- a/gcc/testsuite/g++.dg/ipa/ivinline-2.C
+++ b/gcc/testsuite/g++.dg/ipa/ivinline-2.C
@@ -1,6 +1,6 @@
 /* Verify that simple virtual calls using this pointer are inlined
even without early inlining..  */
-/* { dg-do run { target nonpic } } */
+/* { dg-do run { target { nonpic || pie_enabled } } } */
 /* { dg-options "-O3 -fdump-ipa-inline -fno-early-inlining -fno-ipa-cp"  } */
 
 extern "C" void abort (void);
diff --git a/gcc/testsuite/g++.dg/ipa/ivinline-3.C b/gcc/testsuite/g++.dg/ipa/ivinline-3.C
index 1e24644..f756a16 100644
--- a/gcc/testsuite/g++.dg/ipa/ivinline-3.C
+++ b/gcc/testsuite/g++.dg/ipa/ivinline-3.C
@@ -1,6 +1,6 @@
 /* Verify that simple virtual calls on an object refrence are inlined
even without early inlining.  */
-/* { dg-do run { target nonpic } } */
+/* { dg-do run { target { nonpic || pie_enabled } } } */
 /* { dg-options "-O3 -fdump-ipa-inline -fno-early-inlining -fno-ipa-cp"  } */
 
 extern "C" void abort (void);
diff --git a/gcc/testsuite/g++.dg/ipa/ivinline-4.C b/gcc/testsuite/g++.dg/ipa/ivinline-4.C
index cf0d980..5fbd3ef 100644
--- a/gcc/testsuite/g++.dg/ipa/ivinline-4.C
+++ b/gcc/testsuite/g++.dg/ipa/ivinline-4.C
@@ -1,7 +1,7 @@
 /* Verify that simple virtual calls are inlined even without early
inlining, even when a typecast to an ancestor is involved along the
way.  */
-/* { dg-do run { target nonpic } } */
+/* { dg-do run { target { nonpic || pie_enabled } } } */
 /* { dg-options "-O3 -fdump-ipa-inline -fno-early-inlining -fno-ipa-cp"  } */
 
 extern "C" void abort (void);
diff --git a/gcc/testsuite/g++.dg/ipa/ivinline-5.C b/gcc/testsuite/g++.dg/ipa/ivinline-5.C
index f15ebf2..6c19907 100644
--- a/gcc/testsuite/g++.dg/ipa/ivinline-5.C
+++ b/gcc/testsuite/g++.dg/ipa/ivinline-5.C
@@ -1,6 +1,6 @@
 /* Verify that virtual call inlining does not pick a wrong method when
there is a user defined ancestor in an object.  */
-/* { dg-do run { target nonpic } } */
+/* { dg-do run { target { nonpic || pie_enabled } } } */
 /* { dg-options "-O3 -fdump-ipa-inline 

[ARM/FDPIC v2 18/21] [ARM][testsuite] FDPIC: Handle *-*-uclinux*

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

Add *-*-uclinux* to tests that work on this target.
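DejaGnu target lists such as `*-*-linux* *-*-gnu*` are shell-style globs matched against the target triplet. The sketch below uses POSIX `fnmatch` as a stand-in for that matching (an assumption about equivalence, not DejaGnu's own code), with `arm-unknown-uclinuxfdpiceabi` as an example triplet and `target_selected` as a hypothetical helper, to show why the existing lists never selected the FDPIC target.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if TRIPLET matches any shell-style pattern
   in the NULL-terminated list PATTERNS.  */
static bool
target_selected (const char *triplet, const char *const *patterns)
{
  for (size_t i = 0; patterns[i] != NULL; i++)
    if (fnmatch (patterns[i], triplet, 0) == 0)
      return true;
  return false;
}

/* Target list of g++.dg/abi/forced.C before and after the patch.  */
static const char *const before_list[] = { "*-*-linux*", "*-*-gnu*", NULL };
static const char *const after_list[] = {
  "*-*-linux*", "*-*-gnu*", "*-*-uclinux*", NULL
};
```

`*-*-linux*` needs the literal text `-linux`, but the FDPIC triplet only contains `-uclinux`, so the new `*-*-uclinux*` entry is what makes the difference.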

2018-XX-XX  Christophe Lyon  

gcc/testsuite/
* g++.dg/abi/forced.C: Add *-*-uclinux*.
* g++.dg/abi/guard2.C: Likewise.
* g++.dg/ext/cleanup-10.C: Likewise.
* g++.dg/ext/cleanup-11.C: Likewise.
* g++.dg/ext/cleanup-8.C: Likewise.
* g++.dg/ext/cleanup-9.C: Likewise.
* g++.dg/ext/sync-4.C: Likewise.
* g++.dg/ipa/comdat.C: Likewise.
* gcc.dg/20041106-1.c: Likewise.
* gcc.dg/cleanup-10.c: Likewise.
* gcc.dg/cleanup-11.c: Likewise.
* gcc.dg/cleanup-8.c: Likewise.
* gcc.dg/cleanup-9.c: Likewise.
* gcc.dg/fdata-sections-1.c: Likewise.
* gcc.dg/fdata-sections-2.c: Likewise.
* gcc.dg/pr39323-1.c: Likewise.
* gcc.dg/pr39323-2.c: Likewise.
* gcc.dg/pr39323-3.c: Likewise.
* gcc.dg/pr65780-1.c: Likewise.
* gcc.dg/pr65780-2.c: Likewise.
* gcc.dg/pr67338.c: Likewise.
* gcc.dg/pr78185.c: Likewise.
* gcc.dg/pr83100-1.c: Likewise.
* gcc.dg/pr83100-4.c: Likewise.
* gcc.dg/strlenopt-12g.c: Likewise.
* gcc.dg/strlenopt-14g.c: Likewise.
* gcc.dg/strlenopt-14gf.c: Likewise.
* gcc.dg/strlenopt-16g.c: Likewise.
* gcc.dg/strlenopt-17g.c: Likewise.
* gcc.dg/strlenopt-18g.c: Likewise.
* gcc.dg/strlenopt-1f.c: Likewise.
* gcc.dg/strlenopt-22g.c: Likewise.
* gcc.dg/strlenopt-2f.c: Likewise.
* gcc.dg/strlenopt-31g.c: Likewise.
* gcc.dg/strlenopt-33g.c: Likewise.
* gcc.dg/strlenopt-4g.c: Likewise.
* gcc.dg/strlenopt-4gf.c: Likewise.
* gcc.dg/strncmp-2.c: Likewise.
* gcc.dg/struct-ret-3.c: Likewise.
* gcc.dg/torture/pr69760.c: Likewise.
* gcc.target/arm/div64-unwinding.c: Likewise.
* gcc.target/arm/stack-checking.c: Likewise.
* gcc.target/arm/synchronize.c: Likewise.
* gcc.target/arm/pr66912.c: Add arm*-*-uclinuxfdpiceabi.
* lib/target-supports.exp (check_effective_target_pie): Likewise.
(check_effective_target_sync_long_long_runtime): Likewise.
(check_effective_target_sync_int_long): Likewise.
(check_effective_target_sync_char_short): Likewise.

Change-Id: I89bfea79d4490c5df0b6470def5a31d7f31ac2cc

diff --git a/gcc/testsuite/g++.dg/abi/forced.C b/gcc/testsuite/g++.dg/abi/forced.C
index 0e6be28..2d1ec53 100644
--- a/gcc/testsuite/g++.dg/abi/forced.C
+++ b/gcc/testsuite/g++.dg/abi/forced.C
@@ -1,4 +1,4 @@
-// { dg-do run { target *-*-linux* *-*-gnu* } }
+// { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } }
 // { dg-options "-pthread" }
 
 #include 
diff --git a/gcc/testsuite/g++.dg/abi/guard2.C b/gcc/testsuite/g++.dg/abi/guard2.C
index c35fa7e..74139a8 100644
--- a/gcc/testsuite/g++.dg/abi/guard2.C
+++ b/gcc/testsuite/g++.dg/abi/guard2.C
@@ -1,6 +1,6 @@
 // PR c++/41611
 // Test that the guard gets its own COMDAT group.
-// { dg-final { scan-assembler "_ZGVZN1A1fEvE1i,comdat" { target *-*-linux* *-*-gnu* } } }
+// { dg-final { scan-assembler "_ZGVZN1A1fEvE1i,comdat" { target *-*-linux* *-*-gnu* *-*-uclinux* } } }
 
 struct A {
   static int f()
diff --git a/gcc/testsuite/g++.dg/ext/cleanup-10.C b/gcc/testsuite/g++.dg/ext/cleanup-10.C
index 66c7b76..56aeb66 100644
--- a/gcc/testsuite/g++.dg/ext/cleanup-10.C
+++ b/gcc/testsuite/g++.dg/ext/cleanup-10.C
@@ -1,4 +1,4 @@
-/* { dg-do run { target hppa*-*-hpux* *-*-linux* *-*-gnu* powerpc*-*-darwin* *-*-darwin[912]* } } */
+/* { dg-do run { target hppa*-*-hpux* *-*-linux* *-*-gnu* powerpc*-*-darwin* *-*-darwin[912]* *-*-uclinux* } } */
 /* { dg-options "-fexceptions -fnon-call-exceptions -O2" } */
 /* Verify that cleanups work with exception handling through signal frames
on alternate stack.  */
diff --git a/gcc/testsuite/g++.dg/ext/cleanup-11.C b/gcc/testsuite/g++.dg/ext/cleanup-11.C
index 6e96521..c6d3560 100644
--- a/gcc/testsuite/g++.dg/ext/cleanup-11.C
+++ b/gcc/testsuite/g++.dg/ext/cleanup-11.C
@@ -1,4 +1,4 @@
-/* { dg-do run { target hppa*-*-hpux* *-*-linux* *-*-gnu* powerpc*-*-darwin* *-*-darwin[912]* } } */
+/* { dg-do run { target hppa*-*-hpux* *-*-linux* *-*-gnu* powerpc*-*-darwin* *-*-darwin[912]* *-*-uclinux* } } */
 /* { dg-options "-fexceptions -fnon-call-exceptions -O2" } */
 /* Verify that cleanups work with exception handling through realtime signal
frames on alternate stack.  */
diff --git a/gcc/testsuite/g++.dg/ext/cleanup-8.C b/gcc/testsuite/g++.dg/ext/cleanup-8.C
index ccf9bef..e99508d 100644
--- a/gcc/testsuite/g++.dg/ext/cleanup-8.C
+++ b/gcc/testsuite/g++.dg/ext/cleanup-8.C
@@ -1,4 +1,4 @@
-/* { dg-do run { target hppa*-*-hpux* *-*-linux* *-*-gnu* powerpc*-*-darwin* *-*-darwin[912]* } } */
+/* { dg-do run { target hppa*-*-hpux* *-*-linux* *-*-gnu* powerpc*-*-darwin* *-*-darwin[912]* *-*-uclinux* } } */
 /* { dg-options "-fexceptions -fnon-call-exceptions -O2" } */
 /* Verify that 

[ARM/FDPIC v2 17/21] [ARM][testsuite] FDPIC: Skip tests that don't work in PIC mode

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

Some tests fail on arm*-*-uclinuxfdpiceabi because it generates PIC
code and they don't support it: skip them. They also fail on
arm*-linux* when forcing -fPIC.

2018-XX-XX  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/eliminate.c: Accept only nonpic targets.
* g++.dg/other/anon5.C: Likewise.

Change-Id: I8efb8d356ce25b020c44a84b07f79a996dca0358

diff --git a/gcc/testsuite/g++.dg/other/anon5.C b/gcc/testsuite/g++.dg/other/anon5.C
index ee4601e..dadd92e 100644
--- a/gcc/testsuite/g++.dg/other/anon5.C
+++ b/gcc/testsuite/g++.dg/other/anon5.C
@@ -1,5 +1,6 @@
 // PR c++/34094
 // { dg-do link { target { ! { *-*-darwin* *-*-hpux* *-*-solaris2.* } } } }
+// { dg-require-effective-target nonpic }
 // { dg-options "-gdwarf-2" }
 // Ignore additional message on powerpc-ibm-aix
 // { dg-prune-output "obtain more information" } */
diff --git a/gcc/testsuite/gcc.target/arm/eliminate.c b/gcc/testsuite/gcc.target/arm/eliminate.c
index f254dd8..299d4df 100644
--- a/gcc/testsuite/gcc.target/arm/eliminate.c
+++ b/gcc/testsuite/gcc.target/arm/eliminate.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { nonpic } } } */
 /* { dg-options "-O2" }  */
 
 struct X
-- 
2.6.3



[ARM/FDPIC v2 16/21] [ARM][testsuite] FDPIC: Skip v8-m and v6-m tests that currently produce an ICE

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

v6-M and v8-M are currently not supported in FDPIC mode, so it is
better to skip these tests.

2018-XX-XX  Christophe Lyon  
Mickaël Guêné 

gcc/testsuite/
* gcc.target/arm/atomic-comp-swap-release-acquire-3.c: Skip on
arm*-*-uclinuxfdpiceabi.
* gcc.target/arm/atomic-op-acq_rel-3.c: Likewise.
* gcc.target/arm/atomic-op-acquire-3.c: Likewise.
* gcc.target/arm/atomic-op-char-3.c: Likewise.
* gcc.target/arm/atomic-op-consume-3.c: Likewise.
* gcc.target/arm/atomic-op-int-3.c: Likewise.
* gcc.target/arm/atomic-op-relaxed-3.c: Likewise.
* gcc.target/arm/atomic-op-release-3.c: Likewise.
* gcc.target/arm/atomic-op-seq_cst-3.c: Likewise.
* gcc.target/arm/atomic-op-short-3.c: Likewise.
* gcc.target/arm/pr65647.c: Likewise.

Change-Id: I2357be4c92b5a1a8430ae6617c7bba7bec0ea213

diff --git a/gcc/testsuite/gcc.target/arm/atomic-comp-swap-release-acquire-3.c b/gcc/testsuite/gcc.target/arm/atomic-comp-swap-release-acquire-3.c
index 0191f7a..81b5c3d 100644
--- a/gcc/testsuite/gcc.target/arm/atomic-comp-swap-release-acquire-3.c
+++ b/gcc/testsuite/gcc.target/arm/atomic-comp-swap-release-acquire-3.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "FDPIC does not support v8m yet" { arm*-*-uclinuxfdpiceabi } "*" "" } */
 /* { dg-require-effective-target arm_arch_v8m_base_ok } */
 /* { dg-options "-O2 -fno-ipa-icf" } */
 /* { dg-add-options arm_arch_v8m_base } */
diff --git a/gcc/testsuite/gcc.target/arm/atomic-op-acq_rel-3.c b/gcc/testsuite/gcc.target/arm/atomic-op-acq_rel-3.c
index f2ed32d..2b03f75 100644
--- a/gcc/testsuite/gcc.target/arm/atomic-op-acq_rel-3.c
+++ b/gcc/testsuite/gcc.target/arm/atomic-op-acq_rel-3.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "FDPIC does not support v8m yet" { arm*-*-uclinuxfdpiceabi } "*" "" } */
 /* { dg-require-effective-target arm_arch_v8m_base_ok } */
 /* { dg-options "-O2" } */
 /* { dg-add-options arm_arch_v8m_base } */
diff --git a/gcc/testsuite/gcc.target/arm/atomic-op-acquire-3.c b/gcc/testsuite/gcc.target/arm/atomic-op-acquire-3.c
index bba1c27..d315b25 100644
--- a/gcc/testsuite/gcc.target/arm/atomic-op-acquire-3.c
+++ b/gcc/testsuite/gcc.target/arm/atomic-op-acquire-3.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "FDPIC does not support v8m yet" { arm*-*-uclinuxfdpiceabi } "*" "" } */
 /* { dg-require-effective-target arm_arch_v8m_base_ok } */
 /* { dg-options "-O2" } */
 /* { dg-add-options arm_arch_v8m_base } */
diff --git a/gcc/testsuite/gcc.target/arm/atomic-op-char-3.c b/gcc/testsuite/gcc.target/arm/atomic-op-char-3.c
index 17117ee..11e596d 100644
--- a/gcc/testsuite/gcc.target/arm/atomic-op-char-3.c
+++ b/gcc/testsuite/gcc.target/arm/atomic-op-char-3.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "FDPIC does not support v8m yet" { arm*-*-uclinuxfdpiceabi } "*" "" } */
 /* { dg-require-effective-target arm_arch_v8m_base_ok } */
 /* { dg-options "-O2" } */
 /* { dg-add-options arm_arch_v8m_base } */
diff --git a/gcc/testsuite/gcc.target/arm/atomic-op-consume-3.c b/gcc/testsuite/gcc.target/arm/atomic-op-consume-3.c
index 8352f0c..e5da00b 100644
--- a/gcc/testsuite/gcc.target/arm/atomic-op-consume-3.c
+++ b/gcc/testsuite/gcc.target/arm/atomic-op-consume-3.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "FDPIC does not support v8m yet" { arm*-*-uclinuxfdpiceabi } "*" "" } */
 /* { dg-require-effective-target arm_arch_v8m_base_ok } */
 /* { dg-options "-O2" } */
 /* { dg-add-options arm_arch_v8m_base } */
diff --git a/gcc/testsuite/gcc.target/arm/atomic-op-int-3.c b/gcc/testsuite/gcc.target/arm/atomic-op-int-3.c
index d4f1db3..997ab08 100644
--- a/gcc/testsuite/gcc.target/arm/atomic-op-int-3.c
+++ b/gcc/testsuite/gcc.target/arm/atomic-op-int-3.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "FDPIC does not support v8m yet" { arm*-*-uclinuxfdpiceabi } "*" "" } */
 /* { dg-require-effective-target arm_arch_v8m_base_ok } */
 /* { dg-options "-O2" } */
 /* { dg-add-options arm_arch_v8m_base } */
diff --git a/gcc/testsuite/gcc.target/arm/atomic-op-relaxed-3.c b/gcc/testsuite/gcc.target/arm/atomic-op-relaxed-3.c
index 09b5ea9..383a48a 100644
--- a/gcc/testsuite/gcc.target/arm/atomic-op-relaxed-3.c
+++ b/gcc/testsuite/gcc.target/arm/atomic-op-relaxed-3.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "FDPIC does not support v8m yet" { arm*-*-uclinuxfdpiceabi } "*" "" } */
 /* { dg-require-effective-target arm_arch_v8m_base_ok } */
 /* { dg-options "-O2" } */
 /* { dg-add-options arm_arch_v8m_base } */
diff --git a/gcc/testsuite/gcc.target/arm/atomic-op-release-3.c b/gcc/testsuite/gcc.target/arm/atomic-op-release-3.c
index 2b136f5..3227c75 100644
--- a/gcc/testsuite/gcc.target/arm/atomic-op-release-3.c
+++ b/gcc/testsuite/gcc.target/arm/atomic-op-release-3.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "FDPIC does not support v8m yet" { 

[ARM/FDPIC v2 15/21] [ARM][testsuite] FDPIC: Adjust scan-assembler patterns.

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

In FDPIC mode, r9 is saved in addition to other registers, so update
the expected patterns accordingly.
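Besides r9, each updated FDPIC list also gains one extra low register (r5 in interrupt-1.c, r6 in interrupt-2.c, r4 in pr70830.c), which appears to keep the number of saved registers even, presumably for 8-byte stack alignment. The sketch below, built on a hypothetical `format_reglist` helper, turns a register bitmask into the brace list the scans look for, using the ARM-state aliases fp (r11), ip (r12), sp (r13), lr (r14) and pc (r15), and reproduces both interrupt-1.c push lists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Registers in the non-FDPIC interrupt-1.c push: r0-r4, fp, ip, lr.  */
#define IRQ1_MASK        0x581fu
/* The FDPIC pattern adds r5 and r9.  */
#define IRQ1_FDPIC_MASK  (IRQ1_MASK | (1u << 5) | (1u << 9))

/* Append S to BUF (capacity LEN, currently *POS chars), keeping it
   NUL-terminated; S is dropped if it would not fit.  */
static void
append (char *buf, size_t len, size_t *pos, const char *s)
{
  size_t n = strlen (s);
  if (*pos + n >= len)
    return;
  memcpy (buf + *pos, s, n + 1);
  *pos += n;
}

/* Hypothetical helper: format MASK (bit N = rN) as "{r0, r1, ...}".  */
static void
format_reglist (unsigned mask, char *buf, size_t len)
{
  static const char *const names[16] = {
    "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
    "r8", "r9", "r10", "fp", "ip", "sp", "lr", "pc"
  };
  size_t pos = 0;
  int first = 1;

  if (len == 0)
    return;
  buf[0] = '\0';
  append (buf, len, &pos, "{");
  for (int r = 0; r < 16; r++)
    if (mask & (1u << r))
      {
        if (!first)
          append (buf, len, &pos, ", ");
        append (buf, len, &pos, names[r]);
        first = 0;
      }
  append (buf, len, &pos, "}");
}
```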

2018-XX-XX  Christophe Lyon  
Mickaël Guêné 

gcc/testsuite/
* gcc.target/arm/interrupt-1.c: Add scan-assembler pattern for
arm*-*-uclinuxfdpiceabi.
* gcc.target/arm/interrupt-2.c: Likewise.
* gcc.target/arm/pr70830.c: Likewise.

Change-Id: Id946b79bacc32be585c31e60a355191f104cc29e

diff --git a/gcc/testsuite/gcc.target/arm/interrupt-1.c b/gcc/testsuite/gcc.target/arm/interrupt-1.c
index fe94877..493763d 100644
--- a/gcc/testsuite/gcc.target/arm/interrupt-1.c
+++ b/gcc/testsuite/gcc.target/arm/interrupt-1.c
@@ -13,5 +13,7 @@ void foo ()
   bar (0);
 }
 
-/* { dg-final { scan-assembler "push\t{r0, r1, r2, r3, r4, fp, ip, lr}" } } */
-/* { dg-final { scan-assembler "ldmfd\tsp!, {r0, r1, r2, r3, r4, fp, ip, pc}\\^" } } */
+/* { dg-final { scan-assembler "push\t{r0, r1, r2, r3, r4, fp, ip, lr}" { target { ! arm*-*-uclinuxfdpiceabi } } } } */
+/* { dg-final { scan-assembler "ldmfd\tsp!, {r0, r1, r2, r3, r4, fp, ip, pc}\\^" { target { ! arm*-*-uclinuxfdpiceabi } } } } */
+/* { dg-final { scan-assembler "push\t{r0, r1, r2, r3, r4, r5, r9, fp, ip, lr}" { target arm*-*-uclinuxfdpiceabi } } } */
+/* { dg-final { scan-assembler "ldmfd\tsp!, {r0, r1, r2, r3, r4, r5, r9, fp, ip, pc}\\^" { target arm*-*-uclinuxfdpiceabi } } } */
diff --git a/gcc/testsuite/gcc.target/arm/interrupt-2.c b/gcc/testsuite/gcc.target/arm/interrupt-2.c
index 289eca0..5be1f16 100644
--- a/gcc/testsuite/gcc.target/arm/interrupt-2.c
+++ b/gcc/testsuite/gcc.target/arm/interrupt-2.c
@@ -15,5 +15,7 @@ void test()
   foo = 0;
 }
 
-/* { dg-final { scan-assembler "push\t{r0, r1, r2, r3, r4, r5, ip, lr}" } } */
-/* { dg-final { scan-assembler "ldmfd\tsp!, {r0, r1, r2, r3, r4, r5, ip, pc}\\^" } } */
+/* { dg-final { scan-assembler "push\t{r0, r1, r2, r3, r4, r5, ip, lr}" { target { ! arm*-*-uclinuxfdpiceabi } } } } */
+/* { dg-final { scan-assembler "ldmfd\tsp!, {r0, r1, r2, r3, r4, r5, ip, pc}\\^" { target { ! arm*-*-uclinuxfdpiceabi } } } } */
+/* { dg-final { scan-assembler "push\t{r0, r1, r2, r3, r4, r5, r6, r9, ip, lr}" { target arm*-*-uclinuxfdpiceabi } } } */
+/* { dg-final { scan-assembler "ldmfd\tsp!, {r0, r1, r2, r3, r4, r5, r6, r9, ip, pc}\\^" { target arm*-*-uclinuxfdpiceabi } } } */
diff --git a/gcc/testsuite/gcc.target/arm/pr70830.c b/gcc/testsuite/gcc.target/arm/pr70830.c
index cad903b..cd84c42 100644
--- a/gcc/testsuite/gcc.target/arm/pr70830.c
+++ b/gcc/testsuite/gcc.target/arm/pr70830.c
@@ -11,4 +11,5 @@ void __attribute__ ((interrupt ("IRQ"))) dm3730_IRQHandler(void)
 {
 prints("IRQ" );
 }
-/* { dg-final { scan-assembler "ldmfd\tsp!, {r0, r1, r2, r3, ip, pc}\\^" } } */
+/* { dg-final { scan-assembler "ldmfd\tsp!, {r0, r1, r2, r3, ip, pc}\\^" { target { ! arm*-*-uclinuxfdpiceabi } } } } */
+/* { dg-final { scan-assembler "ldmfd\tsp!, {r0, r1, r2, r3, r4, r9, ip, pc}\\^" { target arm*-*-uclinuxfdpiceabi } } } */
-- 
2.6.3



[ARM/FDPIC v2 13/21] [ARM] FDPIC: Force LSB bit for PC in Cortex-M architecture

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

Without this, when we are unwinding across a signal frame we can jump
to an even address, which leads to an exception.

This is needed in __gnu_personality_sigframe_fdpic() when restoring the
PC from the signal frame, since the PC saved by the kernel has the LSB
set to zero.
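A sketch of the rule the hunk below adds to `_Unwind_VRS_Set`, using a hypothetical `set_core_reg` over a plain register array rather than the unwinder's own virtual register set. Writing r15 on a Thumb-only v7-M core forces bit 0, so the eventual branch to that address stays in Thumb state, while other registers are stored unchanged.

```c
#include <assert.h>
#include <stdint.h>

#define PC_REGNUM 15  /* r15 is the program counter.  */

/* Hypothetical model of the core-register store on a Thumb-only (v7-M)
   target: a PC value keeps bit 0 set.  Out-of-range register numbers
   are ignored.  */
static void
set_core_reg (uint32_t regs[16], unsigned regno, uint32_t value)
{
  if (regno >= 16)
    return;
  regs[regno] = value;
  if (regno == PC_REGNUM)
    regs[regno] |= 1u;
}
```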

2018-XX-XX  Christophe Lyon  
Mickaël Guêné 

libgcc/
* config/arm/unwind-arm.c (_Unwind_VRS_Set): Handle v7m
architecture.

Change-Id: Ie84de548226bcf1751e19a09e8f091fb3013ccea

diff --git a/libgcc/config/arm/unwind-arm.c b/libgcc/config/arm/unwind-arm.c
index 564e4f13..6da6e3d 100644
--- a/libgcc/config/arm/unwind-arm.c
+++ b/libgcc/config/arm/unwind-arm.c
@@ -198,6 +198,11 @@ _Unwind_VRS_Result _Unwind_VRS_Set (_Unwind_Context *context,
return _UVRSR_FAILED;
 
   vrs->core.r[regno] = *(_uw *) valuep;
+#if defined(__ARM_ARCH_7M__)
+  /* Force LSB bit since we always run thumb code.  */
+  if (regno == 15)
+   vrs->core.r[regno] |= 1;
+#endif
   return _UVRSR_OK;
 
 case _UVRSC_VFP:
-- 
2.6.3



[ARM/FDPIC v2 14/21] [ARM][testsuite] FDPIC: Skip unsupported tests

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

Several tests cannot work on ARM-FDPIC for various reasons: skip them,
or skip some directives.

gcc.dg/20020312-2.c: Skip since it forces -fno-pic.

gcc.target/arm/:
* Skip since r9 is clobbered by assembly code:
  20051215-1.c
  mmx-1.c
  pr61948.c
  pr77933-1.c
  pr77933-2.c

* Skip since the test forces armv5te which is not supported by FDPIC:
  pr40887.c
  pr19599.c

* Skip since FDPIC disables sibcall to external functions:
  sibcall-1.c
  tail-long-call.c
  vfp-longcall-apcs.c

* Skip size check since it's different for FDPIC:
  ivopts-2.c
  ivopts-3.c
  ivopts-4.c
  ivopts-5.c
  pr43597.c
  pr43920-2.c

* Disable assembler scans that are invalid for FDPIC:
  pr45701-1.c
  pr45701-2.c
  stack-red-zone.c

* gnu2 TLS dialect is not supported by FDPIC:
  tlscall.c

* Test relies on symbols not generated in FDPIC:
  data-rel-2.c
  data-rel-3.c

2018-XX-XX  Christophe Lyon  
Mickaël Guêné 

gcc/testsuite/
* gcc.dg/20020312-2.c: Skip on arm*-*-uclinuxfdpiceabi.
* gcc.target/arm/20051215-1.c: Likewise.
* gcc.target/arm/mmx-1.c: Likewise.
* gcc.target/arm/pr19599.c: Likewise.
* gcc.target/arm/pr40887.c: Likewise.
* gcc.target/arm/pr61948.c: Likewise.
* gcc.target/arm/pr77933-1.c: Likewise.
* gcc.target/arm/pr77933-2.c: Likewise.
* gcc.target/arm/sibcall-1.c: Likewise.
* gcc.target/arm/data-rel-2.c: Likewise.
* gcc.target/arm/data-rel-3.c: Likewise.
* gcc.target/arm/tail-long-call.c: Likewise.
* gcc.target/arm/tlscall.c: Likewise.
* gcc.target/arm/vfp-longcall-apcs.c: Likewise.
* gcc.target/arm/ivopts-2.c: Skip object-size test on
arm*-*-uclinuxfdpiceabi.
* gcc.target/arm/ivopts-3.c: Likewise.
* gcc.target/arm/ivopts-4.c: Likewise.
* gcc.target/arm/ivopts-5.c: Likewise.
* gcc.target/arm/pr43597.c: Likewise.
* gcc.target/arm/pr43920-2.c: Likewise.
* gcc.target/arm/pr45701-1.c: Skip scan-assembler on
arm*-*-uclinuxfdpiceabi.
* gcc.target/arm/pr45701-2.c: Likewise.
* gcc.target/arm/stack-red-zone.c: Likewise.

Change-Id: Icada7ce52537901fdac10403e7997571b7e2c509

diff --git a/gcc/testsuite/gcc.dg/20020312-2.c b/gcc/testsuite/gcc.dg/20020312-2.c
index f5929e0..a7758b8 100644
--- a/gcc/testsuite/gcc.dg/20020312-2.c
+++ b/gcc/testsuite/gcc.dg/20020312-2.c
@@ -8,6 +8,7 @@
 /* { dg-do run } */
 /* { dg-options "-O -fno-pic" } */
 /* { dg-require-effective-target nonlocal_goto } */
+/* { dg-skip-if "" { arm*-*-uclinuxfdpiceabi } "*" "" } */
 
 extern void abort (void);
 
diff --git a/gcc/testsuite/gcc.target/arm/20051215-1.c b/gcc/testsuite/gcc.target/arm/20051215-1.c
index 0519dc7..cc07693 100644
--- a/gcc/testsuite/gcc.target/arm/20051215-1.c
+++ b/gcc/testsuite/gcc.target/arm/20051215-1.c
@@ -3,6 +3,7 @@
the call would need an output reload.  */
 /* { dg-do run } */
 /* { dg-options "-O2 -fno-omit-frame-pointer" } */
+/* { dg-skip-if "r9 is reserved in FDPIC" { arm*-*-uclinuxfdpiceabi } "*" "" } */
 extern void abort (void);
 typedef void (*callback) (void);
 
diff --git a/gcc/testsuite/gcc.target/arm/data-rel-2.c b/gcc/testsuite/gcc.target/arm/data-rel-2.c
index 6ba47d6..7d37a8c 100644
--- a/gcc/testsuite/gcc.target/arm/data-rel-2.c
+++ b/gcc/testsuite/gcc.target/arm/data-rel-2.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "Not supported in FDPIC" { arm*-*-uclinuxfdpiceabi } "*" "" } */
 /* { dg-options "-fPIC -mno-pic-data-is-text-relative -mno-single-pic-base" } */
 /* { dg-final { scan-assembler-not "j-\\(.LPIC"  } } */
 /* { dg-final { scan-assembler "_GLOBAL_OFFSET_TABLE_-\\(.LPIC" } } */
diff --git a/gcc/testsuite/gcc.target/arm/data-rel-3.c b/gcc/testsuite/gcc.target/arm/data-rel-3.c
index 2ce1e66..534c6c4 100644
--- a/gcc/testsuite/gcc.target/arm/data-rel-3.c
+++ b/gcc/testsuite/gcc.target/arm/data-rel-3.c
@@ -1,3 +1,4 @@
+/* { dg-skip-if "Not supported in FDPIC" { arm*-*-uclinuxfdpiceabi } "*" "" } */
 /* { dg-options "-fPIC -mpic-data-is-text-relative" } */
 /* { dg-final { scan-assembler "j-\\(.LPIC"  } } */
 /* { dg-final { scan-assembler-not "_GLOBAL_OFFSET_TABLE_-\\(.LPIC" } } */
diff --git a/gcc/testsuite/gcc.target/arm/ivopts-2.c b/gcc/testsuite/gcc.target/arm/ivopts-2.c
index afe91aa..f1d5edb 100644
--- a/gcc/testsuite/gcc.target/arm/ivopts-2.c
+++ b/gcc/testsuite/gcc.target/arm/ivopts-2.c
@@ -14,4 +14,4 @@ tr4 (short array[], int n)
 
 /* { dg-final { scan-tree-dump-times "PHI 

[ARM/FDPIC v2 12/21] [ARM] FDPIC: Restore r9 after we call __aeabi_read_tp

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

We call __aeabi_read_tp() to get the thread pointer. Since this is a
function call, we have to restore the FDPIC register afterwards.
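A toy model of why the FDPIC path of `arm_load_tp` restores r9. The register file, `fake_read_tp` and `load_tp` are hypothetical stand-ins, not GCC or libgcc code. The caller remembers its own r9 (the role `get_hard_reg_initial_val` plays in the patch), makes the call, which under FDPIC is not assumed to preserve r9, and then puts the saved value back before using the result in r0.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file: only what the sketch needs.  */
struct cpu
{
  uint32_t r0;  /* Return value: the thread pointer.  */
  uint32_t r9;  /* FDPIC register of the current module.  */
};

/* Stand-in for the __aeabi_read_tp call as the caller sees it: the
   result lands in r0 and r9 may come back changed.  */
static void
fake_read_tp (struct cpu *c, uint32_t tp, uint32_t clobbered_r9)
{
  c->r0 = tp;
  c->r9 = clobbered_r9;
}

/* Caller side: save r9, call, restore r9, then read r0.  */
static uint32_t
load_tp (struct cpu *c, uint32_t tp, uint32_t clobbered_r9)
{
  uint32_t saved_r9 = c->r9;
  fake_read_tp (c, tp, clobbered_r9);
  c->r9 = saved_r9;  /* "Restore r9."  */
  return c->r0;
}
```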

2018-XX-XX  Christophe Lyon  
Mickaël Guêné 

gcc/
* config/arm/arm.c (arm_load_tp): Add FDPIC support.
* config/arm/arm.md (load_tp_soft_fdpic): New pattern.
(load_tp_soft): Disable in FDPIC mode.

Change-Id: I0a2e3466c9afb869ad8e844083ad178de014658e

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5d32f6a..91b000e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8669,7 +8669,25 @@ arm_load_tp (rtx target)
 
   rtx tmp;
 
-  emit_insn (gen_load_tp_soft ());
+  if (TARGET_FDPIC)
+   {
+ rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+
+ emit_insn (gen_load_tp_soft_fdpic ());
+
+ /* Restore r9.  */
+ XVECEXP (par, 0, 0)
+   = gen_rtx_UNSPEC (VOIDmode,
+ gen_rtvec (2, gen_rtx_REG (Pmode, FDPIC_REGNUM),
+get_hard_reg_initial_val (Pmode, FDPIC_REGNUM)),
+ UNSPEC_PIC_RESTORE);
+ XVECEXP (par, 0, 1) = gen_rtx_USE (VOIDmode, gen_rtx_REG (Pmode, FDPIC_REGNUM));
+ XVECEXP (par, 0, 2)
+   = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, FDPIC_REGNUM));
+ emit_insn (par);
+   }
+  else
+   emit_insn (gen_load_tp_soft ());
 
   tmp = gen_rtx_REG (SImode, R0_REGNUM);
   emit_move_insn (target, tmp);
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index a699a60..9259b08 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11485,12 +11485,25 @@
 )
 
 ;; Doesn't clobber R1-R3.  Must use r0 for the first operand.
+(define_insn "load_tp_soft_fdpic"
+  [(set (reg:SI 0) (unspec:SI [(const_int 0)] UNSPEC_TLS))
+   (clobber (reg:SI 9))
+   (clobber (reg:SI LR_REGNUM))
+   (clobber (reg:SI IP_REGNUM))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_SOFT_TP && TARGET_FDPIC"
+  "bl\\t__aeabi_read_tp\\t@ load_tp_soft"
+  [(set_attr "conds" "clob")
+   (set_attr "type" "branch")]
+)
+
+;; Doesn't clobber R1-R3.  Must use r0 for the first operand.
 (define_insn "load_tp_soft"
   [(set (reg:SI 0) (unspec:SI [(const_int 0)] UNSPEC_TLS))
(clobber (reg:SI LR_REGNUM))
(clobber (reg:SI IP_REGNUM))
(clobber (reg:CC CC_REGNUM))]
-  "TARGET_SOFT_TP"
+  "TARGET_SOFT_TP && !TARGET_FDPIC"
   "bl\\t__aeabi_read_tp\\t@ load_tp_soft"
   [(set_attr "conds" "clob")
(set_attr "type" "branch")]
-- 
2.6.3



[ARM/FDPIC v2 11/21] [ARM] FDPIC: Add support to unwind FDPIC signal frame

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

2018-XX-XX  Christophe Lyon  
Mickaël Guêné 

libgcc/
* unwind-arm-common.inc (ARM_SET_R7_RT_SIGRETURN)
(THUMB2_SET_R7_RT_SIGRETURN, FDPIC_LDR_R12_WITH_FUNCDESC)
(FDPIC_LDR_R9_WITH_GOT, FDPIC_LDR_PC_WITH_RESTORER)
(FDPIC_FUNCDESC_OFFSET, ARM_NEW_RT_SIGFRAME_UCONTEXT)
(ARM_UCONTEXT_SIGCONTEXT, ARM_SIGCONTEXT_R0, FDPIC_T2_LDR_R12_WITH_FUNCDESC)
(FDPIC_T2_LDR_R9_WITH_GOT, FDPIC_T2_LDR_PC_WITH_RESTORER): New.
(__gnu_personality_sigframe_fdpic): New.
(get_eit_entry): Add FDPIC signal frame support.
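Several of these new macros are raw instruction encodings and frame offsets. The sketch below copies their values from the patch and uses hypothetical helper names. It decodes the A32 `mov r7, #imm` marker: 0xe3a070ad gives r7 and 0xad, i.e. 173, the ARM Linux `rt_sigreturn` syscall number. It also reproduces the two SP adjustments made in `__gnu_personality_sigframe_fdpic`: 0x80 + 0x14 + 0xc = 0xa0 for an rt signal frame, 0x14 + 0xc = 0x20 otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the patch.  */
#define ARM_SET_R7_RT_SIGRETURN       0xe3a070adu  /* mov   r7, #0xad */
#define THUMB2_SET_R7_RT_SIGRETURN    0x07adf04fu  /* mov.w r7, #0xad */
#define ARM_NEW_RT_SIGFRAME_UCONTEXT  0x80u
#define ARM_UCONTEXT_SIGCONTEXT       0x14u
#define ARM_SIGCONTEXT_R0             0xcu

/* Hypothetical helper: decode an A32 "MOV Rd, #imm" (data-processing,
   immediate form, any condition).  Returns 0 if INSN is something else.  */
static int
decode_arm_mov_imm (uint32_t insn, unsigned *rd, uint32_t *imm)
{
  /* Bits 27-21 must be 0011101: immediate operand, opcode MOV.  */
  if ((insn & 0x0fe00000u) != 0x03a00000u)
    return 0;

  unsigned rot = ((insn >> 8) & 0xfu) * 2;  /* Rotate-right amount.  */
  uint32_t imm8 = insn & 0xffu;

  *rd = (insn >> 12) & 0xfu;
  *imm = rot ? (imm8 >> rot) | (imm8 << (32 - rot)) : imm8;
  return 1;
}

/* Hypothetical helper: offset added to the unwound SP to reach the saved
   r0, following the two cases in __gnu_personality_sigframe_fdpic.  */
static uint32_t
sigframe_r0_offset (uint32_t first_handler_insn)
{
  if (first_handler_insn == ARM_SET_R7_RT_SIGRETURN
      || first_handler_insn == THUMB2_SET_R7_RT_SIGRETURN)
    /* rt_sigreturn trampoline: the ucontext comes first.  */
    return ARM_NEW_RT_SIGFRAME_UCONTEXT + ARM_UCONTEXT_SIGCONTEXT
           + ARM_SIGCONTEXT_R0;
  return ARM_UCONTEXT_SIGCONTEXT + ARM_SIGCONTEXT_R0;
}
```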

Change-Id: I7f9527cc50665dd1a731b7badf71c319fb38bf57

diff --git a/libgcc/unwind-arm-common.inc b/libgcc/unwind-arm-common.inc
index d7c611f..7a18a7b 100644
--- a/libgcc/unwind-arm-common.inc
+++ b/libgcc/unwind-arm-common.inc
@@ -30,6 +30,26 @@
 #include 
 #endif
 
+#if __FDPIC__
+/* Load r7 with rt_sigreturn value.  */
+#define ARM_SET_R7_RT_SIGRETURN         0xe3a070ad  /* mov   r7, #0xad */
+#define THUMB2_SET_R7_RT_SIGRETURN      0x07adf04f  /* mov.w r7, #0xad */
+
+/* FDPIC jump to restorer sequence.  */
+#define FDPIC_LDR_R12_WITH_FUNCDESC     0xe59fc004  /* ldr   r12, [pc, #4] */
+#define FDPIC_LDR_R9_WITH_GOT           0xe59c9004  /* ldr   r9, [r12, #4] */
+#define FDPIC_LDR_PC_WITH_RESTORER      0xe59cf000  /* ldr   pc, [r12] */
+#define FDPIC_T2_LDR_R12_WITH_FUNCDESC  0xc008f8df  /* ldr.w r12, [pc, #8] */
+#define FDPIC_T2_LDR_R9_WITH_GOT        0x9004f8dc  /* ldr.w r9, [r12, #4] */
+#define FDPIC_T2_LDR_PC_WITH_RESTORER   0xf000f8dc  /* ldr.w pc, [r12] */
+#define FDPIC_FUNCDESC_OFFSET           12
+
+/* Signal frame offsets.  */
+#define ARM_NEW_RT_SIGFRAME_UCONTEXT    0x80
+#define ARM_UCONTEXT_SIGCONTEXT         0x14
+#define ARM_SIGCONTEXT_R0               0xc
+#endif
+
 /* We add a prototype for abort here to avoid creating a dependency on
target headers.  */
 extern void abort (void);
@@ -199,6 +219,45 @@ search_EIT_table (const __EIT_entry * table, int nrec, _uw return_address)
 }
 }
 
+#if __FDPIC__
+/* VFP is not restored, but this is sufficient to allow unwinding.  */
+static _Unwind_Reason_Code
+__gnu_personality_sigframe_fdpic (_Unwind_State state,
+                                  _Unwind_Control_Block *ucbp,
+                                  _Unwind_Context *context)
+{
+    unsigned int sp;
+    unsigned int pc;
+    unsigned int funcdesc;
+    unsigned int handler;
+    unsigned int first_handler_instruction;
+    int i;
+
+    _Unwind_VRS_Get (context, _UVRSC_CORE, R_SP, _UVRSD_UINT32, &sp);
+    _Unwind_VRS_Get (context, _UVRSC_CORE, R_PC, _UVRSD_UINT32, &pc);
+
+    funcdesc = *(unsigned int *)((pc & ~1) + FDPIC_FUNCDESC_OFFSET);
+    handler = *(unsigned int *)(funcdesc);
+    first_handler_instruction = *(unsigned int *)(handler & ~1);
+
+    /* Adjust SP to point to the start of registers according to
+       signal type.  */
+    if (first_handler_instruction == ARM_SET_R7_RT_SIGRETURN
+        || first_handler_instruction == THUMB2_SET_R7_RT_SIGRETURN)
+      sp += ARM_NEW_RT_SIGFRAME_UCONTEXT
+            + ARM_UCONTEXT_SIGCONTEXT
+            + ARM_SIGCONTEXT_R0;
+    else
+      sp += ARM_UCONTEXT_SIGCONTEXT
+            + ARM_SIGCONTEXT_R0;
+    /* Restore regs saved on stack by the kernel.  */
+    for (i = 0; i < 16; i++)
+      _Unwind_VRS_Set (context, _UVRSC_CORE, i, _UVRSD_UINT32, sp + 4 * i);
+
+    return _URC_CONTINUE_UNWIND;
+}
+#endif
+
 /* Find the exception index table eintry for the given address.
Fill in the relevant fields of the UCB.
Returns _URC_FAILURE if an error occurred, _URC_OK on success.  */
@@ -222,6 +281,27 @@ get_eit_entry (_Unwind_Control_Block *ucbp, _uw return_address)
);
   if (!eitp)
{
+#if __FDPIC__
+ /* If we are unwinding a signal handler then perhaps we have
+reached a trampoline.  Try to detect jump to restorer
+sequence.  */
+ _uw *pc = (_uw *)((return_address+2) & ~1);
+ if ((pc[0] == FDPIC_LDR_R12_WITH_FUNCDESC
+  && pc[1] == FDPIC_LDR_R9_WITH_GOT
+  && pc[2] == FDPIC_LDR_PC_WITH_RESTORER)
+ || (pc[0] == FDPIC_T2_LDR_R12_WITH_FUNCDESC
+ && pc[1] == FDPIC_T2_LDR_R9_WITH_GOT
+ && pc[2] == FDPIC_T2_LDR_PC_WITH_RESTORER))
+   {
+ struct funcdesc_t *funcdesc
+   = (struct funcdesc_t *) &__gnu_personality_sigframe_fdpic;
+
+ UCB_PR_ADDR (ucbp) = funcdesc->ptr;
+ UCB_PR_GOT (ucbp) = funcdesc->got;
+
+ return _URC_OK;
+   }
+#endif
  UCB_PR_ADDR (ucbp) = 0;
  return _URC_FAILURE;
}
@@ -236,6 +316,27 @@ get_eit_entry (_Unwind_Control_Block *ucbp, _uw return_address)
 
   if (!eitp)
 {
+#if __FDPIC__
+  /* If we are unwinding a signal handler then perhaps we have
+reached a trampoline.  Try to 

[ARM/FDPIC v2 10/21] [ARM] FDPIC: Implement TLS support.

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

Support additional relocations: TLS_GD32_FDPIC, TLS_LDM32_FDPIC, and
TLS_IE32_FDPIC.

We do not support the GNU2 TLS dialect.
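The relocation choices made in legitimize_tls_address can be summarised with a small C sketch. The enum mirrors the one extended by this patch; select_tls_reloc and tls_dialect_supported are hypothetical helpers that condense the "if (TARGET_FDPIC) ... else ..." branches, and the IE mapping is taken from the enum and ChangeLog since that part of the diff is cut off here.

```c
#include <stdbool.h>

/* The tls_reloc enum as extended by this patch.  */
enum tls_reloc {
  TLS_GD32, TLS_GD32_FDPIC, TLS_LDM32, TLS_LDM32_FDPIC, TLS_LDO32,
  TLS_IE32, TLS_IE32_FDPIC, TLS_LE32, TLS_DESCSEQ
};

/* Hypothetical helper: in FDPIC mode GD, LDM and IE use their _FDPIC
   variants; any other relocation is returned unchanged.  */
static enum tls_reloc
select_tls_reloc (enum tls_reloc base, bool fdpic)
{
  if (!fdpic)
    return base;
  switch (base)
    {
    case TLS_GD32:  return TLS_GD32_FDPIC;
    case TLS_LDM32: return TLS_LDM32_FDPIC;
    case TLS_IE32:  return TLS_IE32_FDPIC;
    default:        return base;
    }
}

/* Hypothetical helper: the GNU2 dialect (descriptor sequences) is not
   available together with FDPIC; the patch uses gcc_unreachable there.  */
static bool
tls_dialect_supported (bool gnu2_tls, bool fdpic)
{
  return !(gnu2_tls && fdpic);
}
```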

2018-XX-XX  Christophe Lyon  
Mickaël Guêné 

gcc/
* config/arm/arm.c (tls_reloc): Add TLS_GD32_FDPIC,
TLS_LDM32_FDPIC and TLS_IE32_FDPIC.
(arm_call_tls_get_addr): Add FDPIC support.
(legitimize_tls_address): Likewise.
(arm_emit_tls_decoration): Likewise.

Change-Id: I4ea5034ff654540c4658d0a79fb92f70550cdf4a

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index ffc9128..5d32f6a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2373,9 +2373,12 @@ char arm_arch_name[] = "__ARM_ARCH_PROFILE__";
 
 enum tls_reloc {
   TLS_GD32,
+  TLS_GD32_FDPIC,
   TLS_LDM32,
+  TLS_LDM32_FDPIC,
   TLS_LDO32,
   TLS_IE32,
+  TLS_IE32_FDPIC,
   TLS_LE32,
   TLS_DESCSEQ  /* GNU scheme */
 };
@@ -8697,20 +8700,34 @@ arm_call_tls_get_addr (rtx x, rtx reg, rtx *valuep, int reloc)
   gcc_assert (reloc != TLS_DESCSEQ);
   start_sequence ();
 
-  labelno = GEN_INT (pic_labelno++);
-  label = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);
-  label = gen_rtx_CONST (VOIDmode, label);
+  if (TARGET_FDPIC)
+{
+  sum = gen_rtx_UNSPEC (Pmode,
+   gen_rtvec (2, x, GEN_INT (reloc)),
+   UNSPEC_TLS);
+}
+  else
+{
+  labelno = GEN_INT (pic_labelno++);
+  label = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);
+  label = gen_rtx_CONST (VOIDmode, label);
 
-  sum = gen_rtx_UNSPEC (Pmode,
-   gen_rtvec (4, x, GEN_INT (reloc), label,
-  GEN_INT (TARGET_ARM ? 8 : 4)),
-   UNSPEC_TLS);
+  sum = gen_rtx_UNSPEC (Pmode,
+   gen_rtvec (4, x, GEN_INT (reloc), label,
+  GEN_INT (TARGET_ARM ? 8 : 4)),
+   UNSPEC_TLS);
+}
   reg = load_tls_operand (sum, reg);
 
-  if (TARGET_ARM)
-emit_insn (gen_pic_add_dot_plus_eight (reg, reg, labelno));
+  if (TARGET_FDPIC)
+{
+  emit_insn (gen_addsi3 (reg, reg, gen_rtx_REG (Pmode, FDPIC_REGNUM)));
+}
   else
-emit_insn (gen_pic_add_dot_plus_four (reg, reg, labelno));
+if (TARGET_ARM)
+  emit_insn (gen_pic_add_dot_plus_eight (reg, reg, labelno));
+else
+  emit_insn (gen_pic_add_dot_plus_four (reg, reg, labelno));
 
   *valuep = emit_library_call_value (get_tls_get_addr (), NULL_RTX,
 LCT_PURE, /* LCT_CONST?  */
@@ -8745,6 +8762,7 @@ arm_tls_descseq_addr (rtx x, rtx reg)
   return reg;
 }
 
+
 rtx
 legitimize_tls_address (rtx x, rtx reg)
 {
@@ -8757,6 +8775,9 @@ legitimize_tls_address (rtx x, rtx reg)
 case TLS_MODEL_GLOBAL_DYNAMIC:
   if (TARGET_GNU2_TLS)
{
+ if (TARGET_FDPIC)
+   gcc_unreachable();
+
  reg = arm_tls_descseq_addr (x, reg);
 
  tp = arm_load_tp (NULL_RTX);
@@ -8766,7 +8787,10 @@ legitimize_tls_address (rtx x, rtx reg)
   else
{
  /* Original scheme */
- insns = arm_call_tls_get_addr (x, reg, &ret, TLS_GD32);
+ if (TARGET_FDPIC)
+   insns = arm_call_tls_get_addr (x, reg, &ret, TLS_GD32_FDPIC);
+ else
+   insns = arm_call_tls_get_addr (x, reg, &ret, TLS_GD32);
  dest = gen_reg_rtx (Pmode);
  emit_libcall_block (insns, dest, ret, x);
}
@@ -8775,6 +8799,9 @@ legitimize_tls_address (rtx x, rtx reg)
 case TLS_MODEL_LOCAL_DYNAMIC:
   if (TARGET_GNU2_TLS)
{
+ if (TARGET_FDPIC)
+   gcc_unreachable();
+
  reg = arm_tls_descseq_addr (x, reg);
 
  tp = arm_load_tp (NULL_RTX);
@@ -8783,7 +8810,10 @@ legitimize_tls_address (rtx x, rtx reg)
}
   else
{
- insns = arm_call_tls_get_addr (x, reg, &ret, TLS_LDM32);
+ if (TARGET_FDPIC)
+   insns = arm_call_tls_get_addr (x, reg, &ret, TLS_LDM32_FDPIC);
+ else
+   insns = arm_call_tls_get_addr (x, reg, &ret, TLS_LDM32);
 
  /* Attach a unique REG_EQUIV, to allow the RTL optimizers to
 share the LDM result with other LD model accesses.  */
@@ -8802,23 +8832,35 @@ legitimize_tls_address (rtx x, rtx reg)
   return dest;
 
 case TLS_MODEL_INITIAL_EXEC:
-  labelno = GEN_INT (pic_labelno++);
-  label = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);
-  label = gen_rtx_CONST (VOIDmode, label);
-  sum = gen_rtx_UNSPEC (Pmode,
-   gen_rtvec (4, x, GEN_INT (TLS_IE32), label,
-  GEN_INT (TARGET_ARM ? 8 : 4)),
-   UNSPEC_TLS);
-  reg = load_tls_operand (sum, reg);
-
-  if (TARGET_ARM)
-   emit_insn (gen_tls_load_dot_plus_eight (reg, reg, labelno));
-  else if (TARGET_THUMB2)
-   emit_insn (gen_tls_load_dot_plus_four (reg, NULL, reg, labelno));

[ARM/FDPIC v2 09/21] [ARM] FDPIC: Add support for taking address of nested function

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

In FDPIC mode, the trampoline generated to support pointers to nested
functions looks like:

   .word  trampoline address
   .word  trampoline GOT address
   ldr    r12, [pc, #8]   ; #4 in ARM state
   ldr    r9, [pc, #8]    ; #4 in ARM state
   ldr    pc, [pc, #8]    ; #4 in ARM state
   .word  static chain value
   .word  GOT address
   .word  function's address

Because function pointers in FDPIC are pointers to function
descriptors, we also have to generate a function descriptor for the
trampoline.
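The PC-relative offsets in the trampoline follow from the field layout written by arm_trampoline_init (static chain at byte 20, GOT at 24, function address at 28). A minimal C sketch, assuming a 4-byte-aligned trampoline and using hypothetical names, computes the LDR immediates from the architectural PC value: instruction address + 8 in ARM state, Align(instruction address + 4, 4) in Thumb state.

```c
#include <stdint.h>

/* Byte offsets of the FDPIC trampoline fields (hypothetical names).  */
enum {
  TRAMP_FUNCDESC_ADDR = 0,  /* .word trampoline address */
  TRAMP_FUNCDESC_GOT  = 4,  /* .word trampoline GOT address */
  TRAMP_LDR_R12       = 8,  /* ldr r12, [pc, #N] */
  TRAMP_LDR_R9        = 12, /* ldr r9,  [pc, #N] */
  TRAMP_LDR_PC        = 16, /* ldr pc,  [pc, #N] */
  TRAMP_STATIC_CHAIN  = 20,
  TRAMP_GOT           = 24,
  TRAMP_FNADDR        = 28
};

/* PC value seen by a PC-relative LDR at INSN_ADDR.  */
static uint32_t
pc_for_ldr (uint32_t insn_addr, int thumb)
{
  return thumb ? ((insn_addr + 4u) & ~3u) : insn_addr + 8u;
}

/* Immediate the LDR at INSN_ADDR needs to load the word at TARGET.  */
static uint32_t
ldr_imm (uint32_t insn_addr, uint32_t target, int thumb)
{
  return target - pc_for_ldr (insn_addr, thumb);
}
```

All three loads use the same immediate within one instruction set: #4 in ARM state and #8 in Thumb-2 state, matching the "TARGET_THUMB2 ? 8 : 4" operand emitted by arm_asm_trampoline_template.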

2018-XX-XX  Christophe Lyon  
Mickaël Guêné 

gcc/
* config/arm/arm.c (arm_asm_trampoline_template): Add FDPIC
support.
(arm_trampoline_init): Likewise.
(arm_trampoline_init): Likewise.
* config/arm/arm.h (TRAMPOLINE_SIZE): Likewise.

Change-Id: I4b5127261a9aefa0f0318f110574ec07a856aeb1

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 51da2bc..ffc9128 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3950,13 +3950,50 @@ arm_warn_func_return (tree decl)
    .word  static chain value
    .word  function's address
    XXX FIXME: When the trampoline returns, r8 will be clobbered.  */
+/* In FDPIC mode, the trampoline looks like:
+      .word  trampoline address
+      .word  trampoline GOT address
+      ldr    r12, [pc, #8] ; #4 for ARM
+      ldr    r9,  [pc, #8] ; #4 for ARM
+      ldr    pc,  [pc, #8] ; #4 for ARM
+      .word  static chain value
+      .word  GOT address
+      .word  function's address
+*/
 
 static void
 arm_asm_trampoline_template (FILE *f)
 {
   fprintf (f, "\t.syntax unified\n");
 
-  if (TARGET_ARM)
+  if (TARGET_FDPIC)
+{
+  /* The first two words are a function descriptor pointing to the
+trampoline code just below.  */
+  if (TARGET_ARM)
+   fprintf (f, "\t.arm\n");
+  else if (TARGET_THUMB2)
+   fprintf (f, "\t.thumb\n");
+  else
+   /* Only ARM and Thumb-2 are supported.  */
+   gcc_unreachable ();
+
+  assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
+  assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
+  /* Trampoline code which sets the static chain register but also
+PIC register before jumping into real code.  */
+  asm_fprintf (f, "\tldr\t%r, [%r, #%d]\n",
+  STATIC_CHAIN_REGNUM, PC_REGNUM,
+  TARGET_THUMB2 ? 8 : 4);
+  asm_fprintf (f, "\tldr\t%r, [%r, #%d]\n",
+  PIC_OFFSET_TABLE_REGNUM, PC_REGNUM,
+  TARGET_THUMB2 ? 8 : 4);
+  asm_fprintf (f, "\tldr\t%r, [%r, #%d]\n",
+  PC_REGNUM, PC_REGNUM,
+  TARGET_THUMB2 ? 8 : 4);
+  assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
+}
+  else if (TARGET_ARM)
 {
   fprintf (f, "\t.arm\n");
   asm_fprintf (f, "\tldr\t%r, [%r, #0]\n", STATIC_CHAIN_REGNUM, PC_REGNUM);
@@ -3997,12 +4034,37 @@ arm_trampoline_init (rtx m_tramp, tree fndecl, rtx chain_value)
   emit_block_move (m_tramp, assemble_trampoline_template (),
   GEN_INT (TRAMPOLINE_SIZE), BLOCK_OP_NORMAL);
 
-  mem = adjust_address (m_tramp, SImode, TARGET_32BIT ? 8 : 12);
-  emit_move_insn (mem, chain_value);
+  if (TARGET_FDPIC)
+{
+  rtx funcdesc = XEXP (DECL_RTL (fndecl), 0);
+  rtx fnaddr = gen_rtx_MEM (Pmode, funcdesc);
+  rtx gotaddr = gen_rtx_MEM (Pmode, plus_constant (Pmode, funcdesc, 4));
+  rtx trampoline_code_start
+   = plus_constant (Pmode, XEXP (m_tramp, 0), TARGET_THUMB2 ? 9 : 8);
+
+  /* Write initial funcdesc which points to the trampoline.  */
+  mem = adjust_address (m_tramp, SImode, 0);
+  emit_move_insn (mem, trampoline_code_start);
+  mem = adjust_address (m_tramp, SImode, 4);
+  emit_move_insn (mem, gen_rtx_REG (Pmode, PIC_OFFSET_TABLE_REGNUM));
+  /* Setup static chain.  */
+  mem = adjust_address (m_tramp, SImode, 20);
+  emit_move_insn (mem, chain_value);
+  /* GOT + real function entry point.  */
+  mem = adjust_address (m_tramp, SImode, 24);
+  emit_move_insn (mem, gotaddr);
+  mem = adjust_address (m_tramp, SImode, 28);
+  emit_move_insn (mem, fnaddr);
+}
+  else
+{
+  mem = adjust_address (m_tramp, SImode, TARGET_32BIT ? 8 : 12);
+  emit_move_insn (mem, chain_value);
 
-  mem = adjust_address (m_tramp, SImode, TARGET_32BIT ? 12 : 16);
-  fnaddr = XEXP (DECL_RTL (fndecl), 0);
-  emit_move_insn (mem, fnaddr);
+  mem = adjust_address (m_tramp, SImode, TARGET_32BIT ? 12 : 16);
+  fnaddr = XEXP (DECL_RTL (fndecl), 0);
+  emit_move_insn (mem, fnaddr);
+}
 
   a_tramp = XEXP (m_tramp, 0);
   emit_library_call (gen_rtx_SYMBOL_REF (Pmode, "__clear_cache"),
@@ -4016,7 +4078,9 @@ arm_trampoline_init (rtx m_tramp, tree fndecl, rtx 

[ARM/FDPIC v2 08/21] [ARM] FDPIC: Ensure local/global binding for function descriptors

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

Use local binding rules to decide whether we can use GOTOFFFUNCDESC to
compute the function address.

2018-XX-XX  Christophe Lyon  
Mickaël Guêné 

gcc/
* config/arm/arm.c (arm_local_funcdesc_p): New function.
(legitimize_pic_address): Ensure binding rules on function
pointers in FDPIC mode.
(arm_assemble_integer): Likewise.

Change-Id: I3fa0b63bc0f672903f405aa72cc46052de1c0feb

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c9f391b..51da2bc 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3764,6 +3764,42 @@ arm_options_perform_arch_sanity_checks (void)
 }
 }
 
+/* Test whether a local function descriptor is canonical, i.e.,
+   whether we can use GOTOFFFUNCDESC to compute the address of the
+   function.  */
+static bool
+arm_local_funcdesc_p (rtx fnx)
+{
+  tree fn;
+  enum symbol_visibility vis;
+  bool ret;
+
+  if (!TARGET_FDPIC)
+return TRUE;
+
+  if (! SYMBOL_REF_LOCAL_P (fnx))
+return FALSE;
+
+  fn = SYMBOL_REF_DECL (fnx);
+
+  if (! fn)
+return FALSE;
+
+  vis = DECL_VISIBILITY (fn);
+
+  if (vis == VISIBILITY_PROTECTED)
+/* Private function descriptors for protected functions are not
+   canonical.  Temporarily change the visibility to global so that
+   we can ensure unicity of funcdesc pointers.  */
+DECL_VISIBILITY (fn) = VISIBILITY_DEFAULT;
+
+  ret = default_binds_local_p_1 (fn, flag_pic);
+
+  DECL_VISIBILITY (fn) = vis;
+
+  return ret;
+}
+
 static void
 arm_add_gc_roots (void)
 {
@@ -7481,7 +7517,9 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
   || (GET_CODE (orig) == SYMBOL_REF
   && SYMBOL_REF_LOCAL_P (orig)
   && (SYMBOL_REF_DECL (orig)
-  ? !DECL_WEAK (SYMBOL_REF_DECL (orig)) : 1)))
+  ? !DECL_WEAK (SYMBOL_REF_DECL (orig)) : 1)
+  && (!SYMBOL_REF_FUNCTION_P(orig)
+  || arm_local_funcdesc_p (orig))))
  && NEED_GOT_RELOC
  && arm_pic_data_is_text_relative)
insn = arm_pic_static_addr (orig, reg);
@@ -23069,7 +23107,9 @@ arm_assemble_integer (rtx x, unsigned int size, int aligned_p)
  || (GET_CODE (x) == SYMBOL_REF
  && (!SYMBOL_REF_LOCAL_P (x)
  || (SYMBOL_REF_DECL (x)
- ? DECL_WEAK (SYMBOL_REF_DECL (x)) : 0))))
+ ? DECL_WEAK (SYMBOL_REF_DECL (x)) : 0)
+ || (SYMBOL_REF_FUNCTION_P (x)
+ && !arm_local_funcdesc_p (x)))))
{
  if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))
fputs ("(GOTFUNCDESC)", asm_out_file);
-- 
2.6.3



[ARM/FDPIC v2 07/21] [ARM] FDPIC: Avoid saving/restoring r9 on stack since it is RO

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

2018-XX-XX  Christophe Lyon  
Mickaël Guêné 

gcc/
* config/arm/arm.c (arm_compute_save_reg0_reg12_mask): Handle
FDPIC.
(thumb1_compute_save_core_reg_mask): Likewise.

Change-Id: Ib534cf91704cdc740867b46a8fe45fda27894562

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 44c3b08..c9f391b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19486,7 +19486,7 @@ arm_compute_save_reg0_reg12_mask (void)
 
   /* Also save the pic base register if necessary.  */
   if (flag_pic
- && !TARGET_SINGLE_PIC_BASE
+ && !TARGET_SINGLE_PIC_BASE && !TARGET_FDPIC
  && arm_pic_register != INVALID_REGNUM
  && crtl->uses_pic_offset_table)
save_reg_mask |= 1 << PIC_OFFSET_TABLE_REGNUM;
@@ -19520,7 +19520,7 @@ arm_compute_save_reg0_reg12_mask (void)
   /* If we aren't loading the PIC register,
 don't stack it even though it may be live.  */
   if (flag_pic
- && !TARGET_SINGLE_PIC_BASE
+ && !TARGET_SINGLE_PIC_BASE && !TARGET_FDPIC
  && arm_pic_register != INVALID_REGNUM
  && (df_regs_ever_live_p (PIC_OFFSET_TABLE_REGNUM)
  || crtl->uses_pic_offset_table))
@@ -19689,7 +19689,7 @@ thumb1_compute_save_core_reg_mask (void)
 mask |= 1 << HARD_FRAME_POINTER_REGNUM;
 
   if (flag_pic
-  && !TARGET_SINGLE_PIC_BASE
+  && !TARGET_SINGLE_PIC_BASE && !TARGET_FDPIC
   && arm_pic_register != INVALID_REGNUM
   && crtl->uses_pic_offset_table)
 mask |= 1 << PIC_OFFSET_TABLE_REGNUM;
-- 
2.6.3



[ARM/FDPIC v2 06/21] [ARM] FDPIC: Add support for c++ exceptions

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

The main difference from the existing support is that function
addresses are now addresses of function descriptors, so all code
dealing with function pointers has to cope with function descriptors.

For the same reason, Linux kernel helpers can no longer be called by
dereferencing their address, so we implement the same functionality as
a regular function here.
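The replacement __fdpic_cmpxchg keeps the return convention of the kernel helper it stands in for: store newval only if *ptr still holds oldval, and return zero on success and non-zero otherwise. As a portable point of comparison, here is a C11 sketch of that contract. sketch_cmpxchg is a hypothetical name; it models the return value only and is not a substitute for the ldrex/strex sequence in the patch.

```c
#include <stdatomic.h>

/* Hypothetical model of the cmpxchg helper contract: returns 0 if *PTR
   held OLDVAL and was replaced by NEWVAL, non-zero otherwise.  */
static int
sketch_cmpxchg (int oldval, int newval, atomic_int *ptr)
{
  int expected = oldval;
  return atomic_compare_exchange_strong (ptr, &expected, newval) ? 0 : 1;
}
```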

When restoring a function address, we also have to restore the FDPIC
register value (r9).
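To make the descriptor idea concrete, here is a host-side C model. It is only an illustration under stated assumptions: struct model_funcdesc, model_r9 and call_via_funcdesc are hypothetical names, the descriptor holds a C function pointer instead of a raw code address (the unwinder's struct funcdesc_t in this series has ptr and got fields), and a global variable stands in for r9. Calling through a descriptor installs the callee's GOT value, and the caller's value is restored afterwards.

```c
#include <stdint.h>

/* Hypothetical model of an FDPIC function descriptor.  */
struct model_funcdesc {
  int (*ptr) (int);   /* code entry point */
  uintptr_t got;      /* value the callee expects in r9 */
};

/* Stand-in for the r9 (FDPIC) register.  */
static uintptr_t model_r9;

static int
callee (int x)
{
  /* A real FDPIC callee addresses its data through r9.  */
  return x + (int) model_r9;
}

/* Load the callee's GOT into "r9", call it, then restore the caller's
   value.  */
static int
call_via_funcdesc (const struct model_funcdesc *fd, int arg)
{
  uintptr_t saved_r9 = model_r9;
  model_r9 = fd->got;
  int ret = fd->ptr (arg);
  model_r9 = saved_r9;
  return ret;
}
```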

2018-XX-XX  Christophe Lyon  
Mickaël Guêné 

gcc/
* ginclude/unwind-arm-common.h (unwinder_cache): Add reserved5
field.

libgcc/
* config/arm/linux-atomic.c (__kernel_cmpxchg): Add FDPIC support.
(__kernel_dmb): Likewise.
(__fdpic_cmpxchg): New function.
(__fdpic_dmb): New function.
* config/arm/unwind-arm.h (gnu_Unwind_Find_got): New function.
(_Unwind_decode_typeinfo_ptr): Add FDPIC support.
* unwind-arm-common.inc (UCB_PR_GOT): New.
(funcdesc_t): New struct.
(get_eit_entry): Add FDPIC support.
(unwind_phase2): Likewise.
(unwind_phase2_forced): Likewise.
(__gnu_Unwind_RaiseException): Likewise.
(__gnu_Unwind_Resume): Likewise.
(__gnu_Unwind_Backtrace): Likewise.
* unwind-pe.h (read_encoded_value_with_base): Likewise.

libstdc++/
* libsupc++/eh_personality.cc (get_ttype_entry): Add FDPIC
support.

Change-Id: Ic0841eb3d7bfb0b3f6d187cd52a660b8fd394d85

diff --git a/gcc/ginclude/unwind-arm-common.h b/gcc/ginclude/unwind-arm-common.h
index 8a1a919..150bd0f 100644
--- a/gcc/ginclude/unwind-arm-common.h
+++ b/gcc/ginclude/unwind-arm-common.h
@@ -91,7 +91,7 @@ extern "C" {
  _uw reserved2;  /* Personality routine address */
  _uw reserved3;  /* Saved callsite address */
  _uw reserved4;  /* Forced unwind stop arg */
- _uw reserved5;
+ _uw reserved5;  /* Personality routine GOT value in FDPIC mode.  */
}
   unwinder_cache;
   /* Propagation barrier cache (valid after phase 1): */
diff --git a/libgcc/config/arm/linux-atomic.c b/libgcc/config/arm/linux-atomic.c
index d334c58..161d1ce 100644
--- a/libgcc/config/arm/linux-atomic.c
+++ b/libgcc/config/arm/linux-atomic.c
@@ -25,11 +25,49 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 /* Kernel helper for compare-and-exchange.  */
 typedef int (__kernel_cmpxchg_t) (int oldval, int newval, int *ptr);
+#if __FDPIC__
+/* Non-FDPIC ABIs call __kernel_cmpxchg directly by dereferencing its
+   address, but under FDPIC we would generate a broken call
+   sequence. That's why we have to implement __kernel_cmpxchg and
+   __kernel_dmb here: this way, the FDPIC call sequence works.  */
+#define __kernel_cmpxchg __fdpic_cmpxchg
+#else
 #define __kernel_cmpxchg (*(__kernel_cmpxchg_t *) 0x0fc0)
+#endif
 
 /* Kernel helper for memory barrier.  */
 typedef void (__kernel_dmb_t) (void);
+#if __FDPIC__
+#define __kernel_dmb __fdpic_dmb
+#else
 #define __kernel_dmb (*(__kernel_dmb_t *) 0x0fa0)
+#endif
+
+#if __FDPIC__
+static int __fdpic_cmpxchg (int oldval, int newval, int *ptr)
+{
+  int result;
+
+  asm volatile ("1: ldrex r3, [%[ptr]]\n\t"
+   "subs  r3, r3, %[oldval]\n\t"
+   "itt eq\n\t"
+   "strexeq r3, %[newval], [%[ptr]]\n\t"
+   "teqeq r3, #1\n\t"
+   "it eq\n\t"
+   "beq 1b\n\t"
+   "rsbs  %[result], r3, #0\n\t"
+   : [result] "=r" (result)
+   : [oldval] "r" (oldval) , [newval] "r" (newval), [ptr] "r" (ptr)
+   : "r3");
+  return result;
+}
+
+static void __fdpic_dmb ()
+{
+  asm volatile ("dmb\n\t");
+}
+
+#endif
 
 /* Note: we implement byte, short and int versions of atomic operations using
the above kernel helpers; see linux-atomic-64bit.c for "long long" (64-bit)
diff --git a/libgcc/config/arm/unwind-arm.h b/libgcc/config/arm/unwind-arm.h
index 9f7d3f2..5874b9b 100644
--- a/libgcc/config/arm/unwind-arm.h
+++ b/libgcc/config/arm/unwind-arm.h
@@ -36,6 +36,25 @@
 #ifdef __cplusplus
 extern "C" {
 #endif
+_Unwind_Ptr __attribute__((weak)) __gnu_Unwind_Find_got (_Unwind_Ptr);
+
+static inline _Unwind_Ptr gnu_Unwind_Find_got (_Unwind_Ptr ptr)
+{
+    _Unwind_Ptr res;
+
+    if (__gnu_Unwind_Find_got)
+      res =  __gnu_Unwind_Find_got (ptr);
+    else
+      {
+        asm volatile ("mov %[result], r9"
+                      : [result]"=r" (res)
+                      :
+                      :);
+      }
+
+    return res;
+}
+
   /* Decode an R_ARM_TARGET2 relocation.  */
   static inline _Unwind_Word
   _Unwind_decode_typeinfo_ptr (_Unwind_Word base __attribute__ ((unused)),
@@ -48,7 +67,12 @@ extern "C" {
   if (!tmp)
return 0;
 
-#if (defined(linux) && !defined(__uClinux__)) || defined(__NetBSD__) \
+#if __FDPIC__
+  /* For FDPIC, we store the offset of the GOT entry. 

[ARM/FDPIC v2 05/21] [ARM] FDPIC: Fix __do_global_dtors_aux and frame_dummy generation

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

In FDPIC, we need to make sure __do_global_dtors_aux and frame_dummy
are referenced by their address, not by pointers to the function
descriptors.

2018-XX-XX  Christophe Lyon  
Mickaël Guêné 

* libgcc/crtstuff.c: Add support for FDPIC.

Change-Id: Iff3aec3815e8ebd87276c0107752f00908a22100

diff --git a/libgcc/crtstuff.c b/libgcc/crtstuff.c
index d81c527..ad40719 100644
--- a/libgcc/crtstuff.c
+++ b/libgcc/crtstuff.c
@@ -429,9 +429,17 @@ __do_global_dtors_aux (void)
 #ifdef FINI_SECTION_ASM_OP
 CRT_CALL_STATIC_FUNCTION (FINI_SECTION_ASM_OP, __do_global_dtors_aux)
 #elif defined (FINI_ARRAY_SECTION_ASM_OP)
+#if defined(__FDPIC__)
+__asm__(
+"   .section .fini_array\n"
+"   .word __do_global_dtors_aux\n"
+);
+asm (TEXT_SECTION_ASM_OP);
+#else /* defined(__FDPIC__) */
 static func_ptr __do_global_dtors_aux_fini_array_entry[]
  __attribute__ ((__used__, section(".fini_array"), aligned(sizeof(func_ptr))))
   = { __do_global_dtors_aux };
+#endif /* defined(__FDPIC__) */
 #else /* !FINI_SECTION_ASM_OP && !FINI_ARRAY_SECTION_ASM_OP */
 static void __attribute__((used))
 __do_global_dtors_aux_1 (void)
@@ -473,9 +481,17 @@ frame_dummy (void)
 #ifdef __LIBGCC_INIT_SECTION_ASM_OP__
 CRT_CALL_STATIC_FUNCTION (__LIBGCC_INIT_SECTION_ASM_OP__, frame_dummy)
 #else /* defined(__LIBGCC_INIT_SECTION_ASM_OP__) */
+#if defined(__FDPIC__)
+__asm__(
+"   .section .init_array\n"
+"   .word frame_dummy\n"
+);
+asm (TEXT_SECTION_ASM_OP);
+#else /* defined(__FDPIC__) */
 static func_ptr __frame_dummy_init_array_entry[]
  __attribute__ ((__used__, section(".init_array"), aligned(sizeof(func_ptr))))
   = { frame_dummy };
+#endif /* defined(__FDPIC__) */
 #endif /* !defined(__LIBGCC_INIT_SECTION_ASM_OP__) */
 #endif /* USE_EH_FRAME_REGISTRY || USE_TM_CLONE_REGISTRY */
 
-- 
2.6.3



[ARM/FDPIC v2 04/21] [ARM] FDPIC: Add support for FDPIC for arm architecture

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

The FDPIC register is hard-coded to r9, as defined in the ABI.

We have to disable tailcall optimizations if we don't know if the
target function is in the same module. If not, we have to set r9 to
the value associated with the target module.
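The resulting decision can be written as a small predicate. This is a hypothetical C sketch of only the FDPIC-specific checks added to arm_function_ok_for_sibcall; fdpic_sibcall_allowed and its boolean parameters are illustrative stand-ins for "decl != NULL" and "targetm.binds_local_p (decl)", and the other checks of the real function are ignored.

```c
#include <stdbool.h>

/* Hypothetical summary of the FDPIC sibcall checks: without a decl, or
   when the callee may not bind locally (and may go through the PLT), the
   callee could need a different r9, so no tail call is emitted.  */
static bool
fdpic_sibcall_allowed (bool fdpic, bool have_decl, bool binds_local)
{
  if (!fdpic)
    return true;   /* other, non-FDPIC checks still apply */
  if (!have_decl)
    return false;
  return binds_local;
}
```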

When generating a symbol address, we have to take into account whether
it is a pointer to data or to a function, because different
relocations are needed.
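A simplified C sketch of that choice follows. It is an approximation under stated assumptions, not the logic of arm_pic_static_addr: the enum and fdpic_reloc_for_symbol are hypothetical, and it assumes that GOTOFF (an offset from the GOT, which lives in the data segment) is only usable for local data known to be in the data segment, because FDPIC may load text and data at independent addresses. For functions, "binds locally" stands for the canonical local descriptor test (arm_local_funcdesc_p, added later in the series).

```c
#include <stdbool.h>

/* Hypothetical relocation kinds, named after the relocation suffixes
   used in this series.  */
enum fdpic_reloc { RELOC_GOT, RELOC_GOTOFF, RELOC_GOTFUNCDESC,
                   RELOC_GOTOFFFUNCDESC };

static enum fdpic_reloc
fdpic_reloc_for_symbol (bool is_function, bool binds_locally,
                        bool in_data_segment)
{
  if (is_function)
    return binds_locally ? RELOC_GOTOFFFUNCDESC : RELOC_GOTFUNCDESC;
  /* Assumption: GOT-relative offsets are only valid within the data
     segment, so other data goes through a GOT entry.  */
  return (binds_locally && in_data_segment) ? RELOC_GOTOFF : RELOC_GOT;
}
```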

2018-XX-XX  Christophe Lyon  
Mickaël Guêné 

* config/arm/arm-c.c (__FDPIC__): Define new pre-processor macro
in FDPIC mode.
* config/arm/arm-protos.h (arm_load_function_descriptor): Declare
new function.
* config/arm/arm.c (arm_option_override): Define pic register to
FDPIC_REGNUM.
(arm_function_ok_for_sibcall): Disable sibcall optimization if we
have no decl or go through PLT.
(arm_load_pic_register): Handle TARGET_FDPIC.
(arm_is_segment_info_known): New function.
(arm_pic_static_addr): Add support for FDPIC.
(arm_load_function_descriptor): New function.
(arm_assemble_integer): Add support for FDPIC.
* config/arm/arm.h (PIC_OFFSET_TABLE_REG_CALL_CLOBBERED):
Define. (FDPIC_REGNUM): New define.
* config/arm/arm.md (call): Add support for FDPIC.
(call_value): Likewise.
(*restore_pic_register_after_call): New pattern.
(untyped_call): Disable if FDPIC.
(untyped_return): Likewise.
* config/arm/unspecs.md (UNSPEC_PIC_RESTORE): New.

Change-Id: Icee8484772f97ac6f3a9574df4aa4f25a8196786

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 4471f79..90733cc 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -202,6 +202,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)
   builtin_define ("__ARM_EABI__");
 }
 
+  def_or_undef_macro (pfile, "__FDPIC__", TARGET_FDPIC);
+
   def_or_undef_macro (pfile, "__ARM_ARCH_EXT_IDIV__", TARGET_IDIV);
   def_or_undef_macro (pfile, "__ARM_FEATURE_IDIV", TARGET_IDIV);
 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 8537262..edebeb7 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -134,6 +134,7 @@ extern int arm_max_const_double_inline_cost (void);
 extern int arm_const_double_inline_cost (rtx);
 extern bool arm_const_double_by_parts (rtx);
 extern bool arm_const_double_by_immediates (rtx);
+extern rtx arm_load_function_descriptor (rtx funcdesc);
 extern void arm_emit_call_insn (rtx, rtx, bool);
 bool detect_cmse_nonsecure_call (tree);
 extern const char *output_call (rtx *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c70be36..44c3b08 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3466,6 +3466,14 @@ arm_option_override (void)
   if (flag_pic && TARGET_VXWORKS_RTP)
 arm_pic_register = 9;
 
+  /* If in FDPIC mode then force arm_pic_register to be r9.  */
+  if (TARGET_FDPIC)
+{
+  arm_pic_register = FDPIC_REGNUM;
+  if (TARGET_ARM_ARCH < 7)
+   error ("FDPIC mode is not supported on architectures older than 7");
+}
+
   if (arm_pic_register_string != NULL)
 {
   int pic_register = decode_reg_name (arm_pic_register_string);
@@ -7247,6 +7255,21 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
   if (cfun->machine->sibcall_blocked)
 return false;
 
+  if (TARGET_FDPIC)
+{
+  /* In FDPIC, never tailcall something for which we have no decl:
+the target function could be in a different module, requiring
+a different FDPIC register value.  */
+  if (decl == NULL)
+   return false;
+
+  /* Don't tailcall if we go through the PLT since the FDPIC
+register is then corrupted and we don't restore it after
+static function calls.  */
+  if (!targetm.binds_local_p (decl))
+   return false;
+}
+
   /* Never tailcall something if we are generating code for Thumb-1.  */
   if (TARGET_THUMB1)
 return false;
@@ -7625,7 +7648,9 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED)
 {
   rtx l1, labelno, pic_tmp, pic_rtx, pic_reg;
 
-  if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)
+  if (crtl->uses_pic_offset_table == 0
+  || TARGET_SINGLE_PIC_BASE
+  || TARGET_FDPIC)
 return;
 
   gcc_assert (flag_pic);
@@ -7693,28 +7718,170 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED)
   emit_use (pic_reg);
 }
 
+/* Try to know if the object will go in text or data segment. This is
+   used in FDPIC mode, to decide which relocations to use when
+   accessing ORIG. */
+static bool
+arm_is_segment_info_known (rtx orig, bool *is_readonly)
+{
+  bool res = false;
+
+  *is_readonly = false;
+
+  if (GET_CODE (orig) == LABEL_REF)
+{
+  res = true;
+  *is_readonly = true;
+}
+  else if (GET_CODE (orig) == SYMBOL_REF)
+{
+  if (CONSTANT_POOL_ADDRESS_P (orig))
+   {
+ res = true;
+ 

[ARM/FDPIC v2 03/21] [ARM] FDPIC: Force FDPIC related options unless -mno-fdpic is provided

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

In FDPIC mode, we set -fPIE unless the user provides -fno-PIE, -fpie,
-fPIC or -fpic: indeed FDPIC code is PIC, but we want to generate code
for executables rather than shared libraries by default.

We also make sure to use the --fdpic assembler option, and select the
appropriate linker emulation.

At link time, we also default to -pie, unless we are generating a
shared library or a relocatable file (-r). Note that even for static
link, we must specify the dynamic linker because the executable still
has to relocate itself at startup.

We also force 'now' binding since lazy binding is not supported.

We should also apply the same behavior for -Wl,-Ur as for -r, but I
couldn't find how to describe that in the specs fragment.
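The defaulting rules above are implemented as GCC spec strings, not C, but their intent can be modeled in a few lines. In this hypothetical sketch, has_any, fdpic_wants_fPIE and fdpic_wants_pie are illustrative names; the -Wl,-Ur case and the forced 'now' binding are not modeled.

```c
#include <stdbool.h>
#include <string.h>

/* True if any of OPTS appears verbatim in ARGV.  */
static bool
has_any (int argc, const char *const argv[],
         const char *const opts[], int nopts)
{
  for (int i = 0; i < argc; i++)
    for (int j = 0; j < nopts; j++)
      if (strcmp (argv[i], opts[j]) == 0)
        return true;
  return false;
}

/* Compile time: add -fPIE unless the user chose another PIC/PIE mode.  */
static bool
fdpic_wants_fPIE (int argc, const char *const argv[])
{
  static const char *const opts[] = { "-fno-PIE", "-fpie", "-fPIC", "-fpic" };
  return !has_any (argc, argv, opts, 4);
}

/* Link time: add -pie unless building a shared library or doing -r.  */
static bool
fdpic_wants_pie (int argc, const char *const argv[])
{
  static const char *const opts[] = { "-shared", "-r" };
  return !has_any (argc, argv, opts, 2);
}
```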

2018-XX-XX  Christophe Lyon  
Mickaël Guêné 

gcc/
* config.gcc: Handle arm*-*-uclinuxfdpiceabi.
* config/arm/bpabi.h (TARGET_FDPIC_ASM_SPEC): New.
(SUBTARGET_EXTRA_ASM_SPEC): Use TARGET_FDPIC_ASM_SPEC.
* config/arm/linux-eabi.h (FDPIC_CC1_SPEC): New.
(CC1_SPEC): Use FDPIC_CC1_SPEC.
* config/arm/uclinuxfdpiceabi.h: New file.

libsanitizer/
* configure.tgt (arm*-*-uclinuxfdpiceabi): Sanitizers are
unsupported in this configuration.

Change-Id: If369e0a10bb916fd72e38f71498d3c640fa85c4c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 808ff82..747afd8 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1145,6 +1145,11 @@ arm*-*-linux-* | arm*-*-uclinuxfdpiceabi)
# ARM GNU/Linux with ELF
esac
tmake_file="${tmake_file} arm/t-arm arm/t-arm-elf arm/t-bpabi arm/t-linux-eabi"
tm_file="$tm_file arm/bpabi.h arm/linux-eabi.h arm/aout.h vxworks-dummy.h arm/arm.h"
+   case $target in
+   arm*-*-uclinuxfdpiceabi)
+   tm_file="$tm_file arm/uclinuxfdpiceabi.h"
+   ;;
+   esac
# Generation of floating-point instructions requires at least ARMv5te.
if [ "$with_float" = "hard" -o "$with_float" = "softfp" ] ; then
target_cpu_cname="arm10e"
diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index 1e3ecfb..5901154 100644
--- a/gcc/config/arm/bpabi.h
+++ b/gcc/config/arm/bpabi.h
@@ -55,6 +55,8 @@
 #define TARGET_FIX_V4BX_SPEC " %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*"\
   "|march=armv4|mcpu=fa526|mcpu=fa626:--fix-v4bx}"
 
+#define TARGET_FDPIC_ASM_SPEC  ""
+
 #define BE8_LINK_SPEC  \
   "%{!r:%{!mbe32:%:be8_linkopt(%{mlittle-endian:little}"   \
   "   %{mbig-endian:big}"  \
@@ -64,7 +66,7 @@
 /* Tell the assembler to build BPABI binaries.  */
 #undef  SUBTARGET_EXTRA_ASM_SPEC
 #define SUBTARGET_EXTRA_ASM_SPEC \
-  "%{mabi=apcs-gnu|mabi=atpcs:-meabi=gnu;:-meabi=5}" TARGET_FIX_V4BX_SPEC
+  "%{mabi=apcs-gnu|mabi=atpcs:-meabi=gnu;:-meabi=5}" TARGET_FIX_V4BX_SPEC TARGET_FDPIC_ASM_SPEC
 
 #ifndef SUBTARGET_EXTRA_LINK_SPEC
 #define SUBTARGET_EXTRA_LINK_SPEC ""
diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h
index 8585fde..4cee958 100644
--- a/gcc/config/arm/linux-eabi.h
+++ b/gcc/config/arm/linux-eabi.h
@@ -98,11 +98,14 @@
 #undef  ASAN_CC1_SPEC
 #define ASAN_CC1_SPEC "%{%:sanitize(address):-funwind-tables}"
 
+#define FDPIC_CC1_SPEC ""
+
 #undef  CC1_SPEC
 #define CC1_SPEC   \
-  LINUX_OR_ANDROID_CC (GNU_USER_TARGET_CC1_SPEC " " ASAN_CC1_SPEC, \
+  LINUX_OR_ANDROID_CC (GNU_USER_TARGET_CC1_SPEC " " ASAN_CC1_SPEC " "  \
+  FDPIC_CC1_SPEC,  \
   GNU_USER_TARGET_CC1_SPEC " " ASAN_CC1_SPEC " "   \
-  ANDROID_CC1_SPEC)
+  ANDROID_CC1_SPEC " " FDPIC_CC1_SPEC)
 
 #define CC1PLUS_SPEC \
   LINUX_OR_ANDROID_CC ("", ANDROID_CC1PLUS_SPEC)
diff --git a/gcc/config/arm/uclinuxfdpiceabi.h b/gcc/config/arm/uclinuxfdpiceabi.h
new file mode 100644
index 000..43a17de
--- /dev/null
+++ b/gcc/config/arm/uclinuxfdpiceabi.h
@@ -0,0 +1,53 @@
+/* Configuration file for ARM GNU/Linux FDPIC EABI targets.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by STMicroelectronics.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+/* On uClibc EABI GNU/Linux, we want to force -mfdpic by 

[ARM/FDPIC v2 02/21] [ARM] FDPIC: Handle arm*-*-uclinuxfdpiceabi in configure scripts

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

The new arm-uclinuxfdpiceabi target behaves pretty much like
arm-linux-gnueabi. In order to enable the same set of features, we
have to update several configure scripts that generally match targets
like *-*-linux*: in most places, we add *-uclinux* where there is
already *-linux*, or uclinux* when there is already linux*.

In gcc/config.gcc and libgcc/config.host we use *-*-uclinuxfdpiceabi
because there is already a different behaviour for *-*-uclinux* targets.

In libtool.m4, we use uclinuxfdpiceabi in cases where ELF shared
library support is required, as uclinux does not guarantee that.

2018-XX-XX  Christophe Lyon  

* config/futex.m4: Handle *-uclinux*.
* config/tls.m4 (GCC_CHECK_TLS): Likewise.
* gcc/config.gcc: Handle *-*-uclinuxfdpiceabi.
* libatomic/configure.tgt: Handle arm*-*-uclinux*.
* libgcc/config.host: Handle *-*-uclinuxfdpiceabi.
* libitm/configure.tgt: Handle *-*-uclinux*.
* libatomic/configure: Regenerate.
* libitm/configure: Regenerate.
* libstdc++-v3/acinclude.m4: Handle uclinux*.
* libstdc++-v3/configure: Regenerate.
* libstdc++-v3/configure.host: Handle uclinux*.
* libtool.m4: Handle uclinux*.

Change-Id: I6a1fdcd9847d8a82179a214612a3474c1f492916

diff --git a/config/futex.m4 b/config/futex.m4
index e95144d..4dffe15 100644
--- a/config/futex.m4
+++ b/config/futex.m4
@@ -9,7 +9,7 @@ AC_DEFUN([GCC_LINUX_FUTEX],[dnl
 GCC_ENABLE(linux-futex,default, ,[use the Linux futex system call],
   permit yes|no|default)
 case "$target" in
-  *-linux*)
+  *-linux* | *-uclinux*)
 case "$enable_linux_futex" in
   default)
# If headers don't have gettid/futex syscalls definition, then
diff --git a/config/tls.m4 b/config/tls.m4
index 4e170c8..5a8676e 100644
--- a/config/tls.m4
+++ b/config/tls.m4
@@ -76,7 +76,7 @@ AC_DEFUN([GCC_CHECK_TLS], [
  dnl Shared library options may depend on the host; this check
  dnl is only known to be needed for GNU/Linux.
  case $host in
-   *-*-linux*)
+   *-*-linux* | *-*-uclinux*)
  LDFLAGS="-shared -Wl,--no-undefined $LDFLAGS"
  ;;
  esac
diff --git a/gcc/config.gcc b/gcc/config.gcc
index ef67c88..808ff82 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -759,7 +759,7 @@ case ${target} in
 *-*-fuchsia*)
   native_system_header_dir=/include
   ;;
-*-*-linux* | frv-*-*linux* | *-*-kfreebsd*-gnu | *-*-gnu* | *-*-kopensolaris*-gnu)
+*-*-linux* | frv-*-*linux* | *-*-kfreebsd*-gnu | *-*-gnu* | *-*-kopensolaris*-gnu | *-*-uclinuxfdpiceabi)
   extra_options="$extra_options gnu-user.opt"
   gas=yes
   gnu_ld=yes
@@ -768,7 +768,7 @@ case ${target} in
   esac
   tmake_file="t-slibgcc"
   case $target in
-*-*-linux* | frv-*-*linux* | *-*-kfreebsd*-gnu | *-*-kopensolaris*-gnu)
+*-*-linux* | frv-*-*linux* | *-*-kfreebsd*-gnu | *-*-kopensolaris*-gnu  | *-*-uclinuxfdpiceabi)
   :;;
 *-*-gnu*)
   native_system_header_dir=/include
@@ -788,7 +788,7 @@ case ${target} in
 *-*-*android*)
   tm_defines="$tm_defines DEFAULT_LIBC=LIBC_BIONIC"
   ;;
-*-*-*uclibc*)
+*-*-*uclibc* | *-*-uclinuxfdpiceabi)
   tm_defines="$tm_defines DEFAULT_LIBC=LIBC_UCLIBC"
   ;;
 *-*-*musl*)
@@ -1135,7 +1135,7 @@ arm*-*-netbsdelf*)
tmake_file="${tmake_file} arm/t-arm"
target_cpu_cname="arm6"
;;
-arm*-*-linux-*)# ARM GNU/Linux with ELF
+arm*-*-linux-* | arm*-*-uclinuxfdpiceabi)  # ARM GNU/Linux with ELF
tm_file="dbxelf.h elfos.h gnu-user.h linux.h linux-android.h glibc-stdint.h arm/elf.h arm/linux-gas.h arm/linux-elf.h"
extra_options="${extra_options} linux-android.opt"
case $target in
diff --git a/libatomic/configure b/libatomic/configure
index b902e2c..5b3ef8e 100755
--- a/libatomic/configure
+++ b/libatomic/configure
@@ -5819,7 +5819,7 @@ irix5* | irix6* | nonstopux*)
   ;;
 
 # This must be Linux ELF.
-linux* | k*bsd*-gnu | kopensolaris*-gnu)
+linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinuxfdpiceabi)
   lt_cv_deplibs_check_method=pass_all
   ;;
 
@@ -8305,7 +8305,7 @@ $as_echo_n "checking for $compiler option to produce PIC... " >&6; }
   lt_prog_compiler_static='-non_shared'
   ;;
 
-linux* | k*bsd*-gnu | kopensolaris*-gnu)
+linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinux*)
   case $cc_basename in
   # old Intel for x86_64 which still supported -KPIC.
   ecc*)
@@ -8900,7 +8900,7 @@ _LT_EOF
   archive_expsym_cmds='sed "s,^,_," $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags ${wl}-h,$soname ${wl}--retain-symbols-file,$output_objdir/$soname.expsym ${wl}--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib'
   ;;
 
-gnu* | linux* | tpf* | k*bsd*-gnu | kopensolaris*-gnu)
+gnu* | linux* | tpf* | k*bsd*-gnu | kopensolaris*-gnu | 

[ARM/FDPIC v2 00/21] FDPIC ABI for ARM

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

Hello,

This patch series implements the GCC contribution of the FDPIC ABI for
ARM targets.

This ABI makes it possible to run Linux on MMU-less ARM cores and
supports shared libraries to reduce the memory footprint.

Without an MMU, the relative distance between the text and data
segments differs from one process to another, hence the need for a
dedicated FDPIC register holding the start address of the data
segment. One of the side effects is that function pointers require two
words to be represented: the address of the code, and the data segment
start address. These two words are designated as a "Function
Descriptor", hence the "FD PIC" name.

On ARM, the FDPIC register is r9 [1], and the target name is
arm-uclinuxfdpiceabi. Note that arm-uclinux exists, but uses another
ABI and the BFLAT file format; it does not support code sharing.
The -mfdpic option is enabled by default, and -mno-fdpic should be
used to build the Linux kernel.
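To make the two-word representation concrete, here is a minimal C model of a function descriptor. It is only a sketch under stated assumptions: the names struct fdpic_funcdesc, add_module_constant and call_via_descriptor are hypothetical, and where the real ABI loads the second word into r9 before the call, this model passes it as an ordinary argument.

```c
#include <assert.h>

/* Hypothetical model of an FDPIC function descriptor: the code
   address plus the data segment base of the module that owns it.  */
struct fdpic_funcdesc
{
  int (*entry) (const int *data_base, int arg);
  const int *data_base;
};

/* Code that reaches its module's data only through DATA_BASE,
   the way FDPIC code reaches it through r9.  */
static int
add_module_constant (const int *data_base, int arg)
{
  return arg + data_base[0];
}

/* An indirect call through a descriptor: fetch both words, then hand
   the data base to the callee (loaded into r9 in the real ABI, passed
   as an ordinary argument in this model).  */
static int
call_via_descriptor (const struct fdpic_funcdesc *fd, int arg)
{
  assert (fd != 0 && fd->entry != 0);
  return fd->entry (fd->data_base, arg);
}
```

Two processes sharing the same code get descriptors whose first words are equal and whose second words differ, which is why a plain code address is not enough to call a function.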

This work was developed some time ago by STMicroelectronics, and was
presented during Linaro Connect SFO15 (September 2015). You can watch
the discussion and read the slides [2].
This presentation was related to the toolchain published on github [3],
which is based on binutils-2.22, gcc-4.7, uclibc-0.9.33.2, gdb-7.5.1
and qemu-2.3.0, and for which pre-built binaries are available [3].

The ABI itself is described in detail in [1].

Our Linux kernel patches have been updated and committed by Nicolas
Pitre (Linaro) in July 2017. They are required so that the loader is
able to handle this new file type. Indeed, the ELF files are tagged
with ELFOSABI_ARM_FDPIC. This new tag has been allocated by ARM, as
well as the new relocations involved.

The binutils and QEMU patch series have recently been merged [4][5].

To build such a toolchain, you'd also need to use my uClibc branch [6].
I have posted uclibc-ng patches for review [7].

I am currently working on updating the patches for the remaining
toolchain components: uclibc and gdb.

This series provides support for ARM v7 architecture and has been
tested on arm-linux-gnueabi without regression, as well as
arm-uclinuxfdpiceabi, using QEMU. arm-uclinuxfdpiceabi has more
failures than arm-linux-gnueabi, but is quite functional.

Are the GCC patches OK for inclusion in master?

Changes between v1 and v2:
- fix GNU coding style
- exit with an error for pre-Armv7
- use ACLE __ARM_ARCH and remove dead code for pre-Armv4
- remove unsupported attempts of pre-Armv7/thumb1 support
- add instructions in comments next to opcodes
- merge patches 11 and 13
- fix protected visibility handling in patch 8
- merge legitimize_tls_address_fdpic and
  legitimize_tls_address_not_fdpic as requested

Thanks,

Christophe.


[1] https://github.com/mickael-guene/fdpic_doc/blob/master/abi.txt
[2] http://connect.linaro.org/resource/sfo15/sfo15-406-arm-fdpic-toolset-kernel-libraries-for-cortex-m-cortex-r-mmuless-cores/
[3] https://github.com/mickael-guene/fdpic_manifest
[4] https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=f1ac0afe481e83c9a33f247b81fa7de789edc4d9
[5] https://git.qemu.org/?p=qemu.git;a=commit;h=e8fa72957419c11984608062c7dcb204a6003a06
[6] https://git.linaro.org/people/christophe.lyon/uclibc.git/log/?h=uClibc-0.9.33.2-fdpic-upstream
[7] https://mailman.uclibc-ng.org/pipermail/devel/2018-July/001705.html

Christophe Lyon (21):
  [ARM] FDPIC: Add -mfdpic option support
  [ARM] FDPIC: Handle arm*-*-uclinuxfdpiceabi in configure scripts
  [ARM] FDPIC: Force FDPIC related options unless -mno-fdpic is provided
  [ARM] FDPIC: Add support for FDPIC for arm architecture
  [ARM] FDPIC: Fix __do_global_dtors_aux and frame_dummy generation
  [ARM] FDPIC: Add support for c++ exceptions
  [ARM] FDPIC: Avoid saving/restoring r9 on stack since it is RO
  [ARM] FDPIC: Ensure local/global binding for function descriptors
  [ARM] FDPIC: Add support for taking address of nested function
  [ARM] FDPIC: Implement TLS support.
  [ARM] FDPIC: Add support to unwind FDPIC signal frame
  [ARM] FDPIC: Restore r9 after we call __aeabi_read_tp
  [ARM] FDPIC: Force LSB bit for PC in Cortex-M architecture
  [ARM][testsuite] FDPIC: Skip unsupported tests
  [ARM][testsuite] FDPIC: Adjust scan-assembler patterns.
  [ARM][testsuite] FDPIC: Skip v8-m and v6-m tests that currently produce an ICE
  [ARM][testsuite] FDPIC: Skip tests that don't work in PIC mode
  [ARM][testsuite] FDPIC: Handle *-*-uclinux*
  [ARM][testsuite] FDPIC: Enable tests on pie_enabled targets
  [ARM][testsuite] FDPIC: Adjust pr43698.c to avoid clash with uclibc.
  [ARM][testsuite] FDPIC: Skip tests using architecture older than v7

 config/futex.m4|   2 +-
 config/tls.m4  |   2 +-
 gcc/config.gcc |  13 +-
 gcc/config/arm/arm-c.c |   2 +
 gcc/config/arm/arm-protos.h|   1 +
 gcc/config/arm/arm.c   | 

[ARM/FDPIC v2 01/21] [ARM] FDPIC: Add -mfdpic option support

2018-07-13 Thread christophe.lyon
From: Christophe Lyon 

2018-XX-XX  Christophe Lyon  
Mickaël Guêné  

gcc/
* config/arm/arm.opt: Add -mfdpic option.

Change-Id: Ie5c4ed7434488933de6133186da09cd3ea1291a7

diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index a1286a4..231c1cb 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -302,3 +302,7 @@ When linking for big-endian targets, generate a legacy BE32 format image.
 mbranch-cost=
 Target RejectNegative Joined UInteger Var(arm_branch_cost) Init(-1)
 Cost to assume for a branch insn.
+
+mfdpic
+Target Report Mask(FDPIC)
+Enable Function Descriptor PIC mode.
-- 
2.6.3



Re: [PATCH] x86: Tune Skylake, Cannonlake and Icelake as Haswell

2018-07-13 Thread H.J. Lu
On Fri, Jul 13, 2018 at 9:07 AM, Jan Hubicka  wrote:
>> > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> > > > index 9e46b7b136f..762ab89fc9e 100644
>> > > > --- a/gcc/config/i386/i386.c
>> > > > +++ b/gcc/config/i386/i386.c
>> > > > @@ -137,17 +137,22 @@ const struct processor_costs *ix86_cost = NULL;
>> > > >  #define m_CORE2 (HOST_WIDE_INT_1U<> > > >  #define m_NEHALEM (HOST_WIDE_INT_1U<> > > >  #define m_SANDYBRIDGE (HOST_WIDE_INT_1U<> > > > -#define m_HASWELL (HOST_WIDE_INT_1U<> > > > +#define m_HASWELL ((HOST_WIDE_INT_1U<> > > > +  | (HOST_WIDE_INT_1U<> > > > +  | (HOST_WIDE_INT_1U<> > > > +  | (HOST_WIDE_INT_1U<> > > > +  | (HOST_WIDE_INT_1U<> > > > +  | (HOST_WIDE_INT_1U<> > > >
>> > >
>> > > Please introduce a new per-family define and group processors in this
>> > > define. Something like m_BDVER, m_BTVER and m_AMD_MULTIPLE for AMD
>> > targets.
>> > > We should not redefine m_HASWELL to include unrelated families.
>> > >
>> >
>> > Here is the updated patch.  OK for trunk if all tests pass?
>> >
>> >
>> OK.
>
> We have also noticed that benchmarks on skylake are not good compared to
> haswell, this nicely explains it.  I think this is a -march=native regression
> compared to GCC versions that did not support better CPUs than Haswell.  So it
> would be nice to backport it.

Yes, we should.   Here is the patch to backport to GCC 8.  OK for GCC 8 after
it has been checked into trunk?

Thanks.

-- 
H.J.
From 40a1050b330b421a1f445cb2a40b5a002da2e6d6 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 4 Jun 2018 19:16:06 -0700
Subject: [PATCH] x86: Tune Skylake, Cannonlake and Icelake as Haswell

r259399, which added PROCESSOR_SKYLAKE, disabled many x86 optimizations
which are enabled by PROCESSOR_HASWELL.  As a result, -mtune=skylake
generates slower code on Skylake than before.  The same also applies
to Cannonlake and Icelake tuning.

This patch changes -mtune={skylake|cannonlake|icelake} to tune like
-mtune=haswell until their tuning is properly adjusted. It also
enables -mprefer-vector-width=256 for -mtune=haswell, which has no
impact on codegen when AVX512 isn't enabled.
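The review asked for a per-family define rather than widening m_HASWELL. The following self-contained C sketch shows the idea behind such masks: each processor gets one bit, a family define ORs those bits together, and a tuning feature applies when the selected processor's bit is in the feature's mask. Only the m_CORE_AVX2/m_CORE_AVX512 names come from the ChangeLog below; the enum, the PROC_BIT helper, the example feature and the exact family membership are illustrative assumptions, not a copy of i386.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of processor enumerators, one bit each.  */
enum processor_type
{
  PROCESSOR_BONNELL,
  PROCESSOR_HASWELL,
  PROCESSOR_SKYLAKE,
  PROCESSOR_SKYLAKE_AVX512,
  PROCESSOR_CANNONLAKE,
  PROCESSOR_ICELAKE_CLIENT,
  PROCESSOR_ICELAKE_SERVER
};

#define PROC_BIT(p) (UINT64_C (1) << (p))

/* Per-family groupings, instead of redefining m_HASWELL itself.  */
#define m_HASWELL PROC_BIT (PROCESSOR_HASWELL)
#define m_SKYLAKE PROC_BIT (PROCESSOR_SKYLAKE)
#define m_CORE_AVX512 (PROC_BIT (PROCESSOR_SKYLAKE_AVX512) \
		       | PROC_BIT (PROCESSOR_CANNONLAKE) \
		       | PROC_BIT (PROCESSOR_ICELAKE_CLIENT) \
		       | PROC_BIT (PROCESSOR_ICELAKE_SERVER))
#define m_CORE_AVX2 (m_HASWELL | m_SKYLAKE | m_CORE_AVX512)

/* A hypothetical tuning feature that used to list only m_HASWELL.  */
static const uint64_t example_tune_mask = m_CORE_AVX2;

/* A feature applies when the tuned-for processor's bit is in its mask.  */
static bool
tune_feature_enabled (uint64_t feature_mask, enum processor_type tune)
{
  return (feature_mask & PROC_BIT (tune)) != 0;
}
```

Keeping m_HASWELL as a single bit means a later patch can still tune Haswell alone, while features shared by the whole family are written once against m_CORE_AVX2.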

Performance impacts on SPEC CPU 2017 rate with 1 copy using

-march=native -mfpmath=sse -O2 -m64

are

1. On Broadwell server:

500.perlbench_r		-0.56%
502.gcc_r		-0.18%
505.mcf_r		0.24%
520.omnetpp_r		0.00%
523.xalancbmk_r		-0.32%
525.x264_r		-0.17%
531.deepsjeng_r		0.00%
541.leela_r		0.00%
548.exchange2_r		0.12%
557.xz_r		0.00%
Geomean			0.00%

503.bwaves_r		0.00%
507.cactuBSSN_r		0.21%
508.namd_r		0.00%
510.parest_r		0.19%
511.povray_r		-0.48%
519.lbm_r		0.00%
521.wrf_r		0.28%
526.blender_r		0.19%
527.cam4_r		0.39%
538.imagick_r		0.00%
544.nab_r		-0.36%
549.fotonik3d_r		0.51%
554.roms_r		0.00%
Geomean			0.17%

2. On Skylake client:

500.perlbench_r		0.96%
502.gcc_r		0.13%
505.mcf_r		-1.03%
520.omnetpp_r		-1.11%
523.xalancbmk_r		1.02%
525.x264_r		0.50%
531.deepsjeng_r		2.97%
541.leela_r		0.50%
548.exchange2_r		-0.95%
557.xz_r		2.41%
Geomean			0.56%

503.bwaves_r		0.49%
507.cactuBSSN_r		3.17%
508.namd_r		4.05%
510.parest_r		0.15%
511.povray_r		0.80%
519.lbm_r		3.15%
521.wrf_r		10.56%
526.blender_r		2.97%
527.cam4_r		2.36%
538.imagick_r		46.40%
544.nab_r		2.04%
549.fotonik3d_r		0.00%
554.roms_r		1.27%
Geomean			5.49%

3. On Skylake server:

500.perlbench_r		0.71%
502.gcc_r		-0.51%
505.mcf_r		-1.06%
520.omnetpp_r		-0.33%
523.xalancbmk_r		-0.22%
525.x264_r		1.72%
531.deepsjeng_r		-0.26%
541.leela_r		0.57%
548.exchange2_r		-0.75%
557.xz_r		-1.28%
Geomean			-0.21%

503.bwaves_r		0.00%
507.cactuBSSN_r		2.66%
508.namd_r		3.67%
510.parest_r		1.25%
511.povray_r		2.26%
519.lbm_r		1.69%
521.wrf_r		11.03%
526.blender_r		3.39%
527.cam4_r		1.69%
538.imagick_r		64.59%
544.nab_r		-0.54%
549.fotonik3d_r		2.68%
554.roms_r		0.00%
Geomean			6.19%

This patch improves -march=native performance on Skylake by up to 60% and
leaves -march=native performance unchanged on Haswell.

gcc/

	Backport from mainline
	2018-07-12  H.J. Lu  
		Sunil K Pandey  

	PR target/84413
	* config/i386/i386.c (m_CORE_AVX512): New.
	(m_CORE_AVX2): Likewise.
	(m_CORE_ALL): Add m_CORE_AVX2.
	* config/i386/x86-tune.def: Replace m_HASWELL with m_CORE_AVX2.
	Replace m_SKYLAKE_AVX512 with m_CORE_AVX512 on avx256_optimal
	and remove the rest of m_SKYLAKE_AVX512.

gcc/testsuite/

	Backport from mainline
	2018-07-12  H.J. Lu  
		Sunil K Pandey  

	PR target/84413
	* gcc.target/i386/pr84413-1.c: New test.
	* gcc.target/i386/pr84413-2.c: Likewise.
	* gcc.target/i386/pr84413-3.c: Likewise.
	* gcc.target/i386/pr84413-4.c: Likewise.
---
 gcc/config/i386/i386.c|  5 -
 gcc/config/i386/x86-tune.def  | 26 +++
 gcc/testsuite/gcc.target/i386/pr84413-1.c | 17 +++
 gcc/testsuite/gcc.target/i386/pr84413-2.c | 17 +++
 gcc/testsuite/gcc.target/i386/pr84413-3.c | 17 +++
 

[GCC][PATCH][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks

2018-07-13 Thread Sam Tebbs

Hi all,

This patch adds an optimisation that exploits the AArch64 BFXIL instruction
when OR-ing the results of two bitwise AND operations with non-overlapping
bitmasks (e.g. (a & 0x) | (b & 0x)).

Example:

unsigned long long combine(unsigned long long a, unsigned long long b) {
  return (a & 0xll) | (b & 0xll);
}

void read2(unsigned long long a, unsigned long long b, unsigned long long *c,
  unsigned long long *d) {
  *c = combine(a, b); *d = combine(b, a);
}

Without this patch, compiling read2 with -O2 results in:

read2:
  and   x5, x1, #0x
  and   x4, x0, #0x
  orr   x4, x4, x5
  and   x1, x1, #0x
  and   x0, x0, #0x
  str   x4, [x2]
  orr   x0, x0, x1
  str   x0, [x3]
  ret

With this patch, it results in:

read2:
  mov   x4, x1
  bfxil x4, x0, 0, 32
  str   x4, [x2]
  bfxil x0, x1, 0, 32
  str   x0, [x3]
  ret
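For reference, BFXIL Xd, Xn, #lsb, #width copies bits [lsb, lsb+width) of Xn into the low width bits of Xd and leaves the other bits of Xd unchanged. The small C model below (bfxil_model and and_or_form are hypothetical helper names) checks that the patched "mov x4, x1; bfxil x4, x0, 0, 32" computes the same value as taking the low 32 bits from x0 and the high 32 bits from x1 and OR-ing them, which is the shape of expression the new patterns match.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the 64-bit AArch64 BFXIL: take WIDTH bits of N starting
   at bit LSB and insert them into the low WIDTH bits of D; the other
   bits of D are left unchanged.  */
static uint64_t
bfxil_model (uint64_t d, uint64_t n, unsigned lsb, unsigned width)
{
  assert (width >= 1 && width <= 64 && lsb + width <= 64);
  uint64_t mask = width == 64 ? UINT64_MAX : (UINT64_C (1) << width) - 1;
  return (d & ~mask) | ((n >> lsb) & mask);
}

/* What "mov x4, x1; bfxil x4, x0, 0, 32" computes, written as the
   AND/AND/ORR sequence it replaces (complementary masks).  */
static uint64_t
and_or_form (uint64_t x0, uint64_t x1)
{
  return (x0 & UINT64_C (0x00000000ffffffff))
	 | (x1 & UINT64_C (0xffffffff00000000));
}
```

Because the two masks are complements, one register can serve as the destination and keep its own bits, so the two ANDs and the ORR collapse into a single bitfield insert.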
  
Bootstrapped and regtested on aarch64-none-linux-gnu and aarch64-none-elf with 
no regressions.


gcc/
2018-07-11  Sam Tebbs  

    * config/aarch64/aarch64.md (*aarch64_bfxil, *aarch64_bfxil_alt):
    Define.
    * config/aarch64/aarch64-protos.h (aarch64_is_left_consecutive):
    Define.
    * config/aarch64/aarch64.c (aarch64_is_left_consecutive): New function.

gcc/testsuite
2018-07-11  Sam Tebbs  

    * gcc.target/aarch64/combine_bfxil.c: New file.
    * gcc.target/aarch64/combine_bfxil_2.c: New file.
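The aarch64_is_left_consecutive helper added below relies on the identity that i | (i - 1) sets every bit below the lowest set bit of i, so the result is all ones exactly when the set bits of i form one run ending at the MSB. A standalone sketch of the same test follows; it uses uint64_t (an assumption of this sketch, not what the patch does) so that i - 1 is well defined for every input, including 0x8000000000000000, where a signed subtraction would overflow. The name left_consecutive_u64 is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the set bits of I are one contiguous run ending at bit 63
   (for example 0xffffffff00000000).  I | (I - 1) fills in every bit
   below the lowest set bit, so it is all ones only for such masks.
   Note that 0 (an empty run) also passes.  Unsigned arithmetic keeps
   I - 1 well defined for every input.  */
static bool
left_consecutive_u64 (uint64_t i)
{
  return (i | (i - 1)) == UINT64_MAX;
}
```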


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 514ddc4..b025cd6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -558,4 +558,6 @@ rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
 
 poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
+bool aarch64_is_left_consecutive (HOST_WIDE_INT);
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d75d45f..884958b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1439,6 +1439,14 @@ aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned,
 return SImode;
 }
 
+/* Implement IS_LEFT_CONSECUTIVE.  Check if an integer's bits are consecutive
+   ones from the MSB.  */
+bool
+aarch64_is_left_consecutive (HOST_WIDE_INT i)
+{
+  return (i | (i - 1)) == HOST_WIDE_INT_M1;
+}
+
 /* Implement TARGET_CONSTANT_ALIGNMENT.  Make strings word-aligned so
that strcpy from constants will be faster.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a014a01..383d699 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4844,6 +4844,42 @@
   [(set_attr "type" "rev")]
 )
 
+(define_insn "*aarch64_bfxil"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+    (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r")
+		(match_operand 3 "const_int_operand"))
+	(and:DI (match_operand:DI 2 "register_operand" "0")
+		(match_operand 4 "const_int_operand"))))]
+  "INTVAL (operands[3]) == ~INTVAL (operands[4])
+    && aarch64_is_left_consecutive (INTVAL (operands[3]))"
+  {
+    HOST_WIDE_INT op4 = INTVAL (operands[4]);
+    operands[3] = GEN_INT (64 - ceil_log2 (op4));
+    output_asm_insn ("bfxil\\t%0, %1, 0, %3", operands);
+    return "";
+  }
+  [(set_attr "type" "bfx")]
+)
+
+; An alternate bfxil pattern where the second bitmask is the smallest, and so
+; the first register used is changed instead of the second
+(define_insn "*aarch64_bfxil_alt"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+    (ior:DI (and:DI (match_operand:DI 1 "register_operand" "0")
+		(match_operand 3 "const_int_operand"))
+	(and:DI (match_operand:DI 2 "register_operand" "r")
+		(match_operand 4 "const_int_operand"))))]
+  "INTVAL (operands[3]) == ~INTVAL (operands[4])
+    && aarch64_is_left_consecutive (INTVAL (operands[4]))"
+  {
+    HOST_WIDE_INT op3 = INTVAL (operands[3]);
+    operands[3] = GEN_INT (64 - ceil_log2 (op3));
+    output_asm_insn ("bfxil\\t%0, %2, 0, %3", operands);
+    return "";
+  }
+  [(set_attr "type" "bfx")]
+)
+
 ;; There are no canonicalisation rules for the position of the lshiftrt, ashift
 ;; operations within an IOR/AND RTX, therefore we have two patterns matching
 ;; each valid permutation.
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
new file mode 100644
index 000..a0c6be4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned long long
+combine_balanced (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xll) | (b & 0xll);
+}
+
+
+unsigned long long
+combine_unbalanced (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xff00ll) | (b & 0x00ffll);
+}
+
+void
+foo2 (unsigned long long a, 

Re: [PATCH] x86: Tune Skylake, Cannonlake and Icelake as Haswell

2018-07-13 Thread Jan Hubicka
> > > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > > index 9e46b7b136f..762ab89fc9e 100644
> > > > --- a/gcc/config/i386/i386.c
> > > > +++ b/gcc/config/i386/i386.c
> > > > @@ -137,17 +137,22 @@ const struct processor_costs *ix86_cost = NULL;
> > > >  #define m_CORE2 (HOST_WIDE_INT_1U< > > >  #define m_NEHALEM (HOST_WIDE_INT_1U< > > >  #define m_SANDYBRIDGE (HOST_WIDE_INT_1U< > > > -#define m_HASWELL (HOST_WIDE_INT_1U< > > > +#define m_HASWELL ((HOST_WIDE_INT_1U< > > > +  | (HOST_WIDE_INT_1U< > > > +  | (HOST_WIDE_INT_1U< > > > +  | (HOST_WIDE_INT_1U< > > > +  | (HOST_WIDE_INT_1U< > > > +  | (HOST_WIDE_INT_1U< > > >
> > >
> > > Please introduce a new per-family define and group processors in this
> > > define. Something like m_BDVER, m_BTVER and m_AMD_MULTIPLE for AMD
> > targets.
> > > We should not redefine m_HASWELL to include unrelated families.
> > >
> >
> > Here is the updated patch.  OK for trunk if all tests pass?
> >
> >
> OK.

We have also noticed that benchmarks on skylake are not good compared to
haswell, this nicely explains it.  I think this is a -march=native regression
compared to GCC versions that did not support better CPUs than Haswell.  So it
would be nice to backport it.

Honza


Re: [PATCH] reject conflicting attributes before calling handlers (PR 86453)

2018-07-13 Thread Christophe Lyon
On Fri, 13 Jul 2018 at 17:10, Martin Sebor  wrote:
>
> On 07/13/2018 02:53 AM, Christophe Lyon wrote:
> > Hi,
> >
> > On Thu, 12 Jul 2018 at 00:04, Martin Sebor  wrote:
> >>
> >> The attached change set adjusts the attribute exclusion code
> >> to detect and reject incompatible attributes before attribute
> >> handlers are called, so that the handlers do not get a chance
> >> to make changes despite the exclusions.  The handlers are not
> >> run when a conflict is found.
> >>
> >> Tested on x86_64-linux.  I expected the fallout to be bigger
> >> but only a handful of tests needed adjusting and the changes
> >> all look like clear improvements.  I.e., conflicting attributes
> >> that are diagnosed as being ignored really are being ignored as one
> >> would expect.
> >>
> >
> > Since you committed this patch (r262596), I've noticed regressions on
> > aarch64/arm:
> > g++.dg/warn/pr86453.C  -std=c++11  (test for warnings, line 4)
> > g++.dg/warn/pr86453.C  -std=c++11 (test for excess errors)
> > g++.dg/warn/pr86453.C  -std=c++14  (test for warnings, line 4)
> > g++.dg/warn/pr86453.C  -std=c++14 (test for excess errors)
> > g++.dg/warn/pr86453.C  -std=c++98  (test for warnings, line 4)
> > g++.dg/warn/pr86453.C  -std=c++98 (test for excess errors)
> >
> > The log says:
> > Excess errors:
> > /gcc/testsuite/g++.dg/warn/pr86453.C:4:44: warning: ignoring attribute
> > 'packed' because it conflicts with attribute 'aligned' [-Wattributes]
> >
> > Isn't there the same message on x86_64?
>
> There was.  The test above was added between the time I tested
> my patch and the time I committed it.  I adjusted it yesterday
> via r262609 so the failure should be gone.
>

Indeed, thanks!
I reported the regression because I didn't see any comment about it on
gcc-patches.

Christophe

> Martin


[PATCH, middle-end]: Fix PR86511, traps are generated for non-trapping compares

2018-07-13 Thread Uros Bizjak
As demonstrated in the PR, the middle-end changes the trappiness of the
compare by expanding a non-trapping compare to a combination of
setcc/cmove branchless code, e.g. UNLT is split to UNORDERED setcc and
LT cmove.

The above conversion is invalid w.r.t traps, since UNLT doesn't trap
on NaNs, while LT does.
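The difference can be seen at the C level. The C99 comparison macros in <math.h> (isgreaterequal, isunordered) are quiet and do not raise FE_INVALID for NaN operands, while under C's Annex F (IEC 60559) semantics the plain < operator raises FE_INVALID when its operands are unordered. In the sketch below (unlt_quiet and unlt_split are hypothetical names), both functions return the same value for every input, but the branchless split evaluates a < b even when an operand is NaN, which is the kind of trapping change described above. Whether FE_INVALID is actually raised depends on the compiler and target, so the sketch only checks the values.

```c
#include <assert.h>
#include <math.h>

/* UNLT: true if A < B or the operands are unordered (either is NaN).
   !isgreaterequal is quiet: it does not raise FE_INVALID for NaN.  */
static int
unlt_quiet (double a, double b)
{
  return !isgreaterequal (a, b);
}

/* The UNORDERED setcc + LT cmove style split, written branchlessly:
   a < b is evaluated unconditionally, and under Annex F semantics
   < raises FE_INVALID when an operand is NaN, unlike UNLT.  */
static int
unlt_split (double a, double b)
{
  int unordered = isunordered (a, b);
  int less = a < b;
  return unordered | less;
}
```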

The solution is to avoid the above expansion for compares that would
change their trappiness, and to emit a jump-based sequence instead.

2018-07-13  Uros Bizjak  

PR target/86511
* expmed.c (emit_store_flag): Do not emit setcc followed by a
conditional move when trapping comparison was split to a
non-trapping one (and vice versa).

Patch was bootstrapped and regression tested on x86_64-linux-gnu
{,-m32}, regression tests on alphaev68-linux-gnu are still running.

OK for mainline and branch?

Uros.
diff --git a/gcc/expmed.c b/gcc/expmed.c
index b01e1946898a..f114eb45e01f 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -6038,6 +6038,11 @@ emit_store_flag (rtx target, enum rtx_code code, rtx 
op0, rtx op1,
   if (!HAVE_conditional_move)
return 0;
 
+  /* Do not turn a trapping comparison into a non-trapping one.  */
+  if ((code != EQ && code != NE && code != UNEQ && code != LTGT)
+ && flag_trapping_math)
+   return 0;
+
   /* Try using a setcc instruction for ORDERED/UNORDERED, followed by a
 conditional move.  */
   tem = emit_store_flag_1 (subtarget, first_code, op0, op1, mode, 0,


[PR c++/86374] Name lookup failure in enclosing template

2018-07-13 Thread Nathan Sidwell
This was a latent problem exposed by Jason's fix for 85815.  There we 
needed to find the class in the scope stack.  Unfortunately, here we'd 
pushed an (incomplete) instantiation list, rather than the general 
template list.  We were previously getting away with that because 
we'd do the lookup in a tsubsted scope, which had found the general
template in another way.


The fix is to have lookup_template_class_1 call tsubst_aggr_type directly for
contexts that are classes, with the entering_scope flag set.  That way
we figure out that list is the general template.  This matches the code
in tsubst_aggr_type itself, when it's tsubsting the context of the
class it is dealing with.


A small drive-by cleanup in the parser.

Committing to trunk now. Will commit to gcc-8 after testing there.

nathan

--
Nathan Sidwell
2018-07-13  Nathan Sidwell  

	PR c++/86374
	* pt.c (lookup_template_class_1): Use tsubst_aggr_type for
	contexts that are classes.
	* parser.c (cp_parser_template_id): Combine entering_scope decl &
	initializer.

	PR c++/86374
	* g++.dg/pr86374.C: New.

Index: cp/parser.c
===
--- cp/parser.c	(revision 262582)
+++ cp/parser.c	(working copy)
@@ -15973,15 +15973,14 @@ cp_parser_template_id (cp_parser *parser
   else if (DECL_TYPE_TEMPLATE_P (templ)
 	   || DECL_TEMPLATE_TEMPLATE_PARM_P (templ))
 {
-  bool entering_scope;
   /* In "template  ... A::", A is the abstract A
 	 template (rather than some instantiation thereof) only if
 	 is not nested within some other construct.  For example, in
 	 "template  void f(T) { A::", A is just an
 	 instantiation of A.  */
-  entering_scope = (template_parm_scope_p ()
-			&& cp_lexer_next_token_is (parser->lexer,
-		   CPP_SCOPE));
+  bool entering_scope
+	= (template_parm_scope_p ()
+	   && cp_lexer_next_token_is (parser->lexer, CPP_SCOPE));
   template_id
 	= finish_template_type (templ, arguments, entering_scope);
 }
Index: cp/pt.c
===
--- cp/pt.c	(revision 262582)
+++ cp/pt.c	(working copy)
@@ -9368,8 +9368,15 @@ lookup_template_class_1 (tree d1, tree a
 	  return found;
 	}
 
-  context = tsubst (DECL_CONTEXT (gen_tmpl), arglist,
-			complain, in_decl);
+  context = DECL_CONTEXT (gen_tmpl);
+  if (context && TYPE_P (context))
+	{
+	  context = tsubst_aggr_type (context, arglist, complain, in_decl, true);
+	  context = complete_type (context);
+	}
+  else
+	context = tsubst (context, arglist, complain, in_decl);
+
   if (context == error_mark_node)
 	return error_mark_node;
 
Index: testsuite/g++.dg/pr86374.C
===
--- testsuite/g++.dg/pr86374.C	(revision 0)
+++ testsuite/g++.dg/pr86374.C	(working copy)
@@ -0,0 +1,20 @@
+// pr C++/86374
+// bogus lookup error
+template
+struct list {
+  static const int index = 1;
+  template  struct addWithChecking {};
+};
+
+template
+struct find {
+  static const int result = 0;
+};
+
+template 
+template
+struct list::addWithChecking
+{
+  static const int xres =
+find >::result; // bogus error about index here.
+};


Re: [PATCH] reject conflicting attributes before calling handlers (PR 86453)

2018-07-13 Thread Martin Sebor

On 07/13/2018 02:53 AM, Christophe Lyon wrote:
> Hi,
>
> On Thu, 12 Jul 2018 at 00:04, Martin Sebor  wrote:
>>
>> The attached change set adjusts the attribute exclusion code
>> to detect and reject incompatible attributes before attribute
>> handlers are called, so that the handlers do not get a chance
>> to make changes despite the exclusions.  The handlers are not
>> run when a conflict is found.
>>
>> Tested on x86_64-linux.  I expected the fallout to be bigger
>> but only a handful of tests needed adjusting and the changes
>> all look like clear improvements.  I.e., conflicting attributes
>> that are diagnosed as being ignored really are being ignored as one
>> would expect.
>>
>
> Since you committed this patch (r262596), I've noticed regressions on
> aarch64/arm:
> g++.dg/warn/pr86453.C  -std=c++11  (test for warnings, line 4)
> g++.dg/warn/pr86453.C  -std=c++11 (test for excess errors)
> g++.dg/warn/pr86453.C  -std=c++14  (test for warnings, line 4)
> g++.dg/warn/pr86453.C  -std=c++14 (test for excess errors)
> g++.dg/warn/pr86453.C  -std=c++98  (test for warnings, line 4)
> g++.dg/warn/pr86453.C  -std=c++98 (test for excess errors)
>
> The log says:
> Excess errors:
> /gcc/testsuite/g++.dg/warn/pr86453.C:4:44: warning: ignoring attribute
> 'packed' because it conflicts with attribute 'aligned' [-Wattributes]
>
> Isn't there the same message on x86_64?


There was.  The test above was added between the time I tested
my patch and the time I committed it.  I adjusted it yesterday
via r262609 so the failure should be gone.

Martin


Re: [PATCH] S/390: libstdc++: 64 and 32 bit baseline update

2018-07-13 Thread Andreas Schwab
On Jul 13 2018, Andreas Krebbel  wrote:

> @@ -5645,3 +5657,5 @@ OBJECT:8:_ZTTSi@@GLIBCXX_3.4
>  OBJECT:8:_ZTTSo@@GLIBCXX_3.4
>  OBJECT:8:_ZTTSt13basic_istreamIwSt11char_traitsIwEE@@GLIBCXX_3.4
>  OBJECT:8:_ZTTSt13basic_ostreamIwSt11char_traitsIwEE@@GLIBCXX_3.4
> +TLS:4:_ZSt11__once_call@@GLIBCXX_3.4.11
> +TLS:4:_ZSt15__once_callable@@GLIBCXX_3.4.11

You should not have any TLS entries.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH, rs6000] Fix AIX test case failures

2018-07-13 Thread David Edelsohn
On Mon, Jun 25, 2018 at 1:04 PM Segher Boessenkool
 wrote:
>
> On Mon, Jun 25, 2018 at 09:53:17AM -0700, Carl Love wrote:
> > On Mon, 2018-06-25 at 04:44 -0500, Segher Boessenkool wrote:
> > > On Fri, Jun 22, 2018 at 02:55:44PM -0700, Carl Love wrote:
> > > > --- a/gcc/testsuite/gcc.target/powerpc/divkc3-2.c
> > > > +++ b/gcc/testsuite/gcc.target/powerpc/divkc3-2.c
> > > > @@ -13,4 +13,5 @@ divide (cld_t *p, cld_t *q, cld_t *r)
> > > >*p = *q / *r;
> > > >  }
> > > >
> > > > -/* { dg-final { scan-assembler "bl __divkc3" } } */
> > > > +/* { dg-final { scan-assembler "bl __divkc3" { target { powerpc*-*-linux* } } } } */
> > > > +/* { dg-final { scan-assembler "bl .__divdc3" { target { powerpc*-*-aix* } } } } */
> > >
> > > Should it be calling __divdc3 on AIX, is that correct?
> >
> > I was a bit surprised that it wasn't calling divkc3.  I am guessing
> > these are library routines we are calling?  I couldn't find the source
> > code for them and don't really know what the difference is between
> > divkc3 and divdc3.
>
> divkc3 is for KCmode, that is the complex mode for KFmode (128-bit IEEE).
> divdc3 is for DCmode, that is the complex mode for DFmode (64-bit IEEE,
> that is, "double").
>
> I think this is the same as PR82625, for which I have a patch in testing.
>
> > So, I'm not sure why AIX and Linux are not calling the same function,
> > or whether what is being called is functionally equivalent.
>
> AIX uses 64-bit long double by default, and GCC has a bug with that and
> -mabi=ieeelongdouble and __ieee128.
>
> It thinks __ieee128 is the same as long double if it has -mabi=ieeelongdouble,
> but that is not always true.  So it ends up using the long double type for
> __ieee128, but that is just double precision float in this case.

On AIX it would be calling divtc3, but AIX defaults to 64-bit long
double.  Either all of these tests need

/* { dg-require-effective-target longdouble128 } */

or

/* { dg-additional-options "-mlong-double-128" { target powerpc-ibm-aix* } } */

along with testing for "tc", e.g., bl .__divtc3
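A minimal sketch of the naming rule these tests depend on. It assumes the `__div<mode>3` pattern seen in this thread (`__divkc3`, `__divdc3`, `__divtc3`) and the leading dot that AIX puts on function entry symbols. The helpers `complex_div_libfunc` and `expected_call` are hypothetical. They are not part of GCC or the testsuite, and exist only to show which string a scan-assembler pattern would need for a given complex mode.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not part of GCC: builds the libgcc routine name for
// complex division in complex mode MODE ("KC", "DC", "TC", ...), following
// the "__div" + lower-case mode + "3" pattern seen in this thread.
std::string complex_div_libfunc (const std::string &mode)
{
  std::string name = "__div";
  for (char c : mode)
    name += static_cast<char> (std::tolower (static_cast<unsigned char> (c)));
  return name + "3";
}

// The call a scan-assembler pattern would look for.  On AIX the branch
// targets the function's code entry symbol, which starts with a dot.
std::string expected_call (const std::string &mode, bool aix)
{
  return std::string ("bl ") + (aix ? "." : "") + complex_div_libfunc (mode);
}
```

With these helpers, `expected_call ("KC", false)` yields `bl __divkc3` (Linux, IEEE 128-bit), and `expected_call ("TC", true)` yields `bl .__divtc3` (AIX with 128-bit IBM long double). These are the two forms discussed above.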

Thanks, David


Re: [PATCH][Middle-end][version 3]3rd patch of PR78809

2018-07-13 Thread Qing Zhao
Thank you.

The patch was just committed to trunk as:

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=262636


Qing
> On Jul 12, 2018, at 12:03 PM, Jeff Law  wrote:
> 
>> 
>> gcc/ChangeLog:
>> 
>> +2018-07-11  Qing Zhao  
>> +
>> +PR middle-end/78809
>> +* builtins.c (expand_builtin_memcmp): Inline the calls first
>> +when result_eq is false.
>> +(expand_builtin_strcmp): Inline the calls first.
>> +(expand_builtin_strncmp): Likewise.
>> +(inline_string_cmp): New routine. Expand a string compare
>> +call by using a sequence of char comparisons.
>> +(inline_expand_builtin_string_cmp): New routine. Inline expansion
>> +of a call to str(n)cmp/memcmp.
>> +* doc/invoke.texi (--param builtin-string-cmp-inline-length): New option.
>> +* params.def (BUILTIN_STRING_CMP_INLINE_LENGTH): New.
>> +
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> +2018-07-11  Qing Zhao  
>> +
>> +PR middle-end/78809
>> +* gcc.dg/strcmpopt_5.c: New test.
>> +* gcc.dg/strcmpopt_6.c: New test.
> OK
> Thanks
> 
> Jeff
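The ChangeLog describes expanding a string compare as a sequence of character comparisons, limited by `--param builtin-string-cmp-inline-length`. The sketch below shows what such an expansion of `strcmp (s, "ab")` amounts to. It is written in plain C++ rather than the RTL GCC actually emits. `strcmp_ab_inlined` and `inline_cmp_sketch` are hypothetical names used only for illustration.

```cpp
#include <cstring>

// Hand-written illustration (not GCC output) of strcmp (s, "ab") expanded
// into byte comparisons: compare each byte of the constant, including its
// terminating NUL, and stop at the first difference.  strcmp compares bytes
// as unsigned char, so the differences are taken on unsigned char values.
int strcmp_ab_inlined (const char *s)
{
  const unsigned char *p = reinterpret_cast<const unsigned char *> (s);
  int r = p[0] - 'a';
  if (r != 0)
    return r;
  r = p[1] - 'b';
  if (r != 0)
    return r;
  return p[2] - '\0';   // zero only if s is exactly "ab"
}

// Generic form of the same idea for a constant string CST: at most
// strlen (CST) + 1 byte comparisons.  If s is shorter than CST, its NUL
// differs from the nonzero byte of CST at that position, so the loop stops
// before reading past the end of s.
int inline_cmp_sketch (const char *s, const char *cst)
{
  const unsigned char *p = reinterpret_cast<const unsigned char *> (s);
  const unsigned char *q = reinterpret_cast<const unsigned char *> (cst);
  for (std::size_t i = 0;; ++i)
    {
      int r = p[i] - q[i];
      if (r != 0 || q[i] == 0)
        return r;
    }
}
```

As with `strcmp`, only the sign of the result is meaningful. The sketch returns raw byte differences.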



C++ patch ping

2018-07-13 Thread Jakub Jelinek
Hi!

I'd like to ping the following C++ patches:

- PR c++/85515
  make range for temporaries unspellable during parsing and only
  turn them into spellable for debug info purposes
  http://gcc.gnu.org/ml/gcc-patches/2018-07/msg00086.html

- PR c++/3698, PR c++/86208
  extern_decl_map & TREE_USED fix (plus 2 untested variants)
  http://gcc.gnu.org/ml/gcc-patches/2018-07/msg00084.html

Jakub


Re: C++ PATCH for c++/86190, bogus -Wsign-conversion warning

2018-07-13 Thread Marek Polacek
Ping.

On Tue, Jul 03, 2018 at 09:35:24AM -0400, Marek Polacek wrote:
> This PR complains about bogus -Wsign-conversion warning even with an
> explicit static_cast.  It started with this hunk from the delayed folding
> merge:
> 
> @@ -5028,20 +5022,12 @@ cp_build_binary_op (location_t location,
>  
>if (short_compare)
> {
> - /* Don't write &op0, etc., because that would prevent op0
> -from being kept in a register.
> -Instead, make copies of the our local variables and
> -pass the copies by reference, then copy them back afterward.  */
> - tree xop0 = op0, xop1 = op1, xresult_type = result_type;
> + /* We call shorten_compare only for diagnostic-reason.  */
> + tree xop0 = fold_simple (op0), xop1 = fold_simple (op1),
> +  xresult_type = result_type;
>   enum tree_code xresultcode = resultcode;
> - tree val
> -   = shorten_compare (location, &xop0, &xop1, &xresult_type,
> + shorten_compare (location, &xop0, &xop1, &xresult_type,
>   &xresultcode);
> - if (val != 0)
> -   return cp_convert (boolean_type_node, val, complain);
> - op0 = xop0, op1 = xop1;
> - converted = 1;
> - resultcode = xresultcode;
> }
>  
>if ((short_compare || code == MIN_EXPR || code == MAX_EXPR)
> 
> which means that converted is now unset so we go to
> 
>  5350   if (! converted)
>  5351 {
>  5352   if (TREE_TYPE (op0) != result_type)
>  5353 op0 = cp_convert_and_check (result_type, op0, complain);
>  5354   if (TREE_TYPE (op1) != result_type)
>  5355 op1 = cp_convert_and_check (result_type, op1, complain);
> 
> and cp_convert_and_check gives those warnings.  The direct comparison
> of types instead of same_type_p means we may try to convert between
> identical types, but fixing that still wouldn't fix this PR.  What we
> should probably do is simply disable -Wsign-conversion for comparisons,
> because -Wsign-compare will warn for those.  With this patch, the C++ FE will
> follow what the C FE and clang++ do.
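The patch below does this with a scoped `warning_sentinel`, which turns `warn_sign_conversion` off only when `short_compare` is set. A standalone sketch of that RAII idiom follows. `warning_sentinel_sketch` and `warn_sign_conversion_sketch` are stand-ins written for illustration. They are modelled on the behaviour the patch relies on, not copied from GCC.

```cpp
// Standalone sketch of the RAII idiom behind warning_sentinel: while the
// object is alive, FLAG is forced to 0 if SUPPRESS is true, and the saved
// value is restored when the enclosing scope ends.
class warning_sentinel_sketch
{
public:
  explicit warning_sentinel_sketch (int &flag, bool suppress = true)
    : m_flag (flag), m_saved (flag)
  {
    if (suppress)
      m_flag = 0;
  }

  ~warning_sentinel_sketch () { m_flag = m_saved; }

  warning_sentinel_sketch (const warning_sentinel_sketch &) = delete;
  warning_sentinel_sketch &operator= (const warning_sentinel_sketch &) = delete;

private:
  int &m_flag;
  int m_saved;
};

// Stand-in for the global warn_sign_conversion flag.
int warn_sign_conversion_sketch = 1;

// Mimics the conversion step: reports whether the warning would be active
// while the operands are converted.
bool sign_conversion_active_during_convert (bool short_compare)
{
  warning_sentinel_sketch w (warn_sign_conversion_sketch, short_compare);
  return warn_sign_conversion_sketch != 0;
}
```

Because the flag is restored in the destructor, it stays correct on every exit path from the scope.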
> 
> Also fix some formatting that's been bothering me, while at it.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk/8?
> 
> 2018-07-03  Marek Polacek  
> 
>   PR c++/86190 - bogus -Wsign-conversion warning
>   * typeck.c (cp_build_binary_op): Fix formatting.  Add a warning
>   sentinel.
> 
>   * g++.dg/warn/Wsign-conversion-3.C: New test.
>   * g++.dg/warn/Wsign-conversion-4.C: New test.
> 
> diff --git gcc/cp/typeck.c gcc/cp/typeck.c
> index 3a4f1cdf479..cfd1dd8b150 100644
> --- gcc/cp/typeck.c
> +++ gcc/cp/typeck.c
> @@ -5311,12 +5311,13 @@ cp_build_binary_op (location_t location,
>  
>if (short_compare)
>   {
> -   /* We call shorten_compare only for diagnostic-reason.  */
> -   tree xop0 = fold_simple (op0), xop1 = fold_simple (op1),
> -xresult_type = result_type;
> +   /* We call shorten_compare only for diagnostics.  */
> +   tree xop0 = fold_simple (op0);
> +   tree xop1 = fold_simple (op1);
> +   tree xresult_type = result_type;
> enum tree_code xresultcode = resultcode;
> shorten_compare (location, &xop0, &xop1, &xresult_type,
> -&xresultcode);
> +&xresultcode);
>   }
>  
>if ((short_compare || code == MIN_EXPR || code == MAX_EXPR)
> @@ -5349,6 +5350,7 @@ cp_build_binary_op (location_t location,
>   otherwise, it will be given type RESULT_TYPE.  */
>if (! converted)
>  {
> +  warning_sentinel w (warn_sign_conversion, short_compare);
>if (TREE_TYPE (op0) != result_type)
>   op0 = cp_convert_and_check (result_type, op0, complain);
>if (TREE_TYPE (op1) != result_type)
> diff --git gcc/testsuite/g++.dg/warn/Wsign-conversion-3.C gcc/testsuite/g++.dg/warn/Wsign-conversion-3.C
> index e69de29bb2d..2c3fef31475 100644
> --- gcc/testsuite/g++.dg/warn/Wsign-conversion-3.C
> +++ gcc/testsuite/g++.dg/warn/Wsign-conversion-3.C
> @@ -0,0 +1,13 @@
> +// PR c++/86190
> +// { dg-options "-Wsign-conversion -Wsign-compare" }
> +
> +typedef unsigned long sz_t;
> +sz_t s();
> +bool f(int i) { return s() < (unsigned long) i; }
> +bool f2(int i) { return s() < static_cast<unsigned long>(i); }
> +bool f3(int i) { return s() < i; } // { dg-warning "comparison of integer expressions of different signedness" }
> +bool f4(int i) { return s() < (long) i; } // { dg-warning "comparison of integer expressions of different signedness" }
> +bool f5(short int i) { return s() < (int) i; } // { dg-warning "comparison of integer expressions of different signedness" }
> +bool f6(signed char i) { return s() < (int) i; } // { dg-warning "comparison of integer expressions of different signedness" }
> +bool f7(unsigned char i) { return s() < i; }
> +bool f8(signed char i) { return s() < i; } // { dg-warning "comparison of integer expressions of different signedness" }
> diff --git gcc/testsuite/g++.dg/warn/Wsign-conversion-4.C gcc/testsuite/g++.dg/warn/Wsign-conversion-4.C
> index e69de29bb2d..40814b95587 100644
> 

Re: [PATCH] x86: Tune Skylake, Cannonlake and Icelake as Haswell

2018-07-13 Thread Uros Bizjak
On Fri, Jul 13, 2018 at 3:12 PM, H.J. Lu  wrote:

> On Fri, Jul 13, 2018 at 08:53:02AM +0200, Uros Bizjak wrote:
> > On Thu, Jul 12, 2018 at 9:57 PM, H.J. Lu  wrote:
> >
> > > r259399, which added PROCESSOR_SKYLAKE, disabled many x86 optimizations
> > > which are enabled by PROCESSOR_HASWELL.  As a result, -mtune=skylake
> > > generates slower code on Skylake than before.  The same also applies
> > > to Cannonlake and Icelake tuning.
> > >
> > > This patch changes -mtune={skylake|cannonlake|icelake} to tune like
> > > -mtune=haswell until their tuning is properly adjusted. It also
> > > enables -mprefer-vector-width=256 for -mtune=haswell, which has no
> > > impact on codegen when AVX512 isn't enabled.
> > >
> > > Performance impacts on SPEC CPU 2017 rate with 1 copy using
> > >
> > > -march=native -mfpmath=sse -O2 -m64
> > >
> > > are
> > >
> > > 1. On Broadwell server:
> > >
> > > 500.perlbench_r -0.56%
> > > 502.gcc_r   -0.18%
> > > 505.mcf_r   0.24%
> > > 520.omnetpp_r   0.00%
> > > 523.xalancbmk_r -0.32%
> > > 525.x264_r  -0.17%
> > > 531.deepsjeng_r 0.00%
> > > 541.leela_r 0.00%
> > > 548.exchange2_r 0.12%
> > > 557.xz_r        0.00%
> > > geomean 0.00%
> > >
> > > 503.bwaves_r    0.00%
> > > 507.cactuBSSN_r 0.21%
> > > 508.namd_r  0.00%
> > > 510.parest_r    0.19%
> > > 511.povray_r    -0.48%
> > > 519.lbm_r   0.00%
> > > 521.wrf_r   0.28%
> > > 526.blender_r   0.19%
> > > 527.cam4_r  0.39%
> > > 538.imagick_r   0.00%
> > > 544.nab_r   -0.36%
> > > 549.fotonik3d_r 0.51%
> > > 554.roms_r  0.00%
> > > geomean 0.17%
> > >
> > > On Skylake client:
> > >
> > > 500.perlbench_r 0.96%
> > > 502.gcc_r   0.13%
> > > 505.mcf_r   -1.03%
> > > 520.omnetpp_r   -1.11%
> > > 523.xalancbmk_r 1.02%
> > > 525.x264_r  0.50%
> > > 531.deepsjeng_r 2.97%
> > > 541.leela_r 0.50%
> > > 548.exchange2_r -0.95%
> > > 557.xz_r        2.41%
> > > geomean 0.56%
> > >
> > > 503.bwaves_r    0.49%
> > > 507.cactuBSSN_r 3.17%
> > > 508.namd_r  4.05%
> > > 510.parest_r    0.15%
> > > 511.povray_r    0.80%
> > > 519.lbm_r   3.15%
> > > 521.wrf_r   10.56%
> > > 526.blender_r   2.97%
> > > 527.cam4_r  2.36%
> > > 538.imagick_r   46.40%
> > > 544.nab_r   2.04%
> > > 549.fotonik3d_r 0.00%
> > > 554.roms_r  1.27%
> > > geomean 5.49%
> > >
> > > On Skylake server:
> > >
> > > 500.perlbench_r 0.71%
> > > 502.gcc_r   -0.51%
> > > 505.mcf_r   -1.06%
> > > 520.omnetpp_r   -0.33%
> > > 523.xalancbmk_r -0.22%
> > > 525.x264_r  1.72%
> > > 531.deepsjeng_r -0.26%
> > > 541.leela_r 0.57%
> > > 548.exchange2_r -0.75%
> > > 557.xz_r        -1.28%
> > > geomean -0.21%
> > >
> > > 503.bwaves_r    0.00%
> > > 507.cactuBSSN_r 2.66%
> > > 508.namd_r  3.67%
> > > 510.parest_r    1.25%
> > > 511.povray_r    2.26%
> > > 519.lbm_r   1.69%
> > > 521.wrf_r   11.03%
> > > 526.blender_r   3.39%
> > > 527.cam4_r  1.69%
> > > 538.imagick_r   64.59%
> > > 544.nab_r   -0.54%
> > > 549.fotonik3d_r 2.68%
> > > 554.roms_r  0.00%
> > > geomean 6.19%
> > >
> > > This patch improves -march=native performance on Skylake up to 60% and
> > > leaves -march=native performance unchanged on Haswell.
> > >
> > > OK for trunk?
> > >
> > > Thanks.
> > >
> > > H.J.
> > > ---
> > > gcc/
> > >
> > > 2018-07-12  H.J. Lu  
> > > Sunil K Pandey  
> > >
> > > PR target/84413
> > > * config/i386/i386.c (m_HASWELL): Add PROCESSOR_SKYLAKE,
> > > PROCESSOR_SKYLAKE_AVX512, PROCESSOR_CANNONLAKE,
> > > PROCESSOR_ICELAKE_CLIENT and PROCESSOR_ICELAKE_SERVER.
> > > (m_SKYLAKE): Set to 0.
> > > (m_SKYLAKE_AVX512): Likewise.
> > > (m_CANNONLAKE): Likewise.
> > > (m_ICELAKE_CLIENT): Likewise.
> > > (m_ICELAKE_SERVER): Likewise.
> > > * config/i386/x86-tune.def (avx256_optimal): Also enabled for
> > > m_HASWELL.
> > >
> > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > index 9e46b7b136f..762ab89fc9e 100644
> > > --- a/gcc/config/i386/i386.c
> > > +++ b/gcc/config/i386/i386.c
> > > @@ -137,17 +137,22 @@ const struct processor_costs *ix86_cost = NULL;
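> > >  #define m_CORE2 (HOST_WIDE_INT_1U<<PROCESSOR_CORE2)
> > >  #define m_NEHALEM (HOST_WIDE_INT_1U<<PROCESSOR_NEHALEM)
> > >  #define m_SANDYBRIDGE (HOST_WIDE_INT_1U<<PROCESSOR_SANDYBRIDGE)
> > > -#define m_HASWELL (HOST_WIDE_INT_1U<<PROCESSOR_HASWELL)
> > > +#define m_HASWELL ((HOST_WIDE_INT_1U<<PROCESSOR_HASWELL) \
> > > + 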

Re: abstract wide int binop code from VRP

2018-07-13 Thread Richard Biener
On Fri, Jul 13, 2018 at 3:18 PM Richard Biener wrote:
>
> On Fri, Jul 13, 2018 at 10:05 AM Aldy Hernandez  wrote:
> >
> >
> >
> > On 07/13/2018 03:02 AM, Richard Biener wrote:
> > > On Thu, Jul 12, 2018 at 10:12 AM Aldy Hernandez  wrote:
> >
> > > So besides the general discussion about references/pointers for out 
> > > parameters
> > > let's stay consistent within related APIs.  This means wide_int_binop 
> > > should
> > > have a
> > >
> > > wide_int
> > > wide_int_binop (enum tree_code, const wide_int &, const wide_int &,
> > > signop, wi::overflow_type *)
> > >
> > > signature.  Notice how I elided the out wide_int parameter to be the
> > > return value which means
> > > the function isn't supposed to fail which means gcc_unreachable () for
> > > "unhandled" tree codes.
> >
> > wide_int_binop was returning failure for:
> >
> > > case CEIL_DIV_EXPR:
> > >   if (arg2 == 0)
> > >   return false;
> > >   res = wi::div_ceil (arg1, arg2, sign, );
> > >   break;
> > >
> > > case ROUND_DIV_EXPR:
> > >   if (arg2 == 0)
> > >   return false;
> > >   res = wi::div_round (arg1, arg2, sign, );
> > >   break;
> > etc
> >
> > How do you suggest we indicate success/failure to the caller?
>
> Oh, ok.  Exceptions?  (eh...)
>
> Well, so I guess you can leave the signature as-is apart from turning
> the overflow
> result into a pointer.

Alternatively handle it like wi::sdiv and friends which indicate overflow
and use a zero result.  Thus, remove the "failure" path here.  Of course
when used via the tree interface this probably isn't the desired result
which means this really isn't a general wide-int op with code thing
but only a helper for int_cst_binop :/

So alternatively do the interface I suggested w/o the zero checks
and do the zero checks in the int_const_binop caller.

Richard.

> OK with that change.
> Richard.
>
> > Aldy
> >
> > > It's more like an exceptional state anyway.
> > >
> > > The same goes for the poly_int_binop signature.
> > >
> > > The already existing wi::accumulate_overflow should probably be re-done as
> > >
> > > wi::overflow_type wi::accumulate_overflow (wi::overflow_type,
> > > wi::overflow_type);
> > >
> > > Richard.
> > >
> > >> Thanks for the review!
> > >> Aldy
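Here is a rough sketch of the API shape discussed above. It makes heavy simplifying assumptions: `std::int64_t` stands in for `wide_int`, a two-entry enum for `tree_code`, and a `bool` for `wi::overflow_type`. `binop_sketch` returns its result and reports overflow through a pointer, with no failure path. The hypothetical caller `const_binop_sketch` performs the division-by-zero check, which is the last alternative suggested in this message. `__builtin_add_overflow` is a GCC/Clang builtin.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical stand-ins for tree_code: just two operations.
enum class op_code { plus, trunc_div };

// Shape suggested in the review: the result is the return value, overflow
// goes through a pointer, and there is no failure path.  Precondition: for
// trunc_div, B is nonzero (the caller checks).
std::int64_t binop_sketch (op_code code, std::int64_t a, std::int64_t b,
                           bool *overflow)
{
  *overflow = false;
  switch (code)
    {
    case op_code::plus:
      {
        std::int64_t r;
        // Stores the wrapped sum and returns true on overflow.
        *overflow = __builtin_add_overflow (a, b, &r);
        return r;
      }
    case op_code::trunc_div:
      if (a == std::numeric_limits<std::int64_t>::min () && b == -1)
        {
          *overflow = true;   // the quotient is not representable
          return a;           // wrapped result
        }
      return a / b;
    }
  return 0;  // not reached; GCC would use gcc_unreachable () here
}

struct folded_sketch
{
  std::int64_t value;
  bool overflowed;
};

// Caller side (int_const_binop in the thread): the zero check lives here,
// because this caller decides that "cannot fold" is the right answer.
std::optional<folded_sketch> const_binop_sketch (op_code code, std::int64_t a,
                                                 std::int64_t b)
{
  if (code == op_code::trunc_div && b == 0)
    return std::nullopt;
  bool overflow;
  std::int64_t v = binop_sketch (code, a, b, &overflow);
  return folded_sketch { v, overflow };
}
```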


Re: abstract wide int binop code from VRP

2018-07-13 Thread Richard Biener
On Fri, Jul 13, 2018 at 10:05 AM Aldy Hernandez  wrote:
>
>
>
> On 07/13/2018 03:02 AM, Richard Biener wrote:
> > On Thu, Jul 12, 2018 at 10:12 AM Aldy Hernandez  wrote:
>
> > So besides the general discussion about references/pointers for out 
> > parameters
> > let's stay consistent within related APIs.  This means wide_int_binop should
> > have a
> >
> > wide_int
> > wide_int_binop (enum tree_code, const wide_int &, const wide_int &,
> > signop, wi::overflow_type *)
> >
> > signature.  Notice how I elided the out wide_int parameter to be the
> > return value which means
> > the function isn't supposed to fail which means gcc_unreachable () for
> > "unhandled" tree codes.
>
> wide_int_binop was returning failure for:
>
> > case CEIL_DIV_EXPR:
> >   if (arg2 == 0)
> >   return false;
> >   res = wi::div_ceil (arg1, arg2, sign, );
> >   break;
> >
> > case ROUND_DIV_EXPR:
> >   if (arg2 == 0)
> >   return false;
> >   res = wi::div_round (arg1, arg2, sign, );
> >   break;
> etc
>
> How do you suggest we indicate success/failure to the caller?

Oh, ok.  Exceptions?  (eh...)

Well, so I guess you can leave the signature as-is apart from turning
the overflow
result into a pointer.

OK with that change.
Richard.

> Aldy
>
> > It's more like an exceptional state anyway.
> >
> > The same goes for the poly_int_binop signature.
> >
> > The already existing wi::accumulate_overflow should probably be re-done as
> >
> > wi::overflow_type wi::accumulate_overflow (wi::overflow_type,
> > wi::overflow_type);
> >
> > Richard.
> >
> >> Thanks for the review!
> >> Aldy
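The value-returning form proposed for `wi::accumulate_overflow` can be sketched as follows. The enumerator values mirror what `wi::overflow_type` is assumed to contain (none, underflow, overflow, unknown). The function name and the exact combining rule are illustrative assumptions, not GCC's code.

```cpp
// Assumed mirror of wi::overflow_type's enumerators.
enum overflow_kind
{
  OVF_NONE = 0,
  OVF_UNDERFLOW = -1,
  OVF_OVERFLOW = 1,
  OVF_UNKNOWN = 2
};

// Value-returning accumulation: combine the state so far (ACC) with the
// state of one more sub-operation (SUB).
overflow_kind accumulate_overflow_sketch (overflow_kind acc, overflow_kind sub)
{
  if (sub == OVF_NONE)
    return acc;                 // nothing new to record
  if (acc == OVF_NONE || acc == sub)
    return sub;                 // first overflow, or same direction again
  return OVF_UNKNOWN;           // conflicting directions
}
```

A caller then writes `ovf = accumulate_overflow_sketch (ovf, sub_ovf);` instead of passing `ovf` by reference.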


Re: [PATCH] x86: Tune Skylake, Cannonlake and Icelake as Haswell

2018-07-13 Thread H.J. Lu
On Fri, Jul 13, 2018 at 08:53:02AM +0200, Uros Bizjak wrote:
> On Thu, Jul 12, 2018 at 9:57 PM, H.J. Lu  wrote:
> 
> > r259399, which added PROCESSOR_SKYLAKE, disabled many x86 optimizations
> > which are enabled by PROCESSOR_HASWELL.  As a result, -mtune=skylake
> > generates slower code on Skylake than before.  The same also applies
> > to Cannonlake and Icelake tuning.
> >
> > This patch changes -mtune={skylake|cannonlake|icelake} to tune like
> > -mtune=haswell until their tuning is properly adjusted. It also
> > enables -mprefer-vector-width=256 for -mtune=haswell, which has no
> > impact on codegen when AVX512 isn't enabled.
> >
> > Performance impacts on SPEC CPU 2017 rate with 1 copy using
> >
> > -march=native -mfpmath=sse -O2 -m64
> >
> > are
> >
> > 1. On Broadwell server:
> >
> > 500.perlbench_r -0.56%
> > 502.gcc_r   -0.18%
> > 505.mcf_r   0.24%
> > 520.omnetpp_r   0.00%
> > 523.xalancbmk_r -0.32%
> > 525.x264_r  -0.17%
> > 531.deepsjeng_r 0.00%
> > 541.leela_r 0.00%
> > 548.exchange2_r 0.12%
> > 557.xz_r        0.00%
> > geomean 0.00%
> >
> > 503.bwaves_r    0.00%
> > 507.cactuBSSN_r 0.21%
> > 508.namd_r  0.00%
> > 510.parest_r    0.19%
> > 511.povray_r    -0.48%
> > 519.lbm_r   0.00%
> > 521.wrf_r   0.28%
> > 526.blender_r   0.19%
> > 527.cam4_r  0.39%
> > 538.imagick_r   0.00%
> > 544.nab_r   -0.36%
> > 549.fotonik3d_r 0.51%
> > 554.roms_r  0.00%
> > geomean 0.17%
> >
> > On Skylake client:
> >
> > 500.perlbench_r 0.96%
> > 502.gcc_r   0.13%
> > 505.mcf_r   -1.03%
> > 520.omnetpp_r   -1.11%
> > 523.xalancbmk_r 1.02%
> > 525.x264_r  0.50%
> > 531.deepsjeng_r 2.97%
> > 541.leela_r 0.50%
> > 548.exchange2_r -0.95%
> > 557.xz_r        2.41%
> > geomean 0.56%
> >
> > 503.bwaves_r    0.49%
> > 507.cactuBSSN_r 3.17%
> > 508.namd_r  4.05%
> > 510.parest_r    0.15%
> > 511.povray_r    0.80%
> > 519.lbm_r   3.15%
> > 521.wrf_r   10.56%
> > 526.blender_r   2.97%
> > 527.cam4_r  2.36%
> > 538.imagick_r   46.40%
> > 544.nab_r   2.04%
> > 549.fotonik3d_r 0.00%
> > 554.roms_r  1.27%
> > geomean 5.49%
> >
> > On Skylake server:
> >
> > 500.perlbench_r 0.71%
> > 502.gcc_r   -0.51%
> > 505.mcf_r   -1.06%
> > 520.omnetpp_r   -0.33%
> > 523.xalancbmk_r -0.22%
> > 525.x264_r  1.72%
> > 531.deepsjeng_r -0.26%
> > 541.leela_r 0.57%
> > 548.exchange2_r -0.75%
> > 557.xz_r        -1.28%
> > geomean -0.21%
> >
> > 503.bwaves_r    0.00%
> > 507.cactuBSSN_r 2.66%
> > 508.namd_r  3.67%
> > 510.parest_r    1.25%
> > 511.povray_r    2.26%
> > 519.lbm_r   1.69%
> > 521.wrf_r   11.03%
> > 526.blender_r   3.39%
> > 527.cam4_r  1.69%
> > 538.imagick_r   64.59%
> > 544.nab_r   -0.54%
> > 549.fotonik3d_r 2.68%
> > 554.roms_r  0.00%
> > geomean 6.19%
> >
> > This patch improves -march=native performance on Skylake up to 60% and
> > leaves -march=native performance unchanged on Haswell.
> >
> > OK for trunk?
> >
> > Thanks.
> >
> > H.J.
> > ---
> > gcc/
> >
> > 2018-07-12  H.J. Lu  
> > Sunil K Pandey  
> >
> > PR target/84413
> > * config/i386/i386.c (m_HASWELL): Add PROCESSOR_SKYLAKE,
> > PROCESSOR_SKYLAKE_AVX512, PROCESSOR_CANNONLAKE,
> > PROCESSOR_ICELAKE_CLIENT and PROCESSOR_ICELAKE_SERVER.
> > (m_SKYLAKE): Set to 0.
> > (m_SKYLAKE_AVX512): Likewise.
> > (m_CANNONLAKE): Likewise.
> > (m_ICELAKE_CLIENT): Likewise.
> > (m_ICELAKE_SERVER): Likewise.
> > * config/i386/x86-tune.def (avx256_optimal): Also enabled for
> > m_HASWELL.
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 9e46b7b136f..762ab89fc9e 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -137,17 +137,22 @@ const struct processor_costs *ix86_cost = NULL;
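> >  #define m_CORE2 (HOST_WIDE_INT_1U<<PROCESSOR_CORE2)
> >  #define m_NEHALEM (HOST_WIDE_INT_1U<<PROCESSOR_NEHALEM)
> >  #define m_SANDYBRIDGE (HOST_WIDE_INT_1U<<PROCESSOR_SANDYBRIDGE)
> > -#define m_HASWELL (HOST_WIDE_INT_1U<<PROCESSOR_HASWELL)
> > +#define m_HASWELL ((HOST_WIDE_INT_1U<<PROCESSOR_HASWELL) \
> > +  | (HOST_WIDE_INT_1U<<PROCESSOR_SKYLAKE) \
> > +  | (HOST_WIDE_INT_1U<<PROCESSOR_SKYLAKE_AVX512) \
> > +  | (HOST_WIDE_INT_1U<<PROCESSOR_CANNONLAKE) \
> > +  | (HOST_WIDE_INT_1U<<PROCESSOR_ICELAKE_CLIENT) \
> > +  | (HOST_WIDE_INT_1U<<PROCESSOR_ICELAKE_SERVER))
> >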
> 
> Please introduce a new per-family define and group processors in this
> define. Something like m_BDVER, m_BTVER and 
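The review asks for per-family masks instead of folding the newer cores into `m_HASWELL` itself. Below is a hypothetical sketch of that layout. It uses a trimmed-down processor list and invented names (`proc_sketch`, `m_core_avx2_family`) rather than the real definitions in `i386.c`.

```cpp
#include <cstdint>

// Hypothetical, trimmed-down processor list; GCC's real enum is longer and
// uses different names.
enum proc_sketch
{
  P_HASWELL, P_SKYLAKE, P_SKYLAKE_AVX512, P_CANNONLAKE,
  P_ICELAKE_CLIENT, P_ICELAKE_SERVER, P_ZNVER1
};

constexpr std::uint64_t proc_bit (proc_sketch p)
{
  return std::uint64_t (1) << p;
}

// One mask per processor (or closely related pair) ...
constexpr std::uint64_t m_haswell = proc_bit (P_HASWELL);
constexpr std::uint64_t m_skylake = proc_bit (P_SKYLAKE);
constexpr std::uint64_t m_skylake_avx512 = proc_bit (P_SKYLAKE_AVX512);
constexpr std::uint64_t m_cannonlake = proc_bit (P_CANNONLAKE);
constexpr std::uint64_t m_icelake
  = proc_bit (P_ICELAKE_CLIENT) | proc_bit (P_ICELAKE_SERVER);

// ... and one per-family mask, so a tuning entry can name the whole family
// while each member keeps its own bit for later, finer-grained tuning.
constexpr std::uint64_t m_core_avx2_family
  = m_haswell | m_skylake | m_skylake_avx512 | m_cannonlake | m_icelake;

// A tuning flag applies to -mtune=P when P's bit is set in its mask.
constexpr bool tune_applies (std::uint64_t flag_mask, proc_sketch tune)
{
  return (flag_mask & proc_bit (tune)) != 0;
}
```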

Re: [PATCH] [v2][aarch64] Avoid tag collisions for loads falkor

2018-07-13 Thread Kyrill Tkachov

Hi Siddhesh,

On 13/07/18 12:26, Siddhesh Poyarekar wrote:

Hi,

This is a rewrite of the tag collision avoidance patch that Kugan had
written as a machine reorg pass back in February.

The falkor hardware prefetching system uses a combination of the
source, destination and offset to decide which prefetcher unit to
train with the load.  This is great when loads in a loop are
sequential but sub-optimal if there are unrelated loads in a loop that
tag to the same prefetcher unit.

This pass attempts to rename the destination register of such colliding
loads using routines available in regrename.c so that their tags do
not collide.  This shows some performance gains with mcf and xalancbmk
(~5% each) and will be tweaked further.  The pass is placed near the
end of the pass list so that subsequent passes don't inadvertently
end up undoing the renames.
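The renaming idea can be sketched as a greedy pass over the loads of one loop. Everything here is an assumption for illustration. `load_sketch`, `tag_of` and `avoid_collisions` are invented names. `tag_of` is not the real Falkor hash; the description above only says the hardware combines the destination, base register and offset. The sketch also skips the liveness analysis and use rewriting that the real pass gets from `regrename.c`.

```cpp
#include <set>
#include <vector>

// Hypothetical load record: destination register, base register, offset.
struct load_sketch
{
  unsigned dest;
  unsigned base;
  long offset;
};

// Invented tag function: NOT the real Falkor hashing.  It only combines the
// three inputs the description names.
unsigned long tag_of (const load_sketch &l)
{
  return (l.dest & 0xfUL)
         | ((l.base & 0xfUL) << 4)
         | ((static_cast<unsigned long> (l.offset) & 0x3fUL) << 8);
}

// Greedy renaming: the first load with a given tag keeps it; a later load
// whose tag collides gets the first unused register from FREE_REGS that
// gives it a fresh tag.  Register 0 is never chosen (it may carry a return
// value), and each free register is handed out at most once.
int avoid_collisions (std::vector<load_sketch> &loads,
                      const std::vector<unsigned> &free_regs)
{
  std::set<unsigned long> tags;
  std::set<unsigned> taken;
  int renamed = 0;
  for (load_sketch &l : loads)
    {
      if (tags.insert (tag_of (l)).second)
        continue;                       // no collision
      for (unsigned r : free_regs)
        {
          if (r == 0 || taken.count (r))
            continue;
          load_sketch trial = l;
          trial.dest = r;
          if (tags.count (tag_of (trial)) == 0)
            {
              l = trial;
              tags.insert (tag_of (l));
              taken.insert (r);
              ++renamed;
              break;
            }
        }
    }
  return renamed;
}
```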

A full gcc bootstrap and testsuite ran successfully on aarch64, i.e. it
did not introduce any new regressions.  I also did a make-check with
-mcpu=falkor to ensure that there were no regressions.  The couple of
regressions I found were target-specific and were related to scheduling
and cost differences and are not correctness issues.

Changes from v1:

- Fixed up issues pointed out by Kyrill
- Avoid renaming R0/V0 since they could be return values
- Fixed minor formatting issues.

2018-07-02  Siddhesh Poyarekar 
Kugan Vivekanandarajah 

* config/aarch64/falkor-tag-collision-avoidance.c: New file.
* config.gcc (extra_objs): Build it.
* config/aarch64/t-aarch64 (falkor-tag-collision-avoidance.o):
Likewise.
* config/aarch64/aarch64-passes.def
(pass_tag_collision_avoidance): New pass.
* config/aarch64/aarch64.c (qdf24xx_tunings): Add
AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS to tuning_flags.
(aarch64_classify_address): Remove static qualifier.
(aarch64_address_info, aarch64_address_type): Move to...
* config/aarch64/aarch64-protos.h: ... here.
(make_pass_tag_collision_avoidance): New function.
* config/aarch64/aarch64-tuning-flags.def (rename_load_regs):
New tuning flag.



This looks good to me modulo a couple of minor comments inline.
You'll still need an approval from a maintainer.

Thanks,
Kyrill


CC: james.greenha...@arm.com
CC: kyrylo.tkac...@foss.arm.com
---
 gcc/config.gcc|   2 +-
 gcc/config/aarch64/aarch64-passes.def |   1 +
 gcc/config/aarch64/aarch64-protos.h   |  49 +
 gcc/config/aarch64/aarch64-tuning-flags.def   |   2 +
 gcc/config/aarch64/aarch64.c  |  48 +-
 .../aarch64/falkor-tag-collision-avoidance.c  | 856 ++
 gcc/config/aarch64/t-aarch64  |   9 +
 7 files changed, 921 insertions(+), 46 deletions(-)





+/* Find the use def chain in which INSN exists and then see if there is a
+   definition inside the loop and outside it.  We use this as a simple
+   approximation to determine whether the base register is an IV.  The basic
+   idea is to find INSN in the use-def chains for its base register and find
+   all definitions that reach it.  Of all these definitions, there should be at
+   least one definition that is a simple addition of a constant value, either
+   as a binary operation or a pre or post update.
+
+   The function returns true if the base register is estimated to be an IV.  */
+static bool
+iv_p (rtx_insn *insn, rtx reg, struct loop *loop)
+{
+  df_ref ause;
+  unsigned regno = REGNO (reg);
+
+  /* Ignore loads from the stack.  */
+  if (regno == SP_REGNUM)
+return false;
+
+  for (ause= DF_REG_USE_CHAIN (regno); ause; ause = DF_REF_NEXT_REG (ause))
+{


Space after ause


+  if (!DF_REF_INSN_INFO (ause)
+ || !NONDEBUG_INSN_P (DF_REF_INSN (ause)))
+   continue;
+
+  if (insn != DF_REF_INSN (ause))
+   continue;
+
+  struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
+  df_ref def_rec;
+
+  FOR_EACH_INSN_INFO_DEF (def_rec, insn_info)
+   {
+ rtx_insn *insn = DF_REF_INSN (def_rec);
+ basic_block bb = BLOCK_FOR_INSN (insn);
+
+ if (dominated_by_p (CDI_DOMINATORS, bb, loop->header)
+ && bb->loop_father == loop)
+   {
+ if (recog_memoized (insn) < 0)
+   continue;
+
+ rtx pat = PATTERN (insn);
+
+ /* Prefetch or clobber; unlikely to be a constant stride.  The
+falkor software prefetcher tuning is pretty conservative, so
+its presence indicates that the access pattern is probably
+strided but most likely with an unknown stride size or a
+stride size that is quite large.  */
+ if (GET_CODE (pat) != SET)
+   continue;
+
+ rtx x = SET_SRC (pat);
+ if (GET_CODE (x) == ZERO_EXTRACT
+ || GET_CODE (x) == ZERO_EXTEND
+ || GET_CODE (x) == 

Re: [PATCH] Properly unshare TYPE_SIZE_UNIT/DECL_FIELD_OFFSET (PR86216)

2018-07-13 Thread Richard Biener
On Fri, 13 Jul 2018, Jakub Jelinek wrote:

> On Fri, Jul 13, 2018 at 01:49:38PM +0200, Richard Biener wrote:
> > The testcase in the PR, while previously ICEing because the C++ FE doesn't
> > properly capture VLA size fields, now ICEs because gimplification
> > introduces SSA uses that appear in a different function than its
> > definition.  This happens because there is tree sharing between
> > the functions.  For nested functions (which the C++ lambdas are not)
> > such tree sharing ended up being harmless before GCC7 because unnesting
> > resolves all locals with wrong origin to the static chain (and 
> > gimplification ordering made sure definitions always appear in the
> > outer function).
> > 
> > The following resolves this by unsharing size expressions in c-common.c
> > 
> > I realize that this may end up pessimizing early code since 1:1
> > tree-sharing with what is gimplified from a DECL_EXPR doesn't
> > share the gimplification result.
> 
> I think the unsharing is just wrong, we never want to unshare those,
> the SAVE_EXPR expression needs to be evaluated at the DECL_EXPR point and
> never anywhere else, never twice (it could have side-effects etc.).
> The C++ FE must be fixed to handle the lambda cases.

Well, the SAVE_EXPR isn't the issue - it is the expression wrapping it,
in this case

((sizetype) (SAVE_EXPR <(ssizetype) arg + -1>) + 1) * 4)

which is in TYPE_SIZE_UNIT.
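The toy model below shows why the identity of the SAVE_EXPR node matters here; it is not GCC's tree representation. The wrapping arithmetic in `TYPE_SIZE_UNIT` can be recomputed as often as needed, as long as every copy still reaches the same save node. A second, independent save node, however, evaluates its operand again, which is the "never twice" concern quoted above. `save_node` and `size_unit` are hypothetical names.

```cpp
#include <functional>

// Toy model, not GCC's tree representation: a node that evaluates its
// operand at most once and then reuses the value, as SAVE_EXPR does.
struct save_node
{
  std::function<long ()> operand;
  bool evaluated = false;
  long value = 0;

  long get ()
  {
    if (!evaluated)
      {
        value = operand ();
        evaluated = true;
      }
    return value;
  }
};

// Models ((sizetype) SAVE_EXPR <arg + -1> + 1) * 4: the outer arithmetic
// can be recomputed freely, but only through the one save node.
long size_unit (save_node &save)
{
  return (save.get () + 1) * 4;
}
```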

The unsharing we do makes the sizepos get a decl but all other
references to the TYPE_SIZE_UNIT expression get gimplified again.
And folding may cause the tree sharing to happen not at the
TYPE_SIZE_UNIT point but at a sub-expression of that.

> > Another option might be to force gimplification to not generate
> > SSA temporaries when gimplifying size positions but gimplify_one_sizepos
> > oddly enough unshares trees before gimplifying ...(!?)  This would
> > need to be removed (see patch after the tested patch below).
> 
> I like that gimplify_one_sizepos change much more (I guess we need a
> non-lambda testcase).

It breaks Ada bootstrap.  I guess Ada has variable-size types in
non-function scope (not sure how TYPE_SIZES_GIMPLIFIED works then
though).  That said, r92495 moved the unshare_expr from layout_type
to gimplify_one_sizepos.

Without removing the unshare_expr the patch is a no-op because
gimplifying SAVE_EXPRs already works correctly.

> Lambdas were broken even before GCC7, while we might not ICE, we certainly
> didn't generate correct code.

Well, we did ICE but only during RTL expansion (no GIMPLE verifier
checks that automatic vars in a function are in the proper function).

But yes, we probably need a non-C++, nested function testcase.

Richard.


[PATCH] S/390: libstdc++: 64 and 32 bit baseline update

2018-07-13 Thread Andreas Krebbel
Obviously I missed doing a refresh for some time already.
Do the updates look reasonable?

Andreas

libstdc++-v3/ChangeLog:

2018-07-13  Andreas Krebbel  

* config/abi/post/s390-linux-gnu/baseline_symbols.txt: Update.
* config/abi/post/s390x-linux-gnu/32/baseline_symbols.txt: Update.
* config/abi/post/s390x-linux-gnu/baseline_symbols.txt: Update.
---
 .../config/abi/post/s390-linux-gnu/baseline_symbols.txt| 14 ++
 .../abi/post/s390x-linux-gnu/32/baseline_symbols.txt   | 14 ++
 .../config/abi/post/s390x-linux-gnu/baseline_symbols.txt   | 14 ++
 3 files changed, 42 insertions(+)
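Each added baseline line has the form `KIND:symbol@@VERSION`, where `@@` marks the default version of a symbol and `@` a non-default one. The small sketch below may help when reviewing such updates. It splits a `FUNC` line and demangles the symbol with `abi::__cxa_demangle` from `<cxxabi.h>`. `parse_baseline` and `demangle` are hypothetical helpers. `OBJECT` lines, which carry an extra size field, are not handled.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// One line of baseline_symbols.txt, e.g.
//   FUNC:_ZNSt11logic_errorC1EOS_@@GLIBCXX_3.4.26
struct baseline_entry
{
  std::string kind;      // FUNC, OBJECT, ...
  std::string symbol;    // mangled name
  std::string version;   // e.g. GLIBCXX_3.4.26
  bool default_version;  // "@@" (default) rather than "@"
};

// Sketch for FUNC lines only.  Assumes the line contains ':' and '@'.
baseline_entry parse_baseline (const std::string &line)
{
  baseline_entry e;
  std::string::size_type colon = line.find (':');
  std::string::size_type at = line.find ('@', colon + 1);
  e.kind = line.substr (0, colon);
  e.symbol = line.substr (colon + 1, at - colon - 1);
  e.default_version = line.compare (at, 2, "@@") == 0;
  e.version = line.substr (at + (e.default_version ? 2 : 1));
  return e;
}

// Demangles with the C++ ABI runtime; returns the input unchanged on error.
std::string demangle (const std::string &mangled)
{
  int status = 0;
  char *d = abi::__cxa_demangle (mangled.c_str (), nullptr, nullptr, &status);
  std::string out = (status == 0 && d != nullptr) ? std::string (d) : mangled;
  std::free (d);
  return out;
}
```

Demangling the new `GLIBCXX_3.4.26` entries this way shows, for instance, that `_ZNSt11logic_errorC1EOS_` is `std::logic_error::logic_error(std::logic_error&&)`, one of the newly exported move operations.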

diff --git a/libstdc++-v3/config/abi/post/s390-linux-gnu/baseline_symbols.txt b/libstdc++-v3/config/abi/post/s390-linux-gnu/baseline_symbols.txt
index 8deb2b2..3f5dee6 100644
--- a/libstdc++-v3/config/abi/post/s390-linux-gnu/baseline_symbols.txt
+++ b/libstdc++-v3/config/abi/post/s390-linux-gnu/baseline_symbols.txt
@@ -444,6 +444,7 @@ FUNC:_ZNKSt13basic_fstreamIwSt11char_traitsIwEE7is_openEv@GLIBCXX_3.4
 FUNC:_ZNKSt13basic_istreamIwSt11char_traitsIwEE6gcountEv@@GLIBCXX_3.4
 FUNC:_ZNKSt13basic_istreamIwSt11char_traitsIwEE6sentrycvbEv@@GLIBCXX_3.4
 FUNC:_ZNKSt13basic_ostreamIwSt11char_traitsIwEE6sentrycvbEv@@GLIBCXX_3.4
+FUNC:_ZNKSt13random_device13_M_getentropyEv@@GLIBCXX_3.4.25
 FUNC:_ZNKSt13runtime_error4whatEv@@GLIBCXX_3.4
 FUNC:_ZNKSt14basic_ifstreamIcSt11char_traitsIcEE5rdbufEv@@GLIBCXX_3.4
 FUNC:_ZNKSt14basic_ifstreamIcSt11char_traitsIcEE7is_openEv@@GLIBCXX_3.4.5
@@ -1859,10 +1860,12 @@ FUNC:_ZNSt11char_traitsIcE2eqERKcS2_@@GLIBCXX_3.4.5
 FUNC:_ZNSt11char_traitsIcE2eqERKcS2_@GLIBCXX_3.4
 FUNC:_ZNSt11char_traitsIwE2eqERKwS2_@@GLIBCXX_3.4.5
 FUNC:_ZNSt11char_traitsIwE2eqERKwS2_@GLIBCXX_3.4
+FUNC:_ZNSt11logic_errorC1EOS_@@GLIBCXX_3.4.26
 FUNC:_ZNSt11logic_errorC1EPKc@@GLIBCXX_3.4.21
 FUNC:_ZNSt11logic_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE@@GLIBCXX_3.4.21
 FUNC:_ZNSt11logic_errorC1ERKS_@@GLIBCXX_3.4.21
 FUNC:_ZNSt11logic_errorC1ERKSs@@GLIBCXX_3.4
+FUNC:_ZNSt11logic_errorC2EOS_@@GLIBCXX_3.4.26
 FUNC:_ZNSt11logic_errorC2EPKc@@GLIBCXX_3.4.21
 FUNC:_ZNSt11logic_errorC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE@@GLIBCXX_3.4.21
 FUNC:_ZNSt11logic_errorC2ERKS_@@GLIBCXX_3.4.21
@@ -1870,6 +1873,7 @@ FUNC:_ZNSt11logic_errorC2ERKSs@@GLIBCXX_3.4
 FUNC:_ZNSt11logic_errorD0Ev@@GLIBCXX_3.4
 FUNC:_ZNSt11logic_errorD1Ev@@GLIBCXX_3.4
 FUNC:_ZNSt11logic_errorD2Ev@@GLIBCXX_3.4
+FUNC:_ZNSt11logic_erroraSEOS_@@GLIBCXX_3.4.26
 FUNC:_ZNSt11logic_erroraSERKS_@@GLIBCXX_3.4.21
 FUNC:_ZNSt11range_errorC1EPKc@@GLIBCXX_3.4.21
 FUNC:_ZNSt11range_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE@@GLIBCXX_3.4.21
@@ -2230,10 +2234,12 @@ FUNC:_ZNSt13random_device7_M_finiEv@@GLIBCXX_3.4.18
 FUNC:_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE@@GLIBCXX_3.4.21
 FUNC:_ZNSt13random_device7_M_initERKSs@@GLIBCXX_3.4.18
 FUNC:_ZNSt13random_device9_M_getvalEv@@GLIBCXX_3.4.18
+FUNC:_ZNSt13runtime_errorC1EOS_@@GLIBCXX_3.4.26
 FUNC:_ZNSt13runtime_errorC1EPKc@@GLIBCXX_3.4.21
 FUNC:_ZNSt13runtime_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE@@GLIBCXX_3.4.21
 FUNC:_ZNSt13runtime_errorC1ERKS_@@GLIBCXX_3.4.21
 FUNC:_ZNSt13runtime_errorC1ERKSs@@GLIBCXX_3.4
+FUNC:_ZNSt13runtime_errorC2EOS_@@GLIBCXX_3.4.26
 FUNC:_ZNSt13runtime_errorC2EPKc@@GLIBCXX_3.4.21
 FUNC:_ZNSt13runtime_errorC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE@@GLIBCXX_3.4.21
 FUNC:_ZNSt13runtime_errorC2ERKS_@@GLIBCXX_3.4.21
@@ -2241,6 +2247,7 @@ FUNC:_ZNSt13runtime_errorC2ERKSs@@GLIBCXX_3.4
 FUNC:_ZNSt13runtime_errorD0Ev@@GLIBCXX_3.4
 FUNC:_ZNSt13runtime_errorD1Ev@@GLIBCXX_3.4
 FUNC:_ZNSt13runtime_errorD2Ev@@GLIBCXX_3.4
+FUNC:_ZNSt13runtime_erroraSEOS_@@GLIBCXX_3.4.26
 FUNC:_ZNSt13runtime_erroraSERKS_@@GLIBCXX_3.4.21
 FUNC:_ZNSt14basic_ifstreamIcSt11char_traitsIcEE4openEPKcSt13_Ios_Openmode@@GLIBCXX_3.4
 FUNC:_ZNSt14basic_ifstreamIcSt11char_traitsIcEE4openERKNSt7__cxx1112basic_stringIcS1_SaIcEEESt13_Ios_Openmode@@GLIBCXX_3.4.21
@@ -3017,6 +3024,7 @@ FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6assignERKS4_@@GLIBCXX
 FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6assignERKS4_mm@@GLIBCXX_3.4.21
 FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6assignESt16initializer_listIcE@@GLIBCXX_3.4.21
 FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6assignEmc@@GLIBCXX_3.4.21
+FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6insertEN9__gnu_cxx17__normal_iteratorIPKcS4_EESt16initializer_listIcE@@GLIBCXX_3.4.26
 FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6insertEN9__gnu_cxx17__normal_iteratorIPKcS4_EEc@@GLIBCXX_3.4.21
 FUNC:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE6insertEN9__gnu_cxx17__normal_iteratorIPKcS4_EEmc@@GLIBCXX_3.4.21
 

Re: [PATCH] Properly unshare TYPE_SIZE_UNIT/DECL_FIELD_OFFSET (PR86216)

2018-07-13 Thread Jakub Jelinek
On Fri, Jul 13, 2018 at 01:49:38PM +0200, Richard Biener wrote:
> The testcase in the PR, while previously ICEing because the C++ FE doesn't
> properly capture VLA size fields, now ICEs because gimplification
> introduces SSA uses that appear in a different function than its
> definition.  This happens because there is tree sharing between
> the functions.  For nested functions (which the C++ lambdas are not)
> such tree sharing ended up being harmless before GCC7 because unnesting
> resolves all locals with wrong origin to the static chain (and 
> gimplification ordering made sure definitions always appear in the
> outer function).
> 
> The following resolves this by unsharing size expressions in c-common.c
> 
> I realize that this may end up pessimizing early code since 1:1
> tree-sharing with what is gimplified from a DECL_EXPR doesn't
> share the gimplification result.

I think the unsharing is just wrong; we never want to unshare those.
The SAVE_EXPR needs to be evaluated at the DECL_EXPR point and nowhere
else, and never twice (it could have side effects, etc.).
The C++ FE must be fixed to handle the lambda cases.
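In plain C terms, the evaluate-once rule for a variable length array bound can be sketched as follows. The helper names are hypothetical, chosen only for illustration: the bound expression runs once when the declaration is reached, and every later sizeof reuses the saved size rather than evaluating the bound again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter: how many times the VLA bound is evaluated.  */
static int bound_evals;

static int next_bound (void)
{
  return ++bound_evals + 2;   /* side effect on every evaluation */
}

/* Declare a VLA whose bound has a side effect, then take its size twice.
   Returns the first size; *second and *evals receive the second size and
   the number of bound evaluations.  */
size_t vla_size_twice (size_t *second, int *evals)
{
  assert (second != NULL && evals != NULL);
  bound_evals = 0;
  char buf[next_bound ()];    /* bound evaluated here, exactly once */
  size_t first = sizeof buf;  /* uses the size saved at the declaration */
  *second = sizeof buf;       /* next_bound () is not called again */
  *evals = bound_evals;
  return first;
}
```

Both sizes come out as 3 and the bound is evaluated once; evaluating it a second time, as an unshared copy of the size expression could, would both change the size and repeat the side effect.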

> Another option might be to force gimplification to not generate
> SSA temporaries when gimplifying size positions but gimplify_one_sizepos
> oddly enough unshares trees before gimplifying ...(!?)  This would
> need to be removed (see patch after the tested patch below).

I like that gimplify_one_sizepos change much more (I guess we need a
non-lambda testcase).
Lambdas were broken even before GCC7: while we might not have ICEd, we
certainly didn't generate correct code.

Jakub


arm - Add vendor and CPU id information to arm-cpus.in

2018-07-13 Thread Richard Earnshaw (lists)
This patch moves the vendor and CPU id data from driver-arm.c to the
main table of CPU data in arm-cpus.in.  It then adds rules to
parsecpu.awk to build data tables that can be used by the driver for
automatic CPU detection when running natively.  This is the last major
bit of CPU-specific data that can be usefully moved to the CPU data
tables (I don't think it is sensible to move the per-cpu tuning data
from its current location).

The syntax and parser can support revision ranges, but at present
nothing is done with that data: no supported cpu currently needs that
capability.
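A minimal C sketch of how a driver could consume the vendor/part data once it is in table form. The struct and function names are hypothetical (the generated arm-native.h layout is not shown in this excerpt); the entries reuse vendor/part values from the patch, and the revision check models the minrev <= detected <= maxrev rule with its defaults, even though the patch notes that no part uses revisions yet.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry mirroring the vendor/part/minrev/maxrev fields that
   arm-cpus.in now carries; NOT the layout parsecpu.awk actually emits.  */
struct native_cpu
{
  const char *name;
  unsigned long vendor;   /* "vendor 41" -> 0x41 */
  unsigned long part;     /* "part c09"  -> 0xc09 */
  unsigned long minrev;   /* defaults to 0 when omitted */
  unsigned long maxrev;   /* defaults to "infinity" (ULONG_MAX) */
};

/* A few entries using vendor/part values from the patch.  */
static const struct native_cpu native_cpus[] = {
  { "arm926ej-s", 0x41, 0x926, 0, ULONG_MAX },
  { "cortex-a8",  0x41, 0xc08, 0, ULONG_MAX },
  { "cortex-a9",  0x41, 0xc09, 0, ULONG_MAX },
  { "cortex-a15", 0x41, 0xc0f, 0, ULONG_MAX },
};

/* Values are written in lower-case hex with the leading "0x" omitted.  */
unsigned long parse_hex_field (const char *s)
{
  char *end;
  assert (strncmp (s, "0x", 2) != 0);   /* the format omits "0x" */
  unsigned long v = strtoul (s, &end, 16);
  assert (end != s && *end == '\0');    /* reject empty input or junk */
  return v;
}

/* Return the name of the entry matching VENDOR and PART whose revision
   range contains REV, or NULL if nothing matches.  */
const char *lookup_native_cpu (unsigned long vendor, unsigned long part,
                               unsigned long rev)
{
  for (size_t i = 0; i < sizeof native_cpus / sizeof native_cpus[0]; i++)
    {
      const struct native_cpu *c = &native_cpus[i];
      if (c->vendor == vendor && c->part == part
          && c->minrev <= rev && rev <= c->maxrev)
        return c->name;
    }
  return NULL;
}
```

With the defaults filled in, any revision matches, which is the behaviour the patch describes for entries that omit minrev and maxrev.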

* config/arm/driver-arm.c: Include arm-native.h.
(host_detect_local_cpu): Use auto-generated data tables.
(vendors, arm_cpu_table): Delete.  Move part information to ...
* config/arm/arm-cpus.in: ... here.
* config/arm/parsecpu.awk (gen_native): New function.
(vendor, part): New CPU fields.
(END): Add support for building the native CPU detection tables.
* config/arm/t-arm (arm-native.h): Add build rule.
(driver-arm.o): Add dependency on arm-native.h.

Tested on both native and cross builds.  Installed on trunk.
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index d6eed2f..d82e95a 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -624,6 +624,8 @@ end arch iwmmxt2
 #   [option  add|remove ]*
 #   [optalias  ]*
 #   [costs ]
+#   [vendor 
+#[part  [minrev [maxrev]]]
 # end cpu 
 #
 # If omitted, cname is formed from transforming the cpuname to convert
@@ -633,6 +635,14 @@ end arch iwmmxt2
 # Each add option must have a distinct feature set and each remove
 # option must similarly have a distinct feature set.  Option aliases can be
 # added with the optalias statement.
+# Vendor, part and revision information is used for native CPU and architecture
+# detection.  All values must be in hex (lower case) with the leading '0x'
+# omitted.  For example the cortex-a9 will have vendor 41 and part c09.
+# Revision information is used to match a subrange of part
+# revisions: minrev <= detected <= maxrev.
+# If a minrev or maxrev are omitted then minrev defaults to zero and maxrev
+# to infinity.
+# Revision information is not implemented yet; no part uses it.
 
 # V4 Architecture Processors
 begin cpu arm8
@@ -878,6 +888,8 @@ begin cpu arm926ej-s
  architecture armv5tej+fp
  option nofp remove ALL_FP
  costs 9e
+ vendor 41
+ part 926
 end cpu arm926ej-s
 
 begin cpu arm1026ej-s
@@ -886,6 +898,8 @@ begin cpu arm1026ej-s
  architecture armv5tej+fp
  option nofp remove ALL_FP
  costs 9e
+ vendor 41
+ part a26
 end cpu arm1026ej-s
 
 
@@ -902,6 +916,8 @@ begin cpu arm1136jf-s
  tune flags LDSCHED
  architecture armv6j+fp
  costs 9e
+ vendor 41
+ part b36
 end cpu arm1136jf-s
 
 begin cpu arm1176jz-s
@@ -916,6 +932,8 @@ begin cpu arm1176jzf-s
  tune flags LDSCHED
  architecture armv6kz+fp
  costs 9e
+ vendor 41
+ part b76
 end cpu arm1176jzf-s
 
 begin cpu mpcorenovfp
@@ -928,6 +946,8 @@ begin cpu mpcore
  tune flags LDSCHED
  architecture armv6k+fp
  costs 9e
+ vendor 41
+ part b02
 end cpu mpcore
 
 begin cpu arm1156t2-s
@@ -942,6 +962,8 @@ begin cpu arm1156t2f-s
  tune flags LDSCHED
  architecture armv6t2+fp
  costs v6t2
+ vendor 41
+ part b56
 end cpu arm1156t2f-s
 
 
@@ -951,6 +973,8 @@ begin cpu cortex-m1
  tune flags LDSCHED
  architecture armv6s-m
  costs v6m
+ vendor 41
+ part c21
 end cpu cortex-m1
 
 begin cpu cortex-m0
@@ -958,6 +982,8 @@ begin cpu cortex-m0
  tune flags LDSCHED
  architecture armv6s-m
  costs v6m
+ vendor 41
+ part c20
 end cpu cortex-m0
 
 begin cpu cortex-m0plus
@@ -1022,6 +1048,8 @@ begin cpu cortex-a5
  option nosimd remove ALL_SIMD
  option nofp remove ALL_FP
  costs cortex_a5
+ vendor 41
+ part c05
 end cpu cortex-a5
 
 begin cpu cortex-a7
@@ -1031,6 +1059,8 @@ begin cpu cortex-a7
  option nosimd remove ALL_SIMD
  option nofp remove ALL_FP
  costs cortex_a7
+ vendor 41
+ part c07
 end cpu cortex-a7
 
 begin cpu cortex-a8
@@ -1039,6 +1069,8 @@ begin cpu cortex-a8
  architecture armv7-a+simd
  option nofp remove ALL_FP
  costs cortex_a8
+ vendor 41
+ part c08
 end cpu cortex-a8
 
 begin cpu cortex-a9
@@ -1048,6 +1080,8 @@ begin cpu cortex-a9
  option nosimd remove ALL_SIMD
  option nofp remove ALL_FP
  costs cortex_a9
+ vendor 41
+ part c09
 end cpu cortex-a9
 
 begin cpu cortex-a12
@@ -1057,6 +1091,8 @@ begin cpu cortex-a12
  architecture armv7ve+simd
  option nofp remove ALL_FP
  costs cortex_a12
+ vendor 41
+ part c0d
 end cpu cortex-a12
 
 begin cpu cortex-a15
@@ -1065,6 +1101,8 @@ begin cpu cortex-a15
  architecture armv7ve+simd
  option nofp remove ALL_FP
  costs cortex_a15
+ vendor 41
+ part c0f
 end cpu cortex-a15
 
 begin cpu cortex-a17
@@ -1073,6 +1111,8 @@ begin cpu cortex-a17
  architecture armv7ve+simd
  option nofp remove ALL_FP
  costs cortex_a12
+ vendor 41
+ part c0e
 end cpu cortex-a17
 
 begin cpu cortex-r4
@@ -1087,6 +1127,8 @@ begin cpu cortex-r4f
  tune flags LDSCHED
  

[PATCH] Properly unshare TYPE_SIZE_UNIT/DECL_FIELD_OFFSET (PR86216)

2018-07-13 Thread Richard Biener


The testcase in the PR, while previously ICEing because the C++ FE doesn't
properly capture VLA size fields, now ICEs because gimplification
introduces SSA uses that appear in a different function than its
definition.  This happens because there is tree sharing between
the functions.  For nested functions (which the C++ lambdas are not)
such tree sharing ended up being harmless before GCC7 because unnesting
resolves all locals with wrong origin to the static chain (and 
gimplification ordering made sure definitions always appear in the
outer function).

The following resolves this by unsharing the size expressions used in
c-common.c (c_sizeof_or_alignof_type and fold_offsetof).

I realize that this may end up pessimizing early code since 1:1
tree-sharing with what is gimplified from a DECL_EXPR doesn't
share the gimplification result.

Another option might be to force gimplification to not generate
SSA temporaries when gimplifying size positions but gimplify_one_sizepos
oddly enough unshares trees before gimplifying ...(!?)  This would
need to be removed (see patch after the tested patch below).

Bootstrapped / tested on x86_64-unknown-linux-gnu.

Richard.

2018-07-13  Richard Biener  

PR c/86216
* c-common.c (c_sizeof_or_alignof_type): Unshare TYPE_SIZE_UNIT.
(fold_offsetof): Unshare DECL_FIELD_OFFSET and TYPE_SIZE_UNIT.
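For readers less familiar with the two front-end paths named in the ChangeLog, the C sketch below shows the kind of source each one computes: sizeof of a type whose size is only known at run time (the TYPE_SIZE_UNIT case in c_sizeof_or_alignof_type), and offsetof of an array element, i.e. field offset plus index times element size, as in fold_offsetof. It is illustrative only and is not the PR86216 testcase; struct rec is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* sizeof on a variable length array type: the byte count comes from a
   size expression evaluated at run time, not from a constant.  */
size_t vla_type_bytes (int n)
{
  assert (n > 0);            /* a VLA bound must be positive */
  return sizeof (int[n]);
}

/* Hypothetical record used to show how an element offset is composed.  */
struct rec
{
  char tag;
  int vals[4];
};

/* offsetof of an array element: the offset of the field plus the index
   times the element size.  */
size_t vals2_offset (void)
{
  return offsetof (struct rec, vals[2]);
}
```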

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c (revision 262624)
+++ gcc/c-family/c-common.c (working copy)
@@ -3635,7 +3635,8 @@ c_sizeof_or_alignof_type (location_t loc
 {
   if (is_sizeof)
/* Convert in case a char is more than one unit.  */
-   value = size_binop_loc (loc, CEIL_DIV_EXPR, TYPE_SIZE_UNIT (type),
+   value = size_binop_loc (loc, CEIL_DIV_EXPR,
+   unshare_expr (TYPE_SIZE_UNIT (type)),
size_int (TYPE_PRECISION (char_type_node)
  / BITS_PER_UNIT));
   else if (min_alignof)
@@ -6210,7 +6211,8 @@ fold_offsetof (tree expr, tree type, enu
 "member %qD", t);
  return error_mark_node;
}
-  off = size_binop_loc (input_location, PLUS_EXPR, DECL_FIELD_OFFSET (t),
+  off = size_binop_loc (input_location, PLUS_EXPR,
+   unshare_expr (DECL_FIELD_OFFSET (t)),
size_int (tree_to_uhwi (DECL_FIELD_BIT_OFFSET (t))
  / BITS_PER_UNIT));
   break;
@@ -6266,7 +6268,8 @@ fold_offsetof (tree expr, tree type, enu
}
 
   t = convert (sizetype, t);
-  off = size_binop (MULT_EXPR, TYPE_SIZE_UNIT (TREE_TYPE (expr)), t);
+  off = size_binop (MULT_EXPR,
+   unshare_expr (TYPE_SIZE_UNIT (TREE_TYPE (expr))), t);
   break;
 
 case COMPOUND_EXPR:



alternative, untested:

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 6b76d17dbaf..455bc940bac 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -12706,11 +12706,14 @@ gimplify_one_sizepos (tree *expr_p, gimple_seq *stmt_p)
   || CONTAINS_PLACEHOLDER_P (expr))
 return;
 
-  *expr_p = unshare_expr (expr);
-
   /* SSA names in decl/type fields are a bad idea - they'll get reclaimed
- if the def vanishes.  */
+ if the def vanishes.  And since tree sharing across functions can
+ happen for size expressions no temporaries generated by the
+ gimplification may be SSA names either.  */
+  bool saved_into_ssa = gimplify_ctxp->into_ssa;
+  gimplify_ctxp->into_ssa = false;
   gimplify_expr (expr_p, stmt_p, NULL, is_gimple_val, fb_rvalue, false);
+  gimplify_ctxp->into_ssa = saved_into_ssa;
 
   /* If expr wasn't already is_gimple_sizepos or is_gimple_constant from the
  FE, ensure that it is a VAR_DECL, otherwise we might handle some decls



Re: [PATCH][testsuite/guality] Run guality tests with Og

2018-07-13 Thread Richard Biener
On Fri, 13 Jul 2018, Tom de Vries wrote:

> Hi,
> 
> we advertise Og as the optimization level of choice for the standard
> edit-compile-debug cycle, but do not run the guality tests for Og with the
> default torture options.
> 
> This patch ensures that we test -Og in the guality tests.
> 
> F.i., for gcc.dg/guality there are 45 fails for Og (while there are none for
> O1), in these test-cases:
> ...
> gcc.dg/guality/pr54200.c
> gcc.dg/guality/pr54970.c
> gcc.dg/guality/pr56154-1.c
> gcc.dg/guality/pr59776.c
> gcc.dg/guality/sra-1.c
> ...
> 
> Tested gcc.dg/guality on c-only compiler, currently doing bootstrap and
> reg-test on x86_64.
> 
> OK for trunk if no issues found during testing?

OK.

Richard.

> Thanks,
> - Tom
> 
> [testsuite/guality] Run guality tests with Og
> 
> 2018-07-13  Tom de Vries  
> 
>   * lib/gcc-gdb-test.exp (guality_minimal_options): New proc.
>   * g++.dg/guality/guality.exp: Ensure Og is part of torture options.
>   * gcc.dg/guality/guality.exp: Same.
>   * gfortran.dg/guality/guality.exp: Same.
> 
> ---
>  gcc/testsuite/g++.dg/guality/guality.exp  |  9 +
>  gcc/testsuite/gcc.dg/guality/guality.exp  |  3 ++-
>  gcc/testsuite/gfortran.dg/guality/guality.exp |  9 +
>  gcc/testsuite/lib/gcc-gdb-test.exp| 14 ++
>  4 files changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/g++.dg/guality/guality.exp b/gcc/testsuite/g++.dg/guality/guality.exp
> index 4be22baa19c..757b20b61e2 100644
> --- a/gcc/testsuite/g++.dg/guality/guality.exp
> +++ b/gcc/testsuite/g++.dg/guality/guality.exp
> @@ -48,6 +48,14 @@ if ![info exists ::env(GUALITY_GDB_NAME)] {
>  }
>  report_gdb $::env(GUALITY_GDB_NAME) [info script]
>  
> +global DG_TORTURE_OPTIONS LTO_TORTURE_OPTIONS
> +set guality_dg_torture_options [guality_minimal_options $DG_TORTURE_OPTIONS]
> +torture-init
> +set-torture-options \
> +$guality_dg_torture_options \
> +[list {}] \
> +$LTO_TORTURE_OPTIONS
> +
>  if {[check_guality "
>#include \"$srcdir/$subdir/guality.h\"
>volatile long int varl = 6;
> @@ -65,4 +73,5 @@ if [info exists guality_gdb_name] {
>  unsetenv GUALITY_GDB_NAME
>  }
>  
> +torture-finish
>  dg-finish
> diff --git a/gcc/testsuite/gcc.dg/guality/guality.exp b/gcc/testsuite/gcc.dg/guality/guality.exp
> index d9994341477..ca77a446f86 100644
> --- a/gcc/testsuite/gcc.dg/guality/guality.exp
> +++ b/gcc/testsuite/gcc.dg/guality/guality.exp
> @@ -62,7 +62,8 @@ proc guality_transform_options { args } {
>  }
>  
>  global DG_TORTURE_OPTIONS
> -set guality_dg_torture_options [guality_transform_options $DG_TORTURE_OPTIONS]
> +set guality_dg_torture_options [guality_minimal_options $DG_TORTURE_OPTIONS]
> +set guality_dg_torture_options [guality_transform_options $guality_dg_torture_options]
>  set guality_lto_torture_options [guality_transform_options $LTO_TORTURE_OPTIONS]
>  torture-init
>  set-torture-options \
> diff --git a/gcc/testsuite/gfortran.dg/guality/guality.exp b/gcc/testsuite/gfortran.dg/guality/guality.exp
> index f76347dd52f..f224cdfefa5 100644
> --- a/gcc/testsuite/gfortran.dg/guality/guality.exp
> +++ b/gcc/testsuite/gfortran.dg/guality/guality.exp
> @@ -29,10 +29,19 @@ if ![info exists ::env(GUALITY_GDB_NAME)] {
>  }
>  report_gdb $::env(GUALITY_GDB_NAME) [info script]
>  
> +global DG_TORTURE_OPTIONS LTO_TORTURE_OPTIONS
> +set guality_dg_torture_options [guality_minimal_options $DG_TORTURE_OPTIONS]
> +torture-init
> +set-torture-options \
> +$guality_dg_torture_options \
> +[list {}] \
> +$LTO_TORTURE_OPTIONS
> +
>  gfortran-dg-runtest [lsort [glob $srcdir/$subdir/*.\[fF\]{,90,95,03,08} ]] "" ""
>  
>  if [info exists guality_gdb_name] {
>  unsetenv GUALITY_GDB_NAME
>  }
>  
> +torture-finish
>  dg-finish
> diff --git a/gcc/testsuite/lib/gcc-gdb-test.exp b/gcc/testsuite/lib/gcc-gdb-test.exp
> index bb966d43023..b13d3ec7f85 100644
> --- a/gcc/testsuite/lib/gcc-gdb-test.exp
> +++ b/gcc/testsuite/lib/gcc-gdb-test.exp
> @@ -166,3 +166,17 @@ proc report_gdb { gdb loc } {
>  }
>  send_log -- "---\n$gdb_version\n---\n"
>  }
> +
> +# Argument 0 is the option list.
> +# Return the option list, ensuring that at least -Og is present.
> +
> +proc guality_minimal_options { args } {
> +set options [lindex $args 0]
> +foreach opt $options {
> + if { [regexp -- "-Og" $opt] } {
> + return $options
> + }
> +}
> +
> +return [lappend options "-Og"]
> +}
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


[PATCH] Fix PR85974

2018-07-13 Thread Richard Biener


The following patch fixes address difference folding to also handle
the case where only one of the operands has a conversion.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-07-13  Richard Biener  

PR middle-end/85974
* match.pd (addr1 - addr2): Allow either of the operands to
have a conversion.

* gcc.c-torture/compile/930326-1.c: Adjust to cover widening.
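A small C illustration, not the 930326-1.c testcase, of an address difference in which only one operand carries a pointer conversion (char (*)[3] to char *) and the other carries none. Both addresses lie in the same static object, so the difference is a constant. Whether a particular front end hands it to match.pd in exactly this shape is an assumption; struct three is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout; chars need no padding, so f follows b directly.  */
struct three
{
  char a, b, f[3];
};

static struct three s;

/* Address difference within one object: the left operand is converted
   (char (*)[3] -> char *), the right operand is not.  */
long addr_diff (void)
{
  long d = (char *) &s.f - &s.b;
  /* Same value as the difference of the member offsets.  */
  assert (d == (long) (offsetof (struct three, f)
                       - offsetof (struct three, b)));
  return d;
}
```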

Index: gcc/match.pd
===
--- gcc/match.pd(revision 262624)
+++ gcc/match.pd(working copy)
@@ -1673,14 +1673,14 @@ (define_operator_list COND_TERNARY
   (if (ptr_difference_const (@0, @1, &diff))
 { build_int_cst_type (type, diff); }
 (simplify
- (pointer_diff (convert?@2 ADDR_EXPR@0) (convert?@3 @1))
+ (pointer_diff (convert?@2 ADDR_EXPR@0) (convert1?@3 @1))
  (if (tree_nop_conversion_p (TREE_TYPE(@2), TREE_TYPE (@0))
   && tree_nop_conversion_p (TREE_TYPE(@3), TREE_TYPE (@1)))
   (with { poly_int64 diff; }
   (if (ptr_difference_const (@0, @1, &diff))
 { build_int_cst_type (type, diff); }
 (simplify
- (pointer_diff (convert?@2 @0) (convert?@3 ADDR_EXPR@1))
+ (pointer_diff (convert?@2 @0) (convert1?@3 ADDR_EXPR@1))
  (if (tree_nop_conversion_p (TREE_TYPE(@2), TREE_TYPE (@0))
   && tree_nop_conversion_p (TREE_TYPE(@3), TREE_TYPE (@1)))
   (with { poly_int64 diff; }
Index: gcc/testsuite/gcc.c-torture/compile/930326-1.c
===
--- gcc/testsuite/gcc.c-torture/compile/930326-1.c  (revision 262624)
+++ gcc/testsuite/gcc.c-torture/compile/930326-1.c  (working copy)
@@ -4,3 +4,4 @@ struct
 } s;
 
 long i = s.f-
+long long j = s.f-


Re: [PATCH][wwwdocs] Mention Cortex-A76 support in GCC 9 changes.html

2018-07-13 Thread Richard Earnshaw (lists)
On 13/07/18 00:26, Gerald Pfeifer wrote:
> On Fri, 29 Jun 2018, Kyrill  Tkachov wrote:
>> This patch adds support for the Arm Cortex-A76 processor in changes.html 
>> for GCC 9. It enables the AArch64 section of the page and adds the news 
>> blob there. It also adds an entry to the already-existing arm entry.
> 
> Thank you, Kyrill.
> 
> Should I also apply the change below to account for the recent
> ARM to Arm rebranding?
> 
> Gerald
> 
> Index: changes.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
> retrieving revision 1.12
> diff -u -r1.12 changes.html
> --- changes.html  4 Jul 2018 20:22:12 -   1.12
> +++ changes.html  12 Jul 2018 23:25:41 -
> @@ -99,7 +99,7 @@
>  
>  
>  
> -ARM
> +Arm
>  
>
>  Support has been added for the following processors
> 

Please.

R.


[PATCH] [v2][aarch64] Avoid tag collisions for loads falkor

2018-07-13 Thread Siddhesh Poyarekar
Hi,

This is a rewrite of the tag collision avoidance patch that Kugan had
written as a machine reorg pass back in February.

The falkor hardware prefetching system uses a combination of the
source, destination and offset to decide which prefetcher unit to
train with the load.  This is great when loads in a loop are
sequential but sub-optimal if there are unrelated loads in a loop that
tag to the same prefetcher unit.

This pass attempts to rename the destination register of such colliding
loads using routines available in regrename.c so that their tags do
not collide.  This shows some performance gains with mcf and xalancbmk
(~5% each) and will be tweaked further.  The pass is placed near the
tail end of the pass list so that subsequent passes don't inadvertently
end up undoing the renames.

A full gcc bootstrap and testsuite ran successfully on aarch64, i.e. it
did not introduce any new regressions.  I also did a make-check with
-mcpu=falkor to ensure that there were no regressions.  The couple of
regressions I found were target-specific and were related to scheduling
and cost differences and are not correctness issues.

Changes from v1:

- Fixed up issues pointed out by Kyrill
- Avoid renaming R0/V0 since they could be return values
- Fixed minor formatting issues.
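The renaming idea can be sketched in C as below. This is a toy model under stated assumptions: the tag function is hypothetical (the real Falkor computation lives in falkor-tag-collision-avoidance.c and is not reproduced here), "source" is taken to mean the base address register, and the caller is assumed to supply registers that are free at that point. Loads that write register 0 are left alone, following the v2 change that avoids renaming R0/V0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A load as seen by this sketch: destination, base register, offset.  */
struct load
{
  unsigned dest, base;
  long offset;
};

/* HYPOTHETICAL tag mixing low bits of destination, base and offset.  */
static unsigned load_tag (unsigned dest, unsigned base, long offset)
{
  return (dest & 0xf) | ((base & 0xf) << 4) | (((unsigned) offset & 0x3f) << 8);
}

/* True if giving LOADS[I] destination DEST collides with an earlier load.  */
static bool collides (const struct load *loads, size_t i, unsigned dest)
{
  unsigned t = load_tag (dest, loads[i].base, loads[i].offset);
  for (size_t j = 0; j < i; j++)
    if (load_tag (loads[j].dest, loads[j].base, loads[j].offset) == t)
      return true;
  return false;
}

/* Rename each load whose tag collides with an earlier load, using the
   first register in FREE_REGS that yields a unique tag.  Returns the
   number of loads renamed.  */
size_t avoid_tag_collisions (struct load *loads, size_t n,
                             const unsigned *free_regs, size_t nfree)
{
  assert (loads != NULL || n == 0);
  size_t renamed = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (loads[i].dest == 0)   /* never rename R0/V0 */
        continue;
      if (!collides (loads, i, loads[i].dest))
        continue;
      for (size_t k = 0; k < nfree; k++)
        if (!collides (loads, i, free_regs[k]))
          {
            loads[i].dest = free_regs[k];
            renamed++;
            break;
          }
    }
  return renamed;
}
```

A real pass also has to make sure the same free register is not handed to two loads whose values are live at the same time; the sketch ignores liveness entirely.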

2018-07-02  Siddhesh Poyarekar  
Kugan Vivekanandarajah  

* config/aarch64/falkor-tag-collision-avoidance.c: New file.
* config.gcc (extra_objs): Build it.
* config/aarch64/t-aarch64 (falkor-tag-collision-avoidance.o):
Likewise.
* config/aarch64/aarch64-passes.def
(pass_tag_collision_avoidance): New pass.
* config/aarch64/aarch64.c (qdf24xx_tunings): Add
AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS to tuning_flags.
(aarch64_classify_address): Remove static qualifier.
(aarch64_address_info, aarch64_address_type): Move to...
* config/aarch64/aarch64-protos.h: ... here.
(make_pass_tag_collision_avoidance): New function.
* config/aarch64/aarch64-tuning-flags.def (rename_load_regs):
New tuning flag.

CC: james.greenha...@arm.com
CC: kyrylo.tkac...@foss.arm.com
---
 gcc/config.gcc|   2 +-
 gcc/config/aarch64/aarch64-passes.def |   1 +
 gcc/config/aarch64/aarch64-protos.h   |  49 +
 gcc/config/aarch64/aarch64-tuning-flags.def   |   2 +
 gcc/config/aarch64/aarch64.c  |  48 +-
 .../aarch64/falkor-tag-collision-avoidance.c  | 856 ++
 gcc/config/aarch64/t-aarch64  |   9 +
 7 files changed, 921 insertions(+), 46 deletions(-)
 create mode 100644 gcc/config/aarch64/falkor-tag-collision-avoidance.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 63162aab676..c66dda0770e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -304,7 +304,7 @@ aarch64*-*-*)
extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
c_target_objs="aarch64-c.o"
cxx_target_objs="aarch64-c.o"
-   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
+   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o falkor-tag-collision-avoidance.o"
target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
target_has_targetm_common=yes
;;
diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
index 87747b420b0..f61a8870aa1 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -19,3 +19,4 @@
.  */
 
 INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
+INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 87c6ae20278..0a4558c2023 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -283,6 +283,49 @@ struct tune_params
   const struct cpu_prefetch_tune *prefetch;
 };
 
+/* Classifies an address.
+
+   ADDRESS_REG_IMM
+   A simple base register plus immediate offset.
+
+   ADDRESS_REG_WB
+   A base register indexed by immediate offset with writeback.
+
+   ADDRESS_REG_REG
+   A base register indexed by (optionally scaled) register.
+
+   ADDRESS_REG_UXTW
+   A base register indexed by (optionally scaled) zero-extended register.
+
+   ADDRESS_REG_SXTW
+   A base register indexed by (optionally scaled) sign-extended register.
+
+   ADDRESS_LO_SUM
+   A LO_SUM rtx with a base register and "LO12" symbol relocation.
+
+   ADDRESS_SYMBOLIC:
+   A constant symbolic address, in pc-relative literal pool.  */
+
+enum aarch64_address_type {
+  ADDRESS_REG_IMM,
+  ADDRESS_REG_WB,
+  ADDRESS_REG_REG,
+  ADDRESS_REG_UXTW,
+  ADDRESS_REG_SXTW,
+  ADDRESS_LO_SUM,
+  ADDRESS_SYMBOLIC
+};
+
+/* Address information.  */
+struct aarch64_address_info {
+  enum aarch64_address_type type;
+  rtx base;
+  rtx offset;
+  

Re: [PATCH][debug] Reuse debug exprs generated in remap_ssa_name

2018-07-13 Thread Richard Biener
On Fri, Jul 13, 2018 at 12:09 PM Tom de Vries  wrote:
>
> On 07/09/2018 02:43 PM, Richard Biener wrote:
> > On Sun, Jul 8, 2018 at 11:27 AM Tom de Vries  wrote:
> >>
> >> On Sun, Jul 08, 2018 at 11:22:41AM +0200, Tom de Vries wrote:
> >>> On Fri, Jul 06, 2018 at 04:38:50PM +0200, Richard Biener wrote:
>  On Fri, Jul 6, 2018 at 12:47 PM Tom de Vries  wrote:
> > On 07/05/2018 01:39 PM, Richard Biener wrote:
> >>>
> >>> 
> >>>
>  I now also spotted the code in remap_ssa_name that is supposed to handle
>  this it seems and for the testcase we only give up because the PARM_DECL 
>  is
>  remapped to a VAR_DECL.  So I suppose it is to be handled via the
>  debug-args stuff
>  which probably lacks in the area of versioning.
> 
>  Your patch feels like it adds stuff ontop of existing mechanisms that
>  should "just work"
>  with the correct setup at the correct places...
> 
> >>>
> >>> Hmm, I realized that I may be complicating things, by trying to do an
> >>> optimal fix in a single patch, so I decided to write two patches, one
> >>> with a fix, and then one improving the fix to be more optimal.
> >>>
> >>> Also, I suspect that the "just work" approach is this:
> >>> ...
> >>># DEBUG D#8 s=> iD.1900
> >>># DEBUG iD.1949 => D#8
> >>># DEBUG D#6 s=> iD.1949
> >>> ...
> >>> whereas previously I tried to map 'D#6' on iD.1900 directly.
> >>>
> >>
> >> Second patch OK for trunk?
> >
> > OK, though I wonder how it doesn't fail with that testcase with
> > the mismatching type where the removed param-decl is mapped
> > to a local var-decl.
>
> Previously I mapped the default def ssa-name onto the debug expression,
> which meant I had to add special handling at the start of
> remap_ssa_name, where already mapped ssa-names are handled, for the
> mismatched argument/parameter type case.
>
> Now I map the local var-decl onto the debug expression, and the code
> that is added to look it up is already guarded with processing_debug_stmt.

Ah, thanks for the explanation!

Richard.

> Thanks,
> - Tom
>
> >> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> >> index 6fbd8c3ca61..164c7fff710 100644
> >> --- a/gcc/tree-inline.c
> >> +++ b/gcc/tree-inline.c
> >> @@ -215,12 +215,16 @@ remap_ssa_name (tree name, copy_body_data *id)
> >>   processing_debug_stmt = -1;
> >>   return name;
> >> }
> >> + n = id->decl_map->get (val);
> >> + if (n && TREE_CODE (*n) == DEBUG_EXPR_DECL)
> >> +   return *n;
> >>   def_temp = gimple_build_debug_source_bind (vexpr, val, NULL);
> >>   DECL_ARTIFICIAL (vexpr) = 1;
> >>   TREE_TYPE (vexpr) = TREE_TYPE (name);
> >>   SET_DECL_MODE (vexpr, DECL_MODE (SSA_NAME_VAR (name)));
> >>   gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
> >>   gsi_insert_before (&gsi, def_temp, GSI_SAME_STMT);
> >> + insert_decl_map (id, val, vexpr);
> >>   return vexpr;
> >> }
> >>
> >>
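Stripped of GCC's tree types, the lookup-before-create change in the quoted hunk is memoisation keyed on the decl: the first remapped use creates a debug expression and records it, and later uses get the same one back. The C sketch below models that with a hypothetical fixed-size table standing in for id->decl_map; it is not tree-inline.c code.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL stand-in for id->decl_map: maps a decl (by UID) to the
   number of the debug expression (D#n) created for it.  */
#define DEBUG_MAP_MAX 16

struct debug_map
{
  int decl_uid[DEBUG_MAP_MAX];
  int debug_id[DEBUG_MAP_MAX];
  size_t n;
  int next_debug_id;   /* number for the next D#n */
};

/* Look the decl up first and reuse its debug expression if one exists;
   otherwise create a new one and record it, mirroring the get () /
   insert_decl_map () pair the patch adds to remap_ssa_name.  */
int get_debug_expr (struct debug_map *m, int uid)
{
  for (size_t i = 0; i < m->n; i++)
    if (m->decl_uid[i] == uid)
      return m->debug_id[i];     /* reuse the existing D#n */
  assert (m->n < DEBUG_MAP_MAX);
  int id = m->next_debug_id++;   /* create a fresh D#n */
  m->decl_uid[m->n] = uid;
  m->debug_id[m->n] = id;
  m->n++;
  return id;
}
```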


Re: [PATCH][testsuite, guality] Add -fno-ipa-icf in gcc.dg/guality

2018-07-13 Thread Richard Biener
On Fri, Jul 13, 2018 at 11:18 AM Tom de Vries  wrote:
>
> Hi,
>
> Optimization fipa-icf breaks debug info (as is noted in PR63572 - "ICF
> breaks user debugging experience"), which makes guality tests clztest.c,
> ctztest.c and sra-1.c unsupported for option combination "-O2 -flto
> -fuse-linker-plugin -fno-fat-lto-objects".  F.i., in clztest.c foo and bar are
> merged, and gdb can set a breakpoint on a line in foo, but trying to set a
> breakpoint on a line in bar results in a breakpoint in main instead.
>
> This patch works around the problem by adding -fno-ipa-icf (as is already done
> in csttest.c and pr43077-1.c) to those testcases:
> ...
> -UNSUPPORTED: gcc.dg/guality/clztest.c ... line . g == f
> +PASS:gcc.dg/guality/clztest.c ... line . g == f
> -UNSUPPORTED: gcc.dg/guality/ctztest.c ... line . g == f
> +PASS:gcc.dg/guality/ctztest.c ... line . g == f
> -UNSUPPORTED: gcc.dg/guality/sra-1.c ... line .+1 a[0] == 4
> +PASS:gcc.dg/guality/sra-1.c ... line .+1 a[0] == 4
> -UNSUPPORTED: gcc.dg/guality/sra-1.c ... line . a[1] == 14
> +PASS:gcc.dg/guality/sra-1.c ... line . a[1] == 14
> ...
>
> Tested on x86_64.
>
> OK for trunk?

OK.

Thanks,
Richard.

> Thanks,
> - Tom
>
> [testsuite, guality] Add -fno-ipa-icf in gcc.dg/guality
>
> 2018-07-13  Tom de Vries  
>
> * gcc.dg/guality/clztest.c: Add -fno-ipa-icf in dg-options.
> * gcc.dg/guality/ctztest.c: Same.
> * gcc.dg/guality/sra-1.c: Same.
>
> ---
>  gcc/testsuite/gcc.dg/guality/clztest.c | 2 +-
>  gcc/testsuite/gcc.dg/guality/ctztest.c | 2 +-
>  gcc/testsuite/gcc.dg/guality/sra-1.c   | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/guality/clztest.c b/gcc/testsuite/gcc.dg/guality/clztest.c
> index f89c1c31a15..69527561c22 100644
> --- a/gcc/testsuite/gcc.dg/guality/clztest.c
> +++ b/gcc/testsuite/gcc.dg/guality/clztest.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
> -/* { dg-options "-g" } */
> +/* { dg-options "-g -fno-ipa-icf" } */
>
>  volatile int vv;
>
> diff --git a/gcc/testsuite/gcc.dg/guality/ctztest.c b/gcc/testsuite/gcc.dg/guality/ctztest.c
> index 5ce6c674be3..276752ac986 100644
> --- a/gcc/testsuite/gcc.dg/guality/ctztest.c
> +++ b/gcc/testsuite/gcc.dg/guality/ctztest.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
> -/* { dg-options "-g" } */
> +/* { dg-options "-g -fno-ipa-icf" } */
>
>  volatile int vv;
>
> diff --git a/gcc/testsuite/gcc.dg/guality/sra-1.c b/gcc/testsuite/gcc.dg/guality/sra-1.c
> index a747bc302aa..8ad57cf3f8e 100644
> --- a/gcc/testsuite/gcc.dg/guality/sra-1.c
> +++ b/gcc/testsuite/gcc.dg/guality/sra-1.c
> @@ -1,6 +1,6 @@
>  /* PR debug/43983 */
>  /* { dg-do run } */
> -/* { dg-options "-g" } */
> +/* { dg-options "-g -fno-ipa-icf" } */
>
>  struct A { int i; int j; };
>  struct B { int : 4; int i : 12; int j : 12; int : 4; };
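A hypothetical test-shaped C sketch (not clztest.c itself) of the situation described above: two functions with identical bodies, which -fipa-icf (enabled at -O2 and above) may merge, so that a breakpoint on a line in one of them no longer lands where the user expects. The dg-options line mirrors what the patch adds to keep them apart.

```c
/* { dg-options "-g -fno-ipa-icf" } */
#include <assert.h>
#include <limits.h>

/* Identical bodies: candidates for identical code folding.  */
__attribute__ ((noinline)) int foo (unsigned x)
{
  assert (x != 0);   /* __builtin_clz (0) is undefined */
  return __builtin_clz (x);
}

__attribute__ ((noinline)) int bar (unsigned x)
{
  assert (x != 0);
  return __builtin_clz (x);
}
```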


[PATCH][testsuite/guality] Run guality tests with Og

2018-07-13 Thread Tom de Vries
Hi,

we advertise Og as the optimization level of choice for the standard
edit-compile-debug cycle, but do not run the guality tests for Og with the
default torture options.

This patch ensures that we test -Og in the guality tests.

F.i., for gcc.dg/guality there are 45 fails for Og (while there are none for
O1), in these test-cases:
...
gcc.dg/guality/pr54200.c
gcc.dg/guality/pr54970.c
gcc.dg/guality/pr56154-1.c
gcc.dg/guality/pr59776.c
gcc.dg/guality/sra-1.c
...

Tested gcc.dg/guality on c-only compiler, currently doing bootstrap and
reg-test on x86_64.

OK for trunk if no issues found during testing?

Thanks,
- Tom

[testsuite/guality] Run guality tests with Og

2018-07-13  Tom de Vries  

* lib/gcc-gdb-test.exp (guality_minimal_options): New proc.
* g++.dg/guality/guality.exp: Ensure Og is part of torture options.
* gcc.dg/guality/guality.exp: Same.
* gfortran.dg/guality/guality.exp: Same.

---
 gcc/testsuite/g++.dg/guality/guality.exp  |  9 +
 gcc/testsuite/gcc.dg/guality/guality.exp  |  3 ++-
 gcc/testsuite/gfortran.dg/guality/guality.exp |  9 +
 gcc/testsuite/lib/gcc-gdb-test.exp| 14 ++
 4 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/guality/guality.exp b/gcc/testsuite/g++.dg/guality/guality.exp
index 4be22baa19c..757b20b61e2 100644
--- a/gcc/testsuite/g++.dg/guality/guality.exp
+++ b/gcc/testsuite/g++.dg/guality/guality.exp
@@ -48,6 +48,14 @@ if ![info exists ::env(GUALITY_GDB_NAME)] {
 }
 report_gdb $::env(GUALITY_GDB_NAME) [info script]
 
+global DG_TORTURE_OPTIONS LTO_TORTURE_OPTIONS
+set guality_dg_torture_options [guality_minimal_options $DG_TORTURE_OPTIONS]
+torture-init
+set-torture-options \
+$guality_dg_torture_options \
+[list {}] \
+$LTO_TORTURE_OPTIONS
+
 if {[check_guality "
   #include \"$srcdir/$subdir/guality.h\"
   volatile long int varl = 6;
@@ -65,4 +73,5 @@ if [info exists guality_gdb_name] {
 unsetenv GUALITY_GDB_NAME
 }
 
+torture-finish
 dg-finish
diff --git a/gcc/testsuite/gcc.dg/guality/guality.exp b/gcc/testsuite/gcc.dg/guality/guality.exp
index d9994341477..ca77a446f86 100644
--- a/gcc/testsuite/gcc.dg/guality/guality.exp
+++ b/gcc/testsuite/gcc.dg/guality/guality.exp
@@ -62,7 +62,8 @@ proc guality_transform_options { args } {
 }
 
 global DG_TORTURE_OPTIONS
-set guality_dg_torture_options [guality_transform_options $DG_TORTURE_OPTIONS]
+set guality_dg_torture_options [guality_minimal_options $DG_TORTURE_OPTIONS]
+set guality_dg_torture_options [guality_transform_options $guality_dg_torture_options]
 set guality_lto_torture_options [guality_transform_options $LTO_TORTURE_OPTIONS]
 torture-init
 set-torture-options \
diff --git a/gcc/testsuite/gfortran.dg/guality/guality.exp b/gcc/testsuite/gfortran.dg/guality/guality.exp
index f76347dd52f..f224cdfefa5 100644
--- a/gcc/testsuite/gfortran.dg/guality/guality.exp
+++ b/gcc/testsuite/gfortran.dg/guality/guality.exp
@@ -29,10 +29,19 @@ if ![info exists ::env(GUALITY_GDB_NAME)] {
 }
 report_gdb $::env(GUALITY_GDB_NAME) [info script]
 
+global DG_TORTURE_OPTIONS LTO_TORTURE_OPTIONS
+set guality_dg_torture_options [guality_minimal_options $DG_TORTURE_OPTIONS]
+torture-init
+set-torture-options \
+$guality_dg_torture_options \
+[list {}] \
+$LTO_TORTURE_OPTIONS
+
 gfortran-dg-runtest [lsort [glob $srcdir/$subdir/*.\[fF\]{,90,95,03,08} ]] "" ""
 
 if [info exists guality_gdb_name] {
 unsetenv GUALITY_GDB_NAME
 }
 
+torture-finish
 dg-finish
diff --git a/gcc/testsuite/lib/gcc-gdb-test.exp b/gcc/testsuite/lib/gcc-gdb-test.exp
index bb966d43023..b13d3ec7f85 100644
--- a/gcc/testsuite/lib/gcc-gdb-test.exp
+++ b/gcc/testsuite/lib/gcc-gdb-test.exp
@@ -166,3 +166,17 @@ proc report_gdb { gdb loc } {
 }
 send_log -- "---\n$gdb_version\n---\n"
 }
+
+# Argument 0 is the option list.
+# Return the option list, ensuring that at least -Og is present.
+
+proc guality_minimal_options { args } {
+set options [lindex $args 0]
+foreach opt $options {
+   if { [regexp -- "-Og" $opt] } {
+   return $options
+   }
+}
+
+return [lappend options "-Og"]
+}
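For readers who don't read Tcl, the new guality_minimal_options proc amounts to the C sketch below; the function name and the capacity parameter are hypothetical. Tcl's regexp "-Og" is an unanchored match, so any option string that contains -Og counts as present, and strstr reproduces that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* If no entry in OPTS (N entries, room for CAP) mentions -Og, append
   "-Og".  Returns the new number of entries.  */
size_t ensure_og (const char **opts, size_t n, size_t cap)
{
  for (size_t i = 0; i < n; i++)
    if (strstr (opts[i], "-Og") != NULL)   /* unanchored, like regexp */
      return n;
  assert (n < cap);
  opts[n] = "-Og";
  return n + 1;
}
```

Calling it a second time on the extended list is a no-op, which matches the proc returning the option list unchanged once -Og is present.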


Re: [PATCH, Ada] RISC-V: Initial riscv linux Ada port.

2018-07-13 Thread Eric Botcazou
> I poked at this a little and noticed a difference between the x86_64
> support and the RISC-V support.  The RISC-V C language port has char
> as unsigned by default.  The x86_64 port has char signed by default.
> If I add a -fsigned-char option, then the testcase works as expected
> for RISC-V.  Curiously, the Ada compiler accepts -fsigned-char but not
> -funsigned-char.

But it accepts -fno-signed-char, which is equivalent. :-)  In any case, I 
agree that it should also accept -funsigned-char, now done.

> I tried hacking in a -funsigned-char flag, but when
> I use it with the x86_64 port the result is still correct.  Maybe my
> quick hack wasn't quite right.  Anyways, the default signedness of
> char has something to do with the problem.

I don't seem to be able to reproduce the failure with a cross-compiler though 
so that's really weird.
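The difference discussed above comes down to the signedness of plain char, which C leaves implementation-defined. A minimal C sketch of how to observe it (the helper name is hypothetical): it reports true on targets where char is signed by default, such as x86_64, and false where it is unsigned, such as RISC-V Linux; -fsigned-char and -funsigned-char override the target default.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Return true if plain char is signed for the current compilation.
   CHAR_MIN is 0 when char is unsigned and equals SCHAR_MIN when it is
   signed, so the answer follows the target default or the
   -fsigned-char / -funsigned-char option in effect.  */
static bool
plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}
```

Converting -1 to char gives a negative value only when char is signed, which offers a second, equivalent check.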


* gcc-interface/lang.opt (funsigned-char): New option.
* gcc-interface/misc.c (gnat_handle_option): Accept it.
* gcc-interface/utils.c (finish_character_type): Tweak comment.


-- 
Eric Botcazou

Index: gcc-interface/lang.opt
===
--- gcc-interface/lang.opt	(revision 262551)
+++ gcc-interface/lang.opt	(working copy)
@@ -80,6 +80,10 @@ fsigned-char
 Ada AdaWhy AdaSCIL
 Make \"char\" signed by default.
 
+funsigned-char
+Ada AdaWhy AdaSCIL
+Make \"char\" unsigned by default.
+
 gant
 Ada AdaWhy AdaSCIL Driver Joined Undocumented RejectNegative
 Catch typos.
Index: gcc-interface/misc.c
===
--- gcc-interface/misc.c	(revision 262551)
+++ gcc-interface/misc.c	(working copy)
@@ -170,6 +170,7 @@ gnat_handle_option (size_t scode, const
 
 case OPT_fshort_enums:
 case OPT_fsigned_char:
+case OPT_funsigned_char:
   /* These are handled by the middle-end.  */
   break;
 
Index: gcc-interface/utils.c
===
--- gcc-interface/utils.c	(revision 262551)
+++ gcc-interface/utils.c	(working copy)
@@ -1684,7 +1684,7 @@ record_builtin_type (const char *name, t
   integral types are unsigned.
 
   Unfortunately the signedness of 'char' in C is implementation-defined
-  and GCC even has the option -fsigned-char to toggle it at run time.
+  and GCC even has the option -f{un}signed-char to toggle it at run time.
   Since GNAT's philosophy is to be compatible with C by default, to wit
   Interfaces.C.char is defined as a mere copy of Character, we may need
   to declare character types as signed types in GENERIC and generate the


Re: [PATCH][debug] Reuse debug exprs generated in remap_ssa_name

2018-07-13 Thread Tom de Vries
On 07/09/2018 02:43 PM, Richard Biener wrote:
> On Sun, Jul 8, 2018 at 11:27 AM Tom de Vries  wrote:
>>
>> On Sun, Jul 08, 2018 at 11:22:41AM +0200, Tom de Vries wrote:
>>> On Fri, Jul 06, 2018 at 04:38:50PM +0200, Richard Biener wrote:
 On Fri, Jul 6, 2018 at 12:47 PM Tom de Vries  wrote:
> On 07/05/2018 01:39 PM, Richard Biener wrote:
>>>
>>> 
>>>
 I now also spotted the code in remap_ssa_name that is supposed to handle
 this it seems and for the testcase we only give up because the PARM_DECL is
 remapped to a VAR_DECL.  So I suppose it is to be handled via the
 debug-args stuff
 which probably lacks in the area of versioning.

 Your patch feels like it adds stuff ontop of existing mechanisms that
 should "just work"
 with the correct setup at the correct places...

>>>
>>> Hmm, I realized that I may be complicating things, by trying to do an
>>> optimal fix in a single patch, so I decided to write two patches, one
>>> with a fix, and then one improving the fix to be more optimal.
>>>
>>> Also, I suspect that the "just work" approach is this:
>>> ...
>>># DEBUG D#8 s=> iD.1900
>>># DEBUG iD.1949 => D#8
>>># DEBUG D#6 s=> iD.1949
>>> ...
>>> whereas previously I tried to map 'D#6' on iD.1900 directly.
>>>
>>
>> Second patch OK for trunk?
> 
> OK, though I wonder how it doesn't fail with that testcase with
> the mismatching type where the removed param-decl is mapped
> to a local var-decl.

Previously I mapped the default def ssa-name onto the debug expression,
which meant I had to add special handling at the start of
remap_ssa_name, where already mapped ssa-names are handled, for the
mismatched argument/parameter type case.

Now I map the local var-decl onto the debug expression, and the code
that is added to look it up is already guarded with processing_debug_stmt.

Thanks,
- Tom

>> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
>> index 6fbd8c3ca61..164c7fff710 100644
>> --- a/gcc/tree-inline.c
>> +++ b/gcc/tree-inline.c
>> @@ -215,12 +215,16 @@ remap_ssa_name (tree name, copy_body_data *id)
>>   processing_debug_stmt = -1;
>>   return name;
>> }
>> + n = id->decl_map->get (val);
>> + if (n && TREE_CODE (*n) == DEBUG_EXPR_DECL)
>> +   return *n;
>>   def_temp = gimple_build_debug_source_bind (vexpr, val, NULL);
>>   DECL_ARTIFICIAL (vexpr) = 1;
>>   TREE_TYPE (vexpr) = TREE_TYPE (name);
>>   SET_DECL_MODE (vexpr, DECL_MODE (SSA_NAME_VAR (name)));
>>   gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
>>   gsi_insert_before (, def_temp, GSI_SAME_STMT);
>> + insert_decl_map (id, val, vexpr);
>>   return vexpr;
>> }
>>
>>
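The effect of the second patch can be modelled as a lookup-before-create cache. remap_ssa_name now consults id->decl_map for an existing DEBUG_EXPR_DECL before building a new debug source bind, and it records the new one with insert_decl_map. The C sketch below is a simplified, hypothetical model (integer UIDs instead of tree nodes, a linear array instead of GCC's hash map), not GCC code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the decl map: decl UIDs mapped to debug-expr ids.  */
#define MAP_SIZE 16

struct decl_map
{
  int keys[MAP_SIZE];		/* Decl UIDs (stand-ins for tree nodes).  */
  int vals[MAP_SIZE];		/* Debug-expr ids ("D#n") bound to them.  */
  size_t len;
};

static int next_debug_id = 1;	/* Source of fresh D#n numbers.  */

/* Return the debug expr already bound to DECL_UID, or create a new one
   and record it so that later lookups reuse it.  */
static int
get_or_create_debug_expr (struct decl_map *map, int decl_uid)
{
  for (size_t i = 0; i < map->len; i++)
    if (map->keys[i] == decl_uid)
      return map->vals[i];	/* Reuse: no second debug bind emitted.  */

  assert (map->len < MAP_SIZE);
  int id = next_debug_id++;	/* Real code also emits "# DEBUG D#id s=> decl".  */
  map->keys[map->len] = decl_uid;
  map->vals[map->len] = id;
  map->len++;
  return id;
}
```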


Re: [testsuite] Minor tweak to 4 Aarch64 testcases

2018-07-13 Thread Eric Botcazou
> I used xfail for these testcases in particular because the intrinsics that
> they test should be available for both arm and aarch64.
> They are currently not implemented on arm, even though they should be.
> The other tests that are skipped instead of xfailed test intrinsics that
> should only be available on aarch64 and not arm.

OK, thanks for the explanation.

-- 
Eric Botcazou


[PATCH][testsuite, guality] Add -fno-ipa-icf in gcc.dg/guality

2018-07-13 Thread Tom de Vries
Hi,

The -fipa-icf optimization breaks debug info (as noted in PR63572, "ICF
breaks user debugging experience"), which makes the guality tests clztest.c,
ctztest.c and sra-1.c unsupported for the option combination "-O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects".  For instance, in clztest.c foo and
bar are merged: gdb can set a breakpoint on a line in foo, but trying to set a
breakpoint on a line in bar results in a breakpoint in main instead.

This patch works around the problem by adding -fno-ipa-icf (as is already done
in csttest.c and pr43077-1.c) to those testcases:
...
-UNSUPPORTED: gcc.dg/guality/clztest.c ... line . g == f
+PASS: gcc.dg/guality/clztest.c ... line . g == f
-UNSUPPORTED: gcc.dg/guality/ctztest.c ... line . g == f
+PASS: gcc.dg/guality/ctztest.c ... line . g == f
-UNSUPPORTED: gcc.dg/guality/sra-1.c ... line .+1 a[0] == 4
+PASS: gcc.dg/guality/sra-1.c ... line .+1 a[0] == 4
-UNSUPPORTED: gcc.dg/guality/sra-1.c ... line . a[1] == 14
+PASS: gcc.dg/guality/sra-1.c ... line . a[1] == 14
...

Tested on x86_64.

OK for trunk?

Thanks,
- Tom
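A minimal illustration of what ICF keys on, assuming GCC at -O2 (with or without LTO): foo and bar below have identical bodies, so -fipa-icf may merge them, leaving bar as an alias or thunk of foo and making line-based breakpoints in bar unreliable. The functions are hypothetical and not taken from clztest.c. With -fno-ipa-icf in dg-options, as in the patch, each keeps its own code.

```c
/* { dg-options "-g -fno-ipa-icf" } */
#include <assert.h>

volatile int vv;

/* Identical bodies make foo and bar candidates for identical code
   folding.  noinline keeps them as separate calls in the caller.  */
__attribute__((noinline)) int
foo (int x)
{
  vv++;			/* Side effect on a volatile keeps the body live.  */
  return x * 2 + 1;
}

__attribute__((noinline)) int
bar (int x)
{
  vv++;
  return x * 2 + 1;
}
```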

[testsuite, guality] Add -fno-ipa-icf in gcc.dg/guality

2018-07-13  Tom de Vries  

* gcc.dg/guality/clztest.c: Add -fno-ipa-icf in dg-options.
* gcc.dg/guality/ctztest.c: Same.
* gcc.dg/guality/sra-1.c: Same.

---
 gcc/testsuite/gcc.dg/guality/clztest.c | 2 +-
 gcc/testsuite/gcc.dg/guality/ctztest.c | 2 +-
 gcc/testsuite/gcc.dg/guality/sra-1.c   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/guality/clztest.c b/gcc/testsuite/gcc.dg/guality/clztest.c
index f89c1c31a15..69527561c22 100644
--- a/gcc/testsuite/gcc.dg/guality/clztest.c
+++ b/gcc/testsuite/gcc.dg/guality/clztest.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
-/* { dg-options "-g" } */
+/* { dg-options "-g -fno-ipa-icf" } */
 
 volatile int vv;
 
diff --git a/gcc/testsuite/gcc.dg/guality/ctztest.c b/gcc/testsuite/gcc.dg/guality/ctztest.c
index 5ce6c674be3..276752ac986 100644
--- a/gcc/testsuite/gcc.dg/guality/ctztest.c
+++ b/gcc/testsuite/gcc.dg/guality/ctztest.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
-/* { dg-options "-g" } */
+/* { dg-options "-g -fno-ipa-icf" } */
 
 volatile int vv;
 
diff --git a/gcc/testsuite/gcc.dg/guality/sra-1.c b/gcc/testsuite/gcc.dg/guality/sra-1.c
index a747bc302aa..8ad57cf3f8e 100644
--- a/gcc/testsuite/gcc.dg/guality/sra-1.c
+++ b/gcc/testsuite/gcc.dg/guality/sra-1.c
@@ -1,6 +1,6 @@
 /* PR debug/43983 */
 /* { dg-do run } */
-/* { dg-options "-g" } */
+/* { dg-options "-g -fno-ipa-icf" } */
 
 struct A { int i; int j; };
 struct B { int : 4; int i : 12; int j : 12; int : 4; };


[gen/AArch64] Generate helpers for substituting iterator values into pattern names

2018-07-13 Thread Richard Sandiford
Given a pattern like:

  (define_insn "aarch64_frecpe" ...)

the SVE ACLE implementation wants to generate the pattern for a
particular (non-constant) mode.  This patch automatically generates
helpers to do that, specifically:

  // Return CODE_FOR_nothing on failure.
  insn_code maybe_code_for_aarch64_frecpe (machine_mode);

  // Assert that the code exists.
  insn_code code_for_aarch64_frecpe (machine_mode);

  // Return NULL_RTX on failure.
  rtx maybe_gen_aarch64_frecpe (machine_mode, rtx, rtx);

  // Assert that generation succeeds.
  rtx gen_aarch64_frecpe (machine_mode, rtx, rtx);
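The maybe_/assert split described above can be sketched by hand in C. This is a hypothetical model, not the code that gencodes or genemit actually produce: the enums stand in for machine_mode and insn_code, and the CODE_FOR_* instance names are invented for the example.

```c
#include <assert.h>

/* Hypothetical stand-ins for machine_mode and insn_code.  */
enum mode { MODE_SF, MODE_DF, MODE_TI };
enum insn_code { CODE_FOR_nothing, CODE_FOR_aarch64_frecpesf,
                 CODE_FOR_aarch64_frecpedf };

/* "maybe_" variant: return CODE_FOR_nothing when the pattern has no
   instance for M (here, TImode).  */
static enum insn_code
maybe_code_for_aarch64_frecpe (enum mode m)
{
  switch (m)
    {
    case MODE_SF:
      return CODE_FOR_aarch64_frecpesf;
    case MODE_DF:
      return CODE_FOR_aarch64_frecpedf;
    default:
      return CODE_FOR_nothing;
    }
}

/* Plain variant: the caller promises the instance exists, so a missing
   one is an internal error rather than a recoverable failure.  */
static enum insn_code
code_for_aarch64_frecpe (enum mode m)
{
  enum insn_code code = maybe_code_for_aarch64_frecpe (m);
  assert (code != CODE_FOR_nothing);
  return code;
}
```

A caller that can cope with an unsupported mode uses the maybe_ form and checks for CODE_FOR_nothing; one that knows the mode is supported calls the plain form and gets an assertion failure otherwise.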

Many patterns don't have sensible names when all <...>s are removed.
E.g. "2" would give a base name "2".  The new functions
therefore require explicit opt-in, which should also help to reduce
code bloat.

The (arbitrary) opt-in syntax I went for was to prefix the pattern
name with '@', similarly to the existing '*' marker.

The patch also makes config/aarch64 use the new routines in cases where
they obviously apply.  This was mostly straightforward, but it seemed
odd that we defined:

   aarch64_reload_movcp<...>

but then only used it with DImode, never SImode.  If we should be
using Pmode instead of DImode, then that's a simple change,
but should probably be a separate patch.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  I think I can self-approve the gen* bits,
but OK for the AArch64 parts?

Any objections to this approach or syntax?

Richard


2018-07-13  Richard Sandiford  

gcc/
* doc/md.texi: Expand the documentation of instruction names
to mention port-local uses.  Document '@' in pattern names.
* read-md.h (overloaded_instance, overloaded_name): New structs.
(mapping): Declare.
(md_reader::handle_overloaded_name): New member function.
(md_reader::get_overloads): Likewise.
(md_reader::m_first_overload): New member variable.
(md_reader::m_next_overload_ptr): Likewise.
(md_reader::m_overloads_htab): Likewise.
* read-md.c (md_reader::md_reader): Initialize m_first_overload,
m_next_overload_ptr and m_overloads_htab.
* read-rtl.c (iterator_group): Add "type" and "get_c_token" fields.
(get_mode_token, get_code_token, get_int_token): New functions.
(map_attr_string): Add an optional argument that passes back
the associated iterator.
(overloaded_name_hash, overloaded_name_eq_p, named_rtx_p)
(md_reader::handle_overloaded_name, add_overload_instance): New
functions.
(apply_iterators): Handle '@' names.  Report an error if '@'
is used without iterators.
(initialize_iterators): Initialize the new iterator_group fields.
* gencodes.c (handle_overloaded_code_for): New function.
(main): Use it to print declarations of maybe_code_for_* functions
and inline definitions of code_for_*.
* genflags.c (emit_overloaded_gen_proto): New function.
(main): Use it to print declarations of maybe_gen_* functions
and inline definitions of gen_*.
* genemit.c (print_overload_arguments, print_overload_test)
(handle_overloaded_code_for, handle_overloaded_gen): New functions.
(main): Use them to print definitions of maybe_code_for_* and
maybe_gen_* functions.
* config/aarch64/aarch64.c (aarch64_split_128bit_move): Use
gen_aarch64_mov{low,high}_di and gen_aarch64_movdi_{low,high}
instead of explicit mode checks.
(aarch64_split_simd_combine): Likewise gen_aarch64_simd_combine.
(aarch64_split_simd_move): Likewise gen_aarch64_split_simd_mov.
(aarch64_emit_load_exclusive): Likewise gen_aarch64_load_exclusive.
(aarch64_emit_store_exclusive): Likewise gen_aarch64_store_exclusive.
(aarch64_expand_compare_and_swap): Likewise
gen_aarch64_compare_and_swap and gen_aarch64_compare_and_swap_lse.
(aarch64_gen_atomic_cas): Likewise gen_aarch64_atomic_cas.
(aarch64_emit_atomic_swap): Likewise gen_aarch64_atomic_swp.
(aarch64_constant_pool_reload_icode): Delete.
(aarch64_secondary_reload): Use code_for_aarch64_reload_movcp
instead of aarch64_constant_pool_reload_icode.  Use
code_for_aarch64_reload_mov instead of explicit mode checks.
(rsqrte_type, get_rsqrte_type, rsqrts_type, get_rsqrts_type): Delete.
(aarch64_emit_approx_sqrt): Use gen_aarch64_rsqrte instead of
get_rsqrte_type and gen_aarch64_rsqrts instead of get_rsqrts_type.
(recpe_type, get_recpe_type, recps_type, get_recps_type): Delete.
(aarch64_emit_approx_div): Use gen_aarch64_frecpe instead of
get_recpe_type and gen_aarch64_frecps instead of get_recps_type.
(aarch64_atomic_load_op_code): Delete.
(aarch64_emit_atomic_load_op): Likewise.
(aarch64_gen_atomic_ldop): Use UNSPECV_ATOMIC_* instead of
aarch64_atomic_load_op_code.  

[testsuite] Robustify target_tls_runtime check

2018-07-13 Thread Eric Botcazou
As witnessed by the kludge added for MSP430 and Visium, the check doesn't 
really work (that's also visible for arm-eabi).

Tested on x86_64-suse-linux, visium-elf and arm-eabi, OK for mainline?


2018-07-13  Eric Botcazou  

* lib/target-supports.exp (check_effective_target_tls_runtime): Force
global-dynamic model for thread variable and remove kludge.

-- 
Eric Botcazou

Index: lib/target-supports.exp
===
--- lib/target-supports.exp	(revision 262551)
+++ lib/target-supports.exp	(working copy)
@@ -878,13 +878,8 @@ proc check_effective_target_tls_emulated
 # Return 1 if TLS executables can run correctly, 0 otherwise.
 
 proc check_effective_target_tls_runtime {} {
-# The runtime does not have TLS support, but just
-# running the test below is insufficient to show this.
-if { [istarget msp430-*-*] || [istarget visium-*-*] } {
-	return 0
-}
 return [check_runtime tls_runtime {
-	__thread int thr = 0;
+	__thread int thr __attribute__((tls_model("global-dynamic"))) = 0;
 	int main (void) { return thr; }
 } [add_options_for_tls ""]]
 }


Re: [PATCH] reject conflicting attributes before calling handlers (PR 86453)

2018-07-13 Thread Christophe Lyon
Hi,

On Thu, 12 Jul 2018 at 00:04, Martin Sebor  wrote:
>
> The attached change set adjusts the attribute exclusion code
> to detect and reject incompatible attributes before attribute
> handlers are called to have a chance to make changes despite
> the exclusions.  The handlers are not run when a conflict is
> found.
>
> Tested on x86_64-linux.  I expected the fallout to be bigger
> but only a handful of tests needed adjusting and the changes
> all look like clear improvements.  I.e., conflicting attributes
> that are diagnosed as being ignored really are being ignored, as one
> would expect.
>

Since you committed this patch (r262596), I've noticed regressions on
aarch64/arm:
g++.dg/warn/pr86453.C  -std=c++11  (test for warnings, line 4)
g++.dg/warn/pr86453.C  -std=c++11 (test for excess errors)
g++.dg/warn/pr86453.C  -std=c++14  (test for warnings, line 4)
g++.dg/warn/pr86453.C  -std=c++14 (test for excess errors)
g++.dg/warn/pr86453.C  -std=c++98  (test for warnings, line 4)
g++.dg/warn/pr86453.C  -std=c++98 (test for excess errors)

The log says:
Excess errors:
/gcc/testsuite/g++.dg/warn/pr86453.C:4:44: warning: ignoring attribute
'packed' because it conflicts with attribute 'aligned' [-Wattributes]

Isn't there the same message on x86_64?


> Martin


Re: [testsuite] Minor tweak to 4 Aarch64 testcases

2018-07-13 Thread Kyrill Tkachov

Hi Eric,

On 13/07/18 09:23, Eric Botcazou wrote:

These 4 Aarch64 testcases use dg-xfail-if to disable themselves on ARM, while
all the other equivalent testcases use dg-skip-if.  The latter form is better
because it doesn't unnecessarily pollute the testsuite report.

Tested on arm-eabi, OK for the mainline?




I used xfail for these testcases in particular because the intrinsics that they test
should be available for both arm and aarch64.
They are currently not implemented on arm, even though they should be.
The other tests that are skipped instead of xfailed test intrinsics that should
only be available on aarch64 and not arm.

Cheers,
Kyrill


2018-07-13  Eric Botcazou  

* gcc.target/aarch64/advsimd-intrinsics/vld1x2.c: Replace dg-xfail-if
with dg-skip-if for ARM targets.
* gcc.target/aarch64/advsimd-intrinsics/vld1x3.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst1x2.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst1x3.c: Likewise.

--
Eric Botcazou




[testsuite] Minor tweak to 4 Aarch64 testcases

2018-07-13 Thread Eric Botcazou
These 4 Aarch64 testcases use dg-xfail-if to disable themselves on ARM, while 
all the other equivalent testcases use dg-skip-if.  The latter form is better 
because it doesn't unnecessarily pollute the testsuite report.

Tested on arm-eabi, OK for the mainline?


2018-07-13  Eric Botcazou  

* gcc.target/aarch64/advsimd-intrinsics/vld1x2.c: Replace dg-xfail-if
with dg-skip-if for ARM targets.
* gcc.target/aarch64/advsimd-intrinsics/vld1x3.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst1x2.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst1x3.c: Likewise.

-- 
Eric Botcazou

Index: gcc.target/aarch64/advsimd-intrinsics/vld1x2.c
===
--- gcc.target/aarch64/advsimd-intrinsics/vld1x2.c	(revision 262551)
+++ gcc.target/aarch64/advsimd-intrinsics/vld1x2.c	(working copy)
@@ -1,7 +1,7 @@
 /* We haven't implemented these intrinsics for arm yet.  */
-/* { dg-xfail-if "" { arm*-*-* } } */
 /* { dg-do run } */
 /* { dg-options "-O3" } */
+/* { dg-skip-if "" { arm*-*-* } } */
 
 #include 
 
Index: gcc.target/aarch64/advsimd-intrinsics/vld1x3.c
===
--- gcc.target/aarch64/advsimd-intrinsics/vld1x3.c	(revision 262551)
+++ gcc.target/aarch64/advsimd-intrinsics/vld1x3.c	(working copy)
@@ -1,7 +1,7 @@
 /* We haven't implemented these intrinsics for arm yet.  */
-/* { dg-xfail-if "" { arm*-*-* } } */
 /* { dg-do run } */
 /* { dg-options "-O3" } */
+/* { dg-skip-if "" { arm*-*-* } } */
 
 #include 
 #include "arm-neon-ref.h"
Index: gcc.target/aarch64/advsimd-intrinsics/vst1x2.c
===
--- gcc.target/aarch64/advsimd-intrinsics/vst1x2.c	(revision 262551)
+++ gcc.target/aarch64/advsimd-intrinsics/vst1x2.c	(working copy)
@@ -1,7 +1,7 @@
 /* We haven't implemented these intrinsics for arm yet.  */
-/* { dg-xfail-if "" { arm*-*-* } } */
 /* { dg-do run } */
 /* { dg-options "-O3" } */
+/* { dg-skip-if "" { arm*-*-* } } */
 
 #include 
 #include "arm-neon-ref.h"
Index: gcc.target/aarch64/advsimd-intrinsics/vst1x3.c
===
--- gcc.target/aarch64/advsimd-intrinsics/vst1x3.c	(revision 262551)
+++ gcc.target/aarch64/advsimd-intrinsics/vst1x3.c	(working copy)
@@ -1,7 +1,7 @@
 /* We haven't implemented these intrinsics for arm yet.  */
-/* { dg-xfail-if "" { arm*-*-* } } */
 /* { dg-do run } */
 /* { dg-options "-O3" } */
+/* { dg-skip-if "" { arm*-*-* } } */
 
 #include 
 #include "arm-neon-ref.h"


Re: [PATCH, S390] Increase function alignment to 16 bytes

2018-07-13 Thread Andreas Krebbel
On 07/12/2018 01:34 PM, Robin Dapp wrote:
> Hi,
> 
>> Please skip '+  && !opts->x_optimize_size)'. I'm attaching a patch that will
>> set opts->x_flag_align_functions to 0 for -Os. It's part of another batch of
>> alignment patches I'm preparing.
> 
> Done in the attached version; I also added some tests (which do not all
> fail without the patch, as we can get lucky with the alignment).
> 
> Regtested on s390x.
> 
> Regards
>  Robin
> 
> --
> 
> gcc/ChangeLog:
> 
> 2018-07-12  Robin Dapp  
> 
>   * config/s390/s390.c (s390_default_align): Set default function
>   alignment to 16.
>   (s390_override_options_after_change): Call s390_default_align.
>   (s390_option_override_internal): Call s390_default_align.
>   (TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE): Define.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-07-12  Robin Dapp  
> 
>   * gcc.target/s390/function-align1.c: New test.
>   * gcc.target/s390/function-align2.c: New test.
>   * gcc.target/s390/function-align3.c: New test.
> 

Ok to apply. Thanks!

Andreas
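On why the new tests may pass without the patch: a function is 16-byte aligned when its entry address is a multiple of 16, and a smaller default alignment can still place a given function on such a boundary by chance. A minimal C sketch, assuming GCC's aligned function attribute and that converting a function pointer to uintptr_t yields its entry address (true on common targets, but implementation-defined); the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Explicitly request 16-byte alignment for one function, as a stand-in
   for a 16-byte target default.  */
__attribute__((aligned (16), noinline)) void
aligned_fn (void)
{
}

/* Return nonzero if F's entry address is a multiple of ALIGN, which must
   be a power of two.  */
static int
fn_is_aligned (void (*f) (void), uintptr_t align)
{
  return ((uintptr_t) f & (align - 1)) == 0;
}
```

A function without the attribute may or may not pass the same check depending on where the linker places it, which is why a passing alignment test is weak evidence on its own.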



[SPARC] Minor tweak

2018-07-13 Thread Eric Botcazou
Tested on SPARC/Solaris, applied on the mainline.


2018-07-13  Eric Botcazou  

* config/sparc/sparc-protos.h (sparc_compute_frame_size): Delete.
* config/sparc/sparc.c (sparc_compute_frame_size): Make static.

-- 
Eric Botcazou

Index: config/sparc/sparc-protos.h
===
--- config/sparc/sparc-protos.h	(revision 262551)
+++ config/sparc/sparc-protos.h	(working copy)
@@ -31,7 +31,6 @@ extern unsigned long sparc_type_code (tr
 #endif /* TREE_CODE */
 
 extern void order_regs_for_local_alloc (void);
-extern HOST_WIDE_INT sparc_compute_frame_size (HOST_WIDE_INT, int);
 extern int sparc_initial_elimination_offset (int);
 extern void sparc_expand_prologue (void);
 extern void sparc_flat_expand_prologue (void);
Index: config/sparc/sparc.c
===
--- config/sparc/sparc.c	(revision 262551)
+++ config/sparc/sparc.c	(working copy)
@@ -5459,7 +5459,7 @@ save_local_or_in_reg_p (unsigned int reg
 /* Compute the frame size required by the function.  This function is called
during the reload pass and also by sparc_expand_prologue.  */
 
-HOST_WIDE_INT
+static HOST_WIDE_INT
 sparc_compute_frame_size (HOST_WIDE_INT size, int leaf_function)
 {
   HOST_WIDE_INT frame_size, apparent_frame_size;


Re: abstract wide int binop code from VRP

2018-07-13 Thread Aldy Hernandez




On 07/13/2018 03:02 AM, Richard Biener wrote:

On Thu, Jul 12, 2018 at 10:12 AM Aldy Hernandez  wrote:



So besides the general discussion about references/pointers for out parameters
let's stay consistent within related APIs.  This means wide_int_binop should
have a

wide_int
wide_int_binop (enum tree_code, const wide_int &, const wide_int &,
signop, wi::overflow_type *)

signature.  Notice how I turned the out wide_int parameter into the
return value, which means the function isn't supposed to fail, which in
turn means gcc_unreachable () for "unhandled" tree codes.


wide_int_binop was returning failure for:


    case CEIL_DIV_EXPR:
      if (arg2 == 0)
        return false;
      res = wi::div_ceil (arg1, arg2, sign, );
      break;

    case ROUND_DIV_EXPR:
      if (arg2 == 0)
        return false;
      res = wi::div_round (arg1, arg2, sign, );
      break;

etc

How do you suggest we indicate success/failure to the caller?

Aldy
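One possible shape for this, sketched in C with a simplified, hypothetical model (int64_t instead of wide_int, and invented op and overflow enums): keep a bool return for "no compile-time value", such as division by zero, and pass the result and overflow state back through pointers, so the outputs stay visible at the call site as &res and &ovf. The sketch relies on GCC's __builtin_add_overflow and __builtin_mul_overflow, which store the wrapped result and return true on overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for tree codes and overflow kinds.  */
enum op { OP_PLUS, OP_MULT, OP_TRUNC_DIV };
enum overflow_type { OVF_NONE, OVF_OVERFLOW };

/* Combine A and B under CODE.  On success store the result in *RES, the
   overflow state in *OVF, and return true.  Return false when there is no
   compile-time value (division by zero or an unhandled code).  */
static bool
int64_binop (int64_t *res, enum op code, int64_t a, int64_t b,
             enum overflow_type *ovf)
{
  *ovf = OVF_NONE;
  switch (code)
    {
    case OP_PLUS:
      if (__builtin_add_overflow (a, b, res))
        *ovf = OVF_OVERFLOW;
      return true;
    case OP_MULT:
      if (__builtin_mul_overflow (a, b, res))
        *ovf = OVF_OVERFLOW;
      return true;
    case OP_TRUNC_DIV:
      if (b == 0)
        return false;		/* No value: report failure, not overflow.  */
      if (a == INT64_MIN && b == -1)
        {
          *res = INT64_MIN;	/* Wrapped result of the only overflow case.  */
          *ovf = OVF_OVERFLOW;
          return true;
        }
      *res = a / b;
      return true;
    }
  return false;			/* Unhandled code.  */
}
```

The alternative proposed above, returning the result directly, would move the division-by-zero check to the callers and turn the final return false into gcc_unreachable ().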


It's more like an exceptional state anyway.

The same goes for the poly_int_binop signature.

The already existing wi::accumulate_overflow should probably be re-done as

wi::overflow_type wi::accumulate_overflow (wi::overflow_type,
wi::overflow_type);
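A value-returning accumulate_overflow along the lines suggested here could look like the C sketch below. The enum is a stand-in for wi::overflow_type. The combination rules are an assumption about the intended semantics, not a copy of GCC's implementation: no new overflow keeps the old state, matching kinds are preserved, and conflicting kinds become unknown.

```c
#include <assert.h>

/* Hypothetical stand-in for wi::overflow_type.  */
enum overflow_type { OVF_NONE, OVF_UNDERFLOW, OVF_OVERFLOW, OVF_UNKNOWN };

/* Combine the accumulated state ACC with the state SUB of a further
   operation and return the new accumulated state.  */
static enum overflow_type
accumulate_overflow (enum overflow_type acc, enum overflow_type sub)
{
  if (sub == OVF_NONE)
    return acc;			/* Nothing new to record.  */
  if (acc == OVF_NONE)
    return sub;			/* First overflow seen.  */
  return acc == sub ? acc : OVF_UNKNOWN;	/* Mixed kinds: unknown.  */
}
```

Callers would then write ovf = accumulate_overflow (ovf, subovf); instead of passing ovf by reference.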

Richard.


Thanks for the review!
Aldy


Re: [Patch, Fortran] PR 85599: warn about short-circuiting of logical expressions for non-pure functions

2018-07-13 Thread Janus Weil
Just noticed another problematic case: calls to generic interfaces are
not detected as implicitly pure; see the enhanced test case in the attachment.
Will look into this problem on the weekend ...

Cheers,
Janus

2018-07-12 21:43 GMT+02:00 Janus Weil :
> Hi all,
>
> here is a minor update of the patch, which cures some problems with
> implicitly pure functions in the previous version.
>
> Most implicitly pure functions were actually detected correctly, but
> implicitly pure functions that called other implicitly pure functions
> were not detected properly, and therefore could trigger a warning.
> This is fixed in the current version, which still regtests cleanly
> (note that alloc-comp-3.f90 currently fails due to PR 86417). The test
> case is also enhanced to include the problematic case.
>
> Ok for trunk?
>
> Cheers,
> Janus
>
>
>
> 2018-07-11 23:06 GMT+02:00 Janus Weil :
>> Hi all,
>>
>> after the dust of the heated discussion around this PR has settled a
>> bit, here is another attempt to implement at least some basic warnings
>> about compiler-dependent behavior concerning the short-circuiting of
>> logical expressions.
>>
>> As a reminder (and recap of the previous discussion), the Fortran
>> standard unfortunately is a bit sloppy in this area: It allows
>> compilers to short-circuit the second operand of .AND. / .OR.
>> operators, but does not require this. As a result, compilers can do
>> what they want without conflicting with the standard, and they do:
>> gfortran does short-circuiting (via TRUTH_ANDIF_EXPR/TRUTH_ORIF_EXPR),
>> ifort does not.
>>
>> I'm continuing here the least-invasive approach of keeping gfortran's
>> current behavior, but warning about cases where compilers may produce
>> different results.
>>
>> The attached patch is very close to the version I posted previously
>> (which was already approved by Janne), with the difference that the
>> warnings are now triggered by -Wextra and not -Wsurprising (which is
>> included in -Wall), as suggested by Nick Maclaren. I think this is
>> more reasonable, since not everyone may want to see these warnings.
>>
>> Note that I don't want to warn about all possible optimizations that
>> might be allowed by the standard, but only about those that are
>> actually problematic in practice and result in compiler-dependent
>> behavior.
>>
>> The patch regtests cleanly on x86_64-linux-gnu. Ok for trunk?
>>
>> Cheers,
>> Janus
>>
>>
>> 2018-07-11  Thomas Koenig  
>> Janus Weil  
>>
>> PR fortran/85599
>> * dump-parse-tree (show_attr): Add handling of implicit_pure.
>> * resolve.c (impure_function_callback): New function.
>> (resolve_operator): Call it via gfc_expr_walker.
>>
>>
>> 2018-07-11  Janus Weil  
>>
>> PR fortran/85599
>> * gfortran.dg/short_circuiting.f90: New test.
! { dg-do compile }
! { dg-additional-options "-Wextra" }
!
! PR 85599: warn about short-circuiting of logical expressions for non-pure functions
!
! Contributed by Janus Weil 

module a

   interface impl_pure_a
  module procedure impl_pure_a1
   end interface

contains

   logical function impl_pure_a1()
  impl_pure_a1 = .true.
   end function

end module


program short_circuit

   use a

   logical :: flag
   flag = .false.
   flag = check() .and. flag
   flag = flag .and. check()    ! { dg-warning "might not be evaluated" }
   flag = flag .and. pure_check()
   flag = flag .and. impl_pure_1()
   flag = flag .and. impl_pure_2()
   flag = flag .and. impl_pure_a1()
   flag = flag .and. impl_pure_a()  ! bogus warning here

contains

   logical function check()
  integer, save :: i = 1
  print *, "check", i
  i = i + 1
  check = .true.
   end function

   logical pure function pure_check()
  pure_check = .true.
   end function

   logical function impl_pure_1()
  impl_pure_1 = .true.
   end function

   logical function impl_pure_2()
  impl_pure_2 = impl_pure_1()
   end function


end
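C settles the question that Fortran leaves open: && always short-circuits. The C analogy below (function names hypothetical) shows the two evaluation strategies the thread describes for flag .and. check(). The first behaves like gfortran's TRUTH_ANDIF_EXPR lowering and never calls check () when flag is false; the second evaluates both operands, as the thread reports ifort does, so the side effect in check () happens either way.

```c
#include <assert.h>
#include <stdbool.h>

static int calls;		/* Counts evaluations of check ().  */

/* Impure function with a side effect, like CHECK in the test case.  */
static bool
check (void)
{
  calls++;
  return true;
}

/* Short-circuit evaluation: check () is skipped when FLAG is false.  */
static bool
and_short_circuit (bool flag)
{
  return flag && check ();
}

/* Full evaluation: the right operand is always computed first.  */
static bool
and_full_eval (bool flag)
{
  bool rhs = check ();
  return flag && rhs;
}
```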


Re: abstract wide int binop code from VRP

2018-07-13 Thread Richard Biener
On Thu, Jul 12, 2018 at 10:12 AM Aldy Hernandez  wrote:
>
> On 07/11/2018 01:33 PM, Richard Sandiford wrote:
> > Aldy Hernandez  writes:
> >> On 07/11/2018 08:52 AM, Richard Biener wrote:
> >>> On Wed, Jul 11, 2018 at 8:48 AM Aldy Hernandez  wrote:
> 
>  Hmmm, I think we can do better, and since this hasn't been reviewed yet,
>  I don't think anyone will mind the adjustment to the patch ;-).
> 
>  I really hate int_const_binop_SOME_RANDOM_NUMBER.  We should abstract
>  them into properly named poly_int_binop, wide_int_binop, and tree_binop,
>  and then use a default argument for int_const_binop() to get things going.
> 
>  Sorry for more changes in flight, but I thought we could benefit from
>  more cleanups :).
> 
>  OK for trunk pending tests?
> >>>
> >>> Much of GCC pre-dates function overloading / default args ;)
> >>
> >> Heh...and ANSI C.
> >>
> >>>
> >>> Looks OK but can you please rename your tree_binop to int_cst_binop?
> >>> Or maybe inline it into int_const_binop, also sharing the force_fit_type ()
> >>> tail with poly_int_binop?
> >>
> >> I tried both, but inlining looked cleaner :).  Done.
> >>
> >>>
> >>> What about mixed INTEGER_CST / poly_int constants?  Shouldn't it
> >>> be
> >>>
> >>> if (neither-poly-nor-integer-cst (arg1 || arg2))
> >>>   return NULL_TREE;
> >>> if (poly_int_tree (arg1) || poly_int_tree (arg2))
> >>>   poly-int-stuff
> >>> else if (INTEGER_CST && INTEGER_CST)
> >>>   wide-int-stuff
> >>>
> >>> ?  I see that is a pre-existing issue but if you are at refactoring...
> >>> wi::to_poly_wide should handle INTEGER_CST operands just fine
> >>> I hope.
> >>
> >> This aborted:
> >> gcc_assert (NUM_POLY_INT_COEFFS != 1);
> >>
> >> but even taking it out made the bootstrap die somewhere else.
> >>
> >> If it's ok, I'd rather not tackle this now, as I have some more cleanups
> >> that are pending on this.  If you feel strongly, I could do it at a
> >> later time.
> >>
> >> OK pending tests?
> >
> > LGTM FWIW, just some nits:
> >
> >> -/* Subroutine of int_const_binop_1 that handles two INTEGER_CSTs.  */
> >> +/* Combine two wide ints ARG1 and ARG2 under operation CODE to produce
> >> +   a new constant in RES.  Return FALSE if we don't know how to
> >> +   evaluate CODE at compile-time.  */
> >>
> >> -static tree
> >> -int_const_binop_2 (enum tree_code code, const_tree parg1, const_tree parg2,
> >> -   int overflowable)
> >> +bool
> >> +wide_int_binop (enum tree_code code,
> >> +wide_int , const wide_int , const wide_int ,
> >> +signop sign, wi::overflow_type )
> >>   {
> >
> > IMO we should avoid pass-back by reference like the plague. :-)
> > It's especially confusing when the code does things like:
> >
> >  case FLOOR_DIV_EXPR:
> >if (arg2 == 0)
> >   return false;
> >res = wi::div_floor (arg1, arg2, sign, );
> >break;
> >
> > It looked at first like it was taking the address of a local variable
> > and failing to propagate the information back up.
> >
> > I think we should stick to using pointers for this kind of thing.
> >
>
> Hmmm, I kinda like them.  It just takes some getting used to, but
> generally yields cleaner code as you don't have to keep using '*'
> everywhere.  Plus, the callee can assume the pointer is non-zero.
>
> >> -/* Combine two integer constants PARG1 and PARG2 under operation CODE
> >> -   to produce a new constant.  Return NULL_TREE if we don't know how
> >> +/* Combine two poly int's ARG1 and ARG2 under operation CODE to
> >> +   produce a new constant in RES.  Return FALSE if we don't know how
> >>  to evaluate CODE at compile-time.  */
> >>
> >> -static tree
> >> -int_const_binop_1 (enum tree_code code, const_tree arg1, const_tree arg2,
> >> -   int overflowable)
> >> +static bool
> >> +poly_int_binop (poly_wide_int , enum tree_code code,
> >> +const_tree arg1, const_tree arg2,
> >> +signop sign, wi::overflow_type )
> >>   {
> >
> > Would be good to be consistent about the order of the result and code
> > arguments.  Here it's "result, code" (which seems better IMO),
> > but in wide_int_binop it's "code, result".
>
> Agreed.  Fixed.
>
> >
> >> +/* Combine two integer constants PARG1 and PARG2 under operation CODE
> >> +   to produce a new constant.  Return NULL_TREE if we don't know how
> >> +   to evaluate CODE at compile-time.  */
> >> +
> >>   tree
> >> -int_const_binop (enum tree_code code, const_tree arg1, const_tree arg2)
> >> +int_const_binop (enum tree_code code, const_tree arg1, const_tree arg2,
> >> + int overflowable)
> >
> > s/PARG/ARG/g in comment.
>
> Fixed.
>
> >
> >>   {
> >> -  return int_const_binop_1 (code, arg1, arg2, 1);
> >> +  bool success = false;
> >> +  poly_wide_int poly_res;
> >> +  tree type = TREE_TYPE (arg1);
> >> +  signop sign = TYPE_SIGN (type);
> >> +  wi::overflow_type overflow = wi::OVF_NONE;
> >> +

Re: [PATCH] x86: Tune Skylake, Cannonlake and Icelake as Haswell

2018-07-13 Thread Uros Bizjak
On Thu, Jul 12, 2018 at 9:57 PM, H.J. Lu  wrote:

> r259399, which added PROCESSOR_SKYLAKE, disabled many x86 optimizations
> which are enabled by PROCESSOR_HASWELL.  As a result, -mtune=skylake
> generates slower code on Skylake than before.  The same also applies
> to Cannonlake and Icelake tuning.
>
> This patch changes -mtune={skylake|cannonlake|icelake} to tune like
> -mtune=haswell until their tuning is properly adjusted.  It also
> enables -mprefer-vector-width=256 for -mtune=haswell, which has no
> impact on codegen when AVX512 isn't enabled.
>
> Performance impacts on SPEC CPU 2017 rate with 1 copy using
>
> -march=native -mfpmath=sse -O2 -m64
>
> are
>
> 1. On Broadwell server:
>
> 500.perlbench_r  -0.56%
> 502.gcc_r        -0.18%
> 505.mcf_r         0.24%
> 520.omnetpp_r     0.00%
> 523.xalancbmk_r  -0.32%
> 525.x264_r       -0.17%
> 531.deepsjeng_r   0.00%
> 541.leela_r       0.00%
> 548.exchange2_r   0.12%
> 557.xz_r          0.00%
> geomean           0.00%
>
> 503.bwaves_r      0.00%
> 507.cactuBSSN_r   0.21%
> 508.namd_r        0.00%
> 510.parest_r      0.19%
> 511.povray_r     -0.48%
> 519.lbm_r         0.00%
> 521.wrf_r         0.28%
> 526.blender_r     0.19%
> 527.cam4_r        0.39%
> 538.imagick_r     0.00%
> 544.nab_r        -0.36%
> 549.fotonik3d_r   0.51%
> 554.roms_r        0.00%
> geomean           0.17%
>
> 2. On Skylake client:
>
> 500.perlbench_r   0.96%
> 502.gcc_r         0.13%
> 505.mcf_r        -1.03%
> 520.omnetpp_r    -1.11%
> 523.xalancbmk_r   1.02%
> 525.x264_r        0.50%
> 531.deepsjeng_r   2.97%
> 541.leela_r       0.50%
> 548.exchange2_r  -0.95%
> 557.xz_r          2.41%
> geomean           0.56%
>
> 503.bwaves_r      0.49%
> 507.cactuBSSN_r   3.17%
> 508.namd_r        4.05%
> 510.parest_r      0.15%
> 511.povray_r      0.80%
> 519.lbm_r         3.15%
> 521.wrf_r        10.56%
> 526.blender_r     2.97%
> 527.cam4_r        2.36%
> 538.imagick_r    46.40%
> 544.nab_r         2.04%
> 549.fotonik3d_r   0.00%
> 554.roms_r        1.27%
> geomean           5.49%
>
> 3. On Skylake server:
>
> 500.perlbench_r   0.71%
> 502.gcc_r        -0.51%
> 505.mcf_r        -1.06%
> 520.omnetpp_r    -0.33%
> 523.xalancbmk_r  -0.22%
> 525.x264_r        1.72%
> 531.deepsjeng_r  -0.26%
> 541.leela_r       0.57%
> 548.exchange2_r  -0.75%
> 557.xz_r         -1.28%
> geomean          -0.21%
>
> 503.bwaves_r      0.00%
> 507.cactuBSSN_r   2.66%
> 508.namd_r        3.67%
> 510.parest_r      1.25%
> 511.povray_r      2.26%
> 519.lbm_r         1.69%
> 521.wrf_r        11.03%
> 526.blender_r     3.39%
> 527.cam4_r        1.69%
> 538.imagick_r    64.59%
> 544.nab_r        -0.54%
> 549.fotonik3d_r   2.68%
> 554.roms_r        0.00%
> geomean           6.19%
>
> This patch improves -march=native performance on Skylake by up to about
> 65% (538.imagick_r on Skylake server) and leaves -march=native performance
> essentially unchanged on Haswell.
>
> OK for trunk?
>
> Thanks.
>
> H.J.
> ---
> gcc/
>
> 2018-07-12  H.J. Lu  
> Sunil K Pandey  
>
> PR target/84413
> * config/i386/i386.c (m_HASWELL): Add PROCESSOR_SKYLAKE,
> PROCESSOR_SKYLAKE_AVX512, PROCESSOR_CANNONLAKE,
> PROCESSOR_ICELAKE_CLIENT and PROCESSOR_ICELAKE_SERVER.
> (m_SKYLAKE): Set to 0.
> (m_SKYLAKE_AVX512): Likewise.
> (m_CANNONLAKE): Likewise.
> (m_ICELAKE_CLIENT): Likewise.
> (m_ICELAKE_SERVER): Likewise.
> * config/i386/x86-tune.def (avx256_optimal): Also enabled for
> m_HASWELL.
>
> gcc/testsuite/
>
> 2018-07-12  H.J. Lu  
> Sunil K Pandey  
>
> PR target/84413
> * gcc.target/i386/pr84413-1.c: New test.
> * gcc.target/i386/pr84413-2.c: Likewise.
> * gcc.target/i386/pr84413-3.c: Likewise.
> * gcc.target/i386/pr84413-4.c: Likewise.
> ---
>  gcc/config/i386/i386.c                    | 17 +++--
>  gcc/config/i386/x86-tune.def              |  9 ++---
>  gcc/testsuite/gcc.target/i386/pr84413-1.c | 17 +
>  gcc/testsuite/gcc.target/i386/pr84413-2.c | 17 +
>  gcc/testsuite/gcc.target/i386/pr84413-3.c | 17 +
>  gcc/testsuite/gcc.target/i386/pr84413-4.c | 17 +
>  6 files changed, 85 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr84413-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr84413-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr84413-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr84413-4.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 9e46b7b136f..762ab89fc9e 
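The ChangeLog's approach folds the newer processors into m_HASWELL and sets their own masks to 0. This works because each tuning entry is a bitmask of processors. Roughly, the i386 backend builds its m_* macros with one bit per processor and tests each flag against the bit of the processor being tuned for. Below is a minimal sketch of that idea. The processor list, tuning flags and table are hypothetical and heavily reduced, not the real i386.c or x86-tune.def contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced processor list.  */
enum processor
{
  PROC_HASWELL,
  PROC_SKYLAKE,
  PROC_SKYLAKE_AVX512,
  PROC_ICELAKE_SERVER,
  PROC_MAX
};

/* One bit per processor.  */
#define PROC_BIT(p) (UINT64_C (1) << (p))

/* After the change: the Haswell mask also names the newer cores and
   their own masks are 0, so any tuning entry listing m_HASWELL now
   applies to them too.  */
#define m_HASWELL                                             \
  (PROC_BIT (PROC_HASWELL) | PROC_BIT (PROC_SKYLAKE)          \
   | PROC_BIT (PROC_SKYLAKE_AVX512) | PROC_BIT (PROC_ICELAKE_SERVER))
#define m_SKYLAKE 0
#define m_SKYLAKE_AVX512 0
#define m_ICELAKE_SERVER 0

/* Hypothetical tuning flags, each mapped to the processors it is
   enabled for, in the spirit of x86-tune.def.  */
enum tune_flag { TUNE_AVX256_OPTIMAL, TUNE_SKYLAKE_SPECIFIC, TUNE_MAX };

static const uint64_t tune_masks[TUNE_MAX] = {
  /* "Also enabled for m_HASWELL"; the zeroed masks add nothing.  */
  [TUNE_AVX256_OPTIMAL] = m_SKYLAKE_AVX512 | m_ICELAKE_SERVER | m_HASWELL,
  /* An entry written only in terms of m_SKYLAKE now matches no CPU.  */
  [TUNE_SKYLAKE_SPECIFIC] = m_SKYLAKE,
};

/* True if FLAG is enabled when tuning for PROC.  */
static bool
tune_enabled (enum tune_flag flag, enum processor proc)
{
  assert (flag < TUNE_MAX && proc < PROC_MAX);
  return (tune_masks[flag] & PROC_BIT (proc)) != 0;
}
```

The design keeps every existing "m_HASWELL" entry meaningful for the newer cores without touching each tuning line. The cost is that any entry spelled only with the zeroed masks silently stops matching.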

Re: [RFC] Induction variable candidates not sufficiently general

2018-07-13 Thread Richard Biener
On Fri, Jul 13, 2018 at 12:05 AM Kelvin Nilsen  wrote:
>
> A somewhat old "issue report" pointed me to the code generated for a 4-fold 
> manually unrolled version of the following loop:
>
> >   while (++len != len_limit) /* this is loop */
> >     if (pb[len] != cur[len])
> >       break;
>
> As unrolled, the loop appears as:
>
> > while (++len != len_limit) /* this is loop */ {
> >   if (pb[len] != cur[len])
> >     break;
> >   if (++len == len_limit)  /* unrolled 2nd iteration */
> >     break;
> >   if (pb[len] != cur[len])
> >     break;
> >   if (++len == len_limit)  /* unrolled 3rd iteration */
> >     break;
> >   if (pb[len] != cur[len])
> >     break;
> >   if (++len == len_limit)  /* unrolled 4th iteration */
> >     break;
> >   if (pb[len] != cur[len])
> >     break;
> > }
>
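Whether the manual unrolling preserves the loop's meaning is easy to check mechanically. The sketch below transcribes both forms as functions that return the final value of len. It assumes the buffers hold unsigned char and that len < len_limit on entry (otherwise ++len steps past len_limit and the loop never stops at it). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled loop from the report.  Returns the first index above the
   starting LEN where the buffers differ, or LEN_LIMIT.  */
static size_t
match_rolled (const unsigned char *pb, const unsigned char *cur,
              size_t len, size_t len_limit)
{
  assert (len < len_limit);  /* Precondition for termination.  */
  while (++len != len_limit)
    if (pb[len] != cur[len])
      break;
  return len;
}

/* The 4-fold manually unrolled version, transcribed from the report.  */
static size_t
match_unrolled4 (const unsigned char *pb, const unsigned char *cur,
                 size_t len, size_t len_limit)
{
  assert (len < len_limit);
  while (++len != len_limit)
    {
      if (pb[len] != cur[len])
        break;
      if (++len == len_limit)  /* unrolled 2nd iteration */
        break;
      if (pb[len] != cur[len])
        break;
      if (++len == len_limit)  /* unrolled 3rd iteration */
        break;
      if (pb[len] != cur[len])
        break;
      if (++len == len_limit)  /* unrolled 4th iteration */
        break;
      if (pb[len] != cur[len])
        break;
    }
  return len;
}
```

Comparing the two over every start and limit within a short buffer exercises all four exit points of the unrolled body.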
> In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered that the
> only induction variable candidates being considered are all forms of the
> len variable.  We are not considering any induction variables to represent
> the address expressions &pb[len] and &cur[len].

I am surprised - did you dig into why?  IVOPTs does generally
consider pointer IVs.

Richard.

> I rewrote the source code for this loop to make the addressing expressions 
> more explicit, as in the following:
>
> >   cur++;
> >   while (++pb != last_pb) /* this is loop */ {
> >     if (*pb != *cur)
> >       break;
> >     ++cur;
> >     if (++pb == last_pb)  /* unrolled 2nd iteration */
> >       break;
> >     if (*pb != *cur)
> >       break;
> >     ++cur;
> >     if (++pb == last_pb)  /* unrolled 3rd iteration */
> >       break;
> >     if (*pb != *cur)
> >       break;
> >     ++cur;
> >     if (++pb == last_pb)  /* unrolled 4th iteration */
> >       break;
> >     if (*pb != *cur)
> >       break;
> >     ++cur;
> >   }
>
> Now, gcc does a better job of identifying the "address expression induction 
> variables".  This version of the loop runs about 10% faster than the original 
> on my target architecture.
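The rewrite amounts to strength reduction done by hand. The address computations pb + len and cur + len become pointers that are simply incremented, and the exit test compares a pointer against a precomputed end. The sketch below shows a rolled-up (not unrolled) form of both versions. It assumes unsigned char buffers and len < len_limit on entry, and the function names are hypothetical. It only illustrates the source-level transformation, not how IVOPTs represents candidates internally.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form: each access is an address computed from base + len.  */
static size_t
match_indexed (const unsigned char *pb, const unsigned char *cur,
               size_t len, size_t len_limit)
{
  assert (len < len_limit);
  while (++len != len_limit)
    if (pb[len] != cur[len])
      break;
  return len;
}

/* Pointer form: P and Q are induction variables standing for &pb[len]
   and &cur[len], and the exit test compares P against a precomputed end
   pointer instead of comparing LEN against LEN_LIMIT.  */
static size_t
match_pointer (const unsigned char *pb, const unsigned char *cur,
               size_t len, size_t len_limit)
{
  assert (len < len_limit);
  const unsigned char *p = pb + len;
  const unsigned char *q = cur + len;
  const unsigned char *const last_p = pb + len_limit;
  while (++p != last_p)
    {
      ++q;
      if (*p != *q)
        break;
    }
  return (size_t) (p - pb);  /* Recover LEN from the pointer IV.  */
}
```

In the pointer form, len is no longer needed inside the loop and can be recomputed once from p after exit. This is the kind of candidate the question asks IVOPTs to consider.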
>
> This would seem to be a textbook pattern for induction variable analysis.
> Does anyone have any thoughts on the best way to add these candidates to the
> set of induction variables considered by tree-ssa-loop-ivopts.c?
>
> Thanks in advance for any suggestions.
>