[PATCH] Fix PR c++/70106 (type of parenthesized qualified-id has wrong cv-qualifiers)

2016-03-10 Thread Patrick Palka
Here's another force_paren_expr issue.  Hopefully I described the issue
adequately in the inline comments that are part of the patch.  The
relevant function that builds the temporary SCOPE_REF is
finish_non_static_data_member.

Does this patch look OK to commit after bootstrap + regtest?
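
For readers who want the expected semantics spelled out, here is a minimal
sketch (not part of the patch) of what the type of a parenthesized
qualified-id should be inside a const member function.  The helper name
check_paren_qualified_id is hypothetical, and the template parameter is an
assumption made only for this illustration.

```cpp
#include <cassert>
#include <type_traits>

template <typename T>
struct A
{
  int x;

  bool foo () const
  {
    // Unparenthesized: decltype yields the declared type of the member.
    static_assert (std::is_same<decltype (A::x), int>::value,
                   "declared type of x");
    // Parenthesized: an lvalue expression that picks up the const
    // qualification of the implicit *this object.
    static_assert (std::is_same<decltype ((A::x)), const int &>::value,
                   "const lvalue inside a const member function");
    return true;
  }
};

// Hypothetical helper: instantiates A<int>::foo so both assertions are checked.
bool check_paren_qualified_id ()
{
  return A<int> ().foo ();
}
```

If the cv-qualifiers of *this were ignored, the second static_assert is the
one that would fail.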

gcc/cp/ChangeLog:

PR c++/70106
* semantics.c (force_paren_expr): Just build a PAREN_EXPR when
processing_template_decl and EXPR is a SCOPE_REF.

gcc/testsuite/ChangeLog:

PR c++/70106
* g++.dg/cpp1y/paren3.C: New test.
---
 gcc/cp/semantics.c  | 10 +-
 gcc/testsuite/g++.dg/cpp1y/paren3.C | 30 ++
 2 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/paren3.C

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index a9dbf16..874dbe2 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -1649,7 +1649,15 @@ force_paren_expr (tree expr)
 
   if (TREE_CODE (expr) == COMPONENT_REF)
 REF_PARENTHESIZED_P (expr) = true;
-  else if (type_dependent_expression_p (expr))
+  else if (type_dependent_expression_p (expr)
+  /* When processing_template_decl, a SCOPE_REF may actually be
+ referring to a non-static data member of the current class, in
+ which case its TREE_TYPE may not be properly cv-qualified (the
+ cv-qualifiers of the implicit *this object haven't yet been taken
+ into account) so we have to delay building a static_cast until
+ instantiation.  */
+  || (processing_template_decl
+  && TREE_CODE (expr) == SCOPE_REF))
 expr = build1 (PAREN_EXPR, TREE_TYPE (expr), expr);
   else if (VAR_P (expr) && DECL_HARD_REGISTER (expr))
 /* We can't bind a hard register variable to a reference.  */;
diff --git a/gcc/testsuite/g++.dg/cpp1y/paren3.C b/gcc/testsuite/g++.dg/cpp1y/paren3.C
new file mode 100644
index 000..48359b0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/paren3.C
@@ -0,0 +1,30 @@
+// PR c++/70106
+// { dg-do compile { target c++14 } }
+
+template 
+struct A
+{
+  int x;
+
+  void foo () const {
+(A::x);
+  };
+};
+
+struct B
+{
+  int x;
+
+  template 
+  void foo () const {
+(B::x);
+  };
+};
+
+void
+foo ()
+{
+  A ().foo ();
+  B ().foo ();
+}
+
-- 
2.8.0.rc1.12.gfce6d53



[RFA][PATCH][PR tree-optimization/64058] Improve and stabilize sorting of coalesce pairs

2016-03-10 Thread Jeff Law


As discussed in the BZ, we have multiple problems with how we sort the 
coalesce list during out-of-ssa coalescing.


First, the sort is not stable.  If the cost of two coalesce pairs is the 
same, we break the tie by looking at the underlying SSA_NAME_VERSION of 
the first, then the second elements in the coalesce pairs.


As a result, changes in SSA_NAME_VERSIONs in the IL can result in 
different coalescing during out-of-ssa.  That in turn can cause changes 
in what objects are coalesced, which in turn causes random performance 
changes.


This patch addresses that problem by recording an index for each 
coalescing pair discovered and using that index as the final tiebreaker 
rather than looking at SSA_NAME_VERSIONs.  That brings stability to the 
coalescing process and avoids a lot of unnecessary differences in the 
code we generate when SSA_NAME_VERSIONs change.


The second problem is our costing heuristic only looks at edge 
frequencies & flags.  It's actually a pretty good heuristic and captures 
the main goal of coalescing -- reducing the most commonly executed 
copies.  However, in the case where the edge frequencies/flags result in 
the same cost we can do better.


When we coalesce two SSA_NAMEs, we have to build the union of the 
conflicts of each of the SSA_NAMEs -- which means the resulting union 
object is less likely to be able to participate in further coalescing.


So given two coalescing pairs with the same primary cost, preferring the 
coalescing pair with the smaller resulting conflict set gives us a 
better chance that the resulting object will be able to participate in 
further coalescing.
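
The ordering described so far can be sketched as a three-level comparison.
This is an illustration only, not the code in tree-ssa-coalesce.c: the
struct and function names are hypothetical, and it assumes that a higher
cost means a more profitable coalesce and that earlier-discovered pairs win
the final tie.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for a coalesce pair.
struct CoalescePair
{
  int first_element;
  int second_element;
  int cost;            // primary key (assumed: higher is better)
  int conflict_count;  // secondary key: smaller conflict set is better
  int index;           // discovery order: final tie breaker
};

// Size of the union of two conflict sets, i.e. the conflict set the
// merged object would have.
int union_size (const std::set<int> &a, const std::set<int> &b)
{
  std::set<int> merged (a);
  merged.insert (b.begin (), b.end ());
  return static_cast<int> (merged.size ());
}

// True if pair A should be attempted before pair B.
bool better_pair (const CoalescePair &a, const CoalescePair &b)
{
  if (a.cost != b.cost)
    return a.cost > b.cost;
  if (a.conflict_count != b.conflict_count)
    return a.conflict_count < b.conflict_count;
  return a.index < b.index;
}

// Returns the discovery indices in the order the pairs would be tried.
std::vector<int> sorted_indices (std::vector<CoalescePair> pairs)
{
  std::sort (pairs.begin (), pairs.end (), better_pair);
  std::vector<int> order;
  for (const CoalescePair &p : pairs)
    order.push_back (p.index);
  return order;
}
```

Because the index is unique per pair, the comparison never reports two
distinct pairs as equal, so the result does not depend on the element
numbers (the SSA_NAME_VERSIONs) at all.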


That heuristic broadly mirrors one aspect of how iterated conservative 
coalescing works.  The other interesting heuristic (that I did not 
implement) was to favor coalescing of the pair which had a higher degree 
of common conflicts between the two nodes -- which broadly falls into 
the same category as what we're doing with this patch.  The key being 
that the conflict sets are an important thing to consider when coalescing.


Using the conflict sizes as a tie-breaker eliminates the regression in 
64058 and AFAICT also eliminates the regression in 68654 (the latter 
doesn't include a testcase or as in-depth analysis as 64058, but my 
testing indicates this patch should generate the desired code for both 
cases).


The patch has (of course) bootstrapped and regression tested on 
x86_64-linux-gnu.


I'd be curious for thoughts on how to build a testcase for this.  I 
could emit the conflict sizes along with the coalescing cost in the 
dumps, but that won't positively verify that we've done the preferred 
set of coalescings.


I might be able to look at the .expand dumps and perhaps look for copies 
on edges.  However, unless the only copies are the ones that were 
causing the regression, I suspect such a test would end up being rather 
fragile.


Other thoughts on how to get this under regression testing?  And of 
course, thoughts on the patch itself?


Thanks,
Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index cc91e84..f28baa2 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,18 @@
+2016-03-10  Jeff Law  
+
+   PR tree-optimization/64058
+   * tree-ssa-coalesce.c (struct coalesce_pair): Add new fields
+   CONFLICT_COUNT and INDEX.
+   (num_coalesce_pairs): Move up earlier in the file.
+   (find_coalesce_pair): Initialize the INDEX field for each pair
+   discovered.
+   (add_conflict_counts): New function to initialize the CONFLICT_COUNT
+   field for each conflict pair.
+   (coalesce_ssa_name): Call it.
+   (compare_pairs): No longer sort on the elements of each pair.
+   Instead break ties with the conflict size and finally the index
+   of the coalesce pair.
+
 2016-03-10  Ulrich Weigand  
 
PR target/70168
diff --git a/gcc/tree-ssa-coalesce.c b/gcc/tree-ssa-coalesce.c
index 6624e7e..b8a2e0d 100644
--- a/gcc/tree-ssa-coalesce.c
+++ b/gcc/tree-ssa-coalesce.c
@@ -50,6 +50,19 @@ struct coalesce_pair
   int first_element;
   int second_element;
   int cost;
+
+  /* A count of the number of unique partitions this pair would conflict
+ with if coalescing was successful.  This is the secondary sort key,
+ given two pairs with equal costs, we will prefer the pair with a smaller
+ conflict set.
+
+ Note this is not updated and propagated as pairs are coalesced.  */
+  int conflict_count;
+
+  /* The order in which coalescing pairs are discovered is recorded in this
+ field, which is used as the final tie breaker when sorting coalesce
+ pairs.  */
+  int index;
 };
 
 /* Coalesce pair hashtable helpers.  */
@@ -254,6 +267,13 @@ delete_coalesce_list (coalesce_list *cl)
   free (cl);
 }
 
+/* Return the number of unique coalesce pairs in CL.  */
+
+static inline int
+num_coalesce_pairs (coalesce_list *cl)
+{
+  return cl->list->elements ();
+}
 
 /* Find a matching 

Re: Wonly-top-basic-asm

2016-03-10 Thread David Wohlferd
So, we have been discussing this issue for 4 months now.  Over that 
time, I have tried to incorporate everyone's feedback.


As a result we have gone from a tiny doc patch (just describe the 
current semantics), to a big doc patch (completely deprecate basic asm 
when used in a function) to a medium doc patch + code fix (warning when 
using basic asm in a function) and now back to a 
slightly-bigger-than-tiny doc patch.


I have made no changes since the last patch I posted 
(https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01406.html) for the 
reasons discussed below.


I assert that this patch both contains important information users need 
and is better than the current text.  I expect that Sandra is prepared 
to check this in as soon as someone signs off on its technical accuracy.


dw

On 2/28/2016 11:02 PM, David Wohlferd wrote:

On 2/26/2016 7:09 AM, Bernd Schmidt wrote:

On 02/21/2016 11:27 AM, David Wohlferd wrote:

So now what?  I have one Bernd who likes the sample, and one who
doesn't.  Obviously I think what I'm proposing is better than what's
there now and I've done my best to say why.  But me believing it to be
better doesn't get anything checked in.


I hadn't thought it through well enough. Jan's objection (order isn't 
guaranteed) is relevant. I'd drop the example.


To be clear: Are you suggesting that we delete the sample that is 
there and have no example at all for basic asm?


I'm not sure I agree.  Looking at the linux kernel source, there are 
times and places where basic asm is appropriate, even necessary.  I 
realize that macros are an uncommon usage.  But it makes for a more 
interesting sample than simply outputting a section name.


If ordering is your concern, would adding a reference to 
-fno-toplevel-reorder make you feel better about this?  It seems 
unnecessary in this particular context, but mentioning this option on 
the basic asm page is certainly appropriate.


dw





Re: [PATCH] Turn some compile-time tests into run-time tests

2016-03-10 Thread Patrick Palka
On Thu, Mar 10, 2016 at 6:38 PM, Patrick Palka  wrote:
> I ran the command
>
>   git grep -l "dg-do compile" | xargs grep -l __builtin_abort | xargs grep -lw main
>
> to find tests marked as compile-time tests that likely ought to instead
> be marked as run-time tests, by the rationale that they use
> __builtin_abort and they also define main().  (I also then confirmed that they
> compile, link and run cleanly on my machine.)
>
> After this patch, the remaining test files reported by the above command
> are:
>
>   These do not define all the functions they use:
> gcc/testsuite/g++.dg/ipa/devirt-41.C
> gcc/testsuite/g++.dg/ipa/devirt-44.C
> gcc/testsuite/g++.dg/ipa/devirt-45.C
> gcc/testsuite/gcc.target/i386/pr55672.c

Actually this i386 test just performs obvious UB which is why I left
it as a compile-time test.


[PATCH] Turn some compile-time tests into run-time tests

2016-03-10 Thread Patrick Palka
I ran the command

  git grep -l "dg-do compile" | xargs grep -l __builtin_abort | xargs grep -lw main

to find tests marked as compile-time tests that likely ought to instead
be marked as run-time tests, by the rationale that they use
__builtin_abort and they also define main().  (I also then confirmed that they
compile, link and run cleanly on my machine.)
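
The same heuristic can be expressed as a predicate over the text of a test
file.  This is only a sketch of the three grep filters above; the function
name looks_like_runtime_test is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if SOURCE is marked as a compile-only test, calls __builtin_abort,
// and mentions "main" as a whole word (the equivalent of grep -w).
bool looks_like_runtime_test (const std::string &source)
{
  static const std::regex main_word ("\\bmain\\b");
  return source.find ("dg-do compile") != std::string::npos
         && source.find ("__builtin_abort") != std::string::npos
         && std::regex_search (source, main_word);
}
```

Like the shell pipeline, this only flags candidates; each file still needs
to be checked by hand (dg-error tests, missing definitions, and so on).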

After this patch, the remaining test files reported by the above command
are:

  These do not define all the functions they use:
gcc/testsuite/g++.dg/ipa/devirt-41.C
gcc/testsuite/g++.dg/ipa/devirt-44.C
gcc/testsuite/g++.dg/ipa/devirt-45.C
gcc/testsuite/gcc.target/i386/pr55672.c

  These are non-x86 tests so I can't confirm that they run cleanly:
gcc/testsuite/gcc.target/arm/pr58041.c
gcc/testsuite/gcc.target/powerpc/pr35907.c
gcc/testsuite/gcc.target/s390/dwarfregtable-1.c
gcc/testsuite/gcc.target/s390/dwarfregtable-2.c
gcc/testsuite/gcc.target/s390/dwarfregtable-3.c

  These use dg-error:
libstdc++-v3/testsuite/20_util/forward/c_neg.cc
libstdc++-v3/testsuite/20_util/forward/f_neg.cc

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK to
commit?  Does anyone have another heuristic one can use to help find
these kinds of typos?

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-aggr2.C: Make it a run-time test.
* g++.dg/cpp0x/nullptr32.C: Likewise.
* g++.dg/cpp1y/digit-sep-cxx11-neg.C: Likewise.
* g++.dg/cpp1y/digit-sep.C: Likewise.
* g++.dg/ext/flexary13.C: Likewise.
* gcc.dg/alias-14.c: Likewise.
* gcc.dg/ipa/PR65282.c: Likewise.
* gcc.dg/pr69644.c: Likewise.
* gcc.dg/tree-ssa/pr38533.c: Likewise.
* gcc.dg/tree-ssa/pr61385.c: Likewise.
---
 gcc/testsuite/g++.dg/cpp0x/constexpr-aggr2.C | 2 +-
 gcc/testsuite/g++.dg/cpp0x/nullptr32.C   | 2 +-
 gcc/testsuite/g++.dg/cpp1y/digit-sep-cxx11-neg.C | 2 +-
 gcc/testsuite/g++.dg/cpp1y/digit-sep.C   | 2 +-
 gcc/testsuite/g++.dg/ext/flexary13.C | 2 +-
 gcc/testsuite/gcc.dg/alias-14.c  | 2 +-
 gcc/testsuite/gcc.dg/ipa/PR65282.c   | 2 +-
 gcc/testsuite/gcc.dg/pr69644.c   | 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr38533.c  | 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr61385.c  | 2 +-
 10 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-aggr2.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-aggr2.C
index 805d026..ed01aad 100644
--- a/gcc/testsuite/g++.dg/cpp0x/constexpr-aggr2.C
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-aggr2.C
@@ -1,5 +1,5 @@
 // PR c++/68782
-// { dg-do compile { target c++11 } }
+// { dg-do run { target c++11 } }
 
 #define assert(X) do { if (!(X)) __builtin_abort(); } while (0)
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/nullptr32.C 
b/gcc/testsuite/g++.dg/cpp0x/nullptr32.C
index a1e6710..4de4140 100644
--- a/gcc/testsuite/g++.dg/cpp0x/nullptr32.C
+++ b/gcc/testsuite/g++.dg/cpp0x/nullptr32.C
@@ -1,6 +1,6 @@
 // PR c++/63942
 // A mangling alias for the first constructor was conflicting with the second.
-// { dg-do compile { target c++11 } }
+// { dg-do run { target c++11 } }
 // { dg-options "-fno-inline" }
 
 int i;
diff --git a/gcc/testsuite/g++.dg/cpp1y/digit-sep-cxx11-neg.C 
b/gcc/testsuite/g++.dg/cpp1y/digit-sep-cxx11-neg.C
index 411ba52..c1cbc6c 100644
--- a/gcc/testsuite/g++.dg/cpp1y/digit-sep-cxx11-neg.C
+++ b/gcc/testsuite/g++.dg/cpp1y/digit-sep-cxx11-neg.C
@@ -1,4 +1,4 @@
-// { dg-do compile { target { ! c++14 } } }
+// { dg-do run { target { ! c++14 } } }
 
 #define assert(E) if(!(E))__builtin_abort();
 
diff --git a/gcc/testsuite/g++.dg/cpp1y/digit-sep.C 
b/gcc/testsuite/g++.dg/cpp1y/digit-sep.C
index 111e7a5..75a12f6 100644
--- a/gcc/testsuite/g++.dg/cpp1y/digit-sep.C
+++ b/gcc/testsuite/g++.dg/cpp1y/digit-sep.C
@@ -1,4 +1,4 @@
-// { dg-do compile { target c++14 } }
+// { dg-do run { target c++14 } }
 
 #define assert(E) if(!(E))__builtin_abort();
 
diff --git a/gcc/testsuite/g++.dg/ext/flexary13.C 
b/gcc/testsuite/g++.dg/ext/flexary13.C
index 462ed65..1347542 100644
--- a/gcc/testsuite/g++.dg/ext/flexary13.C
+++ b/gcc/testsuite/g++.dg/ext/flexary13.C
@@ -1,4 +1,4 @@
-// { dg-do compile }
+// { dg-do run }
 // { dg-options -Wno-pedantic }
 
 #define STR(s) #s
diff --git a/gcc/testsuite/gcc.dg/alias-14.c b/gcc/testsuite/gcc.dg/alias-14.c
index 1ca1c09..24f0d1c 100644
--- a/gcc/testsuite/gcc.dg/alias-14.c
+++ b/gcc/testsuite/gcc.dg/alias-14.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do run } */
 /* { dg-options "-O2" } */
 #include 
 void *a;
diff --git a/gcc/testsuite/gcc.dg/ipa/PR65282.c 
b/gcc/testsuite/gcc.dg/ipa/PR65282.c
index a52bec6..a7a881a 100644
--- a/gcc/testsuite/gcc.dg/ipa/PR65282.c
+++ b/gcc/testsuite/gcc.dg/ipa/PR65282.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do run } */
 /* { dg-options "-Os -fdump-ipa-icf"  } */
 
 int a[2];
diff --git a/gcc/testsuite/gcc.dg/pr69644.c 

Re: [PATCH, rs6000] Fix PR target/70168

2016-03-10 Thread David Edelsohn
On Thu, Mar 10, 2016 at 6:10 PM, Ulrich Weigand  wrote:
> Hello,
>
> this patch fixes PR target/70168, a wrong code generation problem
> caused by rs6000_expand_atomic_compare_and_swap not properly handling
> the case where changing retval clobbers newval due to a register overlap.
>
> Tested with no regressions on powerpc64le-linux on mainline
> and gcc-5-branch.
>
> OK for both?
>
> Bye,
> Ulrich
>
>
> ChangeLog:
>
> PR target/70168
> * config/rs6000/rs6000.c (rs6000_expand_atomic_compare_and_swap):
> Handle overlapping retval and newval.

Okay everywhere.

Thanks, David


[PATCH, rs6000] Fix PR target/70168

2016-03-10 Thread Ulrich Weigand
Hello,

this patch fixes PR target/70168, a wrong code generation problem
caused by rs6000_expand_atomic_compare_and_swap not properly handling
the case where changing retval clobbers newval due to a register overlap.
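
To illustrate the failure mode, here is a toy model of a compare-and-swap
expansion over a small register file.  It is not the rs6000 code: the
register file, the function names and the scratch register are all
hypothetical, and the overlap test is reduced to simple equality of
register numbers.

```cpp
#include <array>
#include <cassert>

using RegFile = std::array<long, 4>;

// Naive expansion: the loaded value is written to RETVAL before NEWVAL is
// read, so NEWVAL is lost if both live in the same register.
bool cas_naive (RegFile &regs, long &mem, int retval, int oldval, int newval)
{
  regs[retval] = mem;
  if (regs[retval] != regs[oldval])
    return false;
  mem = regs[newval];
  return true;
}

// Same expansion, but NEWVAL is first copied to a scratch register when it
// overlaps RETVAL.
bool cas_with_copy (RegFile &regs, long &mem, int retval, int oldval,
                    int newval, int scratch)
{
  if (retval == newval)
    {
      regs[scratch] = regs[newval];
      newval = scratch;
    }
  return cas_naive (regs, mem, retval, oldval, newval);
}

// Memory holds 5, register 1 holds the expected value 5, and register 2
// holds the new value 9 while also being the result register.
long final_mem_naive ()
{
  RegFile regs = {{0, 5, 9, 0}};
  long mem = 5;
  cas_naive (regs, mem, 2, 1, 2);
  return mem;
}

long final_mem_with_copy ()
{
  RegFile regs = {{0, 5, 9, 0}};
  long mem = 5;
  cas_with_copy (regs, mem, 2, 1, 2, 3);
  return mem;
}
```

In the naive version the comparison succeeds but the old value is stored
back, so memory never receives 9.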

Tested with no regressions on powerpc64le-linux on mainline
and gcc-5-branch.

OK for both?

Bye,
Ulrich


ChangeLog:

PR target/70168
* config/rs6000/rs6000.c (rs6000_expand_atomic_compare_and_swap):
Handle overlapping retval and newval.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 234119)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -20740,6 +20740,9 @@
   if (mode != TImode && !reg_or_short_operand (oldval, mode))
 oldval = copy_to_mode_reg (mode, oldval);
 
+  if (reg_overlap_mentioned_p (retval, newval))
+newval = copy_to_reg (newval);
+
   mem = rs6000_pre_atomic_barrier (mem, mod_s);
 
   label1 = NULL_RTX;
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[PATCH] Fix PR c++/70121 (premature folding of const var that was implicitly captured)

2016-03-10 Thread Patrick Palka
Within a lambda we should implicitly capture an outer const variable
only if it's odr-used in the body of the lambda.  But we are currently
making the decision of whether to capture such a variable, or else to
fold it to a constant, too early -- before we can know whether it's
being odr-used or not.  So we currently always fold a const variable to
a constant if possible instead of otherwise capturing it, but of course
doing this is wrong if e.g. the address of this variable is taken inside
the lambda's body.
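
A small sketch of the kind of code affected, modelled on the new test case
(the function names here are hypothetical).  Both lambdas odr-use val by
binding a reference to it, so val must be captured rather than folded to the
constant 28.

```cpp
#include <cassert>

// Each lambda returns a reference to its captured val (a copy in the
// closure for [=], the original variable for [&]).
int sum_via_lambdas ()
{
  const int val = 28;
  auto by_value = [=]() -> const int & { return val; };
  auto by_ref = [&]() -> const int & { return val; };
  return by_value () + by_ref ();
}

// With capture by reference, the reference returned by the lambda must
// refer to the original variable.
bool ref_capture_aliases_original ()
{
  const int val = 28;
  auto by_ref = [&]() -> const int & { return val; };
  return &by_ref () == &val;
}
```

If val were folded to a constant inside the lambda bodies, the returned
references would bind to temporaries instead.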

This patch reverses the behavior of process_outer_var_ref, so that we
always implicitly capture a const variable if it's capturable, instead
of always trying to first fold it to a constant.  This behavior however
is wrong too, and introduces a different but perhaps less important
regression: if we implicitly capture by value a const object that is not
actually odr-used within the body of the lambda, we may introduce a
redundant call to its copy/move constructor, see pr70121-2.C.

Ideally we should be capturing a variable only if it's odr-used
within the entire body of the lambda, but doing that would be a
significantly less trivial patch I think.  So I wonder if this patch is
a good tradeoff for GCC 6.  But I mostly wonder if my analysis and
proposed ideal solution is correct :)  Either way I'm planning to
bootstrap + regtest this patch and also test it against Boost.

gcc/cp/ChangeLog:

PR c++/70121
* semantics.c (process_outer_var_ref): Don't fold DECL to a
constant if it's otherwise going to get implicitly captured.

gcc/testsuite/ChangeLog:

PR c++/70121
* g++.dg/cpp0x/lambda/pr70121.C: New test.
* g++.dg/cpp0x/lambda/pr70121-2.C: New test.
---
 gcc/cp/semantics.c|  6 +
 gcc/testsuite/g++.dg/cpp0x/lambda/pr70121-2.C | 22 
 gcc/testsuite/g++.dg/cpp0x/lambda/pr70121.C   | 37 +++
 3 files changed, 65 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/pr70121-2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/pr70121.C

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index ea72e0e..a9dbf16 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -3332,6 +3332,12 @@ process_outer_var_ref (tree decl, tsubst_flags_t 
complain)
   the closure type when instantiating the lambda context.  That is
   probably also the way to handle lambdas within pack expansions.  */
return decl;
+  /* Don't try to fold DECL to a constant if we are otherwise going to
+implicitly capture it.  FIXME we should avoid folding DECL to a
+constant only if it's odr-used within the lambda body, but we can't
+determine that at this point.  */
+  else if (lambda_expr && context == containing_function)
+   ;
   else if (decl_constant_var_p (decl))
{
  tree t = maybe_constant_value (convert_from_reference (decl));
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/pr70121-2.C 
b/gcc/testsuite/g++.dg/cpp0x/lambda/pr70121-2.C
new file mode 100644
index 000..4676068
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/pr70121-2.C
@@ -0,0 +1,22 @@
+// PR c++/70121
+// { dg-do compile { target c++11 } }
+
+struct X
+{
+  int a;
+
+  constexpr X () : a (28) { };
+  X (const X&) = delete;
+  X (const X&&) = delete;
+};
+
+void
+baz (void)
+{
+  constexpr X x;
+  // These are non-odr usages of x.a so they should each be folded to a
+  // constant expression without having to capture and copy x.
+  auto ff1 = [=] { return x.a; }; // { dg-bogus "deleted" "" { xfail *-*-* } }
+  const auto  = [=] { return x.a; }; // { dg-bogus "deleted" "" { xfail 
*-*-* } }
+  const auto & = [=] { return x.a; }; // { dg-bogus "deleted" "" { xfail 
*-*-* } }
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/pr70121.C 
b/gcc/testsuite/g++.dg/cpp0x/lambda/pr70121.C
new file mode 100644
index 000..db8a4ca
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/pr70121.C
@@ -0,0 +1,37 @@
+// PR c++/70121
+// { dg-do compile { target c++11 } }
+
+static void
+foo (void)
+{
+  const int val = 28;
+  auto ff_v = [=]() -> const int& { return val; }; // { dg-bogus "temporary" }
+  auto ff_r = [&]() -> const int& { return val; }; // { dg-bogus "temporary" }
+
+  if (&ff_r () != &val)
+__builtin_abort ();
+
+  if (ff_v () + ff_r () != 56)
+__builtin_abort ();
+}
+
+static void
+bar (void)
+{
+  const int val = 28;
+  static const int *p;
+  auto ff_v = [=] { p = &val; }; // { dg-bogus "lvalue required" }
+  auto ff_r = [&] { p = &val; }; // { dg-bogus "lvalue required" }
+
+  ff_v ();
+  ff_r ();
+  if (p != &val)
+__builtin_abort ();
+}
+
+int
+main ()
+{
+  foo ();
+  bar ();
+}
-- 
2.8.0.rc1.12.gfce6d53



[PATCH] Fix PR70009

2016-03-10 Thread Cesar Philippidis
This patch updates a libgomp test case which was expecting char to be
signed when that's not guaranteed to be the case. Jakub pre-approved
this patch in the PR.
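
As a reminder of why this matters, a short sketch (hypothetical function
names, not taken from vprop.c): whether plain char is signed is
implementation-defined, and a loop that counts down to a negative bound only
terminates as intended when the index type is signed.

```cpp
#include <cassert>
#include <limits>

// Implementation-defined: true on some targets, false on others.
constexpr bool plain_char_is_signed ()
{
  return std::numeric_limits<char>::is_signed;
}

// Counts the iterations of a loop shaped like the one in the test.  The
// static_assert rejects unsigned index types, for which j > -5 would stay
// true after j wraps around.
template <typename T>
int count_down_iterations ()
{
  static_assert (std::numeric_limits<T>::is_signed,
                 "index type must be signed");
  int n = 0;
  for (T j = 0; j > -5; j--)
    ++n;
  return n;
}
```

Spelling the type as signed char gives the same result on every target.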

I'll apply this patch to trunk and backport it to gomp-4_0-branch shortly.

Cesar
2016-03-10  Cesar Philippidis  

	libgomp/
	PR testsuite/70009
	* testsuite/libgomp.oacc-c-c++-common/vprop.c: Make test data signed.

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vprop.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vprop.c
index a9b63dc..17b9568 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vprop.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vprop.c
@@ -4,8 +4,8 @@
 void		\
 test_##type ()	\
 {		\
-  type b[100];	\
-  type i, j, x = -1, y = -1;			\
+  signed type b[100];\
+  signed type i, j, x = -1, y = -1;		\
 		\
   _Pragma("acc parallel loop copyout (b)")	\
   for (j = 0; j > -5; j--)			\


Re: [RFC][PR69708] IPA inline not working for function reference in static const struc

2016-03-10 Thread kugan



On 11/03/16 03:39, Martin Jambor wrote:

Hi,

On Tue, Mar 01, 2016 at 09:04:25AM +1100, kugan wrote:

Hi,

As discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69708 and
corresponding mailing list discussion, IPA CP is not detecting  a
jump-function with the sq function as value.




sorry it took so long for me to look at this.  First, I have looked at
your patch and found a number of issues (see comments below), but when
I tried to fix them (see my patch below), I found out that using the
aggregate jump functions is not the best approach.  But let me
start with the comments first:


Hi Martin,

Thanks for the explanation.

Kugan


Re: [AArch64] Emit square root using the Newton series

2016-03-10 Thread Evandro Menezes

On 03/10/16 13:10, Wilco Dijkstra wrote:

 frsqrte  s1, s0
 fmul s2, s1, s1
 frsqrts  s2, s0, s2
 fcmp s0, 0.0
 fmul s1, s1, s2
 fmul s2, s1, s1
 fmul s1, s0, s1
 frsqrts  s2, s0, s2
 fcsel    s1, s0, s1, eq
 fmul s0, s1, s2


That's what I had in mind too, but around the approximation for x^-1/2 
and using masks for vector cases thusly:


fcmne   v3.4s, v0.4s, #0.0
frsqrte v1.4s, v0.4s
fmul    v2.4s, v1.4s, v1.4s
frsqrts v2.4s, v0.4s, v2.4s
fmul    v1.4s, v1.4s, v2.4s
fmul    v2.4s, v1.4s, v1.4s
frsqrts v2.4s, v0.4s, v2.4s
fmul    v1.4s, v1.4s, v2.4s
and     v1.4s, v3.4s
fmul    v0.4s, v1.4s, v0.4s


Thanks,

--
Evandro Menezes



Re: [C++ PATCH] Reuse certain cxx_eval_constant_expression results in cxx_eval_vec_init_1 (PR c++/70001)

2016-03-10 Thread Jakub Jelinek
On Thu, Mar 10, 2016 at 07:39:57PM +0100, Jakub Jelinek wrote:
> Is this also ok, if it passes bootstrap/regtest?
> 
> 2016-03-10  Jakub Jelinek  
> 
>   PR c++/70001
>   * constexpr.c (cxx_eval_vec_init_1): Reuse CONSTRUCTOR initializers
>   for 1..max even for multi-dimensional arrays, and reuse not just
>   eltinit itself, but surrounding subobject CONSTRUCTOR too.
> 
>   * g++.dg/cpp0x/constexpr-70001-4.C: New test.

Successfully bootstrapped/regtested on x86_64-linux and i686-linux.

Jakub


Re: [AArch64] Emit square root using the Newton series

2016-03-10 Thread Wilco Dijkstra

On 03/10/16 10:52, Wilco Dijkstra wrote:
> Hi Evandro,
>
>> I have however encountered precision issues with DF, namely some benchmarks 
>> in the SPECfp CPU2000 suite would fail to validate.
> Accuracy is not an issue, the computation is extremely accurate. The issue is 
> that your patch doesn't support sqrt(0.0) - it returns NaN rather than zero, 
> and that causes the miscompares you're seeing. So support for the zero case 
> should be added.
>
> This would be a better expansion, supporting zero, and with lower latency 
> than the current sequence:

Now I think of it, frsqrts returns 1.5 for the zero case, so we only need to 
fix up the estimated
sqrt value before the final multiply. Since a FCSEL/VAND can be hidden 
completely behind the
latency of frsqrts, both scalar and vector case could do this:

frsqrte  s1, s0
fmul s2, s1, s1
frsqrts  s2, s0, s2
fcmp s0, 0.0
fmul s1, s1, s2
fmul s2, s1, s1
fmul s1, s0, s1
frsqrts  s2, s0, s2
fcsel    s1, s0, s1, eq
fmul s0, s1, s2
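
For reference, the scalar sequence can be modelled in C++ roughly as
follows.  This is a sketch under stated assumptions: rsqrt_estimate and
rsqrt_step are hypothetical stand-ins for frsqrte and frsqrts, the estimate
is simply 1/sqrt(x) rounded to about 8 significant bits, and only the
zero-times-infinity special case of the step instruction is emulated.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Stand-in for frsqrte: a coarse reciprocal square root estimate.
float rsqrt_estimate (float x)
{
  if (x == 0.0f)
    return std::numeric_limits<float>::infinity ();
  float r = 1.0f / std::sqrt (x);
  if (!std::isfinite (r))
    return r;
  int exponent;
  float mantissa = std::frexp (r, &exponent);
  return std::ldexp (std::round (mantissa * 256.0f) / 256.0f, exponent);
}

// Stand-in for frsqrts: (3 - a * b) / 2, returning 1.5 when the product
// would be zero times infinity.
float rsqrt_step (float a, float b)
{
  if ((a == 0.0f && std::isinf (b)) || (std::isinf (a) && b == 0.0f))
    return 1.5f;
  return (3.0f - a * b) / 2.0f;
}

// Square root via two Newton steps on the reciprocal square root, with
// the zero fix-up applied just before the final multiply.
float sqrt_newton (float x)
{
  float e = rsqrt_estimate (x);       // frsqrte
  float s = rsqrt_step (x, e * e);    // fmul, frsqrts
  e = e * s;                          // fmul: refined estimate
  s = rsqrt_step (x, e * e);          // fmul, frsqrts
  float r = (x == 0.0f) ? x : x * e;  // fmul, fcmp, fcsel
  return r * s;                       // fmul
}
```

For x == 0 the selected value is 0 and the last step factor is 1.5, so the
result is 0 rather than NaN.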

Wilco





Re: [C++ PATCH] Reuse certain cxx_eval_constant_expression results in cxx_eval_vec_init_1 (PR c++/70001)

2016-03-10 Thread Jakub Jelinek
On Thu, Mar 10, 2016 at 01:32:10PM -0500, Patrick Palka wrote:
> Looks fine to me :)

On a closer look, this doesn't handle the multi-dimensional array cases,
and even for single-dimensional ones will not share the CONSTRUCTOR
if init_subob_ctx created one, and call init_subob_ctx many times
and in there in some cases e.g. build new new_ctx.object every time etc.
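
The idea of the follow-up, stripped of all compiler detail, is roughly the
loop below.  It is a sketch, not the constexpr.c code: InitResult and
init_elements are hypothetical names, the elements are plain ints, and
whether the first result may be reused is passed in as a flag instead of
being computed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct InitResult
{
  std::vector<int> elements;
  std::size_t evaluations;
};

// Initialize MAX elements.  If the first evaluated initializer is
// reusable, append it for all remaining elements and stop evaluating.
InitResult init_elements (std::size_t max,
                          const std::function<int ()> &evaluate,
                          bool reusable)
{
  InitResult result{{}, 0};
  result.elements.reserve (max);
  for (std::size_t i = 0; i < max; ++i)
    {
      int eltinit = evaluate ();
      ++result.evaluations;
      result.elements.push_back (eltinit);
      if (i == 0 && max > 1 && reusable)
        {
          result.elements.insert (result.elements.end (), max - 1, eltinit);
          break;
        }
    }
  return result;
}
```

With a reusable initializer the evaluation count stays at one no matter how
large the array is.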

Is this also ok, if it passes bootstrap/regtest?

2016-03-10  Jakub Jelinek  

PR c++/70001
* constexpr.c (cxx_eval_vec_init_1): Reuse CONSTRUCTOR initializers
for 1..max even for multi-dimensional arrays, and reuse not just
eltinit itself, but surrounding subobject CONSTRUCTOR too.

* g++.dg/cpp0x/constexpr-70001-4.C: New test.

--- gcc/cp/constexpr.c.jj   2016-03-10 12:52:04.0 +0100
+++ gcc/cp/constexpr.c  2016-03-10 19:24:28.435537864 +0100
@@ -2340,7 +2340,6 @@ cxx_eval_vec_init_1 (const constexpr_ctx
   vec **p = _ELTS (ctx->ctor);
   vec_alloc (*p, max + 1);
   bool pre_init = false;
-  tree pre_init_elt = NULL_TREE;
   unsigned HOST_WIDE_INT i;
 
   /* For the default constructor, build up a call to the default
@@ -2370,6 +2369,7 @@ cxx_eval_vec_init_1 (const constexpr_ctx
 {
   tree idx = build_int_cst (size_type_node, i);
   tree eltinit;
+  bool reuse = false;
   constexpr_ctx new_ctx;
   init_subob_ctx (ctx, new_ctx, idx, pre_init ? init : elttype);
   if (new_ctx.ctor != ctx->ctor)
@@ -2378,7 +2378,10 @@ cxx_eval_vec_init_1 (const constexpr_ctx
{
  /* A multidimensional array; recurse.  */
  if (value_init || init == NULL_TREE)
-   eltinit = NULL_TREE;
+   {
+ eltinit = NULL_TREE;
+ reuse = i == 0;
+   }
  else
eltinit = cp_build_array_ref (input_location, init, idx,
  tf_warning_or_error);
@@ -2390,18 +2393,9 @@ cxx_eval_vec_init_1 (const constexpr_ctx
{
  /* Initializing an element using value or default initialization
 we just pre-built above.  */
- if (pre_init_elt == NULL_TREE)
-   pre_init_elt
- = cxx_eval_constant_expression (&new_ctx, init, lval,
- non_constant_p, overflow_p);
- eltinit = pre_init_elt;
- /* Don't reuse the result of cxx_eval_constant_expression
-call if it isn't a constant initializer or if it requires
-relocations.  */
- if (initializer_constant_valid_p (pre_init_elt,
-   TREE_TYPE (pre_init_elt))
- != null_pointer_node)
-   pre_init_elt = NULL_TREE;
+ eltinit = cxx_eval_constant_expression (&new_ctx, init, lval,
+ non_constant_p, overflow_p);
+ reuse = i == 0;
}
   else
{
@@ -2427,6 +2421,23 @@ cxx_eval_vec_init_1 (const constexpr_ctx
}
   else
CONSTRUCTOR_APPEND_ELT (*p, idx, eltinit);
+  /* Don't reuse the result of cxx_eval_constant_expression
+call if it isn't a constant initializer or if it requires
+relocations.  */
+  if (reuse
+ && max > 1
+ && (initializer_constant_valid_p (eltinit, TREE_TYPE (eltinit))
+ == null_pointer_node))
+   {
+ if (new_ctx.ctor != ctx->ctor)
+   eltinit = new_ctx.ctor;
+ for (i = 1; i < max; ++i)
+   {
+ idx = build_int_cst (size_type_node, i);
+ CONSTRUCTOR_APPEND_ELT (*p, idx, eltinit);
+   }
+ break;
+   }
 }
 
   if (!*non_constant_p)
--- gcc/testsuite/g++.dg/cpp0x/constexpr-70001-4.C.jj   2016-03-10 
19:28:13.386481311 +0100
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-70001-4.C  2016-03-10 
19:28:43.295074924 +0100
@@ -0,0 +1,13 @@
+// PR c++/70001
+// { dg-do compile { target c++11 } }
+
+struct B
+{
+  int a;
+  constexpr B () : a (0) { }
+};
+
+struct A
+{
+  B b[1 << 19][1][1][1];
+} c;


Jakub


Re: [C++ PATCH] Reuse certain cxx_eval_constant_expression results in cxx_eval_vec_init_1 (PR c++/70001)

2016-03-10 Thread Jason Merrill

OK.

Jason


Re: [C++ PATCH] Reuse certain cxx_eval_constant_expression results in cxx_eval_vec_init_1 (PR c++/70001)

2016-03-10 Thread Patrick Palka
On Thu, Mar 10, 2016 at 12:56 PM, Jakub Jelinek  wrote:
> On Thu, Mar 10, 2016 at 06:37:32PM +0100, Jakub Jelinek wrote:
>> On Thu, Mar 10, 2016 at 12:34:40PM -0500, Patrick Palka wrote:
>> > Doesn't this mean that we call initializer_constant_valid_p at each
>> > iteration?  This would slow down the non-constant case even further.
>> > So I wonder if the return value of initializer_constant_valid_p could
>> > be cached or something, since it seems like a potentially expensive
>> > predicate.
>>
>> You're right, but I've already committed the patch.  I'll write an
>> incremental patch and test it.
>
> So, like this?
>
> 2016-03-10  Jakub Jelinek  
>
> PR c++/70001
> * constexpr.c (cxx_eval_vec_init_1): For pre_init, only
> call initializer_constant_valid_p on the first iteration.
>
> --- gcc/cp/constexpr.c.jj   2016-03-10 12:52:04.0 +0100
> +++ gcc/cp/constexpr.c  2016-03-10 18:45:13.416533853 +0100
> @@ -2391,17 +2391,21 @@ cxx_eval_vec_init_1 (const constexpr_ctx
>   /* Initializing an element using value or default initialization
>  we just pre-built above.  */
>   if (pre_init_elt == NULL_TREE)
> -   pre_init_elt
> - = cxx_eval_constant_expression (_ctx, init, lval,
> - non_constant_p, overflow_p);
> - eltinit = pre_init_elt;
> - /* Don't reuse the result of cxx_eval_constant_expression
> -call if it isn't a constant initializer or if it requires
> -relocations.  */
> - if (initializer_constant_valid_p (pre_init_elt,
> -   TREE_TYPE (pre_init_elt))
> - != null_pointer_node)
> -   pre_init_elt = NULL_TREE;
> +   {
> + eltinit
> +   = cxx_eval_constant_expression (_ctx, init, lval,
> +   non_constant_p, overflow_p);
> + /* Don't reuse the result of cxx_eval_constant_expression
> +call if it isn't a constant initializer or if it requires
> +relocations.  */
> + if (i == 0
> + && (initializer_constant_valid_p (eltinit,
> +   TREE_TYPE (eltinit))
> + == null_pointer_node))
> +   pre_init_elt = eltinit;
> +   }
> + else
> +   eltinit = pre_init_elt;
> }
>else
> {

Looks fine to me :)


LRA remat issue with hard regs (PR70123)

2016-03-10 Thread Bernd Schmidt
When I submitted my previous lra-remat patch, I mentioned I had some 
concerns about the way we dealt with register number comparisons, but I 
didn't want to change things blindly without a testcase. PR70123 has now 
provided such a testcase where we are trying to rematerialize a hard 
register (r6). While scanning we encounter an instruction of the form

 (set (reg 285) (reg 272))

i.e. involving only pseudos, but reg_renumber[285] is r6. Since we only 
compare register numbers, we do not notice that the hard reg is clobbered.


The following patch modifies the function input_regno_present_p, and 
also renames it so that its purpose is more obvious to someone familiar 
with other parts of gcc. I've made it look at reg_renumber, and also try 
to deal with multi-word hard registers properly.
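
For readers unfamiliar with the register numbering, here is a minimal standalone sketch of the kind of overlap test described above. It is not the lra-remat.c code: kFirstPseudo, RegRef, resolve and regs_overlap are hypothetical names, and the renumber vector merely stands in for reg_renumber.

```cpp
#include <cassert>
#include <vector>

// Hypothetical boundary between hard and pseudo register numbers.
constexpr unsigned kFirstPseudo = 64;

// A register reference: its number, and how many consecutive hard
// registers it would occupy if it lives in a hard register.
struct RegRef {
  unsigned regno;
  unsigned nregs;
};

// Map a pseudo to the hard register assigned to it, if any.
// renumber[r] >= 0 means pseudo r currently lives in that hard register.
unsigned resolve(unsigned regno, const std::vector<int> &renumber) {
  if (regno >= kFirstPseudo && regno < renumber.size() && renumber[regno] >= 0)
    return static_cast<unsigned>(renumber[regno]);
  return regno;
}

// True if the two references touch a common register once pseudos are
// resolved.  Hard registers are compared as ranges [r, r + nregs).
bool regs_overlap(RegRef a, RegRef b, const std::vector<int> &renumber) {
  assert(a.nregs >= 1 && b.nregs >= 1);
  unsigned ra = resolve(a.regno, renumber);
  unsigned rb = resolve(b.regno, renumber);
  unsigned na = ra >= kFirstPseudo ? 1 : a.nregs;
  unsigned nb = rb >= kFirstPseudo ? 1 : b.nregs;
  return ra < rb + nb && rb < ra + na;
}
```

With renumber[285] set to 6, a comparison of hard register 6 against pseudo 285 reports an overlap, which a plain comparison of register numbers would miss.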


I'm not entirely sure this is a fully safe approach however, since I 
can't yet answer the question of whether LRA could change another pseudo 
to reside in hard register 6, thereby making the rematerialization 
invalid after the fact. Therefore the patch also includes a change to 
just disable candidates if they involve hard registers. I haven't 
observed that making any difference in code generation (on x86_64), 
beyond fixing the testcase on s390.


Bootstrapped and tested on x86_64-linux; Jakub verified that the 
testcase works afterwards. Ok for trunk and 5-branch, either for one or 
for both parts? I'm hoping the testcase in gcc.dg/torture will get 
exercised in the right way on s390, but I haven't run tests on that machine.



Bernd
	PR target/70123
	* lra-remat.c (operand_to_remat): Disallow hard regs in the value to
	be rematerialized.
	(reg_overlap_for_remat_p): Renamed from input_regno_present_p.
	Arguments swapped.  All callers changed.  Take reg_renumber into
	account, and calculate and compare register ranges for hard regs.

	PR target/70123
	* gcc.dg/torture/pr70123.c: New test.

Index: gcc/lra-remat.c
===
--- gcc/lra-remat.c	(revision 234025)
+++ gcc/lra-remat.c	(working copy)
@@ -413,6 +413,10 @@ operand_to_remat (rtx_insn *insn)
   if (reg->regno >= FIRST_PSEUDO_REGISTER
 	  && bitmap_bit_p (_regs, reg->regno))
 	return -1;
+
+  /* Don't allow hard registers to be rematerialized.  */
+  if (reg->regno < FIRST_PSEUDO_REGISTER)
+	return -1;
 }
   if (found_reg == NULL)
 return -1;
@@ -718,21 +722,46 @@ calculate_local_reg_remat_bb_data (void)
 
 
 
-/* Return true if REGNO is an input operand of INSN.  */
+/* Return true if REG overlaps an input operand of INSN.  */
 static bool
-input_regno_present_p (rtx_insn *insn, int regno)
+reg_overlap_for_remat_p (lra_insn_reg *reg, rtx_insn *insn)
 {
   int iter;
   lra_insn_recog_data_t id = lra_get_insn_recog_data (insn);
   struct lra_static_insn_data *static_id = id->insn_static_data;
-  struct lra_insn_reg *reg;
-  
+  unsigned regno = reg->regno;
+  int nregs;
+
+  if (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0)
+regno = reg_renumber[regno];
+  if (regno >= FIRST_PSEUDO_REGISTER)
+nregs = 1;
+  else
+nregs = hard_regno_nregs[regno][reg->biggest_mode];
+
+  struct lra_insn_reg *reg2;
+
   for (iter = 0; iter < 2; iter++)
-for (reg = (iter == 0 ? id->regs : static_id->hard_regs);
-	 reg != NULL;
-	 reg = reg->next)
-  if (reg->type == OP_IN && reg->regno == regno)
-	return true;
+for (reg2 = (iter == 0 ? id->regs : static_id->hard_regs);
+	 reg2 != NULL;
+	 reg2 = reg2->next)
+  {
+	if (reg2->type != OP_IN)
+	  continue;
+	unsigned regno2 = reg2->regno;
+	int nregs2;
+
+	if (regno2 >= FIRST_PSEUDO_REGISTER && reg_renumber[regno2] >= 0)
+	  regno2 = reg_renumber[regno2];
+	if (regno2 >= FIRST_PSEUDO_REGISTER)
+	  nregs2 = 1;
+	else
+	  nregs2 = hard_regno_nregs[regno2][reg->biggest_mode];
+
+	if ((regno2 + nregs2 - 1 >= regno && regno2 < regno + nregs)
+	|| (regno + nregs - 1 >= regno2 && regno < regno2 + nregs2))
+	  return true;
+  }
   return false;
 }
 
@@ -833,7 +862,7 @@ calculate_gen_cands (void)
 			  && dst_regno == cand->regno)
 			continue;
 		  if (cand->regno == reg->regno
-			  || input_regno_present_p (insn2, reg->regno))
+			  || reg_overlap_for_remat_p (reg, insn2))
 			{
 			  bitmap_clear_bit (gen_cands, cand->index);
 			  bitmap_set_bit (_bitmap, uid);
@@ -1219,7 +1248,7 @@ do_remat (void)
 			&& dst_regno == cand->regno)
 		  continue;
 		if (cand->regno == reg->regno
-			|| input_regno_present_p (cand->insn, reg->regno))
+			|| reg_overlap_for_remat_p (reg, cand->insn))
 		  bitmap_set_bit (_bitmap, cand->index);
 		  }
 
Index: gcc/testsuite/gcc.dg/torture/pr70123.c
===
--- gcc/testsuite/gcc.dg/torture/pr70123.c	(revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr70123.c	(working copy)
@@ -0,0 +1,198 @@
+/* { dg-do run } */
+/* { dg-options "-fPIC" { target fpic } } */
+__attribute__ ((noinline, noclone)) int
+bar (int flag, 

Re: [C++ PATCH] Reuse certain cxx_eval_constant_expression results in cxx_eval_vec_init_1 (PR c++/70001)

2016-03-10 Thread Jakub Jelinek
On Thu, Mar 10, 2016 at 06:37:32PM +0100, Jakub Jelinek wrote:
> On Thu, Mar 10, 2016 at 12:34:40PM -0500, Patrick Palka wrote:
> > Doesn't this mean that we call initializer_constant_valid_p at each
> > iteration?  This would slow down the non-constant case even further.
> > So I wonder if the return value of initializer_constant_valid_p could
> > be cached or something, since it seems like a potentially expensive
> > predicate.
> 
> You're right, but I've already committed the patch.  I'll write an
> incremental patch and test it.

So, like this?

2016-03-10  Jakub Jelinek  

PR c++/70001
* constexpr.c (cxx_eval_vec_init_1): For pre_init, only
call initializer_constant_valid_p on the first iteration.

--- gcc/cp/constexpr.c.jj   2016-03-10 12:52:04.0 +0100
+++ gcc/cp/constexpr.c  2016-03-10 18:45:13.416533853 +0100
@@ -2391,17 +2391,21 @@ cxx_eval_vec_init_1 (const constexpr_ctx
  /* Initializing an element using value or default initialization
 we just pre-built above.  */
  if (pre_init_elt == NULL_TREE)
-   pre_init_elt
- = cxx_eval_constant_expression (_ctx, init, lval,
- non_constant_p, overflow_p);
- eltinit = pre_init_elt;
- /* Don't reuse the result of cxx_eval_constant_expression
-call if it isn't a constant initializer or if it requires
-relocations.  */
- if (initializer_constant_valid_p (pre_init_elt,
-   TREE_TYPE (pre_init_elt))
- != null_pointer_node)
-   pre_init_elt = NULL_TREE;
+   {
+ eltinit
+   = cxx_eval_constant_expression (_ctx, init, lval,
+   non_constant_p, overflow_p);
+ /* Don't reuse the result of cxx_eval_constant_expression
+call if it isn't a constant initializer or if it requires
+relocations.  */
+ if (i == 0
+ && (initializer_constant_valid_p (eltinit,
+   TREE_TYPE (eltinit))
+ == null_pointer_node))
+   pre_init_elt = eltinit;
+   }
+ else
+   eltinit = pre_init_elt;
}
   else
{


Jakub


Re: Fix 70083, lra-induced crash

2016-03-10 Thread Vladimir Makarov

On 03/10/2016 03:26 AM, Bernd Schmidt wrote:
> This crash happens because LRA tries to save an AVX hard reg in a
> large mode, and it only appears in the function in smaller modes.
> Stack alignment isn't set up to support the larger mode.
>
> Currently, biggest_mode for hard registers is set up from
> regno_reg_rtx, set up to a large mode for argument regs. That mode is
> not necessarily seen in the function itself and may be too large. If
> that initialization is changed to use VOIDmode, we compute the correct
> value during lra_push_insns, but then subsequently we clear it to
> VOIDmode again, and it never seems to get updated. Hence, the patch
> has several parts: initialize hard reg biggest_mode with VOIDmode,
> ensure it gets updated during process_bb_lives, and use the value in
> split_reg.
>
> Bootstrapped and tested on x86_64-linux, ok?


Yes.  Thank you, Bernd.



Re: [C++ PATCH] Reuse certain cxx_eval_constant_expression results in cxx_eval_vec_init_1 (PR c++/70001)

2016-03-10 Thread Jakub Jelinek
On Thu, Mar 10, 2016 at 12:34:40PM -0500, Patrick Palka wrote:
> Doesn't this mean that we call initializer_constant_valid_p at each
> iteration?  This would slow down the non-constant case even further.
> So I wonder if the return value of initializer_constant_valid_p could
> be cached or something, since it seems like a potentially expensive
> predicate.

You're right, but I've already committed the patch.  I'll write an
incremental patch and test it.

Jakub


Re: [C++ PATCH] Reuse certain cxx_eval_constant_expression results in cxx_eval_vec_init_1 (PR c++/70001)

2016-03-10 Thread Patrick Palka
On Thu, Mar 10, 2016 at 12:03 PM, Jakub Jelinek  wrote:
> Hi!
>
> As mentioned in the PR, the compile time and compile memory are wasted
> if a large array is using value or default initialization, and
> if the resulting initializer value is simple enough, we can just share
> it by all the elements.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-03-10  Patrick Palka  
> Jakub Jelinek  
>
> PR c++/70001
> * constexpr.c (cxx_eval_vec_init_1): For pre_init case, reuse
> return value from cxx_eval_constant_expression from earlier
> elements if it is valid constant initializer requiring no
> relocations.
>
> * g++.dg/cpp0x/constexpr-70001-1.C: New test.
> * g++.dg/cpp0x/constexpr-70001-2.C: New test.
> * g++.dg/cpp0x/constexpr-70001-3.C: New test.
>
> --- gcc/cp/constexpr.c.jj   2016-03-08 21:04:43.050564671 +0100
> +++ gcc/cp/constexpr.c  2016-03-10 12:52:04.016852313 +0100
> @@ -2340,6 +2340,7 @@ cxx_eval_vec_init_1 (const constexpr_ctx
>vec **p = _ELTS (ctx->ctor);
>vec_alloc (*p, max + 1);
>bool pre_init = false;
> +  tree pre_init_elt = NULL_TREE;
>unsigned HOST_WIDE_INT i;
>
>/* For the default constructor, build up a call to the default
> @@ -2389,9 +2390,18 @@ cxx_eval_vec_init_1 (const constexpr_ctx
> {
>   /* Initializing an element using value or default initialization
>  we just pre-built above.  */
> - eltinit = (cxx_eval_constant_expression
> -(_ctx, init,
> - lval, non_constant_p, overflow_p));
> + if (pre_init_elt == NULL_TREE)
> +   pre_init_elt
> + = cxx_eval_constant_expression (_ctx, init, lval,
> + non_constant_p, overflow_p);
> + eltinit = pre_init_elt;
> + /* Don't reuse the result of cxx_eval_constant_expression
> +call if it isn't a constant initializer or if it requires
> +relocations.  */
> + if (initializer_constant_valid_p (pre_init_elt,
> +   TREE_TYPE (pre_init_elt))
> + != null_pointer_node)
> +   pre_init_elt = NULL_TREE;

Doesn't this mean that we call initializer_constant_valid_p at each
iteration?  This would slow down the non-constant case even further.
So I wonder if the return value of initializer_constant_valid_p could
be cached or something, since it seems like a potentially expensive
predicate.
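
One way to address this is to run the predicate only on the first iteration and remember the outcome. The sketch below shows that pattern outside the compiler, under stated assumptions: fill_elements, FillStats, evaluate and is_shareable are hypothetical stand-ins, not functions from constexpr.c.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counters so a caller can see how much work was done.
struct FillStats {
  std::size_t evaluations = 0;
  std::size_t predicate_calls = 0;
};

// Fill n elements.  The (possibly expensive) is_shareable predicate is
// consulted only for the first element; if it accepts, that value is
// reused for every later element, otherwise each element is evaluated
// separately and the predicate is never called again.
std::vector<int> fill_elements(std::size_t n, int (*evaluate)(),
                               bool (*is_shareable)(int), FillStats &stats) {
  assert(evaluate && is_shareable);
  std::vector<int> out;
  out.reserve(n);
  bool have_cached = false;
  int cached = 0;
  for (std::size_t i = 0; i < n; ++i) {
    int elt;
    if (have_cached) {
      elt = cached;
    } else {
      elt = evaluate();
      ++stats.evaluations;
      if (i == 0) {
        ++stats.predicate_calls;
        if (is_shareable(elt)) {
          cached = elt;
          have_cached = true;
        }
      }
    }
    out.push_back(elt);
  }
  return out;
}
```

In the shareable case this does one evaluation and one predicate call in total; in the non-shareable case it still evaluates every element but pays for the predicate only once.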


Re: RFA: PR 70044: Catch a second call to aarch64_override_options_after_change

2016-03-10 Thread Nick Clifton
Hi James,

>> OK to apply ?
> 
> OK, thanks.

Thanks - applied.

>>> Note that this patch would expose a bug in 
>>> common/config/aarch64/aarch64-common.c
>>> where there's a thinko in the handling of OPT_momit_leaf_frame_pointer.
>>> That's my bad and I'll propose a patch for it soon.
> 
> I don't think I've seen this on list yet, it might be worth waiting until
> Kyrill has put this patch up before you commit.

I did this.  Plus I checked to make sure that the patch still works and that 
the new test passes...

Cheers
  Nick



Re: [C++ PATCH] Reuse certain cxx_eval_constant_expression results in cxx_eval_vec_init_1 (PR c++/70001)

2016-03-10 Thread Jason Merrill

OK.

Jason


Re: [C++ RFC] magic_varargs_p issues (PR c++/70144)

2016-03-10 Thread Jakub Jelinek
On Wed, Mar 09, 2016 at 06:18:31PM +0100, Jakub Jelinek wrote:
> So, what approach do you prefer?  I've so far bootstrapped/regtested the
> second patch, which showed those
> +FAIL: g++.dg/cilk-plus/AN/builtin_fn_custom_tplt.cc
> +UNRESOLVED: g++.dg/cilk-plus/AN/builtin_fn_custom_tplt.cc
> +FAIL: g++.dg/cilk-plus/AN/builtin_fn_mutating_tplt.cc
> +UNRESOLVED: g++.dg/cilk-plus/AN/builtin_fn_mutating_tplt.cc
> (for all opt/-g levels) regressions.

I've also successfully bootstrapped/regtested the other patch (though,
at least the 
+  if (a == error_mark_node)
+   return error_mark_node;
hunk should be added in there too), and finally the attached patch, which is
the combination of the two: it uses the first patch's approach for Cilk+
reductions and the second's otherwise.  I've noticed that convert_arguments
also checks magic_varargs_p, but I don't understand the relationship between
build_over_call and convert_arguments well: is only one or the other used
for a given call, and so should all this be in both?
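
As a reminder of what "only decay" means at the type level, here is a small sketch using std::decay from the standard library as an analogue. This is an assumption-laden illustration: std::decay is not GCC's internal decay_conversion, and decays_to is a hypothetical helper.

```cpp
#include <type_traits>

// True if type From decays to type To: arrays become pointers to their
// element type, functions become function pointers, and references and
// top-level cv-qualifiers are dropped.
template <class From, class To>
constexpr bool decays_to() {
  return std::is_same<typename std::decay<From>::type, To>::value;
}

static_assert(decays_to<int[3], int *>(), "array decays to pointer");
static_assert(decays_to<int(int), int (*)(int)>(),
              "function decays to function pointer");
static_assert(decays_to<const int &, int>(),
              "reference and cv-qualifier are dropped");
```

Passing a function name such as __builtin_constant_p itself as an argument is the function-to-pointer case.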

2016-03-09  Jakub Jelinek  

PR c++/70144
* cp-tree.h (magic_varargs_p): Return int instead of bool.
* call.c (magic_varargs_p): Return int instead of bool, return 2 for
Cilk+ reductions, otherwise 1 for magic varargs and 0 for normal
varargs.
(build_over_call): If magic_varargs_p == 2, call reject_gcc_builtin,
if magic_varargs_p == 1, call decay_conversion
instead of mark_type_use.  Don't store error_mark_node arguments to
argarray, instead return error_mark_node.

* c-c++-common/pr70144-1.c: New test.
* c-c++-common/pr70144-2.c: New test.

--- gcc/cp/cp-tree.h.jj 2016-03-05 07:46:50.0 +0100
+++ gcc/cp/cp-tree.h2016-03-10 16:17:21.121129641 +0100
@@ -5563,7 +5563,7 @@ public:
 
 /* in call.c */
 extern bool check_dtor_name(tree, tree);
-bool magic_varargs_p(tree);
+int magic_varargs_p(tree);
 
 extern tree build_conditional_expr (location_t, tree, tree, tree, 
  tsubst_flags_t);
--- gcc/cp/call.c.jj2016-03-09 15:06:21.697396705 +0100
+++ gcc/cp/call.c   2016-03-10 16:08:12.850558154 +0100
@@ -7040,15 +7040,17 @@ convert_for_arg_passing (tree type, tree
   return val;
 }
 
-/* Returns true iff FN is a function with magic varargs, i.e. ones for
-   which no conversions at all should be done.  This is true for some
-   builtins which don't act like normal functions.  */
+/* Returns non-zero iff FN is a function with magic varargs, i.e. ones for
+   which just decay_conversion or no conversions at all should be done.
+   This is true for some builtins which don't act like normal functions.
+   Return 2 if no conversions at all should be done, 1 if just
+   decay_conversion.  */
 
-bool
+int
 magic_varargs_p (tree fn)
 {
   if (flag_cilkplus && is_cilkplus_reduce_builtin (fn) != BUILT_IN_NONE)
-return true;
+return 2;
 
   if (DECL_BUILT_IN (fn))
 switch (DECL_FUNCTION_CODE (fn))
@@ -7057,14 +7059,14 @@ magic_varargs_p (tree fn)
   case BUILT_IN_CONSTANT_P:
   case BUILT_IN_NEXT_ARG:
   case BUILT_IN_VA_START:
-   return true;
+   return 1;
 
   default:;
return lookup_attribute ("type generic",
 TYPE_ATTRIBUTES (TREE_TYPE (fn))) != 0;
   }
 
-  return false;
+  return 0;
 }
 
 /* Returns the decl of the dispatcher function if FN is a function version.  */
@@ -7515,9 +7517,17 @@ build_over_call (struct z_candidate *can
   for (; arg_index < vec_safe_length (args); ++arg_index)
 {
   tree a = (*args)[arg_index];
-  if (magic_varargs_p (fn))
-   /* Do no conversions for magic varargs.  */
-   a = mark_type_use (a);
+  int magic = magic_varargs_p (fn);
+  if (magic == 2)
+   {
+ /* Do no conversions for certain magic varargs.  */
+ a = mark_type_use (a);
+ if (TREE_CODE (a) == FUNCTION_DECL && reject_gcc_builtin (a))
+   return error_mark_node;
+   }
+  else if (magic == 1)
+   /* For other magic varargs only do decay_conversion.  */
+   a = decay_conversion (a, complain);
   else if (DECL_CONSTRUCTOR_P (fn)
   && same_type_ignoring_top_level_qualifiers_p (DECL_CONTEXT (fn),
 TREE_TYPE (a)))
@@ -7530,6 +7540,8 @@ build_over_call (struct z_candidate *can
}
   else
a = convert_arg_to_ellipsis (a, complain);
+  if (a == error_mark_node)
+   return error_mark_node;
   argarray[j++] = a;
 }
 
--- gcc/testsuite/c-c++-common/pr70144-1.c.jj   2016-03-09 13:10:58.246778355 
+0100
+++ gcc/testsuite/c-c++-common/pr70144-1.c  2016-03-09 13:10:04.0 
+0100
@@ -0,0 +1,9 @@
+/* PR c++/70144 */
+/* { dg-do compile } */
+
+void
+foo ()
+{
+  __builtin_constant_p (__builtin_constant_p) ?: ({ unsigned t = 0; t; }); 

[C++ PATCH] Reuse certain cxx_eval_constant_expression results in cxx_eval_vec_init_1 (PR c++/70001)

2016-03-10 Thread Jakub Jelinek
Hi!

As mentioned in the PR, the compile time and compile memory are wasted
if a large array is using value or default initialization, and
if the resulting initializer value is simple enough, we can just share
it by all the elements.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-03-10  Patrick Palka  
Jakub Jelinek  

PR c++/70001
* constexpr.c (cxx_eval_vec_init_1): For pre_init case, reuse
return value from cxx_eval_constant_expression from earlier
elements if it is valid constant initializer requiring no
relocations.

* g++.dg/cpp0x/constexpr-70001-1.C: New test.
* g++.dg/cpp0x/constexpr-70001-2.C: New test.
* g++.dg/cpp0x/constexpr-70001-3.C: New test.

--- gcc/cp/constexpr.c.jj   2016-03-08 21:04:43.050564671 +0100
+++ gcc/cp/constexpr.c  2016-03-10 12:52:04.016852313 +0100
@@ -2340,6 +2340,7 @@ cxx_eval_vec_init_1 (const constexpr_ctx
   vec **p = _ELTS (ctx->ctor);
   vec_alloc (*p, max + 1);
   bool pre_init = false;
+  tree pre_init_elt = NULL_TREE;
   unsigned HOST_WIDE_INT i;
 
   /* For the default constructor, build up a call to the default
@@ -2389,9 +2390,18 @@ cxx_eval_vec_init_1 (const constexpr_ctx
{
  /* Initializing an element using value or default initialization
 we just pre-built above.  */
- eltinit = (cxx_eval_constant_expression
-(_ctx, init,
- lval, non_constant_p, overflow_p));
+ if (pre_init_elt == NULL_TREE)
+   pre_init_elt
+ = cxx_eval_constant_expression (_ctx, init, lval,
+ non_constant_p, overflow_p);
+ eltinit = pre_init_elt;
+ /* Don't reuse the result of cxx_eval_constant_expression
+call if it isn't a constant initializer or if it requires
+relocations.  */
+ if (initializer_constant_valid_p (pre_init_elt,
+   TREE_TYPE (pre_init_elt))
+ != null_pointer_node)
+   pre_init_elt = NULL_TREE;
}
   else
{
--- gcc/testsuite/g++.dg/cpp0x/constexpr-70001-1.C.jj   2016-03-10 
13:08:58.732932160 +0100
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-70001-1.C  2016-03-10 
13:05:53.0 +0100
@@ -0,0 +1,13 @@
+// PR c++/70001
+// { dg-do compile { target c++11 } }
+
+struct B
+{
+  int a;
+  constexpr B () : a (0) { }
+};
+
+struct A
+{
+  B b[1 << 19];
+} c;
--- gcc/testsuite/g++.dg/cpp0x/constexpr-70001-2.C.jj   2016-03-10 
13:09:01.866889167 +0100
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-70001-2.C  2016-03-10 
13:07:27.0 +0100
@@ -0,0 +1,19 @@
+// PR c++/70001
+// { dg-do run { target c++11 } }
+
+struct B
+{
+  struct B *a;
+  constexpr B () : a (this) { }
+};
+
+constexpr int N = 1 << 4;
+struct A { B c[N]; } d;
+
+int
+main ()
+{
+  for (int i = 0; i < N; ++i)
+if (d.c[i].a != &d.c[i])
+  __builtin_abort ();
+}
--- gcc/testsuite/g++.dg/cpp0x/constexpr-70001-3.C.jj   2016-03-10 
13:09:04.700850290 +0100
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-70001-3.C  2016-03-10 
13:09:53.199184977 +0100
@@ -0,0 +1,26 @@
+// PR c++/70001
+// { dg-do compile { target c++11 } }
+
+#include 
+#include 
+
+typedef std::complex cd;
+
+const int LOG = 17;
+const int N = (1 << LOG);
+
+std::array a;
+std::array b;
+
+void
+foo (std::array )
+{
+  std::array, LOG + 1> f;
+}
+
+int
+main ()
+{
+  foo (a);
+  foo (b);
+}

Jakub


Re: [AArch64] Emit square root using the Newton series

2016-03-10 Thread Evandro Menezes

On 03/10/16 10:52, Wilco Dijkstra wrote:

> Hi Evandro,
>
>> I have however encountered precision issues with DF, namely some benchmarks
>> in the SPECfp CPU2000 suite would fail to validate.
>
> Accuracy is not an issue, the computation is extremely accurate. The issue is
> that your patch doesn't support sqrt(0.0) - it returns NaN rather than zero,
> and that causes the miscompares you're seeing. So support for the zero case
> should be added.
>
> This would be a better expansion, supporting zero, and with lower latency than
> the current sequence:
>
>     fcmp    s0, 0.0
>     beq     zero
>     frsqrte s1, s0
>     fmul    s2, s1, s1
>     frsqrts s2, s0, s2
>     fmul    s1, s1, s2
>     fmul    s2, s1, s1
>     fmul    s1, s0, s1
>     frsqrts s2, s0, s2
>     fmul    s0, s1, s2
> zero:
>
> For the vector variant you can't avoid the extra latency of an AND, but it
> should not be slower than it is today.

Thanks for the pointer, Wilco.  Will work it in the patch.

--
Evandro Menezes



Re: [AArch64] Emit square root using the Newton series

2016-03-10 Thread Wilco Dijkstra
Hi Evandro,

> I have however encountered precision issues with DF, namely some benchmarks 
> in the SPECfp CPU2000 suite would fail to validate. 

Accuracy is not an issue, the computation is extremely accurate. The issue is 
that your patch doesn't support sqrt(0.0) - it returns NaN rather than zero, 
and that causes the miscompares you're seeing. So support for the zero case 
should be added.

This would be a better expansion, supporting zero, and with lower latency than 
the current sequence:

    fcmp    s0, 0.0
    beq     zero
    frsqrte s1, s0
    fmul    s2, s1, s1
    frsqrts s2, s0, s2
    fmul    s1, s1, s2
    fmul    s2, s1, s1
    fmul    s1, s0, s1
    frsqrts s2, s0, s2
    fmul    s0, s1, s2
zero:

For the vector variant you can't avoid the extra latency of an AND, but it 
should not be slower than it is today.
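
To make the zero case concrete, here is a simplified scalar sketch of the same idea. It does not reproduce the instruction ordering above: rough_rsqrt is a hypothetical stand-in for the hardware estimate, refine_rsqrt models one Newton step of the form y * (3 - x*y*y) / 2, and the final multiply by x is done at the end rather than folded in.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson refinement of y ~= 1/sqrt(x).
static float refine_rsqrt(float x, float y) {
  return y * ((3.0f - x * y * y) * 0.5f);
}

// Stand-in for a reciprocal square root estimate: deliberately about 5%
// off, and +inf for x == 0 (IEEE division by zero).
static float rough_rsqrt(float x) {
  return 1.05f / std::sqrt(x);
}

// sqrt(x) computed as x * (1/sqrt(x)) with no special cases.  For
// x == 0 the estimate is +inf and 0 * inf yields NaN.
float sqrt_naive(float x) {
  float y = rough_rsqrt(x);
  y = refine_rsqrt(x, y);
  y = refine_rsqrt(x, y);
  return x * y;
}

// Same computation, but zero is returned directly, mirroring the
// compare-and-branch at the top of the sequence.
float sqrt_guarded(float x) {
  assert(!(x < 0.0f));
  if (x == 0.0f)
    return x;
  return sqrt_naive(x);
}
```

The unguarded version returns NaN for 0.0 because the refinement multiplies zero by infinity, which matches the failure mode described for the patch.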

Cheers,
Wilco








Re: [RFC][PR69708] IPA inline not working for function reference in static const struc

2016-03-10 Thread Martin Jambor
Hi,

On Tue, Mar 01, 2016 at 09:04:25AM +1100, kugan wrote:
> Hi,
> 
> As discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69708 and
> corresponding mailing list discussion, IPA CP is not detecting  a
> jump-function with the sq function as value.
> 
> 

sorry it took so long for me to look at this.  First, I have looked at
your patch and found a number of issues (see comments below), but when
I tried to fix them (see my patch below), I found out that using the
aggregate jump functions is not the best approach.  But let me
start with the comments first:

> 
> 
> 2016-03-01  Kugan Vivekanandarajah  
> 
> 
> 
>   * ipa-prop.c (determine_locally_known_aggregate_parts): Determine jump
> 
>function for static constant initialization.
> 

> diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
> index 72c2fed..22da097 100644
> --- a/gcc/ipa-prop.c
> +++ b/gcc/ipa-prop.c
> @@ -1562,6 +1562,57 @@ determine_locally_known_aggregate_parts (gcall *call, 
> tree arg,
>jfunc->agg.by_ref = by_ref;
>build_agg_jump_func_from_list (list, const_count, arg_offset, jfunc);
>  }
> +  else if ((TREE_CODE (arg) == VAR_DECL)
> +&& is_global_var (arg))

It would be better to check for this before iterating over statements
because they of course cannot write anything useful to a constant
global.

> +{
> +  /* PR69708:  Figure out aggregate jump-function with constant init
> +  value.  */
> +  struct ipa_known_agg_contents_list *n, **p;
> +  HOST_WIDE_INT offset = 0, size, max_size;
> +  varpool_node *node = varpool_node::get (arg);

What do you need the varpool_node for?  I believe that unless you use
ultimate_alias_target, node->decl will always be arg.

> +  if (node
> +   && DECL_INITIAL (node->decl)
> +   && TREE_READONLY (node->decl)
> +   && TREE_CODE (DECL_INITIAL (node->decl)) == CONSTRUCTOR)
> + {
> +   tree exp = DECL_INITIAL (node->decl);
> +   unsigned HOST_WIDE_INT ix;
> +   tree field, val;
> +   bool reverse;
> +   FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (exp), ix, field, val)
> + {
> +   bool already_there = false;
> +   if (!field)
> + break;
> +   get_ref_base_and_extent (field, , ,
> +_size, );

I think you got this working just by luck.  On your testcase, field is
a field_decl which get_ref_base_and_extent does not handle and so the
returned offset is always zero.  Just add another field before the
call field and see for yourself.
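
A small standalone illustration of that experiment, with hypothetical names (unary_fn, ops_first, ops_second, table, apply), not the PR testcase itself:

```cpp
#include <cassert>
#include <cstddef>

typedef int (*unary_fn)(int);

static int sq(int x) { return x * x; }

// The function pointer is the first field, so its offset is zero and a
// lookup that always reports offset zero happens to work.
struct ops_first {
  unary_fn call;
};

// With another field in front, the function pointer sits at a non-zero
// offset and the same lookup would describe the wrong location.
struct ops_second {
  int tag;
  unary_fn call;
};

static const ops_second table = {7, sq};

// Indirect call through a read-only global aggregate, the pattern the
// jump function is meant to describe.
int apply(const ops_second *ops, int v) {
  assert(ops && ops->call);
  return ops->call(v);
}
```

Here offsetof (ops_first, call) is zero while offsetof (ops_second, call) is not, so only the second layout exposes an offset computation that always yields zero.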

Moreover, we should be able to also handle global read only arrays of
functions where field would be an index or NULL.  And we should also
try to handle structures containing such arrays, which means
constructors would be nested (see the last testcase of the patch
below).

> +   if (max_size == -1
> +   || max_size != size)
> + break;
> +   p = get_place_in_agg_contents_list (, offset, size,
> +   _there);
> +   if (!p)
> + break;
> +   n = XALLOCA (struct ipa_known_agg_contents_list);

I believe the elements of a constructor are always already sorted and
so xalloca and insert-sort performed by get_place_in_agg_contents_list
is not necessary.

> +   n->size = size;
> +   n->offset = offset;
> +   if (is_gimple_ip_invariant (val))
> + {

Nevertheless, thanks for the patch, as it shows how the issue can be
somewhat fixed fairly easily, even though imperfectly.  I have fixed
the issues above and came up with the patch below.  However, if you
look at the testcases you'll see xfails because even though we should
be able to find the target of all calls in foo and bar functions, in a
few testcases we only manage to find the first or none at all.

The first reason for that is that when we identify whether targets of
indirect calls come from a parameter, alias analysis tells us that the
first call might have changed the values in the aggregate and thus we
do not identify the subsequent calls as calling a target that we know
comes from a parameter.

To fix this, we must do two things.  First, store the parameter index
and offset in indirect edges even if alias analysis tells us that the
value might be overwritten; that possibility must be recorded there too,
so that we use the index/offset information only when we later
discover that the parameter points to a constant variable.  So this
needs to be marked somewhere in the jump function too.  That alone 

However, a similar AA issue is going to persist for pass-through jump
functions, as shown by xfailing ipcp-cstagg-7.c testcase.  To fix
that, we'd either have to force propagation of aggregate values from
constant globals even through jump functions that have agg_preserved
flag cleared, or, and I think this is perhaps a better idea, rethink
the whole 

Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-03-10 Thread James Greenhalgh
On Thu, Mar 10, 2016 at 10:32:15AM -0600, Evandro Menezes wrote:
> >I agree to postpone until GCC 7.
> >
> >[AArch64] Replace insn to zero up SIMD registers
> >
> >gcc/
> >* config/aarch64/aarch64.md
> >(*movhf_aarch64): Add "movi %0, #0" to zero up register.
> >(*movsf_aarch64): Likewise and add "simd" attributes.
> >(*movdf_aarch64): Likewise.
> >
> >This patch removes the FP attributes from the HF, SF, DF, TF moves.
> 
> And now, with the patch. :-/
> 

Thanks for sticking with it. This is OK for GCC 7 when development
opens.

Remember to mention the most recent changes in your Changelog entry
(Remove "fp" attribute from *movhf_aarch64 and *movtf_aarch64).

Thanks,
James



Re: Fix ICE in initialize_node_lattices

2016-03-10 Thread Jan Hubicka
> On Thu, Mar 10, 2016 at 05:12:59PM +0100, Jan Hubicka wrote:
> > --- testsuite/g++.dg/lto/pr69589_0.C(revision 0)
> > +++ testsuite/g++.dg/lto/pr69589_0.C(working copy)
> > @@ -0,0 +1,26 @@
> > +// { dg-lto-do link }
> > +// { dg-lto-options "-O2 -rdynamic" } 
> 
> Shouldn't this have
> // { dg-require-visibility "hidden" }
> when it uses the visibility push(hidden) ?

You are right, I will add it to the testcase.

Honza
> 
> > +// { dg-extra-ld-options "-r -nostdlib" }
> > +#pragma GCC visibility push(hidden)
> > +struct A { int [] (long); };
> > +template  struct B;
> > +template  >
> > +using Z = int;
> > +template  struct C;
> > +struct S {
> > +  int e;
> > +  virtual ~S () {}
> > +};
> > +struct D : S {
> > +  A a;
> > +  long i;
> > +  D() { { e ? [i] : nullptr; } }
> > +};
> > +template <>
> > +struct C { Z m8 () const; };
> > +Z
> > +C::m8 () const
> > +{
> > +  D ();
> > +}
> 
>   Jakub


Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-03-10 Thread Evandro Menezes

On 03/10/16 10:27, Evandro Menezes wrote:

On 03/10/16 07:23, James Greenhalgh wrote:

On Wed, Mar 09, 2016 at 03:35:43PM -0600, Evandro Menezes wrote:

On 03/01/16 13:08, Evandro Menezes wrote:

On 03/01/16 13:02, Wilco Dijkstra wrote:

Evandro Menezes wrote:

The meaning of these attributes is not clear to me.  Is there a
reference somewhere about which insns are FP or SIMD or neither?

The meaning should be clear, "fp" is a floating point
instruction, "simd" a SIMD one
as defined in ARM-ARM.

Indeed, I had to add the Y for the f_mcr insn to match it with nosimd.
However, I didn't feel that it should be moved to the right, since it's
already disparaged.  Am I missing some detail?

It might not matter for this specific case, but I have seen
reload forcing the very
first alternative without looking at any costs or preferences -
as long as it is legal.
This suggests we need to order alternatives from most preferred
alternative to least
preferred one.

I think it is good enough for commit, James?

Methinks that my issue with those attributes is that I'm not as
fluent in AArch64 as I'd like to be.

Please, feel free to edit the patch changing the order then.

Replace insn to zero up SIMD registers

gcc/
 * config/aarch64/aarch64.md
 (*movhf_aarch64): Add "movi %0, #0" to zero up register.
 (*movsf_aarch64): Likewise and add "simd" and "fp" attributes.
 (*movdf_aarch64): Likewise.

Swapped the order of the constraints to favor MOVI.

Just say the word...

I'm wondering whether this is appropriate for GCC6 now that we are so late
in the development cycle. Additionally, I have some comments on your patch:


diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 68676c9..4502a58 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1163,11 +1163,12 @@
  )
(define_insn "*movhf_aarch64"
-  [(set (match_operand:HF 0 "nonimmediate_operand" "=w, ?r,w,w,m,r,m ,r")
-(match_operand:HF 1 "general_operand"  "?rY, w,w,m,w,m,rY,r"))]
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=w,w ,?r,w,w,m,r,m ,r")
+(match_operand:HF 1 "general_operand"  "Y ,?rY, w,w,m,w,m,rY,r"))]
"TARGET_FLOAT && (register_operand (operands[0], HFmode)
  || aarch64_reg_or_fp_zero (operands[1], HFmode))"
"@
+   movi\\t%0.4h, #0
 mov\\t%0.h[0], %w1
 umov\\t%w0, %1.h[0]
 mov\\t%0.h[0], %1.h[0]
@@ -1176,18 +1177,19 @@
 ldrh\\t%w0, %1
 strh\\t%w1, %0
 mov\\t%w0, %w1"
-  [(set_attr "type" "neon_from_gp,neon_to_gp,neon_move,\
+  [(set_attr "type" "neon_move,neon_from_gp,neon_to_gp,neon_move,\
   f_loads,f_stores,load1,store1,mov_reg")
-   (set_attr "simd" "yes,yes,yes,*,*,*,*,*")
-   (set_attr "fp"   "*,*,*,yes,yes,*,*,*")]
+   (set_attr "simd" "yes,yes,yes,yes,*,*,*,*,*")
+   (set_attr "fp"   "*,*,*,*,yes,yes,*,*,*")]
  )
(define_insn "*movsf_aarch64"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=w, ?r,w,w  ,w,m,r,m ,r")
-(match_operand:SF 1 "general_operand"  "?rY, w,w,Ufc,m,w,m,rY,r"))]
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=w,w ,?r,w,w  ,w,m,r,m ,r")
+(match_operand:SF 1 "general_operand"  "Y ,?rY, w,w,Ufc,m,w,m,rY,r"))]
"TARGET_FLOAT && (register_operand (operands[0], SFmode)
  || aarch64_reg_or_fp_zero (operands[1], SFmode))"
"@
+   movi\\t%0.2s, #0
 fmov\\t%s0, %w1
 fmov\\t%w0, %s1
 fmov\\t%s0, %s1
@@ -1197,16 +1199,19 @@
 ldr\\t%w0, %1
 str\\t%w1, %0
 mov\\t%w0, %w1"
-  [(set_attr "type" "f_mcr,f_mrc,fmov,fconsts,\
- f_loads,f_stores,load1,store1,mov_reg")]
+  [(set_attr "type" "neon_move,f_mcr,f_mrc,fmov,fconsts,\
+ f_loads,f_stores,load1,store1,mov_reg")
+   (set_attr "simd" "yes,*,*,*,*,*,*,*,*,*")
+   (set_attr "fp"   "*,*,*,yes,yes,yes,yes,*,*,*")]
  )

This fp attribute looks wrong to me. The two fmov instructions that move
between core and FP registers should be tagged "yes". However, this is
irrelevant as the whole pattern is guarded by TARGET_FLOAT.

It would be clearer to drop the FP attribute entirely, so as not to give
the erroneous impression that some alternatives in this insn are enabled
for !TARGET_FLOAT.


You mean to remove the FP attribute from all, HF, SF, DF, TF?


  (define_insn "*movdf_aarch64"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=w, ?r,w,w  ,w,m,r,m ,r")
-(match_operand:DF 1 "general_operand"  "?rY, w,w,Ufc,m,w,m,rY,r"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=w,w ,?r,w,w  ,w,m,r,m ,r")
+(match_operand:DF 1 "general_operand"  "Y ,?rY, w,w,Ufc,m,w,m,rY,r"))]
"TARGET_FLOAT && (register_operand (operands[0], DFmode)
  || aarch64_reg_or_fp_zero (operands[1], DFmode))"
"@
+   movi\\t%d0, #0
 fmov\\t%d0, %x1
 fmov\\t%x0, %d1
 fmov\\t%d0, %d1
@@ -1216,8 +1221,10 @@
 ldr\\t%x0, %1
 str\\t%x1, %0
 mov\\t%x0, %x1"
-  

Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-03-10 Thread Evandro Menezes

On 03/10/16 07:23, James Greenhalgh wrote:

On Wed, Mar 09, 2016 at 03:35:43PM -0600, Evandro Menezes wrote:

On 03/01/16 13:08, Evandro Menezes wrote:

On 03/01/16 13:02, Wilco Dijkstra wrote:

Evandro Menezes wrote:

The meaning of these attributes is not clear to me.  Is there a
reference somewhere about which insns are FP or SIMD or neither?

The meaning should be clear, "fp" is a floating point
instruction, "simd" a SIMD one
as defined in ARM-ARM.


Indeed, I had to add the Y for the f_mcr insn to match it with nosimd.
However, I didn't feel that it should be moved to the right, since it's
already disparaged.  Am I missing some detail?

It might not matter for this specific case, but I have seen
reload forcing the very
first alternative without looking at any costs or preferences -
as long as it is legal.
This suggests we need to order alternatives from most preferred
alternative to least
preferred one.

I think it is good enough for commit, James?

Methinks that my issue with those attributes is that I'm not as
fluent in AArch64 as I'd like to be.

Please, feel free to edit the patch changing the order then.

Replace insn to zero up SIMD registers

gcc/
 * config/aarch64/aarch64.md
 (*movhf_aarch64): Add "movi %0, #0" to zero up register.
 (*movsf_aarch64): Likewise and add "simd" and "fp" attributes.
 (*movdf_aarch64): Likewise.

Swapped the order of the constraints to favor MOVI.

Just say the word...

I'm wondering whether this is appropriate for GCC6 now that we are so late
in the development cycle. Additionally, I have some comments on your patch:


diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 68676c9..4502a58 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1163,11 +1163,12 @@
  )
  
  (define_insn "*movhf_aarch64"

-  [(set (match_operand:HF 0 "nonimmediate_operand" "=w, ?r,w,w,m,r,m ,r")
-   (match_operand:HF 1 "general_operand"  "?rY, w,w,m,w,m,rY,r"))]
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=w,w  ,?r,w,w,m,r,m ,r")
+   (match_operand:HF 1 "general_operand"  "Y ,?rY, w,w,m,w,m,rY,r"))]
"TARGET_FLOAT && (register_operand (operands[0], HFmode)
  || aarch64_reg_or_fp_zero (operands[1], HFmode))"
"@
+   movi\\t%0.4h, #0
 mov\\t%0.h[0], %w1
 umov\\t%w0, %1.h[0]
 mov\\t%0.h[0], %1.h[0]
@@ -1176,18 +1177,19 @@
 ldrh\\t%w0, %1
 strh\\t%w1, %0
 mov\\t%w0, %w1"
-  [(set_attr "type" "neon_from_gp,neon_to_gp,neon_move,\
+  [(set_attr "type" "neon_move,neon_from_gp,neon_to_gp,neon_move,\
   f_loads,f_stores,load1,store1,mov_reg")
-   (set_attr "simd" "yes,yes,yes,*,*,*,*,*")
-   (set_attr "fp"   "*,*,*,yes,yes,*,*,*")]
+   (set_attr "simd" "yes,yes,yes,yes,*,*,*,*,*")
+   (set_attr "fp"   "*,*,*,*,yes,yes,*,*,*")]
  )
  
  (define_insn "*movsf_aarch64"

-  [(set (match_operand:SF 0 "nonimmediate_operand" "=w, ?r,w,w  ,w,m,r,m ,r")
-   (match_operand:SF 1 "general_operand"  "?rY, w,w,Ufc,m,w,m,rY,r"))]
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=w,w  ,?r,w,w  ,w,m,r,m ,r")
+   (match_operand:SF 1 "general_operand"  "Y ,?rY, w,w,Ufc,m,w,m,rY,r"))]
"TARGET_FLOAT && (register_operand (operands[0], SFmode)
  || aarch64_reg_or_fp_zero (operands[1], SFmode))"
"@
+   movi\\t%0.2s, #0
 fmov\\t%s0, %w1
 fmov\\t%w0, %s1
 fmov\\t%s0, %s1
@@ -1197,16 +1199,19 @@
 ldr\\t%w0, %1
 str\\t%w1, %0
 mov\\t%w0, %w1"
-  [(set_attr "type" "f_mcr,f_mrc,fmov,fconsts,\
- f_loads,f_stores,load1,store1,mov_reg")]
+  [(set_attr "type" "neon_move,f_mcr,f_mrc,fmov,fconsts,\
+ f_loads,f_stores,load1,store1,mov_reg")
+   (set_attr "simd" "yes,*,*,*,*,*,*,*,*,*")
+   (set_attr "fp"   "*,*,*,yes,yes,yes,yes,*,*,*")]
  )

This fp attribute looks wrong to me. The two fmov instructions that move
between core and FP registers should be tagged "yes". However, this is
irrelevant as the whole pattern is guarded by TARGET_FLOAT.

It would be clearer to drop the FP attribute entirely, so as not to give
the erroneous impression that some alternatives in this insn are enabled
for !TARGET_FLOAT.


You mean to remove the FP attribute from all, HF, SF, DF, TF?


  (define_insn "*movdf_aarch64"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=w, ?r,w,w  ,w,m,r,m ,r")
-   (match_operand:DF 1 "general_operand"  "?rY, w,w,Ufc,m,w,m,rY,r"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=w,w  ,?r,w,w  ,w,m,r,m ,r")
+   (match_operand:DF 1 "general_operand"  "Y ,?rY, w,w,Ufc,m,w,m,rY,r"))]
"TARGET_FLOAT && (register_operand (operands[0], DFmode)
  || aarch64_reg_or_fp_zero (operands[1], DFmode))"
"@
+   movi\\t%d0, #0
 fmov\\t%d0, %x1
 fmov\\t%x0, %d1
 fmov\\t%d0, %d1
@@ -1216,8 +1221,10 @@
 ldr\\t%x0, %1
 str\\t%x1, %0
 mov\\t%x0, %x1"
-  [(set_attr "type" 

Re: [PATCH][AArch64] Fix thinko in handling of -momit-leaf-frame-pointer option

2016-03-10 Thread James Greenhalgh
On Thu, Mar 10, 2016 at 03:42:38PM +, Kyrill Tkachov wrote:
> Hi all,
> 
> When extending the aarch64_handle_option function for GCC 6 I introduced a
> thinko when handling the -momit-leaf-frame-pointer option and had it set the
> variable for -fomit-frame-pointer instead. This hasn't been particularly
> impactful because -momit-leaf-frame-pointer is on by default and will set
> -fomit-frame-pointer to true anyway in
> aarch64_override_options_after_change_1, but it would be exposed by the fix
> for PR 70044.
> 
> This patch fixes that thinko.
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?

Looks obvious to me from your description. OK for trunk.

Thanks,
James

> 2016-03-10  Kyrylo Tkachov  
> 
> * common/config/aarch64/aarch64-common.c (aarch64_handle_option):
> Set x_flag_omit_leaf_frame_pointer when handling
> -momit-leaf-frame-pointer.

> diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
> index 69f5da70e3d796752040de80ab1594a260e613d1..4969f07fef4c1c7d3bd7e5fa03149a4945e30b23 100644
> --- a/gcc/common/config/aarch64/aarch64-common.c
> +++ b/gcc/common/config/aarch64/aarch64-common.c
> @@ -98,7 +98,7 @@ aarch64_handle_option (struct gcc_options *opts,
>return true;
>  
>  case OPT_momit_leaf_frame_pointer:
> -  opts->x_flag_omit_frame_pointer = val;
> +  opts->x_flag_omit_leaf_frame_pointer = val;
>return true;
>  
>  default:



Re: Fix ICE in initialize_node_lattices

2016-03-10 Thread Jakub Jelinek
On Thu, Mar 10, 2016 at 05:12:59PM +0100, Jan Hubicka wrote:
> --- testsuite/g++.dg/lto/pr69589_0.C  (revision 0)
> +++ testsuite/g++.dg/lto/pr69589_0.C  (working copy)
> @@ -0,0 +1,26 @@
> +// { dg-lto-do link }
> +// { dg-lto-options "-O2 -rdynamic" } 

Shouldn't this have
// { dg-require-visibility "hidden" }
when it uses the visibility push(hidden) ?

> +// { dg-extra-ld-options "-r -nostdlib" }
> +#pragma GCC visibility push(hidden)
> +struct A { int [] (long); };
> +template  struct B;
> +template  >
> +using Z = int;
> +template  struct C;
> +struct S {
> +  int e;
> +  virtual ~S () {}
> +};
> +struct D : S {
> +  A a;
> +  long i;
> +  D() { { e ? [i] : nullptr; } }
> +};
> +template <>
> +struct C { Z m8 () const; };
> +Z
> +C::m8 () const
> +{
> +  D ();
> +}

Jakub


Re: [PATCH] Add -funconstrained-commons to work around PR/69368 (and others) in SPEC2006

2016-03-10 Thread Dominique d'Humières
Alan,

The test gfortran.dg/unconstrained_commons.f fails in 32-bit mode. It needs
a regexp that captures

  mycommon.x[j_1] = _5;

as in "  mycommon\.x\[.?_[0-9]+\] = _[0-9]+;"
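To check what the suggested pattern accepts, here is a minimal sketch using std::regex as a stand-in for the Tcl regexp matching the testsuite actually does. The function name and the sample lines in the usage note are made up for illustration; only the pattern itself comes from the message above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true if LINE contains a store to mycommon.x in the form the
// suggested pattern describes: an optional single character, an underscore
// and an SSA-style number inside the brackets, then "= _<number>;".
// std::regex (ECMAScript syntax) is an assumption made for this sketch; the
// real test would use a dg-final scan directive with a Tcl regexp.
inline bool matches_common_store(const std::string &line) {
  static const std::regex pattern(
      R"(  mycommon\.x\[.?_[0-9]+\] = _[0-9]+;)");
  return std::regex_search(line, pattern);
}
```

With this, both "  mycommon.x[j_1] = _5;" and "  mycommon.x[_1] = _5;" match, because ".?" accepts zero or one character before the underscore, while an index such as "jj_1" does not.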

TIA

Dominique



Fix ICE in initialize_node_lattices

2016-03-10 Thread Jan Hubicka
Hi,
this patch fixes an ICE in ipa-cp, where it detects a local function that is
dead.  The issue is that virtual functions may have no address taken but may
still be virtual call targets.  This patch makes remove_unreachable_nodes
handle this and not mark them local (since they are not).
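
As an illustration of the situation described above (a source-level sketch only, not the PR testcase; all names are made up), the virtual function below never has its address taken explicitly, yet it remains a possible target of the polymorphic call:

```cpp
#include <cassert>

struct Shape {
  virtual ~Shape() {}
  virtual int sides() const { return 0; }
};

struct Triangle : Shape {
  // Nothing in this file takes the address of Triangle::sides or calls it
  // directly; it is reachable only through the vtable.
  int sides() const override { return 3; }
};

// A polymorphic call: which body runs depends on the dynamic type of S, so
// every override of Shape::sides is a possible call target.
inline int count_sides(const Shape &s) { return s.sides(); }
```

Calling count_sides on a Triangle returns 3 even though no direct reference to Triangle::sides exists, which is why treating such a function as local and dead would be wrong.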

Honza

* cgraph.c (cgraph_node::dump): Dump split_part and 
indirect_call_target.
* cgraph.h (cgraph_node): Add indirect_call_target flag.
* ipa.c (has_addr_references_p): Cleanup.
(is_indirect_call_target_p): New.
(walk_polymorphic_call_targets): Do not mark virtuals that may be
called indirectly as local.
(symbol_table::remove_unreachable_nodes): Compute indirect_call_target.

PR lto/69589
* g++.dg/lto/pr69589_0.C: New testcase
* g++.dg/lto/pr69589_1.C: New testcase
Index: cgraph.c
===
--- cgraph.c(revision 234108)
+++ cgraph.c(working copy)
@@ -2061,6 +2061,10 @@ cgraph_node::dump (FILE *f)
 fprintf (f, " icf_merged");
   if (merged_comdat)
 fprintf (f, " merged_comdat");
+  if (split_part)
+fprintf (f, " split_part");
+  if (indirect_call_target)
+fprintf (f, " indirect_call_target");
   if (nonfreeing_fn)
 fprintf (f, " nonfreeing_fn");
   if (DECL_STATIC_CONSTRUCTOR (decl))
Index: cgraph.h
===
--- cgraph.h(revision 234108)
+++ cgraph.h(working copy)
@@ -1366,6 +1366,8 @@ public:
   unsigned parallelized_function : 1;
   /* True if function is part split out by ipa-split.  */
   unsigned split_part : 1;
+  /* True if the function appears as possible target of indirect call.  */
+  unsigned indirect_call_target : 1;
 
 private:
   /* Worker for call_for_symbol_and_aliases.  */
Index: ipa.c
===
--- ipa.c   (revision 234108)
+++ ipa.c   (working copy)
@@ -41,7 +41,7 @@ along with GCC; see the file COPYING3.
 
 static bool
 has_addr_references_p (struct cgraph_node *node,
-  void *data ATTRIBUTE_UNUSED)
+  void *)
 {
   int i;
   struct ipa_ref *ref = NULL;
@@ -52,6 +52,14 @@ has_addr_references_p (struct cgraph_nod
   return false;
 }
 
+/* Return true when NODE can be target of an indirect call.  */
+
+static bool
+is_indirect_call_target_p (struct cgraph_node *node, void *)
+{
+  return node->indirect_call_target;
+}
+
 /* Look for all functions inlined to NODE and update their inlined_to pointers
to INLINED_TO.  */
 
@@ -172,23 +180,24 @@ walk_polymorphic_call_targets (hash_set<
(TYPE_METHOD_BASETYPE (TREE_TYPE (n->decl
continue;
 
-  symtab_node *body = n->function_symbol ();
+ n->indirect_call_target = true;
+ symtab_node *body = n->function_symbol ();
 
  /* Prior inlining, keep alive bodies of possible targets for
 devirtualization.  */
-  if (n->definition
-  && (before_inlining_p
-  && opt_for_fn (body->decl, optimize)
-  && opt_for_fn (body->decl, flag_devirtualize)))
- {
-/* Be sure that we will not optimize out alias target
-   body.  */
-if (DECL_EXTERNAL (n->decl)
-&& n->alias
-&& before_inlining_p)
-  reachable->add (body);
-   reachable->add (n);
- }
+ if (n->definition
+ && (before_inlining_p
+ && opt_for_fn (body->decl, optimize)
+ && opt_for_fn (body->decl, flag_devirtualize)))
+{
+   /* Be sure that we will not optimize out alias target
+  body.  */
+   if (DECL_EXTERNAL (n->decl)
+   && n->alias
+   && before_inlining_p)
+ reachable->add (body);
+  reachable->add (n);
+}
  /* Even after inlining we want to keep the possible targets in the
 boundary, so late passes can still produce direct call even if
 the chance for inlining is lost.  */
@@ -323,6 +332,7 @@ symbol_table::remove_unreachable_nodes (
   FOR_EACH_FUNCTION (node)
 {
   node->used_as_abstract_origin = false;
+  node->indirect_call_target = false;
   if (node->definition
  && !node->global.inlined_to
  && !node->in_other_partition
@@ -659,7 +669,14 @@ symbol_table::remove_unreachable_nodes (
  fprintf (file, " %s", node->name ());
node->address_taken = false;
changed = true;
-   if (node->local_p ())
+   if (node->local_p ()
+   /* Virtual functions may be kept in cgraph just because
+  of possible later devirtualization.  Do not mark them as
+  local too early so we won't optimize them out 

Better fix for PR69630

2016-03-10 Thread Jan Hubicka
Hi,
my first fix was only partial: it fixed the ICE in one branch but not in the
other.  The issue is that the code does not expect cxa_pure_virtual in the
target list and ICEs when handling it as a method while checking whether it
should suggest final.

Bootstrapped/regtested x86_64-linux, committed.

ipa/69630
* ipa-devirt.c (possible_polymorphic_call_targets): Do not ICE
on cxa_pure_virtual.
Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 234108)
+++ ipa-devirt.c(working copy)
@@ -3177,11 +3177,11 @@ possible_polymorphic_call_targets (tree
 
  if (!outer_type->all_derivations_known)
{
- if (!speculative && final_warning_records)
+ if (!speculative && final_warning_records
+ && TREE_CODE (TREE_TYPE (nodes[0]->decl)) == METHOD_TYPE)
{
  if (complete
  && nodes.length () == 1
- && TREE_CODE (TREE_TYPE (nodes[0]->decl)) == METHOD_TYPE
  && warn_suggest_final_types
  && !outer_type->derived_types.length ())
{


Re: RFA: PR 70044: Catch a second call to aarch64_override_options_after_change

2016-03-10 Thread Kyrill Tkachov


On 10/03/16 15:23, James Greenhalgh wrote:

On Mon, Mar 07, 2016 at 01:12:16PM +, Nick Clifton wrote:

Hi Kyrill,


This is missing a second hunk from the patch you attached in the PR that I
think is necessary for this to work (setting to x_flag_omit_frame_pointer)...

Doh!  Silly me - there was a snafu restoring the patch after I had reverted it
in order to check that the pre- and post- patch gcc test results were the same.


Note that this patch would expose a bug in 
common/config/aarch64/aarch64-common.c
where there's a thinko in the handling of OPT_momit_leaf_frame_pointer.
That's my bad and I'll propose a patch for it soon.

OK.


Also, is there a way to create a testcase for the testsuite?
i.e. is there a simple way to scan the assembly generated after the final LTO
processing for the presence of the frame pointer?

Originally I thought not.  But then I found scan-lto-assembler in
testsuite/lib/scanasm.exp and that made everything simple.

So attached is a revised patch with the missing second hunk restored and a
testcase added.  (Which I have checked and confirmed that it does fail without
the patch and it does pass with the patch applied).

OK to apply ?

OK, thanks.


Note that this patch would expose a bug in 
common/config/aarch64/aarch64-common.c
where there's a thinko in the handling of OPT_momit_leaf_frame_pointer.
That's my bad and I'll propose a patch for it soon.

I don't think I've seen this on list yet, it might be worth waiting until
Kyrill has put this patch up before you commit.


Posted at https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00638.html

Kyrill


Thanks,
James





[PATCH][AArch64] Fix thinko in handling of -momit-leaf-frame-pointer option

2016-03-10 Thread Kyrill Tkachov

Hi all,

When extending the aarch64_handle_option function for GCC 6 I introduced a
thinko when handling the -momit-leaf-frame-pointer option and had it set the
variable for -fomit-frame-pointer instead. This hasn't been particularly
impactful because -momit-leaf-frame-pointer is on by default and will set
-fomit-frame-pointer to true anyway in aarch64_override_options_after_change_1,
but it would be exposed by the fix for PR 70044.
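
To make the thinko concrete, here is a minimal sketch with a toy option struct. ToyOptions, ToyOpt and both handler functions are hypothetical stand-ins, not GCC's struct gcc_options or aarch64_handle_option, and the default values are assumptions chosen so the difference is visible.

```cpp
#include <cassert>

// Hypothetical, heavily simplified model of the two flags involved.
struct ToyOptions {
  int omit_frame_pointer = 1;       // models x_flag_omit_frame_pointer
  int omit_leaf_frame_pointer = 1;  // models x_flag_omit_leaf_frame_pointer
};

enum class ToyOpt { fomit_frame_pointer, momit_leaf_frame_pointer };

// Before the fix: the -momit-leaf-frame-pointer case writes the wrong field.
inline bool handle_option_buggy(ToyOptions &opts, ToyOpt code, int val) {
  switch (code) {
    case ToyOpt::fomit_frame_pointer:
      opts.omit_frame_pointer = val;
      return true;
    case ToyOpt::momit_leaf_frame_pointer:
      opts.omit_frame_pointer = val;  // the thinko
      return true;
  }
  return false;
}

// After the fix: each option updates its own field.
inline bool handle_option_fixed(ToyOptions &opts, ToyOpt code, int val) {
  switch (code) {
    case ToyOpt::fomit_frame_pointer:
      opts.omit_frame_pointer = val;
      return true;
    case ToyOpt::momit_leaf_frame_pointer:
      opts.omit_leaf_frame_pointer = val;
      return true;
  }
  return false;
}
```

Passing the leaf option with val 0 to the buggy handler clears omit_frame_pointer and leaves omit_leaf_frame_pointer untouched; the fixed handler does the opposite, which is the behaviour the one-line patch restores.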

This patch fixes that thinko.
Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2016-03-10  Kyrylo Tkachov  

* common/config/aarch64/aarch64-common.c (aarch64_handle_option):
Set x_flag_omit_leaf_frame_pointer when handling
-momit-leaf-frame-pointer.
diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 69f5da70e3d796752040de80ab1594a260e613d1..4969f07fef4c1c7d3bd7e5fa03149a4945e30b23 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -98,7 +98,7 @@ aarch64_handle_option (struct gcc_options *opts,
   return true;
 
 case OPT_momit_leaf_frame_pointer:
-  opts->x_flag_omit_frame_pointer = val;
+  opts->x_flag_omit_leaf_frame_pointer = val;
   return true;
 
 default:


Re: Add C++ special math functions to C++17

2016-03-10 Thread Ed Smith-Rowland

On 03/10/2016 05:24 AM, Tobias Burnus wrote:

Ed Smith-Rowland wrote:

--- include/bits/specfun.h  (svn+ssh://e...@gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/include) (working copy)
+++ svn+ssh://e...@gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/include   (revision 234098)
@@ -1,6 +1,6 @@
  // Mathematical Special Functions for -*- C++ -*-
  
-// Copyright (C) 2006-2016 Free Software Foundation, Inc.

+// Copyright (C) 2006-2015 Free Software Foundation, Inc.

The patch looks reversed ("---" for the "working copy", copyright-year change
reversed etc.)

Cheers,

Tobias


Third time's a charm.
(I accidentally checked these into my branch along with a general merge from
trunk - hence my issues)


2016-03-10  Edward Smith-Rowland  <3dw...@verizon.net>

Pull C++ Special Math into std for C++17.
* include/bits/specfun.h: Allow special functions for C++17;
define __cpp_lib_math_special_functions.
* include/c_global/cmath: Allow special functions for C++17.
* include/c_compatibility/math.h: Ditto
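
The two preprocessor conditions in the patch below can be modelled as plain functions to see which combinations enable the special functions. This is only a sketch: the function names are made up, and the arguments stand for the values of __cplusplus and __STDCPP_WANT_MATH_SPEC_FUNCS__ (an undefined macro evaluates to 0 inside #if).

```cpp
#include <cassert>

// Models: #if __cplusplus > 201402L || __STDCPP_WANT_MATH_SPEC_FUNCS__ == 1
// i.e. the guard used in <cmath> and math.h after the patch.
constexpr bool special_functions_visible(long cplusplus, int want_macro) {
  return cplusplus > 201402L || want_macro == 1;
}

// Models: #if __cplusplus <= 201402L && __STDCPP_WANT_MATH_SPEC_FUNCS__ == 0
// i.e. the condition under which bits/specfun.h emits its #error.
constexpr bool specfun_header_errors(long cplusplus, int want_macro) {
  return cplusplus <= 201402L && want_macro == 0;
}
```

Under this model C++14 code (201402L) still has to opt in with the macro, while any later __cplusplus value gets the functions unconditionally.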

Index: include/bits/specfun.h
===
--- include/bits/specfun.h  (svn+ssh://e...@gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/include) (revision 234110)
+++ include/bits/specfun.h  (.../include)   (working copy)
@@ -1,6 +1,6 @@
 // Mathematical Special Functions for -*- C++ -*-
 
-// Copyright (C) 2006-2015 Free Software Foundation, Inc.
+// Copyright (C) 2006-2016 Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -34,15 +34,21 @@
 
 #include 
 
-#if __STDCPP_WANT_MATH_SPEC_FUNCS__ == 0
+#define __STDCPP_MATH_SPEC_FUNCS__ 201003L
+
+#define __cpp_lib_math_special_functions 201603L
+
+#if __cplusplus <= 201402L && __STDCPP_WANT_MATH_SPEC_FUNCS__ == 0
 # error include  and define __STDCPP_WANT_MATH_SPEC_FUNCS__
 #endif
 
-#define __STDCPP_MATH_SPEC_FUNCS__ 201003L
-
+#include 
 #include 
-#include 
-#include 
+#if __cplusplus >= 201103L
+#  include 
+#else
+#  include 
+#endif
 
 #include 
 #include 
Index: include/c_global/cmath
===
--- include/c_global/cmath  (svn+ssh://e...@gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/include) (revision 234110)
+++ include/c_global/cmath  (.../include)   (working copy)
@@ -1790,7 +1790,7 @@
 
 #endif // C++11
 
-#if __STDCPP_WANT_MATH_SPEC_FUNCS__ == 1
+#if __cplusplus > 201402L || __STDCPP_WANT_MATH_SPEC_FUNCS__ == 1
 #  include 
 #endif
 
Index: include/c_compatibility/math.h
===
--- include/c_compatibility/math.h  (svn+ssh://e...@gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/include) (revision 234110)
+++ include/c_compatibility/math.h  (.../include)   (working copy)
@@ -111,7 +111,7 @@
 using std::trunc;
 #endif // C++11 && _GLIBCXX_USE_C99_MATH_TR1
 
-#if __STDCPP_WANT_MATH_SPEC_FUNCS__ == 1
+#if __cplusplus > 201402L || __STDCPP_WANT_MATH_SPEC_FUNCS__ == 1
 using std::assoc_laguerref;
 using std::assoc_laguerrel;
 using std::assoc_laguerre;


Re: RFA: PR 70044: Catch a second call to aarch64_override_options_after_change

2016-03-10 Thread James Greenhalgh
On Mon, Mar 07, 2016 at 01:12:16PM +, Nick Clifton wrote:
> Hi Kyrill,
> 
> > This is missing a second hunk from the patch you attached in the PR that I
> > think is necessary for this to work (setting to x_flag_omit_frame_pointer)...
> 
> Doh!  Silly me - there was a snafu restoring the patch after I had reverted
> it in order to check that the pre- and post- patch gcc test results were the
> same.
> 
> > Note that this patch would expose a bug in
> > common/config/aarch64/aarch64-common.c
> > where there's a thinko in the handling of OPT_momit_leaf_frame_pointer.
> > That's my bad and I'll propose a patch for it soon.
> 
> OK.
> 
> > Also, is there a way to create a testcase for the testsuite?
> > i.e. is there a simple way to scan the assembly generated after the final
> > LTO processing for the presence of the frame pointer?
> 
> Originally I thought not.  But then I found scan-lto-assembler in
> testsuite/lib/scanasm.exp and that made everything simple.
> 
> So attached is a revised patch with the missing second hunk restored and a
> testcase added.  (Which I have checked and confirmed that it does fail
> without the patch and it does pass with the patch applied).
> 
> OK to apply ?

OK, thanks.

> > Note that this patch would expose a bug in 
> > common/config/aarch64/aarch64-common.c
> > where there's a thinko in the handling of OPT_momit_leaf_frame_pointer.
> > That's my bad and I'll propose a patch for it soon.

I don't think I've seen this on list yet, it might be worth waiting until
Kyrill has put this patch up before you commit.

Thanks,
James



Re: C++ PATCH for c++/70153 (unhandled UNARY_PLUS_EXPR causes stack overflow)

2016-03-10 Thread Marek Polacek
On Thu, Mar 10, 2016 at 09:59:07AM -0500, Jason Merrill wrote:
> On 03/10/2016 09:53 AM, Marek Polacek wrote:
> >+case UNARY_PLUS_EXPR:
> >+  op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops);
> 
> You can just use cp_fold_rvalue here.  OK with that change.

Thanks.  Final patch I'm about to install:

2016-03-10  Marek Polacek  

PR c++/70153
* cp-gimplify.c (cp_fold): Handle UNARY_PLUS_EXPR.

* g++.dg/delayedfold/unary-plus1.C: New test.

diff --git gcc/cp/cp-gimplify.c gcc/cp/cp-gimplify.c
index 6af3760..6a767fa 100644
--- gcc/cp/cp-gimplify.c
+++ gcc/cp/cp-gimplify.c
@@ -1996,7 +1996,6 @@ cp_fold (tree x)
 case BIT_NOT_EXPR:
 case TRUTH_NOT_EXPR:
 case FIXED_CONVERT_EXPR:
-case UNARY_PLUS_EXPR:
 case INDIRECT_REF:
 
   loc = EXPR_LOCATION (x);
@@ -2016,6 +2015,14 @@ cp_fold (tree x)
  || !VOID_TYPE_P (TREE_TYPE (TREE_OPERAND (x, 0;
   break;
 
+case UNARY_PLUS_EXPR:
+  op0 = cp_fold_rvalue (TREE_OPERAND (x, 0));
+  if (op0 == error_mark_node)
+   x = error_mark_node;
+  else
+   x = fold_convert (TREE_TYPE (x), op0);
+  break;
+
 case POSTDECREMENT_EXPR:
 case POSTINCREMENT_EXPR:
 case INIT_EXPR:
diff --git gcc/testsuite/g++.dg/delayedfold/unary-plus1.C gcc/testsuite/g++.dg/delayedfold/unary-plus1.C
index e69de29..ebf3493 100644
--- gcc/testsuite/g++.dg/delayedfold/unary-plus1.C
+++ gcc/testsuite/g++.dg/delayedfold/unary-plus1.C
@@ -0,0 +1,22 @@
+// PR c++/70153
+// { dg-do run }
+
+unsigned long long int
+fn1 (unsigned long long int i)
+{
+  return 2ULL * ((1 + (unsigned long int) +1) * i);
+}
+
+unsigned long long int
+fn2 (unsigned long long int i)
+{
+  return 2ULL * ((1 + (unsigned long int) +(1 + 0)) * i);
+}
+
+int
+main (void)
+{
+  if (fn1 (3ULL) != 12ULL
+  || fn2 (3ULL) != 12ULL)
+__builtin_abort ();
+}

Marek


Re: C++ PATCH for c++/70153 (unhandled UNARY_PLUS_EXPR causes stack overflow)

2016-03-10 Thread Jason Merrill

On 03/10/2016 09:53 AM, Marek Polacek wrote:

+case UNARY_PLUS_EXPR:
+  op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops);


You can just use cp_fold_rvalue here.  OK with that change.

Jason




Re: C++ PATCH for c++/70153 (unhandled UNARY_PLUS_EXPR causes stack overflow)

2016-03-10 Thread Marek Polacek
On Thu, Mar 10, 2016 at 09:34:17AM -0500, Jason Merrill wrote:
> On 03/10/2016 08:07 AM, Marek Polacek wrote:
> >On Thu, Mar 10, 2016 at 01:56:41PM +0100, Jakub Jelinek wrote:
> >>On Thu, Mar 10, 2016 at 01:46:45PM +0100, Marek Polacek wrote:
> >>>2016-03-10  Marek Polacek  
> >>>
> >>>   PR c++/70153
> >>>   * cp-gimplify.c (cp_fold): Handle UNARY_PLUS_EXPR.
> >>>
> >>>   * g++.dg/delayedfold/unary-plus1.C: New test.
> >>>
> >>>diff --git gcc/cp/cp-gimplify.c gcc/cp/cp-gimplify.c
> >>>index 6af3760..db23efe 100644
> >>>--- gcc/cp/cp-gimplify.c
> >>>+++ gcc/cp/cp-gimplify.c
> >>>@@ -2009,6 +2009,8 @@ cp_fold (tree x)
> >>> else
> >>>   x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
> >>>   }
> >>>+  else if (code == UNARY_PLUS_EXPR)
> >>>+  x = fold_convert (TREE_TYPE (x), op0);
> >>>else
> >>>   x = fold (x);
> >>>
> >>
> >>Won't this still leak UNARY_PLUS_EXPR into the folded result if
> >>you could fold the operand of that?  It will take the
> >>   x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
> >>path...
> >
> >...of course :(.  Testcase for that:
> >return 2ULL * ((1 + (unsigned long int) +(1 + 0)) * i);
> >
> >>Wouldn't it be better to just handle case UNARY_PLUS_EXPR:
> >>separately, and always optimize it away?
> >>So like:
> >> case UNARY_PLUS_EXPR:
> >>   loc = EXPR_LOCATION (x);
> >>   op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops);
> >>   if (op0 == error_mark_node)
> >>x = error_mark_node;
> >>   else
> >> x = fold_convert_loc (loc, TREE_TYPE (x), op0);
> >>   break;
> >>or so?
> >
> >Let's ask Jason.  If he prefers this approach, I'll get it done.
> 
> Sounds good.

So I went ahead and regtested this one.  It uses fold_convert instead
of fold_convert_loc because I think that's what's desirable here.
I've extended the test so that it also checks +(1 + 0).
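
For readers following the thread, the idea of always dropping the unary plus after folding its operand can be sketched on a toy expression tree. Everything here (Kind, Expr, fold and the helpers) is hypothetical and unrelated to GCC's tree API; in particular the toy ignores types, whereas the real patch converts the folded operand to the type of the original expression.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy expression tree; these names are illustrative and are not GCC's.
enum class Kind { Constant, Variable, Add, UnaryPlus };

struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
  Kind kind;
  long value;   // meaningful for Constant only
  ExprPtr lhs;  // operand of UnaryPlus, left operand of Add
  ExprPtr rhs;  // right operand of Add
};

inline ExprPtr make(Kind k, long v, ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(Expr{k, v, std::move(l), std::move(r)});
}
inline ExprPtr constant(long v) { return make(Kind::Constant, v, nullptr, nullptr); }
inline ExprPtr variable() { return make(Kind::Variable, 0, nullptr, nullptr); }
inline ExprPtr add(ExprPtr l, ExprPtr r) {
  return make(Kind::Add, 0, std::move(l), std::move(r));
}
inline ExprPtr unary_plus(ExprPtr op) {
  return make(Kind::UnaryPlus, 0, std::move(op), nullptr);
}

// Fold E bottom-up.  A UnaryPlus node is never rebuilt: the result is just
// its folded operand, whether or not that operand became a constant.
inline ExprPtr fold(const ExprPtr &e) {
  switch (e->kind) {
    case Kind::Constant:
    case Kind::Variable:
      return e;
    case Kind::UnaryPlus:
      return fold(e->lhs);
    case Kind::Add: {
      ExprPtr l = fold(e->lhs);
      ExprPtr r = fold(e->rhs);
      if (l->kind == Kind::Constant && r->kind == Kind::Constant)
        return constant(l->value + r->value);
      return add(l, r);
    }
  }
  return e;
}

// True if any node in E is still a UnaryPlus.
inline bool contains_unary_plus(const ExprPtr &e) {
  if (!e)
    return false;
  return e->kind == Kind::UnaryPlus || contains_unary_plus(e->lhs) ||
         contains_unary_plus(e->rhs);
}
```

Folding +(1 + 0) in this model yields the constant 1, and folding 1 + (+v) for a variable v yields an Add with no UnaryPlus left underneath, which corresponds to the two shapes the extended test exercises.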

Ok if the bootstrap passes as well?

2016-03-10  Marek Polacek  

PR c++/70153
* cp-gimplify.c (cp_fold): Handle UNARY_PLUS_EXPR.

* g++.dg/delayedfold/unary-plus1.C: New test.

diff --git gcc/cp/cp-gimplify.c gcc/cp/cp-gimplify.c
index 6af3760..71ac588 100644
--- gcc/cp/cp-gimplify.c
+++ gcc/cp/cp-gimplify.c
@@ -1996,7 +1996,6 @@ cp_fold (tree x)
 case BIT_NOT_EXPR:
 case TRUTH_NOT_EXPR:
 case FIXED_CONVERT_EXPR:
-case UNARY_PLUS_EXPR:
 case INDIRECT_REF:
 
   loc = EXPR_LOCATION (x);
@@ -2016,6 +2015,14 @@ cp_fold (tree x)
  || !VOID_TYPE_P (TREE_TYPE (TREE_OPERAND (x, 0;
   break;
 
+case UNARY_PLUS_EXPR:
+  op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops);
+  if (op0 == error_mark_node)
+   x = error_mark_node;
+  else
+   x = fold_convert (TREE_TYPE (x), op0);
+  break;
+
 case POSTDECREMENT_EXPR:
 case POSTINCREMENT_EXPR:
 case INIT_EXPR:
diff --git gcc/testsuite/g++.dg/delayedfold/unary-plus1.C gcc/testsuite/g++.dg/delayedfold/unary-plus1.C
index e69de29..ebf3493 100644
--- gcc/testsuite/g++.dg/delayedfold/unary-plus1.C
+++ gcc/testsuite/g++.dg/delayedfold/unary-plus1.C
@@ -0,0 +1,22 @@
+// PR c++/70153
+// { dg-do run }
+
+unsigned long long int
+fn1 (unsigned long long int i)
+{
+  return 2ULL * ((1 + (unsigned long int) +1) * i);
+}
+
+unsigned long long int
+fn2 (unsigned long long int i)
+{
+  return 2ULL * ((1 + (unsigned long int) +(1 + 0)) * i);
+}
+
+int
+main (void)
+{
+  if (fn1 (3ULL) != 12ULL
+  || fn2 (3ULL) != 12ULL)
+__builtin_abort ();
+}

Marek


Re: [PATCH][AArch64] PR target/70002: Make aarch64_set_current_function play nice with pragma resetting

2016-03-10 Thread James Greenhalgh
On Thu, Mar 03, 2016 at 11:38:11AM +, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch fixes the ICE that was introduced by my earlier patch to
> aarch64_set_current_function:
> FAIL: gcc.dg/torture/pr52429.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)
> 
> And it also fixes a bug that I was working on separately relating to popping
> pragmas.
> The patch rewrites the aarch64_set_current_function implementation to be the
> same as the one in the arm port that Christian wrote and which is simpler
> than the existing implementation and has been tested for some time without
> problems. I've thought this was the way to go but was hoping to do it for
> GCC 7 instead, but I think given the ICE we'd rather have consistent
> implementations of this hook between arm and aarch64 (and ideally this should
> be moved into the midend for all targets, I don't see much target-specific
> information in the implementation of this across the targets, but not at
> this stage).
> 
> Similar to that implementation the setting and restoring of the target
> globals is factored into a separate function that is used in
> aarch64_set_current_function and the pragma handling function to tell the
> midend when to reinitialise its structures.
> 
> This patch fixes the ICE, the testcase attached, and passes bootstrap and
> regression testing on aarch64-none-linux-gnu.
> 
> Sorry for missing the ICE originally.
> Ok for trunk?

OK with the typos below fixed.

> diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
> index 3590ae0daa5d80050b0f81cd6ab9a7779f463516..e057daaec24c0add673d0b2c776d4c4c43d1f0ea 100644
> --- a/gcc/config/aarch64/aarch64-c.c
> +++ b/gcc/config/aarch64/aarch64-c.c
> @@ -178,6 +178,12 @@ aarch64_pragma_target_parse (tree args, tree pop_target)
>  
>cpp_opts->warn_unused_macros = saved_warn_unused_macros;
>  
> +  /* If we're popping or reseting make sure to update the globals so that
> + the optab availability predicates get recomputed.  */
> +  if (pop_target)
> +aarch64_save_restore_target_globals (pop_target);
> +
> +

Extra newline.

>/* Initialize SIMD builtins if we haven't already.
>   Set current_target_pragma to NULL for the duration so that
>   the builtin initialization code doesn't try to tag the functions
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index e4e49fc9ccc3d568c84b35c1a0c0733475017cca..c40d2b0c78494b50508c1b5135b8ee7676a61631 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -361,6 +361,7 @@ void aarch64_emit_call_insn (rtx);
>  void aarch64_register_pragmas (void);
>  void aarch64_relayout_simd_types (void);
>  void aarch64_reset_previous_fndecl (void);
> +void aarch64_save_restore_target_globals (tree);
>  void aarch64_emit_approx_rsqrt (rtx, rtx);
>  
>  /* Initialize builtins for SIMD intrinsics.  */
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 1e10d9798ddc5f5d2aac4255d3a8fe4ecaf1402a..a05160e08d0474ed9c1e2afa1d00375839417034 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -8570,6 +8570,21 @@ aarch64_reset_previous_fndecl (void)
>aarch64_previous_fndecl = NULL;
>  }
>  
> +/* Restore or save the TREE_TARGET_GLOBALS from or to NEW_TREE.
> +   Used by aarch64_set_current_function and aarch64_pragma_target_parse to
> +   make sure optab availability predicates are recomputed when necessary.  */
> +
> +void
> +aarch64_save_restore_target_globals (tree new_tree)
> +{
> +  if (TREE_TARGET_GLOBALS (new_tree))
> +restore_target_globals (TREE_TARGET_GLOBALS (new_tree));
> +  else if (new_tree == target_option_default_node)
> +restore_target_globals (_target_globals);
> +  else
> +TREE_TARGET_GLOBALS (new_tree) = save_target_globals_default_opts ();
> +}
> +
>  /* Implement TARGET_SET_CURRENT_FUNCTION.  Unpack the codegen decisions
> like tuning and ISA features from the DECL_FUNCTION_SPECIFIC_TARGET
> of the function, if such exists.  This function may be called multiple
> @@ -8579,63 +8594,32 @@ aarch64_reset_previous_fndecl (void)
>  static void
>  aarch64_set_current_function (tree fndecl)
>  {
> +  if (!fndecl || fndecl == aarch64_previous_fndecl)
> +return;
> +
>tree old_tree = (aarch64_previous_fndecl
>  ? DECL_FUNCTION_SPECIFIC_TARGET (aarch64_previous_fndecl)
>  : NULL_TREE);
>  
> -  tree new_tree = (fndecl
> -? DECL_FUNCTION_SPECIFIC_TARGET (fndecl)
> -: NULL_TREE);
> -
> +  tree new_tree = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
>  
> -  if (fndecl && fndecl != aarch64_previous_fndecl)
> -{
> -  aarch64_previous_fndecl = fndecl;
> -  if (old_tree == new_tree)
> - ;
> +  /* If current function has no attributes but previous one did,

s/but previous/but the previous/

> + use the 

Re: C++ PATCH for c++/70153 (unhandled UNARY_PLUS_EXPR causes stack overflow)

2016-03-10 Thread Jason Merrill

On 03/10/2016 08:07 AM, Marek Polacek wrote:

On Thu, Mar 10, 2016 at 01:56:41PM +0100, Jakub Jelinek wrote:

On Thu, Mar 10, 2016 at 01:46:45PM +0100, Marek Polacek wrote:

2016-03-10  Marek Polacek  

PR c++/70153
* cp-gimplify.c (cp_fold): Handle UNARY_PLUS_EXPR.

* g++.dg/delayedfold/unary-plus1.C: New test.

diff --git gcc/cp/cp-gimplify.c gcc/cp/cp-gimplify.c
index 6af3760..db23efe 100644
--- gcc/cp/cp-gimplify.c
+++ gcc/cp/cp-gimplify.c
@@ -2009,6 +2009,8 @@ cp_fold (tree x)
  else
x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
}
+  else if (code == UNARY_PLUS_EXPR)
+   x = fold_convert (TREE_TYPE (x), op0);
else
x = fold (x);



Won't this still leak UNARY_PLUS_EXPR into the folded result if
you could fold the operand of that?  It will take the
   x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
path...


...of course :(.  Testcase for that:
return 2ULL * ((1 + (unsigned long int) +(1 + 0)) * i);


Wouldn't it be better to just handle case UNARY_PLUS_EXPR:
separately, and always optimize it away?
So like:
 case UNARY_PLUS_EXPR:
   loc = EXPR_LOCATION (x);
   op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops);
   if (op0 == error_mark_node)
x = error_mark_node;
   else
 x = fold_convert_loc (loc, TREE_TYPE (x), op0);
   break;
or so?


Let's ask Jason.  If he prefers this approach, I'll get it done.


Sounds good.

Jason




Re: [committed, libffi] Match upstream soname

2016-03-10 Thread Richard Biener
On Thu, Mar 10, 2016 at 3:03 PM, Matthias Klose  wrote:
> On 07.03.2016 12:59, Richard Henderson wrote:
>>
>> On 03/06/2016 07:24 PM, Matthias Klose wrote:
>>>
>>> On 05.03.2016 19:28, Richard Henderson wrote:

>>>> When I went to apply my symbol versioning patch to upstream,
>>>> I discovered that upstream had already bumped their soname to
>>>> 6.4.0, beyond the bump that I'd applied to gcc to 5.0.0.
>>>>
>>>> So I bumped upstream to 7.0.0, including the symbol versioning,
>>>> and this adjusts gcc to match.
>>>
>>>
>>> Looking at the libffi upstream git repo, I can't find this change
>>> upstream, nor is it mentioned on the libffi mailing list. What am I missing?
>>
>>
>> Most of libffi's workflow is via github, I think.
>>
>>https://github.com/atgreen/libffi/pull/230
>
>
> well, then I don't understand the subject, because there's no upstream
> commit yet.  Plus this patch doesn't seem to be relevant for GCC because
> libffi is only used as a convenience library, and isn't installed at all.
> Unless a distributor decides to ship libffi built from the GCC sources
> again.

Which I incidentally (as the original bug reporter) do.  The shared libffi is
always built when bootstrapping.  It's not installed by a toplevel
make install but a make install inside $target/libffi/ works just fine.

Richard.

> Matthias
>


Re: [committed, libffi] Match upstream soname

2016-03-10 Thread Matthias Klose

On 07.03.2016 12:59, Richard Henderson wrote:

On 03/06/2016 07:24 PM, Matthias Klose wrote:

On 05.03.2016 19:28, Richard Henderson wrote:

When I went to apply my symbol versioning patch to upstream,
I discovered that upstream had already bumped their soname to
6.4.0, beyond the bump that I'd applied to gcc to 5.0.0.

So I bumped upstream to 7.0.0, including the symbol versioning,
and this adjusts gcc to match.


Looking at the libffi upstream git repo, I can't find this change upstream, nor
is it mentioned on the libffi mailing list. What am I missing?


Most of libffi's workflow is via github, I think.

   https://github.com/atgreen/libffi/pull/230


well, then I don't understand the subject, because there's no upstream commit 
yet.  Plus this patch doesn't seem to be relevant for GCC because libffi is only 
used as a convenience library, and isn't installed at all. Unless a distributor 
decides to ship libffi built from the GCC sources again.


Matthias



[arm-embedded][PATCH][ARM] Make Cortex-R8 use ARMv7 multilib

2016-03-10 Thread Andre Vieira (lists)
On 09/03/16 16:55, Andre Vieira (lists) wrote:
> Hi there,
> 
> This patch adds cortex-r8 to the list of cores using the armv7 multilib.
> 
> This patch is based on Thomas' multilib patch series:
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01584.html
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01585.html
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01586.html
> 
> Is this OK?
> 
> 2016-03-09  Andre Vieira  
> 
>   * gcc/config/arm/t-baremetal: Add cortex-r8.
> 
Hi,

We decided to apply the following patch to the ARM embedded 5 branch.

Best regards,
Andre



[Patch, avr]Fix multiple ICE fallout of PR 69764

2016-03-10 Thread Senthil Kumar Selvaraj
Hi,

   This patch fixes ~230 internal compiler errors that showed up after
   the fix for PR 69764. After the patch, target backends need to
   explicitly specify mode for operand 2 of shift and rotate patterns -
   see md.texi mod at
   
https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/doc/md.texi?r1=233358&r2=233613&pathrev=233613.

   The avr backend had VOIDmode as the mode for rotl SPN, and this patch
   sets it to the mode of the insn to fix the problem.

   Reg testing shows ICE fixes and no new failures.

   If this is ok, could someone commit please? I don't have commit
   access.

Regards
Senthil

2016-03-10  Senthil Kumar Selvaraj  

* config/avr/avr.md (rotl<mode>3): Set
mode for operand 2.

diff --git gcc/config/avr/avr.md gcc/config/avr/avr.md
index ff26f2e..c988446 100644
--- gcc/config/avr/avr.md
+++ gcc/config/avr/avr.md
@@ -3351,7 +3351,7 @@
 (define_expand "rotl<mode>3"
   [(parallel [(set (match_operand:HISI 0 "register_operand" "")
(rotate:HISI (match_operand:HISI 1 "register_operand" "")
-(match_operand:VOID 2 "const_int_operand" "")))
+(match_operand:HISI 2 "const_int_operand" "")))
   (clobber (match_dup 3))])]
   ""
   {


[arm-embedded][PATCH 2/2][GCC][ARM] Fix testcases after introduction of Cortex-R8

2016-03-10 Thread Andre Vieira (lists)
On 09/03/16 16:57, Andre Vieira (lists) wrote:
> On 08/03/16 14:56, Kyrill Tkachov wrote:
>> Hi Andre,
>>
>> On 08/03/16 11:05, Andre Vieira (lists) wrote:
>>> On 03/03/16 11:28, Kyrill Tkachov wrote:
>>>> Hi Andre,
>>>>
>>>> On 02/03/16 12:21, Andre Vieira (lists) wrote:
>>>>> Hi,
>>>>>
>>>>> Tests used to check for "r8" which will not work because cortex-r8
>>>>> string is now included in the assembly. Fixed by checking for
>>>>> "[^\-]r8".
>>>>>
>>>>> Is this Ok?
>>>>>
>>>>> Cheers,
>>>>> Andre
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> 2016-03-02  Andre Vieira
>>>>>
>>>>>    * gcc.target/arm/pr45701-1.c: Change assembler scan to not
>>>>>    trigger for cortex-r8, when scanning for register r8.
>>>>>    * gcc.target/arm/pr45701-2.c: Likewise.
>>>> Ok.
>>>> Thanks,
>>>> Kyrill

>>> Thomas committed on my behalf at revision r234040.
>>>
>>> Had to rebase arm-tune.md and invoke.texi, these were all obvious
>>> changes.
>>
>> I'm seeing a DejaGNU error while testing
>> RUNTESTFLAGS="arm.exp=pr45701-*.c":
>> ERROR: (DejaGnu) proc "^-" does not exist.
>> The error code is NONE
>> The info on the error is:
>> invalid command name "^-"
>> while executing
>> "::tcl_unknown ^-"
>> ("uplevel" body line 1)
>> invoked from within
>> "uplevel 1 ::tcl_unknown $args"
>>
>> That's due to the scan-assembler-not test:
>> /* { dg-final { scan-assembler-not "[^\-]r8" } } */
>>
>> The '[' and ']' need to be escaped by a backslash.
>> Can you please post a patch to add the escapes.
>> Sorry for missing this in the original review...
>>
>> Kyrill
>>
>>> Cheers,
>>> Andre
>>>
>>
> Hi there,
> 
> Sorry for missing those too.
> 
> 2016-03-09 Andre Vieira 
> 
> * gcc.target/arm/pr45701-1.c: Escape brackets.
> * gcc.target/arm/pr45701-2.c: Likewise.
> 
Hi,

We decided to apply the following patch to the ARM embedded 5 branch.

Best regards,
Andre
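
[Editorial note] Once the brackets are escaped so that Tcl no longer treats "[^\-]" as a command substitution, the scan pattern is an ordinary bracket expression: "r8" preceded by any character other than '-'. The sketch below shows that matching behaviour using POSIX regcomp/regexec rather than the Tcl regexp engine DejaGnu actually uses, so it is only an approximation. The function name is hypothetical, and the pattern is written "[^-]r8" because a backslash inside a POSIX bracket expression is a literal character.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if LINE contains "r8" preceded by some
   character other than '-', 0 if it does not, and -1 if the pattern
   cannot be compiled.  */
int
mentions_register_r8 (const char *line)
{
  regex_t re;
  int found;

  /* POSIX spelling of the testsuite pattern "[^\-]r8".  */
  if (regcomp (&re, "[^-]r8", REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  found = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

Note that the pattern needs one character before "r8", so a line that starts with "r8" is not matched either.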


[arm-embedded][PATCH 1/2][GCC][ARM] Add support for Cortex-R8

2016-03-10 Thread Andre Vieira (lists)
On 02/03/16 12:20, Andre Vieira (lists) wrote:
> 
> gcc/ChangeLog:
> 
> 2016-03-02  Andre Vieira  
> 
>  * config/arm/arm-cores.def (cortex-r8): New.
>  * config/arm/arm-tables.opt (cortex-r8): New.
>  * config/arm/arm-tune.md: Regenerate.
>  * gcc/doc/invoke.texi: Add cortex-r8 to list of cpu values.
> 
Hi,

We decided to apply the following patch to the ARM embedded 5 branch.

Best regards,
Andre


[arm-embedded][PATCH 2/2][GCC][ARM] Fix testcases after introduction of Cortex-R8

2016-03-10 Thread Andre Vieira (lists)
On 02/03/16 12:21, Andre Vieira (lists) wrote:
> Hi,
> 
> Tests used to check for "r8" which will not work because cortex-r8
> string is now included in the assembly. Fixed by checking for "[^\-]r8".
> 
> Is this Ok?
> 
> Cheers,
> Andre
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-03-02  Andre Vieira  
> 
>  * gcc.target/arm/pr45701-1.c: Change assembler scan to not
>  trigger for cortex-r8, when scanning for register r8.
>  * gcc.target/arm/pr45701-2.c: Likewise.
> 
Hi,

We decided to apply the following patch to the ARM embedded 5 branch.

Best regards,
Andre


Re: [AArch64] Disable pcrelative_literal_loads with fix-cortex-a53-843419

2016-03-10 Thread James Greenhalgh
On Thu, Mar 10, 2016 at 01:37:50PM +0100, Christophe Lyon wrote:
> On 10 March 2016 at 12:43, James Greenhalgh  wrote:
> > On Tue, Jan 26, 2016 at 03:43:36PM +0100, Christophe Lyon wrote:
> >> With the attachment
> >>
> >>
> >> On 26 January 2016 at 15:42, Christophe Lyon  
> >> wrote:
> >> > Hi,
> >> >
> >> > This is a followup to PR63304.
> >> >
> >> > As discussed in bugzilla, this patch disables pcrelative_literal_loads
> >> > when -mfix-cortex-a53-843419 (or its default configure option) is
> >> > used.
> >> >
> >> > I copied the behavior of -mfix-cortex-a53-835769 (e.g. in
> >> > aarch64_can_inline_p), and I have tested by building the Linux kernel
> >> > using -mfix-cortex-a53-843419 and checked that
> >> > R_AARCH64_ADR_PREL_PG_HI21 relocations are not emitted anymore (under
> >> > CONFIG_ARM64_ERRATUM_843419).
> >> >
> >> > For reference, this is motivated by:
> >> > https://bugs.linaro.org/show_bug.cgi?id=1994
> >> > and further details on Launchpad:
> >> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1533009
> >> >
> >> > OK for trunk?
> >
> > Thanks, this looks like a clear regression from GCC 5 (we can no longer
> > build the kernel, so this workaround is fine to go in now). Please remember
> > to add the link to the relevant PR in the ChangeLog.
> >
> > I'd also really appreciate a nice big comment over this code:
> >
> >> +  /* If it is not set on the command line, we default to no pc
> >> + relative literal loads, unless the workaround for Cortex-A53
> >> + erratum 843419 is in effect.  */
> >> +  if (opts->x_nopcrelative_literal_loads == 2
> >> +  && !TARGET_FIX_ERR_A53_843419)
> >
> > Explaining why this is important (i.e. some summary of the discussion
> > in PR63304 regarding the kernel module loader).
> >
> > Can you repost with that comment added? I don't have any other objections
> > to the patch.
> >
> 
> OK, here is an updated version.

Thanks.

This is OK for trunk.

James



Re: [PATCH][AArch64] Replace insn to zero up DF register

2016-03-10 Thread James Greenhalgh
On Wed, Mar 09, 2016 at 03:35:43PM -0600, Evandro Menezes wrote:
> On 03/01/16 13:08, Evandro Menezes wrote:
> >On 03/01/16 13:02, Wilco Dijkstra wrote:
> >>Evandro Menezes wrote:
> >>>The meaning of these attributes is not clear to me.  Is there a
> >>>reference somewhere about which insns are FP or SIMD or neither?
> >>The meaning should be clear, "fp" is a floating point
> >>instruction, "simd" a SIMD one
> >>as defined in ARM-ARM.
> >>
> >>>Indeed, I had to add the Y for the f_mcr insn to match it with nosimd.
> >>>However, I didn't feel that it should be moved to the right, since it's
> >>>already disparaged.  Am I missing some detail?
> >>It might not matter for this specific case, but I have seen
> >>reload forcing the very
> >>first alternative without looking at any costs or preferences -
> >>as long as it is legal.
> >>This suggests we need to order alternatives from most preferred
> >>alternative to least
> >>preferred one.
> >>
> >>I think it is good enough for commit, James?
> >
> >Methinks that my issue with those attributes is that I'm not as
> >fluent in AArch64 as I'd like to be.
> >
> >Please, feel free to edit the patch changing the order then.
> 
>Replace insn to zero up SIMD registers
> 
>gcc/
> * config/aarch64/aarch64.md
> (*movhf_aarch64): Add "movi %0, #0" to zero up register.
> (*movsf_aarch64): Likewise and add "simd" and "fp" attributes.
> (*movdf_aarch64): Likewise.
> 
> Swapped the order of the constraints to favor MOVI.
> 
> Just say the word...

I'm wondering whether this is appropriate for GCC6 now that we are so late
in the development cycle. Additionally, I have some comments on your patch:

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 68676c9..4502a58 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1163,11 +1163,12 @@
>  )
>  
>  (define_insn "*movhf_aarch64"
> -  [(set (match_operand:HF 0 "nonimmediate_operand" "=w, ?r,w,w,m,r,m ,r")
> - (match_operand:HF 1 "general_operand"  "?rY, w,w,m,w,m,rY,r"))]
> +  [(set (match_operand:HF 0 "nonimmediate_operand" "=w,w  ,?r,w,w,m,r,m ,r")
> + (match_operand:HF 1 "general_operand"  "Y ,?rY, w,w,m,w,m,rY,r"))]
>"TARGET_FLOAT && (register_operand (operands[0], HFmode)
>  || aarch64_reg_or_fp_zero (operands[1], HFmode))"
>"@
> +   movi\\t%0.4h, #0
> mov\\t%0.h[0], %w1
> umov\\t%w0, %1.h[0]
> mov\\t%0.h[0], %1.h[0]
> @@ -1176,18 +1177,19 @@
> ldrh\\t%w0, %1
> strh\\t%w1, %0
> mov\\t%w0, %w1"
> -  [(set_attr "type" "neon_from_gp,neon_to_gp,neon_move,\
> +  [(set_attr "type" "neon_move,neon_from_gp,neon_to_gp,neon_move,\
>   f_loads,f_stores,load1,store1,mov_reg")
> -   (set_attr "simd" "yes,yes,yes,*,*,*,*,*")
> -   (set_attr "fp"   "*,*,*,yes,yes,*,*,*")]
> +   (set_attr "simd" "yes,yes,yes,yes,*,*,*,*,*")
> +   (set_attr "fp"   "*,*,*,*,yes,yes,*,*,*")]
>  )
>  
>  (define_insn "*movsf_aarch64"
> -  [(set (match_operand:SF 0 "nonimmediate_operand" "=w, ?r,w,w  ,w,m,r,m ,r")
> - (match_operand:SF 1 "general_operand"  "?rY, w,w,Ufc,m,w,m,rY,r"))]
> +  [(set (match_operand:SF 0 "nonimmediate_operand" "=w,w  ,?r,w,w  ,w,m,r,m ,r")
> + (match_operand:SF 1 "general_operand"  "Y ,?rY, w,w,Ufc,m,w,m,rY,r"))]
>"TARGET_FLOAT && (register_operand (operands[0], SFmode)
>  || aarch64_reg_or_fp_zero (operands[1], SFmode))"
>"@
> +   movi\\t%0.2s, #0
> fmov\\t%s0, %w1
> fmov\\t%w0, %s1
> fmov\\t%s0, %s1
> @@ -1197,16 +1199,19 @@
> ldr\\t%w0, %1
> str\\t%w1, %0
> mov\\t%w0, %w1"
> -  [(set_attr "type" "f_mcr,f_mrc,fmov,fconsts,\
> - f_loads,f_stores,load1,store1,mov_reg")]
> +  [(set_attr "type" "neon_move,f_mcr,f_mrc,fmov,fconsts,\
> + f_loads,f_stores,load1,store1,mov_reg")
> +   (set_attr "simd" "yes,*,*,*,*,*,*,*,*,*")
> +   (set_attr "fp"   "*,*,*,yes,yes,yes,yes,*,*,*")]
>  )

This fp attribute looks wrong to me. The two fmov instructions that move
between core and FP registers should be tagged "yes". However, this is
irrelevant as the whole pattern is guarded by TARGET_FLOAT.

It would be clearer to drop the FP attribute entirely, so as not to give
the erroneous impression that some alternatives in this insn are enabled
for !TARGET_FLOAT.

>  (define_insn "*movdf_aarch64"
> -  [(set (match_operand:DF 0 "nonimmediate_operand" "=w, ?r,w,w  ,w,m,r,m ,r")
> - (match_operand:DF 1 "general_operand"  "?rY, w,w,Ufc,m,w,m,rY,r"))]
> +  [(set (match_operand:DF 0 "nonimmediate_operand" "=w,w  ,?r,w,w  ,w,m,r,m ,r")
> + (match_operand:DF 1 "general_operand"  "Y ,?rY, w,w,Ufc,m,w,m,rY,r"))]
>"TARGET_FLOAT && (register_operand (operands[0], DFmode)
>  || aarch64_reg_or_fp_zero (operands[1], DFmode))"
>"@
> +   movi\\t%d0, #0
> fmov\\t%d0, %x1
> fmov\\t%x0, %d1
> fmov\\t%d0, %d1
> @@ -1216,8 +1221,10 @@

Re: [ptx] debug info

2016-03-10 Thread Alexander Monakov
On Thu, 10 Mar 2016, Nathan Sidwell wrote:
> Hm, something must have changed since I found that sorry necessary.

As I already said in my opening sentence (not quoted in your response), you
removed the unnecessary override.  This is exactly what lets toplevel code see
requested debug format now, and react accordingly.

Alexander


Re: [ptx] debug info

2016-03-10 Thread Nathan Sidwell

On 03/10/16 07:52, Alexander Monakov wrote:

On Wed, 9 Mar 2016, Nathan Sidwell wrote:

Furthermore, this is not useful without support in libgomp/plugin-nvptx.c
and nvptx-none-run.c (PTX JIT does not propagate lineinfo by default).
Would you like me to submit patches for those?


please.


Here's the pull request for nvptx-run.c changes:
https://github.com/MentorEmbedded/nvptx-tools/pull/11


merged, thanks



Re: C++ PATCH for c++/70153 (unhandled UNARY_PLUS_EXPR causes stack overflow)

2016-03-10 Thread Marek Polacek
On Thu, Mar 10, 2016 at 01:56:41PM +0100, Jakub Jelinek wrote:
> On Thu, Mar 10, 2016 at 01:46:45PM +0100, Marek Polacek wrote:
> > 2016-03-10  Marek Polacek  
> > 
> > PR c++/70153
> > * cp-gimplify.c (cp_fold): Handle UNARY_PLUS_EXPR.
> > 
> > * g++.dg/delayedfold/unary-plus1.C: New test.
> > 
> > diff --git gcc/cp/cp-gimplify.c gcc/cp/cp-gimplify.c
> > index 6af3760..db23efe 100644
> > --- gcc/cp/cp-gimplify.c
> > +++ gcc/cp/cp-gimplify.c
> > @@ -2009,6 +2009,8 @@ cp_fold (tree x)
> >   else
> > x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
> > }
> > +  else if (code == UNARY_PLUS_EXPR)
> > +   x = fold_convert (TREE_TYPE (x), op0);
> >else
> > x = fold (x);
> >  
> 
> Won't this still leak UNARY_PLUS_EXPR into the folded result if
> you could fold the operand of that?  It will take the
>   x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
> path...

...of course :(.  Testcase for that:
return 2ULL * ((1 + (unsigned long int) +(1 + 0)) * i);

> Wouldn't it be better to just handle case UNARY_PLUS_EXPR:
> separately, and always optimize it away?
> So like:
> case UNARY_PLUS_EXPR:
>   loc = EXPR_LOCATION (x);
>   op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops);
>   if (op0 == error_mark_node)
>   x = error_mark_node;
>   else
> x = fold_convert_loc (loc, TREE_TYPE (x), op0);
>   break;
> or so?

Let's ask Jason.  If he prefers this approach, I'll get it done.

Thanks,

Marek


Re: [ptx] debug info

2016-03-10 Thread Nathan Sidwell

On 03/09/16 11:41, Alexander Monakov wrote:


:|x86_64-pc-linux-gnu-accel-nvptx-none-gcc -xc - -S -o- -gstabs
<stdin>:1:0: sorry, unimplemented: stabs debug format not supported

cc1: error: target system does not support the ‘stabs’ debug format

:|x86_64-pc-linux-gnu-accel-nvptx-none-gcc -xc - -S -o- -gcoff
cc1: error: target system does not support the ‘coff’ debug format

So nvptx can use the generic mechanism that produces such errors for all targets
and all debug formats, rather than ad-hoc target-specific handling.


Hm, something must have changed since I found that sorry necessary.

nathan


Re: C++ PATCH for c++/70153 (unhandled UNARY_PLUS_EXPR causes stack overflow)

2016-03-10 Thread Jakub Jelinek
On Thu, Mar 10, 2016 at 01:46:45PM +0100, Marek Polacek wrote:
> 2016-03-10  Marek Polacek  
> 
>   PR c++/70153
>   * cp-gimplify.c (cp_fold): Handle UNARY_PLUS_EXPR.
> 
>   * g++.dg/delayedfold/unary-plus1.C: New test.
> 
> diff --git gcc/cp/cp-gimplify.c gcc/cp/cp-gimplify.c
> index 6af3760..db23efe 100644
> --- gcc/cp/cp-gimplify.c
> +++ gcc/cp/cp-gimplify.c
> @@ -2009,6 +2009,8 @@ cp_fold (tree x)
> else
>   x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
>   }
> +  else if (code == UNARY_PLUS_EXPR)
> + x = fold_convert (TREE_TYPE (x), op0);
>else
>   x = fold (x);
>  

Won't this still leak UNARY_PLUS_EXPR into the folded result if
you could fold the operand of that?  It will take the
  x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
path...
Wouldn't it be better to just handle case UNARY_PLUS_EXPR:
separately, and always optimize it away?
So like:
case UNARY_PLUS_EXPR:
  loc = EXPR_LOCATION (x);
  op0 = cp_fold_maybe_rvalue (TREE_OPERAND (x, 0), rval_ops);
  if (op0 == error_mark_node)
x = error_mark_node;
  else
x = fold_convert_loc (loc, TREE_TYPE (x), op0);
  break;
or so?

Jakub
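
[Editorial note] A toy model may make the concern above concrete. The sketch below is not GCC code: the node type, the three codes and toy_fold are all hypothetical stand-ins. It folds an expression tree bottom-up and always replaces a unary-plus node with its folded operand, whether or not the operand itself changed, so no unary-plus node can leak into the result.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for tree codes; not GCC's real enum.  */
enum code { CONST, PLUS, UNARY_PLUS };

struct node
{
  enum code code;
  long value;              /* Meaningful only when code == CONST.  */
  struct node *op0, *op1;  /* op1 is used only by PLUS.  */
};

/* Allocate a node.  Nodes are never freed in this sketch.  */
struct node *
make (enum code code, long value, struct node *op0, struct node *op1)
{
  struct node *n = malloc (sizeof *n);
  if (!n)
    abort ();
  n->code = code;
  n->value = value;
  n->op0 = op0;
  n->op1 = op1;
  return n;
}

/* Fold X bottom-up.  A UNARY_PLUS node is always dropped in favour of
   its folded operand, on every path.  */
struct node *
toy_fold (struct node *x)
{
  switch (x->code)
    {
    case CONST:
      return x;
    case UNARY_PLUS:
      return toy_fold (x->op0);
    case PLUS:
      {
        struct node *a = toy_fold (x->op0);
        struct node *b = toy_fold (x->op1);
        if (a->code == CONST && b->code == CONST)
          return make (CONST, a->value + b->value, NULL, NULL);
        return make (PLUS, 0, a, b);
      }
    }
  abort ();
}

/* Return nonzero if any UNARY_PLUS node remains under X.  */
int
contains_unary_plus (const struct node *x)
{
  if (!x)
    return 0;
  if (x->code == UNARY_PLUS)
    return 1;
  return contains_unary_plus (x->op0) || contains_unary_plus (x->op1);
}
```

Folding the toy equivalent of 1 + +(1 + 0) yields the constant 2 with no unary-plus node left, which is the property the separate case is meant to guarantee.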


Re: [ptx] debug info

2016-03-10 Thread Alexander Monakov
On Wed, 9 Mar 2016, Nathan Sidwell wrote:
> > Furthermore, this is not useful without support in libgomp/plugin-nvptx.c
> > and nvptx-none-run.c (PTX JIT does not propagate lineinfo by default).
> > Would you like me to submit patches for those?
> 
> please.

Here's the pull request for nvptx-run.c changes:
https://github.com/MentorEmbedded/nvptx-tools/pull/11

Alexander


C++ PATCH for c++/70153 (unhandled UNARY_PLUS_EXPR causes stack overflow)

2016-03-10 Thread Marek Polacek
Here, we got ourselves into infinite recursion in fold, resulting in a stack
overflow.  I believe the crux is that we're passing UNARY_PLUS_EXPR, a C++-only
code, into fold which can't really fold e.g. 1 + UNARY_PLUS_EXPR <1>.  I think
cp_fold should be able to fold this to 2, but currently is not.  But I noticed
that cxx_eval_constant_expression seems to handle this:
if (tcode == UNARY_PLUS_EXPR)
  r = fold_convert (TREE_TYPE (t), op);
so I tweaked cp_fold to do something similar.  With this patch we're able to
fold the expression above properly and fold doesn't run into infinite looping
anymore.

I made the test a run-time test to sanity-check we're generating the correct
value.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-03-10  Marek Polacek  

PR c++/70153
* cp-gimplify.c (cp_fold): Handle UNARY_PLUS_EXPR.

* g++.dg/delayedfold/unary-plus1.C: New test.

diff --git gcc/cp/cp-gimplify.c gcc/cp/cp-gimplify.c
index 6af3760..db23efe 100644
--- gcc/cp/cp-gimplify.c
+++ gcc/cp/cp-gimplify.c
@@ -2009,6 +2009,8 @@ cp_fold (tree x)
  else
x = fold_build1_loc (loc, code, TREE_TYPE (x), op0);
}
+  else if (code == UNARY_PLUS_EXPR)
+   x = fold_convert (TREE_TYPE (x), op0);
   else
x = fold (x);
 
diff --git gcc/testsuite/g++.dg/delayedfold/unary-plus1.C gcc/testsuite/g++.dg/delayedfold/unary-plus1.C
index e69de29..5ebee22 100644
--- gcc/testsuite/g++.dg/delayedfold/unary-plus1.C
+++ gcc/testsuite/g++.dg/delayedfold/unary-plus1.C
@@ -0,0 +1,15 @@
+// PR c++/70153
+// { dg-do run }
+
+unsigned long long int
+foo (unsigned long long int i)
+{
+  return 2ULL * ((1 + (unsigned long int) +1) * i);
+}
+
+int
+main (void)
+{
+  if (foo (3ULL) != 12ULL)
+__builtin_abort ();
+}

Marek
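
[Editorial note] For reference, the arithmetic the run-time test relies on can be checked by hand: 1 + (unsigned long) +1 is 2, so foo (3) is 2 * (2 * 3) = 12. The sketch below restates the expression from the new test and the variant with a foldable operand that comes up later in the thread. It is written as plain C, where the unary plus is equally valid, and the function names are hypothetical.

```c
/* Expression from the new test, unary-plus1.C.  */
unsigned long long
foo_plain (unsigned long long i)
{
  return 2ULL * ((1 + (unsigned long int) +1) * i);
}

/* Variant from the follow-up discussion, where the operand of the unary
   plus is itself foldable.  */
unsigned long long
foo_nested (unsigned long long i)
{
  return 2ULL * ((1 + (unsigned long int) +(1 + 0)) * i);
}
```

Both functions should return 12 for an argument of 3, whichever way the compiler folds the unary plus.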


Re: [AArch64] Disable pcrelative_literal_loads with fix-cortex-a53-843419

2016-03-10 Thread Christophe Lyon
On 10 March 2016 at 12:43, James Greenhalgh  wrote:
> On Tue, Jan 26, 2016 at 03:43:36PM +0100, Christophe Lyon wrote:
>> With the attachment
>>
>>
>> On 26 January 2016 at 15:42, Christophe Lyon  
>> wrote:
>> > Hi,
>> >
>> > This is a followup to PR63304.
>> >
>> > As discussed in bugzilla, this patch disables pcrelative_literal_loads
>> > when -mfix-cortex-a53-843419 (or its default configure option) is
>> > used.
>> >
>> > I copied the behavior of -mfix-cortex-a53-835769 (e.g. in
>> > aarch64_can_inline_p), and I have tested by building the Linux kernel
>> > using -mfix-cortex-a53-843419 and checked that
>> > R_AARCH64_ADR_PREL_PG_HI21 relocations are not emitted anymore (under
>> > CONFIG_ARM64_ERRATUM_843419).
>> >
>> > For reference, this is motivated by:
>> > https://bugs.linaro.org/show_bug.cgi?id=1994
>> > and further details on Launchpad:
>> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1533009
>> >
>> > OK for trunk?
>
> Thanks, this looks like a clear regression from GCC 5 (we can no longer
> build the kernel, so this workaround is fine to go in now). Please remember
> to add the link to the relevant PR in the ChangeLog.
>
> I'd also really appreciate a nice big comment over this code:
>
>> +  /* If it is not set on the command line, we default to no pc
>> + relative literal loads, unless the workaround for Cortex-A53
>> + erratum 843419 is in effect.  */
>> +  if (opts->x_nopcrelative_literal_loads == 2
>> +  && !TARGET_FIX_ERR_A53_843419)
>
> Explaining why this is important (i.e. some summary of the discussion
> in PR63304 regarding the kernel module loader).
>
> Can you repost with that comment added? I don't have any other objections
> to the patch.
>

OK, here is an updated version.


> Thanks,
> James
>
>
2016-03-10  Christophe Lyon  

PR target/70113.
* config/aarch64/aarch64.h (TARGET_FIX_ERR_A53_843419_DEFAULT):
Always define to 0 or 1.
(TARGET_FIX_ERR_A53_843419): New macro.
* config/aarch64/aarch64-elf-raw.h
(TARGET_FIX_ERR_A53_843419_DEFAULT): Update for above changes.
* config/aarch64/aarch64-linux.h: Likewise.
* config/aarch64/aarch64.c
(aarch64_override_options_after_change_1): Do not default
aarch64_nopcrelative_literal_loads to true if Cortex-A53 erratum
843419 is on.
(aarch64_attributes): Handle fix-cortex-a53-843419.
(aarch64_can_inline_p): Likewise.
* config/aarch64/aarch64.opt (aarch64_fix_a53_err843419): Save.
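
[Editorial note] The "Always define to 0 or 1" entry goes hand in hand with the switch from #ifdef to #if in the spec headers: once the macro is always defined, #ifdef is true even when the value is 0. A minimal sketch of the difference, using a hypothetical macro name rather than the real TARGET_FIX_ERR_A53_843419_DEFAULT:

```c
/* Hypothetical macro that is always defined, here to 0 (workaround off).  */
#define TOY_FIX_DEFAULT 0

/* Returns 1 because the macro is defined, regardless of its value.  */
int
selected_by_ifdef (void)
{
#ifdef TOY_FIX_DEFAULT
  return 1;
#else
  return 0;
#endif
}

/* Returns 0 because the macro's value is tested, not its existence.  */
int
selected_by_if (void)
{
#if TOY_FIX_DEFAULT
  return 1;
#else
  return 0;
#endif
}
```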
diff --git a/gcc/config/aarch64/aarch64-elf-raw.h b/gcc/config/aarch64/aarch64-elf-raw.h
index 2dcb6d4..9097017 100644
--- a/gcc/config/aarch64/aarch64-elf-raw.h
+++ b/gcc/config/aarch64/aarch64-elf-raw.h
@@ -35,7 +35,7 @@
   " %{mfix-cortex-a53-835769:--fix-cortex-a53-835769}"
 #endif
 
-#ifdef TARGET_FIX_ERR_A53_843419_DEFAULT
+#if TARGET_FIX_ERR_A53_843419_DEFAULT
 #define CA53_ERR_843419_SPEC \
   " %{!mno-fix-cortex-a53-843419:--fix-cortex-a53-843419}"
 #else
diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
index 6064b26..5fcaa59 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -53,7 +53,7 @@
   " %{mfix-cortex-a53-835769:--fix-cortex-a53-835769}"
 #endif
 
-#ifdef TARGET_FIX_ERR_A53_843419_DEFAULT
+#if TARGET_FIX_ERR_A53_843419_DEFAULT
 #define CA53_ERR_843419_SPEC \
   " %{!mno-fix-cortex-a53-843419:--fix-cortex-a53-843419}"
 #else
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 801f95a..51dfe79 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8132,9 +8132,18 @@ aarch64_override_options_after_change_1 (struct gcc_options *opts)
   if (opts->x_nopcrelative_literal_loads == 1)
 aarch64_nopcrelative_literal_loads = false;
 
-  /* If it is not set on the command line, we default to no
- pc relative literal loads.  */
-  if (opts->x_nopcrelative_literal_loads == 2)
+  /* If it is not set on the command line, we default to no pc
+ relative literal loads, unless the workaround for Cortex-A53
+ erratum 843419 is in effect.  */
+  /* This is PR70113. When building the Linux kernel with
+ CONFIG_ARM64_ERRATUM_843419, support for relocations
+ R_AARCH64_ADR_PREL_PG_HI21 and R_AARCH64_ADR_PREL_PG_HI21_NC is
+ removed from the kernel to avoid loading objects with possibly
+ offending sequences. With nopcrelative_literal_loads, we would
+ generate such relocations, preventing the kernel build from
+ succeeding.  */
+  if (opts->x_nopcrelative_literal_loads == 2
+  && !TARGET_FIX_ERR_A53_843419)
 aarch64_nopcrelative_literal_loads = true;
 
   /* In the tiny memory model it makes no sense
@@ -8818,6 +8827,8 @@ static const struct aarch64_attribute_info aarch64_attributes[] =
  OPT_mgeneral_regs_only },
   { "fix-cortex-a53-835769", aarch64_attr_bool, true, NULL,
 

Re: [PATCH][AArch64] PR target/70002: Make aarch64_set_current_function play nice with pragma resetting

2016-03-10 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00237.html

Thanks,
Kyrill
On 03/03/16 11:38, Kyrill Tkachov wrote:

Hi all,

This patch fixes the ICE that was introduced by my earlier patch to
aarch64_set_current_function:
FAIL: gcc.dg/torture/pr52429.c -O2 -flto -fno-use-linker-plugin -flto-partition=none (internal compiler error)

It also fixes a bug that I was working on separately relating to popping
pragmas.
The patch rewrites the aarch64_set_current_function implementation to be the
same as the one in the arm port that Christian wrote, which is simpler than
the existing implementation and has been tested for some time without
problems.  I've thought this was the way to go but was hoping to do it for
GCC 7 instead.  Given the ICE, though, I think we'd rather have consistent
implementations of this hook between arm and aarch64.  (Ideally this should
be moved into the midend for all targets, as I don't see much target-specific
information in the implementation of this across the targets, but not at
this stage.)

As in that implementation, the setting and restoring of the target globals
is factored into a separate function that is used in
aarch64_set_current_function and the pragma handling function to tell the
midend when to reinitialise its structures.

This patch fixes the ICE and the attached testcase, and passes bootstrap and
regression testing on aarch64-none-linux-gnu.

Sorry for missing the ICE originally.
Ok for trunk?

Thanks,
Kyrill

P.S. The vector arguments re-layout hack that was in
aarch64_set_current_function is removed because it has become unneeded since
Christian fixed some midend bugs so that it's done automatically there.

2016-03-03  Kyrylo Tkachov  

PR target/70002
* config/aarch64/aarch64-protos.h
(aarch64_save_restore_target_globals): New prototype.
* config/aarch64/aarch64-c.c (aarch64_pragma_target_parse):
Call the above when popping pragma.
* config/aarch64/aarch64.c (aarch64_save_restore_target_globals):
New function.
(aarch64_set_current_function): Rewrite using the above.

2016-03-03  Kyrylo Tkachov  

PR target/70002
PR target/69245
* gcc.target/aarch64/pr69245_2.c: New test.




Re: [AArch64] Disable pcrelative_literal_loads with fix-cortex-a53-843419

2016-03-10 Thread James Greenhalgh
On Tue, Jan 26, 2016 at 03:43:36PM +0100, Christophe Lyon wrote:
> With the attachment
> 
> 
> On 26 January 2016 at 15:42, Christophe Lyon  
> wrote:
> > Hi,
> >
> > This is a followup to PR63304.
> >
> > As discussed in bugzilla, this patch disables pcrelative_literal_loads
> > when -mfix-cortex-a53-843419 (or its default configure option) is
> > used.
> >
> > I copied the behavior of -mfix-cortex-a53-835769 (e.g. in
> > aarch64_can_inline_p), and I have tested by building the Linux kernel
> > using -mfix-cortex-a53-843419 and checked that
> > R_AARCH64_ADR_PREL_PG_HI21 relocations are not emitted anymore (under
> > CONFIG_ARM64_ERRATUM_843419).
> >
> > For reference, this is motivated by:
> > https://bugs.linaro.org/show_bug.cgi?id=1994
> > and further details on Launchpad:
> > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1533009
> >
> > OK for trunk?

Thanks, this looks like a clear regression from GCC 5 (we can no longer
build the kernel, so this workaround is fine to go in now). Please remember
to add the link to the relevant PR in the ChangeLog.

I'd also really appreciate a nice big comment over this code:

> +  /* If it is not set on the command line, we default to no pc
> + relative literal loads, unless the workaround for Cortex-A53
> + erratum 843419 is in effect.  */
> +  if (opts->x_nopcrelative_literal_loads == 2
> +  && !TARGET_FIX_ERR_A53_843419)

Explaining why this is important (i.e. some summary of the discussion
in PR63304 regarding the kernel module loader).

Can you repost with that comment added? I don't have any other objections
to the patch.

Thanks,
James
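
[Editorial note] The condition quoted in the review can be read as a small decision function. The sketch below is a simplified model, not the GCC source: the function name and parameters are hypothetical, and only the two option values visible in the patch are handled (1 clears the flag; 2, meaning "not set on the command line", sets it unless the erratum 843419 workaround is in effect). Any other value leaves the incoming setting untouched, which is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical model of the defaulting logic.  OPTION_VALUE mirrors
   opts->x_nopcrelative_literal_loads, FIX_843419 mirrors
   TARGET_FIX_ERR_A53_843419, and CURRENT is the setting on entry.  */
bool
resolve_nopcrelative_literal_loads (int option_value, bool fix_843419,
                                    bool current)
{
  /* Value 1: the flag is cleared, as in the patch.  */
  if (option_value == 1)
    current = false;

  /* Value 2: not set on the command line.  Default to "no pc-relative
     literal loads" only when the erratum workaround is off.  */
  if (option_value == 2 && !fix_843419)
    current = true;

  return current;
}
```

With the workaround enabled and nothing given on the command line, the flag keeps whatever value it had on entry, which is what keeps the offending relocations out of kernel builds.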




Re: [AArch64] Fix dependency of gcc-plugin.h

2016-03-10 Thread James Greenhalgh
On Tue, Mar 08, 2016 at 03:20:52PM +0100, Christophe Lyon wrote:
> Hi,
> 
> Our bug report https://bugs.linaro.org/show_bug.cgi?id=2123
> complains about aarch64's missing plugin dependency.
> 
> AFAICT, the problem is present on trunk too, and the small attached
> patch fixes it.
> OK?

This is OK.

Thanks,
James

> 2016-03-08  Christophe Lyon  
> 
>   * config/aarch64/t-aarch64 (OPTIONS_H_EXTRA): Add
>   aarch64-fusion-pairs.def and aarch64-tuning-flags.def

> diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
> index e2c942b..778e15c 100644
> --- a/gcc/config/aarch64/t-aarch64
> +++ b/gcc/config/aarch64/t-aarch64
> @@ -20,7 +20,9 @@
>  
>  TM_H += $(srcdir)/config/aarch64/aarch64-cores.def
>  OPTIONS_H_EXTRA += $(srcdir)/config/aarch64/aarch64-cores.def \
> -$(srcdir)/config/aarch64/aarch64-arches.def
> +$(srcdir)/config/aarch64/aarch64-arches.def \
> +$(srcdir)/config/aarch64/aarch64-fusion-pairs.def \
> +$(srcdir)/config/aarch64/aarch64-tuning-flags.def
>  
>  $(srcdir)/config/aarch64/aarch64-tune.md: 
> $(srcdir)/config/aarch64/gentune.sh \
>   $(srcdir)/config/aarch64/aarch64-cores.def



Re: [PATCH] Additional check for mask type in masked load/store vectorization

2016-03-10 Thread Richard Biener
On Thu, Mar 10, 2016 at 11:12 AM, Ilya Enkovich  wrote:
> Hi,
>
> Currently, when vectorizing a masked load/store, we don't check that
> the mask and value have the same number of elements.  Usually
> conversion patterns apply and we don't have such a case, but
> conversion patterns may fail to properly determine the mask
> type and we shouldn't rely on them so heavily.
>
> Unfortunately I couldn't make a small testcase where we fail
> to determine the mask type.  For me it happened when mixed C/Fortran
> code was compiled with LTO and I had various scalar boolean types.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu.  Ok for trunk
> with no testcase?

Ok.

Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2016-03-10  Ilya Enkovich  
>
> * tree-vect-stmts.c (vectorizable_mask_load_store): Check mask
> has a proper number of elements.
>
>
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 6ac273d..06b1ab7 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1742,7 +1742,8 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi,
>if (!mask_vectype)
>  mask_vectype = get_mask_type_for_scalar_type (TREE_TYPE (vectype));
>
> -  if (!mask_vectype || !VECTOR_BOOLEAN_TYPE_P (mask_vectype))
> +  if (!mask_vectype || !VECTOR_BOOLEAN_TYPE_P (mask_vectype)
> +  || TYPE_VECTOR_SUBPARTS (mask_vectype) != TYPE_VECTOR_SUBPARTS (vectype))
>  return false;
>
>if (is_store)


Re: [PATCH] Add -funconstrained-commons to work around PR/69368 (and others) in SPEC2006

2016-03-10 Thread Richard Biener
On Wed, Mar 9, 2016 at 6:54 PM, Alan Lawrence
 wrote:
> On 07/03/16 11:02, Alan Lawrence wrote:
>>
>> On 04/03/16 13:27, Richard Biener wrote:
>>>
>>> I think to make it work with LTO you need to mark it 'Optimization'.
>>> Also it's about
>>> arrays so maybe
>>>
>>> 'Assume common declarations may be overridden with ones with a larger
>>> trailing array'
>>>
>>> also if we document it here we should eventually document it in
>>> invoke.texi.
>>>
>>> Not sure if "unknown commons" is a good term, maybe "unconstrained
>>> commons" instead?
>>
>>
>> All done; I doubt there is really a good word, unconstrained seems as
>> good as any. I've reused much the same wording in invoke.texi, unless
>> you think there is more to add.
>>
>> On 04/03/16 13:33, Jakub Jelinek wrote:
>>>
>>> Also, isn't the *.opt description line supposed to end with a full stop?
>>
>>
>> Ah, yes, thanks.
>>
>> Is this version OK for trunk?
>>
>> gcc/ChangeLog:
>>
>> DATE  Alan Lawrence  
>>Jakub Jelinek  
>>
>>  * common.opt (funconstrained-commons, flag_unconstrained_commons): New.
>>  * tree.c (array_at_struct_end_p): Do not limit to size of decl for
>>  DECL_COMMONS if flag_unconstrained_commons is set.
>>  * tree-dfa.c (get_ref_base_and_extent): Likewise.
>
>
> And add to that
> * doc/invoke.texi (Optimize Options): Add -funconstrained-commons.
> (funconstrained-commons): Document.

Ok.

Richard.

> Thanks,
> Alan
>
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gfortran.dg/unconstrained_commons.f: New.
>> ---
>>   gcc/common.opt|  5 +
>>   gcc/doc/invoke.texi   |  8 +++-
>>   gcc/testsuite/gfortran.dg/unconstrained_commons.f | 20
>> 
>>   gcc/tree-dfa.c| 15 ++-
>>   gcc/tree.c|  6 --
>>   5 files changed, 50 insertions(+), 4 deletions(-)
>>   create mode 100644 gcc/testsuite/gfortran.dg/unconstrained_commons.f
>>
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index 520fa9c..bbf79ef 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -2451,6 +2451,11 @@ fsplit-paths
>>   Common Report Var(flag_split_paths) Init(0) Optimization
>>   Split paths leading to loop backedges.
>>
>> +funconstrained-commons
>> +Common Var(flag_unconstrained_commons) Optimization
>> +Assume common declarations may be overridden with ones with a larger
>> +trailing array.
>> +
>>   funit-at-a-time
>>   Common Report Var(flag_unit_at_a_time) Init(1)
>>   Compile whole compilation unit at a time.
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 0a2a6f4..68933a1 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -407,7 +407,7 @@ Objective-C and Objective-C++ Dialects}.
>>   -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol
>>   -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
>>   -ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
>> --ftree-vectorize -ftree-vrp @gol
>> +-ftree-vectorize -ftree-vrp -funconstrained-commons @gol
>>   -funit-at-a-time -funroll-all-loops -funroll-loops @gol
>>   -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
>>   -fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol
>> @@ -6659,6 +6659,12 @@ the loop optimizer itself cannot prove that these assumptions are valid.
>>   If you use @option{-Wunsafe-loop-optimizations}, the compiler warns you
>>   if it finds this kind of loop.
>>
>> +@item -funconstrained-commons
>> +@opindex funconstrained-commons
>> +This option tells the compiler that variables declared in common blocks
>> +(e.g. Fortran) may later be overridden with longer trailing arrays. This
>> +prevents certain optimizations that depend on knowing the array bounds.
>> +
>>   @item -fcrossjumping
>>   @opindex fcrossjumping
>>   Perform cross-jumping transformation.
>> diff --git a/gcc/testsuite/gfortran.dg/unconstrained_commons.f b/gcc/testsuite/gfortran.dg/unconstrained_commons.f
>> new file mode 100644
>> index 000..f9fc471
>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/unconstrained_commons.f
>> @@ -0,0 +1,20 @@
>> +! { dg-do compile }
>> +! { dg-options "-O3 -funconstrained-commons -fdump-tree-dom2-details" }
>> +
>> +! Test for PR69368: a single-element array in a common block, which will be
>> +! overridden with a larger size at link time (contrary to language spec).
>> +! Dominator opts considers accesses to differently-computed elements of X as
>> +! equivalent, unless -funconstrained-commons is passed in.
>> +  SUBROUTINE FOO
>> +  IMPLICIT DOUBLE PRECISION (X)
>> +  INTEGER J
>> +  COMMON /MYCOMMON / X(1)
>> +  DO 10 J=1,1024
>> + X(J+1)=X(J+7)
>> +  10  CONTINUE
>> +  RETURN
>> +  END
>> +! { dg-final 

Re: Add C++ special math functions to C++17

2016-03-10 Thread Tobias Burnus

Ed Smith-Rowland wrote:
> --- include/bits/specfun.h
> (svn+ssh://e...@gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/include) (working copy)
> +++ svn+ssh://e...@gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/include 
> (revision 234098)
> @@ -1,6 +1,6 @@
>  // Mathematical Special Functions for -*- C++ -*-
>  
> -// Copyright (C) 2006-2016 Free Software Foundation, Inc.
> +// Copyright (C) 2006-2015 Free Software Foundation, Inc.

The patch looks reversed ("---" for the "working copy", copyright-year change 
reversed etc.)

Cheers,

Tobias


[PATCH] Additional check for mask type in masked load/store vectorization

2016-03-10 Thread Ilya Enkovich
Hi,

Currently, when vectorizing a masked load/store, we don't check that
the mask and the value have the same number of elements.  Usually
conversion patterns apply and this case doesn't arise, but
conversion patterns may fail to determine the mask type properly,
so we shouldn't rely on them so heavily.

Unfortunately I couldn't make a small testcase where we fail
to determine mask type.  For me it happened when mixed C/Fortran
code was compiled with LTO and I had various scalar boolean types.

Bootstrapped and regtested on x86_64-pc-linux-gnu.  Ok for trunk
with no testcase?

Thanks,
Ilya
--
gcc/

2016-03-10  Ilya Enkovich  

* tree-vect-stmts.c (vectorizable_mask_load_store): Check mask
has a proper number of elements.
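
The rule this ChangeLog entry describes can be sketched outside GCC.
The following is only an illustration under stated assumptions: the
VecType struct and the mask_compatible function are hypothetical
stand-ins for GCC's vector type trees, not code from the patch.  It
models a mask being rejected when it is missing, is not a boolean
vector, or has a different element count than the value vector.

```cpp
#include <cassert>

// Hypothetical stand-in for a GCC vector type: only the two properties
// that the check in vectorizable_mask_load_store looks at.
struct VecType {
  unsigned subparts;   // number of elements, cf. TYPE_VECTOR_SUBPARTS
  bool is_boolean;     // cf. VECTOR_BOOLEAN_TYPE_P
};

// Returns true when a masked load/store with this mask and value type
// passes the checks described in the patch.
bool mask_compatible(const VecType* mask, const VecType& value) {
  assert(value.subparts > 0);            // a vector has at least one element
  if (mask == nullptr || !mask->is_boolean)
    return false;                        // checks that existed before
  return mask->subparts == value.subparts;   // check added by the patch
}
```

With a 4-element value vector, a 4-element boolean mask is accepted
while an 8-element boolean mask is rejected, which is the mismatch the
patch guards against.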


diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 6ac273d..06b1ab7 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1742,7 +1742,8 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi,
   if (!mask_vectype)
 mask_vectype = get_mask_type_for_scalar_type (TREE_TYPE (vectype));
 
-  if (!mask_vectype || !VECTOR_BOOLEAN_TYPE_P (mask_vectype))
+  if (!mask_vectype || !VECTOR_BOOLEAN_TYPE_P (mask_vectype)
+  || TYPE_VECTOR_SUBPARTS (mask_vectype) != TYPE_VECTOR_SUBPARTS (vectype))
 return false;
 
   if (is_store)


Re: [PATCH 2/2][GCC][ARM] Fix testcases after introduction of Cortex-R8

2016-03-10 Thread Andre Vieira (lists)
On 09/03/16 17:03, Mike Stump wrote:
> On Mar 9, 2016, at 8:57 AM, Andre Vieira (lists) 
>  wrote:
>>> I'm seeing a DejaGNU error while testing
>>> RUNTESTFLAGS="arm.exp=pr45701-*.c":
>>> ERROR: (DejaGnu) proc "^-" does not exist.
> 
>> 2016-03-09 Andre Vieira 
>>
>> * gcc.target/arm/pr45701-1.c: Escape brackets.
> 
> Be sure to run test cases with dejagnu before check in.
> 
Hi Mike,

I did run them, but apparently Tcl errors don't appear as a FAIL and I
failed to spot the Tcl error message. I'll keep an eye out for those in
the future.

Cheers,
Andre


Re: [PATCH] PR69195, Reload confused by invalid reg equivs

2016-03-10 Thread Bernd Schmidt

On 03/10/2016 10:18 AM, Alan Modra wrote:

Doing the indirect jump optimization turned out to be quite easy.

Bootstrapped and regression tested powerpc64le-linux, gcc-6, gcc-5 and
gcc-4.9.  Bootstrap and regression test x86_64-linux still running.
OK to apply?


So much nicer. Ok, and thanks.


Bernd



[PATCH] PR69195, Reload confused by invalid reg equivs

2016-03-10 Thread Alan Modra
Doing the indirect jump optimization turned out to be quite easy.

Bootstrapped and regression tested powerpc64le-linux, gcc-6, gcc-5 and
gcc-4.9.  Bootstrap and regression test x86_64-linux still running.
OK to apply?

gcc/
PR rtl-optimization/69195
PR rtl-optimization/47992
* ira.c (recorded_label_ref): Delete.
(update_equiv_regs): Return void.
(indirect_jump_optimize): New function.
(ira): Call indirect_jump_optimize and delete_trivially_dead_insns
before regstat_compute_ri.  Don't rebuild_jump_labels here.
Delete update_regstat.
gcc/testsuite/
* gcc.dg/pr69195.c: New.
* gcc.dg/pr69238.c: New.
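
The idea behind the new indirect_jump_optimize function can be
sketched with a toy model.  This is not GCC's RTL: the Jump and
RegInfo types are hypothetical, and RegInfo::label merely stands in
for a REG_LABEL_OPERAND note on the defining insn.  The sketch assumes
the same condition as the patch, namely that the jump register has
exactly one definition and that definition loads a label address.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model: a jump either names a label directly or goes through a
// register.
struct Jump {
  bool indirect;        // true: "jump *reg", false: "jump label"
  int reg;              // register used when indirect
  std::string label;    // target label when direct
};

// Hypothetical per-register summary: number of definitions and, when the
// register is set from a label address, the name of that label.
struct RegInfo {
  int def_count;
  std::string label;    // empty when the definition is not a label
};

// Rewrites each indirect jump whose register has a single definition that
// loads a label.  Returns true if anything changed, i.e. when the jump
// labels would have to be rebuilt afterwards.
bool indirect_jump_optimize(std::vector<Jump>& jumps,
                            const std::map<int, RegInfo>& regs) {
  bool changed = false;
  for (Jump& j : jumps) {
    if (!j.indirect)
      continue;
    assert(j.reg >= 0);                 // toy registers are numbered from 0
    auto it = regs.find(j.reg);
    if (it == regs.end() || it->second.def_count != 1
        || it->second.label.empty())
      continue;                         // not a case we can handle
    j.indirect = false;
    j.label = it->second.label;
    changed = true;
  }
  return changed;
}
```

A register set twice, or set from something other than a label, leaves
the jump indirect, which mirrors the conservative behaviour in the
patch.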

diff --git a/gcc/ira.c b/gcc/ira.c
index 0973258..5e7a2ed 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -3319,9 +3319,6 @@ adjust_cleared_regs (rtx loc, const_rtx old_rtx ATTRIBUTE_UNUSED, void *data)
   return NULL_RTX;
 }
 
-/* Nonzero if we recorded an equivalence for a LABEL_REF.  */
-static int recorded_label_ref;
-
 /* Find registers that are equivalent to a single value throughout the
compilation (either because they can be referenced in memory or are
set once from a single constant).  Lower their priority for a
@@ -3331,10 +3328,8 @@ static int recorded_label_ref;
value into the using insn.  If it succeeds, we can eliminate the
register completely.
 
-   Initialize init_insns in ira_reg_equiv array.
-
-   Return non-zero if jump label rebuilding should be done.  */
-static int
+   Initialize init_insns in ira_reg_equiv array.  */
+static void
 update_equiv_regs (void)
 {
   rtx_insn *insn;
@@ -3343,10 +3338,6 @@ update_equiv_regs (void)
   bitmap cleared_regs;
   bool *pdx_subregs;
 
-  /* We need to keep track of whether or not we recorded a LABEL_REF so
- that we know if the jump optimizer needs to be rerun.  */
-  recorded_label_ref = 0;
-
   /* Use pdx_subregs to show whether a reg is used in a paradoxical
  subreg.  */
   pdx_subregs = XCNEWVEC (bool, max_regno);
@@ -3578,17 +3569,6 @@ update_equiv_regs (void)
  = gen_rtx_INSN_LIST (VOIDmode, insn,
   ira_reg_equiv[regno].init_insns);
 
- /* Record whether or not we created a REG_EQUIV note for a LABEL_REF.
-We might end up substituting the LABEL_REF for uses of the
-pseudo here or later.  That kind of transformation may turn an
-indirect jump into a direct jump, in which case we must rerun the
-jump optimizer to ensure that the JUMP_LABEL fields are valid.  */
- if (GET_CODE (x) == LABEL_REF
- || (GET_CODE (x) == CONST
- && GET_CODE (XEXP (x, 0)) == PLUS
- && (GET_CODE (XEXP (XEXP (x, 0), 0)) == LABEL_REF)))
-   recorded_label_ref = 1;
-
   reg_equiv[regno].replacement = x;
  reg_equiv[regno].src_p = &SET_SRC (set);
  reg_equiv[regno].loop_depth = (short) loop_depth;
@@ -3706,9 +3686,9 @@ update_equiv_regs (void)
  if (! INSN_P (insn))
continue;
 
- /* Don't substitute into a non-local goto, this confuses CFG.  */
- if (JUMP_P (insn)
- && find_reg_note (insn, REG_NON_LOCAL_GOTO, NULL_RTX))
+ /* Don't substitute into jumps.  indirect_jump_optimize does
+this for anything we are prepared to handle.  */
+ if (JUMP_P (insn))
continue;
 
  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
@@ -3860,11 +3840,50 @@ update_equiv_regs (void)
   end_alias_analysis ();
   free (reg_equiv);
   free (pdx_subregs);
-  return recorded_label_ref;
 }
 
-
+/* A pass over indirect jumps, converting simple cases to direct jumps.  */
+static void
+indirect_jump_optimize (void)
+{
+  basic_block bb;
+  bool rebuild_p = false;
 
+  FOR_EACH_BB_REVERSE_FN (bb, cfun)
+{
+  rtx_insn *insn = BB_END (bb);
+  if (!JUMP_P (insn))
+   continue;
+
+  rtx x = pc_set (insn);
+  if (!x || !REG_P (SET_SRC (x)))
+   continue;
+
+  int regno = REGNO (SET_SRC (x));
+  if (DF_REG_DEF_COUNT (regno) == 1)
+   {
+ rtx_insn *def_insn = DF_REF_INSN (DF_REG_DEF_CHAIN (regno));
+ rtx note = find_reg_note (def_insn, REG_LABEL_OPERAND, NULL_RTX);
+
+ if (note)
+   {
+ rtx lab = gen_rtx_LABEL_REF (Pmode, XEXP (note, 0));
+ if (validate_replace_rtx (SET_SRC (x), lab, insn))
+   rebuild_p = true;
+   }
+   }
+}
+
+  if (rebuild_p)
+{
+  timevar_push (TV_JUMP);
+  rebuild_jump_labels (get_insns ());
+  if (purge_all_dead_edges ())
+   delete_unreachable_blocks ();
+  timevar_pop (TV_JUMP);
+}
+}
+
 /* Set up fields memory, constant, and invariant from init_insns in
the structures of array ira_reg_equiv.  */
 static void
@@ -5090,7 +5109,6 @@ ira (FILE *f)
 {
   bool loops_p;
   int 

Re: [PATCH 2/2][GCC][ARM] Fix testcases after introduction of Cortex-R8

2016-03-10 Thread Kyrill Tkachov


On 09/03/16 16:57, Andre Vieira (lists) wrote:

On 08/03/16 14:56, Kyrill Tkachov wrote:

Hi Andre,

On 08/03/16 11:05, Andre Vieira (lists) wrote:

On 03/03/16 11:28, Kyrill Tkachov wrote:

Hi Andre,

On 02/03/16 12:21, Andre Vieira (lists) wrote:

Hi,

Tests used to check for "r8" which will not work because cortex-r8
string is now included in the assembly. Fixed by checking for
"[^\-]r8".

Is this Ok?

Cheers,
Andre

gcc/testsuite/ChangeLog:

2016-03-02  Andre Vieira  

* gcc.target/arm/pr45701-1.c: Change assembler scan to not
trigger for cortex-r8, when scanning for register r8.
* gcc.target/arm/pr45701-2.c: Likewise.

Ok.
Thanks,
Kyrill


Thomas committed on my behalf at revision r234040.

Had to rebase arm-tune.md and invoke.texi, these were all obvious
changes.

I'm seeing a DejaGNU error while testing
RUNTESTFLAGS="arm.exp=pr45701-*.c":
ERROR: (DejaGnu) proc "^-" does not exist.
The error code is NONE
The info on the error is:
invalid command name "^-"
 while executing
"::tcl_unknown ^-"
 ("uplevel" body line 1)
 invoked from within
"uplevel 1 ::tcl_unknown $args"

That's due to the scan-assembler-not test:
/* { dg-final { scan-assembler-not "[^\-]r8" } } */

The '[' and ']' need to be escaped by a backslash.
Can you please post a patch to add the escapes.
Sorry for missing this in the original review...

Kyrill


Cheers,
Andre


Hi there,

Sorry for missing those too.


Looks ok.
Thanks,
Kyrill


2016-03-09 Andre Vieira 

* gcc.target/arm/pr45701-1.c: Escape brackets.
* gcc.target/arm/pr45701-2.c: Likewise.
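
For reference, here is a sketch of what the scan pattern is meant to
match.  It uses C++ std::regex as a stand-in for the Tcl regular
expressions that DejaGnu uses, which is an assumption made only for
illustration; the Tcl bracket escaping discussed above is a separate
issue and does not arise in C++.  The function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true when asm_text mentions register r8 in the way the
// scan-assembler-not pattern is meant to catch: "r8" preceded by any
// character other than '-'.
bool mentions_r8_register(const std::string& asm_text) {
  // "[^-]r8" is the same character class as "[^\-]r8" in the test; the
  // backslash is dropped to keep the C++ string literal simple.
  static const std::regex pattern("[^-]r8");
  return std::regex_search(asm_text, pattern);
}

// Usage example: register uses match, the CPU name does not.
void check_r8_pattern() {
  assert(mentions_r8_register("\tmov\tr8, r0"));
  assert(!mentions_r8_register("\t.cpu cortex-r8"));
}
```

The second case is why the original plain "r8" scan started failing
once the cortex-r8 string showed up in the assembly output.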




Re: [RFC] dump_varmap in tree-ssa-structalias.c

2016-03-10 Thread Richard Biener
On Thu, 10 Mar 2016, Tom de Vries wrote:

> Hi,
> 
> I wrote attached patch to print the actual contents of the varmap variable in
> tree-ssa-structalias.c.
> 
> Does it make sense to rewrite this into a dump_varmap/debug_varmap patch?

Yes (but please not dump it by default) and I'd rather have a
split-out dump_varinfo to work with when debugging.

Richard.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Re: Add C++ special math functions to C++17

2016-03-10 Thread Ed Smith-Rowland

On 03/10/2016 03:48 AM, Jakub Jelinek wrote:

On Thu, Mar 10, 2016 at 03:44:26AM -0500, Ed Smith-Rowland wrote:

Here is a small patch to allow the special maths functions that came in with
TR29124 into C++17 in Jacksonville.
I am leaving the TR29124 access with __STDCPP_WANT_MATH_SPEC_FUNCS__ for C++
< 17.

I added the SD-6 feature macro:
#define __cpp_lib_math_special_functions 201603L

Passes testing on x86-64-linux.

OK?

ENOPATCH ;)

Jakub


Arrgh!

2016-03-10  Edward Smith-Rowland  <3dw...@verizon.net>

Pull C++ Special Math into std for C++17.
* include/bits/specfun.h: Allow special functions for C++17;
define __cpp_lib_math_special_functions.
* include/c_global/cmath: Allow special functions for C++17.
* include/c_compatibility/math.h: Ditto

Index: include/bits/specfun.h
===
--- include/bits/specfun.h  
(svn+ssh://e...@gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/include) (working copy)
+++ svn+ssh://e...@gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/include   
(revision 234098)
@@ -1,6 +1,6 @@
 // Mathematical Special Functions for -*- C++ -*-
 
-// Copyright (C) 2006-2016 Free Software Foundation, Inc.
+// Copyright (C) 2006-2015 Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -34,21 +34,15 @@
 
 #include 
 
-#define __STDCPP_MATH_SPEC_FUNCS__ 201003L
-
-#define __cpp_lib_math_special_functions 201603L
-
-#if __cplusplus <= 201402L && __STDCPP_WANT_MATH_SPEC_FUNCS__ == 0
+#if __STDCPP_WANT_MATH_SPEC_FUNCS__ == 0
 # error include  and define __STDCPP_WANT_MATH_SPEC_FUNCS__
 #endif
 
+#define __STDCPP_MATH_SPEC_FUNCS__ 201003L
+
+#include 
 #include 
-#include 
-#if __cplusplus >= 201103L
-#  include 
-#else
-#  include 
-#endif
+#include 
 
 #include 
 #include 
Index: include/c_global/cmath
===
--- include/c_global/cmath  
(svn+ssh://e...@gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/include) (working copy)
+++ svn+ssh://e...@gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/include   
(revision 234098)
@@ -1790,7 +1790,7 @@
 
 #endif // C++11
 
-#if __cplusplus > 201402L || __STDCPP_WANT_MATH_SPEC_FUNCS__ == 1
+#if __STDCPP_WANT_MATH_SPEC_FUNCS__ == 1
 #  include 
 #endif
 
Index: include/c_compatibility/math.h
===
--- include/c_compatibility/math.h  
(svn+ssh://e...@gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/include) (working copy)
+++ svn+ssh://e...@gcc.gnu.org/svn/gcc/trunk/libstdc++-v3/include   
(revision 234098)
@@ -111,7 +111,7 @@
 using std::trunc;
 #endif // C++11 && _GLIBCXX_USE_C99_MATH_TR1
 
-#if __cplusplus > 201402L || __STDCPP_WANT_MATH_SPEC_FUNCS__ == 1
+#if __STDCPP_WANT_MATH_SPEC_FUNCS__ == 1
 using std::assoc_laguerref;
 using std::assoc_laguerrel;
 using std::assoc_laguerre;


Re: Add C++ special math functions to C++17

2016-03-10 Thread Jakub Jelinek
On Thu, Mar 10, 2016 at 03:44:26AM -0500, Ed Smith-Rowland wrote:
> Here is a small patch to allow the special maths functions that came in with
> TR29124 into C++17 in Jacksonville.
> I am leaving the TR29124 access with __STDCPP_WANT_MATH_SPEC_FUNCS__ for C++
> < 17.
> 
> I added the SD-6 feature macro:
> #define __cpp_lib_math_special_functions 201603L
> 
> Passes testing on x86-64-linux.
> 
> OK?

ENOPATCH ;)

Jakub


Add C++ special math functions to C++17

2016-03-10 Thread Ed Smith-Rowland
Here is a small patch to allow the special maths functions that came in 
with TR29124 into C++17 in Jacksonville.
I am leaving the TR29124 access with __STDCPP_WANT_MATH_SPEC_FUNCS__ for 
C++ < 17.


I added the SD-6 feature macro:
#define __cpp_lib_math_special_functions 201603L

Passes testing on x86-64-linux.

OK?
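
As a rough illustration of how user code might key off the SD-6 macro,
here is a small sketch.  The function names are hypothetical, and the
fallback series is only there so the example builds with libraries
that do not provide the special functions; the value 201603L is the
one used in the patch posted in this thread.

```cpp
#include <cassert>
#include <cmath>

// True when the library advertises the C++17 special math functions
// through the SD-6 feature-test macro.
bool has_cpp17_special_math() {
#if defined(__cpp_lib_math_special_functions) \
    && __cpp_lib_math_special_functions >= 201603L
  return true;
#else
  return false;
#endif
}

// Riemann zeta at 2 (pi^2 / 6): uses std::riemann_zeta when the macro
// says it is available, otherwise a plain partial sum as a rough fallback.
double zeta_of_two() {
#if defined(__cpp_lib_math_special_functions) \
    && __cpp_lib_math_special_functions >= 201603L
  double z = std::riemann_zeta(2.0);
#else
  double z = 0.0;
  for (int n = 100000; n >= 1; --n)
    z += 1.0 / (static_cast<double>(n) * n);
#endif
  assert(z > 1.0);   // sanity check: zeta(2) is roughly 1.645
  return z;
}
```

Code that must also build for C++ < 17 would still need
__STDCPP_WANT_MATH_SPEC_FUNCS__, as described above.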



Fix 69650, bogus line numbers from libcpp

2016-03-10 Thread Bernd Schmidt
This is a case where bogus #line directives can confuse libcpp into 
producing nonsensical line numbers, even leading to a crash later on in LTO.


The following patch moves the test earlier to a point where we can more 
easily recover from the error condition. I should note that I changed 
the raw fprintf (stderr) to a cpp_error call, which is a slight change 
in behaviour (we don't even get to LTO anymore due to erroring out earlier).


Bootstrapped and tested on x86_64-linux (as always including Ada, which 
failed with an earlier version of the patch). Ok?



Bernd
	PR lto/69650
	* directives.c (do_linemarker): Test for file left but not entered
	here.
	* line-map.c (linemap_add): Not here.

	PR lto/69650
	* gcc.dg/pr69650.c: New test.

Index: libcpp/directives.c
===
--- libcpp/directives.c	(revision 234025)
+++ libcpp/directives.c	(working copy)
@@ -1046,6 +1046,19 @@ do_linemarker (cpp_reader *pfile)
 
   skip_rest_of_line (pfile);
 
+  if (reason == LC_LEAVE)
+{
+  const line_map_ordinary *from;  
+  if (MAIN_FILE_P (map)
+	  || (new_file
+	  && (from = INCLUDED_FROM (pfile->line_table, map)) != NULL
+	  && filename_cmp (ORDINARY_MAP_FILE_NAME (from), new_file) != 0))
+	{
+	  cpp_error (pfile, CPP_DL_ERROR,
+		 "file \"%s\" left but not entered", new_file);
+	  return;
+	}
+}
   /* Compensate for the increment in linemap_add that occurs in
  _cpp_do_file_change.  We're currently at the start of the line
  *following* the #line directive.  A separate source_location for this
Index: libcpp/line-map.c
===
--- libcpp/line-map.c	(revision 234025)
+++ libcpp/line-map.c	(working copy)
@@ -514,43 +514,23 @@ linemap_add (struct line_maps *set, enum
 	 "included", this variable points the map in use right before the
 	 #include "included", inside the same "includer" file.  */
   line_map_ordinary *from;
-  bool error;
-
-  if (MAIN_FILE_P (map - 1))
-	{
-	  /* So this _should_ mean we are leaving the main file --
-	 effectively ending the compilation unit. But to_file not
-	 being NULL means the caller thinks we are leaving to
-	 another file. This is an erroneous behaviour but we'll
-	 try to recover from it. Let's pretend we are not leaving
-	 the main file.  */
-	  error = true;
-  reason = LC_RENAME;
-  from = map - 1;
-	}
-  else
-	{
-	  /* (MAP - 1) points to the map we are leaving. The
-	 map from which (MAP - 1) got included should be the map
-	 that comes right before MAP in the same file.  */
-	  from = INCLUDED_FROM (set, map - 1);
-	  error = to_file && filename_cmp (ORDINARY_MAP_FILE_NAME (from),
-	   to_file);
-	}
 
-  /* Depending upon whether we are handling preprocessed input or
-	 not, this can be a user error or an ICE.  */
-  if (error)
-	fprintf (stderr, "line-map.c: file \"%s\" left but not entered\n",
-		 to_file);
+  linemap_assert (!MAIN_FILE_P (map - 1));
+  /* (MAP - 1) points to the map we are leaving. The
+	 map from which (MAP - 1) got included should be the map
+	 that comes right before MAP in the same file.  */
+  from = INCLUDED_FROM (set, map - 1);
 
   /* A TO_FILE of NULL is special - we use the natural values.  */
-  if (error || to_file == NULL)
+  if (to_file == NULL)
 	{
 	  to_file = ORDINARY_MAP_FILE_NAME (from);
 	  to_line = SOURCE_LINE (from, from[1].start_location);
 	  sysp = ORDINARY_MAP_IN_SYSTEM_HEADER_P (from);
 	}
+  else
+	linemap_assert (filename_cmp (ORDINARY_MAP_FILE_NAME (from),
+  to_file) == 0);
 }
 
   map->sysp = sysp;
Index: gcc/testsuite/gcc.dg/pr69650.c
===
--- gcc/testsuite/gcc.dg/pr69650.c	(revision 0)
+++ gcc/testsuite/gcc.dg/pr69650.c	(working copy)
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" } */
+
+# 9 "" 2 /* { dg-error "left but not entered" } */
+not_a_type a; /* { dg-error "unknown type" } */
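
The condition that do_linemarker now checks can be sketched with a toy
include stack.  This is only a model under stated assumptions: the
IncludeStack type is hypothetical and tracks file names alone, whereas
libcpp works on line maps.  An empty to_file plays the role of a NULL
new_file.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of the include stack that linemarkers describe.  files[0] is
// the main file; the back of the vector is the file currently being read.
struct IncludeStack {
  std::vector<std::string> files;

  explicit IncludeStack(const std::string& main_file) : files(1, main_file) {}

  // Linemarker flag 1: a new file is entered.
  void enter(const std::string& file) { files.push_back(file); }

  // Linemarker flag 2: return to to_file.  Returns false, leaving the
  // stack untouched, in the "left but not entered" cases: there is no
  // includer to return to, or the includer has a different name.
  bool leave(const std::string& to_file) {
    assert(!files.empty());
    if (files.size() == 1)
      return false;                       // cannot leave the main file
    if (!to_file.empty() && files[files.size() - 2] != to_file)
      return false;                       // returning to the wrong file
    files.pop_back();
    return true;
  }
};
```

Rejecting the marker before the stack is modified is the point of
moving the test earlier: the caller can report the error and carry on
with consistent state.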



[committed][testsuite] Xfail pr46032.c scan if alignment requirements not met

2016-03-10 Thread Tom de Vries

Hi,

this patch fixes re-opened PR68915 on sparc.

It xfails a scan line that's not applicable for sparc.

Committed to trunk.

Thanks,
- Tom
Xfail pr46032.c scan if alignment requirements not met

2016-03-10  Tom de Vries  

	PR testsuite/68915
	* gcc.dg/vect/pr46032.c: Xfail scan if alignment requirements not met.

---
 gcc/testsuite/gcc.dg/vect/pr46032.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr46032.c b/gcc/testsuite/gcc.dg/vect/pr46032.c
index d53d1f2..e1a5834 100644
--- a/gcc/testsuite/gcc.dg/vect/pr46032.c
+++ b/gcc/testsuite/gcc.dg/vect/pr46032.c
@@ -44,6 +44,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "note: vectorized 1 loop" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "note: vectorized 1 loop" 1 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
 /* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" } } */
 


Fix 70083, lra-induced crash

2016-03-10 Thread Bernd Schmidt
This crash happens because LRA tries to save an AVX hard reg in a large 
mode, and it only appears in the function in smaller modes. Stack 
alignment isn't set up to support the larger mode.


Currently, biggest_mode for hard registers is set up from regno_reg_rtx, 
set up to a large mode for argument regs. That mode is not necessarily 
seen in the function itself and may be too large. If that initialization 
is changed to use VOIDmode, we compute the correct value during 
lra_push_insns, but then subsequently we clear it to VOIDmode again, and 
it never seems to get updated. Hence, the patch has several parts: 
initialize hard reg biggest_mode with VOIDmode, ensure it gets updated 
during process_bb_lives, and use the value in split_reg.


Bootstrapped and tested on x86_64-linux, ok?


Bernd
	PR target/70083
	* lra-lives.c (process_bb_lives): Also update biggest mode for hard
	regs.
	(lra_create_live_ranges_1): initialize hard register biggest_mode to
	VOIDmode.
	* lra-constraints.c (split_reg): For hard regs, try to find the
	biggest single-register mode used in the function.

testsuite/
	PR target/70083
	* gcc.dg/torture/pr70083.c: New test.
	* gcc.target/i386/pr70083.c: New test.
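
The mode selection described above can be sketched as follows.  This
is a toy model, not LRA code: a "mode" is reduced to its size in
bytes, size 0 plays the role of VOIDmode, and both function names are
hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model: a mode is represented only by its size in bytes.
typedef unsigned ModeSize;

// Largest mode in which a hard register is referenced in the function,
// starting from VOIDmode (0) rather than from the register's natural mode.
ModeSize biggest_mode_used(const std::vector<ModeSize>& reference_sizes) {
  ModeSize biggest = 0;
  for (ModeSize s : reference_sizes)
    biggest = std::max(biggest, s);
  return biggest;
}

// Mode used to save and restore the hard register when splitting: the
// biggest mode seen, capped at the register's natural (regno_reg_rtx) mode.
ModeSize split_mode(ModeSize biggest_seen, ModeSize natural_mode) {
  assert(natural_mode > 0);   // a hard register always has a natural mode
  return std::min(biggest_seen, natural_mode);
}
```

For a register with a 32-byte natural mode that the function only
uses in 16-byte and 8-byte modes, this picks 16 bytes, so the save
slot no longer needs the larger alignment.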

Index: gcc/lra-lives.c
===
--- gcc/lra-lives.c	(revision 234025)
+++ gcc/lra-lives.c	(working copy)
@@ -700,12 +700,13 @@ process_bb_lives (basic_block bb, int 
 
   /* Update max ref width and hard reg usage.  */
   for (reg = curr_id->regs; reg != NULL; reg = reg->next)
-	if (reg->regno >= FIRST_PSEUDO_REGISTER
-	&& (GET_MODE_SIZE (reg->biggest_mode)
-		> GET_MODE_SIZE (lra_reg_info[reg->regno].biggest_mode)))
-	  lra_reg_info[reg->regno].biggest_mode = reg->biggest_mode;
-	else if (reg->regno < FIRST_PSEUDO_REGISTER)
-	  lra_hard_reg_usage[reg->regno] += freq;
+	{
+	  if (GET_MODE_SIZE (reg->biggest_mode)
+	  > GET_MODE_SIZE (lra_reg_info[reg->regno].biggest_mode))
+	lra_reg_info[reg->regno].biggest_mode = reg->biggest_mode;
+	  if (reg->regno < FIRST_PSEUDO_REGISTER)
+	lra_hard_reg_usage[reg->regno] += freq;
+	}
 
   call_p = CALL_P (curr_insn);
   src_regno = (set != NULL_RTX && REG_P (SET_SRC (set))
@@ -1208,7 +1209,7 @@ lra_create_live_ranges_1 (bool all_p, bo
 	 conservative because of recent transformation.  Here in this
 	 file we recalculate it again as it costs practically
 	 nothing.  */
-  if (regno_reg_rtx[i] != NULL_RTX)
+  if (i >= FIRST_PSEUDO_REGISTER && regno_reg_rtx[i] != NULL_RTX)
 	lra_reg_info[i].biggest_mode = GET_MODE (regno_reg_rtx[i]);
   else
 	lra_reg_info[i].biggest_mode = VOIDmode;
Index: gcc/lra-constraints.c
===
--- gcc/lra-constraints.c	(revision 234025)
+++ gcc/lra-constraints.c	(working copy)
@@ -4972,6 +4972,7 @@ split_reg (bool before_p, int original_r
   rtx_insn *restore, *save;
   bool after_p;
   bool call_save_p;
+  machine_mode mode;
 
   if (original_regno < FIRST_PSEUDO_REGISTER)
 {
@@ -4979,24 +4980,32 @@ split_reg (bool before_p, int original_r
   hard_regno = original_regno;
   call_save_p = false;
   nregs = 1;
+  mode = lra_reg_info[hard_regno].biggest_mode;
+  machine_mode reg_rtx_mode = GET_MODE (regno_reg_rtx[hard_regno]);
+  if (GET_MODE_SIZE (mode) > GET_MODE_SIZE (reg_rtx_mode))
+	{
+	  original_reg = regno_reg_rtx[hard_regno];
+	  mode = reg_rtx_mode;
+	}
+  else
+	original_reg = gen_rtx_REG (mode, hard_regno);
 }
   else
 {
+  mode = PSEUDO_REGNO_MODE (original_regno);
   hard_regno = reg_renumber[original_regno];
-  nregs = hard_regno_nregs[hard_regno][PSEUDO_REGNO_MODE (original_regno)];
+  nregs = hard_regno_nregs[hard_regno][mode];
   rclass = lra_get_allocno_class (original_regno);
   original_reg = regno_reg_rtx[original_regno];
   call_save_p = need_for_call_save_p (original_regno);
 }
-  original_reg = regno_reg_rtx[original_regno];
   lra_assert (hard_regno >= 0);
   if (lra_dump_file != NULL)
 fprintf (lra_dump_file,
 	 "	  \n");
+	  
   if (call_save_p)
 {
-  machine_mode mode = GET_MODE (original_reg);
-
   mode = HARD_REGNO_CALLER_SAVE_MODE (hard_regno,
 	  hard_regno_nregs[hard_regno][mode],
 	  mode);
@@ -5004,8 +5013,7 @@ split_reg (bool before_p, int original_r
 }
   else
 {
-  rclass = choose_split_class (rclass, hard_regno,
-   GET_MODE (original_reg));
+  rclass = choose_split_class (rclass, hard_regno, mode);
   if (rclass == NO_REGS)
 	{
 	  if (lra_dump_file != NULL)
@@ -5023,8 +5031,7 @@ split_reg (bool before_p, int original_r
 	}
 	  return false;
 	}
-  new_reg = lra_create_new_reg (GET_MODE (original_reg), original_reg,
-rclass, "split");
+  new_reg = lra_create_new_reg (mode, original_reg, rclass, "split");
   reg_renumber[REGNO (new_reg)] = hard_regno;
 }
   save = 

[PATCH] Fix PR70128

2016-03-10 Thread Richard Biener

The following fixes a miscompilation of the linux kernel which does
patching of instructions by accessing memory relative to function
addresses (that's undefined in C and won't work for targets using
function descriptors but hey - those guys say they know what they
are doing).

Thus the following makes points-to analysis consider function and
label addresses as pointing to global memory (rather than "readonly"
data which we don't put into points-to sets because readonly data
is not interesting for alias analysis purposes).

We still don't consider this instruction data to hold valid pointers
so if people will start to extract pointers from there and expect
dereferences to alias say global symbols they'll still not get what
they expect - if we were to handle this "correctly" as well we'd
get negative effects from our analysis by getting NONLOCAL/ANYTHING
bleed through the solution.  The patch below doesn't suffer from
anything like that because it only changes how we interpret the
points-to result rather than changing its outcome.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-03-10  Richard Biener  

PR tree-optimization/70128
* tree-ssa-structalias.c (set_uids_in_ptset): Set
vars_contains_nonlocal for any FUNCTION_DECL or LABEL_DECL.

* gcc.dg/tree-ssa/alias-34.c: New testcase.
* gcc.dg/tree-ssa/alias-35.c: Likewise.

Index: gcc/tree-ssa-structalias.c
===
*** gcc/tree-ssa-structalias.c  (revision 234025)
--- gcc/tree-ssa-structalias.c  (working copy)
*** set_uids_in_ptset (bitmap into, bitmap f
*** 6280,6285 
--- 6280,6295 
  && ! auto_var_in_fn_p (vi->decl, fndecl)))
pt->vars_contains_nonlocal = true;
}
+ 
+   else if (TREE_CODE (vi->decl) == FUNCTION_DECL
+  || TREE_CODE (vi->decl) == LABEL_DECL)
+   {
+ /* Nothing should read/write from/to code so we can
+save bits by not including them in the points-to bitmaps.
+Still mark the points-to set as containing global memory
+to make code-patching possible - see PR70128.  */
+ pt->vars_contains_nonlocal = true;
+   }
  }
  }
  
Index: gcc/testsuite/gcc.dg/tree-ssa/alias-34.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/alias-34.c(revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/alias-34.c(working copy)
***
*** 0 
--- 1,19 
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fno-strict-aliasing -fdump-tree-optimized" } */
+ 
+ void foo (int b)
+ {
+   void *p;
+ lab:
+   if (b)
+ p = &&lab;
+   else
+ {
+ lab2:
+   p = &&lab2;
+ }
+   *(char *)p = 1;
+ }
+ 
+ /* We should keep the store to the label locations.  */
+ /* { dg-final { scan-tree-dump " = 1;" "optimized" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/alias-35.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/alias-35.c(revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/alias-35.c(working copy)
***
*** 0 
--- 1,18 
+ /* PR70128 */
+ /* { dg-do compile } */
+ /* { dg-options "-O2 -fno-strict-aliasing -fdump-tree-optimized" } */
+ 
+ void foo (int b)
+ {
+   extern void bar (void);
+   extern void baz (void);
+   void *p;
+   if (b)
+ p = bar;
+   else
+ p = baz;
+   *(char *)p = 1;
+ }
+ 
+ /* We should keep the store to the function locations.  */
+ /* { dg-final { scan-tree-dump " = 1;" "optimized" } } */