Re: copy/copy_backward/fill/fill_n/equal rework

2019-09-24 Thread François Dumont

Ping ?

On 9/9/19 8:34 PM, François Dumont wrote:

Hi

    This patch improves stl_algobase.h 
copy/copy_backward/fill/fill_n/equal implementations. The improvements 
are:


- activation of algo specialization for __gnu_debug::_Safe_iterator 
(w/o _GLIBCXX_DEBUG mode)


- activation of algo specialization for _Deque_iterator even if mixed 
with another kind of iterator.


- activation of algo specializations __copy_move_a2 for something else 
than pointers. For example this code:


std::vector<char> v { 'a', 'b' };
std::ostreambuf_iterator<char> out(std::cout);
std::copy(v.begin(), v.end(), out);

is not calling the specialization __copy_move_a2(const char*, const 
char*, ostreambuf_iterator<>);
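
For reference, here is a self-contained version of the snippet above, as a sketch only. It writes into a std::ostringstream instead of std::cout so the result can be inspected, and the helper name copy_to_string is made up for this illustration. Which internal specialization the call reaches depends on the library version; the observable output is the same either way.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the patch): copy a vector<char> through
// an ostreambuf_iterator and return what ended up in the stream buffer.
std::string copy_to_string(const std::vector<char>& v)
{
  std::ostringstream os;
  std::ostreambuf_iterator<char> out(os);
  // The iterators here are vector iterators, not raw pointers, which is
  // the case the description above is about.
  std::copy(v.begin(), v.end(), out);
  return os.str();
}
```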


It also fixes a _GLIBCXX_DEBUG issue where the __niter_base 
specialization was wrongly removing the _Safe_iterator<> layer. The 
testsuite/25_algorithms/copy/debug/1_neg.cc test case was failing on a 
debug assertion because _after_ the copy we were trying to increment 
the vector iterator past the end. Of course the problem is the 
_after_: Debug mode should detect this _before_ it takes place, which 
it does now.
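
To illustrate the before/after distinction, here is a rough analogy and not how _Safe_iterator is implemented. The function checked_copy is hypothetical; it validates the destination range up front, so nothing is written when the copy would run past the end.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical checked copy (illustration only): verify that the
// destination has room *before* writing, instead of noticing the overrun
// once the elements have already been copied.
template <typename T>
typename std::vector<T>::iterator
checked_copy(const std::vector<T>& src, std::vector<T>& dst,
             typename std::vector<T>::iterator out)
{
  const std::size_t room = static_cast<std::size_t>(dst.end() - out);
  if (src.size() > room)
    throw std::out_of_range("checked_copy: destination too small");
  return std::copy(src.begin(), src.end(), out);
}
```

With a three-element source and a two-element destination the call throws and the destination is left untouched.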


Note that std::fill_n is now making use of std::fill for some 
optimizations dealing with random access iterators.


Performance is very good:

Before:

copy_backward_deque_iterators.cc    deque 2 deque               1084r   1084u    0s    0mem    0pf
copy_backward_deque_iterators.cc    deque 2 vector              3373r   3372u    0s    0mem    0pf
copy_backward_deque_iterators.cc    vector 2 deque              3316r   3316u    0s    0mem    0pf
copy_backward_deque_iterators.cc    int deque 2 char vector     3610r   3609u    0s    0mem    0pf
copy_backward_deque_iterators.cc    char vector 2 int deque     3552r   3552u    0s    0mem    0pf
copy_backward_deque_iterators.cc    deque 2 list               10528r  10528u    0s    0mem    0pf
copy_backward_deque_iterators.cc    list 2 deque                2161r   2162u    0s    0mem    0pf
copy_deque_iterators.cc             deque 2 deque                752r    751u    0s    0mem    0pf
copy_deque_iterators.cc             deque 2 vector              3300r   3299u    0s    0mem    0pf
copy_deque_iterators.cc             vector 2 deque              3144r   3140u    0s    0mem    0pf
copy_deque_iterators.cc             int deque 2 char vector     3340r   3338u    1s    0mem    0pf
copy_deque_iterators.cc             char vector 2 int deque     3132r   3132u    0s    0mem    0pf
copy_deque_iterators.cc             deque 2 list               10013r  10012u    0s    0mem    0pf
copy_deque_iterators.cc             list 2 deque                2274r   2275u    0s    0mem    0pf
equal_deque_iterators.cc            deque vs deque              8676r   8675u    0s    0mem    0pf
equal_deque_iterators.cc            deque vs vector             5870r   5870u    0s    0mem    0pf
equal_deque_iterators.cc            vector vs deque             3163r   3163u    0s    0mem    0pf
equal_deque_iterators.cc            int deque vs char vector    5845r   5845u    0s    0mem    0pf
equal_deque_iterators.cc            char vector vs int deque    3307r   3307u    0s    0mem    0pf


After:

copy_backward_deque_iterators.cc    deque 2 deque                697r    697u    0s    0mem    0pf
copy_backward_deque_iterators.cc    deque 2 vector               219r    218u    0s    0mem    0pf
copy_backward_deque_iterators.cc    vector 2 deque               453r    453u    0s    0mem    0pf
copy_backward_deque_iterators.cc    int deque 2 char vector     1914r   1915u    0s    0mem    0pf
copy_backward_deque_iterators.cc    char vector 2 int deque     2112r   2111u    0s    0mem    0pf
copy_backward_deque_iterators.cc    deque 2 list                7770r   7771u    0s    0mem    0pf
copy_backward_deque_iterators.cc    list 2 deque                2194r   2193u    0s    0mem    0pf
copy_deque_iterators.cc             deque 2 deque                505r    504u    0s    0mem    0pf
copy_deque_iterators.cc             deque 2 vector               221r    221u    0s    0mem    0pf
copy_deque_iterators.cc             vector 2 deque               398r    397u    0s    0mem    0pf
copy_deque_iterators.cc             int deque 2 char vector     1770r   1767u    0s    0mem    0pf
copy_deque_iterators.cc             char vector 2 int deque     1995r   1993u    0s    0mem    0pf
copy_deque_iterators.cc             deque 2 list                7650r   7641u    2s    0mem    0pf
copy_deque_iterators.cc             list 2 deque                2270r   2270u    0s    0mem    0pf
equal_deque_iterators.cc            deque vs deque               769r    768u    0s    0mem    0pf
equal_deque_iterators.cc            deque vs vector              231r    230u    0s    0mem    0pf
equal_deque_iterators.cc            vector vs deque              397r    397u    0s    0mem    0pf
equal_deque_iterators.cc            int deque vs char vector    1541r   1541u    0s    0mem    0pf
equal_deque_iterators.cc            char vector vs int deque    1623r   1623u    0s    0mem    0pf


In Debug Mode it is of course even better. I haven't had the patience 
to run the benches before the patch, it just takes 

Re: C++ PATCH for c++/91877 - ICE with converting member of packed struct.

2019-09-24 Thread Jason Merrill

On 9/24/19 11:17 AM, Marek Polacek wrote:

This started to ICE with my CWG 2352 fix.  In reference_binding, we now bind
directly when the types are similar, not just when they are the same.  But even
direct binding can involve a temporary, e.g. for a bit-field, or, as in this
test, for a packed field.


Well, if there's a temporary, it isn't direct binding.


convert_like will actually create the temporary, but we were triggering the
assert checking that the types are the same.  Now they don't have to be, so
adjust the assert accordingly.


Previously we could have gotten a derived class, but we would have a 
ck_base to make the assert succeed.  Do we want a ck_qual under the 
ck_ref_bind?  It isn't necessary for adjustment, but might be helpful 
for conversion sequence ranking (which seems to be missing some standard 
wording).  On the other hand, it might be too much trouble making things 
understand ck_ref_bind around ck_qual, and compare_ics can probably do 
just fine without.


The patch is OK, but please add some testcases for overload resolution 
with more and less-qualified similar types.
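
As a rough idea of what such a testcase exercises, here is a sketch using pointers passed by value, where the ranking rule is long-standing wording. The reference-binding variants, which are what the patch actually touches, are deliberately not shown. The names pick and call_with_unqualified are made up for this example.

```cpp
// Both parameter types are similar to 'int **'; they differ only in how
// many const qualifiers they add.
int pick(const int *const *) { return 1; }  // more qualified
int pick(int *const *)       { return 2; }  // less qualified

int call_with_unqualified()
{
  int x = 0;
  int *p = &x;
  int **pp = &p;
  // Both candidates need a qualification conversion from 'int **'; the
  // one that adds fewer qualifiers is the better conversion sequence.
  return pick(pp);
}
```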



Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-09-24  Marek Polacek  

PR c++/91877 - ICE with converting member of packed struct.
* call.c (convert_like_real): Use similar_type_p in an assert.

* g++.dg/conversion/packed1.C: New test.

diff --git gcc/cp/call.c gcc/cp/call.c
index 77f10a9f5f1..45b984ecb11 100644
--- gcc/cp/call.c
+++ gcc/cp/call.c
@@ -7382,8 +7382,7 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum,
tree type = TREE_TYPE (ref_type);
cp_lvalue_kind lvalue = lvalue_kind (expr);
  
-	gcc_assert (same_type_ignoring_top_level_qualifiers_p
-   (type, next_conversion (convs)->type));
+   gcc_assert (similar_type_p (type, next_conversion (convs)->type));
if (!CP_TYPE_CONST_NON_VOLATILE_P (type)
&& !TYPE_REF_IS_RVALUE (ref_type))
  {
diff --git gcc/testsuite/g++.dg/conversion/packed1.C gcc/testsuite/g++.dg/conversion/packed1.C
new file mode 100644
index 000..c4be930bc19
--- /dev/null
+++ gcc/testsuite/g++.dg/conversion/packed1.C
@@ -0,0 +1,12 @@
+// PR c++/91877 - ICE with converting member of packed struct.
+// { dg-do compile { target c++11 } }
+// { dg-options "-fpack-struct" }
+
+template <typename a> class b {
+public:
+  b(const a &);
+};
+struct {
+  int *c;
+} d;
+void e() { b<const int *>(d.c); }





[PATCH v4] Missed function specialization + partial devirtualization

2019-09-24 Thread luoxhu
Hi,

Sorry for replying so late due to cauldron conference and other LTO issues
I was working on.

v4 Changes:
 1. Rebase to trunk.
 2. Remove num_of_ics and use vector's length to avoid redundancy.
 3. Update the code in ipa-profile.c to improve review feasibility.
 4. Add function has_indirect_call_p and has_multiple_indirect_call_p.
 5. For parameter control, I will leave it to the next patch as it is a
relatively independent function.  Currently, the maximum number of
promotions is GCOV_TOPN_VALUES, as profile-generate only records 4
profiling values; the minimum probability is therefore adjusted to
25% in value-prof.c (it was previously hard-coded to 75% for a single
indirect target).  There is no control over the minimal number of edge
executions yet.  What's more, this patch is a bit large now.

This patch aims to fix PR69678, which is caused by PGO indirect call
profiling performance issues.
The bug where profiling data never worked was fixed by Martin's pull
back of the topN patches; performance got a GEOMEAN ~1% improvement (+24%
for 511.povray_r specifically).
Still, the default profile currently generates only a SINGLE indirect target
that is called more than 75% of the time.  This patch makes use of MULTIPLE
indirect targets in the LTO-WPA and LTO-LTRANS stages; as a result, function
specialization, profiling, partial devirtualization, inlining and
cloning can be done successfully based on it.
Performance improves from 0.70 sec to 0.38 sec on simple tests.
Details are:
  1.  PGO with topn is enabled by default now, but only one indirect
  target edge is generated in the ipa-profile pass, so add variables to enable
  multiple speculative edges through the passes.  speculative_id records the
  index of the direct edge bound to the indirect edge, and the length of
  indirect_call_targets records how many direct edges the indirect edge owns.
  gimple_ic is postponed to ipa-profile as in the default case, since the
  inline pass will decide whether it is beneficial to transform the indirect
  call.
  2.  Use speculative_id to track and search the reference node matched
  with the direct edge's callee for multiple targets.  Actually, it is the
  caller's responsibility to handle the direct edges mapped to the same
  indirect edge.  speculative_call_info will return the specified direct edge,
  so this mostly reuses the current IPA edge processing framework.
  3.  Enable multiple indirect call target analysis in the LTO WPA/LTRANS
  stages for full profile support in the ipa passes and cgraph_edge functions.
  speculative_id can be set by make_speculative when multiple targets are
  bound to one indirect edge, and is cloned if a new edge is cloned.
  speculative_id is streamed out and streamed in by lto like lto_stmt_uid.
  4.  Add 1 in-module testcase and 2 cross-module testcases.
  5.  Bootstrap and regression tests passed on Power8-LE.  No functional
  or performance regression for SPEC2017.
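
For readers unfamiliar with the transformation, the following is a source-level sketch of what speculating on two profiled targets amounts to. It is an illustration only: the real work happens on the call graph and GIMPLE, and all function names here are made up.

```cpp
using handler = int (*)(int);

int add_one(int x)   { return x + 1; }
int twice(int x)     { return x * 2; }
int flip_sign(int x) { return -x; }

// What the source says: a plain indirect call.
int call_indirect(handler h, int x) { return h(x); }

// Roughly what speculation with two hot targets amounts to: compare the
// pointer against each profiled target, call that target directly (so it
// becomes a candidate for inlining and cloning), and keep the indirect
// call as the fallback for everything else.
int call_speculated(handler h, int x)
{
  if (h == add_one)
    return add_one(x);
  if (h == twice)
    return twice(x);
  return h(x);
}
```

The speculated version must return the same result as the indirect call for every target, including ones that were not seen in the profile.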

gcc/ChangeLog

2019-09-25  Xiong Hu Luo  

PR ipa/69678
* cgraph.c (symbol_table::create_edge): Init speculative_id.
(cgraph_edge::make_speculative): Add param for setting speculative_id.
(cgraph_edge::speculative_call_info): Find reference by
speculative_id for multiple indirect targets.
(cgraph_edge::resolve_speculation): Decrease the speculations
for indirect edge, drop its speculative flag if no direct target
is left.
(cgraph_edge::redirect_call_stmt_to_callee): Likewise.
(cgraph_node::verify_node): Don't report error if speculative
edge not include statement.
(cgraph_edge::has_multiple_indirect_call_p): New function.
(cgraph_edge::has_indirect_call_p): New function.
* cgraph.h (struct indirect_target_info): New struct.
(indirect_call_targets): New vector variable.
(make_speculative): Add param for setting speculative_id.
(cgraph_edge::has_multiple_indirect_call_p): New declare.
(cgraph_edge::has_indirect_call_p): New declare.
(speculative_id): New variable.
* cgraphclones.c (cgraph_node::create_clone): Clone speculative_id.
* ipa-inline.c (inline_small_functions): Fix iterator update.
* ipa-profile.c (ipa_profile_generate_summary): Add indirect
multiple targets logic.
(ipa_profile): Likewise.
* ipa-ref.h (speculative_id): New variable.
* ipa.c (process_references): Fix typo.
* lto-cgraph.c (lto_output_edge): Add indirect multiple targets
logic.  Stream out speculative_id.
(input_edge): Likewise.
* predict.c (dump_prediction): Remove edges count assert to be
precise.
* symtab.c (symtab_node::create_reference): Init speculative_id.
(symtab_node::clone_references): Clone speculative_id.
(symtab_node::clone_referring): Clone speculative_id.
(symtab_node::clone_reference): Clone speculative_id.
(symtab_node::clear_stmts_in_references): Clear speculative_id.
* tree-inline.c (copy_bb): Duplicate all the 

[PATCH] xtensa: fix PR target/91880

2019-09-24 Thread Max Filippov
Xtensa hwloop_optimize segfaults when a zero overhead loop is about to be
inserted as the first instruction of the function.
Insert the zero overhead loop instruction into a new basic block before the
loop when the basic block that precedes the loop is empty.

2019-09-24  Max Filippov  
gcc/
* config/xtensa/xtensa.c (hwloop_optimize): Insert zero overhead
loop instruction into new basic block before the loop when basic
block that precedes the loop is empty.

gcc/testsuite/
* gcc.target/xtensa/pr91880.c: New test case.
* gcc.target/xtensa/xtensa.exp: New test suite.
---
 gcc/config/xtensa/xtensa.c |  5 ++--
 gcc/testsuite/gcc.target/xtensa/pr91880.c  | 10 
 gcc/testsuite/gcc.target/xtensa/xtensa.exp | 41 ++
 3 files changed, 54 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/xtensa/pr91880.c
 create mode 100644 gcc/testsuite/gcc.target/xtensa/xtensa.exp

diff --git a/gcc/config/xtensa/xtensa.c b/gcc/config/xtensa/xtensa.c
index ee5612441e25..2527468d57db 100644
--- a/gcc/config/xtensa/xtensa.c
+++ b/gcc/config/xtensa/xtensa.c
@@ -4235,7 +4235,9 @@ hwloop_optimize (hwloop_info loop)
 
   seq = get_insns ();
 
-  if (!single_succ_p (entry_bb) || vec_safe_length (loop->incoming) > 1)
+  entry_after = BB_END (entry_bb);
+  if (!single_succ_p (entry_bb) || vec_safe_length (loop->incoming) > 1
+  || !entry_after)
 {
   basic_block new_bb;
   edge e;
@@ -4256,7 +4258,6 @@ hwloop_optimize (hwloop_info loop)
 }
   else
 {
-  entry_after = BB_END (entry_bb);
   while (DEBUG_INSN_P (entry_after)
  || (NOTE_P (entry_after)
 && NOTE_KIND (entry_after) != NOTE_INSN_BASIC_BLOCK))
diff --git a/gcc/testsuite/gcc.target/xtensa/pr91880.c b/gcc/testsuite/gcc.target/xtensa/pr91880.c
new file mode 100644
index ..f4895a1bb8ec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/xtensa/pr91880.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fomit-frame-pointer -fno-tree-vectorize" } */
+
+void foo (unsigned int n, char *a, char *b)
+{
+  int i;
+
+  for (i = 0; i <= n - 1; ++i)
+a[i] = b[i];
+}
diff --git a/gcc/testsuite/gcc.target/xtensa/xtensa.exp b/gcc/testsuite/gcc.target/xtensa/xtensa.exp
new file mode 100644
index ..8720327f526e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/xtensa/xtensa.exp
@@ -0,0 +1,41 @@
+# Copyright (C) 2019 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an Xtensa target.
+if ![istarget xtensa*-*-*] then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\]]] \
+   "" $DEFAULT_CFLAGS
+
+# All done.
+dg-finish
-- 
2.11.0



Re: [Darwin, PPC, Mode Iterators 1/n, committed] Use mode iterators in picbase patterns.

2019-09-24 Thread Segher Boessenkool
Hi Iain,

On Tue, Sep 24, 2019 at 08:31:16PM +0100, Iain Sandoe wrote:
> This switches the picbase load and reload patterns to use the 'P' mode
> iterator instead of writing an SI and DI pattern for each (and deletes the
> old patterns).  No functional change intended.

>  (define_expand "load_macho_picbase"
> -  [(set (reg:SI LR_REGNO)
> +  [(set (reg LR_REGNO)

This changes it to VOIDmode instead?  It should have been reg:P LR_REGNO?

>  (define_expand "reload_macho_picbase"
> -  [(set (reg:SI LR_REGNO)
> +  [(set (reg LR_REGNO)

Same here.


Segher


[PATCH] PR fortran/91802 -- rank+corank must be less than 16

2019-09-24 Thread Steve Kargl
The attached patch has been tested on x86_64-*-freebsd.  OK to commit?

2019-09-24  Steven G. Kargl  

PR fortran/91802
* decl.c (attr_decl1): Check if rank+corank > 15.

2019-09-24  Steven G. Kargl  

PR fortran/91802
* gfortran.dg/pr91802.f90: New test.
-- 
Steve
Index: gcc/fortran/decl.c
===
--- gcc/fortran/decl.c	(revision 275969)
+++ gcc/fortran/decl.c	(working copy)
@@ -8468,6 +8468,15 @@ attr_decl1 (void)
   goto cleanup;
 }
 
+  /* Check F2018:C822.  */
+  if (sym->attr.dimension && sym->attr.codimension
+  && sym->as && sym->as->rank + sym->as->corank > 15)
+{
+  gfc_error ("rank + corank of %qs exceeds 15 at %C", sym->name);
+  m = MATCH_ERROR;
+  goto cleanup;
+}
+
   if (sym->attr.cray_pointee && sym->as != NULL)
 {
   /* Fix the array spec.  */
Index: gcc/testsuite/gfortran.dg/pr91802.f90
===
--- gcc/testsuite/gfortran.dg/pr91802.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr91802.f90	(working copy)
@@ -0,0 +1,9 @@
+! { dg-do compile }
+! { dg-options "-fcoarray=single" }
+! Code contributed by Gerhard Steinmetz
+! PR fortran/91802
+module m
+   real :: x
+   dimension ::   x(1,2,1,2,1,2,1,2)
+   codimension :: x[1,2,1,2,1,2,1,*] ! { dg-error "exceeds 15" }
+end
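
The check added above boils down to a single comparison. A stand-alone sketch follows; array_spec is a hypothetical stand-in for the two gfc_array_spec fields the patch reads, and violates_c822 is a made-up name.

```cpp
// Hypothetical stand-in for the rank/corank fields of an array spec.
struct array_spec
{
  int rank;
  int corank;
};

// F2018:C822 as enforced by the patch: rank + corank must not exceed 15.
constexpr int max_rank_plus_corank = 15;

constexpr bool violates_c822(array_spec as)
{
  return as.rank + as.corank > max_rank_plus_corank;
}
```

The new test declares eight dimensions and eight codimensions, so the sum is 16 and the error is expected.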


[PATCH] The inline keyword is supported in all new C standards

2019-09-24 Thread Palmer Dabbelt
The documentation used to indicate that the inline keyword was only
supported by c99 and c11, whereas in fact it is supported by c99 and all
newer standards.

gcc/ChangeLog

2019-09-24  Palmer Dabbelt  

* doc/extend.texi (Alternate Keywords): Change "-std=c11" to "a
later standard."
---
 gcc/doc/extend.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 64fccfe9b87..ef2fde3d989 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -10739,7 +10739,7 @@ a general-purpose header file that should be usable by all programs,
 including ISO C programs.  The keywords @code{asm}, @code{typeof} and
 @code{inline} are not available in programs compiled with
 @option{-ansi} or @option{-std} (although @code{inline} can be used in a
-program compiled with @option{-std=c99} or @option{-std=c11}).  The
+program compiled with @option{-std=c99} or a later standard).  The
 ISO C99 keyword
 @code{restrict} is only available when @option{-std=gnu99} (which will
 eventually be the default) or @option{-std=c99} (or the equivalent
-- 
2.21.0



[PATCH] PR fortran/91864 -- inquiry parameter is a constant

2019-09-24 Thread Steve Kargl
The attached patch has been tested on x86_64-*-freebsd.  OK to commit?

2019-09-24  Steven G. Kargl  

PR fortran/91864
* gcc/fortran/io.c (match_io_element): An inquiry parameter cannot be
read into.
* gcc/fortran/match.c (gfc_match_allocate): An inquiry parameter 
can be neither an allocate-object nor stat variable.
(gfc_match_deallocate): An inquiry parameter cannot be deallocated.

2019-09-24  Steven G. Kargl  

PR fortran/91864
* gcc/testsuite/gfortran.dg/pr91864.f90

-- 
Steve
Index: gcc/fortran/io.c
===
--- gcc/fortran/io.c	(revision 276104)
+++ gcc/fortran/io.c	(working copy)
@@ -3657,8 +3657,18 @@ match_io_element (io_kind k, gfc_code **cpp)
 {
   m = gfc_match_variable (&expr, 0);
   if (m == MATCH_NO)
-	gfc_error ("Expected variable in READ statement at %C");
+	{
+	  gfc_error ("Expecting variable in READ statement at %C");
+	  m = MATCH_ERROR;
+	}
 
+  if (m == MATCH_YES && expr->expr_type == EXPR_CONSTANT)
+	{
+	  gfc_error ("Expecting variable or io-implied-do in READ statement "
+		   "at %L", &expr->where);
+	  m = MATCH_ERROR;
+	}
+
   if (m == MATCH_YES
 	  && expr->expr_type == EXPR_VARIABLE
 	  && expr->symtree->n.sym->attr.external)
@@ -3667,7 +3677,6 @@ match_io_element (io_kind k, gfc_code **cpp)
 		 &expr->where);
 	  m = MATCH_ERROR;
 	}
-
 }
   else
 {
Index: gcc/fortran/match.c
===
--- gcc/fortran/match.c	(revision 276104)
+++ gcc/fortran/match.c	(working copy)
@@ -4242,6 +4242,12 @@ gfc_match_allocate (void)
   if (m == MATCH_ERROR)
 	goto cleanup;
 
+  if (tail->expr->expr_type == EXPR_CONSTANT)
+	{
+	  gfc_error ("Unexpected constant at %C");
+	  goto cleanup;
+	}
+
   if (gfc_check_do_variable (tail->expr->symtree))
 	goto cleanup;
 
@@ -4374,6 +4380,12 @@ alloc_opt_list:
 	  tmp = NULL;
 	  saw_stat = true;
 
+	  if (stat->expr_type == EXPR_CONSTANT)
+	{
+	  gfc_error ("STAT tag at %L cannot be a constant", &stat->where);
+	  goto cleanup;
+	}
+
 	  if (gfc_check_do_variable (stat->symtree))
 	goto cleanup;
 
@@ -4649,6 +4661,12 @@ gfc_match_deallocate (void)
 	goto cleanup;
   if (m == MATCH_NO)
 	goto syntax;
+
+  if (tail->expr->expr_type == EXPR_CONSTANT)
+	{
+	  gfc_error ("Unexpected constant at %C");
+	  goto cleanup;
+	}
 
   if (gfc_check_do_variable (tail->expr->symtree))
 	goto cleanup;
Index: gcc/testsuite/gfortran.dg/pr91864.f90
===
--- gcc/testsuite/gfortran.dg/pr91864.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr91864.f90	(working copy)
@@ -0,0 +1,22 @@
+program p
+   integer :: i
+   read (*,*) i%kind   ! { dg-error "Expecting variable or io-implied-do" }
+end
+
+subroutine t
+   integer, allocatable :: x(:)
+   integer :: stat
+   allocate (x(3), stat=stat%kind)   ! { dg-error "cannot be a constant" }
+end
+
+subroutine u
+   integer, allocatable :: x(:)
+   integer :: stat
+   allocate (x(3), stat%kind=stat)   ! { dg-error "Unexpected constant" }
+end
+
+subroutine v
+   integer, allocatable :: x(:)
+   integer :: stat
+   deallocate (x, stat%kind=stat)   ! { dg-error "Unexpected constant" }
+end


Re: [PATCH] Remove vectorizer reduction operand swapping

2019-09-24 Thread Christophe Lyon
On Wed, 18 Sep 2019 at 20:11, Richard Biener  wrote:
>
>
> It shouldn't be necessary.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> (SLP part testing separately)
>
> Richard.
>
> 2019-09-18  Richard Biener  
>
> * tree-vect-loop.c (vect_is_simple_reduction): Remove operand
> swapping.
> (vectorize_fold_left_reduction): Remove assert.
> (vectorizable_reduction): Also expect COND_EXPR non-reduction
> operand in position 2.  Remove assert.
>

Hi,

Since this was committed (r275898), I've noticed a regression on armeb:
FAIL: gcc.dg/vect/vect-cond-4.c execution test

I'm seeing this with qemu, but I do not have the execution traces yet.

Christophe

> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c(revision 275872)
> +++ gcc/tree-vect-loop.c(working copy)
> @@ -3278,56 +3278,8 @@ vect_is_simple_reduction (loop_vec_info
>   || !flow_bb_inside_loop_p (loop, gimple_bb (def2_info->stmt))
>   || vect_valid_reduction_input_p (def2_info)))
>  {
> -  if (! nested_in_vect_loop && orig_code != MINUS_EXPR)
> -   {
> - /* Check if we can swap operands (just for simplicity - so that
> -the rest of the code can assume that the reduction variable
> -is always the last (second) argument).  */
> - if (code == COND_EXPR)
> -   {
> - /* Swap cond_expr by inverting the condition.  */
> - tree cond_expr = gimple_assign_rhs1 (def_stmt);
> - enum tree_code invert_code = ERROR_MARK;
> - enum tree_code cond_code = TREE_CODE (cond_expr);
> -
> - if (TREE_CODE_CLASS (cond_code) == tcc_comparison)
> -   {
> - bool honor_nans = HONOR_NANS (TREE_OPERAND (cond_expr, 0));
> - invert_code = invert_tree_comparison (cond_code, honor_nans);
> -   }
> - if (invert_code != ERROR_MARK)
> -   {
> - TREE_SET_CODE (cond_expr, invert_code);
> - swap_ssa_operands (def_stmt,
> -gimple_assign_rhs2_ptr (def_stmt),
> -gimple_assign_rhs3_ptr (def_stmt));
> -   }
> - else
> -   {
> - if (dump_enabled_p ())
> -   report_vect_op (MSG_NOTE, def_stmt,
> -   "detected reduction: cannot swap operands "
> -   "for cond_expr");
> - return NULL;
> -   }
> -   }
> - else
> -   swap_ssa_operands (def_stmt, gimple_assign_rhs1_ptr (def_stmt),
> -  gimple_assign_rhs2_ptr (def_stmt));
> -
> - if (dump_enabled_p ())
> -   report_vect_op (MSG_NOTE, def_stmt,
> -   "detected reduction: need to swap operands: ");
> -
> - if (CONSTANT_CLASS_P (gimple_assign_rhs1 (def_stmt)))
> -   LOOP_VINFO_OPERANDS_SWAPPED (loop_info) = true;
> -}
> -  else
> -{
> -  if (dump_enabled_p ())
> -report_vect_op (MSG_NOTE, def_stmt, "detected reduction: ");
> -}
> -
> +  if (dump_enabled_p ())
> +   report_vect_op (MSG_NOTE, def_stmt, "detected reduction: ");
>return def_stmt_info;
>  }
>
> @@ -5969,7 +5921,6 @@ vectorize_fold_left_reduction (stmt_vec_
>gcc_assert (!nested_in_vect_loop_p (loop, stmt_info));
>gcc_assert (ncopies == 1);
>gcc_assert (TREE_CODE_LENGTH (code) == binary_op);
> -  gcc_assert (reduc_index == (code == MINUS_EXPR ? 0 : 1));
>gcc_assert (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
>   == FOLD_LEFT_REDUCTION);
>
> @@ -6542,9 +6493,9 @@ vectorizable_reduction (stmt_vec_info st
>   reduc_index = i;
> }
>
> -  if (i == 1 && code == COND_EXPR)
> +  if (code == COND_EXPR)
> {
> - /* Record how value of COND_EXPR is defined.  */
> + /* Record how the non-reduction-def value of COND_EXPR is defined.  */
>   if (dt == vect_constant_def)
> {
>   cond_reduc_dt = dt;
> @@ -6622,10 +6573,6 @@ vectorizable_reduction (stmt_vec_info st
>   return false;
> }
>
> -  /* vect_is_simple_reduction ensured that operand 2 is the
> -loop-carried operand.  */
> -  gcc_assert (reduc_index == 2);
> -
>/* Loop peeling modifies initial value of reduction PHI, which
>  makes the reduction stmt to be transformed different to the
>  original stmt analyzed.  We need to record reduction code for


Re: [libcpp] Issue a pedantic warning for UCNs outside UCS codespace

2019-09-24 Thread Eric Botcazou
> I think this has to depend on the C standards version.  I think each C
> standard needs to be read against the edition of ISO 10646 current at
> the time of standards approval (the references are sadly not
> versioned, so the version is implied).  Early versions of ISO 10646
> definitely do not have the codespace restriction you mention.

Note the already existing hardcoded check in ucn_valid_in_identifier though.

-- 
Eric Botcazou
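
For context, the codespace restriction under discussion is the upper bound 0x10FFFF on code points. A minimal sketch of such a range test follows; the function name is hypothetical and this is not the libcpp code.

```cpp
#include <cstdint>

// Largest code point inside the UCS codespace.
constexpr std::uint32_t ucs_codespace_max = 0x10FFFF;

// Hypothetical helper: true when a UCN value lies outside the codespace.
constexpr bool outside_ucs_codespace(std::uint32_t code_point)
{
  return code_point > ucs_codespace_max;
}
```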


Re: [C++ Patch] Use DECL_SOURCE_LOCATION more in name-lookup.c

2019-09-24 Thread Marek Polacek
On Tue, Sep 24, 2019 at 09:07:03PM +0200, Paolo Carlini wrote:
> Hi,
> 
> Marek's recent fix prompted an audit of name-lookup.c and I found a few
> additional straightforward places where we should use a more accurate
> location. Tested x86_64-linux.
> 
> Thanks, Paolo.
> 
> ///
> 

> /cp
> 2019-09-24  Paolo Carlini  
> 
>   * name-lookup.c (check_extern_c_conflict): Use DECL_SOURCE_LOCATION.
>   (check_local_shadow): Use it in three additional places.
> 
> /testsuite
> 2019-09-24  Paolo Carlini  
> 
>   * g++.dg/diagnostic/redeclaration-1.C: New.
>   * g++.dg/lookup/extern-c-hidden.C: Test location(s) too.
>   * g++.dg/lookup/extern-c-redecl.C: Likewise.
>   * g++.dg/lookup/extern-c-redecl6.C: Likewise.
>   * g++.old-deja/g++.other/using9.C: Likewise.

LGTM.

Marek


[Darwin, PPC, Mode Iterators 1/n, committed] Use mode iterators in picbase patterns.

2019-09-24 Thread Iain Sandoe
This switches the picbase load and reload patterns to use the 'P' mode
iterator instead of writing an SI and DI pattern for each (and deletes the
old patterns).  No functional change intended.

Tested on powerpc-darwin9, powerpc64-linux-gnu,
applied to mainline
thanks
Iain

gcc/ChangeLog:

2019-09-24  Iain Sandoe  

* config/rs6000/darwin.md (load_macho_picbase_<mode>): New, using
the 'P' mode iterator, replacing the (removed) SI and DI variants.
(reload_macho_picbase_<mode>): Likewise.

diff --git a/gcc/config/rs6000/darwin.md b/gcc/config/rs6000/darwin.md
index 471058dd41..4a284211af 100644
--- a/gcc/config/rs6000/darwin.md
+++ b/gcc/config/rs6000/darwin.md
@@ -217,7 +217,7 @@ You should have received a copy of the GNU General Public License
   "")
 
 (define_expand "load_macho_picbase"
-  [(set (reg:SI LR_REGNO)
+  [(set (reg LR_REGNO)
 (unspec [(match_operand 0 "")]
UNSPEC_LD_MPIC))]
   "(DEFAULT_ABI == ABI_DARWIN) && flag_pic"
@@ -230,9 +230,9 @@ You should have received a copy of the GNU General Public License
   DONE;
 })
 
-(define_insn "load_macho_picbase_si"
-  [(set (reg:SI LR_REGNO)
-   (unspec:SI [(match_operand:SI 0 "immediate_operand" "s")
+(define_insn "load_macho_picbase_<mode>"
+  [(set (reg:P LR_REGNO)
+   (unspec:P [(match_operand:P 0 "immediate_operand" "s")
(pc)] UNSPEC_LD_MPIC))]
   "(DEFAULT_ABI == ABI_DARWIN) && flag_pic"
 {
@@ -246,22 +246,6 @@ You should have received a copy of the GNU General Public License
   [(set_attr "type" "branch")
(set_attr "cannot_copy" "yes")])
 
-(define_insn "load_macho_picbase_di"
-  [(set (reg:DI LR_REGNO)
-   (unspec:DI [(match_operand:DI 0 "immediate_operand" "s")
-   (pc)] UNSPEC_LD_MPIC))]
-  "(DEFAULT_ABI == ABI_DARWIN) && flag_pic && TARGET_64BIT"
-{
-#if TARGET_MACHO
-  machopic_should_output_picbase_label (); /* Update for new func.  */
-#else
-  gcc_unreachable ();
-#endif
-  return "bcl 20,31,%0\n%0:";
-}
-  [(set_attr "type" "branch")
-   (set_attr "cannot_copy" "yes")])
-
 (define_expand "macho_correct_pic"
   [(set (match_operand 0 "")
(plus (match_operand 1 "")
@@ -301,7 +285,7 @@ You should have received a copy of the GNU General Public License
   [(set_attr "length" "8")])
 
 (define_expand "reload_macho_picbase"
-  [(set (reg:SI LR_REGNO)
+  [(set (reg LR_REGNO)
 (unspec [(match_operand 0 "")]
UNSPEC_RELD_MPIC))]
   "(DEFAULT_ABI == ABI_DARWIN) && flag_pic"
@@ -314,9 +298,9 @@ You should have received a copy of the GNU General Public License
   DONE;
 })
 
-(define_insn "reload_macho_picbase_si"
-  [(set (reg:SI LR_REGNO)
-(unspec:SI [(match_operand:SI 0 "immediate_operand" "s")
+(define_insn "reload_macho_picbase_<mode>"
+  [(set (reg:P LR_REGNO)
+(unspec:P [(match_operand:P 0 "immediate_operand" "s")
(pc)] UNSPEC_RELD_MPIC))]
   "(DEFAULT_ABI == ABI_DARWIN) && flag_pic"
 {
@@ -337,29 +321,6 @@ You should have received a copy of the GNU General Public License
   [(set_attr "type" "branch")
(set_attr "cannot_copy" "yes")])
 
-(define_insn "reload_macho_picbase_di"
-  [(set (reg:DI LR_REGNO)
-   (unspec:DI [(match_operand:DI 0 "immediate_operand" "s")
-   (pc)] UNSPEC_RELD_MPIC))]
-  "(DEFAULT_ABI == ABI_DARWIN) && flag_pic && TARGET_64BIT"
-{
-#if TARGET_MACHO
-  if (machopic_should_output_picbase_label ())
-{
-  static char tmp[64];
-  const char *cnam = machopic_get_function_picbase ();
-  snprintf (tmp, 64, "bcl 20,31,%s\n%s:\n%%0:", cnam, cnam);
-  return tmp;
-}
-  else
-#else
-  gcc_unreachable ();
-#endif
-return "bcl 20,31,%0\n%0:";
-}
-  [(set_attr "type" "branch")
-   (set_attr "cannot_copy" "yes")])
-
 ;; We need to restore the PIC register, at the site of nonlocal label.
 
 (define_insn_and_split "nonlocal_goto_receiver"



[Darwin, PPC, Mode Iterators 0/n, committed] Make iterators visible to darwin.md.

2019-09-24 Thread Iain Sandoe


As a clean-up, we want to be able to use mode iterators in darwin.md.
This patch moves the include point for the Darwin md file until after
the definition of the mode iterators and attrs.  No functional change
intended.

Discussed with, and approved by, Segher off-line,
Tested on powerpc-darwin9, powerpc64-linux-gnu, applied to mainline,
thanks
Iain

gcc/ChangeLog:

2019-09-24  Iain Sandoe  

* config/rs6000/rs6000.md: Move darwin.md include until
after the definition of the mode iterators.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index f0b0bb4526..4dbf85bbc9 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -361,8 +361,6 @@
 (include "predicates.md")
 (include "constraints.md")
 
-(include "darwin.md")
-
 ^L
 ;; Mode iterators
 
@@ -731,6 +729,7 @@
 (SF "TARGET_P8_VECTOR")
 (DI "TARGET_POWERPC64")])
 
+(include "darwin.md")
 ^L
 ;; Start with fixed-point load and store insns.  Here we put only the more
 ;; complex forms.  Basic data transfer is done later.



[C++ Patch] Use DECL_SOURCE_LOCATION more in name-lookup.c

2019-09-24 Thread Paolo Carlini

Hi,

Marek's recent fix prompted an audit of name-lookup.c and I found a few 
additional straightforward places where we should use a more accurate 
location. Tested x86_64-linux.


Thanks, Paolo.

///

/cp
2019-09-24  Paolo Carlini  

* name-lookup.c (check_extern_c_conflict): Use DECL_SOURCE_LOCATION.
(check_local_shadow): Use it in three additional places.

/testsuite
2019-09-24  Paolo Carlini  

* g++.dg/diagnostic/redeclaration-1.C: New.
* g++.dg/lookup/extern-c-hidden.C: Test location(s) too.
* g++.dg/lookup/extern-c-redecl.C: Likewise.
* g++.dg/lookup/extern-c-redecl6.C: Likewise.
* g++.old-deja/g++.other/using9.C: Likewise.
Index: cp/name-lookup.c
===
--- cp/name-lookup.c	(revision 276104)
+++ cp/name-lookup.c	(working copy)
@@ -2549,12 +2549,12 @@ check_extern_c_conflict (tree decl)
   if (mismatch)
{
  auto_diagnostic_group d;
- pedwarn (input_location, 0,
+ pedwarn (DECL_SOURCE_LOCATION (decl), 0,
   "conflicting C language linkage declaration %q#D", decl);
  inform (DECL_SOURCE_LOCATION (old),
  "previous declaration %q#D", old);
  if (mismatch < 0)
-   inform (input_location,
+   inform (DECL_SOURCE_LOCATION (decl),
"due to different exception specifications");
}
   else
@@ -2674,7 +2674,8 @@ check_local_shadow (tree decl)
  /* ARM $8.3 */
  if (b->kind == sk_function_parms)
{
- error ("declaration of %q#D shadows a parameter", decl);
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "declaration of %q#D shadows a parameter", decl);
  return;
}
}
@@ -2700,7 +2701,8 @@ check_local_shadow (tree decl)
   && (old_scope->kind == sk_cond || old_scope->kind == sk_for))
{
  auto_diagnostic_group d;
- error ("redeclaration of %q#D", decl);
+ error_at (DECL_SOURCE_LOCATION (decl),
+   "redeclaration of %q#D", decl);
  inform (DECL_SOURCE_LOCATION (old),
  "%q#D previously declared here", old);
  return;
@@ -2723,7 +2725,8 @@ check_local_shadow (tree decl)
   && in_function_try_handler))
{
  auto_diagnostic_group d;
- if (permerror (input_location, "redeclaration of %q#D", decl))
+ if (permerror (DECL_SOURCE_LOCATION (decl),
+"redeclaration of %q#D", decl))
inform (DECL_SOURCE_LOCATION (old),
"%q#D previously declared here", old);
  return;
Index: testsuite/g++.dg/diagnostic/redeclaration-1.C
===
--- testsuite/g++.dg/diagnostic/redeclaration-1.C   (nonexistent)
+++ testsuite/g++.dg/diagnostic/redeclaration-1.C   (working copy)
@@ -0,0 +1,20 @@
+void
+foo (int i)
+{
+  int i  // { dg-error "7:declaration of .int i. shadows a parameter" }
+(0);
+  
+  for (int j ;;)
+int j  // { dg-error "9:redeclaration of .int j." }
+  (0);
+}
+
+void
+bar (int i)
+  try
+{ }
+  catch (...)
+{
+  int i  // { dg-error "11:redeclaration of .int i." }
+   (0);
+}
Index: testsuite/g++.dg/lookup/extern-c-hidden.C
===
--- testsuite/g++.dg/lookup/extern-c-hidden.C   (revision 276104)
+++ testsuite/g++.dg/lookup/extern-c-hidden.C   (working copy)
@@ -4,8 +4,8 @@ extern "C" float fabsf (float);  // { dg-message "
 
 namespace Bob 
 {
-  extern "C" float fabsf (float, float); // { dg-error "C language" }
+  extern "C" float fabsf (float, float); // { dg-error "20:conflicting C language" }
   extern "C" double fabs (double, double); // { dg-message "previous declaration" }
 }
 
-extern "C" double fabs (double); // { dg-error "C language" }
+extern "C" double fabs (double); // { dg-error "19:conflicting C language" }
Index: testsuite/g++.dg/lookup/extern-c-redecl.C
===
--- testsuite/g++.dg/lookup/extern-c-redecl.C   (revision 276104)
+++ testsuite/g++.dg/lookup/extern-c-redecl.C   (working copy)
@@ -8,4 +8,4 @@ namespace A {
 // next line should trigger an error because
 // it conflicts with previous declaration of foo_func (), due to
 // different exception specifications.
-extern "C" void foo_func (); // { dg-error "C language linkage|exception specifications" }
+extern "C" void foo_func (); // { dg-error "17:conflicting C language linkage|exception specifications" }
Index: testsuite/g++.dg/lookup/extern-c-redecl6.C
===
--- testsuite/g++.dg/lookup/extern-c-redecl6.C  (revision 276104)
+++ testsuite/g++.dg/lookup/extern-c-redecl6.C  (working copy)
@@ -16,10 +16,10 

[committed] handle null and non-constant values in get_range_strlen_dynamic (PR 91570)

2019-09-24 Thread Martin Sebor

I committed r276105 fixing these two issues plus one more that
Jeff noticed yesterday.

Martin

On 9/21/19 6:03 PM, Martin Sebor wrote:

The new get_range_strlen_dynamic function has a couple of bugs
where it assumes that the length range bounds it gets back from
get_range_strlen are non-null integer constants.  The attached
"quick and dirty" fix removes those assumptions.  Since it's
apparently causing package failures in Jeff's GCC buildbot
I will commit the patch on Monday to get those builds to pass.
But I'm not too happy with how fragile this seems to be so I
will try to do some further cleanup here in the near future
to make it more robust.

Martin




Re: [PATCH] driver: Also prune joined switches with negation

2019-09-24 Thread Matt Turner
On Tue, Sep 24, 2019 at 1:24 AM Kyrill Tkachov
 wrote:
>
> Hi Matt,
>
> On 9/24/19 5:04 AM, Matt Turner wrote:
> > When -march=native is passed to host_detect_local_cpu to the backend,
> > it overrides all command lines after it.  That means
> >
> > $ gcc -march=native -march=armv8-a
> >
> > is treated as
> >
> > $ gcc -march=armv8-a -march=native
> >
> > Prune joined switches with Negative and RejectNegative to allow
> > -march=armv8-a to override previous -march=native on command-line.
> >
> > This is the same fix as was applied for i386 in SVN revision 269164
> > but for
> > aarch64 and arm.
> >
> The fix is ok for arm and LGTM for aarch64 FWIW.

Thanks!

> How has this been tested?

The problem was noticed in this bug report:

   https://bugs.gentoo.org/693522

I remembered seeing the i386 fix and I separately encountered the
problem on ARM when building the pixman library which has iwMMXt code
which requires march=iwmmxt (Could I bribe someone into fixing that by
giving gcc an -miwmmxt flag?)

I verified the fix works by patching gcc and seeing that nss (the
package from the Gentoo bug report) successfully builds with
CFLAGS="-march=native -O2 -pipe"

SVN revision 269164 also added some tests to the gcc test suite, but I
am not sufficiently familiar with building gcc and running the test
suite to verify that any test I speculatively add actually works.

> However...
>
>
> > gcc/
> >
> > PR driver/69471
> > * config/aarch64/aarch64.opt (march=): Add Negative(march=).
> > (mtune=): Add Negative(mtune=).
> > * config/arm/arm.opt: Likewise.
> > ---
> >  gcc/config/aarch64/aarch64.opt | 5 +++--
> >  gcc/config/arm/arm.opt | 4 ++--
> >  2 files changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64.opt
> > b/gcc/config/aarch64/aarch64.opt
> > index 865b6a6d8ca..908dca23b3c 100644
> > --- a/gcc/config/aarch64/aarch64.opt
> > +++ b/gcc/config/aarch64/aarch64.opt
> > @@ -119,7 +119,8 @@ EnumValue
> >  Enum(aarch64_tls_size) String(48) Value(48)
> >
> >  march=
> > -Target RejectNegative ToLower Joined Var(aarch64_arch_string)
> > +Target RejectNegative Negative(march=) ToLower Joined Var(aarch64_arch_string)
> > +
> >  Use features of architecture ARCH.
> >
> >  mcpu=
>
>
> ... Looks like we'll need something similar for -mcpu. On arm and
> aarch64 the -mcpu is the most commonly used option and that can also
> take a "native" value that would suffer from the same issue I presume.

Thank you. I've sent a second version with this addressed in reply to
my initial patch.

If the patch is okay, I think we'd appreciate it if it were backported
to the gcc-8 branch as well.


[PATCH] driver: Also prune joined switches with negation

2019-09-24 Thread Matt Turner
When -march=native is passed via host_detect_local_cpu to the backend,
it overrides all command-line options after it.  That means

$ gcc -march=native -march=armv8-a

is treated as

$ gcc -march=armv8-a -march=native

Prune joined switches with Negative and RejectNegative to allow
-march=armv8-a to override previous -march=native on command-line.

This is the same fix as was applied for i386 in SVN revision 269164 but for
aarch64 and arm.
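
The intended "last switch wins" behaviour can be modelled in a few lines of C.
This is only a sketch of the semantics described above, not the driver's
pruning code; the function name last_joined_switch is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of "last joined switch wins": return the index of the
   option in ARGV[0..ARGC) that survives for PREFIX (for example "-march="),
   or -1 if no option with that prefix is present.  Earlier occurrences are
   treated as pruned.  Illustration only, not the GCC driver's code.  */
static int
last_joined_switch (int argc, const char *const *argv, const char *prefix)
{
  size_t len = strlen (prefix);
  int winner = -1;

  for (int i = 0; i < argc; i++)
    if (strncmp (argv[i], prefix, len) == 0)
      winner = i;	/* A later switch overrides any earlier one.  */

  return winner;
}
```

With the command line "gcc -march=native -march=armv8-a" this model selects
-march=armv8-a, which is the behaviour the patch aims for.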

gcc/

PR driver/69471
* config/aarch64/aarch64.opt (march=): Add Negative(march=).
(mtune=): Add Negative(mtune=). (mcpu=): Add Negative(mcpu=).
* config/arm/arm.opt: Likewise.
---
 gcc/config/aarch64/aarch64.opt | 6 +++---
 gcc/config/arm/arm.opt | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 865b6a6d8ca..fc43428b32a 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -119,15 +119,15 @@ EnumValue
 Enum(aarch64_tls_size) String(48) Value(48)
 
 march=
-Target RejectNegative ToLower Joined Var(aarch64_arch_string)
+Target RejectNegative Negative(march=) ToLower Joined Var(aarch64_arch_string)
 Use features of architecture ARCH.
 
 mcpu=
-Target RejectNegative ToLower Joined Var(aarch64_cpu_string)
+Target RejectNegative Negative(mcpu=) ToLower Joined Var(aarch64_cpu_string)
 Use features of and optimize for CPU.
 
 mtune=
-Target RejectNegative ToLower Joined Var(aarch64_tune_string)
+Target RejectNegative Negative(mtune=) ToLower Joined Var(aarch64_tune_string)
 Optimize for CPU.
 
 mabi=
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 452f0cf6d67..76c10ab62a2 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -82,7 +82,7 @@ mapcs-stack-check
 Target Report Mask(APCS_STACK) Undocumented
 
 march=
-Target RejectNegative ToLower Joined Var(arm_arch_string)
+Target RejectNegative Negative(march=) ToLower Joined Var(arm_arch_string)
 Specify the name of the target architecture.
 
 ; Other arm_arch values are loaded from arm-tables.opt
@@ -107,7 +107,7 @@ Target Report Mask(CALLER_INTERWORKING)
 Thumb: Assume function pointers may go to non-Thumb aware code.
 
 mcpu=
-Target RejectNegative ToLower Joined Var(arm_cpu_string)
+Target RejectNegative Negative(mcpu=) ToLower Joined Var(arm_cpu_string)
 Specify the name of the target CPU.
 
 mfloat-abi=
@@ -232,7 +232,7 @@ Target Report Mask(TPCS_LEAF_FRAME)
 Thumb: Generate (leaf) stack frames even if not needed.
 
 mtune=
-Target RejectNegative ToLower Joined Var(arm_tune_string)
+Target RejectNegative Negative(mtune=) ToLower Joined Var(arm_tune_string)
 Tune code for the given processor.
 
 mprint-tune-info
-- 
2.21.0



Re: [libcpp] Issue a pedantic warning for UCNs outside UCS codespace

2019-09-24 Thread Florian Weimer
* Eric Botcazou:

> the Universal Character Names accepted by the C family of compilers
> are mapped to those of ISO/IEC 10646, which defines the Universal
> Character Set codespace as the range 0-0x10FFFF inclusive.  The
> upper bound is already enforced for identifiers but not for
> literals, so the following code is accepted in C99:
>
> #include 
>
> wchar_t a = L'\U00110000';
>
> whereas it is rejected with an error by other compilers (Clang, MSVC).
>
> I'm not sure whether the compiler is really required to issue a diagnostic in 
> this case.  Moreover a few tests in the testsuite manipulate UCNs outside the 
> UCS codespace.  That's why I suggest issuing a pedantic warning.
>
> Tested on x86_64-suse-linux, OK for the mainline?

Since this is a pedantic warning …

I think this has to depend on the C standards version.  I think each C
standard needs to be read against the edition of ISO 10646 current at
the time of standards approval (the references are sadly not
versioned, so the version is implied).  Early versions of ISO 10646
definitely do not have the codespace restriction you mention.


Re: [PATCH, Fortran] Optionally suppress no-automatic overwrites recursive warning - for review

2019-09-24 Thread Bernhard Reutner-Fischer
On Tue, 24 Sep 2019 15:11:43 +0100
Mark Eggleston  wrote:

> I didn't realise that's how it worked. That's cleaner. Once fixed OK for 
> commit?

> +@item -Wno-overwrite-recursive
> +@opindex @code{Woverwrite-recursive}
> +@cindex  warnings, overwrite recursive
> +Do not warn when @option{-fno-automatic} is used with @option{-frecursive}. 
> Recursion
> +will be broken if the relevant local variables do not have the attribute
> +@code{AUTOMATIC} explicitly declared. This option can be used to suppress 
> the warning
> +when it is known that recursion is not broken. Useful for build environment 
> that use
> +@option{-Werror}.

I'm not a native speaker, but i would expect environment in plural.
With that nit fixed i have no further comments, i.e. looks fine but i
cannot approve it.

thanks,


Re: Question on direction of GCC support for HWASAN.

2019-09-24 Thread Evgenii Stepanov via gcc-patches
On Tue, Sep 24, 2019 at 9:36 AM Szabolcs Nagy  wrote:
>
> On 23/09/2019 08:52, Martin Liška wrote:
> > On 9/20/19 7:11 PM, Matthew Malcomson wrote:
> >> The implementation is unlikely to be production-quality since
> >> development on libhwasan is only on its `platform` ABI.  This libhwasan
> >> ABI requires changes to the system libc so that it calls into libhwasan
> >> on interesting events.
> >> I haven't looked into adding these changes to glibc, but expect that
> >> most people running a Linux distribution would not want to install a
> >> special glibc to use this sanitizer.
> >
> > Can you please provide a link about what special one needs in glibc
> > to support HWASAN?
>
> i don't know if there is such a link other than taking
> a hint from the internal api in the source
> https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/hwasan/hwasan_interface_internal.h
>
> memory has to be (un)tagged on (de)allocation, which
> requires libc help to know the limits and when the
> (de)allocation happens in case of tls/stack memory
> (e.g. dealloced at unwind, longjmp, setcontext, thread
> exit, thread cancel, child exit after vfork) and in
> case of global data in dynamically loaded shared libs.

This is a slightly better link, but it misses
__hwasan_library_(load|unload) hooks:
https://github.com/llvm/llvm-project/blob/master/compiler-rt/include/sanitizer/hwasan_interface.h

You can also search bionic source for __hwasan and __sanitizer:
https://android.googlesource.com/platform/bionic/+/refs/heads/master


Re: [PATCH 3/4] New IPA-SRA implementation

2019-09-24 Thread Martin Jambor
Hi,

sorry for replying so late, I still haven't recovered from two weeks of
traveling and conferences.

On Sat, Sep 21 2019, Richard Sandiford wrote:
> Hi,
>
> Thanks for doing this.
>
> Martin Jambor  writes:
>> +/* Analyze function body scan results stored in param_accesses and
>> +   param_accesses, detect possible transformations and store information of
>> +   those in function summary.  NODE, FUN and IFS are all various structures
>> +   describing the currently analyzed function.  */
>> +
>> +static void
>> +process_scan_results (cgraph_node *node, struct function *fun,
>> +  isra_func_summary *ifs,
>> +  vec<gensum_param_desc> *param_descriptions)
>> +{
>> +  bool check_pass_throughs = false;
>> +  bool dereferences_propagated = false;
>> +  tree parm = DECL_ARGUMENTS (node->decl);
>> +  unsigned param_count = param_descriptions->length();
>> +
>> +  for (unsigned desc_index = 0;
>> +   desc_index < param_count;
>> +   desc_index++, parm = DECL_CHAIN (parm))
>> +{
>> +  gensum_param_desc *desc = &(*param_descriptions)[desc_index];
>> +  if (!desc->locally_unused && !desc->split_candidate)
>> +continue;
>
> I'm jumping in the middle without working through the whole pass,
> so this is probably a daft question sorry, but: what is this loop
> required to do when:
>
>   !desc->split_candidate && desc->locally_unused

You have figured out correctly that this is a thinko.  I meant not to
continue for non-register-types which might not be used locally but
their locally_unused flag is only set a few lines below...

>
> ?  AFAICT...
>
>> +
>> +  if (flag_checking)
>> +isra_verify_access_tree (desc->accesses);
>> +
>> +  if (!dereferences_propagated
>> +  && desc->by_ref
>> +  && desc->accesses)
>> +{
>> +  propagate_dereference_distances (fun);
>> +  dereferences_propagated = true;
>> +}
>> +
>> +  HOST_WIDE_INT nonarg_acc_size = 0;
>> +  bool only_calls = true;
>> +  bool check_failed = false;
>> +
>> +  int entry_bb_index = ENTRY_BLOCK_PTR_FOR_FN (fun)->index;
>> +  for (gensum_param_access *acc = desc->accesses;
>> +   acc;
>> +   acc = acc->next_sibling)
>> +if (check_gensum_access (parm, desc, acc, &nonarg_acc_size, &only_calls,
>> + entry_bb_index))
>> +  {
>> +check_failed = true;
>> +break;
>> +  }
>> +  if (check_failed)
>> +continue;
>> +
>> +  if (only_calls)
>> +desc->locally_unused = true;

...specifically here.

>> +
>> +  HOST_WIDE_INT cur_param_size
>> += tree_to_uhwi (TYPE_SIZE (TREE_TYPE (parm)));
>> +  HOST_WIDE_INT param_size_limit;
>> +  if (!desc->by_ref || optimize_function_for_size_p (fun))
>> +param_size_limit = cur_param_size;
>> +  else
>> +param_size_limit = (PARAM_VALUE (PARAM_IPA_SRA_PTR_GROWTH_FACTOR)
>> +   * cur_param_size);
>> +  if (nonarg_acc_size > param_size_limit
>> +  || (!desc->by_ref && nonarg_acc_size == param_size_limit))
>> +{
>> +  disqualify_split_candidate (desc, "Would result into a too big set of"
>> +  "replacements.");
>> +}
>> +  else
>> +{
>> +  /* create_parameter_descriptors makes sure unit sizes of all
>> + candidate parameters fit unsigned integers restricted to
>> + ISRA_ARG_SIZE_LIMIT.  */
>> +  desc->param_size_limit = param_size_limit / BITS_PER_UNIT;
>> +  desc->nonarg_acc_size = nonarg_acc_size / BITS_PER_UNIT;
>> +  if (desc->split_candidate && desc->ptr_pt_count)
>> +{
>> +  gcc_assert (desc->by_ref);
>> +  check_pass_throughs = true;
>> +}
>> +}
>> +}
>
> ...disqualify_split_candidate should be a no-op in that case,
> because we've already disqualified the parameter for a different reason.
> So it looks like the main effect is instead to set up param_size_limit
> and nonarg_acc_size, the latter of which I assume is 0 when
> desc->locally_unused.

This is the only bit where you are wrong: param_size_limit is the type
size for aggregates and twice the pointer size for pointers (well, actually
PARAM_IPA_SRA_PTR_GROWTH_FACTOR times the size of a pointer).  That holds
even for locally unused parameters, because we might "pull" some of them
from callees, when there are some.  But it is not really relevant for the
problem you are facing.
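
To make the rule concrete, here is a minimal C sketch of how the limit is
chosen.  It mirrors the quoted hunk only loosely: the function name and the
growth_factor argument (standing in for PARAM_IPA_SRA_PTR_GROWTH_FACTOR) are
hypothetical, and sizes are plain integers rather than HOST_WIDE_INT.

```c
#include <stdbool.h>

/* Sketch of the size limit described above.  PARAM_SIZE is the size of the
   parameter's type; GROWTH_FACTOR is a stand-in for
   PARAM_IPA_SRA_PTR_GROWTH_FACTOR.  All names are hypothetical.  */
static unsigned long
param_size_limit_sketch (unsigned long param_size, bool by_ref,
			 bool optimize_for_size, unsigned growth_factor)
{
  /* By-value parameters, and anything in size-optimized functions, may not
     grow beyond the original parameter size.  */
  if (!by_ref || optimize_for_size)
    return param_size;

  /* Pointers may be replaced by up to GROWTH_FACTOR times their size.  */
  return (unsigned long) growth_factor * param_size;
}
```

For a 64-bit pointer and a growth factor of 2 this gives a limit of 128 bits;
for an aggregate passed by value it is simply the aggregate's size.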

>
> The reason for asking is that the final "else" says that we've already
> checked that param_size_limit is in range, but that's only true if
> desc->split_candidate.  In particular:
>
>   if (is_gimple_reg (parm)
> && !isra_track_scalar_param_local_uses (fun, node, parm, num,
> &scalar_call_uses))
>   {
> if (dump_file && (dump_flags & TDF_DETAILS))
>   fprintf (dump_file, " is a scalar with only %i call uses\n",
>scalar_call_uses);
>
> desc->locally_unused = true;
> desc->call_uses 

[libcpp] Issue a pedantic warning for UCNs outside UCS codespace

2019-09-24 Thread Eric Botcazou
Hi,

the Universal Character Names accepted by the C family of compilers are mapped 
to those of ISO/IEC 10646, which defines the Universal Character Set codespace 
as the range 0-0x10FFFF inclusive.  The upper bound is already enforced for 
identifiers but not for literals, so the following code is accepted in C99:

#include 

wchar_t a = L'\U00110000';

whereas it is rejected with an error by other compilers (Clang, MSVC).

I'm not sure whether the compiler is really required to issue a diagnostic in 
this case.  Moreover a few tests in the testsuite manipulate UCNs outside the 
UCS codespace.  That's why I suggest issuing a pedantic warning.

Tested on x86_64-suse-linux, OK for the mainline?
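
For illustration, the range check itself is tiny.  The sketch below is not the
libcpp code; the macro and function names are made up, and it only assumes the
ISO/IEC 10646 codespace upper bound of 0x10FFFF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bound of the UCS codespace per ISO/IEC 10646.  The macro name is
   hypothetical and chosen so as not to clash with the patch's UCS_LIMIT.  */
#define UCS_LIMIT_SKETCH 0x10FFFFu

/* Return true if code point C lies inside the UCS codespace.  */
static bool
ucs_in_codespace (uint32_t c)
{
  return c <= UCS_LIMIT_SKETCH;
}
```

A value such as 0x110000 falls just outside the codespace and is the kind of
UCN the proposed pedantic warning is aimed at.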


2019-09-24  Eric Botcazou  

libcpp/
* charset.c (UCS_LIMIT): New macro.
(ucn_valid_in_identifier): Use it instead of a hardcoded constant.
(_cpp_valid_ucn): Issue a pedantic warning for UCNs larger than
UCS_LIMIT outside of identifiers.


2019-09-24  Eric Botcazou  

gcc/testsuite/
* gcc.dg/cpp/ucs.c: Add test for new warning and adjust.
* gcc.dg/cpp/utf8-5byte-1.c: Add -w to the options.
* gcc.dg/attr-alias-5.c: Likewise.

-- 
Eric Botcazou
Index: libcpp/charset.c
===
--- libcpp/charset.c	(revision 275988)
+++ libcpp/charset.c	(working copy)
@@ -901,6 +901,9 @@ struct ucnrange {
 };
 #include "ucnid.h"
 
+/* ISO 10646 defines the UCS codespace as the range 0-0x10FFFF inclusive.  */
+#define UCS_LIMIT 0x10FFFF
+
 /* Returns 1 if C is valid in an identifier, 2 if C is valid except at
the start of an identifier, and 0 if C is not valid in an
identifier.  We assume C has already gone through the checks of
@@ -915,7 +918,7 @@ ucn_valid_in_identifier (cpp_reader *pfi
   int mn, mx, md;
   unsigned short valid_flags, invalid_start_flags;
 
-  if (c > 0x10FFFF)
+  if (c > UCS_LIMIT)
 return 0;
 
   mn = 0;
@@ -1016,6 +1019,9 @@ ucn_valid_in_identifier (cpp_reader *pfi
whose short identifier is less than 00A0 other than 0024 ($), 0040 (@),
or 0060 (`), nor one in the range D800 through DFFF inclusive.
 
+   If the hexadecimal value is larger than the upper bound of the UCS
+   codespace specified in ISO/IEC 10646, a pedantic warning is issued.
+
*PSTR must be preceded by "\u" or "\U"; it is assumed that the
buffer end is delimited by a non-hex digit.  Returns false if the
UCN has not been consumed, true otherwise.
@@ -1135,6 +1141,10 @@ _cpp_valid_ucn (cpp_reader *pfile, const
"universal character %.*s is not valid at the start of an identifier",
 		   (int) (str - base), base);
 }
+  else if (result > UCS_LIMIT)
+cpp_error (pfile, CPP_DL_PEDWARN,
+	   "%.*s is outside the UCS codespace",
+	   (int) (str - base), base);
 
   *cp = result;
   return true;
Index: gcc/testsuite/gcc.dg/attr-alias-5.c
===
--- gcc/testsuite/gcc.dg/attr-alias-5.c	(revision 275988)
+++ gcc/testsuite/gcc.dg/attr-alias-5.c	(working copy)
@@ -1,7 +1,7 @@
 /* Verify diagnostics for aliases to strings containing extended
identifiers or bad characters.  */
 /* { dg-do compile } */
-/* { dg-options "-std=gnu99" } */
+/* { dg-options "-std=gnu99 -w" } */
 /* { dg-require-alias "" } */
 /* { dg-require-ascii-locale "" } */
 /* { dg-skip-if "" { powerpc*-*-aix* } } */
Index: gcc/testsuite/gcc.dg/cpp/ucs.c
===
--- gcc/testsuite/gcc.dg/cpp/ucs.c	(revision 275988)
+++ gcc/testsuite/gcc.dg/cpp/ucs.c	(working copy)
@@ -39,7 +39,7 @@
 #endif
 
 #if WCHAR_MAX >= 0x7fffffff
-# if L'\U1234abcd' != 0x1234abcd
+# if L'\U1234abcd' != 0x1234abcd /* { dg-warning "outside" "" } */
 #  error bad long ucs	/* { dg-bogus "bad" "bad U1234abcd evaluation" } */
 # endif
 #endif
@@ -49,7 +49,7 @@ void foo ()
   int c;
 
   c = L'\ubad';		/* { dg-error "incomplete" "incomplete UCN 1" } */
-  c = L"\U1234"[0];	/* { dg-error "incomplete" "incompete UCN 2" } */
+  c = L"\U1234"[0];	/* { dg-error "incomplete" "incomplete UCN 2" } */
 
   c = L'\u000x';	/* { dg-error "incomplete" "non-hex digit in UCN" } */
   /* If sizeof(HOST_WIDE_INT) > sizeof(wchar_t), we can get a multi-character
@@ -64,4 +64,6 @@ void foo ()
   c = '\u0025';		/* { dg-error "not a valid" "0025 invalid UCN" } */
   c = L"\uD800"[0];	/* { dg-error "not a valid" "D800 invalid UCN" } */
   c = L'\U0000DFFF';	/* { dg-error "not a valid" "DFFF invalid UCN" } */
+
+  c = L'\U00110000';	/* { dg-warning "outside" "11 outside UCS" } */
 }
Index: gcc/testsuite/gcc.dg/cpp/utf8-5byte-1.c
===
--- gcc/testsuite/gcc.dg/cpp/utf8-5byte-1.c	(revision 275988)
+++ gcc/testsuite/gcc.dg/cpp/utf8-5byte-1.c	(working copy)
@@ -1,7 +1,7 @@
 /* Test for bug in conversions from 5-byte UTF-8 sequences in
cpplib.  */
 /* { dg-do run { target { 4byte_wchar_t } } } */
-/* { 

Re: [PATCH] Remove unused #include "vec.h" from hash-table.h

2019-09-24 Thread Segher Boessenkool
On Tue, Sep 24, 2019 at 07:44:10AM +0200, Bernhard Reutner-Fischer wrote:
> On Mon, 23 Sep 2019 14:52:19 -0500
> "Christian Biesinger via gcc-patches"  wrote:
> > From: Christian Biesinger 
> > Removes an unused include as a cleanup. Requires updating
> > lots of files that previously relied on this transitive include.
> 
> Note that we have a tool to help prune unused includes, somewhere.

contrib/header-tools/reduce-headers?  And see gcc-order-headers.


Segher


Re: Question on direction of GCC support for HWASAN.

2019-09-24 Thread Szabolcs Nagy
On 23/09/2019 08:52, Martin Liška wrote:
> On 9/20/19 7:11 PM, Matthew Malcomson wrote:
>> The implementation is unlikely to be production-quality since
>> development on libhwasan is only on its `platform` ABI.  This libhwasan
>> ABI requires changes to the system libc so that it calls into libhwasan
>> on interesting events.
>> I haven't looked into adding these changes to glibc, but expect that
>> most people running a Linux distribution would not want to install a
>> special glibc to use this sanitizer.
> 
> Can you please provide a link about what special one needs in glibc
> to support HWASAN?

i don't know if there is such a link other than taking
a hint from the internal api in the source
https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/hwasan/hwasan_interface_internal.h

memory has to be (un)tagged on (de)allocation, which
requires libc help to know the limits and when the
(de)allocation happens in case of tls/stack memory
(e.g. dealloced at unwind, longjmp, setcontext, thread
exit, thread cancel, child exit after vfork) and in
case of global data in dynamically loaded shared libs.


Re: [AArch64][SVE] Utilize ASRD instruction for division and remainder

2019-09-24 Thread Richard Sandiford
Yuliang Wang  writes:
> Hi,
>
> The C snippets below (signed division/modulo by a power-of-2 immediate 
> value):
>
> #define P ...
>
> void foo_div (int *a, int *b, int N)
> {
>     for (int i = 0; i < N; i++)
>         a[i] = b[i] / (1 << P);
> }
> void foo_mod (int *a, int *b, int N)
> {
>     for (int i = 0; i < N; i++)
>         a[i] = b[i] % (1 << P);
> }
>
> Vectorize to the following on AArch64 + SVE:
>
> foo_div:
>         mov     x0, 0
>         mov     w2, N
>         ptrue   p1.b, all
>         whilelo p0.s, wzr, w2
>         .p2align 3,,7
> .L2:
>         ld1w    z1.s, p0/z, [x3, x0, lsl 2]
>         cmplt   p2.s, p1/z, z1.s, #0    //
>         mov     z0.s, p2/z, #7          //
>         add     z0.s, z0.s, z1.s        //
>         asr     z0.s, z0.s, #3          //
>         st1w    z0.s, p0, [x1, x0, lsl 2]
>         incw    x0
>         whilelo p0.s, w0, w2
>         b.any   .L2
>         ret
>
> foo_mod:
>         ...
> .L2:
>         ld1w    z0.s, p0/z, [x3, x0, lsl 2]
>         cmplt   p2.s, p1/z, z0.s, #0    //
>         mov     z1.s, p2/z, #-1         //
>         lsr     z1.s, z1.s, #29         //
>         add     z0.s, z0.s, z1.s        //
>         and     z0.s, z0.s, #{2^P-1}    //
>         sub     z0.s, z0.s, z1.s        //
>         st1w    z0.s, p0, [x1, x0, lsl 2]
>         incw    x0
>         whilelo p0.s, w0, w2
>         b.any   .L2
>         ret
>
> This patch utilizes the special-purpose ASRD (arithmetic shift-right for 
> divide by immediate) instruction:
>
> foo_div:
>         ...
> .L2:
>         ld1w    z0.s, p0/z, [x3, x0, lsl 2]
>         asrd    z0.s, p1/m, z0.s, #{P}  //
>         st1w    z0.s, p0, [x1, x0, lsl 2]
>         incw    x0
>         whilelo p0.s, w0, w2
>         b.any   .L2
>         ret
>
> foo_mod:
>         ...
> .L2:
>         ld1w    z0.s, p0/z, [x3, x0, lsl 2]
>         movprfx z1, z0                  //
>         asrd    z1.s, p1/m, z1.s, #{P}  //
>         lsl     z1.s, z1.s, #{P}        //
>         sub     z0.s, z0.s, z1.s        //
>         st1w    z0.s, p0, [x1, x0, lsl 2]
>         incw    x0
>         whilelo p0.s, w0, w2
>         b.any   .L2
>         ret
>
> Added new tests. Built and regression tested on aarch64-none-elf.
>
> Best Regards,
> Yuliang Wang
>
>
> gcc/ChangeLog:
>
> 2019-09-23  Yuliang Wang  
>
> * config/aarch64/aarch64-sve.md (asrd3): New pattern for ASRD.
> * config/aarch64/iterators.md (UNSPEC_ASRD): New unspec.
> (ASRDIV): New int iterator.
> * internal-fn.def (IFN_ASHR_DIV): New internal function.
> * optabs.def (ashr_div_optab): New optab.
> * tree-vect-patterns.c (vect_recog_divmod_pattern):
> Modify pattern to support new operation.
> * doc/md.texi (asrd$var{m3}): Documentation for the above.
> * doc/sourcebuild.texi (vect_asrdiv_si): Document new target selector.

This looks good to me.  My only real question is about naming:
maybe IFN_DIV_POW2 would be a better name for the internal function
and sdiv_pow2_optab/"div_pow2$a3" for the optab?  But I'm useless at
naming things, so maybe others would prefer your names.
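
For readers following along, the rounding-toward-zero semantics that both the
old and the new sequences implement can be modelled in scalar C.  This is only
a sketch: the function names are hypothetical, and it assumes that >> on a
negative int32_t is an arithmetic shift (implementation-defined in C, but true
for GCC).

```c
#include <assert.h>
#include <stdint.h>

/* Divide X by 2**P, rounding toward zero: add a bias of 2**P - 1 to
   negative inputs, then shift right arithmetically.  */
static int32_t
div_pow2_sketch (int32_t x, unsigned p)
{
  assert (p >= 1 && p <= 30);
  int32_t bias = x < 0 ? (int32_t) ((1u << p) - 1) : 0;
  return (x + bias) >> p;
}

/* Remainder of X divided by 2**P, computed as X minus the truncated
   quotient multiplied back by 2**P.  */
static int32_t
mod_pow2_sketch (int32_t x, unsigned p)
{
  int32_t q = div_pow2_sketch (x, p);
  return x - q * ((int32_t) 1 << p);
}
```

The bias-then-shift step corresponds to the cmplt/mov/add/asr group in the
quoted foo_div loop, and the quotient-multiply-subtract step corresponds to
the asrd/lsl/sub group in the new foo_mod loop.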

Thanks,
Richard

>
> gcc/testsuite/ChangeLog:
>
> 2019-09-23  Yuliang Wang  
>
> * gcc.dg/vect/vect-asrdiv-1.c: New test.
> * gcc.target/aarch64/sve/asrdiv_1.c: As above.
> * lib/target-support.exp (check_effective_target_vect_asrdiv_si):
> Return true for AArch64 with SVE.
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index 
> f58353e9c6dc0df97ce4074db6bb22181f426e5b..607440b7ba16d5616695f29a9cf7c4c277a4a502
>  100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -71,6 +71,7 @@
>  ;;  [INT] Binary logical operations
>  ;;  [INT] Binary logical operations (inverted second input)
>  ;;  [INT] Shifts
> +;;  [INT] Shifts (rounding towards 0)
>  ;;  [FP] General binary arithmetic corresponding to rtx codes
>  ;;  [FP] General binary arithmetic corresponding to unspecs
>  ;;  [FP] Addition
> @@ -2563,6 +2564,46 @@
>[(set_attr "movprfx" "yes")]
>  )
>  
> +;; -
> +;;  [INT] Shifts (rounding towards 0)
> +;; -
> +;; Includes:
> +;; - ASRD
> +;; -
> +
> +;; Unpredicated arithmetic right shift for division by power-of-2.
> +(define_expand "asrd3"
> +  [(set (match_operand:SVE_I 0 "register_operand" "")
> + (unspec:SVE_I
> +   [(match_dup 3)
> +(unspec:SVE_I
> +  [(match_operand:SVE_I 1 "register_operand" "")
> +   (match_operand 2 "aarch64_simd_rshift_imm")]
> + UNSPEC_ASRD)]
> +  UNSPEC_PRED_X))]
> +  "TARGET_SVE"
> +  {
> +operands[3] = aarch64_ptrue_reg (mode);
> +  }
> +)
> +
> +;; Predicated ASRD with PTRUE.
> +(define_insn "*asrd3"
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
> + (unspec:SVE_I
> +   [(match_operand: 1 "register_operand" "Upl, Upl")
> +(unspec:SVE_I
> +  [(match_operand:SVE_I 2 "register_operand" "0, w")
> +   (match_operand 3 "aarch64_simd_rshift_imm")]
> + UNSPEC_ASRD)]
> +  UNSPEC_PRED_X))]
> +  "TARGET_SVE"
> +  "@
> +  asrd\t%0., %1/m, %0., #%3
> +  movprfx\t%0, %2\;asrd\t%0., %1/m, %0., #%3"
> +  [(set_attr "movprfx" "*,yes")]
> +)
> +
>  ;; 

Re: [PATCH] Fix ICE when __builtin_calloc has no LHS (PR tree-optimization/91014).

2019-09-24 Thread Jeff Law
On 9/24/19 4:34 AM, Martin Liška wrote:
> On 9/24/19 11:14 AM, Thomas Schwinge wrote:
>> Hi!
>>
>> Curious: even if you found the issue on a s390x target, shouldn't this
>> (presumably generic?) test case live in a generic place instead of
>> 'gcc.target/s390/'?
> 
> Sure, that's logical and I've just tested that locally on x86_64-linux-gnu.
> 
> Ready to be installed?
Sure, and IMHO moving tests like this should be something that can be
done without explicit ACKs.

jeff


Re: [PATCH] PR tree-optimization/90836 Missing popcount pattern matching

2019-09-24 Thread Dmitrij Pochepko
Hi,

can anybody take a look at v2?

Thanks,
Dmitrij
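
For anyone reviewing, the bit trick being matched can be checked against a
naive loop.  The following is a small standalone sketch, not part of the
patch; both function names are hypothetical, and the masks are the usual
0x55.., 0x33.., 0x0F.. and 0x01.. constants of this well-known idiom.

```c
#include <stdint.h>

/* Branchless 64-bit population count (the idiom the pattern recognizes).  */
static int
popcount64_bittrick (uint64_t x)
{
  x -= (x >> 1) & 0x5555555555555555ULL;	/* 2-bit partial sums.  */
  x = (x & 0x3333333333333333ULL)
      + ((x >> 2) & 0x3333333333333333ULL);	/* 4-bit partial sums.  */
  x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;	/* 8-bit partial sums.  */
  return (int) ((x * 0x0101010101010101ULL) >> 56);	/* Sum of all bytes.  */
}

/* Reference implementation: count the set bits one at a time.  */
static int
popcount64_naive (uint64_t x)
{
  int n = 0;
  for (; x != 0; x >>= 1)
    n += (int) (x & 1);
  return n;
}
```

Both functions should agree for every input, which is the equivalence the
new match.pd pattern relies on.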

On Mon, Sep 09, 2019 at 10:03:40PM +0300, Dmitrij Pochepko wrote:
> Hi all.
> 
> Please take a look at v2 (attached).
> I changed patch according to review comments. The same testing was performed 
> again.
> 
> Thanks,
> Dmitrij
> 
> On Thu, Sep 05, 2019 at 06:34:49PM +0300, Dmitrij Pochepko wrote:
> > This patch adds matching for Hamming weight (popcount) implementation. The 
> > following sources:
> > 
> > int
> > foo64 (unsigned long long a)
> > {
> > unsigned long long b = a;
> > b -= ((b>>1) & 0x5555555555555555ULL);
> > b = ((b>>2) & 0x3333333333333333ULL) + (b & 0x3333333333333333ULL);
> > b = ((b>>4) + b) & 0x0F0F0F0F0F0F0F0FULL;
> > b *= 0x0101010101010101ULL;
> > return (int)(b >> 56);
> > }
> > 
> > and
> > 
> > int
> > foo32 (unsigned int a)
> > {
> > unsigned long b = a;
> > b -= ((b>>1) & 0x55555555UL);
> > b = ((b>>2) & 0x33333333UL) + (b & 0x33333333UL);
> > b = ((b>>4) + b) & 0x0F0F0F0FUL;
> > b *= 0x01010101UL;
> > return (int)(b >> 24);
> > }
> > 
> > and equivalents are now recognized as popcount for platforms with hw 
> > popcount support. Bootstrapped and tested on x86_64-pc-linux-gnu and 
> > aarch64-linux-gnu systems with no regressions. 
> > 
> > (I have no write access to repo)
> > 
> > Thanks,
> > Dmitrij
> > 
> > 
> > gcc/ChangeLog:
> > 
> > PR tree-optimization/90836
> > 
> > * gcc/match.pd (popcount): New pattern.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR tree-optimization/90836
> > 
> > * lib/target-supports.exp (check_effective_target_popcount)
> > (check_effective_target_popcountll): New effective targets.
> > * gcc.dg/tree-ssa/popcount4.c: New test.
> > * gcc.dg/tree-ssa/popcount4l.c: New test.
> > * gcc.dg/tree-ssa/popcount4ll.c: New test.
> 
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 0317bc7..b1867bf 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -5358,6 +5358,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >(cmp (popcount @0) integer_zerop)
> >(rep @0 { build_zero_cst (TREE_TYPE (@0)); }
> >  
> > +/* 64- and 32-bits branchless implementations of popcount are detected:
> > +
> > +   int popcount64c (uint64_t x)
> > +   {
> > > + x -= (x >> 1) & 0x5555555555555555ULL;
> > > + x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
> > + x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
> > + return (x * 0x0101010101010101ULL) >> 56;
> > +   }
> > +
> > +   int popcount32c (uint32_t x)
> > +   {
> > > + x -= (x >> 1) & 0x55555555;
> > > + x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
> > + x = (x + (x >> 4)) & 0x0f0f0f0f;
> > + return (x * 0x01010101) >> 24;
> > +   }  */
> > +(simplify
> > +  (convert
> > +(rshift
> > +  (mult
> > +   (bit_and:c
> > + (plus:c
> > +   (rshift @8 INTEGER_CST@5)
> > +   (plus:c@8
> > + (bit_and @6 INTEGER_CST@7)
> > + (bit_and
> > +   (rshift
> > + (minus@6
> > +   @0
> > +   (bit_and
> > + (rshift @0 INTEGER_CST@4)
> > + INTEGER_CST@11))
> > + INTEGER_CST@10)
> > +   INTEGER_CST@9)))
> > + INTEGER_CST@3)
> > +   INTEGER_CST@2)
> > +  INTEGER_CST@1))
> > +  /* Check constants and optab.  */
> > +  (with
> > + {
> > +   tree argtype = TREE_TYPE (@0);
> > +   unsigned prec = TYPE_PRECISION (argtype);
> > +   int shift = TYPE_PRECISION (long_long_unsigned_type_node) - prec;
> > +   const unsigned long long c1 = 0x0101010101010101ULL >> shift,
> > +   c2 = 0x0F0F0F0F0F0F0F0FULL >> shift,
> > > +   c3 = 0x3333333333333333ULL >> shift,
> > > +   c4 = 0x5555555555555555ULL >> shift;
> > + }
> > +(if (types_match (type, integer_type_node) && tree_to_uhwi (@4) == 1
> > + && tree_to_uhwi (@10) == 2 && tree_to_uhwi (@5) == 4
> > + && tree_to_uhwi (@1) == prec - 8 && tree_to_uhwi (@2) == c1
> > + && tree_to_uhwi (@3) == c2 && tree_to_uhwi (@9) == c3
> > + && tree_to_uhwi (@7) == c3 && tree_to_uhwi (@11) == c4
> > + && optab_handler (popcount_optab, TYPE_MODE (argtype))
> > +   != CODE_FOR_nothing)
> > +   (switch
> > +   (if (types_match (argtype, long_long_unsigned_type_node))
> > + (BUILT_IN_POPCOUNTLL @0))
> > +   (if (types_match (argtype, long_unsigned_type_node))
> > + (BUILT_IN_POPCOUNTL @0))
> > +   (if (types_match (argtype, unsigned_type_node))
> > + (BUILT_IN_POPCOUNT @0))
> > +
> >  /* Simplify:
> >  
> >   a = a1 op a2
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount4.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/popcount4.c
> > new file mode 100644
> > index 000..9f759f8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount4.c
> > @@ -0,0 +1,22 @@
> > +/* { dg-do compile } */
> > +/* { 

C++ PATCH for c++/91877 - ICE with converting member of packed struct.

2019-09-24 Thread Marek Polacek
This started to ICE with my CWG 2352 fix.  In reference_binding, we now bind
directly when the types are similar, not just when they are the same.  But even
direct binding can involve a temporary, e.g. for a bit-field, or, as in this
test, for a packed field.

convert_like will actually create the temporary, but we were triggering the
assert checking that the types are the same.  Now they don't have to be, so
adjust the assert accordingly.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-09-24  Marek Polacek  

PR c++/91877 - ICE with converting member of packed struct.
* call.c (convert_like_real): Use similar_type_p in an assert.

* g++.dg/conversion/packed1.C: New test.

diff --git gcc/cp/call.c gcc/cp/call.c
index 77f10a9f5f1..45b984ecb11 100644
--- gcc/cp/call.c
+++ gcc/cp/call.c
@@ -7382,8 +7382,7 @@ convert_like_real (conversion *convs, tree expr, tree fn, 
int argnum,
tree type = TREE_TYPE (ref_type);
cp_lvalue_kind lvalue = lvalue_kind (expr);
 
-   gcc_assert (same_type_ignoring_top_level_qualifiers_p
-   (type, next_conversion (convs)->type));
+   gcc_assert (similar_type_p (type, next_conversion (convs)->type));
if (!CP_TYPE_CONST_NON_VOLATILE_P (type)
&& !TYPE_REF_IS_RVALUE (ref_type))
  {
diff --git gcc/testsuite/g++.dg/conversion/packed1.C 
gcc/testsuite/g++.dg/conversion/packed1.C
new file mode 100644
index 000..c4be930bc19
--- /dev/null
+++ gcc/testsuite/g++.dg/conversion/packed1.C
@@ -0,0 +1,12 @@
+// PR c++/91877 - ICE with converting member of packed struct.
+// { dg-do compile { target c++11 } }
+// { dg-options "-fpack-struct" }
+
+template <typename a> class b {
+public:
+  b(const a &);
+};
+struct {
+  int *c;
+} d;
+void e() { b(d.c); }


Re: [PATCH][RFC] Come up with VEC_COND_OP_EXPRs.

2019-09-24 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Sep 24, 2019 at 1:57 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Tue, Sep 24, 2019 at 1:11 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> Martin Liška  writes:
>> >> > Hi.
>> >> >
>> >> > The patch introduces couple of new TREE_CODEs that will help us to have
>> >> > a proper GIMPLE representation of current VECT_COND_EXPR. Right now,
>> >> > the first argument is typically a GENERIC tcc_expression tree with 2 
>> >> > operands
>> >> > that are visited at various places in GIMPLE code. That said, based on 
>> >> > the discussion
>> >> > with Richi, I'm suggesting to come up with e.g.
>> >> > VECT_COND_LT_EXPR. Such a 
>> >> > change logically
>> >> > introduces new GIMPLE_QUATERNARY_RHS gassignments. For now, the 
>> >> > VEC_COND_EXPR remains
>> >> > and is only valid in GENERIC and gimplifier will take care of the 
>> >> > corresponding transition.
>> >> >
>> >> > The patch is a prototype and missing bits are:
>> >> > - folding support addition for GIMPLE_QUATERNARY_RHS is missing
>> >> > - fancy tcc_comparison expressions like LTGT_EXPR, UNORDERED_EXPR, 
>> >> > ORDERED_EXPR,
>> >> >   UNLT_EXPR and others are not supported right now
>> >> > - comments are missing for various functions added
>> >> >
>> >> > Apart from that I was able to bootstrap and run tests with a quite 
>> >> > small fallout.
>> >> > Thoughts?
>> >> > Martin
>> >>
>> >> I think this is going in the wrong direction.  There are some targets
>> >> that can only handle VEC_COND_EXPRs well if we know the associated
>> >> condition, and others where a compare-and-VEC_COND_EXPR will always be
>> >> two operations.  In that situation, it seems like the native gimple
>> >> representation should be the simpler representation rather than the
>> >> more complex one.  That way the comparisons can be optimised
>> >> independently of any VEC_COND_EXPRs on targets that benefit from that.
>> >>
>> >> So IMO it would be better to use three-operand VEC_COND_EXPRs with
>> >> no embedded conditions as the preferred gimple representation and
>> >> have internal functions for the fused operations that some targets
>> >> prefer.  This means that using fused operations is "just" an instruction
>> >> selection decision rather than hard-coded throughout gimple.  (And that
>> >> fits in well with the idea of doing more instruction selection in gimple.)
>> >
>> > So I've been doing that before, but more generally also for COND_EXPR.
>> > We cannot rely on TER and the existing RTL expansion "magic" for the
>> > instruction selection issue you mention because TER isn't reliable.  With
>> > IFNs for optabs we could do actual [vector] condition instruction selection
>> > before RTL expansion, ignoring "single-use" issues - is that what you are
>> > hinting at?
>>
>> Yeah.  It'd be similar to how most FMA selection happens after
>> vectorisation but before expand.
>>
>> > How should the vectorizer deal with this?  Should it directly
>> > use the optab IFNs then when facing "split" COND_EXPRs?  IIRC the
>> > most fallout of a simple patch (adjusting is_gimple_condexpr) is in the
>> > vectorizer.
>>
>> I guess that would be down to how well the vector costings work if we
>> just stick to VEC_COND_EXPR and cost the comparison separately.  Using
>> optabs directly in the vectoriser definitely sounds OK if that ends up
>> being necessary for good code.  But if (like you say) the COND_EXPR is
>> also split apart, we'd be costing the scalar comparison and selection
>> separately as well.
>>
>> > Note I'm specifically looking for a solution that applies to both COND_EXPR
>> > and VEC_COND_EXPR since both suffer from the same issues.
>>
>> Yeah, think the same approach would work for COND_EXPR if it's needed.
>> (And I think the same trade-off applies there too.  Some targets will
>> always need a separate comparison to implement a four-operand COND_EXPR.)
>>
>> > There was also recent work in putting back possibly trapping comparisons
>> > into [VEC_]COND_EXPR because it doesn't interfere with EH and allows
>> > better code.
>>
>> OK, that's a good counter-reason :-)  But it seems quite special-purpose.
>> I assume this works even for targets that do split the VEC_COND_EXPR
>> because the result is undefined on entry to the EH receiver if the
>> operation didn't complete.  But that should be true of any non-trapping
>> work done after the comparison, with the same proviso.
>>
>> So this still seems like an instruction-selection issue.  We're just
>> saying that it's OK to combine a trapping comparison and a VEC_COND_EXPR
>> from the non-trapping path.  The same would be true for any other
>> instruction selection that combines trapping and non-trapping
>> operations, provided that the speculated parts can never trap.
>
> Sure, but that case would necessarily be combining the compare and the
> select to the compare place which is "backwards" (and would speculate
> the select).  Certainly something we don't do 

Re: C++ PATCH for c++/91845 - ICE with invalid pointer-to-member.

2019-09-24 Thread Jason Merrill
OK.

On Mon, Sep 23, 2019 at 10:06 PM Marek Polacek  wrote:
>
> build_m_component_ref checks if either datum/component it got are erroneous 
> but
> they can be turned into the error_mark_node by mark_use as in this case: datum
> is "a" before the call to mark_lvalue_use, but that emits an error and returns
> the error_mark_node, which then crashes.
>
> We can just move the checks after calling mark_[lr]value_use; those just
> return when they get the error_mark_node.  But I tweaked mark_use to also
> handle the case when the type is erroneous.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2019-09-23  Marek Polacek  
>
> PR c++/91845 - ICE with invalid pointer-to-member.
> * expr.c (mark_use): Use error_operand_p.
> * typeck2.c (build_m_component_ref): Check error_operand_p after
> calling mark_[lr]value_use.
>
> * g++.dg/cpp1y/pr91845.C: New test.
>
> diff --git gcc/cp/expr.c gcc/cp/expr.c
> index 212a7f93c5a..d488912d5db 100644
> --- gcc/cp/expr.c
> +++ gcc/cp/expr.c
> @@ -96,7 +96,7 @@ mark_use (tree expr, bool rvalue_p, bool read_p,
>  {
>  #define RECUR(t) mark_use ((t), rvalue_p, read_p, loc, reject_builtin)
>
> -  if (expr == NULL_TREE || expr == error_mark_node)
> +  if (expr == NULL_TREE || error_operand_p (expr))
>  return expr;
>
>if (reject_builtin && reject_gcc_builtin (expr, loc))
> diff --git gcc/cp/typeck2.c gcc/cp/typeck2.c
> index d5098fa24bb..58fa54f40af 100644
> --- gcc/cp/typeck2.c
> +++ gcc/cp/typeck2.c
> @@ -2068,12 +2068,12 @@ build_m_component_ref (tree datum, tree component, 
> tsubst_flags_t complain)
>tree binfo;
>tree ctype;
>
> -  if (error_operand_p (datum) || error_operand_p (component))
> -return error_mark_node;
> -
>datum = mark_lvalue_use (datum);
>component = mark_rvalue_use (component);
>
> +  if (error_operand_p (datum) || error_operand_p (component))
> +return error_mark_node;
> +
>ptrmem_type = TREE_TYPE (component);
>if (!TYPE_PTRMEM_P (ptrmem_type))
>  {
> diff --git gcc/testsuite/g++.dg/cpp1y/pr91845.C 
> gcc/testsuite/g++.dg/cpp1y/pr91845.C
> new file mode 100644
> index 000..cb80dd7a8a7
> --- /dev/null
> +++ gcc/testsuite/g++.dg/cpp1y/pr91845.C
> @@ -0,0 +1,14 @@
> +// PR c++/91845 - ICE with invalid pointer-to-member.
> +// { dg-do compile { target c++14 } }
> +
> +void non_const_mem_ptr() {
> +  struct A {
> +  };
> +  constexpr A a = {1, 2}; // { dg-error "too many initializers" }
> +  struct B {
> +int A::*p;
> +constexpr int g() const {
> +  return a.*p; // { dg-error "use of local variable" }
> +};
> +  };
> +}


Re: C++ PATCH for c++/91868 - improve -Wshadow location.

2019-09-24 Thread Jason Merrill
OK.

On Mon, Sep 23, 2019 at 10:04 PM Marek Polacek  wrote:
>
> We can improve various -Wshadow warnings by using DECL_SOURCE_LOCATION
> rather than whatever is in input_location.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2019-09-23  Marek Polacek  
>
> PR c++/91868 - improve -Wshadow location.
> * name-lookup.c (check_local_shadow): Use DECL_SOURCE_LOCATION
> instead of input_location.
>
> * g++.dg/warn/Wshadow-16.C: New test.
>
> diff --git gcc/cp/name-lookup.c gcc/cp/name-lookup.c
> index 8bbb92ddc9f..74f1072fa8c 100644
> --- gcc/cp/name-lookup.c
> +++ gcc/cp/name-lookup.c
> @@ -2771,7 +2771,7 @@ check_local_shadow (tree decl)
> msg = "declaration of %qD shadows a previous local";
>
>auto_diagnostic_group d;
> -  if (warning_at (input_location, warning_code, msg, decl))
> +  if (warning_at (DECL_SOURCE_LOCATION (decl), warning_code, msg, decl))
> inform_shadowed (old);
>return;
>  }
> @@ -2798,7 +2798,7 @@ check_local_shadow (tree decl)
> || TYPE_PTRMEMFUNC_P (TREE_TYPE (decl)))
>   {
> auto_diagnostic_group d;
> -   if (warning_at (input_location, OPT_Wshadow,
> +   if (warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wshadow,
> "declaration of %qD shadows a member of %qT",
> decl, current_nonlambda_class_type ())
> && DECL_P (member))
> @@ -2818,7 +2818,7 @@ check_local_shadow (tree decl)
>  /* XXX shadow warnings in outer-more namespaces */
>  {
>auto_diagnostic_group d;
> -  if (warning_at (input_location, OPT_Wshadow,
> +  if (warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wshadow,
>   "declaration of %qD shadows a global declaration",
>   decl))
> inform_shadowed (old);
> diff --git gcc/testsuite/g++.dg/warn/Wshadow-16.C 
> gcc/testsuite/g++.dg/warn/Wshadow-16.C
> new file mode 100644
> index 000..1ba54ec107d
> --- /dev/null
> +++ gcc/testsuite/g++.dg/warn/Wshadow-16.C
> @@ -0,0 +1,24 @@
> +// PR c++/91868 - improve -Wshadow location.
> +// { dg-options "-Wshadow" }
> +
> +int global; // { dg-message "shadowed declaration" }
> +
> +struct S
> +{
> +  static int bar; // { dg-message "shadowed declaration" }
> +  S (int i) { int bar // { dg-warning "19:declaration of .bar. shadows a 
> member" }
> +  (1);
> +int global // { dg-warning "9:declaration of .global. shadows a global 
> declaration" }
> +  (42);
> +  }
> +};
> +
> +void
> +foo ()
> +{
> +  int xx; // { dg-message "shadowed declaration" }
> +  {
> +S xx // { dg-warning "7:declaration of .xx. shadows a previous local" }
> +(42);
> +  }
> +}


Re: [PATCH, Fortran] Optionally suppress no-automatic overwrites recursive warning - for review

2019-09-24 Thread Mark Eggleston



On 24/09/2019 14:53, Bernhard Reutner-Fischer wrote:
> On Tue, 24 Sep 2019 12:12:04 +0100
> Mark Eggleston  wrote:
>
> > >> @@ -411,7 +411,7 @@ gfc_post_options (const char **pfilename)
> > >> && flag_max_stack_var_size != 0)
> > >>   gfc_warning_now (0, "Flag %<-fno-automatic%> overwrites 
> > >> %<-fmax-stack-var-size=%d%>",
> > >> flag_max_stack_var_size);
> > >> -  else if (!flag_automatic && flag_recursive)
> > >> +  else if (!flag_automatic && flag_recursive && warn_overwrite_recursive)
> > >>   gfc_warning_now (0, "Flag %<-fno-automatic%> overwrites 
> > >> %<-frecursive%>");
> > >> else if (!flag_automatic && flag_openmp)
> > >>   gfc_warning_now (0, "Flag %<-fno-automatic%> overwrites 
> > >> %<-frecursive%> implied by "
> > >>
> > > Doesn't look right to me. Do you want
> > > gfc_warning_now (OPT_Woverwrite_recursive, "Flag ...
> > > instead?
> > Done.
>
> by "instead" i mean you to leave the if unchanged.

I didn't realise that's how it worked. That's cleaner. Once fixed OK for 
commit?

> thanks,

--
https://www.codethink.co.uk/privacy.html



Re: [PATCH, Fortran] Optionally suppress no-automatic overwrites recursive warning - for review

2019-09-24 Thread Bernhard Reutner-Fischer
On Tue, 24 Sep 2019 12:12:04 +0100
Mark Eggleston  wrote:

> >> @@ -411,7 +411,7 @@ gfc_post_options (const char **pfilename)
> >> && flag_max_stack_var_size != 0)
> >>   gfc_warning_now (0, "Flag %<-fno-automatic%> overwrites 
> >> %<-fmax-stack-var-size=%d%>",
> >> flag_max_stack_var_size);
> >> -  else if (!flag_automatic && flag_recursive)
> >> +  else if (!flag_automatic && flag_recursive && warn_overwrite_recursive)
> >>   gfc_warning_now (0, "Flag %<-fno-automatic%> overwrites 
> >> %<-frecursive%>");
> >> else if (!flag_automatic && flag_openmp)
> >>   gfc_warning_now (0, "Flag %<-fno-automatic%> overwrites 
> >> %<-frecursive%> implied by "
> >>
> > Doesn't look right to me. Do you want
> > gfc_warning_now (OPT_Woverwrite_recursive, "Flag ...
> > instead?
> Done.

by "instead" i mean you to leave the if unchanged.
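
To spell out the difference between the two approaches, here is a toy model in
plain C. It is not gfortran code: toy_opt, toy_opt_enabled, toy_warning_now
and toy_post_options are hypothetical names, and the only behaviour modelled
is that a warning call tagged with an option code is dropped when that option
is off, while option code 0 means "always warn". Under that assumption the
caller's condition can stay as it was and the option does the filtering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical option codes; 0 stands for "not controlled by any option".  */
enum toy_opt { TOY_OPT_NONE = 0, TOY_OPT_WOVERWRITE_RECURSIVE, TOY_OPT_COUNT };

/* Which options are currently enabled (index 0 is unused).  */
static bool toy_opt_enabled[TOY_OPT_COUNT];

/* Emit MSG unless it is tied to a disabled option.
   Returns true if the warning was actually printed.  */
static bool
toy_warning_now (enum toy_opt opt, const char *msg)
{
  assert (opt < TOY_OPT_COUNT);
  if (opt != TOY_OPT_NONE && !toy_opt_enabled[opt])
    return false;
  fprintf (stderr, "warning: %s\n", msg);
  return true;
}

/* The caller keeps its original condition; no extra test of the option.  */
static bool
toy_post_options (bool flag_automatic, bool flag_recursive)
{
  if (!flag_automatic && flag_recursive)
    return toy_warning_now (TOY_OPT_WOVERWRITE_RECURSIVE,
                            "Flag -fno-automatic overwrites -frecursive");
  return false;
}
```

With the option enabled the call prints the message; with it disabled the same
call is silent, and the later "else if" branches are not reached either way.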

thanks,


Re: [PATCH][AArch64] Don't split 64-bit constant stores to volatile location

2019-09-24 Thread Kyrill Tkachov

Hi all,

On 8/22/19 10:16 AM, Kyrill Tkachov wrote:

Hi all,

The optimisation that transforms:
   typedef unsigned long long u64;

   void bar(u64 *x)
   {
 *x = 0xabcdef10abcdef10;
   }

from:
    mov x1, 61200
    movk    x1, 0xabcd, lsl 16
    movk    x1, 0xef10, lsl 32
    movk    x1, 0xabcd, lsl 48
    str x1, [x0]

into:
    mov w1, 61200
    movk    w1, 0xabcd, lsl 16
    stp w1, w1, [x0]

ends up producing two distinct stores if the destination is volatile:
  void bar(u64 *x)
  {
    *(volatile u64 *)x = 0xabcdef10abcdef10;
  }
    mov w1, 61200
    movk    w1, 0xabcd, lsl 16
    str w1, [x0]
    str w1, [x0, 4]

because we end up not merging the strs into an stp. It's questionable 
whether the use of STP is valid for volatile in the first place.
To avoid unnecessary pain in a context where it's unlikely to be 
performance critical [1] (use of volatile), this patch avoids this
transformation for volatile destinations, so we produce the original 
single STR-X.
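
As a rough illustration of the decision described above (not the aarch64
back-end code itself), the sketch below models just two inputs: whether the
64-bit constant has identical 32-bit halves, which is what makes the
"stp w1, w1" form possible in the example, and whether the destination is
volatile. The function names are invented for this sketch, and any other
conditions the real code checks are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the low and high 32-bit halves of C are the same value.  */
static bool
halves_identical (uint64_t c)
{
  return (uint32_t) c == (uint32_t) (c >> 32);
}

/* Toy decision: split the 64-bit store into two 32-bit stores only when
   the halves match and the destination is not volatile.  */
static bool
would_split_store (uint64_t c, bool dest_is_volatile)
{
  return halves_identical (c) && !dest_is_volatile;
}
```

For 0xabcdef10abcdef10 this returns true for an ordinary destination and false
for a volatile one, matching the behaviour the patch is aiming for.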


Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk (and eventual backports)?


This has been approved by James offline.

Committed to trunk with r276098.

Thanks,

Kyrill


Thanks,
Kyrill

[1] 
https://lore.kernel.org/lkml/20190821103200.kpufwtviqhpbuv2n@willie-the-truck/



gcc/
2019-08-22  Kyrylo Tkachov 

    * config/aarch64/aarch64.md (mov): Don't call
    aarch64_split_dimode_const_store on volatile MEM.

gcc/testsuite/
2019-08-22  Kyrylo Tkachov 

    * gcc.target/aarch64/nosplit-di-const-volatile_1.c: New test.



Re: [GCC][PATCH][AArch64] Update hwcap string for fp16fml in aarch64-option-extensions.def

2019-09-24 Thread Kyrill Tkachov

Hi all,

On 9/10/19 1:34 PM, Stam Markianos-Wright wrote:


Hi all,

This is a minor patch that fixes the entry for the fp16fml feature in
GCC's aarch64-option-extensions.def.

As can be seen in the Linux sources here
https://github.com/torvalds/linux/blob/master/arch/arm64/kernel/cpuinfo.c#L69 


the correct string is "asimdfhm", not "asimdfml".
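
To show why the exact spelling matters, here is a small sketch of whole-token
matching against a space-separated feature list such as the one the kernel
prints in /proc/cpuinfo. has_feature is a hypothetical helper written for this
note, not a function from GCC, and the feature strings in the usage note are
made-up sample input.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if NAME occurs in FEATURES as a complete space-separated
   token (not merely as a substring of a longer token).  */
static bool
has_feature (const char *features, const char *name)
{
  size_t len = strlen (name);
  if (len == 0)
    return false;

  for (const char *p = features; (p = strstr (p, name)) != NULL; p += len)
    {
      bool starts = (p == features) || p[-1] == ' ';
      bool ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
      if (starts && ends)
        return true;
    }
  return false;
}
```

Given a list like "fp asimd asimdhp asimdfhm", a lookup for "asimdfhm"
succeeds while a lookup for the old "asimdfml" spelling never can.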

Cross-compiled and tested on aarch64-none-linux-gnu.

Is this ok for trunk?

Also, I don't have commit rights, so could someone commit it on my behalf?

James approved it offline so I've committed it on Stam's behalf as 
r276097 with a slightly adjusted ChangeLog:


2019-09-24  Stamatis Markianos-Wright 

    * config/aarch64/aarch64-option-extensions.def (fp16fml):
    Update hwcap string for fp16fml.

Thanks,

Kyrill


Thanks,
Stam Markianos-Wright


The diff is:

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def
b/gcc/config/aarch64/aarch64-option-extensions.def
index 9919edd43d0..60e8f28fff5 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -135,7 +135,7 @@ AARCH64_OPT_EXTENSION("sm4", AARCH64_FL_SM4,
AARCH64_FL_SIMD, \
   /* Enabling "fp16fml" also enables "fp" and "fp16".
  Disabling "fp16fml" just disables "fp16fml".  */
   AARCH64_OPT_EXTENSION("fp16fml", AARCH64_FL_F16FML, \
-  AARCH64_FL_FP | AARCH64_FL_F16, 0, false, "asimdfml")
+  AARCH64_FL_FP | AARCH64_FL_F16, 0, false, "asimdfhm")

   /* Enabling "sve" also enables "fp16", "fp" and "simd".
  Disabling "sve" disables "sve", "sve2", "sve2-aes", "sve2-sha3",
"sve2-sm4"



gcc/ChangeLog:

2019-09-09  Stamatis Markianos-Wright 

  * config/aarch64/aarch64-option-extensions.def: Updated hwcap
string for fp16fml.




Re: [PATCH] Fix up __builtin_alloca_with_align (0, ...) folding (PR sanitizer/91707)

2019-09-24 Thread Jakub Jelinek
On Tue, Sep 24, 2019 at 03:10:49PM +0200, Richard Biener wrote:
> Hmm yeah.
> 
> Note that in principle the domain could be signed so that the
> -1 is more obvious.  Also [1:0] would be an equally valid empty
> domain.  Not sure if that helps the specific jump-threading case, of 
> course...

No, that doesn't help.
The code is essentially
void
foo (int x)
{
  if (x == 0)
    bar ();
  int v[x];
  v[0] = 1;
  if (x == 0)
    bar ();
}
where if jump threading creates
if (x == 0) { bar (); int v[0]; v[0] = 1; bar (); }
else { int v[x]; v[0] = 1; }
out of it, we do warn.  Whether we should warn in that case is something for
ongoing debate (I don't like such warnings, because the if (x == 0) doesn't
necessarily mean the code will be called with such arguments, it might be
just that something written generically got inlined in, but others like them
(Martin, Jeff)), in this specific case it is even that the if (x == 0) bar ();
doesn't actually come from the user code at all, but from the sanitization
and so even less desirable, because, well, user code didn't have any tests
like that at all.

Jakub


Re: [PATCH] Fix up __builtin_alloca_with_align (0, ...) folding (PR sanitizer/91707)

2019-09-24 Thread Richard Biener
On Tue, 24 Sep 2019, Jakub Jelinek wrote:

> On Tue, Sep 24, 2019 at 01:15:46PM +0200, Richard Biener wrote:
> > > build_array_type_nelts is only meaningful for non-zero number of elements,
> > > for 0 it creates weirdo arrays like char D.2358[0:18446744073709551615].
> > > The following patch uses in that case types like the C FE emits for
> > > zero-length array instead (i.e. char D.2358[0:] with forced 0 size).
> > > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > Not sure [0:-1] is actually the canonical zero-length array (and IIRC
> > what the C++ FE creates and what layout_type can lay out).  So why
> 
> You're right, patch withdrawn.
> 
> > not fix the sanitizers instead?
> 
> Well, the problem isn't in sanitizers, but jump threading and late warnings
> that are warning even about code specialized by jump threading.
> It could be indeed solved with __builtin_warning if we defer the late
> warnings and ignore them inside of sanitization report only paths (if we can
> detect them reliably, perhaps pass dominated by a failed ubsan or asan
> sanitization check), or by making jump threading not try to optimize the
> cold sanitization diagnostics parts.

Hmm yeah.

Note that in principle the domain could be signed so that the
-1 is more obvious.  Also [1:0] would be an equally valid empty
domain.  Not sure if that helps the specific jump-threading case, of 
course...
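
The odd-looking bound in the quoted text is simply what an upper bound of
"number of elements minus one" turns into for zero elements when the
arithmetic is unsigned. A tiny sketch, assuming a 64-bit unsigned index type
as on the x86_64 target mentioned; the helper names are invented here:

```c
#include <stdint.h>

/* Upper bound of the index domain [0 : nelts - 1], unsigned arithmetic.  */
static uint64_t
domain_max_unsigned (uint64_t nelts)
{
  return nelts - 1;   /* wraps around for nelts == 0 */
}

/* The same bound computed in a signed type.  */
static int64_t
domain_max_signed (int64_t nelts)
{
  return nelts - 1;   /* plain -1 for nelts == 0 */
}
```

For zero elements the unsigned version yields 18446744073709551615, the value
seen in the dump, while the signed version yields -1.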

Richard.


[AArch64][SVE] Utilize ASRD instruction for division and remainder

2019-09-24 Thread Yuliang Wang
Hi,

The C snippets below (signed division/modulo by a power-of-2 immediate value):

#define P ...

void foo_div (int *a, int *b, int N)
{
for (int i = 0; i < N; i++)
a[i] = b[i] / (1 << P);
}
void foo_mod (int *a, int *b, int N)
{
for (int i = 0; i < N; i++)
a[i] = b[i] % (1 << P);
}

Vectorize to the following on AArch64 + SVE:

foo_div:
        mov     x0, 0
        mov     w2, N
        ptrue   p1.b, all
        whilelo p0.s, wzr, w2
        .p2align 3,,7
.L2:
        ld1w    z1.s, p0/z, [x3, x0, lsl 2]
        cmplt   p2.s, p1/z, z1.s, #0            //
        mov     z0.s, p2/z, #7                  //
        add     z0.s, z0.s, z1.s                //
        asr     z0.s, z0.s, #3                  //
        st1w    z0.s, p0, [x1, x0, lsl 2]
        incw    x0
        whilelo p0.s, w0, w2
        b.any   .L2
        ret

foo_mod:
        ...
.L2:
        ld1w    z0.s, p0/z, [x3, x0, lsl 2]
        cmplt   p2.s, p1/z, z0.s, #0            //
        mov     z1.s, p2/z, #-1                 //
        lsr     z1.s, z1.s, #29                 //
        add     z0.s, z0.s, z1.s                //
        and     z0.s, z0.s, #{2^P-1}            //
        sub     z0.s, z0.s, z1.s                //
        st1w    z0.s, p0, [x1, x0, lsl 2]
        incw    x0
        whilelo p0.s, w0, w2
        b.any   .L2
        ret

This patch utilizes the special-purpose ASRD (arithmetic shift-right for divide 
by immediate) instruction:

foo_div:
        ...
.L2:
        ld1w    z0.s, p0/z, [x3, x0, lsl 2]
        asrd    z0.s, p1/m, z0.s, #{P}          //
        st1w    z0.s, p0, [x1, x0, lsl 2]
        incw    x0
        whilelo p0.s, w0, w2
        b.any   .L2
        ret

foo_mod:
        ...
.L2:
        ld1w    z0.s, p0/z, [x3, x0, lsl 2]
        movprfx z1, z0                          //
        asrd    z1.s, p1/m, z1.s, #{P}          //
        lsl     z1.s, z1.s, #{P}                //
        sub     z0.s, z0.s, z1.s                //
        st1w    z0.s, p0, [x1, x0, lsl 2]
        incw    x0
        whilelo p0.s, w0, w2
        b.any   .L2
        ret
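
The listings above can be read against a scalar model of the same arithmetic.
The sketch below is only that, a per-element model: asrd_model and
mod_via_asrd are names invented here, and the code assumes that >> on a
negative int32_t is an arithmetic shift (which is what GCC does, though the C
standard leaves it implementation-defined). The first function adds a bias of
2^P - 1 to negative inputs before shifting so that the result rounds toward
zero like C division; the second derives the remainder as x - (x / 2^P) * 2^P,
mirroring the asrd, lsl, sub sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a round-toward-zero divide by 2^p using a shift.  */
static int32_t
asrd_model (int32_t x, unsigned p)
{
  assert (p >= 1 && p <= 30);
  int32_t bias = x < 0 ? (int32_t) ((1u << p) - 1) : 0;
  return (x + bias) >> p;
}

/* Remainder built from the quotient: x - (x / 2^p) * 2^p.
   The multiply stands in for the left shift in the vector code.  */
static int32_t
mod_via_asrd (int32_t x, unsigned p)
{
  int32_t q = asrd_model (x, p);
  return x - q * (int32_t) (1u << p);
}
```

For example, with P = 3 the input -9 gives quotient -1 and remainder -1, the
same as -9 / 8 and -9 % 8 in C.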

Added new tests. Built and regression tested on aarch64-none-elf.

Best Regards,
Yuliang Wang


gcc/ChangeLog:

2019-09-23  Yuliang Wang  

* config/aarch64/aarch64-sve.md (asrd3): New pattern for ASRD.
* config/aarch64/iterators.md (UNSPEC_ASRD): New unspec.
(ASRDIV): New int iterator.
* internal-fn.def (IFN_ASHR_DIV): New internal function.
* optabs.def (ashr_div_optab): New optab.
* tree-vect-patterns.c (vect_recog_divmod_pattern):
Modify pattern to support new operation.
* doc/md.texi (asrd$var{m3}): Documentation for the above.
* doc/sourcebuild.texi (vect_asrdiv_si): Document new target selector.

gcc/testsuite/ChangeLog:

2019-09-23  Yuliang Wang  

* gcc.dg/vect/vect-asrdiv-1.c: New test.
* gcc.target/aarch64/sve/asrdiv_1.c: As above.
* lib/target-supports.exp (check_effective_target_vect_asrdiv_si):
Return true for AArch64 with SVE.


rb11863.patch
Description: rb11863.patch


Re: [PATCH] Fix up __builtin_alloca_with_align (0, ...) folding (PR sanitizer/91707)

2019-09-24 Thread Jakub Jelinek
On Tue, Sep 24, 2019 at 01:15:46PM +0200, Richard Biener wrote:
> > build_array_type_nelts is only meaningful for non-zero number of elements,
> > for 0 it creates weirdo arrays like char D.2358[0:18446744073709551615].
> > The following patch uses in that case types like the C FE emits for
> > zero-length array instead (i.e. char D.2358[0:] with forced 0 size).
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Not sure [0:-1] is actually the canonical zero-length array (and IIRC
> what the C++ FE creates and what layout_type can lay out).  So why

You're right, patch withdrawn.

> not fix the sanitizers instead?

Well, the problem isn't in sanitizers, but jump threading and late warnings
that are warning even about code specialized by jump threading.
It could be indeed solved with __builtin_warning if we defer the late
warnings and ignore them inside of sanitization report only paths (if we can
detect them reliably, perhaps pass dominated by a failed ubsan or asan
sanitization check), or by making jump threading not try to optimize the
cold sanitization diagnostics parts.

Jakub


Re: [PATCH][RFC] Come up with VEC_COND_OP_EXPRs.

2019-09-24 Thread Richard Biener
On Tue, Sep 24, 2019 at 1:57 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Tue, Sep 24, 2019 at 1:11 PM Richard Sandiford
> >  wrote:
> >>
> >> Martin Liška  writes:
> >> > Hi.
> >> >
> >> > The patch introduces couple of new TREE_CODEs that will help us to have
> >> > a proper GIMPLE representation of current VECT_COND_EXPR. Right now,
> >> > the first argument is typically a GENERIC tcc_expression tree with 2 
> >> > operands
> >> > that are visited at various places in GIMPLE code. That said, based on 
> >> > the discussion
> >> > with Richi, I'm suggesting to come up with e.g.
> >> > VECT_COND_LT_EXPR. Such a 
> >> > change logically
> >> > introduces new GIMPLE_QUATERNARY_RHS gassignments. For now, the 
> >> > VEC_COND_EXPR remains
> >> > and is only valid in GENERIC and gimplifier will take care of the 
> >> > corresponding transition.
> >> >
> >> > The patch is a prototype and missing bits are:
> >> > - folding support addition for GIMPLE_QUATERNARY_RHS is missing
> >> > - fancy tcc_comparison expressions like LTGT_EXPR, UNORDERED_EXPR, 
> >> > ORDERED_EXPR,
> >> >   UNLT_EXPR and others are not supported right now
> >> > - comments are missing for various functions added
> >> >
> >> > Apart from that I was able to bootstrap and run tests with a quite small 
> >> > fallout.
> >> > Thoughts?
> >> > Martin
> >>
> >> I think this is going in the wrong direction.  There are some targets
> >> that can only handle VEC_COND_EXPRs well if we know the associated
> >> condition, and others where a compare-and-VEC_COND_EXPR will always be
> >> two operations.  In that situation, it seems like the native gimple
> >> representation should be the simpler representation rather than the
> >> more complex one.  That way the comparisons can be optimised
> >> independently of any VEC_COND_EXPRs on targets that benefit from that.
> >>
> >> So IMO it would be better to use three-operand VEC_COND_EXPRs with
> >> no embedded conditions as the preferred gimple representation and
> >> have internal functions for the fused operations that some targets
> >> prefer.  This means that using fused operations is "just" an instruction
> >> selection decision rather than hard-coded throughout gimple.  (And that
> >> fits in well with the idea of doing more instruction selection in gimple.)
> >
> > So I've been doing that before, but more generally also for COND_EXPR.
> > We cannot rely on TER and the existing RTL expansion "magic" for the
> > instruction selection issue you mention because TER isn't reliable.  With
> > IFNs for optabs we could do actual [vector] condition instruction selection
> > before RTL expansion, ignoring "single-use" issues - is that what you are
> > hinting at?
>
> Yeah.  It'd be similar to how most FMA selection happens after
> vectorisation but before expand.
>
> > How should the vectorizer deal with this?  Should it directly
> > use the optab IFNs then when facing "split" COND_EXPRs?  IIRC the
> > most fallout of a simple patch (adjusting is_gimple_condexpr) is in the
> > vectorizer.
>
> I guess that would be down to how well the vector costings work if we
> just stick to VEC_COND_EXPR and cost the comparison separately.  Using
> optabs directly in the vectoriser definitely sounds OK if that ends up
> being necessary for good code.  But if (like you say) the COND_EXPR is
> also split apart, we'd be costing the scalar comparison and selection
> separately as well.
>
> > Note I'm specifically looking for a solution that applies to both COND_EXPR
> > and VEC_COND_EXPR since both suffer from the same issues.
>
> Yeah, think the same approach would work for COND_EXPR if it's needed.
> (And I think the same trade-off applies there too.  Some targets will
> always need a separate comparison to implement a four-operand COND_EXPR.)
>
> > There was also recent work in putting back possibly trapping comparisons
> > into [VEC_]COND_EXPR because it doesn't interfere with EH and allows
> > better code.
>
> OK, that's a good counter-reason :-)  But it seems quite special-purpose.
> I assume this works even for targets that do split the VEC_COND_EXPR
> because the result is undefined on entry to the EH receiver if the
> operation didn't complete.  But that should be true of any non-trapping
> work done after the comparison, with the same proviso.
>
> So this still seems like an instruction-selection issue.  We're just
> saying that it's OK to combine a trapping comparison and a VEC_COND_EXPR
> from the non-trapping path.  The same would be true for any other
> instruction selection that combines trapping and non-trapping
> operations, provided that the speculated parts can never trap.

Sure, but that case would necessarily be combining the compare and the
select to the compare place which is "backwards" (and would speculate
the select).  Certainly something we don't do anywhere.  This case btw
made me consider going the four-operand way (I've pondered with all available
ops multiple times...).
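
To make the two shapes under discussion concrete, here is a toy scalar-loop
model in C. It is not GIMPLE and none of these names exist in GCC: cmp_lt
plus select3 stand for a separate comparison feeding a three-operand select,
and select_lt4 stands for a fused four-operand compare-and-select. The fixed
lane count is an arbitrary choice for the sketch.

```c
#include <stdint.h>

enum { LANES = 4 };

/* Split form, step 1: the comparison produces a per-lane mask
   (all ones for true, zero for false).  */
static void
cmp_lt (const int32_t *a, const int32_t *b, int32_t *mask)
{
  for (int i = 0; i < LANES; i++)
    mask[i] = a[i] < b[i] ? -1 : 0;
}

/* Split form, step 2: a three-operand select driven only by the mask.  */
static void
select3 (const int32_t *mask, const int32_t *c, const int32_t *d, int32_t *r)
{
  for (int i = 0; i < LANES; i++)
    r[i] = mask[i] ? c[i] : d[i];
}

/* Fused form: comparison operands and selected values in one operation.  */
static void
select_lt4 (const int32_t *a, const int32_t *b,
            const int32_t *c, const int32_t *d, int32_t *r)
{
  for (int i = 0; i < LANES; i++)
    r[i] = a[i] < b[i] ? c[i] : d[i];
}
```

Both forms compute the same lanes; the split form simply exposes the mask as
a value of its own, which is what allows the comparison to be optimised
separately from the select.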

> > 

Re: [PATCH][RFC] Add new ipa-reorder pass

2019-09-24 Thread Martin Liška
On 9/19/19 10:33 AM, Martin Liška wrote:
> - One needs modified binutils and that would probably require a configure
>   detection. The only way which I see is based on ld --version. I'm planning
>   to make the binutils submission soon.

The patch submission link:
https://sourceware.org/ml/binutils/2019-09/msg00219.html


Re: [PATCH][RFC] Come up with VEC_COND_OP_EXPRs.

2019-09-24 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Sep 24, 2019 at 1:11 PM Richard Sandiford
>  wrote:
>>
>> Martin Liška  writes:
>> > Hi.
>> >
>> > The patch introduces a couple of new TREE_CODEs that will help us to have
>> > a proper GIMPLE representation of the current VEC_COND_EXPR. Right now,
>> > the first argument is typically a GENERIC tcc_expression tree with 2
>> > operands that are visited at various places in GIMPLE code. That said,
>> > based on the discussion with Richi, I'm suggesting to come up with e.g.
>> > VECT_COND_LT_EXPR. Such a change logically introduces new
>> > GIMPLE_QUATERNARY_RHS gassignments. For now, the VEC_COND_EXPR remains
>> > and is only valid in GENERIC, and the gimplifier will take care of the
>> > corresponding transition.
>> >
>> > The patch is a prototype and the missing bits are:
>> > - folding support for GIMPLE_QUATERNARY_RHS is missing
>> > - fancy tcc_comparison expressions like LTGT_EXPR, UNORDERED_EXPR,
>> >   ORDERED_EXPR, UNLT_EXPR and others are not supported right now
>> > - comments are missing for various functions added
>> >
>> > Apart from that I was able to bootstrap and run tests with quite small
>> > fallout.
>> > Thoughts?
>> > Martin
>>
>> I think this is going in the wrong direction.  There are some targets
>> that can only handle VEC_COND_EXPRs well if we know the associated
>> condition, and others where a compare-and-VEC_COND_EXPR will always be
>> two operations.  In that situation, it seems like the native gimple
>> representation should be the simpler representation rather than the
>> more complex one.  That way the comparisons can be optimised
>> independently of any VEC_COND_EXPRs on targets that benefit from that.
>>
>> So IMO it would be better to use three-operand VEC_COND_EXPRs with
>> no embedded conditions as the preferred gimple representation and
>> have internal functions for the fused operations that some targets
>> prefer.  This means that using fused operations is "just" an instruction
>> selection decision rather than hard-coded throughout gimple.  (And that
>> fits in well with the idea of doing more instruction selection in gimple.)
>
> So I've been doing that before, but more generally also for COND_EXPR.
> We cannot rely on TER and the existing RTL expansion "magic" for the
> instruction selection issue you mention because TER isn't reliable.  With
> IFNs for optabs we could do actual [vector] condition instruction selection
> before RTL expansion, ignoring "single-use" issues - is that what you are
> hinting at?

Yeah.  It'd be similar to how most FMA selection happens after
vectorisation but before expand.
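To make the two representations concrete, here is a minimal scalar sketch in
plain C (not GIMPLE, and not code from any patch in this thread).  The
function names compare_lt, select3 and select_lt_fused are made up for
illustration.  The split form produces the mask as an ordinary value that
can be optimised or reused on its own; the fused form models the
four-operand operation that some targets prefer and that instruction
selection could form later.

```c
#include <assert.h>
#include <stddef.h>

/* Split form, step 1: the comparison yields a mask that is an ordinary
   value (all-ones for true, zero for false), independent of any select.  */
void compare_lt (const int *a, const int *b, int *mask, size_t n)
{
  assert (a != NULL && b != NULL && mask != NULL);
  for (size_t i = 0; i < n; i++)
    mask[i] = a[i] < b[i] ? -1 : 0;
}

/* Split form, step 2: a three-operand select with no embedded condition.  */
void select3 (const int *mask, const int *t, const int *f, int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = mask[i] ? t[i] : f[i];
}

/* Fused form: compare and select in one step, modelling a four-operand
   operation.  */
void select_lt_fused (const int *a, const int *b, const int *t, const int *f,
                      int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = a[i] < b[i] ? t[i] : f[i];
}
```

Both forms compute the same result; the difference is only whether the mask
exists as a separate value before instruction selection.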

> How should the vectorizer deal with this?  Should it directly
> use the optab IFNs then when facing "split" COND_EXPRs?  IIRC the
> most fallout of a simple patch (adjusting is_gimple_condexpr) is in the
> vectorizer.

I guess that would be down to how well the vector costings work if we
just stick to VEC_COND_EXPR and cost the comparison separately.  Using
optabs directly in the vectoriser definitely sounds OK if that ends up
being necessary for good code.  But if (like you say) the COND_EXPR is
also split apart, we'd be costing the scalar comparison and selection
separately as well.

> Note I'm specifically looking for a solution that applies to both COND_EXPR
> and VEC_COND_EXPR since both suffer from the same issues.

Yeah, think the same approach would work for COND_EXPR if it's needed.
(And I think the same trade-off applies there too.  Some targets will
always need a separate comparison to implement a four-operand COND_EXPR.)

> There was also recent work in putting back possibly trapping comparisons
> into [VEC_]COND_EXPR because it doesn't interfere with EH and allows
> better code.

OK, that's a good counter-reason :-)  But it seems quite special-purpose.
I assume this works even for targets that do split the VEC_COND_EXPR
because the result is undefined on entry to the EH receiver if the
operation didn't complete.  But that should be true of any non-trapping
work done after the comparison, with the same proviso.

So this still seems like an instruction-selection issue.  We're just
saying that it's OK to combine a trapping comparison and a VEC_COND_EXPR
from the non-trapping path.  The same would be true for any other
instruction selection that combines trapping and non-trapping
operations, provided that the speculated parts can never trap.

> Also you SVE people had VN issues with cond-exprs and
> VN runs into the exact same issue (but would handle separate comparisons
> better - with the caveat of breaking TER).

The VN thing turned out to be a red herring there, sorry.  I think
I was remembering the state before ifcvt did its own value numbering.
The remaining issue for the vectoriser is that we don't avoid duplicate
cast conversions in vect_recog_mask_conversion_pattern, but that's
mostly a cost thing.  The redundancies do get removed by later passes.

Thanks,
Richard


[PATCH] More reduction vectorization refactoring

2019-09-24 Thread Richard Biener


Baby-steps towards sanity.  My focus is currently vectorizable_reduction
vs. vect_create_epilog_for_reduction - the following aims at simplifying
all the condition reduction special-casing.  The real next intermediate
goal is to move all code-generation that is not "epilogue" from
vect_create_epilog_for_reduction to vectorizable_reduction, first
completing the last big refactoring and move all PHI creation to
the point when vectorizable_reduction is called on the scalar
reduction PHI.

Anyway...

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2019-09-24  Richard Biener  

* tree-vectorizer.h (_stmt_vec_info::const_cond_reduc_code):
Rename to...
(_stmt_vec_info::cond_reduc_code): ... this.
(_stmt_vec_info::induc_cond_initial_val): Add.
(STMT_VINFO_VEC_CONST_COND_REDUC_CODE): Rename to...
(STMT_VINFO_VEC_COND_REDUC_CODE): ... this.
(STMT_VINFO_VEC_INDUC_COND_INITIAL_VAL): Add.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Adjust.
* tree-vect-loop.c (get_initial_def_for_reduction): Pass in
the reduction code.
(vect_create_epilog_for_reduction): Drop special
induction condition reduction params, pass in reduction code
and simplify.
(vectorizable_reduction): Perform condition reduction kind
selection only at analysis time.  Adjust passing on state.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 276092)
+++ gcc/tree-vect-loop.c(working copy)
@@ -3981,14 +3981,14 @@ vect_model_induction_cost (stmt_vec_info
A cost model should help decide between these two schemes.  */
 
 static tree
-get_initial_def_for_reduction (stmt_vec_info stmt_vinfo, tree init_val,
+get_initial_def_for_reduction (stmt_vec_info stmt_vinfo,
+  enum tree_code code, tree init_val,
tree *adjustment_def)
 {
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree scalar_type = TREE_TYPE (init_val);
   tree vectype = get_vectype_for_scalar_type (scalar_type);
-  enum tree_code code = gimple_assign_rhs_code (stmt_vinfo->stmt);
   tree def_for_init;
   tree init_def;
   REAL_VALUE_TYPE real_init_val = dconst0;
@@ -4273,14 +4273,15 @@ static void
 vect_create_epilog_for_reduction (vec vect_defs,
  stmt_vec_info stmt_info,
  gimple *reduc_def_stmt,
+ enum tree_code code,
  int ncopies, internal_fn reduc_fn,
  vec reduction_phis,
   bool double_reduc, 
  slp_tree slp_node,
  slp_instance slp_node_instance,
- tree induc_val, enum tree_code induc_code,
  tree neutral_op)
 {
+  tree induc_val = NULL_TREE;
   stmt_vec_info prev_phi_info;
   tree vectype;
   machine_mode mode;
@@ -4370,17 +4371,22 @@ vect_create_epilog_for_reduction (vec (phi_info->stmt);
- if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
- == INTEGER_INDUC_COND_REDUCTION)
-   {
- /* Initialise the reduction phi to zero.  This prevents initial
-values of non-zero interferring with the reduction op.  */
- gcc_assert (ncopies == 1);
- gcc_assert (i == 0);
-
- tree vec_init_def_type = TREE_TYPE (vec_init_def);
- tree induc_val_vec
-   = build_vector_from_val (vec_init_def_type, induc_val);
-
- add_phi_arg (phi, induc_val_vec, loop_preheader_edge (loop),
-  UNKNOWN_LOCATION);
-   }
- else
-   add_phi_arg (phi, vec_init_def, loop_preheader_edge (loop),
-UNKNOWN_LOCATION);
+ add_phi_arg (phi, vec_init_def, loop_preheader_edge (loop),
+  UNKNOWN_LOCATION);
 
   /* Set the loop-latch arg for the reduction-phi.  */
   if (j > 0)
@@ -4652,12 +4642,6 @@ vect_create_epilog_for_reduction (vecstmt);
-  /* For MINUS_EXPR the initial vector is [init_val,0,...,0], therefore,
- partial results are added and not subtracted.  */
-  if (code == MINUS_EXPR) 
-code = PLUS_EXPR;
-
   /* SLP reduction without reduction chain, e.g.,
  # a1 = phi 
  # b1 = phi 
@@ -5049,20 +5033,6 @@ vect_create_epilog_for_reduction (vecstmt);
 
   vect_create_epilog_for_reduction (vect_defs, stmt_info, reduc_def_phi,
-   epilog_copies, reduc_fn, phis,
+   orig_code, epilog_copies, reduc_fn, phis,
double_reduc, slp_node, slp_node_instance,
-   

Re: [PATCH][RFC] Come up with VEC_COND_OP_EXPRs.

2019-09-24 Thread Richard Biener
On Tue, Sep 24, 2019 at 1:11 PM Richard Sandiford
 wrote:
>
> Martin Liška  writes:
> > Hi.
> >
> > The patch introduces a couple of new TREE_CODEs that will help us to have
> > a proper GIMPLE representation of the current VEC_COND_EXPR. Right now,
> > the first argument is typically a GENERIC tcc_expression tree with 2
> > operands that are visited at various places in GIMPLE code. That said,
> > based on the discussion with Richi, I'm suggesting to come up with e.g.
> > VECT_COND_LT_EXPR. Such a change logically introduces new
> > GIMPLE_QUATERNARY_RHS gassignments. For now, the VEC_COND_EXPR remains
> > and is only valid in GENERIC, and the gimplifier will take care of the
> > corresponding transition.
> >
> > The patch is a prototype and the missing bits are:
> > - folding support for GIMPLE_QUATERNARY_RHS is missing
> > - fancy tcc_comparison expressions like LTGT_EXPR, UNORDERED_EXPR,
> >   ORDERED_EXPR, UNLT_EXPR and others are not supported right now
> > - comments are missing for various functions added
> >
> > Apart from that I was able to bootstrap and run tests with quite small
> > fallout.
> > Thoughts?
> > Martin
>
> I think this is going in the wrong direction.  There are some targets
> that can only handle VEC_COND_EXPRs well if we know the associated
> condition, and others where a compare-and-VEC_COND_EXPR will always be
> two operations.  In that situation, it seems like the native gimple
> representation should be the simpler representation rather than the
> more complex one.  That way the comparisons can be optimised
> independently of any VEC_COND_EXPRs on targets that benefit from that.
>
> So IMO it would be better to use three-operand VEC_COND_EXPRs with
> no embedded conditions as the preferred gimple representation and
> have internal functions for the fused operations that some targets
> prefer.  This means that using fused operations is "just" an instruction
> selection decision rather than hard-coded throughout gimple.  (And that
> fits in well with the idea of doing more instruction selection in gimple.)

So I've been doing that before, but more generally also for COND_EXPR.
We cannot rely on TER and the existing RTL expansion "magic" for the
instruction selection issue you mention because TER isn't reliable.  With
IFNs for optabs we could do actual [vector] condition instruction selection
before RTL expansion, ignoring "single-use" issues - is that what you are
hinting at?  How should the vectorizer deal with this?  Should it directly
use the optab IFNs then when facing "split" COND_EXPRs?  IIRC the
most fallout of a simple patch (adjusting is_gimple_condexpr) is in the
vectorizer.

Note I'm specifically looking for a solution that applies to both COND_EXPR
and VEC_COND_EXPR since both suffer from the same issues.

There was also recent work in putting back possibly trapping comparisons
into [VEC_]COND_EXPR because it doesn't interfere with EH and allows
better code.  Also you SVE people had VN issues with cond-exprs and
VN runs into the exact same issue (but would handle separate comparisons
better - with the caveat of breaking TER).

Richard.

>
> Thanks,
> Richard


Re: [PATCH] Use more switch statements.

2019-09-24 Thread Richard Biener
On Tue, Sep 24, 2019 at 12:15 PM Martin Liška  wrote:
>
> Hi.
>
> The patch is a refactoring where we use more switch statements
> rather than if-elseif-elseif chains.
>
> I've been testing the patch.
> Ready to be installed after tests?

OK.

Richard.

> Martin
>
> gcc/ChangeLog:
>
> 2019-09-24  Martin Liska  
>
> * cfgexpand.c (gimple_assign_rhs_to_tree): Use switch statement
> instead of if-elseif-elseif-...
> * gimple-expr.c (extract_ops_from_tree): Likewise.
> * gimple.c (get_gimple_rhs_num_ops): Likewise.
> * tree-ssa-forwprop.c (rhs_to_tree): Likewise.
> ---
>  gcc/cfgexpand.c | 62 -
>  gcc/gimple-expr.c   | 59 ---
>  gcc/gimple.c| 22 ---
>  gcc/tree-ssa-forwprop.c | 29 ++-
>  4 files changed, 90 insertions(+), 82 deletions(-)
>
>
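For readers without the patch body at hand, the kind of change being
approved can be sketched as follows.  This is an illustration only: the enum
rhs_class and both functions are hypothetical and are not the GCC
definitions touched by the patch.

```c
#include <assert.h>

/* Hypothetical operand classes; not the enum used in GCC.  */
enum rhs_class { RHS_INVALID, RHS_SINGLE, RHS_UNARY, RHS_BINARY, RHS_TERNARY };

/* Before: an if-elseif chain that tests one value repeatedly.  */
unsigned num_ops_chain (enum rhs_class c)
{
  if (c == RHS_SINGLE || c == RHS_UNARY)
    return 1;
  else if (c == RHS_BINARY)
    return 2;
  else if (c == RHS_TERNARY)
    return 3;
  else
    return 0;
}

/* After: the same mapping written as a switch on that value.  */
unsigned num_ops_switch (enum rhs_class c)
{
  switch (c)
    {
    case RHS_SINGLE:
    case RHS_UNARY:
      return 1;
    case RHS_BINARY:
      return 2;
    case RHS_TERNARY:
      return 3;
    default:
      return 0;
    }
}
```

The behaviour is identical; the switch makes the dispatch on a single value
explicit.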


Re: [PATCH] Fix up __builtin_alloca_with_align (0, ...) folding (PR sanitizer/91707)

2019-09-24 Thread Richard Biener
On Tue, 24 Sep 2019, Jakub Jelinek wrote:

> Hi!
> 
> build_array_type_nelts is only meaningful for non-zero number of elements,
> for 0 it creates weirdo arrays like char D.2358[0:18446744073709551615].
> The following patch uses in that case types like the C FE emits for
> zero-length array instead (i.e. char D.2358[0:] with forced 0 size).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Not sure [0:-1] is actually the canonical zero-length array (and IIRC
what the C++ FE creates and what layout_type can lay out).  So why
not fix the sanitizers instead?

Richard.

> 2019-09-24  Jakub Jelinek  
> 
>   PR sanitizer/91707
>   * tree-ssa-ccp.c (fold_builtin_alloca_with_align): For n_elem 0
>   use a type like C zero length array instead of array from 0
>   to SIZE_MAX.
> 
> --- gcc/tree-ssa-ccp.c.jj 2019-09-20 12:25:26.809718354 +0200
> +++ gcc/tree-ssa-ccp.c2019-09-23 19:38:03.530722874 +0200
> @@ -2223,7 +2223,18 @@ fold_builtin_alloca_with_align (gimple *
>/* Declare array.  */
>elem_type = build_nonstandard_integer_type (BITS_PER_UNIT, 1);
>n_elem = size * 8 / BITS_PER_UNIT;
> -  array_type = build_array_type_nelts (elem_type, n_elem);
> +  if (n_elem == 0)
> +{
> +  /* For alloca (0), use array type similar to C zero-length arrays.  */
> +  tree range_type = build_range_type (sizetype, size_zero_node, NULL_TREE);
> +  array_type = build_array_type (elem_type, range_type);
> +  array_type = build_distinct_type_copy (TYPE_MAIN_VARIANT (array_type));
> +  TYPE_SIZE (array_type) = bitsize_zero_node;
> +  TYPE_SIZE_UNIT (array_type) = size_zero_node;
> +  SET_TYPE_STRUCTURAL_EQUALITY (array_type);
> +}
> +  else
> +array_type = build_array_type_nelts (elem_type, n_elem);
>var = create_tmp_var (array_type);
>SET_DECL_ALIGN (var, TREE_INT_CST_LOW (gimple_call_arg (stmt, 1)));
>if (uid != 0)
> 
>   Jakub
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 247165 (AG München)

Re: [PATCH] Fold ((T)(A + CST1)) + CST2 -> (T)(A) + (T)CST1 + CST2 for unsigned T and undefined overflow A + CST1 (PR middle-end/91866)

2019-09-24 Thread Richard Biener
On Tue, 24 Sep 2019, Jakub Jelinek wrote:

> Hi!
> 
> As mentioned in the PR, the following patch decreases the number of +/-
> operations when one is inside a sign extension, done with undefined overflow,
> and the outer one uses wrapping arithmetic.
> 
> The :s as well as the outer + are there in order to make sure it is actually
> beneficial, that we decrease the number of +/-, either from two to one or
> from two to zero, depending on whether they cancel each other.
> The price for the reduction is that we lose information about the undefined
> behavior for certain @0 values.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2019-09-24  Jakub Jelinek  
> 
>   PR middle-end/91866
>   * match.pd (((T)(A)) + CST -> (T)(A + CST)): Formatting fix.
>   (((T)(A + CST1)) + CST2 -> (T)(A) + (T)CST1 + CST2): New optimization.
> 
>   * gcc.dg/tree-ssa/pr91866.c: New test.
> 
> --- gcc/match.pd.jj   2019-09-21 23:53:52.108385196 +0200
> +++ gcc/match.pd  2019-09-24 10:18:58.804114496 +0200
> @@ -2265,8 +2265,9 @@ (define_operator_list COND_TERNARY
> max_ovf = wi::OVF_OVERFLOW;
>  tree inner_type = TREE_TYPE (@0);
>  
> -wide_int w1 = wide_int::from (wi::to_wide (@1), TYPE_PRECISION (inner_type),
> - TYPE_SIGN (inner_type));
> + wide_int w1
> +   = wide_int::from (wi::to_wide (@1), TYPE_PRECISION (inner_type),
> + TYPE_SIGN (inner_type));
>  
>  wide_int wmin0, wmax0;
>  if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE)
> @@ -2280,6 +2281,20 @@ (define_operator_list COND_TERNARY
>   )))
>  #endif
>  
> +/* ((T)(A + CST1)) + CST2 -> (T)(A) + (T)CST1 + CST2  */
> +#if GIMPLE
> +  (for op (plus minus)
> +   (simplify
> +(plus (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> + (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> +   && TREE_CODE (type) == INTEGER_TYPE
> +   && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> +   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> +   && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> +   && TYPE_OVERFLOW_WRAPS (type))
> +   (plus (convert @0) (op @2 (convert @1))
> +#endif
> +
>/* ~A + A -> -1 */
>(simplify
> (plus:c (bit_not @0) @0)
> --- gcc/testsuite/gcc.dg/tree-ssa/pr91866.c.jj	2019-09-24 10:35:34.035784152 +0200
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr91866.c	2019-09-24 10:36:59.858463024 +0200
> @@ -0,0 +1,12 @@
> +/* PR middle-end/91866 */
> +/* { dg-do compile { target { ilp32 || lp64 } } } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times " \\+ 11;" 3 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \[+-] \[0-9-]\[0-9]*;" 3 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "\\(long long unsigned int\\) x_" 5 "optimized" } } */
> +
> +unsigned long long f1 (int x) { return (x + 1) - 1ULL; }
> +unsigned long long f2 (int x) { return (x - 5) + 5ULL; }
> +unsigned long long f3 (int x) { return (x - 15) + 26ULL; }
> +unsigned long long f4 (int x) { return (x + 6) + 5ULL; }
> +unsigned long long f5 (int x) { return (x - (-1 - __INT_MAX__)) + 10ULL - __INT_MAX__; }
> 
>   Jakub
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 247165 (AG München)

Re: [PATCH, Fortran] Optionally suppress no-automatic overwrites recursive warning - for review

2019-09-24 Thread Mark Eggleston


On 20/09/2019 07:46, Bernhard Reutner-Fischer wrote:

On Thu, 19 Sep 2019 17:46:29 +0200
Tobias Burnus  wrote:


Hi Mark,

On 9/19/19 3:40 PM, Mark Eggleston wrote:

The following warning is produced when -fno-automatic and -frecursive
are used at the same time:

f951: Warning: Flag '-fno-automatic' overwrites '-frecursive'

This patch allows the warning to be switched off using a new option,
-Woverwrite-recursive, initialised to on.

I don't have a test case for this as I don't know how to test for a
warning that isn't related to a line of code.

Try:

! { dg-warning "Flag .-fno-automatic. overwrites .-frecursive." "" { target *-*-* } 0 }

The syntax is { dg-warning "message" "label" { target ... } linenumber },
where linenumber = 0 means it can be on any line.

Thanks, that was the bit I was missing. Test cases now added.


If the output doesn't match (but I think it does with "Warning:"),
general messages can be caught with "dg-message".

Also:


@@ -411,7 +411,7 @@ gfc_post_options (const char **pfilename)
&& flag_max_stack_var_size != 0)
  gfc_warning_now (0, "Flag %<-fno-automatic%> overwrites %<-fmax-stack-var-size=%d%>",
 flag_max_stack_var_size);
-  else if (!flag_automatic && flag_recursive)
+  else if (!flag_automatic && flag_recursive && warn_overwrite_recursive)
  gfc_warning_now (0, "Flag %<-fno-automatic%> overwrites %<-frecursive%>");
else if (!flag_automatic && flag_openmp)
  gfc_warning_now (0, "Flag %<-fno-automatic%> overwrites %<-frecursive%> implied by "


Doesn't look right to me. Do you want
gfc_warning_now (OPT_Woverwrite_recursive, "Flag ...
instead?

Done.
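The difference between the two variants can be shown with a toy model in
plain C.  Everything below is hypothetical (enum opt_code, warning_now,
set_option) and only imitates the idea of a diagnostic call that takes an
option code; it is not the gfortran diagnostic API.  When the option code is
passed to the call, the diagnostic layer itself decides whether to emit the
message and can label it with the option name, so the caller needs no extra
flag test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical option codes; OPT_NONE means "not controlled by an option".  */
enum opt_code { OPT_NONE, OPT_WOVERWRITE_RECURSIVE, OPT_COUNT };

/* Per-option enabled state; the warning is on by default.  */
static bool opt_enabled[OPT_COUNT] = { true, true };

void set_option (enum opt_code opt, bool on)
{
  assert (opt < OPT_COUNT);
  opt_enabled[opt] = on;
}

/* Toy stand-in for a diagnostic call taking an option code.  Returns true
   if the warning was emitted, false if the option suppressed it.  */
bool warning_now (enum opt_code opt, const char *msg)
{
  assert (opt < OPT_COUNT);
  if (opt != OPT_NONE && !opt_enabled[opt])
    return false;
  fprintf (stderr, "Warning: %s%s\n", msg,
           opt == OPT_WOVERWRITE_RECURSIVE ? " [-Woverwrite-recursive]" : "");
  return true;
}
```

With opt == OPT_NONE the caller would have to test the flag by hand, as in
the first version of the patch; with the option code passed in, disabling
the option is enough to silence the message.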


thanks,


Additionally I realised that I hadn't updated the manual.

Updated patch is attached.

Updated change log:

gcc/fortran

    Mark Eggleston 

    * invoke.texi: Add -Wno-overwrite-recursive to list of options. Add
    description of -Wno-overwrite-recursive. Fix typo in description
    of -Winteger-division.
    * lang.opt: Add option -Woverwrite-recursive initialised as on.
    * options.c (gfc_post_options): Output warning only if it is enabled.

gcc/testsuite

    Mark Eggleston 

    * gfortran.dg/no_overwrite_recursive_1.f90: New test.
    * gfortran.dg/no_overwrite_recursive_2.f90: New test.

OK to commit?

Mark




From e3c40212a648d9fbf8ea33525a943cc85e0652b0 Mon Sep 17 00:00:00 2001
From: Mark Eggleston 
Date: Tue, 16 Apr 2019 09:09:12 +0100
Subject: [PATCH] Suppress warning with -Wno-overwrite-recursive

The message "Warning: Flag '-fno-automatic' overwrites '-frecursive'" is
output by default when -fno-automatic and -frecursive are used together.
It warns that recursion may be broken; however, if all the relevant variables
in the recursive procedure have automatic attributes, the warning is
unnecessary, so -Wno-overwrite-recursive can be used to suppress it. This
allows compilation when warnings are regarded as errors.

Suppress warning with -Wno-overwrite-recursive
---
 gcc/fortran/invoke.texi  | 20 +++-
 gcc/fortran/lang.opt |  4 
 gcc/fortran/options.c|  5 +++--
 .../gfortran.dg/no_overwrite_recursive_1.f90 | 11 +++
 .../gfortran.dg/no_overwrite_recursive_2.f90 | 10 ++
 5 files changed, 43 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/no_overwrite_recursive_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/no_overwrite_recursive_2.f90

diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 40eeadd00db..b60749b3755 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -149,10 +149,11 @@ and warnings}.
 -Wc-binding-type -Wcharacter-truncation -Wconversion @gol
 -Wdo-subscript -Wfunction-elimination -Wimplicit-interface @gol
 -Wimplicit-procedure -Wintrinsic-shadow -Wuse-without-only @gol
--Wintrinsics-std -Wline-truncation -Wno-align-commons -Wno-tabs @gol
--Wreal-q-constant -Wsurprising -Wunderflow -Wunused-parameter @gol
--Wrealloc-lhs -Wrealloc-lhs-all -Wfrontend-loop-interchange @gol
--Wtarget-lifetime -fmax-errors=@var{n} -fsyntax-only -pedantic @gol
+-Wintrinsics-std -Wline-truncation -Wno-align-commons @gol
+-Wno-overwrite-recursive -Wno-tabs -Wreal-q-constant -Wsurprising @gol
+-Wunderflow -Wunused-parameter -Wrealloc-lhs -Wrealloc-lhs-all @gol
+-Wfrontend-loop-interchange -Wtarget-lifetime -fmax-errors=@var{n} @gol
+-fsyntax-only -pedantic @gol
 -pedantic-errors @gol
 }
 
@@ -997,7 +998,7 @@ nor has been declared as @code{EXTERNAL}.
 @opindex @code{Winteger-division}
 @cindex warnings, integer division
 @cindex warnings, division of integers
-Warn if a constant integer division truncates it result.
+Warn if a constant integer division truncates its result.
 As an example, 3/5 evaluates to 0.
 
 @item -Wintrinsics-std
@@ -1010,6 +1011,15 @@ it as @code{EXTERNAL} procedure because of this.  

Re: [PATCH][RFC] Come up with VEC_COND_OP_EXPRs.

2019-09-24 Thread Richard Sandiford
Martin Liška  writes:
> Hi.
>
> The patch introduces a couple of new TREE_CODEs that will help us to have
> a proper GIMPLE representation of the current VEC_COND_EXPR. Right now,
> the first argument is typically a GENERIC tcc_expression tree with 2 operands
> that are visited at various places in GIMPLE code. That said, based on the
> discussion with Richi, I'm suggesting to come up with e.g.
> VECT_COND_LT_EXPR. Such a change logically introduces new
> GIMPLE_QUATERNARY_RHS gassignments. For now, the VEC_COND_EXPR remains
> and is only valid in GENERIC, and the gimplifier will take care of the
> corresponding transition.
>
> The patch is a prototype and the missing bits are:
> - folding support for GIMPLE_QUATERNARY_RHS is missing
> - fancy tcc_comparison expressions like LTGT_EXPR, UNORDERED_EXPR,
>   ORDERED_EXPR, UNLT_EXPR and others are not supported right now
> - comments are missing for various functions added
>
> Apart from that I was able to bootstrap and run tests with quite small
> fallout.
> Thoughts?
> Martin

I think this is going in the wrong direction.  There are some targets
that can only handle VEC_COND_EXPRs well if we know the associated
condition, and others where a compare-and-VEC_COND_EXPR will always be
two operations.  In that situation, it seems like the native gimple
representation should be the simpler representation rather than the
more complex one.  That way the comparisons can be optimised
independently of any VEC_COND_EXPRs on targets that benefit from that.

So IMO it would be better to use three-operand VEC_COND_EXPRs with
no embedded conditions as the preferred gimple representation and
have internal functions for the fused operations that some targets
prefer.  This means that using fused operations is "just" an instruction
selection decision rather than hard-coded throughout gimple.  (And that
fits in well with the idea of doing more instruction selection in gimple.)

Thanks,
Richard


[PATCH] Fix up __builtin_alloca_with_align (0, ...) folding (PR sanitizer/91707)

2019-09-24 Thread Jakub Jelinek
Hi!

build_array_type_nelts is only meaningful for non-zero number of elements,
for 0 it creates weirdo arrays like char D.2358[0:18446744073709551615].
The following patch uses in that case types like the C FE emits for
zero-length array instead (i.e. char D.2358[0:] with forced 0 size).
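The odd upper bound 18446744073709551615 is what you get if the maximum
index is computed as "number of elements minus one" in an unsigned 64-bit
type.  A small model in plain C, assuming that this is how the bound is
derived (struct index_range and both functions are hypothetical, not GCC
code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an array domain [min:max]; has_max == 0 models an
   open upper bound such as [0:].  */
struct index_range { size_t min; size_t max; int has_max; };

/* Upper bound derived as nelts - 1 in an unsigned type.  For nelts == 0
   the subtraction wraps around to SIZE_MAX.  */
struct index_range range_from_nelts (size_t nelts)
{
  struct index_range r = { 0, nelts - 1, 1 };
  return r;
}

/* Models the patched path: zero elements get a range with no upper bound,
   everything else keeps the nelts - 1 bound.  */
struct index_range range_for_alloca (size_t nelts)
{
  if (nelts == 0)
    {
      struct index_range r = { 0, 0, 0 };
      return r;
    }
  return range_from_nelts (nelts);
}
```

On a target with a 64-bit size_t, range_from_nelts (0).max is
18446744073709551615, matching the array type shown above.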

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-09-24  Jakub Jelinek  

PR sanitizer/91707
* tree-ssa-ccp.c (fold_builtin_alloca_with_align): For n_elem 0
use a type like C zero length array instead of array from 0
to SIZE_MAX.

--- gcc/tree-ssa-ccp.c.jj   2019-09-20 12:25:26.809718354 +0200
+++ gcc/tree-ssa-ccp.c  2019-09-23 19:38:03.530722874 +0200
@@ -2223,7 +2223,18 @@ fold_builtin_alloca_with_align (gimple *
   /* Declare array.  */
   elem_type = build_nonstandard_integer_type (BITS_PER_UNIT, 1);
   n_elem = size * 8 / BITS_PER_UNIT;
-  array_type = build_array_type_nelts (elem_type, n_elem);
+  if (n_elem == 0)
+{
+  /* For alloca (0), use array type similar to C zero-length arrays.  */
+  tree range_type = build_range_type (sizetype, size_zero_node, NULL_TREE);
+  array_type = build_array_type (elem_type, range_type);
+  array_type = build_distinct_type_copy (TYPE_MAIN_VARIANT (array_type));
+  TYPE_SIZE (array_type) = bitsize_zero_node;
+  TYPE_SIZE_UNIT (array_type) = size_zero_node;
+  SET_TYPE_STRUCTURAL_EQUALITY (array_type);
+}
+  else
+array_type = build_array_type_nelts (elem_type, n_elem);
   var = create_tmp_var (array_type);
   SET_DECL_ALIGN (var, TREE_INT_CST_LOW (gimple_call_arg (stmt, 1)));
   if (uid != 0)

Jakub


Re: [PATCH, nvptx] Expand OpenACC child function arguments to use CUDA params space

2019-09-24 Thread Chung-Lin Tang

Hi Thomas, thanks for the review.

On 2019/9/20 12:28 AM, Thomas Schwinge wrote:

This new implementation works by modifying the GIMPLE for child functions
directly at the very start (before, actually) of RTL expansion

That's now near the other end of the pipeline.;-)  What's the motivation
for putting it there, instead of early in the nvptx offloading
compilation (around 'pass_oacc_device_lower' etc. time, where I would've
assumed this transformation to be done)?  Not asking you to change that
now, but curious for the reason.


I am not sure we have a natural boundary that defines/marks the start of the
offload compiler stages. Maybe if we had an explicit "start_of_offload" pass,
we could embed this processing there, and enable it with a bool-valued target
hook in the accelerator backend (possibly only when ACCEL_COMPILER is defined).

Short of that, I think placing it here before RTL expansion seems the most
well-defined choice, even if we have to handle some optimized obscurity.


and thus
is placed in TARGET_EXPAND_TO_RTL_HOOK, as the core issue is we inherently
need something different generated between the host-fallback vs for the GPU.

(Likewise, different per each offload target.)


The new nvptx_expand_to_rtl_hook modifies the function decl type and
arguments, and scans the gimple body to remove occurrences of .omp_data_i.*
Detection of OpenACC child functions is done through "omp target entrypoint"
and "oacc function" attributes. Because OpenMP target child functions
have a more elaborate wrapper generated for them, this pass only supports
OpenACC right now.
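As a rough picture of the transformation described above, here is a plain C
sketch of a child function before and after its data record is "exploded"
into individual arguments.  The struct layout, the field n and both function
names are invented for illustration; the real code operates on GIMPLE and on
compiler-generated .omp_data_t records.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the compiler-generated data record.  */
struct omp_data { int *a; int *b; int *c; size_t n; };

/* Standard form: one argument, a reference to the data record.  */
void child_record (const struct omp_data *d)
{
  assert (d != NULL);
  for (size_t i = 0; i < d->n; i++)
    d->c[i] = d->a[i] + d->b[i];
}

/* Exploded form: each record field becomes its own scalar argument, which
   is the shape that can be passed directly as kernel parameters.  */
void child_exploded (int *a, int *b, int *c, size_t n)
{
  for (size_t i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
```

The body is unchanged apart from replacing each field access with the
corresponding argument.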

At the Cauldron, the question indeed has been raised (Jakub, Tom) why not
enabled for OpenMP, too.  My answer was that this surely can be done, but
the change as presented here already is an improvement over the current
status ("stands on its own", as Jeff Law would call it), so I'm fine with
you handling OpenACC first, and then OpenMP can follow later (at some as
yet indeterminate point in time, even).


The OpenMP way of wrapping the user-defined GPU kernel with lots of
initialization code does make this much more tedious, I think.

The question should actually be: can OpenMP simply do this kind of
initialization in the host libgomp runtime like OpenACC does, and make the
nvptx kernel proper more similar between the two?


libgomp has been tested with this patch on x86_64-linux (nvptx-none accelerator)
without regressions

Can you present performance numbers, too?


Haven't got to that yet.


(I'm currently undergoing more gcc tests as well).

As these changes, being confined to nvptx code only, can't possibly have
any effect on other target testing, I assume that's nvptx target testing
you're talking about?  (..., where also I'm not expecting any
disturbance.)


Yeah, I was talking about nvptx-none compiler testing. Haven't found any
changes.


--- gcc/config/nvptx/nvptx.c(revision 275493)

+++ gcc/config/nvptx/nvptx.c(working copy)
+static void
+nvptx_expand_to_rtl_hook (void)
+{
+  /* For utilizing CUDA .param kernel arguments, we detect and modify
+ the gimple of offloaded child functions, here before RTL expansion,
+ starting with standard OMP form:
+  foo._omp_fn.0 (const struct .omp_data_t.8 & restrict .omp_data_i) { ... }
+
+ and transform it into a style where the OMP data record fields are
+ "exploded" into individual scalar arguments:
+  foo._omp_fn.0 (int * a, int * b, int * c) { ... }
+
+ Note that there are implicit assumptions of how OMP lowering (and/or other
+ intervening passes) behaves contained in this transformation code;
+ if those passes change in their output, this code may possibly need
+ updating.  */
+
+  if (lookup_attribute ("omp target entrypoint",
+   DECL_ATTRIBUTES (current_function_decl))
+  /* The rather indirect manner in which OpenMP target functions are
+launched makes this transformation only valid for OpenACC currently.
+TODO: e.g. write_omp_entry(), nvptx_declare_function_name(), etc.
+needs changes for this to work with OpenMP.  */
+  && lookup_attribute ("oacc function",
+  DECL_ATTRIBUTES (current_function_decl))
+  && VOID_TYPE_P (TREE_TYPE (DECL_RESULT (current_function_decl

Why the 'void' return conditional?  (Or, should that rather be an
'gcc_checking_assert' at the top of the following block?)


That's the shape of child functions that omp-low generates. Maybe that should be an
assertion, though; here I'm just doing sanity checking and ignoring otherwise.

Come to think of it, maybe I should try using the assertion to check whether
I'm unintentionally skipping the transformation in some cases...


+{
+  tree omp_data_arg = DECL_ARGUMENTS (current_function_decl);
+  tree argtype = TREE_TYPE (omp_data_arg);
+
+  /* Ensure this function is of the form of a single reference argument
+to the OMP data record, or a single void* argument (when no values
+passed)  

[PATCH] Fold ((T)(A + CST1)) + CST2 -> (T)(A) + (T)CST1 + CST2 for unsigned T and undefined overflow A + CST1 (PR middle-end/91866)

2019-09-24 Thread Jakub Jelinek
Hi!

As mentioned in the PR, the following patch decreases the number of +/-
operations when one is inside a sign extension, done with undefined overflow,
and the outer one uses wrapping arithmetic.

The :s as well as the outer + are there in order to make sure it is actually
beneficial, that we decrease the number of +/-, either from two to one or
from two to zero, depending on whether they cancel each other.
The price for the reduction is that we lose information about the undefined
behavior for certain @0 values.
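
To illustrate the arithmetic behind the fold, here is a small sketch in plain C++ (not match.pd syntax). The function names are made up for illustration; the expression mirrors f3 from the new test case. It assumes inputs for which the inner signed subtraction does not overflow, since that case is undefined behavior anyway.

```cpp
#include <cassert>
#include <climits>

// Before the fold: one signed subtraction (overflow undefined), a sign
// extension to 64 bits, then one wrapping unsigned addition.
unsigned long long before_fold(int x) { return (x - 15) + 26ULL; }

// After the fold: sign-extend first, then a single wrapping addition of the
// combined constant (26 - 15 == 11).
unsigned long long after_fold(int x) { return (unsigned long long)x + 11ULL; }

// Hypothetical helper: compares both forms for an input where x - 15 is
// well defined.
bool fold_agrees(int x) {
  assert(x >= INT_MIN + 15);
  return before_fold(x) == after_fold(x);
}
```

For x below INT_MIN + 15 the original expression has undefined behavior, which is the information the folded form no longer carries.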

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-09-24  Jakub Jelinek  

PR middle-end/91866
* match.pd (((T)(A)) + CST -> (T)(A + CST)): Formatting fix.
(((T)(A + CST1)) + CST2 -> (T)(A) + (T)CST1 + CST2): New optimization.

* gcc.dg/tree-ssa/pr91866.c: New test.

--- gcc/match.pd.jj 2019-09-21 23:53:52.108385196 +0200
+++ gcc/match.pd	2019-09-24 10:18:58.804114496 +0200
@@ -2265,8 +2265,9 @@ (define_operator_list COND_TERNARY
  max_ovf = wi::OVF_OVERFLOW;
 tree inner_type = TREE_TYPE (@0);
 
-wide_int w1 = wide_int::from (wi::to_wide (@1), TYPE_PRECISION (inner_type),
-   TYPE_SIGN (inner_type));
+   wide_int w1
+ = wide_int::from (wi::to_wide (@1), TYPE_PRECISION (inner_type),
+   TYPE_SIGN (inner_type));
 
 wide_int wmin0, wmax0;
 if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE)
@@ -2280,6 +2281,20 @@ (define_operator_list COND_TERNARY
  )))
 #endif
 
+/* ((T)(A + CST1)) + CST2 -> (T)(A) + (T)CST1 + CST2  */
+#if GIMPLE
+  (for op (plus minus)
+   (simplify
+(plus (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
+ (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
+ && TREE_CODE (type) == INTEGER_TYPE
+ && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+ && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
+ && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
+ && TYPE_OVERFLOW_WRAPS (type))
+   (plus (convert @0) (op @2 (convert @1))))))
+#endif
+
   /* ~A + A -> -1 */
   (simplify
(plus:c (bit_not @0) @0)
--- gcc/testsuite/gcc.dg/tree-ssa/pr91866.c.jj	2019-09-24 10:35:34.035784152 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr91866.c	2019-09-24 10:36:59.858463024 +0200
@@ -0,0 +1,12 @@
+/* PR middle-end/91866 */
+/* { dg-do compile { target { ilp32 || lp64 } } } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times " \\+ 11;" 3 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \[+-] \[0-9-]\[0-9]*;" 3 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\(long long unsigned int\\) x_" 5 "optimized" } } */
+
+unsigned long long f1 (int x) { return (x + 1) - 1ULL; }
+unsigned long long f2 (int x) { return (x - 5) + 5ULL; }
+unsigned long long f3 (int x) { return (x - 15) + 26ULL; }
+unsigned long long f4 (int x) { return (x + 6) + 5ULL; }
+unsigned long long f5 (int x) { return (x - (-1 - __INT_MAX__)) + 10ULL - __INT_MAX__; }

Jakub


Re: [PATCH] Fix ICE when __builtin_calloc has no LHS (PR tree-optimization/91014).

2019-09-24 Thread Martin Liška
On 9/24/19 11:14 AM, Thomas Schwinge wrote:
> Hi!
> 
> Curious: even if you found the issue on a s390x target, shouldn't this
> (presumably generic?) test case live in a generic place instead of
> 'gcc.target/s390/'?

Sure, that's logical and I've just tested that locally on x86_64-linux-gnu.

Ready to be installed?
Thanks,
Martin

> 
> 
> Grüße
>  Thomas

From cb3f2ae1b00129dae4854e370b1049e8d2f01f97 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 24 Sep 2019 12:31:56 +0200
Subject: [PATCH] Move a target test-case to generic folder.

gcc/testsuite/ChangeLog:

2019-09-24  Martin Liska  

	* gcc.target/s390/pr91014.c: Move to ...
	* gcc.dg/pr91014.c: ... this.
---
 gcc/testsuite/{gcc.target/s390 => gcc.dg}/pr91014.c | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename gcc/testsuite/{gcc.target/s390 => gcc.dg}/pr91014.c (100%)

diff --git a/gcc/testsuite/gcc.target/s390/pr91014.c b/gcc/testsuite/gcc.dg/pr91014.c
similarity index 100%
rename from gcc/testsuite/gcc.target/s390/pr91014.c
rename to gcc/testsuite/gcc.dg/pr91014.c
-- 
2.23.0



[PATCH][RFC] Come up with VEC_COND_OP_EXPRs.

2019-09-24 Thread Martin Liška
Hi.

The patch introduces a couple of new TREE_CODEs that will help us to have
a proper GIMPLE representation of the current VEC_COND_EXPR. Right now,
the first argument is typically a GENERIC tcc_expression tree with 2 operands
that are visited at various places in GIMPLE code. That said, based on the
discussion with Richi, I'm suggesting to come up with e.g.
VEC_COND_LT_EXPR. Such a change logically
introduces new GIMPLE_QUATERNARY_RHS gassignments. For now, the VEC_COND_EXPR
remains and is only valid in GENERIC, and the gimplifier will take care of the
corresponding transition.

The patch is a prototype and missing bits are:
- folding support addition for GIMPLE_QUATERNARY_RHS is missing
- fancy tcc_comparison expressions like LTGT_EXPR, UNORDERED_EXPR, ORDERED_EXPR,
  UNLT_EXPR and others are not supported right now
- comments are missing for various functions added

Apart from that, I was able to bootstrap and run the tests with quite small
fallout.
Thoughts?
Martin

---
 gcc/cfgexpand.c | 33 -
 gcc/expr.c  | 36 +-
 gcc/expr.h  |  2 +-
 gcc/gimple-expr.c   | 14 +-
 gcc/gimple-expr.h   |  6 +--
 gcc/gimple-fold.c   | 15 +-
 gcc/gimple-match-head.c |  3 ++
 gcc/gimple-pretty-print.c   | 76 
 gcc/gimple.c| 95 ++-
 gcc/gimple.h| 82 +-
 gcc/gimplify.c  | 42 +++-
 gcc/optabs.c| 58 +-
 gcc/optabs.h|  2 +-
 gcc/tree-cfg.c  | 99 -
 gcc/tree-inline.c   |  2 +-
 gcc/tree-ssa-forwprop.c | 11 +++--
 gcc/tree-ssa-loop-niter.c   |  4 +-
 gcc/tree-ssa-operands.c |  1 -
 gcc/tree-ssa-pre.c  |  5 +-
 gcc/tree-ssa-reassoc.c  |  4 +-
 gcc/tree-ssa-scopedtables.c | 46 -
 gcc/tree-ssa-scopedtables.h |  2 +
 gcc/tree-vect-generic.c | 53 +++-
 gcc/tree-vect-loop.c| 50 ---
 gcc/tree-vect-patterns.c|  4 +-
 gcc/tree-vect-stmts.c   | 17 ---
 gcc/tree.def|  7 +++
 gcc/tree.h  | 64 
 28 files changed, 620 insertions(+), 213 deletions(-)


diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index a2f96239e2f..eb5ada52a3b 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -106,6 +106,12 @@ gimple_assign_rhs_to_tree (gimple *stmt)
   tree t;
   switch (get_gimple_rhs_class (gimple_expr_code (stmt)))
 {
+case GIMPLE_QUATERNARY_RHS:
+  t = build4 (gimple_assign_rhs_code (stmt),
+		  TREE_TYPE (gimple_assign_lhs (stmt)),
+		  gimple_assign_rhs1 (stmt), gimple_assign_rhs2 (stmt),
+		  gimple_assign_rhs3 (stmt), gimple_assign_rhs4 (stmt));
+  break;
 case GIMPLE_TERNARY_RHS:
   t = build3 (gimple_assign_rhs_code (stmt),
 		  TREE_TYPE (gimple_assign_lhs (stmt)),
@@ -3793,17 +3799,20 @@ expand_gimple_stmt_1 (gimple *stmt)
 	ops.type = TREE_TYPE (lhs);
 	switch (get_gimple_rhs_class (ops.code))
 	  {
-		case GIMPLE_TERNARY_RHS:
-		  ops.op2 = gimple_assign_rhs3 (assign_stmt);
-		  /* Fallthru */
-		case GIMPLE_BINARY_RHS:
-		  ops.op1 = gimple_assign_rhs2 (assign_stmt);
-		  /* Fallthru */
-		case GIMPLE_UNARY_RHS:
-		  ops.op0 = gimple_assign_rhs1 (assign_stmt);
-		  break;
-		default:
-		  gcc_unreachable ();
+	  case GIMPLE_QUATERNARY_RHS:
+		ops.op3 = gimple_assign_rhs4 (assign_stmt);
+		/* Fallthru */
+	  case GIMPLE_TERNARY_RHS:
+		ops.op2 = gimple_assign_rhs3 (assign_stmt);
+		/* Fallthru */
+	  case GIMPLE_BINARY_RHS:
+		ops.op1 = gimple_assign_rhs2 (assign_stmt);
+		/* Fallthru */
+	  case GIMPLE_UNARY_RHS:
+		ops.op0 = gimple_assign_rhs1 (assign_stmt);
+		break;
+	  default:
+		gcc_unreachable ();
 	  }
 	ops.location = gimple_location (stmt);
 
@@ -5172,7 +5181,7 @@ expand_debug_expr (tree exp)
 
 /* Vector stuff.  For most of the codes we don't have rtl codes.  */
 case REALIGN_LOAD_EXPR:
-case VEC_COND_EXPR:
+CASE_VEC_COND_EXPR:
 case VEC_PACK_FIX_TRUNC_EXPR:
 case VEC_PACK_FLOAT_EXPR:
 case VEC_PACK_SAT_EXPR:
diff --git a/gcc/expr.c b/gcc/expr.c
index 2f2b53f8b69..de18229f162 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8450,7 +8450,7 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
   int ignore;
   bool reduce_bit_field;
   location_t loc = ops->location;
-  tree treeop0, treeop1, treeop2;
+  tree treeop0, treeop1, treeop2, treeop3;
 #define REDUCE_BIT_FIELD(expr)	(reduce_bit_field			  \
  ? reduce_to_bit_field_precision ((expr), \
   target, \
@@ -8464,13 +8464,15 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
   treeop0 = ops->op0;
   treeop1 = ops->op1;
   treeop2 = ops->op2;
+  treeop3 = ops->op3;
 
   /* We should be called only on simple (binary or unary) expressions,
  exactly those that 

[PATCH] Use more switch statements.

2019-09-24 Thread Martin Liška
Hi.

The patch is a refactoring where we should use
more switch statements rather than if-elseif-elseif
chains.
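
As a rough sketch of the kind of rewrite meant here, the following standalone C++ compares an if-else chain with the equivalent switch. The enum and both functions are hypothetical stand-ins that only borrow the flavor of gimple_rhs_class; they are not GCC code, and the -1 fallback stands in for gcc_unreachable ().

```cpp
#include <cassert>

// Hypothetical stand-in for an RHS classification enum.
enum class RhsClass { Invalid, Ternary, Binary, Unary, Single };

// Chain form: every branch re-tests the same value.
int num_ops_if_chain(RhsClass c) {
  if (c == RhsClass::Ternary)
    return 3;
  else if (c == RhsClass::Binary)
    return 2;
  else if (c == RhsClass::Unary || c == RhsClass::Single)
    return 1;
  else
    return -1;
}

// Switch form: one dispatch on the value, with an explicit default.
int num_ops_switch(RhsClass c) {
  switch (c) {
    case RhsClass::Ternary:
      return 3;
    case RhsClass::Binary:
      return 2;
    case RhsClass::Unary:
    case RhsClass::Single:
      return 1;
    default:
      return -1;
  }
}

// Checks that both forms agree on every enumerator.
void check_equivalence() {
  const RhsClass all[] = {RhsClass::Invalid, RhsClass::Ternary,
                          RhsClass::Binary, RhsClass::Unary, RhsClass::Single};
  for (RhsClass c : all)
    assert(num_ops_if_chain(c) == num_ops_switch(c));
}
```

Besides readability, the switch form lets the compiler warn about unhandled enumerators when no default label is present.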

I've been testing the patch.
Ready to be installed after tests?
Martin

gcc/ChangeLog:

2019-09-24  Martin Liska  

* cfgexpand.c (gimple_assign_rhs_to_tree): Use switch statement
instead of if-elseif-elseif-...
* gimple-expr.c (extract_ops_from_tree): Likewise.
* gimple.c (get_gimple_rhs_num_ops): Likewise.
* tree-ssa-forwprop.c (rhs_to_tree): Likewise.
---
 gcc/cfgexpand.c | 62 -
 gcc/gimple-expr.c   | 59 ---
 gcc/gimple.c| 22 ---
 gcc/tree-ssa-forwprop.c | 29 ++-
 4 files changed, 90 insertions(+), 82 deletions(-)


diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 5a93447f520..a2f96239e2f 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -104,38 +104,38 @@ tree
 gimple_assign_rhs_to_tree (gimple *stmt)
 {
   tree t;
-  enum gimple_rhs_class grhs_class;
-
-  grhs_class = get_gimple_rhs_class (gimple_expr_code (stmt));
-
-  if (grhs_class == GIMPLE_TERNARY_RHS)
-t = build3 (gimple_assign_rhs_code (stmt),
-		TREE_TYPE (gimple_assign_lhs (stmt)),
-		gimple_assign_rhs1 (stmt),
-		gimple_assign_rhs2 (stmt),
-		gimple_assign_rhs3 (stmt));
-  else if (grhs_class == GIMPLE_BINARY_RHS)
-t = build2 (gimple_assign_rhs_code (stmt),
-		TREE_TYPE (gimple_assign_lhs (stmt)),
-		gimple_assign_rhs1 (stmt),
-		gimple_assign_rhs2 (stmt));
-  else if (grhs_class == GIMPLE_UNARY_RHS)
-t = build1 (gimple_assign_rhs_code (stmt),
-		TREE_TYPE (gimple_assign_lhs (stmt)),
-		gimple_assign_rhs1 (stmt));
-  else if (grhs_class == GIMPLE_SINGLE_RHS)
-{
-  t = gimple_assign_rhs1 (stmt);
-  /* Avoid modifying this tree in place below.  */
-  if ((gimple_has_location (stmt) && CAN_HAVE_LOCATION_P (t)
-	   && gimple_location (stmt) != EXPR_LOCATION (t))
-	  || (gimple_block (stmt)
-	  && currently_expanding_to_rtl
-	  && EXPR_P (t)))
-	t = copy_node (t);
+  switch (get_gimple_rhs_class (gimple_expr_code (stmt)))
+{
+case GIMPLE_TERNARY_RHS:
+  t = build3 (gimple_assign_rhs_code (stmt),
+		  TREE_TYPE (gimple_assign_lhs (stmt)),
+		  gimple_assign_rhs1 (stmt), gimple_assign_rhs2 (stmt),
+		  gimple_assign_rhs3 (stmt));
+  break;
+case GIMPLE_BINARY_RHS:
+  t = build2 (gimple_assign_rhs_code (stmt),
+		  TREE_TYPE (gimple_assign_lhs (stmt)),
+		  gimple_assign_rhs1 (stmt), gimple_assign_rhs2 (stmt));
+  break;
+case GIMPLE_UNARY_RHS:
+  t = build1 (gimple_assign_rhs_code (stmt),
+		  TREE_TYPE (gimple_assign_lhs (stmt)),
+		  gimple_assign_rhs1 (stmt));
+  break;
+case GIMPLE_SINGLE_RHS:
+  {
+	t = gimple_assign_rhs1 (stmt);
+	/* Avoid modifying this tree in place below.  */
+	if ((gimple_has_location (stmt) && CAN_HAVE_LOCATION_P (t)
+	 && gimple_location (stmt) != EXPR_LOCATION (t))
+	|| (gimple_block (stmt) && currently_expanding_to_rtl
+		&& EXPR_P (t)))
+	  t = copy_node (t);
+	break;
+  }
+default:
+  gcc_unreachable ();
 }
-  else
-gcc_unreachable ();
 
   if (gimple_has_location (stmt) && CAN_HAVE_LOCATION_P (t))
 SET_EXPR_LOCATION (t, gimple_location (stmt));
diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
index b0c9f9b671a..4082828e198 100644
--- a/gcc/gimple-expr.c
+++ b/gcc/gimple-expr.c
@@ -528,37 +528,40 @@ void
 extract_ops_from_tree (tree expr, enum tree_code *subcode_p, tree *op1_p,
 		   tree *op2_p, tree *op3_p)
 {
-  enum gimple_rhs_class grhs_class;
-
   *subcode_p = TREE_CODE (expr);
-  grhs_class = get_gimple_rhs_class (*subcode_p);
-
-  if (grhs_class == GIMPLE_TERNARY_RHS)
-{
-  *op1_p = TREE_OPERAND (expr, 0);
-  *op2_p = TREE_OPERAND (expr, 1);
-  *op3_p = TREE_OPERAND (expr, 2);
-}
-  else if (grhs_class == GIMPLE_BINARY_RHS)
-{
-  *op1_p = TREE_OPERAND (expr, 0);
-  *op2_p = TREE_OPERAND (expr, 1);
-  *op3_p = NULL_TREE;
-}
-  else if (grhs_class == GIMPLE_UNARY_RHS)
-{
-  *op1_p = TREE_OPERAND (expr, 0);
-  *op2_p = NULL_TREE;
-  *op3_p = NULL_TREE;
-}
-  else if (grhs_class == GIMPLE_SINGLE_RHS)
+  switch (get_gimple_rhs_class (*subcode_p))
 {
-  *op1_p = expr;
-  *op2_p = NULL_TREE;
-  *op3_p = NULL_TREE;
+case GIMPLE_TERNARY_RHS:
+  {
+	*op1_p = TREE_OPERAND (expr, 0);
+	*op2_p = TREE_OPERAND (expr, 1);
+	*op3_p = TREE_OPERAND (expr, 2);
+	break;
+  }
+case GIMPLE_BINARY_RHS:
+  {
+	*op1_p = TREE_OPERAND (expr, 0);
+	*op2_p = TREE_OPERAND (expr, 1);
+	*op3_p = NULL_TREE;
+	break;
+  }
+case GIMPLE_UNARY_RHS:
+  {
+	*op1_p = TREE_OPERAND (expr, 0);
+	*op2_p = NULL_TREE;
+	*op3_p = NULL_TREE;
+	break;
+  }
+case GIMPLE_SINGLE_RHS:
+  {
+	*op1_p = expr;
+	*op2_p = NULL_TREE;
+	*op3_p = NULL_TREE;
+	break;
+  }
+default:
+  gcc_unreachable ();
 }
-  else
-

Re: [PATCH] PR libstdc++/91788 improve codegen for std::variant::index()

2019-09-24 Thread Jonathan Wakely

On 24/09/19 11:24 +0200, Marc Glisse wrote:

On Tue, 24 Sep 2019, Jonathan Wakely wrote:


On 24/09/19 09:57 +0100, Jonathan Wakely wrote:

On 23/09/19 19:39 +0200, Marc Glisse wrote:

On Mon, 23 Sep 2019, Jonathan Wakely wrote:


If __index_type is a smaller type than size_t, then the result of
size_t(__index_type(-1)) is not equal to size_t(-1), but to an incorrect
value such as size_t(255) or size_t(65535). The old implementation of
variant::index() uses (size_t(__index_type(_M_index + 1)) - 1)
which is always correct, but generates suboptimal code for many common
cases.

When the __index_type is size_t or valueless variants are not possible
we can just return the value directly.

When the number of alternatives is sufficiently small the result of
converting the _M_index value to the corresponding signed type will be
either non-negative or -1. In those cases converting to the signed type
and then to size_t will either produce the correct positive value or
will sign extend -1 to (size_t)-1 as desired.

For the remaining case we keep the existing arithmetic operations to
ensure the correct result.
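
The three conversions discussed above can be compared in isolation with a small sketch. It assumes unsigned char as the index type and uses free functions with made-up names rather than the real variant internals; the signed conversion of 255 to -1 is what GCC produces (and is guaranteed from C++20 on).

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

using index_type = unsigned char;  // assumed stand-in for __index_type

// Plain zero-extension: maps the valueless marker 255 to 255, not size_t(-1).
std::size_t naive(index_type i) { return i; }

// The old arithmetic: add one in the narrow type, widen, subtract one.
std::size_t via_arithmetic(index_type i) {
  return std::size_t(index_type(i + 1)) - 1;
}

// The new shortcut: go through the signed type so 255 sign-extends to -1.
// Only valid while real indices fit in the non-negative signed range.
std::size_t via_signed(index_type i) {
  return static_cast<std::size_t>(
      static_cast<std::make_signed_t<index_type>>(i));
}

// Checks the valueless marker and an ordinary small index.
void check_index_conversions() {
  assert(naive(255) == 255);
  assert(via_arithmetic(255) == std::size_t(-1));
  assert(via_signed(255) == std::size_t(-1));
  assert(via_signed(3) == 3);
}
```

An index such as 200 shows why the signed shortcut is restricted to small alternative counts: via_arithmetic(200) still yields 200, while via_signed(200) does not.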

PR libstdc++/91788 (partial)
* include/std/variant (variant::index()): Improve codegen for cases
where conversion to size_t already works correctly.

Tested x86_64-linux, committed to trunk.


Thanks.

+   if constexpr (is_same_v<__index_type, size_t>)
+ return this->_M_index;

I don't think this special case is useful, gcc has no trouble 
optimizing the other 2 versions to nothing when the types are 
the same. Of course it won't hurt either.


My rationale was that it's much cheaper to instantiate is_same_v than
the __never_valueless() check (and will be even cheaper after
the concepts-cxx2a branch merges, as I plan to make is_same_v use the
__is_same_as built-in to avoid instantiating the std::is_same class
template).

That's probably not a big saving, as the __never_valueless function
template will almost certainly be used by some other member function
anyway.


On the other hand ... a variant with size_t as the index type is
probably vanishingly rare, because it would need tens of thousands of
alternatives.


I thought the code only allowed unsigned char and unsigned short, so 
it would require a platform where size_t is the same as one of 
those...


For some reason I thought it would use size_t when there are more than
64k alternatives, but we just don't support that. So it's never
size_t.

I'm running the tests for the attached fix.



So doing the (sizeof...(_Types) <= __index_type(-1)/2
case first might make more sense.


Er, from a codegen point of view, I would rather start with the 
simplest version (the zero-extension, as in the current code).


--
Marc Glisse
commit 2f997a270eb1360dd3411f4f9a6212fa0dd4e8d6
Author: Jonathan Wakely 
Date:   Tue Sep 24 11:10:24 2019 +0100

Remove check for impossible condition in std::variant::index()

The __index_type is only ever unsigned char or unsigned short, so not
the same type as size_t.

* include/std/variant (variant::index()): Remove impossible case.

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index c0043243ec2..646ef416272 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -1520,9 +1520,7 @@ namespace __variant
   constexpr size_t index() const noexcept
   {
 	using __index_type = typename _Base::__index_type;
-	if constexpr (is_same_v<__index_type, size_t>)
-	  return this->_M_index;
-	else if constexpr (__detail::__variant::__never_valueless<_Types...>())
+	if constexpr (__detail::__variant::__never_valueless<_Types...>())
 	  return this->_M_index;
 	else if constexpr (sizeof...(_Types) <= __index_type(-1) / 2)
 	  return make_signed_t<__index_type>(this->_M_index);


[PATCH] Fix ICE in VN

2019-09-24 Thread Richard Biener


The following fixes an ICE privately reported to me by Jeff.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2019-09-24  Richard Biener  

* tree-ssa-sccvn.c (vn_reference_lookup_3): Valueize MEM_REF
base.

* gcc.dg/torture/20190924-1.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c	(revision 276054)
+++ gcc/tree-ssa-sccvn.c	(working copy)
@@ -2935,8 +2935,9 @@ vn_reference_lookup_3 (ao_ref *ref, tree
  else
return (void *)-1;
}
-  if (TREE_CODE (rhs) != SSA_NAME
- && TREE_CODE (rhs) != ADDR_EXPR)
+  if (TREE_CODE (rhs) == SSA_NAME)
+   rhs = SSA_VAL (rhs);
+  else if (TREE_CODE (rhs) != ADDR_EXPR)
return (void *)-1;
 
   /* The bases of the destination and the references have to agree.  */
Index: gcc/testsuite/gcc.dg/torture/20190924-1.c
===
--- gcc/testsuite/gcc.dg/torture/20190924-1.c   (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/20190924-1.c   (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+struct acct_gather_energy {
+   int base_consumed_energy;
+   int consumed_energy;
+   int previous_consumed_energy;
+};
+static struct acct_gather_energy xcc_energy;
+struct acct_gather_energy *new;
+int _get_joules_task(int first)
+{
+  if (!first && new->previous_consumed_energy)
+first = 1;
+  new->base_consumed_energy = new->consumed_energy;
+  __builtin_memcpy(&xcc_energy, new, sizeof(struct acct_gather_energy));
+  return xcc_energy.base_consumed_energy;
+}


[PATCH] PR libstdc++/91871 fix Clang warnings in testsuite

2019-09-24 Thread Jonathan Wakely

PR libstdc++/91871
* testsuite/util/testsuite_hooks.h
(conversion::iterator_to_const_iterator()): Do not return an invalid
iterator. Test direct-initialization and direct-list-initialization
as well as implicit conversion.

Tested x86_64-linux, normal and debug mode. Committed to trunk.
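
For readers unfamiliar with the three initialization forms named in the ChangeLog, here is a minimal standalone sketch. It is not the testsuite_hooks.h code; the function names are made up, and it assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Exercises three ways of turning Container::iterator into
// Container::const_iterator and checks that they all agree.
template <typename Container>
bool iterator_converts_to_const_iterator() {
  typedef typename Container::const_iterator const_iterator;
  Container v;
  const_iterator a = v.begin();  // implicit conversion (copy-initialization)
  const_iterator b(v.begin());   // direct-initialization
  const_iterator c{v.begin()};   // direct-list-initialization (C++11)
  return a == b && b == c;
}

// Runs the check for two standard containers.
void check_containers() {
  assert(iterator_converts_to_const_iterator<std::vector<int> >());
  assert(iterator_converts_to_const_iterator<std::list<int> >());
}
```

Unlike the old helper in the patch below, nothing here returns an iterator into a container that is destroyed on function exit.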



commit a4d7bc903e708136f8f359a16bd7a982aff40881
Author: Jonathan Wakely 
Date:   Tue Sep 24 11:07:20 2019 +0100

PR libstdc++/91871 fix Clang warnings in testsuite

PR libstdc++/91871
* testsuite/util/testsuite_hooks.h
(conversion::iterator_to_const_iterator()): Do not return an invalid
iterator. Test direct-initialization and direct-list-initialization
as well as implicit conversion.

diff --git a/libstdc++-v3/testsuite/util/testsuite_hooks.h b/libstdc++-v3/testsuite/util/testsuite_hooks.h
index 51c431bf9c0..84a44faa710 100644
--- a/libstdc++-v3/testsuite/util/testsuite_hooks.h
+++ b/libstdc++-v3/testsuite/util/testsuite_hooks.h
@@ -326,13 +326,15 @@ namespace __gnu_test
   typedef typename _Container::const_iterator const_iterator;
 
   // Implicit conversion iterator to const_iterator.
-  static const_iterator
+  static void
   iterator_to_const_iterator()
   {
_Container v;
-   const_iterator it = v.begin();
-   const_iterator end = v.end();
-   return it == end ? v.end() : it;
+   const_iterator i __attribute__((unused)) = const_iterator(v.begin());
+   const_iterator j __attribute__((unused)) = true ? i : v.begin();
+#if __cplusplus >= 201103L
+   const_iterator k __attribute__((unused)) { v.begin() };
+#endif
   }
 };
 


Re: [PATCH] PR libstdc++/91788 improve codegen for std::variant::index()

2019-09-24 Thread Marc Glisse

On Tue, 24 Sep 2019, Jonathan Wakely wrote:


On 24/09/19 09:57 +0100, Jonathan Wakely wrote:

On 23/09/19 19:39 +0200, Marc Glisse wrote:

On Mon, 23 Sep 2019, Jonathan Wakely wrote:


If __index_type is a smaller type than size_t, then the result of
size_t(__index_type(-1)) is not equal to size_t(-1), but to an incorrect
value such as size_t(255) or size_t(65535). The old implementation of
variant::index() uses (size_t(__index_type(_M_index + 1)) - 1)
which is always correct, but generates suboptimal code for many common
cases.

When the __index_type is size_t or valueless variants are not possible
we can just return the value directly.

When the number of alternatives is sufficiently small the result of
converting the _M_index value to the corresponding signed type will be
either non-negative or -1. In those cases converting to the signed type
and then to size_t will either produce the correct positive value or
will sign extend -1 to (size_t)-1 as desired.

For the remaining case we keep the existing arithmetic operations to
ensure the correct result.

PR libstdc++/91788 (partial)
* include/std/variant (variant::index()): Improve codegen for cases
where conversion to size_t already works correctly.

Tested x86_64-linux, committed to trunk.


Thanks.

+   if constexpr (is_same_v<__index_type, size_t>)
+ return this->_M_index;

I don't think this special case is useful, gcc has no trouble optimizing 
the other 2 versions to nothing when the types are the same. Of course it 
won't hurt either.


My rationale was that it's much cheaper to instantiate is_same_v than
the __never_valueless() check (and will be even cheaper after
the concepts-cxx2a branch merges, as I plan to make is_same_v use the
__is_same_as built-in to avoid instantiating the std::is_same class
template).

That's probably not a big saving, as the __never_valueless function
template will almost certainly be used by some other member function
anyway.


On the other hand ... a variant with size_t as the index type is
probably vanishingly rare, because it would need tens of thousands of
alternatives.


I thought the code only allowed unsigned char and unsigned short, so it 
would require a platform where size_t is the same as one of those...



So doing the (sizeof...(_Types) <= __index_type(-1)/2
case first might make more sense.


Er, from a codegen point of view, I would rather start with the simplest 
version (the zero-extension, as in the current code).


--
Marc Glisse


Re: [PATCH] Fix ICE when __builtin_calloc has no LHS (PR tree-optimization/91014).

2019-09-24 Thread Thomas Schwinge
Hi!

Curious: even if you found the issue on a s390x target, shouldn't this
(presumably generic?) test case live in a generic place instead of
'gcc.target/s390/'?


Grüße
 Thomas


On 2019-06-27T11:21:33+0200, Martin Liška  wrote:
> This is quite an obvious change I've noticed during fuzzing
> of the s390x target compiler.
>
> The patch bootstraps on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2019-06-27  Martin Liska  
>
>   PR tree-optimization/91014
>   * tree-ssa-dse.c (initialize_ao_ref_for_dse): Bail out
>   when LHS is NULL_TREE.
>
> gcc/testsuite/ChangeLog:
>
> 2019-06-27  Martin Liska  
>
>   PR tree-optimization/91014
>   * gcc.target/s390/pr91014.c: New test.
> ---
>  gcc/testsuite/gcc.target/s390/pr91014.c | 8 
>  gcc/tree-ssa-dse.c  | 5 +++--
>  2 files changed, 11 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/pr91014.c
>
>
> diff --git a/gcc/testsuite/gcc.target/s390/pr91014.c b/gcc/testsuite/gcc.target/s390/pr91014.c
> new file mode 100644
> index 000..eb37b333b5b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/s390/pr91014.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O" } */
> +/* { dg-require-effective-target alloca } */
> +
> +void foo(void)
> +{
> + __builtin_calloc (1, 1); /* { dg-warning "ignoring return value of '__builtin_calloc' declared with attribute 'warn_unused_result'" } */
> +}
> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
> index 1b1a9f34230..df05a55ce78 100644
> --- a/gcc/tree-ssa-dse.c
> +++ b/gcc/tree-ssa-dse.c
> @@ -129,10 +129,11 @@ initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
>   {
> tree nelem = gimple_call_arg (stmt, 0);
> tree selem = gimple_call_arg (stmt, 1);
> +   tree lhs;
> if (TREE_CODE (nelem) == INTEGER_CST
> -   && TREE_CODE (selem) == INTEGER_CST)
> +   && TREE_CODE (selem) == INTEGER_CST
> +   && (lhs = gimple_call_lhs (stmt)) != NULL_TREE)
>   {
> -   tree lhs = gimple_call_lhs (stmt);
> tree size = fold_build2 (MULT_EXPR, TREE_TYPE (nelem),
>  nelem, selem);
> ao_ref_init_from_ptr_and_size (write, lhs, size);


Re: [PATCH] PR libstdc++/91788 improve codegen for std::variant::index()

2019-09-24 Thread Jonathan Wakely

On 24/09/19 09:57 +0100, Jonathan Wakely wrote:

On 23/09/19 19:39 +0200, Marc Glisse wrote:

On Mon, 23 Sep 2019, Jonathan Wakely wrote:


If __index_type is a smaller type than size_t, then the result of
size_t(__index_type(-1)) is not equal to size_t(-1), but to an incorrect
value such as size_t(255) or size_t(65535). The old implementation of
variant::index() uses (size_t(__index_type(_M_index + 1)) - 1)
which is always correct, but generates suboptimal code for many common
cases.

When the __index_type is size_t or valueless variants are not possible
we can just return the value directly.

When the number of alternatives is sufficiently small the result of
converting the _M_index value to the corresponding signed type will be
either non-negative or -1. In those cases converting to the signed type
and then to size_t will either produce the correct positive value or
will sign extend -1 to (size_t)-1 as desired.

For the remaining case we keep the existing arithmetic operations to
ensure the correct result.

PR libstdc++/91788 (partial)
* include/std/variant (variant::index()): Improve codegen for cases
where conversion to size_t already works correctly.

Tested x86_64-linux, committed to trunk.


Thanks.

+   if constexpr (is_same_v<__index_type, size_t>)
+ return this->_M_index;

I don't think this special case is useful, gcc has no trouble 
optimizing the other 2 versions to nothing when the types are the 
same. Of course it won't hurt either.


My rationale was that it's much cheaper to instantiate is_same_v than
the __never_valueless() check (and will be even cheaper after
the concepts-cxx2a branch merges, as I plan to make is_same_v use the
__is_same_as built-in to avoid instantiating the std::is_same class
template).

That's probably not a big saving, as the __never_valueless function
template will almost certainly be used by some other member function
anyway.


On the other hand ... a variant with size_t as the index type is
probably vanishingly rare, because it would need tens of thousands of
alternatives. So doing the (sizeof...(_Types) <= __index_type(-1)/2
case first might make more sense.




Re: [PATCH] PR libstdc++/91788 improve codegen for std::variant::index()

2019-09-24 Thread Jonathan Wakely

On 23/09/19 19:39 +0200, Marc Glisse wrote:

On Mon, 23 Sep 2019, Jonathan Wakely wrote:


If __index_type is a smaller type than size_t, then the result of
size_t(__index_type(-1)) is not equal to size_t(-1), but to an incorrect
value such as size_t(255) or size_t(65535). The old implementation of
variant::index() uses (size_t(__index_type(_M_index + 1)) - 1)
which is always correct, but generates suboptimal code for many common
cases.

When the __index_type is size_t or valueless variants are not possible
we can just return the value directly.

When the number of alternatives is sufficiently small the result of
converting the _M_index value to the corresponding signed type will be
either non-negative or -1. In those cases converting to the signed type
and then to size_t will either produce the correct positive value or
will sign extend -1 to (size_t)-1 as desired.

For the remaining case we keep the existing arithmetic operations to
ensure the correct result.

PR libstdc++/91788 (partial)
* include/std/variant (variant::index()): Improve codegen for cases
where conversion to size_t already works correctly.

Tested x86_64-linux, committed to trunk.


Thanks.

+   if constexpr (is_same_v<__index_type, size_t>)
+ return this->_M_index;

I don't think this special case is useful, gcc has no trouble 
optimizing the other 2 versions to nothing when the types are the 
same. Of course it won't hurt either.


My rationale was that it's much cheaper to instantiate is_same_v than
the __never_valueless() check (and will be even cheaper after
the concepts-cxx2a branch merges, as I plan to make is_same_v use the
__is_same_as built-in to avoid instantiating the std::is_same class
template).

That's probably not a big saving, as the __never_valueless function
template will almost certainly be used by some other member function
anyway.



Re: [PATCH] driver: Also prune joined switches with negation

2019-09-24 Thread Kyrill Tkachov

Hi Matt,

On 9/24/19 5:04 AM, Matt Turner wrote:

When -march=native is passed through host_detect_local_cpu to the backend,
it overrides all command-line options after it.  That means

$ gcc -march=native -march=armv8-a

is treated as

$ gcc -march=armv8-a -march=native

Prune joined switches with Negative and RejectNegative to allow
-march=armv8-a to override previous -march=native on command-line.
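
The intended "last one wins" effect can be sketched roughly as follows. This is a hypothetical helper written for illustration only, not the driver's actual pruning code; the function name and the use of std::vector are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: keep only the last argument that starts with the
// given joined-switch prefix (e.g. "-march="), leaving all others in place.
std::vector<std::string> prune_joined(const std::vector<std::string>& args,
                                      const std::string& prefix) {
  // Index of the last matching argument; args.size() means "none found".
  std::size_t last = args.size();
  for (std::size_t i = 0; i < args.size(); ++i)
    if (args[i].compare(0, prefix.size(), prefix) == 0)
      last = i;

  std::vector<std::string> out;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const bool matches = args[i].compare(0, prefix.size(), prefix) == 0;
    if (!matches || i == last)
      out.push_back(args[i]);
  }
  return out;
}

// Checks the command line from the example above.
void check_prune() {
  const std::vector<std::string> in = {"-march=native", "-march=armv8-a"};
  const std::vector<std::string> out = prune_joined(in, "-march=");
  assert(out.size() == 1);
  assert(out[0] == "-march=armv8-a");
}
```

With the earlier -march=native pruned this way, only -march=armv8-a is left for the native-detection step to see.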

This is the same fix as was applied for i386 in SVN revision 269164, but for
aarch64 and arm.


The fix is ok for arm and LGTM for aarch64 FWIW.

How has this been tested?

However...



gcc/

    PR driver/69471
    * config/aarch64/aarch64.opt (march=): Add Negative(march=).
    (mtune=): Add Negative(mtune=).
    * config/arm/arm.opt: Likewise.
---
 gcc/config/aarch64/aarch64.opt | 5 +++--
 gcc/config/arm/arm.opt | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 865b6a6d8ca..908dca23b3c 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -119,7 +119,8 @@ EnumValue
 Enum(aarch64_tls_size) String(48) Value(48)

 march=
-Target RejectNegative ToLower Joined Var(aarch64_arch_string)
+Target RejectNegative Negative(march=) ToLower Joined Var(aarch64_arch_string)
+
 Use features of architecture ARCH.

 mcpu=



... Looks like we'll need something similar for -mcpu. On arm and
aarch64, -mcpu is the most commonly used option, and it can also
take a "native" value that would suffer from the same issue, I presume.


Thanks,

Kyrill


@@ -127,7 +128,7 @@ Target RejectNegative ToLower Joined Var(aarch64_cpu_string)

 Use features of and optimize for CPU.

 mtune=
-Target RejectNegative ToLower Joined Var(aarch64_tune_string)
+Target RejectNegative Negative(mtune=) ToLower Joined Var(aarch64_tune_string)

 Optimize for CPU.

 mabi=
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 452f0cf6d67..e3ead5c95d1 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -82,7 +82,7 @@ mapcs-stack-check
 Target Report Mask(APCS_STACK) Undocumented

 march=
-Target RejectNegative ToLower Joined Var(arm_arch_string)
+Target RejectNegative Negative(march=) ToLower Joined Var(arm_arch_string)
 Specify the name of the target architecture.

 ; Other arm_arch values are loaded from arm-tables.opt
@@ -232,7 +232,7 @@ Target Report Mask(TPCS_LEAF_FRAME)
 Thumb: Generate (leaf) stack frames even if not needed.

 mtune=
-Target RejectNegative ToLower Joined Var(arm_tune_string)
+Target RejectNegative Negative(mtune=) ToLower Joined Var(arm_tune_string)
 Tune code for the given processor.

 mprint-tune-info
--
2.21.0



Re: [PR 91831] Copy PARM_DECLs of artificial thunks

2019-09-24 Thread Richard Biener
On Mon, Sep 23, 2019 at 6:59 PM Martin Jambor  wrote:
>
> Hi,
>
> I am quite surprised I did not catch this before but the new
> ipa-param-manipulation does not copy PARM_DECLs when creating artificial
> thunks (I think it originally did but then I somehow removed it during one
> of the cleanups).  Fixed below by adding the capability at the natural place.
> It is triggered whenever context of the PARM_DECL that is just taken
> from the original function does not match the target fndecl rather than
> by some constructor parameter because in such situation it is always the
> correct thing to do.
>
> Bootstrapped and tested on x86_64-linux.  OK for trunk?

OK.

Thanks,
Richard.

> Thanks,
>
> Martin
>
> 2019-09-23  Martin Jambor  
>
> PR ipa/91831
> * ipa-param-manipulation.c (carry_over_param): Make a method of
> ipa_param_body_adjustments, remove now unnecessary argument.  Also copy
> in case of a context mismatch.
> (ipa_param_body_adjustments::common_initialization): Adjust call to
> carry_over_param.
> * ipa-param-manipulation.h (class ipa_param_body_adjustments): Add
> private method carry_over_param.
>
> testsuite/
> * g++.dg/ipa/pr91831.C: New test.
> ---
>  gcc/ipa-param-manipulation.c   | 22 ++
>  gcc/ipa-param-manipulation.h   |  1 +
>  gcc/testsuite/g++.dg/ipa/pr91831.C | 19 +++
>  3 files changed, 34 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ipa/pr91831.C
>
> diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
> index 7f52e9c2506..913b96fefa4 100644
> --- a/gcc/ipa-param-manipulation.c
> +++ b/gcc/ipa-param-manipulation.c
> @@ -906,18 +906,24 @@ ipa_param_body_adjustments::register_replacement (ipa_adjusted_param *apm,
>m_replacements.safe_push (psr);
>  }
>
> -/* Copy or not, as appropriate given ID, a pre-existing PARM_DECL T so that
> -   it can be included in the parameters of the modified function.  */
> +/* Copy or not, as appropriate given m_id and decl context, a pre-existing
> +   PARM_DECL T so that it can be included in the parameters of the modified
> +   function.  */
>
> -static tree
> -carry_over_param (tree t, struct copy_body_data *id)
> +tree
> +ipa_param_body_adjustments::carry_over_param (tree t)
>  {
>tree new_parm;
> -  if (id)
> +  if (m_id)
>  {
> -  new_parm = remap_decl (t, id);
> +  new_parm = remap_decl (t, m_id);
>if (TREE_CODE (new_parm) != PARM_DECL)
> -   new_parm = id->copy_decl (t, id);
> +   new_parm = m_id->copy_decl (t, m_id);
> +}
> +  else if (DECL_CONTEXT (t) != m_fndecl)
> +{
> +  new_parm = copy_node (t);
> +  DECL_CONTEXT (new_parm) = m_fndecl;
>  }
>else
>  new_parm = t;
> @@ -982,7 +988,7 @@ ipa_param_body_adjustments::common_initialization (tree old_fndecl,
>   || apm->prev_clone_adjustment)
> {
>   kept[prev_index] = true;
> - new_parm = carry_over_param (m_oparms[prev_index], m_id);
> + new_parm = carry_over_param (m_oparms[prev_index]);
>   m_new_decls.quick_push (new_parm);
> }
>else if (apm->op == IPA_PARAM_OP_NEW
> diff --git a/gcc/ipa-param-manipulation.h b/gcc/ipa-param-manipulation.h
> index 34477da51b7..8e9554563e4 100644
> --- a/gcc/ipa-param-manipulation.h
> +++ b/gcc/ipa-param-manipulation.h
> @@ -370,6 +370,7 @@ public:
>  private:
>void common_initialization (tree old_fndecl, tree *vars,
>   vec *tree_map);
> +  tree carry_over_param (tree t);
>unsigned get_base_index (ipa_adjusted_param *apm);
>ipa_param_body_replacement *lookup_replacement_1 (tree base,
> unsigned unit_offset);
> diff --git a/gcc/testsuite/g++.dg/ipa/pr91831.C b/gcc/testsuite/g++.dg/ipa/pr91831.C
> new file mode 100644
> index 000..66e4b693151
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ipa/pr91831.C
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 --param uninlined-thunk-insns=1000"  } */
> +
> +struct A {
> +  virtual void m_fn1();
> +};
> +struct B {
> +  virtual void *m_fn2(int, int) = 0;
> +};
> +struct C : A, B {
> +  void *m_fn2(int, int) { return this; }
> +};
> +void *fn1(B &p1) { return p1.m_fn2(0, 0); }
> +
> +int main() {
> +  C c;
> +  fn1(c);
> +  return 0;
> +}
> --
> 2.23.0
>
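
The context-mismatch rule described in the patch can be illustrated with a toy model. This is only a sketch under loud assumptions: fn_decl, parm_decl and param_adjuster are hypothetical stand-ins for GCC trees and for ipa_param_body_adjustments, and the m_id remapping branch of the real function is left out entirely.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for GCC trees; not the real data structures.
struct fn_decl { const char *name; };
struct parm_decl { const char *name; const fn_decl *context; };

struct param_adjuster
{
  const fn_decl *m_fndecl;                          // function being built
  std::vector<std::unique_ptr<parm_decl>> m_copies; // owns the copies made

  // Reuse T when it already belongs to m_fndecl; otherwise copy it and
  // re-parent the copy, leaving the original declaration untouched.
  parm_decl *carry_over_param (parm_decl *t)
  {
    if (t->context == m_fndecl)
      return t;
    m_copies.push_back (std::unique_ptr<parm_decl> (new parm_decl (*t)));
    parm_decl *copy = m_copies.back ().get ();
    copy->context = m_fndecl;
    return copy;
  }
};
```

The point of keying the copy on the context rather than on a constructor flag is that callers cannot forget to ask for it: a parameter that still points at the original function is never handed to the new one unchanged.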


Re: [PR 91832] Do not ICE on negative offsets in ipa-sra

2019-09-24 Thread Richard Biener
On Mon, Sep 23, 2019 at 4:17 PM Martin Jambor  wrote:
>
> Hi,
>
> IPA-SRA asserts that an offset obtained from get_ref_base_and_extent
> is non-negative (after it verifies it is based on a parameter).  That
> assumption is invalid as the testcase shows.  One could probably also write a
> testcase with defined behavior, but unless I see a reasonable one
> where the transformation is really desirable, I'd like to just punt on
> those cases.
>
> Bootstrapped and tested on x86_64-linux.  OK for trunk?

OK.

Richard.

> Thanks,
>
> Martin
>
> 2019-09-23  Martin Jambor  
>
> PR ipa/91832
> * ipa-sra.c (scan_expr_access): Check that offset is non-negative.
>
> testsuite/
> * gcc.dg/ipa/pr91832.c: New test.
> ---
>  gcc/ipa-sra.c  |  7 ++-
>  gcc/testsuite/gcc.dg/ipa/pr91832.c | 12 
>  2 files changed, 18 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr91832.c
>
> diff --git a/gcc/ipa-sra.c b/gcc/ipa-sra.c
> index a32defb59bd..0ccebbd4607 100644
> --- a/gcc/ipa-sra.c
> +++ b/gcc/ipa-sra.c
> @@ -1692,7 +1692,12 @@ scan_expr_access (tree expr, gimple *stmt, isra_scan_context ctx,
>disqualify_split_candidate (desc, "Encountered a bit-field access.");
>return;
>  }
> -  gcc_assert (offset >= 0);
> +  if (offset < 0)
> +{
> +  disqualify_split_candidate (desc, "Encountered an access at a "
> + "negative offset.");
> +  return;
> +}
>gcc_assert ((offset % BITS_PER_UNIT) == 0);
>gcc_assert ((size % BITS_PER_UNIT) == 0);
>if ((offset / BITS_PER_UNIT) >= (UINT_MAX - ISRA_ARG_SIZE_LIMIT)
> diff --git a/gcc/testsuite/gcc.dg/ipa/pr91832.c b/gcc/testsuite/gcc.dg/ipa/pr91832.c
> new file mode 100644
> index 000..4a0d62ec1d9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/pr91832.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2"  } */
> +
> +struct A1 {
> +  char a1[1];
> +};
> +
> +void fn2(char a);
> +
> +void fn1(struct A1 *p1) {
> +  fn2(p1->a1[-1]);
> +}
> --
> 2.23.0
>
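
The change from asserting to punting can be shown in isolation. This is a minimal sketch, not ipa-sra.c itself: candidate and record_access are hypothetical names, and offsets are assumed to be in bits, as the BITS_PER_UNIT asserts in the patch suggest.

```cpp
#include <cassert>
#include <climits>
#include <string>

// Hypothetical stand-in for a split candidate; not the real descriptor.
struct candidate
{
  bool disqualified = false;
  std::string reason;
};

// Return true if the access can be recorded.  A negative offset is
// treated as a reason to give up on the candidate, not as a compiler bug.
bool
record_access (candidate &c, long long offset_bits)
{
  if (offset_bits < 0)
    {
      c.disqualified = true;
      c.reason = "Encountered an access at a negative offset.";
      return false;
    }
  // Invariants that are still expected to hold remain assertions.
  assert (offset_bits % CHAR_BIT == 0);
  return true;
}
```

In the test case, p1->a1[-1] would correspond to an offset of -8 bits on a target with 8-bit bytes, so the candidate is dropped instead of triggering an ICE.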


[Patch, fortran] PR91729 - ICE in gfc_match_select_rank, at fortran/match.c:6586

2019-09-24 Thread Paul Richard Thomas
Fixed as obvious in revision: 276051.

The patch is largely due to Steve, for which thanks.

Paul

2019-09-23  Paul Thomas  

PR fortran/91729
* match.c (gfc_match_select_rank): Initialise 'as' to NULL.
Check for a symtree in the selector expression before trying to
assign a value to 'as'. Revert to gfc_error and go to cleanup
after setting a MATCH_ERROR.

2019-09-23  Paul Thomas  

PR fortran/91729
* gfortran.dg/select_rank_2.f90 : Add two more errors in foo2.
* gfortran.dg/select_rank_3.f90 : New test.


-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


[PATCH], V4.1, patch #2: Add prefixed insn attribute (revised)

2019-09-24 Thread Michael Meissner
This patch revises patch #2, fixing an issue that shows up in compiling large
code like the Spec 2017 benchmark suite.  The issue was that when a vector
register holds a TImode value, the code needs to assume the non-prefixed
instruction uses the DQ encoding.
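
To see why the assumed encoding matters, here is a rough sketch of the decision for a register plus numeric offset address. It is not the code in rs6000.c: form, insn_kind and classify_offset are hypothetical names, and the only facts relied on are that D-form offsets are 16-bit signed, DS-form offsets must also be a multiple of 4, DQ-form offsets must also be a multiple of 16, and prefixed offsets are 34-bit signed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; the real code uses enum non_prefixed and insn_form.
enum class form { D, DS, DQ };
enum class insn_kind { non_prefixed, prefixed, invalid };

insn_kind
classify_offset (std::int64_t offset, form f)
{
  const bool fits16 = offset >= -32768 && offset <= 32767;
  const bool fits34 = offset >= -(INT64_C (1) << 33)
		      && offset < (INT64_C (1) << 33);
  // Low bits that must be zero for the non-prefixed encoding.
  const std::int64_t low_mask = f == form::DQ ? 15 : f == form::DS ? 3 : 0;

  if (fits16 && (offset & low_mask) == 0)
    return insn_kind::non_prefixed;
  if (fits34)
    return insn_kind::prefixed;
  return insn_kind::invalid;
}
```

Under this model an offset of 8 is fine for a D-form instruction but needs a prefixed instruction if the non-prefixed alternative is DQ-form, so guessing the wrong form gives the wrong answer about whether the insn is prefixed.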

I also changed the spelling of PC-relative to be consistent.

The patch is also adjusted due to the changes made in the revised patch #1.

Assuming the revised patch #1 is checked in, can I check in this revised patch
into the trunk?  I did a bootstrap and make check with the patch and there were
no regressions.  I applied the remaining patches, and they also have no
regressions, and they can build the Spec 2017 test suite.

2019-09-23  Michael Meissner  

* config/rs6000/rs6000-protos.h (prefixed_load_p): New
declaration.
(prefixed_store_p): New declaration.
(prefixed_paddi_p): New declaration.
(rs6000_asm_output_opcode): New declaration.
(rs6000_final_prescan_insn): Move declaration and update calling
signature.
(address_is_prefixed): New helper inline function.
* config/rs6000/rs6000.c (rs6000_emit_move): Support loading
PC-relative addresses.
(reg_to_non_prefixed): New function to identify what the
non-prefixed memory instruction format is for a register.
(prefixed_load_p): New function to identify prefixed loads.
(prefixed_store_p): New function to identify prefixed stores.
(prefixed_paddi_p): New function to identify prefixed load
immediates.
(next_insn_prefixed_p): New static state variable.
(rs6000_final_prescan_insn): New function to determine if an insn
uses a prefixed instruction.
(rs6000_asm_output_opcode): New function to emit 'p' in front of a
prefixed instruction.
* config/rs6000/rs6000.h (FINAL_PRESCAN_INSN): New target hook.
(ASM_OUTPUT_OPCODE): New target hook.
* config/rs6000/rs6000.md (prefixed): New insn attribute for
prefixed instructions.
(prefixed_length): New insn attribute for the size of prefixed
instructions.
(non_prefixed_length): New insn attribute for the size of
non-prefixed instructions.
(pcrel_local_addr): New insn to load up a local PC-relative
address.
(pcrel_extern_addr): New insn to load up an external PC-relative
address.

Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 276069)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -189,6 +189,30 @@ enum non_prefixed {
 
 extern enum insn_form address_to_insn_form (rtx, machine_mode,
enum non_prefixed);
+extern bool prefixed_load_p (rtx_insn *);
+extern bool prefixed_store_p (rtx_insn *);
+extern bool prefixed_paddi_p (rtx_insn *);
+extern void rs6000_asm_output_opcode (FILE *);
+extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
+
+/* Return true if the address is a prefixed instruction that can be directly
+   used in a memory instruction (i.e. using numeric offset or a PC-relative
+   reference to a local symbol).
+
+   References to external PC-relative symbols aren't allowed, because GCC has
+   to load the address into a register and then issue a separate load or
+   store.  */
+
+static inline bool
+address_is_prefixed (rtx addr,
+machine_mode mode,
+enum non_prefixed non_prefixed_insn)
+{
+  enum insn_form iform = address_to_insn_form (addr, mode,
+  non_prefixed_insn);
+  return (iform == INSN_FORM_PREFIXED_NUMERIC
+ || iform == INSN_FORM_PCREL_LOCAL);
+}
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
@@ -268,8 +292,6 @@ extern void rs6000_d_target_versions (vo
 const char * rs6000_xcoff_strip_dollar (const char *);
 #endif
 
-void rs6000_final_prescan_insn (rtx_insn *, rtx *operand, int num_operands);
-
 extern unsigned char rs6000_class_max_nregs[][LIM_REG_CLASSES];
 extern unsigned char rs6000_hard_regno_nregs[][FIRST_PSEUDO_REGISTER];
 
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 276069)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -9639,6 +9639,22 @@ rs6000_emit_move (rtx dest, rtx source,
  return;
}
 
+  /* Use the default pattern for loading up PC-relative addresses.  */
+  if (TARGET_PCREL && mode == Pmode
+ && (SYMBOL_REF_P (operands[1]) || LABEL_REF_P (operands[1])
+ || GET_CODE (operands[1]) == CONST))
+   {
+ enum insn_form iform = address_to_insn_form (operands[1], mode,
+  NON_PREFIXED_DEFAULT);
+
+ if (iform == INSN_FORM_PCREL_LOCAL
+ || iform == INSN_FORM_PCREL_EXTERNAL)
+   {
+ emit_insn