Re: [PATCH] Avoid peeling in cunrolli

2017-11-29 Thread Richard Biener
On Wed, 29 Nov 2017, Richard Biener wrote:

> 
> It turns out that we don't vectorize the 2nd testcase in PR83202
> (or rather we do that in weird ways during BB vectorization) because
> cunrolli decides to peel the inner loop completely based on
> the size of the accessed arrays.  That unfortunately leaves exit
> tests in the outer loop body which in turn makes us not vectorize
> the loop.
> 
> We have a late unrolling pass for these kinds of unrolling, so this
> patch simply avoids doing it during cunrolli.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

And this is what I applied after reviewing testsuite regressions of
the first.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2017-11-30  Richard Biener  

PR tree-optimization/83202
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Add
allow_peel argument and guard peeling.
(canonicalize_loop_induction_variables): Likewise.
(canonicalize_induction_variables): Pass false.
(tree_unroll_loops_completely_1): Pass unroll_outer to disallow
peeling from cunrolli.

* gcc.dg/vect/pr83202-1.c: New testcase.
* gcc.dg/tree-ssa/pr61743-1.c: Adjust.

Index: gcc/tree-ssa-loop-ivcanon.c
===
--- gcc/tree-ssa-loop-ivcanon.c (revision 255201)
+++ gcc/tree-ssa-loop-ivcanon.c (working copy)
@@ -679,7 +679,7 @@ try_unroll_loop_completely (struct loop
edge exit, tree niter,
enum unroll_level ul,
HOST_WIDE_INT maxiter,
-   location_t locus)
+   location_t locus, bool allow_peel)
 {
   unsigned HOST_WIDE_INT n_unroll = 0;
   bool n_unroll_found = false;
@@ -711,7 +711,8 @@ try_unroll_loop_completely (struct loop
 exit = NULL;
 
   /* See if we can improve our estimate by using recorded loop bounds.  */
-  if (maxiter >= 0
+  if ((allow_peel || maxiter == 0 || ul == UL_NO_GROWTH)
+  && maxiter >= 0
   && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll))
 {
   n_unroll = maxiter;
@@ -1139,7 +1140,7 @@ try_peel_loop (struct loop *loop,
 static bool
 canonicalize_loop_induction_variables (struct loop *loop,
   bool create_iv, enum unroll_level ul,
-  bool try_eval)
+  bool try_eval, bool allow_peel)
 {
   edge exit = NULL;
   tree niter;
@@ -1207,7 +1208,8 @@ canonicalize_loop_induction_variables (s
  populates the loop bounds.  */
   modified |= remove_redundant_iv_tests (loop);
 
-  if (try_unroll_loop_completely (loop, exit, niter, ul, maxiter, locus))
+  if (try_unroll_loop_completely (loop, exit, niter, ul, maxiter, locus,
+ allow_peel))
 return true;
 
   if (create_iv
@@ -1238,7 +1240,7 @@ canonicalize_induction_variables (void)
 {
   changed |= canonicalize_loop_induction_variables (loop,
true, UL_SINGLE_ITER,
-   true);
+   true, false);
 }
   gcc_assert (!need_ssa_update_p (cfun));
 
@@ -1353,7 +1355,7 @@ tree_unroll_loops_completely_1 (bool may
 ul = UL_NO_GROWTH;
 
   if (canonicalize_loop_induction_variables
-(loop, false, ul, !flag_tree_loop_ivcanon))
+(loop, false, ul, !flag_tree_loop_ivcanon, unroll_outer))
 {
   /* If we'll continue unrolling, we need to propagate constants
 within the new basic blocks to fold away induction variable
Index: gcc/testsuite/gcc.dg/vect/pr83202-1.c
===
--- gcc/testsuite/gcc.dg/vect/pr83202-1.c   (nonexistent)
+++ gcc/testsuite/gcc.dg/vect/pr83202-1.c   (working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+
+void test(double data[8][8])
+{
+  for (int i = 0; i < 8; i++)
+{
+  for (int j = 0; j < i; j+=4)
+   {
+ data[i][j] *= data[i][j];
+ data[i][j+1] *= data[i][j+1];
+ data[i][j+2] *= data[i][j+2];
+ data[i][j+3] *= data[i][j+3];
+   }
+}
+}
+
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
+/* { dg-final { scan-tree-dump "ectorized 1 loops" "vect" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c   (revision 255201)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c   (working copy)
@@ -48,5 +48,6 @@ int foo1 (e_u8 a[4][N], int b1, int b2,
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 8 "cunroll" } } */
-/* { dg-final { scan-tree-dump-times "loop with 

Go patch committed: Don't make map zero value constant

2017-11-29 Thread Ian Lance Taylor
This patch changes the Go frontend to not make the map zero value
constant.  The map zero value is a common symbol, and it doesn't
really make sense to have a constant common symbol. Current GCC has
started to reject this case, probably as part of the fix for PR 83100.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 255090)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-57adb928c3cc61ac8fa47554394670a1c455afc2
+0d6b3abcbfe04949db947081651a503ceb12fe6e
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 254748)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -7717,10 +7717,10 @@ Map_type::backend_zero_value(Gogo* gogo)
   std::string asm_name(go_selectively_encode_id(zname));
   Bvariable* zvar =
   gogo->backend()->implicit_variable(zname, asm_name,
- barray_type, false, true, true,
-  Map_type::zero_value_align);
+ barray_type, false, false, true,
+Map_type::zero_value_align);
   gogo->backend()->implicit_variable_set_init(zvar, zname, barray_type,
- false, true, true, NULL);
+ false, false, true, NULL);
   return zvar;
 }
 


Re: [PATCH] final: Improve output for -dp and -fverbose-asm

2017-11-29 Thread Martin Sebor

On 11/29/2017 04:13 PM, Segher Boessenkool wrote:

This improves the assembler output (for -dp and -fverbose-asm) in
several ways.  It always prints the insn_cost.  It does not print
"[length = NN]" but "[c=NN l=NN]", to save space.  It does not add one
to the instruction alternative number (everything else starts counting
those at 0, too).  And finally, it tries to keep things lined up in
columns a bit better.

Tested on powerpc64-linux {-m32,-m64}; is this okay for trunk?


[c=NN l=NN] will be meaningless to those without some deeper
insight into/experience with what's actually being printed.
It might as well say [NN NN].  But such extreme terseness would
seem to run counter to the documented purpose of -fverbose-asm
to "Put extra commentary information in the generated assembly
code to make it more readable."

Looking further in the manual I don't see the [length=NN] bit
mentioned (nor does your patch add a mention of it) so while
the meaning of [length=NN] isn't exactly obvious, using [l=NN]
instead certainly won't make it easier to read.  Does the manual
need updating?

Martin




Segher


2017-11-29  Segher Boessenkool  

* final.c (output_asm_name): Print insn_cost.  Shorten output.  Print
which_alternative instead of which_alternative + 1.
(output_asm_insn): Print an extra tab if the template is short.

---
 gcc/final.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/gcc/final.c b/gcc/final.c
index fdae241..afb6906 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -3469,16 +3469,20 @@ output_asm_name (void)
 {
   if (debug_insn)
 {
-  int num = INSN_CODE (debug_insn);
-  fprintf (asm_out_file, "\t%s %d\t%s",
-  ASM_COMMENT_START, INSN_UID (debug_insn),
-  insn_data[num].name);
-  if (insn_data[num].n_alternatives > 1)
-   fprintf (asm_out_file, "/%d", which_alternative + 1);
+  fprintf (asm_out_file, "\t%s %d\t",
+  ASM_COMMENT_START, INSN_UID (debug_insn));

+  fprintf (asm_out_file, "[c=%d",
+  insn_cost (debug_insn, optimize_insn_for_speed_p ()));
   if (HAVE_ATTR_length)
-   fprintf (asm_out_file, "\t[length = %d]",
+   fprintf (asm_out_file, " l=%d",
 get_attr_length (debug_insn));
+  fprintf (asm_out_file, "]  ");
+
+  int num = INSN_CODE (debug_insn);
+  fprintf (asm_out_file, "%s", insn_data[num].name);
+  if (insn_data[num].n_alternatives > 1)
+   fprintf (asm_out_file, "/%d", which_alternative);

   /* Clear this so only the first assembler insn
 of any rtl insn will get the special comment for -dp.  */
@@ -3824,6 +3828,10 @@ output_asm_insn (const char *templ, rtx *operands)
putc (c, asm_out_file);
   }

+  /* Try to keep the asm a bit more readable.  */
+  if ((flag_verbose_asm || flag_print_asm_name) && strlen (templ) < 9)
+putc ('\t', asm_out_file);
+
   /* Write out the variable names for operands, if we know them.  */
   if (flag_verbose_asm)
 output_asm_operand_names (operands, oporder, ops);





Re: [PATCH] PR libgcc/83112, Fix warnings on libgcc float128-ifunc.c

2017-11-29 Thread Segher Boessenkool
Hi,

On Mon, Nov 27, 2017 at 06:40:09PM -0500, Michael Meissner wrote:
> @@ -33,3 +35,13 @@ $(fp128_hw_obj) : $(srcdir)/config/rs6
>  
>  $(fp128_ifunc_obj): INTERNAL_CFLAGS += $(FP128_CFLAGS_SW)
>  $(fp128_ifunc_obj): $(srcdir)/config/rs6000/t-float128-hw
> +
> +_mulkc3-hw.c: $(srcdir)/config/rs6000/_mulkc3.c
> + rm -rf _mulkc3.c
> + (echo "#define __mulkc3 __mulkc3_hw"; \
> +  cat $(srcdir)/config/rs6000/_mulkc3.c) > _mulkc3-hw.c

Please don't -rf.  -rf is a dangerous habit.

This also won't work if anything tries to build _mulkc3-hw.c a second
time; you have deleted its prerequisite.

Maybe some other scheme is better?

> --- libgcc/config/rs6000/t-float128   (revision 255177)
> +++ libgcc/config/rs6000/t-float128   (working copy)
> @@ -86,7 +86,7 @@ test:
>   for x in $(fp128_obj); do echo "$$x"; done;
>  
>  clean-float128:
> - rm -rf $(fp128_softfp_src)
> + rm -rf $(fp128_softfp_src) $(fp128_hardfp_src)
>   @$(MULTICLEAN) multi-clean DO=clean-float128

-rm to avoid warnings from rm if you clean without the files being there.

Otherwise looks good.  Thanks!


Segher


Re: [PATCH] make canonicalize_condition keep its promise

2017-11-29 Thread Aaron Sawdey
On Tue, 2017-11-21 at 11:45 -0600, Aaron Sawdey wrote:
> On Tue, 2017-11-21 at 10:06 -0700, Jeff Law wrote:
> > On 11/20/2017 06:41 AM, Aaron Sawdey wrote:
> > > On Sun, 2017-11-19 at 16:44 -0700, Jeff Law wrote:
> > > > On 11/15/2017 08:40 AM, Aaron Sawdey wrote:
> > > > > So, the story of this very small patch starts with me adding
> > > > > patterns
> > > > > for ppc instructions bdz[tf] and bdnz[tf] such as this:
> > > > > 
> > > > >   [(set (pc)
> > > > >   (if_then_else
> > > > >     (and
> > > > >    (ne (match_operand:P 1 "register_operand"
> > > > > "c,*b,*b,*b")
> > > > >    (const_int 1))
> > > > >    (match_operator 3 "branch_comparison_operator"
> > > > >     [(match_operand 4 "cc_reg_operand"
> > > > > "y,y,y,y")
> > > > >      (const_int 0)]))
> > > > >     (label_ref (match_operand 0))
> > > > >     (pc)))
> > > > >    (set (match_operand:P 2 "nonimmediate_operand"
> > > > > "=1,*r,m,*d*wi*c*l")
> > > > >   (plus:P (match_dup 1)
> > > > >   (const_int -1)))
> > > > >    (clobber (match_scratch:P 5 "=X,X,,r"))
> > > > >    (clobber (match_scratch:CC 6 "=X,,,"))
> > > > >    (clobber (match_scratch:CCEQ 7 "=X,,,"))]
> > > > > 
> > > > > However when this gets to the loop_doloop pass, we get an
> > > > > assert
> > > > > fail
> > > > > in iv_number_of_iterations():
> > > > > 
> > > > >   gcc_assert (COMPARISON_P (condition));
> > > > > 
> > > > > This is happening because this branch insn tests two things
> > > > > ANDed
> > > > > together so the and is at the top of the expression, not a
> > > > > comparison.
> > > > 
> > > > Is this something you've created for an existing
> > > > loop?  Presumably an
> > > > existing loop that previously was a simple loop?
> > > 
> > > The rtl to use this instruction is generated by new code I'm
> > > working on
> > > to do a builtin expansion of memcmp using a loop. I call
> > > gen_bdnztf_di() to generate rtl for the insn. It would be nice to
> > > be
> > > able to generate this instruction from doloop conversion but that
> > > is
> > > beyond the scope of what I'm working on presently.
> > 
> > Understood.
> > 
> > So what I think (and I'm hoping you can confirm one way or the
> > other)
> > is
> > that by generating this instruction you're turning a loop which
> > previously was considered a simple loop by the IV code and turning
> > it
> > into something the IV bits no longer think is a simple loop.
> > 
> > I think that's problematical as when the loop is thought to be a
> > simple
> > loop, it has to have a small number of forms for its loop back/exit
> > loop
> > tests and whether or not a loop is a simple loop is cached in the
> > loop
> > structure.
> > 
> > I think we need to dig into that first.  If my suspicion is correct
> > then
> > this patch is really just papering over that deeper problem.  So I
> > think
> > you need to dig a big deeper into why you're getting into the code
> > in
> > question (canonicalize_condition) and whether or not the call chain
> > makes any sense given the changes you've made to the loop.
> > 
> 
> Jeff,
>   There is no existing loop structure. This starts with a memcmp()
> call
> and then goes down through the builtin expansion mechanism, which is
> ultimately expanding the pattern cmpmemsi which is where my code is
> generating a loop that finishes with bdnzt. The code that's
> ultimately
> generated looks like this:
> 
> srdi 9,10,4
> li 6,0
> mtctr 9
> li 4,8
> .L9:
> ldbrx 8,11,6
> ldbrx 9,7,6
> ldbrx 5,11,4
> ldbrx 3,7,4
> addi 6,6,16
> addi 4,4,16
> subfc. 9,9,8
> bne 0,.L4
> subfc. 9,3,5
> bdnzt 2,.L9
> 
> So it is a loop with a branch out, and then the branch decrement nz
> true back to the top. The iv's here (regs 4 and 6) were generated by
> my
> expansion code. 
> 
> I really think the ultimate problem here is that both
> canonicalize_condition and get_condition promise in their documenting
> comments that they will return something that has a cond at the root
> of
> the rtx, or 0 if they don't understand what they're given. In this
> case
> they do not understand the rtx of bdnzt and are returning rtx rooted
> with an and, not a cond. This may seem like papering over the
> problem,
> but I think it is legitimate for these functions to return 0 when the
> branch insn in question does not have a simple cond at the heart of
> it.
> And bootstrap/regtest did pass with my patch on ppc64le and x86_64.
> Ultimately, yes something better ought to be done here.
> 
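The shape of the generated loop can be sketched in portable C (illustrative only, not the emitted rtl; the function name and the 8-byte chunking are assumptions for the sketch): a counted loop handling 16 bytes per iteration, an early exit on the first differing chunk corresponding to the bne, and the counter-based back edge that bdnzt fuses with the second chunk's test.

```c
#include <assert.h>
#include <string.h>

/* Sketch of the memcmp expansion loop above: two 8-byte compares per
   iteration, early exit on mismatch, counted back edge.  */
static int cmp16_loop (const unsigned char *a, const unsigned char *b,
		       unsigned long n)
{
  unsigned long i;
  for (i = 0; i + 16 <= n; i += 16)	/* iteration count preloaded into ctr */
    {
      int c1 = memcmp (a + i, b + i, 8);	/* first subfc. / bne out */
      if (c1 != 0)
	return c1;
      int c2 = memcmp (a + i + 8, b + i + 8, 8);	/* second subfc. / bdnzt */
      if (c2 != 0)
	return c2;
    }
  return memcmp (a + i, b + i, n - i);	/* residue handled separately */
}
```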

Jeff,
  Just wondering if you would have time to take another look at this
now that Thanksgiving is past. I'm hoping to get this resolved so I can
get this new memcmp expansion code into gcc8.

Thanks,
Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux 

Re: [C++ PATCH] Fix ICE on invalid std::tuple_size<...>::value (PR c++/83205)

2017-11-29 Thread Martin Sebor

On 11/29/2017 03:32 PM, Jakub Jelinek wrote:

Hi!

If tsize doesn't fit into uhwi, then it obviously can't match the number of
identifiers in the structured binding declaration.  While we could use
compare_tree_int and thereby avoid this conversion to uhwi (though
compare_tree_int does that conversion anyway), for the normal uhwi case we
have special cases in cnt_mismatch, shared by the other kinds of structured
bindings, that handle the smaller and larger cases separately, so I think
it is better to do it this way.
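The shortcut can be sketched in plain C with a hypothetical 128-bit count split into two 64-bit halves (the struct and function names are made up for the illustration): a value too wide for unsigned HOST_WIDE_INT cannot equal the small number of names, so it is reported as a mismatch outright.

```c
#include <assert.h>

/* Hypothetical stand-in for a count wider than unsigned HOST_WIDE_INT.  */
struct wide_count { unsigned long long hi, lo; };

static int fits_uhwi (struct wide_count v)
{
  return v.hi == 0;		/* fits iff the high half is zero */
}

static int counts_match (unsigned count, struct wide_count tsize)
{
  if (!fits_uhwi (tsize))
    return 0;			/* too wide: can't match, diagnose */
  return tsize.lo == count;
}
```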

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-11-29  Jakub Jelinek  

PR c++/83205
* decl.c (cp_finish_decomp): Handle the case when tsize is not
error_mark_node, but doesn't fit into uhwi.

* g++.dg/cpp1z/decomp32.C: New test.

--- gcc/cp/decl.c.jj	2017-11-28 22:23:34.0 +0100
+++ gcc/cp/decl.c   2017-11-29 15:00:24.487658715 +0100
@@ -7518,6 +7518,12 @@ cp_finish_decomp (tree decl, tree first,
 "constant expression", type);
  goto error_out;
}
+  if (!tree_fits_uhwi_p (tsize))
+   {
+ error_at (loc, "%u names provided while %qT decomposes into "


When count is 1 as in the test below the error isn't grammatically
correct ("1 names").  I see that the same message is already issued
elsewhere in the function so this seems like an opportunity to use
the right form here and also fix the other one at the same time or
in a followup.  The error_n function exists to issue the right form
for the language, singular or plural.  It's not as convenient when
the sentence contains two terms that may be singular or plural,
but that can also be dealt with.
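The singular/plural selection that error_n performs can be sketched like this (a minimal illustration with a made-up helper name; the real error_n also goes through the translation machinery, which this omits):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Pick the grammatically correct format based on the count.  */
static const char *names_msg (unsigned n, char *buf, unsigned long sz)
{
  snprintf (buf, sz,
	    n == 1 ? "%u name provided" : "%u names provided", n);
  return buf;
}
```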

Martin


+"%E elements", count, type, tsize);
+ goto error_out;
+   }
   eltscnt = tree_to_uhwi (tsize);
   if (count != eltscnt)
goto cnt_mismatch;
--- gcc/testsuite/g++.dg/cpp1z/decomp32.C.jj	2017-11-29 15:08:59.215378903 +0100
+++ gcc/testsuite/g++.dg/cpp1z/decomp32.C	2017-11-29 15:10:31.576244649 +0100
@@ -0,0 +1,24 @@
+// PR c++/83205
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+struct A { int i; };
+struct B { int i; };
+namespace std {
+  template <typename T> struct tuple_size;
+  template <> struct tuple_size<A> {
+    static constexpr int value = -1;
+  };
+#ifdef __SIZEOF_INT128__
+  template <> struct tuple_size<B> {
+    static constexpr unsigned __int128 value = -1;
+  };
+#endif
+}
+
+auto [a] = A{};	// { dg-error "1 names provided while 'A' decomposes into -1 elements" }
+			// { dg-warning "structured bindings only available with" "" { target c++14_down } .-1 }
+#ifdef __SIZEOF_INT128__
+auto [b] = B{};	// { dg-error "1 names provided while 'B' decomposes into \[0-9xa-fXA-F]* elements" "" { target int128 } }
+			// { dg-warning "structured bindings only available with" "" { target { c++14_down && int128 } } .-1 }
+#endif

Jakub





[committed] hppa-linux: Update baseline symbols

2017-11-29 Thread John David Anglin
The attached patch updates the hppa-linux baseline symbols for gcc-8.

Dave
--
John David Anglin   dave.ang...@bell.net


2017-11-29  John David Anglin  

* config/abi/post/hppa-linux-gnu/baseline_symbols.txt: Update.

Index: config/abi/post/hppa-linux-gnu/baseline_symbols.txt
===
--- config/abi/post/hppa-linux-gnu/baseline_symbols.txt (revision 255165)
+++ config/abi/post/hppa-linux-gnu/baseline_symbols.txt (working copy)
@@ -444,6 +444,7 @@
 FUNC:_ZNKSt13basic_istreamIwSt11char_traitsIwEE6gcountEv@@GLIBCXX_3.4
 FUNC:_ZNKSt13basic_istreamIwSt11char_traitsIwEE6sentrycvbEv@@GLIBCXX_3.4
 FUNC:_ZNKSt13basic_ostreamIwSt11char_traitsIwEE6sentrycvbEv@@GLIBCXX_3.4
+FUNC:_ZNKSt13random_device13_M_getentropyEv@@GLIBCXX_3.4.25
 FUNC:_ZNKSt13runtime_error4whatEv@@GLIBCXX_3.4
 FUNC:_ZNKSt14basic_ifstreamIcSt11char_traitsIcEE5rdbufEv@@GLIBCXX_3.4
 FUNC:_ZNKSt14basic_ifstreamIcSt11char_traitsIcEE7is_openEv@@GLIBCXX_3.4.5
@@ -1329,6 +1330,7 @@
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1EPKwjRKS1_@@GLIBCXX_3.4
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ERKS1_@@GLIBCXX_3.4
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ERKS2_@@GLIBCXX_3.4
+FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ERKS2_jRKS1_@@GLIBCXX_3.4.24
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ERKS2_jj@@GLIBCXX_3.4
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ERKS2_jjRKS1_@@GLIBCXX_3.4
 
FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC1ESt16initializer_listIwERKS1_@@GLIBCXX_3.4.11
@@ -1342,6 +1344,7 @@
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2EPKwjRKS1_@@GLIBCXX_3.4
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ERKS1_@@GLIBCXX_3.4
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ERKS2_@@GLIBCXX_3.4
+FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ERKS2_jRKS1_@@GLIBCXX_3.4.24
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ERKS2_jj@@GLIBCXX_3.4
 FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ERKS2_jjRKS1_@@GLIBCXX_3.4
 
FUNC:_ZNSbIwSt11char_traitsIwESaIwEEC2ESt16initializer_listIwERKS1_@@GLIBCXX_3.4.11
@@ -4023,6 +4026,8 @@
 OBJECT:0:GLIBCXX_3.4.21
 OBJECT:0:GLIBCXX_3.4.22
 OBJECT:0:GLIBCXX_3.4.23
+OBJECT:0:GLIBCXX_3.4.24
+OBJECT:0:GLIBCXX_3.4.25
 OBJECT:0:GLIBCXX_3.4.3
 OBJECT:0:GLIBCXX_3.4.4
 OBJECT:0:GLIBCXX_3.4.5


Re: [PING 2][PATCH] enhance -Wrestrict to handle string built-ins (PR 78918)

2017-11-29 Thread Martin Sebor

On 11/27/2017 05:44 AM, Richard Biener wrote:

On Thu, Nov 16, 2017 at 10:29 PM, Martin Sebor  wrote:

Ping.

I've fixed the outstanding false positive exposed by the Linux
kernel.  The kernel builds with four instances of the warning,
all of them valid (perfect overlap in memcpy).

I also built Glibc.  It shows one instance of the warning, also
a true positive (cause by calling a restrict-qualified function
with two copies of the same argument).

Finally, I built Binutils and GDB with no warnings.

The attached patch includes just that one fix, with everything
else being the same.


+ /* Detect invalid bounds and overlapping copies and issue
+either -Warray-bounds or -Wrestrict.  */
+ if (check_bounds_or_overlap (stmt, dest, src, len, len))
+   gimple_set_no_warning (stmt, true);

if (! gimple_no_warning (stmt)
&& ...)

to avoid repeated work if this call doesn't get folded.


Sure.



@@ -7295,3 +7342,4 @@ gimple_stmt_integer_valued_real_p (gimple *stmt,
int depth)
   return false;
 }
 }
+

please don't do unrelated whitespace changes.

+
+  if (const strinfo *chksi = olddsi ? olddsi : dsi)
+if (si
+   && !check_bounds_or_overlap (stmt, chksi->ptr, si->ptr, NULL_TREE, len))
+  /* Avoid transforming strcpy when out-of-bounds offsets or
+overlapping access is detected.  */
+  return;

as I said elsewhere diagnostics should not prevent optimization.  Your warning
code isn't optimization-grade (that is, false positives are possible).


I remember you pointing it out earlier and agreeing that warning
options should not disable/enable optimizations.  In this case,
if GCC can avoid the undefined behavior by folding a call that
would otherwise result from calling the library function, it
makes sense to go ahead and fold it.  I forgot to allow for
that in the last revision of my patch but I've fixed it in
the updated one I just posted.


+   if (!check_bounds_or_overlap (stmt, dst, sptr, NULL_TREE, slen))
+ /* Avoid transforming strcat when out-of-bounds offsets or
+overlapping access is detected.  */
+ return;
+  }

Likewise.

+  if (!check_bounds_or_overlap (stmt, dst, sptr, dstlen, srcsize))
+ /* Avoid transforming strcat when out-of-bounds offsets or
+overlapping access is detected.  */
+   return;

Likewise.

I have no strong opinion against the "code duplication" Jeff mentions with
regarding to builtin_access and friends.  The relation to ao_ref and friends
could be documented a bit and how builtin_memref/builtin_access are
not suitable for optimization.


I added a comment to the classes to make it clearer.  Although
now that the classes are in a dedicated warning pass there is
little risk of someone misusing them for optimization.

Thanks
Martin



Thanks,
Richard.



On 11/09/2017 04:57 PM, Martin Sebor wrote:


Ping:

  https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01642.html

On 10/23/2017 08:42 PM, Martin Sebor wrote:


Attached is a reworked solution to enhance -Wrestrict while
avoiding changing tree-vrp.c or any other VRP machinery.  Richard,
in considering you suggestions I realized that the ao_ref struct
isn't general enough to detect the kinds of problems I needed to
detect (storing bit-offsets in HOST_WIDE_INT means out-of-bounds
offsets cannot be represented or detected, leading to either false
positives or false negatives).

Instead, the solution adds a couple of small classes to builtins.c
to overcome this limitation (I'm contemplating moving them along
with -Wstringop-overflow to a separate file to keep builtins.c
from getting too much bigger).

The detection of out-of-bounds offsets and overlapping accesses
is relatively simple but the rest of the changes are somewhat
involved because of the computation of the offsets and sizes of
the overlaps.

Jeff, as per your suggestion/request in an earlier review (bug
81117) I've renamed some of the existing functions to better
reflect their new function (including renaming strlen_optimize_stmt
in tree-ssa-strlen.c to strlen_check_and_optimize_stmt).  There's
quite a bit of churn due to some of this renaming.  I don't think
this renaming makes the review too difficult but if you feel
differently I can [be persuaded to] split it out into a separate
patch.

To validate the patch I compiled the Linux kernel and Binutils/GDB.
There's one false positive I'm working on resolving that's caused
by an incorrect interpretation of an offset in a range whose lower
bound is greater than its upper bound.  This it so say that while
I'm aware the patch isn't perfect it already works reasonably well
in practice and I think it's close enough to review.

Thanks
Martin









Re: [PATCH][AArch64] Fix ICE due to store_pair_lanes

2017-11-29 Thread Steve Ellcey

FYI: Here is a cut down test case showing the failure:

int foo (void) { }
extern void plunk ();
int splat (void)
{
  static int once = 0;
  plunk (&once, foo);
}

% obj/gcc/gcc/cc1 -mabi=ilp32 -O2 -quiet x.i

during RTL pass: final
x.i: In function ‘splat’:
x.i:7:1: internal compiler error: in aarch64_print_address_internal, at 
config/aarch64/aarch64.c:5638
 }
 ^
0x14286c7 aarch64_print_address_internal
/home/sellcey/gcc-spec-
ilp32/src/gcc/gcc/config/aarch64/aarch64.c:5638
0x1428d4b aarch64_print_operand_address
/home/sellcey/gcc-spec-
ilp32/src/gcc/gcc/config/aarch64/aarch64.c:5735
0xb5fa93 output_address(machine_mode, rtx_def*)
/home/sellcey/gcc-spec-ilp32/src/gcc/gcc/final.c:3905
0xb5f40b output_asm_insn(char const*, rtx_def**)
/home/sellcey/gcc-spec-ilp32/src/gcc/gcc/final.c:3766
0xb5e0df final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
/home/sellcey/gcc-spec-ilp32/src/gcc/gcc/final.c:3064
0xb5bfbb final(rtx_insn*, _IO_FILE*, int)
/home/sellcey/gcc-spec-ilp32/src/gcc/gcc/final.c:2052
0xb60d27 rest_of_handle_final
/home/sellcey/gcc-spec-ilp32/src/gcc/gcc/final.c:4490
0xb6110f execute
/home/sellcey/gcc-spec-ilp32/src/gcc/gcc/final.c:4564
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


[PATCH] final: Improve output for -dp and -fverbose-asm

2017-11-29 Thread Segher Boessenkool
This improves the assembler output (for -dp and -fverbose-asm) in
several ways.  It always prints the insn_cost.  It does not print
"[length = NN]" but "[c=NN l=NN]", to save space.  It does not add one
to the instruction alternative number (everything else starts counting
those at 0, too).  And finally, it tries to keep things lined up in
columns a bit better.
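For illustration, the new comment layout can be reproduced with the same sequence of fprintf-style steps the patched output_asm_name takes (all values here are made up: insn uid 12, cost 4, length 8, pattern "*movdi_internal64", alternative 1, and "#" standing in for ASM_COMMENT_START; the real code also prints the alternative only when n_alternatives > 1):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a sample new-style -dp comment: uid, then [c=cost l=length],
   then pattern name and alternative.  */
static void new_style_comment (char *buf, unsigned long sz)
{
  unsigned long off = 0;
  off += snprintf (buf + off, sz - off, "\t%s %d\t", "#", 12);
  off += snprintf (buf + off, sz - off, "[c=%d", 4);
  off += snprintf (buf + off, sz - off, " l=%d", 8);
  off += snprintf (buf + off, sz - off, "]  ");
  snprintf (buf + off, sz - off, "%s/%d", "*movdi_internal64", 1);
}
```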

Tested on powerpc64-linux {-m32,-m64}; is this okay for trunk?


Segher


2017-11-29  Segher Boessenkool  

* final.c (output_asm_name): Print insn_cost.  Shorten output.  Print
which_alternative instead of which_alternative + 1.
(output_asm_insn): Print an extra tab if the template is short.

---
 gcc/final.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/gcc/final.c b/gcc/final.c
index fdae241..afb6906 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -3469,16 +3469,20 @@ output_asm_name (void)
 {
   if (debug_insn)
 {
-  int num = INSN_CODE (debug_insn);
-  fprintf (asm_out_file, "\t%s %d\t%s",
-  ASM_COMMENT_START, INSN_UID (debug_insn),
-  insn_data[num].name);
-  if (insn_data[num].n_alternatives > 1)
-   fprintf (asm_out_file, "/%d", which_alternative + 1);
+  fprintf (asm_out_file, "\t%s %d\t",
+  ASM_COMMENT_START, INSN_UID (debug_insn));
 
+  fprintf (asm_out_file, "[c=%d",
+  insn_cost (debug_insn, optimize_insn_for_speed_p ()));
   if (HAVE_ATTR_length)
-   fprintf (asm_out_file, "\t[length = %d]",
+   fprintf (asm_out_file, " l=%d",
 get_attr_length (debug_insn));
+  fprintf (asm_out_file, "]  ");
+
+  int num = INSN_CODE (debug_insn);
+  fprintf (asm_out_file, "%s", insn_data[num].name);
+  if (insn_data[num].n_alternatives > 1)
+   fprintf (asm_out_file, "/%d", which_alternative);
 
   /* Clear this so only the first assembler insn
 of any rtl insn will get the special comment for -dp.  */
@@ -3824,6 +3828,10 @@ output_asm_insn (const char *templ, rtx *operands)
putc (c, asm_out_file);
   }
 
+  /* Try to keep the asm a bit more readable.  */
+  if ((flag_verbose_asm || flag_print_asm_name) && strlen (templ) < 9)
+putc ('\t', asm_out_file);
+
   /* Write out the variable names for operands, if we know them.  */
   if (flag_verbose_asm)
 output_asm_operand_names (operands, oporder, ops);
-- 
1.8.3.1



[PATCH] combine: Print to dump if some insn cannot be combined into i3

2017-11-29 Thread Segher Boessenkool
Eventually we should print the reason that any combination fails.
This is a good start (these happen often).

Applying to trunk.


Segher


2017-11-29  Segher Boessenkool  

* combine.c (try_combine): Print a message to dump file whenever
I0, I1, or I2 cannot be combined into I3.

---
 gcc/combine.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index c578e47..22a382d 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3061,13 +3061,25 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   SUBST (PATTERN (i2), XVECEXP (PATTERN (i2), 0, 1));
 }
 
-  /* Verify that I2 and I1 are valid for combining.  */
-  if (! can_combine_p (i2, i3, i0, i1, NULL, NULL, &i2dest, &i2src)
-  || (i1 && ! can_combine_p (i1, i3, i0, NULL, i2, NULL,
- &i1dest, &i1src))
-  || (i0 && ! can_combine_p (i0, i3, NULL, NULL, i1, i2,
- &i0dest, &i0src)))
+  /* Verify that I2 and maybe I1 and I0 can be combined into I3.  */
+  if (!can_combine_p (i2, i3, i0, i1, NULL, NULL, &i2dest, &i2src))
 {
+  if (dump_file)
+   fprintf (dump_file, "Can't combine i2 into i3\n");
+  undo_all ();
+  return 0;
+}
+  if (i1 && !can_combine_p (i1, i3, i0, NULL, i2, NULL, &i1dest, &i1src))
+{
+  if (dump_file)
+   fprintf (dump_file, "Can't combine i1 into i3\n");
+  undo_all ();
+  return 0;
+}
+  if (i0 && !can_combine_p (i0, i3, NULL, NULL, i1, i2, &i0dest, &i0src))
+{
+  if (dump_file)
+   fprintf (dump_file, "Can't combine i0 into i3\n");
   undo_all ();
   return 0;
 }
-- 
1.8.3.1



[PATCH] combine: Do not throw away unneeded arms of parallels (PR83156)

2017-11-29 Thread Segher Boessenkool
The fix for PR82621 makes us not split an I2 if one of the results of
those SETs is unused, since combine does not handle that properly.  But
this results in degradation for i386 (or more in general, for any
target that does not have patterns for parallels with an unused result
as a CLOBBER instead of a SET for that result).

This patch instead makes us not split only if one of the results is set
again before I3.  That fixes PR83156 and also fixes PR82621.

Unfortunately it undoes the nice optimisations that the previous patch
did, on powerpc.

Tested on powerpc64-linux {-m32,-m64}; committing to trunk.


Segher


2017-11-29  Segher Boessenkool  

PR rtl-optimization/83156
PR rtl-optimization/82621
* combine.c (try_combine): Don't split an I2 if one of the dests is
set again before I3.  Allow unused dests.

---
 gcc/combine.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index 8462397..c578e47 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3042,7 +3042,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   && can_split_parallel_of_n_reg_sets (i2, 2)
   && !reg_used_between_p (SET_DEST (XVECEXP (PATTERN (i2), 0, 0)), i2, i3)
   && !reg_used_between_p (SET_DEST (XVECEXP (PATTERN (i2), 0, 1)), i2, i3)
-  && !find_reg_note (i2, REG_UNUSED, 0))
+  && !reg_set_between_p  (SET_DEST (XVECEXP (PATTERN (i2), 0, 0)), i2, i3)
+  && !reg_set_between_p  (SET_DEST (XVECEXP (PATTERN (i2), 0, 1)), i2, i3))
 {
   /* If there is no I1, there is no I0 either.  */
   i0 = i1;
-- 
1.8.3.1



[PATCH] Improve __builtin_mul_overflow (uns, 1U << y, ) (PR target/83210)

2017-11-29 Thread Jakub Jelinek
Hi!

Even if the target has an umulv4_optab pattern, like i?86 mul[lq]; jo,
if one argument is a constant power of two it is faster to use two
shifts, or for multiplication by 2 just shl with js on the previous
value (though for -Os, mul[lq]; jo wins).
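The two-shift check can be sketched in C for a 32-bit unsigned type (an illustration of the idea, not the rtl the patch emits; the function name is made up and unsigned is assumed to be 32 bits): the left shift produces the result, the right shift recovers the bits shifted out, and overflow occurred iff those bits are nonzero.

```c
#include <assert.h>

/* Check x * (1u << cnt) for unsigned 32-bit overflow with two shifts.  */
static int mul_pow2_overflow (unsigned x, int cnt, unsigned *res)
{
  *res = x << cnt;			/* the product */
  /* cnt == 0 would make the recovery shift undefined (shift by 32),
     mirroring the "if (cnt != 0)" guard in the patch.  */
  unsigned upper = cnt ? x >> (32 - cnt) : 0;
  return upper != 0;			/* nonzero shifted-out bits: overflow */
}
```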

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-11-29  Jakub Jelinek  

PR target/83210
* internal-fn.c (expand_mul_overflow): Optimize unsigned
multiplication by power of 2 constant into two shifts + comparison.

* gcc.target/i386/pr83210.c: New test.

--- gcc/internal-fn.c.jj2017-11-22 21:37:46.0 +0100
+++ gcc/internal-fn.c   2017-11-29 17:44:17.699009169 +0100
@@ -1462,6 +1462,39 @@ expand_mul_overflow (location_t loc, tre
   type = build_nonstandard_integer_type (GET_MODE_PRECISION (mode), uns);
   sign = uns ? UNSIGNED : SIGNED;
   icode = optab_handler (uns ? umulv4_optab : mulv4_optab, mode);
+  if (uns
+  && (integer_pow2p (arg0) || integer_pow2p (arg1))
+  && (optimize_insn_for_speed_p () || icode == CODE_FOR_nothing))
+{
+  /* Optimize unsigned multiplication by power of 2 constant
+using 2 shifts, one for result, one to extract the shifted
+out bits to see if they are all zero.
+Don't do this if optimizing for size and we have umulv4_optab,
+in that case assume multiplication will be shorter.  */
+  rtx opn0 = op0;
+  rtx opn1 = op1;
+  tree argn0 = arg0;
+  tree argn1 = arg1;
+  if (integer_pow2p (arg0))
+   {
+ std::swap (opn0, opn1);
+ std::swap (argn0, argn1);
+   }
+  int cnt = tree_log2 (argn1);
+  if (cnt >= 0 && cnt < GET_MODE_PRECISION (mode))
+   {
+ rtx upper = const0_rtx;
+ res = expand_shift (LSHIFT_EXPR, mode, opn0, cnt, NULL_RTX, uns);
+ if (cnt != 0)
+   upper = expand_shift (RSHIFT_EXPR, mode, opn0,
+ GET_MODE_PRECISION (mode) - cnt,
+ NULL_RTX, uns);
+ do_compare_rtx_and_jump (upper, const0_rtx, EQ, true, mode,
+  NULL_RTX, NULL, done_label,
+  profile_probability::very_likely ());
+ goto do_error_label;
+   }
+}
   if (icode != CODE_FOR_nothing)
 {
   struct expand_operand ops[4];
--- gcc/testsuite/gcc.target/i386/pr83210.c.jj  2017-11-29 17:58:00.102881065 
+0100
+++ gcc/testsuite/gcc.target/i386/pr83210.c 2017-11-29 17:57:36.0 
+0100
@@ -0,0 +1,53 @@
+/* PR target/83210 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not {\mmul[lq]\M} } } */
+
+void bar (void);
+
+unsigned
+f1 (unsigned int x)
+{
+  unsigned res;
+  if (__builtin_mul_overflow (x, 2, &res))
+    bar ();
+  return res;
+}
+
+unsigned long
+f2 (unsigned long x)
+{
+  unsigned long res;
+  if (__builtin_mul_overflow (16, x, &res))
+    bar ();
+  return res;
+}
+
+unsigned long long
+f3 (unsigned long long x)
+{
+  unsigned long long res;
+  if (__builtin_mul_overflow (x, (1ULL << (__SIZEOF_LONG_LONG__ * __CHAR_BIT__ - 1)), &res))
+    bar ();
+  return res;
+}
+
+#ifdef __SIZEOF_INT128__
+unsigned __int128
+f4 (unsigned __int128 x)
+{
+  unsigned __int128 res;
+  if (__builtin_mul_overflow (x, (((unsigned __int128) 1) << (__SIZEOF_INT128__ * __CHAR_BIT__ / 2)), &res))
+    bar ();
+  return res;
+}
+
+unsigned __int128
+f5 (unsigned __int128 x)
+{
+  unsigned __int128 res;
+  if (__builtin_mul_overflow (x, (((unsigned __int128) 1) << (__SIZEOF_INT128__ * __CHAR_BIT__ / 2 + 3)), &res))
+    bar ();
+  return res;
+}
+#endif

Jakub


[PATCH] Riscv patterns to optimize away some redundant zero/sign extends.

2017-11-29 Thread Jim Wilson
This adds some new patterns to eliminate some unnecessary zero/sign extends.
It also adds one testcase for each pattern, and one testcase that works
only with the rtx_costs change.  There is a lot more work to be done here,
but this does help.  With a toolchain/linux kernel/busybox build, I see these
patterns trigger a little over 2300 times.
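As an illustration of the kind of redundancy targeted (my example, not from the patch): on RV64, a 32-bit logical right shift by a nonzero count already clears bit 31 of the SImode result, so sign- and zero-extension coincide and a function like the following should need only a single srliw once a pattern such as lshrsi3_zero_extend_1 matches, rather than a shift followed by an explicit extension:

```c
/* On RV64, returning an SImode shift result as a 64-bit value: the
   shifted-out high bits are already zero, so no separate zero-extension
   instruction is needed.  */
unsigned long
shift_zext (unsigned int x)
{
  return x >> 4;
}
```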

This was tested with a gcc make check, with no regressions.  It was also tested
by building the toolchain/linux kernel/busybox and booting the kernel on a
simulator.

Committed.

gcc/
* config/riscv/riscv.c (SINGLE_SHIFT_COST): New.
(riscv_rtx_costs): Case ZERO_EXTRACT, match new pattern, and return
SINGLE_SHIFT_COST.  Case LT and ZERO_EXTEND, likewise.  Case ASHIFT,
use SINGLE_SHIFT_COST.
* config/riscv/riscv.md (lshrsi3_zero_extend_1): New.
(lshrsi3_zero_extend_2, lshrsi3_zero_extend_3): New.

gcc/testsuite/
* gcc.target/riscv/riscv.exp: New.
* gcc.target/riscv/zero-extend-1.c: New.
* gcc.target/riscv/zero-extend-2.c: New.
* gcc.target/riscv/zero-extend-3.c: New.
* gcc.target/riscv/zero-extend-4.c: New.
---
 gcc/config/riscv/riscv.c   | 32 +--
 gcc/config/riscv/riscv.md  | 43 ++
 gcc/testsuite/gcc.target/riscv/riscv.exp   | 41 
 gcc/testsuite/gcc.target/riscv/zero-extend-1.c |  8 +
 gcc/testsuite/gcc.target/riscv/zero-extend-2.c | 13 
 gcc/testsuite/gcc.target/riscv/zero-extend-3.c | 12 +++
 gcc/testsuite/gcc.target/riscv/zero-extend-4.c | 20 
 7 files changed, 167 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/riscv.exp
 create mode 100644 gcc/testsuite/gcc.target/riscv/zero-extend-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zero-extend-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zero-extend-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zero-extend-4.c

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 279af909a69..5547d68193c 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -1429,6 +1429,8 @@ riscv_extend_cost (rtx op, bool unsigned_p)
 
 /* Implement TARGET_RTX_COSTS.  */
 
+#define SINGLE_SHIFT_COST 1
+
 static bool
 riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno 
ATTRIBUTE_UNUSED,
 int *total, bool speed)
@@ -1489,10 +1491,21 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
   *total = riscv_binary_cost (x, 1, 2);
   return false;
 
+case ZERO_EXTRACT:
+  /* This is an SImode shift.  */
+  if (outer_code == SET && (INTVAL (XEXP (x, 2)) > 0)
+ && (INTVAL (XEXP (x, 1)) + INTVAL (XEXP (x, 2)) == 32))
+   {
+ *total = COSTS_N_INSNS (SINGLE_SHIFT_COST);
+ return true;
+   }
+  return false;
+
 case ASHIFT:
 case ASHIFTRT:
 case LSHIFTRT:
-  *total = riscv_binary_cost (x, 1, CONSTANT_P (XEXP (x, 1)) ? 4 : 9);
+  *total = riscv_binary_cost (x, SINGLE_SHIFT_COST,
+ CONSTANT_P (XEXP (x, 1)) ? 4 : 9);
   return false;
 
 case ABS:
@@ -1504,6 +1517,14 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
   return true;
 
 case LT:
+  /* This is an SImode shift.  */
+  if (outer_code == SET && GET_MODE (x) == DImode
+ && GET_MODE (XEXP (x, 0)) == SImode)
+   {
+ *total = COSTS_N_INSNS (SINGLE_SHIFT_COST);
+ return true;
+   }
+  /* Fall through.  */
 case LTU:
 case LE:
 case LEU:
@@ -1601,8 +1622,15 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
*total = COSTS_N_INSNS (1);
   return false;
 
-case SIGN_EXTEND:
 case ZERO_EXTEND:
+  /* This is an SImode shift.  */
+  if (GET_CODE (XEXP (x, 0)) == LSHIFTRT)
+   {
+ *total = COSTS_N_INSNS (SINGLE_SHIFT_COST);
+ return true;
+   }
+  /* Fall through.  */
+case SIGN_EXTEND:
   *total = riscv_extend_cost (XEXP (x, 0), GET_CODE (x) == ZERO_EXTEND);
   return false;
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 814ff6ec6ad..db4fed48e53 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1524,6 +1524,49 @@
   [(set_attr "type" "shift")
(set_attr "mode" "SI")])
 
+;; Non-canonical, but can be formed by ree when combine is not successful at
+;; producing one of the two canonical patterns below.
+(define_insn "*lshrsi3_zero_extend_1"
+  [(set (match_operand:DI   0 "register_operand" "=r")
+   (zero_extend:DI
+(lshiftrt:SI (match_operand:SI 1 "register_operand" " r")
+		      (match_operand:SI 2 "const_int_operand"))))]
+  "TARGET_64BIT && (INTVAL (operands[2]) & 0x1f) > 0"
+{
+  operands[2] = GEN_INT 

[C++ PATCH] Fix ICE on invalid std::tuple_size<...>::value (PR c++/83205)

2017-11-29 Thread Jakub Jelinek
Hi!

If tsize doesn't fit into a uhwi, then it obviously can't match the number of
identifiers in the structured binding declaration.  While we could use
compare_tree_int and avoid this conversion to uhwi that way (though
compare_tree_int does the conversion anyway), for the normal uhwi case we
have special cases in cnt_mismatch, shared by the other kinds of structured
bindings, that handle the smaller and larger cases separately, so I think
it is better to do it this way.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-11-29  Jakub Jelinek  

PR c++/83205
* decl.c (cp_finish_decomp): Handle the case when tsize is not
error_mark_node, but doesn't fit into uhwi.

* g++.dg/cpp1z/decomp32.C: New test.

--- gcc/cp/decl.c.jj2017-11-28 22:23:34.0 +0100
+++ gcc/cp/decl.c   2017-11-29 15:00:24.487658715 +0100
@@ -7518,6 +7518,12 @@ cp_finish_decomp (tree decl, tree first,
 "constant expression", type);
  goto error_out;
}
+  if (!tree_fits_uhwi_p (tsize))
+   {
+ error_at (loc, "%u names provided while %qT decomposes into "
+"%E elements", count, type, tsize);
+ goto error_out;
+   }
   eltscnt = tree_to_uhwi (tsize);
   if (count != eltscnt)
goto cnt_mismatch;
--- gcc/testsuite/g++.dg/cpp1z/decomp32.C.jj2017-11-29 15:08:59.215378903 
+0100
+++ gcc/testsuite/g++.dg/cpp1z/decomp32.C   2017-11-29 15:10:31.576244649 
+0100
@@ -0,0 +1,24 @@
+// PR c++/83205
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+struct A { int i; };
+struct B { int i; };
+namespace std {
+  template <typename T> struct tuple_size;
+  template <> struct tuple_size<A> {
+    static constexpr int value = -1;
+  };
+#ifdef __SIZEOF_INT128__
+  template <> struct tuple_size<B> {
+    static constexpr unsigned __int128 value = -1;
+  };
+#endif
+}
+
+auto [a] = A{};	// { dg-error "1 names provided while 'A' decomposes into -1 elements" }
+		// { dg-warning "structured bindings only available with" "" { target c++14_down } .-1 }
+#ifdef __SIZEOF_INT128__
+auto [b] = B{};	// { dg-error "1 names provided while 'B' decomposes into \[0-9xa-fXA-F]* elements" "" { target int128 } }
+		// { dg-warning "structured bindings only available with" "" { target { c++14_down && int128 } } .-1 }
+#endif

Jakub


Re: [PATCH][AArch64] Fix ICE due to store_pair_lanes

2017-11-29 Thread Steve Ellcey
On Wed, 2017-11-29 at 21:13 +0100, Christophe Lyon wrote:

> Hi Wilco,
> 
> This breaks the build of aarch64-none-elf toolchains:
> 
> /tmp/8868058_9.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/final.c:4564
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.
> make[4]: *** [unwind-dw2-fde.o] Error 1
> make[4]: Leaving directory
> `/tmp/8868058_9.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64_be-none-elf/gcc1/aarch64_be-none-elf/ilp32/libgcc'
> make[3]: *** [multi-do] Error 1

I am seeing this failure too, on my aarch64-linux-gnu build where I
have ILP32 enabled.

Steve Ellcey
sell...@cavium.com


[PATCH] rs6000: Add second variant of adde

2017-11-29 Thread Segher Boessenkool
This adds a second variant of the adde insn pattern, this one with the
CA register as the second operand.  The existing pattern has it as the
third operand.  It would be ideal if RTL was always canonicalised like
that, but it isn't (and that is not trivial), and this is a simple and
harmless patch.


Segher


2017-11-28  Segher Boessenkool  

* config/rs6000/rs6000.md (*add<mode>3_carry_in_internal2): New.

---
 gcc/config/rs6000/rs6000.md | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 20b1581..7e91d54 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -1915,6 +1915,16 @@ (define_insn "*add<mode>3_carry_in_internal"
   "adde %0,%1,%2"
   [(set_attr "type" "add")])
 
+(define_insn "*add<mode>3_carry_in_internal2"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
+   (plus:GPR (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
+   (reg:GPR CA_REGNO))
+ (match_operand:GPR 2 "gpc_reg_operand" "r")))
+   (clobber (reg:GPR CA_REGNO))]
+  ""
+  "adde %0,%1,%2"
+  [(set_attr "type" "add")])
+
 (define_insn "add<mode>3_carry_in_0"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
-- 
1.8.3.1



patch to fix PR80818

2017-11-29 Thread Vladimir Makarov

  The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80818

  The patch was successfully tested and bootstrapped on x86_64.  The
patch has no test because the problem is hard to check for.  I checked
manually that the patch solves the problem on a test provided by Andreas
Krebbel.


   Committed as rev. 255258.


Index: ChangeLog
===
--- ChangeLog	(revision 255255)
+++ ChangeLog	(working copy)
@@ -1,3 +1,15 @@
+2017-11-29  Vladimir Makarov  
+
+	PR rtl-optimization/80818
+	* lra.c (collect_non_operand_hard_regs): New arg insn.  Pass it
+	recursively.  Use insn code for clobber.
+	(lra_set_insn_recog_data): Pass the new arg to
+	collect_non_operand_hard_regs.
+	(add_regs_to_insn_regno_info): Pass insn instead of uid.  Use insn
+	code for clobber.
+	(lra_update_insn_regno_info): Pass insn to
+	add_regs_to_insn_regno_info.
+
 2017-11-29  Julia Koval  
 
 	* config/i386/avx512vbmi2intrin.h (_mm512_shldv_epi16,
Index: lra.c
===
--- lra.c	(revision 255256)
+++ lra.c	(working copy)
@@ -807,7 +807,8 @@ setup_operand_alternative (lra_insn_reco
to LIST.  X is a part of insn given by DATA.	 Return the result
list.  */
 static struct lra_insn_reg *
-collect_non_operand_hard_regs (rtx *x, lra_insn_recog_data_t data,
+collect_non_operand_hard_regs (rtx_insn *insn, rtx *x,
+			   lra_insn_recog_data_t data,
 			   struct lra_insn_reg *list,
 			   enum op_type type, bool early_clobber)
 {
@@ -881,25 +882,28 @@ collect_non_operand_hard_regs (rtx *x, l
   switch (code)
 {
 case SET:
-  list = collect_non_operand_hard_regs (&SET_DEST (op), data,
+  list = collect_non_operand_hard_regs (insn, &SET_DEST (op), data,
 	list, OP_OUT, false);
-  list = collect_non_operand_hard_regs (&SET_SRC (op), data,
+  list = collect_non_operand_hard_regs (insn, &SET_SRC (op), data,
 	list, OP_IN, false);
   break;
 case CLOBBER:
-  /* We treat clobber of non-operand hard registers as early
-	 clobber (the behavior is expected from asm).  */
-  list = collect_non_operand_hard_regs (&XEXP (op, 0), data,
-	list, OP_OUT, true);
-  break;
+  {
+	int code = INSN_CODE (insn);
+	/* We treat clobber of non-operand hard registers as early
+	   clobber (the behavior is expected from asm).  */
+	list = collect_non_operand_hard_regs (insn, &XEXP (op, 0), data,
+	  list, OP_OUT, code < 0);
+	break;
+  }
 case PRE_INC: case PRE_DEC: case POST_INC: case POST_DEC:
-  list = collect_non_operand_hard_regs (&XEXP (op, 0), data,
+  list = collect_non_operand_hard_regs (insn, &XEXP (op, 0), data,
 	list, OP_INOUT, false);
   break;
 case PRE_MODIFY: case POST_MODIFY:
-  list = collect_non_operand_hard_regs (&XEXP (op, 0), data,
+  list = collect_non_operand_hard_regs (insn, &XEXP (op, 0), data,
 	list, OP_INOUT, false);
-  list = collect_non_operand_hard_regs (&XEXP (op, 1), data,
+  list = collect_non_operand_hard_regs (insn, &XEXP (op, 1), data,
 	list, OP_IN, false);
   break;
 default:
@@ -907,12 +911,12 @@ collect_non_operand_hard_regs (rtx *x, l
   for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
 	{
 	  if (fmt[i] == 'e')
-	    list = collect_non_operand_hard_regs (&XEXP (op, i), data,
+	    list = collect_non_operand_hard_regs (insn, &XEXP (op, i), data,
 		  list, OP_IN, false);
 	  else if (fmt[i] == 'E')
 	for (j = XVECLEN (op, i) - 1; j >= 0; j--)
-	      list = collect_non_operand_hard_regs (&XVECEXP (op, i, j), data,
-		list, OP_IN, false);
+	      list = collect_non_operand_hard_regs (insn, &XVECEXP (op, i, j),
+		data, list, OP_IN, false);
 	}
 }
   return list;
@@ -1055,7 +1059,7 @@ lra_set_insn_recog_data (rtx_insn *insn)
 insn_static_data->hard_regs = NULL;
   else
 insn_static_data->hard_regs
-  = collect_non_operand_hard_regs (&PATTERN (insn), data,
+  = collect_non_operand_hard_regs (insn, &PATTERN (insn), data,
    NULL, OP_IN, false);
   data->arg_hard_regs = NULL;
   if (CALL_P (insn))
@@ -1402,13 +1406,14 @@ lra_get_copy (int n)
 /* This page contains code dealing with info about registers in
insns.  */
 
-/* Process X of insn UID recursively and add info (operand type is
+/* Process X of INSN recursively and add info (operand type is
given by TYPE, flag of that it is early clobber is EARLY_CLOBBER)
about registers in X to the insn DATA.  If X can be early clobbered,
alternatives in which it can be early clobbered are given by
EARLY_CLOBBER_ALTS.  */
 static void
-add_regs_to_insn_regno_info (lra_insn_recog_data_t data, rtx x, int uid,
+add_regs_to_insn_regno_info (lra_insn_recog_data_t data, rtx x,
+			 rtx_insn *insn,
 			 enum op_type type, bool early_clobber,
 			 alternative_mask early_clobber_alts)
 {
@@ -1436,7 +1441,7 @@ add_regs_to_insn_regno_info (lra_insn_re
   /* Process 

Re: [PR 82808] Use result types for arithmetic jump functions

2017-11-29 Thread Martin Jambor
On Tue, Nov 28 2017, Martin Jambor wrote:
>
...
>
> Done, this is what I have just committed.  I will prepare a conservative
> fix for gcc 7 with only the expr_type_first_operand_type_p part.
>

The following is what I have committed to the gcc-7-branch after a round
of bootstrap and testing on an x86_64-linux.

Thanks,

Martin


2017-11-29  Martin Jambor  

PR ipa/82808
* tree.c (expr_type_first_operand_type_p): New function.
* tree.h (expr_type_first_operand_type_p): Declare it.
* ipa-cp.c (ipa_get_jf_pass_through_result): Use it.

testsuite/
* gcc.dg/ipa/pr82808.c: New test.
---
 gcc/ipa-cp.c   | 26 +++---
 gcc/testsuite/gcc.dg/ipa/pr82808.c | 27 +++
 gcc/tree.c | 44 ++
 gcc/tree.h |  1 +
 4 files changed, 85 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr82808.c

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index fa3d5fd7548..716c8cc3a1f 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -1224,20 +1224,20 @@ ipa_get_jf_pass_through_result (struct ipa_jump_func 
*jfunc, tree input)
   if (!is_gimple_ip_invariant (input))
 return NULL_TREE;
 
-  if (TREE_CODE_CLASS (ipa_get_jf_pass_through_operation (jfunc))
-  == tcc_unary)
-res = fold_unary (ipa_get_jf_pass_through_operation (jfunc),
- TREE_TYPE (input), input);
+  tree_code opcode = ipa_get_jf_pass_through_operation (jfunc);
+  if (TREE_CODE_CLASS (opcode) == tcc_comparison)
+restype = boolean_type_node;
+  else if (expr_type_first_operand_type_p (opcode))
+restype = TREE_TYPE (input);
   else
-{
-  if (TREE_CODE_CLASS (ipa_get_jf_pass_through_operation (jfunc))
- == tcc_comparison)
-   restype = boolean_type_node;
-  else
-   restype = TREE_TYPE (input);
-  res = fold_binary (ipa_get_jf_pass_through_operation (jfunc), restype,
-input, ipa_get_jf_pass_through_operand (jfunc));
-}
+return NULL_TREE;
+
+  if (TREE_CODE_CLASS (opcode) == tcc_unary)
+res = fold_unary (opcode, restype, input);
+  else
+res = fold_binary (opcode, restype, input,
+  ipa_get_jf_pass_through_operand (jfunc));
+
   if (res && !is_gimple_ip_invariant (res))
 return NULL_TREE;
 
diff --git a/gcc/testsuite/gcc.dg/ipa/pr82808.c 
b/gcc/testsuite/gcc.dg/ipa/pr82808.c
new file mode 100644
index 000..9c95d0b6ed7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr82808.c
@@ -0,0 +1,27 @@
+/* { dg-options "-O2" } */
+/* { dg-do run } */
+
+static void __attribute__((noinline))
+foo (double *a, double x)
+{
+  *a = x;
+}
+
+static double  __attribute__((noinline))
+f_c1 (int m, double *a)
+{
+  foo (a, m);
+  return *a;
+}
+
+int
+main (){
+  double data;
+  double ret = 0 ;
+
+  if ((ret = f_c1 (2, &data)) != 2)
+{
+  __builtin_abort ();
+}
+  return 0;
+}
diff --git a/gcc/tree.c b/gcc/tree.c
index 69425ab59ee..698213c3501 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -14515,6 +14515,50 @@ get_nonnull_args (const_tree fntype)
   return argmap;
 }
 
+/* Return true if an expression with CODE has to have the same result type as
+   its first operand.  */
+
+bool
+expr_type_first_operand_type_p (tree_code code)
+{
+  switch (code)
+{
+case NEGATE_EXPR:
+case ABS_EXPR:
+case BIT_NOT_EXPR:
+case PAREN_EXPR:
+case CONJ_EXPR:
+
+case PLUS_EXPR:
+case MINUS_EXPR:
+case MULT_EXPR:
+case TRUNC_DIV_EXPR:
+case CEIL_DIV_EXPR:
+case FLOOR_DIV_EXPR:
+case ROUND_DIV_EXPR:
+case TRUNC_MOD_EXPR:
+case CEIL_MOD_EXPR:
+case FLOOR_MOD_EXPR:
+case ROUND_MOD_EXPR:
+case RDIV_EXPR:
+case EXACT_DIV_EXPR:
+case MIN_EXPR:
+case MAX_EXPR:
+case BIT_IOR_EXPR:
+case BIT_XOR_EXPR:
+case BIT_AND_EXPR:
+
+case LSHIFT_EXPR:
+case RSHIFT_EXPR:
+case LROTATE_EXPR:
+case RROTATE_EXPR:
+  return true;
+
+default:
+  return false;
+}
+}
+
 #if CHECKING_P
 
 namespace selftest {
diff --git a/gcc/tree.h b/gcc/tree.h
index 375edcd4ba6..f20b77f17e4 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -5471,6 +5471,7 @@ extern void gt_pch_nx (tree &, gt_pointer_operator, void 
*);
 
 extern bool nonnull_arg_p (const_tree);
 extern bool is_redundant_typedef (const_tree);
+extern bool expr_type_first_operand_type_p (tree_code);
 
 extern location_t
 set_source_range (tree expr, location_t start, location_t finish);
-- 
2.15.0



Re: [PATCH] C++: improve location of static_assert errors

2017-11-29 Thread Jason Merrill
OK.

On Wed, Nov 29, 2017 at 4:26 PM, David Malcolm  wrote:
> Whilst investigating PR c++/82893 I noticed that the location
> of static_assert diagnostics uses that of the "static_assert"
> token, which seemed suboptimal to me, e.g.:
>
> error: non-constant condition for static assertion
>  static_assert(sizeof(Baz) == 8, "");
>  ^
>
> This patch changes it to use the location of the condition:
>
> error: non-constant condition for static assertion
>  static_assert(sizeof(Baz) == 8, "");
>~^~~~
>
> thus highlighting the condition itself.
>
> On doing so I ran into issues where the condition has an
> UNKNOWN_LOCATION, for traits and noexcept(), so I added
> locations for these, and bulletproofed the routine in case
> there are still ways to get an UNKNOWN_LOCATION for the
> condition.
>
> Successfully bootstrapped on x86_64-pc-linux-gnu.
>
> OK for trunk?
>
> gcc/cp/ChangeLog:
> * parser.c (cp_parser_unary_expression): Generate a location for
> "noexcept".
> (cp_parser_trait_expr): Generate and return a location_t,
> converting the return type from tree to cp_expr.
> (cp_parser_static_assert): Pass location of the condition to
> finish_static_assert, rather than that of the "static_assert"
> token, where available.
>
> gcc/testsuite/ChangeLog:
> * g++.dg/cpp1y/static_assert3.C: New test case.
>
> libstdc++-v3/ChangeLog:
> * testsuite/20_util/duration/literals/range.cc: Update expected
> line of a static_assert failure.
> ---
>  gcc/cp/parser.c| 54 
> --
>  gcc/testsuite/g++.dg/cpp1y/static_assert3.C| 26 +++
>  .../testsuite/20_util/duration/literals/range.cc   |  2 +-
>  3 files changed, 68 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/static_assert3.C
>
> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> index 1ad351c..b0539896 100644
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> @@ -2542,7 +2542,7 @@ static void cp_parser_late_parsing_default_args
>(cp_parser *, tree);
>  static tree cp_parser_sizeof_operand
>(cp_parser *, enum rid);
> -static tree cp_parser_trait_expr
> +static cp_expr cp_parser_trait_expr
>(cp_parser *, enum rid);
>  static bool cp_parser_declares_only_class_p
>(cp_parser *);
> @@ -8164,6 +8164,8 @@ cp_parser_unary_expression (cp_parser *parser, 
> cp_id_kind * pidk,
> bool saved_non_integral_constant_expression_p;
> bool saved_greater_than_is_operator_p;
>
> +   location_t start_loc = token->location;
> +
> cp_lexer_consume_token (parser->lexer);
> matching_parens parens;
> parens.require_open (parser);
> @@ -8200,8 +8202,19 @@ cp_parser_unary_expression (cp_parser *parser, 
> cp_id_kind * pidk,
>
> parser->type_definition_forbidden_message = saved_message;
>
> +   location_t finish_loc
> + = cp_lexer_peek_token (parser->lexer)->location;
> parens.require_close (parser);
> -   return finish_noexcept_expr (expr, tf_warning_or_error);
> +
> +   /* Construct a location of the form:
> +  noexcept (expr)
> +  ^~~
> +  with start == caret, finishing at the close-paren.  */
> +   location_t noexcept_loc
> + = make_location (start_loc, start_loc, finish_loc);
> +
> +   return cp_expr (finish_noexcept_expr (expr, tf_warning_or_error),
> +   noexcept_loc);
>   }
>
> default:
> @@ -9943,7 +9956,7 @@ cp_parser_builtin_offsetof (cp_parser *parser)
> Returns a representation of the expression, the underlying type
> of the type at issue when KEYWORD is RID_UNDERLYING_TYPE.  */
>
> -static tree
> +static cp_expr
>  cp_parser_trait_expr (cp_parser* parser, enum rid keyword)
>  {
>cp_trait_kind kind;
> @@ -10056,6 +10069,9 @@ cp_parser_trait_expr (cp_parser* parser, enum rid 
> keyword)
>gcc_unreachable ();
>  }
>
> +  /* Get location of initial token.  */
> +  location_t start_loc = cp_lexer_peek_token (parser->lexer)->location;
> +
>/* Consume the token.  */
>cp_lexer_consume_token (parser->lexer);
>
> @@ -10099,20 +10115,27 @@ cp_parser_trait_expr (cp_parser* parser, enum rid 
> keyword)
> }
>  }
>
> +  location_t finish_loc = cp_lexer_peek_token (parser->lexer)->location;
>parens.require_close (parser);
>
> +  /* Construct a location of the form:
> +   __is_trivially_copyable(_Tp)
> +   ^~~~
> + with start == caret, finishing at the close-paren.  */
> +  location_t trait_loc = make_location (start_loc, start_loc, finish_loc);
> +
>/* Complete the trait expression, which may mean either processing
>   the trait expr now or saving it for template instantiation.  */
>switch (kind)
>  

Re: [PATCH] Add a warning for invalid function casts

2017-11-29 Thread Jason Merrill

On 10/09/2017 06:30 PM, Bernd Edlinger wrote:

+/* Heuristic check if two parameter types can be considered ABI-equivalent.  */
+
+static bool
+cxx_abi_equiv_type_p (tree t1, tree t2)


This name is too general for a function that is specifically for 
implementing a particular warning.



+  if (INTEGRAL_TYPE_P (t1)
+  && INTEGRAL_TYPE_P (t2)
+  && TYPE_PRECISION (t1) == TYPE_PRECISION (t2)
+  && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
+ || TYPE_PRECISION (t1) >= TYPE_PRECISION (integer_type_node)))
+return true;


This section needs a comment explaining what you're allowing and why.

+  else if (TYPE_PTRMEMFUNC_P (type) && TYPE_PTRMEMFUNC_P (intype))
+{
+  if ((complain & tf_warning)
+ && !same_type_p (type, intype))


Why not use cxx_safe_function_type_cast_p here, too? 
TYPE_PTRMEMFUNC_FN_TYPE will be helpful.


Jason


Re: [PATCH][PR c++/82888] smarter code for default initialization of scalar arrays

2017-11-29 Thread Jason Merrill
On Fri, Nov 17, 2017 at 11:04 PM, Nathan Froyd  wrote:
> On Fri, Nov 17, 2017 at 8:50 AM, Jason Merrill  wrote:
>> On Thu, Nov 16, 2017 at 11:21 AM, Nathan Froyd  wrote:
>>> diff --git a/gcc/cp/init.c b/gcc/cp/init.c
>>> index c76460d..53d6133 100644
>>> --- a/gcc/cp/init.c
>>> +++ b/gcc/cp/init.c
>>> @@ -4038,6 +4038,15 @@ build_vec_init (tree base, tree maxindex, tree init,
>>> }
>>>  }
>>>
>>> +  /* Default-initialize scalar arrays directly.  */
>>> +  if (TREE_CODE (atype) == ARRAY_TYPE
>>> +  && SCALAR_TYPE_P (TREE_TYPE (atype))
>>> +  && !init)
>>
>> This should check explicit_value_init_p rather than !init.  And also
>> check zero_init_p.
>
> Do you mean explicit_value_init_p && zero_init_p (atype)?

Yes.

> zero_init_p
> doesn't sound like the correct thing to use here, because it doesn't
> take into account whether a class array type has a constructor.  I am
> probably misunderstanding the purpose of the zero_init_p check,
> though.

Since you're already checking for scalar type, we don't need to worry
about classes.  The zero_init_p check is to handle pointers to data
members properly.
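For reference, the case zero_init_p guards against can be seen in plain C++ (my example, not from the patch): on common ABIs such as Itanium, a null pointer to data member is represented as all-ones rather than all-zero bytes, so value-initializing an array of them is not equivalent to zeroing its storage:

```cpp
#include <cstring>

struct S { int x; };

// A value-initialized array of pointers to data members holds null member
// pointers, but their object representation is (on the Itanium C++ ABI)
// all-ones rather than all-zero bytes -- which is why a plain zero-fill is
// not a valid shortcut for this element type.
bool
null_but_not_zero_bytes ()
{
  int S::*arr[2] {};
  unsigned char bytes[sizeof arr];
  std::memcpy (bytes, arr, sizeof arr);
  bool any_nonzero = false;
  for (unsigned char b : bytes)
    any_nonzero = any_nonzero || (b != 0);
  return arr[0] == nullptr && arr[1] == nullptr && any_nonzero;
}
```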

Jason


[PATCH] C++: improve location of static_assert errors

2017-11-29 Thread David Malcolm
Whilst investigating PR c++/82893 I noticed that the location
of static_assert diagnostics uses that of the "static_assert"
token, which seemed suboptimal to me, e.g.:

error: non-constant condition for static assertion
 static_assert(sizeof(Baz) == 8, "");
 ^

This patch changes it to use the location of the condition:

error: non-constant condition for static assertion
 static_assert(sizeof(Baz) == 8, "");
   ~^~~~

thus highlighting the condition itself.

On doing so I ran into issues where the condition has an
UNKNOWN_LOCATION, for traits and noexcept(), so I added
locations for these, and bulletproofed the routine in case
there are still ways to get an UNKNOWN_LOCATION for the
condition.

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

gcc/cp/ChangeLog:
* parser.c (cp_parser_unary_expression): Generate a location for
"noexcept".
(cp_parser_trait_expr): Generate and return a location_t,
converting the return type from tree to cp_expr.
(cp_parser_static_assert): Pass location of the condition to
finish_static_assert, rather than that of the "static_assert"
token, where available.

gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/static_assert3.C: New test case.

libstdc++-v3/ChangeLog:
* testsuite/20_util/duration/literals/range.cc: Update expected
line of a static_assert failure.
---
 gcc/cp/parser.c| 54 --
 gcc/testsuite/g++.dg/cpp1y/static_assert3.C| 26 +++
 .../testsuite/20_util/duration/literals/range.cc   |  2 +-
 3 files changed, 68 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/static_assert3.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 1ad351c..b0539896 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -2542,7 +2542,7 @@ static void cp_parser_late_parsing_default_args
   (cp_parser *, tree);
 static tree cp_parser_sizeof_operand
   (cp_parser *, enum rid);
-static tree cp_parser_trait_expr
+static cp_expr cp_parser_trait_expr
   (cp_parser *, enum rid);
 static bool cp_parser_declares_only_class_p
   (cp_parser *);
@@ -8164,6 +8164,8 @@ cp_parser_unary_expression (cp_parser *parser, cp_id_kind 
* pidk,
bool saved_non_integral_constant_expression_p;
bool saved_greater_than_is_operator_p;
 
+   location_t start_loc = token->location;
+
cp_lexer_consume_token (parser->lexer);
matching_parens parens;
parens.require_open (parser);
@@ -8200,8 +8202,19 @@ cp_parser_unary_expression (cp_parser *parser, 
cp_id_kind * pidk,
 
parser->type_definition_forbidden_message = saved_message;
 
+   location_t finish_loc
+ = cp_lexer_peek_token (parser->lexer)->location;
parens.require_close (parser);
-   return finish_noexcept_expr (expr, tf_warning_or_error);
+
+   /* Construct a location of the form:
+  noexcept (expr)
+  ^~~
+  with start == caret, finishing at the close-paren.  */
+   location_t noexcept_loc
+ = make_location (start_loc, start_loc, finish_loc);
+
+   return cp_expr (finish_noexcept_expr (expr, tf_warning_or_error),
+   noexcept_loc);
  }
 
default:
@@ -9943,7 +9956,7 @@ cp_parser_builtin_offsetof (cp_parser *parser)
Returns a representation of the expression, the underlying type
of the type at issue when KEYWORD is RID_UNDERLYING_TYPE.  */
 
-static tree
+static cp_expr
 cp_parser_trait_expr (cp_parser* parser, enum rid keyword)
 {
   cp_trait_kind kind;
@@ -10056,6 +10069,9 @@ cp_parser_trait_expr (cp_parser* parser, enum rid 
keyword)
   gcc_unreachable ();
 }
 
+  /* Get location of initial token.  */
+  location_t start_loc = cp_lexer_peek_token (parser->lexer)->location;
+
   /* Consume the token.  */
   cp_lexer_consume_token (parser->lexer);
 
@@ -10099,20 +10115,27 @@ cp_parser_trait_expr (cp_parser* parser, enum rid 
keyword)
}
 }
 
+  location_t finish_loc = cp_lexer_peek_token (parser->lexer)->location;
   parens.require_close (parser);
 
+  /* Construct a location of the form:
+   __is_trivially_copyable(_Tp)
+   ^~~~
+ with start == caret, finishing at the close-paren.  */
+  location_t trait_loc = make_location (start_loc, start_loc, finish_loc);
+
   /* Complete the trait expression, which may mean either processing
  the trait expr now or saving it for template instantiation.  */
   switch (kind)
 {
 case CPTK_UNDERLYING_TYPE:
-  return finish_underlying_type (type1);
+  return cp_expr (finish_underlying_type (type1), trait_loc);
 case CPTK_BASES:
-  return finish_bases (type1, false);
+  return cp_expr (finish_bases (type1, false), trait_loc);
 case CPTK_DIRECT_BASES:
-  

Re: [C++ Patch] PR 82293 ("[8 Regression] ICE in nonlambda_method_basetype at gcc/cp/lambda.c:886")

2017-11-29 Thread Jason Merrill
OK.

On Wed, Nov 22, 2017 at 9:49 AM, Paolo Carlini  wrote:
> Hi,
>
> this ICE on valid is triggered by the existing cpp0x/lambda/lambda-ice20.C
> when -Wshadow is requested. Today I confirmed that it can be reproduced only
> after r251433, the "Reimplement handling of lambdas in templates." patch:
> then, as part of tsubst_lambda_expr, do_pushdecl calls check_local_shadow
> which in turn checks nonlambda_method_basetype and at that time
> current_class_type is null. I believe this is just something which we have
> to handle in the obvious way: indeed, most other uses of LAMBDA_TYPE_P are
> already checking first that the argument is non-null, see, for example, the
> related current_nonlambda_class_type. Tested x86_64-linux.
>
> Thanks, Paolo.
>
> ///
>
>


Re: RFA (libstdc++): PATCH to implement C++17 over-aligned new

2017-11-29 Thread Jason Merrill
On Fri, Nov 24, 2017 at 8:31 AM, Marc Glisse  wrote:
> Hello,
>
> @@ -4247,9 +4248,20 @@ build_operator_new_call (tree fnname, vec<tree, va_gc> **args,
>   we disregard block-scope declarations of "operator new".  */
>fns = lookup_function_nonclass (fnname, *args, /*block_p=*/false);
>
> +  if (align_arg)
> +{
> +  vec<tree, va_gc> *align_args
> +   = vec_copy_and_insert (*args, align_arg, 1);
> +  cand = perform_overload_resolution (fns, align_args, &candidates,
> +                                      &any_viable_p, tf_none);
> +  /* If no aligned allocation function matches, try again without the
> +alignment.  */
> +}
> +
>
> Code further in the function expects to be able to adjust args, which is
> defeated by copying them in align_args, see PR 82760.

Fixed, thanks.


C++ PATCH for c++/82760, memory corruption with C++17 aligned new

2017-11-29 Thread Jason Merrill
Since build_operator_new_call expects to be able to adjust the
arguments later in the function, we need to update *args with the
actual arg vec we're using.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 08383aaa77a6a3f37dad598ec840028af43f04d9
Author: Jason Merrill 
Date:   Tue Nov 28 07:45:02 2017 -0500

PR c++/82760 - memory corruption with aligned new.

* call.c (build_operator_new_call): Update *args if we add the
align_arg.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 45c811e828e..e04626863af 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -4372,6 +4372,8 @@ build_operator_new_call (tree fnname, vec<tree, va_gc> **args,
= vec_copy_and_insert (*args, align_arg, 1);
   cand = perform_overload_resolution (fns, align_args, &candidates,
                                       &any_viable_p, tf_none);
+  if (cand)
+   *args = align_args;
   /* If no aligned allocation function matches, try again without the
 alignment.  */
 }
diff --git a/gcc/testsuite/g++.dg/cpp1z/aligned-new8.C b/gcc/testsuite/g++.dg/cpp1z/aligned-new8.C
new file mode 100644
index 000..11dd45722b7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/aligned-new8.C
@@ -0,0 +1,19 @@
+// PR c++/82760
+// { dg-options -std=c++17 }
+// { dg-do run }
+
+#include 
+#include 
+
+struct alignas(2 * alignof (std::max_align_t)) aligned_foo {
+  char x[2048];
+
+  ~aligned_foo() { }
+  aligned_foo() { __builtin_memset(x, 0, sizeof(x)); }
+};
+
+int main()
+{
+  aligned_foo * gFoo = new (std::nothrow) aligned_foo[2];
+  delete[] gFoo;
+}


Re: [PATCH][AArch64] Fix ICE due to store_pair_lanes

2017-11-29 Thread Christophe Lyon
On 28 November 2017 at 19:15, James Greenhalgh  wrote:
> On Mon, Nov 27, 2017 at 03:20:29PM +, Wilco Dijkstra wrote:
>> The recently added store_pair_lanes causes ICEs in output_operand.
>> This is due to aarch64_classify_address treating it like a 128-bit STR
>> rather than a STP. The valid immediate offsets don't fully overlap,
>> causing it to return false.  Eg. offset 264 is a valid 8-byte STP offset
>> but not a valid 16-byte STR offset since it isn't a multiple of 16.
>>
>> The original instruction isn't passed in the printing code, so the context
>> is unclear.  The solution is to add a new operand formatting specifier
>> which is used for LDP/STP instructions like this.  This, like the Uml
>> constraint that applies to store_pair_lanes, uses PARALLEL when calling
>> aarch64_classify_address so that it knows it is an STP.
>> Also add the 'z' specifier for future use by load/store pair instructions.
>>
>> Passes regress, OK for commit?
>
> OK. But...
>
>> +  if (aarch64_classify_address (&addr, x, mode, op, true))
>
> This interface is not nice, resulting in...
>
>> +/* Print address 'x' of a LDP/STP with mode 'mode'.  */
>> +static void
>> +aarch64_print_ldpstp_address (FILE *f, machine_mode mode, rtx x)
>> +{
>> +  aarch64_print_address_internal (f, mode, x, PARALLEL);
>> +}
>> +
>> +/* Print address 'x' of a memory access with mode 'mode'.  */
>> +static void
>> +aarch64_print_operand_address (FILE *f, machine_mode mode, rtx x)
>> +{
>> +  aarch64_print_address_internal (f, mode, x, MEM);
>> +}
>
> These, which I *really* dislike.
>
> Ideas on how to clean up this interface would be appreciated.
>
> Thanks,
> James
>

Hi Wilco,

This breaks the build of aarch64-none-elf toolchains:

/tmp/8868058_9.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64_be-none-elf/gcc1/./gcc/xgcc
-B/tmp/8868058_9.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64_be-none-elf/gcc1/./gcc/
-B/tmp/8868058_9.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/aarch64_be-none-elf/bin/
-B/tmp/8868058_9.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/aarch64_be-none-elf/lib/
-isystem 
/tmp/8868058_9.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/aarch64_be-none-elf/include
-isystem 
/tmp/8868058_9.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/tools/aarch64_be-none-elf/sys-include
   -g -O2 -mabi=ilp32 -O2  -g -O2 -DIN_GCC
-DCROSS_DIRECTORY_STRUCTURE  -W -Wall -Wwrite-strings -Wcast-qual
-Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition
-isystem ./include   -g -DIN_LIBGCC2 -fbuilding-libgcc
-fno-stack-protector -Dinhibit_libc  -I. -I. -I../../.././gcc
-I/tmp/8868058_9.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc
-I/tmp/8868058_9.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/.
-I/tmp/8868058_9.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/../gcc
-I/tmp/8868058_9.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/../include
   -o unwind-dw2-fde.o -MT unwind-dw2-fde.o -MD -MP -MF
unwind-dw2-fde.dep -fexceptions -c
/tmp/8868058_9.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-dw2-fde.c
-fvisibility=hidden -DHIDE_EXPORTS
during RTL pass: final
/tmp/8868058_9.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-dw2-fde.c:
In function 'search_object':
/tmp/8868058_9.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-dw2-fde.c:1024:1:
internal compiler error: in aarch64_print_address_internal, at
config/aarch64/aarch64.c:5637
 }
 ^
0xf2259e aarch64_print_address_internal
/tmp/8868058_9.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/aarch64/aarch64.c:5637
0x7fb063 output_address(machine_mode, rtx_def*)
/tmp/8868058_9.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/final.c:3905
0x7fe563 output_asm_insn(char const*, rtx_def**)
/tmp/8868058_9.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/final.c:3766
0x7fed04 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
/tmp/8868058_9.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/final.c:3064
0x7ffcfa final(rtx_insn*, _IO_FILE*, int)
/tmp/8868058_9.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/final.c:2052
0x80088b rest_of_handle_final
/tmp/8868058_9.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/final.c:4490
0x80088b execute
/tmp/8868058_9.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/final.c:4564
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
make[4]: *** [unwind-dw2-fde.o] Error 1
make[4]: Leaving directory
`/tmp/8868058_9.tmpdir/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-aarch64_be-none-elf/gcc1/aarch64_be-none-elf/ilp32/libgcc'
make[3]: *** [multi-do] Error 1


Can you have a look?

Thanks,

Christophe


Re: [patch] jump threading multiple paths that start from the same BB

2017-11-29 Thread Aldy Hernandez

On 11/27/2017 06:27 PM, Jeff Law wrote:

On 11/07/2017 10:33 AM, Aldy Hernandez wrote:



Without further ado, here are my monumental, earth shattering improvements:

Conditional branches
    Without patch: 411846839709
    With    patch: 411831330316
     %changed: -0.0037660%

Number of instructions
    Without patch: 2271844650634
    With    patch: 2271761153578
     %changed: -0.0036754%

So that's pretty small.  A really good improvement would be on the order
of a half-percent reduction in runtime conditional branches.  I usually
test with .i files that have enable-checking turned on -- which tends to
give lots of opportunities due to the redundancies in our checking code.

On a positive note, you're eliminating roughly 4.5 other dynamic
instructions for every runtime conditional branch you remove, which is
actually a good ratio.  3.5 is what I typically see for a fairly
extensive amount of work.   Patrick P's work last year was on the order
of 7.5.  So while it didn't fire often, when it did it was highly effective.


I've retested with .ii files gathered from a stage1 build with 
--enable-checking=yes,all.  The results are an order of magnitude better 
but not impressive by any means:


Conditional branches
   Without patch: 133689781
   With    patch: 1333666327560
%changed: -0.0249416%

Number of instructions
   Without patch: 7347240547621
   With    patch: 7348622241739
%changed: -0.0188021%




We have installed changes of this nature when they were part of a larger
set of changes, particularly if they helped us elsewhere (like
eliminating key path to silence a bogus warning or addressing other
regressions).


I don't know if the above changes your view, but I am not losing sleep 
over this, so no worries.  I will just keep accumulating these along 
with the myriad of other changes I have ;-).


Thanks for taking a look at this, and making sure I'm headed in the 
right direction.


Aldy


Re: [PATCH][i386,AVX] Enable VBMI2 support [7/7]

2017-11-29 Thread Kirill Yukhin
Hello Julia!
On 24 Oct 10:28, Koval, Julia wrote:
> Hi,
> This patch enables VPSHRDV instruction. The doc for isaset and instruction: 
> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> 
> Ok for trunk?
Your patch is OK for trunk. I've checked it in.

--
Thanks, K


Re: [PATCH][i386,AVX] Enable VBMI2 support [6/7]

2017-11-29 Thread Kirill Yukhin
Hello Julia,
On 24 Oct 10:16, Koval, Julia wrote:
> Hi,
> This patch enables VPSHRDV instruction. The doc for isaset and instruction: 
> https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> 
> Ok for trunk?
Your patch is OK for trunk. I've checked it in.

--
Thanks, K


Add myself as GCC maintainer

2017-11-29 Thread Qing Zhao
Added myself as GCC maintainer with r255248:

https://gcc.gnu.org/ml/gcc-cvs/2017-11/msg00965.html 


thanks.

Qing

==

ChangeLog

+2017-11-29  Qing Zhao  
+
+   * MAINTAINERS (Write After Approval): Add myself.
+

Index: MAINTAINERS
===
--- MAINTAINERS (revision 255246)
+++ MAINTAINERS (working copy)
@@ -644,6 +644,7 @@
 Kirill Yukhin  
 Adhemerval Zanella 
 Yufeng Zhang   
+Qing Zhao  
 Shujing Zhao   
 Jon Ziegler
 Roman Zippel   



Re: RFC: Variable-length VECTOR_CSTs

2017-11-29 Thread Richard Sandiford
Thanks for the quick feedback!

Richard Biener  writes:
> On Wed, Nov 29, 2017 at 12:57 PM, Richard Sandiford
>  wrote:
>> It was clear from the SVE reviews that people were unhappy with how
>> "special" the variable-length case was.  One particular concern was
>> the use of VEC_DUPLICATE_CST and VEC_SERIES_CST, and the way that
>> that would in turn lead to different representations of VEC_PERM_EXPRs
>> with constant permute vectors, and difficulties in invoking
>> vec_perm_const_ok.
>>
>> This is an RFC for a VECTOR_CST representation that treats each
>> specific constant as one instance of an arbitrary-length sequence.
>> The representation then extends to variable-length vectors in a
>> natural way.
>>
>> As discussed on IRC, if a vector contains X*N elements for some
>> constant N and integer X>0, the main features we need are:
>>
>> 1) the ability to represent a sequence that duplicates N values
>>
>>This is useful for SLP invariants.
>>
>> 2) the ability to represent a sequence that starts with N values and
>>is followed by zeros
>>
>>This is useful for the initial value in a double or SLP reduction
>>
>> 3) the ability to represent N interleaved series
>>
>>This is useful for SLP inductions and for VEC_PERM_EXPRs.
>>
>> For (2), zero isn't necessarily special, since vectors used in an AND
>> reduction might need to fill with ones.  Also, we might need up to N
>> different fill values with mixed SLP operations; it isn't necessarily
>> safe to assume that a single fill value will always be enough.
>>
>> The same goes for (3): there's no reason in principle why the
>> steps in an SLP induction should all be the same (although they
>> do need to be at the moment).  E.g. once we support SLP on:
>>
>>   for (unsigned int i = 0; i < n; i += 2)
>> {
>>   x[i] += 4 + i;
>>   x[i + 1] += 11 + i * 3;
>> }
>>
>> we'll need {[4, 14], +, [2, 6]}.
>>
>> So the idea is to represent vectors as P interleaved patterns of the form:
>>
>>   [BASE0, BASE1, BASE1 + STEP, BASE1 + STEP*2, ...]
>>
>> where the STEP is always zero (actually null) for non-integer vectors.
>> This is effectively projecting a "foreground" value of P elements
>> onto an arbitrary-length "background" sequence, where the background
>> sequence contains P parallel linear series.
>>
>> E.g. to pick an extreme and unlikely example,
>>
>>   [42, 99, 2, 20, 3, 30, 4, 40, ...]
>>
>> has 2 patterns:
>>
>>   BASE0 = 42, BASE1 = 2, STEP = 1
>>   BASE0 = 99, BASE1 = 20, STEP = 10
>>
>> The more useful cases are degenerate versions of this general case.
>>
>> As far as memory consumption goes: the number of patterns needed for a
>> fixed-length vector with 2*N elements is always at most N; in the worst
>> case, we simply interleave the first N elements with the second N elements.
>> The worst-case increase in footprint is therefore N trees for the steps.
>> In practice the footprint is usually smaller than it was before, since
>> most constants do have a pattern.
>>
>> The patch below implements this for trees.  I have patches to use the
>> same style of encoding for CONST_VECTOR and vec_perm_indices, but the
>> tree one is probably easiest to read.
>>
>> The patch only adds the representation.  Follow-on patches make more
>> use of it (and usually make things simpler; e.g. integer_zerop is no
>> longer a looping operation).
>>
>> Does this look better?
>
> Yes, the overall design looks good.  I wonder why you chose to have
> the number of patterns being a power of two?  I suppose this is
> to have the same number of elements from all patterns in the final
> vector (which is power-of-two sized)?

Right.  The rtl and vec_perm_indices parts don't have this restriction,
since some ports do define non-power-of-2 vectors for internal use.
The problem is that, since VECTOR_CSTs are used by the FE, we need
to support all valid vector lengths without blowing the 16-bit field.
Using the same style of representation as TYPE_VECTOR_SUBPARTS seemed
like the safest way of doing that.

> I wonder if there exists a vector where say a three-pattern
> interleaving would be smaller than a four-pattern one?

Only in the non-power-of-2 case.

> Given you add flags for various purposes would it make sense to
> overload 'step' with a regular element to avoid the storage increase
> in case step is unnecessary?  This makes it have three elements
> which is of course awkward :/

I wondered about keeping it as an array of trees and tacking the
steps onto the end as an optional addition.  But the idea is that
tree_vector_pattern becomes the preferred way of handling constant
vectors, if it can be used, so it seemed neater to use in the tree
node too.

> Few more comments below.
>
> Otherwise looks good to go!
>
> Thanks for doing this.
>
> [...]
>> ! /* PATTERNS represents a constant of type TYPE.  Compress it into the
>> !canonical form, returning the new vector of patterns.  Use CUR and 

Re: [RFC][PATCH] Extend DCE to remove unnecessary new/delete-pairs

2017-11-29 Thread Richard Biener
On November 29, 2017 4:56:44 PM GMT+01:00, Martin Sebor  
wrote:
>On 11/29/2017 01:30 AM, Jakub Jelinek wrote:
>> On Tue, Nov 28, 2017 at 09:11:00PM -0700, Martin Sebor wrote:
>>> On 11/27/2017 02:22 AM, Dominik Inführ wrote:
 Thanks for all the reviews! I’ve revised the patch, the
>operator_delete_flag is now stored in tree_decl_with_vis (there already
>seem to be some FUNCTION_DECL-flags in there). I’ve also added the
>option -fallocation-dce to disable this optimization. It bootstraps and
>no regressions on aarch64 and x86_64.

>>> It's great to be able to eliminate pairs of these calls.  For
>>> unpaired calls, though, I think it would be even more useful to
>>> also issue a warning.  Otherwise the elimination will mask bugs
>>
>> ??  I hope you're only talking about allocation where the returned
>> pointer can't leak elsewhere, doing allocation in one function
>> (e.g. constructor, or whatever other function) and deallocation in
>some
>> other one is so common such a warning would be not just useless, but
>> harmful with almost all occurrences being false positives.
>>
>> Warning on malloc/standard operator new or malloc/realloc-like
>function
>> when the return pointer can't escape the current function is
>reasonable.
>
>Yes, warn for leaks, or for calls to delete/free with no matching
>new/malloc (when they can be detected).
>
> From the test case included in the patch, warn on the first two
>of the following three functions:
>
>+++ b/gcc/testsuite/g++.dg/cpp1y/new1.C
>@@ -0,0 +1,65 @@
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fdump-tree-cddce-details" } */
>+
>+#include 
>+
>+void
>+new_without_use() {
>+  int *x = new int;
>+}
>+
>+void
>+new_array_without_use() {
>+  int *x = new int[5];
>+}
>+
>+void
>+new_primitive() {
>+  int *x = new int;
>+  delete x;
>+}
>
>An obvious extension to such a checker would then be to also detect
>possible invalid deallocations, as in:
>
>   void f (unsigned n)
>   {
> void *p = n < 256 ? alloca (n) : malloc (n);
> // ...
> free (p);
>   }
>
>David Malcolm was working on something like that earlier this year
>so he might have some thoughts on this as well.

Or

p = new x;
free (p);

Richard. 

>Martin



Re: [RFC][PATCH] Extend DCE to remove unnecessary new/delete-pairs

2017-11-29 Thread Andrew Pinski
On Wed, Nov 29, 2017 at 8:15 AM, David Malcolm  wrote:
> On Wed, 2017-11-29 at 08:56 -0700, Martin Sebor wrote:
>> On 11/29/2017 01:30 AM, Jakub Jelinek wrote:
>> > On Tue, Nov 28, 2017 at 09:11:00PM -0700, Martin Sebor wrote:
>> > > On 11/27/2017 02:22 AM, Dominik Inführ wrote:
>> > > > Thanks for all the reviews! I’ve revised the patch, the
>> > > > operator_delete_flag is now stored in tree_decl_with_vis (there
>> > > > already seem to be some FUNCTION_DECL-flags in there). I’ve
>> > > > also added the option -fallocation-dce to disable this
>> > > > optimization. It bootstraps and no regressions on aarch64 and
>> > > > x86_64.
>> > > >
>> > >
>> > > It's great to be able to eliminate pairs of these calls.  For
>> > > unpaired calls, though, I think it would be even more useful to
>> > > also issue a warning.  Otherwise the elimination will mask bugs
>> >
>> > ??  I hope you're only talking about allocation where the returned
>> > pointer can't leak elsewhere, doing allocation in one function
>> > (e.g. constructor, or whatever other function) and deallocation in
>> > some
>> > other one is so common such a warning would be not just useless,
>> > but
>> > harmful with almost all occurrences being false positives.
>> >
>> > Warning on malloc/standard operator new or malloc/realloc-like
>> > function
>> > when the return pointer can't escape the current function is
>> > reasonable.
>>
>> Yes, warn for leaks, or for calls to delete/free with no matching
>> new/malloc (when they can be detected).
>>
>>  From the test case included in the patch, warn on the first two
>> of the following three functions:
>>
>> +++ b/gcc/testsuite/g++.dg/cpp1y/new1.C
>> @@ -0,0 +1,65 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-cddce-details" } */
>> +
>> +#include 
>> +
>> +void
>> +new_without_use() {
>> +  int *x = new int;
>> +}
>> +
>> +void
>> +new_array_without_use() {
>> +  int *x = new int[5];
>> +}
>> +
>> +void
>> +new_primitive() {
>> +  int *x = new int;
>> +  delete x;
>> +}
>>
>> An obvious extension to such a checker would then be to also detect
>> possible invalid deallocations, as in:
>>
>>void f (unsigned n)
>>{
>>  void *p = n < 256 ? alloca (n) : malloc (n);
>>  // ...
>>  free (p);
>>}
>>
>> David Malcolm was working on something like that earlier this year
>> so he might have some thoughts on this as well.
>
> I was experimenting with optimizing away matching malloc/free pairs,
> moving the allocation to either the stack, or to a thread-local
> obstack, under certain conditions, or to hoist allocations out of
> loops.
>
> I didn't get any significant wins, but much of this was due to my lack
> of experience with the middle-end, and being drawn back to front-
> end/diagnostic improvements.
>
> Issues I ran into included:
>
> * wrapper functions like "Perl_malloc" which require LTO for the
> optimization to be able to see the allocations
>
> * 435.gromacs has this "interesting" function:

It is still in the upstream sources too:
https://sourcecodebrowser.com/gromacs/3.3.1/smalloc_8h.html#a776f070172af6850dda887c7358fa630

Thanks,
Andrew Pinski

>
>   unsigned maxavail(void)
>   {
> char *ptr;
> unsigned low,high,size;
>
> low=0;
> high=256e6;
> while ((high-low) > 4) {
>   size=(high+low)/2;
>   if ((ptr=malloc((size_t)size))==NULL)
> high=size;
>   else {
> free(ptr);
> low=size;
>   }
> }
> return low;
>   }
>
>   i.e. a loop which attempts a binary search of malloc calls to try to
> find a threshold at which they fail.
>
>
> Dave


Re: [RFC][PATCH] Extend DCE to remove unnecessary new/delete-pairs

2017-11-29 Thread David Malcolm
On Wed, 2017-11-29 at 08:56 -0700, Martin Sebor wrote:
> On 11/29/2017 01:30 AM, Jakub Jelinek wrote:
> > On Tue, Nov 28, 2017 at 09:11:00PM -0700, Martin Sebor wrote:
> > > On 11/27/2017 02:22 AM, Dominik Inführ wrote:
> > > > Thanks for all the reviews! I’ve revised the patch, the
> > > > operator_delete_flag is now stored in tree_decl_with_vis (there
> > > > already seem to be some FUNCTION_DECL-flags in there). I’ve
> > > > also added the option -fallocation-dce to disable this
> > > > optimization. It bootstraps and no regressions on aarch64 and
> > > > x86_64.
> > > > 
> > > 
> > > It's great to be able to eliminate pairs of these calls.  For
> > > unpaired calls, though, I think it would be even more useful to
> > > also issue a warning.  Otherwise the elimination will mask bugs
> > 
> > ??  I hope you're only talking about allocation where the returned
> > pointer can't leak elsewhere, doing allocation in one function
> > (e.g. constructor, or whatever other function) and deallocation in
> > some
> > other one is so common such a warning would be not just useless,
> > but
> > harmful with almost all occurrences being false positives.
> > 
> > Warning on malloc/standard operator new or malloc/realloc-like
> > function
> > when the return pointer can't escape the current function is
> > reasonable.
> 
> Yes, warn for leaks, or for calls to delete/free with no matching
> new/malloc (when they can be detected).
> 
>  From the test case included in the patch, warn on the first two
> of the following three functions:
> 
> +++ b/gcc/testsuite/g++.dg/cpp1y/new1.C
> @@ -0,0 +1,65 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-cddce-details" } */
> +
> +#include 
> +
> +void
> +new_without_use() {
> +  int *x = new int;
> +}
> +
> +void
> +new_array_without_use() {
> +  int *x = new int[5];
> +}
> +
> +void
> +new_primitive() {
> +  int *x = new int;
> +  delete x;
> +}
> 
> An obvious extension to such a checker would then be to also detect
> possible invalid deallocations, as in:
> 
>void f (unsigned n)
>{
>  void *p = n < 256 ? alloca (n) : malloc (n);
>  // ...
>  free (p);
>}
> 
> David Malcolm was working on something like that earlier this year
> so he might have some thoughts on this as well.

I was experimenting with optimizing away matching malloc/free pairs,
moving the allocation to either the stack, or to a thread-local
obstack, under certain conditions, or to hoist allocations out of
loops.  

I didn't get any significant wins, but much of this was due to my lack
of experience with the middle-end, and being drawn back to front-
end/diagnostic improvements.

Issues I ran into included:

* wrapper functions like "Perl_malloc" which require LTO for the
optimization to be able to see the allocations

* 435.gromacs has this "interesting" function:

  unsigned maxavail(void)
  {
char *ptr;
unsigned low,high,size;
  
low=0;
high=256e6;
while ((high-low) > 4) {
  size=(high+low)/2;
  if ((ptr=malloc((size_t)size))==NULL)
high=size;
  else {
free(ptr);
low=size;
  }
}
return low;
  }

  i.e. a loop which attempts a binary search of malloc calls to try to
find a threshold at which they fail.


Dave


Re: [RFC][PATCH] Extend DCE to remove unnecessary new/delete-pairs

2017-11-29 Thread Martin Sebor

On 11/29/2017 01:30 AM, Jakub Jelinek wrote:

On Tue, Nov 28, 2017 at 09:11:00PM -0700, Martin Sebor wrote:

On 11/27/2017 02:22 AM, Dominik Inführ wrote:

Thanks for all the reviews! I’ve revised the patch, the operator_delete_flag is 
now stored in tree_decl_with_vis (there already seem to be some 
FUNCTION_DECL-flags in there). I’ve also added the option -fallocation-dce to 
disable this optimization. It bootstraps and no regressions on aarch64 and 
x86_64.


It's great to be able to eliminate pairs of these calls.  For
unpaired calls, though, I think it would be even more useful to
also issue a warning.  Otherwise the elimination will mask bugs


??  I hope you're only talking about allocation where the returned
pointer can't leak elsewhere, doing allocation in one function
(e.g. constructor, or whatever other function) and deallocation in some
other one is so common such a warning would be not just useless, but
harmful with almost all occurrences being false positives.

Warning on malloc/standard operator new or malloc/realloc-like function
when the return pointer can't escape the current function is reasonable.


Yes, warn for leaks, or for calls to delete/free with no matching
new/malloc (when they can be detected).

From the test case included in the patch, warn on the first two
of the following three functions:

+++ b/gcc/testsuite/g++.dg/cpp1y/new1.C
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cddce-details" } */
+
+#include 
+
+void
+new_without_use() {
+  int *x = new int;
+}
+
+void
+new_array_without_use() {
+  int *x = new int[5];
+}
+
+void
+new_primitive() {
+  int *x = new int;
+  delete x;
+}

An obvious extension to such a checker would then be to also detect
possible invalid deallocations, as in:

  void f (unsigned n)
  {
void *p = n < 256 ? alloca (n) : malloc (n);
// ...
free (p);
  }

David Malcolm was working on something like that earlier this year
so he might have some thoughts on this as well.

Martin


Re: [PATCH] Fix PR tree-optimization/83195 testcase for arm

2017-11-29 Thread Kyrill Tkachov

Hi Jakub,

On 29/11/17 08:18, Jakub Jelinek wrote:

Hi!

The pr82929.c testcase uses store_merge effective target, which is
int32plus && nonstrict_align.  Unfortunately, arm is handled for
nonstrict_align wierdly, although it has STRICT_ALIGNMENT 1, it is sometimes
considered nonstrict_align (the only exception apparently).

Now, the testcase really needs a non-strict alignment target where
STRICT_ALIGNMENT is 0, otherwise the optimization it tests is not
beneficial.  So, the following patch stops testing it on arm, and adds
another test where the pointers are guaranteed to be aligned and thus we
can test for the optimization even on non-strict alignment targets.

Bootstrapped/regtested on x86_64-linux and i686-linux, tested by hand using
a cross-compiler to arm, ok for trunk?

2017-11-29  Jakub Jelinek  

PR tree-optimization/83195
* gcc.dg/pr82929.c: Don't check for "Merging successful" on arm.
* gcc.dg/pr82929-2.c: New test.

--- gcc/testsuite/gcc.dg/pr82929.c.jj   2017-11-10 15:42:39.0 +0100
+++ gcc/testsuite/gcc.dg/pr82929.c  2017-11-28 17:50:43.705221829 +0100
@@ -15,4 +15,4 @@ foo (short *p, short *q, short *r)
p[1] = e & f;
  }
  
-/* { dg-final { scan-tree-dump-times "Merging successful" 1 "store-merging" } } */

+/* { dg-final { scan-tree-dump-times "Merging successful" 1 "store-merging" { target { ! arm*-*-* } } } } */
--- gcc/testsuite/gcc.dg/pr82929-2.c.jj 2017-11-28 17:47:41.858409094 +0100
+++ gcc/testsuite/gcc.dg/pr82929-2.c    2017-11-28 17:48:55.264526160 +0100
@@ -0,0 +1,21 @@
+/* PR tree-optimization/82929 */
+/* { dg-do compile { target store_merge } } */
+/* { dg-options "-O2 -fdump-tree-store-merging" } */
+
+void
+foo (short *p, short *q, short *r)
+{
+  p = __builtin_assume_aligned (p, __alignof__ (int));
+  q = __builtin_assume_aligned (q, __alignof__ (int));
+  r = __builtin_assume_aligned (r, __alignof__ (int));
+  short a = q[0];
+  short b = q[1];
+  short c = ~a;
+  short d = r[0];
+  short e = r[1];
+  short f = ~b;
+  p[0] = c & d;
+  p[1] = e & f;
+}
+
+/* { dg-final { scan-tree-dump-times "Merging successful" 1 "store-merging" } } */


Sudi has kindly tried it out and the new test passes on arm, so this 
patch is ok from an arm perspective.

Thanks for fixing this,
Kyrill


Jakub




Re: RFC: Variable-length VECTOR_CSTs

2017-11-29 Thread Richard Biener
On Wed, Nov 29, 2017 at 12:57 PM, Richard Sandiford
 wrote:
> It was clear from the SVE reviews that people were unhappy with how
> "special" the variable-length case was.  One particular concern was
> the use of VEC_DUPLICATE_CST and VEC_SERIES_CST, and the way that
> that would in turn lead to different representations of VEC_PERM_EXPRs
> with constant permute vectors, and difficulties in invoking
> vec_perm_const_ok.
>
> This is an RFC for a VECTOR_CST representation that treats each
> specific constant as one instance of an arbitrary-length sequence.
> The representation then extends to variable-length vectors in a
> natural way.
>
> As discussed on IRC, if a vector contains X*N elements for some
> constant N and integer X>0, the main features we need are:
>
> 1) the ability to represent a sequence that duplicates N values
>
>This is useful for SLP invariants.
>
> 2) the ability to represent a sequence that starts with N values and
>is followed by zeros
>
>This is useful for the initial value in a double or SLP reduction
>
> 3) the ability to represent N interleaved series
>
>This is useful for SLP inductions and for VEC_PERM_EXPRs.
>
> For (2), zero isn't necessarily special, since vectors used in an AND
> reduction might need to fill with ones.  Also, we might need up to N
> different fill values with mixed SLP operations; it isn't necessarily
> safe to assume that a single fill value will always be enough.
>
> The same goes for (3): there's no reason in principle why the
> steps in an SLP induction should all be the same (although they
> do need to be at the moment).  E.g. once we support SLP on:
>
>   for (unsigned int i = 0; i < n; i += 2)
> {
>   x[i] += 4 + i;
>   x[i + 1] += 11 + i * 3;
> }
>
> we'll need {[4, 14], +, [2, 6]}.
>
> So the idea is to represent vectors as P interleaved patterns of the form:
>
>   [BASE0, BASE1, BASE1 + STEP, BASE1 + STEP*2, ...]
>
> where the STEP is always zero (actually null) for non-integer vectors.
> This is effectively projecting a "foreground" value of P elements
> onto an arbitrary-length "background" sequence, where the background
> sequence contains P parallel linear series.
>
> E.g. to pick an extreme and unlikely example,
>
>   [42, 99, 2, 20, 3, 30, 4, 40, ...]
>
> has 2 patterns:
>
>   BASE0 = 42, BASE1 = 2, STEP = 1
>   BASE0 = 99, BASE1 = 20, STEP = 10
>
> The more useful cases are degenerate versions of this general case.
>
> As far as memory consumption goes: the number of patterns needed for a
> fixed-length vector with 2*N elements is always at most N; in the worst
> case, we simply interleave the first N elements with the second N elements.
> The worst-case increase in footprint is therefore N trees for the steps.
> In practice the footprint is usually smaller than it was before, since
> most constants do have a pattern.
>
> The patch below implements this for trees.  I have patches to use the
> same style of encoding for CONST_VECTOR and vec_perm_indices, but the
> tree one is probably easiest to read.
>
> The patch only adds the representation.  Follow-on patches make more
> use of it (and usually make things simpler; e.g. integer_zerop is no
> longer a looping operation).
>
> Does this look better?

Yes, the overall design looks good.  I wonder why you chose to make
the number of patterns a power of two?  I suppose this is
to get the same number of elements from each pattern in the final
vector (which is power-of-two sized)?  I wonder whether there exists
a vector where, say, a three-pattern interleaving would be smaller
than a four-pattern one?

Given that you add flags for various purposes, would it make sense to
overload 'step' with a regular element to avoid the storage increase
when the step is unnecessary?  That makes it three elements,
which is of course awkward :/

A few more comments below.

Otherwise looks good to go!

Thanks for doing this.

> Thanks,
> Richard
>
>
> 2017-11-29  Richard Sandiford  
>
> gcc/
> * doc/generic.texi (VECTOR_CST): Describe new representation of
> vector constants.
> * tree.def (VECTOR_CST): Update comment to refer to generic.texi.
> * tree-core.h (tree_base): Add a vector_cst field to the u union.
> (tree_vector_pattern): New struct.
> (tree_vector): Replace elts array with a patterns array.
> * tree.h (VECTOR_CST_NELTS): Redefine using TYPE_VECTOR_SUBPARTS.
> (VECTOR_CST_ELTS): Delete.
> (VECTOR_CST_ELT): Redefine using vector_cst_elt.
> (VECTOR_CST_LOG2_NPATTERNS, VECTOR_CST_NPATTERNS): New macros.
> (VECTOR_CST_DUPLICATE_P, VECTOR_CST_SERIES_P, VECTOR_CST_BASE)
> (VECTOR_CST_STEP): Likewise.
> (make_vector): Take the values of VECTOR_CST_LOG2_NPATTERNS,
> VECTOR_CST_DUPLICATE_P and VECTOR_CST_SERIES_P as arguments.
> (build_vector): Declare an overload that takes a vector of
> 

Re: RFC: Variable-length VECTOR_CSTs

2017-11-29 Thread David Malcolm
On Wed, 2017-11-29 at 11:57 +, Richard Sandiford wrote:

[...]

I can't really comment on the representation ideas, but I'm always
happy to see new selftests...

*** test_labels ()
> *** 13954,13959 
> --- 14179,14350 
> ASSERT_FALSE (FORCED_LABEL (label_decl));
>   }
>   
> + /* Check that VECTOR_CST Y contains the elements in X.  */
> + 
> + static void
> + check_vector_cst (vec<tree> x, tree y)
> + {
> +   for (unsigned int i = 0; i < x.length (); ++i)
> + ASSERT_EQ (wi::to_wide (x[i]), wi::to_wide (vector_cst_elt (y, i)));
> + }

...a couple of nits/suggestions:

(a) How about renaming "x" to "expected"?  Maybe rename "y" to
"actual"?  (to better document the sense of the test).

(b) At first glance, I wondered if this routine should also have
something like:

  ASSERT_EQ (expected.length (), VECTOR_CST_NELTS (actual));

Though that seems to come from the vector type, and it's always 8 in
these examples, so I'm not sure.

 
Hope this is constructive
Dave


[PATCH] Avoid peeling in cunrolli

2017-11-29 Thread Richard Biener

It turns out that we don't vectorize the 2nd testcase in PR83202
(or rather we do that in weird ways during BB vectorization) because
cunrolli decides to peel the inner loop completely based on
the size of the accessed arrays.  That unfortunately leaves exit
tests in the outer loop body which in turn makes us not vectorize
the loop.

We have a late unrolling pass for this kind of unrolling, so this
patch simply avoids doing it during cunrolli.
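Distilled into a toy predicate, the change guards the recorded-bound path behind the new flag.  This is only a sketch of the decision logic in this first version of the patch; the function and parameter names are illustrative, not the actual GCC internals:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the guarded estimate in try_unroll_loop_completely:
   the recorded loop bound (maxiter) may only override the unroll
   factor when the caller allows peeling.  */
static bool
use_recorded_bound (bool allow_peel, long maxiter, bool n_unroll_found,
                    unsigned long n_unroll, unsigned long *out)
{
  if (allow_peel
      && maxiter >= 0
      && (!n_unroll_found || (unsigned long) maxiter < n_unroll))
    {
      *out = (unsigned long) maxiter;
      return true;
    }
  return false;
}
```

With allow_peel == false (as cunrolli now passes), the recorded bound is ignored and the loop is left for the late cunroll pass.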

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2017-11-29  Richard Biener  

PR tree-optimization/83202
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Add
allow_peel argument and guard peeling.
(canonicalize_loop_induction_variables): Likewise.
(canonicalize_induction_variables): Pass false.
(tree_unroll_loops_completely_1): Pass unroll_outer to disallow
peeling from cunrolli.

* gcc.dg/vect/pr83202-1.c: New testcase.

Index: gcc/tree-ssa-loop-ivcanon.c
===
--- gcc/tree-ssa-loop-ivcanon.c (revision 255201)
+++ gcc/tree-ssa-loop-ivcanon.c (working copy)
@@ -679,7 +679,7 @@ try_unroll_loop_completely (struct loop
edge exit, tree niter,
enum unroll_level ul,
HOST_WIDE_INT maxiter,
-   location_t locus)
+   location_t locus, bool allow_peel)
 {
   unsigned HOST_WIDE_INT n_unroll = 0;
   bool n_unroll_found = false;
@@ -711,7 +711,8 @@ try_unroll_loop_completely (struct loop
 exit = NULL;
 
   /* See if we can improve our estimate by using recorded loop bounds.  */
-  if (maxiter >= 0
+  if (allow_peel
+  && maxiter >= 0
   && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll))
 {
   n_unroll = maxiter;
@@ -1139,7 +1140,7 @@ try_peel_loop (struct loop *loop,
 static bool
 canonicalize_loop_induction_variables (struct loop *loop,
   bool create_iv, enum unroll_level ul,
-  bool try_eval)
+  bool try_eval, bool allow_peel)
 {
   edge exit = NULL;
   tree niter;
@@ -1207,7 +1208,8 @@ canonicalize_loop_induction_variables (s
  populates the loop bounds.  */
   modified |= remove_redundant_iv_tests (loop);
 
-  if (try_unroll_loop_completely (loop, exit, niter, ul, maxiter, locus))
+  if (try_unroll_loop_completely (loop, exit, niter, ul, maxiter, locus,
+ allow_peel))
 return true;
 
   if (create_iv
@@ -1238,7 +1240,7 @@ canonicalize_induction_variables (void)
 {
   changed |= canonicalize_loop_induction_variables (loop,
true, UL_SINGLE_ITER,
-   true);
+   true, false);
 }
   gcc_assert (!need_ssa_update_p (cfun));
 
@@ -1353,7 +1355,7 @@ tree_unroll_loops_completely_1 (bool may
 ul = UL_NO_GROWTH;
 
   if (canonicalize_loop_induction_variables
-(loop, false, ul, !flag_tree_loop_ivcanon))
+(loop, false, ul, !flag_tree_loop_ivcanon, unroll_outer))
 {
   /* If we'll continue unrolling, we need to propagate constants
 within the new basic blocks to fold away induction variable
Index: gcc/testsuite/gcc.dg/vect/pr83202-1.c
===
--- gcc/testsuite/gcc.dg/vect/pr83202-1.c   (nonexistent)
+++ gcc/testsuite/gcc.dg/vect/pr83202-1.c   (working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+
+void test(double data[8][8])
+{
+  for (int i = 0; i < 8; i++)
+{
+  for (int j = 0; j < i; j+=4)
+   {
+ data[i][j] *= data[i][j];
+ data[i][j+1] *= data[i][j+1];
+ data[i][j+2] *= data[i][j+2];
+ data[i][j+3] *= data[i][j+3];
+   }
+}
+}
+
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
+/* { dg-final { scan-tree-dump "ectorized 1 loops" "vect" } } */


[PATCH] Address SLP cost-model issue of PR83202

2017-11-29 Thread Richard Biener

SLP costing and code generation have the long-standing issue that they operate
on the SLP tree which, because it isn't a graph, contains redundancies.

The following patch builds upon the hashing infrastructure I added
a few months back and implements a workaround at code-generation and
costing time without changing the core data structures.

We can simply re-use the vectorized stmts from an SLP node with exactly
the same set of scalar stmts.  Likewise, we only have to cost them
once.
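The sharing idea can be sketched as a memoized walk over a tree with redundant nodes: nodes whose scalar-stmt set was already seen contribute no further cost.  This is a toy model only; in the patch the keys are the SLP nodes' scalar stmt vectors, hashed via bst_traits, not small integers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy SLP "tree": each node carries a key identifying its set of
   scalar stmts, a unit cost, and up to two children.  Nodes with
   equal keys stand for redundant copies of the same stmt set.  */
struct node { int key; int cost; struct node *child[2]; };

/* Sum costs over the tree, counting each distinct key once, the way
   vect_analyze_slp_cost_1 does with its visited set.  */
static int
cost_walk (struct node *n, bool *visited)
{
  if (!n || visited[n->key])
    return 0;
  visited[n->key] = true;
  return n->cost + cost_walk (n->child[0], visited)
		 + cost_walk (n->child[1], visited);
}
```

A root with two children that share a key is costed as root + one child, not root + two children.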

I've implemented the code-generation side to verify the correctness
of the costing adjustment.  A simple example that we fail to
vectorize on x86_64, because the cost model sees it as never
profitable, made me do this now (also because the implementation
turned out to be so simple).

Bootstrap and regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2017-11-29  Richard Biener  

PR tree-optimization/83202
* tree-vect-slp.c (scalar_stmts_set_t): New typedef.
(bst_fail): Use it.
(vect_analyze_slp_cost_1): Add visited set, do not account SLP
nodes vectorized to the same stmts multiple times.
(vect_analyze_slp_cost): Allocate a visited set and pass it down.
(vect_analyze_slp_instance): Adjust.
(scalar_stmts_to_slp_tree_map_t): New typedef.
(vect_schedule_slp_instance): Add a map recording the SLP node
representing the vectorized stmts for a set of scalar stmts.
Avoid code-generating redundancies.
(vect_schedule_slp): Allocate map and pass it down.

* gcc.dg/vect/costmodel/x86_64/costmodel-pr83202.c: New testcase.

Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr83202.c
===
--- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr83202.c  
(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr83202.c  
(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+
+void test(double data[4][2])
+{
+  for (int i = 0; i < 4; i++)
+{
+  data[i][0] = data[i][0] * data[i][0];
+  data[i][1] = data[i][1] * data[i][1];
+}
+}
+
+/* We should vectorize this with SLP and V2DF vectors.  */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 255229)
+++ gcc/tree-vect-slp.c (working copy)
@@ -961,7 +961,8 @@ bst_traits::equal (value_type existing,
   return true;
 }
 
-static hash_set <vec <gimple *>, bst_traits> *bst_fail;
+typedef hash_set <vec <gimple *>, bst_traits> scalar_stmts_set_t;
+static scalar_stmts_set_t *bst_fail;
 
 static slp_tree
 vect_build_slp_tree_2 (vec_info *vinfo,
@@ -1674,7 +1675,8 @@ static void
 vect_analyze_slp_cost_1 (slp_instance instance, slp_tree node,
 stmt_vector_for_cost *prologue_cost_vec,
 stmt_vector_for_cost *body_cost_vec,
-unsigned ncopies_for_cost)
+unsigned ncopies_for_cost,
+scalar_stmts_set_t* visited)
 {
   unsigned i, j;
   slp_tree child;
@@ -1682,11 +1684,18 @@ vect_analyze_slp_cost_1 (slp_instance in
   stmt_vec_info stmt_info;
   tree lhs;
 
+  /* If we already costed the exact same set of scalar stmts we're done.
+ We share the generated vector stmts for those.  */
+  if (visited->contains (SLP_TREE_SCALAR_STMTS (node)))
+return;
+
+  visited->add (SLP_TREE_SCALAR_STMTS (node).copy ());
+
   /* Recurse down the SLP tree.  */
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
 if (SLP_TREE_DEF_TYPE (child) == vect_internal_def)
   vect_analyze_slp_cost_1 (instance, child, prologue_cost_vec,
-  body_cost_vec, ncopies_for_cost);
+  body_cost_vec, ncopies_for_cost, visited);
 
   /* Look at the first scalar stmt to determine the cost.  */
   stmt = SLP_TREE_SCALAR_STMTS (node)[0];
@@ -1871,9 +1880,11 @@ vect_analyze_slp_cost (slp_instance inst
 
   prologue_cost_vec.create (10);
   body_cost_vec.create (10);
+  scalar_stmts_set_t *visited = new scalar_stmts_set_t ();
   vect_analyze_slp_cost_1 (instance, SLP_INSTANCE_TREE (instance),
   &prologue_cost_vec, &body_cost_vec,
-  ncopies_for_cost);
+  ncopies_for_cost, visited);
+  delete visited;
 
   /* Record the prologue costs, which were delayed until we were
  sure that SLP was successful.  */
@@ -2037,7 +2048,7 @@ vect_analyze_slp_instance (vec_info *vin
   /* Build the tree for the SLP instance.  */
   bool *matches = XALLOCAVEC (bool, group_size);
   unsigned npermutes = 0;
-  bst_fail = new hash_set <vec <gimple *>, bst_traits> ();
+  bst_fail = new scalar_stmts_set_t ();
   node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,

Re: [PATCH, i386] Fix movdi_internal to return MODE_TI with AVX512

2017-11-29 Thread Uros Bizjak
On Wed, Nov 29, 2017 at 1:10 PM, Uros Bizjak  wrote:
> On Wed, Nov 29, 2017 at 12:05 PM, Shalnov, Sergey
>  wrote:
>> Hi,
>> I found a wrong MODE_XI used in movdi_internal that causes zmm
>> generation with "-march=skylake-avx512 -mprefer-vector-width=128"
>> options set. This patch fixes the mode and register type but keeps using
>> the AVX512 instruction set.
>
> IMO, a better solution is to introduce ext_rex_sse_reg_operand and use
> it in place of existing ext_sse_reg_operand predicate. This is XImode
> move.

Probably you will need the same change in movsi_internal.

Uros.


Re: [PATCH, i386] Fix movdi_internal to return MODE_TI with AVX512

2017-11-29 Thread Uros Bizjak
On Wed, Nov 29, 2017 at 12:05 PM, Shalnov, Sergey
 wrote:
> Hi,
> I found a wrong MODE_XI used in movdi_internal that causes zmm
> generation with "-march=skylake-avx512 -mprefer-vector-width=128"
> options set. This patch fixes the mode and register type but keeps using
> the AVX512 instruction set.

IMO, a better solution is to introduce ext_rex_sse_reg_operand and use
it in place of existing ext_sse_reg_operand predicate. This is XImode
move.

Uros.


RFC: Variable-length VECTOR_CSTs

2017-11-29 Thread Richard Sandiford
It was clear from the SVE reviews that people were unhappy with how
"special" the variable-length case was.  One particular concern was
the use of VEC_DUPLICATE_CST and VEC_SERIES_CST, and the way that
that would in turn lead to different representations of VEC_PERM_EXPRs
with constant permute vectors, and difficulties in invoking
vec_perm_const_ok.

This is an RFC for a VECTOR_CST representation that treats each
specific constant as one instance of an arbitrary-length sequence.
The representation then extends to variable-length vectors in a
natural way.

As discussed on IRC, if a vector contains X*N elements for some
constant N and integer X>0, the main features we need are:

1) the ability to represent a sequence that duplicates N values

   This is useful for SLP invariants.

2) the ability to represent a sequence that starts with N values and
   is followed by zeros

   This is useful for the initial value in a double or SLP reduction

3) the ability to represent N interleaved series

   This is useful for SLP inductions and for VEC_PERM_EXPRs.

For (2), zero isn't necessarily special, since vectors used in an AND
reduction might need to fill with ones.  Also, we might need up to N
different fill values with mixed SLP operations; it isn't necessarily
safe to assume that a single fill value will always be enough.

The same goes for (3): there's no reason in principle why the
steps in an SLP induction should all be the same (although they
do need to be at the moment).  E.g. once we support SLP on:

  for (unsigned int i = 0; i < n; i += 2)
{
  x[i] += 4 + i;
  x[i + 1] += 11 + i * 3;
}

we'll need {[4, 14], +, [2, 6]}.

So the idea is to represent vectors as P interleaved patterns of the form:

  [BASE0, BASE1, BASE1 + STEP, BASE1 + STEP*2, ...]

where the STEP is always zero (actually null) for non-integer vectors.
This is effectively projecting a "foreground" value of P elements
onto an arbitrary-length "background" sequence, where the background
sequence contains P parallel linear series.

E.g. to pick an extreme and unlikely example,

  [42, 99, 2, 20, 3, 30, 4, 40, ...]

has 2 patterns:

  BASE0 = 42, BASE1 = 2, STEP = 1
  BASE0 = 99, BASE1 = 20, STEP = 10

The more useful cases are degenerate versions of this general case.
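As a hedged sketch of how an element would be decoded from this encoding (the function and parameter names here are illustrative; the patch's actual accessor is vector_cst_elt, whose signature may differ):

```c
#include <assert.h>

/* Illustrative decode of element I from NPATTERNS interleaved
   patterns, each of the form
   [BASE0, BASE1, BASE1 + STEP, BASE1 + STEP*2, ...].  */
static long
decode_elt (unsigned int i, unsigned int npatterns,
	    const long *base0, const long *base1, const long *step)
{
  unsigned int p = i % npatterns;	/* which pattern I belongs to */
  unsigned int k = i / npatterns;	/* position within that pattern */
  if (k == 0)
    return base0[p];
  return base1[p] + step[p] * (long) (k - 1);
}
```

With the two patterns above (BASE0 = {42, 99}, BASE1 = {2, 20}, STEP = {1, 10}) this reproduces [42, 99, 2, 20, 3, 30, 4, 40, ...].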

As far as memory consumption goes: the number of patterns needed for a
fixed-length vector with 2*N elements is always at most N; in the worst
case, we simply interleave the first N elements with the second N elements.
The worst-case increase in footprint is therefore N trees for the steps.
In practice the footprint is usually smaller than it was before, since
most constants do have a pattern.
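The worst-case claim can be checked with a brute-force representability test.  This is a sketch under the encoding described above, not code from the patch; it only checks whether a given element list fits a given pattern count:

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if ELTS[0..NELTS-1] fits NPATTERNS interleaved patterns
   of the form [BASE0, BASE1, BASE1 + STEP, BASE1 + STEP*2, ...].
   BASE0 is always elts[p], so only the tail needs checking.  */
static bool
fits_npatterns (const long *elts, unsigned int nelts,
		unsigned int npatterns)
{
  for (unsigned int p = 0; p < npatterns; ++p)
    {
      long base1 = 0, step = 0;
      bool have_base1 = false, have_step = false;
      for (unsigned int k = 1, i = npatterns + p; i < nelts;
	   ++k, i += npatterns)
	{
	  if (!have_base1)
	    { base1 = elts[i]; have_base1 = true; }
	  else if (!have_step)
	    { step = elts[i] - base1; have_step = true; }
	  else if (elts[i] != base1 + step * (long) (k - 1))
	    return false;
	}
    }
  return true;
}
```

Any 2*N-element vector fits N patterns, since each pattern then holds just its two bases; fewer patterns suffice exactly when the tail follows the linear series.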

The patch below implements this for trees.  I have patches to use the
same style of encoding for CONST_VECTOR and vec_perm_indices, but the
tree one is probably easiest to read.

The patch only adds the representation.  Follow-on patches make more
use of it (and usually make things simpler; e.g. integer_zerop is no
longer a looping operation).

Does this look better?

Thanks,
Richard


2017-11-29  Richard Sandiford  

gcc/
* doc/generic.texi (VECTOR_CST): Describe new representation of
vector constants.
* tree.def (VECTOR_CST): Update comment to refer to generic.texi.
* tree-core.h (tree_base): Add a vector_cst field to the u union.
(tree_vector_pattern): New struct.
(tree_vector): Replace elts array with a patterns array.
* tree.h (VECTOR_CST_NELTS): Redefine using TYPE_VECTOR_SUBPARTS.
(VECTOR_CST_ELTS): Delete.
(VECTOR_CST_ELT): Redefine using vector_cst_elt.
(VECTOR_CST_LOG2_NPATTERNS, VECTOR_CST_NPATTERNS): New macros.
(VECTOR_CST_DUPLICATE_P, VECTOR_CST_SERIES_P, VECTOR_CST_BASE)
(VECTOR_CST_STEP): Likewise.
(make_vector): Take the values of VECTOR_CST_LOG2_NPATTERNS,
VECTOR_CST_DUPLICATE_P and VECTOR_CST_SERIES_P as arguments.
(build_vector): Declare an overload that takes a vector of
tree_vector_patterns.
(vector_cst_int_elt, vector_cst_elt): Declare.
* tree.c (tree_code_size): Abort if passed VECTOR_CST.
(tree_size): Update for new VECTOR_CST layout.
(make_vector): Take the values of VECTOR_CST_LOG2_NPATTERNS,
VECTOR_CST_DUPLICATE_P and VECTOR_CST_SERIES_P as arguments.
(combine_patterns, compress_vector_patterns, vector_overflow_p)
(vector_duplicate_p, vector_series_p): New functions.
(build_vector): Add an overload that takes a vector of
tree_vector_patterns.  Use it for the overload that takes
a vector of elements.
(vector_cst_int_elt, vector_cst_elt): New functions.
(drop_tree_overflow): For VECTOR_CST, drop the overflow flags
from the VECTOR_CST_BASEs.
(check_vector_cst, test_vector_cst_patterns): New functions.
(tree_c_tests): Call it.
* lto-streamer-out.c 

[PATCH, i386] Fix movdi_internal to return MODE_TI with AVX512

2017-11-29 Thread Shalnov, Sergey
Hi,
I found a wrong MODE_XI used in movdi_internal that causes zmm
generation with "-march=skylake-avx512 -mprefer-vector-width=128"
options set. This patch fixes the mode and register type but keeps using
the AVX512 instruction set.

2017-11-28  Sergey Shalnov  
gcc/
* config/i386/i386.md: Fix AVX512 register width
in AVX512 instruction.



0009-Fix-AVX512-register-width-in-movdi_internal.patch
Description: 0009-Fix-AVX512-register-width-in-movdi_internal.patch


Re: [PATCH GCC]Rename and make remove_dead_inserted_code a simple dce interface

2017-11-29 Thread Bin.Cheng
On Wed, Nov 29, 2017 at 10:02 AM, Richard Biener
 wrote:
> On Tue, Nov 28, 2017 at 3:48 PM, Bin Cheng  wrote:
>> Hi,
>> This patch renames remove_dead_inserted_code to simple_dce_from_worklist, 
>> moves it to tree-ssa-dce.c
>> and makes it a simple public DCE interface.  Bootstrap and test along with 
>> loop interchange.  It's required
>> for interchange pass.  Is it OK?
>
> +  /* ???  Re-use seeds as worklist not only as initial set.  This may end up
> + removing more code as well.  If we keep seeds unchanged we could 
> restrict
> + new worklist elements to members of seed.  */
>
> Please remove this comment; while it applies to PRE when one takes
> remove_dead_inserted_code
> literally, it doesn't apply to a seeded DCE.
>
> Please also rename 'seeds' to 'worklist' directly and document that
> worklist is consumed by the function.
> The function has linear complexity in the number of dead stmts; the
> constant factor is the number of
> SSA use operands in those stmts (so 2 on average, I'd say).
>
> Ok with that change.
Updated, will commit new patch as attached.

Thanks,
bin

>
> Thanks,
> Richard.
>
>> BTW, I will push this along with interchange to branch: 
>> gcc.gnu.org/svn/gcc/branches/gimple-linterchange.
>>
>> Thanks,
>> bin
>> 2017-11-27  Bin Cheng  
>>
>> * tree-ssa-dce.c (simple_dce_from_worklist): Move and rename from
>> tree-ssa-pre.c::remove_dead_inserted_code.
>> * tree-ssa-dce.h: New file.
>> * tree-ssa-pre.c (tree-ssa-dce.h): Include new header file.
>> (remove_dead_inserted_code): Move and rename to function
>> tree-ssa-dce.c::simple_dce_from_worklist.
>> (pass_pre::execute): Update use.
From 219f42625b89eb81e2beb6605c9d594e83ed5048 Mon Sep 17 00:00:00 2001
From: amker 
Date: Sun, 26 Nov 2017 20:56:19 +0800
Subject: [PATCH 01/41] simple-dce-interface

---
 gcc/tree-ssa-dce.c | 52 ++
 gcc/tree-ssa-dce.h | 22 ++
 gcc/tree-ssa-pre.c | 67 +++---
 3 files changed, 82 insertions(+), 59 deletions(-)
 create mode 100644 gcc/tree-ssa-dce.h

diff --git a/gcc/tree-ssa-dce.c b/gcc/tree-ssa-dce.c
index a5f0edf..8595dec 100644
--- a/gcc/tree-ssa-dce.c
+++ b/gcc/tree-ssa-dce.c
@@ -1723,3 +1723,55 @@ make_pass_cd_dce (gcc::context *ctxt)
 {
   return new pass_cd_dce (ctxt);
 }
+
+
+/* A cheap DCE interface.  WORKLIST is a list of possibly dead stmts and
+   is consumed by this function.  The function has linear complexity in
+   the number of dead stmts with a constant factor like the average SSA
+   use operands number.  */
+
+void
+simple_dce_from_worklist (bitmap worklist)
+{
+  while (! bitmap_empty_p (worklist))
+{
+  /* Pop item.  */
+  unsigned i = bitmap_first_set_bit (worklist);
+  bitmap_clear_bit (worklist, i);
+
+  tree def = ssa_name (i);
+  /* Removed by somebody else or still in use.  */
+  if (! def || ! has_zero_uses (def))
+	continue;
+
+  gimple *t = SSA_NAME_DEF_STMT (def);
+  if (gimple_has_side_effects (t))
+	continue;
+
+  /* Add uses to the worklist.  */
+  ssa_op_iter iter;
+  use_operand_p use_p;
+  FOR_EACH_PHI_OR_STMT_USE (use_p, t, iter, SSA_OP_USE)
+	{
+	  tree use = USE_FROM_PTR (use_p);
+	  if (TREE_CODE (use) == SSA_NAME
+	  && ! SSA_NAME_IS_DEFAULT_DEF (use))
+	bitmap_set_bit (worklist, SSA_NAME_VERSION (use));
+	}
+
+  /* Remove stmt.  */
+  if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Removing dead stmt:");
+	  print_gimple_stmt (dump_file, t, 0);
+	}
+  gimple_stmt_iterator gsi = gsi_for_stmt (t);
+  if (gimple_code (t) == GIMPLE_PHI)
+	remove_phi_node (&gsi, true);
+  else
+	{
+	  gsi_remove (&gsi, true);
+	  release_defs (t);
+	}
+}
+}
diff --git a/gcc/tree-ssa-dce.h b/gcc/tree-ssa-dce.h
new file mode 100644
index 000..2adb086
--- /dev/null
+++ b/gcc/tree-ssa-dce.h
@@ -0,0 +1,22 @@
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#ifndef TREE_SSA_DCE_H
+#define TREE_SSA_DCE_H
+extern void simple_dce_from_worklist (bitmap);
+#endif
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 281f100..c19d486 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ 

Re: [Ping][PATCH v3] Fix Incorrect ASan global variables alignment on arm (PR sanitizer/81697)

2017-11-29 Thread Jakub Jelinek
On Tue, Nov 28, 2017 at 10:04:51AM +0300, Maxim Ostapenko wrote:
> (CC'ing Jakub and Ramana)
> 
> I would like to ping the following patch:
> https://gcc.gnu.org/ml/gcc-patches/2017-10/msg02288.html
> Fix Incorrect ASan global variables alignment on arm (PR sanitizer/81697)
> 
> -Maxim

> gcc/ChangeLog:
> 
> 2017-11-28  Maxim Ostapenko  
> 
>   PR sanitizer/81697
>   * asan.c (asan_protect_global): Add new ignore_decl_rtl_set_p
>   parameter. Return true if ignore_decl_rtl_set_p is true and other
>   conditions are satisfied.
>   * asan.h (asan_protect_global): Add new parameter.
>   * varasm.c (categorize_decl_for_section): Pass true as second parameter
>   to asan_protect_global calls.
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-11-28  Maxim Ostapenko  
> 
>   PR sanitizer/81697
>   * g++.dg/asan/global-alignment.C: New test.
> 
> diff --git a/gcc/asan.c b/gcc/asan.c
> index d5128aa..78c3b60 100644
> --- a/gcc/asan.c
> +++ b/gcc/asan.c
> @@ -1605,7 +1605,7 @@ is_odr_indicator (tree decl)
> ASAN_RED_ZONE_SIZE bytes.  */
>  
>  bool
> -asan_protect_global (tree decl)
> +asan_protect_global (tree decl, bool ignore_decl_rtl_set_p)
>  {
>if (!ASAN_GLOBALS)
>  return false;
> @@ -1627,7 +1627,13 @@ asan_protect_global (tree decl)
>|| DECL_THREAD_LOCAL_P (decl)
>/* Externs will be protected elsewhere.  */
>|| DECL_EXTERNAL (decl)
> -  || !DECL_RTL_SET_P (decl)
> +  /* PR sanitizer/81697: For architectures that use section anchors first
> +  call to asan_protect_global may occur before DECL_RTL (decl) is set.
> +  We should ignore DECL_RTL_SET_P then, because otherwise the first call
> +  to asan_protect_global will return FALSE and the following calls on the
> +  same decl after setting DECL_RTL (decl) will return TRUE and we'll end
> +  up with inconsistency at runtime.  */
> +  || (!DECL_RTL_SET_P (decl) && !ignore_decl_rtl_set_p)

This part is fine.

>/* Comdat vars pose an ABI problem, we can't know if
>the var that is selected by the linker will have
>padding or not.  */
> @@ -1651,6 +1657,9 @@ asan_protect_global (tree decl)
>|| is_odr_indicator (decl))
>  return false;
>  
> +  if (ignore_decl_rtl_set_p)
> +return true;

This isn't, because then you bypass checks that really don't care
about RTL, like:
  if (lookup_attribute ("weakref", DECL_ATTRIBUTES (decl)))
return false;

  if (!TARGET_SUPPORTS_ALIASES && asan_needs_local_alias (decl))
return false;

So, IMHO just wrap only the DECL_RTL/symbol related stuff in if
(!ignore_decl_rtl_set_p).  Perhaps even better in
if (!ignore_decl_rtl_set_p || DECL_RTL_SET_P (decl))
so that if the rtl is already set, you do the check anyway.
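The inconsistency being fixed can be modeled as a predicate whose answer must not flip between calls.  This is a toy model only; the real check involves DECL_RTL, section anchors, and many more conditions:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: decide whether to ASan-protect a global.  RTL_SET may be
   false on the first query (with section anchors) and true later.
   With IGNORE_RTL, the early query gives the same answer as the late
   one, so callers never see a false-then-true flip that would leave
   the runtime metadata inconsistent.  */
static bool
protect_p (bool rtl_set, bool ignore_rtl, bool otherwise_ok)
{
  if (!rtl_set && !ignore_rtl)
    return false;
  return otherwise_ok;
}
```

Without the new parameter the early and late answers disagree for an otherwise-protectable global; with it they agree.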

> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/asan/global-alignment.C
> @@ -0,0 +1,18 @@
> +/* { dg-options "-fmerge-all-constants" } */
> +/* { dg-do compile } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
> +
> +#include <map>
> +#include <string>
> +
> +const char kRecoveryInstallString[] = "NEW";
> +const char kRecoveryUpdateString[] = "UPDATE";
> +const char kRecoveryUninstallationString[] = "UNINSTALL";
> +
> +const std::map<std::string, int> kStringToRequestMap = {
> +  {kRecoveryInstallString, 0},
> +  {kRecoveryUpdateString, 0},
> +  {kRecoveryUninstallationString, 0},
> +};
> +
> +/* { dg-final { scan-assembler-times 
> {\.section\s+\.rodata\n(?:(?!\.section).)*\.\w+\s+"NEW} 1 } } */

I don't really like the testcase.  Scanning for the .rodata section is too
fragile; e.g. darwin will need something completely different, and doesn't
powerpc use the .sdata section instead, etc.?

And aren't std::map and std::string completely unnecessary?
I'd prefer a runtime test that shows the false positive without the patch,
preferably in C, just with those const char arrays used by some C code
without any headers.

> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index a139151..849eae0 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -6508,7 +6508,7 @@ categorize_decl_for_section (const_tree decl, int reloc)
>else if (TREE_CODE (decl) == STRING_CST)
>  {
>if ((flag_sanitize & SANITIZE_ADDRESS)
> -   && asan_protect_global (CONST_CAST_TREE (decl)))
> +   && asan_protect_global (CONST_CAST_TREE (decl), true))

This doesn't make sense to me.  asan_protect_global for STRING_CST doesn't
care about any RTL, nor -fsection-anchors puts them into any kind of
block.

>/* or !flag_merge_constants */
>  return SECCAT_RODATA;
>else
> @@ -6536,7 +6536,7 @@ categorize_decl_for_section (const_tree decl, int reloc)
>   ret = reloc == 1 ? SECCAT_DATA_REL_RO_LOCAL : SECCAT_DATA_REL_RO;
>else if (reloc || flag_merge_constants < 2
>  || ((flag_sanitize & SANITIZE_ADDRESS)
> -&& asan_protect_global (CONST_CAST_TREE (decl
> +  

Re: [PATCH v3 1/4] [SPARC] Errata workaround for GRLIB-TN-0012

2017-11-29 Thread Eric Botcazou
> 2017-11-17  Daniel Cederman  
> 
>   * config/sparc/sparc.c (fpop_insn_p): New function.
>   (sparc_do_work_around_errata): Insert NOP instructions to
>   prevent sequences that could trigger the TN-0012 errata for
>   GR712RC.
>   (pass_work_around_errata::gate): Also test sparc_fix_gr712rc.
>   * config/sparc/sparc.md (fix_gr712rc): New attribute.
>   (in_branch_annul_delay): Prevent floating-point instructions
>   in delay slot of annulled integer branch.

OK for mainline and 7 branch, thanks.

-- 
Eric Botcazou


Re: [PATCH GCC]Rename and make remove_dead_inserted_code a simple dce interface

2017-11-29 Thread Richard Biener
On Tue, Nov 28, 2017 at 3:48 PM, Bin Cheng  wrote:
> Hi,
> This patch renames remove_dead_inserted_code to simple_dce_from_worklist, 
> moves it to tree-ssa-dce.c
> and makes it a simple public DCE interface.  Bootstrap and test along with 
> loop interchange.  It's required
> for interchange pass.  Is it OK?

+  /* ???  Re-use seeds as worklist not only as initial set.  This may end up
+ removing more code as well.  If we keep seeds unchanged we could restrict
+ new worklist elements to members of seed.  */

Please remove this comment; while it applies to PRE when one takes
remove_dead_inserted_code
literally, it doesn't apply to a seeded DCE.

Please also rename 'seeds' to 'worklist' directly and document that
worklist is consumed by the function.
The function has linear complexity in the number of dead stmts; the
constant factor is the number of
SSA use operands in those stmts (so 2 on average, I'd say).

Ok with that change.

Thanks,
Richard.

> BTW, I will push this along with interchange to branch: 
> gcc.gnu.org/svn/gcc/branches/gimple-linterchange.
>
> Thanks,
> bin
> 2017-11-27  Bin Cheng  
>
> * tree-ssa-dce.c (simple_dce_from_worklist): Move and rename from
> tree-ssa-pre.c::remove_dead_inserted_code.
> * tree-ssa-dce.h: New file.
> * tree-ssa-pre.c (tree-ssa-dce.h): Include new header file.
> (remove_dead_inserted_code): Move and rename to function
> tree-ssa-dce.c::simple_dce_from_worklist.
> (pass_pre::execute): Update use.


Re: (patch, fortran] PR83021 - [7/8 Regression] gfortran segfault in polymorphic assignment

2017-11-29 Thread Paul Richard Thomas
Committed to 7-branch and trunk as r255205 and r255202 respectively.

Paul


On 26 November 2017 at 18:40, Paul Richard Thomas
 wrote:
> Dear All,
>
> This regression was caused by the patch for PR81447. The chunk that
> has been modified came about because use association of derived types
> in block data, in the presence of a vtable, was trying to add vtable
> procedures, which is not allowed. The original patch did not
> explicitly target block data and this is fixed here. I decided that a
> testcase was not necessary but this could be done if desired.
>
> Bootstrapped and regtested on FC23/x86_64 - OK for both branches?
>
> I will commit tomorrow morning if there are no complaints.
>
> Best regards
>
> Paul
>
> 2017-11-26  Paul Thomas  
>
> PR fortran/83021
> * resolve.c (resolve_component): Only escape for use associated
> vtypes if the current namespace has no proc_name and is most
> particularly block data.



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


[Patch, fortran] PR83076 - [8 Regression] ICE in gfc_deallocate_scalar_with_status, at fortran/trans.c:1598

2017-11-29 Thread Paul Richard Thomas
This problem is not really a regression but is a "feature" that was
exposed by my patch for PR81447.

The testcase fails because the caf token for the pointer component is
not present in the type. This is fixed in trans-types.c
(gfc_get_derived_type) in the manner described in the ChangeLog.

Bootstrapped and regtested on FC23/x86_64 - OK for trunk?

I would be grateful if caf aficionados would give the patch a whirl on
their favourite codes.

Cheers

Paul

2017-11-29  Paul Thomas  

PR fortran/83076
* trans-types.c (gfc_get_derived_type): Flag GFC_FCOARRAY_LIB
for module derived types that are not vtypes. Use this flag to
use the module backend_decl as the canonical type and to build
the type anew, ensuring that scalar allocatable and pointer
components have the caf token field added.

2017-11-29  Paul Thomas  

PR fortran/83076
* gfortran.dg/coarray_45.f90 : New test.
Index: gcc/fortran/trans-types.c
===
*** gcc/fortran/trans-types.c   (revision 255161)
--- gcc/fortran/trans-types.c   (working copy)
*** gfc_get_derived_type (gfc_symbol * deriv
*** 2483,2488 
--- 2483,2492 
gfc_dt_list *dt;
gfc_namespace *ns;
tree tmp;
+   bool coarray_flag;
+
+   coarray_flag = flag_coarray == GFC_FCOARRAY_LIB
+&& derived->module && !derived->attr.vtype;

gcc_assert (!derived->attr.pdt_template);

*** gfc_get_derived_type (gfc_symbol * deriv
*** 2523,2534 
return derived->backend_decl;
  }

!   /* If use associated, use the module type for this one.  */
if (derived->backend_decl == NULL
&& derived->attr.use_assoc
&& derived->module
&& gfc_get_module_backend_decl (derived))
! goto copy_derived_types;

/* The derived types from an earlier namespace can be used as the
   canonical type.  */
--- 2527,2545 
return derived->backend_decl;
  }

!   /* If use associated, use the module type for this one, except for the case
!  where codimensions are present or where a caf_token is needed for pointer
!  or allocatable components. */
if (derived->backend_decl == NULL
&& derived->attr.use_assoc
&& derived->module
&& gfc_get_module_backend_decl (derived))
! {
!   if (coarray_flag || codimen)
!   got_canonical = true;
!   else
!   goto copy_derived_types;
! }

/* The derived types from an earlier namespace can be used as the
   canonical type.  */
*** gfc_get_derived_type (gfc_symbol * deriv
*** 2764,2770 
GFC_DECL_PTR_ARRAY_P (c->backend_decl) = 1;

/* Do not add a caf_token field for classes' data components.  */
!   if (codimen && !c->attr.dimension && !c->attr.codimension
  && (c->attr.allocatable || c->attr.pointer)
  && c->caf_token == NULL_TREE && strcmp ("_data", c->name) != 0)
{
--- 2775,2782 
GFC_DECL_PTR_ARRAY_P (c->backend_decl) = 1;

/* Do not add a caf_token field for classes' data components.  */
!   if ((codimen || coarray_flag)
! && !c->attr.dimension && !c->attr.codimension
  && (c->attr.allocatable || c->attr.pointer)
  && c->caf_token == NULL_TREE && strcmp ("_data", c->name) != 0)
{
Index: gcc/testsuite/gfortran.dg/coarray_45.f90
===
*** gcc/testsuite/gfortran.dg/coarray_45.f90(nonexistent)
--- gcc/testsuite/gfortran.dg/coarray_45.f90(working copy)
***
*** 0 
--- 1,24 
+ ! { dg-do compile }
+ ! { dg-options "-fcoarray=lib -lcaf_single " }
+ !
+ ! Test the fix for PR83076
+ !
+ module m
+type t
+   integer, pointer :: z
+end type
+type(t) :: ptr
+ contains
+function g(x)
+   type(t) :: x[*]
+   if (associated (x%z, ptr%z)) deallocate (x%z) ! This used to ICE with -fcoarray=lib
+end
+ end module
+
+   use m
+ contains
+function f(x)
+   type(t) :: x[*]
+   if (associated (x%z, ptr%z)) deallocate (x%z)
+end
+ end


Re: [PATCH] Implement std::to_address for C++2a

2017-11-29 Thread Glen Fernandes
(Also added a new [_neg] test)

Move static_assert for function pointers to __to_address

2017-11-28  Glen Joseph Fernandes  

* include/bits/ptr_traits.h (to_address): Moved static_assert.
* testsuite/20_util/to_address/1_neg.cc: New test.


Tested x86_64-pc-linux-gnu.
commit 0081e4be38f203a0a3d40c66cfa8b1f7b88f8e61
Author: Glen Joseph Fernandes 
Date:   Tue Nov 28 23:56:48 2017 -0500

Move static_assert for function pointers to __to_address

2017-11-28  Glen Joseph Fernandes  

* include/bits/ptr_traits.h (to_address): Moved static_assert.
* testsuite/20_util/to_address/1_neg.cc: New test.

diff --git a/libstdc++-v3/include/bits/ptr_traits.h 
b/libstdc++-v3/include/bits/ptr_traits.h
index 67cc7e97a80..11a0deb9448 100644
--- a/libstdc++-v3/include/bits/ptr_traits.h
+++ b/libstdc++-v3/include/bits/ptr_traits.h
@@ -147,11 +147,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 using __ptr_rebind = typename pointer_traits<_Ptr>::template rebind<_Tp>;
 
   template
 constexpr _Tp*
 __to_address(_Tp* __ptr) noexcept
-{ return __ptr; }
+{
+  static_assert(!std::is_function<_Tp>::value, "not a function pointer");
+  return __ptr;
+}
 
 #if __cplusplus <= 201703L
   template
 constexpr typename std::pointer_traits<_Ptr>::element_type*
 __to_address(const _Ptr& __ptr)
@@ -175,14 +178,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* @ingroup pointer_abstractions
   */
   template
 constexpr _Tp*
 to_address(_Tp* __ptr) noexcept
-{
-  static_assert(!std::is_function_v<_Tp>, "not a pointer to function");
-  return __ptr;
-}
+{ return std::__to_address(__ptr); }
 
   /**
* @brief Obtain address referenced by a pointer to an object
* @param __ptr A pointer to an object
* @return @c pointer_traits<_Ptr>::to_address(__ptr) if that expression is
diff --git a/libstdc++-v3/testsuite/20_util/to_address/1_neg.cc 
b/libstdc++-v3/testsuite/20_util/to_address/1_neg.cc
new file mode 100644
index 000..c80013aa930
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/to_address/1_neg.cc
@@ -0,0 +1,36 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+// { dg-error "not a function pointer" "" { target *-*-* } 153 }
+
+#include 
+
+struct P
+{
+  using element_type = void();
+
+  element_type* operator->() const noexcept
+  { return nullptr; }
+};
+
+void test01()
+{
+  P p;
+  std::to_address(p); // { dg-error "required from here" }
+}


Re: [Patch][x86, backport] Backport to GCC-6 vzeroupper patches

2017-11-29 Thread Uros Bizjak
On Wed, Nov 29, 2017 at 10:46 AM, Peryt, Sebastian
 wrote:
> Hi,
>
> I'd like to ask for backporting to GCC-6 branch vzeroupper generation patches 
> from trunk,
> that are resolving 3 PRs:
> PR target/82941
> PR target/82942
> PR target/82990
>
> Two patches were combined into one and rebased. Bootstrapped and tested.
> Is it ok for merge?
>
> Changelog:
>
> Fix PR82941 and PR82942 by adding proper vzeroupper generation on SKX.
> Add X86_TUNE_EMIT_VZEROUPPER to indicate if vzeroupper instruction
> should be inserted before a transfer of control flow out of the function.
> It is turned on by default unless we are tuning for KNL.  Users can always
> use -mzeroupper or -mno-zeroupper to override X86_TUNE_EMIT_VZEROUPPER.
>
> 2017-11-29  Sebastian Peryt  
> H.J. Lu  
>
> gcc/
> Backported from trunk
> PR target/82941
> PR target/82942
> PR target/82990
> * config/i386/i386.c (pass_insert_vzeroupper): Remove
> TARGET_AVX512F check from gate condition.
> (ix86_check_avx256_register): Changed to ...
> (ix86_check_avx_upper_register): ... this. Add extra check for
> VALID_AVX512F_REG_OR_XI_MODE.
> (ix86_avx_u128_mode_needed): Changed
> ix86_check_avx256_register to ix86_check_avx_upper_register.
> (ix86_check_avx256_stores): Changed to ...
> (ix86_check_avx_upper_stores): ... this. Changed
> ix86_check_avx256_register to ix86_check_avx_upper_register.
> (ix86_avx_u128_mode_after): Changed
> avx_reg256_found to avx_upper_reg_found. Changed
> ix86_check_avx256_stores to ix86_check_avx_upper_stores.
> (ix86_avx_u128_mode_entry): Changed
> ix86_check_avx256_register to ix86_check_avx_upper_register.
> (ix86_avx_u128_mode_exit): Ditto.
> (ix86_option_override_internal): Set MASK_VZEROUPPER if
> neither -mzeroupper nor -mno-zeroupper is used and
> TARGET_EMIT_VZEROUPPER is set.
> * config/i386/i386.h: (host_detect_local_cpu): New define.
> (TARGET_EMIT_VZEROUPPER): New.
> * config/i386/x86-tune.def: Add X86_TUNE_EMIT_VZEROUPPER.
>
> 2017-11-29  Sebastian Peryt  
> H.J. Lu  
>
> gcc/testsuite/
> Backported from trunk
> PR target/82941
> PR target/82942
> PR target/82990
> * gcc.target/i386/pr82941-1.c: New test.
> * gcc.target/i386/pr82941-2.c: Likewise.
> * gcc.target/i386/pr82942-1.c: Likewise.
> * gcc.target/i386/pr82942-2.c: Likewise.
> * gcc.target/i386/pr82990-1.c: Likewise.
> * gcc.target/i386/pr82990-2.c: Likewise.
> * gcc.target/i386/pr82990-3.c: Likewise.
> * gcc.target/i386/pr82990-4.c: Likewise.
> * gcc.target/i386/pr82990-5.c: Likewise.
> * gcc.target/i386/pr82990-6.c: Likewise.
> * gcc.target/i386/pr82990-7.c: Likewise.

I doubt that gcc-6 is used for KNL, but OK anyway.

Uros.


Re: [Patch][x86, backport] Backport to GCC-7 vzeroupper patches

2017-11-29 Thread Uros Bizjak
On Wed, Nov 29, 2017 at 10:39 AM, Peryt, Sebastian
 wrote:
> Hi,
>
> I'd like to ask for backporting to GCC-7 branch vzeroupper generation patches 
> from trunk,
> that are resolving 3 PRs:
> PR target/82941
> PR target/82942
> PR target/82990
>
> Two patches were combined into one and rebased. Bootstrapped and tested.
> Is it ok for merge?
>
> Changelog:
>
> Fix PR82941 and PR82942 by adding proper vzeroupper generation on SKX.
> Add X86_TUNE_EMIT_VZEROUPPER to indicate if vzeroupper instruction should
> be inserted before a transfer of control flow out of the function.  It is
> turned on by default unless we are tuning for KNL.  Users can always use
> -mzeroupper or -mno-zeroupper to override X86_TUNE_EMIT_VZEROUPPER.
>
> 2017-11-29  Sebastian Peryt  
> H.J. Lu  
>
> gcc/
> Backported from trunk
> PR target/82941
> PR target/82942
> PR target/82990
> * config/i386/i386.c (pass_insert_vzeroupper): Remove
> TARGET_AVX512F check from gate condition.
> (ix86_check_avx256_register): Changed to ...
> (ix86_check_avx_upper_register): ... this. Add extra check for
> VALID_AVX512F_REG_OR_XI_MODE.
> (ix86_avx_u128_mode_needed): Changed
> ix86_check_avx256_register to ix86_check_avx_upper_register.
> (ix86_check_avx256_stores): Changed to ...
> (ix86_check_avx_upper_stores): ... this. Changed
> ix86_check_avx256_register to ix86_check_avx_upper_register.
> (ix86_avx_u128_mode_after): Changed
> avx_reg256_found to avx_upper_reg_found. Changed
> ix86_check_avx256_stores to ix86_check_avx_upper_stores.
> (ix86_avx_u128_mode_entry): Changed
> ix86_check_avx256_register to ix86_check_avx_upper_register.
> (ix86_avx_u128_mode_exit): Ditto.
> (ix86_option_override_internal): Set MASK_VZEROUPPER if
> neither -mzeroupper nor -mno-zeroupper is used and
> TARGET_EMIT_VZEROUPPER is set.
> * config/i386/i386.h: (host_detect_local_cpu): New define.
> (TARGET_EMIT_VZEROUPPER): New.
> * config/i386/x86-tune.def: Add X86_TUNE_EMIT_VZEROUPPER.
>
> 2017-11-29  Sebastian Peryt  
> H.J. Lu  
>
> gcc/testsuite/
> Backported from trunk
> PR target/82941
> PR target/82942
> PR target/82990
> * gcc.target/i386/pr82941-1.c: New test.
> * gcc.target/i386/pr82941-2.c: Likewise.
> * gcc.target/i386/pr82942-1.c: Likewise.
> * gcc.target/i386/pr82942-2.c: Likewise.
> * gcc.target/i386/pr82990-1.c: Likewise.
> * gcc.target/i386/pr82990-2.c: Likewise.
> * gcc.target/i386/pr82990-3.c: Likewise.
> * gcc.target/i386/pr82990-4.c: Likewise.
> * gcc.target/i386/pr82990-5.c: Likewise.
> * gcc.target/i386/pr82990-6.c: Likewise.
> * gcc.target/i386/pr82990-7.c: Likewise.

OK.

Thanks,
Uros.


[Patch][x86, backport] Backport to GCC-6 vzeroupper patches

2017-11-29 Thread Peryt, Sebastian
Hi,

I'd like to ask for backporting to GCC-6 branch vzeroupper generation patches 
from trunk,
that are resolving 3 PRs:
PR target/82941
PR target/82942
PR target/82990

Two patches were combined into one and rebased. Bootstrapped and tested.
Is it ok for merge?

Changelog:

Fix PR82941 and PR82942 by adding proper vzeroupper generation on SKX.
Add X86_TUNE_EMIT_VZEROUPPER to indicate if vzeroupper instruction
should be inserted before a transfer of control flow out of the function.
It is turned on by default unless we are tuning for KNL.  Users can always
use -mzeroupper or -mno-zeroupper to override X86_TUNE_EMIT_VZEROUPPER.

2017-11-29  Sebastian Peryt  
H.J. Lu  

gcc/
Backported from trunk
PR target/82941
PR target/82942
PR target/82990
* config/i386/i386.c (pass_insert_vzeroupper): Remove
TARGET_AVX512F check from gate condition.
(ix86_check_avx256_register): Changed to ...
(ix86_check_avx_upper_register): ... this. Add extra check for
VALID_AVX512F_REG_OR_XI_MODE.
(ix86_avx_u128_mode_needed): Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_check_avx256_stores): Changed to ...
(ix86_check_avx_upper_stores): ... this. Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_avx_u128_mode_after): Changed
avx_reg256_found to avx_upper_reg_found. Changed
ix86_check_avx256_stores to ix86_check_avx_upper_stores.
(ix86_avx_u128_mode_entry): Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_avx_u128_mode_exit): Ditto.
(ix86_option_override_internal): Set MASK_VZEROUPPER if
neither -mzeroupper nor -mno-zeroupper is used and
TARGET_EMIT_VZEROUPPER is set.
* config/i386/i386.h: (host_detect_local_cpu): New define.
(TARGET_EMIT_VZEROUPPER): New.
* config/i386/x86-tune.def: Add X86_TUNE_EMIT_VZEROUPPER.

2017-11-29  Sebastian Peryt  
H.J. Lu  

gcc/testsuite/
Backported from trunk
PR target/82941
PR target/82942
PR target/82990
* gcc.target/i386/pr82941-1.c: New test.
* gcc.target/i386/pr82941-2.c: Likewise.
* gcc.target/i386/pr82942-1.c: Likewise.
* gcc.target/i386/pr82942-2.c: Likewise.
* gcc.target/i386/pr82990-1.c: Likewise.
* gcc.target/i386/pr82990-2.c: Likewise.
* gcc.target/i386/pr82990-3.c: Likewise.
* gcc.target/i386/pr82990-4.c: Likewise.
* gcc.target/i386/pr82990-5.c: Likewise.
* gcc.target/i386/pr82990-6.c: Likewise.
* gcc.target/i386/pr82990-7.c: Likewise.

Thanks,
Sebastian


0001-backportPR82941-GCC-6.patch
Description: 0001-backportPR82941-GCC-6.patch


Re: [PATCH] Improve build_simple_mem_ref_loc (PR middle-end/83185)

2017-11-29 Thread Richard Biener
On Wed, 29 Nov 2017, Jakub Jelinek wrote:

> Hi!
> 
> This PR is about forwprop propagating:
>   _17 = __builtin_alloca_with_align (_16, 256);
>   _18 = _17 + 32;
>   __builtin___asan_alloca_poison (_18, _8);
>   _7 = &*_18[4];
>   __builtin_va_start (_7, 0);
> to:
>   _17 = __builtin_alloca_with_align (_16, 256);
>   _18 = _17 + 32;
>   __builtin___asan_alloca_poison (_18, _8);
>   _7 = [(struct [0:D.2257][1] *)_17 + 32B][4];
>   __builtin_va_start (_7, 0);
> which is something the verifiers allow and then backend VA_START
> handling calling build_simple_mem_ref_loc on the ADDR_EXPR it got.
> get_addr_base_and_unit_offset only looks through a MEM_REF if it
> has ADDR_EXPR as the first operand, which is not the case here, so nothing
> sums up the 96 offset from the [4] ARRAY_REF with the extra 32 from the
> MEM_REF.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

Ok.

Richard.

> 2017-11-29  Jakub Jelinek  
> 
>   PR middle-end/83185
>   * tree.c (build_simple_mem_ref_loc): Handle
>   get_addr_base_and_unit_offset returning a MEM_REF.
> 
>   * gcc.dg/asan/pr83185.c: New test.
> 
> --- gcc/tree.c.jj 2017-11-28 12:11:38.0 +0100
> +++ gcc/tree.c2017-11-28 17:22:01.800939050 +0100
> @@ -4692,7 +4692,13 @@ build_simple_mem_ref_loc (location_t loc
>  {
>ptr = get_addr_base_and_unit_offset (TREE_OPERAND (ptr, 0), );
>gcc_assert (ptr);
> -  ptr = build_fold_addr_expr (ptr);
> +  if (TREE_CODE (ptr) == MEM_REF)
> + {
> +   offset += mem_ref_offset (ptr).to_short_addr ();
> +   ptr = TREE_OPERAND (ptr, 0);
> + }
> +  else
> + ptr = build_fold_addr_expr (ptr);
>gcc_assert (is_gimple_reg (ptr) || is_gimple_min_invariant (ptr));
>  }
>tem = build2 (MEM_REF, TREE_TYPE (ptype),
> --- gcc/testsuite/gcc.dg/asan/pr83185.c.jj2017-11-28 17:24:15.540329283 
> +0100
> +++ gcc/testsuite/gcc.dg/asan/pr83185.c   2017-11-28 17:24:19.851277394 
> +0100
> @@ -0,0 +1,14 @@
> +/* PR middle-end/83185 */
> +/* { dg-do compile } */
> +
> +#include 
> +
> +int bar (void);
> +
> +void
> +foo (int i, ...)
> +{
> +  va_list aps[bar()];
> +  va_start (aps[4], i);
> +  va_end (aps[4]);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Improve seq_cost (PR middle-end/80929)

2017-11-29 Thread Richard Biener
On Wed, 29 Nov 2017, Jakub Jelinek wrote:

> Hi!
> 
> This PR complains that seq_cost counts all non-single_set insns as 1
> unconditionally, which is indeed bad.  For some like compare + arithmetics
> we even have now code in pattern_cost that handles those reasonably by
> default, for others targets have the option through a hook to return
> whatever they want.
> 
> On the other side, IMNSHO we don't want to count CODE_LABELs, NOTEs,
> BARRIERs, DEBUG_INSNs...
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Works for me.

Thanks,
Richard.

> 2017-11-29  Jakub Jelinek  
> 
>   PR middle-end/80929
>   * rtlanal.c (seq_cost): For non-single_set insns try to use insn_cost.
> 
> --- gcc/rtlanal.c.jj  2017-10-27 14:16:39.0 +0200
> +++ gcc/rtlanal.c 2017-11-27 13:35:46.321509933 +0100
> @@ -5342,7 +5342,13 @@ seq_cost (const rtx_insn *seq, bool spee
>if (set)
>  cost += set_rtx_cost (set, speed);
> -  else
> -cost++;
> +  else if (NONDEBUG_INSN_P (seq))
> + {
> +   int this_cost = insn_cost (CONST_CAST_RTX_INSN (seq), speed);
> +   if (this_cost > 0)
> + cost += this_cost;
> +   else
> + cost++;
> + }
>  }
>  
>return cost;
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[Patch][x86, backport] Backport to GCC-7 vzeroupper patches

2017-11-29 Thread Peryt, Sebastian
Hi,

I'd like to ask for backporting to GCC-7 branch vzeroupper generation patches 
from trunk,
that are resolving 3 PRs:
PR target/82941
PR target/82942
PR target/82990

Two patches were combined into one and rebased. Bootstrapped and tested.
Is it ok for merge?

Changelog:

Fix PR82941 and PR82942 by adding proper vzeroupper generation on SKX.
Add X86_TUNE_EMIT_VZEROUPPER to indicate if vzeroupper instruction should
be inserted before a transfer of control flow out of the function.  It is
turned on by default unless we are tuning for KNL.  Users can always use
-mzeroupper or -mno-zeroupper to override X86_TUNE_EMIT_VZEROUPPER.

2017-11-29  Sebastian Peryt  
H.J. Lu  

gcc/
Backported from trunk
PR target/82941
PR target/82942
PR target/82990
* config/i386/i386.c (pass_insert_vzeroupper): Remove
TARGET_AVX512F check from gate condition.
(ix86_check_avx256_register): Changed to ...
(ix86_check_avx_upper_register): ... this. Add extra check for
VALID_AVX512F_REG_OR_XI_MODE.
(ix86_avx_u128_mode_needed): Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_check_avx256_stores): Changed to ...
(ix86_check_avx_upper_stores): ... this. Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_avx_u128_mode_after): Changed
avx_reg256_found to avx_upper_reg_found. Changed
ix86_check_avx256_stores to ix86_check_avx_upper_stores.
(ix86_avx_u128_mode_entry): Changed
ix86_check_avx256_register to ix86_check_avx_upper_register.
(ix86_avx_u128_mode_exit): Ditto.
(ix86_option_override_internal): Set MASK_VZEROUPPER if
neither -mzeroupper nor -mno-zeroupper is used and
TARGET_EMIT_VZEROUPPER is set.
* config/i386/i386.h: (host_detect_local_cpu): New define.
(TARGET_EMIT_VZEROUPPER): New.
* config/i386/x86-tune.def: Add X86_TUNE_EMIT_VZEROUPPER.

2017-11-29  Sebastian Peryt  
H.J. Lu  

gcc/testsuite/
Backported from trunk
PR target/82941
PR target/82942
PR target/82990
* gcc.target/i386/pr82941-1.c: New test.
* gcc.target/i386/pr82941-2.c: Likewise.
* gcc.target/i386/pr82942-1.c: Likewise.
* gcc.target/i386/pr82942-2.c: Likewise.
* gcc.target/i386/pr82990-1.c: Likewise.
* gcc.target/i386/pr82990-2.c: Likewise.
* gcc.target/i386/pr82990-3.c: Likewise.
* gcc.target/i386/pr82990-4.c: Likewise.
* gcc.target/i386/pr82990-5.c: Likewise.
* gcc.target/i386/pr82990-6.c: Likewise.
* gcc.target/i386/pr82990-7.c: Likewise.


Thanks,
Sebastian


0001-backportPR82942-GCC-7.patch
Description: 0001-backportPR82942-GCC-7.patch


Re: [PATCH] complex type canonicalization

2017-11-29 Thread Jakub Jelinek
On Wed, Nov 29, 2017 at 10:33:04AM +0100, Jakub Jelinek wrote:
> > -  /* We need to create a name, since complex is a fundamental type.  */
> > -  if (!TYPE_NAME (t) && named)
> > +  /* We need to create a name, since complex is a fundamental type.  */
> > +  if (named)
> > +   {
> > + const char *name = NULL;
> > +
> > + if (TREE_TYPE (t) == char_type_node)
> > +   name = "complex char";
> > + else if (TREE_TYPE (t) == signed_char_type_node)
> > +   name = "complex signed char";
> > + else if (TREE_TYPE (t) == unsigned_char_type_node)
> > +   name = "complex unsigned char";
> > + else if (TREE_TYPE (t) == short_integer_type_node)
> > +   name = "complex short int";
> > + else if (TREE_TYPE (t) == short_unsigned_type_node)
> > +   name = "complex short unsigned int";
> > + else if (TREE_TYPE (t) == integer_type_node)
> > +   name = "complex int";
> > + else if (TREE_TYPE (t) == unsigned_type_node)
> > +   name = "complex unsigned int";
> > + else if (TREE_TYPE (t) == long_integer_type_node)
> > +   name = "complex long int";
> > + else if (TREE_TYPE (t) == long_unsigned_type_node)
> > +   name = "complex long unsigned int";
> > + else if (TREE_TYPE (t) == long_long_integer_type_node)
> > +   name = "complex long long int";
> > + else if (TREE_TYPE (t) == long_long_unsigned_type_node)
> > +   name = "complex long long unsigned int";
> > +
> > + if (name != NULL)
> > +   TYPE_NAME (t) = build_decl (UNKNOWN_LOCATION, TYPE_DECL,
> > +   get_identifier (name), t);
> > +   }
> 
> Are you sure nothing can build_complex_type before build_common_tree_nodes
> is called and builds those?  If that would happen, we'd fail to add names
> with your patch, while we'd add them before.  The addition of TYPE_NAME
> looks orthogonal to the TYPE_CANONICAL handling and type hashing.
> I must also say I'm surprised that the recursive build_complex_type
> call passes in named too, but as it does type comparison by pointers,
> perhaps that's ok.

On the other side, the types that are == compared to are also (newly!, not
by reusing existing types) created in build_common_tree_nodes, so I think
your patch is ok as is.

Jakub


Re: [PATCH] complex type canonicalization

2017-11-29 Thread Jakub Jelinek
On Tue, Nov 28, 2017 at 01:27:54PM -0500, Nathan Sidwell wrote:
> --- tree.c(revision 255202)
> +++ tree.c(working copy)
> -  /* We need to create a name, since complex is a fundamental type.  */
> -  if (!TYPE_NAME (t) && named)
> +  /* We need to create a name, since complex is a fundamental type.  */
> +  if (named)
> + {
> +   const char *name = NULL;
> +
> +   if (TREE_TYPE (t) == char_type_node)
> + name = "complex char";
> +   else if (TREE_TYPE (t) == signed_char_type_node)
> + name = "complex signed char";
> +   else if (TREE_TYPE (t) == unsigned_char_type_node)
> + name = "complex unsigned char";
> +   else if (TREE_TYPE (t) == short_integer_type_node)
> + name = "complex short int";
> +   else if (TREE_TYPE (t) == short_unsigned_type_node)
> + name = "complex short unsigned int";
> +   else if (TREE_TYPE (t) == integer_type_node)
> + name = "complex int";
> +   else if (TREE_TYPE (t) == unsigned_type_node)
> + name = "complex unsigned int";
> +   else if (TREE_TYPE (t) == long_integer_type_node)
> + name = "complex long int";
> +   else if (TREE_TYPE (t) == long_unsigned_type_node)
> + name = "complex long unsigned int";
> +   else if (TREE_TYPE (t) == long_long_integer_type_node)
> + name = "complex long long int";
> +   else if (TREE_TYPE (t) == long_long_unsigned_type_node)
> + name = "complex long long unsigned int";
> +
> +   if (name != NULL)
> + TYPE_NAME (t) = build_decl (UNKNOWN_LOCATION, TYPE_DECL,
> + get_identifier (name), t);
> + }

Are you sure nothing can build_complex_type before build_common_tree_nodes
is called and builds those?  If that would happen, we'd fail to add names
with your patch, while we'd add them before.  The addition of TYPE_NAME
looks orthogonal to the TYPE_CANONICAL handling and type hashing.
I must also say I'm surprised that the recursive build_complex_type
call passes in named too, but as it does type comparison by pointers,
perhaps that's ok.

Otherwise LGTM.

Jakub


Re: [PATCH] Fix vec_concatv2di pattern for SSE4 (PR target/80819)

2017-11-29 Thread Uros Bizjak
On Wed, Nov 29, 2017 at 9:24 AM, Jakub Jelinek  wrote:
> Hi!
>
> Before r218303 we had just (=x,0,rm) alternative for SSE4 (no AVX),
> that change turned it into (=Yr,0,*rm) and (=*x,0,rm) alternatives,
> so that we avoid too many prefixes if possible.
> The latter alternative is fine, we want the *, because that is the point,
> Yr class is the subset of x registers that don't need the REX prefix.
> The * in the first alternative makes no sense, with it IRA is effectively
> forced to allocate the second vec_concat pseudo into NO_REGS - memory,
> and while postreload can fix it up afterwards, we end up with dead stores
> that nothing ever removes afterwards.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
>
> 2017-11-29  Jakub Jelinek  
>
> PR target/80819
> * config/i386/sse.md (vec_concatv2di): Remove * from (=Yr,0,*rm)
> alternative.
>
> * gcc.target/i386/pr80819-1.c: New test.
> * gcc.target/i386/pr80819-2.c: New test.

OK.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2017-11-24 08:58:05.0 +0100
> +++ gcc/config/i386/sse.md  2017-11-28 18:04:20.739396199 +0100
> @@ -13915,7 +13915,7 @@ (define_insn "vec_concatv2di"
>   (match_operand:DI 1 "nonimmediate_operand"
>   "  0, 0,x ,Yv,r ,vm,?!*Yn,0,Yv,0,0,v")
>   (match_operand:DI 2 "vector_move_operand"
> - "*rm,rm,rm,rm,C ,C ,C ,x,Yv,x,m,m")))]
> + " rm,rm,rm,rm,C ,C ,C ,x,Yv,x,m,m")))]
>"TARGET_SSE"
>"@
> pinsrq\t{$1, %2, %0|%0, %2, 1}
> --- gcc/testsuite/gcc.target/i386/pr80819-1.c.jj2017-11-28 
> 18:11:09.452482042 +0100
> +++ gcc/testsuite/gcc.target/i386/pr80819-1.c   2017-11-28 18:09:57.0 
> +0100
> @@ -0,0 +1,13 @@
> +/* PR target/80819 */
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -msse4 -mno-avx -mtune=haswell -masm=att" } */
> +
> +typedef unsigned long long V __attribute__((vector_size (16)));
> +
> +V
> +foo (unsigned long long x, unsigned long long y)
> +{
> +  return (V) { x, y };
> +}
> +
> +/* { dg-final { scan-assembler-not "movq\[ \t]*%rsi, \[-0-9]*\\(" } } */
> --- gcc/testsuite/gcc.target/i386/pr80819-2.c.jj2017-11-28 
> 18:11:15.942404034 +0100
> +++ gcc/testsuite/gcc.target/i386/pr80819-2.c   2017-11-28 18:11:21.915332239 
> +0100
> @@ -0,0 +1,13 @@
> +/* PR target/80819 */
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -msse4 -mno-avx -mtune=generic -masm=att" } */
> +
> +typedef unsigned long long V __attribute__((vector_size (16)));
> +
> +V
> +foo (unsigned long long x, unsigned long long y)
> +{
> +  return (V) { x, y };
> +}
> +
> +/* { dg-final { scan-assembler-not "movq\[ \t]*%rsi, \[-0-9]*\\(" } } */
>
> Jakub


Re: [RFC][PATCH] Extend DCE to remove unnecessary new/delete-pairs

2017-11-29 Thread Jakub Jelinek
On Tue, Nov 28, 2017 at 09:11:00PM -0700, Martin Sebor wrote:
> On 11/27/2017 02:22 AM, Dominik Inführ wrote:
> > Thanks for all the reviews! I’ve revised the patch, the 
> > operator_delete_flag is now stored in tree_decl_with_vis (there already 
> > seem to be some FUNCTION_DECL-flags in there). I’ve also added the option 
> > -fallocation-dce to disable this optimization. It bootstraps and no 
> > regressions on aarch64 and x86_64.
> > 
> It's great to be able to eliminate pairs of these calls.  For
> unpaired calls, though, I think it would be even more useful to
> also issue a warning.  Otherwise the elimination will mask bugs

??  I hope you're only talking about allocations where the returned
pointer can't leak elsewhere.  Doing the allocation in one function
(e.g. a constructor, or whatever other function) and the deallocation in
some other one is so common that such a warning would be not just
useless, but harmful, with almost all occurrences being false positives.

Warning on malloc/standard operator new or malloc/realloc-like function
when the return pointer can't escape the current function is reasonable.

Jakub


[PATCH] Fix vec_concatv2di pattern for SSE4 (PR target/80819)

2017-11-29 Thread Jakub Jelinek
Hi!

Before r218303 we had just (=x,0,rm) alternative for SSE4 (no AVX),
that change turned it into (=Yr,0,*rm) and (=*x,0,rm) alternatives,
so that we avoid too many prefixes if possible.
The latter alternative is fine, we want the *, because that is the point,
Yr class is the subset of x registers that don't need the REX prefix.
The * in the first alternative makes no sense, with it IRA is effectively
forced to allocate the second vec_concat pseudo into NO_REGS - memory,
and while postreload can fix it up afterwards, we end up with dead stores
that nothing ever removes afterwards.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2017-11-29  Jakub Jelinek  

PR target/80819
* config/i386/sse.md (vec_concatv2di): Remove * from (=Yr,0,*rm)
alternative.

* gcc.target/i386/pr80819-1.c: New test.
* gcc.target/i386/pr80819-2.c: New test.

--- gcc/config/i386/sse.md.jj   2017-11-24 08:58:05.0 +0100
+++ gcc/config/i386/sse.md  2017-11-28 18:04:20.739396199 +0100
@@ -13915,7 +13915,7 @@ (define_insn "vec_concatv2di"
  (match_operand:DI 1 "nonimmediate_operand"
  "  0, 0,x ,Yv,r ,vm,?!*Yn,0,Yv,0,0,v")
  (match_operand:DI 2 "vector_move_operand"
- "*rm,rm,rm,rm,C ,C ,C ,x,Yv,x,m,m")))]
+ " rm,rm,rm,rm,C ,C ,C ,x,Yv,x,m,m")))]
   "TARGET_SSE"
   "@
pinsrq\t{$1, %2, %0|%0, %2, 1}
--- gcc/testsuite/gcc.target/i386/pr80819-1.c.jj2017-11-28 
18:11:09.452482042 +0100
+++ gcc/testsuite/gcc.target/i386/pr80819-1.c   2017-11-28 18:09:57.0 
+0100
@@ -0,0 +1,13 @@
+/* PR target/80819 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -msse4 -mno-avx -mtune=haswell -masm=att" } */
+
+typedef unsigned long long V __attribute__((vector_size (16)));
+
+V
+foo (unsigned long long x, unsigned long long y)
+{
+  return (V) { x, y };
+}
+
+/* { dg-final { scan-assembler-not "movq\[ \t]*%rsi, \[-0-9]*\\(" } } */
--- gcc/testsuite/gcc.target/i386/pr80819-2.c.jj2017-11-28 
18:11:15.942404034 +0100
+++ gcc/testsuite/gcc.target/i386/pr80819-2.c   2017-11-28 18:11:21.915332239 
+0100
@@ -0,0 +1,13 @@
+/* PR target/80819 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -msse4 -mno-avx -mtune=generic -masm=att" } */
+
+typedef unsigned long long V __attribute__((vector_size (16)));
+
+V
+foo (unsigned long long x, unsigned long long y)
+{
+  return (V) { x, y };
+}
+
+/* { dg-final { scan-assembler-not "movq\[ \t]*%rsi, \[-0-9]*\\(" } } */

Jakub


[PATCH] Fix PR tree-optimization/83195 testcase for arm

2017-11-29 Thread Jakub Jelinek
Hi!

The pr82929.c testcase uses store_merge effective target, which is
int32plus && nonstrict_align.  Unfortunately, arm is handled for
nonstrict_align wierdly, although it has STRICT_ALIGNMENT 1, it is sometimes
considered nonstrict_align (the only exception apparently).

Now, the testcase really needs a non-strict alignment target where
STRICT_ALIGNMENT is 0, otherwise the optimization it tests is not
beneficial.  So, the following patch stops testing it on arm, and adds
another test where the pointers are guaranteed to be aligned and thus we
can test for the optimization even on non-strict alignment targets.

Bootstrapped/regtested on x86_64-linux and i686-linux, tested by hand using
a cross-compiler to arm, ok for trunk?

2017-11-29  Jakub Jelinek  

PR tree-optimization/83195
* gcc.dg/pr82929.c: Don't check for "Merging successful" on arm.
* gcc.dg/pr82929-2.c: New test.

--- gcc/testsuite/gcc.dg/pr82929.c.jj   2017-11-10 15:42:39.0 +0100
+++ gcc/testsuite/gcc.dg/pr82929.c  2017-11-28 17:50:43.705221829 +0100
@@ -15,4 +15,4 @@ foo (short *p, short *q, short *r)
   p[1] = e & f;
 }
 
-/* { dg-final { scan-tree-dump-times "Merging successful" 1 "store-merging" } } */
+/* { dg-final { scan-tree-dump-times "Merging successful" 1 "store-merging" { target { ! arm*-*-* } } } } */
--- gcc/testsuite/gcc.dg/pr82929-2.c.jj	2017-11-28 17:47:41.858409094 +0100
+++ gcc/testsuite/gcc.dg/pr82929-2.c	2017-11-28 17:48:55.264526160 +0100
@@ -0,0 +1,21 @@
+/* PR tree-optimization/82929 */
+/* { dg-do compile { target store_merge } } */
+/* { dg-options "-O2 -fdump-tree-store-merging" } */
+
+void
+foo (short *p, short *q, short *r)
+{
+  p = __builtin_assume_aligned (p, __alignof__ (int));
+  q = __builtin_assume_aligned (q, __alignof__ (int));
+  r = __builtin_assume_aligned (r, __alignof__ (int));
+  short a = q[0];
+  short b = q[1];
+  short c = ~a;
+  short d = r[0];
+  short e = r[1];
+  short f = ~b;
+  p[0] = c & d;
+  p[1] = e & f;
+}
+
+/* { dg-final { scan-tree-dump-times "Merging successful" 1 "store-merging" } } */

Jakub


[PATCH] Improve build_simple_mem_ref_loc (PR middle-end/83185)

2017-11-29 Thread Jakub Jelinek
Hi!

This PR is about forwprop propagating:
  _17 = __builtin_alloca_with_align (_16, 256);
  _18 = _17 + 32;
  __builtin___asan_alloca_poison (_18, _8);
  _7 = &*_18[4];
  __builtin_va_start (_7, 0);
to:
  _17 = __builtin_alloca_with_align (_16, 256);
  _18 = _17 + 32;
  __builtin___asan_alloca_poison (_18, _8);
+  _7 = &MEM[(struct [0:D.2257][1] *)_17 + 32B][4];
  __builtin_va_start (_7, 0);
which is something the verifiers allow, and the backend VA_START
handling then calls build_simple_mem_ref_loc on the ADDR_EXPR it got.
get_addr_base_and_unit_offset only looks through a MEM_REF if it
has an ADDR_EXPR as its first operand, which is not the case here, so nothing
sums up the 96 offset from the [4] ARRAY_REF with the extra 32 from the
MEM_REF.
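The byte arithmetic being lost can be sketched in plain C (combined_offset
is a hypothetical name; 24 is assumed here as the x86-64 va_list element
size, so element [4] sits at byte 96, and the MEM_REF contributes the
extra 32):

```c
#include <assert.h>

/* Sketch of the offset folding the fix performs: the MEM_REF's constant
   byte offset (the "+ 32B" above) must be added to the ARRAY_REF offset
   (index * element size) rather than being silently dropped.  */
long
combined_offset (long mem_ref_off, long elt_size, long index)
{
  return mem_ref_off + elt_size * index;
}
```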

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2017-11-29  Jakub Jelinek  

PR middle-end/83185
* tree.c (build_simple_mem_ref_loc): Handle
get_addr_base_and_unit_offset returning a MEM_REF.

* gcc.dg/asan/pr83185.c: New test.

--- gcc/tree.c.jj   2017-11-28 12:11:38.0 +0100
+++ gcc/tree.c  2017-11-28 17:22:01.800939050 +0100
@@ -4692,7 +4692,13 @@ build_simple_mem_ref_loc (location_t loc
 {
   ptr = get_addr_base_and_unit_offset (TREE_OPERAND (ptr, 0), &offset);
   gcc_assert (ptr);
-  ptr = build_fold_addr_expr (ptr);
+  if (TREE_CODE (ptr) == MEM_REF)
+   {
+ offset += mem_ref_offset (ptr).to_short_addr ();
+ ptr = TREE_OPERAND (ptr, 0);
+   }
+  else
+   ptr = build_fold_addr_expr (ptr);
   gcc_assert (is_gimple_reg (ptr) || is_gimple_min_invariant (ptr));
 }
   tem = build2 (MEM_REF, TREE_TYPE (ptype),
--- gcc/testsuite/gcc.dg/asan/pr83185.c.jj	2017-11-28 17:24:15.540329283 +0100
+++ gcc/testsuite/gcc.dg/asan/pr83185.c	2017-11-28 17:24:19.851277394 +0100
@@ -0,0 +1,14 @@
+/* PR middle-end/83185 */
+/* { dg-do compile } */
+
+#include <stdarg.h>
+
+int bar (void);
+
+void
+foo (int i, ...)
+{
+  va_list aps[bar()];
+  va_start (aps[4], i);
+  va_end (aps[4]);
+}

Jakub


[PATCH] Improve seq_cost (PR middle-end/80929)

2017-11-29 Thread Jakub Jelinek
Hi!

This PR complains that seq_cost counts all non-single_set insns as 1
unconditionally, which is indeed bad.  For some, like compare + arithmetics,
we now even have code in pattern_cost that handles them reasonably by
default; for others, targets have the option, through a hook, to return
whatever they want.

On the other side, IMNSHO we don't want to count CODE_LABELs, NOTEs,
BARRIERs, DEBUG_INSNs...
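The intended costing policy can be sketched outside of RTL (seq_cost_sketch
and the flat arrays are hypothetical stand-ins; the real code walks
rtx_insns and queries insn_cost): use a per-insn cost when one is
available, fall back to 1, and never charge for non-insns.

```c
#include <assert.h>

/* Sketch: is_real_insn[i] plays the role of NONDEBUG_INSN_P, so
   CODE_LABELs, NOTEs, BARRIERs and DEBUG_INSNs cost nothing;
   insn_costs[i] <= 0 means "no useful cost known", falling back to 1.  */
int
seq_cost_sketch (const int *insn_costs, const int *is_real_insn, int n)
{
  int cost = 0;
  for (int i = 0; i < n; i++)
    {
      if (!is_real_insn[i])
	continue;
      cost += insn_costs[i] > 0 ? insn_costs[i] : 1;
    }
  return cost;
}
```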

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-11-29  Jakub Jelinek  

PR middle-end/80929
* rtlanal.c (seq_cost): For non-single_set insns try to use insn_cost.

--- gcc/rtlanal.c.jj	2017-10-27 14:16:39.0 +0200
+++ gcc/rtlanal.c   2017-11-27 13:35:46.321509933 +0100
@@ -5342,7 +5342,13 @@ seq_cost (const rtx_insn *seq, bool spee
       if (set)
 	cost += set_rtx_cost (set, speed);
-      else
-	cost++;
+      else if (NONDEBUG_INSN_P (seq))
+	{
+	  int this_cost = insn_cost (CONST_CAST_RTX_INSN (seq), speed);
+	  if (this_cost > 0)
+	    cost += this_cost;
+	  else
+	    cost++;
+	}
     }
 
   return cost;

Jakub