date:20191202

On Mon, 2 Dec 2019, Richard Biener wrote:

> On December 2, 2019 4:27:47 PM GMT+01:00, Alexander Monakov 
>  wrote:
> >On Mon, 2 Dec 2019, Richard Biener wrote:
> >
> >> +typedef long long v4di __attribute__((vector_size(32)));
> >> +struct Vec
> >> +{
> >> +  unsigned int v[8];
> >> +};
> >> +
> >> +v4di pun (struct Vec *s)
> >> +{
> >> +  v4di tem;
> >> +  __builtin_memcpy (, s, 32);
> >> +  return tem;
> >> +}
> >> +
> >> +/* We're expecting exactly two stmts, in particular no
> >BIT_INSERT_EXPR
> >> +   and no memcpy call.
> >> +_3 = MEM  [(char * {ref-all})s_2(D)];
> >> +return _3;  */
> >
> >So just to make sure I understand correctly: the type in angle brackets
> >does
> >not imply 256-bit alignment, and this access has implied alignment of
> >just 
> >32 bits, which is deduced from the type of s_2, right?
> 
> Yes. I should have quoted the more obvious -gimple IL dump instead. 

  _3 = __MEM  ((char * {ref-all})s_2(D));

so it's actually only 8 bits alignment.

Richard.

Re: [C++ PATCH] Fix up constexpr virtual calls (PR c++/92695)


On 12/2/19 4:43 PM, Jakub Jelinek wrote:

On Mon, Dec 02, 2019 at 02:29:56PM -0500, Jason Merrill wrote:

On 11/27/19 6:38 PM, Jakub Jelinek wrote:

+ if (i == 0 && DECL_VIRTUAL_P (fun))
+   {
+ tree addr = arg;
+ STRIP_NOPS (addr);
+ if (TREE_CODE (addr) == ADDR_EXPR)
+   {
+ tree obj = TREE_OPERAND (addr, 0);
+ while (TREE_CODE (obj) == COMPONENT_REF
+&& DECL_FIELD_IS_BASE (TREE_OPERAND (obj, 1)))
+   obj = TREE_OPERAND (obj, 0);


Shouldn't this loop stop when we reach DECL_CONTEXT (fun)?


I guess it can, patch I can test tonight attached.  I thought it doesn't matter
much, because cxx_eval_indirect_ref would be able to go from further derived
class to the base if we want too far, but maybe that is not the case
with multiple inheritance.


OK after testing.


Relatedly, I notice that the OBJ_TYPE_REF code breaks with multiple
inheritance: Adding a virtual function to Y in constexpr-virtual6.C leads to


I'm afraid I'd be lost with that.  There is another bug in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92695#c12 about constexpr
defaulted virtual destructors not being synthetized until end of TU.

2019-12-02  Jakub Jelinek  

PR c++/92695
* constexpr.c (cxx_bind_parameters_in_call): For virtual calls,
adjust the first argument to point to the derived object rather
than its base.

* g++.dg/cpp2a/constexpr-virtual14.C: New test.

--- gcc/cp/constexpr.c.jj   2019-12-02 22:30:31.608306270 +0100
+++ gcc/cp/constexpr.c  2019-12-02 22:35:42.374501609 +0100
@@ -1441,6 +1441,26 @@ cxx_bind_parameters_in_call (const const
arg = adjust_temp_type (type, arg);
  if (!TREE_CONSTANT (arg))
*non_constant_args = true;
+ /* For virtual calls, adjust the this argument, so that it is
+the object on which the method is called, rather than
+one of its bases.  */
+ if (i == 0 && DECL_VIRTUAL_P (fun))
+   {
+ tree addr = arg;
+ STRIP_NOPS (addr);
+ if (TREE_CODE (addr) == ADDR_EXPR)
+   {
+ tree obj = TREE_OPERAND (addr, 0);
+ while (TREE_CODE (obj) == COMPONENT_REF
+&& DECL_FIELD_IS_BASE (TREE_OPERAND (obj, 1))
+&& !same_type_ignoring_top_level_qualifiers_p
+   (TREE_TYPE (obj), DECL_CONTEXT (fun)))
+   obj = TREE_OPERAND (obj, 0);
+ if (obj != TREE_OPERAND (addr, 0))
+   arg = build_fold_addr_expr_with_type (obj,
+ TREE_TYPE (arg));
+   }
+   }
  TREE_VEC_ELT (binds, i) = arg;
}
parms = TREE_CHAIN (parms);
--- gcc/testsuite/g++.dg/cpp2a/constexpr-virtual14.C.jj 2019-12-02 
22:34:05.554998508 +0100
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-virtual14.C2019-12-02 
22:34:05.554998508 +0100
@@ -0,0 +1,27 @@
+// PR c++/92695
+// { dg-do compile { target c++2a } }
+
+struct A {
+  virtual int get () = 0;
+  virtual int set (A *o) = 0;
+};
+struct B : A {
+  constexpr int get () override { return 10; }
+  constexpr int set (A *o) override { a = o; return 20; }
+  A *a {};
+};
+constexpr auto addressof = [] (A ) { return  };
+struct C {
+  B b;
+  A *c { addressof (b) };
+  constexpr int add () { return c->set (addressof (b)); }
+};
+struct D {
+  B b[2];
+  A *c { addressof (b[0]) };
+  constexpr int add () { return c->set (addressof (b[0])); }
+};
+template 
+constexpr int get () { T f; return f.add (); }
+static_assert (get () == 20);
+static_assert (get () == 20);


Jakub

Re: [C++ PATCH] Fix ICE in build_new_op_1 (PR c++/92705)


On 12/2/19 5:17 PM, Jakub Jelinek wrote:

On Mon, Dec 02, 2019 at 01:48:46PM -0500, Jason Merrill wrote:

On 11/29/19 6:33 PM, Jakub Jelinek wrote:

Hi!

The changed code in build_new_op_1 ICEs on the following testcase,
because conv is user_conv_p with kind == ck_ambig, for which next_conversion
returns NULL.  It seems in other spots where for user_conv_p we are walking
the conversion chain we also don't assume there must be ck_user, so this
patch just uses the first conv if ck_user is not found (so that the previous
diagnostics about ambiguous conversion is emitted).


It seems like various places could use a function to strip down to the first
ck_user, ck_ambig, ck_list, ck_aggr, or ck_identity conversion encountered:
source_type, the user-defined conversion comparision in compare_ics, and now
here.  Mind doing that refactoring?  Maybe call it
strip_standard_conversion.


In neither of the spots it is exactly equivalent to what the code was doing
before (or what the patched build_new_op_1 did), but this is an area I know
next to nothing about, so I've just tried to type into patch what you wrote
above.  Do you mean something like below (didn't see regressions in make
check-c++-all, but haven't tried full bootstrap/regtest yet)?


Yes, thanks.  OK if testing passes.

Jason


2019-12-02  Jason Merrill  
Jakub Jelinek  

PR c++/92705
* call.c (strip_standard_conversion): New function.
(build_new_op_1): Use it for user_conv_p.
(compare_ics): Likewise.
(source_type): Likewise.

* g++.dg/conversion/ambig4.C: New test.

--- gcc/cp/call.c.jj2019-11-30 00:15:46.298953538 +0100
+++ gcc/cp/call.c   2019-12-02 23:02:48.494379961 +0100
@@ -865,6 +865,22 @@ next_conversion (conversion *conv)
return conv->u.next;
  }
  
+/* Strip to the first ck_user, ck_ambig, ck_list, ck_aggr or ck_identity

+   encountered.  */
+
+static conversion *
+strip_standard_conversion (conversion *conv)
+{
+  while (conv
+&& conv->kind != ck_user
+&& conv->kind != ck_ambig
+&& conv->kind != ck_list
+&& conv->kind != ck_aggr
+&& conv->kind != ck_identity)
+conv = next_conversion (conv);
+  return conv;
+}
+
  /* Subroutine of build_aggr_conv: check whether CTOR, a braced-init-list,
 is a valid aggregate initializer for array type ATYPE.  */
  
@@ -6370,8 +6386,7 @@ build_new_op_1 (const op_location_t 

  conv = cand->convs[0];
  if (conv->user_conv_p)
{
- while (conv->kind != ck_user)
-   conv = next_conversion (conv);
+ conv = strip_standard_conversion (conv);
  arg1 = convert_like (conv, arg1, complain);
}
  
@@ -6380,8 +6395,7 @@ build_new_op_1 (const op_location_t 

  conv = cand->convs[1];
  if (conv->user_conv_p)
{
- while (conv->kind != ck_user)
-   conv = next_conversion (conv);
+ conv = strip_standard_conversion (conv);
  arg2 = convert_like (conv, arg2, complain);
}
}
@@ -6391,8 +6405,7 @@ build_new_op_1 (const op_location_t 
  conv = cand->convs[2];
  if (conv->user_conv_p)
{
- while (conv->kind != ck_user)
-   conv = next_conversion (conv);
+ conv = strip_standard_conversion (conv);
  arg3 = convert_like (conv, arg3, complain);
}
}
@@ -10538,17 +10551,8 @@ compare_ics (conversion *ics1, conversio
if (ics1->user_conv_p || ics1->kind == ck_list
|| ics1->kind == ck_aggr || ics2->kind == ck_aggr)
  {
-  conversion *t1;
-  conversion *t2;
-
-  for (t1 = ics1; t1 && t1->kind != ck_user; t1 = next_conversion (t1))
-   if (t1->kind == ck_ambig || t1->kind == ck_aggr
-   || t1->kind == ck_list)
- break;
-  for (t2 = ics2; t2 && t2->kind != ck_user; t2 = next_conversion (t2))
-   if (t2->kind == ck_ambig || t2->kind == ck_aggr
-   || t2->kind == ck_list)
- break;
+  conversion *t1 = strip_standard_conversion (ics1);
+  conversion *t2 = strip_standard_conversion (ics2);
  
if (!t1 || !t2 || t1->kind != t2->kind)

return 0;
@@ -10956,14 +10960,7 @@ compare_ics (conversion *ics1, conversio
  static tree
  source_type (conversion *t)
  {
-  for (;; t = next_conversion (t))
-{
-  if (t->kind == ck_user
- || t->kind == ck_ambig
- || t->kind == ck_identity)
-   return t->type;
-}
-  gcc_unreachable ();
+  return strip_standard_conversion (t)->type;
  }
  
  /* Note a warning about preferring WINNER to LOSER.  We do this by storing

--- gcc/testsuite/g++.dg/conversion/ambig4.C.jj 2019-12-02 22:52:24.407017475 
+0100
+++ gcc/testsuite/g++.dg/conversion/ambig4.C2019-12-02 22:52:24.407017475 
+0100
@@ -0,0 +1,14 @@
+// PR c++/92705
+// { dg-do compile }
+

Re: [C++ PATCH] c++/91353 - P1331R2: Allow trivial default init in constexpr contexts.


On 12/2/19 5:09 PM, Marek Polacek wrote:

On Mon, Dec 02, 2019 at 12:09:17PM -0500, Jason Merrill wrote:

On 12/1/19 8:09 PM, Marek Polacek wrote:

+|| (skip_empty
+&& is_really_empty_class (TREE_TYPE (field),


This should probably check DECL_SIZE (field) == size_zero_node instead,
since that will properly distinguish between overlapping and non-overlapping
data members of empty class type.  And please test how this works with data
members of empty class type both with and without [[no_unique_address]].


I don't think that's possible -- empty classes in C++ have sizeof(char), unless
their only member is char[0], then their DECL_SIZE is 0.


I think you're talking about the TYPE_SIZE of the class, and I'm talking 
about the DECL_SIZE of the FIELD_DECL.



I've added two testcases: constexpr-init13.C and constexpr-init14.C.  Is there
another scenario regarding [[no_unique_address]] that you want me to test?


I think the classes with empty base fields need to have another 
initialized field after them to have a chance of tripping


  if (idx != field)

Jason

Re: C++ PATCH for c++/91363 - P0960R3: Parenthesized initialization of aggregates


On 12/2/19 7:31 PM, Marek Polacek wrote:

@@ -1967,8 +1978,23 @@ expand_default_init (tree binfo, tree true_exp, tree 
exp, tree init, int flags,
tree ctor_name = (true_exp == exp
? complete_ctor_identifier : base_ctor_identifier);
  
+  /* Given class A,

+
+  A a(1, 2);
+
+can mean a call to a constructor A::A(int, int), if present.  If not,
+but A is an aggregate, we will try aggregate initialization.  */
rval = build_special_member_call (exp, ctor_name, , binfo, flags,
complain);
+  if (BRACE_ENCLOSED_INITIALIZER_P (rval))
+   {
+ gcc_assert (cxx_dialect >= cxx2a);
+ rval = digest_init (type, rval, tf_warning_or_error);
+ rval = build2 (INIT_EXPR, type, exp, rval);
+ /* So that we do finish_expr_stmt below.  Don't return here, we
+need to release PARMS.  */
+ TREE_SIDE_EFFECTS (rval) = 1;
+   }


Can we still get a CONSTRUCTOR here after the change to 
build_new_method_call_1?


Jason

[PATCH v6] Missed function specialization + partial devirtualization

2019-12-02 Thread luoxhu

Hi Martin and Honza,

On 2019/11/18 21:02, Martin Liška wrote:
> On 11/16/19 10:59 AM, luoxhu wrote:
>> Sorry that I don't quite understand your meanning here.  I didn't grep the
>> word "cgraph_edge_summary" in source code, do you mean add new structure
> 
> Hello.
> 
> He wanted to write call_summary class and so you need something similar to
> ipa-sra.c:431. It's a data structure which associate a data to cgraph_edge.
> Is it understandable please?
> 
> Martin
> 

Update the patch as below with "git format-patch -U15" for review convenience,
the GC issue is fixed after leveraging the full GC framework in ipa-profile.c, 
Thanks:

v6 Changes:
 1. Define and use speculative_call_targets summary, move
 speculative_call_target from cgraph.h to ipa-profile.c.
 2. Use num_speculative_call_targets in cgraph_indirect_call_info.
 3. Refine with review comments.

This patch aims to fix PR69678 caused by PGO indirect call profiling
performance issues.
The bug that profiling data is never working was fixed by Martin's pull
back of topN patches, performance got GEOMEAN ~1% improvement(+24% for
511.povray_r specifically).
Still, currently the default profile only generates SINGLE indirect target
that called more than 75%.  This patch leverages MULTIPLE indirect
targets use in LTO-WPA and LTO-LTRANS stage, as a result, function
specialization, profiling, partial devirtualization, inlining and
cloning could be done successfully based on it.
Performance can get improved from 0.70 sec to 0.38 sec on simple tests.
Details are:
  1.  PGO with topn is enabled by default now, but only one indirect
  target edge will be generated in ipa-profile pass, so add variables to enable
  multiple speculative edges through passes, speculative_id will record the
  direct edge index bind to the indirect edge, indirect_call_targets length
  records how many direct edges owned by the indirect edge, postpone gimple_ic
  to ipa-profile like default as inline pass will decide whether it is benefit
  to transform indirect call.
  2.  Use speculative_id to track and search the reference node matched
  with the direct edge's callee for multiple targets.  Actually, it is the
  caller's responsibility to handle the direct edges mapped to same indirect
  edge.  speculative_call_info will return one of the direct edge specified,
  this will leverage current IPA edge process framework mostly.
  3.  Enable LTO WPA/LTRANS stage multiple indirect call targets analysis for
  profile full support in ipa passes and cgraph_edge functions.  speculative_id
  can be set by make_speculative id when multiple targets are binded to
  one indirect edge, and cloned if new edge is cloned.  speculative_id
  is streamed out and stream int by lto like lto_stmt_uid.
  4.  Add 1 in module testcase and 2 cross module testcases.
  5.  Bootstrap and regression test passed on Power8-LE.  No function
  and performance regression for SPEC2017.

gcc/ChangeLog

2019-12-02  Xiong Hu Luo  

PR ipa/69678
* Makefile.in (GTFILES): Add ipa-profile.c.
* cgraph.c (symbol_table::create_edge): Init speculative_id.
(cgraph_edge::make_speculative): Add param for setting speculative_id.
(cgraph_edge::speculative_call_info): Update comments and find reference
by speculative_id for multiple indirect targets.
(cgraph_edge::resolve_speculation): Decrease the speculations
for indirect edge, drop it's speculative if not direct target
left. Update comments.
(cgraph_edge::redirect_call_stmt_to_callee): Likewise.
(cgraph_node::dump): Print num_speculative_call_targets.
(cgraph_node::verify_node): Don't report error if speculative
edge not include statement.
(cgraph_edge::num_speculative_call_targets_p): New function.
* cgraph.h (int common_target_id): Remove.
(int common_target_probability): Remove.
(num_speculative_call_targets): New variable.
(make_speculative): Add param for setting speculative_id.
(cgraph_edge::num_speculative_call_targets_p): New declare.
(speculative_id): New variable.
* cgraphclones.c (cgraph_node::create_clone): Clone speculative_id.
* ipa-profile.c (struct speculative_call_target): New struct.
(class speculative_call_summary): New class.
(class speculative_call_summaries): New class.
(call_sums): New variable.
(ipa_profile_generate_summary): Generate indirect multiple targets 
summaries.
(ipa_profile_write_edge_summary): New function.
(ipa_profile_write_summary): Stream out indirect multiple targets 
summaries.
(ipa_profile_dump_all_summaries): New function.
(ipa_profile_read_edge_summary): New function.
(ipa_profile_read_summary_section): New function.
(ipa_profile_read_summary): Stream in indirect multiple targets 
summaries.
(ipa_profile): Generate num_speculative_call_targets from
summaries.

Diagnose use of [*] in old-style parameter definitions (PR c/88704)

2019-12-02 Thread Joseph Myers

GCC wrongly accepts [*] in old-style parameter definitions because
because parm_flag is set on the scope used for those definitions and,
unlike the case of a prototype in a function definition, there is no
subsequent check to disallow this invalid usage.  This patch adds such
a check.  (At this point we don't have location information for the
[*], so the diagnostic location isn't ideal.)

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  Applied to 
mainline.

gcc/c:
2019-12-03  Joseph Myers  

PR c/88704
* c-decl.c (store_parm_decls_oldstyle): Diagnose use of [*] in
old-style parameter definitions.

gcc/testsuite:
2019-12-03  Joseph Myers  

PR c/88704
* gcc.dg/vla-25.c: New test.

Index: gcc/c/c-decl.c
===
--- gcc/c/c-decl.c  (revision 278912)
+++ gcc/c/c-decl.c  (working copy)
@@ -9394,6 +9394,9 @@ store_parm_decls_oldstyle (tree fndecl, const stru
"old-style function definition");
 }
 
+  if (current_scope->had_vla_unspec)
+error ("%<[*]%> not allowed in other than function prototype scope");
+
   /* Match each formal parameter name with its declaration.  Save each
  decl in the appropriate TREE_PURPOSE slot of the parmids chain.  */
   for (parm = parmids; parm; parm = TREE_CHAIN (parm))
Index: gcc/testsuite/gcc.dg/vla-25.c
===
--- gcc/testsuite/gcc.dg/vla-25.c   (nonexistent)
+++ gcc/testsuite/gcc.dg/vla-25.c   (working copy)
@@ -0,0 +1,9 @@
+/* Test [*] diagnosed on old-style parameter declaration.  PR c/88704.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c99 -pedantic-errors" } */
+
+void
+f (x)
+ int x[*];
+{ /* { dg-error "not allowed in other than function prototype scope" } */
+}

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [C++ PATCH] c++/91353 - P1331R2: Allow trivial default init in constexpr contexts.

2019-12-02 Thread Marek Polacek

On Mon, Dec 02, 2019 at 09:15:14AM +0100, Jakub Jelinek wrote:
> On Sun, Dec 01, 2019 at 08:09:56PM -0500, Marek Polacek wrote:
> > On Thu, Nov 28, 2019 at 11:29:20PM -0500, Jason Merrill wrote:
> > > Sounds like reduced_constant_expression_p needs to deal better with empty
> > > bases.
> > 
> > This got a bit complicated because it also needs to handle unions and
> > now we also need to heed vptr.  But the following seems to work.
> > 
> > (I'll skip the story about a bogus error I hit when building cmcstl2 and
> > the need to debug a ~90,000 LOC test, because creduce got stuck reducing
> > it.)
> 
> Note, I got creduce stuck several times and what helped is:
> --- /usr/bin/creduce  2019-02-15 17:17:32.0 +0100
> +++ /usr/bin/creduce.nonnested2019-11-30 11:34:21.604937392 +0100
> @@ -802,7 +802,7 @@ my @all_methods = (
>  { "name" => "pass_clang","arg" => "local-to-global","pri" => 
> 9500, "C" => 1, },
>  { "name" => "pass_clang","arg" => "param-to-global","pri" => 
> 203,  "C" => 1, },
>  { "name" => "pass_clang","arg" => "param-to-local", "pri" => 
> 204,  "C" => 1, },
> -{ "name" => "pass_clang","arg" => "remove-nested-function", "pri" => 
> 205,  "C" => 1, },
> +   #{ "name" => "pass_clang","arg" => "remove-nested-function", "pri" => 
> 205,  "C" => 1, },
>  { "name" => "pass_clang","arg" => "rename-fun",  
>   "last_pass_pri" => 207, "C" => 1, },
>  { "name" => "pass_clang","arg" => "union-to-struct","pri" => 
> 208,  },
>  { "name" => "pass_clang","arg" => "rename-param",
>   "last_pass_pri" => 209, "C" => 1, },
> where I can use creduce.nonnested if normal creduce gets stuck.  That pass
> tends to increase size of code rather than reduce if there are nested
> functions, by adding a large series of temporaries:
>   type __reduce_tmp_123 = ...
>   type __reduce_tmp_122 = __reduce_tmp_123;
>   type __reduce_tmp_121 = __reduce_tmp_122;
>   type __reduce_tmp_120 = __reduce_tmp_121;
>   type __reduce_tmp_119 = __reduce_tmp_120;
>   type __reduce_tmp_118 = __reduce_tmp_119;
>   type __reduce_tmp_117 = __reduce_tmp_118;
> ...
> and every iteration adds another line.

Ah, that's useful, thanks!

--
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA

Re: Fix crossmodule ipa-inline hint

On Sun, Dec 01, 2019 at 03:44:38PM +0100, Jan Hubicka wrote:
> > On Nov 30 2019, Jan Hubicka wrote:
> > 
> > >   * g++.dg/lto/inline-crossmodule-1.h: New testcase.
> > >   * g++.dg/lto/inline-crossmodule-1_0.C: New testcase.
> > >   * g++.dg/lto/inline-crossmodule-1_1.C: New testcase.
> > 
> > ERROR: (DejaGnu) proc "scan-wpa-ipa-times {Inlined ret1} 1 inlined" does 
> > not exist.
> 
> Uhh, should be scan-wpa-ipa-dump-times, I will test and commit the
> obvious patch.

It doesn't work:
UNRESOLVED: g++-dg-lto-inline-crossmodule-1-01.exe scan-wpa-ipa-dump-times 
inlined "(cross module)" 1
UNRESOLVED: g++-dg-lto-inline-crossmodule-1-01.exe scan-wpa-ipa-dump-times 
inlined "Inlined key[^n]*(cross module)" 1
UNRESOLVED: g++-dg-lto-inline-crossmodule-1-01.exe scan-wpa-ipa-dump-times 
inlined "Inlined ret1" 1
UNRESOLVED: g++-dg-lto-inline-crossmodule-1-01.exe scan-wpa-ipa-dump-times 
inlined "Inlined ret2" 1

Fixed thusly, tested on x86_64-linux, committed to trunk as obvious:

2019-12-03  Jakub Jelinek  

* g++.dg/lto/inline-crossmodule-1_0.C: Use -fdump-ipa-inline-details
instead of -fdump-ipa-inline.  Use "inline" instead of "inlined" as
last argument to scan-wpa-ipa-dump-times, use \\\( and \\\) instead of
( and ) in the regex.

--- gcc/testsuite/g++.dg/lto/inline-crossmodule-1_0.C.jj2019-12-02 
22:28:23.433287949 +0100
+++ gcc/testsuite/g++.dg/lto/inline-crossmodule-1_0.C   2019-12-03 
01:30:40.444232221 +0100
@@ -1,11 +1,11 @@
 // { dg-lto-do link }
-/* { dg-lto-options { "-O2 -fno-early-inlining -flto -fdump-ipa-inline" } } */
+/* { dg-lto-options { "-O2 -fno-early-inlining -flto 
-fdump-ipa-inline-details" } } */
 #include "inline-crossmodule-1.h"
 int a::key ()
 {
   return 0;
 }
-/* { dg-final { scan-wpa-ipa-dump-times "Inlined ret1" 1 "inlined"  } } */
-/* { dg-final { scan-wpa-ipa-dump-times "Inlined ret2" 1 "inlined"  } } */
-/* { dg-final { scan-wpa-ipa-dump-times "Inlined key\[^\\n\]*(cross module)" 1 
"inlined"  } } */
-/* { dg-final { scan-wpa-ipa-dump-times "(cross module)" 1 "inlined"  } } */
+/* { dg-final { scan-wpa-ipa-dump-times "Inlined ret1" 1 "inline"  } } */
+/* { dg-final { scan-wpa-ipa-dump-times "Inlined ret2" 1 "inline"  } } */
+/* { dg-final { scan-wpa-ipa-dump-times "Inlined key\[^\\n\]*\\\(cross 
module\\\)" 1 "inline"  } } */
+/* { dg-final { scan-wpa-ipa-dump-times "\\\(cross module\\\)" 1 "inline"  } } 
*/


Jakub

Re: C++ PATCH for c++/91363 - P0960R3: Parenthesized initialization of aggregates

2019-12-02 Thread Marek Polacek

On Thu, Nov 28, 2019 at 01:01:44AM -0500, Jason Merrill wrote:
> On 11/27/19 5:25 PM, Marek Polacek wrote:
> > On Wed, Nov 27, 2019 at 04:54:55PM -0500, Jason Merrill wrote:
> > > > > > @@ -921,8 +921,20 @@ perform_member_init (tree member, tree init)
> > > > > > inform (DECL_SOURCE_LOCATION (member),
> > > > > > "%q#D should be initialized", member );
> > > > > > }
> > > > > > - finish_expr_stmt (build_aggr_init (decl, init, flags,
> > > > > > -tf_warning_or_error));
> > > > > > + init = build_aggr_init (decl, init, flags, 
> > > > > > tf_warning_or_error);
> > > > > > + /* In C++20, a member initializer list can be initializing an
> > > > > > +aggregate from a parenthesized list of values:
> > > > > > +
> > > > > > +  struct S {
> > > > > > +A aggr;
> > > > > > +S() : aggr(1, 2, 3) { }
> > > > > > +  };
> > > > > > +
> > > > > > + In such case, build_aggr_init will build up an INIT_EXPR 
> > > > > > like
> > > > > > + we do for aggr{1, 2, 3}, so that 
> > > > > > build_data_member_initialization
> > > > > > + can grok it.  */
> > > > > > + if (TREE_CODE (init) != INIT_EXPR)
> > > > > > +   finish_expr_stmt (init);
> > > > > 
> > > > > Why don't we want to finish_expr_stmt an INIT_EXPR?
> > > > 
> > > > Because expand_default_init has already finish_expr_stmt-ed it.  Adding 
> > > > it
> > > > twice sounds wrong.
> > > 
> > > But the finish_expr_stmt in expand_default_init is inside the 
> > > STATEMENT_LIST
> > > we pushed into in build_aggr_init.  If finish_init_stmts returned the
> > > INIT_EXPR, we still need to add it to the enclosing STATEMENT_LIST.
> > 
> > I don't understand why none of my tests broke
> 
> This seems to be because build_aggr_init returns an EXPR_STMT, so the test
> didn't affect anything.

Ah, got it.  Sorry about that.

> > --- gcc/cp/call.c
> > +++ gcc/cp/call.c
> > @@ -10111,6 +10111,30 @@ build_new_method_call_1 (tree instance, tree fns, 
> > vec **args,
> > if (!any_viable_p)
> >   {
> > +  /* [dcl.init], 17.6.2.2:
> > +
> > +Otherwise, if no constructor is viable, the destination type is
> > +a (possibly cv-qualified) aggregate class A, and the initializer
> > +is a parenthesized expression-list, the object is initialized as
> > +follows...
> > +
> > +We achieve this by building up a CONSTRUCTOR, as for list-init,
> > +and setting CONSTRUCTOR_IS_PAREN_INIT to distinguish between
> > +the two.  */
> > +  if (DECL_CONSTRUCTOR_P (fn)
> > + && !(flags & LOOKUP_ONLYCONVERTING)
> > + && !cp_unevaluated_operand
> > + && cxx_dialect >= cxx2a
> > + && CP_AGGREGATE_TYPE_P (basetype)
> > + && !user_args->is_empty ())
> > +   {
> > + /* Create a CONSTRUCTOR from ARGS, e.g. {1, 2} from <1, 2>.  */
> > + tree list = build_tree_list_vec (user_args);
> > + tree ctor = build_constructor_from_list (init_list_type_node, list);
> > + CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
> > + CONSTRUCTOR_IS_PAREN_INIT (ctor) = true;
> > + return ctor;
> > +   }
> 
> This still bothers me for the case where instance is not a dummy; returning
> the CONSTRUCTOR forgets all about instance and we have to rely on the caller
> reintroducing it.
> 
> I think for that case we want to build an INIT_EXPR here, like the T{}
> handling just above.

OK, adjusted as suggested.

> > --- gcc/cp/init.c
> > +++ gcc/cp/init.c
> > @@ -921,6 +921,17 @@ perform_member_init (tree member, tree init)
> > inform (DECL_SOURCE_LOCATION (member),
> > "%q#D should be initialized", member );
> > }
> > + /* In C++20, a member initializer list can be initializing an
> > +aggregate from a parenthesized list of values:
> > +
> > +  struct S {
> > +A aggr;
> > +S() : aggr(1, 2, 3) { }
> > +  };
> > +
> > + In such case, build_aggr_init will build up an INIT_EXPR like
> > + we do for aggr{1, 2, 3}, so that build_data_member_initialization
> > + can grok it.  */
> 
> I think we can drop this comment, too.

Duly dropped.

I've also come across a new test, so paren-init19.C added.

Bootstrapped/regtested on x86_64-linux, built Boost and cmcstl2.

2019-12-02  Marek Polacek  

PR c++/91363 - P0960R3: Parenthesized initialization of aggregates.
* c-cppbuiltin.c (c_cpp_builtins): Predefine
__cpp_aggregate_paren_init=201902 for -std=c++2a.

* call.c (build_new_method_call_1): Handle parenthesized initialization
of aggregates by building up a CONSTRUCTOR.
(extend_ref_init_temps): Do nothing for CONSTRUCTOR_IS_PAREN_INIT.
* cp-tree.h (CONSTRUCTOR_IS_PAREN_INIT, LOOKUP_AGGREGATE_PAREN_INIT):
Define.
* decl.c (grok_reference_init): Handle aggregate initialization from
a parenthesized

Re: PowerPC V9 patches, Add the PCREL_OPT optimization

Hi!

On Fri, Nov 15, 2019 at 07:17:34PM -0500, Michael Meissner wrote:
> This series of patches adds the PCREL_OPT optimization for the PC-relative
> support in the PowerPC compiler.
> 
> This optimization convert a single load or store of an external variable to 
> use
> the R_PPC64_PCREL_OPT relocation.
> 
> For example, a normal load of an external int variable (with -mcpu=future)
> would generate:
> 
>   PLD 9,ext_symbol@got@pcrel(0),1
>   LWA 10,0(9)
> 
> That is, load the address of 'ext_symbol' into register 9.  If 'ext_symbol' is
> defined in another module in the main program, and the current module is also
> in the main program, the linker will optimize this to:

What does "module" mean?  Translation unit?  Object file?  And "main
program" is what ELF calls "executable", right?

You don't need to say that here, it is not something the compiler can do
anything about.  You could just say "if possible, the linker will..." etc.

>   PADDI 9,ext_symbol(0),1
>   LWZ 10,0(9)

I don't think it will change an lwa insn to an lwz?  Probably it should
be lwz throughout?

Is that "paddi" syntax correct?  I think you might mean
"paddi 9,0,ext_symbol,1", aka "pla 9,ext_symbol"?

> If either the definition is not in the main program or we are linking for a
> shared library, the linker will create an address in the .got section and do a
> PLD of it:
> 
>   .section .got
>   .got.ext_symbol:
>   .quad ext_symbol
> 
>   .section .text
>   PLD 9,.got.ext_symbol(0),1
>   LWZ 10,0(9)

Like what the user wrote, sure -- the linker does not optimise it, does
not change it?  Or am I missing something?

> If the only use of the GOT address is a single load and store, we can optimize
> this further:

A single load *or* store.

>   PLD 9,ext_symbol@got@pcrel(0),1
>   .Lpcrel1:
>   .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
>   LWZ 10,0(9)
> 
> In this case, if the variable is defined in another module for the main
> program, and we are linking for the main program, the linker will transform
> this to:
> 
>   PLWZ 10,ext_symbol@pcrel(0),1
>   NOP
> 
> There can be arbitrary instructions between the PLD and the LWA (or STW).

... because that is what that relocation means.  The compiler still has
to make sure that any such insns should not prevent this transform.

> For either loads or store, register 9 must only be used in the load or store,
> and must die at that point.
> 
> For loads, there must be no reference to register 10 between the PLD and the
> LWZ.  For a store, register 10 must be live at the PLD instruction, and must
> not be modified between the PLD and the STW instructions.

"No reference"...  Nothing indirect either (like from a function call,
or simply some insn that does not name the register directly).  Or code
like

pld 9,ext_symbol@got@pcrel(0),1 ; .Lpcrel1:
.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
b 2f

here:   # some code that does not explicitly reference r10 here,
# but r10 is live here nevertheless, and is used later
b somewhere_else

2:  lwz 10,0(9)

complicates your analysis, too.  So something DF is needed here, or
there are lots and lots and lots of cases to look out for.

Segher

Re: [PATCH v2 2/2] testsuite: Fix run-time tracking down of `libgcc_s'

2019-12-02 Thread Ian Lance Taylor

On Fri, Nov 29, 2019 at 1:09 AM Maciej W. Rozycki  wrote:
>
> Fix a catastrophic libgo testsuite failure in cross-compilation where
> the shared `libgcc_s' library cannot be found by the loader at run time
> in build-tree testing and consequently all test cases fail the execution
> stage, giving output (here with the `x86_64-linux-gnu' host and the
> `riscv64-linux-gnu' target, with RISC-V QEMU in the Linux user emulation
> mode as the target board) like:
>
> spawn qemu-riscv64 -E 
> LD_LIBRARY_PATH=.:.../riscv64-linux-gnu/lib64/lp64d/libgo/.libs ./a.exe
> ./a.exe: error while loading shared libraries: libgcc_s.so.1: cannot open 
> shared object file: No such file or directory
> FAIL: archive/tar
>
> To do so rework `gcc-set-multilib-library-path' so as not to rely on the
> `rootme' TCL variable to have been preset in testsuite invocation, which
> only works for the GCC test suites and not for library test suites, and
> also use `remote_exec host' rather than `exec' to invoke the compiler in
> determination of `libgcc_s' locations, so that the solution works in
> remote testing as well while also avoiding the hardcoded limit of the
> executable's path length imposed by `exec'.
>
> This is based on an observation that the multilib root directory can be
> determined by stripping out the multilib directory in effect as printed
> with the `-print-multi-directory' option from the path produced by the
> `-print-file-name=' option.  And then individual full multilib paths can
> be assembled for the other multilibs by appending their respective
> multilib directories to the multilib root directory.
>
> Unlike with the old solution the full multilib paths are not checked for
> the presence of the shared `libgcc_s' library there, but that is
> supposed to be harmless.  Also the full multilib path for the multilib
> used with the compiler used for testing will now come first, which
> should reduce run-time processing in the usual case.
>
> With this change in place test output instead looks like:
>
> spawn qemu-riscv64 -E 
> LD_LIBRARY_PATH=.:.../riscv64-linux-gnu/lib64/lp64d/libgo/.libs:..././gcc/lib64/lp64d:..././gcc/.:..././gcc/lib32/ilp32:..././gcc/lib32/ilp32d:..././gcc/lib64/lp64
>  ./a.exe
> PASS
> PASS: archive/tar
>
> No summary comparison, because the libgo testsuite does not provide one
> in this configuration for some reason, however this change improves
> overall results from 0 PASSes and 159 FAILs to 133 PASSes and 26 FAILs.
>
> gcc/testsuite/
> * lib/gcc-defs.exp (gcc-set-multilib-library-path): Use
> `-print-file-name=' to determine the multilib root directory.
> Use `remote_exec host' rather than `exec' to invoke the
> compiler.

This patch is for better Go support but the patch is to generic code.
The patch is fine with me but should ideally be reviewed by a
testsuite maintainer.

Ian

Re: [C++ Patch] A few more cp_expr_loc_or_input_loc and a diagnostic regression fix

2019-12-02 Thread Paolo Carlini


Hi,

On 02/12/19 19:58, Jason Merrill wrote:

On 11/29/19 8:08 AM, Paolo Carlini wrote:

Hi,

a few more rather straightforward uses for cp_expr_loc_or_input_loc.

Additionally, while working on the latter, I noticed that, compared 
to say, gcc-7, lately the code we have in cp_build_addr_expr_1 to 
diagnose taking the address of 'main' often doesn't work anymore, 
when the argument is wrapped in a location_wrapper. The below fixes 
that by using tree_strip_any_location_wrapper in the DECL_MAIN_P 
check, which works fine, but I can imagine various other ways to 
solve the issue...


Maybe

location_t loc = cp_expr_loc_or_input_loc (arg);
STRIP_ANY_LOCATION_WRAPPER (arg);

at the top?  In general I prefer the local variable to writing 
cp_expr_loc_or_input_loc in lots of places, whether or not we then 
strip wrappers.


Sure. In a few circumstances I hesitated because 
cp_expr_loc_or_input_loc isn't really trivial, boils down to quite a few 
conditionals, and adds a bit to the complexity of simple functions when 
in fact no errors nor warnings are issued. But certainly calling it at 
the top is often much cleaner. I changed both the functions.


However, using STRIP_ANY_LOCATION_WRAPPER at the beginning of 
cp_build_addr_expr_1 doesn't seem straightforward, causes a few 
regressions (wrong location for conversion/Wwrite-strings.C; even an ICE 
for cpp1z/constexpr-lambda23.C; cpp1z/decomp48.C). The function doesn't 
seem trivial from this point of view, there is already a localized 
tree_strip_any_location_wrapper near the end, for 
processing_template_decl. I would rather further investigate that kind 
of simplification in a separate patch, beyond the regression fix. 
(before I would like to improve the locations for all the various kinds 
of cast, quite a few changes)


Thanks, Paolo.

///


Index: cp/typeck.c
===
--- cp/typeck.c (revision 278911)
+++ cp/typeck.c (working copy)
@@ -6061,6 +6061,7 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue
 {
   tree argtype;
   tree val;
+  location_t loc;
 
   if (!arg || error_operand_p (arg))
 return error_mark_node;
@@ -6070,6 +6071,7 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue
 return error_mark_node;
 
   argtype = lvalue_type (arg);
+  loc = cp_expr_loc_or_input_loc (arg);
 
   gcc_assert (!(identifier_p (arg) && IDENTIFIER_ANY_OP_P (arg)));
 
@@ -6103,12 +6105,14 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue
  else if (current_class_type
   && TREE_OPERAND (arg, 0) == current_class_ref)
/* An expression like   */
-   permerror (input_location, "ISO C++ forbids taking the address of 
an unqualified"
+   permerror (loc,
+  "ISO C++ forbids taking the address of an unqualified"
   " or parenthesized non-static member function to form"
   " a pointer to member function.  Say %<&%T::%D%>",
   base, name);
  else
-   permerror (input_location, "ISO C++ forbids taking the address of a 
bound member"
+   permerror (loc,
+  "ISO C++ forbids taking the address of a bound member"
   " function to form a pointer to member function."
   "  Say %<&%T::%D%>",
   base, name);
@@ -6135,7 +6139,7 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue
   if (kind == clk_none)
{
  if (complain & tf_error)
-   lvalue_error (cp_expr_loc_or_input_loc (arg), lv_addressof);
+   lvalue_error (loc, lv_addressof);
  return error_mark_node;
}
   if (strict_lvalue && (kind & (clk_rvalueref|clk_class)))
@@ -6143,8 +6147,7 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue
  if (!(complain & tf_error))
return error_mark_node;
  /* Make this a permerror because we used to accept it.  */
- permerror (cp_expr_loc_or_input_loc (arg),
-"taking address of rvalue");
+ permerror (loc, "taking address of rvalue");
}
 }
 
@@ -6154,13 +6157,13 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue
   arg = build1 (CONVERT_EXPR, type, arg);
   return arg;
 }
-  else if (pedantic && DECL_MAIN_P (arg))
+  else if (pedantic && DECL_MAIN_P (tree_strip_any_location_wrapper (arg)))
 {
   /* ARM $3.4 */
   /* Apparently a lot of autoconf scripts for C++ packages do this,
 so only complain if -Wpedantic.  */
   if (complain & (flag_pedantic_errors ? tf_error : tf_warning))
-   pedwarn (input_location, OPT_Wpedantic,
+   pedwarn (loc, OPT_Wpedantic,
 "ISO C++ forbids taking address of function %<::main%>");
   else if (flag_pedantic_errors)
return error_mark_node;
@@ -6218,7 +6221,8 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue
if (TYPE_REF_P

Re: [C++ PATCH] Fix ICE in build_new_op_1 (PR c++/92705)

On Mon, Dec 02, 2019 at 01:48:46PM -0500, Jason Merrill wrote:
> On 11/29/19 6:33 PM, Jakub Jelinek wrote:
> > Hi!
> > 
> > The changed code in build_new_op_1 ICEs on the following testcase,
> > because conv is user_conv_p with kind == ck_ambig, for which next_conversion
> > returns NULL.  It seems in other spots where for user_conv_p we are walking
> > the conversion chain we also don't assume there must be ck_user, so this
> > patch just uses the first conv if ck_user is not found (so that the previous
> > diagnostics about ambiguous conversion is emitted).
> 
> It seems like various places could use a function to strip down to the first
> ck_user, ck_ambig, ck_list, ck_aggr, or ck_identity conversion encountered:
> source_type, the user-defined conversion comparision in compare_ics, and now
> here.  Mind doing that refactoring?  Maybe call it
> strip_standard_conversion.

In neither of the spots it is exactly equivalent to what the code was doing
before (or what the patched build_new_op_1 did), but this is an area I know
next to nothing about, so I've just tried to type into patch what you wrote
above.  Do you mean something like below (didn't see regressions in make
check-c++-all, but haven't tried full bootstrap/regtest yet)?

2019-12-02  Jason Merrill  
Jakub Jelinek  

PR c++/92705
* call.c (strip_standard_conversion): New function.
(build_new_op_1): Use it for user_conv_p.
(compare_ics): Likewise.
(source_type): Likewise.

* g++.dg/conversion/ambig4.C: New test.

--- gcc/cp/call.c.jj2019-11-30 00:15:46.298953538 +0100
+++ gcc/cp/call.c   2019-12-02 23:02:48.494379961 +0100
@@ -865,6 +865,22 @@ next_conversion (conversion *conv)
   return conv->u.next;
 }
 
+/* Strip to the first ck_user, ck_ambig, ck_list, ck_aggr or ck_identity
+   encountered.  */
+
+static conversion *
+strip_standard_conversion (conversion *conv)
+{
+  while (conv
+&& conv->kind != ck_user
+&& conv->kind != ck_ambig
+&& conv->kind != ck_list
+&& conv->kind != ck_aggr
+&& conv->kind != ck_identity)
+conv = next_conversion (conv);
+  return conv;
+}
+
 /* Subroutine of build_aggr_conv: check whether CTOR, a braced-init-list,
is a valid aggregate initializer for array type ATYPE.  */
 
@@ -6370,8 +6386,7 @@ build_new_op_1 (const op_location_t 
  conv = cand->convs[0];
  if (conv->user_conv_p)
{
- while (conv->kind != ck_user)
-   conv = next_conversion (conv);
+ conv = strip_standard_conversion (conv);
  arg1 = convert_like (conv, arg1, complain);
}
 
@@ -6380,8 +6395,7 @@ build_new_op_1 (const op_location_t 
  conv = cand->convs[1];
  if (conv->user_conv_p)
{
- while (conv->kind != ck_user)
-   conv = next_conversion (conv);
+ conv = strip_standard_conversion (conv);
  arg2 = convert_like (conv, arg2, complain);
}
}
@@ -6391,8 +6405,7 @@ build_new_op_1 (const op_location_t 
  conv = cand->convs[2];
  if (conv->user_conv_p)
{
- while (conv->kind != ck_user)
-   conv = next_conversion (conv);
+ conv = strip_standard_conversion (conv);
  arg3 = convert_like (conv, arg3, complain);
}
}
@@ -10538,17 +10551,8 @@ compare_ics (conversion *ics1, conversio
   if (ics1->user_conv_p || ics1->kind == ck_list
   || ics1->kind == ck_aggr || ics2->kind == ck_aggr)
 {
-  conversion *t1;
-  conversion *t2;
-
-  for (t1 = ics1; t1 && t1->kind != ck_user; t1 = next_conversion (t1))
-   if (t1->kind == ck_ambig || t1->kind == ck_aggr
-   || t1->kind == ck_list)
- break;
-  for (t2 = ics2; t2 && t2->kind != ck_user; t2 = next_conversion (t2))
-   if (t2->kind == ck_ambig || t2->kind == ck_aggr
-   || t2->kind == ck_list)
- break;
+  conversion *t1 = strip_standard_conversion (ics1);
+  conversion *t2 = strip_standard_conversion (ics2);
 
   if (!t1 || !t2 || t1->kind != t2->kind)
return 0;
@@ -10956,14 +10960,7 @@ compare_ics (conversion *ics1, conversio
 static tree
 source_type (conversion *t)
 {
-  for (;; t = next_conversion (t))
-{
-  if (t->kind == ck_user
- || t->kind == ck_ambig
- || t->kind == ck_identity)
-   return t->type;
-}
-  gcc_unreachable ();
+  return strip_standard_conversion (t)->type;
 }
 
 /* Note a warning about preferring WINNER to LOSER.  We do this by storing
--- gcc/testsuite/g++.dg/conversion/ambig4.C.jj 2019-12-02 22:52:24.407017475 
+0100
+++ gcc/testsuite/g++.dg/conversion/ambig4.C2019-12-02 22:52:24.407017475 
+0100
@@ -0,0 +1,14 @@
+// PR c++/92705
+// { dg-do compile }
+
+struct A {};
+struct B {};
+struct C { operator B * (); }; // {

Re: [C++ PATCH] c++/91353 - P1331R2: Allow trivial default init in constexpr contexts.

2019-12-02 Thread Marek Polacek

On Mon, Dec 02, 2019 at 12:09:17PM -0500, Jason Merrill wrote:
> On 12/1/19 8:09 PM, Marek Polacek wrote:
> > --- gcc/cp/constexpr.c
> > +++ gcc/cp/constexpr.c
> > @@ -779,7 +779,9 @@ cx_check_missing_mem_inits (tree ctype, tree body, bool 
> > complain)
> > if (TREE_CODE (ctype) == UNION_TYPE)
> >   {
> > -  if (nelts == 0 && next_initializable_field (field))
> > +  if (cxx_dialect < cxx2a
> > + && nelts == 0
> > + && next_initializable_field (field))
> 
> Do we want cx_check_missing_mem_inits to do anything in C++20?  Or can we
> return immediately at the beginning?

We can just return right away.  Changed.

> > {
> >   if (complain)
> > error ("% constructor for union %qT must "
> > @@ -2153,15 +2157,27 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
> > tree t,
> > entry->result = result;
> >   }
> > -  /* The result of a constexpr function must be completely initialized.  */
> > -  if (TREE_CODE (result) == CONSTRUCTOR)
> > +  /* The result of a constexpr function must be completely initialized.
> > +
> > + However, in C++20, a constexpr constructor doesn't necessarily have
> > + to initialize all the fields, so we don't clear 
> > CONSTRUCTOR_NO_CLEARING
> > + in order to detect reading an unitialized object in constexpr instead
> > + of value-initializing it.  (reduced_constant_expression_p is expected 
> > to
> > + take care of clearing the flag.)  */
> > +  if (TREE_CODE (result) == CONSTRUCTOR
> > +  && (cxx_dialect < cxx2a
> > + || !DECL_CONSTRUCTOR_P (fun)
> > + || !DECL_DECLARED_CONSTEXPR_P (fun)))
> 
> How can we get here for a non-constexpr function?

We can't -- I was just being explicit.  I dropped the DECL_DECLARED_CONSTEXPR_P
line.

> >   clear_no_implicit_zero (result);
> > pop_cx_call_context ();
> > return result;
> >   }
> > -/* FIXME speed this up, it's taking 16% of compile time on sieve testcase. 
> >  */
> > +/* Return true if T is a valid constant initializer.  If a CONSTRUCTOR
> > +   initializes all the members, the CONSTRUCTOR_NO_CLEARING flag will be
> > +   cleared.
> > +   FIXME speed this up, it's taking 16% of compile time on sieve testcase. 
> >  */
> >   bool
> >   reduced_constant_expression_p (tree t)
> > @@ -2183,8 +2199,17 @@ reduced_constant_expression_p (tree t)
> >   if (TREE_CODE (TREE_TYPE (t)) == VECTOR_TYPE)
> > /* An initialized vector would have a VECTOR_CST.  */
> > return false;
> > + else if (cxx_dialect >= cxx2a
> > +  /* An ARRAY_TYPE doesn't have any TYPE_FIELDS.  */
> > +  && (TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE
> > +  /* A union only initializes one member.  */
> > +  || TREE_CODE (TREE_TYPE (t)) == UNION_TYPE))
> > +   field = NULL_TREE;
> >   else
> > -   field = next_initializable_field (TYPE_FIELDS (TREE_TYPE (t)));
> > +   /* In C++20, skip fields that don't actually need initializing,
> > +  like empty bases.  */
> > +   field = next_initializable_field (TYPE_FIELDS (TREE_TYPE (t)),
> > + cxx_dialect >= cxx2a);
> > }
> > else
> > field = NULL_TREE;
> > @@ -2198,7 +2223,8 @@ reduced_constant_expression_p (tree t)
> > {
> >   if (idx != field)
> > return false;
> > - field = next_initializable_field (DECL_CHAIN (field));
> > + field = next_initializable_field (DECL_CHAIN (field),
> > +   cxx_dialect >= cxx2a);
> > }
> > }
> > if (field)
> > diff --git gcc/cp/cp-tree.h gcc/cp/cp-tree.h
> > index 7e810b8ee7b..35f3352bdff 100644
> > --- gcc/cp/cp-tree.h
> > +++ gcc/cp/cp-tree.h
> > @@ -6541,7 +6541,7 @@ extern bool is_direct_enum_init   
> > (tree, tree);
> >   extern void initialize_artificial_var (tree, 
> > vec *);
> >   extern tree check_var_type(tree, tree, 
> > location_t);
> >   extern tree reshape_init(tree, tree, 
> > tsubst_flags_t);
> > -extern tree next_initializable_field (tree);
> > +extern tree next_initializable_field   (tree, bool = false);
> >   extern tree fndecl_declared_return_type   (tree);
> >   extern bool undeduced_auto_decl   (tree);
> >   extern bool require_deduced_type  (tree, tsubst_flags_t = 
> > tf_warning_or_error);
> > diff --git gcc/cp/decl.c gcc/cp/decl.c
> > index 81d73433547..9feba7d5b54 100644
> > --- gcc/cp/decl.c
> > +++ gcc/cp/decl.c
> > +next_initializable_field (tree field, bool skip_empty /*=false*/)
> >   {
> > while (field
> >  && (TREE_CODE (field) != FIELD_DECL
> >  || DECL_UNNAMED_BIT_FIELD (field)
> >  || (DECL_ARTIFICIAL (field)
> > -&& !(cxx_dialect >= cxx17 && DECL_FIELD_IS_BASE (field)
> > +/* In C++17, don't skip base class fields.  */
> > +

Re: [PATCH], V8, #6 of #6, Testsuite: Test -fstack-protect-strong works with prefixed addressing

Hi Mike,

On Thu, Nov 14, 2019 at 07:26:44PM -0500, Michael Meissner wrote:
>   * gcc.target/powerpc/prefix-stack-protect.c: New test to make sure
>   -fstack-protect-strong works with prefixed addressing.

The option is -fstack-protector-strong.  But what is tested is equally
true for any use of the stack protector, whichever subflag is used, so
it should just say that (and in the changelog, just "New test.").

> --- /tmp/byVdyb_prefix-stack-protect.c2019-11-13 17:45:35.374176204 
> -0500
> +++ gcc/testsuite/gcc.target/powerpc/prefix-stack-protect.c   2019-11-13 
> 17:45:35.143174125 -0500
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=future -fstack-protector-strong" } */
> +
> +/* Test that we can handle large stack frames with -fstack-protector-strong 
> and
> +   prefixed addressing.  */
> +
> +extern long foo (char *);
> +
> +long
> +bar (void)
> +{
> +  char buffer[0x2];
> +  return foo (buffer) + 1;
> +}
> +
> +/* { dg-final { scan-assembler {\mpld\M}  } } */
> +/* { dg-final { scan-assembler {\mpstd\M} } } */

So what it actually tests is if we use prefixed loads and stores for
the stack protector generated code?


Segher

Re: [C++ PATCH] Fix up constexpr virtual calls (PR c++/92695)

On Mon, Dec 02, 2019 at 02:29:56PM -0500, Jason Merrill wrote:
> On 11/27/19 6:38 PM, Jakub Jelinek wrote:
> > + if (i == 0 && DECL_VIRTUAL_P (fun))
> > +   {
> > + tree addr = arg;
> > + STRIP_NOPS (addr);
> > + if (TREE_CODE (addr) == ADDR_EXPR)
> > +   {
> > + tree obj = TREE_OPERAND (addr, 0);
> > + while (TREE_CODE (obj) == COMPONENT_REF
> > +&& DECL_FIELD_IS_BASE (TREE_OPERAND (obj, 1)))
> > +   obj = TREE_OPERAND (obj, 0);
> 
> Shouldn't this loop stop when we reach DECL_CONTEXT (fun)?

I guess it can, patch I can test tonight attached.  I thought it doesn't matter
much, because cxx_eval_indirect_ref would be able to go from further derived
class to the base if we want too far, but maybe that is not the case
with multiple inheritance.

> Relatedly, I notice that the OBJ_TYPE_REF code breaks with multiple
> inheritance: Adding a virtual function to Y in constexpr-virtual6.C leads to

I'm afraid I'd be lost with that.  There is another bug in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92695#c12 about constexpr
defaulted virtual destructors not being synthetized until end of TU.

2019-12-02  Jakub Jelinek  

PR c++/92695
* constexpr.c (cxx_bind_parameters_in_call): For virtual calls,
adjust the first argument to point to the derived object rather
than its base.

* g++.dg/cpp2a/constexpr-virtual14.C: New test.

--- gcc/cp/constexpr.c.jj   2019-12-02 22:30:31.608306270 +0100
+++ gcc/cp/constexpr.c  2019-12-02 22:35:42.374501609 +0100
@@ -1441,6 +1441,26 @@ cxx_bind_parameters_in_call (const const
arg = adjust_temp_type (type, arg);
  if (!TREE_CONSTANT (arg))
*non_constant_args = true;
+ /* For virtual calls, adjust the this argument, so that it is
+the object on which the method is called, rather than
+one of its bases.  */
+ if (i == 0 && DECL_VIRTUAL_P (fun))
+   {
+ tree addr = arg;
+ STRIP_NOPS (addr);
+ if (TREE_CODE (addr) == ADDR_EXPR)
+   {
+ tree obj = TREE_OPERAND (addr, 0);
+ while (TREE_CODE (obj) == COMPONENT_REF
+&& DECL_FIELD_IS_BASE (TREE_OPERAND (obj, 1))
+&& !same_type_ignoring_top_level_qualifiers_p
+   (TREE_TYPE (obj), DECL_CONTEXT (fun)))
+   obj = TREE_OPERAND (obj, 0);
+ if (obj != TREE_OPERAND (addr, 0))
+   arg = build_fold_addr_expr_with_type (obj,
+ TREE_TYPE (arg));
+   }
+   }
  TREE_VEC_ELT (binds, i) = arg;
}
   parms = TREE_CHAIN (parms);
--- gcc/testsuite/g++.dg/cpp2a/constexpr-virtual14.C.jj 2019-12-02 
22:34:05.554998508 +0100
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-virtual14.C2019-12-02 
22:34:05.554998508 +0100
@@ -0,0 +1,27 @@
+// PR c++/92695
+// { dg-do compile { target c++2a } }
+
+struct A {
+  virtual int get () = 0;
+  virtual int set (A *o) = 0;
+};
+struct B : A {
+  constexpr int get () override { return 10; }
+  constexpr int set (A *o) override { a = o; return 20; }
+  A *a {};
+};
+constexpr auto addressof = [] (A ) { return  };
+struct C {
+  B b;
+  A *c { addressof (b) };
+  constexpr int add () { return c->set (addressof (b)); }
+};
+struct D {
+  B b[2];
+  A *c { addressof (b[0]) };
+  constexpr int add () { return c->set (addressof (b[0])); }
+};
+template 
+constexpr int get () { T f; return f.add (); }
+static_assert (get () == 20);
+static_assert (get () == 20);


Jakub

[PATCH] doc: Note that some warnings depend on optimizations (PR 92757)

2019-12-02 Thread Jonathan Wakely


PR driver/92757
* doc/invoke.texi (Warning Options): Add caveat about some warnings
depending on optimization settings.

The bug reporter wants this clarified. I'm not entirely convinced it's
necessary, but it doesn't seem to do any harm.

OK for trunk?


commit 5e0f6aebde20712f318d3bcfcd25809e04be4460
Author: Jonathan Wakely 
Date:   Mon Dec 2 21:32:54 2019 +

doc: Note that some warnings depend on optimizations (PR 92757)

PR driver/92757
* doc/invoke.texi (Warning Options): Add caveat about some warnings
depending on optimization settings.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d165f31a865..099faf5f662 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -4361,6 +4361,11 @@ are being produced.  This allows the use of new 
@option{-Wno-} options
 with old compilers, but if something goes wrong, the compiler
 warns that an unrecognized option is present.
 
+The effectiveness of some warnings depends on optimizations also being
+enabled. For example @option{-Wsuggest-final-types} is more effective
+with link-time optimization and @option{-Wmaybe-uninitialized} will not
+warn at all unless optimization is enabled.
+
 @table @gcctabopt
 @item -Wpedantic
 @itemx -pedantic

Re: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end

2019-12-02 Thread Joseph Myers

On Mon, 2 Dec 2019, Jeff Law wrote:

> > 2019-11-13  Stam Markianos-Wright  
> > 
> >* real.c (struct arm_bfloat_half_format,
> >encode_arm_bfloat_half, decode_arm_bfloat_half): New.
> >* real.h (arm_bfloat_half_format): New.
> > 
> > 
> Generally OK.  Please consider using "arm_bfloat_half" instead of
> "bfloat_half" for the name field in the arm_bfloat_half_format
> structure.  I'm not sure if that's really visible externally, but it

Isn't this the same format used by AVX512_BF16 / Intel DL Boost (albeit 
with Arm and Intel using different rounding modes)?

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH], V8, #5 of 6, Testsuite: Test PC-relative load/store instructions

On Thu, Nov 14, 2019 at 07:24:15PM -0500, Michael Meissner wrote:
> This patch adds tests for using the PC-relative addressing on the 'future'
> system.

>   * gcc/testsuite/gcc.target/powerpc/prefix-pcrel.h: New set of
>   tests to test prefixed addressing on 'future' system with
>   PC-relative tests.

This is one file, it is not a "set".  Just say "New file."?

> --- /tmp/79Y8V6_prefix-pcrel-dd.c 2019-11-13 17:43:34.462087329 -0500
> +++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-dd.c2019-11-13 
> 17:43:34.183084816 -0500
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_pcrel_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=future" } */
> +
> +/* Tests for prefixed instructions testing whether pc-relative prefixed
> +   instructions are generated for SImode.  */

This is DDmode.

> --- /tmp/BalpdH_prefix-pcrel-kf.c 2019-11-13 17:43:34.494087617 -0500
> +++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-kf.c2019-11-13 
> 17:43:34.205085014 -0500
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_pcrel_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=future" } */
> +
> +/* Tests for prefixed instructions testing whether pc-relative prefixed
> +   instructions are generated for KFmode.  */
> +
> +#define TYPE __float128

So use __ieee128, instead?

This will be *first* supported on powerpc64le-linux only, sure.

> --- /tmp/74AMuu_prefix-pcrel-sd.c 2019-11-13 17:43:34.509087752 -0500
> +++ gcc/testsuite/gcc.target/powerpc/prefix-pcrel-sd.c2019-11-13 
> 17:43:34.215085104 -0500
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_pcrel_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=future" } */
> +
> +/* Tests for prefixed instructions testing whether pc-relative prefixed
> +   instructions are generated for SImode.  */

SDmode.

> +typedef signed char  schar;
> +typedef unsigned charuchar;
> +typedef unsigned short   ushort;
> +typedef unsigned int uint;
> +typedef unsigned longulong;
> +typedef long double  ldouble;
> +typedef vector doublev2df;
> +typedef vector long  v2di;
> +typedef vector float v4sf;
> +typedef vector int   v4si;
> +
> +#ifndef TYPE
> +#define TYPE ulong
> +#endif

Same comment as before.

> +#if DO_VALUE
> +OTYPE
> +value (void)
> +{
> +  return (OTYPE)a;
> +}
> +#endif
> +
> +#if DO_SET
> +void
> +set (ITYPE b)
> +{
> +  a = (TYPE)b;
> +}
> +#endif

The casts here are not needed, and will only make problems harder to find?


Segher

Re: [PATCH], V8, #4 of 6, Testsuite: Test for prefixed instructions with large offsets

Hi!

On Thu, Nov 14, 2019 at 07:21:31PM -0500, Michael Meissner wrote:
> This patch tests whether using large numeric offsets causes prefixed loads or
> stores to be generated.

> --- /tmp/RMaUEu_prefix-large-dd.c 2019-11-13 17:42:31.960524470 -0500
> +++ gcc/testsuite/gcc.target/powerpc/prefix-large-dd.c2019-11-13 
> 17:42:31.719522299 -0500
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=future" } */
> +
> +/* Tests for prefixed instructions testing whether we can generate a prefixed
> +   load/store instruction that has a 34-bit offset.  */

"Do", not "can", etc.  And it tests for 21-bit (unsigned) or 22-bit
(signed) numbers, not 34-bit.  You could just say "larger offset", or
similar?

> --- /tmp/rcmxsH_prefix-large-sd.c 2019-11-13 17:42:32.004524866 -0500
> +++ gcc/testsuite/gcc.target/powerpc/prefix-large-sd.c2019-11-13 
> 17:42:31.751522588 -0500
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=future" } */
> +
> +/* Tests for prefixed instructions testing whether we can generate a prefixed
> +   load/store instruction that has a 34-bit offset.  */

That is not what this test tests:

> +#define TYPE _Decimal32
> +
> +#include "prefix-large.h"
> +
> +/* { dg-final { scan-assembler-times {\mpaddi\M|\mpli|\mpla\M} 3 } } */
> +/* { dg-final { scan-assembler-times {\mlfiwzx\M}  2 } } */
> +/* { dg-final { scan-assembler-times {\mstfiwx\M}  2 } } */

It tests that we *do not* generate prefixed load and store insns.

> --- /tmp/ROIUrk_prefix-large.h2019-11-13 17:42:32.061525379 -0500
> +++ gcc/testsuite/gcc.target/powerpc/prefix-large.h   2019-11-13 
> 17:42:31.775522804 -0500
> @@ -0,0 +1,59 @@
> +/* Common tests for prefixed instructions testing whether we can generate a
> +   34-bit offset using 1 instruction.  */
> +
> +typedef signed char  schar;
> +typedef unsigned charuchar;
> +typedef unsigned short   ushort;
> +typedef unsigned int uint;
> +typedef unsigned longulong;
> +typedef long double  ldouble;

"long double" isn't a specific type, not without other setup, but you
never use this one anyway.  Well, none of these typedefs at all?  So just
remove it all please.

> +typedef vector doublev2df;
> +typedef vector long  v2di;
> +typedef vector float v4sf;
> +typedef vector int   v4si;

> +#ifndef TYPE
> +#define TYPE ulong
> +#endif

(So adjust this one then).


Segher

Re: [PATCH], V8, #3 of 6, Testsuite: Insure no prefixed instruction uses update addressing

On Thu, Nov 14, 2019 at 07:19:15PM -0500, Michael Meissner wrote:
> +/* { dg-final { scan-assembler-times {\mpli\M|\mpla\M|\mpaddi\M} 4 } } */

How can this generate pli or pla?  If this is something this test tests
for, the comment should say.

> +/* { dg-final { scan-assembler-not   {\mp?stwzu\M} } } */

That instruction does not exist anyway (it's stwu).

> +/* { dg-final { scan-assembler-not   {\maddis\M}   } } */
> +/* { dg-final { scan-assembler-not   {\maddi\M}} } */

See above.

Segher

[PATCH] Enhance _GLIBCXX_DEBUG constexpr support

2019-12-02 Thread François Dumont


Hi

    Here is a patch to enhance constexpr support in _GLIBCXX_DEBUG. I 
work on std::lower_bound/upper_bound to find out if Debug mode is well 
prepared. I'll continue on other algos later.


    I initially hope that I could count on the compiler for the 
valid_range check. But for lower_bound/upper_bound there is no constexpr 
dedicated implementation like for copy/copy_backward so it implies 
changing the existing implementation. So I tried to change the 
while(__len > 0) into a while(__len != 0) and it gave:


constexpr_valid_range_neg.cc:35:20: error: non-constant condition for 
static assertion

   35 | static_assert(test1()); // { dg-error "" }
  |   ~^~
In file included from 
/home/fdt/dev/gcc/install/include/c++/10.0.0/algorithm:61,

 from constexpr_valid_range_neg.cc:22:
constexpr_valid_range_neg.cc:35:20:   in ‘constexpr’ expansion of ‘test1()’
constexpr_valid_range_neg.cc:30:38:   in ‘constexpr’ expansion of 
‘std::lower_bound(ca0.std::array::end(), 
ca0.std::array::begin(), 6)’
/home/fdt/dev/gcc/install/include/c++/10.0.0/bits/stl_algobase.h:1484:32: 
in ‘constexpr’ expansion of ‘std::__lower_bound__gnu_cxx::__ops::_Iter_less_val>(__first, __last, (* & __val), 
__gnu_cxx::__ops::__iter_less_val())’
/home/fdt/dev/gcc/install/include/c++/10.0.0/bits/stl_algobase.h:1444:7: 
error: ‘constexpr’ loop iteration count exceeds limit of 262144 (use 
‘-fconstexpr-loop-limit=’ to increase the limit)

 1444 |   while (__len != 0)
  |   ^

    It seems ok but it isn't. The compiler had to loop 262144 times to 
eventually give this status which is not even clear about the fact that 
begin/end has been inverted. It is a quite heavy operation for a limited 
result.


    So this patch rather enable _GLIBCXX_DEBUG valid_range check which 
gives:


constexpr_valid_range_neg.cc:35:20: error: non-constant condition for 
static assertion

   35 | static_assert(test1()); // { dg-error "" }
  |   ~^~
In file included from 
/home/fdt/dev/gcc/install/include/c++/10.0.0/debug/debug.h:90,
 from 
/home/fdt/dev/gcc/install/include/c++/10.0.0/bits/stl_algobase.h:69,
 from 
/home/fdt/dev/gcc/install/include/c++/10.0.0/algorithm:61,

 from constexpr_valid_range_neg.cc:22:
constexpr_valid_range_neg.cc:35:20:   in ‘constexpr’ expansion of ‘test1()’
constexpr_valid_range_neg.cc:30:38:   in ‘constexpr’ expansion of 
‘std::lower_bound(ca0.std::__debug::array12>::end(), ca0.std::__debug::array::begin(), 6)’
/home/fdt/dev/gcc/install/include/c++/10.0.0/bits/stl_algobase.h:1482:7: 
error: inline assembly is not a constant expression

 1482 |   __glibcxx_requires_partitioned_lower(__first, __last, __val);
  |   ^~~~
/home/fdt/dev/gcc/install/include/c++/10.0.0/bits/stl_algobase.h:1482:7: 
note: only unevaluated inline assembly is allowed in a ‘constexpr’ 
function in C++2a


This time it is done in no time. Of course you can see that the asm 
trick to generate a non-constant condition is not so nice.


We just don't see the asm call parameter but showing 
__glibcxx_requires_partitioned_lower is not so bad. For this reason the 
tests in this patch are not checking for any failure message. We'll see 
how to adapt when we have the necessary front end help to generate this 
compilation error.


Of course I could have a nicer compilation error by directly calling 
__glibcxx_requires_valid_range in the algo and stop doing it within 
__glibcxx_requires_partitioned_lower. Do you want to expose all macros 
this way ?



    * include/debug/formatter.h (__check_singular): Add C++11 constexpr
    qualification.
    * include/debug/helper_functions.h (__check_singular): Likewise. Skip
    check if constant evaluated.
    (__valid_range): Remove check skip if constant evaluated.
    * include/debug/macros.h [_GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED]
    (_GLIBCXX_DEBUG_VERIFY_COND_AT): Define.
    * testsuite/25_algorithms/lower_bound/constexpr.cc (test): Add 
checks on

    lower_bound results.
    * testsuite/25_algorithms/upper_bound/constexpr.cc (test): Likewise.
    * testsuite/25_algorithms/lower_bound/debug/
    constexpr_partitioned_neg.cc: New.
    * testsuite/25_algorithms/lower_bound/debug/
    constexpr_partitioned_pred_neg.cc: New.
    * testsuite/25_algorithms/lower_bound/debug/
    constexpr_valid_range_neg.cc: New.
    * testsuite/25_algorithms/lower_bound/debug/partitioned_neg.cc: New.
    * testsuite/25_algorithms/lower_bound/debug/partitioned_pred_neg.cc:
    New.
    * testsuite/25_algorithms/upper_bound/debug/
    constexpr_partitioned_neg.cc: New.
    * testsuite/25_algorithms/upper_bound/debug/
    constexpr_partitioned_pred_neg.cc: New.
    * testsuite/25_algorithms/upper_bound/debug/
    constexpr_valid_range_neg.cc: New.
    * testsuite/25_algorithms/upper_bound/debug/partitioned_neg.cc: New.
    *

Re: [C++ PATCH] Fix up constexpr virtual calls (PR c++/92695)

On 11/27/19 6:38 PM, Jakub Jelinek wrote:

Hi!

The OBJ_TYPE_REF constexpr handling looks through DECL_FIELD_IS_BASE
COMPONENT_REFs to find the actual object on which the method is called,
but as the following testcase shows, we need to do the similar thing also
for the argument passed as this, because cxx_eval_indirect_ref otherwise
fails to fold it.  It can handle going from derived to base, not not vice
versa.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2019-11-27  Jakub Jelinek  

PR c++/92695
* constexpr.c (cxx_bind_parameters_in_call): For virtual calls,
adjust the first argument to point to the derived object rather
than its base.

* g++.dg/cpp2a/constexpr-virtual14.C: New test.

--- gcc/cp/constexpr.c.jj   2019-11-27 17:26:57.229014008 +0100
+++ gcc/cp/constexpr.c  2019-11-27 17:53:37.477566346 +0100
@@ -1441,6 +1441,24 @@ cxx_bind_parameters_in_call (const const
arg = adjust_temp_type (type, arg);
  if (!TREE_CONSTANT (arg))
*non_constant_args = true;
+ /* For virtual calls, adjust the this argument, so that it is
+the object on which the method is called, rather than
+one of its bases.  */
+ if (i == 0 && DECL_VIRTUAL_P (fun))
+   {
+ tree addr = arg;
+ STRIP_NOPS (addr);
+ if (TREE_CODE (addr) == ADDR_EXPR)
+   {
+ tree obj = TREE_OPERAND (addr, 0);
+ while (TREE_CODE (obj) == COMPONENT_REF
+&& DECL_FIELD_IS_BASE (TREE_OPERAND (obj, 1)))
+   obj = TREE_OPERAND (obj, 0);

Shouldn't this loop stop when we reach DECL_CONTEXT (fun)?

Relatedly, I notice that the OBJ_TYPE_REF code breaks with multiple 
inheritance: Adding a virtual function to Y in constexpr-virtual6.C leads to

> wa.ii:26:22: error: non-constant condition for static assertion
>26 | static_assert(r2.f() == );
>   |   ~~~^~
> wa.ii:26:19: error: call to non-'constexpr' function 'virtual void 
Y::dummy()'

>26 | static_assert(r2.f() == );
>   |   ^~
> wa.ii:12:16: note: 'virtual void Y::dummy()' declared here
>12 |   virtual void dummy () { }

Re: [PATCH], V8, #2 of 6, Testsuite: Test illegal DS/DQ offsets become prefixed insns

On Thu, Nov 14, 2019 at 07:16:28PM -0500, Michael Meissner wrote:
> This test tests whether traditional DS and DQ instructions that require the
> bottom 2 bits of the offset to be zero (DS) or the bottom 4 of the offset to 
> be
> zero (DQ) properly generate the prefixed form of the instruction instead of
> loading the offset into a GPR and doing an indexed memory operation.

>   * gcc.target/powerpc/prefix-odd-memory.c: New test to make sure
>   prefixed instructions are generated if an offset would not be
>   legal for the non-prefixed DS/DQ instructions.

Same comments as before.

> +/* { dg-require-effective-target powerpc_prefixed_addr_ok } */

What are the plans for getting rid of this test?  Do we need it at all
even now, in the tests that use -mcpu=future anyway?

> +/* Tests whether we can generate a prefixed load/store operation for 
> addresses
> +   that don't meet DS/DQ alignment constraints.  */

Test that we *do*, even; and also, test that we *don't* when it isn't
necessary.

And it is about the low bits of the offset on these insns, not about the
alignment.

> +  *(float *)(p + 1) = f; /* should generate STF.  */

stfs.

> +  *(double *)(p + 1) = d;/* should generate STD.  */

stfd, instead.


Segher

Re: [C++ PATCH] Fix OBJ_TYPE_REF constexpr handling (PR c++/92695)


On 11/27/19 6:44 PM, Jakub Jelinek wrote:

Hi!

On the following testcase the constexpr evaluation of the virtual call
fails, because what cxx_eval_constant_expression returns for
OBJ_TYPE_REF_OBJECT is actually not ADDR_EXPR, but ADDR_EXPR wrapped in
a NOP_EXPR.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?


OK.


2019-11-27  Jakub Jelinek  

PR c++/92695
* constexpr.c (cxx_eval_constant_expression) : Use
STRIP_NOPS before checking for ADDR_EXPR.

* g++.dg/cpp2a/constexpr-virtual15.C: New test.

--- gcc/cp/constexpr.c.jj   2019-11-27 17:53:37.477566346 +0100
+++ gcc/cp/constexpr.c  2019-11-27 21:16:51.094188509 +0100
@@ -5566,6 +5566,7 @@ cxx_eval_constant_expression (const cons
tree obj = OBJ_TYPE_REF_OBJECT (t);
obj = cxx_eval_constant_expression (ctx, obj, lval, non_constant_p,
overflow_p);
+   STRIP_NOPS (obj);
/* We expect something in the form of  get x. */
if (TREE_CODE (obj) != ADDR_EXPR
|| !DECL_P (get_base_address (TREE_OPERAND (obj, 0
--- gcc/testsuite/g++.dg/cpp2a/constexpr-virtual15.C.jj 2019-11-27 
21:18:15.418895652 +0100
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-virtual15.C2019-11-27 
21:17:48.602306802 +0100
@@ -0,0 +1,7 @@
+// PR c++/92695
+// { dg-do compile { target c++2a } }
+
+struct A { virtual int get() = 0; };
+struct B : A { constexpr int get() override { return 10; } };
+struct D { B b[2]; A* c{&(b[0])}; };
+static_assert(D{}.c->get() == 10);

Jakub

Re: Ping: [C++ PATCH] Opt out of GNU vector extensions for built-in SVE types


On 11/29/19 5:59 AM, Richard Sandiford wrote:

Ping

Richard Sandiford  writes:

This is the C++ equivalent of r277950, which prevented the
use of the GNU vector extensions with SVE vector types for C.
[https://gcc.gnu.org/viewcvs/gcc?view=revision=277950].
I've copied the rationale below for reference.

The changes here are very similar to the C ones.  Perhaps the only
noteworthy thing (that I know of) is that the patch continues to treat
!gnu_vector_type_p vector types as literal types/potential constexprs.
Disabling the GNU vector extensions shouldn't in itself stop the types
from being literal types, since whatever the target provides instead
might be constexpr material.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard

-
The AArch64 port defines built-in SVE types at start-up under names
like __SVInt8_t.  These types are represented in the front end and
gimple as normal VECTOR_TYPEs and are code-generated as normal vectors.
However, we'd like to stop the frontends from treating them in the
same way as GNU-style ("vector_size") vectors, for several reasons:

(1) We allowed the GNU vector extensions to be mixed with Advanced SIMD
 vector types and it ended up causing a lot of confusion on big-endian
 targets.  Although SVE handles big-endian vectors differently from
 Advanced SIMD, there are still potential surprises; see the block
 comment near the head of aarch64-sve.md for details.

(2) One of the SVE vectors is a packed one-bit-per-element boolean vector.
 That isn't a combination the GNU vector extensions have supported
 before.  E.g. it means that vectors can no longer decompose to
 arrays for indexing, and that not all elements are individually
 addressable.  It also makes it less clear which order the initialiser
 should be in (lsb first, or bitfield ordering?).  We could define
 all that of course, but it seems a bit weird to go to the effort
 for this case when, given all the other reasons, we don't want the
 extensions anyway.

(3) The GNU vector extensions only provide full-vector operations,
 which is a very artifical limitation on a predicated architecture
 like SVE.

(4) The set of operations provided by the GNU vector extensions is
 relatively small, whereas the SVE intrinsics provide many more.

(5) It makes it easier to ensure that (with default options) code is
 portable between compilers without the GNU vector extensions having
 to become an official part of the SVE intrinsics spec.

(6) The length of the SVE types is usually not fixed at compile time,
 whereas the GNU vector extension is geared around fixed-length
 vectors.

 It's possible to specify the length of an SVE vector using the
 command-line option -msve-vector-bits=N, but in principle it should
 be possible to have functions compiled for different N in the same
 translation unit.  This isn't supported yet but would be very useful
 for implementing ifuncs.  Once mixing lengths in a translation unit
 is supported, the SVE types should represent the same type throughout
 the translation unit, just as GNU vector types do.

However, when -msve-vector-bits=N is in effect, we do allow conversions
between explicit GNU vector types of N bits and the corresponding SVE
types.  This doesn't undermine the intent of (5) because in this case
the use of GNU vector types is explicit and intentional.  It also doesn't
undermine the intent of (6) because converting between the types is just
a conditionally-supported operation.  In other words, the types still
represent the same types throughout the translation unit, it's just that
conversions between them are valid in cases where a certain precondition
is known to hold.  It's similar to the way that the SVE vector types are
defined throughout the translation unit but can only be used in functions
for which SVE is enabled.
-


2019-11-08  Richard Sandiford  

gcc/cp/
* cp-tree.h (CP_AGGREGATE_TYPE_P): Check for gnu_vector_type_p
instead of VECTOR_TYPE.
* call.c (build_conditional_expr_1): Restrict vector handling
to vectors that satisfy gnu_vector_type_p.  Don't treat the
"then" and "else" types as equivalent if they have the same
vector shape but differ in whether they're GNU vectors.
* cvt.c (ocp_convert): Only allow vectors to be converted
to bool if they satisfy gnu_vector_type_p.
(build_expr_type_conversion): Only allow conversions from
vectors if they satisfy gnu_vector_type_p.
* typeck.c (cp_build_binary_op): Only allow binary operators to be
applied to vectors if they satisfy gnu_vector_type_p.
(cp_build_unary_op): Likewise unary operators.
(build_reinterpret_cast_1):

Index: gcc/cp/call.c

Re: [C++ Patch] A few more cp_expr_loc_or_input_loc and a diagnostic regression fix


On 11/29/19 8:08 AM, Paolo Carlini wrote:

Hi,

a few more rather straightforward uses for cp_expr_loc_or_input_loc.

Additionally, while working on the latter, I noticed that, compared to 
say, gcc-7, lately the code we have in cp_build_addr_expr_1 to diagnose 
taking the address of 'main' often doesn't work anymore, when the 
argument is wrapped in a location_wrapper. The below fixes that by using 
tree_strip_any_location_wrapper in the DECL_MAIN_P check, which works 
fine, but I can imagine various other ways to solve the issue...


Maybe

location_t loc = cp_expr_loc_or_input_loc (arg);
STRIP_ANY_LOCATION_WRAPPER (arg);

at the top?  In general I prefer the local variable to writing 
cp_expr_loc_or_input_loc in lots of places, whether or not we then strip 
wrappers.


Jason

[PATCH] rs6000: Make rs6000_invalid_builtin static (committed)

2019-12-02 Thread Bill Schmidt


Hi,

I noticed this function should have been made static in the recent separation of
rs6000-call.c from rs6000.c.  Bootstrapped and tested on powerpc64le-linux-gnu,
committed.

Thanks!
Bill


Make rs6000_invalid_builtin static.

2019-12-02  Bill Schmidt  

* config/rs6000/rs6000-call.c (rs6000_invalid_builtin): Make
static.
* config/rs6000/rs6000-internal.h (rs6000_invalid_builtin): Remove
decl.


diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 7280a4ed9c8..9c9da09af5e 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5087,7 +5087,7 @@ rs6000_builtin_is_supported_p (enum rs6000_builtins 
fncode)
 /* Raise an error message for a builtin function that is called without the
appropriate target options being set.  */
 
-void

+static void
 rs6000_invalid_builtin (enum rs6000_builtins fncode)
 {
   size_t uns_fncode = (size_t) fncode;
diff --git a/gcc/config/rs6000/rs6000-internal.h 
b/gcc/config/rs6000/rs6000-internal.h
index baccfb3f887..51eb3e053cf 100644
--- a/gcc/config/rs6000/rs6000-internal.h
+++ b/gcc/config/rs6000/rs6000-internal.h
@@ -138,7 +138,6 @@ extern void rs6000_output_mi_thunk (FILE *file,
tree function);
 extern bool rs6000_output_addr_const_extra (FILE *file, rtx x);
 extern bool rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi);
-extern void rs6000_invalid_builtin (enum rs6000_builtins fncode);
 extern tree rs6000_build_builtin_va_list (void);
 extern void rs6000_va_start (tree valist, rtx nextarg);
 extern tree rs6000_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p,

Re: [PATCH], V8, #1 of 6, Tessuite: Add PADDI tests

On Thu, Nov 14, 2019 at 07:12:54PM -0500, Michael Meissner wrote:
> This patch adds 3 tests that tests whether PLI (PADDI) is generated to load up
> DImode constants, load up SImode constants, and adding 34-bit constants.

>   * gcc.target/powerpc/paddi-1.c: New test to test using PLI to
>   load up a large DImode constant.
>   * gcc.target/powerpc/paddi-2.c: New test to test using PLI to
>   load up a large SImode constant.
>   * gcc.target/powerpc/paddi-3.c: New test to test using PADDI to
>   add a large DImode constant.

* gcc.target/powerpc/paddi-1.c: New test.
* gcc.target/powerpc/paddi-2.c: New test.
* gcc.target/powerpc/paddi-3.c: New test.

Changelogs say *what*, not *why*.  If you want to say what is tested,
say that in the testcases itself (you should, in fact; and more precise
than this please).

> --- /tmp/s2UNQW_paddi-1.c 2019-11-13 17:39:21.274807246 -0500
> +++ gcc/testsuite/gcc.target/powerpc/paddi-1.c2019-11-13 
> 17:39:21.067805382 -0500
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_prefixed_addr_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=future" } */
> +
> +/* Test that PADDI is generated to add a large constant.  */
> +unsigned long
> +add (unsigned long a)
> +{
> +  return a + 0x12345678UL;
> +}
> +
> +/* { dg-final { scan-assembler {\mpaddi\M} } } */

This will always succeed (it matches the ".file" line).


Segher

Re: [C++ PATCH] Fix ICE in build_new_op_1 (PR c++/92705)


On 11/29/19 6:33 PM, Jakub Jelinek wrote:

Hi!

The changed code in build_new_op_1 ICEs on the following testcase,
because conv is user_conv_p with kind == ck_ambig, for which next_conversion
returns NULL.  It seems in other spots where for user_conv_p we are walking
the conversion chain we also don't assume there must be ck_user, so this
patch just uses the first conv if ck_user is not found (so that the previous
diagnostics about ambiguous conversion is emitted).


It seems like various places could use a function to strip down to the 
first ck_user, ck_ambig, ck_list, ck_aggr, or ck_identity conversion 
encountered: source_type, the user-defined conversion comparision in 
compare_ics, and now here.  Mind doing that refactoring?  Maybe call it 
strip_standard_conversion.



Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-29  Jakub Jelinek  

PR c++/92705
* call.c (build_new_op_1): For user_conv_p, if there is no
ck_user conversion, use the first one.

* g++.dg/conversion/ambig4.C: New test.

--- gcc/cp/call.c.jj2019-11-29 12:50:09.664608879 +0100
+++ gcc/cp/call.c   2019-11-29 14:09:54.311718859 +0100
@@ -6370,8 +6370,12 @@ build_new_op_1 (const op_location_t 
  conv = cand->convs[0];
  if (conv->user_conv_p)
{
- while (conv->kind != ck_user)
-   conv = next_conversion (conv);
+ for (conversion *t = conv; t; t = next_conversion (t))
+   if (t->kind == ck_user)
+ {
+   conv = t;
+   break;
+ }
  arg1 = convert_like (conv, arg1, complain);
}
  
@@ -6380,8 +6384,12 @@ build_new_op_1 (const op_location_t 

  conv = cand->convs[1];
  if (conv->user_conv_p)
{
- while (conv->kind != ck_user)
-   conv = next_conversion (conv);
+ for (conversion *t = conv; t; t = next_conversion (t))
+   if (t->kind == ck_user)
+ {
+   conv = t;
+   break;
+ }
  arg2 = convert_like (conv, arg2, complain);
}
}
@@ -6391,8 +6399,12 @@ build_new_op_1 (const op_location_t 
  conv = cand->convs[2];
  if (conv->user_conv_p)
{
- while (conv->kind != ck_user)
-   conv = next_conversion (conv);
+ for (conversion *t = conv; t; t = next_conversion (t))
+   if (t->kind == ck_user)
+ {
+   conv = t;
+   break;
+ }
  arg3 = convert_like (conv, arg3, complain);
}
}
--- gcc/testsuite/g++.dg/conversion/ambig4.C.jj 2019-11-29 14:11:35.239183848 
+0100
+++ gcc/testsuite/g++.dg/conversion/ambig4.C2019-11-29 14:11:07.006613238 
+0100
@@ -0,0 +1,14 @@
+// PR c++/92705
+// { dg-do compile }
+
+struct A {};
+struct B {};
+struct C { operator B * (); }; // { dg-message "candidate" }
+struct D { operator B * (); }; // { dg-message "candidate" }
+struct E : C, D { operator A * (); };
+
+void
+foo (E e, int B::* pmf)
+{
+  int i = e->*pmf;  // { dg-error "is ambiguous" }
+}

Jakub

[patch] install the lto-dump man page

2019-12-02 Thread Matthias Klose

GCC 10 comes with a new lto-dump texi file, but the man page isn't built and
installed. Fix with the attached patch. Ok to install?

Matthias
	* Makefile.in (SOURCES): Add doc/lto-dump.1.
	(install-man): Add $(LTO_DUMP_INSTALL_NAME)$(man1ext).
	($(LTO_DUMP_INSTALL_NAME)$(man1ext): New.

Index: gcc/Makefile.in
===
--- gcc/Makefile.in	(revision 278870)
+++ gcc/Makefile.in	(working copy)
@@ -3202,7 +3202,8 @@
 	 gcov.texi trouble.texi bugreport.texi service.texi		\
 	 contribute.texi compat.texi funding.texi gnu.texi gpl_v3.texi	\
 	 fdl.texi contrib.texi cppenv.texi cppopts.texi avr-mmcu.texi	\
-	 implement-c.texi implement-cxx.texi gcov-tool.texi gcov-dump.texi lto-dump.texi
+	 implement-c.texi implement-cxx.texi gcov-tool.texi gcov-dump.texi \
+	 lto-dump.texi
 
 # we explicitly use $(srcdir)/doc/tm.texi here to avoid confusion with
 # the generated tm.texi; the latter might have a more recent timestamp,
@@ -3325,7 +3326,8 @@
 	$(SHELL) $(srcdir)/doc/install.texi2html
 
 MANFILES = doc/gcov.1 doc/cpp.1 doc/gcc.1 doc/gfdl.7 doc/gpl.7 \
-   doc/fsf-funding.7 doc/gcov-tool.1 doc/gcov-dump.1
+   doc/fsf-funding.7 doc/gcov-tool.1 doc/gcov-dump.1 \
+	   $(if $(filter yes,@enable_lto@),doc/lto-dump.1)
 
 generated-manpages: man
 
@@ -3722,6 +3724,7 @@
 	$(DESTDIR)$(man1dir)/$(GCOV_INSTALL_NAME)$(man1ext) \
 	$(DESTDIR)$(man1dir)/$(GCOV_TOOL_INSTALL_NAME)$(man1ext) \
 	$(DESTDIR)$(man1dir)/$(GCOV_DUMP_INSTALL_NAME)$(man1ext) \
+	$(if $(filter yes,@enable_lto@),$(DESTDIR)$(man1dir)/$(LTO_DUMP_INSTALL_NAME)$(man1ext)) \
 	$(DESTDIR)$(man7dir)/fsf-funding$(man7ext) \
 	$(DESTDIR)$(man7dir)/gfdl$(man7ext) \
 	$(DESTDIR)$(man7dir)/gpl$(man7ext)
@@ -3756,6 +3759,11 @@
 	-$(INSTALL_DATA) $< $@
 	-chmod a-x $@
 
+$(DESTDIR)$(man1dir)/$(LTO_DUMP_INSTALL_NAME)$(man1ext): doc/lto-dump.1 installdirs
+	-rm -f $@
+	-$(INSTALL_DATA) $< $@
+	-chmod a-x $@
+
 # Install all the header files built in the include subdirectory.
 install-headers: $(INSTALL_HEADERS_DIR)
 # Fix symlinks to absolute paths in the installed include directory to

Re: [mid-end] Add notes to dataflow insn info when re-emitting (PR92410)

2019-12-02 Thread Matthew Malcomson

Ping

On 12/11/2019 09:11, Matthew Malcomson wrote:
> In scheduling passes, notes are removed with `remove_notes` before the
> scheduling is done, and added back in with `reemit_notes` once the
> scheduling has been decided.
> 
> This process leaves the notes in the RTL chain with different insn uid's
> than were there before.  Having different UID's (larger than the
> previous ones) means that DF_INSN_INFO_GET(insn) will access outside of
> the allocated array.
> 
> This has been seen in the `regstat_bb_compute_calls_crossed` function.
> This patch adds an assert to the `regstat_bb_compute_calls_crossed`
> function so that bad accesses here are caught instead of going
> unnoticed, and then avoids the problem.
> 
> We avoid the problem by ensuring that new notes added by `reemit_notes` have 
> an
> insn record given to them.  This is done by adding a call to
> `df_insn_create_insn_record` on each note added in `reemit_notes`.
> `df_insn_create_insn_record` leaves this new record zeroed out, which appears
> to be fine for notes (e.g. `df_bb_refs_record` already does not set
> anything except the luid for notes, and notes have no dataflow information to
> record).
> 
> We add the testcase that Martin found here
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92410#c2 .
> This testcase fails with the "regstat.c" change, and then succeeds with the
> "haifa-sched.c" change.
> 
> There is a similar problem with labels, that the `gcc_assert` catches
> when running regression tests in gcc.dg/fold-eqandshift-1.c and
> gcc.c-torture/compile/pr32482.c.
> This is due to the `cfg_layout_finalize` call in `bb-reorder.c` emitting
> new labels for the start of the newly created basic blocks. These labels are
> not given a dataflow df_insn_info member.
> 
> We solve this by manually calling `df_recompute_luids` on each basic
> block once this pass has finished.
> 
> Testing done:
>Bootstrapped and regression test on aarch64-none-linux-gnu native.
> 
> gcc/ChangeLog:
> 
> 2019-11-12  Matthew Malcomson  
> 
>   PR middle-end/92410
>   * bb-reorder.c (pass_reorder_blocks::execute): Recompute
>   dataflow luids once basic blocks have been reordered.
>   * haifa-sched.c (reemit_notes): Create df insn record for each
>   new note.
>   * regstat.c (regstat_bb_compute_calls_crossed): Assert every
>   insn has an insn record before trying to use it.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-11-12  Matthew Malcomson  
> 
>   PR middle-end/92410
>   * gcc.dg/torture/pr92410.c: New test.
> 
> 
> 
> ### Attachment also inlined for ease of reply
> ###
> 
> 
> diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
> index 
> 0ac39140c6ce3db8499f99cd8f48321de61b..4943f97a2cfccc7ab7c4834fff2c81f62fc5b738
>  100644
> --- a/gcc/bb-reorder.c
> +++ b/gcc/bb-reorder.c
> @@ -2662,6 +2662,8 @@ pass_reorder_blocks::execute (function *fun)
> bb->aux = bb->next_bb;
> cfg_layout_finalize ();
>   
> +  FOR_EACH_BB_FN (bb, fun)
> +df_recompute_luids (bb);
> return 0;
>   }
>   
> diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
> index 
> 41cf1f362e8c34d009b0a310ff5b9a9ffb613631..2e1a84f1325d54ece1cf3c6f0bab607099c71d27
>  100644
> --- a/gcc/haifa-sched.c
> +++ b/gcc/haifa-sched.c
> @@ -5433,6 +5433,7 @@ reemit_notes (rtx_insn *insn)
>   
> last = emit_note_before (note_type, last);
> remove_note (insn, note);
> +   df_insn_create_insn_record (last);
>   }
>   }
>   }
> diff --git a/gcc/regstat.c b/gcc/regstat.c
> index 
> 4da9b7cc523cde113df931226ea56310b659e619..c6cefb117d7330cc1b9787c37083e05e2acda2d2
>  100644
> --- a/gcc/regstat.c
> +++ b/gcc/regstat.c
> @@ -324,6 +324,7 @@ regstat_bb_compute_calls_crossed (unsigned int bb_index, 
> bitmap live)
>   
> FOR_BB_INSNS_REVERSE (bb, insn)
>   {
> +  gcc_assert (INSN_UID (insn) < DF_INSN_SIZE ());
> struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
> unsigned int regno;
>   
> diff --git a/gcc/testsuite/gcc.dg/torture/pr92410.c 
> b/gcc/testsuite/gcc.dg/torture/pr92410.c
> new file mode 100644
> index 
> ..628e329782260ef36f26506bfbc0d36a93b33f41
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr92410.c
> @@ -0,0 +1,8 @@
> +/* PR middle-end/92410  */
> +/* { dg-do compile } */
> +int v;
> +
> +int a() {
> +  ;
> +  return v;
> +}
>

[committed][AArch64] Catch attempts to use SVE types when SVE is disabled

This patch reports an error if code tries to use variable-length
SVE types when SVE is disabled.  We already report a similar error
for definitions or uses of SVE functions when SVE is disabled.

Tested on aarch64-linux-gnu and applied as r278909.

Richard


2019-12-02  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_report_sve_required): New function.
(aarch64_expand_mov_immediate): Use it when attempting to measure
the length of an SVE vector.
(aarch64_mov_operand_p): Only allow SVE CNT immediates when
SVE is enabled.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/nosve_4.c: New test.
* gcc.target/aarch64/sve/acle/general/nosve_5.c: Likewise.
* gcc.target/aarch64/sve/pcs/nosve_4.c: Expected a second error
for the copy.
* gcc.target/aarch64/sve/pcs/nosve_5.c: Likewise.
* gcc.target/aarch64/sve/pcs/nosve_6.c: Likewise.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2019-11-30 18:48:18.491984384 +
+++ gcc/config/aarch64/aarch64.c2019-12-02 17:47:25.856701262 +
@@ -1473,6 +1473,25 @@ aarch64_err_no_fpadvsimd (machine_mode m
 " vector types", "+nofp");
 }
 
+/* Report when we try to do something that requires SVE when SVE is disabled.
+   This is an error of last resort and isn't very high-quality.  It usually
+   involves attempts to measure the vector length in some way.  */
+static void
+aarch64_report_sve_required (void)
+{
+  static bool reported_p = false;
+
+  /* Avoid reporting a slew of messages for a single oversight.  */
+  if (reported_p)
+return;
+
+  error ("this operation requires the SVE ISA extension");
+  inform (input_location, "you can enable SVE using the command-line"
+ " option %<-march%>, or by using the %"
+ " attribute or pragma");
+  reported_p = true;
+}
+
 /* Return true if REGNO is P0-P15 or one of the special FFR-related
registers.  */
 inline bool
@@ -4525,6 +4544,11 @@ aarch64_expand_mov_immediate (rtx dest,
 folding it into the relocation.  */
   if (!offset.is_constant (_offset))
{
+ if (!TARGET_SVE)
+   {
+ aarch64_report_sve_required ();
+ return;
+   }
  if (base == const0_rtx && aarch64_sve_cnt_immediate_p (offset))
emit_insn (gen_rtx_SET (dest, imm));
  else
@@ -16864,7 +16888,7 @@ aarch64_mov_operand_p (rtx x, machine_mo
   if (GET_CODE (x) == SYMBOL_REF && mode == DImode && CONSTANT_ADDRESS_P (x))
 return true;
 
-  if (aarch64_sve_cnt_immediate_p (x))
+  if (TARGET_SVE && aarch64_sve_cnt_immediate_p (x))
 return true;
 
   return aarch64_classify_symbolic_expression (x)
Index: gcc/testsuite/gcc.target/aarch64/sve/acle/general/nosve_4.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/acle/general/nosve_4.c 2019-12-02 
17:47:25.856701262 +
@@ -0,0 +1,8 @@
+/* { dg-options "-march=armv8-a" } */
+
+void
+f (__SVBool_t *x, __SVBool_t *y)
+{
+  *x = *y; /* { dg-error {this operation requires the SVE ISA extension} } */
+  *x = *y;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/acle/general/nosve_5.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/acle/general/nosve_5.c 2019-12-02 
17:47:25.856701262 +
@@ -0,0 +1,8 @@
+/* { dg-options "-march=armv8-a" } */
+
+void
+f (__SVInt8_t *x, __SVInt8_t *y)
+{
+  *x = *y; /* { dg-error {this operation requires the SVE ISA extension} } */
+  *x = *y;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/pcs/nosve_4.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/pcs/nosve_4.c  2019-10-29 
17:01:12.679889042 +
+++ gcc/testsuite/gcc.target/aarch64/sve/pcs/nosve_4.c  2019-12-02 
17:47:25.868701178 +
@@ -10,5 +10,6 @@ void take_svuint8 (svuint8_t);
 void
 f (svuint8_t *ptr)
 {
-  take_svuint8 (*ptr); /* { dg-error {'take_svuint8' requires the SVE ISA 
extension} } */
+  take_svuint8 (*ptr); /* { dg-error {this operation requires the SVE ISA 
extension} } */
+  /* { dg-error {'take_svuint8' requires the SVE ISA extension} "" { target 
*-*-* } .-1 } */
 }
Index: gcc/testsuite/gcc.target/aarch64/sve/pcs/nosve_5.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/pcs/nosve_5.c  2019-10-29 
17:01:12.679889042 +
+++ gcc/testsuite/gcc.target/aarch64/sve/pcs/nosve_5.c  2019-12-02 
17:47:25.868701178 +
@@ -11,5 +11,6 @@ void take_svuint8_eventually (float, flo
 void
 f (svuint8_t *ptr)
 {
-  take_svuint8_eventually (0, 0, 0, 0, 0, 0, 0, 0, *ptr); /* { dg-error 
{arguments of type '(svuint8_t|__SVUint8_t)' require the SVE ISA

[committed][AArch64] Run general SVE ACLE tests for C

Now that the C frontend can cope with POLY_INT_CST-length initialisers,
we can make aarch64-sve-acle.exp run the full set of tests.  This will
introduce new failures for -mabi=ilp32; I'll make the testsuite ILP32
clean separately.

Tested on aarch64-linux-gnu and applied as r278908.

Richard


2019-12-02  Richard Sandiford  

gcc/testsuite/
* gcc.target/aarch64/sve/acle/aarch64-sve-acle.exp: Run the
general/* tests too.

Index: gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle.exp
===
--- gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle.exp  
2019-10-29 08:59:18.431479432 +
+++ gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle.exp  
2019-12-02 17:44:20.257966719 +
@@ -45,9 +45,9 @@ if { [check_effective_target_aarch64_sve
 }
 
 # Main loop.
-# FIXME: This should include general/*.c too, but leave that until the
-# C frontend allows initialization of SVE vectors.
-set files [glob -nocomplain $srcdir/$subdir/general-c/*.c]
+set files [glob -nocomplain \
+  "$srcdir/$subdir/general/*.c" \
+  "$srcdir/$subdir/general-c/*.c"]
 dg-runtest [lsort $files] "$sve_flags" $DEFAULT_CFLAGS
 
 # All done.

[committed][AArch64] Add a couple of SVE ACLE comparison folds

When writing vector-length specific SVE code, it's useful to be able
to store an svbool_t predicate in a GNU vector of unsigned chars.
This patch makes sure that there is no overhead when converting
to that form and then immediately reading it back again.

Tested on aarch64-linux-gnu and applied as r278907.

(In case this seems strange for stage 3: the SVE ACLE is a new
feature for GCC 10 and so we'd like it to be as good as we can make it.
Nothing else will be affected.)

Richard


2019-12-02  Richard Sandiford  

gcc/
* config/aarch64/aarch64-sve-builtins.h
(gimple_folder::force_vector): Declare.
* config/aarch64/aarch64-sve-builtins.cc
(gimple_folder::force_vector): New function.
* config/aarch64/aarch64-sve-builtins-base.cc
(svcmp_impl::fold): Likewise.
(svdup_impl::fold): Handle svdup_z too.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/eqne_dup_1.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_f16.c (dup_0_f16_z): Expect
the call to be folded to zero.
* gcc.target/aarch64/sve/acle/asm/dup_f32.c (dup_0_f32_z): Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_f64.c (dup_0_f64_z): Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_s8.c (dup_0_s8_z): Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_s16.c (dup_0_s16_z): Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_s32.c (dup_0_s32_z): Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_s64.c (dup_0_s64_z): Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u8.c (dup_0_u8_z): Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u16.c (dup_0_u16_z): Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u32.c (dup_0_u32_z): Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u64.c (dup_0_u64_z): Likewise.

Index: gcc/config/aarch64/aarch64-sve-builtins.h
===
--- gcc/config/aarch64/aarch64-sve-builtins.h   2019-10-29 08:59:18.415479546 
+
+++ gcc/config/aarch64/aarch64-sve-builtins.h   2019-12-02 17:38:33.888329008 
+
@@ -488,6 +488,7 @@ class GTY((user)) function_instance
   gimple_folder (const function_instance &, tree,
 gimple_stmt_iterator *, gcall *);
 
+  tree force_vector (gimple_seq &, tree, tree);
   tree convert_pred (gimple_seq &, tree, unsigned int);
   tree fold_contiguous_base (gimple_seq &, tree);
   tree load_store_cookie (tree);
Index: gcc/config/aarch64/aarch64-sve-builtins.cc
===
--- gcc/config/aarch64/aarch64-sve-builtins.cc  2019-11-30 18:48:18.467984552 
+
+++ gcc/config/aarch64/aarch64-sve-builtins.cc  2019-12-02 17:38:33.888329008 
+
@@ -2234,6 +2234,17 @@ gimple_folder::gimple_folder (const func
 {
 }
 
+/* VALUE might be a vector of type VECTYPE or a single scalar element.
+   Duplicate it into a vector of type VECTYPE in the latter case, adding any
+   new statements to STMTS.  */
+tree
+gimple_folder::force_vector (gimple_seq , tree vectype, tree value)
+{
+  if (!VECTOR_TYPE_P (TREE_TYPE (value)))
+value = gimple_build_vector_from_val (, vectype, value);
+  return value;
+}
+
 /* Convert predicate argument ARGNO so that it has the type appropriate for
an operation on VECTYPE.  Add any new statements to STMTS.  */
 tree
Index: gcc/config/aarch64/aarch64-sve-builtins-base.cc
===
--- gcc/config/aarch64/aarch64-sve-builtins-base.cc 2019-11-16 
11:26:06.891163135 +
+++ gcc/config/aarch64/aarch64-sve-builtins-base.cc 2019-12-02 
17:38:33.888329008 +
@@ -333,6 +333,28 @@ public:
   CONSTEXPR svcmp_impl (tree_code code, int unspec_for_fp)
 : m_code (code), m_unspec_for_fp (unspec_for_fp) {}
 
+  gimple *
+  fold (gimple_folder ) const OVERRIDE
+  {
+tree pg = gimple_call_arg (f.call, 0);
+tree rhs1 = gimple_call_arg (f.call, 1);
+tree rhs2 = gimple_call_arg (f.call, 2);
+
+/* Convert a ptrue-predicated integer comparison into the corresponding
+   gimple-level operation.  */
+if (integer_all_onesp (pg)
+   && f.type_suffix (0).element_bytes == 1
+   && f.type_suffix (0).integer_p)
+  {
+   gimple_seq stmts = NULL;
+   rhs2 = f.force_vector (stmts, TREE_TYPE (rhs1), rhs2);
+   gsi_insert_seq_before (f.gsi, stmts, GSI_SAME_STMT);
+   return gimple_build_assign (f.lhs, m_code, rhs1, rhs2);
+  }
+
+return NULL;
+  }
+
   rtx
   expand (function_expander ) const OVERRIDE
   {
@@ -700,6 +722,17 @@ public:
  return gimple_build_assign (f.lhs, VEC_DUPLICATE_EXPR, rhs);
   }
 
+/* svdup_z (pg, x) == VEC_COND_EXPR , 0>.  */
+if (f.pred == PRED_z)
+  {
+   gimple_seq stmts = NULL;
+   tree pred = f.convert_pred (stmts, vec_type, 0);
+   rhs = f.force_vector (stmts, vec_type, rhs);
+   gsi_insert_seq_before (f.gsi, stmts, GSI_SAME_STMT);
+   return

[Committed][Arm][testsuite] Fix failure for arm-fp16-ops-*.C

2019-12-02 Thread Sudakshina Das

Hi

Since r275022 which deprecates some uses of volatile, we have seen the 
following failures on arm-none-eabi and arm-none-linux-gnueabihf
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-1.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-2.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-3.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-4.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-5.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-6.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-7.C  -std=gnu++2a (test for 
excess errors)
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-8.C  -std=gnu++2a (test for 
excess errors)
Which catches the deprecated uses of volatile variables declared in 
arm-fp16-ops.h.

This patch removes the volatile declarations from the header. Since none 
of the tests are run with any high optimization levels, this should 
change should not prevent the real function of the tests.

Tests with RUNTESTFLAGS="dg.exp=arm-fp16-ops-*.C" now pass with the 
patch on arm-none-eabi.
Committed as obvious r278905

gcc/testsuite/ChangeLog:

2019-xx-xx  Sudakshina Das  

* g++.dg/ext/arm-fp16/arm-fp16-ops.h: Remove volatile keyword.

Thanks
Sudi
diff --git a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h
index 320494e..a92e081 100644
--- a/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h
+++ b/gcc/testsuite/g++.dg/ext/arm-fp16/arm-fp16-ops.h
@@ -7,16 +7,16 @@
 #define TEST(e) assert (e)
 #define TESTNOT(e) assert (!(e))
 
-volatile __fp16 h0 = 0.0;
-volatile __fp16 h1 = 1.0;
-volatile __fp16 h42 = 42.0;
-volatile __fp16 hm2 = -2.0;
-volatile __fp16 temp;
-
-volatile float f0 = 0.0;
-volatile float f1 = 1.0;
-volatile float f42 = 42.0;
-volatile float fm2 = -2.0;
+__fp16 h0 = 0.0;
+__fp16 h1 = 1.0;
+__fp16 h42 = 42.0;
+__fp16 hm2 = -2.0;
+__fp16 temp;
+
+float f0 = 0.0;
+float f1 = 1.0;
+float f42 = 42.0;
+float fm2 = -2.0;
 
 int main (void)
 {

Re: Tighten check for vector types in fold_convertible_p (PR 92741)

On December 2, 2019 5:40:41 PM GMT+01:00, Richard Sandiford 
 wrote:
>Richard Biener  writes:
>> On December 2, 2019 5:27:05 PM GMT+01:00, Richard Sandiford
> wrote:
>>>In this PR, IPA-CP was misled into using NOP_EXPR rather than
>>>VIEW_CONVERT_EXPR to reinterpret a vector of 4 shorts as a vector
>>>of 2 ints.  This tripped the tree-cfg.c assert I'd added in r278245.
>>>
>>>Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>
>> Hmm, but then it may create such conversion without verifying the
>target supports it? 
>
>This doesn't seem like the right place to check that though.

Agreed. 

>E.g. tree-vect-generic.c could lower conversions if necessary, just
>like for any other vector op.  AIUI fold_convertible_p is just checking
>whether the conversion satisfies the tree/gimple semantics of NOP_EXPR.

Hmm, OK. 

>Note that in the IPA-CP case, we should never generate a vector
>NOP_EXPR
>that isn't also tree_nop_conversion_p, see comment 4 in the PR.  It's
>possible we might want to rewrite the IPA-CP (and other IPA conditions)
>to be more explicit about this, but I think we want the patch either
>way.
>All it does is disallow cases that were previously (wrongly) allowed.

So let's go with the patch as is then. 

Thanks, 
Richard. 

>Thanks,
>Richard
>
>>
>> Richard. 
>>
>>>Richard
>>>
>>>
>>>2019-12-02  Richard Sandiford  
>>>
>>>gcc/
>>> PR middle-end/92741
>>> * fold-const.c (fold_convertible_p): Check vector types more
>>> thoroughly.
>>>
>>>gcc/testsuite/
>>> PR middle-end/92741
>>> * gcc.dg/pr92741.c: New test.
>>>
>>>Index: gcc/fold-const.c
>>>===
>>>--- gcc/fold-const.c 2019-11-18 15:27:36.053367794 +
>>>+++ gcc/fold-const.c 2019-12-02 16:25:02.662405351 +
>>>@@ -2375,10 +2375,15 @@ fold_convertible_p (const_tree type, con
>>> 
>>> case REAL_TYPE:
>>> case FIXED_POINT_TYPE:
>>>-case VECTOR_TYPE:
>>> case VOID_TYPE:
>>>   return TREE_CODE (type) == TREE_CODE (orig);
>>> 
>>>+case VECTOR_TYPE:
>>>+  return (VECTOR_TYPE_P (orig)
>>>+  && known_eq (TYPE_VECTOR_SUBPARTS (type),
>>>+   TYPE_VECTOR_SUBPARTS (orig))
>>>+  && fold_convertible_p (TREE_TYPE (type), TREE_TYPE (orig)));
>>>+
>>> default:
>>>   return false;
>>> }
>>>Index: gcc/testsuite/gcc.dg/pr92741.c
>>>===
>>>--- /dev/null2019-09-17 11:41:18.176664108 +0100
>>>+++ gcc/testsuite/gcc.dg/pr92741.c   2019-12-02 16:25:02.662405351
>+
>>>@@ -0,0 +1,19 @@
>>>+/* { dg-options "-O2 -fexceptions -fnon-call-exceptions -fno-inline"
>}
>>>*/
>>>+
>>>+typedef int vh __attribute__ ((__vector_size__ (2 * sizeof (int;
>>>+typedef short int cq __attribute__ ((__vector_size__ (4 * sizeof
>>>(short int;
>>>+
>>>+static void
>>>+id (int *r8, vh *tu)
>>>+{
>>>+  *(vh *) r8 = *tu;
>>>+}
>>>+
>>>+void
>>>+mr (void)
>>>+{
>>>+  int r8;
>>>+  cq he = { 0, };
>>>+
>>>+  id (, (vh *) );
>>>+}

Re: [C++ PATCH] Fix nsdmi handling for bitfields (PR c++/92732)


On 11/29/19 6:35 PM, Jakub Jelinek wrote:

Hi!

As the second testcase shows, we shouldn't be calling convert_for_*
with TREE_TYPE (decl) for bitfields, we need DECL_BIT_FIELD_TYPE
in that case instead (unlowered_expr_type doesn't work here,
as that wants a COMPONENT_REF which we don't have).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-29  Jakub Jelinek  

PR c++/92732
* typeck2.c (digest_nsdmi_init): For bitfields, use
DECL_BIT_FIELD_TYPE instead of TREE_TYPE.

* g++.dg/cpp2a/bitfield3.C: Don't expect narrowing conversion
warnings.
* g++.dg/cpp2a/bitfield4.C: New test.

--- gcc/cp/typeck2.c.jj 2019-11-25 22:44:24.911346688 +0100
+++ gcc/cp/typeck2.c2019-11-29 21:48:58.825713516 +0100
@@ -1335,6 +1335,9 @@ digest_nsdmi_init (tree decl, tree init,
gcc_assert (TREE_CODE (decl) == FIELD_DECL);
  
tree type = TREE_TYPE (decl);

+  if (DECL_BIT_FIELD_TYPE (decl))
+type = cp_build_qualified_type (DECL_BIT_FIELD_TYPE (decl),
+   cp_type_quals (type));


I don't think we need to add qualifiers to a scalar initializer, they'll 
just get discarded again.  OK with that change.


Jason

Re: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end

2019-12-02 Thread Jeff Law

On 11/15/19 5:02 AM, Stam Markianos-Wright wrote:
> Hi all,
> 
> This patch adds support for a new real_format for ARM Brain Floating 
> Point numbers to the middle end. This is to be used exclusively in the 
> ARM back-end.
> 
> The encode_arm_bfloat_half and decode_arm_bfloat_half functions are 
> provided to satisfy real_format struct requirements, but are never 
> intended to be called, which is why they are provided without an 
> explicit test.
> 
> Details on ARM Bfloat can be found here: 
> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a
> 
> Regtested on aarch64-none-elf for sanity.
> 
> Is this ok for trunk?
> 
> Also, I do not have commit rights, so could someone commit this on my 
> behalf?
> 
> Thank you!
> Stam Markianos-Wright
> 
> 
> 2019-11-13  Stam Markianos-Wright  
> 
>* real.c (struct arm_bfloat_half_format,
>encode_arm_bfloat_half, decode_arm_bfloat_half): New.
>* real.h (arm_bfloat_half_format): New.
> 
> 
Generally OK.  Please consider using "arm_bfloat_half" instead of
"bfloat_half" for the name field in the arm_bfloat_half_format
structure.  I'm not sure if that's really visible externally, but it
seems to me we want to be conservative and not imply any of these
half-float formats are the "one true half-float format".

jeff

Re: [C++ PATCH] c++/91353 - P1331R2: Allow trivial default init in constexpr contexts.


On 12/1/19 8:09 PM, Marek Polacek wrote:

On Thu, Nov 28, 2019 at 11:29:20PM -0500, Jason Merrill wrote:

Sounds like reduced_constant_expression_p needs to deal better with empty
bases.


This got a bit complicated because it also needs to handle unions and
now we also need to heed vptr.  But the following seems to work.

(I'll skip the story about a bogus error I hit when building cmcstl2 and
the need to debug a ~90,000 LOC test, because creduce got stuck reducing
it.)

Bootstrapped/regtested on x86_64-linux, built Boost and cmcstl2.

Ok?

2019-12-01  Marek Polacek  
Jakub Jelinek  

PR c++/91353 - P1331R2: Allow trivial default init in constexpr 
contexts.
* c-cppbuiltin.c (c_cpp_builtins): Adjust the value of __cpp_constexpr.

* class.c (trivial_default_constructor_is_constexpr): Return true in
C++20.
* constexpr.c (cx_check_missing_mem_inits): Allow missing field
initializers in C++20.
(cxx_eval_call_expression): Don't clear CONSTRUCTOR_NO_CLEARING for
constexpr constructors in C++20.
(reduced_constant_expression_p): Don't set FIELD for union and array
types.  Adjust calls to next_initializable_field.
* cp-tree.h (next_initializable_field): Adjust declaration.
* decl.c (check_for_uninitialized_const_var): Permit trivial default
initialization in constexpr.
(next_initializable_field): Add a bool parameter.  Use it.
* method.c (walk_field_subobs): Still consider a constructor that
doesn't initialize all the members constexpr.

* g++.dg/cpp0x/constexpr-array6.C: Adjust dg-error.
* g++.dg/cpp0x/constexpr-ctor.C: Likewise.
* g++.dg/cpp0x/constexpr-diag3.C: Likewise.
* g++.dg/cpp0x/constexpr-diag4.C: Likewise.
* g++.dg/cpp0x/constexpr-ex3.C: Likewise.
* g++.dg/cpp0x/constexpr-template2.C: Likewise.
* g++.dg/cpp0x/constexpr-union2.C: Likewise.
* g++.dg/cpp0x/lambda/lambda-mangle.C: Rip out a piece of code ...
* g++.dg/cpp0x/lambda/lambda-mangle6.C: ... and put it here.
* g++.dg/cpp0x/pr79118.C: Adjust dg-error.
* g++.dg/cpp1y/constexpr-83921-3.C: Likewise.
* g++.dg/cpp1y/constexpr-neg1.C: Likewise.
* g++.dg/cpp1z/constexpr-lambda12.C: Likewise.
* g++.dg/cpp1z/feat-cxx1z.C: Use -std=c++17.
* g++.dg/cpp2a/constexpr-init1.C: New test.
* g++.dg/cpp2a/constexpr-init2.C: New test.
* g++.dg/cpp2a/constexpr-init3.C: New test.
* g++.dg/cpp2a/constexpr-init4.C: New test.
* g++.dg/cpp2a/constexpr-init5.C: New test.
* g++.dg/cpp2a/constexpr-init6.C: New test.
* g++.dg/cpp2a/constexpr-init7.C: New test.
* g++.dg/cpp2a/constexpr-init8.C: New test.
* g++.dg/cpp2a/constexpr-init9.C: New test.
* g++.dg/cpp2a/constexpr-init10.C: New test.
* g++.dg/cpp2a/constexpr-init11.C: New test.
* g++.dg/cpp2a/constexpr-init12.C: New test.
* g++.dg/cpp2a/constexpr-try5.C: Adjust dg-error.
* g++.dg/cpp2a/feat-cxx2a.C: Test __cpp_constexpr.
* g++.dg/cpp2a/lambda-mangle.C: New test.
* g++.dg/debug/dwarf2/pr44641.C: Skip for c++2a.
* g++.dg/ext/stmtexpr21.C: Adjust dg-error.

diff --git gcc/c-family/c-cppbuiltin.c gcc/c-family/c-cppbuiltin.c
index 6491545bc3b..680087cd254 100644
--- gcc/c-family/c-cppbuiltin.c
+++ gcc/c-family/c-cppbuiltin.c
@@ -975,7 +975,8 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_fold_expressions=201603L");
  cpp_define (pfile, "__cpp_nontype_template_args=201411L");
  cpp_define (pfile, "__cpp_range_based_for=201603L");
- cpp_define (pfile, "__cpp_constexpr=201603L");
+ if (cxx_dialect <= cxx17)
+   cpp_define (pfile, "__cpp_constexpr=201603L");
  cpp_define (pfile, "__cpp_if_constexpr=201606L");
  cpp_define (pfile, "__cpp_capture_star_this=201603L");
  cpp_define (pfile, "__cpp_inline_variables=201606L");
@@ -997,6 +998,7 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_init_captures=201803L");
  cpp_define (pfile, "__cpp_generic_lambdas=201707L");
  cpp_define (pfile, "__cpp_designated_initializers=201707L");
+ cpp_define (pfile, "__cpp_constexpr=201907L");
  cpp_define (pfile, "__cpp_constexpr_in_decltype=201711L");
  cpp_define (pfile, "__cpp_conditional_explicit=201806L");
  cpp_define (pfile, "__cpp_consteval=201811L");
diff --git gcc/cp/class.c gcc/cp/class.c
index f36f75fa0db..d8bb44990b7 100644
--- gcc/cp/class.c
+++ gcc/cp/class.c
@@ -5288,8 +5288,14 @@ trivial_default_constructor_is_constexpr (tree t)
/* A defaulted trivial default constructor is constexpr
   if there is nothing to initialize.  */
gcc_assert (!TYPE_HAS_COMPLEX_DFLT (t));
-  /* A class with a vptr doesn't have a trivial default ctor.  */
-  return is_really_empty_class (t,

Re: [PATCH] extend -Wstringop-overflow to allocated objects (PR 91582)

2019-12-02 Thread Jeff Law

On 11/8/19 3:11 PM, Martin Sebor wrote:
> Unless it's used with _FORTIFY_SOURCE, -Wstringop-overflow
> doesn't consider out-of-bounds accesses to objects allocated
> by alloca, malloc, other functions declared with attribute
> alloc_size, or even VLAs with variable bounds.  This was
> a known limitation of the checks (done just before expansion)
> relying on the the object size pass when they were introduced
> in GCC 7.
> 
> But since its introduction in GCC 7, the warning has evolved
> beyond some of the limitations of the object size pass.  Unlike
> it, the warning considers non-constant offsets and stores with
> non-constant sizes.  Attached is a simple enhancement that
> (finally) adds the ability to also detect overflow in allocated
> objects to the warning.
> 
> With the patch GCC detects the overflow in code like this:
> 
>   char* f (void)
>   {
>     char s[] = "12345";
>     char *p = malloc (strlen (s));
>     strcpy (p, s);   // warning here
>     return p;
>   }
> 
> but not (yet) in something like this:
> 
>   char* g (const char *s)
>   {
>     char *p = malloc (strlen (s));
>     strcpy (p, s);   // no warning (yet)
>     return p;
>   }
> 
> and quite a few other examples.  Doing better requires extending
> the strlen pass.  I'm working on this extension and expect to
> submit a patch before stage 1 ends.
> 
> Martin
> 
> PS I was originally planning to do all the allocation checking
> in the strlen pass but it occurred to me that by also enhancing
> the compute_objsize function, all warnings that use it will
> benefit.  Besides -Wstringop-overflow this includes a subset
> of -Warray-bounds, -Wformat-overflow, and -Wrestrict.  It's
> nice when a small enhancement has such a broad positive effect.

> PR middle-end/91582 - missing heap overflow detection for strcpy
> 
> gcc/ChangeLog:
> 
> * builtins.c (gimple_call_alloc_size): New function.
> (compute_objsize): Add argument.  Call gimple_call_alloc_size.
> Handle variable offsets and indices.
> * builtins.h (gimple_call_alloc_size): Declare.
> (compute_objsize): Add argument.
> * tree-ssa-strlen.c (handle_store): Handle calls to allocated objects.
> 
> gcc/testsuite/ChangeLog:
> 
> * c-c++-common/Wstringop-truncation.c: Remove xfails.
> * gcc/testsuite/g++.dg/ext/attr-alloc_size.C: Suppress -Warray-bounds.
> * gcc.dg/Wstringop-overflow-22.c: New test.
> * gcc/testsuite/gcc.dg/attr-alloc_size.c: Suppress -Warray-bounds.
> * gcc/testsuite/gcc.dg/attr-copy-2.c: Same.
> * gcc.dg/builtin-stringop-chk-5.c: Remove xfails.
> * gcc.dg/builtin-stringop-chk-8.c: Same.  Correct the text of expected
> warnings.
> * gcc.target/i386/pr82002-2a.c: Prune expected warning.
> * gcc.target/i386/pr82002-2b.c: Same.
[ ... ]


> Index: gcc/builtins.c
> ===
> --- gcc/builtins.c  (revision 277978)
> +++ gcc/builtins.c  (working copy)
> @@ -3563,6 +3563,80 @@ check_access (tree exp, tree, tree, tree dstwrite,
>return true;
>  }
> 
> +/* If STMT is a call to an allocation function, returns the size
> +   of the object allocated by the call.  */
> +
> +tree
> +gimple_call_alloc_size (gimple *stmt)
> +{
> +  tree size = gimple_call_arg (stmt, argidx1);
> +  tree n = argidx2 < nargs ? gimple_call_arg (stmt, argidx2) : 
> integer_one_node;
> +
> +  /* To handle ranges do the math in wide_int and return the product
> + of the upper bounds as a constant.  Ignore anti-ranges.  */
> +  wide_int rng1[2];
> +  if (TREE_CODE (size) == INTEGER_CST)
> +rng1[0] = rng1[1] = wi::to_wide (size);
> +  else if (TREE_CODE (size) != SSA_NAME
> +  || get_range_info (size, rng1, rng1 + 1) != VR_RANGE)
> +return NULL_TREE;
> +
> +  wide_int rng2[2];
> +  if (TREE_CODE (n) == INTEGER_CST)
> +rng2[0] = rng2[1] = wi::to_wide (n);
> +  else if (TREE_CODE (n) != SSA_NAME
> +  || get_range_info (n, rng2 + 1, rng2 + 1) != VR_RANGE)
> +return NULL_TREE;
Should that 2nd call to get_range_info be "get_range_info (n, rng2, rng2
+ 1)?  I don't think it makes any difference in practice due to the
implementation of get_range_info, but if it wasn't intentional let's get
it fixed.


> Index: gcc/builtins.h
> ===
> --- gcc/builtins.h  (revision 277978)
> +++ gcc/builtins.h  (working copy)
> @@ -134,7 +134,8 @@ extern tree fold_call_stmt (gcall *, bool);
>  extern void set_builtin_user_assembler_name (tree decl, const char *asmspec);
>  extern bool is_simple_builtin (tree);
>  extern bool is_inexpensive_builtin (tree);
> -extern tree compute_objsize (tree, int, tree * = NULL);
> +tree gimple_call_alloc_size (gimple *);
> +extern tree compute_objsize (tree, int, tree * = NULL, tree * = NULL);
> 
>  extern bool readonly_data_expr (tree exp);
>  extern bool init_target_chars (void);
Is there a reason

Re: [PR92726] OpenACC: 'NULL'-in -> no-op, and/or 'NULL'-out


Hi Thomas,

On 11/29/19 3:40 PM, Thomas Schwinge wrote:

"[Patch][OpenMP/OpenACC/Fortran] Fix mapping of optional (present|absent)
arguments" reminded me that still this behavioral change has not been
split out, cited below, that you described as "trivial".

I've just filed  "OpenACC: 'NULL'-in ->
no-op, and/or 'NULL'-out", so please reference that one in the ChangeLog
updates.

So, eventually that'll be more than just
'libgomp/oacc-mem.c:update_dev_host', but I understand we need that one
now, for Fortran optional arguments support.  Any other changes can then
be handled later (once the OpenACC specification changes have been
completed).


For the changes at hand, they are called as library for 
update_device(_async) and update_self(_async). And via the 'update' 
directives via the 'maps' alway_pointer, (force_)to and (force_)from. 
'insert_pointer' is called via 'GOACC_enter_exit_data for 'enter' if the 
pointer can already be found; hence, I think the latter is untestable.


Side note: the [update_(device|self)]_async variants are mentioned  in 
the libgomp.texi but not really documented; they only appear at the end 
of "OpenACC Profiling Interface" – last block at: 
https://gcc.gnu.org/onlinedocs/libgomp/OpenACC-Profiling-Interface.html


I have the feeling the _async variants should be properly documented in 
the manual; especially, the permitted values for the third argument.



Please also add a new test case 'libgomp.oacc-c-c++-common/null-1.c' with
a "Test 'NULL'-in -> no-op, and/or 'NULL'-out" header, executing things […]


How about the attached patch?

Tobias

2019-12-02  Tobias Burnus  
	Kwok Cheung Yeung 

	libgomp/
	* oacc-mem.c (update_dev_host, gomp_acc_insert_pointer): Just return
	if input it a NULL pointer.
	* testsuite/libgomp.oacc-c-c++-common/lib-43.c: Remove; dependent on
	diagnostic of NULL pointer.
	* testsuite/libgomp.oacc-c-c++-common/lib-47.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/update-2.c: New; check whether
	NULL is handled with update.
	* testsuite/libgomp.oacc-fortran/update-2.f90: Ditto.


 libgomp/libgomp.texi   |  4 ++
 libgomp/oacc-mem.c |  9 
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-43.c   | 51 --
 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-47.c   | 49 -
 libgomp/testsuite/libgomp.oacc-c-c++-common/update-2.c | 33 ++
 libgomp/testsuite/libgomp.oacc-fortran/update-2.f90| 40 +
 6 files changed, 86 insertions(+), 100 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 6db895f6272..4532e366f4b 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -2536,6 +2536,8 @@ This function updates the device copy from the previously mapped host memory.
 The host memory is specified with the host address @var{a} and a length of
 @var{len} bytes.
 
+If @var{a} is the @code{NULL} pointer, this is a no-op.
+
 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
 a contiguous array section. The second form @var{a} specifies a variable or
 array element and @var{len} specifies the length in bytes.
@@ -2569,6 +2571,8 @@ This function updates the host copy from the previously mapped device memory.
 The host memory is specified with the host address @var{a} and a length of
 @var{len} bytes.
 
+If @var{a} is the @code{NULL} pointer, this is a no-op.
+
 In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
 a contiguous array section. The second form @var{a} specifies a variable or
 array element and @var{len} specifies the length in bytes.
diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
index aafe88d3a14..55c195bd819 100644
--- a/libgomp/oacc-mem.c
+++ b/libgomp/oacc-mem.c
@@ -829,6 +829,12 @@ update_dev_host (int is_dev, void *h, size_t s, int async)
   if (acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
 return;
 
+  /* Fortran optional arguments that are non-present result in a
+ NULL host address here.  This can safely be ignored as it is
+ not possible to 'update' a non-present optional argument.  */
+  if (h == NULL)
+return;
+
   acc_prof_info prof_info;
   acc_api_info api_info;
   bool profiling_p = GOACC_PROFILING_SETUP_P (thr, _info, _info);
@@ -899,6 +905,9 @@ gomp_acc_insert_pointer (size_t mapnum, void **hostaddrs, size_t *sizes,
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
+  if (*hostaddrs == NULL)
+return;
+
   if (acc_is_present (*hostaddrs, *sizes))
 {
   splay_tree_key n;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-43.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-43.c
deleted file mode 100644
index 5db29124e9e..000
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-43.c
+++ /dev/null
@@ -1,51 +0,0 @@
-/* Exercise acc_update_device with a NULL data address on nvidia

[PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2019-12-02 Thread Stam Markianos-Wright



On 11/15/19 5:26 PM, Stam Markianos-Wright wrote:
> Pinging with more correct maintainers this time :)
> 
> Also would need to backport to gcc7,8,9, but need to get this approved 
> first!
> 
> Thank you,
> Stam
> 
> 
>  Forwarded Message 
> Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional 
> branches in Thumb2 (PR91816)
> Date: Mon, 21 Oct 2019 10:37:09 +0100
> From: Stam Markianos-Wright 
> To: Ramana Radhakrishnan 
> CC: gcc-patches@gcc.gnu.org , nd , 
> James Greenhalgh , Richard Earnshaw 
> 
> 
> 
> 
> On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote:
>>>
>>> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf,
>>> however, on my native Aarch32 setup the test times out when run as part
>>> of a big "make check-gcc" regression, but not when run individually.
>>>
>>> 2019-10-11  Stamatis Markianos-Wright 
>>>
>>> * config/arm/arm.md: Update b for Thumb2 range checks.
>>> * config/arm/arm.c: New function arm_gen_far_branch.
>>>    * config/arm/arm-protos.h: New function arm_gen_far_branch
>>> prototype.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2019-10-11  Stamatis Markianos-Wright 
>>>
>>>    * testsuite/gcc.target/arm/pr91816.c: New test.
>>
>>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>>> index f995974f9bb..1dce333d1c3 100644
>>> --- a/gcc/config/arm/arm-protos.h
>>> +++ b/gcc/config/arm/arm-protos.h
>>> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const 
>>> cpu_arch_option *,
>>>   void arm_initialize_isa (sbitmap, const enum isa_feature *);
>>> +const char * arm_gen_far_branch (rtx *, int,const char * , const 
>>> char *);
>>> +
>>> +
>>
>> Lets get the nits out of the way.
>>
>> Unnecessary extra new line, need a space between int and const above.
>>
>>
> 
> .Fixed!
> 
>>>   #endif /* ! GCC_ARM_PROTOS_H */
>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>>> index 39e1a1ef9a2..1a693d2ddca 100644
>>> --- a/gcc/config/arm/arm.c
>>> +++ b/gcc/config/arm/arm.c
>>> @@ -32139,6 +32139,31 @@ arm_run_selftests (void)
>>>   }
>>>   } /* Namespace selftest.  */
>>> +
>>> +/* Generate code to enable conditional branches in functions over 1 
>>> MiB.  */
>>> +const char *
>>> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
>>> +    const char * branch_format)
>>
>> Not sure if this is some munging from the attachment but check
>> vertical alignment of parameters.
>>
> 
> .Fixed!
> 
>>> +{
>>> +  rtx_code_label * tmp_label = gen_label_rtx ();
>>> +  char label_buf[256];
>>> +  char buffer[128];
>>> +  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
>>> +    CODE_LABEL_NUMBER (tmp_label));
>>> +  const char *label_ptr = arm_strip_name_encoding (label_buf);
>>> +  rtx dest_label = operands[pos_label];
>>> +  operands[pos_label] = tmp_label;
>>> +
>>> +  snprintf (buffer, sizeof (buffer), "%s%s", branch_format , 
>>> label_ptr);
>>> +  output_asm_insn (buffer, operands);
>>> +
>>> +  snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, 
>>> label_ptr);
>>> +  operands[pos_label] = dest_label;
>>> +  output_asm_insn (buffer, operands);
>>> +  return "";
>>> +}
>>> +
>>> +
>>
>> Unnecessary extra newline.
>>
> 
> .Fixed!
> 
>>>   #undef TARGET_RUN_TARGET_SELFTESTS
>>>   #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
>>>   #endif /* CHECKING_P */
>>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>>> index f861c72ccfc..634fd0a59da 100644
>>> --- a/gcc/config/arm/arm.md
>>> +++ b/gcc/config/arm/arm.md
>>> @@ -6686,9 +6686,16 @@
>>>   ;; And for backward branches we have
>>>   ;;   (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or 
>>> -4) + 4).
>>>   ;;
>>> +;; In 16-bit Thumb these ranges are:
>>>   ;; For a 'b'   pos_range = 2046, neg_range = -2048 giving 
>>> (-2040->2048).
>>>   ;; For a 'b' pos_range = 254,  neg_range = -256  giving (-250 
>>> ->256).
>>> +;; In 32-bit Thumb these ranges are:
>>> +;; For a 'b'   +/- 16MB is not checked for.
>>> +;; For a 'b' pos_range = 1048574,  neg_range = -1048576  giving
>>> +;; (-1048568 -> 1048576).
>>> +
>>> +
>>
>> Unnecessary extra newline.
>>
> 
> .Fixed!
> 
>>>   (define_expand "cbranchsi4"
>>>     [(set (pc) (if_then_else
>>>     (match_operator 0 "expandable_comparison_operator"
>>> @@ -6947,22 +6954,42 @@
>>>     (pc)))]
>>>     "TARGET_32BIT"
>>>     "*
>>> -  if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
>>> -    {
>>> -  arm_ccfsm_state += 2;
>>> -  return \"\";
>>> -    }
>>> -  return \"b%d1\\t%l0\";
>>> + if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
>>> +  {
>>> +    arm_ccfsm_state += 2;
>>> +    return \"\";
>>> +  }
>>> + switch (get_attr_length (insn))
>>> +  {
>>> +    // Thumb2 16-bit b{cond}
>>> +    case 2:
>>> +
>>> +    // Thumb2 32-bit b{cond}
>>> +    case 4: return \"b%d1\\t%l0\";break;
>>> +
>>> +    // Thumb2 b{cond} out of range.  Use unconditional branch.
>>>

Ping: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end

2019-12-02 Thread Stam Markianos-Wright



On 11/25/19 2:54 PM, Stam Markianos-Wright wrote:
> 
> On 11/15/19 12:02 PM, Stam Markianos-Wright wrote:
>> Hi all,
>>
>> This patch adds support for a new real_format for ARM Brain Floating 
>> Point numbers to the middle end. This is to be used exclusively in the 
>> ARM back-end.
>>
>> The encode_arm_bfloat_half and decode_arm_bfloat_half functions are 
>> provided to satisfy real_format struct requirements, but are never 
>> intended to be called, which is why they are provided without an 
>> explicit test.
>>
>> Details on ARM Bfloat can be found here: 
>> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a
>>  
>>
>>
>> Regtested on aarch64-none-elf for sanity.
>>
>> Is this ok for trunk?
> 
> Ping.

>>
>> Also, I do not have commit rights, so could someone commit this on my 
>> behalf?
> 
> Ping.

> 
> Thank you :)
> 
>>
>> Thank you!
>> Stam Markianos-Wright
>>
>>
>> 2019-11-13  Stam Markianos-Wright  
>>
>>    * real.c (struct arm_bfloat_half_format,
>>    encode_arm_bfloat_half, decode_arm_bfloat_half): New.
>>    * real.h (arm_bfloat_half_format): New.
>>
>>

Re: Tighten check for vector types in fold_convertible_p (PR 92741)

Richard Biener  writes:
> On December 2, 2019 5:27:05 PM GMT+01:00, Richard Sandiford 
>  wrote:
>>In this PR, IPA-CP was misled into using NOP_EXPR rather than
>>VIEW_CONVERT_EXPR to reinterpret a vector of 4 shorts as a vector
>>of 2 ints.  This tripped the tree-cfg.c assert I'd added in r278245.
>>
>>Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Hmm, but then it may create such conversion without verifying the target 
> supports it? 

This doesn't seem like the right place to check that though.
E.g. tree-vect-generic.c could lower conversions if necessary, just
like for any other vector op.  AIUI fold_convertible_p is just checking
whether the conversion satisfies the tree/gimple semantics of NOP_EXPR.

Note that in the IPA-CP case, we should never generate a vector NOP_EXPR
that isn't also tree_nop_conversion_p, see comment 4 in the PR.  It's
possible we might want to rewrite the IPA-CP (and other IPA conditions)
to be more explicit about this, but I think we want the patch either way.
All it does is disallow cases that were previously (wrongly) allowed.

Thanks,
Richard

>
> Richard. 
>
>>Richard
>>
>>
>>2019-12-02  Richard Sandiford  
>>
>>gcc/
>>  PR middle-end/92741
>>  * fold-const.c (fold_convertible_p): Check vector types more
>>  thoroughly.
>>
>>gcc/testsuite/
>>  PR middle-end/92741
>>  * gcc.dg/pr92741.c: New test.
>>
>>Index: gcc/fold-const.c
>>===
>>--- gcc/fold-const.c  2019-11-18 15:27:36.053367794 +
>>+++ gcc/fold-const.c  2019-12-02 16:25:02.662405351 +
>>@@ -2375,10 +2375,15 @@ fold_convertible_p (const_tree type, con
>> 
>> case REAL_TYPE:
>> case FIXED_POINT_TYPE:
>>-case VECTOR_TYPE:
>> case VOID_TYPE:
>>   return TREE_CODE (type) == TREE_CODE (orig);
>> 
>>+case VECTOR_TYPE:
>>+  return (VECTOR_TYPE_P (orig)
>>+   && known_eq (TYPE_VECTOR_SUBPARTS (type),
>>+TYPE_VECTOR_SUBPARTS (orig))
>>+   && fold_convertible_p (TREE_TYPE (type), TREE_TYPE (orig)));
>>+
>> default:
>>   return false;
>> }
>>Index: gcc/testsuite/gcc.dg/pr92741.c
>>===
>>--- /dev/null 2019-09-17 11:41:18.176664108 +0100
>>+++ gcc/testsuite/gcc.dg/pr92741.c2019-12-02 16:25:02.662405351 +
>>@@ -0,0 +1,19 @@
>>+/* { dg-options "-O2 -fexceptions -fnon-call-exceptions -fno-inline" }
>>*/
>>+
>>+typedef int vh __attribute__ ((__vector_size__ (2 * sizeof (int;
>>+typedef short int cq __attribute__ ((__vector_size__ (4 * sizeof
>>(short int;
>>+
>>+static void
>>+id (int *r8, vh *tu)
>>+{
>>+  *(vh *) r8 = *tu;
>>+}
>>+
>>+void
>>+mr (void)
>>+{
>>+  int r8;
>>+  cq he = { 0, };
>>+
>>+  id (, (vh *) );
>>+}

Don't install unnecessary ARRAY_REF element sizes

Even EXACT_DIV_EXPR doesn't distribute across addition for wrapping
types, so in general we can't fold EXACT_DIV_EXPRs of POLY_INT_CSTs
at compile time.  This was causing an ICE when trying to gimplify the
element size field in an ARRAY_REF.

If the result of that EXACT_DIV_EXPR is an invariant, we don't bother
recording it in the ARRAY_REF and simply read the element size from the
element type.  This avoids the overhead of doing:

  /* ??? tree_ssa_useless_type_conversion will eliminate casts to
 sizetype from another type of the same width and signedness.  */
  if (TREE_TYPE (aligned_size) != sizetype)
aligned_size = fold_convert_loc (loc, sizetype, aligned_size);
  return size_binop_loc (loc, MULT_EXPR, aligned_size,
 size_int (TYPE_ALIGN_UNIT (elmt_type)));

each time array_ref_element_size is called.

So rather than read array_ref_element_size, do some arithmetic on it,
and only then check whether the result is an invariant, we might as
well check whether the element size is an invariant to start with.
We're then directly testing whether array_ref_element_size gives
a reusable value.

For consistency, the patch makes the same change for the offset field
in a COMPONENT_REF, although I don't think that can trigger yet.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2019-12-02  Richard Sandiford  

gcc/
* gimplify.c (gimplify_compound_lval): Don't gimplify and install
an array element size if array_element_size is already an invariant.
Similarly don't gimplify and install a field offset if
component_ref_field_offset is already an invariant.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general-c/struct_1.c: New test.

Index: gcc/gimplify.c
===
--- gcc/gimplify.c  2019-11-22 09:57:52.178273436 +
+++ gcc/gimplify.c  2019-12-02 16:32:08.063499557 +
@@ -2987,17 +2987,18 @@ gimplify_compound_lval (tree *expr_p, gi
 
  if (TREE_OPERAND (t, 3) == NULL_TREE)
{
- tree elmt_type = TREE_TYPE (TREE_TYPE (TREE_OPERAND (t, 0)));
- tree elmt_size = unshare_expr (array_ref_element_size (t));
- tree factor = size_int (TYPE_ALIGN_UNIT (elmt_type));
-
- /* Divide the element size by the alignment of the element
-type (above).  */
- elmt_size
-   = size_binop_loc (loc, EXACT_DIV_EXPR, elmt_size, factor);
-
+ tree elmt_size = array_ref_element_size (t);
  if (!is_gimple_min_invariant (elmt_size))
{
+ elmt_size = unshare_expr (elmt_size);
+ tree elmt_type = TREE_TYPE (TREE_TYPE (TREE_OPERAND (t, 0)));
+ tree factor = size_int (TYPE_ALIGN_UNIT (elmt_type));
+
+ /* Divide the element size by the alignment of the element
+type (above).  */
+ elmt_size = size_binop_loc (loc, EXACT_DIV_EXPR,
+ elmt_size, factor);
+
  TREE_OPERAND (t, 3) = elmt_size;
  tret = gimplify_expr (_OPERAND (t, 3), pre_p,
post_p, is_gimple_reg,
@@ -3017,16 +3018,18 @@ gimplify_compound_lval (tree *expr_p, gi
  /* Set the field offset into T and gimplify it.  */
  if (TREE_OPERAND (t, 2) == NULL_TREE)
{
- tree offset = unshare_expr (component_ref_field_offset (t));
- tree field = TREE_OPERAND (t, 1);
- tree factor
-   = size_int (DECL_OFFSET_ALIGN (field) / BITS_PER_UNIT);
-
- /* Divide the offset by its alignment.  */
- offset = size_binop_loc (loc, EXACT_DIV_EXPR, offset, factor);
-
+ tree offset = component_ref_field_offset (t);
  if (!is_gimple_min_invariant (offset))
{
+ offset = unshare_expr (offset);
+ tree field = TREE_OPERAND (t, 1);
+ tree factor
+   = size_int (DECL_OFFSET_ALIGN (field) / BITS_PER_UNIT);
+
+ /* Divide the offset by its alignment.  */
+ offset = size_binop_loc (loc, EXACT_DIV_EXPR,
+  offset, factor);
+
  TREE_OPERAND (t, 2) = offset;
  tret = gimplify_expr (_OPERAND (t, 2), pre_p,
post_p, is_gimple_reg,
Index: gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/struct_1.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/struct_1.c  
2019-12-02 16:32:08.075499476 +
@@ -0,0 +1,10 @@
+/* { dg-options "-std=c99" } */
+
+#include 
+
+void
+f (svint8_t a, svint8_t b)
+{
+  /* Not supported, but

Re: Tighten check for vector types in fold_convertible_p (PR 92741)

On December 2, 2019 5:27:05 PM GMT+01:00, Richard Sandiford 
 wrote:
>In this PR, IPA-CP was misled into using NOP_EXPR rather than
>VIEW_CONVERT_EXPR to reinterpret a vector of 4 shorts as a vector
>of 2 ints.  This tripped the tree-cfg.c assert I'd added in r278245.
>
>Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Hmm, but then it may create such conversion without verifying the target 
supports it? 

Richard. 

>Richard
>
>
>2019-12-02  Richard Sandiford  
>
>gcc/
>   PR middle-end/92741
>   * fold-const.c (fold_convertible_p): Check vector types more
>   thoroughly.
>
>gcc/testsuite/
>   PR middle-end/92741
>   * gcc.dg/pr92741.c: New test.
>
>Index: gcc/fold-const.c
>===
>--- gcc/fold-const.c   2019-11-18 15:27:36.053367794 +
>+++ gcc/fold-const.c   2019-12-02 16:25:02.662405351 +
>@@ -2375,10 +2375,15 @@ fold_convertible_p (const_tree type, con
> 
> case REAL_TYPE:
> case FIXED_POINT_TYPE:
>-case VECTOR_TYPE:
> case VOID_TYPE:
>   return TREE_CODE (type) == TREE_CODE (orig);
> 
>+case VECTOR_TYPE:
>+  return (VECTOR_TYPE_P (orig)
>+&& known_eq (TYPE_VECTOR_SUBPARTS (type),
>+ TYPE_VECTOR_SUBPARTS (orig))
>+&& fold_convertible_p (TREE_TYPE (type), TREE_TYPE (orig)));
>+
> default:
>   return false;
> }
>Index: gcc/testsuite/gcc.dg/pr92741.c
>===
>--- /dev/null  2019-09-17 11:41:18.176664108 +0100
>+++ gcc/testsuite/gcc.dg/pr92741.c 2019-12-02 16:25:02.662405351 +
>@@ -0,0 +1,19 @@
>+/* { dg-options "-O2 -fexceptions -fnon-call-exceptions -fno-inline" }
>*/
>+
>+typedef int vh __attribute__ ((__vector_size__ (2 * sizeof (int;
>+typedef short int cq __attribute__ ((__vector_size__ (4 * sizeof
>(short int;
>+
>+static void
>+id (int *r8, vh *tu)
>+{
>+  *(vh *) r8 = *tu;
>+}
>+
>+void
>+mr (void)
>+{
>+  int r8;
>+  cq he = { 0, };
>+
>+  id (, (vh *) );
>+}

Re: Mark constant-sized objects as addressable if they have poly-int accesses

[finally getting back to this]

Richard Biener  writes:
> On Fri, Nov 8, 2019 at 10:40 AM Richard Sandiford
>  wrote:
>>
>> If SVE code is written for a specific vector length, it might load from
>> or store to fixed-sized objects.  This needs to work even without
>> -msve-vector-bits=N (which should never be needed for correctness).
>>
>> There's no way of handling a direct poly-int sized reference to a
>> fixed-size register; it would have to go via memory.  And in that
>> case it's more efficient to mark the fixed-size object as
>> addressable from the outset, like we do for array references
>> with non-constant indices.
>>
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Hmm, shouldn't you somehow avoid walking subtrees again like
> the array-ref cases?  Do "intermediate" types really matter here?
> Thus if you have BIT_FIELDREF  fixed-size-decl>,
> select first element > do you really need it addressable?
>
> It seems to me you want to check it only on the actual reference type.

Yeah, I guess it was overly general.  I don't think it can happen
for anything other than the MEM_REF/TARGET_MEM_REF itself.

How about this version?  Tested as before.

Richard


2019-12-02  Richard Sandiford  

gcc/
* cfgexpand.c (discover_nonconstant_array_refs_r): If an access
with POLY_INT_CST size is made to a fixed-size object, force the
object to live in memory.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/deref_1.c: New test.

Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c 2019-11-13 08:42:44.901373255 +
+++ gcc/cfgexpand.c 2019-12-02 16:27:45.157295080 +
@@ -6133,6 +6133,21 @@ discover_nonconstant_array_refs_r (tree
 
   *walk_subtrees = 0;
 }
+  /* References of size POLY_INT_CST to a fixed-size object must go
+ through memory.  It's more efficient to force that here than
+ to create temporary slots on the fly.  */
+  else if ((TREE_CODE (t) == MEM_REF || TREE_CODE (t) == TARGET_MEM_REF)
+  && TYPE_SIZE (TREE_TYPE (t))
+  && POLY_INT_CST_P (TYPE_SIZE (TREE_TYPE (t
+{
+  tree base = get_base_address (t);
+  if (base
+ && DECL_P (base)
+ && DECL_MODE (base) != BLKmode
+ && GET_MODE_SIZE (DECL_MODE (base)).is_constant ())
+   TREE_ADDRESSABLE (base) = 1;
+  *walk_subtrees = 0;
+}
 
   return NULL_TREE;
 }
Index: gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_1.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_1.c 2019-12-02 
16:27:45.165295026 +
@@ -0,0 +1,25 @@
+/* { dg-options "-O2" } */
+
+#include 
+
+uint64_t
+f1 (int32_t *x, int32_t *y)
+{
+  union { uint64_t x; char c[8]; } u;
+  svbool_t pg = svptrue_b32 ();
+  *(svbool_t *)[0] = svcmpeq (pg, svld1 (pg, x), 0);
+  *(svbool_t *)[4] = svcmpeq (pg, svld1 (pg, y), 1);
+  return u.x;
+}
+
+typedef unsigned int v4si __attribute__((vector_size(16)));
+
+/* The aliasing is somewhat dubious here, but it must compile.  */
+
+v4si
+f2 (void)
+{
+  v4si res;
+  *(svuint32_t *)  = svindex_u32 (0, 1);
+  return res;
+}

Re: [PATCH][2/2] More abstraction penalty removal for PR92645

On December 2, 2019 4:27:47 PM GMT+01:00, Alexander Monakov 
 wrote:
>On Mon, 2 Dec 2019, Richard Biener wrote:
>
>> +typedef long long v4di __attribute__((vector_size(32)));
>> +struct Vec
>> +{
>> +  unsigned int v[8];
>> +};
>> +
>> +v4di pun (struct Vec *s)
>> +{
>> +  v4di tem;
>> +  __builtin_memcpy (, s, 32);
>> +  return tem;
>> +}
>> +
>> +/* We're expecting exactly two stmts, in particular no
>BIT_INSERT_EXPR
>> +   and no memcpy call.
>> +_3 = MEM  [(char * {ref-all})s_2(D)];
>> +return _3;  */
>
>So just to make sure I understand correctly: the type in angle brackets
>does
>not imply 256-bit alignment, and this access has implied alignment of
>just 
>32 bits, which is deduced from the type of s_2, right?

Yes. I should have quoted the more obvious -gimple IL dump instead. 

Richard. 

>Thanks!
>Alexander

Tighten check for vector types in fold_convertible_p (PR 92741)

In this PR, IPA-CP was misled into using NOP_EXPR rather than
VIEW_CONVERT_EXPR to reinterpret a vector of 4 shorts as a vector
of 2 ints.  This tripped the tree-cfg.c assert I'd added in r278245.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2019-12-02  Richard Sandiford  

gcc/
PR middle-end/92741
* fold-const.c (fold_convertible_p): Check vector types more
thoroughly.

gcc/testsuite/
PR middle-end/92741
* gcc.dg/pr92741.c: New test.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c2019-11-18 15:27:36.053367794 +
+++ gcc/fold-const.c2019-12-02 16:25:02.662405351 +
@@ -2375,10 +2375,15 @@ fold_convertible_p (const_tree type, con
 
 case REAL_TYPE:
 case FIXED_POINT_TYPE:
-case VECTOR_TYPE:
 case VOID_TYPE:
   return TREE_CODE (type) == TREE_CODE (orig);
 
+case VECTOR_TYPE:
+  return (VECTOR_TYPE_P (orig)
+ && known_eq (TYPE_VECTOR_SUBPARTS (type),
+  TYPE_VECTOR_SUBPARTS (orig))
+ && fold_convertible_p (TREE_TYPE (type), TREE_TYPE (orig)));
+
 default:
   return false;
 }
Index: gcc/testsuite/gcc.dg/pr92741.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/pr92741.c  2019-12-02 16:25:02.662405351 +
@@ -0,0 +1,19 @@
+/* { dg-options "-O2 -fexceptions -fnon-call-exceptions -fno-inline" } */
+
+typedef int vh __attribute__ ((__vector_size__ (2 * sizeof (int;
+typedef short int cq __attribute__ ((__vector_size__ (4 * sizeof (short 
int;
+
+static void
+id (int *r8, vh *tu)
+{
+  *(vh *) r8 = *tu;
+}
+
+void
+mr (void)
+{
+  int r8;
+  cq he = { 0, };
+
+  id (, (vh *) );
+}

Re: [PATCH v2 00/11] timed_mutex, shared_timed_mutex: Add full steady clock support

2019-12-02 Thread Jonathan Wakely


On 15/10/19 18:57 +0100, Mike Crowe wrote:

glibc v2.30 added the pthread_mutex_clocklock,
pthread_rwlock_clockrdlock and pthread_rwlock_clockwrlock
functions. These accept CLOCK_MONOTONIC, so they can be used to
implement proper steady_clock support in timed_mutex,
recursive_timed_mutex and shared_timed_mutex that is immune to the
system clock being warped.

Unfortunately we can't warp the system clock in the testsuite, so it's
not possible to automatically ensure that the system_clock test cases
result in a wait on CLOCK_REALTIME and steady_clock test cases result
in a wait on CLOCK_MONOTONIC. It's recommended to run the test under
strace(1) and check whether the expected futex calls are made by glibc
or ltrace(1) and check whether the expected pthread calls are
made. The easiest way to do this is to copy and paste the line used to
build the test from the output of running the tests (for example):

make check-target-libstdc++-v3 
RUNTESTFLAGS="conformance.exp=30_threads/shared_mutex/* -v -v"

to compile the test with only the tests for one clock enabled and then
run it as:

strace -f ./1.exe

You should see calls to:

futex(..., FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, ...)

with large values of tv_sec when using system_clock and calls to:

futex(..., FUTEX_WAIT_BITSET_PRIVATE, ...)

Alternatively, you can use:

ltrace -f ./1.exe

and look for calls to pthread_mutex_clocklock,
pthread_rwlock_clockrdlock and pthread_rwlock_clockwrlock with a
parameter of 1 for CLOCK_MONOTONIC and with values of tv_sec based on
the system uptime when using steady_clock.

This series also adds some extra tests and fixes some other minor
problems with the existing implementation and tests.

Changes since v1[1]:

* Fix flaw pointed out[2] by François Dumont and add tests to prove
 that.

[1] https://gcc.gnu.org/ml/libstdc++/2019-09/msg00106.html
[2] https://gcc.gnu.org/ml/libstdc++/2019-10/msg00021.html

Mike Crowe (11):
 libstdc++ testsuite: Check return value from timed_mutex::try_lock_until
 libstdc++ testsuite: Add timed_mutex::try_lock_until test
 libstdc++ testsuite: Also test timed_mutex with steady_clock
 libstdc++ testsuite: Also test unique_lock::try_lock_until with steady_clock
 PR libstdc++/78237 Add full steady_clock support to timed_mutex
 libstdc++ testsuite: Move slow_clock to its own header
 PR libstdc++/91906 Fix timed_mutex::try_lock_until on arbitrary clock
 libstdc++ testsuite: Also test shared_timed_mutex with steady_clock
 libstdc++ shared_mutex: Add full steady_clock support to shared_timed_mutex
 libstdc++ timed_mutex: Ensure that try_lock_for waits for long enough
 shared_mutex: Fix try_lock_until and try_lock_shared_until on arbitrary clock

libstdc++-v3/acinclude.m4   |  
64 +++-
libstdc++-v3/config.h.in|   
7 +++-
libstdc++-v3/configure  | 
170 -
libstdc++-v3/configure.ac   |   
6 +++-
libstdc++-v3/include/std/mutex  |  
60 +
libstdc++-v3/include/std/shared_mutex   | 
117 +-
libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc   |  
17 +---
libstdc++-v3/testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc |  
76 -
libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock/3.cc  |  
17 ---
libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/1.cc|  
87 +-
libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/2.cc|  
75 -
libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/3.cc   |  
76 -
libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/4.cc   |  
68 +-
libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/57641.cc   |  
18 +---
libstdc++-v3/testsuite/30_threads/unique_lock/locking/4.cc  |  
14 --
libstdc++-v3/testsuite/util/slow_clock.h|  
38 -
16 files changed, 850 insertions(+), 60 deletions(-)
create mode 100644 
libstdc++-v3/testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc
create mode 100644 
libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/1.cc
create mode 100644 
libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/2.cc
create mode 100644 
libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/3.cc
create mode 100644 
libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/4.cc
create mode 100644

Re: [PATCH][2/2] More abstraction penalty removal for PR92645

2019-12-02 Thread Alexander Monakov

On Mon, 2 Dec 2019, Richard Biener wrote:

> +typedef long long v4di __attribute__((vector_size(32)));
> +struct Vec
> +{
> +  unsigned int v[8];
> +};
> +
> +v4di pun (struct Vec *s)
> +{
> +  v4di tem;
> +  __builtin_memcpy (, s, 32);
> +  return tem;
> +}
> +
> +/* We're expecting exactly two stmts, in particular no BIT_INSERT_EXPR
> +   and no memcpy call.
> +_3 = MEM  [(char * {ref-all})s_2(D)];
> +return _3;  */

So just to make sure I understand correctly: the type in angle brackets does
not imply 256-bit alignment, and this access has implied alignment of just 
32 bits, which is deduced from the type of s_2, right?

Thanks!
Alexander

[PATCH][2/2] More abstraction penalty removal for PR92645



This recovers some of the nearly dead code in 
gimple_fold_builtin_memory_op by allowing a rewrite of memcpy
with a properly aligned source or destination decl.  In particular
this handles register typed vars to be tranformed (and later
rewritten into SSA form).

Together with 1/2 the testcase then optimizes from

skvx::bit_pun<__vector(4) long long int, skvx::Vec<8, unsigned int> > 
(const struct Vec & s)
{
  vector(4) long long int D.151565;
  vector(4) long long int d;

  try
{
  memcpy (, s, 32);
  D.151565 = d;
  return D.151565;
}
  finally
{
  d = {CLOBBER};
}
}

to

skvx::bit_pun<__vector(4) long long int, skvx::Vec<8, unsigned int> > 
(const struct Vec & s)
{
  vector(4) long long int d;
  vector(4) long long int _3;

   :
  _3 = MEM  [(char * {ref-all})s_2(D)];
  return _3;

}

instead of a weird memcpy + bit-insert combo.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-11-02  Richard Biener  

PR tree-optimization/92645
* gimple-fold.c (gimple_fold_builtin_memory_op): Fold memcpy
from or to a properly aligned register variable.

* gcc.target/i386/pr92645-5.c: New testcase.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c   (revision 278893)
+++ gcc/gimple-fold.c   (working copy)
@@ -987,32 +987,21 @@ gimple_fold_builtin_memory_op (gimple_st
   src_align = get_pointer_alignment (src);
   dest_align = get_pointer_alignment (dest);
   if (dest_align < TYPE_ALIGN (desttype)
- || src_align < TYPE_ALIGN (srctype))
+ && src_align < TYPE_ALIGN (srctype))
return false;
 
   destvar = NULL_TREE;
+  srcvar = NULL_TREE;
   if (TREE_CODE (dest) == ADDR_EXPR
  && var_decl_component_p (TREE_OPERAND (dest, 0))
- && tree_int_cst_equal (TYPE_SIZE_UNIT (desttype), len))
+ && tree_int_cst_equal (TYPE_SIZE_UNIT (desttype), len)
+ && dest_align >= TYPE_ALIGN (desttype))
destvar = fold_build2 (MEM_REF, desttype, dest, off0);
-
-  srcvar = NULL_TREE;
-  if (TREE_CODE (src) == ADDR_EXPR
- && var_decl_component_p (TREE_OPERAND (src, 0))
- && tree_int_cst_equal (TYPE_SIZE_UNIT (srctype), len))
-   {
- if (!destvar
- || src_align >= TYPE_ALIGN (desttype))
-   srcvar = fold_build2 (MEM_REF, destvar ? desttype : srctype,
- src, off0);
- else if (!STRICT_ALIGNMENT)
-   {
- srctype = build_aligned_type (TYPE_MAIN_VARIANT (desttype),
-   src_align);
- srcvar = fold_build2 (MEM_REF, srctype, src, off0);
-   }
-   }
-
+  else if (TREE_CODE (src) == ADDR_EXPR
+  && var_decl_component_p (TREE_OPERAND (src, 0))
+  && tree_int_cst_equal (TYPE_SIZE_UNIT (srctype), len)
+  && src_align >= TYPE_ALIGN (srctype))
+   srcvar = fold_build2 (MEM_REF, srctype, src, off0);
   if (srcvar == NULL_TREE && destvar == NULL_TREE)
return false;
 
Index: gcc/testsuite/gcc.target/i386/pr92645-5.c
===
--- gcc/testsuite/gcc.target/i386/pr92645-5.c   (nonexistent)
+++ gcc/testsuite/gcc.target/i386/pr92645-5.c   (working copy)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-cddce1 -mavx2 -Wno-psabi" } */
+typedef long long v4di __attribute__((vector_size(32)));
+struct Vec
+{
+  unsigned int v[8];
+};
+
+v4di pun (struct Vec *s)
+{
+  v4di tem;
+  __builtin_memcpy (, s, 32);
+  return tem;
+}
+
+/* We're expecting exactly two stmts, in particular no BIT_INSERT_EXPR
+   and no memcpy call.
+_3 = MEM  [(char * {ref-all})s_2(D)];
+return _3;  */
+/* { dg-final { scan-tree-dump-times " = MEM" 1 "cddce1" } } */
+/* { dg-final { scan-tree-dump-not "memcpy" "cddce1" } } */

Re: Fix jump function update in inliner

2019-12-02 Thread Martin Jambor

Hi,

On Sun, Dec 01 2019, Jan Hubicka wrote:
> Hi,
> this patch fixes the problem with clearing bits and m_vr in inliner
> update we discussed earlier.  I am not sure if
> dete_type_change_from_memory_writes needs the clear?

No, they don't.  Since your rewrite of devirtualization, the jfunc
parameter of functions detect_type_change_from_memory_writes,
detect_type_change and detect_type_change_ssa is quite clearly
completely useless and should be removed (along with its description in
the comments).

I can remove them as a followup or feel free to do so.

Otherwise I believe we already discussed that this is the right thing to
do.

Martin


>
> Bootstrapped/regtested x86_64-linux, Martin, does it make sense?
>
> Honza
>
>   * ipa-prop.c (ipa_set_jf_unknown): Do not clear bits and m_vr.
>   (detect_type_change_from_memory_writes): Clear it here.
> Index: ipa-prop.c
> ===
> --- ipa-prop.c(revision 278877)
> +++ ipa-prop.c(working copy)
> @@ -512,8 +512,6 @@ static void
>  ipa_set_jf_unknown (struct ipa_jump_func *jfunc)
>  {
>jfunc->type = IPA_JF_UNKNOWN;
> -  jfunc->bits = NULL;
> -  jfunc->m_vr = NULL;
>  }
>  
>  /* Set JFUNC to be a copy of another jmp (to be used by jump function
> @@ -819,6 +817,8 @@ detect_type_change_from_memory_writes (i
>  return false;
>  
>ipa_set_jf_unknown (jfunc);
> +  jfunc->bits = NULL;
> +  jfunc->m_vr = NULL;
>return true;
>  }
>

[PATCH][1/2] More abstraction penalty removal for PR92645



The following avoids a spurious BIT_INSERT_EXPR when rewriting
a decl into SSA form.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-11-02  Richard Biener  

PR tree-optimization/92645
* tree-ssa.c (execute_update_addresses_taken): Avoid representing
a full def of a vector via a BIT_INSERT_EXPR.

Index: gcc/tree-ssa.c
===
--- gcc/tree-ssa.c  (revision 278893)
+++ gcc/tree-ssa.c  (working copy)
@@ -1899,6 +1899,11 @@ execute_update_addresses_taken (void)
&& bitmap_bit_p (suitable_for_renaming, DECL_UID (sym))
&& VECTOR_TYPE_P (TREE_TYPE (sym))
&& TYPE_MODE (TREE_TYPE (sym)) != BLKmode
+   /* If it is a full replacement we can do better below.  */
+   && maybe_ne (wi::to_poly_offset
+  (TYPE_SIZE_UNIT (TREE_TYPE (lhs))),
+wi::to_poly_offset
+   (TYPE_SIZE_UNIT (TREE_TYPE (sym
&& known_ge (mem_ref_offset (lhs), 0)
&& known_gt (wi::to_poly_offset
   (TYPE_SIZE_UNIT (TREE_TYPE (sym))),

Re: [PATCH 4/7 libgomp,amdgcn] GCN libgomp port

2019-12-02 Thread Julian Brown

On Mon, 2 Dec 2019 15:43:29 +0100
Thomas Schwinge  wrote:

> > --- a/libgomp/openacc.h
> > +++ b/libgomp/openacc.h
> > @@ -55,6 +55,7 @@ typedef enum acc_device_t {
> >/* acc_device_host_nonshm = 3 removed.  */
> >acc_device_not_host = 4,
> >acc_device_nvidia = 5,
> > +  acc_device_gcn = 8,  
> 
> There is no 'acc_device_gcn' in OpenACC.  Per OpenACC 3.0, A.1.2. "AMD
> GPU Targets", for example, there is 'acc_device_radeon' (and "the
> case-insensitive name 'radeon' for the environment variable
> 'ACC_DEVICE_TYPE'").  If that is not appropriate to use (I have not
> read up how exactly AMD's "GCN" and "radeon" relate to each other),
> we should get that clarified in the OpenACC specification.

FWIW, I'm pretty sure there are Radeon devices that do not use the GCN
ISA. OTOH, there are also Nvidia devices that are not PTX-compatible.
No strong opinion on the acc_device_foo name from me, though (maybe
"acc_device_amdgcn"?).

Julian

[PING^2][PATCH 0/4] Fix library testsuite compilation for build sysroot

2019-12-02 Thread Maciej W. Rozycki

On Mon, 11 Nov 2019, Maciej W. Rozycki wrote:

>  This patch series addresses a problem with the testsuite compiler being 
> set up across libatomic, libffi, libgo, libgomp with no correlation 
> whatsoever to the target compiler being used in GCC compilation.  
> Consequently there in no arrangement made to set up the compilation 
> sysroot according to the build sysroot specified for GCC compilation, 
> causing a catastrophic failure across the testsuites affected from the 
> inability to link executables.

 Ping for:





  Maciej

Re: [PATCH 4/7 libgomp,amdgcn] GCN libgomp port

2019-12-02 Thread Thomas Schwinge

Hi!

On 2019-11-12T13:29:13+, Andrew Stubbs  wrote:
> --- a/include/gomp-constants.h
> +++ b/include/gomp-constants.h
> @@ -174,6 +174,7 @@ enum gomp_map_kind
>  #define GOMP_DEVICE_NVIDIA_PTX   5
>  #define GOMP_DEVICE_INTEL_MIC6
>  #define GOMP_DEVICE_HSA  7
> +#define GOMP_DEVICE_GCN  8

Unless we are to expect non-AMD GCN implementations (hardware), I'd favor
if all these "GCN" things were called "AMD GCN", like done for "Nvidia
PTX", or "Intel MIC", or why shouldn't we?

> --- a/libgomp/configure.ac
> +++ b/libgomp/configure.ac
> @@ -176,7 +176,7 @@ case "$host" in
>*-*-rtems*)
>  # RTEMS supports Pthreads, but the library is not available at GCC build 
> time.
>  ;;
> -  nvptx*-*-*)
> +  nvptx*-*-* | amdgcn*-*-*)
>  # NVPTX does not support Pthreads, has its own code replacement.
>  libgomp_use_pthreads=no
>  # NVPTX is an accelerator-only target

I'd have sorted alphabetically, and updated the "NVPTX" comments.

> --- a/libgomp/configure.tgt
> +++ b/libgomp/configure.tgt
> @@ -164,6 +164,10 @@ case "${target}" in
>   fi
>   ;;
>  
> +  amdgcn*-*-*)
> + config_path="gcn accel"
> + ;;
> +
>*)
>   ;;

(Again, I'd sort alphabetically, but again, I acknowledge that the list is
not sorted at present.)

> --- a/libgomp/openacc.h
> +++ b/libgomp/openacc.h
> @@ -55,6 +55,7 @@ typedef enum acc_device_t {
>/* acc_device_host_nonshm = 3 removed.  */
>acc_device_not_host = 4,
>acc_device_nvidia = 5,
> +  acc_device_gcn = 8,

There is no 'acc_device_gcn' in OpenACC.  Per OpenACC 3.0, A.1.2. "AMD
GPU Targets", for example, there is 'acc_device_radeon' (and "the
case-insensitive name 'radeon' for the environment variable
'ACC_DEVICE_TYPE'").  If that is not appropriate to use (I have not read
up how exactly AMD's "GCN" and "radeon" relate to each other), we should
get that clarified in the OpenACC specification.

Grüße
 Thomas

signature.asc
Description: PGP signature

Re: Host/device shared memory

2019-12-02 Thread Andrew Stubbs


On 02/12/2019 14:23, Thomas Schwinge wrote:

Hi!

On 2019-11-15T13:43:04+0100, Jakub Jelinek  wrote:

On Fri, Nov 15, 2019 at 12:38:06PM +, Andrew Stubbs wrote:

On 15/11/2019 12:21, Jakub Jelinek wrote:

I'm surprised by the set acc_mem_shared 0, I thought gcn is a shared memory
offloading target.


APUs, such as Carizzo are shared memory. DGPUs, such as Fiji and Vega, have
their own memory. A DGPU can access host memory, provided that it has been
set up just so, but that is very slow, and I don't know of a way to do that
without still having to copy the program data into that special region.


For a few years already, Nvidia GPUs/drivers have been supporting what
they call Unified Memory, where the driver/kernel automatically handles
the movement of memory pages between host/device memories.  Given some
reasonable pre-fetching logic (either automatic in the driver/kernel, or
"guided" by the compiler/runtime), this reportedly achieves good
performance -- or even better performance than manually-managed memory
copying, as really only the data pages accessed (plus pre-fetched) will
be copied.


Yeah, this is not that.  When the AMD GPU accesses host memory it 
appears to bypass both L1 and L2 caching.  There's no copying, just 
direct, on-demand accesses.  This makes the performance really bad. We 
use it only for message passing, which is probably the original intent.



For example, see  "Compiler
assisted hybrid implicit and explicit GPU memory management under unified
address space", which I've recently (SuperComputing 2019) have seen
presented, or other publications.

This is not currently implemented in GCC, but could/should be at some
point.

This (or even a mixture of manual-discrete/automatic-shared?) would then
be an execution mode of libgomp/plugin, selected at run-time?


All we really need from libgomp, to support AMD APUs, is to be able to 
toggle the shared memory mode dynamically, rather than having it baked 
into the capabilities at start-up. Probably we could figure out the 
capabilities at run-time already, but that would break when a system has 
both kinds of device. Anyway, this is theoretical as I have no intention 
to implement support for such devices.


Andrew

[PATCH] PR C++/92739

2019-12-02 Thread Andrew Sutton

Find attached.

   gcc/cp/
* parser.c (cp_parser_constraint_requires_parens): Exclude
attributes
as postfix expressions.

gcc/testsuite/
* g++.dg/concepts-pr92739.C: New test.

Andrew Sutton


0001-Fix-PR-c-92739.patch
Description: Binary data

[PATCH v3] Add `--with-toolexeclibdir=' configuration option

2019-12-02 Thread Maciej W. Rozycki

Provide means, in the form of a `--with-toolexeclibdir=' configuration 
option, to override the default installation directory for target 
libraries, otherwise known as $toolexeclibdir.  This is so that it is 
possible to get newly-built libraries, particularly the shared ones, 
installed in a common place, so that they can be readily used by the 
target system as their host libraries, possibly over NFS, without a need 
to manually copy them over from the currently hardcoded location they 
would otherwise be installed in.

In the presence of the `--enable-version-specific-runtime-libs' option 
and for configurations building native GCC the option is ignored.

config/
* toolexeclibdir.m4: New file.

gcc/
* doc/install.texi (Cross-Compiler-Specific Options): Document 
`--with-toolexeclibdir' option.

libada/
* Makefile.in (configure_deps): Add `toolexeclibdir.m4'.
* configure.ac: Handle `--with-toolexeclibdir='.
* configure: Regenerate.

libatomic/
* configure.ac: Handle `--with-toolexeclibdir='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.
* testsuite/Makefile.in: Regenerate.

libffi/
* configure.ac: Handle `--with-toolexeclibdir='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.
* include/Makefile.in: Regenerate.
* man/Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.

libgcc/
* Makefile.in (configure_deps): Add `toolexeclibdir.m4'.
* configure.ac: Handle `--with-toolexeclibdir='.
* configure: Regenerate.

libgfortran/
* configure.ac: Handle `--with-toolexeclibdir='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.

libgo/
* configure.ac: Handle `--with-toolexeclibdir='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.
* testsuite/Makefile.in: Regenerate.

libgomp/
* configure.ac: Handle `--with-toolexeclibdir='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.
* testsuite/Makefile.in: Regenerate.

libhsail-rt/
* configure.ac: Handle `--with-toolexeclibdir='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.

libitm/
* configure.ac: Handle `--with-toolexeclibdir='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.
* testsuite/Makefile.in: Regenerate.

libobjc/
* Makefile.in (aclocal_deps): Add `toolexeclibdir.m4'.
* aclocal.m4: Include `toolexeclibdir.m4'.
* configure.ac: Handle `--with-toolexeclibdir='.
* configure: Regenerate.

liboffloadmic/
* plugin/configure.ac: Handle `--with-toolexeclibdir='.
* plugin/Makefile.in: Regenerate.
* plugin/aclocal.m4: Regenerate.
* plugin/configure: Regenerate.
* configure.ac: Handle `--with-toolexeclibdir='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.

libphobos/
* m4/druntime.m4: Handle `--with-toolexeclibdir='.
* m4/Makefile.in: Regenerate.
* libdruntime/Makefile.in: Regenerate.
* src/Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.

libquadmath/
* configure.ac: Handle `--with-toolexeclibdir='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.

libsanitizer/
* configure.ac: Handle `--with-toolexeclibdir='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.
* asan/Makefile.in: Regenerate.
* interception/Makefile.in: Regenerate.
* libbacktrace/Makefile.in: Regenerate.
* lsan/Makefile.in: Regenerate.
* sanitizer_common/Makefile.in: Regenerate.
* tsan/Makefile.in: Regenerate.
* ubsan/Makefile.in: Regenerate.

libssp/
* configure.ac: Handle `--with-toolexeclibdir='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.

libstdc++-v3/
* acinclude.m4: Handle `--with-toolexeclibdir='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.
* doc/Makefile.in: Regenerate.
* include/Makefile.in: Regenerate.
* libsupc++/Makefile.in: Regenerate.
* po/Makefile.in: Regenerate.
* python/Makefile.in: Regenerate.
* src/Makefile.in: Regenerate.
* src/c++11/Makefile.in:

Host/device shared memory (was: [patch, libgomp] Enable OpenACC GCN testing)

2019-12-02 Thread Thomas Schwinge

Hi!

On 2019-11-15T13:43:04+0100, Jakub Jelinek  wrote:
> On Fri, Nov 15, 2019 at 12:38:06PM +, Andrew Stubbs wrote:
>> On 15/11/2019 12:21, Jakub Jelinek wrote:
>> > I'm surprised by the set acc_mem_shared 0, I thought gcn is a shared memory
>> > offloading target.
>> 
>> APUs, such as Carizzo are shared memory. DGPUs, such as Fiji and Vega, have
>> their own memory. A DGPU can access host memory, provided that it has been
>> set up just so, but that is very slow, and I don't know of a way to do that
>> without still having to copy the program data into that special region.

For a few years already, Nvidia GPUs/drivers have been supporting what
they call Unified Memory, where the driver/kernel automatically handles
the movement of memory pages between host/device memories.  Given some
reasonable pre-fetching logic (either automatic in the driver/kernel, or
"guided" by the compiler/runtime), this reportedly achieves good
performance -- or even better performance than manually-managed memory
copying, as really only the data pages accessed (plus pre-fetched) will
be copied.

For example, see  "Compiler
assisted hybrid implicit and explicit GPU memory management under unified
address space", which I've recently (SuperComputing 2019) have seen
presented, or other publications.

This is not currently implemented in GCC, but could/should be at some
point.

This (or even a mixture of manual-discrete/automatic-shared?) would then
be an execution mode of libgomp/plugin, selected at run-time?

> Ah, that is going to be interesting e.g. for
> #pragma omp requires unified_shared_memory
> My plan was that for this the TU constructor would call some libgomp
> library function that in the end would when loading plugins ensure that
> only shared memory supporting plugins are loaded.  If the gcn plugin
> will support shared memory only conditionally, we'll need some interfaces to
> ensure that.

(I have not yet completely digested the relevant OpenMP features.)

Grüße
 Thomas

signature.asc
Description: PGP signature

Re: [patch, libgomp] Enable OpenACC GCN testing

2019-12-02 Thread Thomas Schwinge

Hi Andrew!

Sorry for the delay.

On 2019-11-14T16:36:38+, Andrew Stubbs  wrote:
> This patch adds some necessary bits to enable OpenACC testings for 
> amdgcn offloading.

Generally, I'm in favor if you'd consider such a thing (that in principle
is just a copy/adapt of the existing cases) as obvious to commit (even
more so with your "amdgcn port" maintainer hat on), especially so given
that this has been/is blocking you, as Tobias told me more than once.

That said:

> --- a/libgomp/testsuite/lib/libgomp.exp
> +++ b/libgomp/testsuite/lib/libgomp.exp
> @@ -316,6 +316,9 @@ proc offload_target_to_openacc_device_type { 
> offload_target } {
>   nvptx* {
>   return "nvidia"
>   }
> + amdgcn* {
> + return "gcn"
> + }
>   default {
>   error "Unknown offload target: $offload_target"
>   }

Maintain alphabetical sorting?

> @@ -444,3 +447,28 @@ proc check_effective_target_hsa_offloading_selected {} {
>   check_effective_target_hsa_offloading_selected_nocache
>  }]
>  }
> +# Return 1 if at least one AMD GCN board is present.

Missing vertical space?

> +proc check_effective_target_openacc_amdgcn_accel_present { } {

> +proc check_effective_target_openacc_amdgcn_accel_selected { } {

I'd also have inserted these maintaining alphabetical sorting, but I see
the existing other stuff also doesn't, so no need to bother.

For all the following three:

> --- a/libgomp/testsuite/libgomp.oacc-c++/c++.exp
> +++ b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
> @@ -106,6 +106,9 @@ if { $lang_test_file_found } {
>  
>   set acc_mem_shared 0
>   }
> + gcn {
> + set acc_mem_shared 0
> + }
>   default {
>   error "Unknown OpenACC device type: $openacc_device_type 
> (offload target: $offload_target)"
>   }

> --- a/libgomp/testsuite/libgomp.oacc-c/c.exp
> +++ b/libgomp/testsuite/libgomp.oacc-c/c.exp
> @@ -69,6 +69,9 @@ foreach offload_target [concat [split $offload_targets ","] 
> "disable"] {
>  
>   set acc_mem_shared 0
>   }
> + gcn {
> + set acc_mem_shared 0
> + }
>   default {
>   error "Unknown OpenACC device type: $openacc_device_type (offload 
> target: $offload_target)"
>   }

> --- a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
> @@ -94,6 +94,9 @@ if { $lang_test_file_found } {
>  
>   set acc_mem_shared 0
>   }
> + gcn {
> + set acc_mem_shared 0
> + }
>   default {
>   error "Unknown OpenACC device type: $openacc_device_type 
> (offload target: $offload_target)"
>   }

..., maintain alphabetical sorting, and please also include the
'untested' check/skip, as done in the 'nvidia' cases.


To record the review effort, please include "Reviewed-by: Thomas Schwinge
" in the commit log, see
.


Grüße
 Thomas


signature.asc
Description: PGP signature

Re: Symver attribute

> On Mon, Dec 02, 2019 at 01:53:09PM +0100, Jan Hubicka wrote:
> > > > It would be great to convert libstdc++ to the new infrastructure so it
> > > > becomes LTO safe and gives some real world testing to this
> > > > infrastructure.  I tried that but found it is bit non-trivial since
> > > > currently way we need to attach the attribute to definition itself,
> > > > while current .symver output is done in separate C++ files.
> 
> What is the reason for this limitation?  I think that is pretty severe.
> Wouldn't it be enough to accept symver attribute on any declaration, but at
> the end verify that declarations that have that attribute have the
> definition in the current TU?
> 
> I mean, it is fairly common to:
> void foo ()
> {
>...
> }
> 
> asm (".symver ...");
> where the foo definition comes from one source and symver from another one,
> which includes the first source.
> 
> So, the corresponding new rewrite for that might be:
> void foo ()
> {
>   ...
> }
> 
> extern __typeof (foo) foo __attribute__((symver ("bar@@BAZ")));

Aha, this works :)  I basically meant that I need the declaration which
I found somewhat nontrivial to dig out of the libstdc++ headers.  I did
not think of typeof. I suppose I can add this as an example to the
dcumentation (as an example convertion .symver assembly into attribute).

Honza
> 
>   Jakub
>

Re: [Patch][Fortran] OpenACC – permit common blocks in some clauses


Hi Thomas,

for completeness, I tried now *blank commons* with OpenACC in *PGI's 
pgfortran.*


From the error message, it looks as if the parser does not handle blank 
commons at all. (Matches the current parser in gfortran.) pgfortran is 
also not very good at diagnostics as nonexisting common block names are 
not diagnosed.


Cheers,

Tobias

Re: Symver attribute

On Mon, Dec 02, 2019 at 01:53:09PM +0100, Jan Hubicka wrote:
> > > It would be great to convert libstdc++ to the new infrastructure so it
> > > becomes LTO safe and gives some real world testing to this
> > > infrastructure.  I tried that but found it is bit non-trivial since
> > > currently way we need to attach the attribute to definition itself,
> > > while current .symver output is done in separate C++ files.

What is the reason for this limitation?  I think that is pretty severe.
Wouldn't it be enough to accept symver attribute on any declaration, but at
the end verify that declarations that have that attribute have the
definition in the current TU?

I mean, it is fairly common to:
void foo ()
{
   ...
}

asm (".symver ...");
where the foo definition comes from one source and symver from another one,
which includes the first source.

So, the corresponding new rewrite for that might be:
void foo ()
{
  ...
}

extern __typeof (foo) foo __attribute__((symver ("bar@@BAZ")));

Jakub

Re: Symver attribute

> > It would be great to convert libstdc++ to the new infrastructure so it
> > becomes LTO safe and gives some real world testing to this
> > infrastructure.  I tried that but found it is bit non-trivial since
> > currently way we need to attach the attribute to definition itself,
> > while current .symver output is done in separate C++ files.
> 
> I will add it to my TODO list.

Thank you! It would be great to have some evidence that things works as
expected and the attribute is practically useful for real world
projects.

I will update the documentation by your suggestions.

Honza

Re: [wwwdocs][Fortran] gcc-10/porting_to.html – Fortran's argument-checking

Revised version. Changes: I now assume that this patch goes in before 
Wilco's; hence, it includes the full HTML file and the changes.html 
modification.


Additionally, I improved the wording at several places a tiny bit – it 
really helps to re-read what one has written :-) (Thanks, Gerald, I 
followed most of your suggestions, though some became obsolete due to 
other changes.)


Any other comments? Otherwise, I would like to commit it :-)

Tobias

 htdocs/gcc-10/changes.html|   4 +-
 htdocs/gcc-10/porting_to.html | 134 ++
 2 files changed, 136 insertions(+), 2 deletions(-)

diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index 5cca0977..6e45c8fe 100644
--- a/htdocs/gcc-10/changes.html
+++ b/htdocs/gcc-10/changes.html
@@ -17,11 +17,11 @@
 
 This page is a "brief" summary of some of the huge number of improvements
 in GCC 10.
-
+
 
 
 Note: GCC 10 has not been released yet, so this document is
diff --git a/htdocs/gcc-10/porting_to.html b/htdocs/gcc-10/porting_to.html
new file mode 100644
index ..2c404191
--- /dev/null
+++ b/htdocs/gcc-10/porting_to.html
@@ -0,0 +1,134 @@
+
+
+
+
+
+Porting to GCC 10
+https://gcc.gnu.org/gcc.css; />
+
+
+
+Porting to GCC 10
+
+
+The GCC 10 release series differs from previous GCC releases in
+a number of ways. Some of these are a result
+of bug fixing, and some old behaviors have been intentionally changed
+to support new standards, or relaxed in standards-conforming ways to
+facilitate compilation or run-time performance.
+
+
+
+Some of these changes are user visible and can cause grief when
+porting to GCC 10. This document is an effort to identify common issues
+and provide solutions. Let us know if you have suggestions for improvements!
+
+
+
+
+
+
+
+Fortran language issues
+
+Argument mismatches
+
+
+GCC 10 now rejects argument mismatches occurring in the same source file.
+Those are not permitted by the Fortran standard and in general have the
+potential to generate invalid code.  However, the Fortran standard does permit
+passing an array element or a scalar string (of default character kind or of
+c_char kind) as actual argument to an array dummy argument.
+(For the exact wording, see the Fortran standard on argument association;
+in particular, Fortran 2018, Sect. 15.5.2.4, Para. 4.)
+
+
+
+Depending on their nature, argument mismatches have the potential to cause the
+generation of invalid code and, hence, should be investigated.  The most common
+reason that code fails due to newly enforced check is the following:  instead of
+using an array element as actual argument, a scalar is used; one solution is to
+replace the scalar by a size-one array.  (This should be passed as a whole as
+there is no point in passing it as array element.)  Additionally, check that the
+code indeed only accesses this single element.  —  Other mismatches occur more
+rarely but usually indicate more serious bugs where a wrong result is likely
+(at least for some target-platform and optimization combination).
+
+
+
+If short-term fixing of those issues is not feasible, the compiler flag
+-fallow-argument-mismatch (implied by -std=legacy)
+downgrades the error to a warning.
+
+
+
+Example: Assume a subroutine which takes an assumed-size or explicit-size array
+and the array size as another argument, such as
+
+
+
+  subroutine sub_assumed(array, n)
+real array(*)
+integer n
+…
+  end
+
+  subroutine another_explicit(array, n)
+integer n
+real array(n)
+…
+  end
+
+
+
+An invalid but comparably common use is to pass scalar to such procedures:
+
+
+
+  real var
+  …
+  call sub_assumed(var, 1)
+
+
+
+This can be fixed in several ways. The simplest and most localized one is the
+following; the scalar is replaced by an array.  In the second subroutine, it
+is assumed that the argument is both read from and written to. In the third
+procedure, a single number is passed, which is assumed to be only accessed for
+reading. (Note: By adding the brackets, a Fortran 66 or 77 compiler can no
+longer compile it.)
+
+
+  subroutine caller()
+real var(1)
+…
+call sub_assumed(var, 1)
+  end
+
+  subroutine caller_arg(var)
+real var
+real var2(1)
+…
+var2(1)var2(1) = var
+call sub_assumed(var2, 1)
+var = var2(1)
+  end
+
+  subroutine caller_readonly(var)
+…
+! Before: var = func(42.0, 1)
+var = func([42.0], 1)
+
+
+
+
+Links
+
+
+

Re: [PATCH][GCC8][AArch64] Backport Cortex-A76, Ares and Neoverse N1 cpu names

2019-12-02 Thread Kyrill Tkachov




On 12/2/19 12:14 PM, Wilco Dijkstra wrote:

Add support for Cortex-A76, Ares and Neoverse N1 cpu names in GCC8 branch.

2019-11-29  Wilco Dijkstra  

    * config/aarch64/aarch64-cores.def (ares): Define.
    (cortex-a76): Likewise.
    (neoverse-n1): Likewise.
    * config/aarch64/aarch64-tune.md: Regenerate.
    * doc/invoke.texi (AArch64 Options): Document ares, cortex-a76 and
    neoverse-n1.

Ok as it's very non-invasive and provides a convenience to users of that 
branch.


Thanks,

Kyrill



--
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
33b96ca2861dce506a854cff19cfcaa32f0db23a..f48b7c22b2d261203ac25c010a054e47c291ddfc 
100644

--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -85,6 +85,9 @@ AARCH64_CORE("thunderx2t99", thunderx2t99,  
thunderx2t99, 8_1A,  AARCH64_FL_FOR

 /* ARM ('A') cores. */
 AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa53, 0x41, 0xd05, -1)
 AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa73, 0x41, 0xd0a, -1)
+AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa72, 0x41, 0xd0b, -1)
+AARCH64_CORE("ares",    ares,  cortexa57, 8_2A, 
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa72, 0x41, 0xd0c, -1)
+AARCH64_CORE("neoverse-n1", neoversen1,cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
AARCH64_FL_DOTPROD, cortexa72, 0x41, 0xd0c, -1)


 /* ARMv8.3-A Architecture Processors.  */

diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 
7b3a7460561ee87e13799f726919c3f870781f6d..f08b7e44b27beeb41df928cf3aa09e59e734b5d2 
100644

--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55"
+"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,neoversen1,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55"
 (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
c63f5611afb52b2358207a458dd6c275403a5a45..57340cea31df315ce37cfd57e084844da78df9fe 
100644

--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14747,6 +14747,7 @@ Specify the name of the target processor for 
which GCC should tune the

 performance of the code.  Permissible values for this option are:
 @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
 @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, 
@samp{cortex-a75},

+@samp{cortex-a76}, @samp{ares}, @samp{neoverse-n1}
 @samp{exynos-m1}, @samp{falkor}, @samp{qdf24xx}, @samp{saphira},
 @samp{xgene1}, @samp{vulcan}, @samp{thunderx},
 @samp{thunderxt88}, @samp{thunderxt88p1}, @samp{thunderxt81},

[PATCH][GCC8][AArch64] Backport Cortex-A76, Ares and Neoverse N1 cpu names

2019-12-02 Thread Wilco Dijkstra

Add support for Cortex-A76, Ares and Neoverse N1 cpu names in GCC8 branch.

2019-11-29  Wilco Dijkstra   

* config/aarch64/aarch64-cores.def (ares): Define.
(cortex-a76): Likewise.
(neoverse-n1): Likewise.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document ares, cortex-a76 and
neoverse-n1.

--
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 
33b96ca2861dce506a854cff19cfcaa32f0db23a..f48b7c22b2d261203ac25c010a054e47c291ddfc
 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -85,6 +85,9 @@ AARCH64_CORE("thunderx2t99",  thunderx2t99,  thunderx2t99, 
8_1A,  AARCH64_FL_FOR
 /* ARM ('A') cores. */
 AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, 
cortexa53, 0x41, 0xd05, -1)
 AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, 
cortexa73, 0x41, 0xd0a, -1)
+AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, 
cortexa72, 0x41, 0xd0b, -1)
+AARCH64_CORE("ares",   ares,  cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 
| AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa72, 0x41, 
0xd0c, -1)
+AARCH64_CORE("neoverse-n1", neoversen1,cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, 
cortexa72, 0x41, 0xd0c, -1)
 
 /* ARMv8.3-A Architecture Processors.  */
 
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 
7b3a7460561ee87e13799f726919c3f870781f6d..f08b7e44b27beeb41df928cf3aa09e59e734b5d2
 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55"
+   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,ares,neoversen1,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
c63f5611afb52b2358207a458dd6c275403a5a45..57340cea31df315ce37cfd57e084844da78df9fe
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14747,6 +14747,7 @@ Specify the name of the target processor for which GCC 
should tune the
 performance of the code.  Permissible values for this option are:
 @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
 @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75},
+@samp{cortex-a76}, @samp{ares}, @samp{neoverse-n1}
 @samp{exynos-m1}, @samp{falkor}, @samp{qdf24xx}, @samp{saphira},
 @samp{xgene1}, @samp{vulcan}, @samp{thunderx},
 @samp{thunderxt88}, @samp{thunderxt88p1}, @samp{thunderxt81},

Re: [wwwdocs][Fortran] gcc-10/porting_to.html – Fortran's argument-checking

2019-12-02 Thread Gerald Pfeifer

On Mon, 2 Dec 2019, Tobias Burnus wrote:
> Hence: How about the following?

I noticed some minor issues (below).  Generally this looks fine,
though I better defer to someone more versed in Fortran. ;-)

> Additionally, one should check that the code indeed only accesses

Here I'd say "Additionally check that..." (which is simpler, and the
"one should", or "man sollte" in German, is usually recommended against
in documentation.)

> Other mismatches are more rarely but usually indicate

"more rare" (not an adverb here)

>  one should check that the code indeed only accesses this single element 
> and, thus, does handles it correctly.

"does handle" or simply "handles"

> If short-term fixing of those issues is not feasible, the compiler flag 
> -fallow-argument-mismatch (implied by 
> -std=legacy) can be used to downgrade the error to a 
> warning.

Here we could say "downgrades the error" instead of the longer "can
be used to downgrade the error"; and in general active voice is 
preferrable.

> An invalid but comparably common use is to pass scalar as argument, 
> which is invalid:

"to pass a scalar"?

And maybe avoid one of the uses of invalid, dropping the "which is
invalid" part?

Thank you,
Gerald

Re: Symver attribute

2019-12-02 Thread Jonathan Wakely


On 30/11/19 22:13 +0100, Jan Hubicka wrote:

Hi,
this is patch incorporating (I hope) all the suggestions and corrections
which I applied.  I will work incremnetaly on supporting the name@@@node
semantics which is bit different from @ and @@ one by actually removing
the original symbol.  Here I need to change assembler name of the symbol
itself but then I still need an alternative assembler name for the local
use. This is bit like weakref rewriting but not quite and it seems it is
better to do that incrementally.

It would be great to convert libstdc++ to the new infrastructure so it
becomes LTO safe and gives some real world testing to this
infrastructure.  I tried that but found it is bit non-trivial since
currently way we need to attach the attribute to definition itself,
while current .symver output is done in separate C++ files.


I will add it to my TODO list.



Index: doc/extend.texi
===
--- doc/extend.texi (revision 278877)
+++ doc/extend.texi (working copy)
@@ -3711,6 +3711,41 @@ Function Attributes}, @ref{PowerPC Funct
@ref{Nios II Function Attributes}, and @ref{S/390 Function Attributes}
for details.

+@item symver ("@var{name2}@@@var{nodename}")
+On ELF targets this attribute creates a symbol version.  The @var{name2} part
+of the parameter is the actual name of the symbol by which it will be
+externally referenced.  The @code{nodename} portion should be the name of a
+node specified in the version script supplied to the linker when building a
+shared library.  Versioned symbol must be defined and must be exported with


I think this should be either "A versioned symbol" or "Versioned
symbols".


+default visibility.
+
+@smallexample
+__attribute__ ((__symver__ ("foo@@VERS_1"))) int
+foo_v1 (void)
+@{
+@}
+@end smallexample
+
+Will produce a @code{.symver foo_v1, foo@@VERS_1} directive in the assembler
+output.
+
+It's an error to define multiple version of a given symbol.  In such case


"In such cases"


+an alias can be used.
+
+@smallexample
+__attribute__ ((__symver__ ("foo@@VERS_2")))
+__attribute__ ((alias ("foo_v1")))
+int symver_foo_v1 (void);
+@end smallexample
+
+This example creates an alias of @code{foo_v1} with symbol name
+@code{symver_foo_v1} which will be version @code{VERS_2} of @code{foo}.
+
+Finally if the parameter is @code{"@var{name2}@var{nodename}"} then in
+addition to creating a symbol version (as if
+@code{"@var{name2}@@@var{nodename}"} was used) the version will be also used


s/was used/had been used/

s/will be also used/will also be used/

Re: [PATCH] Refactor IPA devirt a bit.

> 
> Sure, I'm sending updated version of the patch.
> 
> Ready for trunk now?

OK, thanks!

Honza

Re: [PATCH] Refactor IPA devirt a bit.


On 12/2/19 11:50 AM, Jan Hubicka wrote:

Hello.

The patches makes a small refactoring in ipa-devirt.c and comes up
with a handy debugging function debug_tree_odr_name.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-12-02  Martin Liska  

* ipa-devirt.c (warn_types_mismatch): Use get_odr_name_for_type.
* print-tree.c (debug_tree_odr_name): New.
* print-tree.h (debug_tree_odr_name): New.
* tree.h (get_odr_name_for_type): New.

gcc/testsuite/ChangeLog:

2019-12-02  Martin Liska  

* g++.dg/lto/odr-7_0.C: New test.
* g++.dg/lto/odr-7_1.C: New test.


This is OK except ...

+/* Print ODR name of a TYPE if available.
+   Use demangler when option DEMANGLE is used.  */
+
+DEBUG_FUNCTION void
+debug_tree_odr_name (tree type, bool demangle)


I would probably keep these in ipa-devirt since it has everything
related to devirtualization in it and there is not much related
to normal tree printing machinery.

+/* If TYPE has mangled ODR name, return it.  Otherwise return NULL.  */
+
+inline const char *
+get_odr_name_for_type (tree type)


Similarly here I would keep it in ipa-util where all the other ODR
related stuff is (next stage1 we could move it into something like
odr-type.[ch]).

Also document that it works only after free_lang_data was run.


Sure, I'm sending updated version of the patch.

Ready for trunk now?

Martin



Honza

+{
+  tree type_name = TYPE_NAME (type);
+  if (type_name == NULL_TREE
+  || !DECL_ASSEMBLER_NAME_SET_P (type_name))
+return NULL;
+
+  return IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (type_name));
+}
+
  /* A struct for encapsulating location information about an operator
 and the operation built from it.
  





>From 6126f8f44106cd779a2d77a4755c68bcf6b9fa20 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 2 Dec 2019 09:59:26 +0100
Subject: [PATCH] Refactor IPA devirt a bit.

gcc/ChangeLog:

2019-12-02  Martin Liska  

	* ipa-devirt.c (warn_types_mismatch): Use get_odr_name_for_type
	function.
	(debug_tree_odr_name): New.
	* ipa-utils.h (get_odr_name_for_type): New.

gcc/testsuite/ChangeLog:

2019-12-02  Martin Liska  

	* g++.dg/lto/odr-7_0.C: New test.
	* g++.dg/lto/odr-7_1.C: New test.
---
 gcc/ipa-devirt.c   | 37 +++---
 gcc/ipa-utils.h| 14 +++
 gcc/testsuite/g++.dg/lto/odr-7_0.C | 18 +++
 gcc/testsuite/g++.dg/lto/odr-7_1.C | 13 +++
 4 files changed, 68 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/lto/odr-7_0.C
 create mode 100644 gcc/testsuite/g++.dg/lto/odr-7_1.C

diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index a884a465a5d..1017b2a5c7c 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -1036,20 +1036,13 @@ warn_types_mismatch (tree t1, tree t2, location_t loc1, location_t loc2)
   /* If types have mangled ODR names and they are different, it is most
  informative to output those.
  This also covers types defined in different namespaces.  */
-  if (TYPE_NAME (mt1) && TYPE_NAME (mt2)
-  && TREE_CODE (TYPE_NAME (mt1)) == TYPE_DECL
-  && TREE_CODE (TYPE_NAME (mt2)) == TYPE_DECL
-  && DECL_ASSEMBLER_NAME_SET_P (TYPE_NAME (mt1))
-  && DECL_ASSEMBLER_NAME_SET_P (TYPE_NAME (mt2))
-  && DECL_ASSEMBLER_NAME (TYPE_NAME (mt1))
-	 != DECL_ASSEMBLER_NAME (TYPE_NAME (mt2)))
-{
-  char *name1 = xstrdup (cplus_demangle
-	 (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (TYPE_NAME (mt1))),
-	  DMGL_PARAMS | DMGL_ANSI | DMGL_TYPES));
-  char *name2 = cplus_demangle
-	 (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (TYPE_NAME (mt2))),
-	  DMGL_PARAMS | DMGL_ANSI | DMGL_TYPES);
+  const char *odr1 = get_odr_name_for_type (mt1);
+  const char *odr2 = get_odr_name_for_type (mt2);
+  if (odr1 != NULL && odr2 != NULL && odr1 != odr2)
+{
+  const int opts = DMGL_PARAMS | DMGL_ANSI | DMGL_TYPES;
+  char *name1 = xstrdup (cplus_demangle (odr1, opts));
+  char *name2 = xstrdup (cplus_demangle (odr2, opts));
   if (name1 && name2 && strcmp (name1, name2))
 	{
 	  inform (loc_t1,
@@ -3989,4 +3982,20 @@ make_pass_ipa_devirt (gcc::context *ctxt)
   return new pass_ipa_devirt (ctxt);
 }
 
+/* Print ODR name of a TYPE if available.
+   Use demangler when option DEMANGLE is used.  */
+
+DEBUG_FUNCTION void
+debug_tree_odr_name (tree type, bool demangle)
+{
+  const char *odr = get_odr_name_for_type (type);
+  if (demangle)
+{
+  const int opts = DMGL_PARAMS | DMGL_ANSI | DMGL_TYPES;
+  odr = cplus_demangle (odr, opts);
+}
+
+  fprintf (stderr, "%s\n", odr);
+}
+
 #include "gt-ipa-devirt.h"
diff --git a/gcc/ipa-utils.h b/gcc/ipa-utils.h
index 60c52e0fa53..81a54797558 100644
--- a/gcc/ipa-utils.h
+++ b/gcc/ipa-utils.h
@@ -248,4 +248,18 @@ odr_type_p (const_tree t)
  && DECL_ASSEMBLER_NAME_SET_P (TYPE_NAME (t));
 }
 
+/* If TYPE has mangled ODR name, return it.

Re: [PATCH] Refactor IPA devirt a bit.

> Hello.
> 
> The patches makes a small refactoring in ipa-devirt.c and comes up
> with a handy debugging function debug_tree_odr_name.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2019-12-02  Martin Liska  
> 
>   * ipa-devirt.c (warn_types_mismatch): Use get_odr_name_for_type.
>   * print-tree.c (debug_tree_odr_name): New.
>   * print-tree.h (debug_tree_odr_name): New.
>   * tree.h (get_odr_name_for_type): New.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-12-02  Martin Liska  
> 
>   * g++.dg/lto/odr-7_0.C: New test.
>   * g++.dg/lto/odr-7_1.C: New test.

This is OK except ...
> +/* Print ODR name of a TYPE if available.
> +   Use demangler when option DEMANGLE is used.  */
> +
> +DEBUG_FUNCTION void
> +debug_tree_odr_name (tree type, bool demangle)

I would probably keep these in ipa-devirt since it has everything
related to devirtualization in it and there is not much related
to normal tree printing machinery.
> +/* If TYPE has mangled ODR name, return it.  Otherwise return NULL.  */
> +
> +inline const char *
> +get_odr_name_for_type (tree type)

Similarly here I would keep it in ipa-util where all the other ODR
related stuff is (next stage1 we could move it into something like
odr-type.[ch]).

Also document that it works only after free_lang_data was run.

Honza
> +{
> +  tree type_name = TYPE_NAME (type);
> +  if (type_name == NULL_TREE
> +  || !DECL_ASSEMBLER_NAME_SET_P (type_name))
> +return NULL;
> +
> +  return IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (type_name));
> +}
> +
>  /* A struct for encapsulating location information about an operator
> and the operation built from it.
>  
>

[PATCH] Fix PR92742



Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-12-02  Richard Biener  

PR tree-optimization/92742
* tree-vect-loop.c (vect_fixup_reduc_chain): Do not
touch the def-type but verify it is consistent with the
original stmts.

* gcc.dg/torture/pr92742.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 278893)
+++ gcc/tree-vect-loop.c(working copy)
@@ -643,6 +643,8 @@ vect_fixup_reduc_chain (stmt_vec_info st
   do
 {
   stmtp = STMT_VINFO_RELATED_STMT (stmt_info);
+  gcc_checking_assert (STMT_VINFO_DEF_TYPE (stmtp)
+  == STMT_VINFO_DEF_TYPE (stmt_info));
   REDUC_GROUP_FIRST_ELEMENT (stmtp) = firstp;
   stmt_info = REDUC_GROUP_NEXT_ELEMENT (stmt_info);
   if (stmt_info)
@@ -650,7 +652,6 @@ vect_fixup_reduc_chain (stmt_vec_info st
  = STMT_VINFO_RELATED_STMT (stmt_info);
 }
   while (stmt_info);
-  STMT_VINFO_DEF_TYPE (stmtp) = vect_reduction_def;
 }
 
 /* Fixup scalar cycles that now have their stmts detected as patterns.  */
Index: gcc/testsuite/gcc.dg/torture/pr92742.c
===
--- gcc/testsuite/gcc.dg/torture/pr92742.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr92742.c  (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ftree-vectorize" } */
+
+unsigned int qw;
+
+int
+rs (int iq, int wg)
+{
+  for (qw = 0; qw < 2; ++qw)
+{
+}
+
+  while (iq < 1)
+{
+  wg *= qw * 2;
+  ++iq;
+}
+
+  return wg;
+}

[PATCH] Refactor IPA devirt a bit.


Hello.

The patches makes a small refactoring in ipa-devirt.c and comes up
with a handy debugging function debug_tree_odr_name.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-12-02  Martin Liska  

* ipa-devirt.c (warn_types_mismatch): Use get_odr_name_for_type.
* print-tree.c (debug_tree_odr_name): New.
* print-tree.h (debug_tree_odr_name): New.
* tree.h (get_odr_name_for_type): New.

gcc/testsuite/ChangeLog:

2019-12-02  Martin Liska  

* g++.dg/lto/odr-7_0.C: New test.
* g++.dg/lto/odr-7_1.C: New test.
---
 gcc/ipa-devirt.c   | 21 +++--
 gcc/print-tree.c   | 17 +
 gcc/print-tree.h   |  1 +
 gcc/testsuite/g++.dg/lto/odr-7_0.C | 18 ++
 gcc/testsuite/g++.dg/lto/odr-7_1.C | 13 +
 gcc/tree.h | 13 +
 6 files changed, 69 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/lto/odr-7_0.C
 create mode 100644 gcc/testsuite/g++.dg/lto/odr-7_1.C


diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index a884a465a5d..e53461b1f5c 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -1036,20 +1036,13 @@ warn_types_mismatch (tree t1, tree t2, location_t loc1, location_t loc2)
   /* If types have mangled ODR names and they are different, it is most
  informative to output those.
  This also covers types defined in different namespaces.  */
-  if (TYPE_NAME (mt1) && TYPE_NAME (mt2)
-  && TREE_CODE (TYPE_NAME (mt1)) == TYPE_DECL
-  && TREE_CODE (TYPE_NAME (mt2)) == TYPE_DECL
-  && DECL_ASSEMBLER_NAME_SET_P (TYPE_NAME (mt1))
-  && DECL_ASSEMBLER_NAME_SET_P (TYPE_NAME (mt2))
-  && DECL_ASSEMBLER_NAME (TYPE_NAME (mt1))
-	 != DECL_ASSEMBLER_NAME (TYPE_NAME (mt2)))
-{
-  char *name1 = xstrdup (cplus_demangle
-	 (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (TYPE_NAME (mt1))),
-	  DMGL_PARAMS | DMGL_ANSI | DMGL_TYPES));
-  char *name2 = cplus_demangle
-	 (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (TYPE_NAME (mt2))),
-	  DMGL_PARAMS | DMGL_ANSI | DMGL_TYPES);
+  const char *odr1 = get_odr_name_for_type (mt1);
+  const char *odr2 = get_odr_name_for_type (mt2);
+  if (odr1 != NULL && odr2 != NULL && odr1 != odr2)
+{
+  const int opts = DMGL_PARAMS | DMGL_ANSI | DMGL_TYPES;
+  char *name1 = xstrdup (cplus_demangle (odr1, opts));
+  char *name2 = xstrdup (cplus_demangle (odr2, opts));
   if (name1 && name2 && strcmp (name1, name2))
 	{
 	  inform (loc_t1,
diff --git a/gcc/print-tree.c b/gcc/print-tree.c
index bd09ec4d7a7..5e728a2641a 100644
--- a/gcc/print-tree.c
+++ b/gcc/print-tree.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfg.h"
 #include "dumpfile.h"
 #include "print-tree.h"
+#include "demangle.h"
 
 /* Define the hash table of nodes already seen.
Such nodes are not repeated; brief cross-references are used.  */
@@ -1141,6 +1142,22 @@ debug_raw (const tree_node *ptr)
 fprintf (stderr, "\n");
 }
 
+/* Print ODR name of a TYPE if available.
+   Use demangler when option DEMANGLE is used.  */
+
+DEBUG_FUNCTION void
+debug_tree_odr_name (tree type, bool demangle)
+{
+  const char *odr = get_odr_name_for_type (type);
+  if (demangle)
+{
+  const int opts = DMGL_PARAMS | DMGL_ANSI | DMGL_TYPES;
+  odr = cplus_demangle (odr, opts);
+}
+
+  fprintf (stderr, "%s\n", odr);
+}
+
 static void
 dump_tree_via_hooks (const tree_node *ptr, dump_flags_t options)
 {
diff --git a/gcc/print-tree.h b/gcc/print-tree.h
index cbea48c486e..10b8df53ac7 100644
--- a/gcc/print-tree.h
+++ b/gcc/print-tree.h
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 extern void debug_tree (tree);
 extern void debug_raw (const tree_node );
 extern void debug_raw (const tree_node *ptr);
+extern void debug_tree_odr_name (tree, bool demangle = true);
 extern void debug (const tree_node );
 extern void debug (const tree_node *ptr);
 extern void debug_verbose (const tree_node );
diff --git a/gcc/testsuite/g++.dg/lto/odr-7_0.C b/gcc/testsuite/g++.dg/lto/odr-7_0.C
new file mode 100644
index 000..e33b8192bc1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/odr-7_0.C
@@ -0,0 +1,18 @@
+// { dg-lto-do link }
+
+struct bar // { dg-lto-message "type name 'bar' should match type name 'foobar'" }
+{
+  int xxx;
+};
+
+struct foo // { dg-lto-warning "8: 'struct foo' violates the C\\+\\+ One Definition Rule" }
+{
+  bar a;
+};
+
+foo myfoo2;
+
+int
+main()
+{
+}
diff --git a/gcc/testsuite/g++.dg/lto/odr-7_1.C b/gcc/testsuite/g++.dg/lto/odr-7_1.C
new file mode 100644
index 000..464bd895900
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/odr-7_1.C
@@ -0,0 +1,13 @@
+template 
+struct foobar
+{
+  int xxx;
+  T pes;
+};
+
+struct foo
+{
+  foobar a;
+};
+
+foo myfoo;
diff --git a/gcc/tree.h b/gcc/tree.h
index 0f3cc5d7e5a..40a4fde6aec 100644
---

[wwwdocs][Fortran] gcc-10/porting_to.html – Fortran's argument-checking

As I assume that Wilco's -fno-common porting-to patch will be submitted 
first, I didn't include the full boilerplate; cf. 
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02662.html


As suggested by Eric, an entry in porting-to about the Fortran 
argument-checking change could also be useful, cf. 
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02711.html


Hence: How about the following?

Tobias

Re: [PATCH] Fix up swap peephole2 (PR target/92744)

2019-12-02 Thread Uros Bizjak

On Mon, Dec 2, 2019 at 10:06 AM Jakub Jelinek  wrote:
>
> On Mon, Dec 02, 2019 at 08:28:27AM +0100, Uros Bizjak wrote:
> > > I'll have a look tomorrow.
> >
> > general_reg_operand should be used in the pattern.
>
> That indeed works, ok for trunk if it passes bootstrap/regtest?
>
> 2019-12-02  Uroš Bizjak  
> Jakub Jelinek  
>
> PR target/92744
> * config/i386/i386.md (peephole2 for *swap): Use
> general_reg_operand predicates instead of register_operand.
>
> * g++.dg/dfp/pr92744.C: New test.

OK.

Thanks,
Uros.

> --- gcc/config/i386/i386.md.jj  2019-11-19 10:28:18.187668272 +0100
> +++ gcc/config/i386/i386.md 2019-12-02 09:55:04.98961 +0100
> @@ -2788,10 +2788,10 @@ (define_insn "*swap"
> (set_attr "bdver1_decode" "double")])
>
>  (define_peephole2
> -  [(set (match_operand:SWI 0 "register_operand")
> -   (match_operand:SWI 1 "register_operand"))
> +  [(set (match_operand:SWI 0 "general_reg_operand")
> +   (match_operand:SWI 1 "general_reg_operand"))
> (set (match_dup 1)
> -   (match_operand:SWI 2 "register_operand"))
> +   (match_operand:SWI 2 "general_reg_operand"))
> (set (match_dup 2) (match_dup 0))]
>"peep2_reg_dead_p (3, operands[0])
> && optimize_insn_for_size_p ()"
> --- gcc/testsuite/g++.dg/dfp/pr92744.C.jj   2019-12-02 10:01:04.133064486 
> +0100
> +++ gcc/testsuite/g++.dg/dfp/pr92744.C  2019-12-02 10:00:31.577571093 +0100
> @@ -0,0 +1,20 @@
> +// PR target/92744
> +// { dg-do compile }
> +// { dg-options "-Os -fno-tree-ccp" }
> +
> +typedef float T __attribute__((mode(SD)));
> +struct A { T a; };
> +void foo ();
> +
> +bool
> +operator!= (A x, A y)
> +{
> +  return x.a != y.a;
> +}
> +
> +void
> +bar (A x, A y)
> +{
> +  if (x != y)
> +foo ();
> +}
>
>
> Jakub
>

[PATCH] Fix up swap peephole2 (PR target/92744)

On Mon, Dec 02, 2019 at 08:28:27AM +0100, Uros Bizjak wrote:
> > I'll have a look tomorrow.
> 
> general_reg_operand should be used in the pattern.

That indeed works, ok for trunk if it passes bootstrap/regtest?

2019-12-02  Uroš Bizjak  
Jakub Jelinek  

PR target/92744
* config/i386/i386.md (peephole2 for *swap): Use
general_reg_operand predicates instead of register_operand.

* g++.dg/dfp/pr92744.C: New test.

--- gcc/config/i386/i386.md.jj  2019-11-19 10:28:18.187668272 +0100
+++ gcc/config/i386/i386.md 2019-12-02 09:55:04.98961 +0100
@@ -2788,10 +2788,10 @@ (define_insn "*swap"
(set_attr "bdver1_decode" "double")])
 
 (define_peephole2
-  [(set (match_operand:SWI 0 "register_operand")
-   (match_operand:SWI 1 "register_operand"))
+  [(set (match_operand:SWI 0 "general_reg_operand")
+   (match_operand:SWI 1 "general_reg_operand"))
(set (match_dup 1)
-   (match_operand:SWI 2 "register_operand"))
+   (match_operand:SWI 2 "general_reg_operand"))
(set (match_dup 2) (match_dup 0))]
   "peep2_reg_dead_p (3, operands[0])
&& optimize_insn_for_size_p ()"
--- gcc/testsuite/g++.dg/dfp/pr92744.C.jj   2019-12-02 10:01:04.133064486 
+0100
+++ gcc/testsuite/g++.dg/dfp/pr92744.C  2019-12-02 10:00:31.577571093 +0100
@@ -0,0 +1,20 @@
+// PR target/92744
+// { dg-do compile }
+// { dg-options "-Os -fno-tree-ccp" }
+
+typedef float T __attribute__((mode(SD)));
+struct A { T a; };
+void foo ();
+
+bool
+operator!= (A x, A y)
+{
+  return x.a != y.a;
+}
+
+void
+bar (A x, A y)
+{
+  if (x != y)
+foo ();
+}


Jakub

Re: [PATCH] Improve AB+-A -> A(B+-1) and A+-AB -> A(1+-B) match.pd optimization

On Sat, 30 Nov 2019, Jakub Jelinek wrote:

> Hi!
> 
> As discussed in the PR, we can't optimize e.g.
>   int a = t - 1;
>   int b = a * v;
>   return b + v;
> into return t * v; for signed non-wrapv arithmetics.  This can be done
> by the match.pd (A * B) +- A -> (B +- 1) * A or
> A +- (A * B) -> (1 +- B) * A canonicalizations.  Being a lazy guy,
> I wrote attached brute force proglet to look for all the cases (for signed
> char) where there is no UB in the original and the transformation would
> introduce UB.  For the simple cases with just A and B, rather than A, B and
> C, the problematic cases are for signed char only:
> A*B+A -> (B+1)*A A==-1 && B==127
> A*B+A -> (B+1)*A A==0 && B==127
> A*B-A -> (B-1)*A A==0 && B==-128
> A-A*B -> (1-B)*A A==-1 && B==-127
> A-A*B -> (1-B)*A A==0 && B==-128
> A-A*B -> (1-B)*A A==0 && B==-127
> The current patterns already use VRP (tree_expr_nonzero_p and
> expr_not_equal_to) to do the transformation only if A is known not to be 0
> or -1.  But as the above problematic cases show, for A*B-A the -1
> case is actually not problematic (transformation doesn't introduce UB;
> if A is -1, -1*B+1 has UB only for minimum and minimum+1, and the
> replacement (B-1)*-1 has also UB for those two cases only) and even when we
> know nothing about A value range, if we know something about B value range,
> we could still optimize.  So, for the
>   int a = t - 1;
>   int b = a * v;
>   return b + v;
> case, the a = t - 1 has value range that doesn't include maximum and
> so we can conclude it is ok to transform it into return ((t - 1) + 1) * v
> and thus t * v.
> 
> Unfortunately, the patch "broke" a few loop_versioning_*.f90 tests (CCing
> author), where there are small differences between the lversion pass,
> e.g. in loop_versioning_1.f90 (f1):
> -  # RANGE ~[0, 0]
> -  _4 = iftmp.11_6 * S.4_19;
> -  _11 = _4 - iftmp.11_6;
> +  # RANGE [0, 9223372036854775806] NONZERO 9223372036854775807
> +  _4 = S.4_19 + -1;
> +  _11 = _4 * iftmp.11_6;
> and the lversion pass then emits just one message instead of two, but in the
> end assembly is identical.  In loop_versioning_6.f90 (though, with -m32
> only), the code before lversion pass actually looks better in f1:
> -  # i_35 = PHI <1(8), i_28(9)>
> -  _9 = iftmp.33_15 * i_35;
> -  _10 = _9 * 2;
> -  _21 = _10 - iftmp.33_15;
> -  (*x.0_23)[_21] = 1.0e+2;
> -  _11 = i_35 * 2;
> -  _12 = _11 + 1;
> -  _13 = _12 * iftmp.33_15;
> -  _22 = _13 - iftmp.33_15;
> -  (*x.0_23)[_22] = 1.01e+2;
> -  i_28 = i_35 + 1;
> -  if (iftmp.36_25 < i_28)
> +  # i_31 = PHI <1(8), i_26(9)>
> +  _10 = iftmp.33_13 * i_31;
> +  _11 = _10 * 2;
> +  _19 = _11 - iftmp.33_13;
> +  (*x.0_21)[_19] = 1.0e+2;
> +  (*x.0_21)[_11] = 1.01e+2;
> +  i_26 = i_31 + 1;
> +  if (iftmp.36_23 < i_26)
> where due to the new canonicalizations we managed to avoid some
> multiplications.  One index was iftmp*i*2-iftmp and the other
> was iftmp*(i*2+1)-iftmp and with the patch we managed to simplify
> the latter into iftmp*i*2 and use for that the temporary used for
> the first expression.  f1 is actually in assembly smaller because of this.
> The lp64 vs. ! lp64 is just a wild guess, guess testing on further targets
> will show what is the target property that matters.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2019-11-29  Jakub Jelinek  
> 
>   PR tree-optimization/92712
>   * match.pd ((A * B) +- A -> (B +- 1) * A,
>   A +- (A * B) -> (1 +- B) * A): Allow optimizing signed integers
>   even when we don't know anything about range of A, but do know
>   something about range of B and the simplification won't introduce
>   new UB.
> 
>   * gcc.dg/tree-ssa/pr92712-1.c: New test.
>   * gcc.dg/tree-ssa/pr92712-2.c: New test.
>   * gcc.dg/tree-ssa/pr92712-3.c: New test.
>   * gfortran.dg/loop_versioning_1.f90: Adjust expected number of
>   likely to be innermost dimension messages.
>   * gfortran.dg/loop_versioning_10.f90: Likewise.
>   * gfortran.dg/loop_versioning_6.f90: Likewise.
> 
> --- gcc/match.pd.jj   2019-11-05 14:59:22.546873967 +0100
> +++ gcc/match.pd  2019-11-29 18:17:27.472002727 +0100
> @@ -2480,18 +2480,42 @@ (define_operator_list COND_TERNARY
>  (plusminus @0 (mult:c@3 @0 @2))
>  (if ((!ANY_INTEGRAL_TYPE_P (type)
> || TYPE_OVERFLOW_WRAPS (type)
> +   /* For @0 + @0*@2 this transformation would introduce UB
> +  (where there was none before) for @0 in [-1,0] and @2 max.
> +  For @0 - @0*@2 this transformation would introduce UB
> +  for @0 0 and @2 in [min,min+1] or @0 -1 and @2 min+1.  */
> || (INTEGRAL_TYPE_P (type)
> -   && tree_expr_nonzero_p (@0)
> -   && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)
> +   && ((tree_expr_nonzero_p (@0)
> +&& expr_not_equal_to (@0,
> + wi::minus_one (TYPE_PRECISION (type
> +   ||

Re: [PATCH][RFC] Add new ipa-reorder pass

On 12/1/19 11:37 PM, Jan Hubicka wrote:

Hi,
I was playing with it a bit more and built with
-fno-profile-reorder-functions.

Here is -fno-profile-reorder-functions compared to first run
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try=3d537be0cb37458e7928f69a37efb2a6d6b85eae=try=4543abfa08870391544b56d16dfcd530dac0dc30=1
2.2% improvement on the page rendering is off noise but I would hope for
bit more.

Here is -fno-profile-reorder-functions compared to clustering
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try=3d537be0cb37458e7928f69a37efb2a6d6b85eae=try=1c2d53b10b042c15edbe7bd26e2740641840=1
I am not sure if it is a noise.

Either Fireox's talos is not good benchmark for code layout (which I
would be surprised since it is quite sensitive to code size issues) or
there are some problems.

In general I think the patch is useful and mostly mainline ready except
for detailes but it would be good to have some more evidence that it
works as expected on large binaries besides tramp3d build time. There
are number of ways where things may go wrong ranging from misupdated
profiles, problems with function splitting, comdats and other issues.

Based on my testing, I was able to see cc1plus binary really sorted as seen
by the pass, including various IPA clones that inherited order from their
origins.

Was you able to benchmark some other benefits?

Unfortunately not.

I remember we discussed
collecting traces from valgrind, perhaps we could test that they are
looking good?

I have some semi-working icegrind port here:
https://github.com/marxin/valgrind/tree/icegrind

It will probably need some extra work and it's terribly slow. But you can try
it.

I would first wait for the Firefox dump files and then we can discuss what
to do with the pass.

Martin

Honza

Re: [PATCH][RFC] Add new ipa-reorder pass

On 12/1/19 11:45 AM, Jan Hubicka wrote:

Hi.

I'm sending v3 of the patch where I changed:
- function.cold sections are properly put into .text.unlikely and
not into a .text.sorted.XYZ section

I've just finished measurements and I still have the original speed up
for tramp3d:
Total runs: 10, before: 13.92, after: 13.82, cmp: 99.219%

Hi,
I have updated binutils to current head on the Firefox testing patch and
run FDO build with tp first run ordering and call chain clustering.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try=1313e6a4d74ebff702afa7594684beb83c01d95f=try=1c2d53b10b042c15edbe7bd26e2740641840=1

Hello.

Thank you for the testing.

It seems there are no differences in performance. The two binaries can
be downloaded at

w/o patch:
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/MK-7DC3FQcevZC_Nvlnq8Q/runs/0/artifacts/public/build/target.tar.bz2
with call chain clustering.
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/UVh6iNILT-qb8sYM5vxVCQ/runs/0/artifacts/public/build/target.tar.bz2

I would ideally need output of -fdump-ipa-reorder which prints sorted symbols
and
so that I can compare it with resulting assembly.

Since Firefox is quite sensitive to code size I would expect to be able
to measure some benefits here. Any idea what may have go wrong?

That's a pity.

I checked that the binaries seems generally sane - out of 58MB text
segment there is 34MB cold section. It is possible that system ld is
used instead of provided one, but that would be weird. I will try to
find way to double-check that updating binutils really updated them for
GCC.

I can double check once having the dump file.

Martin

Honza

Re: [PATCH][RFC] Add new ipa-reorder pass

On 12/1/19 12:11 PM, Jan Hubicka wrote:

Hi.

I'm sending v3 of the patch where I changed:
- function.cold sections are properly put into .text.unlikely and
not into a .text.sorted.XYZ section

I've just finished measurements and I still have the original speed up
for tramp3d:
Total runs: 10, before: 13.92, after: 13.82, cmp: 99.219%

It seems there are no differences in performance. The two binaries can
be downloaded at

Since Firefox is quite sensitive to code size I would expect to be able
to measure some benefits here. Any idea what may have go wrong?
I checked that the binaries seems generally sane - out of 58MB text
segment there is 34MB cold section. It is possible that system ld is
used instead of provided one, but that would be weird. I will try to
find way to double-check that updating binutils really updated them for
GCC.

Linker used is
GNU ld (GNU Binutils) 2.33.50.20191130

Hello.

You can check linker support with:
$ ~/bin/binutils/bin/ld.bfd --verbose | grep text.sorted
*(SORT(.text.sorted.*))

Is Firefox using BFD or GOLD for linking?

Does it have all necessary support in? Do I need something like
-ffunction-sections?

No, you don't need it.

Martin

Honza

Re: [C++ PATCH] c++/91353 - P1331R2: Allow trivial default init in constexpr contexts.