date:20191015

[PATCH] Clarify constness and state

2019-10-15 Thread François Dumont


    * src/c++11/debug.cc (print_field): Replace constness_names 
    entry with . Replace state_names  entry 
with

    .

Committed as trivial.

François

Index: src/c++11/debug.cc
===
--- src/c++11/debug.cc	(rÃ©vision 277048)
+++ src/c++11/debug.cc	(copie de travail)
@@ -721,7 +721,7 @@
 	static const char*
 	  constness_names[_Error_formatter::__last_constness] =
 	  {
-		"",
+		"",
 		"constant",
 		"mutable"
 	  };
@@ -732,7 +732,7 @@
 	static const char*
 	  state_names[_Error_formatter::__last_state] =
 	  {
-		"",
+		"",
 		"singular",
 		"dereferenceable (start-of-sequence)",
 		"dereferenceable",

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Jiufu Guo

Peter Bergner  writes:

> Segher Boessenkool  writes:
>> So what should we do about this?  There are arguments for *both*
>> behaviours, and apparently with LTO we do not know which flags are
>> explicit?
>
> Actually, from my testing, it seems the rs6000_isa_flags_explicit
> flags are set correctly in LTO!
>
>
>
> On 10/15/19 7:45 AM, Jiufu Guo wrote:
>> The difference between 'with LTO' and 'without LTO' is:
>> Wit LTO, if a function does not have target attribute in source code,
>> then using option attribute from command line(e.g. -mcpu=xxx) as
>> isa_flags; and if features (e.g. -mno-vsx, -mvsx) are specified on
>> command line, isa_flags_explicit is also set.
>> If a function has target attribute in source code explicitly,
>> then isa_flags_explicit is set to reflect feature is specified
>> explicitly besides explicit features from command line; and isa_flags is
>> set as the effective features.
>> 
>> Without LTO, if a function does not have target attribute in source code,
>> then it's isa_flags_explicit and isa_flags are as NULL (target_options
>> == NULL). 
>> If a function has target attribute in source code, it is same as 'with
>> LTO'. 
>
> After reading this a few times and compiling a few unit tests cases in
> gdb/cc1/lto1, I agree with your explanation above and with your addition
> to my patch.  However, how about a slight modification to your change
> with some updated comments?  Updated patch below.
Great update. 
>
> This also incorporates Segher and Andreas's changes to my test case.
> I have not looked at your extra test cases, so I have not added them
> to this patch, but I like the idea of adding test cases that exercise
> this code in LTO context.
>
> I am going on vacation tomorrow, so Jiufu, can you take over ownership
> of this combined patch, along with your extra test cases?  I have not
> bootstrapped or regtested this, so that still needs doing.  If you're
> busy, I can pick this up when I return to work on Monday.
Sure, I will add test cases and do bootstrap/regtest.

Jiufu
BR
>
> Peter
>
>
> gcc/
>   PR target/70010
>   * config/rs6000/rs6000.c (rs6000_can_inline_p): Prohibit inlining if
>   the callee explicitly disables some isa_flags the caller is using.
>
> gcc.testsuite/
>   PR target/70010
>   * gcc.target/powerpc/pr70010.c: New test.
>
> Index: gcc/config/rs6000/rs6000.c
> ===
> --- gcc/config/rs6000/rs6000.c(revision 276975)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -23964,25 +23964,31 @@ rs6000_can_inline_p (tree caller, tree c
>tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
>tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
>  
> -  /* If callee has no option attributes, then it is ok to inline.  */
> +  /* If the callee has no option attributes, then it is ok to inline.  */
>if (!callee_tree)
>  ret = true;
>  
> -  /* If caller has no option attributes, but callee does then it is not ok to
> - inline.  */
> -  else if (!caller_tree)
> -ret = false;
> -
>else
>  {
> -  struct cl_target_option *caller_opts = TREE_TARGET_OPTION 
> (caller_tree);
> +  HOST_WIDE_INT caller_isa;
>struct cl_target_option *callee_opts = TREE_TARGET_OPTION 
> (callee_tree);
> +  HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
> +  HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;
> +
> +  /* If the caller has option attributes, then use them.
> +  Otherwise, use the command line options.  */
> +  if (caller_tree)
> + caller_isa = TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags;
> +  else
> + caller_isa = rs6000_isa_flags;
>  
> -  /* Callee's options should a subset of the caller's, i.e. a vsx 
> function
> -  can inline an altivec function but a non-vsx function can't inline a
> -  vsx function.  */
> -  if ((caller_opts->x_rs6000_isa_flags & callee_opts->x_rs6000_isa_flags)
> -   == callee_opts->x_rs6000_isa_flags)
> +  /* The callee's options must be a subset of the caller's options, i.e.
> +  a vsx function may inline an altivec function, but a non-vsx function
> +  must not inline a vsx function.  However, for those options that the
> +  callee has explicitly set, then we must enforce that the callee's
> +  and caller's options match exactly; see PR70010.  */
> +  if (((caller_isa & callee_isa) == callee_isa)
> +   && (caller_isa & explicit_isa) == (callee_isa & explicit_isa))
>   ret = true;
>  }
>  
> Index: gcc/testsuite/gcc.target/powerpc/pr70010.c
> ===
> --- gcc/testsuite/gcc.target/powerpc/pr70010.c(nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr70010.c(working copy)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2

Ping: [PATCH v5] Missed function specialization + partial devirtualization

2019-10-15 Thread luoxhu


Ping:

Attachment: v5-0001-Missed-function-specialization-partial-devirtuali.patch:

https://gcc.gnu.org/ml/gcc-patches/2019-09/txtuTT17jV7n5.txt


Thanks,
Xiong Hu


On 2019/9/27 15:13, luoxhu wrote:

Hi Martin,

Thanks for your time of so many round of reviews.
It really helped me a lot.
Updated with your comments and attached for Honza's review and approve.  :)


Xiong Hu
BR

On 2019/9/26 16:36, Martin Liška wrote:

On 9/26/19 7:23 AM, luoxhu wrote:

Thanks Martin,


On 2019/9/25 18:57, Martin Liška wrote:

On 9/25/19 5:45 AM, luoxhu wrote:

Hi,

Sorry for replying so late due to cauldron conference and other LTO 
issues

I was working on.


Hello.

That's fine, we still have plenty of time for patch review.

Not fixed issues which I reported in v3 (and still valid in v4):
- please come up with indirect_target_info::indirect_target_info and 
use it

Sorry for miss out.


Hello.

Sure, please use a contructor initialization (see my patch).




- do you need to stream out indirect_call_targets when common_target_id 
== 0?


No need to stream out items with common_target_id == 0, removed the if 
condition in lto-cgraph.c.


Fine. Do we have a guarantee that item->common_target_id is always != 0? 
Please put there an assert.






Then I'm suggesting to use vec::is_empty (please see my patch).
OK.  But has_multiple_indirect_call_p should return different than 
has_indirect_call_p as it checks more that one targets?


Sure, that was mistake in my patch from previous reply.



gcc/cgraph.c
/* Return true if this edge has multiple indirect call targets.  */
  bool
  cgraph_edge::has_multiple_indirect_call_p (void)
  {
-  return indirect_info && indirect_info->indirect_call_targets
-    && indirect_info->indirect_call_targets->length () > 1;
+  return (indirect_info && indirect_info->indirect_call_targets
+ && indirect_info->indirect_call_targets->length () > 1);
  }

  /* Return true if this edge has at least one indirect call target.  */
  bool
  cgraph_edge::has_indirect_call_p (void)
  {
-  return indirect_info && indirect_info->indirect_call_targets
-    && indirect_info->indirect_call_targets->length ();
+  return (indirect_info && indirect_info->indirect_call_targets
+ && !indirect_info->indirect_call_targets->is_empty ());
  }



I see following failures for the tests provided:
FAIL: gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c compilation,  
-fprofile-generate -D_PROFILE_GENERATE
FAIL: gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c compilation,  
-fprofile-generate -D_PROFILE_GENERATE
FAIL: gcc.dg/tree-prof/indir-call-prof-topn.c compilation,  
-fprofile-generate -D_PROFILE_GENERATE


Sorry that I forgot to remove the deprecated build option in the 3 cases
(also updated the scan exp check):
-/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate 
--param indir-call-topn-profile=1" } */

+/* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate" } */


The new patch is attached.  Thanks.


Hm, looking at the gimple_ic_transform function. I think the function 
should always

return false as it never does a GIMPLE transformation.

Apart from that, I'm fine with the patch. Note that I'm not the 
maintainer, but I bet we simplified

the patch review to Honza significantly.

Last missing piece is probably the update ChangeLog.

Thank you for working on that,
Martin




Xiong Hu



Next comments follow directly in the email body:



v4 Changes:
   1. Rebase to trunk.
   2. Remove num_of_ics and use vector's length to avoid redundancy.
   3. Update the code in ipa-profile.c to improve review feasibility.
   4. Add function has_indirect_call_p and has_multiple_indirect_call_p.
   5. For parameter control, I will leave it to next patch as it is a
  relative independent function.  Currently, maximum number of
  promotions is GCOV_TOPN_VALUES as only 4 profiling value limited
  from profile-generate, therefore minimum probability is adjusted to
  25% in value-prof.c, it was 75% also by hard code for single
  indirect target.  No control to minimal number of edge
  executions yet.  What's more, this patch is a bit large now.

This patch aims to fix PR69678 caused by PGO indirect call profiling
performance issues.
The bug that profiling data is never working was fixed by Martin's pull
back of topN patches, performance got GEOMEAN ~1% improvement(+24% for
511.povray_r specifically).
Still, currently the default profile only generates SINGLE indirect 
target

that called more than 75%.  This patch leverages MULTIPLE indirect
targets use in LTO-WPA and LTO-LTRANS stage, as a result, function
specialization, profiling, partial devirtualization, inlining and
cloning could be done successfully based on it.
Performance can get improved from 0.70 sec to 0.38 sec on simple tests.
Details are:
    1.  PGO with topn is enabled by default now, but only one indirect
    target edge will be generated in ipa-profile pass, so add 
variables to enable
    multi

Re: [ C++ ] [ PATCH ] [ RFC ] p1301 - [[nodiscard("should have a reason")]]

2019-10-15 Thread JeanHeyd Meneide

I am also not very smart, wherein I attach patches that do not have the tests.

Sorry!
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index e3c602fbb8d..fb05b5f8af0 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -353,13 +353,14 @@ c_common_has_attribute (cpp_reader *pfile)
  else if (is_attribute_p ("deprecated", attr_name))
result = 201309;
  else if (is_attribute_p ("maybe_unused", attr_name)
-  || is_attribute_p ("nodiscard", attr_name)
   || is_attribute_p ("fallthrough", attr_name))
result = 201603;
  else if (is_attribute_p ("no_unique_address", attr_name)
   || is_attribute_p ("likely", attr_name)
   || is_attribute_p ("unlikely", attr_name))
result = 201803;
+ else if (is_attribute_p ("nodiscard", attr_name))
+   result = 201907;
  if (result)
attr_name = NULL_TREE;
}
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 364af72e68d..3365219636f 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "convert.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "escaped_string.h"
 
 static tree convert_to_pointer_force (tree, tree, tsubst_flags_t);
 static tree build_type_conversion (tree, tree);
@@ -1026,22 +1027,45 @@ maybe_warn_nodiscard (tree expr, impl_conv_void 
implicit)
 
   tree rettype = TREE_TYPE (type);
   tree fn = cp_get_fndecl_from_callee (callee);
+  tree attr;
   if (implicit != ICV_CAST && fn
-  && lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn)))
+  && (attr = lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn
 {
+  escaped_string msg;
+  tree args = TREE_VALUE(attr);
+  const bool has_string_arg = args && TREE_CODE (TREE_VALUE (args)) == 
STRING_CST;
+  if (has_string_arg)
+msg.escape (TREE_STRING_POINTER (TREE_VALUE (args)));
+  const bool has_msg = msg;
+  const char* format = (has_msg ?
+   G_("ignoring return value of %qD, "
+  "declared with attribute %: %<%s%>") :
+   G_("ignoring return value of %qD, "
+  "declared with attribute %%s"));
+  const char* raw_msg = (has_msg ? static_cast(msg) : "");
   auto_diagnostic_group d;
   if (warning_at (loc, OPT_Wunused_result,
- "ignoring return value of %qD, "
- "declared with attribute nodiscard", fn))
+ format, fn, raw_msg))
inform (DECL_SOURCE_LOCATION (fn), "declared here");
 }
   else if (implicit != ICV_CAST
-  && lookup_attribute ("nodiscard", TYPE_ATTRIBUTES (rettype)))
+  && (attr = lookup_attribute ("nodiscard", TYPE_ATTRIBUTES 
(rettype
 {
+  escaped_string msg;
+  tree args = TREE_VALUE(attr);
+  const bool has_string_arg = args && TREE_CODE (TREE_VALUE (args)) == 
STRING_CST;
+  if (has_string_arg)
+msg.escape (TREE_STRING_POINTER (TREE_VALUE (args)));
+  const bool has_msg = msg;
+  const char* format = has_msg ?
+   G_("ignoring return value of type %qT, "
+  "declared with attribute %: %<%s%>") :
+   G_("ignoring return value of type %qT, "
+  "declared with attribute %%s");
+  const char* raw_msg = (has_msg ? static_cast(msg) : "");
   auto_diagnostic_group d;
   if (warning_at (loc, OPT_Wunused_result,
- "ignoring returned value of type %qT, "
- "declared with attribute nodiscard", rettype))
+ format, rettype, raw_msg))
{
  if (fn)
inform (DECL_SOURCE_LOCATION (fn),
@@ -1050,24 +1074,59 @@ maybe_warn_nodiscard (tree expr, impl_conv_void 
implicit)
  "%qT declared here", rettype);
}
 }
-  else if (TREE_CODE (expr) == TARGET_EXPR
-  && lookup_attribute ("warn_unused_result", TYPE_ATTRIBUTES (type)))
+  else if (TREE_CODE (expr) == TARGET_EXPR)
 {
   /* The TARGET_EXPR confuses do_warn_unused_result into thinking that the
 result is used, so handle that case here.  */
-  if (fn)
+  if (lookup_attribute ("warn_unused_result", TYPE_ATTRIBUTES (type)))
{
- auto_diagnostic_group d;
- if (warning_at (loc, OPT_Wunused_result,
- "ignoring return value of %qD, "
- "declared with attribute %",
- fn))
-   inform (DECL_SOURCE_LOCATION (fn), "declared here");
+ if (fn)
+   {
+ auto_diagnostic_group d;
+ if (warning_at (loc, OPT_Wunused_result,
+ "ignoring return value of %qD, "
+ "declared with attribute %",
+ fn))
+   inform (DECL_SOURCE_LOCATION (fn), "declared here");
+

Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-10-15 Thread Feng Xue OS

Hi Philipp,

   This is an updated patch based on comments form Michael, and if he think 
this is ok, we will merge it into trunk. Thanks,

Feng


From: Philipp Tomsich 
Sent: Tuesday, October 15, 2019 11:49 PM
To: Feng Xue OS
Cc: Michael Matz; Richard Biener; gcc-patches@gcc.gnu.org; Christoph Müllner; 
erick.oc...@theobroma-systems.com
Subject: Re: [PATCH V3] Loop split upon semi-invariant condition (PR 
tree-optimization/89134)

Feng,

This looks good from our side and has shown useful (combined with the other 2 
patches) in
our testing with SPEC2017.
Given that this looks final: what is the plan for getting this merged?

Thanks,
Philipp.

> On 12.09.2019, at 12:23, Feng Xue OS  
> wrote:
>
> ---
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1391a562c35..28981fa1048 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11418,6 +11418,19 @@ The maximum number of branches unswitched in a 
> single loop.
> @item lim-expensive
> The minimum cost of an expensive expression in the loop invariant motion.
>
> +@item max-cond-loop-split-insns
> +In a loop, if a branch of a conditional statement is selected since certain
> +loop iteration, any operand that contributes to computation of the 
> conditional
> +expression remains unchanged in all following iterations, the statement is
> +semi-invariant, upon which we can do a kind of loop split transformation.
> +@option{max-cond-loop-split-insns} controls maximum number of insns to be
> +added due to loop split on semi-invariant conditional statement.
> +
> +@item min-cond-loop-split-prob
> +When FDO profile information is available, @option{min-cond-loop-split-prob}
> +specifies minimum threshold for probability of semi-invariant condition
> +statement to trigger loop split.
> +
> @item iv-consider-all-candidates-bound
> Bound on number of candidates for induction variables, below which
> all candidates are considered for each use in induction variable
> diff --git a/gcc/params.def b/gcc/params.def
> index 13001a7bb2d..12bc8c26c9e 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
>   "The maximum number of unswitchings in a single loop.",
>   3, 0, 0)
>
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> + "max-cond-loop-split-insns",
> + "The maximum number of insns to be added due to loop split on "
> + "semi-invariant condition statement.",
> + 100, 0, 0)
> +
> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> + "min-cond-loop-split-prob",
> + "The minimum threshold for probability of semi-invariant condition "
> + "statement to trigger loop split.",
> + 30, 0, 100)
> +
> /* The maximum number of insns in loop header duplicated by the copy loop
>headers pass.  */
> DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C 
> b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> new file mode 100644
> index 000..51f9da22fc7
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> +
> +#include 
> +#include 
> +
> +using namespace std;
> +
> +class  A
> +{
> +public:
> +  bool empty;
> +  void set (string s);
> +};
> +
> +class  B
> +{
> +  map m;
> +  void f ();
> +};
> +
> +extern A *ga;
> +
> +void B::f ()
> +{
> +  for (map::iterator iter = m.begin (); iter != m.end (); 
> ++iter)
> +{
> +  if (ga->empty)
> +ga->set (iter->second);
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } 
> */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> new file mode 100644
> index 000..bbd522d6bcd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> +
> +__attribute__((pure)) __attribute__((noinline)) int inc (int i)
> +{
> +  return i + 1;
> +}
> +
> +extern int do_something (void);
> +extern int b;
> +
> +void test(int n)
> +{
> +  int i;
> +
> +  for (i = 0; i < n; i = inc (i))
> +{
> +  if (b)
> +b = do_something();
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } 
> */
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index f5f083384bc..e4a1b6d2019 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -32,7 +32,10 @@ along with GCC; see the file COPYING3.  If not see
> #include "tree-ssa-loop.h"
> #include "tree-ssa-loop-manip.h"
> #include "tree-into-ssa.h"
> +#include "tree-inline.h"
> +#include "tree-cfgcleanup.h"
> #include "cfgloop.h"
> +#includ

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Peter Bergner

Segher Boessenkool  writes:
> So what should we do about this?  There are arguments for *both*
> behaviours, and apparently with LTO we do not know which flags are
> explicit?

Actually, from my testing, it seems the rs6000_isa_flags_explicit
flags are set correctly in LTO!



On 10/15/19 7:45 AM, Jiufu Guo wrote:
> The difference between 'with LTO' and 'without LTO' is:
> Wit LTO, if a function does not have target attribute in source code,
> then using option attribute from command line(e.g. -mcpu=xxx) as
> isa_flags; and if features (e.g. -mno-vsx, -mvsx) are specified on
> command line, isa_flags_explicit is also set.
> If a function has target attribute in source code explicitly,
> then isa_flags_explicit is set to reflect feature is specified
> explicitly besides explicit features from command line; and isa_flags is
> set as the effective features.
> 
> Without LTO, if a function does not have target attribute in source code,
> then it's isa_flags_explicit and isa_flags are as NULL (target_options
> == NULL). 
> If a function has target attribute in source code, it is same as 'with
> LTO'. 

After reading this a few times and compiling a few unit tests cases in
gdb/cc1/lto1, I agree with your explanation above and with your addition
to my patch.  However, how about a slight modification to your change
with some updated comments?  Updated patch below.

This also incorporates Segher and Andreas's changes to my test case.
I have not looked at your extra test cases, so I have not added them
to this patch, but I like the idea of adding test cases that exercise
this code in LTO context.

I am going on vacation tomorrow, so Jiufu, can you take over ownership
of this combined patch, along with your extra test cases?  I have not
bootstrapped or regtested this, so that still needs doing.  If you're
busy, I can pick this up when I return to work on Monday.

Peter


gcc/
PR target/70010
* config/rs6000/rs6000.c (rs6000_can_inline_p): Prohibit inlining if
the callee explicitly disables some isa_flags the caller is using.

gcc.testsuite/
PR target/70010
* gcc.target/powerpc/pr70010.c: New test.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 276975)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -23964,25 +23964,31 @@ rs6000_can_inline_p (tree caller, tree c
   tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
 
-  /* If callee has no option attributes, then it is ok to inline.  */
+  /* If the callee has no option attributes, then it is ok to inline.  */
   if (!callee_tree)
 ret = true;
 
-  /* If caller has no option attributes, but callee does then it is not ok to
- inline.  */
-  else if (!caller_tree)
-ret = false;
-
   else
 {
-  struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
+  HOST_WIDE_INT caller_isa;
   struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+  HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
+  HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;
+
+  /* If the caller has option attributes, then use them.
+Otherwise, use the command line options.  */
+  if (caller_tree)
+   caller_isa = TREE_TARGET_OPTION (caller_tree)->x_rs6000_isa_flags;
+  else
+   caller_isa = rs6000_isa_flags;
 
-  /* Callee's options should a subset of the caller's, i.e. a vsx function
-can inline an altivec function but a non-vsx function can't inline a
-vsx function.  */
-  if ((caller_opts->x_rs6000_isa_flags & callee_opts->x_rs6000_isa_flags)
- == callee_opts->x_rs6000_isa_flags)
+  /* The callee's options must be a subset of the caller's options, i.e.
+a vsx function may inline an altivec function, but a non-vsx function
+must not inline a vsx function.  However, for those options that the
+callee has explicitly set, then we must enforce that the callee's
+and caller's options match exactly; see PR70010.  */
+  if (((caller_isa & callee_isa) == callee_isa)
+ && (caller_isa & explicit_isa) == (callee_isa & explicit_isa))
ret = true;
 }
 
Index: gcc/testsuite/gcc.target/powerpc/pr70010.c
===
--- gcc/testsuite/gcc.target/powerpc/pr70010.c  (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr70010.c  (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -finline-functions" } */
+/* { dg-final { scan-assembler {\mbl vadd_no_vsx\M} } } */
+
+typedef int vec_t __attribute__((vector_size(16)));
+
+static vec_t
+__attribute__((__target__("no-vsx")))
+vadd_no_vsx (vec_t a, vec_t b)
+{
+  return a + b;
+}
+
+vec_t
+__attribute__((__target__("vsx")

[ C++ ] [ PATCH ] [ RFC ] p1301 - [[nodiscard("should have a reason")]]

2019-10-15 Thread JeanHeyd Meneide

Attached is a patch for p1301 that improves in the way Jason Merrill
specified earlier
(https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00858.html), but it
keeps segfaulting on my build of GCC. I don't know what changes I've
made that cause it to segfault: it does so whenever the error()
function is called but the backtraces aren't showing me anything
conclusive.

The tests test what I expect them to and the output is fine, I just
can't get the segfaults to stop, so I'm putting the patch up so
someone can critique what I've written, or someone else to test it
too. Sorry for taking so long.

Thanks,
JeanHeyd
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index e3c602fbb8d..fb05b5f8af0 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -353,13 +353,14 @@ c_common_has_attribute (cpp_reader *pfile)
  else if (is_attribute_p ("deprecated", attr_name))
result = 201309;
  else if (is_attribute_p ("maybe_unused", attr_name)
-  || is_attribute_p ("nodiscard", attr_name)
   || is_attribute_p ("fallthrough", attr_name))
result = 201603;
  else if (is_attribute_p ("no_unique_address", attr_name)
   || is_attribute_p ("likely", attr_name)
   || is_attribute_p ("unlikely", attr_name))
result = 201803;
+ else if (is_attribute_p ("nodiscard", attr_name))
+   result = 201907;
  if (result)
attr_name = NULL_TREE;
}
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 364af72e68d..3365219636f 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "convert.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "escaped_string.h"
 
 static tree convert_to_pointer_force (tree, tree, tsubst_flags_t);
 static tree build_type_conversion (tree, tree);
@@ -1026,22 +1027,45 @@ maybe_warn_nodiscard (tree expr, impl_conv_void 
implicit)
 
   tree rettype = TREE_TYPE (type);
   tree fn = cp_get_fndecl_from_callee (callee);
+  tree attr;
   if (implicit != ICV_CAST && fn
-  && lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn)))
+  && (attr = lookup_attribute ("nodiscard", DECL_ATTRIBUTES (fn
 {
+  escaped_string msg;
+  tree args = TREE_VALUE(attr);
+  const bool has_string_arg = args && TREE_CODE (TREE_VALUE (args)) == 
STRING_CST;
+  if (has_string_arg)
+msg.escape (TREE_STRING_POINTER (TREE_VALUE (args)));
+  const bool has_msg = msg;
+  const char* format = (has_msg ?
+   G_("ignoring return value of %qD, "
+  "declared with attribute %: %<%s%>") :
+   G_("ignoring return value of %qD, "
+  "declared with attribute %%s"));
+  const char* raw_msg = (has_msg ? static_cast(msg) : "");
   auto_diagnostic_group d;
   if (warning_at (loc, OPT_Wunused_result,
- "ignoring return value of %qD, "
- "declared with attribute nodiscard", fn))
+ format, fn, raw_msg))
inform (DECL_SOURCE_LOCATION (fn), "declared here");
 }
   else if (implicit != ICV_CAST
-  && lookup_attribute ("nodiscard", TYPE_ATTRIBUTES (rettype)))
+  && (attr = lookup_attribute ("nodiscard", TYPE_ATTRIBUTES 
(rettype
 {
+  escaped_string msg;
+  tree args = TREE_VALUE(attr);
+  const bool has_string_arg = args && TREE_CODE (TREE_VALUE (args)) == 
STRING_CST;
+  if (has_string_arg)
+msg.escape (TREE_STRING_POINTER (TREE_VALUE (args)));
+  const bool has_msg = msg;
+  const char* format = has_msg ?
+   G_("ignoring return value of type %qT, "
+  "declared with attribute %: %<%s%>") :
+   G_("ignoring return value of type %qT, "
+  "declared with attribute %%s");
+  const char* raw_msg = (has_msg ? static_cast(msg) : "");
   auto_diagnostic_group d;
   if (warning_at (loc, OPT_Wunused_result,
- "ignoring returned value of type %qT, "
- "declared with attribute nodiscard", rettype))
+ format, rettype, raw_msg))
{
  if (fn)
inform (DECL_SOURCE_LOCATION (fn),
@@ -1050,24 +1074,59 @@ maybe_warn_nodiscard (tree expr, impl_conv_void 
implicit)
  "%qT declared here", rettype);
}
 }
-  else if (TREE_CODE (expr) == TARGET_EXPR
-  && lookup_attribute ("warn_unused_result", TYPE_ATTRIBUTES (type)))
+  else if (TREE_CODE (expr) == TARGET_EXPR)
 {
   /* The TARGET_EXPR confuses do_warn_unused_result into thinking that the
 result is used, so handle that case here.  */
-  if (fn)
+  if (lookup_attribute ("warn_unused_result", TYPE_ATTRIBUTES (type)))
{
- auto_diagnostic_group d;
- if (warning_at (loc, OPT_Wunused_result,
- "ignoring return value

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Peter Bergner

On 10/15/19 11:12 AM, Peter Bergner wrote:
> On 10/15/19 10:44 AM, Richard Biener wrote:
>> On October 15, 2019 5:09:52 PM GMT+02:00, Peter Bergner 
>>  wrote:
>>> If the user explicitly said not to compile a function with a particular
>>> option, how can we justify ignoring that request just because we're
>>> inlining it?  We don't do that for the out of line version of that
>>> callee function.
>>
>> I wonder how you can tell apart explicit vs. Implicit in the LTO context
>> where the option is represented as target attribute on the function.
> 
> Ah, so all of the options used to compile the callee in an LTO context,
> whether they were implicit or explicit will look explicit?  I agree, that
> would be a problem!
> 
> Jiufu, can you see if there is a way to determine whether a callee
> option in the LTO context really was an explicit option versus an
> implicit/default option?  ...or does your follow on patch to my
> patch do that?

So I set a break point on rs6000_can_inline_p() when it's called from
lto1, using a few different unit test cases, and it seems the callee's
->x_rs6000_isa_flags_explicit is set correctly!  That means we can
differentiate between implicitly and explicitly set options. 

Peter

Re: [PATCH 0/2][MSP430] Optimize zero_extend insns and PSImode pointer manipulation

2019-10-15 Thread Segher Boessenkool

On Tue, Oct 15, 2019 at 12:49:18PM -0600, Jeff Law wrote:
> There aren't many that use PSImode.  In general we don't handle partial
> modes well in the optimizers -- largely because they're just not that
> common and the exact size is unspecified.  PSImode for example can be
> anywhere between 16 and 32 bits.  We can't track the sign bit state,
> simplify extensions, etc etc.

For GCC, it has exactly 32 bits, just not all are necessarily significant:

  A @code{MODE_PARTIAL_INT} mode behaves as if it were as wide as the
  corresponding @code{MODE_INT} mode, except that it has an unknown
  number of undefined bits.

So sure, you could use only the low 16 bits of it, for example, if that
works out well for your port.

Such modes cannot (in the portable code) be used for any arithmetic, only
for data movement and the like.

> One of the problems I distinctly remember is the promotion of integer
> loop induction variables to ptrdiff_t then using them in pointer
> arithmetic generated *horrible* code.

Heh.

Segher

Re: [POC v2 PATCH] __builtin_warning

2019-10-15 Thread Joseph Myers

On Tue, 15 Oct 2019, Richard Sandiford wrote:

> > No.  I'd expect the code generating the IR from GCC diagnostics should 
> > arrange for the message to be translated first (while in the case of 
> > __builtin_warning in user code, it would not be translated), and the code 
> > subsequently using the IR to use diagnostic_set_info_translated or 
> > functions that end up calling it without trying to translate.
> 
> Might be a daft question, but would we ever want GCC to introduce new
> calls before LTO streaming?  I guess we shouldn't assume that the user
> who builds the final object will be using the same locale.

In that case, we'd need the IR to distinguish warnings to be translated 
(store original message text and arguments) and warnings from user code 
not to be translated.

-- 
Joseph S. Myers
jos...@codesourcery.com

[committed] hppa: Improve ordering of accesses during function pointer canonicalization

2019-10-15 Thread John David Anglin

The main fix here is to load the relocation offset before the function pointer 
during
function pointer canonicalization.  There is still a small race in 
multi-threaded applications
because the dynamic linker currently updates the relocation offset before the 
function pointer
when doing lazy binding.  When binutils is fixed to double word align function 
descriptors,
it will be possible to atomically update the descriptor using a floating point 
store.  Then,
this race will be eliminated.

Tested on hppa-unknown-linux-gnu with no regressions.  Committed to trunk and 
gcc-9 branch.

Dave

2019-10-15  John David Anglin  

* config/pa/fptr.c (_dl_read_access_allowed): Change argument to
unsigned int.  Adjust callers.
(__canonicalize_funcptr_for_compare): Change plabel type to volatile
unsigned int *.  Load relocation offset before function pointer.
Add barrier to ensure ordering.

Index: config/pa/fptr.c
===
--- config/pa/fptr.c(revision 276964)
+++ config/pa/fptr.c(working copy)
@@ -53,7 +53,7 @@
 extern unsigned int _GLOBAL_OFFSET_TABLE_;

 static inline int
-_dl_read_access_allowed (unsigned int *addr)
+_dl_read_access_allowed (unsigned int addr)
 {
   int result;

@@ -79,7 +79,8 @@
 {
   static unsigned int fixup_plabel[2] __attribute__((used));
   fixup_t fixup;
-  unsigned int *got, *iptr, *plabel;
+  volatile unsigned int *plabel;
+  unsigned int *got, *iptr, reloc_offset;
   int i;

   /* -1 and page 0 are special.  -1 is used in crtend to mark the end of
@@ -94,8 +95,8 @@
  to the entry of the PLT stub just before the global offset table.
  The second word in the plabel contains the relocation offset for the
  function.  */
-  plabel = (unsigned int *) ((unsigned int) fptr & ~3);
-  if (!_dl_read_access_allowed (plabel))
+  plabel = (volatile unsigned int *) ((unsigned int) fptr & ~3);
+  if (!_dl_read_access_allowed ((unsigned int)plabel))
 return (unsigned int) fptr;

   /* Load first word of candidate descriptor.  It should be a pointer
@@ -102,9 +103,12 @@
  with word alignment and point to memory that can be read.  */
   got = (unsigned int *) plabel[0];
   if (((unsigned int) got & 3) != 0
-  || !_dl_read_access_allowed (got))
+  || !_dl_read_access_allowed ((unsigned int)got))
 return (unsigned int) fptr;

+  /* We need to load the relocation offset before the function address.  */
+  reloc_offset = plabel[1];
+  __sync_synchronize();
   got = (unsigned int *) (plabel[0] + GOT_FROM_PLT_STUB);

   /* Return the address of the function if the plabel has been resolved.  */
@@ -140,7 +144,7 @@

   /* Call fixup to resolve the function address.  got[1] contains the
  link_map pointer and plabel[1] the relocation offset.  */
-  fixup ((struct link_map *) got[1], plabel[1]);
+  fixup ((struct link_map *) got[1], reloc_offset);

   return plabel[0];
 }

[PATCH] genattrtab: Parenthesize expressions correctly (PR92107)

2019-10-15 Thread Segher Boessenkool

As PR92107 shows, genattrtab doesn't parenthesize expressions correctly
(or at all, even).  This fixes it.

I'll commit it as trivial and obvious if my bootstrap with it shows no
problems (or someone tells me not to, of course).


Segher


2019-10-15  Segher Boessenkool  

PR rtl-optimization/92107
* genattrtab.c (write_attr_value) : Parenthesize the
expression written.

---
 gcc/genattrtab.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/genattrtab.c b/gcc/genattrtab.c
index cdf0b5c..2fd8593 100644
--- a/gcc/genattrtab.c
+++ b/gcc/genattrtab.c
@@ -4425,11 +4425,11 @@ write_attr_value (FILE *outf, class attr_desc *attr, 
rtx value)
   goto do_operator;
 
 do_operator:
+  fprintf (outf, "(");
   write_attr_value (outf, attr, XEXP (value, 0));
-  fputc (' ', outf);
-  fputc (op,  outf);
-  fputc (' ', outf);
+  fprintf (outf, " %c ", op);
   write_attr_value (outf, attr, XEXP (value, 1));
+  fprintf (outf, ")");
   break;
 
 case IF_THEN_ELSE:
-- 
1.8.3.1

[Patch][Fortran] OpenACC – permit common blocks in some clauses

2019-10-15 Thread Tobias Burnus


This OpenACC-only patch extends the support for /common/ blocks.

[In OpenMP (4.0 to 5.0, unchanged) and gfortran, common blocks are supported in 
copyin/copyprivate, in firstprivate/lastprivate/private/shared, in 
threadprivate and in declare target.]

For OpenACC, gfortran already supports common blocks for 
device_resident/usedevice/cache/flush/link.

This patch adds them (for OpenACC only) to copy/copyin/copyout, create/delete,
host, pcopy/pcopy_in/pcopy_out, present_or_copy, present_or_copy_in,
present_or_copy_out, present_or_create and self.
[Of those, only "copy()" is also an OpenMP clause name.]
[Cf. OpenACC 2.7 in 1.9 (for the p* variants) and 2.13; the latter is new since 
OpenACC 2.0.]



I think the Fortran part is obvious, once one agrees on the list of clauses; 
and OK from a Fortran's maintainer view.

gcc/gimplify.c: oacc_default_clause contains some changes; there are 
additionally two lines which only differ for ORT_ACC – Hence, it is an 
OpenACC-only change!
The ME change is about privatizing common blocks (I haven't studied this part 
closer.)


@Thomas: Please review
@Jakub, all: comments and approvals are welcome.

Tobias

PS: This patch is the rediffed OG9 (alias OG8) patch 
0793cef408c9937f4c4e2423dd1f7d6a97b9bed3 by Cesar Philippidis from 2016. (Which 
was on gomp-4_0-branch as r240165). Due to the wonders of GIT – when not 
requiring linear history and due to rebasing with GCC9, it is also part of the 
OG9 commit ac6c90812344f4f4cfe4d2f5901c1a9d038a4000 – which in addition also 
does some other things like handling OpenACC device pointers.

2019-10-15  Cesar Philippidis 
	Tobias Burnus  

	gcc/fortran/
	* openmp.c (gfc_match_omp_map_clause): Add and pass allow_commons
	argument.
	(gfc_match_omp_clauses): Update calls to permit common blocks for
	OpenACC's copy/copyin/copyout, create/delete, host,
	pcopy/pcopy_in/pcopy_out, present_or_copy, present_or_copy_in,
	present_or_copy_out, present_or_create and self.

	gcc/
	* gimplify.c (oacc_default_clause): Privatize fortran common blocks.
	(omp_notice_variable): Defer the expansion of DECL_VALUE_EXPR for
	common block decls.

	gcc/testsuite/
	* gfortran.dg/goacc/common-block-1.f90: New test.
	* gfortran.dg/goacc/common-block-2.f90: New test.

	libgomp/
	* testsuite/libgomp.oacc-fortran/common-block-1.f90: New test.
	* testsuite/libgomp.oacc-fortran/common-block-2.f90: New test.
	* testsuite/libgomp.oacc-fortran/common-block-3.f90: New test.

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 5c91fcdfd31..dbcb647ea6a 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -926,10 +926,11 @@ omp_inv_mask::omp_inv_mask (const omp_mask &m) : omp_mask (m)
mapping.  */
 
 static bool
-gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op)
+gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op,
+			  bool allow_common)
 {
   gfc_omp_namelist **head = NULL;
-  if (gfc_match_omp_variable_list ("", list, false, NULL, &head, true)
+  if (gfc_match_omp_variable_list ("", list, allow_common, NULL, &head, true)
   == MATCH_YES)
 {
   gfc_omp_namelist *n;
@@ -1051,7 +1052,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 	  if ((mask & OMP_CLAUSE_COPY)
 	  && gfc_match ("copy ( ") == MATCH_YES
 	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-	   OMP_MAP_TOFROM))
+	   OMP_MAP_TOFROM, openacc))
 	continue;
 	  if (mask & OMP_CLAUSE_COPYIN)
 	{
@@ -1059,7 +1060,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 		{
 		  if (gfc_match ("copyin ( ") == MATCH_YES
 		  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-		   OMP_MAP_TO))
+		   OMP_MAP_TO, true))
 		continue;
 		}
 	  else if (gfc_match_omp_variable_list ("copyin (",
@@ -1070,7 +1071,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 	  if ((mask & OMP_CLAUSE_COPYOUT)
 	  && gfc_match ("copyout ( ") == MATCH_YES
 	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-	   OMP_MAP_FROM))
+	   OMP_MAP_FROM, true))
 	continue;
 	  if ((mask & OMP_CLAUSE_COPYPRIVATE)
 	  && gfc_match_omp_variable_list ("copyprivate (",
@@ -1080,7 +1081,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 	  if ((mask & OMP_CLAUSE_CREATE)
 	  && gfc_match ("create ( ") == MATCH_YES
 	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-	   OMP_MAP_ALLOC))
+	   OMP_MAP_ALLOC, true))
 	continue;
 	  break;
 	case 'd':
@@ -1116,7 +1117,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 	  if ((mask & OMP_CLAUSE_DELETE)
 	  && gfc_match ("delete ( ") == MATCH_YES
 	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
-	   OMP_MAP_RELEASE))
+	   OMP_MAP_RELEASE, true))
 	continue;
 	  if ((mask & OMP_CLAUSE_DEPEND)
 	  && gfc_match ("depend ( ") == MATCH_YES)
@@ -1168,12 +1169,12 @@ gfc_match_omp_cl

[PATCH] handle string copies with non-constant lengths (PR 91996)

2019-10-15 Thread Martin Sebor


The attached patch removes a FIXME added recently to the strlen
pass as a reminder to extend the handling of multi-byte stores
of characters copied from non-constant strings with constant
lengths to strings with non-constant lengths in some known range.
For the string length range information it relies on the EVRP
instance introduced into the pass with the sprintf integration
and so far only used by sprintf.  (This is just a small first
step in generalizing the strlen pass to take advantage of strlen
ranges.)

Tested on x86_64-linux.

Martin
PR tree-optimization/91996 - fold non-constant strlen relational expressions

gcc/testsuite/ChangeLog:

	PR tree-optimization/91996
	* gcc.dg/strlenopt-80.c: New test.
	* gcc.dg/strlenopt-81.c: New test.

gcc/ChangeLog:

	PR tree-optimization/91996
	* tree-ssa-strlen.c (maybe_warn_pointless_strcmp): Improve location
	information.
	(compare_nonzero_chars): Add an overload.
	(count_nonzero_bytes): Add an argument.  Call overload above.
	Handle non-constant lengths in some range.
	(handle_store): Add an argument.
	(check_and_optimize_stmt): Pass an argument to handle_store.

diff --git a/gcc/testsuite/gcc.dg/strlenopt-80.c b/gcc/testsuite/gcc.dg/strlenopt-80.c
new file mode 100644
index 000..c25d727c067
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strlenopt-80.c
@@ -0,0 +1,95 @@
+/* PR tree-optimization/91996 - fold strlen relational expressions
+   { dg-do compile }
+   { dg-options "-O2 -Wall -fdump-tree-optimized" } */
+
+#define CHAR_BIT  __CHAR_BIT__
+#define SIZE_MAX  __SIZE_MAX__
+#define LEN_MAX   (__PTRDIFF_MAX__ - 2)
+
+typedef __PTRDIFF_TYPE__ ptrdiff_t;
+typedef __SIZE_TYPE__size_t;
+
+extern void* memcpy (void*, const void*, size_t);
+extern size_t strlen (const char*);
+
+#define CONCAT(a, b) a ## b
+#define CAT(a, b)CONCAT (a, b)
+
+extern void sink (void*, ...);
+extern void failure_on_line (int);
+
+extern char src[];
+extern char dst[];
+
+/* Copy (1 << NCPYLOG) bytes from an unknown string SRC with strlen (SRC)
+   in the range [MINSRCLEN, MAXSRCLEN] into DST + DSTOFF and verify that
+   strlen (DST + DSTOFF) is in the range [MINDSTLEN, MAXDSTLEN].  */
+#define MIN_MAX(dst, dstoff, src,	\
+	 minsrclen, maxsrclen, mindstlen, maxdstlen, ncpylog)	\
+  void CAT (test_on_line_, __LINE__) (void)\
+  {	\
+size_t srclen = strlen (src);	\
+if ((minsrclen) <= srclen && srclen <= (maxsrclen)) {		\
+  char *d = (dst) + (dstoff);	\
+  memcpy (d, src, (size_t)1 << (ncpylog));\
+  size_t dstlen = strlen (d);	\
+  if (dstlen < (mindstlen) || (maxdstlen) < dstlen)			\
+	{\
+	  failure_on_line (__LINE__);	\
+	}\
+  sink (dst, src);			\
+}	\
+  } typedef void dummy_type
+
+// Verify the lower bound of the resulting strlen range.
+#define MIN(dst, dstoff, src, minsrclen, mindstlen, ncpylog)		\
+  MIN_MAX (dst, dstoff, src, minsrclen, LEN_MAX, mindstlen, LEN_MAX, ncpylog)
+
+MIN (dst, 0, src, 2, 1, 0);
+MIN (dst, 0, src, 3, 1, 0);
+MIN (dst, 0, src, 3, 2, 1);
+MIN (dst, 0, src, 3, 2, 2);
+MIN (dst, 0, src, 3, 2, 3);
+
+MIN (dst, 1, src, 2, 1, 0);
+MIN (dst, 1, src, 3, 1, 0);
+MIN (dst, 1, src, 3, 2, 1);
+MIN (dst, 1, src, 3, 2, 2);
+MIN (dst, 1, src, 3, 2, 3);
+
+MIN (dst, 2, src, 2, 1, 0);
+MIN (dst, 3, src, 3, 1, 0);
+MIN (dst, 4, src, 3, 2, 1);
+MIN (dst, 5, src, 3, 2, 2);
+MIN (dst, 6, src, 3, 2, 3);
+
+
+MIN (dst, 0, src, 5, 1, 0);
+MIN (dst, 0, src, 5, 2, 1);
+MIN (dst, 0, src, 5, 4, 2);
+MIN (dst, 0, src, 5, 5, 3);
+MIN (dst, 0, src, 5, 5, 4);
+
+MIN (dst, 11, src, 5, 1, 0);
+MIN (dst, 22, src, 5, 2, 1);
+MIN (dst, 33, src, 5, 4, 2);
+MIN (dst, 44, src, 5, 5, 3);
+MIN (dst, 55, src, 5, 5, 4);
+
+MIN (dst, 11, src, LEN_MAX, 1, 0);
+MIN (dst, 22, src, LEN_MAX, 2, 1);
+MIN (dst, 33, src, LEN_MAX, 4, 2);
+MIN (dst, 44, src, LEN_MAX, 5, 3);
+MIN (dst, 55, src, LEN_MAX, 5, 4);
+MIN (dst, 66, src, LEN_MAX, 9, 8);
+MIN (dst, 66, src, LEN_MAX, LEN_MAX, sizeof (ptrdiff_t) * CHAR_BIT - 1);
+
+
+MIN_MAX (dst, 0, src, 3, 5, 1, LEN_MAX, 0);
+MIN_MAX (dst, 0, src, 3, 5, 2, LEN_MAX, 1);
+MIN_MAX (dst, 0, src, 3, 5, 3, LEN_MAX, 2);
+
+/* Upper bound not implemented yet.
+   MIN_MAX (dst, 0, src, 3, 5, 3, 5, 3);  */
+
+/* { dg-final { scan-tree-dump-times "failure_on_line \\(" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/strlenopt-81.c b/gcc/testsuite/gcc.dg/strlenopt-81.c
new file mode 100644
index 000..fc4b6995a04
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strlenopt-81.c
@@ -0,0 +1,190 @@
+/* PR tree-optimization/ - fold strlen relational expressions
+   { dg-do run }
+   { dg-options "-O2 "-Wall -fdump-tree-optimized" } */
+
+typedef __SIZE_TYPE__ size_t;
+
+#define NOIPA   __attribute__ ((noipa))
+
+#define CONCAT(a, b) a ## b
+#define CAT(a, b)CONCAT (a, b)
+
+/* Used in tests where EXPR is expected to be folded to false.  */
+#define ELIM(expr)			\
+  if (expr) {\
+extern void\
+  CAT (CAT (test_on_line_, __L

[Committed] Two new bit-field compile testcases

2019-10-15 Thread Andrew Pinski

Hi all,
  While working on implementing lowering of bit-field accesses in
gimple, I ran into an ICE which was not covered by the current
testsuite.

Committed these two new testcases as obvious.

Thanks,
Andrew Pinski

ChangeLog:
* gcc.c-torture/compile/20191015-1.c: New test.
* gcc.c-torture/compile/20191015-2.c: New test.
Index: gcc/testsuite/gcc.c-torture/compile/20191015-1.c
===
--- gcc/testsuite/gcc.c-torture/compile/20191015-1.c(nonexistent)
+++ gcc/testsuite/gcc.c-torture/compile/20191015-1.c(working copy)
@@ -0,0 +1,17 @@
+typedef unsigned uint32_t;
+typedef unsigned char uint8_t;
+typedef unsigned short uint16_t;
+typedef struct {
+ uint32_t mbxOwner:1;
+ uint32_t mbxHc:1;
+ uint32_t mbxReserved:6;
+ uint32_t mbxCommand : 8;
+ uint32_t mbxStatus : 16;
+} MAILBOX_t;
+uint32_t f(void) {
+   uint32_t mbox;
+ mbox = 0;
+ ((MAILBOX_t *)&mbox)->mbxCommand = 0x24;
+ ((MAILBOX_t *)&mbox)->mbxOwner = 1;
+return mbox;
+}
Index: gcc/testsuite/gcc.c-torture/compile/20191015-2.c
===
--- gcc/testsuite/gcc.c-torture/compile/20191015-2.c(nonexistent)
+++ gcc/testsuite/gcc.c-torture/compile/20191015-2.c(working copy)
@@ -0,0 +1,17 @@
+typedef unsigned uint32_t;
+typedef unsigned char uint8_t;
+typedef unsigned short uint16_t;
+typedef struct {
+ uint8_t mbxOwner:1;
+ uint8_t mbxHc:1;
+ uint8_t mbxReserved:6;
+ uint8_t mbxCommand : 8;
+ uint16_t mbxStatus : 16;
+} MAILBOX_t;
+uint32_t f(void) {
+   uint32_t mbox;
+ mbox = 0;
+ ((MAILBOX_t *)&mbox)->mbxCommand = 0x24;
+ ((MAILBOX_t *)&mbox)->mbxOwner = 1;
+return mbox;
+}

[PATCH] ostreambuf_iterator std::advance overload

2019-10-15 Thread François Dumont


Hi

    I completed this overload before noticing that the Standard do not 
expect anything when 'advancing' an output iterator.


    But as I've done it consistenly with the istreambuf_iterator here 
it is with samples about how to use it.


    Let me know if acceptable.

François

diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h b/libstdc++-v3/include/bits/streambuf_iterator.h
index 17dd3972d45..76ba96d634a 100644
--- a/libstdc++-v3/include/bits/streambuf_iterator.h
+++ b/libstdc++-v3/include/bits/streambuf_iterator.h
@@ -242,6 +242,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	copy(istreambuf_iterator<_CharT2>, istreambuf_iterator<_CharT2>,
 	 ostreambuf_iterator<_CharT2>);
 
+  template
+	friend typename __gnu_cxx::__enable_if<__is_char<_CharT2>::__value,
+	   void>::__type
+	advance(ostreambuf_iterator<_CharT2>&, _Distance);
+
 private:
   streambuf_type*	_M_sbuf;
   bool		_M_failed;
@@ -453,6 +458,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
 }
 
+  template
+typename __gnu_cxx::__enable_if<__is_char<_CharT>::__value,
+void>::__type
+advance(ostreambuf_iterator<_CharT>& __i, _Distance __n)
+{
+  if (__n == 0)
+	return;
+
+  __glibcxx_assert(__n > 0);
+  __glibcxx_requires_cond(!__i.failed(),
+			  _M_message(__gnu_debug::__msg_advance_oob)
+			  ._M_iterator(__i)
+			  ._M_integer(__n));
+
+  typedef basic_streambuf<_CharT> __streambuf_t;
+  typedef typename __streambuf_t::pos_type __pos_t;
+  __pos_t __cur_pos
+	= __i._M_sbuf->pubseekoff(0, ios_base::cur, ios_base::out);
+  __pos_t __new_pos =
+	__i._M_sbuf->pubseekoff(__n, ios_base::cur, ios_base::out);
+
+  if (__new_pos - __cur_pos != __n)
+	{
+	  __i._M_failed = true;
+	  __glibcxx_requires_cond(!__i.failed(),
+  _M_message(__gnu_debug::__msg_advance_oob)
+  ._M_iterator(__i)
+  ._M_integer(__n));
+	}
+}
+
 // @} group iterators
 
 _GLIBCXX_END_NAMESPACE_VERSION
diff --git a/libstdc++-v3/testsuite/25_algorithms/advance/ostreambuf_iterator/char/1.cc b/libstdc++-v3/testsuite/25_algorithms/advance/ostreambuf_iterator/char/1.cc
new file mode 100644
index 000..15cf0b27863
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/advance/ostreambuf_iterator/char/1.cc
@@ -0,0 +1,55 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+#include 
+#include 
+
+#include 
+
+void test01()
+{
+  using namespace std;
+
+  const char data1[] = "Drei Phantasien nach Friedrich Holderlin";
+  string str1(data1);
+  str1[17] = 'i';
+
+  ostringstream oss1(str1);
+  ostreambuf_iterator beg1(oss1);
+
+  std::advance(beg1, 17);
+  *beg1 = 'a';
+
+  VERIFY( !beg1.failed() );
+  VERIFY( oss1.str() == data1 );
+  str1 = oss1.str();
+
+  // -1 for the trailing '\0'
+  // -1 for the beg1 assignment.
+  std::advance(beg1, sizeof(data1) - 17 - 1 - 1);
+  *beg1 = '.';
+
+  str1 += '.';
+  VERIFY( oss1.str() == str1 );
+}
+
+int main()
+{
+  test01();
+  return 0;
+}
diff --git a/libstdc++-v3/testsuite/25_algorithms/advance/ostreambuf_iterator/char/1_neg.cc b/libstdc++-v3/testsuite/25_algorithms/advance/ostreambuf_iterator/char/1_neg.cc
new file mode 100644
index 000..9d1e4a94742
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/advance/ostreambuf_iterator/char/1_neg.cc
@@ -0,0 +1,40 @@
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do run { xfail *-*-* } }
+// { dg-require-debug-mode "" }
+
+#include 
+#include 
+#include 
+
+void test01()
+{
+  using namespace std;
+
+  const char data1[]

[Darwin, committed] Clarify fix and continue support (NFC).

2019-10-15 Thread Iain Sandoe

This updates the description of the support for fix and continue debugging.
No functional change intended.

tested on x86_64-darwin16,
applied to mainline,
Iain

gcc/ChangeLog:

2019-10-15  Iain Sandoe  

* config/darwin.c: Update description of fix and continue.

diff --git a/gcc/config/darwin.c b/gcc/config/darwin.c
index d34b0035bd..c0fafedfa8 100644
--- a/gcc/config/darwin.c
+++ b/gcc/config/darwin.c
@@ -49,18 +49,24 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "optabs.h"
 
-/* Darwin supports a feature called fix-and-continue, which is used
-   for rapid turn around debugging.  When code is compiled with the
-   -mfix-and-continue flag, two changes are made to the generated code
-   that allow the system to do things that it would normally not be
-   able to do easily.  These changes allow gdb to load in
-   recompilation of a translation unit that has been changed into a
-   running program and replace existing functions and methods of that
-   translation unit with versions of those functions and methods
-   from the newly compiled translation unit.  The new functions access
-   the existing static symbols from the old translation unit, if the
-   symbol existed in the unit to be replaced, and from the new
-   translation unit, otherwise.
+/* Fix and Continue.
+
+   NOTES:
+   1) this facility requires suitable support from a modified version
+   of GDB, which is not provided on any system after MacOS 10.7/Darwin11.
+   2) There is no support for this in any X86 version of the FSF compiler.
+
+   Fix and continue was used on some earlier MacOS systems for rapid turn
+   around debugging.  When code is compiled with the -mfix-and-continue
+   flag, two changes are made to the generated code that allow the system
+   to do things that it would normally not be able to do easily.  These
+   changes allow gdb to load in recompilation of a translation unit that
+   has been changed into a running program and replace existing functions
+   and methods of that translation unit with versions of those functions
+   and methods from the newly compiled translation unit.  The new functions
+   access the existing static symbols from the old translation unit, if the
+   symbol existed in the unit to be replaced, and from the new translation
+   unit, otherwise.
 
The changes are to insert 5 nops at the beginning of all functions
and to use indirection to get at static symbols.  The 5 nops

[Darwin, committed] Update darwin_binds_local_p.

2019-10-15 Thread Iain Sandoe



The use of default_binds_local_p had got out of sync with the varasm
changes, this restores the call to be direct.  In practice, we add some
further tests to determine local binding - but this callback is used for
the initial assessments made by default_encode_section_info().

tested on x86_64-darwin16,
applied to mainline,
thanks
Iain

gcc/ChangeLog:

2019-10-15  Iain Sandoe  

* config/darwin.c (darwin_binds_local_p): Update to call
default_binds_local_p_3 () directly. amend comments.

diff --git a/gcc/config/darwin.c b/gcc/config/darwin.c
index 1d386e0670..d34b0035bd 100644
--- a/gcc/config/darwin.c
+++ b/gcc/config/darwin.c
@@ -3049,8 +3049,14 @@ darwin_file_end (void)
 bool
 darwin_binds_local_p (const_tree decl)
 {
-  return default_binds_local_p_1 (decl,
- TARGET_KEXTABI && DARWIN_VTABLE_P (decl));
+  /* We use the "shlib" input to indicate that a symbol should be
+ considered overridable; only relevant for vtables in kernel modules
+ on earlier system versions, and with a TODO to complete.  */
+  bool force_overridable = TARGET_KEXTABI && DARWIN_VTABLE_P (decl);
+  return default_binds_local_p_3 (decl, force_overridable /* shlib */,
+ false /* weak dominate */,
+ false /* extern_protected_data */,
+ false /* common_local_p */);
 }
 
 /* The Darwin's implementation of TARGET_ASM_OUTPUT_ANCHOR.  Define the

Re: [PATCH] Implement std::advance for istreambuf_iterator using pubseekoff

2019-10-15 Thread François Dumont


Here is an update to set _M_sbuf to null if the user advance too much.

Note that in this case the streambuf remains un-modified which is 
different from the current implementation. I think it is another 
enhancement.


I also change the Debug assertion message for something more dedicated 
to std::advance algo.


François

On 10/14/19 10:12 PM, François Dumont wrote:
The same way I proposed to review std::copy overload for 
istreambuf_iterator we can implement std::advance using pubseekoff.


It is both a cleaner implementation and avoids yet another friend 
declaration.



    * include/std/streambuf
    (advance(istreambuf_iterator<>&, _Distance)): Remove friend 
declaration.
    * include/bits/streambuf_iterator.h (__copy_move_a2): Re-implement 
using

    streambuf pubseekoff.

Tested under Linux x86_64.

Ok to commit ?

François



diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h b/libstdc++-v3/include/bits/streambuf_iterator.h
index 134b3486b9a..17dd3972d45 100644
--- a/libstdc++-v3/include/bits/streambuf_iterator.h
+++ b/libstdc++-v3/include/bits/streambuf_iterator.h
@@ -431,37 +431,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   __glibcxx_assert(__n > 0);
   __glibcxx_requires_cond(!__i._M_at_eof(),
-			  _M_message(__gnu_debug::__msg_inc_istreambuf)
-			  ._M_iterator(__i));
-
-  typedef istreambuf_iterator<_CharT>		   __is_iterator_type;
-  typedef typename __is_iterator_type::traits_type	   traits_type;
-  typedef typename __is_iterator_type::streambuf_type  streambuf_type;
-  typedef typename traits_type::int_type		   int_type;
-  const int_type __eof = traits_type::eof();
-
-  streambuf_type* __sb = __i._M_sbuf;
-  while (__n > 0)
-	{
-	  streamsize __size = __sb->egptr() - __sb->gptr();
-	  if (__size > __n)
+			  _M_message(__gnu_debug::__msg_advance_oob)
+			  ._M_iterator(__i)
+			  ._M_integer(__n));
+
+  typedef basic_streambuf<_CharT> __streambuf_t;
+  typedef typename __streambuf_t::pos_type __pos_t;
+  __pos_t __cur_pos
+	= __i._M_sbuf->pubseekoff(0, ios_base::cur, ios_base::in);
+  __pos_t __new_pos
+	= __i._M_sbuf->pubseekoff(__n, ios_base::cur, ios_base::in);
+  __i._M_c = char_traits<_CharT>::eof();
+
+  if (__new_pos - __cur_pos != __n)
 	{
-	  __sb->__safe_gbump(__n);
-	  break;
-	}
-
-	  __sb->__safe_gbump(__size);
-	  __n -= __size;
-	  if (traits_type::eq_int_type(__sb->underflow(), __eof))
-	{
-	  __glibcxx_requires_cond(__n == 0,
-_M_message(__gnu_debug::__msg_inc_istreambuf)
-._M_iterator(__i));
-	  break;
-	}
+	  __i._M_sbuf = 0;
+	  __glibcxx_requires_cond(!__i._M_at_eof(),
+  _M_message(__gnu_debug::__msg_advance_oob)
+  ._M_iterator(__i)
+  ._M_integer(__n));
 	}
-
-  __i._M_c = __eof;
 }
 
 // @} group iterators
diff --git a/libstdc++-v3/include/std/streambuf b/libstdc++-v3/include/std/streambuf
index 3442f19bd78..ef03da39bc2 100644
--- a/libstdc++-v3/include/std/streambuf
+++ b/libstdc++-v3/include/std/streambuf
@@ -155,11 +155,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 find(istreambuf_iterator<_CharT2>, istreambuf_iterator<_CharT2>,
 	 const _CharT2&);
 
-  template
-friend typename __gnu_cxx::__enable_if<__is_char<_CharT2>::__value,
-	   void>::__type
-advance(istreambuf_iterator<_CharT2>&, _Distance);
-
   template
 friend basic_istream<_CharT2, _Traits2>&
 operator>>(basic_istream<_CharT2, _Traits2>&, _CharT2*);
diff --git a/libstdc++-v3/testsuite/25_algorithms/advance/istreambuf_iterators/char/3.cc b/libstdc++-v3/testsuite/25_algorithms/advance/istreambuf_iterators/char/3.cc
new file mode 100644
index 000..8763e7fa78e
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/advance/istreambuf_iterators/char/3.cc
@@ -0,0 +1,49 @@
+// Copyright (C) 2017-2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// Debug mode would detect the invalid std::advance call.
+// { dg-require-normal-mode "" }
+
+#include 
+#include 
+#include 
+
+#include 
+
+void test01()
+{
+  using namespace std;
+
+  typedef istreambuf_iterator in_iterator_type;
+
+  const char data1[] = "Drei Phantasien nach Friedrich Holderlin";
+  istringstream iss1(data1);
+  in_iterator_type beg1(iss1);
+

[PATCH] diagnose hard errors in concept satisfaction

2019-10-15 Thread Andrew Sutton

Certain errors encountered during constraint satisfaction render the
program ill-formed. Emit those as errors during satisfaction and not
when diagnosing constraint errors.

The errors should include the full context for failure (i.e., when
satisfying X, when satisfying Y, this failed), but we don't build that
information until we know we need to diagnose an error. This patch
does not include that context.

Andrew Sutton


errors.patch
Description: Binary data

Re: [PATCH 0/2][MSP430] Optimize zero_extend insns and PSImode pointer manipulation

2019-10-15 Thread Jeff Law

On 10/15/19 6:49 AM, Jozef Lawrynowicz wrote:
> On Mon, 14 Oct 2019 15:18:08 -0600
> Jeff Law  wrote:
> 
>> On 10/8/19 4:34 AM, Jozef Lawrynowicz wrote:
>>> In the large memory model, MSP430 instructions have some useful properties 
>>> when
>>> performing byte, word or address-word writes to registers or memory:
>>> - Byte-writes to registers clear bits 19:8
>>> - Word-writes to registers clear bits 19:16
>>> - PSImode writes to memory clear bits 16:4 of the second memory word
>>>
>>> This patch makes use of these properties to optimize some zero_extend
>>> instructions.
>>>
>>> There are some "synonyms" for these zero_extend instructions that combine
>>> searches for when optimizing code which manipulates PSImode pointers. The 
>>> patch
>>> adds a number of these unnamed RTL insns.
>>>
>>> The first patch is an "obvious" patch with no functional changes, which just
>>> reorders the zero_extend insns in the md file so we get them in one place.
>>> The second patch has functional changes.
>>>
>>> (Note that the patches will not apply cleanly unless the recently submitted
>>> patch to implement post increment addressing has been applied:
>>> https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00492.html)
>>>
>>> Successfully regtested on trunk in the small and large memory models.
>>>
>>> Ok for trunk?
>>>
>>> Jozef Lawrynowicz (2):
>>>   MSP430: Reorder and group zero_extend insns in msp430.md
>>>   MSP430: PSImode pointer manipulation and zero extend insn
>>> optimizations  
>> FWIW, we often end up describing some of this stuff via
>> WORD_REGISTER_OPERATIONS.  However that may not work for word sizes that
>> are not a power of two bits.
>>
>> I only mention it because it may be worth a bit of experimentation on
>> your end.
> 
> We have WORD_REGISTER_OPERATIONS defined to 1, but UNITS_PER_WORD is always 2
> bytes. But yes I should look at how the RTL passes use this macro.
> 
> I played around with some other macros e.g. TARGET_TRULY_NOOP_TRUNCATION,
> REGMODE_NATURAL_SIZE but none had any desirable effect. It feels like there is
> still something that needs to be fixed which will unlock better RTL generation
> for PSImode operations, but I believe I've exhausted the target macros which
> can influence this. So maybe something in the middle-end needs to be added or
> extended.
> I should also study some of the other ports with PSImode registers more 
> closely.
There aren't many that use PSImode.  In general we don't handle partial
modes well in the optimizers -- largely because they're just not that
common and the exact size is unspecified.  PSImode for example can be
anywhere between 16 and 32 bits.  We can't track the sign bit state,
simplify extensions, etc etc.

One of the problems I distinctly remember is the promotion of integer
loop induction variables to ptrdiff_t then using them in pointer
arithmetic generated *horrible* code.  It's ultimately necessary from a
standard standpoint.


> 
> P.S. I took at the old mn10200 port as you suggested before but I wasn't able
> to improve code gen by changing any of the target macros that were defined
> differently. The combiner patterns for pointer manipulation in the port were
> similar but not close enough to the patterns I added in this patch.
ACK.

jeff

Re: C++ PATCH for c++/92106 - ICE with structured bindings and -Wreturn-local-addr

2019-10-15 Thread Jakub Jelinek

On Tue, Oct 15, 2019 at 02:28:12PM -0400, Marek Polacek wrote:
> --- /dev/null
> +++ gcc/testsuite/g++.dg/cpp1z/decomp50.C
> @@ -0,0 +1,51 @@
> +// PR c++/92106 - ICE with structured bindings and -Wreturn-local-addr. 
> +// { dg-do compile { target c++17 } }
> +
> +template  struct B;
> +template  struct B<_Tp *> { typedef _Tp reference; };

I've looked at the unreduced testcase and it seems to have
template  struct B<_Tp *> { typedef _Tp& reference; };
for this line instead.  I'd think it would be better to change the testcase,
so that it is well defined rather than undefined then.
It is struct iterator_traits<_Tp*>, right?

Otherwise, the patch LGTM, but I'll defer to Jason as the maintainer.

Jakub

Re: C++ PATCH for c++/92106 - ICE with structured bindings and -Wreturn-local-addr

2019-10-15 Thread Marek Polacek

On Tue, Oct 15, 2019 at 07:34:31PM +0200, Jakub Jelinek wrote:
> On Tue, Oct 15, 2019 at 01:17:17PM -0400, Marek Polacek wrote:
> > 2019-10-15  Marek Polacek  
> > 
> > PR c++/92106 - ICE with structured bindings and -Wreturn-local-addr.
> > * typeck.c (maybe_warn_about_returning_address_of_local): Avoid
> > recursing on null initializer.
> > 
> > * g++.dg/cpp1z/decomp50.C: New test.
> > 
> > diff --git gcc/cp/typeck.c gcc/cp/typeck.c
> > index 141d86f50c9..1825540016f 100644
> > --- gcc/cp/typeck.c
> > +++ gcc/cp/typeck.c
> > @@ -9354,10 +9354,8 @@ maybe_warn_about_returning_address_of_local (tree 
> > retval)
> >  binding.  */
> >   tree base = DECL_DECOMP_BASE (whats_returned);
> >   if (TYPE_REF_P (TREE_TYPE (base)))
> > -   {
> > - tree init = DECL_INITIAL (base);
> > +   if (tree init = DECL_INITIAL (base))
> >   return maybe_warn_about_returning_address_of_local (init);
> > -   }
> 
> Actually, seeing the dg-warning in the testcase, I think we shouldn't warn,
> for range-for it is just too hard to find out if it will be returning
> address of a local or not. &value in itself is not address of a local
> variable when the structured binding is a reference.
> Well, in the testcase as is it actually is (perhaps just in the reduced
> one and not original):
> const struct J & D.2293;
> const int name [value-expr: D.2293->name];
> const int value [value-expr: D.2293->value];
> ...
>   struct reference D.2352;
> 
>   try
> {
>   D.2352 = D::operator* (&__for_begin);
>   D.2293 = &D.2352;
> but a small change to the testcase:
>  template  class D {
>  public:
> -  typename B<_Iterator>::reference operator*();
> +  typename B<_Iterator>::reference &operator*();
>void operator++();
>  };
> results in D.2293 = D::operator* (&__for_begin);
> and then it might very well not be address of a local variable.
> 
> This isn't just about a false positive warning, when
> maybe_warn_about_returning_address_of_local returns true, then
> we actually return NULL pointer instead of the value user wanted.
> 
> So, I think you want to add and do else return false;
> if init is NULL.

That's an interesting point, thanks.  Here's a patch with that return
added.

Bootstrapped/regtested on x86_64-linux, ok for trunk / 9?

2019-10-15  Marek Polacek  

PR c++/92106 - ICE with structured bindings and -Wreturn-local-addr.
* typeck.c (maybe_warn_about_returning_address_of_local): Avoid
recursing on null initializer and return false instead.

* g++.dg/cpp1z/decomp50.C: New test.

diff --git gcc/cp/typeck.c gcc/cp/typeck.c
index 141d86f50c9..90bd5bb9cdc 100644
--- gcc/cp/typeck.c
+++ gcc/cp/typeck.c
@@ -9355,8 +9355,10 @@ maybe_warn_about_returning_address_of_local (tree retval)
  tree base = DECL_DECOMP_BASE (whats_returned);
  if (TYPE_REF_P (TREE_TYPE (base)))
{
- tree init = DECL_INITIAL (base);
- return maybe_warn_about_returning_address_of_local (init);
+ if (tree init = DECL_INITIAL (base))
+   return maybe_warn_about_returning_address_of_local (init);
+ else
+   return false;
}
}
   bool w = false;
diff --git gcc/testsuite/g++.dg/cpp1z/decomp50.C 
gcc/testsuite/g++.dg/cpp1z/decomp50.C
new file mode 100644
index 000..13b40146379
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp1z/decomp50.C
@@ -0,0 +1,51 @@
+// PR c++/92106 - ICE with structured bindings and -Wreturn-local-addr. 
+// { dg-do compile { target c++17 } }
+
+template  struct B;
+template  struct B<_Tp *> { typedef _Tp reference; };
+struct C {
+  template  using rebind = _Up *;
+};
+template  class D {
+public:
+  typename B<_Iterator>::reference operator*();
+  void operator++();
+};
+
+template 
+bool operator!=(D<_Iterator, _Container>, D<_Iterator, _Container>);
+template  class F {
+public:
+  typedef _Tp value_type;
+};
+
+template  struct G {
+  template  struct H { using type = C::rebind<_Tp>; };
+  using const_pointer = typename H::type;
+};
+template > class I {
+  typedef D::const_pointer, int> const_iterator;
+
+public:
+  const_iterator begin();
+  const_iterator end();
+};
+
+struct A {
+  struct J {
+int name;
+int value;
+  };
+  I members;
+  template  const int *find(Key) {
+for (const auto &[name, value] : members)
+  // See 
+  // for why we don't warn here.
+  return &value; // { dg-bogus "address of local variable" }
+return nullptr;
+  }
+};
+int main() {
+  A a;
+  a.find("");
+}

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Peter Bergner

On 10/15/19 1:21 PM, Segher Boessenkool wrote:
> On Tue, Oct 15, 2019 at 12:47:02PM -0500, Peter Bergner wrote:
>> I'd say this is user error, telling the compiler it has to inline the callee
>> function, but then using incompatible options on the caller and the callee,
>> so that it cannot.  I think the error message is the correct thing here.
> 
> Everything here is -mno-vsx though?  The always_inline "foo" has it as
> target attribute, but everyone (also) has it from the command line arg?

Oh, I thought main() was compiled with -mvsx.  My bad! :-)

Peter

Re: [PATCH][AArch64] Fix symbol offset limit

2019-10-15 Thread Wilco Dijkstra

Hi Richard,

> Sure, the "extern array of unknown size" case isn't about section anchors.
> But this part of my message (snipped above) was about the other case
> (objects of known size), and applied to individual objects as well as
> section anchors.
>
> What I was trying to say is: yes, we need better offsets for references
> to extern objects of unknown size.  But your patch does that by reducing
> the offset range for all objects, including ones with known sizes in
> which the documented ranges should be safe.
>
> I was trying to explain why I don't think we need to reduce the range
> in that case too.  If offset_within_block_p then any offset should be
> OK (the aggressive interpretation) or the original documented ranges
> should be OK.  I think we only need the smaller range when
> offset_within_block_p returns false.

Right I see now. Yes given offset_within_block_p is not sufficient, we could 
test
both conditions. Here is the updated patch:

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
7ee31a66b12d7354759f06449955e933421f5fe0..31c394a7e8567dd7d4f1698e5ba98aeb8807df38
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14079,26 +14079,30 @@ aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset)
 the offset does not cause overflow of the final address.  But
 we have no way of knowing the address of symbol at compile time
 so we can't accurately say if the distance between the PC and
-symbol + offset is outside the addressible range of +/-1M in the
-TINY code model.  So we rely on images not being greater than
-1M and cap the offset at 1M and anything beyond 1M will have to
-be loaded using an alternative mechanism.  Furthermore if the
-symbol is a weak reference to something that isn't known to
-resolve to a symbol in this module, then force to memory.  */
- if ((SYMBOL_REF_WEAK (x)
-  && !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (offset, -1048575, 1048575))
+symbol + offset is outside the addressible range of +/-1MB in the
+TINY code model.  So we limit the maximum offset to +/-64KB and
+assume the offset to the symbol is not larger than +/-(1MB - 64KB).
+Furthermore force to memory if the symbol is a weak reference to
+something that doesn't resolve to a symbol in this module.  */
+
+ if (SYMBOL_REF_WEAK (x) && !aarch64_symbol_binds_local_p (x))
+   return SYMBOL_FORCE_TO_MEM;
+ if (!(IN_RANGE (offset, -0x1, 0x1)
+   || offset_within_block_p (x, offset)))
return SYMBOL_FORCE_TO_MEM;
+
  return SYMBOL_TINY_ABSOLUTE;
 
case AARCH64_CMODEL_SMALL:
  /* Same reasoning as the tiny code model, but the offset cap here is
-4G.  */
- if ((SYMBOL_REF_WEAK (x)
-  && !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (offset, HOST_WIDE_INT_C (-4294967263),
-   HOST_WIDE_INT_C (4294967264)))
+1MB, allowing +/-3.9GB for the offset to the symbol.  */
+
+ if (SYMBOL_REF_WEAK (x) && !aarch64_symbol_binds_local_p (x))
+   return SYMBOL_FORCE_TO_MEM;
+ if (!(IN_RANGE (offset, -0x10, 0x10)
+   || offset_within_block_p (x, offset)))
return SYMBOL_FORCE_TO_MEM;
+
  return SYMBOL_SMALL_ABSOLUTE;
 
case AARCH64_CMODEL_TINY_PIC:
diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c 
b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
index 
d7e46b059e41f2672b3a1da5506fa8944e752e01..fc6a4f3ec780d9fa86de1c8e1a42a55992ee8b2d
 100644
--- a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
+++ b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
@@ -1,12 +1,12 @@
-/* { dg-do compile } */
+/* { dg-do link } */
 /* { dg-options "-O3 -save-temps -mcmodel=tiny" } */
 
-int fixed_regs[0x0020];
+char fixed_regs[0x0008];
 
 int
-foo()
+main ()
 {
-  return fixed_regs[0x0008];
+  return fixed_regs[0x000ff000];
 }
 
 /* { dg-final { scan-assembler-not "adr\tx\[0-9\]+, fixed_regs\\\+" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range.c 
b/gcc/testsuite/gcc.target/aarch64/symbol-range.c
index 
6574cf4310430b847e77ea56bf8f20ef312d53e4..d8e82fa1b2829fd300b6ccf7f80241e5573e7e17
 100644
--- a/gcc/testsuite/gcc.target/aarch64/symbol-range.c
+++ b/gcc/testsuite/gcc.target/aarch64/symbol-range.c
@@ -1,12 +1,12 @@
-/* { dg-do compile } */
+/* { dg-do link } */
 /* { dg-options "-O3 -save-temps -mcmodel=small" } */
 
-int fixed_regs[0x2ULL];
+char fixed_regs[0x8000];
 
 int
-foo()
+main ()
 {
-  return fixed_regs[0x1ULL];
+  return fixed_regs[0xf000];
 }
 
 /* { dg-final { scan-assembler-not "adrp\tx\[0-9\]+, fixed_regs\\\+" } } */

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Segher Boessenkool

On Tue, Oct 15, 2019 at 12:47:02PM -0500, Peter Bergner wrote:
> On 10/15/19 4:56 AM, Segher Boessenkool wrote:
> > On Tue, Oct 15, 2019 at 05:15:07PM +0800, Jiufu Guo wrote:
> >> And another issue: Behavior is still inconsistent between "-mno-vsx
> >> -flto" and "-mno-vsx" for user code. Previous patch makes it consistent
> >> between "-mvsx -flto" and "-mvsx". 
> > 
> >>  $GCC_BUILD/gcc/xgcc -B$GCC_BUILD/gcc novsx.c -O2 -mno-vsx
> >> /home/guojiufu/temp/novsx.c: In function 'main':
> >> /home/guojiufu/temp/novsx.c:6:1: error: inlining failed in call to 
> >> 'always_inline' 'foo': target specific option mismatch
> > 
> > So what should we do about this?  There are arguments for *both*
> > behaviours, and apparently with LTO we do not know which flags are
> > explicit?
> 
> I'd say this is user error, telling the compiler it has to inline the callee
> function, but then using incompatible options on the caller and the callee,
> so that it cannot.  I think the error message is the correct thing here.

Everything here is -mno-vsx though?  The always_inline "foo" has it as
target attribute, but everyone (also) has it from the command line arg?


Segher

Re: [patch] canonicalize unsigned [1,MAX] ranges into ~[0,0]

2019-10-15 Thread Jakub Jelinek

On Tue, Oct 15, 2019 at 08:35:07AM -0400, Aldy Hernandez wrote:
> > I'm seeing this on 32-bit i386-pc-solaris2.11 and sparc-sun-solaris2.11,
> > with more reports for armv8l, pru, and s390x.
> > 
> > Comparing the dumps between 64 and 32-bit, I see
> > 
> > -_1: int * [1B, -1B]
> > +_1: int * [1B, 4294967295B]
> 
> I wonder why 32-bit targets at displaying 4294967295 instead of -1.  Or are
> pointers 64-bits here?

Because the dump method does:
  if (INTEGRAL_TYPE_P (ttype)
  && vrp_val_is_max (max ())
  && TYPE_PRECISION (ttype) != 1)
fprintf (file, "+INF");
  else
print_generic_expr (file, max ());
so for integral types and maximum value, it prints +INF, but not for
pointers.
Perhaps we want to print +INF also for pointers,
  if ((INTEGRAL_TYPE_P (ttype) || POINTER_TYPE_P (ttype))
  && vrp_val_is_max (max (), true)
  && TYPE_PRECISION (ttype) != 1)
fprintf (file, "+INF");
  else
print_generic_expr (file, max ());
but maybe vrp_val_is_{min,max} should be rewritten for pointer types to be
more efficient, don't create trees, for min just use integer_zerop and for
max just compare wide_int?

Jakub

[PATCH] concepts cleanups and subsumption caching

2019-10-15 Thread Andrew Sutton

This patch finishes moving concepts-related functionality out of pt.c
and into constraint.cc an logic.cc, and adds logic.cc to gtfiles.

As part of that cleanup, I reimplemented and reenabled the subsumption
caching. It's not clear if this provides any significant performance
benefits, but it will prevent redundant and potentially costly
comparisons of constraints.

Tested on bootstrap and vs. cmcstl2.

Andrew Sutton


cleanup.patch
Description: Binary data

Re: [PATCH][ARM] Switch to default sched pressure algorithm

2019-10-15 Thread Christophe Lyon

On Sat, 12 Oct 2019 at 02:52, Ramana Radhakrishnan
 wrote:
>
> On Fri, Oct 11, 2019 at 6:19 PM Wilco Dijkstra  wrote:
> >
> > Hi Ramana,
> >
> > > Can you see what happens with the Cortex-A8 or Cortex-A9 schedulers to
> > > spread the range across some v7-a CPUs as well ? While they aren't that 
> > > popular today I
> > > would suggest you look at them because the defaults for v7-a are still to 
> > > use the
> > > Cortex-A8 scheduler and the Cortex-A9 scheduler might well also get used 
> > > in places given
> > > the availability of hardware.
> >
> > The results are practically identical to Cortex-A53 and A57 - there is a 
> > huge codesize win
> > across the board on SPEC2006, there isn't a single benchmark that is larger 
> > (ie. more
> > spilling).
> >
> > > I'd be happy to move this forward if you could show if there is no 
> > > *increase* in spills
> > > for the same range of benchmarks that you are doing for the Cortex-A8 and 
> > > Cortex-A9
> > > schedulers.
> >
> > There certainly isn't. I don't think results like these could be any more 
> > one-sided, it's a
> > significant win for every single benchmark, both for codesize and 
> > performance!
> >
>
> Ok go ahead - please be sensitive to testsuite regressions.
>

Hi Wilco,

I've noticed that your patch caused a regression:
FAIL: gcc.dg/tree-prof/pr77698.c scan-rtl-dump-times alignments
"internal loop alignment added" 1

Christophe


when the compiler is configured --with-mode thumb (or forcing -mthumb
when running the tests)
> Ramana
>
>
> > What isn't clear is whether something has gone horribly wrong in the 
> > scheduler which
> > could be fixed/reverted, but as it is right now I can't see it being useful 
> > at all. This means
> > we should also reevaluate whether pressure scheduling now hurts AArch64 too.
> >
> > Cheers,
> > Wilco

[PATCH v2 07/11] PR libstdc++/91906 Fix timed_mutex::try_lock_until on arbitrary clock

2019-10-15 Thread Mike Crowe

A non-standard clock may tick more slowly than std::chrono::steady_clock.
This means that we risk returning false early when the specified timeout
may not have expired. This can be avoided by looping until the timeout time
as reported by the non-standard clock has been reached.

Unfortunately, we have no way to tell whether the non-standard clock ticks
more quickly that std::chrono::steady_clock. If it does then we risk
returning later than would be expected, but that is unavoidable and
permitted by the standard.

François Dumont pointed out[1] a flaw in an earlier version of this patch
that revealed a hole in the test coverage, so I've added a new test that
try_lock_until acts as try_lock if the timeout has already expired.

[1] https://gcc.gnu.org/ml/libstdc++/2019-10/msg00021.html

* include/std/mutex (_M_try_lock_until): Loop until the absolute
timeout time is reached as measured against the appropriate clock.
* testsuite/30_threads/timed_mutex/try_lock_until/3.cc:
Also run test using slow_clock to test above fix.
* testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc:
Likewise.
* testsuite/30_threads/recursive_timed_mutex/try_lock_until/4.cc:
Add new test that try_lock_until behaves as try_lock if the timeout
has already expired or exactly matches the current time.
---
 libstdc++-v3/include/std/mutex  | 
13 +++--
 libstdc++-v3/testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc |  
2 ++
 libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/3.cc   |  
2 ++
 libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/4.cc   | 
68 
 4 files changed, 83 insertions(+), 2 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/4.cc

diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index e06d286..a81825d 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -189,8 +189,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
bool
_M_try_lock_until(const chrono::time_point<_Clock, _Duration>& __atime)
{
- auto __rtime = __atime - _Clock::now();
- return _M_try_lock_for(__rtime);
+  // The user-supplied clock may not tick at the same rate as
+  // steady_clock, so we must loop in order to guarantee that
+  // the timeout has expired before returning false.
+  auto __now = _Clock::now();
+  do {
+auto __rtime = __atime - __now;
+if (_M_try_lock_for(__rtime))
+  return true;
+__now = _Clock::now();
+  } while (__atime > __now);
+  return false;
}
 };
 
diff --git 
a/libstdc++-v3/testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc 
b/libstdc++-v3/testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc
index 9ca656d..6ababbd 100644
--- 
a/libstdc++-v3/testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc
+++ 
b/libstdc++-v3/testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 template 
 void test()
@@ -71,4 +72,5 @@ int main()
 {
   test();
   test();
+  test();
 }
diff --git a/libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/3.cc 
b/libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/3.cc
index ef417ea..ed218d8 100644
--- a/libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/3.cc
+++ b/libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/3.cc
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 template 
 void test()
@@ -71,4 +72,5 @@ int main()
 {
   test();
   test();
+  test();
 }
diff --git a/libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/4.cc 
b/libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/4.cc
new file mode 100644
index 000..cd94ede
--- /dev/null
+++ b/libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/4.cc
@@ -0,0 +1,68 @@
+// { dg-do run }
+// { dg-options "-pthread"  }
+// { dg-require-effective-target c++14 }
+// { dg-require-effective-target pthread }
+// { dg-require-gthreads "" }
+
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License a

[PATCH v2 04/11] libstdc++ testsuite: Also test unique_lock::try_lock_until with steady_clock

2019-10-15 Thread Mike Crowe

* testsuite/30_threads/unique_lock/locking/4.cc: Template test
functions so they can be used to test both steady_clock and
system_clock.
---
 libstdc++-v3/testsuite/30_threads/unique_lock/locking/4.cc | 12 +--
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/testsuite/30_threads/unique_lock/locking/4.cc 
b/libstdc++-v3/testsuite/30_threads/unique_lock/locking/4.cc
index f10cd3f..1c544be 100644
--- a/libstdc++-v3/testsuite/30_threads/unique_lock/locking/4.cc
+++ b/libstdc++-v3/testsuite/30_threads/unique_lock/locking/4.cc
@@ -26,17 +26,18 @@
 #include 
 #include 
 
-int main()
+template 
+void test()
 {
   typedef std::timed_mutex mutex_type;
   typedef std::unique_lock lock_type;
-  typedef std::chrono::system_clock clock_type;
 
   try 
 {
   mutex_type m;
   lock_type l(m, std::defer_lock);
-  clock_type::time_point t = clock_type::now() + std::chrono::seconds(1);
+  const typename clock_type::time_point t = clock_type::now()
++ std::chrono::seconds(1);
 
   try
{
@@ -61,6 +62,11 @@ int main()
 {
   VERIFY( false );
 }
+}
 
+int main()
+{
+  test();
+  test();
   return 0;
 }
-- 
git-series 0.9.1

[PATCH v2 05/11] PR libstdc++/78237 Add full steady_clock support to timed_mutex

2019-10-15 Thread Mike Crowe

The pthread_mutex_clocklock function is available in glibc since the 2.30
release. If this function is available in the C library it can be used to
fix PR libstdc++/78237 by supporting steady_clock properly with
timed_mutex.

This means that code using timed_mutex::try_lock_for or
timed_mutex::wait_until with steady_clock is no longer subject to timing
out early or potentially waiting for much longer if the system clock is
warped at an inopportune moment.

If pthread_mutex_clocklock is available then steady_clock is deemed to be
the "best" clock available which means that it is used for the relative
try_lock_for calls and absolute try_lock_until calls using steady_clock and
user-defined clocks. Calls explicitly using system_clock (aka
high_resolution_clock) continue to use CLOCK_REALTIME via
__gthread_cond_timedwait.

If pthread_mutex_clocklock is not available then system_clock is deemed to
be the "best" clock available which means that the previous suboptimal
behaviour remains.

* acinclude.m4: Define GLIBCXX_CHECK_PTHREAD_MUTEX_CLOCKLOCK to
detect presence of pthread_mutex_clocklock function.

* config.h.in: Add _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK define.

* configure.ac: Call GLIBCXX_CHECK_PTHREAD_MUTEX_CLOCKLOCK.

* configure: Add generated part to detect pthread_mutex_clocklock
function.

* include/std/mutex (__timed_mutex_impl): Remove unnecessary
__clock_t. (_M_try_lock_for) Use best clock to
turn relative timeout into absolute timeout. (_M_try_lock_until)
Keep existing implementation for system_clock. Add new
implementation for steady_clock that calls _M_clocklock. Modify
overload for user-defined clock to use a relative wait so that it
automatically uses the best clock.

* include/std/mutex (timed_mutex): If pthread_mutex_clocklock is
available, add _M_clocklock method that calls it.

* include/std/mutex (recursive_timed_mutex): Likewise.
---
 libstdc++-v3/acinclude.m4  | 31 +-
 libstdc++-v3/config.h.in   |  3 +-
 libstdc++-v3/configure | 82 +++-
 libstdc++-v3/configure.ac  |  3 +-
 libstdc++-v3/include/std/mutex | 49 +
 5 files changed, 160 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index ad2cb01..9ecfa96 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4228,6 +4228,37 @@ AC_DEFUN([GLIBCXX_CHECK_PTHREAD_COND_CLOCKWAIT], [
 ])
 
 dnl
+dnl Check whether pthread_mutex_clocklock is available in  for 
std::timed_mutex to use,
+dnl and define _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK.
+dnl
+AC_DEFUN([GLIBCXX_CHECK_PTHREAD_MUTEX_CLOCKLOCK], [
+
+  AC_LANG_SAVE
+  AC_LANG_CPLUSPLUS
+  ac_save_CXXFLAGS="$CXXFLAGS"
+  CXXFLAGS="$CXXFLAGS -fno-exceptions"
+  ac_save_LIBS="$LIBS"
+  LIBS="$LIBS -lpthread"
+
+  AC_MSG_CHECKING([for pthread_mutex_clocklock])
+  AC_CACHE_VAL(glibcxx_cv_PTHREAD_MUTEX_CLOCKLOCK, [
+GCC_TRY_COMPILE_OR_LINK(
+  [#include ],
+  [pthread_mutex_t mutex; struct timespec ts; int n = 
pthread_mutex_clocklock(&mutex, CLOCK_REALTIME, &ts);],
+  [glibcxx_cv_PTHREAD_MUTEX_CLOCKLOCK=yes],
+  [glibcxx_cv_PTHREAD_MUTEX_CLOCKLOCK=no])
+  ])
+  if test $glibcxx_cv_PTHREAD_MUTEX_CLOCKLOCK = yes; then
+AC_DEFINE(_GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK, 1, [Define if 
pthread_mutex_clocklock is available in .])
+  fi
+  AC_MSG_RESULT($glibcxx_cv_PTHREAD_MUTEX_CLOCKLOCK)
+
+  CXXFLAGS="$ac_save_CXXFLAGS"
+  LIBS="$ac_save_LIBS"
+  AC_LANG_RESTORE
+])
+
+dnl
 dnl Check whether sysctl is available in , and define 
_GLIBCXX_USE_SYSCTL_HW_NCPU.
 dnl
 AC_DEFUN([GLIBCXX_CHECK_SYSCTL_HW_NCPU], [
diff --git a/libstdc++-v3/config.h.in b/libstdc++-v3/config.h.in
index 32f7863..3d51534 100644
--- a/libstdc++-v3/config.h.in
+++ b/libstdc++-v3/config.h.in
@@ -1002,6 +1002,9 @@
 /* Define if pthread_cond_clockwait is available in . */
 #undef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT
 
+/* Define if pthread_mutex_clocklock is available in . */
+#undef _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK
+
 /* Define if POSIX read/write locks are available in . */
 #undef _GLIBCXX_USE_PTHREAD_RWLOCK_T
 
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index cb67581..bf0f50a 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -21880,6 +21880,88 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
 
 
 
+# For pthread_mutex_clocklock
+
+
+
+  ac_ext=cpp
+ac_cpp='$CXXCPP $CPPFLAGS'
+ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS 
conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_cxx_compiler_gnu
+
+  ac_save_CXXFLAGS="$CXXFLAGS"
+  CXXFLAGS="$CXXFLAGS -fno-exceptions"
+  ac_save_LIBS="$LIBS"
+  LIBS="$LIBS -lpthread"
+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for 
pthread_mutex_clocklock" >&5
+$as_echo_n "checking for pthread

[PATCH v2 03/11] libstdc++ testsuite: Also test timed_mutex with steady_clock

2019-10-15 Thread Mike Crowe

* testsuite/30_threads/timed_mutex/try_lock_until/57641.cc:
  Template test functions and use them to test both steady_clock
  and system_clock.
---
 libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/57641.cc | 18 
+-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git 
a/libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/57641.cc 
b/libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/57641.cc
index 12ee52f..6355d8f 100644
--- a/libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/57641.cc
+++ b/libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/57641.cc
@@ -49,18 +49,26 @@ struct clock
 std::timed_mutex mx;
 bool locked = false;
 
+template 
 void f()
 {
-  locked = mx.try_lock_until(clock::now() + C::milliseconds(1));
+  locked = mx.try_lock_until(ClockType::now() + C::milliseconds(1));
 }
 
-int main()
+template 
+void test()
 {
   std::lock_guard l(mx);
-  auto start = C::system_clock::now();
-  std::thread t(f);
+  auto start = ClockType::now();
+  std::thread t(f);
   t.join();
-  auto stop = C::system_clock::now();
+  auto stop = ClockType::now();
   VERIFY( (stop - start) < C::seconds(9) );
   VERIFY( !locked );
 }
+
+int main()
+{
+  test();
+  test();
+}
-- 
git-series 0.9.1

[PATCH v2 01/11] libstdc++ testsuite: Check return value from timed_mutex::try_lock_until

2019-10-15 Thread Mike Crowe

* testsuite/30_threads/unique_lock/locking/4.cc: Wrap call to
timed_mutex::try_lock_until in VERIFY macro to check its return
value.
---
 libstdc++-v3/testsuite/30_threads/unique_lock/locking/4.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/30_threads/unique_lock/locking/4.cc 
b/libstdc++-v3/testsuite/30_threads/unique_lock/locking/4.cc
index fe0935f..f10cd3f 100644
--- a/libstdc++-v3/testsuite/30_threads/unique_lock/locking/4.cc
+++ b/libstdc++-v3/testsuite/30_threads/unique_lock/locking/4.cc
@@ -40,7 +40,7 @@ int main()
 
   try
{
- l.try_lock_until(t);
+ VERIFY( l.try_lock_until(t) );
}
   catch(const std::system_error&)
{
-- 
git-series 0.9.1

[PATCH v2 02/11] libstdc++ testsuite: Add timed_mutex::try_lock_until test

2019-10-15 Thread Mike Crowe

I was unable to find an existing tests for timed_mutex::try_lock_until and
recursive_timed_mutex::try_lock_until timing out. It would have been easier
to add a single templated test, but since these classes are tested in
separate directories I've created two separate tests.

* testsuite/30_threads/timed_mutex/try_lock_until/3.cc: New
file. Ensure that timed_mutex::try_lock_until actually times out
after the specified time when using both system_clock and
steady_clock.

* testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc:
Likewise but for recursive_timed_mutex.
---
 libstdc++-v3/testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc | 
74 -
 libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/3.cc   | 
74 -
 2 files changed, 148 insertions(+)
 create mode 100644 
libstdc++-v3/testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc
 create mode 100644 
libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/3.cc

diff --git 
a/libstdc++-v3/testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc 
b/libstdc++-v3/testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc
new file mode 100644
index 000..9ca656d
--- /dev/null
+++ 
b/libstdc++-v3/testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc
@@ -0,0 +1,74 @@
+// { dg-do run }
+// { dg-options "-pthread"  }
+// { dg-require-effective-target c++14 }
+// { dg-require-effective-target pthread }
+// { dg-require-gthreads "" }
+
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+
+#include 
+#include 
+#include 
+#include 
+
+template 
+void test()
+{
+  typedef std::recursive_timed_mutex mutex_type;
+
+  try
+{
+  mutex_type m;
+  m.lock();
+  bool b;
+
+  std::thread t([&] {
+try
+  {
+using namespace std::chrono;
+const auto timeout = 100ms;
+const auto start = clock_type::now();
+const auto b = m.try_lock_until(start + timeout);
+const auto t = clock_type::now() - start;
+VERIFY( !b );
+VERIFY( t >= timeout );
+  }
+catch (const std::system_error& e)
+  {
+VERIFY( false );
+  }
+   });
+  t.join();
+  m.unlock();
+}
+  catch (const std::system_error& e)
+{
+  VERIFY( false );
+}
+  catch (...)
+{
+  VERIFY( false );
+}
+}
+
+int main()
+{
+  test();
+  test();
+}
diff --git a/libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/3.cc 
b/libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/3.cc
new file mode 100644
index 000..ef417ea
--- /dev/null
+++ b/libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/3.cc
@@ -0,0 +1,74 @@
+// { dg-do run }
+// { dg-options "-pthread"  }
+// { dg-require-effective-target c++14 }
+// { dg-require-effective-target pthread }
+// { dg-require-gthreads "" }
+
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+
+#include 
+#include 
+#include 
+#include 
+
+template 
+void test()
+{
+  typedef std::timed_mutex mutex_type;
+
+  try
+{
+  mutex_type m;
+  m.lock();
+  bool b;
+
+  std::thread t([&] {
+try
+  {
+using namespace std::chrono;
+const auto timeout = 100ms;
+const auto start = clock_type::now();
+const auto b = m.try_lock_unt

[PATCH v2 10/11] libstdc++ timed_mutex: Ensure that try_lock_for waits for long enough

2019-10-15 Thread Mike Crowe

The user-defined clock used with shared_mutex::try_lock_for and
shared_mutex::try_lock_shared_for may have higher precision than
__clock_t. We may need to round the duration up to ensure that the timeout
is long enough. (See __timed_mutex_impl::_M_try_lock_for)

* include/std/shared_mutex: (try_lock_for) Round up wait duration
  if necessary. (try_lock_shared_for) Likewise.
---
 libstdc++-v3/include/std/shared_mutex | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/std/shared_mutex 
b/libstdc++-v3/include/std/shared_mutex
index b637bdc..6e67324 100644
--- a/libstdc++-v3/include/std/shared_mutex
+++ b/libstdc++-v3/include/std/shared_mutex
@@ -471,9 +471,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 template
   bool
-  try_lock_for(const chrono::duration<_Rep, _Period>& __rel_time)
+  try_lock_for(const chrono::duration<_Rep, _Period>& __rtime)
   {
-   return try_lock_until(__clock_t::now() + __rel_time);
+auto __rt = chrono::duration_cast<__clock_t::duration>(__rtime);
+if (ratio_greater<__clock_t::period, _Period>())
+  ++__rt;
+   return try_lock_until(__clock_t::now() + __rt);
   }
 
 // Shared ownership
@@ -484,9 +487,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 template
   bool
-  try_lock_shared_for(const chrono::duration<_Rep, _Period>& __rel_time)
+  try_lock_shared_for(const chrono::duration<_Rep, _Period>& __rtime)
   {
-   return try_lock_shared_until(__clock_t::now() + __rel_time);
+auto __rt = chrono::duration_cast<__clock_t::duration>(__rtime);
+if (ratio_greater<__clock_t::period, _Period>())
+  ++__rt;
+   return try_lock_shared_until(__clock_t::now() + __rt);
   }
 
 #if _GLIBCXX_USE_PTHREAD_RWLOCK_T && _GTHREAD_USE_MUTEX_TIMEDLOCK
-- 
git-series 0.9.1

[PATCH v2 11/11] shared_mutex: Fix try_lock_until and try_lock_shared_until on arbitrary clock

2019-10-15 Thread Mike Crowe

This is the equivalent to PR libstdc++/91906, but for shared_mutex.

A non-standard clock may tick more slowly than std::chrono::steady_clock.
This means that we risk returning false early when the specified timeout
may not have expired. This can be avoided by looping until the timeout time
as reported by the non-standard clock has been reached.

Unfortunately, we have no way to tell whether the non-standard clock ticks
more quickly that std::chrono::steady_clock. If it does then we risk
returning later than would be expected, but that is unavoidable without
waking up periodically to check, which would be rather too expensive.

François Dumont pointed out[1] a flaw in an earlier version of this patch
that revealed a hole in the test coverage, so I've added a new test that
try_lock_until acts as try_lock if the timeout has already expired.

[1] https://gcc.gnu.org/ml/libstdc++/2019-10/msg00021.html

* include/std/shared_mutex (try_lock_until, try_lock_shared_until):
Loop until the absolute timeout time is reached as measured against
the appropriate clock.
* testsuite/30_threads/shared_mutex/try_lock_until/1.cc: New
file. Test shared_mutex::try_lock_until and
shared_mutex::try_lock_shared_until timeouts against various
clocks.
* testsuite/30_threads/shared_mutex/try_lock_until/1.cc: New
file. Test shared_mutex::try_lock_until and
shared_mutex::try_lock_shared_until timeouts against various
clocks.
---
 libstdc++-v3/include/std/shared_mutex| 28 
++-
 libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/1.cc | 87 
-
 libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/2.cc | 75 
++-
 3 files changed, 184 insertions(+), 6 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/1.cc
 create mode 100644 
libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/2.cc

diff --git a/libstdc++-v3/include/std/shared_mutex 
b/libstdc++-v3/include/std/shared_mutex
index 6e67324..841f18b 100644
--- a/libstdc++-v3/include/std/shared_mutex
+++ b/libstdc++-v3/include/std/shared_mutex
@@ -554,9 +554,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   bool
   try_lock_until(const chrono::time_point<_Clock, _Duration>& __atime)
   {
-typename _Clock::time_point __now = _Clock::now();
-auto __rtime = __atime - __now;
-return try_lock_for(__rtime);
+// The user-supplied clock may not tick at the same rate as
+// steady_clock, so we must loop in order to guarantee that
+// the timeout has expired before returning false.
+   typename _Clock::time_point __now = _Clock::now();
+do {
+  auto __rtime = __atime - __now;
+  if (try_lock_for(__rtime))
+return true;
+  __now = _Clock::now();
+} while (__atime > __now);
+return false;
   }
 
 // Shared ownership
@@ -631,9 +639,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   try_lock_shared_until(const chrono::time_point<_Clock,
 _Duration>& __atime)
   {
-typename _Clock::time_point __now = _Clock::now();
-auto __rtime = __atime - __now;
-return try_lock_shared_for(__rtime);
+// The user-supplied clock may not tick at the same rate as
+// steady_clock, so we must loop in order to guarantee that
+// the timeout has expired before returning false.
+   typename _Clock::time_point __now = _Clock::now();
+do {
+  auto __rtime = __atime - __now;
+  if (try_lock_shared_for(__rtime))
+return true;
+  __now = _Clock::now();
+} while (__atime > __now);
+return false;
   }
 
 #else // ! (_GLIBCXX_USE_PTHREAD_RWLOCK_T && _GTHREAD_USE_MUTEX_TIMEDLOCK)
diff --git 
a/libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/1.cc 
b/libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/1.cc
new file mode 100644
index 000..a0ab536
--- /dev/null
+++ b/libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/1.cc
@@ -0,0 +1,87 @@
+// { dg-do run }
+// { dg-options "-pthread"  }
+// { dg-require-effective-target c++14 }
+// { dg-require-effective-target pthread }
+// { dg-require-gthreads "" }
+
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY W

[PATCH v2 09/11] libstdc++ shared_mutex: Add full steady_clock support to shared_timed_mutex

2019-10-15 Thread Mike Crowe

The pthread_rwlock_clockrdlock and pthread_rwlock_clockwrlock functions
were added to glibc in v2.30. They have also been added to Android
Bionic. If these functions are available in the C library then they can be
used to implement shared_timed_mutex::try_lock_until,
shared_timed_mutex::try_lock_for, shared_timed_mutex::try_lock_shared_until
and shared_timed_mutex::try_lock_shared_for so that they are no longer
unaffected by the system clock being warped. (This is the shared_mutex
equivalent of PR libstdc++/78237 for mutex.)

If the new functions are available then steady_clock is deemed to be the
"best" clock available which means that it is used for the relative
try_lock_for calls and absolute try_lock_until calls using steady_clock and
user-defined clocks. It's not possible to have
_GLIBCXX_USE_PTHREAD_RWLOCK_CLOCKLOCK defined without
_GLIBCXX_USE_PTHREAD_RWLOCK_T, so the requirement that the clock be the
same as condition_variable is maintained. Calls explicitly using
system_clock (aka high_resolution_clock) continue to use CLOCK_REALTIME via
the old pthread_rwlock_timedrdlock and pthread_rwlock_timedwrlock
functions.

If the new functions are not available then system_clock is deemed to be
the "best" clock available which means that the previous suboptimal
behaviour remains.

* libstdc++-v3/acinclude.m4: Define
GLIBCXX_CHECK_PTHREAD_RWLOCK_CLOCKLOCK to check for the presence of
both pthread_rwlock_clockrdlock and pthread_rwlock_clockwrlock.

* config.h.in: Add _GLIBCXX_USE_PTHREAD_RWLOCK_CLOCKLOCK define.

* configure.ac: Call GLIBCXX_USE_PTHREAD_RWLOCK_CLOCKLOCK.

* configure: Add generated part to detect
pthread_rwlock_clockrdlock and pthread_rwlock_clockwrlock
functions.

* include/std/shared_mutex (shared_timed_mutex): Define __clock_t
as the best clock to use for relative waits. (try_lock_until): Use
existing try_lock_until implementation for system_clock (which
matches __clock_t when _GLIBCCXX_USE_PTHREAD_RWLOCK_CLOCKLOCK is
not defined). Add new overload for steady_clock that uses
pthread_rwlock_clockwrlock if it is available. Simplify overload
for non-standard clock to just call try_lock_for with a relative
timeout. (try_lock_shared_until): Likewise.
---
 libstdc++-v3/acinclude.m4 | 33 +++-
 libstdc++-v3/config.h.in  |  4 +-
 libstdc++-v3/configure| 88 -
 libstdc++-v3/configure.ac |  3 +-
 libstdc++-v3/include/std/shared_mutex | 87 ++--
 5 files changed, 198 insertions(+), 17 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 9ecfa96..d860dbe 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4259,6 +4259,39 @@ AC_DEFUN([GLIBCXX_CHECK_PTHREAD_MUTEX_CLOCKLOCK], [
 ])
 
 dnl
+dnl Check whether pthread_mutex_clocklock is available in  for 
std::timed_mutex to use,
+dnl and define _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK.
+dnl
+AC_DEFUN([GLIBCXX_CHECK_PTHREAD_RWLOCK_CLOCKLOCK], [
+
+  AC_LANG_SAVE
+  AC_LANG_CPLUSPLUS
+  ac_save_CXXFLAGS="$CXXFLAGS"
+  CXXFLAGS="$CXXFLAGS -fno-exceptions"
+  ac_save_LIBS="$LIBS"
+  LIBS="$LIBS -lpthread"
+
+  AC_MSG_CHECKING([for pthread_rwlock_clockrdlock, pthread_wlock_clockwrlock])
+  AC_CACHE_VAL(glibcxx_cv_PTHREAD_RWLOCK_CLOCKLOCK, [
+GCC_TRY_COMPILE_OR_LINK(
+  [#include ],
+  [pthread_rwlock_t rwl; struct timespec ts;]
+  [int n = pthread_rwlock_clockrdlock(&rwl, CLOCK_REALTIME, &ts);]
+  [int m = pthread_rwlock_clockwrlock(&rwl, CLOCK_REALTIME, &ts);],
+  [glibcxx_cv_PTHREAD_RWLOCK_CLOCKLOCK=yes],
+  [glibcxx_cv_PTHREAD_RWLOCK_CLOCKLOCK=no])
+  ])
+  if test $glibcxx_cv_PTHREAD_RWLOCK_CLOCKLOCK = yes; then
+AC_DEFINE(_GLIBCXX_USE_PTHREAD_RWLOCK_CLOCKLOCK, 1, [Define if 
pthread_rwlock_clockrdlock and pthread_rwlock_clockwrlock are available in 
.])
+  fi
+  AC_MSG_RESULT($glibcxx_cv_PTHREAD_RWLOCK_CLOCKLOCK)
+
+  CXXFLAGS="$ac_save_CXXFLAGS"
+  LIBS="$ac_save_LIBS"
+  AC_LANG_RESTORE
+])
+
+dnl
 dnl Check whether sysctl is available in , and define 
_GLIBCXX_USE_SYSCTL_HW_NCPU.
 dnl
 AC_DEFUN([GLIBCXX_CHECK_SYSCTL_HW_NCPU], [
diff --git a/libstdc++-v3/config.h.in b/libstdc++-v3/config.h.in
index 3d51534..8940e0c 100644
--- a/libstdc++-v3/config.h.in
+++ b/libstdc++-v3/config.h.in
@@ -1005,6 +1005,10 @@
 /* Define if pthread_mutex_clocklock is available in . */
 #undef _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK
 
+/* Define if pthread_rwlock_clockrdlock and pthread_rwlock_clockwrlock are
+   available in . */
+#undef _GLIBCXX_USE_PTHREAD_RWLOCK_CLOCKLOCK
+
 /* Define if POSIX read/write locks are available in . */
 #undef _GLIBCXX_USE_PTHREAD_RWLOCK_T
 
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index bf0f50a..edb565f 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -21963,6 +21963,94 @@ ac_compiler_gnu=$

[PATCH v2 08/11] libstdc++ testsuite: Also test shared_timed_mutex with steady_clock

2019-10-15 Thread Mike Crowe

* testsuite/30_threads/shared_timed_mutex/try_lock/3.cc: Convert
  existing test to templated function so that it can be called with
  both system_clock and steady_clock.
---
 libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock/3.cc | 17 
-
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock/3.cc 
b/libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock/3.cc
index fd21033..1cf191c 100644
--- a/libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock/3.cc
+++ b/libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock/3.cc
@@ -27,7 +27,8 @@
 #include 
 #include 
 
-int main()
+template 
+void test()
 {
   typedef std::shared_timed_mutex mutex_type;
 
@@ -42,15 +43,15 @@ int main()
   {
 using namespace std::chrono;
 auto timeout = 100ms;
-auto start = system_clock::now();
+auto start = clock_type::now();
 b = m.try_lock_for(timeout);
-auto t = system_clock::now() - start;
+auto t = clock_type::now() - start;
 VERIFY( !b );
 VERIFY( t >= timeout );
 
-start = system_clock::now();
+start = clock_type::now();
 b = m.try_lock_until(start + timeout);
-t = system_clock::now() - start;
+t = clock_type::now() - start;
 VERIFY( !b );
 VERIFY( t >= timeout );
   }
@@ -71,3 +72,9 @@ int main()
   VERIFY( false );
 }
 }
+
+int main()
+{
+  test();
+  test();
+}
-- 
git-series 0.9.1

[PATCH v2 00/11] timed_mutex, shared_timed_mutex: Add full steady clock support

2019-10-15 Thread Mike Crowe

glibc v2.30 added the pthread_mutex_clocklock,
pthread_rwlock_clockrdlock and pthread_rwlock_clockwrlock
functions. These accept CLOCK_MONOTONIC, so they can be used to
implement proper steady_clock support in timed_mutex,
recursive_timed_mutex and shared_timed_mutex that is immune to the
system clock being warped.

Unfortunately we can't warp the system clock in the testsuite, so it's
not possible to automatically ensure that the system_clock test cases
result in a wait on CLOCK_REALTIME and steady_clock test cases result
in a wait on CLOCK_MONOTONIC. It's recommended to run the test under
strace(1) and check whether the expected futex calls are made by glibc
or ltrace(1) and check whether the expected pthread calls are
made. The easiest way to do this is to copy and paste the line used to
build the test from the output of running the tests (for example):

 make check-target-libstdc++-v3 
RUNTESTFLAGS="conformance.exp=30_threads/shared_mutex/* -v -v"

to compile the test with only the tests for one clock enabled and then
run it as:

 strace -f ./1.exe

You should see calls to:

 futex(..., FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, ...)

with large values of tv_sec when using system_clock and calls to:

 futex(..., FUTEX_WAIT_BITSET_PRIVATE, ...)

Alternatively, you can use:

 ltrace -f ./1.exe

and look for calls to pthread_mutex_clocklock,
pthread_rwlock_clockrdlock and pthread_rwlock_clockwrlock with a
parameter of 1 for CLOCK_MONOTONIC and with values of tv_sec based on
the system uptime when using steady_clock.

This series also adds some extra tests and fixes some other minor
problems with the existing implementation and tests.

Changes since v1[1]:

* Fix flaw pointed out[2] by François Dumont and add tests to prove
  that.

[1] https://gcc.gnu.org/ml/libstdc++/2019-09/msg00106.html
[2] https://gcc.gnu.org/ml/libstdc++/2019-10/msg00021.html

Mike Crowe (11):
  libstdc++ testsuite: Check return value from timed_mutex::try_lock_until
  libstdc++ testsuite: Add timed_mutex::try_lock_until test
  libstdc++ testsuite: Also test timed_mutex with steady_clock
  libstdc++ testsuite: Also test unique_lock::try_lock_until with steady_clock
  PR libstdc++/78237 Add full steady_clock support to timed_mutex
  libstdc++ testsuite: Move slow_clock to its own header
  PR libstdc++/91906 Fix timed_mutex::try_lock_until on arbitrary clock
  libstdc++ testsuite: Also test shared_timed_mutex with steady_clock
  libstdc++ shared_mutex: Add full steady_clock support to shared_timed_mutex
  libstdc++ timed_mutex: Ensure that try_lock_for waits for long enough
  shared_mutex: Fix try_lock_until and try_lock_shared_until on arbitrary clock

 libstdc++-v3/acinclude.m4   |  
64 +++-
 libstdc++-v3/config.h.in|  
 7 +++-
 libstdc++-v3/configure  | 
170 -
 libstdc++-v3/configure.ac   |  
 6 +++-
 libstdc++-v3/include/std/mutex  |  
60 +
 libstdc++-v3/include/std/shared_mutex   | 
117 +-
 libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc   |  
17 +---
 libstdc++-v3/testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc |  
76 -
 libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock/3.cc  |  
17 ---
 libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/1.cc|  
87 +-
 libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/2.cc|  
75 -
 libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/3.cc   |  
76 -
 libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/4.cc   |  
68 +-
 libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/57641.cc   |  
18 +---
 libstdc++-v3/testsuite/30_threads/unique_lock/locking/4.cc  |  
14 --
 libstdc++-v3/testsuite/util/slow_clock.h|  
38 -
 16 files changed, 850 insertions(+), 60 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/30_threads/recursive_timed_mutex/try_lock_until/3.cc
 create mode 100644 
libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/1.cc
 create mode 100644 
libstdc++-v3/testsuite/30_threads/shared_timed_mutex/try_lock_until/2.cc
 create mode 100644 
libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/3.cc
 create mode 100644 
libstdc++-v3/testsuite/30_threads/timed_mutex/try_lock_until/4.cc
 create mode 100644 libstdc++-v3/testsuite/util/slow

[PATCH v2 06/11] libstdc++ testsuite: Move slow_clock to its own header

2019-10-15 Thread Mike Crowe

Move slow_clock test class into a header file so that it can be used by
other tests in the future.

* testsuite/util/slow_clock.h: New file. Move implementation of
slow_clock test class.

* testsuite/30_threads/condition_variable/members/2.cc: Include
slow_clock from header.
---
 libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc | 17 
+
 libstdc++-v3/testsuite/util/slow_clock.h  | 38 
++
 2 files changed, 39 insertions(+), 16 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/util/slow_clock.h

diff --git a/libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc 
b/libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc
index cbac3fa..f821d3f 100644
--- a/libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc
+++ b/libstdc++-v3/testsuite/30_threads/condition_variable/members/2.cc
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 template 
 void test01()
@@ -52,22 +53,6 @@ void test01()
 }
 }
 
-struct slow_clock
-{
-  using rep = std::chrono::system_clock::rep;
-  using period = std::chrono::system_clock::period;
-  using duration = std::chrono::system_clock::duration;
-  using time_point = std::chrono::time_point;
-  static constexpr bool is_steady = false;
-
-  static time_point now()
-  {
-auto real = std::chrono::system_clock::now();
-return time_point{real.time_since_epoch() / 3};
-  }
-};
-
-
 void test01_alternate_clock()
 {
   try
diff --git a/libstdc++-v3/testsuite/util/slow_clock.h 
b/libstdc++-v3/testsuite/util/slow_clock.h
new file mode 100644
index 000..b81754e
--- /dev/null
+++ b/libstdc++-v3/testsuite/util/slow_clock.h
@@ -0,0 +1,38 @@
+// -*- C++ -*-
+
+// Copyright (C) 2019 Free Software Foundation, Inc.
+
+// This library is free software; you can redistribute it and/or
+// modify it under the terms of the GNU General Public License as
+// published by the Free Software Foundation; either version 3, or (at
+// your option) any later version.
+
+// This library is distributed in the hope that it will be useful, but
+// WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+// General Public License for more details.
+
+// You should have received a copy of the GNU General Public License
+// along with this library; see the file COPYING3.  If not see
+// .
+
+
+// A clock that ticks at a third of the speed of system_clock that can be used
+// to ensure that functions with timeouts don't erroneously return early.
+
+#include 
+
+struct slow_clock
+{
+  using rep = std::chrono::system_clock::rep;
+  using period = std::chrono::system_clock::period;
+  using duration = std::chrono::system_clock::duration;
+  using time_point = std::chrono::time_point;
+  static constexpr bool is_steady = false;
+
+  static time_point now()
+  {
+auto real = std::chrono::system_clock::now();
+return time_point{real.time_since_epoch() / 3};
+  }
+};
-- 
git-series 0.9.1

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Peter Bergner

On 10/15/19 4:56 AM, Segher Boessenkool wrote:
> On Tue, Oct 15, 2019 at 05:15:07PM +0800, Jiufu Guo wrote:
>> And another issue: Behavior is still inconsistent between "-mno-vsx
>> -flto" and "-mno-vsx" for user code. Previous patch makes it consistent
>> between "-mvsx -flto" and "-mvsx". 
> 
>>  $GCC_BUILD/gcc/xgcc -B$GCC_BUILD/gcc novsx.c -O2 -mno-vsx
>> /home/guojiufu/temp/novsx.c: In function 'main':
>> /home/guojiufu/temp/novsx.c:6:1: error: inlining failed in call to 
>> 'always_inline' 'foo': target specific option mismatch
> 
> So what should we do about this?  There are arguments for *both*
> behaviours, and apparently with LTO we do not know which flags are
> explicit?

I'd say this is user error, telling the compiler it has to inline the callee
function, but then using incompatible options on the caller and the callee,
so that it cannot.  I think the error message is the correct thing here.

Peter

Re: C++ PATCH for c++/92106 - ICE with structured bindings and -Wreturn-local-addr

2019-10-15 Thread Jakub Jelinek

On Tue, Oct 15, 2019 at 01:17:17PM -0400, Marek Polacek wrote:
> 2019-10-15  Marek Polacek  
> 
>   PR c++/92106 - ICE with structured bindings and -Wreturn-local-addr.
>   * typeck.c (maybe_warn_about_returning_address_of_local): Avoid
>   recursing on null initializer.
> 
>   * g++.dg/cpp1z/decomp50.C: New test.
> 
> diff --git gcc/cp/typeck.c gcc/cp/typeck.c
> index 141d86f50c9..1825540016f 100644
> --- gcc/cp/typeck.c
> +++ gcc/cp/typeck.c
> @@ -9354,10 +9354,8 @@ maybe_warn_about_returning_address_of_local (tree 
> retval)
>binding.  */
> tree base = DECL_DECOMP_BASE (whats_returned);
> if (TYPE_REF_P (TREE_TYPE (base)))
> - {
> -   tree init = DECL_INITIAL (base);
> + if (tree init = DECL_INITIAL (base))
> return maybe_warn_about_returning_address_of_local (init);
> - }

Actually, seeing the dg-warning in the testcase, I think we shouldn't warn,
for range-for it is just too hard to find out if it will be returning
address of a local or not. &value in itself is not address of a local
variable when the structured binding is a reference.
Well, in the testcase as is it actually is (perhaps just in the reduced
one and not original):
const struct J & D.2293;
const int name [value-expr: D.2293->name];
const int value [value-expr: D.2293->value];
...
  struct reference D.2352;

  try
{
  D.2352 = D::operator* (&__for_begin);
  D.2293 = &D.2352;
but a small change to the testcase:
 template  class D {
 public:
-  typename B<_Iterator>::reference operator*();
+  typename B<_Iterator>::reference &operator*();
   void operator++();
 };
results in D.2293 = D::operator* (&__for_begin);
and then it might very well not be address of a local variable.

This isn't just about a false positive warning, when
maybe_warn_about_returning_address_of_local returns true, then
we actually return NULL pointer instead of the value user wanted.

So, I think you want to add and do else return false;
if init is NULL.

> +for (const auto &[name, value] : members)
> +  return &value; // { dg-warning "address of local variable" }
> +return nullptr;
> +  }
> +};
> +int main() {
> +  A a;
> +  a.find("");
> +}

Jakub

Re: [POC v2 PATCH] __builtin_warning

2019-10-15 Thread Richard Sandiford

Joseph Myers  writes:
> On Mon, 14 Oct 2019, Martin Sebor wrote:
>
>> On 10/14/19 4:03 PM, Joseph Myers wrote:
>> > How does this interact with translation?
>> > 
>> > My expectation would be that in user code, the message is taken literally
>> > as-is; it is not looked up in the GCC message catalog even if it is
>> > identical to some GCC diagnostic.  However, when used internally for GCC
>> > diagnostics, they should be translated, meaning the translation for GCC
>> > diagnostics using this mechanism should occur before the built-in call is
>> > stored in the IR.
>> 
>> Right, that would make sense.  Does something make you think it will
>> be hard to do?
>
> No.  I'd expect the code generating the IR from GCC diagnostics should 
> arrange for the message to be translated first (while in the case of 
> __builtin_warning in user code, it would not be translated), and the code 
> subsequently using the IR to use diagnostic_set_info_translated or 
> functions that end up calling it without trying to translate.

Might be a daft question, but would we ever want GCC to introduce new
calls before LTO streaming?  I guess we shouldn't assume that the user
who builds the final object will be using the same locale.

Richard

C++ PATCH for c++/92106 - ICE with structured bindings and -Wreturn-local-addr

2019-10-15 Thread Marek Polacek

Here we are returning the address of a structure binding and since it's a
reference, we recursed on its initializer, but in this case there was none
and we crashed in cp_fold.  So don't recurse when we don't have an init to
recurse on.

Bootstrapped/regtested on x86_64-linux, ok for trunk/9?

2019-10-15  Marek Polacek  

PR c++/92106 - ICE with structured bindings and -Wreturn-local-addr.
* typeck.c (maybe_warn_about_returning_address_of_local): Avoid
recursing on null initializer.

* g++.dg/cpp1z/decomp50.C: New test.

diff --git gcc/cp/typeck.c gcc/cp/typeck.c
index 141d86f50c9..1825540016f 100644
--- gcc/cp/typeck.c
+++ gcc/cp/typeck.c
@@ -9354,10 +9354,8 @@ maybe_warn_about_returning_address_of_local (tree retval)
 binding.  */
  tree base = DECL_DECOMP_BASE (whats_returned);
  if (TYPE_REF_P (TREE_TYPE (base)))
-   {
- tree init = DECL_INITIAL (base);
+   if (tree init = DECL_INITIAL (base))
  return maybe_warn_about_returning_address_of_local (init);
-   }
}
   bool w = false;
   auto_diagnostic_group d;
diff --git gcc/testsuite/g++.dg/cpp1z/decomp50.C 
gcc/testsuite/g++.dg/cpp1z/decomp50.C
new file mode 100644
index 000..c41c0d8cbbf
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp1z/decomp50.C
@@ -0,0 +1,49 @@
+// PR c++/92106 - ICE with structured bindings and -Wreturn-local-addr. 
+// { dg-do compile { target c++17 } }
+
+template  struct B;
+template  struct B<_Tp *> { typedef _Tp reference; };
+struct C {
+  template  using rebind = _Up *;
+};
+template  class D {
+public:
+  typename B<_Iterator>::reference operator*();
+  void operator++();
+};
+
+template 
+bool operator!=(D<_Iterator, _Container>, D<_Iterator, _Container>);
+template  class F {
+public:
+  typedef _Tp value_type;
+};
+
+template  struct G {
+  template  struct H { using type = C::rebind<_Tp>; };
+  using const_pointer = typename H::type;
+};
+template > class I {
+  typedef D::const_pointer, int> const_iterator;
+
+public:
+  const_iterator begin();
+  const_iterator end();
+};
+
+struct A {
+  struct J {
+int name;
+int value;
+  };
+  I members;
+  template  const int *find(Key) {
+for (const auto &[name, value] : members)
+  return &value; // { dg-warning "address of local variable" }
+return nullptr;
+  }
+};
+int main() {
+  A a;
+  a.find("");
+}

[C++ PATCH] Implement P1073R3: Immediate functions (PR c++/88335)

2019-10-15 Thread Jakub Jelinek

Hi!

The following patch implements P1073R3, i.e. consteval, except that
virtual consteval is not supported (I think support for that would need to
include the consteval virtual methods at the end of the binfo structures
after all non-consteval virtual methods, but make sure we don't actually
emit those in the actual virtual tables etc.).

Unlike the previous implementation, this doesn't invoke consteval function
already during parsing, but later on, so there aren't issues with say
consteval constructors or consteval in default arguments.
Initially I thought the best spot to do that would be during cp_fold, but
that isn't called e.g. on template function bodies before instantiation,
or on the expressions from unevaluated contexts, so I had to add a function
to walk trees and evaluate consteval function calls found in there.
Furthermore, cp_fold isn't called just during cp_fold_function, but also
during fold_for_warn etc., so some consteval functions might be evaluated
multiple times, which can be a problem that if there are errors, they might
be emitted multiple times.

If you think it is worth avoiding the tree walk for -std=c++2a for the
finish_function of non-templates, perhaps it could be done during
cp_genericize too.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-10-15  Jakub Jelinek  

PR c++/88335 - Implement P1073R3: Immediate functions
c-family/
* c-common.h (enum rid): Add RID_CONSTEVAL.
* c-common.c (c_common_reswords): Add consteval.
cp/
* cp-tree.h (struct lang_decl_fn): Add immediate_fn_p bit.
(DECL_IMMEDIATE_FUNCTION_P, SET_DECL_IMMEDIATE_FUNCTION_P): Define.
(enum cp_decl_spec): Add ds_consteval.
(cxx_eval_consteval): Declare.
* parser.c (cp_keyword_starts_decl_specifier_p): Adjust comments
for C++11 and C++20 specifiers.  Handle RID_CONSTEVAL.
(CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR): Adjust comment.
(cp_parser_has_attribute_expression): Call cxx_eval_consteval.
(cp_parser_lambda_declarator_opt): Handle ds_consteval.
(cp_parser_decl_specifier_seq): Handle RID_CONSTEVAL.
(cp_parser_explicit_instantiation): Diagnose explicit instantiation
with consteval specifier.
(set_and_check_decl_spec_loc): Add consteval entry, formatting fix.
* call.c (build_addr_func): For direct calls to immediate functions
use build_address rather than decay_conversion.
* error.c (dump_function_decl): Handle DECL_IMMEDIATE_FUNCTION_P.
* semantics.c (expand_or_defer_fn_1): Use tentative linkage and don't
call mark_needed for immediate functions.
(finish_decltype_type): Call cxx_eval_consteval.
* typeck.c (cxx_sizeof_or_alignof_expr): Likewise.  Formatting fix.
(cp_build_addr_expr_1): Reject taking address of immediate function
outside of immediate function.
* decl.c (validate_constexpr_redeclaration): Diagnose consteval
vs. non-consteval or vice versa redeclaration.  Use
SET_DECL_IMMEDIATE_FUNCTION_P if new_decl is immediate function.
(check_tag_decl): Use %qs with keyword string to simplify translation.
Handle ds_consteval.
(start_decl): Adjust diagnostics for static or thread_local variables
in immediate functions.
(check_initializer): Call cxx_eval_consteval.
(grokfndecl): Call sorry_at on virtual consteval.  Use %qs with keyword
to string to simplify translation.  Diagnose consteval main.  Use
SET_DECL_IMMEDIATE_FUNCTION_P for consteval.
(grokdeclarator): Handle consteval.  Use %qs with keyword strings to
simplify translation.  Use separate ifs instead of chained else if
for invalid specifiers.  For constinit clear constinit_p rather than
constexpr_p.
(finish_function): Call cxx_eval_consteval for non-immediate functions.
* constexpr.c (struct constexpr_global_ctx): Add in_consteval field,
initialize it in the constructor.
(cxx_eval_call_expression): When encountering immediate function call
outside of in_consteval and outside of immediate function, call
cxx_constant_value.
(find_immediate_fndecl): New function.
(cxx_eval_outermost_constant_expr): Allow consteval calls returning
void.  Set global_ctx.in_consteval.  Diagnose returning address of
immediate function from in_consteval evaluation.
(fold_non_dependent_expr_template): Add forward declaration.  Add
allow_non_constant argument and pass it through to
cxx_eval_outermost_constant_expr.
(cxx_eval_consteval_r, cxx_eval_consteval): New functions.
(fold_non_dependent_expr, fold_non_dependent_init): Adjust
fold_non_dependent_expr_template callers.
* method.c (defaulted_late_check): Adjust diagnostics for consteval.
* except.c (finish_noexcept_expr): Call cxx_eval_consteval.

Re: [PATCH V4] Generalized predicate/condition for parameter reference in IPA (PR ipa/91088)

2019-10-15 Thread Jan Hubicka

> >> +  if (unmodified_parm_or_parm_agg_item (fbi, stmt, expr, index_p, 
> >> &size,
> >> + aggpos))
> >> + {
> >> +  tree type = TREE_TYPE (expr);
> >> +
> >> +   /* Stop if found bit-field whose size does not equal any natural
> >> +  integral type.  */
> >> +   if (maybe_ne (tree_to_poly_int64 (TYPE_SIZE (type)), size))
> >> + break;
> 
> > Why is this needed?
> This is to exclude bit-field reference. Actually, the restrict is not 
> required, a TODO to remove it. Now I directly check bit-field attribute.
> 
> >> - = add_condition (summary, index, size, &aggpos, this_code,
> >> -  unshare_expr_without_location
> >> -  (gimple_cond_rhs (last)));
> >> + = add_condition (summary, index, type, &aggpos, this_code,
> >> +  gimple_cond_rhs (last), param_ops);
> 
> > An why unsharing is no longer needed here?
> > It is importnat to avoid anything which reffers to function local blocks
> > to get into the global LTO stream.
> Do unsharing inside add_condition, since constants in param_ops also need to 
> be unshared.
> 
> >> +   if (op1.code != op2.code
> >> +   || op1.val_is_rhs != op2.val_is_rhs
> >> +   || !vrp_operand_equal_p (op1.val, op2.val)
> 
> > Why you need vrp_operand_equal_p instead of operand_equal_p here?
> op1.val and op2.val might be NULL_TREE.
> 
> > Overall the patch looks very reasonable. We may end up with bit more
> > general expressions (i.e. supporting arithmetics involving more than one
> > operand).
> If you means more than one constant operand, I'v changed the patch to support 
> ternary operation.
> 
> And if you means more than one parameter operand, this will involve much more 
> code change in ipa-fnsummary, it's better to let it be another TODO.
> 
> > I see you also fixed a lot of typos in comments, please go head and
> > commit them separately as obvious.
> Removed.
> 
> Thank for your comments.

This is OK now. please add changelog.

Honza
> Feng
> ---
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 0e3693598e7..05b1bb97c6b 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11943,6 +11943,13 @@ For switch exceeding this limit, IPA-CP will not 
> construct cloning cost
>  predicate, which is used to estimate cloning benefit, for default case
>  of the switch statement.
>  
> +@item ipa-max-param-expr-ops
> +IPA-CP will analyze conditional statement that references some function
> +parameter to estimate benefit for cloning upon certain constant value.
> +But if number of operations in a parameter expression exceeds
> +@option{ipa-max-param-expr-ops}, the expression is treated as complicated
> +one, and is not handled by IPA analysis.
> +
>  @item lto-partitions
>  Specify desired number of partitions produced during WHOPR compilation.
>  The number of partitions should exceed the number of CPUs used for 
> compilation.
> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
> index 1bf1806eaf8..c25e3395f59 100644
> --- a/gcc/ipa-fnsummary.c
> +++ b/gcc/ipa-fnsummary.c
> @@ -331,6 +331,8 @@ evaluate_conditions_for_known_args (struct cgraph_node 
> *node,
>  {
>tree val;
>tree res;
> +  int j;
> +  struct expr_eval_op *op;
>  
>/* We allow call stmt to have fewer arguments than the callee function
>   (especially for K&R style programs).  So bound check here (we assume
> @@ -382,7 +384,7 @@ evaluate_conditions_for_known_args (struct cgraph_node 
> *node,
> continue;
>   }
>  
> -  if (maybe_ne (tree_to_poly_int64 (TYPE_SIZE (TREE_TYPE (val))), 
> c->size))
> +  if (TYPE_SIZE (c->type) != TYPE_SIZE (TREE_TYPE (val)))
>   {
> clause |= 1 << (i + predicate::first_dynamic_condition);
> nonspec_clause |= 1 << (i + predicate::first_dynamic_condition);
> @@ -394,7 +396,30 @@ evaluate_conditions_for_known_args (struct cgraph_node 
> *node,
> continue;
>   }
>  
> -  val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (c->val), val);
> +  val = fold_unary (VIEW_CONVERT_EXPR, c->type, val);
> +  for (j = 0; vec_safe_iterate (c->param_ops, j, &op); j++)
> + {
> +   if (!val)
> + break;
> +   if (!op->val[0])
> + val = fold_unary (op->code, op->type, val);
> +   else if (!op->val[1])
> + val = fold_binary (op->code, op->type,
> +op->index ? op->val[0] : val,
> +op->index ? val : op->val[0]);
> +   else if (op->index == 0)
> + val = fold_ternary (op->code, op->type,
> + val, op->val[0], op->val[1]);
> +   else if (op->index == 1)
> + val = fold_ternary (op->code, op->type,
> + op->val[0], val, op->val[1]);
> +   else if (op->index == 2)
> + val = fold_ternary (op->code,

Re: [PATCH][wwwdocs] Purge CVS from gccmission.html

2019-10-15 Thread Gerald Pfeifer

On Mon, 14 Oct 2019, Kyrill Tkachov wrote:
>> Surely would be fine with me.
> I see, thanks. Here's a proposed patch then.

My previous mail was meant to pre-approve your patch. ;-)

Yes, this is okay.

Thanks,
Gerald

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Peter Bergner

On 10/15/19 10:44 AM, Richard Biener wrote:
> On October 15, 2019 5:09:52 PM GMT+02:00, Peter Bergner 
>  wrote:
>> If the user explicitly said not to compile a function with a particular
>> option, how can we justify ignoring that request just because we're
>> inlining it?  We don't do that for the out of line version of that
>> callee function.
>
> I wonder how you can tell apart explicit vs. Implicit in the LTO context
> where the option is represented as target attribute on the function.

Ah, so all of the options used to compile the callee in an LTO context,
whether they were implicit or explicit will look explicit?  I agree, that
would be a problem!

Jiufu, can you see if there is a way to determine whether a callee
option in the LTO context really was an explicit option versus an
implicit/default option?  ...or does your follow on patch to my
patch do that?

Peter

Re: [PATCH V4] Extend IPA-CP to support arithmetically-computed value-passing on by-ref argument (PR ipa/91682)

2019-10-15 Thread Philipp Tomsich

Feng,

this now looks fine to me: what is the current schedule to get this merged?

Thanks,
Philipp.

> On 19.09.2019, at 16:30, Feng Xue OS  
> wrote:
> 
> Fix a bug on unary/binary operation check.
> 
> Feng
> ---
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 33d52fe5537..f218f1093b8 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -1244,23 +1244,23 @@ initialize_node_lattices (struct cgraph_node *node)
>   }
> }
> 
> -/* Return the result of a (possibly arithmetic) pass through jump function
> -   JFUNC on the constant value INPUT.  RES_TYPE is the type of the parameter
> -   to which the result is passed.  Return NULL_TREE if that cannot be
> -   determined or be considered an interprocedural invariant.  */
> +/* Return the result of a (possibly arithmetic) operation on the constant
> +   value INPUT.  OPERAND is 2nd operand for binary operation.  RES_TYPE is
> +   the type of the parameter to which the result is passed.  Return
> +   NULL_TREE if that cannot be determined or be considered an
> +   interprocedural invariant.  */
> 
> static tree
> -ipa_get_jf_pass_through_result (struct ipa_jump_func *jfunc, tree input,
> - tree res_type)
> +ipa_get_jf_arith_result (enum tree_code opcode, tree input, tree operand,
> +  tree res_type)
> {
>   tree res;
> 
> -  if (ipa_get_jf_pass_through_operation (jfunc) == NOP_EXPR)
> +  if (opcode == NOP_EXPR)
> return input;
>   if (!is_gimple_ip_invariant (input))
> return NULL_TREE;
> 
> -  tree_code opcode = ipa_get_jf_pass_through_operation (jfunc);
>   if (!res_type)
> {
>   if (TREE_CODE_CLASS (opcode) == tcc_comparison)
> @@ -1274,8 +1274,7 @@ ipa_get_jf_pass_through_result (struct ipa_jump_func 
> *jfunc, tree input,
>   if (TREE_CODE_CLASS (opcode) == tcc_unary)
> res = fold_unary (opcode, res_type, input);
>   else
> -res = fold_binary (opcode, res_type, input,
> -ipa_get_jf_pass_through_operand (jfunc));
> +res = fold_binary (opcode, res_type, input, operand);
> 
>   if (res && !is_gimple_ip_invariant (res))
> return NULL_TREE;
> @@ -1283,6 +1282,21 @@ ipa_get_jf_pass_through_result (struct ipa_jump_func 
> *jfunc, tree input,
>   return res;
> }
> 
> +/* Return the result of a (possibly arithmetic) pass through jump function
> +   JFUNC on the constant value INPUT.  RES_TYPE is the type of the parameter
> +   to which the result is passed.  Return NULL_TREE if that cannot be
> +   determined or be considered an interprocedural invariant.  */
> +
> +static tree
> +ipa_get_jf_pass_through_result (struct ipa_jump_func *jfunc, tree input,
> + tree res_type)
> +{
> +  return ipa_get_jf_arith_result (ipa_get_jf_pass_through_operation (jfunc),
> +   input,
> +   ipa_get_jf_pass_through_operand (jfunc),
> +   res_type);
> +}
> +
> /* Return the result of an ancestor jump function JFUNC on the constant value
>INPUT.  Return NULL_TREE if that cannot be determined.  */
> 
> @@ -1416,6 +1430,146 @@ ipa_context_from_jfunc (ipa_node_params *info, 
> cgraph_edge *cs, int csidx,
>   return ctx;
> }
> 
> +/* See if NODE is a clone with a known aggregate value at a given OFFSET of a
> +   parameter with the given INDEX.  */
> +
> +static tree
> +get_clone_agg_value (struct cgraph_node *node, HOST_WIDE_INT offset,
> +  int index)
> +{
> +  struct ipa_agg_replacement_value *aggval;
> +
> +  aggval = ipa_get_agg_replacements_for_node (node);
> +  while (aggval)
> +{
> +  if (aggval->offset == offset
> +   && aggval->index == index)
> + return aggval->value;
> +  aggval = aggval->next;
> +}
> +  return NULL_TREE;
> +}
> +
> +/* Determine whether ITEM, jump function for an aggregate part, evaluates to 
> a
> +   single known constant value and if so, return it.  Otherwise return NULL.
> +   NODE and INFO describes the caller node or the one it is inlined to, and
> +   its related info.  */
> +
> +static tree
> +ipa_agg_value_from_node (class ipa_node_params *info,
> +  struct cgraph_node *node,
> +  struct ipa_agg_jf_item *item)
> +{
> +  tree value = NULL_TREE;
> +  int src_idx;
> +
> +  if (item->offset < 0 || item->jftype == IPA_JF_UNKNOWN)
> +return NULL_TREE;
> +
> +  if (item->jftype == IPA_JF_CONST)
> +return item->value.constant;
> +
> +  gcc_checking_assert (item->jftype == IPA_JF_PASS_THROUGH
> +|| item->jftype == IPA_JF_LOAD_AGG);
> +
> +  src_idx = item->value.pass_through.formal_id;
> +
> +  if (info->ipcp_orig_node)
> +{
> +  if (item->jftype == IPA_JF_PASS_THROUGH)
> + value = info->known_csts[src_idx];
> +  else
> + value = get_clone_agg_value (node, item->value.load_agg.offset,
> +  src_idx);
> +}
> +  else if (info->lattices)
> +{
> +  class ipcp_param_lattices *sr

Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-10-15 Thread Michael Matz

Hi,

On Tue, 15 Oct 2019, Philipp Tomsich wrote:

> This looks good from our side and has shown useful (combined with the other 2 
> patches) in
> our testing with SPEC2017.
> Given that this looks final: what is the plan for getting this merged?

I'll get to review this v3 version this week.


Ciao,
Michael.

> 
> Thanks,
> Philipp.
> 
> > On 12.09.2019, at 12:23, Feng Xue OS  > com> wrote:
> > 
> > ---
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 1391a562c35..28981fa1048 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -11418,6 +11418,19 @@ The maximum number of branches unswitched in a 
> > single loop.
> > @item lim-expensive
> > The minimum cost of an expensive expression in the loop invariant motion.
> > 
> > +@item max-cond-loop-split-insns
> > +In a loop, if a branch of a conditional statement is selected since certain
> > +loop iteration, any operand that contributes to computation of the 
> > conditional
> > +expression remains unchanged in all following iterations, the statement is
> > +semi-invariant, upon which we can do a kind of loop split transformation.
> > +@option{max-cond-loop-split-insns} controls maximum number of insns to be
> > +added due to loop split on semi-invariant conditional statement.
> > +
> > +@item min-cond-loop-split-prob
> > +When FDO profile information is available, 
> > @option{min-cond-loop-split-prob}
> > +specifies minimum threshold for probability of semi-invariant condition
> > +statement to trigger loop split.
> > +
> > @item iv-consider-all-candidates-bound
> > Bound on number of candidates for induction variables, below which
> > all candidates are considered for each use in induction variable
> > diff --git a/gcc/params.def b/gcc/params.def
> > index 13001a7bb2d..12bc8c26c9e 100644
> > --- a/gcc/params.def
> > +++ b/gcc/params.def
> > @@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
> > "The maximum number of unswitchings in a single loop.",
> > 3, 0, 0)
> > 
> > +/* The maximum number of increased insns due to loop split on 
> > semi-invariant
> > +   condition statement.  */
> > +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> > +   "max-cond-loop-split-insns",
> > +   "The maximum number of insns to be added due to loop split on "
> > +   "semi-invariant condition statement.",
> > +   100, 0, 0)
> > +
> > +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> > +   "min-cond-loop-split-prob",
> > +   "The minimum threshold for probability of semi-invariant condition "
> > +   "statement to trigger loop split.",
> > +   30, 0, 100)
> > +
> > /* The maximum number of insns in loop header duplicated by the copy loop
> >headers pass.  */
> > DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
> > 
> > diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C 
> > b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> > new file mode 100644
> > index 000..51f9da22fc7
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> > @@ -0,0 +1,33 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> > +
> > +#include 
> > +#include 
> > +
> > +using namespace std;
> > +
> > +class  A
> > +{
> > +public:
> > +  bool empty;
> > +  void set (string s);
> > +};
> > +
> > +class  B
> > +{
> > +  map m;
> > +  void f ();
> > +};
> > +
> > +extern A *ga;
> > +
> > +void B::f ()
> > +{
> > +  for (map::iterator iter = m.begin (); iter != m.end (); 
> > ++iter)
> > +{
> > +  if (ga->empty)
> > +ga->set (iter->second);
> > +}
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } 
> > } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> > new file mode 100644
> > index 000..bbd522d6bcd
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> > +
> > +__attribute__((pure)) __attribute__((noinline)) int inc (int i)
> > +{
> > +  return i + 1;
> > +}
> > +
> > +extern int do_something (void);
> > +extern int b;
> > +
> > +void test(int n)
> > +{
> > +  int i;
> > +
> > +  for (i = 0; i < n; i = inc (i))
> > +{
> > +  if (b)
> > +b = do_something();
> > +}
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } 
> > } */
> > diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> > index f5f083384bc..e4a1b6d2019 100644
> > --- a/gcc/tree-ssa-loop-split.c
> > +++ b/gcc/tree-ssa-loop-split.c
> > @@ -32,7 +32,10 @@ along with GCC; see the file COPYING3.  If not see
> > #include "tree-ssa-loop.h"
> > #include "tree-ssa-loop-manip.h"
> > #include "tree-into-ssa.h"
> > +#include "tree-inline.h"
> > +#include "tree-cfgcleanup.h"
> > #include "cfgloop.h"
> > +#include "params.h"
> > #include "tree-scalar-evolution.h"
> > #include "gimple-i

Re: [PATCH V3] Loop split upon semi-invariant condition (PR tree-optimization/89134)

2019-10-15 Thread Philipp Tomsich

Feng,

This looks good from our side and has shown useful (combined with the other 2 
patches) in
our testing with SPEC2017.
Given that this looks final: what is the plan for getting this merged?

Thanks,
Philipp.

> On 12.09.2019, at 12:23, Feng Xue OS  
> wrote:
> 
> ---
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1391a562c35..28981fa1048 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11418,6 +11418,19 @@ The maximum number of branches unswitched in a 
> single loop.
> @item lim-expensive
> The minimum cost of an expensive expression in the loop invariant motion.
> 
> +@item max-cond-loop-split-insns
> +In a loop, if a branch of a conditional statement is selected since certain
> +loop iteration, any operand that contributes to computation of the 
> conditional
> +expression remains unchanged in all following iterations, the statement is
> +semi-invariant, upon which we can do a kind of loop split transformation.
> +@option{max-cond-loop-split-insns} controls maximum number of insns to be
> +added due to loop split on semi-invariant conditional statement.
> +
> +@item min-cond-loop-split-prob
> +When FDO profile information is available, @option{min-cond-loop-split-prob}
> +specifies minimum threshold for probability of semi-invariant condition
> +statement to trigger loop split.
> +
> @item iv-consider-all-candidates-bound
> Bound on number of candidates for induction variables, below which
> all candidates are considered for each use in induction variable
> diff --git a/gcc/params.def b/gcc/params.def
> index 13001a7bb2d..12bc8c26c9e 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -386,6 +386,20 @@ DEFPARAM(PARAM_MAX_UNSWITCH_LEVEL,
>   "The maximum number of unswitchings in a single loop.",
>   3, 0, 0)
> 
> +/* The maximum number of increased insns due to loop split on semi-invariant
> +   condition statement.  */
> +DEFPARAM(PARAM_MAX_COND_LOOP_SPLIT_INSNS,
> + "max-cond-loop-split-insns",
> + "The maximum number of insns to be added due to loop split on "
> + "semi-invariant condition statement.",
> + 100, 0, 0)
> +
> +DEFPARAM(PARAM_MIN_COND_LOOP_SPLIT_PROB,
> + "min-cond-loop-split-prob",
> + "The minimum threshold for probability of semi-invariant condition "
> + "statement to trigger loop split.",
> + 30, 0, 100)
> +
> /* The maximum number of insns in loop header duplicated by the copy loop
>headers pass.  */
> DEFPARAM(PARAM_MAX_LOOP_HEADER_INSNS,
> 
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C 
> b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> new file mode 100644
> index 000..51f9da22fc7
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/loop-cond-split-1.C
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> +
> +#include 
> +#include 
> +
> +using namespace std;
> +
> +class  A
> +{
> +public:
> +  bool empty;
> +  void set (string s);
> +};
> +
> +class  B
> +{
> +  map m;
> +  void f ();
> +};
> +
> +extern A *ga;
> +
> +void B::f ()
> +{
> +  for (map::iterator iter = m.begin (); iter != m.end (); 
> ++iter)
> +{
> +  if (ga->empty)
> +ga->set (iter->second);
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } 
> */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> new file mode 100644
> index 000..bbd522d6bcd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-cond-split-1.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-lsplit-details" } */
> +
> +__attribute__((pure)) __attribute__((noinline)) int inc (int i)
> +{
> +  return i + 1;
> +}
> +
> +extern int do_something (void);
> +extern int b;
> +
> +void test(int n)
> +{
> +  int i;
> +
> +  for (i = 0; i < n; i = inc (i))
> +{
> +  if (b)
> +b = do_something();
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump-times "split loop 1 at branch" 1 "lsplit" } } 
> */
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index f5f083384bc..e4a1b6d2019 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -32,7 +32,10 @@ along with GCC; see the file COPYING3.  If not see
> #include "tree-ssa-loop.h"
> #include "tree-ssa-loop-manip.h"
> #include "tree-into-ssa.h"
> +#include "tree-inline.h"
> +#include "tree-cfgcleanup.h"
> #include "cfgloop.h"
> +#include "params.h"
> #include "tree-scalar-evolution.h"
> #include "gimple-iterator.h"
> #include "gimple-pretty-print.h"
> @@ -40,7 +43,9 @@ along with GCC; see the file COPYING3.  If not see
> #include "gimple-fold.h"
> #include "gimplify-me.h"
> 
> -/* This file implements loop splitting, i.e. transformation of loops like
> +/* This file implements two kinds of loop splitting.
> +
> +   One transformation of loops like:
> 
>for (i = 0; i < 100; i++)
>  {
> @@ -61

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Richard Biener

On October 15, 2019 5:09:52 PM GMT+02:00, Peter Bergner  
wrote:
>On 10/15/19 4:32 AM, Richard Biener wrote:
>> I believe this is going to bite you exactly in the case you want the
>> opposite behavior.  If you have CUs compiled with defaults and
>> a specialized one with VSX that calls into generic compiled functions
>> you _do_ want to allow inlining into the VSX enabled routines.
>
>First off, there's nothing special about VSX/non-VSX here, so when I
>talk
>about VSX below, I'm really just using it as a stand-in for any option.
>
>So what you're saying is that a VSX enabled function that calls a
>non-VSX
>enabled function should be able to inline that non-VSX function.  That
>is
>what the current code allows and I agree that should still be allowed.
>My extra code only disallows that scenario *IF* the user explicitly
>said
>DO NOT compile the callee function with VSX.  It still allows it to be
>inlined if the callee was compiled without VSX implicitly / by default,
>so I don't think my patch disallows the scenario you mention above.
>
>If the user explicitly said not to compile a function with a particular
>option, how can we justify ignoring that request just because we're
>inlining it?  We don't do that for the out of line version of that
>callee
>function.

You can probably tell whether there's an explicit -mno-vsx on the command line 
I wonder how you can tell apart explicit vs. Implicit in the LTO context where 
the option is represented as target attribute on the function.

>> How can it be fatal to inline a non-VSX function into a VSX one?
>
>I can think of scenarios where it could be fatal (again, VSX is just a
>stand-in for any option), but maybe the user used -mno-vsx for
>performance
>reasons or maybe this is kernel code and the user knows this function
>will
>be run with VSX hardware disabled or ...
>
>
>Peter

Re: [PATCH] OpenACC reference count overhaul

2019-10-15 Thread Thomas Schwinge

Hi Julian!

On 2019-10-03T09:35:04-0700, Julian Brown  wrote:
> This patch has been broken out of the patch supporting OpenACC 2.6 manual
> deep copy last posted here:
>
>   https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01084.html

Thanks.

> a couple of
> tests need fixing also

Let's look at these first, and independently.

The overall goal not being to bend test cases until they (again) work,
but rather to verify what they're testing, so that they're valid OpenACC
code, or if not that, then they're testing specifics of the GCC
implementation (for example, the 'dg-shouldfail' test cases).

>   * testsuite/libgomp.oacc-c-c++-common/context-2.c: Use correct API to
>   deallocate acc_copyin'd data.
>   * testsuite/libgomp.oacc-c-c++-common/context-4.c: Likewise.

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/context-2.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/context-2.c

> +acc_delete (&h_X[0], N * sizeof (float));
> +acc_delete (&h_Y1[0], N * sizeof (float));
> +
>  free (h_X);
>  free (h_Y1);
>  free (h_Y2);
>  
> -acc_free (d_X);
> -acc_free (d_Y);

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/context-4.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/context-4.c

> +acc_delete (&h_X[0], N * sizeof (float));
> +acc_delete (&h_Y1[0], N * sizeof (float));
> +
>  free (h_X);
>  free (h_Y1);
>  free (h_Y2);
>  
> -acc_free (d_X);
> -acc_free (d_Y);

ACK -- but do we understand why the same shouldn't be applied to the very
similar 'libgomp.oacc-c-c++-common/context-1.c' and
'libgomp.oacc-c-c++-common/context-3.c', too?

I suppose your testing of the "OpenACC reference count overhaul" tripped
over these constructs?  (Why just some, then?)

The same pattern ('acc_copyin', 'acc_free') also appears in
'libgomp.oacc-c-c++-common/clauses-1.c', does that also need to be
corrected?  Same in 'libgomp.oacc-c-c++-common/lib-13.c' (... where that
test case actually is titled "Check acc_is_present and acc_delete"
instead of "[...] acc_free", huh), 'libgomp.oacc-c-c++-common/lib-14.c',
'libgomp.oacc-c-c++-common/lib-18.c'.

Then, the 'acc_deviceptr', 'acc_unmap_data', 'acc_free' usage in
'libgomp.oacc-c-c++-common/clauses-1.c' also seems strange, as the
respective 'acc_free' argument certainly is not (at least not directly) a
"pointer value that was returned by a call to 'acc_malloc'".  Does it
make sense to (continue to) support that, assuming that's how it's
implemented internally, or should these be corrected to valid OpenACC,
too?  Same in 'libgomp.oacc-c-c++-common/present-1.c'.

Same in 'libgomp.oacc-c-c++-common/clauses-2.c' (we 'dg-shouldfail'
earlier, but the later code should otherwise be made correct anyway).

Several of these things again in 'libgomp.oacc-c-c++-common/nested-1.c'.

(The other 'libgomp.oacc-c-c++-common/lib-*.c' ones are correctly pairing
'acc_malloc', 'acc_free', as far as I can tell.)

> --- a/libgomp/testsuite/libgomp.oacc-fortran/data-2.f90
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/data-2.f90

> @@ -70,10 +71,14 @@ program test
>  end do
>!$acc end parallel
>
> -  !$acc exit data copyout (d(1:N)) async
> +  !$acc exit data delete (c(1:N)) copyout (d(1:N)) async
>!$acc exit data async
>!$acc wait

ACK, but also it seems to me as if the '!$acc exit data async' (currently
"clause-less") was meant to carry the 'delete (c(1:N))' clause?

> @@ -1,4 +1,5 @@
>  ! { dg-do run }
> +! { dg-additional-options "-cpp" }

> [...]

> +#if !ACC_MEM_SHARED
> +  if (acc_is_present (c) .eqv. .TRUE.) call abort
> +#endif

;-) Should be able to simplify that one to 'if (acc_is_present (c))', no?

But is that a really useful test here: don't we elsewhere have enough of
such 'acc_is_present' testing?  (That is, OK to keep that, but likewise
OK to drop that.)

And, just for background information: per PR84381, it has been suggested
to use the Fortran standard 'stop' (or was it 'error stop'?) instead of
'call abort'.  But no need to change that here individually; the libgomp
testsuite still (or, again?)  contains a lot of 'call abort'.

> +
>do i = 1, N
>  if (d(i) .ne. 4.0) call abort
>end do

..., for example, here.  ;-) (For avoidance of doubt, I'm not asking you
to change these now.)

So, please address these items first, as separate "Fix OpenACC test cases
regarding 'acc_malloc', 'acc_free' pairing", and "Fix OpenACC test case
for unstructured data regions" (or similar) commits.  If you're confident
you're doing "the obvious", feel free to commit without further review.

Grüße
 Thomas

signature.asc
Description: PGP signature

Re: Type representation in CTF and DWARF

2019-10-15 Thread Nick Alcock

On 9 Oct 2019, Indu Bhagat told this:

> Yes, CTF does not support C++ at this time. To cover all of C (including
> GNU C extensions), we need to add representation for things like Vector type,
> non IEEE float etc. (somewhat infrequently occurring constructs)

One note: adding C++ support will not make the representation of CTF for
C any larger, because I plan to do as DWARF does and have a language tag
in the header, and only support one language per CTF dictionary[1].  The
type section format will otherwise be completely distinct between the
two languages, specifically in order that the C side of things not pay
the price for the (necessarily richer) C++ type representation. This is
very much a C++ thing: don't pay for what you don't use :)

So there's no need to worry that adding C++ support will make any C
compactness figures worse. You only need to consider that the C++ CTF
representation may not be able to be as compact as the C representation
-- and even there I hope to come close.

[1! though there is a possibility of having a C++ dictionary cite types
from a C one, allowing some sharing: this is all format v5 stuff,
i.e. two format revs away, and this bit in particular is not yet
designed, but feels possible.)

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Peter Bergner

On 10/15/19 4:32 AM, Richard Biener wrote:
> I believe this is going to bite you exactly in the case you want the
> opposite behavior.  If you have CUs compiled with defaults and
> a specialized one with VSX that calls into generic compiled functions
> you _do_ want to allow inlining into the VSX enabled routines.

First off, there's nothing special about VSX/non-VSX here, so when I talk
about VSX below, I'm really just using it as a stand-in for any option.

So what you're saying is that a VSX enabled function that calls a non-VSX
enabled function should be able to inline that non-VSX function.  That is
what the current code allows and I agree that should still be allowed.
My extra code only disallows that scenario *IF* the user explicitly said
DO NOT compile the callee function with VSX.  It still allows it to be
inlined if the callee was compiled without VSX implicitly / by default,
so I don't think my patch disallows the scenario you mention above.

If the user explicitly said not to compile a function with a particular
option, how can we justify ignoring that request just because we're
inlining it?  We don't do that for the out of line version of that callee
function.

> How can it be fatal to inline a non-VSX function into a VSX one?

I can think of scenarios where it could be fatal (again, VSX is just a
stand-in for any option), but maybe the user used -mno-vsx for performance
reasons or maybe this is kernel code and the user knows this function will
be run with VSX hardware disabled or ...

Peter

Re: [Patch][Fortran] OpenMP+OpenACC: Remove bogus contigous-pointer check (PR/65438)

2019-10-15 Thread Jakub Jelinek

On Tue, Oct 15, 2019 at 04:10:34PM +0200, Tobias Burnus wrote:
> But for "omp update" with noncontiguous memory, it is not clear what's
> faster:
> * Simply updating one contiguous memory block, enclosing the gaps
> * Doing multiple updates of contiguous memory to take advantage of memory
> gaps.

omp target update with non-contiguous array sections isn't supported yet
even for C/C++, and when it is, it would be nice to handle it efficiently.
We already have omp_target_memcpy_rect function that can copy non-contiguous
data, but again there is no plugin API to allow at least for the common
cases that some hw is able to handle (perhaps 2 or 3 dimensions).  Can at
least CUDA handle that?  AMD GCN?
Anyway, I'm afraid that is left for next year.

Jakub

Re: [Patch][Fortran] OpenMP+OpenACC: Remove bogus contigous-pointer check (PR/65438)

2019-10-15 Thread Tobias Burnus


Hi Thomas,

On 10/15/19 3:07 PM, Thomas Schwinge wrote:

(Of cause, it is also run-time checkable

OK, I was about to ask for that, if that makes sense to do.


The question is for what. One could add it for run-time checking (e.g. 
for gfortran "-fcheck=…). Or for actions like "omp update", which 
permits strides/noncontiguous memory.


But for "omp update" with noncontiguous memory, it is not clear what's 
faster:

* Simply updating one contiguous memory block, enclosing the gaps
* Doing multiple updates of contiguous memory to take advantage of 
memory gaps.


Depending how large the gaps between the (contiguous) chunks are, one 
method or the other is faster. Thus, I don't think we do need this any 
time soon. At least for the OpenMP code I have seen, some additional 
compile-time checks make more sense before adding a run-time check. [No 
idea about the OpenACC FE code, yet.]



Is there any point in adding a test case for such constructs, in 
particular, which the current code would've (erroneously) flagged 
as"Noncontiguous deferred shape array", to codify/document that this 
is supported? 


My feeling is no – but if you think it makes sense. A simple example is 
the following:


integer, target :: A(100)
integer, pointer :: ptr(:)
ptr => A
!$acc data copy(ptr)
ptr(1) = ptr(2)
!$acc end data
end

Before, it failed with:

    4 | !$acc data copy(ptr)
  |    1
Error: Noncontiguous deferred shape array ‘ptr’ in MAP clause at (1)


I will now commit it as is, i.e. without a test case. If needed, one can 
still add it as additional commit.


Tobias

Re: [Patch][Fortran] OpenMP+OpenACC: Remove bogus contigous-pointer check (PR/65438)

2019-10-15 Thread Thomas Schwinge

Hi Tobias!

On 2019-10-15T11:42:12+0200, Tobias Burnus  wrote:
> Permit more valid code by removing a too tight constraint – simple 
> patch, very long background & history (feel free to skip).

Thanks for writing that down, that's helpful to have in the archives.
(Certainly helps me, learning Fortran piece by piece.)  ;-)

> openmp.c's "check_array_not_assumed" shows an error for:
> * assumed-size arrays  (makes totally sense)
> * assumed-rank arrays (makes sense as long as TS29113/F2018 is not 
> supported)
> * pointers which do not have the "contiguous" attribute
>
> Not only function name wise, the latter is the odd one out. While in 
> many cases a contiguous memory is required, one can either make it a 
> compile-time testable constraint ('simply contiguous') – or one puts the 
> onus is on the programmer; the latter seems to be done in the 
> OpenACC/OpenMP specs.
>
> (Of cause, it is also run-time checkable

OK, I was about to ask for that, if that makes sense to do.

> and gfortran uses a run-time 
> test when passing a potentially noncontiguous array to a dummy argument 
> with "contiguous" attribute; if the actual argument is contiguous, it is 
> passed as is - otherwise, a copy-in and copy-out is performed [for 
> intent(inout)].)

I read this such that what we've got is sufficient, no further run-time
checking is needed.

> HENCE: Remove a compile-time check when both specs (OpenMP/OpenACC) put 
> the onus on the programmer. As compile-time checks are only a subset, we 
> reject valid contiguous memory which is just not simply contiguous.

Is there any point in adding a test case for such constructs, in
particular, which the current code would've (erroneously) flagged as
"Noncontiguous deferred shape array", to codify/document that this is
supported?

> Additionally, the check only works if only an identifier is permitted as 
> argument (it checks the symbol). For derived-type components, the check 
> is not effective. (Assumed rank and assumed size are properties of dummy 
> arguments, hence, those work fine.)

> OK for the trunk?

I'll gladly defer ;-) to your Fortran expertise as well as OpenACC
specification interpretation.  (Test case, if feasible, pre-approved
anyway.)

> PS: This patch comes from the OG9 branch as 
> ac6c90812344f4f4cfe4d2f5901c1a9d038a4000 but was actually already in the 
> OG8 branch with commit d50862a7522446cf101eac9dc2e32967dc8318b7 from 
> 2015 (!)

(Just for the record, before that on gomp-4_0-branch, and before that in
an internal development branch.)  ;-)

> – I have used the ChangeLog entry from the latter.

ACK.  Given that you've now re-researched all the rationale, it's
certainly fine to add your name, too.  The ChangeLog date needs to be the
actual commit date.


Grüße
 Thomas


signature.asc
Description: PGP signature

Re: [patch] canonicalize unsigned [1,MAX] ranges into ~[0,0]

2019-10-15 Thread Iain Sandoe

Hi Aldy,

Rainer Orth  wrote:
>> 
>> On 10/15/19 7:58 AM, Rainer Orth wrote:
>>> Hi Aldy,
>>> 
>> ~[0,0] has been the accepted way for a long time, I'd really prefer to
>> keep that (for now).
> It has.  Very true.  But I don't necessarily think that means we should
> be introducing even more of 'em.
>>> [...]
 Happily, normalizing into ~0 for signed and [1,MAX] for unsigned,
 simplified the patch because it on longer needs tweaks to
 ranges_from_anti_range.
 
 OK for trunk?
>>> 
>>> the new testcase FAILs on several (all?) 32-bit targets:
>>> 
>>> +FAIL: gcc.dg/tree-ssa/evrp4.c scan-tree-dump evrp "[1B, -1B]"
>> 
>> That's unfortunate.
>> 
>> Is this the only test that is failing?
> 
> it's the only on on Solaris/SPARC and Solaris/x86.  Haven't checked
> other affected targets, though.
> 
>>> I'm seeing this on 32-bit i386-pc-solaris2.11 and sparc-sun-solaris2.11,
>>> with more reports for armv8l, pru, and s390x.
>>> 
>>> Comparing the dumps between 64 and 32-bit, I see
>>> 
>>> -_1: int * [1B, -1B]
>>> +_1: int * [1B, 4294967295B]
>> 
>> I wonder why 32-bit targets at displaying 4294967295 instead of -1.  Or are
>> pointers 64-bits here?
> 
> No, it's a pure 32-bit target.  The compiler is 32-bit, too, but
> bi-arch (32 and 64-bit).

identical result on i686-darwin9, also a pure32b target (with a 64b multilb).
Iain

Re: [PATCH 2/2][MSP430] Optimize zero_extend insns and PSImode pointer manipulation

2019-10-15 Thread Jozef Lawrynowicz

On Mon, Oct 14, 2019 at 10:22 PM Jeff Law  wrote:
>
> On 10/8/19 4:39 AM, Jozef Lawrynowicz wrote:
> > This patch has the functional changes to optimize zero_extend insns and 
> > pointer
> > manipulation in the large memory model.
> >
> >
> > 0002-MSP430-PSImode-pointer-manipulation-and-zero-extend-.patch
> >
> > From f8156e115c4743ce94a86835ffa5601b6d28a555 Mon Sep 17 00:00:00 2001
> > From: Jozef Lawrynowicz 
> > Date: Mon, 7 Oct 2019 11:44:16 +0100
> > Subject: [PATCH 2/2] MSP430: PSImode pointer manipulation and zero extend 
> > insn
> >  optimizations
> >
> > gcc/ChangeLog:
> >
> > 2019-10-08  Jozef Lawrynowicz  
> >
> >   * config/msp430/msp430.md (movqipsi): New.
> >   (zero_extendqipsi2): New.
> >   (zero_extendqisi2): Optimize case where src register and base dst
> >   register are the same.
> >   (zero_extendhipsi2): Don't use 430X insn for rYs->r case.
> >   (zero_extendpsisi2): Optimize r->m case.
> >   Add unnamed insn patterns to catch insns combine searches for when
> >   optimizing pointer manipulation.
> So you've got a movqipsi and zero_extendqipsi2 with the same RTL
> structure.  ISTM the movqipsi should just get removed.

Thanks, fixed and applied.
Jozef
>
>
> OK with that change.
>
> jeff

[PATCH] rs6000: Fix PR92093

2019-10-15 Thread Bill Schmidt


The test case added for PR91275 fails on big-endian because 
__builtin_crypto_vpmsumd
is not a bi-endian intrinsic; source code must account for endian differences 
when
calling this intrinsic.  Fixed this and a type issue that only shows up on 
32-bit.
I thought I had previously tested this on P8 BE, but clearly not.

Tested on powerpc64-unknown-linux-gnu, committed as obvious.  Will backport 
soon.

Thanks,
Bill


2019-10-15  Bill Schmidt  

PR target/92093
* gcc.target/powerpc/pr91275.c: Fix type and endian issues.


Index: gcc/testsuite/gcc.target/powerpc/pr91275.c
===
--- gcc/testsuite/gcc.target/powerpc/pr91275.c  (revision 276968)
+++ gcc/testsuite/gcc.target/powerpc/pr91275.c  (working copy)
@@ -10,12 +10,17 @@ int main() {
   const unsigned long long r0l = 0x8e7dfceac070e3a0;
   vector unsigned long long r0 = (vector unsigned long long) {r0l, 0}, v;
   const vector unsigned long long pd
-= (vector unsigned long) {0xc2LLU << 56, 0};
+= (vector unsigned long long) {0xc2LLU << 56, 0};
 
   v = __builtin_crypto_vpmsumd ((vector unsigned long long) {r0[0], 0}, pd);
 
+#if __LITTLE_ENDIAN__

   if (v[0] != 0x4000 || v[1] != 0x65bd7ab605a4a8ff)
 __builtin_abort ();
+#else
+  if (v[1] != 0x4000 || v[0] != 0x65bd7ab605a4a8ff)
+__builtin_abort ();
+#endif
 
   return 0;

 }

Re: [PATCH 0/2][MSP430] Optimize zero_extend insns and PSImode pointer manipulation

2019-10-15 Thread Jozef Lawrynowicz

On Mon, 14 Oct 2019 15:18:08 -0600
Jeff Law  wrote:

> On 10/8/19 4:34 AM, Jozef Lawrynowicz wrote:
> > In the large memory model, MSP430 instructions have some useful properties 
> > when
> > performing byte, word or address-word writes to registers or memory:
> > - Byte-writes to registers clear bits 19:8
> > - Word-writes to registers clear bits 19:16
> > - PSImode writes to memory clear bits 16:4 of the second memory word
> > 
> > This patch makes use of these properties to optimize some zero_extend
> > instructions.
> > 
> > There are some "synonyms" for these zero_extend instructions that combine
> > searches for when optimizing code which manipulates PSImode pointers. The 
> > patch
> > adds a number of these unnamed RTL insns.
> > 
> > The first patch is an "obvious" patch with no functional changes, which just
> > reorders the zero_extend insns in the md file so we get them in one place.
> > The second patch has functional changes.
> > 
> > (Note that the patches will not apply cleanly unless the recently submitted
> > patch to implement post increment addressing has been applied:
> > https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00492.html)
> > 
> > Successfully regtested on trunk in the small and large memory models.
> > 
> > Ok for trunk?
> > 
> > Jozef Lawrynowicz (2):
> >   MSP430: Reorder and group zero_extend insns in msp430.md
> >   MSP430: PSImode pointer manipulation and zero extend insn
> > optimizations  
> FWIW, we often end up describing some of this stuff via
> WORD_REGISTER_OPERATIONS.  However that may not work for word sizes that
> are not a power of two bits.
> 
> I only mention it because it may be worth a bit of experimentation on
> your end.

We have WORD_REGISTER_OPERATIONS defined to 1, but UNITS_PER_WORD is always 2
bytes. But yes I should look at how the RTL passes use this macro.

I played around with some other macros e.g. TARGET_TRULY_NOOP_TRUNCATION,
REGMODE_NATURAL_SIZE but none had any desirable effect. It feels like there is
still something that needs to be fixed which will unlock better RTL generation
for PSImode operations, but I believe I've exhausted the target macros which
can influence this. So maybe something in the middle-end needs to be added or
extended.
I should also study some of the other ports with PSImode registers more closely.

P.S. I took at the old mn10200 port as you suggested before but I wasn't able
to improve code gen by changing any of the target macros that were defined
differently. The combiner patterns for pointer manipulation in the port were
similar but not close enough to the patterns I added in this patch.

Thanks,
Jozef
> 
> jeff

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Jiufu Guo

Segher Boessenkool  writes:

> Hi!
>
> On Tue, Oct 15, 2019 at 05:15:07PM +0800, Jiufu Guo wrote:
>> And another issue: Behavior is still inconsistent between "-mno-vsx
>> -flto" and "-mno-vsx" for user code. Previous patch makes it consistent
>> between "-mvsx -flto" and "-mvsx". 
>
>>  $GCC_BUILD/gcc/xgcc -B$GCC_BUILD/gcc novsx.c -O2 -mno-vsx
>> /home/guojiufu/temp/novsx.c: In function 'main':
>> /home/guojiufu/temp/novsx.c:6:1: error: inlining failed in call to
>> 'always_inline' 'foo': target specific option mismatch
>
> So what should we do about this?  There are arguments for *both*
> behaviours, and apparently with LTO we do not know which flags are
> explicit?
With LTO, rs6000_isa_flags_explicit and rs6000_isa_flags are also
checkable. rs6000_isa_flags indicates effective features;
bits of rs6000_isa_flags_explicit indicate if it is specified
explicitly.

The difference between 'with LTO' and 'without LTO' is:
Wit LTO, if a function does not have target attribute in source code,
then using option attribute from command line(e.g. -mcpu=xxx) as
isa_flags; and if features (e.g. -mno-vsx, -mvsx) are specified on
command line, isa_flags_explicit is also set.
If a function has target attribute in source code explicitly,
then isa_flags_explicit is set to reflect feature is specified
explicitly besides explicit features from command line; and isa_flags is
set as the effective features.

Without LTO, if a function does not have target attribute in source code,
then it's isa_flags_explicit and isa_flags are as NULL (target_options
== NULL). 
If a function has target attribute in source code, it is same as 'with
LTO'. 

>
>> +  /* Propogate global flags to caller.  */
>> +  HOST_WIDE_INT caller_isa = rs6000_isa_flags;
>
> I don't think that is right, or, I don't see why, anyway?
Propogating global flags is exiting behavior for function during LTO
compiling; and global flags are also used to caculate
isa_flags/isa_flags_explicit for function which has target attribute in
source code.
So, it would make sense to using global flags for caller here.
One more exmaple to explain this:
For example a function "int foo () {x;}", to compile it using
command "gcc -mcpu=power8 -mno-vsx"; we should using power8+no-vsx
instructions for this function. 
>
>> +
>> +  if (((caller_isa & callee_isa) == callee_isa)
>> + && (caller_isa & explicit_isa) == (callee_isa & explicit_isa))
>> +   ret = true;
>> +}
>
>
> Segher

Re: [patch] canonicalize unsigned [1,MAX] ranges into ~[0,0]

2019-10-15 Thread Rainer Orth

Hi Aldy,

> On 10/15/19 7:58 AM, Rainer Orth wrote:
>> Hi Aldy,
>>
> ~[0,0] has been the accepted way for a long time, I'd really prefer to
> keep that (for now).
 It has.  Very true.  But I don't necessarily think that means we should
 be introducing even more of 'em.
>> [...]
>>> Happily, normalizing into ~0 for signed and [1,MAX] for unsigned,
>>> simplified the patch because it on longer needs tweaks to
>>> ranges_from_anti_range.
>>>
>>> OK for trunk?
>>
>> the new testcase FAILs on several (all?) 32-bit targets:
>>
>> +FAIL: gcc.dg/tree-ssa/evrp4.c scan-tree-dump evrp "[1B, -1B]"
>
> That's unfortunate.
>
> Is this the only test that is failing?

it's the only on on Solaris/SPARC and Solaris/x86.  Haven't checked
other affected targets, though.

>> I'm seeing this on 32-bit i386-pc-solaris2.11 and sparc-sun-solaris2.11,
>> with more reports for armv8l, pru, and s390x.
>>
>> Comparing the dumps between 64 and 32-bit, I see
>>
>> -_1: int * [1B, -1B]
>> +_1: int * [1B, 4294967295B]
>
> I wonder why 32-bit targets at displaying 4294967295 instead of -1.  Or are
> pointers 64-bits here?

No, it's a pure 32-bit target.  The compiler is 32-bit, too, but
bi-arch (32 and 64-bit).

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [patch] canonicalize unsigned [1,MAX] ranges into ~[0,0]

2019-10-15 Thread Aldy Hernandez





On 10/15/19 7:58 AM, Rainer Orth wrote:

Hi Aldy,


~[0,0] has been the accepted way for a long time, I'd really prefer to
keep that (for now).

It has.  Very true.  But I don't necessarily think that means we should
be introducing even more of 'em.

[...]

Happily, normalizing into ~0 for signed and [1,MAX] for unsigned,
simplified the patch because it on longer needs tweaks to
ranges_from_anti_range.

OK for trunk?


the new testcase FAILs on several (all?) 32-bit targets:

+FAIL: gcc.dg/tree-ssa/evrp4.c scan-tree-dump evrp "[1B, -1B]"


That's unfortunate.

Is this the only test that is failing?



I'm seeing this on 32-bit i386-pc-solaris2.11 and sparc-sun-solaris2.11,
with more reports for armv8l, pru, and s390x.

Comparing the dumps between 64 and 32-bit, I see

-_1: int * [1B, -1B]
+_1: int * [1B, 4294967295B]


I wonder why 32-bit targets at displaying 4294967295 instead of -1.  Or 
are pointers 64-bits here?


I wonder if we should just change value_range_base::dump() to dump 
~[0,0] for ::nonzero_p(), that way we have a consistent way of 
*displaying* non-zeroness for tests.


Aldy

[C++ PATCH] clone_function_decl breakup

2019-10-15 Thread Nathan Sidwell

This patch, from the modules branch, breaks out function cloning from 
the method vector updating.  We have a new function, build_clones, which 
does the building, returning a count of the number of clones (2 or 3). 
clone_function_decl separately adds them to the method vector, if they 
should be added.  I suppose this could have used FOR_EVERY_CLONE, but I 
went with the counting scheme.


On the module branch, build_clones is not static and has a few more 
parameters, but that's a different patch.


installing on trunk.

nathan
--
Nathan Sidwell
2019-10-15  Nathan Sidwell  

	* class.c (build_clones): Break out of clone_function_decl.  Just
	build the clones.
	(clone_function_decl): Call build_clones, then maybe add them to
	the method vector.

Index: class.c
===
--- class.c	(revision 276992)
+++ class.c	(working copy)
@@ -4700,17 +4700,11 @@ build_clone (tree fn, tree name)
 }
 
-/* Produce declarations for all appropriate clones of FN.  If
-   UPDATE_METHODS is true, the clones are added to the
-   CLASSTYPE_MEMBER_VEC.  */
+/* Build the clones of FN, return the number of clones built.  These
+   will be inserted onto DECL_CHAIN of FN.  */
 
-void
-clone_function_decl (tree fn, bool update_methods)
+unsigned
+build_clones (tree fn)
 {
-  tree clone;
-
-  /* Avoid inappropriate cloning.  */
-  if (DECL_CHAIN (fn)
-  && DECL_CLONED_FUNCTION_P (DECL_CHAIN (fn)))
-return;
+  unsigned count = 0;
 
   if (DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (fn))
@@ -4718,10 +4712,7 @@ clone_function_decl (tree fn, bool updat
   /* For each constructor, we need two variants: an in-charge version
 	 and a not-in-charge version.  */
-  clone = build_clone (fn, complete_ctor_identifier);
-  if (update_methods)
-	add_method (DECL_CONTEXT (clone), clone, false);
-  clone = build_clone (fn, base_ctor_identifier);
-  if (update_methods)
-	add_method (DECL_CONTEXT (clone), clone, false);
+  build_clone (fn, complete_ctor_identifier);
+  build_clone (fn, base_ctor_identifier);
+  count += 2;
 }
   else
@@ -4740,18 +4731,38 @@ clone_function_decl (tree fn, bool updat
   if (DECL_VIRTUAL_P (fn))
 	{
-	  clone = build_clone (fn, deleting_dtor_identifier);
-	  if (update_methods)
-	add_method (DECL_CONTEXT (clone), clone, false);
+	  build_clone (fn, deleting_dtor_identifier);
+	  count++;
 	}
-  clone = build_clone (fn, complete_dtor_identifier);
-  if (update_methods)
-	add_method (DECL_CONTEXT (clone), clone, false);
-  clone = build_clone (fn, base_dtor_identifier);
-  if (update_methods)
-	add_method (DECL_CONTEXT (clone), clone, false);
+  build_clone (fn, complete_dtor_identifier);
+  build_clone (fn, base_dtor_identifier);
+  count += 2;
 }
 
+  return count;
+}
+
+/* Produce declarations for all appropriate clones of FN.  If
+   UPDATE_METHODS is true, the clones are added to the
+   CLASSTYPE_MEMBER_VEC.  */
+
+void
+clone_function_decl (tree fn, bool update_methods)
+{
+  /* Avoid inappropriate cloning.  */
+  if (DECL_CHAIN (fn)
+  && DECL_CLONED_FUNCTION_P (DECL_CHAIN (fn)))
+return;
+
+  unsigned count = build_clones (fn);
+
   /* Note that this is an abstract function that is never emitted.  */
   DECL_ABSTRACT_P (fn) = true;
+
+  if (update_methods)
+for (tree clone = fn; count--;)
+  {
+	clone = DECL_CHAIN (clone);
+	add_method (DECL_CONTEXT (clone), clone, false);
+  }
 }

[PATCH] Avoid writing to trees during streaming

2019-10-15 Thread Richard Biener



Honza figured that variably_modified_type_p uses TREE_VISITED
to not run into an Ada abdomination.  That causes havoc during
WPA streaming which happens in multiple forked processes and
thus causes a lot of COW faulting and resident memory usage.
It also stands in the way of using threads here.

LTO bootstrapped and tested on x86_64-unknown-linux-gnu.

Honza - does this look OK?

Thanks,
Richard.

2019-10-15  Richard Biener  

* lto-streamer-out.c (lto_variably_modified_type_p): New.
(tree_is_indexable): Use it.
* tree-streamer-out.c (pack_ts_type_common_value_fields):
Stream variably_modified_type_p as TYPE_LANG_FLAG_0.
* tree-streamer-in.c (unpack_ts_type_common_value_fields): Likewise.

Index: gcc/lto-streamer-out.c
===
--- gcc/lto-streamer-out.c  (revision 276985)
+++ gcc/lto-streamer-out.c  (working copy)
@@ -120,6 +120,17 @@ output_type_ref (struct output_block *ob
   lto_output_type_ref_index (ob->decl_state, ob->main_stream, node);
 }
 
+/* Wrapper around variably_modified_type_p avoiding type modification
+   during WPA streaming.  */
+
+static bool
+lto_variably_modified_type_p (tree type)
+{
+  return (in_lto_p
+ ? TYPE_LANG_FLAG_0 (TYPE_MAIN_VARIANT (type))
+ : variably_modified_type_p (type, NULL_TREE));
+}
+
 
 /* Return true if tree node T is written to various tables.  For these
nodes, we sometimes want to write their phyiscal representation
@@ -134,7 +145,7 @@ tree_is_indexable (tree t)
  definition.  */
   if ((TREE_CODE (t) == PARM_DECL || TREE_CODE (t) == RESULT_DECL)
   && DECL_CONTEXT (t))
-return variably_modified_type_p (TREE_TYPE (DECL_CONTEXT (t)), NULL_TREE);
+return lto_variably_modified_type_p (TREE_TYPE (DECL_CONTEXT (t)));
   /* IMPORTED_DECL is put into BLOCK and thus it never can be shared.
  We should no longer need to stream it.  */
   else if (TREE_CODE (t) == IMPORTED_DECL)
@@ -154,10 +165,10 @@ tree_is_indexable (tree t)
  them we have to localize their members as well.
  ???  In theory that includes non-FIELD_DECLs as well.  */
   else if (TYPE_P (t)
-  && variably_modified_type_p (t, NULL_TREE))
+  && lto_variably_modified_type_p (t))
 return false;
   else if (TREE_CODE (t) == FIELD_DECL
-  && variably_modified_type_p (DECL_CONTEXT (t), NULL_TREE))
+  && lto_variably_modified_type_p (DECL_CONTEXT (t)))
 return false;
   else
 return (TYPE_P (t) || DECL_P (t) || TREE_CODE (t) == SSA_NAME);
Index: gcc/tree-streamer-in.c
===
--- gcc/tree-streamer-in.c  (revision 276985)
+++ gcc/tree-streamer-in.c  (working copy)
@@ -378,6 +378,7 @@ unpack_ts_type_common_value_fields (stru
   TYPE_RESTRICT (expr) = (unsigned) bp_unpack_value (bp, 1);
   TYPE_USER_ALIGN (expr) = (unsigned) bp_unpack_value (bp, 1);
   TYPE_READONLY (expr) = (unsigned) bp_unpack_value (bp, 1);
+  TYPE_LANG_FLAG_0 (expr) = (unsigned) bp_unpack_value (bp, 1);
   if (RECORD_OR_UNION_TYPE_P (expr))
 {
   TYPE_TRANSPARENT_AGGR (expr) = (unsigned) bp_unpack_value (bp, 1);
Index: gcc/tree-streamer-out.c
===
--- gcc/tree-streamer-out.c (revision 276985)
+++ gcc/tree-streamer-out.c (working copy)
@@ -326,6 +326,12 @@ pack_ts_type_common_value_fields (struct
   bp_pack_value (bp, TYPE_RESTRICT (expr), 1);
   bp_pack_value (bp, TYPE_USER_ALIGN (expr), 1);
   bp_pack_value (bp, TYPE_READONLY (expr), 1);
+  unsigned vla_p;
+  if (in_lto_p)
+vla_p = TYPE_LANG_FLAG_0 (TYPE_MAIN_VARIANT (expr));
+  else
+vla_p = variably_modified_type_p (expr, NULL_TREE);
+  bp_pack_value (bp, vla_p, 1);
   /* We used to stream TYPE_ALIAS_SET == 0 information to let frontends mark
  types that are opaque for TBAA.  This however did not work as intended,
  because TYPE_ALIAS_SET == 0 was regularly lost in type merging.  */

Re: [SLP] SLP vectorization: vectorize vector constructors

2019-10-15 Thread Richard Biener

On Fri, 11 Oct 2019, Joel Hutton wrote:

> Hi Richard,
> 
> Thanks for your help, I've reworked my SLP RFC based on your feedback.
>  > I think a better place for the loop searching for CONSTRUCTORs is
>  > vect_slp_analyze_bb_1 where I'd put it before the check you remove,
>  > and I'd simply append found CONSTRUCTORs to the grouped_stores
>  > array
> I've moved this check into a separate function and called it from 
> vect_slp_analyze_bb_1
> 
>  > The fixup you do in vectorizable_operation doesn't
>  > belong there either, I'd add a new field to the SLP instance
>  > structure refering to the CONSTRUCTOR stmt and do the fixup
>  > in vect_schedule_slp_instance instead where you can simply
>  > replace the CONSTRUCTOR with the vectorized SSA name then.
> 
> Done.
> 
>  > +   /* Check that the constructor elements are unique.  */
>  > +   FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val)
>  > + {
>  > +   tree prev_val;
>  > +   int j;
>  > +   FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), j,
>  > prev_val)
>  > +   {
>  > + if (val == prev_val && i!=j)
>  >
>  > why's that necessary? (it looks incomplete, also doesn't catch
>  > [duplicate] constants)
> 
> The thinking was that there was no benefit in vectorizing a constructor of
> duplicates, or a vector of constants, although now you mention it that 
> thinking
> may be flawed. I've removed it
> 
>  > You miss to check that CONSTRUCTOR_NELTS == TYPE_VECTOR_SUBPARTS
>  > (we can have omitted trailing zeros).

^^^

I don't see this being handled?  You give up on non-SSA names
but not on the omitted trailing zeros.

> ...
>  > What happens if you have a vector constructor that is twice
>  > as large as the machine supports?  The vectorizer will happily
>  > produce a two vector SSA name vectorized result but your
>  > CONSTRUCTOR replacement doesn't work here.  I think this should
>  > be made work correctly (not give up on that case).
> 
> I've reworked the patch to account for this, by checking that the 
> vectorized version
> has one vectorized stmt at the root of the SLP tree. I'm not sure how to 
> handle
> a vector constructor twice as large as the machine supports, as far as I 
> can see,
> when only analyzing a within a basic block, the SSA name of the 
> constructor has to
> be maintained.

You build a CONSTRUCTOR of vectors, thus

_orig_ssa_1 = { vect_part1_2, vect_part2_3, ... };


+
+ if (constructor)
+   {
+ SLP_INSTANCE_ROOT_STMT (new_instance) = stmt_info->stmt;
+   }
+ else
+   SLP_INSTANCE_ROOT_STMT (new_instance) = NULL;
+

too much vertical space, no {} around single-stmt if clauses

@@ -2725,6 +2760,10 @@ vect_bb_slp_scalar_cost (basic_block bb,
stmt_vec_info use_stmt_info = vinfo->lookup_stmt 
(use_stmt);
if (!use_stmt_info || !PURE_SLP_STMT (use_stmt_info))
  {
+   /* Check this is not a constructor that will be 
vectorized
+  away.  */
+   if (BB_VINFO_GROUPED_STORES (vinfo).contains 
(use_stmt_info))
+   continue;

hmm, so why not set the slp type on SLP_INSTANCE_ROOT_STMT instead?
In theory the stmt should be part of the SLP tree itself but that's 
probably too awkward to be made work at the moment ;)

vect_ssa_use_outside_bb and vect_slp_check_for_constructors are new
functions which need comments.

+  /* For vector constructors, the same SSA name must be used to maintain 
data
+ flow into other basic blocks.  */
+  if (instance->root == node && SLP_INSTANCE_ROOT_STMT (instance)
+  && SLP_TREE_NUMBER_OF_VEC_STMTS (node) == 1
+  && SLP_TREE_VEC_STMTS (node).exists ())
+{

it should read

  /* Vectorize the instance root.  */

and be in vect_schedule_slp after the vect_schedule_slp_instance.
As said above you need to handle SLP_TREE_NUMBER_OF_VEC_STMTS > 1,
you also cannot simply do "nothing" here, "failing" vectorization
(well, you probably can - DCE will remove the vectorized code - but
at least if there were calls in the SLP tree they will be mangled
by vectorization so the scalar code is wrecked).  SO it should be

 if (SLP_INSTANCE_ROOT_STMT (instance))
   .. you may not fail to replace the scalar root stmt here ..

+ if (CONSTRUCTOR_NELTS (rhs) == 0)
+   vectorizable = false;
+

if you use continue; you can elide the 'vectorizable' variable.

+ if (!vect_ssa_use_outside_bb (gimple_assign_lhs (stmt)))
+   vectorizable = false;
+

why's that?  no comments that clarify ;)  The vector may be
used as argument to a call or as source of a store.  So I'd simply
remove this check (and the function).

Thanks,
Richard.

> Currently SLP vectorization can build SLP trees starting from reductions or
> from group stores. This patch adds a third starting point: vector 
> constructors.
> 
> For the following test case (c

[linemap PATCH] Constify lookup

2019-10-15 Thread Nathan Sidwell

looking up a line map takes a non-constant line_maps object, which is 
confusing.


This makes the caching fields mutable, so permits a constant object, as 
one might expect for a lookup.


The linemaps_info_{ordinary,macro} structures are crying out to be 
templatized, but that kind of turns into a rathole cleaning up the C-era 
accessors and like.  I punted.


committing to trunk

nathan
--
Nathan Sidwell
2019-10-15  Nathan Sidwell  

	* include/line-map.h (struct maps_info_ordinary): Make cache
	mutable.
	(struct maps_info_macro): Likewise.
	(LINEMAPS_CACHE): Remove non-ref accessor. Constify ref accessor.
	(LINEMAPS_ORDINARY_CACHE, LINEMAPS_MACRO_CACHE): Likewise.
	(LINEMAPS_ORDINARY_MAP_AT, LINEMAPS_MACRO_MAP_AT): Use
	LINEMAPS_USED and LINEMAPS_MAP_AT.
	(linemap_lookup): Constify line_map arg.
	linemap.c (linemap_ordinary_map_lookup, linemap_macro_map_lookup):
	Constify line_map arg.

Index: include/line-map.h
===
--- include/line-map.h	(revision 276991)
+++ include/line-map.h	(working copy)
@@ -725,5 +725,5 @@ struct GTY(()) maps_info_ordinary {
   unsigned int used;
 
-  unsigned int cache;
+  mutable unsigned int cache;
 };
 
@@ -740,5 +740,5 @@ struct GTY(()) maps_info_macro {
   unsigned int used;
 
-  unsigned int cache;
+  mutable unsigned int cache;
 };
 
@@ -866,17 +866,6 @@ LINEMAPS_USED (line_maps *set, bool map_
linemap_lookup. MAP_KIND shall be TRUE if we are interested in
macro maps, FALSE otherwise.  */
-inline unsigned int
-LINEMAPS_CACHE (const line_maps *set, bool map_kind)
-{
-  if (map_kind)
-return set->info_macro.cache;
-  else
-return set->info_ordinary.cache;
-}
-
-/* As above, but by reference (e.g. as an lvalue).  */
-
 inline unsigned int &
-LINEMAPS_CACHE (line_maps *set, bool map_kind)
+LINEMAPS_CACHE (const line_maps *set, bool map_kind)
 {
   if (map_kind)
@@ -928,7 +917,7 @@ inline line_map_ordinary *
 LINEMAPS_ORDINARY_MAP_AT (const line_maps *set, int index)
 {
-  linemap_assert (index >= 0);
-  linemap_assert ((unsigned int)index < set->info_ordinary.used);
-  return &set->info_ordinary.maps[index];
+  linemap_assert (index >= 0
+		  && (unsigned int)index < LINEMAPS_USED (set, false));
+  return (line_map_ordinary *)LINEMAPS_MAP_AT (set, false, index);
 }
 
@@ -950,14 +939,6 @@ LINEMAPS_ORDINARY_USED (const line_maps
 /* Return the index of the last ordinary map that was looked up with
linemap_lookup.  */
-inline unsigned int
-LINEMAPS_ORDINARY_CACHE (const line_maps *set)
-{
-  return LINEMAPS_CACHE (set, false);
-}
-
-/* As above, but by reference (e.g. as an lvalue).  */
-
 inline unsigned int &
-LINEMAPS_ORDINARY_CACHE (line_maps *set)
+LINEMAPS_ORDINARY_CACHE (const line_maps *set)
 {
   return LINEMAPS_CACHE (set, false);
@@ -992,7 +973,7 @@ inline line_map_macro *
 LINEMAPS_MACRO_MAP_AT (const line_maps *set, int index)
 {
-  linemap_assert (index >= 0);
-  linemap_assert ((unsigned int)index < set->info_macro.used);
-  return &set->info_macro.maps[index];
+  linemap_assert (index >= 0
+		  && (unsigned int)index < LINEMAPS_USED (set, true));
+  return (line_map_macro *)LINEMAPS_MAP_AT (set, true, index);
 }
 
@@ -1012,16 +993,8 @@ LINEMAPS_MACRO_USED (const line_maps *se
 }
 
-/* Returns the index of the last macro map looked up with
+/* Return the index of the last macro map that was looked up with
linemap_lookup.  */
-inline unsigned int
-LINEMAPS_MACRO_CACHE (const line_maps *set)
-{
-  return LINEMAPS_CACHE (set, true);
-}
-
-/* As above, but by reference (e.g. as an lvalue).  */
-
 inline unsigned int &
-LINEMAPS_MACRO_CACHE (line_maps *set)
+LINEMAPS_MACRO_CACHE (const line_maps *set)
 {
   return LINEMAPS_CACHE (set, true);
@@ -1131,5 +1104,5 @@ extern const line_map *linemap_add
function returns NULL.  */
 extern const line_map *linemap_lookup
-  (class line_maps *, location_t);
+  (const line_maps *, location_t);
 
 /* Returns TRUE if the line table set tracks token locations across
Index: line-map.c
===
--- line-map.c	(revision 276991)
+++ line-map.c	(working copy)
@@ -28,7 +28,7 @@ along with this program; see the file CO
 
 static void trace_include (const line_maps *, const line_map_ordinary *);
-static const line_map_ordinary * linemap_ordinary_map_lookup (line_maps *,
+static const line_map_ordinary * linemap_ordinary_map_lookup (const line_maps *,
 			  location_t);
-static const line_map_macro* linemap_macro_map_lookup (line_maps *,
+static const line_map_macro* linemap_macro_map_lookup (const line_maps *,
 		   location_t);
 static location_t linemap_macro_map_loc_to_def_point
@@ -938,5 +938,5 @@ linemap_position_for_loc_and_offset (lin
 
 const struct line_map*
-linemap_lookup (line_maps *set, location_t line)
+linemap_lookup (const line_maps *set, location_t line)
 {
   if (IS_ADHOC_LOC (line))
@@ -953,5 +953,5 @@ linemap_lookup (line_maps *set, location
 
 static cons

Re: [Patch 1/2][ipa] Add target hook to sanitize cloned declaration's attributes

2019-10-15 Thread Jan Hubicka

> Hi,
> 
> I have changed the name to make the target hook more general and not create
> the illusion it is related to sanitizers.
> 
> Is this OK for trunk?
Can't you just go via IPA symbol summary or add_cgraph_duplication_hook?

Honza

Re: [patch] canonicalize unsigned [1,MAX] ranges into ~[0,0]

2019-10-15 Thread Rainer Orth

Hi Aldy,

>>> ~[0,0] has been the accepted way for a long time, I'd really prefer to
>>> keep that (for now).
>> It has.  Very true.  But I don't necessarily think that means we should
>> be introducing even more of 'em.
[...]
> Happily, normalizing into ~0 for signed and [1,MAX] for unsigned,
> simplified the patch because it on longer needs tweaks to
> ranges_from_anti_range.
>
> OK for trunk?

the new testcase FAILs on several (all?) 32-bit targets:

+FAIL: gcc.dg/tree-ssa/evrp4.c scan-tree-dump evrp "[1B, -1B]"

I'm seeing this on 32-bit i386-pc-solaris2.11 and sparc-sun-solaris2.11,
with more reports for armv8l, pru, and s390x.

Comparing the dumps between 64 and 32-bit, I see

-_1: int * [1B, -1B]
+_1: int * [1B, 4294967295B]

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Richard Biener

On Tue, Oct 15, 2019 at 1:33 PM Segher Boessenkool
 wrote:
>
> On Tue, Oct 15, 2019 at 01:19:51PM +0200, Richard Biener wrote:
> > On Tue, Oct 15, 2019 at 12:07 PM Segher Boessenkool
> >  wrote:
> > > On Tue, Oct 15, 2019 at 11:32:27AM +0200, Richard Biener wrote:
> > > > > I think we just need to fix the bug in the current logic when checking
> > > > > whether the caller's ISA flags supports the callee's ISA flags. ...and
> > > > > for that, I think we just need to add a test that enforces that the
> > > > > caller's ISA flags match exactly the callee's flags, for those flags
> > > > > that were explicitly set in the callee.  The patch below seems to fix
> > > > > the issue (regtesting now).  Does this look like what we want?
> > > >
> > > > I believe this is going to bite you exactly in the case you want the
> > > > opposite behavior.  If you have CUs compiled with defaults and
> > > > a specialized one with VSX that calls into generic compiled functions
> > > > you _do_ want to allow inlining into the VSX enabled routines.
> > >
> > > Yes, but *not* inlining is relatively harmless, while inlining can be
> > > fatal.  I don't see how we can handle both scenarios optimally.
> >
> > How can it be fatal to inline a non-VSX function into a VSX one?
>
> Oh I misread, I thought it was the other way around.
>
> > > > Just
> > > > think of LTO, C++ and comdats - you'll get a random comdat entity
> > > > at link time for inlining - either from the VSX CU or the non-VSX one.
> > >
> > > This would make LTO totally unusable, with or without this patch?  
> > > Something
> > > else must be going on?
> >
> > It's the same without LTO - the linker will simply choose (randomly)
> > one of the comdats from one of the CUs providing it, not caring about
> > some built with and some without VSX.
>
> Hrm, so how does that ever work?

Possibly people are "not doing this"?  Aka have a program with a runtime
capability check for VSX, dispatch to std::vector/algorithm CUs
with/without -mvsx
and then link the result?  Because any instantiated template therein will end up
as a comdat...

Guess we still live in a mostly C world ;)

Richard.

>
> Segher

Re: [SVE] PR86753

2019-10-15 Thread Richard Biener

On Tue, Oct 15, 2019 at 8:07 AM Prathamesh Kulkarni
 wrote:
>
> On Wed, 9 Oct 2019 at 08:14, Prathamesh Kulkarni
>  wrote:
> >
> > On Tue, 8 Oct 2019 at 13:21, Richard Sandiford
> >  wrote:
> > >
> > > Leaving the main review to Richard, just some comments...
> > >
> > > Prathamesh Kulkarni  writes:
> > > > @@ -9774,6 +9777,10 @@ vect_is_simple_cond (tree cond, vec_info *vinfo,
> > > >
> > > > When STMT_INFO is vectorized as a nested cycle, for_reduction is 
> > > > true.
> > > >
> > > > +   For COND_EXPR if T comes from masked load, and is 
> > > > conditional
> > > > +   on C, we apply loop mask to result of vector comparison, if it's 
> > > > present.
> > > > +   Similarly for E, if it is conditional on !C.
> > > > +
> > > > Return true if STMT_INFO is vectorizable in this way.  */
> > > >
> > > >  bool
> > >
> > > I think this is a bit misleading.  But IMO it'd be better not to have
> > > a comment here and just rely on the one in the main function body.
> > > This optimisation isn't really changing the vectorisation strategy,
> > > and the comment could easily get forgotten if things change in future.
> > >
> > > > [...]
> > > > @@ -,6 +10006,35 @@ vectorizable_condition (stmt_vec_info 
> > > > stmt_info, gimple_stmt_iterator *gsi,
> > > >/* Handle cond expr.  */
> > > >for (j = 0; j < ncopies; j++)
> > > >  {
> > > > +  tree loop_mask = NULL_TREE;
> > > > +  bool swap_cond_operands = false;
> > > > +
> > > > +  /* Look up if there is a loop mask associated with the
> > > > +  scalar cond, or it's inverse.  */
> > >
> > > Maybe:
> > >
> > >See whether another part of the vectorized code applies a loop
> > >mask to the condition, or to its inverse.
> > >
> > > > +
> > > > +  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> > > > + {
> > > > +   scalar_cond_masked_key cond (cond_expr, ncopies);
> > > > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > > + {
> > > > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > +   loop_mask = vect_get_loop_mask (gsi, masks, ncopies, 
> > > > vectype, j);
> > > > + }
> > > > +   else
> > > > + {
> > > > +   bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
> > > > +   cond.code = invert_tree_comparison (cond.code, honor_nans);
> > > > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > > + {
> > > > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > +   loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
> > > > +   vectype, j);
> > > > +   cond_code = cond.code;
> > > > +   swap_cond_operands = true;
> > > > + }
> > > > + }
> > > > + }
> > > > +
> > > >stmt_vec_info new_stmt_info = NULL;
> > > >if (j == 0)
> > > >   {
> > > > @@ -10114,6 +10153,47 @@ vectorizable_condition (stmt_vec_info 
> > > > stmt_info, gimple_stmt_iterator *gsi,
> > > >   }
> > > >   }
> > > >   }
> > > > +
> > > > +   /* If loop mask is present, then AND it with
> > >
> > > Maybe "If we decided to apply a loop mask, ..."
> > >
> > > > +  result of vec comparison, so later passes (fre4)
> > >
> > > Probably better not to name the pass -- could easily change in future.
> > >
> > > > +  will reuse the same condition used in masked load.
> > >
> > > Could be a masked store, or potentially other things too.
> > > So maybe just "will reuse the masked condition"?
> > >
> > > > +
> > > > +  For example:
> > > > +  for (int i = 0; i < 100; ++i)
> > > > +x[i] = y[i] ? z[i] : 10;
> > > > +
> > > > +  results in following optimized GIMPLE:
> > > > +
> > > > +  mask__35.8_43 = vect__4.7_41 != { 0, ... };
> > > > +  vec_mask_and_46 = loop_mask_40 & mask__35.8_43;
> > > > +  _19 = &MEM[base: z_12(D), index: ivtmp_56, step: 4, offset: 
> > > > 0B];
> > > > +  vect_iftmp.11_47 = .MASK_LOAD (_19, 4B, vec_mask_and_46);
> > > > +  vect_iftmp.12_52 = VEC_COND_EXPR  > > > +vect_iftmp.11_47, { 10, 
> > > > ... }>;
> > > > +
> > > > +  instead of recomputing vec != { 0, ... } in vec_cond_expr  */
> > >
> > > That's true, but gives the impression that avoiding the vec != { 0, ... }
> > > is the main goal, whereas we could do that just by forcing a three-operand
> > > COND_EXPR.  It's really more about making sure that vec != { 0, ... }
> > > and its masked form aren't both live at the same time.  So maybe:
> > >
> > >  instead of using a masked and unmasked forms of
> > >  vect__4.7_41 != { 0, ... } (masked in the MASK_LOAD,
> > >  unmasked in the VEC_COND_EXPR).  */
> > >
> > Hi Richard,
> > Thanks for the suggestions, I have updated comments in th

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Segher Boessenkool

On Tue, Oct 15, 2019 at 01:19:51PM +0200, Richard Biener wrote:
> On Tue, Oct 15, 2019 at 12:07 PM Segher Boessenkool
>  wrote:
> > On Tue, Oct 15, 2019 at 11:32:27AM +0200, Richard Biener wrote:
> > > > I think we just need to fix the bug in the current logic when checking
> > > > whether the caller's ISA flags supports the callee's ISA flags. ...and
> > > > for that, I think we just need to add a test that enforces that the
> > > > caller's ISA flags match exactly the callee's flags, for those flags
> > > > that were explicitly set in the callee.  The patch below seems to fix
> > > > the issue (regtesting now).  Does this look like what we want?
> > >
> > > I believe this is going to bite you exactly in the case you want the
> > > opposite behavior.  If you have CUs compiled with defaults and
> > > a specialized one with VSX that calls into generic compiled functions
> > > you _do_ want to allow inlining into the VSX enabled routines.
> >
> > Yes, but *not* inlining is relatively harmless, while inlining can be
> > fatal.  I don't see how we can handle both scenarios optimally.
> 
> How can it be fatal to inline a non-VSX function into a VSX one?

Oh I misread, I thought it was the other way around.

> > > Just
> > > think of LTO, C++ and comdats - you'll get a random comdat entity
> > > at link time for inlining - either from the VSX CU or the non-VSX one.
> >
> > This would make LTO totally unusable, with or without this patch?  Something
> > else must be going on?
> 
> It's the same without LTO - the linker will simply choose (randomly)
> one of the comdats from one of the CUs providing it, not caring about
> some built with and some without VSX.

Hrm, so how does that ever work?


Segher

Re: [PATCH] More PR92046 fixes, make --param allow-store-data-races a -f option

2019-10-15 Thread Jakub Jelinek

On Tue, Oct 15, 2019 at 01:21:12PM +0200, Richard Biener wrote:
> On Tue, 15 Oct 2019, Kyrill Tkachov wrote:
> 
> > Hi Richard,
> > 
> > On 10/15/19 8:17 AM, Richard Biener wrote:
> > >
> > > This makes allow-store-data-races adjustable per function by making it
> > > a regular option rather than a --param.
> > 
> > 
> > Note that the kernel has --param=allow-store-data-races=0 in its build 
> > flags.
> > 
> > I guess that will break unless they rename it to 
> > -fno-allow-store-data-races?
> 
> Yes.  Or simply drop the --param since unless they happen to use -Ofast
> their setting is the default.

I think nothing will break, because it is
KBUILD_CFLAGS   += $(call cc-option,--param=allow-store-data-races=0)
and cc-option will expand to nothing if the --param is not supported
anymore.  They can surely add
KBUILD_CFLAGS   += $(call cc-option,-fno-allow-store-data-races)
but as Richard mentioned, if kernel isn't built with -Ofast, it should make
no difference.

Jakub

Re: [Patch][Fortran] OpenMP+OpenACC: Remove bogus contigous-pointer check (PR/65438)

2019-10-15 Thread Tobias Burnus


Hi Thomas,

On 10/15/19 12:59 PM, Thomas König wrote:

What about using gfc_is_not_contigous? This would allow to issue an error when 
we can prove the user made an error.


Most clauses take only an identifiers (i.e. "sym" not "expr"). The 
gfc_is_not_contiguous check only returns true if there is an explicit 
array reference with strides. – All current callees only have "sym" (or 
in one case an expr which is only an identifier).


However, it might be useful for the few places like OpenMP's "map" 
clause which also permit array section. I will try to remember this 
rather new function (added Jan 2019) when adding some constraint checks.


Thanks,

Tobias

Re: Add a constant_range_value_p function (PR 92033)

2019-10-15 Thread Richard Biener

On Tue, Oct 15, 2019 at 12:35 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On October 14, 2019 2:32:43 PM GMT+02:00, Richard Sandiford 
> >  wrote:
> >>Richard Biener  writes:
> >>> On Fri, Oct 11, 2019 at 4:42 PM Richard Sandiford
> >>>  wrote:
> 
>  The range-tracking code has a pretty hard-coded assumption that
>  is_gimple_min_invariant is equivalent to "INTEGER_CST or invariant
>  ADDR_EXPR".  It seems better to add a predicate specifically for
>  that rather than contiually fight cases in which it can't handle
>  other invariants.
> 
>  Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> >>>
> >>> ICK.  Nobody is going to remember this new restriction and
> >>> constant_range_value_p reads like constant_value_range_p ;)
> >>>
> >>> Btw, is_gimple_invariant_address shouldn't have been exported,
> >>> it's only use could have used is_gimple_min_invariant...
> >>
> >>What do you think we should do instead?
> >
> > Just handle POLY_INT_CST in a few place to quickly enough drop to varying.
>
> OK, how about this?  Aldy's suggestion would be fine by me too,
> but I thought I'd try this first given Aldy's queasiness about
> allowing POLY_INT_CSTs further in.
>
> The main case in which this gives useful ranges is a lower bound
> of A + B * X becoming A when B >= 0.  E.g.:
>
>   (1) [32 + 16X, 100] -> [32, 100]
>   (2) [32 + 16X, 32 + 16X] -> [32, MAX]
>
> But the same thing can be useful for the upper bound with negative
> X coefficients.
>
> We can revisit this later if keeping a singleton range for (2)
> would be better.
>
> Tested as before.

Works for me.

Richard.

> Richard
>
>
> 2019-10-15  Richard Sandiford  
>
> gcc/
> PR middle-end/92033
> * poly-int.h (constant_lower_bound_with_limit): New function.
> (constant_upper_bound_with_limit): Likewise.
> * doc/poly-int.texi: Document them.
> * tree-vrp.c (value_range_base::set): Convert POLY_INT_CST bounds
> into the worst-case INTEGER_CST bounds.
>
> Index: gcc/poly-int.h
> ===
> --- gcc/poly-int.h  2019-07-10 19:41:26.395898027 +0100
> +++ gcc/poly-int.h  2019-10-15 11:30:14.099625553 +0100
> @@ -1528,6 +1528,29 @@ constant_lower_bound (const poly_int_pod
>return a.coeffs[0];
>  }
>
> +/* Return the constant lower bound of A, given that it is no less than B.  */
> +
> +template
> +inline POLY_CONST_COEFF (Ca, Cb)
> +constant_lower_bound_with_limit (const poly_int_pod &a, const Cb &b)
> +{
> +  if (known_ge (a, b))
> +return a.coeffs[0];
> +  return b;
> +}
> +
> +/* Return the constant upper bound of A, given that it is no greater
> +   than B.  */
> +
> +template
> +inline POLY_CONST_COEFF (Ca, Cb)
> +constant_upper_bound_with_limit (const poly_int_pod &a, const Cb &b)
> +{
> +  if (known_le (a, b))
> +return a.coeffs[0];
> +  return b;
> +}
> +
>  /* Return a value that is known to be no greater than A and B.  This
> will be the greatest lower bound for some indeterminate values but
> not necessarily for all.  */
> Index: gcc/doc/poly-int.texi
> ===
> --- gcc/doc/poly-int.texi   2019-03-08 18:14:25.333011645 +
> +++ gcc/doc/poly-int.texi   2019-10-15 11:30:14.099625553 +0100
> @@ -803,6 +803,18 @@ the assertion is known to hold.
>  @item constant_lower_bound (@var{a})
>  Assert that @var{a} is nonnegative and return the smallest value it can have.
>
> +@item constant_lower_bound_with_limit (@var{a}, @var{b})
> +Return the least value @var{a} can have, given that the context in
> +which @var{a} appears guarantees that the answer is no less than @var{b}.
> +In other words, the caller is asserting that @var{a} is greater than or
> +equal to @var{b} even if @samp{known_ge (@var{a}, @var{b})} doesn't hold.
> +
> +@item constant_upper_bound_with_limit (@var{a}, @var{b})
> +Return the greatest value @var{a} can have, given that the context in
> +which @var{a} appears guarantees that the answer is no greater than @var{b}.
> +In other words, the caller is asserting that @var{a} is less than or equal
> +to @var{b} even if @samp{known_le (@var{a}, @var{b})} doesn't hold.
> +
>  @item lower_bound (@var{a}, @var{b})
>  Return a value that is always less than or equal to both @var{a} and @var{b}.
>  It will be the greatest such value for some indeterminate values
> Index: gcc/tree-vrp.c
> ===
> --- gcc/tree-vrp.c  2019-10-14 09:04:54.515259320 +0100
> +++ gcc/tree-vrp.c  2019-10-15 11:30:14.099625553 +0100
> @@ -727,6 +727,24 @@ value_range_base::set (enum value_range_
>return;
>  }
>
> +  /* Convert POLY_INT_CST bounds into worst-case INTEGER_CST bounds.  */
> +  if (POLY_INT_CST_P (min))
> +{
> +  tree type_min = vrp_val_min (TREE_TYPE (min), true);
> +  widest_int lb
> +   = constant_lowe

Re: [PATCH] More PR92046 fixes, make --param allow-store-data-races a -f option

2019-10-15 Thread Richard Biener

On Tue, 15 Oct 2019, Kyrill Tkachov wrote:

> Hi Richard,
> 
> On 10/15/19 8:17 AM, Richard Biener wrote:
> >
> > This makes allow-store-data-races adjustable per function by making it
> > a regular option rather than a --param.
> 
> 
> Note that the kernel has --param=allow-store-data-races=0 in its build flags.
> 
> I guess that will break unless they rename it to 
> -fno-allow-store-data-races?

Yes.  Or simply drop the --param since unless they happen to use -Ofast
their setting is the default.

Richard.

> Thanks,
> 
> Kyrill
> 
> 
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> >
> > Thanks,
> > Richard.
> >
> > 2019-10-15  Richard Biener  
> >
> >     PR middle-end/92046
> >     * common.opt (fallow-store-data-races): New.
> >     * params.def (PARAM_ALLOW_STORE_DATA_RACES): Remove.
> >     * params.h (ALLOW_STORE_DATA_RACES): Likewise.
> >     * doc/invoke.texi (fallow-store-data-races): Document.
> >     (--param allow-store-data-races): Remove docs.
> >     * opts.c (default_options_table): Enable -fallow-store-data-races
> >     at -Ofast.
> >     (default_options_optimization): Do not enable --param
> >     allow-store-data-races at -Ofast.
> >     * tree-if-conv.c (ifcvt_memrefs_wont_trap): Use
> > flag_store_data_races
> >     instead of PARAM_ALLOW_STORE_DATA_RACES.
> >     * tree-ssa-loop-im.c (execute_sm): Likewise.
> >
> >     * c-c++-common/cxxbitfields-3.c: Adjust.
> >     * c-c++-common/cxxbitfields-6.c: Likewise.
> >     * c-c++-common/simulate-thread/bitfields-1.c: Likewise.
> >     * c-c++-common/simulate-thread/bitfields-2.c: Likewise.
> >     * c-c++-common/simulate-thread/bitfields-3.c: Likewise.
> >     * c-c++-common/simulate-thread/bitfields-4.c: Likewise.
> >     * g++.dg/simulate-thread/bitfields-2.C: Likewise.
> >     * g++.dg/simulate-thread/bitfields.C: Likewise.
> >     * gcc.dg/lto/pr52097_0.c: Likewise.
> >     * gcc.dg/simulate-thread/speculative-store-2.c: Likewise.
> >     * gcc.dg/simulate-thread/speculative-store-3.c: Likewise.
> >     * gcc.dg/simulate-thread/speculative-store-4.c: Likewise.
> >     * gcc.dg/simulate-thread/speculative-store.c: Likewise.
> >     * gcc.dg/tree-ssa/20050314-1.c: Likewise.
> >
> > Index: gcc/common.opt
> > ===
> > --- gcc/common.opt  (revision 276983)
> > +++ gcc/common.opt  (working copy)
> > @@ -993,6 +993,10 @@ Align the start of loops.
> >  falign-loops=
> >  Common RejectNegative Joined Var(str_align_loops) Optimization
> >
> > +fallow-store-data-races
> > +Common Report Var(flag_store_data_races) Optimization
> > +Allow the compiler to introduce new data races on stores.
> > +
> >  fargument-alias
> >  Common Ignore
> >  Does nothing. Preserved for backward compatibility.
> > Index: gcc/doc/invoke.texi
> > ===
> > --- gcc/doc/invoke.texi (revision 276983)
> > +++ gcc/doc/invoke.texi (working copy)
> > @@ -406,6 +406,7 @@ Objective-C and Objective-C++ Dialects}.
> >  -falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2} @gol
> >  -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2} @gol
> >  -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2} @gol
> > +-fallow-store-data-races @gol
> >  -fassociative-math  -fauto-profile -fauto-profile[=@var{path}] @gol
> >  -fauto-inc-dec  -fbranch-probabilities @gol
> >  -fcaller-saves @gol
> > @@ -8463,9 +8464,9 @@ designed to reduce code size.
> >  Disregard strict standards compliance.  @option{-Ofast} enables all
> >  @option{-O3} optimizations.  It also enables optimizations that are not
> >  valid for all standard-compliant programs.
> > -It turns on @option{-ffast-math} and the Fortran-specific
> > -@option{-fstack-arrays}, unless @option{-fmax-stack-var-size} is
> > -specified, and @option{-fno-protect-parens}.
> > +It turns on @option{-ffast-math}, @option{-fallow-store-data-races}
> > +and the Fortran-specific @option{-fstack-arrays}, unless
> > +@option{-fmax-stack-var-size} is specified, and
> > @option{-fno-protect-parens}.
> >
> >  @item -Og
> >  @opindex Og
> > @@ -10227,6 +10228,12 @@ The maximum allowed @var{n} option value
> >
> >  Enabled at levels @option{-O2}, @option{-O3}.
> >
> > +@item -fallow-store-data-races
> > +@opindex fallow-store-data-races
> > +Allow the compiler to introduce new data races on stores.
> > +
> > +Enabled at level @option{-Ofast}.
> > +
> >  @item -funit-at-a-time
> >  @opindex funit-at-a-time
> >  This option is left for compatibility reasons. @option{-funit-at-a-time}
> > @@ -12060,10 +12067,6 @@ The maximum number of conditional store
> >  if either vectorization (@option{-ftree-vectorize}) or if-conversion
> >  (@option{-ftree-loop-if-convert}) is disabled.
> >
> > -@item allow-store-data-races
> > -Allow optimizers to introduce new data races on stores.
> > -Set to 1 to all

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Richard Biener

On Tue, Oct 15, 2019 at 12:07 PM Segher Boessenkool
 wrote:
>
> On Tue, Oct 15, 2019 at 11:32:27AM +0200, Richard Biener wrote:
> > > I think we just need to fix the bug in the current logic when checking
> > > whether the caller's ISA flags supports the callee's ISA flags. ...and
> > > for that, I think we just need to add a test that enforces that the
> > > caller's ISA flags match exactly the callee's flags, for those flags
> > > that were explicitly set in the callee.  The patch below seems to fix
> > > the issue (regtesting now).  Does this look like what we want?
> >
> > I believe this is going to bite you exactly in the case you want the
> > opposite behavior.  If you have CUs compiled with defaults and
> > a specialized one with VSX that calls into generic compiled functions
> > you _do_ want to allow inlining into the VSX enabled routines.
>
> Yes, but *not* inlining is relatively harmless, while inlining can be
> fatal.  I don't see how we can handle both scenarios optimally.

How can it be fatal to inline a non-VSX function into a VSX one?

> > Just
> > think of LTO, C++ and comdats - you'll get a random comdat entity
> > at link time for inlining - either from the VSX CU or the non-VSX one.
>
> This would make LTO totally unusable, with or without this patch?  Something
> else must be going on?

It's the same without LTO - the linker will simply choose (randomly)
one of the comdats from one of the CUs providing it, not caring about
some built with and some without VSX.

Richard.

>
> Segher

[C++ PATCH] build_clone cleanup

2019-10-15 Thread Nathan Sidwell

build_clone is recursive when applied to a template, but I found the 
control flow confusing.  this makes it clearer and moves some decls to 
their initializers.


Applying to trunk.

nathan

--
Nathan Sidwell
2019-10-15  Nathan Sidwell  

	* class.c (build_clone): Refactor to clarify recursiveness.

Index: gcc/cp/class.c
===
--- gcc/cp/class.c	(revision 276991)
+++ gcc/cp/class.c	(working copy)
@@ -4582,14 +4582,13 @@ static tree
 build_clone (tree fn, tree name)
 {
-  tree parms;
-  tree clone;
-
   /* Copy the function.  */
-  clone = copy_decl (fn);
+  tree clone = copy_decl (fn);
   /* Reset the function name.  */
   DECL_NAME (clone) = name;
   /* Remember where this function came from.  */
   DECL_ABSTRACT_ORIGIN (clone) = fn;
-  /* Make it easy to find the CLONE given the FN.  */
+
+  /* Make it easy to find the CLONE given the FN.  Note the
+ template_result of a template will be chained this way too.  */
   DECL_CHAIN (clone) = DECL_CHAIN (fn);
   DECL_CHAIN (fn) = clone;
@@ -4600,17 +4599,16 @@ build_clone (tree fn, tree name)
   tree result = build_clone (DECL_TEMPLATE_RESULT (clone), name);
   DECL_TEMPLATE_RESULT (clone) = result;
+
   DECL_TEMPLATE_INFO (result) = copy_node (DECL_TEMPLATE_INFO (result));
   DECL_TI_TEMPLATE (result) = clone;
+
   TREE_TYPE (clone) = TREE_TYPE (result);
   return clone;
 }
-  else
-{
-  /* Clone constraints.  */
-  if (flag_concepts)
-if (tree ci = get_constraints (fn))
-  set_constraints (clone, copy_node (ci));
-}
 
+  if (flag_concepts)
+/* Clone constraints.  */
+if (tree ci = get_constraints (fn))
+  set_constraints (clone, copy_node (ci));
 
   SET_DECL_ASSEMBLER_NAME (clone, NULL_TREE);
@@ -4624,6 +4622,5 @@ build_clone (tree fn, tree name)
 {
   DECL_VIRTUAL_P (clone) = 0;
-  if (TREE_CODE (clone) != TEMPLATE_DECL)
-	DECL_VINDEX (clone) = NULL_TREE;
+  DECL_VINDEX (clone) = NULL_TREE;
 }
 
@@ -4690,5 +4687,5 @@ build_clone (tree fn, tree name)
 DECL_CHAIN (DECL_CHAIN (DECL_ARGUMENTS (clone))) = NULL_TREE;
 
-  for (parms = DECL_ARGUMENTS (clone); parms; parms = DECL_CHAIN (parms))
+  for (tree parms = DECL_ARGUMENTS (clone); parms; parms = DECL_CHAIN (parms))
 {
   DECL_CONTEXT (parms) = clone;

Re: [PATCH] More PR92046 fixes, make --param allow-store-data-races a -f option

2019-10-15 Thread Kyrill Tkachov


Hi Richard,

On 10/15/19 8:17 AM, Richard Biener wrote:


This makes allow-store-data-races adjustable per function by making it
a regular option rather than a --param.



Note that the kernel has --param=allow-store-data-races=0 in its build 
flags.


I guess that will break unless they rename it to 
-fno-allow-store-data-races?


Thanks,

Kyrill




Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2019-10-15  Richard Biener  

    PR middle-end/92046
    * common.opt (fallow-store-data-races): New.
    * params.def (PARAM_ALLOW_STORE_DATA_RACES): Remove.
    * params.h (ALLOW_STORE_DATA_RACES): Likewise.
    * doc/invoke.texi (fallow-store-data-races): Document.
    (--param allow-store-data-races): Remove docs.
    * opts.c (default_options_table): Enable -fallow-store-data-races
    at -Ofast.
    (default_options_optimization): Do not enable --param
    allow-store-data-races at -Ofast.
    * tree-if-conv.c (ifcvt_memrefs_wont_trap): Use 
flag_store_data_races

    instead of PARAM_ALLOW_STORE_DATA_RACES.
    * tree-ssa-loop-im.c (execute_sm): Likewise.

    * c-c++-common/cxxbitfields-3.c: Adjust.
    * c-c++-common/cxxbitfields-6.c: Likewise.
    * c-c++-common/simulate-thread/bitfields-1.c: Likewise.
    * c-c++-common/simulate-thread/bitfields-2.c: Likewise.
    * c-c++-common/simulate-thread/bitfields-3.c: Likewise.
    * c-c++-common/simulate-thread/bitfields-4.c: Likewise.
    * g++.dg/simulate-thread/bitfields-2.C: Likewise.
    * g++.dg/simulate-thread/bitfields.C: Likewise.
    * gcc.dg/lto/pr52097_0.c: Likewise.
    * gcc.dg/simulate-thread/speculative-store-2.c: Likewise.
    * gcc.dg/simulate-thread/speculative-store-3.c: Likewise.
    * gcc.dg/simulate-thread/speculative-store-4.c: Likewise.
    * gcc.dg/simulate-thread/speculative-store.c: Likewise.
    * gcc.dg/tree-ssa/20050314-1.c: Likewise.

Index: gcc/common.opt
===
--- gcc/common.opt  (revision 276983)
+++ gcc/common.opt  (working copy)
@@ -993,6 +993,10 @@ Align the start of loops.
 falign-loops=
 Common RejectNegative Joined Var(str_align_loops) Optimization

+fallow-store-data-races
+Common Report Var(flag_store_data_races) Optimization
+Allow the compiler to introduce new data races on stores.
+
 fargument-alias
 Common Ignore
 Does nothing. Preserved for backward compatibility.
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 276983)
+++ gcc/doc/invoke.texi (working copy)
@@ -406,6 +406,7 @@ Objective-C and Objective-C++ Dialects}.
 -falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2} @gol
 -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2} @gol
 -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2} @gol
+-fallow-store-data-races @gol
 -fassociative-math  -fauto-profile -fauto-profile[=@var{path}] @gol
 -fauto-inc-dec  -fbranch-probabilities @gol
 -fcaller-saves @gol
@@ -8463,9 +8464,9 @@ designed to reduce code size.
 Disregard strict standards compliance.  @option{-Ofast} enables all
 @option{-O3} optimizations.  It also enables optimizations that are not
 valid for all standard-compliant programs.
-It turns on @option{-ffast-math} and the Fortran-specific
-@option{-fstack-arrays}, unless @option{-fmax-stack-var-size} is
-specified, and @option{-fno-protect-parens}.
+It turns on @option{-ffast-math}, @option{-fallow-store-data-races}
+and the Fortran-specific @option{-fstack-arrays}, unless
+@option{-fmax-stack-var-size} is specified, and 
@option{-fno-protect-parens}.


 @item -Og
 @opindex Og
@@ -10227,6 +10228,12 @@ The maximum allowed @var{n} option value

 Enabled at levels @option{-O2}, @option{-O3}.

+@item -fallow-store-data-races
+@opindex fallow-store-data-races
+Allow the compiler to introduce new data races on stores.
+
+Enabled at level @option{-Ofast}.
+
 @item -funit-at-a-time
 @opindex funit-at-a-time
 This option is left for compatibility reasons. @option{-funit-at-a-time}
@@ -12060,10 +12067,6 @@ The maximum number of conditional store
 if either vectorization (@option{-ftree-vectorize}) or if-conversion
 (@option{-ftree-loop-if-convert}) is disabled.

-@item allow-store-data-races
-Allow optimizers to introduce new data races on stores.
-Set to 1 to allow, otherwise to 0.
-
 @item case-values-threshold
 The smallest number of different values for which it is best to use a
 jump-table instead of a tree of conditional branches.  If the value is
Index: gcc/opts.c
===
--- gcc/opts.c  (revision 276983)
+++ gcc/opts.c  (working copy)
@@ -564,6 +564,7 @@ static const struct default_options defa

 /* -Ofast adds optimizations to -O3.  */
 { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
+    { OPT_LEVELS_FAST, OPT_fallow_store_data_races, NULL, 1 },

Re: [Patch 1/2][ipa] Add target hook to sanitize cloned declaration's attributes

2019-10-15 Thread Andre Vieira (lists)


Hi,

I have changed the name to make the target hook more general and not 
create the illusion it is related to sanitizers.


Is this OK for trunk?

Cheers,
Andre

gcc/ChangeLog:

2019-10-15  Andre Vieira  

* cgraphclones.c(create_clone): Call new target hook when
creating a new node.
(create_version_clone_with_body): Likewise.
* target.def(modify_clone_cgraph_node): New target hook.
* targhooks.c(default_modify_clone_cgraph_node): New.
* targhooks.h(default_modify_clone_cgraph_node): New.
* doc/tm.texi.in: Include new TARGET_MODIFY_CLONE_CGRAPH_NODE.
* doc/tm.texi: Regenerated

On 09/10/2019 16:35, Andre Vieira (lists) wrote:

Adding Jakub to CC'

On 09/10/2019 16:29, Andre Vieira (lists) wrote:

Hi Martin,

I see thank you. For my particular use case I only need to strip 
attributes for local symbols.  However, if I were to claim it is only 
used for non-externally visible clones than I would also need to make 
sure they are non-externally visible now and put something in place 
that checks this in the future.  I feel it's easier and more flexible 
to leave it up to the users of the hook to check it themselves using 
DECL_EXTERNAL.  I will mention it in the description though.


Cheers,
Andre

gcc/ChangeLog:

2019-10-09  Andre Vieira  

 * cgraphclones.c(create_clone): Call new target hook when
 creating a new node.
 (create_version_clone): Likewise.
 (create_version_clone_with_body): Likewise.
 * target.def(sanitize_clone_attributes): New target hook.
 * targhooks.c(default_sanitize_clone_attributes): New.
 * targhooks.h(default_sanitize_clone_attributes): New.
 * doc/tm.texi.in: Document new target hook.
 * doc/tm.texi: Regenerated

On 09/10/2019 12:43, Martin Jambor wrote:

Hi,

On Wed, Oct 09 2019, Andre Vieira (lists) wrote:

Hi Martin,

Thanks for the suggestion.  I have moved the target hook to the
following functions:
'create_clone'
'create_version_clone'


I am not sure that calling the hook in create_version_clone is what you
want.  It should perhaps be made clearer in the API but unlike
(functions that call) create_clone and create_version_clone_with_body,
the result is not a local symbol but it may also be an externally
visible "version" of the original.  OpenMP uses it to create nodes for
external declarations (not definitions!) of variants of functions marked
with omp declare simd and the function calling it in trans-mem.c also
apparently uses it to create possibly externally visible clones.


diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index 909407b9a71690fe7b6ce731e5ddaa06af6d4410..4526329244677cd01a66d43eaf905b6f72948db9 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -80,6 +80,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-inline.h"
 #include "dumpfile.h"
 #include "gimple-pretty-print.h"
+#include "target.h"
 
 /* Create clone of edge in the node N represented by CALL_EXPR
the callgraph.  */
@@ -407,6 +408,8 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
   if (!new_inlined_to)
 dump_callgraph_transformation (this, new_node, suffix);
 
+  targetm.modify_clone_cgraph_node (new_node);
+
   return new_node;
 }
 
@@ -1004,6 +1007,9 @@ cgraph_node::create_version_clone_with_body
   update_call_expr (new_version_node);
 
   symtab->call_cgraph_insertion_hooks (new_version_node);
+
+  targetm.modify_clone_cgraph_node (new_version_node);
+
   return new_version_node;
 }
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index a86c210d4fe390bd0356b6e50ba7c6c34a36239a..1fecf91505e13bab6b1c061fef89eb210e32f1b6 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12039,6 +12039,12 @@ If defined, this function returns an appropriate alignment in bits for an atomic
 ISO C11 requires atomic compound assignments that may raise floating-point exceptions to raise exceptions corresponding to the arithmetic operation whose result was successfully stored in a compare-and-exchange sequence.  This requires code equivalent to calls to @code{feholdexcept}, @code{feclearexcept} and @code{feupdateenv} to be generated at appropriate points in the compare-and-exchange sequence.  This hook should set @code{*@var{hold}} to an expression equivalent to the call to @code{feholdexcept}, @code{*@var{clear}} to an expression equivalent to the call to @code{feclearexcept} and @code{*@var{update}} to an expression equivalent to the call to @code{feupdateenv}.  The three expressions are @code{NULL_TREE} on entry to the hook and may be left as @code{NULL_TREE} if no code is required in a particular place.  The default implementation leaves all three expressions as @code{NULL_TREE}.  The @code{__atomic_feraiseexcept} function from @code{libatomic} may be of use as part of the code generated in @code{*@var{update}}.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_MODIFY_CLONE_CGRAPH_NODE (struct cgraph_node *@va

Re: [Patch 2/2][Arm] Implement TARGET_HOOK_SANITIZE_CLONE_ATTRIBUTES to remove cmse_nonsecure_entry

2019-10-15 Thread Andre Vieira (lists)


Hi,

Just some minor changes after name changes in the first patch. I'll 
assume this is also OK.


gcc/ChangeLog:
2019-10-15  Andre Vieira  

* config/arm/arm.c (TARGET_MODIFY_CLONE_CGRAPH_NODE): Define.
(arm_modify_clone_cgraph_node): New.

gcc/testsuite/ChangeLog:
2019-10-15  Andre Vieira  

* gcc.target/arm/cmse/ipa-clone.c: New test.


On 11/10/2019 00:29, Ramana Radhakrishnan wrote:

On Tue, Oct 8, 2019 at 4:21 PM Andre Vieira (lists)
 wrote:


Hi,

This patch implements the TARGET_HOOK_SANITIZE_CLONE_ATTRIBUTES for the
arm backend to remove the 'cmse_nonsecure_entry' attribute from cmse.

Bootstrapped the series on x86_64 and built arm-none-eabi, running the
cmse testsuite for armv8-m.main and armv8-m.base.

Is this OK for trunk?


Ok if the common bit is approved.  And do watch out for any testsuite
multilib fallout.

Ramana


Cheers,
Andre

gcc/ChangeLog:

2019-10-08  Andre Vieira  

  * config/arm/arm.c (TARGET_SANITIZE_CLONE_ATTRIBUTES): Define.
  (arm_sanitize_clone_attributes): New.

gcc/testsuite/ChangeLog:
2019-10-08  Andre Vieira  

  * gcc.target/arm/cmse/ipa-clone.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9f0975dc0710626ef46abecaa3a205e993821118..3874b25a401888227a3d69818c27bbf9229bcebf 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -811,6 +811,9 @@ static const struct attribute_spec arm_attribute_table[] =
 
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT arm_constant_alignment
+
+#undef TARGET_MODIFY_CLONE_CGRAPH_NODE
+#define TARGET_MODIFY_CLONE_CGRAPH_NODE arm_modify_clone_cgraph_node
 
 /* Obstack for minipool constant handling.  */
 static struct obstack minipool_obstack;
@@ -31999,6 +32002,15 @@ arm_constant_alignment (const_tree exp, HOST_WIDE_INT align)
   return align;
 }
 
+static void
+arm_modify_clone_cgraph_node (struct cgraph_node * node)
+{
+  tree attrs = DECL_ATTRIBUTES (node->decl);
+  if (lookup_attribute ("cmse_nonsecure_entry", attrs))
+attrs = remove_attribute ("cmse_nonsecure_entry", attrs);
+  DECL_ATTRIBUTES (node->decl) = attrs;
+}
+
 /* Emit a speculation barrier on target architectures that do not have
DSB/ISB directly.  Such systems probably don't need a barrier
themselves, but if the code is ever run on a later architecture, it
diff --git a/gcc/testsuite/gcc.target/arm/cmse/ipa-clone.c b/gcc/testsuite/gcc.target/arm/cmse/ipa-clone.c
new file mode 100644
index ..6ab4c34f7499f9615b5d44c633bb5f9d69e88d39
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/cmse/ipa-clone.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-mcmse" }  */
+/* { dg-additional-options "-Ofast" } */
+
+#include 
+
+int __attribute__ ((cmse_nonsecure_entry))
+foo (int a)
+{
+return 1;
+}
+
+int __attribute__ ((cmse_nonsecure_entry))
+bar (int a)
+{
+return 1;
+}
+
+int main (void)
+{
+return 0;
+}
+
+/* { dg-final { scan-assembler "foo.constprop.0:" } } */
+/* { dg-final { scan-assembler-not "__acle_se_foo.constprop.0:" } } */
+/* { dg-final { scan-assembler "foo:" } } */
+/* { dg-final { scan-assembler "__acle_se_foo:" } } */
+

Re: [Patch][Fortran] OpenMP+OpenACC: Remove bogus contigous-pointer check (PR/65438)

2019-10-15 Thread Jakub Jelinek

On Tue, Oct 15, 2019 at 11:42:12AM +0200, Tobias Burnus wrote:
> OpenMP – OpenMP 4.0 and 4.5 are based on Fortran 2003 (hence: no
> 'contiguous' attribute), OpenMP 5.0 is based on Fortran 2008. Hence, it
> explicitly uses the contiguous attribute. It also introduces 'simply
> contiguous array section' which largely matches Fortran's simply
> contiguous.Several contiguous restrictions were lifted with OpenMP 5.0; one
> of the few remaining ones is:
> 
> * [OpenMP 5's map clause] "If a list item is an array section, it must
> specify contiguous storage"
> But this also does not require (simply) contiguous, i.e. compile-time
> checking.

The function you are changing has been introduced with OpenACC and is used
solely for OpenACC clauses, so I'll defer review to Thomas here.

> The current check is used by OpenACC in
> 
> * resolve_oacc_data_clauses
> * resolve_oacc_deviceptr_clause
> * resolve_omp_clauses for OpenACC loops/EXEC_OACC_PARALLEL
> 
> And by OpenMP in
> 
> * resolve_omp_clauses for OMP_LIST_USE_DEVICE/OMP_LIST_DEVICE_RESIDENT

device_resident and use_device are OpenACC clauses,
resolve_omp_clauses is called for both OpenMP and OpenACC, but is called
only for OpenACC:
if (code
&& (oacc_is_loop (code) || code->op == EXEC_OACC_PARALLEL))
  check_array_not_assumed (n->sym, n->where, name);
and for those 2 clauses.

Jakub

Re: [Patch][Fortran] OpenMP+OpenACC: Remove bogus contigous-pointer check (PR/65438)

2019-10-15 Thread Thomas König

Hi Tobias,

What about using gfc_is_not_contigous? This would allow to issue an error when 
we can prove the user made an error.

Regards

Thomas

[PATCH] Fix PR92048

2019-10-15 Thread Richard Biener



Committed.

Richard.

2019-10-15  Richard Biener  

PR testsuite/92048
* gcc.dg/vect/fast-math-vect-pr29925.c: Avoid unrolling of
inner loop.

Index: gcc/testsuite/gcc.dg/vect/fast-math-vect-pr29925.c
===
--- gcc/testsuite/gcc.dg/vect/fast-math-vect-pr29925.c  (revision 276983)
+++ gcc/testsuite/gcc.dg/vect/fast-math-vect-pr29925.c  (working copy)
@@ -13,6 +13,7 @@
for (i=0;i

Re: Add a constant_range_value_p function (PR 92033)

2019-10-15 Thread Richard Sandiford

Richard Biener  writes:
> On October 14, 2019 2:32:43 PM GMT+02:00, Richard Sandiford 
>  wrote:
>>Richard Biener  writes:
>>> On Fri, Oct 11, 2019 at 4:42 PM Richard Sandiford
>>>  wrote:

 The range-tracking code has a pretty hard-coded assumption that
 is_gimple_min_invariant is equivalent to "INTEGER_CST or invariant
 ADDR_EXPR".  It seems better to add a predicate specifically for
 that rather than contiually fight cases in which it can't handle
 other invariants.

 Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>>
>>> ICK.  Nobody is going to remember this new restriction and
>>> constant_range_value_p reads like constant_value_range_p ;)
>>>
>>> Btw, is_gimple_invariant_address shouldn't have been exported,
>>> it's only use could have used is_gimple_min_invariant...
>>
>>What do you think we should do instead?
>
> Just handle POLY_INT_CST in a few place to quickly enough drop to varying. 

OK, how about this?  Aldy's suggestion would be fine by me too,
but I thought I'd try this first given Aldy's queasiness about
allowing POLY_INT_CSTs further in.

The main case in which this gives useful ranges is a lower bound
of A + B * X becoming A when B >= 0.  E.g.:

  (1) [32 + 16X, 100] -> [32, 100]
  (2) [32 + 16X, 32 + 16X] -> [32, MAX]

But the same thing can be useful for the upper bound with negative
X coefficients.

We can revisit this later if keeping a singleton range for (2)
would be better.

Tested as before.

Richard


2019-10-15  Richard Sandiford  

gcc/
PR middle-end/92033
* poly-int.h (constant_lower_bound_with_limit): New function.
(constant_upper_bound_with_limit): Likewise.
* doc/poly-int.texi: Document them.
* tree-vrp.c (value_range_base::set): Convert POLY_INT_CST bounds
into the worst-case INTEGER_CST bounds.

Index: gcc/poly-int.h
===
--- gcc/poly-int.h  2019-07-10 19:41:26.395898027 +0100
+++ gcc/poly-int.h  2019-10-15 11:30:14.099625553 +0100
@@ -1528,6 +1528,29 @@ constant_lower_bound (const poly_int_pod
   return a.coeffs[0];
 }
 
+/* Return the constant lower bound of A, given that it is no less than B.  */
+
+template
+inline POLY_CONST_COEFF (Ca, Cb)
+constant_lower_bound_with_limit (const poly_int_pod &a, const Cb &b)
+{
+  if (known_ge (a, b))
+return a.coeffs[0];
+  return b;
+}
+
+/* Return the constant upper bound of A, given that it is no greater
+   than B.  */
+
+template
+inline POLY_CONST_COEFF (Ca, Cb)
+constant_upper_bound_with_limit (const poly_int_pod &a, const Cb &b)
+{
+  if (known_le (a, b))
+return a.coeffs[0];
+  return b;
+}
+
 /* Return a value that is known to be no greater than A and B.  This
will be the greatest lower bound for some indeterminate values but
not necessarily for all.  */
Index: gcc/doc/poly-int.texi
===
--- gcc/doc/poly-int.texi   2019-03-08 18:14:25.333011645 +
+++ gcc/doc/poly-int.texi   2019-10-15 11:30:14.099625553 +0100
@@ -803,6 +803,18 @@ the assertion is known to hold.
 @item constant_lower_bound (@var{a})
 Assert that @var{a} is nonnegative and return the smallest value it can have.
 
+@item constant_lower_bound_with_limit (@var{a}, @var{b})
+Return the least value @var{a} can have, given that the context in
+which @var{a} appears guarantees that the answer is no less than @var{b}.
+In other words, the caller is asserting that @var{a} is greater than or
+equal to @var{b} even if @samp{known_ge (@var{a}, @var{b})} doesn't hold.
+
+@item constant_upper_bound_with_limit (@var{a}, @var{b})
+Return the greatest value @var{a} can have, given that the context in
+which @var{a} appears guarantees that the answer is no greater than @var{b}.
+In other words, the caller is asserting that @var{a} is less than or equal
+to @var{b} even if @samp{known_le (@var{a}, @var{b})} doesn't hold.
+
 @item lower_bound (@var{a}, @var{b})
 Return a value that is always less than or equal to both @var{a} and @var{b}.
 It will be the greatest such value for some indeterminate values
Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  2019-10-14 09:04:54.515259320 +0100
+++ gcc/tree-vrp.c  2019-10-15 11:30:14.099625553 +0100
@@ -727,6 +727,24 @@ value_range_base::set (enum value_range_
   return;
 }
 
+  /* Convert POLY_INT_CST bounds into worst-case INTEGER_CST bounds.  */
+  if (POLY_INT_CST_P (min))
+{
+  tree type_min = vrp_val_min (TREE_TYPE (min), true);
+  widest_int lb
+   = constant_lower_bound_with_limit (wi::to_poly_widest (min),
+  wi::to_widest (type_min));
+  min = wide_int_to_tree (TREE_TYPE (min), lb);
+}
+  if (POLY_INT_CST_P (max))
+{
+  tree type_max = vrp_val_max (TREE_TYPE (max), true);
+  widest_int ub
+   = constant_upper_bou

[PATCH] Fix PR92094

2019-10-15 Thread Richard Biener



The following fixes vectorization of nested cycles when the nested
cycle only constists of a PHI node.  As in the previous fix a
nested cycle only consists of the PHI, it doesn't necessarily have
another stmt only participating in that cycle (in this case it
participates in another nested cycle).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-10-15  Richard Biener  

PR tree-optimization/92094
* tree-vect-loop.c (vectorizable_reduction): For nested cycles
do not adjust the reduction definition def type.
* tree-vect-stmts.c (vect_transform_stmt): Verify the scalar stmt
defines the latch argument of the PHI.

* gfortran.dg/pr92094.f90: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 276983)
+++ gcc/tree-vect-loop.c(working copy)
@@ -5742,20 +5751,9 @@ vectorizable_reduction (stmt_vec_info stmt_info, s
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle)
 {
   if (is_a  (stmt_info->stmt))
-   {
- /* Analysis for double-reduction is done on the outer
-loop PHI, nested cycles have no further restrictions.  */
- STMT_VINFO_TYPE (stmt_info) = cycle_phi_info_type;
- /* For nested cycles we want to let regular vectorizable_*
-routines handle code-generation.  */
- if (STMT_VINFO_DEF_TYPE (reduc_info) != vect_double_reduction_def)
-   {
- stmt_info = STMT_VINFO_REDUC_DEF (stmt_info);
- STMT_VINFO_DEF_TYPE (stmt_info) = vect_internal_def;
- STMT_VINFO_DEF_TYPE (vect_stmt_to_vectorize (stmt_info))
-   = vect_internal_def;
-   }
-   }
+   /* Analysis for double-reduction is done on the outer
+  loop PHI, nested cycles have no further restrictions.  */
+   STMT_VINFO_TYPE (stmt_info) = cycle_phi_info_type;
   else
STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
   return true;
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 276983)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -10906,13 +10906,16 @@ vect_transform_stmt (stmt_vec_info stmt_info, gimp
   && STMT_VINFO_REDUC_TYPE (reduc_info) != EXTRACT_LAST_REDUCTION)
 {
   gphi *phi;
+  edge e;
   if (!slp_node
  && (phi = dyn_cast 
  (STMT_VINFO_REDUC_DEF (orig_stmt_info)->stmt))
  && dominated_by_p (CDI_DOMINATORS,
-gimple_bb (orig_stmt_info->stmt), gimple_bb (phi)))
+gimple_bb (orig_stmt_info->stmt), gimple_bb (phi))
+ && (e = loop_latch_edge (gimple_bb (phi)->loop_father))
+ && (PHI_ARG_DEF_FROM_EDGE (phi, e)
+ == gimple_get_lhs (orig_stmt_info->stmt)))
{
- edge e = loop_latch_edge (gimple_bb (phi)->loop_father);
  stmt_vec_info phi_info
= STMT_VINFO_VEC_STMT (STMT_VINFO_REDUC_DEF (orig_stmt_info));
  stmt_vec_info vec_stmt = STMT_VINFO_VEC_STMT (stmt_info);
@@ -10932,7 +10935,7 @@ vect_transform_stmt (stmt_vec_info stmt_info, gimp
{
  slp_tree phi_node = slp_node_instance->reduc_phis;
  gphi *phi = as_a  (SLP_TREE_SCALAR_STMTS (phi_node)[0]->stmt);
- edge e = loop_latch_edge (gimple_bb (phi)->loop_father);
+ e = loop_latch_edge (gimple_bb (phi)->loop_father);
  gcc_assert (SLP_TREE_VEC_STMTS (phi_node).length ()
  == SLP_TREE_VEC_STMTS (slp_node).length ());
  for (unsigned i = 0; i < SLP_TREE_VEC_STMTS (phi_node).length (); ++i)
Index: gcc/testsuite/gfortran.dg/pr92094.f90
===
--- gcc/testsuite/gfortran.dg/pr92094.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr92094.f90   (working copy)
@@ -0,0 +1,28 @@
+! { dg-do compile }
+! { dg-options "-O3" }
+  subroutine hesfcn(n, x, h, ldh)
+  integer n,ldh
+  double precision x(n), h(ldh)
+
+  integer i,j,k,kj
+  double precision th,u1,u2,v2
+ 
+  kj = 0
+  do 770 j = 1, n
+ kj = kj - j
+ do 760 k = 1, j
+kj = kj + 1
+v2 = 2 * x(k) - 1
+u1 = 0
+u2 = 2
+do 750 i = 1, n
+   h(kj) = h(kj) + u2
+   th = 4 * v2 + u2 - u1
+   u1 = u2
+   u2 = th
+   th = v2 - 1
+  750   continue
+  760continue
+  770 continue
+
+  end

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Segher Boessenkool

On Tue, Oct 15, 2019 at 11:32:27AM +0200, Richard Biener wrote:
> > I think we just need to fix the bug in the current logic when checking
> > whether the caller's ISA flags supports the callee's ISA flags. ...and
> > for that, I think we just need to add a test that enforces that the
> > caller's ISA flags match exactly the callee's flags, for those flags
> > that were explicitly set in the callee.  The patch below seems to fix
> > the issue (regtesting now).  Does this look like what we want?
> 
> I believe this is going to bite you exactly in the case you want the
> opposite behavior.  If you have CUs compiled with defaults and
> a specialized one with VSX that calls into generic compiled functions
> you _do_ want to allow inlining into the VSX enabled routines.

Yes, but *not* inlining is relatively harmless, while inlining can be
fatal.  I don't see how we can handle both scenarios optimally.

> Just
> think of LTO, C++ and comdats - you'll get a random comdat entity
> at link time for inlining - either from the VSX CU or the non-VSX one.

This would make LTO totally unusable, with or without this patch?  Something
else must be going on?


Segher

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Segher Boessenkool

On Tue, Oct 15, 2019 at 11:52:26AM +0200, Andreas Schwab wrote:
> On Okt 15 2019, Segher Boessenkool  wrote:
> 
> > Please use
> >   /* { dg-final { scan-assembler "\mbl vadd_no_vsx\M" } } */
> 
> ITYM
> 
> /* { dg-final { scan-assembler {\mbl vadd_no_vsx\M} } } */

Ha yes, thanks :-)  That's the problem with cut-and-paste (which is also
the cause of all these strange dg lines, I think).

Segher

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Segher Boessenkool

Hi!

On Tue, Oct 15, 2019 at 05:15:07PM +0800, Jiufu Guo wrote:
> And another issue: Behavior is still inconsistent between "-mno-vsx
> -flto" and "-mno-vsx" for user code. Previous patch makes it consistent
> between "-mvsx -flto" and "-mvsx". 

>  $GCC_BUILD/gcc/xgcc -B$GCC_BUILD/gcc novsx.c -O2 -mno-vsx
> /home/guojiufu/temp/novsx.c: In function 'main':
> /home/guojiufu/temp/novsx.c:6:1: error: inlining failed in call to 
> 'always_inline' 'foo': target specific option mismatch

So what should we do about this?  There are arguments for *both*
behaviours, and apparently with LTO we do not know which flags are
explicit?

> +  /* Propogate global flags to caller.  */
> +  HOST_WIDE_INT caller_isa = rs6000_isa_flags;

I don't think that is right, or, I don't see why, anyway?

> +
> +  if (((caller_isa & callee_isa) == callee_isa)
> + && (caller_isa & explicit_isa) == (callee_isa & explicit_isa))
> +   ret = true;
> +}


Segher

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Andreas Schwab

On Okt 15 2019, Segher Boessenkool  wrote:

> Please use
>   /* { dg-final { scan-assembler "\mbl vadd_no_vsx\M" } } */

ITYM

/* { dg-final { scan-assembler {\mbl vadd_no_vsx\M} } } */

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

[Patch][Fortran] OpenMP+OpenACC: Remove bogus contigous-pointer check (PR/65438)

2019-10-15 Thread Tobias Burnus

Permit more valid code by removing a too tight constraint – simple 
patch, very long background & history (feel free to skip).



Arrays in Fortran are classified as:

* assumed-size: "… :: var(*)" – and array but only the caller knows the 
size, which usually gets passed as some additional argument ("N") in the 
function all. – But at least contiguous. Typical for FORTRAN 66 code.
* assumed-rank: "… :: var(..)" – array can be any rank (dimension); 
provides full descriptor but the rank is not know at compile time, may 
or may not contigous/allocatable/pointer. Introduced mainly for MPI to 
avoid creating 16 versions of an interface (scalar to rank-15 arrays) – 
but has many uses; requires 'select rank' to be consumable within 
Fortran. — New with Fortran 2018 (or rather the TS29113, i.e. between 
F2013 and F2018).
* assumed shape: "… :: var(:, :)" – nonallocatable, nonpointer. Uses an 
array descriptor, can be contiguous or not (Always contiguous with 
Fortran 2008's 'contiguous' attribute)
*deferred shape: "var(:, :, :)" with allocatable/pointer attribute. 
'allocatable' implies contiguous by construction.
(* plus explicit-shape arrays like "… :: var(N,M)"  and with constant 
initialization: implied-shape arrays.)



openmp.c's "check_array_not_assumed" shows an error for:
* assumed-size arrays  (makes totally sense)
* assumed-rank arrays (makes sense as long as TS29113/F2018 is not 
supported)

* pointers which do not have the "contiguous" attribute

Not only function name wise, the latter is the odd one out. While in 
many cases a contiguous memory is required, one can either make it a 
compile-time testable constraint ('simply contiguous') – or one puts the 
onus is on the programmer; the latter seems to be done in the 
OpenACC/OpenMP specs.


(Of cause, it is also run-time checkable and gfortran uses a run-time 
test when passing a potentially noncontiguous array to a dummy argument 
with "contiguous" attribute; if the actual argument is contiguous, it is 
passed as is - otherwise, a copy-in and copy-out is performed [for 
intent(inout)].)



Currently, OpenACC 2.7 has:

* Based on Fortran 2003 – i.e. no 'contiguous attribute'.
* "2.7.1. Data Specification in Data Clauses": "Any array or subarray in 
a data clause, including Fortran array pointers, must be a contiguous 
block of memory, except for dynamic multidimensional C arrays."
* "2.14.4. Update Directive": "Noncontiguous subarrays may be specified. 
It is implementation-specific whether noncontiguous regions are updated 
by using one transfer for each contiguous subregion, or whether the 
noncontiguous data is packed, transferred once, and unpacked, or whether 
one or more larger subarrays (no larger than the smallest contiguous 
region that contains the specified subarray) are updated."


Oddly enough, OpenACC 2.7 only refers to "Fortran 2003" (in 1.6) but 
uses a TS29113/F2018 feature in "3.2.20. acc_copy_in": assumed-rank arrays.



OpenMP – OpenMP 4.0 and 4.5 are based on Fortran 2003 (hence: no 
'contiguous' attribute), OpenMP 5.0 is based on Fortran 2008. Hence, it 
explicitly uses the contiguous attribute. It also introduces 'simply 
contiguous array section' which largely matches Fortran's simply 
contiguous.Several contiguous restrictions were lifted with OpenMP 5.0; 
one of the few remaining ones is:


* [OpenMP 5's map clause] "If a list item is an array section, it must 
specify contiguous storage"
But this also does not require (simply) contiguous, i.e. compile-time 
checking.



The current check is used by OpenACC in

* resolve_oacc_data_clauses
* resolve_oacc_deviceptr_clause
* resolve_omp_clauses for OpenACC loops/EXEC_OACC_PARALLEL

And by OpenMP in

* resolve_omp_clauses for OMP_LIST_USE_DEVICE/OMP_LIST_DEVICE_RESIDENT


Some more archeology (thanks Thomas!):
* PR fortran/65438 (of 2015-03-16) - requires to remove this check, 
without providing any reasoning
* https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg145081.html 
(of 2016-08-09) was an attempt to add a warning – which was rejected on 
a similar ground then listed above: The 'contiguous' attribute (or 
simply contiguous property) is not explicitly ask for in the spec – and 
as OpenACC only supports Fortran < 2008, enforcing a Fortran 2008 
feature does not make sense.



HENCE: Remove a compile-time check when both specs (OpenMP/OpenACC) put 
the onus on the programmer. As compile-time checks are only a subset, we 
reject valid contiguous memory which is just not simply contiguous.


Additionally, the check only works if only an identifier is permitted as 
argument (it checks the symbol). For derived-type components, the check 
is not effective. (Assumed rank and assumed size are properties of dummy 
arguments, hence, those work fine.)


OK for the trunk?

Cheers,

Tobias

PS: This patch comes from the OG9 branch as 
ac6c90812344f4f4cfe4d2f5901c1a9d038a4000 but was actually already in the 
OG8 branch with commit d50862a7522446cf101eac9dc2e32967dc8318b7 from 
201

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Iain Sandoe

Segher Boessenkool  wrote:

> Hi!
> 
> On Mon, Oct 14, 2019 at 07:18:11PM -0500, Peter Bergner wrote:
>> On 10/14/19 2:57 PM, Segher Boessenkool wrote:
>>> On Mon, Oct 14, 2019 at 06:35:06PM +0200, Richard Biener wrote:
 The general case should be that if the caller ISA supports the callee one
 then inlining is OK. If this is not wanted in some cases then there are
 options like using a noinline attribute. 
>>> 
>>> I agree, and that is what we already do afaik.
>> 
>> I agree on the making sure the caller's ISA supports the callee's ISA
>> before allowing inlining...and I think that's what our code it "trying"
>> to do.  It just has some bugs.
> 
> It says that it wants the callee ISA to be a subset of the caller ISA,
> and that is what it does.  Which is exactly what we want.  But as the
> PR points out it is *also* useful to not inline callees that explicitly
> disabled some ISA feature the caller has.  Which we never promissed, but
> perhaps we should.
> 
>> I don't think sprinkling noinline's around will work given LTO can
>> inline random functions from one object file into a function in another
>> object file and we have no idea what the options were used for both files.
>> I think that would just force us to end up putting nolines on all fuctions
>> that might be compiled with LTO.
> 
> I think LTO should handle noinline attributes just fine, it would be a
> much bigger problem than this PR if it doesn't!
> 
> 
> Missing date+name+address here; and missing PR ref.
> 
>> gcc/
>>  * config/rs6000/rs6000.c (rs6000_can_inline_p): Handle explicit
>>  options.
> 
> "Prohibit inlining if the callee explicitly disables some isa_flags the
> caller has", or something like that?  A few more words please.
> 
>> --- gcc/testsuite/gcc.target/powerpc/pr70010.c   (nonexistent)
>> +++ gcc/testsuite/gcc.target/powerpc/pr70010.c   (working copy)
>> @@ -0,0 +1,21 @@
>> +/* { dg-do compile { target { powerpc*-*-* } } } */
> 
> Just
>  /* { dg-do compile } */
> please: everything in gcc.target checks for powerpc*-*-* always (and
> compile is the default, but it's nice documentation).
> 
>> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> 
> You don't need this I think.  If this test cannot work on Darwin for
> some reason, the Darwin maintainers will take care of it.  Unless you
> *know* they want (or need) the skip here?

Right, it should not be needed because ..

>> +/* { dg-require-effective-target powerpc_vsx_ok } */

… this ^ should DTRT for Darwin too.  If not, then that’s the place I want
fix it.

>> +/* { dg-options "-O2 -finline" } */
> 
> -finline does nothing; it isn't even documented, only -fno-inline is.
> 
>> +/* { dg-final { scan-assembler "bl vadd_no_vsx" } } */
> 
> Please use
>  /* { dg-final { scan-assembler "\mbl vadd_no_vsx\M" } } */
> so that it won't match "rldibl vadd_no_vsx", etc.  Doesn't matter much
> for this testcase, but it's a good habit (and using it makes me not
> comment about it, also useful ;-) )
> 
> Okay for trunk with those tweaks.  Thanks!
> 
> 
> Segher

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Richard Biener

On Tue, Oct 15, 2019 at 2:18 AM Peter Bergner  wrote:
>
> On 10/14/19 2:57 PM, Segher Boessenkool wrote:
> > On Mon, Oct 14, 2019 at 06:35:06PM +0200, Richard Biener wrote:
> >> The general case should be that if the caller ISA supports the callee one
> >> then inlining is OK. If this is not wanted in some cases then there are
> >> options like using a noinline attribute.
> >
> > I agree, and that is what we already do afaik.
>
> I agree on the making sure the caller's ISA supports the callee's ISA
> before allowing inlining...and I think that's what our code it "trying"
> to do.  It just has some bugs.
>
>
> > But in this case, the callee explicitly disables something (-mno-vsx),
> > while the caller has it enabled (all modern cpus have VSX).  If it ends
> > up being inlined, it will get VSX insns generated for it.  Which is what
> > Jiu Fu's patch aims to prevent.
> >
> > So you are saying the GCC policy is that you should use noinline on the
> > callee in such cases?
>
> I don't think sprinkling noinline's around will work given LTO can
> inline random functions from one object file into a function in another
> object file and we have no idea what the options were used for both files.
> I think that would just force us to end up putting nolines on all fuctions
> that might be compiled with LTO.

There's a function attribute for this.

> I think we just need to fix the bug in the current logic when checking
> whether the caller's ISA flags supports the callee's ISA flags. ...and
> for that, I think we just need to add a test that enforces that the
> caller's ISA flags match exactly the callee's flags, for those flags
> that were explicitly set in the callee.  The patch below seems to fix
> the issue (regtesting now).  Does this look like what we want?

I believe this is going to bite you exactly in the case you want the
opposite behavior.  If you have CUs compiled with defaults and
a specialized one with VSX that calls into generic compiled functions
you _do_ want to allow inlining into the VSX enabled routines.  Just
think of LTO, C++ and comdats - you'll get a random comdat entity
at link time for inlining - either from the VSX CU or the non-VSX one.

Richard.

> Peter
>
>
> gcc/
> * config/rs6000/rs6000.c (rs6000_can_inline_p): Handle explicit
> options.
>
> gcc.testsuite/
> * gcc.target/powerpc/pr70010.c: New test.
>
>
> Index: gcc/config/rs6000/rs6000.c
> ===
> --- gcc/config/rs6000/rs6000.c  (revision 276975)
> +++ gcc/config/rs6000/rs6000.c  (working copy)
> @@ -23976,13 +23976,18 @@ rs6000_can_inline_p (tree caller, tree c
>else
>  {
>struct cl_target_option *caller_opts = TREE_TARGET_OPTION 
> (caller_tree);
> +  HOST_WIDE_INT caller_isa = caller_opts->x_rs6000_isa_flags;
>struct cl_target_option *callee_opts = TREE_TARGET_OPTION 
> (callee_tree);
> +  HOST_WIDE_INT callee_isa = callee_opts->x_rs6000_isa_flags;
> +  HOST_WIDE_INT explicit_isa = callee_opts->x_rs6000_isa_flags_explicit;
>
> -  /* Callee's options should a subset of the caller's, i.e. a vsx 
> function
> -can inline an altivec function but a non-vsx function can't inline a
> -vsx function.  */
> -  if ((caller_opts->x_rs6000_isa_flags & callee_opts->x_rs6000_isa_flags)
> - == callee_opts->x_rs6000_isa_flags)
> +  /* The callee's options must be a subset of the caller's options, i.e.
> +a vsx function may inline an altivec function, but a non-vsx function
> +must not inline a vsx function.  However, for those options that the
> +callee has explicitly set, then we must enforce that the callee's
> +and caller's options match exactly; see PR70010.  */
> +  if (((caller_isa & callee_isa) == callee_isa)
> + && (caller_isa & explicit_isa) == (callee_isa & explicit_isa))
> ret = true;
>  }
>
> Index: gcc/testsuite/gcc.target/powerpc/pr70010.c
> ===
> --- gcc/testsuite/gcc.target/powerpc/pr70010.c  (nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr70010.c  (working copy)
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -finline" } */
> +/* { dg-final { scan-assembler "bl vadd_no_vsx" } } */
> +
> +typedef int vec_t __attribute__((vector_size(16)));
> +
> +static vec_t
> +__attribute__((__target__("no-vsx")))
> +vadd_no_vsx (vec_t a, vec_t b)
> +{
> +  return a + b;
> +}
> +
> +vec_t
> +__attribute__((__target__("vsx")))
> +call_vadd_no_vsx (vec_t x, vec_t y, vec_t z)
> +{
> +  return vadd_no_vsx (x, y) - z;
> +}
>
>
>
> bergner@pike:~/gcc/BUGS/CPU$ cat pr70010.i
> typedef int vec_t __attribute__((vector_size(16)));
>
> static vec_t
> __attribute__((__target__("no-vsx")))
> vadd_no_vsx (vec_t a, vec_t b)
> {

Re: [PATCH] rs6000: -flto forgets 'no-vsx' function attributes (PR target/70010)

2019-10-15 Thread Segher Boessenkool

Hi!

On Mon, Oct 14, 2019 at 07:18:11PM -0500, Peter Bergner wrote:
> On 10/14/19 2:57 PM, Segher Boessenkool wrote:
> > On Mon, Oct 14, 2019 at 06:35:06PM +0200, Richard Biener wrote:
> >> The general case should be that if the caller ISA supports the callee one
> >> then inlining is OK. If this is not wanted in some cases then there are
> >> options like using a noinline attribute. 
> > 
> > I agree, and that is what we already do afaik.
> 
> I agree on the making sure the caller's ISA supports the callee's ISA
> before allowing inlining...and I think that's what our code it "trying"
> to do.  It just has some bugs.

It says that it wants the callee ISA to be a subset of the caller ISA,
and that is what it does.  Which is exactly what we want.  But as the
PR points out it is *also* useful to not inline callees that explicitly
disabled some ISA feature the caller has.  Which we never promissed, but
perhaps we should.

> I don't think sprinkling noinline's around will work given LTO can
> inline random functions from one object file into a function in another
> object file and we have no idea what the options were used for both files.
> I think that would just force us to end up putting nolines on all fuctions
> that might be compiled with LTO.

I think LTO should handle noinline attributes just fine, it would be a
much bigger problem than this PR if it doesn't!

Missing date+name+address here; and missing PR ref.

> gcc/
>   * config/rs6000/rs6000.c (rs6000_can_inline_p): Handle explicit
>   options.

"Prohibit inlining if the callee explicitly disables some isa_flags the
caller has", or something like that?  A few more words please.

> --- gcc/testsuite/gcc.target/powerpc/pr70010.c(nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr70010.c(working copy)
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */

Just
  /* { dg-do compile } */
please: everything in gcc.target checks for powerpc*-*-* always (and
compile is the default, but it's nice documentation).

> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */

You don't need this I think.  If this test cannot work on Darwin for
some reason, the Darwin maintainers will take care of it.  Unless you
*know* they want (or need) the skip here?

> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -finline" } */

-finline does nothing; it isn't even documented, only -fno-inline is.

> +/* { dg-final { scan-assembler "bl vadd_no_vsx" } } */

Please use
  /* { dg-final { scan-assembler "\mbl vadd_no_vsx\M" } } */
so that it won't match "rldibl vadd_no_vsx", etc.  Doesn't matter much
for this testcase, but it's a good habit (and using it makes me not
comment about it, also useful ;-) )

Okay for trunk with those tweaks.  Thanks!

Segher

Re: [PATCH] RFA (gimplify.h) Fix incorrect cp/ use of get_formal_tmp_var.

2019-10-15 Thread Richard Biener

On Mon, Oct 14, 2019 at 10:25 PM Jason Merrill  wrote:
>
> The comment for get_formal_tmp_var says that it shouldn't be used for
> expressions whose value might change between initialization and use, and in
> this case we're creating a temporary precisely because the value might
> change, so we should use get_initialized_tmp_var instead.
>
> I also noticed that many callers of get_initialized_tmp_var pass NULL for
> post_p, so it seems appropriate to make it a default argument.  OK for trunk?

OK.

> Tested x86_64-pc-linux-gnu.
>
> gcc/cp/
> * cp-gimplify.c (cp_gimplify_expr): Use get_initialized_tmp_var.
> gcc/
> * gimplify.h (get_initialized_tmp_var): Add default argument to
> post_p.
> ---
>  gcc/gimplify.h   | 2 +-
>  gcc/cp/cp-gimplify.c | 2 +-
>  gcc/gimplify.c   | 5 +++--
>  3 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/gimplify.h b/gcc/gimplify.h
> index 1070006374a..6c997a769cd 100644
> --- a/gcc/gimplify.h
> +++ b/gcc/gimplify.h
> @@ -57,7 +57,7 @@ extern gbind *gimple_current_bind_expr (void);
>  extern vec gimple_bind_expr_stack (void);
>  extern void gimplify_and_add (tree, gimple_seq *);
>  extern tree get_formal_tmp_var (tree, gimple_seq *);
> -extern tree get_initialized_tmp_var (tree, gimple_seq *, gimple_seq *,
> +extern tree get_initialized_tmp_var (tree, gimple_seq *, gimple_seq * = NULL,
>  bool = true);
>  extern void declare_vars (tree, gimple *, bool);
>  extern void gimple_add_tmp_var (tree);
> diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
> index 154fa70ec06..80754b930b9 100644
> --- a/gcc/cp/cp-gimplify.c
> +++ b/gcc/cp/cp-gimplify.c
> @@ -767,7 +767,7 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, 
> gimple_seq *post_p)
> && (TREE_CODE (op1) == CALL_EXPR
> || (SCALAR_TYPE_P (TREE_TYPE (op1))
> && !TREE_CONSTANT (op1
> -TREE_OPERAND (*expr_p, 1) = get_formal_tmp_var (op1, pre_p);
> +TREE_OPERAND (*expr_p, 1) = get_initialized_tmp_var (op1, pre_p);
>}
>ret = GS_OK;
>break;
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 836706961f3..7f9100ba97d 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -661,8 +661,9 @@ get_formal_tmp_var (tree val, gimple_seq *pre_p)
> are as in gimplify_expr.  */
>
>  tree
> -get_initialized_tmp_var (tree val, gimple_seq *pre_p, gimple_seq *post_p,
> -bool allow_ssa)
> +get_initialized_tmp_var (tree val, gimple_seq *pre_p,
> +gimple_seq *post_p /* = NULL */,
> +bool allow_ssa /* = true */)
>  {
>return internal_get_tmp_var (val, pre_p, post_p, false, allow_ssa);
>  }
>
> base-commit: aa45db50a034b266c338b55dee1b412178ea84a7
> --
> 2.18.1
>

Re: [PATCH] teach gengtype about 'mutable'

2019-10-15 Thread Richard Biener

Yes, it is. :)

On Mon, Oct 14, 2019 at 10:09 PM Nathan Sidwell  wrote:
>
> On 10/14/19 3:46 PM, Jeff Law wrote:
> > On 10/14/19 6:09 AM, Nathan Sidwell wrote:
> >> On 10/14/19 7:16 AM, Richard Biener wrote:
> >>> On Sun, Oct 13, 2019 at 4:45 PM Nathan Sidwell  wrote:
> 
>  In constifying some more of line-map I discovered gengtype didn't
>  know mutable.
>  Added thusly.
> >>>
> >>> mutable is bad.  Why do you want to use it?
> >>
> >> the line map info has a caching field.
> > Isn't that in fact a classic use case for mutable?
>
> Indeed it is.  I'm curious as to the 'mutable is bad' origin.  Is it
> similar to 'goto is bad'?
>
> nathan
>
>
> --
> Nathan Sidwell

Re: [PATCH] S/390: Run %a0:DI splitters only after reload

2019-10-15 Thread Andreas Krebbel

On 11.10.19 18:35, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux.
> 
> gcc/ChangeLog:
> 
> 2019-10-10  Ilya Leoshkevich  
> 
>   * config/s390/s390.md: Run %a0:DI splitters only after reload.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-10-10  Ilya Leoshkevich  
> 
>   * gcc.target/s390/load-thread-pointer-once.c: New test.

Ok. Thanks!

Andreas

1 2 >

1 - 100 of 107 matches

Mail list logo