[pushed] c++: preserve BASELINK from lookup [PR91706]

2021-06-07 Thread Jason Merrill via Gcc-patches
In the earlier patch for PR91706 I fixed the BASELINK built by
baselink_for_fns, but since we already had one from lookup, we should keep
that one around instead of stripping it.  The removed hunk in
get_class_binding was a weirdly large amount of code to decide whether to
pull out BASELINK_FUNCTIONS.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/91706
* name-lookup.c (get_class_binding): Keep a BASELINK.
(set_inherited_value_binding_p): Adjust.
* lambda.c (is_lambda_ignored_entity): Adjust.
* pt.c (lookup_template_function): Copy a BASELINK before
modifying it.
---
 gcc/cp/lambda.c  |  6 +++---
 gcc/cp/name-lookup.c | 24 +---
 gcc/cp/pt.c  |  1 +
 3 files changed, 5 insertions(+), 26 deletions(-)

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 4a1e090ead4..2e9d38bbe83 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -1338,9 +1338,9 @@ is_lambda_ignored_entity (tree val)
 
   /* None of the lookups that use qualify_lookup want the op() from the
  lambda; they want the one from the enclosing class.  */
-  val = OVL_FIRST (val);
-  if (LAMBDA_FUNCTION_P (val))
-return true;
+  if (tree fns = maybe_get_fns (val))
+if (LAMBDA_FUNCTION_P (OVL_FIRST (fns)))
+  return true;
 
   return false;
 }
diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 241ad2b9c32..1be5f3da6d5 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -5236,7 +5236,7 @@ set_inherited_value_binding_p (cxx_binding *binding, tree 
decl,
 {
   tree context;
 
-  if (TREE_CODE (decl) == OVERLOAD)
+  if (is_overloaded_fn (decl))
context = ovl_scope (decl);
   else
{
@@ -5338,28 +5338,6 @@ get_class_binding (tree name, cp_binding_level *scope)
 /*protect=*/2, /*want_type=*/false,
 tf_warning_or_error);
 
-  if (value_binding
-  && (TREE_CODE (value_binding) == TYPE_DECL
- || DECL_CLASS_TEMPLATE_P (value_binding)
- || (TREE_CODE (value_binding) == TREE_LIST
- && TREE_TYPE (value_binding) == error_mark_node
- && (TREE_CODE (TREE_VALUE (value_binding))
- == TYPE_DECL
-/* We found a type binding, even when looking for a non-type
-   binding.  This means that we already processed this binding
-   above.  */
-;
-  else if (value_binding)
-{
-  if (TREE_CODE (value_binding) == TREE_LIST
- && TREE_TYPE (value_binding) == error_mark_node)
-   /* NAME is ambiguous.  */
-   ;
-  else if (BASELINK_P (value_binding))
-   /* NAME is some overloaded functions.  */
-   value_binding = BASELINK_FUNCTIONS (value_binding);
-}
-
   /* If we found either a type binding or a value binding, create a
  new binding object.  */
   if (type_binding || value_binding)
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 2ae886d3a39..b0155a9c370 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -9597,6 +9597,7 @@ lookup_template_function (tree fns, tree arglist)
 
   if (BASELINK_P (fns))
 {
+  fns = copy_node (fns);
   BASELINK_FUNCTIONS (fns) = build2 (TEMPLATE_ID_EXPR,
 unknown_type_node,
 BASELINK_FUNCTIONS (fns),

base-commit: 715614ec3ec5390293e508bb190335d28db1fa8b
prerequisite-patch-id: 91247507710e22bb04c3dffc0a3c418152b7dcd0
-- 
2.27.0



[pushed] c++: alias with same name as base fn [PR91706]

2021-06-07 Thread Jason Merrill via Gcc-patches
This is a bit complex.  Looking up c in the definition of D::c finds
C::c, OK.  Looking up c in the definition of E finds D::c, OK.  Since the
alias is not dependent, we strip it from the template argument, leaving

using E = A())>;

where 'c' still refers to C::c.  But instantiating E looks up 'c' again and
finds D::c, which isn't a function, and sadness ensues.

I think the bug here is looking up 'c' in D at instantiation time; the
declaration we found before is not dependent.  This seems to happen because
baselink_for_fns gets BASELINK_BINFO wrong; it is supposed to be the base
where lookup found the functions, C in this case.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/91706
* semantics.c (baselink_for_fns): Fix BASELINK_BINFO.

gcc/testsuite/ChangeLog:

PR c++/91706
* g++.dg/template/lookup17.C: New test.
---
 gcc/cp/semantics.c   |  6 --
 gcc/testsuite/g++.dg/template/lookup17.C | 18 ++
 2 files changed, 22 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/lookup17.C

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index d08c1ddabf9..f506a239864 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -3663,8 +3663,10 @@ baselink_for_fns (tree fns)
   cl = currently_open_derived_class (scope);
   if (!cl)
 cl = scope;
-  cl = TYPE_BINFO (cl);
-  return build_baselink (cl, cl, fns, /*optype=*/NULL_TREE);
+  tree access_path = TYPE_BINFO (cl);
+  tree conv_path = (cl == scope ? access_path
+   : lookup_base (cl, scope, ba_any, NULL, tf_none));
+  return build_baselink (conv_path, access_path, fns, /*optype=*/NULL_TREE);
 }
 
 /* Returns true iff DECL is a variable from a function outside
diff --git a/gcc/testsuite/g++.dg/template/lookup17.C 
b/gcc/testsuite/g++.dg/template/lookup17.C
new file mode 100644
index 000..b8571b9f1eb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/lookup17.C
@@ -0,0 +1,18 @@
+// PR c++/91706
+// { dg-do compile { target c++11 } }
+// { dg-additional-options -g }
+
+template  struct A;
+
+struct B { static constexpr bool g = false; };
+
+struct C {
+  template  static B c ();
+};
+
+template  struct D : C {
+  using c = decltype (c());
+  using E = A;
+};
+
+D g;

base-commit: 715614ec3ec5390293e508bb190335d28db1fa8b
-- 
2.27.0



[pushed] c++: fix modules binfo merging

2021-06-07 Thread Jason Merrill via Gcc-patches
My coming fix for PR91706 caused some regressions in the modules testsuite.
This turned out to be because the change to properly use the base subobject
BINFO as BASELINK_BINFO hit problems with the code for merging binfos.  The
tree reader needed a typo fix.  The duplicate_hash function was crashing on
the BINFO for a variadic base in .  I started fixing the hash
function, but then noticed that there's no ::equal function defined;
duplicate_hash just uses pointer equality, so we might as well also
use the normal pointer hash for the moment.
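
The loop-counter fix can be illustrated outside the compiler.  A minimal
sketch, assuming a hypothetical singly linked node type in place of the
BINFO chain (node, nth_buggy, nth_fixed and check_nth are illustrative
names, not GCC code): without the decrement, any nonzero index walks off
the end of the chain and yields null, so nothing is found for it.

```cpp
#include <cassert>

// Hypothetical stand-in for a chain of BINFOs linked through TREE_CHAIN.
struct node {
  int id;
  node *next;
};

// Mirrors the pre-patch loop: ix is never decremented, so any nonzero ix
// runs to the end of the chain and the result is always null.
inline node *nth_buggy(node *head, unsigned ix) {
  node *cur = head;
  while (cur && ix)
    cur = cur->next;
  return cur;
}

// Mirrors the patched loop: step forward ix times, or stop at the end.
inline node *nth_fixed(node *head, unsigned ix) {
  node *cur = head;
  while (cur && ix--)
    cur = cur->next;
  return cur;
}

// Small self-check over a three-element chain a -> b -> c.
inline void check_nth() {
  node c{2, nullptr}, b{1, &c}, a{0, &b};
  assert(nth_fixed(&a, 0) == &a);
  assert(nth_fixed(&a, 2) == &c);
  assert(nth_fixed(&a, 3) == nullptr);  // index past the end
  assert(nth_buggy(&a, 2) == nullptr);  // the pre-patch loop loses the node
}
```

Index 0 behaves the same in both versions, which may be why the problem
only showed up once a non-primary base BINFO had to be merged.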

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* module.cc (duplicate_hash::hash): Comment out.
(trees_in::tree_value): Adjust loop counter.
---
 gcc/cp/module.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index f0fb0144706..f259515a498 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -2820,12 +2820,16 @@ struct merge_key {
 
 struct duplicate_hash : nodel_ptr_hash
 {
+#if 0
+  /* This breaks variadic bases in the xtreme_header tests.  Since ::equal is
+ the default pointer_hash::equal, let's use the default hash as well.  */
   inline static hashval_t hash (value_type decl)
   {
 if (TREE_CODE (decl) == TREE_BINFO)
   decl = TYPE_NAME (BINFO_TYPE (decl));
 return hashval_t (DECL_UID (decl));
   }
+#endif
 };
 
 /* Hashmap of merged duplicates.  Usually decls, but can contain
@@ -8908,7 +8912,7 @@ trees_in::tree_value ()
  dump (dumper::MERGE)
&& dump ("Deduping binfo %N[%u]", type, ix);
  existing = TYPE_BINFO (type);
- while (existing && ix)
+ while (existing && ix--)
existing = TREE_CHAIN (existing);
  if (existing)
register_duplicate (t, existing);

base-commit: e1521b170b44be5cd5d36a98b6b760457b68f566
prerequisite-patch-id: 7e7fc5a2a18d7a60f7db06fcb792fd5e5f7ae636
-- 
2.27.0



[pushed] c++: alias member template [PR100102]

2021-06-07 Thread Jason Merrill via Gcc-patches
Patrick already fixed the primary cause of this bug.  But while I was
looking at this testcase I noticed that with the qualified name k::o we
ended up with a plain FUNCTION_DECL, whereas without the k:: we got a
BASELINK.  There seems to be no good reason not to return the BASELINK
in this case as well.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/100102

gcc/cp/ChangeLog:

* init.c (build_offset_ref): Return the BASELINK for a static
member function.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alias-decl-73.C: New test.
---
 gcc/cp/init.c  | 2 +-
 gcc/testsuite/g++.dg/cpp0x/alias-decl-73.C | 9 +
 2 files changed, 10 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-73.C

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index b1123287300..1b161d526f6 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -2214,7 +2214,7 @@ build_offset_ref (tree type, tree member, bool address_p,
  if (!ok)
return error_mark_node;
  if (DECL_STATIC_FUNCTION_P (t))
-   return t;
+   return member;
  member = t;
}
   else
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-73.C 
b/gcc/testsuite/g++.dg/cpp0x/alias-decl-73.C
new file mode 100644
index 000..aae778646dc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-73.C
@@ -0,0 +1,9 @@
+// PR c++/100102
+// { dg-do compile { target c++11 } }
+
+template  using a = int;
+template  struct k {
+  static long o();
+  template  using n = a;
+  n q;
+};

base-commit: e1521b170b44be5cd5d36a98b6b760457b68f566
-- 
2.27.0



Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-07 Thread Trevor Saunders
On Mon, Jun 07, 2021 at 02:34:26PM -0600, Martin Sebor wrote:
> On 6/7/21 2:51 AM, Richard Biener wrote:
> > On Thu, Jun 3, 2021 at 10:29 AM Trevor Saunders  
> > wrote:
> > > 
> > > On Wed, Jun 02, 2021 at 10:04:03AM -0600, Martin Sebor via Gcc-patches 
> > > wrote:
> > > > On 6/2/21 12:55 AM, Richard Biener wrote:
> > > > > On Tue, Jun 1, 2021 at 9:56 PM Martin Sebor  wrote:
> > > > > > 
> > > > > > On 5/27/21 2:53 PM, Jason Merrill wrote:
> > > > > > > On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:
> > > > > > > > On 4/27/21 8:04 AM, Richard Biener wrote:
> > > > > > > > > On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor 
> > > > > > > > >  wrote:
> > > > > > > > > > 
> > > > > > > > > > On 4/27/21 1:58 AM, Richard Biener wrote:
> > > > > > > > > > > On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via 
> > > > > > > > > > > Gcc-patches
> > > > > > > > > > >  wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > PR 90904 notes that auto_vec is unsafe to copy and 
> > > > > > > > > > > > assign because
> > > > > > > > > > > > the class manages its own memory but doesn't define (or 
> > > > > > > > > > > > delete)
> > > > > > > > > > > > either special function.  Since I first ran into the 
> > > > > > > > > > > > problem,
> > > > > > > > > > > > auto_vec has grown a move ctor and move assignment from
> > > > > > > > > > > > a dynamically-allocated vec but still no copy ctor or 
> > > > > > > > > > > > copy
> > > > > > > > > > > > assignment operator.
> > > > > > > > > > > > 
> > > > > > > > > > > > The attached patch adds the two special functions to 
> > > > > > > > > > > > auto_vec along
> > > > > > > > > > > > with a few simple tests.  It makes auto_vec safe to use 
> > > > > > > > > > > > in containers
> > > > > > > > > > > > that expect copyable and assignable element types and 
> > > > > > > > > > > > passes
> > > > > > > > > > > > bootstrap
> > > > > > > > > > > > and regression testing on x86_64-linux.
> > > > > > > > > > > 
> > > > > > > > > > > The question is whether we want such uses to appear since 
> > > > > > > > > > > those
> > > > > > > > > > > can be quite inefficient?  Thus the option is to delete 
> > > > > > > > > > > those
> > > > > > > > > > > operators?
> > > > > > > > > > 
> > > > > > > > > > I would strongly prefer the generic vector class to have 
> > > > > > > > > > the properties
> > > > > > > > > > expected of any other generic container: copyable and 
> > > > > > > > > > assignable.  If
> > > > > > > > > > we also want another vector type with this restriction I 
> > > > > > > > > > suggest to add
> > > > > > > > > > another "noncopyable" type and make that property explicit 
> > > > > > > > > > in its name.
> > > > > > > > > > I can submit one in a followup patch if you think we need 
> > > > > > > > > > one.
> > > > > > > > > 
> > > > > > > > > I'm not sure (and not strictly against the copy and assign).  
> > > > > > > > > Looking
> > > > > > > > > around
> > > > > > > > > I see that vec<> does not do deep copying.  Making auto_vec<> 
> > > > > > > > > do it
> > > > > > > > > might be surprising (I added the move capability to match how 
> > > > > > > > > vec<>
> > > > > > > > > is used - as "reference" to a vector)
> > > > > > > > 
> > > > > > > > The vec base classes are special: they have no ctors at all 
> > > > > > > > (because
> > > > > > > > of their use in unions).  That's something we might have to 
> > > > > > > > live with
> > > > > > > > but it's not a model to follow in ordinary containers.
> > > > > > > 
> > > > > > > I don't think we have to live with it anymore, now that we're 
> > > > > > > writing
> > > > > > > C++11.
> > > > > > > 
> > > > > > > > The auto_vec class was introduced to fill the need for a 
> > > > > > > > conventional
> > > > > > > > sequence container with a ctor and dtor.  The missing copy ctor 
> > > > > > > > and
> > > > > > > > assignment operators were an oversight, not a deliberate 
> > > > > > > > feature.
> > > > > > > > This change fixes that oversight.
> > > 
> > > I've been away a while, but trying to get back into this, sorry.  It was
> > > definitely an oversight to leave these undefined for the compiler to
> > > provide a default definition of, but I agree with Richi, the better
> > > thing to have done, or do now would be to mark them as deleted and make
> > > auto_vec move only (with copy() for when you really need a deep copy).
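
The move-only design argued for above can be sketched as follows.  This
is a minimal illustration under stated assumptions, not GCC's auto_vec:
move_only_vec, push, length and check_move_only_vec are hypothetical
names, and std::vector stands in for the real storage.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical move-only container; not GCC's auto_vec.
template <typename T>
class move_only_vec {
public:
  move_only_vec() = default;

  // Copying is deleted, so an accidental deep copy fails to compile.
  move_only_vec(const move_only_vec &) = delete;
  move_only_vec &operator=(const move_only_vec &) = delete;

  // Moving transfers ownership of the storage.
  move_only_vec(move_only_vec &&) = default;
  move_only_vec &operator=(move_only_vec &&) = default;

  void push(const T &v) { elts_.push_back(v); }
  std::size_t length() const { return elts_.size(); }
  const T &operator[](std::size_t i) const { return elts_[i]; }

  // Explicit deep copy for the cases that really need one.
  move_only_vec copy() const {
    move_only_vec r;
    r.elts_ = elts_;
    return r;
  }

private:
  std::vector<T> elts_;
};

static_assert(!std::is_copy_constructible<move_only_vec<int>>::value,
              "copy construction is deleted");
static_assert(std::is_move_constructible<move_only_vec<int>>::value,
              "move construction is available");

// Usage: the deep copy is spelled out at the call site, the move is cheap.
inline void check_move_only_vec() {
  move_only_vec<int> a;
  a.push(1);
  a.push(2);
  move_only_vec<int> b = a.copy();      // explicit deep copy
  move_only_vec<int> c = std::move(a);  // ownership transfer
  assert(b.length() == 2 && b[1] == 2);
  assert(c.length() == 2 && c[0] == 1);
}
```

The point of the design is that the expensive operation is visible in the
source as a call to copy(), while passing and returning by value stay cheap.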
> > > > > > > > 
> > > > > > > > The revised patch also adds a copy ctor/assignment to the 
> > > > > > > > auto_vec
> > > > > > > > primary template (that's also missing it).  In addition, it adds
> > > > > > > > a new class called auto_vec_ncopy that disables copying and
> > > > > > > > assignment as you prefer.
> > > > > > > 
> > > > > > > Hmm, adding another class doesn't really help with the confusion 
> > > > > > > richi
> > > > > > > mentions.  And many uses of auto_vec will pass them as vec, which 
> > > > > > > will
> > > > > > > still do a shallow copy.  I think it's probably better 

Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-07 Thread Trevor Saunders
On Mon, Jun 07, 2021 at 04:17:09PM -0600, Martin Sebor wrote:
> On 6/3/21 2:29 AM, Trevor Saunders wrote:
> > On Wed, Jun 02, 2021 at 10:04:03AM -0600, Martin Sebor via Gcc-patches 
> > wrote:
> > > On 6/2/21 12:55 AM, Richard Biener wrote:
> > > > On Tue, Jun 1, 2021 at 9:56 PM Martin Sebor  wrote:
> > > > > 
> > > > > On 5/27/21 2:53 PM, Jason Merrill wrote:
> > > > > > On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:
> > > > > > > On 4/27/21 8:04 AM, Richard Biener wrote:
> > > > > > > > On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor  
> > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > On 4/27/21 1:58 AM, Richard Biener wrote:
> > > > > > > > > > On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
> > > > > > > > > >  wrote:
> > > > > > > > > > > 
> > > > > > > > > > > PR 90904 notes that auto_vec is unsafe to copy and assign 
> > > > > > > > > > > because
> > > > > > > > > > > the class manages its own memory but doesn't define (or 
> > > > > > > > > > > delete)
> > > > > > > > > > > either special function.  Since I first ran into the 
> > > > > > > > > > > problem,
> > > > > > > > > > > auto_vec has grown a move ctor and move assignment from
> > > > > > > > > > > a dynamically-allocated vec but still no copy ctor or copy
> > > > > > > > > > > assignment operator.
> > > > > > > > > > > 
> > > > > > > > > > > The attached patch adds the two special functions to 
> > > > > > > > > > > auto_vec along
> > > > > > > > > > > with a few simple tests.  It makes auto_vec safe to use 
> > > > > > > > > > > in containers
> > > > > > > > > > > that expect copyable and assignable element types and 
> > > > > > > > > > > passes
> > > > > > > > > > > bootstrap
> > > > > > > > > > > and regression testing on x86_64-linux.
> > > > > > > > > > 
> > > > > > > > > > The question is whether we want such uses to appear since 
> > > > > > > > > > those
> > > > > > > > > > can be quite inefficient?  Thus the option is to delete 
> > > > > > > > > > those
> > > > > > > > > > operators?
> > > > > > > > > 
> > > > > > > > > I would strongly prefer the generic vector class to have the 
> > > > > > > > > properties
> > > > > > > > > expected of any other generic container: copyable and 
> > > > > > > > > assignable.  If
> > > > > > > > > we also want another vector type with this restriction I 
> > > > > > > > > suggest to add
> > > > > > > > > another "noncopyable" type and make that property explicit in 
> > > > > > > > > its name.
> > > > > > > > > I can submit one in a followup patch if you think we need one.
> > > > > > > > 
> > > > > > > > I'm not sure (and not strictly against the copy and assign).  
> > > > > > > > Looking
> > > > > > > > around
> > > > > > > > I see that vec<> does not do deep copying.  Making auto_vec<> 
> > > > > > > > do it
> > > > > > > > might be surprising (I added the move capability to match how 
> > > > > > > > vec<>
> > > > > > > > is used - as "reference" to a vector)
> > > > > > > 
> > > > > > > The vec base classes are special: they have no ctors at all 
> > > > > > > (because
> > > > > > > of their use in unions).  That's something we might have to live 
> > > > > > > with
> > > > > > > but it's not a model to follow in ordinary containers.
> > > > > > 
> > > > > > I don't think we have to live with it anymore, now that we're 
> > > > > > writing
> > > > > > C++11.
> > > > > > 
> > > > > > > The auto_vec class was introduced to fill the need for a 
> > > > > > > conventional
> > > > > > > sequence container with a ctor and dtor.  The missing copy ctor 
> > > > > > > and
> > > > > > > assignment operators were an oversight, not a deliberate feature.
> > > > > > > This change fixes that oversight.
> > 
> > I've been away a while, but trying to get back into this, sorry.  It was
> > definitely an oversight to leave these undefined for the compiler to
> > provide a default definition of, but I agree with Richi, the better
> > thing to have done, or do now would be to mark them as deleted and make
> > auto_vec move only (with copy() for when you really need a deep copy).
> > > > > > > 
> > > > > > > The revised patch also adds a copy ctor/assignment to the auto_vec
> > > > > > > primary template (that's also missing it).  In addition, it adds
> > > > > > > a new class called auto_vec_ncopy that disables copying and
> > > > > > > assignment as you prefer.
> > > > > > 
> > > > > > Hmm, adding another class doesn't really help with the confusion 
> > > > > > richi
> > > > > > mentions.  And many uses of auto_vec will pass them as vec, which 
> > > > > > will
> > > > > > still do a shallow copy.  I think it's probably better to disable 
> > > > > > the
> > > > > > copy special members for auto_vec until we fix vec<>.
> > > > > 
> > > > > There are at least a couple of problems that get in the way of fixing
> > > > > all of vec to act like a well-behaved C++ container:
> > > > > 
> > > > > 1) The embedded vec has a trailing "flexible" array member with its
> > > > > instances 

[PATCH] doc/typo: mthread -> mthreads

2021-06-07 Thread imba-tjd via Gcc-patches
> unrecognized command line option '-mthread'; did you mean '-mthreads'?
---
 gcc/doc/invoke.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 04048cd8332b..92bb1308b805 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -31848,8 +31848,8 @@ startup object and entry point.
 This option is available for Cygwin and MinGW targets.  It
 specifies that the @code{dllimport} attribute should be ignored.
 
-@item -mthread
-@opindex mthread
+@item -mthreads
+@opindex mthreads
 This option is available for MinGW targets. It specifies
 that MinGW-specific thread support is to be used.
 
-- 
2.31.1



Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions

2021-06-07 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2021/6/8 7:50 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Jun 04, 2021 at 10:57:51AM +0800, Kewen.Lin via Gcc-patches wrote:
>> To find out those need fixing seems to be the critical part.  It's
>> not hard to add one explicit "&&" to those that don't have it now, but
>> even with further bootstrapped and regression tested I'm still not
>> confident the adjustments are safe enough, since the testing coverage
>> could be limited.  It may need more efforts to revisit, or/and test
>> with more coverages, and port maintainers' reviews.
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572120.html
> 
> This adds an "&&" everywhere (or in fact, it just skips any existing
> one, it just has the same effect of adding it everywhere).  I tested it
> with building gcc and Linux for all supported targets (31 of them; I do
> some with multiple configs, mostly 32-bit and 64-bit).  None had any
> difference before and after the change.
> 
> So I am no longer worried that there will be any fallout from doing
> this.  There are many things that *could* go wrong, but I don't think
> there will be enough at all to be an impediment to just throwing the
> switch.
> 
> If we go this way no target will need any significant fixing, maybe none
> at all will be needed across all targets.  And no changes will be needed
> anywhere immediately.  We could make leading "&&" deprecated, and the
> same for split condition "1" (which was "&& 1").  This is easy to change
> automatically as well.
> 
Thanks very much for doing this!

I guess we are not going to backport this?  If not, we seem to need some
way to ensure the implied "&&" is written out explicitly when backporting
a define_insn_and_split.
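
To make the backport concern concrete, here is a minimal sketch of how the
effective split condition could be derived.  It is modelled on the comment
in gensupport.c ("If the split condition starts with "&&", append it to the
insn condition") and is an assumption-laden simplification, not the real
implementation: join_conditions, lstrip, effective_split_condition and the
imply_and flag are hypothetical names, and the blank stripping is cosmetic.

```cpp
#include <cassert>
#include <string>

// Join two C conditions: an empty side contributes nothing, otherwise
// both sides are parenthesized and combined with "&&".
inline std::string join_conditions(const std::string &a,
                                   const std::string &b) {
  if (a.empty()) return b;
  if (b.empty()) return a;
  return "(" + a + ") && (" + b + ")";
}

// Drop leading blanks so that "&& reload_completed" joins cleanly.
inline std::string lstrip(const std::string &s) {
  std::string::size_type pos = s.find_first_not_of(" \t");
  return pos == std::string::npos ? std::string() : s.substr(pos);
}

// imply_and == false: a split condition without a leading "&&" is used
//   verbatim (the behaviour the thread describes as current).
// imply_and == true: the insn condition is always prepended (the proposal).
inline std::string effective_split_condition(const std::string &insn_cond,
                                             const std::string &split_cond,
                                             bool imply_and) {
  if (split_cond.compare(0, 2, "&&") == 0)
    return join_conditions(insn_cond, lstrip(split_cond.substr(2)));
  return imply_and ? join_conditions(insn_cond, split_cond) : split_cond;
}

// The same pattern text yields different conditions under the two rules.
inline void check_split_condition() {
  assert(effective_split_condition("TARGET_VSX", "reload_completed", true)
         == "(TARGET_VSX) && (reload_completed)");
  assert(effective_split_condition("TARGET_VSX", "reload_completed", false)
         == "reload_completed");
}
```

A pattern written as "reload_completed" under the implied-"&&" rule would
lose the insn condition if copied unchanged to a branch without that rule,
whereas an explicit "&& reload_completed" means the same thing on both.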

BR,
Kewen


Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions

2021-06-07 Thread Kewen.Lin via Gcc-patches
on 2021/6/7 3:12 PM, Richard Biener wrote:
> On Fri, Jun 4, 2021 at 4:58 AM Kewen.Lin via Gcc-patches
>  wrote:
>>
>> Hi Segher,
>>
>> on 2021/6/3 5:18 PM, Segher Boessenkool wrote:
>>> On Thu, Jun 03, 2021 at 03:00:44AM -0500, Segher Boessenkool wrote:
 On Thu, Jun 03, 2021 at 01:22:38PM +0800, Kewen.Lin wrote:
 The whole point of requiring the split condition to start with && is so
 it will become harder to mess things up (it will make the gen* code a
 tiny little bit simpler as well).  And there is no transition period or
 anything like that needed either.  Just the bunch that will break will
 need fixing.  So let's find out how many of those there are :-)

>>
>> To find out those need fixing seems to be the critical part.  It's
>> not hard to add one explicit "&&" to those that don't have it now, but
>> even with further bootstrapped and regression tested I'm still not
>> confident the adjustments are safe enough, since the testing coverage
>> could be limited.  It may need more efforts to revisit, or/and test
>> with more coverages, and port maintainers' reviews.
>>
>> In order to find one example which needs more fixing, for rs6000/i386/
>> aarch64, I fixed all define_insn_and_splits whose insn cond isn't empty
>> (from gensupport's view since the iterator can add more on) while split
>> cond don't start with "&&" , also skipped those whose insn conds are
>> the same as their split conds.  Unfortunately (or fortunately :-\) all
>> were bootstrapped and regress-tested.
>>
>> The related diffs are attached, which is based on r12-0.
> 
> For preserving existing behavior with requiring "&& " we can (mechanically?)
> split define_insn_and_split into define_insn and define_split with the old
> condition, right?
> 

Yeah, we can replace all possible "unsafe" ones with separated define_insn
and define_split.

> That is, all empty insn condition cases are of course obvious to fix.
> 

Sorry for the confusion, those ones are false positives.  Mode iterators
can append more conditions onto an empty insn condition to make it non-empty;
I noticed the same happens for the split condition.  The previous
check was very rough and did not exclude them, as they are harmless
for the testing.  My latest local patch excludes them by checking the
beginning and the end, and also handles the variant with one more external "()".

BR,
Kewen

>> How many such cases *are* there?  There are no users exposed to this,
>> and when the split condition is required to start with "&&" (instead of
>> getting that implied) it is not a silent change ever, either.
>
> If I read the proposal right, the explicit "&&" is only required when 
> going
> to find all potential problematic places for final implied "&&" change.
> But one explicit "&&" does offer good readability.

 My proposal is to always require && (or maybe identical insn and split
 conditions should be allowed as well, if people still use that -- that
 is how we wrote "&& 1" before that existed).
>>>
>>> I prototyped this.  There are very many errors.  Iterators often modify
>>> the insn condition (for one iteration of it), but that does not work if
>>> the split condition does not start with "&&"!
>>>
>>> See attached prototype.
>>>
>>>
>>
>> Thanks for the prototype!
>>
>> BR,
>> Kewen
>>
>>> Segher
>>>
>>> = = =
>>>
>>> diff --git a/gcc/gensupport.c b/gcc/gensupport.c
>>> index 2cb760ffb90f..05d46fd3775c 100644
>>> --- a/gcc/gensupport.c
>>> +++ b/gcc/gensupport.c
>>> @@ -590,7 +590,6 @@ process_rtx (rtx desc, file_location loc)
>>>  case DEFINE_INSN_AND_SPLIT:
>>>  case DEFINE_INSN_AND_REWRITE:
>>>{
>>> - const char *split_cond;
>>>   rtx split;
>>>   rtvec attr;
>>>   int i;
>>> @@ -611,15 +610,20 @@ process_rtx (rtx desc, file_location loc)
>>>
>>>   /* If the split condition starts with "&&", append it to the
>>>  insn condition to create the new split condition.  */
>>> - split_cond = XSTR (desc, 4);
>>> - if (split_cond[0] == '&' && split_cond[1] == '&')
>>> + const char *insn_cond = XSTR (desc, 2);
>>> + const char *split_cond = XSTR (desc, 4);
>>> + if (!strncmp (split_cond, "&&", 2))
>>> {
>>>   rtx_reader_ptr->copy_md_ptr_loc (split_cond + 2, split_cond);
>>> - split_cond = rtx_reader_ptr->join_c_conditions (XSTR (desc, 2),
>>> + split_cond = rtx_reader_ptr->join_c_conditions (insn_cond,
>>>   split_cond + 2);
>>> +   } else if (insn_cond[0]) {
>>> + if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE)
>>> +   error_at (loc, "the rewrite condition must start with `&&'");
>>> + else
>>> +   error_at (loc, "the split condition must start with `&&' [%s]", 
>>> insn_cond);
>>> }
>>> - else if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE)
>>> -   error_at (loc, "the rewrite condition must start 

[PATCH v2] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-07 Thread Xionghu Luo via Gcc-patches
Update the patch according to the comments.  Thanks.


On P8LE, extra rot64+rot64 load or store instructions are generated
in float128 to vector __int128 conversion.

This patch teaches pass swaps to also handle such patterns to remove
extra swap instructions.

(insn 7 6 8 2 (set (subreg:V1TI (reg:KF 123) 0)
(rotate:V1TI (mem/u/c:V1TI (reg/f:DI 121) [0  S16 A128])
(const_int 64 [0x40]))) {*vsx_le_permute_v1ti})
(insn 8 7 9 2 (set (subreg:V1TI (reg:KF 122) 0)
(rotate:V1TI (subreg:V1TI (reg:KF 123) 0)
(const_int 64 [0x40])))  {*vsx_le_permute_v1ti})
=>
(insn 22 6 23 2 (set (subreg:V1TI (reg:KF 123) 0)
(mem/u/c:V1TI (and:DI (reg/f:DI 121)
  (const_int -16 [0xfff0])) [0  S16 A128])))
(insn 23 22 25 2 (set (subreg:V1TI (reg:KF 122) 0)
(subreg:V1TI (reg:KF 123) 0)))
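
The reason the paired rotates in the dump above are redundant is that
rotating a 128-bit value by 64 bits exchanges its two doublewords, so
doing it twice is the identity.  A minimal sketch, assuming a hypothetical
v128 struct as a model of the register (v128, rot64 and check_rot64 are
illustrative names, not rs6000 code):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 128-bit register as two 64-bit doublewords.
struct v128 {
  std::uint64_t hi;
  std::uint64_t lo;
};

inline bool operator==(const v128 &a, const v128 &b) {
  return a.hi == b.hi && a.lo == b.lo;
}

// Rotating a 128-bit value by 64 bits exchanges its two doublewords,
// which is the same permutation a doubleword swap performs.
inline v128 rot64(v128 v) {
  return v128{v.lo, v.hi};
}

// Two rotates by 64 cancel, so a rot64 load feeding a rot64 copy
// carries the same value as a plain load feeding a plain copy.
inline void check_rot64() {
  v128 x{0x0123456789abcdefULL, 0xfedcba9876543210ULL};
  assert(rot64(x).hi == x.lo && rot64(x).lo == x.hi);
  assert(rot64(rot64(x)) == x);
}
```

This only models the value being moved; the alignment handling visible in
the rewritten load (the and with -16) is outside the sketch.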

gcc/ChangeLog:

* config/rs6000/rs6000-p8swap.c (pattern_is_rotate64_p): New.
(insn_is_load_p): Use pattern_is_rotate64_p.
(insn_is_swap_p): Likewise.
(quad_aligned_load_p): Likewise.
(const_load_sequence_p): Likewise.
(replace_swapped_aligned_load): Likewise.
(recombine_lvx_pattern): Likewise.
(recombine_stvx_pattern): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/float128-call.c: Adjust.
* gcc.target/powerpc/pr100085.c: New test.
---
 gcc/config/rs6000/rs6000-p8swap.c | 37 +++
 .../gcc.target/powerpc/float128-call.c|  4 +-
 gcc/testsuite/gcc.target/powerpc/pr100085.c   | 24 
 3 files changed, 56 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr100085.c

diff --git a/gcc/config/rs6000/rs6000-p8swap.c 
b/gcc/config/rs6000/rs6000-p8swap.c
index ec503ab742f..3b74e05e396 100644
--- a/gcc/config/rs6000/rs6000-p8swap.c
+++ b/gcc/config/rs6000/rs6000-p8swap.c
@@ -250,6 +250,20 @@ union_uses (swap_web_entry *insn_entry, rtx insn, df_ref 
def)
 }
 }
 
+/* Return 1 iff PAT is a rotate 64 bit expression; else return 0.  */
+
+static bool
+pattern_is_rotate64_p (rtx pat)
+{
+  rtx rot = SET_SRC (pat);
+
+  if (GET_CODE (rot) == ROTATE && CONST_INT_P (XEXP (rot, 1))
+  && INTVAL (XEXP (rot, 1)) == 64)
+return true;
+
+  return false;
+}
+
 /* Return 1 iff INSN is a load insn, including permuting loads that
represent an lvxd2x instruction; else return 0.  */
 static unsigned int
@@ -266,6 +280,9 @@ insn_is_load_p (rtx insn)
  && MEM_P (XEXP (SET_SRC (body), 0)))
return 1;
 
+  if (pattern_is_rotate64_p (body) && MEM_P (XEXP (SET_SRC (body), 0)))
+   return 1;
+
   return 0;
 }
 
@@ -305,6 +322,8 @@ insn_is_swap_p (rtx insn)
   if (GET_CODE (body) != SET)
 return 0;
   rtx rhs = SET_SRC (body);
+  if (pattern_is_rotate64_p (body))
+return 1;
   if (GET_CODE (rhs) != VEC_SELECT)
 return 0;
   rtx parallel = XEXP (rhs, 1);
@@ -392,7 +411,8 @@ quad_aligned_load_p (swap_web_entry *insn_entry, rtx_insn 
*insn)
  false.  */
   rtx body = PATTERN (def_insn);
   if (GET_CODE (body) != SET
-  || GET_CODE (SET_SRC (body)) != VEC_SELECT
+  || !(GET_CODE (SET_SRC (body)) == VEC_SELECT
+ || pattern_is_rotate64_p (body))
   || !MEM_P (XEXP (SET_SRC (body), 0)))
 return false;
 
@@ -531,7 +551,8 @@ const_load_sequence_p (swap_web_entry *insn_entry, rtx insn)
 false.  */
   rtx body = PATTERN (def_insn);
   if (GET_CODE (body) != SET
- || GET_CODE (SET_SRC (body)) != VEC_SELECT
+ || !(GET_CODE (SET_SRC (body)) == VEC_SELECT
+  || pattern_is_rotate64_p (body))
  || !MEM_P (XEXP (SET_SRC (body), 0)))
return false;
 
@@ -1730,7 +1751,8 @@ replace_swapped_aligned_load (swap_web_entry *insn_entry, 
rtx swap_insn)
  swap (indicated by code VEC_SELECT).  */
   rtx body = PATTERN (def_insn);
   gcc_assert ((GET_CODE (body) == SET)
- && (GET_CODE (SET_SRC (body)) == VEC_SELECT)
+ && (GET_CODE (SET_SRC (body)) == VEC_SELECT
+ || pattern_is_rotate64_p (body))
  && MEM_P (XEXP (SET_SRC (body), 0)));
 
   rtx src_exp = XEXP (SET_SRC (body), 0);
@@ -2148,7 +2170,8 @@ recombine_lvx_pattern (rtx_insn *insn, del_info 
*to_delete)
 {
   rtx body = PATTERN (insn);
   gcc_assert (GET_CODE (body) == SET
- && GET_CODE (SET_SRC (body)) == VEC_SELECT
+ && (GET_CODE (SET_SRC (body)) == VEC_SELECT
+ || pattern_is_rotate64_p (body))
  && MEM_P (XEXP (SET_SRC (body), 0)));
 
   rtx mem = XEXP (SET_SRC (body), 0);
@@ -2223,9 +2246,9 @@ static void
 recombine_stvx_pattern (rtx_insn *insn, del_info *to_delete)
 {
   rtx body = PATTERN (insn);
-  gcc_assert (GET_CODE (body) == SET
- && MEM_P (SET_DEST (body))
- && GET_CODE (SET_SRC (body)) == VEC_SELECT);
+  gcc_assert (GET_CODE (body) == SET && MEM_P (SET_DEST (body))
+ && (GET_CODE (SET_SRC (body)) == 

[PATCH] libstdc++: Fix Wrong param type in :atomic_ref<_Tp*>::wait [PR100889]

2021-06-07 Thread Thomas Rodgers
This time without the repeated [PR] in the subject line.

Fixes libstdc++/100889

libstdc++-v3/ChangeLog:

* include/bits/atomic_base.h (atomic_ref<_Tp*>::wait):
Change parameter type from _Tp to _Tp*.
* testsuite/29_atomics/atomic_ref/wait_notify.cc: Extend
coverage of types tested.
---
 libstdc++-v3/include/bits/atomic_base.h   |  2 +-
 .../29_atomics/atomic_ref/wait_notify.cc  | 38 ---
 2 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_base.h 
b/libstdc++-v3/include/bits/atomic_base.h
index 029b8ad65a9..20cf1343c58 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -1870,7 +1870,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #if __cpp_lib_atomic_wait
   _GLIBCXX_ALWAYS_INLINE void
-  wait(_Tp __old, memory_order __m = memory_order_seq_cst) const noexcept
+  wait(_Tp* __old, memory_order __m = memory_order_seq_cst) const noexcept
   { __atomic_impl::wait(_M_ptr, __old, __m); }
 
   // TODO add const volatile overload
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc 
b/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
index 2fd31304222..2500dddf884 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_ref/wait_notify.cc
@@ -26,22 +26,34 @@
 
 #include 
 
+template<typename S>
+  void
+  test (S va, S vb)
+  {
+S aa{ va };
+S bb{ vb };
+std::atomic_ref<S> a{ aa };
+a.wait(bb);
+std::thread t([&]
+  {
+a.store(bb);
+a.notify_one();
+  });
+a.wait(aa);
+t.join();
+  }
+
 int
 main ()
 {
+  test(0, 42);
+  test(0, 42);
+  test(0u, 42u);
+  test(0.0f, 42.0f);
+  test(0.0, 42.0);
+  test<void*>(nullptr, reinterpret_cast<void*>(42));
+
   struct S{ int i; };
-  S aa{ 0 };
-  S bb{ 42 };
-
-  std::atomic_ref a{ aa };
-  VERIFY( a.load().i == aa.i );
-  a.wait(bb);
-  std::thread t([&]
-{
-  a.store(bb);
-  a.notify_one();
-});
-  a.wait(aa);
-  t.join();
+  test(S{ 0 }, S{ 42 });
   return 0;
 }
-- 
2.26.2



[PATCH] c++: explicit() ignored on deduction guide [PR100065]

2021-06-07 Thread Marek Polacek via Gcc-patches
When we have explicit() with a value-dependent argument, we can't
evaluate it at parsing time, so cp_parser_function_specifier_opt stashes
the argument into the decl-specifiers and grokdeclarator then stores it
into explicit_specifier_map, which is then used when substituting the
function decl.  grokdeclarator stores it for constructors and conversion
functions, but we also need to do it for deduction guides, otherwise
we'll forget that we've seen an explicit-specifier as in the attached
test.
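
For readers less familiar with explicit(bool), a small sketch of a value-dependent explicit-specifier follows. It uses a constructor rather than a deduction guide because that is easy to check with type traits; wrapper and implicit_from_int are hypothetical names, and the sketch assumes a C++20 compiler:

```cpp
#include <type_traits>

// The explicit-specifier depends on the template parameter B, so it can
// only be evaluated once wrapper<B> is instantiated, not at parse time.
template<bool B>
struct wrapper
{
  explicit(B) wrapper(int) {}
};

// True if an int converts implicitly to wrapper<B>.
template<bool B>
constexpr bool implicit_from_int()
{
  return std::is_convertible_v<int, wrapper<B>>;
}
```

Here wrapper<false> accepts copy-initialization from an int and wrapper<true> does not, which is the same distinction the deduction guide in the test has to remember.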

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/branches?

PR c++/100065

gcc/cp/ChangeLog:

* decl.c (grokdeclarator): Store a value-dependent
explicit-specifier even for deduction guides.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/explicit18.C: New test.
---
 gcc/cp/decl.c   |  2 ++
 gcc/testsuite/g++.dg/cpp2a/explicit18.C | 23 +++
 2 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/explicit18.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index a3687dbb0dd..cbf647dd569 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -14043,6 +14043,8 @@ grokdeclarator (const cp_declarator *declarator,
storage_class = sc_none;
  }
  }
+   if (declspecs->explicit_specifier)
+ store_explicit_specifier (decl, declspecs->explicit_specifier);
   }
 else
   {
diff --git a/gcc/testsuite/g++.dg/cpp2a/explicit18.C 
b/gcc/testsuite/g++.dg/cpp2a/explicit18.C
new file mode 100644
index 000..c8916fa4743
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/explicit18.C
@@ -0,0 +1,23 @@
+// PR c++/100065
+// { dg-do compile { target c++20 } }
+
+template<bool B>
+struct bool_constant {
+  static constexpr bool value = B;
+  constexpr operator bool() const { return value; }
+};
+
+using true_type = bool_constant<true>;
+using false_type = bool_constant<false>;
+
+template<bool>
+struct X {
+template<typename T>
+X(T);
+};
+
+template<bool b>
+explicit(b) X(bool_constant<b>) -> X<b>;
+
+X false_ = false_type{}; // OK
+X true_  = true_type{};  // { dg-error "explicit deduction guide" }

base-commit: e89759fdfc80db223bd852aba937acb2d7c2cd80
-- 
2.31.1



Re: [RFC/PATCH 00/11] Fix up some unexpected empty split conditions

2021-06-07 Thread Segher Boessenkool
Hi!

On Fri, Jun 04, 2021 at 10:57:51AM +0800, Kewen.Lin via Gcc-patches wrote:
> To find out those need fixing seems to be the critical part.  It's
> not hard to add one explicit "&&" to those that don't have it now, but
> even with further bootstrapped and regression tested I'm still not
> confident the adjustments are safe enough, since the testing coverage
> could be limited.  It may need more efforts to revisit, or/and test
> with more coverages, and port maintainers' reviews.

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572120.html

This adds an "&&" everywhere (in fact it just skips any existing one,
which has the same effect as adding it everywhere).  I tested it by
building gcc and Linux for all supported targets (31 of them; I do
some with multiple configs, mostly 32-bit and 64-bit).  None showed any
difference before and after the change.

So I am no longer worried that there will be any fallout from doing
this.  There are many things that *could* go wrong, but I don't think
there will be enough at all to be an impediment to just throwing the
switch.

If we go this way no target will need any significant fixing, maybe none
at all will be needed across all targets.  And no changes will be needed
anywhere immediately.  We could make leading "&&" deprecated, and the
same for split condition "1" (which was "&& 1").  This is easy to change
automatically as well.


Segher


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-07 Thread Kees Cook via Gcc-patches
On Mon, Jun 07, 2021 at 04:18:46PM +, Qing Zhao wrote:
> Hi, 
> 
> > On Jun 7, 2021, at 2:53 AM, Richard Biener  wrote:
> > 
> >> 
> >> To address the above suggestion:
> >> 
> >> My study shows: the call to __builtin_clear_padding is expanded during 
> >> gimplification phase.
> >> And there is no __builtin_clear_padding expanding during rtx expanding 
> >> phase.
> >> However, for -ftrivial-auto-var-init, padding initialization should be 
> >> done both in gimplification phase and rtx expanding phase.
> >> since the __builtin_clear_padding might not be good for rtx expanding, 
> >> reusing __builtin_clear_padding might not work.
> >> 
> >> Let me know if you have any more comments on this.
> > 
> > Yes, I didn't suggest to literally emit calls to __builtin_clear_padding 
> > but instead to leverage the lowering code, more specifically share the
> > code that figures _what_ is to be initialized (where the padding is)
> > and eventually the actual code generation pieces.  That might need some
> > refactoring but the code where padding resides should be present only
> > a single time (since it's quite complex).
> 
> Okay, I see your point here.
> 
> > 
> > Which is also why I suggested to split out the padding initialization
> > bits to a separate patch (and option).
> 
> Personally, I am okay with splitting padding initialization from this current 
> patch,
> Kees, what’s your opinion on this? i.e, the current -ftrivial-auto-var-init 
> will NOT initialize padding, we will add another option to 
> Explicitly initialize padding.

If two new options are needed, that's fine. But "-ftrivial-auto-var-init"
needs to do both (a variable is _not_ fully initialized if its padding isn't
included). Anything else would be a behavioral mismatch between Clang and GCC. :)

-- 
Kees Cook


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-07 Thread Kees Cook via Gcc-patches
On Mon, Jun 07, 2021 at 09:48:41AM +0200, Richard Biener wrote:
> On Thu, 27 May 2021, Qing Zhao wrote:
> > @@ -5001,6 +5185,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq
> > *pre_p, gimple_seq *post_p,
> >  /* If a single access to the target must be ensured and all
> > elements
> > are zero, then it's optimal to clear whatever their number.
> > */
> >  cleared = true;
> > +   else if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
> > +&& !TREE_STATIC (object)
> > +&& type_has_padding (type))
> > + /* If the user requests to initialize automatic variables with
> > +paddings inside the type, we should initialize the paddings
> > too.
> > +C guarantees that brace-init with fewer initializers than
> > members
> > +aggregate will initialize the rest of the aggregate as-if it
> > were
> > +static initialization.  In turn static initialization
> > guarantees
> > +that pad is initialized to zero bits.
> > +So, it's better to clear the whole record under such
> > situation.  */
> > + cleared = true;
> > 
> > so here we have padding as well - I think this warrants to be controlled
> > by an extra option?  And we can maybe split this out to a separate
> > patch? (the whole padding stuff)
> > 
> > Clang does the padding initialization with this option, shall we be 
> > consistent with Clang?
> 
> Just for the sake of consistency?  No.  Is there a technical reason
> for this complication?  Say we have
> 
>   struct { short s; int i; } a;
> 
> what's the technical reason to initialize the padding?  I might
> be tempted to use -ftrivial-auto-var-init but I definitely don't
> want to spend cycles/instructions initializing the padding in the
> above struct.

Yes, this is very important. This is one of the more common ways memory
content leaks happen in programs (especially the kernel). e.g.:

struct example {
short s;
int i;
};

struct example instance = { .i = foo };

While "s" gets zeroed, the padding may not, and may contain prior memory
contents. Having this be deterministically zero is important for this
feature. If the structure gets byte-copied to a buffer (e.g. syscall,
etc), the padding will go along for the ride.
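
A minimal sketch of the concern, under stated assumptions: it clears the object by hand with memset, which is only a stand-in for what the option is meant to guarantee, and padding_bytes, init_zeroed and bytes_before_i_are_zero are hypothetical helper names. The exact amount of padding is ABI-dependent:

```cpp
#include <cstddef>
#include <cstring>

struct example {
  short s;
  int i;
};

// Bytes of the object representation not covered by any member.
// The value is ABI-dependent (commonly 2 with a 2-byte short and a
// 4-byte aligned int).
constexpr std::size_t padding_bytes()
{
  return sizeof(example) - (sizeof(short) + sizeof(int));
}

// Zero the whole object representation, padding included, then set
// the one field the caller cares about.
inline void init_zeroed(example *e, int foo)
{
  std::memset(e, 0, sizeof *e);
  e->i = foo;
}

// Inspect the raw bytes that precede member i: that is member s plus
// any padding between s and i.  These are the bytes that travel along
// when the struct is byte-copied to a buffer.
inline bool bytes_before_i_are_zero(const example *e)
{
  unsigned char raw[sizeof(example)];
  std::memcpy(raw, e, sizeof raw);
  for (std::size_t k = 0; k < offsetof(example, i); ++k)
    if (raw[k] != 0)
      return false;
  return true;
}
```

Without the memset step, a plain member-wise initialization makes no promise about the bytes between s and i.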

-- 
Kees Cook


Re: [PATCH, rs6000] Update Power10 scheduling description for fused instruction types

2021-06-07 Thread Segher Boessenkool
On Mon, Jun 07, 2021 at 03:41:29PM -0500, Pat Haugen wrote:
> Update Power10 scheduling description for new fused instruction types.

Okay for trunk.  Thanks!


Segher


Re: [PATCH 2/2] Add IEEE 128-bit fp conditional move on PowerPC.

2021-06-07 Thread Segher Boessenkool
On Tue, May 18, 2021 at 04:28:27PM -0400, Michael Meissner wrote:
> In this patch, I simplified things compared to previous patches.  Instead of
> allowing any four of the modes to be used for the conditional move comparison
> and the move itself could use different modes, I restricted the conditional
> move to just the same mode.  I.e. you can do:
> 
> _Float128 a, b, c, d, e, r;
> 
> r = (a == b) ? c : d;
> 
> But you can't do:
> 
> _Float128 c, d, r;
> double a, b;
> 
> r = (a == b) ? c : d;
> 
> or:
> 
> _Float128 a, b;
> double c, d, r;
> 
> r = (a == b) ? c : d;
> 
> This eliminates a lot of the complexity of the code, because you don't have to
> worry about the sizes being different, and the IEEE 128-bit types being
> restricted to Altivec registers, while the SF/DF modes can use any VSX
> register.

You do not have to worry about that anyway.  You can just reuse the
existing rs6000_maybe_emit_fp_cmove.  Or why not?  The IF_THEN_ELSE we
generate there should work fine?

> +(define_expand "mov<mode>cc"
> +   [(set (match_operand:IEEE128 0 "gpc_reg_operand")
> +  (if_then_else:IEEE128 (match_operand 1 "comparison_operator")
> +(match_operand:IEEE128 2 "gpc_reg_operand")
> +(match_operand:IEEE128 3 "gpc_reg_operand")))]
> +  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
> +{
> +  if (rs6000_emit_cmove (operands[0], operands[1], operands[2], operands[3]))
> +DONE;
> +  else
> +FAIL;
> +})

Why is this a special pattern anyway?  Why can you not do
  d = cond ? x : y;
with cond any comparison, not even including any floating point
possibly?


Segher


Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-07 Thread Martin Sebor via Gcc-patches

On 6/3/21 2:29 AM, Trevor Saunders wrote:

On Wed, Jun 02, 2021 at 10:04:03AM -0600, Martin Sebor via Gcc-patches wrote:

On 6/2/21 12:55 AM, Richard Biener wrote:

On Tue, Jun 1, 2021 at 9:56 PM Martin Sebor  wrote:


On 5/27/21 2:53 PM, Jason Merrill wrote:

On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:

On 4/27/21 8:04 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor  wrote:


On 4/27/21 1:58 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
 wrote:


PR 90904 notes that auto_vec is unsafe to copy and assign because
the class manages its own memory but doesn't define (or delete)
either special function.  Since I first ran into the problem,
auto_vec has grown a move ctor and move assignment from
a dynamically-allocated vec but still no copy ctor or copy
assignment operator.

The attached patch adds the two special functions to auto_vec along
with a few simple tests.  It makes auto_vec safe to use in containers
that expect copyable and assignable element types and passes
bootstrap
and regression testing on x86_64-linux.


The question is whether we want such uses to appear since those
can be quite inefficient?  Thus the option is to delete those
operators?


I would strongly prefer the generic vector class to have the properties
expected of any other generic container: copyable and assignable.  If
we also want another vector type with this restriction I suggest to add
another "noncopyable" type and make that property explicit in its name.
I can submit one in a followup patch if you think we need one.


I'm not sure (and not strictly against the copy and assign).  Looking
around
I see that vec<> does not do deep copying.  Making auto_vec<> do it
might be surprising (I added the move capability to match how vec<>
is used - as "reference" to a vector)


The vec base classes are special: they have no ctors at all (because
of their use in unions).  That's something we might have to live with
but it's not a model to follow in ordinary containers.


I don't think we have to live with it anymore, now that we're writing
C++11.


The auto_vec class was introduced to fill the need for a conventional
sequence container with a ctor and dtor.  The missing copy ctor and
assignment operators were an oversight, not a deliberate feature.
This change fixes that oversight.


I've been away a while, but trying to get back into this, sorry.  It was
definitely an oversight to leave these undefined for the compiler to
provide a default definition of, but I agree with Richi, the better
thing to have done, or do now would be to mark them as deleted and make
auto_vec move only (with copy() for when you really need a deep copy).
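
The move-only alternative with an explicit copy() could look roughly like the sketch below. This is not GCC's auto_vec: owned_seq and its members are hypothetical names, and std::vector stands in for the real storage.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical sketch, not GCC's auto_vec: an owning sequence that is
// move-only, with an explicit copy() for the rare deep copy.
template<typename T>
class owned_seq
{
  std::vector<T> items_;  // stand-in for the real storage

public:
  owned_seq() = default;

  // Copying is disabled so an expensive deep copy cannot happen silently.
  owned_seq(const owned_seq &) = delete;
  owned_seq &operator=(const owned_seq &) = delete;

  // Moving transfers ownership of the storage.
  owned_seq(owned_seq &&) = default;
  owned_seq &operator=(owned_seq &&) = default;

  void push(const T &value) { items_.push_back(value); }
  std::size_t length() const { return items_.size(); }
  const T &operator[](std::size_t i) const { return items_[i]; }

  // Deep copy, spelled out at the call site.
  owned_seq copy() const
  {
    owned_seq result;
    result.items_ = items_;
    return result;
  }
};
```

The design choice is that every deep copy is visible in the source as a call to copy(), while passing and returning by value stays cheap through moves.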


The revised patch also adds a copy ctor/assignment to the auto_vec
primary template (that's also missing it).  In addition, it adds
a new class called auto_vec_ncopy that disables copying and
assignment as you prefer.


Hmm, adding another class doesn't really help with the confusion richi
mentions.  And many uses of auto_vec will pass them as vec, which will
still do a shallow copy.  I think it's probably better to disable the
copy special members for auto_vec until we fix vec<>.


There are at least a couple of problems that get in the way of fixing
all of vec to act like a well-behaved C++ container:

1) The embedded vec has a trailing "flexible" array member with its
instances having different size.  They're initialized by memset and
copied by memcpy.  The class can't have copy ctors or assignments
but it should disable/delete them instead.

2) The heap-based vec is used throughout GCC with the assumption of
shallow copy semantics (not just as function arguments but also as
members of other such POD classes).  This can be changed by providing
copy and move ctors and assignment operators for it, and also for
some of the classes in which it's a member and that are used with
the same assumption.

3) The heap-based vec::block_remove() assumes its elements are PODs.
That breaks in VEC_ORDERED_REMOVE_IF (used in gcc/dwarf2cfi.c:2862
and tree-vect-patterns.c).

I took a stab at both and while (1) is easy, (2) is shaping up to
be a big and tricky project.  Tricky because it involves using
std::move in places where what's moved is subsequently still used.
I can keep plugging away at it but it won't change the fact that
the embedded and heap-based vecs have different requirements.


So you figured that neither vec<> nor auto_vec<> are a container like
std::vector.


That's obvious from glancing at their definitions.  I didn't go
through the exercise to figure that out.



I'm not sure it makes sense to try to make it so since obviously vec<>
was designed to match the actual needs of GCC.  auto_vec<> was added
to make a RAII (like auto_bitmap, etc.) wrapper, plus it got the ability
to provide initial stack storage.


The goal was to see if the two vec instances could be made safer
to use but taking advantage of C++ 11 features.  As I 

[PATCH] rtl: Join the insn and split conditions in define_insn_and_split

2021-06-07 Thread Segher Boessenkool
In theory we could have a split condition not inclusive of the insn
condition in the past.  That never was a good idea, the code does not do
what a non-suspicious reader would think it does.  But it leads to more
serious problems together with iterators: if the split condition (as
written) does not start with "&&", you do not get the insn condition
included in the split condition, and that holds for the part of the insn
condition that was generated by the iterator as well!

This patch simply always joins the two conditions (after the iterators
have done their work) to get the effective split condition.
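
The effect can be sketched outside of gensupport as follows. This is an illustration only: join_conditions is a hypothetical stand-in for rtx_reader::join_c_conditions, effective_split_condition is a made-up name, and trimming the blanks after "&&" is a cosmetic simplification that the patch itself does not do.

```cpp
#include <string>

// Hypothetical stand-in for join_c_conditions: the conjunction of two
// C conditions, where an empty condition means "always true".
std::string join_conditions(const std::string &a, const std::string &b)
{
  if (a.empty())
    return b;
  if (b.empty())
    return a;
  return "(" + a + ") && (" + b + ")";
}

// Compute the split condition that is actually used: skip a leading
// "&&" if present, then always join with the insn condition.
std::string effective_split_condition(const std::string &insn_cond,
                                      std::string split_cond)
{
  if (split_cond.compare(0, 2, "&&") == 0)
    split_cond.erase(0, 2);
  // Cosmetic only: drop blanks left over after the "&&".
  split_cond.erase(0, split_cond.find_first_not_of(" \t"));
  return join_conditions(insn_cond, split_cond);
}
```

With this scheme "&& reload_completed" and "reload_completed" give the same effective condition, so the part of the insn condition contributed by an iterator can no longer be dropped by accident.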

I tested this on all Linux targets, building the Linux kernel for each,
and it does not change generated code for any of them, so I think we do
not have much breakage to fear.  But it is possible for other targets of
course, and for floating point or vector code, etc.

Is this okay for trunk?


Segher


2021-06-07  Segher Boessenkool  

* gensupport.c (process_rtx) [DEFINE_INSN_AND_SPLIT]: Always include
the insn condition in the split condition.

---
 gcc/gensupport.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/gcc/gensupport.c b/gcc/gensupport.c
index 2cb760ffb90f..8a6345d36470 100644
--- a/gcc/gensupport.c
+++ b/gcc/gensupport.c
@@ -590,7 +590,6 @@ process_rtx (rtx desc, file_location loc)
 case DEFINE_INSN_AND_SPLIT:
 case DEFINE_INSN_AND_REWRITE:
   {
-   const char *split_cond;
rtx split;
rtvec attr;
int i;
@@ -609,17 +608,25 @@ process_rtx (rtx desc, file_location loc)
remove_constraints (XVECEXP (split, 0, i));
  }
 
-   /* If the split condition starts with "&&", append it to the
-  insn condition to create the new split condition.  */
-   split_cond = XSTR (desc, 4);
-   if (split_cond[0] == '&' && split_cond[1] == '&')
+   const char *insn_cond = XSTR (desc, 2);
+   const char *split_cond = XSTR (desc, 4);
+   if (strncmp (split_cond, "&&", 2)
+   && GET_CODE (desc) == DEFINE_INSN_AND_REWRITE)
+ error_at (loc, "the rewrite condition must start with `&&'");
+
+   /* If the split condition starts with "&&", skip that.  */
+   if (!strncmp (split_cond, "&&", 2))
  {
rtx_reader_ptr->copy_md_ptr_loc (split_cond + 2, split_cond);
-   split_cond = rtx_reader_ptr->join_c_conditions (XSTR (desc, 2),
-   split_cond + 2);
+   split_cond += 2;
  }
-   else if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE)
- error_at (loc, "the rewrite condition must start with `&&'");
+
+   /* Always use the conjunction of the given split condition and the
+  insn condition (which includes stuff from iterators, it is not just
+  what is given in the pattern in the machine description) as the
+  split condition to use.  */
+   split_cond = rtx_reader_ptr->join_c_conditions (insn_cond, split_cond);
+
XSTR (split, 1) = split_cond;
if (GET_CODE (desc) == DEFINE_INSN_AND_REWRITE)
  XVEC (split, 2) = gen_rewrite_sequence (XVEC (desc, 1));
-- 
1.8.3.1



Re: [PATCH 1/3]: C N2653 char8_t: Language support

2021-06-07 Thread Joseph Myers
Also, it seems odd to add a new field to cpp_options without any code in 
libcpp that uses the value of that field.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] PR tree-optimization/100299 - Implement a sparse bitmap representation for Rangers on-entry cache.

2021-06-07 Thread Andrew MacLeod via Gcc-patches
This pair of patches provides a sparse representation of the on-entry 
cache for ranger.  When the number of basic blocks exceeds 
--param=evrp-sparse-threshold= (default 800), ranger moves to a 
sparse representation rather than allocating a full vector for each ssa-name.


This is based on an extension of the sparse bitmap code which enables us 
to use multiple bits instead of single bits. We'll cache up to 15 unique 
ranges for any ssa_name in this implementation.
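
As a rough sketch of the idea, and not the bitmap.c code, multi-bit aligned chunks can be stored in sparse words like this. sparse_chunks is a hypothetical class, std::unordered_map stands in for the sparse bitmap elements, and a 64-bit word is assumed:

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical sketch: a sparse array of small fixed-width values.
// chunk_size must be a power of 2 smaller than 64, so a chunk never
// straddles two words.  Words that were never written read as 0.
class sparse_chunks
{
  static constexpr unsigned word_bits = 64;
  std::unordered_map<unsigned, std::uint64_t> words_;
  unsigned chunk_size_;

public:
  explicit sparse_chunks(unsigned chunk_size) : chunk_size_(chunk_size) {}

  // Store VALUE in chunk number CHUNK, replacing what was there.
  void set(unsigned chunk, std::uint64_t value)
  {
    const std::uint64_t max_value = (std::uint64_t{1} << chunk_size_) - 1;
    const unsigned bit = chunk * chunk_size_;
    const unsigned shift = bit % word_bits;
    std::uint64_t &word = words_[bit / word_bits];
    word = (word & ~(max_value << shift)) | ((value & max_value) << shift);
  }

  // Return the value stored in chunk number CHUNK, or 0 if unset.
  std::uint64_t get(unsigned chunk) const
  {
    const std::uint64_t max_value = (std::uint64_t{1} << chunk_size_) - 1;
    const unsigned bit = chunk * chunk_size_;
    const unsigned shift = bit % word_bits;
    const auto it = words_.find(bit / word_bits);
    if (it == words_.end())
      return 0;
    return (it->second >> shift) & max_value;
  }
};
```

With a chunk size of 4, each slot holds a value from 0 to 15; reserving 0 for "nothing cached" leaves 15 usable values, which matches the limit of 15 unique ranges mentioned above.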


Bootstrapped on x86_64-pc-linux-gnu with no new regressions. Pushed.

I will also port this to gcc11 in a couple of days assuming no 
unforeseen issues show up.


Andrew

commit 508e291d0aa1b2ec6d2287b324d044ac4ddf82e6
Author: Andrew MacLeod 
Date:   Mon Jun 7 13:12:01 2021 -0400

Implement multi-bit aligned accessors for sparse bitmap.

Provide set/get routines to allow sparse bitmaps to be treated as an array
of multiple bit values. Only chunk sizes that are powers of 2 are supported.

* bitmap.c (bitmap_set_aligned_chunk): New.
(bitmap_get_aligned_chunk): New.
(test_aligned_chunk): New.
(bitmap_c_tests): Call test_aligned_chunk.
* bitmap.h (bitmap_set_aligned_chunk, bitmap_get_aligned_chunk): New.

diff --git a/gcc/bitmap.c b/gcc/bitmap.c
index 5a650cdfc1d..b915fdfbb54 100644
--- a/gcc/bitmap.c
+++ b/gcc/bitmap.c
@@ -1004,6 +1004,83 @@ bitmap_bit_p (const_bitmap head, int bit)
   return (ptr->bits[word_num] >> bit_num) & 1;
 }
 
+/* Set CHUNK_SIZE bits at a time in bitmap HEAD.
+   Store CHUNK_VALUE starting at bits CHUNK * chunk_size.
+   This is the set routine for viewing bitmap as a multi-bit sparse array.  */
+
+void
+bitmap_set_aligned_chunk (bitmap head, unsigned int chunk,
+			  unsigned int chunk_size, BITMAP_WORD chunk_value)
+{
+  // Ensure chunk size is a power of 2 and fits in BITMAP_WORD.
+  gcc_checking_assert (pow2p_hwi (chunk_size));
+  gcc_checking_assert (chunk_size < (sizeof (BITMAP_WORD) * CHAR_BIT));
+
+  // Ensure chunk_value is within range of chunk_size bits.
+  BITMAP_WORD max_value = (1 << chunk_size) - 1;
+  gcc_checking_assert (chunk_value <= max_value);
+
+  unsigned bit = chunk * chunk_size;
+  unsigned indx = bit / BITMAP_ELEMENT_ALL_BITS;
+  bitmap_element *ptr;
+  if (!head->tree_form)
+ptr = bitmap_list_find_element (head, indx);
+  else
+ptr = bitmap_tree_find_element (head, indx);
+  unsigned word_num = bit / BITMAP_WORD_BITS % BITMAP_ELEMENT_WORDS;
+  unsigned bit_num  = bit % BITMAP_WORD_BITS;
+  BITMAP_WORD bit_val = chunk_value << bit_num;
+  BITMAP_WORD mask = ~(max_value << bit_num);
+
+  if (ptr != 0)
+{
+  ptr->bits[word_num] &= mask;
+  ptr->bits[word_num] |= bit_val;
+  return;
+}
+
+  ptr = bitmap_element_allocate (head);
+  ptr->indx = bit / BITMAP_ELEMENT_ALL_BITS;
+  ptr->bits[word_num] = bit_val;
+  if (!head->tree_form)
+bitmap_list_link_element (head, ptr);
+  else
+bitmap_tree_link_element (head, ptr);
+}
+
+/* This is the get routine for viewing bitmap as a multi-bit sparse array.
+   Return a set of CHUNK_SIZE consecutive bits from HEAD, starting at bit
+   CHUNK * chunk_size.   */
+
+BITMAP_WORD
+bitmap_get_aligned_chunk (const_bitmap head, unsigned int chunk,
+			  unsigned int chunk_size)
+{
+  // Ensure chunk size is a power of 2, fits in BITMAP_WORD and is in range.
+  gcc_checking_assert (pow2p_hwi (chunk_size));
+  gcc_checking_assert (chunk_size < (sizeof (BITMAP_WORD) * CHAR_BIT));
+
+  BITMAP_WORD max_value = (1 << chunk_size) - 1;
+  unsigned bit = chunk * chunk_size;
+  unsigned int indx = bit / BITMAP_ELEMENT_ALL_BITS;
+  const bitmap_element *ptr;
+  unsigned bit_num;
+  unsigned word_num;
+
+  if (!head->tree_form)
+ptr = bitmap_list_find_element (const_cast<bitmap> (head), indx);
+  else
+ptr = bitmap_tree_find_element (const_cast<bitmap> (head), indx);
+  if (ptr == 0)
+return 0;
+
+  bit_num = bit % BITMAP_WORD_BITS;
+  word_num = bit / BITMAP_WORD_BITS % BITMAP_ELEMENT_WORDS;
+
+  // Return 4 bits.
+  return (ptr->bits[word_num] >> bit_num) & max_value;
+}
+
 #if GCC_VERSION < 3400
 /* Table of number of set bits in a character, indexed by value of char.  */
 static const unsigned char popcount_table[] =
@@ -2857,6 +2934,33 @@ test_bitmap_single_bit_set_p ()
   ASSERT_EQ (1066, bitmap_first_set_bit (b));
 }
 
+/* Verify accessing aligned bit chunks works as expected.  */
+
+static void
+test_aligned_chunk (unsigned num_bits)
+{
+  bitmap b = bitmap_gc_alloc ();
+  int limit = 2 ^ num_bits;
+
+  int index = 3;
+  for (int x = 0; x < limit; x++)
+{
+  bitmap_set_aligned_chunk (b, index, num_bits, (BITMAP_WORD) x);
+  ASSERT_TRUE ((int) bitmap_get_aligned_chunk (b, index, num_bits) == x);
+  ASSERT_TRUE ((int) bitmap_get_aligned_chunk (b, index + 1,
+		   num_bits) == 0);
+  ASSERT_TRUE ((int) bitmap_get_aligned_chunk (b, index - 1,
+		   num_bits) == 0);
+  index += 3;
+}
+  index = 3;
+  for (int x = 0; x < limit ; x++)

Re: [PATCH 2/2] Add IEEE 128-bit fp conditional move on PowerPC.

2021-06-07 Thread Segher Boessenkool
On Thu, May 20, 2021 at 02:27:06PM -0500, will schmidt wrote:
> On Tue, 2021-05-18 at 16:28 -0400, Michael Meissner wrote:
> > +  if (compare_mode == result_mode
> > +  || (compare_mode == SFmode && result_mode == DFmode)
> > +  || (compare_mode == DFmode && result_mode == SFmode))
> > +;
> > +  else
> > +return false;
> 
> Interesting if/else block.  May want to reverse the logic. I defer if
> this way is notably simpler than inverting it.

This is not simpler, no.  You want to do something that just returns
*first*, and then not have an "else".  *That* is simpler.

And just write !(...) around the condition, don't try to manually invert
it please.  You want both correct code and readable code, not neither of
these, they are not extremes you need to balance, each helps the other!
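
A sketch of the suggested shape, with the condition kept as written and wrapped in !(...), and the bail-out coming first. The enum values and the function name are hypothetical stand-ins, not the rs6000 code:

```cpp
// Hypothetical stand-ins for the machine modes involved.
enum mode_sketch { SFmode_sketch, DFmode_sketch, KFmode_sketch };

// Return early when the modes do not match up; no empty statement and
// no "else" are needed.
bool modes_ok_for_cmove(mode_sketch compare_mode, mode_sketch result_mode)
{
  if (!(compare_mode == result_mode
        || (compare_mode == SFmode_sketch && result_mode == DFmode_sketch)
        || (compare_mode == DFmode_sketch && result_mode == SFmode_sketch)))
    return false;

  // The rest of the real function would continue here.
  return true;
}
```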


Segher


Re: RFC: Sphinx for GCC documentation

2021-06-07 Thread Bernhard Reutner-Fischer via Gcc-patches
On Mon, 7 Jun 2021 15:30:22 +0200
Martin Liška  wrote:

> Anyway, this is resolved as I use more appropriate directive:
> https://splichal.eu/scripts/sphinx/gfortran/_build/html/intrinsic-procedures/access-checks-file-access-modes.html

ISTM there's a typo s/Tailing/Trailing/ in gcc/fortran/intrinsic.texi

git grep -wi Tailing
seems to highlight a couple more.
Maybe you have time to fix these?

PS: The occurrence in gcc/testsuite/gcc.dg/format/strfmon-1.c sounds
odd.
TIA,


Re: [PATCH 1/3]: C N2653 char8_t: Language support

2021-06-07 Thread Joseph Myers
On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:

> When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE macro
> is predefined.  This is the mechanism proposed to glibc to opt-in to
> declarations of the char8_t typedef and c8rtomb and mbrtoc8 functions proposed
> in N2653.  See [2].

I don't think glibc should have such a feature test macro, and I don't 
think GCC should define such feature test macros either - _*_SOURCE macros 
are generally for the *user* to define to decide what namespace they want 
visible, not for the compiler to define.  Without proliferating new 
language dialects, __STDC_VERSION__ ought to be sufficient to communicate 
from the compiler to the library (including to GCC's own headers such as 
stdatomic.h).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 0/3]: C N2653 char8_t implementation

2021-06-07 Thread Joseph Myers
On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:

> These changes do not impact default gcc behavior.  The existing -fchar8_t
> option is extended to C compilation to enable the N2653 changes, and
> -fno-char8_t is extended to explicitly disable them.  N2653 has not yet been
> accepted by WG14, so no changes are made to handling of the C2X language
> dialect.

Why is that option needed?  Normally I'd expect features to be enabled or 
disabled based on the selected language version, rather than having 
separate options to adjust the configuration for one very specific feature 
in a language version.  Adding extra language dialects not corresponding 
to any standard version but to some peculiar mix of versions (such as C17 
with a changed type for u8"", or C2X with a changed type for u8'') needs a 
strong reason for those language dialects to be useful (for example, the 
-fgnu89-inline option was justified by widespread use of GNU-style extern 
inline in headers).

I think the whole patch series would best wait until after the proposal 
has been considered by a WG14 meeting, in addition to not increasing the 
number of language dialects supported.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] i386: Add init pattern for V4QI vectors [PR100637]

2021-06-07 Thread Uros Bizjak via Gcc-patches
2021-06-07  Uroš Bizjak  

gcc/
PR target/100637
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
Handle V4QI mode.
(ix86_expand_vector_init_one_nonzero): Ditto.
(ix86_expand_vector_init_one_var): Ditto.
(ix86_expand_vector_init_general): Ditto.
* config/i386/mmx.md (vec_initv4qiqi): New expander.

gcc/testsuite/

PR target/100637
* gcc.target/i386/pr100637-5b.c: New test.
* gcc.target/i386/pr100637-5w.c: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index fb0676f1158..c3ce21b4387 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -13733,6 +13733,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, 
machine_mode mode,
   return false;
 
 case E_V8QImode:
+case E_V4QImode:
   if (!mmx_ok)
return false;
   goto widen;
@@ -13878,6 +13879,9 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, 
machine_mode mode,
 case E_V4HImode:
   use_vector_set = TARGET_SSE || TARGET_3DNOW_A;
   break;
+case E_V4QImode:
+  use_vector_set = TARGET_SSE4_1;
+  break;
 case E_V32QImode:
 case E_V16HImode:
   use_vector_set = TARGET_AVX;
@@ -14086,6 +14090,10 @@ ix86_expand_vector_init_one_var (bool mmx_ok, 
machine_mode mode,
break;
   wmode = V4HImode;
   goto widen;
+case E_V4QImode:
+  if (TARGET_SSE4_1)
+   break;
+  wmode = V2HImode;
 widen:
   /* There's no way to set one QImode entry easily.  Combine
 the variable value with its adjacent constant value, and
@@ -14535,6 +14543,7 @@ quarter:
 case E_V8QImode:
 
 case E_V2HImode:
+case E_V4QImode:
   break;
 
 default:
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index c3fd2805f25..0a17a54fad5 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -3369,7 +3369,17 @@
(match_operand 1)]
   "TARGET_SSE2"
 {
-  ix86_expand_vector_init (false, operands[0],
+  ix86_expand_vector_init (TARGET_MMX_WITH_SSE, operands[0],
+  operands[1]);
+  DONE;
+})
+
+(define_expand "vec_initv4qiqi"
+  [(match_operand:V2HI 0 "register_operand")
+   (match_operand 1)]
+  "TARGET_SSE2"
+{
+  ix86_expand_vector_init (TARGET_MMX_WITH_SSE, operands[0],
   operands[1]);
   DONE;
 })
diff --git a/gcc/testsuite/gcc.target/i386/pr100637-5b.c b/gcc/testsuite/gcc.target/i386/pr100637-5b.c
new file mode 100644
index 000..3e6cc8ff975
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100637-5b.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+typedef char S;
+typedef S V __attribute__((vector_size(4 * sizeof(S))));
+
+V duplicate (S a)
+{
+  return (V) { a, a, a, a };
+}
+
+V one_nonzero (S a)
+{
+  return (V) { 0, a };
+}
+
+V one_var (S a)
+{
+  return (V) { 1, a };
+}
+
+V general (S a, S b, S c, S d)
+{
+  return (V) { a, b, c, d };
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr100637-5w.c b/gcc/testsuite/gcc.target/i386/pr100637-5w.c
new file mode 100644
index 000..3f677385c16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100637-5w.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+typedef short S;
+typedef S V __attribute__((vector_size(2 * sizeof(S))));
+
+V duplicate (S a)
+{
+  return (V) { a, a };
+}
+
+V one_nonzero (S a)
+{
+  return (V) { 0, a };
+}
+
+V one_var (S a)
+{
+  return (V) { 1, a };
+}
+
+V general (S a, S b)
+{
+  return (V) { a, b };
+}


[PATCH, rs6000] Update Power10 scheduling description for fused instruction types

2021-06-07 Thread Pat Haugen via Gcc-patches
Update Power10 scheduling description for new fused instruction types.

Bootstrap/regtest on powerpc64le(Power10) with no new regressions. Ok for
trunk?

-Pat


2021-06-07  Pat Haugen  

gcc/ChangeLog:

* config/rs6000/power10.md (power10-fused-load, power10-fused-store,
power10-fused_alu, power10-fused-vec, power10-fused-branch): New.



diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md
index 665f0f22c62..0186ae95896 100644
--- a/gcc/config/rs6000/power10.md
+++ b/gcc/config/rs6000/power10.md
@@ -100,6 +100,11 @@ (define_insn_reservation "power10-load" 4
(eq_attr "cpu" "power10"))
   "DU_any_power10,LU_power10")
 
+(define_insn_reservation "power10-fused-load" 4
+  (and (eq_attr "type" "fused_load_cmpi,fused_addis_load,fused_load_load")
+   (eq_attr "cpu" "power10"))
+  "DU_even_power10,LU_power10")
+
 (define_insn_reservation "power10-prefixed-load" 4
   (and (eq_attr "type" "load")
(eq_attr "update" "no")
@@ -176,6 +181,11 @@ (define_insn_reservation "power10-store" 0
(eq_attr "cpu" "power10"))
   "DU_any_power10,STU_power10")
 
+(define_insn_reservation "power10-fused-store" 0
+  (and (eq_attr "type" "fused_store_store")
+   (eq_attr "cpu" "power10"))
+  "DU_even_power10,STU_power10")
+
 (define_insn_reservation "power10-prefixed-store" 0
   (and (eq_attr "type" "store,fpstore,vecstore")
(eq_attr "prefixed" "yes")
@@ -244,6 +254,11 @@ (define_insn_reservation "power10-alu" 2
 (define_bypass 4 "power10-alu"
 "power10-crlogical,power10-mfcr,power10-mfcrf")
 
+(define_insn_reservation "power10-fused_alu" 2
+  (and (eq_attr "type" "fused_arith_logical,fused_cmp_isel,fused_carry")
+   (eq_attr "cpu" "power10"))
+  "DU_even_power10,EXU_power10")
+
 ; paddi
 (define_insn_reservation "power10-paddi" 2
   (and (eq_attr "type" "add")
@@ -403,6 +418,11 @@ (define_insn_reservation "power10-vec-2cyc" 2
(eq_attr "cpu" "power10"))
   "DU_any_power10,EXU_power10")
 
+(define_insn_reservation "power10-fused-vec" 2
+  (and (eq_attr "type" "fused_vector")
+   (eq_attr "cpu" "power10"))
+  "DU_even_power10,EXU_power10")
+
 (define_insn_reservation "power10-veccmp" 3
   (and (eq_attr "type" "veccmp")
(eq_attr "cpu" "power10"))
@@ -490,6 +510,11 @@ (define_insn_reservation "power10-branch" 2
(eq_attr "cpu" "power10"))
   "DU_any_power10,STU_power10")
 
+(define_insn_reservation "power10-fused-branch" 3
+  (and (eq_attr "type" "fused_mtbc")
+   (eq_attr "cpu" "power10"))
+  "DU_even_power10,STU_power10")
+
 
 ; Crypto
 (define_insn_reservation "power10-crypto" 4



Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-07 Thread Martin Sebor via Gcc-patches

On 6/7/21 2:51 AM, Richard Biener wrote:

On Thu, Jun 3, 2021 at 10:29 AM Trevor Saunders  wrote:


On Wed, Jun 02, 2021 at 10:04:03AM -0600, Martin Sebor via Gcc-patches wrote:

On 6/2/21 12:55 AM, Richard Biener wrote:

On Tue, Jun 1, 2021 at 9:56 PM Martin Sebor  wrote:


On 5/27/21 2:53 PM, Jason Merrill wrote:

On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:

On 4/27/21 8:04 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor  wrote:


On 4/27/21 1:58 AM, Richard Biener wrote:

On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
 wrote:


PR 90904 notes that auto_vec is unsafe to copy and assign because
the class manages its own memory but doesn't define (or delete)
either special function.  Since I first ran into the problem,
auto_vec has grown a move ctor and move assignment from
a dynamically-allocated vec but still no copy ctor or copy
assignment operator.

The attached patch adds the two special functions to auto_vec along
with a few simple tests.  It makes auto_vec safe to use in containers
that expect copyable and assignable element types and passes bootstrap
and regression testing on x86_64-linux.


The question is whether we want such uses to appear since those
can be quite inefficient?  Thus the option is to delete those
operators?


I would strongly prefer the generic vector class to have the properties
expected of any other generic container: copyable and assignable.  If
we also want another vector type with this restriction I suggest adding
another "noncopyable" type and make that property explicit in its name.
I can submit one in a followup patch if you think we need one.


I'm not sure (and not strictly against the copy and assign).  Looking
around I see that vec<> does not do deep copying.  Making auto_vec<> do it
might be surprising (I added the move capability to match how vec<>
is used - as "reference" to a vector)


The vec base classes are special: they have no ctors at all (because
of their use in unions).  That's something we might have to live with
but it's not a model to follow in ordinary containers.


I don't think we have to live with it anymore, now that we're writing
C++11.


The auto_vec class was introduced to fill the need for a conventional
sequence container with a ctor and dtor.  The missing copy ctor and
assignment operators were an oversight, not a deliberate feature.
This change fixes that oversight.


I've been away a while, but trying to get back into this, sorry.  It was
definitely an oversight to leave these undefined for the compiler to
provide a default definition of, but I agree with Richi, the better
thing to have done, or do now, would be to mark them as deleted and make
auto_vec move only (with copy() for when you really need a deep copy).
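
To make the move-only alternative concrete, here is a minimal sketch,
assuming a simplified container backed by std::vector.  The class name
move_only_vec and all of its members are hypothetical and only loosely
modelled on auto_vec; this is not the code in GCC's vec.h.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a move-only auto_vec; not GCC's vec.h.
template <typename T>
class move_only_vec
{
public:
  move_only_vec () = default;

  // Copying is deleted so an accidental deep copy fails to compile.
  move_only_vec (const move_only_vec &) = delete;
  move_only_vec &operator= (const move_only_vec &) = delete;

  // Moves transfer ownership of the underlying storage.
  move_only_vec (move_only_vec &&) = default;
  move_only_vec &operator= (move_only_vec &&) = default;

  void safe_push (const T &x) { m_elts.push_back (x); }

  std::size_t length () const { return m_elts.size (); }

  const T &operator[] (std::size_t i) const
  {
    assert (i < m_elts.size ());
    return m_elts[i];
  }

  // The only way to get a deep copy is to ask for one explicitly.
  move_only_vec copy () const
  {
    move_only_vec r;
    r.m_elts = m_elts;
    return r;
  }

private:
  std::vector<T> m_elts;
};

// The type is movable but cannot be copy-constructed.
static_assert (!std::is_copy_constructible<move_only_vec<int>>::value,
               "copy construction must be rejected");
static_assert (std::is_move_constructible<move_only_vec<int>>::value,
               "move construction must be allowed");
```

With this shape, "move_only_vec<int> b = a;" is a compile-time error, while
"a.copy ()" and "std::move (a)" both state the intent at the call site.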


The revised patch also adds a copy ctor/assignment to the auto_vec
primary template (that's also missing it).  In addition, it adds
a new class called auto_vec_ncopy that disables copying and
assignment as you prefer.


Hmm, adding another class doesn't really help with the confusion richi
mentions.  And many uses of auto_vec will pass them as vec, which will
still do a shallow copy.  I think it's probably better to disable the
copy special members for auto_vec until we fix vec<>.


There are at least a couple of problems that get in the way of fixing
all of vec to act like a well-behaved C++ container:

1) The embedded vec has a trailing "flexible" array member with its
instances having different size.  They're initialized by memset and
copied by memcpy.  The class can't have copy ctors or assignments
but it should disable/delete them instead.

2) The heap-based vec is used throughout GCC with the assumption of
shallow copy semantics (not just as function arguments but also as
members of other such POD classes).  This can be changed by providing
copy and move ctors and assignment operators for it, and also for
some of the classes in which it's a member and that are used with
the same assumption.

3) The heap-based vec::block_remove() assumes its elements are PODs.
That breaks in VEC_ORDERED_REMOVE_IF (used in gcc/dwarf2cfi.c:2862
and tree-vect-patterns.c).

I took a stab at both and while (1) is easy, (2) is shaping up to
be a big and tricky project.  Tricky because it involves using
std::move in places where what's moved is subsequently still used.
I can keep plugging away at it but it won't change the fact that
the embedded and heap-based vecs have different requirements.


So you figured that neither vec<> nor auto_vec<> is a container like
std::vector.


That's obvious from glancing at their definitions.  I didn't go
through the exercise to figure that out.



I'm not sure it makes sense to try to make it so since obviously vec<>
was designed to match the actual needs of GCC.  auto_vec<> was added
to make a RAII (like auto_bitmap, etc.) wrapper, plus it got the ability
to provide initial stack storage.


The goal was to see if the two vec instances could be made safer
to 

[PATCH] middle-end/100951 - make sure to generate VECTOR_CST in lowering

2021-06-07 Thread H.J. Lu via Gcc-patches
On Mon, Jun 7, 2021 at 11:10 AM Richard Biener
 wrote:
>
> On Mon, Jun 7, 2021 at 7:59 PM Richard Biener
>  wrote:
> >
> > On Mon, Jun 7, 2021 at 4:19 PM H.J. Lu  wrote:
> > >
> > > On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
> > >  wrote:
> > > >
> > > > "H.J. Lu"  writes:
> > > > > Update vec_duplicate to allow to fail so that backend can only allow
> > > > > > broadcasting an integer constant to a vector when broadcast
> > > > > > instruction is available.
> > > >
> > > > I'm not sure why we need this to fail though.  Once the optab is defined
> > > > for target X, the optab should handle all duplicates for target X,
> > > > even if there are different strategies it can use.
> > > >
> > > > AIUI the case you want to make conditional is the constant case.
> > > > I guess the first question is: why don't we simplify those CONSTRUCTORs
> > > > to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
> > > > as a constructor here.
> > >
> > > The particular testcase for vec_duplicate is gcc.dg/pr100239.c.
> > >
> > > > If we can't rely on that happening, then would it work to change:
> > > >
> > > > /* Try using vec_duplicate_optab for uniform vectors.  */
> > > > if (!TREE_SIDE_EFFECTS (exp)
> > > > && VECTOR_MODE_P (mode)
> > > > && eltmode == GET_MODE_INNER (mode)
> > > > && ((icode = optab_handler (vec_duplicate_optab, mode))
> > > > != CODE_FOR_nothing)
> > > > && (elt = uniform_vector_p (exp)))
> > > >
> > > > to something like:
> > > >
> > > > /* Try using vec_duplicate_optab for uniform vectors.  */
> > > > if (!TREE_SIDE_EFFECTS (exp)
> > > > && VECTOR_MODE_P (mode)
> > > > && eltmode == GET_MODE_INNER (mode)
> > > > && (elt = uniform_vector_p (exp)))
> > > >   {
> > > > if (TREE_CODE (elt) == INTEGER_CST
> > > > || TREE_CODE (elt) == POLY_INT_CST
> > > > || TREE_CODE (elt) == REAL_CST
> > > > || TREE_CODE (elt) == FIXED_CST)
> > > >   {
> > > > > rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
> > > > emit_move_insn (target, src);
> > > > break;
> > > >   }
> > > > …
> > > >   }
> > >
> > > I will give it a try.
> >
> > I can confirm that veclower leaves us with an unfolded constant CTOR.
> > If you file a PR to remind me I'll fix that.
>
> The attached untested patch fixes this for the testcase.
>

Here is the patch + the testcase.

-- 
H.J.
From aac56894719b59e552b493c970946225ed8c27f6 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Mon, 7 Jun 2021 20:08:13 +0200
Subject: [PATCH] middle-end/100951 - make sure to generate VECTOR_CST in
 lowering

When vector lowering creates piecewise ops make sure to create
VECTOR_CSTs instead of CONSTRUCTORs when possible.

gcc/

2021-06-07  Richard Biener  

	PR middle-end/100951
	* tree-vect-generic.c (): Build a VECTOR_CST if all
	elements are constant.

gcc/testsuite/

2021-06-07  H.J. Lu  

	PR middle-end/100951
	* gcc.target/i386/pr100951.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr100951.c | 15 +++
 gcc/tree-vect-generic.c  | 34 +---
 2 files changed, 45 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100951.c

diff --git a/gcc/testsuite/gcc.target/i386/pr100951.c b/gcc/testsuite/gcc.target/i386/pr100951.c
new file mode 100644
index 000..16d8bafa663
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100951.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -march=x86-64" } */
+
+typedef short __attribute__((__vector_size__ (8 * sizeof (short)))) V;
+V v, w;
+
+void
+foo (void)
+{
+  w = __builtin_shuffle (v != v, 0 < (V) {}, (V) {192} >> 5);
+}
+
+/* { dg-final { scan-assembler-not "punpcklwd" } } */
+/* { dg-final { scan-assembler-not "pshufd" } } */
+/* { dg-final { scan-assembler-times "pxor\[\\t \]%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index d9c0ac9de7e..5f3f9fa005e 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -328,16 +328,22 @@ expand_vector_piecewise (gimple_stmt_iterator *gsi, elem_op_func f,
   if (!ret_type)
 ret_type = type;
   vec_alloc (v, (nunits + delta - 1) / delta);
+  bool constant_p = true;
   for (i = 0; i < nunits;
i += delta, index = int_const_binop (PLUS_EXPR, index, part_width))
 {
   tree result = f (gsi, inner_type, a, b, index, part_width, code,
 		   ret_type);
+  if (!CONSTANT_CLASS_P (result))
+	constant_p = false;
   constructor_elt ce = {NULL_TREE, result};
   v->quick_push (ce);
 }
 
-  return build_constructor (ret_type, v);
+  if (constant_p)
+return build_vector_from_ctor (ret_type, v);
+  else
+return build_constructor (ret_type, v);
 }
 
 

Re: [PATCH 1/2] Add IEEE 128-bit min/max support on PowerPC.

2021-06-07 Thread Segher Boessenkool
On Tue, May 18, 2021 at 04:26:06PM -0400, Michael Meissner wrote:
> This patch adds the support for the IEEE 128-bit floating point C minimum and
> maximum instructions.

> gcc/
> 2021-05-18  Michael Meissner  
> 
>   * config/rs6000/rs6000.c (rs6000_emit_minmax): Add support for ISA
>   3.1   IEEE   128-bit   floating  point   xsmaxcqp   and   xsmincqp
>   instructions.

3.1 fits on the previous line (it is better to not split numbers to a
new line).  What is up with the weird multiple spaces?  We don't align
the right border in changelogs :-)

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c
> @@ -0,0 +1,15 @@
> +/* { dg-require-effective-target ppc_float128_hw } */
> +/* { dg-require-effective-target power10_ok } */

Is this needed?  And, why is ppc_float128_hw needed?  That combination
does not seem to make sense.

> --- a/gcc/testsuite/gcc.target/powerpc/float128-minmax.c
> +++ b/gcc/testsuite/gcc.target/powerpc/float128-minmax.c
> @@ -3,6 +3,13 @@
>  /* { dg-require-effective-target float128 } */
>  /* { dg-options "-mpower9-vector -O2 -ffast-math" } */
>  
> +/* If the compiler was configured to automatically generate power10 support with
> +   --with-cpu=power10, turn it off.  Otherwise, it will generate XXMAXCQP and
> +   XXMINCQP instructions.  */
> +#ifdef _ARCH_PWR10
> +#pragma GCC target ("cpu=power9")
> +#endif

Yeah, don't.  Add a dg-skip-if if that is what you want.  That
-mpower9-vector shouldn't be there either.


Segher


Re: [PATCH 1/2] Add IEEE 128-bit min/max support on PowerPC.

2021-06-07 Thread Segher Boessenkool
Hi!

On Thu, May 20, 2021 at 09:38:49PM -0400, Michael Meissner wrote:
> Basically for code generation tests, I see the following cases:
> 
> 1) Test code targeting precisely power8 (or power9, power10), etc.  Hopefully
> these are rare.

-mdejagnu-cpu= works perfectly for this.  You may need a *_ok or a *_hw
as well (and/or other selectors).

> 2) Test code targeting at least power8.  But as these tests show, a lot
> of the code won't generate the appropriate instructions on power10.  This is
> what we have now.  It relies on undocumented switches like -mpower9-vector to
> add the necessary support.

You should simply not run this test on too new systems.  You can use
dg-skip-if or similar.

> 3) Test code targeting at least power8 but go to power9 at the maximum.

But why?  We will keep testing all interesting CPU / OS combos as long
as they are interesting.


Segher


Re: [PATCH] Implement a context aware points-to analyzer for use in evrp.

2021-06-07 Thread Andrew MacLeod via Gcc-patches

On 6/7/21 9:30 AM, Richard Biener via Gcc-patches wrote:

On Mon, Jun 7, 2021 at 12:10 PM Aldy Hernandez via Gcc-patches
 wrote:

The substitute_and_fold_engine which evrp uses is expecting symbolics
from value_of_expr / value_on_edge / etc, which ranger does not provide.
In some cases, these provide important folding cues, as in the case of
aliases for pointers.  For example, legacy evrp may return [, ]
for the value of "bar" where bar is on an edge where bar == , or
when bar has been globally set to   This information is then used
by the subst & fold engine to propagate the known value of bar.

Currently this is a major source of discrepancies between evrp and
ranger.  Of the 284 cases legacy evrp is getting over ranger, 237 are
for pointer equality as discussed above.

This patch implements a context aware points-to class which
ranger-evrp can use to query what a pointer is currently pointing to.
With it, we reduce the 284 cases legacy evrp is getting to 47.

The API for the points-to analyzer is the following:

class points_to_analyzer
{
public:
   points_to_analyzer (gimple_ranger *r);
   ~points_to_analyzer ();
   void enter (basic_block);
   void leave (basic_block);
   void visit_stmt (gimple *stmt);
   tree get_points_to (tree name) const;
...
};

The enter(), leave(), and visit_stmt() methods are meant to be called
from a DOM walk.   At any point throughout the walk, one can call
get_points_to() to get whatever an SSA is pointing to.
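
As a rough illustration of how enter()/leave() can scope what is known
during a dominator walk, here is a minimal sketch of a scoped table with an
undo stack.  It is not the patch's implementation: the class name
scoped_equiv_table, the use of strings in place of trees, and the set()/get()
members are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch; strings stand in for SSA names and pointed-to trees.
class scoped_equiv_table
{
public:
  // Entering a block opens a scope: remember how long the undo log is.
  void enter () { m_marks.push_back (m_undo.size ()); }

  // Leaving a block rolls back every change made since the matching enter ().
  void leave ()
  {
    assert (!m_marks.empty ());
    const std::size_t mark = m_marks.back ();
    m_marks.pop_back ();
    while (m_undo.size () > mark)
      {
        const undo_entry &u = m_undo.back ();
        if (u.had_old)
          m_map[u.name] = u.old_value;
        else
          m_map.erase (u.name);
        m_undo.pop_back ();
      }
  }

  // Record that NAME currently points to TARGET, logging the old state.
  void set (const std::string &name, const std::string &target)
  {
    auto it = m_map.find (name);
    if (it != m_map.end ())
      {
        m_undo.push_back ({name, true, it->second});
        it->second = target;
      }
    else
      {
        m_undo.push_back ({name, false, std::string ()});
        m_map.emplace (name, target);
      }
  }

  // Return what NAME points to, or an empty string if nothing is known.
  std::string get (const std::string &name) const
  {
    auto it = m_map.find (name);
    if (it == m_map.end ())
      return std::string ();
    return it->second;
  }

private:
  struct undo_entry
  {
    std::string name;
    bool had_old;
    std::string old_value;
  };

  std::unordered_map<std::string, std::string> m_map;
  std::vector<undo_entry> m_undo;
  std::vector<std::size_t> m_marks;
};
```

Facts recorded inside a block are visible to the blocks it dominates and
disappear once the walk leaves it, which is the context sensitivity the
description above relies on.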

If this class is useful to others, we could place it in a more generic
location.

Tested on x86-64 Linux with a regular bootstrap/tests and by comparing
EVRP folds over ranger before and after this patch.

Hmm, but why call it "points-to" - when I look at the implementation
it's really about equivalences.  Thus,

  if (var1_2 == var2_3)

could be handled the same way.  Also "points-to" implies (to me)
that [1] and [2] point to the same object but your points-to
is clearly tracking equivalences only.

So maybe at least rename it to pointer_equiv_analyzer?  ISTR
propagating random (symbolic) equivalences has issues.


Yeah, pointer_equiv is probably more accurate. This is purely for cases 
where we know a pointer points to something that isn't an ssa_name.  
Eventually this is likely to be subsumed into a pointer_range object, 
but unlikely in this release.


I don't think this is actually doing the propagation though... It tracks 
that a_2 currently points to  and returns that to either 
simplifier or folder thru value_of_expr().  Presumably it is up to them 
to determine whether the tree expression passed back is safe to 
propagate.   Is there any attempt in EVRP to NOT set the range of 
something to [, ] under some conditions?   This is what the 
change amounts to.  Ranger would just return a range of [1, +INF], and 
value_of_expr  would therefore return NULL.  This allows value_of to 
return  in these conditions.   Aldy, did you see any other checks in 
the vr-values code?


Things like   if (var1_2 == var2_3) deal with just ssa-names and will be 
handled by an ssa_name relation oracle. It just treats equivalencies 
like a slightly special kind of relation. I'm just about to bring that 
forward this week.


Andrew




[PATCH] x86: Don't compile pr82735-[345].c for x32

2021-06-07 Thread H.J. Lu via Gcc-patches
On Thu, Jun 3, 2021 at 11:31 PM Hongtao Liu via Gcc-patches
 wrote:
>
> On Fri, Jun 4, 2021 at 2:27 PM Uros Bizjak via Gcc-patches
>  wrote:
> >
> > On Thu, Jun 3, 2021 at 8:54 AM liuhongt  wrote:
> > >
> > > When __builtin_ia32_vzeroupper is called explicitly, the corresponding
> > > vzeroupper pattern does not carry any CLOBBERS or SETs before LRA,
> > > which leads to incorrect optimization in pass_reload. In order to
> > > > solve this problem, this patch refines instructions as call_insns in
> > > which the call has a special vzeroupper ABI.
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/82735
> > > * config/i386/i386-expand.c (ix86_expand_builtin): Remove
> > > assignment of cfun->machine->has_explicit_vzeroupper.
> > > * config/i386/i386-features.c
> > > (ix86_add_reg_usage_to_vzerouppers): Delete.
> > > (ix86_add_reg_usage_to_vzeroupper): Ditto.
> > > (rest_of_handle_insert_vzeroupper): Remove
> > > ix86_add_reg_usage_to_vzerouppers, add df_analyze at the end
> > > of the function.
> > > (gate): Remove cfun->machine->has_explicit_vzeroupper.
> > > * config/i386/i386-protos.h (ix86_expand_avx_vzeroupper):
> > > Declared.
> > > * config/i386/i386.c (ix86_insn_callee_abi): New function.
> > > (ix86_initialize_callee_abi): Ditto.
> > > (ix86_expand_avx_vzeroupper): Ditto.
> > > (ix86_hard_regno_call_part_clobbered): Adjust for vzeroupper
> > > ABI.
> > > (TARGET_INSN_CALLEE_ABI): Define as ix86_insn_callee_abi.
> > > (ix86_emit_mode_set): Call ix86_expand_avx_vzeroupper
> > > directly.
> > > * config/i386/i386.h (struct GTY(()) machine_function): Delete
> > > has_explicit_vzeroupper.
> > > * config/i386/i386.md (enum unspec): New member
> > > UNSPEC_CALLEE_ABI.
> > > (I386_DEFAULT,I386_VZEROUPPER,I386_UNKNOWN): New
> > > define_constants for insn callee abi index.
> > > * config/i386/predicates.md (vzeroupper_pattern): Adjust.
> > > * config/i386/sse.md (UNSPECV_VZEROUPPER): Deleted.
> > > (avx_vzeroupper): Call ix86_expand_avx_vzeroupper.
> > > (*avx_vzeroupper): Rename to ..
> > > (avx_vzeroupper_callee_abi): .. this, and adjust pattern as
> > > call_insn which has a special vzeroupper ABI.
> > > (*avx_vzeroupper_1): Deleted.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR target/82735
> > > * gcc.target/i386/pr82735-1.c: New test.
> > > * gcc.target/i386/pr82735-2.c: New test.
> > > * gcc.target/i386/pr82735-3.c: New test.
> > > * gcc.target/i386/pr82735-4.c: New test.
> > > * gcc.target/i386/pr82735-5.c: New test.
> >
> > LGTM, with a small nit below.
> >
> > Thanks,
> > Uros.
> >
> > > ---
> > >  gcc/config/i386/i386-expand.c |  4 -
> > >  gcc/config/i386/i386-features.c   | 99 +++
> > >  gcc/config/i386/i386-protos.h |  1 +
> > >  gcc/config/i386/i386.c| 55 -
> > >  gcc/config/i386/i386.h|  4 -
> > >  gcc/config/i386/i386.md   | 10 +++
> > >  gcc/config/i386/predicates.md |  5 +-
> > >  gcc/config/i386/sse.md| 59 --
> > >  gcc/testsuite/gcc.target/i386/pr82735-1.c | 29 +++
> > >  gcc/testsuite/gcc.target/i386/pr82735-2.c | 22 +
> > >  gcc/testsuite/gcc.target/i386/pr82735-3.c |  5 ++
> > >  gcc/testsuite/gcc.target/i386/pr82735-4.c | 48 +++
> > >  gcc/testsuite/gcc.target/i386/pr82735-5.c | 54 +
> > >  13 files changed, 252 insertions(+), 143 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-3.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-4.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr82735-5.c
> > >
> > > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> > > index 9f3d41955a2..d25d59aa4e7 100644
> > > --- a/gcc/config/i386/i386-expand.c
> > > +++ b/gcc/config/i386/i386-expand.c
> > > @@ -13282,10 +13282,6 @@ rdseed_step:
> > >
> > >return 0;
> > >
> > > -case IX86_BUILTIN_VZEROUPPER:
> > > -  cfun->machine->has_explicit_vzeroupper = true;
> > > -  break;
> > > -
> > >  default:
> > >break;
> > >  }
> > > > diff --git a/gcc/config/i386/i386-features.c b/gcc/config/i386/i386-features.c
> > > index 77783a154b6..a25769ae478 100644
> > > --- a/gcc/config/i386/i386-features.c
> > > +++ b/gcc/config/i386/i386-features.c
> > > @@ -1768,92 +1768,22 @@ convert_scalars_to_vector (bool timode_p)
> > >return 0;
> > >  }
> > >
> > > -/* Modify the vzeroupper pattern in INSN so that it describes the effect
> > > -   that the instruction has on the 

Re: [PATCH ] RS6000 Add 128-bit Binary Integer sign extend operations

2021-06-07 Thread Segher Boessenkool
Hi Carl,

On Wed, Apr 28, 2021 at 10:39:14AM -0700, Carl Love wrote:
> The agreement for the sign extension builtin was to just make it Endian
> aware rather than go with a more complex definition.  The prior patch
> has been updated with this new functionality.
> 
> This patch adds support for the 128-bit extension instruction and
> corresponding builtin support for the various sign extensions.
> 
> This was originally part of the Add 128-bit Integer operations patch
> series.  The patch logically goes with the earlier 5 patch series.

But it has nothing to do with patch 5/5, so it confused me that you
posted it as a reply to that.

> +(define_expand "vsignextend_v2di_v1ti"
> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
> + (unspec:V1TI [(match_operand:V2DI 1 "vsx_register_operand" "v")]
> +  UNSPEC_VSX_SIGN_EXTEND))]
> +  "TARGET_POWER10"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +{
> +   rtx tmp = gen_reg_rtx (V2DImode);
> +
> +   emit_insn (gen_altivec_vrevev2di2(tmp, operands[1]));
> +   emit_insn (gen_vsx_sign_extend_v2di_v1ti(operands[0], tmp));
> + }
> +  else
> +emit_insn (gen_vsx_sign_extend_v2di_v1ti(operands[0], operands[1]));
> +  DONE;
> +})

The indentation is broken -- everything should go by two columns.

In cases like this where the pattern in the expand agrees with a
define_insn you have, you can write just

{
  if (BYTES_BIG_ENDIAN)
{
  ...
  DONE;
}
}

This is the normal way of writing it.  When a define_expand does not
call DONE or FAIL, the pattern is inserted in the insn stream.


Okay for trunk with the indents fixed, and maybe the expand thing.  Also
okay for GCC 11 if it survived testing on all targets and OSes.  That
goes for all patches in this series btw.  Thanks!


Segher


Re: [PATCH] Add --enable-default-semantic-interposition to GCC configure

2021-06-07 Thread Fangrui Song



On 2021-06-07, Jakub Jelinek wrote:

On Mon, Jun 07, 2021 at 12:01:55PM -0600, Jeff Law via Gcc-patches wrote:

> This breaks assumptions across the board.  If software packages want
> to use -fno-semantic-interposition that is one thing.  But distros
> should not be changing the default.  This is just like using
> -ffast-math :).
Some distros already force immediate binding at link time for security
purposes on a distro-wide basis which, IIUC, does the same thing, but
without the benefits from a code generation standpoint.


If you are talking about -Wl,-z,now, that is very different, semantic
interposition then still works just fine.
If you are talking about the glibc style by hand "protected" visibility,
bind calls to symbols defined in the same library through internal symbols,
then that is done only for a couple of packages and is stronger than
-fno-semantic-interposition.

Jakub



-fno-semantic-interposition can save a PLT entry (and associated
R_*_JUMP_SLOT dynamic relocation) if a default visibility STB_GLOBAL
function is only called in its defining TU, not by other TUs linked into
the shared object.

This is a subset of the PLT-suppressing optimization available if a
distribution defaults to ld -Bsymbolic-non-weak-functions
(https://maskray.me/blog/2021-05-16-elf-interposition-and-bsymbolic#the-last-alliance-of-elf-and-men)
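
A small sketch of the case described above; the function names are made up
for illustration.  When a TU like this is compiled with -fPIC for a shared
object, the default rules make GCC assume that helper may be interposed, so
the call from api typically goes through the PLT and is not inlined.  With
-fno-semantic-interposition the compiler is free to bind or inline the call
locally.

```cpp
// Default visibility, external linkage: a candidate for interposition
// when this TU ends up in a shared object.
int helper (int x)
{
  return x + 1;
}

// helper is called only from its defining TU.  Under the default
// -fsemantic-interposition with -fPIC, this call has to stay
// interposable; with -fno-semantic-interposition it can be resolved
// within the TU.
int api (int x)
{
  return helper (x) * 2;
}
```

The observable behaviour of the program is the same either way unless some
other component actually interposes helper at run time.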



Binding definitions in the same component can make software more secure.

https://twitter.com/CarlosODonell/status/1400879768028028935
"Disable PRELOAD/AUDIT, which is what I'm going to pursue e.g.  system-wide glibc 
hardening tunable."
If such a thing is deployed, why can't a passionate distribution default to
gcc -fno-semantic-interposition and ld -Bsymbolic-non-weak-functions, which
can bring back the lost performance (15+% for my clang; 27% for cpython; ...)?



Last, the "assumption" is just GCC's mapping from source code to the ELF binary
format.
https://maskray.me/blog/2021-05-09-fno-semantic-interposition#source-level-implication
We could also argue that the C++ ODR doesn't like us doing semantic
interposition.
(I know it's vague:
https://stackoverflow.com/questions/5563/odr-violation-when-linking-static-and-dynamic-library
 )


Aligning stack offsets for spills

2021-06-07 Thread Jeff Law



So, as many of you know I left Red Hat a while ago and joined Tachyum.  
We're building a new processor and we've come across an issue where I 
think we need upstream discussion.


I can't divulge many of the details right now, but one of the quirks of 
our architecture is that reg+d addressing modes for our vector 
loads/stores require the displacement to be aligned.  This is an 
artifact of how these instructions are encoded.


Obviously we can emit a load of the address into a register when the 
displacement isn't aligned.  From a correctness point that works 
perfectly.  Unfortunately, it's a significant performance hit on some 
standard benchmarks (spec) where we have a great number of spills of 
vector objects into the stack at unaligned offsets in the hot parts of 
the code.



We've considered 3 possible approaches to solve this problem.

1. When the displacement isn't properly aligned, allocate more space in 
assign_stack_local so that we can make the offset aligned.  The downside 
is this potentially burns a lot of stack space, but in practice the cost 
was minimal (16 bytes in a 9k frame).  From a performance standpoint this 
works perfectly.


2. Abuse the register elimination code to create a second pointer into 
the stack.  Spills would start as  + offset, then either get 
eliminated to sp+offset' when the offset is aligned or gpr+offset'' when 
the offset isn't properly aligned. We started a bit down this path, but 
with #1 working so well, we didn't get this approach to proof-of-concept.


3. Hack up the post-reload optimizers to fix things up as best as we 
can.  This may still be advantageous, but again with #1 working so well, 
we didn't explore this in any significant way.  We may still look at 
this at some point in other contexts.
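
As a rough illustration of approach 1, here is a minimal sketch of padding a
downward-growing frame so that a slot's distance from the frame base is a
multiple of the required alignment.  It is not GCC's assign_stack_local:
frame_layout, alloc and round_up are hypothetical names, and the sketch
assumes the displacement that matters is the one relative to the frame base.

```cpp
#include <cassert>
#include <cstdint>

// Round X up to the next multiple of ALIGN, which must be a power of two.
constexpr std::uint64_t round_up (std::uint64_t x, std::uint64_t align)
{
  return (x + align - 1) & ~(align - 1);
}

// Hypothetical model of a frame that grows downward from its base.
struct frame_layout
{
  std::uint64_t depth = 0;  // bytes allocated so far below the frame base

  // Allocate SIZE bytes so that the slot's distance from the frame base
  // is a multiple of ALIGN.  Returns that distance; the displacement used
  // to address the slot would be its negation.  Any padding needed to
  // reach the aligned distance is simply left unused.
  std::uint64_t alloc (std::uint64_t size, std::uint64_t align)
  {
    assert (align != 0 && (align & (align - 1)) == 0);
    depth = round_up (depth + size, align);
    return depth;
  }
};
```

For example, an 8-byte slot followed by a 64-byte vector spill lands the
spill at distance 128 rather than 72, wasting 56 bytes but keeping the
displacement a multiple of 64.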




Here's what we're playing with.  Obviously we'd need a target hook to 
drive this behavior.  I was thinking that we'd pass in any slot offset 
alignment requirements (from the target hook) to assign_stack_local and 
that would bubble down to this point in try_fit_stack_local:


diff --git a/gcc/function.c b/gcc/function.c
index d616f5f64f4..7f441b87a63 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -307,6 +307,14 @@ try_fit_stack_local (poly_int64 start, poly_int64 length,

   frame_off = targetm.starting_frame_offset () % frame_alignment;
   frame_phase = frame_off ? frame_alignment - frame_off : 0;

+  if (known_eq (size, 64) && alignment < 64)
+    alignment = 64;
+
   /* Round the frame offset to the specified alignment.  */

   if (FRAME_GROWS_DOWNWARD)

Thoughts?

Jeff


Re: [PATCH 5/5 ver4] RS6000: Conversions between 128-bit integer and floating point values.

2021-06-07 Thread Segher Boessenkool
On Mon, Apr 26, 2021 at 09:36:33AM -0700, Carl Love wrote:
> This patch adds support for converting to/from 128-bit integers and
> 128-bit decimal floating point formats using the new P10 instructions
> dcffixqq and dctfixqq.  The new instructions are only used on P10 HW,
> otherwise the conversions continue to use the existing SW routines.
> 
> The files fixkfti-sw.c and fixunskfti-sw.c are renamed versions of
> fixkfti.c and fixunskfti.c respectively.  The function names in the
> files were updated with the rename as well as some white spaces fixes.

> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6421,6 +6421,42 @@
> xscvsxddp %x0,%x1"
>[(set_attr "type" "fp")])
>  
> +(define_insn "floatti2"
> +  [(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
> +   (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]

Broken indent?  It should be indented by 8 spaces, thus, a tab.  (More of
this further on, please fix all).


It isn't clear to me why you need the separate *_sw and *_hw names, or
if it is just to make it clearer, or maybe something else?  Some words
here would have helped :-)


Okay for trunk.  Thanks!


Segher


Re: [PATCH] Implement a context aware points-to analyzer for use in evrp.

2021-06-07 Thread Aldy Hernandez via Gcc-patches



On 6/7/21 3:30 PM, Richard Biener wrote:

On Mon, Jun 7, 2021 at 12:10 PM Aldy Hernandez via Gcc-patches
 wrote:


The substitute_and_fold_engine which evrp uses is expecting symbolics
from value_of_expr / value_on_edge / etc, which ranger does not provide.
In some cases, these provide important folding cues, as in the case of
aliases for pointers.  For example, legacy evrp may return [, ]
for the value of "bar" where bar is on an edge where bar == , or
when bar has been globally set to   This information is then used
by the subst & fold engine to propagate the known value of bar.

Currently this is a major source of discrepancies between evrp and
ranger.  Of the 284 cases legacy evrp is getting over ranger, 237 are
for pointer equality as discussed above.

This patch implements a context aware points-to class which
ranger-evrp can use to query what a pointer is currently pointing to.
With it, we reduce the 284 cases legacy evrp is getting to 47.

The API for the points-to analyzer is the following:

class points_to_analyzer
{
public:
   points_to_analyzer (gimple_ranger *r);
   ~points_to_analyzer ();
   void enter (basic_block);
   void leave (basic_block);
   void visit_stmt (gimple *stmt);
   tree get_points_to (tree name) const;
...
};

The enter(), leave(), and visit_stmt() methods are meant to be called
from a DOM walk.   At any point throughout the walk, one can call
get_points_to() to get whatever an SSA is pointing to.

If this class is useful to others, we could place it in a more generic
location.

Tested on x86-64 Linux with a regular bootstrap/tests and by comparing
EVRP folds over ranger before and after this patch.


Hmm, but why call it "points-to" - when I look at the implementation
it's really about equivalences.  Thus,

  if (var1_2 == var2_3)

could be handled the same way.  Also "points-to" implies (to me)
that [1] and [2] point to the same object but your points-to
is clearly tracking equivalences only.

So maybe at least rename it to pointer_equiv_analyzer?  ISTR


Good point.  Renaming done.  I've adjusted the changelog and commit 
message as well.


Thanks.
Aldy
From 0b1f6d28a4cdaeb9e0a6285050a5b35eca86829d Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Fri, 4 Jun 2021 20:25:20 +0200
Subject: [PATCH] Implement a context aware pointer equivalency class for use
 in evrp.

The substitute_and_fold_engine which evrp uses is expecting symbolics
from value_of_expr / value_on_edge / etc, which ranger does not provide.
In some cases, these provide important folding cues, as in the case of
aliases for pointers.  For example, legacy evrp may return [, ]
for the value of "bar" where bar is on an edge where bar == , or
when bar has been globally set to   This information is then used
by the subst & fold engine to propagate the known value of bar.

Currently this is a major source of discrepancies between evrp and
ranger.  Of the 284 cases legacy evrp is getting over ranger, 237 are
for pointer equality as discussed above.

This patch implements a context aware pointer equivalency class which
ranger-evrp can use to query what an SSA pointer is currently
equivalent to.  With it, we reduce the 284 cases legacy evrp is getting
to 47.

The API for the pointer equivalency analyzer is the following:

class pointer_equiv_analyzer
{
public:
  pointer_equiv_analyzer (gimple_ranger *r);
  ~pointer_equiv_analyzer ();
  void enter (basic_block);
  void leave (basic_block);
  void visit_stmt (gimple *stmt);
  tree get_equiv (tree ssa) const;
...
};

The enter(), leave(), and visit_stmt() methods are meant to be called
from a DOM walk.   At any point throughout the walk, one can call
get_equiv() to get whatever an SSA is equivalent to.
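
To illustrate how enter()/leave() can scope equivalences during a DOM walk, here is a minimal standalone sketch.  It is not the code from the patch: the class name scoped_equiv_table, the integer ssa_id type and the NO_EQUIV sentinel are all hypothetical stand-ins (GCC itself works on tree nodes).  The idea shown is only that an equivalence recorded inside a block is undone when the walk leaves that block.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SSA name; GCC uses `tree` here.
using ssa_id = int;
constexpr ssa_id NO_EQUIV = -1;

class scoped_equiv_table
{
public:
  // Entering a basic block opens a new scope.
  void enter () { m_marks.push_back (m_undo.size ()); }

  // Leaving a basic block restores every binding that was changed
  // since the matching enter ().
  void leave ()
  {
    std::size_t mark = m_marks.back ();
    m_marks.pop_back ();
    while (m_undo.size () > mark)
      {
	std::pair<ssa_id, ssa_id> entry = m_undo.back ();
	m_undo.pop_back ();
	if (entry.second == NO_EQUIV)
	  m_equiv.erase (entry.first);
	else
	  m_equiv[entry.first] = entry.second;
      }
  }

  // Record that NAME is equivalent to VALUE in the current scope,
  // remembering the previous binding so leave () can restore it.
  void set_equiv (ssa_id name, ssa_id value)
  {
    m_undo.emplace_back (name, get_equiv (name));
    m_equiv[name] = value;
  }

  // Return what NAME is currently equivalent to, or NO_EQUIV.
  ssa_id get_equiv (ssa_id name) const
  {
    auto it = m_equiv.find (name);
    return it == m_equiv.end () ? NO_EQUIV : it->second;
  }

private:
  std::unordered_map<ssa_id, ssa_id> m_equiv;
  std::vector<std::pair<ssa_id, ssa_id>> m_undo;
  std::vector<std::size_t> m_marks;
};
```

A dominator walk would call enter () before visiting a block's statements and leave () after all dominated blocks have been processed, so a query only ever sees equivalences that hold on the current path.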

Tested on x86-64 Linux with a regular bootstrap/tests and by comparing
EVRP folds over ranger before and after this patch.

gcc/ChangeLog:

	* gimple-ssa-evrp.c (class ssa_equiv_stack): New.
	(ssa_equiv_stack::ssa_equiv_stack): New.
	(ssa_equiv_stack::~ssa_equiv_stack): New.
	(ssa_equiv_stack::enter): New.
	(ssa_equiv_stack::leave): New.
	(ssa_equiv_stack::push_replacement): New.
	(ssa_equiv_stack::get_replacement): New.
	(is_pointer_ssa): New.
	(class pointer_equiv_analyzer): New.
	(pointer_equiv_analyzer::pointer_equiv_analyzer): New.
	(pointer_equiv_analyzer::~pointer_equiv_analyzer): New.
	(pointer_equiv_analyzer::set_global_equiv): New.
	(pointer_equiv_analyzer::set_cond_equiv): New.
	(pointer_equiv_analyzer::get_equiv): New.
	(pointer_equiv_analyzer::enter): New.
	(pointer_equiv_analyzer::leave): New.
	(pointer_equiv_analyzer::get_equiv_expr): New.
	(pta_valueize): New.
	(pointer_equiv_analyzer::visit_stmt): New.
	(pointer_equiv_analyzer::visit_edge): New.
	(hybrid_folder::value_of_expr): Call PTA.
	(hybrid_folder::value_on_edge): Same.
	(hybrid_folder::pre_fold_bb): New.
	(hybrid_folder::post_fold_bb): New.
	(hybrid_folder::pre_fold_stmt): New.
	(rvrp_folder::pre_fold_bb): New.
	(rvrp_folder::post_fold_bb): New.
	(rvrp_folder::pre_fold_stmt): New.
	

Re: [PATCH] Add --enable-default-semantic-interposition to GCC configure

2021-06-07 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 07, 2021 at 12:01:55PM -0600, Jeff Law via Gcc-patches wrote:
> > This breaks assumptions across the board.  If software packages want
> > to use -fno-semantic-interposition that is one thing.  But distros
> > should not be changing the default.  This is just like using
> > -ffast-math :).
> Some distros already force immediate binding at link time for security
> purposes on a distro-wide basis which, IIUC, does the same thing, but
> without the benefits from a code generation standpoint.

If you are talking about -Wl,-z,now, that is very different, semantic
interposition then still works just fine.
If you are talking about the glibc style by hand "protected" visibility,
bind calls to symbols defined in the same library through internal symbols,
then that is done only for a couple of packages and is stronger than
-fno-semantic-interposition.

Jakub



Re: [PATCH v2 1/2] Allow vec_duplicate_optab to fail

2021-06-07 Thread Richard Biener via Gcc-patches
On Mon, Jun 7, 2021 at 7:59 PM Richard Biener
 wrote:
>
> On Mon, Jun 7, 2021 at 4:19 PM H.J. Lu  wrote:
> >
> > On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
> >  wrote:
> > >
> > > "H.J. Lu"  writes:
> > > > Update vec_duplicate to allow to fail so that backend can only allow
> > > > broadcasting an integer constant to a vector when broadcast instruction
> > > > is available.
> > >
> > > I'm not sure why we need this to fail though.  Once the optab is defined
> > > for target X, the optab should handle all duplicates for target X,
> > > even if there are different strategies it can use.
> > >
> > > AIUI the case you want to make conditional is the constant case.
> > > I guess the first question is: why don't we simplify those CONSTRUCTORs
> > > to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
> > > as a constructor here.
> >
> > The particular testcase for vec_duplicate is gcc.dg/pr100239.c.
> >
> > > If we can't rely on that happening, then would it work to change:
> > >
> > > /* Try using vec_duplicate_optab for uniform vectors.  */
> > > if (!TREE_SIDE_EFFECTS (exp)
> > > && VECTOR_MODE_P (mode)
> > > && eltmode == GET_MODE_INNER (mode)
> > > && ((icode = optab_handler (vec_duplicate_optab, mode))
> > > != CODE_FOR_nothing)
> > > && (elt = uniform_vector_p (exp)))
> > >
> > > to something like:
> > >
> > > /* Try using vec_duplicate_optab for uniform vectors.  */
> > > if (!TREE_SIDE_EFFECTS (exp)
> > > && VECTOR_MODE_P (mode)
> > > && eltmode == GET_MODE_INNER (mode)
> > > && (elt = uniform_vector_p (exp)))
> > >   {
> > > if (TREE_CODE (elt) == INTEGER_CST
> > > || TREE_CODE (elt) == POLY_INT_CST
> > > || TREE_CODE (elt) == REAL_CST
> > > || TREE_CODE (elt) == FIXED_CST)
> > >   {
> > > rtx src = gen_const_vec_duplicate (mode, expand_normal 
> > > (node));
> > > emit_move_insn (target, src);
> > > break;
> > >   }
> > > …
> > >   }
> >
> > I will give it a try.
>
> I can confirm that veclower leaves us with an unfolded constant CTOR.
> If you file a PR to remind me I'll fix that.

The attached untested patch fixes this for the testcase.

Richard.

> Richard.
>
> > Thanks.
> >
> > --
> > H.J.




Re: [PATCH] Add --enable-default-semantic-interposition to GCC configure

2021-06-07 Thread Jeff Law via Gcc-patches




On 6/6/2021 5:18 PM, Andrew Pinski via Gcc-patches wrote:

On Sun, Jun 6, 2021 at 4:13 PM Fangrui Song via Gcc-patches
 wrote:

From: Fangrui Song 

--enable-default-semantic-interposition=no makes -fPIC default to
-fno-semantic-interposition which enables interprocedural optimizations
for default visibility non-vague-linkage function definitions.

The suppression of interprocedural optimizations and inlining for such
functions is the biggest difference between -fPIE/-fPIC.
Distributions may want to enable default -fno-semantic-interposition to
reclaim the lost performance (e.g. CPython is said to be 27% faster;
Clang is 3% faster).
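
As a small illustration of the kind of code affected, consider the sketch below.  The function names and the file name are made up for this example.  With -fPIC and the default -fsemantic-interposition, GCC has to assume that answer () could be replaced at run time by a definition from another module (for example via LD_PRELOAD), so it does not inline it into twice_answer ().  With -fno-semantic-interposition the compiler is allowed to inline it.

```cpp
#include <cassert>

// Hypothetical library source; build as a shared object, e.g.
//   g++ -O2 -fPIC -shared interpose.cc
// and compare the output with -fno-semantic-interposition added.

// Default visibility, external linkage, not inline: under -fPIC this
// definition may be interposed by another module at run time.
int
answer ()
{
  return 42;
}

// With semantic interposition enabled, this call goes through the
// interposable symbol and is not inlined; without it, the compiler
// may fold the body down to a constant.
int
twice_answer ()
{
  return 2 * answer ();
}
```

The observable result is the same either way as long as nothing actually interposes answer (); only the generated code differs.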


This breaks assumptions across the board.  If software packages want
to use -fno-semantic-interposition that is one thing.  But distros
should not be changing the default.  This is just like using
-ffast-math :).
Some distros already force immediate binding at link time for security 
purposes on a distro-wide basis which, IIUC, does the same thing, but 
without the benefits from a code generation standpoint.


Jeff



Re: [PATCH v2 1/2] Allow vec_duplicate_optab to fail

2021-06-07 Thread Richard Biener via Gcc-patches
On Mon, Jun 7, 2021 at 4:19 PM H.J. Lu  wrote:
>
> On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
>  wrote:
> >
> > "H.J. Lu"  writes:
> > > Update vec_duplicate to allow to fail so that backend can only allow
> > > broadcasting an integer constant to a vector when broadcast instruction
> > > is available.
> >
> > I'm not sure why we need this to fail though.  Once the optab is defined
> > for target X, the optab should handle all duplicates for target X,
> > even if there are different strategies it can use.
> >
> > AIUI the case you want to make conditional is the constant case.
> > I guess the first question is: why don't we simplify those CONSTRUCTORs
> > to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
> > as a constructor here.
>
> The particular testcase for vec_duplicate is gcc.dg/pr100239.c.
>
> > If we can't rely on that happening, then would it work to change:
> >
> > /* Try using vec_duplicate_optab for uniform vectors.  */
> > if (!TREE_SIDE_EFFECTS (exp)
> > && VECTOR_MODE_P (mode)
> > && eltmode == GET_MODE_INNER (mode)
> > && ((icode = optab_handler (vec_duplicate_optab, mode))
> > != CODE_FOR_nothing)
> > && (elt = uniform_vector_p (exp)))
> >
> > to something like:
> >
> > /* Try using vec_duplicate_optab for uniform vectors.  */
> > if (!TREE_SIDE_EFFECTS (exp)
> > && VECTOR_MODE_P (mode)
> > && eltmode == GET_MODE_INNER (mode)
> > && (elt = uniform_vector_p (exp)))
> >   {
> > if (TREE_CODE (elt) == INTEGER_CST
> > || TREE_CODE (elt) == POLY_INT_CST
> > || TREE_CODE (elt) == REAL_CST
> > || TREE_CODE (elt) == FIXED_CST)
> >   {
> > rtx src = gen_const_vec_duplicate (mode, expand_normal 
> > (node));
> > emit_move_insn (target, src);
> > break;
> >   }
> > …
> >   }
>
> I will give it a try.

I can confirm that veclower leaves us with an unfolded constant CTOR.
If you file a PR to remind me I'll fix that.

Richard.

> Thanks.
>
> --
> H.J.


Re: [PATCH][RFC] Sparse on entry cache for Ranger.

2021-06-07 Thread Richard Biener via Gcc-patches
On Mon, Jun 7, 2021 at 7:26 PM Andrew MacLeod  wrote:
>
> On 6/7/21 4:40 AM, Richard Biener wrote:
> > On Wed, Jun 2, 2021 at 11:15 PM Andrew MacLeod  wrote:
> >>
> >> My thoughts are I would put this into trunk, and assuming nothing comes
> >> up  over the next couple of days, port it back to GCC11 to resolve
> >> 100299 and other excessive memory consumption PRs there as well. Given
> >> that it's reusing bitmap code for the sparse representation, it seems
> >> like it would be low risk.
> >>
> >> Are we OK with the addition of the bitmap_get_quad and bitmap_set_quad
> >> routines in bitmap.c?  It seems like they might be useful to others.
> >> They are simple tweaks of bitmap_set_bit and bitmap_bit_p.. just dealing
> >> with 4 bits at a time.  I could make them local if this is a problem,
> >> but I don't have access to the bitmap internals there.
> > I think _quad is a bit too specific - it's aligned chunks so maybe
> >
> > void bitmap_set_aligned_chunk (bitmap, unsigned int chunk, unsigned
> > int chunk_size, BITMAP_WORD chunk_value);
> >
> > and
> >
> > BITMAP_WORD bitmap_get_aligned_chunk (bitmap, unsigned int chunk,
> > unsigned chunk_size);
> >
> > and assert that chunk_size is power-of-two and fits in BITMAP_WORD?
> >
> > (also note using unsigned ints and BITMAP_WORD for the data type)
> >
> > I've been using two-bit representations in a few places (but mostly
> > setting/testing the
> > respective bits independently), I suppose for example
>
> That's exactly how this started.  I was using a pair of bits for
> pointers: UNDEFINED, zero, non-zero and varying, and checking the bits
> independently.  When I decided I needed 3 bits, the whole quad thing
> evolved, since picking up 3 or 4 consecutive bits one at a time seemed
> too inefficient.
>
> >
> > static dep_state
> > query_loop_dependence (class loop *loop, im_mem_ref *ref, dep_kind kind)
> > {
> >unsigned first_bit = 6 * loop->num + kind * 2;
> >if (bitmap_bit_p (>dep_loop, first_bit))
> >  return dep_independent;
> >else if (bitmap_bit_p (>dep_loop, first_bit + 1))
> >  return dep_dependent;
> >
> > > could use a chunk size of 2 and a single bitmap query.  Incidentally this
> > specific code uses 6 bits, so it's not fully aligned ...
> >
> > /* We use six bits per loop in the ref->dep_loop bitmap to record
> > the dep_kind x dep_state combinations.  */
> >
> > enum dep_kind { lim_raw, sm_war, sm_waw };
> > enum dep_state { dep_unknown, dep_independent, dep_dependent };
> >
> > ... but there's also at most a single bit set.
> >
> > Anyway, I'm OK with adding API to access aligned power-of-two sized chunks.
> > Even not power-of-two sized unaligned chunks should be quite straight
> > forward to implement if we limit the chunk size to BITMAP_WORD by
> > simply advancing to the next bitmap word / element when necessary.
> >
> > An alternative low-level API would provide accesses to whole BITMAP_WORD
> > entries and the quads could be implemented on top of that
> > (bitmap_set_word/_get_word)
> >
> > Richard.
> >
> I think I'll stick to the power of 2 limitation for now.  If someone
> finds a pressing need or desire, they can enhance it :-)
>
> Wanna eyeball this and make sure I'm not doing something unportable?  I
> just used your original 2 function names, and swapped out the 4 and a
> couple of constants for computed values. Works fine for me.
>
> I also made the self test process 2, 4 and 8 bit quantities.
>
> It's going through a test cycle now.

+  gcc_checking_assert (__builtin_popcount (chunk_size) == 1);

please use pow2p_hwi (chunk_size) instead, __builtin_popcount might
not be available with non-GCC host compilers.
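
For reference, a power-of-two test needs no compiler builtin at all.  The sketch below is a hypothetical standalone helper (the name is_pow2 is made up here); it is not a copy of GCC's pow2p_hwi, just the usual portable bit trick that such a predicate can be built on.

```cpp
#include <cassert>

// Hypothetical helper: true iff X has exactly one bit set.
// Clearing the lowest set bit with x & (x - 1) leaves zero only
// when that bit was the only one.
constexpr bool
is_pow2 (unsigned long long x)
{
  return x != 0 && (x & (x - 1)) == 0;
}
```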

Otherwise looks good to me.

Thanks,
Richard.

> Andrew
>
>


Re: [PATCH 02/57] Support scanning of build-time GC roots in gengtype

2021-06-07 Thread Bill Schmidt via Gcc-patches

On 6/7/21 12:45 PM, Richard Biener wrote:

On Mon, Jun 7, 2021 at 5:38 PM Bill Schmidt  wrote:

On 6/7/21 8:36 AM, Richard Biener wrote:

Some maybe obvious issue - what about DOS-style path hosts?
You seem to build ../ strings to point to parent dirs...  I'm not sure
what we do elsewhere - I suppose we arrange for appropriate
-I command line arguments?


Well, actually it's just using "./" to identify the build directory,
though I see what you mean about potential Linux bias. There is
precedent for this syntax identifying the build directory in config.gcc
for target macro files:

#  tm_file  A list of target macro files, if different from
#   "$cpu_type/$cpu_type.h". Usually it's constructed
#   per target in a way like this:
#   tm_file="${tm_file} dbxelf.h elfos.h
${cpu_type.h}/elf.h"
#   Note that the preferred order is:
#   - specific target header
"${cpu_type}/${cpu_type.h}"
#   - generic headers like dbxelf.h elfos.h, etc.
#   - specializing target headers like
${cpu_type.h}/elf.h
#   This helps to keep OS specific stuff out of the CPU
#   defining header ${cpu_type}/${cpu_type.h}.
#
#   It is possible to include automatically-generated
#   build-directory files by prefixing them with "./".
#   All other files should relative to $srcdir/config.

...so I thought I would try to be consistent with this change. In patch
0025 I use this as follows:

--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -491,6 +491,7 @@ powerpc*-*-*)
  extra_options="${extra_options} g.opt fused-madd.opt
rs6000/rs6000-tables.opt"
  target_gtfiles="$target_gtfiles
\$(srcdir)/config/rs6000/rs6000-logue.c
\$(srcdir)/config/rs6000/rs6000-call.c"
  target_gtfiles="$target_gtfiles
\$(srcdir)/config/rs6000/rs6000-pcrel-opt.c"
+   target_gtfiles="$target_gtfiles ./rs6000-builtins.h"
;;
   pru-*-*)
cpu_type=pru

I'm open to trying to do something different if you think that's
appropriate.

Well, I'm not sure whether/how to resolve this.  You could try
building a cross to powerpc-linux from a x86_64-mingw host ...
maybe there's one on the CF?  Or some of your fellow RedHat
people have access to mingw or the like envs to try whether it
just works with your change ...

Otherwise it looks OK.


I'll see what I can find.  Thanks again for reviewing the patch!

Bill




Richard.


Thanks for your help with this!

Bill



Re: [PATCH 02/57] Support scanning of build-time GC roots in gengtype

2021-06-07 Thread Richard Biener via Gcc-patches
On Mon, Jun 7, 2021 at 5:38 PM Bill Schmidt  wrote:
>
> On 6/7/21 8:36 AM, Richard Biener wrote:
> >
> > Some maybe obvious issue - what about DOS-style path hosts?
> > You seem to build ../ strings to point to parent dirs...  I'm not sure
> > what we do elsewhere - I suppose we arrange for appropriate
> > -I command line arguments?
> >
> Well, actually it's just using "./" to identify the build directory,
> though I see what you mean about potential Linux bias. There is
> precedent for this syntax identifying the build directory in config.gcc
> for target macro files:
>
> #  tm_file  A list of target macro files, if different from
> #   "$cpu_type/$cpu_type.h". Usually it's constructed
> #   per target in a way like this:
> #   tm_file="${tm_file} dbxelf.h elfos.h
> ${cpu_type.h}/elf.h"
> #   Note that the preferred order is:
> #   - specific target header
> "${cpu_type}/${cpu_type.h}"
> #   - generic headers like dbxelf.h elfos.h, etc.
> #   - specializing target headers like
> ${cpu_type.h}/elf.h
> #   This helps to keep OS specific stuff out of the CPU
> #   defining header ${cpu_type}/${cpu_type.h}.
> #
> #   It is possible to include automatically-generated
> #   build-directory files by prefixing them with "./".
> #   All other files should relative to $srcdir/config.
>
> ...so I thought I would try to be consistent with this change. In patch
> 0025 I use this as follows:
>
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -491,6 +491,7 @@ powerpc*-*-*)
>  extra_options="${extra_options} g.opt fused-madd.opt
> rs6000/rs6000-tables.opt"
>  target_gtfiles="$target_gtfiles
> \$(srcdir)/config/rs6000/rs6000-logue.c
> \$(srcdir)/config/rs6000/rs6000-call.c"
>  target_gtfiles="$target_gtfiles
> \$(srcdir)/config/rs6000/rs6000-pcrel-opt.c"
> +   target_gtfiles="$target_gtfiles ./rs6000-builtins.h"
> ;;
>   pru-*-*)
> cpu_type=pru
>
> I'm open to trying to do something different if you think that's
> appropriate.

Well, I'm not sure whether/how to resolve this.  You could try
building a cross to powerpc-linux from a x86_64-mingw host ...
maybe there's one on the CF?  Or some of your fellow RedHat
people have access to mingw or the like envs to try whether it
just works with your change ...

Otherwise it looks OK.

Richard.

> Thanks for your help with this!
>
> Bill
>


Re: [PATCH][RFC] Sparse on entry cache for Ranger.

2021-06-07 Thread Andrew MacLeod via Gcc-patches

On 6/7/21 4:40 AM, Richard Biener wrote:

On Wed, Jun 2, 2021 at 11:15 PM Andrew MacLeod  wrote:


My thoughts are I would put this into trunk, and assuming nothing comes
up  over the next couple of days, port it back to GCC11 to resolve
100299 and other excessive memory consumption PRs there as well. Given
that it's reusing bitmap code for the sparse representation, it seems
like it would be low risk.

Are we OK with the addition of the bitmap_get_quad and bitmap_set_quad
routines in bitmap.c?  It seems like they might be useful to others.
They are simple tweaks of bitmap_set_bit and bitmap_bit_p.. just dealing
with 4 bits at a time.  I could make them local if this is a problem,
but I don't have access to the bitmap internals there.

I think _quad is a bit too specific - it's aligned chunks so maybe

void bitmap_set_aligned_chunk (bitmap, unsigned int chunk, unsigned
int chunk_size, BITMAP_WORD chunk_value);

and

BITMAP_WORD bitmap_get_aligned_chunk (bitmap, unsigned int chunk,
unsigned chunk_size);

and assert that chunk_size is power-of-two and fits in BITMAP_WORD?

(also note using unsigned ints and BITMAP_WORD for the data type)
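
A rough sketch of what such aligned chunk accessors do, written against a plain dense array of words rather than GCC's sparse bitmap, might look like the following.  Everything here (dense_bits, word_t, WORD_BITS, chunk_size_ok) is a hypothetical stand-in and not the bitmap.c implementation.  The point is that a power-of-two chunk size smaller than the word size always divides the word size, so an aligned chunk never straddles two words.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for BITMAP_WORD and its width.
using word_t = std::uint64_t;
constexpr unsigned WORD_BITS = 64;

// Chunk size must be a power of two and smaller than a word.
constexpr bool
chunk_size_ok (unsigned chunk_size)
{
  return chunk_size != 0
	 && (chunk_size & (chunk_size - 1)) == 0
	 && chunk_size < WORD_BITS;
}

struct dense_bits
{
  std::vector<word_t> words;

  // Store VALUE in the CHUNK_SIZE bits starting at CHUNK * CHUNK_SIZE.
  void set_aligned_chunk (unsigned chunk, unsigned chunk_size, word_t value)
  {
    assert (chunk_size_ok (chunk_size));
    word_t mask = (word_t (1) << chunk_size) - 1;
    assert (value <= mask);
    unsigned bit = chunk * chunk_size;
    unsigned word_num = bit / WORD_BITS;
    unsigned shift = bit % WORD_BITS;
    if (word_num >= words.size ())
      words.resize (word_num + 1, 0);
    // Clear the old chunk, then or in the new value.
    words[word_num] = (words[word_num] & ~(mask << shift)) | (value << shift);
  }

  // Return the CHUNK_SIZE bits starting at CHUNK * CHUNK_SIZE.
  // Chunks that were never set read as zero.
  word_t get_aligned_chunk (unsigned chunk, unsigned chunk_size) const
  {
    assert (chunk_size_ok (chunk_size));
    word_t mask = (word_t (1) << chunk_size) - 1;
    unsigned bit = chunk * chunk_size;
    unsigned word_num = bit / WORD_BITS;
    unsigned shift = bit % WORD_BITS;
    if (word_num >= words.size ())
      return 0;
    return (words[word_num] >> shift) & mask;
  }
};
```

Note that the mask is built from a word_t-typed 1, so the shift is done at full word width rather than at the width of int.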

I've been using two-bit representations in a few places (but mostly
setting/testing the
respective bits independently), I suppose for example


That's exactly how this started.  I was using a pair of bits for 
pointers: UNDEFINED, zero, non-zero and varying, and checking the bits 
independently.  When I decided I needed 3 bits, the whole quad thing 
evolved, since picking up 3 or 4 consecutive bits one at a time seemed 
too inefficient.




static dep_state
query_loop_dependence (class loop *loop, im_mem_ref *ref, dep_kind kind)
{
  unsigned first_bit = 6 * loop->num + kind * 2;
  if (bitmap_bit_p (&ref->dep_loop, first_bit))
    return dep_independent;
  else if (bitmap_bit_p (&ref->dep_loop, first_bit + 1))
    return dep_dependent;

could use a chunk size of 2 and a single bitmap query.  Incidentally this
specific code uses 6 bits, so it's not fully aligned ...

/* We use six bits per loop in the ref->dep_loop bitmap to record
the dep_kind x dep_state combinations.  */

enum dep_kind { lim_raw, sm_war, sm_waw };
enum dep_state { dep_unknown, dep_independent, dep_dependent };

... but there's also at most a single bit set.

Anyway, I'm OK with adding API to access aligned power-of-two sized chunks.
Even not power-of-two sized unaligned chunks should be quite straight
forward to implement if we limit the chunk size to BITMAP_WORD by
simply advancing to the next bitmap word / element when necessary.

An alternative low-level API would provide accesses to whole BITMAP_WORD
entries and the quads could be implemented on top of that
(bitmap_set_word/_get_word)

Richard.

I think I'll stick to the power of 2 limitation for now.  If someone 
finds a pressing need or desire, they can enhance it :-)


Wanna eyeball this and make sure I'm not doing something unportable?  I 
just used your original 2 function names, and swapped out the 4 and a 
couple of constants for computed values. Works fine for me.


I also made the self test process 2, 4 and 8 bit quantities.

It's going through a test cycle now.

Andrew


commit e0fee2e994c4b763c2a8b2bcfb4b0ee30cb3e500
Author: Andrew MacLeod 
Date:   Mon Jun 7 13:12:01 2021 -0400

Implement multi-bit aligned accessors for sparse bitmap.

Provide set/get routines to allow sparse bitmaps to be treated as an array
of multiple bit values. Only chunk sizes that are powers of 2 are supported.

* bitmap.c (bitmap_set_aligned_chunk): New.
(bitmap_get_aligned_chunk): New.
(test_aligned_chunk): New.
(bitmap_c_tests): Call test_aligned_chunk.
* bitmap.h (bitmap_set_aligned_chunk, bitmap_get_aligned_chunk): New.

diff --git a/gcc/bitmap.c b/gcc/bitmap.c
index 5a650cdfc1d..7e1a218944d 100644
--- a/gcc/bitmap.c
+++ b/gcc/bitmap.c
@@ -1004,6 +1004,83 @@ bitmap_bit_p (const_bitmap head, int bit)
   return (ptr->bits[word_num] >> bit_num) & 1;
 }
 
+/* Set CHUNK_SIZE bits at a time in bitmap HEAD.
+   Store CHUNK_VALUE starting at bits CHUNK * chunk_size.
+   This is the set routine for viewing bitmap as a multi-bit sparse array.  */
+
+void
+bitmap_set_aligned_chunk (bitmap head, unsigned int chunk,
+			  unsigned int chunk_size, BITMAP_WORD chunk_value)
+{
+  // Ensure chunk size is a power of 2 and fits in BITMAP_WORD.
+  gcc_checking_assert (__builtin_popcount (chunk_size) == 1);
+  gcc_checking_assert (chunk_size < (sizeof (BITMAP_WORD) * CHAR_BIT));
+
+  // Ensure chunk_value is within range of chunk_size bits.
+  BITMAP_WORD max_value = (1 << chunk_size) - 1;
+  gcc_checking_assert (chunk_value <= max_value);
+
+  unsigned bit = chunk * chunk_size;
+  unsigned indx = bit / BITMAP_ELEMENT_ALL_BITS;
+  bitmap_element *ptr;
+  if (!head->tree_form)
+ptr = bitmap_list_find_element (head, indx);
+  else
+ptr = bitmap_tree_find_element (head, indx);
+  unsigned word_num = bit 

Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-07 Thread Qing Zhao via Gcc-patches
Hi, 

> On Jun 7, 2021, at 2:53 AM, Richard Biener  wrote:
> 
>> 
>> To address the above suggestion:
>> 
>> My study shows: the call to __builtin_clear_padding is expanded during 
>> gimplification phase.
>> And there is no __builtin_clear_padding expanding during rtx expanding phase.
>> However, for -ftrivial-auto-var-init, padding initialization should be done 
>> both in gimplification phase and rtx expanding phase.
>> since the __builtin_clear_padding might not be good for rtx expanding, 
>> reusing __builtin_clear_padding might not work.
>> 
>> Let me know if you have any more comments on this.
> 
> Yes, I didn't suggest to literally emit calls to __builtin_clear_padding 
> but instead to leverage the lowering code, more specifically share the
> code that figures _what_ is to be initialized (where the padding is)
> and eventually the actual code generation pieces.  That might need some
> refactoring but the code where padding resides should be present only
> a single time (since it's quite complex).

Okay, I see your point here.

> 
> Which is also why I suggested to split out the padding initialization
> bits to a separate patch (and option).

Personally, I am okay with splitting padding initialization out of the current 
patch.
Kees, what's your opinion on this?  I.e., the current -ftrivial-auto-var-init 
will NOT initialize padding, and we will add another option to 
explicitly initialize padding.

Qing


> 
> Richard.



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-07 Thread Qing Zhao via Gcc-patches
(Kees, can you answer one of Richard's questions below, on the reason to 
initialize padding of structures?)

Richard,


On Jun 7, 2021, at 2:48 AM, Richard Biener wrote:

Meh - can you try using a mailer that does proper quoting?  It's difficult
to spot your added comments.  Will try anyway (and sorry for the delay)

Only the email replied to gcc-patch alias had this issue, all the other emails 
I sent are fine. Not sure why?


Both clang and my patch add initialization to the above auto variable “line”.

So, I have the following questions that I need help with:

1. Do we need to exclude C++ class with ctor from auto initialization?

2. I see Clang use call to internal memset to initialize such class, but for my 
patch, I only initialize the data fields inside this class.
   Which one is better?

I can't answer either question, but generally using block-initialization
(for example via memset, but we'd generally prefer X = {}) is better for
later optimization.

Okay. So, is this the same reason as lowering the call to .DEFERRED_INIT through 
expand_builtin_memset rather than expand_assign?


seeing this, can you explain why using .DEFERRED_INIT does not
work for VLAs?

The major reason for going different routes for VLAs vs. no-VLAs is:

In the original gimplification phase, VLAs and no-VLAs go different routes.
I just followed the different routes for them:

In “gimplify_decl_expr”, a VLA goes to “gimplify_vla_decl”, and is expanded to
a call to alloca.  Naturally, I add calls to “memset/memcpy” in 
“gimplify_vla_decl” to initialize it.

On the other hand, no-VLAs are handled differently in “gimplify_decl_expr”, so
I added calls to “.DEFERRED_INIT” to initialize them.

What’s the major issue if I add calls to “memset/memcpy” in “gimplify_vla_decl” 
to initialize VLAs?

Just inconsistency and unexpected different behavior with respect to
uninitialized warnings?

Okay.
Will try to initialize VLAs through the call to .DEFERRED_INIT too, and see 
whether there is any issue with it.


@@ -5001,6 +5185,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq
*pre_p, gimple_seq *post_p,
/* If a single access to the target must be ensured and all
elements
   are zero, then it's optimal to clear whatever their number.
*/
cleared = true;
+   else if (flag_trivial_auto_var_init > AUTO_INIT_UNINITIALIZED
+&& !TREE_STATIC (object)
+&& type_has_padding (type))
+ /* If the user requests to initialize automatic variables with
+paddings inside the type, we should initialize the paddings
too.
+C guarantees that brace-init with fewer initializers than
members
+aggregate will initialize the rest of the aggregate as-if it
were
+static initialization.  In turn static initialization
guarantees
+that pad is initialized to zero bits.
+So, it's better to clear the whole record under such
situation.  */
+ cleared = true;

so here we have padding as well - I think this warrants to be controlled
by an extra option?  And we can maybe split this out to a separate
patch? (the whole padding stuff)

Clang does the padding initialization with this option, shall we be
consistent with Clang?

Just for the sake of consistency?  No.  Is there a technical reason
for this complication?  Say we have

 struct { short s; int i; } a;

what's the technical reason to initialize the padding?  I might
be tempted to use -ftrivial-auto-init but I'd definitely don't
want to spend cycles/instructions initializing the padding in the
above struct.
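
To make the padding in question concrete, the following standalone sketch measures it for a named copy of the struct above.  The struct tag padded and the two helper functions are hypothetical names added for this example.  On common ABIs with a 2-byte short and a 4-byte-aligned int the interior padding is 2 bytes, but the code computes it rather than assuming that.

```cpp
#include <cassert>
#include <cstddef>

// Named copy of the struct from the discussion.
struct padded { short s; int i; };

// Bytes between the end of `s' and the start of `i'.
constexpr std::size_t
padding_after_s ()
{
  return offsetof (padded, i) - (offsetof (padded, s) + sizeof (short));
}

// All bytes in the object that belong to no member
// (interior plus any tail padding).
constexpr std::size_t
total_padding ()
{
  return sizeof (padded) - sizeof (short) - sizeof (int);
}
```

These are the bytes that a member-wise initialization of s and i leaves untouched and that a block initialization of the whole object would also write.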

Kees, could you please answer this question? What’s the major reason to 
initialize padding
of structures from the security point of view?


At this point I also wonder about doing the actual initialization
by block-initializing the current function frame at allocation
time.

Which phase is for “allocation time”, please point me to the specific phase and 
source file.


That would be a way smaller patch (but possibly backend
specific).  On x86 it could be a single rep mov; for all but the
VLA cases.  Just a thought.



Thanks.

Qing



Re: [PATCH 02/57] Support scanning of build-time GC roots in gengtype

2021-06-07 Thread Bill Schmidt via Gcc-patches

On 6/7/21 8:36 AM, Richard Biener wrote:


Some maybe obvious issue - what about DOS-style path hosts?
You seem to build ../ strings to point to parent dirs...  I'm not sure
what we do elsewhere - I suppose we arrange for appropriate
-I command line arguments?

Well, actually it's just using "./" to identify the build directory, 
though I see what you mean about potential Linux bias. There is 
precedent for this syntax identifying the build directory in config.gcc 
for target macro files:


#  tm_file  A list of target macro files, if different from
#   "$cpu_type/$cpu_type.h". Usually it's constructed
#   per target in a way like this:
#   tm_file="${tm_file} dbxelf.h elfos.h ${cpu_type.h}/elf.h"
#   Note that the preferred order is:
#   - specific target header "${cpu_type}/${cpu_type.h}"
#   - generic headers like dbxelf.h elfos.h, etc.
#   - specializing target headers like ${cpu_type.h}/elf.h
#   This helps to keep OS specific stuff out of the CPU
#   defining header ${cpu_type}/${cpu_type.h}.
#
#   It is possible to include automatically-generated
#   build-directory files by prefixing them with "./".
#   All other files should be relative to $srcdir/config.

...so I thought I would try to be consistent with this change. In patch 
0025 I use this as follows:


--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -491,6 +491,7 @@ powerpc*-*-*)
    extra_options="${extra_options} g.opt fused-madd.opt 
rs6000/rs6000-tables.opt"
    target_gtfiles="$target_gtfiles 
\$(srcdir)/config/rs6000/rs6000-logue.c 
\$(srcdir)/config/rs6000/rs6000-call.c"
    target_gtfiles="$target_gtfiles 
\$(srcdir)/config/rs6000/rs6000-pcrel-opt.c"

+   target_gtfiles="$target_gtfiles ./rs6000-builtins.h"
;;
 pru-*-*)
cpu_type=pru

I'm open to trying to do something different if you think that's 
appropriate.


Thanks for your help with this!

Bill



Re: [PATCH] PR libstdc++/98842: Fixed Constraints on operator<=>(optional, U)

2021-06-07 Thread Jonathan Wakely via Gcc-patches
On Fri, 4 Jun 2021 at 21:41, Jonathan Wakely wrote:
>
> On Thu, 3 Jun 2021 at 17:27, Seija K. via Libstdc++ 
> wrote:
>
> > The original operator was underconstrained. _Up needs to fulfill
> > compare_three_way_result,
> > as mentioned in this bug report
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98842
> >
>
> Thanks, I'll get the patch applied next week.

The patch causes testsuite failures.

I fixed it with the attached change instead, including a workaround
for an apparent C++20 defect.

Pushed to trunk so far.
commit adec14811714e22a6c1f7f0199adc05370f0d8b0
Author: Jonathan Wakely 
Date:   Mon Jun 7 13:02:15 2021

libstdc++: Constrain three-way comparison for std::optional [PR 98842]

The operator<=>(const optional<T>&, const U&) operator is supposed to be
constrained with three_way_comparable_with so that it can only be
used when T and U are weakly-equality-comparable and also three-way
comparable.

Adding that constraint completely breaks std::optional comparisons,
because it causes constraint recursion. To avoid that, an additional
check that U is not a specialization of std::optional is needed. That
appears to be a defect in the standard and should be reported to LWG.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/98842
* include/std/optional (operator<=>(const optional<T>&, const U&)):
Add missing constraint and add workaround for template
recursion.
* testsuite/20_util/optional/relops/three_way.cc: Check that
type without equality comparison cannot be compared when wrapped
in std::optional.

diff --git a/libstdc++-v3/include/std/optional 
b/libstdc++-v3/include/std/optional
index 8b9e038e6e5..415f8c49ef4 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -1234,7 +1234,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return !__rhs || __lhs >= *__rhs; }
 
 #ifdef __cpp_lib_three_way_comparison
+  template<typename _Tp>
+inline constexpr bool __is_optional_v = false;
+  template<typename _Tp>
+inline constexpr bool __is_optional_v<optional<_Tp>> = true;
+
   template<typename _Tp, typename _Up>
+requires (!__is_optional_v<_Up>)
+  && three_way_comparable_with<_Tp, _Up>
 constexpr compare_three_way_result_t<_Tp, _Up>
 operator<=>(const optional<_Tp>& __x, const _Up& __v)
 { return bool(__x) ? *__x <=> __v : strong_ordering::less; }
diff --git a/libstdc++-v3/testsuite/20_util/optional/relops/three_way.cc 
b/libstdc++-v3/testsuite/20_util/optional/relops/three_way.cc
index 953bef4a1ea..5fc5eec5abf 100644
--- a/libstdc++-v3/testsuite/20_util/optional/relops/three_way.cc
+++ b/libstdc++-v3/testsuite/20_util/optional/relops/three_way.cc
@@ -15,8 +15,8 @@
 // with this library; see the file COPYING3.  If not see
 // <http://www.gnu.org/licenses/>.
 
-// { dg-options "-std=gnu++2a" }
-// { dg-do compile { target c++2a } }
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
 
 #include <optional>
 
@@ -74,3 +74,21 @@ test02()
   static_assert( nullopt <= O{} );
   static_assert( nullopt <= O{1} );
 }
+
+template<typename T>
+  concept has_spaceship = requires (const T& t) { t <=> t; };
+
+void
+test03()
+{
+  struct E
+  {
+auto operator<=>(const E&) const { return std::strong_ordering::equal; }
+  };
+  static_assert( !std::three_way_comparable<E> ); // not equality comparable
+  using O = std::optional<E>;
+  static_assert( !std::three_way_comparable<O> );
+  static_assert( ! has_spaceship<O> ); // PR libstdc++/98842
+  struct U : O { };
+  static_assert( ! has_spaceship<U> );
+}
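
The "is U a specialization of std::optional" check in the patch above is an ordinary variable-template partial specialization. The following is a minimal standalone sketch of that idiom, not the library's code: the names `is_optional_v` and `derived` are illustrative, and it only needs C++17.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Primary template: an arbitrary type is not an optional.
template <typename T>
inline constexpr bool is_optional_v = false;

// Partial specialization: matches exactly std::optional<T>.
template <typename T>
inline constexpr bool is_optional_v<std::optional<T>> = true;

// A class derived from std::optional does not match the specialization,
// because partial specializations match the exact type only.
struct derived : std::optional<int> {};

static_assert (!is_optional_v<int>);
static_assert (is_optional_v<std::optional<std::string>>);
static_assert (!is_optional_v<derived>);
```

A `requires (!is_optional_v<U>)` clause placed ahead of the more expensive concept check lets overload resolution reject optional arguments before the comparison concepts are ever instantiated, which is how a guard like this can break a constraint cycle.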


Re: [PATCH] predcom: Adjust some unnecessary update_ssa calls

2021-06-07 Thread Martin Sebor via Gcc-patches

On 6/7/21 8:46 AM, Richard Biener via Gcc-patches wrote:

On Wed, Jun 2, 2021 at 11:29 AM Kewen.Lin  wrote:


Hi,

As Richi suggested in PR100794, this patch is to remove
some unnecessary update_ssa calls with flag
TODO_update_ssa_only_virtuals, also do some refactoring.

Bootstrapped/regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu, built well
on Power9 ppc64le with --with-build-config=bootstrap-O3,
and passed both P8 and P9 SPEC2017 full build with
{-O3, -Ofast} + {,-funroll-loops}.

Is it ok for trunk?


LGTM, minor comment on the fancy C++:

+  auto cleanup = [&]() {
+release_chains (chains);
+free_data_refs (datarefs);
+BITMAP_FREE (looparound_phis);
+free_affine_expand_cache (&name_expansions);
+  };

+  cleanup ();
+  return 0;

so that could have been

   class cleanup {
  ~cleanup()
 {
   release_chains (chains);
   free_data_refs (datarefs);
   BITMAP_FREE (looparound_phis);
   free_affine_expand_cache (&name_expansions);
 }
   } cleanup;

?  Or some other means of registering a RAII-style cleanup?


I agree this would be better than invoking the cleanup lambda
explicitly.

Going a step further would be to encapsulate all the data in a class
(eliminating the static variables) and make
tree_predictive_commoning_loop() its member function (along with
whatever others it calls), and have the dtor take care of
the cleanup.

Of course, if the data types were classes with ctors and dtors
(including, for example, auto_vec rather than vec for chains)
the cleanup would just happen and none of the explicit calls
would be necessary.
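
As a rough illustration of that last point, here is a sketch in which the per-loop state owns its resources through members with destructors, so no return path needs an explicit cleanup call. The type `loop_state` and its members are hypothetical stand-ins using standard library types; they are not the GCC data structures (`auto_vec`, `auto_bitmap`) mentioned in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the pass's per-loop state; not GCC's real types.
struct loop_state
{
  std::vector<int> chains;         // released by ~vector
  std::unique_ptr<int[]> bitmap;   // released by ~unique_ptr

  explicit loop_state (std::size_t n)
    : chains (n, 0), bitmap (new int[n] ()) {}

  // Every return path leaves the cleanup to the member destructors.
  unsigned process ()
  {
    if (chains.empty ())
      return 0;                    // early exit, nothing to release by hand
    chains[0] = 1;
    return 1;
  }
};
```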

Martin


I mean, we can't wrap it all in

   try {...}
   finally {...}

because C++ doesn't have finally.

OK with this tiny part of the C++ refactoring delayed, but we can also simply
discuss best options.  At least for looparound_phis a good cleanup would
be to pass the bitmap around and use auto_bitmap local to
tree_predictive_commoning_loop ...

Thanks,
Richard.


BR,
Kewen
-
gcc/ChangeLog:

 * tree-predcom.c (execute_pred_commoning): Remove update_ssa call.
 (tree_predictive_commoning_loop): Factor some cleanup stuffs into
 lambda function cleanup, remove scev_reset call, and adjust return
 value.
 (tree_predictive_commoning): Adjust for different changed values,
 only set flag TODO_update_ssa_only_virtuals if changed.
 (pass_data pass_data_predcom): Remove TODO_update_ssa_only_virtuals
 from todo_flags_finish.





Re: [PATCH] Improve match_simplify_replacement in phi-opt

2021-06-07 Thread Richard Biener via Gcc-patches
On Thu, Jun 3, 2021 at 7:32 AM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> This improves match_simplify_replacement in phi-opt to handle the
> case where there is one cheap (non-call) preparation statement in the
> middle basic block, similar to xor_replacement and others.
> This allows removing xor_replacement, which this patch also does.
>
> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> Thanks,
> Andrew Pinski
>
> Changes since v1:
> v2 - change the check on the preparation statement to
> allow only assignments and no calls and only assignments
> that feed into the phi.
>
> gcc/ChangeLog:
>
> PR tree-optimization/25290
> * tree-ssa-phiopt.c (xor_replacement): Delete.
> (tree_ssa_phiopt_worker): Delete use of xor_replacement.
> (match_simplify_replacement): Allow one cheap preparation
> statement that can be moved to before the if.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/pr96928-1.c: Fix testcase for now that ~
> happens on the outside of the bit_xor.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr96928-1.c |   4 +-
>  gcc/tree-ssa-phiopt.c | 164 +++---
>  2 files changed, 54 insertions(+), 114 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr96928-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr96928-1.c
> index a2770e5e896..2e86620da11 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr96928-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr96928-1.c
> @@ -1,9 +1,9 @@
>  /* PR tree-optimization/96928 */
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-phiopt2" } */
> +/* { dg-options "-O2 -fdump-tree-phiopt2 -fdump-tree-optimized" } */
>  /* { dg-final { scan-tree-dump-times " = a_\[0-9]*\\\(D\\\) >> " 5 "phiopt2" 
> } } */
>  /* { dg-final { scan-tree-dump-times " = ~c_\[0-9]*\\\(D\\\);" 1 "phiopt2" } 
> } */
> -/* { dg-final { scan-tree-dump-times " = ~" 1 "phiopt2" } } */
> +/* { dg-final { scan-tree-dump-times " = ~" 1 "optimized" } } */
>  /* { dg-final { scan-tree-dump-times " = \[abc_0-9\\\(\\\)D]* \\\^ " 5 
> "phiopt2" } } */
>  /* { dg-final { scan-tree-dump-not "a < 0" "phiopt2" } } */
>
> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> index 969b868397e..ab852ea1ad4 100644
> --- a/gcc/tree-ssa-phiopt.c
> +++ b/gcc/tree-ssa-phiopt.c
> @@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cfghooks.h"
>  #include "tree-pass.h"
>  #include "ssa.h"
> +#include "tree-ssa.h"
>  #include "optabs-tree.h"
>  #include "insn-config.h"
>  #include "gimple-pretty-print.h"
> @@ -63,8 +64,6 @@ static bool minmax_replacement (basic_block, basic_block,
> edge, edge, gphi *, tree, tree);
>  static bool abs_replacement (basic_block, basic_block,
>  edge, edge, gphi *, tree, tree);
> -static bool xor_replacement (basic_block, basic_block,
> -edge, edge, gphi *, tree, tree);
>  static bool spaceship_replacement (basic_block, basic_block,
>edge, edge, gphi *, tree, tree);
>  static bool cond_removal_in_popcount_clz_ctz_pattern (basic_block, 
> basic_block,
> @@ -352,9 +351,6 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool 
> do_hoist_loads, bool early_p)
> cfgchanged = true;
>   else if (abs_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
> cfgchanged = true;
> - else if (!early_p
> -  && xor_replacement (bb, bb1, e1, e2, phi, arg0, arg1))
> -   cfgchanged = true;
>   else if (!early_p
>&& cond_removal_in_popcount_clz_ctz_pattern (bb, bb1, e1,
> e2, phi, arg0,
> @@ -801,14 +797,51 @@ match_simplify_replacement (basic_block cond_bb, 
> basic_block middle_bb,
>edge true_edge, false_edge;
>gimple_seq seq = NULL;
>tree result;
> -
> -  if (!empty_block_p (middle_bb))
> -return false;
> +  gimple *stmt_to_move = NULL;
>
>/* Special case A ? B : B as this will always simplify to B. */
>if (operand_equal_for_phi_arg_p (arg0, arg1))
>  return false;
>
> +  /* If the basic block only has a cheap preparation statement,
> + allow it and move it once the transformation is done. */
> +  if (!empty_block_p (middle_bb))
> +{
> +  stmt_to_move = last_and_only_stmt (middle_bb);
> +  if (!stmt_to_move)
> +   return false;
> +
> +  if (gimple_vuse (stmt_to_move))
> +   return false;
> +
> +  if (gimple_could_trap_p (stmt_to_move)
> + || gimple_has_side_effects (stmt_to_move))
> +   return false;
> +
> +  if (gimple_uses_undefined_value_p (stmt_to_move))
> +   return false;
> +
> +  /* Allow assignments and not no calls.

"not no"

> +As const calls don't match any of the above, yet they could
> +still have some side-effects - they could contain
> +gimple_could_trap_p 

Re: [PATCH] libstdc++: add missing typename for dependent type in std::ranges::elements_view [PR100900]

2021-06-07 Thread Jonathan Wakely via Gcc-patches
The patch is approved, if Patrick doesn't do it I'll push it later
today. Thanks!




Re: [PATCH v2] predcom: Enabled by loop vect at O2 [PR100794]

2021-06-07 Thread Richard Biener via Gcc-patches
On Thu, Jun 3, 2021 at 5:33 AM Kewen.Lin  wrote:
>
> Hi Richard,
>
> on 2021/6/3 上午1:19, Richard Sandiford wrote:
> > "Kewen.Lin via Gcc-patches"  writes:
> >> Hi,
> >>
> >> As PR100794 shows, in the current implementation PRE bypasses
> >> some optimization to avoid introducing loop carried dependence
> > >> which stops the loop vectorizer from vectorizing the loop.  At -O2,
> >> there is no downstream pass to re-catch this kind of opportunity
> >> if loop vectorizer fails to vectorize that loop.
> >>
> > >> This patch follows Richi's suggestion in the PR: if the predcom flag
> > >> isn't set, loop vectorization implicitly enables predcom without any
> > >> unrolling.  The Power9 SPEC2017 evaluation showed it
> >> can speed up 521.wrf_r 3.30% and 554.roms_r 1.08% at very-cheap
> >> cost model, no remarkable impact at cheap cost model, the build
> >> time and size impact is fine (see the PR for the details).
> >>
> >> By the way, I tested another proposal to guard PRE not skip the
> >> optimization for cheap and very-cheap vect cost models, the
> >> evaluation results showed it's fine with very cheap cost model,
> >> but it can degrade some bmks like 521.wrf_r -9.17% and
> >> 549.fotonik3d_r -2.07% etc.
> >>
> >> Bootstrapped/regtested on powerpc64le-linux-gnu P9,
> >> x86_64-redhat-linux and aarch64-linux-gnu.
> >>
> >> Is it ok for trunk?
> >>
> >> BR,
> >> Kewen
> >> -
> >> gcc/ChangeLog:
> >>
> >>  PR tree-optimization/100794
> >>  * tree-predcom.c (tree_predictive_commoning_loop): Add parameter
> >>  allow_unroll_p and only allow unrolling when it's true.
> >>  (tree_predictive_commoning): Add parameter allow_unroll_p and
> >>  adjust for it.
> >>  (run_tree_predictive_commoning): Likewise.
> >>  (class pass_predcom): Add private member allow_unroll_p.
> >>  (pass_predcom::pass_predcom): Init allow_unroll_p.
> >>  (pass_predcom::gate): Check flag_tree_loop_vectorize and
> >>  global_options_set.x_flag_predictive_commoning.
> >>  (pass_predcom::execute): Adjust for allow_unroll_p.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  PR tree-optimization/100794
> >>  * gcc.dg/tree-ssa/pr100794.c: New test.
> >>
> >>  gcc/testsuite/gcc.dg/tree-ssa/pr100794.c | 20 +
> >>  gcc/tree-predcom.c   | 57 +---
> >>  2 files changed, 60 insertions(+), 17 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr100794.c
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr100794.c 
> >> b/gcc/testsuite/gcc.dg/tree-ssa/pr100794.c
> >> new file mode 100644
> >> index 000..6f707ae7fba
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr100794.c
> >> @@ -0,0 +1,20 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -ftree-loop-vectorize -fdump-tree-pcom-details 
> >> -fdisable-tree-vect" } */
> >> +
> >> +extern double arr[100];
> >> +extern double foo (double, double);
> >> +extern double sum;
> >> +
> >> +void
> >> +test (int i_0, int i_n)
> >> +{
> >> +  int i;
> >> +  for (i = i_0; i < i_n - 1; i++)
> >> +{
> >> +  double a = arr[i];
> >> +  double b = arr[i + 1];
> >> +  sum += a * b;
> >> +}
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump "Executing predictive commoning without 
> >> unrolling" "pcom" } } */
> >> diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
> >> index 02f911a08bb..65a93c8e505 100644
> >> --- a/gcc/tree-predcom.c
> >> +++ b/gcc/tree-predcom.c
> >> @@ -3178,13 +3178,13 @@ insert_init_seqs (class loop *loop, vec 
> >> chains)
> >> applied to this loop.  */
> >>
> >>  static unsigned
> >> -tree_predictive_commoning_loop (class loop *loop)
> >> +tree_predictive_commoning_loop (class loop *loop, bool allow_unroll_p)
> >>  {
> >>vec datarefs;
> >>vec dependences;
> >>struct component *components;
> >>vec chains = vNULL;
> >> -  unsigned unroll_factor;
> >> +  unsigned unroll_factor = 0;
> >>class tree_niter_desc desc;
> >>bool unroll = false, loop_closed_ssa = false;
> >>
> >> @@ -3272,11 +3272,13 @@ tree_predictive_commoning_loop (class loop *loop)
> >>dump_chains (dump_file, chains);
> >>  }
> >>
> >> -  /* Determine the unroll factor, and if the loop should be unrolled, 
> >> ensure
> >> - that its number of iterations is divisible by the factor.  */
> >> -  unroll_factor = determine_unroll_factor (chains);
> >> -  unroll = (unroll_factor > 1
> > >> -&& can_unroll_loop_p (loop, unroll_factor, &desc));
> >> +  if (allow_unroll_p)
> >> +/* Determine the unroll factor, and if the loop should be unrolled, 
> >> ensure
> >> +   that its number of iterations is divisible by the factor.  */
> >> +unroll_factor = determine_unroll_factor (chains);
> >> +
> >> +  if (unroll_factor > 1)
> > >> +unroll = can_unroll_loop_p (loop, unroll_factor, &desc);
> >>
> >>/* Execute the predictive commoning transformations, and possibly 
> >> unroll the
> >>   loop.  */
> >> @@ -3319,7 +3321,7 @@ 

Re: [PATCH] predcom: Adjust some unnecessary update_ssa calls

2021-06-07 Thread Richard Biener via Gcc-patches
On Wed, Jun 2, 2021 at 11:29 AM Kewen.Lin  wrote:
>
> Hi,
>
> As Richi suggested in PR100794, this patch is to remove
> some unnecessary update_ssa calls with flag
> TODO_update_ssa_only_virtuals, also do some refactoring.
>
> Bootstrapped/regtested on powerpc64le-linux-gnu P9,
> x86_64-redhat-linux and aarch64-linux-gnu, built well
> on Power9 ppc64le with --with-build-config=bootstrap-O3,
> and passed both P8 and P9 SPEC2017 full build with
> {-O3, -Ofast} + {,-funroll-loops}.
>
> Is it ok for trunk?

LGTM, minor comment on the fancy C++:

+  auto cleanup = [&]() {
+release_chains (chains);
+free_data_refs (datarefs);
+BITMAP_FREE (looparound_phis);
+free_affine_expand_cache (&name_expansions);
+  };

+  cleanup ();
+  return 0;

so that could have been

  class cleanup {
 ~cleanup()
{
  release_chains (chains);
  free_data_refs (datarefs);
  BITMAP_FREE (looparound_phis);
  free_affine_expand_cache (&name_expansions);
}
  } cleanup;

?  Or some other means of registering a RAII-style cleanup?
I mean, we can't wrap it all in

  try {...}
  finally {...}

because C++ doesn't have finally.
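
One common way to get "finally"-like behavior is a small scope guard whose destructor runs a callable. The sketch below is a generic illustration, not code from the patch: `scope_exit`, `work` and `cleanups` are hypothetical names, and the guard is spelled without class template argument deduction so it also builds as C++11.

```cpp
#include <cassert>
#include <utility>

// Minimal scope guard: runs the stored callable when it goes out of scope.
template <typename F>
class scope_exit
{
  F m_fn;
public:
  explicit scope_exit (F fn) : m_fn (std::move (fn)) {}
  ~scope_exit () { m_fn (); }
  scope_exit (const scope_exit &) = delete;
  scope_exit &operator= (const scope_exit &) = delete;
};

// Hypothetical function with two return paths; `cleanups` counts how often
// the guard has fired.
int work (bool bail_out, int &cleanups)
{
  auto fn = [&] { ++cleanups; };
  scope_exit<decltype (fn)> guard (fn);
  if (bail_out)
    return 0;   // guard fires here
  return 1;     // and here
}
```

Compared with calling a cleanup lambda by hand before each `return`, the guard cannot be forgotten on a newly added exit path.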

OK with this tiny part of the C++ refactoring delayed, but we can also simply
discuss best options.  At least for looparound_phis a good cleanup would
be to pass the bitmap around and use auto_bitmap local to
tree_predictive_commoning_loop ...

Thanks,
Richard.

> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * tree-predcom.c (execute_pred_commoning): Remove update_ssa call.
> (tree_predictive_commoning_loop): Factor some cleanup stuffs into
> lambda function cleanup, remove scev_reset call, and adjust return
> value.
> (tree_predictive_commoning): Adjust for different changed values,
> only set flag TODO_update_ssa_only_virtuals if changed.
> (pass_data pass_data_predcom): Remove TODO_update_ssa_only_virtuals
> from todo_flags_finish.
>


[committed] H8 -- use simple moves to eliminate redundant test/compares

2021-06-07 Thread Jeff Law


On the H8 most simple data movement instructions set the N and Z bits in 
the status register in useful ways.  This patch exposes that and as a 
result we eliminate more unnecessary test/compare instructions.  
Eliminating an SImode test is particularly profitable from a codesize 
standpoint (6 bytes).
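
For readers unfamiliar with the pattern, the sketch below shows the kind of source where a value is loaded and then immediately tested against zero. The function name is made up, and mapping `long` to SImode is an assumption about the target. Whether the compiler actually drops the separate test depends on the instructions it selects.

```cpp
#include <cassert>
#include <cstddef>

// Returns the first nonzero element, or 0 if there is none.
long first_nonzero (const long *p, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    {
      long v = p[i];   // the load (a move from memory)
      if (v != 0)      // the zero test that may reuse the move's N/Z flags
        return v;
    }
  return 0;
}
```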


Committed to the trunk.

Jeff
commit f0d1a675e0f621fc12c7a9db47446ae38289408a
Author: Jeff Law 
Date:   Sun Jun 6 00:44:13 2021 -0400

Use moves to eliminate redundant test/compare instructions

gcc/

* config/h8300/movepush.md: Change most _clobber_flags
patterns to instead use  subst.
(movsi_cczn): New pattern with usable CC cases split out.
(movsi_h8sx_cczn): Likewise.

diff --git a/gcc/config/h8300/movepush.md b/gcc/config/h8300/movepush.md
index 9ce00fb656c..ada4ddd0beb 100644
--- a/gcc/config/h8300/movepush.md
+++ b/gcc/config/h8300/movepush.md
@@ -13,7 +13,7 @@
   [(parallel [(set (match_dup 0) (match_dup 1))
  (clobber (reg:CC CC_REG))])])
 
-(define_insn "*movqi_clobber_flags"
+(define_insn "*movqi"
   [(set (match_operand:QI 0 "general_operand_dst" "=r,r ,<,r,r,m")
(match_operand:QI 1 "general_operand_src" " I,r>,r,n,m,r"))
(clobber (reg:CC CC_REG))]
@@ -36,7 +36,7 @@
   [(parallel [(set (match_dup 0) (match_dup 1))
  (clobber (reg:CC CC_REG))])])
 
-(define_insn "*movqi_h8sx_clobber_flags"
+(define_insn "*movqi_h8sx"
   [(set (match_operand:QI 0 "general_operand_dst" "=Z,rQ")
(match_operand:QI 1 "general_operand_src" "P4>X,rQi"))
(clobber (reg:CC CC_REG))]
@@ -74,7 +74,7 @@
  (clobber (reg:CC CC_REG))])])
 
 
-(define_insn "movstrictqi_clobber_flags"
+(define_insn "*movstrictqi"
   [(set (strict_low_part (match_operand:QI 0 "general_operand_dst" "+r,r"))
 (match_operand:QI 1 "general_operand_src" "I,rmi>"))
(clobber (reg:CC CC_REG))]
@@ -97,7 +97,7 @@
   [(parallel [(set (match_dup 0) (match_dup 1))
  (clobber (reg:CC CC_REG))])])
 
-(define_insn "*movhi_clobber_flags"
+(define_insn "*movhi"
   [(set (match_operand:HI 0 "general_operand_dst" "=r,r,<,r,r,m")
(match_operand:HI 1 "general_operand_src" "I,r>,r,i,m,r"))
(clobber (reg:CC CC_REG))]
@@ -121,7 +121,7 @@
   [(parallel [(set (match_dup 0) (match_dup 1))
  (clobber (reg:CC CC_REG))])])
   
-(define_insn "*movhi_h8sx_clobber_flags"
+(define_insn "*movhi_h8sx"
   [(set (match_operand:HI 0 "general_operand_dst" "=r,r,Z,Q,rQ")
(match_operand:HI 1 "general_operand_src" "I,P3>X,P4>X,IP8>X,rQi"))
(clobber (reg:CC CC_REG))]
@@ -144,7 +144,7 @@
   [(parallel [(set (strict_low_part (match_dup 0)) (match_dup 1))
  (clobber (reg:CC CC_REG))])])
 
-(define_insn "movstricthi_clobber_flags"
+(define_insn "*movstricthi"
   [(set (strict_low_part (match_operand:HI 0 "general_operand_dst" "+r,r,r"))
 (match_operand:HI 1 "general_operand_src" 
"I,P3>X,rmi"))
(clobber (reg:CC CC_REG))]
@@ -168,8 +168,8 @@
  (clobber (reg:CC CC_REG))])])
 
 (define_insn "*movsi_clobber_flags"
-  [(set (match_operand:SI 0 "general_operand_dst" "=r,r,r,<,r,r,m,*a,*a,r")
-   (match_operand:SI 1 "general_operand_src" "I,r,i,r,>,m,r,I,r,*a"))
+  [(set (match_operand:SI 0 "general_operand_dst" "=r,r,r,<,r,r,m,*a,*a, r")
+   (match_operand:SI 1 "general_operand_src" " I,r,i,r,>,m,r, I, r,*a"))
(clobber (reg:CC CC_REG))]
   "(TARGET_H8300S || TARGET_H8300H) && !TARGET_H8300SX
 && h8300_move_ok (operands[0], operands[1])"
@@ -235,6 +235,25 @@
 }
   [(set (attr "length") (symbol_ref "compute_mov_length (operands)"))])
 
+(define_insn "*movsi_cczn"
+  [(set (reg:CCZN CC_REG)
+   (compare:CCZN
+ (match_operand:SI 1 "general_operand_src" " I,r,i,r,>,m,r")
+ (const_int 0)))
+   (set (match_operand:SI 0 "general_operand_dst" "=r,r,r,<,r,r,m")
+   (match_dup 1))]
+  "(TARGET_H8300S || TARGET_H8300H) && !TARGET_H8300SX
+&& h8300_move_ok (operands[0], operands[1])"
+  "@
+   sub.l   %S0,%S0
+   mov.l   %S1,%S0
+   mov.l   %S1,%S0
+   mov.l   %S1,%S0
+   mov.l   %S1,%S0
+   mov.l   %S1,%S0
+   mov.l   %S1,%S0"
+  [(set (attr "length") (symbol_ref "compute_mov_length (operands)"))])
+
 (define_insn_and_split "*movsi_h8sx"
   [(set (match_operand:SI 0 "general_operand_dst" "=r,r,Q,rQ,*a,*a,r")
(match_operand:SI 1 "general_operand_src" "I,P3>X,IP8>X,rQi,I,r,*a"))]
@@ -260,6 +279,22 @@
   [(set_attr "length_table" "*,*,short_immediate,movl,*,*,*")
(set_attr "length" "2,2,*,*,2,6,4")])
 
+(define_insn "*movsi_h8sx_ccnz"
+  [(set (reg:CCZN CC_REG)
+   (compare:CCZN
+ (match_operand:SI 1 "general_operand_src" "I,P3>X,IP8>X,rQi")
+ (const_int 0)))
+   (set (match_operand:SI 0 "general_operand_dst" "=r,r,Q,rQ")
+   (match_dup 1))]
+  "TARGET_H8300SX"
+  "@
+   sub.l   %S0,%S0
+   mov.l   %S1:3,%S0
+   mov.l   %S1,%S0
+   mov.l   

[PATCH] x86: Update g++.target/i386/pr100885.C

2021-06-07 Thread H.J. Lu via Gcc-patches
On Fri, Jun 4, 2021 at 12:47 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Fri, Jun 04, 2021 at 01:03:58AM +, Liu, Hongtao wrote:
> > Thanks for the review.
> > Yes, you're right, AVX512VL parts are already guaranteed by 
> > ix86_hard_regno_mode_ok.
> >
> > Here is updated patch.
>
> One remaining thing, could you try to modify the testcase back to
> #include  and using intrinsics instead of the target builtins,
> so that next time we replace some builtins we don't have to adjust the
> testcase (and of course verify that without your patch it still ICEs and
> with your patch it doesn't)?
>
> Ok for trunk with that change.
>
> Jakub
>

I am checking in this as an obvious fix.

-- 
H.J.
From 6dcfc0b15c4bcb97ab5eb9c4d78c383fe38cba51 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 7 Jun 2021 07:29:31 -0700
Subject: [PATCH] x86: Update g++.target/i386/pr100885.C

Since long is 32 bits for x32, update g++.target/i386/pr100885.C to cast
__m64 to long long for x32.

	PR target/100885
	* g++.target/i386/pr100885.C (_mm_set_epi64): Cast __m64 to long
	long.
---
 gcc/testsuite/g++.target/i386/pr100885.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.target/i386/pr100885.C b/gcc/testsuite/g++.target/i386/pr100885.C
index 08a5bdd02a2..bec08f7e96d 100644
--- a/gcc/testsuite/g++.target/i386/pr100885.C
+++ b/gcc/testsuite/g++.target/i386/pr100885.C
@@ -33,7 +33,7 @@ protected:
   }
 };
 __m128i _mm_set_epi64(__m64 __q0) {
-  __m128i __trans_tmp_5{(long)__q0};
+  __m128i __trans_tmp_5{(long long)__q0};
   return __trans_tmp_5;
 }
 long _mm_storel_epi64___P, Draw_dsts;
-- 
2.31.1
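
The reason for the cast change can be checked with a few lines of portable code. This is only a sketch with made-up helper names. The out-of-range conversion in `round_trips_through_long` is implementation-defined before C++20; the note below assumes the wrap-around behavior GCC documents.

```cpp
#include <cassert>
#include <climits>

// True when `long` is wide enough to hold a 64-bit payload without truncation.
constexpr bool long_holds_64_bits ()
{
  return sizeof (long) * CHAR_BIT >= 64;
}

// True when converting `v` to long and back preserves its value.
inline bool round_trips_through_long (long long v)
{
  return static_cast<long long> (static_cast<long> (v)) == v;
}

// `long long` is at least 64 bits wide on every conforming implementation.
static_assert (sizeof (long long) * CHAR_BIT >= 64,
               "long long is at least 64 bits");
```

On an LP64 target both helpers return true for every input; on a target with 32-bit `long`, such as x32, large values would not survive the round trip.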



Re: [PATCH PR100740]Fix overflow check in simplifying exit cond comparing two IVs.

2021-06-07 Thread Richard Biener via Gcc-patches
On Sun, Jun 6, 2021 at 12:01 PM Bin.Cheng  wrote:
>
> On Wed, Jun 2, 2021 at 3:28 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Tue, Jun 1, 2021 at 4:00 PM bin.cheng via Gcc-patches
> >  wrote:
> > >
> > > Hi,
> > > As described in patch summary, this fixes the wrong code issue by adding 
> > > overflow-ness
> > > check for iv1.step - iv2.step.
> > >
> > > Bootstrap and test on x86_64.  Any comments?
> >
> > + bool wrap_p = TYPE_OVERFLOW_WRAPS (step_type);
> > + if (wrap_p)
> > +   {
> > + tree t = fold_binary_to_constant (GE_EXPR, step_type,
> > +   iv0->step, iv1->step);
> > + wrap_p = integer_zerop (t);
> > +   }
> >
> > I think we can't use TYPE_OVERFLOW_WRAPS/TYPE_OVERFLOW_UNDEFINED since
> > that's only relevant for expressions written by the user - we're
> > computing iv0.step - iv1.step
> > which can even overflow when TYPE_OVERFLOW_UNDEFINED (in fact we may not
> > even generate this expression then!).  So I think we have to do sth like
> >
> >/* If the iv0->step - iv1->step wraps, fail.  */
> >if (!operand_equal_p (iv0->step, iv1->step)
> >&& (TREE_CODE (iv0->step) != INTEGER_CST || TREE_CODE
> > (iv1->step) != INTEGER_CST)
> >&& !wi::gt (wi::to_widest (iv0->step), wi::to_widest (iv1->step))
> >  return false;
> >
> > which only handles equality and all integer constant steps. You could
> Thanks for the suggestion.  I realized that we have LE/LT/NE
> conditions here, and for LE/LT what we need to check is iv0/iv1
> converge to each other, rather than diverge.  Also steps here can only
> be constants, so there is no need to use range information.

Ah, that simplifies things.

+ if (tree_int_cst_lt (iv0->step, iv1->step))
+   return false;

so it looks to me that iv?->step can be negative which means we should
verify that abs(iv0->step - iv1->step) <= abs (iv0->step), correct?

   tree step = fold_binary_to_constant (MINUS_EXPR, step_type,
   iv0->step, iv1->step);
...
+ if (TREE_CODE (step) != INTEGER_CST)
+   return false;

note fold_binary_to_constant will return NULL if the result is not
TREE_CONSTANT (which would also include symbolic constants
like  - ).  It wasn't checked before, of course but since we're
touching the code we might very well be checking for NULL step ...
(or assert it is not for documentation purposes).

That said, if iv0->step and iv1->step are known INTEGER_CSTs
(I think they indeed are given the constraints we impose on
simple_iv_with_niters).

That said, with just a quick look again it looks to me the
IV1 {<=,<} IV2 transform to IV1 - IV2step {<=,<} IV2base
is OK whenever the effective step magnitude on the IV1'
decreases, thus abs(IV1.step - IV2.step) <= abs(IV1.step)
since then IV1 is still guaranteed to not overflow.  But
for example {0, +, 1} and {10, -, 1} also converge if the
number of iterations is less than 10 but they would not pass
this criteria.  So I'm not sure "convergence" is a good wording
here - but maybe I'm missing some critical piece of understanding
here.
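
The criterion sketched above, abs(IV1.step - IV2.step) <= abs(IV1.step), can be written down for plain 64-bit integers as follows. This is an illustration of the arithmetic only, not the tree-level code: `magnitude` and `combined_step_shrinks` are hypothetical names, and `__builtin_sub_overflow` is the GCC/Clang overflow-checking builtin.

```cpp
#include <cassert>
#include <cstdint>

// Magnitude of a signed 64-bit value as unsigned; safe for INT64_MIN.
static std::uint64_t magnitude (std::int64_t v)
{
  return v < 0 ? 0 - static_cast<std::uint64_t> (v)
               : static_cast<std::uint64_t> (v);
}

// True when step0 - step1 is representable and no larger in magnitude
// than step0 itself.
bool combined_step_shrinks (std::int64_t step0, std::int64_t step1)
{
  std::int64_t diff;
  if (__builtin_sub_overflow (step0, step1, &diff))
    return false;   // the difference itself wraps
  return magnitude (diff) <= magnitude (step0);
}
```

With steps 2 and 1 the check passes (difference 1), while the {0, +, 1} and {10, -, 1} pair from the paragraph above fails it (steps 1 and -1, difference 2), matching the observation that converging IVs need not satisfy this criterion.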

But in any case it looks like we're on the right track ;)

Thanks,
Richard.

> > also use ranges
> > like
> >
> >  wide_int min0, max0, min1, max1;
> >   if (!operand_equal_p (iv0->step, iv1->step)
> >   && (determine_value_range (iv0->step, &min0, &max0) != VR_RANGE
> >  || determine_value_range (iv1->step, &min1, &max1) != VR_RANGE
> >  || !wi::ge (min0, max1)))
> >return false;
> >
> > Note I'm not sure why
> >
> >iv0->step = step;
> >if (!POINTER_TYPE_P (type))
> > iv0->no_overflow = false;
> I don't exactly remember, this was added sometime when no_overflow was
> introduced.  Note we only do various checks for non NE_EXPR so the
> step isn't always less in absolute value?  I will check if we should
> reset it in all cases.
>
> Patch updated.  test ongoing.
>
> Thanks,
> bin
> >
> > here the no_overflow reset does not happen for pointer types?  Or
> > rather why does
> > it happen at all?  Don't we strictly make the step less in absolute value?
> >
> > > Thanks,
> > > bin


Re: [RFC/PATCH] updating global ranges and their effect on __builtin_unreachable code

2021-06-07 Thread Andrew MacLeod via Gcc-patches

On 6/7/21 9:45 AM, Richard Biener wrote:

On Mon, Jun 7, 2021 at 3:37 PM Andrew MacLeod  wrote:

On 6/7/21 3:25 AM, Richard Biener wrote:

On Wed, Jun 2, 2021 at 2:53 PM Andrew MacLeod  wrote:

On 6/2/21 7:52 AM, Richard Biener wrote:

On Wed, Jun 2, 2021 at 12:34 PM Aldy Hernandez via Gcc-patches
 wrote:

We've been having "issues" in our branch when exporting to the global
space ranges that take into account previously known ranges
(SSA_NAME_RANGE_INFO, etc).  For the longest time we had the export
feature turned off because it had the potential of removing
__builtin_unreachable code early in the pipeline.  This was causing one
or two tests to fail.

I finally got fed up, and investigated why.

Take the following code:

  i_4 = somerandom ();
  if (i_4 < 0)
goto <bb 3>; [INV]
  else
goto <bb 4>; [INV]

   <bb 3> :
  __builtin_unreachable ();

   <bb 4> :

It turns out that both legacy evrp and VRP have code that notices the
above pattern and sets the *global* range for i_4 to [0,MAX].  That is,
the range for i_4 is set, not at BB4, but at the definition site.  See
uses of assert_unreachable_fallthru_edge_p() for details.

This global range causes subsequent passes (VRP1 in the testcase below),
to remove the checks and the __builtin_unreachable code altogether.

// pr80776-1.c
int somerandom (void);
void
Foo (void)
{
  int i = somerandom ();
  if (! (0 <= i))
__builtin_unreachable ();
  if (! (0 <= i && i <= 99))
__builtin_unreachable ();
  sprintf (number, "%d", i);
}

This means that by the time the -Wformat-overflow warning runs, the
above sprintf has been left unguarded, and a bogus warning is issued.
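
As a standalone illustration of how `__builtin_unreachable` acts as a range hint in front of a formatted write, consider the sketch below. The function `format_small` and its 3-byte buffer are assumptions made for this example and are not part of pr80776-1.c. Calling it with a value outside [0, 99] is undefined behavior.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Formats a value the caller promises is in [0, 99]; two digits plus the
// terminating NUL fit the 3-byte buffer exactly.
int format_small (char (&buf)[3], int i)
{
  if (!(0 <= i && i <= 99))
    __builtin_unreachable ();   // tells the optimizer the range of i
  return std::snprintf (buf, sizeof buf, "%d", i);
}
```

If an earlier pass folds the range into a global fact and deletes the guard, later warning passes only stay quiet as long as that range information is still attached to the value they inspect.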

Currently the above test does not warn, but that's because of an
oversight in export_global_ranges().  This function is disregarding
known global ranges (SSA_NAME_RANGE_INFO and SSA_NAME_PTR_INFO) and only
setting ranges the ranger knows about.

For the above test the IL is:

   :
  i_4 = somerandom ();
  if (i_4 < 0)
goto ; [INV]
  else
goto ; [INV]

   :
  __builtin_unreachable ();

   :
  i.0_1 = (unsigned int) i_4;
  if (i.0_1 > 99)
goto ; [INV]
  else
goto ; [INV]

   :
  __builtin_unreachable ();

   :
  _7 = __builtin___sprintf_chk (, 1, 7, "%d", i_4);


Legacy evrp has determined that the range for i_4 is [0,MAX] per my
analysis above, but ranger has no known range for i_4 at the definition
site.  So at export_global_ranges time, ranger leaves the [0,MAX] alone.

OTOH, evrp sets the global range at the definition for i.0_1 to
[0,99] per the same unreachable feature.  However, ranger has
correctly determined that the range for i.0_1 at the definition is
[0,MAX], which it then proceeds to export.  Since the current
export_global_ranges (mistakenly) does not take into account previous
global ranges, the ranges in the global tables end up like this:

i_4: [0, MAX]
i.0_1: [0, MAX]

This causes the first unreachable block to be removed in VRP1, but the
second one to remain.  Later VRP can determine that i_4 in the sprintf
call is [0,99], and no warning is issued.

But... the missing bogus warning is due to current export_global_ranges
ignoring SSA_NAME_RANGE_INFO and friends, something which I'd like to
fix.  However, fixing this, gets us back to:

i_4: [0, MAX]
i.0_1: [0, 99]

Which means, we'll be back to removing the unreachable blocks and
issuing a warning in pr80776-1.c (like we have been since the beginning
of time).
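
The difference between the two tables above can be modelled as "overwrite the stored range" versus "intersect with the stored range". This is a toy sketch under that assumption, using hypothetical names; it is not the export_global_ranges code, it only models the i.0_1 case, and it uses INT_MAX as a stand-in for MAX.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>

// A closed integer interval [lo, hi].
struct toy_range
{
  long long lo;
  long long hi;
};

// Intersection of two overlapping intervals.
toy_range toy_intersect (toy_range a, toy_range b)
{
  return { std::max (a.lo, b.lo), std::min (a.hi, b.hi) };
}

// Models the behaviour described as an oversight: the previously stored
// global range is disregarded and the newly computed one wins.
toy_range export_ignoring_previous (toy_range, toy_range computed)
{
  return computed;
}

// Models the intended behaviour: the new range can only refine what
// was already known.
toy_range export_refining_previous (toy_range previous, toy_range computed)
{
  return toy_intersect (previous, computed);
}
```

For i.0_1, the stored range is [0,99] and the computed one is [0,MAX]: the first function yields [0,MAX], the second keeps [0,99].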

The attached patch fixes export_global_ranges to the expected behavior,
and adds the previous XFAIL to pr80776-1.c, while documenting why this
warning is issued in the first place.

Once legacy evrp is removed, this won't be an issue, as ranges in the IL
will tell the truth.  However, this will mean that we will no longer
remove the first __builtin_unreachable combo.  But ISTM, that would be
correct behavior ??.

BTW, in addition to this patch we could explore removing the
assert_unreachable_fallthru_edge_p() use in the evrp_analyzer, since it
is no longer needed to get the warnings in the testcases in the original
PR correctly (gcc.dg/pr80776-[12].c).

But the whole point of all this singing and dancing is not to make
warnings but to be able to implement assert (); or assume (); that
will result in no code but optimization based on the assumption.

That means that all the checks guarding __builtin_unreachable ()
should be removed at the GIMPLE level - just not too early
to preserve range info on the variables participating in the
guarding condition.
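
As an illustration of the assume () idea, here is a minimal sketch built on __builtin_unreachable (a GCC and Clang builtin). The ASSUME macro and the function name are hypothetical; whether the guard is removed, and in which pass, is up to the compiler.

```cpp
#include <cassert>

// Hypothetical assume() macro: if the condition is false, control would
// reach __builtin_unreachable (), so the compiler may take the condition
// as given.  Violating the assumption at run time is undefined behaviour.
#define ASSUME(cond) \
  do { if (!(cond)) __builtin_unreachable (); } while (0)

// Number of decimal digits of i, valid only for 0 <= i <= 99.
int decimal_digits (int i)
{
  ASSUME (0 <= i && i <= 99);
  return i < 10 ? 1 : 2;
}
```

The intent is that the check itself produces no code once the range for i has been recorded.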

So yes, it sounds fragile but instead it's carefully architected.  Heh.

In particular it is designed so that early optimization leaves those
unreachable () around (for later LTO consumption and inlining, etc.
to be able to re-create the ranges) whilst VRP1 / DOM will end up
eliminating them.  I think we have testcases that verify said behavior,
namely optimize out range checks 

Re: [PATCH v2 1/2] Allow vec_duplicate_optab to fail

2021-06-07 Thread H.J. Lu via Gcc-patches
On Mon, Jun 7, 2021 at 12:12 AM Richard Sandiford
 wrote:
>
> "H.J. Lu"  writes:
> > Update vec_duplicate to allow it to fail, so that a backend can allow
> > broadcasting an integer constant to a vector only when a broadcast
> > instruction is available.
>
> I'm not sure why we need this to fail though.  Once the optab is defined
> for target X, the optab should handle all duplicates for target X,
> even if there are different strategies it can use.
>
> AIUI the case you want to make conditional is the constant case.
> I guess the first question is: why don't we simplify those CONSTRUCTORs
> to VECTOR_CSTs in gimple?  I'm surprised we still see the constant case
> as a constructor here.

The particular testcase for vec_duplicate is gcc.dg/pr100239.c.

> If we can't rely on that happening, then would it work to change:
>
> /* Try using vec_duplicate_optab for uniform vectors.  */
> if (!TREE_SIDE_EFFECTS (exp)
> && VECTOR_MODE_P (mode)
> && eltmode == GET_MODE_INNER (mode)
> && ((icode = optab_handler (vec_duplicate_optab, mode))
> != CODE_FOR_nothing)
> && (elt = uniform_vector_p (exp)))
>
> to something like:
>
> /* Try using vec_duplicate_optab for uniform vectors.  */
> if (!TREE_SIDE_EFFECTS (exp)
> && VECTOR_MODE_P (mode)
> && eltmode == GET_MODE_INNER (mode)
> && (elt = uniform_vector_p (exp)))
>   {
> if (TREE_CODE (elt) == INTEGER_CST
> || TREE_CODE (elt) == POLY_INT_CST
> || TREE_CODE (elt) == REAL_CST
> || TREE_CODE (elt) == FIXED_CST)
>   {
> rtx src = gen_const_vec_duplicate (mode, expand_normal (node));
> emit_move_insn (target, src);
> break;
>   }
> …
>   }
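
The decision structure being suggested can be sketched outside of GCC as follows. This is a toy model with hypothetical types and names, not expander code; in particular, it assumes that the elided part of the suggestion falls back to the duplicate instruction only when the target provides one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy vector element: either a constant or some non-constant value,
// identified by id.
struct toy_elt
{
  bool is_constant;
  long id;
};

enum class toy_strategy { constant_vector, duplicate_insn, generic };

// Return the repeated element if all elements are identical, else null
// (a stand-in for the role uniform_vector_p plays in the quoted code).
const toy_elt *toy_uniform_element (const std::vector<toy_elt> &elts)
{
  if (elts.empty ())
    return nullptr;
  for (std::size_t i = 1; i < elts.size (); ++i)
    if (elts[i].is_constant != elts[0].is_constant
        || elts[i].id != elts[0].id)
      return nullptr;
  return &elts[0];
}

// Constants become a constant vector regardless of the target; other
// uniform vectors use the duplicate instruction only if it exists.
toy_strategy toy_choose (const std::vector<toy_elt> &elts,
                         bool have_duplicate_insn)
{
  if (const toy_elt *elt = toy_uniform_element (elts))
    {
      if (elt->is_constant)
        return toy_strategy::constant_vector;
      if (have_duplicate_insn)
        return toy_strategy::duplicate_insn;
    }
  return toy_strategy::generic;
}
```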

I will give it a try.

Thanks.

-- 
H.J.


Re: RFC: Sphinx for GCC documentation

2021-06-07 Thread Tobias Burnus



On 07.06.21 15:28, Martin Liška wrote:

* I note that we write, before the argument index, that those are without the
   -/-- prefix, but that's not true. Something to fix after the conversion.


Can you please show me a few examples of it?


* https://splichal.eu/scripts/sphinx/gfortran/_build/html/option-index.html

* https://splichal.eu/scripts/sphinx/gcc/_build/html/option-index.html

* https://splichal.eu/scripts/sphinx/cpp/_build/html/option-index.html

Read the text on those pages, and then look at the converted index,
which has all the '-' and '--' before the options.

That's not a conversion bug, but a change which needs to be done as part of
or after the conversion.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf


Re: [Patch] Fortran/OpenMP: Fix clause splitting for target/parallel/teams [PR99928]

2021-06-07 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 07, 2021 at 03:21:10PM +0200, Tobias Burnus wrote:
> +static void
> +gfc_add_clause_implicitly (gfc_omp_clauses *clauses_out,
> +gfc_omp_clauses *clauses_in,
> +bool is_target, bool is_parallel_do)
> +{
> +  int clauselist_to_add = is_target ? OMP_LIST_MAP : OMP_LIST_SHARED;

For SHARED with array sections in a reduction, the shared clause should cover
the whole variable rather than just the array section.
But I guess that can only be solved once the support is there.

> +   if (clauses_out->lists[clauselist_to_add]
> +   && clauses_out->lists[clauselist_to_add] == clauses_in->lists[clauselist_to_add])

Too long line.

Otherwise LGTM.

Jakub



Re: [RFC/PATCH] updating global ranges and their effect on __builtin_unreachable code

2021-06-07 Thread Richard Biener via Gcc-patches
On Mon, Jun 7, 2021 at 3:37 PM Andrew MacLeod  wrote:
>
> On 6/7/21 3:25 AM, Richard Biener wrote:
> > On Wed, Jun 2, 2021 at 2:53 PM Andrew MacLeod  wrote:
> >> On 6/2/21 7:52 AM, Richard Biener wrote:
> >>> On Wed, Jun 2, 2021 at 12:34 PM Aldy Hernandez via Gcc-patches
> >>>  wrote:
>  We've been having "issues" in our branch when exporting to the global
>  space ranges that take into account previously known ranges
>  (SSA_NAME_RANGE_INFO, etc).  For the longest time we had the export
>  feature turned off because it had the potential of removing
>  __builtin_unreachable code early in the pipeline.  This was causing one
>  or two tests to fail.
> 
>  I finally got fed up, and investigated why.
> 
>  Take the following code:
> 
>   i_4 = somerandom ();
>   if (i_4 < 0)
> goto ; [INV]
>   else
> goto ; [INV]
> 
>    :
>   __builtin_unreachable ();
> 
>    :
> 
>  It turns out that both legacy evrp and VRP have code that notices the
>  above pattern and sets the *global* range for i_4 to [0,MAX].  That is,
>  the range for i_4 is set, not at BB4, but at the definition site.  See
>  uses of assert_unreachable_fallthru_edge_p() for details.
> 
>  This global range causes subsequent passes (VRP1 in the testcase below),
>  to remove the checks and the __builtin_unreachable code altogether.
> 
>  // pr80776-1.c
>  int somerandom (void);
>  void
>  Foo (void)
>  {
>   int i = somerandom ();
>   if (! (0 <= i))
> __builtin_unreachable ();
>   if (! (0 <= i && i <= 99))
> __builtin_unreachable ();
>   sprintf (number, "%d", i);
>  }
> 
>  This means that by the time the -Wformat-overflow warning runs, the
>  above sprintf has been left unguarded, and a bogus warning is issued.
> 
>  Currently the above test does not warn, but that's because of an
>  oversight in export_global_ranges().  This function is disregarding
>  known global ranges (SSA_NAME_RANGE_INFO and SSA_NAME_PTR_INFO) and only
>  setting ranges the ranger knows about.
> 
>  For the above test the IL is:
> 
>    :
>   i_4 = somerandom ();
>   if (i_4 < 0)
> goto ; [INV]
>   else
> goto ; [INV]
> 
>    :
>   __builtin_unreachable ();
> 
>    :
>   i.0_1 = (unsigned int) i_4;
>   if (i.0_1 > 99)
> goto ; [INV]
>   else
> goto ; [INV]
> 
>    :
>   __builtin_unreachable ();
> 
>    :
>   _7 = __builtin___sprintf_chk (, 1, 7, "%d", i_4);
> 
> 
>  Legacy evrp has determined that the range for i_4 is [0,MAX] per my
>  analysis above, but ranger has no known range for i_4 at the definition
>  site.  So at export_global_ranges time, ranger leaves the [0,MAX] alone.
> 
>  OTOH, evrp sets the global range at the definition for i.0_1 to
>  [0,99] per the same unreachable feature.  However, ranger has
>  correctly determined that the range for i.0_1 at the definition is
>  [0,MAX], which it then proceeds to export.  Since the current
>  export_global_ranges (mistakenly) does not take into account previous
>  global ranges, the ranges in the global tables end up like this:
> 
>  i_4: [0, MAX]
>  i.0_1: [0, MAX]
> 
>  This causes the first unreachable block to be removed in VRP1, but the
>  second one to remain.  Later VRP can determine that i_4 in the sprintf
>  call is [0,99], and no warning is issued.
> 
>  But... the missing bogus warning is due to current export_global_ranges
>  ignoring SSA_NAME_RANGE_INFO and friends, something which I'd like to
>  fix.  However, fixing this, gets us back to:
> 
>  i_4: [0, MAX]
>  i.0_1: [0, 99]
> 
>  Which means, we'll be back to removing the unreachable blocks and
>  issuing a warning in pr80776-1.c (like we have been since the beginning
>  of time).
> 
>  The attached patch fixes export_global_ranges to the expected behavior,
>  and adds the previous XFAIL to pr80776-1.c, while documenting why this
>  warning is issued in the first place.
> 
>  Once legacy evrp is removed, this won't be an issue, as ranges in the IL
>  will tell the truth.  However, this will mean that we will no longer
>  remove the first __builtin_unreachable combo.  But ISTM, that would be
>  correct behavior ??.
> 
>  BTW, in addition to this patch we could explore removing the
>  assert_unreachable_fallthru_edge_p() use in the evrp_analyzer, since it
>  is no longer needed to get the warnings in the testcases in the original
>  PR correctly (gcc.dg/pr80776-[12].c).
> >>> But the whole point of all this 

Re: [RFC/PATCH] updating global ranges and their effect on __builtin_unreachable code

2021-06-07 Thread Andrew MacLeod via Gcc-patches

On 6/7/21 3:25 AM, Richard Biener wrote:

On Wed, Jun 2, 2021 at 2:53 PM Andrew MacLeod  wrote:

On 6/2/21 7:52 AM, Richard Biener wrote:

On Wed, Jun 2, 2021 at 12:34 PM Aldy Hernandez via Gcc-patches
 wrote:

We've been having "issues" in our branch when exporting to the global
space ranges that take into account previously known ranges
(SSA_NAME_RANGE_INFO, etc).  For the longest time we had the export
feature turned off because it had the potential of removing
__builtin_unreachable code early in the pipeline.  This was causing one
or two tests to fail.

I finally got fed up, and investigated why.

Take the following code:

 i_4 = somerandom ();
 if (i_4 < 0)
   goto ; [INV]
 else
   goto ; [INV]

  :
 __builtin_unreachable ();

  :

It turns out that both legacy evrp and VRP have code that notices the
above pattern and sets the *global* range for i_4 to [0,MAX].  That is,
the range for i_4 is set, not at BB4, but at the definition site.  See
uses of assert_unreachable_fallthru_edge_p() for details.

This global range causes subsequent passes (VRP1 in the testcase below),
to remove the checks and the __builtin_unreachable code altogether.

// pr80776-1.c
int somerandom (void);
void
Foo (void)
{
 int i = somerandom ();
 if (! (0 <= i))
   __builtin_unreachable ();
 if (! (0 <= i && i <= 99))
   __builtin_unreachable ();
 sprintf (number, "%d", i);
}

This means that by the time the -Wformat-overflow warning runs, the
above sprintf has been left unguarded, and a bogus warning is issued.

Currently the above test does not warn, but that's because of an
oversight in export_global_ranges().  This function is disregarding
known global ranges (SSA_NAME_RANGE_INFO and SSA_NAME_PTR_INFO) and only
setting ranges the ranger knows about.

For the above test the IL is:

  :
 i_4 = somerandom ();
 if (i_4 < 0)
   goto ; [INV]
 else
   goto ; [INV]

  :
 __builtin_unreachable ();

  :
 i.0_1 = (unsigned int) i_4;
 if (i.0_1 > 99)
   goto ; [INV]
 else
   goto ; [INV]

  :
 __builtin_unreachable ();

  :
 _7 = __builtin___sprintf_chk (, 1, 7, "%d", i_4);


Legacy evrp has determined that the range for i_4 is [0,MAX] per my
analysis above, but ranger has no known range for i_4 at the definition
site.  So at export_global_ranges time, ranger leaves the [0,MAX] alone.

OTOH, evrp sets the global range at the definition for i.0_1 to
[0,99] per the same unreachable feature.  However, ranger has
correctly determined that the range for i.0_1 at the definition is
[0,MAX], which it then proceeds to export.  Since the current
export_global_ranges (mistakenly) does not take into account previous
global ranges, the ranges in the global tables end up like this:

i_4: [0, MAX]
i.0_1: [0, MAX]

This causes the first unreachable block to be removed in VRP1, but the
second one to remain.  Later VRP can determine that i_4 in the sprintf
call is [0,99], and no warning is issued.

But... the missing bogus warning is due to current export_global_ranges
ignoring SSA_NAME_RANGE_INFO and friends, something which I'd like to
fix.  However, fixing this, gets us back to:

i_4: [0, MAX]
i.0_1: [0, 99]

Which means, we'll be back to removing the unreachable blocks and
issuing a warning in pr80776-1.c (like we have been since the beginning
of time).

The attached patch fixes export_global_ranges to the expected behavior,
and adds the previous XFAIL to pr80776-1.c, while documenting why this
warning is issued in the first place.

Once legacy evrp is removed, this won't be an issue, as ranges in the IL
will tell the truth.  However, this will mean that we will no longer
remove the first __builtin_unreachable combo.  But ISTM, that would be
correct behavior ??.

BTW, in addition to this patch we could explore removing the
assert_unreachable_fallthru_edge_p() use in the evrp_analyzer, since it
is no longer needed to get the warnings in the testcases in the original
PR correctly (gcc.dg/pr80776-[12].c).

But the whole point of all this singing and dancing is not to make
warnings but to be able to implement assert (); or assume (); that
will result in no code but optimization based on the assumption.

That means that all the checks guarding __builtin_unreachable ()
should be removed at the GIMPLE level - just not too early
to preserve range info on the variables participating in the
guarding condition.

So yes, it sounds fragile but instead it's carefully architected.  Heh.

In particular it is designed so that early optimization leaves those
unreachable () around (for later LTO consumption and inlining, etc.
to be able to re-create the ranges) whilst VRP1 / DOM will end up
eliminating them.  I think we have testcases that verify said behavior,
namely optimize out range checks based on the assertions - maybe missed
the case where this only happens after inlining (important for your friendly
C++ abstraction 

Re: [PATCH 02/57] Support scanning of build-time GC roots in gengtype

2021-06-07 Thread Richard Biener via Gcc-patches
On Mon, Jun 7, 2021 at 2:35 PM Bill Schmidt  wrote:
>
> On 6/7/21 5:39 AM, Richard Sandiford wrote:
> > Bill Schmidt via Gcc-patches  writes:
> >> On 5/20/21 5:24 PM, Segher Boessenkool wrote:
> >>> On Tue, May 11, 2021 at 11:01:22AM -0500, Bill Schmidt wrote:
>  Hi!  I'd like to ping this specific patch from the series, which is the
>  only one remaining that affects common code.  I confess that I don't
>  know whom to ask for a review for gengtype; I didn't get any good ideas
>  from MAINTAINERS.  If you know of a good reviewer candidate, please CC
>  them.
> >>> Richard is listed as the "gen* on machine desc" maintainer, that might
> >>> be the closest to this.  cc:ed.
> >> Hi, Richard -- any thoughts on this patch?
> >>
> >> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568841.html
> > I don't really know gengtype.c, sorry.  (The gen* thing was for
> > the md generators, not genmatch and gengtype.)
> >
> > Richard
>
> OK, thanks, Richard!
>
> Richi, from git blame, it appears everyone that used to know about
> gengtype has moved on.

Ouch.

> Would you be able to give a quick peek at this?
> It's a pretty uninteresting patch, all in all. :-)

Some maybe obvious issue - what about DOS-style path hosts?
You seem to build ../ strings to point to parent dirs...  I'm not sure
what we do elsewhere - I suppose we arrange for appropriate
-I command line arguments?

Richard.

>
> Thanks,
> Bill
>


Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-07 Thread Richard Biener via Gcc-patches
On Mon, Jun 7, 2021 at 12:33 PM Trevor Saunders  wrote:
>
> On Mon, Jun 07, 2021 at 10:51:18AM +0200, Richard Biener via Gcc-patches 
> wrote:
> > On Thu, Jun 3, 2021 at 10:29 AM Trevor Saunders  
> > wrote:
> > >
> > > On Wed, Jun 02, 2021 at 10:04:03AM -0600, Martin Sebor via Gcc-patches 
> > > wrote:
> > > > On 6/2/21 12:55 AM, Richard Biener wrote:
> > > > > On Tue, Jun 1, 2021 at 9:56 PM Martin Sebor  wrote:
> > > > > >
> > > > > > On 5/27/21 2:53 PM, Jason Merrill wrote:
> > > > > > > On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:
> > > > > > > > On 4/27/21 8:04 AM, Richard Biener wrote:
> > > > > > > > > On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On 4/27/21 1:58 AM, Richard Biener wrote:
> > > > > > > > > > > On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via 
> > > > > > > > > > > Gcc-patches
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > PR 90904 notes that auto_vec is unsafe to copy and assign because
> > > > > > > > > > > > > the class manages its own memory but doesn't define (or delete)
> > > > > > > > > > > > > either special function.  Since I first ran into the problem,
> > > > > > > > > > > > > auto_vec has grown a move ctor and move assignment from
> > > > > > > > > > > > > a dynamically-allocated vec but still no copy ctor or copy
> > > > > > > > > > > > > assignment operator.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The attached patch adds the two special functions to auto_vec along
> > > > > > > > > > > > > with a few simple tests.  It makes auto_vec safe to use in containers
> > > > > > > > > > > > > that expect copyable and assignable element types and passes
> > > > > > > > > > > > > bootstrap and regression testing on x86_64-linux.
> > > > > > > > > > >
> > > > > > > > > > > The question is whether we want such uses to appear since 
> > > > > > > > > > > those
> > > > > > > > > > > can be quite inefficient?  Thus the option is to delete 
> > > > > > > > > > > those
> > > > > > > > > > > operators?
> > > > > > > > > >
> > > > > > > > > > I would strongly prefer the generic vector class to have 
> > > > > > > > > > the properties
> > > > > > > > > > expected of any other generic container: copyable and 
> > > > > > > > > > assignable.  If
> > > > > > > > > > we also want another vector type with this restriction I 
> > > > > > > > > > suggest to add
> > > > > > > > > > another "noncopyable" type and make that property explicit 
> > > > > > > > > > in its name.
> > > > > > > > > > I can submit one in a followup patch if you think we need 
> > > > > > > > > > one.
> > > > > > > > >
> > > > > > > > > I'm not sure (and not strictly against the copy and assign).  
> > > > > > > > > Looking
> > > > > > > > > around
> > > > > > > > > I see that vec<> does not do deep copying.  Making auto_vec<> 
> > > > > > > > > do it
> > > > > > > > > might be surprising (I added the move capability to match how 
> > > > > > > > > vec<>
> > > > > > > > > is used - as "reference" to a vector)
> > > > > > > >
> > > > > > > > The vec base classes are special: they have no ctors at all 
> > > > > > > > (because
> > > > > > > > of their use in unions).  That's something we might have to 
> > > > > > > > live with
> > > > > > > > but it's not a model to follow in ordinary containers.
> > > > > > >
> > > > > > > I don't think we have to live with it anymore, now that we're 
> > > > > > > writing
> > > > > > > C++11.
> > > > > > >
> > > > > > > > The auto_vec class was introduced to fill the need for a 
> > > > > > > > conventional
> > > > > > > > sequence container with a ctor and dtor.  The missing copy ctor 
> > > > > > > > and
> > > > > > > > assignment operators were an oversight, not a deliberate 
> > > > > > > > feature.
> > > > > > > > This change fixes that oversight.
> > >
> > > I've been away a while, but I'm trying to get back into this, sorry.  It was
> > > definitely an oversight to leave these undefined, so that the compiler
> > > provides default definitions, but I agree with Richi: the better
> > > thing to have done, or to do now, would be to mark them as deleted and make
> > > auto_vec move-only (with copy() for when you really need a deep copy).
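
The move-only design argued for here can be sketched with a small stand-alone container. This is only an illustration of the idea, not GCC's auto_vec: the class name is hypothetical and std::vector is assumed as the backing store.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Toy move-only sequence: implicit copies are rejected at compile time,
// and a deep copy has to be requested explicitly through copy ().
template <typename T>
class move_only_vec
{
public:
  move_only_vec () = default;

  // Deleted copy operations: accidental (and possibly expensive) deep
  // copies become compile errors instead of silent behaviour.
  move_only_vec (const move_only_vec &) = delete;
  move_only_vec &operator= (const move_only_vec &) = delete;

  // Moves transfer ownership of the storage.
  move_only_vec (move_only_vec &&) noexcept = default;
  move_only_vec &operator= (move_only_vec &&) noexcept = default;

  // Explicit deep copy for the cases that really need one.
  move_only_vec copy () const
  {
    move_only_vec result;
    result.m_data = m_data;
    return result;
  }

  void push (const T &value) { m_data.push_back (value); }
  std::size_t length () const { return m_data.size (); }
  const T &operator[] (std::size_t i) const { return m_data[i]; }

private:
  std::vector<T> m_data;
};
```

The trade-off is that such a type cannot be used where a container requires copyable elements, which is the use case the original patch was aiming at.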
> > > > > > > >
> > > > > > > > The revised patch also adds a copy ctor/assignment to the 
> > > > > > > > auto_vec
> > > > > > > > primary template (that's also missing it).  In addition, it adds
> > > > > > > > a new class called auto_vec_ncopy that disables copying and
> > > > > > > > assignment as you prefer.
> > > > > > >
> > > > > > > Hmm, adding another class doesn't really help with the confusion 
> > > > > > > richi
> > > > > > > mentions.  And many uses of auto_vec will pass them as vec, which 
> > > > > > > will
> > > > > > > still do a shallow copy.  I 

Re: [PATCH] Implement a context aware points-to analyzer for use in evrp.

2021-06-07 Thread Richard Biener via Gcc-patches
On Mon, Jun 7, 2021 at 12:10 PM Aldy Hernandez via Gcc-patches
 wrote:
>
> The substitute_and_fold_engine which evrp uses is expecting symbolics
> from value_of_expr / value_on_edge / etc, which ranger does not provide.
> In some cases, these provide important folding cues, as in the case of
> aliases for pointers.  For example, legacy evrp may return [, ]
> for the value of "bar" where bar is on an edge where bar == , or
> when bar has been globally set to   This information is then used
> by the subst & fold engine to propagate the known value of bar.
>
> Currently this is a major source of discrepancies between evrp and
> ranger.  Of the 284 cases legacy evrp is getting over ranger, 237 are
> for pointer equality as discussed above.
>
> This patch implements a context aware points-to class which
> ranger-evrp can use to query what a pointer is currently pointing to.
> With it, we reduce the 284 cases legacy evrp is getting to 47.
>
> The API for the points-to analyzer is the following:
>
> class points_to_analyzer
> {
> public:
>   points_to_analyzer (gimple_ranger *r);
>   ~points_to_analyzer ();
>   void enter (basic_block);
>   void leave (basic_block);
>   void visit_stmt (gimple *stmt);
>   tree get_points_to (tree name) const;
> ...
> };
>
> The enter(), leave(), and visit_stmt() methods are meant to be called
> from a DOM walk.   At any point throughout the walk, one can call
> get_points_to() to get whatever an SSA is pointing to.
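
The enter/leave discipline described above amounts to a table whose updates are undone when a scope is left. The sketch below shows that technique in isolation; it is not the patch's ssa_equiv_stack, and it uses strings in place of SSA names and trees, with hypothetical class and member names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy unwindable equivalence table.  enter () opens a scope,
// push_replacement () records an equivalence valid in that scope, and
// leave () restores the state from before the matching enter ().
class scoped_equiv_table
{
public:
  // Push a marker so leave () knows where this scope started.
  void enter () { m_undo.push_back ({ true, "", "" }); }

  // Undo every replacement recorded since the most recent marker.
  void leave ()
  {
    while (!m_undo.empty ())
      {
        undo_entry e = m_undo.back ();
        m_undo.pop_back ();
        if (e.is_marker)
          break;
        if (e.previous.empty ())
          m_current.erase (e.name);
        else
          m_current[e.name] = e.previous;
      }
  }

  // Record NAME == REPLACEMENT, remembering the old value for unwinding.
  void push_replacement (const std::string &name,
                         const std::string &replacement)
  {
    auto it = m_current.find (name);
    m_undo.push_back ({ false, name,
                        it == m_current.end () ? "" : it->second });
    m_current[name] = replacement;
  }

  // Return the current replacement for NAME, or "" if there is none.
  std::string get_replacement (const std::string &name) const
  {
    auto it = m_current.find (name);
    return it == m_current.end () ? "" : it->second;
  }

private:
  struct undo_entry
  {
    bool is_marker;
    std::string name;
    std::string previous;
  };

  std::vector<undo_entry> m_undo;
  std::unordered_map<std::string, std::string> m_current;
};
```

In a dominator walk, enter () and leave () would bracket each basic block, so a query only ever sees equivalences established on the path to the current block.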
>
> If this class is useful to others, we could place it in a more generic
> location.
>
> Tested on x86-64 Linux with a regular bootstrap/tests and by comparing
> EVRP folds over ranger before and after this patch.

Hmm, but why call it "points-to" - when I look at the implementation
it's really about equivalences.  Thus,

 if (var1_2 == var2_3)

could be handled the same way.  Also "points-to" implies (to me)
that [1] and [2] point to the same object but your points-to
is clearly tracking equivalences only.

So maybe at least rename it to pointer_equiv_analyzer?  ISTR
propagating random (symbolic) equivalences has issues.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * gimple-ssa-evrp.c (class ssa_equiv_stack): New.
> (ssa_equiv_stack::ssa_equiv_stack): New.
> (ssa_equiv_stack::~ssa_equiv_stack): New.
> (ssa_equiv_stack::enter): New.
> (ssa_equiv_stack::leave): New.
> (ssa_equiv_stack::push_replacement): New.
> (ssa_equiv_stack::get_replacement): New.
> (is_pointer_ssa): New.
> (class points_to_analyzer): New.
> (points_to_analyzer::points_to_analyzer): New.
> (points_to_analyzer::~points_to_analyzer): New.
> (points_to_analyzer::set_global_points_to): New.
> (points_to_analyzer::set_cond_points_to): New.
> (points_to_analyzer::get_points_to): New.
> (points_to_analyzer::enter): New.
> (points_to_analyzer::leave): New.
> (points_to_analyzer::get_points_to_expr): New.
> (pta_valueize): New.
> (points_to_analyzer::visit_stmt): New.
> (points_to_analyzer::visit_edge): New.
> (hybrid_folder::value_of_expr): Call PTA.
> (hybrid_folder::value_on_edge): Same.
> (hybrid_folder::pre_fold_bb): New.
> (hybrid_folder::post_fold_bb): New.
> (hybrid_folder::pre_fold_stmt): New.
> (rvrp_folder::pre_fold_bb): New.
> (rvrp_folder::post_fold_bb): New.
> (rvrp_folder::pre_fold_stmt): New.
> (rvrp_folder::value_of_expr): Call PTA.
> (rvrp_folder::value_on_edge): Same.
> ---
>  gcc/gimple-ssa-evrp.c | 352 +-
>  1 file changed, 350 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/gimple-ssa-evrp.c b/gcc/gimple-ssa-evrp.c
> index 118d10365a0..6ce32d7b620 100644
> --- a/gcc/gimple-ssa-evrp.c
> +++ b/gcc/gimple-ssa-evrp.c
> @@ -42,6 +42,305 @@ along with GCC; see the file COPYING3.  If not see
>  #include "vr-values.h"
>  #include "gimple-ssa-evrp-analyze.h"
>  #include "gimple-range.h"
> +#include "fold-const.h"
> +
> +// Unwindable SSA equivalence table for pointers.
> +//
> +// The main query point is get_replacement() which returns what a given SSA can
> +// be replaced with in the current scope.
> +
> +class ssa_equiv_stack
> +{
> +public:
> +  ssa_equiv_stack ();
> +  ~ssa_equiv_stack ();
> +  void enter (basic_block);
> +  void leave (basic_block);
> +  void push_replacement (tree name, tree replacement);
> +  tree get_replacement (tree name) const;
> +
> +private:
> +  auto_vec<std::pair <tree, tree>> m_stack;
> +  tree *m_replacements;
> +  const std::pair <tree, tree> m_marker = std::make_pair (NULL, NULL);
> +};
> +
> +ssa_equiv_stack::ssa_equiv_stack ()
> +{
> +  m_replacements = new tree[num_ssa_names] ();
> +}
> +
> +ssa_equiv_stack::~ssa_equiv_stack ()
> +{
> +  m_stack.release ();
> +  delete[] m_replacements;
> +}
> +
> +// Pushes a marker at the given point.
> +
> +void
> +ssa_equiv_stack::enter (basic_block)
> +{
> +  

Re: RFC: Sphinx for GCC documentation

2021-06-07 Thread Martin Liška

On 6/4/21 4:24 PM, Koning, Paul wrote:




On Jun 4, 2021, at 3:55 AM, Tobias Burnus  wrote:

Hello,

On 13.05.21 13:45, Martin Liška wrote:

On 4/1/21 3:30 PM, Martin Liška wrote:

That said, I'm asking the GCC community for a green light before I
invest
more time on it?

So far, I've received only a little feedback about the transition, most of it
positive.

[1] https://splichal.eu/scripts/sphinx/


The HTML output looks quite nice.

What I observed:

* Looking at
  
https://splichal.eu/scripts/sphinx/gfortran/_build/html/intrinsic-procedures/access-checks-file-access-modes.html
why is the first argument description in bold?
It is also not very readable to have a scrollbar there – linebreaks would be
better.
→ I think that's because the assumption is that the first line contains a header
  and the rest the data


Explicit line breaks are likely to be wrong depending on the reader's window 
size.  I would suggest setting the table to have cells with line-wrapped 
contents.  That would typically be the default in HTML, I'm curious why that is 
not happening here.


Note that Sphinx supports two types of tables: grid tables and simple tables.
We prefer the latter, though one can do proper line breaking in the grid type.

https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#tables

Anyway, this is resolved, as I now use a more appropriate directive:
https://splichal.eu/scripts/sphinx/gfortran/_build/html/intrinsic-procedures/access-checks-file-access-modes.html

Martin



paul






[vect-patterns][RFC] Refactor widening patterns to allow internal_fn's

2021-06-07 Thread Joel Hutton via Gcc-patches
Hi all,

This refactor allows widening patterns (such as widen_plus/widen_minus) to be
represented as either internal_fns or tree_codes. The widening patterns were
originally added as tree codes with the expectation that they would be
refactored later.
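
One way to let a single value stand for either kind of code is a small wrapper that tags which kind it holds. The sketch below is a toy under stated assumptions: the enums and the class are hypothetical stand-ins rather than GCC's tree_code, internal_fn or code_helper, and the sign-based encoding is a choice made for this example.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two families of operation codes.
// Both start at 1 so that the sign of the stored value is unambiguous.
enum toy_tree_code { TOY_WIDEN_PLUS_EXPR = 1, TOY_WIDEN_MINUS_EXPR = 2 };
enum toy_internal_fn { TOY_IFN_WIDEN_PLUS = 1, TOY_IFN_WIDEN_MINUS = 2 };

// Holds either a tree code (stored as a positive number) or an internal
// function (stored negated), so callers can pass one argument around.
class toy_code_helper
{
public:
  toy_code_helper (toy_tree_code code) : m_rep ((int) code) {}
  toy_code_helper (toy_internal_fn fn) : m_rep (-(int) fn) {}

  bool is_tree_code () const { return m_rep > 0; }
  bool is_internal_fn () const { return m_rep < 0; }

  toy_tree_code as_tree_code () const { return (toy_tree_code) m_rep; }
  toy_internal_fn as_internal_fn () const { return (toy_internal_fn) -m_rep; }

private:
  int m_rep;
};

// Example consumer that accepts either representation of "widening plus".
bool toy_is_widening_plus (toy_code_helper code)
{
  if (code.is_tree_code ())
    return code.as_tree_code () == TOY_WIDEN_PLUS_EXPR;
  return code.as_internal_fn () == TOY_IFN_WIDEN_PLUS;
}
```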

[vect-patterns] Refactor as internal_fn's

Refactor vect-patterns to allow patterns to be internal_fns starting
with widening_plus/minus patterns.


gcc/ChangeLog:

* gimple-match.h (class code_helper): Move code_helper class to more 
visible header.
* internal-fn.h (internal_fn_name): Add internal_fn range check.
* optabs-tree.h (supportable_convert_operation): Change function 
prototypes to use code_helper.
* tree-vect-patterns.c (vect_recog_widen_op_pattern): Refactor to use 
code_helper.
* tree-vect-stmts.c (vect_gen_widened_results_half): Refactor to use 
code_helper, build internal_fns.
(vect_create_vectorized_promotion_stmts): Refactor to use code_helper.
(vectorizable_conversion): Refactor to use code_helper.
(supportable_widening_operation): Refactor to use code_helper.
(supportable_narrowing_operation): Refactor to use code_helper.
* tree-vectorizer.h (supportable_widening_operation): Refactor to use 
code_helper.
(supportable_narrowing_operation): Refactor to use code_helper.
* tree.h (class code_helper): Refactor to use code_helper.


rb14487.patch
Description: rb14487.patch


Re: RFC: Sphinx for GCC documentation

2021-06-07 Thread Martin Liška

On 6/4/21 9:55 AM, Tobias Burnus wrote:

Hello,

On 13.05.21 13:45, Martin Liška wrote:

On 4/1/21 3:30 PM, Martin Liška wrote:

That said, I'm asking the GCC community for a green light before I
invest
more time on it?

So far, I've received only a little feedback about the transition, most of it
positive.

[1] https://splichal.eu/scripts/sphinx/




Hi.


The HTML output looks quite nice.


Thanks.



What I observed:

* Looking at
   
https://splichal.eu/scripts/sphinx/gfortran/_build/html/intrinsic-procedures/access-checks-file-access-modes.html
why is the first argument description in bold?
It is also not very readable to have a scrollbar there – linebreaks would be
better.
→ I think that's because the assumption is that the first line contains a header
   and the rest the data


I've converted the problematic table to a '.. function::' directive that has
:returns: and :param: fields.
I hope the output is fine now.



* https://splichal.eu/scripts/sphinx/gfortran/_build/latex/gfortran.pdf
   If I look at page 92 (alias 96), 8.2.13 _gfortran_caf_sendget, the first 
column
   is too small to fit the argument names. – Admittedly, the current 
gfortran.pdf
   is not much better – it is very tight but just fits. I don't know how to fix 
this.


This is also converted and should look much better.



* I note that we write before the argument index, that those are without -/-- 
prefix
   but that's not true. Something to fix after the conversion.


Can you please show me a few examples of it?



* The syntax highlighting for gfortran is odd. Looking at @smallexample:
- intrinsic.texi: All Fortran examples (F90/free-form)
- gfc-internals.texi: 4x Fortran, 4x C, 3x plain text
- gfortran.texi: Shell, Fortran, C, plain text.
- invoke.texi: 4x Shell, 2x C, 4x Fortran


Should be fixed now as I set 'fortran' code-block in the fortran manual.
Right now, there are a few warnings that a code block is C/C++, but that's
quite a small fallout.


Does not seem to be that simple, but it would be nice if at least all in
intrinsic.texi would be marked as Fortran.


Should be better now?



Actually, I do not quite understand when the output is formatted as C (wrongly
or rightly) as Fortran (rarely but correctly) as plain or in some odd formatting
which randomly highlights some examples.


We make guesses based on keywords in a code-block and we also consider texinfo 
filename.


Possibly also an item for after the conversion.


Sure, there are still some warnings that can be seen with 'make html' (or other 
target).

Thanks for review,
Martin



Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf




Re: [PATCH] Reformat target.def for better parsing.

2021-06-07 Thread Richard Biener via Gcc-patches
On Mon, Jun 7, 2021 at 10:35 AM Martin Liška  wrote:
>
> On 6/7/21 8:14 AM, Richard Biener wrote:
> > Hmm, what's the problem with parsing the strings?  The newlines are
> > only because of our line length limits and hard-coding them looks both
> > error-prone and ugly.
>
> For the future Sphinx conversion, I need to replace content of the hook 
> definitions
> in .def files with RST format (as opposed to the current TEXI format). I did 
> so
> by putting placeholders in genhooks.c and then parsing the created .rst 
> file.
>
> When there are no newlines, then I can't split the generated .rst content to 
> multiple lines
> as defined in .def files.
>
> So my patch makes it only consistent as most of the hooks use '\n\' line 
> endings.

Oh - they do.  Yes, it makes sense as a fix then.

Thanks,
Richard.

> Does it make sense?
> Martin
>
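
To illustrate the point about '\n\' line endings, here is a minimal C sketch.
The two strings and the count_lines helper are hypothetical and are not taken
from target.def or genhooks.c; they only show that text wrapped by
adjacent-literal concatenation carries no newline, while text written with an
explicit "\n" plus a backslash continuation does.

```c
#include <stddef.h>

/* Hypothetical hook text written with an explicit "\n" followed by a
   backslash line continuation, as most hook definitions are said to be.  */
static const char doc_explicit[] = "First line of the hook text.\n\
Second line of the hook text.";

/* The same text wrapped only to respect the source line-length limit:
   adjacent literals are concatenated, so no newline reaches the string.  */
static const char doc_wrapped[] = "First line of the hook text.  "
                                  "Second line of the hook text.";

/* Count the lines a generator would emit for TEXT.  A trailing newline
   does not start a new line.  */
size_t
count_lines (const char *text)
{
  size_t lines = (*text != '\0');
  for (; *text != '\0'; ++text)
    if (*text == '\n' && text[1] != '\0')
      ++lines;
  return lines;
}
```

With these definitions, count_lines sees two lines in doc_explicit and one in
doc_wrapped, so only the first form lets a tool split generated output back
into the lines used in the .def file.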


[Patch] Fortran/OpenMP: Fix clause splitting for target/parallel/teams [PR99928]

2021-06-07 Thread Tobias Burnus

This removes the remaining xfails in gfortran.dg/gomp/pr99928-*.f90
and should in theory fix all splitting issues.

I see some differences for 'parallel do simd' with regards to attaching
lastprivate to 'for' vs. 'parallel'. There are probably more.
(I hope none of them is relevant.)
There are also a couple of FIXME regarding 'parallel for' and the
clause attachment (in both the C/C++ and Fortran testcase.)

Still missing for C/C++ parity is the support for reduction with
array slices (→ pr88828-{9,10}.c).

I hope that I got the splitting correct.

Tobias

Fortran/OpenMP: Fix clause splitting for target/parallel/teams [PR99928]

	PR middle-end/99928

gcc/fortran/ChangeLog:
 
	* trans-openmp.c (gfc_add_clause_implicitly,
	(gfc_split_omp_clauses): Use it.
	(gfc_free_split_omp_clauses): New.
	(gfc_trans_omp_do_simd, gfc_trans_omp_parallel_do,
	gfc_trans_omp_parallel_do_simd, gfc_trans_omp_distribute,
	gfc_trans_omp_teams, gfc_trans_omp_target, gfc_trans_omp_taskloop,
	gfc_trans_omp_master_taskloop, gfc_trans_omp_parallel_master): Use it.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/openmp-simd-6.f90: Update scan-tree-dump.
	* gfortran.dg/gomp/scan-5.f90: Likewise.
	* gfortran.dg/gomp/loop-1.f90: Likewise; remove xfail.
	* gfortran.dg/gomp/pr99928-1.f90: Remove xfail.
	* gfortran.dg/gomp/pr99928-2.f90: Likewise.
	* gfortran.dg/gomp/pr99928-3.f90: Likewise.
	* gfortran.dg/gomp/pr99928-8.f90: Likewise.

 gcc/fortran/trans-openmp.c   | 186 ++-
 gcc/testsuite/gfortran.dg/gomp/loop-1.f90|   7 +-
 gcc/testsuite/gfortran.dg/gomp/openmp-simd-6.f90 |   2 +-
 gcc/testsuite/gfortran.dg/gomp/pr99928-1.f90 |   4 +-
 gcc/testsuite/gfortran.dg/gomp/pr99928-2.f90 |   4 +-
 gcc/testsuite/gfortran.dg/gomp/pr99928-3.f90 |  16 +-
 gcc/testsuite/gfortran.dg/gomp/pr99928-8.f90 |  48 +++---
 gcc/testsuite/gfortran.dg/gomp/scan-5.f90|   2 +-
 8 files changed, 224 insertions(+), 45 deletions(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 1e22cdb82b7..0b7df0d441f 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -5358,6 +5358,146 @@ enum
   GFC_OMP_MASK_TASKLOOP = (1 << GFC_OMP_SPLIT_TASKLOOP)
 };
 
+/* If a var is in lastprivate/firstprivate/reduction but not in a
+   data mapping/sharing clause, add it to 'map(tofrom:' if is_target
+   and to 'shared' otherwise.  */
+static void
+gfc_add_clause_implicitly (gfc_omp_clauses *clauses_out,
+			   gfc_omp_clauses *clauses_in,
+			   bool is_target, bool is_parallel_do)
+{
+  int clauselist_to_add = is_target ? OMP_LIST_MAP : OMP_LIST_SHARED;
+  gfc_omp_namelist *tail = NULL;
+  for (int i = 0; i < 5; ++i)
+{
+  gfc_omp_namelist *n;
+  switch (i)
+	{
+	case 0: n = clauses_in->lists[OMP_LIST_FIRSTPRIVATE]; break;
+	case 1: n = clauses_in->lists[OMP_LIST_LASTPRIVATE]; break;
+	case 2: n = clauses_in->lists[OMP_LIST_REDUCTION]; break;
+	case 3: n = clauses_in->lists[OMP_LIST_REDUCTION_INSCAN]; break;
+	case 4: n = clauses_in->lists[OMP_LIST_REDUCTION_TASK]; break;
+	default: gcc_unreachable ();
+	}
+  for (; n != NULL; n = n->next)
+	{
+	  gfc_omp_namelist *n2, **n_firstp = NULL, **n_lastp = NULL;
+	  for (int j = 0; j < 6; ++j)
+	{
+	  gfc_omp_namelist **n2ref = NULL, *prev2 = NULL;
+	  switch (j)
+		{
+		case 0:
+		  n2ref = &clauses_out->lists[clauselist_to_add];
+		  break;
+		case 1:
+		  n2ref = &clauses_out->lists[OMP_LIST_FIRSTPRIVATE];
+		  break;
+		case 2:
+		  if (is_target)
+		n2ref = &clauses_in->lists[OMP_LIST_LASTPRIVATE];
+		  else
+		n2ref = &clauses_out->lists[OMP_LIST_LASTPRIVATE];
+		  break;
+		case 3: n2ref = &clauses_out->lists[OMP_LIST_REDUCTION]; break;
+		case 4:
+		  n2ref = &clauses_out->lists[OMP_LIST_REDUCTION_INSCAN];
+		  break;
+		case 5:
+		  n2ref = &clauses_out->lists[OMP_LIST_REDUCTION_TASK];
+		  break;
+		default: gcc_unreachable ();
+		}
+	  for (n2 = *n2ref; n2 != NULL; prev2 = n2, n2 = n2->next)
+		if (n2->sym == n->sym)
+		  break;
+	  if (n2)
+		{
+		  if (j == 0 /* clauselist_to_add */)
+		break;  /* Already present.  */
+		  if (j == 1 /* OMP_LIST_FIRSTPRIVATE */)
+		{
+		  n_firstp = prev2 ? &prev2->next : n2ref;
+		  continue;
+		}
+		  if (j == 2 /* OMP_LIST_LASTPRIVATE */)
+		{
+		  n_lastp = prev2 ? &prev2->next : n2ref;
+		  continue;
+		}
+		  break;
+		}
+	}
+	  if (n_firstp && n_lastp)
+	{
+	  /* For parallel do, GCC puts firstprivate/lastprivate
+		 on the parallel.  */
+	  if (is_parallel_do)
+		continue;
+	  *n_firstp = (*n_firstp)->next;
+	  if (!is_target)
+		*n_lastp = (*n_lastp)->next;
+	}
+	  else if (is_target && n_lastp)
+	;
+	  else if (n2 || n_firstp || n_lastp)
+	continue;
+	  if (clauses_out->lists[clauselist_to_add]
+	  && clauses_out->lists[clauselist_to_add] == 

Re: [PATCH] Simplify (view_convert ~a) < 0 to (view_convert a) >= 0 [PR middle-end/100738]

2021-06-07 Thread Richard Biener via Gcc-patches
On Mon, Jun 7, 2021 at 9:02 AM Hongtao Liu via Gcc-patches
 wrote:
>
> On Mon, Jun 7, 2021 at 2:22 PM Hongtao Liu  wrote:
> >
> > On Fri, Jun 4, 2021 at 4:18 PM Marc Glisse  wrote:
> > >
> > > On Fri, 4 Jun 2021, Hongtao Liu via Gcc-patches wrote:
> > >
> > > > On Tue, Jun 1, 2021 at 6:17 PM Marc Glisse  wrote:
> > > >>
> > > >> On Tue, 1 Jun 2021, Hongtao Liu via Gcc-patches wrote:
> > > >>
> > > >>> Hi:
> > > >>>  This patch is about to simplify (view_convert:type ~a) < 0 to
> > > >>> (view_convert:type a) >= 0 when type is signed integer. Similar for
> > > >>> (view_convert:type ~a) >= 0.
> > > >>>  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > > >>>  Ok for the trunk?
> > > >>>
> > > >>> gcc/ChangeLog:
> > > >>>
> > > >>>PR middle-end/100738
> > > >>>* match.pd ((view_convert ~a) < 0 --> (view_convert a) >= 0,
> > > >>>(view_convert ~a) >= 0 --> (view_convert a) < 0): New GIMPLE
> > > >>>simplification.
> > > >>
> > > >> We already have
> > > >>
> > > >> /* Fold ~X op C as X op' ~C, where op' is the swapped comparison.  */
> > > >> (for cmp (simple_comparison)
> > > >>   scmp (swapped_simple_comparison)
> > > >>   (simplify
> > > >>(cmp (bit_not@2 @0) CONSTANT_CLASS_P@1)
> > > >>(if (single_use (@2)
> > > >> && (TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == 
> > > >> VECTOR_CST))
> > > >> (scmp @0 (bit_not @1)
> > > >>
> > > >> Would it make sense to try and generalize it a bit, say with
> > > >>
> > > >> (cmp (nop_convert1? (bit_not @0)) CONSTANT_CLASS_P)
> > > >>
> > > >> (scmp (view_convert:XXX @0) (bit_not @1))
> > > >>
> > > > Thanks for your advice, it looks great.
> > > > And can I use *view_convert1?* instead of *nop_convert1?* here,
> > > > because the original case is view_convert, and nop_convert would fail
> > > > to simplify the case.
> > >
> > > Near the top of match.pd, you can see
> > >
> > > /* With nop_convert? combine convert? and view_convert? in one pattern
> > > plus conditionalize on tree_nop_conversion_p conversions.  */
> > > (match (nop_convert @0)
> > >   (convert @0)
> > >   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)
> > > (match (nop_convert @0)
> > >   (view_convert @0)
> > >   (if (VECTOR_TYPE_P (type) && VECTOR_TYPE_P (TREE_TYPE (@0))
> > >&& known_eq (TYPE_VECTOR_SUBPARTS (type),
> > > TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)))
> > >&& tree_nop_conversion_p (TREE_TYPE (type), TREE_TYPE (TREE_TYPE 
> > > (@0))
> > >
> > Oh, it's restricted to the same number of elements which is not the
> > case I tested.
> > That's why nop_convert failed to simplify the case.
> And tree_nop_conversion_p also doesn't handle vector types with
> different element numbers.
> Shouldn't v4si --> v16qi be a nop conversion for all targets?

It's a conversion with semantics, like v4si + V_C_E (v16qi) doesn't
make sense when you strip the V_C_E as you'll get v4si + v16qi.  We
don't consider those a nop conversion.

Richard.
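
As a scalar illustration of the two points discussed here, the following C
sketch (all function names are hypothetical, none of this is match.pd code)
checks that the sign test on ~a is the opposite of the sign test on a within
one 32-bit lane, and that reinterpreting four 32-bit lanes as sixteen 8-bit
lanes changes which bits act as sign bits, so the lane-count-changing view
conversion is not a no-op for element-wise operations.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Left-hand side of the proposed fold for one 32-bit lane:
   the sign bit of ~a is set, i.e. (int32_t) ~a < 0.  */
bool
not_is_negative (uint32_t a)
{
  uint32_t inverted = ~a;
  return (inverted >> 31) != 0;
}

/* Right-hand side of the fold: the sign bit of a is clear,
   i.e. (int32_t) a >= 0.  */
bool
is_nonnegative (uint32_t a)
{
  return (a >> 31) == 0;
}

/* Number of negative lanes when V is treated as 4 x 32-bit.  */
int
negative_i32_lanes (const uint32_t v[4])
{
  int count = 0;
  for (int i = 0; i < 4; ++i)
    count += (int) (v[i] >> 31);
  return count;
}

/* Number of negative lanes when the same 16 bytes are viewed as
   16 x 8-bit.  The count does not depend on byte order.  */
int
negative_i8_lanes (const uint32_t v[4])
{
  uint8_t bytes[16];
  int count = 0;
  memcpy (bytes, v, sizeof bytes);
  for (int i = 0; i < 16; ++i)
    count += bytes[i] >> 7;
  return count;
}
```

For the vector { 0x80, 0, 0, 0 } no 32-bit lane is negative, yet one 8-bit
lane is, which is why a conversion that changes the number of elements carries
semantics even though the bits are unchanged.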

> >
> > Guess we can define another nop1_convert to handle vector types with
> > different number of elements, but still tree_nop_conversion_p?
> >
> > > So at least the intention is that it can handle both NOP_EXPR for scalars
> > > and VIEW_CONVERT_EXPR for vectors, and I think we already use it that way
> > > in some places in match.pd, like
> > >
> > > (simplify
> > >   (negate (nop_convert? (bit_not @0)))
> > >   (plus (view_convert @0) { build_each_one_cst (type); }))
> > >
> > > (simplify
> > >   (bit_xor:c (nop_convert?:s (bit_not:s @0)) @1)
> > >   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
> > >(bit_not (bit_xor (view_convert @0) @1
> > >
> > > (the 'if' seems redundant for this one)
> > >
> > >   (simplify
> > >(negate (nop_convert? (negate @1)))
> > >(if (!TYPE_OVERFLOW_SANITIZED (type)
> > > && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@1)))
> > > (view_convert @1)))
> > >
> > > etc.
> > >
> > >
> > > At some point this got some genmatch help, to handle '?' and numbers, so I
> > > don't remember all the details, but following these examples should work.
> > >
> > > --
> > > Marc Glisse
> >
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao


Re: [RFC/PATCH] updating global ranges and their effect on __builtin_unreachable code

2021-06-07 Thread Aldy Hernandez via Gcc-patches
On Mon, Jun 7, 2021 at 9:30 AM Richard Biener
 wrote:
>
> On Wed, Jun 2, 2021 at 4:53 PM Aldy Hernandez  wrote:
> >
> >
> >
> > On 6/2/21 1:52 PM, Richard Biener wrote:
> > > On Wed, Jun 2, 2021 at 12:34 PM Aldy Hernandez via Gcc-patches
> > >  wrote:

> >
> > > the case where this only happens after inlining (important for your 
> > > friendly
> > > C++ abstraction hell), and the unreachable()s gone.
> >
> > I have pointed this out before, and will repeat it in case you missed it:
> >
> > "Richard, you have made it very clear that we disagree on core design
> > issues, but that's no reason to continually make snide comments on every
> > patch or PRs  Can we keep the discussions focused on the technical bits?"
>
> You need to stop taking every sentence I write as personal insult.  I was
> referring to C++ abstraction in general in the sense as that we should be
> prepared to handle index checking removal even across inline boundaries.
> Reading my comment I guess the "your" might mislead you - sorry for
> not being a native speaker and thus picking a wrong general term here.
>
> Richard.
>
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-April/569072.html

No, I am only addressing the thinly veiled insults and snide remarks
you frequently make. And I will continue to do so if you steer off
topic. These are not language or cultural issues so don't try to paint
them as such. I am also not a native speaker and do not use it as an
excuse for my behavior.

Aldy



Re: [PATCH 02/57] Support scanning of build-time GC roots in gengtype

2021-06-07 Thread Bill Schmidt via Gcc-patches

On 6/7/21 5:39 AM, Richard Sandiford wrote:

Bill Schmidt via Gcc-patches  writes:

On 5/20/21 5:24 PM, Segher Boessenkool wrote:

On Tue, May 11, 2021 at 11:01:22AM -0500, Bill Schmidt wrote:

Hi!  I'd like to ping this specific patch from the series, which is the
only one remaining that affects common code.  I confess that I don't
know whom to ask for a review for gengtype; I didn't get any good ideas
from MAINTAINERS.  If you know of a good reviewer candidate, please CC
them.

Richard is listed as the "gen* on machine desc" maintainer, that might
be the closest to this.  cc:ed.

Hi, Richard -- any thoughts on this patch?

https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568841.html

I don't really know gengtype.c, sorry.  (The gen* thing was for
the md generators, not genmatch and gengtype.)

Richard


OK, thanks, Richard!

Richi, from git blame, it appears everyone that used to know about 
gengtype has moved on.  Would you be able to give a quick peek at this?  
It's a pretty uninteresting patch, all in all. :-)


Thanks,
Bill



Re: [wwwdocs] Add HTML anchors to GCC 5 "porting to" notes

2021-06-07 Thread Gerald Pfeifer
On Mon, 7 Jun 2021, Jonathan Wakely wrote:
> This adds id attributes to the heading elements of
> https://gcc.gnu.org/gcc-5/porting_to.html (so that I can link directly
> to the section on inline functions).
> 
> All later porting_to.html notes have anchors like this.
> 
> OK for wwwdocs?

(a) This is a no-brainer. :-)  
(b) Thank you.
(c) All of the above. :-)

Gerald



Re: [RFC] Implementing detection of saturation and rounding arithmetic

2021-06-07 Thread Bin.Cheng via Gcc-patches
On Fri, Jun 4, 2021 at 12:35 AM Andre Vieira (lists) via Gcc-patches
 wrote:
>
> Hi,
>
> This RFC is motivated by the IV sharing RFC in
> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569502.html and the
> need to have the IVOPTS pass be able to clean up IV's shared between
> multiple loops. When creating a similar problem with C code I noticed
> IVOPTs treated IV's with uses outside the loop differently, this didn't
> even require multiple loops, take for instance the following example
> using SVE intrinsics:
>
> #include <arm_sve.h>
> #include <limits.h>
> extern void use (char *);
> void bar (char  * __restrict__ a, char * __restrict__ b, char *
> __restrict__ c, unsigned n)
> {
>  svbool_t all_true = svptrue_b8 ();
>unsigned i = 0;
>if (n < (UINT_MAX - svcntb() - 1))
>  {
>  for (; i < n; i += svcntb())
>  {
>  svuint8_t va = svld1 (all_true, (uint8_t*)a);
>  svuint8_t vb = svld1 (all_true, (uint8_t*)b);
>  svst1 (all_true, (uint8_t *)c, svadd_z (all_true, va,vb));
>  a += svcntb();
>  b += svcntb();
>  c += svcntb();
>  }
>  }
>use (a);
> }
>
> IVOPTs tends to generate a shared IV for SVE memory accesses, as we
> don't have a post-increment for SVE load/stores. If we had not included
> 'use (a);' in this example, IVOPTs would have replaced the IV's for a, b
> and c with a single one, (also used for the loop-control). See:
>
> [local count: 955630225]:
># ivtmp.7_8 = PHI 
>va_14 = MEM  [(unsigned char *)a_10(D) + ivtmp.7_8 * 1];
>vb_15 = MEM  [(unsigned char *)b_11(D) + ivtmp.7_8 * 1];
>_2 = svadd_u8_z ({ -1, ... }, va_14, vb_15);
>MEM <__SVUint8_t> [(unsigned char *)c_12(D) + ivtmp.7_8 * 1] = _2;
>ivtmp.7_25 = ivtmp.7_8 + POLY_INT_CST [16, 16];
>i_23 = (unsigned int) ivtmp.7_25;
>if (n_9(D) > i_23)
>  goto ; [89.00%]
>else
>  goto ; [11.00%]
>
>   However, due to the 'use (a);' it will create two IVs one for
> loop-control, b and c and one for a. See:
>
>[local count: 955630225]:
># a_28 = PHI 
># ivtmp.7_25 = PHI 
>va_15 = MEM  [(unsigned char *)a_28];
>vb_16 = MEM  [(unsigned char *)b_12(D) + ivtmp.7_25 * 1];
>_2 = svadd_u8_z ({ -1, ... }, va_15, vb_16);
>MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2;
>a_18 = a_28 + POLY_INT_CST [16, 16];
>ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16];
>i_8 = (unsigned int) ivtmp.7_24;
>if (n_10(D) > i_8)
>  goto ; [89.00%]
>else
>  goto ; [11.00%]
>
> With the first patch attached in this RFC 'no_cost.patch', I tell IVOPTs
> to not cost uses outside of the loop. This makes IVOPTs generate a
> single IV, but unfortunately it decides to create the variable for the
> use inside the loop and it also seems to use the pre-increment value of
> the shared-IV and add the [16,16] to it. See:
>
> [local count: 955630225]:
># ivtmp.7_25 = PHI 
>va_15 = MEM  [(unsigned char *)a_11(D) + ivtmp.7_25 * 1];
>vb_16 = MEM  [(unsigned char *)b_12(D) + ivtmp.7_25 * 1];
>_2 = svadd_u8_z ({ -1, ... }, va_15, vb_16);
>MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2;
>_8 = (unsigned long) a_11(D);
>_7 = _8 + ivtmp.7_25;
>_6 = _7 + POLY_INT_CST [16, 16];
>a_18 = (char * restrict) _6;
>ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16];
>i_5 = (unsigned int) ivtmp.7_24;
>if (n_10(D) > i_5)
>  goto ; [89.00%]
>else
>  goto ; [11.00%]
>
> With the patch 'var_after.patch' I make get_computation_aff_1 use
> 'cand->var_after' for outside uses thus using the post-increment var of
> the candidate IV. This means I have to insert it in a different place
> and make sure to delete the old use->stmt. I'm sure there is a better
> way to do this using IVOPTs current framework, but I didn't find one
> yet. See the result:
>
>[local count: 955630225]:
># ivtmp.7_25 = PHI 
>va_15 = MEM  [(unsigned char *)a_11(D) + ivtmp.7_25 * 1];
>vb_16 = MEM  [(unsigned char *)b_12(D) + ivtmp.7_25 * 1];
>_2 = svadd_u8_z ({ -1, ... }, va_15, vb_16);
>MEM <__SVUint8_t> [(unsigned char *)c_13(D) + ivtmp.7_25 * 1] = _2;
>ivtmp.7_24 = ivtmp.7_25 + POLY_INT_CST [16, 16];
>_8 = (unsigned long) a_11(D);
>_7 = _8 + ivtmp.7_24;
>a_18 = (char * restrict) _7;
>i_6 = (unsigned int) ivtmp.7_24;
>if (n_10(D) > i_6)
>  goto ; [89.00%]
>else
>  goto ; [11.00%]
>
>
> This is still not optimal as we are still doing the update inside the
> loop and there is absolutely no need for that. I found that running sink
> would solve it and it seems someone has added a second sink pass, so
> that saves me a third patch :) see after sink2:
>
> [local count: 955630225]:
># ivtmp.7_25 = PHI 
>va_15 = MEM  [(unsigned char *)a_11(D) + ivtmp.7_25 * 1];
>vb_16 = MEM  [(unsigned char *)b_12(D) + ivtmp.7_25 * 1];
>_2 = svadd_u8_z ({ -1, ... }, 

Re: [PATCH, OpenMP 5.0] Implement relaxation of implicit map vs. existing device mappings (for mainline trunk)

2021-06-07 Thread Thomas Schwinge
Hi Chung-Lin!

On 2021-05-14T21:20:25+0800, Chung-Lin Tang  wrote:
> This is a version of patch 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569665.html
> for mainline trunk.

Related to the discussion in that thread,
<http://mid.mail-archive.com/87tuneu3f4.fsf@euler.schwinge.homeip.net>,
please keep this disabled for OpenACC, for the time being.

I do like the general idea (but haven't reviewed in detail the
implementation), but this needs some more thought (and additional
changes) for OpenACC, also related to other patches that are to be
upstreamed.


Does your 'OMP_CLAUSE_MAP_IMPLICIT_P':


/* Nonzero if this map clause was created through implicit data-mapping
   rules. */
#define OMP_CLAUSE_MAP_IMPLICIT_P(NODE) \
  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)->base.deprecated_flag)

... need to be integrated/refactored regarding the
'OMP_CLAUSE_MAP_IMPLICIT' that Jakub recently added in
commit r12-1109-gc94424b0ed786ec92b6904da69af8b5243b34fdc
"openmp: Fix up handling of reduction clause on constructs
combined with target [PR99928]":

/* Nonzero on map clauses added implicitly for reduction clauses on combined
   or composite constructs.  They shall be removed if there is an explicit
   map clause.  */
#define OMP_CLAUSE_MAP_IMPLICIT(NODE) \
  (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)->base.default_def_flag)


Grüße
 Thomas


> This patch implements relaxing the requirements when a map with the implicit 
> attribute encounters
> an overlapping existing map. As the OpenMP 5.0 spec describes on page 320, 
> lines 18-27 (and 5.1 spec,
> page 352, lines 13-22):
>
> "If a single contiguous part of the original storage of a list item with an 
> implicit data-mapping
>   attribute has corresponding storage in the device data environment prior to 
> a task encountering the
>   construct that is associated with the map clause, only that part of the 
> original storage will have
>   corresponding storage in the device data environment as a result of the map 
> clause."
>
> Also tracked in the OpenMP spec context as issue #1463:
> https://github.com/OpenMP/spec/issues/1463
>
> The implementation inside the compiler is, of course, to tag the implicitly 
> created maps with some
> indication of "implicit". I've done this with a OMP_CLAUSE_MAP_IMPLICIT_P 
> macro, using
> 'base.deprecated_flag' underneath.
>
> There is an encoding of this as GOMP_MAP_IMPLICIT == 
> GOMP_MAP_FLAG_SPECIAL_3|GOMP_MAP_FLAG_SPECIAL_4
> in include/gomp-constants.h for the runtime, but I've intentionally avoided 
> exploding the entire
> gimplify/omp-low with a new set of GOMP_MAP_IMPLICIT_TO/FROM/etc. symbols, 
> instead adding in the new
> flag bits only at the final runtime call generation during omp-lowering.
>
> The rest is libgomp mapping taking care of the implicit case: allowing map 
> success if an existing
> map is a proper subset of the new map, if the new map is implicit. 
> Straightforward enough I think.
>
> There are also some additions to print the implicit attribute during tree 
> pretty-printing, for that
> reason some scan tests were updated.
>
> Also, another adjustment in this patch is how implicitly created clauses are 
> added to the current
> clause list in gimplify_adjust_omp_clauses(). Instead of simply appending the 
> new clauses to the end,
> this patch adds them at the position "after initial non-map clauses, but 
> right before any existing
> map clauses".
>
> The reason for this is: when combined with other map clauses, for example:
>
>#pragma omp target map(rec.ptr[:N])
>for (int i = 0; i < N; i++)
>  rec.ptr[i] += 1;
>
> There will be an implicit map created for map(rec), because of the access 
> inside the target region.
> The expectation is that 'rec' is implicitly mapped, and then the pointed 
> array-section part by 'rec.ptr'
> will be mapped, and then attachment to the 'rec.ptr' field of the mapped 
> 'rec' (in that order).
>
> If the implicit 'map(rec)' is appended to the end, instead of placed before 
> other maps, the attachment
> operation will not find anything to attach to, and the entire region will 
> fail.
>
> Note: this touches a bit on another issue which I will be sending a patch for 
> later:
> per the discussion on omp-lang, an array section list item should *not* be 
> mapping its base-pointer
> (although an attachment attempt should exist), while in current GCC behavior, 
> for struct member pointers
> like 'rec.ptr' above, we do map it (which should be deemed incorrect).
>
> This means that as of right now, this modification of map order doesn't 
> really exhibit the above mentioned
> behavior yet. I have included it as part of this patch because the 
> "[implicit]" tree printing requires
> modifying many gimple scan tests already, so including the test modifications 
> together seems more
> manageable patch-wise.
>
> Tested with no regressions on x86_64-linux with nvptx 

Re: [PING] [PATCH] For obj-c stage-final re-use the checksum from the previous stage

2021-06-07 Thread Richard Biener
On Fri, 4 Jun 2021, Bernd Edlinger wrote:

> Ping...

OK.

Richard.

> On 5/28/21 9:47 AM, Bernd Edlinger wrote:
> > Hi Richard,
> > 
> > I've replicated your PR to make the objective-c checksum compare equal
> > 
> > commit fb2647aaf55b453b37badfd40c14c59486a74584
> > Author: Richard Biener 
> > Date:   Tue May 3 08:14:27 2016 +
> > 
> > Make-lang.in (cc1-checksum.c): For stage-final re-use the checksum from 
> > the previous stage.
> > 
> > 2016-05-03  Richard Biener  
> > 
> > c/
> > * Make-lang.in (cc1-checksum.c): For stage-final re-use
> > the checksum from the previous stage.
> > 
> > cp/
> > * Make-lang.in (cc1plus-checksum.c): For stage-final re-use
> > the checksum from the previous stage.
> > 
> > From-SVN: r235804
> > 
> > 
> > This silences the stage compare.
> > 
> > Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> > Is it OK for trunk?
> > 
> > 
> > Thanks
> > Bernd.
> > 
> > 
> > 2021-05-28  Bernd Edlinger  
> > 
> > objc/
> > * Make-lang.in (cc1obj-checksum.c): For stage-final re-use
> > the checksum from the previous stage.
> > 
> > objcp/
> > * Make-lang.in (cc1objplus-checksum.c): For stage-final re-use
> > the checksum from the previous stage.
> > 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


[PATCH 2/2] arm: Auto-vectorization for MVE: add pack/unpack patterns

2021-06-07 Thread Christophe Lyon via Gcc-patches
This patch adds vec_unpack_hi_, vec_unpack_lo_,
vec_pack_trunc_ patterns for MVE.

It does so by moving the unpack patterns from neon.md to
vec-common.md, while adding MVE support to them. The pack expander is
derived from the Neon one (which in turn is renamed into
neon_quad_vec_pack_trunc_).

The patch introduces mve_vec_pack_trunc_ to avoid the need for a
zero-initialized temporary, which is needed if the
vec_pack_trunc_ expander calls @mve_vmovn[bt]q_
instead.

With this patch, we can now vectorize the 16 and 8-bit versions of
vclz and vshl, although the generated code could still be improved.
For test_clz_s16, we now generate
vldrh.16q3, [r1]
vmovlb.s16   q2, q3
vmovlt.s16   q3, q3
vclz.i32  q2, q2
vclz.i32  q3, q3
vmovnb.i32  q1, q2
vmovnt.i32  q1, q3
vstrh.16q1, [r0]
which could be improved to
vldrh.16q3, [r1]
vclz.i16q1, q3
vstrh.16q1, [r0]
if we could avoid the need for unpack/pack steps.

For reference, clang-12 generates:
vldrh.s32   q0, [r1]
vldrh.s32   q1, [r1, #8]
vclz.i32q0, q0
vstrh.32q0, [r0]
vclz.i32q0, q1
vstrh.32q0, [r0, #8]
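
For readers unfamiliar with the unpack/operate/pack sequence in the GCC
output above, here is a scalar C model of it for one vector of eight 16-bit
elements. It is only a sketch: clz32 and model_clz_s16 are hypothetical
names, and the even/odd lane assignment for the bottom/top instructions
reflects my understanding of the MVE semantics rather than anything stated in
the patch.

```c
#include <stdint.h>

/* Portable count-leading-zeros on 32 bits; returns 32 for zero.  */
static int
clz32 (uint32_t x)
{
  int n = 0;
  if (x == 0)
    return 32;
  while ((x & 0x80000000u) == 0)
    {
      x <<= 1;
      ++n;
    }
  return n;
}

/* Scalar model of the widen / clz / narrow sequence on eight
   16-bit elements.  */
void
model_clz_s16 (int16_t dest[8], const int16_t src[8])
{
  int32_t bottom[4], top[4];

  /* vmovlb.s16 / vmovlt.s16: sign-extend the even (bottom) and
     odd (top) elements to 32 bits.  */
  for (int i = 0; i < 4; ++i)
    {
      bottom[i] = src[2 * i];
      top[i] = src[2 * i + 1];
    }

  /* vclz.i32 on each widened vector.  */
  for (int i = 0; i < 4; ++i)
    {
      bottom[i] = clz32 ((uint32_t) bottom[i]);
      top[i] = clz32 ((uint32_t) top[i]);
    }

  /* vmovnb.i32 / vmovnt.i32: truncate and write the results back to
     the even and odd 16-bit positions.  */
  for (int i = 0; i < 4; ++i)
    {
      dest[2 * i] = (int16_t) bottom[i];
      dest[2 * i + 1] = (int16_t) top[i];
    }
}
```

The model counts leading zeros on the 32-bit promoted value, which matches
what a C expression applying __builtin_clz to a 16-bit operand computes for
non-zero inputs.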

2021-06-03  Christophe Lyon  

gcc/
* config/arm/mve.md (mve_vmovltq_): Prefix with '@'.
(mve_vmovlbq_): Likewise.
(mve_vmovnbq_): Likewise.
(mve_vmovntq_): Likewise.
(@mve_vec_pack_trunc_): New pattern.
* config/arm/neon.md (vec_unpack_hi_): Move to
vec-common.md.
(vec_unpack_lo_): Likewise.
(vec_pack_trunc_): Rename to
neon_quad_vec_pack_trunc_.
* config/arm/vec-common.md (vec_unpack_hi_): New
pattern.
(vec_unpack_lo_): New.
(vec_pack_trunc_): New.

gcc/testsuite/
* gcc.target/arm/simd/mve-vclz.c: Update expected results.
* gcc.target/arm/simd/mve-vshl.c: Likewise.
---
 gcc/config/arm/mve.md| 20 -
 gcc/config/arm/neon.md   | 39 +
 gcc/config/arm/vec-common.md | 89 
 gcc/testsuite/gcc.target/arm/simd/mve-vclz.c |  7 +-
 gcc/testsuite/gcc.target/arm/simd/mve-vshl.c |  5 +-
 5 files changed, 114 insertions(+), 46 deletions(-)

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 99e46d0bc69..b18292c07d3 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -510,7 +510,7 @@ (define_insn "mve_vrev32q_"
 ;;
 ;; [vmovltq_u, vmovltq_s])
 ;;
-(define_insn "mve_vmovltq_"
+(define_insn "@mve_vmovltq_"
   [
(set (match_operand: 0 "s_register_operand" "=w")
(unspec: [(match_operand:MVE_3 1 "s_register_operand" 
"w")]
@@ -524,7 +524,7 @@ (define_insn "mve_vmovltq_"
 ;;
 ;; [vmovlbq_s, vmovlbq_u])
 ;;
-(define_insn "mve_vmovlbq_"
+(define_insn "@mve_vmovlbq_"
   [
(set (match_operand: 0 "s_register_operand" "=w")
(unspec: [(match_operand:MVE_3 1 "s_register_operand" 
"w")]
@@ -2187,7 +2187,7 @@ (define_insn "mve_vmlsldavxq_s"
 ;;
 ;; [vmovnbq_u, vmovnbq_s])
 ;;
-(define_insn "mve_vmovnbq_"
+(define_insn "@mve_vmovnbq_"
   [
(set (match_operand: 0 "s_register_operand" "=w")
(unspec: [(match_operand: 1 
"s_register_operand" "0")
@@ -2202,7 +2202,7 @@ (define_insn "mve_vmovnbq_"
 ;;
 ;; [vmovntq_s, vmovntq_u])
 ;;
-(define_insn "mve_vmovntq_"
+(define_insn "@mve_vmovntq_"
   [
(set (match_operand: 0 "s_register_operand" "=w")
(unspec: [(match_operand: 1 
"s_register_operand" "0")
@@ -2214,6 +2214,18 @@ (define_insn "mve_vmovntq_"
   [(set_attr "type" "mve_move")
 ])
 
+(define_insn "@mve_vec_pack_trunc_"
+ [(set (match_operand: 0 "register_operand" "=")
+   (vec_concat:
+   (truncate:
+   (match_operand:MVE_5 1 "register_operand" "w"))
+   (truncate:
+   (match_operand:MVE_5 2 "register_operand" "w"]
+ "TARGET_HAVE_MVE"
+ "vmovnb.i  %q0, %q1\;vmovnt.i   %q0, %q2"
+  [(set_attr "type" "mve_move")]
+)
+
 ;;
 ;; [vmulq_f])
 ;;
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 0fdffaf4ec4..392d9607919 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -5924,43 +5924,6 @@ (define_insn "neon_vec_unpack_hi_"
   [(set_attr "type" "neon_shift_imm_long")]
 )
 
-(define_expand "vec_unpack_hi_"
-  [(match_operand: 0 "register_operand")
-   (SE: (match_operand:VU 1 "register_operand"))]
- "TARGET_NEON && !BYTES_BIG_ENDIAN"
-  {
-   rtvec v = rtvec_alloc (/2)  ;
-   rtx t1;
-   int i;
-   for (i = 0; i < (/2); i++)
- RTVEC_ELT (v, i) = GEN_INT ((/2) + i);
-  
-   t1 = gen_rtx_PARALLEL (mode, v);
-   emit_insn (gen_neon_vec_unpack_hi_ (operands[0], 
- operands[1], 
-t1));
-   DONE;
-  }
-)
-
-(define_expand "vec_unpack_lo_"
-  [(match_operand: 0 "register_operand")
-   

[PATCH 1/2] arm: Auto-vectorization for MVE: vclz

2021-06-07 Thread Christophe Lyon via Gcc-patches
This patch adds support for auto-vectorization of clz for MVE.

It does so by removing the unspec from mve_vclzq_ and uses
'clz' instead. It moves the neon_vclz expander from neon.md to
vec-common.md and renames it to the standard name clz2.

2021-06-03  Christophe Lyon  

gcc/
* config/arm/iterators.md (): Remove VCLZQ_U, VCLZQ_S.
(VCLZQ): Remove.
* config/arm/mve.md (mve_vclzq_): Add '@' prefix,
remove  iterator.
(mve_vclzq_u): New.
* config/arm/neon.md (clz2): Rename to neon_vclz.
(neon_vclz"
 ;;
 ;; [vclzq_u, vclzq_s])
 ;;
-(define_insn "mve_vclzq_"
+(define_insn "@mve_vclzq_s"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
-   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")]
-VCLZQ))
+   (clz:MVE_2 (match_operand:MVE_2 1 "s_register_operand" "w")))
   ]
   "TARGET_HAVE_MVE"
   "vclz.i%#  %q0, %q1"
   [(set_attr "type" "mve_move")
 ])
+(define_expand "mve_vclzq_u"
+  [
+   (set (match_operand:MVE_2 0 "s_register_operand")
+   (clz:MVE_2 (match_operand:MVE_2 1 "s_register_operand")))
+  ]
+  "TARGET_HAVE_MVE"
+)
 
 ;;
 ;; [vclsq_s])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 18571d819eb..0fdffaf4ec4 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -3018,7 +3018,7 @@ (define_insn "neon_vcls"
   [(set_attr "type" "neon_cls")]
 )
 
-(define_insn "clz2"
+(define_insn "neon_vclz"
   [(set (match_operand:VDQIW 0 "s_register_operand" "=w")
 (clz:VDQIW (match_operand:VDQIW 1 "s_register_operand" "w")))]
   "TARGET_NEON"
@@ -3026,15 +3026,6 @@ (define_insn "clz2"
   [(set_attr "type" "neon_cnt")]
 )
 
-(define_expand "neon_vclz"
-  [(match_operand:VDQIW 0 "s_register_operand")
-   (match_operand:VDQIW 1 "s_register_operand")]
-  "TARGET_NEON"
-{
-  emit_insn (gen_clz2 (operands[0], operands[1]));
-  DONE;
-})
-
 (define_insn "popcount2"
   [(set (match_operand:VE 0 "s_register_operand" "=w")
 (popcount:VE (match_operand:VE 1 "s_register_operand" "w")))]
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index ed1bc293b78..ad1c6edd005 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -556,8 +556,6 @@ (define_c_enum "unspec" [
   VQABSQ_S
   VDUPQ_N_U
   VDUPQ_N_S
-  VCLZQ_U
-  VCLZQ_S
   VCLSQ_S
   VADDVQ_S
   VADDVQ_U
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 2779c1a8aaa..1ba1e5eb008 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -625,3 +625,16 @@ (define_expand "uavg3_ceil"
   operands[0], operands[1], operands[2]));
   DONE;
 })
+
+(define_expand "clz2"
+  [(match_operand:VDQIW 0 "s_register_operand")
+   (match_operand:VDQIW 1 "s_register_operand")]
+  "ARM_HAVE__ARITH
+   && !TARGET_REALLY_IWMMXT"
+{
+  if (TARGET_NEON)
+emit_insn (gen_neon_vclz (operands[0], operands[1]));
+  else
+emit_insn (gen_mve_vclzq_s (mode, operands[0], operands[1]));
+  DONE;
+})
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vclz.c 
b/gcc/testsuite/gcc.target/arm/simd/mve-vclz.c
new file mode 100644
index 000..7068736bc28
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vclz.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+#define FUNC(SIGN, TYPE, BITS, NAME)   \
+  void test_ ## NAME ##_ ## SIGN ## BITS (TYPE##BITS##_t * __restrict__ dest, \
+ TYPE##BITS##_t *a) {  \
+int i; \
+for (i=0; i < (128 / BITS); i++) { \
+  dest[i] = (TYPE##BITS##_t)__builtin_clz(a[i]);   \
+}  \
+}
+
+FUNC(s, int, 32, clz)
+FUNC(u, uint, 32, clz)
+FUNC(s, int, 16, clz)
+FUNC(u, uint, 16, clz)
+FUNC(s, int, 8, clz)
+FUNC(u, uint, 8, clz)
+
+/* 16 and 8-bit versions are not vectorized because they need pack/unpack
+   patterns since __builtin_clz uses 32-bit parameter and return value.  */
+/* { dg-final { scan-assembler-times {vclz\.i32  q[0-9]+, q[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vclz\.i16  q[0-9]+, q[0-9]+} 2 { xfail 
*-*-* } } } */
+/* { dg-final { scan-assembler-times {vclz\.i8  q[0-9]+, q[0-9]+} 2 { xfail 
*-*-* } } } */
-- 
2.25.1



Re: [PATCH 02/57] Support scanning of build-time GC roots in gengtype

2021-06-07 Thread Richard Sandiford via Gcc-patches
Bill Schmidt via Gcc-patches  writes:
> On 5/20/21 5:24 PM, Segher Boessenkool wrote:
>> On Tue, May 11, 2021 at 11:01:22AM -0500, Bill Schmidt wrote:
>>> Hi!  I'd like to ping this specific patch from the series, which is the
>>> only one remaining that affects common code.  I confess that I don't
>>> know whom to ask for a review for gengtype; I didn't get any good ideas
>>> from MAINTAINERS.  If you know of a good reviewer candidate, please CC
>>> them.
>> Richard is listed as the "gen* on machine desc" maintainer, that might
>> be the closest to this.  cc:ed.
>
> Hi, Richard -- any thoughts on this patch?
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568841.html

I don't really know gengtype.c, sorry.  (The gen* thing was for
the md generators, not genmatch and gengtype.)

Richard


Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-07 Thread Trevor Saunders
On Mon, Jun 07, 2021 at 10:51:18AM +0200, Richard Biener via Gcc-patches wrote:
> On Thu, Jun 3, 2021 at 10:29 AM Trevor Saunders  wrote:
> >
> > On Wed, Jun 02, 2021 at 10:04:03AM -0600, Martin Sebor via Gcc-patches 
> > wrote:
> > > On 6/2/21 12:55 AM, Richard Biener wrote:
> > > > On Tue, Jun 1, 2021 at 9:56 PM Martin Sebor  wrote:
> > > > >
> > > > > On 5/27/21 2:53 PM, Jason Merrill wrote:
> > > > > > On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:
> > > > > > > On 4/27/21 8:04 AM, Richard Biener wrote:
> > > > > > > > On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On 4/27/21 1:58 AM, Richard Biener wrote:
> > > > > > > > > > On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > PR 90904 notes that auto_vec is unsafe to copy and assign 
> > > > > > > > > > > because
> > > > > > > > > > > the class manages its own memory but doesn't define (or 
> > > > > > > > > > > delete)
> > > > > > > > > > > either special function.  Since I first ran into the 
> > > > > > > > > > > problem,
> > > > > > > > > > > auto_vec has grown a move ctor and move assignment from
> > > > > > > > > > > a dynamically-allocated vec but still no copy ctor or copy
> > > > > > > > > > > assignment operator.
> > > > > > > > > > >
> > > > > > > > > > > The attached patch adds the two special functions to 
> > > > > > > > > > > auto_vec along
> > > > > > > > > > > with a few simple tests.  It makes auto_vec safe to use 
> > > > > > > > > > > in containers
> > > > > > > > > > > that expect copyable and assignable element types and 
> > > > > > > > > > > passes
> > > > > > > > > > > bootstrap
> > > > > > > > > > > and regression testing on x86_64-linux.
> > > > > > > > > >
> > > > > > > > > > The question is whether we want such uses to appear since 
> > > > > > > > > > those
> > > > > > > > > > can be quite inefficient?  Thus the option is to delete 
> > > > > > > > > > those
> > > > > > > > > > operators?
> > > > > > > > >
> > > > > > > > > I would strongly prefer the generic vector class to have the 
> > > > > > > > > properties
> > > > > > > > > expected of any other generic container: copyable and 
> > > > > > > > > assignable.  If
> > > > > > > > > we also want another vector type with this restriction I 
> > > > > > > > > suggest to add
> > > > > > > > > another "noncopyable" type and make that property explicit in 
> > > > > > > > > its name.
> > > > > > > > > I can submit one in a followup patch if you think we need one.
> > > > > > > >
> > > > > > > > I'm not sure (and not strictly against the copy and assign).  
> > > > > > > > Looking
> > > > > > > > around
> > > > > > > > I see that vec<> does not do deep copying.  Making auto_vec<> 
> > > > > > > > do it
> > > > > > > > might be surprising (I added the move capability to match how 
> > > > > > > > vec<>
> > > > > > > > is used - as "reference" to a vector)
> > > > > > >
> > > > > > > The vec base classes are special: they have no ctors at all 
> > > > > > > (because
> > > > > > > of their use in unions).  That's something we might have to live 
> > > > > > > with
> > > > > > > but it's not a model to follow in ordinary containers.
> > > > > >
> > > > > > I don't think we have to live with it anymore, now that we're 
> > > > > > writing
> > > > > > C++11.
> > > > > >
> > > > > > > The auto_vec class was introduced to fill the need for a 
> > > > > > > conventional
> > > > > > > sequence container with a ctor and dtor.  The missing copy ctor 
> > > > > > > and
> > > > > > > assignment operators were an oversight, not a deliberate feature.
> > > > > > > This change fixes that oversight.
> >
> > I've been away a while, but trying to get back into this, sorry.  It was
> > definitely an oversight to leave these undefined for the compiler to
> > provide a default definition of, but I agree with Richi, the better
> > thing to have done, or do now, would be to mark them as deleted and make
> > auto_vec move only (with copy() for when you really need a deep copy).
> > > > > > >
> > > > > > > The revised patch also adds a copy ctor/assignment to the auto_vec
> > > > > > > primary template (that's also missing it).  In addition, it adds
> > > > > > > a new class called auto_vec_ncopy that disables copying and
> > > > > > > assignment as you prefer.
> > > > > >
> > > > > > Hmm, adding another class doesn't really help with the confusion 
> > > > > > richi
> > > > > > mentions.  And many uses of auto_vec will pass them as vec, which 
> > > > > > will
> > > > > > still do a shallow copy.  I think it's probably better to disable 
> > > > > > the
> > > > > > copy special members for auto_vec until we fix vec<>.
> > > > >
> > > > > There are at least a couple of problems that get in the way of fixing
> > > > > all of vec to act like a well-behaved C++ container:
> > > > >
> > > > > 1) The embedded vec has a trailing "flexible" array member with its

Re: [PATCH] Implement a context aware points-to analyzer for use in evrp.

2021-06-07 Thread Aldy Hernandez via Gcc-patches
Sorry, meant to append... "OK for trunk?" :)

On Mon, Jun 7, 2021 at 12:10 PM Aldy Hernandez  wrote:
>
> The substitute_and_fold_engine which evrp uses is expecting symbolics
> from value_of_expr / value_on_edge / etc, which ranger does not provide.
> In some cases, these provide important folding cues, as in the case of
> aliases for pointers.  For example, legacy evrp may return [, ]
> for the value of "bar" where bar is on an edge where bar == , or
> when bar has been globally set to   This information is then used
> by the subst & fold engine to propagate the known value of bar.
>
> Currently this is a major source of discrepancies between evrp and
> ranger.  Of the 284 cases legacy evrp is getting over ranger, 237 are
> for pointer equality as discussed above.
>
> This patch implements a context aware points-to class which
> ranger-evrp can use to query what a pointer is currently pointing to.
> With it, we reduce the 284 cases legacy evrp is getting to 47.
>
> The API for the points-to analyzer is the following:
>
> class points_to_analyzer
> {
> public:
>   points_to_analyzer (gimple_ranger *r);
>   ~points_to_analyzer ();
>   void enter (basic_block);
>   void leave (basic_block);
>   void visit_stmt (gimple *stmt);
>   tree get_points_to (tree name) const;
> ...
> };
>
> The enter(), leave(), and visit_stmt() methods are meant to be called
> from a DOM walk.   At any point throughout the walk, one can call
> get_points_to() to get whatever an SSA is pointing to.
>
> If this class is useful to others, we could place it in a more generic
> location.
>
> Tested on x86-64 Linux with a regular bootstrap/tests and by comparing
> EVRP folds over ranger before and after this patch.
>
> gcc/ChangeLog:
>
> * gimple-ssa-evrp.c (class ssa_equiv_stack): New.
> (ssa_equiv_stack::ssa_equiv_stack): New.
> (ssa_equiv_stack::~ssa_equiv_stack): New.
> (ssa_equiv_stack::enter): New.
> (ssa_equiv_stack::leave): New.
> (ssa_equiv_stack::push_replacement): New.
> (ssa_equiv_stack::get_replacement): New.
> (is_pointer_ssa): New.
> (class points_to_analyzer): New.
> (points_to_analyzer::points_to_analyzer): New.
> (points_to_analyzer::~points_to_analyzer): New.
> (points_to_analyzer::set_global_points_to): New.
> (points_to_analyzer::set_cond_points_to): New.
> (points_to_analyzer::get_points_to): New.
> (points_to_analyzer::enter): New.
> (points_to_analyzer::leave): New.
> (points_to_analyzer::get_points_to_expr): New.
> (pta_valueize): New.
> (points_to_analyzer::visit_stmt): New.
> (points_to_analyzer::visit_edge): New.
> (hybrid_folder::value_of_expr): Call PTA.
> (hybrid_folder::value_on_edge): Same.
> (hybrid_folder::pre_fold_bb): New.
> (hybrid_folder::post_fold_bb): New.
> (hybrid_folder::pre_fold_stmt): New.
> (rvrp_folder::pre_fold_bb): New.
> (rvrp_folder::post_fold_bb): New.
> (rvrp_folder::pre_fold_stmt): New.
> (rvrp_folder::value_of_expr): Call PTA.
> (rvrp_folder::value_on_edge): Same.
> ---
>  gcc/gimple-ssa-evrp.c | 352 +-
>  1 file changed, 350 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/gimple-ssa-evrp.c b/gcc/gimple-ssa-evrp.c
> index 118d10365a0..6ce32d7b620 100644
> --- a/gcc/gimple-ssa-evrp.c
> +++ b/gcc/gimple-ssa-evrp.c
> @@ -42,6 +42,305 @@ along with GCC; see the file COPYING3.  If not see
>  #include "vr-values.h"
>  #include "gimple-ssa-evrp-analyze.h"
>  #include "gimple-range.h"
> +#include "fold-const.h"
> +
> +// Unwindable SSA equivalence table for pointers.
> +//
> +// The main query point is get_replacement() which returns what a given SSA 
> can
> +// be replaced with in the current scope.
> +
> +class ssa_equiv_stack
> +{
> +public:
> +  ssa_equiv_stack ();
> +  ~ssa_equiv_stack ();
> +  void enter (basic_block);
> +  void leave (basic_block);
> +  void push_replacement (tree name, tree replacement);
> +  tree get_replacement (tree name) const;
> +
> +private:
> +  auto_vec<std::pair <tree, tree>> m_stack;
> +  tree *m_replacements;
> +  const std::pair <tree, tree> m_marker = std::make_pair (NULL, NULL);
> +};
> +
> +ssa_equiv_stack::ssa_equiv_stack ()
> +{
> +  m_replacements = new tree[num_ssa_names] ();
> +}
> +
> +ssa_equiv_stack::~ssa_equiv_stack ()
> +{
> +  m_stack.release ();
> +  delete[] m_replacements;
> +}
> +
> +// Pushes a marker at the given point.
> +
> +void
> +ssa_equiv_stack::enter (basic_block)
> +{
> +  m_stack.safe_push (m_marker);
> +}
> +
> +// Pops the stack to the last marker, while performing replacements along the
> +// way.
> +
> +void
> +ssa_equiv_stack::leave (basic_block)
> +{
> +  gcc_checking_assert (!m_stack.is_empty ());
> +  while (m_stack.last () != m_marker)
> +{
> +  std::pair<tree, tree> e = m_stack.pop ();
> +  m_replacements[SSA_NAME_VERSION (e.first)] = e.second;
> +}
> +  m_stack.pop ();

Re: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-06-07 Thread Richard Sandiford via Gcc-patches
Sorry for the slow response.

Tamar Christina  writes:
> […]
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index 
> 441d6cd28c4eaded7abd756164890dbcffd2f3b8..82123b96313e6783ea214b9259805d65c07d8858
>  100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -201,7 +201,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree var)
>  static bool
>  vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code code,
>  tree itype, tree *vecotype_out,
> -tree *vecitype_out = NULL)
> +tree *vecitype_out = NULL,
> +enum optab_subtype subtype = optab_default)
>  {
>tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
>if (!vecitype)
> @@ -211,7 +212,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo, tree 
> otype, tree_code code,
>if (!vecotype)
>  return false;
>
> -  optab optab = optab_for_tree_code (code, vecitype, optab_default);
> +  optab optab = optab_for_tree_code (code, vecitype, subtype);
>if (!optab)
>  return false;
>
> @@ -487,10 +488,14 @@ vect_joust_widened_integer (tree type, bool shift_p, 
> tree op,
>  }
>
>  /* Return true if the common supertype of NEW_TYPE and *COMMON_TYPE
> -   is narrower than type, storing the supertype in *COMMON_TYPE if so.  */
> +   is narrower than type, storing the supertype in *COMMON_TYPE if so.
> +   If UNPROM_TYPE then accept that *COMMON_TYPE and NEW_TYPE may be of
> +   different signs but equal precision and that the resulting
> +   multiplication of them be compatible with UNPROM_TYPE.   */
>
>  static bool
> -vect_joust_widened_type (tree type, tree new_type, tree *common_type)
> +vect_joust_widened_type (tree type, tree new_type, tree *common_type,
> +tree unprom_type = NULL)
>  {
>if (types_compatible_p (*common_type, new_type))
>  return true;
> @@ -514,7 +519,18 @@ vect_joust_widened_type (tree type, tree new_type, tree 
> *common_type)
>unsigned int precision = MAX (TYPE_PRECISION (*common_type),
> TYPE_PRECISION (new_type));
>precision *= 2;
> -  if (precision * 2 > TYPE_PRECISION (type))
> +
> +  /* Check if the mismatch is only in the sign and if we have
> + UNPROM_TYPE then allow it if there is enough precision to
> + not lose any information during the conversion.  */
> +  if (unprom_type
> +  && TYPE_SIGN (unprom_type) == SIGNED
> +  && tree_nop_conversion_p (*common_type, new_type))
> +   return true;
> +
> +  /* The resulting application is unsigned, check if we have enough
> + precision to perform the operation.  */
> +  if (precision * 2 > TYPE_PRECISION (unprom_type ? unprom_type : type))
>  return false;
>
>*common_type = build_nonstandard_integer_type (precision, false);
> @@ -532,6 +548,10 @@ vect_joust_widened_type (tree type, tree new_type, tree 
> *common_type)
> to a type that (a) is narrower than the result of STMT_INFO and
> (b) can hold all leaf operand values.
>
> +   If UNPROM_TYPE then allow that the signs of the operands
> +   may differ in signs but not in precision and that the resulting type
> +   of the operation on the operands is compatible with UNPROM_TYPE.
> +
> Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
> exists.  */
>
> @@ -539,7 +559,8 @@ static unsigned int
>  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code 
> code,
>   tree_code widened_code, bool shift_p,
>   unsigned int max_nops,
> - vect_unpromoted_value *unprom, tree *common_type)
> + vect_unpromoted_value *unprom, tree *common_type,
> + tree unprom_type = NULL)
>  {
>/* Check for an integer operation with the right code.  */
>gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> @@ -600,7 +621,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info 
> stmt_info, tree_code code,
> = vinfo->lookup_def (this_unprom->op);
>   nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
>widened_code, shift_p, max_nops,
> -  this_unprom, common_type);
> +  this_unprom, common_type,
> +  unprom_type);
>   if (nops == 0)
> return 0;
>
> @@ -617,7 +639,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info 
> stmt_info, tree_code code,
>   if (i == 0)
> *common_type = this_unprom->type;
>   else if (!vect_joust_widened_type (type, this_unprom->type,
> -common_type))
> +common_type, unprom_type))
> return 0;
> 

[PATCH] Implement a context aware points-to analyzer for use in evrp.

2021-06-07 Thread Aldy Hernandez via Gcc-patches
The substitute_and_fold_engine which evrp uses is expecting symbolics
from value_of_expr / value_on_edge / etc, which ranger does not provide.
In some cases, these provide important folding cues, as in the case of
aliases for pointers.  For example, legacy evrp may return [, ]
for the value of "bar" where bar is on an edge where bar == , or
when bar has been globally set to   This information is then used
by the subst & fold engine to propagate the known value of bar.

Currently this is a major source of discrepancies between evrp and
ranger.  Of the 284 cases legacy evrp is getting over ranger, 237 are
for pointer equality as discussed above.

This patch implements a context aware points-to class which
ranger-evrp can use to query what a pointer is currently pointing to.
With it, we reduce the 284 cases legacy evrp is getting to 47.

The API for the points-to analyzer is the following:

class points_to_analyzer
{
public:
  points_to_analyzer (gimple_ranger *r);
  ~points_to_analyzer ();
  void enter (basic_block);
  void leave (basic_block);
  void visit_stmt (gimple *stmt);
  tree get_points_to (tree name) const;
...
};

The enter(), leave(), and visit_stmt() methods are meant to be called
from a DOM walk.  At any point throughout the walk, one can call
get_points_to() to get whatever an SSA name is pointing to.
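
The enter/leave discipline can be illustrated with a small standalone
sketch.  This is not the code from the patch: scoped_table and its
members are hypothetical names, slots are indexed by a plain integer
standing in for an SSA version, and the stored value is an int standing
in for the replacement tree.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy unwindable table.  A value of 0 means "no entry".
class scoped_table
{
public:
  explicit scoped_table (std::size_t n) : m_values (n, 0) {}

  // Open a scope: push a marker so leave () knows where to stop.
  void enter () { m_undo.push_back (marker ()); }

  // Close a scope: restore every slot changed since the last marker.
  void leave ()
  {
    assert (!m_undo.empty ());
    while (!m_undo.empty () && m_undo.back () != marker ())
      {
        entry e = m_undo.back ();
        m_undo.pop_back ();
        m_values[e.first] = e.second;
      }
    if (!m_undo.empty ())
      m_undo.pop_back ();	// Drop the marker itself.
  }

  // Record the old value on the undo stack, then overwrite the slot.
  void set (std::size_t id, int value)
  {
    assert (id < m_values.size ());
    m_undo.push_back (std::make_pair (id, m_values[id]));
    m_values[id] = value;
  }

  int get (std::size_t id) const { return m_values.at (id); }

private:
  typedef std::pair<std::size_t, int> entry;

  // An id of (size_t) -1 can never be a valid slot, so it marks a scope.
  static entry marker ()
  { return std::make_pair (static_cast<std::size_t> (-1), 0); }

  std::vector<int> m_values;
  std::vector<entry> m_undo;
};
```

In a dominator walk, enter () would be called before visiting a block
and leave () after its dominated children are done, so anything recorded
inside a block is visible only in the blocks it dominates.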

If this class is useful to others, we could place it in a more generic
location.

Tested on x86-64 Linux with a regular bootstrap/tests and by comparing
EVRP folds over ranger before and after this patch.

gcc/ChangeLog:

* gimple-ssa-evrp.c (class ssa_equiv_stack): New.
(ssa_equiv_stack::ssa_equiv_stack): New.
(ssa_equiv_stack::~ssa_equiv_stack): New.
(ssa_equiv_stack::enter): New.
(ssa_equiv_stack::leave): New.
(ssa_equiv_stack::push_replacement): New.
(ssa_equiv_stack::get_replacement): New.
(is_pointer_ssa): New.
(class points_to_analyzer): New.
(points_to_analyzer::points_to_analyzer): New.
(points_to_analyzer::~points_to_analyzer): New.
(points_to_analyzer::set_global_points_to): New.
(points_to_analyzer::set_cond_points_to): New.
(points_to_analyzer::get_points_to): New.
(points_to_analyzer::enter): New.
(points_to_analyzer::leave): New.
(points_to_analyzer::get_points_to_expr): New.
(pta_valueize): New.
(points_to_analyzer::visit_stmt): New.
(points_to_analyzer::visit_edge): New.
(hybrid_folder::value_of_expr): Call PTA.
(hybrid_folder::value_on_edge): Same.
(hybrid_folder::pre_fold_bb): New.
(hybrid_folder::post_fold_bb): New.
(hybrid_folder::pre_fold_stmt): New.
(rvrp_folder::pre_fold_bb): New.
(rvrp_folder::post_fold_bb): New.
(rvrp_folder::pre_fold_stmt): New.
(rvrp_folder::value_of_expr): Call PTA.
(rvrp_folder::value_on_edge): Same.
---
 gcc/gimple-ssa-evrp.c | 352 +-
 1 file changed, 350 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-ssa-evrp.c b/gcc/gimple-ssa-evrp.c
index 118d10365a0..6ce32d7b620 100644
--- a/gcc/gimple-ssa-evrp.c
+++ b/gcc/gimple-ssa-evrp.c
@@ -42,6 +42,305 @@ along with GCC; see the file COPYING3.  If not see
 #include "vr-values.h"
 #include "gimple-ssa-evrp-analyze.h"
 #include "gimple-range.h"
+#include "fold-const.h"
+
+// Unwindable SSA equivalence table for pointers.
+//
+// The main query point is get_replacement() which returns what a given SSA can
+// be replaced with in the current scope.
+
+class ssa_equiv_stack
+{
+public:
+  ssa_equiv_stack ();
+  ~ssa_equiv_stack ();
+  void enter (basic_block);
+  void leave (basic_block);
+  void push_replacement (tree name, tree replacement);
+  tree get_replacement (tree name) const;
+
+private:
+  auto_vec<std::pair <tree, tree>> m_stack;
+  tree *m_replacements;
+  const std::pair <tree, tree> m_marker = std::make_pair (NULL, NULL);
+};
+
+ssa_equiv_stack::ssa_equiv_stack ()
+{
+  m_replacements = new tree[num_ssa_names] ();
+}
+
+ssa_equiv_stack::~ssa_equiv_stack ()
+{
+  m_stack.release ();
+  delete[] m_replacements;
+}
+
+// Pushes a marker at the given point.
+
+void
+ssa_equiv_stack::enter (basic_block)
+{
+  m_stack.safe_push (m_marker);
+}
+
+// Pops the stack to the last marker, while performing replacements along the
+// way.
+
+void
+ssa_equiv_stack::leave (basic_block)
+{
+  gcc_checking_assert (!m_stack.is_empty ());
+  while (m_stack.last () != m_marker)
+{
+  std::pair<tree, tree> e = m_stack.pop ();
+  m_replacements[SSA_NAME_VERSION (e.first)] = e.second;
+}
+  m_stack.pop ();
+}
+
+// Set the equivalence of NAME to REPLACEMENT.
+
+void
+ssa_equiv_stack::push_replacement (tree name, tree replacement)
+{
+  tree old = m_replacements[SSA_NAME_VERSION (name)];
+  m_replacements[SSA_NAME_VERSION (name)] = replacement;
+  m_stack.safe_push (std::make_pair (name, old));
+}
+
+// Return the equivalence of NAME.
+
+tree

[wwwdocs] Add HTML anchors to GCC 5 "porting to" notes

2021-06-07 Thread Jonathan Wakely via Gcc-patches
This adds id attributes to the heading elements of
https://gcc.gnu.org/gcc-5/porting_to.html (so that I can link directly
to the section on inline functions).

All later porting_to.html notes have anchors like this.

OK for wwwdocs?


commit 9f1723bc1d6b52fcfcfc4a8aee93eada98412e78
Author: Jonathan Wakely 
Date:   Mon Jun 7 11:00:22 2021 +0100

Add HTML anchors to GCC 5 "porting to" notes

diff --git a/htdocs/gcc-5/porting_to.html b/htdocs/gcc-5/porting_to.html
index 7d629e78..c359dfb2 100644
--- a/htdocs/gcc-5/porting_to.html
+++ b/htdocs/gcc-5/porting_to.html
@@ -28,7 +28,7 @@ manner. Additions and suggestions for improvement are welcome.
 
 
 
-Preprocessor issues
+Preprocessor issues
 
 The preprocessor started to emit line markers to properly distinguish
 whether a macro token comes from a system header, or from a normal header
@@ -65,9 +65,9 @@ Observe how the exitfailure and 1 
tokens
 are not on the same line anymore.
 
 
-C language issues
+C language issues
 
-Default standard is now GNU11
+Default standard is now GNU11
 
 GCC defaults to -std=gnu11 instead of -std=gnu89.
 This brings several changes that users should be aware of.  The following
@@ -84,7 +84,7 @@ former warns about features not present in ISO C90, but 
present in ISO C99.
 The latter warns about features not present in ISO C99, but present in
 ISO C11.  See the GCC manual for more info.
 
-Different semantics for inline functions
+Different semantics for inline functions
 
 While -std=gnu89 employs the GNU89 inline semantics,
 -std=gnu11 uses the C99 inline semantics.  The C99 inline 
semantics
@@ -184,7 +184,7 @@ standard due to multiple definition errors:
   }
 
 
-Some warnings are enabled by default
+Some warnings are enabled by default
 
 The C99 mode enables some warnings by default.  For instance, GCC warns
 about missing declarations of functions:
@@ -267,7 +267,7 @@ returning no value in a function returning non-void:
 The fix is either to specify a proper return value, or to declare the return
 type of foo as void.
 
-Initializing statics with compound literals
+Initializing statics with compound literals
 
 Previously, initializing objects with static storage duration with compound
 literals was only allowed in the GNU89 mode.  This restriction has been lifted
@@ -296,7 +296,7 @@ snippet is an example of such initialization:
^
 
 
-__STDC_VERSION__ macro
+__STDC_VERSION__ macro
 
 As the default mode changed to C11, the __STDC_VERSION__
 standard macro, introduced in C95, is now defined by default, and has
@@ -314,7 +314,7 @@ the value 201112L.
 
 You can check the macro using gcc -dM -E -std=gnu11 - < /dev/null 
| grep STDC_VER.
 
-Different meaning of the %a *scanf conversion 
specification
+Different meaning of the %a *scanf conversion 
specification
 
 In C89, the GNU C library supports dynamic allocation via the 
%as,
 %aS, and %a[...] conversion specifications; see
@@ -346,7 +346,7 @@ This is a change in semantics, and in combination with the
 m as a length modifier as per POSIX.1-2008.  That is, use
 %ms or %m[...].
 
-New warnings
+New warnings
 
 Several new warnings have been added to the C front end.  Among others
 -Wpedantic now warns about non-standard predefined identifiers.
@@ -374,9 +374,9 @@ For instance:
 
 
 
-C++ language issues
+C++ language issues
 
-Converting std::nullptr_t to bool
+Converting std::nullptr_t to 
bool
 
 Converting std::nullptr_t to bool in C++11
 mode now requires direct-initialization.  This has been changed in
@@ -397,7 +397,7 @@ mode now requires direct-initialization.  This has been 
changed in
 It is recommended to use the false keyword instead of
 converting nullptr to bool.
 
-Return by converting move constructor
+Return by converting move constructor
 
 GCC 5 implements
 http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1579;>DR 
1579
@@ -430,7 +430,7 @@ using return X(y); or
 
 
 
-Links
+Links
 
 
 Marek Polacek and Jakub Jelinek,


[wwwdocs] Document libstdc++ support for generic unordered lookup

2021-06-07 Thread Jonathan Wakely via Gcc-patches
This should have been in the GCC 11 release notes, pushed now.

commit bbbf05e7f3d3e7a74d23e8cc252ebae895158abe
Author: Jonathan Wakely 
Date:   Mon Jun 7 10:27:00 2021 +0100

Document libstdc++ support for generic unordered lookup

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index ef533e31..97606174 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -526,6 +526,7 @@ You may also want to check out our
 and semaphore
   syncstream
   Efficient access to basic_stringbuf's buffer.
+  Heterogeneous lookup in unordered containers.
 
   
   Experimental C++23 support, including:


Re: [PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-06-07 Thread Richard Biener via Gcc-patches
On Thu, Jun 3, 2021 at 10:29 AM Trevor Saunders  wrote:
>
> On Wed, Jun 02, 2021 at 10:04:03AM -0600, Martin Sebor via Gcc-patches wrote:
> > On 6/2/21 12:55 AM, Richard Biener wrote:
> > > On Tue, Jun 1, 2021 at 9:56 PM Martin Sebor  wrote:
> > > >
> > > > On 5/27/21 2:53 PM, Jason Merrill wrote:
> > > > > On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:
> > > > > > On 4/27/21 8:04 AM, Richard Biener wrote:
> > > > > > > On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On 4/27/21 1:58 AM, Richard Biener wrote:
> > > > > > > > > On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > PR 90904 notes that auto_vec is unsafe to copy and assign 
> > > > > > > > > > because
> > > > > > > > > > the class manages its own memory but doesn't define (or 
> > > > > > > > > > delete)
> > > > > > > > > > either special function.  Since I first ran into the 
> > > > > > > > > > problem,
> > > > > > > > > > auto_vec has grown a move ctor and move assignment from
> > > > > > > > > > a dynamically-allocated vec but still no copy ctor or copy
> > > > > > > > > > assignment operator.
> > > > > > > > > >
> > > > > > > > > > The attached patch adds the two special functions to 
> > > > > > > > > > auto_vec along
> > > > > > > > > > with a few simple tests.  It makes auto_vec safe to use in 
> > > > > > > > > > containers
> > > > > > > > > > that expect copyable and assignable element types and passes
> > > > > > > > > > bootstrap
> > > > > > > > > > and regression testing on x86_64-linux.
> > > > > > > > >
> > > > > > > > > The question is whether we want such uses to appear since 
> > > > > > > > > those
> > > > > > > > > can be quite inefficient?  Thus the option is to delete those
> > > > > > > > > operators?
> > > > > > > >
> > > > > > > > I would strongly prefer the generic vector class to have the 
> > > > > > > > properties
> > > > > > > > expected of any other generic container: copyable and 
> > > > > > > > assignable.  If
> > > > > > > > we also want another vector type with this restriction I 
> > > > > > > > suggest to add
> > > > > > > > another "noncopyable" type and make that property explicit in 
> > > > > > > > its name.
> > > > > > > > I can submit one in a followup patch if you think we need one.
> > > > > > >
> > > > > > > I'm not sure (and not strictly against the copy and assign).  
> > > > > > > Looking
> > > > > > > around
> > > > > > > I see that vec<> does not do deep copying.  Making auto_vec<> do 
> > > > > > > it
> > > > > > > might be surprising (I added the move capability to match how 
> > > > > > > vec<>
> > > > > > > is used - as "reference" to a vector)
> > > > > >
> > > > > > The vec base classes are special: they have no ctors at all (because
> > > > > > of their use in unions).  That's something we might have to live 
> > > > > > with
> > > > > > but it's not a model to follow in ordinary containers.
> > > > >
> > > > > I don't think we have to live with it anymore, now that we're writing
> > > > > C++11.
> > > > >
> > > > > > The auto_vec class was introduced to fill the need for a 
> > > > > > conventional
> > > > > > sequence container with a ctor and dtor.  The missing copy ctor and
> > > > > > assignment operators were an oversight, not a deliberate feature.
> > > > > > This change fixes that oversight.
>
> I've been away a while, but trying to get back into this, sorry.  It was
> definitely an oversight to leave these undefined for the compiler to
> provide a default definition of, but I agree with Richi, the better
> thing to have done, or do now, would be to mark them as deleted and make
> auto_vec move only (with copy() for when you really need a deep copy).
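
To make the "move only, with an explicit copy()" shape concrete, here is
a rough standalone sketch.  It is not auto_vec: move_only_vec is a
hypothetical name and it wraps std::vector purely to keep the example
short.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

template <typename T>
class move_only_vec
{
public:
  move_only_vec () = default;

  // Copying is disabled, so an accidental deep copy is a compile error.
  move_only_vec (const move_only_vec &) = delete;
  move_only_vec &operator= (const move_only_vec &) = delete;

  // Moving transfers ownership of the storage.
  move_only_vec (move_only_vec &&) = default;
  move_only_vec &operator= (move_only_vec &&) = default;

  void push (const T &v) { m_data.push_back (v); }
  std::size_t length () const { return m_data.size (); }

  const T &operator[] (std::size_t i) const
  {
    assert (i < m_data.size ());
    return m_data[i];
  }

  // The one explicit way to duplicate the elements.
  move_only_vec copy () const
  {
    move_only_vec r;
    r.m_data = m_data;
    return r;
  }

private:
  std::vector<T> m_data;
};

static_assert (!std::is_copy_constructible<move_only_vec<int> >::value,
	       "copy construction must be rejected at compile time");
static_assert (std::is_move_constructible<move_only_vec<int> >::value,
	       "move construction must remain available");
```

With this shape a deep copy is always spelled out at the call site as
v.copy (), which is the visibility argument made above.
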
> > > > > >
> > > > > > The revised patch also adds a copy ctor/assignment to the auto_vec
> > > > > > primary template (that's also missing it).  In addition, it adds
> > > > > > a new class called auto_vec_ncopy that disables copying and
> > > > > > assignment as you prefer.
> > > > >
> > > > > Hmm, adding another class doesn't really help with the confusion richi
> > > > > mentions.  And many uses of auto_vec will pass them as vec, which will
> > > > > still do a shallow copy.  I think it's probably better to disable the
> > > > > copy special members for auto_vec until we fix vec<>.
> > > >
> > > > There are at least a couple of problems that get in the way of fixing
> > > > all of vec to act like a well-behaved C++ container:
> > > >
> > > > 1) The embedded vec has a trailing "flexible" array member with its
> > > > instances having different size.  They're initialized by memset and
> > > > copied by memcpy.  The class can't have copy ctors or assignments
> > > > but it should disable/delete them instead.
> > > >
> > > > 2) The heap-based vec is used throughout GCC with the assumption of
> > > > shallow copy semantics (not just as function arguments but also as
> > > > members of other such POD 

Re: [PATCH][RFC] Sparse on entry cache for Ranger.

2021-06-07 Thread Richard Biener via Gcc-patches
On Wed, Jun 2, 2021 at 11:15 PM Andrew MacLeod  wrote:
>
> As mentioned earlier, I abstracted the on-entry cache at the beginning
> of stage1. This was to make it easier to port future changes back to
> GCC11 so we could provide alternate representations to deal with memory
> issues, or what have you.
>
> This patch introduces a sparse representation of the cache which is
> used  when the number of basic blocks gets too large.
>
> I commandeered the bitmap code since it's efficient and has been working
> a long time, and added 2 routines to get and set 4 bits (quads) at a
> time.  This allows me to use a bitmap like it's a sparse array which can
> contain a value between 0 and 15, and is conveniently pre-initialized to
> values of 0 at no cost :-)   This is then used as an index into a small
> local table to store ranges for the name.  It's limiting in that an
> ssa-name will not be able to have more than 15 unique ranges, but this
> happens in less than 8% of all cases in the data I collected, and most
> of those are switches.  Any ranges after the 15 slots are full revert to
> VARYING.  The values for VARYING and UNDEFINED are pre-populated, and
> for pointers, I also pre-populate [0,0] and ~[0, 0].
>
> This also adds --param=evrp-sparse-threshold=  which allows the
> threshold between the full vector and this new sparse representation to
> be changed. It defaults to a value of 800. I've done various performance
> runs, and this seems to be a reasonably balanced number. In fact it's a
> 28% improvement in EVRP compile time over 390 files from a gcc bootstrap
> with minimal impact on missed opportunities.
>
> I've also tried to see if using less than 15 values has any significant
> effect (The lookup is linear when setting), but it does not appear to.
>
> I've also bootstrapped with the sparse threshold at 0 to ensure there
> aren't any issues.
>
> My thoughts are I would put this into trunk, and assuming nothing comes
> up  over the next couple of days, port it back to GCC11 to resolve
> 100299 and other excessive memory consumption PRs there as well. Given
> that it's reusing bitmap code for the sparse representation, it seems
> like it would be low risk.
>
> Are we OK with the addition of the bitmap_get_quad and bitmap_set_quad
> routines in bitmap.c?  It seems like they might be useful to others.
> They are simple tweaks of bitmap_set_bit and bitmap_bit_p.. just dealing
> with 4 bits at a time.  I could make them local if this is a problem,
> but I don't have access to the bitmap internals there.

I think _quad is a bit too specific - it's aligned chunks so maybe

void bitmap_set_aligned_chunk (bitmap, unsigned int chunk, unsigned
int chunk_size, BITMAP_WORD chunk_value);

and

BITMAP_WORD bitmap_get_aligned_chunk (bitmap, unsigned int chunk,
unsigned chunk_size);

and assert that chunk_size is power-of-two and fits in BITMAP_WORD?

(also note using unsigned ints and BITMAP_WORD for the data type)

I've been using two-bit representations in a few places (but mostly
setting/testing the
respective bits independently), I suppose for example

static dep_state
query_loop_dependence (class loop *loop, im_mem_ref *ref, dep_kind kind)
{
  unsigned first_bit = 6 * loop->num + kind * 2;
  if (bitmap_bit_p (&ref->dep_loop, first_bit))
    return dep_independent;
  else if (bitmap_bit_p (&ref->dep_loop, first_bit + 1))
    return dep_dependent;

could use a chunk size of 2 and a single bitmap query.  Incidentally this
specific code uses 6 bits, so it's not fully aligned ...

/* We use six bits per loop in the ref->dep_loop bitmap to record
   the dep_kind x dep_state combinations.  */

enum dep_kind { lim_raw, sm_war, sm_waw };
enum dep_state { dep_unknown, dep_independent, dep_dependent };

... but there's also at most a single bit set.

Anyway, I'm OK with adding API to access aligned power-of-two sized chunks.
Even non-power-of-two sized unaligned chunks should be quite straightforward
to implement if we limit the chunk size to BITMAP_WORD by
simply advancing to the next bitmap word / element when necessary.

An alternative low-level API would provide accesses to whole BITMAP_WORD
entries and the quads could be implemented on top of that
(bitmap_set_word/_get_word)
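
To make the proposed interface concrete, here is a minimal sketch of aligned power-of-two chunk access.  It is only an illustration under stated assumptions: it works on a flat std::vector of words rather than GCC's linked bitmap elements, word_t is a hypothetical stand-in for BITMAP_WORD, and the function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for BITMAP_WORD (an assumption, not GCC's definition).
using word_t = std::uint64_t;
constexpr unsigned WORD_BITS = 64;

// Mask covering the low chunk_size bits; also checks the size constraints.
static word_t chunk_mask(unsigned chunk_size)
{
  // chunk_size must be a power of two that fits in one word.
  assert(chunk_size != 0 && (chunk_size & (chunk_size - 1)) == 0);
  assert(chunk_size <= WORD_BITS);
  return chunk_size == WORD_BITS ? ~word_t(0)
                                 : (word_t(1) << chunk_size) - 1;
}

// Store VALUE into aligned chunk number CHUNK of CHUNK_SIZE bits.
void set_aligned_chunk(std::vector<word_t> &bits, unsigned chunk,
                       unsigned chunk_size, word_t value)
{
  const word_t mask = chunk_mask(chunk_size);
  assert((value & ~mask) == 0);
  const std::uint64_t first_bit = std::uint64_t(chunk) * chunk_size;
  const std::size_t word = first_bit / WORD_BITS;
  const unsigned shift = first_bit % WORD_BITS;
  if (word >= bits.size())
    bits.resize(word + 1, 0);  // grow on demand; new words read as zero
  // Because chunk_size divides WORD_BITS, a chunk never straddles two words.
  bits[word] = (bits[word] & ~(mask << shift)) | (value << shift);
}

// Read aligned chunk number CHUNK; chunks never written read as zero.
word_t get_aligned_chunk(const std::vector<word_t> &bits, unsigned chunk,
                         unsigned chunk_size)
{
  const word_t mask = chunk_mask(chunk_size);
  const std::uint64_t first_bit = std::uint64_t(chunk) * chunk_size;
  const std::size_t word = first_bit / WORD_BITS;
  const unsigned shift = first_bit % WORD_BITS;
  if (word >= bits.size())
    return 0;
  return (bits[word] >> shift) & mask;
}
```

With chunk_size 4 this behaves like the "quad" accessors described in the quoted mail: every slot holds a value from 0 to 15 and unset slots read back as 0.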

Richard.

> Bootstraps on x86_64-pc-linux-gnu with no regressions.
>
> Andrew
>
> PS in PR100299 we spend a fraction of a second in EVRP now.
>
>
>


Re: [PATCH] Reformat target.def for better parsing.

2021-06-07 Thread Martin Liška

On 6/7/21 8:14 AM, Richard Biener wrote:

Hmm, what's the problem with parsing the strings?  The newlines are
only because of our line length limits and hard-coding them looks both
error-prone and ugly.


For the future Sphinx conversion, I need to replace the content of the hook
definitions in .def files with RST format (as opposed to the current TEXI
format). I did so by putting placeholders in genhooks.c and then parsing the
created .rst file.

When there are no newlines, I can't split the generated .rst content into
multiple lines as defined in the .def files.

So my patch only makes it consistent, as most of the hooks use '\n\' line
endings.

Does it make sense?
Martin



Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]

2021-06-07 Thread Richard Biener via Gcc-patches
On Wed, Jun 2, 2021 at 10:55 PM Victor Tong  wrote:
>
> Hi Richard,
>
> Thanks for reviewing my patch. I did a search online and you're right -- 
> there isn't a vector modulo instruction. I'll remove the X * (Y / X) --> Y - 
> (Y % X) pattern and the existing X - (X / Y) * Y --> X % Y from triggering on 
> vector types.
>
> I looked into why the following pattern isn't triggering:
>
>   (simplify
>(minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
>(view_convert @1))
>
> The nop_converts expand into tree_nop_conversion_p checks. In fn2() of the 
> testsuite/gcc.dg/fold-minus-6.c, the expression during generic matching looks 
> like:
>
> 42 - (long int) (42 - 42 % x)
>
> When looking at the right-hand side of the expression (the (long int) (42 - 
> 42 % x)), the tree_nop_conversion_p check fails because of the type precision 
> difference. The expression inside of the cast has a 32-bit precision and the 
> outer expression has a 64-bit precision.
>
> I looked around at other patterns and it seems like nop_convert and 
> view_convert are used because of underflow/overflow concerns. I'm not 
> familiar with the two constructs. What's the difference between using them 
> and checking TYPE_OVERFLOW_UNDEFINED? In the scenario above, since 
> TYPE_OVERFLOW_UNDEFINED is true, the second pattern that I added (X - (X - Y) 
> --> Y) gets triggered.

But TYPE_OVERFLOW_UNDEFINED is not a good condition here since the
conversion is the problematic one and
conversions have implementation defined behavior.  Now, the above does
not match because it wasn't designed to,
and for non-constant '42' it would have needed a (convert ...) around
the first @0 as well (matching of constants is
by value, not by value + type).

That said, your

+/* X - (X - Y) --> Y */
+(simplify
+ (minus (convert1? @0) (convert2? (minus @@0 @1)))
+ (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) &&
TYPE_OVERFLOW_UNDEFINED(type))
+  (convert @1)))

would match (int)x - (int)(x - y) where you assert the outer subtract
has undefined behavior
on overflow but the inner subtract could wrap and the (int) conversion
can be truncating
or widening.  Is that really always a valid transform then?
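
As a concrete illustration of the concern, the following sketch evaluates both sides of the fold for one widening case.  It is my own example, not taken from the patch or its testsuite, and it assumes the pattern would match with an unsigned 32-bit inner subtract and a signed 64-bit outer type; the function names are hypothetical.

```cpp
#include <cstdint>

// Original expression (T)x - (T)(x - y), with T = int64_t and the inner
// subtraction done in a wrapping unsigned 32-bit type.
std::int64_t before_fold(std::uint32_t x, std::uint32_t y)
{
  const std::uint32_t inner = static_cast<std::uint32_t>(x - y);  // wraps mod 2^32
  return static_cast<std::int64_t>(x) - static_cast<std::int64_t>(inner);
}

// What the proposed simplification would produce: (T)y.
std::int64_t after_fold(std::uint32_t /*x*/, std::uint32_t y)
{
  return static_cast<std::int64_t>(y);
}
```

For x = 5, y = 3 both functions return 3.  For x = 0, y = 1 the inner subtract wraps to 4294967295, so before_fold returns -4294967295 while after_fold returns 1, and the outer subtraction never overflows.  That suggests the transform is not valid for widening conversions when the inner subtract can wrap.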

Richard.

> Thanks,
> Victor
>
>
> From: Richard Biener 
> Sent: Tuesday, April 27, 2021 1:29 AM
> To: Victor Tong 
> Cc: gcc-patches@gcc.gnu.org 
> Subject: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed 
> by multiply [PR95176]
>
> On Thu, Apr 1, 2021 at 1:03 AM Victor Tong via Gcc-patches
>  wrote:
> >
> > Hello,
> >
> > This patch fixes PR tree-optimization/95176. A new pattern in match.pd was 
> > added to transform "a * (b / a)" --> "b - (b % a)". A new test case was 
> > also added to cover this scenario.
> >
> > The new pattern interfered with the existing pattern of "X - (X / Y) * Y". 
> > In some cases (such as in fn4() in gcc/testsuite/gcc.dg/fold-minus-6.c), 
> > the new pattern is applied causing the existing pattern to no longer apply. 
> > This results in worse code generation because the expression is left as "X 
> > - (X - Y)". An additional subtraction pattern of "X - (X - Y) --> Y" was 
> > added to this patch to avoid this regression.
> >
> > I also didn't remove the existing pattern because it triggered in more 
> > cases than the new pattern because of a tree_invariant_p check that's 
> > inserted by genmatch for the new pattern.
>
> Yes, we do not handle using Y multiple times when it might contain
> side-effects in GENERIC folding
> (comments in genmatch suggest we can use save_expr but we don't
> implement this [anymore]).
>
> On GIMPLE there's also the issue that your new pattern creates a
> complex expression which
> makes it fail to be used by value-numbering for example where the
> old pattern was OK
> (eventually, if no conversion was required).
>
> So indeed it looks OK to preserve both.
>
> I wonder why you needed the
>
> +/* X - (X - Y) --> Y */
> +(simplify
> + (minus (convert1? @0) (convert2? (minus @@0 @1)))
> + (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) &&
> TYPE_OVERFLOW_UNDEFINED(type))
> +  (convert @1)))
>
> pattern since it should be handled by
>
>   /* Match patterns that allow contracting a plus-minus pair
>  irrespective of overflow issues.  */
>   /* (A +- B) - A   ->  +- B */
>   /* (A +- B) -+ B  ->  A */
>   /* A - (A +- B)   -> -+ B */
>   /* A +- (B -+ A)  ->  +- B */
>
> in particular
>
>   (simplify
>(minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
>(view_convert @1))
>
> if there's supported cases missing I'd rather extend this pattern than
> replicating it.
>
> +/* X * (Y / X) is the same as Y - (Y % X).  */
> +(simplify
> + (mult:c (convert1? @0) (convert2? (trunc_div @1 @@0)))
> + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
> +  (minus (convert @1) (convert (trunc_mod @1 @0)
>
> note that if you're allowing vector types you have to use
> (view_convert ...) in the
> transform and you 

Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-07 Thread Richard Biener
On Thu, 3 Jun 2021, Qing Zhao wrote:

> Hi, Richard,
> 
> 
> On May 26, 2021, at 6:18 AM, Richard Biener <rguent...@suse.de> wrote:
> 
> On Wed, 12 May 2021, Qing Zhao wrote:
> 
> Hi,
> 
> This is the 3rd version of the patch for the new security feature for GCC.
> 
> Please take a look and let me know your comments and suggestions.
> 
> 
> +/* Returns true when the given TYPE has padding inside it.
> +   return false otherwise.  */
> +bool
> +type_has_padding (tree type)
> +{
> +  switch (TREE_CODE (type))
> +{
> +case RECORD_TYPE:
> +  {
> 
> btw, there's __builtin_clear_padding and a whole machinery around
> it in gimple-fold.c, I'm sure that parts could be re-used if they
> are necessary in the end.
> 
> To address the above suggestion:
> 
> My study shows: the call to __builtin_clear_padding is expanded during 
> gimplification phase.
> And there is no __builtin_clear_padding expanding during rtx expanding phase.
> However, for -ftrivial-auto-var-init, padding initialization should be done 
> both in gimplification phase and rtx expanding phase.
> since the __builtin_clear_padding might not be good for rtx expanding, 
> reusing __builtin_clear_padding might not work.
> 
> Let me know if you have any more comments on this.

Yes, I didn't suggest to literally emit calls to __builtin_clear_padding 
but instead to leverage the lowering code, more specifically share the
code that figures _what_ is to be initialized (where the padding is)
and eventually the actual code generation pieces.  That might need some
refactoring but the code where padding resides should be present only
a single time (since it's quite complex).

Which is also why I suggested to split out the padding initialization
bits to a separate patch (and option).

Richard.


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-07 Thread Richard Biener
On Thu, 3 Jun 2021, Qing Zhao wrote:

> Hi, Richard,
> 
> For the following, I need more clarification:
> 
> 
> 
> +/* Expand the IFN_DEFERRED_INIT function according to its second
> argument.  */
> +static void
> +expand_DEFERRED_INIT (internal_fn, gcall *stmt)
> +{
> +  tree var = gimple_call_lhs (stmt);
> +  tree init = NULL_TREE;
> +  enum auto_init_type init_type
> += (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
> +
> +  switch (init_type)
> +{
> +default:
> +  gcc_unreachable ();
> +case AUTO_INIT_PATTERN:
> +  init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
> +  expand_assignment (var, init, false);
> +  break;
> +case AUTO_INIT_ZERO:
> +  init = build_zero_cst (TREE_TYPE (var));
> +  expand_assignment (var, init, false);
> +  break;
> +}
> 
> I think actually building build_pattern_cst_for_auto_init can generate
> massive garbage and for big auto vars code size is also a concern and
> ideally on x86 you'd produce rep movq.  So I don't think going
> via expand_assignment is good.  Instead you possibly want to lower
> .DEFERRED_INIT to MEMs following expand_builtin_memset and
> eventually enhance that to allow storing pieces larger than a byte.
> 
> 
> I will lower .DEFERRED_INIT to MEMs following expand_builtin_memset for 
> “AUTO_INIT_PATTERN”.
> My question is:
> Do I need to do the same for “AUTO_INIT_ZERO”?

No, the representation for a general "zero constant" for aggregates is
just a single CONSTRUCTOR node with zero explicit elements and thus
quite optimal.
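
For the pattern case, the suggestion quoted above about storing pieces larger than a byte can be sketched at source level as follows.  This is only an analogy written under my own assumptions: the function name and the 8-byte pattern word are hypothetical, and the real lowering would happen on RTL in the expander, not through library calls.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fill N bytes at DST with a repeating 8-byte pattern, writing whole words
// where possible instead of materializing a constant as large as the object.
void fill_with_pattern(void *dst, std::size_t n, std::uint64_t pattern)
{
  unsigned char *p = static_cast<unsigned char *>(dst);
  while (n >= sizeof pattern)
    {
      std::memcpy(p, &pattern, sizeof pattern);  // one word-sized store
      p += sizeof pattern;
      n -= sizeof pattern;
    }
  if (n != 0)
    std::memcpy(p, &pattern, n);  // tail: the first N bytes of the pattern
}
```

The point of the sketch is that the work scales with the size of the object but needs only one pattern word, which is what makes it cheaper than building a full-size constant for a large automatic variable.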

Richard.

> Thanks.
> 
> Qing
> 


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-07 Thread Richard Biener
On Thu, 27 May 2021, Qing Zhao wrote:

> Hi, Richard,
> 
> Thanks a lot for your comments.
> 
> 
> On May 26, 2021, at 6:18 AM, Richard Biener <rguent...@suse.de> wrote:
> 
> On Wed, 12 May 2021, Qing Zhao wrote:
> 
> Hi,
> 
> This is the 3rd version of the patch for the new security feature for GCC.

Meh - can you try using a mailer that does proper quoting?  It's difficult
to spot your added comments.  Will try anyway (and sorry for the delay)

> Please take a look and let me know your comments and suggestions.
> 
> thanks.
> 
> Qing
> 
> **Compared with the 2nd version, the following are the major changes:
> 
> 1. use "lookup_attribute ("uninitialized",) directly instead of adding
>   one new field "uninitialized" into tree_decl_with_vis.
> 2. update documentation to mention that the new option will not confuse
>   -Wuninitialized, GCC still considers an auto without explicit initializer
>   as uninitialized.
> 3. change the name of "build_pattern_cst" to more specific name as
>   "build_pattern_cst_for_auto_init".
> 4. handling of nested VLA;
>   Adding new testing cases (auto-init-15/16.c) for this new handling.
> 5. Add  new verifications of calls to .DEFERRED_INIT in tree-cfg.c;
> 6. in tree-sra.c, update the handling of "grp_to_be_debug_replaced",
>   bind the lhs variable to a call to .DEFERRED_INIT.
> 7. In tree-ssa-structalias.c, delete "find_func_aliases_for_deferred_init",
>   return directly for a call to .DEFERRED_INIT in 
> "find_func_aliases_for_call".
> 8. Add more detailed comments in tree-ssa-uninit.c and tree-ssa.c to explain
>   the special handling on REALPART_EXPR/IMAGPART_EXPR.
> 9. in build_pattern_cst_for_auto_init:
>   BOOLEAN_TYPE will be set to zero always;
>   INTEGER_TYPE (?and ENUMERAL_TYPE) use wi::from_buffer in order to
>correctly handle 128-bit integers.
>   POINTER_TYPE will not assert on SIZE < 32.
>   REAL_TYPE add fallback;
> 10. changed gcc_assert to gcc_unreachable in several places;
> 11. add more comments;
> 12. some style issue changes.
> 
> **Please see the version 2 at:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567262.html
> 
> 
> **The following 2 items are the ones I didn’t address in this version 
> due to further study and might need more discussion:
> 
> 1. Using __builtin_clear_padding  to replace type_has_padding.
> 
> My study shows: the call to __builtin_clear_padding is expanded during 
> gimplification phase.
> And there is no __builtin_clear_padding expanding during rtx expanding phase.
> If so,  for -ftrivial-auto-var-init, padding initialization should be done 
> both in gimplification phase and rtx expanding phase.
> And since the __builtin_clear_padding might not be good for rtx expanding, 
> reusing __builtin_clear_padding might not work.
> 
> 2. Pattern init to NULLPTR_TYPE and ENUMERAL_TYPE: need more comments from 
> Richard Biener on this.
> 
> **The change of the 3rd version compared to the 2nd version are:
> 
> 
> +@item -ftrivial-auto-var-init=@var{choice}
> +@opindex ftrivial-auto-var-init
> +Initialize automatic variables with either a pattern or with zeroes to
> increase
> +the security and predictability of a program by preventing uninitialized
> memory
> +disclosure and use.
> 
> the docs do not state what "trivial" actually means?  Does it affect
> C++ classes with ctors, thus is "trivial" equal to what C++ considers
> a POD type?
> 
> Thank you for this question.
> 
> The name -ftrivial-auto-var-init is just for compatibility with Clang. I really
> don’t know why they added "trivial".
> 
> When I checked a small example of a C++ class with ctors, I saw that both Clang
> and my patch add initialization to this class:
> 
> =
> #include <iostream>
> 
> using namespace std;
> 
> class Line {
>public:
>   void setLength( double len );
>   double getLength( void );
>   Line();  // This is the constructor
>private:
>   double length;
> };
> 
> // Member functions definitions including constructor
> Line::Line(void) {
>cout << "Object is being created" << endl;
> }
> void Line::setLength( double len ) {
>   length = len;
> }
> double Line::getLength( void ) {
>   return length;
> }
> 
> // Main function for the program
> int main() {
>   Line line;
> 
>   // set line length
>   line.setLength(6.0);
>   cout << "Length of line : " << line.getLength() << endl;
>   return 0;
> }
> 
> =
> 
> Both clang and my patch add initialization to the above auto variable “line”.
> 
> So, I have the following questions need help:
> 
> 1. Do we need to exclude C++ class with ctor from auto initialization?
> 
> 2. I see Clang use call to internal memset to initialize such class, but for 
> my patch, I only initialize the data fields inside this class.
> Which one is better?

I can't answer either question, but generally using block-initialization
(for example via memset, but we'd generally prefer X = {}) is better for
later optimization.
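
To show the forms being compared, here is a small sketch with a hypothetical aggregate (Sample and the function names are mine, not from the patch).  It makes no claim about how either compiler implements the option; it only contrasts field-by-field stores with the two block forms mentioned above.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical aggregate; most ABIs insert padding between tag and length.
struct Sample
{
  char tag;
  double length;
};

// Field-by-field: one store per member, says nothing about padding bytes.
void init_fields(Sample &s)
{
  s.tag = 0;
  s.length = 0.0;
}

// Block form via memset: covers the whole object representation.
void init_block_memset(Sample &s)
{
  std::memset(&s, 0, sizeof s);
}

// Block form via empty braces (the "X = {}" spelling): every member is
// value-initialized in a single aggregate assignment.
void init_block_braces(Sample &s)
{
  s = {};
}

// Helper: true if every byte of the object representation is zero.
bool all_bytes_zero(const Sample &s)
{
  const unsigned char *p = reinterpret_cast<const unsigned char *>(&s);
  for (std::size_t i = 0; i < sizeof s; ++i)
    if (p[i] != 0)
      return false;
  return true;
}
```

Only the memset form is guaranteed to leave all_bytes_zero true; the other two guarantee the member values but not the padding.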

> 
> 
> diff --git a/gcc/expr.c b/gcc/expr.c
> 

[PATCH] build: Implement --with-multilib-list for avr target

2021-06-07 Thread Matt Jacobson via Gcc-patches
The AVR target builds a lot of multilib variants of target libraries by default,
and I found myself wanting to use the --with-multilib-list argument to limit
what I was building, to shorten build times.  This patch implements that option
for the AVR target.

Tested by configuring and building an AVR compiler and target libs on macOS.

I don't have commit access, so if this patch is suitable, I'd need someone else
to commit it for me.  Thanks.

gcc/ChangeLog:

2020-06-07  Matt Jacobson  

* config.gcc: For the AVR target, populate TM_MULTILIB_CONFIG.
* config/avr/genmultilib.awk: Add ability to filter generated multilib
list.
* config/avr/t-avr: Pass TM_MULTILIB_CONFIG to genmultilib.awk.
* configure.ac: Update help string for --with-multilib-list.
* configure: Regenerate.


diff --git a/gcc/config.gcc b/gcc/config.gcc
index 6a349965c..fd83996a4 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4201,6 +4201,13 @@ case "${target}" in
fi
;;
 
+   avr-*-*)
+   # Handle --with-multilib-list.
+   if test "x${with_multilib_list}" != xdefault; then
+   TM_MULTILIB_CONFIG="${with_multilib_list}"
+   fi
+   ;;
+
 csky-*-*)
supported_defaults="cpu endian float"
;;
diff --git a/gcc/config/avr/genmultilib.awk b/gcc/config/avr/genmultilib.awk
index 2d07c0e53..ad8814602 100644
--- a/gcc/config/avr/genmultilib.awk
+++ b/gcc/config/avr/genmultilib.awk
@@ -67,6 +67,16 @@ BEGIN {
 
 dir_long_double = "long-double"   (96 - with_long_double)
 opt_long_double = "mlong-double=" (96 - with_long_double)
+
+if (with_multilib_list != "")
+{
+   split(with_multilib_list, multilib_list, ",")
+
+   for (i in multilib_list)
+   {
+   multilibs[multilib_list[i]] = 1
+   }
+}
 }
 
 ##
@@ -137,6 +147,9 @@ BEGIN {
if (core == "avr1")
next
 
+   if (with_multilib_list != "" && !(core in multilibs))
+   next
+
option[core] = "mmcu=" core
 
m_options  = m_options m_sep option[core]
@@ -150,6 +163,9 @@ BEGIN {
 if (core == "avr1")
next
 
+if (with_multilib_list != "" && !(core in multilibs))
+   next
+
 opts = option[core]
 
 # split device specific feature list
diff --git a/gcc/config/avr/t-avr b/gcc/config/avr/t-avr
index 3e1a1ba68..7d20c6107 100644
--- a/gcc/config/avr/t-avr
+++ b/gcc/config/avr/t-avr
@@ -127,6 +127,7 @@ t-multilib-avr: $(srcdir)/config/avr/genmultilib.awk \
-v HAVE_LONG_DOUBLE64=$(HAVE_LONG_DOUBLE64) \
-v with_double=$(WITH_DOUBLE)   \
-v with_long_double=$(WITH_LONG_DOUBLE) \
+   -v with_multilib_list=$(TM_MULTILIB_CONFIG) \
-f $< $< $(AVR_MCUS) > $@
 
 include t-multilib-avr
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 715fcba04..c3ed65df7 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -1106,7 +1106,7 @@ if test x"$enable_hsa" = x1 ; then
 fi
 
 AC_ARG_WITH(multilib-list,
-[AS_HELP_STRING([--with-multilib-list], [select multilibs (AArch64, SH and 
x86-64 only)])],
+[AS_HELP_STRING([--with-multilib-list], [select multilibs (AArch64, AVR, i386, 
or1k, RISC-V, SH, and x86-64 only)])],
 :,
 with_multilib_list=default)
 



Re: [RFC/PATCH] updating global ranges and their effect on __builtin_unreachable code

2021-06-07 Thread Richard Biener via Gcc-patches
On Wed, Jun 2, 2021 at 4:53 PM Aldy Hernandez  wrote:
>
>
>
> On 6/2/21 1:52 PM, Richard Biener wrote:
> > On Wed, Jun 2, 2021 at 12:34 PM Aldy Hernandez via Gcc-patches
> >  wrote:
>
> > But the whole point of all this singing and dancing is not to make
> > warnings but to be able to implement assert (); or assume (); that
> > will result in no code but optimization based on the assumption.
> >
> > That means that all the checks guarding __builtin_unreachable ()
> > should be removed at the GIMPLE level - just not too early
> > to preserve range info on the variables participating in the
> > guarding condition.
> >
> > So yes, it sounds fragile but instead it's carefully architected.  Heh.
> >
> > In particular it is designed so that early optimization leaves those
> > unreachable () around (for later LTO consumption and inlining, etc.
> > to be able to re-create the ranges) whilst VRP1 / DOM will end up
> > eliminating them.  I think we have testcases that verify said behavior,
> > namely optimize out range checks based on the assertions - maybe missed
>
> Understood.
>
> I will note that my proposed patch does not remove any unreachables, and
> maintains current behavior.  It just refines the ranges from the ranger
> with current global ranges.  So I think the patch should go in,
> regardless of what is decided with __builtin_unreachables downthread.
>
> > the case where this only happens after inlining (important for your friendly
> > C++ abstraction hell), and the unreachable()s gone.
>
> I have pointed this out before, and will repeat it in case you missed it:
>
> "Richard, you have made it very clear that we disagree on core design
> issues, but that's no reason to continually make snide comments on every
> patch or PRs.  Can we keep the discussions focused on the technical bits?"

You need to stop taking every sentence I write as a personal insult.  I was
referring to C++ abstraction in general in the sense as that we should be
prepared to handle index checking removal even across inline boundaries.
Reading my comment I guess the "your" might mislead you - sorry for
not being a native speaker and thus picking a wrong general term here.

Richard.

> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/569072.html
>


Re: [RFC/PATCH] updating global ranges and their effect on __builtin_unreachable code

2021-06-07 Thread Richard Biener via Gcc-patches
On Wed, Jun 2, 2021 at 2:53 PM Andrew MacLeod  wrote:
>
> On 6/2/21 7:52 AM, Richard Biener wrote:
> > On Wed, Jun 2, 2021 at 12:34 PM Aldy Hernandez via Gcc-patches
> >  wrote:
> >> We've been having "issues" in our branch when exporting to the global
> >> space ranges that take into account previously known ranges
> >> (SSA_NAME_RANGE_INFO, etc).  For the longest time we had the export
> >> feature turned off because it had the potential of removing
> >> __builtin_unreachable code early in the pipeline.  This was causing one
> >> or two tests to fail.
> >>
> >> I finally got fed up, and investigated why.
> >>
> >> Take the following code:
> >>
> >> i_4 = somerandom ();
> >> if (i_4 < 0)
> >>   goto ; [INV]
> >> else
> >>   goto ; [INV]
> >>
> >>  :
> >> __builtin_unreachable ();
> >>
> >>  :
> >>
> >> It turns out that both legacy evrp and VRP have code that notices the
> >> above pattern and sets the *global* range for i_4 to [0,MAX].  That is,
> >> the range for i_4 is set, not at BB4, but at the definition site.  See
> >> uses of assert_unreachable_fallthru_edge_p() for details.
> >>
> >> This global range causes subsequent passes (VRP1 in the testcase below),
> >> to remove the checks and the __builtin_unreachable code altogether.
> >>
> >> // pr80776-1.c
> >> int somerandom (void);
> >> void
> >> Foo (void)
> >> {
> >> int i = somerandom ();
> >> if (! (0 <= i))
> >>   __builtin_unreachable ();
> >> if (! (0 <= i && i <= 99))
> >>   __builtin_unreachable ();
> >> sprintf (number, "%d", i);
> >> }
> >>
> >> This means that by the time the -Wformat-overflow warning runs, the
> >> above sprintf has been left unguarded, and a bogus warning is issued.
> >>
> >> Currently the above test does not warn, but that's because of an
> >> oversight in export_global_ranges().  This function is disregarding
> >> known global ranges (SSA_NAME_RANGE_INFO and SSA_NAME_PTR_INFO) and only
> >> setting ranges the ranger knows about.
> >>
> >> For the above test the IL is:
> >>
> >>  :
> >> i_4 = somerandom ();
> >> if (i_4 < 0)
> >>   goto ; [INV]
> >> else
> >>   goto ; [INV]
> >>
> >>  :
> >> __builtin_unreachable ();
> >>
> >>  :
> >> i.0_1 = (unsigned int) i_4;
> >> if (i.0_1 > 99)
> >>   goto ; [INV]
> >> else
> >>   goto ; [INV]
> >>
> >>  :
> >> __builtin_unreachable ();
> >>
> >>  :
> >> _7 = __builtin___sprintf_chk (, 1, 7, "%d", i_4);
> >>
> >>
> >> Legacy evrp has determined that the range for i_4 is [0,MAX] per my
> >> analysis above, but ranger has no known range for i_4 at the definition
> >> site.  So at export_global_ranges time, ranger leaves the [0,MAX] alone.
> >>
> >> OTOH, evrp sets the global range at the definition for i.0_1 to
> >> [0,99] per the same unreachable feature.  However, ranger has
> >> correctly determined that the range for i.0_1 at the definition is
> >> [0,MAX], which it then proceeds to export.  Since the current
> >> export_global_ranges (mistakenly) does not take into account previous
> >> global ranges, the ranges in the global tables end up like this:
> >>
> >> i_4: [0, MAX]
> >> i.0_1: [0, MAX]
> >>
> >> This causes the first unreachable block to be removed in VRP1, but the
> >> second one to remain.  Later VRP can determine that i_4 in the sprintf
> >> call is [0,99], and no warning is issued.
> >>
> >> But... the missing bogus warning is due to current export_global_ranges
> >> ignoring SSA_NAME_RANGE_INFO and friends, something which I'd like to
> >> fix.  However, fixing this, gets us back to:
> >>
> >> i_4: [0, MAX]
> >> i.0_1: [0, 99]
> >>
> >> Which means, we'll be back to removing the unreachable blocks and
> >> issuing a warning in pr80776-1.c (like we have been since the beginning
> >> of time).
> >>
> >> The attached patch fixes export_global_ranges to the expected behavior,
> >> and adds the previous XFAIL to pr80776-1.c, while documenting why this
> >> warning is issued in the first place.
> >>
> >> Once legacy evrp is removed, this won't be an issue, as ranges in the IL
> >> will tell the truth.  However, this will mean that we will no longer
> >> remove the first __builtin_unreachable combo.  But ISTM, that would be
> >> correct behavior ??.
> >>
> >> BTW, in addition to this patch we could explore removing the
> >> assert_unreachable_fallthru_edge_p() use in the evrp_analyzer, since it
> >> is no longer needed to get the warnings in the testcases in the original
> >> PR correctly (gcc.dg/pr80776-[12].c).
> > But the whole point of all this singing and dancing is not to make
> > warnings but to be able to implement assert (); or assume (); that
> > will result in no code but optimization based on the assumption.
> >
> > That means that all the checks guarding __builtin_unreachable ()
> > should be removed at the GIMPLE level - just not too early
> > to preserve range info on the variables participating in the
> > 

Re: [ARM] PR97906 - Missed lowering abs(a) >= abs(b) to vacge

2021-06-07 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 1 Jun 2021 at 16:03, Prathamesh Kulkarni
 wrote:
>
> Hi,
> As mentioned in the PR, for the following test-case:
>
> #include <arm_neon.h>
>
> uint32x2_t f1(float32x2_t a, float32x2_t b)
> {
>   return vabs_f32 (a) >= vabs_f32 (b);
> }
>
> uint32x2_t f2(float32x2_t a, float32x2_t b)
> {
>   return (uint32x2_t) __builtin_neon_vcagev2sf (a, b);
> }
>
> We generate vacge for f2, but with -ffast-math, we generate the following for f1:
> f1:
> vabs.f32    d1, d1
> vabs.f32    d0, d0
> vcge.f32    d0, d0, d1
> bx  lr
>
> This happens because the middle-end inverts the comparison to b <= a,
> .optimized dump:
>  _8 = __builtin_neon_vabsv2sf (a_4(D));
>   _7 = __builtin_neon_vabsv2sf (b_5(D));
>   _1 = _7 <= _8;
>   _2 = VIEW_CONVERT_EXPR(_1);
>   _6 = VIEW_CONVERT_EXPR(_2);
>   return _6;
>
> and combine fails to match the following pattern:
> (set (reg:V2SI 121)
> (neg:V2SI (le:V2SI (abs:V2SF (reg:V2SF 123))
> (abs:V2SF (reg:V2SF 122)
>
> because neon_vca pattern has GTGE code iterator.
> The attached patch adjusts the neon_vca patterns to use GLTE instead
> similar to neon_vca_fp16insn, and removes NEON_VACMP iterator.
> Code-gen with patch:
> f1:
> vacle.f32   d0, d1, d0
> bx  lr
>
> Bootstrapped + tested on arm-linux-gnueabihf and cross-tested on arm*-*-*.
> OK to commit ?
ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571568.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh

