Re: [PATCH][X86] Enable X86_TUNE_AVX256_UNALIGNED_{LOAD, STORE}_OPTIMAL for generic tune [PR target/98172]

2021-02-03 Thread Uros Bizjak via Gcc-patches
On Thu, Feb 4, 2021 at 5:28 AM Hongtao Liu  wrote:

> > > >GCC 11 will be the system GCC 2 years from now, and the
> > > > processors of that time shouldn't even need to split a 256-bit
> > > > vector into two 128-bit vectors.
> > > >I.e., testing SPEC2017 with the two options below on Zen3/ICL
> > > > shows that Option B is better than Option A.
> > > > Option A:
> > > > -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
> > > >
> > > > Option B:
> > > > Option A + 
> > > > -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
> > > >
> > > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > >
> > > Given the explicit list for unaligned loads it's a no-brainer to change 
> > > that
> > > for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL.  Given both
> > > BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL
> > > we should try to benchmark the effect on ZNVER1 - Martin, do we still
> > > have a znver1 machine around?
> >
> > They are also turned on for Sandybridge.  I don't believe we should keep
> > it in GCC 11 and penalize today's CPUs as well as the CPUs of 2024.
> >
> I agree with H.J., and I would also like to hear Uros' opinion.

I don't have any benchmark data to form my opinion on, but I
definitely agree that the compiler should tune for the newer
processors, where speed matters the most; 10-year-old processors are
irrelevant as far as speed is concerned.

So, if it is expected that gcc-11 will be most used 2-3 years from
now, it should by default target the architectures that will be
common at that time.  But I think that distribution maintainers
should decide here.
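For readers outside the thread: the tune flags at issue decide whether GCC emits a 256-bit unaligned access as one instruction or splits it into two 128-bit halves. A minimal sketch of the kind of loop affected (illustrative only, not from the patch; `copy_floats` is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>

// Illustrative only: with -march=x86-64 -mavx2 -O2 the vectorizer turns
// this loop into 256-bit unaligned loads/stores (vmovups ymm).  Whether
// each such access is emitted as a single 256-bit instruction or split
// into two 128-bit halves is what the
// X86_TUNE_AVX256_UNALIGNED_{LOAD,STORE}_OPTIMAL flags control.
void copy_floats(const float *src, float *dst, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}
```

Comparing `-S` output with and without `-mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"` shows the split versus unsplit code.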

Uros.


[pushed] c++: No aggregate CTAD with explicit dguide [PR98802]

2021-02-03 Thread Jason Merrill via Gcc-patches
In my implementation of P2082R1 I missed this piece: the aggregate deduction
candidate is not generated if the class has user-written deduction guides.
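A minimal sketch of the rule (hypothetical type `Wrap`, not the PR's testcase): when a class has a user-written deduction guide, the implicit aggregate deduction candidate is not generated, so deduction goes through the guides alone.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical illustration (not the PR's testcase).  Wrap has a
// user-written deduction guide, so under the fixed rule the implicit
// aggregate deduction candidate is not generated: deduction goes
// through the guide alone, which here deliberately maps any argument
// to Wrap<long> -- a result the aggregate candidate would never give.
template<class T> struct Wrap { T value; };
template<class T> Wrap(T) -> Wrap<long>;

long widened()
{
    Wrap w{42};                       // guide deduces Wrap<long>
    static_assert(std::is_same_v<decltype(w), Wrap<long>>);
    return w.value;
}
```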

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/98802
* pt.c (do_class_deduction): No aggregate guide if any_dguides_p.

gcc/testsuite/ChangeLog:

PR c++/98802
* g++.dg/cpp1z/class-deduction78.C: New test.
---
 gcc/cp/pt.c   |  5 +++--
 .../g++.dg/cpp1z/class-deduction78.C  | 20 +++
 2 files changed, 23 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction78.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index af7c67af29f..3605b67e424 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -29272,8 +29272,9 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
}
 }
 
-  if (tree guide = maybe_aggr_guide (tmpl, init, args))
-cands = lookup_add (guide, cands);
+  if (!any_dguides_p)
+if (tree guide = maybe_aggr_guide (tmpl, init, args))
+  cands = lookup_add (guide, cands);
 
   tree call = error_mark_node;
 
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction78.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction78.C
new file mode 100644
index 000..651645486d2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction78.C
@@ -0,0 +1,20 @@
+// PR c++/98802
+// { dg-do compile { target c++17 } }
+
+using size_t = decltype(sizeof(42));
+
+template<typename T, size_t N>
+struct List {
+  T head;
+  List<T, N - 1> tail;
+};
+
+template<typename T>
+struct List<T, 0> {};
+
+template<typename T> List(T) -> List<T, 1>;
+template<typename T, size_t N> List(T, List<T, N>) -> List<T, N + 1>;
+
+int main() {
+  auto list2 = List{0, List{1, List{2}}};
+}

base-commit: 7b258ac7afaaf7d8157df10ea35c6002d935d7d0
-- 
2.27.0



Re: [PATCH] [AVX512] Fix ICE: Convert integer mask to vector in ix86_expand_fp_vec_cmp/ix86_expand_int_vec_cmp [PR98537]

2021-02-03 Thread Hongtao Liu via Gcc-patches
Rebase and update patch:

Fix ICE: Don't generate integer mask comparison for 128/256-bits
vector when op_true/op_false are NULL or constm1_rtx/const0_rtx
[PR98537]

in ix86_expand_sse_cmp/ix86_expand_int_sse_cmp

gcc/ChangeLog:

PR target/98537
* config/i386/i386-expand.c (ix86_expand_sse_cmp): Don't
generate integer mask comparison for 128/256-bit vectors when
op_true/op_false are NULL_RTX or CONSTM1_RTX/CONST0_RTX.  Also
delete redundant !maskcmp condition.
(ix86_expand_int_vec_cmp): Ditto but no redundant deletion
here.
(ix86_expand_sse_movcc): Delete the definition of maskcmp, add
the condition directly to if (maskcmp), and add an extra check
that cmpmode is MODE_INT.
(ix86_expand_fp_vec_cmp): Pass NULL to ix86_expand_sse_cmp's
parameters op_true/op_false.
(ix86_use_mask_cmp_p): New.

gcc/testsuite/ChangeLog:

PR target/98537
* g++.target/i386/avx512bw-pr98537-1.C: New test.
* g++.target/i386/avx512vl-pr98537-1.C: New test.
* g++.target/i386/avx512vl-pr98537-2.C: New test.
* gcc.target/i386/avx512vl-pr88547-1.c: Adjust testcase,
integer mask comparison should not be generated.
* gcc.target/i386/avx512vl-pr92686-vpcmp-1.c: This test was
used to guard code generation of integer mask comparison, but
for vector comparisons with a vector destination, integer mask
comparison is undesirable, so delete this now-useless test.
* gcc.target/i386/avx512vl-pr92686-vpcmp-2.c: Ditto.
* gcc.target/i386/avx512vl-pr92686-vpcmp-intelasm-1.c: Ditto.



-- 
BR,
Hongtao


0001-Fix-ICE-Don-t-generate-integer-mask-comparision-for-.patch
Description: Binary data


Re: [PATCH][X86] Enable X86_TUNE_AVX256_UNALIGNED_{LOAD, STORE}_OPTIMAL for generic tune [PR target/98172]

2021-02-03 Thread Hongtao Liu via Gcc-patches
On Thu, Jan 28, 2021 at 9:18 PM H.J. Lu  wrote:
>
> On Thu, Jan 28, 2021 at 1:21 AM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Thu, Jan 28, 2021 at 7:32 AM Hongtao Liu via Gcc-patches
> >  wrote:
> > >
> > > Hi:
> > >GCC 11 will be the system GCC 2 years from now, and the
> > > processors of that time shouldn't even need to split a 256-bit
> > > vector into two 128-bit vectors.
> > >I.e., testing SPEC2017 with the two options below on Zen3/ICL
> > > shows that Option B is better than Option A.
> > > Option A:
> > > -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
> > >
> > > Option B:
> > > Option A + 
> > > -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
> > >
> > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> >
> > Given the explicit list for unaligned loads it's a no-brainer to change that
> > for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL.  Given both
> > BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL
> > we should try to benchmark the effect on ZNVER1 - Martin, do we still
> > have a znver1 machine around?
>
> They are also turned on for Sandybridge.  I don't believe we should keep
> it in GCC 11 and penalize today's CPUs as well as the CPUs of 2024.
>
I agree with H.J., and I would also like to hear Uros' opinion.
> > Note that with the settings differing in a way to split stores but not to 
> > split
> > loads, loading a just stored value can cause bad STLF and quite a
> > performance hit (since znver1 has 128bit data paths that shouldn't
> > be an issue there but it would have an issue for actually aligned data
> > on CPUs with 256bit data paths).
> >
> > Thanks,
> > Richard.
> >
> > >   Ok for trunk?
> > >
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
>
>
>
> --
> H.J.



-- 
BR,
Hongtao


[pushed] c++: subst failure in attribute argument [PR95192]

2021-02-03 Thread Jason Merrill via Gcc-patches
Another SFINAE issue: we weren't propagating substitution failure in
attributes back up.  And tsubst_function_decl needs to check substitution
before register_specialization.  I thought about moving the other error
returns up as well, but they aren't SFINAE cases, so they can stay where
they are.

This change caused pr84630.C to stop giving an error; this was because
partial instantiation of the lambda failed silently, and before the change
that meant error_mark_node passed to decl_attributes, which complained about
there being an argument at all.  With the change the partial instantiation
fails, but no error was ever given, because push_template_decl silently
failed if current_template_parms wasn't set.  So let's set
current_template_parms appropriately.  lambda-uneval13.C is a valid
testcase to exercise this.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/95192
* pt.c (tsubst_attribute): Handle error.
(apply_late_template_attributes): Return false on error.
(tsubst_function_decl): Check its return value.
(tsubst_decl): Likewise.
(push_template_decl): Assert current_template_parms.
(tsubst_template_decl): Set current_template_parms.

gcc/testsuite/ChangeLog:

PR c++/95192
* g++.dg/cpp0x/pr84630.C: Call b().
* g++.dg/cpp2a/lambda-uneval13.C: New test.
* g++.dg/ext/attr-expr1.C: New test.
---
 gcc/cp/pt.c  | 41 +---
 gcc/testsuite/g++.dg/cpp0x/pr84630.C |  1 +
 gcc/testsuite/g++.dg/cpp2a/lambda-uneval13.C | 11 ++
 gcc/testsuite/g++.dg/ext/attr-expr1.C|  9 +
 4 files changed, 48 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-uneval13.C
 create mode 100644 gcc/testsuite/g++.dg/ext/attr-expr1.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index c5b0a9292db..af7c67af29f 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -11571,6 +11571,8 @@ tsubst_attribute (tree t, tree *decl_p, tree args,
 val = tsubst_expr (val, args, complain, in_decl,
   /*integral_constant_expression_p=*/false);
 
+  if (val == error_mark_node)
+return error_mark_node;
   if (val != TREE_VALUE (t))
 return build_tree_list (TREE_PURPOSE (t), val);
   return t;
@@ -11617,9 +11619,10 @@ tsubst_attributes (tree attributes, tree args,
 
 /* Apply any attributes which had to be deferred until instantiation
time.  DECL_P, ATTRIBUTES and ATTR_FLAGS are as cplus_decl_attributes;
-   ARGS, COMPLAIN, IN_DECL are as tsubst.  */
+   ARGS, COMPLAIN, IN_DECL are as tsubst.  Returns true normally,
+   false on error.  */
 
-static void
+static bool
 apply_late_template_attributes (tree *decl_p, tree attributes, int attr_flags,
tree args, tsubst_flags_t complain, tree 
in_decl)
 {
@@ -11628,12 +11631,12 @@ apply_late_template_attributes (tree *decl_p, tree 
attributes, int attr_flags,
   tree *p;
 
   if (attributes == NULL_TREE)
-return;
+return true;
 
   if (DECL_P (*decl_p))
 {
   if (TREE_TYPE (*decl_p) == error_mark_node)
-   return;
+   return false;
   p = &DECL_ATTRIBUTES (*decl_p);
   /* DECL_ATTRIBUTES comes from copy_node in tsubst_decl, and is identical
  to our attributes parameter.  */
@@ -11668,9 +11671,11 @@ apply_late_template_attributes (tree *decl_p, tree 
attributes, int attr_flags,
  t = *p;
  if (ATTR_IS_DEPENDENT (t))
{
+ *q = tsubst_attribute (t, decl_p, args, complain, in_decl);
+ if (*q == error_mark_node)
+   return false;
  *p = TREE_CHAIN (t);
  TREE_CHAIN (t) = NULL_TREE;
- *q = tsubst_attribute (t, decl_p, args, complain, in_decl);
  while (*q)
q = &TREE_CHAIN (*q);
}
@@ -11680,6 +11685,7 @@ apply_late_template_attributes (tree *decl_p, tree 
attributes, int attr_flags,
 
   cplus_decl_attributes (decl_p, late_attrs, attr_flags);
 }
+  return true;
 }
 
 /* The template TMPL is being instantiated with the template arguments TARGS.
@@ -14048,6 +14054,10 @@ tsubst_function_decl (tree t, tree args, 
tsubst_flags_t complain,
 tsubst (DECL_FRIEND_CONTEXT (t),
 args, complain, in_decl));
 
+  if (!apply_late_template_attributes (&r, DECL_ATTRIBUTES (r), 0,
+  args, complain, in_decl))
+return error_mark_node;
+
   /* Set up the DECL_TEMPLATE_INFO for R.  There's no need to do
  this in the special friend case mentioned above where
  GEN_TMPL is NULL.  */
@@ -14127,8 +14137,6 @@ tsubst_function_decl (tree t, tree args, tsubst_flags_t 
complain,
   && !processing_template_decl)
 defaulted_late_check (r);
 
-  apply_late_template_attributes (&r, DECL_ATTRIBUTES (r), 0,
- args, complain, in_decl);
   if (flag_openmp)
 if (tree attr = 

Re: [PATCH] improve detection of incompatible redeclarations (PR 97882)

2021-02-03 Thread Joseph Myers
On Wed, 3 Feb 2021, Martin Sebor via Gcc-patches wrote:

> +/* Return true if T1 and T2 are matching types for the purposes of
> +   redeclaring a variable or a function without a prototype (i.e.,
> +   considering just its return type).  */

I think this comment is confusing (it suggests it's checking something 
looser than the notion of compatibility checked by comptypes, but it's 
actually checking something stricter).  But I also think it's wrong to do 
anything restricted to the return type.

For example, I think the following should be rejected, as enum E and 
unsigned int end up incompatible and quite likely ABI-incompatible.

enum E;
void f(enum E);
void f(unsigned int);
enum E { x = 1ULL << 63 };

Maybe the ICE is specific to the case of return types, but I think the 
same rules about type compatibility should apply regardless of precisely 
where the incomplete enum type appears.

I'd expect the natural fix for this PR to involve making 
comptypes_internal treat an incomplete enum as incompatible with an 
integer type.  Only if that causes too many problems should we then look 
at other approaches.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] c++: Fix bogus -Wvolatile warning in C++20 [PR98947]

2021-02-03 Thread Marek Polacek via Gcc-patches
Since most of volatile is deprecated in C++20, we are required to warn
for compound assignments to volatile variables and so on.  But here we
have

  volatile int x, y, z;
  (b ? x : y) = 1;

and we shouldn't warn, because simple assignments like x = 24; should
not provoke the warning when they are a discarded-value expression.

We warn here because when ?: is used as an lvalue, we transform it in
cp_build_modify_expr/COND_EXPR from (a ? b : c) = rhs to

  (a ? (b = rhs) : (c = rhs))

and build_conditional_expr then calls mark_lvalue_use for the new
artificial assignments, which then evokes the warning.  So use
a warning sentinel.  But since we should still warn for

  (b ? x : y) += 1; // compound assignment
  (b ? (x = 2) : y) = 1; // lvalue-to-rvalue conv on x = 2

I've tweaked mark_use to only set TREE_THIS_VOLATILE when the warning
is enabled.
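The observable semantics of the transformed form are unchanged — the store goes into whichever operand the condition selects — as a small sketch (hypothetical helper, not the patch's testcase) shows:

```cpp
#include <cassert>

volatile int x, y;

// Sketch of the expression under discussion: assigning through ?: used
// as an lvalue stores into the selected operand.  This simple-assignment
// form is a discarded-value expression, so after the fix it no longer
// triggers C++20's -Wvolatile, while (b ? x : y) += 1 still does.
int pick_and_assign(bool b)
{
    x = 0;
    y = 0;
    (b ? x : y) = 1;      // internally becomes b ? (x = 1) : (y = 1)
    return b ? x : y;
}
```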

I'd argue this is a regression because GCC 9 doesn't warn.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?

gcc/cp/ChangeLog:

PR c++/98947
* expr.c (mark_use) <case MODIFY_EXPR>: Only set TREE_THIS_VOLATILE
if warn_volatile.
* typeck.c (cp_build_modify_expr) <case COND_EXPR>: Add a warning
sentinel for -Wvolatile around build_conditional_expr.

gcc/testsuite/ChangeLog:

PR c++/98947
* g++.dg/cpp2a/volatile5.C: New test.
---
 gcc/cp/expr.c  |  3 ++-
 gcc/cp/typeck.c|  4 
 gcc/testsuite/g++.dg/cpp2a/volatile5.C | 15 +++
 3 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/volatile5.C

diff --git a/gcc/cp/expr.c b/gcc/cp/expr.c
index 480e740f08c..d697978ce19 100644
--- a/gcc/cp/expr.c
+++ b/gcc/cp/expr.c
@@ -228,7 +228,8 @@ mark_use (tree expr, bool rvalue_p, bool read_p,
  && !cp_unevaluated_operand
  && (TREE_THIS_VOLATILE (lhs)
  || CP_TYPE_VOLATILE_P (TREE_TYPE (lhs)))
- && !TREE_THIS_VOLATILE (expr))
+ && !TREE_THIS_VOLATILE (expr)
+ && warn_volatile)
{
  warning_at (location_of (expr), OPT_Wvolatile,
  "using value of simple assignment with %-"
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index a87d5e5f2ac..52c2344530d 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -8617,6 +8617,10 @@ cp_build_modify_expr (location_t loc, tree lhs, enum 
tree_code modifycode,
tree op2 = TREE_OPERAND (lhs, 2);
if (TREE_CODE (op2) != THROW_EXPR)
  op2 = cp_build_modify_expr (loc, op2, modifycode, rhs, complain);
+   /* build_conditional_expr calls mark_lvalue_use for op1/op2,
+  which are now assignments due to the above transformation,
+  generating bogus C++20 warnings.  */
+   warning_sentinel w (warn_volatile);
tree cond = build_conditional_expr (input_location,
TREE_OPERAND (lhs, 0), op1, op2,
complain);
diff --git a/gcc/testsuite/g++.dg/cpp2a/volatile5.C 
b/gcc/testsuite/g++.dg/cpp2a/volatile5.C
new file mode 100644
index 000..1f9d23845b4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/volatile5.C
@@ -0,0 +1,15 @@
+// PR c++/98947
+// { dg-do compile }
+
+volatile int x, y, z;
+
+void
+f (bool b)
+{
+  (b ? x : y) = 1;
+  (b ? x : y) += 1; // { dg-warning "compound assignment" "" { target c++20 } }
+  z = (b ? x : y) = 1; // { dg-warning "using value of simple assignment" "" { target c++20 } }
+  ((z = 2) ? x : y) = 1; // { dg-warning "using value of simple assignment" "" { target c++20 } }
+  (b ? (x = 2) : y) = 1; // { dg-warning "using value of simple assignment" "" { target c++20 } }
+  (b ? x : (y = 5)) = 1; // { dg-warning "using value of simple assignment" "" { target c++20 } }
+}

base-commit: 34215a7a3a359d700a520f1d5bdaec835f0b5180
-- 
2.29.2



[PATCH] improve detection of incompatible redeclarations (PR 97882)

2021-02-03 Thread Martin Sebor via Gcc-patches

The test case in the bug report shows that the C front end is
too permissive in accepting invalid redeclarations that involve
an incomplete enum and an otherwise compatible integer type as
return types of a function without a prototype, or as types of
an ordinary variable.  For example, the redeclaration below is
accepted:

  extern enum E { e0 } e;
  extern unsigned e;

In the case of a function the redeclaration can lead to a back
end ICE when one of the declarations is a function definition:

  extern enum F f ();
  extern unsigned f () { }

The attached patch tightens up the front end to reject even these
invalid redeclarations, thus avoiding the ICE.

Tested on x86_64-linux.

The bug is a P2 GCC 7-11 regression but in his comment Joseph
suggests to avoid backporting the fix to release branches due to
the potential to invalidate otherwise presumably benign code that's
accepted there, so I'm looking for approval only for GCC 11.

Martin
PR c/97882 - Segmentation Fault on improper redeclaration of function

gcc/c/ChangeLog:

	PR c/97882
	* c-decl.c (matching_types_p): New function.
	(diagnose_mismatched_decls): Call it.  Add detail to warning message.
	(start_function): Call matching_types_p instead of comptypes.

gcc/testsuite/ChangeLog:

	PR c/97882
	* gcc.dg/decl-8.c: Adjust text of expected diagnostic.
	* gcc.dg/label-decl-4.c: Same.
	* gcc.dg/mismatch-decl-1.c: Same.
	* gcc.dg/old-style-then-proto-1.c: Same.
	* gcc.dg/parm-mismatch-1.c: Same.
	* gcc.dg/pr35445.c: Same.
	* gcc.dg/redecl-11.c: Same.
	* gcc.dg/redecl-12.c: Same.
	* gcc.dg/redecl-13.c: Same.
	* gcc.dg/redecl-15.c: Same.
	* gcc.dg/tls/thr-init-1.c: Same.
	* objc.dg/tls/diag-3.m: Same.
	* c-c++-common/array-lit.s: New test.
	* gcc.dg/pr97882.c: New test.
	* gcc.dg/qual-return-7.c: New test.
	* gcc.dg/qual-return-8.c: New test.

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index be95643fcf9..d8102d71766 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -1910,15 +1910,79 @@ validate_proto_after_old_defn (tree newdecl, tree newtype, tree oldtype)
 static void
 locate_old_decl (tree decl)
 {
-  if (TREE_CODE (decl) == FUNCTION_DECL && fndecl_built_in_p (decl)
+  if (TREE_CODE (decl) == FUNCTION_DECL
+  && fndecl_built_in_p (decl)
   && !C_DECL_DECLARED_BUILTIN (decl))
 ;
   else if (DECL_INITIAL (decl))
-inform (input_location, "previous definition of %q+D was here", decl);
+inform (input_location,
+	"previous definition of %q+D with type %qT",
+	decl, TREE_TYPE (decl));
   else if (C_DECL_IMPLICIT (decl))
-inform (input_location, "previous implicit declaration of %q+D was here", decl);
+inform (input_location,
+	"previous implicit declaration of %q+D with type %qT",
+	decl, TREE_TYPE (decl));
   else
-inform (input_location, "previous declaration of %q+D was here", decl);
+inform (input_location,
+	"previous declaration of %q+D with type %qT",
+	decl, TREE_TYPE (decl));
+}
+
+/* Return true if T1 and T2 are matching types for the purposes of
+   redeclaring a variable or a function without a prototype (i.e.,
+   considering just its return type).  */
+
+static bool
+matching_types_p (tree_code which, tree t0, tree t1)
+{
+  bool types_differ = false;
+  if (!comptypes_check_different_types (t0, t1, &types_differ))
+return false;
+
+  if (!types_differ)
+return true;
+
+  if (which == FUNCTION_DECL)
+{
+  /* When WHICH denotes a function T1 and T2 may be either function
+	 types or return types.  */
+  if (TREE_CODE (t0) == FUNCTION_TYPE)
+	{
+	  if (prototype_p (t0))
+	return true;
+
+	  t0 = TREE_TYPE (t0);
+	  t1 = TREE_TYPE (t1);
+	}
+
+  if (flag_isoc11)
+	{
+	  t0 = TYPE_MAIN_VARIANT (t0);
+	  t1 = TYPE_MAIN_VARIANT (t1);
+	}
+}
+  else if (which != VAR_DECL)
+return true;
+
+  /* Check the return type by itself and detect mismatches in non-pointer
+ types only.  Pointers to arrays may be different when the latter
+ specifies a bound that the former doesn't.  */
+
+  while (TREE_CODE (t0) == ARRAY_TYPE || POINTER_TYPE_P (t0))
+t0 = TREE_TYPE (t0);
+  while (TREE_CODE (t1) == ARRAY_TYPE || POINTER_TYPE_P (t1))
+t1 = TREE_TYPE (t1);
+
+  if (TREE_CODE (t0) == FUNCTION_TYPE)
+/* For pointers to functions be sure to apply the rules above.  */
+return matching_types_p (FUNCTION_DECL, t0, t1);
+
+  types_differ = false;
+  if (!comptypes_check_different_types (t0, t1, &types_differ)
+  || types_differ)
+return false;
+
+  return true;
 }
 
 /* Subroutine of duplicate_decls.  Compare NEWDECL to OLDDECL.
@@ -1983,8 +2047,7 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
   bool pedwarned = false;
   bool warned = false;
   auto_diagnostic_group d;
-
-  if (!comptypes (oldtype, newtype))
+  if (!matching_types_p (TREE_CODE (olddecl), oldtype, newtype))
 {
   if (TREE_CODE (olddecl) == FUNCTION_DECL
 	  && fndecl_built_in_p (olddecl, BUILT_IN_NORMAL)
@@ -2083,7 +2146,8 @@ diagnose_mismatched_decls (tree 

Re: [PATCH] adjust "partly out of bounds" warning (PR 98503)

2021-02-03 Thread Martin Sebor via Gcc-patches

On 2/3/21 2:57 PM, Jeff Law wrote:



On 1/28/21 4:03 PM, Martin Sebor wrote:

The GCC 11 -Warray-bounds enhancement to diagnose accesses whose
leading offset is in bounds but whose trailing offset is not has
been causing some confusion.  When the warning is issued for
an access to an in-bounds member via a pointer to a struct that's
larger than the pointed-to object, phrasing this strictly invalid
access in terms of array subscripts can be misleading, especially
when the source code doesn't involve any arrays or indexing.

Since the problem boils down to an aliasing violation much more
so than an actual out-of-bounds access, the attached patch adjusts
the code to issue a -Wstrict-aliasing warning in these cases instead
of -Warray-bounds.  In addition, since the aliasing assumptions in
GCC can be disabled by -fno-strict-aliasing, the patch also makes
these instances of the warning conditional on -fstrict-aliasing
being in effect.

Martin

gcc-98503.diff

PR middle-end/98503 -Warray-bounds when -Wstrict-aliasing would be more 
appropriate

gcc/ChangeLog:

PR middle-end/98503
* gimple-array-bounds.cc (array_bounds_checker::check_mem_ref):
Issue -Wstrict-aliasing for a subset of violations.
(array_bounds_checker::check_array_bounds):  Set new member.
* gimple-array-bounds.h (array_bounds_checker::cref_of_mref): New
data member.

gcc/testsuite/ChangeLog:

PR middle-end/98503
* g++.dg/warn/Warray-bounds-10.C: Adjust text of expected warnings.
* g++.dg/warn/Warray-bounds-11.C: Same.
* g++.dg/warn/Warray-bounds-12.C: Same.
* g++.dg/warn/Warray-bounds-13.C: Same.
* gcc.dg/Warray-bounds-63.c: Avoid -Wstrict-aliasing.  Adjust text
of expected warnings.
* gcc.dg/Warray-bounds-66.c: Adjust text of expected warnings.
* gcc.dg/Wstrict-aliasing-2.c: New test.
* gcc.dg/Wstrict-aliasing-3.c: New test.

What I don't like here is the strict-aliasing warnings inside the file
and analysis phase for array bounds checking.

ISTM that catching this at cast time would be better.  So perhaps in
build_c_cast and the C++ equivalent?

It would mean we're warning at the cast site rather than the
dereference, but that may ultimately be better for the warning anyway.
I'm not sure.


I had actually experimented with this (in the middle end, and only
for accesses) but even that caused so many warnings that I abandoned
it, at least for now.  -Warray-bounds has the benefit of flow analysis
(and dead code elimination).  In the front end it would have neither
and be both excessively noisy and ineffective at the same time.  For
example:

  struct A { int i; };
  struct B { int i; };

  struct A a;
  B *p = (B*)&a;   // FE would warn here (even if there's no access)

That's also why the current -Warray-bounds (and the proposed
-Wstrict-aliasing refinement) only triggers for accesses via larger
types, as in:

  struct C { int i, j; };
  void *p = &a;   // FE can't warn here
  C *q = (C*)p;   // or here
  q->j = 0;       // middle end warns here

Martin

PS The existing -Wstrict-aliasing in the front end is superficial
at best.  It only detects problems in simple accesses that don't
involve any data flow.  For instance it warns only in f() below but
not in g():

  int a;

  void f (void)
  {
    *(float*)&a = 0;   // -Wstrict-aliasing
  }

  void g (void)
  {
    float *p = (float*)&a;
*p = 0;// silence
  }

I'd like to improve it at some point but to catch the problem in g()
the warning will certainly have to move into the middle end.
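For completeness, the defined way to express the punning in f()/g() — which GCC folds to the same code without any aliasing violation — is memcpy (or std::bit_cast in C++20); a sketch under the usual assumption that int and float are both 32 bits:

```cpp
#include <cassert>
#include <cstring>

static_assert(sizeof(float) == sizeof(int), "assumes 32-bit int/float");

// Well-defined replacement for *(float*)&a = 0: copy the bytes instead
// of aliasing the object through an incompatible lvalue.  GCC compiles
// the memcpy of a register-sized object down to a plain move.
float int_bits_as_float(int i)
{
    float f;
    std::memcpy(&f, &i, sizeof f);
    return f;
}
```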


Re: [PATCH] libcpp: Fix up -fdirectives-only preprocessing [PR98882]

2021-02-03 Thread Jeff Law via Gcc-patches



On 1/29/21 4:01 PM, Jakub Jelinek via Gcc-patches wrote:
> Hi!
>
> GCC 11 ICEs on all -fdirectives-only preprocessing when the files don't end
> with a newline.
>
> The problem is in the assertion, for empty TUs buffer->cur == buffer->rlimit
> and so buffer->rlimit[-1] access triggers UB in the preprocessor, for
> non-empty TUs it refers to the last character in the file, which can be
> anything.
> The preprocessor adds a '\n' character (or '\r', in particular if the
> user file ends with '\r' then it adds another '\r' rather than '\n'), but
> that is added after the limit, i.e. at buffer->rlimit[0].
>
> Now, if the routine handles occasional bumping of pos to buffer->rlimit + 1,
> I think it is just the assert that needs changing, usually we read from *pos
> if pos < limit and then e.g. if it is '\r', look at the following character
> (which could be one of those '\n' or '\r' at buffer->rlimit[0]).  There is
> also the case where for '\\' before the limit we read following character
> and if it is '\n', do one thing, if it is '\r' read another character.
> But in that case if '\\' was the last char in the TU, the limit char will be
> '\n', so we are ok.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-01-29  Jakub Jelinek  
>
>   PR preprocessor/98882
>   * lex.c (cpp_directive_only_process): Don't assert that rlimit[-1]
>   is a newline, instead assert that rlimit[0] is either newline or
>   carriage return.  When seeing '\\' followed by '\r', check limit
>   before accessing pos[1].
OK
jeff
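The invariant Jakub describes — the real buffer is [cur, rlimit) and the extra newline/CR sits at rlimit[0], so nothing may assume rlimit[-1] exists or is a newline — can be sketched outside libcpp as follows (hypothetical `count_lines`, not the library's code):

```cpp
#include <cassert>
#include <string>

// Sketch of the buffer invariant: scan only [pos, rlimit), tolerate an
// empty buffer (so never touch rlimit[-1]), and handle input whose last
// character is not a newline.  A sentinel '\n' is appended past rlimit,
// mirroring what the preprocessor does at buffer->rlimit[0].
int count_lines(const std::string &text)
{
    std::string buf = text;
    buf.push_back('\n');                       // sentinel at rlimit[0]
    const char *pos = buf.data();
    const char *rlimit = buf.data() + text.size();
    int lines = 0;
    while (pos < rlimit)
    {
        if (*pos == '\n')
            ++lines;
        ++pos;
    }
    // A final line without a trailing newline still counts.
    if (!text.empty() && text.back() != '\n')
        ++lines;
    return lines;
}
```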



Re: [PATCH] adjust "partly out of bounds" warning (PR 98503)

2021-02-03 Thread Jeff Law via Gcc-patches



On 1/28/21 4:03 PM, Martin Sebor wrote:
> The GCC 11 -Warray-bounds enhancement to diagnose accesses whose
> leading offset is in bounds but whose trailing offset is not has
> been causing some confusion.  When the warning is issued for
> an access to an in-bounds member via a pointer to a struct that's
> larger than the pointed-to object, phrasing this strictly invalid
> access in terms of array subscripts can be misleading, especially
> when the source code doesn't involve any arrays or indexing.
>
> Since the problem boils down to an aliasing violation much more
> so than an actual out-of-bounds access, the attached patch adjusts
> the code to issue a -Wstrict-aliasing warning in these cases instead
> of -Warray-bounds.  In addition, since the aliasing assumptions in
> GCC can be disabled by -fno-strict-aliasing, the patch also makes
> these instances of the warning conditional on -fstrict-aliasing
> being in effect.
>
> Martin
>
> gcc-98503.diff
>
> PR middle-end/98503 -Warray-bounds when -Wstrict-aliasing would be more 
> appropriate
>
> gcc/ChangeLog:
>
>   PR middle-end/98503
>   * gimple-array-bounds.cc (array_bounds_checker::check_mem_ref):
>   Issue -Wstrict-aliasing for a subset of violations.
>   (array_bounds_checker::check_array_bounds):  Set new member.
>   * gimple-array-bounds.h (array_bounds_checker::cref_of_mref): New
>   data member.
>
> gcc/testsuite/ChangeLog:
>
>   PR middle-end/98503
>   * g++.dg/warn/Warray-bounds-10.C: Adjust text of expected warnings.
>   * g++.dg/warn/Warray-bounds-11.C: Same.
>   * g++.dg/warn/Warray-bounds-12.C: Same.
>   * g++.dg/warn/Warray-bounds-13.C: Same.
>   * gcc.dg/Warray-bounds-63.c: Avoid -Wstrict-aliasing.  Adjust text
>   of expected warnings.
>   * gcc.dg/Warray-bounds-66.c: Adjust text of expected warnings.
>   * gcc.dg/Wstrict-aliasing-2.c: New test.
>   * gcc.dg/Wstrict-aliasing-3.c: New test.
What I don't like here is the strict-aliasing warnings inside the file
and analysis phase for array bounds checking.

ISTM that catching this at cast time would be better.  So perhaps in
build_c_cast and the C++ equivalent?

It would mean we're warning at the cast site rather than the
dereference, but that may ultimately be better for the warning anyway. 
I'm not sure.



Jeff



Re: [PATCH] doc: mention -mprefer-vector-width in target attrs

2021-02-03 Thread Jeff Law via Gcc-patches



On 2/3/21 8:15 AM, Martin Liška wrote:
> The patch documents -mprefer-vector-width which is a valid target
> attribute/pragma.
>
> Ready for master?
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * doc/extend.texi: Mention -mprefer-vector-width in target
> attributes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/prefer-vector-width-attr.c: New test.
OK
jeff



[committed] libphobos: Merge upstream druntime 9d0c8364, phobos 9d575282e.

2021-02-03 Thread Iain Buclaw via Gcc-patches
Hi,

This patch merges the D runtime library with upstream druntime 9d0c8364,
and the standard library with upstream phobos 9d575282e.

Adds bindings to libdruntime to replace extern(C) declarations found in
the phobos part of the library, and fixes an issue with the locale
bindings being incomplete (PR98910).

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32, and
committed to mainline.

Regards,
Iain.

---
libphobos/ChangeLog:

PR d/98910
* libdruntime/MERGE: Merge upstream druntime 9d0c8364.
* libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add
  core/internal/attributes.d
(DRUNTIME_DSOURCES_BIONIC): Add core/sys/bionic/stdlib.d.
(DRUNTIME_DSOURCES_DARWIN): Add core/sys/darwin/stdlib.d, and
core/sys/darwin/sys/sysctl.d.
(DRUNTIME_DSOURCES_DRAGONFLYBSD): Add
core/sys/dragonflybsd/stdlib.d, and
core/sys/dragonflybsd/sys/sysctl.d.
(DRUNTIME_DSOURCES_FREEBSD): Add core/sys/freebsd/stdlib.d, and
core/sys/freebsd/sys/sysctl.d.
(DRUNTIME_DSOURCES_NETBSD): Add core/sys/netbsd/stdlib.d, and
core/sys/netbsd/sys/sysctl.d.
(DRUNTIME_DSOURCES_OPENBSD): Add core/sys/openbsd/stdlib.d, and
core/sys/openbsd/sys/sysctl.d.
(DRUNTIME_DSOURCES_SOLARIS): Add core/sys/solaris/stdlib.d.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos 9d575282e.
---
 libphobos/libdruntime/MERGE   |   2 +-
 libphobos/libdruntime/Makefile.am |  69 ++--
 libphobos/libdruntime/Makefile.in | 174 +
 .../libdruntime/core/internal/attributes.d|  11 +
 .../libdruntime/core/sys/bionic/stdlib.d  |  17 +
 .../libdruntime/core/sys/darwin/mach/dyld.d   |   5 +-
 .../libdruntime/core/sys/darwin/stdlib.d  |  26 ++
 .../libdruntime/core/sys/darwin/sys/sysctl.d  | 253 +
 .../core/sys/dragonflybsd/stdlib.d|  17 +
 .../core/sys/dragonflybsd/sys/sysctl.d| 199 +++
 .../libdruntime/core/sys/freebsd/stdlib.d |  17 +
 .../libdruntime/core/sys/freebsd/sys/sysctl.d | 211 +++
 .../libdruntime/core/sys/netbsd/stdlib.d  |  17 +
 .../libdruntime/core/sys/netbsd/sys/sysctl.d  | 254 +
 .../libdruntime/core/sys/openbsd/stdlib.d |  17 +
 .../libdruntime/core/sys/openbsd/sys/sysctl.d | 254 +
 libphobos/libdruntime/core/sys/posix/locale.d | 335 +++---
 libphobos/libdruntime/core/sys/posix/mqueue.d |   6 +-
 .../libdruntime/core/sys/posix/pthread.d  |   3 +-
 .../libdruntime/core/sys/posix/sys/statvfs.d  | 101 --
 .../libdruntime/core/sys/posix/sys/types.d|   9 +-
 .../libdruntime/core/sys/solaris/stdlib.d |  17 +
 libphobos/src/MERGE   |   2 +-
 libphobos/src/std/conv.d  |   2 -
 libphobos/src/std/datetime/systime.d  | 110 --
 libphobos/src/std/datetime/timezone.d |  17 +-
 libphobos/src/std/exception.d |   5 +-
 .../allocator/building_blocks/region.d|  44 ++-
 .../experimental/allocator/mmap_allocator.d   |  17 +
 libphobos/src/std/file.d  |  88 -
 libphobos/src/std/math.d  |  33 +-
 libphobos/src/std/parallelism.d   | 233 +++-
 libphobos/src/std/socket.d|   4 +-
 libphobos/src/std/stdio.d |   9 +-
 libphobos/src/std/system.d|   6 +
 35 files changed, 2172 insertions(+), 412 deletions(-)
 create mode 100644 libphobos/libdruntime/core/internal/attributes.d
 create mode 100644 libphobos/libdruntime/core/sys/bionic/stdlib.d
 create mode 100644 libphobos/libdruntime/core/sys/darwin/stdlib.d
 create mode 100644 libphobos/libdruntime/core/sys/darwin/sys/sysctl.d
 create mode 100644 libphobos/libdruntime/core/sys/dragonflybsd/stdlib.d
 create mode 100644 libphobos/libdruntime/core/sys/dragonflybsd/sys/sysctl.d
 create mode 100644 libphobos/libdruntime/core/sys/freebsd/stdlib.d
 create mode 100644 libphobos/libdruntime/core/sys/freebsd/sys/sysctl.d
 create mode 100644 libphobos/libdruntime/core/sys/netbsd/stdlib.d
 create mode 100644 libphobos/libdruntime/core/sys/netbsd/sys/sysctl.d
 create mode 100644 libphobos/libdruntime/core/sys/openbsd/stdlib.d
 create mode 100644 libphobos/libdruntime/core/sys/openbsd/sys/sysctl.d
 create mode 100644 libphobos/libdruntime/core/sys/solaris/stdlib.d

diff --git a/libphobos/libdruntime/MERGE b/libphobos/libdruntime/MERGE
index 4654e58e2d9..3485bde1200 100644
--- a/libphobos/libdruntime/MERGE
+++ b/libphobos/libdruntime/MERGE
@@ -1,4 +1,4 @@
-e4aae28e36c118f13e346a61af6c413aadd8e838
+9d0c8364450064d0b6e68da4384f8acd19eb454f
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/druntime repository.
diff --git a/libphobos/libdruntime/Makefile.am b/libphobos/libdruntime/Makefile.am
index 57de872862b..df2c06c3dab 100644
--- 

[wwwdocs] Mention C++23 flags and addition of size_t literals.

2021-02-03 Thread Ed Smith-Rowland via Gcc-patches

I pushed this patch to update changes.html to mention the C++23 flags and the size_t
literals.

Ed

commit bfc68f3f88406a07757d111a0f894426a6bc4522
Author: Ed Smith-Rowland <3dw...@verizon.net>
Date:   Wed Feb 3 14:23:00 2021 -0500

Mention C++23 flags and P0330R8, Literal Suffix for (signed) size_t.
Tweak an href in cxx-status.

diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html
index efbf3341..a8b036c3 100644
--- a/htdocs/gcc-11/changes.html
+++ b/htdocs/gcc-11/changes.html
@@ -251,6 +251,18 @@ a work-in-progress.
   to C++ 20 Status
 
   
+  
+The C++ front end has experimental support for some of the upcoming C++23
+draft features with the -std=c++23, -std=gnu++23,
+-std=c++2b or -std=gnu++2b flags,
+including
+
+  P0330R8, Literal Suffix for (signed) size_t.
+
+For a full list of new features,
+see the C++
+status page.
+  
   Several C++ Defect Reports have been resolved, e.g.:
 
   DR 625, Use of auto as a template-argument
diff --git a/htdocs/projects/cxx-status.html b/htdocs/projects/cxx-status.html
index 91916930..94f45def 100644
--- a/htdocs/projects/cxx-status.html
+++ b/htdocs/projects/cxx-status.html
@@ -19,7 +19,7 @@
 C++14
 C++17
 C++20
-C++23
+C++23
 Technical Specifications
   
   


---

Summary of changes:
 htdocs/gcc-11/changes.html  | 12 
 htdocs/projects/cxx-status.html |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)



Re: [PATCH 2/2] RISC-V: Add riscv{32, 64}be with big endian as default

2021-02-03 Thread Marcus Comstedt


Kito Cheng  writes:

> Yeah, but I'd like to include following 2 lines too:
>
> %{mbig-endian:-EB} \
> %{mlittle-endian:-EL} \
>
> I saw it's just the same among 3 files.

Ah, I see.  Then it becomes a little more of a mixed grab bag.

I see that SuperH has a spec "subtarget_link_spec" which includes
-EL/-EB as well as other stuff, and then includes that into a
SH_LINK_SPEC, which also adds the link emulation as well as
e.g. mrelax.

Maybe we could take a page from their playbook but skip the extra
complexity, and just use "RISCV_LINK_SPEC" for the common stuff?

What about "%{mno-relax:--no-relax}"?  It's the same in linux.h and
elf.h, but missing from freebsd.h.  Intentional, or is this another
candidate for putting in the common define?


  // Marcus




Re: [PATCH] c++: Mark member functions as const [PR98951]

2021-02-03 Thread Jason Merrill via Gcc-patches

On 2/3/21 12:31 PM, Marek Polacek wrote:

These member functions look like they could be marked const, since
they don't modify any (non-mutable) class members.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK, I guess this is trivial enough for stage4.


PR c++/98951
* call.c (struct z_candidate): Mark rewritten and reversed as const.
(struct NonPublicField): Mark operator() as const.
(struct NonTrivialField): Likewise.
---
  gcc/cp/call.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 87a7af12796..3068c0f8cfd 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -520,8 +520,8 @@ struct z_candidate {
/* The flags active in add_candidate.  */
int flags;
  
-  bool rewritten () { return (flags & LOOKUP_REWRITTEN); }
-  bool reversed () { return (flags & LOOKUP_REVERSED); }
+  bool rewritten () const { return (flags & LOOKUP_REWRITTEN); }
+  bool reversed () const { return (flags & LOOKUP_REVERSED); }
  };
  
  /* Returns true iff T is a null pointer constant in the sense of

@@ -9474,7 +9474,7 @@ first_non_static_field (tree type, Predicate pred)
  
  struct NonPublicField
  {
-  bool operator() (const_tree t)
+  bool operator() (const_tree t) const
{
  return DECL_P (t) && (TREE_PRIVATE (t) || TREE_PROTECTED (t));
}
@@ -9491,7 +9491,7 @@ first_non_public_field (tree type)
  
  struct NonTrivialField
  {
-  bool operator() (const_tree t)
+  bool operator() (const_tree t) const
{
  return !trivial_type_p (DECL_P (t) ? TREE_TYPE (t) : t);
}

base-commit: 530203d6e3244c25eda4124f0fa5756ca9a5683e





[AArch64] Fix vector multiplication costs

2021-02-03 Thread Andre Vieira (lists) via Gcc-patches
This patch introduces a vect.mul RTX cost and decouples the vector 
multiplication costing from the scalar one.


After Wilco's "AArch64: Add cost table for Cortex-A76" patch we saw a 
regression in vector codegen, reproducible with the small test added in 
this patch.
Upon further investigation we noticed 'aarch64_rtx_mult_cost' was using 
scalar costs to calculate the cost of vector multiplication, which was 
now lower, preventing 'choose_mult_variant' from making the right 
choice to expand such vector multiplications by constants as shifts and 
subs. I also added a special case for SSRA to use the default vector 
cost rather than mult; SSRA currently seems to be costed through 
'aarch64_rtx_mult_cost', which to be fair is quite curious. I believe we 
should have a better look at 'aarch64_rtx_costs' altogether and 
completely decouple vector and scalar costs. Though that is something 
that requires more rewriting than I believe should be done in Stage 4.


I gave all targets a vect.mult cost of 4x the vect.alu cost, with the 
exception of targets with cost 0 for vect.alu, those I gave the cost 4.


Bootstrapped on aarch64.

Is this OK for trunk?

gcc/ChangeLog:

    * config/aarch64/aarch64-cost-tables.h: Add entries for vect.mul.
    * config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Use vect.mul for
    vector multiplies and vect.alu for SSRA.
    * config/arm/aarch-common-protos.h (struct vector_cost_table): Define
    vect.mul cost field.
    * config/arm/aarch-cost-tables.h: Add entries for vect.mul.
    * config/arm/arm.c: Likewise.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/asimd-mul-to-shl-sub.c: New test.

diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h
index c309f88cbd56f0d2347996d860c982a3a6744492..dd2e7e7cbb13d24f0b51092270cd7e2d75fabf29 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -123,7 +123,8 @@ const struct cpu_cost_table qdf24xx_extra_costs =
   },
   /* Vector */
   {
-COSTS_N_INSNS (1)  /* alu.  */
+COSTS_N_INSNS (1),  /* alu.  */
+COSTS_N_INSNS (4)   /* mult.  */
   }
 };
 
@@ -227,7 +228,8 @@ const struct cpu_cost_table thunderx_extra_costs =
   },
   /* Vector */
   {
-COSTS_N_INSNS (1)  /* Alu.  */
+COSTS_N_INSNS (1), /* Alu.  */
+COSTS_N_INSNS (4)  /* mult.  */
   }
 };
 
@@ -330,7 +332,8 @@ const struct cpu_cost_table thunderx2t99_extra_costs =
   },
   /* Vector */
   {
-COSTS_N_INSNS (1)  /* Alu.  */
+COSTS_N_INSNS (1), /* Alu.  */
+COSTS_N_INSNS (4)  /* Mult.  */
   }
 };
 
@@ -433,7 +436,8 @@ const struct cpu_cost_table thunderx3t110_extra_costs =
   },
   /* Vector */
   {
-COSTS_N_INSNS (1)  /* Alu.  */
+COSTS_N_INSNS (1), /* Alu.  */
+COSTS_N_INSNS (4)  /* Mult.  */
   }
 };
 
@@ -537,7 +541,8 @@ const struct cpu_cost_table tsv110_extra_costs =
   },
   /* Vector */
   {
-COSTS_N_INSNS (1)  /* alu.  */
+COSTS_N_INSNS (1),  /* alu.  */
+COSTS_N_INSNS (4)   /* mult.  */
   }
 };
 
@@ -640,7 +645,8 @@ const struct cpu_cost_table a64fx_extra_costs =
   },
   /* Vector */
   {
-COSTS_N_INSNS (1)  /* alu.  */
+COSTS_N_INSNS (1),  /* alu.  */
+COSTS_N_INSNS (4)   /* mult.  */
   }
 };
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b6192e55521004ae70cd13acbdb4dab142216845..146ed8c1b693d7204a754bc4e6d17025e0af544b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -11568,7 +11568,6 @@ aarch64_rtx_mult_cost (rtx x, enum rtx_code code, int outer, bool speed)
   if (VECTOR_MODE_P (mode))
 {
   unsigned int vec_flags = aarch64_classify_vector_mode (mode);
-  mode = GET_MODE_INNER (mode);
   if (vec_flags & VEC_ADVSIMD)
{
  /* The by-element versions of the instruction have the same costs as
@@ -11582,6 +11581,17 @@ aarch64_rtx_mult_cost (rtx x, enum rtx_code code, int outer, bool speed)
  else if (GET_CODE (op1) == VEC_DUPLICATE)
op1 = XEXP (op1, 0);
}
+  cost += rtx_cost (op0, mode, MULT, 0, speed);
+  cost += rtx_cost (op1, mode, MULT, 1, speed);
+  if (speed)
+   {
+ if (GET_CODE (x) == MULT)
+   cost += extra_cost->vect.mult;
+ /* This is to catch the SSRA costing currently flowing here.  */
+ else
+   cost += extra_cost->vect.alu;
+   }
+  return cost;
 }
 
   /* Integer multiply/fma.  */
diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 251de3d61a833a2bb4b77e9211cac7fbc17c0b75..7a9cf3d324c103de74af741abe9ef30b76fea5ce 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -132,6 +132,7 @@ struct fp_cost_table
 struct vector_cost_table
 {
   const int alu;
+  const int mult;
 };
 
 struct cpu_cost_table
diff --git a/gcc/config/arm/aarch-cost-tables.h 

Re: Merge from trunk to gccgo branch

2021-02-03 Thread Ian Lance Taylor via Gcc-patches
I merged trunk revision 530203d6e3244c25eda4124f0fa5756ca9a5683e to
the gccgo branch.

Ian


[pushed] c++: Fix alias comparison [PR98926]

2021-02-03 Thread Jason Merrill via Gcc-patches
The comparison of dependent aliases wasn't working here because
processing_template_decl wasn't set, so dependent_alias_template_spec_p was
always returning false.

Tested x86_64-pc-linux-gnu (with the strict-gc parms below), applying to trunk.

gcc/cp/ChangeLog:

PR c++/98926
PR c++/98570
* pt.c (spec_hasher::equal): Set processing_template_decl.
* Make-lang.in (check-g++-strict-gc): Add --param
hash-table-verification-limit=1.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alias-decl-dr1558.C: Pass --param
hash-table-verification-limit=1.
---
 gcc/cp/pt.c| 2 ++
 gcc/testsuite/g++.dg/cpp0x/alias-decl-dr1558.C | 1 +
 gcc/cp/Make-lang.in| 2 +-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 4781519d00f..c5b0a9292db 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -1720,6 +1720,7 @@ spec_hasher::equal (spec_entry *e1, spec_entry *e2)
 
   ++comparing_specializations;
   ++comparing_dependent_aliases;
+  ++processing_template_decl;
   equal = (e1->tmpl == e2->tmpl
   && comp_template_args (e1->args, e2->args));
   if (equal && flag_concepts
@@ -1734,6 +1735,7 @@ spec_hasher::equal (spec_entry *e1, spec_entry *e2)
   tree c2 = e2->spec ? get_constraints (e2->spec) : NULL_TREE;
   equal = equivalent_constraints (c1, c2);
 }
+  --processing_template_decl;
   --comparing_dependent_aliases;
   --comparing_specializations;
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-dr1558.C b/gcc/testsuite/g++.dg/cpp0x/alias-decl-dr1558.C
index 2bbb138ec22..8495462bd6d 100644
--- a/gcc/testsuite/g++.dg/cpp0x/alias-decl-dr1558.C
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-dr1558.C
@@ -1,5 +1,6 @@
 // DR 1558 still applies when using void_t as a template-argument.
 // { dg-do compile { target c++11 } }
+// { dg-additional-options "--param hash-table-verification-limit=1" }
 
 template using void_t = void;
 template struct A { };
diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in
index 62295fb0dfe..155be74efdb 100644
--- a/gcc/cp/Make-lang.in
+++ b/gcc/cp/Make-lang.in
@@ -224,7 +224,7 @@ check-c++-all:
 
 # Run the testsuite with garbage collection at every opportunity.
 check-g++-strict-gc:
-   $(MAKE) RUNTESTFLAGS="$(RUNTESTFLAGS) --extra_opts,--param,ggc-min-heapsize=0,--param,ggc-min-expand=0" \
+   $(MAKE) RUNTESTFLAGS="$(RUNTESTFLAGS) --extra_opts,--param,ggc-min-heapsize=0,--param,ggc-min-expand=0,--param,hash-table-verification-limit=1" \
  TESTSUITEDIR="$(TESTSUITEDIR).gc" check-g++
 check-c++-subtargets : check-g++-subtargets
 # List of targets that can use the generic check- rule and its // variant.

base-commit: 5c3d388aee5609d32bd8e3ba1add776b1a6f0d1f
-- 
2.27.0



[PATCH] c++: Mark member functions as const [PR98951]

2021-02-03 Thread Marek Polacek via Gcc-patches
These member functions look like they could be marked const, since
they don't modify any (non-mutable) class members.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/98951
* call.c (struct z_candidate): Mark rewritten and reversed as const.
(struct NonPublicField): Mark operator() as const.
(struct NonTrivialField): Likewise.
---
 gcc/cp/call.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 87a7af12796..3068c0f8cfd 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -520,8 +520,8 @@ struct z_candidate {
   /* The flags active in add_candidate.  */
   int flags;
 
-  bool rewritten () { return (flags & LOOKUP_REWRITTEN); }
-  bool reversed () { return (flags & LOOKUP_REVERSED); }
+  bool rewritten () const { return (flags & LOOKUP_REWRITTEN); }
+  bool reversed () const { return (flags & LOOKUP_REVERSED); }
 };
 
 /* Returns true iff T is a null pointer constant in the sense of
@@ -9474,7 +9474,7 @@ first_non_static_field (tree type, Predicate pred)
 
 struct NonPublicField
 {
-  bool operator() (const_tree t)
+  bool operator() (const_tree t) const
   {
 return DECL_P (t) && (TREE_PRIVATE (t) || TREE_PROTECTED (t));
   }
@@ -9491,7 +9491,7 @@ first_non_public_field (tree type)
 
 struct NonTrivialField
 {
-  bool operator() (const_tree t)
+  bool operator() (const_tree t) const
   {
 return !trivial_type_p (DECL_P (t) ? TREE_TYPE (t) : t);
   }

base-commit: 530203d6e3244c25eda4124f0fa5756ca9a5683e
-- 
2.29.2



Re: [PATCH] document BLOCK_ABSTRACT_ORIGIN et al.

2021-02-03 Thread Martin Sebor via Gcc-patches

On 2/3/21 5:01 AM, Richard Biener wrote:

On Mon, Feb 1, 2021 at 5:20 PM Martin Sebor  wrote:


I have pushed the tree.h comments in g:6a2053773b8.  I will wait
for an approval of the changes to the manual.


Sorry for not looking earlier.


Sorry, I thought you were fine with the text after your first review.
I'll adjust the tree.h comments when we're done, though I'd like to
think the example in the manual will do a lot more to help make it
clear than the comments in tree.h can.



+/* The scope enclosing the scope NODE, or FUNCTION_DECL for the "outermost"
+   function scope.  Inlined functions are chained by this so that given
+   expression E and its TREE_BLOCK(E) B, BLOCK_SUPERCONTEXT(B) is the scope
+   in which E has been made or into which E has been inlined.   */

I can't really understand what you are trying to say with the second
sentence.  There's
nothing really special about BLOCK_SUPERCONTEXT and inlines so I believe this
sentence only adds confusion.


The sentence explains how SUPERCONTEXT chains inlined blocks.  In
the manual diff I show an example:

  void f0 (char *p, int n) { memset (p, 1, n); }

  void f1 (char *p, int n) { f0 (p + 1, n + 1); }

  void f2 (char *p, int n) { f1 (p + 1, n + 1); }

  int a[6];
  void f3 (char *p, int n) { f2 (a, 3); }

The blocks for all calls inlined into f3 are chained like so:

  CALL_EXPR: memset  E

  BLOCK #13 <--+ TREE_BLOCK (E)
  +-- SUPERCONTEXT: BLOCK #12  |
  | ABSTRACT_ORIGIN: BLOCK #0 --+
  |||
  +-> BLOCK #12 (f1)<--|-+  |
  +-- SUPERCONTEXT: BLOCK #10  | |  |
  | SUBBLOCKS: BLOCK #13 --|-|  |
  | ABSTRACT_ORIGIN: f0 ---+ |  |
  |  |  |
  +-> BLOCK #10 (f2) <-+ |  |
  +--- SUPERCONTEXT: BLOCK #8  | |  |
  |SUBBLOCKS: BLOCK #12 ---|-|  |
  |ABSTRACT_ORIGIN: f1 --+  |
  |||
  +-> BLOCK #8 (f3)||
  + SUPERCONTEXT: BLOCK #0 ||
  | SUBBLOCKS: BLOCK #10 --||
  | ABSTRACT_ORIGIN: f2 ---+|
  | |
  +-> BLOCK #0 (f3) <---+
SUPERCONTEXT: f3
SUBBLOCKS: BLOCK #8

Does the following sound better? (Dropping the "in which E has been
made.")

  Inlined functions are chained by this so that given expression E
  and its TREE_BLOCK(E) B, BLOCK_SUPERCONTEXT(B) is the scope into
  which E has been inlined.


  #define BLOCK_SUPERCONTEXT(NODE) (BLOCK_CHECK (NODE)->block.supercontext)
+/* Points to the next scope at the same level of nesting as scope NODE.  */
  #define BLOCK_CHAIN(NODE) (BLOCK_CHECK (NODE)->block.chain)
+/* A BLOCK, or FUNCTION_DECL of the function from which a block has been
+   inlined.

... from which a block has been ultimately copied, for example by inlining.

[clones also will have abstract origins]

   In a scope immediately enclosing an inlined leaf expression,
+   points to the outermost scope into which it has been inlined (thus
+   bypassing all intermediate BLOCK_SUPERCONTEXTs). */

?


This describes the long arrow on the right, pointing Block #13's
ABSTRACT_ORIGIN down to Block #0.  All the other AO's point down
to the next/enclosing block (arrows on the left).  I didn't expect
this when I first worked with the blocks so it seemed like
an important detail to mention.



Maybe:  An inlined function is represented by a scope with
BLOCK_ABSTRACT_ORIGIN being the FUNCTION_DECL of the inlined function
containing the inlined functions scope tree as children.  All abstract origins
are ultimate, that is BLOCK_ABSTRACT_ORIGIN(NODE)
== BLOCK_ABSTRACT_ORIGIN(BLOCK_ABSTRACT_ORIGIN (NODE)).


The first sentence sounds good to me as far as it goes but it
doesn't capture the long arrow above.  (By children I assume you
mean SUBBLOCKS, correct?)

I don't follow what you're trying to say in the second sentence.
The equality isn't true for Block #0 whose AO is null.  It also
isn't true for Block #12 and the others whose AO is a DECL, not
a block.

What do you mean by "ultimate" in plain English?

FWIW, if I were to try to explain it using the example I'd say
only Block #13's AO is "ultimate:" it points down in the diagram
to the block of the function into which the expression has
ultimately been inlined.  The AO's of all the other intervening
inlined blocks are the DECLs of the inlined callees (up-pointing
arrows); they don't look ultimate to me in this sense.

But however this is phrased I suspect it won't be perfectly clear
without an example or a picture.

Martin



  #define BLOCK_ABSTRACT_ORIGIN(NODE) (BLOCK_CHECK (NODE)->block.abstract_origin)



On 1/27/21 5:54 PM, Martin Sebor wrote:

Attached is an updated patch for both tree.h and the internals manual
documenting the most important BLOCK_ macros and what they represent.

On 1/21/21 2:52 PM, Martin Sebor wrote:

On 1/18/21 6:25 AM, Richard Biener wrote:

PS Here are my notes on the macros and the two related 

libgo patch committed: Install new packages

2021-02-03 Thread Ian Lance Taylor via Gcc-patches
In the update of libgo to the Go 1.16 beta and release candidate, I
forgot to update the Makefile to actually install new packages.  This
patch does that.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
8e6839059d52c02acb52a4ba1ea6a5fcda88d16b
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index cb12c83a700..905c6fe9326 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-271a043537f2f0ae93bde2cf4f4897e68a476ece
+78770fd9c29037dec8b2919c0f02067915c6ad33
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/Makefile.am b/libgo/Makefile.am
index 7be90a564fe..eea7ff15aed 100644
--- a/libgo/Makefile.am
+++ b/libgo/Makefile.am
@@ -130,6 +130,7 @@ toolexeclibgo_DATA = \
bytes.gox \
context.gox \
crypto.gox \
+   embed.gox \
encoding.gox \
errors.gox \
expvar.gox \
@@ -256,6 +257,11 @@ toolexeclibgogo_DATA = \
go/token.gox \
go/types.gox
 
+toolexeclibgogobuilddir = $(toolexeclibgogodir)/build
+
+toolexeclibgogobuild_DATA = \
+   go/build/constraint.gox
+
 toolexeclibgohashdir = $(toolexeclibgodir)/hash
 
 toolexeclibgohash_DATA = \
@@ -292,6 +298,7 @@ toolexeclibgoindex_DATA = \
 toolexeclibgoiodir = $(toolexeclibgodir)/io
 
 toolexeclibgoio_DATA = \
+   io/fs.gox \
io/ioutil.gox
 
 toolexeclibgologdir = $(toolexeclibgodir)/log
@@ -360,6 +367,7 @@ toolexeclibgoruntimedir = $(toolexeclibgodir)/runtime
 
 toolexeclibgoruntime_DATA = \
runtime/debug.gox \
+   runtime/metrics.gox \
runtime/pprof.gox \
runtime/trace.gox
 
@@ -371,6 +379,7 @@ toolexeclibgosync_DATA = \
 toolexeclibgotestingdir = $(toolexeclibgodir)/testing
 
 toolexeclibgotesting_DATA = \
+   testing/fstest.gox \
testing/iotest.gox \
testing/quick.gox
 


[PATCH] Fix Ada bootstrap failure on Cygwin since switch to C++11 (PR98590)

2021-02-03 Thread Mikael Pettersson via Gcc-patches
This fixes the bootstrap failure with Ada on Cygwin since the switch
to C++11. The configure checks detect that fileno_unlocked () is
present, but when Ada's cstreams.c is compiled in C++11 mode,
<stdio.h> does not declare it, causing a hard error.

Fixed by defining _GNU_SOURCE before including <stdio.h>.

Ok for the master branch?

gcc/ada/

2021-02-03  Mikael Pettersson  

PR bootstrap/98590
* cstreams.c: Ensure fileno_unlocked() is visible on Cygwin.

diff --git a/gcc/ada/cstreams.c b/gcc/ada/cstreams.c
index 4e00dedbbd6..9d2f41c5269 100644
--- a/gcc/ada/cstreams.c
+++ b/gcc/ada/cstreams.c
@@ -37,6 +37,11 @@
 #define _FILE_OFFSET_BITS 64
 /* the define above will make off_t a 64bit type on GNU/Linux */

+/* tell Cygwin's <stdio.h> to expose fileno_unlocked() to work around PR98590 */
+#if defined(__CYGWIN__) && !defined(__CYGWIN32__) && !defined(_GNU_SOURCE)
+#define _GNU_SOURCE
+#endif
+
 #include 
 #include 
 #include 
From 7a277d8c2a6c1d4ccfbb0ca350e4b1f35a3e575c Mon Sep 17 00:00:00 2001
From: Mikael Pettersson 
Date: Wed, 3 Feb 2021 17:25:42 +0100
Subject: [PATCH] Ensure fileno_unlocked() is visible on Cygwin.

---
 gcc/ada/cstreams.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/ada/cstreams.c b/gcc/ada/cstreams.c
index 4e00dedbbd6..9d2f41c5269 100644
--- a/gcc/ada/cstreams.c
+++ b/gcc/ada/cstreams.c
@@ -37,6 +37,11 @@
 #define _FILE_OFFSET_BITS 64
 /* the define above will make off_t a 64bit type on GNU/Linux */
 
+/* tell Cygwin's <stdio.h> to expose fileno_unlocked() to work around PR98590 */
+#if defined(__CYGWIN__) && !defined(__CYGWIN32__) && !defined(_GNU_SOURCE)
+#define _GNU_SOURCE
+#endif
+
 #include 
 #include 
 #include 
-- 
2.26.2



[committed] testsuite: Add test for already fixed PR [PR97804]

2021-02-03 Thread Jakub Jelinek via Gcc-patches
Hi!

This testcase got fixed with the PR98463
r11-6895-g94ff4c9dd98f39280fba22d1ad0958fb25a5363b fix.

Regtested on x86_64-linux, committed to trunk and 10 branch.

2021-02-03  Jakub Jelinek  

PR c++/97804
* g++.dg/cpp2a/no_unique_address11.C: New test.

--- gcc/testsuite/g++.dg/cpp2a/no_unique_address11.C.jj
+++ gcc/testsuite/g++.dg/cpp2a/no_unique_address11.C
@@ -0,0 +1,18 @@
+// PR c++/97804
+// { dg-do compile { target c++17 } }
+
+template  struct b {
+  constexpr b() : c() {}
+  [[no_unique_address]] a c;
+};
+template  struct d;
+template 
+struct d : d<1, f...>, b {};
+template  struct d : b {};
+template  class h : d<0, g...> {};
+struct i {};
+class j {
+  using k = int;
+  h l;
+  float m = 0.025f;
+} n;

Jakub



RE: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528]

2021-02-03 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Andre Vieira (lists) 
> Sent: 03 February 2021 13:58
> To: Kyrylo Tkachov ; gcc-patches@gcc.gnu.org
> Cc: ja...@redhat.com
> Subject: Re: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528]
> 
> Same patch applies cleanly on gcc-8, bootstrapped
> arm-none-linux-gnueabihf and ran regressions also clean.
> 
> Can I also commit it to gcc-8?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Andre
> 
> On 02/02/2021 17:36, Kyrylo Tkachov wrote:
> >
> >> -Original Message-
> >> From: Andre Vieira (lists) 
> >> Sent: 02 February 2021 17:27
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: Kyrylo Tkachov ; ja...@redhat.com
> >> Subject: Re: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528]
> >>
> >> Hi,
> >>
> >> This is a gcc-9 backport of the PR97528 fix that has been applied to
> >> trunk and gcc-10.
> >> Bootstraped on arm-linux-gnueabihf and regression tested.
> >>
> >> OK for gcc-9 branch?
> > Ok.
> > Thanks,
> > Kyrill
> >
> >> 2021-02-02  Andre Vieira  
> >>
> >>       Backport from mainline
> >>       2020-11-20  Jakub Jelinek  
> >>
> >>       PR target/97528
> >>       * config/arm/arm.c (neon_vector_mem_operand): For POST_MODIFY,
> >> require
> >>       first POST_MODIFY operand is a REG and is equal to the first operand
> >>       of PLUS.
> >>
> >>       * gcc.target/arm/pr97528.c: New test.
> >>
> >> On 20/11/2020 11:25, Kyrylo Tkachov via Gcc-patches wrote:
>  -Original Message-
>  From: Jakub Jelinek 
>  Sent: 19 November 2020 18:57
>  To: Richard Earnshaw ; Ramana
>  Radhakrishnan ; Kyrylo Tkachov
>  
>  Cc: gcc-patches@gcc.gnu.org
>  Subject: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528]
> 
>  Hi!
> 
>  The documentation for POST_MODIFY says:
>   Currently, the compiler can only handle second operands of the
>   form (plus (reg) (reg)) and (plus (reg) (const_int)), where
>   the first operand of the PLUS has to be the same register as
>   the first operand of the *_MODIFY.
>  The following testcase ICEs, because combine just attempts to simplify
>  things and ends up with
>  (post_modify (reg1) (plus (mult (reg2) (const_int 4)) (reg1))
>  but the target predicates accept it, because they only verify
>  that POST_MODIFY's second operand is PLUS and the second operand
>  of the PLUS is a REG.
> 
>  The following patch fixes this by performing further verification that
>  the POST_MODIFY is in the form it should be.
> 
>  Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk
>  and release branches after a while?
> >>> Ok.
> >>> Thanks,
> >>> Kyrill
> >>>
>  2020-11-19  Jakub Jelinek  
> 
>   PR target/97528
>   * config/arm/arm.c (neon_vector_mem_operand): For
>  POST_MODIFY, require
>   first POST_MODIFY operand is a REG and is equal to the first operand
>   of PLUS.
> 
>   * gcc.target/arm/pr97528.c: New test.
> 
>  --- gcc/config/arm/arm.c.jj  2020-11-13 19:00:46.729620560
> +0100
>  +++ gcc/config/arm/arm.c 2020-11-18 17:05:44.656867343
> +0100
>  @@ -13429,7 +13429,9 @@ neon_vector_mem_operand (rtx op, int
> typ
>   /* Allow post-increment by register for VLDn */
>   if (type == 2 && GET_CODE (ind) == POST_MODIFY
>   && GET_CODE (XEXP (ind, 1)) == PLUS
>  -  && REG_P (XEXP (XEXP (ind, 1), 1)))
>  +  && REG_P (XEXP (XEXP (ind, 1), 1))
>  +  && REG_P (XEXP (ind, 0))
>  +  && rtx_equal_p (XEXP (ind, 0), XEXP (XEXP (ind, 1), 0)))
>  return true;
> 
>   /* Match:
>  --- gcc/testsuite/gcc.target/arm/pr97528.c.jj2020-11-18
>  17:09:58.195053288 +0100
>  +++ gcc/testsuite/gcc.target/arm/pr97528.c   2020-11-18
>  17:09:47.839168237 +0100
>  @@ -0,0 +1,28 @@
>  +/* PR target/97528 */
>  +/* { dg-do compile } */
>  +/* { dg-require-effective-target arm_neon_ok } */
>  +/* { dg-options "-O1" }  */
>  +/* { dg-add-options arm_neon } */
>  +
>  +#include <arm_neon.h>
>  +
>  +typedef __simd64_int16_t T;
>  +typedef __simd64_uint16_t U;
>  +unsigned short c;
>  +int d;
>  +U e;
>  +
>  +void
>  +foo (void)
>  +{
>  +  unsigned short *dst = &c;
>  +  int g = d, b = 4;
>  +  U dc = e;
>  +  for (int h = 0; h < b; h++)
>  +{
>  +  unsigned short *i = dst;
>  +  U j = dc;
>  +  vst1_s16 ((int16_t *) i, (T) j);
>  +  dst += g;
>  +}
>  +}
> 
> 
>   Jakub


Re: [PATCH 00/16] stdx::simd fixes and testsuite improvements

2021-02-03 Thread Jonathan Wakely via Gcc-patches

On 27/01/21 21:36 +0100, Matthias Kretz wrote:

As promised on IRC ...

Matthias Kretz (15):
 Support skip, only, expensive, and xfail markers
 Fix NEON intrinsic types usage
 Support -mlong-double-64 on PPC
 Fix simd_mask on POWER w/o POWER8
 Fix several check-simd interaction issues
 Fix DRIVEROPTS and TESTFLAGS processing
 Fix incorrect display of old test summaries
 Immediate feedback with -v
 Fix mask reduction of simd_mask on POWER7
 Skip testing hypot3 for long double on PPC
 Abort test after 1000 lines of output
 Support timeout and timeout-factor options
 Improve test codegen for interpreting assembly
 Implement hmin and hmax
 Work around test failures using -mno-tree-vrp

yaozhongxiao (1):
 Improve "find_first/last_set" for NEON


All 16 committed now. Thanks.




[committed] libstdc++: Fix incorrect test for std::error_code comparisons

2021-02-03 Thread Jonathan Wakely via Gcc-patches
The tests for std::error_code comparisons assumed that a default
constructed object uses std::generic_category(). That's true for a
default constructed std::error_condition, but not std::error_code.

Fix the three-way comparisons to correctly depend on the result of
comparing the categories, and add another test for comparing two objects
with the same category and different values.

libstdc++-v3/ChangeLog:

* testsuite/19_diagnostics/error_code/operators/not_equal.cc:
Add comparison with same category and different values.
* testsuite/19_diagnostics/error_code/operators/less.cc:
Likewise. Fix comparison involving different categories.
* testsuite/19_diagnostics/error_code/operators/three_way.cc:
Likewise.
* testsuite/19_diagnostics/error_condition/operators/less.cc:
Add comment.
* testsuite/19_diagnostics/error_condition/operators/three_way.cc:
Likewise.

Tested x86_64-linux. Committed to trunk.

commit a6f08be383f846a0474ea8d1da9222b802c36c7c
Author: Jonathan Wakely 
Date:   Wed Feb 3 15:49:36 2021

libstdc++: Fix incorrect test for std::error_code comparisons

The tests for std::error_code comparisons assumed that a default
constructed object uses std::generic_category(). That's true for a
default constructed std::error_condition, but not std::error_code.

Fix the three-way comparisons to correctly depend on the result of
comparing the categories, and add another test for comparing two objects
with the same category and different values.

libstdc++-v3/ChangeLog:

* testsuite/19_diagnostics/error_code/operators/not_equal.cc:
Add comparison with same category and different values.
* testsuite/19_diagnostics/error_code/operators/less.cc:
Likewise. Fix comparison involving different categories.
* testsuite/19_diagnostics/error_code/operators/three_way.cc:
Likewise.
* testsuite/19_diagnostics/error_condition/operators/less.cc:
Add comment.
* testsuite/19_diagnostics/error_condition/operators/three_way.cc:
Likewise.

diff --git a/libstdc++-v3/testsuite/19_diagnostics/error_code/operators/less.cc 
b/libstdc++-v3/testsuite/19_diagnostics/error_code/operators/less.cc
index 655515c4988..abb754136ea 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/error_code/operators/less.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/error_code/operators/less.cc
@@ -29,10 +29,13 @@ int main()
   VERIFY( !(e1 < e1) );
   VERIFY( !(e2 < e2) );
 
-  VERIFY( (e1 < e2) == (e1.value() < e2.value()) );
+  VERIFY( (e1 < e2) == (e1.category() < e2.category()) );
 
   const __gnu_test::test_category cat;
   std::error_code e3(e2.value(), cat);
   VERIFY( !(e3 < e3) );
   VERIFY( (e2 < e3) == (e2.category() < e3.category()) );
+
+  std::error_code e4(std::make_error_code(std::errc::invalid_argument));
+  VERIFY( (e4 < e2) == (e4.value() < e2.value()) );
 }
diff --git 
a/libstdc++-v3/testsuite/19_diagnostics/error_code/operators/not_equal.cc 
b/libstdc++-v3/testsuite/19_diagnostics/error_code/operators/not_equal.cc
index a8dfa505dc2..543ffceb832 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/error_code/operators/not_equal.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/error_code/operators/not_equal.cc
@@ -34,5 +34,6 @@ int main()
   std::error_code e3(e2.value(), cat);
   VERIFY( e2 != e3 );
 
-  return 0;
+  std::error_code e4(std::make_error_code(std::errc::invalid_argument));
+  VERIFY( e4 != e2 );
 }
diff --git 
a/libstdc++-v3/testsuite/19_diagnostics/error_code/operators/three_way.cc 
b/libstdc++-v3/testsuite/19_diagnostics/error_code/operators/three_way.cc
index 448f51d1d73..50c54bea94f 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/error_code/operators/three_way.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/error_code/operators/three_way.cc
@@ -33,7 +33,7 @@ test01()
 
   VERIFY( std::is_neq(e1 <=> e2) );
   VERIFY( std::is_lt(e1 <=> e2) || std::is_gt(e1 <=> e2) );
-  VERIFY( (e1 <=> e2) == (e1.value() <=> e2.value()) );
+  VERIFY( (e1 <=> e2) == (e1.category() <=> e2.category()) );
 
   VERIFY( e1 == e1 );
   VERIFY( !(e1 == e2) );
@@ -52,6 +52,12 @@ test01()
 
   VERIFY( !(e3 < e3) );
   VERIFY( (e2 < e3) == (e2.category() < e3.category()) );
+
+  std::error_code e4(std::make_error_code(std::errc::invalid_argument));
+
+  VERIFY( std::is_neq(e4 <=> e2) );
+  VERIFY( std::is_lt(e4 <=> e2) || std::is_gt(e4 <=> e2) );
+  VERIFY( (e4 <=> e2) == (e4.value() <=> e2.value()) );
 }
 
 int main()
diff --git 
a/libstdc++-v3/testsuite/19_diagnostics/error_condition/operators/less.cc 
b/libstdc++-v3/testsuite/19_diagnostics/error_condition/operators/less.cc
index 96f8b6868af..8a6b71fdc35 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/error_condition/operators/less.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/error_condition/operators/less.cc
@@ -29,6 +29,7 @@ int main()
   VERIFY( 

Re: [PATCH] doc: mention -mprefer-vector-width in target attrs

2021-02-03 Thread Marek Polacek via Gcc-patches
On Wed, Feb 03, 2021 at 04:15:24PM +0100, Martin Liška wrote:
> The patch documents -mprefer-vector-width which is a valid target 
> attribute/pragma.

> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 8daa1c67974..a14875cec37 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -7020,6 +7020,28 @@ On x86 targets, the @code{fentry_section} attribute 
> sets the name
>  of the section to record function entry instrumentation calls in when
>  enabled with @option{-pg -mrecord-mcount}
> +@item prefer-vector-width=@var{OPT}
> +@cindex @code{prefer-vector-width} function attribute, x86
> +On x86 targets, the @code{prefer-vector-width} attribute inform the

"informs"

> +compiler to use @var{OPT}-bit vector width in instructions
> +instead of default on the selected platform.

"the default"

Marek



[PATCH] doc: mention -mprefer-vector-width in target attrs

2021-02-03 Thread Martin Liška

The patch documents -mprefer-vector-width which is a valid target 
attribute/pragma.

Ready for master?
Thanks,
Martin

gcc/ChangeLog:

* doc/extend.texi: Mention -mprefer-vector-width in target
attributes.

gcc/testsuite/ChangeLog:

* gcc.target/i386/prefer-vector-width-attr.c: New test.
---
 gcc/doc/extend.texi   | 22 +++
 .../i386/prefer-vector-width-attr.c   | 11 ++
 2 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/prefer-vector-width-attr.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 8daa1c67974..a14875cec37 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7020,6 +7020,28 @@ On x86 targets, the @code{fentry_section} attribute sets 
the name
 of the section to record function entry instrumentation calls in when
 enabled with @option{-pg -mrecord-mcount}
 
+@item prefer-vector-width=@var{OPT}

+@cindex @code{prefer-vector-width} function attribute, x86
+On x86 targets, the @code{prefer-vector-width} attribute inform the
+compiler to use @var{OPT}-bit vector width in instructions
+instead of default on the selected platform.
+
+Valid @var{OPT} values are:
+
+@table @samp
+@item none
+No extra limitations applied to GCC other than defined by the selected 
platform.
+
+@item 128
+Prefer 128-bit vector width for instructions.
+
+@item 256
+Prefer 256-bit vector width for instructions.
+
+@item 512
+Prefer 512-bit vector width for instructions.
+@end table
+
 @end table
 
 On the x86, the inliner does not inline a

diff --git a/gcc/testsuite/gcc.target/i386/prefer-vector-width-attr.c 
b/gcc/testsuite/gcc.target/i386/prefer-vector-width-attr.c
new file mode 100644
index 000..3929f909a44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/prefer-vector-width-attr.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+
+#pragma GCC push_options
+#pragma GCC target("prefer-vector-width=512")
+
+int
+__attribute__((target("prefer-vector-width=none")))
+main()
+{
+  return 0;
+}
--
2.30.0



Re: [RFC] test builtin ratio for loop distribution

2021-02-03 Thread Alexandre Oliva
On Feb  3, 2021, Richard Biener  wrote:

> So I think we should try to match what __builtin_memcpy/memset
> expansion would do here, taking advantage of extra alignment
> and size knowledge.  In particular,

>  a) if __builtin_memcpy/memset would use setmem/cpymem optabs
>  see if we can have variants of memcpy/memset transferring alignment
>  and size knowledge

We could add more optional parameters to them to convey the length's
known ctz.  However, the ctz can't be recovered reliably.  We can't even
recover it after gimplifying the length within ldist!

That said, my other patch already enables ctz calls to recover it, at
least in libgcc risc-v tfmode cases, and it's possible it's readily
available in other cases.  I'd rather leave that for someone dealing
with the machine-specific patterns to figure out whether a separate
argument would be useful.  RISC-V, which is what I'm dealing with,
doesn't have much to offer as far as these patterns are concerned.

>  b) if expansion would use BY_PIECES then expand to an unrolled loop

Why would that be better than keeping the constant-length memset call,
that would be turned into an unrolled loop during expand?

>  c) if expansion would emit a memset/memcpy call but we know
>  alignment and have a low bound on niters emit a loop (like your patch 
> does)

> a) might be difficult but adding the builtin variants may pay off in any case.

*nod*

> The patch itself could benefit from one or two helpers we already
> have, first of all there's create_empty_loop_on_edge (so you don't
> need the loop fixup)

Uhh, thanks, but...  you realize nearly all of the gimple-building code
is one and the same for the loop and for trailing count misalignment?
There doesn't seem to be a lot of benefit to using this primitive, aside
from its dealing with creating the loop data structure which, again, I'd
only want to do in the loop case.

(I guess I should add more comments as to the inline expansion
 strategy.  it's equivalent to:

 i = len, ptr = base, blksz = 1 << alctz;
 while (i >= blksz) { *(ub*)ptr = val; i -= blksz; ptr += blksz; }
 blksz >>= 1; if (i >= blksz) { *(ub*)ptr = val; i -= blksz; ptr += blksz; }
 blksz >>= 1; if (i >= blksz) { *(ub*)ptr = val; i -= blksz; ptr += blksz; }
 ... until blksz gets down to zero or to 1<<ctz)

> Note that for memmove if we know the dependence direction, we
> can also emit a loop / unrolled code.

*nod*, but the logic would have to be quite different, using bit tests,
and odds are we won't know the direction and have to output a test and
code for both possibilities, which would be quite unlikely to be
beneficial.  Though the original code would quite likely make the
direction visible; perhaps if the size is small enough that we would
expand a memcpy inline, we should refrain from transforming the loop
into a memmove call.

In the case at hand, there's no benefit at all to these transformations:
we start from a loop with the known alignment and a small loop count (0
to 2 words copied), and if everything goes all right with the
transformation, we may be lucky to get back to that.  It's not like the
transformation could even increase the known alignment, so why bother,
and throw away debug info by rewriting the loop into the same code minus
debug?

> I think the builtins with alignment and calloc-style element count
> will be useful on its own.

Oh, I see, you're suggesting actual separate builtin functions.  Uhh...
I'm not sure I want to go there.  I'd much rather recover the ctz of the
length, and use it in existing code.


I'd also prefer if the generic memset (and memcpy and memmove?) builtin
expanders dealt with non-constant lengths in the way I implemented.
That feels like the right spot for it.  That deprives us of gimple loop
optimizations in the inlined loop generated by the current patch, but if
we expand an unrolled loop with compares and offsets with small
constants, loop optimizations might not even be relevant.


FWIW, the patch I posted yesterday is broken, the regstrap test did not
even build libstdc++-v3 successfully.  I'm not sure whether to pursue it
further, or to reimplement it in the expander.  Suggestions are welcome.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


[committed] amdgcn: Add gfx908 support

2021-02-03 Thread Andrew Stubbs
This patch adds a new -march option and multilib configuration to the 
amdgcn GPU target. The patch does not attempt to support any of the new 
features of the gfx908 devices, but does set the correct ELF flags etc. 
that are expected by the ROCm runtime.


The GFX908 devices are not generally available yet, and don't have a 
public name yet, but with this we will be ready when they do.


Andrew
amdgcn: Add gfx908 support

gcc/

	* config/gcn/gcn-opts.h (enum processor_type): Add PROCESSOR_GFX908.
	* config/gcn/gcn.c (gcn_omp_device_kind_arch_isa): Add gfx908.
	(output_file_start): Add gfx908.
	* config/gcn/gcn.opt (gpu_type): Add gfx908.
	* config/gcn/t-gcn-hsa (MULTILIB_OPTIONS): Add march=gfx908.
	(MULTILIB_DIRNAMES): Add gfx908.
	* config/gcn/mkoffload.c (EF_AMDGPU_MACH_AMDGCN_GFX908): New define.
	(main): Recognize gfx908.
	* config/gcn/t-omp-device: Add gfx908.

libgomp/

	* plugin/plugin-gcn.c (EF_AMDGPU_MACH): Add
	EF_AMDGPU_MACH_AMDGCN_GFX908.
	(gcn_gfx908_s): New constant string.
	(isa_hsa_name): Add gfx908.
	(isa_code): Add gfx908.

diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
index ed9b45109ff..ed67d015ff8 100644
--- a/gcc/config/gcn/gcn-opts.h
+++ b/gcc/config/gcn/gcn-opts.h
@@ -22,7 +22,8 @@ enum processor_type
 {
   PROCESSOR_FIJI,// gfx803
   PROCESSOR_VEGA10,  // gfx900
-  PROCESSOR_VEGA20   // gfx906
+  PROCESSOR_VEGA20,  // gfx906
+  PROCESSOR_GFX908   // as yet unnamed
 };
 
 /* Set in gcn_option_override.  */
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 2351b24a4d5..e8bb0b63756 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -2589,6 +2589,8 @@ gcn_omp_device_kind_arch_isa (enum omp_device_kind_arch_isa trait,
 	return gcn_arch == PROCESSOR_VEGA10;
   if (strcmp (name, "gfx906") == 0)
 	return gcn_arch == PROCESSOR_VEGA20;
+  if (strcmp (name, "gfx908") == 0)
+	return gcn_arch == PROCESSOR_GFX908;
   return 0;
 default:
   gcc_unreachable ();
@@ -5030,6 +5032,7 @@ output_file_start (void)
 case PROCESSOR_FIJI: cpu = "gfx803"; break;
 case PROCESSOR_VEGA10: cpu = "gfx900"; break;
 case PROCESSOR_VEGA20: cpu = "gfx906"; break;
+case PROCESSOR_GFX908: cpu = "gfx908+sram-ecc"; break;
 default: gcc_unreachable ();
 }
 
diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt
index 7fd84f83572..767d45826c2 100644
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -34,6 +34,9 @@ Enum(gpu_type) String(gfx900) Value(PROCESSOR_VEGA10)
 EnumValue
 Enum(gpu_type) String(gfx906) Value(PROCESSOR_VEGA20)
 
+EnumValue
+Enum(gpu_type) String(gfx908) Value(PROCESSOR_GFX908)
+
 march=
 Target RejectNegative Joined ToLower Enum(gpu_type) Var(gcn_arch) Init(PROCESSOR_FIJI)
 Specify the name of the target GPU.
diff --git a/gcc/config/gcn/mkoffload.c b/gcc/config/gcn/mkoffload.c
index eb1c717e6e9..dc9d5180a35 100644
--- a/gcc/config/gcn/mkoffload.c
+++ b/gcc/config/gcn/mkoffload.c
@@ -51,6 +51,8 @@
 #define EF_AMDGPU_MACH_AMDGCN_GFX900 0x2c
 #undef  EF_AMDGPU_MACH_AMDGCN_GFX906
 #define EF_AMDGPU_MACH_AMDGCN_GFX906 0x2f
+#undef  EF_AMDGPU_MACH_AMDGCN_GFX908
+#define EF_AMDGPU_MACH_AMDGCN_GFX908 0x230  // Assume SRAM-ECC enabled.
 
 #ifndef R_AMDGPU_NONE
 #define R_AMDGPU_NONE		0
@@ -856,6 +858,8 @@ main (int argc, char **argv)
 	elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX900;
   else if (strcmp (argv[i], "-march=gfx906") == 0)
 	elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX906;
+  else if (strcmp (argv[i], "-march=gfx908") == 0)
+	elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX908;
 }
 
   if (!(fopenacc ^ fopenmp))
diff --git a/gcc/config/gcn/t-gcn-hsa b/gcc/config/gcn/t-gcn-hsa
index bf47da79227..ee4d9b30ff2 100644
--- a/gcc/config/gcn/t-gcn-hsa
+++ b/gcc/config/gcn/t-gcn-hsa
@@ -42,8 +42,8 @@ ALL_HOST_OBJS += gcn-run.o
 gcn-run$(exeext): gcn-run.o
 	+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ $< -ldl
 
-MULTILIB_OPTIONS = march=gfx900/march=gfx906
-MULTILIB_DIRNAMES = gfx900 gfx906
+MULTILIB_OPTIONS = march=gfx900/march=gfx906/march=gfx908
+MULTILIB_DIRNAMES = gfx900 gfx906 gfx908
 
 gcn-tree.o: $(srcdir)/config/gcn/gcn-tree.c
 	$(COMPILE) $<
diff --git a/gcc/config/gcn/t-omp-device b/gcc/config/gcn/t-omp-device
index d9809d5f455..8461c432ca9 100644
--- a/gcc/config/gcn/t-omp-device
+++ b/gcc/config/gcn/t-omp-device
@@ -1,4 +1,4 @@
 omp-device-properties-gcn: $(srcdir)/config/gcn/gcn.c
 	echo kind: gpu > $@
 	echo arch: gcn >> $@
-	echo isa: fiji gfx900 gfx906 >> $@
+	echo isa: fiji gfx900 gfx906 gfx908 >> $@
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 47f0b6e25f8..8e6af69988e 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -403,6 +403,7 @@ typedef enum {
   EF_AMDGPU_MACH_AMDGCN_GFX803 = 0x02a,
   EF_AMDGPU_MACH_AMDGCN_GFX900 = 0x02c,
   EF_AMDGPU_MACH_AMDGCN_GFX906 = 0x02f,
+  EF_AMDGPU_MACH_AMDGCN_GFX908 = 0x030
 } EF_AMDGPU_MACH;
 
 const static int EF_AMDGPU_MACH_MASK = 0x00ff;
@@ -1596,6 +1597,7 @@ elf_gcn_isa_field (Elf64_Ehdr 

RE: [PATCH] aarch64: Use RTL builtins for [su]mlsl_high_lane[q] intrinsics

2021-02-03 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: 03 February 2021 12:43
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH] aarch64: Use RTL builtins for [su]mlsl_high_lane[q]
> intrinsics
> 
> Hi,
> 
> As subject, this patch rewrites [su]mlsl_high_lane[q] Neon intrinsics to use
> RTL builtins rather than inline assembly code, allowing for better scheduling
> and optimization.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu and
> aarch64_be-none-elf - no issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-02-02  Jonathan Wright  
> 
> * config/aarch64/aarch64-simd-builtins.def: Add
> [su]mlsl_hi_lane[q] builtin macro generators.
> * config/aarch64/aarch64-simd.md
> (aarch64_mlsl_hi_lane_insn): Define.
> (aarch64_mlsl_hi_lane): Define.
> (aarch64_mlsl_hi_laneq_insn): Define.
> (aarch64_mlsl_hi_laneq): Define.
> * config/aarch64/arm_neon.h (vmlsl_high_lane_s16): Use RTL
> builtin instead of inline asm.
> (vmlsl_high_lane_s32): Likewise.
> (vmlsl_high_lane_u16): Likewise.
> (vmlsl_high_lane_u32): Likewise.
> (vmlsl_high_laneq_s16): Likewise.
> (vmlsl_high_laneq_s32): Likewise.
> (vmlsl_high_laneq_u16): Likewise.
> (vmlsl_high_laneq_u32): Likewise.
> (vmlal_high_laneq_u32): Likewise.



Re: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528]

2021-02-03 Thread Andre Vieira (lists) via Gcc-patches
Same patch applies cleanly on gcc-8; bootstrapped on
arm-none-linux-gnueabihf and regression tested, also clean.


Can I also commit it to gcc-8?

Thanks,
Andre

On 02/02/2021 17:36, Kyrylo Tkachov wrote:



-Original Message-
From: Andre Vieira (lists) 
Sent: 02 February 2021 17:27
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov ; ja...@redhat.com
Subject: Re: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528]

Hi,

This is a gcc-9 backport of the PR97528 fix that has been applied to
trunk and gcc-10.
Bootstraped on arm-linux-gnueabihf and regression tested.

OK for gcc-9 branch?

Ok.
Thanks,
Kyrill


2021-02-02  Andre Vieira  

      Backport from mainline
      2020-11-20  Jakub Jelinek  

      PR target/97528
      * config/arm/arm.c (neon_vector_mem_operand): For POST_MODIFY,
require
      first POST_MODIFY operand is a REG and is equal to the first operand
      of PLUS.

      * gcc.target/arm/pr97528.c: New test.

On 20/11/2020 11:25, Kyrylo Tkachov via Gcc-patches wrote:

-Original Message-
From: Jakub Jelinek 
Sent: 19 November 2020 18:57
To: Richard Earnshaw ; Ramana
Radhakrishnan ; Kyrylo Tkachov

Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] arm: Fix up neon_vector_mem_operand [PR97528]

Hi!

The documentation for POST_MODIFY says:
 Currently, the compiler can only handle second operands of the
 form (plus (reg) (reg)) and (plus (reg) (const_int)), where
 the first operand of the PLUS has to be the same register as
 the first operand of the *_MODIFY.
The following testcase ICEs, because combine just attempts to simplify
things and ends up with
(post_modify (reg1) (plus (mult (reg2) (const_int 4)) (reg1))
but the target predicates accept it, because they only verify
that POST_MODIFY's second operand is PLUS and the second operand
of the PLUS is a REG.

The following patch fixes this by performing further verification that
the POST_MODIFY is in the form it should be.

Bootstrapped/regtested on armv7hl-linux-gnueabi, ok for trunk
and release branches after a while?

Ok.
Thanks,
Kyrill


2020-11-19  Jakub Jelinek  

PR target/97528
* config/arm/arm.c (neon_vector_mem_operand): For
POST_MODIFY, require
first POST_MODIFY operand is a REG and is equal to the first operand
of PLUS.

* gcc.target/arm/pr97528.c: New test.

--- gcc/config/arm/arm.c.jj 2020-11-13 19:00:46.729620560 +0100
+++ gcc/config/arm/arm.c2020-11-18 17:05:44.656867343 +0100
@@ -13429,7 +13429,9 @@ neon_vector_mem_operand (rtx op, int typ
 /* Allow post-increment by register for VLDn */
 if (type == 2 && GET_CODE (ind) == POST_MODIFY
 && GET_CODE (XEXP (ind, 1)) == PLUS
-  && REG_P (XEXP (XEXP (ind, 1), 1)))
+  && REG_P (XEXP (XEXP (ind, 1), 1))
+  && REG_P (XEXP (ind, 0))
+  && rtx_equal_p (XEXP (ind, 0), XEXP (XEXP (ind, 1), 0)))
return true;

 /* Match:
--- gcc/testsuite/gcc.target/arm/pr97528.c.jj   2020-11-18
17:09:58.195053288 +0100
+++ gcc/testsuite/gcc.target/arm/pr97528.c  2020-11-18
17:09:47.839168237 +0100
@@ -0,0 +1,28 @@
+/* PR target/97528 */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O1" }  */
+/* { dg-add-options arm_neon } */
+
+#include 
+
+typedef __simd64_int16_t T;
+typedef __simd64_uint16_t U;
+unsigned short c;
+int d;
+U e;
+
+void
+foo (void)
+{
+  unsigned short *dst = 
+  int g = d, b = 4;
+  U dc = e;
+  for (int h = 0; h < b; h++)
+{
+  unsigned short *i = dst;
+  U j = dc;
+  vst1_s16 ((int16_t *) i, (T) j);
+  dst += g;
+}
+}


Jakub


RE: [PATCH] aarch64: Use RTL builtins for [su]mlal_high_lane[q] intrinsics

2021-02-03 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: 03 February 2021 12:39
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH] aarch64: Use RTL builtins for [su]mlal_high_lane[q]
> intrinsics
> 
> Hi,
> 
> As subject, this patch rewrites [su]mlal_high_lane[q] Neon intrinsics to use
> RTL builtins rather than inline assembly code, allowing for better scheduling
> and optimization.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu and
> aarch64_be-none-elf - no issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-02-02  Jonathan Wright  
> 
> * config/aarch64/aarch64-simd-builtins.def: Add
> [su]mlal_hi_lane[q] builtin generator macros.
> * config/aarch64/aarch64-simd.md
> (aarch64_mlal_hi_lane_insn): Define.
> (aarch64_mlal_hi_lane): Define.
> (aarch64_mlal_hi_laneq_insn): Define.
> (aarch64_mlal_hi_laneq): Define.
> * config/aarch64/arm_neon.h (vmlal_high_lane_s16): Use RTL
> builtin instead of inline asm.
> (vmlal_high_lane_s32): Likewise.
> (vmlal_high_lane_u16): Likewise.
> (vmlal_high_lane_u32): Likewise.
> (vmlal_high_laneq_s16): Likewise.
> (vmlal_high_laneq_s32): Likewise.
> (vmlal_high_laneq_u16): Likewise.
> (vmlal_high_laneq_u32): Likewise.



RE: [PATCH] aarch64: Use RTL builtins for [su]mlsl_high_n intrinsics

2021-02-03 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: 03 February 2021 12:35
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH] aarch64: Use RTL builtins for [su]mlsl_high_n intrinsics
> 
> Hi,
> 
> As subject, this patch rewrites [su]mlsl_high_n Neon intrinsics to use RTL
> builtins rather than inline assembly code, allowing for better scheduling and
> optimization.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu and
> aarch64_be-none-elf - no issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-01-27  Jonathan Wright  
> 
> * config/aarch64/aarch64-simd-builtins.def: Add [su]mlsl_hi_n
> builtin generator macros.
> * config/aarch64/aarch64-simd.md (aarch64_mlsl_hi_n_insn):
> Define.
> (aarch64_mlsl_hi_n): Define.
> * config/aarch64/arm_neon.h (vmlsl_high_n_s16): Use RTL builtin
> instead of inline asm.
> (vmlsl_high_n_s32): Likewise.
> (vmlsl_high_n_u16): Likewise.
> (vmlsl_high_n_u32): Likewise.



Re: [PATCH] c++: ICE with late parsing of noexcept in nested class [PR98899]

2021-02-03 Thread Jason Merrill via Gcc-patches

On 2/2/21 5:09 PM, Marek Polacek wrote:

Here we crash with a noexcept-specifier in a nested template class,
because my handling of such deferred-parse noexcept-specifiers was
gronked when we need to instantiate a DEFERRED_PARSE before it was
actually parsed at the end of the outermost class.

In

   struct S {
 template struct B {
   B() noexcept(noexcept(x));
   int x;
 };
 struct A : B {
   A() : B() {}
 };
   };

we call complete_type for B which triggers tsubsting S::B::B()
whose noexcept-specifier still contains a DEFERRED_PARSE.  The trick is
to stash such noexcept-specifiers into DEFPARSE_INSTANTIATIONS so that
we can replace it later when we've finally parsed all deferred
noexcept-specifiers.

In passing, fix missing usage of UNPARSED_NOEXCEPT_SPEC_P.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?


OK.


gcc/cp/ChangeLog:

PR c++/98899
* parser.c (cp_parser_class_specifier_1): Use any possible
DEFPARSE_INSTANTIATIONS to update DEFERRED_NOEXCEPT_PATTERN.
(cp_parser_save_noexcept): Initialize DEFPARSE_INSTANTIATIONS.
* pt.c (tsubst_exception_specification): Stash new_specs into
DEFPARSE_INSTANTIATIONS.
* tree.c (fixup_deferred_exception_variants): Use
UNPARSED_NOEXCEPT_SPEC_P.

gcc/testsuite/ChangeLog:

PR c++/98899
* g++.dg/cpp0x/noexcept65.C: New test.
---
  gcc/cp/parser.c | 13 ++---
  gcc/cp/pt.c | 16 +++
  gcc/cp/tree.c   |  3 +--
  gcc/testsuite/g++.dg/cpp0x/noexcept65.C | 35 +
  4 files changed, 62 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept65.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index abadaf972d6..5da8670f0e2 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -25026,8 +25026,8 @@ cp_parser_class_specifier_1 (cp_parser* parser)
  pushed_scope = push_scope (class_type);
}
  
-	  tree spec = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (decl));

- spec = TREE_PURPOSE (spec);
+ tree def_parse = TYPE_RAISES_EXCEPTIONS (TREE_TYPE (decl));
+ def_parse = TREE_PURPOSE (def_parse);
  
  	  /* Make sure that any template parameters are in scope.  */

  maybe_begin_member_template_processing (decl);
@@ -25044,7 +25044,7 @@ cp_parser_class_specifier_1 (cp_parser* parser)
parser->local_variables_forbidden_p |= THIS_FORBIDDEN;
  
  	  /* Now we can parse the noexcept-specifier.  */

- spec = cp_parser_late_noexcept_specifier (parser, spec);
+ tree spec = cp_parser_late_noexcept_specifier (parser, def_parse);
  
  	  if (spec == error_mark_node)

spec = NULL_TREE;
@@ -25052,6 +25052,12 @@ cp_parser_class_specifier_1 (cp_parser* parser)
  /* Update the fn's type directly -- it might have escaped
 beyond this decl :(  */
  fixup_deferred_exception_variants (TREE_TYPE (decl), spec);
+ /* Update any instantiations we've already created.  We must
+keep the new noexcept-specifier wrapped in a DEFERRED_NOEXCEPT
+so that maybe_instantiate_noexcept can tsubst the NOEXCEPT_EXPR
+in the pattern.  */
+ for (tree i : DEFPARSE_INSTANTIATIONS (def_parse))
+   DEFERRED_NOEXCEPT_PATTERN (TREE_PURPOSE (i)) = TREE_PURPOSE (spec);
  
  	  /* Restore the state of local_variables_forbidden_p.  */

  parser->local_variables_forbidden_p = local_variables_forbidden_p;
@@ -26695,6 +26701,7 @@ cp_parser_save_noexcept (cp_parser *parser)
/* Save away the noexcept-specifier; we will process it when the
   class is complete.  */
DEFPARSE_TOKENS (expr) = cp_token_cache_new (first, last);
+  DEFPARSE_INSTANTIATIONS (expr) = nullptr;
expr = build_tree_list (expr, NULL_TREE);
return expr;
  }
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index aa1687a9f2a..4781519d00f 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15189,6 +15189,22 @@ tsubst_exception_specification (tree fntype,
 /*integral_constant_expression_p=*/true);
}
new_specs = build_noexcept_spec (new_specs, complain);
+  /* We've instantiated a template before a noexcept-specifier
+contained therein has been parsed.  This can happen for
+a nested template class:
+
+ struct S {
+   template struct B { B() noexcept(...); };
+   struct A : B { ... use B() ... };
+ };
+
+where completing B will trigger instantiating the
+noexcept, even though we only parse it at the end of S.  */
+  if (UNPARSED_NOEXCEPT_SPEC_P (specs))
+   {
+ gcc_checking_assert (defer_ok);
+ vec_safe_push (DEFPARSE_INSTANTIATIONS (expr), new_specs);
+   }
  }
else if (specs)
  {
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 2e5a1f198e8..e6ced274959 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c

Re: c++: cross-module __cxa_atexit use [PR 98531]

2021-02-03 Thread Nathan Sidwell

[Dropping Richard, as I don't think his further input is necessary]

Hi Rainer,
here are two patches, to be applied in sequence (replacing earlier 
versions of the patch).


The first is merely a testsuite correction of the earlier patch to 
inhibit the cxa_atexit-specific tests where it matters.  (I don't think 
it matters in the ABI ones, if the library does not provide them, we 
won't hit a declaration that could differ)


The second fixes the atexit issue (and array entity with cxa_atexit) you 
encountered on solaris 11.3.


With these two patches I could build the xtreme-header-2_a.ii you 
provided on an i386-solaris2.11 xcompiler (I can only compile there; I 
don't have a complete runtime to run the tests).


Please let me know how these behave, thanks.

patch 3a:
c++: cross-module __cxa_atexit use [PR 98531]

The compiler's use of lazily-declared library functions must insert
said functions into a symbol table, so that they can be correctly
merged across TUs at the module-level.  We have too many different
ways of declaring such library functions.  This fixes __cxa_atexit (or
its system-specific variations), pushing (or merging) the decl into
the appropriate namespace.  Because we're pushing a lazy builtin,
check_redeclaration_exception_specification needed a tweak to allow a
such a builtin's eh spec to differ from what the user may have already
declared. (I suspect not all headers declare atexit as noexcept.)

We can't test the -fno-use-cxa-atexit path with modules, as that
requires a followup patch to a closely related piece (which also
affects cxa_atexit targets in other circumstances).

PR c++/98531
gcc/cp/
* cp-tree.h (push_abi_namespace, pop_abi_namespace): Declare.
* decl.c (push_abi_namespace, pop_abi_namespace): Moved
from rtti.c, add default namespace arg.
(check_redeclaration_exception_specification): Allow a lazy
builtin's eh spec to differ from an already-declared user
declaration.
(declare_global_var): Use push/pop_abi_namespace.
(get_atexit_node): Push the fndecl into a namespace.
* rtti.c (push_abi_namespace, pop_abi_namespace): Moved to
decl.c.
gcc/testsuite/
* g++.dg/modules/pr98531-1.h: New.
* g++.dg/modules/pr98531-1_a.H: New.
* g++.dg/modules/pr98531-1_b.C: New.
* g++.dg/abi/pr98531-1.C: New.
* g++.dg/abi/pr98531-2.C: New.
* g++.dg/abi/pr98531-3.C: New.
* g++.dg/abi/pr98531-4.C: New.


patch 3b:
c++: cleanup function name [PR 98531]

The next piece of 98531 is that in some cases we need to create a
cleanup function to do the work (when the object is an array, or we're
using regular atexit).  We were not pushing that function's decl
anywhere (not giving it a context) so streaming it failed.

This is a partial fix.  You'll notice we're naming these from a per-TU
counter.  I've captured that in PR98893.

gcc/cp/
* decl.c (start_cleanup_fn): Push function into
namespace.
gcc/testsuite/
* g++.dg/modules/pr98531-2.h: New.
* g++.dg/modules/pr98531-2_a.H: New.
* g++.dg/modules/pr98531-2_b.C: New.
* g++.dg/modules/pr98531-3.h: New.
* g++.dg/modules/pr98531-3_a.H: New.
* g++.dg/modules/pr98531-3_b.C: New.

--
Nathan Sidwell
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index f31319904eb..bb06b6e44a9 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -193,7 +193,9 @@ enum cp_tree_index
 
 CPTI_MODULE_HWM,
 /* Nodes after here change during compilation, or should not be in
-   the module's global tree table.  */
+   the module's global tree table.  Such nodes must be locatable
+   via name lookup or type-construction, as those are the only
+   cross-TU matching capabilities remaining.  */
 
 /* We must find these via the global namespace.  */
 CPTI_STD,
@@ -6622,6 +6624,9 @@ extern tree make_typename_type			(tree, tree, enum tag_types, tsubst_flags_t);
 extern tree build_typename_type			(tree, tree, tree, tag_types);
 extern tree make_unbound_class_template		(tree, tree, tree, tsubst_flags_t);
 extern tree make_unbound_class_template_raw	(tree, tree, tree);
+extern unsigned push_abi_namespace		(tree node = abi_node);
+extern void pop_abi_namespace			(unsigned flags,
+		 tree node = abi_node);
 extern tree build_library_fn_ptr		(const char *, tree, int);
 extern tree build_cp_library_fn_ptr		(const char *, tree, int);
 extern tree push_library_fn			(tree, tree, tree, int);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 1a114a2e2d0..edabeb989b3 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -1209,7 +1209,8 @@ check_redeclaration_exception_specification (tree new_decl,
  all declarations, including the definition and an explicit
  specialization, of that function shall have an
  

[PATCH] aarch64: Use RTL builtins for [su]mlsl_high_lane[q] intrinsics

2021-02-03 Thread Jonathan Wright via Gcc-patches
Hi,

As subject, this patch rewrites [su]mlsl_high_lane[q] Neon intrinsics to use
RTL builtins rather than inline assembly code, allowing for better scheduling
and optimization.

Regression tested and bootstrapped on aarch64-none-linux-gnu and
aarch64_be-none-elf - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-02-02  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Add
[su]mlsl_hi_lane[q] builtin macro generators.
* config/aarch64/aarch64-simd.md
(aarch64_mlsl_hi_lane_insn): Define.
(aarch64_mlsl_hi_lane): Define.
(aarch64_mlsl_hi_laneq_insn): Define.
(aarch64_mlsl_hi_laneq): Define.
* config/aarch64/arm_neon.h (vmlsl_high_lane_s16): Use RTL
builtin instead of inline asm.
(vmlsl_high_lane_s32): Likewise.
(vmlsl_high_lane_u16): Likewise.
(vmlsl_high_lane_u32): Likewise.
(vmlsl_high_laneq_s16): Likewise.
(vmlsl_high_laneq_s32): Likewise.
(vmlsl_high_laneq_u16): Likewise.
(vmlsl_high_laneq_u32): Likewise.

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 152c2e6d361bdab0275e3b38759723fd2a3ffee5..76ab021725900b249ecabf3f8df2167169a263e9 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -305,6 +305,11 @@
   BUILTIN_VQ_HSI (QUADOPU_LANE, umlal_hi_lane, 0, NONE)
   BUILTIN_VQ_HSI (QUADOPU_LANE, umlal_hi_laneq, 0, NONE)
 
+  BUILTIN_VQ_HSI (QUADOP_LANE, smlsl_hi_lane, 0, NONE)
+  BUILTIN_VQ_HSI (QUADOP_LANE, smlsl_hi_laneq, 0, NONE)
+  BUILTIN_VQ_HSI (QUADOPU_LANE, umlsl_hi_lane, 0, NONE)
+  BUILTIN_VQ_HSI (QUADOPU_LANE, umlsl_hi_laneq, 0, NONE)
+
   BUILTIN_VSD_HSI (BINOP, sqdmull, 0, NONE)
   BUILTIN_VSD_HSI (TERNOP_LANE, sqdmull_lane, 0, NONE)
   BUILTIN_VSD_HSI (TERNOP_LANE, sqdmull_laneq, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index e741b656cb081e26b9e6e262ae50fab3716e1ed4..2e347b92b79cb0c1dfef710034602d8f61f62173 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2270,6 +2270,78 @@
   [(set_attr "type" "neon_mla__scalar_long")]
 )
 
+(define_insn "aarch64_mlsl_hi_lane_insn"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(minus:
+	  (match_operand: 1 "register_operand" "0")
+	  (mult:
+	(ANY_EXTEND: (vec_select:
+	  (match_operand:VQ_HSI 2 "register_operand" "w")
+	  (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" "")))
+	(ANY_EXTEND: (vec_duplicate:
+	  (vec_select:
+		(match_operand: 4 "register_operand" "")
+		(parallel [(match_operand:SI 5 "immediate_operand" "i")]
+	  )))]
+  "TARGET_SIMD"
+  {
+operands[5] = aarch64_endian_lane_rtx (mode, INTVAL (operands[5]));
+return "mlsl2\\t%0., %2., %4.[%5]";
+  }
+  [(set_attr "type" "neon_mla__scalar_long")]
+)
+
+(define_expand "aarch64_mlsl_hi_lane"
+  [(match_operand: 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (ANY_EXTEND:(match_operand:VQ_HSI 2 "register_operand"))
+   (match_operand: 3 "register_operand")
+   (match_operand:SI 4 "immediate_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_mlsl_hi_lane_insn (operands[0],
+	 operands[1], operands[2], p, operands[3], operands[4]));
+  DONE;
+}
+)
+
+(define_insn "aarch64_mlsl_hi_laneq_insn"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(minus:
+	  (match_operand: 1 "register_operand" "0")
+	  (mult:
+	(ANY_EXTEND: (vec_select:
+	  (match_operand:VQ_HSI 2 "register_operand" "w")
+	  (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" "")))
+	(ANY_EXTEND: (vec_duplicate:
+	  (vec_select:
+		(match_operand: 4 "register_operand" "")
+		(parallel [(match_operand:SI 5 "immediate_operand" "i")]
+	  )))]
+  "TARGET_SIMD"
+  {
+operands[5] = aarch64_endian_lane_rtx (mode, INTVAL (operands[5]));
+return "mlsl2\\t%0., %2., %4.[%5]";
+  }
+  [(set_attr "type" "neon_mla__scalar_long")]
+)
+
+(define_expand "aarch64_mlsl_hi_laneq"
+  [(match_operand: 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (ANY_EXTEND:(match_operand:VQ_HSI 2 "register_operand"))
+   (match_operand: 3 "register_operand")
+   (match_operand:SI 4 "immediate_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_mlsl_hi_laneq_insn (operands[0],
+	 operands[1], operands[2], p, operands[3], operands[4]));
+  DONE;
+}
+)
+
 ;; FP vector operations.
 ;; AArch64 AdvSIMD supports single-precision (32-bit) and 
 ;; double-precision (64-bit) floating-point data types and arithmetic as
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index ee68240f0d019a4a3be89e1e923cb14ee8026468..7b99e16b53cae27f6d2e7a29985cb4963d74739e 100644
--- a/gcc/config/aarch64/arm_neon.h

[PATCH] aarch64: Use RTL builtins for [su]mlal_high_lane[q] intrinsics

2021-02-03 Thread Jonathan Wright via Gcc-patches
Hi,

As subject, this patch rewrites [su]mlal_high_lane[q] Neon intrinsics to use
RTL builtins rather than inline assembly code, allowing for better scheduling
and optimization.

Regression tested and bootstrapped on aarch64-none-linux-gnu and
aarch64_be-none-elf - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-02-02  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Add
[su]mlal_hi_lane[q] builtin generator macros.
* config/aarch64/aarch64-simd.md
(aarch64_mlal_hi_lane_insn): Define.
(aarch64_mlal_hi_lane): Define.
(aarch64_mlal_hi_laneq_insn): Define.
(aarch64_mlal_hi_laneq): Define.
* config/aarch64/arm_neon.h (vmlal_high_lane_s16): Use RTL
builtin instead of inline asm.
(vmlal_high_lane_s32): Likewise.
(vmlal_high_lane_u16): Likewise.
(vmlal_high_lane_u32): Likewise.
(vmlal_high_laneq_s16): Likewise.
(vmlal_high_laneq_s32): Likewise.
(vmlal_high_laneq_u16): Likewise.
(vmlal_high_laneq_u32): Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 336f9f9a56b07668678e5b384a89f518433da58b..152c2e6d361bdab0275e3b38759723fd2a3ffee5 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -300,6 +300,11 @@
   BUILTIN_VD_HSI (QUADOPU_LANE, vec_umlsl_lane_, 0, NONE)
   BUILTIN_VD_HSI (QUADOPU_LANE, vec_umlsl_laneq_, 0, NONE)
 
+  BUILTIN_VQ_HSI (QUADOP_LANE, smlal_hi_lane, 0, NONE)
+  BUILTIN_VQ_HSI (QUADOP_LANE, smlal_hi_laneq, 0, NONE)
+  BUILTIN_VQ_HSI (QUADOPU_LANE, umlal_hi_lane, 0, NONE)
+  BUILTIN_VQ_HSI (QUADOPU_LANE, umlal_hi_laneq, 0, NONE)
+
   BUILTIN_VSD_HSI (BINOP, sqdmull, 0, NONE)
   BUILTIN_VSD_HSI (TERNOP_LANE, sqdmull_lane, 0, NONE)
   BUILTIN_VSD_HSI (TERNOP_LANE, sqdmull_laneq, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 1e9b4d933f3f9385d857b497e573de6aee25c57f..e741b656cb081e26b9e6e262ae50fab3716e1ed4 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2180,6 +2180,76 @@
   [(set_attr "type" "neon_mla__scalar_long")]
 )
 
+(define_insn "aarch64_mlal_hi_lane_insn"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(plus:
+	  (mult:
+	(ANY_EXTEND: (vec_select:
+	  (match_operand:VQ_HSI 2 "register_operand" "w")
+	  (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" "")))
+	(ANY_EXTEND: (vec_duplicate:
+	  (vec_select:
+		(match_operand: 4 "register_operand" "")
+		(parallel [(match_operand:SI 5 "immediate_operand" "i")])
+	  (match_operand: 1 "register_operand" "0")))]
+  "TARGET_SIMD"
+  {
+operands[5] = aarch64_endian_lane_rtx (mode, INTVAL (operands[5]));
+return "mlal2\\t%0., %2., %4.[%5]";
+  }
+  [(set_attr "type" "neon_mla__scalar_long")]
+)
+
+(define_expand "aarch64_mlal_hi_lane"
+  [(match_operand: 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (ANY_EXTEND:(match_operand:VQ_HSI 2 "register_operand"))
+   (match_operand: 3 "register_operand")
+   (match_operand:SI 4 "immediate_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_mlal_hi_lane_insn (operands[0],
+	 operands[1], operands[2], p, operands[3], operands[4]));
+  DONE;
+}
+)
+
+(define_insn "aarch64_mlal_hi_laneq_insn"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(plus:
+	  (mult:
+	(ANY_EXTEND: (vec_select:
+	  (match_operand:VQ_HSI 2 "register_operand" "w")
+	  (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" "")))
+	(ANY_EXTEND: (vec_duplicate:
+	  (vec_select:
+		(match_operand: 4 "register_operand" "")
+		(parallel [(match_operand:SI 5 "immediate_operand" "i")])
+	  (match_operand: 1 "register_operand" "0")))]
+  "TARGET_SIMD"
+  {
+operands[5] = aarch64_endian_lane_rtx (mode, INTVAL (operands[5]));
+return "mlal2\\t%0., %2., %4.[%5]";
+  }
+  [(set_attr "type" "neon_mla__scalar_long")]
+)
+
+(define_expand "aarch64_mlal_hi_laneq"
+  [(match_operand: 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (ANY_EXTEND:(match_operand:VQ_HSI 2 "register_operand"))
+   (match_operand: 3 "register_operand")
+   (match_operand:SI 4 "immediate_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_mlal_hi_laneq_insn (operands[0],
+	 operands[1], operands[2], p, operands[3], operands[4]));
+  DONE;
+}
+)
+
 (define_insn "aarch64_vec_mlsl_lane"
   [(set (match_operand: 0 "register_operand" "=w")
(minus:
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 7e2c2fc3827e773b960abc137b2cadea61a54577..ee68240f0d019a4a3be89e1e923cb14ee8026468 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -7182,117 +7182,69 @@ vmla_u32 (uint32x2_t __a, uint32x2_t __b, 

[PATCH] aarch64: Use RTL builtins for [su]mlsl_high_n intrinsics

2021-02-03 Thread Jonathan Wright via Gcc-patches
Hi,

As subject, this patch rewrites [su]mlsl_high_n Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling and
optimization.

Regression tested and bootstrapped on aarch64-none-linux-gnu and
aarch64_be-none-elf - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-01-27  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Add [su]mlsl_hi_n
builtin generator macros.
* config/aarch64/aarch64-simd.md (aarch64_mlsl_hi_n_insn):
Define.
(aarch64_mlsl_hi_n): Define.
* config/aarch64/arm_neon.h (vmlsl_high_n_s16): Use RTL builtin
instead of inline asm.
(vmlsl_high_n_s32): Likewise.
(vmlsl_high_n_u16): Likewise.
(vmlsl_high_n_u32): Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index c102289c26123ae913df87d327237647d2621655..336f9f9a56b07668678e5b384a89f518433da58b 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -230,6 +230,10 @@
   BUILTIN_VQW (TERNOP, smlsl_hi, 0, NONE)
   BUILTIN_VQW (TERNOPU, umlsl_hi, 0, NONE)
 
+  /* Implemented by aarch64_mlsl_hi_n.  */
+  BUILTIN_VQ_HSI (TERNOP, smlsl_hi_n, 0, NONE)
+  BUILTIN_VQ_HSI (TERNOPU, umlsl_hi_n, 0, NONE)
+
   /* Implemented by aarch64_mlal_hi.  */
   BUILTIN_VQW (TERNOP, smlal_hi, 0, NONE)
   BUILTIN_VQW (TERNOPU, umlal_hi, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index a883f6ad4de8bb6d0c5f6478df5c516c159df4bb..1e9b4d933f3f9385d857b497e573de6aee25c57f 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1974,6 +1974,35 @@
 }
 )
 
+(define_insn "aarch64_mlsl_hi_n_insn"
+  [(set (match_operand: 0 "register_operand" "=w")
+(minus:
+  (match_operand: 1 "register_operand" "0")
+  (mult:
+(ANY_EXTEND: (vec_select:
+  (match_operand:VQ_HSI 2 "register_operand" "w")
+  (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" "")))
+(ANY_EXTEND: (vec_duplicate:
+	(match_operand: 4 "register_operand" ""))]
+  "TARGET_SIMD"
+  "mlsl2\t%0., %2., %4.[0]"
+  [(set_attr "type" "neon_mla__long")]
+)
+
+(define_expand "aarch64_mlsl_hi_n"
+  [(match_operand: 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (ANY_EXTEND:(match_operand:VQ_HSI 2 "register_operand"))
+   (match_operand: 3 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_mlsl_hi_n_insn (operands[0],
+ operands[1], operands[2], p, operands[3]));
+  DONE;
+}
+)
+
 (define_insn "aarch64_mlal"
   [(set (match_operand: 0 "register_operand" "=w")
 (plus:
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index ae8526d5972067c05265a1f0bcf9fde5e347fb3b..7e2c2fc3827e773b960abc137b2cadea61a54577 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -7792,48 +7792,28 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlsl_high_n_s16 (int32x4_t __a, int16x8_t __b, int16_t __c)
 {
-  int32x4_t __result;
-  __asm__ ("smlsl2 %0.4s, %2.8h, %3.h[0]"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "x"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_smlsl_hi_nv8hi (__a, __b, __c);
 }
 
 __extension__ extern __inline int64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlsl_high_n_s32 (int64x2_t __a, int32x4_t __b, int32_t __c)
 {
-  int64x2_t __result;
-  __asm__ ("smlsl2 %0.2d, %2.4s, %3.s[0]"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_smlsl_hi_nv4si (__a, __b, __c);
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlsl_high_n_u16 (uint32x4_t __a, uint16x8_t __b, uint16_t __c)
 {
-  uint32x4_t __result;
-  __asm__ ("umlsl2 %0.4s, %2.8h, %3.h[0]"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "x"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_umlsl_hi_nv8hi_ (__a, __b, __c);
 }
 
 __extension__ extern __inline uint64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlsl_high_n_u32 (uint64x2_t __a, uint32x4_t __b, uint32_t __c)
 {
-  uint64x2_t __result;
-  __asm__ ("umlsl2 %0.2d, %2.4s, %3.s[0]"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_umlsl_hi_nv4si_ (__a, __b, __c);
 }
 
 __extension__ extern __inline int16x8_t


RE: [PATCH] aarch64: Use RTL builtins for [su]mlal_high_n intrinsics

2021-02-03 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: 03 February 2021 12:29
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH] aarch64: Use RTL builtins for [su]mlal_high_n intrinsics
> 
> Hi,
> 
> As subject, this patch rewrites [su]mlal_high_n Neon intrinsics to use RTL
> builtins rather than inline assembly code, allowing for better scheduling and
> optimization.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu and
> aarch64_be-none-elf - no issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> --
> 
> gcc/ChangeLog:
> 
> 2021-01-27  Jonathan Wright  
> 
> * config/aarch64/aarch64-simd-builtins.def: Add [su]mlal_hi_n
> builtin generator macros.
> * config/aarch64/aarch64-simd.md (aarch64_mlal_hi_n_insn):
> Define.
> (aarch64_mlal_hi_n): Define.
> * config/aarch64/arm_neon.h (vmlal_high_n_s16): Use RTL builtin
> instead of inline asm.
> (vmlal_high_n_s32): Likewise.
> (vmlal_high_n_u16): Likewise.
> (vmlal_high_n_u32): Likewise.



RE: [PATCH] aarch64: Use RTL builtins for [su]mlal_high intrinsics

2021-02-03 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Jonathan Wright 
> Sent: 03 February 2021 12:20
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov 
> Subject: [PATCH] aarch64: Use RTL builtins for [su]mlal_high intrinsics
> 
> Hi,
> 
> As subject, this patch rewrites [su]mlal_high Neon intrinsics to use RTL
> builtins rather than inline assembly code, allowing for better scheduling
> and optimization.
> 
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Thanks,
> Jonathan
> 
> ---
> 
> gcc/ChangeLog:
> 
> 2021-01-27  Jonathan Wright  
> 
> * config/aarch64/aarch64-simd-builtins.def: Add RTL builtin
> generator macros.
> * config/aarch64/aarch64-simd.md (*aarch64_mlal_hi):
> Rename to...
> (aarch64_mlal_hi_insn): This.
> (aarch64_mlal_hi): Define.
> * config/aarch64/arm_neon.h (vmlal_high_s8): Use RTL builtin
> instead of inline asm.
> (vmlal_high_s16): Likewise.
> (vmlal_high_s32): Likewise.
> (vmlal_high_u8): Likewise.
> (vmlal_high_u16): Likewise.
> (vmlal_high_u32): Likewise.



[PATCH] aarch64: Use RTL builtins for [su]mlal_high_n intrinsics

2021-02-03 Thread Jonathan Wright via Gcc-patches
Hi,

As subject, this patch rewrites [su]mlal_high_n Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling and
optimization.

Regression tested and bootstrapped on aarch64-none-linux-gnu and
aarch64_be-none-elf - no issues.

Ok for master?

Thanks,
Jonathan

--

gcc/ChangeLog:

2021-01-27  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Add [su]mlal_hi_n
builtin generator macros.
* config/aarch64/aarch64-simd.md (aarch64_mlal_hi_n_insn):
Define.
(aarch64_mlal_hi_n): Define.
* config/aarch64/arm_neon.h (vmlal_high_n_s16): Use RTL builtin
instead of inline asm.
(vmlal_high_n_s32): Likewise.
(vmlal_high_n_u16): Likewise.
(vmlal_high_n_u32): Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 2d91a0768d66fb8570ce518c06faae28c0ffcf27..c102289c26123ae913df87d327237647d2621655 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -234,6 +234,10 @@
   BUILTIN_VQW (TERNOP, smlal_hi, 0, NONE)
   BUILTIN_VQW (TERNOPU, umlal_hi, 0, NONE)
 
+  /* Implemented by aarch64_mlal_hi_n.  */
+  BUILTIN_VQ_HSI (TERNOP, smlal_hi_n, 0, NONE)
+  BUILTIN_VQ_HSI (TERNOPU, umlal_hi_n, 0, NONE)
+
   BUILTIN_VSQN_HSDI (UNOPUS, sqmovun, 0, NONE)
   /* Implemented by aarch64_qmovn.  */
   BUILTIN_VSQN_HSDI (UNOP, sqmovn, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index ff5037fb44ebb4d1d37ab838de6391e105e90bbf..a883f6ad4de8bb6d0c5f6478df5c516c159df4bb 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1899,6 +1899,35 @@
 }
 )
 
+(define_insn "aarch64_mlal_hi_n_insn"
+  [(set (match_operand: 0 "register_operand" "=w")
+(plus:
+  (mult:
+  (ANY_EXTEND: (vec_select:
+ (match_operand:VQ_HSI 2 "register_operand" "w")
+ (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" "")))
+  (ANY_EXTEND: (vec_duplicate:
+	   (match_operand: 4 "register_operand" ""
+  (match_operand: 1 "register_operand" "0")))]
+  "TARGET_SIMD"
+  "mlal2\t%0., %2., %4.[0]"
+  [(set_attr "type" "neon_mla__long")]
+)
+
+(define_expand "aarch64_mlal_hi_n"
+  [(match_operand: 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (ANY_EXTEND:(match_operand:VQ_HSI 2 "register_operand"))
+   (match_operand: 3 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_mlal_hi_n_insn (operands[0],
+ operands[1], operands[2], p, operands[3]));
+  DONE;
+}
+)
+
 (define_insn "*aarch64_mlsl_lo"
   [(set (match_operand: 0 "register_operand" "=w")
 (minus:
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 53aae934c37eadc179bb1d4e7fe033d06364628a..ae8526d5972067c05265a1f0bcf9fde5e347fb3b 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -7298,48 +7298,28 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlal_high_n_s16 (int32x4_t __a, int16x8_t __b, int16_t __c)
 {
-  int32x4_t __result;
-  __asm__ ("smlal2 %0.4s,%2.8h,%3.h[0]"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "x"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_smlal_hi_nv8hi (__a, __b, __c);
 }
 
 __extension__ extern __inline int64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlal_high_n_s32 (int64x2_t __a, int32x4_t __b, int32_t __c)
 {
-  int64x2_t __result;
-  __asm__ ("smlal2 %0.2d,%2.4s,%3.s[0]"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_smlal_hi_nv4si (__a, __b, __c);
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlal_high_n_u16 (uint32x4_t __a, uint16x8_t __b, uint16_t __c)
 {
-  uint32x4_t __result;
-  __asm__ ("umlal2 %0.4s,%2.8h,%3.h[0]"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "x"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_umlal_hi_nv8hi_ (__a, __b, __c);
 }
 
 __extension__ extern __inline uint64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlal_high_n_u32 (uint64x2_t __a, uint32x4_t __b, uint32_t __c)
 {
-  uint64x2_t __result;
-  __asm__ ("umlal2 %0.2d,%2.4s,%3.s[0]"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_umlal_hi_nv4si_ (__a, __b, __c);
 }
 
 __extension__ extern __inline int16x8_t


[PATCH] aarch64: Use RTL builtins for [su]mlal_high intrinsics

2021-02-03 Thread Jonathan Wright via Gcc-patches
Hi,

As subject, this patch rewrites [su]mlal_high Neon intrinsics to use RTL
builtins rather than inline assembly code, allowing for better scheduling
and optimization.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-01-27  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Add RTL builtin
generator macros.
* config/aarch64/aarch64-simd.md (*aarch64_mlal_hi):
Rename to...
(aarch64_mlal_hi_insn): This.
(aarch64_mlal_hi): Define.
* config/aarch64/arm_neon.h (vmlal_high_s8): Use RTL builtin
instead of inline asm.
(vmlal_high_s16): Likewise.
(vmlal_high_s32): Likewise.
(vmlal_high_u8): Likewise.
(vmlal_high_u16): Likewise.
(vmlal_high_u32): Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index b82b6431d6f2a8d7d21023da589f3eecec7f0d65..2d91a0768d66fb8570ce518c06faae28c0ffcf27 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -230,6 +230,10 @@
   BUILTIN_VQW (TERNOP, smlsl_hi, 0, NONE)
   BUILTIN_VQW (TERNOPU, umlsl_hi, 0, NONE)
 
+  /* Implemented by aarch64_mlal_hi.  */
+  BUILTIN_VQW (TERNOP, smlal_hi, 0, NONE)
+  BUILTIN_VQW (TERNOPU, umlal_hi, 0, NONE)
+
   BUILTIN_VSQN_HSDI (UNOPUS, sqmovun, 0, NONE)
   /* Implemented by aarch64_qmovn.  */
   BUILTIN_VSQN_HSDI (UNOP, sqmovn, 0, NONE)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index d1858663a4e78c0861d902b37e93c0b00d75e661..ff5037fb44ebb4d1d37ab838de6391e105e90bbf 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1869,7 +1869,7 @@
   [(set_attr "type" "neon_mla__long")]
 )
 
-(define_insn "*aarch64_mlal_hi"
+(define_insn "aarch64_mlal_hi_insn"
   [(set (match_operand: 0 "register_operand" "=w")
 (plus:
   (mult:
@@ -1885,6 +1885,20 @@
   [(set_attr "type" "neon_mla__long")]
 )
 
+(define_expand "aarch64_mlal_hi"
+  [(match_operand: 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (ANY_EXTEND:(match_operand:VQW 2 "register_operand"))
+   (match_operand:VQW 3 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
+  emit_insn (gen_aarch64_mlal_hi_insn (operands[0], operands[1],
+		 operands[2], p, operands[3]));
+  DONE;
+}
+)
+
 (define_insn "*aarch64_mlsl_lo"
   [(set (match_operand: 0 "register_operand" "=w")
 (minus:
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index ad0dfef80f39c1baf1e8c7c1bb95f325eff6ac7a..53aae934c37eadc179bb1d4e7fe033d06364628a 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -7346,72 +7346,42 @@ __extension__ extern __inline int16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlal_high_s8 (int16x8_t __a, int8x16_t __b, int8x16_t __c)
 {
-  int16x8_t __result;
-  __asm__ ("smlal2 %0.8h,%2.16b,%3.16b"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_smlal_hiv16qi (__a, __b, __c);
 }
 
 __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlal_high_s16 (int32x4_t __a, int16x8_t __b, int16x8_t __c)
 {
-  int32x4_t __result;
-  __asm__ ("smlal2 %0.4s,%2.8h,%3.8h"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_smlal_hiv8hi (__a, __b, __c);
 }
 
 __extension__ extern __inline int64x2_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlal_high_s32 (int64x2_t __a, int32x4_t __b, int32x4_t __c)
 {
-  int64x2_t __result;
-  __asm__ ("smlal2 %0.2d,%2.4s,%3.4s"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_smlal_hiv4si (__a, __b, __c);
 }
 
 __extension__ extern __inline uint16x8_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlal_high_u8 (uint16x8_t __a, uint8x16_t __b, uint8x16_t __c)
 {
-  uint16x8_t __result;
-  __asm__ ("umlal2 %0.8h,%2.16b,%3.16b"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return __builtin_aarch64_umlal_hiv16qi_ (__a, __b, __c);
 }
 
 __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 vmlal_high_u16 (uint32x4_t __a, uint16x8_t __b, uint16x8_t __c)
 {
-  uint32x4_t __result;
-  __asm__ ("umlal2 %0.4s,%2.8h,%3.8h"
-   : "=w"(__result)
-   : "0"(__a), "w"(__b), "w"(__c)
-   : /* No clobbers */);
-  return __result;
+  return 

Re: [PATCH] Fill up padding in lto_section struct.

2021-02-03 Thread Jakub Jelinek via Gcc-patches
On Wed, Feb 03, 2021 at 12:58:12PM +0100, Martin Liška wrote:
> The patch is about filling a padding byte that is later
> streamed with memcpy.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
>   PR lto/98912
>   * lto-streamer-out.c (produce_lto_section): Fill up missing
>   padding.
>   * lto-streamer.h (struct lto_section): Add _padding field.

LGTM, thanks.

> --- a/gcc/lto-streamer-out.c
> +++ b/gcc/lto-streamer-out.c
> @@ -2670,7 +2670,7 @@ produce_lto_section ()
>bool slim_object = flag_generate_lto && !flag_fat_lto_objects;
>lto_section s
> -= { LTO_major_version, LTO_minor_version, slim_object, 0 };
> += { LTO_major_version, LTO_minor_version, slim_object, 0, 0 };
>s.set_compression (compression);
>lto_write_data (, sizeof s);
>lto_end_section ();
> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> index 7736ae77b8b..5c7cd84d46f 100644
> --- a/gcc/lto-streamer.h
> +++ b/gcc/lto-streamer.h
> @@ -369,6 +369,7 @@ struct lto_section
>int16_t major_version;
>int16_t minor_version;
>unsigned char slim_object;
> +  unsigned char _padding;
>/* Flags is a private field that is not defined publicly.  */
>uint16_t flags;
> -- 
> 2.30.0

Jakub



Re: [PATCH] document BLOCK_ABSTRACT_ORIGIN et al.

2021-02-03 Thread Richard Biener via Gcc-patches
On Mon, Feb 1, 2021 at 5:20 PM Martin Sebor  wrote:
>
> I have pushed the tree.h comments in g:6a2053773b8.  I will wait
> for an approval of the changes to the manual.

Sorry for not looking earlier.

+/* The scope enclosing the scope NODE, or FUNCTION_DECL for the "outermost"
+   function scope.  Inlined functions are chained by this so that given
+   expression E and its TREE_BLOCK(E) B, BLOCK_SUPERCONTEXT(B) is the scope
+   in which E has been made or into which E has been inlined.   */

I can't really understand what you are trying to say with the second
sentence.  There's
nothing really special about BLOCK_SUPERCONTEXT and inlines so I believe this
sentence only adds confusion.

 #define BLOCK_SUPERCONTEXT(NODE) (BLOCK_CHECK (NODE)->block.supercontext)
+/* Points to the next scope at the same level of nesting as scope NODE.  */
 #define BLOCK_CHAIN(NODE) (BLOCK_CHECK (NODE)->block.chain)
+/* A BLOCK, or FUNCTION_DECL of the function from which a block has been
+   inlined.

... from which a block has been ultimately copied, for example by inlining.

[clones also will have abstract origins]

  In a scope immediately enclosing an inlined leaf expression,
+   points to the outermost scope into which it has been inlined (thus
+   bypassing all intermediate BLOCK_SUPERCONTEXTs). */

?

Maybe:  An inlined function is represented by a scope with
BLOCK_ABSTRACT_ORIGIN being the FUNCTION_DECL of the inlined function
containing the inlined function's scope tree as children.  All abstract origins
are ultimate, that is BLOCK_ABSTRACT_ORIGIN(NODE)
== BLOCK_ABSTRACT_ORIGIN(BLOCK_ABSTRACT_ORIGIN (NODE)).

 #define BLOCK_ABSTRACT_ORIGIN(NODE) (BLOCK_CHECK (NODE)->block.abstract_origin)


> On 1/27/21 5:54 PM, Martin Sebor wrote:
> > Attached is an updated patch for both tree.h and the internals manual
> > documenting the most important BLOCK_ macros and what they represent.
> >
> > On 1/21/21 2:52 PM, Martin Sebor wrote:
> >> On 1/18/21 6:25 AM, Richard Biener wrote:
>  PS Here are my notes on the macros and the two related functions:
> 
>  BLOCK: Denotes a lexical scope.  Contains BLOCK_VARS of variables
>  declared in it, BLOCK_SUBBLOCKS of scopes nested in it, and
>  BLOCK_CHAIN pointing to the next BLOCK.  Its BLOCK_SUPERCONTEXT
>  point to the BLOCK of the enclosing scope.  May have
>  a BLOCK_ABSTRACT_ORIGIN and a BLOCK_SOURCE_LOCATION.
> 
>  BLOCK_SUPERCONTEXT: The scope of the enclosing block, or FUNCTION_DECL
>  for the "outermost" function scope.  Inlined functions are chained by
>  this so that given expression E and its TREE_BLOCK(E) B,
>  BLOCK_SUPERCONTEXT(B) is the scope (BLOCK) in which E has been made
>  or into which E has been inlined.  In the latter case,
> 
>  BLOCK_ORIGIN(B) evaluates either to the enclosing BLOCK or to
>  the enclosing function DECL.  It's never null.
> 
>  BLOCK_ABSTRACT_ORIGIN(B) is the FUNCTION_DECL of the function into
>  which it has been inlined, or null if B is not inlined.
> >>>
> >>> It's the BLOCK or FUNCTION it was inlined _from_, not were it was
> >>> inlined to.
> >>> It's the "ultimate" source, thus the abstract copy of the block or
> >>> function decl
> >>> (for the outermost scope, aka inlined_function_outer_scope_p).  It
> >>> corresponds
> >>> to what you'd expect for the DWARF abstract origin.
> >>
> >> Thanks for the correction!  It's just the "innermost" block that
> >> points to the "ultimate" destination into which it's been inlined.
> >>
> >>>
> >>> BLOCK_ABSTRACT_ORIGIN can be NULL (in case it isn't an inline instance).
> >>>
>  BLOCK_ABSTRACT_ORIGIN: A BLOCK, or FUNCTION_DECL of the function
>  into which a block has been inlined.  In a BLOCK immediately enclosing
>  an inlined leaf expression points to the outermost BLOCK into which it
>  has been inlined (thus bypassing all intermediate BLOCK_SUPERCONTEXTs).
> 
>  BLOCK_FRAGMENT_ORIGIN: ???
>  BLOCK_FRAGMENT_CHAIN: ???
> >>>
> >>> that's for scope blocks split by hot/cold partitioning and only
> >>> temporarily
> >>> populated.
> >>
> >> Thanks, I now see these documented in detail in tree.h.
> >>
> >>>
>  bool inlined_function_outer_scope_p(BLOCK)   [tree.h]
>  Returns true if a BLOCK has a source location.
>  True for all but the innermost (no SUBBLOCKs?) and outermost blocks
>  into which an expression has been inlined. (Is this always true?)
> 
>  tree block_ultimate_origin(BLOCK)   [tree.c]
>  Returns BLOCK_ABSTRACT_ORIGIN(BLOCK), AO, after asserting that
>  ((DECL_P(AO) && DECL_ORIGIN(AO) == AO) || BLOCK_ORIGIN(AO) == AO).
> >>
> >> The attached diff adds the comments above to tree.h.
> >>
> >> I looked for a good place in the manual to add the same text but I'm
> >> not sure.  Would the Blocks @subsection in generic.texi be appropriate?
> >>
> >> Martin
> >
> >
>


[PATCH] Fill up padding in lto_section struct.

2021-02-03 Thread Martin Liška

The patch is about filling a padding byte that is later
streamed with memcpy.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

PR lto/98912
* lto-streamer-out.c (produce_lto_section): Fill up missing
padding.
* lto-streamer.h (struct lto_section): Add _padding field.
---
 gcc/lto-streamer-out.c | 2 +-
 gcc/lto-streamer.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index 405f3bfc56c..a26d4885800 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -2670,7 +2670,7 @@ produce_lto_section ()
 
   bool slim_object = flag_generate_lto && !flag_fat_lto_objects;

   lto_section s
-= { LTO_major_version, LTO_minor_version, slim_object, 0 };
+= { LTO_major_version, LTO_minor_version, slim_object, 0, 0 };
   s.set_compression (compression);
   lto_write_data (&s, sizeof s);
   lto_end_section ();
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 7736ae77b8b..5c7cd84d46f 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -369,6 +369,7 @@ struct lto_section
   int16_t major_version;
   int16_t minor_version;
   unsigned char slim_object;
+  unsigned char _padding;
 
   /* Flags is a private field that is not defined publicly.  */

   uint16_t flags;
--
2.30.0



[PATCH v2] c++: fix string literal member initializer bug [PR90926]

2021-02-03 Thread Tom Greenslade (thomgree) via Gcc-patches
Update of https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562259.html

build_aggr_conv did not correctly handle string literal member initializers.
Extended can_convert_array to handle this case. For the additional check of
compatibility of character types, factored out code from digest_init_r into a 
new function.

Testcase added for this.

Bootstrapped/regtested on x86_64-pc-linux-gnu.

gcc/cp/ChangeLog:

PR c++/90926
* call.c (can_convert_array): Extend to handle all valid aggregate
initializers of an array; including by string literals, not just by
brace-init-list.
(build_aggr_conv): Call can_convert_array more often, not just in
brace-init-list case.
* typeck2.c (character_array_from_string_literal): New function.
(digest_init_r): Call character_array_from_string_literal.
* cp-tree.h (character_array_from_string_literal): Declare.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/nsdmi-aggr12.C: New test.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 87a7af12796..b917c67204f 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -895,28 +895,40 @@ strip_standard_conversion (conversion *conv)
   return conv;
 }
 
-/* Subroutine of build_aggr_conv: check whether CTOR, a braced-init-list,
-   is a valid aggregate initializer for array type ATYPE.  */
+/* Subroutine of build_aggr_conv: check whether FROM is a valid aggregate
+   initializer for array type ATYPE.  */
 
 static bool
-can_convert_array (tree atype, tree ctor, int flags, tsubst_flags_t complain)
+can_convert_array (tree atype, tree from, int flags, tsubst_flags_t complain)
 {
-  unsigned i;
   tree elttype = TREE_TYPE (atype);
-  for (i = 0; i < CONSTRUCTOR_NELTS (ctor); ++i)
+  unsigned i;
+
+  if (TREE_CODE (from) == CONSTRUCTOR)
 {
-  tree val = CONSTRUCTOR_ELT (ctor, i)->value;
-  bool ok;
-  if (TREE_CODE (elttype) == ARRAY_TYPE
- && TREE_CODE (val) == CONSTRUCTOR)
-   ok = can_convert_array (elttype, val, flags, complain);
-  else
-   ok = can_convert_arg (elttype, TREE_TYPE (val), val, flags,
- complain);
-  if (!ok)
-   return false;
+  for (i = 0; i < CONSTRUCTOR_NELTS (from); ++i)
+   {
+ tree val = CONSTRUCTOR_ELT (from, i)->value;
+ bool ok;
+ if (TREE_CODE (elttype) == ARRAY_TYPE)
+   ok = can_convert_array (elttype, val, flags, complain);
+ else
+   ok = can_convert_arg (elttype, TREE_TYPE (val), val, flags,
+ complain);
+ if (!ok)
+   return false;
+   }
+  return true;
 }
-  return true;
+
+  if (char_type_p (TYPE_MAIN_VARIANT (elttype))
+  && TREE_CODE (tree_strip_any_location_wrapper (from)) == STRING_CST)
+{
+  return character_array_from_string_literal (atype, from);
+}
+
+  /* No other valid way to aggregate initialize an array.  */
+  return false;
 }
 
 /* Helper for build_aggr_conv.  Return true if FIELD is in PSET, or if
@@ -973,8 +985,7 @@ build_aggr_conv (tree type, tree ctor, int flags, 
tsubst_flags_t complain)
  tree ftype = TREE_TYPE (idx);
  bool ok;
 
- if (TREE_CODE (ftype) == ARRAY_TYPE
- && TREE_CODE (val) == CONSTRUCTOR)
+ if (TREE_CODE (ftype) == ARRAY_TYPE)
ok = can_convert_array (ftype, val, flags, complain);
  else
ok = can_convert_arg (ftype, TREE_TYPE (val), val, flags,
@@ -1021,9 +1032,8 @@ build_aggr_conv (tree type, tree ctor, int flags, 
tsubst_flags_t complain)
  val = empty_ctor;
}
 
-  if (TREE_CODE (ftype) == ARRAY_TYPE
- && TREE_CODE (val) == CONSTRUCTOR)
-   ok = can_convert_array (ftype, val, flags, complain);
+  if (TREE_CODE (ftype) == ARRAY_TYPE)
+   ok = can_convert_array (ftype, val, flags, complain); 
   else
ok = can_convert_arg (ftype, TREE_TYPE (val), val, flags,
  complain);
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index f31319904eb..8dfc581 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7946,6 +7946,7 @@ extern tree split_nonconstant_init(tree, 
tree);
 extern bool check_narrowing(tree, tree, tsubst_flags_t,
 bool = false);
 extern bool ordinary_char_type_p   (tree);
+extern bool character_array_from_string_literal (tree, tree);
 extern tree digest_init(tree, tree, 
tsubst_flags_t);
 extern tree digest_init_flags  (tree, tree, int, 
tsubst_flags_t);
 extern tree digest_nsdmi_init  (tree, tree, tsubst_flags_t);
diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 9ba2897390a..8fbabeb46d9 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -1003,6 +1003,28 @@ ordinary_char_type_p (tree type)
  || type == unsigned_char_type_node);
 }
 
+/* Checks if 

Re: [PATCH] Add unordered containers heterogeneous lookup

2021-02-03 Thread Jonathan Wakely via Gcc-patches

On 03/02/21 11:23 +, Jonathan Wakely wrote:

On 25/01/21 19:21 +0100, François Dumont via Libstdc++ wrote:
I think I never got a clear answer on whether we'll wait for stage 1 to
consider this patch, so here is a ping.


My concern with this patch is that it alters the existing code used
for non-heterogeneous lookups in C++11/14/17. I think if we're going
to do that, it needs to wait for stage 1.

If the new C++20 code was added with new _M_find_before_node and


s/_M_find_before_node/_M_find_before_node_tr/


_M_find_node_tr members (duplicating the existing code) then it would
be safe to add now, because it wouldn't touch the stable C++11/14/17
code.




Re: [PATCH] Add unordered containers heterogeneous lookup

2021-02-03 Thread Jonathan Wakely via Gcc-patches

On 25/01/21 19:21 +0100, François Dumont via Libstdc++ wrote:
I think I never got a clear answer on whether we'll wait for stage 1 to
consider this patch, so here is a ping.


My concern with this patch is that it alters the existing code used
for non-heterogeneous lookups in C++11/14/17. I think if we're going
to do that, it needs to wait for stage 1.

If the new C++20 code was added with new _M_find_before_node and
_M_find_node_tr members (duplicating the existing code) then it would
be safe to add now, because it wouldn't touch the stable C++11/14/17
code.



On 01/12/20 8:19 am, François Dumont wrote:
Let me know if I need to reference a specific paper or any other 
Standard reference here. Maybe P1690R1, which I used here?


I tried to allow the same partition trick you can have on ordered
containers (see Partition in tests) even though elements are not
ordered here, so I'm not sure there can be any usage of it.


    libstdc++: Add unordered containers heterogeneous lookup

    Add unordered containers heterogeneous lookup member functions find,
    count, contains and equal_range in C++20.  Those members are considered
    for overload resolution only if the hash and equal functors used to
    instantiate the container have a nested is_transparent type.


    libstdc++-v3/ChangeLog:

	* include/bits/stl_tree.h (__has_is_transparent)
	(__has_is_transparent_t): Move...
	* include/bits/stl_function.h: ...here.
	* include/bits/hashtable_policy.h (_Hash_code_base<>::_M_hash_code):
	Use template key type.
	(_Hashtable_base<>::_M_equals): Likewise.
	* include/bits/hashtable.h (_Hashtable<>::_M_find_tr)
	(_Hashtable<>::_M_count_tr, _Hashtable<>::_M_equal_range_tr): New
	member function templates to perform heterogeneous lookup.
	(_Hashtable<>::_M_find_before_node): Use template key type.
	(_Hashtable<>::_M_find_node): Likewise.
	* include/bits/unordered_map.h (unordered_map::find<>)
	(unordered_map::count<>, unordered_map::contains<>)
	(unordered_map::equal_range<>): New member function templates to
	perform heterogeneous lookup.
	(unordered_multimap::find<>, unordered_multimap::count<>)
	(unordered_multimap::contains<>, unordered_multimap::equal_range<>):
	Likewise.
	* include/bits/unordered_set.h (unordered_set::find<>)
	(unordered_set::count<>, unordered_set::contains<>)
	(unordered_set::equal_range<>): Likewise.
	(unordered_multiset::find<>, unordered_multiset::count<>)
	(unordered_multiset::contains<>, unordered_multiset::equal_range<>):
	Likewise.
	* include/debug/unordered_map (unordered_map::find<>)
	(unordered_map::equal_range<>): Likewise.
	(unordered_multimap::find<>, unordered_multimap::equal_range<>):
	Likewise.
	* include/debug/unordered_set (unordered_set::find<>)
	(unordered_set::equal_range<>): Likewise.
	(unordered_multiset::find<>, unordered_multiset::equal_range<>):
	Likewise.
	* testsuite/23_containers/unordered_map/operations/1.cc: New test.
	* testsuite/23_containers/unordered_multimap/operations/1.cc: New test.
	* testsuite/23_containers/unordered_multiset/operations/1.cc: New test.
	* testsuite/23_containers/unordered_set/operations/1.cc: New test.


Tested under Linux x86_64 normal and debug modes.

François







Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Joel Hutton via Gcc-patches
> Do you mean a v8qi->v8hi widening subtract or a v16qi->v8hi widening
> subtract?  

I mean the latter; that seemed to be what richi was suggesting previously.

> The problem with the latter is that we need to fill the
> extra unused elements with something and remove them later.

That's fair enough; fake/don't-care elements are a bit of a hack. I'll try
something out with the conversions and regular subtract.

thanks for the help,
Joel


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Richard Sandiford via Gcc-patches
Joel Hutton  writes:
>>> So emit a v4qi->v8qi gimple conversion
>>> then a regular widen_lo/hi using the existing backend patterns/optabs?
>>
>>I was thinking of using a v8qi->v8hi convert on each operand followed
>>by a normal v8hi subtraction.  That's what we'd generate if the target
>>didn't define the widening patterns.
>
> Is there a reason that conversion is preferred? If we use a widening subtract
> then we don't need to rely on RTL fusing later.

Do you mean a v8qi->v8hi widening subtract or a v16qi->v8hi widening
subtract?  The problem with the latter is that we need to fill the
extra unused elements with something and remove them later.
And I don't think Richard liked the idea of having separate
v8qi->v8hi and v16qi->v8hi widening subtracts.

Relying on RTL fusing doesn't seem too bad TBH.

Thanks,
Richard


[Ada] Fix regression with partial rep clause on variant record type

2021-02-03 Thread Eric Botcazou
It is present on the mainline, 10 and 9 branches, and can yield an incorrect
layout when there is a partial representation clause on a discriminated record
type with a variant part.

Tested on x86-64/Linux, applied on mainline, 10 and 9 branches.


2021-02-03  Eric Botcazou  

* gcc-interface/decl.c (components_to_record): If the first component
with rep clause is the _Parent field with variable size, temporarily
set it aside when computing the internal layout of the REP part again.
* gcc-interface/utils.c (finish_record_type): Revert to taking the
maximum when merging sizes for all record types with rep clause.
(merge_sizes): Put SPECIAL parameter last and adjust recursive calls.

-- 
Eric Botcazou

diff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
index 8120d4e33cf..aea191c7ecb 100644
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -8330,12 +8330,12 @@ components_to_record (Node_Id gnat_component_list, Entity_Id gnat_record_type,
   if (p_gnu_rep_list && gnu_rep_list)
 *p_gnu_rep_list = chainon (*p_gnu_rep_list, gnu_rep_list);
 
-  /* Deal with the annoying case of an extension of a record with variable size
- and partial rep clause, for which the _Parent field is forced at offset 0
- and has variable size, which we do not support below.  Note that we cannot
- do it if the field has fixed size because we rely on the presence of the
- REP part built below to trigger the reordering of the fields in a derived
- record type when all the fields have a fixed position.  */
+  /* Deal with the case of an extension of a record type with variable size and
+ partial rep clause, for which the _Parent field is forced at offset 0 and
+ has variable size.  Note that we cannot do it if the field has fixed size
+ because we rely on the presence of the REP part built below to trigger the
+ reordering of the fields in a derived record type when all the fields have
+ a fixed position.  */
   else if (gnu_rep_list
 	   && !DECL_CHAIN (gnu_rep_list)
 	   && TREE_CODE (DECL_SIZE (gnu_rep_list)) != INTEGER_CST
@@ -8353,33 +8353,52 @@ components_to_record (Node_Id gnat_component_list, Entity_Id gnat_record_type,
  record, before the others, if we also have fields without rep clause.  */
   else if (gnu_rep_list)
 {
-  tree gnu_rep_type, gnu_rep_part;
-  int i, len = list_length (gnu_rep_list);
-  tree *gnu_arr = XALLOCAVEC (tree, len);
+  tree gnu_parent, gnu_rep_type;
 
   /* If all the fields have a rep clause, we can do a flat layout.  */
   layout_with_rep = !gnu_field_list
 			&& (!gnu_variant_part || variants_have_rep);
+
+  /* Same as above but the extension itself has a rep clause, in which case
+	 we need to set aside the _Parent field to lay out the REP part.  */
+  if (TREE_CODE (DECL_SIZE (gnu_rep_list)) != INTEGER_CST
+	  && !layout_with_rep
+	  && !variants_have_rep
+	  && first_free_pos
+	  && integer_zerop (first_free_pos)
+	  && integer_zerop (bit_position (gnu_rep_list)))
+	{
+	  gnu_parent = gnu_rep_list;
+	  gnu_rep_list = DECL_CHAIN (gnu_rep_list);
+	}
+  else
+	gnu_parent = NULL_TREE;
+
   gnu_rep_type
 	= layout_with_rep ? gnu_record_type : make_node (RECORD_TYPE);
 
-  for (gnu_field = gnu_rep_list, i = 0;
-	   gnu_field;
-	   gnu_field = DECL_CHAIN (gnu_field), i++)
-	gnu_arr[i] = gnu_field;
+  /* Sort the fields in order of increasing bit position.  */
+  const int len = list_length (gnu_rep_list);
+  tree *gnu_arr = XALLOCAVEC (tree, len);
+
+  gnu_field = gnu_rep_list;
+  for (int i = 0; i < len; i++)
+	{
+	  gnu_arr[i] = gnu_field;
+	  gnu_field = DECL_CHAIN (gnu_field);
+	}
 
   qsort (gnu_arr, len, sizeof (tree), compare_field_bitpos);
 
-  /* Put the fields in the list in order of increasing position, which
-	 means we start from the end.  */
   gnu_rep_list = NULL_TREE;
-  for (i = len - 1; i >= 0; i--)
+  for (int i = len - 1; i >= 0; i--)
 	{
 	  DECL_CHAIN (gnu_arr[i]) = gnu_rep_list;
 	  gnu_rep_list = gnu_arr[i];
 	  DECL_CONTEXT (gnu_arr[i]) = gnu_rep_type;
 	}
 
+  /* Do the layout of the REP part, if any.  */
   if (layout_with_rep)
 	gnu_field_list = gnu_rep_list;
   else
@@ -8388,14 +8407,36 @@ components_to_record (Node_Id gnat_component_list, Entity_Id gnat_record_type,
 	= create_concat_name (gnat_record_type, "REP");
 	  TYPE_REVERSE_STORAGE_ORDER (gnu_rep_type)
 	= TYPE_REVERSE_STORAGE_ORDER (gnu_record_type);
-	  finish_record_type (gnu_rep_type, gnu_rep_list, 1, debug_info);
+	  finish_record_type (gnu_rep_type, gnu_rep_list, 1, false);
 
 	  /* If FIRST_FREE_POS is nonzero, we need to ensure that the fields
 	 without rep clause are laid out starting from this position.
 	 Therefore, we force it as a minimal size on the REP part.  */
-	  gnu_rep_part
+	  tree gnu_rep_part
 	= create_rep_part (gnu_rep_type, 

Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Joel Hutton via Gcc-patches
>> So emit a v4qi->v8qi gimple conversion
>> then a regular widen_lo/hi using the existing backend patterns/optabs?
>
>I was thinking of using a v8qi->v8hi convert on each operand followed
>by a normal v8hi subtraction.  That's what we'd generate if the target
>didn't define the widening patterns.

Is there a reason that conversion is preferred? If we use a widening subtract
then we don't need to rely on RTL fusing later.


Re: [PATCH 2/4] openacc: Use class_pointer instead of pointer attribute for class types

2021-02-03 Thread Tobias Burnus

On 02.02.21 14:28, Julian Brown wrote:


Elsewhere in the Fortran front-end, the class_pointer attribute is
used for BT_CLASS entities instead of the pointer attribute.  [...] This patch
follows suit for OpenACC. I couldn't actually come up with a test case
where this makes a difference (i.e., where "class_pointer" and "pointer"
have different values at this point in the code), but this may nonetheless
fix a latent bug.

Tested with offloading to AMD GCN. OK for mainline?


I think attr.pointer being true while attr.class_pointer is false only
happens for dummy arguments and select-type temporaries, neither of
which can occur for derived-type components.

Thus, I think it is not needed, but there is merit in having
consistency. Hence, OK.

Tobias


2021-02-02  Julian Brown

gcc/fortran/
  * trans-openmp.c (gfc_trans_omp_clauses): Use class_pointer attribute
  for BT_CLASS.
---
  gcc/fortran/trans-openmp.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 8d8da4593c3..7be34ef9a35 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -2997,7 +2997,10 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
if (lastcomp->u.c.component->ts.type == BT_DERIVED
|| lastcomp->u.c.component->ts.type == BT_CLASS)
  {
-   if (sym_attr.pointer || (openacc && sym_attr.allocatable))
+   bool pointer
+ = (lastcomp->u.c.component->ts.type == BT_CLASS
+? sym_attr.class_pointer : sym_attr.pointer);
+   if (pointer || (openacc && sym_attr.allocatable))
  {
tree data, size;

-- 2.29.2

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf


[Ada] Assorted LTO fixes

2021-02-03 Thread Eric Botcazou
This polishes a few rough edges visible in LTO mode.

Tested on x86-64/Linux, applied on mainline, 10 and 9 branches.


2021-02-03  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_entity) : Make the
two fields of the fat pointer type addressable, and do not make the
template type read-only.
: If the type has discriminants, mark it as may_alias.
* gcc-interface/utils.c (make_dummy_type): Likewise.
(build_dummy_unc_pointer_types): Likewise.

-- 
Eric Botcazou

diff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
index 5ea1b16af67..8120d4e33cf 100644
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -2197,14 +2197,16 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 	  }
 	else
 	  {
+	/* We make the fields addressable for the sake of compatibility
+	   with languages for which the regular fields are addressable.  */
 	tem
 	  = create_field_decl (get_identifier ("P_ARRAY"),
    ptr_type_node, gnu_fat_type,
-   NULL_TREE, NULL_TREE, 0, 0);
+   NULL_TREE, NULL_TREE, 0, 1);
 	DECL_CHAIN (tem)
 	  = create_field_decl (get_identifier ("P_BOUNDS"),
    gnu_ptr_template, gnu_fat_type,
-   NULL_TREE, NULL_TREE, 0, 0);
+   NULL_TREE, NULL_TREE, 0, 1);
 	finish_fat_pointer_type (gnu_fat_type, tem);
 	SET_TYPE_UNCONSTRAINED_ARRAY (gnu_fat_type, gnu_type);
 	  }
@@ -2327,7 +2329,6 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 	finish_record_type (gnu_template_type, gnu_template_fields, 0,
 			debug_info_p);
 	TYPE_CONTEXT (gnu_template_type) = current_function_decl;
-	TYPE_READONLY (gnu_template_type) = 1;
 
 	/* If Component_Size is not already specified, annotate it with the
 	   size of the component.  */
@@ -3054,15 +3055,24 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 		|| type_annotate_only);
 	  }
 
-	/* Make a node for the record.  If we are not defining the record,
-	   suppress expanding incomplete types.  */
+	/* Make a node for the record type.  */
 	gnu_type = make_node (tree_code_for_record_type (gnat_entity));
 	TYPE_NAME (gnu_type) = gnu_entity_name;
 	TYPE_PACKED (gnu_type) = (packed != 0) || has_align || has_rep;
 	TYPE_REVERSE_STORAGE_ORDER (gnu_type)
 	  = Reverse_Storage_Order (gnat_entity);
+
+	/* If the record type has discriminants, pointers to it may also point
+	   to constrained subtypes of it, so mark it as may_alias for LTO.  */
+	if (has_discr)
+	  prepend_one_attribute
+	    (&attr_list, ATTR_MACHINE_ATTRIBUTE,
+	 get_identifier ("may_alias"), NULL_TREE,
+	 gnat_entity);
+
 	process_attributes (&gnu_type, &attr_list, true, gnat_entity);
 
+	/* If we are not defining it, suppress expanding incomplete types.  */
 	if (!definition)
 	  {
 	defer_incomplete_level++;
diff --git a/gcc/ada/gcc-interface/utils.c b/gcc/ada/gcc-interface/utils.c
index c503bfbb36d..2656f117fa9 100644
--- a/gcc/ada/gcc-interface/utils.c
+++ b/gcc/ada/gcc-interface/utils.c
@@ -467,6 +467,11 @@ make_dummy_type (Entity_Id gnat_type)
 = create_type_stub_decl (TYPE_NAME (gnu_type), gnu_type);
   if (Is_By_Reference_Type (gnat_equiv))
 TYPE_BY_REFERENCE_P (gnu_type) = 1;
+  if (Has_Discriminants (gnat_equiv))
+    decl_attributes (&gnu_type,
+		     tree_cons (get_identifier ("may_alias"), NULL_TREE,
+				NULL_TREE),
+		     ATTR_FLAG_TYPE_IN_PLACE);
 
   SET_DUMMY_NODE (gnat_equiv, gnu_type);
 
@@ -516,10 +521,10 @@ build_dummy_unc_pointer_types (Entity_Id gnat_desig_type, tree gnu_desig_type)
 = create_type_stub_decl (create_concat_name (gnat_desig_type, "XUP"),
 			 gnu_fat_type);
   fields = create_field_decl (get_identifier ("P_ARRAY"), gnu_ptr_array,
-			  gnu_fat_type, NULL_TREE, NULL_TREE, 0, 0);
+			  gnu_fat_type, NULL_TREE, NULL_TREE, 0, 1);
   DECL_CHAIN (fields)
 = create_field_decl (get_identifier ("P_BOUNDS"), gnu_ptr_template,
-			 gnu_fat_type, NULL_TREE, NULL_TREE, 0, 0);
+			 gnu_fat_type, NULL_TREE, NULL_TREE, 0, 1);
   finish_fat_pointer_type (gnu_fat_type, fields);
   SET_TYPE_UNCONSTRAINED_ARRAY (gnu_fat_type, gnu_desig_type);
   /* Suppress debug info until after the type is completed.  */


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Richard Sandiford via Gcc-patches
Joel Hutton  writes:
 In practice this will only affect targets that choose to use mixed
 vector sizes, and I think it's reasonable to optimise only for the
 case in which such targets support widening conversions.  So what
 do you think about the idea of emitting separate conversions and
 a normal subtract?  We'd be relying on RTL to fuse them together,
 but at least there would be no redundancy to eliminate.
>>>
>>> So in vectorizable_conversion for the widen-minus you'd check
>>> whether you can do a v4qi -> v4hi and then emit a conversion
>>> and a wide minus?
>>
>>Yeah.
>
> This seems reasonable, as I recall we decided against adding
> internal functions for the time being as all the existing vec patterns
> code would have to be refactored.

FWIW, that was for the hi/lo part.  The internal function in this case
would have been a normal standalone operation that makes sense independently
of the hi/lo pairs, and could be generated independently of the vectoriser
(e.g. from match.pd patterns).

Using an internal function is actually less work than using a tree code,
because you don't need to update all the various tree_code switch
statements.

> So emit a v4qi->v8qi gimple conversion
> then a regular widen_lo/hi using the existing backend patterns/optabs?

I was thinking of using a v8qi->v8hi convert on each operand followed
by a normal v8hi subtraction.  That's what we'd generate if the target
didn't define the widening patterns.

Thanks,
Richard



Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Joel Hutton via Gcc-patches
>>> In practice this will only affect targets that choose to use mixed
>>> vector sizes, and I think it's reasonable to optimise only for the
>>> case in which such targets support widening conversions.  So what
>>> do you think about the idea of emitting separate conversions and
>>> a normal subtract?  We'd be relying on RTL to fuse them together,
>>> but at least there would be no redundancy to eliminate.
>>
>> So in vectorizable_conversion for the widen-minus you'd check
>> whether you can do a v4qi -> v4hi and then emit a conversion
>> and a wide minus?
>
>Yeah.

This seems reasonable, as I recall we decided against adding
internal functions for the time being as all the existing vec patterns
code would have to be refactored. So emit a v4qi->v8qi gimple conversion
then a regular widen_lo/hi using the existing backend patterns/optabs?


Re: [PATCH] i386, df: Fix up gcc.c-torture/compile/20051216-1.c -O1 -march=cascadelake

2021-02-03 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, 2 Feb 2021, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On January 30, 2021 11:52:20 AM GMT+01:00, Jakub Jelinek 
>> >  wrote:
>> >>On Sat, Jan 30, 2021 at 11:47:24AM +0100, Richard Biener wrote:
>> >>> OK, so I'd prefer we simply unset the flag after processing deferred
>> >>rescan. I clearly misread the function to do that. 
>> >>
>> >>This works too, will bootstrap/regtest it now.
>> >
>> > OK. 
>> 
>> FWIW, I'm still not convinced we need to defer the rescan here.
>> AIUI, the concern was that the pass introduces uses of something
>> and then only later introduces the definition.  But that's OK:
>> the two things don't have to be added in a set order.
>> 
>> In particular, the normal rescan doesn't propagate the effects
>> throughout the cfg; it just updates the list of references in the
>> instruction itself and marks the block as dirty for later processing.
>> 
>> The usual reason for deferring rescans is if the instruction can't
>> cope with the df_refs changing from under them like that, e.g.
>> because they're iterating through a list of df_refs, or because
>> they're using df_refs to represent value numbers.  It shouldn't
>> be a problem for this pass.
>
> I guess that's correct but doing the scan on the "final" insn
> feels much cleaner - given all the confusion it also looks safer
> (doing defered scans as before) since I intend to backport this.
> But if you feel strongly about this we can of course change it.

I don't feel too strongly about it.  It's just that I think the
reason we needed two bites at this is because we're trying to do
something unusual (have a pass not use DF but nevertheless
micromanage the way DF does its updates) instead of trusting
the infrastructure to DTRT.

Richard


Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, Feb 2, 2021 at 5:19 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Tue, Feb 2, 2021 at 4:03 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> Richard Biener  writes:
>> >> > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton  wrote:
>> >> >>
>> >> >> Hi Richard(s),
>> >> >>
>> >> >> I'm just looking to see if I'm going about this the right way, based 
>> >> >> on the discussion we had on IRC. I've managed to hack something 
>> >> >> together, I've attached a (very) WIP patch which gives the correct 
>> >> >> codegen for the testcase in question 
>> >> >> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would 
>> >> >> obviously need to support other widening patterns and differentiate 
>> >> >> between big/little endian among other things.
>> >> >>
>> >> >> I added a backend pattern because I wasn't quite clear which changes 
>> >> >> to make in order to allow the existing backend patterns to be used 
>> >> >> with a V8QI, or how to represent V16QI where we don't care about the 
>> >> >> top/bottom 8. I made some attempt in optabs.c, which is in the patch 
>> >> >> commented out, but I'm not sure if I'm going about this the right way.
>> >> >
>> >> > Hmm, as said, I'd try to arrange like illustrated in the attachment,
>> >> > confined to vectorizable_conversion.  The
>> >> > only complication might be sub-optimal code-gen for the vector-vector
>> >> > CTOR compensating for the input
>> >> > vector (on RTL that would be a paradoxical subreg from say V4HI to V8HI)
>> >>
>> >> Yeah.  I don't really like this because it means that it'll be
>> >> impossible to remove the redundant work in gimple.  The extra elements
>> >> are just a crutch to satisfy the type system.
>> >
>> > We can certainly devise a more clever way to represent a paradoxical 
>> > subreg,
>> > but at least the actual operation (WIDEN_MINUS_LOW) would match what
>> > the hardware can do.
>>
>> At least for the Arm ISAs, the low parts are really 64-bit → 128-bit
>> operations.  E.g. the low-part intrinsic for signed 8-bit integers is:
>>
>>int16x8_t vsubl_s8 (int8x8_t __a, int8x8_t __b);
>>
>> whereas the high-part intrinsic is:
>>
>>int16x8_t vsubl_high_s8 (int8x16_t __a, int8x16_t __b);
>>
>> So representing the low part as a 128-bit → 128-bit operation is already
>> a little artifical.
>
> that's intrinsincs - but I guess the actual machine instruction is different?

FWIW, the instructions are the same.  E.g. for AArch64 it's:

ssubl   v0.8h, v0.8b, v1.8b

(8b being a 64-bit vector and 8h being a 128-bit vector) instead of:

ssubl   v0.8h, v0.16b, v1.16b

The AArch32 lowpart is:

vsubl.s16 q0, d0, d1

where a q register joins together two d registers.

>> > OTOH we could simply accept half of a vector for
>> > the _LOW (little-endial) or _HIGH (big-endian) op and have the expander
>> > deal with subreg frobbing?  Not that I'd like that very much though, even
>> > a VIEW_CONVERT  (v4hi-reg) would be cleaner IMHO (not sure
>> > how to go about endianess here ... the _LOW/_HIGH paints us into some
>> > corner here)
>>
>> I think it only makes sense for the low part.  But yeah, I guess that
>> would work (although I agree it doesn't seem very appealing :-)).
>>
>> > A new IFN (direct optab?) means targets with existing support for _LO/HI
>> > do not automatically benefit which is a shame.
>>
>> In practice this will only affect targets that choose to use mixed
>> vector sizes, and I think it's reasonable to optimise only for the
>> case in which such targets support widening conversions.  So what
>> do you think about the idea of emitting separate conversions and
>> a normal subtract?  We'd be relying on RTL to fuse them together,
>> but at least there would be no redundancy to eliminate.
>
> So in vectorizable_conversion for the widen-minus you'd check
> whether you can do a v4qi -> v4hi and then emit a conversion
> and a wide minus?

Yeah.

Richard

> I guess as long as vectorizer costing behaves
> as if the op is fused that's a similarly OK trick as a V_C_E or a
> vector CTOR.
>
> Richard.
>
>> Thanks,
>> Richard
>> >
>> >> As far as Joel's patch goes, I was imagining that the new operation
>> >> would be an internal function rather than a tree code.  However,
>> >> if we don't want that, maybe we should just emit separate conversions
>> >> and a normal subtraction, like we would for (signed) x - (unsigned) y.
>> >>
>> >> Thanks,
>> >> Richard


[PATCH] more memory leak fixes

2021-02-03 Thread Richard Biener
This fixes more memory leaks as discovered by building 521.wrf_r.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-02-03  Richard Biener  

* lto-streamer.c (lto_get_section_name): Free temporary
buffer.
* tree-loop-distribution.c
(loop_distribution::merge_dep_scc_partitions): Free edge data.
---
 gcc/lto-streamer.c   | 8 ++--
 gcc/tree-loop-distribution.c | 1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
index 36e673514a7..84db5eb944d 100644
--- a/gcc/lto-streamer.c
+++ b/gcc/lto-streamer.c
@@ -106,6 +106,7 @@ lto_get_section_name (int section_type, const char *name,
   const char *add;
   char post[32];
   const char *sep;
+  char *buffer = NULL;
 
   if (section_type == LTO_section_function_body)
 {
@@ -113,7 +114,7 @@ lto_get_section_name (int section_type, const char *name,
   if (name[0] == '*')
name++;
 
-  char *buffer = (char *)xmalloc (strlen (name) + 32);
+  buffer = (char *)xmalloc (strlen (name) + 32);
   sprintf (buffer, "%s.%d", name, node_order);
 
   add = buffer;
@@ -138,7 +139,10 @@ lto_get_section_name (int section_type, const char *name,
 sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
   else
 sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
-  return concat (section_name_prefix, sep, add, post, NULL);
+  char *res = concat (section_name_prefix, sep, add, post, NULL);
+  if (buffer)
+free (buffer);
+  return res;
 }
 
 
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index bb15fd3723f..7ee19fc8677 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -2358,6 +2358,7 @@ loop_distribution::merge_dep_scc_partitions (struct graph *rdg,
   sort_partitions_by_post_order (pg, partitions);
   gcc_assert (partitions->length () == (unsigned)num_sccs);
   free_partition_graph_vdata (pg);
+  for_each_edge (pg, free_partition_graph_edata_cb, NULL);
   free_graph (pg);
 }
 
-- 
2.26.2


[PATCH] rs6000: Convert the vector element register to SImode [PR98914]

2021-02-03 Thread Xionghu Luo via Gcc-patches
v[k] will also be expanded to IFN VEC_SET if k has long type when built
with -Og.  -O0 didn't expose the issue because v is TREE_ADDRESSABLE, and
-O1 and above didn't capture it either because v[k] is not optimized to
VIEW_CONVERT_EXPR(v)[k_1].
vec_insert defines the element argument type to be signed int per the ELFv2
ABI, so convert the element to SImode if it isn't already, as the Power
target requires.
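In effect the added convert_move truncates a 64-bit element index to the
32-bit signed int the ABI specifies; a scalar sketch (the helper name is
mine, not from the patch):

```cpp
#include <assert.h>
#include <stdint.h>

/* Model of the narrowing the patch performs: reduce the vec_insert
   element index to SImode width, i.e. modular truncation to 32 bits.  */
static int32_t
narrow_vec_index (int64_t k)
{
  return (int32_t) k;
}
```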

gcc/ChangeLog:

2021-02-03  Xionghu Luo  

* config/rs6000/rs6000.c (rs6000_expand_vector_set): Convert
elt_rtx to SImode if it wasn't.

gcc/testsuite/ChangeLog:

2021-02-03  Xionghu Luo  

* gcc.target/powerpc/pr98914.c: New test.
---
 gcc/config/rs6000/rs6000.c | 17 ++---
 gcc/testsuite/gcc.target/powerpc/pr98914.c | 11 +++
 2 files changed, 21 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr98914.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ec068c58aa5..9f7f8da56c6 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -7000,8 +7000,6 @@ rs6000_expand_vector_set_var_p9 (rtx target, rtx val, rtx idx)
 
   gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx));
 
-  gcc_assert (GET_MODE (idx) == E_SImode);
-
   machine_mode inner_mode = GET_MODE (val);
 
   rtx tmp = gen_reg_rtx (GET_MODE (idx));
@@ -7047,8 +7045,6 @@ rs6000_expand_vector_set_var_p8 (rtx target, rtx val, rtx idx)
 
   gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx));
 
-  gcc_assert (GET_MODE (idx) == E_SImode);
-
   machine_mode inner_mode = GET_MODE (val);
   HOST_WIDE_INT mode_mask = GET_MODE_MASK (inner_mode);
 
@@ -7144,7 +7140,7 @@ rs6000_expand_vector_set (rtx target, rtx val, rtx elt_rtx)
   machine_mode mode = GET_MODE (target);
   machine_mode inner_mode = GET_MODE_INNER (mode);
   rtx reg = gen_reg_rtx (mode);
-  rtx mask, mem, x;
+  rtx mask, mem, x, elt_si;
   int width = GET_MODE_SIZE (inner_mode);
   int i;
 
@@ -7154,16 +7150,23 @@ rs6000_expand_vector_set (rtx target, rtx val, rtx elt_rtx)
 {
   if (!CONST_INT_P (elt_rtx))
{
+ /* elt_rtx should be SImode from ELFv2 ABI.  */
+ elt_si = gen_reg_rtx (E_SImode);
+ if (GET_MODE (elt_rtx) != E_SImode)
+   convert_move (elt_si, elt_rtx, 0);
+ else
+   elt_si = elt_rtx;
+
  /* For V2DI/V2DF, could leverage the P9 version to generate xxpermdi
 when elt_rtx is variable.  */
  if ((TARGET_P9_VECTOR && TARGET_POWERPC64) || width == 8)
{
- rs6000_expand_vector_set_var_p9 (target, val, elt_rtx);
+ rs6000_expand_vector_set_var_p9 (target, val, elt_si);
  return;
}
  else if (TARGET_P8_VECTOR && TARGET_DIRECT_MOVE_64BIT)
{
- rs6000_expand_vector_set_var_p8 (target, val, elt_rtx);
+ rs6000_expand_vector_set_var_p8 (target, val, elt_si);
  return;
}
}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr98914.c b/gcc/testsuite/gcc.target/powerpc/pr98914.c
new file mode 100644
index 000..e4d78e3e6b3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr98914.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-Og -mvsx" } */
+
+vector int
+foo (vector int v)
+{
+  for (long k = 0; k < 1; ++k)
+v[k] = 0;
+  return v;
+}
-- 
2.25.1



Re: [arm/testsuite]: Skip pr97969.c if -mthumb is not compatible [PR target/97969]

2021-02-03 Thread Christophe Lyon via Gcc-patches
Ping?
I guess that's obvious enough?

On Wed, 27 Jan 2021 at 10:03, Christophe Lyon
 wrote:
>
> Depending on how the toolchain is configured or how the testsuite is
> executed, -mthumb may not be compatible. Like for other tests, skip
> pr97969.c in this case.
>
> For instance arm-linux-gnueabihf and -march=armv5t in RUNTESTFLAGS.
>
> 2021-01-27  Christophe Lyon  
>
> gcc/testsuite/
> PR target/97969
> * gcc.target/arm/pr97969.c: Skip if thumb mode is not available.
>
> diff --git a/gcc/testsuite/gcc.target/arm/pr97969.c b/gcc/testsuite/gcc.target/arm/pr97969.c
> index 714a1d1..0b5d07f 100644
> --- a/gcc/testsuite/gcc.target/arm/pr97969.c
> +++ b/gcc/testsuite/gcc.target/arm/pr97969.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
>  /* { dg-options "-std=c99 -fno-omit-frame-pointer -mthumb -w -Os" } */
>
>  typedef a[23];


Re: [RFC] test builtin ratio for loop distribution

2021-02-03 Thread Richard Biener via Gcc-patches
On Tue, Feb 2, 2021 at 6:14 PM Alexandre Oliva  wrote:
>
> On Jan 28, 2021, Richard Biener  wrote:
>
> > That would allow turning back the memset into the original loop (but
> > with optimal IVs, etc.).
>
> Is this sort of what you had in mind?
>
> I haven't tested the inline expansion of memset much yet; and that of
> memcpy, not at all; this really is mainly to check that I understood
> what you had in mind before I spend further time polishing it.

So I think we should try to match what __builtin_memcpy/memset
expansion would do here, taking advantage of extra alignment
and size knowledge.  In particular,

 a) if __builtin_memcpy/memset would use setmem/cpymem optabs
 see if we can have variants of memcpy/memset transferring alignment
 and size knowledge

 b) if expansion would use BY_PIECES then expand to an unrolled loop

 c) if expansion would emit a memset/memcpy call but we know
 alignment and have a low bound on niters emit a loop (like your patch does)

a) might be difficult but adding the builtin variants may pay off in any case.

The patch itself could benefit from one or two helpers we already
have, first of all there's create_empty_loop_on_edge (so you don't
need the loop fixup) which conveniently adds the control IV for you.
If you want to use pointer IVs for simplicity there's create_iv.  In the
end you shouldn't need more than creating the actual memory GIMPLE
stmts.  If expansion would use BY_PIECES you can implement the
unrolled code by setting loop->unroll to the number of iterations
to (maximally) peel and cunroll will take care of that.

Note that for memmove if we know the dependence direction, we
can also emit a loop / unrolled code.

I think the builtins with alignment and calloc-style element count
will be useful on its own.
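As a rough illustration of case (c), an expansion that exploits known
alignment and a size known to be a multiple of the word size could look
like this scalar sketch (the function name and the 8-byte word size are
assumptions of the sketch; the real transformation works on GIMPLE):

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: when the destination is known 8-byte aligned and the length
   is a known multiple of 8, memset can be expanded as a loop of
   aligned word stores instead of a libcall.  */
static void
inline_memset_w8 (void *dst, unsigned char c, size_t n)
{
  uint64_t word = 0x0101010101010101ULL * c;  /* splat byte into word */
  uint64_t *p = (uint64_t *) dst;
  for (size_t i = 0; i < n / 8; i++)
    p[i] = word;  /* one aligned 8-byte store per iteration */
}
```

The alignment/size knowledge is exactly what a plain memset call discards,
which is why transferring it (or expanding inline) can pay off.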

Richard.

>
> test builtin ratio for loop distribution
>
> From: Alexandre Oliva 
>
> The ldist pass turns even very short loops into memset calls.  E.g.,
> the TFmode emulation calls end with a loop of up to 3 iterations, to
> zero out trailing words, and the loop distribution pass turns them
> into calls of the memset builtin.
>
> Though short constant-length memsets are usually dealt with
> efficiently, for non-constant-length ones, the options are setmemM, or
> a function calls.
>
> RISC-V doesn't have any setmemM pattern, so the loops above end up
> "optimized" into memset calls, incurring not only the overhead of an
> explicit call, but also discarding the information the compiler has
> about the alignment of the destination, and that the length is a
> multiple of the word alignment.
>
> This patch adds, to the loop distribution pass, the ability to perform
> inline expansion of bounded variable-length memset and memcpy in ways
> that take advantage of known alignments and size's factors, when
> preexisting *_RATIO macros suggest the inline expansion is
> advantageous.
>
>
> for  gcc/ChangeLog
>
> * tree-loop-distribution.c: Include builtins.h.
> (generate_memset_builtin): Introduce optional inline expansion
> of bounded variable-sized memset calls.
> (generate_memcpy_builtin): Likewise for memcpy only.
> (loop_distribution::execute): Fix loop structure afterwards.
> ---
>  gcc/tree-loop-distribution.c |  280 ++
>  1 file changed, 279 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
> index bb15fd3723fb6..3be7a4c1ac281 100644
> --- a/gcc/tree-loop-distribution.c
> +++ b/gcc/tree-loop-distribution.c
> @@ -115,6 +115,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-vectorizer.h"
>  #include "tree-eh.h"
>  #include "gimple-fold.h"
> +#include "builtins.h"
>
>
>  #define MAX_DATAREFS_NUM \
> @@ -1148,6 +1149,23 @@ generate_memset_builtin (class loop *loop, partition *partition)
>/* The new statements will be placed before LOOP.  */
>gsi = gsi_last_bb (loop_preheader_edge (loop)->src);
>
> +  /* Compute builtin->size range and ctz before it's gimplified; sub-expressions
> + thereof are rewritten in place, so they end up referencing SSA_NAMEs for
> + which we don't have VR info.  */
> +  unsigned align = get_pointer_alignment (builtin->dst_base) / BITS_PER_UNIT;
> +  unsigned alctz, szctz, xbits;
> +  wide_int szmin, szmax;
> +  value_range_kind vrk;
> +  if (align)
> +{
> +  alctz = wi::ctz (align);
> +  szctz = MIN (tree_ctz (builtin->size), alctz);
> +  xbits = alctz - szctz;
> +  vrk = determine_value_range (builtin->size, &szmin, &szmax);
> +  if (szmin == szmax)
> +   align = 0;
> +}
> +
>nb_bytes = rewrite_to_non_trapping_overflow (builtin->size);
>nb_bytes = force_gimple_operand_gsi (, nb_bytes, true, NULL_TREE,
>false, GSI_CONTINUE_LINKING);
> @@ -1172,6 +1190,127 @@ generate_memset_builtin (class loop *loop, partition *partition)
>val = tem;
>  }
>
> +  unsigned 

Re: [PATCH] release pointer_query cache when done (PR 98937)

2021-02-03 Thread Richard Biener via Gcc-patches
On Wed, Feb 3, 2021 at 1:15 AM Martin Sebor  wrote:
>
> On 2/2/21 2:29 PM, David Malcolm wrote:
> > On Tue, 2021-02-02 at 12:57 -0700, Martin Sebor via Gcc-patches wrote:
> >> The strlen pass initializes its pointer_query member object with
> >> a cache consisting of a couple of vec's.  Because vec doesn't
> >> implement RAII its memory must be explicitly released to avoid
> >> memory leaks.  The attached patch adds a dtor to
> >> the strlen_dom_walker to do that.
> >>
> >> Tested on x86_64-linux and by verifying that the cache leaks are
> >> gone by compiling gcc.dg/Wstringop-overflow*.c tests under Valgrind.
> >>
> >> I'll plan to commit this change as "obvious" tomorrow unless there
> >> are suggestions for changes.
> >
> > Why not make the vecs within struct pointer_query::cache_type's be
> > auto_vecs?  Then presumably the autogenerated dtor for
> > pointer_query::cache_type would clean things up, as called from the
> > autogenerated dtor for strlen_dom_walker, when cleaning up the
> > var_cache field?
> >
> > Or am I missing something?  (sorry, I'm not familiar with this code)
> > Dave
>
> It would work as long as the cache isn't copied or assigned anywhere.
> I don't think it is either, and GCC compiles and no C tests fail, so
> it should be okay.
>
> But I'm leery of using auto_vec as a class member because of pr90904.
> Looks like auto_vec now has a move ctor and move assignment but still
> no safe copy ctor or assignment.  The cache copy ctor and assignment
> operator could be deleted to avoid accidental copies, but at that
> point it starts to become more involved than just flushing the cache.
>
> If you or someone else has a preference for using auto_vec despite
> this I'll change it.  Otherwise I'd just as soon leave it as is.

I prefer the original patch at this point.

Richard.

> Martin
>
> >
> >> Martin
> >>
> >> PS Valgrind shows a fair number of leaks even with the patch but
> >> none of them due to the pointer_query cache.
> >
> >
>
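For reference, the shape of the fix under discussion — an explicit dtor
releasing non-RAII vectors — can be sketched as follows (a simplified
model, not GCC's actual vec/pointer_query API):

```cpp
#include <cassert>
#include <cstdlib>

/* Minimal model of a vec-like container that, like GCC's vec<>, does
   NOT free its storage automatically.  */
struct raw_vec
{
  int *data = nullptr;
  unsigned len = 0;
  void safe_push (int v)
  {
    /* Grow by one element; error handling omitted in this sketch.  */
    data = static_cast<int *> (std::realloc (data, (len + 1) * sizeof (int)));
    data[len++] = v;
  }
  void release () { std::free (data); data = nullptr; len = 0; }
};

/* The walker owns such caches; without an explicit dtor their storage
   leaks, which is what the patch's dtor addresses.  */
struct walker_model
{
  raw_vec var_cache;  /* stands in for the pointer_query cache vecs */
  ~walker_model ()
  {
    var_cache.release ();  /* explicit release, as in the patch */
  }
};
```

An auto_vec member would generate this cleanup implicitly, at the cost of
the copy-semantics concerns raised above.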


Re: [PATCH]middle-end slp: Split out patterns away from using SLP_ONLY into their own flag

2021-02-03 Thread Richard Biener
On Tue, 2 Feb 2021, Tamar Christina wrote:

> Hi All,
> 
> Previously the SLP pattern matcher was using STMT_VINFO_SLP_VECT_ONLY as a way
> to dissolve the SLP only patterns during SLP cancellation.  However it seems
> like the semantics of STMT_VINFO_SLP_VECT_ONLY are slightly different from what I expected.
> 
> Namely that the non-SLP path can still use a statement marked
> STMT_VINFO_SLP_VECT_ONLY.  One such example is masked loads which are used 
> both
> in the SLP and non-SLP path.
> 
> To fix this I now introduce a new flag, STMT_VINFO_SLP_VECT_ONLY_PATTERN, which is used only by the pattern matcher.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
> 
> Ok for master?

OK.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/98928
>   * tree-vect-loop.c (vect_analyze_loop_2): Change
>   STMT_VINFO_SLP_VECT_ONLY to STMT_VINFO_SLP_VECT_ONLY_PATTERN.
>   * tree-vect-slp-patterns.c (complex_pattern::build): Likewise.
>   * tree-vectorizer.h (STMT_VINFO_SLP_VECT_ONLY_PATTERN): New.
>   (class _stmt_vec_info): Add slp_vect_pattern_only_p.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/98928
>   * gcc.target/i386/pr98928.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.target/i386/pr98928.c b/gcc/testsuite/gcc.target/i386/pr98928.c
> new file mode 100644
> index ..9503b579a88d95c427d3e3e5a71565b0c048c125
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr98928.c
> @@ -0,0 +1,59 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Ofast -march=skylake-avx512 -fwhole-program -w" } */
> +
> +typedef float MagickRealType;
> +typedef short Quantum;
> +float InterpolateMagickPixelPacket_alpha[1];
> +int InterpolateMagickPixelPacket_i;
> +
> +void InterpolateMagickPixelPacket();
> +
> +void main() { InterpolateMagickPixelPacket(); }
> +
> +typedef struct {
> +  MagickRealType red, green, blue, opacity, index;
> +} MagickPixelPacket;
> +typedef struct {
> +  Quantum blue, green, red, opacity;
> +} PixelPacket;
> +struct _Image {
> +  int colorspace;
> +  int matte;
> +} GetMagickPixelPacket(MagickPixelPacket *pixel) {
> +  pixel->red = pixel->green = pixel->blue = 0.0;
> +}
> +int AlphaBlendMagickPixelPacket(struct _Image *image, PixelPacket *color,
> +Quantum *indexes, MagickPixelPacket *pixel,
> +MagickRealType *alpha) {
> +  if (image->matte) {
> +*alpha = pixel->red = pixel->green = pixel->blue = pixel->opacity =
> +color->opacity;
> +pixel->index = 0.0;
> +if (image->colorspace)
> +  pixel->index = *indexes;
> +return 0;
> +  }
> +  *alpha = 1.0 / 0.2;
> +  pixel->red = *alpha * color->red;
> +  pixel->green = *alpha * color->green;
> +  pixel->blue = *alpha * color->blue;
> +  pixel->opacity = pixel->index = 0.0;
> +  if (image->colorspace && indexes)
> +pixel->index = *indexes;
> +}
> +MagickPixelPacket InterpolateMagickPixelPacket_pixels[1];
> +PixelPacket InterpolateMagickPixelPacket_p;
> +
> +void
> +InterpolateMagickPixelPacket(struct _Image *image) {
> +  Quantum *indexes;
> +  for (; InterpolateMagickPixelPacket_i; InterpolateMagickPixelPacket_i++) {
> +GetMagickPixelPacket(InterpolateMagickPixelPacket_pixels +
> + InterpolateMagickPixelPacket_i);
> +AlphaBlendMagickPixelPacket(
> +image, &InterpolateMagickPixelPacket_p + InterpolateMagickPixelPacket_i,
> +indexes + InterpolateMagickPixelPacket_i,
> +InterpolateMagickPixelPacket_pixels + InterpolateMagickPixelPacket_i,
> +InterpolateMagickPixelPacket_alpha + InterpolateMagickPixelPacket_i);
> +  }
> +}
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index acfd1952e3b803ea79cf51433101466743c9793e..200ed27b32ef4aa54c6783afa1864924b6f55582 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2700,7 +2700,7 @@ again:
>   {
> stmt_vec_info pattern_stmt_info
>   = STMT_VINFO_RELATED_STMT (stmt_info);
> -   if (STMT_VINFO_SLP_VECT_ONLY (pattern_stmt_info))
> +   if (STMT_VINFO_SLP_VECT_ONLY_PATTERN (pattern_stmt_info))
>   STMT_VINFO_IN_PATTERN_P (stmt_info) = false;
>  
> gimple *pattern_def_seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_info);
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index d25560fab97bb852e949884850d51c6148b14a68..f0817da9f622d22e3df2e30410d1cf610b4ffa1d 100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -599,7 +599,7 @@ complex_pattern::build (vec_info *vinfo)
>the call there.  */
>vect_mark_pattern_stmts (vinfo, stmt_info, call_stmt,
>  SLP_TREE_VECTYPE (node));
> -  STMT_VINFO_SLP_VECT_ONLY (call_stmt_info) = true;
> +  STMT_VINFO_SLP_VECT_ONLY_PATTERN (call_stmt_info) = true;

Re: [RFC] Feedback on approach for adding support for V8QI->V8HI widening patterns

2021-02-03 Thread Richard Biener via Gcc-patches
On Tue, Feb 2, 2021 at 5:19 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Tue, Feb 2, 2021 at 4:03 PM Richard Sandiford
> >  wrote:
> >>
> >> Richard Biener  writes:
> >> > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton  wrote:
> >> >>
> >> >> Hi Richard(s),
> >> >>
> >> >> I'm just looking to see if I'm going about this the right way, based on 
> >> >> the discussion we had on IRC. I've managed to hack something together, 
> >> >> I've attached a (very) WIP patch which gives the correct codegen for 
> >> >> the testcase in question 
> >> >> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would 
> >> >> obviously need to support other widening patterns and differentiate 
> >> >> between big/little endian among other things.
> >> >>
> >> >> I added a backend pattern because I wasn't quite clear which changes to 
> >> >> make in order to allow the existing backend patterns to be used with a 
> >> >> V8QI, or how to represent V16QI where we don't care about the 
> >> >> top/bottom 8. I made some attempt in optabs.c, which is in the patch 
> >> >> commented out, but I'm not sure if I'm going about this the right way.
> >> >
> >> > Hmm, as said, I'd try to arrange like illustrated in the attachment,
> >> > confined to vectorizable_conversion.  The
> >> > only complication might be sub-optimal code-gen for the vector-vector
> >> > CTOR compensating for the input
> >> > vector (on RTL that would be a paradoxical subreg from say V4HI to V8HI)
> >>
> >> Yeah.  I don't really like this because it means that it'll be
> >> impossible to remove the redundant work in gimple.  The extra elements
> >> are just a crutch to satisfy the type system.
> >
> > We can certainly devise a more clever way to represent a paradoxical subreg,
> > but at least the actual operation (WIDEN_MINUS_LOW) would match what
> > the hardware can do.
>
> At least for the Arm ISAs, the low parts are really 64-bit → 128-bit
> operations.  E.g. the low-part intrinsic for signed 8-bit integers is:
>
>int16x8_t vsubl_s8 (int8x8_t __a, int8x8_t __b);
>
> whereas the high-part intrinsic is:
>
>int16x8_t vsubl_high_s8 (int8x16_t __a, int8x16_t __b);
>
> So representing the low part as a 128-bit → 128-bit operation is already
> a little artifical.

that's intrinsics - but I guess the actual machine instruction is different?

> > OTOH we could simply accept half of a vector for
> > the _LOW (little-endian) or _HIGH (big-endian) op and have the expander
> > deal with subreg frobbing?  Not that I'd like that very much though, even
> > a VIEW_CONVERT  (v4hi-reg) would be cleaner IMHO (not sure
> > how to go about endianess here ... the _LOW/_HIGH paints us into some
> > corner here)
>
> I think it only makes sense for the low part.  But yeah, I guess that
> would work (although I agree it doesn't seem very appealing :-)).
>
> > A new IFN (direct optab?) means targets with existing support for _LO/HI
> > do not automatically benefit which is a shame.
>
> In practice this will only affect targets that choose to use mixed
> vector sizes, and I think it's reasonable to optimise only for the
> case in which such targets support widening conversions.  So what
> do you think about the idea of emitting separate conversions and
> a normal subtract?  We'd be relying on RTL to fuse them together,
> but at least there would be no redundancy to eliminate.

So in vectorizable_conversion for the widen-minus you'd check
whether you can do a v4qi -> v4hi and then emit a conversion
and a wide minus?  I guess as long as vectorizer costing behaves
as if the op is fused that's a similarly OK trick as a V_C_E or a
vector CTOR.
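Element-wise, the fallback being discussed computes exactly what a fused
widening minus would; a scalar sketch of the decomposition (the function
name is mine, for illustration only):

```cpp
#include <assert.h>
#include <stdint.h>

/* Sketch of the fallback: instead of a single widening subtract, emit
   explicit conversions followed by an ordinary same-width subtract, and
   rely on later passes to fuse them back together.  */
static void
fallback_widen_sub (int16_t *out, const int8_t *a, const int8_t *b, int n)
{
  for (int i = 0; i < n; i++)
    {
      int16_t wa = (int16_t) a[i];  /* conversion statement #1 */
      int16_t wb = (int16_t) b[i];  /* conversion statement #2 */
      out[i] = wa - wb;             /* ordinary subtract */
    }
}
```

Because each widened operand fits the wider type exactly, no wraparound
can occur, so the separate-conversion form is value-equivalent to the
fused operation.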

Richard.

> Thanks,
> Richard
> >
> >> As far as Joel's patch goes, I was imagining that the new operation
> >> would be an internal function rather than a tree code.  However,
> >> if we don't want that, maybe we should just emit separate conversions
> >> and a normal subtraction, like we would for (signed) x - (unsigned) y.
> >>
> >> Thanks,
> >> Richard