[pushed] c++: speed up looking up the current class

2021-06-11 Thread Jason Merrill via Gcc-patches
While looking at template instantiation tracing, I noticed that we were
frequently looking up a particular class template instance while
instantiating it.  This patch shortcuts that lookup, and speeds up compiling
stdc++.h with my (checking/unoptimized) compiler by about 3%.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* pt.c (lookup_template_class_1): Shortcut current_class_type.
---
 gcc/cp/pt.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 141388ad2e5..d4bb5cc5eaf 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -9833,6 +9833,13 @@ lookup_template_class_1 (tree d1, tree arglist, tree 
in_decl, tree context,
   /* From here on, we're only interested in the most general
 template.  */
 
+  /* Shortcut looking up the current class scope again.  */
+  if (current_class_type)
+   if (tree ti = CLASSTYPE_TEMPLATE_INFO (current_class_type))
+ if (gen_tmpl == most_general_template (TI_TEMPLATE (ti))
+ && comp_template_args (arglist, TI_ARGS (ti)))
+   return current_class_type;
+
   /* Calculate the BOUND_ARGS.  These will be the args that are
 actually tsubst'd into the definition to create the
 instantiation.  */

base-commit: f16f65f8364b5bf23c72a8fdbba4974ecadc5cb6
-- 
2.27.0



[PATCH 1/1] Fix a typo in an AutoFDO error string

2021-06-11 Thread Eugene Rozenfeld via Gcc-patches
Trivial change, committed to trunk.

[PATCH 1/1] Fix a typo in an AutoFDO error string

gcc/ChangeLog:

* auto-profile.c (read_profile): fix a typo in an error string
---
 gcc/auto-profile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
index 2a6d9a1fc24..a4601243dc9 100644
--- a/gcc/auto-profile.c
+++ b/gcc/auto-profile.c
@@ -939,7 +939,7 @@ read_profile (void)
   unsigned version = gcov_read_unsigned ();
   if (version != AUTO_PROFILE_VERSION)
 {
-  error ("AutoFDO profile version %u does match %u",
+  error ("AutoFDO profile version %u does not match %u",
 version, AUTO_PROFILE_VERSION);
   return;
 }
-- 
2.25.1



Re: [PATCH] Fix effective target for check-builtin-vec_rlnm-runnable.c test

2021-06-11 Thread Segher Boessenkool
Hi!

On Fri, Jun 11, 2021 at 02:19:40PM -0700, Carl Love wrote:
> The gcc test suite compiles and attempts to run the check-builtin-
> vec_rlnm-runnable.c test on Power 8 platforms.  The test should only be
> run on Power 9 and newer platforms.  The attached patch fixes the
> target for the executable test so it only runs on Power 9 and newer
> platforms.

>   * gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c 
> (dg-require-effective-target):
>   Change target to p9vector_hw.

You would put the ( on a new line, lik:
* gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
(dg-require-effective-target): Change target to p9vector_hw.

Okay for trunk like that.  Thanks!


Segher


[PATCH] c++: Don't complain about [[maybe_unused]] on non-static data members

2021-06-11 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 11, 2021 at 10:35:58PM +0200, Jakub Jelinek wrote:
> > But I also agree that GCC shouldn't warn here.
> 
> We could do that by using a wrapper around handle_unused_attribute
> for the maybe_unused attribute, that way warn on unused attribute on
> FIELD_DECLs, but not for maybe_unused (until we actually implement some
> warning that uses it).

Here it is in patch form.
Ok for trunk if it passes bootstrap/regtest?

2021-06-11  Jakub Jelinek  

* tree.c (handle_maybe_unused_attribute): New function.
(std_attribute_table): Use it as handler for maybe_unused attribute.

* g++.dg/cpp1z/maybe_unused2.C: New test.

--- gcc/cp/tree.c.jj2021-05-28 11:03:19.490884332 +0200
+++ gcc/cp/tree.c   2021-06-11 23:41:36.503413441 +0200
@@ -4872,6 +4872,23 @@ handle_likeliness_attribute (tree *node,
 return error_mark_node;
 }
 
+/* The C++17 [[maybe_unused]] attribute maps to GNU unused attribute,
+   except that we don't want -Wattributes to warn for [[maybe_unused]]
+   on non-static data members.  */
+
+static tree
+handle_maybe_unused_attribute (tree *node, tree name, tree args,
+  int flags, bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) == FIELD_DECL)
+{
+  *no_add_attrs = true;
+  return NULL_TREE;
+}
+  else
+return handle_unused_attribute (node, name, args, flags, no_add_attrs);
+}
+
 /* Table of valid C++ attributes.  */
 const struct attribute_spec cxx_attribute_table[] =
 {
@@ -4890,7 +4907,7 @@ const struct attribute_spec std_attribut
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
affects_type_identity, handler, exclude } */
   { "maybe_unused", 0, 0, false, false, false, false,
-handle_unused_attribute, NULL },
+handle_maybe_unused_attribute, NULL },
   { "nodiscard", 0, 1, false, false, false, false,
 handle_nodiscard_attribute, NULL },
   { "no_unique_address", 0, 0, true, false, false, false,
--- gcc/testsuite/g++.dg/cpp1z/maybe_unused2.C.jj   2021-06-11 
23:40:51.742027943 +0200
+++ gcc/testsuite/g++.dg/cpp1z/maybe_unused2.C  2021-06-11 23:40:47.642084225 
+0200
@@ -0,0 +1,7 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wunused -Wextra" }
+
+struct [[maybe_unused]] A {
+  [[maybe_unused]] static int i;
+  [[maybe_unused]] int a;
+};


Jakub



[PATCH] Fix effective target for check-builtin-vec_rlnm-runnable.c test

2021-06-11 Thread Carl Love via Gcc-patches
GCC maintainers:

The gcc test suite compiles and attempts to run the check-builtin-
vec_rlnm-runnable.c test on Power 8 platforms.  The test should only be
run on Power 9 and newer platforms.  The attached patch fixes the
target for the executable test so it only runs on Power 9 and newer
platforms.

The patch was tested on powerpc64-linux instead (Power 8 BE).  The test
harness reports 1 unsupported test.

The patch was also tested on:

powerpc64le-linux instead (Power 9 LE)
powerpc64le-linux instead (Power 10 LE)

The test harness reports 3 expected passes and no failures.

Please let me know if the patch looks OK for mainline.  Thanks.

Carl 

-


The effective target for a Power 9 runnable test should be
p9vector_hw.

2021-06-11  Carl Love  

gcc/testsuite/ChangeLog

* gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c 
(dg-require-effective-target):
Change target to p9vector_hw.
---
 .../gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
index cd67b06afbe..55935eaafd2 100644
--- a/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target p9vector_hw } */
 /* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
 
 /* Verify the vec_rlm and vec_rlmi builtins works correctly.  */
-- 
2.17.1




[PATCH] x86: Replace ix86_red_zone_size with ix86_red_zone_used

2021-06-11 Thread H.J. Lu via Gcc-patches
Add red_zone_used to machine_function to track if red zone is used.
When expanding function prologue, set red_zone_used to true if red
zone is used.

gcc/

PR target/pr101023
* config/i386/i386.c (ix86_expand_prologue): Set red_zone_used
to true if red zone is used.
(ix86_output_indirect_jmp): Replace ix86_red_zone_size with
ix86_red_zone_used.
* config/i386/i386.h (machine_function): Add red_zone_used.
(ix86_red_zone_size): Removed.
(ix86_red_zone_used): New.
* config/i386/i386.md (peephole2 patterns): Replace
ix86_red_zone_size with ix86_red_zone_used.

gcc/testsuite/

PR target/pr101023
* g++.target/i386/pr101023a.C: New test.
* g++.target/i386/pr101023b.C: Likewise.
---
 gcc/config/i386/i386.c|  6 ++-
 gcc/config/i386/i386.h|  5 +-
 gcc/config/i386/i386.md   |  8 +--
 gcc/testsuite/g++.target/i386/pr101023a.C | 62 +++
 gcc/testsuite/g++.target/i386/pr101023b.C |  5 ++
 5 files changed, 80 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr101023a.C
 create mode 100644 gcc/testsuite/g++.target/i386/pr101023b.C

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 05b8dc806cd..a61255857ff 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -8401,10 +8401,14 @@ ix86_expand_prologue (void)
   || frame.stack_pointer_offset < CHECK_STACK_LIMIT))
{
  ix86_emit_save_regs_using_mov (frame.reg_save_offset);
+ cfun->machine->red_zone_used = true;
  int_registers_saved = true;
}
 }
 
+  if (frame.red_zone_size != 0)
+cfun->machine->red_zone_used = true;
+
   if (stack_realign_fp)
 {
   int align_bytes = crtl->stack_alignment_needed / BITS_PER_UNIT;
@@ -15915,7 +15919,7 @@ ix86_output_indirect_jmp (rtx call_op)
 {
   /* We can't have red-zone since "call" in the indirect thunk
  pushes the return address onto stack, destroying red-zone.  */
-  if (ix86_red_zone_size != 0)
+  if (ix86_red_zone_used)
gcc_unreachable ();
 
   ix86_output_indirect_branch (call_op, "%0", true);
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 919d0b2418a..182b3275991 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2663,6 +2663,9 @@ struct GTY(()) machine_function {
  invalid calls.  */
   BOOL_BITFIELD silent_p : 1;
 
+  /* True if red zone is used.  */
+  BOOL_BITFIELD red_zone_used : 1;
+
   /* The largest alignment, in bytes, of stack slot actually used.  */
   unsigned int max_used_stack_alignment;
 
@@ -2693,7 +2696,7 @@ extern GTY(()) tree ms_va_list_type_node;
 #define ix86_current_function_calls_tls_descriptor \
   (ix86_tls_descriptor_calls_expanded_in_cfun && df_regs_ever_live_p (SP_REG))
 #define ix86_static_chain_on_stack (cfun->machine->static_chain_on_stack)
-#define ix86_red_zone_size (cfun->machine->frame.red_zone_size)
+#define ix86_red_zone_used (cfun->machine->red_zone_used)
 
 /* Control behavior of x86_file_start.  */
 #define X86_FILE_START_VERSION_DIRECTIVE false
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 7743c61ec86..6e4abf32e7c 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -20491,7 +20491,7 @@ (define_peephole2
  (clobber (mem:BLK (scratch)))])]
   "(TARGET_SINGLE_PUSH || optimize_insn_for_size_p ())
&& INTVAL (operands[0]) == -GET_MODE_SIZE (word_mode)
-   && ix86_red_zone_size == 0"
+   && !ix86_red_zone_used"
   [(clobber (match_dup 1))
(parallel [(set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))
  (clobber (mem:BLK (scratch)))])])
@@ -20505,7 +20505,7 @@ (define_peephole2
  (clobber (mem:BLK (scratch)))])]
   "(TARGET_DOUBLE_PUSH || optimize_insn_for_size_p ())
&& INTVAL (operands[0]) == -2*GET_MODE_SIZE (word_mode)
-   && ix86_red_zone_size == 0"
+   && !ix86_red_zone_used"
   [(clobber (match_dup 1))
(set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))
(parallel [(set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))
@@ -20520,7 +20520,7 @@ (define_peephole2
  (clobber (reg:CC FLAGS_REG))])]
   "(TARGET_SINGLE_PUSH || optimize_insn_for_size_p ())
&& INTVAL (operands[0]) == -GET_MODE_SIZE (word_mode)
-   && ix86_red_zone_size == 0"
+   && !ix86_red_zone_used"
   [(clobber (match_dup 1))
(set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))])
 
@@ -20532,7 +20532,7 @@ (define_peephole2
  (clobber (reg:CC FLAGS_REG))])]
   "(TARGET_DOUBLE_PUSH || optimize_insn_for_size_p ())
&& INTVAL (operands[0]) == -2*GET_MODE_SIZE (word_mode)
-   && ix86_red_zone_size == 0"
+   && !ix86_red_zone_used"
   [(clobber (match_dup 1))
(set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))
(set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))])
diff --git 

Re: [PATCH] libgccjit: Add function to set the initial value of a global variable [PR96089]

2021-06-11 Thread Antoni Boucher via Gcc-patches
David: this one wasn't reviewed yet by you, so you can review it.

Le jeudi 20 mai 2021 à 21:27 -0400, Antoni Boucher a écrit :
> Hi.
> 
> I made this patch to set an arbitrary value to a global variable.
> 
> This patch suffers from the same issue as inline assembly
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100380), i.e. it
> segfaults if the `rvalue` is created after the global variable.
> It seems to be a design issue so I'm not sure what would be the fix
> for
> it and having it fixed would allow me to test this new function much
> more and see if things are missing (i.e. it might require a way to
> create a constant struct).
> See the link above for an explanation of this issue.
> 
> Thanks for the review.




Re: [PATCH] rs6000: Support doubleword swaps removal in rot64 load store [PR100085]

2021-06-11 Thread Segher Boessenkool
On Thu, Jun 10, 2021 at 03:11:08PM +0800, Xionghu Luo wrote:
> On 2021/6/10 00:24, Segher Boessenkool wrote:
> >>"!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && 
> >> !TARGET_P9_VECTOR
> >> && !altivec_indexed_or_indirect_operand (operands[0], mode)"
> >>[(const_int 0)]
> >> {
> >>rs6000_emit_le_vsx_permute (operands[1], operands[1], mode);
> >>rs6000_emit_le_vsx_permute (operands[0], operands[1], mode);
> >>rs6000_emit_le_vsx_permute (operands[1], operands[1], mode);
> >>DONE;
> >> })
> > 
> > So it seems like it is only 3 insns in the very unlucky case?  Normally
> > it will end up as just one simple store?
> 
> I am afraid there is not "simple store" for *TImode on P8LE*.  There is only
> stxvd2x that rotates the element(stvx requires memory to be aligned, not
> suitable pattern), so every vsx_le_perm_store_v1ti must be split to 3
> instructions for alternative 0, it seems incorrect to force the cost to be 4.

Often it could be done as just two insns though?  If the value stored is
not used elsewhere?

So we could make the first alternative cost 8 then as well, which will
also work out for combine, right?

Alternatively we could have what is now the second alternative be the
first, if that is realistic -- that one already has cost 8 (it is just
two machine instructions).


Segher


Re: [C PATCH] qualifiers of pointers to arrays in C2X [PR 98397]

2021-06-11 Thread Uecker, Martin

(PING. In case you missed this. Sorry, forgot to CC you.)

Am Montag, den 24.05.2021, 08:05 +0200 schrieb Martin Uecker:
> Hi Joseph,
> 
> I found some time to update this patch. The only real change
> of the patch is the qualifier in the conditional expression for
> pointer to arrays in C2X. All the rest are the warnings,
> which were wrong in the last version.
> 
> I hope I got this correct this time in combination with
> -pedantic-errors and -Wc11-c2x-compat. 
> 
> Martin
> 
> 
> 2021-05-16  Martin Uecker  
> 
> gcc/c/
>  PR c/98397
>  * c-typeck.c (comp_target_types): Change pedwarn to pedwarn_c11
>  for pointers to arrays with qualifiers.
>  (build_conditional_expr): For C23 don't lose qualifiers for pointers
>  to arrays when the other pointer is a void pointer. Update warnings.
>  (convert_for_assignment): Update warnings for C2X when converting from
>  void* with qualifiers to a pointer to array with the same qualifiers.
> 
> gcc/testsuite/
>  PR c/98397
>  * gcc.dg/c11-qual-1.c: New test.
>  * gcc.dg/c2x-qual-1.c: New test.
>  * gcc.dg/c2x-qual-2.c: New test.
>  * gcc.dg/c2x-qual-3.c: New test.
>  * gcc.dg/c2x-qual-4.c: New test.
>  * gcc.dg/c2x-qual-5.c: New test.
>  * gcc.dg/c2x-qual-6.c: New test.
>  * gcc.dg/pointer-array-quals-1.c: Remove unnecessary flag.
>  * gcc.dg/pointer-array-quals-2.c: Remove unnecessary flag.
> 
> 
> diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
> index fc64ef96fb8..5b13656c090 100644
> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -1328,8 +1328,8 @@ comp_target_types (location_t location, tree ttl, tree 
> ttr)
>val = comptypes_check_enum_int (mvl, mvr, _and_int_p);
>  
>if (val == 1 && val_ped != 1)
> -pedwarn (location, OPT_Wpedantic, "pointers to arrays with different 
> qualifiers "
> -  "are incompatible in ISO C");
> +pedwarn_c11 (location, OPT_Wpedantic, "invalid use of pointers to arrays 
> with different
> qualifiers "
> +   "in ISO C before C2X");
>  
>if (val == 2)
>  pedwarn (location, OPT_Wpedantic, "types are not quite compatible");
> @@ -5396,39 +5396,40 @@ build_conditional_expr (location_t colon_loc, tree 
> ifexp, bool ifexp_bcp,
>   "used in conditional expression");
> return error_mark_node;
>   }
> -  else if (VOID_TYPE_P (TREE_TYPE (type1))
> -&& !TYPE_ATOMIC (TREE_TYPE (type1)))
> - {
> -   if ((TREE_CODE (TREE_TYPE (type2)) == ARRAY_TYPE)
> -   && (TYPE_QUALS (strip_array_types (TREE_TYPE (type2)))
> -   & ~TYPE_QUALS (TREE_TYPE (type1
> - warning_at (colon_loc, OPT_Wdiscarded_array_qualifiers,
> - "pointer to array loses qualifier "
> - "in conditional expression");
> -
> -   if (TREE_CODE (TREE_TYPE (type2)) == FUNCTION_TYPE)
> +  else if ((VOID_TYPE_P (TREE_TYPE (type1))
> + && !TYPE_ATOMIC (TREE_TYPE (type1)))
> +|| (VOID_TYPE_P (TREE_TYPE (type2))
> +&& !TYPE_ATOMIC (TREE_TYPE (type2
> + {
> +   tree t1 = TREE_TYPE (type1);
> +   tree t2 = TREE_TYPE (type2);
> +   if (!VOID_TYPE_P (t1))
> +{
> +  /* roles are swapped */
> +  t1 = t2;
> +  t2 = TREE_TYPE (type1);
> +}
> +   tree t2_stripped = strip_array_types (t2);
> +   if ((TREE_CODE (t2) == ARRAY_TYPE)
> +   && (TYPE_QUALS (t2_stripped) & ~TYPE_QUALS (t1)))
> + {
> +   if (!flag_isoc2x)
> + warning_at (colon_loc, OPT_Wdiscarded_array_qualifiers,
> + "pointer to array loses qualifier "
> + "in conditional expression");
> +   else if (warn_c11_c2x_compat > 0)
> + warning_at (colon_loc, OPT_Wc11_c2x_compat,
> + "pointer to array loses qualifier "
> + "in conditional expression in ISO C before C2X");
> + }
> +   if (TREE_CODE (t2) == FUNCTION_TYPE)
>   pedwarn (colon_loc, OPT_Wpedantic,
>"ISO C forbids conditional expr between "
>"% and function pointer");
> -   result_type = build_pointer_type (qualify_type (TREE_TYPE (type1),
> -   TREE_TYPE (type2)));
> - }
> -  else if (VOID_TYPE_P (TREE_TYPE (type2))
> -&& !TYPE_ATOMIC (TREE_TYPE (type2)))
> - {
> -   if ((TREE_CODE (TREE_TYPE (type1)) == ARRAY_TYPE)
> -   && (TYPE_QUALS (strip_array_types (TREE_TYPE (type1)))
> -   & ~TYPE_QUALS (TREE_TYPE (type2
> - warning_at (colon_loc, OPT_Wdiscarded_array_qualifiers,
> - "pointer to array loses qualifier "
> - "in conditional expression");
> -
> -   if (TREE_CODE (TREE_TYPE (type1)) == FUNCTION_TYPE)
> - 

Re: [PATCH] c++: Substitute into function parms in lexical order [PR96560]

2021-06-11 Thread Jason Merrill via Gcc-patches

On 4/29/21 7:48 AM, Patrick Palka wrote:

On Wed, 28 Apr 2021, Jason Merrill wrote:


On 4/28/21 2:24 PM, Patrick Palka wrote:

This makes tsubst_arg_types substitute into a function's parameter types
in left-to-right order instead of right-to-left order, in accordance with
DR 1227.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  [ diff generated with -w to hide noisy whitespace changes ]


OK. We'll still substitute lambda default args in reverse order, but that
shouldn't happen in sfinae context, so that shouldn't be a problem.  Maybe add
an assert (complain & tf_error) in the lambda case?


Apparently cpp2a/lambda-uneval2.C trips over such an assert during
deduction for:

   template 
   auto j(T t) -> decltype([](auto x) -> decltype(x.invalid) { } (t));

which makes sense, I suppose.

It looks like we can just move up the default argument processing to
before the recursive call to tsubst_arg_types, as in the below.
Bootstrapped and regtested on x86_64-pc-linux-gnu.


OK.


BTW, now that LAMBDA_EXPR has LAMBDA_EXPR_REGEN_INFO I think we have
enough information to defer substituting into lambda default arguments
until they're actually needed.  Would that be something to look into as
a followup patch?


I think the language currently specifies that they're substituted 
immediately, so probably not unless that changes.



-- >8 --

Subject: [PATCH] c++: Substitute into function parms in lexical order
  [PR96560]

This makes tsubst_arg_types substitute into a function's parameter types
in left-to-right order instead of right-to-left order, in accordance with
DR 1227.

gcc/cp/ChangeLog:

DR 1227
PR c++/96560
* pt.c (tsubst_arg_types): Rearrange so that we substitute into
TYPE_ARG_TYPES in forward order while short circuiting
appropriately.  Adjust formatting.

gcc/testsuite/ChangeLog:

DR 1227
PR c++/96560
* g++.dg/template/sfinae-dr1227.C: New test.
---
  gcc/cp/pt.c   | 51 ++-
  gcc/testsuite/g++.dg/template/sfinae-dr1227.C | 23 +
  2 files changed, 51 insertions(+), 23 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/sfinae-dr1227.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index eaf46659f85..e6d65595e2f 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -15068,20 +15068,13 @@ tsubst_arg_types (tree arg_types,
  tsubst_flags_t complain,
  tree in_decl)
  {
-  tree remaining_arg_types;
tree type = NULL_TREE;
-  int i = 1;
+  int len = 1;
tree expanded_args = NULL_TREE;
-  tree default_arg;
  
if (!arg_types || arg_types == void_list_node || arg_types == end)

  return arg_types;
  
-  remaining_arg_types = tsubst_arg_types (TREE_CHAIN (arg_types),

- args, end, complain, in_decl);
-  if (remaining_arg_types == error_mark_node)
-return error_mark_node;
-
if (PACK_EXPANSION_P (TREE_VALUE (arg_types)))
  {
/* For a pack expansion, perform substitution on the
@@ -15092,7 +15085,7 @@ tsubst_arg_types (tree arg_types,
  
if (TREE_CODE (expanded_args) == TREE_VEC)

  /* So that we'll spin through the parameters, one by one.  */
-i = TREE_VEC_LENGTH (expanded_args);
+   len = TREE_VEC_LENGTH (expanded_args);
else
  {
/* We only partially substituted into the parameter
@@ -15101,14 +15094,15 @@ tsubst_arg_types (tree arg_types,
expanded_args = NULL_TREE;
  }
  }
+  else
+type = tsubst (TREE_VALUE (arg_types), args, complain, in_decl);
  
-  while (i > 0) {

---i;
-
+  /* Check if a substituted type is erroneous before substituting into
+ the rest of the chain.  */
+  for (int i = 0; i < len; i++)
+{
if (expanded_args)
type = TREE_VEC_ELT (expanded_args, i);
-else if (!type)
-  type = tsubst (TREE_VALUE (arg_types), args, complain, in_decl);
  
if (type == error_mark_node)

return error_mark_node;
@@ -15122,15 +15116,12 @@ tsubst_arg_types (tree arg_types,
}
  return error_mark_node;
}
-
-/* Do array-to-pointer, function-to-pointer conversion, and ignore
-   top-level qualifiers as required.  */
-type = cv_unqualified (type_decays_to (type));
+}
  
/* We do not substitute into default arguments here.  The standard

   mandates that they be instantiated only when needed, which is
   done in build_over_call.  */
-default_arg = TREE_PURPOSE (arg_types);
+  tree default_arg = TREE_PURPOSE (arg_types);
  
/* Except that we do substitute default arguments under tsubst_lambda_expr,

   since the new op() won't have any associated template arguments for us
@@ -15139,20 +15130,34 @@ tsubst_arg_types (tree arg_types,
  default_arg = tsubst_copy_and_build (default_arg, args, complain, in_decl,
 false/*fn*/, 

Re: [PATCH] Document that -fno-trampolines is for Ada only [PR100735]

2021-06-11 Thread Martin Uecker via Gcc-patches



> 
> Jeff et. al.
> 
> > On 9 Jun 2021, at 17:23, Jeff Law via Gcc-patches  
> > wrote:
> > On 5/25/2021 2:23 PM, Paul Eggert wrote:
> >> The GCC manual's documentation of -fno-trampolines was apparently
> >> written from an Ada point of view. However, when I read it I
> >> understandably mistook it to say that -fno-trampolines also works for
> >> C, C++, etc. It doesn't: it is silently ignored for these languages,
> >> and I assume for any language other than Ada.
> >> 
> >> This confusion caused me to go in the wrong direction in a Gnulib
> >> dicussion, as I mistakenly thought that entire C apps with nested
> >> functions could be compiled with -fno-trampolines and then use nested
> >> C functions in stack overflow handlers where the alternate stack
> >> is allocated via malloc. I was wrong, as this won't work on common
> >> platforms like x86-64 where malloc yields non-executable storage.
> >> 
> >> gcc/
> >> * doc/invoke.texi (Code Gen Options):
> >> * doc/tm.texi.in (Trampolines):
> >> Document that -fno-trampolines and -ftrampolines work
> >> only with Ada.
> > So Martin Uecker probably has the most state on this.  IIRC when we last 
> > discussed -fno-
> trampolines the belief was that it could be easily made to work independent 
> of the language, but
> that it was ultimately an ABI change.   That ultimately derailed plans to use 
> -fno-trampolines for
> other languages in the immediate term.
> 
> This is correct, it’s not technically too hard to make it work for another 
> language (I have a hack
> in my arm64-darwin branch that does this for gfortran).  As noted for most 
> ports it is an ABI
> break and thus not usable outside a such a work-around.
> 
> For the record (for the arm64-darwin port in the first instance), together 
> with some of my friends
> at embecosm we plan to implement a solution to the trampoline that does not 
> require executable
> stack and does not require an ABI break.  Perhaps such a solution will be of 
> interest to other
> ports that do not want executable stack.
> 
> We’re not quite ready to post the design yet - but will do so in the next few 
> weeks (all being
> well).
> 

For reference the discussion and PATCH for C can be found here:

https://gcc.gnu.org/legacy-ml/gcc-patches/2018-12/msg01532.html


FWIW: There is a proposal discussed in WG14 to add lambdas.

In this context a more general and portable solution to this
problem would also be useful. 

One related idea is to add a wider function pointer type
to C that includes a data pointer.  A regular function
pointer could be converted to the wider pointer but
back conversion only works if it originally was a
regular function pointer.  This pointer type could then
also be used for interoperablity with other languages
that have callable objects or closures.

If GCC could add something like this and there is then
implementation experience, we could later try to
standardize it. (there was also positive feedback
to this idea from some C++ people I spoke to)


Martin

 









Re: [PATCH,rs6000] Do not check if SMS succeeds on powerpc

2021-06-11 Thread Segher Boessenkool
Hi!

On Fri, Jun 11, 2021 at 01:56:58PM -0500, Aaron Sawdey wrote:
> These tests have become unstable and SMS either succeeds or doesn't
> depending on things like changes in instruction latency. Removing
> the scan-rtl-dump-times checks for powerpc*-*-*.
> 
> If bootstrap/regtest is passes, ok for trunk and backport to 11?

Yes.  Thank you!


Segher


[PATCH,rs6000] Do not check if SMS succeeds on powerpc

2021-06-11 Thread Aaron Sawdey via Gcc-patches
These tests have become unstable and SMS either succeeds or doesn't
depending on things like changes in instruction latency. Removing
the scan-rtl-dump-times checks for powerpc*-*-*.

If bootstrap/regtest is passes, ok for trunk and backport to 11?

Thanks!
   Aaron

gcc/testsuite

* gcc.dg/sms-1.c: Remove scan-rtl-dump-times check.
* gcc.dg/sms-2.c: Remove scan-rtl-dump-times check.
* gcc.dg/sms-3.c: Remove scan-rtl-dump-times check.
* gcc.dg/sms-4.c: Remove scan-rtl-dump-times check.
* gcc.dg/sms-6.c: Remove scan-rtl-dump-times check.
* gcc.dg/sms-8.c: Remove scan-rtl-dump-times check.
* gcc.dg/sms-10.c: Remove scan-rtl-dump-times check.
---
 gcc/testsuite/gcc.dg/sms-1.c  | 2 --
 gcc/testsuite/gcc.dg/sms-10.c | 3 ---
 gcc/testsuite/gcc.dg/sms-2.c  | 2 --
 gcc/testsuite/gcc.dg/sms-3.c  | 3 ---
 gcc/testsuite/gcc.dg/sms-4.c  | 3 ---
 gcc/testsuite/gcc.dg/sms-6.c  | 2 --
 gcc/testsuite/gcc.dg/sms-8.c  | 4 
 7 files changed, 19 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/sms-1.c b/gcc/testsuite/gcc.dg/sms-1.c
index 26868c34c71..098e1aa6e45 100644
--- a/gcc/testsuite/gcc.dg/sms-1.c
+++ b/gcc/testsuite/gcc.dg/sms-1.c
@@ -40,5 +40,3 @@ main ()
   return 0;
 }
 
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms"  { target 
powerpc*-*-* } } } */
-
diff --git a/gcc/testsuite/gcc.dg/sms-10.c b/gcc/testsuite/gcc.dg/sms-10.c
index d85e8e2a274..df3bba24ed0 100644
--- a/gcc/testsuite/gcc.dg/sms-10.c
+++ b/gcc/testsuite/gcc.dg/sms-10.c
@@ -113,6 +113,3 @@ main ()
 
   return 0;
 }
-
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms" { target 
powerpc*-*-* } } } */
-
diff --git a/gcc/testsuite/gcc.dg/sms-2.c b/gcc/testsuite/gcc.dg/sms-2.c
index 7b96f550262..f8375f9f05d 100644
--- a/gcc/testsuite/gcc.dg/sms-2.c
+++ b/gcc/testsuite/gcc.dg/sms-2.c
@@ -31,5 +31,3 @@ fun (nb)
  sy = 0;
   }
 }
-
-/* { dg-final { scan-rtl-dump-times "SMS loop many exits" 1 "sms" { target 
powerpc*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/sms-3.c b/gcc/testsuite/gcc.dg/sms-3.c
index 822b516af2f..5e56ecf761c 100644
--- a/gcc/testsuite/gcc.dg/sms-3.c
+++ b/gcc/testsuite/gcc.dg/sms-3.c
@@ -38,6 +38,3 @@ main ()
   foo (6, 3);
   return 0;
 }
-
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms" { target 
powerpc*-*-* } } } */
-
diff --git a/gcc/testsuite/gcc.dg/sms-4.c b/gcc/testsuite/gcc.dg/sms-4.c
index f5ebb55a2f4..8416b8b9ce9 100644
--- a/gcc/testsuite/gcc.dg/sms-4.c
+++ b/gcc/testsuite/gcc.dg/sms-4.c
@@ -34,6 +34,3 @@ main ()
   foo (5, a, b, c, dst);
   return 0;
 }
-
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms" { target 
powerpc*-*-* } } } */
-
diff --git a/gcc/testsuite/gcc.dg/sms-6.c b/gcc/testsuite/gcc.dg/sms-6.c
index e57e01539eb..d6fa45a2cf9 100644
--- a/gcc/testsuite/gcc.dg/sms-6.c
+++ b/gcc/testsuite/gcc.dg/sms-6.c
@@ -41,5 +41,3 @@ int main()
   
   return 0;
 }
-
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 3 "sms" { target 
powerpc*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/sms-8.c b/gcc/testsuite/gcc.dg/sms-8.c
index 7ccaa454125..dc0a3fc1f9b 100644
--- a/gcc/testsuite/gcc.dg/sms-8.c
+++ b/gcc/testsuite/gcc.dg/sms-8.c
@@ -34,7 +34,3 @@ main ()
   res = foo (3, 4);
   return 0;
 }
-
-/* { dg-final { scan-rtl-dump-times "SMS succeeded" 1 "sms" { target 
powerpc*-*-* } } } */
-
-
-- 
2.27.0



Re: GCC documentation: porting to Sphinx

2021-06-11 Thread Koning, Paul via Gcc-patches



> On Jun 11, 2021, at 11:50 AM, Joseph Myers  wrote:
> 
> ...
> 
> "make" at top level should build all the info manuals and man pages, as at 
> present (if a suitable Sphinx version is installed), and "make install" 
> should install them, in the same directories as at present.
> 
> "make html" at top level should build all the HTML manuals, and "make 
> install-html" should install them.
> 
> "make pdf" and "make install-pdf" at top level should work likewise.
> 
> "make install-html" and "make install-pdf" should put things under 
> $(DESTDIR)$(htmldir) and $(DESTDIR)$(pdfdir) as at present.

And in addition, it would be nice to have additional make  and make 
install- targets for other output formats that Sphinx can generate for us, 
at least some of them.  "epub" comes to mind as an example I would like to have.

paul



PING [PATCH] PR fortran/95502 - ICE in gfc_check_do_variable, at fortran/parse.c:4446

2021-06-11 Thread Harald Anlauf via Gcc-patches
*PING*

BTW this patch also addresses PR95501, so I'll update the Changelog.

Thanks,
Harald

> Gesendet: Freitag, 04. Juni 2021 um 23:12 Uhr
> Von: "Harald Anlauf" 
> An: "fortran" , "gcc-patches" 
> Betreff: [PATCH] PR fortran/95502 - ICE in gfc_check_do_variable, at 
> fortran/parse.c:4446
>
> ICE-on-invalid issues during error recovery.  Testcase by Gerhard,
> initial patch by Steve.  I found another variant which needed an
> additional fix for a NULL pointer dereference.
>
> Regtested on x86_64-pc-linux-gnu.
>
> OK for mainline / 11-branch?
>
> Thanks,
> Harald
>
>
> Fortran - ICE in gfc_check_do_variable, at fortran/parse.c:4446
>
> Avoid NULL pointer dereferences during error recovery.
>
> gcc/fortran/ChangeLog:
>
>   PR fortran/95502
>   * expr.c (gfc_check_pointer_assign): Avoid NULL pointer
>   dereference.
>   * match.c (gfc_match_pointer_assignment): Likewise.
>   * parse.c (gfc_check_do_variable): Avoid comparison with NULL
>   symtree.
>
> gcc/testsuite/ChangeLog:
>
>   PR fortran/95502
>   * gfortran.dg/pr95502.f90: New test.
>
>


[committed] d: foreach over a tuple doesn't work on 16-bit targets (PR100999)

2021-06-11 Thread Iain Buclaw via Gcc-patches
Hi,

This patch improves semantic passes in the front-end around the
`foreach' and `static foreach' statements to be more resilient to
compiling in a minimal D runtime environment.  Checking of the index
type has been improved as well so now there won't be needless compiler
errors when using 8 or 16-bit integers as index types when the size fits
the expected loop range.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32,
committed to mainline, and backported to the gcc-10 and gcc-11 release
branches.

Regards,
Iain.

---
gcc/d/ChangeLog:

PR d/100999
* dmd/MERGE: Merge upstream dmd 7a3808254.

libphobos/ChangeLog:

PR d/100999
* src/MERGE: Merge upstream phobos 55bb17543.
---
 gcc/d/dmd/MERGE   |  2 +-
 gcc/d/dmd/cond.c  | 29 +---
 gcc/d/dmd/dinterpret.c|  9 +++
 gcc/d/dmd/expression.c|  2 +-
 gcc/d/dmd/expressionsem.c | 12 ++--
 gcc/d/dmd/statementsem.c  | 36 +-
 .../compilable/extra-files/minimal/object.d   |  1 +
 .../gdc.test/compilable/interpret5.d  | 30 
 gcc/testsuite/gdc.test/compilable/minimal3.d  | 36 ++
 .../gdc.test/compilable/staticforeach.d   | 38 ++
 gcc/testsuite/gdc.test/compilable/test21742.d | 13 
 gcc/testsuite/gdc.test/compilable/test22006.d | 14 
 .../gdc.test/fail_compilation/b12504.d| 64 +
 .../gdc.test/fail_compilation/diag16976.d | 69 ++-
 .../gdc.test/fail_compilation/fail117.d   |  6 +-
 .../gdc.test/fail_compilation/fail22006.d | 22 ++
 .../gdc.test/fail_compilation/fail238_m32.d   |  8 +--
 .../gdc.test/fail_compilation/fail238_m64.d   |  8 +--
 .../gdc.test/fail_compilation/fail7424b.d |  2 +-
 .../gdc.test/fail_compilation/fail7424c.d |  2 +-
 .../gdc.test/fail_compilation/fail7424d.d |  2 +-
 .../gdc.test/fail_compilation/fail7424e.d |  2 +-
 .../gdc.test/fail_compilation/fail7424f.d |  2 +-
 .../gdc.test/fail_compilation/fail7424g.d |  2 +-
 .../gdc.test/fail_compilation/fail7424h.d |  2 +-
 .../gdc.test/fail_compilation/fail7424i.d |  2 +-
 .../gdc.test/fail_compilation/fail9766.d  |  4 +-
 .../gdc.test/fail_compilation/ice9406.d   |  3 +-
 .../gdc.test/fail_compilation/test21927.d | 20 ++
 .../gdc.test/fail_compilation/test21939.d |  9 +++
 libphobos/src/MERGE   |  2 +-
 libphobos/src/std/typecons.d  | 15 ++--
 32 files changed, 388 insertions(+), 80 deletions(-)
 create mode 100644 
gcc/testsuite/gdc.test/compilable/extra-files/minimal/object.d
 create mode 100644 gcc/testsuite/gdc.test/compilable/interpret5.d
 create mode 100644 gcc/testsuite/gdc.test/compilable/minimal3.d
 create mode 100644 gcc/testsuite/gdc.test/compilable/test21742.d
 create mode 100644 gcc/testsuite/gdc.test/compilable/test22006.d
 create mode 100644 gcc/testsuite/gdc.test/fail_compilation/b12504.d
 create mode 100644 gcc/testsuite/gdc.test/fail_compilation/fail22006.d
 create mode 100644 gcc/testsuite/gdc.test/fail_compilation/test21927.d
 create mode 100644 gcc/testsuite/gdc.test/fail_compilation/test21939.d

diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/MERGE
index a617f285eac..d20785d9126 100644
--- a/gcc/d/dmd/MERGE
+++ b/gcc/d/dmd/MERGE
@@ -1,4 +1,4 @@
-4a4e46a6f304a667e0c05d4455706ec2056ffddc
+7a3808254878df8cb70a055bea58afc79187b778
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/dmd repository.
diff --git a/gcc/d/dmd/cond.c b/gcc/d/dmd/cond.c
index 6f112ad909c..6c7dc9eb1a3 100644
--- a/gcc/d/dmd/cond.c
+++ b/gcc/d/dmd/cond.c
@@ -112,6 +112,7 @@ static void lowerArrayAggregate(StaticForeach *sfe, Scope 
*sc)
 sfe->aggrfe->aggr = new TupleExp(aggr->loc, es);
 sfe->aggrfe->aggr = expressionSemantic(sfe->aggrfe->aggr, sc);
 sfe->aggrfe->aggr = sfe->aggrfe->aggr->optimize(WANTvalue);
+sfe->aggrfe->aggr = sfe->aggrfe->aggr->ctfeInterpret();
 }
 else
 {
@@ -198,7 +199,8 @@ static TypeStruct *createTupleType(Loc loc, Expressions *e)
 Type *ty = new TypeTypeof(loc, new TupleExp(loc, e));
 sdecl->members->push(new VarDeclaration(loc, ty, fid, NULL));
 TypeStruct *r = (TypeStruct *)sdecl->type;
-r->vtinfo = TypeInfoStructDeclaration::create(r); // prevent typeinfo from 
going to object file
+if (global.params.useTypeInfo && Type::dtypeinfo)
+r->vtinfo = TypeInfoStructDeclaration::create(r); // prevent typeinfo 
from going to object file
 return r;
 }
 
@@ -312,15 +314,25 @@ static void lowerNonArrayAggregate(StaticForeach *sfe, 
Scope *sc)
 Identifier *idres = Identifier::generateId("__res");
 VarDeclaration *vard = new VarDeclaration(aloc, aty, idres, NULL);
 Statements *s2 = new Statements();
-s2->push(new ExpStatement(aloc, vard));
-Expression *catass = new 

[committed] libstdc++: Fix filesystem::path comparisons for C++23

2021-06-11 Thread Jonathan Wakely via Gcc-patches
In C++23 there is a basic_string_view(Range&&) constructor, which is
constrained to take a range (specifically, a contiguous_range). When the
filesystem::path comparison operators call lhs.compare(rhs) the overload
taking a string_view is considered, which means checking whether path
satisfies the range concept. That satisfaction result changes depending
whether path::iterator is complete, which is ill-formed; no diagnostic
required. To avoid the problem, this change ensures that the overload
resolution is performed in a context where path::iterator is complete
and the range concept is satisfied. (The result of overload resolution
is always that the compare(const path&) overload is the best match, but
we still have to consider the compare(basic_string_view) one
to decide if it even participates in overload resolution).

For std::filesystem::path we can't define the comparison operators later
in the file, because they are hidden friends, so a new helper is
introduced that gets defined when everything else is complete.

For std::experimental::filesystem::path we can just move the definitions
of the comparison operators later in the file.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (operator==, operator<=>): Use new
_S_compare function.
(path::_S_compare): New function to call path::compare in a
context where path::iterator is complete.
* include/experimental/bits/fs_path.h (operator<, operator==):
Define after path::iterator is complete.
* testsuite/27_io/filesystem/path/native/conv_c++23.cc: New
test.
* testsuite/experimental/filesystem/path/native/conv_c++23.cc:
New test.

Tested powerpc64le-linux. Committed to trunk.

This also needs to be backported to gcc-11 where the string_view
constructor is present for C++23 mode.

commit 1e690757d30775ed340a368b9a9463b2ad68de01
Author: Jonathan Wakely 
Date:   Fri Jun 11 19:18:11 2021

libstdc++: Fix filesystem::path comparisons for C++23

In C++23 there is a basic_string_view(Range&&) constructor, which is
constrained to take a range (specifically, a contiguous_range). When the
filesystem::path comparison operators call lhs.compare(rhs) the overload
taking a string_view is considered, which means checking whether path
satisfies the range concept. That satisfaction result changes depending
whether path::iterator is complete, which is ill-formed; no diagnostic
required. To avoid the problem, this change ensures that the overload
resolution is performed in a context where path::iterator is complete
and the range concept is satisfied. (The result of overload resolution
is always that the compare(const path&) overload is the best match, but
we still have to consider the compare(basic_string_view) one
to decide if it even participates in overload resolution).

For std::filesystem::path we can't define the comparison operators later
in the file, because they are hidden friends, so a new helper is
introduced that gets defined when everything else is complete.

For std::experimental::filesystem::path we can just move the definitions
of the comparison operators later in the file.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (operator==, operator<=>): Use new
_S_compare function.
(path::_S_compare): New function to call path::compare in a
context where path::iterator is complete.
* include/experimental/bits/fs_path.h (operator<, operator==):
Define after path::iterator is complete.
* testsuite/27_io/filesystem/path/native/conv_c++23.cc: New
test.
* testsuite/experimental/filesystem/path/native/conv_c++23.cc:
New test.

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index ad2518f414c..5e285204527 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -513,13 +513,13 @@ namespace __detail
 
 /// Compare paths
 friend bool operator==(const path& __lhs, const path& __rhs) noexcept
-{ return __lhs.compare(__rhs) == 0; }
+{ return path::_S_compare(__lhs, __rhs) == 0; }
 
 #if __cpp_lib_three_way_comparison
 /// Compare paths
 friend strong_ordering
 operator<=>(const path& __lhs, const path& __rhs) noexcept
-{ return __lhs.compare(__rhs) <=> 0; }
+{ return path::_S_compare(__lhs, __rhs) <=> 0; }
 #else
 /// Compare paths
 friend bool operator!=(const path& __lhs, const path& __rhs) noexcept
@@ -627,6 +627,11 @@ namespace __detail
   static basic_string<_CharT, _Traits, _Allocator>
   _S_str_convert(basic_string_view, const _Allocator&);
 
+// Returns lhs.compare(rhs), but defined after path::iterator is complete.
+__attribute__((__always_inline__))
+static int
+  

Re: [PATCH] libgccjit: Handle truncation and extension for casts [PR 95498]

2021-06-11 Thread David Malcolm via Gcc-patches
On Thu, 2021-05-27 at 21:22 -0400, Antoni Boucher wrote:
> Here's the patch with the condition removed.
> I believe everything is now fixed.
> Thanks!

Thanks; this looks good to me.  Is this the latest version of the
patch; would you like me to apply it?

Dave

> 
> Le jeudi 27 mai 2021 à 18:21 -0400, David Malcolm a écrit :
> > On Tue, 2021-05-25 at 20:16 -0400, Antoni Boucher wrote:
> > > I updated the patch according to the comments by Tom Tromey.
> > > 
> > > There's one question left about your question regarding
> > > C_MAYBE_CONST_EXPR, David:
> > > 
> > > I am not sure if we can get a C_MAYBE_CONST_EXPR from libgccjit,
> > > and
> > > it
> > > indeed seems like it's only created in c-family.
> > > However, we do use it in libgccjit here:
> > > https://github.com/gcc-mirror/gcc/blob/master/gcc/jit/jit-playback.c#L1180
> > > 
> > > I tried removing the condition `if (TREE_CODE (t_ret) !=
> > > C_MAYBE_CONST_EXPR)` and all the tests of libgccjit still pass.
> > > 
> > > That code was copied from here:
> > > https://github.com/gcc-mirror/gcc/blob/master/gcc/c/c-convert.c#L175
> > > and might not be needed in libgccjit.
> > > 
> > > Should I just remove the condition, then?
> > 
> > I think so.
> > 
> > Thanks
> > Dave
> > 
> > > 
> > > Le jeudi 13 mai 2021 à 19:58 -0400, David Malcolm a écrit :
> > > > On Thu, 2021-05-13 at 19:31 -0400, Antoni Boucher wrote:
> > > > > Thanks for your answer.
> > > > > 
> > > > > See my answers below:
> > > > > 
> > > > > Le jeudi 13 mai 2021 à 18:13 -0400, David Malcolm a écrit :
> > > > > > On Sat, 2021-02-20 at 17:17 -0500, Antoni Boucher via Gcc-
> > > > > > patches
> > > > > > wrote:
> > > > > > > Hi.
> > > > > > > Thanks for your feedback!
> > > > > > > 
> > > > > > 
> > > > > > Sorry about the delay in responding.
> > > > > > 
> > > > > > In the past I was hesitant about adding more cast support
> > > > > > to
> > > > > > libgccjit
> > > > > > since I felt that the user could always just create a union
> > > > > > to
> > > > > > do
> > > > > > the
> > > > > > cast.  Then I tried actually using the libgccjit API to do
> > > > > > this,
> > > > > > and
> > > > > > realized how much work it adds, so I now think we do want
> > > > > > to
> > > > > > support
> > > > > > casting more types.
> > > > > > 
> > > > > > 
> > > > > > > See answers below:
> > > > > > > 
> > > > > > > On Sat, Feb 20, 2021 at 11:20:35AM -0700, Tom Tromey
> > > > > > > wrote:
> > > > > > > > > > > > > "Antoni" == Antoni Boucher via Gcc-patches
> > > > > > > > > > > > > <   
> > > > > > > > > > > > > gcc-patches@gcc.gnu.org> writes:
> > > > > > > > 
> > > > > > > > Antoni> gcc/jit/
> > > > > > > > Antoni> PR target/95498
> > > > > > > > Antoni> * jit-playback.c: Add support to handle
> > > > > > > > truncation
> > > > > > > > and extension
> > > > > > > > Antoni> in the convert function.
> > > > > > > > 
> > > > > > > > Antoni> +  switch (dst_code)
> > > > > > > > Antoni> +    {
> > > > > > > > Antoni> +    case INTEGER_TYPE:
> > > > > > > > Antoni> +    case ENUMERAL_TYPE:
> > > > > > > > Antoni> +  t_ret = convert_to_integer (dst_type,
> > > > > > > > expr);
> > > > > > > > Antoni> +  goto maybe_fold;
> > > > > > > > Antoni> +
> > > > > > > > Antoni> +    default:
> > > > > > > > Antoni> +  gcc_assert
> > > > > > > > (gcc::jit::active_playback_ctxt);
> > > > > > > > Antoni> +  gcc::jit::active_playback_ctxt-
> > > > > > > > >add_error
> > > > > > > > (NULL,
> > > > > > > > "unhandled conversion");
> > > > > > > > Antoni> +  fprintf (stderr, "input expression:\n");
> > > > > > > > Antoni> +  debug_tree (expr);
> > > > > > > > Antoni> +  fprintf (stderr, "requested type:\n");
> > > > > > > > Antoni> +  debug_tree (dst_type);
> > > > > > > > Antoni> +  return error_mark_node;
> > > > > > > > Antoni> +
> > > > > > > > Antoni> +    maybe_fold:
> > > > > > > > Antoni> +  if (TREE_CODE (t_ret) !=
> > > > > > > > C_MAYBE_CONST_EXPR)
> > > > > > 
> > > > > > Do we even get C_MAYBE_CONST_EXPR in libgccjit?  That tree
> > > > > > code
> > > > > > is
> > > > > > defined in c-family/c-common.def; how can nodes of that
> > > > > > kind
> > > > > > be
> > > > > > created
> > > > > > outside of the c-family?
> > > > > 
> > > > > I am not sure, but that seems like it's only created in c-
> > > > > family
> > > > > indeed.
> > > > > However, we do use it in libgccjit here:
> > > > > 
> > > > > https://github.com/gcc-mirror/gcc/blob/master/gcc/jit/jit-playback.c#L1180
> > > > > 
> > > > > > 
> > > > > > > > Antoni> +   t_ret = fold (t_ret);
> > > > > > > > Antoni> +  return t_ret;
> > > > > > > > 
> > > > > > > > It seems weird to have a single 'goto' to maybe_fold,
> > > > > > > > especially
> > > > > > > > inside
> > > > > > > > a switch like this.
> > > > > > > > 
> > > > > > > > If you think the maybe_fold code won't be reused, then
> > > > > > > > it
> > > > > > > > should
> > > > > > > > just
> > > > > > > > be
> > > > > > > > hoisted 

Re: [PATCH 0/3]: C N2653 char8_t implementation

2021-06-11 Thread Joseph Myers
On Fri, 11 Jun 2021, Tom Honermann via Gcc-patches wrote:

> The option is needed because it impacts core language backward compatibility
> (for both C and C++, the type of u8 string literals; for C++, the type of u8
> character literals and the new char8_t fundamental type).

Lots of new features in new standard versions can affect backward 
compatibility.  We generally bundle all of those up into a single -std 
option rather than having an explosion of different language variants with 
different features enabled or disabled.  I don't think this feature, for 
C, reaches the threshold that would justify having a separate option to 
control it, especially given that people can use -Wno-pointer-sign or 
pointer casts or their own local char8_t typedef as an intermediate step 
if they want code using u8"" strings to work for both old and new standard 
versions.

I don't think u8"" strings are widely used in C library headers in a way 
where the choice of type matters.  (Use of a feature in library headers is 
a key thing that can justify options such as -fgnu89-inline, because it 
means the choice of language version is no longer fully under control of a 
single project.)

The only feature proposed for C2x that I think is likely to have 
significant compatibility implications in practice for a lot of code is 
making bool, true and false into keywords.  I still don't think a separate 
option makes sense there.  (If that feature is accepted for C2x, what 
would be useful is for people to do distribution rebuilds with -std=gnu2x 
as the default to find and fix code that breaks, in advance of the default 
actually changing in GCC.  But the workaround for not-yet-fixed code would 
be -std=gnu11, not a separate option for that one feature.)

> > I think the whole patch series would best wait until after the proposal
> > has been considered by a WG14 meeting, in addition to not increasing the
> > number of language dialects supported.
> 
> As an opt-in feature, this is useful to gain implementation and deployment
> experience for WG14.

I think this feature is one of the cases where experience in C++ is 
sufficiently relevant for C (although there are certainly cases of other 
language features where the languages are sufficiently different that 
using C++ experience like that can be problematic).

E.g. we didn't need -fdigit-separators for C before digit separators were 
added to C2x, and we don't need -fno-digit-separators now they are in C2x 
(the feature is just enabled or disabled based on the language version), 
although that's one of many features that do affect compatibility in 
corner cases.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-11 Thread Kees Cook via Gcc-patches
On Fri, Jun 11, 2021 at 01:04:09PM +0200, Richard Biener wrote:
> On Tue, 8 Jun 2021, Kees Cook wrote:
> 
> > On Tue, Jun 08, 2021 at 09:41:38AM +0200, Richard Biener wrote:
> > > On Mon, 7 Jun 2021, Qing Zhao wrote:
> > > 
> > > > Hi, 
> > > > 
> > > > > On Jun 7, 2021, at 2:53 AM, Richard Biener  wrote:
> > > > > 
> > > > >> 
> > > > >> To address the above suggestion:
> > > > >> 
> > > > >> My study shows: the call to __builtin_clear_padding is expanded 
> > > > >> during gimplification phase.
> > > > >> And there is no __bultin_clear_padding expanding during rtx 
> > > > >> expanding phase.
> > > > >> However, for -ftrivial-auto-var-init, padding initialization should 
> > > > >> be done both in gimplification phase and rtx expanding phase.
> > > > >> since the __builtin_clear_padding might not be good for rtx 
> > > > >> expanding, reusing __builtin_clear_padding might not work.
> > > > >> 
> > > > >> Let me know if you have any more comments on this.
> > > > > 
> > > > > Yes, I didn't suggest to literally emit calls to 
> > > > > __builtin_clear_padding 
> > > > > but instead to leverage the lowering code, more specifically share the
> > > > > code that figures _what_ is to be initialized (where the padding is)
> > > > > and eventually the actual code generation pieces.  That might need 
> > > > > some
> > > > > refactoring but the code where padding resides should be present only
> > > > > a single time (since it's quite complex).
> > > > 
> > > > Okay, I see your point here.
> > > > 
> > > > > 
> > > > > Which is also why I suggested to split out the padding initialization
> > > > > bits to a separate patch (and option).
> > > > 
> > > > Personally, I am okay with splitting padding initialization from this 
> > > > current patch,
> > > > Kees, what’s your opinion on this? i.e, the current 
> > > > -ftrivial-auto-var-init will NOT initialize padding, we will add 
> > > > another option to 
> > > > Explicitly initialize padding.
> > > 
> > > It would also be possible to have -fauto-var-init, -fauto-var-init-padding
> > > and have -ftrivial-auto-var-init for clang compatibility enabling both.
> > 
> > Sounds good to me!
> > 
> > > Or -fauto-var-init={zero,pattern,padding} and allow
> > > -fauto-var-init=pattern,padding to be specified.  Note there's also
> > > padding between auto variables on the stack - that "trailing"
> > > padding isn't initialized either?  (yes, GCC sorts variables to minimize
> > > that padding)  For example for
> > > 
> > > void foo()
> > > {
> > >   char a[3];
> > >   bar (a);
> > > }
> > > 
> > > there's 12 bytes padding after 'a', shouldn't we initialize that?  If not,
> > > why's other padding important to be initialized?
> > 
> > This isn't a situation that I'm aware of causing real-world problems.
> > The issues have all come from padding within an addressable object. I
> > haven't tested Clang's behavior on this (and I have no kernel tests for
> > this padding), but I do check for trailing padding, like:
> > 
> > struct test_trailing_hole {
> > char *one;
> > char *two;
> > char *three;
> > char four;
> > /* "sizeof(unsigned long) - 1" byte padding hole here. */
> > };
> 
> Any justification why tail padding for
> 
>  struct foo { double x; char x[3]; } a;
> 
> is important but not for
> 
>  char x[3];
> 
> ?  It does look like an odd inconsistency to me.

The problem is with sizeof() and the various compounding results related
to it. Namely, things that do whole-struct copies (which is unfortunately
common in the kernel) will include the padding for "a" since it is within
the object, as represented by sizeof(), but not for x:

#include 

int main(void)
{
struct foo { double y; char x[3]; } a;
char x[3];

printf("a: %zu (a.y: %zu, a.x: %zu)\n", sizeof(a), sizeof(a.y), 
sizeof(a.x));
printf("x: %zu\n", sizeof(x));

return 0;
}

a: 16 (a.y: 8, a.x: 3)
x: 3

And it gets worse with structs-within-structs, etc.

-- 
Kees Cook


Re: [PATCH v2] inline-asm: Fix ICE with bitfields in "m" operands [PR100785]

2021-06-11 Thread Joseph Myers
On Fri, 11 Jun 2021, Jakub Jelinek via Gcc-patches wrote:

> gcc/c/
>   * c-typeck.c (c_mark_addressable): Diagnose trying to make
>   bit-fields addressable.

The C front-end changes are OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 1/4] introduce diagnostic infrastructure changes (PR 98512)

2021-06-11 Thread David Malcolm via Gcc-patches
On Thu, 2021-06-10 at 17:26 -0600, Martin Sebor wrote:
> This diff introduces the diagnostic infrastructure changes to support
> controlling warnings at any call site in the inlining stack and
> printing
> the inlining context without the %K and %G directives.

Thanks for working on this, looks very promising.

> Improve warning suppression for inlined functions.
> 
> Resolves:
> PR middle-end/98871 - Cannot silence -Wmaybe-uninitialized at declaration site
> PR middle-end/98512 - #pragma GCC diagnostic ignored ineffective in 
> conjunction with alias attribute

Am I right in thinking that you add test coverage for both of these in
patch 2 of the kit?

> 
> gcc/ChangeLog:
> 
>   * diagnostic.c (update_inlining_context): New.
>   (update_effective_level_from_pragmas): Handle inlining context.
>   (diagnostic_report_diagnostic): Same.
>   * diagnostic.h (struct diagnostic_info): Add ctor.
>   (struct diagnostic_context): Add members.
>   * tree-diagnostic.c (get_inlining_locations): New.
>   (set_inlining_location): New.
>   (tree_diagnostics_defaults): Set new callback pointers.

[..snip...]

> @@ -1204,7 +1256,7 @@ diagnostic_report_diagnostic (diagnostic_context 
> *context,
>/* We do this to avoid giving the message for -pedantic-errors.  */
>orig_diag_kind = diagnostic->kind;
>  }
> - 
> +

Stray whitespace change?  Though it looks like a fix of a stray space,
so not a big deal.

>if (diagnostic->kind == DK_NOTE && context->inhibit_notes_p)
>  return false;
>  

[..snip...]

> diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
> index 1b9d6b1f64d..b95ee23dda0 100644
> --- a/gcc/diagnostic.h
> +++ b/gcc/diagnostic.h
> @@ -87,6 +87,10 @@ enum diagnostics_extra_output_kind
> list in diagnostic.def.  */
>  struct diagnostic_info
>  {
> +  diagnostic_info ()
> +: message (), richloc (), metadata (), x_data (), kind (), option_index 
> ()
> +  { }
> +

Why does the patch add this ctor?

>/* Text to be formatted.  */
>text_info message;
>  
> @@ -343,6 +347,32 @@ struct diagnostic_context
>  
>/* Callback for final cleanup.  */
>void (*final_cb) (diagnostic_context *context);
> +
> +  /* The inlining context of the diagnostic (may have just one
> + element if a diagnostic is not for an inlined expression).  */
> +  struct inlining_ctx
> +  {
> +void reset ()
> +{
> +  ilocs.release ();
> +  loc = UNKNOWN_LOCATION;
> +  ao = NULL;
> +  allsyslocs = false;
> +}
> +
> +/* Locations along the inlining stack.  */
> +auto_vec ilocs;
> +/* The locus of the diagnostic. */
> +location_t loc;
> +/* The abstract origin of the location.  */
> +void *ao;
> +/* Set of every ILOCS element is in a system header.  */
> +bool allsyslocs;
> +  } ictx;

Why is the inlining ctx part of the diagnostic_context?  That feels
strange to me. This inlining information relates to a particular
diagnostic, so it seems more appropriate to me that it should be part
of the diagnostic_info (which might thus necessitate having a ctor for
diagnostic_info).  Doing that might avoid the need for "reset", if I'm
right in assuming that getting the data is done once per diagnostic
during diagnostic_report_diagnostic.

Alternatively, could this be state that's created on the stack during
diagnostic_report_diagnostic and passed around by pointer as another
parameter?  (putting it in diagnostic_info might be simplest though)

Maybe rename it to "inlining_info"?

How involved would it be to make it be a class with private fields?

Can the field names have "m_" prefixes, please?

> +  /* Callbacks to get and set the inlining context.  */

Probably should spell out in the comment here that doing so requires
knowledge of trees, which is why it's a callback (to avoid diagnostic.c
from having to know about trees).

> +  void (*get_locations_cb)(diagnostic_context *, diagnostic_info *);
> +  void (*set_location_cb)(const diagnostic_context *, diagnostic_info *);
>  };
>  
>  static inline void
> diff --git a/gcc/tree-diagnostic.c b/gcc/tree-diagnostic.c
> index 95b8ef30070..a8c5484849a 100644
> --- a/gcc/tree-diagnostic.c
> +++ b/gcc/tree-diagnostic.c
> @@ -305,6 +305,84 @@ default_tree_printer (pretty_printer *pp, text_info 
> *text, const char *spec,
>return true;
>  }
>  
> +/* Get the inlining stack corresponding to the DIAGNOSTIC location.  */
> +
> +static void
> +get_inlining_locations (diagnostic_context *context,
> + diagnostic_info *diagnostic)
> +{
> +  context->ictx.reset ();
> +
> +  location_t loc = diagnostic_location (diagnostic);
> +  tree block = LOCATION_BLOCK (loc);
> +
> +  /* Count the number of locations in system headers.  When all are,
> + warnings are suppressed by -Wno-system-headers.  Otherwise, they
> + involve some user code, possibly inlined into a function in a system
> + header, and are not treated as coming from system headers.  */
> +  unsigned 

Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-11 Thread Qing Zhao via Gcc-patches


On Jun 11, 2021, at 10:49 AM, Qing Zhao via Gcc-patches 
mailto:gcc-patches@gcc.gnu.org>> wrote:



On Jun 11, 2021, at 6:12 AM, Richard Biener 
mailto:rguent...@suse.de>> wrote:

On Thu, 10 Jun 2021, Qing Zhao wrote:

Hi, Richard,

I need more discussion on the following comments you raised:

On May 26, 2021, at 6:18 AM, Richard Biener 
mailto:rguent...@suse.de>> wrote:

+/* Expand the IFN_DEFERRED_INIT function according to its second
argument.  */
+static void
+expand_DEFERRED_INIT (internal_fn, gcall *stmt)
+{
+  tree var = gimple_call_lhs (stmt);
+  tree init = NULL_TREE;
+  enum auto_init_type init_type
+= (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
+
+  switch (init_type)
+{
+default:
+  gcc_unreachable ();
+case AUTO_INIT_PATTERN:
+  init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
+  expand_assignment (var, init, false);
+  break;
+case AUTO_INIT_ZERO:
+  init = build_zero_cst (TREE_TYPE (var));
+  expand_assignment (var, init, false);
+  break;
+}

I think actually building build_pattern_cst_for_auto_init can generate
massive garbage and for big auto vars code size is also a concern and
ideally on x86 you'd produce rep movq.  So I don't think going
via expand_assignment is good.  Instead you possibly want to lower
.DEFERRED_INIT to MEMs following expand_builtin_memset and
eventually enhance that to allow storing pieces larger than a byte.

When I tried to lower .DEFERRED_INIT to MEMs for  “AUTO_INIT_PATTERN”, I have 
the following questions:

1. If .DEFERRED_INIT will be lowered to MEMS through “memset”, then we 
basically initialize the whole memory covering the
auto variable, including paddings. Right?

Yes.

2. Only when the value that is used to initialization has a repeated
 byte-pattern, we can lower it through “memset”. Otherwise, If the
 value that is used to initialization does Not have a repeated
 byte-pattern, we can NOT lower it through “memset”, right?

Yes.  This is why I said you should do it _similar_ to how memcpy
is implemented.  OTOH I don't see a good reason to support patterns
that are bigger than a byte ...

Currently, for the values that are used to initialize for “AUTO_INIT_PATTERN”, 
we have:

/* The following value is a guaranteed unmappable pointer value and has a
   repeated byte-pattern which makes it easier to synthesize.  We use it for
   pointers as well as integers so that aggregates are likely to be
   initialized with this repeated value.  */
uint64_t largevalue = 0xull;
/* For 32-bit platforms it's a bit trickier because, across systems, only the
   zero page can reasonably be expected to be unmapped, and even then we need
   a very low address.  We use a smaller value, and that value sadly doesn't
   have a repeated byte-pattern.  We don't use it for integers.  */
uint32_t smallvalue = 0x00AA;

In additional to the above, for BOOLEAN_TYPE:

  case BOOLEAN_TYPE:
/* We think that initializing the boolean variable to 0 other than 1
   is better even for pattern initialization.  */

Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a
repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer
types. Therefore, lowering the .DEFERRED_INIT for “PATTERN”
initialization through “memset” is not always possible.

Let me know if I miss anything in the above. Do you have other suggestions?

The main point is that you need to avoid building the explicit initializer
only to have it consumed by assignment expansion.  If you want to keep
all the singing and dancing (as opposed to maybe initializing with a
0x1 byte pattern) then I think for efficiency you still want to
block-initialize the variable and then only fixup the special fields.

Yes, this is a good idea.

We can memset the whole structure with repeated pattern “0xAA” first,
Then mixup BOOLEAN_TYPE and POINTER TYPE for 32-bit platform.
That might be more efficient.

However, the paddings will be initialized to “0xAA”.
But this should be fine since with -fauto-var-init,  the paddings can be any 
value.

So, still should be fine.

Qing



Re: [PATCH 1/3]: C N2653 char8_t: Language support

2021-06-11 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 11, 2021 at 12:20:48PM -0400, Tom Honermann wrote:
> I'm open to whatever signaling mechanism would be preferred.  It took me a
> while to settle on _CHAR8_T_SOURCE as the mechanism to propose as I didn't
> find much for other precedents.
> 
> I agree that having _CHAR8_T_SOURCE be implied by the -fchar8_t option is
> unusual with respect to other feature test macros.  Is that what you find to
> be weird and inconsistent?
> 
> Predefining __SIZEOF_CHAR8_T__ would be consistent with __SIZEOF_WCHAR_T__,
> but kind of strange too since the size is always 1.
> 
> Perhaps a better approach would be to follow the __CHAR16_TYPE__ and
> __CHAR32_TYPE__ precedent and define __CHAR8_TYPE__ to unsigned char.  That
> is likewise a bit strange since the type would always be unsigned char, but
> it does provide a bit more symmetry.  That could potentially have some use
> as well; for C++, it could be defined as char8_t and thereby reflect the
> difference between the two languages.  Perhaps it could be useful in the
> future as well if WG14 were to add distinct char8_t, char16_t, and char32_t
> types as C++ did (I'm not offering any prediction regarding the likelihood
> of that happening).

C++ already predefines
#define __CHAR8_TYPE__ unsigned char
#define __CHAR16_TYPE__ short unsigned int
#define __CHAR32_TYPE__ unsigned int
for -std={c,gnu}++2{0,a,3,b} or -fchar8_t (unless -fno-char8_t), so I agree
just making sure __CHAR8_TYPE__ is defined to unsigned char even for C
is best.
And you probably don't need to do anything in the C patch for it,
void
c_stddef_cpp_builtins(void)
{
  builtin_define_with_value ("__SIZE_TYPE__", SIZE_TYPE, 0);
...
  if (flag_char8_t)
builtin_define_with_value ("__CHAR8_TYPE__", CHAR8_TYPE, 0);
  builtin_define_with_value ("__CHAR16_TYPE__", CHAR16_TYPE, 0);
  builtin_define_with_value ("__CHAR32_TYPE__", CHAR32_TYPE, 0);
will do that.

Jakub



RE: [GCC][PATCH] arm: Fix polymorphic variants failing with undefined reference to `__ARM_undef` error.

2021-06-11 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 10 June 2021 17:14
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> 
> Subject: [GCC][PATCH] arm: Fix polymorphic variants failing with undefined
> reference to `__ARM_undef` error.
> 
> Hi,
> 
> This patch fixes the issue mentioned in PR101016, which is mve polymorphic
> variants
> failing at linking with undefined reference to "__ARM_undef" error.
> 
> Regression tested on arm-none-eabi and found no regressions.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> Regards,
> Srinath.
> 
> gcc/ChangeLog:
> 
> 2021-06-10  Srinath Parvathaneni  
> 
>   PR target/101016
>   * config/arm/arm_mve.h (__arm_vld1q): Change
> __ARM_mve_coerce(p0,
>   int8_t const *) to __ARM_mve_coerce1(p0, int8_t *) in the argument
> for
>   the polymorphic variants matching code.
>   (__arm_vld1q_z): Likewise.
>   (__arm_vld2q): Likewise.
>   (__arm_vld4q): Likewise.
>   (__arm_vldrbq_gather_offset): Likewise.
>   (__arm_vldrbq_gather_offset_z): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2021-06-10  Srinath Parvathaneni  
> 
>   PR target/101016
>   * gcc.target/arm/mve/intrinsics/pr101016.c: New test.
> 
> 
> 
> ### Attachment also inlined for ease of reply
> ###
> 
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index
> 1380f3acbfe64026bc882c308bb1c243e27ac4b3..83f10036990fc3df956fb2fa
> 4818d1304138b485 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -37565,47 +37565,47 @@ extern void *__ARM_undef;
> 
>  #define __arm_vld1q(p0) (\
>_Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
> -  int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_s8
> (__ARM_mve_coerce(p0, int8_t const *)), \
> -  int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_s16
> (__ARM_mve_coerce(p0, int16_t const *)), \
> -  int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_s32
> (__ARM_mve_coerce(p0, int32_t const *)), \
> -  int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_u8
> (__ARM_mve_coerce(p0, uint8_t const *)), \
> -  int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_u16
> (__ARM_mve_coerce(p0, uint16_t const *)), \
> -  int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_u32
> (__ARM_mve_coerce(p0, uint32_t const *)), \
> -  int (*)[__ARM_mve_type_float16_t_ptr]: __arm_vld1q_f16
> (__ARM_mve_coerce(p0, float16_t const *)), \
> -  int (*)[__ARM_mve_type_float32_t_ptr]: __arm_vld1q_f32
> (__ARM_mve_coerce(p0, float32_t const *
> +  int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_s8
> (__ARM_mve_coerce1(p0, int8_t *)), \
> +  int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_s16
> (__ARM_mve_coerce1(p0, int16_t *)), \
> +  int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_s32
> (__ARM_mve_coerce1(p0, int32_t *)), \
> +  int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_u8
> (__ARM_mve_coerce1(p0, uint8_t *)), \
> +  int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_u16
> (__ARM_mve_coerce1(p0, uint16_t *)), \
> +  int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_u32
> (__ARM_mve_coerce1(p0, uint32_t *)), \
> +  int (*)[__ARM_mve_type_float16_t_ptr]: __arm_vld1q_f16
> (__ARM_mve_coerce1(p0, float16_t *)), \
> +  int (*)[__ARM_mve_type_float32_t_ptr]: __arm_vld1q_f32
> (__ARM_mve_coerce1(p0, float32_t *
> 
>  #define __arm_vld1q_z(p0,p1) ( \
>_Generic( (int (*)[__ARM_mve_typeid(p0)])0, \
> -  int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_z_s8
> (__ARM_mve_coerce(p0, int8_t const *), p1), \
> -  int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_z_s16
> (__ARM_mve_coerce(p0, int16_t const *), p1), \
> -  int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_z_s32
> (__ARM_mve_coerce(p0, int32_t const *), p1), \
> -  int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_z_u8
> (__ARM_mve_coerce(p0, uint8_t const *), p1), \
> -  int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_z_u16
> (__ARM_mve_coerce(p0, uint16_t const *), p1), \
> -  int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_z_u32
> (__ARM_mve_coerce(p0, uint32_t const *), p1), \
> -  int (*)[__ARM_mve_type_float16_t_ptr]: __arm_vld1q_z_f16
> (__ARM_mve_coerce(p0, float16_t const *), p1), \
> -  int (*)[__ARM_mve_type_float32_t_ptr]: __arm_vld1q_z_f32
> (__ARM_mve_coerce(p0, float32_t const *), p1)))
> +  int (*)[__ARM_mve_type_int8_t_ptr]: __arm_vld1q_z_s8
> (__ARM_mve_coerce1(p0, int8_t *), p1), \
> +  int (*)[__ARM_mve_type_int16_t_ptr]: __arm_vld1q_z_s16
> (__ARM_mve_coerce1(p0, int16_t *), p1), \
> +  int (*)[__ARM_mve_type_int32_t_ptr]: __arm_vld1q_z_s32
> (__ARM_mve_coerce1(p0, int32_t *), p1), \
> +  int (*)[__ARM_mve_type_uint8_t_ptr]: __arm_vld1q_z_u8
> (__ARM_mve_coerce1(p0, uint8_t *), p1), \
> +  int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld1q_z_u16
> (__ARM_mve_coerce1(p0, uint16_t *), p1), \
> +  int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld1q_z_u32
> (__ARM_mve_coerce1(p0, uint32_t *), p1), \
> +  int (*)[__ARM_mve_type_float16_t_ptr]: __arm_vld1q_z_f16
> 

Re: [PATCH 2/2] arm: Auto-vectorization for MVE: add pack/unpack patterns

2021-06-11 Thread Richard Sandiford via Gcc-patches
Christophe Lyon  writes:
> Thanks for the feedback. How about v2 attached?
> Do you want me to merge neon_vec_unpack and
> mve_vec_unpack  and only have different assembly?
> if (TARGET_HAVE_MVE)
>   return "vmovlb.%# %q0, %q1";
> else
>   return "vmovlb.%# %q0, %q1";

I think it'd be better to keep them separate, given that they have
different attributes.

> @@ -2199,10 +,23 @@ (define_insn "mve_vmovnbq_"
>[(set_attr "type" "mve_move")
>  ])
>  
> +;; vmovnb pattern used by the vec_pack_trunc expander to avoid the
> +;; need for an extra register.

“to avoid the need for an uninitailized input operand” might
be clearer.

> +(define_insn "@mve_vec_pack_trunc_lo_"
> +  [
> +   (set (match_operand: 0 "s_register_operand" "=w")
> + (unspec: [(match_operand:MVE_5 1 "s_register_operand" 
> "w")]
> +  VMOVNBQ_S))
> +  ]
> +  "TARGET_HAVE_MVE"
> +  "vmovnb.i%# %q0, %q1"
> +  [(set_attr "type" "mve_move")
> +])
> +
> […]
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
> index 430a92ce966..3afb3e1d891 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -632,3 +632,91 @@ (define_expand "clz2"
>"ARM_HAVE__ARITH
> && !TARGET_REALLY_IWMMXT"
>  )
> +
> +;; vmovl[tb] are not available for V4SI on MVE
> +(define_expand "vec_unpack_hi_"
> +  [(match_operand: 0 "register_operand")
> +   (SE: (match_operand:VU 1 "register_operand"))]
> + "ARM_HAVE__ARITH
> +  && !TARGET_REALLY_IWMMXT
> +  && ! (mode == V4SImode && TARGET_HAVE_MVE)
> +  && !BYTES_BIG_ENDIAN"
> +  {
> +rtvec v = rtvec_alloc (/2);
> +rtx t1;
> +int i;
> +for (i = 0; i < (/2); i++)
> +  RTVEC_ELT (v, i) = GEN_INT ((/2) + i);
> +
> +t1 = gen_rtx_PARALLEL (mode, v);
> +
> +if (TARGET_NEON)
> +  emit_insn (gen_neon_vec_unpack_hi_ (operands[0],
> + operands[1],
> + t1));
> +else
> +  emit_insn (gen_mve_vec_unpack_hi_ (operands[0],
> +operands[1],
> +t1));
> +DONE;
> +  }
> +)

Since the patterns are the same for both MVE and Neon (a good thing!),
it would be simpler to write this as:

(define_insn "mve_vec_unpack_lo_"
  [(set (match_operand: 0 "register_operand" "=w")
(SE: (vec_select:
  (match_operand:VU 1 "register_operand" "w")
  (match_dup 2]
  …

and then set operands[2] to what is t1 above.  There would then be
no need to call emit_insn & DONE, and we could use the natural
iterator (MVE_3) for the MVE patterns.

Same idea for the lo version.

> […]
> +;; vmovn[tb] are not available for V2DI on MVE
> +(define_expand "vec_pack_trunc_"
> + [(set (match_operand: 0 "register_operand")
> +   (vec_concat:
> + (truncate:
> + (match_operand:VN 1 "register_operand"))
> + (truncate:
> + (match_operand:VN 2 "register_operand"]
> + "ARM_HAVE__ARITH
> +  && !TARGET_REALLY_IWMMXT
> +  && ! (mode == V2DImode && TARGET_HAVE_MVE)
> +  && !BYTES_BIG_ENDIAN"
> + {
> +   if (TARGET_NEON)
> + {
> +   emit_insn (gen_neon_quad_vec_pack_trunc_ (operands[0], 
> operands[1],
> +operands[2]));
> + }
> +   else
> + {
> +   rtx tmpreg = gen_reg_rtx (mode);
> +   emit_insn (gen_mve_vec_pack_trunc_lo (mode, tmpreg, 
> operands[1]));
> +   emit_insn (gen_mve_vmovntq (VMOVNTQ_S, mode,
> +tmpreg, tmpreg, operands[2]));
> +   emit_move_insn (operands[0], tmpreg);

vmovntq can assign directly to operands[0], without a separate move.

OK with those changes, thanks.

Richard

> + }
> +   DONE;
> + }
> +)
> […]


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-11 Thread Kees Cook via Gcc-patches
On Fri, Jun 11, 2021 at 03:49:02PM +, Qing Zhao wrote:
> 
> On Jun 11, 2021, at 6:12 AM, Richard Biener  wrote:
> > [...]
> > 
> > The main point is that you need to avoid building the explicit initializer
> > only to have it consumed by assignment expansion.  If you want to keep
> > all the singing and dancing (as opposed to maybe initializing with a
> > 0x1 byte pattern) then I think for efficiency you still want to
> > block-initialize the variable and then only fixup the special fields.
> 
> Yes, this is a good idea. 
> 
> We can memset the whole structure with repeated pattern “0xAA” first,
> Then mixup BOOLEAN_TYPE and POINTER TYPE for 32-bit platform. 
> That might be more efficient. 
> 
> > 
> > But as said, all this is quite over-designed IMHO and simply
> > zeroing everything would be much simpler and good enough.
> 
> So, the fundenmental questions are:
> 
> 1. do we need the functionality of “Pattern Initialization” for debugging 
> purpose?
> I see that other compilers support both Zero initialization and Pattern 
> initialization. (Clang and Microsoft compiler)
> 
> http://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
> https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/
> Pattern init is used in development build for debugging purpose, zero init is 
> used in production build for security purpose.

Correct -- "pattern" is much better about triggering all kinds of
problems, and suitable for debug builds. "zero" is less disruptive and
generally provides a safer failure mode for production builds.

> So, I assume that GCC might want to provide similar functionality?  But I am 
> open on this. 
> 
> Kees, will Kernel use “Pattern initialization” feature? 

It is currently support, yes. Note that if I had to choose between "nothing" and
"only zero", I will happily take "only zero". However, it seems like
pattern init isn't much more of an addition in this series.

> 2. Since “Pattern initialization” is just used for debugging purpose, the 
> runtime and code size overhead might not be that 
> Important at all, right?

That has been my impression, yes.

Thanks!

-Kees

-- 
Kees Cook


Re: [PATCH 1/3]: C N2653 char8_t: Language support

2021-06-11 Thread Tom Honermann via Gcc-patches

On 6/11/21 12:01 PM, Jakub Jelinek wrote:

On Fri, Jun 11, 2021 at 11:52:41AM -0400, Tom Honermann via Gcc-patches wrote:

On 6/7/21 5:11 PM, Joseph Myers wrote:

On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:


When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE macro
is predefined.  This is the mechanism proposed to glibc to opt-in to
declarations of the char8_t typedef and c8rtomb and mbrtoc8 functions proposed
in N2653.  See [2].

I don't think glibc should have such a feature test macro, and I don't
think GCC should define such feature test macros either - _*_SOURCE macros
are generally for the *user* to define to decide what namespace they want
visible, not for the compiler to define.  Without proliferating new
language dialects, __STDC_VERSION__ ought to be sufficient to communicate
from the compiler to the library (including to GCC's own headers such as
stdatomic.h).


In general I agree, but I think an exception is warranted in this case for a
few reasons:

1. The feature includes both core language changes (the change of type
for u8 string literals) and library changes.  The library changes
are not actually dependent on the core language change, but they are
intended to be used together.
2. Existing use of the char8_t identifier can be found in existing open
source projects and likely exists in some closed source projects as
well.  An opt-in approach avoids conflict and the need to
conditionalize code based on gcc version.
3. An opt-in approach enables evaluation of the feature prior to any
WG14 approval.

But calling it _CHAR8_T_SOURCE is weird and inconsistent with everything
else.
In C++, there is __cpp_char8_t 201811L predefined macro for char8_t.
Using that in C is not right, sure.
Often we use __SIZEOF_type__ macros not just for sizeof(), but also for
presence check of the types, like
#ifdef __SIZEOF_INT128__
__int128 i;
#else
long long i;
#endif
etc., while char8_t has sizeof (char8_t) == 1, perhaps predefining
__SIZEOF_CHAR8_T__ 1
instead of _CHAR8_T_SOURCE would be better?


I'm open to whatever signaling mechanism would be preferred.  It took me 
a while to settle on _CHAR8_T_SOURCE as the mechanism to propose as I 
didn't find much for other precedents.


I agree that having _CHAR8_T_SOURCE be implied by the -fchar8_t option 
is unusual with respect to other feature test macros.  Is that what you 
find to be weird and inconsistent?


Predefining __SIZEOF_CHAR8_T__ would be consistent with 
__SIZEOF_WCHAR_T__, but kind of strange too since the size is always 1.


Perhaps a better approach would be to follow the __CHAR16_TYPE__ and 
__CHAR32_TYPE__ precedent and define __CHAR8_TYPE__ to unsigned char.  
That is likewise a bit strange since the type would always be unsigned 
char, but it does provide a bit more symmetry.  That could potentially 
have some use as well; for C++, it could be defined as char8_t and 
thereby reflect the difference between the two languages.  Perhaps it 
could be useful in the future as well if WG14 were to add distinct 
char8_t, char16_t, and char32_t types as C++ did (I'm not offering any 
prediction regarding the likelihood of that happening).


Tom.



Jakub





Re: [PATCH 1/3]: C N2653 char8_t: Language support

2021-06-11 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 11, 2021 at 11:52:41AM -0400, Tom Honermann via Gcc-patches wrote:
> On 6/7/21 5:11 PM, Joseph Myers wrote:
> > On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:
> > 
> > > When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE 
> > > macro
> > > is predefined.  This is the mechanism proposed to glibc to opt-in to
> > > declarations of the char8_t typedef and c8rtomb and mbrtoc8 functions 
> > > proposed
> > > in N2653.  See [2].
> > I don't think glibc should have such a feature test macro, and I don't
> > think GCC should define such feature test macros either - _*_SOURCE macros
> > are generally for the *user* to define to decide what namespace they want
> > visible, not for the compiler to define.  Without proliferating new
> > language dialects, __STDC_VERSION__ ought to be sufficient to communicate
> > from the compiler to the library (including to GCC's own headers such as
> > stdatomic.h).
> > 
> In general I agree, but I think an exception is warranted in this case for a
> few reasons:
> 
> 1. The feature includes both core language changes (the change of type
>for u8 string literals) and library changes.  The library changes
>are not actually dependent on the core language change, but they are
>intended to be used together.
> 2. Existing use of the char8_t identifier can be found in existing open
>source projects and likely exists in some closed source projects as
>well.  An opt-in approach avoids conflict and the need to
>conditionalize code based on gcc version.
> 3. An opt-in approach enables evaluation of the feature prior to any
>WG14 approval.

But calling it _CHAR8_T_SOURCE is weird and inconsistent with everything
else.
In C++, there is __cpp_char8_t 201811L predefined macro for char8_t.
Using that in C is not right, sure.
Often we use __SIZEOF_type__ macros not just for sizeof(), but also for
presence check of the types, like
#ifdef __SIZEOF_INT128__
__int128 i;
#else
long long i;
#endif
etc., while char8_t has sizeof (char8_t) == 1, perhaps predefining
__SIZEOF_CHAR8_T__ 1
instead of _CHAR8_T_SOURCE would be better?

Jakub



Re: [PATCH 1/3]: C N2653 char8_t: Language support

2021-06-11 Thread Tom Honermann via Gcc-patches

On 6/7/21 5:12 PM, Joseph Myers wrote:

Also, it seems odd to add a new field to cpp_options without any code in
libcpp that uses the value of that field.

Ah, thank you.  That appears to be leftover code from prior 
experimentation and I failed to identify it as such when preparing the 
patch.  I'll provide a revised patch.


Tom.



Re: [PATCH 1/3]: C N2653 char8_t: Language support

2021-06-11 Thread Tom Honermann via Gcc-patches

On 6/7/21 5:11 PM, Joseph Myers wrote:

On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:


When -fchar8_t support is enabled for non-C++ modes, the _CHAR8_T_SOURCE macro
is predefined.  This is the mechanism proposed to glibc to opt-in to
declarations of the char8_t typedef and c8rtomb and mbrtoc8 functions proposed
in N2653.  See [2].

I don't think glibc should have such a feature test macro, and I don't
think GCC should define such feature test macros either - _*_SOURCE macros
are generally for the *user* to define to decide what namespace they want
visible, not for the compiler to define.  Without proliferating new
language dialects, __STDC_VERSION__ ought to be sufficient to communicate
from the compiler to the library (including to GCC's own headers such as
stdatomic.h).

In general I agree, but I think an exception is warranted in this case 
for a few reasons:


1. The feature includes both core language changes (the change of type
   for u8 string literals) and library changes.  The library changes
   are not actually dependent on the core language change, but they are
   intended to be used together.
2. Existing use of the char8_t identifier can be found in existing open
   source projects and likely exists in some closed source projects as
   well.  An opt-in approach avoids conflict and the need to
   conditionalize code based on gcc version.
3. An opt-in approach enables evaluation of the feature prior to any
   WG14 approval.

Tom.



Re: GCC documentation: porting to Sphinx

2021-06-11 Thread Joseph Myers
On Fri, 11 Jun 2021, Martin Liška wrote:

> > Where languages have their own manuals, I think it's more appropriate for
> > those to go under the language-specific directories.
> 
> So it will require the following folder structure:
> 
> $gccroot/gcc/doc/gcc - for GCC documentation
> $gccroot/gcc/doc/gccint - for GCC internal documentation
> $gccroot/gcc/doc/gfortran - for Fortran documentation
> $gccroot/gcc/doc/gccgo - for GO documentation

I'm thinking of

$gccroot/gcc/fortran/doc
$gccroot/gcc/go/doc

(or subdirectories thereof if desired) for the Fortran and Go manuals, so 
they go alongside the front end sources.

> The Sphinx Makefile will be capable of e.g.

My concern with makefiles is what the main GCC build system does, with 
"make" run at the top level of the build tree and with the targets defined 
by the GNU Coding Standards, not with what happens if someone manually 
does make in a subdirectory of the source or build tree.

"make" at top level should build all the info manuals and man pages, as at 
present (if a suitable Sphinx version is installed), and "make install" 
should install them, in the same directories as at present.

"make html" at top level should build all the HTML manuals, and "make 
install-html" should install them.

"make pdf" and "make install-pdf" at top level should work likewise.

"make install-html" and "make install-pdf" should put things under 
$(DESTDIR)$(htmldir) and $(DESTDIR)$(pdfdir) as at present.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-11 Thread Qing Zhao via Gcc-patches


> On Jun 11, 2021, at 6:12 AM, Richard Biener  wrote:
> 
> On Thu, 10 Jun 2021, Qing Zhao wrote:
> 
>> Hi, Richard,
>> 
>> I need more discussion on the following comments you raised:
>> 
>>> On May 26, 2021, at 6:18 AM, Richard Biener  wrote:
>>> 
>>> +/* Expand the IFN_DEFERRED_INIT function according to its second 
>>> argument.  */
>>> +static void
>>> +expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>>> +{
>>> +  tree var = gimple_call_lhs (stmt);
>>> +  tree init = NULL_TREE;
>>> +  enum auto_init_type init_type
>>> += (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
>>> +
>>> +  switch (init_type)
>>> +{
>>> +default:
>>> +  gcc_unreachable ();
>>> +case AUTO_INIT_PATTERN:
>>> +  init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
>>> +  expand_assignment (var, init, false);
>>> +  break;
>>> +case AUTO_INIT_ZERO:
>>> +  init = build_zero_cst (TREE_TYPE (var));
>>> +  expand_assignment (var, init, false);
>>> +  break;
>>> +}
>>> 
>>> I think actually building build_pattern_cst_for_auto_init can generate
>>> massive garbage and for big auto vars code size is also a concern and
>>> ideally on x86 you'd produce rep movq.  So I don't think going
>>> via expand_assignment is good.  Instead you possibly want to lower
>>> .DEFERRED_INIT to MEMs following expand_builtin_memset and
>>> eventually enhance that to allow storing pieces larger than a byte.
>> 
>> When I tried to lower .DEFERRED_INIT to MEMs for  “AUTO_INIT_PATTERN”, I 
>> have the following questions:
>> 
>> 1. If .DEFERRED_INIT will be lowered to MEMS through “memset”, then we 
>> basically initialize the whole memory covering the
>> auto variable, including paddings. Right?
> 
> Yes.
> 
>> 2. Only when the value that is used to initialization has a repeated 
>>   byte-pattern, we can lower it through “memset”. Otherwise, If the 
>>   value that is used to initialization does Not have a repeated 
>>   byte-pattern, we can NOT lower it through “memset”, right?
> 
> Yes.  This is why I said you should do it _similar_ to how memcpy
> is implemented.  OTOH I don't see a good reason to support patterns
> that are bigger than a byte ...
> 
>> Currently, for the values that are used to initialize for 
>> “AUTO_INIT_PATTERN”, we have:
>> 
>>  /* The following value is a guaranteed unmappable pointer value and has a
>> repeated byte-pattern which makes it easier to synthesize.  We use it for
>> pointers as well as integers so that aggregates are likely to be
>> initialized with this repeated value.  */
>>  uint64_t largevalue = 0xull;
>>  /* For 32-bit platforms it's a bit trickier because, across systems, only 
>> the
>> zero page can reasonably be expected to be unmapped, and even then we 
>> need
>> a very low address.  We use a smaller value, and that value sadly doesn't
>> have a repeated byte-pattern.  We don't use it for integers.  */
>>  uint32_t smallvalue = 0x00AA;
>> 
>> In additional to the above, for BOOLEAN_TYPE:
>> 
>>case BOOLEAN_TYPE:
>>  /* We think that initializing the boolean variable to 0 other than 1
>> is better even for pattern initialization.  */
>> 
>> Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a 
>> repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer 
>> types. Therefore, lowering the .DEFERRED_INIT for “PATTERN” 
>> initialization through “memset” is not always possible.
>> 
>> Let me know if I miss anything in the above. Do you have other suggestions?
> 
> The main point is that you need to avoid building the explicit initializer
> only to have it consumed by assignment expansion.  If you want to keep
> all the singing and dancing (as opposed to maybe initializing with a
> 0x1 byte pattern) then I think for efficiency you still want to
> block-initialize the variable and then only fixup the special fields.

Yes, this is a good idea. 

We can memset the whole structure with repeated pattern “0xAA” first,
Then mixup BOOLEAN_TYPE and POINTER TYPE for 32-bit platform. 
That might be more efficient. 

> 
> But as said, all this is quite over-designed IMHO and simply
> zeroing everything would be much simpler and good enough.

So, the fundenmental questions are:

1. do we need the functionality of “Pattern Initialization” for debugging 
purpose?
I see that other compilers support both Zero initialization and Pattern 
initialization. (Clang and Microsoft compiler)

http://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html
https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/
Pattern init is used in development build for debugging purpose, zero init is 
used in production build for security purpose.

So, I assume that GCC might want to provide similar functionality?  But I am 
open on this. 

Kees, will Kernel use “Pattern initialization” feature? 

2. Since “Pattern initialization” is just used for debugging 

Re: [PATCH 0/3]: C N2653 char8_t implementation

2021-06-11 Thread Tom Honermann via Gcc-patches

On 6/7/21 5:03 PM, Joseph Myers wrote:

On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:


These changes do not impact default gcc behavior.  The existing -fchar8_t
option is extended to C compilation to enable the N2653 changes, and
-fno-char8_t is extended to explicitly disable them.  N2653 has not yet been
accepted by WG14, so no changes are made to handling of the C2X language
dialect.

Why is that option needed?  Normally I'd expect features to be enabled or
disabled based on the selected language version, rather than having
separate options to adjust the configuration for one very specific feature
in a language version.  Adding extra language dialects not corresponding
to any standard version but to some peculiar mix of versions (such as C17
with a changed type for u8"", or C2X with a changed type for u8'') needs a
strong reason for those language dialects to be useful (for example, the
-fgnu89-inline option was justified by widespread use of GNU-style extern
inline in headers).


The option is needed because it impacts core language backward 
compatibility (for both C and C++, the type of u8 string literals; for 
C++, the type of u8 character literals and the new char8_t fundamental 
type).


The ability to opt-in or opt-out of the feature eases migration by 
enabling source code compatibility.  C and C++ standards are not 
published at the same cadence.  A project that targets C++20 and C17 may 
therefore have a need to either opt-out of char8_t support on the C++ 
side (already possible via -fno-char8_t), or to opt-in to char8_t 
support on the C side until such time as the targets change to C++20(+) 
and C23(+); assuming WG14 approval at some point.




I think the whole patch series would best wait until after the proposal
has been considered by a WG14 meeting, in addition to not increasing the
number of language dialects supported.


As an opt-in feature, this is useful to gain implementation and 
deployment experience for WG14.


It would be appropriate to document this as an experimental feature 
pending WG14 approval.  If WG14 declines it or approves it with 
different behavior, the feature can then be removed or changed.


The option could also be introduced as -fexperimental-char8_t if that 
eases concerns, though I do not favor that approach due to misalignment 
with the existing option for C++.


Tom.



Re: [PATCH 2/2] arm: Auto-vectorization for MVE: add pack/unpack patterns

2021-06-11 Thread Christophe Lyon via Gcc-patches
On Fri, 11 Jun 2021 at 12:20, Richard Sandiford
 wrote:
>
> Christophe Lyon  writes:
> > In the meantime, I tried to make some progress, and looked at how
> > things work on aarch64.
> >
> > This led me to define (in mve.md):
> >
> > (define_insn "@mve_vec_pack_trunc_lo_"
> >  [(set (match_operand: 0 "register_operand" "=w")
> >(truncate: (match_operand:MVE_5 1 "register_operand" 
> > "w")))]
> >  "TARGET_HAVE_MVE"
> >  "vmovnb.i   %q0, %q1\;"
> >   [(set_attr "type" "mve_move")]
> > )
> >
> > (define_insn "@mve_vec_pack_trunc_hi_"
> >  [(set (match_operand: 0 "register_operand" "=w")
> >(vec_concat:
> > (match_operand: 1 "register_operand" "0")
> > (truncate:
> > (match_operand:MVE_5 2 "register_operand" "w"]
> >  "TARGET_HAVE_MVE"
> >  "vmovnt.i   %q0, %q2\;"
> >   [(set_attr "type" "mve_move")]
> > )
> >
> > and in vec-common.md, for
> > (define_expand "vec_pack_trunc_"
> >  [(set (match_operand: 0 "register_operand")
> >(vec_concat:
> > (truncate:
> > (match_operand:VN 1 "register_operand"))
> > (truncate:
> > (match_operand:VN 2 "register_operand"]
> >
> > I expand this for MVE:
> >   rtx tmpreg = gen_reg_rtx (mode);
> >   emit_insn (gen_mve_vec_pack_trunc_lo (mode, tmpreg, 
> > operands[1]));
> >   emit_insn (gen_mve_vec_pack_trunc_hi (mode, operands[0],
> > tmpreg, operands[2]));
> >
> > I am getting an error in reload:
> > error: unable to generate reloads for:
> > (insn 10 9 11 2 (set (reg:V4HI 122 [ vect__4.16 ])
> > (truncate:V4HI (reg:V4SI 118 [ vect__4.16 ])))
> > "/gcc.target/arm/simd/mve-vec-pack.c":17:112 3609
> > {mve_vec_pack_trunc_lo_v4si}
> >  (expr_list:REG_DEAD (reg:V4SI 118 [ vect__4.16 ])
> > (nil)))
> >
> > The next insn is:
> > (insn 11 10 12 2 (set (reg:V8HI 121 [ vect__7.18 ])
> > (vec_concat:V8HI (reg:V4HI 122 [ vect__4.16 ])
> > (truncate:V4HI (reg:V4SI 119 [ vect__4.17 ]
> > "/gcc.target/arm/simd/mve-vec-pack.c":17:112 3611
> > {mve_vec_pack_trunc_hi_v4si}
> >  (expr_list:REG_DEAD (reg:V4HI 122 [ vect__4.16 ])
> > (expr_list:REG_DEAD (reg:V4SI 119 [ vect__4.17 ])
> > (nil
> >
> > What is causing the reload error?
>
> For this to work on MVE, we'd need to allow the 64-bit V_narrow modes to be
> stored in MVE registers.  I'm not sure off-hand whether allowing that
> would be a good idea or not.
>
> If we continue to allow only 128-bit vectors in MVE registers then
> we'll need to continue to use unspecs instead of truncate and vec_concat.
>

Thanks for the feedback. How about v2 attached?
Do you want me to merge neon_vec_unpack and
mve_vec_unpack  and only have different assembly?
if (TARGET_HAVE_MVE)
  return "vmovlb.%# %q0, %q1";
else
  return "vmovlb.%# %q0, %q1";


> Thanks,
> Richard
From 5abc6196c934438421e293b6b4733a98ec80e982 Mon Sep 17 00:00:00 2001
From: Christophe Lyon 
Date: Thu, 3 Jun 2021 14:35:50 +
Subject: [PATCH v2] arm: Auto-vectorization for MVE: add pack/unpack patterns

This patch adds vec_unpack_hi_, vec_unpack_lo_,
vec_pack_trunc_ patterns for MVE.

It does so by moving the unpack patterns from neon.md to
vec-common.md, while adding them support for MVE. The pack expander is
derived from the Neon one (which in turn is renamed into
neon_quad_vec_pack_trunc_).

The patch introduces mve_vec_unpack_lo_ and
mve_vec_unpack_hi_ which are similar to their Neon
counterparts, except for the assembly syntax.

I did not prefix them with '@', because I couldn't find how to call
gen_mve_vec_unpack_[lo|hi] with  as first parameter.  Anyway, we
can keep the VU iterator instead of MVE_3 since the offending V4SI
mode is disabled in the expander.

The patch introduces mve_vec_pack_trunc_lo_ to avoid the need for a
zero-initialized temporary, which is needed if the
vec_pack_trunc_ expander calls @mve_vmovn[bt]q_
instead.

With this patch, we can now vectorize the 16 and 8-bit versions of
vclz and vshl, although the generated code could still be improved.
For test_clz_s16, we now generate
vldrh.16q3, [r1]
vmovlb.s16   q2, q3
vmovlt.s16   q3, q3
vclz.i32  q2, q2
vclz.i32  q3, q3
vmovnb.i32  q1, q2
vmovnt.i32  q1, q3
vstrh.16q1, [r0]
which could be improved to
vldrh.16q3, [r1]
	vclz.i16	q1, q3
vstrh.16q1, [r0]
if we could avoid the need for unpack/pack steps.

For reference, clang-12 generates:
	vldrh.s32   q0, [r1]
	vldrh.s32   q1, [r1, #8]
	vclz.i32q0, q0
	vstrh.32q0, [r0]
	vclz.i32q0, q1
	vstrh.32q0, [r0, #8]

2021-06-11  Christophe Lyon  

	gcc/
	* config/arm/mve.md (mve_vec_unpack_lo_): New pattern.
	(mve_vec_unpack_hi_): New pattern.
	(@mve_vec_pack_trunc_lo_): New pattern.
	(mve_vmovntq_): Prefix with '@'.
	* config/arm/neon.md (vec_unpack_hi_): Move to
	

Re: [PATCH v2] inline-asm: Fix ICE with bitfields in "m" operands [PR100785]

2021-06-11 Thread Jason Merrill via Gcc-patches

On 6/11/21 10:50 AM, Jakub Jelinek wrote:

On Wed, Jun 02, 2021 at 12:14:51PM -0400, Jason Merrill wrote:

On 6/2/21 11:25 AM, Jakub Jelinek wrote:

On Wed, Jun 02, 2021 at 11:09:45AM -0400, Jason Merrill wrote:

On 6/2/21 3:59 AM, Jakub Jelinek wrote:

  if (!allows_reg && !cxx_mark_addressable (*op))
operand = error_mark_node;
+ else if (!allows_reg && bitfield_p (*op))
+   {
+ error_at (loc, "attempt to take address of bit-field");


Hmm, why aren't we already catching this in cxx_mark_addressable?


That is certainly possible, but it goes against Eric's patch
https://gcc.gnu.org/pipermail/gcc-patches/2015-June/421483.html


Hmm, I wonder what his rationale was?


No idea (and see below, nothing in the testsuite seems to require that).

Anyway, sorry for forgetting about this.

Here is a variant of the patch that adds the errors to
{c,cxx}_mark_addressable but keeps them in
build_unary_op/cp_build_addr_expr_1 too where for C++ it handles SFINAE
there etc.

Bootstrapped/regtested on x86_64-linux and i686-linux.


C++ change is OK; the whole patch is OK on Tuesday if no other comments.


2021-06-11  Jakub Jelinek  

PR inline-asm/100785
gcc/
* gimplify.c (gimplify_asm_expr): Don't diagnose errors if
output or input operands were already error_mark_node.
* cfgexpand.c (expand_asm_stmt): If errors are emitted,
remove all inputs, outputs and clobbers from the asm and
set template to "".
gcc/c/
* c-typeck.c (c_mark_addressable): Diagnose trying to make
bit-fields addressable.
gcc/cp/
* semantics.c (cxx_mark_addressable): Diagnose trying to make
bit-fields addressable.
gcc/testsuite/
* c-c++-common/pr100785.c: New test.
* gcc.dg/pr48552-1.c: Don't expect invalid lvalue errors.
* gcc.dg/pr48552-2.c: Likewise.

--- gcc/gimplify.c.jj   2021-06-10 15:27:35.141299425 +0200
+++ gcc/gimplify.c  2021-06-11 13:27:51.212867419 +0200
@@ -6321,12 +6321,14 @@ gimplify_asm_expr (tree *expr_p, gimple_
if (!allows_reg && allows_mem)
mark_addressable (TREE_VALUE (link));
  
+  tree orig = TREE_VALUE (link);

tret = gimplify_expr (_VALUE (link), pre_p, post_p,
is_inout ? is_gimple_min_lval : is_gimple_lvalue,
fb_lvalue | fb_mayfail);
if (tret == GS_ERROR)
{
- error ("invalid lvalue in % output %d", i);
+ if (orig != error_mark_node)
+   error ("invalid lvalue in % output %d", i);
  ret = tret;
}
  
@@ -6521,8 +6523,9 @@ gimplify_asm_expr (tree *expr_p, gimple_

  mark_addressable (TREE_VALUE (link));
  if (tret == GS_ERROR)
{
- error_at (EXPR_LOC_OR_LOC (TREE_VALUE (link), input_location),
-   "memory input %d is not directly addressable", i);
+ if (inputv != error_mark_node)
+   error_at (EXPR_LOC_OR_LOC (TREE_VALUE (link), input_location),
+ "memory input %d is not directly addressable", i);
  ret = tret;
}
}
--- gcc/cfgexpand.c.jj  2021-06-02 10:07:47.632826557 +0200
+++ gcc/cfgexpand.c 2021-06-11 13:27:51.212867419 +0200
@@ -3082,6 +3082,7 @@ expand_asm_stmt (gasm *stmt)
unsigned ninputs = gimple_asm_ninputs (stmt);
unsigned nlabels = gimple_asm_nlabels (stmt);
unsigned i;
+  bool error_seen = false;
  
/* ??? Diagnose during gimplification?  */

if (ninputs + noutputs + nlabels > MAX_RECOG_OPERANDS)
@@ -3140,6 +3141,7 @@ expand_asm_stmt (gasm *stmt)
{
  /* ??? Diagnose during gimplification?  */
  error ("unknown register name %qs in %", regname);
+ error_seen = true;
}
  else if (j == -4)
{
@@ -3202,7 +3204,10 @@ expand_asm_stmt (gasm *stmt)
&& REG_P (DECL_RTL (output_tvec[j]))
&& HARD_REGISTER_P (DECL_RTL (output_tvec[j]))
&& output_hregno == REGNO (DECL_RTL (output_tvec[j])))
- error ("invalid hard register usage between output operands");
+ {
+   error ("invalid hard register usage between output operands");
+   error_seen = true;
+ }
  
  	  /* Verify matching constraint operands use the same hard register

 and that the non-matching constraint operands do not use the same
@@ -3225,13 +3230,19 @@ expand_asm_stmt (gasm *stmt)
  }
if (i == match
&& output_hregno != input_hregno)
- error ("invalid hard register usage between output operand "
-"and matching constraint operand");
+ {
+   error ("invalid hard register usage between output "
+  "operand and matching constraint 

[PATCH v2] inline-asm: Fix ICE with bitfields in "m" operands [PR100785]

2021-06-11 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 02, 2021 at 12:14:51PM -0400, Jason Merrill wrote:
> On 6/2/21 11:25 AM, Jakub Jelinek wrote:
> > On Wed, Jun 02, 2021 at 11:09:45AM -0400, Jason Merrill wrote:
> > > On 6/2/21 3:59 AM, Jakub Jelinek wrote:
> > > >   if (!allows_reg && !cxx_mark_addressable (*op))
> > > > operand = error_mark_node;
> > > > + else if (!allows_reg && bitfield_p (*op))
> > > > +   {
> > > > + error_at (loc, "attempt to take address of 
> > > > bit-field");
> > > 
> > > Hmm, why aren't we already catching this in cxx_mark_addressable?
> > 
> > That is certainly possible, but it goes against Eric's patch
> > https://gcc.gnu.org/pipermail/gcc-patches/2015-June/421483.html
> 
> Hmm, I wonder what his rationale was?

No idea (and see below, nothing in the testsuite seems to require that).

Anyway, sorry for forgetting about this.

Here is a variant of the patch that adds the errors to
{c,cxx}_mark_addressable but keeps them in
build_unary_op/cp_build_addr_expr_1 too where for C++ it handles SFINAE
there etc.

Bootstrapped/regtested on x86_64-linux and i686-linux.

2021-06-11  Jakub Jelinek  

PR inline-asm/100785
gcc/
* gimplify.c (gimplify_asm_expr): Don't diagnose errors if
output or input operands were already error_mark_node.
* cfgexpand.c (expand_asm_stmt): If errors are emitted,
remove all inputs, outputs and clobbers from the asm and
set template to "".
gcc/c/
* c-typeck.c (c_mark_addressable): Diagnose trying to make
bit-fields addressable.
gcc/cp/
* semantics.c (cxx_mark_addressable): Diagnose trying to make
bit-fields addressable.
gcc/testsuite/
* c-c++-common/pr100785.c: New test.
* gcc.dg/pr48552-1.c: Don't expect invalid lvalue errors.
* gcc.dg/pr48552-2.c: Likewise.

--- gcc/gimplify.c.jj   2021-06-10 15:27:35.141299425 +0200
+++ gcc/gimplify.c  2021-06-11 13:27:51.212867419 +0200
@@ -6321,12 +6321,14 @@ gimplify_asm_expr (tree *expr_p, gimple_
   if (!allows_reg && allows_mem)
mark_addressable (TREE_VALUE (link));
 
+  tree orig = TREE_VALUE (link);
   tret = gimplify_expr (_VALUE (link), pre_p, post_p,
is_inout ? is_gimple_min_lval : is_gimple_lvalue,
fb_lvalue | fb_mayfail);
   if (tret == GS_ERROR)
{
- error ("invalid lvalue in % output %d", i);
+ if (orig != error_mark_node)
+   error ("invalid lvalue in % output %d", i);
  ret = tret;
}
 
@@ -6521,8 +6523,9 @@ gimplify_asm_expr (tree *expr_p, gimple_
  mark_addressable (TREE_VALUE (link));
  if (tret == GS_ERROR)
{
- error_at (EXPR_LOC_OR_LOC (TREE_VALUE (link), input_location),
-   "memory input %d is not directly addressable", i);
+ if (inputv != error_mark_node)
+   error_at (EXPR_LOC_OR_LOC (TREE_VALUE (link), input_location),
+ "memory input %d is not directly addressable", i);
  ret = tret;
}
}
--- gcc/cfgexpand.c.jj  2021-06-02 10:07:47.632826557 +0200
+++ gcc/cfgexpand.c 2021-06-11 13:27:51.212867419 +0200
@@ -3082,6 +3082,7 @@ expand_asm_stmt (gasm *stmt)
   unsigned ninputs = gimple_asm_ninputs (stmt);
   unsigned nlabels = gimple_asm_nlabels (stmt);
   unsigned i;
+  bool error_seen = false;
 
   /* ??? Diagnose during gimplification?  */
   if (ninputs + noutputs + nlabels > MAX_RECOG_OPERANDS)
@@ -3140,6 +3141,7 @@ expand_asm_stmt (gasm *stmt)
{
  /* ??? Diagnose during gimplification?  */
  error ("unknown register name %qs in %", regname);
+ error_seen = true;
}
  else if (j == -4)
{
@@ -3202,7 +3204,10 @@ expand_asm_stmt (gasm *stmt)
&& REG_P (DECL_RTL (output_tvec[j]))
&& HARD_REGISTER_P (DECL_RTL (output_tvec[j]))
&& output_hregno == REGNO (DECL_RTL (output_tvec[j])))
- error ("invalid hard register usage between output operands");
+ {
+   error ("invalid hard register usage between output operands");
+   error_seen = true;
+ }
 
  /* Verify matching constraint operands use the same hard register
 and that the non-matching constraint operands do not use the same
@@ -3225,13 +3230,19 @@ expand_asm_stmt (gasm *stmt)
  }
if (i == match
&& output_hregno != input_hregno)
- error ("invalid hard register usage between output operand "
-"and matching constraint operand");
+ {
+   error ("invalid hard register usage between output "
+  "operand and matching constraint operand");
+   error_seen = true;
+  

Re: [PATCH 3/4] remove %K from error() calls in the aarch64/arm back ends (PR 98512)

2021-06-11 Thread Martin Sebor via Gcc-patches

On 6/11/21 7:10 AM, Christophe Lyon wrote:

On Fri, 11 Jun 2021 at 09:53, Christophe Lyon
 wrote:


On Fri, 11 Jun 2021 at 01:29, Martin Sebor via Gcc-patches
 wrote:


This patch removes the uses of %K from error() calls in the aarch64
and arm back ends.  I tested this change only by building a cross-
compiler but I can't easily run the tests so I'd appreciate some help
validating it.  The fallout from the change should be limited to changes
to error messages so in the worst case it could be addressed after
committing the patch.


I've submitted a validation with your patch, I'll let you know the
results (in a few hours)



I haven't noticed any regression with that patch alone.
I just noticed it is patch 3/4: I didn't apply patches 1 and 2 before
this one, do you want me to relaunch the tests with the 3 patches
combined?


That would be great, as soon as I fix the thinko Richard Sandiford
pointed out.

Thanks for the offer!

Martin



Thanks,

Christophe


Thanks

Christophe




Re: [PATCH 3/4] remove %K from error() calls in the aarch64/arm back ends (PR 98512)

2021-06-11 Thread Martin Sebor via Gcc-patches

On 6/11/21 3:58 AM, Richard Sandiford wrote:

Martin Sebor via Gcc-patches  writes:

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7b37e1b602c..7cdc824730c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13242,13 +13242,8 @@ bounds_check (rtx operand, HOST_WIDE_INT low, 
HOST_WIDE_INT high,
lane = INTVAL (operand);
  
if (lane < low || lane >= high)

-{
-  if (exp)
-   error ("%K%s %wd out of range %wd - %wd",
-  exp, desc, lane, low, high - 1);
-  else
-   error ("%s %wd out of range %wd - %wd", desc, lane, low, high - 1);
-}
+error_at (EXPR_LOCATION (exp),
+ "%s %wd out of range %wd - %wd", desc, lane, low, high - 1);
  }
  
  /* Bounds-check lanes.  */


This part doesn't look safe: “exp” is null when called from arm_const_bounds.


Doh!  Yes, will fix, thanks.

Martin



Thanks,
Richard





Re: GCC documentation: porting to Sphinx

2021-06-11 Thread Martin Liška

On 6/11/21 1:48 AM, Martin Sebor wrote:

On 6/10/21 7:18 AM, Martin Liška wrote:

On 6/10/21 11:07 AM, Martin Liška wrote:

Doing that, one has 2 unique links, that would be needed for get_option_url 
function.
Plus, both :option:`-Wfoo` and :option:`-Wno-foo` references are going to work.


And I've actually did the transformation and one can see it e.g. here:
https://splichal.eu/scripts/sphinx/gcc/_build/html/gcc-command-options/options-to-request-or-suppress-warnings.html#cmdoption-Wprio-ctor-dtor 


I find the style you have below right now clearer than keeping both
options in the same heading and adding a Note explaining the default
etc.  I.e., this

   _
   -Wchar-subscripts

     Warn if an array subscript has type char. This is a common cause
     of error, as programmers often forget that this type is signed on
     some machines. This warning is enabled by -Wall.

   
   -Wno-char-subscripts

    Default option value for -Wchar-subscripts.


Yes, fully agree with you!

Thanks,
Martin



on this page right now:
https://splichal.eu/scripts/sphinx/gcc/_build/html/gcc-command-options/options-to-request-or-suppress-warnings.html#cmdoption-Wchar-subscripts

seems better than this:

   _
   -Wno-shift-overflow, -Wshift-overflow=n, -Wshift-overflow

     +--+
     | (!) Note |
     +---
     | Default value is -Wno-shift-overflow, -Wshift-overflow is    |
     | enabled by -Wall.    |
     +--+

     These options control warnings about left shift overflows.

and also better than the alternative with (non-default) after the option
in the heading.

https://splichal.eu/scripts/sphinx/demo/_build/html/#cmdoption-Wno-shift-overflow3

Martin




Martin






Re: GCC documentation: porting to Sphinx

2021-06-11 Thread Martin Liška

On 6/10/21 6:49 PM, Joseph Myers wrote:

On Thu, 10 Jun 2021, Martin Liška wrote:


1) Can we organize the new documentation in $gccroot/doc folder
similarly to what I have in texi2rst-generated repo? Would be beneficial
as we can have a single Makefile and shared content will be in a same
depth to the individual manuals.


Where languages have their own manuals, I think it's more appropriate for
those to go under the language-specific directories.


So it will require the following folder structure:

$gccroot/gcc/doc/gcc - for GCC documentation
$gccroot/gcc/doc/gccint - for GCC internal documentation
$gccroot/gcc/doc/gfortran - for Fortran documentation
$gccroot/gcc/doc/gccgo - for GO documentation
...
$gccroot/doc/share - shared components
$gccroot/libgomp/doc - for libgomp documentation
...

Are you fine with that?



That doesn't stop the use of shared makefile code.  Make-lang.in is a
fragment included from gcc/Makefile.in ("-include $(LANG_MAKEFRAGS)").  I
certainly expect it should be possible to write GNU make code in
gcc/Makefile.in for building and installing manuals, such that
subdirectories only need to define a few variables describing what manuals
they have and everything else is handled by common code.



The Sphinx Makefile will be capable of e.g.
make html -C $gccroot/gcc/doc/gcc BUILDDIR=`pwd/put_it_somewhere`

and the only configure dependency will VERSION_PACKAGE and BUGURL which will
be provided in env:
https://github.com/marxin/texi2rst-generated/blob/6cfcb7b8ae6497d49ea23a38262dfa26854bdb40/sphinx/baseconf.py#L38-L39

Martin


Re: [PATCH] c++: Substitute into function parms in lexical order [PR96560]

2021-06-11 Thread Patrick Palka via Gcc-patches
On Thu, Apr 29, 2021 at 7:48 AM Patrick Palka  wrote:
>
> On Wed, 28 Apr 2021, Jason Merrill wrote:
>
> > On 4/28/21 2:24 PM, Patrick Palka wrote:
> > > This makes tsubst_arg_types substitute into a function's parameter types
> > > in left-to-right order instead of right-to-left order, in accordance with
> > > DR 1227.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > > trunk?  [ diff generated with -w to hide noisy whitespace changes ]
> >
> > OK. We'll still substitute lambda default args in reverse order, but that
> > shouldn't happen in sfinae context, so that shouldn't be a problem.  Maybe 
> > add
> > an assert (complain & tf_error) in the lambda case?
>
> Apparently cpp2a/lambda-uneval2.C trips over such an assert during
> deduction for:
>
>   template 
>   auto j(T t) -> decltype([](auto x) -> decltype(x.invalid) { } (t));
>
> which makes sense, I suppose.
>
> It looks like we can just move up the default argument processing to
> before the recursive call to tsubst_arg_types, as in the below.
> Bootstrapped and regtested on x86_64-pc-linux-gnu.

Ping

>
> BTW, now that LAMBDA_EXPR has LAMBDA_EXPR_REGEN_INFO I think we have
> enough information to defer substituting into lambda default arguments
> until they're actually needed.  Would that be something to look into as
> a followup patch?
>
> -- >8 --
>
> Subject: [PATCH] c++: Substitute into function parms in lexical order
>  [PR96560]
>
> This makes tsubst_arg_types substitute into a function's parameter types
> in left-to-right order instead of right-to-left order, in accordance with
> DR 1227.
>
> gcc/cp/ChangeLog:
>
> DR 1227
> PR c++/96560
> * pt.c (tsubst_arg_types): Rearrange so that we substitute into
> TYPE_ARG_TYPES in forward order while short circuiting
> appropriately.  Adjust formatting.
>
> gcc/testsuite/ChangeLog:
>
> DR 1227
> PR c++/96560
> * g++.dg/template/sfinae-dr1227.C: New test.
> ---
>  gcc/cp/pt.c   | 51 ++-
>  gcc/testsuite/g++.dg/template/sfinae-dr1227.C | 23 +
>  2 files changed, 51 insertions(+), 23 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/template/sfinae-dr1227.C
>
> diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
> index eaf46659f85..e6d65595e2f 100644
> --- a/gcc/cp/pt.c
> +++ b/gcc/cp/pt.c
> @@ -15068,20 +15068,13 @@ tsubst_arg_types (tree arg_types,
>   tsubst_flags_t complain,
>   tree in_decl)
>  {
> -  tree remaining_arg_types;
>tree type = NULL_TREE;
> -  int i = 1;
> +  int len = 1;
>tree expanded_args = NULL_TREE;
> -  tree default_arg;
>
>if (!arg_types || arg_types == void_list_node || arg_types == end)
>  return arg_types;
>
> -  remaining_arg_types = tsubst_arg_types (TREE_CHAIN (arg_types),
> - args, end, complain, in_decl);
> -  if (remaining_arg_types == error_mark_node)
> -return error_mark_node;
> -
>if (PACK_EXPANSION_P (TREE_VALUE (arg_types)))
>  {
>/* For a pack expansion, perform substitution on the
> @@ -15092,7 +15085,7 @@ tsubst_arg_types (tree arg_types,
>
>if (TREE_CODE (expanded_args) == TREE_VEC)
>  /* So that we'll spin through the parameters, one by one.  */
> -i = TREE_VEC_LENGTH (expanded_args);
> +   len = TREE_VEC_LENGTH (expanded_args);
>else
>  {
>/* We only partially substituted into the parameter
> @@ -15101,14 +15094,15 @@ tsubst_arg_types (tree arg_types,
>expanded_args = NULL_TREE;
>  }
>  }
> +  else
> +type = tsubst (TREE_VALUE (arg_types), args, complain, in_decl);
>
> -  while (i > 0) {
> ---i;
> -
> +  /* Check if a substituted type is erroneous before substituting into
> + the rest of the chain.  */
> +  for (int i = 0; i < len; i++)
> +{
>if (expanded_args)
> type = TREE_VEC_ELT (expanded_args, i);
> -else if (!type)
> -  type = tsubst (TREE_VALUE (arg_types), args, complain, in_decl);
>
>if (type == error_mark_node)
> return error_mark_node;
> @@ -15122,15 +15116,12 @@ tsubst_arg_types (tree arg_types,
> }
>   return error_mark_node;
> }
> -
> -/* Do array-to-pointer, function-to-pointer conversion, and ignore
> -   top-level qualifiers as required.  */
> -type = cv_unqualified (type_decays_to (type));
> +}
>
>/* We do not substitute into default arguments here.  The standard
>   mandates that they be instantiated only when needed, which is
>   done in build_over_call.  */
> -default_arg = TREE_PURPOSE (arg_types);
> +  tree default_arg = TREE_PURPOSE (arg_types);
>
>/* Except that we do substitute default arguments under tsubst_lambda_expr,
>   since the new op() won't have any associated template arguments for us
> @@ -15139,20 +15130,34 @@ tsubst_arg_types (tree arg_types,
>  default_arg = 

Re: For 'OMP_CLAUSE' in 'dump_generic_node', dump the whole OMP clause chain

2021-06-11 Thread Jakub Jelinek via Gcc-patches
On Fri, Jun 11, 2021 at 04:04:35PM +0200, Thomas Schwinge wrote:
> >From db04d261071eb5691906fd46436032a7db8a0b02 Mon Sep 17 00:00:00 2001
> From: Thomas Schwinge 
> Date: Fri, 11 Jun 2021 15:37:33 +0200
> Subject: [PATCH] For 'OMP_CLAUSE' in 'dump_generic_node', dump the whole OMP
>  clause chain
> 
> ... instead of just the first clause.
> 
>   gcc/
>   * tree-pretty-print.h (dump_omp_clauses): Add 'bool = true'
>   default argument.
>   * tree-pretty-print.c (dump_omp_clauses): Update.
>   (dump_generic_node) : Use it.
>   gcc/testsuite/
>   * gcc.dg/gomp/simd-clones-2.c: Enhance.

LGTM, thanks.

Jakub



For 'OMP_CLAUSE' in 'dump_generic_node', dump the whole OMP clause chain

2021-06-11 Thread Thomas Schwinge
Hi!

See attached.  OK to push after testing -- or would there ever be a
reason where this is not appropriate?


For example, for 'gcc.dg/gomp/simd-clones-2.c', this changes ("corrects")
the '*.optimized' dump file as follows:

@@ -1,7 +1,7 @@

 ;; Function addit (addit, funcdef_no=0, decl_uid=2073, cgraph_uid=1, 
symbol_order=0)

-__attribute__((omp declare simd (notinbranch), omp declare simd 
(inbranch)))
+__attribute__((omp declare simd (notinbranch aligned(2:32)), omp declare 
simd (inbranch uniform(2) linear(1:66
 int addit (int a, int b, int * c)
 {
   int _3;
@@ -16,7 +16,7 @@ int addit (int a, int b, int * c)

 ;; Function setArray (setArray, funcdef_no=1, decl_uid=2078, cgraph_uid=2, 
symbol_order=1)

-__attribute__((omp declare simd (notinbranch)))
+__attribute__((omp declare simd (notinbranch uniform(0) aligned(0:32) 
linear(2:1
 float setArray (float * a, float x, int k)
 {
   long unsigned int _1;
@@ -40,7 +40,7 @@ float setArray (float * a, float x, int k)

 ;; Function setArray.simdclone.0 (_ZGVbN4ua32vl_setArray, funcdef_no=2, 
decl_uid=2089, cgraph_uid=3, symbol_order=2)

-__attribute__((omp declare simd (notinbranch)))
+__attribute__((omp declare simd (notinbranch uniform(0) aligned(0:32) 
linear(2:1
 vector(4) float setArray.simdclone.0 (float * a, vector(4) float simd.4, 
int k)
 {
   vector(4) float vect__8.86;
@@ -65,7 +65,7 @@ vector(4) float setArray.simdclone.0 (float * a, 
vector(4) float simd.4, int k)

 ;; Function setArray.simdclone.1 (_ZGVcN8ua32vl_setArray, funcdef_no=3, 
decl_uid=2100, cgraph_uid=5, symbol_order=4)

-__attribute__((omp declare simd (notinbranch)))
+__attribute__((omp declare simd (notinbranch uniform(0) aligned(0:32) 
linear(2:1
 vector(8) float setArray.simdclone.1 (float * a, vector(8) float simd.8, 
int k)
 {
   vector(8) float vect__8.98;
@@ -90,7 +90,7 @@ vector(8) float setArray.simdclone.1 (float * a, 
vector(8) float simd.8, int k)

 ;; Function setArray.simdclone.2 (_ZGVdN8ua32vl_setArray, funcdef_no=4, 
decl_uid=2372, cgraph_uid=6, symbol_order=5)

-__attribute__((omp declare simd (notinbranch)))
+__attribute__((omp declare simd (notinbranch uniform(0) aligned(0:32) 
linear(2:1
 vector(8) float setArray.simdclone.2 (float * a, vector(8) float simd.12, 
int k)
 {
   vector(8) float vect__8.110;
@@ -115,7 +115,7 @@ vector(8) float setArray.simdclone.2 (float * a, 
vector(8) float simd.12, int k)

 ;; Function setArray.simdclone.3 (_ZGVeN16ua32vl_setArray, funcdef_no=5, 
decl_uid=2558, cgraph_uid=7, symbol_order=6)

-__attribute__((omp declare simd (notinbranch)))
+__attribute__((omp declare simd (notinbranch uniform(0) aligned(0:32) 
linear(2:1
 vector(16) float setArray.simdclone.3 (float * a, vector(16) float 
simd.16, int k)
 {
   vector(16) float vect__8.122;
@@ -140,7 +140,7 @@ vector(16) float setArray.simdclone.3 (float * a, 
vector(16) float simd.16, int

 ;; Function addit.simdclone.0 (_ZGVbN4vvva32_addit, funcdef_no=6, 
decl_uid=3029, cgraph_uid=8, symbol_order=7)

-__attribute__((omp declare simd (notinbranch), omp declare simd 
(inbranch)))
+__attribute__((omp declare simd (notinbranch aligned(2:32)), omp declare 
simd (inbranch uniform(2) linear(1:66
 vector(4) int addit.simdclone.0 (vector(4) int simd.22, vector(4) int 
simd.23, vector(2) unsigned long simd.24, vector(2) unsigned long simd.25)
 {
   vector(4) int vect__3.134;
@@ -155,7 +155,7 @@ vector(4) int addit.simdclone.0 (vector(4) int simd.22, 
vector(4) int simd.23, v

 ;; Function addit.simdclone.1 (_ZGVcN4vvva32_addit, funcdef_no=7, 
decl_uid=3046, cgraph_uid=9, symbol_order=8)

-__attribute__((omp declare simd (notinbranch), omp declare simd 
(inbranch)))
+__attribute__((omp declare simd (notinbranch aligned(2:32)), omp declare 
simd (inbranch uniform(2) linear(1:66
 vector(4) int addit.simdclone.1 (vector(4) int simd.31, vector(4) int 
simd.32, vector(2) unsigned long simd.33, vector(2) unsigned long simd.34)
 {
   vector(4) int vect__3.144;
@@ -170,7 +170,7 @@ vector(4) int addit.simdclone.1 (vector(4) int simd.31, 
vector(4) int simd.32, v

 ;; Function addit.simdclone.2 (_ZGVdN8vvva32_addit, funcdef_no=8, 
decl_uid=3063, cgraph_uid=10, symbol_order=9)

-__attribute__((omp declare simd (notinbranch), omp declare simd 
(inbranch)))
+__attribute__((omp declare simd (notinbranch aligned(2:32)), omp declare 
simd (inbranch uniform(2) linear(1:66
 vector(8) int addit.simdclone.2 (vector(8) int simd.40, vector(8) int 
simd.41, vector(4) unsigned long simd.42, vector(4) unsigned long simd.43)
 {
   vector(8) int vect__3.154;
@@ -185,7 +185,7 @@ vector(8) int addit.simdclone.2 (vector(8) int simd.40, 
vector(8) int simd.41, v

 ;; Function addit.simdclone.3 (_ZGVeN16vvva32_addit, 

[PATCH] tree-optimization/101025 - fix store-motion dependence checking

2021-06-11 Thread Richard Biener
This plugs a hole in store-motion where it fails to perform dependence
checking on conditionally executed but not store-motioned refs.

Bootstrapped on x86_64-unknown-linux-gnu, regtest in progress.

2021-06-11  Richard Biener  

PR tree-optimization/101025
* tree-ssa-loop-im.c (sm_seq_valid_bb): Make sure to process
all refs that require dependence checking.

* gcc.dg/torture/pr101025.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr101025.c | 23 +++
 gcc/tree-ssa-loop-im.c  | 38 +++--
 2 files changed, 59 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr101025.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr101025.c 
b/gcc/testsuite/gcc.dg/torture/pr101025.c
new file mode 100644
index 000..483e0ff7997
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr101025.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+
+int a[10];
+int b, d, g;
+volatile char c;
+short e;
+volatile int f;
+int main()
+{
+  for (; d <= 9; d++) {
+  b = e = 0;
+  for (; e < 4; e++)
+a[e] = 4;
+  for (; b <= 3; b++)
+if (g)
+  f = 0;
+else
+  a[b] = c;
+  }
+  if (a[1] != 0)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 8034cf68d27..1c865b28fd6 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -2169,6 +2169,7 @@ execute_sm (class loop *loop, im_mem_ref *ref,
 enum sm_kind { sm_ord, sm_unord, sm_other };
 struct seq_entry
 {
+  seq_entry () {}
   seq_entry (unsigned f, sm_kind k, tree fr = NULL)
 : first (f), second (k), from (fr) {}
   unsigned first;
@@ -2352,6 +2353,8 @@ sm_seq_valid_bb (class loop *loop, basic_block bb, tree 
vdef,
  unsigned min_len = MIN(first_edge_seq.length (),
 edge_seq.length ());
  /* Incrementally merge seqs into first_edge_seq.  */
+ int first_uneq = -1;
+ auto_vec extra_refs;
  for (unsigned int i = 0; i < min_len; ++i)
{
  /* ???  We can more intelligently merge when we face different
@@ -2367,6 +2370,11 @@ sm_seq_valid_bb (class loop *loop, basic_block bb, tree 
vdef,
bitmap_set_bit (refs_not_supported, edge_seq[i].first);
  first_edge_seq[i].second = sm_other;
  first_edge_seq[i].from = NULL_TREE;
+ /* Record the dropped refs for later processing.  */
+ if (first_uneq == -1)
+   first_uneq = i;
+ extra_refs.safe_push (seq_entry (edge_seq[i].first,
+  sm_other, NULL_TREE));
}
  /* sm_other prevails.  */
  else if (first_edge_seq[i].second != edge_seq[i].second)
@@ -2399,10 +2407,36 @@ sm_seq_valid_bb (class loop *loop, basic_block bb, tree 
vdef,
}
  else if (edge_seq.length () > first_edge_seq.length ())
{
+ if (first_uneq == -1)
+   first_uneq = first_edge_seq.length ();
  for (unsigned i = first_edge_seq.length ();
   i < edge_seq.length (); ++i)
-   if (edge_seq[i].second == sm_ord)
- bitmap_set_bit (refs_not_supported, edge_seq[i].first);
+   {
+ if (edge_seq[i].second == sm_ord)
+   bitmap_set_bit (refs_not_supported, edge_seq[i].first);
+ extra_refs.safe_push (seq_entry (edge_seq[i].first,
+  sm_other, NULL_TREE));
+   }
+   }
+ /* Put unmerged refs at first_uneq to force dependence checking
+on them.  */
+ if (first_uneq != -1)
+   {
+ /* Missing ordered_splice_at.  */
+ if ((unsigned)first_uneq == first_edge_seq.length ())
+   first_edge_seq.safe_splice (extra_refs);
+ else
+   {
+ unsigned fes_length = first_edge_seq.length ();
+ first_edge_seq.safe_grow (fes_length
+   + extra_refs.length ());
+ memmove (_edge_seq[first_uneq + extra_refs.length 
()],
+  _edge_seq[first_uneq],
+  (fes_length - first_uneq) * sizeof (seq_entry));
+ memcpy (_edge_seq[first_uneq],
+ extra_refs.address (),
+ extra_refs.length () * sizeof (seq_entry));
+   }
}
}
  /* Use the sequence from the first edge and push SMs down.  */
-- 
2.26.2


[committed] analyzer: tweak priority of callstrings in worklist::key_t::cmp

2021-06-11 Thread David Malcolm via Gcc-patches
While debugging another issue I noticed that the analyzer could fail to
merge nodes for control flow in which one path had called a function
and another path hadn't:

BB
   /  \
  /\
 fn call   no fn call
  \/
   \  /
 join BB

The root cause was that the worklist sort function wasn't prioritizing
call strings, and thus it was fully exploring the "no function called"
path to the exit BB, and only then exploring the "within the function call"
parts of the "funcion called" path.

This patch prioritizes call strings when sorting the worklist so that
the nodes with deeper call strings are processed before those with shallower
call strings, thus allowing such nodes to be merged at the joinpoint.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as 9d20ec97475b1102d6ca005ad165056d34615a3d.

gcc/analyzer/ChangeLog:
* engine.cc (worklist::key_t::cmp): Move sort by call_string to
before SCC.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c: Update
expected number of enodes after the loop.
* gcc.dg/analyzer/paths-8.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/engine.cc| 25 ++-
 .../loop-0-up-to-n-by-1-with-iter-obj.c   |  3 +--
 gcc/testsuite/gcc.dg/analyzer/paths-8.c   | 17 +
 3 files changed, 37 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/paths-8.c

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 5b519fdf385..48320bc062e 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -2004,7 +2004,25 @@ worklist::key_t::cmp (const worklist::key_t , const 
worklist::key_t )
return cmp;
 }
 
-  /* First, order by SCC.  */
+  /* Sort by callstring, so that nodes with deeper call strings are processed
+ before those with shallower call strings.
+ If we have
+ splitting BB
+ /  \
+/\
+   fn call   no fn call
+\/
+ \  /
+join BB
+ then we want the path inside the function call to be fully explored up
+ to the return to the join BB before we explore on the "no fn call" path,
+ so that both enodes at the join BB reach the front of the worklist at
+ the same time and thus have a chance of being merged.  */
+  int cs_cmp = call_string::cmp (call_string_a, call_string_b);
+  if (cs_cmp)
+return cs_cmp;
+
+  /* Order by SCC.  */
   int scc_id_a = ka.get_scc_id (ka.m_enode);
   int scc_id_b = kb.get_scc_id (kb.m_enode);
   if (scc_id_a != scc_id_b)
@@ -2033,11 +2051,6 @@ worklist::key_t::cmp (const worklist::key_t , const 
worklist::key_t )
 
   gcc_assert (snode_a == snode_b);
 
-  /* The points might vary by callstring; try sorting by callstring.  */
-  int cs_cmp = call_string::cmp (call_string_a, call_string_b);
-  if (cs_cmp)
-return cs_cmp;
-
   /* Order within supernode via program point.  */
   int within_snode_cmp
 = function_point::cmp_within_supernode (point_a.get_function_point (),
diff --git a/gcc/testsuite/gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c 
b/gcc/testsuite/gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c
index 2b0352711ae..0172c9b324c 100644
--- a/gcc/testsuite/gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c
+++ b/gcc/testsuite/gcc.dg/analyzer/loop-0-up-to-n-by-1-with-iter-obj.c
@@ -69,6 +69,5 @@ void test(int n)
 
   free (it);
 
-  __analyzer_dump_exploded_nodes (0); /* { dg-warning "2 processed enodes" } */
-  // TODO: why 2 enodes here, rather than 1
+  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed enode" } */
 }
diff --git a/gcc/testsuite/gcc.dg/analyzer/paths-8.c 
b/gcc/testsuite/gcc.dg/analyzer/paths-8.c
new file mode 100644
index 000..b350d4d7dbd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/paths-8.c
@@ -0,0 +1,17 @@
+#include "analyzer-decls.h"
+
+static void __attribute__((noinline))
+__analyzer_callee_1 (void)
+{
+  /* empty.  */
+}
+
+void
+test_1 (int flag)
+{
+  if (flag)
+__analyzer_callee_1 ();
+
+  /* Verify that we merge state, whether or not the call happens.  */
+  __analyzer_dump_exploded_nodes (0); /* { dg-warning "1 processed enode" } */
+}
-- 
2.26.3



[PATCH v2] combine: Tweak the condition of last_set invalidation

2021-06-11 Thread Kewen.Lin via Gcc-patches
Hi Segher,

Thanks for the review!

on 2021/6/10 上午4:17, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Dec 16, 2020 at 04:49:49PM +0800, Kewen.Lin wrote:
>> Currently we have the check:
>>
>>   if (!insn
>>|| (value && rsp->last_set_table_tick >= label_tick_ebb_start))
>>  rsp->last_set_invalid = 1; 
>>
>> which means if we want to record some value for some reg and
>> this reg got refered before in a valid scope,
> 
> If we already know it is *set* in this same extended basic block.
> Possibly by the same instruction btw.
> 
>> we invalidate the
>> set of reg (last_set_invalid to 1).  It avoids to find the wrong
>> set for one reg reference, such as the case like:
>>
>>... op regX  // this regX could find wrong last_set below
>>regX = ...   // if we think this set is valid
>>... op regX
> 
> Yup, exactly.
> 
>> But because of retry's existence, the last_set_table_tick could
>> be set by some later reference insns, but we see it's set due
>> to retry on the set (for that reg) insn again, such as:
>>
>>insn 1
>>insn 2
>>
>>regX = ... --> (a)
>>... op regX--> (b)
>>
>>insn 3
>>
>>// assume all in the same BB.
>>
>> Assuming we combine 1, 2 -> 3 sucessfully and replace them as two
>> (3 insns -> 2 insns),
> 
> This will delete insn 1 and write the combined result to insns 2 and 3.
> 
>> retrying from insn1 or insn2 again:
> 
> Always 2, but your point remains valid.
> 
>> it will scan insn (a) again, the below condition holds for regX:
>>
>>   (value && rsp->last_set_table_tick >= label_tick_ebb_start)
>>
>> it will mark this set as invalid set.  But actually the
>> last_set_table_tick here is set by insn (b) before retrying, so it
>> should be safe to be taken as valid set.
> 
> Yup.
> 
>> This proposal is to check whether the last_set_table safely happens
>> after the current set, make the set still valid if so.
> 
>> Full SPEC2017 building shows this patch gets more sucessful combines
>> from 1902208 to 1902243 (trivial though).
> 
> Do you have some example, or maybe even a testcase?  :-)
> 

Sorry for the late reply, it took some time to get one reduced case.

typedef struct SA *pa_t;

struct SC {
  int h;
  pa_t elem[];
};

struct SD {
  struct SC *e;
};

struct SA {
  struct {
struct SD f[1];
  } g;
};

void foo(pa_t *k, char **m) {
  int l, i;
  pa_t a;
  l = (int)a->g.f[5].e;
  i = 0;
  for (; i < l; i++) {
k[i] = a->g.f[5].e->elem[i];
m[i] = "";
  }
}

Baseline is r12-0 and the option is "-O3 -mcpu=power9 -fno-strict-aliasing",
with this patch, the generated assembly can save two rlwinm s.

>> +  /* Record the luid of the insn whose expression involving register n.  */
>> +
>> +  int   last_set_table_luid;
> 
> "Record the luid of the insn for which last_set_table_tick was set",
> right?
> 

But it can be updated later to one smaller luid, how about the wording like:


+  /* Record the luid of the insn which uses register n, the insn should
+ be the first one using register n in that block of the insn which
+ last_set_table_tick was set for.  */


>> -static void update_table_tick (rtx);
>> +static void update_table_tick (rtx, int);
> 
> Please remove this declaration instead, the function is not used until
> after its actual definition :-)
> 

Done.

>> @@ -13243,7 +13247,21 @@ update_table_tick (rtx x)
>>for (r = regno; r < endregno; r++)
>>  {
>>reg_stat_type *rsp = _stat[r];
>> -  rsp->last_set_table_tick = label_tick;
>> +  if (rsp->last_set_table_tick >= label_tick_ebb_start)
>> +{
>> +  /* Later references should not have lower ticks.  */
>> +  gcc_assert (label_tick >= rsp->last_set_table_tick);
> 
> This should be obvious, but checking it won't hurt, okay.
> 
>> +  /* Should pick up the lowest luid if the references
>> + are in the same block.  */
>> +  if (label_tick == rsp->last_set_table_tick
>> +  && rsp->last_set_table_luid > insn_luid)
>> +rsp->last_set_table_luid = insn_luid;
> 
> Why?  Is it conservative for the check you will do later?  Please spell
> this out, it is crucial!
> 

Since later the combinations involving this insn probably make the
register be used in one insn sitting ahead (which has smaller luid than
the one which was recorded before).  Yes, it's very conservative, this
ensure that we always use the luid of the insn which is the first insn
using this register in the block.  The last_set invalidation is going
to catch the case like:

   ... regX  // avoid the set used here ...
   regX = ...
   ...

Once we have the smallest luid one of all insns which use register X,
any unsafe regX sets should be caught.

I updated the comments to:

+  /* Since combination may generate some instructions
+ to replace some foregoing instructions with the
+ references to register r (using register r), we
+ need to 

Re: [PATCH 3/4] remove %K from error() calls in the aarch64/arm back ends (PR 98512)

2021-06-11 Thread Christophe Lyon via Gcc-patches
On Fri, 11 Jun 2021 at 09:53, Christophe Lyon
 wrote:
>
> On Fri, 11 Jun 2021 at 01:29, Martin Sebor via Gcc-patches
>  wrote:
> >
> > This patch removes the uses of %K from error() calls in the aarch64
> > and arm back ends.  I tested this change only by building a cross-
> > compiler but I can't easily run the tests so I'd appreciate some help
> > validating it.  The fallout from the change should be limited to changes
> > to error messages so in the worst case it could be addressed after
> > committing the patch.
>
> I've submitted a validation with your patch, I'll let you know the
> results (in a few hours)
>

I haven't noticed any regression with that patch alone.
I just noticed it is patch 3/4: I didn't apply patches 1 and 2 before
this one, do you want me to relaunch the tests with the 3 patches
combined?

Thanks,

Christophe

> Thanks
>
> Christophe


Re: [GCC][Patch] arm: Fix the mve multilib for the broken cmse support (pr99939).

2021-06-11 Thread Richard Earnshaw via Gcc-patches




On 11/06/2021 14:02, Srinath Parvathaneni via Gcc-patches wrote:

Hi Richard,

I have addressed all your review comments in 
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571739.html
in the following patch.

The current CMSE support in the multilib build for "-march=armv8.1-m.main+mve 
-mfloat-abi=hard -mfpu=auto"
is broken as specified in PR99939 and this patch fixes the issue.

Regression tested on arm-none-eabi and found no regressions.

Ok for master? and Ok for GCC-10 branch?

Regards,
Srinath.

gcc/testsuite/ChangeLog:

2021-06-11  Srinath Parvathaneni  

PR target/99939
* gcc.target/arm/cmse/cmse-18.c: Add separate scan-assembler
directives check for target is v8.1-m.main+mve or not before
comparing the assembly output.
* gcc.target/arm/cmse/cmse-20.c: New test.

libgcc/ChangeLog:

2021-06-11  Srinath Parvathaneni  

PR target/99939
* config/arm/cmse_nonsecure_call.S: Add __ARM_FEATURE_MVE
macro.
* config/arm/t-arm: To link cmse.o and cmse_nonsecure_call.o
on passing -mcmse option.




OK.

R.


### Attachment also inlined for ease of reply###


diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c
index 
e1ff09257b7900982f49117d4cfc16f3bd79d76c..db7d975a90ea4bd1810aea03949ec1e8837e
 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c
@@ -8,4 +8,5 @@ void bar(f func, int a)
func(a);
  }
  
-/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" } } */

+/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" { 
target { ! arm_v8_1m_mve_ok } } } } */
+/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r\[0-7\]:SI\\\]\\\]" "final" 
{ target { arm_v8_1m_mve_ok } } } } */
diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c
new file mode 100644
index 
..08e89bff6378f1f96950fc40f3ab3946bd433773
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c
@@ -0,0 +1,28 @@
+/* This test is executed only if the execution engine supports CMSE 
instructions.  */
+/* { dg-options "--save-temps -mcmse 
-Wl,--section-start,.gnu.sgstubs=0x0040" } */
+
+#include 
+#include 
+#include 
+
+void __attribute__((cmse_nonsecure_entry))
+secure_fun (int a, int *p)
+{
+  void *b = cmse_check_address_range ((void *)p, a, 1);
+
+  if (b == NULL)
+   __builtin_abort ();
+  printf("%d", *((int *)b));
+}
+
+int
+main (void)
+{
+  int *ptr;
+  int size = 1;
+  ptr = (int *) calloc (1, sizeof(int *));
+  *ptr = 1315852292;
+  secure_fun (size, ptr);
+  free (ptr);
+  return 0;
+}
diff --git a/libgcc/config/arm/cmse_nonsecure_call.S 
b/libgcc/config/arm/cmse_nonsecure_call.S
index 
146f3ed52e9c7e915e5dbd9b70624ec3bd7cd5b5..00830ade98ea650c328c709d5d308fbc96f7f21c
 100644
--- a/libgcc/config/arm/cmse_nonsecure_call.S
+++ b/libgcc/config/arm/cmse_nonsecure_call.S
@@ -25,7 +25,7 @@
  
  .syntax unified

  #ifdef __ARM_PCS_VFP
-# if __ARM_FP & 0x8
+# if (__ARM_FP & 0x8) || (__ARM_FEATURE_MVE & 1)
.fpu fpv5-d16
  # else
.fpu fpv4-sp-d16
@@ -59,7 +59,7 @@ vmov  s24, s25, r5, r5
  vmov  s26, s27, r5, r5
  vmov  s28, s29, r5, r5
  vmov  s30, s31, r5, r5
-#elif __ARM_FP & 0x08
+#elif (__ARM_FP & 0x8) || (__ARM_FEATURE_MVE & 1)
  vmov.f64d9, d8
  vmov.f64d10, d8
  vmov.f64d11, d8
diff --git a/libgcc/config/arm/t-arm b/libgcc/config/arm/t-arm
index 
3625a2590beec4e4e0e0881be9ad284c595c7190..c1553d4e5d80751b13dc2e9c9e36d5ebe82e5f8c
 100644
--- a/libgcc/config/arm/t-arm
+++ b/libgcc/config/arm/t-arm
@@ -3,18 +3,17 @@ LIB1ASMFUNCS = _thumb1_case_sqi _thumb1_case_uqi 
_thumb1_case_shi \
_thumb1_case_uhi _thumb1_case_si _speculation_barrier
  
  HAVE_CMSE:=$(findstring __ARM_FEATURE_CMSE,$(shell $(gcc_compile_bare) -dM -E - 
-HAVE_V81M:=$(findstring armv8.1-m.main,$(gcc_compile_bare))
  ifeq ($(shell $(gcc_compile_bare) -E -mcmse - /dev/null 
2>/dev/null; echo $$?),0)
  CMSE_OPTS:=-mcmse
  endif
  
  ifdef HAVE_CMSE

-ifndef HAVE_V81M
+
  libgcc-objects += cmse.o cmse_nonsecure_call.o
  
  cmse.o: $(srcdir)/config/arm/cmse.c

$(gcc_compile) -c $(CMSE_OPTS) $<
+
  cmse_nonsecure_call.o: $(srcdir)/config/arm/cmse_nonsecure_call.S
   $(gcc_compile) -c $<
  endif
-endif



[GCC][Patch] arm: Fix the mve multilib for the broken cmse support (pr99939).

2021-06-11 Thread Srinath Parvathaneni via Gcc-patches
Hi Richard,

I have addressed all your review comments in 
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571739.html
in the following patch.

The current CMSE support in the multilib build for "-march=armv8.1-m.main+mve 
-mfloat-abi=hard -mfpu=auto"
is broken as specified in PR99939 and this patch fixes the issue.

Regression tested on arm-none-eabi and found no regressions.

Ok for master? and Ok for GCC-10 branch?

Regards,
Srinath.

gcc/testsuite/ChangeLog:

2021-06-11  Srinath Parvathaneni  

PR target/99939
* gcc.target/arm/cmse/cmse-18.c: Add separate scan-assembler
directives check for target is v8.1-m.main+mve or not before
comparing the assembly output.
* gcc.target/arm/cmse/cmse-20.c: New test.

libgcc/ChangeLog:

2021-06-11  Srinath Parvathaneni  

PR target/99939
* config/arm/cmse_nonsecure_call.S: Add __ARM_FEATURE_MVE
macro.
* config/arm/t-arm: To link cmse.o and cmse_nonsecure_call.o
on passing -mcmse option.



### Attachment also inlined for ease of reply###


diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c
index 
e1ff09257b7900982f49117d4cfc16f3bd79d76c..db7d975a90ea4bd1810aea03949ec1e8837e
 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c
@@ -8,4 +8,5 @@ void bar(f func, int a)
   func(a);
 }
 
-/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" } } */
+/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" { 
target { ! arm_v8_1m_mve_ok } } } } */
+/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r\[0-7\]:SI\\\]\\\]" "final" 
{ target { arm_v8_1m_mve_ok } } } } */
diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c
new file mode 100644
index 
..08e89bff6378f1f96950fc40f3ab3946bd433773
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c
@@ -0,0 +1,28 @@
+/* This test is executed only if the execution engine supports CMSE 
instructions.  */
+/* { dg-options "--save-temps -mcmse 
-Wl,--section-start,.gnu.sgstubs=0x0040" } */
+
+#include 
+#include 
+#include 
+
+void __attribute__((cmse_nonsecure_entry))
+secure_fun (int a, int *p)
+{
+  void *b = cmse_check_address_range ((void *)p, a, 1);
+
+  if (b == NULL)
+   __builtin_abort ();
+  printf("%d", *((int *)b));
+}
+
+int
+main (void)
+{
+  int *ptr;
+  int size = 1;
+  ptr = (int *) calloc (1, sizeof(int *));
+  *ptr = 1315852292;
+  secure_fun (size, ptr);
+  free (ptr);
+  return 0;
+}
diff --git a/libgcc/config/arm/cmse_nonsecure_call.S 
b/libgcc/config/arm/cmse_nonsecure_call.S
index 
146f3ed52e9c7e915e5dbd9b70624ec3bd7cd5b5..00830ade98ea650c328c709d5d308fbc96f7f21c
 100644
--- a/libgcc/config/arm/cmse_nonsecure_call.S
+++ b/libgcc/config/arm/cmse_nonsecure_call.S
@@ -25,7 +25,7 @@
 
 .syntax unified
 #ifdef __ARM_PCS_VFP
-# if __ARM_FP & 0x8
+# if (__ARM_FP & 0x8) || (__ARM_FEATURE_MVE & 1)
.fpu fpv5-d16
 # else
.fpu fpv4-sp-d16
@@ -59,7 +59,7 @@ vmov  s24, s25, r5, r5
 vmov   s26, s27, r5, r5
 vmov   s28, s29, r5, r5
 vmov   s30, s31, r5, r5
-#elif __ARM_FP & 0x08
+#elif (__ARM_FP & 0x8) || (__ARM_FEATURE_MVE & 1)
 vmov.f64d9, d8
 vmov.f64d10, d8
 vmov.f64d11, d8
diff --git a/libgcc/config/arm/t-arm b/libgcc/config/arm/t-arm
index 
3625a2590beec4e4e0e0881be9ad284c595c7190..c1553d4e5d80751b13dc2e9c9e36d5ebe82e5f8c
 100644
--- a/libgcc/config/arm/t-arm
+++ b/libgcc/config/arm/t-arm
@@ -3,18 +3,17 @@ LIB1ASMFUNCS = _thumb1_case_sqi _thumb1_case_uqi 
_thumb1_case_shi \
_thumb1_case_uhi _thumb1_case_si _speculation_barrier
 
 HAVE_CMSE:=$(findstring __ARM_FEATURE_CMSE,$(shell $(gcc_compile_bare) -dM -E 
- /dev/null 
2>/dev/null; echo $$?),0)
 CMSE_OPTS:=-mcmse
 endif
 
 ifdef HAVE_CMSE
-ifndef HAVE_V81M
+
 libgcc-objects += cmse.o cmse_nonsecure_call.o
 
 cmse.o: $(srcdir)/config/arm/cmse.c
$(gcc_compile) -c $(CMSE_OPTS) $<
+
 cmse_nonsecure_call.o: $(srcdir)/config/arm/cmse_nonsecure_call.S
   $(gcc_compile) -c $<
 endif
-endif

diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c
index 
e1ff09257b7900982f49117d4cfc16f3bd79d76c..db7d975a90ea4bd1810aea03949ec1e8837e
 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c
@@ -8,4 +8,5 @@ void bar(f func, int a)
   func(a);
 }
 
-/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" } } */
+/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" { 
target { ! arm_v8_1m_mve_ok } } } } */
+/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r\[0-7\]:SI\\\]\\\]" "final" 
{ target { arm_v8_1m_mve_ok } } } } */
diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c 

Re: [PATCH v2] c++: Add C++23 consteval if support - P1938R3 [PR100974]

2021-06-11 Thread Jason Merrill via Gcc-patches

On 6/11/21 6:28 AM, Jakub Jelinek wrote:

On Thu, Jun 10, 2021 at 03:00:54PM -0400, Jason Merrill wrote:

The second is clearer about the fix, the first is clearer about the problem.
Maybe add a fixit to the first error?


Done.


OK, just add a comment then.

OK, just add a comment then.


And added comments.


--- gcc/cp/cp-gimplify.c.jj 2021-06-09 21:54:39.473193977 +0200
+++ gcc/cp/cp-gimplify.c2021-06-10 09:49:35.898557178 +0200
@@ -161,7 +161,9 @@ genericize_if_stmt (tree *stmt_p)
  if (!else_)
else_ = build_empty_stmt (locus);
-  if (integer_nonzerop (cond) && !TREE_SIDE_EFFECTS (else_))
+  if (IF_STMT_CONSTEVAL_P (stmt))
+stmt = else_;


This seems redundant, since you're using boolean_false_node for the
condition.


It is only when !TREE_SIDE_EFFECTS (else_).
I think that is about having labels in the then_/else_ blocks and
gotos jumping into those from outside.
But IF_STMT_CONSTEVAL_P guarantees that is not the case (and we verify
earlier), so we can do that (and in fact, for IF_STMT_CONSTEVAL_P have
to, because then_ could contain consteval calls not constant expression
evaluated).
I guess we could do that for IF_STMT_CONSTEXPR_P too, that also
doesn't allow gotos/switch into the branches.


For this too.



--- gcc/cp/call.c.jj2021-06-09 21:54:39.436194489 +0200
+++ gcc/cp/call.c   2021-06-10 09:49:35.949556470 +0200
@@ -8840,6 +8840,7 @@ immediate_invocation_p (tree fn, int nar
  || !DECL_IMMEDIATE_FUNCTION_P (current_function_decl))
  && (current_binding_level->kind != sk_function_parms
  || !current_binding_level->immediate_fn_ctx_p)
+ && !in_immediate_fn_ctx_p


Now that we have this flag, shouldn't we set it in actual immediate function
context?  Or rename the flag to match when it's actually set.


I guess I can try that.  Though, I'm not sure if we could also get rid of
the current_binding_level->immediate_fn_ctx_p for sk_function_parms case.


Had a look at this, but could handle it.
So instead I've renamed it to in_consteval_if_p.

Ok for trunk if it passes bootstrap/regtest?


OK.


2021-06-11  Jakub Jelinek  

PR c++/100974 - P1938R3 - if consteval
gcc/c-family/
* c-cppbuiltin.c (c_cpp_builtins): Predefine __cpp_if_consteval for
-std=c++2b.
gcc/cp/
* cp-tree.h (struct saved_scope): Add consteval_if_p
member.  Formatting fix for the discarded_stmt comment.
(in_consteval_if_p, IF_STMT_CONSTEVAL_P): Define.
* parser.c (cp_parser_lambda_expression): Temporarily disable
in_consteval_if_p when parsing lambda body.
(cp_parser_selection_statement): Parse consteval if.
* decl.c (struct named_label_entry): Add in_consteval_if member.
(level_for_consteval_if): New function.
(poplevel_named_label_1, check_previous_goto_1, check_goto): Handle
consteval if.
* constexpr.c (cxx_eval_builtin_function_call): Clarify in comment
why CP_BUILT_IN_IS_CONSTANT_EVALUATED needs to *non_constant_p
for !ctx->manifestly_const_eval.
(cxx_eval_conditional_expression): For IF_STMT_CONSTEVAL_P evaluate
condition as if it was __builtin_is_constant_evaluated call.
(potential_constant_expression_1): For IF_STMT_CONSTEVAL_P always
recurse on both branches.
* cp-gimplify.c (genericize_if_stmt): Genericize IF_STMT_CONSTEVAL_P
as the else branch.
* pt.c (tsubst_expr) : Copy IF_STMT_CONSTEVAL_P.
Temporarily set in_consteval_if_p when recursing on
IF_STMT_CONSTEVAL_P then branch.
(tsubst_lambda_expr): Temporarily disable
in_consteval_if_p when instantiating lambda body.
* call.c (immediate_invocation_p): Return false when
in_consteval_if_p.
gcc/testsuite/
* g++.dg/cpp23/consteval-if1.C: New test.
* g++.dg/cpp23/consteval-if2.C: New test.
* g++.dg/cpp23/consteval-if3.C: New test.
* g++.dg/cpp23/consteval-if4.C: New test.
* g++.dg/cpp23/consteval-if5.C: New test.
* g++.dg/cpp23/consteval-if6.C: New test.
* g++.dg/cpp23/consteval-if7.C: New test.
* g++.dg/cpp23/consteval-if8.C: New test.
* g++.dg/cpp23/consteval-if9.C: New test.
* g++.dg/cpp23/consteval-if10.C: New test.
* g++.dg/cpp23/feat-cxx2b.C: Add __cpp_if_consteval tests.

--- gcc/c-family/c-cppbuiltin.c.jj  2021-06-10 19:56:20.584335765 +0200
+++ gcc/c-family/c-cppbuiltin.c 2021-06-11 11:34:15.408578098 +0200
@@ -1029,6 +1029,7 @@ c_cpp_builtins (cpp_reader *pfile)
{
  /* Set feature test macros for C++23.  */
  cpp_define (pfile, "__cpp_size_t_suffix=202011L");
+ cpp_define (pfile, "__cpp_if_consteval=202106L");
}
if (flag_concepts)
  {
--- gcc/cp/cp-tree.h.jj 2021-06-11 11:32:35.857980773 +0200
+++ gcc/cp/cp-tree.h2021-06-11 11:35:52.468210524 +0200
@@ -478,6 +478,7 @@ extern GTY(()) tree cp_global_trees[CPTI
  

Re: [PATCH] rs6000: Support more short/char to float conversion

2021-06-11 Thread Kewen.Lin via Gcc-patches
Hi Segher,

on 2021/6/10 下午6:58, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jun 10, 2021 at 05:32:23PM +0800, Kewen.Lin wrote:
>> +/* { dg-do compile { target lp64 } } */
> 
> One final thing: what requires lp64 here?  Could you try without please?
> 

The lp64 is required for lxsi[bh]zx and mvexts[bh]2d, without it
-m32 testing will have some failures as below:

FAIL: gcc.target/powerpc/p9-fpcvt-3.c scan-assembler \\mlxsibzx\\M
FAIL: gcc.target/powerpc/p9-fpcvt-3.c scan-assembler \\mlxsihzx\\M
FAIL: gcc.target/powerpc/p9-fpcvt-3.c scan-assembler M
FAIL: gcc.target/powerpc/p9-fpcvt-3.c scan-assembler \\mvextsh2d\\M


> Okay for trunk with that considered.  Thanks!
> 

Thanks!  It's committed in r12-1378.

BR,
Kewen


[PATCH] tree-optimization/101028 - fix endless SLP reassoc discovery

2021-06-11 Thread Richard Biener
This fixes a missing clearing of mismatched lanes from the
fatal fail path in SLP reassoc discovery in the most conservative
way.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2021-06-11  Richard Biener  

PR tree-optimization/101028
* tree-vect-slp.c (vect_build_slp_tree_2): When SLP
reassoc discovery fails fatally, mark appropriate lanes
in matches[] so.

* gcc.dg/pr101028.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr101028.c | 34 +
 gcc/tree-vect-slp.c | 26 ++---
 2 files changed, 57 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr101028.c

diff --git a/gcc/testsuite/gcc.dg/pr101028.c b/gcc/testsuite/gcc.dg/pr101028.c
new file mode 100644
index 000..501e6af37cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr101028.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+typedef struct {
+  double x, y;
+} PointInfo;
+
+typedef struct {
+  PointInfo point;
+} PrimitiveInfo;
+
+int TraceBezier_alpha, TraceBezier_i;
+double TraceBezier_weight;
+PointInfo *TraceBezier_points;
+PrimitiveInfo *TraceBezier_primitive_info;
+
+void TracePath() {
+  double *coefficients;
+  PointInfo point;
+  long j;
+  for (; TraceBezier_i; TraceBezier_i++) {
+point.x = point.y = TraceBezier_alpha = 1.0;
+j = 0;
+for (; j < 4; j++) {
+  point.x += TraceBezier_alpha * coefficients[j] *
+ TraceBezier_primitive_info->point.x;
+  point.y += TraceBezier_alpha * TraceBezier_primitive_info->point.y;
+  TraceBezier_alpha *= TraceBezier_weight;
+  TraceBezier_primitive_info++;
+}
+TraceBezier_points[TraceBezier_i] = point;
+TraceBezier_weight += 1.0;
+  }
+}
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 897bd6f37b9..9ded58592c8 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1849,14 +1849,28 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
 mismatch still hard-FAIL.  */
  if (chain_len == 0)
hard_fail = false;
+ else
+   {
+ matches[lane] = false;
+ /* ???  We might want to process the other lanes, but
+make sure to not give false matching hints to the
+caller for lanes we did not process.  */
+ if (lane != group_size - 1)
+   matches[0] = false;
+   }
  break;
}
  else if (chain_len == 0)
chain_len = chain.length ();
  else if (chain.length () != chain_len)
-   /* ???  Here we could slip in magic to compensate with
-  neutral operands.  */
-   break;
+   {
+ /* ???  Here we could slip in magic to compensate with
+neutral operands.  */
+ matches[lane] = false;
+ if (lane != group_size - 1)
+   matches[0] = false;
+ break;
+   }
  chains.quick_push (chain.copy ());
  chain.truncate (0);
}
@@ -1905,6 +1919,9 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
dump_printf_loc (MSG_NOTE, vect_location,
 "giving up on chain due to mismatched "
 "def types\n");
+ matches[lane] = false;
+ if (lane != group_size - 1)
+   matches[0] = false;
  goto out;
}
  if (dt == vect_constant_def
@@ -1983,6 +2000,9 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
dump_printf_loc (MSG_NOTE, vect_location,
 "failed to match up op %d\n", n);
  op_stmts.release ();
+ matches[lane] = false;
+ if (lane != group_size - 1)
+   matches[0] = false;
  goto out;
}
  if (dump_enabled_p ())
-- 
2.26.2


[GCC][Patch] arm: Fix the mve multilib for the broken cmse support (pr99939).

2021-06-11 Thread Srinath Parvathaneni via Gcc-patches
Hi Richard,

I have addressed all your review comments in 
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571739.html
in the following patch.

The current CMSE support in the multilib build for "-march=armv8.1-m.main+mve 
-mfloat-abi=hard -mfpu=auto"
is broken as specified in PR99939 and this patch fixes the issue.

Regression tested on arm-none-eabi and found no regressions.

Ok for master? and Ok for GCC-10 branch?

Regards,
Srinath.

gcc/testsuite/ChangeLog:

2021-06-11  Srinath Parvathaneni  

* gcc.target/arm/cmse/cmse-18.c: Add separate scan-assembler
directives check for target is v8.1-m.main+mve or not before
comparing the assembly output.
* gcc.target/arm/cmse/cmse-20.c: New test.

libgcc/ChangeLog:

2021-06-11  Srinath Parvathaneni  

* config/arm/cmse_nonsecure_call.S: Add __ARM_FEATURE_MVE
macro.
* config/arm/t-arm: To link cmse.o and cmse_nonsecure_call.o
on passing -mcmse option.



### Attachment also inlined for ease of reply###


diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c
index 
e1ff09257b7900982f49117d4cfc16f3bd79d76c..db7d975a90ea4bd1810aea03949ec1e8837e
 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c
@@ -8,4 +8,5 @@ void bar(f func, int a)
   func(a);
 }
 
-/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" } } */
+/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" { 
target { ! arm_v8_1m_mve_ok } } } } */
+/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r\[0-7\]:SI\\\]\\\]" "final" 
{ target { arm_v8_1m_mve_ok } } } } */
diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c
new file mode 100644
index 
..08e89bff6378f1f96950fc40f3ab3946bd433773
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c
@@ -0,0 +1,28 @@
+/* This test is executed only if the execution engine supports CMSE 
instructions.  */
+/* { dg-options "--save-temps -mcmse 
-Wl,--section-start,.gnu.sgstubs=0x0040" } */
+
+#include 
+#include 
+#include 
+
+void __attribute__((cmse_nonsecure_entry))
+secure_fun (int a, int *p)
+{
+  void *b = cmse_check_address_range ((void *)p, a, 1);
+
+  if (b == NULL)
+   __builtin_abort ();
+  printf("%d", *((int *)b));
+}
+
+int
+main (void)
+{
+  int *ptr;
+  int size = 1;
+  ptr = (int *) calloc (1, sizeof(int *));
+  *ptr = 1315852292;
+  secure_fun (size, ptr);
+  free (ptr);
+  return 0;
+}
diff --git a/libgcc/config/arm/cmse_nonsecure_call.S 
b/libgcc/config/arm/cmse_nonsecure_call.S
index 
146f3ed52e9c7e915e5dbd9b70624ec3bd7cd5b5..00830ade98ea650c328c709d5d308fbc96f7f21c
 100644
--- a/libgcc/config/arm/cmse_nonsecure_call.S
+++ b/libgcc/config/arm/cmse_nonsecure_call.S
@@ -25,7 +25,7 @@
 
 .syntax unified
 #ifdef __ARM_PCS_VFP
-# if __ARM_FP & 0x8
+# if (__ARM_FP & 0x8) || (__ARM_FEATURE_MVE & 1)
.fpu fpv5-d16
 # else
.fpu fpv4-sp-d16
@@ -59,7 +59,7 @@ vmov  s24, s25, r5, r5
 vmov   s26, s27, r5, r5
 vmov   s28, s29, r5, r5
 vmov   s30, s31, r5, r5
-#elif __ARM_FP & 0x08
+#elif (__ARM_FP & 0x8) || (__ARM_FEATURE_MVE & 1)
 vmov.f64d9, d8
 vmov.f64d10, d8
 vmov.f64d11, d8
diff --git a/libgcc/config/arm/t-arm b/libgcc/config/arm/t-arm
index 
3625a2590beec4e4e0e0881be9ad284c595c7190..c1553d4e5d80751b13dc2e9c9e36d5ebe82e5f8c
 100644
--- a/libgcc/config/arm/t-arm
+++ b/libgcc/config/arm/t-arm
@@ -3,18 +3,17 @@ LIB1ASMFUNCS = _thumb1_case_sqi _thumb1_case_uqi 
_thumb1_case_shi \
_thumb1_case_uhi _thumb1_case_si _speculation_barrier
 
 HAVE_CMSE:=$(findstring __ARM_FEATURE_CMSE,$(shell $(gcc_compile_bare) -dM -E 
- /dev/null 
2>/dev/null; echo $$?),0)
 CMSE_OPTS:=-mcmse
 endif
 
 ifdef HAVE_CMSE
-ifndef HAVE_V81M
+
 libgcc-objects += cmse.o cmse_nonsecure_call.o
 
 cmse.o: $(srcdir)/config/arm/cmse.c
$(gcc_compile) -c $(CMSE_OPTS) $<
+
 cmse_nonsecure_call.o: $(srcdir)/config/arm/cmse_nonsecure_call.S
   $(gcc_compile) -c $<
 endif
-endif

diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c
index 
e1ff09257b7900982f49117d4cfc16f3bd79d76c..db7d975a90ea4bd1810aea03949ec1e8837e
 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-18.c
@@ -8,4 +8,5 @@ void bar(f func, int a)
   func(a);
 }
 
-/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" } } */
+/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r4:SI\\\]\\\]" "final" { 
target { ! arm_v8_1m_mve_ok } } } } */
+/* { dg-final { scan-rtl-dump "call unspec\\\[\\\[r\[0-7\]:SI\\\]\\\]" "final" 
{ target { arm_v8_1m_mve_ok } } } } */
diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-20.c
new file mode 100644
index 

Re: [PATCH] PR tree-optimization/96392 Optimize x+0.0 if x is an integer

2021-06-11 Thread Richard Biener via Gcc-patches
On Thu, Jun 10, 2021 at 9:45 PM Roger Sayle  wrote:
>
>
> The patch implements a missed optimization enhancement.  Under usual
> IEEE rules, x+0.0 can't be simplified to x when x might potentially
> be an IEEE minus zero (-0.0).  The current logic in the middle-end
> checks whether the type of x should honor signed zeros, but with this
> patch we introduce tree_expr_maybe_real_minus_zero_p that allows us
> to confirm that the value can't possibly be -0.0, for example, the result
> of a conversion from an integer type, or the result of fabs (or has a
> type that doesn't honor signed zero).
>
> Whilst modifying match.pd, I also converted some additional folding
> transformations from "testing the type" to "testing the value".
>
> The following patch has been tested on x86_64-pc-linux-gnu with
> a make bootstrap and make -k check with no new failures.
>
> Ok for mainline?

OK.  Maybe we can at some point record & propagate these
FP value predicates on SSA names just similar to SSA_NAME_RANGE_INFO ...

Thanks,
Richard.

>
> 2020-06-10  Roger Sayle  
>
> gcc/ChangeLog
> PR tree-optimization/96392
> * fold-const.c (fold_real_zero_addition_p): Take both arguments
> of the addition or subtraction, not just the zero.  Use this
> other argument in tests for signaling NaNs and signed zeros.
> (tree_expr_maybe_real_minus_zero_p): New predicate.
> * fold-const.h (fold_real_zero_addition_p): Update prototype.
> (tree_expr_maybe_real_minus_zero_p): New function prototype.
> * match.pd: Update calls to fold_real_zero_addition_p.
> Replace HONOR_NANS with tree_expr_maybe_nan_p.
> Replace HONOR_SIGNED_ZEROS with tree_expr_maybe_real_minus_zero_p.
> Replace HONOR_SNANS with tree_expr_maybe_signaling_nan_p.
> * tree-ssa-reassoc.c (eliminate_using_constants): Update
> call to fold_real_zero_addition_p.
>
> gcc/testsuite/ChangeLog
> PR tree-optimization/96392
> * gcc.dg/pr96392.c: New test.
>
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>


[PATCH] tree-optimization/101026 - fix SLP re-association

2021-06-11 Thread Richard Biener
Since we cannot yet encode the operation in the SLP node itself
but need a representative stmt require an existing one for now
to avoid the need to build a fake GIMPLE stmt.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-11  Richard Biener  

PR tree-optimization/101026
* tree-vect-slp.c (vect_build_slp_tree_2): Make sure we
have a representative for the associated chain nodes.

* gfortran.dg/pr101026.f: New testcase.
---
 gcc/testsuite/gfortran.dg/pr101026.f | 11 +++
 gcc/tree-vect-slp.c  |  4 +++-
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr101026.f

diff --git a/gcc/testsuite/gfortran.dg/pr101026.f 
b/gcc/testsuite/gfortran.dg/pr101026.f
new file mode 100644
index 000..9576d8802ca
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr101026.f
@@ -0,0 +1,11 @@
+! { dg-do compile }
+! { dg-options "-Ofast -frounding-math" }
+  SUBROUTINE PASSB4 (CC,CH)
+  DIMENSION CC(IDO,4,L1), CH(IDO,L1,*)
+ DO 103 I=2,IDO,2
+TI4 = CC0-CC(I,4,K)
+CI4 = TI1-TI4
+CH(I-1,K,4) = CI4
+CH(I,K,4) = CI4
+  103CONTINUE
+  END
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 6237a61ffd4..897bd6f37b9 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1860,7 +1860,9 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
  chains.quick_push (chain.copy ());
  chain.truncate (0);
}
-  if (chains.length () == group_size)
+  if (chains.length () == group_size
+ /* We cannot yet use SLP_TREE_CODE to communicate the operation.  */
+ && op_stmt_info)
{
  /* Now we have a set of chains with the same length.  */
  /* 1. pre-sort according to def_type and operation.  */
-- 
2.26.2


Re: [PATCH] arc: Add --with-fpu support for ARCv2 cpus

2021-06-11 Thread Claudiu Zissulescu via Gcc-patches

Hi Bernhard,

Please find attached my latest patch, it includes (hopefully) all your 
feedback.


Thank you for comments,
Claudiu
>From 03075b3d9194120d7adb3cdc2aa0f58e3ea9dd1d Mon Sep 17 00:00:00 2001
From: Claudiu Zissulescu 
Date: Wed, 21 Oct 2020 16:11:43 +0300
Subject: [PATCH] arc: Add --with-fpu support for ARCv2 cpus

Support for a compile-time default FPU. The --with-fpu configuration
option is ignored if -mfpu compiler option is specified. The FPU
options are only available for ARCv2 cpus.

gcc/
-mm-dd  Claudiu Zissulescu  

	* config.gcc (arc): Add support for with_cpu option.
	* config/arc/arc.h (OPTION_DEFAULT_SPECS): Add fpu.

Signed-off-by: Claudiu Zissulescu 
---
 gcc/config.gcc   | 44 +++-
 gcc/config/arc/arc.h |  4 
 2 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 13c2004e3c52..5581dae88b52 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4258,18 +4258,52 @@ case "${target}" in
 		;;
 
 	arc*-*-*)
-		supported_defaults="cpu"
+		supported_defaults="cpu fpu"
 
+		new_cpu=hs38_linux
 		if [ x"$with_cpu" = x ] \
-		|| grep "^ARC_CPU ($with_cpu," \
-		   ${srcdir}/config/arc/arc-cpus.def \
-		   > /dev/null; then
+		|| grep -q -E "^ARC_CPU[[:blank:]]*\($with_cpu," \
+		   ${srcdir}/config/arc/arc-cpus.def
+		then
 		 # Ok
-		 true
+		 new_cpu=$with_cpu
 		else
 		 echo "Unknown cpu used in --with-cpu=$with_cpu" 1>&2
 		 exit 1
 		fi
+
+		# see if --with-fpu matches any of the supported FPUs
+		case "$with_fpu" in
+		"")
+			# OK
+			;;
+		fpus | fpus_div | fpus_fma | fpus_all)
+			# OK if em or hs
+			flags_ok="[emhs]+"
+			;;
+		fpuda | fpuda_div | fpuda_fma | fpuda_all)
+			# OK only em
+			flags_ok="em"
+			;;
+		fpud | fpud_div | fpud_fma | fpud_all)
+			# OK only hs
+			flags_ok="hs"
+			;;
+		*)
+			echo "Unknown floating point type used in "\
+			 "--with-fpu=$with_fpu" 1>&2
+			exit 1
+			;;
+		esac
+
+		if [ -n "$flags_ok" ] \
+		   && ! grep -q -E "^ARC_CPU[[:blank:]]*\($new_cpu,[[:blank:]]*$flags_ok," \
+		   ${srcdir}/config/arc/arc-cpus.def
+		then
+		   echo "Unknown floating point type used in "\
+			 "--with-fpu=$with_fpu for cpu $new_cpu" 1>&2
+			 exit 1
+		fi
 		;;
 
 csky-*-*)
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 2ccebfa7afe6..e78f6e8202b3 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -100,7 +100,11 @@ extern const char *arc_cpu_to_as (int argc, const char **argv);
   "%:cpu_to_as(%{mcpu=*:%*}) %{mspfp*} %{mdpfp*} "  \
   "%{mfpu=fpuda*:-mfpuda} %{mcode-density}"
 
+/* Support for a compile-time default CPU and FPU.  The rules are:
+   --with-cpu is ignored if -mcpu, mARC*, marc*, mA7, mA6 are specified.
+   --with-fpu is ignored if -mfpu is specified.  */
 #define OPTION_DEFAULT_SPECS		\
+  {"fpu", "%{!mfpu=*:-mfpu=%(VALUE)}"},	\
   {"cpu", "%{!mcpu=*:%{!mARC*:%{!marc*:%{!mA7:%{!mA6:-mcpu=%(VALUE)}" }
 
 #ifndef DRIVER_ENDIAN_SELF_SPECS
-- 
2.31.1



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-11 Thread Richard Biener
On Thu, 10 Jun 2021, Qing Zhao wrote:

> Hi, Richard,
> 
> I need more discussion on the following comments you raised:
> 
> > On May 26, 2021, at 6:18 AM, Richard Biener  wrote:
> > 
> > +/* Expand the IFN_DEFERRED_INIT function according to its second 
> > argument.  */
> > +static void
> > +expand_DEFERRED_INIT (internal_fn, gcall *stmt)
> > +{
> > +  tree var = gimple_call_lhs (stmt);
> > +  tree init = NULL_TREE;
> > +  enum auto_init_type init_type
> > += (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
> > +
> > +  switch (init_type)
> > +{
> > +default:
> > +  gcc_unreachable ();
> > +case AUTO_INIT_PATTERN:
> > +  init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
> > +  expand_assignment (var, init, false);
> > +  break;
> > +case AUTO_INIT_ZERO:
> > +  init = build_zero_cst (TREE_TYPE (var));
> > +  expand_assignment (var, init, false);
> > +  break;
> > +}
> > 
> > I think actually building build_pattern_cst_for_auto_init can generate
> > massive garbage and for big auto vars code size is also a concern and
> > ideally on x86 you'd produce rep movq.  So I don't think going
> > via expand_assignment is good.  Instead you possibly want to lower
> > .DEFERRED_INIT to MEMs following expand_builtin_memset and
> > eventually enhance that to allow storing pieces larger than a byte.
> 
> When I tried to lower .DEFERRED_INIT to MEMs for  “AUTO_INIT_PATTERN”, I have 
> the following questions:
> 
> 1. If .DEFERRED_INIT will be lowered to MEMS through “memset”, then we 
> basically initialize the whole memory covering the
> auto variable, including paddings. Right?

Yes.

> 2. Only when the value that is used to initialization has a repeated 
>byte-pattern, we can lower it through “memset”. Otherwise, If the 
>value that is used to initialization does Not have a repeated 
>byte-pattern, we can NOT lower it through “memset”, right?

Yes.  This is why I said you should do it _similar_ to how memcpy
is implemented.  OTOH I don't see a good reason to support patterns
that are bigger than a byte ...

> Currently, for the values that are used to initialize for 
> “AUTO_INIT_PATTERN”, we have:
> 
>   /* The following value is a guaranteed unmappable pointer value and has a
>  repeated byte-pattern which makes it easier to synthesize.  We use it for
>  pointers as well as integers so that aggregates are likely to be
>  initialized with this repeated value.  */
>   uint64_t largevalue = 0xull;
>   /* For 32-bit platforms it's a bit trickier because, across systems, only 
> the
>  zero page can reasonably be expected to be unmapped, and even then we 
> need
>  a very low address.  We use a smaller value, and that value sadly doesn't
>  have a repeated byte-pattern.  We don't use it for integers.  */
>   uint32_t smallvalue = 0x00AA;
> 
> In additional to the above, for BOOLEAN_TYPE:
> 
> case BOOLEAN_TYPE:
>   /* We think that initializing the boolean variable to 0 other than 1
>  is better even for pattern initialization.  */
> 
> Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a 
> repeated byte-pattern for variables that include BOOLEAN_TYPE Or Pointer 
> types. Therefore, lowering the .DEFERRED_INIT for “PATTERN” 
> initialization through “memset” is not always possible.
> 
> Let me know if I miss anything in the above. Do you have other suggestions?

The main point is that you need to avoid building the explicit initializer
only to have it consumed by assignment expansion.  If you want to keep
all the singing and dancing (as opposed to maybe initializing with a
0x1 byte pattern) then I think for efficiency you still want to
block-initialize the variable and then only fixup the special fields.

But as said, all this is quite over-designed IMHO and simply
zeroing everything would be much simpler and good enough.

Richard.


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-11 Thread Richard Biener
On Tue, 8 Jun 2021, Kees Cook wrote:

> On Tue, Jun 08, 2021 at 09:41:38AM +0200, Richard Biener wrote:
> > On Mon, 7 Jun 2021, Qing Zhao wrote:
> > 
> > > Hi, 
> > > 
> > > > On Jun 7, 2021, at 2:53 AM, Richard Biener  wrote:
> > > > 
> > > >> 
> > > >> To address the above suggestion:
> > > >> 
> > > >> My study shows: the call to __builtin_clear_padding is expanded during 
> > > >> gimplification phase.
> > > >> And there is no __bultin_clear_padding expanding during rtx expanding 
> > > >> phase.
> > > >> However, for -ftrivial-auto-var-init, padding initialization should be 
> > > >> done both in gimplification phase and rtx expanding phase.
> > > >> since the __builtin_clear_padding might not be good for rtx expanding, 
> > > >> reusing __builtin_clear_padding might not work.
> > > >> 
> > > >> Let me know if you have any more comments on this.
> > > > 
> > > > Yes, I didn't suggest to literally emit calls to 
> > > > __builtin_clear_padding 
> > > > but instead to leverage the lowering code, more specifically share the
> > > > code that figures _what_ is to be initialized (where the padding is)
> > > > and eventually the actual code generation pieces.  That might need some
> > > > refactoring but the code where padding resides should be present only
> > > > a single time (since it's quite complex).
> > > 
> > > Okay, I see your point here.
> > > 
> > > > 
> > > > Which is also why I suggested to split out the padding initialization
> > > > bits to a separate patch (and option).
> > > 
> > > Personally, I am okay with splitting padding initialization from this 
> > > current patch,
> > > Kees, what’s your opinion on this? i.e, the current 
> > > -ftrivial-auto-var-init will NOT initialize padding, we will add another 
> > > option to 
> > > Explicitly initialize padding.
> > 
> > It would also be possible to have -fauto-var-init, -fauto-var-init-padding
> > and have -ftrivial-auto-var-init for clang compatibility enabling both.
> 
> Sounds good to me!
> 
> > Or -fauto-var-init={zero,pattern,padding} and allow
> > -fauto-var-init=pattern,padding to be specified.  Note there's also
> > padding between auto variables on the stack - that "trailing"
> > padding isn't initialized either?  (yes, GCC sorts variables to minimize
> > that padding)  For example for
> > 
> > void foo()
> > {
> >   char a[3];
> >   bar (a);
> > }
> > 
> > there's 12 bytes padding after 'a', shouldn't we initialize that?  If not,
> > why's other padding important to be initialized?
> 
> This isn't a situation that I'm aware of causing real-world problems.
> The issues have all come from padding within an addressable object. I
> haven't tested Clang's behavior on this (and I have no kernel tests for
> this padding), but I do check for trailing padding, like:
> 
> struct test_trailing_hole {
> char *one;
> char *two;
> char *three;
> char four;
> /* "sizeof(unsigned long) - 1" byte padding hole here. */
> };

Any justification why tail padding for

 struct foo { double x; char x[3]; } a;

is important but not for

 char x[3];

?  It does look like an odd inconsistency to me.

Richard.


Re: [PATCH 04/11 v2] libstdc++: Make use of __builtin_bit_cast

2021-06-11 Thread Matthias Kretz
While testing newer patches I found several missing conversions from 
__bit_cast to simd_bit_cast in this patch (i.e. where bit casting to / from 
fixed_size was sometimes required). Corrected patch attached.


From: Matthias Kretz 

The __bit_cast function was a hack to achieve what __builtin_bit_cast
can do, therefore use __builtin_bit_cast if possible. However,
__builtin_bit_cast cannot be used to cast from/to fixed_size_simd, since
it isn't trivially copyable (in the language sense — in principle it
is). Therefore add __proposed::simd_bit_cast to enable the use case
required in the test framework.

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

* include/experimental/bits/simd.h (__bit_cast): Implement via
__builtin_bit_cast #if available.
(__proposed::simd_bit_cast): Add overloads for simd and
simd_mask, which use __builtin_bit_cast (or __bit_cast #if not
available), which return an object of the requested type with
the same bits as the argument.
* include/experimental/bits/simd_math.h: Use simd_bit_cast
instead of __bit_cast to allow casts to fixed_size_simd.
(copysign): Remove branch that was only required if __bit_cast
cannot be constexpr.
* testsuite/experimental/simd/tests/bits/test_values.h: Switch
from __bit_cast to __proposed::simd_bit_cast since the former
will not cast fixed_size objects anymore.
---
 libstdc++-v3/include/experimental/bits/simd.h | 57 ++-
 .../include/experimental/bits/simd_math.h | 36 +---
 .../simd/tests/bits/test_values.h |  8 +--
 3 files changed, 75 insertions(+), 26 deletions(-)


--
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 std::experimental::simd  https://github.com/VcDevel/std-simd
──diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 163f1b574e2..852d0b62012 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -1598,7 +1598,9 @@ template 
   _GLIBCXX_SIMD_INTRINSIC constexpr _To
   __bit_cast(const _From __x)
   {
-// TODO: implement with / replace by __builtin_bit_cast ASAP
+#if __has_builtin(__builtin_bit_cast)
+return __builtin_bit_cast(_To, __x);
+#else
 static_assert(sizeof(_To) == sizeof(_From));
 constexpr bool __to_is_vectorizable
   = is_arithmetic_v<_To> || is_enum_v<_To>;
@@ -1629,6 +1631,7 @@ template 
 			 reinterpret_cast(&__x), sizeof(_To));
 	return __r;
   }
+#endif
   }
 
 // }}}
@@ -2900,6 +2903,58 @@ template (__x)};
   }
+
+template 
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR
+  _To
+  simd_bit_cast(const simd<_Up, _Abi>& __x)
+  {
+using _Tp = typename _To::value_type;
+using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember;
+using _From = simd<_Up, _Abi>;
+using _FromMember = typename _SimdTraits<_Up, _Abi>::_SimdMember;
+// with concepts, the following should be constraints
+static_assert(sizeof(_To) == sizeof(_From));
+static_assert(is_trivially_copyable_v<_Tp> && is_trivially_copyable_v<_Up>);
+static_assert(is_trivially_copyable_v<_ToMember> && is_trivially_copyable_v<_FromMember>);
+#if __has_builtin(__builtin_bit_cast)
+return {__private_init, __builtin_bit_cast(_ToMember, __data(__x))};
+#else
+return {__private_init, __bit_cast<_ToMember>(__data(__x))};
+#endif
+  }
+
+template 
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR
+  _To
+  simd_bit_cast(const simd_mask<_Up, _Abi>& __x)
+  {
+using _From = simd_mask<_Up, _Abi>;
+static_assert(sizeof(_To) == sizeof(_From));
+static_assert(is_trivially_copyable_v<_From>);
+// _To can be simd, specifically simd> in which case _To is not trivially
+// copyable.
+if constexpr (is_simd_v<_To>)
+  {
+	using _Tp = typename _To::value_type;
+	using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember;
+	static_assert(is_trivially_copyable_v<_ToMember>);
+#if __has_builtin(__builtin_bit_cast)
+	return {__private_init, __builtin_bit_cast(_ToMember, __x)};
+#else
+	return {__private_init, __bit_cast<_ToMember>(__x)};
+#endif
+  }
+else
+  {
+	static_assert(is_trivially_copyable_v<_To>);
+#if __has_builtin(__builtin_bit_cast)
+	return __builtin_bit_cast(_To, __x);
+#else
+	return __bit_cast<_To>(__x);
+#endif
+  }
+  }
 } // namespace __proposed
 
 // simd_cast {{{2
diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h
index d954e761eee..afd8b5a028f 100644
--- a/libstdc++-v3/include/experimental/bits/simd_math.h
+++ 

Re: [PATCH] simplify-rtx: Fix up simplify_logical_relational_operation for vector IOR [PR101008]

2021-06-11 Thread Richard Biener
On Fri, 11 Jun 2021, Jakub Jelinek wrote:

> Hi!
> 
> simplify_relational_operation callees typically return just const0_rtx
> or const_true_rtx and then simplify_relational_operation attempts to fix
> that up if the comparison result has vector mode, or floating mode,
> or punt if it has scalar mode and vector mode operands (it doesn't know how
> exactly to deal with the scalar masks).
> But, simplify_logical_relational_operation has a special case, where
> it attempts to fold (x < y) | (x >= y) etc. and if it determines it is
> always true, it just returns const_true_rtx, without doing the dances that
> simplify_relational_operation does.
> That results in an ICE on the following testcase, where such folding happens
> during expansion (of debug stmts into DEBUG_INSNs) and we ICE because
> all of sudden a VOIDmode rtx appears where it expects a vector (V4SImode)
> rtx.
> 
> The following patch fixes that by moving the adjustement into a separate
> helper routine and using it from both simplify_relational_operation and
> simplify_logical_relational_operation.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2021-06-11  Jakub Jelinek  
> 
>   PR rtl-optimization/101008
>   * simplify-rtx.c (relational_result): New function.
>   (simplify_logical_relational_operation,
>   simplify_relational_operation): Use it.
> 
>   * gcc.dg/pr101008.c: New test.
> 
> --- gcc/simplify-rtx.c.jj 2021-05-04 21:02:24.0 +0200
> +++ gcc/simplify-rtx.c2021-06-10 13:56:48.946628822 +0200
> @@ -2294,6 +2294,53 @@ comparison_code_valid_for_mode (enum rtx
>   gcc_unreachable ();
>  }
>  }
> +
> +/* Canonicalize RES, a scalar const0_rtx/const_true_rtx to the right
> +   false/true value of comparison with MODE where comparison operands
> +   have CMP_MODE.  */
> +
> +static rtx
> +relational_result (machine_mode mode, machine_mode cmp_mode, rtx res)
> +{
> +  if (SCALAR_FLOAT_MODE_P (mode))
> +{
> +  if (res == const0_rtx)
> +return CONST0_RTX (mode);
> +#ifdef FLOAT_STORE_FLAG_VALUE
> +  REAL_VALUE_TYPE val = FLOAT_STORE_FLAG_VALUE (mode);
> +  return const_double_from_real_value (val, mode);
> +#else
> +  return NULL_RTX;
> +#endif
> +}
> +  if (VECTOR_MODE_P (mode))
> +{
> +  if (res == const0_rtx)
> + return CONST0_RTX (mode);
> +#ifdef VECTOR_STORE_FLAG_VALUE
> +  rtx val = VECTOR_STORE_FLAG_VALUE (mode);
> +  if (val == NULL_RTX)
> + return NULL_RTX;
> +  if (val == const1_rtx)
> + return CONST1_RTX (mode);
> +
> +  return gen_const_vec_duplicate (mode, val);
> +#else
> +  return NULL_RTX;
> +#endif
> +}
> +  /* For vector comparison with scalar int result, it is unknown
> + if the target means here a comparison into an integral bitmask,
> + or comparison where all comparisons true mean const_true_rtx
> + whole result, or where any comparisons true mean const_true_rtx
> + whole result.  For const0_rtx all the cases are the same.  */
> +  if (VECTOR_MODE_P (cmp_mode)
> +  && SCALAR_INT_MODE_P (mode)
> +  && res == const_true_rtx)
> +return NULL_RTX;
> +
> +  return res;
> +}
>  
>  /* Simplify a logical operation CODE with result mode MODE, operating on OP0
> and OP1, which should be both relational operations.  Return 0 if no such
> @@ -2329,7 +2376,7 @@ simplify_context::simplify_logical_relat
>int mask = mask0 | mask1;
>  
>if (mask == 15)
> -return const_true_rtx;
> +return relational_result (mode, GET_MODE (op0), const_true_rtx);
>  
>code = mask_to_comparison (mask);
>  
> @@ -5315,51 +5362,7 @@ simplify_context::simplify_relational_op
>  
>tem = simplify_const_relational_operation (code, cmp_mode, op0, op1);
>if (tem)
> -{
> -  if (SCALAR_FLOAT_MODE_P (mode))
> - {
> -  if (tem == const0_rtx)
> -return CONST0_RTX (mode);
> -#ifdef FLOAT_STORE_FLAG_VALUE
> -   {
> - REAL_VALUE_TYPE val;
> - val = FLOAT_STORE_FLAG_VALUE (mode);
> - return const_double_from_real_value (val, mode);
> -   }
> -#else
> -   return NULL_RTX;
> -#endif
> - }
> -  if (VECTOR_MODE_P (mode))
> - {
> -   if (tem == const0_rtx)
> - return CONST0_RTX (mode);
> -#ifdef VECTOR_STORE_FLAG_VALUE
> -   {
> - rtx val = VECTOR_STORE_FLAG_VALUE (mode);
> - if (val == NULL_RTX)
> -   return NULL_RTX;
> - if (val == const1_rtx)
> -   return CONST1_RTX (mode);
> -
> - return gen_const_vec_duplicate (mode, val);
> -   }
> -#else
> -   return NULL_RTX;
> -#endif
> - }
> -  /* For vector comparison with scalar int result, it is unknown
> -  if the target means here a comparison into an integral bitmask,
> -  or comparison where all comparisons true mean const_true_rtx
> -  whole result, or where any comparisons true mean 

[PATCH] i386: Try to avoid variable permutation instruction [PR101021]

2021-06-11 Thread Uros Bizjak via Gcc-patches
Some permutations can be implemented without costly PSHUFB instruction, e.g.:

{ 8,9,10,11,12,13,14,15, 0,1,2,3,4,5,6,7 } with PALIGNR,

{ 0,1,2,3, 4,5,6,7, 4,5,6,7, 12,13,14,15 } with PSHUFD,

{ 0,1, 2,3, 2,3, 6,7, 8,9,10,11,12,13,14,15 } with PSHUFLW and

{ 0,1,2,3,4,5,6,7, 8,9, 10,11, 10,11, 14,15 } with PSHUFHW.

All these instructions have constant shuffle control mask and do not
need to load shuffle mask from a memory to a temporary XMM register.

2021-06-11  Uroš Bizjak  

gcc/
PR target/101021
* config/i386/i386-expand.c (expand_vec_perm_pshufb): Return
false if the permutation can be implemented with constant
permutation instruction in wider mode.
(canonicalize_vector_int_perm): Move above expand_vec_perm_pshufb.
Handle V8QImode and V4HImode.

gcc/testsuite/

PR target/101021
* gcc.target/i386/pr101021-1.c: New test.
* gcc.target/i386/pr101021-2.c: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Additionally tested with:

GCC_TEST_RUN_EXPENSIVE=1 make check-gcc
RUNTESTFLAGS='--target_board=unix/-mavx dg-torture.exp=vshuf*.c'
Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 9ee5257adf9..2fa3a18dc6a 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -17354,6 +17354,59 @@ expand_vec_perm_vpermil (struct expand_vec_perm_d *d)
   return true;
 }
 
+/* For V*[QHS]Imode permutations, check if the same permutation
+   can't be performed in a 2x, 4x or 8x wider inner mode.  */
+
+static bool
+canonicalize_vector_int_perm (const struct expand_vec_perm_d *d,
+ struct expand_vec_perm_d *nd)
+{
+  int i;
+  machine_mode mode = VOIDmode;
+
+  switch (d->vmode)
+{
+case E_V8QImode: mode = V4HImode; break;
+case E_V16QImode: mode = V8HImode; break;
+case E_V32QImode: mode = V16HImode; break;
+case E_V64QImode: mode = V32HImode; break;
+case E_V4HImode: mode = V2SImode; break;
+case E_V8HImode: mode = V4SImode; break;
+case E_V16HImode: mode = V8SImode; break;
+case E_V32HImode: mode = V16SImode; break;
+case E_V4SImode: mode = V2DImode; break;
+case E_V8SImode: mode = V4DImode; break;
+case E_V16SImode: mode = V8DImode; break;
+default: return false;
+}
+  for (i = 0; i < d->nelt; i += 2)
+if ((d->perm[i] & 1) || d->perm[i + 1] != d->perm[i] + 1)
+  return false;
+  nd->vmode = mode;
+  nd->nelt = d->nelt / 2;
+  for (i = 0; i < nd->nelt; i++)
+nd->perm[i] = d->perm[2 * i] / 2;
+  if (GET_MODE_INNER (mode) != DImode)
+canonicalize_vector_int_perm (nd, nd);
+  if (nd != d)
+{
+  nd->one_operand_p = d->one_operand_p;
+  nd->testing_p = d->testing_p;
+  if (d->op0 == d->op1)
+   nd->op0 = nd->op1 = gen_lowpart (nd->vmode, d->op0);
+  else
+   {
+ nd->op0 = gen_lowpart (nd->vmode, d->op0);
+ nd->op1 = gen_lowpart (nd->vmode, d->op1);
+   }
+  if (d->testing_p)
+   nd->target = gen_raw_REG (nd->vmode, LAST_VIRTUAL_REGISTER + 1);
+  else
+   nd->target = gen_reg_rtx (nd->vmode);
+}
+  return true;
+}
+
 /* Return true if permutation D can be performed as VMODE permutation
instead.  */
 
@@ -17391,6 +17444,7 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
   unsigned i, nelt, eltsz, mask;
   unsigned char perm[64];
   machine_mode vmode = V16QImode;
+  struct expand_vec_perm_d nd;
   rtx rperm[64], vperm, target, op0, op1;
 
   nelt = d->nelt;
@@ -17539,6 +17593,10 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
return false;
 }
 
+  /* Try to avoid variable permutation instruction.  */
+  if (canonicalize_vector_int_perm (d, ) && expand_vec_perm_1 ())
+return false;
+
   if (d->testing_p)
 return true;
 
@@ -17617,57 +17675,6 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
   return true;
 }
 
-/* For V*[QHS]Imode permutations, check if the same permutation
-   can't be performed in a 2x, 4x or 8x wider inner mode.  */
-
-static bool
-canonicalize_vector_int_perm (const struct expand_vec_perm_d *d,
- struct expand_vec_perm_d *nd)
-{
-  int i;
-  machine_mode mode = VOIDmode;
-
-  switch (d->vmode)
-{
-case E_V16QImode: mode = V8HImode; break;
-case E_V32QImode: mode = V16HImode; break;
-case E_V64QImode: mode = V32HImode; break;
-case E_V8HImode: mode = V4SImode; break;
-case E_V16HImode: mode = V8SImode; break;
-case E_V32HImode: mode = V16SImode; break;
-case E_V4SImode: mode = V2DImode; break;
-case E_V8SImode: mode = V4DImode; break;
-case E_V16SImode: mode = V8DImode; break;
-default: return false;
-}
-  for (i = 0; i < d->nelt; i += 2)
-if ((d->perm[i] & 1) || d->perm[i + 1] != d->perm[i] + 1)
-  return false;
-  nd->vmode = mode;
-  nd->nelt = d->nelt / 2;
-  for (i = 0; i < nd->nelt; i++)
-nd->perm[i] = d->perm[2 * i] / 2;
-  if (GET_MODE_INNER (mode) != DImode)
-   

[PATCH v2] c++: Add C++23 consteval if support - P1938R3 [PR100974]

2021-06-11 Thread Jakub Jelinek via Gcc-patches
On Thu, Jun 10, 2021 at 03:00:54PM -0400, Jason Merrill wrote:
> The second is clearer about the fix, the first is clearer about the problem.
> Maybe add a fixit to the first error?

Done.

> OK, just add a comment then.
> 
> OK, just add a comment then.

And added comments.

> > > > --- gcc/cp/cp-gimplify.c.jj 2021-06-09 21:54:39.473193977 +0200
> > > > +++ gcc/cp/cp-gimplify.c2021-06-10 09:49:35.898557178 +0200
> > > > @@ -161,7 +161,9 @@ genericize_if_stmt (tree *stmt_p)
> > > >  if (!else_)
> > > >else_ = build_empty_stmt (locus);
> > > > -  if (integer_nonzerop (cond) && !TREE_SIDE_EFFECTS (else_))
> > > > +  if (IF_STMT_CONSTEVAL_P (stmt))
> > > > +stmt = else_;
> > > 
> > > This seems redundant, since you're using boolean_false_node for the
> > > condition.
> > 
> > It is only when !TREE_SIDE_EFFECTS (else_).
> > I think that is about having labels in the then_/else_ blocks and
> > gotos jumping into those from outside.
> > But IF_STMT_CONSTEVAL_P guarantees that is not the case (and we verify
> > earlier), so we can do that (and in fact, for IF_STMT_CONSTEVAL_P have
> > to, because then_ could contain consteval calls not constant expression
> > evaluated).
> > I guess we could do that for IF_STMT_CONSTEXPR_P too, that also
> > doesn't allow gotos/switch into the branches.

For this too.
> > 
> > > > --- gcc/cp/call.c.jj2021-06-09 21:54:39.436194489 +0200
> > > > +++ gcc/cp/call.c   2021-06-10 09:49:35.949556470 +0200
> > > > @@ -8840,6 +8840,7 @@ immediate_invocation_p (tree fn, int nar
> > > >   || !DECL_IMMEDIATE_FUNCTION_P (current_function_decl))
> > > >   && (current_binding_level->kind != sk_function_parms
> > > >   || !current_binding_level->immediate_fn_ctx_p)
> > > > + && !in_immediate_fn_ctx_p
> > > 
> > > Now that we have this flag, shouldn't we set it in actual immediate 
> > > function
> > > context?  Or rename the flag to match when it's actually set.
> > 
> > I guess I can try that.  Though, I'm not sure if we could also get rid of
> > the current_binding_level->immediate_fn_ctx_p for sk_function_parms case.

Had a look at this, but could handle it.
So instead I've renamed it to in_consteval_if_p.

Ok for trunk if it passes bootstrap/regtest?

2021-06-11  Jakub Jelinek  

PR c++/100974 - P1938R3 - if consteval
gcc/c-family/
* c-cppbuiltin.c (c_cpp_builtins): Predefine __cpp_if_consteval for
-std=c++2b.
gcc/cp/
* cp-tree.h (struct saved_scope): Add consteval_if_p
member.  Formatting fix for the discarded_stmt comment.
(in_consteval_if_p, IF_STMT_CONSTEVAL_P): Define.
* parser.c (cp_parser_lambda_expression): Temporarily disable
in_consteval_if_p when parsing lambda body.
(cp_parser_selection_statement): Parse consteval if.
* decl.c (struct named_label_entry): Add in_consteval_if member.
(level_for_consteval_if): New function.
(poplevel_named_label_1, check_previous_goto_1, check_goto): Handle
consteval if.
* constexpr.c (cxx_eval_builtin_function_call): Clarify in comment
why CP_BUILT_IN_IS_CONSTANT_EVALUATED needs to *non_constant_p
for !ctx->manifestly_const_eval.
(cxx_eval_conditional_expression): For IF_STMT_CONSTEVAL_P evaluate
condition as if it was __builtin_is_constant_evaluated call.
(potential_constant_expression_1): For IF_STMT_CONSTEVAL_P always
recurse on both branches.
* cp-gimplify.c (genericize_if_stmt): Genericize IF_STMT_CONSTEVAL_P
as the else branch.
* pt.c (tsubst_expr) : Copy IF_STMT_CONSTEVAL_P.
Temporarily set in_consteval_if_p when recursing on
IF_STMT_CONSTEVAL_P then branch.
(tsubst_lambda_expr): Temporarily disable
in_consteval_if_p when instantiating lambda body.
* call.c (immediate_invocation_p): Return false when
in_consteval_if_p.
gcc/testsuite/
* g++.dg/cpp23/consteval-if1.C: New test.
* g++.dg/cpp23/consteval-if2.C: New test.
* g++.dg/cpp23/consteval-if3.C: New test.
* g++.dg/cpp23/consteval-if4.C: New test.
* g++.dg/cpp23/consteval-if5.C: New test.
* g++.dg/cpp23/consteval-if6.C: New test.
* g++.dg/cpp23/consteval-if7.C: New test.
* g++.dg/cpp23/consteval-if8.C: New test.
* g++.dg/cpp23/consteval-if9.C: New test.
* g++.dg/cpp23/consteval-if10.C: New test.
* g++.dg/cpp23/feat-cxx2b.C: Add __cpp_if_consteval tests.

--- gcc/c-family/c-cppbuiltin.c.jj  2021-06-10 19:56:20.584335765 +0200
+++ gcc/c-family/c-cppbuiltin.c 2021-06-11 11:34:15.408578098 +0200
@@ -1029,6 +1029,7 @@ c_cpp_builtins (cpp_reader *pfile)
{
  /* Set feature test macros for C++23.  */
  cpp_define (pfile, "__cpp_size_t_suffix=202011L");
+ cpp_define (pfile, "__cpp_if_consteval=202106L");
}
   if (flag_concepts)
 {
--- gcc/cp/cp-tree.h.jj 

Re: [PATCH 2/2] arm: Auto-vectorization for MVE: add pack/unpack patterns

2021-06-11 Thread Richard Sandiford via Gcc-patches
Christophe Lyon  writes:
> In the meantime, I tried to make some progress, and looked at how
> things work on aarch64.
>
> This led me to define (in mve.md):
>
> (define_insn "@mve_vec_pack_trunc_lo_"
>  [(set (match_operand: 0 "register_operand" "=w")
>(truncate: (match_operand:MVE_5 1 "register_operand" "w")))]
>  "TARGET_HAVE_MVE"
>  "vmovnb.i   %q0, %q1\;"
>   [(set_attr "type" "mve_move")]
> )
>
> (define_insn "@mve_vec_pack_trunc_hi_"
>  [(set (match_operand: 0 "register_operand" "=w")
>(vec_concat:
> (match_operand: 1 "register_operand" "0")
> (truncate:
> (match_operand:MVE_5 2 "register_operand" "w"]
>  "TARGET_HAVE_MVE"
>  "vmovnt.i   %q0, %q2\;"
>   [(set_attr "type" "mve_move")]
> )
>
> and in vec-common.md, for
> (define_expand "vec_pack_trunc_"
>  [(set (match_operand: 0 "register_operand")
>(vec_concat:
> (truncate:
> (match_operand:VN 1 "register_operand"))
> (truncate:
> (match_operand:VN 2 "register_operand"]
>
> I expand this for MVE:
>   rtx tmpreg = gen_reg_rtx (mode);
>   emit_insn (gen_mve_vec_pack_trunc_lo (mode, tmpreg, operands[1]));
>   emit_insn (gen_mve_vec_pack_trunc_hi (mode, operands[0],
> tmpreg, operands[2]));
>
> I am getting an error in reload:
> error: unable to generate reloads for:
> (insn 10 9 11 2 (set (reg:V4HI 122 [ vect__4.16 ])
> (truncate:V4HI (reg:V4SI 118 [ vect__4.16 ])))
> "/gcc.target/arm/simd/mve-vec-pack.c":17:112 3609
> {mve_vec_pack_trunc_lo_v4si}
>  (expr_list:REG_DEAD (reg:V4SI 118 [ vect__4.16 ])
> (nil)))
>
> The next insn is:
> (insn 11 10 12 2 (set (reg:V8HI 121 [ vect__7.18 ])
> (vec_concat:V8HI (reg:V4HI 122 [ vect__4.16 ])
> (truncate:V4HI (reg:V4SI 119 [ vect__4.17 ]
> "/gcc.target/arm/simd/mve-vec-pack.c":17:112 3611
> {mve_vec_pack_trunc_hi_v4si}
>  (expr_list:REG_DEAD (reg:V4HI 122 [ vect__4.16 ])
> (expr_list:REG_DEAD (reg:V4SI 119 [ vect__4.17 ])
> (nil
>
> What is causing the reload error?

For this to work on MVE, we'd need to allow the 64-bit V_narrow modes to be
stored in MVE registers.  I'm not sure off-hand whether allowing that
would be a good idea or not.

If we continue to allow only 128-bit vectors in MVE registers then
we'll need to continue to use unspecs instead of truncate and vec_concat.

Thanks,
Richard


Re: [PATCH 2/2] arm: Auto-vectorization for MVE: add pack/unpack patterns

2021-06-11 Thread Christophe Lyon via Gcc-patches
On Fri, 11 Jun 2021 at 11:38, Richard Sandiford
 wrote:
>
> Christophe Lyon  writes:
> > On Tue, 8 Jun 2021 at 14:10, Richard Sandiford
> >  wrote:
> >>
> >> Christophe Lyon  writes:
> >> > This patch adds vec_unpack_hi_, vec_unpack_lo_,
> >> > vec_pack_trunc_ patterns for MVE.
> >> >
> >> > It does so by moving the unpack patterns from neon.md to
> >> > vec-common.md, while adding them support for MVE. The pack expander is
> >> > derived from the Neon one (which in turn is renamed into
> >> > neon_quad_vec_pack_trunc_).
> >> >
> >> > The patch introduces mve_vec_pack_trunc_ to avoid the need for a
> >> > zero-initialized temporary, which is needed if the
> >> > vec_pack_trunc_ expander calls @mve_vmovn[bt]q_
> >> > instead.
> >> >
> >> > With this patch, we can now vectorize the 16 and 8-bit versions of
> >> > vclz and vshl, although the generated code could still be improved.
> >> > For test_clz_s16, we now generate
> >> > vldrh.16q3, [r1]
> >> > vmovlb.s16   q2, q3
> >> > vmovlt.s16   q3, q3
> >> > vclz.i32  q2, q2
> >> > vclz.i32  q3, q3
> >> > vmovnb.i32  q1, q2
> >> > vmovnt.i32  q1, q3
> >> > vstrh.16q1, [r0]
> >> > which could be improved to
> >> > vldrh.16q3, [r1]
> >> >   vclz.i16q1, q3
> >> > vstrh.16q1, [r0]
> >> > if we could avoid the need for unpack/pack steps.
> >>
> >> Yeah, there was a PR about fixing this for popcount.  I guess the same
> >> approach would apply here too.
> >>
> >> > For reference, clang-12 generates:
> >> >   vldrh.s32   q0, [r1]
> >> >   vldrh.s32   q1, [r1, #8]
> >> >   vclz.i32q0, q0
> >> >   vstrh.32q0, [r0]
> >> >   vclz.i32q0, q1
> >> >   vstrh.32q0, [r0, #8]
> >> >
> >> > 2021-06-03  Christophe Lyon  
> >> >
> >> >   gcc/
> >> >   * config/arm/mve.md (mve_vmovltq_): Prefix with '@'.
> >> >   (mve_vmovlbq_): Likewise.
> >> >   (mve_vmovnbq_): Likewise.
> >> >   (mve_vmovntq_): Likewise.
> >> >   (@mve_vec_pack_trunc_): New pattern.
> >> >   * config/arm/neon.md (vec_unpack_hi_): Move to
> >> >   vec-common.md.
> >> >   (vec_unpack_lo_): Likewise.
> >> >   (vec_pack_trunc_): Rename to
> >> >   neon_quad_vec_pack_trunc_.
> >> >   * config/arm/vec-common.md (vec_unpack_hi_): New
> >> >   pattern.
> >> >   (vec_unpack_lo_): New.
> >> >   (vec_pack_trunc_): New.
> >> >
> >> >   gcc/testsuite/
> >> >   * gcc.target/arm/simd/mve-vclz.c: Update expected results.
> >> >   * gcc.target/arm/simd/mve-vshl.c: Likewise.
> >> > ---
> >> >  gcc/config/arm/mve.md| 20 -
> >> >  gcc/config/arm/neon.md   | 39 +
> >> >  gcc/config/arm/vec-common.md | 89 
> >> >  gcc/testsuite/gcc.target/arm/simd/mve-vclz.c |  7 +-
> >> >  gcc/testsuite/gcc.target/arm/simd/mve-vshl.c |  5 +-
> >> >  5 files changed, 114 insertions(+), 46 deletions(-)
> >> >
> >> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> >> > index 99e46d0bc69..b18292c07d3 100644
> >> > --- a/gcc/config/arm/mve.md
> >> > +++ b/gcc/config/arm/mve.md
> >> > @@ -510,7 +510,7 @@ (define_insn "mve_vrev32q_"
> >> >  ;;
> >> >  ;; [vmovltq_u, vmovltq_s])
> >> >  ;;
> >> > -(define_insn "mve_vmovltq_"
> >> > +(define_insn "@mve_vmovltq_"
> >> >[
> >> > (set (match_operand: 0 "s_register_operand" "=w")
> >> >   (unspec: [(match_operand:MVE_3 1 
> >> > "s_register_operand" "w")]
> >> > @@ -524,7 +524,7 @@ (define_insn "mve_vmovltq_"
> >> >  ;;
> >> >  ;; [vmovlbq_s, vmovlbq_u])
> >> >  ;;
> >> > -(define_insn "mve_vmovlbq_"
> >> > +(define_insn "@mve_vmovlbq_"
> >> >[
> >> > (set (match_operand: 0 "s_register_operand" "=w")
> >> >   (unspec: [(match_operand:MVE_3 1 
> >> > "s_register_operand" "w")]
> >> > @@ -2187,7 +2187,7 @@ (define_insn "mve_vmlsldavxq_s"
> >> >  ;;
> >> >  ;; [vmovnbq_u, vmovnbq_s])
> >> >  ;;
> >> > -(define_insn "mve_vmovnbq_"
> >> > +(define_insn "@mve_vmovnbq_"
> >> >[
> >> > (set (match_operand: 0 "s_register_operand" "=w")
> >> >   (unspec: [(match_operand: 1 
> >> > "s_register_operand" "0")
> >> > @@ -2202,7 +2202,7 @@ (define_insn "mve_vmovnbq_"
> >> >  ;;
> >> >  ;; [vmovntq_s, vmovntq_u])
> >> >  ;;
> >> > -(define_insn "mve_vmovntq_"
> >> > +(define_insn "@mve_vmovntq_"
> >> >[
> >> > (set (match_operand: 0 "s_register_operand" "=w")
> >> >   (unspec: [(match_operand: 1 
> >> > "s_register_operand" "0")
> >> > @@ -2214,6 +2214,18 @@ (define_insn "mve_vmovntq_"
> >> >[(set_attr "type" "mve_move")
> >> >  ])
> >> >
> >> > +(define_insn "@mve_vec_pack_trunc_"
> >> > + [(set (match_operand: 0 "register_operand" "=")
> >> > +   (vec_concat:
> >> > + (truncate:
> >> > + (match_operand:MVE_5 1 "register_operand" "w"))
> >> > + (truncate:
> >> > +  

Re: [PATCH] Add gnu::diagnose_as attribute

2021-06-11 Thread Matthias Kretz
How can we make progress here? I could try to produce some "Tony Tables" of 
diagnostic output of my modified stdx::simd. I believe it's a major 
productivity boost to see abbreviated / "obfuscated" diagnostics *out-of-the 
box* (with the possibility to opt-out). Actually, it already *is* a 
productivity boost to me. Understanding diagnostics has improved from 

"1. ooof, I'm not going to read this, let me rather guess what the issue was
2. sh** I have to read it
3. several minutes later: I finally found the five words to understand the 
problem; I could use a break"

to

"1. right, let me check that"

For reference I'll attach my stdx::simd diagnose_as patch.

We could also talk about extending the feature to provide more information 
about the diagnose_as substition. E.g. print a list of all diagnose_as 
substitutions, which were used, at the end of the output stream. Or simpler, 
print "note: some identifiers were simplified, use -fno-diagnostics-use-
aliases to see their real names".

On Tuesday, 1 June 2021 21:12:18 CEST Jason Merrill wrote:
> > Right, but then two of my design goals can't be met:
> > 
> > 1. Diagnostics have an improved signal-to-noise ratio out of the box.
> > 
> > 2. We can use replacement names that are not valid identifiers.
> 
> This is the basic disconnect: I think that these goals are
> contradictory, and that replacement names that are not valid identifiers
> will just confuse users that don't know about them.
> 
> If a user sees stdx::foo in a diagnostic and then tries to refer to
> stdx::foo and gets an error, the diagnostic is not more helpful than one
> that uses the fully qualified name.
> 
> Jonathan, David, any thoughts on this issue?

-- 
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 std::experimental::simd  https://github.com/VcDevel/std-simd
──
diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 43331134301..8e0cceff860 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -80,13 +80,13 @@ using __m512d [[__gnu__::__vector_size__(64)]] = double;
 using __m512i [[__gnu__::__vector_size__(64)]] = long long;
 #endif
 
-namespace simd_abi {
+namespace simd_abi [[__gnu__::__diagnose_as__("simd_abi")]] {
 // simd_abi forward declarations {{{
 // implementation details:
-struct _Scalar;
+  struct [[__gnu__::__diagnose_as__("scalar")]] _Scalar;
 
 template 
-  struct _Fixed;
+  struct [[__gnu__::__diagnose_as__("fixed_size")]] _Fixed;
 
 // There are two major ABIs that appear on different architectures.
 // Both have non-boolean values packed into an N Byte register
@@ -105,28 +105,11 @@ template 
 template 
   struct _VecBltnBtmsk;
 
-template 
-  using _VecN = _VecBuiltin;
-
-template 
-  using _Sse = _VecBuiltin<_UsedBytes>;
-
-template 
-  using _Avx = _VecBuiltin<_UsedBytes>;
-
-template 
-  using _Avx512 = _VecBltnBtmsk<_UsedBytes>;
-
-template 
-  using _Neon = _VecBuiltin<_UsedBytes>;
-
-// implementation-defined:
-using __sse = _Sse<>;
-using __avx = _Avx<>;
-using __avx512 = _Avx512<>;
-using __neon = _Neon<>;
-using __neon128 = _Neon<16>;
-using __neon64 = _Neon<8>;
+#if defined __i386__ || defined __x86_64__
+using __sse [[__gnu__::__diagnose_as__("[SSE]")]] = _VecBuiltin<16>;
+using __avx [[__gnu__::__diagnose_as__("[AVX]")]] = _VecBuiltin<32>;
+using __avx512 [[__gnu__::__diagnose_as__("[AVX512]")]] = _VecBltnBtmsk<64>;
+#endif
 
 // standard:
 template 
@@ -364,7 +347,7 @@ namespace __detail
* users link TUs compiled with different flags. This is especially important
* for using simd in libraries.
*/
-  using __odr_helper
+  using __odr_helper [[__gnu__::__diagnose_as__("[ODR helper]")]]
 = conditional_t<__machine_flags() == 0, _OdrEnforcer,
 		_MachineFlagsTemplate<__machine_flags(), __floating_point_flags()>>;
 
@@ -689,7 +672,7 @@ template 
   __is_avx512_abi()
   {
 constexpr auto _Bytes = __abi_bytes_v<_Abi>;
-return _Bytes <= 64 && is_same_v, _Abi>;
+return _Bytes <= 64 && is_same_v, _Abi>;
   }
 
 // }}}
diff --git a/libstdc++-v3/include/experimental/bits/simd_detail.h b/libstdc++-v3/include/experimental/bits/simd_detail.h
index 78ad33f74e4..1f127cd0d52 100644
--- a/libstdc++-v3/include/experimental/bits/simd_detail.h
+++ b/libstdc++-v3/include/experimental/bits/simd_detail.h
@@ -36,7 +36,7 @@
   {\
 _GLIBCXX_BEGIN_NAMESPACE_VERSION   \
   namespace experimental { \
-  inline namespace parallelism_v2 {
+	inline namespace parallelism_v2 

Re: [PATCH 3/4] remove %K from error() calls in the aarch64/arm back ends (PR 98512)

2021-06-11 Thread Richard Sandiford via Gcc-patches
Martin Sebor via Gcc-patches  writes:
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 7b37e1b602c..7cdc824730c 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -13242,13 +13242,8 @@ bounds_check (rtx operand, HOST_WIDE_INT low, 
> HOST_WIDE_INT high,
>lane = INTVAL (operand);
>  
>if (lane < low || lane >= high)
> -{
> -  if (exp)
> - error ("%K%s %wd out of range %wd - %wd",
> -exp, desc, lane, low, high - 1);
> -  else
> - error ("%s %wd out of range %wd - %wd", desc, lane, low, high - 1);
> -}
> +error_at (EXPR_LOCATION (exp),
> +   "%s %wd out of range %wd - %wd", desc, lane, low, high - 1);
>  }
>  
>  /* Bounds-check lanes.  */

This part doesn't look safe: “exp” is null when called from arm_const_bounds.

Thanks,
Richard


Re: [PATCH 2/2] arm: Auto-vectorization for MVE: add pack/unpack patterns

2021-06-11 Thread Richard Sandiford via Gcc-patches
Christophe Lyon  writes:
> On Tue, 8 Jun 2021 at 14:10, Richard Sandiford
>  wrote:
>>
>> Christophe Lyon  writes:
>> > This patch adds vec_unpack_hi_, vec_unpack_lo_,
>> > vec_pack_trunc_ patterns for MVE.
>> >
>> > It does so by moving the unpack patterns from neon.md to
>> > vec-common.md, while adding them support for MVE. The pack expander is
>> > derived from the Neon one (which in turn is renamed into
>> > neon_quad_vec_pack_trunc_).
>> >
>> > The patch introduces mve_vec_pack_trunc_ to avoid the need for a
>> > zero-initialized temporary, which is needed if the
>> > vec_pack_trunc_ expander calls @mve_vmovn[bt]q_
>> > instead.
>> >
>> > With this patch, we can now vectorize the 16 and 8-bit versions of
>> > vclz and vshl, although the generated code could still be improved.
>> > For test_clz_s16, we now generate
>> > vldrh.16q3, [r1]
>> > vmovlb.s16   q2, q3
>> > vmovlt.s16   q3, q3
>> > vclz.i32  q2, q2
>> > vclz.i32  q3, q3
>> > vmovnb.i32  q1, q2
>> > vmovnt.i32  q1, q3
>> > vstrh.16q1, [r0]
>> > which could be improved to
>> > vldrh.16q3, [r1]
>> >   vclz.i16q1, q3
>> > vstrh.16q1, [r0]
>> > if we could avoid the need for unpack/pack steps.
>>
>> Yeah, there was a PR about fixing this for popcount.  I guess the same
>> approach would apply here too.
>>
>> > For reference, clang-12 generates:
>> >   vldrh.s32   q0, [r1]
>> >   vldrh.s32   q1, [r1, #8]
>> >   vclz.i32q0, q0
>> >   vstrh.32q0, [r0]
>> >   vclz.i32q0, q1
>> >   vstrh.32q0, [r0, #8]
>> >
>> > 2021-06-03  Christophe Lyon  
>> >
>> >   gcc/
>> >   * config/arm/mve.md (mve_vmovltq_): Prefix with '@'.
>> >   (mve_vmovlbq_): Likewise.
>> >   (mve_vmovnbq_): Likewise.
>> >   (mve_vmovntq_): Likewise.
>> >   (@mve_vec_pack_trunc_): New pattern.
>> >   * config/arm/neon.md (vec_unpack_hi_): Move to
>> >   vec-common.md.
>> >   (vec_unpack_lo_): Likewise.
>> >   (vec_pack_trunc_): Rename to
>> >   neon_quad_vec_pack_trunc_.
>> >   * config/arm/vec-common.md (vec_unpack_hi_): New
>> >   pattern.
>> >   (vec_unpack_lo_): New.
>> >   (vec_pack_trunc_): New.
>> >
>> >   gcc/testsuite/
>> >   * gcc.target/arm/simd/mve-vclz.c: Update expected results.
>> >   * gcc.target/arm/simd/mve-vshl.c: Likewise.
>> > ---
>> >  gcc/config/arm/mve.md| 20 -
>> >  gcc/config/arm/neon.md   | 39 +
>> >  gcc/config/arm/vec-common.md | 89 
>> >  gcc/testsuite/gcc.target/arm/simd/mve-vclz.c |  7 +-
>> >  gcc/testsuite/gcc.target/arm/simd/mve-vshl.c |  5 +-
>> >  5 files changed, 114 insertions(+), 46 deletions(-)
>> >
>> > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
>> > index 99e46d0bc69..b18292c07d3 100644
>> > --- a/gcc/config/arm/mve.md
>> > +++ b/gcc/config/arm/mve.md
>> > @@ -510,7 +510,7 @@ (define_insn "mve_vrev32q_"
>> >  ;;
>> >  ;; [vmovltq_u, vmovltq_s])
>> >  ;;
>> > -(define_insn "mve_vmovltq_"
>> > +(define_insn "@mve_vmovltq_"
>> >[
>> > (set (match_operand: 0 "s_register_operand" "=w")
>> >   (unspec: [(match_operand:MVE_3 1 
>> > "s_register_operand" "w")]
>> > @@ -524,7 +524,7 @@ (define_insn "mve_vmovltq_"
>> >  ;;
>> >  ;; [vmovlbq_s, vmovlbq_u])
>> >  ;;
>> > -(define_insn "mve_vmovlbq_"
>> > +(define_insn "@mve_vmovlbq_"
>> >[
>> > (set (match_operand: 0 "s_register_operand" "=w")
>> >   (unspec: [(match_operand:MVE_3 1 
>> > "s_register_operand" "w")]
>> > @@ -2187,7 +2187,7 @@ (define_insn "mve_vmlsldavxq_s"
>> >  ;;
>> >  ;; [vmovnbq_u, vmovnbq_s])
>> >  ;;
>> > -(define_insn "mve_vmovnbq_"
>> > +(define_insn "@mve_vmovnbq_"
>> >[
>> > (set (match_operand: 0 "s_register_operand" "=w")
>> >   (unspec: [(match_operand: 1 
>> > "s_register_operand" "0")
>> > @@ -2202,7 +2202,7 @@ (define_insn "mve_vmovnbq_"
>> >  ;;
>> >  ;; [vmovntq_s, vmovntq_u])
>> >  ;;
>> > -(define_insn "mve_vmovntq_"
>> > +(define_insn "@mve_vmovntq_"
>> >[
>> > (set (match_operand: 0 "s_register_operand" "=w")
>> >   (unspec: [(match_operand: 1 
>> > "s_register_operand" "0")
>> > @@ -2214,6 +2214,18 @@ (define_insn "mve_vmovntq_"
>> >[(set_attr "type" "mve_move")
>> >  ])
>> >
>> > +(define_insn "@mve_vec_pack_trunc_"
>> > + [(set (match_operand: 0 "register_operand" "=")
>> > +   (vec_concat:
>> > + (truncate:
>> > + (match_operand:MVE_5 1 "register_operand" "w"))
>> > + (truncate:
>> > + (match_operand:MVE_5 2 "register_operand" "w"]
>> > + "TARGET_HAVE_MVE"
>> > + "vmovnb.i%q0, %q1\;vmovnt.i   %q0, %q2"
>> > +  [(set_attr "type" "mve_move")]
>> > +)
>> > +
>>
>> I realise this is (like you say) based on the neon.md pattern,
>> but we should use separate vmovnb and 

Re: [PATCH] i386: Fix up *vec_concat_0_1 [PR101007]

2021-06-11 Thread Uros Bizjak via Gcc-patches
On Fri, Jun 11, 2021 at 10:59 AM Jakub Jelinek  wrote:
>
> On Fri, Apr 23, 2021 at 12:53:58PM +0800, Hongtao Liu via Gcc-patches wrote:
> > -(define_insn "*vec_concatv4si_0"
> > -  [(set (match_operand:V4SI 0 "register_operand"   "=v,x")
> > - (vec_concat:V4SI
> > -   (match_operand:V2SI 1 "nonimmediate_operand" "vm,?!*y")
> > -   (match_operand:V2SI 2 "const0_operand"   " C,C")))]
> > +(define_insn "*vec_concat_0"
> > +  [(set (match_operand:VI124_128 0 "register_operand"   "=v,x")
> > + (vec_concat:VI124_128
> > +   (match_operand: 1 "nonimmediate_operand" "vm,?!*y")
> > +   (match_operand: 2 "const0_operand"   " C,C")))]
> >"TARGET_SSE2"
> >"@
> > %vmovq\t{%1, %0|%0, %1}
> > @@ -22154,6 +22157,24 @@ (define_insn "avx_vec_concat"
> > (set_attr "prefix" "maybe_evex")
> > (set_attr "mode" "")])
> >
> > +(define_insn_and_split "*vec_concat_0"
> > +  [(set (match_operand:V 0 "register_operand")
> > + (vec_select:V
> > +   (vec_concat:
> > + (match_operand:V 1 "nonimmediate_operand")
> > + (match_operand:V 2 "const0_operand"))
> > +   (match_parallel 3 "movq_parallel"
> > + [(match_operand 4 "const_int_operand")])))]
> > +  "ix86_pre_reload_split ()"
> > +  "#"
> > +  "&& 1"
> > +  [(set (match_dup 0)
> > + (vec_concat:V (match_dup 1) (match_dup 5)))]
> > +{
> > +  operands[1] = gen_lowpart (mode, operands[1]);
> > +  operands[5] = CONST0_RTX (mode);
> > +})
>
> This regressed the following testcase with -msse -mno-sse2.
> The define_insn_and_split splits the permutation into *vec_concat_0
> or *vec_concatv2di_0 insns which both have TARGET_SSE2 in their
> conditions (for the former you can see it above), but the
> define_insn_and_split matches always when the V mode's condition do,
> which for V16QI/V8HI/V4SI/V2DI/V4SF modes is always (well, when those
> modes are valid, which is TARGET_SSE).
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> ok for trunk?
>
> 2021-06-11  Jakub Jelinek  
>
> PR target/101007
> * config/i386/sse.md (*vec_concat_0_1): Require TARGET_SSE2.
>
> * gcc.target/i386/sse-pr101007.c: New test.

OK, even as obvious patch.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2021-06-07 09:24:57.706689972 +0200
> +++ gcc/config/i386/sse.md  2021-06-10 11:14:52.407588679 +0200
> @@ -22395,7 +22395,7 @@ (define_insn_and_split "*vec_concat (match_operand:V 2 "const0_operand"))
>   (match_parallel 3 "movq_parallel"
> [(match_operand 4 "const_int_operand")])))]
> -  "ix86_pre_reload_split ()"
> +  "TARGET_SSE2 && ix86_pre_reload_split ()"
>"#"
>"&& 1"
>[(set (match_dup 0)
> --- gcc/testsuite/gcc.target/i386/sse-pr101007.c.jj 2021-06-10 
> 11:41:25.818609527 +0200
> +++ gcc/testsuite/gcc.target/i386/sse-pr101007.c2021-06-10 
> 11:38:39.301910017 +0200
> @@ -0,0 +1,14 @@
> +/* PR target/101007 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse -mno-sse2" } */
> +
> +typedef unsigned __attribute__((__vector_size__ (8))) U;
> +typedef unsigned __attribute__((__vector_size__ (16))) V;
> +V v;
> +U *p;
> +
> +void
> +foo (void)
> +{
> +  *p = (U) __builtin_shufflevector ((V)(0 == (V){} >= 0), v, 4, 2);
> +}
>
> Jakub
>


[PATCH] middle-end/101009 - fix distance vector recording

2021-06-11 Thread Richard Biener
This fixes recording of distance vectors in case the DDR has just
constant equal indexes.  In that case we expect distance vectors
with zero distances to be recorded which is what was done when
any distance was computed for affine indexes.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-11  Richard Biener  

PR middle-end/101009
* tree-data-ref.c (build_classic_dist_vector_1): Make sure
to set *init_b to true when we encounter a constant equal
index pair.
(compute_affine_dependence): Also dump the actual DR_REF.

* gcc.dg/torture/pr101009.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr101009.c | 17 +
 gcc/tree-data-ref.c | 10 --
 2 files changed, 25 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr101009.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr101009.c 
b/gcc/testsuite/gcc.dg/torture/pr101009.c
new file mode 100644
index 000..2bbed1dfc2c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr101009.c
@@ -0,0 +1,17 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fno-tree-sra -fno-tree-pre 
-ftree-loop-distribution" } */
+
+struct a {
+  unsigned b;
+  unsigned c;
+} e, *f = 
+int d = 1;
+int main() {
+  for (; d; d--) {
+struct a g[] = {{2, 1}, {2, 1}};
+*f = g[1];
+  }
+  if (e.c != 1)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index b1f64684840..b37c234aea8 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -5121,6 +5121,8 @@ build_classic_dist_vector_1 (struct 
data_dependence_relation *ddr,
  non_affine_dependence_relation (ddr);
  return false;
}
+  else
+   *init_b = true;
 }
 
   return true;
@@ -5616,9 +5618,13 @@ compute_affine_dependence (struct 
data_dependence_relation *ddr,
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "(compute_affine_dependence\n");
-  fprintf (dump_file, "  stmt_a: ");
+  fprintf (dump_file, "  ref_a: ");
+  print_generic_expr (dump_file, DR_REF (dra));
+  fprintf (dump_file, ", stmt_a: ");
   print_gimple_stmt (dump_file, DR_STMT (dra), 0, TDF_SLIM);
-  fprintf (dump_file, "  stmt_b: ");
+  fprintf (dump_file, "  ref_b: ");
+  print_generic_expr (dump_file, DR_REF (drb));
+  fprintf (dump_file, ", stmt_b: ");
   print_gimple_stmt (dump_file, DR_STMT (drb), 0, TDF_SLIM);
 }
 
-- 
2.26.2


[PATCH] simplify-rtx: Fix up simplify_logical_relational_operation for vector IOR [PR101008]

2021-06-11 Thread Jakub Jelinek via Gcc-patches
Hi!

simplify_relational_operation callees typically return just const0_rtx
or const_true_rtx and then simplify_relational_operation attempts to fix
that up if the comparison result has vector mode, or floating mode,
or punt if it has scalar mode and vector mode operands (it doesn't know how
exactly to deal with the scalar masks).
But, simplify_logical_relational_operation has a special case, where
it attempts to fold (x < y) | (x >= y) etc. and if it determines it is
always true, it just returns const_true_rtx, without doing the dances that
simplify_relational_operation does.
That results in an ICE on the following testcase, where such folding happens
during expansion (of debug stmts into DEBUG_INSNs) and we ICE because
all of sudden a VOIDmode rtx appears where it expects a vector (V4SImode)
rtx.

The following patch fixes that by moving the adjustement into a separate
helper routine and using it from both simplify_relational_operation and
simplify_logical_relational_operation.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-06-11  Jakub Jelinek  

PR rtl-optimization/101008
* simplify-rtx.c (relational_result): New function.
(simplify_logical_relational_operation,
simplify_relational_operation): Use it.

* gcc.dg/pr101008.c: New test.

--- gcc/simplify-rtx.c.jj   2021-05-04 21:02:24.0 +0200
+++ gcc/simplify-rtx.c  2021-06-10 13:56:48.946628822 +0200
@@ -2294,6 +2294,53 @@ comparison_code_valid_for_mode (enum rtx
gcc_unreachable ();
 }
 }
+
+/* Canonicalize RES, a scalar const0_rtx/const_true_rtx to the right
+   false/true value of comparison with MODE where comparison operands
+   have CMP_MODE.  */
+
+static rtx
+relational_result (machine_mode mode, machine_mode cmp_mode, rtx res)
+{
+  if (SCALAR_FLOAT_MODE_P (mode))
+{
+  if (res == const0_rtx)
+return CONST0_RTX (mode);
+#ifdef FLOAT_STORE_FLAG_VALUE
+  REAL_VALUE_TYPE val = FLOAT_STORE_FLAG_VALUE (mode);
+  return const_double_from_real_value (val, mode);
+#else
+  return NULL_RTX;
+#endif
+}
+  if (VECTOR_MODE_P (mode))
+{
+  if (res == const0_rtx)
+   return CONST0_RTX (mode);
+#ifdef VECTOR_STORE_FLAG_VALUE
+  rtx val = VECTOR_STORE_FLAG_VALUE (mode);
+  if (val == NULL_RTX)
+   return NULL_RTX;
+  if (val == const1_rtx)
+   return CONST1_RTX (mode);
+
+  return gen_const_vec_duplicate (mode, val);
+#else
+  return NULL_RTX;
+#endif
+}
+  /* For vector comparison with scalar int result, it is unknown
+ if the target means here a comparison into an integral bitmask,
+ or comparison where all comparisons true mean const_true_rtx
+ whole result, or where any comparisons true mean const_true_rtx
+ whole result.  For const0_rtx all the cases are the same.  */
+  if (VECTOR_MODE_P (cmp_mode)
+  && SCALAR_INT_MODE_P (mode)
+  && res == const_true_rtx)
+return NULL_RTX;
+
+  return res;
+}
   
 /* Simplify a logical operation CODE with result mode MODE, operating on OP0
and OP1, which should be both relational operations.  Return 0 if no such
@@ -2329,7 +2376,7 @@ simplify_context::simplify_logical_relat
   int mask = mask0 | mask1;
 
   if (mask == 15)
-return const_true_rtx;
+return relational_result (mode, GET_MODE (op0), const_true_rtx);
 
   code = mask_to_comparison (mask);
 
@@ -5315,51 +5362,7 @@ simplify_context::simplify_relational_op
 
   tem = simplify_const_relational_operation (code, cmp_mode, op0, op1);
   if (tem)
-{
-  if (SCALAR_FLOAT_MODE_P (mode))
-   {
-  if (tem == const0_rtx)
-return CONST0_RTX (mode);
-#ifdef FLOAT_STORE_FLAG_VALUE
- {
-   REAL_VALUE_TYPE val;
-   val = FLOAT_STORE_FLAG_VALUE (mode);
-   return const_double_from_real_value (val, mode);
- }
-#else
- return NULL_RTX;
-#endif
-   }
-  if (VECTOR_MODE_P (mode))
-   {
- if (tem == const0_rtx)
-   return CONST0_RTX (mode);
-#ifdef VECTOR_STORE_FLAG_VALUE
- {
-   rtx val = VECTOR_STORE_FLAG_VALUE (mode);
-   if (val == NULL_RTX)
- return NULL_RTX;
-   if (val == const1_rtx)
- return CONST1_RTX (mode);
-
-   return gen_const_vec_duplicate (mode, val);
- }
-#else
- return NULL_RTX;
-#endif
-   }
-  /* For vector comparison with scalar int result, it is unknown
-if the target means here a comparison into an integral bitmask,
-or comparison where all comparisons true mean const_true_rtx
-whole result, or where any comparisons true mean const_true_rtx
-whole result.  For const0_rtx all the cases are the same.  */
-  if (VECTOR_MODE_P (cmp_mode)
- && SCALAR_INT_MODE_P (mode)
- && tem == const_true_rtx)
-   return NULL_RTX;
-
-  return tem;
-}
+return relational_result (mode, 

[PATCH] i386: Fix up *vec_concat_0_1 [PR101007]

2021-06-11 Thread Jakub Jelinek via Gcc-patches
On Fri, Apr 23, 2021 at 12:53:58PM +0800, Hongtao Liu via Gcc-patches wrote:
> -(define_insn "*vec_concatv4si_0"
> -  [(set (match_operand:V4SI 0 "register_operand"   "=v,x")
> - (vec_concat:V4SI
> -   (match_operand:V2SI 1 "nonimmediate_operand" "vm,?!*y")
> -   (match_operand:V2SI 2 "const0_operand"   " C,C")))]
> +(define_insn "*vec_concat_0"
> +  [(set (match_operand:VI124_128 0 "register_operand"   "=v,x")
> + (vec_concat:VI124_128
> +   (match_operand: 1 "nonimmediate_operand" "vm,?!*y")
> +   (match_operand: 2 "const0_operand"   " C,C")))]
>"TARGET_SSE2"
>"@
> %vmovq\t{%1, %0|%0, %1}
> @@ -22154,6 +22157,24 @@ (define_insn "avx_vec_concat"
> (set_attr "prefix" "maybe_evex")
> (set_attr "mode" "")])
>  
> +(define_insn_and_split "*vec_concat_0"
> +  [(set (match_operand:V 0 "register_operand")
> + (vec_select:V
> +   (vec_concat:
> + (match_operand:V 1 "nonimmediate_operand")
> + (match_operand:V 2 "const0_operand"))
> +   (match_parallel 3 "movq_parallel"
> + [(match_operand 4 "const_int_operand")])))]
> +  "ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> + (vec_concat:V (match_dup 1) (match_dup 5)))]
> +{
> +  operands[1] = gen_lowpart (mode, operands[1]);
> +  operands[5] = CONST0_RTX (mode);
> +})

This regressed the following testcase with -msse -mno-sse2.
The define_insn_and_split splits the permutation into *vec_concat_0
or *vec_concatv2di_0 insns which both have TARGET_SSE2 in their
conditions (for the former you can see it above), but the
define_insn_and_split matches always when the V mode's condition do,
which for V16QI/V8HI/V4SI/V2DI/V4SF modes is always (well, when those
modes are valid, which is TARGET_SSE). 

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

2021-06-11  Jakub Jelinek  

PR target/101007
* config/i386/sse.md (*vec_concat_0_1): Require TARGET_SSE2.

* gcc.target/i386/sse-pr101007.c: New test.

--- gcc/config/i386/sse.md.jj   2021-06-07 09:24:57.706689972 +0200
+++ gcc/config/i386/sse.md  2021-06-10 11:14:52.407588679 +0200
@@ -22395,7 +22395,7 @@ (define_insn_and_split "*vec_concat= 0), v, 4, 2);
+}

Jakub



Re: [PATCH 3/4] remove %K from error() calls in the aarch64/arm back ends (PR 98512)

2021-06-11 Thread Christophe Lyon via Gcc-patches
On Fri, 11 Jun 2021 at 01:29, Martin Sebor via Gcc-patches
 wrote:
>
> This patch removes the uses of %K from error() calls in the aarch64
> and arm back ends.  I tested this change only by building a cross-
> compiler but I can't easily run the tests so I'd appreciate some help
> validating it.  The fallout from the change should be limited to changes
> to error messages so in the worst case it could be addressed after
> committing the patch.

I've submitted a validation with your patch, I'll let you know the
results (in a few hours)

Thanks

Christophe


Re: [PATCH] Add statistics counting to PHI-OPT

2021-06-11 Thread Richard Biener via Gcc-patches
On Fri, Jun 11, 2021 at 6:15 AM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> This should have been done before I started to work on connecting
> PHI-OPT to match-and-simplify to see quickly if we miss anything
> but it is better late than never.
> Anyways there was no statistics counting in PHI-OPT before so adding
> it is the right thing to do.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK (minor nit - the strings have inconsistent starting with/without caps)

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-ssa-phiopt.c (replace_phi_edge_with_variable):
> Add counting of how many times it is done.
> (factor_out_conditional_conversion): Likewise.
> (match_simplify_replacement): Likewise.
> (value_replacement): Likewise.
> (spaceship_replacement): Likewise.
> (cond_store_replacement): Likewise.
> (cond_if_else_store_replacement_1): Likewise.
> (hoist_adjacent_loads): Likewise.
> ---
>  gcc/tree-ssa-phiopt.c | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> index 76f4e7ec843..02e26f974a5 100644
> --- a/gcc/tree-ssa-phiopt.c
> +++ b/gcc/tree-ssa-phiopt.c
> @@ -419,6 +419,8 @@ replace_phi_edge_with_variable (basic_block cond_block,
>gsi = gsi_last_bb (cond_block);
>gsi_remove (, true);
>
> +  statistics_counter_event (cfun, "Replace PHI with variable", 1);
> +
>if (dump_file && (dump_flags & TDF_DETAILS))
>  fprintf (dump_file,
>   "COND_EXPR in block %d and PHI in block %d converted to 
> straightline code.\n",
> @@ -618,6 +620,9 @@ factor_out_conditional_conversion (edge e0, edge e1, gphi 
> *phi,
>/* Remove the original PHI stmt.  */
>gsi = gsi_for_stmt (phi);
>gsi_remove (, true);
> +
> +  statistics_counter_event (cfun, "factored out cast", 1);
> +
>return newphi;
>  }
>
> @@ -893,6 +898,11 @@ match_simplify_replacement (basic_block cond_bb, 
> basic_block middle_bb,
>
>replace_phi_edge_with_variable (cond_bb, e1, phi, result);
>
> +  /* Add Statistic here even though replace_phi_edge_with_variable already
> + does it as we want to be able to count when match-simplify happens vs
> + the others.  */
> +  statistics_counter_event (cfun, "match-simplify PHI replacement", 1);
> +
>/* Note that we optimized this PHI.  */
>return true;
>  }
> @@ -1196,6 +1206,8 @@ value_replacement (basic_block cond_bb, basic_block 
> middle_bb,
> }
>else
> {
> + statistics_counter_event (cfun, "Replace PHI with 
> variable/value_replacement", 1);
> +
>   /* Replace the PHI arguments with arg. */
>   SET_PHI_ARG_DEF (phi, e0->dest_idx, arg);
>   SET_PHI_ARG_DEF (phi, e1->dest_idx, arg);
> @@ -2320,6 +2332,7 @@ spaceship_replacement (basic_block cond_bb, basic_block 
> middle_bb,
>
>gimple_stmt_iterator psi = gsi_for_stmt (phi);
>remove_phi_node (, true);
> +  statistics_counter_event (cfun, "spaceship replacement", 1);
>
>return true;
>  }
> @@ -2982,6 +2995,7 @@ cond_store_replacement (basic_block middle_bb, 
> basic_block join_bb,
>fprintf (dump_file, "\nInserted a new PHI statement in joint 
> block:\n");
>print_gimple_stmt (dump_file, new_stmt, 0, TDF_VOPS|TDF_MEMSYMS);
>  }
> +  statistics_counter_event (cfun, "conditional store replacement", 1);
>
>return true;
>  }
> @@ -3056,6 +3070,8 @@ cond_if_else_store_replacement_1 (basic_block then_bb, 
> basic_block else_bb,
>else
>  gsi_insert_before (, new_stmt, GSI_NEW_STMT);
>
> +  statistics_counter_event (cfun, "if-then-else store replacement", 1);
> +
>return true;
>  }
>
> @@ -3469,6 +3485,7 @@ hoist_adjacent_loads (basic_block bb0, basic_block bb1,
>gsi_move_to_bb_end (, bb0);
>gsi2 = gsi_for_stmt (def2);
>gsi_move_to_bb_end (, bb0);
> +  statistics_counter_event (cfun, "hoisted loads", 1);
>
>if (dump_file && (dump_flags & TDF_DETAILS))
> {
> --
> 2.27.0
>


Re: [Patch] contrib/gcc-changelog: Check that PR in subject in in changelog

2021-06-11 Thread Martin Liška

On 6/10/21 5:32 PM, Tobias Burnus wrote:

On 10.06.21 16:46, Martin Liška wrote:

Note that flake8 has "plugins". At openSUSE, I install:


... None of those are available on Ubuntu – I probably should nag doko
or start using my private computer for the tests ...


I support the patch, please install it (except the not intentional change
in gcc/c/c-parser.c).

Then, please ask Jonathan for updating of the server hook.

Cheers,
Martin



Updated as suggested and with you flake8-fix patch applied on top.

Tobias


-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf




[PATCH] Use stablesort for sorting association chain

2021-06-11 Thread Richard Biener
This should preserve the original association order as much as possible
for the initial SLP discovery attempt and also improve consistency.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-11  Richard Biener  

* tree-vect-slp.c (vect_build_slp_tree_2): Use stablesort
to sort operands of the associative chain.
---
 gcc/tree-vect-slp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index c4f8f38012f..6237a61ffd4 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1865,7 +1865,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
  /* Now we have a set of chains with the same length.  */
  /* 1. pre-sort according to def_type and operation.  */
  for (unsigned lane = 0; lane < group_size; ++lane)
-   chains[lane].sort (dt_sort_cmp, vinfo);
+   chains[lane].stablesort (dt_sort_cmp, vinfo);
  if (dump_enabled_p ())
{
  dump_printf_loc (MSG_NOTE, vect_location,
-- 
2.26.2


Re: [PATCH] Expose stable sort algorithm to gcc_sort_r and add vec::stablesort

2021-06-11 Thread Richard Biener
On Thu, 10 Jun 2021, Alexander Monakov wrote:

> On Thu, 10 Jun 2021, Richard Biener wrote:
> 
> > This makes it possible to apply GCCs stable sort algorithm to vec<>
> > and also use it with the qsort_r compatible interface.
> > 
> > Alex, any comments?
> 
> I'm afraid the patch is not correct, see below; (I'll also point out
> errors in comments while at it).

Whoops - thanks for noticing.  This is what you get when copy-editing
adding gcc_stablesort_r in (which I had originally omitted) ...

Fixed, re-bootstrapped and tested on x86_64-unknown-linux-gnu and pushed.

Thanks,
Richard.

>From 0a9a35b9b07dfc82239545ec43dacbc9091543fa Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Thu, 10 Jun 2021 11:03:55 +0200
Subject: [PATCH] Expose stable sort algorithm to gcc_sort_r and add
 vec::stablesort
To: gcc-patches@gcc.gnu.org

This makes it possible to apply GCCs stable sort algorithm to vec<>
and also use it with the qsort_r compatible interface.

2021-06-10  Richard Biener  

* system.h (gcc_stablesort_r): Declare.
* sort.cc (gcc_sort_r): Support stable sort.
(gcc_stablesort_r): Define.
* vec.h (vec<>::stablesort): Add.
---
 gcc/sort.cc  | 14 +-
 gcc/system.h |  1 +
 gcc/vec.h| 24 
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/gcc/sort.cc b/gcc/sort.cc
index fe499b5ec73..1c83c62008d 100644
--- a/gcc/sort.cc
+++ b/gcc/sort.cc
@@ -277,8 +277,12 @@ gcc_sort_r (void *vbase, size_t n, size_t size, 
sort_r_cmp_fn *cmp, void *data)
 {
   if (n < 2)
 return;
+  size_t nlim = 5;
+  bool stable = (ssize_t) size < 0;
+  if (stable)
+nlim = 3, size = ~size;
   char *base = (char *)vbase;
-  sort_r_ctx c = {data, cmp, base, n, size, 5};
+  sort_r_ctx c = {data, cmp, base, n, size, nlim};
   long long scratch[32];
   size_t bufsz = (n / 2) * size;
   void *buf = bufsz <= sizeof scratch ? scratch : xmalloc (bufsz);
@@ -296,3 +300,11 @@ gcc_stablesort (void *vbase, size_t n, size_t size, cmp_fn 
*cmp)
 {
   gcc_qsort (vbase, n, ~size, cmp);
 }
+
+/* Stable sort, signature-compatible to Glibc qsort_r.  */
+void
+gcc_stablesort_r (void *vbase, size_t n, size_t size, sort_r_cmp_fn *cmp,
+ void *data)
+{
+  gcc_sort_r (vbase, n, ~size, cmp, data);
+}
diff --git a/gcc/system.h b/gcc/system.h
index 3c856266cc2..adde3e264b6 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -1250,6 +1250,7 @@ void gcc_sort_r (void *, size_t, size_t, sort_r_cmp_fn *, 
void *);
 void gcc_qsort (void *, size_t, size_t, int (*)(const void *, const void *));
 void gcc_stablesort (void *, size_t, size_t,
 int (*)(const void *, const void *));
+void gcc_stablesort_r (void *, size_t, size_t, sort_r_cmp_fn *, void *data);
 /* Redirect four-argument qsort calls to gcc_qsort; one-argument invocations
correspond to vec::qsort, and use C qsort internally.  */
 #define PP_5th(a1, a2, a3, a4, a5, ...) a5
diff --git a/gcc/vec.h b/gcc/vec.h
index 24df2db0eeb..193377cb69c 100644
--- a/gcc/vec.h
+++ b/gcc/vec.h
@@ -612,6 +612,7 @@ public:
   void block_remove (unsigned, unsigned);
   void qsort (int (*) (const void *, const void *));
   void sort (int (*) (const void *, const void *, void *), void *);
+  void stablesort (int (*) (const void *, const void *, void *), void *);
   T *bsearch (const void *key, int (*compar)(const void *, const void *));
   T *bsearch (const void *key,
  int (*compar)(const void *, const void *, void *), void *);
@@ -1160,6 +1161,17 @@ vec::sort (int (*cmp) (const void *, 
const void *, void *),
 gcc_sort_r (address (), length (), sizeof (T), cmp, data);
 }
 
+/* Sort the contents of this vector with gcc_stablesort_r.  CMP is the
+   comparison function to pass to qsort.  */
+
+template
+inline void
+vec::stablesort (int (*cmp) (const void *, const void *,
+void *), void *data)
+{
+  if (length () > 1)
+gcc_stablesort_r (address (), length (), sizeof (T), cmp, data);
+}
 
 /* Search the contents of the sorted vector with a binary search.
CMP is the comparison function to pass to bsearch.  */
@@ -1488,6 +1500,7 @@ public:
   void block_remove (unsigned, unsigned);
   void qsort (int (*) (const void *, const void *));
   void sort (int (*) (const void *, const void *, void *), void *);
+  void stablesort (int (*) (const void *, const void *, void *), void *);
   T *bsearch (const void *key, int (*compar)(const void *, const void *));
   T *bsearch (const void *key,
  int (*compar)(const void *, const void *, void *), void *);
@@ -2053,6 +2066,17 @@ vec::sort (int (*cmp) (const void *, 
const void *,
 m_vec->sort (cmp, data);
 }
 
+/* Sort the contents of this vector with gcc_stablesort_r.  CMP is the
+   comparison function to pass to qsort.  */
+
+template
+inline void
+vec::stablesort (int (*cmp) (const void *, const void *,
+void *), void *data)
+{
+  if (m_vec)
+

Re: [PATCH] PR fortran/100950 - ICE in output_constructor_regular_field, at varasm.c:5514

2021-06-11 Thread Bernhard Reutner-Fischer via Gcc-patches
On 10 June 2021 20:52:12 CEST, Harald Anlauf  wrote:

>> I think the mpz_set_si args above fit on one line.
>
>That's true.
>
>Since this block is exactly the same as for constant strings,
>which is handled in the first condition, I've thought some more
>and am convinced now that these two can be fused.  Done now.

Thanks.
Btw.. I know that we cast hwi to long int often and use %ld but think there is 
a HOST_WIDE_INT_PRINT_DEC format macro to print a hwi.

>I've also added two cornercases to the testcase, and regtested again.
>
>> btw.. there's a commentary typo in add_init_expr_to_sym():
>> s/skeep/skip/
>
>That is a completely unrelated issue in a different file, right?

Completely unrelated, yes.
>
>Thanks for your constructive comments!

thanks for the patch!