[PATCH v3] LoongArch: Add support code model extreme.

2022-08-19 Thread Lulu Cheng
v1 -> v2:
- Modify some description information.
- Add options -W[no]extreme-plt, warn about code model extreme not support plt 
mode,
and then disable plt.

v2 -> v3:
- When -mcmodel=extreme, default set to -fno-plt mode, if the user forces to use
'-mcmodel=extreme -fplt', an error will be reported.
- Delete -Wextreme-plt.
- Fix bug when compiling with '-mcmodel=normal -fno-plt -mno-explicit-relocs'.


Use five instructions to calculate a signed 64-bit offset relative to the pc.

gcc/ChangeLog:

* config/loongarch/loongarch-opts.cc: Allow cmodel to be extreme.
* config/loongarch/loongarch.cc (loongarch_call_tls_get_addr):
Add extreme support for TLS GD and LD types.
(loongarch_legitimize_tls_address): Add extreme support for TLS LE
and IE.
(loongarch_split_symbol): When compiling with -mcmodel=extreme,
the symbol address will be obtained through five instructions.
(loongarch_print_operand_reloc): Add support.
(loongarch_print_operand): Add support.
(loongarch_print_operand_address): Add support.
(loongarch_option_override_internal): Set '-mcmodel=extreme' option
incompatible with '-mno-explicit-relocs'.
* config/loongarch/loongarch.md (@lui_l_hi20):
Loads bits 12-31 of data into registers.
(lui_h_lo20): Load bits 32-51 of the data and spell bits 0-31 of
the source register.
(lui_h_hi12): Load bits 52-63 of the data and spell bits 0-51 of
the source register.
* config/loongarch/predicates.md: Symbols need to be decomposed
when defining the macro TARGET_CMODEL_EXTREME
* doc/invoke.texi: Modify the description information of cmodel in the 
document.
Document -W[no-]extreme-plt.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/func-call-1.c: Add option '-mcmodel=normal'.
* gcc.target/loongarch/func-call-2.c: Likewise.
* gcc.target/loongarch/func-call-3.c: Likewise.
* gcc.target/loongarch/func-call-4.c: Likewise.
* gcc.target/loongarch/func-call-5.c: Likewise.
* gcc.target/loongarch/func-call-6.c: Likewise.
* gcc.target/loongarch/func-call-7.c: Likewise.
* gcc.target/loongarch/func-call-8.c: Likewise.
* gcc.target/loongarch/relocs-symbol-noaddend.c: Likewise.
* gcc.target/loongarch/func-call-extreme-1.c: New test.
* gcc.target/loongarch/func-call-extreme-2.c: New test.
---
 gcc/config/loongarch/loongarch-opts.cc|   3 +-
 gcc/config/loongarch/loongarch.cc | 223 +++---
 gcc/config/loongarch/loongarch.md |  34 ++-
 gcc/config/loongarch/predicates.md|   9 +-
 gcc/doc/invoke.texi   |  50 +---
 .../gcc.target/loongarch/func-call-1.c|   2 +-
 .../gcc.target/loongarch/func-call-2.c|   2 +-
 .../gcc.target/loongarch/func-call-3.c|   2 +-
 .../gcc.target/loongarch/func-call-4.c|   2 +-
 .../gcc.target/loongarch/func-call-5.c|   2 +-
 .../gcc.target/loongarch/func-call-6.c|   2 +-
 .../gcc.target/loongarch/func-call-7.c|   2 +-
 .../gcc.target/loongarch/func-call-8.c|   2 +-
 .../loongarch/func-call-extreme-1.c   |  32 +++
 .../loongarch/func-call-extreme-2.c   |  32 +++
 .../loongarch/relocs-symbol-noaddend.c|   2 +-
 16 files changed, 319 insertions(+), 82 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/func-call-extreme-2.c

diff --git a/gcc/config/loongarch/loongarch-opts.cc 
b/gcc/config/loongarch/loongarch-opts.cc
index 3f70943ded6..2ae89f23443 100644
--- a/gcc/config/loongarch/loongarch-opts.cc
+++ b/gcc/config/loongarch/loongarch-opts.cc
@@ -376,14 +376,13 @@ fallback:
 
   /* 5.  Target code model */
   t.cmodel = constrained.cmodel ? opt_cmodel : CMODEL_NORMAL;
-  if (t.cmodel != CMODEL_NORMAL)
+  if (t.cmodel != CMODEL_NORMAL && t.cmodel != CMODEL_EXTREME)
 {
   warning (0, "%qs is not supported, now cmodel is set to %qs",
   loongarch_cmodel_strings[t.cmodel], "normal");
   t.cmodel = CMODEL_NORMAL;
 }
 
-
   /* Cleanup and return.  */
   obstack_free (&msg_obstack, NULL);
   *target = t;
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 24378143641..c5667f5ae3c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -2436,7 +2436,19 @@ loongarch_call_tls_get_addr (rtx sym, enum 
loongarch_symbol_type type, rtx v0)
   /* Split tls symbol to high and low.  */
   rtx high = gen_rtx_HIGH (Pmode, copy_rtx (loc));
   high = loongarch_force_temporary (tmp, high);
-  emit_insn (gen_tls_low (Pmode, a0, high, loc));
+
+  if (TARGET_CMODEL_EXTREME)
+   {
+ gcc_assert (TARGET_EXPLICIT_RELOCS);

Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)

2022-08-19 Thread 钟居哲

After deep analysis and several tries. I have made a analysis and conclusion as 
follows.
I really appreciate if you could spend some time take a look at the following 
analysis that I made:

According to the codes in test_vector_subregs_fore_back:

~~~
 poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
  unsigned int min_nunits = constant_lower_bound (nunits);
  scalar_mode int_mode = GET_MODE_INNER (inner_mode);
  unsigned int count = gcd (min_nunits, 4);

  rtx_vector_builder builder (inner_mode, count, 2);
  for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (i, int_mode));
  for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (-(int) i, int_mode));
  rtx x = builder.build ();


Analyze the codes and tried several patterns:

For poly_uint16 (2,2):
x = (const_vector:VNx1DI [
(const_int 0 [0])
(const_int 1 [0x1])
repeat [
(const_int 0 [0])
(const_int -1 [0x])
]
])

For poly_uint16 (4,4):
x = (const_vector:VNx1DI [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
repeat [
(const_int 0 [0])
(const_int -1 [0x])
(const_int -2 [0xfffe])
(const_int -3 [0xfffd])
]
])

So, I think I can conclude the pattern rule as follows:

poly_uint16 (n,n):
x = (const_vector:VNx1DI [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
..
(const_int n [hex(n)])
repeat [
(const_int 0 [0])
(const_int -1 [0x])
(const_int -2 [0xfffe])
(const_int -3 [0xfffd])
.
 (const_int -n [hex(-n)])
]
])
Am I understanding right?

Let's first take a look at the codes you write:

rtx_vector_builder builder (inner_mode, count, 2);
  for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (i, int_mode));
  for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (-(int) i, int_mode));
  rtx x = builder.build ();

So for poly_uint (1,1):
Ideally according to the codes, you want to generate the pattern like this:
x = (const_vector:VNx1DI [
(const_int 0 [0])
repeat [
(const_int 0 [0])
]
])

In your codes, when count = 1 for poly_uint16 (1,1). And i < count iteration 
make i always is 0.
In this case, i = -i = 0.

So we will end up with the first value = 0, and repeat value is also 0, in "rtx 
x = builder.build ();".
Because GCC thinks all the value are 0, so the pattern is changed as follows 
which is not a stepped vector:

x = (const_vector:VNx1DI repeat [
(const_int 0 [0])
])

This is a const_vector duplicate vector.
This is not the correct pattern you want so it fails in the following check.

I think the test pattern generator goes wrong in poly_uint16 (1,1) which is the 
only corner case will go wrong.

To fix this, we should adjust test pattern generator to specifically 
poly_uint16 (1,1). We should make first value
and repeat value different. I tried 2 solution and they both pass for 
poly_uint16 (1,1):


Original codes:
rtx_vector_builder builder (inner_mode, count, 2);
  for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (i, int_mode));
  for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (-(int) i, int_mode));



1st solution:
rtx_vector_builder builder (inner_mode, count, 2);
  for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (count == 1 ? 1 : i, int_mode));
  for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (-(int) i, int_mode));

x = (const_vector:VNx1DI [
(const_int 0 [0])
repeat [
(const_int 1 [0x1])
]
])



2nd solution:
rtx_vector_builder builder (inner_mode, count, 2);
  for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (i, int_mode));
  for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (count == 1 ? -1 : -(int) i, int_mode));

x= 
(const_vector:VNx1DI [
(const_int 0 [0])
repeat [
(const_int -1 [0x])
]
])


Do you agree with these solution? Or you could offer a better solution to fix 
this? Thank you so much.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2022-08-19 20:52
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) 
and allow the machine_mode definition with poly_uint16 (1, 1)
"juzhe.zh...@rivai.ai"  writes:
>> Ah, right, sorry for the bogus suggestion.
>> In

[PATCH v3] c++: Implement -Wself-move warning [PR81159]

2022-08-19 Thread Marek Polacek via Gcc-patches
On Thu, Aug 18, 2022 at 08:33:47PM -0400, Jason Merrill wrote:
> On 8/18/22 13:19, Marek Polacek wrote:
> > On Mon, Aug 15, 2022 at 03:54:05PM -0400, Jason Merrill wrote:
> > > On 8/9/22 09:37, Marek Polacek wrote:
> > > > +  /* We're looking for *std::move ((T &) &arg), or
> > > > + *std::move ((T &) (T *) r) if the argument it a reference.  */
> > > > +  if (!REFERENCE_REF_P (rhs)
> > > > +  || TREE_CODE (TREE_OPERAND (rhs, 0)) != CALL_EXPR)
> > > > +return;
> > > > +  tree fn = TREE_OPERAND (rhs, 0);
> > > > +  if (!is_std_move_p (fn))
> > > > +return;
> > > > +  tree arg = CALL_EXPR_ARG (fn, 0);
> > > > +  if (TREE_CODE (arg) != NOP_EXPR)
> > > > +return;
> > > > +  /* Strip the (T &).  */
> > > > +  arg = TREE_OPERAND (arg, 0);
> > > > +  /* Strip the (T *) or &.  */
> > > > +  arg = TREE_OPERAND (arg, 0);
> > > 
> > > Are you sure these are the only two expressions that can make it here? 
> > > What
> > > if the argument to move is *Tptr?
> > 
> > Not 100% sure but I couldn't find any other form.  For *Tptr we get
> > *std::move ((int * &) &Tptr)
> 
> That likes like what you'd get when the argument is Tptr, not when it's
> *Tptr.  And indeed that's what I see in the testcase:
> 
> > +  Tptr = std::move (Tptr); // { dg-warning "moving a variable to itself" }
> 
> is missing the *

Duh, sorry.  The previous patch didn't handle the *Tptr case.  Further poking
revealed that we need special care to handle (*(Tptr)) and **Tptr etc.  So in
this patch I'm stripping all *s and V_C_Es.  Sigh.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
About 5 years ago we got a request to implement -Wself-move, which
warns about useless moves like this:

  int x;
  x = std::move (x);

This patch implements that warning.

PR c++/81159

gcc/c-family/ChangeLog:

* c.opt (Wself-move): New option.

gcc/cp/ChangeLog:

* typeck.cc (maybe_warn_self_move): New.
(cp_build_modify_expr): Call maybe_warn_self_move.

gcc/ChangeLog:

* doc/invoke.texi: Document -Wself-move.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wself-move1.C: New test.
---
 gcc/c-family/c.opt  |   4 +
 gcc/cp/typeck.cc|  51 +-
 gcc/doc/invoke.texi |  23 -
 gcc/testsuite/g++.dg/warn/Wself-move1.C | 122 
 4 files changed, 198 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wself-move1.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index dfdebd596ef..f776efd39d8 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1229,6 +1229,10 @@ Wselector
 ObjC ObjC++ Var(warn_selector) Warning
 Warn if a selector has multiple methods.
 
+Wself-move
+C++ ObjC++ Var(warn_self_move) Warning LangEnabledBy(C++ ObjC++, Wall)
+Warn when a value is moved to itself with std::move.
+
 Wsequence-point
 C ObjC C++ ObjC++ Var(warn_sequence_point) Warning LangEnabledBy(C ObjC C++ 
ObjC++,Wall)
 Warn about possible violations of sequence point rules.
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 992ebfd99fb..b2bd13db1b6 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -8897,7 +8897,54 @@ cp_build_c_cast (location_t loc, tree type, tree expr,
 
   return error_mark_node;
 }
-
+
+/* Warn when a value is moved to itself with std::move.  LHS is the target,
+   RHS may be the std::move call, and LOC is the location of the whole
+   assignment.  */
+
+static void
+maybe_warn_self_move (location_t loc, tree lhs, tree rhs)
+{
+  if (!warn_self_move)
+return;
+
+  /* C++98 doesn't know move.  */
+  if (cxx_dialect < cxx11)
+return;
+
+  if (processing_template_decl)
+return;
+
+  if (!REFERENCE_REF_P (rhs)
+  || TREE_CODE (TREE_OPERAND (rhs, 0)) != CALL_EXPR)
+return;
+  tree fn = TREE_OPERAND (rhs, 0);
+  if (!is_std_move_p (fn))
+return;
+
+  /* Just a little helper to strip * and various NOPs.  */
+  auto extract_op = [] (tree &op) {
+STRIP_NOPS (op);
+while (INDIRECT_REF_P (op))
+  op = TREE_OPERAND (op, 0);
+op = maybe_undo_parenthesized_ref (op);
+STRIP_ANY_LOCATION_WRAPPER (op);
+  };
+
+  tree arg = CALL_EXPR_ARG (fn, 0);
+  extract_op (arg);
+  if (TREE_CODE (arg) == ADDR_EXPR)
+arg = TREE_OPERAND (arg, 0);
+  extract_op (lhs);
+
+  if (cp_tree_equal (lhs, arg))
+{
+  auto_diagnostic_group d;
+  if (warning_at (loc, OPT_Wself_move, "moving a variable to itself"))
+   inform (loc, "remove % call");
+}
+}
+
 /* For use from the C common bits.  */
 tree
 build_modify_expr (location_t location,
@@ -9101,6 +9148,8 @@ cp_build_modify_expr (location_t loc, tree lhs, enum 
tree_code modifycode,
 
   if (modifycode == NOP_EXPR)
{
+ maybe_warn_self_move (loc, lhs, rhs);
+
  if (c_dialect_objc ())
{
  result = objc_maybe_build_modify_expr (lhs, rhs);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index f65d351a5fc..5dea3fee124 1006

Ping [PATCH V2] libcpp: Optimize #pragma once with a hash table [PR58770]

2022-08-19 Thread Paul Hollinsky via Gcc-patches
Hi all,

Would love some feedback on this patch!

Thanks,
Paul

On Mon, Aug 01, 2022 at 05:18:40AM +, Paul Hollinsky wrote:
> Rather than traversing the all_files linked list for every include,
> this factors out the quick idempotency checks (modification time
> and size) to be the keys in a hash table so we can find matching
> files quickly.
> 
> The hash table value type is a linked list, in case more than one
> file matches the quick check.
> 
> The table is only built if a once-only file is seen, so include
> guard performance is not affected.
> 
> My laptop would previously complete Ricardo's benchmark from the
> PR in ~1.1s using #pragma once, and ~0.35s using include guards.
> 
> After this change, both benchmarks now complete in ~0.35s. I did
> have to randomize the modification dates on the benchmark headers
> so the files did not all end up in the same hash table list, but
> that would likely not come up outside of the contrived benchmark.
> 
> I bootstrapped and ran the testsuite on x86_64 Darwin, as well as
> ppc64le and aarch64 Linux.
> 
> libcpp/ChangeLog:
> 
>   PR preprocessor/58770
>   * internal.h: Add hash table for #pragma once
>   * files.cc: Optimize #pragma once with the hash table
> 
> Signed-off-by: Paul Hollinsky 
> ---
>  libcpp/files.cc   | 116 +++---
>  libcpp/internal.h |   3 ++
>  2 files changed, 112 insertions(+), 7 deletions(-)
> 
> diff --git a/libcpp/files.cc b/libcpp/files.cc
> index 24208f7b0f8..d4ffd77578e 100644
> --- a/libcpp/files.cc
> +++ b/libcpp/files.cc
> @@ -167,6 +167,33 @@ struct file_hash_entry_pool
>struct cpp_file_hash_entry pool[FILE_HASH_POOL_SIZE];
>  };
>  
> +/* A set of attributes designed to quickly identify obviously different files
> +   in a hashtable.  Just in case there are collisions, we still maintain a
> +   list.  These sub-lists can then be checked for #pragma once rather than
> +   interating through all_files.  */
> +struct file_quick_idempotency_attrs
> +{
> +  file_quick_idempotency_attrs(const _cpp_file *f)
> +: mtime(f->st.st_mtime), size(f->st.st_size) {}
> +
> +  time_t mtime;
> +  off_t size;
> +
> +  static hashval_t hash (/* _cpp_file* */ const void *p);
> +};
> +
> +/* Sub-list of very similar files kept in a hashtable to check for #pragma
> +   once.  */
> +struct file_sublist
> +{
> +  _cpp_file *f;
> +  file_sublist *next;
> +
> +  static int eq (/* _cpp_file* */ const void *p,
> +  /* file_sublist* */ const void *q);
> +  static void del (/* file_sublist* */ void *p);
> +};
> +
>  static bool open_file (_cpp_file *file);
>  static bool pch_open_file (cpp_reader *pfile, _cpp_file *file,
>  bool *invalid_pch);
> @@ -849,17 +876,17 @@ has_unique_contents (cpp_reader *pfile, _cpp_file 
> *file, bool import,
>if (!pfile->seen_once_only)
>  return true;
>  
> -  /* We may have read the file under a different name.  Look
> - for likely candidates and compare file contents to be sure.  */
> -  for (_cpp_file *f = pfile->all_files; f; f = f->next_file)
> +  /* We may have read the file under a different name.  We've kept
> + similar looking files in this lists under this hash table, so
> + check those more thoroughly.  */
> +  void* ent = htab_find(pfile->pragma_once_files, file);
> +  for (file_sublist *e = static_cast (ent); e; e = e->next)
>  {
> +  _cpp_file *f = e->f;
>if (f == file)
>   continue; /* It'sa me!  */
>  
> -  if ((import || f->once_only)
> -   && f->err_no == 0
> -   && f->st.st_mtime == file->st.st_mtime
> -   && f->st.st_size == file->st.st_size)
> +  if ((import || f->once_only) && f->err_no == 0)
>   {
> _cpp_file *ref_file;
>  
> @@ -895,6 +922,38 @@ has_unique_contents (cpp_reader *pfile, _cpp_file *file, 
> bool import,
>return true;
>  }
>  
> +/* Add the given file to the #pragma once table so it can be
> +   quickly identified and excluded the next time it's seen.  */
> +static void
> +update_pragma_once_table (cpp_reader *pfile, _cpp_file *file)
> +{
> +  void **slot = htab_find_slot (pfile->pragma_once_files, file, INSERT);
> +  if (slot)
> +{
> +  if (!*slot)
> + *slot = xcalloc (1, sizeof (file_sublist));
> +
> +  file_sublist *e = static_cast (*slot);
> +  while (e->f)
> + {
> +   if (!e->next)
> + {
> +   void *new_sublist = xcalloc(1, sizeof (file_sublist));
> +   e->next = static_cast (new_sublist);
> + }
> +   e = e->next;
> + }
> +
> +  e->f = file;
> +}
> +  else
> +{
> +  cpp_error (pfile, CPP_DL_ERROR,
> +  "Unable to create #pragma once table space for %s",
> +  _cpp_get_file_name (file));
> +}
> +}
> +
>  /* Place the file referenced by FILE into a new buffer on the buffer
> stack if possible.  Returns true if a buffer is stacked.  Use LOC
> for any diagnostics.  */
> @@ -950,6 +1009,9 @@ _cpp_stac

Re: [PATCH] Add ABI test for __bf16 type

2022-08-19 Thread H.J. Lu via Gcc-patches
On Thu, Aug 18, 2022 at 5:56 PM Hongtao Liu via Gcc-patches
 wrote:
>
> On Thu, Aug 18, 2022 at 3:36 PM Haochen Jiang via Gcc-patches
>  wrote:
> >
> > Hi all,
> >
> > This patch aims to add bf16 abi test after the whole __bf16 type is added.
> >
> > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> Ok.

All BF16 ABI tests failed due to missing __m128bf16/__m256bf16/__m512bf16.
When will __bf16 types be added?

> >
> > BRs,
> > Haochen
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/x86_64/abi/bf16/abi-bf16.exp: New test.
> > * gcc.target/x86_64/abi/bf16/args.h: Ditto.
> > * gcc.target/x86_64/abi/bf16/asm-support.S: Ditto.
> > * gcc.target/x86_64/abi/bf16/bf16-check.h: Ditto.
> > * gcc.target/x86_64/abi/bf16/bf16-helper.h: Ditto.
> > * gcc.target/x86_64/abi/bf16/defines.h: Ditto.
> > * gcc.target/x86_64/abi/bf16/m256bf16/abi-bf16-ymm.exp: Ditto.
> > * gcc.target/x86_64/abi/bf16/m256bf16/args.h: Ditto.
> > * gcc.target/x86_64/abi/bf16/m256bf16/asm-support.S: Ditto.
> > * gcc.target/x86_64/abi/bf16/m256bf16/bf16-ymm-check.h: Ditto.
> > * gcc.target/x86_64/abi/bf16/m256bf16/test_m256_returning.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/m256bf16/test_passing_m256.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/m256bf16/test_passing_structs.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/m256bf16/test_passing_unions.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/m256bf16/test_varargs-m256.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/m512bf16/abi-bf16-zmm.exp: Ditto.
> > * gcc.target/x86_64/abi/bf16/m512bf16/args.h: Ditto.
> > * gcc.target/x86_64/abi/bf16/m512bf16/asm-support.S: Ditto.
> > * gcc.target/x86_64/abi/bf16/m512bf16/bf16-zmm-check.h: Ditto.
> > * gcc.target/x86_64/abi/bf16/m512bf16/test_m512_returning.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/m512bf16/test_passing_m512.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/m512bf16/test_passing_structs.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/m512bf16/test_passing_unions.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/m512bf16/test_varargs-m512.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/macros.h: Ditto.
> > * gcc.target/x86_64/abi/bf16/test_3_element_struct_and_unions.c: 
> > Ditto.
> > * gcc.target/x86_64/abi/bf16/test_basic_alignment.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/test_basic_array_size_and_align.c: 
> > Ditto.
> > * gcc.target/x86_64/abi/bf16/test_basic_returning.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/test_basic_sizes.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/test_basic_struct_size_and_align.c: 
> > Ditto.
> > * gcc.target/x86_64/abi/bf16/test_basic_union_size_and_align.c: 
> > Ditto.
> > * gcc.target/x86_64/abi/bf16/test_m128_returning.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/test_passing_floats.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/test_passing_m128.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/test_passing_structs.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/test_passing_unions.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/test_struct_returning.c: Ditto.
> > * gcc.target/x86_64/abi/bf16/test_varargs-m128.c: Ditto.



-- 
H.J.


[Patch] lto-wrapper.cc: Delete offload_names temp files in case of error [PR106686]

2022-08-19 Thread Tobias Burnus

I saw that files such as /tmp/ccxFKYeS.target.o kept accumulating. Not a high 
number,
but still several.

I turned out that when compiling an offloading program successfully, they were 
removed.
But when it failed, those remained. (Example: See PR.)

This patch fixes this by storing the file name earlier and process them during 
cleanup,
unless they have been taken care of before. (Usual way; as they are printf'ed 
to stdout,
I assume the caller takes care of the tmp files.)

OK for mainline?

Tobias


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
lto-wrapper.cc: Delete offload_names temp files in case of error [PR106686]

Usually, the caller takes care of the .o files for the offload compilers
(suffix: ".target.o"). However, if an error occurs during processing
(e.g. fatal error by lto1), they were not deleted.

gcc/ChangeLog:

	PR lto/106686
	* lto-wrapper.cc (free_array_of_ptrs): Move before tool_cleanup.
	(tool_cleanup): Unlink offload_names.
	(compile_offload_image): Take filename argument to set it early.
	(compile_images_for_offload_targets): Update call; set
	offload_names to NULL after freeing the array.

diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index 1e8eba1..9a76470 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
@@ -89,6 +89,25 @@ static bool xassembler_options_error = false;
 
 const char tool_name[] = "lto-wrapper";
 
+/* Auxiliary function that frees elements of PTR and PTR itself.
+   N is number of elements to be freed.  If PTR is NULL, nothing is freed.
+   If an element is NULL, subsequent elements are not freed.  */
+
+static void **
+free_array_of_ptrs (void **ptr, unsigned n)
+{
+  if (!ptr)
+return NULL;
+  for (unsigned i = 0; i < n; i++)
+{
+  if (!ptr[i])
+	break;
+  free (ptr[i]);
+}
+  free (ptr);
+  return NULL;
+}
+
 /* Delete tempfiles.  Called from utils_cleanup.  */
 
 void
@@ -114,6 +133,12 @@ tool_cleanup (bool)
   if (output_names[i])
 	maybe_unlink (output_names[i]);
 }
+  if (offload_names)
+{
+  for (i = 0; offload_names[i]; i++)
+	maybe_unlink (offload_names[i]);
+  free_array_of_ptrs ((void **) offload_names, i);
+}
 }
 
 static void
@@ -626,25 +651,6 @@ merge_and_complain (vec &decoded_options,
   }
 }
 
-/* Auxiliary function that frees elements of PTR and PTR itself.
-   N is number of elements to be freed.  If PTR is NULL, nothing is freed.
-   If an element is NULL, subsequent elements are not freed.  */
-
-static void **
-free_array_of_ptrs (void **ptr, unsigned n)
-{
-  if (!ptr)
-return NULL;
-  for (unsigned i = 0; i < n; i++)
-{
-  if (!ptr[i])
-	break;
-  free (ptr[i]);
-}
-  free (ptr);
-  return NULL;
-}
-
 /* Parse STR, saving found tokens into PVALUES and return their number.
Tokens are assumed to be delimited by ':'.  If APPEND is non-null,
append it to every token we find.  */
@@ -908,13 +914,13 @@ access_check (const char *name, int mode)
 /* Prepare a target image for offload TARGET, using mkoffload tool from
COMPILER_PATH.  Return the name of the resultant object file.  */
 
-static char *
+static const char *
 compile_offload_image (const char *target, const char *compiler_path,
 		   unsigned in_argc, char *in_argv[],
 		   vec compiler_opts,
-		   vec linker_opts)
+		   vec linker_opts,
+		   char **filename)
 {
-  char *filename = NULL;
   char *dumpbase;
   char **argv;
   char *suffix
@@ -922,6 +928,7 @@ compile_offload_image (const char *target, const char *compiler_path,
   strcpy (suffix, "/accel/");
   strcat (suffix, target);
   strcat (suffix, "/mkoffload");
+  *filename = NULL;
 
   char **paths = NULL;
   unsigned n_paths = parse_env_var (compiler_path, &paths, suffix);
@@ -950,9 +957,9 @@ compile_offload_image (const char *target, const char *compiler_path,
 
   /* Generate temporary output file name.  */
   if (save_temps)
-filename = concat (dumpbase, ".o", NULL);
+*filename = concat (dumpbase, ".o", NULL);
   else
-filename = make_temp_file (".target.o");
+*filename = make_temp_file (".target.o");
 
   struct obstack argv_obstack;
   obstack_init (&argv_obstack);
@@ -962,7 +969,7 @@ compile_offload_image (const char *target, const char *compiler_path,
   if (verbose)
 obstack_ptr_grow (&argv_obstack, "-v");
   obstack_ptr_grow (&argv_obstack, "-o");
-  obstack_ptr_grow (&argv_obstack, filename);
+  obstack_ptr_grow (&argv_obstack, *filename);
 
   /* Append names of input object files.  */
   for (unsigned i = 0; i < in_argc; i++)
@@ -986,7 +993,7 @@ compile_offload_image (const char *target, const char *compiler_path,
   obstack_free (&argv_obstack, NULL);
 
   free_array_of_ptrs ((void **) paths, n_paths);
-  return filename;
+  return *filename;
 }
 
 
@@ -1016,10 +1023,9 

Re: [PATCH] wwwdocs: Add D language changes and caveats to gcc-12/changes.html

2022-08-19 Thread Gerald Pfeifer
On Wed, 17 Aug 2022, Iain Buclaw wrote:
> This patch belatedly adds the new features and changes to the D 
> front-end during the GCC 12 development cycle, as well as a bullet in 
> the caveat section for D's new bootstrapping requirements.

Nice!

> +D:
> +Building and bootstrapping GDC, the D compiler, now requires a working 
> GDC
> +compiler (GCC version 9.1 or later) and D runtime library, libphobos, as
> +the D front end is written in D.

Might we be able to omit the "and bootstrapping" reference, which can be 
seen as a specific flavor of builing?

And put "libphobos" in parentheses, in line with "GCC version 9.1..."?

> +by default, but compiles and works if --enable-libphobos is
> +used. Other targets may require a more recent minimum version of GCC to
> +bootstrap. Specifics are documented for affected targets in the


Might we be able to omit "for affected targets"?

How do you feel about

+https://gcc.gnu.org/install/specific.html";>installation
+instructions.

instead of

+https://gcc.gnu.org/install/specific.html";>manual for
+installing GCC.

Genuine questions, all of them.

> +  On supported targets, the __traits(compiles) 
> expression

No comma between "targets" and "the".

> +  -fcheck=, enables or disables the code generation of
> + specific run-time contract checks.

No comma (I think)? ALso for the following entries.

> +  -fcheckaction=, controls the run-time behaviour on an

We generally use US English for consistency. :)

> +  -fdump-c++-spec=, dumps all compiled
> + extern(C++) declarations as C++ code to a file.

"to a file as C++ code"?

> + The supplimentary option -fdump-c++-spec-verbose turns on
> + emission of comments for ignored declaration in the generated C++ spec.

"declarations"

> +  -fextern-std=, controls which C++ standard
> + extern(C++) declarations are compiled to be compatible
> + with.

It feels something is missing here (in terms of grammar)?

> +  -fsave-mixins=, saves mixins expanded at compile-time 
> to
> + a file.
> +  

Will it be clear to everyone what a mixin is? (It's not to me, but I do 
not know D.)

> +  Deprecated and removed features:
:
> +  The -ftransition=dip25 and
> + -ftransition=dip1000 compiler switches have been renamed
> + to -fpreview=dip25 and -fprefix=dip1000.
> +  

Hmm, so the feature as such has not been deprecated or removed; just how 
it's been invoked has changed.


I hope this is useful feedback. As maintainer for everything D in GCC you 
do not need approval to commit, I'm just happy to help (from a consistency
and "naive user" perspective if nothing else).

Cheers,
Gerald


Re: [PATCH] Refactor back_threader_profitability

2022-08-19 Thread Jeff Law via Gcc-patches




On 8/17/2022 1:31 AM, Aldy Hernandez via Gcc-patches wrote:

I just have a few high level comments.

On Tue, Aug 16, 2022 at 4:05 PM Richard Biener  wrote:

The following refactors profitable_path_p in the backward threader,
splitting out parts that can be computed once the exit block is known,
parts that contiguously update and that can be checked allowing
for the path to be later identified as FSM with larger limits,
possibly_profitable_path_p, and final checks done when the whole
path is known, profitable_path_p.

I thought we were removing references to FSM, as they were leftovers
from some previous incarnation.  For that matter, I don't think I ever
understood what they are, so if we're gonna keep them, could you
comment what makes FSM threads different from other threads?
FSM refers to the initial implementation from Steve E. IIRC.  It was 
designed to handle threading backedges in a loop where the path out of 
the current iteration would tell us where a multi-way branch in the next 
loop iteration would go.


During the integration of Steve's work it was recognized that the 
backwards walk was generally a better model and we started moving to the 
backwards based threader with the goal of removing the forward threader.


There should be tests in the testsuite which validate that we haven't 
lost the key transformation.  ssa-thread-backedge.c and pr77445-2.c



The DOM threader has limits?  I thought most of those limits were just
due to the fact that it couldn't determine long enough paths?  Either
way, I like that we're merging the necessary forward threader bits
here, in preparation for its demise ;-).
The forward threader has structural limitations due to its custom block 
copier and CFG update code as well as profitibility limitations.


jeff



Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)

2022-08-19 Thread 钟居哲
I rewrite test_vector_subregs_fore_back as follows:

static void
test_vector_subregs_fore_back (machine_mode inner_mode)
{
  poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
  unsigned int min_nunits = constant_lower_bound (nunits);
  scalar_mode int_mode = GET_MODE_INNER (inner_mode);
  unsigned int count = gcd (min_nunits, 4);

  rtx_vector_builder builder (inner_mode, count, 2);
  unsigned int nelts_per_pattern = count == 1 ? 2 : count;
  for (unsigned int i = 0; i < nelts_per_pattern; ++i)
builder.quick_push (gen_int_mode (i, int_mode));
  for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (-(int) i, int_mode));
  rtx x = builder.build ();

  test_vector_subregs_modes (x);
  if (!nunits.is_constant ())
test_vector_subregs_modes (x, nunits - min_nunits, count);
}

I add the code: unsigned int nelts_per_pattern = count == 1 ? 2 : count;
then replace the "count" into "nelts_per_pattern " in the first loop.

It can pass now. And "x" value I print out seems to be correct:
(const_vector:VNx1DI [
(const_int 0 [0])
repeat [
(const_int 1 [0x1])
]
])

Is this correct solution ? 

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2022-08-19 20:52
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) 
and allow the machine_mode definition with poly_uint16 (1, 1)
"juzhe.zh...@rivai.ai"  writes:
>> Ah, right, sorry for the bogus suggestion.
>> In that case, what specifically goes wrong?  Which test in
>> test_vector_subregs_modes fails, and what does the ICE look like?
>
> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute 
> test_vector_subregs_modes.
> The fail ICE:
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: 
> ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
>   expected: (nil)
>
>   actual: (const_int 0 [0])
> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, 
> rtx_def*, rtx_def*)
> ../../../riscv-gcc/gcc/selftest-rtl.cc:57
> 0x1332504 test_vector_subregs_modes
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
> 0x1332988 test_vector_subregs_fore_back
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
> 0x1332ae7 test_vector_subregs
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
> 0x1332c57 test_vector_ops
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
> 0x1332c7b selftest::simplify_rtx_cc_tests()
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
> 0x21318fa selftest::run_tests()
> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
> 0x1362a76 toplev::run_self_tests()
> ../../../riscv-gcc/gcc/toplev.cc:2205
>
> I analyzed the codes:
> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = 
> NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
> So the assertion fails.
 
Hmm, ok, so the subreg operation is unexpected succeeding.
 
> This is the test for stepped vector using 2 element per pattern.  For 
> poly_uint16 (1,1), it's true it is possible only has 1 element. 
 
The stepped case is 3 elements per pattern rather than 2.  In a stepped
pattern: a, b, b+n are represented explicitly, then the rest are
implicitly b+n*2, b+n*3, etc.
 
The case being handled by this code is instead the 2-element case:
a, b are represented explicitly, then the rest are implicitly all b.
 
Why is (1,1) different though?  The test is constructing:
 
  nunits: 1 + 1x
  shape: nelts_per_pattern == 2, npatterns == 1
  elements: a, b[, b, b, b, b, ...]
 
It then tests subregs starting at 0 + 1x (so starting at the first b).
But for (2,2) we should have:
 
  nunits: 2 + 2x
  shape: nelts_per_pattern == 2, npatterns == 2
  elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
 
and we should test subregs starting at 0 + 2x (so starting at the
first b1).  The two cases should be very similar, it's just that the
(2,2) case doubles the number of patterns.
 
> I think it makes sense to fail the test. However for poly (1,1) machine mode, 
> can we have the chance that some target define this
> kind of machine mode only used for intrinsics?  I already developed full RVV 
> support in GCC (including intrinsc and auto-vectorization).
> I only enable auto-vectorization with mode larger than (2,2) and test it 
> fully.
> From my experience, it seems the stepped vector only created during VLA 
> auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics 
> will
> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~
 
Following on from what I said above, it doesn't look like this particular
case is related to stepped vectors.
 
(1,1) shouldn't (need to) be a special case though.  Any potentital
problems that would occur for (1,1) with npatterns==1 would also occur
for (n

Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)

2022-08-19 Thread 钟居哲
As you mentioned. For poly_uint16 (1, 1), the pattern should be:
shape: nelts_per_pattern == 2, npatterns == 1
 elements: a, b[, b, b, b, b, ...]

I tried to print out the rtx that tests create to test:

static void
test_vector_subregs_fore_back (machine_mode inner_mode)
{
  poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
  unsigned int min_nunits = constant_lower_bound (nunits);
  scalar_mode int_mode = GET_MODE_INNER (inner_mode);
  unsigned int count = gcd (min_nunits, 4);

  rtx_vector_builder builder (inner_mode, count, 2);
  for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (i, int_mode));
  for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (-(int) i, int_mode));
  rtx x = builder.build ();

  test_vector_subregs_modes (x);
  print_rtl_single(stdout,x);
  if (!nunits.is_constant ())
test_vector_subregs_modes (x, nunits - min_nunits, count);
}

the x value:
(const_vector:VNx1DI repeat [
(const_int 0 [0])
])

It seems that it doesn't match the pattern you said. Am I understanding 
correctly? And would you mind taking a look at the codes which generate x:

static void
test_vector_subregs_fore_back (machine_mode inner_mode)
{
  poly_uint64 nunits = GET_MODE_NUNITS (inner_mode);
  unsigned int min_nunits = constant_lower_bound (nunits);
  scalar_mode int_mode = GET_MODE_INNER (inner_mode);
  unsigned int count = gcd (min_nunits, 4);

  rtx_vector_builder builder (inner_mode, count, 2);
  for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (i, int_mode));
  for (unsigned int i = 0; i < count; ++i)
builder.quick_push (gen_int_mode (-(int) i, int_mode));
  rtx x = builder.build ();

The nunits  = (1,1), min_nunits = 1, count = 1.  I print out these variable 
value may helpful for you to check. Thanks!




juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2022-08-19 20:52
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) 
and allow the machine_mode definition with poly_uint16 (1, 1)
"juzhe.zh...@rivai.ai"  writes:
>> Ah, right, sorry for the bogus suggestion.
>> In that case, what specifically goes wrong?  Which test in
>> test_vector_subregs_modes fails, and what does the ICE look like?
>
> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute 
> test_vector_subregs_modes.
> The fail ICE:
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: 
> ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
>   expected: (nil)
>
>   actual: (const_int 0 [0])
> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, 
> rtx_def*, rtx_def*)
> ../../../riscv-gcc/gcc/selftest-rtl.cc:57
> 0x1332504 test_vector_subregs_modes
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
> 0x1332988 test_vector_subregs_fore_back
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
> 0x1332ae7 test_vector_subregs
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
> 0x1332c57 test_vector_ops
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
> 0x1332c7b selftest::simplify_rtx_cc_tests()
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
> 0x21318fa selftest::run_tests()
> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
> 0x1362a76 toplev::run_self_tests()
> ../../../riscv-gcc/gcc/toplev.cc:2205
>
> I analyzed the codes:
> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = 
> NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
> So the assertion fails.
 
Hmm, ok, so the subreg operation is unexpected succeeding.
 
> This is the test for stepped vector using 2 element per pattern.  For 
> poly_uint16 (1,1), it's true it is possible only has 1 element. 
 
The stepped case is 3 elements per pattern rather than 2.  In a stepped
pattern: a, b, b+n are represented explicitly, then the rest are
implicitly b+n*2, b+n*3, etc.
 
The case being handled by this code is instead the 2-element case:
a, b are represented explicitly, then the rest are implicitly all b.
 
Why is (1,1) different though?  The test is constructing:
 
  nunits: 1 + 1x
  shape: nelts_per_pattern == 2, npatterns == 1
  elements: a, b[, b, b, b, b, ...]
 
It then tests subregs starting at 0 + 1x (so starting at the first b).
But for (2,2) we should have:
 
  nunits: 2 + 2x
  shape: nelts_per_pattern == 2, npatterns == 2
  elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]
 
and we should test subregs starting at 0 + 2x (so starting at the
first b1).  The two cases should be very similar, it's just that the
(2,2) case doubles the number of patterns.
 
> I think it makes sense to fail the test. However for poly (1,1) machine mode, 
> can we have the chance that some target define this
> kind of machine mode only used for intrinsics?  I already developed

Re: [Patch] mkoffload: Cleanup temporary omp_requires_file

2022-08-19 Thread Jakub Jelinek via Gcc-patches
On Fri, Aug 19, 2022 at 04:01:10PM +0200, Tobias Burnus wrote:
> Rather obvious, once found. I forgot to add some cleanup, cluttering
> /tmp with ".mkoffload.omp_requires".
> 
> The same issue exists for GCN also for ".mkoffload.dbg"
> 
> OK for mainline? – For the dbg issue, OK also for GCC 11 and 12 backport?
> 
> Follow-up to the 'OpenMP: Move omp requires checks to libgomp" commit
> https://gcc.gnu.org/r13-1458-g683f11843974f0bdf42f79cdcbb0c2b43c7b81b0
> 
> For GCN, the issues exists since
> https://gcc.gnu.org/r11-6683-g505caa7295b93ecdec8ac9b31595eb34dbd48c9f
> 
> Tobias
> 
> PS: In case you wonder why it is also added with -save-temps: The
> internally called maybe_unlink removed the file, unless -save-temps. For
> the latter, with verbose flag, it also shows them as '[Leaving %s]'.
> 
> PPS: If you see some ...target.o files in /tmp, that's because of
> https://gcc.gnu.org/PR106686
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955

> mkoffload: Cleanup temporary omp_requires_file
> 
> The file (suffix ".mkoffload.omp_requires") used to save the 'omp requires'
> data has to be passed to maybe_unlink for cleanup or -v -save-temps stderr
> diagnostic. That was missed before. - For GCN, the same has to be done for
> the files with suffix ".mkoffload.dbg.o".
> 
> gcc/ChangeLog:
> 
>   * config/gcn/mkoffload.cc (main): Add omp_requires_file and dbgobj to
>   files_to_cleanup.
>   * config/i386/intelmic-mkoffload.cc (prepare_target_image): Add
>   omp_requires_file to temp_files.
>   * config/nvptx/mkoffload.cc (omp_requires_file): New global static var.
>   (main): Remove local omp_requires_file var.
>   (tool_cleanup): Handle omp_requires_file.

Ok, thanks.

Jakub



[Patch] mkoffload: Cleanup temporary omp_requires_file

2022-08-19 Thread Tobias Burnus

Rather obvious, once found. I forgot to add some cleanup, cluttering
/tmp with ".mkoffload.omp_requires".

The same issue exists for GCN also for ".mkoffload.dbg"

OK for mainline? – For the dbg issue, OK also for GCC 11 and 12 backport?

Follow-up to the 'OpenMP: Move omp requires checks to libgomp" commit
https://gcc.gnu.org/r13-1458-g683f11843974f0bdf42f79cdcbb0c2b43c7b81b0

For GCN, the issues exists since
https://gcc.gnu.org/r11-6683-g505caa7295b93ecdec8ac9b31595eb34dbd48c9f

Tobias

PS: In case you wonder why it is also added with -save-temps: The
internally called maybe_unlink removed the file, unless -save-temps. For
the latter, with verbose flag, it also shows them as '[Leaving %s]'.

PPS: If you see some ...target.o files in /tmp, that's because of
https://gcc.gnu.org/PR106686
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
mkoffload: Cleanup temporary omp_requires_file

The file (suffix ".mkoffload.omp_requires") used to save the 'omp requires'
data has to be passed to maybe_unlink for cleanup or -v -save-temps stderr
diagnostic. That was missed before. - For GCN, the same has to be done for
the files with suffix ".mkoffload.dbg.o".

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (main): Add omp_requires_file and dbgobj to
	files_to_cleanup.
	* config/i386/intelmic-mkoffload.cc (prepare_target_image): Add
	omp_requires_file to temp_files.
	* config/nvptx/mkoffload.cc (omp_requires_file): New global static var.
	(main): Remove local omp_requires_file var.
	(tool_cleanup): Handle omp_requires_file.

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index d2464332275..4206448703a 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -1030,6 +1030,7 @@ main (int argc, char **argv)
 		}
 		  else
 		dbgobj = make_temp_file (".mkoffload.dbg.o");
+		  obstack_ptr_grow (&files_to_cleanup, dbgobj);
 
 		  /* If the copy fails then just ignore it.  */
 		  if (copy_early_debug_info (argv[ix], dbgobj))
@@ -1085,6 +1086,7 @@ main (int argc, char **argv)
 	omp_requires_file = concat (dumppfx, ".mkoffload.omp_requires", NULL);
   else
 	omp_requires_file = make_temp_file (".mkoffload.omp_requires");
+  obstack_ptr_grow (&files_to_cleanup, omp_requires_file);
 
   /* Run the compiler pass.  */
   xputenv (concat ("GCC_OFFLOAD_OMP_REQUIRES_FILE=", omp_requires_file, NULL));
diff --git a/gcc/config/i386/intelmic-mkoffload.cc b/gcc/config/i386/intelmic-mkoffload.cc
index 596f6f107b8..5deddff6ca2 100644
--- a/gcc/config/i386/intelmic-mkoffload.cc
+++ b/gcc/config/i386/intelmic-mkoffload.cc
@@ -526,6 +526,7 @@ prepare_target_image (const char *target_compiler, int argc, char **argv, uint32
 omp_requires_file = concat (dumppfx, ".mkoffload.omp_requires", NULL);
   else
 omp_requires_file = make_temp_file (".mkoffload.omp_requires");
+  temp_files[num_temps++] = omp_requires_file;
   xputenv (concat ("GCC_OFFLOAD_OMP_REQUIRES_FILE=", omp_requires_file, NULL));
 
   compile_for_target (&argv_obstack);
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 0fa5f4423bf..7557681ad07 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -55,6 +55,7 @@ static id_map *var_ids, **vars_tail = &var_ids;
 /* Files to unlink.  */
 static const char *ptx_name;
 static const char *ptx_cfile_name;
+static const char *omp_requires_file;
 static const char *ptx_dumpbase;
 
 enum offload_abi offload_abi = OFFLOAD_ABI_UNSET;
@@ -68,6 +69,8 @@ tool_cleanup (bool from_signal ATTRIBUTE_UNUSED)
 maybe_unlink (ptx_cfile_name);
   if (ptx_name)
 maybe_unlink (ptx_name);
+  if (omp_requires_file)
+maybe_unlink (omp_requires_file);
 }
 
 static void
@@ -571,7 +689,6 @@ main (int argc, char **argv)
   unsetenv ("COMPILER_PATH");
   unsetenv ("LIBRARY_PATH");
 
-  char *omp_requires_file;
   if (save_temps)
 	omp_requires_file = concat (dumppfx, ".mkoffload.omp_requires", NULL);
   else


[PATCH] tree-optimization/105937 - avoid uninit diagnostics crossing iterations

2022-08-19 Thread Richard Biener via Gcc-patches
The following avoids adding PHIs to the worklist for uninit processing
if we reach them following backedges.  That confuses predicate analysis
because it assumes the use is happening in the same iteration as the the
definition.  For the testcase in the PR the situation is like

void foo (int val)
{
  int uninit;
  # val = PHI <..> (B)
  for (..)
{
  if (..)
{
  .. = val; (C)
  val = uninit;
}
  # val = PHI <..> (A)
}
}

and starting from (A) with 'uninit' as argument we arrive at (B)
and from there at (C).  Predicate analysis then tries to prove
the predicate of (B) (not the backedge) can prove that the
path from (B) to (C) is unreachable which isn't really what it
necessary - that's what we'd need to do when the preheader
edge of the loop were the edge with the uninitialized def.

So the following makes those cases intentionally false negatives.

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push
on Monday if there are no comments.

Thanks,
Richard.

PR tree-optimization/105937
* tree-ssa-uninit.cc (find_uninit_use): Do not queue PHIs
on backedges.
(execute_late_warn_uninitialized): Mark backedges.

* g++.dg/uninit-pr105937.C: New testcase.
---
 gcc/testsuite/g++.dg/uninit-pr105937.C | 235 +
 gcc/tree-ssa-uninit.cc |  14 +-
 2 files changed, 247 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/uninit-pr105937.C

diff --git a/gcc/testsuite/g++.dg/uninit-pr105937.C 
b/gcc/testsuite/g++.dg/uninit-pr105937.C
new file mode 100644
index 000..26b4f74c5e1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/uninit-pr105937.C
@@ -0,0 +1,235 @@
+// { dg-do compile }
+// { dg-require-effective-target c++17 }
+// { dg-options "-O2 -Wall" }
+
+#include 
+#include 
+#include 
+
+using utf8 = char;
+
+enum
+{
+FONT_SIZE_TINY = 2,
+FONT_SIZE_SMALL = 0,
+FONT_SIZE_MEDIUM = 1,
+FONT_SIZE_COUNT = 3
+};
+
+constexpr const uint16_t FONT_SPRITE_GLYPH_COUNT = 224;
+
+enum class FontSpriteBase : int16_t
+{
+MEDIUM_EXTRA_DARK = -2,
+MEDIUM_DARK = -1,
+
+TINY = FONT_SIZE_TINY * FONT_SPRITE_GLYPH_COUNT,
+SMALL = FONT_SIZE_SMALL * FONT_SPRITE_GLYPH_COUNT,
+MEDIUM = FONT_SIZE_MEDIUM * FONT_SPRITE_GLYPH_COUNT,
+};
+
+struct TTFSurface;
+
+class CodepointView
+{
+private:
+std::string_view _str;
+
+public:
+class iterator
+{
+private:
+std::string_view _str;
+size_t _index;
+
+public:
+iterator(std::string_view str, size_t index)
+: _str(str)
+, _index(index)
+{
+}
+
+bool operator==(const iterator& rhs) const
+{
+return _index == rhs._index;
+}
+bool operator!=(const iterator& rhs) const
+{
+return _index != rhs._index;
+}
+char32_t operator*() const
+{
+return GetNextCodepoint(&_str[_index], nullptr);
+}
+iterator& operator++()
+{
+return *this;
+}
+iterator operator++(int)
+{
+auto result = *this;
+if (_index < _str.size())
+{
+const utf8* nextch;
+GetNextCodepoint(&_str[_index], &nextch);
+_index = nextch - _str.data();
+}
+return result;
+}
+
+size_t GetIndex() const
+{
+return _index;
+}
+
+static char32_t GetNextCodepoint(const char* ch, const char** next);
+};
+
+CodepointView(std::string_view str)
+: _str(str)
+{
+}
+
+iterator begin() const
+{
+return iterator(_str, 0);
+}
+
+iterator end() const
+{
+return iterator(_str, _str.size());
+}
+};
+
+struct InternalTTFFont;
+using TTF_Font = InternalTTFFont;
+struct TTFFontDescriptor
+{
+const utf8* filename;
+const utf8* font_name;
+int32_t ptSize;
+int32_t offset_x;
+int32_t offset_y;
+int32_t line_height;
+int32_t hinting_threshold;
+TTF_Font* font;
+};
+using codepoint_t = uint32_t;
+
+#define abstract = 0
+
+struct ITTF
+{
+virtual ~ITTF() = default;
+virtual TTFFontDescriptor* ttf_get_font_from_sprite_base(FontSpriteBase 
spriteBase) abstract;
+virtual TTFSurface* ttf_surface_cache_get_or_add(TTF_Font* font, 
std::string_view text) abstract;
+};
+
+namespace OpenRCT2 {
+struct IContext
+{
+virtual ~IContext() = default;
+
+virtual ITTF* GetTTF() abstract;
+};
+}
+
+static void ttf_draw_string_raw_ttf(OpenRCT2::IContext* context, 
std::string_view text)
+{
+TTFFontDescriptor* fontDesc = 
context->GetTTF()->ttf_get_font_from_sprite_base(FontSpriteBase::MEDIUM_EXTRA_DARK);
+if (fontDesc->font == nullptr)
+{
+return;
+}
+
+TTFSurface* surface = 
context->GetTTF()->ttf_surface_cache_get_or_add(fontDesc->font, text);
+if (surface

Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)

2022-08-19 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai"  writes:
>> Ah, right, sorry for the bogus suggestion.
>> In that case, what specifically goes wrong?  Which test in
>> test_vector_subregs_modes fails, and what does the ICE look like?
>
> Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute 
> test_vector_subregs_modes.
> The fail ICE:
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: 
> ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
>   expected: (nil)
>
>   actual: (const_int 0 [0])
> cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
> 0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, 
> rtx_def*, rtx_def*)
> ../../../riscv-gcc/gcc/selftest-rtl.cc:57
> 0x1332504 test_vector_subregs_modes
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8395
> 0x1332988 test_vector_subregs_fore_back
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8442
> 0x1332ae7 test_vector_subregs
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8467
> 0x1332c57 test_vector_ops
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8487
> 0x1332c7b selftest::simplify_rtx_cc_tests()
> ../../../riscv-gcc/gcc/simplify-rtx.cc:8547
> 0x21318fa selftest::run_tests()
> ../../../riscv-gcc/gcc/selftest-run-tests.cc:115
> 0x1362a76 toplev::run_self_tests()
> ../../../riscv-gcc/gcc/toplev.cc:2205
>
> I analyzed the codes:
> In test_vector_subregs_fore_back, when nunits = (1,1). The expected = 
> NULL_RTX and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
> So the assertion fails.

Hmm, ok, so the subreg operation is unexpected succeeding.

> This is the test for stepped vector using 2 element per pattern.  For 
> poly_uint16 (1,1), it's true it is possible only has 1 element. 

The stepped case is 3 elements per pattern rather than 2.  In a stepped
pattern: a, b, b+n are represented explicitly, then the rest are
implicitly b+n*2, b+n*3, etc.

The case being handled by this code is instead the 2-element case:
a, b are represented explicitly, then the rest are implicitly all b.

Why is (1,1) different though?  The test is constructing:

  nunits: 1 + 1x
  shape: nelts_per_pattern == 2, npatterns == 1
  elements: a, b[, b, b, b, b, ...]

It then tests subregs starting at 0 + 1x (so starting at the first b).
But for (2,2) we should have:

  nunits: 2 + 2x
  shape: nelts_per_pattern == 2, npatterns == 2
  elements: a1, a2, b1, b2[, b1, b2, b1, b2, ...]

and we should test subregs starting at 0 + 2x (so starting at the
first b1).  The two cases should be very similar, it's just that the
(2,2) case doubles the number of patterns.

> I think it makes sense to fail the test. However for poly (1,1) machine mode, 
> can we have the chance that some target define this
> kind of machine mode only used for intrinsics?  I already developed full RVV 
> support in GCC (including intrinsc and auto-vectorization).
> I only enable auto-vectorization with mode larger than (2,2) and test it 
> fully.
> From my experience, it seems the stepped vector only created during VLA 
> auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics 
> will
> not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~

Following on from what I said above, it doesn't look like this particular
case is related to stepped vectors.

(1,1) shouldn't (need to) be a special case though.  Any potentital
problems that would occur for (1,1) with npatterns==1 would also occur
for (n,n) with npatterns==n.  E.g. if stepped vectors are problematic
for (1,1) then an interleaving of 2 stepped vectors (i.e. npatterns==2)
would be problematic for (2,2).

So yeah, preventing a mode being used for autovectorisation would allow
the target to have a bit more control over which constants are actually
generated.  But it shouldn't be necessary to do that for correctness.

Thanks,
Richard

> juzhe.zh...@rivai.ai
>  
> From: Richard Sandiford
> Date: 2022-08-19 17:35
> To: juzhe.zhong\@rivai.ai
> CC: rguenther; gcc-patches; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) 
> and allow the machine_mode definition with poly_uint16 (1, 1)
> "juzhe.zh...@rivai.ai"  writes:
>> Hi, Richard. I tried the codes:
>> if (!nunits.is_constant () && maybe_gt (nunits, 1)) 
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>>
>> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
>  
> Ah, right, sorry for the bogus suggestion.
>  
> In that case, what specifically goes wrong?  Which test in
> test_vector_subregs_modes fails, and what does the ICE look like?
>  
> Thanks,
> Richard
>  
>> But I tried:
>> if (!nunits.is_constant () && known_gt (nunits, 1)) 
>> test_vector_subregs_modes (x, nunits - min_nunits, count);
>> It pass. But it report a warning: "warning: comparison between signed and 
>> unsigned integer expressions [-Wsign-compare]" during the compilation.
>>
>> 

Re: [GCC 13/15][PATCH v3] arm: Add support for dwarf debug directives and pseudo hard-register for PAC feature.

2022-08-19 Thread Richard Earnshaw via Gcc-patches




On 19/08/2022 11:04, Srinath Parvathaneni via Gcc-patches wrote:

Hello,

This patch teaches the DWARF support in gcc about RA_AUTH_CODE pseudo 
hard-register and also
.save {ra_auth_code} and .cfi_offset ra_auth_code  dwarf directives for 
the PAC feature
in Armv8.1-M architecture.

RA_AUTH_CODE register number is 107 and it's dwarf register number is 143.

When compiled with " -march=armv8.1-m.main -mbranch-protection=pac-ret+leaf+bti 
-mthumb
-mfloat-abi=soft -fasynchronous-unwind-tables -g -O2 -S" command line options, 
the assembly
output after this patch looks like below:

 ...
 .cfi_startproc
 pacbti  ip, lr, sp
 movsr1, #40
 push{ip, lr}
 .save {ra_auth_code, lr}
 .cfi_def_cfa_offset 8
 .cfi_offset 143, -8
 .cfi_offset 14, -4
 ...
 pop {ip, lr}
 .cfi_restore 14
 .cfi_restore 143
 .cfi_def_cfa_offset 0
 movsr0, #0
 aut ip, lr, sp
 bx  lr
 .cfi_endproc
 ...

Regression tested on arm-none-eabi target and found no regressions.

Ok for master?

Regards,
Srinath.

gcc/ChangeLog:

2022-08-17  Srinath Parvathaneni  

 * config/arm/aout.h (ra_auth_code): Add to enum.
 * config/arm/arm.cc (emit_multi_reg_push): Add RA_AUTH_CODE register to
 dwarf frame expression.
 (arm_emit_multi_reg_pop): Restore RA_AUTH_CODE register.
 (arm_expand_prologue): Mark as frame related insn.
 (arm_regno_class): Check for pac pseudo reigster.
 (arm_dbx_register_number): Assign ra_auth_code register number in 
dwarf.
 (arm_unwind_emit_sequence): Print .save directive with ra_auth_code
 register.
 (arm_conditional_register_usage): Mark ra_auth_code in fixed reigsters.
 * config/arm/arm.h (FIRST_PSEUDO_REGISTER): Modify.
 (IS_PAC_Pseudo_REGNUM): Define.
 (enum reg_class): Add PAC_REG entry.
 * config/arm/arm.md (RA_AUTH_CODE): Define.

gcc/testsuite/ChangeLog:

2022-08-17  Srinath Parvathaneni  

 * g++.target/arm/pac-1.C: New test.
 * gcc.target/arm/pac-9.c: Likewise.

The general boiler-plate to add the RA register is OK, but the code that 
tweaks the generation of the push instructions is fixing the wrong 
problem.  The dwarf code already knows how to to track reg->reg copies 
and put out the right frame information, but this isn't working because 
you've not augmented the PAC instruction correctly.  What you need is a 
frame-related augmentation to that and that essentially does


(set (IP_REGNUM) (RA_AUTH_CODE))

The generic dwarf code should then handle all the rest for emitting CFI 
directives.


The code in arm_unwind_emit_sequence (and technically in 
arm_unwind_emit_set as well, but we probably never reach that code as of 
today) then needs updating to handle the special cases when IP appears 
in the list of registers and PAC is enabled.  That's a bit of a hack, 
but I can't immediately think of a better way of handling it.


R.



### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index 
b918ad3782fbee82320febb8b6e72ad615780261..ffeed45a678f17c63d5b42c21f020ca416cbf23f
 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -74,7 +74,8 @@
"wr8",   "wr9",   "wr10",  "wr11",  \
"wr12",  "wr13",  "wr14",  "wr15",  \
"wcgr0", "wcgr1", "wcgr2", "wcgr3", \
-  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0"   \
+  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0",  \
+  "ra_auth_code" \
  }
  #endif
  
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h

index 
3495ab857eac38ecdf37e55f1d201b1c35cbde0b..c7067819f6785e44d30d8e5365505ab98682
 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -816,7 +816,8 @@ extern const int arm_arch_cde_coproc_bits[];
s16-s31   S VFP variable (aka d8-d15).
vfpcc   Not a real register.  Represents the VFP condition
code flags.
-   vpr Used to represent MVE VPR predication.  */
+   vpr Used to represent MVE VPR predication.
+   ra_auth_codePseudo register to save PAC.  */
  
  /* The stack backtrace structure is as follows:

fp points to here:  |  save code pointer  |  [fp]
@@ -857,7 +858,7 @@ extern const int arm_arch_cde_coproc_bits[];
1,1,1,1,1,1,1,1,\
1,1,1,1,\
/* Specials.  */\
-  1,1,1,1,1,1,1\
+  1,1,1,1,1,1,1,1  \
  }
  
  /* 1 for registers not available across function calls.

@@ -887,7 +888,7 @@ extern const int arm_arch_cde_coproc_bits[];
1,1,1,1,1,1,1,1,\
1,1,1,1,\
/* Spec

Re: [PATCH v2] LoongArch: Add support code model extreme.

2022-08-19 Thread Xi Ruoyao via Gcc-patches
On Fri, 2022-08-19 at 09:56 +0800, Lulu Cheng wrote:
> v1 -> v2:
> - Modify some description information.
> - Add options -W[no]extreme-plt, warn about code model extreme not support 
> plt mode,
> and then disable plt.

I think we can make -mcmodel=large imply -fno-plt, and reject if user
uses "-fplt -mcmodel=large" explicitly:

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 114ff8ec836..25483da0f44 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5961,9 +5961,8 @@ loongarch_option_override_internal (struct gcc_options 
*opts)
 
if (opts->x_flag_plt)
  {
-   warning (OPT_Wextreme_plt, "code model %qs not support %s mode,"
-"now set to %s",
-"extreme", "plt", "noplt");
+   if (global_options_set.x_flag_plt)
+ error ("code model %qs is not compatible with %s", "-fplt");
opts->x_flag_plt = 0;
  }
break;

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[GCC 13/15][PATCH v3] arm: Add support for dwarf debug directives and pseudo hard-register for PAC feature.

2022-08-19 Thread Srinath Parvathaneni via Gcc-patches
Hello,

This patch teaches the DWARF support in gcc about RA_AUTH_CODE pseudo 
hard-register and also 
.save {ra_auth_code} and .cfi_offset ra_auth_code  dwarf directives for 
the PAC feature
in Armv8.1-M architecture.

RA_AUTH_CODE register number is 107 and it's dwarf register number is 143.

When compiled with " -march=armv8.1-m.main -mbranch-protection=pac-ret+leaf+bti 
-mthumb
-mfloat-abi=soft -fasynchronous-unwind-tables -g -O2 -S" command line options, 
the assembly
output after this patch looks like below:

...
.cfi_startproc
pacbti  ip, lr, sp
movsr1, #40
push{ip, lr}
.save {ra_auth_code, lr}
.cfi_def_cfa_offset 8
.cfi_offset 143, -8
.cfi_offset 14, -4
...
pop {ip, lr}
.cfi_restore 14
.cfi_restore 143
.cfi_def_cfa_offset 0
movsr0, #0
aut ip, lr, sp
bx  lr
.cfi_endproc
...

Regression tested on arm-none-eabi target and found no regressions.

Ok for master?

Regards,
Srinath.

gcc/ChangeLog:

2022-08-17  Srinath Parvathaneni  

* config/arm/aout.h (ra_auth_code): Add to enum.
* config/arm/arm.cc (emit_multi_reg_push): Add RA_AUTH_CODE register to
dwarf frame expression.
(arm_emit_multi_reg_pop): Restore RA_AUTH_CODE register.
(arm_expand_prologue): Mark as frame related insn.
(arm_regno_class): Check for pac pseudo reigster.
(arm_dbx_register_number): Assign ra_auth_code register number in dwarf.
(arm_unwind_emit_sequence): Print .save directive with ra_auth_code
register.
(arm_conditional_register_usage): Mark ra_auth_code in fixed reigsters.
* config/arm/arm.h (FIRST_PSEUDO_REGISTER): Modify.
(IS_PAC_Pseudo_REGNUM): Define.
(enum reg_class): Add PAC_REG entry.
* config/arm/arm.md (RA_AUTH_CODE): Define.

gcc/testsuite/ChangeLog:

2022-08-17  Srinath Parvathaneni  

* g++.target/arm/pac-1.C: New test.
* gcc.target/arm/pac-9.c: Likewise.


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index 
b918ad3782fbee82320febb8b6e72ad615780261..ffeed45a678f17c63d5b42c21f020ca416cbf23f
 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -74,7 +74,8 @@
   "wr8",   "wr9",   "wr10",  "wr11",   \
   "wr12",  "wr13",  "wr14",  "wr15",   \
   "wcgr0", "wcgr1", "wcgr2", "wcgr3",  \
-  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0" \
+  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0",\
+  "ra_auth_code"   \
 }
 #endif
 
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 
3495ab857eac38ecdf37e55f1d201b1c35cbde0b..c7067819f6785e44d30d8e5365505ab98682
 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -816,7 +816,8 @@ extern const int arm_arch_cde_coproc_bits[];
s16-s31   S VFP variable (aka d8-d15).
vfpcc   Not a real register.  Represents the VFP condition
code flags.
-   vpr Used to represent MVE VPR predication.  */
+   vpr Used to represent MVE VPR predication.
+   ra_auth_codePseudo register to save PAC.  */
 
 /* The stack backtrace structure is as follows:
   fp points to here:  |  save code pointer  |  [fp]
@@ -857,7 +858,7 @@ extern const int arm_arch_cde_coproc_bits[];
   1,1,1,1,1,1,1,1, \
   1,1,1,1, \
   /* Specials.  */ \
-  1,1,1,1,1,1,1\
+  1,1,1,1,1,1,1,1  \
 }
 
 /* 1 for registers not available across function calls.
@@ -887,7 +888,7 @@ extern const int arm_arch_cde_coproc_bits[];
   1,1,1,1,1,1,1,1, \
   1,1,1,1, \
   /* Specials.  */ \
-  1,1,1,1,1,1,1\
+  1,1,1,1,1,1,1,1  \
 }
 
 #ifndef SUBTARGET_CONDITIONAL_REGISTER_USAGE
@@ -1063,10 +1064,10 @@ extern const int arm_arch_cde_coproc_bits[];
&& (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1))
 
 /* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP
-   + 1 APSRQ + 1 APSRGE + 1 VPR.  */
+   + 1 APSRQ + 1 APSRGE + 1 VPR + 1 Pseudo register to save PAC.  */
 /* Intel Wireless MMX Technology registers add 16 + 4 more.  */
 /* VFP (VFP3) adds 32 (64) + 1 VFPCC.  */
-#define FIRST_PSEUDO_REGISTER   107
+#define FIRST_PSEUDO_REGISTER   108
 
 #define DBX_REGISTER_NUMBER(REGNO) arm_dbx_register_number (REGNO)
 
@@ -1253,12 +1254,15 @@ extern int arm_regs_in_sequence[];
   CC_REGNUM, VFPCC_REGNUM, \
   FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM,\
   SP_REGNUM, PC_REGNUM, APSRQ_REGNUM,  \
-  APSRGE_REGNUM, VPR_REGNUM\
+  APSRGE_REGNUM, VPR_REGNUM, RA_AUTH_CODE 

Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)

2022-08-19 Thread juzhe.zh...@rivai.ai
> Ah, right, sorry for the bogus suggestion.
> In that case, what specifically goes wrong?  Which test in
> test_vector_subregs_modes fails, and what does the ICE look like?

Because may_be (nunits, 1) return true. For nunits = (1,1), it will execute 
test_vector_subregs_modes.
The fail ICE:
../../../riscv-gcc/gcc/simplify-rtx.cc:8396: test_vector_subregs_modes: FAIL: 
ASSERT_RTX_EQ (expected, simplify_subreg (QImode, x, inner_mode, byte))
  expected: (nil)

  actual: (const_int 0 [0])
cc1: internal compiler error: in assert_rtx_eq_at, at selftest-rtl.cc:57
0x1304ee1 selftest::assert_rtx_eq_at(selftest::location const&, char const*, 
rtx_def*, rtx_def*)
../../../riscv-gcc/gcc/selftest-rtl.cc:57
0x1332504 test_vector_subregs_modes
../../../riscv-gcc/gcc/simplify-rtx.cc:8395
0x1332988 test_vector_subregs_fore_back
../../../riscv-gcc/gcc/simplify-rtx.cc:8442
0x1332ae7 test_vector_subregs
../../../riscv-gcc/gcc/simplify-rtx.cc:8467
0x1332c57 test_vector_ops
../../../riscv-gcc/gcc/simplify-rtx.cc:8487
0x1332c7b selftest::simplify_rtx_cc_tests()
../../../riscv-gcc/gcc/simplify-rtx.cc:8547
0x21318fa selftest::run_tests()
../../../riscv-gcc/gcc/selftest-run-tests.cc:115
0x1362a76 toplev::run_self_tests()
../../../riscv-gcc/gcc/toplev.cc:2205

I analyzed the codes:
In test_vector_subregs_fore_back, when nunits = (1,1). The expected = NULL_RTX 
and simplify_subreg (QImode, x, inner_mode, byte) = const_int 0.
So the assertion fails.

This is the test for stepped vector using 2 element per pattern.  For 
poly_uint16 (1,1), it's true it is possible only has 1 element. 
I think it makes sense to fail the test. However for poly (1,1) machine mode, 
can we have the chance that some target define this
kind of machine mode only used for intrinsics?  I already developed full RVV 
support in GCC (including intrinsc and auto-vectorization).
I only enable auto-vectorization with mode larger than (2,2) and test it fully.
From my experience, it seems the stepped vector only created during VLA 
auto-vectorization. And I think only allow poly (1,1)mode used in intrinsics 
will
not create issues. Am I understanding wrong ?Feel free to correct me. Thanks ~



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2022-08-19 17:35
To: juzhe.zhong\@rivai.ai
CC: rguenther; gcc-patches; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) 
and allow the machine_mode definition with poly_uint16 (1, 1)
"juzhe.zh...@rivai.ai"  writes:
> Hi, Richard. I tried the codes:
> if (!nunits.is_constant () && maybe_gt (nunits, 1)) 
> test_vector_subregs_modes (x, nunits - min_nunits, count);
>
> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
 
Ah, right, sorry for the bogus suggestion.
 
In that case, what specifically goes wrong?  Which test in
test_vector_subregs_modes fails, and what does the ICE look like?
 
Thanks,
Richard
 
> But I tried:
> if (!nunits.is_constant () && known_gt (nunits, 1)) 
> test_vector_subregs_modes (x, nunits - min_nunits, count);
> It pass. But it report a warning: "warning: comparison between signed and 
> unsigned integer expressions [-Wsign-compare]" during the compilation.
>
> Finally, I tried:
> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) 
> test_vector_subregs_modes (x, nunits - min_nunits, count);
> It passed with no warning.
>
> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
> Thanks!
>
>
> juzhe.zh...@rivai.ai
>  
> From: Richard Sandiford
> Date: 2022-08-19 16:03
> To: juzhe.zhong
> CC: gcc-patches; rguenther; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) 
> and allow the machine_mode definition with poly_uint16 (1, 1)
> juzhe.zh...@rivai.ai writes:
>> From: zhongjuzhe 
>>
>> Hello. This patch is preparing for following RVV support.
>>
>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of 
>> ARM SVE is always 128-bit blocks.
>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' 
>> sub-extension and 64bit in 'Zve*' sub-extension.
>>
>> So I define the machine_mode as follows:
>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>
>> The compilation is failed for the stepped vector test:
>> (const_vector:VNx1DI repeat [
>> (const_int 8 [0x8])
>> (const_int 7 [0x7])
>> ])
>>
>> I understand for stepped vector should always have aleast 2 elements and 
>> stepped vector initialization is common
>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that 
>> report fail for stepped vector of poly_uint16 (1, 1).
>>
>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in 
>> intrinsics. And I would like to enable RVV auto-vectoriz

Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)

2022-08-19 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai"  writes:
> Hi, Richard. I tried the codes:
> if (!nunits.is_constant () && maybe_gt (nunits, 1)) 
> test_vector_subregs_modes (x, nunits - min_nunits, count);
>
> It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.

Ah, right, sorry for the bogus suggestion.

In that case, what specifically goes wrong?  Which test in
test_vector_subregs_modes fails, and what does the ICE look like?

Thanks,
Richard

> But I tried:
> if (!nunits.is_constant () && known_gt (nunits, 1)) 
> test_vector_subregs_modes (x, nunits - min_nunits, count);
> It pass. But it report a warning: "warning: comparison between signed and 
> unsigned integer expressions [-Wsign-compare]" during the compilation.
>
> Finally, I tried:
> if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) 
> test_vector_subregs_modes (x, nunits - min_nunits, count);
> It passed with no warning.
>
> Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
> Thanks!
>
>
> juzhe.zh...@rivai.ai
>  
> From: Richard Sandiford
> Date: 2022-08-19 16:03
> To: juzhe.zhong
> CC: gcc-patches; rguenther; kito.cheng
> Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) 
> and allow the machine_mode definition with poly_uint16 (1, 1)
> juzhe.zh...@rivai.ai writes:
>> From: zhongjuzhe 
>>
>> Hello. This patch is preparing for following RVV support.
>>
>> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
>> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of 
>> ARM SVE is always 128-bit blocks.
>> However, the minimum vector length of RVV can be 32bit in 'Zve32*' 
>> sub-extension and 64bit in 'Zve*' sub-extension.
>>
>> So I define the machine_mode as follows:
>> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
>> ADJUST_NUNITS (MODE, riscv_vector_chunks);
>> The riscv_vector_chunks = poly_uint16 (1, 1)
>>
>> The compilation is failed for the stepped vector test:
>> (const_vector:VNx1DI repeat [
>> (const_int 8 [0x8])
>> (const_int 7 [0x7])
>> ])
>>
>> I understand for stepped vector should always have aleast 2 elements and 
>> stepped vector initialization is common
>> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that 
>> report fail for stepped vector of poly_uint16 (1, 1).
>>
>> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in 
>> intrinsics. And I would like to enable RVV auto-vectorization
>> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V 
>> backend. I think it will not create issue if we define
>> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or 
>> offer me some other better solutions. Thanks!
>>
>>   
>>
>> gcc/ChangeLog:
>>
>> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for 
>> poly_uint16 (1, 1).
>>
>> ---
>>  gcc/simplify-rtx.cc | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
>> index 7d09bf7103d..61e0dfa00d0 100644
>> --- a/gcc/simplify-rtx.cc
>> +++ b/gcc/simplify-rtx.cc
>> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>>rtx x = builder.build ();
>>  
>>test_vector_subregs_modes (x);
>> -  if (!nunits.is_constant ())
>> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>>  test_vector_subregs_modes (x, nunits - min_nunits, count);
>  
> I think instead we should use maybe_gt (nunits, 1), on the basis that
> the fore_back tests require vectors that have a minimum of 2 elements.
> Something like poly_uint16 (1, 2) would have the same problem as
> poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
> principle.)
>  
> This corresponds to the minimum of 3 elements for the stepped tests:
>  
>   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
>   && maybe_gt (GET_MODE_NUNITS (mode), 2))
> {
>   test_vector_ops_series (mode, scalar_reg);
>   test_vector_subregs (mode);
> }
>  
> Thanks,
> Richard
>  


Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)

2022-08-19 Thread juzhe.zh...@rivai.ai
Hi, Richard. I tried the codes:
if (!nunits.is_constant () && maybe_gt (nunits, 1)) 
test_vector_subregs_modes (x, nunits - min_nunits, count);

It still failed. For nunits = (1,1) , maybe_gt (nunits, 1) return true value.
But I tried:
if (!nunits.is_constant () && known_gt (nunits, 1)) 
test_vector_subregs_modes (x, nunits - min_nunits, count);
It pass. But it report a warning: "warning: comparison between signed and 
unsigned integer expressions [-Wsign-compare]" during the compilation.

Finally, I tried:
if (!nunits.is_constant () && known_gt (GET_MODE_NUNITS (inner_mode), 1)) 
test_vector_subregs_modes (x, nunits - min_nunits, count);
It passed with no warning.

Is 'known_gt (GET_MODE_NUNITS (inner_mode), 1)' a good solution for this?
Thanks!


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2022-08-19 16:03
To: juzhe.zhong
CC: gcc-patches; rguenther; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) 
and allow the machine_mode definition with poly_uint16 (1, 1)
juzhe.zh...@rivai.ai writes:
> From: zhongjuzhe 
>
> Hello. This patch is preparing for following RVV support.
>
> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of 
> ARM SVE is always 128-bit blocks.
> However, the minimum vector length of RVV can be 32bit in 'Zve32*' 
> sub-extension and 64bit in 'Zve*' sub-extension.
>
> So I define the machine_mode as follows:
> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
> ADJUST_NUNITS (MODE, riscv_vector_chunks);
> The riscv_vector_chunks = poly_uint16 (1, 1)
>
> The compilation is failed for the stepped vector test:
> (const_vector:VNx1DI repeat [
> (const_int 8 [0x8])
> (const_int 7 [0x7])
> ])
>
> I understand for stepped vector should always have aleast 2 elements and 
> stepped vector initialization is common
> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that 
> report fail for stepped vector of poly_uint16 (1, 1).
>
> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in 
> intrinsics. And I would like to enable RVV auto-vectorization
> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V 
> backend. I think it will not create issue if we define
> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or 
> offer me some other better solutions. Thanks!
>
>   
>
> gcc/ChangeLog:
>
> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for 
> poly_uint16 (1, 1).
>
> ---
>  gcc/simplify-rtx.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 7d09bf7103d..61e0dfa00d0 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>rtx x = builder.build ();
>  
>test_vector_subregs_modes (x);
> -  if (!nunits.is_constant ())
> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>  test_vector_subregs_modes (x, nunits - min_nunits, count);
 
I think instead we should use maybe_gt (nunits, 1), on the basis that
the fore_back tests require vectors that have a minimum of 2 elements.
Something like poly_uint16 (1, 2) would have the same problem as
poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
principle.)
 
This corresponds to the minimum of 3 elements for the stepped tests:
 
  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
  && maybe_gt (GET_MODE_NUNITS (mode), 2))
{
  test_vector_ops_series (mode, scalar_reg);
  test_vector_subregs (mode);
}
 
Thanks,
Richard
 


GCC 12.2.1 Status Report (2022-08-19)

2022-08-19 Thread Richard Biener via Gcc-patches
Status
==

The gcc-12 branch is again open for regression and documentation fixes.


Quality Data


Priority  #   Change from last report
---   ---
P1  0  
P2  434   +   2
P3  64+   2
P4  239
P5  25 
---   ---
Total P1-P3 498   +   4
Total   762   +   4


Previous Report
===

https://gcc.gnu.org/pipermail/gcc/2022-August/239270.html


Re: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)

2022-08-19 Thread juzhe.zh...@rivai.ai
Thank you so much. Address your comment. I think "maybe_gt (nunits, 1)" is a 
more solid solution than I do.
I will send a patch to fix this.



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2022-08-19 16:03
To: juzhe.zhong
CC: gcc-patches; rguenther; kito.cheng
Subject: Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) 
and allow the machine_mode definition with poly_uint16 (1, 1)
juzhe.zh...@rivai.ai writes:
> From: zhongjuzhe 
>
> Hello. This patch is preparing for following RVV support.
>
> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of 
> ARM SVE is always 128-bit blocks.
> However, the minimum vector length of RVV can be 32bit in 'Zve32*' 
> sub-extension and 64bit in 'Zve*' sub-extension.
>
> So I define the machine_mode as follows:
> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
> ADJUST_NUNITS (MODE, riscv_vector_chunks);
> The riscv_vector_chunks = poly_uint16 (1, 1)
>
> The compilation is failed for the stepped vector test:
> (const_vector:VNx1DI repeat [
> (const_int 8 [0x8])
> (const_int 7 [0x7])
> ])
>
> I understand for stepped vector should always have aleast 2 elements and 
> stepped vector initialization is common
> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that 
> report fail for stepped vector of poly_uint16 (1, 1).
>
> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in 
> intrinsics. And I would like to enable RVV auto-vectorization
> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V 
> backend. I think it will not create issue if we define
> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or 
> offer me some other better solutions. Thanks!
>
>   
>
> gcc/ChangeLog:
>
> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for 
> poly_uint16 (1, 1).
>
> ---
>  gcc/simplify-rtx.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 7d09bf7103d..61e0dfa00d0 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>rtx x = builder.build ();
>  
>test_vector_subregs_modes (x);
> -  if (!nunits.is_constant ())
> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>  test_vector_subregs_modes (x, nunits - min_nunits, count);
 
I think instead we should use maybe_gt (nunits, 1), on the basis that
the fore_back tests require vectors that have a minimum of 2 elements.
Something like poly_uint16 (1, 2) would have the same problem as
poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
principle.)
 
This corresponds to the minimum of 3 elements for the stepped tests:
 
  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
  && maybe_gt (GET_MODE_NUNITS (mode), 2))
{
  test_vector_ops_series (mode, scalar_reg);
  test_vector_subregs (mode);
}
 
Thanks,
Richard
 


Re: [PATCH] middle-end: skipp stepped vector test of poly_int (1, 1) and allow the machine_mode definition with poly_uint16 (1, 1)

2022-08-19 Thread Richard Sandiford via Gcc-patches
juzhe.zh...@rivai.ai writes:
> From: zhongjuzhe 
>
> Hello. This patch is preparing for following RVV support.
>
> Both ARM SVE and RVV (RISC-V 'V' Extension) support length-agnostic vector.
> The minimum vector length of ARM SVE is 128-bit and the runtime invariant of 
> ARM SVE is always 128-bit blocks.
> However, the minimum vector length of RVV can be 32bit in 'Zve32*' 
> sub-extension and 64bit in 'Zve*' sub-extension.
>
> So I define the machine_mode as follows:
> VECTOR_MODE_WITH_PREFIX (VNx, INT, DI, 1, 0);
> ADJUST_NUNITS (MODE, riscv_vector_chunks);
> The riscv_vector_chunks = poly_uint16 (1, 1)
>
> The compilation is failed for the stepped vector test:
> (const_vector:VNx1DI repeat [
> (const_int 8 [0x8])
> (const_int 7 [0x7])
> ])
>
> I understand for stepped vector should always have aleast 2 elements and 
> stepped vector initialization is common
> for VLA (vector-lengthe agnostic) auto-vectorization. It makes sense that 
> report fail for stepped vector of poly_uint16 (1, 1).
>
> machine mode with nunits = poly_uint16 (1, 1) needs to implemented in 
> intrinsics. And I would like to enable RVV auto-vectorization
> with vector mode only nunits is larger than poly_uint16 (2, 2) in RISC-V 
> backend. I think it will not create issue if we define
> vector mode with nunits = poly_uint16 (1, 1). Feel free to correct me or 
> offer me some other better solutions. Thanks!
>
>   
>
> gcc/ChangeLog:
>
> * simplify-rtx.cc (test_vector_subregs_fore_back): skip test for 
> poly_uint16 (1, 1).
>
> ---
>  gcc/simplify-rtx.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 7d09bf7103d..61e0dfa00d0 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -8438,7 +8438,7 @@ test_vector_subregs_fore_back (machine_mode inner_mode)
>rtx x = builder.build ();
>  
>test_vector_subregs_modes (x);
> -  if (!nunits.is_constant ())
> +  if (!nunits.is_constant () && known_ne (nunits, poly_uint16 (1, 1)))
>  test_vector_subregs_modes (x, nunits - min_nunits, count);

I think instead we should use maybe_gt (nunits, 1), on the basis that
the fore_back tests require vectors that have a minimum of 2 elements.
Something like poly_uint16 (1, 2) would have the same problem as
poly_uint16 (1, 1).  ({1, 2} is an unlikely value, but it's OK in
principle.)

This corresponds to the minimum of 3 elements for the stepped tests:

  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
  && maybe_gt (GET_MODE_NUNITS (mode), 2))
{
  test_vector_ops_series (mode, scalar_reg);
  test_vector_subregs (mode);
}

Thanks,
Richard


Re: [PATCH] jobserver: detect properly O_NONBLOCK

2022-08-19 Thread Richard Biener via Gcc-patches
On Thu, Aug 18, 2022 at 2:43 PM Martin Liška  wrote:
>
> That handles systems that don't have O_NONBLOCK, in that case
> WPA streaming is not using jobserver if --jobserver-auth uses 'fifo'.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> Tested with mingw cross compiler as well.
>
> Ready to be installed?

libstdc++ uses

# For Networking TS.
AC_CHECK_HEADERS([fcntl.h sys/ioctl.h sys/socket.h sys/uio.h poll.h
netdb.h arpa/inet.h netinet/in.h netinet/tcp.h])
AC_CHECK_DECL(F_GETFL,,,[#include ])
AC_CHECK_DECL(F_SETFL,,,[#include ])
if test "$ac_cv_have_decl_F_GETFL$ac_cv_have_decl_F_SETFL" = yesyes ; then
  AC_CHECK_DECL(O_NONBLOCK,,,[#include ])
fi

I'd rather not invent sth fancy with /dev/null (that seems very unixy)

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * configure.ac: Detect O_NONBLOCK flag for open.
> * config.in: Regenerate.
> * configure: Regenerate.
> * opts-common.cc (jobserver_info::connect): Set is_connected
>   properly based on O_NONBLOCK.
> * opts-jobserver.h (struct jobserver_info): Add is_connected
>   member variable.
>
> gcc/lto/ChangeLog:
>
> * lto.cc (wait_for_child): Ask if we are connected to jobserver.
> (stream_out_partitions): Likewise.
> ---
>  gcc/config.in|  6 ++
>  gcc/configure| 39 +--
>  gcc/configure.ac | 11 +++
>  gcc/lto/lto.cc   | 12 ++--
>  gcc/opts-common.cc   | 11 ++-
>  gcc/opts-jobserver.h |  2 ++
>  6 files changed, 72 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/config.in b/gcc/config.in
> index 413b2bd36cb..abab9bf5024 100644
> --- a/gcc/config.in
> +++ b/gcc/config.in
> @@ -2148,6 +2148,12 @@
>  #endif
>
>
> +/* Define if O_NONBLOCK supported by fcntl. */
> +#ifndef USED_FOR_TARGET
> +#undef HOST_HAS_O_NONBLOCK
> +#endif
> +
> +
>  /* Define which stat syscall is able to handle 64bit indodes. */
>  #ifndef USED_FOR_TARGET
>  #undef HOST_STAT_FOR_64BIT_INODES
> diff --git a/gcc/configure b/gcc/configure
> index da7a45066b5..8b416c1a142 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -12460,6 +12460,41 @@ $as_echo "#define HOST_HAS_O_CLOEXEC 1" >>confdefs.h
>
>  fi
>
> +# Check if O_NONBLOCK is defined by fcntl
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for O_NONBLOCK" >&5
> +$as_echo_n "checking for O_NONBLOCK... " >&6; }
> +if ${ac_cv_o_nonblock+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +
> +cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +
> +#include 
> +int
> +main ()
> +{
> +
> +return open ("/dev/null", O_RDONLY | O_NONBLOCK);
> +  ;
> +  return 0;
> +}
> +_ACEOF
> +if ac_fn_cxx_try_compile "$LINENO"; then :
> +  ac_cv_o_nonblock=yes
> +else
> +  ac_cv_o_nonblock=no
> +fi
> +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_o_nonblock" >&5
> +$as_echo "$ac_cv_o_nonblock" >&6; }
> +if test $ac_cv_o_nonblock = yes; then
> +
> +$as_echo "#define HOST_HAS_O_NONBLOCK 1" >>confdefs.h
> +
> +fi
> +
>  # C++ Modules would like some networking features to provide the mapping
>  # server.  You can still use modules without them though.
>  # The following network-related checks could probably do with some
> @@ -19678,7 +19713,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 19681 "configure"
> +#line 19716 "configure"
>  #include "confdefs.h"
>
>  #if HAVE_DLFCN_H
> @@ -19784,7 +19819,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 19787 "configure"
> +#line 19822 "configure"
>  #include "confdefs.h"
>
>  #if HAVE_DLFCN_H
> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index f70b6c24fda..4ebdad38b9b 100644
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -1707,6 +1707,17 @@ if test $ac_cv_o_cloexec = yes; then
>[Define if O_CLOEXEC supported by fcntl.])
>  fi
>
> +# Check if O_NONBLOCK is defined by fcntl
> +AC_CACHE_CHECK(for O_NONBLOCK, ac_cv_o_nonblock, [
> +AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
> +#include ]], [[
> +return open ("/dev/null", O_RDONLY | O_NONBLOCK);]])],
> +[ac_cv_o_nonblock=yes],[ac_cv_o_nonblock=no])])
> +if test $ac_cv_o_nonblock = yes; then
> +  AC_DEFINE(HOST_HAS_O_NONBLOCK, 1,
> +  [Define if O_NONBLOCK supported by fcntl.])
> +fi
> +
>  # C++ Modules would like some networking features to provide the mapping
>  # server.  You can still use modules without them though.
>  # The following network-related checks could probably do with some
> diff --git a/gcc/lto/lto.cc b/gcc/lto/lto.cc
> index c82307f4f7e..3a9147b01b5 100644
> --- a/gcc/lto/lto.cc
> +++ b/gcc/lto/lto.cc
> @@ -213,11 +213,11 @@ wait_for_child ()
>  }
>while (!WIFEXITED (status) && !WIFSIGNALED (status));
>
> ---nruns;
> +  --nruns;
>
> -

Re: [PATCH] Make path_range_query standalone and add reset_path.

2022-08-19 Thread Richard Biener via Gcc-patches
On Thu, Aug 18, 2022 at 1:10 PM Aldy Hernandez  wrote:
>
> On Thu, Aug 18, 2022, 11:34 Richard Biener  wrote:
> >
> > On Thu, Aug 18, 2022 at 9:58 AM Aldy Hernandez  wrote:
> > >
> > > FWIW, here is a rebased version with the latest trunk.
> > >
> > > There are no regressions, or differences in threading counts, and the
> > > performance change is negligible.
> >
> > +{
> > +  mark_dfs_back_edges ();
> > +
> > +  if (flag_checking)
> > +   verify_marked_backedges (fun);
> >
> > that looks redundant.
>
> Do you suggest somewhere else, or should we nuke
> verify_marked_backedges altogether since that's its only use?

Not sure.  I'd leave the function around in any case.

> >
> > Otherwise looks good - it might be possible to get rid of the reset_path use
> > as well?
>
>
> That's what I mentioned wrt DOM threading. There's a 14% degradation
> from not reusing the path_range_query in the forward threader  because
> it really abuses the query to try to simplify its statements.  So I've
> left reset_path for it to use, but we can get rid of it when we nuke
> the old threader.

Ah, I see.

> Aldy
>
>
> Aldy
>
> >
> > Richard.
> >
> > > Aldy
> > >
> > > On Wed, Aug 17, 2022 at 1:59 PM Aldy Hernandez  wrote:
> > > >
> > > > These are a bunch of cleanups inspired by Richi's suggestion of making
> > > > path_range_query standalone, instead of having to call
> > > > compute_ranges() for each path.
> > > >
> > > > I've made the ranger need explicit, and moved the responsibility for
> > > > its creation to the caller.
> > > >
> > > > I've also investigated and documented why the forward threader needs its
> > > > own compute exit dependencies variant.  I can't wait for it to go away
> > > > :-/.
> > > >
> > > > I've also added constructors that take a path and dependencies, and
> > > > made compute_ranges() private.  Unfortunately, reinstantiating
> > > > path_range_query in the forward threader caused a 14% performance
> > > > regression in DOM, because the old threader calls it over and over on
> > > > the same path to simplify statements (some of which not even in the
> > > > IL, but that's old news).
> > > >
> > > > In the meantime, I've left the ability to reset a path, but this time
> > > > appropriately called reset_path().
> > > >
> > > > I've moved the verify_marked_backedges call to the threader.  Is this
> > > > ok?
> > > >
> > > > Interestingly, comparing the thread counts for the patch yielded more
> > > > threads.  I narrowed this down to a bug in the path oracle reset code
> > > > that's not cleaning things up as expected.  I'll fix that before
> > > > committing this, but figured I'd post for comments in the meantime.
> > > >
> > > > Thoughts?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * gimple-range-path.cc (path_range_query::path_range_query): Add
> > > > various constructors to take a path.
> > > > (path_range_query::~path_range_query): Remove m_alloced_ranger.
> > > > (path_range_query::range_on_path_entry): Adjust for m_ranger 
> > > > being
> > > > a reference.
> > > > (path_range_query::set_path): Rename to...
> > > > (path_range_query::reset_path): ...this and call compute_ranges.
> > > > (path_range_query::ssa_range_in_phi): Adjust for m_ranger
> > > > reference.
> > > > (path_range_query::range_defined_in_block): Same.
> > > > (path_range_query::compute_ranges_in_block): Same.
> > > > (path_range_query::adjust_for_non_null_uses): Same.
> > > > (path_range_query::compute_exit_dependencies): Use m_path 
> > > > instead
> > > > of argument.
> > > > (path_range_query::compute_ranges): Remove path argument.
> > > > (path_range_query::range_of_stmt): Adjust for m_ranger 
> > > > reference.
> > > > (path_range_query::compute_outgoing_relations): Same.
> > > > * gimple-range-path.h (class path_range_query): Add various
> > > > constructors.
> > > > Make compute_ranges and compute_exit_dependencies private.
> > > > Rename set_path to reset_path.
> > > > Make m_ranger a reference.
> > > > Remove m_alloced_ranger.
> > > > * tree-ssa-dom.cc (pass_dominator::execute): Adjust constructor 
> > > > to
> > > > path_range_query.
> > > > * tree-ssa-loop-ch.cc (entry_loop_condition_is_static): Take a
> > > > ranger and instantiate a new path_range_query every time.
> > > > (ch_base::copy_headers): Pass ranger instead of 
> > > > path_range_query.
> > > > * tree-ssa-threadbackward.cc (class back_threader): Remove 
> > > > m_solver.
> > > > (back_threader::~back_threader): Remove m_solver.
> > > > (back_threader::find_taken_edge_switch): Adjust for m_ranger
> > > > reference.
> > > > (back_threader::find_taken_edge_cond): Same.
> > > > (back_threader::dump): Remove m_solver.
> > > > (back_threader::back_threader): Move v

Re: [PATCH] Remove path_range_query constructor that takes an edge.

2022-08-19 Thread Richard Biener via Gcc-patches
On Thu, Aug 18, 2022 at 6:13 PM Aldy Hernandez  wrote:
>
> The path_range_query constructor that takes an edge is really a
> convenience function for the loop-ch pass.  It feels wrong to pollute
> the API with such a specialized function that could be done with
> a small inline function closer to its user.
>
> As an added benefit, we remove one use of reset_path.  The last
> remaining one is the forward threader one.
>
> OK?

OK.

> gcc/ChangeLog:
>
> * gimple-range-path.cc (path_range_query::path_range_query):
> Remove constructor that takes edge.
> * gimple-range-path.h (class path_range_query): Same.
> * tree-ssa-loop-ch.cc (edge_range_query): New.
> (entry_loop_condition_is_static): Call edge_range_query.
> ---
>  gcc/gimple-range-path.cc | 15 ---
>  gcc/gimple-range-path.h  |  1 -
>  gcc/tree-ssa-loop-ch.cc  | 17 +++--
>  3 files changed, 15 insertions(+), 18 deletions(-)
>
> diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> index ba7c2ed9b47..bc2879c0c57 100644
> --- a/gcc/gimple-range-path.cc
> +++ b/gcc/gimple-range-path.cc
> @@ -59,21 +59,6 @@ path_range_query::path_range_query (gimple_ranger &ranger, 
> bool resolve)
>m_oracle = new path_oracle (m_ranger.oracle ());
>  }
>
> -path_range_query::path_range_query (gimple_ranger &ranger,
> -   edge e,
> -   bool resolve)
> -  : m_cache (new ssa_global_cache),
> -m_has_cache_entry (BITMAP_ALLOC (NULL)),
> -m_ranger (ranger),
> -m_resolve (resolve)
> -{
> -  m_oracle = new path_oracle (m_ranger.oracle ());
> -  auto_vec bbs (2);
> -  bbs.quick_push (e->dest);
> -  bbs.quick_push (e->src);
> -  reset_path (bbs, NULL);
> -}
> -
>  path_range_query::~path_range_query ()
>  {
>delete m_oracle;
> diff --git a/gcc/gimple-range-path.h b/gcc/gimple-range-path.h
> index 483fde0d431..9f2d6d92dab 100644
> --- a/gcc/gimple-range-path.h
> +++ b/gcc/gimple-range-path.h
> @@ -37,7 +37,6 @@ public:
> const bitmap_head *dependencies = NULL,
> bool resolve = true);
>path_range_query (gimple_ranger &ranger, bool resolve = true);
> -  path_range_query (gimple_ranger &ranger, edge e, bool resolve = true);
>virtual ~path_range_query ();
>void reset_path (const vec &, const bitmap_head 
> *dependencies);
>bool range_of_expr (vrange &r, tree name, gimple * = NULL) override;
> diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc
> index 96816b89287..9c316887d5b 100644
> --- a/gcc/tree-ssa-loop-ch.cc
> +++ b/gcc/tree-ssa-loop-ch.cc
> @@ -45,6 +45,20 @@ along with GCC; see the file COPYING3.  If not see
> increases effectiveness of code motion optimizations, and reduces the need
> for loop preconditioning.  */
>
> +/* Given a path through edge E, whose last statement is COND, return
> +   the range of the solved conditional in R.  */
> +
> +static void
> +edge_range_query (irange &r, edge e, gcond *cond, gimple_ranger &ranger)
> +{
> +  auto_vec path (2);
> +  path.safe_push (e->dest);
> +  path.safe_push (e->src);
> +  path_range_query query (ranger, path);
> +  if (!query.range_of_stmt (r, cond))
> +r.set_varying (boolean_type_node);
> +}
> +
>  /* Return true if the condition on the first iteration of the loop can
> be statically determined.  */
>
> @@ -72,8 +86,7 @@ entry_loop_condition_is_static (class loop *l, 
> gimple_ranger *ranger)
>  desired_static_value = boolean_true_node;
>
>int_range<2> r;
> -  path_range_query query (*ranger, e);
> -  query.range_of_stmt (r, last);
> +  edge_range_query (r, e, last, *ranger);
>return r == int_range<2> (desired_static_value, desired_static_value);
>  }
>
> --
> 2.37.1
>


Re: [PATCH] Fix bogus -Wstringop-overflow warning in Ada

2022-08-19 Thread Richard Biener via Gcc-patches
On Thu, Aug 18, 2022 at 4:46 PM Eric Botcazou  wrote:
>
> > Hmm :/  But that means we _should_ force a sign extension but only
> > from ptrofftype_p ()?  That is, your test above should maybe read
> >
> >signop sgn = TYPE_SIGN (type);
> >if (ptrofftype_p (type))
> >  sgn = SIGNED;
> >
> > assuming 'type' is the type of lowbnd
>
> Yes, that's essentially equivalent to what get_offset_range does, but I'm not
> sure why having two slightly different ways of doing it would be better than a
> single one here,   Maybe replace the call to get_precision in both places with
> TYPE_PRECSION (type) then?

I wasn't aware of the copy in get_offset_range.  To cite:

  wide_int wr[2];
  if (!get_range (x, stmt, wr, rvals))
return false;

  signop sgn = SIGNED;
  /* Only convert signed integers or unsigned sizetype to a signed
 offset and avoid converting large positive values in narrower
 types to negative offsets.  */
  if (TYPE_UNSIGNED (type)
  && wr[0].get_precision () < TYPE_PRECISION (sizetype))
sgn = UNSIGNED;

  r[0] = offset_int::from (wr[0], sgn);
  r[1] = offset_int::from (wr[1], sgn);

I guess the main issue here is that the machinery converts to offset_int
prematurely and thus needs to do it even when it's not clear in what context
(POINTER_PLUS_EXPR offset or not) it is used.  The code unfortunately
is a bit of a mess and I'm not too familiar with it.  I'm OK with your
original patch, given the above it's consistent (even if maybe broken).

Thanks,
Richard.

>
> --
> Eric Botcazou
>
>
>