Re: realpath() patch to fix symlinks resolution for win32

2023-02-10 Thread i.nixman--- via Gcc-patches

On 2023-02-11 06:32, Jonathan Yong wrote:

On 2/6/23 06:40, Jonathan Yong wrote:

On 1/18/23 10:44, i.nixman--- via Gcc-patches wrote:

hello again!

the final version of the patch for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108350


successfully bootstrapped for x86_64-mingw32 and x86_64-linux.

could anyone apply it please?



best!


Looks good to me, please supply the appropriate changelog.


Pushed to master.



Thank you Jonathan!

could you please close the corresponding BR too?:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108350



best!


Re: realpath() patch to fix symlinks resolution for win32

2023-02-10 Thread Jonathan Yong via Gcc-patches

On 2/6/23 06:40, Jonathan Yong wrote:

On 1/18/23 10:44, i.nixman--- via Gcc-patches wrote:

hello again!

the final version of the patch for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108350


successfully bootstrapped for x86_64-mingw32 and x86_64-linux.

could anyone apply it please?



best!


Looks good to me, please supply the appropriate changelog.


Pushed to master.


[PATCH] RISC-V: Optimize the code gen of VLM/VSM.

2023-02-10 Thread incarnation.p.lee--- via Gcc-patches
From: Pan Li 

PR 108185
PR 108654

The byte size of the vbool*_t types isn't well defined. This patch
adjusts the RVV bool modes to the actual mode size in bytes.
However, mode tieing is only allowed when the sizes are exactly equal
for the RVV bool types, i.e. vbool1_t, vbool2_t, vbool4_t, vbool8_t,
vbool16_t, vbool32_t, and vbool64_t.
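
As a rough illustration of the affected types, a mask round-trip
through vlm.v/vsm.v looks like the sketch below (in the spirit of the
pr108185-* tests; the __riscv_-prefixed intrinsic names are an
assumption of mine, not taken from this patch):

  #include <stdint.h>
  #include <stddef.h>
  #include <riscv_vector.h>

  /* Load a mask with vlm.v and store it back with vsm.v; the byte
     size the compiler assumes for vbool8_t is exactly what this
     patch adjusts.  */
  void copy_mask (uint8_t *dst, const uint8_t *src, size_t vl)
  {
    vbool8_t mask = __riscv_vlm_v_b8 (src, vl);
    __riscv_vsm_v_b8 (dst, mask, vl);
  }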

gcc/ChangeLog:

* config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
* config/riscv/riscv.cc (riscv_v_adjust_bytesize):
(riscv_modes_tieable_p):
* config/riscv/riscv.h (riscv_v_adjust_bytesize):
* machmode.h (VECTOR_BOOL_MODE_P):
* tree-ssa-sccvn.cc (visit_reference_op_load):

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr108185-1.c: New test.
* gcc.target/riscv/pr108185-2.c: New test.
* gcc.target/riscv/pr108185-3.c: New test.
* gcc.target/riscv/pr108185-4.c: New test.
* gcc.target/riscv/pr108185-5.c: New test.
* gcc.target/riscv/pr108185-6.c: New test.
* gcc.target/riscv/pr108185-7.c: New test.
* gcc.target/riscv/pr108185-8.c: New test.
---
 gcc/config/riscv/riscv-modes.def| 14 ++--
 gcc/config/riscv/riscv.cc   | 34 -
 gcc/config/riscv/riscv.h|  2 +
 gcc/machmode.h  |  3 +
 gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++
 gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++
 gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++
 gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++
 gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++
 gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++
 gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++
 gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +
 gcc/tree-ssa-sccvn.cc   | 13 +++-
 13 files changed, 608 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c

diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index d5305efa8a6..cc21d3c83a2 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);
 ADJUST_ALIGNMENT (VNx32BI, 1);
 ADJUST_ALIGNMENT (VNx64BI, 1);
 
-ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
-ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
+ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1));
+ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1));
+ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1));
+ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1));
+ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2));
+ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4));
+ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
 
 /*
| Mode| MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3b7804b7501..995cdab108f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1003,6 +1003,27 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
   return scale;
 }
 
+/* Called from ADJUST_BYTESIZE in riscv-modes.def.  Return the correct
+   size in bytes for the corresponding machine_mode.  */
+
+poly_int64
+riscv_v_adjust_bytesize (machine_mode mode, int scale)
+{
+  gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL);
+
+  if (riscv_v_ext_vector_mode_p (mode))
+{
+  poly_uint16 mode_size = GET_MODE_SIZE (mode);
+
+  if (known_lt (mode_size, BYTES_PER_RISCV_VECTOR))
+   return mode_size;
+  else
+   return BYTES_PER_RISCV_VECTOR;
+}
+
+  return scale;
+}
+
 /* Return true if X is a valid address for machine mode MODE.  If it is,
fill in INFO appropriately.  STRICT_P is true if REG_OK_STRICT is in
effect.  */
@@ -5807,11 +5828,22 @@ riscv_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)
 /* 

Re: [PATCH] testsuite: adjust patterns in RISC-V tests to skip unwind table directives

2023-02-10 Thread Jeff Law via Gcc-patches




On 2/9/23 02:48, Andreas Schwab via Gcc-patches wrote:

PR target/108723
* gcc.target/riscv/shorten-memrefs-1.c: Adjust patterns to skip
over cfi directives.
* gcc.target/riscv/shorten-memrefs-2.c: Likewise.
* gcc.target/riscv/shorten-memrefs-3.c: Likewise.
* gcc.target/riscv/shorten-memrefs-4.c: Likewise.
* gcc.target/riscv/shorten-memrefs-5.c: Likewise.
* gcc.target/riscv/shorten-memrefs-6.c: Likewise.
* gcc.target/riscv/shorten-memrefs-8.c: Likewise.

OK.  Thanks.

Jeff


Re: [PATCH] libstdc++: testsuite: Add char8_t to codecvt_unicode

2023-02-10 Thread Hans-Peter Nilsson
Casual observation from a random reader that's sometimes hit by 
testresults acting up:

On Thu, 9 Feb 2023, Dimitrij Mijoski via Gcc-patches wrote:

> libstdc++-v3/ChangeLog:
> 
>   * testsuite/22_locale/codecvt/codecvt_unicode.cc: Rename
> functions.
>   * testsuite/22_locale/codecvt/codecvt_unicode.h: Make more
> generic so it accepts char8_t.
>   * testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc: Rename
> functions.
>   * testsuite/22_locale/codecvt/codecvt_unicode_char8_t.cc: New test.

(Second line indents to the "*", not the text after it.  TL;DR: 
See other entries.)
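
A correctly indented entry would look like, e.g.:

        * testsuite/22_locale/codecvt/codecvt_unicode.cc: Rename
        functions.

i.e. the continuation line is indented to the column of the "*" (one
tab), not to the text after it.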

But above that, please never *modify* existing tests (unless 
they're bad and wrong), add new ones instead.  TL;DR: Bisecting 
when reghunting gets messy.

brgds, H-P

> ---
>  .../22_locale/codecvt/codecvt_unicode.cc  |  16 +-
>  .../22_locale/codecvt/codecvt_unicode.h   | 807 +-
>  .../codecvt/codecvt_unicode_char8_t.cc|  53 ++
>  .../codecvt/codecvt_unicode_wchar_t.cc|   6 +-
>  4 files changed, 484 insertions(+), 398 deletions(-)
>  create mode 100644 
> libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode_char8_t.cc
> 
> diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc 
> b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc
> index df1a2b4cc..eafb53a8c 100644
> --- a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc
> +++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.cc
> @@ -27,38 +27,38 @@ void
>  test_utf8_utf32_codecvts ()
>  {
>using codecvt_c32 = codecvt;
> -  auto loc_c = locale::classic ();
> +  auto _c = locale::classic ();
>VERIFY (has_facet (loc_c));
>  
>auto  = use_facet (loc_c);
> -  test_utf8_utf32_codecvts (cvt);
> +  test_utf8_utf32_cvt (cvt);
>  
>codecvt_utf8 cvt2;
> -  test_utf8_utf32_codecvts (cvt2);
> +  test_utf8_utf32_cvt (cvt2);
>  }
>  
>  void
>  test_utf8_utf16_codecvts ()
>  {
>using codecvt_c16 = codecvt;
> -  auto loc_c = locale::classic ();
> +  auto _c = locale::classic ();
>VERIFY (has_facet (loc_c));
>  
>auto  = use_facet (loc_c);
> -  test_utf8_utf16_cvts (cvt);
> +  test_utf8_utf16_cvt (cvt);
>  
>codecvt_utf8_utf16 cvt2;
> -  test_utf8_utf16_cvts (cvt2);
> +  test_utf8_utf16_cvt (cvt2);
>  
>codecvt_utf8_utf16 cvt3;
> -  test_utf8_utf16_cvts (cvt3);
> +  test_utf8_utf16_cvt (cvt3);
>  }
>  
>  void
>  test_utf8_ucs2_codecvts ()
>  {
>codecvt_utf8 cvt;
> -  test_utf8_ucs2_cvts (cvt);
> +  test_utf8_ucs2_cvt (cvt);
>  }
>  
>  int
> diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h 
> b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h
> index fbdc7a35b..690c07215 100644
> --- a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h
> +++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h
> @@ -42,33 +42,33 @@ auto constexpr array_size (const T (&)[N]) -> size_t
>return N;
>  }
>  
> -template 
> +template 
>  void
> -utf8_to_utf32_in_ok (const std::codecvt )
> +utf8_to_utf32_in_ok (const std::codecvt )
>  {
>using namespace std;
>// UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP
> -  const char in[] = "b?\u\U0010";
> -  const char32_t exp_literal[] = U"b?\u\U0010";
> -  CharT exp[array_size (exp_literal)] = {};
> -  std::copy (begin (exp_literal), end (exp_literal), begin (exp));
> -
> -  static_assert (array_size (in) == 11, "");
> -  static_assert (array_size (exp_literal) == 5, "");
> -  static_assert (array_size (exp) == 5, "");
> -  VERIFY (char_traits::length (in) == 10);
> -  VERIFY (char_traits::length (exp_literal) == 4);
> -  VERIFY (char_traits::length (exp) == 4);
> +  const unsigned char input[] = "b?\u\U0010";
> +  const char32_t expected[] = U"b?\u\U0010";
> +  static_assert (array_size (input) == 11, "");
> +  static_assert (array_size (expected) == 5, "");
> +
> +  ExternT in[array_size (input)];
> +  InternT exp[array_size (expected)];
> +  copy (begin (input), end (input), begin (in));
> +  copy (begin (expected), end (expected), begin (exp));
> +  VERIFY (char_traits::length (in) == 10);
> +  VERIFY (char_traits::length (exp) == 4);
>  
>test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {3, 2}, {6, 3}, {10, 4}};
>for (auto t : offsets)
>  {
> -  CharT out[array_size (exp) - 1] = {};
> +  InternT out[array_size (exp) - 1] = {};
>VERIFY (t.in_size <= array_size (in));
>VERIFY (t.out_size <= array_size (out));
>auto state = mbstate_t{};
> -  auto in_next = (const char *) nullptr;
> -  auto out_next = (CharT *) nullptr;
> +  auto in_next = (const ExternT *) nullptr;
> +  auto out_next = (InternT *) nullptr;
>auto res = codecvt_base::result ();
>  
>res = cvt.in (state, in, in + t.in_size, in_next, out, out + 
> t.out_size,
> @@ -76,19 +76,19 @@ utf8_to_utf32_in_ok (const std::codecvt mbstate_t> )
>VERIFY (res == 

Re: [PATCH] libstdc++: Add missing free functions for atomic_flag [PR103934]

2023-02-10 Thread Jonathan Wakely via Gcc-patches
On Fri, 10 Feb 2023 at 18:25, Thomas Rodgers  wrote:
>
> This patch did not get committed in a timely manner after it was OK'd. In 
> revisiting the patch some issues were found that have led me to resubmit for 
> review -
>
> Specifically -
>
> The original commit to add C++20 atomic_flag::test did not include the free 
> functions for atomic_flag_test[_explicit]
> The original commit to add C++20 atomic_flag::wait/notify did not include the 
> free functions for atomic_flag_wait/notify[_explicit]
>
> These two commits landed in GCC10 and GCC11 respectively. My original patch 
> included both sets of free functions, but
> that complicates the backporting of these changes to GCC10, GCC11, and GCC12.

I don't think we need them in GCC 10.

> Additionally commit 7c2155 removed const qualification from 
> atomic_flag::notify_one/notify_all but the original version of this
> patch accepts the atomic flag as const.
>
> The original version of this patch did not include test cases for the 
> atomic_flag_test[_explicit] free functions.
>
> I have split the original patch into two patches, one for the atomic_flag_test 
> free functions, and one for the atomic_flag_wait/notify
> free functions.

Thanks.

For [PATCH 1/2] please name the added functions in the changelog entry:

* include/std/atomic (atomic_flag_test): Add.
(atomic_flag_test_explicit): Add.

Similarly for the changelog in [PATCH 2/2], naming the four new
functions added to include/std/atomic.

The indentation is off in [PATCH 2/2] for atomic_flag:

+#if __cpp_lib_atomic_wait
+  inline void
+  atomic_flag_wait(atomic_flag* __a, bool __old) noexcept
+  { __a->wait(__old); }
+

And similarly for the other three added functions.
The function names should start in the same column as the 'inline' and
opening brace of the function body.
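
In other words, the expected layout is (a sketch of the intended
style, not a quote from the corrected patch):

  inline void
  atomic_flag_wait(atomic_flag* __a, bool __old) noexcept
  { __a->wait(__old); }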


Both patches are OK for trunk, gcc-12 and gcc-11 with those changes.




>
>
> On Wed, Feb 2, 2022 at 1:35 PM Jonathan Wakely  wrote:
>>
>> >+  inline void
>> >+  atomic_flag_wait_explicit(const atomic_flag* __a, bool __old,
>> >+   std::memory_order __m) noexcept
>>
>> No need for the std:: qualification, and check the indentation.
>>
>>
>> > libstdc++-v3/ChangeLog:
>> >
>> >PR103934
>>
>> This needs to include the component: PR libstdc++/103934
>>
>> >* include/std/atomic: Add missing free functions.
>>
>> Please name the new functions in the changelog, in the usual format.
>> Just the names is fine, no need for the full signatures with
>> parameters.
>>
>> OK for trunk with those changes.
>>



[v3][PATCH 0/2] Handle component_ref to a structure/union field including FAM for builtin_object_size

2023-02-10 Thread Qing Zhao via Gcc-patches
This is the 3rd version of the patches for PR101832, fixing
builtin_object_size to correctly handle a component_ref to a
structure/union field that includes a flexible array member.

It also includes a documentation update for the GCC extension on embedding
a structure/union with a flexible array member into another structure,
which includes a fix for PR77650.

Compared to the 2nd version of the patch, the major changes are:
1. Only C99 flexible array members are included in this extension; trailing
   [0], [1], and [4] are all excluded.
2. The new bit type_include_flexarray in tree_type_common is printed
   and also streamed in/out.
3. Updated test cases.
4. More clarification in the documentation, plus warnings deprecating the
   case where a structure with a C99 FAM is embedded in the middle of
   another structure.
5. A new warning option -Wgnu-variable-sized-type-not-at-end for
   identifying all such cases (see the example after this list).
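
A minimal example of code the new warning is meant to flag
(hypothetical test, modeled on the mid_flex example and the warning
text in these patches):

  struct flex { int length; char data[]; };

  /* flex_data is a variable-sized type not at the end of the
     enclosing struct: warns with -Wgnu-variable-sized-type-not-at-end.  */
  struct mid_flex { int m; struct flex flex_data; int n; };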

bootstrapped and regression tested on aarch64 and x86.

Okay for commit?

thanks.

Qing



[v3][PATCH 2/2] Update documentation to clarify a GCC extension (PR77650)

2023-02-10 Thread Qing Zhao via Gcc-patches
on a structure with a C99 flexible array member being nested in another structure.

This also fixes PR77650.

" A GCC extension accepts a structure containing an ISO C99 "flexible array
member", or a union containing such a structure (possibly recursively),
as a member of a structure.

 There are three situations:

   * The structure with a C99 flexible array member is the last field of
 another structure, for example:

  struct flex  { int length; char data[]; };

  struct out_flex { int m; struct flex flex_data; };

 In the above, 'flex_data.data[]' is considered a flexible array
 too.

   * The structure with a C99 flexible array member is a field of
 another union, for example:

  struct flex1  { int length1; char data1[]; }
  struct flex2  { int length2; char data2[]; }

  union out_flex { struct flex1 flex_data1; struct flex2 flex_data2; }

 In the above, 'flex_data1.data1[]' and 'flex_data2.data2[]' are
 considered flexible arrays too.

   * The structure with a C99 flexible array member is the middle field
 of another structure, for example:

  struct flex  { int length; char data[]; };

  struct mid_flex { int m; struct flex flex_data; int n; };

 In the above, 'flex_data.data[]' is allowed to be extended flexibly
 into the padding.  E.g., up to 4 elements.

 However, relying on space in struct padding is bad programming
 practice: compilers do not handle such an extension consistently, and
 any code relying on this behavior should be modified to ensure that
 flexible array members only end up at the ends of structures.

 Please use the warning option '-Wgnu-variable-sized-type-not-at-end' to
 identify all such cases in the source code and modify them.  This
 extension will be deprecated in GCC in the next release. "

gcc/c-family/ChangeLog:

PR c/77650
* c.opt: New option -Wgnu-variable-sized-type-not-at-end.

gcc/c/ChangeLog:

PR c/77650
* c-decl.cc (finish_struct): Issue warnings for new option.

gcc/ChangeLog:

PR c/77650
* doc/extend.texi: Document GCC extension on a structure containing
a flexible array member to be a member of another structure.

gcc/testsuite/ChangeLog:

PR c/77650
* gcc.dg/variable-sized-type-flex-array.c: New test.
---
 gcc/c-family/c.opt|  5 ++
 gcc/c/c-decl.cc   |  7 +++
 gcc/doc/extend.texi   | 58 ++-
 .../gcc.dg/variable-sized-type-flex-array.c   | 31 ++
 4 files changed, 100 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/variable-sized-type-flex-array.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index c0fea56a8f5..fd720538800 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -737,6 +737,11 @@ Wformat-truncation=
 C ObjC C++ LTO ObjC++ Joined RejectNegative UInteger Var(warn_format_trunc) 
Warning LangEnabledBy(C ObjC C++ LTO ObjC++,Wformat=, warn_format >= 1, 0) 
IntegerRange(0, 2)
 Warn about calls to snprintf and similar functions that truncate output.
 
+Wgnu-variable-sized-type-not-at-end
+C C++ Var(warn_variable_sized_type_not_at_end) Warning
+Warn about structures or unions with C99 flexible array members that are not
+at the end of a structure.
+
 Wif-not-aligned
 C ObjC C++ ObjC++ Var(warn_if_not_aligned) Init(1) Warning
 Warn when the field in a struct is not aligned.
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 741a37560b0..041df4355da 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9289,6 +9289,13 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
   && is_last_field)
TYPE_INCLUDE_FLEXARRAY (t) = true;
 
+  if (warn_variable_sized_type_not_at_end
+ && !is_last_field
+ && TYPE_INCLUDE_FLEXARRAY (TREE_TYPE (x)))
+   warning_at (DECL_SOURCE_LOCATION (x),
+   OPT_Wgnu_variable_sized_type_not_at_end,
+   "variable sized type not at the end of a struct");
+
   if (DECL_NAME (x)
  || RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
saw_named_field = true;
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 5a026c4b48c..737228b35ac 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1748,7 +1748,63 @@ Flexible array members may only appear as the last 
member of a
 A structure containing a flexible array member, or a union containing
 such a structure (possibly recursively), may not be a member of a
 structure or an element of an array.  (However, these uses are
-permitted by GCC as extensions.)
+permitted by GCC as extensions, see details below.)
+@end itemize
+
+A GCC extension accepts a structure containing an ISO C99 @dfn{flexible array
+member}, or a union containing such a structure (possibly recursively)
+to be a member of a structure.
+
+There are three situations:
+
+@itemize @bullet

[v3][PATCH 1/2] Handle component_ref to a structure/union field including C99 FAM [PR101832]

2023-02-10 Thread Qing Zhao via Gcc-patches
A GCC extension accepts the case where a struct with a C99 flexible array
member is embedded into another struct or union (possibly recursively).
__builtin_object_size should treat such a struct as having a flexible size.
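
For illustration, the kind of case this changes (a sketch; (size_t)-1
below simply denotes "unknown size", and the exact values are assumed
behavior, not quoted from the patch):

  struct flex { int len; char data[]; };
  struct out_flex { int m; struct flex flex_data; };

  /* With TYPE_INCLUDE_FLEXARRAY set on struct out_flex, the trailing
     flex_data.data is treated as flexible, so for a struct out_flex *p
     allocated with trailing space,
     __builtin_object_size (p->flex_data.data, 1) is expected to be
     (size_t)-1 rather than 0.  */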

gcc/c/ChangeLog:

PR tree-optimization/101832
* c-decl.cc (finish_struct): Set TYPE_INCLUDE_FLEXARRAY for
struct/union type.

gcc/cp/ChangeLog:

PR tree-optimization/101832
* module.cc (trees_out::core_bools): Stream out new bit
type_include_flexarray.
(trees_in::core_bools): Stream in new bit type_include_flexarray.

gcc/ChangeLog:

PR tree-optimization/101832
* print-tree.cc (print_node): Print new bit type_include_flexarray.
* tree-core.h (struct tree_type_common): New bit
type_include_flexarray.
* tree-object-size.cc (addr_object_size): Handle structure/union type
when it has flexible size.
* tree-streamer-in.cc (unpack_ts_type_common_value_fields): Stream
in new bit type_include_flexarray.
* tree-streamer-out.cc (pack_ts_type_common_value_fields): Stream
out new bit type_include_flexarray.
* tree.h (TYPE_INCLUDE_FLEXARRAY): New macro
TYPE_INCLUDE_FLEXARRAY.

gcc/testsuite/ChangeLog:

PR tree-optimization/101832
* gcc.dg/builtin-object-size-pr101832.c: New test.
---
 gcc/c/c-decl.cc   |  12 ++
 gcc/cp/module.cc  |   2 +
 gcc/print-tree.cc |   5 +
 .../gcc.dg/builtin-object-size-pr101832.c | 134 ++
 gcc/tree-core.h   |   4 +-
 gcc/tree-object-size.cc   |  79 +++
 gcc/tree-streamer-in.cc   |   1 +
 gcc/tree-streamer-out.cc  |   1 +
 gcc/tree.h|   6 +
 9 files changed, 215 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 20e7d1855bf..741a37560b0 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9277,6 +9277,18 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
   /* Set DECL_NOT_FLEXARRAY flag for FIELD_DECL x.  */
   DECL_NOT_FLEXARRAY (x) = !is_flexible_array_member_p (is_last_field, x);
 
+  /* Set TYPE_INCLUDE_FLEXARRAY for the context of x, t
+   * when x is an array.  */
+  if (TREE_CODE (TREE_TYPE (x)) == ARRAY_TYPE)
+   TYPE_INCLUDE_FLEXARRAY (t) = flexible_array_member_type_p (TREE_TYPE 
(x)) ;
+  /* Recursively set TYPE_INCLUDE_FLEXARRAY for the context of x, t
+when x is the last field.  */
+  else if ((TREE_CODE (TREE_TYPE (x)) == RECORD_TYPE
+   || TREE_CODE (TREE_TYPE (x)) == UNION_TYPE)
+  && TYPE_INCLUDE_FLEXARRAY (TREE_TYPE (x))
+  && is_last_field)
+   TYPE_INCLUDE_FLEXARRAY (t) = true;
+
   if (DECL_NAME (x)
  || RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
saw_named_field = true;
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ac2fe66b080..c750361b704 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -5371,6 +5371,7 @@ trees_out::core_bools (tree t)
   WB (t->type_common.lang_flag_5);
   WB (t->type_common.lang_flag_6);
   WB (t->type_common.typeless_storage);
+  WB (t->type_common.type_include_flexarray);
 }
 
   if (CODE_CONTAINS_STRUCT (code, TS_DECL_COMMON))
@@ -5551,6 +5552,7 @@ trees_in::core_bools (tree t)
   RB (t->type_common.lang_flag_5);
   RB (t->type_common.lang_flag_6);
   RB (t->type_common.typeless_storage);
+  RB (t->type_common.type_include_flexarray);
 }
 
   if (CODE_CONTAINS_STRUCT (code, TS_DECL_COMMON))
diff --git a/gcc/print-tree.cc b/gcc/print-tree.cc
index 1f3afcbbc86..efacdb7686f 100644
--- a/gcc/print-tree.cc
+++ b/gcc/print-tree.cc
@@ -631,6 +631,11 @@ print_node (FILE *file, const char *prefix, tree node, int 
indent,
  && TYPE_CXX_ODR_P (node))
fputs (" cxx-odr-p", file);
 
+  if ((code == RECORD_TYPE
+  || code == UNION_TYPE)
+ && TYPE_INCLUDE_FLEXARRAY (node))
+   fputs (" include-flexarray", file);
+
   /* The transparent-union flag is used for different things in
 different nodes.  */
   if ((code == UNION_TYPE || code == RECORD_TYPE)
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c 
b/gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c
new file mode 100644
index 000..60078e11634
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c
@@ -0,0 +1,134 @@
+/* PR 101832: 
+   GCC extension accepts the case when a struct with a C99 flexible array
+   member is embedded into another struct (possibly recursively).
+   __builtin_object_size will treat such struct as flexible size.
+   However, when a structure with non-C99 flexible array member, i.e, trailing
+   [0], [1], or [4], 

Re: [RFC PATCH v1 01/10] docs: Document a canonical RTL for a conditional-zero insns

2023-02-10 Thread Andrew Pinski via Gcc-patches
On Fri, Feb 10, 2023 at 2:43 PM Philipp Tomsich
 wrote:
>
> On RISC-V, conditional-zero (i.e., move a register value or zero to a
> destination register) instructions are part of the Zicond extension.
> To support architectures that have similar constructs, we define a
> canonical RTL representation that can be used in if-conversion.

This seems like a worse canonical form than, say:
(define_insn ""
  [(set (match_operand:m 0 ...)
  (if_then_else (eq_or_ne (match_operand:m 1 ...) (const_int 0))
 (match_operand:m 2 ...)
 (const_int 0)
   ))]
  "..."
  "...")
(define_insn ""
  [(set (match_operand:m 0 ...)
  (if_then_else (eq_or_ne (match_operand:m 1 ...) (const_int 0))
 (const_int 0)
 (match_operand:m 2 ...)
   ))]
  "..."
  "...")

Why can't you use the above form instead? This is the standard way of
expressing conditional moves of 0.
This even matches the form aarch64 already uses:
(define_insn "*cmov_insn"
  [(set (match_operand:ALLI 0 "register_operand" "=r,r,r,r,r,r,r")
(if_then_else:ALLI
 (match_operator 1 "aarch64_comparison_operator"
  [(match_operand 2 "cc_register" "") (const_int 0)])
 (match_operand:ALLI 3 "aarch64_reg_zero_or_m1_or_1"
"rZ,rZ,UsM,rZ,Ui1,UsM,Ui1")
 (match_operand:ALLI 4 "aarch64_reg_zero_or_m1_or_1"
"rZ,UsM,rZ,Ui1,rZ,UsM,Ui1")))]
  "!((operands[3] == const1_rtx && operands[4] == constm1_rtx)
 || (operands[3] == constm1_rtx && operands[4] == const1_rtx))"
  ;; Final two alternatives should be unreachable, but included for completeness
  "..."
  [(set_attr "type" "csel, csel, csel, csel, csel, mov_imm, mov_imm")]
)

(Which is more complex as it can handle even more than just the simple
case you provide).

Thanks,
Andrew Pinski

> +(define_insn ""
> +  [(set (match_operand:@var{m} 0 @dots{})
> +(and:@var{m}
> +  (neg:@var{m} (@var{eq_or_ne} (match_operand:@var{m} 1 @dots{})
> +   (const_int 0)))
> +  (match_operand:@var{m} 2 @dots{})))]
> +  "@dots{}"
> +  "@dots{}")

>
> Signed-off-by: Philipp Tomsich 
> ---
>
>  gcc/doc/md.texi | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 7235d34c4b3..579462ea67f 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -8347,6 +8347,23 @@ operand of @code{mult} is also a shift, then that is 
> extended also.
>  This transformation is only applied when it can be proven that the
>  original operation had sufficient precision to prevent overflow.
>
> +@cindex @code{conditional-zero}, canonicalization of
> +@item
> +A machine that has an instruction that performs a conditional-zero
> +operation (i.e., an instruction that moves a register value or puts 0
> +into the destination register) should specify the pattern for that
> +instruction as:
> +@smallexample
> +(define_insn ""
> +  [(set (match_operand:@var{m} 0 @dots{})
> +(and:@var{m}
> +  (neg:@var{m} (@var{eq_or_ne} (match_operand:@var{m} 1 @dots{})
> +   (const_int 0)))
> +  (match_operand:@var{m} 2 @dots{})))]
> +  "@dots{}"
> +  "@dots{}")
> +@end smallexample
> +
>  @end itemize
>
>  Further canonicalization rules are defined in the function
> --
> 2.34.1
>


[pushed] analyzer: don't warn for deref-before-check for checks in macros [PR108745]

2023-02-10 Thread David Malcolm via Gcc-patches
Integration testing shows this patch fixes all 9 known false positives
from -Wanalyzer-deref-before-check within ImageMagick-7.1.0-57, and
eliminates 34 further as-yet unassessed such diagnostics, without
eliminating the 1 known true positive.

This improves the rate of true positives for the warning from
1.56% to 4.76% of the total:

-Wanalyzer-deref-before-check: 1.56% -> 4.76% (GOOD: 1, BAD: 63 -> 20)
  TRUE:   1
  FALSE: 15 ->  6  (-9)
    ImageMagick-7.1.0-57:  9 ->  0  (-9)
  TODO:  48 -> 14  (-34)
    ImageMagick-7.1.0-57: 21 ->  1  (-20)
    qemu-7.2.0:           25 -> 11  (-14)

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-5811-gaa601e30758581.

gcc/analyzer/ChangeLog:
PR analyzer/108745
* sm-malloc.cc (deref_before_check::emit): Reject the warning if
the check occurs within a macro definition.

gcc/testsuite/ChangeLog:
PR analyzer/108745
* gcc.dg/analyzer/deref-before-check-macro-pr108745.c: New test.
* gcc.dg/analyzer/deref-before-check-macro.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/sm-malloc.cc | 37 +
 .../deref-before-check-macro-pr108745.c   | 54 +++
 .../analyzer/deref-before-check-macro.c   | 25 +
 3 files changed, 116 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.dg/analyzer/deref-before-check-macro-pr108745.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/deref-before-check-macro.c

diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
index 9aee810f818..c24fe737481 100644
--- a/gcc/analyzer/sm-malloc.cc
+++ b/gcc/analyzer/sm-malloc.cc
@@ -1519,6 +1519,43 @@ public:
!= m_check_enode->get_point ().get_call_string ())
   return false;
 
+/* Reject the warning if the check occurs within a macro definition.
+   This avoids false positives for such code as:
+
+   #define throw_error \
+  do { \
+if (p) \
+  cleanup (p); \
+return;\
+  } while (0)
+
+   if (p->idx >= n)
+ throw_error ();
+
+   where the usage of "throw_error" implicitly adds a check
+   on 'p'.
+
+   We do warn when the check is in a macro expansion if we can get
+   at the location of the condition and it isn't part of the
+   definition, so that we warn for checks such as:
+  if (words[0][0] == '@')
+return;
+  g_assert(words[0] != NULL); <--- here
+   Unfortunately we don't have locations for individual gimple
+   arguments, so in:
+  g_assert (ptr);
+   we merely have a gimple_cond
+  if (p_2(D) == 0B)
+   with no way of getting at the location of the condition separately
+   from that of the gimple_cond (where the "if" is within the macro
+   definition).  We reject the warning for such cases.
+
+   We do warn when the *deref* occurs in a macro, since this can be
+   a source of real bugs; see e.g. PR 77425.  */
+location_t check_loc = m_check_enode->get_point ().get_location ();
+if (linemap_location_from_macro_definition_p (line_table, check_loc))
+  return false;
+
 /* Reject the warning if the deref's BB doesn't dominate that
of the check, so that we don't warn e.g. for shared cleanup
code that checks a pointer for NULL, when that code is sometimes
diff --git a/gcc/testsuite/gcc.dg/analyzer/deref-before-check-macro-pr108745.c 
b/gcc/testsuite/gcc.dg/analyzer/deref-before-check-macro-pr108745.c
new file mode 100644
index 000..92f5a02645d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/deref-before-check-macro-pr108745.c
@@ -0,0 +1,54 @@
+/* Reduced from ImageMagick-7.1.0-57.  */
+
+#define NULL ((void *)0)
+
+typedef __builtin_va_list va_list;
+typedef __SIZE_TYPE__ size_t;
+
+typedef struct _ExceptionInfo ExceptionInfo;
+
+void
+ThrowMagickException(ExceptionInfo*,
+const char*,
+const char*,
+...) __attribute__((__format__(__printf__, 3, 4)));
+
+typedef struct _Image
+{
+  /* [...snip...] */
+  size_t columns, rows, depth, colors;
+  /* [...snip...] */
+} Image;
+
+typedef struct _ImageInfo
+{
+  /* [...snip...] */
+  char filename[4096];
+  /* [...snip...] */
+} ImageInfo;
+
+extern Image *AcquireImage(const ImageInfo*, ExceptionInfo*);
+extern void CloseBlob(Image*);
+extern Image *DestroyImageList(Image*);
+
+#define ThrowReaderException(tag) \
+{ \
+  (void) ThrowMagickException(exception, tag, \
+"`%s'",image_info->filename); \
+  if ((image) != (Image *) NULL) \
+{ \
+  (void) CloseBlob(image); \
+  image=DestroyImageList(image); \
+} \
+  return((Image *) NULL); \
+}
+
+Image*
+ReadMAPImage(const ImageInfo* image_info, ExceptionInfo* exception)
+{
+  Image* image;
+  image = AcquireImage(image_info, exception);
+  if ((image->columns == 0) || (image->rows == 0))
+

Re: [RFC PATCH v1 08/10] ifcvt: add if-conversion to conditional-zero instructions

2023-02-10 Thread Andrew Pinski via Gcc-patches
On Fri, Feb 10, 2023 at 2:47 PM Philipp Tomsich
 wrote:
>
> Some architectures, as is the case on RISC-V with the proposed
> ZiCondOps and the vendor-defined XVentanaCondOps, define a
> conditional-zero instruction that is equivalent to:
>  - the positive form:  rd = (rc != 0) ? rs : 0
>  - the negated form:   rd = (rc == 0) ? rs : 0
>
> While noce_try_store_flag_mask will somewhat work for this case, it
> will generate a number of atomic RTX that will misdirect the cost
> calculation and may be too long (i.e., 4 RTX and more) to successfully
> merge at combine-time.

Can you expand on this? Especially when there are patterns that use
(if_then_else) already.

>
> Instead, we add two new transforms that attempt to build up what we
> define as the canonical form of a conditional-zero expression:
>
>   (set (match_operand 0 "register_operand" "=r")
>(and (neg (eq_or_ne (match_operand 1 "register_operand" "r")
>(const_int 0)))
> (match_operand 2 "register_operand" "r")))

Again why are you not using:
(set (reg) (if_then_else (eq_ne (reg) (const_int 0)) (reg) (const_int 0)))
form instead of the really bad "canonical" form above?

Thanks,
Andrew Pinski

>
> Architectures that provide a conditional-zero are thus expected to
> define an instruction matching this pattern in their backend.
>
> Based on this, we support the following cases:
>  - noce_try_condzero:
>   a ? a : b
>   a ? b : 0  (and then/else swapped)
>  !a ? b : 0  (and then/else swapped)
>  - noce_try_condzero_arith:
>  conditional-plus, conditional-minus, conditional-and,
>  conditional-or, conditional-xor, conditional-shift,
>  conditional-and
>
> Given that this is hooked into the CE passes, it is less powerful than
> a tree-pass (e.g., it can not transform cases where an extension, such
> as for uint16_t operations is in either the then or else-branch
> together with the arithmetic) but already covers a good array of cases
> and triggers across SPEC CPU 2017.
>
> Adding transformations in a tree pass should come in a future
> improvement.
>
> gcc/ChangeLog:
>
> * ifcvt.cc (noce_emit_insn): Add prototype.
> (noce_emit_condzero): Helper for noce_try_condzero and
> noce_try_condzero_arith transforms.
> (noce_try_condzero): New transform.
> (noce_try_condzero_arith): New transform for conditional
> arithmetic that can be built up by exploiting that the
> conditional-zero instruction will inject 0, which acts
> as the neutral element for operations.
> (noce_process_if_block): Call noce_try_condzero and
> noce_try_condzero_arith.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/xventanacondops-and-01.c: New test.
> * gcc.target/riscv/xventanacondops-and-02.c: New test.
> * gcc.target/riscv/xventanacondops-eq-01.c: New test.
> * gcc.target/riscv/xventanacondops-eq-02.c: New test.
> * gcc.target/riscv/xventanacondops-lt-01.c: New test.
> * gcc.target/riscv/xventanacondops-ne-01.c: New test.
> * gcc.target/riscv/xventanacondops-xor-01.c: New test.
>
> Signed-off-by: Philipp Tomsich 
> ---
>
>  gcc/ifcvt.cc  | 216 ++
>  .../gcc.target/riscv/zicond-and-01.c  |  16 ++
>  .../gcc.target/riscv/zicond-and-02.c  |  15 ++
>  gcc/testsuite/gcc.target/riscv/zicond-eq-01.c |  11 +
>  gcc/testsuite/gcc.target/riscv/zicond-eq-02.c |  14 ++
>  gcc/testsuite/gcc.target/riscv/zicond-lt-01.c |  16 ++
>  gcc/testsuite/gcc.target/riscv/zicond-ne-01.c |  10 +
>  .../gcc.target/riscv/zicond-xor-01.c  |  14 ++
>  8 files changed, 312 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-and-01.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-and-02.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-eq-01.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-eq-02.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-lt-01.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-ne-01.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-xor-01.c
>
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index 008796838f7..7ac3bd8f18e 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -97,6 +97,7 @@ static int find_if_case_2 (basic_block, edge, edge);
>  static int dead_or_predicable (basic_block, basic_block, basic_block,
>edge, int);
>  static void noce_emit_move_insn (rtx, rtx);
> +static rtx_insn *noce_emit_insn (rtx);
>  static rtx_insn *block_has_only_trap (basic_block);
>  static void need_cmov_or_rewire (basic_block, hash_set *,
>  hash_map *);
> @@ -787,6 +788,9 @@ static rtx noce_get_alt_condition (struct noce_if_info *, 
> rtx, rtx_insn **);
>  static int noce_try_minmax (struct noce_if_info *);
>  static int noce_try_abs (struct noce_if_info *);
>  static 

[RFC PATCH v1 10/10] RISC-V: Support XVentanaCondOps extension

2023-02-10 Thread Philipp Tomsich
The vendor-defined XVentanaCondOps extension adds two instructions
with semantics identical to Zicond.

This plugs the 2 new instructions in using the canonical RTX, which
also matches the combiner-input for noce_try_store_flag_mask and
noce_try_store_flag, defined for conditional-zero.

For documentation on XVentanaCondOps, refer to:
  
https://github.com/ventanamicro/ventana-custom-extensions/releases/download/v1.0.1/ventana-custom-extensions-v1.0.1.pdf
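
A sketch of what the core vt.maskc pattern in xventanacondops.md
presumably looks like, mirroring the czero pattern from zicond.md
(the exact pattern name and details here are my assumptions, not a
quote from the patch):

  ;; rd = (rc != 0) ? rs : 0  (vt.maskc; vt.maskcn is the eq analogue)
  (define_insn "*vt.maskc"
    [(set (match_operand:DI 0 "register_operand" "=r")
          (and:DI (neg:DI (ne:DI (match_operand:DI 1 "register_operand" "r")
                                 (const_int 0)))
                  (match_operand:DI 2 "register_operand" "r")))]
    "TARGET_XVENTANACONDOPS"
    "vt.maskc\t%0,%2,%1")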

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_rtx_costs): Recognize idiom
for conditional zero as a single instruction for TARGET_XVENTANACONDOPS.
* config/riscv/riscv.md: Include xventanacondops.md.
* config/riscv/zicond.md: Enable splitters for TARGET_XVENTANACONDOPS.
* config/riscv/xventanacondops.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xventanacondops-and-01.c: New test.
* gcc.target/riscv/xventanacondops-and-02.c: New test.
* gcc.target/riscv/xventanacondops-eq-01.c: New test.
* gcc.target/riscv/xventanacondops-eq-02.c: New test.
* gcc.target/riscv/xventanacondops-ifconv-imm.c: New test.
* gcc.target/riscv/xventanacondops-le-01.c: New test.
* gcc.target/riscv/xventanacondops-le-02.c: New test.
* gcc.target/riscv/xventanacondops-lt-01.c: New test.
* gcc.target/riscv/xventanacondops-lt-03.c: New test.
* gcc.target/riscv/xventanacondops-ne-01.c: New test.
* gcc.target/riscv/xventanacondops-ne-03.c: New test.
* gcc.target/riscv/xventanacondops-ne-04.c: New test.
* gcc.target/riscv/xventanacondops-xor-01.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/riscv.cc |  4 +--
 gcc/config/riscv/riscv.md |  5 ++--
 gcc/config/riscv/xventanacondops.md   | 29 +++
 gcc/config/riscv/zicond.md| 15 +-
 .../gcc.target/riscv/xventanacondops-and-01.c | 16 ++
 .../gcc.target/riscv/xventanacondops-and-02.c | 15 ++
 .../gcc.target/riscv/xventanacondops-eq-01.c  | 11 +++
 .../gcc.target/riscv/xventanacondops-eq-02.c  | 14 +
 .../riscv/xventanacondops-ifconv-imm.c| 19 
 .../gcc.target/riscv/xventanacondops-le-01.c  | 16 ++
 .../gcc.target/riscv/xventanacondops-le-02.c  | 11 +++
 .../gcc.target/riscv/xventanacondops-lt-01.c  | 16 ++
 .../gcc.target/riscv/xventanacondops-lt-03.c  | 16 ++
 .../gcc.target/riscv/xventanacondops-ne-01.c  | 10 +++
 .../gcc.target/riscv/xventanacondops-ne-03.c  | 13 +
 .../gcc.target/riscv/xventanacondops-ne-04.c  | 13 +
 .../gcc.target/riscv/xventanacondops-xor-01.c | 14 +
 17 files changed, 226 insertions(+), 11 deletions(-)
 create mode 100644 gcc/config/riscv/xventanacondops.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-and-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-and-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-eq-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-eq-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ifconv-imm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-le-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-le-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-lt-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-lt-03.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-03.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-04.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-xor-01.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7e69a652fc5..94ac8f350e6 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2331,8 +2331,8 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
   return false;
 
 case AND:
-  /* czero.eqz/nez */
-  if ((TARGET_ZICOND)
+  /* czero.eqz/nez or vt.maskc/vt.maskcn */
+  if ((TARGET_ZICOND || TARGET_XVENTANACONDOPS)
  && mode == word_mode
  && GET_CODE (XEXP (x, 0)) == NEG)
{
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6f255a80379..e6b73c316cb 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2673,7 +2673,7 @@ (define_split
(match_operator:GPR 1 "anyle_operator"
   [(match_operand:X 2 "register_operand")
(match_operand:X 3 "register_operand")]))]
-  "TARGET_ZICOND"
+  "TARGET_ZICOND || TARGET_XVENTANACONDOPS"
   [(set (match_dup 0) (match_dup 4))
(set (match_dup 0) (eq:GPR (match_dup 0) (const_int 0)))]
  {
@@ -2707,7 +2707,7 @@ (define_split
(match_operator:GPR 1 

[RFC PATCH v1 07/10] RISC-V: Recognize bexti in negated if-conversion

2023-02-10 Thread Philipp Tomsich
While the positive case "if ((bits >> SHAMT) & 1)" for SHAMT 0..10 can
trigger conversion into efficient branchless sequences
  - with Zbs (bexti + neg + and)
  - with Zicond (andi + czero.nez)
the inverted/negated case results in
  andi a5,a0,1024
  seqz a5,a5
  neg a5,a5
  and a5,a5,a1
due to how the sequence presents to the combine pass.

This adds an additional splitter to reassociate the polarity-reversed
case into bexti + addi, if Zbs is present.
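
With the new split, the negated sequence above should instead become
something like (same bit 10 as in the andi/1024 example; a sketch,
not actual compiler output):

  bexti a5,a0,10
  addi  a5,a5,-1
  and   a5,a5,a1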

gcc/ChangeLog:

* config/riscv/zicond.md: Add split to reassociate
"andi + seqz + neg" into "bexti + addi".

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/zicond.md | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
index 15fdaa539f1..0aad61c7009 100644
--- a/gcc/config/riscv/zicond.md
+++ b/gcc/config/riscv/zicond.md
@@ -143,3 +143,13 @@ (define_split
 {
   operands[2] = GEN_INT(1 << UINTVAL(operands[2]));
 })
+
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (neg:X (eq:X (zero_extract:X (match_operand:X 1 "register_operand")
+(const_int 1)
+(match_operand 2 "immediate_operand"))
(const_int 0))))]
+  "!TARGET_ZICOND && TARGET_ZBS"
+  [(set (match_dup 0) (zero_extract:X (match_dup 1) (const_int 1) (match_dup 
2)))
+   (set (match_dup 0) (plus:X (match_dup 0) (const_int -1)))])
-- 
2.34.1



[RFC PATCH v1 08/10] ifcvt: add if-conversion to conditional-zero instructions

2023-02-10 Thread Philipp Tomsich
Some architectures, as is the case on RISC-V with the proposed
ZiCondOps and the vendor-defined XVentanaCondOps, define a
conditional-zero instruction that is equivalent to:
 - the positive form:  rd = (rc != 0) ? rs : 0
 - the negated form:   rd = (rc == 0) ? rs : 0

While noce_try_store_flag_mask will somewhat work for this case, it
will generate a number of atomic RTX that will misdirect the cost
calculation and may be too long (i.e., 4 RTX and more) to successfully
merge at combine-time.

Instead, we add two new transforms that attempt to build up what we
define as the canonical form of a conditional-zero expression:

  (set (match_operand 0 "register_operand" "=r")
   (and (neg (eq_or_ne (match_operand 1 "register_operand" "r")
   (const_int 0)))
(match_operand 2 "register_operand" "r")))

Architectures that provide a conditional-zero are thus expected to
define an instruction matching this pattern in their backend.

Based on this, we support the following cases (see the C sketch after this list):
 - noce_try_condzero:
  a ? a : b
  a ? b : 0  (and then/else swapped)
 !a ? b : 0  (and then/else swapped)
 - noce_try_condzero_arith:
 conditional-plus, conditional-minus, conditional-and,
 conditional-or, conditional-xor, conditional-shift,
 conditional-and
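
In C terms, the shapes targeted by noce_try_condzero correspond to
(sketch):

  long cz1 (long a, long b) { return a ? a : b; }
  long cz2 (long a, long b) { return a ? b : 0; }
  long cz3 (long a, long b) { return !a ? b : 0; }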

Given that this is hooked into the CE passes, it is less powerful than
a tree pass (e.g., it cannot transform cases where an extension, such
as for uint16_t operations, is in either the then- or else-branch
together with the arithmetic) but already covers a good array of cases
and triggers across SPEC CPU 2017.

Adding transformations in a tree pass should come in a future
improvement.

gcc/ChangeLog:

* ifcvt.cc (noce_emit_insn): Add prototype.
(noce_emit_condzero): Helper for noce_try_condzero and
noce_try_condzero_arith transforms.
(noce_try_condzero): New transform.
(noce_try_condzero_arith): New transform for conditional
arithmetic that can be built up by exploiting that the
conditional-zero instruction will inject 0, which acts
as the neutral element for operations.
(noce_process_if_block): Call noce_try_condzero and
noce_try_condzero_arith.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xventanacondops-and-01.c: New test.
* gcc.target/riscv/xventanacondops-and-02.c: New test.
* gcc.target/riscv/xventanacondops-eq-01.c: New test.
* gcc.target/riscv/xventanacondops-eq-02.c: New test.
* gcc.target/riscv/xventanacondops-lt-01.c: New test.
* gcc.target/riscv/xventanacondops-ne-01.c: New test.
* gcc.target/riscv/xventanacondops-xor-01.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/ifcvt.cc  | 216 ++
 .../gcc.target/riscv/zicond-and-01.c  |  16 ++
 .../gcc.target/riscv/zicond-and-02.c  |  15 ++
 gcc/testsuite/gcc.target/riscv/zicond-eq-01.c |  11 +
 gcc/testsuite/gcc.target/riscv/zicond-eq-02.c |  14 ++
 gcc/testsuite/gcc.target/riscv/zicond-lt-01.c |  16 ++
 gcc/testsuite/gcc.target/riscv/zicond-ne-01.c |  10 +
 .../gcc.target/riscv/zicond-xor-01.c  |  14 ++
 8 files changed, 312 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-and-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-and-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-eq-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-eq-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-lt-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-ne-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-xor-01.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 008796838f7..7ac3bd8f18e 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -97,6 +97,7 @@ static int find_if_case_2 (basic_block, edge, edge);
 static int dead_or_predicable (basic_block, basic_block, basic_block,
   edge, int);
 static void noce_emit_move_insn (rtx, rtx);
+static rtx_insn *noce_emit_insn (rtx);
 static rtx_insn *block_has_only_trap (basic_block);
 static void need_cmov_or_rewire (basic_block, hash_set *,
 hash_map *);
@@ -787,6 +788,9 @@ static rtx noce_get_alt_condition (struct noce_if_info *, 
rtx, rtx_insn **);
 static int noce_try_minmax (struct noce_if_info *);
 static int noce_try_abs (struct noce_if_info *);
 static int noce_try_sign_mask (struct noce_if_info *);
+static rtx noce_emit_condzero (struct noce_if_info *, rtx, bool = false);
+static int noce_try_condzero (struct noce_if_info *);
+static int noce_try_condzero_arith (struct noce_if_info *);
 
 /* Return the comparison code for reversed condition for IF_INFO,
or UNKNOWN if reversing the condition is not possible.  */
@@ -1664,6 +1668,214 @@ noce_try_addcc (struct noce_if_info *if_info)
   return FALSE;
 }
 
+/* Helper to noce_try_condzero: cond ? a : 0. */
+static rtx

[RFC PATCH v1 06/10] RISC-V: Recognize sign-extract + and cases for czero.eqz/nez

2023-02-10 Thread Philipp Tomsich
Users might use explicit arithmetic operations to create a mask and
then AND it, in a sequence like
cond = (bits >> SHIFT) & 1;
mask = ~(cond - 1);
val &= mask;
which will present as a single-bit sign-extract.

Depending on which combination of XVentanaCondOps and Zbs is
available, this will map to the following sequences:
 - bexti + czero, if both Zbs and XVentanaCondOps are present
 - andi + czero,  if only XVentanaCondOps is available and the
  sign-extract is operating on bits 10:0 (bit 11
  can't be reached, as the immediate is
  sign-extended)
 - slli + srli + and, otherwise.

gcc/ChangeLog:

* config/riscv/zicond.md: Recognize SIGN_EXTRACT of a
single-bit followed by AND for Zicond.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-le-01.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/zicond.md| 45 +++
 gcc/testsuite/gcc.target/riscv/zicond-le-01.c | 16 +++
 2 files changed, 61 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-le-01.c

diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
index 9d1ce067150..15fdaa539f1 100644
--- a/gcc/config/riscv/zicond.md
+++ b/gcc/config/riscv/zicond.md
@@ -98,3 +98,48 @@ (define_split
   operands[6] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == LE ? GT : GTU,
mode, operands[2], operands[3]);
 })
+
+;; Users might use explicit arithmetic operations to create a mask and
+;; then AND it, in a sequence like
+;;cond = (bits >> SHIFT) & 1;
+;;mask = ~(cond - 1);
+;;val &= mask;
+;; which will present as a single-bit sign-extract in the combiner.
+;;
+;; This will give rise to any of the following cases:
+;; - with Zbs and XVentanaCondOps: bexti + vt.maskc
+;; - with XVentanaCondOps (but w/o Zbs):
+;;   - andi + vt.maskc, if the mask is representable in the immediate
+;;  (which requires extra care due to the immediate
+;;   being sign-extended)
+;;   - slli + srli + and
+;; - otherwise: slli + srli + and
+
+;; With Zbs, we have bexti for all possible bits...
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (and:X (sign_extract:X (match_operand:X 1 "register_operand")
+  (const_int 1)
+  (match_operand 2 "immediate_operand"))
+  (match_operand:X 3 "register_operand")))
+   (clobber (match_operand:X 4 "register_operand"))]
+  "TARGET_ZICOND && TARGET_ZBS"
+  [(set (match_dup 4) (zero_extract:X (match_dup 1) (const_int 1) (match_dup 
2)))
+   (set (match_dup 0) (and:X (neg:X (ne:X (match_dup 4) (const_int 0)))
+(match_dup 3)))])
+
+;; ...whereas RV64I only allows us access to bits 0..10 in a single andi.
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (and:X (sign_extract:X (match_operand:X 1 "register_operand")
+  (const_int 1)
+  (match_operand 2 "immediate_operand"))
+  (match_operand:X 3 "register_operand")))
+   (clobber (match_operand:X 4 "register_operand"))]
+  "TARGET_ZICOND && !TARGET_ZBS && (UINTVAL (operands[2]) < 11)"
+  [(set (match_dup 4) (and:X (match_dup 1) (match_dup 2)))
+   (set (match_dup 0) (and:X (neg:X (ne:X (match_dup 4) (const_int 0)))
+(match_dup 3)))]
+{
+  operands[2] = GEN_INT(1 << UINTVAL(operands[2]));
+})
diff --git a/gcc/testsuite/gcc.target/riscv/zicond-le-01.c 
b/gcc/testsuite/gcc.target/riscv/zicond-le-01.c
new file mode 100644
index 000..e5902d1ca5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zicond-le-01.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64 -mbranch-cost=4" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" "-Os" "-Oz"  } } */
+
+long long sink (long long);
+
+long long le1 (long long a, long long b)
+{
+  if (a <= b)
+b = 0;
+
+  return sink(b);
+}
+
+/* { dg-final { scan-assembler-times "sgt\t" 1 } } */
+/* { dg-final { scan-assembler-times "czero.eqz\t" 1 } } */
-- 
2.34.1



[RFC PATCH v1 04/10] RISC-V: Support immediates in Zicond

2023-02-10 Thread Philipp Tomsich
When if-conversion encounters sequences using immediates, the
sequences can't trivially map back onto czero.eqz/czero.nez (even if
beneficial) due to czero.eqz/czero.nez not having immediate forms.

This adds a splitter to rewrite opportunities for Zicond that operate
on an immediate by first putting the immediate into a register to
enable the non-immediate czero.eqz/czero.nez instructions to operate
on the value.

Consider code such as

  long func2 (long a, long c)
  {
if (c)
  a = 2;
else
  a = 5;
return a;
  }

which will be converted to

  func2:
    seqz    a0,a2
    neg     a0,a0
    andi    a0,a0,3
    addi    a0,a0,2
    ret

Following this change, we generate

    li      a0,3
    czero.nez a0,a0,a2
    addi    a0,a0,2
    ret

This commit also introduces a simple unit test for if-conversion with
immediate (literal) values as the sources for simple sets in the THEN
and ELSE blocks. The test checks that the conditional-zero instruction
(czero.eqz/nez) is emitted as part of the resulting branchless
instruction sequence.

gcc/ChangeLog:

* config/riscv/zicond.md: Support immediates for
czero.eqz/czero.nez through a splitter.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-ifconv-imm.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/zicond.md| 20 +++
 .../gcc.target/riscv/zicond-ifconv-imm.c  | 19 ++
 2 files changed, 39 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-ifconv-imm.c

diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
index 278e3a67802..19d0b35585b 100644
--- a/gcc/config/riscv/zicond.md
+++ b/gcc/config/riscv/zicond.md
@@ -28,3 +28,23 @@ (define_insn "*czero."
(match_operand:DI 2 "register_operand" "r")))]
   "TARGET_ZICOND"
   "czero.\t%0,%2,%1")
+
+;; Zicond does not have immediate forms, so we need to do extra work
+;; to support these: if we encounter a vt.maskc/n with an immediate,
+;; we split this into a load-immediate followed by a czero.eqz/nez.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+   (and:DI (neg:DI (match_operator:DI 1 "equality_operator"
+  [(match_operand:DI 2 "register_operand")
+   (const_int 0)]))
+   (match_operand:DI 3 "immediate_operand")))
+   (clobber (match_operand:DI 4 "register_operand"))]
+  "TARGET_ZICOND"
+  [(set (match_dup 4) (match_dup 3))
+   (set (match_dup 0) (and:DI (neg:DI (match_dup 1))
+ (match_dup 4)))]
+{
+  /* Eliminate the clobber/temporary, if it is not needed. */
+  if (!rtx_equal_p (operands[0], operands[2]))
+ operands[4] = operands[0];
+})
diff --git a/gcc/testsuite/gcc.target/riscv/zicond-ifconv-imm.c 
b/gcc/testsuite/gcc.target/riscv/zicond-ifconv-imm.c
new file mode 100644
index 000..f410537a4f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zicond-ifconv-imm.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+
+/* Each function below should emit a czero.nez instruction */
+
+long
+foo0 (long a, long b, long c)
+{
+  if (c)
+a = 0;
+  else
+a = 5;
+  return a;
+}
+
+/* { dg-final { scan-assembler-times "czero.nez\t" 1 } } */
+/* { dg-final { scan-assembler-not "beqz\t" } } */
+/* { dg-final { scan-assembler-not "bnez\t" } } */
-- 
2.34.1



[RFC PATCH v1 05/10] RISC-V: Support noce_try_store_flag_mask as czero.eqz/czero.nez

2023-02-10 Thread Philipp Tomsich
When if-conversion in noce_try_store_flag_mask starts the sequence off
with an order-operator, our patterns for czero.eqz/nez will receive
the result of the order-operator as a register argument; consequently,
they can't know that the result will be either 1 or 0.

To convey this information (and make czero.eqz/nez applicable), we
wrap the result of the order-operator in an eq/ne against (const_int 0).
This commit adds the split pattern to handle these cases.

During if-conversion, if noce_try_store_flag_mask succeeds, we may see
if (cur < next) {
next = 0;
}
transformed into
   27: r82:SI=ltu(r76:DI,r75:DI)
  REG_DEAD r76:DI
   28: r81:SI=r82:SI^0x1
  REG_DEAD r82:SI
   29: r80:DI=zero_extend(r81:SI)
  REG_DEAD r81:SI

This currently escapes the combiner, as RISC-V does not have a pattern
to apply the 'slt' instruction to 'geu' verbs.  By adding a pattern in
this commit, we match such cases.

gcc/ChangeLog:

* config/riscv/predicates.md (anyge_operator): Define.
(anygt_operator): Same.
(anyle_operator): Same.
(anylt_operator): Same.
* config/riscv/riscv.md: Add helpers for ge(u) & le(u).
* config/riscv/zicond.md: Add split to wrap an order-operator
suitably for generating czero.eqz/nez.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-le-02.c: New test.
* gcc.target/riscv/zicond-lt-03.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/predicates.md| 12 +
 gcc/config/riscv/riscv.md | 26 ++
 gcc/config/riscv/zicond.md| 50 +++
 gcc/testsuite/gcc.target/riscv/zicond-le-02.c | 11 
 gcc/testsuite/gcc.target/riscv/zicond-lt-03.c | 16 ++
 5 files changed, 115 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-le-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-lt-03.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 034d088c656..6b6f867824e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -204,6 +204,18 @@ (define_predicate "modular_operator"
 (define_predicate "equality_operator"
   (match_code "eq,ne"))
 
+(define_predicate "anyge_operator"
+  (match_code "ge,geu"))
+
+(define_predicate "anygt_operator"
+  (match_code "gt,gtu"))
+
+(define_predicate "anyle_operator"
+  (match_code "le,leu"))
+
+(define_predicate "anylt_operator"
+  (match_code "lt,ltu"))
+
 (define_predicate "order_operator"
   (match_code "eq,ne,lt,ltu,le,leu,ge,geu,gt,gtu"))
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 7c632bb4d65..6f255a80379 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2668,6 +2668,19 @@ (define_insn "*sge<u>_<X:mode><GPR:mode>"
   [(set_attr "type" "slt")
(set_attr "mode" "")])
 
+(define_split
+  [(set (match_operand:GPR 0 "register_operand")
+   (match_operator:GPR 1 "anyle_operator"
+  [(match_operand:X 2 "register_operand")
+   (match_operand:X 3 "register_operand")]))]
+  "TARGET_ZICOND"
+  [(set (match_dup 0) (match_dup 4))
+   (set (match_dup 0) (eq:GPR (match_dup 0) (const_int 0)))]
+ {
+  operands[4] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == LE ? LT : LTU,
+   mode, operands[3], operands[2]);
+ })
+
 (define_insn "*slt_"
   [(set (match_operand:GPR   0 "register_operand" "= r")
(any_lt:GPR (match_operand:X 1 "register_operand" "  r")
@@ -2689,6 +2702,19 @@ (define_insn "*sle<u>_<X:mode><GPR:mode>"
   [(set_attr "type" "slt")
(set_attr "mode" "")])
 
+(define_split
+  [(set (match_operand:GPR 0 "register_operand")
+   (match_operator:GPR 1 "anyge_operator"
+  [(match_operand:X 2 "register_operand")
+   (match_operand:X 3 "register_operand")]))]
+  "TARGET_ZICOND"
+  [(set (match_dup 0) (match_dup 4))
+   (set (match_dup 0) (eq:GPR (match_dup 0) (const_int 0)))]
+{
+  operands[4] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == GE ? LT : LTU,
+   mode, operands[2], operands[3]);
+})
+
 ;;
 ;;  
 ;;
diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
index 19d0b35585b..9d1ce067150 100644
--- a/gcc/config/riscv/zicond.md
+++ b/gcc/config/riscv/zicond.md
@@ -48,3 +48,53 @@ (define_split
   if (!rtx_equal_p (operands[0], operands[2]))
  operands[4] = operands[0];
 })
+
+;; Make order operators digestible to the czero.eqz/nez logic by
+;; wrapping their result in a comparison against (const_int 0).
+
+;; "a >= b" is "!(a < b)"
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (and:X (neg:X (match_operator:X 1 "anyge_operator"
+[(match_operand:X 2 "register_operand")
+ (match_operand:X 3 "arith_operand")]))
+  (match_operand:X 4 "register_operand")))
+   (clobber (match_operand:X 5 "register_operand"))]
+  "TARGET_ZICOND"
+  [(set 

[RFC PATCH v1 09/10] RISC-V: Recognize xventanacondops extension

2023-02-10 Thread Philipp Tomsich
This adds the xventanacondops extension to the option parsing and as a
default for the ventana-vt1 core.
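
A minimal compile test in the style of the ones added later in this
series might look like this (illustrative sketch; vt.maskc
materializes "c ? a : 0"):

    /* { dg-do compile } */
    /* { dg-options "-march=rv64gc_xventanacondops -mabi=lp64" } */
    long f (long a, long c) { return c ? a : 0; }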

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Recognize
  "xventanacondops" as part of an architecture string.
* config/riscv/riscv-opts.h (MASK_XVENTANACONDOPS): Define.
(TARGET_XVENTANACONDOPS): Define.
* config/riscv/riscv.opt: Add "riscv_xventanacondops".

Signed-off-by: Philipp Tomsich 
---

 gcc/common/config/riscv/riscv-common.cc | 2 ++
 gcc/config/riscv/riscv-opts.h   | 3 +++
 gcc/config/riscv/riscv.opt  | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 999e1926db1..a77a04d68d9 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1250,6 +1250,8 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"svinval", _options::x_riscv_sv_subext, MASK_SVINVAL},
   {"svnapot", _options::x_riscv_sv_subext, MASK_SVNAPOT},
 
+  {"xventanacondops", _options::x_riscv_xventanacondops, 
MASK_XVENTANACONDOPS},
+
   {NULL, NULL, 0}
 };
 
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 61d5212da20..d80e81c6c28 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -191,4 +191,7 @@ enum stack_protector_guard {
? 0 \
: 32 << (__builtin_popcount (riscv_zvl_flags) - 1))
 
+#define MASK_XVENTANACONDOPS (1 << 0)
+#define TARGET_XVENTANACONDOPS ((riscv_xventanacondops & MASK_XVENTANACONDOPS) != 0)
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index e78c99382cd..6ebaad43d0e 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -233,6 +233,9 @@ int riscv_zm_subext
 TargetVariable
 int riscv_sv_subext
 
+TargetVariable
+int riscv_xventanacondops = 0
+
 Enum
 Name(isa_spec_class) Type(enum riscv_isa_spec_class)
 Supported ISA specs (for use with the -misa-spec= option):
-- 
2.34.1



[RFC PATCH v1 00/10] RISC-V: Support the Zicond (conditional-operations) extension

2023-02-10 Thread Philipp Tomsich


The (proposed, but about to be frozen) Zicond extension adds 2
unconditional R-type instructions that can be used to build branchless
sequences that have conditional-arithmetic/bitwise/select semantics
and integrate well with the RISC-V architecture.

See the Zicond specification for details:
  
https://github.com/riscv/riscv-zicond/releases/download/v1.0-draft-20230207/riscv-zicond_1.0-draft-20230207.pdf

The Zicond extension defines a conditional-zero(-or-value)
instruction, which is similar to the following C construct:
  rd = rc ? rs : 0
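
For illustration, both polarities in plain C (each should lower to a
single Zicond instruction; sketch only):

  long f_eqz (long rs, long rc) { return rc ? rs : 0; }  /* czero.eqz */
  long f_nez (long rs, long rc) { return rc ? 0 : rs; }  /* czero.nez */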

This functionality can be tied back into if-conversion and also
matches some typical programming idioms.  This series includes backend
support for Zicond both to handle conditional-zero constructions and
if-conversion.  We also change the previously submitted
XVentanaCondops support to use the Zicond infrastructure.

Tested against SPEC CPU 2017.



Philipp Tomsich (10):
  docs: Document a canonical RTL for a conditional-zero insns
  RISC-V: Recognize Zicond (conditional operations) extension
  RISC-V: Generate czero.eqz/nez on noce_try_store_flag_mask
if-conversion
  RISC-V: Support immediates in Zicond
  RISC-V: Support noce_try_store_flag_mask as czero.eqz/czero.nez
  RISC-V: Recognize sign-extract + and cases for czero.eqz/nez
  RISC-V: Recognize bexti in negated if-conversion
  ifcvt: add if-conversion to conditional-zero instructions
  RISC-V: Recognize xventanacondops extension
  RISC-V: Support XVentanaCondOps extension

 gcc/common/config/riscv/riscv-common.cc   |   5 +
 gcc/config/riscv/predicates.md|  12 +
 gcc/config/riscv/riscv-opts.h |   5 +
 gcc/config/riscv/riscv.cc |  15 ++
 gcc/config/riscv/riscv.md |  28 +++
 gcc/config/riscv/riscv.opt|   3 +
 gcc/config/riscv/xventanacondops.md   |  29 +++
 gcc/config/riscv/zicond.md| 156 +
 gcc/doc/md.texi   |  17 ++
 gcc/ifcvt.cc  | 216 ++
 .../gcc.target/riscv/xventanacondops-and-01.c |  16 ++
 .../gcc.target/riscv/xventanacondops-and-02.c |  15 ++
 .../gcc.target/riscv/xventanacondops-eq-01.c  |  11 +
 .../gcc.target/riscv/xventanacondops-eq-02.c  |  14 ++
 .../riscv/xventanacondops-ifconv-imm.c|  19 ++
 .../gcc.target/riscv/xventanacondops-le-01.c  |  16 ++
 .../gcc.target/riscv/xventanacondops-le-02.c  |  11 +
 .../gcc.target/riscv/xventanacondops-lt-01.c  |  16 ++
 .../gcc.target/riscv/xventanacondops-lt-03.c  |  16 ++
 .../gcc.target/riscv/xventanacondops-ne-01.c  |  10 +
 .../gcc.target/riscv/xventanacondops-ne-03.c  |  13 ++
 .../gcc.target/riscv/xventanacondops-ne-04.c  |  13 ++
 .../gcc.target/riscv/xventanacondops-xor-01.c |  14 ++
 .../gcc.target/riscv/zicond-and-01.c  |  16 ++
 .../gcc.target/riscv/zicond-and-02.c  |  15 ++
 gcc/testsuite/gcc.target/riscv/zicond-eq-01.c |  11 +
 gcc/testsuite/gcc.target/riscv/zicond-eq-02.c |  14 ++
 .../gcc.target/riscv/zicond-ifconv-imm.c  |  19 ++
 gcc/testsuite/gcc.target/riscv/zicond-le-01.c |  16 ++
 gcc/testsuite/gcc.target/riscv/zicond-le-02.c |  11 +
 gcc/testsuite/gcc.target/riscv/zicond-lt-01.c |  16 ++
 gcc/testsuite/gcc.target/riscv/zicond-lt-03.c |  16 ++
 gcc/testsuite/gcc.target/riscv/zicond-ne-01.c |  10 +
 gcc/testsuite/gcc.target/riscv/zicond-ne-03.c |  13 ++
 gcc/testsuite/gcc.target/riscv/zicond-ne-04.c |  13 ++
 .../gcc.target/riscv/zicond-xor-01.c  |  14 ++
 36 files changed, 854 insertions(+)
 create mode 100644 gcc/config/riscv/xventanacondops.md
 create mode 100644 gcc/config/riscv/zicond.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-and-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-and-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-eq-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-eq-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ifconv-imm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-le-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-le-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-lt-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-lt-03.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-03.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-04.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-xor-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-and-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-and-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-eq-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-eq-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-ifconv-imm.c
 

[RFC PATCH v1 03/10] RISC-V: Generate czero.eqz/nez on noce_try_store_flag_mask if-conversion

2023-02-10 Thread Philipp Tomsich
Adds a pattern to map the output of noce_try_store_flag_mask
if-conversion in the combiner onto czero.eqz/nez; the input patterns
supported are similar to the following:
  (set (reg/v/f:DI 75 [  ])
   (and:DI (neg:DI (ne:DI (reg:DI 82)
   (const_int 0 [0])))
   (reg/v/f:DI 75 [  ])))

To ensure that the combine-pass doesn't get confused about
profitability, we recognize the idiom as requiring a single
instruction when the Zicond extension is present.
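
For reference, a C function whose combined RTL takes the shape shown
above is (a sketch, not part of the patch):

    long
    mask_if_set (long v, long flag)
    {
      /* -(flag != 0) is 0 or all-ones, so the AND yields 0 or v;
         with Zicond this maps to a single czero.eqz.  */
      return flag != 0 ? v : 0;
    }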

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_rtx_costs): Recognize the idiom
for conditional-zero as a single instruction for TARGET_ZICOND.
* config/riscv/riscv.md: Include zicond.md.
* config/riscv/zicond.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-ne-03.c: New test.
* gcc.target/riscv/zicond-ne-04.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/riscv.cc | 15 ++
 gcc/config/riscv/riscv.md |  1 +
 gcc/config/riscv/zicond.md| 30 +++
 gcc/testsuite/gcc.target/riscv/zicond-ne-03.c | 13 
 gcc/testsuite/gcc.target/riscv/zicond-ne-04.c | 13 
 5 files changed, 72 insertions(+)
 create mode 100644 gcc/config/riscv/zicond.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-ne-03.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-ne-04.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e4d3e1a3229..7e69a652fc5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2331,6 +2331,21 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
   return false;
 
 case AND:
+  /* czero.eqz/nez */
+  if ((TARGET_ZICOND)
+ && mode == word_mode
+ && GET_CODE (XEXP (x, 0)) == NEG)
+   {
+ rtx inner = XEXP (XEXP (x, 0), 0);
+
+ if ((GET_CODE (inner) == EQ || GET_CODE (inner) == NE)
+ && CONST_INT_P (XEXP (inner, 1))
+ && INTVAL (XEXP (inner, 1)) == 0)
+   {
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
+   }
   /* slli.uw pattern for zba.  */
   if (TARGET_ZBA && TARGET_64BIT && mode == DImode
  && GET_CODE (XEXP (x, 0)) == ASHIFT)
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index e8b5fc6644d..7c632bb4d65 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3228,3 +3228,4 @@ (define_insn "riscv_prefetchi_<mode>"
 (include "generic.md")
 (include "sifive-7.md")
 (include "vector.md")
+(include "zicond.md")
diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
new file mode 100644
index 000..278e3a67802
--- /dev/null
+++ b/gcc/config/riscv/zicond.md
@@ -0,0 +1,30 @@
+;; Machine description for the RISC-V Zicond extension
+;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_code_iterator eq_or_ne [eq ne])
+(define_code_attr eqz [(eq "nez") (ne "eqz")])
+
+(define_insn "*czero."
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (and:DI (neg:DI (eq_or_ne:DI
+   (match_operand:DI 1 "register_operand" "r")
+   (const_int 0)))
+   (match_operand:DI 2 "register_operand" "r")))]
+  "TARGET_ZICOND"
+  "czero.\t%0,%2,%1")
diff --git a/gcc/testsuite/gcc.target/riscv/zicond-ne-03.c 
b/gcc/testsuite/gcc.target/riscv/zicond-ne-03.c
new file mode 100644
index 000..887b1273ce7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zicond-ne-03.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64 -mtune=thead-c906" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" "-Os" "-Oz" } } */
+
+long long ne3(long long a, long long b)
+{
+  if (a != 0)
+return b;
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "czero.eqz" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zicond-ne-04.c 
b/gcc/testsuite/gcc.target/riscv/zicond-ne-04.c
new file mode 100644
index 000..6f6df06b4b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zicond-ne-04.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64 -mtune=thead-c906" } */
+/* { dg-skip-if "" { *-*-* } { 

[RFC PATCH v1 01/10] docs: Document a canonical RTL for a conditional-zero insns

2023-02-10 Thread Philipp Tomsich
On RISC-V, conditional-zero (i.e., move a register value or zero to a
destination register) instructions are part of the Zicond extension.
To support architectures that have similar constructs, we define a
canonical RTL representation that can be used in if-conversion.

Signed-off-by: Philipp Tomsich 
---

 gcc/doc/md.texi | 17 +
 1 file changed, 17 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 7235d34c4b3..579462ea67f 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8347,6 +8347,23 @@ operand of @code{mult} is also a shift, then that is 
extended also.
 This transformation is only applied when it can be proven that the
 original operation had sufficient precision to prevent overflow.
 
+@cindex @code{conditional-zero}, canonicalization of
+@item
+A machine that has an instruction that performs a conditional-zero
+operation (i.e., an instruction that moves a register value or puts 0
+into the destination register) should specify the pattern for that
+instruction as:
+@smallexample
+(define_insn ""
+  [(set (match_operand:@var{m} 0 @dots{})
+(and:@var{m}
+  (neg:@var{m} (@var{eq_or_ne} (match_operand:@var{m} 1 @dots{})
+   (const_int 0)))
+  (match_operand:@var{m} 2 @dots{})))]
+  "@dots{}"
+  "@dots{}")
+@end smallexample
+
 @end itemize
 
 Further canonicalization rules are defined in the function
-- 
2.34.1



[RFC PATCH v1 02/10] RISC-V: Recognize Zicond (conditional operations) extension

2023-02-10 Thread Philipp Tomsich
This adds the RISC-V Zicond extension to the option parsing.
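
Once recognized, Zicond is enabled through the architecture string;
for example (illustrative), the tests later in this series use:

    /* { dg-options "-march=rv64gc_zicond -mabi=lp64" } */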

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Recognize "zicond"
as part of an architecture string.
* config/riscv/riscv-opts.h (MASK_ZICOND): Define.
(TARGET_ZICOND): Define.

Signed-off-by: Philipp Tomsich 
---

 gcc/common/config/riscv/riscv-common.cc | 3 +++
 gcc/config/riscv/riscv-opts.h   | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 787674003cb..999e1926db1 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -163,6 +163,8 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
   {"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
 
+  {"zicond", ISA_SPEC_CLASS_NONE, 1, 0},
+
   {"zawrs", ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"zba", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1181,6 +1183,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
 
   {"zicsr",_options::x_riscv_zi_subext, MASK_ZICSR},
   {"zifencei", _options::x_riscv_zi_subext, MASK_ZIFENCEI},
+  {"zicond",   _options::x_riscv_zi_subext, MASK_ZICOND},
 
   {"zawrs", _options::x_riscv_za_subext, MASK_ZAWRS},
 
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index ff398c0a2ae..61d5212da20 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -69,9 +69,11 @@ enum stack_protector_guard {
 
 #define MASK_ZICSR(1 << 0)
 #define MASK_ZIFENCEI (1 << 1)
+#define MASK_ZICOND   (1 << 2)
 
 #define TARGET_ZICSR((riscv_zi_subext & MASK_ZICSR) != 0)
 #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
+#define TARGET_ZICOND   ((riscv_zi_subext & MASK_ZICOND) != 0)
 
 #define MASK_ZAWRS   (1 << 0)
 #define TARGET_ZAWRS ((riscv_za_subext & MASK_ZAWRS) != 0)
-- 
2.34.1



Re: [PATCH] tree-optimization/106722 - fix CD-DCE edge marking

2023-02-10 Thread Jan Hubicka via Gcc-patches
> The following fixes a latent issue when we mark control edges but
> end up with marking a block with no stmts necessary.  In this case
> we fail to mark dependent control edges of that block.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> Does this look OK?
> 
> Thanks,
> Richard.
> 
>   PR tree-optimization/106722
>   * tree-ssa-dce.cc (mark_last_stmt_necessary): Return
>   whether we marked a stmt.
>   (mark_control_dependent_edges_necessary): When
>   mark_last_stmt_necessary didn't mark any stmt make sure
>   to mark its control dependent edges.
>   (propagate_necessity): Likewise.
> 
>   * gcc.dg/torture/pr108737.c: New testcase.
> diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
> index b2fe9f4f55e..21b3294fc86 100644
> --- a/gcc/tree-ssa-dce.cc
> +++ b/gcc/tree-ssa-dce.cc
> @@ -327,17 +327,23 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool 
> aggressive)
>  
>  /* Mark the last statement of BB as necessary.  */
>  
> -static void
> +static bool
>  mark_last_stmt_necessary (basic_block bb)
>  {
>gimple *stmt = last_stmt (bb);
>  
> -  bitmap_set_bit (last_stmt_necessary, bb->index);
> +  if (!bitmap_set_bit (last_stmt_necessary, bb->index))
> +return true;
> +
>bitmap_set_bit (bb_contains_live_stmts, bb->index);
>  
>/* We actually mark the statement only if it is a control statement.  */
>if (stmt && is_ctrl_stmt (stmt))
> -mark_stmt_necessary (stmt, true);
> +{
> +  mark_stmt_necessary (stmt, true);
> +  return true;
> +}
> +  return false;
>  }
>  
>  
> @@ -369,8 +375,8 @@ mark_control_dependent_edges_necessary (basic_block bb, 
> bool ignore_self)
> continue;
>   }
>  
> -  if (!bitmap_bit_p (last_stmt_necessary, cd_bb->index))
> - mark_last_stmt_necessary (cd_bb);
> +  if (!mark_last_stmt_necessary (cd_bb))
> + mark_control_dependent_edges_necessary (cd_bb, false);

Makes sense to me, though I am a bit surprised it took such a long time to
show up.

Honza


Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Andrew MacLeod via Gcc-patches



On 2/10/23 13:34, Richard Biener wrote:



In any case, if you disagree I don't really see a way forward aside from
making this its own pattern
running it before the overwidening pattern.

I think we should look to see if ranger can be persuaded to provide the
range of the 16-bit addition, even though the statement that produces it
isn't part of a BB.  It shouldn't matter that the addition originally
came from a 32-bit one: the range follows directly from the ranges of
the operands (i.e. the fact that the operands are the results of
widening conversions).

I think you can ask ranger on operations on names defined in the IL, so you can 
work yourself through the sequence of operations in the pattern sequence to 
compute ranges on their defs (and possibly even store them in the SSA info).  
You just need to pick the correct ranger API for this…. Andrew CCed



It's not clear to me what's being asked...

Expressions don't need to be in the IL to do range calculations; I
believe we support arbitrary tree expressions via range_of_expr.


If you have 32-bit ranges that you want to do 16-bit addition on, you
can also cast those ranges to a 16-bit type:

my32bitrange.cast (my16bittype);

then invoke range-ops directly by getting the handler:

handler = range_op_handler (PLUS_EXPR, 16bittype_tree);
if (handler)
   handler->fold (result, my16bittype, mycasted32bitrange, 
myothercasted32bitrange)
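
Spelled out slightly (names as above; exact signatures vary across GCC
revisions, so treat this as pseudocode rather than a drop-in snippet):

int_range_max res;                  // the usual ranger container type
int_range_max r1 = my32bitrange;
int_range_max r2 = myother32bitrange;
r1.cast (my16bittype);              // narrow both operand ranges first
r2.cast (my16bittype);
if (auto handler = range_op_handler (PLUS_EXPR, my16bittype))
  handler->fold (res, my16bittype, r1, r2);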


There are higher level APIs if what you have on hand is closer to IL
than random ranges.


Describe exactly what it is you want to do... and I'll try to direct you 
to the best way to do it.


Andrew





[pushed] wwwdocs: news/profiledriven: Update a Citeseer link

2023-02-10 Thread Gerald Pfeifer
citeseer.ist.psu.edu stalls, whereas citeseerx.ist.psu.edu responds.
Switch to https on the way.

Pushed.

Gerald
---
 htdocs/news/profiledriven.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/news/profiledriven.html b/htdocs/news/profiledriven.html
index cac172b4..82febc6d 100644
--- a/htdocs/news/profiledriven.html
+++ b/htdocs/news/profiledriven.html
@@ -297,7 +297,7 @@ Frequency and Program Profile Analysis; Wu and Larus; 
MICRO-27.
 
 [8] wwwdocs:
-href="http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.1922">Hyperblock
+href="https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.1922">Hyperblock
 Performance Optimizations For ILP Processors;  David Isaac August, 1996
 
 [9] wwwdocs:
-- 
2.39.1


Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Richard Biener via Gcc-patches



> Am 10.02.2023 um 19:12 schrieb Richard Sandiford via Gcc-patches 
> :
> 
> Tamar Christina  writes:
>>> -Original Message-
>>> From: Richard Sandiford 
>>> Sent: Friday, February 10, 2023 4:57 PM
>>> To: Tamar Christina 
>>> Cc: Tamar Christina via Gcc-patches ; nd
>>> ; rguent...@suse.de; j...@ventanamicro.com
>>> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
>>> by using new optabs [PR108583]
>>> 
>>> Tamar Christina  writes:
> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, February 10, 2023 4:25 PM
> To: Tamar Christina 
> Cc: Tamar Christina via Gcc-patches ; nd
> ; rguent...@suse.de; j...@ventanamicro.com
> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of
> div-bitmask by using new optabs [PR108583]
> 
> Tamar Christina  writes:
>>> -Original Message-
>>> From: Richard Sandiford 
>>> Sent: Friday, February 10, 2023 3:57 PM
>>> To: Tamar Christina 
>>> Cc: Tamar Christina via Gcc-patches ; nd
>>> ; rguent...@suse.de; j...@ventanamicro.com
>>> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of
>>> div-bitmask by using new optabs [PR108583]
>>> 
>>> Tamar Christina  writes:
>> a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index
>> 
> 
>>> 
> 
>>> 6934aebc69f231af24668f0a1c3d140e97f55487..e39d7e6b362ef44eb2fc467f33
> 69
>> de2afea139d6 100644
>> --- a/gcc/tree-vect-patterns.cc
>> +++ b/gcc/tree-vect-patterns.cc
>> @@ -3914,12 +3914,82 @@ vect_recog_divmod_pattern
>>> (vec_info
>>> *vinfo,
>>   return pattern_stmt;
>> }
>>   else if ((cst = uniform_integer_cst_p (oprnd1))
>> -   && targetm.vectorize.can_special_div_by_const (rhs_code,
> vectype,
>> -  wi::to_wide
> (cst),
>> -  NULL,
> NULL_RTX,
>> -  NULL_RTX))
>> +   && TYPE_UNSIGNED (itype)
>> +   && rhs_code == TRUNC_DIV_EXPR
>> +   && vectype
>> +   && direct_internal_fn_supported_p (IFN_ADDH, vectype,
>> +  OPTIMIZE_FOR_SPEED))
>> {
>> -  return NULL;
>> +  /* div optimizations using narrowings
>> +   we can do the division e.g. shorts by 255 faster by
>> + calculating it
> as
>> +   (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
>> +   double the precision of x.
>> +
>> +   If we imagine a short as being composed of two blocks
>> + of bytes
>>> then
>> +   adding 257 or 0b_0001__0001 to the number is
>> + equivalent
>>> to
>> +   adding 1 to each sub component:
>> +
>> +short value of 16-bits
>> +   ┌──┬┐
>> +   │  ││
>> +   └──┴┘
>> + 8-bit part1 ▲  8-bit part2   ▲
>> + ││
>> + ││
>> ++1   +1
>> +
>> +   after the first addition, we have to shift right by
>> + 8, and narrow
> the
>> +   results back to a byte.  Remember that the addition
>> + must be done
>>> in
>> +   double the precision of the input.  However if we
>> + know that the
> addition
>> +   `x + 257` does not overflow then we can do the
>> + operation in the
> current
>> +   precision.  In which case we don't need the pack and
>>> unpacks.
> */
>> +  auto wcst = wi::to_wide (cst);
>> +  int pow = wi::exact_log2 (wcst + 1);
>> +  if (pow == (int) (element_precision (vectype) / 2))
>> +{
>> +  wide_int min,max;
>> +  /* If we're in a pattern we need to find the orginal
> definition.  */
>> +  tree op0 = oprnd0;
>> +  gimple *stmt = SSA_NAME_DEF_STMT (oprnd0);
>> +  stmt_vec_info stmt_info = vinfo->lookup_stmt (stmt);
>> +  if (is_pattern_stmt_p (stmt_info))
>> +{
>> +  auto orig_stmt = STMT_VINFO_RELATED_STMT
> (stmt_info);
>> +  if (is_gimple_assign (STMT_VINFO_STMT (orig_stmt)))
>> +op0 = gimple_assign_lhs (STMT_VINFO_STMT
> (orig_stmt));
>> +}
> 
> If this is generally safe (I'm skipping thinking about it in
> the interests of a quick review :-)), then I think it should be
> done in vect_get_range_info instead.  Using gimple_get_lhs
> would be more general than handling just 

Re: [PATCH] libstdc++: Add missing free functions for atomic_flag [PR103934]

2023-02-10 Thread Thomas Rodgers via Gcc-patches
This patch did not get committed in a timely manner after it was OK'd. In
revisiting the patch, some issues were found that have led me to resubmit
it for review -

Specifically -

- The original commit to add C++20 atomic_flag::test did not include the
  free functions for atomic_flag_test[_explicit].
- The original commit to add C++20 atomic_flag::wait/notify did not
  include the free functions for atomic_flag_wait/notify[_explicit].

These two commits landed in GCC10 and GCC11 respectively. My original patch
included both sets of free functions, but
that complicates the backporting of these changes to GCC10, GCC11, and
GCC12.

Additionally, commit 7c2155 removed the const qualification from
atomic_flag::notify_one/notify_all, but the original version of this
patch accepts the atomic flag as const.

The original version of this patch did not include test cases for the
atomic_flag_test[_explicit] free functions.

I have split the original patch into two patches, one for the
atomic_flag_test free functions, and one for the atomic_flag_wait/notify
free functions.


On Wed, Feb 2, 2022 at 1:35 PM Jonathan Wakely  wrote:

> >+  inline void
> >+  atomic_flag_wait_explicit(const atomic_flag* __a, bool __old,
> >+   std::memory_order __m) noexcept
>
> No need for the std:: qualification, and check the indentation.
>
>
> > libstdc++-v3/ChangeLog:
> >
> >PR103934
>
> This needs to include the component: PR libstdc++/103934
>
> >* include/std/atomic: Add missing free functions.
>
> Please name the new functions in the changelog, in the usual format.
> Just the names is fine, no need for the full signatures with
> parameters.
>
> OK for trunk with those changes.
>
>
From 1338538c1667b7fbe201d22d81254e33a0e7b24c Mon Sep 17 00:00:00 2001
From: Thomas W Rodgers 
Date: Fri, 10 Feb 2023 10:09:06 -0800
Subject: [PATCH 2/2] libstdc++: Add missing free functions for atomic_flag
 [PR103934]

This patch adds -
  atomic_flag_wait
  atomic_flag_wait_explicit
  atomic_flag_notify_one
  atomic_flag_notify_all

Which were missed when commit 83a1be introduced C++20 atomic wait.
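
Usage of the new free functions mirrors the member functions (sketch
only):

    std::atomic_flag f;
    std::atomic_flag_wait(&f, false);    // block while f.test() == false
    std::atomic_flag_notify_one(&f);     // wake (at least) one waiter
    std::atomic_flag_notify_all(&f);     // wake all waiters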

libstdc++-v3/ChangeLog:

	PR libstdc++/103934
	* include/std/atomic: Add missing free functions.
	* testsuite/29_atomics/atomic_flag/wait_notify/1.cc:
	Add test case to cover missing atomic_flag free functions.
---
 libstdc++-v3/include/std/atomic   | 19 ++
 .../29_atomics/atomic_flag/wait_notify/1.cc   | 26 +--
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index 1edd3ae16fa..e82a6427378 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -1259,6 +1259,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   atomic_flag_clear(volatile atomic_flag* __a) noexcept
   { atomic_flag_clear_explicit(__a, memory_order_seq_cst); }
 
+#if __cpp_lib_atomic_wait
+  inline void
+  atomic_flag_wait(atomic_flag* __a, bool __old) noexcept
+  { __a->wait(__old); }
+
+  inline void
+  atomic_flag_wait_explicit(atomic_flag* __a, bool __old,
+                            memory_order __m) noexcept
+  { __a->wait(__old, __m); }
+
+  inline void
+  atomic_flag_notify_one(atomic_flag* __a) noexcept
+  { __a->notify_one(); }
+
+  inline void
+  atomic_flag_notify_all(atomic_flag* __a) noexcept
+  { __a->notify_all(); }
+#endif // __cpp_lib_atomic_wait
+
   /// @cond undocumented
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 3220. P0558 broke conforming C++14 uses of atomic shared_ptr
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_flag/wait_notify/1.cc b/libstdc++-v3/testsuite/29_atomics/atomic_flag/wait_notify/1.cc
index 240fb4259f7..777fa915ea1 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_flag/wait_notify/1.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_flag/wait_notify/1.cc
@@ -26,8 +26,8 @@
 
 #include 
 
-int
-main()
+void
+test01()
 {
   std::atomic_flag a;
   VERIFY( !a.test() );
@@ -39,5 +39,27 @@ main()
 });
   a.wait(false);
   t.join();
+}
+
+void
+test02()
+{
+  std::atomic_flag a;
+  VERIFY( !std::atomic_flag_test(&a) );
+  std::atomic_flag_wait(&a, true);
+  std::thread t([&]
+    {
+      std::atomic_flag_test_and_set(&a);
+      std::atomic_flag_notify_one(&a);
+    });
+  std::atomic_flag_wait(&a, false);
+  t.join();
+}
+
+int
+main()
+{
+  test01();
+  test02();
   return 0;
 }
-- 
2.39.1

From 7f21e98bbefde5437a116ccda35a85922f5882bf Mon Sep 17 00:00:00 2001
From: Thomas W Rodgers 
Date: Fri, 10 Feb 2023 09:35:11 -0800
Subject: [PATCH 1/2] libstdc++: Add missing free functions for atomic_flag
 [PR103934]

This patch adds -
  atomic_flag_test
  atomic_flag_test_explicit

Which were missed when commit 491ba6 introduced C++20 atomic flag
test.

libstdc++-v3/ChangeLog:

	PR libstdc++/103934
	* include/std/atomic: Add missing free functions.
	* testsuite/29_atomics/atomic_flag/test/explicit.cc:
	Add test case to cover missing atomic_flag free functions.
* 

Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Friday, February 10, 2023 4:57 PM
>> To: Tamar Christina 
>> Cc: Tamar Christina via Gcc-patches ; nd
>> ; rguent...@suse.de; j...@ventanamicro.com
>> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
>> by using new optabs [PR108583]
>> 
>> Tamar Christina  writes:
>> >> -Original Message-
>> >> From: Richard Sandiford 
>> >> Sent: Friday, February 10, 2023 4:25 PM
>> >> To: Tamar Christina 
>> >> Cc: Tamar Christina via Gcc-patches ; nd
>> >> ; rguent...@suse.de; j...@ventanamicro.com
>> >> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of
>> >> div-bitmask by using new optabs [PR108583]
>> >>
>> >> Tamar Christina  writes:
>> >> >> -Original Message-
>> >> >> From: Richard Sandiford 
>> >> >> Sent: Friday, February 10, 2023 3:57 PM
>> >> >> To: Tamar Christina 
>> >> >> Cc: Tamar Christina via Gcc-patches ; nd
>> >> >> ; rguent...@suse.de; j...@ventanamicro.com
>> >> >> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of
>> >> >> div-bitmask by using new optabs [PR108583]
>> >> >>
>> >> >> Tamar Christina  writes:
>> >> >> >> > a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index
>> >> >> >> >
>> >> >> >>
>> >> >>
>> >>
>> 6934aebc69f231af24668f0a1c3d140e97f55487..e39d7e6b362ef44eb2fc467f33
>> >> >> >> 69
>> >> >> >> > de2afea139d6 100644
>> >> >> >> > --- a/gcc/tree-vect-patterns.cc
>> >> >> >> > +++ b/gcc/tree-vect-patterns.cc
>> >> >> >> > @@ -3914,12 +3914,82 @@ vect_recog_divmod_pattern
>> (vec_info
>> >> >> *vinfo,
>> >> >> >> >return pattern_stmt;
>> >> >> >> >  }
>> >> >> >> >else if ((cst = uniform_integer_cst_p (oprnd1))
>> >> >> >> > -&& targetm.vectorize.can_special_div_by_const (rhs_code,
>> >> >> >> vectype,
>> >> >> >> > -   wi::to_wide
>> >> (cst),
>> >> >> >> > -   NULL,
>> >> NULL_RTX,
>> >> >> >> > -   NULL_RTX))
>> >> >> >> > +&& TYPE_UNSIGNED (itype)
>> >> >> >> > +&& rhs_code == TRUNC_DIV_EXPR
>> >> >> >> > +&& vectype
>> >> >> >> > +&& direct_internal_fn_supported_p (IFN_ADDH, vectype,
>> >> >> >> > +   OPTIMIZE_FOR_SPEED))
>> >> >> >> >  {
>> >> >> >> > -  return NULL;
>> >> >> >> > +  /* div optimizations using narrowings
>> >> >> >> > +   we can do the division e.g. shorts by 255 faster by
>> >> >> >> > + calculating it
>> >> as
>> >> >> >> > +   (x + ((x + 257) >> 8)) >> 8 assuming the operation is 
>> >> >> >> > done in
>> >> >> >> > +   double the precision of x.
>> >> >> >> > +
>> >> >> >> > +   If we imagine a short as being composed of two blocks
>> >> >> >> > + of bytes
>> >> >> then
>> >> >> >> > +   adding 257 or 0b_0001__0001 to the number is
>> >> >> >> > + equivalent
>> >> >> to
>> >> >> >> > +   adding 1 to each sub component:
>> >> >> >> > +
>> >> >> >> > + short value of 16-bits
>> >> >> >> > +   ┌──┬┐
>> >> >> >> > +   │  ││
>> >> >> >> > +   └──┴┘
>> >> >> >> > +  8-bit part1 ▲  8-bit part2   ▲
>> >> >> >> > +  ││
>> >> >> >> > +  ││
>> >> >> >> > + +1   +1
>> >> >> >> > +
>> >> >> >> > +   after the first addition, we have to shift right by
>> >> >> >> > + 8, and narrow
>> >> the
>> >> >> >> > +   results back to a byte.  Remember that the addition
>> >> >> >> > + must be done
>> >> >> in
>> >> >> >> > +   double the precision of the input.  However if we
>> >> >> >> > + know that the
>> >> >> >> addition
>> >> >> >> > +   `x + 257` does not overflow then we can do the
>> >> >> >> > + operation in the
>> >> >> >> current
>> >> >> >> > +   precision.  In which case we don't need the pack and
>> unpacks.
>> >> */
>> >> >> >> > +  auto wcst = wi::to_wide (cst);
>> >> >> >> > +  int pow = wi::exact_log2 (wcst + 1);
>> >> >> >> > +  if (pow == (int) (element_precision (vectype) / 2))
>> >> >> >> > + {
>> >> >> >> > +   wide_int min,max;
>> >> >> >> > +   /* If we're in a pattern we need to find the orginal
>> >> definition.  */
>> >> >> >> > +   tree op0 = oprnd0;
>> >> >> >> > +   gimple *stmt = SSA_NAME_DEF_STMT (oprnd0);
>> >> >> >> > +   stmt_vec_info stmt_info = vinfo->lookup_stmt (stmt);
>> >> >> >> > +   if (is_pattern_stmt_p (stmt_info))
>> >> >> >> > + {
>> >> >> >> > +   auto orig_stmt = STMT_VINFO_RELATED_STMT
>> >> (stmt_info);
>> >> >> >> > +   if (is_gimple_assign (STMT_VINFO_STMT (orig_stmt)))
>> >> >> >> > + op0 = gimple_assign_lhs (STMT_VINFO_STMT
>> >> (orig_stmt));
>> >> >> >> > + }
>> >> >> >>
>> >> >> >> If this is generally safe (I'm skipping thinking about it in
>> >> >> >> the interests of a 

[pushed] [PR108754] RA: Use caller save equivalent memory only for LRA

2023-02-10 Thread Vladimir Makarov via Gcc-patches

The following patch should solve

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108754

The patch simply switches off a new optimization for targets using the 
old reload pass.


The patch was successfully bootstrapped on x86-64.

commit 7757567358a84c3774cb972350bd7ea299daaa8d
Author: Vladimir N. Makarov 
Date:   Fri Feb 10 12:17:07 2023 -0500

RA: Use caller save equivalent memory only for LRA

Recently I submitted a patch to reuse memory with a constant address for
the caller-save optimization for constant or pure function calls.  It
seems to work only for targets using LRA instead of the old reload
pass.  So the patch switches off this optimization when the old reload
pass is used.

PR middle-end/108754

gcc/ChangeLog:

* ira.cc (update_equiv_regs): Set up ira_reg_equiv for
valid_combine only when ira_use_lra_p is true.

diff --git a/gcc/ira.cc b/gcc/ira.cc
index d0b6ea062e8..9f9af808f63 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -3773,7 +3773,7 @@ update_equiv_regs (void)
 		{
 		  note = set_unique_reg_note (insn, REG_EQUIV, replacement);
 		}
-		  else
+		  else if (ira_use_lra_p)
 		{
 		  /* We still can use this equivalence for caller save
 			 optimization in LRA.  Mark this.  */


RE: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, February 10, 2023 4:57 PM
> To: Tamar Christina 
> Cc: Tamar Christina via Gcc-patches ; nd
> ; rguent...@suse.de; j...@ventanamicro.com
> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
> by using new optabs [PR108583]
> 
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Friday, February 10, 2023 4:25 PM
> >> To: Tamar Christina 
> >> Cc: Tamar Christina via Gcc-patches ; nd
> >> ; rguent...@suse.de; j...@ventanamicro.com
> >> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of
> >> div-bitmask by using new optabs [PR108583]
> >>
> >> Tamar Christina  writes:
> >> >> -Original Message-
> >> >> From: Richard Sandiford 
> >> >> Sent: Friday, February 10, 2023 3:57 PM
> >> >> To: Tamar Christina 
> >> >> Cc: Tamar Christina via Gcc-patches ; nd
> >> >> ; rguent...@suse.de; j...@ventanamicro.com
> >> >> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of
> >> >> div-bitmask by using new optabs [PR108583]
> >> >>
> >> >> Tamar Christina  writes:
> >> >> >> > a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index
> >> >> >> >
> >> >> >>
> >> >>
> >>
> 6934aebc69f231af24668f0a1c3d140e97f55487..e39d7e6b362ef44eb2fc467f33
> >> >> >> 69
> >> >> >> > de2afea139d6 100644
> >> >> >> > --- a/gcc/tree-vect-patterns.cc
> >> >> >> > +++ b/gcc/tree-vect-patterns.cc
> >> >> >> > @@ -3914,12 +3914,82 @@ vect_recog_divmod_pattern
> (vec_info
> >> >> *vinfo,
> >> >> >> >return pattern_stmt;
> >> >> >> >  }
> >> >> >> >else if ((cst = uniform_integer_cst_p (oprnd1))
> >> >> >> > - && targetm.vectorize.can_special_div_by_const (rhs_code,
> >> >> >> vectype,
> >> >> >> > -wi::to_wide
> >> (cst),
> >> >> >> > -NULL,
> >> NULL_RTX,
> >> >> >> > -NULL_RTX))
> >> >> >> > + && TYPE_UNSIGNED (itype)
> >> >> >> > + && rhs_code == TRUNC_DIV_EXPR
> >> >> >> > + && vectype
> >> >> >> > + && direct_internal_fn_supported_p (IFN_ADDH, vectype,
> >> >> >> > +OPTIMIZE_FOR_SPEED))
> >> >> >> >  {
> >> >> >> > -  return NULL;
> >> >> >> > +  /* div optimizations using narrowings
> >> >> >> > +   we can do the division e.g. shorts by 255 faster by
> >> >> >> > + calculating it
> >> as
> >> >> >> > +   (x + ((x + 257) >> 8)) >> 8 assuming the operation is done 
> >> >> >> > in
> >> >> >> > +   double the precision of x.
> >> >> >> > +
> >> >> >> > +   If we imagine a short as being composed of two blocks
> >> >> >> > + of bytes
> >> >> then
> >> >> >> > +   adding 257 or 0b_0001__0001 to the number is
> >> >> >> > + equivalent
> >> >> to
> >> >> >> > +   adding 1 to each sub component:
> >> >> >> > +
> >> >> >> > +  short value of 16-bits
> >> >> >> > +   ┌──┬┐
> >> >> >> > +   │  ││
> >> >> >> > +   └──┴┘
> >> >> >> > +   8-bit part1 ▲  8-bit part2   ▲
> >> >> >> > +   ││
> >> >> >> > +   ││
> >> >> >> > +  +1   +1
> >> >> >> > +
> >> >> >> > +   after the first addition, we have to shift right by
> >> >> >> > + 8, and narrow
> >> the
> >> >> >> > +   results back to a byte.  Remember that the addition
> >> >> >> > + must be done
> >> >> in
> >> >> >> > +   double the precision of the input.  However if we
> >> >> >> > + know that the
> >> >> >> addition
> >> >> >> > +   `x + 257` does not overflow then we can do the
> >> >> >> > + operation in the
> >> >> >> current
> >> >> >> > +   precision.  In which case we don't need the pack and
> unpacks.
> >> */
> >> >> >> > +  auto wcst = wi::to_wide (cst);
> >> >> >> > +  int pow = wi::exact_log2 (wcst + 1);
> >> >> >> > +  if (pow == (int) (element_precision (vectype) / 2))
> >> >> >> > +  {
> >> >> >> > +wide_int min,max;
> >> >> >> > +/* If we're in a pattern we need to find the orginal
> >> definition.  */
> >> >> >> > +tree op0 = oprnd0;
> >> >> >> > +gimple *stmt = SSA_NAME_DEF_STMT (oprnd0);
> >> >> >> > +stmt_vec_info stmt_info = vinfo->lookup_stmt (stmt);
> >> >> >> > +if (is_pattern_stmt_p (stmt_info))
> >> >> >> > +  {
> >> >> >> > +auto orig_stmt = STMT_VINFO_RELATED_STMT
> >> (stmt_info);
> >> >> >> > +if (is_gimple_assign (STMT_VINFO_STMT (orig_stmt)))
> >> >> >> > +  op0 = gimple_assign_lhs (STMT_VINFO_STMT
> >> (orig_stmt));
> >> >> >> > +  }
> >> >> >>
> >> >> >> If this is generally safe (I'm skipping thinking about it in
> >> >> >> the interests of a quick review :-)), then I think it should be
> >> >> >> done in vect_get_range_info instead.  Using gimple_get_lhs
> >> >> 

Re: Ping^3: [PATCH] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2023-02-10 Thread Jakub Jelinek via Gcc-patches
On Fri, Feb 10, 2023 at 11:58:03AM -0500, Lewis Hyatt wrote:
> On Fri, Feb 10, 2023 at 11:30 AM Jakub Jelinek  wrote:
> >
> > On Mon, Sep 26, 2022 at 06:27:25PM -0400, Lewis Hyatt via Gcc-patches wrote:
> > > May I please ping this patch again? Joseph suggested that it would be 
> > > best if
> > > a C++ maintainer has a look at it. This is one of just a few places left 
> > > where
> > > we don't handle UTF-8 properly in libcpp, it would be really nice to get 
> > > them
> > > fixed up if there is time to review this patch. Thanks!
> >
> > CCing them.
> >
> > Just some nits from me, but I agree C++ maintainers are the best reviewers
> > for this.
> 
> Thanks so much for looking it over, I really appreciate it. I'll be
> sure to incorporate all your feedback along with those from the full
> review.
> 
> Is this for stage 1 at this point BTW?

It has been posted during stage 1; on the other hand, we are already almost a
month into stage 4, and it fixes an important bug which isn't a regression,
though.  I'd defer to the reviewers to decide that.

I've noticed this when seeing PR108717 being marked as dup of PR103902.
> The is_macro() function was doing two jobs, first lexing the
> identifier and looking it up in the hash table, and then calling
> cpp_macro_p(). This was a bit duplicative because the identifier was
> then immediately lexed again after the check. Since lexing it became
> more complicated with UTF-8 support, I changed it not to duplicate
> that effort and instead scan_cur_identifier() does the job once. With
> that done, all that's left for is_macro() to do is just the one line
> check so I got rid of it. However, I agree that the check about
> suffix_begin is not really trivial and so factoring this out into one
> place instead of two makes sense. I'll try to move the whole warning
> into its own function in the next iteration.

I agree, I just wanted to mention that if both of the callers need the same
large comment and roughly or completely the same code that guards the
cpp_warning, then it is a good candidate for a new helper, exactly like
you've added for the large duplication in the first new function.

Jakub



Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Richard Sandiford via Gcc-patches
Richard Sandiford  writes:
> Final pattern statements (those not in DEF_SEQ) always have the same
> type and value as the original statements.  We wouldn't see mismatched
> precisions if we were only looking at final pattern statements.
>
> Like you say, the 16-bit addition didn't exist before vectorisation
> (it was a 32-bit addition instead).  So to make things type-correct,
> the 32-bit addition:
>
>A: sum = a + b   (STMT_VINFO_RELATED_STMT == A2)
>
> is replaced with:
>
>DEF_SEQ:
>  A1: tmp = a' + b'  (STMT_VINFO_RELATED_STMT == A)
>A2: sum' = (int) tmp (STMT_VINFO_RELATED_STMT == A)
>
> (using different notation from before, just to confuse things).
> Here, A2 is the final pattern statement for A and A1 is just a
> temporary result.  sum == sum'.
>
> Later, we do a similar thing for the division itself.  We have:
>
>B: quotient = sum / 0xff(STMT_VINFO_RELATED_STMT == B2)
>
> We realise that this can be a 16-bit division, so (IIRC) we use
> vect_look_through_possible_promotion on sum to find the best
> starting point.  This should give:
>
>DEF_SEQ:
>  B1: tmp2 = tmp / (uint16_t) 0xff  (STMT_VINFO_RELATED_STMT == B)
>B2: quotient' = (int) tmp2  (STMT_VINFO_RELATED_STMT == B)
>
> Both changes are done by vect_widened_op_tree.

Eh, I meant vect_recog_over_widening_pattern.

Richard


Re: Ping^3: [PATCH] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2023-02-10 Thread Lewis Hyatt via Gcc-patches
On Fri, Feb 10, 2023 at 11:30 AM Jakub Jelinek  wrote:
>
> On Mon, Sep 26, 2022 at 06:27:25PM -0400, Lewis Hyatt via Gcc-patches wrote:
> > May I please ping this patch again? Joseph suggested that it would be best 
> > if
> > a C++ maintainer has a look at it. This is one of just a few places left 
> > where
> > we don't handle UTF-8 properly in libcpp, it would be really nice to get 
> > them
> > fixed up if there is time to review this patch. Thanks!
>
> CCing them.
>
> Just some nits from me, but I agree C++ maintainers are the best reviewers
> for this.

Thanks so much for looking it over, I really appreciate it. I'll be
sure to incorporate all your feedback along with those from the full
review.

Is this for stage 1 at this point BTW?

One note, the patch as-is doesn't quite apply to master branch
nowadays, it just needs a small tweak since warn_about_normalization()
has acquired a new argument in the meantime. If it's helpful I can
resend it with this addressed, as well as the rest of your comments?

Finally one comment here:

> > +  if (const auto sr = scan_cur_identifier (pfile))
> > + {
> > +   /* If a string format macro, say from inttypes.h, is placed touching
> > +  a string literal it could be parsed as a C++11 user-defined
> > +  string literal thus breaking the program.  User-defined literals
> > +  outside of namespace std must start with a single underscore, so
> > +  assume anything of that form really is a UDL suffix.  We don't
> > +  need to worry about UDLs defined inside namespace std because
> > +  their names are reserved, so cannot be used as macro names in
> > +  valid programs.  */
> > +   if ((suffix_begin[0] != '_' || suffix_begin[1] == '_')
> > +   && cpp_macro_p (sr.node))
>
> What is the advantage of dropping is_macro_not_literal_suffix and
> hand-inlining it in two different spots?
> Couldn't even the actual warning be moved into an inline function?

The is_macro() function was doing two jobs, first lexing the
identifier and looking it up in the hash table, and then calling
cpp_macro_p(). This was a bit duplicative because the identifier was
then immediately lexed again after the check. Since lexing it became
more complicated with UTF-8 support, I changed it not to duplicate
that effort and instead scan_cur_identifier() does the job once. With
that done, all that's left for is_macro() to do is just the one line
check so I got rid of it. However, I agree that the check about
suffix_begin is not really trivial and so factoring this out into one
place instead of two makes sense. I'll try to move the whole warning
into its own function in the next iteration.
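
To make the scenario concrete (illustrative only, not from the patch):

    // A string-format macro touching a literal must stay a macro:
    //   printf("value: %" PRId64 "\n", v);  // OK, the macro expands
    //   printf("value: %"PRId64"\n", v);    // C++11 lexes "..."PRId64 as a
    //                                       // UDL; the macro no longer expands
    // whereas a genuine user-defined literal suffix begins with '_':
    //   constexpr auto w = 42.0_w;          // hypothetical operator""_w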

> Otherwise it looks reasonable to me, but I'd still prefer Jason or Nathan
> to review this.
>
> Jakub
>

Thanks again.

-Lewis


Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Friday, February 10, 2023 4:25 PM
>> To: Tamar Christina 
>> Cc: Tamar Christina via Gcc-patches ; nd
>> ; rguent...@suse.de; j...@ventanamicro.com
>> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
>> by using new optabs [PR108583]
>> 
>> Tamar Christina  writes:
>> >> -Original Message-
>> >> From: Richard Sandiford 
>> >> Sent: Friday, February 10, 2023 3:57 PM
>> >> To: Tamar Christina 
>> >> Cc: Tamar Christina via Gcc-patches ; nd
>> >> ; rguent...@suse.de; j...@ventanamicro.com
>> >> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of
>> >> div-bitmask by using new optabs [PR108583]
>> >>
>> >> Tamar Christina  writes:
>> >> >> > a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index
>> >> >> >
>> >> >>
>> >>
>> 6934aebc69f231af24668f0a1c3d140e97f55487..e39d7e6b362ef44eb2fc467f33
>> >> >> 69
>> >> >> > de2afea139d6 100644
>> >> >> > --- a/gcc/tree-vect-patterns.cc
>> >> >> > +++ b/gcc/tree-vect-patterns.cc
>> >> >> > @@ -3914,12 +3914,82 @@ vect_recog_divmod_pattern (vec_info
>> >> *vinfo,
>> >> >> >return pattern_stmt;
>> >> >> >  }
>> >> >> >else if ((cst = uniform_integer_cst_p (oprnd1))
>> >> >> > -   && targetm.vectorize.can_special_div_by_const (rhs_code,
>> >> >> vectype,
>> >> >> > -  wi::to_wide
>> (cst),
>> >> >> > -  NULL,
>> NULL_RTX,
>> >> >> > -  NULL_RTX))
>> >> >> > +   && TYPE_UNSIGNED (itype)
>> >> >> > +   && rhs_code == TRUNC_DIV_EXPR
>> >> >> > +   && vectype
>> >> >> > +   && direct_internal_fn_supported_p (IFN_ADDH, vectype,
>> >> >> > +  OPTIMIZE_FOR_SPEED))
>> >> >> >  {
>> >> >> > -  return NULL;
>> >> >> > +  /* div optimizations using narrowings
>> >> >> > +   we can do the division e.g. shorts by 255 faster by 
>> >> >> > calculating it
>> as
>> >> >> > +   (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
>> >> >> > +   double the precision of x.
>> >> >> > +
>> >> >> > +   If we imagine a short as being composed of two blocks of
>> >> >> > + bytes
>> >> then
>> >> >> > +   adding 257 or 0b_0001__0001 to the number is
>> >> >> > + equivalent
>> >> to
>> >> >> > +   adding 1 to each sub component:
>> >> >> > +
>> >> >> > +short value of 16-bits
>> >> >> > +   ┌──┬┐
>> >> >> > +   │  ││
>> >> >> > +   └──┴┘
>> >> >> > + 8-bit part1 ▲  8-bit part2   ▲
>> >> >> > + ││
>> >> >> > + ││
>> >> >> > ++1   +1
>> >> >> > +
>> >> >> > +   after the first addition, we have to shift right by 8, and 
>> >> >> > narrow
>> the
>> >> >> > +   results back to a byte.  Remember that the addition must
>> >> >> > + be done
>> >> in
>> >> >> > +   double the precision of the input.  However if we know
>> >> >> > + that the
>> >> >> addition
>> >> >> > +   `x + 257` does not overflow then we can do the operation
>> >> >> > + in the
>> >> >> current
>> >> >> > +   precision.  In which case we don't need the pack and unpacks.
>> */
>> >> >> > +  auto wcst = wi::to_wide (cst);
>> >> >> > +  int pow = wi::exact_log2 (wcst + 1);
>> >> >> > +  if (pow == (int) (element_precision (vectype) / 2))
>> >> >> > +{
>> >> >> > +  wide_int min,max;
>> >> >> > +  /* If we're in a pattern we need to find the orginal
>> definition.  */
>> >> >> > +  tree op0 = oprnd0;
>> >> >> > +  gimple *stmt = SSA_NAME_DEF_STMT (oprnd0);
>> >> >> > +  stmt_vec_info stmt_info = vinfo->lookup_stmt (stmt);
>> >> >> > +  if (is_pattern_stmt_p (stmt_info))
>> >> >> > +{
>> >> >> > +  auto orig_stmt = STMT_VINFO_RELATED_STMT
>> (stmt_info);
>> >> >> > +  if (is_gimple_assign (STMT_VINFO_STMT (orig_stmt)))
>> >> >> > +op0 = gimple_assign_lhs (STMT_VINFO_STMT
>> (orig_stmt));
>> >> >> > +}
>> >> >>
>> >> >> If this is generally safe (I'm skipping thinking about it in the
>> >> >> interests of a quick review :-)), then I think it should be done
>> >> >> in vect_get_range_info instead.  Using gimple_get_lhs would be
>> >> >> more general than handling just assignments.
>> >> >>
>> >> >> > +
>> >> >> > +  /* Check that no overflow will occur.  If we don't have range
>> >> >> > + information we can't perform the optimization.  */
>> >> >> > +  if (vect_get_range_info (op0, , ))
>> >> >> > +{
>> >> >> > +  wide_int one = wi::to_wide (build_one_cst (itype));
>> >> >> > +  wide_int adder = wi::add (one, wi::lshift (one, pow));
>> >> >> > +  wi::overflow_type ovf;
>> >> 

Re: [PATCH 2/2] c++: speculative constexpr and is_constant_evaluated [PR108243]

2023-02-10 Thread Patrick Palka via Gcc-patches
On Fri, 10 Feb 2023, Patrick Palka wrote:

> On Thu, 9 Feb 2023, Patrick Palka wrote:
> 
> > On Thu, 9 Feb 2023, Jason Merrill wrote:
> > 
> > > On 2/9/23 09:36, Patrick Palka wrote:
> > > > On Sun, 5 Feb 2023, Jason Merrill wrote:
> > > > 
> > > > > On 2/3/23 15:51, Patrick Palka wrote:
> > > > > > On Mon, 30 Jan 2023, Jason Merrill wrote:
> > > > > > 
> > > > > > > On 1/27/23 17:02, Patrick Palka wrote:
> > > > > > > > This PR illustrates that __builtin_is_constant_evaluated 
> > > > > > > > currently
> > > > > > > > acts
> > > > > > > > as an optimization barrier for our speculative constexpr 
> > > > > > > > evaluation,
> > > > > > > > since we don't want to prematurely fold the builtin to false if 
> > > > > > > > the
> > > > > > > > expression in question would be later manifestly constant 
> > > > > > > > evaluated
> > > > > > > > (in
> > > > > > > > which case it must be folded to true).
> > > > > > > > 
> > > > > > > > This patch fixes this by permitting 
> > > > > > > > __builtin_is_constant_evaluated
> > > > > > > > to get folded as false during cp_fold_function, since at that 
> > > > > > > > point
> > > > > > > > we're sure we're doing manifestly constant evaluation.  To that 
> > > > > > > > end
> > > > > > > > we add a flags parameter to cp_fold that controls what 
> > > > > > > > mce_value the
> > > > > > > > CALL_EXPR case passes to maybe_constant_value.
> > > > > > > > 
> > > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu, does this 
> > > > > > > > look OK
> > > > > > > > for
> > > > > > > > trunk?
> > > > > > > > 
> > > > > > > > PR c++/108243
> > > > > > > > 
> > > > > > > > gcc/cp/ChangeLog:
> > > > > > > > 
> > > > > > > > * cp-gimplify.cc (enum fold_flags): Define.
> > > > > > > > (cp_fold_data::genericize): Replace this data member 
> > > > > > > > with ...
> > > > > > > > (cp_fold_data::fold_flags): ... this.
> > > > > > > > (cp_fold_r): Adjust cp_fold_data use and cp_fold_calls.
> > > > > > > > (cp_fold_function): Likewise.
> > > > > > > > (cp_fold_maybe_rvalue): Likewise.
> > > > > > > > (cp_fully_fold_init): Likewise.
> > > > > > > > (cp_fold): Add fold_flags parameter.  Don't cache if 
> > > > > > > > flags
> > > > > > > > isn't empty.
> > > > > > > > : Pass mce_false to maybe_constant_value
> > > > > > > > if if ff_genericize is set.
> > > > > > > > 
> > > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > > 
> > > > > > > > * g++.dg/opt/pr108243.C: New test.
> > > > > > > > ---
> > > > > > > > gcc/cp/cp-gimplify.cc   | 76
> > > > > > > > ++---
> > > > > > > > gcc/testsuite/g++.dg/opt/pr108243.C | 29 +++
> > > > > > > > 2 files changed, 76 insertions(+), 29 deletions(-)
> > > > > > > > create mode 100644 gcc/testsuite/g++.dg/opt/pr108243.C
> > > > > > > > 
> > > > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > > > > > index a35cedd05cc..d023a63768f 100644
> > > > > > > > --- a/gcc/cp/cp-gimplify.cc
> > > > > > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > > > > > @@ -43,12 +43,20 @@ along with GCC; see the file COPYING3.  If 
> > > > > > > > not
> > > > > > > > see
> > > > > > > > #include "omp-general.h"
> > > > > > > > #include "opts.h"
> > > > > > > > +/* Flags for cp_fold and cp_fold_r.  */
> > > > > > > > +
> > > > > > > > +enum fold_flags {
> > > > > > > > +  ff_none = 0,
> > > > > > > > +  /* Whether we're being called from cp_fold_function.  */
> > > > > > > > +  ff_genericize = 1 << 0,
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > /* Forward declarations.  */
> > > > > > > >   static tree cp_genericize_r (tree *, int *, void *);
> > > > > > > > static tree cp_fold_r (tree *, int *, void *);
> > > > > > > > static void cp_genericize_tree (tree*, bool);
> > > > > > > > -static tree cp_fold (tree);
> > > > > > > > +static tree cp_fold (tree, fold_flags);
> > > > > > > >   /* Genericize a TRY_BLOCK.  */
> > > > > > > > @@ -996,9 +1004,8 @@ struct cp_genericize_data
> > > > > > > > struct cp_fold_data
> > > > > > > > {
> > > > > > > >   hash_set pset;
> > > > > > > > -  bool genericize; // called from cp_fold_function?
> > > > > > > > -
> > > > > > > > -  cp_fold_data (bool g): genericize (g) {}
> > > > > > > > +  fold_flags flags;
> > > > > > > > +  cp_fold_data (fold_flags flags): flags (flags) {}
> > > > > > > > };
> > > > > > > >   static tree
> > > > > > > > @@ -1039,7 +1046,7 @@ cp_fold_r (tree *stmt_p, int 
> > > > > > > > *walk_subtrees,
> > > > > > > > void
> > > > > > > > *data_)
> > > > > > > >   break;
> > > > > > > > }
> > > > > > > > -  *stmt_p = stmt = cp_fold (*stmt_p);
> > > > > > > > +  *stmt_p = stmt = cp_fold (*stmt_p, data->flags);
> > > > > > > > if (data->pset.add (stmt))
> > > > > > > > {
> > > > > > > > @@ -1119,12 +1126,12 @@ cp_fold_r (tree *stmt_p, int 

[pushed] [PR108500] RA: Use simple LRA for huge functions

2023-02-10 Thread Vladimir Makarov via Gcc-patches

The following patch is for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108500

The patch improves compilation speed.  Compilation time of the biggest 
test in the PR decreases from 1235s to 709s.


The patch was successfully bootstrapped on x86-64.
commit 02371cdd755d2b53fb580d3e8209c44e0c45c337
Author: Vladimir N. Makarov 
Date:   Fri Feb 10 11:12:37 2023 -0500

RA: Use simple LRA for huge functions

The PR108500 test contains a huge function and RA spends a lot of time
to compile the test with -O0.  The patch decreases compilation time
considerably for huge functions.  Compilation time for the PR test
decreases from 1235s to 709s on Intel i7-13600K.

PR tree-optimization/108500

gcc/ChangeLog:

* params.opt (ira-simple-lra-insn-threshold): Add new param.
* ira.cc (ira): Use the param to switch on simple LRA.

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 6143db06c52..d0b6ea062e8 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -5624,12 +5624,16 @@ ira (FILE *f)
 if (DF_REG_DEF_COUNT (i) || DF_REG_USE_COUNT (i))
   num_used_regs++;
 
-  /* If there are too many pseudos and/or basic blocks (e.g. 10K
- pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
- use simplified and faster algorithms in LRA.  */
+  /* If there are too many pseudos and/or basic blocks (e.g. 10K pseudos and
+ 10K blocks or 100K pseudos and 1K blocks) or we have too many function
+ insns, we will use simplified and faster algorithms in LRA.  */
   lra_simple_p
-= ira_use_lra_p
-  && num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun);
+= (ira_use_lra_p
+   && (num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun)
+   /* max uid is a good evaluation of the number of insns as most
+  optimizations are done on tree-SSA level.  */
+   || ((uint64_t) get_max_uid ()
+	   > (uint64_t) param_ira_simple_lra_insn_threshold * 1000)));
 
   if (lra_simple_p)
 {
diff --git a/gcc/params.opt b/gcc/params.opt
index 8a128c321c9..c7913d9063a 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -302,6 +302,10 @@ The number of registers in each class kept unused by loop invariant motion.
 Common Joined UInteger Var(param_ira_max_conflict_table_size) Init(1000) Param Optimization
 Max size of conflict table in MB.
 
+-param=ira-simple-lra-insn-threshold=
+Common Joined UInteger Var(param_ira_simple_lra_insn_threshold) Init(1000) Param Optimization
+Approximate function insn number in 1K units triggering simple local RA.
+
 -param=ira-max-loops-num=
 Common Joined UInteger Var(param_ira_max_loops_num) Init(100) Param Optimization
 Max loops number for regional RA.
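
(Usage note: like any other param, the new threshold can be tuned per
invocation; a hypothetical "gcc -O0 --param ira-simple-lra-insn-threshold=500
huge.c" would switch to the simplified algorithms for functions of roughly
500K insns and up.)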


Re: [pushed] [PR103541] RA: Implement reuse of equivalent memory for caller saves optimization (version 2)

2023-02-10 Thread Hans-Peter Nilsson via Gcc-patches
> From: Vladimir Makarov via Gcc-patches 
> Date: Thu, 9 Feb 2023 22:49:34 +0100

> The patch was successfully bootstrapped (--enable-languages=all) and 
> tested on x86, x86-64, aarch64

Sorry, but this (also) caused test-suite regressions,
perhaps just for cris-elf.  I've opened PR 108754 and will
assist; I'm looking closer.

brgds, H-P


RE: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, February 10, 2023 4:25 PM
> To: Tamar Christina 
> Cc: Tamar Christina via Gcc-patches ; nd
> ; rguent...@suse.de; j...@ventanamicro.com
> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
> by using new optabs [PR108583]
> 
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Friday, February 10, 2023 3:57 PM
> >> To: Tamar Christina 
> >> Cc: Tamar Christina via Gcc-patches ; nd
> >> ; rguent...@suse.de; j...@ventanamicro.com
> >> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of
> >> div-bitmask by using new optabs [PR108583]
> >>
> >> Tamar Christina  writes:
> >> >> > a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index
> >> >> >
> >> >>
> >>
> 6934aebc69f231af24668f0a1c3d140e97f55487..e39d7e6b362ef44eb2fc467f33
> >> >> 69
> >> >> > de2afea139d6 100644
> >> >> > --- a/gcc/tree-vect-patterns.cc
> >> >> > +++ b/gcc/tree-vect-patterns.cc
> >> >> > @@ -3914,12 +3914,82 @@ vect_recog_divmod_pattern (vec_info
> >> *vinfo,
> >> >> >return pattern_stmt;
> >> >> >  }
> >> >> >else if ((cst = uniform_integer_cst_p (oprnd1))
> >> >> > -&& targetm.vectorize.can_special_div_by_const (rhs_code,
> >> >> vectype,
> >> >> > -   wi::to_wide
> (cst),
> >> >> > -   NULL,
> NULL_RTX,
> >> >> > -   NULL_RTX))
> >> >> > +&& TYPE_UNSIGNED (itype)
> >> >> > +&& rhs_code == TRUNC_DIV_EXPR
> >> >> > +&& vectype
> >> >> > +&& direct_internal_fn_supported_p (IFN_ADDH, vectype,
> >> >> > +   OPTIMIZE_FOR_SPEED))
> >> >> >  {
> >> >> > -  return NULL;
> >> >> > +  /* div optimizations using narrowings
> >> >> > +   we can do the division e.g. shorts by 255 faster by 
> >> >> > calculating it
> as
> >> >> > +   (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
> >> >> > +   double the precision of x.
> >> >> > +
> >> >> > +   If we imagine a short as being composed of two blocks of
> >> >> > + bytes
> >> then
> >> >> > +   adding 257 or 0b_0001__0001 to the number is
> >> >> > + equivalent
> >> to
> >> >> > +   adding 1 to each sub component:
> >> >> > +
> >> >> > + short value of 16-bits
> >> >> > +   ┌──┬┐
> >> >> > +   │  ││
> >> >> > +   └──┴┘
> >> >> > +  8-bit part1 ▲  8-bit part2   ▲
> >> >> > +  ││
> >> >> > +  ││
> >> >> > + +1   +1
> >> >> > +
> >> >> > +   after the first addition, we have to shift right by 8, and 
> >> >> > narrow
> the
> >> >> > +   results back to a byte.  Remember that the addition must
> >> >> > + be done
> >> in
> >> >> > +   double the precision of the input.  However if we know
> >> >> > + that the
> >> >> addition
> >> >> > +   `x + 257` does not overflow then we can do the operation
> >> >> > + in the
> >> >> current
> >> >> > +   precision.  In which case we don't need the pack and unpacks.
> */
> >> >> > +  auto wcst = wi::to_wide (cst);
> >> >> > +  int pow = wi::exact_log2 (wcst + 1);
> >> >> > +  if (pow == (int) (element_precision (vectype) / 2))
> >> >> > + {
> >> >> > +   wide_int min,max;
> >> >> > +   /* If we're in a pattern we need to find the orginal
> definition.  */
> >> >> > +   tree op0 = oprnd0;
> >> >> > +   gimple *stmt = SSA_NAME_DEF_STMT (oprnd0);
> >> >> > +   stmt_vec_info stmt_info = vinfo->lookup_stmt (stmt);
> >> >> > +   if (is_pattern_stmt_p (stmt_info))
> >> >> > + {
> >> >> > +   auto orig_stmt = STMT_VINFO_RELATED_STMT
> (stmt_info);
> >> >> > +   if (is_gimple_assign (STMT_VINFO_STMT (orig_stmt)))
> >> >> > + op0 = gimple_assign_lhs (STMT_VINFO_STMT
> (orig_stmt));
> >> >> > + }
> >> >>
> >> >> If this is generally safe (I'm skipping thinking about it in the
> >> >> interests of a quick review :-)), then I think it should be done
> >> >> in vect_get_range_info instead.  Using gimple_get_lhs would be
> >> >> more general than handling just assignments.
> >> >>
> >> >> > +
> >> >> > +   /* Check that no overflow will occur.  If we don't have range
> >> >> > +  information we can't perform the optimization.  */
> >> >> > +   if (vect_get_range_info (op0, , ))
> >> >> > + {
> >> >> > +   wide_int one = wi::to_wide (build_one_cst (itype));
> >> >> > +   wide_int adder = wi::add (one, wi::lshift (one, pow));
> >> >> > +   wi::overflow_type ovf;
> >> >> > +   /* We need adder and max in the same precision.  */
> >> >> > +   wide_int zadder
> >> 

Re: Ping^3: [PATCH] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2023-02-10 Thread Jakub Jelinek via Gcc-patches
On Mon, Sep 26, 2022 at 06:27:25PM -0400, Lewis Hyatt via Gcc-patches wrote:
> May I please ping this patch again? Joseph suggested that it would be best if
> a C++ maintainer has a look at it. This is one of just a few places left where
> we don't handle UTF-8 properly in libcpp, it would be really nice to get them
> fixed up if there is time to review this patch. Thanks!

CCing them.

Just some nits from me, but I agree C++ maintainers are the best reviewers
for this.

> libcpp/ChangeLog:
> 
>   PR preprocessor/103902
>   * lex.cc (identifier_diagnostics_on_lex): New function refactors
>   common code from...
>   (lex_identifier_intern): ...here, and...
>   (lex_identifier): ...here.
>   (struct scan_id_result): New struct to hold the result of...

I'd just write
(struct scan_id_result): New type.
or similar, no need to explain what it will be used for.

>   (scan_cur_identifier): ...new function.

So just New function here too.

>   (create_literal2): New function.
>   (is_macro): Removed function that is now handled directly in
>   lex_string() and lex_raw_string().
>   (is_macro_not_literal_suffix): Likewise.
>   (lit_accum::create_literal2): New function.
>   (lex_raw_string): Make use of new function scan_cur_identifier().
>   (lex_string): Likewise.

> +/* Helper function to perform diagnostics that are needed (rarely)
> +   when an identifier is lexed.  */
> +static void identifier_diagnostics_on_lex (cpp_reader *pfile,
> +cpp_hashnode *node)

Formatting, function name should be at the start of line, so
static void
identifier_diagnostics_on_lex (cpp_reader *pfile, cpp_hashnode *node)

> +{
> +  if (__builtin_expect (!(node->flags & NODE_DIAGNOSTIC)
> + || pfile->state.skipping, 1))
> +return;
> +
> +  /* It is allowed to poison the same identifier twice.  */
> +  if ((node->flags & NODE_POISONED) && !pfile->state.poisoned_ok)
> +cpp_error (pfile, CPP_DL_ERROR, "attempt to use poisoned \"%s\"",
> +NODE_NAME (node));
> +
> +  /* Constraint 6.10.3.5: __VA_ARGS__; should only appear in the
> + replacement list of a variadic macro.  */
> +  if (node == pfile->spec_nodes.n__VA_ARGS__
> +  && !pfile->state.va_args_ok)
> +{
> +  if (CPP_OPTION (pfile, cplusplus))
> + cpp_error (pfile, CPP_DL_PEDWARN,
> +"__VA_ARGS__ can only appear in the expansion"
> +" of a C++11 variadic macro");
> +  else
> + cpp_error (pfile, CPP_DL_PEDWARN,
> +"__VA_ARGS__ can only appear in the expansion"
> +" of a C99 variadic macro");
> +}
> +

Perhaps add here the:
+  /* __VA_OPT__ should only appear in the replacement list of a
+variadic macro.  */
comment that used to be present only in the second occurrence of
all this.

> +  if (node == pfile->spec_nodes.n__VA_OPT__)
> +maybe_va_opt_error (pfile);
> +
> +  /* For -Wc++-compat, warn about use of C++ named operators.  */
> +  if (node->flags & NODE_WARN_OPERATOR)
> +cpp_warning (pfile, CPP_W_CXX_OPERATOR_NAMES,
> +  "identifier \"%s\" is a special operator name in C++",
> +  NODE_NAME (node));
> +}
> +
> +/* Helper function to scan an entire identifier beginning at
> +   pfile->buffer->cur, and possibly containing extended characters (UCNs
> +   and/or UTF-8).  Returns the cpp_hashnode for the identifier on success, or
> +   else nullptr, as well as a normalize_state so that normalization warnings
> +   may be issued once the token lexing is complete.  */

This looks like a function comment that should be immediately above
scan_cur_identifier, there might be another comment above struct
scan_id_result which would just explain the purpose of the class.

> +
> +struct scan_id_result
> +{
> +  cpp_hashnode *node;
> +  normalize_state nst;
> +
> +  scan_id_result ()
> +: node (nullptr)
> +  {
> +nst = INITIAL_NORMALIZE_STATE;
> +  }
> +
> +  explicit operator bool () const { return node; }
> +};
> +
> +static scan_id_result
> +scan_cur_identifier (cpp_reader *pfile)
> +{

> @@ -2741,26 +2800,53 @@ lex_raw_string (cpp_reader *pfile, cpp_token *token, 
> const uchar *base)
>  
>if (CPP_OPTION (pfile, user_literals))
>  {
> -  /* If a string format macro, say from inttypes.h, is placed touching
> -  a string literal it could be parsed as a C++11 user-defined string
> -  literal thus breaking the program.  */
> -  if (is_macro_not_literal_suffix (pfile, pos))
> - {
> -   /* Raise a warning, but do not consume subsequent tokens.  */
> -   if (CPP_OPTION (pfile, warn_literal_suffix) && !pfile->state.skipping)
> - cpp_warning_with_line (pfile, CPP_W_LITERAL_SUFFIX,
> -token->src_loc, 0,
> -"invalid suffix on literal; C++11 requires "
> -"a space between literal 

Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Friday, February 10, 2023 3:57 PM
>> To: Tamar Christina 
>> Cc: Tamar Christina via Gcc-patches ; nd
>> ; rguent...@suse.de; j...@ventanamicro.com
>> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
>> by using new optabs [PR108583]
>> 
>> Tamar Christina  writes:
>> >> > a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index
>> >> >
>> >>
>> 6934aebc69f231af24668f0a1c3d140e97f55487..e39d7e6b362ef44eb2fc467f33
>> >> 69
>> >> > de2afea139d6 100644
>> >> > --- a/gcc/tree-vect-patterns.cc
>> >> > +++ b/gcc/tree-vect-patterns.cc
>> >> > @@ -3914,12 +3914,82 @@ vect_recog_divmod_pattern (vec_info
>> *vinfo,
>> >> >return pattern_stmt;
>> >> >  }
>> >> >else if ((cst = uniform_integer_cst_p (oprnd1))
>> >> > -  && targetm.vectorize.can_special_div_by_const (rhs_code,
>> >> vectype,
>> >> > - wi::to_wide 
>> >> > (cst),
>> >> > - NULL, 
>> >> > NULL_RTX,
>> >> > - NULL_RTX))
>> >> > +  && TYPE_UNSIGNED (itype)
>> >> > +  && rhs_code == TRUNC_DIV_EXPR
>> >> > +  && vectype
>> >> > +  && direct_internal_fn_supported_p (IFN_ADDH, vectype,
>> >> > + OPTIMIZE_FOR_SPEED))
>> >> >  {
>> >> > -  return NULL;
>> >> > +  /* div optimizations using narrowings
>> >> > +   we can do the division e.g. shorts by 255 faster by calculating 
>> >> > it as
>> >> > +   (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
>> >> > +   double the precision of x.
>> >> > +
>> >> > +   If we imagine a short as being composed of two blocks of bytes
>> then
>> >> > +   adding 257 or 0b_0001__0001 to the number is equivalent
>> to
>> >> > +   adding 1 to each sub component:
>> >> > +
>> >> > +   short value of 16-bits
>> >> > +   ┌──┬┐
>> >> > +   │  ││
>> >> > +   └──┴┘
>> >> > +8-bit part1 ▲  8-bit part2   ▲
>> >> > +││
>> >> > +││
>> >> > +   +1   +1
>> >> > +
>> >> > +   after the first addition, we have to shift right by 8, and 
>> >> > narrow the
>> >> > +   results back to a byte.  Remember that the addition must be done
>> in
>> >> > +   double the precision of the input.  However if we know that
>> >> > + the
>> >> addition
>> >> > +   `x + 257` does not overflow then we can do the operation in
>> >> > + the
>> >> current
>> >> > +   precision.  In which case we don't need the pack and unpacks.  
>> >> > */
>> >> > +  auto wcst = wi::to_wide (cst);
>> >> > +  int pow = wi::exact_log2 (wcst + 1);
>> >> > +  if (pow == (int) (element_precision (vectype) / 2))
>> >> > +   {
>> >> > + wide_int min,max;
>> >> > + /* If we're in a pattern we need to find the orginal 
>> >> > definition.  */
>> >> > + tree op0 = oprnd0;
>> >> > + gimple *stmt = SSA_NAME_DEF_STMT (oprnd0);
>> >> > + stmt_vec_info stmt_info = vinfo->lookup_stmt (stmt);
>> >> > + if (is_pattern_stmt_p (stmt_info))
>> >> > +   {
>> >> > + auto orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
>> >> > + if (is_gimple_assign (STMT_VINFO_STMT (orig_stmt)))
>> >> > +   op0 = gimple_assign_lhs (STMT_VINFO_STMT (orig_stmt));
>> >> > +   }
>> >>
>> >> If this is generally safe (I'm skipping thinking about it in the
>> >> interests of a quick review :-)), then I think it should be done in
>> >> vect_get_range_info instead.  Using gimple_get_lhs would be more
>> >> general than handling just assignments.
>> >>
>> >> > +
>> >> > + /* Check that no overflow will occur.  If we don't have range
>> >> > +information we can't perform the optimization.  */
>> >> > + if (vect_get_range_info (op0, , ))
>> >> > +   {
>> >> > + wide_int one = wi::to_wide (build_one_cst (itype));
>> >> > + wide_int adder = wi::add (one, wi::lshift (one, pow));
>> >> > + wi::overflow_type ovf;
>> >> > + /* We need adder and max in the same precision.  */
>> >> > + wide_int zadder
>> >> > +   = wide_int_storage::from (adder, wi::get_precision 
>> >> > (max),
>> >> > + UNSIGNED);
>> >> > + wi::add (max, zadder, UNSIGNED, );
>> >>
>> >> Could you explain this a bit more?  When do we have mismatched
>> >> precisions?
>> >
>> > C promotion rules will promote e.g.
>> >
>> > void fun2(uint8_t* restrict pixel, uint8_t level, int n) {
>> >   for (int i = 0; i < n; i+=1)
>> > pixel[i] 

Re: [RFC/RFT 0/3] Add compiler support for Control Flow Integrity

2023-02-10 Thread Dan Li via Gcc-patches
On 02/08, Peter Collingbourne wrote:
> On Sun, Dec 18, 2022 at 10:06 PM Dan Li  wrote:
> >
> > This series of patches is mainly used to support the control flow
> > integrity protection of the linux kernel [1], which is similar to
> > -fsanitize=kcfi in clang 16.0 [2,3].
> >
> > I hope that this feature will also support user-mode CFI in the
> > future (at least for developers who can recompile the runtime),
> > so I use -fsanitize=cfi as a compilation option here.
> 
> Please don't. The various CFI-related build flags are confusing enough
> without also having this inconsistency between Clang and GCC.

Hi Peter,

Got it, as discussed before[1], in the next version I will use the same
compile option.

[1]. 
https://patchwork.kernel.org/project/linux-arm-kernel/patch/20221219061758.23321-1-ashimida.1...@gmail.com/

Thanks,
Dan.

> 
> Peter


Re: [RFC/RFT 0/3] Add compiler support for Control Flow Integrity

2023-02-10 Thread Dan Li via Gcc-patches
On 02/09, Hongtao Liu wrote:
> On Mon, Dec 19, 2022 at 3:59 PM Dan Li via Gcc-patches
>  wrote:
> >
> > This series of patches is mainly used to support the control flow
> > integrity protection of the linux kernel [1], which is similar to
> > -fsanitize=kcfi in clang 16.0 [2,3].
> >
> > I hope that this feature will also support user-mode CFI in the
> > future (at least for developers who can recompile the runtime),
> > so I use -fsanitize=cfi as a compilation option here.
> >
> > Any suggestion please let me know :).
> Do you have this series as a branch somewhere that we could also try for x86?

Hi Hongtao,

I haven't tried this feature on the x86 platform, if possible, I will try it in
the next version.

Thanks,
Dan.

> --
> BR,
> Hongtao


RE: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, February 10, 2023 3:57 PM
> To: Tamar Christina 
> Cc: Tamar Christina via Gcc-patches ; nd
> ; rguent...@suse.de; j...@ventanamicro.com
> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
> by using new optabs [PR108583]
> 
> Tamar Christina  writes:
> >> > a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index
> >> >
> >>
> 6934aebc69f231af24668f0a1c3d140e97f55487..e39d7e6b362ef44eb2fc467f33
> >> 69
> >> > de2afea139d6 100644
> >> > --- a/gcc/tree-vect-patterns.cc
> >> > +++ b/gcc/tree-vect-patterns.cc
> >> > @@ -3914,12 +3914,82 @@ vect_recog_divmod_pattern (vec_info
> *vinfo,
> >> >return pattern_stmt;
> >> >  }
> >> >else if ((cst = uniform_integer_cst_p (oprnd1))
> >> > -   && targetm.vectorize.can_special_div_by_const (rhs_code,
> >> vectype,
> >> > -  wi::to_wide 
> >> > (cst),
> >> > -  NULL, 
> >> > NULL_RTX,
> >> > -  NULL_RTX))
> >> > +   && TYPE_UNSIGNED (itype)
> >> > +   && rhs_code == TRUNC_DIV_EXPR
> >> > +   && vectype
> >> > +   && direct_internal_fn_supported_p (IFN_ADDH, vectype,
> >> > +  OPTIMIZE_FOR_SPEED))
> >> >  {
> >> > -  return NULL;
> >> > +  /* div optimizations using narrowings
> >> > +   we can do the division e.g. shorts by 255 faster by calculating 
> >> > it as
> >> > +   (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
> >> > +   double the precision of x.
> >> > +
> >> > +   If we imagine a short as being composed of two blocks of bytes
> then
> >> > +   adding 257 or 0b_0001__0001 to the number is equivalent
> to
> >> > +   adding 1 to each sub component:
> >> > +
> >> > +short value of 16-bits
> >> > +   ┌──┬┐
> >> > +   │  ││
> >> > +   └──┴┘
> >> > + 8-bit part1 ▲  8-bit part2   ▲
> >> > + ││
> >> > + ││
> >> > ++1   +1
> >> > +
> >> > +   after the first addition, we have to shift right by 8, and 
> >> > narrow the
> >> > +   results back to a byte.  Remember that the addition must be done
> in
> >> > +   double the precision of the input.  However if we know that
> >> > + the
> >> addition
> >> > +   `x + 257` does not overflow then we can do the operation in
> >> > + the
> >> current
> >> > +   precision.  In which case we don't need the pack and unpacks.  */
> >> > +  auto wcst = wi::to_wide (cst);
> >> > +  int pow = wi::exact_log2 (wcst + 1);
> >> > +  if (pow == (int) (element_precision (vectype) / 2))
> >> > +{
> >> > +  wide_int min,max;
> >> > +  /* If we're in a pattern we need to find the orginal 
> >> > definition.  */
> >> > +  tree op0 = oprnd0;
> >> > +  gimple *stmt = SSA_NAME_DEF_STMT (oprnd0);
> >> > +  stmt_vec_info stmt_info = vinfo->lookup_stmt (stmt);
> >> > +  if (is_pattern_stmt_p (stmt_info))
> >> > +{
> >> > +  auto orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
> >> > +  if (is_gimple_assign (STMT_VINFO_STMT (orig_stmt)))
> >> > +op0 = gimple_assign_lhs (STMT_VINFO_STMT (orig_stmt));
> >> > +}
> >>
> >> If this is generally safe (I'm skipping thinking about it in the
> >> interests of a quick review :-)), then I think it should be done in
> >> vect_get_range_info instead.  Using gimple_get_lhs would be more
> >> general than handling just assignments.
> >>
> >> > +
> >> > +  /* Check that no overflow will occur.  If we don't have range
> >> > + information we can't perform the optimization.  */
> >> > +  if (vect_get_range_info (op0, , ))
> >> > +{
> >> > +  wide_int one = wi::to_wide (build_one_cst (itype));
> >> > +  wide_int adder = wi::add (one, wi::lshift (one, pow));
> >> > +  wi::overflow_type ovf;
> >> > +  /* We need adder and max in the same precision.  */
> >> > +  wide_int zadder
> >> > += wide_int_storage::from (adder, wi::get_precision 
> >> > (max),
> >> > +  UNSIGNED);
> >> > +  wi::add (max, zadder, UNSIGNED, );
> >>
> >> Could you explain this a bit more?  When do we have mismatched
> >> precisions?
> >
> > C promotion rules will promote e.g.
> >
> > void fun2(uint8_t* restrict pixel, uint8_t level, int n) {
> >   for (int i = 0; i < n; i+=1)
> > pixel[i] = (pixel[i] + level) / 0xff; }
> >
> > And have the addition be done as a 32 bit integer.  The vectorizer
> > 

Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> > a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index
>> >
>> 6934aebc69f231af24668f0a1c3d140e97f55487..e39d7e6b362ef44eb2fc467f33
>> 69
>> > de2afea139d6 100644
>> > --- a/gcc/tree-vect-patterns.cc
>> > +++ b/gcc/tree-vect-patterns.cc
>> > @@ -3914,12 +3914,82 @@ vect_recog_divmod_pattern (vec_info *vinfo,
>> >return pattern_stmt;
>> >  }
>> >else if ((cst = uniform_integer_cst_p (oprnd1))
>> > - && targetm.vectorize.can_special_div_by_const (rhs_code,
>> vectype,
>> > -wi::to_wide (cst),
>> > -NULL, NULL_RTX,
>> > -NULL_RTX))
>> > + && TYPE_UNSIGNED (itype)
>> > + && rhs_code == TRUNC_DIV_EXPR
>> > + && vectype
>> > + && direct_internal_fn_supported_p (IFN_ADDH, vectype,
>> > +OPTIMIZE_FOR_SPEED))
>> >  {
>> > -  return NULL;
>> > +  /* div optimizations using narrowings
>> > +   we can do the division e.g. shorts by 255 faster by calculating it 
>> > as
>> > +   (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
>> > +   double the precision of x.
>> > +
>> > +   If we imagine a short as being composed of two blocks of bytes then
>> > +   adding 257 or 0b_0001__0001 to the number is equivalent to
>> > +   adding 1 to each sub component:
>> > +
>> > +  short value of 16-bits
>> > +   ┌──────────────┬──────────────┐
>> > +   │              │              │
>> > +   └──────────────┴──────────────┘
>> > +     8-bit part1 ▲  8-bit part2 ▲
>> > +                 │              │
>> > +                 │              │
>> > +                +1             +1
>> > +
>> > +   after the first addition, we have to shift right by 8, and narrow 
>> > the
>> > +   results back to a byte.  Remember that the addition must be done in
>> > +   double the precision of the input.  However if we know that the
>> addition
>> > +   `x + 257` does not overflow then we can do the operation in the
>> current
>> > +   precision.  In which case we don't need the pack and unpacks.  */
>> > +  auto wcst = wi::to_wide (cst);
>> > +  int pow = wi::exact_log2 (wcst + 1);
>> > +  if (pow == (int) (element_precision (vectype) / 2))
>> > +  {
>> > +wide_int min,max;
>> > +/* If we're in a pattern we need to find the original definition.  */
>> > +tree op0 = oprnd0;
>> > +gimple *stmt = SSA_NAME_DEF_STMT (oprnd0);
>> > +stmt_vec_info stmt_info = vinfo->lookup_stmt (stmt);
>> > +if (is_pattern_stmt_p (stmt_info))
>> > +  {
>> > +auto orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
>> > +if (is_gimple_assign (STMT_VINFO_STMT (orig_stmt)))
>> > +  op0 = gimple_assign_lhs (STMT_VINFO_STMT (orig_stmt));
>> > +  }
>> 
>> If this is generally safe (I'm skipping thinking about it in the interests 
>> of a
>> quick review :-)), then I think it should be done in vect_get_range_info
>> instead.  Using gimple_get_lhs would be more general than handling just
>> assignments.
>> 
>> > +
>> > +/* Check that no overflow will occur.  If we don't have range
>> > +   information we can't perform the optimization.  */
>> > +if (vect_get_range_info (op0, , ))
>> > +  {
>> > +wide_int one = wi::to_wide (build_one_cst (itype));
>> > +wide_int adder = wi::add (one, wi::lshift (one, pow));
>> > +wi::overflow_type ovf;
>> > +/* We need adder and max in the same precision.  */
>> > +wide_int zadder
>> > +  = wide_int_storage::from (adder, wi::get_precision (max),
>> > +UNSIGNED);
>> > +wi::add (max, zadder, UNSIGNED, );
>> 
>> Could you explain this a bit more?  When do we have mismatched
>> precisions?
>
> C promotion rules will promote e.g.
>
> void fun2(uint8_t* restrict pixel, uint8_t level, int n)
> {
>   for (int i = 0; i < n; i+=1)
> pixel[i] = (pixel[i] + level) / 0xff;
> }
>
> And have the addition be done as a 32 bit integer.  The vectorizer will 
> demote this down
> to a short, but range information is not stored for patterns.  So in the 
> above the range will
> correctly be 0x1fe but the precision will be that of the original expression, 
> so 32.  This will
> be a mismatch with itype which is derived from the size the vectorizer will 
> perform the
> operation in.

Gah, missed this first time round, sorry.

Richi would know better than me, but I think it's dangerous to rely on
the orig/pattern link for range information.  The end result of a pattern
(vect_stmt_to_vectorize) has to have the same type as the lhs of the
original statement.  But the other statements in the pattern sequence
can do arbitrary things.  Their range isn't predictable from the range
of the original statement result.

IIRC, the addition above 
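
As an aside, the identity under discussion is easy to check exhaustively for
16-bit inputs; a minimal standalone sketch (not part of the patch):

    #include <cassert>
    #include <cstdint>

    int main ()
    {
      for (uint32_t x = 0; x <= UINT16_MAX; x++)
        // The additions happen in 32 bits, so x + 257 cannot overflow.
        assert (((x + ((x + 257) >> 8)) >> 8) == x / 0xff);
    }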

Re: [PATCH] libgomp, openmp: pinned memory

2023-02-10 Thread Andrew Stubbs

On 10/02/2023 15:11, Thomas Schwinge wrote:

Hi!

Re OpenMP 'pinned' memory allocator trait semantics vs. 'omp_realloc':

On 2022-01-13T13:53:03+, Andrew Stubbs  wrote:

On 05/01/2022 17:07, Andrew Stubbs wrote:

[...], I'm working on an implementation using mmap instead of malloc
for pinned allocations.  [...]



This means that large allocations will now be page aligned and therefore
pin the smallest number of pages for the size requested, and that that
memory will be unpinned automatically when freed via munmap, or moved
via mremap.



--- /dev/null
+++ b/libgomp/config/linux/allocator.c



+static void *
+linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
+ size_t oldsize, size_t size, int oldpin, int pin)
+{
+  if (oldpin && pin)
+{
+  void *newaddr = mremap (addr, oldsize, size, MREMAP_MAYMOVE);
+  if (newaddr == MAP_FAILED)
+ return NULL;
+
+  return newaddr;
+}
+  else if (oldpin || pin)
+{
+  void *newaddr = linux_memspace_alloc (memspace, size, pin);
+  if (newaddr)
+ {
+   memcpy (newaddr, addr, oldsize < size ? oldsize : size);
+   linux_memspace_free (memspace, addr, oldsize, oldpin);
+ }
+
+  return newaddr;
+}
+  else
+return realloc (addr, size);
+}


I did wonder if 'mremap' with 'MREMAP_MAYMOVE' is really acceptable here,
given OpenMP 5.2, 6.2 "Memory Allocators": "Allocators with the 'pinned'
trait defined to be 'true' ensure that their allocations remain in the
same storage resource at the same location for their entire lifetime."
I'd have read into this that 'realloc' may shrink or enlarge the region
(unless even that considered faulty), but the region must not be moved
("same location"), thus no 'MREMAP_MAYMOVE'; see 'man 2 mremap'


I don't think the OpenMP specification really means that any program 
using omp_realloc should abort randomly depending on the vagaries of 
chaos? What are we supposed to do? Hugely over-allocate in case realloc 
is ever called?


Andrew


Re: [PATCH 3/5] openmp, nvptx: ompx_unified_shared_mem_alloc

2023-02-10 Thread Andrew Stubbs

On 10/02/2023 14:21, Thomas Schwinge wrote:

Is the correct fix the following (conceptually like
'linux_memspace_alloc' cited above), or is there something that I fail to
understand?

  static void *
  linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int 
pin)
  {
if (memspace == ompx_unified_shared_mem_space)
  {
void *ret = gomp_usm_alloc (size, GOMP_DEVICE_ICV);
memset (ret, 0, size);
return ret;
  }
 -  else if (memspace == ompx_unified_shared_mem_space
 -  || pin)
 +  else if (pin)
  return linux_memspace_alloc (memspace, size, pin);
else
  return calloc (1, size);


Yes, I think that is what was intended (and what actually happens). You 
can't have your memory both unified and pinned (well, maybe it's possible, 
but there's no single CUDA API for that), so the USM takes precedence.
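
For illustration, the intended usage is something like the following sketch
(it assumes a libgomp with this series applied and a device with USM):

    #include <omp.h>

    void demo (void)
    {
      double *a = (double *) omp_alloc (1024 * sizeof (double),
                                        ompx_unified_shared_mem_alloc);
      #pragma omp target
      a[0] = 42.0;   /* host and device may both dereference 'a' under USM */
      omp_free (a, ompx_unified_shared_mem_alloc);
    }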



The following ones then again are conceptually like
'linux_memspace_alloc' cited above:


@@ -77,9 +86,9 @@ static void
  linux_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size,
int pin)
  {
-  (void)memspace;
-
-  if (pin)
+  if (memspace == ompx_unified_shared_mem_space)
+gomp_usm_free (addr, GOMP_DEVICE_ICV);
+  else if (pin)
  munmap (addr, size);
else
  free (addr);
@@ -89,7 +98,9 @@ static void *
  linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
   size_t oldsize, size_t size, int oldpin, int pin)
  {
-  if (oldpin && pin)
+  if (memspace == ompx_unified_shared_mem_space)
+goto manual_realloc;
+  else if (oldpin && pin)
  {
void *newaddr = mremap (addr, oldsize, size, MREMAP_MAYMOVE);
if (newaddr == MAP_FAILED)
@@ -98,18 +109,19 @@ linux_memspace_realloc (omp_memspace_handle_t memspace, 
void *addr,
[...]


Yes.

..., and similar those here:


--- a/libgomp/config/nvptx/allocator.c
+++ b/libgomp/config/nvptx/allocator.c
@@ -125,6 +125,8 @@ nvptx_memspace_alloc (omp_memspace_handle_t memspace, 
size_t size)
__atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, 
MEMMODEL_RELEASE);
return result;
  }
+  else if (memspace == ompx_host_mem_space)
+return NULL;
else
  return malloc (size);
  }
@@ -145,6 +147,8 @@ nvptx_memspace_calloc (omp_memspace_handle_t memspace, 
size_t size)

return result;
  }
+  else if (memspace == ompx_host_mem_space)
+return NULL;
else
  return calloc (1, size);
  }
@@ -354,6 +358,8 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, 
void *addr,
   }
return result;
  }
+  else if (memspace == ompx_host_mem_space)
+return NULL;
else
  return realloc (addr, size);
  }


(I'd have added an explicit no-op (or, 'abort'?) to
'nvptx_memspace_free', but that's maybe just me...)  ;-\


Why? The host memspace is just the regular heap, which can be a thing on 
any device. It's an extension though so we can define it either way.



--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h



+extern void * gomp_usm_alloc (size_t size, int device_num);
+extern void gomp_usm_free (void *device_ptr, int device_num);
+extern bool gomp_is_usm_ptr (void *ptr);


'gomp_is_usm_ptr' isn't defined/used anywhere; I'll remove it.


I think I started that and then decided against. Thanks.


--- a/libgomp/target.c
+++ b/libgomp/target.c



@@ -3740,6 +3807,9 @@ gomp_load_plugin_for_device (struct gomp_device_descr 
*device,
DLSYM (unload_image);
DLSYM (alloc);
DLSYM (free);
+  DLSYM_OPT (usm_alloc, usm_alloc);
+  DLSYM_OPT (usm_free, usm_free);
+  DLSYM_OPT (is_usm_ptr, is_usm_ptr);
DLSYM (dev2host);
DLSYM (host2dev);


As a sanity check, shouldn't we check that either none or all three of
those are defined, like in the 'if (cuda && cuda != 4) { [error] }' check
a bit further down?


This is only going to happen when somebody writes a new plugin, and then 
they'll discover very quickly that there are issues. I've wasted more 
time writing this sentence than it's worth already. :)



Note that these remarks likewise apply to the current upstream
submission:
>
 "openmp, nvptx: ompx_unified_shared_mem_alloc".


I have new patches to heap on top of this set already on OG12, and more 
planned, plus these ones you're working on; the whole patchset is going 
to have to get a rebase, squash, and tidy "soonish".


Andrew


Re: [PATCH 1/2] Handle component_ref to a structre/union field including flexible array member [PR101832]

2023-02-10 Thread Qing Zhao via Gcc-patches
Thanks, Kees.

If there is no objection, I will update my patches with this and send the 
updated patches soon.

Qing

> On Feb 9, 2023, at 11:46 AM, Kees Cook  wrote:
> 
> On Thu, Feb 09, 2023 at 02:40:57PM +, Qing Zhao wrote:
>> So, the major question here is:
>> 
>> in addition to the C99 standard flexible array member [ ], shall we include 
>> [0], [1] or even [4] into this extension, and treat the structure with a 
>> trailing [0], [1], or [4] embedded into another structure/union still as 
>> flexible-sized?
>> 
>> I think that we might need to limit this extension ONLY to C99 standard FAM 
>> [ ].  All other [0], [1], or [4] should be excluded from this extension. The 
>> reasons are:
>> 
>> 1. The real usages of such GCC extension (embedding structure with FAM into 
>> another structure/union), as my understanding, the old glibc’s <_G_config.h> 
>> (https://gcc.gnu.org/legacy-ml/gcc-patches/2002-08/msg01149.html), and the 
>> bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832, ONLY involved C99 
>> standard FAM;
>> 
>> 2. Embedding a structure with C99 FAM [] into the end of another structure, 
>> and still treat it flexible sized might have more usages, and as discussed 
>> with Kees, it might be reasonable to promote this into a  C standard later 
>> if needed.
>> 
>> So, based on this consideration, I think I should only document the 
>> following as GCC extension:
>> 
>> struct flex  { int length; char data[ ]; };
>> struct out_flex { int m; struct flex flex_data; };
>> 
>> Issue warnings for the following: (when the structure is not at the end)
>> 
>> struct out_flex_mid  {  struct flex flex_data; int m};
>> 
>> 
>> However, for the trailing [0], [1], or [4], when such a structure is embedded 
>> at the end of another structure, we should NOT treat the outer structure 
>> as flexible-sized. 
>> Logically, we will NOT issue warnings when such a structure is not at the end. 
>> 
>> Let me know if you have any comment or suggestions.
> 
> FWIW this all sounds correct to me.
> 
> -- 
> Kees Cook
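
To make the documented extension concrete, a hedged sketch of the intended
usage (the sizes are arbitrary):

    #include <stdlib.h>

    struct flex { int length; char data[]; };
    struct out_flex { int m; struct flex flex_data; }; /* still flexible-sized */

    void demo (void)
    {
      struct out_flex *p
        = (struct out_flex *) malloc (sizeof (struct out_flex) + 10);
      if (!p)
        return;
      p->flex_data.length = 10;
      p->flex_data.data[9] = 'x'; /* OK: the trailing storage belongs to data[] */
      free (p);
    }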



Re: [PATCH] libgomp, openmp: pinned memory

2023-02-10 Thread Thomas Schwinge
Hi!

Re OpenMP 'pinned' memory allocator trait semantics vs. 'omp_realloc':

On 2022-01-13T13:53:03+, Andrew Stubbs  wrote:
> On 05/01/2022 17:07, Andrew Stubbs wrote:
>> [...], I'm working on an implementation using mmap instead of malloc
>> for pinned allocations.  [...]

> This means that large allocations will now be page aligned and therefore
> pin the smallest number of pages for the size requested, and that that
> memory will be unpinned automatically when freed via munmap, or moved
> via mremap.

> --- /dev/null
> +++ b/libgomp/config/linux/allocator.c

> +static void *
> +linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
> + size_t oldsize, size_t size, int oldpin, int pin)
> +{
> +  if (oldpin && pin)
> +{
> +  void *newaddr = mremap (addr, oldsize, size, MREMAP_MAYMOVE);
> +  if (newaddr == MAP_FAILED)
> + return NULL;
> +
> +  return newaddr;
> +}
> +  else if (oldpin || pin)
> +{
> +  void *newaddr = linux_memspace_alloc (memspace, size, pin);
> +  if (newaddr)
> + {
> +   memcpy (newaddr, addr, oldsize < size ? oldsize : size);
> +   linux_memspace_free (memspace, addr, oldsize, oldpin);
> + }
> +
> +  return newaddr;
> +}
> +  else
> +return realloc (addr, size);
> +}

I did wonder if 'mremap' with 'MREMAP_MAYMOVE' is really acceptable here,
given OpenMP 5.2, 6.2 "Memory Allocators": "Allocators with the 'pinned'
trait defined to be 'true' ensure that their allocations remain in the
same storage resource at the same location for their entire lifetime."
I'd have read into this that 'realloc' may shrink or enlarge the region
(unless even that considered faulty), but the region must not be moved
("same location"), thus no 'MREMAP_MAYMOVE'; see 'man 2 mremap'
(2019-03-06):

'MREMAP_MAYMOVE'
By  default, if there is not sufficient space to expand a mapping at 
its current location, then 'mremap()' fails.  If this flag is specified, then 
the kernel is permitted to relocate the mapping to a new virtual address, if 
necessary.  If the mapping is relocated, then absolute pointers into the old 
mapping location become invalid (offsets relative to the starting address of 
the mapping should be employed).

..., but then I saw that OpenMP 5.2, 18.13.9 'omp_realloc' is specified
such that it isn't expected to 'realloc' in-place, but rather it
"deallocates previously allocated memory and requests a memory
allocation", which I understand that it does end a "lifetime" and then
establish a new "lifetime", which means that 'MREMAP_MAYMOVE' in fact is
fine (as implemented)?
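
For concreteness, the usage in question looks roughly like this sketch
against the OpenMP 5.x allocator API (the allocator handle is deliberately
leaked for brevity):

    #include <omp.h>
    #include <stddef.h>

    void *
    grow_pinned (void *p, size_t newsize)
    {
      omp_alloctrait_t traits[] = { { omp_atk_pinned, omp_atv_true } };
      omp_allocator_handle_t al
        = omp_init_allocator (omp_default_mem_space, 1, traits);
      /* Per 18.13.9 this ends one allocation's lifetime and begins
         another, which is what makes a moving mremap arguably OK.  */
      return omp_realloc (p, newsize, al, al);
    }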


Further I read in 'man 2 mremap' (2019-03-06):

If  the  memory segment specified by *old_address* and *old_size* is locked 
(using 'mlock(2)' or similar), then this lock is maintained when the segment is 
resized and/or relocated.  As a consequence, the amount of memory locked by the 
process may change.

(The current proposed code evidently does make use of that; OK.)

But then in 'NOTES' I read:

If 'mremap()' is used to move or expand an area locked with 'mlock(2)' or 
equivalent, the 'mremap()' call will make a best effort to populate the new 
area but will not fail with 'ENOMEM' if the area cannot be populated.

What exactly is that supposed to tell us: "will make a best effort [...]
but will not fail"?  Isn't that in conflict with the earlier statement?
So can we rely on 'mremap' together with 'mlock' or not?


(This topic remains valid even if we follow through the idea of using
CUDA to register page-locked memory, because that's not available in all
configurations, and we then still want to do the 'mmap'/'mlock' thing, I
suppose.)


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


RE: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, February 10, 2023 2:31 PM
> To: Tamar Christina 
> Cc: Tamar Christina via Gcc-patches ; nd
> ; rguent...@suse.de; j...@ventanamicro.com
> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
> by using new optabs [PR108583]
> 
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Friday, February 10, 2023 1:36 PM
> >> To: Tamar Christina via Gcc-patches 
> >> Cc: Tamar Christina ; nd ;
> >> rguent...@suse.de; j...@ventanamicro.com
> >> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of
> >> div-bitmask by using new optabs [PR108583]
> >>
> >> I think I'm misunderstanding, but: it seems like we're treating the
> >> add highpart optabs as companions to the mul highpart optabs.  But
> >> AIUI, the add highpart optab is used such that, for an N-bit mode, we
> >> do an N-bit addition followed by a shift by N/2.  Is that right?
> >> The mul highpart optabs instead do a 2N-bit multiplication followed
> >> by a shift by N.
> >
> > Correct.
> >
> >>
> >> Apart from consistency, the reason this matters is: I'm not sure what
> >> we gain by adding the optab rather than simply open-coding the
> >> addition and the shift directly into the vector pattern.  It seems
> >> like the AArch64 expander in
> >> 2/2 does just do an ordinary N-bit addition followed by an ordinary
> >> shift by N/2.
> >
> > I mentioned in the implementation, but I did so because AArch64 has
> > various optimization on shifts when it comes to truncating results.  I
> > didn't need to represent it with shifts, in fact the original pattern
> > did not. But representing it directly in the final instructions are
> > problematic because these instructions are unspecs and I would have
> needed to provide additional optabs to optimize them in.
> >
> > So the shift representation was more natural for AArch64. It would not
> > be say for
> > AArch32 which does not have these optimizations already. SVE has
> > similar optimizations and at the very worse you get an usra.
> >
> > I avoided open coding it with add and shift because it creates a 4
> > instructions (and shifts which are typically slow) dependency chain
> > instead of a load and multiply.  This change, unless the target is
> > known to optimize it further is unlikely to be beneficial.  And by the
> > time we get to costing the only alternative is to undo the existing pattern
> and so you lose the general shift optimization.
> >
> > So it seemed unwise to open code as shifts, given the codegen out of
> > the vectorizer would be degenerate for most targets or one needs the
> > more complicated route of costing during pattern matching already.
> 
> Hmm, OK.  That seems like a cost-model thing though, rather than something
> that should be exposed through optabs.  And I imagine the open-coded
> version would still be better than nothing on targets without highpart
> multiply.

Yeah but I don't think we've ever done costing on patterns during matching.
It's always been commit and go under the assumption that the replacement
is always going to be cheaper.

> 
> So how about replacing the hook with one that simply asks whether division
> through highpart multiplication is preferred over the add/shift sequence?
> (Unfortunately it's not going to be possible to work that out from existing
> information.)

If Richi has no objections to it I can do that instead then..

Just to clarify, are you satisfied with the answer on the mixed precisions?

Thanks,
Tamar

> 
> Thanks,
> Richard
> 
> >
> >>
> >> Some comments in addition to Richard's:
> >>
> >> Tamar Christina via Gcc-patches  writes:
> >> > Hi All,
> >> >
> >> > As discussed in the ticket, this replaces the approach for
> >> > optimizing the div by bitmask operation from a hook into optabs
> >> > implemented through add_highpart.
> >> >
> >> > In order to be able to use this we need to check whether the
> >> > current precision has enough bits to do the operation without any
> >> > of the additions
> >> overflowing.
> >> >
> >> > We use range information to determine this and only do the
> >> > operation if we're sure an overflow won't occur.
> >> >
> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no 
> >> > issues.
> >> >
> >> > Ok for master?
> >> >
> >> > Thanks,
> >> > Tamar
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >  PR target/108583
> >> >  * doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST):
> >> Remove.
> >> >  * doc/tm.texi.in: Likewise.
> >> >  * explow.cc (round_push, align_dynamic_address): Revert previous
> >> patch.
> >> >  * expmed.cc (expand_divmod): Likewise.
> >> >  * expmed.h (expand_divmod): Likewise.
> >> >  * expr.cc (force_operand, expand_expr_divmod): Likewise.
> >> >  * optabs.cc (expand_doubleword_mod,
> >> expand_doubleword_divmod): Likewise.
> >> >  * internal-fn.def (ADDH): New.
> >> >  * optabs.def (sadd_highpart_optab, uadd_highpart_optab): New.
> >> >  * 

Re: [PATCH 2/2] c++: speculative constexpr and is_constant_evaluated [PR108243]

2023-02-10 Thread Patrick Palka via Gcc-patches
On Thu, 9 Feb 2023, Patrick Palka wrote:

> On Thu, 9 Feb 2023, Jason Merrill wrote:
> 
> > On 2/9/23 09:36, Patrick Palka wrote:
> > > On Sun, 5 Feb 2023, Jason Merrill wrote:
> > > 
> > > > On 2/3/23 15:51, Patrick Palka wrote:
> > > > > On Mon, 30 Jan 2023, Jason Merrill wrote:
> > > > > 
> > > > > > On 1/27/23 17:02, Patrick Palka wrote:
> > > > > > > This PR illustrates that __builtin_is_constant_evaluated currently
> > > > > > > acts
> > > > > > > as an optimization barrier for our speculative constexpr 
> > > > > > > evaluation,
> > > > > > > since we don't want to prematurely fold the builtin to false if 
> > > > > > > the
> > > > > > > expression in question would be later manifestly constant 
> > > > > > > evaluated
> > > > > > > (in
> > > > > > > which case it must be folded to true).
> > > > > > > 
> > > > > > > This patch fixes this by permitting 
> > > > > > > __builtin_is_constant_evaluated
> > > > > > > to get folded as false during cp_fold_function, since at that 
> > > > > > > point
> > > > > > > we're sure we're doing manifestly constant evaluation.  To that 
> > > > > > > end
> > > > > > > we add a flags parameter to cp_fold that controls what mce_value 
> > > > > > > the
> > > > > > > CALL_EXPR case passes to maybe_constant_value.
> > > > > > > 
> > > > > > > bootstrapped and regtested on x86_64-pc-linux-gnu, does this look 
> > > > > > > OK
> > > > > > > for
> > > > > > > trunk?
> > > > > > > 
> > > > > > >   PR c++/108243
> > > > > > > 
> > > > > > > gcc/cp/ChangeLog:
> > > > > > > 
> > > > > > >   * cp-gimplify.cc (enum fold_flags): Define.
> > > > > > >   (cp_fold_data::genericize): Replace this data member with ...
> > > > > > >   (cp_fold_data::fold_flags): ... this.
> > > > > > >   (cp_fold_r): Adjust cp_fold_data use and cp_fold_calls.
> > > > > > >   (cp_fold_function): Likewise.
> > > > > > >   (cp_fold_maybe_rvalue): Likewise.
> > > > > > >   (cp_fully_fold_init): Likewise.
> > > > > > >   (cp_fold): Add fold_flags parameter.  Don't cache if flags
> > > > > > >   isn't empty.
> > > > > > >   <CALL_EXPR>: Pass mce_false to maybe_constant_value
> > > > > > >   if ff_genericize is set.
> > > > > > > 
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > 
> > > > > > >   * g++.dg/opt/pr108243.C: New test.
> > > > > > > ---
> > > > > > > gcc/cp/cp-gimplify.cc   | 76
> > > > > > > ++---
> > > > > > > gcc/testsuite/g++.dg/opt/pr108243.C | 29 +++
> > > > > > > 2 files changed, 76 insertions(+), 29 deletions(-)
> > > > > > > create mode 100644 gcc/testsuite/g++.dg/opt/pr108243.C
> > > > > > > 
> > > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > > > > index a35cedd05cc..d023a63768f 100644
> > > > > > > --- a/gcc/cp/cp-gimplify.cc
> > > > > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > > > > @@ -43,12 +43,20 @@ along with GCC; see the file COPYING3.  If not
> > > > > > > see
> > > > > > > #include "omp-general.h"
> > > > > > > #include "opts.h"
> > > > > > > +/* Flags for cp_fold and cp_fold_r.  */
> > > > > > > +
> > > > > > > +enum fold_flags {
> > > > > > > +  ff_none = 0,
> > > > > > > +  /* Whether we're being called from cp_fold_function.  */
> > > > > > > +  ff_genericize = 1 << 0,
> > > > > > > +};
> > > > > > > +
> > > > > > > /* Forward declarations.  */
> > > > > > >   static tree cp_genericize_r (tree *, int *, void *);
> > > > > > > static tree cp_fold_r (tree *, int *, void *);
> > > > > > > static void cp_genericize_tree (tree*, bool);
> > > > > > > -static tree cp_fold (tree);
> > > > > > > +static tree cp_fold (tree, fold_flags);
> > > > > > >   /* Genericize a TRY_BLOCK.  */
> > > > > > > @@ -996,9 +1004,8 @@ struct cp_genericize_data
> > > > > > > struct cp_fold_data
> > > > > > > {
> > > > > > >   hash_set pset;
> > > > > > > -  bool genericize; // called from cp_fold_function?
> > > > > > > -
> > > > > > > -  cp_fold_data (bool g): genericize (g) {}
> > > > > > > +  fold_flags flags;
> > > > > > > +  cp_fold_data (fold_flags flags): flags (flags) {}
> > > > > > > };
> > > > > > >   static tree
> > > > > > > @@ -1039,7 +1046,7 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees,
> > > > > > > void
> > > > > > > *data_)
> > > > > > >   break;
> > > > > > > }
> > > > > > > -  *stmt_p = stmt = cp_fold (*stmt_p);
> > > > > > > +  *stmt_p = stmt = cp_fold (*stmt_p, data->flags);
> > > > > > > if (data->pset.add (stmt))
> > > > > > > {
> > > > > > > @@ -1119,12 +1126,12 @@ cp_fold_r (tree *stmt_p, int 
> > > > > > > *walk_subtrees,
> > > > > > > void
> > > > > > > *data_)
> > > > > > >here rather than in cp_genericize to avoid problems 
> > > > > > > with the
> > > > > > > invisible
> > > > > > >reference transition.  */
> > > > > > > case INIT_EXPR:
> > > > > > > -  if (data->genericize)
> > > > > > > +  if (data->flags & ff_genericize)
> > > > > > >   

Re: [PATCH] PR tree-optimization/108520 - Add function context for querying global ranges.

2023-02-10 Thread Andrew MacLeod via Gcc-patches



On 2/10/23 02:45, Richard Biener wrote:

On Fri, Feb 10, 2023 at 1:02 AM Andrew MacLeod via Gcc-patches
 wrote:

I was about to ping on this, and then found it in my drafts.. Doh!


get_range_global() can invoke tree.cc::nonnull_arg_p() if the item being
queried is a pointer and a parameter.  This routine assumes the context
is CFUN, and this is not always true.

Can you share the backtrace where the context is different and cfun not NULL?
I'm curious ..



It's from the assume pass, when we are trying to map the actual parameter 
passed to a function to the global value we assigned to the parameter 
in that function, i.e.


   // Determine that x has a global range of [10, 200] in this function.
int assume_func (int x) { if (x >= 10 && x <= 200) return 1; else return 0; }

void myfunc () {
  int y_3;
  assume_func (y_3);   // Here, we look up assume_func's parameter list
                       // and assign [10, 200] to the range of y_3 via
                       // the inferred range mechanism.
}
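
For context, a sketch of the user-level construct that the pass outlines into
such an assume function (C++23 [[assume]] spelling; the names are
illustrative):

    int use (int y)
    {
      [[assume (y >= 10 && y <= 200)]];
      return y;   // ranger can now treat y as [10, 200] here
    }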

The traceback is:

0x1e77e95 nonnull_arg_p(tree_node const*)
    /gcc/master/gcc/gcc/tree.cc:14438
0x1e96241 get_range_global
    /gcc/master/gcc/gcc/value-query.cc:330
0x1e96668 global_range_query::range_of_expr(vrange&, tree_node*, gimple*)
    /gcc/master/gcc/gcc/value-query.cc:406
0x32483c4 gimple_infer_range::check_assume_func(gcall*)
    /gcc/master/gcc/gcc/gimple-range-infer.cc:87
0x32488e8 gimple_infer_range::gimple_infer_range(gimple*)
    /gcc/master/gcc/gcc/gimple-range-infer.cc:166
0x3249635 infer_range_manager::register_all_uses(tree_node*)
    /gcc/master/gcc/gcc/gimple-range-infer.cc:374
0x3248e94 infer_range_manager::has_range_p(tree_node*, basic_block_def*)
    /gcc/master/gcc/gcc/gimple-range-infer.cc:273
0x3248f7e infer_range_manager::maybe_adjust_range(vrange&, tree_node*, 
basic_block_def*)

    /gcc/master/gcc/gcc/gimple-range-infer.cc:290
0x1c8b6e1 path_range_query::adjust_for_non_null_uses(basic_block_def*)
    /gcc/master/gcc/gcc/gimple-range-path.cc:501
0x1c8bf23 path_range_query::compute_ranges(bitmap_head const*)
    /gcc/master/gcc/gcc/gimple-range-path.cc:622
0x1c8a40a path_range_query::reset_path(vec<basic_block, va_heap, vl_ptr> const&, bitmap_head const*)

    /gcc/master/gcc/gcc/gimple-range-path.cc:229
0x1c89af0 path_range_query::path_range_query(gimple_ranger&, 
vec<basic_block, va_heap, vl_ptr> const&, bitmap_head const*, bool)

    /gcc/master/gcc/gcc/gimple-range-path.cc:50
0x1d52a96 back_threader::find_taken_edge_cond(vec<basic_block, va_heap, vl_ptr> const&, gcond*)

    /gcc/master/gcc/gcc/tree-ssa-threadbackward.cc:324


This patch simply adds a function context to the get_range_global query,
and defaults it to cfun. If the context passed in is anything other 
than cfun, then it simply chooses not to invoke nonnull_arg_p().
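
A hedged sketch of the adjusted interface (the default argument is my reading
of the description above, not a quote of the patch):

    /* value-query.cc (sketch): the global query now carries a context.  */
    void get_range_global (vrange &r, tree name,
                           struct function *fun = cfun);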

The check_assume function now directly calls gimple_range_global with
the correct function context instead of indirectly calling it through
the global_range_query->range_of_expr () method. That's all it should 
have been doing in the first place really, since it's always an SSA name.

Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK for trunk?

OK.

Note that in the end nonnull_arg_p should get a struct function argument
as well, and its flag_delete_null_pointer_checks then need to become
opt_for_fn (fn->decl, flag_delete_null_pointer_checks).

I'll also note that for functions with many arguments nonnull_arg_p is
quite expensive and nonnull_arg_p on a PARM_DECL default-def
should be reflected on its range by a pass and we shouldn't re-query
this in any on-demand called function.


At the moment, it's only being queried this way as we try to resolve 
these assume functions.  If we enhance this to utilize IPA information 
next release in a similar way, it'll become more prevalent.





Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Friday, February 10, 2023 1:36 PM
>> To: Tamar Christina via Gcc-patches 
>> Cc: Tamar Christina ; nd ;
>> rguent...@suse.de; j...@ventanamicro.com
>> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
>> by using new optabs [PR108583]
>> 
>> I think I'm misunderstanding, but: it seems like we're treating the add
>> highpart optabs as companions to the mul highpart optabs.  But AIUI, the add
>> highpart optab is used such that, for an N-bit mode, we do an N-bit addition
>> followed by a shift by N/2.  Is that right?
>> The mul highpart optabs instead do a 2N-bit multiplication followed by a
>> shift by N.
>
> Correct.
>
>> 
>> Apart from consistency, the reason this matters is: I'm not sure what we gain
>> by adding the optab rather than simply open-coding the addition and the
>> shift directly into the vector pattern.  It seems like the AArch64 expander 
>> in
>> 2/2 does just do an ordinary N-bit addition followed by an ordinary shift by
>> N/2.
>
> I mentioned in the implementation, but I did so because AArch64 has various
> optimization on shifts when it comes to truncating results.  I didn't need to
> represent it with shifts, in fact the original pattern did not. But 
> representing it
> directly in the final instructions are problematic because these instructions 
> are
> unspecs and I would have needed to provide additional optabs to optimize them 
> in.
>
> So the shift representation was more natural for AArch64. It would not be say 
> for
> AArch32 which does not have these optimizations already. SVE has similar 
> optimizations
> and at the very worse you get an usra.
>
> I avoided open-coding it with add and shift because it creates a
> four-instruction dependency chain (with shifts, which are typically
> slow) instead of a load and multiply.  This change, unless the target
> is known to optimize it further, is unlikely to be beneficial.  And by
> the time we get to costing, the only alternative is to undo the
> existing pattern, and so you lose the general shift optimization.
>
> So it seemed unwise to open-code this as shifts, given that the codegen
> out of the vectorizer would be degenerate for most targets, or one needs
> the more complicated route of costing during pattern matching already.

Hmm, OK.  That seems like a cost-model thing though, rather than
something that should be exposed through optabs.  And I imagine
the open-coded version would still be better than nothing on
targets without highpart multiply.

So how about replacing the hook with one that simply asks whether
division through highpart multiplication is preferred over the
add/shift sequence?  (Unfortunately it's not going to be possible
to work that out from existing information.)

Thanks,
Richard

>
>> 
>> Some comments in addition to Richard's:
>> 
>> Tamar Christina via Gcc-patches  writes:
>> > Hi All,
>> >
>> > As discussed in the ticket, this replaces the approach for optimizing
>> > the div by bitmask operation from a hook into optabs implemented
>> > through add_highpart.
>> >
>> > In order to be able to use this we need to check whether the current
>> > precision has enough bits to do the operation without any of the additions
>> overflowing.
>> >
>> > We use range information to determine this and only do the operation
>> > if we're sure an overflow won't occur.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and 
>> issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >PR target/108583
>> >* doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST):
>> Remove.
>> >* doc/tm.texi.in: Likewise.
>> >* explow.cc (round_push, align_dynamic_address): Revert previous
>> patch.
>> >* expmed.cc (expand_divmod): Likewise.
>> >* expmed.h (expand_divmod): Likewise.
>> >* expr.cc (force_operand, expand_expr_divmod): Likewise.
>> >* optabs.cc (expand_doubleword_mod,
>> expand_doubleword_divmod): Likewise.
>> >* internal-fn.def (ADDH): New.
>> >* optabs.def (sadd_highpart_optab, uadd_highpart_optab): New.
>> >* doc/md.texi: Document them.
>> >* doc/rtl.texi: Likewise.
>> >* target.def (can_special_div_by_const): Remove.
>> >* target.h: Remove tree-core.h include
>> >* targhooks.cc (default_can_special_div_by_const): Remove.
>> >* targhooks.h (default_can_special_div_by_const): Remove.
>> >* tree-vect-generic.cc (expand_vector_operation): Remove hook.
>> >* tree-vect-patterns.cc (vect_recog_divmod_pattern): Remove hook
>> and
>> >implement new optab recognition based on range.
>> >* tree-vect-stmts.cc (vectorizable_operation): Remove hook.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >PR target/108583
>> >* gcc.dg/vect/vect-div-bitmask-4.c: New test.
>> >* gcc.dg/vect/vect-div-bitmask-5.c: New test.
>> >
>> > --- inline copy of patch --
>> > diff --git a/gcc/doc/md.texi 

Re: [PATCH 3/5] openmp, nvptx: ompx_unified_shared_mem_alloc

2023-02-10 Thread Thomas Schwinge
Hi Andrew!

On 2022-03-08T11:30:57+, Hafiz Abid Qadeer  wrote:
> From: Andrew Stubbs 
>
> This adds support for using Cuda Managed Memory with omp_alloc.  It will be
> used as the underpinnings for "requires unified_shared_memory" in a later
> patch.
>
> There are two new predefined allocators, ompx_unified_shared_mem_alloc and
> ompx_host_mem_alloc, plus corresponding memory spaces, [...]
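
For context, rough intended usage from the caller's side (a sketch; the
allocator handle name comes from this patch, and whether the target
region needs an is_device_ptr clause depends on the rest of the series):

  #include <omp.h>

  void
  fill (int n)
  {
    /* Managed memory: the same pointer is usable on host and device.  */
    int *p = (int *) omp_alloc (n * sizeof (int),
                                ompx_unified_shared_mem_alloc);

  #pragma omp target is_device_ptr (p)
    for (int i = 0; i < n; i++)
      p[i] = i;

    omp_free (p, ompx_unified_shared_mem_alloc);
  }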

> --- a/libgomp/config/linux/allocator.c
> +++ b/libgomp/config/linux/allocator.c
> @@ -42,9 +42,11 @@
>  static void *
>  linux_memspace_alloc (omp_memspace_handle_t memspace, size_t size, int pin)
>  {
> -  (void)memspace;
> -
> -  if (pin)
> +  if (memspace == ompx_unified_shared_mem_space)
> +{
> +  return gomp_usm_alloc (size, GOMP_DEVICE_ICV);
> +}
> +  else if (pin)
>  {
>void *addr = mmap (NULL, size, PROT_READ | PROT_WRITE,
>MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

This I understand conceptually, but then:

> @@ -67,7 +69,14 @@ linux_memspace_alloc (omp_memspace_handle_t memspace, 
> size_t size, int pin)
>  static void *
>  linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin)
>  {
> -  if (pin)
> +  if (memspace == ompx_unified_shared_mem_space)
> +{
> +  void *ret = gomp_usm_alloc (size, GOMP_DEVICE_ICV);
> +  memset (ret, 0, size);
> +  return ret;
> +}
> +  else if (memspace == ompx_unified_shared_mem_space
> +  || pin)
>  return linux_memspace_alloc (memspace, size, pin);
>else
>  return calloc (1, size);

..., here, we've got a duplicated (and thus always-false) expression
'memspace == ompx_unified_shared_mem_space' (..., which
'-Wduplicated-cond' fails to report; 
"'-Wduplicated-cond' doesn't diagnose duplicated subexpressions"...).
Is the correct fix the following (conceptually like
'linux_memspace_alloc' cited above), or is there something that I fail to
understand?

 static void *
 linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int 
pin)
 {
   if (memspace == ompx_unified_shared_mem_space)
 {
   void *ret = gomp_usm_alloc (size, GOMP_DEVICE_ICV);
   memset (ret, 0, size);
   return ret;
 }
-  else if (memspace == ompx_unified_shared_mem_space
-  || pin)
+  else if (pin)
 return linux_memspace_alloc (memspace, size, pin);
   else
 return calloc (1, size);

The following ones then again are conceptually like
'linux_memspace_alloc' cited above:

> @@ -77,9 +86,9 @@ static void
>  linux_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size,
>int pin)
>  {
> -  (void)memspace;
> -
> -  if (pin)
> +  if (memspace == ompx_unified_shared_mem_space)
> +gomp_usm_free (addr, GOMP_DEVICE_ICV);
> +  else if (pin)
>  munmap (addr, size);
>else
>  free (addr);
> @@ -89,7 +98,9 @@ static void *
>  linux_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
>   size_t oldsize, size_t size, int oldpin, int pin)
>  {
> -  if (oldpin && pin)
> +  if (memspace == ompx_unified_shared_mem_space)
> +goto manual_realloc;
> +  else if (oldpin && pin)
>  {
>void *newaddr = mremap (addr, oldsize, size, MREMAP_MAYMOVE);
>if (newaddr == MAP_FAILED)
> @@ -98,18 +109,19 @@ linux_memspace_realloc (omp_memspace_handle_t memspace, 
> void *addr,
> [...]

..., and similar those here:

> --- a/libgomp/config/nvptx/allocator.c
> +++ b/libgomp/config/nvptx/allocator.c
> @@ -125,6 +125,8 @@ nvptx_memspace_alloc (omp_memspace_handle_t memspace, 
> size_t size)
>__atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, 
> MEMMODEL_RELEASE);
>return result;
>  }
> +  else if (memspace == ompx_host_mem_space)
> +return NULL;
>else
>  return malloc (size);
>  }
> @@ -145,6 +147,8 @@ nvptx_memspace_calloc (omp_memspace_handle_t memspace, 
> size_t size)
>
>return result;
>  }
> +  else if (memspace == ompx_host_mem_space)
> +return NULL;
>else
>  return calloc (1, size);
>  }
> @@ -354,6 +358,8 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, 
> void *addr,
>   }
>return result;
>  }
> +  else if (memspace == ompx_host_mem_space)
> +return NULL;
>else
>  return realloc (addr, size);
>  }

(I'd have added an explicit no-op (or, 'abort'?) to
'nvptx_memspace_free', but that's maybe just me...)  ;-\


> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h

> +extern void * gomp_usm_alloc (size_t size, int device_num);
> +extern void gomp_usm_free (void *device_ptr, int device_num);
> +extern bool gomp_is_usm_ptr (void *ptr);

'gomp_is_usm_ptr' isn't defined/used anywhere; I'll remove it.


> --- a/libgomp/target.c
> +++ b/libgomp/target.c

> @@ -3740,6 +3807,9 @@ gomp_load_plugin_for_device (struct gomp_device_descr 
> *device,
>DLSYM (unload_image);
>DLSYM (alloc);
>DLSYM (free);
> +  DLSYM_OPT 

RE: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, February 10, 2023 1:36 PM
> To: Tamar Christina via Gcc-patches 
> Cc: Tamar Christina ; nd ;
> rguent...@suse.de; j...@ventanamicro.com
> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
> by using new optabs [PR108583]
> 
> I think I'm misunderstanding, but: it seems like we're treating the add
> highpart optabs as companions to the mul highpart optabs.  But AIUI, the add
> highpart optab is used such that, for an N-bit mode, we do an N-bit addition
> followed by a shift by N/2.  Is that right?
> The mul highpart optabs instead do a 2N-bit multiplication followed by a
> shift by N.

Correct.

> 
> Apart from consistency, the reason this matters is: I'm not sure what we gain
> by adding the optab rather than simply open-coding the addition and the
> shift directly into the vector pattern.  It seems like the AArch64 expander in
> 2/2 does just do an ordinary N-bit addition followed by an ordinary shift by
> N/2.

I mentioned this in the implementation, but I did so because AArch64 has
various optimizations on shifts when it comes to truncating results.  I
didn't need to represent it with shifts; in fact the original pattern did
not.  But representing it directly in the final instructions is
problematic because these instructions are unspecs, and I would have
needed to provide additional optabs to optimize them.

So the shift representation was more natural for AArch64.  It would not
be, say, for AArch32, which does not have these optimizations already.
SVE has similar optimizations, and at the very worst you get an usra.

I avoided open-coding it with add and shift because it creates a
four-instruction dependency chain (with shifts, which are typically
slow) instead of a load and multiply.  This change, unless the target
is known to optimize it further, is unlikely to be beneficial.  And by
the time we get to costing, the only alternative is to undo the
existing pattern, and so you lose the general shift optimization.

So it seemed unwise to open-code this as shifts, given that the codegen
out of the vectorizer would be degenerate for most targets, or one needs
the more complicated route of costing during pattern matching already.

> 
> Some comments in addition to Richard's:
> 
> Tamar Christina via Gcc-patches  writes:
> > Hi All,
> >
> > As discussed in the ticket, this replaces the approach for optimizing
> > the div by bitmask operation from a hook into optabs implemented
> > through add_highpart.
> >
> > In order to be able to use this we need to check whether the current
> > precision has enough bits to do the operation without any of the additions
> overflowing.
> >
> > We use range information to determine this and only do the operation
> > if we're sure an overflow won't occur.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and 
> issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR target/108583
> > * doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST):
> Remove.
> > * doc/tm.texi.in: Likewise.
> > * explow.cc (round_push, align_dynamic_address): Revert previous
> patch.
> > * expmed.cc (expand_divmod): Likewise.
> > * expmed.h (expand_divmod): Likewise.
> > * expr.cc (force_operand, expand_expr_divmod): Likewise.
> > * optabs.cc (expand_doubleword_mod,
> expand_doubleword_divmod): Likewise.
> > * internal-fn.def (ADDH): New.
> > * optabs.def (sadd_highpart_optab, uadd_highpart_optab): New.
> > * doc/md.texi: Document them.
> > * doc/rtl.texi: Likewise.
> > * target.def (can_special_div_by_const): Remove.
> > * target.h: Remove tree-core.h include
> > * targhooks.cc (default_can_special_div_by_const): Remove.
> > * targhooks.h (default_can_special_div_by_const): Remove.
> > * tree-vect-generic.cc (expand_vector_operation): Remove hook.
> > * tree-vect-patterns.cc (vect_recog_divmod_pattern): Remove hook
> and
> > implement new optab recognition based on range.
> > * tree-vect-stmts.cc (vectorizable_operation): Remove hook.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/108583
> > * gcc.dg/vect/vect-div-bitmask-4.c: New test.
> > * gcc.dg/vect/vect-div-bitmask-5.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> >
> 7235d34c4b30949febfa10d5a626ac9358281cfa..02004c4b0f4d88dffe980f74080
> 3
> > 8595e21af35d 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5668,6 +5668,18 @@ represented in RTL using a
> @code{smul_highpart} RTX expression.
> >  Similar, but the multiplication is unsigned.  This may be represented
> > in RTL using an @code{umul_highpart} RTX expression.
> >
> > +@cindex @code{sadd@var{m}3_highpart} instruction pattern @item
> > +@samp{smul@var{m}3_highpart}
> 
> sadd
> 
> > +Perform a signed addition of operands 1 and 2, which have mode
> > +@var{m}, and store the most significant 

Re: [PATCH 2/2]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583]

2023-02-10 Thread Richard Sandiford via Gcc-patches
I was asking in the 1/2 review whether we need the optab, but that
decision doesn't affect the other patterns, so:

Tamar Christina  writes:
> Hi All,
>
> This replaces the custom division hook with just an implementation through
> add_highpart.  For NEON we implement the add highpart (Addition + extraction 
> of
> the upper highpart of the register in the same precision) as ADD + LSR.
>
> This representation allows us to easily optimize the sequence using existing
> sequences. This gets us a pretty decent sequence using SRA:
>
> umull   v1.8h, v0.8b, v3.8b
> umull2  v0.8h, v0.16b, v3.16b
> add v5.8h, v1.8h, v2.8h
> add v4.8h, v0.8h, v2.8h
> usrav1.8h, v5.8h, 8
> usrav0.8h, v4.8h, 8
> uzp2v1.16b, v1.16b, v0.16b
>
> To get the optimal sequence, however, we match (a + ((b + c) >> n)), where n
> is half the precision of the mode of the operation, into addhn + uaddw, which
> is a generally good optimization on its own and gets us back to:
>
> .L4:
> ldr q0, [x3]
> umull   v1.8h, v0.8b, v5.8b
> umull2  v0.8h, v0.16b, v5.16b
> addhn   v3.8b, v1.8h, v4.8h
> addhn   v2.8b, v0.8h, v4.8h
> uaddw   v1.8h, v1.8h, v3.8b
> uaddw   v0.8h, v0.8h, v2.8b
> uzp2v1.16b, v1.16b, v0.16b
> str q1, [x3], 16
> cmp x3, x4
> bne .L4
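
Per element, the matched (a + ((b + c) >> n)) shape is, in scalar terms
(assuming 16-bit elements and n == 8; illustrative only):

  #include <stdint.h>

  static inline uint16_t
  addhn_uaddw (uint16_t a, uint16_t b, uint16_t c)
  {
    uint8_t hi = (uint8_t) ((uint16_t) (b + c) >> 8); /* addhn */
    return (uint16_t) (a + hi);                       /* uaddw */
  }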
>
> For SVE2 we optimize the initial sequence to the same ADD + LSR which gets us:
>
> .L3:
> ld1bz0.h, p0/z, [x0, x3]
> mul z0.h, p1/m, z0.h, z2.h
> add z1.h, z0.h, z3.h
> usraz0.h, z1.h, #8
> lsr z0.h, z0.h, #8
> st1bz0.h, p0, [x0, x3]
> inchx3
> whilelo p0.h, w3, w2
> b.any   .L3
> .L1:
> ret
>
> and to get the optimal sequence I match (a + b) >> n (same constraint on n)
> to addhnb, which gets us to:
>
> .L3:
> ld1bz0.h, p0/z, [x0, x3]
> mul z0.h, p1/m, z0.h, z2.h
> addhnb  z1.b, z0.h, z3.h
> addhnb  z0.b, z0.h, z1.h
> st1bz0.h, p0, [x0, x3]
> inchx3
> whilelo p0.h, w3, w2
> b.any   .L3
>
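
Per element, each addhnb above computes, in scalar terms (illustrative
only):

  #include <stdint.h>

  static inline uint8_t
  addhnb_scalar (uint16_t a, uint16_t b)
  {
    return (uint8_t) ((uint16_t) (a + b) >> 8);
  }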
> There are multiple RTL representations possible for these optimizations; I did
> not represent them using a zero_extend because we seem very inconsistent in this
> in the backend.  Since they are unspecs we won't match them from vector ops
> anyway. I figured maintainers would prefer this, but my maintainer ouija board
> is still out for repairs :)

I agree this is the best approach as things stand.  Personally, I'd like
to have some way for the target to define simplification rules based on
unspecs, so that unspecs act more like target-specific rtl codes.  But I
know others disagree, and it wouldn't really apply to this case anyway.

> There are no new tests, as new correctness tests were added to the mid-end and
> the existing codegen tests for this already exist.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and  issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR target/108583
>   * config/aarch64/aarch64-simd.md (@aarch64_bitmask_udiv3): Remove.
>   (add3_highpart, *bitmask_shift_plus): New.
>   * config/aarch64/aarch64-sve2.md (add3_highpart,
>   *bitmask_shift_plus): New.
>   (@aarch64_bitmask_udiv3): Remove.
>   * config/aarch64/aarch64.cc
>   (aarch64_vectorize_can_special_div_by_constant): Removed.
>   * config/aarch64/iterators.md (UNSPEC_SADD_HIGHPART,
>   UNSPEC_UADD_HIGHPART, ADD_HIGHPART): New.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 7f212bf37cd2c120dceb7efa733c9fa76226f029..26871a56d1fdb134f0ad9d828ce68a8df0272c53
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -4867,62 +4867,48 @@ (define_expand "aarch64_hn2"
>}
>  )
>  
> -;; div optimizations using narrowings
> -;; we can do the division e.g. shorts by 255 faster by calculating it as
> -;; (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
> -;; double the precision of x.
> -;;
> -;; If we imagine a short as being composed of two blocks of bytes then
> -;; adding 257 or 0b_0001__0001 to the number is equivalent to
> -;; adding 1 to each sub component:
> -;;
> -;;  short value of 16-bits
> -;; ┌──┬┐
> -;; │  ││
> -;; └──┴┘
> -;;   8-bit part1 ▲  8-bit part2   ▲
> -;;   ││
> -;;   ││
> -;;  +1   +1
> -;;
> -;; after the first addition, we have to shift right by 8, and narrow the
> -;; results back to a byte.  Remember that the addition must be done in
> -;; double the precision of the input.  Since 8 is half the size of a short
> -;; we can use a narrowing halfing instruction 
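
For reference, the identity the removed comment documents can be checked
exhaustively for 16-bit inputs (standalone C, not part of the patch):

  #include <assert.h>
  #include <stdint.h>

  int
  main (void)
  {
    for (uint32_t x = 0; x <= 0xffff; x++)
      /* x / 255 via (x + ((x + 257) >> 8)) >> 8, computed in double
         the precision of the 16-bit input.  */
      assert (((x + ((x + 257) >> 8)) >> 8) == x / 255);
    return 0;
  }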

Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Richard Biener via Gcc-patches
On Fri, 10 Feb 2023, Richard Sandiford wrote:

> I think I'm misunderstanding, but: it seems like we're treating the
> add highpart optabs as companions to the mul highpart optabs.  But AIUI,
> the add highpart optab is used such that, for an N-bit mode, we do
> an N-bit addition followed by a shift by N/2.  Is that right?
> The mul highpart optabs instead do a 2N-bit multiplication followed
> by a shift by N.

That also confused me - and the docs add to the confusion more than
clear it up ... I agree we should be consistent in the semantics
for add_highpart and mul_highpart.

> Apart from consistency, the reason this matters is: I'm not sure what we
> gain by adding the optab rather than simply open-coding the addition and
> the shift directly into the vector pattern.  It seems like the AArch64
> expander in 2/2 does just do an ordinary N-bit addition followed by an
> ordinary shift by N/2.
> 
> Some comments in addition to Richard's:
> 
> Tamar Christina via Gcc-patches  writes:
> > Hi All,
> >
> > As discussed in the ticket, this replaces the approach for optimizing the
> > div by bitmask operation from a hook into optabs implemented through
> > add_highpart.
> >
> > In order to be able to use this we need to check whether the current 
> > precision
> > has enough bits to do the operation without any of the additions 
> > overflowing.
> >
> > We use range information to determine this and only do the operation if 
> > we're
> > sure an overflow won't occur.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and  issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR target/108583
> > * doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Remove.
> > * doc/tm.texi.in: Likewise.
> > * explow.cc (round_push, align_dynamic_address): Revert previous patch.
> > * expmed.cc (expand_divmod): Likewise.
> > * expmed.h (expand_divmod): Likewise.
> > * expr.cc (force_operand, expand_expr_divmod): Likewise.
> > * optabs.cc (expand_doubleword_mod, expand_doubleword_divmod): Likewise.
> > * internal-fn.def (ADDH): New.
> > * optabs.def (sadd_highpart_optab, uadd_highpart_optab): New.
> > * doc/md.texi: Document them.
> > * doc/rtl.texi: Likewise.
> > * target.def (can_special_div_by_const): Remove.
> > * target.h: Remove tree-core.h include
> > * targhooks.cc (default_can_special_div_by_const): Remove.
> > * targhooks.h (default_can_special_div_by_const): Remove.
> > * tree-vect-generic.cc (expand_vector_operation): Remove hook.
> > * tree-vect-patterns.cc (vect_recog_divmod_pattern): Remove hook and
> > implement new optab recognition based on range.
> > * tree-vect-stmts.cc (vectorizable_operation): Remove hook.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/108583
> > * gcc.dg/vect/vect-div-bitmask-4.c: New test.
> > * gcc.dg/vect/vect-div-bitmask-5.c: New test.
> >
> > --- inline copy of patch -- 
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 
> > 7235d34c4b30949febfa10d5a626ac9358281cfa..02004c4b0f4d88dffe980f7408038595e21af35d
> >  100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5668,6 +5668,18 @@ represented in RTL using a @code{smul_highpart} RTX 
> > expression.
> >  Similar, but the multiplication is unsigned.  This may be represented
> >  in RTL using an @code{umul_highpart} RTX expression.
> >  
> > +@cindex @code{sadd@var{m}3_highpart} instruction pattern
> > +@item @samp{smul@var{m}3_highpart}
> 
> sadd
> 
> > +Perform a signed addition of operands 1 and 2, which have mode
> > +@var{m}, and store the most significant half of the product in operand 0.
> > +The least significant half of the product is discarded.  This may be
> > +represented in RTL using a @code{sadd_highpart} RTX expression.
> > +
> > +@cindex @code{uadd@var{m}3_highpart} instruction pattern
> > +@item @samp{uadd@var{m}3_highpart}
> > +Similar, but the addition is unsigned.  This may be represented
> > +in RTL using an @code{uadd_highpart} RTX expression.
> > +
> >  @cindex @code{madd@var{m}@var{n}4} instruction pattern
> >  @item @samp{madd@var{m}@var{n}4}
> >  Multiply operands 1 and 2, sign-extend them to mode @var{n}, add
> > diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
> > index 
> > d1380e1eb3ba6b2853686f41f2bf937bfcbed1fe..63a7ef6e566eeea4f14c00343d171940ec4222f3
> >  100644
> > --- a/gcc/doc/rtl.texi
> > +++ b/gcc/doc/rtl.texi
> > @@ -2535,6 +2535,17 @@ out in machine mode @var{m}.  @code{smul_highpart} 
> > returns the high part
> >  of a signed multiplication, @code{umul_highpart} returns the high part
> >  of an unsigned multiplication.
> >  
> > +@findex sadd_highpart
> > +@findex uadd_highpart
> > +@cindex high-part addition
> > +@cindex addition high part
> > +@item (sadd_highpart:@var{m} @var{x} @var{y})
> > +@itemx (uadd_highpart:@var{m} @var{x} @var{y})
> > +Represents the high-part addition of @var{x} and @var{y} carried
> > +out in machine 

Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Richard Sandiford via Gcc-patches
I think I'm misunderstanding, but: it seems like we're treating the
add highpart optabs as companions to the mul highpart optabs.  But AIUI,
the add highpart optab is used such that, for an N-bit mode, we do
an N-bit addition followed by a shift by N/2.  Is that right?
The mul highpart optabs instead do a 2N-bit multiplication followed
by a shift by N.

Apart from consistency, the reason this matters is: I'm not sure what we
gain by adding the optab rather than simply open-coding the addition and
the shift directly into the vector pattern.  It seems like the AArch64
expander in 2/2 does just do an ordinary N-bit addition followed by an
ordinary shift by N/2.
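
Concretely, for a 32-bit element the two operations being contrasted
are (scalar C, illustrative only):

  #include <stdint.h>

  /* mul highpart: 2N-bit product, then shift by N.  */
  static inline uint32_t
  umul_highpart_32 (uint32_t a, uint32_t b)
  {
    return (uint32_t) (((uint64_t) a * b) >> 32);
  }

  /* add highpart as used by the patch: N-bit sum, then shift by N/2.  */
  static inline uint32_t
  uadd_highpart_32 (uint32_t a, uint32_t b)
  {
    return (a + b) >> 16;
  }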

Some comments in addition to Richard's:

Tamar Christina via Gcc-patches  writes:
> Hi All,
>
> As discussed in the ticket, this replaces the approach for optimizing the
> div by bitmask operation from a hook into optabs implemented through
> add_highpart.
>
> In order to be able to use this we need to check whether the current precision
> has enough bits to do the operation without any of the additions overflowing.
>
> We use range information to determine this and only do the operation if we're
> sure an overflow won't occur.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and  issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR target/108583
>   * doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Remove.
>   * doc/tm.texi.in: Likewise.
>   * explow.cc (round_push, align_dynamic_address): Revert previous patch.
>   * expmed.cc (expand_divmod): Likewise.
>   * expmed.h (expand_divmod): Likewise.
>   * expr.cc (force_operand, expand_expr_divmod): Likewise.
>   * optabs.cc (expand_doubleword_mod, expand_doubleword_divmod): Likewise.
>   * internal-fn.def (ADDH): New.
>   * optabs.def (sadd_highpart_optab, uadd_highpart_optab): New.
>   * doc/md.texi: Document them.
>   * doc/rtl.texi: Likewise.
>   * target.def (can_special_div_by_const): Remove.
>   * target.h: Remove tree-core.h include
>   * targhooks.cc (default_can_special_div_by_const): Remove.
>   * targhooks.h (default_can_special_div_by_const): Remove.
>   * tree-vect-generic.cc (expand_vector_operation): Remove hook.
>   * tree-vect-patterns.cc (vect_recog_divmod_pattern): Remove hook and
>	implement new optab recognition based on range.
>   * tree-vect-stmts.cc (vectorizable_operation): Remove hook.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/108583
>   * gcc.dg/vect/vect-div-bitmask-4.c: New test.
>   * gcc.dg/vect/vect-div-bitmask-5.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 
> 7235d34c4b30949febfa10d5a626ac9358281cfa..02004c4b0f4d88dffe980f7408038595e21af35d
>  100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5668,6 +5668,18 @@ represented in RTL using a @code{smul_highpart} RTX 
> expression.
>  Similar, but the multiplication is unsigned.  This may be represented
>  in RTL using an @code{umul_highpart} RTX expression.
>  
> +@cindex @code{sadd@var{m}3_highpart} instruction pattern
> +@item @samp{smul@var{m}3_highpart}

sadd

> +Perform a signed addition of operands 1 and 2, which have mode
> +@var{m}, and store the most significant half of the product in operand 0.
> +The least significant half of the product is discarded.  This may be
> +represented in RTL using a @code{sadd_highpart} RTX expression.
> +
> +@cindex @code{uadd@var{m}3_highpart} instruction pattern
> +@item @samp{uadd@var{m}3_highpart}
> +Similar, but the addition is unsigned.  This may be represented
> +in RTL using an @code{uadd_highpart} RTX expression.
> +
>  @cindex @code{madd@var{m}@var{n}4} instruction pattern
>  @item @samp{madd@var{m}@var{n}4}
>  Multiply operands 1 and 2, sign-extend them to mode @var{n}, add
> diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
> index 
> d1380e1eb3ba6b2853686f41f2bf937bfcbed1fe..63a7ef6e566eeea4f14c00343d171940ec4222f3
>  100644
> --- a/gcc/doc/rtl.texi
> +++ b/gcc/doc/rtl.texi
> @@ -2535,6 +2535,17 @@ out in machine mode @var{m}.  @code{smul_highpart} 
> returns the high part
>  of a signed multiplication, @code{umul_highpart} returns the high part
>  of an unsigned multiplication.
>  
> +@findex sadd_highpart
> +@findex uadd_highpart
> +@cindex high-part addition
> +@cindex addition high part
> +@item (sadd_highpart:@var{m} @var{x} @var{y})
> +@itemx (uadd_highpart:@var{m} @var{x} @var{y})
> +Represents the high-part addition of @var{x} and @var{y} carried
> +out in machine mode @var{m}.  @code{sadd_highpart} returns the high part
> +of a signed addition, @code{uadd_highpart} returns the high part
> +of an unsigned addition.

The patch doesn't add these RTL codes though.

> +
>  @findex fma
>  @cindex fused multiply-add
>  @item (fma:@var{m} @var{x} @var{y} @var{z})
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 
> 

Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Richard Biener via Gcc-patches
On Thu, 9 Feb 2023, Tamar Christina wrote:

> Hi All,
> 
> As discussed in the ticket, this replaces the approach for optimizing the
> div by bitmask operation from a hook into optabs implemented through
> add_highpart.
> 
> In order to be able to use this we need to check whether the current precision
> has enough bits to do the operation without any of the additions overflowing.
> 
> We use range information to determine this and only do the operation if we're
> sure an overflow won't occur.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and  issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR target/108583
>   * doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Remove.
>   * doc/tm.texi.in: Likewise.
>   * explow.cc (round_push, align_dynamic_address): Revert previous patch.
>   * expmed.cc (expand_divmod): Likewise.
>   * expmed.h (expand_divmod): Likewise.
>   * expr.cc (force_operand, expand_expr_divmod): Likewise.
>   * optabs.cc (expand_doubleword_mod, expand_doubleword_divmod): Likewise.
>   * internal-fn.def (ADDH): New.
>   * optabs.def (sadd_highpart_optab, uadd_highpart_optab): New.
>   * doc/md.texi: Document them.
>   * doc/rtl.texi: Likewise.
>   * target.def (can_special_div_by_const): Remove.
>   * target.h: Remove tree-core.h include
>   * targhooks.cc (default_can_special_div_by_const): Remove.
>   * targhooks.h (default_can_special_div_by_const): Remove.
>   * tree-vect-generic.cc (expand_vector_operation): Remove hook.
>   * tree-vect-patterns.cc (vect_recog_divmod_pattern): Remove hook and
>	implement new optab recognition based on range.
>   * tree-vect-stmts.cc (vectorizable_operation): Remove hook.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/108583
>   * gcc.dg/vect/vect-div-bitmask-4.c: New test.
>   * gcc.dg/vect/vect-div-bitmask-5.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 
> 7235d34c4b30949febfa10d5a626ac9358281cfa..02004c4b0f4d88dffe980f7408038595e21af35d
>  100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5668,6 +5668,18 @@ represented in RTL using a @code{smul_highpart} RTX 
> expression.
>  Similar, but the multiplication is unsigned.  This may be represented
>  in RTL using an @code{umul_highpart} RTX expression.
>  
> +@cindex @code{sadd@var{m}3_highpart} instruction pattern
> +@item @samp{smul@var{m}3_highpart}
> +Perform a signed addition of operands 1 and 2, which have mode
> +@var{m}, and store the most significant half of the product in operand 0.

of the sum

> +The least significant half of the product is discarded.  This may be
> +represented in RTL using a @code{sadd_highpart} RTX expression.

likewise.

> +
> +@cindex @code{uadd@var{m}3_highpart} instruction pattern
> +@item @samp{uadd@var{m}3_highpart}
> +Similar, but the addition is unsigned.  This may be represented
> +in RTL using an @code{uadd_highpart} RTX expression.
> +

Is the highpart of the result sign- (for sadd) or zero- (for uadd)
extended to the full precision of the result mode? "store the most
significant half ... in operand 0" leaves that underspecified I think
(likewise for the mul_highpart pattern docs you copied this from).

Otherwise looks good to me.  Review would have been easier with
splitting the revert from the new implementation ...

Thanks,
Richard.

>  @cindex @code{madd@var{m}@var{n}4} instruction pattern
>  @item @samp{madd@var{m}@var{n}4}
>  Multiply operands 1 and 2, sign-extend them to mode @var{n}, add
> diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
> index 
> d1380e1eb3ba6b2853686f41f2bf937bfcbed1fe..63a7ef6e566eeea4f14c00343d171940ec4222f3
>  100644
> --- a/gcc/doc/rtl.texi
> +++ b/gcc/doc/rtl.texi
> @@ -2535,6 +2535,17 @@ out in machine mode @var{m}.  @code{smul_highpart} 
> returns the high part
>  of a signed multiplication, @code{umul_highpart} returns the high part
>  of an unsigned multiplication.
>  
> +@findex sadd_highpart
> +@findex uadd_highpart
> +@cindex high-part addition
> +@cindex addition high part
> +@item (sadd_highpart:@var{m} @var{x} @var{y})
> +@itemx (uadd_highpart:@var{m} @var{x} @var{y})
> +Represents the high-part addition of @var{x} and @var{y} carried
> +out in machine mode @var{m}.  @code{sadd_highpart} returns the high part
> +of a signed addition, @code{uadd_highpart} returns the high part
> +of an unsigned addition.
> +
>  @findex fma
>  @cindex fused multiply-add
>  @item (fma:@var{m} @var{x} @var{y} @var{z})
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 
> c6c891972d1e58cd163b259ba96a599d62326865..3ab2031a336b8758d5791484017e6b0d62ab077e
>  100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -6137,22 +6137,6 @@ instruction pattern.  There is no need for the hook to 
> handle these two
>  implementation approaches itself.
>  @end deftypefn
>  
> -@deftypefn {Target Hook} bool 

[Patch][v2] OpenMP/Fortran: Fix loop-iter var privatization with !$OMP LOOP [PR108512]

2023-02-10 Thread Tobias Burnus

Updated version attached.

On 31.01.23 12:20, Jakub Jelinek via Gcc-patches wrote:

On Tue, Jan 24, 2023 at 04:24:07PM +0100, Tobias Burnus wrote:

+  if (code->op == EXEC_OMP_LOOP)
+;  /* Already rejected in resolve_omp_clauses.  */

I don't understand why is this needed.  Sure, the vast majority of
constructs don't allow reduction(inscan, ...), do we need to list them all?


I think using the same != condition as in resolve_omp_clauses would be
better. The messages inside this procedure come earlier than the ones in
resolve_omp_clauses. Thus, they cannot be silenced there.

However, while I did try this, and it prevented some odd error messages, I
have now removed it again.  It is never quite clear when it is better
to have multiple error messages and when suppressing some is better.

 * * *

For more on 'inscan' and combined constructs (mainly with target, an
OpenMP 5.1 feature), see the newly filed https://gcc.gnu.org/PR108749


diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 94213cd3cd4..bd2a749776d 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -11950,6 +11950,7 @@ gfc_resolve_code (gfc_code *code, gfc_namespace *ns)
 case EXEC_OMP_DISTRIBUTE_SIMD:
 case EXEC_OMP_DO:
 case EXEC_OMP_DO_SIMD:
+case EXEC_OMP_LOOP:
 case EXEC_OMP_SIMD:
 case EXEC_OMP_TARGET_SIMD:
   gfc_resolve_omp_do_blocks (code, ns);

I'm afraid this is needed but insufficient.
I think
 case EXEC_OMP_MASKED_TASKLOOP:
 case EXEC_OMP_MASKED_TASKLOOP_SIMD:
 case EXEC_OMP_MASTER_TASKLOOP:
 case EXEC_OMP_MASTER_TASKLOOP_SIMD:
  case EXEC_OMP_PARALLEL_LOOP:
  case EXEC_OMP_TARGET_PARALLEL_LOOP:
  case EXEC_OMP_TARGET_TEAMS_LOOP:
  case EXEC_OMP_TARGET_SIMD:
  case EXEC_OMP_TEAMS_LOOP:
should be in the list above (of course alphabetically sorted in between the
others)
gfc_resolve_omp_parallel_blocks (code, ns);


I think 'TARGET_SIMD' shouldn't be resolved through parallel blocks but
can directly call gfc_resolve_omp_do_blocks (as
currently/previously implemented). The masked versions were already
handled inside gfc_resolve_omp_parallel_blocks but missing in
gfc_resolve_code, while the 'loop' ones had to be added to both.

(I did not extend the testcase, but I updated two to add additional
dg-error to the same line.)

Thanks for the comments. Any further suggestions?

Tobias
OpenMP/Fortran: Fix loop-iter var privatization with !$OMP LOOP [PR108512]

For 'parallel', loop-iteration variables are marked as 'private',
unless they either appear in an omp do/simd loop or a data-sharing clause
already exists for those on 'parallel'. 'omp loop' wasn't handled, leading
to (potentially) multiple data-sharing clauses in gfc_resolve_do_iterator
as omp_current_ctx pointed to the 'parallel' directive, ignoring the
in-between 'loop' directive.

The latter led to a bogus diagnostic - or rather an ICE, as the source
location var contained only '\0'.

Additionally, several 'case EXEC_OMP...LOOP' have been added to call the 
right resolution function and likewise for '{masked,master} taskloop'.

gcc/fortran/ChangeLog:

	PR fortran/108512
	* openmp.cc (gfc_resolve_omp_parallel_blocks): Handle combined 'loop'
	directives.
	(gfc_resolve_do_iterator): Set a source location for added
	'private'-clause arguments.
	* resolve.cc (gfc_resolve_code): Call gfc_resolve_omp_do_blocks
	also for EXEC_OMP_LOOP and gfc_resolve_omp_parallel_blocks for
	combined directives with loop + '{masked,master} taskloop (simd)'.

gcc/testsuite/ChangeLog:

	PR fortran/108512
	* gfortran.dg/gomp/loop-5.f90: New test.
	* gfortran.dg/gomp/loop-2.f90: Update dg-error.
	* gfortran.dg/gomp/taskloop-2.f90: Update dg-error.

 gcc/fortran/openmp.cc | 13 +++--
 gcc/fortran/resolve.cc|  9 +++
 gcc/testsuite/gfortran.dg/gomp/loop-2.f90 |  5 ++
 gcc/testsuite/gfortran.dg/gomp/loop-5.f90 | 84 +++
 gcc/testsuite/gfortran.dg/gomp/taskloop-2.f90 |  2 +
 5 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 1897e1dbc71..abca146d78e 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -9125,28 +9125,32 @@ gfc_resolve_omp_parallel_blocks (gfc_code *code, gfc_namespace *ns)
 {
 case EXEC_OMP_DISTRIBUTE_PARALLEL_DO:
 case EXEC_OMP_DISTRIBUTE_PARALLEL_DO_SIMD:
+case EXEC_OMP_MASKED_TASKLOOP:
+case EXEC_OMP_MASKED_TASKLOOP_SIMD:
+case EXEC_OMP_MASTER_TASKLOOP:
+case EXEC_OMP_MASTER_TASKLOOP_SIMD:
 case EXEC_OMP_PARALLEL_DO:
 case EXEC_OMP_PARALLEL_DO_SIMD:

Re: [PATCH] tree-optimization/108724 - vectorized code getting piecewise expanded

2023-02-10 Thread Richard Biener via Gcc-patches
On Fri, 10 Feb 2023, Richard Sandiford wrote:

> Richard Biener via Gcc-patches  writes:
> > This fixes an oversight when removing the hard limits on using
> > generic vectors for the vectorizer to enable both SLP and BB
> > vectorization to use those.  The vectorizer relies on vector lowering
> > to expand plus, minus and negate to bit operations but vector
> > lowering has a hard limit on the minimum number of elements per
> > work item.  Vectorizer costs for the testcase at hand work out
> > to vectorize a loop with just two work items per vector and that
> > causes element wise expansion and spilling.
> >
> > The fix for now is to re-instantiate the hard limit, matching what
> > vector lowering does.  For the future the way to go is to emit the
> > lowered sequence directly from the vectorizer instead.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> 
> LGTM after reading the vector lowering stuff in the PR trail.
> 
> TBH I don't remember when the hard limit was removed though.

I removed it as part of fixing PR101801 but didn't anticipate the
close connection with vector lowering.

Richard.

> Thanks,
> Richard
> 
> >
> > Thanks,
> > Richard.
> >
> > PR tree-optimization/108724
> > * tree-vect-stmts.cc (vectorizable_operation): Avoid
> > using word_mode vectors when vector lowering will
> > decompose them to elementwise operations.
> >
> > * gcc.target/i386/pr108724.c: New testcase.
> > ---
> >  gcc/testsuite/gcc.target/i386/pr108724.c | 15 +++
> >  gcc/tree-vect-stmts.cc   | 14 ++
> >  2 files changed, 29 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr108724.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr108724.c 
> > b/gcc/testsuite/gcc.target/i386/pr108724.c
> > new file mode 100644
> > index 000..c4e0e918610
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr108724.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -mno-sse" } */
> > +
> > +int a[16], b[16], c[16];
> > +void foo()
> > +{
> > +  for (int i = 0; i < 16; i++) {
> > +a[i] = b[i] + c[i];
> > +  }
> > +}
> > +
> > +/* When this is vectorized this shouldn't be expanded piecewise again
> > +   which will result in spilling for the upper half access.  */
> > +
> > +/* { dg-final { scan-assembler-not "\\\[er\\\]sp" } } */
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index c86249adcc3..09b5af603d2 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -6315,6 +6315,20 @@ vectorizable_operation (vec_info *vinfo,
> >return false;
> >  }
> >  
> > +  /* ???  We should instead expand the operations here, instead of
> > + relying on vector lowering which has this hard cap on the number
> > + of vector elements below it performs elementwise operations.  */
> > +  if (using_emulated_vectors_p
> > +  && (code == PLUS_EXPR || code == MINUS_EXPR || code == NEGATE_EXPR)
> > +  && ((BITS_PER_WORD / vector_element_bits (vectype)) < 4
> > + || maybe_lt (nunits_out, 4U)))
> > +{
> > +  if (dump_enabled_p ())
> > +   dump_printf (MSG_NOTE, "not using word mode for +- and less than "
> > +"four vector elements\n");
> > +  return false;
> > +}
> > +
> >int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
>vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : 
> > NULL);
> >internal_fn cond_fn = get_conditional_internal_fn (code);
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH] tree-optimization/108724 - vectorized code getting piecewise expanded

2023-02-10 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> This fixes an oversight when removing the hard limits on using
> generic vectors for the vectorizer to enable both SLP and BB
> vectorization to use those.  The vectorizer relies on vector lowering
> to expand plus, minus and negate to bit operations but vector
> lowering has a hard limit on the minimum number of elements per
> work item.  Vectorizer costs for the testcase at hand work out
> to vectorize a loop with just two work items per vector and that
> causes element wise expansion and spilling.
>
> The fix for now is to re-instantiate the hard limit, matching what
> vector lowering does.  For the future the way to go is to emit the
> lowered sequence directly from the vectorizer instead.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

LGTM after reading the vector lowering stuff in the PR trail.

TBH I don't remember when the hard limit was removed though.

Thanks,
Richard

>
> Thanks,
> Richard.
>
>   PR tree-optimization/108724
>   * tree-vect-stmts.cc (vectorizable_operation): Avoid
>   using word_mode vectors when vector lowering will
>   decompose them to elementwise operations.
>
>   * gcc.target/i386/pr108724.c: New testcase.
> ---
>  gcc/testsuite/gcc.target/i386/pr108724.c | 15 +++
>  gcc/tree-vect-stmts.cc   | 14 ++
>  2 files changed, 29 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr108724.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr108724.c 
> b/gcc/testsuite/gcc.target/i386/pr108724.c
> new file mode 100644
> index 000..c4e0e918610
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr108724.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mno-sse" } */
> +
> +int a[16], b[16], c[16];
> +void foo()
> +{
> +  for (int i = 0; i < 16; i++) {
> +a[i] = b[i] + c[i];
> +  }
> +}
> +
> +/* When this is vectorized this shouldn't be expanded piecewise again
> +   which will result in spilling for the upper half access.  */
> +
> +/* { dg-final { scan-assembler-not "\\\[er\\\]sp" } } */
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index c86249adcc3..09b5af603d2 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -6315,6 +6315,20 @@ vectorizable_operation (vec_info *vinfo,
>return false;
>  }
>  
> +  /* ???  We should instead expand the operations here, instead of
> + relying on vector lowering which has this hard cap on the number
> + of vector elements below it performs elementwise operations.  */
> +  if (using_emulated_vectors_p
> +  && (code == PLUS_EXPR || code == MINUS_EXPR || code == NEGATE_EXPR)
> +  && ((BITS_PER_WORD / vector_element_bits (vectype)) < 4
> +   || maybe_lt (nunits_out, 4U)))
> +{
> +  if (dump_enabled_p ())
> + dump_printf (MSG_NOTE, "not using word mode for +- and less than "
> +  "four vector elements\n");
> +  return false;
> +}
> +
>int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
>vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : 
> NULL);
>internal_fn cond_fn = get_conditional_internal_fn (code);


[PATCH] tree-optimization/108724 - vectorized code getting piecewise expanded

2023-02-10 Thread Richard Biener via Gcc-patches
This fixes an oversight when removing the hard limits on using
generic vectors for the vectorizer to enable both SLP and BB
vectorization to use those.  The vectorizer relies on vector lowering
to expand plus, minus and negate to bit operations but vector
lowering has a hard limit on the minimum number of elements per
work item.  Vectorizer costs for the testcase at hand work out
to vectorize a loop with just two work items per vector and that
causes element wise expansion and spilling.
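
(Concretely, for the testcase below: with 64-bit words and 32-bit ints a
word-mode vector holds just two elements, below the four-element cap
that vector lowering applies before expanding elementwise.)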

The fix for now is to re-instantiate the hard limit, matching what
vector lowering does.  For the future the way to go is to emit the
lowered sequence directly from the vectorizer instead.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

PR tree-optimization/108724
* tree-vect-stmts.cc (vectorizable_operation): Avoid
using word_mode vectors when vector lowering will
decompose them to elementwise operations.

* gcc.target/i386/pr108724.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr108724.c | 15 +++
 gcc/tree-vect-stmts.cc   | 14 ++
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr108724.c

diff --git a/gcc/testsuite/gcc.target/i386/pr108724.c 
b/gcc/testsuite/gcc.target/i386/pr108724.c
new file mode 100644
index 000..c4e0e918610
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr108724.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mno-sse" } */
+
+int a[16], b[16], c[16];
+void foo()
+{
+  for (int i = 0; i < 16; i++) {
+a[i] = b[i] + c[i];
+  }
+}
+
+/* When this is vectorized this shouldn't be expanded piecewise again
+   which will result in spilling for the upper half access.  */
+
+/* { dg-final { scan-assembler-not "\\\[er\\\]sp" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c86249adcc3..09b5af603d2 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6315,6 +6315,20 @@ vectorizable_operation (vec_info *vinfo,
   return false;
 }
 
+  /* ???  We should instead expand the operations here, instead of
+ relying on vector lowering which has this hard cap on the number
+ of vector elements below it performs elementwise operations.  */
+  if (using_emulated_vectors_p
+  && (code == PLUS_EXPR || code == MINUS_EXPR || code == NEGATE_EXPR)
+  && ((BITS_PER_WORD / vector_element_bits (vectype)) < 4
+ || maybe_lt (nunits_out, 4U)))
+{
+  if (dump_enabled_p ())
+   dump_printf (MSG_NOTE, "not using word mode for +- and less than "
+"four vector elements\n");
+  return false;
+}
+
   int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
   vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL);
   internal_fn cond_fn = get_conditional_internal_fn (code);
-- 
2.35.3


RE: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-10 Thread Tamar Christina via Gcc-patches
Oops, I realized I forgot to fill in the test results; there were no issues.

> -Original Message-
> From: Gcc-patches  bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Tamar
> Christina via Gcc-patches
> Sent: Thursday, February 9, 2023 5:17 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; rguent...@suse.de; j...@ventanamicro.com
> Subject: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by
> using new optabs [PR108583]
> 
> Hi All,
> 
> As discussed in the ticket, this replaces the approach for optimizing the div 
> by
> bitmask operation from a hook into optabs implemented through
> add_highpart.
> 
> In order to be able to use this we need to check whether the current
> precision has enough bits to do the operation without any of the additions
> overflowing.
> 
> We use range information to determine this and only do the operation if
> we're sure an overflow won't occur.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and  issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR target/108583
>   * doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST):
> Remove.
>   * doc/tm.texi.in: Likewise.
>   * explow.cc (round_push, align_dynamic_address): Revert previous
> patch.
>   * expmed.cc (expand_divmod): Likewise.
>   * expmed.h (expand_divmod): Likewise.
>   * expr.cc (force_operand, expand_expr_divmod): Likewise.
>   * optabs.cc (expand_doubleword_mod,
> expand_doubleword_divmod): Likewise.
>   * internal-fn.def (ADDH): New.
>   * optabs.def (sadd_highpart_optab, uadd_highpart_optab): New.
>   * doc/md.texi: Document them.
>   * doc/rtl.texi: Likewise.
>   * target.def (can_special_div_by_const): Remove.
>   * target.h: Remove tree-core.h include
>   * targhooks.cc (default_can_special_div_by_const): Remove.
>   * targhooks.h (default_can_special_div_by_const): Remove.
>   * tree-vect-generic.cc (expand_vector_operation): Remove hook.
>   * tree-vect-patterns.cc (vect_recog_divmod_pattern): Remove hook
> and
>	implement new optab recognition based on range.
>   * tree-vect-stmts.cc (vectorizable_operation): Remove hook.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/108583
>   * gcc.dg/vect/vect-div-bitmask-4.c: New test.
>   * gcc.dg/vect/vect-div-bitmask-5.c: New test.
> 
> --- inline copy of patch --
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> 7235d34c4b30949febfa10d5a626ac9358281cfa..02004c4b0f4d88dffe980f74080
> 38595e21af35d 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5668,6 +5668,18 @@ represented in RTL using a @code{smul_highpart}
> RTX expression.
>  Similar, but the multiplication is unsigned.  This may be represented  in RTL
> using an @code{umul_highpart} RTX expression.
> 
> +@cindex @code{sadd@var{m}3_highpart} instruction pattern @item
> +@samp{smul@var{m}3_highpart} Perform a signed addition of operands 1
> +and 2, which have mode @var{m}, and store the most significant half of
> +the product in operand 0.
> +The least significant half of the product is discarded.  This may be
> +represented in RTL using a @code{sadd_highpart} RTX expression.
> +
> +@cindex @code{uadd@var{m}3_highpart} instruction pattern @item
> +@samp{uadd@var{m}3_highpart} Similar, but the addition is unsigned.
> +This may be represented in RTL using an @code{uadd_highpart} RTX
> +expression.
> +
>  @cindex @code{madd@var{m}@var{n}4} instruction pattern  @item
> @samp{madd@var{m}@var{n}4}  Multiply operands 1 and 2, sign-extend
> them to mode @var{n}, add diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
> index
> d1380e1eb3ba6b2853686f41f2bf937bfcbed1fe..63a7ef6e566eeea4f14c00343
> d171940ec4222f3 100644
> --- a/gcc/doc/rtl.texi
> +++ b/gcc/doc/rtl.texi
> @@ -2535,6 +2535,17 @@ out in machine mode @var{m}.
> @code{smul_highpart} returns the high part  of a signed multiplication,
> @code{umul_highpart} returns the high part  of an unsigned multiplication.
> 
> +@findex sadd_highpart
> +@findex uadd_highpart
> +@cindex high-part addition
> +@cindex addition high part
> +@item (sadd_highpart:@var{m} @var{x} @var{y}) @itemx
> +(uadd_highpart:@var{m} @var{x} @var{y}) Represents the high-part
> +addition of @var{x} and @var{y} carried out in machine mode @var{m}.
> +@code{sadd_highpart} returns the high part of a signed addition,
> +@code{uadd_highpart} returns the high part of an unsigned addition.
> +
>  @findex fma
>  @cindex fused multiply-add
>  @item (fma:@var{m} @var{x} @var{y} @var{z}) diff --git a/gcc/doc/tm.texi
> b/gcc/doc/tm.texi index
> c6c891972d1e58cd163b259ba96a599d62326865..3ab2031a336b8758d57914840
> 17e6b0d62ab077e 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -6137,22 +6137,6 @@ instruction pattern.  There is no need for the hook
> to handle these two  implementation approaches itself.
>  @end deftypefn
> 
> -@deftypefn {Target Hook} bool
> TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST (enum
> @var{tree_code}, tree 

RE: [PATCH 2/2]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583]

2023-02-10 Thread Tamar Christina via Gcc-patches
Oops, I realized I forgot to fill in the test results; there were no issues.

> -Original Message-
> From: Gcc-patches  bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Tamar
> Christina via Gcc-patches
> Sent: Thursday, February 9, 2023 5:22 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Richard Earnshaw ;
> Marcus Shawcroft ; Kyrylo Tkachov
> ; Richard Sandiford
> 
> Subject: [PATCH 2/2]AArch64 Update div-bitmask to implement new optab
> instead of target hook [PR108583]
> 
> Hi All,
> 
> This replaces the custom division hook with just an implementation through
> add_highpart.  For NEON we implement the add highpart (Addition +
> extraction of the upper highpart of the register in the same precision) as ADD
> + LSR.
> 
> This representation allows us to easily optimize the sequence using existing
> sequences. This gets us a pretty decent sequence using SRA:
> 
> umull   v1.8h, v0.8b, v3.8b
> umull2  v0.8h, v0.16b, v3.16b
> add v5.8h, v1.8h, v2.8h
> add v4.8h, v0.8h, v2.8h
> usrav1.8h, v5.8h, 8
> usrav0.8h, v4.8h, 8
> uzp2v1.16b, v1.16b, v0.16b
> 
> To get the optimal sequence, however, we match (a + ((b + c) >> n)),
> where n is half the precision of the mode of the operation, into addhn +
> uaddw, which is a generally good optimization on its own and gets us back to:
> 
> .L4:
> ldr q0, [x3]
> umull   v1.8h, v0.8b, v5.8b
> umull2  v0.8h, v0.16b, v5.16b
> addhn   v3.8b, v1.8h, v4.8h
> addhn   v2.8b, v0.8h, v4.8h
> uaddw   v1.8h, v1.8h, v3.8b
> uaddw   v0.8h, v0.8h, v2.8b
> uzp2v1.16b, v1.16b, v0.16b
> str q1, [x3], 16
> cmp x3, x4
> bne .L4
> 
> For SVE2 we optimize the initial sequence to the same ADD + LSR which gets
> us:
> 
> .L3:
> ld1bz0.h, p0/z, [x0, x3]
> mul z0.h, p1/m, z0.h, z2.h
> add z1.h, z0.h, z3.h
> usraz0.h, z1.h, #8
> lsr z0.h, z0.h, #8
> st1bz0.h, p0, [x0, x3]
> inchx3
> whilelo p0.h, w3, w2
> b.any   .L3
> .L1:
> ret
> 
> and to get the optimal sequence I match (a + b) >> n (same constraint
> on n) to addhnb, which gets us to:
> 
> .L3:
> ld1bz0.h, p0/z, [x0, x3]
> mul z0.h, p1/m, z0.h, z2.h
> addhnb  z1.b, z0.h, z3.h
> addhnb  z0.b, z0.h, z1.h
> st1bz0.h, p0, [x0, x3]
> inchx3
> whilelo p0.h, w3, w2
> b.any   .L3
> 
> There are multiple RTL representations possible for these optimizations; I did
> not represent them using a zero_extend because we seem very inconsistent
> in this in the backend.  Since they are unspecs we won't match them from
> vector ops anyway. I figured maintainers would prefer this, but my
> maintainer ouija board is still out for repairs :)
> 
> There are no new tests, as new correctness tests were added to the mid-end
> and the existing codegen tests for this already exist.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and  issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR target/108583
>   * config/aarch64/aarch64-simd.md
> (@aarch64_bitmask_udiv3): Remove.
>   (add3_highpart, *bitmask_shift_plus): New.
>   * config/aarch64/aarch64-sve2.md (add3_highpart,
>   *bitmask_shift_plus): New.
>   (@aarch64_bitmask_udiv3): Remove.
>   * config/aarch64/aarch64.cc
>   (aarch64_vectorize_can_special_div_by_constant): Removed.
>   * config/aarch64/iterators.md (UNSPEC_SADD_HIGHPART,
>   UNSPEC_UADD_HIGHPART, ADD_HIGHPART): New.
> 
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> index
> 7f212bf37cd2c120dceb7efa733c9fa76226f029..26871a56d1fdb134f0ad9d828ce
> 68a8df0272c53 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -4867,62 +4867,48 @@ (define_expand
> "aarch64_hn2"
>}
>  )
> 
> -;; div optimizations using narrowings
> -;; we can do the division e.g. shorts by 255 faster by calculating it as -;; 
> (x +
> ((x + 257) >> 8)) >> 8 assuming the operation is done in -;; double the
> precision of x.
> -;;
> -;; If we imagine a short as being composed of two blocks of bytes then -;;
> adding 257 or 0b_0001__0001 to the number is equivalent to -;;
> adding 1 to each sub component:
> -;;
> -;;  short value of 16-bits
> -;; ┌──┬┐
> -;; │  ││
> -;; └──┴┘
> -;;   8-bit part1 ▲  8-bit part2   ▲
> -;;   ││
> -;;   ││
> -;;  +1   +1
> -;;
> -;; after the first addition, we have to shift right by 8, and narrow the -;;
> results back to a byte.  Remember that the addition must be done in -;;
> double the precision of the input.  Since 

[PATCH] RISC-V: Add Full 'v' extension predicate to vsmul intrinsic

2023-02-10 Thread juzhe . zhong
From: Ju-Zhe Zhong 

According to the RVV ISA, vsmul is not supported for EEW=64 in Zve64*,
so add the full 'V' extension requirement to the predicate of the vsmul
intrinsics.
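
For background, vsmul is a signed saturating fixed-point multiply: the
double-width product is shifted right by SEW-1 and clipped back to SEW
bits.  A scalar model for SEW=8 (rounding behaviour elided; illustrative
only, not the intrinsic API):

  #include <stdint.h>

  static inline int8_t
  vsmul_sew8 (int8_t a, int8_t b)
  {
    int16_t p = (int16_t) a * (int16_t) b; /* double-width product */
    int16_t r = (int16_t) (p >> 7);        /* shift by SEW - 1 */
    /* Only a == b == INT8_MIN overflows the result; saturate.  */
    return r > INT8_MAX ? INT8_MAX : (int8_t) r;
  }

For EEW=64 the intermediate product needs 128 bits, which is presumably
why Zve64* does not require this form while the full 'V' extension does.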

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-functions.def (vsmul): Change 
iterators.

---
 gcc/config/riscv/riscv-vector-builtins-functions.def | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def
index cea44c8fb20..66fa63530f3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -170,14 +170,14 @@ DEF_RVV_FUNCTION (vaadd, alu, full_preds, i_vvv_ops)
 DEF_RVV_FUNCTION (vasub, alu, full_preds, i_vvv_ops)
 DEF_RVV_FUNCTION (vaaddu, alu, full_preds, u_vvv_ops)
 DEF_RVV_FUNCTION (vasubu, alu, full_preds, u_vvv_ops)
-DEF_RVV_FUNCTION (vsmul, alu, full_preds, i_vvv_ops)
+DEF_RVV_FUNCTION (vsmul, alu, full_preds, full_v_i_vvv_ops)
 DEF_RVV_FUNCTION (vssra, alu, full_preds, i_shift_vvv_ops)
 DEF_RVV_FUNCTION (vssrl, alu, full_preds, u_shift_vvv_ops)
 DEF_RVV_FUNCTION (vaadd, alu, full_preds, i_vvx_ops)
 DEF_RVV_FUNCTION (vasub, alu, full_preds, i_vvx_ops)
 DEF_RVV_FUNCTION (vaaddu, alu, full_preds, u_vvx_ops)
 DEF_RVV_FUNCTION (vasubu, alu, full_preds, u_vvx_ops)
-DEF_RVV_FUNCTION (vsmul, alu, full_preds, i_vvx_ops)
+DEF_RVV_FUNCTION (vsmul, alu, full_preds, full_v_i_vvx_ops)
 DEF_RVV_FUNCTION (vssra, alu, full_preds, i_shift_vvx_ops)
 DEF_RVV_FUNCTION (vssrl, alu, full_preds, u_shift_vvx_ops)
 DEF_RVV_FUNCTION (vnclipu, narrow_alu, full_preds, u_narrow_shift_vwv_ops)
-- 
2.36.3
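
(For illustration, the kind of use this patch gates -- a sketch assuming
the __riscv_-prefixed intrinsic names of this era, without the explicit
rounding-mode operand that later intrinsic versions added; exact
spellings may differ:)

#include <stddef.h>
#include <riscv_vector.h>

/* An EEW=64 fixed-point saturating multiply.  After this patch it is
   only accepted with the full 'V' extension (e.g. -march=rv64gcv) and
   rejected for Zve64* targets, matching the RVV spec.  */
vint64m1_t
saturating_mul (vint64m1_t a, vint64m1_t b, size_t vl)
{
  return __riscv_vsmul_vv_i64m1 (a, b, vl);
}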



[GCC] In 'contrib/config-list.mk', clarify i686-symbolics-gnu to i686-gnu (was: RFA: Add makefile for cross-configuration torture test)

2023-02-10 Thread Thomas Schwinge
Hi!

On 2011-04-13T06:49:31-0400, Joern Rennecke  wrote:
> [...]
> This Makefile is supposed to give coverage of all the main configure targets
> and notable variants that enable different config files.
> [...]

> --- contrib/config-list.mk(revision 0)
> +++ contrib/config-list.mk(revision 0)

> +LIST = [...]
> +  [...]
> +  i686-elf i686-kopensolaris-gnu i686-symbolics-gnu i686-pc-msdosdjgpp \
> +  [...]

Unless anybody has any rationale to share about i686-symbolics-gnu,
I intend to soon push the attached
"In 'contrib/config-list.mk', clarify i686-symbolics-gnu to i686-gnu".


Grüße
 Thomas


From 18712980c1007d40c30d1893d47475c928e19e95 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 10 Feb 2023 10:43:24 +0100
Subject: [PATCH] In 'contrib/config-list.mk', clarify i686-symbolics-gnu to
 i686-gnu

i686-symbolics-gnu has been present ever since the first revision of
'contrib/config-list.mk', but it's not clear to me whether that was meant
to be Symbolics as in the manufacturer, with a GNU (that is, GNU/Hurd)
kernel/operating system (user land), or a Symbolics kernel with a GNU
operating system (user land).

I can't find any mention of "Symbolics" in the history of 'config.sub'
upstream.

Either way, GCC configures i686-symbolics-gnu exactly the same as i686-gnu:

$ sed -n -e '/Using .* host machine hooks\.$/q' -e '/^Using the following target machine macro files:$/,$p' log/i686-gnu-make.out
Using the following target machine macro files:
[...]/gcc/config/vxworks-dummy.h
[...]/gcc/config/i386/i386.h
[...]/gcc/config/i386/unix.h
[...]/gcc/config/i386/att.h
[...]/gcc/config/elfos.h
[...]/gcc/config/gnu-user.h
[...]/gcc/config/glibc-stdint.h
[...]/gcc/config/i386/gnu-user-common.h
[...]/gcc/config/i386/gnu-user.h
[...]/gcc/config/gnu.h
[...]/gcc/config/i386/gnu.h
[...]/gcc/config/initfini-array.h

..., so let's clarify i686-symbolics-gnu to i686-gnu.

	contrib/
	* config-list.mk (LIST): Clarify i686-symbolics-gnu to i686-gnu.
---
 contrib/config-list.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index 20b8f4a196f..4c0a94d5c37 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -50,7 +50,7 @@ LIST = aarch64-elf aarch64-freebsd13 aarch64-linux-gnu aarch64-rtems \
   i686-pc-linux-gnu i686-apple-darwin i686-apple-darwin9 i686-apple-darwin10 \
   i686-freebsd13 i686-kfreebsd-gnu \
   i686-netbsdelf9 \
-  i686-openbsd i686-elf i686-kopensolaris-gnu i686-symbolics-gnu \
+  i686-openbsd i686-elf i686-kopensolaris-gnu i686-gnu \
   i686-pc-msdosdjgpp i686-lynxos i686-nto-qnx \
   i686-rtems i686-solaris2.11 i686-wrs-vxworks \
   i686-wrs-vxworksae \
-- 
2.25.1



[PATCH] tree-optimization/106722 - fix CD-DCE edge marking

2023-02-10 Thread Richard Biener via Gcc-patches
The following fixes a latent issue when marking control edges: if the
block whose last statement we want to mark contains no control statement
(or no statements at all), nothing gets marked as necessary, and we fail
to mark the control dependent edges of that block itself.
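
The fix relies on bitmap_set_bit's return-value contract: it returns
true iff the bit was newly set.  A standalone toy sketch of just that
contract (illustrative only, not GCC's actual bitmap implementation):

#include <stdbool.h>
#include <stdio.h>

static unsigned long bits;

/* Like GCC's bitmap_set_bit: set the bit and report whether it
   changed, so callers can tell a first visit from a repeat one.  */
static bool
set_bit (unsigned idx)
{
  unsigned long mask = 1ul << idx;
  bool changed = (bits & mask) == 0;
  bits |= mask;
  return changed;
}

int
main (void)
{
  printf ("%d\n", set_bit (3));  /* 1: first visit, bit newly set */
  printf ("%d\n", set_bit (3));  /* 0: already set, skip re-marking */
  return 0;
}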

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Does this look OK?

Thanks,
Richard.

PR tree-optimization/106722
* tree-ssa-dce.cc (mark_last_stmt_necessary): Return
whether we marked a stmt.
(mark_control_dependent_edges_necessary): When
mark_last_stmt_necessary didn't mark any stmt, make sure
to mark its control dependent edges.
(propagate_necessity): Likewise.

* gcc.dg/torture/pr108737.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr108737.c | 19 +++
 gcc/tree-ssa-dce.cc | 20 +---
 2 files changed, 32 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr108737.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr108737.c b/gcc/testsuite/gcc.dg/torture/pr108737.c
new file mode 100644
index 00000000000..c8388bcabeb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr108737.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+
+extern void exit (int);
+extern void abort (void);
+
+void __attribute__((noipa)) foo () { exit (0); }
+
+void __attribute__((noipa)) blah (int x)
+{
+  while (1) {
+  if(x) foo();
+  }
+}
+
+int main()
+{
+  blah (1);
+  abort ();
+}
diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
index b2fe9f4f55e..21b3294fc86 100644
--- a/gcc/tree-ssa-dce.cc
+++ b/gcc/tree-ssa-dce.cc
@@ -327,17 +327,23 @@ mark_stmt_if_obviously_necessary (gimple *stmt, bool aggressive)
 
 /* Mark the last statement of BB as necessary.  */
 
-static void
+static bool
 mark_last_stmt_necessary (basic_block bb)
 {
   gimple *stmt = last_stmt (bb);
 
-  bitmap_set_bit (last_stmt_necessary, bb->index);
+  if (!bitmap_set_bit (last_stmt_necessary, bb->index))
+return true;
+
   bitmap_set_bit (bb_contains_live_stmts, bb->index);
 
   /* We actually mark the statement only if it is a control statement.  */
   if (stmt && is_ctrl_stmt (stmt))
-mark_stmt_necessary (stmt, true);
+{
+  mark_stmt_necessary (stmt, true);
+  return true;
+}
+  return false;
 }
 
 
@@ -369,8 +375,8 @@ mark_control_dependent_edges_necessary (basic_block bb, bool ignore_self)
  continue;
}
 
-  if (!bitmap_bit_p (last_stmt_necessary, cd_bb->index))
-   mark_last_stmt_necessary (cd_bb);
+  if (!mark_last_stmt_necessary (cd_bb))
+   mark_control_dependent_edges_necessary (cd_bb, false);
 }
 
   if (!skipped)
@@ -790,8 +796,8 @@ propagate_necessity (bool aggressive)
  if (gimple_bb (stmt)
  != get_immediate_dominator (CDI_POST_DOMINATORS, arg_bb))
{
- if (!bitmap_bit_p (last_stmt_necessary, arg_bb->index))
-   mark_last_stmt_necessary (arg_bb);
+ if (!mark_last_stmt_necessary (arg_bb))
+   mark_control_dependent_edges_necessary (arg_bb, false);
}
  else if (arg_bb != ENTRY_BLOCK_PTR_FOR_FN (cfun)
   && !bitmap_bit_p (visited_control_parents,
-- 
2.35.3


Re: [PATCH] Add x86_64-gnu target to contrib/config-list.mk

2023-02-10 Thread Thomas Schwinge
Hi!

On 2023-02-02T00:55:01-0500, Flavio Cruz  wrote:
> contrib/ChangeLog:
>   * config-list.mk: Add x86_64-gnu to list of archs.
>
> Signed-off-by: Flavio Cruz 

Thanks, pushed to GCC master branch in
commit e635681dd26abf8243b49f6fd39d3582d57225ba
"Add x86_64-gnu target to contrib/config-list.mk".


Grüße
 Thomas


>  contrib/config-list.mk | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/contrib/config-list.mk b/contrib/config-list.mk
> index 20b8f4a196f..661b11c9bbd 100644
> --- a/contrib/config-list.mk
> +++ b/contrib/config-list.mk
> @@ -96,7 +96,7 @@ LIST = aarch64-elf aarch64-freebsd13 aarch64-linux-gnu 
> aarch64-rtems \
>sparc-wrs-vxworks sparc64-elf sparc64-rtems sparc64-linux \
>sparc64-netbsd sparc64-openbsd \
>v850e1-elf v850e-elf v850-elf v850-rtems vax-linux-gnu \
> -  vax-netbsdelf visium-elf x86_64-apple-darwin \
> +  vax-netbsdelf visium-elf x86_64-apple-darwin x86_64-gnu \
>x86_64-pc-linux-gnuOPT-with-fpmath=avx \
>x86_64-elfOPT-with-fpmath=sse x86_64-freebsd13 x86_64-netbsd \
>x86_64-w64-mingw32 \


Re: [PATCH 3/3] vect: inbranch SIMD clones

2023-02-10 Thread Jakub Jelinek via Gcc-patches
On Fri, Jan 06, 2023 at 12:20:33PM +0000, Andrew Stubbs wrote:
> > > +/* Ensure that the in-branch simd clones are used on targets that support
> > > +   them.  These counts include all calls and definitions.  */
> > > +
> > > +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */
> > 
> > Drop the lines above.
> 
> I don't want to drop the comment because I get so frustrated by testcases
> that fail when something changes and it's not obvious what the original
> author was actually trying to test.
> 
> I've tried to fix the -flto thing and I can't figure out how. The problem
> seems to be that there are two dump files from the two compiler invocations
> and it scans the wrong one. Aarch64 has the same problem.

There are two dumps because it is a dg-do run test.
I think it would be better to separate it: for all cases have one
test with defaulted dg-do (in vect.exp that is either dg-do run or dg-do
compile:
# If the target system supports vector instructions, the default action
# for a test is 'run', otherwise it's 'compile'.
) without the dg-final, and then another one with the same TYPE which would
be forced to dg-do compile, with dg-final and
dg-additional-options "-ffat-lto-objects"; then you get a single dump only.
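
Something along these lines for the compile-only companion (a sketch
only; the declare-simd function and loop are placeholders standing in
for the real testcase body):

/* { dg-do compile } */
/* { dg-additional-options "-fopenmp-simd -ffat-lto-objects" } */

#pragma omp declare simd inbranch
extern int foo (int a);

int
bar (int *x, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    if (x[i] > 0)
      s += foo (x[i]);
  return s;
}

/* { dg-final { scan-tree-dump-times " = foo.simdclone" 2 "vect" } } */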

> > > +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target 
> > > x86_64-*-* } } } */
> > > +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target 
> > > amdgcn-*-* } } } */
> > 
> > And scan-tree-dump-times " = foo.simdclone" 2 "optimized"; I'd think that
> > should be the right number for all of x86_64, amdgcn and aarch64.  And
> > please don't forget about i?86-*-* too.
> 
> I've switched the pattern and changed to using the "vect" dump (instead of
> "optimized") so that the later transformations don't mess up the counts.
> However there are still other reasons why the count varies. It might be that
> those can be turned off by options somehow, but probably testing those cases
> is valuable too. The values are now 2, 3, or 4 instead of 18, so that's an
> improvement.

But it still varies between the architectures, so it is an extra maintenance
nightmare.

> > > +/* TODO: aarch64 */
> > 
> > For aarch64, one would need to include it in 
> > check_effective_target_vect_simd_clones
> > first...
> 
> I've done so and tested it, but that's not included in the patch because
> there were other testcases that started reporting fails. None of the new
> testcases fail for Aarch64.

Sure, that would be for a separate patch.

Anyway, if you want, commit the patch as is and tweak the testcases if
possible incrementally.

Jakub



Re: [wwwdocs] gcc-13/changes.html: Document C++ -fexcess-precision=standard

2023-02-10 Thread Gerald Pfeifer
On Thu, 9 Feb 2023, Jakub Jelinek wrote:
> Martin Liska mentioned that porting_to.html doesn't mention
> the C++ excess precision changes.  Not really sure if porting_to
> should document those, but I think changes.html certainly should.

Do you think this is a widespread issue for existing software? Did
it materialize a couple of times when building/testing Fedora with GCC 13
snapshots (assuming you have done so, as in the past)?
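
To make it concrete, the kind of code the change affects -- a minimal
illustration of my own, assuming x87 math (e.g. 32-bit x86 without
-mfpmath=sse), not an example from the patch:

#include <stdio.h>

int
main (void)
{
  volatile float a = 1.0f, b = 3.0f;
  float q = a / b;
  /* Under x87 excess precision, a / b below is still held in long
     double, while q was rounded to float on assignment, so the
     comparison can be false.  -fexcess-precision=standard makes the
     rounding points well defined, and GCC 13 applies the same rules
     to C++ that C already followed.  */
  printf ("%s\n", q == a / b ? "equal" : "not equal");
  return 0;
}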

> Ok for wwwdocs?

Yes, thank you! Two minor suggestions/questions below:

> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> +  -fexcess-precision=fast.  The option affects mainly

Here I'd say "mainly affects".

> +  IA-32/x86-64 where when defaulting to x87 math and in some cases on
> +  Motorola 68000 float and double expressions
> +  are evaluated in long double precision and S/390, System 
> z,
> +  IBM z Systems where float expressions are evaluated in
> +  double precision.

The "where when" part proved a bit tricky for my brain. :-) 

I think it is precise, but am wondering whether

  ...IA-32/x64 using x87 math and in some cases on Motorola 68000, where
  float and double expressions are evaluated...

might work? What do you think?

Gerald