[PATCH] c++: P2513R4, char8_t Compatibility and Portability Fix [PR106656]

2022-09-23 Thread Marek Polacek via Gcc-patches
P0482R6, which added char8_t, didn't allow

  const char arr[] = u8"howdy";

because it said "Declarations of arrays of char may currently be initialized
with UTF-8 string literals. Under this proposal, such initializations would
become ill-formed."  This caused too many issues, so P2513R4 alleviates some
of those compatibility problems.  In particular, "Arrays of char or unsigned
char may now be initialized with a UTF-8 string literal."  This restriction
has been lifted for initialization only, not implicit conversions.  Also,
my reading is that 'signed char' was excluded from the allowable conversions.

This is supposed to be treated as a DR in C++20.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/106656

gcc/c-family/ChangeLog:

* c-cppbuiltin.cc (c_cpp_builtins): Update value of __cpp_char8_t
for C++20.

gcc/cp/ChangeLog:

* typeck2.cc (array_string_literal_compatible_p): Allow
initializing arrays of char or unsigned char by a UTF-8 string literal.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/feat-cxx2b.C: Adjust.
* g++.dg/cpp2a/feat-cxx2a.C: Likewise.
* g++.dg/ext/char8_t-feature-test-macro-2.C: Likewise.
* g++.dg/ext/char8_t-init-2.C: Likewise.
* g++.dg/cpp2a/char8_t3.C: New test.
* g++.dg/cpp2a/char8_t4.C: New test.
---
 gcc/c-family/c-cppbuiltin.cc                  |  2 +-
 gcc/cp/typeck2.cc                             |  9 +++++++++
 gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C       |  4 ++--
 gcc/testsuite/g++.dg/cpp2a/char8_t3.C         | 37 +++++++++++++++++++++
 gcc/testsuite/g++.dg/cpp2a/char8_t4.C         | 17 ++++++++++
 gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C       |  4 ++--
 .../g++.dg/ext/char8_t-feature-test-macro-2.C |  4 ++--
 gcc/testsuite/g++.dg/ext/char8_t-init-2.C     |  4 ++--
 8 files changed, 72 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/char8_t3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/char8_t4.C

diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
index a1557eb23d5..b709f845c81 100644
--- a/gcc/c-family/c-cppbuiltin.cc
+++ b/gcc/c-family/c-cppbuiltin.cc
@@ -1112,7 +1112,7 @@ c_cpp_builtins (cpp_reader *pfile)
   if (flag_threadsafe_statics)
cpp_define (pfile, "__cpp_threadsafe_static_init=200806L");
   if (flag_char8_t)
-cpp_define (pfile, "__cpp_char8_t=201811L");
+   cpp_define (pfile, "__cpp_char8_t=202207L");
 #ifndef THREAD_MODEL_SPEC
   /* Targets that define THREAD_MODEL_SPEC need to define
 __STDCPP_THREADS__ in their config/XXX/XXX-c.c themselves.  */
diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index 75fd0e2a9bf..739097a9734 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -1118,6 +1118,15 @@ array_string_literal_compatible_p (tree type, tree init)
   if (ordinary_char_type_p (to_char_type)
   && ordinary_char_type_p (from_char_type))
 return true;
+
+  /* P2513 (C++20/C++23): "an array of char or unsigned char may
+ be initialized by a UTF-8 string literal, or by such a string
+ literal enclosed in braces."  */
+  if (from_char_type == char8_type_node
+  && (to_char_type == char_type_node
+ || to_char_type == unsigned_char_type_node))
+return true;
+
   return false;
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C 
b/gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C
index d3e40724085..0537e1d24b5 100644
--- a/gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C
+++ b/gcc/testsuite/g++.dg/cpp23/feat-cxx2b.C
@@ -504,8 +504,8 @@
 
 #ifndef __cpp_char8_t
 #  error "__cpp_char8_t"
-#elif __cpp_char8_t != 201811
-#  error "__cpp_char8_t != 201811"
+#elif __cpp_char8_t != 202207
+#  error "__cpp_char8_t != 202207"
 #endif
 
 #ifndef __cpp_designated_initializers
diff --git a/gcc/testsuite/g++.dg/cpp2a/char8_t3.C 
b/gcc/testsuite/g++.dg/cpp2a/char8_t3.C
new file mode 100644
index 000..071a718c4d0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/char8_t3.C
@@ -0,0 +1,37 @@
+// PR c++/106656 - P2513 - char8_t Compatibility and Portability Fixes
+// { dg-do compile { target c++20 } }
+
+const char *p1 = u8""; // { dg-error "invalid conversion" }
+const unsigned char *p2 = u8""; // { dg-error "invalid conversion" }
+const signed char *p3 = u8""; // { dg-error "invalid conversion" }
+const char *p4 = { u8"" }; // { dg-error "invalid conversion" }
+const unsigned char *p5 = { u8"" }; // { dg-error "invalid conversion" }
+const signed char *p6 = { u8"" }; // { dg-error "invalid conversion" }
+const char *p7 = static_cast<const char *>(u8""); // { dg-error "invalid" }
+const char a1[] = u8"text";
+const unsigned char a2[] = u8"";
+const signed char a3[] = u8""; // { dg-error "cannot initialize array" }
+const char a4[] = { u8"text" };
+const unsigned char a5[] = { u8"" };
+const signed char a6[] = { u8"" }; // { dg-error "cannot initialize array" }
+
+const char *
+resource_id ()
+{
+  static const char res_id[] = u8"";
+  return res_id;
+}
+
+const char8_t x[] = 

[committed] libstdc++: Add test for type traits not having friend access

2022-09-23 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

This ensures that the std::is_assignable and std::is_assignable_v
traits are evaluated "in a context unrelated" to the argument types.

libstdc++-v3/ChangeLog:

* testsuite/20_util/is_assignable/requirements/access.cc:
New test.
---
 .../is_assignable/requirements/access.cc  | 22 +++
 1 file changed, 22 insertions(+)
 create mode 100644 
libstdc++-v3/testsuite/20_util/is_assignable/requirements/access.cc

diff --git 
a/libstdc++-v3/testsuite/20_util/is_assignable/requirements/access.cc 
b/libstdc++-v3/testsuite/20_util/is_assignable/requirements/access.cc
new file mode 100644
index 000..a96fba654cd
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/is_assignable/requirements/access.cc
@@ -0,0 +1,22 @@
+// { dg-do compile { target c++11 } }
+
+#include <type_traits>
+
+class S {
+  operator int();
+  friend void g(); // #1
+};
+
+void
+g()
+{
+  int i = 0;
+  S s;
+  i = s; // this works, because we're inside a friend.
+
+  // But the traits are evaluated in "a context unrelated to either type".
+  static_assert( ! std::is_assignable<int&, S>::value, "unfriendly");
+#if __cplusplus >= 201703L
+  static_assert( ! std::is_assignable_v<int&, S>, "unfriendly");
+#endif
+}
-- 
2.37.3



[committed] libstdc++: Fix std::is_nothrow_invocable_r for uncopyable prvalues [PR91456]

2022-09-23 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

This is the last missing piece of PR 91456.

This also removes the only use of the C++11 version of
std::is_nothrow_convertible, which was just renamed to
__is_nothrow_convertible_lib. We can remove that now.

libstdc++-v3/ChangeLog:

PR libstdc++/91456
* include/std/type_traits (__is_nothrow_convertible_lib): Remove.
(__is_invocable_impl::__nothrow_type): New member type which
checks if the conversion can throw.
(__is_nt_invocable_impl): Replace class template with alias
template to __is_invocable_impl::__nothrow_type.
* testsuite/20_util/is_nothrow_invocable/91456.cc: New test.
* testsuite/20_util/is_nothrow_convertible/value.cc: Remove
macro used by value_ext.cc test.
* testsuite/20_util/is_nothrow_convertible/value_ext.cc: Remove
test for non-standard __is_nothrow_convertible_lib trait.
---
 libstdc++-v3/include/std/type_traits  | 45 ++-
 .../20_util/is_nothrow_convertible/value.cc   |  2 -
 .../91456.cc} | 19 +---
 3 files changed, 36 insertions(+), 30 deletions(-)
 rename libstdc++-v3/testsuite/20_util/{is_nothrow_convertible/value_ext.cc => 
is_nothrow_invocable/91456.cc} (59%)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 1797b9e97f7..7c635313a95 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1451,12 +1451,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 #pragma GCC diagnostic pop
 
-  // is_nothrow_convertible for C++11
-  template<typename _From, typename _To>
-struct __is_nothrow_convertible_lib
-: public __is_nt_convertible_helper<_From, _To>::type
-{ };
-
 #if __cplusplus > 201703L
 #define __cpp_lib_is_nothrow_convertible 201806L
   /// is_nothrow_convertible
@@ -2825,7 +2819,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // The primary template is used for invalid INVOKE expressions.
  template<typename _Result, typename _Ret,
           bool = is_void<_Ret>::value, typename = void>
-struct __is_invocable_impl : false_type { };
+struct __is_invocable_impl
+: false_type
+{
+  using __nothrow_type = false_type; // For is_nothrow_invocable_r
+};
 
   // Used for valid INVOKE and INVOKE<void> expressions.
   template<typename _Result, typename _Ret>
@@ -2833,7 +2831,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     struct __is_invocable_impl<_Result, _Ret,
                                /* is_void<_Ret> = */ true,
                                __void_t<typename _Result::type>>
 : true_type
-{ };
+{
+  using __nothrow_type = true_type; // For is_nothrow_invocable_r
+};
 
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wctor-dtor-privacy"
@@ -2845,23 +2845,30 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 {
 private:
   // The type of the INVOKE expression.
-  // Unlike declval, this doesn't add_rvalue_reference.
-  static typename _Result::type _S_get();
+  // Unlike declval, this doesn't add_rvalue_reference, so it respects
+  // guaranteed copy elision.
+  static typename _Result::type _S_get() noexcept;
 
       template<typename _Tp>
-   static void _S_conv(_Tp);
+   static void _S_conv(_Tp) noexcept;
 
   // This overload is viable if INVOKE(f, args...) can convert to _Tp.
-  template<typename _Tp, typename = decltype(_S_conv<_Tp>(_S_get()))>
-	static true_type
+  template<typename _Tp, bool _Check_Noex = false,
+	   typename = decltype(_S_conv<_Tp>(_S_get())),
+	   bool _Noex = noexcept(_S_conv<_Tp>(_S_get()))>
+	static __bool_constant<_Check_Noex ? _Noex : true>
_S_test(int);
 
-  template<typename _Tp>
+  template<typename _Tp, bool = false>
static false_type
_S_test(...);
 
 public:
+  // For is_invocable_r
   using type = decltype(_S_test<_Ret>(1));
+
+  // For is_nothrow_invocable_r
+  using __nothrow_type = decltype(_S_test<_Ret, true>(1));
 };
 #pragma GCC diagnostic pop
 
@@ -2992,15 +2999,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
   /// @cond undocumented
-  template<typename _Result, typename _Ret, typename = void>
-    struct __is_nt_invocable_impl : false_type { };
-
   template<typename _Result, typename _Ret>
-    struct __is_nt_invocable_impl<_Result, _Ret,
-				  __void_t<typename _Result::type>>
-    : __or_<is_void<_Ret>,
-	    __is_nothrow_convertible_lib<typename _Result::type, _Ret>>::type
-    { };
+    using __is_nt_invocable_impl
+      = typename __is_invocable_impl<_Result, _Ret>::__nothrow_type;
   /// @endcond
 
   /// std::is_nothrow_invocable_r
diff --git a/libstdc++-v3/testsuite/20_util/is_nothrow_convertible/value.cc 
b/libstdc++-v3/testsuite/20_util/is_nothrow_convertible/value.cc
index e9aded73624..a2686285052 100644
--- a/libstdc++-v3/testsuite/20_util/is_nothrow_convertible/value.cc
+++ b/libstdc++-v3/testsuite/20_util/is_nothrow_convertible/value.cc
@@ -21,9 +21,7 @@
 #include 
 #include 
 
-#ifndef IS_NT_CONVERTIBLE_DEFINED
 using std::is_nothrow_convertible;
-#endif
 
 void test01()
 {
diff --git a/libstdc++-v3/testsuite/20_util/is_nothrow_convertible/value_ext.cc 
b/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/91456.cc
similarity index 59%
rename from libstdc++-v3/testsuite/20_util/is_nothrow_convertible/value_ext.cc
rename to 

[committed] testsuite: Add more C2x tests

2022-09-23 Thread Joseph Myers
There are some new requirements in C2x where GCC already behaves as
required (for all standard versions), where previous standard versions
either had weaker requirements permitting the GCC behavior, or were
arguably defective in what they said in that area.  Add tests that
specifically verify GCC behaves as expected for C2x.  (There may be
further such tests to be added in future for already-supported C2x
features.)

* Compound literals in function parameter lists have automatic storage
  duration.  (This is a case where strictly this wasn't specified by
  previous standard versions, but it seems to make more sense to treat
  this as a defect in those versions than to implement something
  different conditionally for those versions.)

* Concatenation of string literals with different prefixes is a
  constraint violation (previously it was implementation-defined
  whether it was permitted, and GCC did not permit it).

* UCNs above 0x10 are a constraint violation; previously they were
  implicitly undefined behavior by virtue of wording about "designates
  the character" referring to code points outside the ISO/IEC 10646
  range; GCC diagnosed such UCNs since commit
  0900e29cdbc533fecf2a311447bbde17f101bbd6 (Sep 2019).  The test I
  added also has more detailed coverage of what lower-valued UCNs are
  accepted.

Tested for x86_64-pc-linux-gnu.

* gcc.dg/c2x-complit-1.c, gcc.dg/c2x-concat-1.c,
gcc.dg/cpp/c2x-ucn-1.c: New tests.

diff --git a/gcc/testsuite/gcc.dg/c2x-complit-1.c 
b/gcc/testsuite/gcc.dg/c2x-complit-1.c
new file mode 100644
index 000..af92d4d0a9a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2x-complit-1.c
@@ -0,0 +1,35 @@
+/* Test storage duration of compound literals in parameter lists for C2x.  */
+/* { dg-do run } */
+/* { dg-options "-std=c2x -pedantic-errors" } */
+
+extern void abort (void);
+extern void exit (int);
+
+int x;
+
+void f (int a[(int) { x }]);
+
+int *q;
+
+int
+fp (int *p)
+{
+  q = p;
+  return 1;
+}
+
+void
+g (int a, int b[fp ((int [2]) { a, a + 2 })])
+{
+  if (q[0] != a || q[1] != a + 2)
+abort ();
+}
+
+int
+main (void)
+{
+  int t[1] = { 0 };
+  g (1, t);
+  g (2, t);
+  exit (0);
+}
diff --git a/gcc/testsuite/gcc.dg/c2x-concat-1.c 
b/gcc/testsuite/gcc.dg/c2x-concat-1.c
new file mode 100644
index 000..e92eaaf639e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2x-concat-1.c
@@ -0,0 +1,31 @@
+/* Test errors for bad string literal concatenation.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x -pedantic-errors" } */
+
+void *pLU = L"" U""; /* { dg-error "non-standard concatenation" } */
+void *pL_U = L"" "" U""; /* { dg-error "non-standard concatenation" } */
+void *pLu = L"" u""; /* { dg-error "non-standard concatenation" } */
+void *pL_u = L"" "" u""; /* { dg-error "non-standard concatenation" } */
+void *pLu8 = L"" u8""; /* { dg-error "non-standard concatenation" } */
+void *pL_u8 = L"" "" u8""; /* { dg-error "non-standard concatenation" } */
+
+void *pUL = U"" L""; /* { dg-error "non-standard concatenation" } */
+void *pU_L = U"" "" L""; /* { dg-error "non-standard concatenation" } */
+void *pUu = U"" u""; /* { dg-error "non-standard concatenation" } */
+void *pU_u = U"" "" u""; /* { dg-error "non-standard concatenation" } */
+void *pUu8 = U"" u8""; /* { dg-error "non-standard concatenation" } */
+void *pU_u8 = U"" "" u8""; /* { dg-error "non-standard concatenation" } */
+
+void *puL = u"" L""; /* { dg-error "non-standard concatenation" } */
+void *pu_L = u"" "" L""; /* { dg-error "non-standard concatenation" } */
+void *puU = u"" U""; /* { dg-error "non-standard concatenation" } */
+void *pu_U = u"" "" U""; /* { dg-error "non-standard concatenation" } */
+void *puu8 = u"" u8""; /* { dg-error "non-standard concatenation" } */
+void *pu_u8 = u"" "" u8""; /* { dg-error "non-standard concatenation" } */
+
+void *pu8L = u8"" L""; /* { dg-error "non-standard concatenation" } */
+void *pu8_L = u8"" "" L""; /* { dg-error "non-standard concatenation" } */
+void *pu8U = u8"" U""; /* { dg-error "non-standard concatenation" } */
+void *pu8_U = u8"" "" U""; /* { dg-error "non-standard concatenation" } */
+void *pu8u = u8"" u""; /* { dg-error "non-standard concatenation" } */
+void *pu8_u = u8"" "" u""; /* { dg-error "non-standard concatenation" } */
diff --git a/gcc/testsuite/gcc.dg/cpp/c2x-ucn-1.c 
b/gcc/testsuite/gcc.dg/cpp/c2x-ucn-1.c
new file mode 100644
index 000..a4998aeda85
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/cpp/c2x-ucn-1.c
@@ -0,0 +1,996 @@
+/* Test characters not permitted in UCNs in C2x.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x -pedantic-errors" } */
+
+#if U'\u' /* { dg-error "is not a valid universal character" } */
+#endif
+void *tu0 = U"\u"; /* { dg-error "is not a valid universal character" } */
+#if U'\U' /* { dg-error "is not a valid universal character" } */
+#endif
+void *tU0 = U"\U"; /* { dg-error "is not a valid universal character" 
} */
+#if U'\u0001' /* { 

[r13-2804 Regression] FAIL: gcc.target/i386/avx256-unaligned-store-3.c scan-assembler vmovupd.*movv2df_internal/4 on Linux/x86_64

2022-09-23 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

a282f086ef26d90e9785e992cd09a0d118b24695 is the first bad commit
commit a282f086ef26d90e9785e992cd09a0d118b24695
Author: Hu, Lin1 
Date:   Tue Sep 13 16:28:54 2022 +0800

i386: Optimize code generation of 
__mm256_zextsi128_si256(__mm_set1_epi8(-1))

caused

FAIL: gcc.target/i386/avx256-unaligned-store-3.c scan-assembler vextractf128
FAIL: gcc.target/i386/avx256-unaligned-store-3.c scan-assembler 
vmovupd.*movv2df_internal/4

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2804/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx256-unaligned-store-3.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx256-unaligned-store-3.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)


[PATCH] Fix profile count comparison.

2022-09-23 Thread Eugene Rozenfeld via Gcc-patches
The comparison was incorrect when the counts weren't PRECISE.
For example, crossmodule-indir-call-topn-1.c was failing
with AutoFDO: when count_sum is 0 with quality AFDO,
count_sum > profile_count::zero() evaluates to true. Taking that
branch then leads to an assert in the call to to_sreal().

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:

* ipa-cp.cc (good_cloning_opportunity_p): Fix profile count comparison.
---
 gcc/ipa-cp.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 543a9334e2c..66bba71c068 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -3338,9 +3338,9 @@ good_cloning_opportunity_p (struct cgraph_node *node, 
sreal time_benefit,

   ipa_node_params *info = ipa_node_params_sum->get (node);
   int eval_threshold = opt_for_fn (node->decl, param_ipa_cp_eval_threshold);
-  if (count_sum > profile_count::zero ())
+  if (count_sum.nonzero_p ())
 {
-  gcc_assert (base_count > profile_count::zero ());
+  gcc_assert (base_count.nonzero_p ());
   sreal factor = count_sum.probability_in (base_count).to_sreal ();
   sreal evaluation = (time_benefit * factor) / size_cost;
   evaluation = incorporate_penalties (node, info, evaluation);
--
2.25.1


[PATCH] Fix typo in chapter level for RISC-V attributes

2022-09-23 Thread Torbjörn SVENSSON via Gcc-patches
The "RISC-V specific attributes" section should be at the same level
as "PowerPC-specific attributes".

gcc/ChangeLog:

* doc/sourcebuild.texi: Fix chapter level.

Signed-off-by: Torbjörn SVENSSON  
---
 gcc/doc/sourcebuild.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 760ff9559a6..52357cc7aee 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2447,7 +2447,7 @@ PowerPC target pre-defines macro _ARCH_PWR9 which means 
the @code{-mcpu}
 setting is Power9 or later.
 @end table
 
-@subsection RISC-V specific attributes
+@subsubsection RISC-V specific attributes
 
 @table @code
 
-- 
2.25.1



Making gcc toolchain installs relocatable

2022-09-23 Thread Keith Packard via Gcc-patches

I submitted the referenced patch to extend the 'getenv' .specs function
back in August and didn't see any response, so I wanted to provide a bit
more context to see if that would help people understand why I wrote
this.

Here's a link to that message:

https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600452.html

I'm working with embedded toolchains where I want to distribute binary
versions of binutils, gcc and a suite of libraries in a tar file which
the user can unpack anywhere on their system. To make this work, I need
to create .spec file fragments that can locate the correct libraries
relative to the location where the toolchain was unpacked.

An easy way to do this, which doesn't depend on a default sysroot value,
is to use the GCC_EXEC_PREFIX environment variable in the .specs
file. GCC sets that whenever it discovers that it hasn't been run from
the defined installation path. However, if the user does end up
installing gcc in the defined installation path, then that variable
isn't set at all. If a .specs file attempts to reference the variable,
gcc will emit a fatal error and exit.

This patch makes it possible for the .specs file fragment to provide the
default path as a fallback for a missing environment variable so that,
instead of exiting, the correct value will be substituted instead.

By doing this, I can create portable .specs file fragments which work
wherever the toolchain is installed.

This patch seemed like the least invasive approach to solving this
problem, but there are two other approaches that could work, and which
would make the .specs files simpler:

 1. Always set the GCC_EXEC_PREFIX environment variable, even if GCC
is executed from the expected location.

 2. Make %R in .specs files expand to the appropriate value even if
there is no sysroot defined.

I'd be happy to provide an implementation of either of those if that
would be more acceptable.

-- 
-keith


signature.asc
Description: PGP signature


[PATCH] c++: Don't quote nothrow in diagnostic

2022-09-23 Thread Marek Polacek via Gcc-patches
Jason noticed that we quote "nothrow" in diagnostics even though it's
not a keyword in C++.  Just removing the quotes didn't work because
then -Wformat-diag complains, so this patch replaces it with "no-throw".

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

* constraint.cc (diagnose_trait_expr): Say "no-throw" (without quotes)
rather than "nothrow" in quotes.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-traits3.C: Adjust expected diagnostics.
---
 gcc/cp/constraint.cc  | 14 +++---
 gcc/testsuite/g++.dg/cpp2a/concepts-traits3.C |  8 
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 5839bfb4b52..136647f7c9e 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3592,13 +3592,13 @@ diagnose_trait_expr (tree expr, tree args)
   switch (TRAIT_EXPR_KIND (expr))
 {
 case CPTK_HAS_NOTHROW_ASSIGN:
-  inform (loc, "  %qT is not %<nothrow%> copy assignable", t1);
+  inform (loc, "  %qT is not no-throw copy assignable", t1);
   break;
 case CPTK_HAS_NOTHROW_CONSTRUCTOR:
-  inform (loc, "  %qT is not %<nothrow%> default constructible", t1);
+  inform (loc, "  %qT is not no-throw default constructible", t1);
   break;
 case CPTK_HAS_NOTHROW_COPY:
-  inform (loc, "  %qT is not %<nothrow%> copy constructible", t1);
+  inform (loc, "  %qT is not no-throw copy constructible", t1);
   break;
 case CPTK_HAS_TRIVIAL_ASSIGN:
   inform (loc, "  %qT is not trivially copy assignable", t1);
@@ -3674,7 +3674,7 @@ diagnose_trait_expr (tree expr, tree args)
   inform (loc, "  %qT is not trivially assignable from %qT", t1, t2);
   break;
 case CPTK_IS_NOTHROW_ASSIGNABLE:
-  inform (loc, "  %qT is not %<nothrow%> assignable from %qT", t1, t2);
+  inform (loc, "  %qT is not no-throw assignable from %qT", t1, t2);
   break;
 case CPTK_IS_CONSTRUCTIBLE:
   if (!t2)
@@ -3690,9 +3690,9 @@ diagnose_trait_expr (tree expr, tree args)
   break;
 case CPTK_IS_NOTHROW_CONSTRUCTIBLE:
   if (!t2)
-	inform (loc, "  %qT is not %<nothrow%> default constructible", t1);
+   inform (loc, "  %qT is not no-throw default constructible", t1);
   else
-	inform (loc, "  %qT is not %<nothrow%> constructible from %qE", t1, t2);
+   inform (loc, "  %qT is not no-throw constructible from %qE", t1, t2);
   break;
 case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS:
   inform (loc, "  %qT does not have unique object representations", t1);
@@ -3701,7 +3701,7 @@ diagnose_trait_expr (tree expr, tree args)
   inform (loc, "  %qT is not convertible from %qE", t2, t1);
   break;
 case CPTK_IS_NOTHROW_CONVERTIBLE:
-	inform (loc, "  %qT is not %<nothrow%> convertible from %qE", t2, t1);
+   inform (loc, "  %qT is not no-throw convertible from %qE", t2, t1);
   break;
 case CPTK_REF_CONSTRUCTS_FROM_TEMPORARY:
   inform (loc, "  %qT is not a reference that binds to a temporary "
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-traits3.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-traits3.C
index f20608b6918..6ac849d71fd 100644
--- a/gcc/testsuite/g++.dg/cpp2a/concepts-traits3.C
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-traits3.C
@@ -21,7 +21,7 @@ concept TriviallyAssignable = __is_trivially_assignable(T, U);
 
 template
 concept NothrowAssignable = __is_nothrow_assignable(T, U);
-// { dg-message "'S' is not 'nothrow' assignable from 'int'" "" { target *-*-* 
} .-1  }
+// { dg-message "'S' is not no-throw assignable from 'int'" "" { target *-*-* 
} .-1  }
 
 template
 concept Constructible = __is_constructible(T, Args...);
@@ -37,9 +37,9 @@ concept TriviallyConstructible = 
__is_trivially_constructible(T, Args...);
 
 template
 concept NothrowConstructible = __is_nothrow_constructible(T, Args...);
-// { dg-message "'S' is not 'nothrow' default constructible" "" { target *-*-* 
} .-1  }
-// { dg-message "'S' is not 'nothrow' constructible from 'int'" "" { target 
*-*-* } .-2  }
-// { dg-message "'S' is not 'nothrow' constructible from 'int, char'" "" { 
target *-*-* } .-3  }
+// { dg-message "'S' is not no-throw default constructible" "" { target *-*-* 
} .-1  }
+// { dg-message "'S' is not no-throw constructible from 'int'" "" { target 
*-*-* } .-2  }
+// { dg-message "'S' is not no-throw constructible from 'int, char'" "" { 
target *-*-* } .-3  }
 
 template
 concept UniqueObjReps = __has_unique_object_representations(T);

base-commit: 8a7bcf95a82c3dd68bd4bcfbd8432eb970575bc2
-- 
2.37.3



Re: [PATCH v2] testsuite: Only run test on target if VMA == LMA

2022-09-23 Thread Torbjorn SVENSSON via Gcc-patches

Hi Richard,

Thanks for your review.
Comments below.

On 2022-09-23 19:34, Richard Sandiford wrote:

Torbjörn SVENSSON via Gcc-patches  writes:

Checking that the triplet matches arm*-*-eabi (or msp430-*-*) is not
enough to know if the execution will enter an endless loop, or if it
will give a meaningful result. As the execution test only work when
VMA and LMA are equal, make sure that this condition is met.

2022-09-16  Torbjörn SVENSSON  

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_vma_equals_lma): New.
 * c-c++-common/torture/attr-noinit-1.c: Require VMA == LMA to run.
 * c-c++-common/torture/attr-noinit-2.c: Likewise.
 * c-c++-common/torture/attr-noinit-3.c: Likewise.
 * c-c++-common/torture/attr-persistent-1.c: Likewise.
 * c-c++-common/torture/attr-persistent-3.c: Likewise.


Looks like a nice thing to have.

Could you add an entry to doc/sourcebuild.texi too?


I've added a note and will be shared in a v3.


+
+# Example output of objdump:
+#vma_equals_lma9059.exe: file format elf32-littlearm
+#
+#Sections:
+#Idx Name  Size  VMA   LMA   File off  Algn
+#  6 .data 0558  2000  08002658  0002  2**3
+#  CONTENTS, ALLOC, LOAD, DATA
+
+# Capture LMA and VMA columns for .data section
+if ![ regexp "\d*\d+\s+\.data\s+\d+\s+(\d+)\s+(\d+)" $output dummy 
vma lma ] {


Maybe my Tcl is getting rusty, but I'm surprised this quoting works.
I'd have expected single backslashes to be interpreted as C-style
sequences (for \n etc) and so be eaten before regexp sees them.
Quoting with {...} instead of "..." would avoid that.


Good catch! I'm not fluent in Tcl and apparently, I was not testing this 
well enough before sending it to the list. I got the expected result for 
the test cases, but for the wrong reason. I've correct it for the v3.



+verbose "Could not parse objdump output" 2
+return 0
+} else {
+return [string equal $vma $lma]
+}
+   } else {
+remote_file build delete $exe
+verbose "Could not determine if VMA is equal to LMA. Assuming not 
equal." 2
+return 0
+   }


Would it be more conservative to return 1 on failure rather than 0?
That way, a faulty test would trigger XFAILs rather than UNSUPPORTEDs,
with XFAILs being more likely to get attention.


The main issue here is that for targets where VMA != LMA, executing the 
tests will fall into an endless recursion loop. Well, "endless" in the 
sense that the stack might be depleted or the test will simply timeout. 
The test cases are designed to assume that it's safe to call _start() 
from within main() to verify that the state of some variables tagged 
with certain attributes are correct after a "reset".



On the other hand, I suppose we don't lose much if these tests are
run on common targets only.  So either way is OK, just asking. ;-)


Do you think it's worth to have these tests reach the timeout to have 
them in the XFAIL list rather than in the UNSUPPORTED list?
Keep in mind that it's not just one test case, it's 5 test cases times 
the number of permutations of the CFLAGS...
It's also not expected that these test cases will be changed in a way 
that will make them work when VMA != LMA.


Kind regards,
Torbjörn


Re: [PATCH v3] RISC-V: remove deprecate pic code model macro

2022-09-23 Thread Vineet Gupta

On 9/2/22 14:05, Vineet Gupta wrote:

Came across this deprecated symbol when looking around for
-mexplicit-relocs handling in code

Signed-off-by: Vineet Gupta 


No rush but looks like this got lost in the bigger thread about 
LOAD_ADDRESS_MACRO.


Thx,
-Vineet


---
  gcc/config/riscv/riscv-c.cc   | 5 -
  gcc/testsuite/gcc.target/riscv/predef-1.c | 3 ---
  gcc/testsuite/gcc.target/riscv/predef-2.c | 3 ---
  gcc/testsuite/gcc.target/riscv/predef-3.c | 3 ---
  gcc/testsuite/gcc.target/riscv/predef-4.c | 3 ---
  gcc/testsuite/gcc.target/riscv/predef-5.c | 3 ---
  gcc/testsuite/gcc.target/riscv/predef-6.c | 3 ---
  gcc/testsuite/gcc.target/riscv/predef-7.c | 3 ---
  gcc/testsuite/gcc.target/riscv/predef-8.c | 3 ---
  9 files changed, 29 deletions(-)

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index eb7ef09297e9..8d55ad598a9c 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -93,11 +93,6 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
break;
  
  case CM_PIC:

-  /* __riscv_cmodel_pic is deprecated, and will removed in next GCC 
release.
-see https://github.com/riscv/riscv-c-api-doc/pull/11  */
-  builtin_define ("__riscv_cmodel_pic");
-  /* FALLTHROUGH. */
-
  case CM_MEDANY:
builtin_define ("__riscv_cmodel_medany");
break;
diff --git a/gcc/testsuite/gcc.target/riscv/predef-1.c 
b/gcc/testsuite/gcc.target/riscv/predef-1.c
index 2e57ce6b3954..9dddc1849635 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-1.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-1.c
@@ -57,9 +57,6 @@ int main () {
  #endif
  #if defined(__riscv_cmodel_medany)
  #error "__riscv_cmodel_medlow"
-#endif
-#if defined(__riscv_cmodel_pic)
-#error "__riscv_cmodel_medlow"
  #endif
  
return 0;

diff --git a/gcc/testsuite/gcc.target/riscv/predef-2.c 
b/gcc/testsuite/gcc.target/riscv/predef-2.c
index c85b3c9fd32a..755fe4ef7d8a 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-2.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-2.c
@@ -57,9 +57,6 @@ int main () {
  #endif
  #if !defined(__riscv_cmodel_medany)
  #error "__riscv_cmodel_medlow"
-#endif
-#if defined(__riscv_cmodel_pic)
-#error "__riscv_cmodel_medlow"
  #endif
  
return 0;

diff --git a/gcc/testsuite/gcc.target/riscv/predef-3.c 
b/gcc/testsuite/gcc.target/riscv/predef-3.c
index 82a89d415809..513645351c09 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-3.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-3.c
@@ -57,9 +57,6 @@ int main () {
  #endif
  #if !defined(__riscv_cmodel_medany)
  #error "__riscv_cmodel_medany"
-#endif
-#if !defined(__riscv_cmodel_pic)
-#error "__riscv_cmodel_pic"
  #endif
  
return 0;

diff --git a/gcc/testsuite/gcc.target/riscv/predef-4.c 
b/gcc/testsuite/gcc.target/riscv/predef-4.c
index 5868d39eb67a..76b6feec6b6f 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-4.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-4.c
@@ -57,9 +57,6 @@ int main () {
  #endif
  #if defined(__riscv_cmodel_medany)
  #error "__riscv_cmodel_medlow"
-#endif
-#if defined(__riscv_cmodel_pic)
-#error "__riscv_cmodel_medlow"
  #endif
  
return 0;

diff --git a/gcc/testsuite/gcc.target/riscv/predef-5.c 
b/gcc/testsuite/gcc.target/riscv/predef-5.c
index 4b2bd3835061..54a51508afbd 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-5.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-5.c
@@ -57,9 +57,6 @@ int main () {
  #endif
  #if !defined(__riscv_cmodel_medany)
  #error "__riscv_cmodel_medlow"
-#endif
-#if defined(__riscv_cmodel_pic)
-#error "__riscv_cmodel_medlow"
  #endif
  
return 0;

diff --git a/gcc/testsuite/gcc.target/riscv/predef-6.c 
b/gcc/testsuite/gcc.target/riscv/predef-6.c
index 8e5ea366bd5e..f61709f7bf32 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-6.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-6.c
@@ -57,9 +57,6 @@ int main () {
  #endif
  #if !defined(__riscv_cmodel_medany)
  #error "__riscv_cmodel_medany"
-#endif
-#if !defined(__riscv_cmodel_pic)
-#error "__riscv_cmodel_medpic"
  #endif
  
return 0;

diff --git a/gcc/testsuite/gcc.target/riscv/predef-7.c 
b/gcc/testsuite/gcc.target/riscv/predef-7.c
index 0bde299aef1a..41217554c4db 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-7.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-7.c
@@ -57,9 +57,6 @@ int main () {
  #endif
  #if defined(__riscv_cmodel_medany)
  #error "__riscv_cmodel_medlow"
-#endif
-#if defined(__riscv_cmodel_pic)
-#error "__riscv_cmodel_medlow"
  #endif
  
return 0;

diff --git a/gcc/testsuite/gcc.target/riscv/predef-8.c 
b/gcc/testsuite/gcc.target/riscv/predef-8.c
index 18aa591a6039..982056a53438 100644
--- a/gcc/testsuite/gcc.target/riscv/predef-8.c
+++ b/gcc/testsuite/gcc.target/riscv/predef-8.c
@@ -57,9 +57,6 @@ int main () {
  #endif
  #if !defined(__riscv_cmodel_medany)
  #error "__riscv_cmodel_medlow"
-#endif
-#if defined(__riscv_cmodel_pic)
-#error "__riscv_cmodel_medlow"
  #endif
  
return 0;




Re: [PATCH v2] testsuite: Only run test on target if VMA == LMA

2022-09-23 Thread Richard Sandiford via Gcc-patches
Torbjörn SVENSSON via Gcc-patches  writes:
> Checking that the triplet matches arm*-*-eabi (or msp430-*-*) is not
> enough to know if the execution will enter an endless loop, or if it
> will give a meaningful result. As the execution test only works when
> VMA and LMA are equal, make sure that this condition is met.
>
> 2022-09-16  Torbjörn SVENSSON  
>
> gcc/testsuite/ChangeLog:
>
>   * lib/target-supports.exp (check_effective_target_vma_equals_lma): New.
> * c-c++-common/torture/attr-noinit-1.c: Require VMA == LMA to run.
> * c-c++-common/torture/attr-noinit-2.c: Likewise.
> * c-c++-common/torture/attr-noinit-3.c: Likewise.
> * c-c++-common/torture/attr-persistent-1.c: Likewise.
> * c-c++-common/torture/attr-persistent-3.c: Likewise.

Looks like a nice thing to have.

Could you add an entry to doc/sourcebuild.texi too?

A couple of comments below...

> Co-Authored-By: Yvan ROUX  
> Signed-off-by: Torbjörn SVENSSON  
> ---
>  .../c-c++-common/torture/attr-noinit-1.c  |  3 +-
>  .../c-c++-common/torture/attr-noinit-2.c  |  3 +-
>  .../c-c++-common/torture/attr-noinit-3.c  |  3 +-
>  .../c-c++-common/torture/attr-persistent-1.c  |  3 +-
>  .../c-c++-common/torture/attr-persistent-3.c  |  3 +-
>  gcc/testsuite/lib/target-supports.exp | 49 +++
>  6 files changed, 59 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/testsuite/c-c++-common/torture/attr-noinit-1.c 
> b/gcc/testsuite/c-c++-common/torture/attr-noinit-1.c
> index 877e7647ac9..f84eba0b649 100644
> --- a/gcc/testsuite/c-c++-common/torture/attr-noinit-1.c
> +++ b/gcc/testsuite/c-c++-common/torture/attr-noinit-1.c
> @@ -1,4 +1,5 @@
> -/* { dg-do run } */
> +/* { dg-do link } */
> +/* { dg-do run { target { vma_equals_lma } } } */
>  /* { dg-require-effective-target noinit } */
>  /* { dg-skip-if "data LMA != VMA" { msp430-*-* } { "-mlarge" } } */
>  /* { dg-options "-save-temps" } */
> diff --git a/gcc/testsuite/c-c++-common/torture/attr-noinit-2.c 
> b/gcc/testsuite/c-c++-common/torture/attr-noinit-2.c
> index befa2a0bd52..4528b9e3cfa 100644
> --- a/gcc/testsuite/c-c++-common/torture/attr-noinit-2.c
> +++ b/gcc/testsuite/c-c++-common/torture/attr-noinit-2.c
> @@ -1,4 +1,5 @@
> -/* { dg-do run } */
> +/* { dg-do link } */
> +/* { dg-do run { target { vma_equals_lma } } } */
>  /* { dg-require-effective-target noinit } */
>  /* { dg-options "-fdata-sections -save-temps" } */
>  /* { dg-skip-if "data LMA != VMA" { msp430-*-* } { "-mlarge" } } */
> diff --git a/gcc/testsuite/c-c++-common/torture/attr-noinit-3.c 
> b/gcc/testsuite/c-c++-common/torture/attr-noinit-3.c
> index 519e88a59a6..2f1745694c9 100644
> --- a/gcc/testsuite/c-c++-common/torture/attr-noinit-3.c
> +++ b/gcc/testsuite/c-c++-common/torture/attr-noinit-3.c
> @@ -1,4 +1,5 @@
> -/* { dg-do run } */
> +/* { dg-do link } */
> +/* { dg-do run { target { vma_equals_lma } } } */
>  /* { dg-require-effective-target noinit } */
>  /* { dg-options "-flto -save-temps" } */
>  /* { dg-skip-if "data LMA != VMA" { msp430-*-* } { "-mlarge" } } */
> diff --git a/gcc/testsuite/c-c++-common/torture/attr-persistent-1.c 
> b/gcc/testsuite/c-c++-common/torture/attr-persistent-1.c
> index 72dc3c27192..b11a515cef8 100644
> --- a/gcc/testsuite/c-c++-common/torture/attr-persistent-1.c
> +++ b/gcc/testsuite/c-c++-common/torture/attr-persistent-1.c
> @@ -1,4 +1,5 @@
> -/* { dg-do run } */
> +/* { dg-do link } */
> +/* { dg-do run { target { vma_equals_lma } } } */
>  /* { dg-require-effective-target persistent } */
>  /* { dg-skip-if "data LMA != VMA" { msp430-*-* } { "-mlarge" } } */
>  /* { dg-options "-save-temps" } */
> diff --git a/gcc/testsuite/c-c++-common/torture/attr-persistent-3.c 
> b/gcc/testsuite/c-c++-common/torture/attr-persistent-3.c
> index 3e4fd28618d..068a72af5c8 100644
> --- a/gcc/testsuite/c-c++-common/torture/attr-persistent-3.c
> +++ b/gcc/testsuite/c-c++-common/torture/attr-persistent-3.c
> @@ -1,4 +1,5 @@
> -/* { dg-do run } */
> +/* { dg-do link } */
> +/* { dg-do run { target { vma_equals_lma } } } */
>  /* { dg-require-effective-target persistent } */
>  /* { dg-options "-flto -save-temps" } */
>  /* { dg-skip-if "data LMA != VMA" { msp430-*-* } { "-mlarge" } } */
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 703aba412a6..df8141a15d8 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -370,6 +370,55 @@ proc check_weak_override_available { } {
>  return [check_weak_available]
>  }
>  
> +# Return 1 if VMA is equal to LMA for the .data section, 0
> +# otherwise.  Cache the result.
> +
> +proc check_effective_target_vma_equals_lma { } {
> +global tool
> +
> +return [check_cached_effective_target vma_equals_lma {
> + set src vma_equals_lma[pid].c
> + set exe vma_equals_lma[pid].exe
> + verbose "check_effective_target_vma_equals_lma  compiling testfile 
> $src" 2
> + set f [open 

Re: [PATCH v2] testsuite: Skip intrinsics test if arm

2022-09-23 Thread Richard Sandiford via Gcc-patches
Torbjörn SVENSSON via Gcc-patches  writes:
> In the test cases, it's clearly written that these intrinsics are not
> implemented on arm*. A simple xfail does not help since there are
> link errors and that would cause an UNRESOLVED testcase rather than
> XFAIL.
> By changing to dg-skip-if, the entire test case is omitted.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/advsimd-intrinsics/vld1x2.c: Replace
>   dg-xfail-if with gd-skip-if.

Typo: s/gd/dg/

OK with that change, thanks.

Richard

>   * gcc.target/aarch64/advsimd-intrinsics/vld1x3.c: Likewise.
>   * gcc.target/aarch64/advsimd-intrinsics/vld1x4.c: Likewise.
>
> Co-Authored-By: Yvan ROUX  
> Signed-off-by: Torbjörn SVENSSON  
> ---
>  gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c | 2 +-
>  gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x3.c | 2 +-
>  gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x4.c | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c
> index 92a139bc523..f933102be47 100644
> --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c
> @@ -1,6 +1,6 @@
>  /* We haven't implemented these intrinsics for arm yet.  */
> -/* { dg-xfail-if "" { arm*-*-* } } */
>  /* { dg-do run } */
> +/* { dg-skip-if "unsupported" { arm*-*-* } } */
>  /* { dg-options "-O3" } */
>  
>  #include 
> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x3.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x3.c
> index 6ddd507d9cf..b20dec061b5 100644
> --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x3.c
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x3.c
> @@ -1,6 +1,6 @@
>  /* We haven't implemented these intrinsics for arm yet.  */
> -/* { dg-xfail-if "" { arm*-*-* } } */
>  /* { dg-do run } */
> +/* { dg-skip-if "unsupported" { arm*-*-* } } */
>  /* { dg-options "-O3" } */
>  
>  #include 
> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x4.c 
> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x4.c
> index 451a0afc6aa..e59f845880e 100644
> --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x4.c
> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x4.c
> @@ -1,6 +1,6 @@
>  /* We haven't implemented these intrinsics for arm yet.  */
> -/* { dg-xfail-if "" { arm*-*-* } } */
>  /* { dg-do run } */
> +/* { dg-skip-if "unsupported" { arm*-*-* } } */
>  /* { dg-options "-O3" } */
>  
>  #include 


Re: [PATCH] c++: Implement __is_{nothrow_,}convertible [PR106784]

2022-09-23 Thread Marek Polacek via Gcc-patches
On Fri, Sep 23, 2022 at 05:34:21PM +0100, Jonathan Wakely wrote:
> On Fri, 23 Sept 2022 at 15:43, Jonathan Wakely wrote:
> >
> > On Fri, 23 Sept 2022 at 15:34, Marek Polacek wrote:
> > >
> > > On Thu, Sep 22, 2022 at 06:14:44PM -0400, Jason Merrill wrote:
> > > > On 9/22/22 09:39, Marek Polacek wrote:
> > > > > To improve compile times, the C++ library could use compiler built-ins
> > > > > rather than implementing std::is_convertible (and _nothrow) as class
> > > > > templates.  This patch adds the built-ins.  We already have
> > > > > __is_constructible and __is_assignable, and the nothrow forms of 
> > > > > those.
> > > > >
> > > > > Microsoft (and clang, for compatibility) also provide an alias called
> > > > > __is_convertible_to.  I did not add it, but it would be trivial to do
> > > > > so.
> > > > >
> > > > > I noticed that our __is_assignable doesn't implement the "Access 
> > > > > checks
> > > > > are performed as if from a context unrelated to either type" 
> > > > > requirement,
> > > > > therefore std::is_assignable / __is_assignable give two different 
> > > > > results
> > > > > here:
> > > > >
> > > > >class S {
> > > > >  operator int();
> > > > >  friend void g(); // #1
> > > > >};
> > > > >
> > > > >void
> > > > >g ()
> > > > >{
> > > > >  // #1 doesn't matter
> > > > >  static_assert(std::is_assignable<int&, S>::value, "");
> > > > >  static_assert(__is_assignable(int&, S), "");
> > > > >}
> > > > >
> > > > > This is not a problem if __is_assignable is not meant to be used by
> > > > > the users.
> > > >
> > > > That's fine, it's not.
> > >
> > > Okay then.  libstdc++ needs to make sure then that it's handled right.
> >
> > That's fine, the type traits in libstdc++ are always "a context
> > unrelated to either type", unless users do something idiotic like
> > declare std::is_assignable as a friend.
> >
> > https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1339r1.pdf
> > wants to explicitly say that's idiotic.
> 
> And I just checked that a variable template like std::is_assignable_v
> also counts as "a context unrelated to either type", even when
> instantiated inside a member function of the type:
> 
> #include <type_traits>
> 
> template<typename T, typename U>
> constexpr bool is_assignable_v = __is_assignable(T, U);
> 
> class S {
>   operator int();
>   friend void g(); // #1
> };
> 
> void
> g ()
> {
>   // #1 doesn't matter
>   static_assert(std::is_assignable<int&, S>::value, "");
>   static_assert(std::is_assignable_v<int&, S>, "");
>   static_assert(__is_assignable(int&, S), "");
> }
> 
> The first two assertions are consistent, and fail, which is what we
> want.  The direct use of the built-in succeeds, but we don't care.

Great, thanks. 

Marek



Re: [PATCH] c++: Implement __is_{nothrow_,}convertible [PR106784]

2022-09-23 Thread Jonathan Wakely via Gcc-patches
On Fri, 23 Sept 2022 at 15:43, Jonathan Wakely wrote:
>
> On Fri, 23 Sept 2022 at 15:34, Marek Polacek wrote:
> >
> > On Thu, Sep 22, 2022 at 06:14:44PM -0400, Jason Merrill wrote:
> > > On 9/22/22 09:39, Marek Polacek wrote:
> > > > To improve compile times, the C++ library could use compiler built-ins
> > > > rather than implementing std::is_convertible (and _nothrow) as class
> > > > templates.  This patch adds the built-ins.  We already have
> > > > __is_constructible and __is_assignable, and the nothrow forms of those.
> > > >
> > > > Microsoft (and clang, for compatibility) also provide an alias called
> > > > __is_convertible_to.  I did not add it, but it would be trivial to do
> > > > so.
> > > >
> > > > I noticed that our __is_assignable doesn't implement the "Access checks
> > > > are performed as if from a context unrelated to either type" 
> > > > requirement,
> > > > therefore std::is_assignable / __is_assignable give two different 
> > > > results
> > > > here:
> > > >
> > > >class S {
> > > >  operator int();
> > > >  friend void g(); // #1
> > > >};
> > > >
> > > >void
> > > >g ()
> > > >{
> > > >  // #1 doesn't matter
> > > >  static_assert(std::is_assignable<int&, S>::value, "");
> > > >  static_assert(__is_assignable(int&, S), "");
> > > >}
> > > >
> > > > This is not a problem if __is_assignable is not meant to be used by
> > > > the users.
> > >
> > > That's fine, it's not.
> >
> > Okay then.  libstdc++ needs to make sure then that it's handled right.
>
> That's fine, the type traits in libstdc++ are always "a context
> unrelated to either type", unless users do something idiotic like
> declare std::is_assignable as a friend.
>
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1339r1.pdf
> wants to explicitly say that's idiotic.

And I just checked that a variable template like std::is_assignable_v
also counts as "a context unrelated to either type", even when
instantiated inside a member function of the type:

#include <type_traits>

template<typename T, typename U>
constexpr bool is_assignable_v = __is_assignable(T, U);

class S {
  operator int();
  friend void g(); // #1
};

void
g ()
{
  // #1 doesn't matter
  static_assert(std::is_assignable<int&, S>::value, "");
  static_assert(std::is_assignable_v<int&, S>, "");
  static_assert(__is_assignable(int&, S), "");
}

The first two assertions are consistent, and fail, which is what we
want.  The direct use of the built-in succeeds, but we don't care.



Re: [PATCH] c++: Implement __is_{nothrow_,}convertible [PR106784]

2022-09-23 Thread Marek Polacek via Gcc-patches
On Fri, Sep 23, 2022 at 11:54:53AM -0400, Jason Merrill wrote:
> On 9/23/22 10:34, Marek Polacek wrote:
> > On Thu, Sep 22, 2022 at 06:14:44PM -0400, Jason Merrill wrote:
> > > On 9/22/22 09:39, Marek Polacek wrote:
> > > > To improve compile times, the C++ library could use compiler built-ins
> > > > rather than implementing std::is_convertible (and _nothrow) as class
> > > > templates.  This patch adds the built-ins.  We already have
> > > > __is_constructible and __is_assignable, and the nothrow forms of those.
> > > > 
> > > > Microsoft (and clang, for compatibility) also provide an alias called
> > > > __is_convertible_to.  I did not add it, but it would be trivial to do
> > > > so.
> > > > 
> > > > I noticed that our __is_assignable doesn't implement the "Access checks
> > > > are performed as if from a context unrelated to either type" 
> > > > requirement,
> > > > therefore std::is_assignable / __is_assignable give two different 
> > > > results
> > > > here:
> > > > 
> > > > class S {
> > > >   operator int();
> > > >   friend void g(); // #1
> > > > };
> > > > 
> > > > void
> > > > g ()
> > > > {
> > > >   // #1 doesn't matter
> > > >   static_assert(std::is_assignable<int&, S>::value, "");
> > > >   static_assert(__is_assignable(int&, S), "");
> > > > }
> > > > 
> > > > This is not a problem if __is_assignable is not meant to be used by
> > > > the users.
> > > 
> > > That's fine, it's not.
> > Okay then.  libstdc++ needs to make sure then that it's handled right.
> > 
> > > > This patch doesn't make libstdc++ use the new built-ins, but I had to
> > > > rename a class otherwise its name would clash with the new built-in.
> > > 
> > > Sigh, that's going to be a hassle when comparing compiler versions on
> > > preprocessed code.
> > 
> > Yeah, I guess :/.  Kind of like __integer_pack / __make_integer_seq.
> > 
> > > > --- a/gcc/cp/constraint.cc
> > > > +++ b/gcc/cp/constraint.cc
> > > > @@ -3697,6 +3697,12 @@ diagnose_trait_expr (tree expr, tree args)
> > > >case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS:
> > > >  inform (loc, "  %qT does not have unique object 
> > > > representations", t1);
> > > >  break;
> > > > +case CPTK_IS_CONVERTIBLE:
> > > > +  inform (loc, "  %qT is not convertible from %qE", t2, t1);
> > > > +  break;
> > > > +case CPTK_IS_NOTHROW_CONVERTIBLE:
> > > > +   inform (loc, "  %qT is not %<nothrow%> convertible from %qE", 
> > > > t2, t1);
> > > 
> > > It's odd that the existing diagnostics quote "nothrow", which is not a
> > > keyword.  I wonder why these library traits didn't use "noexcept"?
> > 
> > Eh, yeah, only "throw" is.  The quotes were deliberately added in
> > .  Should
> > I prepare a separate patch to use "%<noexcept%>" rather than "%<nothrow%>"?
> > OTOH, the traits have "nothrow" in their names, so maybe just go back to
> > "nothrow"?
> 
> The latter, I think.  Or possibly "no-throw".  I guess -Wformat-diag wants
> "nothrow" quoted because of the attribute of that name.

OK, let me see.
 
> > > > --- a/gcc/cp/method.cc
> > > > +++ b/gcc/cp/method.cc
> > > > @@ -2236,6 +2236,37 @@ ref_xes_from_temporary (tree to, tree from, bool 
> > > > direct_init_p)
> > > >  return ref_conv_binds_directly (to, val, direct_init_p).is_false 
> > > > ();
> > > >}
> > > > +/* Return true if FROM can be converted to TO using implicit 
> > > > conversions,
> > > > +   or both FROM and TO are possibly cv-qualified void.  NB: This 
> > > > doesn't
> > > > +   implement the "Access checks are performed as if from a context 
> > > > unrelated
> > > > +   to either type" restriction.  */
> > > > +
> > > > +bool
> > > > +is_convertible (tree from, tree to)
> > > 
> > > You didn't want to add conversion to is*_xible?
> > 
> > No, it didn't look like a good fit.  It does things we don't need, and
> > also has if VOID_TYPE_P -> return error_mark_node; which would be wrong
> > for __is_convertible.
> > 
> > I realized I'm not testing passing an incomplete type to the built-in,
> > but since that is UB, I reckon we don't need to test it (we issue
> > "error: invalid use of incomplete type").
> 
> But your patch does test that, in the existing call to check_trait_type from
> finish_trait_expr?

Yes, it eventually checks complete_type_or_else.

> The patch is OK.

Thanks,

Marek



Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-09-23 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> On Tue, 20 Sept 2022 at 18:09, Richard Sandiford
>  wrote:
>>
>> Prathamesh Kulkarni  writes:
>> > On Mon, 12 Sept 2022 at 19:57, Richard Sandiford
>> >  wrote:
>> >>
>> >> Prathamesh Kulkarni  writes:
>> >> >> The VLA encoding encodes the first N patterns explicitly.  The
>> >> >> npatterns/nelts_per_pattern values then describe how to extend that
>> >> >> initial sequence to an arbitrary number of elements.  So when 
>> >> >> performing
>> >> >> an operation on (potentially) variable-length vectors, the questions 
>> >> >> is:
>> >> >>
>> >> >> * Can we work out an initial sequence and npatterns/nelts_per_pattern
>> >> >>   pair that will be correct for all elements of the result?
>> >> >>
>> >> >> This depends on the operation that we're performing.  E.g. it's
>> >> >> different for unary operations (vector_builder::new_unary_operation)
>> >> >> and binary operations (vector_builder::new_binary_operations).  It also
>> >> >> varies between unary operations and between binary operations, hence
>> >> >> the allow_stepped_p parameters.
>> >> >>
>> >> >> For VEC_PERM_EXPR, I think the key requirement is that:
>> >> >>
>> >> >> (R) Each individual selector pattern must always select from the same 
>> >> >> vector.
>> >> >>
>> >> >> Whether this condition is met depends both on the pattern itself and on
>> >> >> the number of patterns that it's combined with.
>> >> >>
>> >> >> E.g. suppose we had the selector pattern:
>> >> >>
>> >> >>   { 0, 1, 4, ... }   i.e. 3x - 2 for x > 0
>> >> >>
>> >> >> If the arguments and selector are n elements then this pattern on its
>> >> >> own would select from more than one argument if 3(n-1) - 2 >= n.
>> >> >> This is clearly true for large enough n.  So if n is variable then
>> >> >> we cannot represent this.
>> >> >>
>> >> >> If the pattern above is one of two patterns, so interleaved as:
>> >> >>
>> >> >>  { 0, _, 1, _, 4, _, ... }  o=0
>> >> >>   or { _, 0, _, 1, _, 4, ... }  o=1
>> >> >>
>> >> >> then the pattern would select from more than one argument if
>> >> >> 3(n/2-1) - 2 + o >= n.  This too would be a problem for variable n.
>> >> >>
>> >> >> But if the pattern above is one of four patterns then it selects
>> >> >> from more than one argument if 3(n/4-1) - 2 + o >= n.  This is not
>> >> >> true for any valid n or o, so the pattern is OK.
>> >> >>
>> >> >> So let's define some ad hoc terminology:
>> >> >>
>> >> >> * Px is the number of patterns in x
>> >> >> * Ex is the number of elements per pattern in x
>> >> >>
>> >> >> where x can be:
>> >> >>
>> >> >> * 1: first argument
>> >> >> * 2: second argument
>> >> >> * s: selector
>> >> >> * r: result
>> >> >>
>> >> >> Then:
>> >> >>
>> >> >> (1) The number of elements encoded explicitly for x is Ex*Px
>> >> >>
>> >> >> (2) The explicit encoding can be used to produce a sequence of N*Ex*Px
>> >> >> elements for any integer N.  This extended sequence can be 
>> >> >> reencoded
>> >> >> as having N*Px patterns, with Ex staying the same.
>> >> >>
>> >> >> (3) If Ex < 3, Ex can be increased by 1 by repeating the final Px 
>> >> >> elements
>> >> >> of the explicit encoding.
>> >> >>
>> >> >> So let's assume (optimistically) that we can produce the result
>> >> >> by calculating the first Pr*Er elements and using the Pr,Er encoding
>> >> >> to imply the rest.  Then:
>> >> >>
>> >> >> * (2) means that, when combining multiple input operands with 
>> >> >> potentially
>> >> >>   different encodings, we can set the number of patterns in the result
>> >> >>   to the least common multiple of the number of patterns in the inputs.
>> >> >>   In this case:
>> >> >>
>> >> >>   Pr = least_common_multiple(P1, P2, Ps)
>> >> >>
>> >> >>   is a valid number of patterns.
>> >> >>
>> >> >> * (3) means that the number of elements per pattern of the result can
>> >> >>   be the maximum of the number of elements per pattern in the inputs.
>> >> >>   (Alternatively, we could always use 3.)  In this case:
>> >> >>
>> >> >>   Er = max(E1, E2, Es)
>> >> >>
>> >> >>   is a valid number of elements per pattern.
>> >> >>
>> >> >> So if (R) holds we can compute the result -- for both VLA and VLS -- by
>> >> >> calculating the first Pr*Er elements of the result and using the
>> >> >> encoding to derive the rest.  If (R) doesn't hold then we need the
>> >> >> selector to be constant-length.  We should then fill in the result
>> >> >> based on:
>> >> >>
>> >> >> - Pr == number of elements in the result
>> >> >> - Er == 1
>> >> >>
>> >> >> But this should be the fallback option, even for VLS.
>> >> >>
>> >> >> As far as the arguments go: we should reject CONSTRUCTORs for
>> >> >> variable-length types.  After doing that, we can treat a CONSTRUCTOR
>> >> >> for an N-element vector type by setting the number of patterns to N
>> >> >> and the number of elements per pattern to 1.
>> >> > Hi Richard,
>> >> > Thanks for the suggestions, and sorry for late response.
>> >> > I have a couple of very elementary 

Re: [PATCH] RISC-V: make USE_LOAD_ADDRESS_MACRO easier to understand

2022-09-23 Thread Kito Cheng via Gcc-patches
Committed with ChangeLog and minor naming tweaks.


> But I'm not sure if the current checking of local symbol can be simplified
> a bit. Isn't the first line enough for GET_CODE == const case too ?

SYMBOL_REF_P does not work for CONST; SYMBOL_REF_P just checks that
GET_CODE is SYMBOL_REF, and SYMBOL_REF_LOCAL_P will also ICE if you
feed it something other than a SYMBOL_REF when checking is enabled...

On Sat, Sep 3, 2022 at 7:08 AM Vineet Gupta  wrote:
>
> The current macro has several && and ||, making it really hard to understand
> on a first reading.
>
> Signed-off-by: Vineet Gupta 
> ---
> Since we are on this topic, perhaps get this simplification too.
>
> But I'm not sure if the current checking of local symbol can be simplified
> a bit. Isn't the first line enough for GET_CODE == const case too ?
>
> ---
>  gcc/config/riscv/riscv.h | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index eb1284e56d69..3e3f67ef8270 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -749,18 +749,19 @@ typedef struct {
>  #define CASE_VECTOR_MODE SImode
>  #define CASE_VECTOR_PC_RELATIVE (riscv_cmodel != CM_MEDLOW)
>
> +#define LOCAL_SYM(sym) \
> + ((SYMBOL_REF_P (sym) && SYMBOL_REF_LOCAL_P (sym)) \
> +|| ((GET_CODE (sym) == CONST)  \
> +&& SYMBOL_REF_P (XEXP (XEXP (sym, 0),0))   \
> +&& SYMBOL_REF_LOCAL_P (XEXP (XEXP (sym, 0),0
> +
>  /* The load-address macro is used for PC-relative addressing of symbols
> that bind locally.  Don't use it for symbols that should be addressed
> via the GOT.  Also, avoid it for CM_MEDLOW, where LUI addressing
> currently results in more opportunities for linker relaxation.  */
>  #define USE_LOAD_ADDRESS_MACRO(sym)\
>(!TARGET_EXPLICIT_RELOCS &&  \
> -   ((flag_pic  \
> - && ((SYMBOL_REF_P (sym) && SYMBOL_REF_LOCAL_P (sym))  \
> -|| ((GET_CODE (sym) == CONST)  \
> -&& SYMBOL_REF_P (XEXP (XEXP (sym, 0),0))   \
> -&& SYMBOL_REF_LOCAL_P (XEXP (XEXP (sym, 0),0)  \
> - || riscv_cmodel == CM_MEDANY))
> +   ((flag_pic && LOCAL_SYM(sym)) || riscv_cmodel == CM_MEDANY))
>
>  /* Define this as 1 if `char' should by default be signed; else as 0.  */
>  #define DEFAULT_SIGNED_CHAR 0
> --
> 2.32.0
>


Re: [PATCH] c++: Implement __is_{nothrow_,}convertible [PR106784]

2022-09-23 Thread Jason Merrill via Gcc-patches

On 9/23/22 10:34, Marek Polacek wrote:

On Thu, Sep 22, 2022 at 06:14:44PM -0400, Jason Merrill wrote:

On 9/22/22 09:39, Marek Polacek wrote:

To improve compile times, the C++ library could use compiler built-ins
rather than implementing std::is_convertible (and _nothrow) as class
templates.  This patch adds the built-ins.  We already have
__is_constructible and __is_assignable, and the nothrow forms of those.

Microsoft (and clang, for compatibility) also provide an alias called
__is_convertible_to.  I did not add it, but it would be trivial to do
so.

I noticed that our __is_assignable doesn't implement the "Access checks
are performed as if from a context unrelated to either type" requirement,
therefore std::is_assignable / __is_assignable give two different results
here:

class S {
  operator int();
  friend void g(); // #1
};

void
g ()
{
  // #1 doesn't matter
  static_assert(std::is_assignable<int&, S>::value, "");
  static_assert(__is_assignable(int&, S), "");
}

This is not a problem if __is_assignable is not meant to be used by
the users.


That's fine, it's not.
  
Okay then.  libstdc++ needs to make sure then that it's handled right.



This patch doesn't make libstdc++ use the new built-ins, but I had to
rename a class otherwise its name would clash with the new built-in.


Sigh, that's going to be a hassle when comparing compiler versions on
preprocessed code.


Yeah, I guess :/.  Kind of like __integer_pack / __make_integer_seq.


--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3697,6 +3697,12 @@ diagnose_trait_expr (tree expr, tree args)
   case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS:
 inform (loc, "  %qT does not have unique object representations", t1);
 break;
+case CPTK_IS_CONVERTIBLE:
+  inform (loc, "  %qT is not convertible from %qE", t2, t1);
+  break;
+case CPTK_IS_NOTHROW_CONVERTIBLE:
+   inform (loc, "  %qT is not %<nothrow%> convertible from %qE", t2, t1);


It's odd that the existing diagnostics quote "nothrow", which is not a
keyword.  I wonder why these library traits didn't use "noexcept"?


Eh, yeah, only "throw" is.  The quotes were deliberately added in
.  Should
I prepare a separate patch to use "%<noexcept%>" rather than "%<nothrow%>"?
OTOH, the traits have "nothrow" in their names, so maybe just go back to
"nothrow"?


The latter, I think.  Or possibly "no-throw".  I guess -Wformat-diag 
wants "nothrow" quoted because of the attribute of that name.



--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -2236,6 +2236,37 @@ ref_xes_from_temporary (tree to, tree from, bool 
direct_init_p)
 return ref_conv_binds_directly (to, val, direct_init_p).is_false ();
   }
+/* Return true if FROM can be converted to TO using implicit conversions,
+   or both FROM and TO are possibly cv-qualified void.  NB: This doesn't
+   implement the "Access checks are performed as if from a context unrelated
+   to either type" restriction.  */
+
+bool
+is_convertible (tree from, tree to)


You didn't want to add conversion to is*_xible?


No, it didn't look like a good fit.  It does things we don't need, and
also has if VOID_TYPE_P -> return error_mark_node; which would be wrong
for __is_convertible.

I realized I'm not testing passing an incomplete type to the built-in,
but since that is UB, I reckon we don't need to test it (we issue
"error: invalid use of incomplete type").


But your patch does test that, in the existing call to check_trait_type 
from finish_trait_expr?


The patch is OK.

Jason



[patch] Fix thinko in powerpc default specs for -mabi

2022-09-23 Thread Olivier Hainque via Gcc-patches
Hello,

For a powerpc compiler configured with --with-abi=elfv2, an explicit
-mabi option other than elfv1 fails to override the default.

For example, after

  [...]/configure --enable-languages=c --target=powerpc-elf --with-abi=elfv2
  make all-gcc

This command:

  ./gcc/xgcc -B./gcc/ t.c -mabi=ieeelongdouble -v

issues:

  ./gcc/cc1 [...] t.c -mabi=ieeelongdouble -mabi=elfv2

elfv2 overrides the user request here, which I think
is not as intended.

This is controlled by OPTION_DEFAULT_SPECS, where we have

  {"abi", "%{!mabi=elfv*:-mabi=%(VALUE)}" },

From https://gcc.gnu.org/pipermail/gcc-patches/2013-November/375042.html
I don't see an explicit reason to trigger only on elfv* . It just looks
like an oversight with a focus on elfv1 vs elfv2 at the time.

The attached patch is a proposal to correct this, simply removing the
"elfv" prefix from the spec and adding the corresponding description
to the block comment just above.

We have been using this for about a year now in gcc-11 based toolchains.
This helps our DejaGnu testsuite runs for VxWorks on powerpc and
hasn't produced any ill side effects to date.

The patch also bootstraps and regtests fine on powerpc64-linux.

Is this OK to commit?

Thanks in advance!

With Kind Regards,

Olivier

2022-09-14  Olivier Hainque  

* config/rs6000/option-defaults.h (OPTION_DEFAULT_SPECS):
Have any -mabi, not only -mabi=elfv*, override the --with-abi
configuration default.


commit 33933796b777591007c04448860e781ac17b9070
Author: Olivier Hainque 
AuthorDate: Thu Apr 21 14:44:47 2022 +
Commit: Olivier Hainque 
CommitDate: Thu Apr 21 14:47:37 2022 +

Fix thinko in --with-abi processing on powerpc

Make it so any -mabi overrides what --with-abi requests
as a default, not only -mabi=elfv*.

Part of V415-021 (-mabi overrides on powerpc)

Change-Id: I62763dee62bbbd7d446f2dd091017d0c7e719cab

diff --git a/gcc/config/rs6000/option-defaults.h b/gcc/config/rs6000/option-defaults.h
index 7ebd115755a..ecf246e6b2e 100644
--- a/gcc/config/rs6000/option-defaults.h
+++ b/gcc/config/rs6000/option-defaults.h
@@ -47,6 +47,7 @@
 #endif
 
 /* Support for a compile-time default CPU, et cetera.  The rules are:
+   --with-abi is ignored if -mabi is specified.
--with-cpu is ignored if -mcpu is specified; likewise --with-cpu-32
  and --with-cpu-64.
--with-tune is ignored if -mtune or -mcpu is specified; likewise
@@ -54,7 +55,7 @@
--with-float is ignored if -mhard-float or -msoft-float are
  specified.  */
 #define OPTION_DEFAULT_SPECS \
-  {"abi", "%{!mabi=elfv*:-mabi=%(VALUE)}" }, \
+  {"abi", "%{!mabi=*:-mabi=%(VALUE)}" }, \
   {"tune", "%{!mtune=*:%{!mcpu=*:-mtune=%(VALUE)}}" }, \
   {"tune_32", "%{" OPT_ARCH32 ":%{!mtune=*:%{!mcpu=*:-mtune=%(VALUE)}}}" }, \
   {"tune_64", "%{" OPT_ARCH64 ":%{!mtune=*:%{!mcpu=*:-mtune=%(VALUE)}}}" }, \


Re: [PATCH] RISC-V: Add RVV machine modes.

2022-09-23 Thread Kito Cheng via Gcc-patches
Committed, thanks!

On Thu, Sep 15, 2022 at 7:40 PM  wrote:
>
> From: zhongjuzhe 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-modes.def (VECTOR_BOOL_MODE): Add RVV mask modes.
> (ADJUST_NUNITS): Adjust nunits using riscv_vector_chunks.
> (ADJUST_ALIGNMENT): Adjust alignment.
> (ADJUST_BYTESIZE): Adjust bytesize using riscv_vector_chunks.
> (RVV_MODES): New macro.
> (VECTOR_MODE_WITH_PREFIX): Add RVV vector modes.
> (VECTOR_MODES_WITH_PREFIX): Add RVV vector modes.
>
> ---
>  gcc/config/riscv/riscv-modes.def | 141 +++
>  1 file changed, 141 insertions(+)
>
> diff --git a/gcc/config/riscv/riscv-modes.def 
> b/gcc/config/riscv/riscv-modes.def
> index 6e30c1a5595..95f69e87e23 100644
> --- a/gcc/config/riscv/riscv-modes.def
> +++ b/gcc/config/riscv/riscv-modes.def
> @@ -22,6 +22,147 @@ along with GCC; see the file COPYING3.  If not see
>  FLOAT_MODE (HF, 2, ieee_half_format);
>  FLOAT_MODE (TF, 16, ieee_quad_format);
>
> +/* Vector modes.  */
> +
> +/* Encode the ratio of SEW/LMUL into the mask types. There are the following
> + * mask types.  */
> +
> +/* | Mode | MIN_VLEN = 32 | MIN_VLEN = 64 |
> +   |  | SEW/LMUL  | SEW/LMUL  |
> +   | VNx1BI   | 32| 64|
> +   | VNx2BI   | 16| 32|
> +   | VNx4BI   | 8 | 16|
> +   | VNx8BI   | 4 | 8 |
> +   | VNx16BI  | 2 | 4 |
> +   | VNx32BI  | 1 | 2 |
> +   | VNx64BI  | N/A   | 1 |  */
> +
> +VECTOR_BOOL_MODE (VNx1BI, 1, BI, 8);
> +VECTOR_BOOL_MODE (VNx2BI, 2, BI, 8);
> +VECTOR_BOOL_MODE (VNx4BI, 4, BI, 8);
> +VECTOR_BOOL_MODE (VNx8BI, 8, BI, 8);
> +VECTOR_BOOL_MODE (VNx16BI, 16, BI, 8);
> +VECTOR_BOOL_MODE (VNx32BI, 32, BI, 8);
> +VECTOR_BOOL_MODE (VNx64BI, 64, BI, 8);
> +
> +ADJUST_NUNITS (VNx1BI, riscv_vector_chunks * 1);
> +ADJUST_NUNITS (VNx2BI, riscv_vector_chunks * 2);
> +ADJUST_NUNITS (VNx4BI, riscv_vector_chunks * 4);
> +ADJUST_NUNITS (VNx8BI, riscv_vector_chunks * 8);
> +ADJUST_NUNITS (VNx16BI, riscv_vector_chunks * 16);
> +ADJUST_NUNITS (VNx32BI, riscv_vector_chunks * 32);
> +ADJUST_NUNITS (VNx64BI, riscv_vector_chunks * 64);
> +
> +ADJUST_ALIGNMENT (VNx1BI, 1);
> +ADJUST_ALIGNMENT (VNx2BI, 1);
> +ADJUST_ALIGNMENT (VNx4BI, 1);
> +ADJUST_ALIGNMENT (VNx8BI, 1);
> +ADJUST_ALIGNMENT (VNx16BI, 1);
> +ADJUST_ALIGNMENT (VNx32BI, 1);
> +ADJUST_ALIGNMENT (VNx64BI, 1);
> +
> +ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> +ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> +ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> +ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk);
> +ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * 
> riscv_bytes_per_vector_chunk);
> +ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * 
> riscv_bytes_per_vector_chunk);
> +ADJUST_BYTESIZE (VNx64BI, riscv_vector_chunks * 
> riscv_bytes_per_vector_chunk);
> +
> +/*
> +   | Mode| MIN_VLEN=32 | MIN_VLEN=32 | MIN_VLEN=64 | MIN_VLEN=64 |
> +   | | LMUL|  SEW/LMUL   | LMUL| SEW/LMUL|
> +   | VNx1QI  | MF4 |  32 | MF8 | 64  |
> +   | VNx2QI  | MF2 |  16 | MF4 | 32  |
> +   | VNx4QI  | M1  |  8  | MF2 | 16  |
> +   | VNx8QI  | M2  |  4  | M1  | 8   |
> +   | VNx16QI | M4  |  2  | M2  | 4   |
> +   | VNx32QI | M8  |  1  | M4  | 2   |
> +   | VNx64QI | N/A |  N/A| M8  | 1   |
> +   | VNx1(HI|HF) | MF2 |  32 | MF4 | 64  |
> +   | VNx2(HI|HF) | M1  |  16 | MF2 | 32  |
> +   | VNx4(HI|HF) | M2  |  8  | M1  | 16  |
> +   | VNx8(HI|HF) | M4  |  4  | M2  | 8   |
> +   | VNx16(HI|HF)| M8  |  2  | M4  | 4   |
> +   | VNx32(HI|HF)| N/A |  N/A| M8  | 2   |
> +   | VNx1(SI|SF) | M1  |  32 | MF2 | 64  |
> +   | VNx2(SI|SF) | M2  |  16 | M1  | 32  |
> +   | VNx4(SI|SF) | M4  |  8  | M2  | 16  |
> +   | VNx8(SI|SF) | M8  |  4  | M4  | 8   |
> +   | VNx16(SI|SF)| N/A |  N/A| M8  | 4   |
> +   | VNx1(DI|DF) | N/A |  N/A| M1  | 64  |
> +   | VNx2(DI|DF) | N/A |  N/A| M2  | 32  |
> +   | VNx4(DI|DF) | N/A |  N/A| M4  | 16  |
> +   | VNx8(DI|DF) | N/A |  N/A| M8  | 8   |
> +*/
> +
> +/* Define 

[og12] Come up with {,UN}LIKELY macros (was: [Patch][2/3][v2] nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup)

2022-09-23 Thread Thomas Schwinge
Hi!

Since the 2022-09-12 backport of this:

On 2022-08-29T20:43:26+0200, Tobias Burnus  wrote:
> nvptx: libgomp+mkoffload.cc: Prepare for reverse offload fn lookup

... to og12 in commit 2b6ad53fd76c7bb9605be417d137a7d9a18f2117, the og12
branch didn't build anymore:

[...]/gcc/config/nvptx/mkoffload.cc: In function 'void process(FILE*, 
FILE*, uint32_t)':
[...]/gcc/config/nvptx/mkoffload.cc:284:59: error: 'UNLIKELY' was not 
declared in this scope
if (UNLIKELY (startswith (input + i, ".target sm_")))
   ^
[...]/gcc/config/nvptx/mkoffload.cc:289:57: error: 'UNLIKELY' was not 
declared in this scope
if (UNLIKELY (startswith (input + i, ".version ")))
 ^
make[2]: *** [[...]/gcc/config/nvptx/t-nvptx:8: mkoffload.o] Error 1

> --- a/gcc/config/nvptx/mkoffload.cc
> +++ b/gcc/config/nvptx/mkoffload.cc

> @@ -261,6 +281,16 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
>   case '\n':
> fprintf (out, "\\n\"\n\t\"");
> /* Look for mappings on subsequent lines.  */
> +   if (UNLIKELY (startswith (input + i, ".target sm_")))
> + {
> +   sm_ver = input + i + strlen (".target sm_");
> +   continue;
> + }
> +   if (UNLIKELY (startswith (input + i, ".version ")))
> + {
> +   version = input + i + strlen (".version ");
> +   continue;
> + }

To fix this, I've pushed a (very much reduced) partial cherry-pick of
commit r13-171-g22d9c8802add09a93308319fc37dd3a0f1125393
"Come up with {,UN}LIKELY macros" to og12 branch in
commit 44b77201a5431450f608b4538fefb1319127de13, see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 44b77201a5431450f608b4538fefb1319127de13 Mon Sep 17 00:00:00 2001
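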
From: Martin Liska 
Date: Thu, 3 Feb 2022 10:58:18 +0100
Subject: [PATCH] Come up with {,UN}LIKELY macros.

gcc/ChangeLog:

	* system.h (LIKELY): Define.
	(UNLIKELY): Likewise.

(cherry picked from commit 22d9c8802add09a93308319fc37dd3a0f1125393, partial)
---
 gcc/ChangeLog.omp | 8 
 gcc/system.h  | 3 +++
 2 files changed, 11 insertions(+)

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index 4f80bcbd356..30c3abfc15b 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,11 @@
+2022-09-23  Thomas Schwinge  
+
+	Backport from master branch:
+	2022-05-09  Martin Liska  
+
+	* system.h (LIKELY): Define.
+	(UNLIKELY): Likewise.
+
 2022-09-12  Tobias Burnus  
 
 	Backport from mainline:
diff --git a/gcc/system.h b/gcc/system.h
index e10c34f70ec..6b6868d0bbf 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -736,6 +736,9 @@ extern int vsnprintf (char *, size_t, const char *, va_list);
 #define __builtin_expect(a, b) (a)
 #endif
 
+#define LIKELY(x) (__builtin_expect ((x), 1))
+#define UNLIKELY(x) (__builtin_expect ((x), 0))
+
 /* Some of the headers included by  can use "abort" within a
namespace, e.g. "_VSTD::abort();", which fails after we use the
preprocessor to redefine "abort" as "fancy_abort" below.  */
-- 
2.35.1



Re: [PATCH] RISC-V: Suppress riscv-selftests.cc warning.

2022-09-23 Thread Kito Cheng via Gcc-patches
Committed, but squashed changes to "RISC-V: Support poly move
manipulation and selftests." instead of a standalone commit.

On Sat, Sep 17, 2022 at 9:00 AM  wrote:
>
> From: Ju-Zhe Zhong 
>
> This patch is a fix patch for:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601643.html
>
> Suppress the warning as follows:
>
> ../../../riscv-gcc/gcc/poly-int.h: In function
> ‘poly_int64 eval_value(rtx, std::map&)’:
> ../../../riscv-gcc/gcc/poly-int.h:845:48: warning:
> ‘*((void*)& op2_val +8)’ may be used uninitialized
> in this function [-Wmaybe-uninitialized]
>  POLY_SET_COEFF (C, r, i, NCa (a.coeffs[i]) + b.coeffs[i]);
> ^
> ../../../riscv-gcc/gcc/config/riscv/riscv-selftests.cc:74:23:
> note: ‘*((void*)& op2_val +8)’ was declared here
>poly_int64 op1_val, op2_val;
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-selftests.cc (eval_value): Add initial value.
>
> ---
>  gcc/config/riscv/riscv-selftests.cc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv-selftests.cc 
> b/gcc/config/riscv/riscv-selftests.cc
> index 167cd47c880..490b6ed6b8e 100644
> --- a/gcc/config/riscv/riscv-selftests.cc
> +++ b/gcc/config/riscv/riscv-selftests.cc
> @@ -71,7 +71,8 @@ eval_value (rtx x, std::map _to_rtx)
>unsigned regno = REGNO (x);
>expr = regno_to_rtx[regno];
>
> -  poly_int64 op1_val, op2_val;
> +  poly_int64 op1_val = 0;
> +  poly_int64 op2_val = 0;
>if (UNARY_P (expr))
>  {
>op1_val = eval_value (XEXP (expr, 0), regno_to_rtx);
> --
> 2.36.1
>


Re: [PATCH] RISC-V: Support poly move manipulation and selftests.

2022-09-23 Thread Kito Cheng via Gcc-patches
Committed. thanks!

On Thu, Sep 15, 2022 at 4:29 PM  wrote:
>
> From: zhongjuzhe 
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc: Change "static void" to "void".
> * config.gcc: Add riscv-selftests.o
> * config/riscv/predicates.md: Allow const_poly_int.
> * config/riscv/riscv-protos.h (riscv_reinit): New function.
> (riscv_parse_arch_string): change as exten function.
> (riscv_run_selftests): New function.
> * config/riscv/riscv.cc (riscv_cannot_force_const_mem): Don't allow 
> poly into const pool.
> (riscv_report_v_required): New function.
> (riscv_expand_op): New function.
> (riscv_expand_mult_with_const_int): New function.
> (riscv_legitimize_poly_move): Ditto.
> (riscv_legitimize_move): New function.
> (riscv_hard_regno_mode_ok): Add VL/VTYPE register allocation and fix 
> vector RA.
> (riscv_convert_vector_bits): Fix riscv_vector_chunks configuration 
> for -marh no 'v'.
> (riscv_reinit): New function.
> (TARGET_RUN_TARGET_SELFTESTS): New target hook support.
> * config/riscv/t-riscv: Add riscv-selftests.o.
> * config/riscv/riscv-selftests.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * selftests/riscv/empty-func.rtl: New test.
>
> ---
>  gcc/common/config/riscv/riscv-common.cc  |   2 +-
>  gcc/config.gcc   |   2 +-
>  gcc/config/riscv/predicates.md   |   3 +
>  gcc/config/riscv/riscv-protos.h  |   9 +
>  gcc/config/riscv/riscv-selftests.cc  | 239 +++
>  gcc/config/riscv/riscv.cc| 298 ++-
>  gcc/config/riscv/t-riscv |   4 +
>  gcc/testsuite/selftests/riscv/empty-func.rtl |   8 +
>  8 files changed, 558 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/config/riscv/riscv-selftests.cc
>  create mode 100644 gcc/testsuite/selftests/riscv/empty-func.rtl
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index 77219162eeb..c39ed2e2696 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -1224,7 +1224,7 @@ static const riscv_ext_flag_table_t 
> riscv_ext_flag_table[] =
>  /* Parse a RISC-V ISA string into an option mask.  Must clear or set all arch
> dependent mask bits, in case more than one -march string is passed.  */
>
> -static void
> +void
>  riscv_parse_arch_string (const char *isa,
>  struct gcc_options *opts,
>  location_t loc)
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index f4e757bd853..27ffce3fb50 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -515,7 +515,7 @@ pru-*-*)
> ;;
>  riscv*)
> cpu_type=riscv
> -   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
> riscv-shorten-memrefs.o"
> +   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
> riscv-shorten-memrefs.o riscv-selftests.o"
> d_target_objs="riscv-d.o"
> ;;
>  rs6000*-*-*)
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index 862e72b0983..5e149b3a95f 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -146,6 +146,9 @@
>  case CONST_INT:
>return !splittable_const_int_operand (op, mode);
>
> +case CONST_POLY_INT:
> +  return known_eq (rtx_to_poly_int64 (op), BYTES_PER_RISCV_VECTOR);
> +
>  case CONST:
>  case SYMBOL_REF:
>  case LABEL_REF:
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 649c5c977e1..f9a2baa46c7 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -74,6 +74,7 @@ extern bool riscv_expand_block_move (rtx, rtx, rtx);
>  extern bool riscv_store_data_bypass_p (rtx_insn *, rtx_insn *);
>  extern rtx riscv_gen_gpr_save_insn (struct riscv_frame_info *);
>  extern bool riscv_gpr_save_operation_p (rtx);
> +extern void riscv_reinit (void);
>
>  /* Routines implemented in riscv-c.cc.  */
>  void riscv_cpu_cpp_builtins (cpp_reader *);
> @@ -86,6 +87,7 @@ extern void riscv_init_builtins (void);
>
>  /* Routines implemented in riscv-common.cc.  */
>  extern std::string riscv_arch_str (bool version_p = true);
> +extern void riscv_parse_arch_string (const char *, struct gcc_options *, 
> location_t);
>
>  extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
>
> @@ -105,4 +107,11 @@ struct riscv_cpu_info {
>
>  extern const riscv_cpu_info *riscv_find_cpu (const char *);
>
> +/* Routines implemented in riscv-selftests.cc.  */
> +#if CHECKING_P
> +namespace selftest {
> +extern void riscv_run_selftests (void);
> +} // namespace selftest
> +#endif
> +
>  #endif /* ! GCC_RISCV_PROTOS_H */
> diff --git a/gcc/config/riscv/riscv-selftests.cc 
> b/gcc/config/riscv/riscv-selftests.cc
> new file mode 100644
> index 000..167cd47c880
> --- 

[Patch] OpenACC: Fix reduction tree-sharing issue [PR106982]

2022-09-23 Thread Tobias Burnus

This fixes a tree-sharing ICE.  It seems as if all the unshare_expr
calls I added were required in this case.  The first, long testcase is
based on the real testcase from the OpenACC testsuite; the second
one is what reduction produced - but I thought some nested reduction
might be interesting as well; hence, I included both tests.


Bootstrapped and regtested on x86-64-gnu-linux w/o offloading.
OK for mainline and GCC 12?

(It gives an ICE with GCC 10 but not with GCC 9; thus,
more regression-fix backporting would be possible,
if someone cares.)

Tobias


OpenACC: Fix reduction tree-sharing issue [PR106982]

The tree for var == incoming == outgoing was
'MEM  [(double *)]' which caused the ICE
"incorrect sharing of tree nodes".

	PR middle-end/106982

gcc/ChangeLog:

	* omp-low.cc (lower_oacc_reductions): Add some unshare_expr.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/reduction-7.c: New test.
	* c-c++-common/goacc/reduction-8.c: New test.

 gcc/omp-low.cc | 17 -
 gcc/testsuite/c-c++-common/goacc/reduction-7.c | 22 ++
 gcc/testsuite/c-c++-common/goacc/reduction-8.c | 12 
 3 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index f0469d20b3d..8e07fb5d8a8 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -7631,7 +7631,12 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	  incoming = build_simple_mem_ref (incoming);
 	  }
 	else
-	  v1 = v2 = v3 = var;
+	  {
+	v1 = unshare_expr (var);
+	v2 = unshare_expr (var);
+	v3 = unshare_expr (var);
+	outgoing = unshare_expr (outgoing);
+	  }
 
 	/* Determine position in reduction buffer, which may be used
 	   by target.  The parser has ensured that this is not a
@@ -7659,21 +7664,23 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 	  TREE_TYPE (var), 6, setup_code,
 	  unshare_expr (ref_to_res),
-	  incoming, level, op, off);
+	  unshare_expr (incoming), level,
+	  op, off);
 	tree init_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 	  TREE_TYPE (var), 6, init_code,
 	  unshare_expr (ref_to_res),
-	  v1, level, op, off);
+	  unshare_expr (v1), level, op, off);
 	tree fini_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 	  TREE_TYPE (var), 6, fini_code,
 	  unshare_expr (ref_to_res),
-	  v2, level, op, off);
+	  unshare_expr (v2), level, op, off);
 	tree teardown_call
 	  = build_call_expr_internal_loc (loc, IFN_GOACC_REDUCTION,
 	  TREE_TYPE (var), 6, teardown_code,
-	  ref_to_res, v3, level, op, off);
+	  ref_to_res, unshare_expr (v3),
+	  level, op, off);
 
 	gimplify_assign (v1, setup_call, _fork);
 	gimplify_assign (v2, init_call, _fork);
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-7.c b/gcc/testsuite/c-c++-common/goacc/reduction-7.c
new file mode 100644
index 000..482b0ab1984
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-7.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+
+/* PR middle-end/106982 */
+
+long long n = 100;
+int multiplicitive_n = 128;
+
+void test1(double *rand, double *a, double *b, double *c)
+{
+#pragma acc data copyin(a[0:10*multiplicitive_n], b[0:10*multiplicitive_n]) copyout(c[0:10])
+{
+#pragma acc parallel loop
+for (int i = 0; i < 10; ++i)
+{
+double temp = 1.0;
+#pragma acc loop vector reduction(*:temp)
+for (int j = 0; j < multiplicitive_n; ++j)
+  temp *= a[(i * multiplicitive_n) + j] + b[(i * multiplicitive_n) + j];
+c[i] = temp;
+}
+}
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-8.c b/gcc/testsuite/c-c++-common/goacc/reduction-8.c
new file mode 100644
index 000..2c3ed499d5b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-8.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+
+/* PR middle-end/106982 */
+
+void test1(double *c)
+{
+double reduced[5];
+#pragma acc parallel loop gang private(reduced)
+for (int x = 0; x < 5; ++x)
+#pragma acc loop worker reduction(*:reduced)
+  for (int y = 0; y < 5; ++y) { }
+}


Re: [PATCH] c++: Implement __is_{nothrow_,}convertible [PR106784]

2022-09-23 Thread Marek Polacek via Gcc-patches
On Fri, Sep 23, 2022 at 03:40:23PM +0100, Jonathan Wakely wrote:
> On Thu, 22 Sept 2022 at 23:14, Jason Merrill wrote:
> > On 9/22/22 09:39, Marek Polacek wrote:
> > > This patch doesn't make libstdc++ use the new built-ins, but I had to
> > > rename a class otherwise its name would clash with the new built-in.
> >
> > Sigh, that's going to be a hassle when comparing compiler versions on
> > preprocessed code.
> 
> Good point. Clang has some gross hacks that we could consider. When it
> sees a declaration that uses a built-in name, it disables the built-in
> for the remainder of the translation unit. It does this precisely to
> allow a new Clang to compile old std::lib headers where a built-in
> like __is_assignable was used as a normal class template, not the
> built-in (because no such built-in existed at the time the library
> code was written). For us, this is only really a problem when
> bisecting bugs and using a newer compiler to compile .ii files from
older headers, but for Clang combining a new Clang with older
> libstdc++ headers is a hard requirement (recall that when Clang was
> first deployed to macOS it had to consume the system's libstdc++ 4.2
> headers).
> 
> It's a big kluge, but it would mean that a new GCC could happily
> consume preprocessed code from older libstdc++ headers.

Ah, you're right, it must be this lib/Parse/ParseDeclCXX.cpp code:

// GNU libstdc++ 4.2 and libc++ use certain intrinsic names as the
// name of struct templates, but some are keywords in GCC >= 4.3
// and Clang. Therefore, when we see the token sequence "struct
// X", make X into a normal identifier rather than a keyword, to
// allow libstdc++ 4.2 and libc++ to work properly.
TryKeywordIdentFallback(true);
 
Whether we want to do this, I'm not sure.

Marek



Re: [PATCH] c++: Implement __is_{nothrow_,}convertible [PR106784]

2022-09-23 Thread Jonathan Wakely via Gcc-patches
On Fri, 23 Sept 2022 at 15:34, Marek Polacek wrote:
>
> On Thu, Sep 22, 2022 at 06:14:44PM -0400, Jason Merrill wrote:
> > On 9/22/22 09:39, Marek Polacek wrote:
> > > To improve compile times, the C++ library could use compiler built-ins
> > > rather than implementing std::is_convertible (and _nothrow) as class
> > > templates.  This patch adds the built-ins.  We already have
> > > __is_constructible and __is_assignable, and the nothrow forms of those.
> > >
> > > Microsoft (and clang, for compatibility) also provide an alias called
> > > __is_convertible_to.  I did not add it, but it would be trivial to do
> > > so.
> > >
> > > I noticed that our __is_assignable doesn't implement the "Access checks
> > > are performed as if from a context unrelated to either type" requirement,
> > > therefore std::is_assignable / __is_assignable give two different results
> > > here:
> > >
> > >class S {
> > >  operator int();
> > >  friend void g(); // #1
> > >};
> > >
> > >void
> > >g ()
> > >{
> > >  // #1 doesn't matter
> > >  static_assert(std::is_assignable::value, "");
> > >  static_assert(__is_assignable(int&, S), "");
> > >}
> > >
> > > This is not a problem if __is_assignable is not meant to be used by
> > > the users.
> >
> > That's fine, it's not.
>
> Okay then.  libstdc++ needs to make sure then that it's handled right.

That's fine, the type traits in libstdc++ are always "a context
unrelated to either type", unless users do something idiotic like
declare std::is_assignable as a friend.

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1339r1.pdf
wants to explicitly say that's idiotic.



Re: [PATCH] c++: Implement __is_{nothrow_,}convertible [PR106784]

2022-09-23 Thread Jonathan Wakely via Gcc-patches
On Thu, 22 Sept 2022 at 23:14, Jason Merrill wrote:
> On 9/22/22 09:39, Marek Polacek wrote:
> > This patch doesn't make libstdc++ use the new built-ins, but I had to
> > rename a class otherwise its name would clash with the new built-in.
>
> Sigh, that's going to be a hassle when comparing compiler versions on
> preprocessed code.

Good point. Clang has some gross hacks that we could consider. When it
sees a declaration that uses a built-in name, it disables the built-in
for the remainder of the translation unit. It does this precisely to
allow a new Clang to compile old std::lib headers where a built-in
like __is_assignable was used as a normal class template, not the
built-in (because no such built-in existed at the time the library
code was written). For us, this is only really a problem when
bisecting bugs and using a newer compiler to compile .ii files from
older headers, but for Clang combining a new Clang with older
libstdc++ headers is a hard requirement (recall that when Clang was
first deployed to macOS it had to consume the system's libstdc++ 4.2
headers).

It's a big kluge, but it would mean that a new GCC could happily
consume preprocessed code from older libstdc++ headers.



Re: [PATCH] c++: Implement __is_{nothrow_,}convertible [PR106784]

2022-09-23 Thread Marek Polacek via Gcc-patches
On Thu, Sep 22, 2022 at 06:14:44PM -0400, Jason Merrill wrote:
> On 9/22/22 09:39, Marek Polacek wrote:
> > To improve compile times, the C++ library could use compiler built-ins
> > rather than implementing std::is_convertible (and _nothrow) as class
> > templates.  This patch adds the built-ins.  We already have
> > __is_constructible and __is_assignable, and the nothrow forms of those.
> > 
> > Microsoft (and clang, for compatibility) also provide an alias called
> > __is_convertible_to.  I did not add it, but it would be trivial to do
> > so.
> > 
> > I noticed that our __is_assignable doesn't implement the "Access checks
> > are performed as if from a context unrelated to either type" requirement,
> > therefore std::is_assignable / __is_assignable give two different results
> > here:
> > 
> >class S {
> >  operator int();
> >  friend void g(); // #1
> >};
> > 
> >void
> >g ()
> >{
> >  // #1 doesn't matter
> >  static_assert(std::is_assignable::value, "");
> >  static_assert(__is_assignable(int&, S), "");
> >}
> > 
> > This is not a problem if __is_assignable is not meant to be used by
> > the users.
> 
> That's fine, it's not.
 
Okay then.  libstdc++ needs to make sure then that it's handled right.

> > This patch doesn't make libstdc++ use the new built-ins, but I had to
> > rename a class otherwise its name would clash with the new built-in.
> 
> Sigh, that's going to be a hassle when comparing compiler versions on
> preprocessed code.

Yeah, I guess :/.  Kind of like __integer_pack / __make_integer_seq.

> > --- a/gcc/cp/constraint.cc
> > +++ b/gcc/cp/constraint.cc
> > @@ -3697,6 +3697,12 @@ diagnose_trait_expr (tree expr, tree args)
> >   case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS:
> > inform (loc, "  %qT does not have unique object representations", 
> > t1);
> > break;
> > +case CPTK_IS_CONVERTIBLE:
> > +  inform (loc, "  %qT is not convertible from %qE", t2, t1);
> > +  break;
> > +case CPTK_IS_NOTHROW_CONVERTIBLE:
> > +   inform (loc, "  %qT is not %<nothrow%> convertible from %qE", t2, t1);
> 
> It's odd that the existing diagnostics quote "nothrow", which is not a
> keyword.  I wonder why these library traits didn't use "noexcept"?

Eh, yeah, only "throw" is.  The quotes were deliberately added in
.  Should
I prepare a separate patch to use "%<noexcept%>" rather than "%<nothrow%>"?
OTOH, the traits have "nothrow" in their names, so maybe just go back to
"nothrow"?
 
> > --- a/gcc/cp/method.cc
> > +++ b/gcc/cp/method.cc
> > @@ -2236,6 +2236,37 @@ ref_xes_from_temporary (tree to, tree from, bool 
> > direct_init_p)
> > return ref_conv_binds_directly (to, val, direct_init_p).is_false ();
> >   }
> > +/* Return true if FROM can be converted to TO using implicit conversions,
> > +   or both FROM and TO are possibly cv-qualified void.  NB: This doesn't
> > +   implement the "Access checks are performed as if from a context 
> > unrelated
> > +   to either type" restriction.  */
> > +
> > +bool
> > +is_convertible (tree from, tree to)
> 
> You didn't want to add conversion to is*_xible?

No, it didn't look like a good fit.  It does things we don't need, and
also has if VOID_TYPE_P -> return error_mark_node; which would be wrong
for __is_convertible.

I realized I'm not testing passing an incomplete type to the built-in,
but since that is UB, I reckon we don't need to test it (we issue
"error: invalid use of incomplete type").

Marek



Re: [PATCH 2/2]AArch64 Perform more late folding of reg moves and shifts which arrive after expand

2022-09-23 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> Similar to the 1/2 patch but adds additional back-end specific folding for if
> the register sequence was created as a result of RTL optimizations.
>
> Concretely:
>
> #include 
>
> unsigned int foor (uint32x4_t x)
> {
> return x[1] >> 16;
> }
>
> generates:
>
> foor:
> umovw0, v0.h[3]
> ret
>
> instead of
>
> foor:
> umovw0, v0.s[1]
> lsr w0, w0, 16
> ret

The same thing ought to work for smov, so it would be good to do both.
That would also make the split between the original and new patterns
more obvious: left shift for the old pattern, right shift for the new
pattern.

> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md (*si3_insn_uxtw): Split SHIFT into
>   left and right ones.
>   * config/aarch64/constraints.md (Usl): New.
>   * config/aarch64/iterators.md (SHIFT_NL, LSHIFTRT): New.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/shift-read.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> c333fb1f72725992bb304c560f1245a242d5192d..6aa1fb4be003f2027d63ac69fd314c2bbc876258
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5493,7 +5493,7 @@ (define_insn "*rol3_insn"
>  ;; zero_extend version of shifts
>  (define_insn "*si3_insn_uxtw"
>[(set (match_operand:DI 0 "register_operand" "=r,r")
> - (zero_extend:DI (SHIFT_no_rotate:SI
> + (zero_extend:DI (SHIFT_arith:SI
>(match_operand:SI 1 "register_operand" "r,r")
>(match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Uss,r"]
>""
> @@ -5528,6 +5528,60 @@ (define_insn "*rolsi3_insn_uxtw"
>[(set_attr "type" "rotate_imm")]
>  )
>  
> +(define_insn "*si3_insn2_uxtw"
> +  [(set (match_operand:DI 0 "register_operand" "=r,?r,r")

Is the "?" justified?  It seems odd to penalise a native,
single-instruction r->r operation in favour of a w->r operation.

> + (zero_extend:DI (LSHIFTRT:SI
> +  (match_operand:SI 1 "register_operand" "w,r,r")
> +  (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"]
> +  ""
> +  {
> +switch (which_alternative)
> +{
> +  case 0:
> + {
> +   machine_mode dest, vec_mode;
> +   int val = INTVAL (operands[2]);
> +   int size = 32 - val;
> +   if (size == 16)
> + dest = HImode;
> +   else if (size == 8)
> + dest = QImode;
> +   else
> + gcc_unreachable ();
> +
> +   /* Get nearest 64-bit vector mode.  */
> +   int nunits = 64 / size;
> +   auto vector_mode
> + = mode_for_vector (as_a  (dest), nunits);
> +   if (!vector_mode.exists (_mode))
> + gcc_unreachable ();
> +   operands[1] = gen_rtx_REG (vec_mode, REGNO (operands[1]));
> +   operands[2] = gen_int_mode (val / size, SImode);
> +
> +   /* Ideally we just call aarch64_get_lane_zero_extend but reload gets
> +  into a weird loop due to a mov of w -> r being present most time
> +  this instruction applies.  */
> +   switch (dest)
> +   {
> + case QImode:
> +   return "umov\\t%w0, %1.b[%2]";
> + case HImode:
> +   return "umov\\t%w0, %1.h[%2]";
> + default:
> +   gcc_unreachable ();
> +   }

Doesn't this reduce to something like:

  if (size == 16)
return "umov\\t%w0, %1.h[1]";
  if (size == 8)
return "umov\\t%w0, %1.b[3]";
  gcc_unreachable ();

?  We should print %1 correctly as vN even with its original type.

Thanks,
Richard

> + }
> +  case 1:
> + return "\\t%w0, %w1, %2";
> +  case 2:
> + return "\\t%w0, %w1, %w2";
> +  default:
> + gcc_unreachable ();
> +  }
> +  }
> +  [(set_attr "type" "neon_to_gp,bfx,shift_reg")]
> +)
> +
>  (define_insn "*3_insn"
>[(set (match_operand:SHORT 0 "register_operand" "=r")
>   (ASHIFT:SHORT (match_operand:SHORT 1 "register_operand" "r")
> diff --git a/gcc/config/aarch64/constraints.md 
> b/gcc/config/aarch64/constraints.md
> index 
> ee7587cca1673208e2bfd6b503a21d0c8b69bf75..470510d691ee8589aec9b0a71034677534641bea
>  100644
> --- a/gcc/config/aarch64/constraints.md
> +++ b/gcc/config/aarch64/constraints.md
> @@ -166,6 +166,14 @@ (define_constraint "Uss"
>(and (match_code "const_int")
> (match_test "(unsigned HOST_WIDE_INT) ival < 32")))
>  
> +(define_constraint "Usl"
> +  "@internal
> +  A constraint that matches an immediate shift constant in SImode that has an
> +  exact mode available to use."
> +  (and (match_code "const_int")
> +   (and (match_test "satisfies_constraint_Uss (op)")
> + (match_test "(32 - ival == 8) || (32 - ival == 16)")))
> +
>  (define_constraint "Usn"
>   "A constant that can be used with a CCMN operation (once negated)."
>   (and (match_code "const_int")
> diff --git 

Re: [PATCH] Avoid depending on destructor order

2022-09-23 Thread David Edelsohn via Gcc-patches
On Fri, Sep 23, 2022 at 10:12 AM Thomas Neumann  wrote:

> >
> > +static const bool in_shutdown = false;
> >
> > I'll let Jason or others decide if this is the right solution.  It seems
> > that in_shutdown also could be declared outside the #ifdef and
> > initialized as "false".
>
> sure, either is fine. Moving it outside the #ifdef wastes one byte in
> the executable (while the compiler can eliminate the const), but it does
> not really matter.
>
> I have verified that the patch below fixes builds for both fast-path and
> non-fast-path builds. But if you prefer I will move the in_shutdown
> definition instead.
>
> Best
>
> Thomas
>
> PS: in_shutdown is an int here instead of a bool because non-fast-path
> builds do not include stdbool. Not a good reason, of course, but I
> wanted to keep the patch minimal and it makes no difference in practice.
>
>
>  When using the atomic fast path deregistering can fail during
>  program shutdown if the lookup structures are already destroyed.
>  The assert in __deregister_frame_info_bases takes that into
>  account. In the non-fast-path case, however, the code is not aware of
>  program shutdown, which caused a compiler error on such platforms.
>  We fix that by introducing a constant for in_shutdown in
>  non-fast-path builds.
>
>  libgcc/ChangeLog:
>  * unwind-dw2-fde.c: Introduce a constant for in_shutdown
>  for the non-fast-path case.
>
> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
> index d237179f4ea..0bcd5061d76 100644
> --- a/libgcc/unwind-dw2-fde.c
> +++ b/libgcc/unwind-dw2-fde.c
> @@ -67,6 +67,8 @@ static void
>   init_object (struct object *ob);
>
>   #else
> +/* Without fast path frame deregistration must always succeed.  */
> +static const int in_shutdown = 0;
>
>   /* The unseen_objects list contains objects that have been registered
>  but not yet categorized in any way.  The seen_objects list has had
>

Thanks for the patch.  I'll let you and Jason decide which style solution
is preferred.

Thanks, David


[committed] libstdc++: Micro-optimization for std::bitset stream extraction

2022-09-23 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, pushed to trunk.

-- >8 --

Don't bother trying to copy any characters for bitset<0>.

libstdc++-v3/ChangeLog:

* include/std/bitset (operator>>): Do not copy for N==0.
* testsuite/20_util/bitset/io/input.cc: Add comment.
---
 libstdc++-v3/include/std/bitset   | 2 +-
 libstdc++-v3/testsuite/20_util/bitset/io/input.cc | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/bitset b/libstdc++-v3/include/std/bitset
index 83c6416b770..6dbc58c6429 100644
--- a/libstdc++-v3/include/std/bitset
+++ b/libstdc++-v3/include/std/bitset
@@ -1615,7 +1615,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 
   if (__tmp.empty() && _Nb)
__state |= __ios_base::failbit;
-  else
+  else if _GLIBCXX17_CONSTEXPR (_Nb)
__x._M_copy_from_string(__tmp, static_cast<size_t>(0), _Nb,
__zero, __one);
   if (__state)
diff --git a/libstdc++-v3/testsuite/20_util/bitset/io/input.cc 
b/libstdc++-v3/testsuite/20_util/bitset/io/input.cc
index 939861b171e..0f22cefbb5b 100644
--- a/libstdc++-v3/testsuite/20_util/bitset/io/input.cc
+++ b/libstdc++-v3/testsuite/20_util/bitset/io/input.cc
@@ -39,7 +39,7 @@ void test01()
   ss.clear();
   ss.str("*");
   ss >> b0;
-  VERIFY( ss.rdstate() == ios_base::goodbit );
+  VERIFY( ss.rdstate() == ios_base::goodbit ); // LWG 3199
 }
 
 int main()
-- 
2.37.3



Re: [PATCH] Avoid depending on destructor order

2022-09-23 Thread Thomas Neumann via Gcc-patches


+static const bool in_shutdown = false;

I'll let Jason or others decide if this is the right solution.  It seems 
that in_shutdown also could be declared outside the #ifdef and 
initialized as "false".


sure, either is fine. Moving it outside the #ifdef wastes one byte in 
the executable (while the compiler can eliminate the const), but it does 
not really matter.


I have verified that the patch below fixes builds for both fast-path and 
non-fast-path builds. But if you prefer I will move the in_shutdown 
definition instead.


Best

Thomas

PS: in_shutdown is an int here instead of a bool because non-fast-path 
builds do not include stdbool. Not a good reason, of course, but I 
wanted to keep the patch minimal and it makes no difference in practice.



When using the atomic fast path deregistering can fail during
program shutdown if the lookup structures are already destroyed.
The assert in __deregister_frame_info_bases takes that into
account. In the non-fast-path case however is not aware of
program shutdown, which caused a compiler error on such platforms.
We fix that by introducing a constant for in_shutdown in
non-fast-path builds.

libgcc/ChangeLog:
* unwind-dw2-fde.c: Introduce a constant for in_shutdown
for the non-fast-path case.

diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index d237179f4ea..0bcd5061d76 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -67,6 +67,8 @@ static void
 init_object (struct object *ob);

 #else
+/* Without fast path frame deregistration must always succeed.  */
+static const int in_shutdown = 0;

 /* The unseen_objects list contains objects that have been registered
but not yet categorized in any way.  The seen_objects list has had


Re: [PATCH] Avoid depending on destructor order

2022-09-23 Thread David Edelsohn via Gcc-patches
On Fri, Sep 23, 2022 at 9:38 AM Thomas Neumann  wrote:

> > This patch broke bootstrap on AIX and probably other targets.
> >
> > #ifdef ATOMIC_FDE_FAST_PATH
> > #include "unwind-dw2-btree.h"
> >
> > static struct btree registered_frames;
> > static bool in_shutdown;
> > ...
> > #else
> >
> > in_shutdown only is defined for ATOMIC_FDE_FAST_PATH but used in code /
> > asserts not protected by that macro.
> >
> >gcc_assert (in_shutdown || ob);
> >return (void *) ob;
> > }
>
> I am sorry for that, I did not consider that my test machines all use
> the fast path.
>
> I think the problem can be fixed by the trivial patch below, I will
> commit that after I have tested builds both with and without fast path.
>
> Best
>
> Thomas
>
>
> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
> index d237179f4ea..d6e347c5481 100644
> --- a/libgcc/unwind-dw2-fde.c
> +++ b/libgcc/unwind-dw2-fde.c
> @@ -67,6 +67,8 @@ static void
>   init_object (struct object *ob);
>
>   #else
> +/* Without fast path frame lookup must always succeed */
> +static const bool in_shutdown = false;
>
>   /* The unseen_objects list contains objects that have been registered
>  but not yet categorized in any way.  The seen_objects list has had
>

I tried the patch but it still failed because the type name "bool" is not
known.  This patch is the only use of "bool" in the libgcc source code,
which is C, not C++.

Thanks, David


Re: [PATCH] Avoid depending on destructor order

2022-09-23 Thread David Edelsohn via Gcc-patches
On Fri, Sep 23, 2022 at 9:38 AM Thomas Neumann  wrote:

> > This patch broke bootstrap on AIX and probably other targets.
> >
> > #ifdef ATOMIC_FDE_FAST_PATH
> > #include "unwind-dw2-btree.h"
> >
> > static struct btree registered_frames;
> > static bool in_shutdown;
> > ...
> > #else
> >
> > in_shutdown only is defined for ATOMIC_FDE_FAST_PATH but used in code /
> > asserts not protected by that macro.
> >
> >gcc_assert (in_shutdown || ob);
> >return (void *) ob;
> > }
>
> I am sorry for that, I did not consider that my test machines all use
> the fast path.
>
> I think the problem can be fixed by the trivial patch below, I will
> commit that after I have tested builds both with and without fast path.
>
> Best
>
> Thomas
>
>
> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
> index d237179f4ea..d6e347c5481 100644
> --- a/libgcc/unwind-dw2-fde.c
> +++ b/libgcc/unwind-dw2-fde.c
> @@ -67,6 +67,8 @@ static void
>   init_object (struct object *ob);
>
>   #else
> +/* Without fast path frame lookup must always succeed */
>
The comment should end with full stop and two spaces.


> +static const bool in_shutdown = false;
>
I'll let Jason or others decide if this is the right solution.  It seems
that in_shutdown also could be declared outside the #ifdef and initialized
as "false".

Thanks, David


>
>   /* The unseen_objects list contains objects that have been registered
>  but not yet categorized in any way.  The seen_objects list has had
>


Re: [PATCH 2/2]AArch64 Add support for neg on v1df

2022-09-23 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Friday, September 23, 2022 6:04 AM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov 
>> Subject: Re: [PATCH 2/2]AArch64 Add support for neg on v1df
>> 
>> Tamar Christina  writes:
>> >> -Original Message-
>> >> From: Richard Sandiford 
>> >> Sent: Friday, September 23, 2022 5:30 AM
>> >> To: Tamar Christina 
>> >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> >> ; Marcus Shawcroft
>> >> ; Kyrylo Tkachov
>> 
>> >> Subject: Re: [PATCH 2/2]AArch64 Add support for neg on v1df
>> >>
>> >> Tamar Christina  writes:
>> >> > Hi All,
>> >> >
>> >> > This adds support for using scalar fneg on the V1DF type.
>> >> >
>> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >> >
>> >> > Ok for master?
>> >>
>> >> Why just this one operation though?  Couldn't we extend iterators
>> >> like
>> >> GPF_F16 to include V1DF, avoiding the need for new patterns?
>> >>
>> >
>> > Simply because it's the only one I know how to generate code for.
>> > I can change GPF_F16 but I don't know under which circumstances we'd
>> > generate a V1DF for the other operations.
>> 
>> We'd do it for things like:
>> 
>> __Float64x1_t foo (__Float64x1_t x) { return -x; }
>> 
>> if the pattern is available, instead of using subregs.  So one way would be 
>> to
>> scan the expand rtl dump for subregs.
>
> Ahh yes, I forgot about that ACLE type.
>
>> 
>> If the point is that there is no observable difference between defining 1-
>> element vector ops and not, except for this one case, then that suggests we
>> should handle this case in target-independent code instead.  There's no point
>> forcing every target that has V1DF to define a duplicate of the DF neg
>> pattern.
>
> My original approach was to indeed use DF instead of V1DF, however since we
> do define V1DF I had expected the mode to be somewhat usable.
>
> So I'm happy to do whichever one you prefer now that I know how to test it.
> I can either change my mid-end code, or extend the coverage of V1DF, any 
> preference? 

I don't mind really, as long as we're consistent.  Maybe Richi has an opinion.

If he doesn't mind either, then I guess it makes sense to define the ops
as completely as possible (e.g. equivalently to V2SF), although it doesn't
need to be all in one go.

Thanks,
Richard

> Tamar
>
>> 
>> Thanks,
>> Richard
>> >
>> > So if it's ok to do so without full test coverage I'm happy to do so...
>> >
>> > Tamar.
>> >
>> >> Richard
>> >>
>> >> >
>> >> > Thanks,
>> >> > Tamar
>> >> >
>> >> > gcc/ChangeLog:
>> >> >
>> >> > * config/aarch64/aarch64-simd.md (negv1df2): New.
>> >> >
>> >> > gcc/testsuite/ChangeLog:
>> >> >
>> >> > * gcc.target/aarch64/simd/addsub_2.c: New test.
>> >> >
>> >> > --- inline copy of patch --
>> >> > diff --git a/gcc/config/aarch64/aarch64-simd.md
>> >> > b/gcc/config/aarch64/aarch64-simd.md
>> >> > index
>> >> >
>> >>
>> f4152160084d6b6f34bd69f0ba6386c1ab50f77e..cf8c094bd4b76981cef2dd5dd7
>> >> b8
>> >> > e6be0d56101f 100644
>> >> > --- a/gcc/config/aarch64/aarch64-simd.md
>> >> > +++ b/gcc/config/aarch64/aarch64-simd.md
>> >> > @@ -2713,6 +2713,14 @@ (define_insn "neg2"
>> >> >[(set_attr "type" "neon_fp_neg_")]
>> >> >  )
>> >> >
>> >> > +(define_insn "negv1df2"
>> >> > + [(set (match_operand:V1DF 0 "register_operand" "=w")
>> >> > +   (neg:V1DF (match_operand:V1DF 1 "register_operand" "w")))]
>> >> > +"TARGET_SIMD"
>> >> > + "fneg\\t%d0, %d1"
>> >> > +  [(set_attr "type" "neon_fp_neg_d")]
>> >> > +)
>> >> > +
> >> >  (define_insn "abs<mode>2"
>> >> >   [(set (match_operand:VHSDF 0 "register_operand" "=w")
>> >> > (abs:VHSDF (match_operand:VHSDF 1 "register_operand"
>> >> > "w")))] diff --git
>> >> > a/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c
>> >> > b/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c
>> >> > new file mode 100644
>> >> > index
>> >> >
>> >>
>> ..55a7365e897f8af509de953129
>> >> e0
>> >> > f516974f7ca8
>> >> > --- /dev/null
>> >> > +++ b/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c
>> >> > @@ -0,0 +1,22 @@
>> >> > +/* { dg-do compile } */
>> >> > +/* { dg-options "-Ofast" } */
>> >> > +/* { dg-final { check-function-bodies "**" "" "" { target { le } }
>> >> > +} } */
>> >> > +
>> >> > +#pragma GCC target "+nosve"
>> >> > +
>> >> > +/*
>> >> > +** f1:
>> >> > +** ...
> >> > +** fneg d[0-9]+, d[0-9]+
> >> > +** fadd v[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s
>> >> > +** ...
>> >> > +*/
>> >> > +void f1 (float *restrict a, float *restrict b, float *res, int n) {
>> >> > +   for (int i = 0; i < 2; i+=2)
>> >> > +{
>> >> > +  res[i+0] = a[i+0] + b[i+0];
>> >> > +  res[i+1] = a[i+1] - b[i+1];
>> >> > +}
>> >> > +}
>> >> > +


[PATCH v2] libgo: Portable access to thread ID in struct sigevent

2022-09-23 Thread soeren--- via Gcc-patches
From: Sören Tempel 

Tested on x86_64 Arch Linux (glibc) and Alpine Linux (musl libc).

Previously, libgo relied on the _sigev_un implementation-specific
field in struct sigevent, which is only available on glibc. This
patch uses the sigev_notify_thread_id macro instead which is mandated
by timer_create(2). In theory, this should work with any libc
implementation for Linux. Unfortunately, there is an open glibc bug
as glibc does not define this macro. For this reason, a glibc-specific
workaround is required. Other libcs (such as musl) define the macro
and don't require the workaround.

This makes go_signal compatible with musl libc.

See: https://sourceware.org/bugzilla/show_bug.cgi?id=27417

Signed-off-by: Sören Tempel 
---
Changes since v1: Add workaround for glibc.

 libgo/go/runtime/os_linux.go |  4 +++-
 libgo/runtime/go-signal.c| 15 +++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/libgo/go/runtime/os_linux.go b/libgo/go/runtime/os_linux.go
index 96fb1788..6653d85e 100644
--- a/libgo/go/runtime/os_linux.go
+++ b/libgo/go/runtime/os_linux.go
@@ -22,6 +22,8 @@ type mOS struct {
profileTimerValid uint32
 }
 
+func setProcID(uintptr, int32)
+
 func getProcID() uint64 {
return uint64(gettid())
 }
@@ -365,7 +367,7 @@ func setThreadCPUProfiler(hz int32) {
var sevp _sigevent
sevp.sigev_notify = _SIGEV_THREAD_ID
sevp.sigev_signo = _SIGPROF
-   *((*int32)(unsafe.Pointer(&sevp._sigev_un))) = int32(mp.procid)
+   setProcID(uintptr(unsafe.Pointer(&sevp)), int32(mp.procid))
ret := timer_create(_CLOCK_THREAD_CPUTIME_ID, &sevp, &timerid)
if ret != 0 {
// If we cannot create a timer for this M, leave 
profileTimerValid false
diff --git a/libgo/runtime/go-signal.c b/libgo/runtime/go-signal.c
index 528d9b6d..c56350cc 100644
--- a/libgo/runtime/go-signal.c
+++ b/libgo/runtime/go-signal.c
@@ -16,6 +16,11 @@
   #define SA_RESTART 0
 #endif
 
+// Workaround for https://sourceware.org/bugzilla/show_bug.cgi?id=27417
+#if __linux__ && !defined(sigev_notify_thread_id)
+  #define sigev_notify_thread_id _sigev_un._tid
+#endif
+
 #ifdef USING_SPLIT_STACK
 
 extern void __splitstack_getcontext(void *context[10]);
@@ -183,6 +188,16 @@ setSigactionHandler(struct sigaction* sa, uintptr handler)
sa->sa_sigaction = (void*)(handler);
 }
 
+void setProcID(uintptr_t, int32_t)
+   __asm__ (GOSYM_PREFIX "runtime.setProcID");
+
+void
+setProcID(uintptr_t ptr, int32_t v)
+{
+   struct sigevent *s = (void *)ptr;
+   s->sigev_notify_thread_id = v;
+}
+
 // C code to fetch values from the siginfo_t and ucontext_t pointers
 // passed to a signal handler.
 


RE: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.

2022-09-23 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Gcc-patches  bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Tamar
> Christina via Gcc-patches
> Sent: Friday, September 23, 2022 9:14 AM
> To: Richard Biener 
> Cc: Richard Sandiford ; nd ;
> Tamar Christina via Gcc-patches ;
> juzhe.zh...@rivai.ai
> Subject: RE: [PATCH]middle-end Add optimized float addsub without
> needing VEC_PERM_EXPR.
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, September 23, 2022 8:54 AM
> > To: Tamar Christina 
> > Cc: Richard Sandiford ; Tamar Christina via
> > Gcc-patches ; nd ;
> > juzhe.zh...@rivai.ai
> > Subject: RE: [PATCH]middle-end Add optimized float addsub without
> > needing VEC_PERM_EXPR.
> >
> > On Fri, 23 Sep 2022, Tamar Christina wrote:
> >
> > > Hi,
> > >
> > > Attached is the respun version of the patch,
> > >
> > > > >>
> > > > >> Wouldn't a target need to re-check if lanes are NaN or denormal
> > > > >> if after a SFmode lane operation a DFmode lane operation follows?
> > > > >> IIRC that is what usually makes punning "integer" vectors as FP
> > > > >> vectors
> > costly.
> > >
> > > I don't believe this is a problem, due to NANs not being a single
> > > value and according to the standard the sign bit doesn't change the
> > meaning of a NAN.
> > >
> > > That's why specifically for negates generally no check is performed
> > > and it's assumed that if a value is a NaN going in, it's a NaN
> > > coming out, and this optimization doesn't change that.  Also under
> > > fast-math we don't guarantee a stable representation for NaN (or zeros,
> etc) afaik.
> > >
> > > So if that is still a concern I could add && !HONORS_NAN () to the
> > constraints.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * match.pd: Add fneg/fadd rule.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/aarch64/simd/addsub_1.c: New test.
> > >   * gcc.target/aarch64/sve/addsub_1.c: New test.
> > >
> > > --- inline version of patch ---
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd index
> > >
> >
> 1bb936fc4010f98f24bb97671350e8432c55b347..2617d56091dfbd41ae49f980e
> > e0a
> > > f3757f5ec1cf 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -7916,6 +7916,59 @@ and,
> > >(simplify (reduc (op @0 VECTOR_CST@1))
> > >  (op (reduc:type @0) (reduc:type @1
> > >
> > > +/* Simplify vector floating point operations of alternating sub/add pairs
> > > +   into using an fneg of a wider element type followed by a normal add.
> > > +   under IEEE 754 the fneg of the wider type will negate every even entry
> > > +   and when doing an add we get a sub of the even and add of every odd
> > > +   elements.  */
> > > +(simplify
> > > + (vec_perm (plus:c @0 @1) (minus @0 @1) VECTOR_CST@2)  (if
> > > +(!VECTOR_INTEGER_TYPE_P (type) && !BYTES_BIG_ENDIAN)
> >
> > shouldn't this be FLOAT_WORDS_BIG_ENDIAN instead?
> >
> > I'm still concerned what
> >
> >  (neg:V2DF (subreg:V2DF (reg:V4SF) 0))
> >
> > means for architectures like RISC-V.  Can one "reformat" FP values in
> > vector registers so that two floats overlap a double (and then back)?
> >
> > I suppose you rely on target_can_change_mode_class to tell you that.
> 
> Indeed, the documentation says:
> 
> "This hook returns true if it is possible to bitcast values held in registers 
> of
> class rclass from mode from to mode to and if doing so preserves the low-
> order bits that are common to both modes. The result is only meaningful if
> rclass has registers that can hold both from and to."
> 
> This implies to me that if the bitcast shouldn't be possible the hook should
> reject it.
> Of course you always where something is possible, but perhaps not cheap to
> do.
> 
> The specific implementation for RISC-V seem to imply to me that they
> disallow any FP conversions. So seems to be ok.
> 
> >
> >
> > > +  (with
> > > +   {
> > > + /* Build a vector of integers from the tree mask.  */
> > > + vec_perm_builder builder;
> > +   if (!tree_to_vec_perm_builder (&builder, @2))
> > > +   return NULL_TREE;
> > > +
> > > + /* Create a vec_perm_indices for the integer vector.  */
> > > + poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
> > > + vec_perm_indices sel (builder, 2, nelts);
> > > +   }
> > > +   (if (sel.series_p (0, 2, 0, 2))
> > > +(with
> > > + {
> > > +   machine_mode vec_mode = TYPE_MODE (type);
> > > +   auto elem_mode = GET_MODE_INNER (vec_mode);
> > > +   auto nunits = exact_div (GET_MODE_NUNITS (vec_mode), 2);
> > > +   tree stype;
> > > +   switch (elem_mode)
> > > +  {
> > > +  case E_HFmode:
> > > +stype = float_type_node;
> > > +break;
> > > +  case E_SFmode:
> > > +stype = double_type_node;
> > > +break;
> > > +  default:
> > > +return NULL_TREE;
> > > +  }
> >
> > Can't you use GET_MODE_WIDER_MODE and double-check the mode-size
> 

Re: [PATCH] Avoid depending on destructor order

2022-09-23 Thread Thomas Neumann via Gcc-patches

This patch broke bootstrap on AIX and probably other targets.

#ifdef ATOMIC_FDE_FAST_PATH
#include "unwind-dw2-btree.h"

static struct btree registered_frames;
static bool in_shutdown;
...
#else

in_shutdown only is defined for ATOMIC_FDE_FAST_PATH but used in code / 
asserts not protected by that macro.


   gcc_assert (in_shutdown || ob);
   return (void *) ob;
}


I am sorry for that, I did not consider that my test machines all use 
the fast path.


I think the problem can be fixed by the trivial patch below, I will 
commit that after I have tested builds both with and without fast path.


Best

Thomas


diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index d237179f4ea..d6e347c5481 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -67,6 +67,8 @@ static void
 init_object (struct object *ob);

 #else
+/* Without fast path frame lookup must always succeed */
+static const bool in_shutdown = false;

 /* The unseen_objects list contains objects that have been registered
but not yet categorized in any way.  The seen_objects list has had


Re: [PATCH] c++ modules: ICE with class NTTP argument [PR100616]

2022-09-23 Thread Patrick Palka via Gcc-patches
On Thu, 22 Sep 2022, Nathan Sidwell wrote:

> On 9/22/22 14:25, Patrick Palka wrote:
> 
> > index 80467c19254..722b64793ed 100644
> > --- a/gcc/cp/decl.cc
> > +++ b/gcc/cp/decl.cc
> > @@ -18235,9 +18235,11 @@ maybe_register_incomplete_var (tree var)
> > {
> >   /* When the outermost open class is complete we can resolve any
> >  pointers-to-members.  */
> > - tree context = outermost_open_class ();
> > - incomplete_var iv = {var, context};
> > - vec_safe_push (incomplete_vars, iv);
> > + if (tree context = outermost_open_class ())
> > +   {
> > + incomplete_var iv = {var, context};
> > + vec_safe_push (incomplete_vars, iv);
> > +   }
> 
> My immediate thought here is eek!  during stream in, the outermost_open_class
> could be anything -- to do with the context that wanted to lookup of the thing
> being streamed in, right?  So, the above change is I think just papering over
> a problem in this case.

D'oh, makes sense.  I suppose this second branch of
maybe_register_incomplete_var shouldn't ever be taken during stream-in.

> 
> not sure how to approach this.

Judging by the two commits that introduced/modified this part of
maybe_register_incomplete_var, r196852 and r214333, ISTM the code
is really only concerned with constexpr static data members (whose
initializer may contain a pointer-to-member for a currently open class).
So maybe we ought to restrict the branch like so, which effectively
disables this part of maybe_register_incomplete_var during stream-in, and
guarantees that outermost_open_class doesn't return NULL if the branch is
taken?

Bootstrapped and regtested on x86_64-pc-linux-gnu.


-- >8 --

Subject: [PATCH] c++ modules: ICE with class NTTP argument [PR100616]

PR c++/100616

gcc/cp/ChangeLog:

* decl.cc (maybe_register_incomplete_var): Restrict second branch
to static data members from a currently open class.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr100616_a.C: New test.
* g++.dg/modules/pr100616_b.C: New test.
---
 gcc/cp/decl.cc|  2 ++
 gcc/testsuite/g++.dg/modules/pr100616_a.C |  8 
 gcc/testsuite/g++.dg/modules/pr100616_b.C | 10 ++
 3 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr100616_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/pr100616_b.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 80467c19254..ea616f0e686 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -18230,6 +18230,8 @@ maybe_register_incomplete_var (tree var)
  vec_safe_push (incomplete_vars, iv);
}
   else if (!(DECL_LANG_SPECIFIC (var) && DECL_TEMPLATE_INFO (var))
+  && DECL_CLASS_SCOPE_P (var)
+  && currently_open_class (DECL_CONTEXT (var))
   && decl_constant_var_p (var)
   && (TYPE_PTRMEM_P (inner_type) || CLASS_TYPE_P (inner_type)))
{
diff --git a/gcc/testsuite/g++.dg/modules/pr100616_a.C 
b/gcc/testsuite/g++.dg/modules/pr100616_a.C
new file mode 100644
index 000..788af2eb533
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr100616_a.C
@@ -0,0 +1,8 @@
+// PR c++/100616
+// { dg-additional-options "-std=c++20 -fmodules-ts" }
+// { dg-module-cmi pr100616 }
+export module pr100616;
+
+template struct C { };
+struct A { };
+C c1;
diff --git a/gcc/testsuite/g++.dg/modules/pr100616_b.C 
b/gcc/testsuite/g++.dg/modules/pr100616_b.C
new file mode 100644
index 000..a37eb08131b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr100616_b.C
@@ -0,0 +1,10 @@
+// PR c++/100616
+// { dg-additional-options "-std=c++20 -fmodules-ts" }
+module pr100616;
+
+C c2;
+
+// FIXME: We don't reuse the artificial VAR_DECL for the class NTTP argument 
A{}
+// from the other translation unit, which causes these types to be different.
+using type = decltype(c1);
+using type = decltype(c2); // { dg-bogus "conflicting" "" { xfail *-*-* } }
-- 
2.38.0.rc1.2.g4b79ee4b0c



RE: [PATCH]middle-end simplify complex if expressions where comparisons are inverse of one another.

2022-09-23 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Friday, September 23, 2022 9:10 AM
> To: Tamar Christina 
> Cc: Andrew Pinski ; nd ; gcc-
> patc...@gcc.gnu.org
> Subject: RE: [PATCH]middle-end simplify complex if expressions where
> comparisons are inverse of one another.
> 
> On Fri, 23 Sep 2022, Tamar Christina wrote:
> 
> > Hello,
> >
> > > where logical_inverted is somewhat contradicting using
> > > zero_one_valued instead of truth_valued_p (I think the former might
> > > not work for vector booleans?).
> > >
> > > In the end I'd prefer zero_one_valued_p but avoiding
> > > inverse_conditions_p would be nice.
> > >
> > > Richard.
> >
> > It's not pretty but I've made it work and added more tests.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Add new rule.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/if-compare_1.c: New test.
> > * gcc.target/aarch64/if-compare_2.c: New test.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd index
> >
> b61ed70e69b881a49177f10f20c1f92712bb8665..39da61bf117a6eb2924fc8a647
> 3f
> > b37ddadd60e9 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -1903,6 +1903,101 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   (if (INTEGRAL_TYPE_P (type))
> >(bit_and @0 @1)))
> >
> > +(for cmp (tcc_comparison)
> > + icmp (inverted_tcc_comparison)
> > + /* Fold (((a < b) & c) | ((a >= b) & d)) into (a < b ? c : d) & 1.
> > +*/  (simplify
> > +  (bit_ior
> > +   (bit_and:c (convert? zero_one_valued_p@0) @2)
> > +   (bit_and:c (convert? zero_one_valued_p@1) @3))
> > +(with {
> > +  enum tree_code c1
> > +   = (TREE_CODE (@0) == SSA_NAME
> > +  ? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@0)) :
> TREE_CODE
> > +(@0));
> > +
> > +  enum tree_code c2
> > +   = (TREE_CODE (@1) == SSA_NAME
> > +  ? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@1)) :
> TREE_CODE (@1));
> > + }
> > +(if (INTEGRAL_TYPE_P (type)
> > +&& c1 == cmp
> > +&& c2 == icmp
> 
> So that doesn't have any advantage over doing
> 
>  (simplify
>   (bit_ior
>(bit_and:c (convert? (cmp@0 @01 @02)) @2)
>(bit_and:c (convert? (icmp@1 @11 @12)) @3)) ...
> 
> I don't remember if that's what we had before.

No, the specific problem has always been applying zero_one_valued_p to the 
right type.
Before it was much shorter because I was using the tree  helper function to get 
the inverses.

But with your suggestion I think I can do zero_one_valued_p on @0 and @1 
instead..

> 
> > +/* The scalar version has to be canonicalized after vectorization
> > +   because it makes unconditional loads conditional ones, which
> > +   means we lose vectorization because the loads may trap.  */
> > +&& canonicalize_math_after_vectorization_p ())
> > + (bit_and (cond @0 @2 @3) { build_one_cst (type); }
> > +
> > + /* Fold ((-(a < b) & c) | (-(a >= b) & d)) into a < b ? c : d.  */
> 
> The comment doesn't match the pattern below?

The pattern in the comment gets rewritten to this form eventually,
so I match it instead.  I can update the comment but I thought the above
made it more clear why these belong together.

> 
> > + (simplify
> > +  (bit_ior
> > +   (cond zero_one_valued_p@0 @2 zerop)
> > +   (cond zero_one_valued_p@1 @3 zerop))
> > +(with {
> > +  enum tree_code c1
> > +   = (TREE_CODE (@0) == SSA_NAME
> > +  ? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@0)) :
> TREE_CODE
> > +(@0));
> > +
> > +  enum tree_code c2
> > +   = (TREE_CODE (@1) == SSA_NAME
> > +  ? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@1)) :
> TREE_CODE (@1));
> > + }
> > +(if (INTEGRAL_TYPE_P (type)
> > +&& c1 == cmp
> > +&& c2 == icmp
> > +/* The scalar version has to be canonicalized after vectorization
> > +   because it makes unconditional loads conditional ones, which
> > +   means we lose vectorization because the loads may trap.  */
> > +&& canonicalize_math_after_vectorization_p ())
> > +(cond @0 @2 @3
> > +
> > + /* Vector Fold (((a < b) & c) | ((a >= b) & d)) into a < b ? c : d.
> > +and ((~(a < b) & c) | (~(a >= b) & d)) into a < b ? c : d.  */
> > +(simplify
> > +  (bit_ior
> > +   (bit_and:c (vec_cond:s @0 @4 @5) @2)
> > +   (bit_and:c (vec_cond:s @1 @4 @5) @3))
> > +(with {
> > +  enum tree_code c1
> > +   = (TREE_CODE (@0) == SSA_NAME
> > +  ? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@0)) :
> TREE_CODE
> > +(@0));
> > +
> > +  enum tree_code c2
> > +   = (TREE_CODE (@1) == SSA_NAME
> > +  ? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@1)) :
> TREE_CODE (@1));
> > + }
> > + (if (c1 == cmp && c2 == icmp)
> > +  (if (integer_zerop (@5))
> > +   (switch
> > +   (if (integer_onep (@4))
> > +(bit_and (vec_cond @0 @2 @3) @4))
> > +   (if (integer_minus_onep (@4))
> > +

RE: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.

2022-09-23 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Friday, September 23, 2022 8:54 AM
> To: Tamar Christina 
> Cc: Richard Sandiford ; Tamar Christina via
> Gcc-patches ; nd ;
> juzhe.zh...@rivai.ai
> Subject: RE: [PATCH]middle-end Add optimized float addsub without
> needing VEC_PERM_EXPR.
> 
> On Fri, 23 Sep 2022, Tamar Christina wrote:
> 
> > Hi,
> >
> > Attached is the respun version of the patch,
> >
> > > >>
> > > >> Wouldn't a target need to re-check if lanes are NaN or denormal
> > > >> if after a SFmode lane operation a DFmode lane operation follows?
> > > >> IIRC that is what usually makes punning "integer" vectors as FP vectors
> costly.
> >
> > I don't believe this is a problem, due to NANs not being a single
> > value and according to the standard the sign bit doesn't change the
> meaning of a NAN.
> >
> > That's why specifically for negates generally no check is performed
> > and it's Assumed that if a value is a NaN going in, it's a NaN coming
> > out, and this Optimization doesn't change that.  Also under fast-math
> > we don't guarantee a stable representation for NaN (or zeros, etc) afaik.
> >
> > So if that is still a concern I could add && !HONORS_NAN () to the
> constraints.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Add fneg/fadd rule.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/simd/addsub_1.c: New test.
> > * gcc.target/aarch64/sve/addsub_1.c: New test.
> >
> > --- inline version of patch ---
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 1bb936fc4010f98f24bb97671350e8432c55b347..2617d56091dfbd41ae49f980ee0af3757f5ec1cf 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -7916,6 +7916,59 @@ and,
> >(simplify (reduc (op @0 VECTOR_CST@1))
> >  (op (reduc:type @0) (reduc:type @1
> >
> > +/* Simplify vector floating point operations of alternating sub/add pairs
> > +   into using an fneg of a wider element type followed by a normal add.
> > +   under IEEE 754 the fneg of the wider type will negate every even entry
> > +   and when doing an add we get a sub of the even and add of every odd
> > +   elements.  */
> > +(simplify
> > + (vec_perm (plus:c @0 @1) (minus @0 @1) VECTOR_CST@2)
> > + (if (!VECTOR_INTEGER_TYPE_P (type) && !BYTES_BIG_ENDIAN)
> 
> shouldn't this be FLOAT_WORDS_BIG_ENDIAN instead?
> 
> I'm still concerned what
> 
>  (neg:V2DF (subreg:V2DF (reg:V4SF) 0))
> 
> means for architectures like RISC-V.  Can one "reformat" FP values in vector
> registers so that two floats overlap a double (and then back)?
> 
> I suppose you rely on target_can_change_mode_class to tell you that.

Indeed, the documentation says:

"This hook returns true if it is possible to bitcast values held in registers 
of class rclass
from mode from to mode to and if doing so preserves the low-order bits that are
common to both modes. The result is only meaningful if rclass has registers 
that can
hold both from and to."

This implies to me that if the bitcast shouldn't be possible the hook should 
reject it.
Of course there's always the case where something is possible, but perhaps not cheap to do.

The specific implementation for RISC-V seems to imply to me that it disallows
any FP conversions. So it seems to be ok.

> 
> 
> > +  (with
> > +   {
> > + /* Build a vector of integers from the tree mask.  */
> > + vec_perm_builder builder;
> > + if (!tree_to_vec_perm_builder (&builder, @2))
> > +   return NULL_TREE;
> > +
> > + /* Create a vec_perm_indices for the integer vector.  */
> > + poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
> > + vec_perm_indices sel (builder, 2, nelts);
> > +   }
> > +   (if (sel.series_p (0, 2, 0, 2))
> > +(with
> > + {
> > +   machine_mode vec_mode = TYPE_MODE (type);
> > +   auto elem_mode = GET_MODE_INNER (vec_mode);
> > +   auto nunits = exact_div (GET_MODE_NUNITS (vec_mode), 2);
> > +   tree stype;
> > +   switch (elem_mode)
> > +{
> > +case E_HFmode:
> > +  stype = float_type_node;
> > +  break;
> > +case E_SFmode:
> > +  stype = double_type_node;
> > +  break;
> > +default:
> > +  return NULL_TREE;
> > +}
> 
> Can't you use GET_MODE_WIDER_MODE and double-check the mode-size
> doubles?  I mean you obviously miss DFmode -> TFmode.

The problem is I need the type, not the mode, and everything, even
build_pointer_type_for_mode, requires the new scalar type.  So I couldn't find
anything to help here, given that there's no inverse relationship between
modes and types.

> 
> > +   tree ntype = build_vector_type (stype, nunits);
> > +   if (!ntype)
> 
> You want to check that the above results in a vector mode.

Does it? Technically you can cast a V2SF to either a V1DF or a DF, can't you?
Both seem equally valid here.

> > +return NULL_TREE;
> > +
> > +   /* The format has to have a simple sign bit.  */
> > 

RE: [PATCH]middle-end simplify complex if expressions where comparisons are inverse of one another.

2022-09-23 Thread Richard Biener via Gcc-patches
On Fri, 23 Sep 2022, Tamar Christina wrote:

> Hello,
> 
> > where logical_inverted is somewhat contradicting using zero_one_valued
> > instead of truth_valued_p (I think the former might not work for vector
> > booleans?).
> > 
> > In the end I'd prefer zero_one_valued_p but avoiding inverse_conditions_p
> > would be nice.
> > 
> > Richard.
> 
> It's not pretty but I've made it work and added more tests.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * match.pd: Add new rule.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/if-compare_1.c: New test.
>   * gcc.target/aarch64/if-compare_2.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> b61ed70e69b881a49177f10f20c1f92712bb8665..39da61bf117a6eb2924fc8a6473fb37ddadd60e9
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1903,6 +1903,101 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type))
>(bit_and @0 @1)))
>  
> +(for cmp (tcc_comparison)
> + icmp (inverted_tcc_comparison)
> + /* Fold (((a < b) & c) | ((a >= b) & d)) into (a < b ? c : d) & 1.  */
> + (simplify
> +  (bit_ior
> +   (bit_and:c (convert? zero_one_valued_p@0) @2)
> +   (bit_and:c (convert? zero_one_valued_p@1) @3))
> +(with {
> +  enum tree_code c1
> + = (TREE_CODE (@0) == SSA_NAME
> +? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@0)) : TREE_CODE (@0));
> +
> +  enum tree_code c2
> + = (TREE_CODE (@1) == SSA_NAME
> +? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@1)) : TREE_CODE (@1));
> + }
> +(if (INTEGRAL_TYPE_P (type)
> +  && c1 == cmp
> +  && c2 == icmp

So that doesn't have any advantage over doing

 (simplify
  (bit_ior
   (bit_and:c (convert? (cmp@0 @01 @02)) @2)
   (bit_and:c (convert? (icmp@1 @11 @12)) @3))
...

I don't remember if that's what we had before.

> +  /* The scalar version has to be canonicalized after vectorization
> + because it makes unconditional loads conditional ones, which
> + means we lose vectorization because the loads may trap.  */
> +  && canonicalize_math_after_vectorization_p ())
> + (bit_and (cond @0 @2 @3) { build_one_cst (type); }
> +
> + /* Fold ((-(a < b) & c) | (-(a >= b) & d)) into a < b ? c : d.  */

The comment doesn't match the pattern below?

> + (simplify
> +  (bit_ior
> +   (cond zero_one_valued_p@0 @2 zerop)
> +   (cond zero_one_valued_p@1 @3 zerop))
> +(with {
> +  enum tree_code c1
> + = (TREE_CODE (@0) == SSA_NAME
> +? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@0)) : TREE_CODE (@0));
> +
> +  enum tree_code c2
> + = (TREE_CODE (@1) == SSA_NAME
> +? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@1)) : TREE_CODE (@1));
> + }
> +(if (INTEGRAL_TYPE_P (type)
> +  && c1 == cmp
> +  && c2 == icmp
> +  /* The scalar version has to be canonicalized after vectorization
> + because it makes unconditional loads conditional ones, which
> + means we lose vectorization because the loads may trap.  */
> +  && canonicalize_math_after_vectorization_p ())
> +(cond @0 @2 @3
> +
> + /* Vector Fold (((a < b) & c) | ((a >= b) & d)) into a < b ? c : d. 
> +and ((~(a < b) & c) | (~(a >= b) & d)) into a < b ? c : d.  */
> + (simplify
> +  (bit_ior
> +   (bit_and:c (vec_cond:s @0 @4 @5) @2)
> +   (bit_and:c (vec_cond:s @1 @4 @5) @3))
> +(with {
> +  enum tree_code c1
> + = (TREE_CODE (@0) == SSA_NAME
> +? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@0)) : TREE_CODE (@0));
> +
> +  enum tree_code c2
> + = (TREE_CODE (@1) == SSA_NAME
> +? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@1)) : TREE_CODE (@1));
> + }
> + (if (c1 == cmp && c2 == icmp)
> +  (if (integer_zerop (@5))
> +   (switch
> + (if (integer_onep (@4))
> +  (bit_and (vec_cond @0 @2 @3) @4))
> + (if (integer_minus_onep (@4))
> +  (vec_cond @0 @2 @3)))
> +  (if (integer_zerop (@4))
> +   (switch
> + (if (integer_onep (@5))
> +  (bit_and (vec_cond @0 @3 @2) @5))
> + (if (integer_minus_onep (@5))
> +  (vec_cond @0 @3 @2
> +
> + /* Scalar Vectorized Fold ((-(a < b) & c) | (-(a >= b) & d))
> +into a < b ? d : c.  */
> + (simplify
> +  (bit_ior
> +   (vec_cond:s @0 @2 integer_zerop)
> +   (vec_cond:s @1 @3 integer_zerop))
> +(with {
> +  enum tree_code c1
> + = (TREE_CODE (@0) == SSA_NAME
> +? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@0)) : TREE_CODE (@0));
> +
> +  enum tree_code c2
> + = (TREE_CODE (@1) == SSA_NAME
> +? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@1)) : TREE_CODE (@1));
> + }
> + (if (c1 == cmp && c2 == icmp)
> +  (vec_cond @0 @2 @3)
> +

As you say, it's not pretty.  When looking at

int zoo1 (int a, int b, int c, int d)
{  
   return ((a < b) & c) | ((a >= b) 

Re: RE: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.

2022-09-23 Thread 钟居哲
So far I didn't see the case that V2DF <-> V4SF in RISC-V. 



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2022-09-23 20:54
To: Tamar Christina
CC: Richard Sandiford; Tamar Christina via Gcc-patches; nd; juzhe.zhong
Subject: RE: [PATCH]middle-end Add optimized float addsub without needing 
VEC_PERM_EXPR.
On Fri, 23 Sep 2022, Tamar Christina wrote:
 
> Hi,
> 
> Attached is the respun version of the patch,
> 
> > >>
> > >> Wouldn't a target need to re-check if lanes are NaN or denormal if
> > >> after a SFmode lane operation a DFmode lane operation follows?  IIRC
> > >> that is what usually makes punning "integer" vectors as FP vectors 
> > >> costly.
> 
> I don't believe this is a problem, due to NANs not being a single value and
> according to the standard the sign bit doesn't change the meaning of a NAN.
> 
> That's why specifically for negates generally no check is performed and it's
> Assumed that if a value is a NaN going in, it's a NaN coming out, and this
> Optimization doesn't change that.  Also under fast-math we don't guarantee
> a stable representation for NaN (or zeros, etc) afaik.
> 
> So if that is still a concern I could add && !HONORS_NAN () to the 
> constraints.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> * match.pd: Add fneg/fadd rule.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/aarch64/simd/addsub_1.c: New test.
> * gcc.target/aarch64/sve/addsub_1.c: New test.
> 
> --- inline version of patch ---
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> 1bb936fc4010f98f24bb97671350e8432c55b347..2617d56091dfbd41ae49f980ee0af3757f5ec1cf
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7916,6 +7916,59 @@ and,
>(simplify (reduc (op @0 VECTOR_CST@1))
>  (op (reduc:type @0) (reduc:type @1
>  
> +/* Simplify vector floating point operations of alternating sub/add pairs
> +   into using an fneg of a wider element type followed by a normal add.
> +   under IEEE 754 the fneg of the wider type will negate every even entry
> +   and when doing an add we get a sub of the even and add of every odd
> +   elements.  */
> +(simplify
> + (vec_perm (plus:c @0 @1) (minus @0 @1) VECTOR_CST@2)
> + (if (!VECTOR_INTEGER_TYPE_P (type) && !BYTES_BIG_ENDIAN)
 
shouldn't this be FLOAT_WORDS_BIG_ENDIAN instead?
 
I'm still concerned what
 
(neg:V2DF (subreg:V2DF (reg:V4SF) 0))
 
means for architectures like RISC-V.  Can one "reformat" FP values
in vector registers so that two floats overlap a double
(and then back)?
 
I suppose you rely on target_can_change_mode_class to tell you that.
 
 
> +  (with
> +   {
> + /* Build a vector of integers from the tree mask.  */
> + vec_perm_builder builder;
> + if (!tree_to_vec_perm_builder (&builder, @2))
> +   return NULL_TREE;
> +
> + /* Create a vec_perm_indices for the integer vector.  */
> + poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
> + vec_perm_indices sel (builder, 2, nelts);
> +   }
> +   (if (sel.series_p (0, 2, 0, 2))
> +(with
> + {
> +   machine_mode vec_mode = TYPE_MODE (type);
> +   auto elem_mode = GET_MODE_INNER (vec_mode);
> +   auto nunits = exact_div (GET_MODE_NUNITS (vec_mode), 2);
> +   tree stype;
> +   switch (elem_mode)
> + {
> + case E_HFmode:
> +stype = float_type_node;
> +break;
> + case E_SFmode:
> +stype = double_type_node;
> +break;
> + default:
> +return NULL_TREE;
> + }
 
Can't you use GET_MODE_WIDER_MODE and double-check the
mode-size doubles?  I mean you obviously miss DFmode -> TFmode.
 
> +   tree ntype = build_vector_type (stype, nunits);
> +   if (!ntype)
 
You want to check that the above results in a vector mode.
 
> + return NULL_TREE;
> +
> +   /* The format has to have a simple sign bit.  */
> +   const struct real_format *fmt = FLOAT_MODE_FORMAT (vec_mode);
> +   if (fmt == NULL)
> + return NULL_TREE;
> + }
> + (if (fmt->signbit_rw == GET_MODE_UNIT_BITSIZE (vec_mode) - 1
 
shouldn't this be a check on the component mode?  I think you'd
want to check that the bigger format signbit_rw is equal to
the smaller format mode size plus its signbit_rw or so?
 
> +   && fmt->signbit_rw == fmt->signbit_ro
> +   && targetm.can_change_mode_class (TYPE_MODE (ntype), TYPE_MODE (type), 
> ALL_REGS)
> +   && (optimize_vectors_before_lowering_p ()
> +   || target_supports_op_p (ntype, NEGATE_EXPR, optab_vector)))
> +  (plus (view_convert:type (negate (view_convert:ntype @1))) @0)))
> +
>  (simplify
>   (vec_perm @0 @1 VECTOR_CST@2)
>   (with
> diff --git a/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c 
> b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> new file mode 100644
> index 
> ..1fb91a34c421bbd2894faa0dbbf1b47ad43310c4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> @@ -0,0 +1,56 @@
> +/* { dg-do compile } */
> +/* { 

Re: [PATCH] tree-object-size: Support strndup and strdup

2022-09-23 Thread Jakub Jelinek via Gcc-patches
On Thu, Sep 22, 2022 at 11:26:29AM -0400, Siddhesh Poyarekar wrote:
> On 2022-09-22 09:02, Jakub Jelinek wrote:
> > On Mon, Aug 15, 2022 at 03:23:11PM -0400, Siddhesh Poyarekar wrote:
> > > --- a/gcc/tree-object-size.cc
> > > +++ b/gcc/tree-object-size.cc
> > > @@ -495,6 +495,18 @@ decl_init_size (tree decl, bool min)
> > > return size;
> > >   }
> > > +/* Get the outermost object that PTR may point into.  */
> > > +
> > > +static tree
> > > +get_whole_object (const_tree ptr)
> > > +{
> > > +  tree pt_var = TREE_OPERAND (ptr, 0);
> > > +  while (handled_component_p (pt_var))
> > > +pt_var = TREE_OPERAND (pt_var, 0);
> > > +
> > > +  return pt_var;
> > > +}
> > 
> > Not sure why you want a new function for this.
> > This is essentially get_base_address (TREE_OPERAND (ptr, 0)).
> 
> Oh, so can addr_object_size be simplified to use get_base_address too?

You can try.  As you can see in get_base_address, that function
handles something that the above doesn't (looking through some MEM_REFs too).

> > Even if c_strlen (src, 1) is constant, I don't see what you can assume
> > for object size of strndup ("abcd\0efgh", n); for minimum, except 1.
> 
> Can't we assume MIN(5, n) for STRING_CST?

If you mean MIN(5, n + 1), for c_strlen constant yes, but say if you have
strndup (&"abcd\0efgh"[i], n); you can't just from seeing a base address
being a STRING_CST with certain length assume anything than 1.

> For ARRAY_REFs, it may end up being MIN(array_size, n) and not account for

No, for non-OST_MINIMUM the array size of objects (or string literals)
containing the strings is relevant and you can indeed use MIN(__b{d}os (src), n + 1)
as the maximum.  But for the minimum, the object size is irrelevant: you don't
know where in the string the '\0's are and they could appear anywhere
(unless you do string length range analysis).  With c_strlen on src
returning a constant you know the exact string length and so you can use
MIN (c_strlen (src, 1) + 1, n + 1) as both minimum and maximum, but in all
other cases, 1 is the safe answer.

Jakub



[COMMITTED] frange: Make the setter taking trees a wrapper.

2022-09-23 Thread Aldy Hernandez via Gcc-patches
The frange setter does all its work in trees.  This incurs a penalty
for the real_value variants because they must wrap their arguments
into a tree and pass it to the tree setter, which will then do the
opposite.  This is leftovers from the irange setter.

Even though the we still need constructors taking trees so we can
interact with the tree world, there's no sense penalizing the rest of
the implementation.

Tested on x86-64 Linux.

gcc/ChangeLog:

* value-range.cc (frange::set): Swap setters such that the one
accepting REAL_VALUE_TYPE does all the work.
---
 gcc/value-range.cc | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 43905ba4901..9ca442478c9 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -290,7 +290,9 @@ frange::flush_denormals_to_zero ()
 // Setter for franges.
 
 void
-frange::set (tree min, tree max, value_range_kind kind)
+frange::set (tree type,
+             const REAL_VALUE_TYPE &min, const REAL_VALUE_TYPE &max,
+value_range_kind kind)
 {
   switch (kind)
 {
@@ -299,7 +301,7 @@ frange::set (tree min, tree max, value_range_kind kind)
   return;
 case VR_VARYING:
 case VR_ANTI_RANGE:
-  set_varying (TREE_TYPE (min));
+  set_varying (type);
   return;
 case VR_RANGE:
   break;
@@ -308,14 +310,12 @@ frange::set (tree min, tree max, value_range_kind kind)
 }
 
   // Handle NANs.
-  if (real_isnan (TREE_REAL_CST_PTR (min)) || real_isnan (TREE_REAL_CST_PTR 
(max)))
+  if (real_isnan () || real_isnan ())
 {
-  gcc_checking_assert (real_identical (TREE_REAL_CST_PTR (min),
-  TREE_REAL_CST_PTR (max)));
-  tree type = TREE_TYPE (min);
+  gcc_checking_assert (real_identical (&min, &max));
   if (HONOR_NANS (type))
{
- bool sign = real_isneg (TREE_REAL_CST_PTR (min));
+ bool sign = real_isneg (&min);
  set_nan (type, sign);
}
   else
@@ -324,9 +324,9 @@ frange::set (tree min, tree max, value_range_kind kind)
 }
 
   m_kind = kind;
-  m_type = TREE_TYPE (min);
-  m_min = *TREE_REAL_CST_PTR (min);
-  m_max = *TREE_REAL_CST_PTR (max);
+  m_type = type;
+  m_min = min;
+  m_max = max;
   if (HONOR_NANS (m_type))
 {
   m_pos_nan = true;
@@ -351,7 +351,7 @@ frange::set (tree min, tree max, value_range_kind kind)
 }
 
   // Check for swapped ranges.
-  gcc_checking_assert (tree_compare (LE_EXPR, min, max));
+  gcc_checking_assert (real_compare (LE_EXPR, &min, &max));
 
   normalize_kind ();
 
@@ -361,14 +361,11 @@ frange::set (tree min, tree max, value_range_kind kind)
 verify_range ();
 }
 
-// Setter for frange from REAL_VALUE_TYPE endpoints.
-
 void
-frange::set (tree type,
-             const REAL_VALUE_TYPE &min, const REAL_VALUE_TYPE &max,
-value_range_kind kind)
+frange::set (tree min, tree max, value_range_kind kind)
 {
-  set (build_real (type, min), build_real (type, max), kind);
+  set (TREE_TYPE (min),
+   *TREE_REAL_CST_PTR (min), *TREE_REAL_CST_PTR (max), kind);
 }
 
 // Normalize range to VARYING or UNDEFINED, or vice versa.  Return
-- 
2.37.1



RE: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.

2022-09-23 Thread Richard Biener via Gcc-patches
On Fri, 23 Sep 2022, Tamar Christina wrote:

> Hi,
> 
> Attached is the respun version of the patch,
> 
> > >>
> > >> Wouldn't a target need to re-check if lanes are NaN or denormal if
> > >> after a SFmode lane operation a DFmode lane operation follows?  IIRC
> > >> that is what usually makes punning "integer" vectors as FP vectors 
> > >> costly.
> 
> I don't believe this is a problem, due to NANs not being a single value and
> according to the standard the sign bit doesn't change the meaning of a NAN.
> 
> That's why specifically for negates generally no check is performed and it's
> Assumed that if a value is a NaN going in, it's a NaN coming out, and this
> Optimization doesn't change that.  Also under fast-math we don't guarantee
> a stable representation for NaN (or zeros, etc) afaik.
> 
> So if that is still a concern I could add && !HONORS_NAN () to the 
> constraints.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * match.pd: Add fneg/fadd rule.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/simd/addsub_1.c: New test.
>   * gcc.target/aarch64/sve/addsub_1.c: New test.
> 
> --- inline version of patch ---
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> 1bb936fc4010f98f24bb97671350e8432c55b347..2617d56091dfbd41ae49f980ee0af3757f5ec1cf
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7916,6 +7916,59 @@ and,
>(simplify (reduc (op @0 VECTOR_CST@1))
>  (op (reduc:type @0) (reduc:type @1
>  
> +/* Simplify vector floating point operations of alternating sub/add pairs
> +   into using an fneg of a wider element type followed by a normal add.
> +   under IEEE 754 the fneg of the wider type will negate every even entry
> +   and when doing an add we get a sub of the even and add of every odd
> +   elements.  */
> +(simplify
> + (vec_perm (plus:c @0 @1) (minus @0 @1) VECTOR_CST@2)
> + (if (!VECTOR_INTEGER_TYPE_P (type) && !BYTES_BIG_ENDIAN)

shouldn't this be FLOAT_WORDS_BIG_ENDIAN instead?

I'm still concerned what

 (neg:V2DF (subreg:V2DF (reg:V4SF) 0))

means for architectures like RISC-V.  Can one "reformat" FP values
in vector registers so that two floats overlap a double
(and then back)?

I suppose you rely on target_can_change_mode_class to tell you that.


> +  (with
> +   {
> + /* Build a vector of integers from the tree mask.  */
> + vec_perm_builder builder;
> + if (!tree_to_vec_perm_builder (&builder, @2))
> +   return NULL_TREE;
> +
> + /* Create a vec_perm_indices for the integer vector.  */
> + poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
> + vec_perm_indices sel (builder, 2, nelts);
> +   }
> +   (if (sel.series_p (0, 2, 0, 2))
> +(with
> + {
> +   machine_mode vec_mode = TYPE_MODE (type);
> +   auto elem_mode = GET_MODE_INNER (vec_mode);
> +   auto nunits = exact_div (GET_MODE_NUNITS (vec_mode), 2);
> +   tree stype;
> +   switch (elem_mode)
> +  {
> +  case E_HFmode:
> +stype = float_type_node;
> +break;
> +  case E_SFmode:
> +stype = double_type_node;
> +break;
> +  default:
> +return NULL_TREE;
> +  }

Can't you use GET_MODE_WIDER_MODE and double-check the
mode-size doubles?  I mean you obviously miss DFmode -> TFmode.

> +   tree ntype = build_vector_type (stype, nunits);
> +   if (!ntype)

You want to check that the above results in a vector mode.

> +  return NULL_TREE;
> +
> +   /* The format has to have a simple sign bit.  */
> +   const struct real_format *fmt = FLOAT_MODE_FORMAT (vec_mode);
> +   if (fmt == NULL)
> +  return NULL_TREE;
> + }
> + (if (fmt->signbit_rw == GET_MODE_UNIT_BITSIZE (vec_mode) - 1

shouldn't this be a check on the component mode?  I think you'd
want to check that the bigger format signbit_rw is equal to
the smaller format mode size plus its signbit_rw or so?

> +   && fmt->signbit_rw == fmt->signbit_ro
> +   && targetm.can_change_mode_class (TYPE_MODE (ntype), TYPE_MODE 
> (type), ALL_REGS)
> +   && (optimize_vectors_before_lowering_p ()
> +   || target_supports_op_p (ntype, NEGATE_EXPR, optab_vector)))
> +  (plus (view_convert:type (negate (view_convert:ntype @1))) @0)))
> +
>  (simplify
>   (vec_perm @0 @1 VECTOR_CST@2)
>   (with
> diff --git a/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c 
> b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> new file mode 100644
> index 
> ..1fb91a34c421bbd2894faa0dbbf1b47ad43310c4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> @@ -0,0 +1,56 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
> +/* { dg-options "-Ofast" } */
> +/* { dg-add-options arm_v8_2a_fp16_neon } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +#pragma GCC target "+nosve"
> +
> +/* 

Re: [PATCH] MIPS: fix building on multiarch platform

2022-09-23 Thread YunQiang Su via Gcc-patches
Xi Ruoyao via Gcc-patches wrote on Wednesday, September 21, 2022 at 23:09:
>
> On Wed, 2022-09-21 at 11:31 +, YunQiang Su wrote:
> > On platforms that support multiarch, such as Debian,
> > the filesystem hierarchy doesn't follow the old Irix style:
> > lib & lib/ for native
> > lib64 for N64 on N32/O32 systems
> > lib32 for N32 on N64/O32 systems
> > libo32 for O32 on N64/N32 systems
> >
> > Thus we cannot
> >  #define STANDARD_STARTFILE_PREFIX_1
> >  #define STANDARD_STARTFILE_PREFIX_2
> > on N32 or N64 systems, else collect2 won't look for libraries
> > on /lib/.
> >
> > gcc/ChangeLog:
> > * configure.ac: AC_DEFINE(ENABLE_MULTIARCH, 1)
> > * configure: Regenerated.
> > * config.in: Regenerated.
> > * config/mips/mips.h: don't define STANDARD_STARTFILE_PREFIX_1
> >   if ENABLE_MULTIARCH is defined.
> > * config/mips/t-linux64: define correct multiarch path when
> >   multiarch is enabled.
> > ---
> >  gcc/config.in |  6 ++
> >  gcc/config/mips/mips.h|  2 ++
> >  gcc/config/mips/t-linux64 | 21 -
> >  gcc/configure |  4 
> >  gcc/configure.ac  |  3 +++
> >  5 files changed, 35 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config.in b/gcc/config.in
> > index 6ac17be189e..b2ce6361327 100644
> > --- a/gcc/config.in
> > +++ b/gcc/config.in
> > @@ -2312,6 +2312,12 @@
> >  #endif
> >
> >
> > +/* Specify if multiarch is enabled. */
> > +#ifndef USED_FOR_TARGET
> > +#undef ENABLE_MULTIARCH
> > +#endif
> > +
> > +
> >  /* The size of `dev_t', as computed by sizeof. */
> >  #ifndef USED_FOR_TARGET
> >  #undef SIZEOF_DEV_T
> > diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
> > index 74b6e11aabb..fe7f5b274b9 100644
> > --- a/gcc/config/mips/mips.h
> > +++ b/gcc/config/mips/mips.h
> > @@ -3427,6 +3427,7 @@ struct GTY(())  machine_function {
> >
> >  /* If we are *not* using multilibs and the default ABI is not ABI_32
> > we
> > need to change these from /lib and /usr/lib.  */
> > +#ifndef ENABLE_MULTIARCH
> >  #if MIPS_ABI_DEFAULT == ABI_N32
> >  #define STANDARD_STARTFILE_PREFIX_1 "/lib32/"
> >  #define STANDARD_STARTFILE_PREFIX_2 "/usr/lib32/"
> > @@ -3434,6 +3435,7 @@ struct GTY(())  machine_function {
> >  #define STANDARD_STARTFILE_PREFIX_1 "/lib64/"
> >  #define STANDARD_STARTFILE_PREFIX_2 "/usr/lib64/"
> >  #endif
> > +#endif
>
> Should we just remove STANDARD_STARTFILE_PREFIX_{1,2} unconditionally?
> I just took a look and the only Linux ports using these macros are MIPS
> and LoongArch (borrowed these macros from MIPS, I guess).  On a non-
> multilib distro /usr/lib is likely used, and on multilib distros the
> macros are not used anyway.
>

Yes. As Maciej pointed out, the old Irix-style filesystem hierarchy for MIPS
is quite quirky.
It makes lots of trouble for distribution makers supporting multilib for
mips64, since:

On non-multilib distributions, like OpenWrt for mips64, lib is used,
and lib64 is a symlink to lib.
On multilib-enabled ports, like Fedora, the old Irix style is used.

So, we should keep the code.

> >  /* Load store bonding is not supported by micromips and fix_24k.  The
> > performance can be degraded for those targets.  Hence, do not bond for
> > diff --git a/gcc/config/mips/t-linux64 b/gcc/config/mips/t-linux64
> > index 2fdd8e00407..37d176ea309 100644
> > --- a/gcc/config/mips/t-linux64
> > +++ b/gcc/config/mips/t-linux64
> > @@ -20,7 +20,26 @@ MULTILIB_OPTIONS = mabi=n32/mabi=32/mabi=64
> >  MULTILIB_DIRNAMES = n32 32 64
> >  MIPS_EL = $(if $(filter %el, $(firstword $(subst -, ,$(target,el)
> >  MIPS_SOFT = $(if $(strip $(filter MASK_SOFT_FLOAT_ABI, 
> > $(target_cpu_default)) $(filter soft, $(with_float))),soft)
> > -MULTILIB_OSDIRNAMES = \
> > +ifeq (yes,$(enable_multiarch))
> > +  ifneq (,$(findstring gnuabi64,$(target)))
> > +MULTILIB_OSDIRNAMES = \
> > +   ../lib32$(call if_multiarch,:mips64$(MIPS_EL)-linux-gnuabin32$(MIPS_SOFT)) \
> > +   ../libo32$(call if_multiarch,:mips$(MIPS_EL)-linux-gnu$(MIPS_SOFT)) \
> > +   ../lib$(call if_multiarch,:mips64$(MIPS_EL)-linux-gnuabi64$(MIPS_SOFT))
> > +  else ifneq (,$(findstring gnuabin32,$(target)))
> > +MULTILIB_OSDIRNAMES = \
> > +   ../lib$(call if_multiarch,:mips64$(MIPS_EL)-linux-gnuabin32$(MIPS_SOFT)) \
> > +   ../libo32$(call if_multiarch,:mips$(MIPS_EL)-linux-gnu$(MIPS_SOFT)) \
> > +   ../lib64$(call if_multiarch,:mips64$(MIPS_EL)-linux-gnuabi64$(MIPS_SOFT))
> > +  else
> > +MULTILIB_OSDIRNAMES = \
> > +   ../lib32$(call if_multiarch,:mips64$(MIPS_EL)-linux-gnuabin32$(MIPS_SOFT)) \
> > +   ../lib$(call if_multiarch,:mips$(MIPS_EL)-linux-gnu$(MIPS_SOFT)) \
> > +   ../lib64$(call if_multiarch,:mips64$(MIPS_EL)-linux-gnuabi64$(MIPS_SOFT))
> > +  endif
> > +else
> > +  MULTILIB_OSDIRNAMES = \
> > ../lib32$(call 
> > 

[PATCH] tree-optimization/106922 - extend same-val clobber FRE

2022-09-23 Thread Richard Biener via Gcc-patches
The following extends the skipping of same valued stores to
handle an arbitrary number of them as long as they are from the
same value (which we now record).  That's an obvious extension
which allows to optimize the m_engaged member of std::optional
more reliably.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

PR tree-optimization/106922
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Allow
an arbitrary number of same valued skipped stores.

* g++.dg/torture/pr106922.C: New testcase.
---
 gcc/testsuite/g++.dg/torture/pr106922.C | 48 +
 gcc/tree-ssa-sccvn.cc   | 10 --
 2 files changed, 55 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr106922.C

diff --git a/gcc/testsuite/g++.dg/torture/pr106922.C 
b/gcc/testsuite/g++.dg/torture/pr106922.C
new file mode 100644
index 000..046fc6cce76
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr106922.C
@@ -0,0 +1,48 @@
+// { dg-do compile }
+// { dg-require-effective-target c++17 }
+// { dg-additional-options "-Wall" }
+// -O1 doesn't iterate VN and thus has bogus uninit diagnostics
+// { dg-skip-if "" { *-*-* } { "-O1" } { "" } }
+
+#include 
+
+#include <optional>
+template <typename T>
+using Optional = std::optional<T>;
+
+#include 
+
+struct MyOptionalStructWithInt {
+int myint; /* works without this */
+Optional> myoptional;
+};
+
+struct MyOptionalsStruct {
+MyOptionalStructWithInt external1;
+MyOptionalStructWithInt external2;
+};
+
+struct MyStruct { };
+std::ostream &operator<< (std::ostream &, const MyStruct &);
+
+std::vector<MyStruct> getMyStructs();
+
+void test()
+{
+MyOptionalsStruct externals;
+MyOptionalStructWithInt internal1;
+MyOptionalStructWithInt internal2;
+
+std::vector<MyStruct> myStructs;
+myStructs = getMyStructs();
+
+for (const auto& myStruct : myStructs)
+{
+std::stringstream address_stream;
+address_stream << myStruct;
+internal1.myint = internal2.myint = 0;
+externals.external1 = internal1;
+externals.external2 = internal2;
+externals.external2 = internal2;
+}
+}
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 9c12a8e4f03..2cc2c0e1e34 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -2680,7 +2680,6 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
   if (is_gimple_reg_type (TREE_TYPE (lhs))
  && types_compatible_p (TREE_TYPE (lhs), vr->type)
  && (ref->ref || data->orig_ref.ref)
- && !data->same_val
  && !data->mask
  && data->partial_defs.is_empty ()
  && multiple_p (get_object_alignment
@@ -2693,8 +2692,13 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *data_,
 a different loop iteration but only to loop invariants.  Use
 CONSTANT_CLASS_P (unvalueized!) as conservative approximation.
 The one-hop lookup below doesn't have this issue since there's
-a virtual PHI before we ever reach a backedge to cross.  */
- if (CONSTANT_CLASS_P (rhs))
+a virtual PHI before we ever reach a backedge to cross.
+We can skip multiple defs as long as they are from the same
+value though.  */
+ if (data->same_val
+ && !operand_equal_p (data->same_val, rhs))
+   ;
+ else if (CONSTANT_CLASS_P (rhs))
{
  if (dump_file && (dump_flags & TDF_DETAILS))
{
-- 
2.35.3


Re: [PATCH v3 06/11] OpenMP: Pointers and member mappings

2022-09-23 Thread Tobias Burnus

Hi Julian and Jakub, hi all,

On 23.09.22 09:29, Julian Brown wrote:

How about this version? (Re-tested.)

[...]

* * *

Some more generic (pre)remarks – not affecting the patch code,
but possibly the commit log message:


This follows OMP 5.0, 2.19.7.1 "map Clause":

which is also in "OMP 5.2, 5.8.3 map Clause [152:1-4]". It might
make sense to add this ref in addition (or instead):


 "If a list item in a map clause is an associated pointer and the
  pointer is not the base pointer of another list item in a map clause
  on the same construct, then it is treated as if its pointer target
  is implicitly mapped in the same clause. For the purposes of the map
  clause, the mapped pointer target is treated as if its base pointer
  is the associated pointer."


Pre-remark 1: Issue with the wording in the 5.x spec. Namely, assume

integer, pointer :: p(:)
! allocate(p(1024)); deallocate(p) ! <- (un)commented line or not

!$omp target map(p)
 p => null()
!$omp end target

Here, 'p' is neither associated nor unassociated (NULL pointer),
but it is undefined. However, the 5.x spec does require an implicit mapping of
the associated pointer target – but the compiler has no idea whether
the pointer address is valid (associated) or dangling (undefined) - and
the p.data address might be either invalid to access and/or the p.dim[...]
garbage data could yield a size(p) that is huge.


Thus, the following restriction was proposed for OpenMP 6.0 (TR11):

"The association status of a list item that is a pointer must not be
undefined unless it is a structure component and it results from a
predefined default mapper."

which makes my example invalid. (Add some caveat here about TR11 not
yet being released and also TRs being not final named-version releases.)

This also affects the quote you show, which now reads (2nd line is new):

"If a list item in a map clause is an associated pointer
that is not attach-ineligible
and the pointer is not the base pointer [...]".

with 'attach-ineligible' being defined in the previous bullet list
(i.e. in the preceding paragraph (>= 5.1) about derived-type components;
first bullet point there: pointer implies attach-ineligible).

I think the 5.x wording has issues, but on the other hand, the wording
above is not in an OpenMP release and not even in a TR. As the spirit
has not changed, it probably makes sense to keep the 5.0 (5.x) wording.

Cf. https://github.com/OpenMP/spec/pull/3319/files and
https://github.com/OpenMP/spec/issues/3290
(And apologies to non-OpenMP members as those are non-publicly accessible.)

* * *



However, we can also write this:
 map(to: tvar%arrptr) map(tofrom: tvar%arrptr(3:8))

and then instead we should follow:

 "If the structure sibling list item is a pointer then it is treated
  as if its association status is undefined, unless it appears as
  the base pointer of another list item in a map clause on the same
  construct."

This wording disappeared in 5.1 due to some cleanup (cf. Issue 2152, which has
multiple changes; this one is Pull Req. 2379). I think the matching current /
OpenMP 5.2 wording (5.8.3 map Clause [152:5-8, 11-13 (,14-16)]) is:

 "For map clauses on map-entering constructs, if any list item has a base
  pointer for which a corresponding pointer exists in the data environment
  upon entry to the region and either a new list item or the corresponding
  pointer is created in the device data environment on entry to the region,
  then: (Fortran) 1. The corresponding pointer variable is associated with a
  pointer target that has the same rank and bounds as the pointer target of
  the original pointer, such that the corresponding list item can be accessed
  through the pointer in a target region. ..."

I think here 'a new list item ... is created ... on entry' applies. However,
this should not affect what you wrote later on. Still, I wonder whether the
5.2 wording should be referenced instead of the 5.0 one. (I think it makes
sense to also read 5.2 when implementing this for bug-fix changes (and
trivial feature changes), but not for larger-effort changes, which can be
implemented later on.)

But, that's not implemented quite right at the moment [...]
The solution is to detect when we're mapping a smaller part of the array
(or a subcomponent) on the same directive, and only map the descriptor
in that case. So we get mappings like this instead:

 map(to: tvar%arrptr)   -->
 GOMP_MAP_ALLOC  tvar%arrptr  (the descriptor)

 map(tofrom: tvar%arrptr(3:8))   -->
 GOMP_MAP_TOFROM tvar%arrptr%data(3) (size 8-3+1, etc.)
 GOMP_MAP_ALWAYS_POINTER tvar%arrptr%data (bias 3, etc.)

(I concur.)

* * *

--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
...
@@ -2470,22 +2471,18 @@ gfc_trans_omp_array_section (stmtblock_t *block, 
gfc_omp_namelist *n,
}
  if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl)))
{
...
+  if (ptr_kind != GOMP_MAP_ALWAYS_POINTER)
   {
...
+ /* For OpenMP, the descriptor must be mapped with its own explicit
+map clause (e.g. both "map(foo%arr)" and 

[PATCH] testsuite: Verify that module-mapper is available

2022-09-23 Thread Torbjörn SVENSSON via Gcc-patches
For some test cases, it's required that the optional module mapper
"g++-mapper-server" is built. As building the server is optional, those
test cases will fail when it can't be found.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_is_prog_name_available):
New.
* lib/target-supports-dg.exp
(dg-require-prog-name-available): New.
* g++.dg/modules/modules.exp: Verify availability of module
mapper.

Signed-off-by: Torbjörn SVENSSON  
---
 gcc/testsuite/g++.dg/modules/modules.exp | 31 
 gcc/testsuite/lib/target-supports-dg.exp | 15 
 gcc/testsuite/lib/target-supports.exp| 15 
 3 files changed, 61 insertions(+)

diff --git a/gcc/testsuite/g++.dg/modules/modules.exp 
b/gcc/testsuite/g++.dg/modules/modules.exp
index afb323d0efd..4784803742a 100644
--- a/gcc/testsuite/g++.dg/modules/modules.exp
+++ b/gcc/testsuite/g++.dg/modules/modules.exp
@@ -279,6 +279,29 @@ proc module-init { src } {
 return $option_list
 }
 
+# Return 1 if requirements are met
+proc module-check-requirements { tests } {
+foreach test $tests {
+   set tmp [dg-get-options $test]
+   foreach op $tmp {
+   switch [lindex $op 0] {
+	    "dg-additional-options" {
+		# Example strings to match:
+		# -fmodules-ts -fmodule-mapper=|@g++-mapper-server\\ -t\\ [srcdir]/inc-xlate-1.map
+		# -fmodules-ts -fmodule-mapper=|@g++-mapper-server
+		if [regexp -- {(^| )-fmodule-mapper=\|@([^\\ ]*)} [lindex $op 2] dummy dummy2 prog] {
+		    verbose "Checking that mapper exists: $prog"
+		    if { ![ check_is_prog_name_available $prog ] } {
+			return 0
+		    }
+		}
+	    }
+   }
+   }
+}
+return 1
+}
+
 # cleanup any detritus from previous run
 cleanup_module_files [find $DEFAULT_REPO *.gcm]
 
@@ -307,6 +330,14 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX]}]] {
	set tests [lsort [find [file dirname $src] \
			      [regsub {_a.[CHX]$} [file tail $src] {_[a-z].[CHX]}]]]
 
+   if { ![module-check-requirements $tests] } {
+   set testcase [regsub {_a.[CH]} $src {}]
+   set testcase \
+   [string range $testcase [string length "$srcdir/"] end]
+   unsupported $testcase
+   continue
+   }
+
set std_list [module-init $src]
foreach std $std_list {
set mod_files {}
diff --git a/gcc/testsuite/lib/target-supports-dg.exp 
b/gcc/testsuite/lib/target-supports-dg.exp
index aa2164bc789..6ce3b2b1a1b 100644
--- a/gcc/testsuite/lib/target-supports-dg.exp
+++ b/gcc/testsuite/lib/target-supports-dg.exp
@@ -683,3 +683,18 @@ proc dg-require-symver { args } {
set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
 }
 }
+
+# If this target does not provide prog named "$args", skip this test.
+
+proc dg-require-prog-name-available { args } {
+# The args are within another list; pull them out.
+set args [lindex $args 0]
+
+set prog [lindex $args 1]
+
+if { ![ check_is_prog_name_available $prog ] } {
+upvar dg-do-what dg-do-what
+set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
+}
+}
+
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 703aba412a6..c3b7a6c17b3 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -11928,3 +11928,18 @@ main:
.byte 0
   } ""]
 }
+ 
+# Return 1 if this target has prog named "$prog", 0 otherwise.
+
+proc check_is_prog_name_available { prog } {
+global tool
+
+set options [list "additional_flags=-print-prog-name=$prog"]
+set output [lindex [${tool}_target_compile "" "" "none" $options] 0]
+
+if { $output == $prog } {
+return 0
+}
+
+return 1
+}
-- 
2.25.1



[committed] libstdc++: Enable constexpr std::bitset for debug mode

2022-09-23 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

As I said in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107015 I think
we should just get rid of __debug::bitset, it is useless except for
C++98, where it's ABI-incompatible with C++11 and later.

-- >8 --

We already disable all debug mode checks for C++11 and later, so we can
easily make everything constexpr. This fixes the FAIL results for the
new tests when using -D_GLIBCXX_DEBUG.

Also fix some other tests failing with non-default test flags.

libstdc++-v3/ChangeLog:

* include/debug/bitset (__debug::bitset): Add constexpr to all
member functions.
(operator&, operator|, operator^): Add inline and constexpr.
(operator>>, operator<<): Add inline.
* testsuite/20_util/bitset/access/constexpr.cc: Only check using
constexpr std::string for the cxx11 ABI.
* testsuite/20_util/bitset/cons/constexpr_c++23.cc: Likewise.
* testsuite/20_util/headers/bitset/synopsis.cc: Check constexpr
for C++23.
---
 libstdc++-v3/include/debug/bitset | 43 ---
 .../20_util/bitset/access/constexpr.cc|  2 +
 .../20_util/bitset/cons/constexpr_c++23.cc|  2 +
 .../20_util/headers/bitset/synopsis.cc|  9 
 4 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/debug/bitset 
b/libstdc++-v3/include/debug/bitset
index 4c0af03c255..9335fe441a3 100644
--- a/libstdc++-v3/include/debug/bitset
+++ b/libstdc++-v3/include/debug/bitset
@@ -141,6 +141,7 @@ namespace __debug
   : _Base(__val) { }
 
      template<typename _CharT, typename _Traits, typename _Alloc>
+   _GLIBCXX23_CONSTEXPR
 explicit
 bitset(const std::basic_string<_CharT, _Traits, _Alloc>& __str,
   typename std::basic_string<_CharT, _Traits, _Alloc>::size_type
@@ -152,6 +153,7 @@ namespace __debug
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 396. what are characters zero and one.
      template<typename _CharT, typename _Traits, typename _Alloc>
+   _GLIBCXX23_CONSTEXPR
bitset(const std::basic_string<_CharT, _Traits, _Alloc>& __str,
   typename std::basic_string<_CharT, _Traits, _Alloc>::size_type
   __pos,
@@ -160,10 +162,12 @@ namespace __debug
   _CharT __zero, _CharT __one = _CharT('1'))
: _Base(__str, __pos, __n, __zero, __one) { }
 
+  _GLIBCXX23_CONSTEXPR
   bitset(const _Base& __x) : _Base(__x) { }
 
 #if __cplusplus >= 201103L
      template<typename _CharT>
+   _GLIBCXX23_CONSTEXPR
 explicit
 bitset(const _CharT* __str,
   typename std::basic_string<_CharT>::size_type __n
@@ -173,6 +177,7 @@ namespace __debug
 #endif
 
   // 23.3.5.2 bitset operations:
+  _GLIBCXX23_CONSTEXPR
   bitset<_Nb>&
   operator&=(const bitset<_Nb>& __rhs) _GLIBCXX_NOEXCEPT
   {
@@ -180,6 +185,7 @@ namespace __debug
return *this;
   }
 
+  _GLIBCXX23_CONSTEXPR
   bitset<_Nb>&
   operator|=(const bitset<_Nb>& __rhs) _GLIBCXX_NOEXCEPT
   {
@@ -187,6 +193,7 @@ namespace __debug
return *this;
   }
 
+  _GLIBCXX23_CONSTEXPR
   bitset<_Nb>&
   operator^=(const bitset<_Nb>& __rhs) _GLIBCXX_NOEXCEPT
   {
@@ -194,6 +201,7 @@ namespace __debug
return *this;
   }
 
+  _GLIBCXX23_CONSTEXPR
   bitset<_Nb>&
   operator<<=(size_t __pos) _GLIBCXX_NOEXCEPT
   {
@@ -201,6 +209,7 @@ namespace __debug
return *this;
   }
 
+  _GLIBCXX23_CONSTEXPR
   bitset<_Nb>&
   operator>>=(size_t __pos) _GLIBCXX_NOEXCEPT
   {
@@ -208,6 +217,7 @@ namespace __debug
return *this;
   }
 
+  _GLIBCXX23_CONSTEXPR
   bitset<_Nb>&
   set() _GLIBCXX_NOEXCEPT
   {
@@ -217,6 +227,7 @@ namespace __debug
 
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 186. bitset::set() second parameter should be bool
+  _GLIBCXX23_CONSTEXPR
   bitset<_Nb>&
   set(size_t __pos, bool __val = true)
   {
@@ -224,6 +235,7 @@ namespace __debug
return *this;
   }
 
+  _GLIBCXX23_CONSTEXPR
   bitset<_Nb>&
   reset() _GLIBCXX_NOEXCEPT
   {
@@ -231,6 +243,7 @@ namespace __debug
return *this;
   }
 
+  _GLIBCXX23_CONSTEXPR
   bitset<_Nb>&
   reset(size_t __pos)
   {
@@ -238,10 +251,12 @@ namespace __debug
return *this;
   }
 
+  _GLIBCXX23_CONSTEXPR
   bitset<_Nb>
   operator~() const _GLIBCXX_NOEXCEPT
   { return bitset(~_M_base()); }
 
+  _GLIBCXX23_CONSTEXPR
   bitset<_Nb>&
   flip() _GLIBCXX_NOEXCEPT
   {
@@ -249,6 +264,7 @@ namespace __debug
return *this;
   }
 
+  _GLIBCXX23_CONSTEXPR
   bitset<_Nb>&
   flip(size_t __pos)
   {
@@ -259,6 +275,7 @@ namespace __debug
   // element access:
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 11. Bitset minor problems
+  _GLIBCXX23_CONSTEXPR
   reference
   operator[](size_t __pos)
   {
@@ -288,6 +305,7 @@ namespace __debug
 #endif
 
   template 
+   _GLIBCXX23_CONSTEXPR
 

Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-09-23 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 20 Sept 2022 at 18:09, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Mon, 12 Sept 2022 at 19:57, Richard Sandiford
> >  wrote:
> >>
> >> Prathamesh Kulkarni  writes:
> >> >> The VLA encoding encodes the first N patterns explicitly.  The
> >> >> npatterns/nelts_per_pattern values then describe how to extend that
> >> >> initial sequence to an arbitrary number of elements.  So when performing
> >> >> an operation on (potentially) variable-length vectors, the questions is:
> >> >>
> >> >> * Can we work out an initial sequence and npatterns/nelts_per_pattern
> >> >>   pair that will be correct for all elements of the result?
> >> >>
> >> >> This depends on the operation that we're performing.  E.g. it's
> >> >> different for unary operations (vector_builder::new_unary_operation)
> >> >> and binary operations (vector_builder::new_binary_operations).  It also
> >> >> varies between unary operations and between binary operations, hence
> >> >> the allow_stepped_p parameters.
> >> >>
> >> >> For VEC_PERM_EXPR, I think the key requirement is that:
> >> >>
> >> >> (R) Each individual selector pattern must always select from the same 
> >> >> vector.
> >> >>
> >> >> Whether this condition is met depends both on the pattern itself and on
> >> >> the number of patterns that it's combined with.
> >> >>
> >> >> E.g. suppose we had the selector pattern:
> >> >>
> >> >>   { 0, 1, 4, ... }   i.e. 3x - 2 for x > 0
> >> >>
> >> >> If the arguments and selector are n elements then this pattern on its
> >> >> own would select from more than one argument if 3(n-1) - 2 >= n.
> >> >> This is clearly true for large enough n.  So if n is variable then
> >> >> we cannot represent this.
> >> >>
> >> >> If the pattern above is one of two patterns, so interleaved as:
> >> >>
> >> >>  { 0, _, 1, _, 4, _, ... }  o=0
> >> >>   or { _, 0, _, 1, _, 4, ... }  o=1
> >> >>
> >> >> then the pattern would select from more than one argument if
> >> >> 3(n/2-1) - 2 + o >= n.  This too would be a problem for variable n.
> >> >>
> >> >> But if the pattern above is one of four patterns then it selects
> >> >> from more than one argument if 3(n/4-1) - 2 + o >= n.  This is not
> >> >> true for any valid n or o, so the pattern is OK.
> >> >>
> >> >> So let's define some ad hoc terminology:
> >> >>
> >> >> * Px is the number of patterns in x
> >> >> * Ex is the number of elements per pattern in x
> >> >>
> >> >> where x can be:
> >> >>
> >> >> * 1: first argument
> >> >> * 2: second argument
> >> >> * s: selector
> >> >> * r: result
> >> >>
> >> >> Then:
> >> >>
> >> >> (1) The number of elements encoded explicitly for x is Ex*Px
> >> >>
> >> >> (2) The explicit encoding can be used to produce a sequence of N*Ex*Px
> >> >> elements for any integer N.  This extended sequence can be reencoded
> >> >> as having N*Px patterns, with Ex staying the same.
> >> >>
> >> >> (3) If Ex < 3, Ex can be increased by 1 by repeating the final Px 
> >> >> elements
> >> >> of the explicit encoding.
> >> >>
> >> >> So let's assume (optimistically) that we can produce the result
> >> >> by calculating the first Pr*Er elements and using the Pr,Er encoding
> >> >> to imply the rest.  Then:
> >> >>
> >> >> * (2) means that, when combining multiple input operands with 
> >> >> potentially
> >> >>   different encodings, we can set the number of patterns in the result
> >> >>   to the least common multiple of the number of patterns in the inputs.
> >> >>   In this case:
> >> >>
> >> >>   Pr = least_common_multiple(P1, P2, Ps)
> >> >>
> >> >>   is a valid number of patterns.
> >> >>
> >> >> * (3) means that the number of elements per pattern of the result can
> >> >>   be the maximum of the number of elements per pattern in the inputs.
> >> >>   (Alternatively, we could always use 3.)  In this case:
> >> >>
> >> >>   Er = max(E1, E2, Es)
> >> >>
> >> >>   is a valid number of elements per pattern.
> >> >>
> >> >> So if (R) holds we can compute the result -- for both VLA and VLS -- by
> >> >> calculating the first Pr*Er elements of the result and using the
> >> >> encoding to derive the rest.  If (R) doesn't hold then we need the
> >> >> selector to be constant-length.  We should then fill in the result
> >> >> based on:
> >> >>
> >> >> - Pr == number of elements in the result
> >> >> - Er == 1
> >> >>
> >> >> But this should be the fallback option, even for VLS.
> >> >>
> >> >> As far as the arguments go: we should reject CONSTRUCTORs for
> >> >> variable-length types.  After doing that, we can treat a CONSTRUCTOR
> >> >> for an N-element vector type by setting the number of patterns to N
> >> >> and the number of elements per pattern to 1.
> >> > Hi Richard,
> >> > Thanks for the suggestions, and sorry for late response.
> >> > I have a couple of very elementary questions:
> >> >
> >> > 1: Consider following inputs to VEC_PERM_EXPR:
> >> > op1: P_op1 == 4, E_op1 == 1
> >> > {1, 2, 3, 4, ...}
> >> >
> >> > op2: P_op2 == 2, E_op2 == 2
> >> 

[committed] libstdc++: Optimize std::bitset::to_string

2022-09-23 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

This makes to_string approximately twice as fast at any optimization
level. Instead of iterating through every bit, jump straight to the next
bit that is set, by using _Find_first and _Find_next.
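
The idea can be sketched on a single 64-bit word without the
libstdc++-internal _Find_first/_Find_next members; the GCC builtin
__builtin_ctzll plays their role here:

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// Build the MSB-first string for a word by jumping from one set bit to
// the next instead of testing every position.
std::string to_string_sparse (std::uint64_t word)
{
  std::string s (64, '0');
  while (word)
    {
      unsigned n = __builtin_ctzll (word);  // index of lowest set bit
      s[64 - n - 1] = '1';                  // bitset prints MSB first
      word &= word - 1;                     // clear that bit
    }
  return s;
}
```

For any word this produces the same string as
std::bitset<64>(word).to_string(), but the loop runs once per set bit rather
than once per position.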

libstdc++-v3/ChangeLog:

* include/std/bitset (bitset::_M_copy_to_string): Find set bits
instead of iterating over individual bits.
---
 libstdc++-v3/include/std/bitset | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/std/bitset b/libstdc++-v3/include/std/bitset
index 0c84f15fda0..83c6416b770 100644
--- a/libstdc++-v3/include/std/bitset
+++ b/libstdc++-v3/include/std/bitset
@@ -1495,9 +1495,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
_CharT __zero, _CharT __one) const
   {
__s.assign(_Nb, __zero);
-   for (size_t __i = _Nb; __i > 0; --__i)
- if (_Unchecked_test(__i - 1))
-   _Traits::assign(__s[_Nb - __i], __one);
+   size_t __n = this->_Find_first();
+   while (__n < _Nb)
+ {
+   __s[_Nb - __n - 1] = __one;
+   __n = _Find_next(__n);
+ }
   }
 #endif // HOSTED
 
-- 
2.37.3



[PATCH 2/2]AArch64 Perform more late folding of reg moves and shifts which arrive after expand

2022-09-23 Thread Tamar Christina via Gcc-patches
Hi All,

Similar to the 1/2 patch but adds additional back-end specific folding for if
the register sequence was created as a result of RTL optimizations.

Concretely:

#include 

unsigned int foor (uint32x4_t x)
{
return x[1] >> 16;
}

generates:

foor:
	umov	w0, v0.h[3]
	ret

instead of

foor:
	umov	w0, v0.s[1]
	lsr	w0, w0, 16
	ret

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.md (*<optab>si3_insn_uxtw): Split SHIFT into
left and right ones.
* config/aarch64/constraints.md (Usl): New.
* config/aarch64/iterators.md (SHIFT_NL, LSHIFTRT): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shift-read.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
c333fb1f72725992bb304c560f1245a242d5192d..6aa1fb4be003f2027d63ac69fd314c2bbc876258
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5493,7 +5493,7 @@ (define_insn "*rol3_insn"
 ;; zero_extend version of shifts
 (define_insn "*<optab>si3_insn_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
-	(zero_extend:DI (SHIFT_no_rotate:SI
+	(zero_extend:DI (SHIFT_arith:SI
			 (match_operand:SI 1 "register_operand" "r,r")
			 (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Uss,r"))))]
   ""
@@ -5528,6 +5528,60 @@ (define_insn "*rolsi3_insn_uxtw"
   [(set_attr "type" "rotate_imm")]
 )
 
+(define_insn "*<optab>si3_insn2_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=r,?r,r")
+	(zero_extend:DI (LSHIFTRT:SI
+			 (match_operand:SI 1 "register_operand" "w,r,r")
+			 (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"))))]
+  ""
+  {
+switch (which_alternative)
+{
+  case 0:
+   {
+ machine_mode dest, vec_mode;
+ int val = INTVAL (operands[2]);
+ int size = 32 - val;
+ if (size == 16)
+   dest = HImode;
+ else if (size == 8)
+   dest = QImode;
+ else
+   gcc_unreachable ();
+
+ /* Get nearest 64-bit vector mode.  */
+ int nunits = 64 / size;
+	 auto vector_mode
+	   = mode_for_vector (as_a <scalar_mode> (dest), nunits);
+	 if (!vector_mode.exists (&vec_mode))
+   gcc_unreachable ();
+ operands[1] = gen_rtx_REG (vec_mode, REGNO (operands[1]));
+ operands[2] = gen_int_mode (val / size, SImode);
+
+	 /* Ideally we just call aarch64_get_lane_zero_extend but reload gets
+	    into a weird loop due to a mov of w -> r being present most time
+	    this instruction applies.  */
+ switch (dest)
+ {
+   case QImode:
+ return "umov\\t%w0, %1.b[%2]";
+   case HImode:
+ return "umov\\t%w0, %1.h[%2]";
+   default:
+ gcc_unreachable ();
+ }
+   }
+  case 1:
+	return "<shift>\\t%w0, %w1, %2";
+  case 2:
+	return "<shift>\\t%w0, %w1, %w2";
+  default:
+   gcc_unreachable ();
+  }
+  }
+  [(set_attr "type" "neon_to_gp,bfx,shift_reg")]
+)
+
 (define_insn "*<optab><mode>3_insn"
   [(set (match_operand:SHORT 0 "register_operand" "=r")
(ASHIFT:SHORT (match_operand:SHORT 1 "register_operand" "r")
diff --git a/gcc/config/aarch64/constraints.md 
b/gcc/config/aarch64/constraints.md
index 
ee7587cca1673208e2bfd6b503a21d0c8b69bf75..470510d691ee8589aec9b0a71034677534641bea
 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -166,6 +166,14 @@ (define_constraint "Uss"
   (and (match_code "const_int")
(match_test "(unsigned HOST_WIDE_INT) ival < 32")))
 
+(define_constraint "Usl"
+  "@internal
+  A constraint that matches an immediate shift constant in SImode that has an
+  exact mode available to use."
+  (and (match_code "const_int")
+   (and (match_test "satisfies_constraint_Uss (op)")
+	(match_test "(32 - ival == 8) || (32 - ival == 16)"))))
+
 (define_constraint "Usn"
  "A constant that can be used with a CCMN operation (once negated)."
  (and (match_code "const_int")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 
e904407b2169e589b7007ff966b2d9347a6d0fd2..bf16207225e3a4f1f20ed6f54321bccbbf15d73f
 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -2149,8 +2149,11 @@ (define_mode_attr sve_lane_pair_con [(VNx8HF "y") (VNx4SF "x")])
 ;; This code iterator allows the various shifts supported on the core
 (define_code_iterator SHIFT [ashift ashiftrt lshiftrt rotatert rotate])
 
-;; This code iterator allows all shifts except for rotates.
-(define_code_iterator SHIFT_no_rotate [ashift ashiftrt lshiftrt])
+;; This code iterator allows arithmetic shifts
+(define_code_iterator SHIFT_arith [ashift ashiftrt])
+
+;; Singleton code iterator for only logical right shift.
+(define_code_iterator LSHIFTRT 

[PATCH 1/2]middle-end Fold BIT_FIELD_REF and Shifts into BIT_FIELD_REFs alone

2022-09-23 Thread Tamar Christina via Gcc-patches
Hi All,

This adds a match.pd rule that can fold right shifts and bit_field_refs of
integers into just a bit_field_ref by adjusting the offset and the size of the
extract and adds an extend to the previous size.

Concretely turns:

#include 

unsigned int foor (uint32x4_t x)
{
return x[1] >> 16;
}

which used to generate:

  _1 = BIT_FIELD_REF <x, 32, 32>;
  _3 = _1 >> 16;

into

  _4 = BIT_FIELD_REF <x, 16, 48>;
  _2 = (unsigned int) _4;

I currently limit the rewrite to only doing it if the resulting extract is in
a mode the target supports. i.e. it won't rewrite it to extract say 13-bits
because I worry that for targets that won't have a bitfield extract instruction
this may be a de-optimization.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues.

Testcase are added in patch 2/2.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* match.pd: Add bitfield and shift folding.

--- inline copy of patch -- 
diff --git a/gcc/match.pd b/gcc/match.pd
index 
1d407414bee278c64c00d425d9f025c1c58d853d..b225d36dc758f1581502c8d03761544bfd499c01
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7245,6 +7245,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && ANY_INTEGRAL_TYPE_P (type) && ANY_INTEGRAL_TYPE_P (TREE_TYPE(@0)))
   (IFN_REDUC_PLUS_WIDEN @0)))
 
+/* Canonicalize BIT_FIELD_REFS and shifts to BIT_FIELD_REFS.  */
+(for shift (rshift)
+ op (plus)
+ (simplify
+  (shift (BIT_FIELD_REF @0 @1 @2) integer_pow2p@3)
+  (if (INTEGRAL_TYPE_P (type))
+   (with { /* Can't use wide-int here as the precision differs between
+ @1 and @3.  */
+  unsigned HOST_WIDE_INT size = tree_to_uhwi (@1);
+  unsigned HOST_WIDE_INT shiftc = tree_to_uhwi (@3);
+  unsigned HOST_WIDE_INT newsize = size - shiftc;
+  tree nsize = wide_int_to_tree (bitsizetype, newsize);
+  tree ntype
+= build_nonstandard_integer_type (newsize, 1); }
+(if (ntype)
+    (convert:type (BIT_FIELD_REF:ntype @0 { nsize; } (op @2 @3))))))))
+
 (simplify
  (BIT_FIELD_REF (BIT_FIELD_REF @0 @1 @2) @3 @4)
  (BIT_FIELD_REF @0 @3 { const_binop (PLUS_EXPR, bitsizetype, @2, @4); }))




-- 





[PATCH][committed] aarch64: Add Arm Neoverse V2 support

2022-09-23 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch adds -mcpu/-mtune support for the Arm Neoverse V2 core.
This updates the internal references to "demeter", but leaves "demeter" as an
accepted value to -mcpu/-mtune as it appears in the released GCC 12 series.

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (neoverse-v2): New entry.
(demeter): Update tunings to neoversev2.
* config/aarch64/aarch64-tune.md: Regenerate.
* config/aarch64/aarch64.cc (demeter_addrcost_table): Rename to
neoversev2_addrcost_table.
(demeter_regmove_cost): Rename to neoversev2_regmove_cost.
(demeter_advsimd_vector_cost): Rename to neoversev2_advsimd_vector_cost.
(demeter_sve_vector_cost): Rename to neoversev2_sve_vector_cost.
(demeter_scalar_issue_info): Rename to neoversev2_scalar_issue_info.
(demeter_advsimd_issue_info): Rename to neoversev2_advsimd_issue_info.
(demeter_sve_issue_info): Rename to neoversev2_sve_issue_info.
(demeter_vec_issue_info): Rename to neoversev2_vec_issue_info.
Update references to above.
(demeter_vector_cost): Rename to neoversev2_vector_cost.
(demeter_tunings): Rename to neoversev2_tunings.
(aarch64_vec_op_count::rename_cycles_per_iter): Use
neoversev2_sve_issue_info instead of demeter_sve_issue_info.
* doc/invoke.texi (AArch64 Options): Document neoverse-v2.




[committed] MAINTAINERS: Add myself to Write After Approval

2022-09-23 Thread Paul-Antoine Arras

Hello,

I just added myself to the Write After Approval section of MAINTAINERS file.
--
PA

From d10308ff618e036d6c3d1a8c491ca4755b257612 Mon Sep 17 00:00:00 2001
From: Paul-Antoine Arras 
Date: Fri, 23 Sep 2022 10:21:48 +
Subject: [PATCH] MAINTAINERS: Add myself to Write After Approval

ChangeLog:

* MAINTAINERS (Write After Approval): Add myself.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git MAINTAINERS MAINTAINERS
index be146855ed8..f63de226609 100644
--- MAINTAINERS
+++ MAINTAINERS
@@ -316,6 +316,7 @@ from other maintainers or reviewers.
 
 Mark G. Adams  
 Pedro Alves
+Paul-Antoine Arras 
 Raksit Ashok   
 Matt Austern   
 David Ayers
-- 
2.31.1



RE: [PATCH 2/2]AArch64 Add support for neg on v1df

2022-09-23 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, September 23, 2022 6:04 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH 2/2]AArch64 Add support for neg on v1df
> 
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Friday, September 23, 2022 5:30 AM
> >> To: Tamar Christina 
> >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> >> ; Marcus Shawcroft
> >> ; Kyrylo Tkachov
> 
> >> Subject: Re: [PATCH 2/2]AArch64 Add support for neg on v1df
> >>
> >> Tamar Christina  writes:
> >> > Hi All,
> >> >
> >> > This adds support for using scalar fneg on the V1DF type.
> >> >
> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >> >
> >> > Ok for master?
> >>
> >> Why just this one operation though?  Couldn't we extend iterators
> >> like
> >> GPF_F16 to include V1DF, avoiding the need for new patterns?
> >>
> >
> > Simply because it's the only one I know how to generate code for.
> > I can change GPF_F16 but I don't know under which circumstances we'd
> > generate a V1DF for the other operations.
> 
> We'd do it for things like:
> 
> __Float64x1_t foo (__Float64x1_t x) { return -x; }
> 
> if the pattern is available, instead of using subregs.  So one way would be to
> scan the expand rtl dump for subregs.

Ahh yes, I forgot about that ACLE type.

> 
> If the point is that there is no observable difference between defining 1-
> element vector ops and not, except for this one case, then that suggests we
> should handle this case in target-independent code instead.  There's no point
> forcing every target that has V1DF to define a duplicate of the DF neg
> pattern.

My original approach was to indeed use DF instead of V1DF, however since we
do define V1DF I had expected the mode to be somewhat usable.

So I'm happy to do whichever one you prefer now that I know how to test it.
I can either change my mid-end code, or extend the coverage of V1DF, any 
preference? 

Tamar

> 
> Thanks,
> Richard
> >
> > So if it's ok to do so without full test coverage I'm happy to do so...
> >
> > Tamar.
> >
> >> Richard
> >>
> >> >
> >> > Thanks,
> >> > Tamar
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >  * config/aarch64/aarch64-simd.md (negv1df2): New.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >
> >> >  * gcc.target/aarch64/simd/addsub_2.c: New test.
> >> >
> >> > --- inline copy of patch --
> >> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> >> > b/gcc/config/aarch64/aarch64-simd.md
> >> > index
> >> >
> >>
> f4152160084d6b6f34bd69f0ba6386c1ab50f77e..cf8c094bd4b76981cef2dd5dd7
> >> b8
> >> > e6be0d56101f 100644
> >> > --- a/gcc/config/aarch64/aarch64-simd.md
> >> > +++ b/gcc/config/aarch64/aarch64-simd.md
> >> > @@ -2713,6 +2713,14 @@ (define_insn "neg2"
> >> >[(set_attr "type" "neon_fp_neg_")]
> >> >  )
> >> >
> >> > +(define_insn "negv1df2"
> >> > + [(set (match_operand:V1DF 0 "register_operand" "=w")
> >> > +   (neg:V1DF (match_operand:V1DF 1 "register_operand" "w")))]
> >> > +"TARGET_SIMD"
> >> > + "fneg\\t%d0, %d1"
> >> > +  [(set_attr "type" "neon_fp_neg_d")]
> >> > +)
> >> > +
> >> >  (define_insn "abs2"
> >> >   [(set (match_operand:VHSDF 0 "register_operand" "=w")
> >> > (abs:VHSDF (match_operand:VHSDF 1 "register_operand"
> >> > "w")))] diff --git
> >> > a/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c
> >> > b/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c
> >> > new file mode 100644
> >> > index
> >> >
> >>
> ..55a7365e897f8af509de953129
> >> e0
> >> > f516974f7ca8
> >> > --- /dev/null
> >> > +++ b/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c
> >> > @@ -0,0 +1,22 @@
> >> > +/* { dg-do compile } */
> >> > +/* { dg-options "-Ofast" } */
> >> > +/* { dg-final { check-function-bodies "**" "" "" { target { le } }
> >> > +} } */
> >> > +
> >> > +#pragma GCC target "+nosve"
> >> > +
> >> > +/*
> >> > +** f1:
> >> > +** ...
> >> > +**  fnegd[0-9]+, d[0-9]+
> >> > +**  faddv[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s
> >> > +** ...
> >> > +*/
> >> > +void f1 (float *restrict a, float *restrict b, float *res, int n) {
> >> > +   for (int i = 0; i < 2; i+=2)
> >> > +{
> >> > +  res[i+0] = a[i+0] + b[i+0];
> >> > +  res[i+1] = a[i+1] - b[i+1];
> >> > +}
> >> > +}
> >> > +


Re: [PATCH] testsuite: Sanitize fails for SP FPU on Arm

2022-09-23 Thread Torbjorn SVENSSON via Gcc-patches

Hi Joseph,

On 2022-09-23 00:42, Joseph Myers wrote:

On Thu, 22 Sep 2022, Torbjörn SVENSSON via Gcc-patches wrote:


This patch stops reporting fails for Arm targets with single
precision floating point unit for types wider than 32 bits (the width
of float on arm-none-eabi).

As reported in PR102017, fenv is reported as supported in recent
versions of newlib. At the same time, for some Arm targets, the
implementation in libgcc does not support exceptions and thus, the
test fails with a call to abort().


It's definitely wrong to have this sort of Arm-specific conditional in
architecture-independent tests.  Tests requiring floating-point exceptions
support should have an appropriate dg-require-effective-target; if that
dg-require-effective-target wrongly passes in certain configurations, fix
it (or e.g. add a new check_effective_target_fenv_exceptions_double to
verify that exceptions work for double, as opposed to the present
check_effective_target_fenv_exceptions which checks whether exceptions
work for float, and then adjust tests requiring exceptions for double to
use the new effective-target).



Okay, thanks for your review.
I will split this test case into 3 files: for SP FPU we would like to 
verify that the exception handling for the float part works, while at the 
same time we know that exception handling for double and long double 
should not work.
Do you think it's preferable to have the double and long double cases 
reported as UNSUPPORTED, or as XFAIL, for the SP FPU configuration?


Re: [PATCH 2/2]AArch64 Add support for neg on v1df

2022-09-23 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Friday, September 23, 2022 5:30 AM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov 
>> Subject: Re: [PATCH 2/2]AArch64 Add support for neg on v1df
>> 
>> Tamar Christina  writes:
>> > Hi All,
>> >
>> > This adds support for using scalar fneg on the V1DF type.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> 
>> Why just this one operation though?  Couldn't we extend iterators like
>> GPF_F16 to include V1DF, avoiding the need for new patterns?
>> 
>
> Simply because it's the only one I know how to generate code for.
> I can change GPF_F16 but I don't know under which circumstances we'd generate
> a V1DF for the other operations.

We'd do it for things like:

__Float64x1_t foo (__Float64x1_t x) { return -x; }

if the pattern is available, instead of using subregs.  So one way
would be to scan the expand rtl dump for subregs.

If the point is that there is no observable difference between
defining 1-element vector ops and not, except for this one case,
then that suggests we should handle this case in target-independent
code instead.  There's no point forcing every target that has V1DF
to define a duplicate of the DF neg pattern.

Thanks,
Richard
>
> So if it's ok to do so without full test coverage I'm happy to do so...
>
> Tamar.
>
>> Richard
>> 
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >* config/aarch64/aarch64-simd.md (negv1df2): New.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >* gcc.target/aarch64/simd/addsub_2.c: New test.
>> >
>> > --- inline copy of patch --
>> > diff --git a/gcc/config/aarch64/aarch64-simd.md
>> > b/gcc/config/aarch64/aarch64-simd.md
>> > index
>> >
>> f4152160084d6b6f34bd69f0ba6386c1ab50f77e..cf8c094bd4b76981cef2dd5dd7
>> b8
>> > e6be0d56101f 100644
>> > --- a/gcc/config/aarch64/aarch64-simd.md
>> > +++ b/gcc/config/aarch64/aarch64-simd.md
>> > @@ -2713,6 +2713,14 @@ (define_insn "neg2"
>> >[(set_attr "type" "neon_fp_neg_")]
>> >  )
>> >
>> > +(define_insn "negv1df2"
>> > + [(set (match_operand:V1DF 0 "register_operand" "=w")
>> > +   (neg:V1DF (match_operand:V1DF 1 "register_operand" "w")))]
>> > +"TARGET_SIMD"
>> > + "fneg\\t%d0, %d1"
>> > +  [(set_attr "type" "neon_fp_neg_d")]
>> > +)
>> > +
>> >  (define_insn "abs2"
>> >   [(set (match_operand:VHSDF 0 "register_operand" "=w")
>> > (abs:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
>> > diff --git a/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c
>> > b/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c
>> > new file mode 100644
>> > index
>> >
>> ..55a7365e897f8af509de953129
>> e0
>> > f516974f7ca8
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c
>> > @@ -0,0 +1,22 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-Ofast" } */
>> > +/* { dg-final { check-function-bodies "**" "" "" { target { le } } }
>> > +} */
>> > +
>> > +#pragma GCC target "+nosve"
>> > +
>> > +/*
>> > +** f1:
>> > +** ...
>> > +**fnegd[0-9]+, d[0-9]+
>> > +**faddv[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s
>> > +** ...
>> > +*/
>> > +void f1 (float *restrict a, float *restrict b, float *res, int n) {
>> > +   for (int i = 0; i < 2; i+=2)
>> > +{
>> > +  res[i+0] = a[i+0] + b[i+0];
>> > +  res[i+1] = a[i+1] - b[i+1];
>> > +}
>> > +}
>> > +


RE: [PATCH 2/2]AArch64 Extend tbz pattern to allow SI to SI extensions.

2022-09-23 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, September 23, 2022 5:43 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH 2/2]AArch64 Extend tbz pattern to allow SI to SI
> extensions.
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > This adds additional recognition of & 1 into the tbz/tbnz pattern.
> >
> > Concretely with the mid-end changes this changes
> >
> > void g1(bool x)
> > {
> >   if (__builtin_expect (x, 0))
> > h ();
> > }
> >
> > from
> >
> > tst w0, 255
> > bne .L7
> >
> > to
> >
> > tbnz	w0, #0, .L5
> >
> > This pattern occurs ~120,000 times in SPECCPU 2017, basically on
> > every boolean comparison.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.md (*tb1): Renamed this
> ...
> > (*tb1): ... To this.
> > * config/aarch64/iterators.md (GPI2): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/tbz_1.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64.md
> > b/gcc/config/aarch64/aarch64.md index
> >
> 6aa1fb4be003f2027d63ac69fd314c2bbc876258..3faa03f453c94665d9d82225f18
> 0
> > d8afdcd0b5fe 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -943,31 +943,33 @@ (define_insn "*cb1"
> >   (const_int 1)))]
> >  )
> >
> > -(define_insn "*tb1"
> > +(define_insn "*tb1"
> >[(set (pc) (if_then_else
> > - (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand"
> "r")
> > -   (const_int 1)
> > -   (match_operand 1
> > - "aarch64_simd_shift_imm_" "n"))
> > + (EQL (zero_extract:GPI2
> > +(match_operand:GPI 0 "register_operand" "r")
> > +(const_int 1)
> > +(match_operand 1
> "aarch64_simd_shift_imm_" "n"))
> >(const_int 0))
> >  (label_ref (match_operand 2 "" ""))
> >  (pc)))
> > (clobber (reg:CC CC_REGNUM))]
> > -  "!aarch64_track_speculation"
> > +  "!aarch64_track_speculation
> > +&& known_ge (GET_MODE_SIZE (mode),
> > +GET_MODE_SIZE (mode))"
> 
> Is this check necessary?  The extraction evaluates to 0 or 1, so it shouldn't
> matter whether it is interpreted as DI or SI.
> 

Ah yes, fair point.

I will remove the check,

Thanks,
Tamar.

> OK without the check if you agree.
> 
> Thanks,
> Richard
> 
> >{
> >  if (get_attr_length (insn) == 8)
> >{
> > if (get_attr_far_branch (insn) == 1)
> >   return aarch64_gen_far_branch (operands, 2, "Ltb",
> > -"\\t%0, %1, ");
> > +"\\t%0, %1, ");
> > else
> >   {
> > operands[1] = GEN_INT (HOST_WIDE_INT_1U << UINTVAL
> (operands[1]));
> > -   return "tst\t%0, %1\;\t%l2";
> > +   return "tst\t%0, %1\;\t%l2";
> >   }
> >}
> >  else
> > -  return "\t%0, %1, %l2";
> > +  return "\t%0, %1, %l2";
> >}
> >[(set_attr "type" "branch")
> > (set (attr "length")
> > diff --git a/gcc/config/aarch64/iterators.md
> > b/gcc/config/aarch64/iterators.md index
> >
> 89ca66fd291b60a28979785706ecc5345ea86744..f6b2e7a83c63cab73947b6bd6
> 1b4
> > 99b4b57d14ac 100644
> > --- a/gcc/config/aarch64/iterators.md
> > +++ b/gcc/config/aarch64/iterators.md
> > @@ -28,6 +28,8 @@ (define_mode_iterator CCFP_CCFPE [CCFP CCFPE])
> >
> >  ;; Iterator for General Purpose Integer registers (32- and 64-bit
> > modes)  (define_mode_iterator GPI [SI DI])
> > +;; Copy of the above iterator
> > +(define_mode_iterator GPI2 [SI DI])
> >
> >  ;; Iterator for HI, SI, DI, some instructions can only work on these modes.
> >  (define_mode_iterator GPI_I16 [(HI "AARCH64_ISA_F16") SI DI]) diff
> > --git a/gcc/testsuite/gcc.target/aarch64/tbz_1.c
> > b/gcc/testsuite/gcc.target/aarch64/tbz_1.c
> > new file mode 100644
> > index
> >
> ..6a75eb4e7aedbfa3ae329358c
> 6ee
> > 4d675704a074
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/tbz_1.c
> > @@ -0,0 +1,32 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-O2 -std=c99  -fno-unwind-tables
> > +-fno-asynchronous-unwind-tables" } */
> > +/* { dg-final { check-function-bodies "**" "" "" { target { le } } }
> > +} */
> > +
> > +#include <stdbool.h> // Type your code here, or load an example.
> > +
> > +void h(void);
> > +
> > +/*
> > +** g1:
> > +** tbnzw[0-9], #?0, .L([0-9]+)
> > +** ret
> > +** ...
> > +*/
> > +void g1(bool x)
> > +{
> > +  if (__builtin_expect (x, 0))
> > +h ();
> > +}
> > +
> > +/*
> > +** g2:
> > +** tbz w[0-9]+, #?0, .L([0-9]+)
> > +** b   h
> > +** ...
> > +*/
> > +void g2(bool 

RE: [PATCH 2/2]AArch64 Add support for neg on v1df

2022-09-23 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, September 23, 2022 5:30 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH 2/2]AArch64 Add support for neg on v1df
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > This adds support for using scalar fneg on the V1DF type.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> 
> Why just this one operation though?  Couldn't we extend iterators like
> GPF_F16 to include V1DF, avoiding the need for new patterns?
> 

Simply because it's the only one I know how to generate code for.
I can change GPF_F16 but I don't know under which circumstances we'd generate
a V1DF for the other operations.

So if it's ok to do so without full test coverage I'm happy to do so...

Tamar.

> Richard
> 
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd.md (negv1df2): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/simd/addsub_2.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64-simd.md
> > index
> >
> f4152160084d6b6f34bd69f0ba6386c1ab50f77e..cf8c094bd4b76981cef2dd5dd7
> b8
> > e6be0d56101f 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -2713,6 +2713,14 @@ (define_insn "neg2"
> >[(set_attr "type" "neon_fp_neg_")]
> >  )
> >
> > +(define_insn "negv1df2"
> > + [(set (match_operand:V1DF 0 "register_operand" "=w")
> > +   (neg:V1DF (match_operand:V1DF 1 "register_operand" "w")))]
> > +"TARGET_SIMD"
> > + "fneg\\t%d0, %d1"
> > +  [(set_attr "type" "neon_fp_neg_d")]
> > +)
> > +
> >  (define_insn "abs2"
> >   [(set (match_operand:VHSDF 0 "register_operand" "=w")
> > (abs:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
> > diff --git a/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c
> > b/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c
> > new file mode 100644
> > index
> >
> ..55a7365e897f8af509de953129
> e0
> > f516974f7ca8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c
> > @@ -0,0 +1,22 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-Ofast" } */
> > +/* { dg-final { check-function-bodies "**" "" "" { target { le } } }
> > +} */
> > +
> > +#pragma GCC target "+nosve"
> > +
> > +/*
> > +** f1:
> > +** ...
> > +** fnegd[0-9]+, d[0-9]+
> > +** faddv[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s
> > +** ...
> > +*/
> > +void f1 (float *restrict a, float *restrict b, float *res, int n) {
> > +   for (int i = 0; i < 2; i+=2)
> > +{
> > +  res[i+0] = a[i+0] + b[i+0];
> > +  res[i+1] = a[i+1] - b[i+1];
> > +}
> > +}
> > +


Re: [PATCH 2/2]AArch64 Extend tbz pattern to allow SI to SI extensions.

2022-09-23 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> This adds additional recognition of & 1 into the tbz/tbnz pattern.
>
> Concretely with the mid-end changes this changes
>
> void g1(bool x)
> {
>   if (__builtin_expect (x, 0))
> h ();
> }
>
> from
>
> tst w0, 255
> bne .L7
>
> to
>
> tbnz	w0, #0, .L5
>
> This pattern occurs ~120,000 times in SPECCPU 2017, basically on
> every boolean comparison.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.md (*tb1): Renamed this ...
>   (*tb1): ... To this.
>   * config/aarch64/iterators.md (GPI2): New.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/tbz_1.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> 6aa1fb4be003f2027d63ac69fd314c2bbc876258..3faa03f453c94665d9d82225f180d8afdcd0b5fe
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -943,31 +943,33 @@ (define_insn "*cb1"
> (const_int 1)))]
>  )
>  
> -(define_insn "*tb1"
> +(define_insn "*tb1"
>[(set (pc) (if_then_else
> -   (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" "r")
> - (const_int 1)
> - (match_operand 1
> -   "aarch64_simd_shift_imm_" "n"))
> +   (EQL (zero_extract:GPI2
> +  (match_operand:GPI 0 "register_operand" "r")
> +  (const_int 1)
> +  (match_operand 1 "aarch64_simd_shift_imm_" "n"))
>  (const_int 0))
>(label_ref (match_operand 2 "" ""))
>(pc)))
> (clobber (reg:CC CC_REGNUM))]
> -  "!aarch64_track_speculation"
> +  "!aarch64_track_speculation
> +&& known_ge (GET_MODE_SIZE (mode),
> +  GET_MODE_SIZE (mode))"

Is this check necessary?  The extraction evaluates to 0 or 1,
so it shouldn't matter whether it is interpreted as DI or SI.

OK without the check if you agree.

Thanks,
Richard

>{
>  if (get_attr_length (insn) == 8)
>{
>   if (get_attr_far_branch (insn) == 1)
> return aarch64_gen_far_branch (operands, 2, "Ltb",
> -  "\\t%0, %1, ");
> +  "\\t%0, %1, ");
>   else
> {
>   operands[1] = GEN_INT (HOST_WIDE_INT_1U << UINTVAL (operands[1]));
> - return "tst\t%0, %1\;\t%l2";
> + return "tst\t%0, %1\;\t%l2";
> }
>}
>  else
> -  return "\t%0, %1, %l2";
> +  return "\t%0, %1, %l2";
>}
>[(set_attr "type" "branch")
> (set (attr "length")
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 
> 89ca66fd291b60a28979785706ecc5345ea86744..f6b2e7a83c63cab73947b6bd61b499b4b57d14ac
>  100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -28,6 +28,8 @@ (define_mode_iterator CCFP_CCFPE [CCFP CCFPE])
>  
>  ;; Iterator for General Purpose Integer registers (32- and 64-bit modes)
>  (define_mode_iterator GPI [SI DI])
> +;; Copy of the above iterator
> +(define_mode_iterator GPI2 [SI DI])
>  
>  ;; Iterator for HI, SI, DI, some instructions can only work on these modes.
>  (define_mode_iterator GPI_I16 [(HI "AARCH64_ISA_F16") SI DI])
> diff --git a/gcc/testsuite/gcc.target/aarch64/tbz_1.c 
> b/gcc/testsuite/gcc.target/aarch64/tbz_1.c
> new file mode 100644
> index 
> ..6a75eb4e7aedbfa3ae329358c6ee4d675704a074
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/tbz_1.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -std=c99  -fno-unwind-tables 
> -fno-asynchronous-unwind-tables" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +#include <stdbool.h> // Type your code here, or load an example.
> +
> +void h(void);
> +
> +/*
> +** g1:
> +**   tbnzw[0-9], #?0, .L([0-9]+)
> +**   ret
> +**   ...
> +*/
> +void g1(bool x)
> +{
> +  if (__builtin_expect (x, 0))
> +h ();
> +}
> +
> +/*
> +** g2:
> +**   tbz w[0-9]+, #?0, .L([0-9]+)
> +**   b   h
> +**   ...
> +*/
> +void g2(bool x)
> +{
> +  if (__builtin_expect (x, 1))
> +h ();
> +}
> +


Re: [PATCH v3 06/11] OpenMP: Pointers and member mappings

2022-09-23 Thread Jakub Jelinek via Gcc-patches
On Fri, Sep 23, 2022 at 08:29:46AM +0100, Julian Brown wrote:
> On Thu, 22 Sep 2022 15:17:08 +0200
> Jakub Jelinek  wrote:
> 
> > > +  bool built_sym_hash = false;  
> > 
> > So, I think usually we don't construct such hash_maps right away,
> > but have just pointer to the hash map initialized to NULL (then you
> > don't need to built_sym_hash next to it) and you simply new the
> > hash_map when needed the first time and delete it at the end (which
> > does nothing if it is NULL).
> 
> How about this version? (Re-tested.)

I'd appreciate if Tobias could have a second look, I'm getting less and
less familiar with Fortran, from my POV LGTM.
The patch is ok if Tobias doesn't spot anything today.

Jakub



[PATCH 2/4]AArch64 Add implementation for pow2 bitmask division.

2022-09-23 Thread Tamar Christina via Gcc-patches
Hi All,

This adds an AArch64 implementation of the new optab for unsigned pow2
bitmask division.

The implementation rewrites:

   x = y / (2 ^ (bitsize (y) / 2) - 1)

into e.g. (for bytes)

   (x + ((x + 257) >> 8)) >> 8

where it's required that the additions be done in double the precision of x
such that we don't lose any bits during an overflow.

Essentially the sequence decomposes the division into two smaller divisions,
one for the top part and one for the bottom part of the number, and adds the
results back together.

To account for the fact that a shift by 8 would be a division by 256, we add
1 to both parts of x such that when x is 255 we still get 1 as the answer.

Because the amount we shift by is half the width of the original datatype, we
can use the halving-narrowing instructions the ISA provides to do the
operation instead of using actual shifts.

For AArch64 this means we generate for:

void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
pixel[i] = (pixel[i] * level) / 0xff;
}

the following:

movi	v3.16b, 0x1
umull2	v1.8h, v0.16b, v2.16b
umull	v0.8h, v0.8b, v2.8b
addhn	v5.8b, v1.8h, v3.8h
addhn	v4.8b, v0.8h, v3.8h
uaddw	v1.8h, v1.8h, v5.8b
uaddw	v0.8h, v0.8h, v4.8b
uzp2	v0.16b, v0.16b, v1.16b

instead of:

umull   v2.8h, v1.8b, v5.8b
umull2  v1.8h, v1.16b, v5.16b
umull   v0.4s, v2.4h, v3.4h
umull2  v2.4s, v2.8h, v3.8h
umull   v4.4s, v1.4h, v3.4h
umull2  v1.4s, v1.8h, v3.8h
uzp2	v0.8h, v0.8h, v2.8h
uzp2	v1.8h, v4.8h, v1.8h
shrn	v0.8b, v0.8h, 7
shrn2   v0.16b, v1.8h, 7

Which results in significantly faster code.

Thanks to Wilco for the concept.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (@aarch64_bitmask_udiv3): New.
* config/aarch64/aarch64.cc 
(aarch64_vectorize_can_special_div_by_constant): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/div-by-bitmask.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
587a45d77721e1b39accbad7dbeca4d741eccb10..f4152160084d6b6f34bd69f0ba6386c1ab50f77e
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4831,6 +4831,65 @@ (define_expand "aarch64_hn2"
   }
 )
 
+;; div optimizations using narrowings
+;; we can do the division e.g. shorts by 255 faster by calculating it as
+;; (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
+;; double the precision of x.
+;;
+;; If we imagine a short as being composed of two blocks of bytes then
+;; adding 257 or 0b0000_0001_0000_0001 to the number is equivalent to
+;; adding 1 to each sub component:
+;;
+;;  short value of 16-bits
+;; ┌────────────────┬────────────────┐
+;; │                │                │
+;; └────────────────┴────────────────┘
+;;    8-bit part1 ▲    8-bit part2 ▲
+;;                │                │
+;;                │                │
+;;               +1               +1
+;;
+;; after the first addition, we have to shift right by 8, and narrow the
+;; results back to a byte.  Remember that the addition must be done in
+;; double the precision of the input.  Since 8 is half the size of a short
+;; we can use a narrowing halving instruction in AArch64, addhn, which also
+;; does the addition in a wider precision and narrows back to a byte.  The
+;; shift itself is implicit in the operation as it writes back only the top
+;; half of the result, i.e. bits 2*esize-1:esize.
+;;
+;; Since we have narrowed the result of the first part back to a byte, for
+;; the second addition we can use a widening addition, uaddw.
+;;
+;; For the final shift, since it's unsigned arithmetic we emit an ushr by 8
+;; to do the shift.
+;;
+;; The shift is later optimized by combine to a uzp2 with movi #0.
+(define_expand "@aarch64_bitmask_udiv3"
+  [(match_operand:VQN 0 "register_operand")
+   (match_operand:VQN 1 "register_operand")
+   (match_operand:VQN 2 "immediate_operand")]
+  "TARGET_SIMD"
+{
+  unsigned HOST_WIDE_INT size
+= (1ULL << GET_MODE_UNIT_BITSIZE (mode)) - 1;
+  if (!CONST_VECTOR_P (operands[2])
+  || const_vector_encoded_nelts (operands[2]) != 1
+  || size != UINTVAL (CONST_VECTOR_ELT (operands[2], 0)))
+FAIL;
+
+  rtx addend = gen_reg_rtx (mode);
+  rtx val = aarch64_simd_gen_const_vector_dup (mode, 1);
+  emit_move_insn (addend, lowpart_subreg (mode, val, mode));
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx tmp2 = gen_reg_rtx (mode);
+  emit_insn (gen_aarch64_addhn (tmp1, operands[1], addend));
+  unsigned bitsize = GET_MODE_UNIT_BITSIZE (mode);
+  rtx shift_vector = aarch64_simd_gen_const_vector_dup (mode, bitsize);
+  emit_insn (gen_aarch64_uaddw (tmp2, operands[1], tmp1));
+  emit_insn (gen_aarch64_simd_lshr (operands[0], tmp2, shift_vector));

[PATCH 4/4]AArch64 sve2: rewrite pack + NARROWB + NARROWB to NARROWB + NARROWT

2022-09-23 Thread Tamar Christina via Gcc-patches
Hi All,

This adds an RTL pattern for when two NARROWB instructions are being combined
with a PACK.  The second NARROWB is then transformed into a NARROWT.

For the example:

void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
pixel[i] += (pixel[i] * level) / 0xff;
}

we generate:

addhnb  z6.b, z0.h, z4.h
addhnb  z5.b, z1.h, z4.h
addhnb  z0.b, z0.h, z6.h
addhnt  z0.b, z1.h, z5.h
add z0.b, z0.b, z2.b

instead of:

addhnb  z6.b, z1.h, z4.h
addhnb  z5.b, z0.h, z4.h
addhnb  z1.b, z1.h, z6.h
addhnb  z0.b, z0.h, z5.h
uzp1z0.b, z0.b, z1.b
add z0.b, z0.b, z2.b

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-sve2.md (*aarch64_sve_pack_):
New.
* config/aarch64/iterators.md (binary_top): New.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-div-bitmask-4.c: New test.
* gcc.target/aarch64/sve2/div-by-bitmask_2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-sve2.md 
b/gcc/config/aarch64/aarch64-sve2.md
index 
ab5dcc369481311e5bd68a1581265e1ce99b4b0f..0ee46c8b0d43467da4a6b98ad3c41e5d05d8cf38
 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -1600,6 +1600,25 @@ (define_insn "@aarch64_sve_"
   "\t%0., %2., %3."
 )
 
+(define_insn_and_split "*aarch64_sve_pack_"
+  [(set (match_operand: 0 "register_operand" "=w")
+   (unspec:
+ [(match_operand:SVE_FULL_HSDI 1 "register_operand" "w")
+  (subreg:SVE_FULL_HSDI (unspec:
+[(match_operand:SVE_FULL_HSDI 2 "register_operand" "w")
+ (match_operand:SVE_FULL_HSDI 3 "register_operand" "w")]
+SVE2_INT_BINARY_NARROWB) 0)]
+ UNSPEC_PACK))]
+  "TARGET_SVE2"
+  "#"
+  "&& true"
+  [(const_int 0)]
+{
+  rtx tmp = lowpart_subreg (mode, operands[1], mode);
+  emit_insn (gen_aarch64_sve (, mode,
+ operands[0], tmp, operands[2], operands[3]));
+})
+
 ;; -
 ;;  [INT] Narrowing right shifts
 ;; -
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 
0dd9dc66f7ccd78acacb759662d0cd561cd5b4ef..37d8161a33b1c399d80be82afa67613a087389d4
 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3589,6 +3589,11 @@ (define_int_attr brk_op [(UNSPEC_BRKA "a") (UNSPEC_BRKB 
"b")
 
 (define_int_attr sve_pred_op [(UNSPEC_PFIRST "pfirst") (UNSPEC_PNEXT "pnext")])
 
+(define_int_attr binary_top [(UNSPEC_ADDHNB "UNSPEC_ADDHNT")
+(UNSPEC_RADDHNB "UNSPEC_RADDHNT")
+(UNSPEC_RSUBHNB "UNSPEC_RSUBHNT")
+(UNSPEC_SUBHNB "UNSPEC_SUBHNT")])
+
 (define_int_attr sve_int_op [(UNSPEC_ADCLB "adclb")
 (UNSPEC_ADCLT "adclt")
 (UNSPEC_ADDHNB "addhnb")
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c 
b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
new file mode 100644
index 
..0df08bda6fd3e33280307ea15c82dd9726897cfd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
@@ -0,0 +1,26 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fno-vect-cost-model" { target aarch64*-*-* } } */
+
+#include 
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint32_t
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+pixel[i] += (pixel[i] * (uint64_t)level) / 0xUL;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+pixel[i] += (pixel[i] * (uint64_t)level) / 0xUL;
+}
+
+#include "vect-div-bitmask.h"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" 
"vect" { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c 
b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c
new file mode 100644
index 
..cddcebdf15ecaa9dc515f58cdbced36c8038db1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -std=c99" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#include 
+
+/*
+** draw_bitmap1:
+** ...
+** addhnb  z6.b, z0.h, z4.h
+** addhnb  z5.b, z1.h, z4.h
+** addhnb  z0.b, z0.h, z6.h
+** addhnt  z0.b, z1.h, z5.h
+** ...
+*/
+void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
+{
+  for (int i = 

[PATCH 3/4]AArch64 Add SVE2 implementation for pow2 bitmask division

2022-09-23 Thread Tamar Christina via Gcc-patches
Hi All,

In plenty of image and video processing code it's common to modify pixel values
by a widening operation and then scale them back into range by dividing by 255.

This patch adds a named function allowing us to emit an optimized sequence
when doing an unsigned division that is equivalent to:

   x = y / (2 ^ (bitsize (y) / 2) - 1)

For SVE2 this means we generate for:

void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
pixel[i] = (pixel[i] * level) / 0xff;
}

the following:

mov z3.b, #1
.L3:
ld1b	z0.h, p0/z, [x0, x3]
mul	z0.h, p1/m, z0.h, z2.h
addhnb	z1.b, z0.h, z3.h
addhnb	z0.b, z0.h, z1.h
st1b	z0.h, p0, [x0, x3]
inch	x3
whilelo	p0.h, w3, w2
b.any	.L3

instead of:

.L3:
ld1b	z0.h, p1/z, [x0, x3]
mul	z0.h, p0/m, z0.h, z1.h
umulh	z0.h, p0/m, z0.h, z2.h
lsr	z0.h, z0.h, #7
st1b	z0.h, p1, [x0, x3]
inch	x3
whilelo	p1.h, w3, w2
b.any	.L3

Which results in significantly faster code.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-sve2.md (@aarch64_bitmask_udiv3): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve2/div-by-bitmask_1.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-sve2.md 
b/gcc/config/aarch64/aarch64-sve2.md
index 
f138f4be4bcf74c1a4a6d5847ed831435246737f..4d097f7c405cc68a1d6cda5c234a1023a6eba0d1
 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -71,6 +71,7 @@
 ;;  [INT] Reciprocal approximation
 ;;  [INT<-FP] Base-2 logarithm
 ;;  [INT] Polynomial multiplication
+;;  [INT] Misc optab implementations
 ;;
 ;; == Permutation
 ;;  [INT,FP] General permutes
@@ -2312,6 +2313,47 @@ (define_insn "@aarch64_sve_"
   "\t%0., %1., %2."
 )
 
+;; -
+;;  [INT] Misc optab implementations
+;; -
+;; Includes:
+;; - aarch64_bitmask_udiv
+;; -
+
+;; div optimizations using narrowings
+;; we can do the division e.g. shorts by 255 faster by calculating it as
+;; (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
+;; double the precision of x.
+;;
+;; See aarch64-simd.md for bigger explanation.
+(define_expand "@aarch64_bitmask_udiv3"
+  [(match_operand:SVE_FULL_HSDI 0 "register_operand")
+   (match_operand:SVE_FULL_HSDI 1 "register_operand")
+   (match_operand:SVE_FULL_HSDI 2 "immediate_operand")]
+  "TARGET_SVE2"
+{
+  unsigned HOST_WIDE_INT size
+= (1ULL << GET_MODE_UNIT_BITSIZE (mode)) - 1;
+  if (!CONST_VECTOR_P (operands[2])
+  || const_vector_encoded_nelts (operands[2]) != 1
+  || size != UINTVAL (CONST_VECTOR_ELT (operands[2], 0)))
+FAIL;
+
+  rtx addend = gen_reg_rtx (mode);
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx tmp2 = gen_reg_rtx (mode);
+  rtx val = aarch64_simd_gen_const_vector_dup (mode, 1);
+  emit_move_insn (addend, lowpart_subreg (mode, val, mode));
+  emit_insn (gen_aarch64_sve (UNSPEC_ADDHNB, mode, tmp1, operands[1],
+ addend));
+  emit_insn (gen_aarch64_sve (UNSPEC_ADDHNB, mode, tmp2, operands[1],
+ lowpart_subreg (mode, tmp1,
+ mode)));
+  emit_move_insn (operands[0],
+ lowpart_subreg (mode, tmp2, mode));
+  DONE;
+})
+
 ;; =========================================================================
 ;; == Permutation
 ;; =========================================================================
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_1.c b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_1.c
new file mode 100644
index ..e6f5098c30f4e2eb8ed1af153c0bb0d204cda6d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_1.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -std=c99" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#include <stdint.h>
+
+/*
+** draw_bitmap1:
+** ...
+** mul z[0-9]+.h, p[0-9]+/m, z[0-9]+.h, z[0-9]+.h
+** addhnb  z[0-9]+.b, z[0-9]+.h, z[0-9]+.h
+** addhnb  z[0-9]+.b, z[0-9]+.h, z[0-9]+.h
+** ...
+*/
+void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] = (pixel[i] * level) / 0xff;
+}
+
+void draw_bitmap2(uint8_t* restrict pixel, uint8_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] = (pixel[i] * level) / 0xfe;
+}
+
+/*
+** draw_bitmap3:
+** ...
+** mul z[0-9]+.s, p[0-9]+/m, z[0-9]+.s, z[0-9]+.s
+** addhnb  z[0-9]+.h, 

[PATCH 1/4]middle-end Support not decomposing specific divisions during vectorization.

2022-09-23 Thread Tamar Christina via Gcc-patches
Hi All,

In plenty of image and video processing code it's common to modify pixel values
by a widening operation and then scale them back into range by dividing by 255.

e.g.:

   x = y / (2 ^ (bitsize (y) / 2) - 1)

This patch adds a new target hook can_special_div_by_const, similar to
can_vec_perm which can be called to check if a target will handle a particular
division in a special way in the back-end.

The vectorizer will then vectorize the division using the standard tree code
and at expansion time the hook is called again to generate the code for the
division.

A lot of the changes in the patch are to pass down the tree operands in all paths
that can lead to the divmod expansion so that the target hook always has the
type of the expression you're expanding since the types can change the
expansion.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* expmed.h (expand_divmod): Pass tree operands down in addition to RTX.
* expmed.cc (expand_divmod): Likewise.
* explow.cc (round_push, align_dynamic_address): Likewise.
* expr.cc (force_operand, expand_expr_divmod): Likewise.
* optabs.cc (expand_doubleword_mod, expand_doubleword_divmod):
Likewise.
* target.h: Include tree-core.
* target.def (can_special_div_by_const): New.
* targhooks.cc (default_can_special_div_by_const): New.
* targhooks.h (default_can_special_div_by_const): New.
* tree-vect-generic.cc (expand_vector_operation): Use it.
* doc/tm.texi.in: Document it.
* doc/tm.texi: Regenerate.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Check for support.
* tree-vect-stmts.cc (vectorizable_operation): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-div-bitmask-1.c: New test.
* gcc.dg/vect/vect-div-bitmask-2.c: New test.
* gcc.dg/vect/vect-div-bitmask-3.c: New test.
* gcc.dg/vect/vect-div-bitmask.h: New file.

--- inline copy of patch -- 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 92bda1a7e14a3c9ea63e151e4a49a818bf4d1bdb..adba9fe97a9b43729c5e86d244a2a23e76cac097 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6112,6 +6112,22 @@ instruction pattern.  There is no need for the hook to handle these two
 implementation approaches itself.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST (enum @var{tree_code}, tree @var{vectype}, tree @var{treeop0}, tree @var{treeop1}, rtx *@var{output}, rtx @var{in0}, rtx @var{in1})
+This hook is used to test whether the target has a special method of
+division of vectors of type @var{vectype} using the two operands @code{treeop0}
+and @code{treeop1} and producing a vector of type @var{vectype}.  The division
+will then not be decomposed by the vectorizer and will be kept as a division.
+
+When the hook is being used to test whether the target supports a special
+divide, @var{in0}, @var{in1}, and @var{output} are all null.  When the hook
+is being used to emit a division, @var{in0} and @var{in1} are the source
+vectors of type @var{vectype} and @var{output} is the destination vector of
+type @var{vectype}.
+
+Return true if the operation is possible, emitting instructions for it
+if rtxes are provided and updating @var{output}.
+@end deftypefn
+
@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION (unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in})
 This hook should return the decl of a function that implements the
 vectorized variant of the function with the @code{combined_fn} code
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 112462310b134705d860153294287cfd7d4af81d..d5a745a02acdf051ea1da1b04076d058c24ce093 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4164,6 +4164,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_VEC_PERM_CONST
 
+@hook TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST
+
 @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
 
 @hook TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION
diff --git a/gcc/explow.cc b/gcc/explow.cc
index ddb4d6ae3600542f8d2bb5617cdd3933a9fae6c0..568e0eb1a158c696458ae678f5e346bf34ba0036 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -1037,7 +1037,7 @@ round_push (rtx size)
  TRUNC_DIV_EXPR.  */
   size = expand_binop (Pmode, add_optab, size, alignm1_rtx,
   NULL_RTX, 1, OPTAB_LIB_WIDEN);
-  size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, size, align_rtx,
+  size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL, size, align_rtx,
NULL_RTX, 1);
   size = expand_mult (Pmode, size, align_rtx, NULL_RTX, 1);
 
@@ -1203,7 +1203,7 @@ align_dynamic_address (rtx target, unsigned required_align)
 gen_int_mode (required_align / BITS_PER_UNIT - 1,
   Pmode),
  

Re: [PATCH][testsuite]: make check-functions-body dump expected and seen cases on failure.

2022-09-23 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> Often times when a check_function_body check fails it can be quite hard to
> figure out why as no additional information is provided.
>
> This changes it so that on failures it prints out the regex expression it's
> using and the text it's comparing against to the verbose log.
>
> This makes it much easier to figure out why a test has failed.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/testsuite/ChangeLog:
>
>   * lib/scanasm.exp (check_function_body): Add debug output to verbose log
>   on failure.

OK, thanks.

Richard

>
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
> index a80630bb2a819812ce1fe05184535011a12f1288..7c9dcfc9b2e49093355219f76838161f4c3302df 100644
> --- a/gcc/testsuite/lib/scanasm.exp
> +++ b/gcc/testsuite/lib/scanasm.exp
> @@ -803,7 +803,12 @@ proc check_function_body { functions name body_regexp } {
>  if { ![info exists up_functions($name)] } {
>   return 0
>  }
> -return [regexp "^$body_regexp\$" $up_functions($name)]
> +set fn_res [regexp "^$body_regexp\$" $up_functions($name)]
> +if { !$fn_res } {
> +  verbose -log "body: $body_regexp"
> +  verbose -log "against: $up_functions($name)"
> +}
> +return $fn_res
>  }
>  
>  # Check the implementations of functions against expected output.  Used as:


Re: [PATCH 2/2]AArch64 Add support for neg on v1df

2022-09-23 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> This adds support for using scalar fneg on the V1DF type.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?

Why just this one operation though?  Couldn't we extend iterators
like GPF_F16 to include V1DF, avoiding the need for new patterns?

Richard

>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-simd.md (negv1df2): New.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/simd/addsub_2.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index f4152160084d6b6f34bd69f0ba6386c1ab50f77e..cf8c094bd4b76981cef2dd5dd7b8e6be0d56101f 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -2713,6 +2713,14 @@ (define_insn "neg2"
>[(set_attr "type" "neon_fp_neg_")]
>  )
>  
> +(define_insn "negv1df2"
> + [(set (match_operand:V1DF 0 "register_operand" "=w")
> +   (neg:V1DF (match_operand:V1DF 1 "register_operand" "w")))]
> + "TARGET_SIMD"
> + "fneg\\t%d0, %d1"
> +  [(set_attr "type" "neon_fp_neg_d")]
> +)
> +
>  (define_insn "abs2"
>   [(set (match_operand:VHSDF 0 "register_operand" "=w")
> (abs:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
> diff --git a/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c b/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c
> new file mode 100644
> index ..55a7365e897f8af509de953129e0f516974f7ca8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +#pragma GCC target "+nosve"
> +
> +/*
> +** f1:
> +** ...
> +**   fnegd[0-9]+, d[0-9]+
> +**   faddv[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s
> +** ...
> +*/
> +void f1 (float *restrict a, float *restrict b, float *res, int n)
> +{
> +   for (int i = 0; i < 2; i+=2)
> +{
> +  res[i+0] = a[i+0] + b[i+0];
> +  res[i+1] = a[i+1] - b[i+1];
> +}
> +}
> +


[PATCH 2/2]AArch64 Extend tbz pattern to allow SI to SI extensions.

2022-09-23 Thread Tamar Christina via Gcc-patches
Hi All,

This adds additional recognition of & 1 into the tbz/tbnz pattern.

Concretely, with the mid-end changes, this changes

void g1(bool x)
{
  if (__builtin_expect (x, 0))
h ();
}

from

tst w0, 255
bne .L7

to

tbnz w0, #0, .L5

This pattern occurs ~120,000 times in SPECCPU 2017, basically on
every boolean comparison.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.md (*tb1): Renamed this ...
(*tb1): ... To this.
* config/aarch64/iterators.md (GPI2): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/tbz_1.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6aa1fb4be003f2027d63ac69fd314c2bbc876258..3faa03f453c94665d9d82225f180d8afdcd0b5fe 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -943,31 +943,33 @@ (define_insn "*cb1"
  (const_int 1)))]
 )
 
-(define_insn "*tb1"
+(define_insn "*tb1"
   [(set (pc) (if_then_else
- (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" "r")
-   (const_int 1)
-   (match_operand 1
- "aarch64_simd_shift_imm_" "n"))
+ (EQL (zero_extract:GPI2
+(match_operand:GPI 0 "register_operand" "r")
+(const_int 1)
+(match_operand 1 "aarch64_simd_shift_imm_" "n"))
   (const_int 0))
 (label_ref (match_operand 2 "" ""))
 (pc)))
(clobber (reg:CC CC_REGNUM))]
-  "!aarch64_track_speculation"
+  "!aarch64_track_speculation
+&& known_ge (GET_MODE_SIZE (mode),
+GET_MODE_SIZE (mode))"
   {
 if (get_attr_length (insn) == 8)
   {
if (get_attr_far_branch (insn) == 1)
  return aarch64_gen_far_branch (operands, 2, "Ltb",
-"\\t%0, %1, ");
+"\\t%0, %1, ");
else
  {
operands[1] = GEN_INT (HOST_WIDE_INT_1U << UINTVAL (operands[1]));
-   return "tst\t%0, %1\;\t%l2";
+   return "tst\t%0, %1\;\t%l2";
  }
   }
 else
-  return "\t%0, %1, %l2";
+  return "\t%0, %1, %l2";
   }
   [(set_attr "type" "branch")
(set (attr "length")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 89ca66fd291b60a28979785706ecc5345ea86744..f6b2e7a83c63cab73947b6bd61b499b4b57d14ac 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -28,6 +28,8 @@ (define_mode_iterator CCFP_CCFPE [CCFP CCFPE])
 
 ;; Iterator for General Purpose Integer registers (32- and 64-bit modes)
 (define_mode_iterator GPI [SI DI])
+;; Copy of the above iterator
+(define_mode_iterator GPI2 [SI DI])
 
 ;; Iterator for HI, SI, DI, some instructions can only work on these modes.
 (define_mode_iterator GPI_I16 [(HI "AARCH64_ISA_F16") SI DI])
diff --git a/gcc/testsuite/gcc.target/aarch64/tbz_1.c b/gcc/testsuite/gcc.target/aarch64/tbz_1.c
new file mode 100644
index ..6a75eb4e7aedbfa3ae329358c6ee4d675704a074
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/tbz_1.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -std=c99  -fno-unwind-tables -fno-asynchronous-unwind-tables" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#include <stdbool.h>
+
+void h(void);
+
+/*
+** g1:
+** tbnz w[0-9]+, #?0, .L([0-9]+)
+** ret
+** ...
+*/
+void g1(bool x)
+{
+  if (__builtin_expect (x, 0))
+h ();
+}
+
+/*
+** g2:
+** tbz w[0-9]+, #?0, .L([0-9]+)
+** b   h
+** ...
+*/
+void g2(bool x)
+{
+  if (__builtin_expect (x, 1))
+h ();
+}
+





[PATCH 1/2]middle-end: RFC: On expansion of conditional branches, give hint if argument is a truth type to backend

2022-09-23 Thread Tamar Christina via Gcc-patches
Hi All,

This is an RFC to figure out how to deal with targets that don't have native
comparisons against QImode values.

Booleans, at least in C99 and higher, are 0-1 valued.  This means that we only
really need to test a single bit.  However in RTL we no longer have this
information available and just have an SImode value (due to the promotion of
QImode to SImode).

This RFC fixes it by emitting an explicit & 1 during the expansion of the
conditional branch.

However it's unlikely that we want to do this unconditionally.  Most targets
I've tested seem to have harmless code changes, like x86 changing from testb to
andl $1.

So I have two questions:

1. Should I limit this behind a target macro? Or should I just leave it for all
   targets and deal with the fallout.
2. How can I tell whether the C99 0-1 valued bools are being used or the older
   0, non-0 variant?

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
However there are some benign codegen changes on x86, testb changed to andl $1.

This pattern occurs more than 120,000 times in SPECCPU 2017 and so is quite 
common.

Thanks,
Tamar

gcc/ChangeLog:

* tree.h (tree_zero_one_valued_p): New.
* dojump.cc (do_jump): Add & 1 if truth type.

--- inline copy of patch -- 
diff --git a/gcc/dojump.cc b/gcc/dojump.cc
index 2af0cd1aca3b6af13d5d8799094ee93f18022296..8eaf1be49cd12298e61c6946ae79ca9de6197864 100644
--- a/gcc/dojump.cc
+++ b/gcc/dojump.cc
@@ -605,7 +605,17 @@ do_jump (tree exp, rtx_code_label *if_false_label,
   /* Fall through and generate the normal code.  */
 default:
 normal:
-  temp = expand_normal (exp);
+  tree cmp = exp;
+  /* If the expression is a truth type then explicitly generate an & 1
+     to indicate to the target that it's a zero-one valued type.  This
+     allows the target to further optimize the comparison should it
+     choose to.  */
+  if (tree_zero_one_valued_p (exp))
+   {
+ type = TREE_TYPE (exp);
+ cmp = build2 (BIT_AND_EXPR, type, exp, build_int_cstu (type, 1));
+   }
+  temp = expand_normal (cmp);
   do_pending_stack_adjust ();
   /* The RTL optimizers prefer comparisons against pseudos.  */
   if (GET_CODE (temp) == SUBREG)
diff --git a/gcc/tree.h b/gcc/tree.h
index 8f8a9660c9e0605eb516de194640b8c1b531b798..be3d2dee82f692e81082cf21c878c10f9fe9e1f1 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4690,6 +4690,7 @@ extern tree signed_or_unsigned_type_for (int, tree);
 extern tree signed_type_for (tree);
 extern tree unsigned_type_for (tree);
 extern bool is_truth_type_for (tree, tree);
+extern bool tree_zero_one_valued_p (tree);
 extern tree truth_type_for (tree);
 extern tree build_pointer_type_for_mode (tree, machine_mode, bool);
 extern tree build_pointer_type (tree);









[PATCH][testsuite]: make check-functions-body dump expected and seen cases on failure.

2022-09-23 Thread Tamar Christina via Gcc-patches
Hi All,

Often times when a check_function_body check fails it can be quite hard to
figure out why as no additional information is provided.

This changes it so that on failures it prints out the regex expression it's
using and the text it's comparing against to the verbose log.

This makes it much easier to figure out why a test has failed.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

* lib/scanasm.exp (check_function_body): Add debug output to verbose log
on failure.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
index a80630bb2a819812ce1fe05184535011a12f1288..7c9dcfc9b2e49093355219f76838161f4c3302df 100644
--- a/gcc/testsuite/lib/scanasm.exp
+++ b/gcc/testsuite/lib/scanasm.exp
@@ -803,7 +803,12 @@ proc check_function_body { functions name body_regexp } {
 if { ![info exists up_functions($name)] } {
return 0
 }
-return [regexp "^$body_regexp\$" $up_functions($name)]
+set fn_res [regexp "^$body_regexp\$" $up_functions($name)]
+if { !$fn_res } {
+  verbose -log "body: $body_regexp"
+  verbose -log "against: $up_functions($name)"
+}
+return $fn_res
 }
 
 # Check the implementations of functions against expected output.  Used as:









[PATCH]middle-end fix floating out of constants in conditionals

2022-09-23 Thread Tamar Christina via Gcc-patches
Hi All,

The following testcase:

int zoo1 (int a, int b, int c, int d)
{
   return (a > b ? c : d) & 1;
}

gets de-optimized by the front-end since somewhere around GCC 4.x due to a fix
that was added to fold_binary_op_with_conditional_arg.

The folding is supposed to succeed only if we have folded at least one of the
branches; however, the check doesn't test that all of the values are
non-constant.  So if one of the operands is a constant it accepts the folding.

This ends up folding

   return (a > b ? c : d) & 1;

into

   return (a > b ? c & 1 : d & 1);

and thus performing the AND twice.

This change rejects the folding if one of the arguments is a constant
and the operations being performed are the same.

Secondly, it adds a new match.pd rule to fold in the opposite direction as well,
so it now also folds:

   return (a > b ? c & 1 : d & 1);

into

   return (a > b ? c : d) & 1;

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* fold-const.cc (fold_binary_op_with_conditional_arg): Add relaxation.
* match.pd: Add ternary constant fold rule.
* tree-cfg.cc (verify_gimple_assign_ternary): RHS1 of a COND_EXPR isn't
a value but an expression itself. 

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/if-compare_3.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 4f4ec81c8d4b6937ade3141a14c695b67c874c35..0ee083f290d12104969f1b335dc33917c97b4808 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -7212,7 +7212,9 @@ fold_binary_op_with_conditional_arg (location_t loc,
 }
 
   /* Check that we have simplified at least one of the branches.  */
-  if (!TREE_CONSTANT (arg) && !TREE_CONSTANT (lhs) && !TREE_CONSTANT (rhs))
+  if ((!TREE_CONSTANT (arg) && !TREE_CONSTANT (lhs) && !TREE_CONSTANT (rhs))
+  || (TREE_CONSTANT (arg) && TREE_CODE (lhs) == TREE_CODE (rhs)
+ && !TREE_CONSTANT (lhs)))
 return NULL_TREE;
 
   return fold_build3_loc (loc, cond_code, type, test, lhs, rhs);
diff --git a/gcc/match.pd b/gcc/match.pd
index b225d36dc758f1581502c8d03761544bfd499c01..b61ed70e69b881a49177f10f20c1f92712bb8665 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4318,6 +4318,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (op @3 (vec_cond:s @0 @1 @2))
   (vec_cond @0 (op! @3 @1) (op! @3 @2
 
+/* Float out binary operations from branches if they can't be folded.
+   Fold (a ? (b op c) : (d op c)) --> (op (a ? b : d) c).  */
+(for op (plus mult min max bit_and bit_ior bit_xor minus lshift rshift rdiv
+trunc_div ceil_div floor_div round_div trunc_mod ceil_mod floor_mod
+round_mod)
+ (simplify
+  (cond @0 (op @1 @2) (op @3 @2))
+   (if (!FLOAT_TYPE_P (type) || !(HONOR_NANS (@1) && flag_trapping_math))
+(op (cond @0 @1 @3) @2
+
 #if GIMPLE
 (match (nop_atomic_bit_test_and_p @0 @1 @4)
  (bit_and (convert?@4 (ATOMIC_FETCH_OR_XOR_N @2 INTEGER_CST@0 @3))
diff --git a/gcc/testsuite/gcc.target/aarch64/if-compare_3.c b/gcc/testsuite/gcc.target/aarch64/if-compare_3.c
new file mode 100644
index ..1d97da5c0d6454175881c219927471a567a6f0c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/if-compare_3.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -std=c99" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+/*
+**zoo:
+** cmp w0, w1
+** cselw0, w3, w2, le
+** ret
+*/
+int zoo (int a, int b, int c, int d)
+{
+   return a > b ? c : d;
+}
+
+/*
+**zoo1:
+** cmp w0, w1
+** cselw0, w3, w2, le
+** and w0, w0, 1
+** ret
+*/
+int zoo1 (int a, int b, int c, int d)
+{
+   return (a > b ? c : d) & 1;
+}
+
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index b19710392940cf469de52d006603ae1e3deb6b76..aaf1b29da5c598add25dad2c38b828eaa89c49ce 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -4244,7 +4244,9 @@ verify_gimple_assign_ternary (gassign *stmt)
   return true;
 }
 
-  if (!is_gimple_val (rhs1)
+  /* In a COND_EXPR the rhs1 is the condition and thus isn't
+ a gimple value by definition.  */
+  if ((!is_gimple_val (rhs1) && rhs_code != COND_EXPR)
   || !is_gimple_val (rhs2)
   || !is_gimple_val (rhs3))
 {





[PATCH]middle-end Recognize more conditional comparisons idioms.

2022-09-23 Thread Tamar Christina via Gcc-patches
Hi All,

GCC currently recognizes some of these for signed but not unsigned types.
It also has trouble dealing with casts in between because these are handled
by the fold machinery.

This moves the pattern detection to match.pd instead.

We fold e.g.:

uint32_t min1_32u(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
  uint32_t result;
  uint32_t m = (a >= b) - 1;
  result = (c & m) | (d & ~m);
  return result;
}

into a < b ? c : d for all integral types.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* match.pd: New cond select pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/select_cond_1.c: New test.
* gcc.dg/select_cond_2.c: New test.
* gcc.dg/select_cond_3.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/match.pd b/gcc/match.pd
index 39da61bf117a6eb2924fc8a6473fb37ddadd60e9..7b8f50410acfd0afafc5606e972cfc4e125d3a5d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3577,6 +3577,25 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
   (max @2 @1))
 
+/* (a & ((c `op` d) - 1)) | (b & ~((c `op` d) - 1)) ->  c `op` d ? a : b.  */
+(for op (simple_comparison)
+ (simplify
+  (bit_xor:c
+   (convert2? @0)
+   (bit_and:c
+(convert2? (bit_xor:c @1 @0))
+(convert3? (negate (convert? (op@4 @2 @3))
+  /* Alternative form, where some canonicalizations were not done due to the
+ arguments being signed.  */
+  (if (INTEGRAL_TYPE_P (type) && tree_zero_one_valued_p (@4))
+   (convert:type (cond @4 @1 @0
+ (simplify
+  (bit_ior:c
+   (mult:c @0 (convert (convert2? (op@4 @2 @3
+   (bit_and:c @1 (convert (plus:c integer_minus_onep (convert (op@4 @2 @3))
+  (if (INTEGRAL_TYPE_P (type) && tree_zero_one_valued_p (@4))
+   (cond @4 @0 @1
+
 /* Simplifications of shift and rotates.  */
 
 (for rotate (lrotate rrotate)
diff --git a/gcc/testsuite/gcc.dg/select_cond_1.c b/gcc/testsuite/gcc.dg/select_cond_1.c
new file mode 100644
index ..9eb9959baafe5fffeec24e4e3ae656f8fcfe943c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/select_cond_1.c
@@ -0,0 +1,97 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O2 -std=c99 -fdump-tree-optimized -save-temps" } */
+
+#include <stdint.h>
+
+__attribute__((noipa, noinline))
+uint32_t min1_32u(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
+  uint32_t result;
+  uint32_t m = (a >= b) - 1;
+  result = (c & m) | (d & ~m);
+  return result;
+}
+
+__attribute__((noipa, noinline))
+uint32_t max1_32u(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
+  uint32_t result;
+  uint32_t m = (a <= b) - 1;
+  result = (c & m) | (d & ~m);
+  return result;
+}
+
+__attribute__((noipa, noinline))
+uint32_t min2_32u(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
+  uint32_t result;
+  uint32_t m = (a > b) - 1;
+  result = (c & m) | (d & ~m);
+  return result;
+}
+
+__attribute__((noipa, noinline))
+uint32_t max2_32u(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
+  uint32_t result;
+  uint32_t m = (a < b) - 1;
+  result = (c & m) | (d & ~m);
+  return result;
+}
+
+__attribute__((noipa, noinline))
+uint32_t min3_32u(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
+  uint32_t result;
+  uint32_t m = (a == b) - 1;
+  result = (c & m) | (d & ~m);
+  return result;
+}
+
+__attribute__((noipa, noinline))
+uint32_t max3_32u(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
+  uint32_t result;
+  uint32_t m = (a != b) - 1;
+  result = (c & m) | (d & ~m);
+  return result;
+}
+
+/* { dg-final { scan-tree-dump-times {_[0-9]+ \? c_[0-9]+\(D\) : d_[0-9]+\(D\)} 6 "optimized" } } */
+
+extern void abort ();
+
+int main () {
+
+  if (min1_32u (3, 5, 7 , 8) != 7)
+abort ();
+
+  if (max1_32u (3, 5, 7 , 8) != 8)
+abort ();
+
+  if (min1_32u (5, 3, 7 , 8) != 8)
+abort ();
+
+  if (max1_32u (5, 3, 7 , 8) != 7)
+abort ();
+
+  if (min2_32u (3, 5, 7 , 8) != 7)
+abort ();
+
+  if (max2_32u (3, 5, 7 , 8) != 8)
+abort ();
+
+  if (min2_32u (5, 3, 7 , 8) != 8)
+abort ();
+
+  if (max2_32u (5, 3, 7 , 8) != 7)
+abort ();
+
+  if (min3_32u (3, 5, 7 , 8) != 7)
+abort ();
+
+  if (max3_32u (3, 5, 7 , 8) != 8)
+abort ();
+
+  if (min3_32u (5, 3, 7 , 8) != 7)
+abort ();
+
+  if (max3_32u (5, 3, 7 , 8) != 8)
+abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/select_cond_2.c b/gcc/testsuite/gcc.dg/select_cond_2.c
new file mode 100644
index ..623b2272ee7b4b00130d5e9fb8c781dbc5d4189e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/select_cond_2.c
@@ -0,0 +1,99 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O2 -std=c99 -fdump-tree-optimized -save-temps" } */
+
+#include <stdint.h>
+
+__attribute__((noipa, noinline))
+uint32_t min1_32u(uint32_t a, uint32_t b) {
+  uint32_t result;
+  uint32_t m = (a >= b) - 1;
+  result = (a & m) | (b & ~m);
+  return result;
+}
+
+__attribute__((noipa, noinline))
+uint32_t 

[PATCH 2/2]AArch64 Add support for neg on v1df

2022-09-23 Thread Tamar Christina via Gcc-patches
Hi All,

This adds support for using scalar fneg on the V1DF type.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (negv1df2): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/addsub_2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f4152160084d6b6f34bd69f0ba6386c1ab50f77e..cf8c094bd4b76981cef2dd5dd7b8e6be0d56101f 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2713,6 +2713,14 @@ (define_insn "neg2"
   [(set_attr "type" "neon_fp_neg_")]
 )
 
+(define_insn "negv1df2"
+ [(set (match_operand:V1DF 0 "register_operand" "=w")
+   (neg:V1DF (match_operand:V1DF 1 "register_operand" "w")))]
+ "TARGET_SIMD"
+ "fneg\\t%d0, %d1"
+  [(set_attr "type" "neon_fp_neg_d")]
+)
+
 (define_insn "abs2"
  [(set (match_operand:VHSDF 0 "register_operand" "=w")
(abs:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c b/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c
new file mode 100644
index ..55a7365e897f8af509de953129e0f516974f7ca8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#pragma GCC target "+nosve"
+
+/*
+** f1:
+** ...
+**	fneg	d[0-9]+, d[0-9]+
+**	fadd	v[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s
+** ...
+*/
+void f1 (float *restrict a, float *restrict b, float *res, int n)
+{
+   for (int i = 0; i < 2; i+=2)
+{
+  res[i+0] = a[i+0] + b[i+0];
+  res[i+1] = a[i+1] - b[i+1];
+}
+}
+





RE: [PATCH]middle-end simplify complex if expressions where comparisons are inverse of one another.

2022-09-23 Thread Tamar Christina via Gcc-patches
Hello,

> where logical_inverted is somewhat contradicting using zero_one_valued
> instead of truth_valued_p (I think the former might not work for vector
> booleans?).
> 
> In the end I'd prefer zero_one_valued_p but avoiding inverse_conditions_p
> would be nice.
> 
> Richard.

It's not pretty but I've made it work and added more tests.

Bootstrapped Regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* match.pd: Add new rule.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/if-compare_1.c: New test.
* gcc.target/aarch64/if-compare_2.c: New test.

--- inline copy of patch ---

diff --git a/gcc/match.pd b/gcc/match.pd
index b61ed70e69b881a49177f10f20c1f92712bb8665..39da61bf117a6eb2924fc8a6473fb37ddadd60e9 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1903,6 +1903,101 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type))
   (bit_and @0 @1)))
 
+(for cmp (tcc_comparison)
+ icmp (inverted_tcc_comparison)
+ /* Fold (((a < b) & c) | ((a >= b) & d)) into (a < b ? c : d) & 1.  */
+ (simplify
+  (bit_ior
+   (bit_and:c (convert? zero_one_valued_p@0) @2)
+   (bit_and:c (convert? zero_one_valued_p@1) @3))
+(with {
+  enum tree_code c1
+   = (TREE_CODE (@0) == SSA_NAME
+  ? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@0)) : TREE_CODE (@0));
+
+  enum tree_code c2
+   = (TREE_CODE (@1) == SSA_NAME
+  ? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@1)) : TREE_CODE (@1));
+ }
+(if (INTEGRAL_TYPE_P (type)
+&& c1 == cmp
+&& c2 == icmp
+/* The scalar version has to be canonicalized after vectorization
+   because it makes unconditional loads conditional ones, which
+   means we lose vectorization because the loads may trap.  */
+&& canonicalize_math_after_vectorization_p ())
+ (bit_and (cond @0 @2 @3) { build_one_cst (type); }
+
+ /* Fold ((-(a < b) & c) | (-(a >= b) & d)) into a < b ? c : d.  */
+ (simplify
+  (bit_ior
+   (cond zero_one_valued_p@0 @2 zerop)
+   (cond zero_one_valued_p@1 @3 zerop))
+(with {
+  enum tree_code c1
+   = (TREE_CODE (@0) == SSA_NAME
+  ? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@0)) : TREE_CODE (@0));
+
+  enum tree_code c2
+   = (TREE_CODE (@1) == SSA_NAME
+  ? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@1)) : TREE_CODE (@1));
+ }
+(if (INTEGRAL_TYPE_P (type)
+&& c1 == cmp
+&& c2 == icmp
+/* The scalar version has to be canonicalized after vectorization
+   because it makes unconditional loads conditional ones, which
+   means we lose vectorization because the loads may trap.  */
+&& canonicalize_math_after_vectorization_p ())
+(cond @0 @2 @3
+
+ /* Vector Fold (((a < b) & c) | ((a >= b) & d)) into a < b ? c : d. 
+and ((~(a < b) & c) | (~(a >= b) & d)) into a < b ? c : d.  */
+ (simplify
+  (bit_ior
+   (bit_and:c (vec_cond:s @0 @4 @5) @2)
+   (bit_and:c (vec_cond:s @1 @4 @5) @3))
+(with {
+  enum tree_code c1
+   = (TREE_CODE (@0) == SSA_NAME
+  ? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@0)) : TREE_CODE (@0));
+
+  enum tree_code c2
+   = (TREE_CODE (@1) == SSA_NAME
+  ? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@1)) : TREE_CODE (@1));
+ }
+ (if (c1 == cmp && c2 == icmp)
+  (if (integer_zerop (@5))
+   (switch
+   (if (integer_onep (@4))
+(bit_and (vec_cond @0 @2 @3) @4))
+   (if (integer_minus_onep (@4))
+(vec_cond @0 @2 @3)))
+  (if (integer_zerop (@4))
+   (switch
+   (if (integer_onep (@5))
+(bit_and (vec_cond @0 @3 @2) @5))
+   (if (integer_minus_onep (@5))
+(vec_cond @0 @3 @2
+
+ /* Scalar Vectorized Fold ((-(a < b) & c) | (-(a >= b) & d))
+into a < b ? d : c.  */
+ (simplify
+  (bit_ior
+   (vec_cond:s @0 @2 integer_zerop)
+   (vec_cond:s @1 @3 integer_zerop))
+(with {
+  enum tree_code c1
+   = (TREE_CODE (@0) == SSA_NAME
+  ? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@0)) : TREE_CODE (@0));
+
+  enum tree_code c2
+   = (TREE_CODE (@1) == SSA_NAME
+  ? gimple_assign_rhs_code (SSA_NAME_DEF_STMT (@1)) : TREE_CODE (@1));
+ }
+ (if (c1 == cmp && c2 == icmp)
+  (vec_cond @0 @2 @3)
+
 /* Transform X & -Y into X * Y when Y is { 0 or 1 }.  */
 (simplify
  (bit_and:c (convert? (negate zero_one_valued_p@0)) @1)
diff --git a/gcc/testsuite/gcc.target/aarch64/if-compare_1.c b/gcc/testsuite/gcc.target/aarch64/if-compare_1.c
new file mode 100644
index ..53bbd779a30e1a30e0ce0e4e5eaf589bfaf570fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/if-compare_1.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-additional-options "-O -save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+extern void abort ();
+
+/*
+**zoo1:
+** 

RE: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.

2022-09-23 Thread Tamar Christina via Gcc-patches
Hi,

Attached is the respun version of the patch,

> >>
> >> Wouldn't a target need to re-check if lanes are NaN or denormal if
> >> after a SFmode lane operation a DFmode lane operation follows?  IIRC
> >> that is what usually makes punning "integer" vectors as FP vectors costly.

I don't believe this is a problem: NaNs are not a single value, and
according to the standard the sign bit doesn't change the meaning of a NaN.

That's why, specifically for negates, generally no check is performed and it's
assumed that if a value is a NaN going in, it's a NaN coming out, and this
optimization doesn't change that.  Also, under fast-math we don't guarantee
a stable representation for NaN (or zeros, etc.) afaik.

So if that is still a concern I could add && !HONORS_NAN () to the constraints.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* match.pd: Add fneg/fadd rule.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/addsub_1.c: New test.
* gcc.target/aarch64/sve/addsub_1.c: New test.

--- inline version of patch ---

diff --git a/gcc/match.pd b/gcc/match.pd
index 1bb936fc4010f98f24bb97671350e8432c55b347..2617d56091dfbd41ae49f980ee0af3757f5ec1cf 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7916,6 +7916,59 @@ and,
   (simplify (reduc (op @0 VECTOR_CST@1))
 (op (reduc:type @0) (reduc:type @1
 
+/* Simplify vector floating point operations of alternating sub/add pairs
+   into using an fneg of a wider element type followed by a normal add.
+   under IEEE 754 the fneg of the wider type will negate every even entry
+   and when doing an add we get a sub of the even and add of every odd
+   elements.  */
+(simplify
+ (vec_perm (plus:c @0 @1) (minus @0 @1) VECTOR_CST@2)
+ (if (!VECTOR_INTEGER_TYPE_P (type) && !BYTES_BIG_ENDIAN)
+  (with
+   {
+ /* Build a vector of integers from the tree mask.  */
+ vec_perm_builder builder;
+ if (!tree_to_vec_perm_builder (&builder, @2))
+   return NULL_TREE;
+
+ /* Create a vec_perm_indices for the integer vector.  */
+ poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
+ vec_perm_indices sel (builder, 2, nelts);
+   }
+   (if (sel.series_p (0, 2, 0, 2))
+(with
+ {
+   machine_mode vec_mode = TYPE_MODE (type);
+   auto elem_mode = GET_MODE_INNER (vec_mode);
+   auto nunits = exact_div (GET_MODE_NUNITS (vec_mode), 2);
+   tree stype;
+   switch (elem_mode)
+{
+case E_HFmode:
+  stype = float_type_node;
+  break;
+case E_SFmode:
+  stype = double_type_node;
+  break;
+default:
+  return NULL_TREE;
+}
+   tree ntype = build_vector_type (stype, nunits);
+   if (!ntype)
+return NULL_TREE;
+
+   /* The format has to have a simple sign bit.  */
+   const struct real_format *fmt = FLOAT_MODE_FORMAT (vec_mode);
+   if (fmt == NULL)
+return NULL_TREE;
+ }
+ (if (fmt->signbit_rw == GET_MODE_UNIT_BITSIZE (vec_mode) - 1
+ && fmt->signbit_rw == fmt->signbit_ro
+ && targetm.can_change_mode_class (TYPE_MODE (ntype), TYPE_MODE (type), ALL_REGS)
+ && (optimize_vectors_before_lowering_p ()
+ || target_supports_op_p (ntype, NEGATE_EXPR, optab_vector)))
+  (plus (view_convert:type (negate (view_convert:ntype @1))) @0)))
+
 (simplify
  (vec_perm @0 @1 VECTOR_CST@2)
  (with
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
new file mode 100644
index ..1fb91a34c421bbd2894faa0dbbf1b47ad43310c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
+/* { dg-options "-Ofast" } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#pragma GCC target "+nosve"
+
+/* 
+** f1:
+** ...
+**	fneg	v[0-9]+.2d, v[0-9]+.2d
+**	fadd	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** ...
+*/
+void f1 (float *restrict a, float *restrict b, float *res, int n)
+{
+   for (int i = 0; i < (n & -4); i+=2)
+{
+  res[i+0] = a[i+0] + b[i+0];
+  res[i+1] = a[i+1] - b[i+1];
+}
+}
+
+/* 
+** d1:
+** ...
+**	fneg	v[0-9]+.4s, v[0-9]+.4s
+**	fadd	v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h
+** ...
+*/
+void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res, int n)
+{
+   for (int i = 0; i < (n & -8); i+=2)
+{
+  res[i+0] = a[i+0] + b[i+0];
+  res[i+1] = a[i+1] - b[i+1];
+}
+}
+
+/* 
+** e1:
+** ...
+**	fadd	v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
+**	fsub	v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
+**	ins	v[0-9]+.d\[1\], v[0-9]+.d\[1\]
+** ...
+*/
+void e1 (double *restrict a, double *restrict b, double *res, int n)
+{
+   for (int i = 0; i < (n & -4); i+=2)
+{
+

[PATCH] [testsuite][arm] Fix cmse-15.c expected output

2022-09-23 Thread Torbjörn SVENSSON via Gcc-patches
The cmse-15.c testcase fails at -Os because ICF (identical code folding)
means that we generate
secure3:
b   secure1

which is OK, but does not match the currently expected
secure3:
...
bx  r[0-3]

gcc/testsuite/ChangeLog:

* gcc.target/arm/cmse/cmse-15.c: Align with -Os improvements.

Co-Authored-By: Yvan ROUX  
Signed-off-by: Torbjörn SVENSSON  
---
 gcc/testsuite/gcc.target/arm/cmse/cmse-15.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-15.c b/gcc/testsuite/gcc.target/arm/cmse/cmse-15.c
index b0fefe561a1..5188f1d697f 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/cmse-15.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-15.c
@@ -144,6 +144,8 @@ int secure2 (s_bar_ptr s_bar_p)
 ** bx  r[0-3]
 ** |
 ** blx r[0-3]
+** |
+** b   secure1
 ** )
 ** ...
 */
-- 
2.25.1



Re: [PATCH v2 0/9] fortran: clobber fixes [PR41453]

2022-09-23 Thread Mikael Morin

On 22/09/2022 at 22:42, Harald Anlauf via Fortran wrote:

Hi Mikael,

thanks for this impressive series of patches.

Am 18.09.22 um 22:15 schrieb Mikael Morin via Fortran:

The first patch is a refactoring moving the clobber generation in
gfc_conv_procedure_call where it feels more appropriate.
The second patch is a fix for the ICE originally motivating my work
on this topic.
The third patch is a fix for some wrong code issue discovered with an
earlier version of this series.


This LGTM.  It also fixes a regression introduced with r9-3030 :-)
If you think that this set (1-3) is backportable, feel free to do so.


Yes, 2 and 3 are worth backporting, I will see how dependent they are on 1.


The following patches are gradual condition loosenings to enable clobber
generation in more and more cases.


Patches 4-8 all look fine.  Since they address missed optimizations,
they are probably something for mainline.

I was wondering if you could add a test for the change in patch 7
addressing the clobber generation for an associate-name, e.g. by
adding to testcase intent_optimize_7.f90 near the end:

   associate (av => ct)
     av = 111222333
     call foo(av)
   end associate
   if (ct /= 42) stop 3

plus the adjustments in the patterns.

Indeed, I didn't add a test because there was one already, but the
existing test doesn't check for clobber generation and store removal.
I prefer to create a new test though, so that the patch and the test 
come together, and the test for patch 8 is not encumbered with unrelated 
stuff.


By the way, the same could be said about patch 6.
I will create a test for that one as well.


Regarding patch 9, I do not see anything wrong with it, but then
independent eyes might see more.  I think it is ok for mainline
as is.


Each patch has been tested through an incremental bootstrap and a
partial testsuite run on fortran *intent* tests, and the whole lot has
been run through the full fortran regression testsuite.
OK for master?


Yes.


Thanks for the review.


Re: [PATCH] frange: drop endpoints to min/max representable numbers for -ffinite-math-only.

2022-09-23 Thread Richard Biener via Gcc-patches
On Fri, Sep 23, 2022 at 9:21 AM Aldy Hernandez  wrote:
>
> Ughhh, my bad.  I had reworked this as soon as Jakub said we couldn't
> cache the min/max in TYPE_MIN/MAX_VALUE, but forgot to send it.  And
> yes...the incessant wrapping was very annoying.  It's all fixed.
>
> Let me know what you think.

Much better.

Thanks,
Richard.

> Aldy
>
> On Fri, Sep 23, 2022 at 9:04 AM Richard Biener
>  wrote:
> >
> > On Thu, Sep 22, 2022 at 6:49 PM Aldy Hernandez  wrote:
> > >
> > > Similarly to how we drop NANs to UNDEFINED when -ffinite-math-only, I
> > > think we can drop the numbers outside of the min/max representable
> > > numbers to the representable number.
> > >
> > > This means the endpoints of VR_VARYING for -ffinite-math-only can now
> > > be the min/max representable values, instead of -INF and +INF.
> > >
> > > Saturating in the setter means that the upcoming implementation for
> > > binary operators no longer has to worry about doing the right
> > > thing for -ffinite-math-only.  If the range goes outside the limits,
> > > it'll get chopped down.
> > >
> > > How does this look?
> > >
> > > Tested on x86-64 Linux.
> > >
> > > gcc/ChangeLog:
> > >
> > > * range-op-float.cc (build_le): Use vrp_val_*.
> > > (build_lt): Same.
> > > (build_ge): Same.
> > > (build_gt): Same.
> > > * value-range.cc (frange::set): Chop ranges outside of the
> > > representable numbers for -ffinite-math-only.
> > > (frange::normalize_kind): Use vrp_val*.
> > > (frange::verify_range): Same.
> > > (frange::set_nonnegative): Same.
> > > (range_tests_floats): Remove tests that depend on -INF and +INF.
> > > * value-range.h (real_max_representable): Add prototype.
> > > (real_min_representable): Same.
> > > (vrp_val_max): Set max representable number for
> > > -ffinite-math-only.
> > > (vrp_val_min): Same but for min.
> > > (frange::set_varying): Use vrp_val*.
> > > ---
> > >  gcc/range-op-float.cc | 12 +++
> > >  gcc/value-range.cc| 46 ---
> > >  gcc/value-range.h | 30 ++--
> > >  3 files changed, 53 insertions(+), 35 deletions(-)
> > >
> > > diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
> > > index 2bd3dc9253f..15ba19c2deb 100644
> > > --- a/gcc/range-op-float.cc
> > > +++ b/gcc/range-op-float.cc
> > > @@ -232,7 +232,8 @@ build_le (frange &r, tree type, const frange &val)
> > >  {
> > >gcc_checking_assert (!val.known_isnan ());
> > >
> > > -  r.set (type, dconstninf, val.upper_bound ());
> > > +  REAL_VALUE_TYPE ninf = *TREE_REAL_CST_PTR (vrp_val_min (type));
> > > +  r.set (type, ninf, val.upper_bound ());
> > >
> > >// Add both zeros if there's the possibility of zero equality.
> > >frange_add_zeros (r, type);
> > > @@ -257,7 +258,8 @@ build_lt (frange &r, tree type, const frange &val)
> > >return false;
> > >  }
> > >// We only support closed intervals.
> > > -  r.set (type, dconstninf, val.upper_bound ());
> > > +  REAL_VALUE_TYPE ninf = *TREE_REAL_CST_PTR (vrp_val_min (type));
> > > +  r.set (type, ninf, val.upper_bound ());
> > >return true;
> > >  }
> > >
> > > @@ -268,7 +270,8 @@ build_ge (frange &r, tree type, const frange &val)
> > >  {
> > >gcc_checking_assert (!val.known_isnan ());
> > >
> > > -  r.set (type, val.lower_bound (), dconstinf);
> > > +  REAL_VALUE_TYPE inf = *TREE_REAL_CST_PTR (vrp_val_max (type));
> > > +  r.set (type, val.lower_bound (), inf);
> > >
> > >// Add both zeros if there's the possibility of zero equality.
> > >frange_add_zeros (r, type);
> > > @@ -294,7 +297,8 @@ build_gt (frange &r, tree type, const frange &val)
> > >  }
> > >
> > >// We only support closed intervals.
> > > -  r.set (type, val.lower_bound (), dconstinf);
> > > +  REAL_VALUE_TYPE inf = *TREE_REAL_CST_PTR (vrp_val_max (type));
> > > +  r.set (type, val.lower_bound (), inf);
> > >return true;
> > >  }
> > >
> > > diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> > > index 7e8028eced2..e57d60e1bac 100644
> > > --- a/gcc/value-range.cc
> > > +++ b/gcc/value-range.cc
> > > @@ -338,6 +338,18 @@ frange::set (tree min, tree max, value_range_kind kind)
> > >m_neg_nan = false;
> > >  }
> > >
> > > +  // For -ffinite-math-only we can drop ranges outside the
> > > +  // representable numbers to min/max for the type.
> > > +  if (flag_finite_math_only)
> > > +{
> > > +  REAL_VALUE_TYPE min_repr = *TREE_REAL_CST_PTR (vrp_val_min (m_type));
> > > +  REAL_VALUE_TYPE max_repr = *TREE_REAL_CST_PTR (vrp_val_max (m_type));
> > > +  if (real_less (&m_min, &min_repr))
> > > +   m_min = min_repr;
> > > +  if (real_less (&max_repr, &m_max))
> > > +   m_max = max_repr;
> >
> > I think you want to re-formulate that in terms of real_isinf() and
> > change those to the max representable values.
> >
> > > +}
> > > +
> > >// Check for swapped ranges.
> > 

Re: [PATCH] tree-optimization/106922 - missed FRE/PRE

2022-09-23 Thread Richard Biener via Gcc-patches
On Fri, 23 Sep 2022, Jakub Jelinek wrote:

> On Thu, Sep 22, 2022 at 01:10:08PM +0200, Richard Biener via Gcc-patches 
> wrote:
> > * g++.dg/tree-ssa/pr106922.C: Adjust.
> 
> > --- a/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
> > +++ b/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
> > @@ -87,5 +87,4 @@ void testfunctionfoo() {
> >}
> >  }
> >  
> > -// { dg-final { scan-tree-dump-times "Found fully redundant value" 4 "pre" { xfail { ! lp64 } } } }
> > -// { dg-final { scan-tree-dump-not "m_initialized" "cddce3" { xfail { ! lp64 } } } }
> > +// { dg-final { scan-tree-dump-not "m_initialized" "dce3" } }
> 
> I've noticed
> +UNRESOLVED: g++.dg/tree-ssa/pr106922.C  -std=gnu++20  scan-tree-dump-not dce3 "m_initialized"
> +UNRESOLVED: g++.dg/tree-ssa/pr106922.C  -std=gnu++2b  scan-tree-dump-not dce3 "m_initialized"
> with this change, both on x86_64 and i686.
> The dump is still cddce3, additionally as the last reference to the pre
> dump is gone, not sure it is worth creating that dump.

oops...

> With the following patch, there aren't FAILs nor UNRESOLVED tests with
> GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-g++ RUNTESTFLAGS="--target_board=unix\{-m32,-m64\} dg.exp='pr106922.C'"
> 
> Ok for trunk?

OK.

Thanks,
Richard.

> 2022-09-23  Jakub Jelinek  
> 
>   PR tree-optimization/106922
>   * g++.dg/tree-ssa/pr106922.C: Scan in cddce3 dump rather than
>   dce3.  Remove -fdump-tree-pre-details from dg-options.
> 
> --- gcc/testsuite/g++.dg/tree-ssa/pr106922.C.jj	2022-09-23 09:02:57.011311664 +0200
> +++ gcc/testsuite/g++.dg/tree-ssa/pr106922.C	2022-09-23 09:41:06.348797951 +0200
> @@ -1,5 +1,5 @@
>  // { dg-require-effective-target c++20 }
> -// { dg-options "-O2 -fdump-tree-pre-details -fdump-tree-cddce3" }
> +// { dg-options "-O2 -fdump-tree-cddce3" }
>  
>  template  struct __new_allocator {
>void deallocate(int *, int) { operator delete(0); }
> @@ -87,4 +87,4 @@ void testfunctionfoo() {
>}
>  }
>  
> -// { dg-final { scan-tree-dump-not "m_initialized" "dce3" } }
> +// { dg-final { scan-tree-dump-not "m_initialized" "cddce3" } }
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH] tree-optimization/106922 - missed FRE/PRE

2022-09-23 Thread Jakub Jelinek via Gcc-patches
On Thu, Sep 22, 2022 at 01:10:08PM +0200, Richard Biener via Gcc-patches wrote:
>   * g++.dg/tree-ssa/pr106922.C: Adjust.

> --- a/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr106922.C
> @@ -87,5 +87,4 @@ void testfunctionfoo() {
>}
>  }
>  
> -// { dg-final { scan-tree-dump-times "Found fully redundant value" 4 "pre" { xfail { ! lp64 } } } }
> -// { dg-final { scan-tree-dump-not "m_initialized" "cddce3" { xfail { ! lp64 } } } }
> +// { dg-final { scan-tree-dump-not "m_initialized" "dce3" } }

I've noticed
+UNRESOLVED: g++.dg/tree-ssa/pr106922.C  -std=gnu++20  scan-tree-dump-not dce3 "m_initialized"
+UNRESOLVED: g++.dg/tree-ssa/pr106922.C  -std=gnu++2b  scan-tree-dump-not dce3 "m_initialized"
with this change, both on x86_64 and i686.
The dump is still cddce3, additionally as the last reference to the pre
dump is gone, not sure it is worth creating that dump.

With the following patch, there aren't FAILs nor UNRESOLVED tests with
GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-g++ RUNTESTFLAGS="--target_board=unix\{-m32,-m64\} dg.exp='pr106922.C'"

Ok for trunk?

2022-09-23  Jakub Jelinek  

PR tree-optimization/106922
* g++.dg/tree-ssa/pr106922.C: Scan in cddce3 dump rather than
dce3.  Remove -fdump-tree-pre-details from dg-options.

--- gcc/testsuite/g++.dg/tree-ssa/pr106922.C.jj	2022-09-23 09:02:57.011311664 +0200
+++ gcc/testsuite/g++.dg/tree-ssa/pr106922.C	2022-09-23 09:41:06.348797951 +0200
@@ -1,5 +1,5 @@
 // { dg-require-effective-target c++20 }
-// { dg-options "-O2 -fdump-tree-pre-details -fdump-tree-cddce3" }
+// { dg-options "-O2 -fdump-tree-cddce3" }
 
 template  struct __new_allocator {
   void deallocate(int *, int) { operator delete(0); }
@@ -87,4 +87,4 @@ void testfunctionfoo() {
   }
 }
 
-// { dg-final { scan-tree-dump-not "m_initialized" "dce3" } }
+// { dg-final { scan-tree-dump-not "m_initialized" "cddce3" } }


Jakub



Re: [PATCH v3 06/11] OpenMP: Pointers and member mappings

2022-09-23 Thread Julian Brown
On Thu, 22 Sep 2022 15:17:08 +0200
Jakub Jelinek  wrote:

> > +  bool built_sym_hash = false;  
> 
> So, I think usually we don't construct such hash_maps right away,
> but have just pointer to the hash map initialized to NULL (then you
> don't need to built_sym_hash next to it) and you simply new the
> hash_map when needed the first time and delete it at the end (which
> does nothing if it is NULL).

How about this version? (Re-tested.)

Thanks,

Julian
From 76ec4e82c3d7918f670d2048ad625c98b3f23921 Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Tue, 31 May 2022 18:39:00 +
Subject: [PATCH] OpenMP: Pointers and member mappings

Implementing the "omp declare mapper" functionality, I noticed some
cases where handling of derived type members that are pointers doesn't
seem to be quite right. At present, a type such as this:

  type T
  integer, pointer, dimension(:) :: arrptr
  end type T

  type(T) :: tvar
  [...]
  !$omp target map(tofrom: tvar%arrptr)

will be mapped using three mapping nodes:

  GOMP_MAP_TO tvar%arrptr   (the descriptor)
  GOMP_MAP_TOFROM *tvar%arrptr%data (the actual array data)
  GOMP_MAP_ALWAYS_POINTER tvar%arrptr%data  (a pointer to the array data)

This follows OMP 5.0, 2.19.7.1 "map Clause":

  "If a list item in a map clause is an associated pointer and the
   pointer is not the base pointer of another list item in a map clause
   on the same construct, then it is treated as if its pointer target
   is implicitly mapped in the same clause. For the purposes of the map
   clause, the mapped pointer target is treated as if its base pointer
   is the associated pointer."

However, we can also write this:

  map(to: tvar%arrptr) map(tofrom: tvar%arrptr(3:8))

and then instead we should follow:

  "If the structure sibling list item is a pointer then it is treated
   as if its association status is undefined, unless it appears as
   the base pointer of another list item in a map clause on the same
   construct."

But, that's not implemented quite right at the moment (and completely
breaks once we introduce declare mappers), because we still map the "to:
tvar%arrptr" as the descriptor and the entire array, then we map the
"tvar%arrptr(3:8)" part using the descriptor (again!) and the array slice.

The solution is to detect when we're mapping a smaller part of the array
(or a subcomponent) on the same directive, and only map the descriptor
in that case. So we get mappings like this instead:

  map(to: tvar%arrptr)   -->
  GOMP_MAP_ALLOC  tvar%arrptr  (the descriptor)

  map(tofrom: tvar%arrptr(3:8)   -->
  GOMP_MAP_TOFROM tvar%arrptr%data(3) (size 8-3+1, etc.)
  GOMP_MAP_ALWAYS_POINTER tvar%arrptr%data (bias 3, etc.)

This version of the patch builds a hash table separating candidate
clauses for dependency checking by root symbol, to alleviate potential
quadratic behaviour.

2022-09-23  Julian Brown  

gcc/fortran/
	* gfortran.h (gfc_omp_namelist): Add "duplicate_of" field to "u2"
	union.
	* trans-openmp.cc (dependency.h): Include.
	(gfc_trans_omp_array_section): Do not map descriptors here for OpenMP.
	(gfc_symbol_rooted_namelist): New function.
	(gfc_trans_omp_clauses): Check subcomponent and subarray/element
	accesses elsewhere in the clause list for pointers to derived types or
	array descriptors, and map just the pointer/descriptor if we have any.

libgomp/
	* testsuite/libgomp.fortran/map-subarray.f90: New test.
	* testsuite/libgomp.fortran/map-subarray-2.f90: New test.
	* testsuite/libgomp.fortran/map-subcomponents.f90: New test.
	* testsuite/libgomp.fortran/struct-elem-map-1.f90: Adjust for
	descriptor-mapping changes.
---
 gcc/fortran/gfortran.h|   1 +
 gcc/fortran/trans-openmp.cc   | 205 --
 .../libgomp.fortran/map-subarray-2.f90| 108 +
 .../libgomp.fortran/map-subarray.f90  |  33 +++
 .../libgomp.fortran/map-subcomponents.f90 |  35 +++
 .../libgomp.fortran/struct-elem-map-1.f90 |  10 +-
 6 files changed, 375 insertions(+), 17 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.fortran/map-subarray-2.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/map-subarray.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/map-subcomponents.f90

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 4babd77924b..fe8c4e131f3 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1358,6 +1358,7 @@ typedef struct gfc_omp_namelist
 {
   struct gfc_omp_namelist_udr *udr;
   gfc_namespace *ns;
+  struct gfc_omp_namelist *duplicate_of;
 } u2;
   struct gfc_omp_namelist *next;
   locus where;
diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 8e9d5346b05..c83929fa06b 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "omp-general.h"
 #include "omp-low.h"
 #include "memmodel.h"  /* For 

Re: [PATCH] frange: drop endpoints to min/max representable numbers for -ffinite-math-only.

2022-09-23 Thread Aldy Hernandez via Gcc-patches
Ughhh, my bad.  I had reworked this as soon as Jakub said we couldn't
cache the min/max in TYPE_MIN/MAX_VALUE, but forgot to send it.  And
yes...the incessant wrapping was very annoying.  It's all fixed.

Let me know what you think.
Aldy

On Fri, Sep 23, 2022 at 9:04 AM Richard Biener
 wrote:
>
> On Thu, Sep 22, 2022 at 6:49 PM Aldy Hernandez  wrote:
> >
> > Similarly to how we drop NANs to UNDEFINED when -ffinite-math-only, I
> > think we can drop the numbers outside of the min/max representable
> > numbers to the representable number.
> >
> > This means the endpoints of VR_VARYING for -ffinite-math-only can now
> > be the min/max representable values, instead of -INF and +INF.
> >
> > Saturating in the setter means that the upcoming implementation for
> > binary operators no longer has to worry about doing the right
> > thing for -ffinite-math-only.  If the range goes outside the limits,
> > it'll get chopped down.
> >
> > How does this look?
> >
> > Tested on x86-64 Linux.
> >
> > gcc/ChangeLog:
> >
> > * range-op-float.cc (build_le): Use vrp_val_*.
> > (build_lt): Same.
> > (build_ge): Same.
> > (build_gt): Same.
> > * value-range.cc (frange::set): Chop ranges outside of the
> > representable numbers for -ffinite-math-only.
> > (frange::normalize_kind): Use vrp_val*.
> > (frange::verify_range): Same.
> > (frange::set_nonnegative): Same.
> > (range_tests_floats): Remove tests that depend on -INF and +INF.
> > * value-range.h (real_max_representable): Add prototype.
> > (real_min_representable): Same.
> > (vrp_val_max): Set max representable number for
> > -ffinite-math-only.
> > (vrp_val_min): Same but for min.
> > (frange::set_varying): Use vrp_val*.
> > ---
> >  gcc/range-op-float.cc | 12 +++
> >  gcc/value-range.cc| 46 ---
> >  gcc/value-range.h | 30 ++--
> >  3 files changed, 53 insertions(+), 35 deletions(-)
> >
> > diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
> > index 2bd3dc9253f..15ba19c2deb 100644
> > --- a/gcc/range-op-float.cc
> > +++ b/gcc/range-op-float.cc
> > @@ -232,7 +232,8 @@ build_le (frange &r, tree type, const frange &val)
> >  {
> >gcc_checking_assert (!val.known_isnan ());
> >
> > -  r.set (type, dconstninf, val.upper_bound ());
> > +  REAL_VALUE_TYPE ninf = *TREE_REAL_CST_PTR (vrp_val_min (type));
> > +  r.set (type, ninf, val.upper_bound ());
> >
> >// Add both zeros if there's the possibility of zero equality.
> >frange_add_zeros (r, type);
> > @@ -257,7 +258,8 @@ build_lt (frange &r, tree type, const frange &val)
> >return false;
> >  }
> >// We only support closed intervals.
> > -  r.set (type, dconstninf, val.upper_bound ());
> > +  REAL_VALUE_TYPE ninf = *TREE_REAL_CST_PTR (vrp_val_min (type));
> > +  r.set (type, ninf, val.upper_bound ());
> >return true;
> >  }
> >
> > @@ -268,7 +270,8 @@ build_ge (frange &r, tree type, const frange &val)
> >  {
> >gcc_checking_assert (!val.known_isnan ());
> >
> > -  r.set (type, val.lower_bound (), dconstinf);
> > +  REAL_VALUE_TYPE inf = *TREE_REAL_CST_PTR (vrp_val_max (type));
> > +  r.set (type, val.lower_bound (), inf);
> >
> >// Add both zeros if there's the possibility of zero equality.
> >frange_add_zeros (r, type);
> > @@ -294,7 +297,8 @@ build_gt (frange &r, tree type, const frange &val)
> >  }
> >
> >// We only support closed intervals.
> > -  r.set (type, val.lower_bound (), dconstinf);
> > +  REAL_VALUE_TYPE inf = *TREE_REAL_CST_PTR (vrp_val_max (type));
> > +  r.set (type, val.lower_bound (), inf);
> >return true;
> >  }
> >
> > diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> > index 7e8028eced2..e57d60e1bac 100644
> > --- a/gcc/value-range.cc
> > +++ b/gcc/value-range.cc
> > @@ -338,6 +338,18 @@ frange::set (tree min, tree max, value_range_kind kind)
> >m_neg_nan = false;
> >  }
> >
> > +  // For -ffinite-math-only we can drop ranges outside the
> > +  // representable numbers to min/max for the type.
> > +  if (flag_finite_math_only)
> > +{
> > +  REAL_VALUE_TYPE min_repr = *TREE_REAL_CST_PTR (vrp_val_min (m_type));
> > +  REAL_VALUE_TYPE max_repr = *TREE_REAL_CST_PTR (vrp_val_max (m_type));
> > +  if (real_less (_min, _repr))
> > +   m_min = min_repr;
> > +  if (real_less (_repr, _max))
> > +   m_max = max_repr;
>
> I think you want to re-formulate that in terms of real_isinf() and
> change those to the max representable values.
>
> > +}
> > +
> >// Check for swapped ranges.
> >gcc_checking_assert (tree_compare (LE_EXPR, min, max));
> >
> > @@ -371,8 +383,8 @@ bool
> >  frange::normalize_kind ()
> >  {
> >if (m_kind == VR_RANGE
> > -  && real_isinf (_min, 1)
> > -  && real_isinf (_max, 0))
> > +  && vrp_val_is_min (build_real (m_type, m_min))
> > +  && vrp_val_is_max (build_real (m_type, m_max)))
>
> 

Re: [PATCH] frange: drop endpoints to min/max representable numbers for -ffinite-math-only.

2022-09-23 Thread Richard Biener via Gcc-patches
On Thu, Sep 22, 2022 at 6:49 PM Aldy Hernandez  wrote:
>
> Similarly to how we drop NANs to UNDEFINED for -ffinite-math-only, I
> think we can drop numbers outside of the min/max representable range
> to the nearest representable number.
>
> This means the endpoints of VR_VARYING for -ffinite-math-only can now
> be the min/max representable, instead of -INF and +INF.
>
> Saturating in the setter means that the upcoming implementation of
> binary operators no longer has to worry about doing the right
> thing for -ffinite-math-only.  If the range goes outside the limits,
> it'll get chopped down.
>
> How does this look?
>
> Tested on x86-64 Linux.
>
> gcc/ChangeLog:
>
> * range-op-float.cc (build_le): Use vrp_val_*.
> (build_lt): Same.
> (build_ge): Same.
> (build_gt): Same.
> * value-range.cc (frange::set): Chop ranges outside of the
> representable numbers for -ffinite-math-only.
> (frange::normalize_kind): Use vrp_val*.
> (frange::verify_range): Same.
> (frange::set_nonnegative): Same.
> (range_tests_floats): Remove tests that depend on -INF and +INF.
> * value-range.h (real_max_representable): Add prototype.
> (real_min_representable): Same.
> (vrp_val_max): Set max representable number for
> -ffinite-math-only.
> (vrp_val_min): Same but for min.
> (frange::set_varying): Use vrp_val*.
> ---
>  gcc/range-op-float.cc | 12 +++
>  gcc/value-range.cc| 46 ---
>  gcc/value-range.h | 30 ++--
>  3 files changed, 53 insertions(+), 35 deletions(-)
>
> diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
> index 2bd3dc9253f..15ba19c2deb 100644
> --- a/gcc/range-op-float.cc
> +++ b/gcc/range-op-float.cc
> @@ -232,7 +232,8 @@ build_le (frange &r, tree type, const frange &val)
>  {
>gcc_checking_assert (!val.known_isnan ());
>
> -  r.set (type, dconstninf, val.upper_bound ());
> +  REAL_VALUE_TYPE ninf = *TREE_REAL_CST_PTR (vrp_val_min (type));
> +  r.set (type, ninf, val.upper_bound ());
>
>// Add both zeros if there's the possibility of zero equality.
>frange_add_zeros (r, type);
> @@ -257,7 +258,8 @@ build_lt (frange &r, tree type, const frange &val)
>return false;
>  }
>// We only support closed intervals.
> -  r.set (type, dconstninf, val.upper_bound ());
> +  REAL_VALUE_TYPE ninf = *TREE_REAL_CST_PTR (vrp_val_min (type));
> +  r.set (type, ninf, val.upper_bound ());
>return true;
>  }
>
> @@ -268,7 +270,8 @@ build_ge (frange &r, tree type, const frange &val)
>  {
>gcc_checking_assert (!val.known_isnan ());
>
> -  r.set (type, val.lower_bound (), dconstinf);
> +  REAL_VALUE_TYPE inf = *TREE_REAL_CST_PTR (vrp_val_max (type));
> +  r.set (type, val.lower_bound (), inf);
>
>// Add both zeros if there's the possibility of zero equality.
>frange_add_zeros (r, type);
> @@ -294,7 +297,8 @@ build_gt (frange &r, tree type, const frange &val)
>  }
>
>// We only support closed intervals.
> -  r.set (type, val.lower_bound (), dconstinf);
> +  REAL_VALUE_TYPE inf = *TREE_REAL_CST_PTR (vrp_val_max (type));
> +  r.set (type, val.lower_bound (), inf);
>return true;
>  }
>
> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> index 7e8028eced2..e57d60e1bac 100644
> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -338,6 +338,18 @@ frange::set (tree min, tree max, value_range_kind kind)
>m_neg_nan = false;
>  }
>
> +  // For -ffinite-math-only we can drop ranges outside the
> +  // representable numbers to min/max for the type.
> +  if (flag_finite_math_only)
> +{
> +  REAL_VALUE_TYPE min_repr = *TREE_REAL_CST_PTR (vrp_val_min (m_type));
> +  REAL_VALUE_TYPE max_repr = *TREE_REAL_CST_PTR (vrp_val_max (m_type));
> +  if (real_less (&m_min, &min_repr))
> +   m_min = min_repr;
> +  if (real_less (&max_repr, &m_max))
> +   m_max = max_repr;

I think you want to re-formulate that in terms of real_isinf() and
change those to the max representable values.

> +}
> +
>// Check for swapped ranges.
>gcc_checking_assert (tree_compare (LE_EXPR, min, max));
>
> @@ -371,8 +383,8 @@ bool
>  frange::normalize_kind ()
>  {
>if (m_kind == VR_RANGE
> -  && real_isinf (&m_min, 1)
> -  && real_isinf (&m_max, 0))
> +  && vrp_val_is_min (build_real (m_type, m_min))
> +  && vrp_val_is_max (build_real (m_type, m_max)))

I think this suggests that keeping +-Inf might be the better choice.
At least you want to change vrp_val_is_min/max to take a REAL_VALUE_TYPE
to avoid building four tree nodes for just this compare?

>  {
>if (m_pos_nan && m_neg_nan)
> {
> @@ -385,8 +397,8 @@ frange::normalize_kind ()
>if (!m_pos_nan || !m_neg_nan)
> {
>   m_kind = VR_RANGE;
> - m_min = dconstninf;
> - m_max = dconstinf;
> + m_min = *TREE_REAL_CST_PTR (vrp_val_min (m_type));
> + m_max = 

Re: [PATCH] Add debug functions for REAL_VALUE_TYPE.

2022-09-23 Thread Aldy Hernandez via Gcc-patches
On Fri, Sep 23, 2022 at 8:54 AM Richard Biener
 wrote:
>
> On Thu, Sep 22, 2022 at 6:48 PM Aldy Hernandez via Gcc-patches
>  wrote:
> >
> > We currently have no way of dumping REAL_VALUE_TYPEs when debugging.
> >
> > Tested on a gdb session examining the real value 10.0:
> >
> > (gdb) p min
> > $9 = {cl = 1, decimal = 0, sign = 0, signalling = 0, canonical = 0, uexp = 4, sig = {0, 0, 11529215046068469760}}
> > (gdb) p debug (min)
> > 0x0.ap+4
> >
> > OK for trunk?
>
> I'd say the reference taking variant is enough (just remember to do
> debug (*val)),

I added the pointer one because all of real.cc takes pointers
argument, but yeah...let's just nuke the pointer variant.  I hate that
we have various variants in VRP (my fault).

Thanks.
Aldy

> but OK (maybe simplify the pointer variant by forwarding instead of 
> duplicating)
>
> Richard.
>
> >
> > gcc/ChangeLog:
> >
> > * real.cc (debug): New.
> > ---
> >  gcc/real.cc | 16 
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/gcc/real.cc b/gcc/real.cc
> > index 73bbac645d9..a31b256a47b 100644
> > --- a/gcc/real.cc
> > +++ b/gcc/real.cc
> > @@ -1900,6 +1900,22 @@ real_to_decimal (char *str, const REAL_VALUE_TYPE *r_orig, size_t buf_size,
> > digits, crop_trailing_zeros, VOIDmode);
> >  }
> >
> > +DEBUG_FUNCTION void
> > +debug (const REAL_VALUE_TYPE *r)
> > +{
> > +  char s[60];
> > +  real_to_hexadecimal (s, r, sizeof (s), 0, 1);
> > +  fprintf (stderr, "%s\n", s);
> > +}
> > +
> > +DEBUG_FUNCTION void
> > +debug (const REAL_VALUE_TYPE &r)
> > +{
> > +  char s[60];
> > +  real_to_hexadecimal (s, &r, sizeof (s), 0, 1);
> > +  fprintf (stderr, "%s\n", s);
> > +}
> > +
> >  /* Render R as a hexadecimal floating point constant.  Emit DIGITS
> > significant digits in the result, bounded by BUF_SIZE.  If DIGITS is 0,
> > choose the maximum for the representation.  If CROP_TRAILING_ZEROS,
> > --
> > 2.37.1
> >
>



Re: [PATCH] frange: dump hex values when dumping FP numbers.

2022-09-23 Thread Aldy Hernandez via Gcc-patches
On Thu, Sep 22, 2022 at 11:04 PM Jakub Jelinek  wrote:
>
> On Thu, Sep 22, 2022 at 06:49:10PM +0200, Aldy Hernandez wrote:
> > It has been suggested that if we start bumping numbers by an ULP when
> > calculating open ranges (for example the numbers less than 3.0) that
> > dumping these will become increasingly harder to read, and instead we
> > should opt for the hex representation.  I still find the floating
> > point representation easier to read for most numbers, but perhaps we
> > could have both?
> >
> > With this patch this is the representation for [15.0, 20.0]:
> >
> >  [frange] float [1.5e+1 (0x0.fp+4), 2.0e+1 (0x0.ap+5)]
> >
> > Would you find this useful, or should we stick to the hex
> > representation only (or something altogether different)?
>
> I think dumping both is the way to go, but real_to_hexadecimal doesn't
> do anything useful with decimal floats, so that part should be
> guarded on !DECIMAL_FLOAT_TYPE_P (type).

Remember that decimal floats are not supported for frange, but I
suppose it's good form to guard the print for when we do.

>
> Why do you build a tree + dump_generic_node for decimal instead of
> real_to_decimal_for_mode ?

Honestly, cause I was too lazy.  That code is inherited from irange
and I figured it's just a dumping routine, but I'm happy to fix it.

> The former I think calls:
> char string[100];
> real_to_decimal (string, &d, sizeof (string), 0, 1);
> so perhaps:
>   char s[100];
>   real_to_decimal_for_mode (s, &r, sizeof (string), 0, 1, TYPE_MODE (type));
>   pp_string (pp, "%s", s);
>   if (!DECIMAL_FLOAT_TYPE_P (type))
> {
>   real_to_hexadecimal (s, &r, sizeof (s), 0, 1);
>   pp_printf (pp, " (%s)", s);
> }
> ?

Thanks.

I'm retesting the following and will commit if it succeeds since we
seem to have overwhelming consensus :).
Aldy
From 2f052904412bbe5821ee310067ad76b52980d8e3 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Thu, 22 Sep 2022 18:07:03 +0200
Subject: [PATCH] frange: dump hex values when dumping FP numbers.

It has been suggested that if we start bumping numbers by an ULP when
calculating open ranges (for example the numbers less than 3.0) that
dumping these will become increasingly harder to read, and instead we
should opt for the hex representation.  I still find the floating
point representation easier to read for most numbers, but perhaps we
could have both?

With this patch this is the representation for [15.0, 20.0]:

 [frange] float [1.5e+1 (0x0.fp+4), 2.0e+1 (0x0.ap+5)]

Would you find this useful, or should we stick to the hex
representation only?

Tested on x86-64 Linux.

gcc/ChangeLog:

	* value-range-pretty-print.cc (vrange_printer::print_real_value): New.
	(vrange_printer::visit): Call print_real_value.
	* value-range-pretty-print.h: New print_real_value.
---
 gcc/value-range-pretty-print.cc | 19 +++
 gcc/value-range-pretty-print.h  |  1 +
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/gcc/value-range-pretty-print.cc b/gcc/value-range-pretty-print.cc
index eb7442229ba..8cbe97b76fd 100644
--- a/gcc/value-range-pretty-print.cc
+++ b/gcc/value-range-pretty-print.cc
@@ -117,6 +117,19 @@ vrange_printer::print_irange_bitmasks (const irange &r) const
   pp_string (pp, buf);
 }
 
+void
+vrange_printer::print_real_value (tree type, const REAL_VALUE_TYPE &r) const
+{
+  char s[100];
+  real_to_decimal_for_mode (s, &r, sizeof (s), 0, 1, TYPE_MODE (type));
+  pp_string (pp, s);
+  if (!DECIMAL_FLOAT_TYPE_P (type))
+{
+  real_to_hexadecimal (s, &r, sizeof (s), 0, 1);
+  pp_printf (pp, " (%s)", s);
+}
+}
+
 // Print an frange.
 
 void
@@ -141,11 +154,9 @@ vrange_printer::visit (const frange &r) const
   bool has_endpoints = !r.known_isnan ();
   if (has_endpoints)
 {
-  dump_generic_node (pp,
-			 build_real (type, r.lower_bound ()), 0, TDF_NONE, false);
+  print_real_value (type, r.lower_bound ());
   pp_string (pp, ", ");
-  dump_generic_node (pp,
-			 build_real (type, r.upper_bound ()), 0, TDF_NONE, false);
+  print_real_value (type, r.upper_bound ());
 }
   pp_character (pp, ']');
   print_frange_nan (r);
diff --git a/gcc/value-range-pretty-print.h b/gcc/value-range-pretty-print.h
index 20c26598fe7..a9ae5a7b4cc 100644
--- a/gcc/value-range-pretty-print.h
+++ b/gcc/value-range-pretty-print.h
@@ -32,6 +32,7 @@ private:
   void print_irange_bound (const wide_int &w, tree type) const;
   void print_irange_bitmasks (const irange &) const;
   void print_frange_nan (const frange &) const;
+  void print_real_value (tree type, const REAL_VALUE_TYPE &r) const;
 
   pretty_printer *pp;
 };
-- 
2.37.1



Re: [PATCH] Add debug functions for REAL_VALUE_TYPE.

2022-09-23 Thread Richard Biener via Gcc-patches
On Thu, Sep 22, 2022 at 6:48 PM Aldy Hernandez via Gcc-patches
 wrote:
>
> We currently have no way of dumping REAL_VALUE_TYPEs when debugging.
>
> Tested on a gdb session examining the real value 10.0:
>
> (gdb) p min
> $9 = {cl = 1, decimal = 0, sign = 0, signalling = 0, canonical = 0, uexp = 4, sig = {0, 0, 11529215046068469760}}
> (gdb) p debug (min)
> 0x0.ap+4
>
> OK for trunk?

I'd say the reference taking variant is enough (just remember to do
debug (*val)),
but OK (maybe simplify the pointer variant by forwarding instead of duplicating)

Richard.

>
> gcc/ChangeLog:
>
> * real.cc (debug): New.
> ---
>  gcc/real.cc | 16 
>  1 file changed, 16 insertions(+)
>
> diff --git a/gcc/real.cc b/gcc/real.cc
> index 73bbac645d9..a31b256a47b 100644
> --- a/gcc/real.cc
> +++ b/gcc/real.cc
> @@ -1900,6 +1900,22 @@ real_to_decimal (char *str, const REAL_VALUE_TYPE *r_orig, size_t buf_size,
> digits, crop_trailing_zeros, VOIDmode);
>  }
>
> +DEBUG_FUNCTION void
> +debug (const REAL_VALUE_TYPE *r)
> +{
> +  char s[60];
> +  real_to_hexadecimal (s, r, sizeof (s), 0, 1);
> +  fprintf (stderr, "%s\n", s);
> +}
> +
> +DEBUG_FUNCTION void
> +debug (const REAL_VALUE_TYPE &r)
> +{
> +  char s[60];
> +  real_to_hexadecimal (s, &r, sizeof (s), 0, 1);
> +  fprintf (stderr, "%s\n", s);
> +}
> +
>  /* Render R as a hexadecimal floating point constant.  Emit DIGITS
> significant digits in the result, bounded by BUF_SIZE.  If DIGITS is 0,
> choose the maximum for the representation.  If CROP_TRAILING_ZEROS,
> --
> 2.37.1
>


Re: [PATCH] [x86] Support 2-instruction vector shuffle for V4SI/V4SF in ix86_expand_vec_perm_const_1.

2022-09-23 Thread Jakub Jelinek via Gcc-patches
On Fri, Sep 23, 2022 at 02:42:54PM +0800, liuhongt via Gcc-patches wrote:
> 2022-09-23  Hongtao Liu  
>   Liwei Xu  
> 
> gcc/ChangeLog:
> 
>   PR target/53346
>   * config/i386/i386-expand.cc (expand_vec_perm_shufps_shufps):
>   New function.
>   (ix86_expand_vec_perm_const_1): Insert
>   expand_vec_perm_shufps_shufps at the end of 2-instruction
>   expand sequence.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/pr53346-1.c: New test.
>   * gcc.target/i386/pr53346-2.c: New test.
> ---
>  gcc/config/i386/i386-expand.cc| 117 ++
>  gcc/testsuite/gcc.target/i386/pr53346-1.c |  70 +
>  gcc/testsuite/gcc.target/i386/pr53346-2.c |  59 +++
>  gcc/testsuite/gcc.target/i386/pr53346-3.c |  69 +
>  gcc/testsuite/gcc.target/i386/pr53346-4.c |  59 +++
>  5 files changed, 374 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr53346-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr53346-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr53346-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr53346-4.c
> 
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 5334363e235..43c58111a62 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -19604,6 +19604,120 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
>return false;
>  }
>  
> +/* A subroutine of ix86_expand_vec_perm_const_1. Try to implement D
> +   in terms of a pair of shufps+ shufps/pshufd instructions. */
> +static bool
> +expand_vec_perm_shufps_shufps (struct expand_vec_perm_d *d)
> +{
> +  unsigned char perm1[4];
> +  machine_mode vmode = d->vmode;
> +  bool ok;
> +  unsigned i, j, k, count = 0;
> +
> +  if (d->one_operand_p
> +  || (vmode != V4SImode && vmode != V4SFmode))
> +return false;
> +
> +  if (d->testing_p)
> +return true;
> +
> +  for (i = 0; i < 4; ++i)
> +count += d->perm[i] > 3 ? 1 : 0;
> +
> +  gcc_assert(count & 3);

Missing space before (
> +  /* shufps.  */
> +  ok = expand_vselect_vconcat(tmp, d->op0, d->op1,
> +   perm1, d->nelt, false);

Ditto.

> +  /* When lone_idx is not 0, it must be from the second op (count == 1).  */
> +  gcc_assert ((lone_idx == 0 && count == 3)
> +   || (lone_idx != 0 && count == 1));

Perhaps write it more simply as
  gcc_assert (count == (lone_idx ? 1 : 3));
?

> +  /* shufps.  */
> +  ok = expand_vselect_vconcat(tmp, d->op0, d->op1,
> +   perm1, d->nelt, false);

Missing space before (

> +  gcc_assert (ok);
> +
> +  /* Refine lone and pair index to original order.  */
> +  perm1[shift] = lone_idx << 1;
> +  perm1[shift + 1] = pair_idx << 1;
> +
> +  /* Select the remaining 2 elements in another vector.  */
> +  for (i = 2 - shift; i < 4 - shift; ++i)
> + perm1[i] = (lone_idx == 1) ? (d->perm[i] + 4) : d->perm[i];

All the ()s in the above line aren't needed.

> +  /* shufps.  */
> +  ok = expand_vselect_vconcat(d->target, tmp, d->op1,
> +   perm1, d->nelt, false);

Again, missing space

Otherwise LGTM

Jakub



Re: [PATCH] opts: fix --help=common with '\t' description

2022-09-23 Thread Richard Biener via Gcc-patches
On Thu, Sep 22, 2022 at 3:59 PM Martin Liška  wrote:
>
> Fixes -flto-compression option:
>
> -  -flto-compression-level=<number> Use z Use zlib/zstd compression level <number> for IL.
> +  -flto-compression-level=<0,19> Use zlib/zstd compression level <number> for IL.
>
> Ready for master?

OK

> Thanks,
> Martin
>
> ---
>  gcc/common.opt | 2 +-
>  gcc/opts.cc| 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 06ef768ab78..296d6f194bf 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2106,7 +2106,7 @@ Specify the algorithm to partition symbols and vars at 
> linktime.
>  ; The initial value of -1 comes from Z_DEFAULT_COMPRESSION in zlib.h.
>  flto-compression-level=
>  Common Joined RejectNegative UInteger Var(flag_lto_compression_level) 
> Init(-1) IntegerRange(0, 19)
> --flto-compression-level=<number>   Use zlib/zstd compression level <number> for IL.
> +Use zlib/zstd compression level <number> for IL.
>
>  flto-odr-type-merging
>  Common Ignore
> diff --git a/gcc/opts.cc b/gcc/opts.cc
> index e058aaf3697..eb5db01de17 100644
> --- a/gcc/opts.cc
> +++ b/gcc/opts.cc
> @@ -1801,7 +1801,7 @@ print_filtered_help (unsigned int include_flags,
>   help = new_help;
> }
>
> -  if (option->range_max != -1)
> +  if (option->range_max != -1 && tab == NULL)
> {
>   char b[128];
>   snprintf (b, sizeof (b), "<%d,%d>", option->range_min,
> --
> 2.37.3
>


Re: [PATCH] attribs: Improve diagnostics

2022-09-23 Thread Richard Biener via Gcc-patches
On Fri, 23 Sep 2022, Jakub Jelinek wrote:

> Hi!
> 
> When looking at the attribs code, I've noticed weird diagnostics
> like
> int a __attribute__((section ("foo", "bar")));
a.c:1:1: error: wrong number of arguments specified for ‘section’ attribute
> 1 | int a __attribute__((section ("foo", "bar")));
>   | ^~~
> a.c:1:1: note: expected between 1 and 1, found 2
> As roughly 50% of attributes that accept any arguments have
> spec->min_length == spec->max_length, I think it is worth it to have
> separate wording for such common case and just write simpler
> a.c:1:1: note: expected 1, found 2
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2022-09-23  Jakub Jelinek  
> 
>   * attribs.cc (decl_attributes): Improve diagnostics, instead of
>   saying expected between 1 and 1, found 2 just say expected 1, found 2.
> 
> --- gcc/attribs.cc.jj 2022-09-22 10:54:44.693705319 +0200
> +++ gcc/attribs.cc2022-09-22 18:18:38.142414100 +0200
> @@ -737,6 +737,9 @@ decl_attributes (tree *node, tree attrib
> if (spec->max_length < 0)
>   inform (input_location, "expected %i or more, found %i",
>   spec->min_length, nargs);
> +   else if (spec->min_length == spec->max_length)
> + inform (input_location, "expected %i, found %i",
> + spec->min_length, nargs);
> else
>   inform (input_location, "expected between %i and %i, found %i",
>   spec->min_length, spec->max_length, nargs);
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[PATCH] [x86] Support 2-instruction vector shuffle for V4SI/V4SF in ix86_expand_vec_perm_const_1.

2022-09-23 Thread liuhongt via Gcc-patches
x86 has shufps, which shuffles the first operand into the lower 64 bits
and the second operand into the upper 64 bits.  For
__builtin_shufflevector (op0, op1, 1, 4, 3, 6), the permutation is
currently vector-lowered because can_vec_perm_const_p returns false for
SSE2 targets.
This patch adds a new function to support 2-operand V4SI/V4SF
vector shuffles with any index for SSE2.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

2022-09-23  Hongtao Liu  
Liwei Xu  

gcc/ChangeLog:

PR target/53346
* config/i386/i386-expand.cc (expand_vec_perm_shufps_shufps):
New function.
(ix86_expand_vec_perm_const_1): Insert
expand_vec_perm_shufps_shufps at the end of 2-instruction
expand sequence.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr53346-1.c: New test.
* gcc.target/i386/pr53346-2.c: New test.
---
 gcc/config/i386/i386-expand.cc| 117 ++
 gcc/testsuite/gcc.target/i386/pr53346-1.c |  70 +
 gcc/testsuite/gcc.target/i386/pr53346-2.c |  59 +++
 gcc/testsuite/gcc.target/i386/pr53346-3.c |  69 +
 gcc/testsuite/gcc.target/i386/pr53346-4.c |  59 +++
 5 files changed, 374 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr53346-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr53346-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr53346-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr53346-4.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 5334363e235..43c58111a62 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -19604,6 +19604,120 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
   return false;
 }
 
+/* A subroutine of ix86_expand_vec_perm_const_1. Try to implement D
+   in terms of a pair of shufps+ shufps/pshufd instructions. */
+static bool
+expand_vec_perm_shufps_shufps (struct expand_vec_perm_d *d)
+{
+  unsigned char perm1[4];
+  machine_mode vmode = d->vmode;
+  bool ok;
+  unsigned i, j, k, count = 0;
+
+  if (d->one_operand_p
+  || (vmode != V4SImode && vmode != V4SFmode))
+return false;
+
+  if (d->testing_p)
+return true;
+
+  for (i = 0; i < 4; ++i)
+count += d->perm[i] > 3 ? 1 : 0;
+
+  gcc_assert(count & 3);
+
+  rtx tmp = gen_reg_rtx (vmode);
+  /* 2 from op0 and 2 from op1.  */
+  if (count == 2)
+{
+  unsigned char perm2[4];
+  for (i = 0, j = 0, k = 2; i < 4; ++i)
+   if (d->perm[i] & 4)
+ {
+   perm1[k++] = d->perm[i];
+   perm2[i] = k - 1;
+ }
+   else
+ {
+   perm1[j++] = d->perm[i];
+   perm2[i] = j - 1;
+ }
+
+  /* shufps.  */
+  ok = expand_vselect_vconcat(tmp, d->op0, d->op1,
+ perm1, d->nelt, false);
+  gcc_assert (ok);
+  if (vmode == V4SImode && TARGET_SSE2)
+  /* pshufd.  */
+   ok = expand_vselect (d->target, tmp,
+perm2, d->nelt, false);
+  else
+   {
+ /* shufps.  */
+ perm2[2] += 4;
+ perm2[3] += 4;
+ ok = expand_vselect_vconcat (d->target, tmp, tmp,
+  perm2, d->nelt, false);
+   }
+  gcc_assert (ok);
+}
+  /* 3 from one op and 1 from another.  */
+  else
+{
+  unsigned pair_idx = 8, lone_idx = 8, shift;
+
+  /* Find the lone index.  */
+  for (i = 0; i < 4; ++i)
+   if ((d->perm[i] > 3 && count == 1)
+   || (d->perm[i] < 4 && count == 3))
+ lone_idx = i;
+
+  /* When lone_idx is not 0, it must be from the second op (count == 1).  */
+  gcc_assert ((lone_idx == 0 && count == 3)
+ || (lone_idx != 0 && count == 1));
+
+  /* Find the pair index that sits in the same half as the lone index.  */
+  shift = lone_idx & 2;
+  pair_idx = 1 - lone_idx + 2 * shift;
+
+  /* First permute the lone index and pair index into the same vector as
+[ lone, lone, pair, pair ].  */
+  perm1[1] = perm1[0]
+   = (count == 3) ? d->perm[lone_idx] : d->perm[lone_idx] - 4;
+  perm1[3] = perm1[2]
+   = (count == 3) ? d->perm[pair_idx] : d->perm[pair_idx] + 4;
+
+  /* Always put the vector containing the lone index first.  */
+  if (count == 1)
+   std::swap (d->op0, d->op1);
+
+  /* shufps.  */
+  ok = expand_vselect_vconcat(tmp, d->op0, d->op1,
+ perm1, d->nelt, false);
+  gcc_assert (ok);
+
+  /* Refine lone and pair index to original order.  */
+  perm1[shift] = lone_idx << 1;
+  perm1[shift + 1] = pair_idx << 1;
+
+  /* Select the remaining 2 elements in another vector.  */
+  for (i = 2 - shift; i < 4 - shift; ++i)
+   perm1[i] = (lone_idx == 1) ? (d->perm[i] + 4) : d->perm[i];
+
+  /* Adjust to original selector.  */
+  if (lone_idx > 1)
+   std::swap (tmp, d->op1);
+
+  /* shufps.  */
+  ok = 

[PATCH] attribs: Improve diagnostics

2022-09-23 Thread Jakub Jelinek via Gcc-patches
Hi!

When looking at the attribs code, I've noticed weird diagnostics
like
int a __attribute__((section ("foo", "bar")));
a.c:1:1: error: wrong number of arguments specified for ‘section’ attribute
1 | int a __attribute__((section ("foo", "bar")));
  | ^~~
a.c:1:1: note: expected between 1 and 1, found 2
As roughly 50% of attributes that accept any arguments have
spec->min_length == spec->max_length, I think it is worth it to have
separate wording for such common case and just write simpler
a.c:1:1: note: expected 1, found 2

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-09-23  Jakub Jelinek  

* attribs.cc (decl_attributes): Improve diagnostics, instead of
saying expected between 1 and 1, found 2 just say expected 1, found 2.

--- gcc/attribs.cc.jj   2022-09-22 10:54:44.693705319 +0200
+++ gcc/attribs.cc  2022-09-22 18:18:38.142414100 +0200
@@ -737,6 +737,9 @@ decl_attributes (tree *node, tree attrib
  if (spec->max_length < 0)
inform (input_location, "expected %i or more, found %i",
spec->min_length, nargs);
+ else if (spec->min_length == spec->max_length)
+   inform (input_location, "expected %i, found %i",
+   spec->min_length, nargs);
  else
inform (input_location, "expected between %i and %i, found %i",
spec->min_length, spec->max_length, nargs);


Jakub