[PATCH] gcc/configure: check for powerpc64le-unknown-freebsd

2021-10-15 Thread Piotr Kubaj
Only powerpc64-unknown-freebsd was checked for.

Signed-off-by: Piotr Kubaj 
---
 gcc/configure| 2 +-
 gcc/configure.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index 5ea5a1b7143..8790153cfda 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -30717,7 +30717,7 @@ $as_echo "#define HAVE_LD_PPC_GNU_ATTR_LONG_DOUBLE 1" 
>>confdefs.h
 esac
 
 case "$target:$tm_file" in
-  powerpc64-*-freebsd* | powerpc64*-*-linux* | 
powerpc*-*-linux*rs6000/biarch64.h*)
+  powerpc64*-*-freebsd* | powerpc64*-*-linux* | 
powerpc*-*-linux*rs6000/biarch64.h*)
   case "$target" in
  *le-*-linux*)
  emul_name="-melf64lppc"
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 344b2f586e8..c2cad0a3f40 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -6515,7 +6515,7 @@ EOF
 esac
 
 case "$target:$tm_file" in
-  powerpc64-*-freebsd* | powerpc64*-*-linux* | 
powerpc*-*-linux*rs6000/biarch64.h*)
+  powerpc64*-*-freebsd* | powerpc64*-*-linux* | 
powerpc*-*-linux*rs6000/biarch64.h*)
   case "$target" in
  *le-*-linux*)
  emul_name="-melf64lppc"
-- 
2.33.0



Re: [PATCH v2 0/4] libffi: Sync with upstream

2021-10-15 Thread David Edelsohn via Gcc-patches
On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu  wrote:
>
> On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu  wrote:
> >
> > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener
> >  wrote:
> > >
> > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu  wrote:
> > > >
> > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener
> > > >  wrote:
> > > > >
> > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu  wrote:
> > > > > >
> > > > > > Change in the v2 patch:
> > > > > >
> > > > > > 1. Disable static trampolines by default.
> > > > > >
> > > > > >
> > > > > > GCC maintained a copy of libffi snapshot from 2009 and 
> > > > > > cherry-picked fixes
> > > > > > from upstream over the last 10+ years.  In the meantime, libffi 
> > > > > > upstream
> > > > > > has been changed significantly with new features, bug fixes and new 
> > > > > > target
> > > > > > support.  Here is a set of patches to sync with libffi 3.4.2 
> > > > > > release and
> > > > > > make it easier to sync with libffi upstream:
> > > > > >
> > > > > > 1. Document how to sync with upstream.
> > > > > > 2. Add scripts to help sync with upstream.
> > > > > > 3. Sync with libffi 3.4.2. This patch is quite big.  It is availale 
> > > > > > at
> > > > > >
> > > > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f
> > > > > > 4. Integrate libffi build and testsuite with GCC.
> > > > >
> > > > > How did you test this?  It looks like libgo is the only consumer of
> > > > > libffi these days.
> > > > > In particular go/libgo seems to be supported on almost all targets 
> > > > > besides
> > > > > darwin/windows - did you test cross and canadian configurations?
> > > >
> > > > I only tested it on Linux/i686 and Linux/x86-64.   My understanding is 
> > > > that
> > > > the upstream libffi works on Darwin and Windows.
> > > >
> > > > > I applaud the attempt to sync to upsteam but I fear you won't get any 
> > > > > "review"
> > > > > of this massive diff.
> > > >
> > > > I believe that it should just work.  Our libffi is very much out of 
> > > > date.
> > >
> > > Yes, you can hope.  And yes, our libffi is out of date.
> > >
> > > Can you please do the extra step to test one weird architecture, namely
> > > powerpc64-aix which is available on the compile-farm?
> >
> > I will give it a try and report back.
> >
> > > If that goes well I think it's good to "hope" at this point (and plenty of
> > > time to fix fallout until the GCC 12 release).
> > >
> > > Thus OK after the extra testing dance and waiting until early next
> > > week so others can throw in a veto.
>
> I tried to bootstrap GCC master branch on  gcc119.fsffrance.org:
>
> *  MT/MODEL: 8284-22A 
> *
> * Partition: gcc119   
> *
> *System: power8-aix.osuosl.org
> *
> *   O/S: AIX V7.2 7200-04-03-2038
>
> I configured GCC with
>
> --with-as=/usr/bin/as --with-ld=/usr/bin/ld
> --enable-version-specific-runtime-libs --disable-nls
> --enable-decimal-float=dpd --disable-libstdcxx-pch --disable-werror
> --enable-__cxa_atexit --with-gmp=/opt/cfarm --with-mpfr=/opt/cfarm
> --with-mpc=/opt/cfarm --with-isl=/opt/cfarm --prefix=/opt/freeware
> --with-local-prefix=/opt/freeware --enable-languages=c,c++,go
>
> I got
>
> g++   -g -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables 
> -W
> -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format 
> -Wmissing-format-at
> tribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
> -Wno-
> overlength-strings -fno-common  -DHAVE_CONFIG_H  -DGENERATOR_FILE 
> -static-libstd
> c++ -static-libgcc -Wl,-bbigtoc -Wl,-bmaxdata:0x4000 -o build/genenums \
> build/genenums.o build/read-md.o build/errors.o 
> ../build-powerpc-ibm-aix7.2.
> 4.0/libiberty/libiberty.a
> ld: 0711-317 ERROR: Undefined symbol: lexer_line
> ld: 0711-317 ERROR: Undefined symbol: .yylex(char const**)
> ld: 0711-317 ERROR: Undefined symbol: .yybegin(char const*)
> ld: 0711-317 ERROR: Undefined symbol: lexer_toplevel_done
> ld: 0711-317 ERROR: Undefined symbol: .yyend()
> ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information.
> collect2: error: ld returned 8 exit status
> Makefile:3000: recipe for target 'build/gengtype' failed
> gmake[5]: *** [build/gengtype] Error 1
>
> David, is there an instruction to bootstrap GCC on AIX?

The CompileFarm page in the GCC wiki has instructions under "build tips":

https://gcc.gnu.org/wiki/CompileFarm#Services_and_software_installed_on_farm_machines

The error that you show might be due to not having /opt/freeware/bin
first in your path and the bootstrap used the AIX version of lex or
sed or some other command.

Thanks, David


Re: [PATCH v2 0/4] libffi: Sync with upstream

2021-10-15 Thread H.J. Lu via Gcc-patches
On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu  wrote:
>
> On Wed, Oct 13, 2021 at 6:03 AM Richard Biener
>  wrote:
> >
> > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu  wrote:
> > >
> > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener
> > >  wrote:
> > > >
> > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu  wrote:
> > > > >
> > > > > Change in the v2 patch:
> > > > >
> > > > > 1. Disable static trampolines by default.
> > > > >
> > > > >
> > > > > GCC maintained a copy of libffi snapshot from 2009 and cherry-picked 
> > > > > fixes
> > > > > from upstream over the last 10+ years.  In the meantime, libffi 
> > > > > upstream
> > > > > has been changed significantly with new features, bug fixes and new 
> > > > > target
> > > > > support.  Here is a set of patches to sync with libffi 3.4.2 release 
> > > > > and
> > > > > make it easier to sync with libffi upstream:
> > > > >
> > > > > 1. Document how to sync with upstream.
> > > > > 2. Add scripts to help sync with upstream.
> > > > > 3. Sync with libffi 3.4.2. This patch is quite big.  It is availale at
> > > > >
> > > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f
> > > > > 4. Integrate libffi build and testsuite with GCC.
> > > >
> > > > How did you test this?  It looks like libgo is the only consumer of
> > > > libffi these days.
> > > > In particular go/libgo seems to be supported on almost all targets 
> > > > besides
> > > > darwin/windows - did you test cross and canadian configurations?
> > >
> > > I only tested it on Linux/i686 and Linux/x86-64.   My understanding is 
> > > that
> > > the upstream libffi works on Darwin and Windows.
> > >
> > > > I applaud the attempt to sync to upsteam but I fear you won't get any 
> > > > "review"
> > > > of this massive diff.
> > >
> > > I believe that it should just work.  Our libffi is very much out of date.
> >
> > Yes, you can hope.  And yes, our libffi is out of date.
> >
> > Can you please do the extra step to test one weird architecture, namely
> > powerpc64-aix which is available on the compile-farm?
>
> I will give it a try and report back.
>
> > If that goes well I think it's good to "hope" at this point (and plenty of
> > time to fix fallout until the GCC 12 release).
> >
> > Thus OK after the extra testing dance and waiting until early next
> > week so others can throw in a veto.

I tried to bootstrap GCC master branch on  gcc119.fsffrance.org:

*  MT/MODEL: 8284-22A *
* Partition: gcc119   *
*System: power8-aix.osuosl.org*
*   O/S: AIX V7.2 7200-04-03-2038

I configured GCC with

--with-as=/usr/bin/as --with-ld=/usr/bin/ld
--enable-version-specific-runtime-libs --disable-nls
--enable-decimal-float=dpd --disable-libstdcxx-pch --disable-werror
--enable-__cxa_atexit --with-gmp=/opt/cfarm --with-mpfr=/opt/cfarm
--with-mpc=/opt/cfarm --with-isl=/opt/cfarm --prefix=/opt/freeware
--with-local-prefix=/opt/freeware --enable-languages=c,c++,go

I got

g++   -g -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W
-Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format -Wmissing-format-at
tribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-
overlength-strings -fno-common  -DHAVE_CONFIG_H  -DGENERATOR_FILE -static-libstd
c++ -static-libgcc -Wl,-bbigtoc -Wl,-bmaxdata:0x4000 -o build/genenums \
build/genenums.o build/read-md.o build/errors.o ../build-powerpc-ibm-aix7.2.
4.0/libiberty/libiberty.a
ld: 0711-317 ERROR: Undefined symbol: lexer_line
ld: 0711-317 ERROR: Undefined symbol: .yylex(char const**)
ld: 0711-317 ERROR: Undefined symbol: .yybegin(char const*)
ld: 0711-317 ERROR: Undefined symbol: lexer_toplevel_done
ld: 0711-317 ERROR: Undefined symbol: .yyend()
ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information.
collect2: error: ld returned 8 exit status
Makefile:3000: recipe for target 'build/gengtype' failed
gmake[5]: *** [build/gengtype] Error 1

David, is there an instruction to bootstrap GCC on AIX?

Thanks.

-- 
H.J.


[committed] libstdc++: Fix error in filesystem::path with Clang

2021-10-15 Thread Jonathan Wakely via Gcc-patches
THis fixes teh following error seen with Clang:

error: function '_S_convert>' with deduced
return type cannot be used before it is defined
  return string_type(_S_convert(std::u8string_view(__str)));
 ^

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (path::_S_convert(T)): Avoid recursive
call to function with deduced return type.

Tested powerpc64le-linux. Committed to trunk.

commit e547d1341b1fe90672c9b982c4a98f8197237bb7
Author: Jonathan Wakely 
Date:   Fri Oct 15 23:27:54 2021

libstdc++: Fix error in filesystem::path with Clang

THis fixes teh following error seen with Clang:

error: function '_S_convert>' with deduced
return type cannot be used before it is defined
  return string_type(_S_convert(std::u8string_view(__str)));
 ^

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (path::_S_convert(T)): Avoid recursive
call to function with deduced return type.

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index d13fb12455c..4bd980952f1 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -630,7 +630,8 @@ namespace __detail
  // Calling _S_convert will return a u8string_view that
  // refers to __str and would dangle after this function returns.
  // Return a string_type instead, to avoid dangling.
- return string_type(_S_convert(std::u8string_view(__str)));
+ return string_type(_S_convert(__str.data(),
+   __str.data() + __str.size()));
 #endif
else
  return _S_convert(__str.data(), __str.data() + __str.size());


[committed] libstdc++: Define std::basic_string::resize_and_overwrite for C++23 (P1072R10)

2021-10-15 Thread Jonathan Wakely via Gcc-patches
A recently approved change for the C++23 working draft.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h (__cpp_lib_string_resize_and_overwrite):
Define for C++23.
(basic_string::resize_and_overwrite): Declare.
* include/bits/basic_string.tcc (basic_string::resize_and_overwrite):
Define.
* include/std/version (__cpp_lib_resize_and_overwrite): Define
for C++23.
* 
testsuite/21_strings/basic_string/capacity/char/resize_and_overwrite.cc:
New test.

Tested powerpc64le-linux. Committed to trunk.

commit 929abc7fe3ad4491ac412ca232e055618559f268
Author: Jonathan Wakely 
Date:   Fri Oct 15 22:01:25 2021

libstdc++: Define std::basic_string::resize_and_overwrite for C++23 
(P1072R10)

A recently approved change for the C++23 working draft.

libstdc++-v3/ChangeLog:

* include/bits/basic_string.h 
(__cpp_lib_string_resize_and_overwrite):
Define for C++23.
(basic_string::resize_and_overwrite): Declare.
* include/bits/basic_string.tcc 
(basic_string::resize_and_overwrite):
Define.
* include/std/version (__cpp_lib_resize_and_overwrite): Define
for C++23.
* 
testsuite/21_strings/basic_string/capacity/char/resize_and_overwrite.cc:
New test.

diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index 59c84b1b6ad..a6575fa9e26 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -971,6 +971,13 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 #pragma GCC diagnostic pop
 #endif
 
+#if __cplusplus > 202002L
+#define __cpp_lib_string_resize_and_overwrite 202110L
+  template
+   constexpr void
+   resize_and_overwrite(size_type __n, _Operation __op);
+#endif
+
   /**
*  Returns the total number of characters that the %string can hold
*  before needing to allocate more memory.
diff --git a/libstdc++-v3/include/bits/basic_string.tcc 
b/libstdc++-v3/include/bits/basic_string.tcc
index 371f1c3ccee..98c386239f9 100644
--- a/libstdc++-v3/include/bits/basic_string.tcc
+++ b/libstdc++-v3/include/bits/basic_string.tcc
@@ -515,6 +515,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __n;
 }
 
+#if __cplusplus > 202002L
+  template
+  template
+constexpr void
+basic_string<_CharT, _Traits, _Alloc>::
+resize_and_overwrite(size_type __n, _Operation __op)
+{
+  const size_type __capacity = capacity();
+  _CharT* __p;
+  if (__n > __capacity)
+   {
+ __p = _M_create(__n, __capacity);
+ this->_S_copy(__p, _M_data(), length()); // exclude trailing null
+ _M_dispose();
+ _M_data(__p);
+ _M_capacity(__n);
+   }
+  else
+   __p = _M_data();
+  struct _Terminator {
+   ~_Terminator() { _M_this->_M_set_length(_M_r); }
+   basic_string* _M_this;
+   size_type _M_r;
+  };
+  _Terminator __term{this};
+  const size_type __n2 [[maybe_unused]] = __n;
+  __term._M_r = std::move(__op)(__p, __n);
+  _GLIBCXX_DEBUG_ASSERT(__term._M_r >= 0 && __term._M_r <= __n2);
+}
+#endif // C++23
+
 #endif  // _GLIBCXX_USE_CXX11_ABI

   template
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index 0d7ae3bf857..2b118301da7 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -294,6 +294,9 @@
 #define __cpp_lib_is_scoped_enum 202011L
 #define __cpp_lib_move_only_function 202110L
 #define __cpp_lib_string_contains 202011L
+#if _GLIBCXX_USE_CXX11_ABI // Only supported with cxx11-abi
+# define __cpp_lib_string_resize_and_overwrite 202110L
+#endif
 #define __cpp_lib_to_underlying 202102L
 #endif // C++2b
 #endif // C++20
diff --git 
a/libstdc++-v3/testsuite/21_strings/basic_string/capacity/char/resize_and_overwrite.cc
 
b/libstdc++-v3/testsuite/21_strings/basic_string/capacity/char/resize_and_overwrite.cc
new file mode 100644
index 000..f0e81126a41
--- /dev/null
+++ 
b/libstdc++-v3/testsuite/21_strings/basic_string/capacity/char/resize_and_overwrite.cc
@@ -0,0 +1,114 @@
+// { dg-options "-std=gnu++23" }
+// { dg-do run { target { c++23 && cxx11-abi } } }
+
+#include 
+
+#ifndef __cpp_lib_string_resize_and_overwrite
+#error "Feature test macro for resize_and_overwrite is missing in "
+#elif __cpp_lib_string_resize_and_overwrite != 202110L
+# error "Feature test macro for resize_and_overwrite has wrong value in 
"
+#endif
+
+
+#include 
+#include 
+
+// P1072R10 basic_string::resize_and_overwrite
+
+void
+test01()
+{
+  std::string s = "foo";
+  s.resize_and_overwrite(99, [](char* p, int n) {
+VERIFY( n == 99 );
+VERIFY( !std::strncmp(p, "foo", 3) );
+std::strcpy(p, "monkey tennis");
+return 6;
+  });
+  VERIFY( s == "monkey" );
+  VERIFY( s.size() == 6 );
+  VERIFY( s.capacity() >= 99 );
+  VERIFY( s[6] == '\0' );
+
+  const auto str = 

[r12-4443 Regression] FAIL: 27_io/ios_base/failure/dual_abi.cc execution test on Linux/x86_64

2021-10-15 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

93ac832f1846e4867aa6537f76f510fab8e3e87d is the first bad commit
commit 93ac832f1846e4867aa6537f76f510fab8e3e87d
Author: Andrew MacLeod 
Date:   Thu Oct 7 10:12:29 2021 -0400

Ranger : Do not process abnormal ssa-names.

caused

FAIL: 27_io/ios_base/failure/dual_abi.cc execution test

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4443/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=27_io/ios_base/failure/dual_abi.cc 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=27_io/ios_base/failure/dual_abi.cc 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=27_io/ios_base/failure/dual_abi.cc 
--target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=27_io/ios_base/failure/dual_abi.cc 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[Patch] Fortran: Fix CLASS conversion check [PR102745]

2021-10-15 Thread Tobias Burnus

This patch fixes two issues:

First, to print 'CLASS(t2)' instead of:
Error: Type mismatch in argument ‘x’ at (1); passed CLASS(__class_MAIN___T2_a) 
to TYPE(t)

Additionally,

  class(t2) = class(t)  ! 't2' extends 't'
  class(t2) = class(any)

was wrongly accepted.

OK?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
Fortran: Fix CLASS conversion check [PR102745]

	PR fortran/102745
gcc/fortran/ChangeLog
	* intrinsic.c (gfc_convert_type_warn): Fix checks by checking CLASS
	and do typcheck in correct order for type extension.
	* misc.c (gfc_typename): Print proper not internal CLASS type name.

gcc/testsuite/ChangeLog
	* gfortran.dg/class_72.f90: New.

 gcc/fortran/intrinsic.c|  7 +--
 gcc/fortran/misc.c | 10 ++--
 gcc/testsuite/gfortran.dg/class_72.f90 | 83 ++
 3 files changed, 92 insertions(+), 8 deletions(-)

diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c
index 219f04f2317..f5c88d98cc9 100644
--- a/gcc/fortran/intrinsic.c
+++ b/gcc/fortran/intrinsic.c
@@ -5237,12 +5237,13 @@ gfc_convert_type_warn (gfc_expr *expr, gfc_typespec *ts, int eflag, int wflag,
   /* In building an array constructor, gfortran can end up here when no
  conversion is required for an intrinsic type.  We need to let derived
  types drop through.  */
-  if (from_ts.type != BT_DERIVED
+  if (from_ts.type != BT_DERIVED && from_ts.type != BT_CLASS
   && (from_ts.type == ts->type && from_ts.kind == ts->kind))
 return true;
 
-  if (expr->ts.type == BT_DERIVED && ts->type == BT_DERIVED
-  && gfc_compare_types (>ts, ts))
+  if ((expr->ts.type == BT_DERIVED || expr->ts.type == BT_CLASS)
+  && (ts->type == BT_DERIVED || ts->type == BT_CLASS)
+  && gfc_compare_types (ts, >ts))
 return true;
 
   /* If array is true then conversion is in an array constructor where
diff --git a/gcc/fortran/misc.c b/gcc/fortran/misc.c
index 3d449ae17fe..e6402e881e3 100644
--- a/gcc/fortran/misc.c
+++ b/gcc/fortran/misc.c
@@ -130,7 +130,6 @@ gfc_typename (gfc_typespec *ts, bool for_hash)
   static char buffer2[GFC_MAX_SYMBOL_LEN + 8];
   static int flag = 0;
   char *buffer;
-  gfc_typespec *ts1;
   gfc_charlen_t length = 0;
 
   buffer = flag ? buffer1 : buffer2;
@@ -180,16 +179,17 @@ gfc_typename (gfc_typespec *ts, bool for_hash)
   sprintf (buffer, "TYPE(%s)", ts->u.derived->name);
   break;
 case BT_CLASS:
-  if (ts->u.derived == NULL)
+  if (!ts->u.derived || !ts->u.derived->components
+	  || !ts->u.derived->components->ts.u.derived)
 	{
 	  sprintf (buffer, "invalid class");
 	  break;
 	}
-  ts1 = ts->u.derived->components ? >u.derived->components->ts : NULL;
-  if (ts1 && ts1->u.derived && ts1->u.derived->attr.unlimited_polymorphic)
+  if (ts->u.derived->components->ts.u.derived->attr.unlimited_polymorphic)
 	sprintf (buffer, "CLASS(*)");
   else
-	sprintf (buffer, "CLASS(%s)", ts->u.derived->name);
+	sprintf (buffer, "CLASS(%s)",
+		 ts->u.derived->components->ts.u.derived->name);
   break;
 case BT_ASSUMED:
   sprintf (buffer, "TYPE(*)");
diff --git a/gcc/testsuite/gfortran.dg/class_72.f90 b/gcc/testsuite/gfortran.dg/class_72.f90
new file mode 100644
index 000..0fd6ec010f5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/class_72.f90
@@ -0,0 +1,83 @@
+! PR fortran/102745
+
+implicit none
+
+type t
+end type t
+
+type, extends(t) :: t2
+end type t2
+
+type t3
+end type t3
+
+type(t), allocatable :: var
+type(t2), allocatable :: v2ar
+type(t3), allocatable :: v3ar
+class(t), allocatable :: cvar
+class(t2), allocatable :: c2var
+class(t3), allocatable :: c3var
+
+call f(var)
+call f(v2ar)   ! { dg-error "passed TYPE.t2. to TYPE.t." }
+call f(v2ar%t)
+call f(cvar)
+call f(c2var)  ! { dg-error "passed CLASS.t2. to TYPE.t." }
+call f(c2var%t)
+
+call f2(var)   ! { dg-error "passed TYPE.t. to TYPE.t2." }
+call f2(v2ar)
+call f2(cvar)  ! { dg-error "passed CLASS.t. to TYPE.t2." }
+call f2(c2var)
+
+
+var = var
+var = v2ar  ! { dg-error "TYPE.t2. to TYPE.t." }
+var = cvar
+var = c2var ! { dg-error "TYPE.t2. to TYPE.t." }
+
+v2ar = var  ! { dg-error "Cannot convert TYPE.t. to TYPE.t2." }
+v2ar = v2ar
+v2ar = cvar ! { dg-error "Cannot convert TYPE.t. to TYPE.t2." }
+v2ar = c2var
+
+cvar = var
+cvar = v2ar
+cvar = cvar
+cvar = c2var
+
+c2var = var   ! { dg-error "Cannot convert TYPE.t. to CLASS.t2." }
+c2var = v3ar  ! { dg-error "Cannot convert TYPE.t3. to CLASS.t2." }
+c2var = v2ar
+c2var = cvar  ! { dg-error "Cannot convert CLASS.t. to CLASS.t2." }
+c2var = c3var ! { dg-error "Cannot convert CLASS.t3. to CLASS.t2." }
+c2var = c2var
+
+allocate (var, source=var)
+allocate (var, source=v2ar)   ! { dg-error "incompatible with source-expr" }
+allocate (var, source=cvar)
+allocate 

Re: [PATCH] c++: fix cases of core1001/1322 by not dropping cv-qualifier of function parameter of type of typename or decltype[PR101402,PR102033,PR102034,PR102039,PR102

2021-10-15 Thread Jason Merrill via Gcc-patches

On 10/14/21 07:04, Nick Huang wrote:

IMHO, I think your patch probably finally solved this long-standing Core
1001 issue. Of course it is not up to me to say so. I just want to point out
that it even solves the following case, even though it is more or less
expected if concept and lambda all work expectedly.

template
concept IsLambdaAry3=__is_same(T, decltype(+[]{})[3]);
template
void bar(const T){}
template<>
void bar(const decltype(+[]{})[3]){}


Sounds good.  Here's what I'm applying:From 79802c5dcc043a515f429bb2bec7573b8537c32a Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Tue, 28 Sep 2021 10:02:04 -0400
Subject: [PATCH] c++: array cv-quals and template specialization [PR101402]
To: gcc-patches@gcc.gnu.org

PRs 101402, 102033, etc. demonstrated that the fix for PR92010 wasn't
handling all cases of the CWG1001/1322 issue with parameter type qual
stripping and arrays with templates.  The problem turned out to be in
determine_specialization, which did an extra substitution without the 92010
fix and then complained that the result didn't match.

But just removing that wrong/redundant code meant that we were accepting
specializations with different numbers of parameters, because the code in
fn_type_unification that compares types in this case wasn't checking for
length mismatch.

After fixing that, I realized that fn_type_unification couldn't tell the
difference between variadic and non-variadic function types, because the
args array doesn't include the terminal void we use to indicate non-variadic
function type.  So I added it, and made the necessary adjustments.

Thanks to qingzhe "nick" huang  for the patch that
led me to dig more into this, and the extensive testcases.

	PR c++/51851
	PR c++/101402
	PR c++/102033
	PR c++/102034
	PR c++/102039
	PR c++/102044

gcc/cp/ChangeLog:

	* pt.c (determine_specialization): Remove redundant code.
	(fn_type_unification): Check for mismatched length.
	(type_unification_real): Ignore terminal void.
	(get_bindings): Don't stop at void_list_node.
	* class.c (resolve_address_of_overloaded_function): Likewise.

gcc/testsuite/ChangeLog:

	* g++.dg/template/fnspec2.C: New test.
	* g++.dg/template/parm-cv1.C: New test.
	* g++.dg/template/parm-cv2.C: New test.
	* g++.dg/template/parm-cv3.C: New test.
---
 gcc/cp/class.c   |   2 +-
 gcc/cp/pt.c  |  30 +++--
 gcc/testsuite/g++.dg/template/fnspec2.C  |   9 ++
 gcc/testsuite/g++.dg/template/parm-cv1.C |  15 +++
 gcc/testsuite/g++.dg/template/parm-cv2.C |  23 
 gcc/testsuite/g++.dg/template/parm-cv3.C | 142 +++
 6 files changed, 204 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/fnspec2.C
 create mode 100644 gcc/testsuite/g++.dg/template/parm-cv1.C
 create mode 100644 gcc/testsuite/g++.dg/template/parm-cv2.C
 create mode 100644 gcc/testsuite/g++.dg/template/parm-cv3.C

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 59611627d18..f16e50b9de9 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -8382,7 +8382,7 @@ resolve_address_of_overloaded_function (tree target_type,
   nargs = list_length (target_arg_types);
   args = XALLOCAVEC (tree, nargs);
   for (arg = target_arg_types, ia = 0;
-	   arg != NULL_TREE && arg != void_list_node;
+	   arg != NULL_TREE;
 	   arg = TREE_CHAIN (arg), ++ia)
 	args[ia] = TREE_VALUE (arg);
   nargs = ia;
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 009fe6db573..287cf4ce9d0 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -2230,7 +2230,6 @@ determine_specialization (tree template_id,
 	{
 	  tree decl_arg_types;
 	  tree fn_arg_types;
-	  tree insttype;
 
 	  /* In case of explicit specialization, we need to check if
 	 the number of template headers appearing in the specialization
@@ -2356,20 +2355,6 @@ determine_specialization (tree template_id,
 	   template argument.  */
 	continue;
 
-  /* Remove, from the set of candidates, all those functions
- whose constraints are not satisfied. */
-  if (flag_concepts && !constraints_satisfied_p (fn, targs))
-continue;
-
-  // Then, try to form the new function type.
-	  insttype = tsubst (TREE_TYPE (fn), targs, tf_fndecl_type, NULL_TREE);
-	  if (insttype == error_mark_node)
-	continue;
-	  fn_arg_types
-	= skip_artificial_parms_for (fn, TYPE_ARG_TYPES (insttype));
-	  if (!compparms (fn_arg_types, decl_arg_types))
-	continue;
-
 	  /* Save this template, and the arguments deduced.  */
 	  templates = tree_cons (targs, fn, templates);
 	}
@@ -21862,6 +21847,15 @@ fn_type_unification (tree fn,
  TREE_VALUE (sarg));
 	goto fail;
 	  }
+  if ((i < nargs || sarg)
+	  /* add_candidates uses DEDUCE_EXACT for x.operator foo(), but args
+	 doesn't contain the trailing void, and conv fns are always ().  */
+	  && !DECL_CONV_FN_P (decl))
+	{
+	  unsigned nsargs = i + list_length (sarg);
+	  unify_arity (explain_p, nargs, nsargs);
+	  goto fail;
+	}
 

Re: [PATCH] Add a simulate_record_decl lang hook

2021-10-15 Thread Jason Merrill via Gcc-patches

On 9/24/21 13:53, Richard Sandiford wrote:

This patch adds a lang hook for defining a struct/RECORD_TYPE
“as if” it had appeared directly in the source code.  It follows
the similar existing hook for enums.

It's the caller's responsibility to create the fields
(as FIELD_DECLs) but the hook's responsibility to create
and declare the associated RECORD_TYPE.

For now the hook is hard-coded to do the equivalent of:

   typedef struct NAME { FIELDS } NAME;

but this could be controlled by an extra parameter if some callers
want a different behaviour in future.

The motivating use case is to allow the long list of struct
definitions in arm_neon.h to be provided by the compiler,
which in turn unblocks various arm_neon.h optimisations.

Tested on aarch64-linux-gnu, individually and with a follow-on
patch from Jonathan that makes use of the hook.  OK to install?

Richard


gcc/
* langhooks.h (lang_hooks_for_types::simulate_record_decl): New hook.
* langhooks-def.h (lhd_simulate_record_decl): Declare.
(LANG_HOOKS_SIMULATE_RECORD_DECL): Define.
(LANG_HOOKS_FOR_TYPES_INITIALIZER): Include it.
* langhooks.c (lhd_simulate_record_decl): New function.

gcc/c/
* c-tree.h (c_simulate_record_decl): Declare.
* c-objc-common.h (LANG_HOOKS_SIMULATE_RECORD_DECL): Override.
* c-decl.c (c_simulate_record_decl): New function.

gcc/cp/
* decl.c: Include langhooks-def.h.
(cxx_simulate_record_decl): New function.
* cp-objcp-common.h (cxx_simulate_record_decl): Declare.
(LANG_HOOKS_SIMULATE_RECORD_DECL): Override.
---
  gcc/c/c-decl.c   | 31 +++
  gcc/c/c-objc-common.h|  2 ++
  gcc/c/c-tree.h   |  2 ++
  gcc/cp/cp-objcp-common.h |  4 
  gcc/cp/decl.c| 38 ++
  gcc/langhooks-def.h  |  4 
  gcc/langhooks.c  | 21 +
  gcc/langhooks.h  | 10 ++
  8 files changed, 112 insertions(+)

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 771efa3eadf..8d1324b118c 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -9436,6 +9436,37 @@ c_simulate_enum_decl (location_t loc, const char *name,
input_location = saved_loc;
return enumtype;
  }
+
+/* Implement LANG_HOOKS_SIMULATE_RECORD_DECL.  */
+
+tree
+c_simulate_record_decl (location_t loc, const char *name,
+   array_slice fields)
+{
+  location_t saved_loc = input_location;
+  input_location = loc;
+
+  class c_struct_parse_info *struct_info;
+  tree ident = get_identifier (name);
+  tree type = start_struct (loc, RECORD_TYPE, ident, _info);
+
+  for (unsigned int i = 0; i < fields.size (); ++i)
+{
+  DECL_FIELD_CONTEXT (fields[i]) = type;
+  if (i > 0)
+   DECL_CHAIN (fields[i - 1]) = fields[i];
+}
+
+  finish_struct (loc, type, fields[0], NULL_TREE, struct_info);
+
+  tree decl = build_decl (loc, TYPE_DECL, ident, type);
+  TYPE_NAME (type) = decl;
+  TYPE_STUB_DECL (type) = decl;
+  lang_hooks.decls.pushdecl (decl);
+
+  input_location = saved_loc;
+  return type;
+}
  
  /* Create the FUNCTION_DECL for a function definition.
 DECLSPECS, DECLARATOR and ATTRIBUTES are the parts of
diff --git a/gcc/c/c-objc-common.h b/gcc/c/c-objc-common.h
index 7d35a0621e4..f4e8271f06c 100644
--- a/gcc/c/c-objc-common.h
+++ b/gcc/c/c-objc-common.h
@@ -81,6 +81,8 @@ along with GCC; see the file COPYING3.  If not see
  
  #undef LANG_HOOKS_SIMULATE_ENUM_DECL

  #define LANG_HOOKS_SIMULATE_ENUM_DECL c_simulate_enum_decl
+#undef LANG_HOOKS_SIMULATE_RECORD_DECL
+#define LANG_HOOKS_SIMULATE_RECORD_DECL c_simulate_record_decl
  #undef LANG_HOOKS_TYPE_FOR_MODE
  #define LANG_HOOKS_TYPE_FOR_MODE c_common_type_for_mode
  #undef LANG_HOOKS_TYPE_FOR_SIZE
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index d50d0cb7f2d..8578d2d1e77 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -598,6 +598,8 @@ extern tree finish_struct (location_t, tree, tree, tree,
   class c_struct_parse_info *);
  extern tree c_simulate_enum_decl (location_t, const char *,
  vec *);
+extern tree c_simulate_record_decl (location_t, const char *,
+   array_slice);
  extern struct c_arg_info *build_arg_info (void);
  extern struct c_arg_info *get_parm_info (bool, tree);
  extern tree grokfield (location_t, struct c_declarator *,
diff --git a/gcc/cp/cp-objcp-common.h b/gcc/cp/cp-objcp-common.h
index f1704aad557..d5859406e8f 100644
--- a/gcc/cp/cp-objcp-common.h
+++ b/gcc/cp/cp-objcp-common.h
@@ -39,6 +39,8 @@ extern bool cp_handle_option (size_t, const char *, 
HOST_WIDE_INT, int,
  extern tree cxx_make_type_hook(tree_code);
  extern tree cxx_simulate_enum_decl (location_t, const char *,
vec *);
+extern tree cxx_simulate_record_decl (location_t, const char *,
+ array_slice);
  
  

Re: [PATCH] PR fortran/102685 - ICE in output_constructor_regular_field, at varasm.c:5514

2021-10-15 Thread Harald Anlauf via Gcc-patches
Hi Tobias, all,

> > In developing the patch I encountered a difficulty with testcase
> > dec_structure_6.f90, which uses a DEC extension, namelist "old-style
> > CLIST initializers in STRUCTURE".  I could not figure out how to
> > determine the shape of the initializer; it seemed to be always zero.
> > I've added code to accept this, but only under -fdec-structure, and
> > added a TODO in a comment.  If somebody reading this could give me
> > a hint to solve end, I would adjust the patch accordingly.
> 
> See attached patch – it does initialize the variables similarly to other
> shapes in that file, except that it has to take the shape from the LHS
> as seemingly (same testfile) having a 1-dim array can be used to
> initialize a 2-dim array.
> 
> You can approve that patch and integrate it then in your own patch :-)

your fix to match_clist_expr LGTM.  I can really use it.

> LGTM – with the DECL exception removed from resolve.c.

I've removed the DEC exception, cleaned up, regtested again.

Committed and pushed:

https://gcc.gnu.org/g:1e819bd95ebeefc1dc469daa1855ce005cb77822

Thanks,
Harald

> Thanks,
> 
> Tobias
> 
> PS: Without the auto-reshape part, a simple 'gfc_array_size (expr,
> >shape[0]))" would have been sufficient.
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955
>


[r12-4438 Regression] FAIL: libgomp.c/places-10.c execution test on Linux/x86_64

2021-10-15 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

4764049dd620affcd3e2658dc7f03a6616370a29 is the first bad commit
commit 4764049dd620affcd3e2658dc7f03a6616370a29
Author: Jakub Jelinek 
Date:   Fri Oct 15 16:25:25 2021 +0200

openmp: Fix up handling of OMP_PLACES=threads(1)

caused

FAIL: libgomp.c/places-10.c execution test

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4438/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c.exp=libgomp.c/places-10.c --target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c.exp=libgomp.c/places-10.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c.exp=libgomp.c/places-10.c --target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="c.exp=libgomp.c/places-10.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [r12-4397 Regression] FAIL: gcc.dg/guality/pr54200.c -Og -DPREVENT_OPTIMIZATION line 20 z == 3 on Linux/x86_64

2021-10-15 Thread H.J. Lu via Gcc-patches
On Fri, Oct 15, 2021 at 2:00 AM Martin Liška  wrote:
>
> On 10/14/21 21:16, sunil.k.pandey wrote:
> > FAIL: gcc.dg/guality/pr54200.c  -Og -DPREVENT_OPTIMIZATION  line 20 z == 3
>
> Hello.
>
> I've just verified the assembly is identical before and after the revision.
> So it must be a false positive.
>
> Cheers,
> Martin

I saw

Breakpoint 1, foo (z=3, x=, b=1) at
/export/gnu/import/git/gitlab/x86-gcc/gcc/testsuite/gcc.dg/guality/pr54200.c:20^M
20return a; /* { dg-final { gdb-test . "z" "3" { xfail {
aarch64*-*-* && { no-opts "-O0" "-Og" } } } } } */^M
$1 =  [uninitialized] 3^M
$2 = 3^M
A debugging session is active.^M
^M
Inferior 1 [process 4053185] will be killed.^M
^M
Quit anyway? (y or n) [answered Y; input not from terminal]^M
 [uninitialized] 3 != 3
FAIL: gcc.dg/guality/pr54200.c  -Og -DPREVENT_OPTIMIZATION  line 20 z == 3

I don't know where [uninitialized] came from.  I can't reproduce it
when I run gdb by hand.

-- 
H.J.


Re: [PATCH, OpenMP 5.1, Fortran] Strictly-structured block support for OpenMP directives

2021-10-15 Thread Jakub Jelinek via Gcc-patches
On Sat, Oct 16, 2021 at 02:44:12AM +0800, Chung-Lin Tang wrote:
> The patch currently does not allow strictly-structured BLOCK for 
> sections/parallel sections,
> since I was referencing the 5.1 spec while writing it, although that is 
> trivially fixable.
> (was sensing a bit odd why those two constructs had to be specially treated 
> in 5.1 anyways)
> 
> The bigger issue is that under the current way the patch is written, the 
> statements inside
> a [parallel] sections construct are parsed automatically by 
> parse_executable(), so to enforce
> the specified meaning of "structured-block-sequence" (i.e. BLOCK or non-BLOCK 
> starting sequence of stmts)
> will probably be more a bit harder to implement:
> 
> !$omp sections
> block
>!$omp section
>block
>  x=0
>end block
>x=1   !! This is allowed now, though should be wrong spec-wise
>!$omp section
>x=2
> end block
> 
> Currently "$!omp section" acts essentially as a top-level separator within a 
> sections-construct,
> rather than a structured directive. Though I would kind of argue this is 
> actually better to use for the
> user (why prohibit what looks like very apparent meaning of the program?)
> 
> So Jakub, my question for this is, is this current state okay? Or must we 
> implement the spec pedantically?

I'd certainly not implement 5.1 pedantically when we know we'd change one
way for 5.0 -> 5.1 and change it back again for 5.1 -> 5.2.
An example of that is
!$omp sections
!$omp section
block
end block
x = 1
!$end omp sections
This is valid in 5.0 and will be valid again in 5.2 with the same meaning,
so let's just use the 5.0/5.2 wording here.  Ditto the !$omp end ambiguity
Tobias raised etc.
Whether to add support for the 5.2 behavior when one sees
!$omp {,parallel }sections
block
or not is more tough question, I bet the answer should be what is easier to
implement right now (i.e. don't spend too much effort on hard 5.1
implementation if 5.2 would be easier, as eventually we'd need to do the 5.2
implementation afterwards anyway).
So, for block right after sections either we implement the 5.2 wording right
away and therefore look for !$omp section only within that block and not
outside of it, or we for the 1st section with omitted !$omp section before
it only implement the 5.1 pedantic behavior, i.e. if it starts with a block,
don't look for !$omp section in the block, but require either !$omp section
or !$omp end sections right after the corresponding end block.

Does this answer all questions?

Jakub



[pushed] Darwin: Update specs handling '-r'.

2021-10-15 Thread Iain Sandoe via Gcc-patches
We were not wrapping all the default libraries in checks for whether
they should be used.  We were also wasting a process launch calling
dsymutil for 'r' link lines (a NOP in practice).  Order the checks
that exclude linking from most likely to occur, downwards.

tested on powerpc, i686, x86_64, arm64 darwin, pushed to master, thanks.
Iain

Signed-off-by: Iain Sandoe 

gcc/ChangeLog:

* config/darwin.h (LINK_COMMAND_SPEC_A): Update 'r' handling to
skip gomp and itm when r or nodefaultlibs is given.
(DSYMUTIL_SPEC): Do not call dsymutil for '-r' link lines.
Update ordering of exclusions, remove duplicate 'v' addition
(collect2 will add this from the main command line).
---
 gcc/config/darwin.h | 33 ++---
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index 4aedf467c17..27cb3e4bb30 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -349,7 +349,7 @@ extern GTY(()) int darwin_ms_struct;
linkers, and for positional arguments like libraries.  */
 
 #define LINK_COMMAND_SPEC_A \
-   "%{!fdump=*:%{!fsyntax-only:%{!c:%{!M:%{!MM:%{!E:%{!S:\
+   "%{!c:%{!E:%{!S:%{!M:%{!MM:%{!fsyntax-only:%{!fdump=*: \
 %(linker)" \
 LINK_PLUGIN_SPEC \
 "%{flto*:%< 10.6 10.7 mmacosx-version-min= -ld10-uwfef) \
   %(link_gcc_c_sequence) \
 }}}\
-%{!nostdlib:%{!r:%{!nostartfiles:%E}}} %{T*} %{F*} "\
+%{!r:%{!nostdlib:%{!nostartfiles:%E}}} %{T*} %{F*} "\
 DARWIN_PIE_SPEC \
 DARWIN_NOPIE_SPEC \
 DARWIN_RDYNAMIC \
@@ -384,12 +387,12 @@ extern GTY(()) int darwin_ms_struct;
enabled).  */
 
 #define DSYMUTIL_SPEC \
-   "%{!fdump=*:%{!fsyntax-only:%{!c:%{!M:%{!MM:%{!E:%{!S:\
-%{v} \
-%{g*:%{!gctf:%{!gbtf:%{!gstabs*:%{%:debug-level-gt(0): -idsym}\
-%{.c|.cc|.C|.cpp|.cp|.c++|.cxx|.CPP|.m|.mm|.s|.f|.f90|\
-  .f95|.f03|.f77|.for|.F|.F90|.F95|.F03|.d: \
-%{g*:%{!gctf:%{!gbtf:%{!gstabs*:%{%:debug-level-gt(0): -dsym}"
+  "%{!c:%{!E:%{!S:%{!r:%{!M:%{!MM:%{!fsyntax-only:%{!fdump=*:\
+ %{g*:%{!gctf:%{!gbtf:%{!gstabs*:%{%:debug-level-gt(0): -idsym \
+   %{.c|.cc|.C|.cpp|.cp|.c++|.cxx|.CPP|.m|.mm|.s|.f|.f90|\
+.f95|.f03|.f77|.for|.F|.F90|.F95|.F03|.d: -dsym }\
+  }\
+   "
 
 #define LINK_COMMAND_SPEC LINK_COMMAND_SPEC_A DSYMUTIL_SPEC
 
-- 
2.24.3 (Apple Git-128)



[pushed] Darwin: Revise handling of some driver opts.

2021-10-15 Thread Iain Sandoe via Gcc-patches
Darwin has a user convenience feature where some linker options are exposed
at the driver level (so one can type '-all_load' instead of '-Wl,-all_load'
or '-Xlinker -all_load').  We retain this feature, but now these options are
all marked as 'Driver' and we process them as early as possible so that they
get allocated to the right toolchain command.  There are a couple of special
cases where these driver opts are used multiple times, or to control
operations on more than one command (e.g. dynamiclib).  These are handled
specially and we then add %

gcc/ChangeLog:

* config/darwin-driver.c (darwin_driver_init): Revise comments, handle
filelist and framework options in specs instead of code.
* config/darwin.h (SUBTARGET_DRIVER_SELF_SPECS): Update to handle link
specs that are really driver ones.
(DARWIN_CC1_SPEC): Likewise.
(CPP_SPEC): Likewise.
(SYSROOT_SPEC): Append space.
(LINK_SYSROOT_SPEC): Remove most driver link specs.
(STANDARD_STARTFILE_PREFIX_2): Update link-related specs.
(STARTFILE_SPEC): Likewise.
(ASM_MMACOSX_VERSION_MIN_SPEC): Fix line wrap.
(ASM_SPEC): Update driver-related specs.
(ASM_FINAL_SPEC): Likewise.
* config/darwin.opt: Remove now unused option aliases.
* config/i386/darwin.h (EXTRA_ASM_OPTS): Ensure space after opt.
(ASM_SPEC): Update driver-related specs.
---
 gcc/config/darwin-driver.c |  30 +---
 gcc/config/darwin.h| 298 +++--
 gcc/config/darwin.opt  | 155 ---
 gcc/config/i386/darwin.h   |   9 +-
 4 files changed, 228 insertions(+), 264 deletions(-)

diff --git a/gcc/config/darwin-driver.c b/gcc/config/darwin-driver.c
index 573abae4782..a036e091c48 100644
--- a/gcc/config/darwin-driver.c
+++ b/gcc/config/darwin-driver.c
@@ -259,14 +259,11 @@ maybe_get_sysroot_from_sdkroot ()
   return xstrndup (maybe_sysroot, strlen (maybe_sysroot));
 }
 
-/* Translate -filelist and -framework options in *DECODED_OPTIONS
-   (size *DECODED_OPTIONS_COUNT) to use -Xlinker so that they are
-   considered to be linker inputs in the case that no other inputs are
-   specified.  Handling these options in DRIVER_SELF_SPECS does not
-   suffice because specs are too late to add linker inputs, and
-   handling them in LINK_SPEC does not suffice because the linker will
-   not be called if there are no other inputs.  When native, also
-   default the -mmacosx-version-min flag.  */
+/* Handle the deduction of m32/m64 from -arch flags and the interactions
+   between them (i.e. try to warn a user who thinks that they have a driver
+   that can produce multi-slice "FAT" outputs with more than one arch).
+   Default the -mmacosx-version-min flag, which requires a system call on
+   native hosts.  */
 
 void
 darwin_driver_init (unsigned int *decoded_options_count,
@@ -326,23 +323,6 @@ darwin_driver_init (unsigned int *decoded_options_count,
  seenM64 = true;
  break;
 
-   case OPT_filelist:
-   case OPT_framework:
- ++*decoded_options_count;
- *decoded_options = XRESIZEVEC (struct cl_decoded_option,
-*decoded_options,
-*decoded_options_count);
- memmove (*decoded_options + i + 2,
-  *decoded_options + i + 1,
-  ((*decoded_options_count - i - 2)
-   * sizeof (struct cl_decoded_option)));
- generate_option (OPT_Xlinker, (*decoded_options)[i].arg, 1,
-  CL_DRIVER, &(*decoded_options)[i + 1]);
- generate_option (OPT_Xlinker,
-  (*decoded_options)[i].canonical_option[0], 1,
-  CL_DRIVER, &(*decoded_options)[i]);
- break;
-
case OPT_mmacosx_version_min_:
  seen_version_min = true;
  vers_string =
diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index 0fa1c572bc9..4aedf467c17 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -118,25 +118,164 @@ see the files COPYING3 and COPYING.RUNTIME respectively. 
 If not, see
 /* True if pragma ms_struct is in effect.  */
 extern GTY(()) int darwin_ms_struct;
 
-/* The majority of Darwin's special driver opts are direct access to ld flags
-   (to save the user typing -Wl,x or Xlinker x) but we can't process
-   them here, since doing so will make it appear that there are linker infiles
-   and the linker will invoked even when it is not necessary.
+/* Darwin has a user convenience feature where some linker options are exposed
+   at the driver level (so one can type "-all_load" instead of "-Wl,-all_load"
+   or "-Xlinker -all_load").  We retain this, but now these options are all
+   marked as 'Driver' and we process them as early as possible so that they
+   get allocated to the right toolchain command.  There are a couple of special
+   cases where these driver opts are used 

Re: [PATCH, OpenMP 5.1, Fortran] Strictly-structured block support for OpenMP directives

2021-10-15 Thread Chung-Lin Tang

On 2021/10/14 7:19 PM, Jakub Jelinek wrote:

On Thu, Oct 14, 2021 at 12:20:51PM +0200, Jakub Jelinek via Gcc-patches wrote:

Thinking more about the Fortran case for !$omp sections, there is an
ambiguity.
!$omp sections
block
   !$omp section
end block
is clear and !$omp end sections is optional, but
!$omp sections
block
end block
is ambiguous during parsing, it could be either followed by !$omp section
and then the BLOCK would be first section, or by !$omp end sections and then
it would be clearly the whole sections, with first section being empty
inside of the block, or if it is followed by something else, it is
ambiguous whether the block ... end block is part of the first section,
followed by something and then we should be looking later for either
!$omp section or !$omp end section to prove that, or if
!$omp sections
block
end block
was the whole sections construct and we shouldn't await anything further.
I'm afraid back to the drawing board.


And I have to correct myself, there is no ambiguity in 5.2 here,
the important fact is hidden in sections/parallel sections being
block-associated constructs.  That means the body of the whole construct
has to be a structured-block, and by the 5.1+ definition of Fortran
structured block, it is either block ... end block or something that
doesn't start with block.
So,
!$omp sections
block
end block
a = 1
is only ambiguous in whether it is actually
!$omp sections
block
   !$omp section
end block
a = 1
or
!$omp sections
!$omp section
block
end block
!$omp end sections
a = 1
but both actually do the same thing, work roughly as !$omp single.
If one wants block statement as first in structured-block-sequence
of the first section, followed by either some further statements
or by other sections, then one needs to write
!$omp sections
!$omp section
block
end block
a = 1
...
!$omp end sections
or
!$omp sections
block
   block
   end block
   a = 1
...
end block

Your patch probably already handles it that way, but we again need
testsuite coverage to prove it is handled the way it should in all these
cases (and that we diagnose what is invalid).


The patch currently does not allow strictly-structured BLOCK for 
sections/parallel sections,
since I was referencing the 5.1 spec while writing it, although that is 
trivially fixable.
(was sensing a bit odd why those two constructs had to be specially treated in 
5.1 anyways)

The bigger issue is that under the current way the patch is written, the 
statements inside
a [parallel] sections construct are parsed automatically by parse_executable(), 
so to enforce
the specified meaning of "structured-block-sequence" (i.e. BLOCK or non-BLOCK 
starting sequence of stmts)
will probably be more a bit harder to implement:

!$omp sections
block
   !$omp section
   block
 x=0
   end block
   x=1   !! This is allowed now, though should be wrong spec-wise
   !$omp section
   x=2
end block

Currently "$!omp section" acts essentially as a top-level separator within a 
sections-construct,
rather than a structured directive. Though I would kind of argue this is 
actually better to use for the
user (why prohibit what looks like very apparent meaning of the program?)

So Jakub, my question for this is, is this current state okay? Or must we 
implement the spec pedantically?

As for the other issues:
(1) BLOCK/END BLOCK is not generally handled in parse_omp_structured_block, so 
for workshare,
it is only handled for the top-level construct, not within workshare. I 
think this is what you meant
in the last mail.

(2) As for the dangling-!$omp_end issue Tobias raised, because we are basically 
using 1-statement lookahead,
any "!$omp end <*>" is naturally bound with the adjacent BLOCK/END BLOCK, 
so we should be okay there.

Thanks,
Chung-Lin


Re: [PATCH] hardened conditionals

2021-10-15 Thread Alexandre Oliva via Gcc-patches
On Oct 14, 2021, Richard Biener  wrote:

> Yeah, I think that eventually marking the operation we want to preserve
> (with volatile?) would be the best way.  On GIMPLE that's difficult,
> it's easier on GENERIC (we can set TREE_THIS_VOLATILE on the
> expression at least), and possibly also on RTL (where such flag
> might already be a thing?).

Making the expr volatile would likely get gimple to deal with it like
memory, which would completely defeat the point of trying to avoid a
copy.

RTL has support for volatile MEMs and (user-)REGs indeed, but in order
to avoid the copy, we don't want the pseudo to be volatile, we want
specific users thereof to be.  An unspec_volatile would accomplish that,
but it would require RTL patterns to match it wherever a pseudo might
appear.  Considering all forms of insns involving conditionals on all
relevant targets, that's far too much effort for no measurable beenefit.


> So when going that way doing the hardening on RTL seems easier (if you
> want to catch all compares, if you want to only catch compare + jump
> that has your mentioned issue of all the different representations)

It's not.  RTL has various ways to represent store-flags too.  Even now
that we don't have to worry about implicit CC, a single boolean test may
expand to a compare-and-set-[CC-]reg, and then a
compare-and-store-CC-reg, or a single compare-and-set-[non-CC-]reg, and
IIRC in some cases even more than one (pair of) conditionals.

Compare-and-branches also come in such a multitude of settings.

It all depends on the target, and I don't really see any benefit
whatsoever of implementing such trivial gimple passes with all the
potential complexity of RTL on all the architectures relevant for GCC,
or even for this project alone.

> Note that I did not look on the actual patch, I'm trying to see whether 
> there's
> some generic usefulness we can extract from the patchs requirement
> which to me looks quite specific and given it's "hackish" implementation
> way might not be the most important one to carry on trunk (I understand
> that AdaCore will carry it in their compiler).

It's also simple, no-maintenance, and entirely self-contained.  A good
example of something that could be implemented as a plugin, except for
command-line options.

Maybe we could have a plugin collection in our source tree, to hold
stuff like this and to offer examples of plugins, and means to build
select plugins as such, or as preloaded modules into the compiler for
easier deployment.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[r12-4429 Regression] FAIL: gcc.target/i386/avx512fp16-v4hf-concat.c scan-assembler-times vpunpcklqdq 1 on Linux/x86_64

2021-10-15 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

be072bfa5bb3817168daa0a4a398cd9bd915a726 is the first bad commit
commit be072bfa5bb3817168daa0a4a398cd9bd915a726
Author: Hongyu Wang 
Date:   Mon Aug 30 15:18:35 2021 +0800

AVX512FP16: Enhance vector shuffle builtins

caused

FAIL: gcc.target/i386/avx512fp16-v4hf-concat.c scan-assembler-times vpunpcklqdq 
1

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4429/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-v4hf-concat.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-v4hf-concat.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[committed] libstdc++: Make non-propagating-cache fully constexpr [PR101263]

2021-10-15 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

PR libstdc++/101263
* include/std/ranges (__cached): New wrapper struct.
(__non_propagating_cache): Use __cached for contained value.
(__non_propagating_cache::_M_emplace_deref): Add constexpr. Use
std::construct_at instead of placement new.
* testsuite/std/ranges/adaptors/join.cc: Check constexpr works.

Tested powerpc64le-linux. Committed to trunk.

commit 2c564e813c0626802e5bfb066c094933d5e6a774
Author: Jonathan Wakely 
Date:   Fri Oct 15 14:49:21 2021

libstdc++: Make non-propagating-cache fully constexpr [PR101263]

libstdc++-v3/ChangeLog:

PR libstdc++/101263
* include/std/ranges (__cached): New wrapper struct.
(__non_propagating_cache): Use __cached for contained value.
(__non_propagating_cache::_M_emplace_deref): Add constexpr. Use
std::construct_at instead of placement new.
* testsuite/std/ranges/adaptors/join.cc: Check constexpr works.

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 07eae0cf94b..b8de400dfbb 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1159,9 +1159,34 @@ namespace views::__adaptor
// (such as join_view::_M_inner).
   };
 
+template
+  struct __cached
+  {
+   struct _Deref_t { };
+   static constexpr _Deref_t __deref{};
+
+   // Initialize _M_t directly from the result of dereferencing __i.
+   // This avoids any unwanted temporary materialization that would
+   // occur if *__i was bound to a reference before initializing _M_t.
+   template
+ constexpr explicit
+ __cached(_Deref_t, _Iter&& __i)
+ : _M_t(*__i)
+ { }
+
+   template
+ constexpr explicit
+ __cached(_Args&&... __args)
+ : _M_t(std::forward<_Args>(__args)...)
+ { }
+
+   _Tp _M_t;
+  };
+
 template
   requires is_object_v<_Tp>
-  struct __non_propagating_cache<_Tp> : protected _Optional_base<_Tp>
+  struct __non_propagating_cache<_Tp>
+  : protected _Optional_base<__cached<_Tp>>
   {
__non_propagating_cache() = default;
 
@@ -1205,23 +1230,22 @@ namespace views::__adaptor
 
constexpr _Tp&
operator*() noexcept
-   { return this->_M_get(); }
+   { return this->_M_get()._M_t; }
 
constexpr const _Tp&
operator*() const noexcept
-   { return this->_M_get(); }
+   { return this->_M_get()._M_t; }
 
template
- _Tp&
+ constexpr _Tp&
  _M_emplace_deref(const _Iter& __i)
  {
this->_M_reset();
-   // Using _Optional_base::_M_construct to initialize from '*__i'
-   // would incur an extra move due to the indirection, so we instead
-   // use placement new directly.
-   ::new ((void *) std::__addressof(this->_M_payload._M_payload)) 
_Tp(*__i);
+   // Use the special constructor of __cached<_Tp> that does *__i.
+   std::construct_at(std::__addressof(this->_M_payload._M_payload),
+ std::in_place, __cached<_Tp>::__deref, __i);
this->_M_payload._M_engaged = true;
-   return this->_M_get();
+   return **this;
  }
   };
 
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/join.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/join.cc
index 50af3fdf729..1ec42381ad2 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/join.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/join.cc
@@ -193,6 +193,18 @@ test11()
 ;
 }
 
+void
+test12()
+{
+  // PR libstdc++/101263
+  constexpr auto b = [] {
+auto r = std::views::iota(0, 5)
+  | std::views::lazy_split(0)
+  | std::views::join;
+return r.begin() != r.end();
+  }();
+}
+
 int
 main()
 {
@@ -207,4 +219,5 @@ main()
   test09();
   test10();
   test11();
+  test12();
 }


[committed] libstdc++: Add missing constexpr to std::variant (P2231R1)

2021-10-15 Thread Jonathan Wakely via Gcc-patches
This implements the changes in P2231R1 which make std::variant fully
constexpr in C++20.

We need to replace placement new with std::construct_at, but that isn't
defined for C++17. Use std::_Construct instead, which forwards to
std::construct_at in C++20 mode (since the related changes to make
std::optional fully constexpr, in r12-4389).

We also need to replace the untyped char buffer in _Uninitialized with a
union, which can be accessed in constexpr functions. But the union needs
to have a non-trivial destructor if its variant type is non-trivial,
which means that the _Variadic_union also needs a non-trivial
destructor. This adds a constrained partial specialization of
_Variadic_union for the C++20-only case where a non-trivial destructor
is needed.

We can't use concepts to constrain the specialization (or the primary
template's destructor) in C++17, so retain the untyped char buffer
solution for C++17 mode.

libstdc++-v3/ChangeLog:

* include/std/variant (__cpp_lib_variant): Update value for
C++20.
(__variant_cast, __variant_construct): Add constexpr for C++20.
(__variant_construct_single, __construct_by_index) Likewise. Use
std::_Construct instead of placement new.
(_Uninitialized) [__cplusplus >= 202002]: Replace
buffer with a union and define a destructor.
(_Variadic_union) [__cplusplus >= 202002]: Add a specialization
for non-trivial destruction.
(_Variant_storage::__index_of): New helper variable template.
(_Variant_storage::~_Variant_storage()): Add constexpr.
(_Variant_storage::_M_reset()): Likewise.
(_Copy_ctor_base, _Move_ctor_base): Likewise.
(_Copy_assign_base, _Move_assign_base): Likewise.
(variant, swap): Likewise.
* include/std/version (__cpp_lib_variant): Update value for
C++20.
* testsuite/20_util/optional/version.cc: Check for exact value
in C++17.
* testsuite/20_util/variant/87619.cc: Increase timeout for
C++20 mode.
* testsuite/20_util/variant/constexpr.cc: New test.
* testsuite/20_util/variant/version.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit ad820b0bb5f8342a8db2831d1f15c103583a3ba0
Author: Jonathan Wakely 
Date:   Thu Oct 14 13:27:03 2021

libstdc++: Add missing constexpr to std::variant (P2231R1)

This implements the changes in P2231R1 which make std::variant fully
constexpr in C++20.

We need to replace placement new with std::construct_at, but that isn't
defined for C++17. Use std::_Construct instead, which forwards to
std::construct_at in C++20 mode (since the related changes to make
std::optional fully constexpr, in r12-4389).

We also need to replace the untyped char buffer in _Uninitialized with a
union, which can be accessed in constexpr functions. But the union needs
to have a non-trivial destructor if its variant type is non-trivial,
which means that the _Variadic_union also needs a non-trivial
destructor. This adds a constrained partial specialization of
_Variadic_union for the C++20-only case where a non-trivial destructor
is needed.

We can't use concepts to constrain the specialization (or the primary
template's destructor) in C++17, so retain the untyped char buffer
solution for C++17 mode.

libstdc++-v3/ChangeLog:

* include/std/variant (__cpp_lib_variant): Update value for
C++20.
(__variant_cast, __variant_construct): Add constexpr for C++20.
(__variant_construct_single, __construct_by_index) Likewise. Use
std::_Construct instead of placement new.
(_Uninitialized) [__cplusplus >= 202002]: Replace
buffer with a union and define a destructor.
(_Variadic_union) [__cplusplus >= 202002]: Add a specialization
for non-trivial destruction.
(_Variant_storage::__index_of): New helper variable template.
(_Variant_storage::~_Variant_storage()): Add constexpr.
(_Variant_storage::_M_reset()): Likewise.
(_Copy_ctor_base, _Move_ctor_base): Likewise.
(_Copy_assign_base, _Move_assign_base): Likewise.
(variant, swap): Likewise.
* include/std/version (__cpp_lib_variant): Update value for
C++20.
* testsuite/20_util/optional/version.cc: Check for exact value
in C++17.
* testsuite/20_util/variant/87619.cc: Increase timeout for
C++20 mode.
* testsuite/20_util/variant/constexpr.cc: New test.
* testsuite/20_util/variant/version.cc: New test.

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index f49094130ee..d18365fde22 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -39,13 +39,14 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 

[committed] libstdc++: Remove try/catch overhead in std::variant::emplace

2021-10-15 Thread Jonathan Wakely via Gcc-patches
The __variant_construct_by_index helper function sets the new index
before constructing the new object. This means that if the construction
throws then the exception needs to be caught, so the index can be reset
to variant_npos, and then the exception rethrown. This means callers are
responsible for restoring the variant's invariants and they need the
overhead of a catch handler and a rethrow.

If we don't set the index until after construction completes then the
invariant is never broken, and callers can ignore the exception and let
it propagate. The callers all call _M_reset() first, which sets index to
variant_npos as required while the variant is valueless.

We need to be slightly careful here, because changing the order of
operations in __variant_construct_by_index and removing the try-block
from variant::emplace changes an implicit ABI contract between those
two functions. If the linker were to create an executable containing an
instantiation of the old __variant_construct_by_index and an
instantiation of the new variant::emplace code then we would have a
combination that breaks the invariant and doesn't have the exception
handling to restore it. To avoid this problem, we can rename the
__variant_construct_by_index function so that the new emplace code
calls a new symbol, and is unaffected by the behaviour of the old
symbol.

libstdc++-v3/ChangeLog:

* include/std/variant (__detail::__variant::__get_storage):
Remove unused function.
(__variant_construct_by_index): Set index after construction is
complete. Rename to ...
(__detail::__variant::__construct_by_index): ... this.
(variant): Use new name for __variant_construct_by_index friend
declaration. Remove __get_storage friend declaration.
(variant::emplace): Use new name and remove try-blocks.

Tested powerpc64le-linux. Committed to trunk.

commit e27771e5dcd8cf2cb757db6177a3485acd28b88f
Author: Jonathan Wakely 
Date:   Fri Oct 15 10:58:56 2021

libstdc++: Remove try/catch overhead in std::variant::emplace

The __variant_construct_by_index helper function sets the new index
before constructing the new object. This means that if the construction
throws then the exception needs to be caught, so the index can be reset
to variant_npos, and then the exception rethrown. This means callers are
responsible for restoring the variant's invariants and they need the
overhead of a catch handler and a rethrow.

If we don't set the index until after construction completes then the
invariant is never broken, and callers can ignore the exception and let
it propagate. The callers all call _M_reset() first, which sets index to
variant_npos as required while the variant is valueless.

We need to be slightly careful here, because changing the order of
operations in __variant_construct_by_index and removing the try-block
from variant::emplace changes an implicit ABI contract between those
two functions. If the linker were to create an executable containing an
instantiation of the old __variant_construct_by_index and an
instantiation of the new variant::emplace code then we would have a
combination that breaks the invariant and doesn't have the exception
handling to restore it. To avoid this problem, we can rename the
__variant_construct_by_index function so that the new emplace code
calls a new symbol, and is unaffected by the behaviour of the old
symbol.

libstdc++-v3/ChangeLog:

* include/std/variant (__detail::__variant::__get_storage):
Remove unused function.
(__variant_construct_by_index): Set index after construction is
complete. Rename to ...
(__detail::__variant::__construct_by_index): ... this.
(variant): Use new name for __variant_construct_by_index friend
declaration. Remove __get_storage friend declaration.
(variant::emplace): Use new name and remove try-blocks.

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 4a6826b7ba6..f49094130ee 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -1094,19 +1094,21 @@ namespace __variant
>;
 }
 
-} // namespace __variant
-} // namespace __detail
-
   template
-void __variant_construct_by_index(_Variant& __v, _Args&&... __args)
+inline void
+__construct_by_index(_Variant& __v, _Args&&... __args)
 {
-  __v._M_index = _Np;
   auto&& __storage = __detail::__variant::__get<_Np>(__v);
   ::new ((void*)std::addressof(__storage))
 remove_reference_t
  (std::forward<_Args>(__args)...);
+  // Construction didn't throw, so can set the new index now:
+  __v._M_index = _Np;
 }
 
+} // namespace __variant
+} // namespace __detail
+
   template
 constexpr bool
 holds_alternative(const variant<_Types...>& __v) noexcept
@@ 

[committed] libstdc++: Remove unused functions in std::variant implementation

2021-10-15 Thread Jonathan Wakely via Gcc-patches
These functions aren't used, and accessing the storage as a void* isn't
compatible with C++20 constexpr requirements anyway, so we're unlikely
to ever start using them in future.

libstdc++-v3/ChangeLog:

* include/std/variant (_Variant_storage::_M_storage()): Remove.
(__detail::__variant::__get_storage): Remove.
(variant): Remove friend declaration of __get_storage.

Tested powerpc64le-linux. Committed to trunk.

commit 1ba7adabf29eb671e418692fad076ea6edd08e3d
Author: Jonathan Wakely 
Date:   Fri Oct 15 11:52:08 2021

libstdc++: Remove unused functions in std::variant implementation

These functions aren't used, and accessing the storage as a void* isn't
compatible with C++20 constexpr requirements anyway, so we're unlikely
to ever start using them in future.

libstdc++-v3/ChangeLog:

* include/std/variant (_Variant_storage::_M_storage()): Remove.
(__detail::__variant::__get_storage): Remove.
(variant): Remove friend declaration of __get_storage.

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index b85a89d0b7b..4a6826b7ba6 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -434,13 +434,6 @@ namespace __variant
   ~_Variant_storage()
   { _M_reset(); }
 
-  void*
-  _M_storage() const noexcept
-  {
-   return const_cast(static_cast(
-   std::addressof(_M_u)));
-  }
-
   constexpr bool
   _M_valid() const noexcept
   {
@@ -472,13 +465,6 @@ namespace __variant
   void _M_reset() noexcept
   { _M_index = static_cast<__index_type>(variant_npos); }
 
-  void*
-  _M_storage() const noexcept
-  {
-   return const_cast(static_cast(
-   std::addressof(_M_u)));
-  }
-
   constexpr bool
   _M_valid() const noexcept
   {
@@ -809,11 +795,6 @@ namespace __variant
 : _FUN_type<_Tp, _Variant>
 { };
 
-  // Returns the raw storage for __v.
-  template
-void* __get_storage(_Variant&& __v) noexcept
-{ return __v._M_storage(); }
-
   template 
 struct _Extra_visit_slot_needed
 {
@@ -1690,10 +1671,6 @@ namespace __variant
friend constexpr decltype(auto)
__detail::__variant::__get(_Vp&& __v) noexcept;
 
-  template
-   friend void*
-   __detail::__variant::__get_storage(_Vp&& __v) noexcept;
-
 #define _VARIANT_RELATION_FUNCTION_TEMPLATE(__OP) \
   template \
friend constexpr bool \


[pushed] Darwin, D: Fix D bootstrap, include tm-dwarf2.h.

2021-10-15 Thread Iain Sandoe via Gcc-patches
After r12-4432-g7bfe7d634f60b0a9 Darwin fails to bootstrap with D
enabled since there is no definition of either DWARF2_DEBUG_INFO or
PREFERRED_DEBUGGING_TYPE.

Fixed here by adding the tm-dwarf2.h file to tm_d_file for Darwin.

tested on x86_64-darwin, pushed to master, thanks,
Iain

P.S. it is not obvious to me where Linux is getting the definition from.

Signed-off-by: Iain Sandoe 

gcc/ChangeLog:

* config.gcc: Add tm-dwarf2.h to tm_d-file.
---
 gcc/config.gcc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index aa5bd5d1459..3675e063a53 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -666,6 +666,7 @@ case ${target} in
 *-*-darwin*)
   tmake_file="t-darwin "
   tm_file="${tm_file} darwin.h"
+  tm_d_file="${tm_d_file} tm-dwarf2.h"
   darwin_os=`echo ${target} | sed 's/.*darwin\([0-9.]*\).*$/\1/'`
   darwin_maj=`expr "$darwin_os" : '\([0-9]*\).*'`
   macos_min=`expr "$darwin_os" : '[0-9]*\.\([0-9]*\).*'`
-- 
2.24.3 (Apple Git-128)



[COMMITTED] Ranger : Do not process abnormal ssa-names.

2021-10-15 Thread Andrew MacLeod via Gcc-patches

On 10/15/21 10:17 AM, Jeff Law wrote:




I don't want to push it quite yet as I wanted feedback to make sure 
we don't actually do anything I'm not aware of with SSA_NAMES which 
have the ABNORMAL_PHI flag set.  Most of the code i can find in VRP 
and vr-values appears to punt, so I presume not even considering 
those names is fine?


This also seems like something that might be worth back-porting, 
especially the hybrid pass parts...
Punting on the abnormals seems perfectly fine to me.  They rarely, if 
ever, provide information that improves optimization.


Jeff


pushed as commit  93ac832f1846e4867aa6537f76f510fab8e3e87d

Andrew



Re: [PATCH v2 12/14] arm: Convert more load/store MVE builtins to predicate qualifiers

2021-10-15 Thread Richard Sandiford via Gcc-patches
Christophe LYON  writes:
> On 15/10/2021 17:08, Richard Sandiford wrote:
>> Christophe Lyon via Gcc-patches  writes:
>>> This patch covers a few builtins where we do not use the 
>>> iterator and thus we cannot use .
>>>
>>> For v2di instructions, we use the V8BI mode for predicates.
>> Why V8BI though, when VPRED uses HI?
>
>
> Hmm.. I used your suggestion:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581362.html
>
> Maybe I misinterpreted it?

Gah!  Sorry, that was a typo for V2BI :-(

>> Would it make sense to define a V2BI?  Or doesn't that work?
>
> I didn't try, I'll have a look. wrt the architecture, I'm not sure what 
> this would mean?

Hmm, yeah, I guess V2BI is wrong because every 4th bit matters.
It would need to be V4BI instead.

I guess if the number of predicate elements doesn't match the number
of data elements, there's probably not much point using a dedicated
predicate mode.

So yeah, maybe keeping HImode is better after all.  Sorry for the
misdirection.

Richard


Re: [PATCH] Adjust testcase for O2 vectorization.

2021-10-15 Thread Martin Sebor via Gcc-patches

On 10/14/21 1:11 AM, liuhongt wrote:

Hi Kewen:
   Cound you help to verify if this patch fix those regressions
for rs6000 port.

As discussed in [1], this patch add xfail/target selector to those
testcases, also make a copy of them so that they can be tested w/o
vectorization.


Just to make sure I understand what's happening with the tests:
the new -N-novec.c tests consist of just the casses xfailed due
to vectorizartion in the corresponding -N.c tests?  Or are there
some other differences (e.g., new cases in them, etc.)?  I'd
hope to eventually remove the -novec.c tests once all warnings
behave as expected with vectorization as without it (maybe
keeping just one case both ways as a sanity check).

For the target-supports selectors, I confess I don't know enough
about vectorization to find their names quite intuitive enough
to know when to use each.  For instance, for vect_slp_v4qi_store:

+# Return the true if target support vectorization of v4qi store.
+proc check_effective_target_vect_slp_v4qi_store { } {
+set pattern {add new stmt: MEM }
+return [expr { [check_vect_slp_vnqihi_store_usage $pattern ] != 0 }]
+}

When should this selector be used?  In cases involving 4-byte
char stores?  Only naturally aligned 4-bytes stores (i.e., on
a 4 byte boundary, as the check_vect_slp_vnqihi_store_usage
suggests?) Or 4-byte stores of any types (e.g., four chars
as well as two 16-bit shorts), etc.?

Hopefully once all the warnings handle vectorization we won't
need to use them, but until then it would be good to document
this in more detail in the .exp file.

Finally, thank you for adding comments to the xfailed tests
referencing the corresponding bugs!  Can you please mention
the PR in the comment in each of the new xfails?  Like so:

index 7d29b5f48c7..cb687c69324 100644
--- a/gcc/testsuite/c-c++-common/Wstringop-overflow-2.c
+++ b/gcc/testsuite/c-c++-common/Wstringop-overflow-2.c
@@ -189,8 +189,9 @@ void ga1__ (void)

   struct A1 a = { 1 };
   a.a[0] = 0;
+  // O2 vectorization regress Wstringop-overflow case (1), refer to 
pr102462.
   a.a[1] = 1;// { dg-warning 
"\\\[-Wstringop-overflow" }
-  a.a[2] = 2;// { dg-warning 
"\\\[-Wstringop-overflow" "" { xfail { i?86-*-* x86_64-*-* } } }
+  a.a[2] = 2;// { dg-warning 
"\\\[-Wstringop-overflow" "pr102462" { xfail { vect_slp_v2qi_store } } }

   
   PR in dg-warning comment.

This should make it easier to deal with the XFAILs once
the warnings have improved to handle vectorization.

Martin


Re: [PATCH v2 12/14] arm: Convert more load/store MVE builtins to predicate qualifiers

2021-10-15 Thread Christophe LYON via Gcc-patches



On 15/10/2021 17:08, Richard Sandiford wrote:

Christophe Lyon via Gcc-patches  writes:

This patch covers a few builtins where we do not use the 
iterator and thus we cannot use .

For v2di instructions, we use the V8BI mode for predicates.

Why V8BI though, when VPRED uses HI?



Hmm.. I used your suggestion:

https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581362.html

Maybe I misinterpreted it?




Would it make sense to define a V2BI?  Or doesn't that work?


I didn't try, I'll have a look. wrt the architecture, I'm not sure what 
this would mean?



Thanks,


Christophe




Thanks,
Richard


2021-10-13  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (STRSBS_P_QUALIFIERS): Use predicate
qualifier.
(STRSBU_P_QUALIFIERS): Likewise.
(LDRGBS_Z_QUALIFIERS): Likewise.
(LDRGBU_Z_QUALIFIERS): Likewise.
(LDRGBWBXU_Z_QUALIFIERS): Likewise.
(LDRGBWBS_Z_QUALIFIERS): Likewise.
(LDRGBWBU_Z_QUALIFIERS): Likewise.
(STRSBWBS_P_QUALIFIERS): Likewise.
(STRSBWBU_P_QUALIFIERS): Likewise.
* config/arm/mve.md: Use VxBI instead of HI.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 06ff9d2278a..e58580bb828 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -738,13 +738,13 @@ arm_strss_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
  static enum arm_type_qualifiers
  arm_strsbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
= { qualifier_void, qualifier_unsigned, qualifier_immediate,
-  qualifier_none, qualifier_unsigned};
+  qualifier_none, qualifier_predicate};
  #define STRSBS_P_QUALIFIERS (arm_strsbs_p_qualifiers)
  
  static enum arm_type_qualifiers

  arm_strsbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
= { qualifier_void, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned, qualifier_unsigned};
+  qualifier_unsigned, qualifier_predicate};
  #define STRSBU_P_QUALIFIERS (arm_strsbu_p_qualifiers)
  
  static enum arm_type_qualifiers

@@ -780,13 +780,13 @@ arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
  static enum arm_type_qualifiers
  arm_ldrgbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
= { qualifier_none, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
  #define LDRGBS_Z_QUALIFIERS (arm_ldrgbs_z_qualifiers)
  
  static enum arm_type_qualifiers

  arm_ldrgbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
= { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
  #define LDRGBU_Z_QUALIFIERS (arm_ldrgbu_z_qualifiers)
  
  static enum arm_type_qualifiers

@@ -826,7 +826,7 @@ arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
  static enum arm_type_qualifiers
  arm_ldrgbwbxu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
= { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
  #define LDRGBWBXU_Z_QUALIFIERS (arm_ldrgbwbxu_z_qualifiers)
  
  static enum arm_type_qualifiers

@@ -842,13 +842,13 @@ arm_ldrgbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
  static enum arm_type_qualifiers
  arm_ldrgbwbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
= { qualifier_none, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
  #define LDRGBWBS_Z_QUALIFIERS (arm_ldrgbwbs_z_qualifiers)
  
  static enum arm_type_qualifiers

  arm_ldrgbwbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
= { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
  #define LDRGBWBU_Z_QUALIFIERS (arm_ldrgbwbu_z_qualifiers)
  
  static enum arm_type_qualifiers

@@ -864,13 +864,13 @@ arm_strsbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
  static enum arm_type_qualifiers
  arm_strsbwbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
= { qualifier_unsigned, qualifier_unsigned, qualifier_const,
-  qualifier_none, qualifier_unsigned};
+  qualifier_none, qualifier_predicate};
  #define STRSBWBS_P_QUALIFIERS (arm_strsbwbs_p_qualifiers)
  
  static enum arm_type_qualifiers

  arm_strsbwbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
= { qualifier_unsigned, qualifier_unsigned, qualifier_const,
-  qualifier_unsigned, qualifier_unsigned};
+  qualifier_unsigned, qualifier_predicate};
  #define STRSBWBU_P_QUALIFIERS (arm_strsbwbu_p_qualifiers)
  
  static enum arm_type_qualifiers

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 81ad488155d..c07487c0750 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -7282,7 +7282,7 @@ (define_insn "mve_vstrwq_scatter_base_p_v4si"
[(match_operand:V4SI 0 "s_register_operand" "w")
 (match_operand:SI 1 "immediate_operand" "i")
 (match_operand:V4SI 2 "s_register_operand" "w")
-(match_operand:HI 3 "vpr_register_operand" "Up")]
+(match_operand:V4BI 3 

Re: [PATCH] Fix handling of flag_rename_registers.

2021-10-15 Thread Martin Liška

On 10/14/21 16:27, Jeff Law wrote:

So what's the preferred way to handle this?  We're in the process of evaluating 
-frename-registers on our target right now and subject to verification of a 
couple issues, our inclination is to turn it on for our target at -O2.

Jeff


I think the best approach is doing that in TARGET_OPTION_OPTIMIZATION_TABLE 
like c6x does:

static const struct default_options c6x_option_optimization_table[] =
  {
{ OPT_LEVELS_1_PLUS, OPT_frename_registers, NULL, 1 },
...
}

Cheers,
Martin


Re: [PATCH] rs6000: Remove unnecessary option manipulation.

2021-10-15 Thread Martin Liška

On 10/14/21 17:10, Bill Schmidt via Gcc-patches wrote:

Looks like you got your parentheses wrong here.


Whoops, thanks for the heads up.

I'm testing this fixed version.

P.S. Next time, please CC me.

Thanks,
MartinFrom cd9891ec3eed3a5b289b7c556598606d21e48206 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 13 Oct 2021 14:20:33 +0200
Subject: [PATCH] rs6000: Remove unnecessary option manipulation.

gcc/ChangeLog:

	* config/rs6000/rs6000.c (rs6000_override_options_after_change):
	Do not set flag_rename_registers, it's already default behavior.
	Use EnabledBy for unroll_only_small_loops.
	* config/rs6000/rs6000.opt: Use EnabledBy for
	unroll_only_small_loops.
---
 gcc/config/rs6000/rs6000.c   | 7 +--
 gcc/config/rs6000/rs6000.opt | 2 +-
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 01a95591a5d..b9dddcd0aa1 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3472,13 +3472,8 @@ rs6000_override_options_after_change (void)
   /* Explicit -funroll-loops turns -munroll-only-small-loops off, and
  turns -frename-registers on.  */
   if ((OPTION_SET_P (flag_unroll_loops) && flag_unroll_loops)
-   || (OPTION_SET_P (flag_unroll_all_loops)
-	   && flag_unroll_all_loops))
+   || (OPTION_SET_P (flag_unroll_all_loops) && flag_unroll_all_loops))
 {
-  if (!OPTION_SET_P (unroll_only_small_loops))
-	unroll_only_small_loops = 0;
-  if (!OPTION_SET_P (flag_rename_registers))
-	flag_rename_registers = 1;
   if (!OPTION_SET_P (flag_cunroll_grow_size))
 	flag_cunroll_grow_size = 1;
 }
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..faeb7423ca7 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -546,7 +546,7 @@ Target Undocumented Var(rs6000_optimize_swaps) Init(1) Save
 Analyze and remove doubleword swaps from VSX computations.
 
 munroll-only-small-loops
-Target Undocumented Var(unroll_only_small_loops) Init(0) Save
+Target Undocumented Var(unroll_only_small_loops) Init(0) Save EnabledBy(funroll-loops)
 ; Use conservative small loop unrolling.
 
 mpower9-misc
-- 
2.33.0



[committed] openmp: Improve testsuite/libgomp.c/affinity-1.c testcase

2021-10-15 Thread Jakub Jelinek via Gcc-patches
Hi!

I've noticed that while I have added hopefully sufficient test coverage
for the case where one uses simple number or !number as p-interval,
I haven't added any coverage for number:len:stride or number:len.

This patch adds that.

Tested on x86_64-linux and i686-linux, committed to trunk.

2021-10-15  Jakub Jelinek  

* testsuite/libgomp.c/affinity-1.c (struct places): Change name field
type from char [50] to const char *.
(places_array): Add a testcase for simplified syntax place followed
by length or length and stride.

--- libgomp/testsuite/libgomp.c/affinity-1.c.jj 2021-10-15 15:58:51.660135285 
+0200
+++ libgomp/testsuite/libgomp.c/affinity-1.c2021-10-15 17:08:13.638249421 
+0200
@@ -48,7 +48,7 @@ struct place
 };
 struct places
 {
-  char name[50];
+  const char *name;
   int count;
   struct place places[8];
 } places_array[] = {
@@ -63,7 +63,8 @@ struct places
   { 4, 1 }, { 5, 1 }, { 6, 1 }, { 7, 1 } } },
   { "{0,1},{3,2,4},{6,5,!6},{6},{7:2:-1,!6}", 5,
 { { 0, 2 }, { 2, 3 }, { 5, 1 }, { 6, 1 }, { 7, 1 } } },
-  { "1,2,{2,3,!2},3,3,!3,!{5:3:-1,!4,!5},{4},5,!4,!5", 3,
+  { "1,2,{2,3,!2},3,3,!3,!{5:3:-1,!4,!5},{4},5,!4,!5,"
+"1:2,!{1},!2,7:3:-2,!{5},!7,!3", 3,
 { { 1, 1 }, { 2, 1 }, { 3, 1 } } }
 };
 

Jakub



Re: [PATCH] options: Fix variable tracking option processing.

2021-10-15 Thread Martin Liška

All right, and there's second part that moves the code
from toplev.c to opts.c (finish_options) as I've done in the original version.

The patch also handles PR102766 where nvptx.c target sets:
debug_nonbind_markers_p = 0;

So the easiest approach is marking the flag as set in global_options_set,
I haven't found a better approach :/ Reason is that nvptx_option_override
is called before finish_options.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
MartinFrom 827647ab23b8bec9d20094e009341598e12a644b Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 14 Oct 2021 14:57:18 +0200
Subject: [PATCH] options: Fix variable tracking option processing.

	PR debug/102585
	PR bootstrap/102766

gcc/ChangeLog:

	* opts.c (finish_options): Process flag_var_tracking* options
	here as they can be adjusted by optimize attribute.
	* toplev.c (process_options): Remove it here.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr102585.c: New test.
---
 gcc/config/nvptx/nvptx.c|  1 +
 gcc/opts.c  | 19 +++
 gcc/testsuite/gcc.dg/pr102585.c |  6 ++
 gcc/toplev.c| 21 +
 4 files changed, 27 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr102585.c

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 951252e598a..1e4b26381c5 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -219,6 +219,7 @@ nvptx_option_override (void)
 flag_toplevel_reorder = 1;
 
   debug_nonbind_markers_p = 0;
+  OPTION_SET_P (debug_nonbind_markers_p) = 1;
 
   /* Set flag_no_common, unless explicitly disabled.  We fake common
  using .weak, and that's not entirely accurate, so avoid it
diff --git a/gcc/opts.c b/gcc/opts.c
index 65fe192a198..bf3a81c287e 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1349,6 +1349,25 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set,
 SET_OPTION_IF_UNSET (opts, opts_set, flag_vect_cost_model,
 			 VECT_COST_MODEL_CHEAP);
 
+  /* One could use EnabledBy, but it would lead to a circular dependency.  */
+  if (!OPTION_SET_P (flag_var_tracking_uninit))
+ flag_var_tracking_uninit = flag_var_tracking;
+
+  if (!OPTION_SET_P (flag_var_tracking_assignments))
+flag_var_tracking_assignments
+  = (flag_var_tracking
+	 && !(flag_selective_scheduling || flag_selective_scheduling2));
+
+  if (flag_var_tracking_assignments_toggle)
+flag_var_tracking_assignments = !flag_var_tracking_assignments;
+
+  if (flag_var_tracking_assignments && !flag_var_tracking)
+flag_var_tracking = flag_var_tracking_assignments = -1;
+
+  if (flag_var_tracking_assignments
+  && (flag_selective_scheduling || flag_selective_scheduling2))
+warning_at (loc, 0,
+		"var-tracking-assignments changes selective scheduling");
 }
 
 #define LEFT_COLUMN	27
diff --git a/gcc/testsuite/gcc.dg/pr102585.c b/gcc/testsuite/gcc.dg/pr102585.c
new file mode 100644
index 000..efd066b4a4e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr102585.c
@@ -0,0 +1,6 @@
+/* PR debug/102585 */
+/* { dg-do compile } */
+/* { dg-options "-fvar-tracking-assignments -fno-var-tracking" } */
+
+#pragma GCC optimize 0
+void d_demangle_callback_Og() { int c = 0; }
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 4f574a5aad3..7c0467948f2 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1494,6 +1494,7 @@ process_options (bool no_backend)
 	}
   flag_var_tracking = 0;
   flag_var_tracking_uninit = 0;
+  flag_var_tracking_assignments = 0;
 }
 
   /* The debug hooks are used to implement -fdump-go-spec because it
@@ -1502,26 +1503,6 @@ process_options (bool no_backend)
   if (flag_dump_go_spec != NULL)
 debug_hooks = dump_go_spec_init (flag_dump_go_spec, debug_hooks);
 
-  /* One could use EnabledBy, but it would lead to a circular dependency.  */
-  if (!OPTION_SET_P (flag_var_tracking_uninit))
- flag_var_tracking_uninit = flag_var_tracking;
-
-  if (!OPTION_SET_P (flag_var_tracking_assignments))
-flag_var_tracking_assignments
-  = (flag_var_tracking
-	 && !(flag_selective_scheduling || flag_selective_scheduling2));
-
-  if (flag_var_tracking_assignments_toggle)
-flag_var_tracking_assignments = !flag_var_tracking_assignments;
-
-  if (flag_var_tracking_assignments && !flag_var_tracking)
-flag_var_tracking = flag_var_tracking_assignments = -1;
-
-  if (flag_var_tracking_assignments
-  && (flag_selective_scheduling || flag_selective_scheduling2))
-warning_at (UNKNOWN_LOCATION, 0,
-		"var-tracking-assignments changes selective scheduling");
-
   if (!OPTION_SET_P (debug_nonbind_markers_p))
 debug_nonbind_markers_p
   = (optimize
-- 
2.33.0



Re: [PATCH]AArch64 Lower intrinsics shift to GIMPLE when possible.

2021-10-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Friday, October 15, 2021 1:26 PM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov 
>> Subject: Re: [PATCH]AArch64 Lower intrinsics shift to GIMPLE when possible.
>> 
>> Tamar Christina  writes:
>> > Hi All,
>> >
>> > This lowers shifts to GIMPLE when the C interpretations of the shift
>> > operations matches that of AArch64.
>> >
>> > In C shifting right by BITSIZE is undefined, but the behavior is
>> > defined in AArch64.  Additionally negative shifts lefts are undefined
>> > in C but defined for the register variant of the instruction (SSHL, USHL) 
>> > as
>> being right shifts.
>> >
>> > Since we have a right shift by immediate I rewrite those cases into
>> > right shifts
>> >
>> > So:
>> >
>> > int64x1_t foo3 (int64x1_t a)
>> > {
>> >   return vshl_s64 (a, vdup_n_s64(-6)); }
>> >
>> > produces:
>> >
>> > foo3:
>> > sshrd0, d0, 6
>> > ret
>> >
>> > instead of:
>> >
>> > foo3:
>> > mov x0, -6
>> > fmovd1, x0
>> > sshld0, d0, d1
>> > ret
>> >
>> > This behavior isn't specifically mentioned for a left shift by
>> > immediate, but I believe that only the case because we do have a right
>> > shift by immediate but not a right shift by register.  As such I do the 
>> > same
>> for left shift by immediate.
>> >
>> > The testsuite already has various testcases for shifts (vshl.c etc) so
>> > I am not adding overlapping tests here.
>> >
>> > Out of range shifts like
>> >
>> > int64x1_t foo3 (int64x1_t a)
>> > {
>> >   return vshl_s64 (a, vdup_n_s64(80)); }
>> >
>> > now get optimized to 0 as well along with undefined behaviors both in
>> > C and AArch64.
>> 
>> The SSHL results are well-defined for all shift amounts, so we shouldn't
>> convert them to undefined gimple, even as a temporary step.  E.g.:
>> 
>> int32x4_t foo(int32x4_t x) {
>>   return vshlq_s32(x, vdupq_n_s32(256)); }
>> 
>> should fold to “x” (if we fold it at all).  Similarly:
>> 
>> int32x4_t foo(int32x4_t x) {
>>   return vshlq_s32(x, vdupq_n_s32(257)); }
>> 
>> should fold to x << 1 (again if we fold it at all).
>> 
>> For a shift right:
>> 
>> int32x4_t foo(int32x4_t x) {
>>   return vshlq_s32(x, vdupq_n_s32(-64)); }
>> 
>> is equivalent to:
>> 
>> int32x4_t foo(int32x4_t x) {
>>   return vshrq_n_s32(x, 31);
>> }
>> 
>> and so it shouldn't fold to 0.
>
> And here I thought I had read the specs very carefully...
>
> I will punt on  them because I don't think those ranged are common at all.

Sounds good.

There were other review comments further down the message (I should have
been clearer about that, sorry).  Could you have a look at those too?

Thanks,
Richard

> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
> b/gcc/config/aarch64/aarch64-builtins.c
> index 
> f6b41d9c200d6300dee65ba60ae94488231a8a38..568775cb8effaf51a692ba12af99e9865d2cf8a3
>  100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -2394,6 +2394,68 @@ aarch64_general_gimple_fold_builtin (unsigned int 
> fcode, gcall *stmt)
>  1, args[0]);
>   gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
>   break;
> +  BUILTIN_VSDQ_I_DI (BINOP, ashl, 3, NONE)
> + {
> +   tree cst = args[1];
> +   tree ctype = TREE_TYPE (cst);
> +   if (INTEGRAL_TYPE_P (ctype)
> +   && TREE_CODE (cst) == INTEGER_CST)
> + {
> +   wide_int wcst = wi::to_wide (cst);
> +   if (wi::neg_p (wcst, TYPE_SIGN (ctype)))
> + new_stmt =
> +   gimple_build_assign (gimple_call_lhs (stmt),
> +RSHIFT_EXPR, args[0],
> +wide_int_to_tree (ctype,
> +  wi::abs (wcst)));
> +   else
> + new_stmt =
> +   gimple_build_assign (gimple_call_lhs (stmt),
> +LSHIFT_EXPR, args[0], args[1]);
> + }
> + }
> + break;
> +  BUILTIN_VSDQ_I_DI (BINOP, sshl, 0, NONE)
> +  BUILTIN_VSDQ_I_DI (BINOP_UUS, ushl, 0, NONE)
> + {
> +   tree cst = args[1];
> +   tree ctype = TREE_TYPE (cst);
> +   HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE (TREE_TYPE 
> (args[0])));
> +   if (INTEGRAL_TYPE_P (ctype)
> +   && TREE_CODE (cst) == INTEGER_CST)
> + {
> +   wide_int wcst = wi::to_wide (cst);
> +   wide_int abs_cst = wi::abs (wcst);
> +   if (wi::geu_p (abs_cst, bits))
> + break;
> +
> +   if (wi::neg_p (wcst, TYPE_SIGN (ctype)))
> + new_stmt =
> +   gimple_build_assign (gimple_call_lhs (stmt),
> +  

[committed] hppa: Consistently use "rG" constraint for copy instruction in move patterns

2021-10-15 Thread John David Anglin

Some move patterns on hppa use the "rG" constraint and some just use the "r" 
constraint
for the copy instruction.  This patch makes all the move patterns consistent.  
It causes
a copy of register %r0 to always be used to zero a register.

There's no functional change since there are multiple ways to zero integer 
registers.

Tested on hppa2.0w-hp-hpux11.11 and hppa64-hp-hpux11.11.

Committed to active branches.

Dave
---
Consistently use "rG" constraint for copy instruction in move patterns

2021-10-15  John David Anglin  

gcc/ChangeLog:

* config/pa/pa.md: Consistently use "rG" constraint for copy
instruction in move patterns.

diff --git a/gcc/config/pa/pa.md b/gcc/config/pa/pa.md
index 13a25381b6d..5cda3b79933 100644
--- a/gcc/config/pa/pa.md
+++ b/gcc/config/pa/pa.md
@@ -2186,14 +2186,14 @@
   [(set (match_operand:SI 0 "move_dest_operand"
  "=r,r,r,r,r,r,Q,!*q,!r,!*f,*f,T,?r,?*f")
(match_operand:SI 1 "move_src_operand"
- "A,r,J,N,K,RQ,rM,!rM,!*q,!*fM,RT,*f,*f,r"))]
+ "A,rG,J,N,K,RQ,rM,!rM,!*q,!*fM,RT,*f,*f,r"))]
   "(register_operand (operands[0], SImode)
 || reg_or_0_operand (operands[1], SImode))
&& !TARGET_SOFT_FLOAT
&& !TARGET_64BIT"
   "@
ldw RT'%A1,%0
-   copy %1,%0
+   copy %r1,%0
ldi %1,%0
ldil L'%1,%0
{zdepi|depwi,z} %Z1,%0
@@ -2214,14 +2214,14 @@
   [(set (match_operand:SI 0 "move_dest_operand"
  "=r,r,r,r,r,r,Q,!*q,!r,!*f,*f,T")
(match_operand:SI 1 "move_src_operand"
- "A,r,J,N,K,RQ,rM,!rM,!*q,!*fM,RT,*f"))]
+ "A,rG,J,N,K,RQ,rM,!rM,!*q,!*fM,RT,*f"))]
   "(register_operand (operands[0], SImode)
 || reg_or_0_operand (operands[1], SImode))
&& !TARGET_SOFT_FLOAT
&& TARGET_64BIT"
   "@
ldw RT'%A1,%0
-   copy %1,%0
+   copy %r1,%0
ldi %1,%0
ldil L'%1,%0
{zdepi|depwi,z} %Z1,%0
@@ -2240,14 +2240,14 @@
   [(set (match_operand:SI 0 "move_dest_operand"
  "=r,r,r,r,r,r,Q,!*q,!r")
(match_operand:SI 1 "move_src_operand"
- "A,r,J,N,K,RQ,rM,!rM,!*q"))]
+ "A,rG,J,N,K,RQ,rM,!rM,!*q"))]
   "(register_operand (operands[0], SImode)
 || reg_or_0_operand (operands[1], SImode))
&& TARGET_SOFT_FLOAT
&& TARGET_64BIT"
   "@
ldw RT'%A1,%0
-   copy %1,%0
+   copy %r1,%0
ldi %1,%0
ldil L'%1,%0
{zdepi|depwi,z} %Z1,%0
@@ -2381,13 +2381,13 @@
   [(set (match_operand:SI 0 "move_dest_operand"
  "=r,r,r,r,r,r,Q,!*q,!r")
(match_operand:SI 1 "move_src_operand"
- "A,r,J,N,K,RQ,rM,!rM,!*q"))]
+ "A,rG,J,N,K,RQ,rM,!rM,!*q"))]
   "(register_operand (operands[0], SImode)
 || reg_or_0_operand (operands[1], SImode))
&& TARGET_SOFT_FLOAT"
   "@
ldw RT'%A1,%0
-   copy %1,%0
+   copy %r1,%0
ldi %1,%0
ldil L'%1,%0
{zdepi|depwi,z} %Z1,%0
@@ -2909,11 +2909,11 @@
   [(set (match_operand:HI 0 "move_dest_operand"
  "=r,r,r,r,r,Q,!*q,!r")
(match_operand:HI 1 "move_src_operand"
- "r,J,N,K,RQ,rM,!rM,!*q"))]
+ "rG,J,N,K,RQ,rM,!rM,!*q"))]
   "(register_operand (operands[0], HImode)
 || reg_or_0_operand (operands[1], HImode))"
   "@
-   copy %1,%0
+   copy %r1,%0
ldi %1,%0
ldil L'%1,%0
{zdepi|depwi,z} %Z1,%0
@@ -3069,11 +3069,11 @@
   [(set (match_operand:QI 0 "move_dest_operand"
  "=r,r,r,r,r,Q,!*q,!r")
(match_operand:QI 1 "move_src_operand"
- "r,J,N,K,RQ,rM,!rM,!*q"))]
+ "rG,J,N,K,RQ,rM,!rM,!*q"))]
   "(register_operand (operands[0], QImode)
 || reg_or_0_operand (operands[1], QImode))"
   "@
-   copy %1,%0
+   copy %r1,%0
ldi %1,%0
ldil L'%1,%0
{zdepi|depwi,z} %Z1,%0
@@ -4047,12 +4047,12 @@
   [(set (match_operand:DF 0 "move_dest_operand"
  "=!*r,*r,*r,*r,*r,Q,f,f,T")
(match_operand:DF 1 "move_src_operand"
- "!*r,J,N,K,RQ,*rG,fG,RT,f"))]
+ "!*rG,J,N,K,RQ,*rG,fG,RT,f"))]
   "(register_operand (operands[0], DFmode)
 || reg_or_0_operand (operands[1], DFmode))
&& !TARGET_SOFT_FLOAT && TARGET_64BIT"
   "@
-   copy %1,%0
+   copy %r1,%0
ldi %1,%0
ldil L'%1,%0
depdi,z %z1,%0
@@ -4069,12 +4069,12 @@
   [(set (match_operand:DF 0 "move_dest_operand"
  "=!*r,*r,*r,*r,*r,Q")
(match_operand:DF 1 "move_src_operand"
- "!*r,J,N,K,RQ,*rG"))]
+ "!*rG,J,N,K,RQ,*rG"))]
   "(register_operand (operands[0], DFmode)
 || reg_or_0_operand (operands[1], DFmode))
&& TARGET_SOFT_FLOAT && TARGET_64BIT"
   "@
-   copy %1,%0
+   copy %r1,%0
ldi %1,%0
ldil L'%1,%0
depdi,z %z1,%0
@@ -4221,13 +4221,13 @@
   [(set (match_operand:DI 0 "move_dest_operand"
   

Re: [PATCH v2 00/14] ARM/MVE use vectors of boolean for predicates

2021-10-15 Thread Richard Sandiford via Gcc-patches
Christophe Lyon via Gcc-patches  writes:
> This is v2 of this patch series, addressing the comments I received.
> The changes v1 -> v2 are:
>
> - Patch 3: added an executable test, and updated
>   check_effective_target_arm_mve_hw
> - Patch 4: split into patch 4 and patch 14 (to keep numbering the same
>   for the other patches)
> - Patch 5: updated arm_class_likely_spilled_p as suggested.
> - Patch 7: updated test_vector_ops_duplicate in simplify-rtx.c as
>   suggested.
> - Patch 8: added V2DI -> HI/hi mapping in MVE_VPRED/MVE_vpred
>   iterators, removed now useless mve_vpselq_v2di, and fixed
>   mov expander.
> - Patch 9: arm_mode_to_pred_mode now returns opt_machine_mode, removed
>   useless floating-point checks in vec_cmpu.
> - Patch 12: replaced hi with v8bi in v2di load/store instructions
>
> I'll squash patch 2 with patch patch 9 and patch 3 with patch 8.

This looks good to me part from the question in 12/14 and the couple
of other (very) minor nits.

Thanks,
Richard

> Original text:
>
> This patch series addresses PR 100757 and 101325 by representing
> vectors of predicates (MVE VPR.P0 register) as vectors of booleans
> rather than using HImode.
>
> As this implies a lot of mostly mechanical changes, I have tried to
> split the patches in a way that should help reviewers, but the split
> is a bit artificial.
>
> Patches 1-3 add new tests.
>
> Patches 4-6 are small independent improvements.
>
> Patch 7 implements the predicate qualifier, but does not change any
> builtin yet.
>
> Patch 8 is the first of the two main patches, and uses the new
> qualifier to describe the vcmp and vpsel builtins that are useful for
> auto-vectorization of comparisons.
>
> Patch 9 is the second main patch, which fixes the vcond_mask expander.
>
> Patches 10-13 convert almost all the remaining builtins with HI
> operands to use the predicate qualifier.  After these, there are still
> a few builtins with HI operands left, about which I am not sure: vctp,
> vpnot, load-gather and store-scatter with v2di operands.  In fact,
> patches 11/12 update some STR/LDR qualifiers in a way that breaks
> these v2di builtins although existing tests still pass.
>
> Christophe Lyon (14):
>   arm: Add new tests for comparison vectorization with Neon and MVE
>   arm: Add tests for PR target/100757
>   arm: Add tests for PR target/101325
>   arm: Add GENERAL_AND_VPR_REGS regclass
>   arm: Add support for VPR_REG in arm_class_likely_spilled_p
>   arm: Fix mve_vmvnq_n_ argument mode
>   arm: Implement MVE predicates as vectors of booleans
>   arm: Implement auto-vectorized MVE comparisons with vectors of boolean
> predicates
>   arm: Fix vcond_mask expander for MVE (PR target/100757)
>   arm: Convert remaining MVE vcmp builtins to predicate qualifiers
>   arm: Convert more MVE builtins to predicate qualifiers
>   arm: Convert more load/store MVE builtins to predicate qualifiers
>   arm: Convert more MVE/CDE builtins to predicate qualifiers
>   arm: Add VPR_REG to ALL_REGS
>
>  gcc/config/arm/arm-builtins.c | 228 +++--
>  gcc/config/arm/arm-modes.def  |   5 +
>  gcc/config/arm/arm-protos.h   |   3 +-
>  gcc/config/arm/arm-simd-builtin-types.def |   4 +
>  gcc/config/arm/arm.c  | 130 ++-
>  gcc/config/arm/arm.h  |   5 +-
>  gcc/config/arm/arm_mve_builtins.def   | 746 
>  gcc/config/arm/iterators.md   |   5 +
>  gcc/config/arm/mve.md | 832 ++
>  gcc/config/arm/neon.md|  39 +
>  gcc/config/arm/vec-common.md  |  52 --
>  gcc/simplify-rtx.c|  26 +-
>  .../arm/acle/cde-mve-full-assembly.c  | 264 +++---
>  .../gcc.target/arm/simd/mve-vcmp-f32-2.c  |  32 +
>  .../gcc.target/arm/simd/neon-compare-1.c  |  78 ++
>  .../gcc.target/arm/simd/neon-compare-2.c  |  13 +
>  .../gcc.target/arm/simd/neon-compare-3.c  |  14 +
>  .../arm/simd/neon-compare-scalar-1.c  |  57 ++
>  .../gcc.target/arm/simd/neon-vcmp-f16.c   |  12 +
>  .../gcc.target/arm/simd/neon-vcmp-f32-2.c |  15 +
>  .../gcc.target/arm/simd/neon-vcmp-f32-3.c |  12 +
>  .../gcc.target/arm/simd/neon-vcmp-f32.c   |  12 +
>  gcc/testsuite/gcc.target/arm/simd/neon-vcmp.c |  22 +
>  .../gcc.target/arm/simd/pr100757-2.c  |  20 +
>  .../gcc.target/arm/simd/pr100757-3.c  |  20 +
>  .../gcc.target/arm/simd/pr100757-4.c  |  19 +
>  gcc/testsuite/gcc.target/arm/simd/pr100757.c  |  19 +
>  .../gcc.target/arm/simd/pr101325-2.c  |  19 +
>  gcc/testsuite/gcc.target/arm/simd/pr101325.c  |  14 +
>  gcc/testsuite/lib/target-supports.exp |   3 +-
>  30 files changed, 1611 insertions(+), 1109 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
>  create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
>  create mode 100644 

Re: [PATCH v2 12/14] arm: Convert more load/store MVE builtins to predicate qualifiers

2021-10-15 Thread Richard Sandiford via Gcc-patches
Christophe Lyon via Gcc-patches  writes:
> This patch covers a few builtins where we do not use the 
> iterator and thus we cannot use .
>
> For v2di instructions, we use the V8BI mode for predicates.

Why V8BI though, when VPRED uses HI?

Would it make sense to define a V2BI?  Or doesn't that work?

Thanks,
Richard

>
> 2021-10-13  Christophe Lyon  
>
>   gcc/
>   PR target/100757
>   PR target/101325
>   * config/arm/arm-builtins.c (STRSBS_P_QUALIFIERS): Use predicate
>   qualifier.
>   (STRSBU_P_QUALIFIERS): Likewise.
>   (LDRGBS_Z_QUALIFIERS): Likewise.
>   (LDRGBU_Z_QUALIFIERS): Likewise.
>   (LDRGBWBXU_Z_QUALIFIERS): Likewise.
>   (LDRGBWBS_Z_QUALIFIERS): Likewise.
>   (LDRGBWBU_Z_QUALIFIERS): Likewise.
>   (STRSBWBS_P_QUALIFIERS): Likewise.
>   (STRSBWBU_P_QUALIFIERS): Likewise.
>   * config/arm/mve.md: Use VxBI instead of HI.
>
> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
> index 06ff9d2278a..e58580bb828 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -738,13 +738,13 @@ arm_strss_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  static enum arm_type_qualifiers
>  arm_strsbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_void, qualifier_unsigned, qualifier_immediate,
> -  qualifier_none, qualifier_unsigned};
> +  qualifier_none, qualifier_predicate};
>  #define STRSBS_P_QUALIFIERS (arm_strsbs_p_qualifiers)
>  
>  static enum arm_type_qualifiers
>  arm_strsbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_void, qualifier_unsigned, qualifier_immediate,
> -  qualifier_unsigned, qualifier_unsigned};
> +  qualifier_unsigned, qualifier_predicate};
>  #define STRSBU_P_QUALIFIERS (arm_strsbu_p_qualifiers)
>  
>  static enum arm_type_qualifiers
> @@ -780,13 +780,13 @@ arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  static enum arm_type_qualifiers
>  arm_ldrgbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_none, qualifier_unsigned, qualifier_immediate,
> -  qualifier_unsigned};
> +  qualifier_predicate};
>  #define LDRGBS_Z_QUALIFIERS (arm_ldrgbs_z_qualifiers)
>  
>  static enum arm_type_qualifiers
>  arm_ldrgbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
> -  qualifier_unsigned};
> +  qualifier_predicate};
>  #define LDRGBU_Z_QUALIFIERS (arm_ldrgbu_z_qualifiers)
>  
>  static enum arm_type_qualifiers
> @@ -826,7 +826,7 @@ arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  static enum arm_type_qualifiers
>  arm_ldrgbwbxu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
> -  qualifier_unsigned};
> +  qualifier_predicate};
>  #define LDRGBWBXU_Z_QUALIFIERS (arm_ldrgbwbxu_z_qualifiers)
>  
>  static enum arm_type_qualifiers
> @@ -842,13 +842,13 @@ arm_ldrgbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  static enum arm_type_qualifiers
>  arm_ldrgbwbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_none, qualifier_unsigned, qualifier_immediate,
> -  qualifier_unsigned};
> +  qualifier_predicate};
>  #define LDRGBWBS_Z_QUALIFIERS (arm_ldrgbwbs_z_qualifiers)
>  
>  static enum arm_type_qualifiers
>  arm_ldrgbwbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
> -  qualifier_unsigned};
> +  qualifier_predicate};
>  #define LDRGBWBU_Z_QUALIFIERS (arm_ldrgbwbu_z_qualifiers)
>  
>  static enum arm_type_qualifiers
> @@ -864,13 +864,13 @@ arm_strsbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>  static enum arm_type_qualifiers
>  arm_strsbwbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_unsigned, qualifier_unsigned, qualifier_const,
> -  qualifier_none, qualifier_unsigned};
> +  qualifier_none, qualifier_predicate};
>  #define STRSBWBS_P_QUALIFIERS (arm_strsbwbs_p_qualifiers)
>  
>  static enum arm_type_qualifiers
>  arm_strsbwbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
>= { qualifier_unsigned, qualifier_unsigned, qualifier_const,
> -  qualifier_unsigned, qualifier_unsigned};
> +  qualifier_unsigned, qualifier_predicate};
>  #define STRSBWBU_P_QUALIFIERS (arm_strsbwbu_p_qualifiers)
>  
>  static enum arm_type_qualifiers
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 81ad488155d..c07487c0750 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -7282,7 +7282,7 @@ (define_insn "mve_vstrwq_scatter_base_p_v4si"
>   [(match_operand:V4SI 0 "s_register_operand" "w")
>(match_operand:SI 1 "immediate_operand" "i")
>(match_operand:V4SI 2 "s_register_operand" "w")
> -  (match_operand:HI 3 "vpr_register_operand" "Up")]
> +  (match_operand:V4BI 3 "vpr_register_operand" "Up")]
>VSTRWSBQ))
>]
>"TARGET_HAVE_MVE"
> @@ -7371,7 +7371,7 @@ (define_insn "mve_vldrwq_gather_base_z_v4si"
>[(set (match_operand:V4SI 0 

RE: [PATCH]AArch64 Lower intrinsics shift to GIMPLE when possible.

2021-10-15 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Richard Sandiford 
> Sent: Friday, October 15, 2021 1:26 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH]AArch64 Lower intrinsics shift to GIMPLE when possible.
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > This lowers shifts to GIMPLE when the C interpretations of the shift
> > operations matches that of AArch64.
> >
> > In C shifting right by BITSIZE is undefined, but the behavior is
> > defined in AArch64.  Additionally negative shifts lefts are undefined
> > in C but defined for the register variant of the instruction (SSHL, USHL) as
> being right shifts.
> >
> > Since we have a right shift by immediate I rewrite those cases into
> > right shifts
> >
> > So:
> >
> > int64x1_t foo3 (int64x1_t a)
> > {
> >   return vshl_s64 (a, vdup_n_s64(-6)); }
> >
> > produces:
> >
> > foo3:
> > sshrd0, d0, 6
> > ret
> >
> > instead of:
> >
> > foo3:
> > mov x0, -6
> > fmovd1, x0
> > sshld0, d0, d1
> > ret
> >
> > This behavior isn't specifically mentioned for a left shift by
> > immediate, but I believe that only the case because we do have a right
> > shift by immediate but not a right shift by register.  As such I do the same
> for left shift by immediate.
> >
> > The testsuite already has various testcases for shifts (vshl.c etc) so
> > I am not adding overlapping tests here.
> >
> > Out of range shifts like
> >
> > int64x1_t foo3 (int64x1_t a)
> > {
> >   return vshl_s64 (a, vdup_n_s64(80)); }
> >
> > now get optimized to 0 as well along with undefined behaviors both in
> > C and AArch64.
> 
> The SSHL results are well-defined for all shift amounts, so we shouldn't
> convert them to undefined gimple, even as a temporary step.  E.g.:
> 
> int32x4_t foo(int32x4_t x) {
>   return vshlq_s32(x, vdupq_n_s32(256)); }
> 
> should fold to “x” (if we fold it at all).  Similarly:
> 
> int32x4_t foo(int32x4_t x) {
>   return vshlq_s32(x, vdupq_n_s32(257)); }
> 
> should fold to x << 1 (again if we fold it at all).
> 
> For a shift right:
> 
> int32x4_t foo(int32x4_t x) {
>   return vshlq_s32(x, vdupq_n_s32(-64)); }
> 
> is equivalent to:
> 
> int32x4_t foo(int32x4_t x) {
>   return vshrq_n_s32(x, 31);
> }
> 
> and so it shouldn't fold to 0.

And here I thought I had read the specs very carefully...

I will punt on  them because I don't think those ranged are common at all.


Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index 
f6b41d9c200d6300dee65ba60ae94488231a8a38..568775cb8effaf51a692ba12af99e9865d2cf8a3
 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -2394,6 +2394,68 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, 
gcall *stmt)
   1, args[0]);
gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
break;
+  BUILTIN_VSDQ_I_DI (BINOP, ashl, 3, NONE)
+   {
+ tree cst = args[1];
+ tree ctype = TREE_TYPE (cst);
+ if (INTEGRAL_TYPE_P (ctype)
+ && TREE_CODE (cst) == INTEGER_CST)
+   {
+ wide_int wcst = wi::to_wide (cst);
+ if (wi::neg_p (wcst, TYPE_SIGN (ctype)))
+   new_stmt =
+ gimple_build_assign (gimple_call_lhs (stmt),
+  RSHIFT_EXPR, args[0],
+  wide_int_to_tree (ctype,
+wi::abs (wcst)));
+ else
+   new_stmt =
+ gimple_build_assign (gimple_call_lhs (stmt),
+  LSHIFT_EXPR, args[0], args[1]);
+   }
+   }
+   break;
+  BUILTIN_VSDQ_I_DI (BINOP, sshl, 0, NONE)
+  BUILTIN_VSDQ_I_DI (BINOP_UUS, ushl, 0, NONE)
+   {
+ tree cst = args[1];
+ tree ctype = TREE_TYPE (cst);
+ HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE (TREE_TYPE 
(args[0])));
+ if (INTEGRAL_TYPE_P (ctype)
+ && TREE_CODE (cst) == INTEGER_CST)
+   {
+ wide_int wcst = wi::to_wide (cst);
+ wide_int abs_cst = wi::abs (wcst);
+ if (wi::geu_p (abs_cst, bits))
+   break;
+
+ if (wi::neg_p (wcst, TYPE_SIGN (ctype)))
+   new_stmt =
+ gimple_build_assign (gimple_call_lhs (stmt),
+  RSHIFT_EXPR, args[0],
+  wide_int_to_tree (ctype, abs_cst));
+ else
+   new_stmt =
+ gimple_build_assign (gimple_call_lhs (stmt),
+  LSHIFT_EXPR, args[0], args[1]);
+   }
+   }
+   

Re: [PATCH v2 09/14] arm: Fix vcond_mask expander for MVE (PR target/100757)

2021-10-15 Thread Richard Sandiford via Gcc-patches
Christophe Lyon via Gcc-patches  writes:
> @@ -31086,36 +31087,20 @@ arm_expand_vector_compare (rtx target, rtx_code 
> code, rtx op0, rtx op1,
>  case NE:
>if (TARGET_HAVE_MVE)
>   {
> -   rtx vpr_p0;
> -   if (vcond_mve)
> - vpr_p0 = target;
> -   else
> - vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
> -
> switch (GET_MODE_CLASS (cmp_mode))
>   {
>   case MODE_VECTOR_INT:
> -   emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg 
> (cmp_mode, op1)));
> +   emit_insn (gen_mve_vcmpq (code, cmp_mode, target, op0, force_reg 
> (cmp_mode, op1)));

Pre-existing nit: long line.  Same for later calls in the same function.

Richard

> break;
>   case MODE_VECTOR_FLOAT:
> if (TARGET_HAVE_MVE_FLOAT)
> - emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, 
> force_reg (cmp_mode, op1)));
> + emit_insn (gen_mve_vcmpq_f (code, cmp_mode, target, op0, 
> force_reg (cmp_mode, op1)));
> else
>   gcc_unreachable ();
> break;
>   default:
> gcc_unreachable ();
>   }
> -
> -   /* If we are not expanding a vcond, build the result here.  */
> -   if (!vcond_mve)
> - {
> -   rtx zero = gen_reg_rtx (cmp_result_mode);
> -   rtx one = gen_reg_rtx (cmp_result_mode);
> -   emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
> -   emit_move_insn (one, CONST1_RTX (cmp_result_mode));
> -   emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, 
> one, zero, vpr_p0));
> - }
>   }
>else
>   emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1));
> @@ -31127,23 +31112,7 @@ arm_expand_vector_compare (rtx target, rtx_code 
> code, rtx op0, rtx op1,
>  case GEU:
>  case GTU:
>if (TARGET_HAVE_MVE)
> - {
> -   rtx vpr_p0;
> -   if (vcond_mve)
> - vpr_p0 = target;
> -   else
> - vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
> -
> -   emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg 
> (cmp_mode, op1)));
> -   if (!vcond_mve)
> - {
> -   rtx zero = gen_reg_rtx (cmp_result_mode);
> -   rtx one = gen_reg_rtx (cmp_result_mode);
> -   emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
> -   emit_move_insn (one, CONST1_RTX (cmp_result_mode));
> -   emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, 
> one, zero, vpr_p0));
> - }
> - }
> + emit_insn (gen_mve_vcmpq (code, cmp_mode, target, op0, force_reg 
> (cmp_mode, op1)));
>else
>   emit_insn (gen_neon_vc (code, cmp_mode, target,
>   op0, force_reg (cmp_mode, op1)));
> @@ -31154,23 +31123,7 @@ arm_expand_vector_compare (rtx target, rtx_code 
> code, rtx op0, rtx op1,
>  case LEU:
>  case LTU:
>if (TARGET_HAVE_MVE)
> - {
> -   rtx vpr_p0;
> -   if (vcond_mve)
> - vpr_p0 = target;
> -   else
> - vpr_p0 = gen_reg_rtx (arm_mode_to_pred_mode (cmp_mode));
> -
> -   emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, vpr_p0, 
> force_reg (cmp_mode, op1), op0));
> -   if (!vcond_mve)
> - {
> -   rtx zero = gen_reg_rtx (cmp_result_mode);
> -   rtx one = gen_reg_rtx (cmp_result_mode);
> -   emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
> -   emit_move_insn (one, CONST1_RTX (cmp_result_mode));
> -   emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, 
> one, zero, vpr_p0));
> - }
> - }
> + emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, target, 
> force_reg (cmp_mode, op1), op0));
>else
>   emit_insn (gen_neon_vc (swap_condition (code), cmp_mode,
>   target, force_reg (cmp_mode, op1), op0));
> @@ -31185,8 +31138,8 @@ arm_expand_vector_compare (rtx target, rtx_code code, 
> rtx op0, rtx op1,
>   rtx gt_res = gen_reg_rtx (cmp_result_mode);
>   rtx alt_res = gen_reg_rtx (cmp_result_mode);
>   rtx_code alt_code = (code == LTGT ? LT : LE);
> - if (arm_expand_vector_compare (gt_res, GT, op0, op1, true, vcond_mve)
> - || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true, 
> vcond_mve))
> + if (arm_expand_vector_compare (gt_res, GT, op0, op1, true)
> + || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true))
> gcc_unreachable ();
>   emit_insn (gen_rtx_SET (target, gen_rtx_IOR (cmp_result_mode,
>gt_res, alt_res)));
> @@ -31206,19 +31159,15 @@ arm_expand_vcond (rtx *operands, machine_mode 
> cmp_result_mode)
>  {
>/* When expanding for MVE, we do not want to emit a (useless) vpsel in
>   arm_expand_vector_compare, and another one here.  */
> -  bool vcond_mve=false;
>rtx mask;
>  
>  

Re: [PATCH v2 03/14] arm: Add tests for PR target/101325

2021-10-15 Thread Richard Sandiford via Gcc-patches
Christophe Lyon via Gcc-patches  writes:
> These tests are derived from the one provided in the PR: there is a
> compile-only test because I did not have access to anything that could
> execute MVE code until recently.
> I have been able to add an executable test since QEMU supports MVE.
>
> Instead of adding arm_v8_1m_mve_hw, I update arm_mve_hw so that it
> uses add_options_for_arm_v8_1m_mve_fp, like arm_neon_hw does.  This
> ensures arm_mve_hw passes even if the toolchain does not generate MVE
> code by default.
>
> 2021-10-13  Christophe Lyon  
>
>   gcc/testsuite/
>   PR target/101325
>   * gcc.target/arm/simd/pr101325.c: New.
>   * gcc.target/arm/simd/pr101325-2.c: New.
>   * lib/target-supports.exp (check_effective_target_arm_mve_hw): Use
>   add_options_for_arm_v8_1m_mve_fp.
>
> add executable test and update check_effective_target_arm_mve_hw
>
> diff --git a/gcc/testsuite/gcc.target/arm/simd/pr101325-2.c 
> b/gcc/testsuite/gcc.target/arm/simd/pr101325-2.c
> new file mode 100644
> index 000..7907a386385
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/pr101325-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target arm_mve_hw } */
> +/* { dg-options "-O3" } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +
> +#include 
> +
> +
> +__attribute((noinline,noipa))

Very minor, but: noinline is redundant with noipa.

Richard

> +unsigned foo(int8x16_t v, int8x16_t w)
> +{
> +  return vcmpeqq (v, w);
> +}
> +
> +int main(void)
> +{
> +  if (foo (vdupq_n_s8(0), vdupq_n_s8(0)) != 0xU)
> +__builtin_abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/simd/pr101325.c 
> b/gcc/testsuite/gcc.target/arm/simd/pr101325.c
> new file mode 100644
> index 000..a466683a0b1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/pr101325.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-add-options arm_v8_1m_mve } */
> +/* { dg-additional-options "-O3" } */
> +
> +#include 
> +
> +unsigned foo(int8x16_t v, int8x16_t w)
> +{
> +  return vcmpeqq (v, w);
> +}
> +/* { dg-final { scan-assembler {\tvcmp.i8  eq} } } */
> +/* { dg-final { scan-assembler {\tvmrs\t r[0-9]+, P0} } } */
> +/* { dg-final { scan-assembler {\tuxth} } } */
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index e030e4f376b..b0e35b602af 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -4889,6 +4889,7 @@ proc check_effective_target_arm_cmse_hw { } {
>   }
>  } "-mcmse -Wl,--section-start,.gnu.sgstubs=0x0040"]
>  }
> +
>  # Return 1 if the target supports executing MVE instructions, 0
>  # otherwise.
>  
> @@ -4904,7 +4905,7 @@ proc check_effective_target_arm_mve_hw {} {
>  : "0" (a), "r" (b));
> return (a != 2);
>   }
> -} ""]
> +} [add_options_for_arm_v8_1m_mve_fp ""]]
>  }
>  
>  # Return 1 if this is an ARM target where ARMv8-M Security Extensions with


[committed] openmp: Handle OpenMP 5.1 simplified OMP_PLACES syntax

2021-10-15 Thread Jakub Jelinek via Gcc-patches
Hi!

In addition to adding ll_caches and numa_domain abstract names
to OMP_PLACES syntax, OpenMP 5.1 also added one syntax simplification:
https://github.com/OpenMP/spec/issues/2080
https://github.com/OpenMP/spec/pull/2081
in particular that in the grammar place non-terminal is now
not only { res-list } but also res (i.e. a non-negative integer),
which stands as a shortcut for { res }
So, one can specify OMP_PLACES=0,4,8,12 with the meaning
OMP_PLACES={0},{4},{8},{12} or OMP_PLACES=0:4 instead of OMP_PLACES={0}:4
or OMP_PLACES={0},{1},{2},{3} etc.

This patch implements that.

Regtested on x86_64-linux and i686-linux, commited to trunk.

2021-10-15  Jakub Jelinek  

* env.c (parse_one_place): Handle non-negative-number the same
as { non-negative-number }.  Reject even !number:1 and
!number:1:stride or !place:1 or !place:1:stride instead of just
length other than 1.
* libgomp.texi (OpenMP 5.1): Document OMP_PLACES syntax extensions
and OMP_NUM_TEAMS/OMP_TEAMS_THREAD_LIMIT and
omp_{set_num,get_max}_teams/omp_{s,g}et_teams_thread_limit features
as implemented.
* testsuite/libgomp.c/affinity-1.c: Add a test for the 5.1 place
simplified syntax.

--- libgomp/env.c.jj2021-10-15 14:07:07.464919497 +0200
+++ libgomp/env.c   2021-10-15 15:29:33.051521024 +0200
@@ -546,6 +546,7 @@ parse_one_place (char **envp, bool *nega
   long stride = 1;
   int pass;
   bool any_negate = false;
+  bool has_braces = true;
   *negatep = false;
   while (isspace ((unsigned char) *env))
 ++env;
@@ -557,12 +558,28 @@ parse_one_place (char **envp, bool *nega
++env;
 }
   if (*env != '{')
-return false;
-  ++env;
-  while (isspace ((unsigned char) *env))
-++env;
+{
+  char *end;
+  unsigned long this_num;
+
+  errno = 0;
+  this_num = strtoul (env, , 10);
+  if (errno || end == env)
+   return false;
+  env = end - 1;
+  has_braces = false;
+  if (gomp_places_list
+ && !gomp_affinity_add_cpus (p, this_num, 1, 1, false))
+   return false;
+}
+  else
+{
+  ++env;
+  while (isspace ((unsigned char) *env))
+   ++env;
+}
   start = env;
-  for (pass = 0; pass < (any_negate ? 2 : 1); pass++)
+  for (pass = 0; pass < (any_negate ? 2 : has_braces); pass++)
 {
   env = start;
   do
@@ -590,6 +607,8 @@ parse_one_place (char **envp, bool *nega
  if (*env == ':')
{
  ++env;
+ if (this_negate)
+   return false;
  while (isspace ((unsigned char) *env))
++env;
  errno = 0;
@@ -612,8 +631,6 @@ parse_one_place (char **envp, bool *nega
++env;
}
}
- if (this_negate && this_len != 1)
-   return false;
  if (gomp_places_list && pass == this_negate)
{
  if (this_negate)
@@ -640,6 +657,8 @@ parse_one_place (char **envp, bool *nega
   if (*env == ':')
 {
   char *end;
+  if (*negatep)
+   return false;
   ++env;
   while (isspace ((unsigned char) *env))
++env;
@@ -663,8 +682,6 @@ parse_one_place (char **envp, bool *nega
++env;
}
 }
-  if (*negatep && len != 1)
-return false;
   *envp = env;
   *lenp = len;
   *stridep = stride;
--- libgomp/libgomp.texi.jj 2021-10-14 22:03:52.007889926 +0200
+++ libgomp/libgomp.texi2021-10-15 15:47:33.791920057 +0200
@@ -309,7 +309,7 @@ The OpenMP 4.5 specification is fully su
 @item @code{present} argument to @code{defaultmap} clause @tab N @tab
 @item @code{omp_set_num_teams}, @code{omp_set_teams_thread_limit},
   @code{omp_get_max_teams}, @code{omp_get_teams_thread_limit} runtime
-  routines @tab N @tab
+  routines @tab Y @tab
 @item @code{omp_target_is_accessible} runtime routine @tab N @tab
 @item @code{omp_target_memcpy_async} and @code{omp_target_memcpy_rect_async}
   runtime routines @tab N @tab
@@ -328,9 +328,9 @@ The OpenMP 4.5 specification is fully su
   @code{ompt_callback_target_emi_t}, @code{ompt_callback_target_map_emi_t}
   and @code{ompt_callback_target_submit_emi_t} @tab N @tab
 @item @code{ompt_callback_error_t} type @tab N @tab
-@item @code{OMP_PLACES} syntax was extension @tab N @tab
+@item @code{OMP_PLACES} syntax extensions @tab Y @tab
 @item @code{OMP_NUM_TEAMS} and @code{OMP_TEAMS_THREAD_LIMIT} environment
-  variables @tab N @tab
+  variables @tab Y @tab
 @end multitable
 
 @unnumberedsubsec Other new OpenMP 5.1 features
--- libgomp/testsuite/libgomp.c/affinity-1.c.jj 2021-08-12 20:37:12.702473673 
+0200
+++ libgomp/testsuite/libgomp.c/affinity-1.c2021-10-15 15:13:06.712762372 
+0200
@@ -48,7 +48,7 @@ struct place
 };
 struct places
 {
-  char name[40];
+  char name[50];
   int count;
   struct place places[8];
 } places_array[] = {
@@ -62,7 +62,9 @@ struct places
 { { 1, 1 }, { 2, 1 }, { 3, 1 },
   { 

[committed] openmp: Fix up strtoul and strtoull uses in libgomp

2021-10-15 Thread Jakub Jelinek via Gcc-patches
Hi!

Yesterday when working on numa_domains, I've noticed because of a bug
in my patch a hang on a large NUMA machine.  I've fixed the bug, but
also discovered that the hang was a result of making wrong assumptions
about strtoul/strtoull.  All the uses were for portability setting
errno = 0 before the calls and treating non-zero errno after the call
as invalid input, but for the case where there are no valid digits at
all strtoul may set errno to EINVAL, but doesn't have to and with
glibc doesn't do that.  So, this patch goes through all the strtoul calls
and next to errno != 0 checks adds also endptr == startptr check.
Haven't done it in places where we immediately reject strtoul returning 0
the same as we reject errno != 0, because strtoul must return 0 in the
case where it sets endptr to the start pointer.  In some spots the code
was using errno = 0; x = strtoul (p, , 10); if (errno) { /*invalid*/ }
and those spots had to be changed to
errno = 0; x = strtoul (p, , 10); if (errno || end == p) { /*invalid*/ }
p = end;

Regtested on x86_64-linux and i686-linux, committed to trunk.

2021-10-15  Jakub Jelinek  

* env.c (parse_schedule): For strtoul or strtoull calls which don't
clearly reject return value 0 as invalid handle the case where end
pointer is the same as first argument as invalid.
(parse_unsigned_long_1): Likewise.
(parse_one_place): Likewise.
(parse_places_var): Likewise.
(parse_stacksize): Likewise.
(parse_spincount): Likewise.
(parse_affinity): Likewise.
(parse_gomp_openacc_dim): Likewise.  Avoid strict aliasing violation.
Make code valid C89.
* config/linux/affinity.c (gomp_affinity_find_last_cache_level):
For strtoul calls which don't clearly reject return value 0 as
invalid handle the case where end pointer is the same as first
argument as invalid.
(gomp_affinity_init_level_1): Likewise.
(gomp_affinity_init_numa_domains): Likewise.
* config/rtems/proc.c (parse_thread_pools): Likewise.

--- libgomp/env.c.jj2021-10-14 22:04:30.594333475 +0200
+++ libgomp/env.c   2021-10-15 14:07:07.464919497 +0200
@@ -183,7 +183,7 @@ parse_schedule (void)
 
   errno = 0;
   value = strtoul (env, , 10);
-  if (errno)
+  if (errno || end == env)
 goto invalid;
 
   while (isspace ((unsigned char) *end))
@@ -232,7 +232,7 @@ parse_unsigned_long_1 (const char *name,
 
   errno = 0;
   value = strtoul (env, , 10);
-  if (errno || (long) value <= 0 - allow_zero)
+  if (errno || end == env || (long) value <= 0 - allow_zero)
 goto invalid;
 
   while (isspace ((unsigned char) *end))
@@ -570,6 +570,7 @@ parse_one_place (char **envp, bool *nega
  unsigned long this_num, this_len = 1;
  long this_stride = 1;
  bool this_negate = (*env == '!');
+ char *end;
  if (this_negate)
{
  if (gomp_places_list)
@@ -580,9 +581,10 @@ parse_one_place (char **envp, bool *nega
}
 
  errno = 0;
- this_num = strtoul (env, , 10);
- if (errno)
+ this_num = strtoul (env, , 10);
+ if (errno || end == env)
return false;
+ env = end;
  while (isspace ((unsigned char) *env))
++env;
  if (*env == ':')
@@ -602,9 +604,10 @@ parse_one_place (char **envp, bool *nega
  while (isspace ((unsigned char) *env))
++env;
  errno = 0;
- this_stride = strtol (env, , 10);
- if (errno)
+ this_stride = strtol (env, , 10);
+ if (errno || end == env)
return false;
+ env = end;
  while (isspace ((unsigned char) *env))
++env;
}
@@ -636,6 +639,7 @@ parse_one_place (char **envp, bool *nega
 ++env;
   if (*env == ':')
 {
+  char *end;
   ++env;
   while (isspace ((unsigned char) *env))
++env;
@@ -651,9 +655,10 @@ parse_one_place (char **envp, bool *nega
  while (isspace ((unsigned char) *env))
++env;
  errno = 0;
- stride = strtol (env, , 10);
- if (errno)
+ stride = strtol (env, , 10);
+ if (errno || end == env)
return false;
+ env = end;
  while (isspace ((unsigned char) *env))
++env;
}
@@ -720,7 +725,7 @@ parse_places_var (const char *name, bool
 
  errno = 0;
  count = strtoul (env, , 10);
- if (errno)
+ if (errno || end == env)
goto invalid;
  env = end;
  while (isspace ((unsigned char) *env))
@@ -859,7 +864,7 @@ parse_stacksize (const char *name, unsig
 
   errno = 0;
   value = strtoul (env, , 10);
-  if (errno)
+  if (errno || end == env)
 goto invalid;
 
   while (isspace ((unsigned char) *end))
@@ -928,7 +933,7 @@ parse_spincount (const char 

[committed] openmp: Fix up handling of OMP_PLACES=threads(1)

2021-10-15 Thread Jakub Jelinek via Gcc-patches
Hi!

When writing the places-*.c tests, I've noticed that we mishandle threads
abstract name with specified num-places if num-places isn't a multiple of
number of hw threads in a core.  It then happily ignores the maximum count
and overwrites for the remaining hw threads in a core further places that
haven't been allocated.

Regtested on x86_64-linux and i686-linux, committed to trunk.

2021-10-15  Jakub Jelinek  

* config/linux/affinity.c (gomp_affinity_init_level_1): For level 1
after creating count places clean up and return immediately.
* testsuite/libgomp.c/places-6.c: New test.
* testsuite/libgomp.c/places-7.c: New test.
* testsuite/libgomp.c/places-8.c: New test.
* testsuite/libgomp.c/places-9.c: New test.
* testsuite/libgomp.c/places-10.c: New test.

--- libgomp/config/linux/affinity.c.jj  2021-10-14 22:04:30.595333461 +0200
+++ libgomp/config/linux/affinity.c 2021-10-15 13:20:19.561484351 +0200
@@ -338,8 +338,13 @@ gomp_affinity_init_level_1 (int level, i
  if (gomp_affinity_add_cpus (pl, first, 1, 0, true))
{
  CPU_CLR_S (first, gomp_cpuset_size, copy);
- if (level == 1)
-   gomp_places_list_len++;
+ if (level == 1
+ && ++gomp_places_list_len >= count)
+   {
+ fclose (f);
+ free (line);
+ return;
+   }
}
}
if (*p == ',')
--- libgomp/testsuite/libgomp.c/places-6.c.jj   2021-10-15 13:28:17.461582786 
+0200
+++ libgomp/testsuite/libgomp.c/places-6.c  2021-10-15 13:28:25.228470619 
+0200
@@ -0,0 +1,10 @@
+/* { dg-set-target-env-var OMP_PLACES "threads(1)" } */
+
+#include 
+
+int
+main ()
+{
+  omp_display_env (0);
+  return 0;
+}
--- libgomp/testsuite/libgomp.c/places-7.c.jj   2021-10-15 13:28:17.465582728 
+0200
+++ libgomp/testsuite/libgomp.c/places-7.c  2021-10-15 13:28:30.295397448 
+0200
@@ -0,0 +1,10 @@
+/* { dg-set-target-env-var OMP_PLACES "cores(1)" } */
+
+#include 
+
+int
+main ()
+{
+  omp_display_env (0);
+  return 0;
+}
--- libgomp/testsuite/libgomp.c/places-8.c.jj   2021-10-15 13:28:17.469582670 
+0200
+++ libgomp/testsuite/libgomp.c/places-8.c  2021-10-15 13:28:35.181326887 
+0200
@@ -0,0 +1,10 @@
+/* { dg-set-target-env-var OMP_PLACES "sockets(1)" } */
+
+#include 
+
+int
+main ()
+{
+  omp_display_env (0);
+  return 0;
+}
--- libgomp/testsuite/libgomp.c/places-9.c.jj   2021-10-15 13:28:17.473582613 
+0200
+++ libgomp/testsuite/libgomp.c/places-9.c  2021-10-15 13:28:39.913258548 
+0200
@@ -0,0 +1,10 @@
+/* { dg-set-target-env-var OMP_PLACES "ll_caches(1)" } */
+
+#include 
+
+int
+main ()
+{
+  omp_display_env (0);
+  return 0;
+}
--- libgomp/testsuite/libgomp.c/places-10.c.jj  2021-10-15 13:28:17.477582555 
+0200
+++ libgomp/testsuite/libgomp.c/places-10.c 2021-10-15 13:28:46.433164392 
+0200
@@ -0,0 +1,10 @@
+/* { dg-set-target-env-var OMP_PLACES "numa_domains(1)" } */
+
+#include 
+
+int
+main ()
+{
+  omp_display_env (0);
+  return 0;
+}

Jakub



Re: [PATCH] Allow early sets of SSE hard registers from standard_sse_constant_p

2021-10-15 Thread Uros Bizjak via Gcc-patches
On Fri, Oct 15, 2021 at 2:15 PM Roger Sayle  wrote:
>
>
> My previous patch, which was intended to reduce the differences seen by
> the combination of -march=cascadelake and -m32, has additionally found
> some more instances where this combination behaves differently to regular
> x86_64-pc-linux-gnu.  The middle-end always, and backends usually, use
> emit_move_insn to emit/expand move instructions allowing the backend
> control over placing things in constant pools, adding REG_EQUAL notes,
> and so on.  Several of the AVX512 built-in expanders bypass this logic,
> and instead generate moves directly using emit_insn(gen_rtx_SET (dst,src)).
>
> For example, i386-expand.c line 12004 contains:
>   for (i = 0; i < 8; i++)
> emit_insn (gen_rtx_SET (xmm_regs[i], const0_rtx));
>
> I suspect that in this case, loading of standard_sse_constant_p, my
> change to require loading of likely spilled hard registers via a
> pseudo is perhaps overly strict, so this patch/fix reallows these
> immediate constants values to be loaded directly prior to reload.
>
> If anyone notices a (SPEC benchmark) performance regression with
> this patch, I'll propose the more invasive fix to make more use of
> emit_move_insn in the backend (and revert this fix), but all things
> being equal it's best to leave things the way they previously were.
>
> This patch not only cures the regressions reported by Sunil's
> tester, but in combination with the previous patch now has 7 fewer
> unexpected failures in the testsuite with -m32 -march=cascadelake.
> This patch has also been tested with "make bootstrap" and
> "make -k check" on x86_64-pc-linux-gnu with no new failures.
>
> Ok for mainline?
> Sorry again for the temporary inconvenience.
>
>
> 2021-10-15  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.c (ix86_hardreg_mov_ok): For vector modes,
> allow standard_sse_constant_p immediate constants.

LGTM.

Thanks,
Uros.


Re: [PATCH] Allow fully resolving backward jump threading passes.

2021-10-15 Thread Jeff Law via Gcc-patches




On 10/15/2021 8:25 AM, Aldy Hernandez wrote:

This refactors the backward threader pass so that it can be called in
either fully resolving mode, or in classic mode where any unknowns
default to VARYING.  Doing so opens the door for
"pass_thread_jumps_full" which has the resolving bits set.

This pass has not been added to the pipeline, but with it in place, we
can now experiment with it to see how to reduce the number of
jump threaders.  The first suspect will probably be enabling fully
resolving in the backward threader pass immediately preceeding VRP2,
and removing the VRP2 threader pass.  Now that VRP and the backward
threader are sharing a solver, and most of the threads get handcuffed
by cancel_threads(), we should have a variety of scenarios to try.

In the process, I have cleaned up things to make it trivial to see
what the difference between the 3 variants are (early jump
threading, quick jump threading without resolving SSAs, and fully
resolving jump threading).  Since I moved stuff around, it's probably
easier to just look at the last section in tree-ssa-threadbackward to
see how it's all laid out.

No functional changes as the new pass hasn't been added to the
pipeline.

OK pending tests?

gcc/ChangeLog:

* tree-pass.h (make_pass_thread_jumps_full): New.
* tree-ssa-threadbackward.c (pass_thread_jumps::gate): Inline.
(try_thread_blocks): Add resolve and speed arguments.
(pass_thread_jumps::execute): Inline.
(do_early_thread_jumps): New.
(do_thread_jumps): New.
(make_pass_thread_jumps):
(pass_early_thread_jumps::gate): Inline.
(pass_early_thread_jumps::execute): Inline.
(class pass_thread_jumps_full): New.

OK.
jeff



[PATCH] Allow fully resolving backward jump threading passes.

2021-10-15 Thread Aldy Hernandez via Gcc-patches
This refactors the backward threader pass so that it can be called in
either fully resolving mode, or in classic mode where any unknowns
default to VARYING.  Doing so opens the door for
"pass_thread_jumps_full" which has the resolving bits set.

This pass has not been added to the pipeline, but with it in place, we
can now experiment with it to see how to reduce the number of
jump threaders.  The first suspect will probably be enabling fully
resolving in the backward threader pass immediately preceeding VRP2,
and removing the VRP2 threader pass.  Now that VRP and the backward
threader are sharing a solver, and most of the threads get handcuffed
by cancel_threads(), we should have a variety of scenarios to try.

In the process, I have cleaned up things to make it trivial to see
what the difference between the 3 variants are (early jump
threading, quick jump threading without resolving SSAs, and fully
resolving jump threading).  Since I moved stuff around, it's probably
easier to just look at the last section in tree-ssa-threadbackward to
see how it's all laid out.

No functional changes as the new pass hasn't been added to the
pipeline.

OK pending tests?

gcc/ChangeLog:

* tree-pass.h (make_pass_thread_jumps_full): New.
* tree-ssa-threadbackward.c (pass_thread_jumps::gate): Inline.
(try_thread_blocks): Add resolve and speed arguments.
(pass_thread_jumps::execute): Inline.
(do_early_thread_jumps): New.
(do_thread_jumps): New.
(make_pass_thread_jumps):
(pass_early_thread_jumps::gate): Inline.
(pass_early_thread_jumps::execute): Inline.
(class pass_thread_jumps_full): New.
---
 gcc/tree-pass.h   |   1 +
 gcc/tree-ssa-threadbackward.c | 178 --
 2 files changed, 108 insertions(+), 71 deletions(-)

diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 84477a47b88..d379769a943 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -407,6 +407,7 @@ extern gimple_opt_pass *make_pass_cd_dce (gcc::context 
*ctxt);
 extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_merge_phi (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_thread_jumps (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_thread_jumps_full (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_early_thread_jumps (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_split_crit_edges (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_laddress (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 8cc92a484fe..62f936a9651 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -114,7 +114,7 @@ private:
   static const edge UNREACHABLE_EDGE;
   // Set to TRUE if unknown SSA names along a path should be resolved
   // with the ranger.  Otherwise, unknown SSA names are assumed to be
-  // VARYING.  Setting to true more precise but slower.
+  // VARYING.  Setting to true is more precise but slower.
   bool m_resolve;
 };
 
@@ -925,47 +925,15 @@ back_threader_registry::register_path (const 
vec _path,
   return true;
 }
 
-namespace {
-
-const pass_data pass_data_thread_jumps =
-{
-  GIMPLE_PASS,
-  "thread",
-  OPTGROUP_NONE,
-  TV_TREE_SSA_THREAD_JUMPS,
-  ( PROP_cfg | PROP_ssa ),
-  0,
-  0,
-  0,
-  TODO_update_ssa,
-};
-
-class pass_thread_jumps : public gimple_opt_pass
-{
-public:
-  pass_thread_jumps (gcc::context *ctxt)
-: gimple_opt_pass (pass_data_thread_jumps, ctxt)
-  {}
-
-  opt_pass * clone (void) { return new pass_thread_jumps (m_ctxt); }
-  virtual bool gate (function *);
-  virtual unsigned int execute (function *);
-};
-
-bool
-pass_thread_jumps::gate (function *fun ATTRIBUTE_UNUSED)
-{
-  return flag_thread_jumps && flag_expensive_optimizations;
-}
-
-// Try to thread blocks in FUN.  Return TRUE if any jump thread paths were
-// registered.
+// Try to thread blocks in FUN.  RESOLVE is TRUE when fully resolving
+// unknown SSAs.  SPEED is TRUE when optimizing for speed.
+//
+// Return TRUE if any jump thread paths were registered.
 
 static bool
-try_thread_blocks (function *fun)
+try_thread_blocks (function *fun, bool resolve, bool speed)
 {
-  /* Try to thread each block with more than one successor.  */
-  back_threader threader (/*speed=*/true, /*resolve=*/false);
+  back_threader threader (speed, resolve);
   basic_block bb;
   FOR_EACH_BB_FN (bb, fun)
 {
@@ -975,24 +943,27 @@ try_thread_blocks (function *fun)
   return threader.thread_through_all_blocks (/*peel_loop_headers=*/true);
 }
 
-unsigned int
-pass_thread_jumps::execute (function *fun)
+static unsigned int
+do_early_thread_jumps (function *fun, bool resolve)
 {
-  loop_optimizer_init (LOOPS_HAVE_PREHEADERS | LOOPS_HAVE_SIMPLE_LATCHES);
+  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
 
-  bool changed = try_thread_blocks (fun);
+  try_thread_blocks (fun, resolve, /*speed=*/false);
 
   

Re: [PATCH] Ranger : Do not process abnormal ssa-names.

2021-10-15 Thread Jeff Law via Gcc-patches




On 10/15/2021 8:21 AM, Aldy Hernandez wrote:



On 10/15/21 3:50 PM, Andrew MacLeod wrote:
I've been looking at the pathological time issue ranger has with the 
testcase from, uh..  PR 97623 I think.  I've lost the details, 
but kept the file since it was showing unpleasant behaviour.


Most of the time is spent in callbacks from substitute_and_fold to 
value_on_edge()  dealing with PHI results and arguments. Turns out, 
its virtually all wasted time dealing with SSA_NAMES with the 
OCCURS_IN_ABNORMAL_PHI flag set..


This patch tells ranger not to consider any SSA_NAMEs which occur in 
abnormal PHIs.  This reduces the memory footprint of all the caches, 
and also has a ripple effect with the new threader code which uses 
the GORI exports and imports tables, making it faster as well as no 
ssa-name with the abnormal flag set will be entered into the tables.


That alone was not quite enough, as all the sheer volume of call 
backs still took time,  so I added checks in the value_of_* class of 
routines used by substitute_and_fold to indicate there is no constant 
value available for any SSA_NAME with that flag set.


On my x86_64 box, before this change, that test case looked like:

tree VRP   :   7.76 (  4%)   0.23 ( 5%)   
8.02 (  4%)   537k (  0%)
tree VRP threader  :   7.20 (  4%)   0.08 (  2%) 7.28 
( 4%)   392k (  0%)
tree Early VRP :  39.22 ( 22%)   0.07 (  2%) 
39.44 ( 22%)  1142k (  0%)


And with this patch , the results are:

  tree VRP   :   7.57 (  6%)   0.26 ( 5%)   
7.85 (  6%)   537k (  0%)
  tree VRP threader  :   0.62 (  0%)   0.02 ( 0%)   
0.65 (  0%)   392k (  0%)
  tree Early VRP :   4.00 (  3%)   0.01 ( 0%)   
4.03 (  3%)  1142k (  0%)


Which is a significant improvement, both for EVRP and the threader..

The patch adjusts the ranger folder, as well as the hybrid folder.

bootstrapped on x86_64-pc-linux-gnu with no regressions and no missed 
cases that I have been able to find.


I don't want to push it quite yet as I wanted feedback to make sure 
we don't actually do anything I'm not aware of with SSA_NAMES which 
have the ABNORMAL_PHI flag set.  Most of the code i can find in VRP 
and vr-values appears to punt, so I presume not even considering 
those names is fine?


The backward threader skips both edges with EDGE_ABNORMAL set as well 
as phi results to have SSA_NAME_OCCURS_IN_ABNORMAL_PHI.


The forward threader skips out on all abnormal edges as well.  It 
seems to even avoid threading through blocks where one of the 2 
outgoing edges is abnormal.  Dunno if this was an oversight, or just 
being extra careful.
Being extra careful.   I couldn't convince myself that copying a block 
with an abnormal edge (incoming or outgoing) was going to be reliably safe.


jeff



Re: [PATCH] Ranger : Do not process abnormal ssa-names.

2021-10-15 Thread Aldy Hernandez via Gcc-patches




On 10/15/21 3:50 PM, Andrew MacLeod wrote:
I've been looking at the pathological time issue ranger has with the 
testcase from, uh..  PR 97623 I think.  I've lost the details, but 
kept the file since it was showing unpleasant behaviour.


Most of the time is spent in callbacks from substitute_and_fold to 
value_on_edge()  dealing with PHI results and arguments.  Turns out, its 
virtually all wasted time dealing with SSA_NAMES with the 
OCCURS_IN_ABNORMAL_PHI flag set..


This patch tells ranger not to consider any SSA_NAMEs which occur in 
abnormal PHIs.  This reduces the memory footprint of all the caches, and 
also has a ripple effect with the new threader code which uses the GORI 
exports and imports tables, making it faster as well as no ssa-name with 
the abnormal flag set will be entered into the tables.


That alone was not quite enough, as all the sheer volume of call backs 
still took time,  so I added checks in the value_of_* class of routines 
used by substitute_and_fold to indicate there is no constant value 
available for any SSA_NAME with that flag set.


On my x86_64 box, before this change, that test case looked like:

tree VRP   :   7.76 (  4%)   0.23 ( 5%)   8.02 
(  4%)   537k (  0%)
tree VRP threader  :   7.20 (  4%)   0.08 (  2%) 7.28 ( 
4%)   392k (  0%)
tree Early VRP :  39.22 ( 22%)   0.07 (  2%) 39.44 ( 
22%)  1142k (  0%)


And with this patch , the results are:

  tree VRP   :   7.57 (  6%)   0.26 ( 5%)   7.85 
(  6%)   537k (  0%)
  tree VRP threader  :   0.62 (  0%)   0.02 ( 0%)   0.65 
(  0%)   392k (  0%)
  tree Early VRP :   4.00 (  3%)   0.01 ( 0%)   4.03 
(  3%)  1142k (  0%)


Which is a significant improvement, both for EVRP and the threader..

The patch adjusts the ranger folder, as well as the hybrid folder.

bootstrapped on x86_64-pc-linux-gnu with no regressions and no missed 
cases that I have been able to find.


I don't want to push it quite yet as I wanted feedback to make sure we 
don't actually do anything I'm not aware of with SSA_NAMES which have 
the ABNORMAL_PHI flag set.  Most of the code i can find in VRP and 
vr-values appears to punt, so I presume not even considering those names 
is fine?


The backward threader skips both edges with EDGE_ABNORMAL set as well as 
phi results to have SSA_NAME_OCCURS_IN_ABNORMAL_PHI.


The forward threader skips out on all abnormal edges as well.  It seems 
to even avoid threading through blocks where one of the 2 outgoing edges 
is abnormal.  Dunno if this was an oversight, or just being extra careful.


Anywh, at least from the threaders you're safe.

Aldy



Re: [PATCH] Ranger : Do not process abnormal ssa-names.

2021-10-15 Thread Jeff Law via Gcc-patches




On 10/15/2021 7:50 AM, Andrew MacLeod via Gcc-patches wrote:
I've been looking at the pathological time issue ranger has with the 
testcase from, uh..  PR 97623 I think.  I've lost the details, but 
kept the file since it was showing unpleasant behaviour.


Most of the time is spent in callbacks from substitute_and_fold to 
value_on_edge()  dealing with PHI results and arguments.  Turns out, 
its virtually all wasted time dealing with SSA_NAMES with the 
OCCURS_IN_ABNORMAL_PHI flag set..


This patch tells ranger not to consider any SSA_NAMEs which occur in 
abnormal PHIs.  This reduces the memory footprint of all the caches, 
and also has a ripple effect with the new threader code which uses the 
GORI exports and imports tables, making it faster as well as no 
ssa-name with the abnormal flag set will be entered into the tables.


That alone was not quite enough, as all the sheer volume of call backs 
still took time,  so I added checks in the value_of_* class of 
routines used by substitute_and_fold to indicate there is no constant 
value available for any SSA_NAME with that flag set.


On my x86_64 box, before this change, that test case looked like:

tree VRP   :   7.76 (  4%)   0.23 ( 5%) 8.02 
(  4%)   537k (  0%)
tree VRP threader  :   7.20 (  4%)   0.08 (  2%) 7.28 
(  4%)   392k (  0%)
tree Early VRP :  39.22 ( 22%)   0.07 (  2%) 39.44 
( 22%)  1142k (  0%)


And with this patch , the results are:

 tree VRP   :   7.57 (  6%)   0.26 ( 5%) 7.85 
(  6%)   537k (  0%)
 tree VRP threader  :   0.62 (  0%)   0.02 ( 0%) 0.65 
(  0%)   392k (  0%)
 tree Early VRP :   4.00 (  3%)   0.01 ( 0%) 4.03 
(  3%)  1142k (  0%)


Which is a significant improvement, both for EVRP and the threader..

The patch adjusts the ranger folder, as well as the hybrid folder.

bootstrapped on x86_64-pc-linux-gnu with no regressions and no missed 
cases that I have been able to find.


I don't want to push it quite yet as I wanted feedback to make sure we 
don't actually do anything I'm not aware of with SSA_NAMES which have 
the ABNORMAL_PHI flag set.  Most of the code i can find in VRP and 
vr-values appears to punt, so I presume not even considering those 
names is fine?


This also seems like something that might be worth back-porting, 
especially the hybrid pass parts...
Punting on the abnormals seems perfectly fine to me.  They rarely, if 
ever, provide information that improves optimization.


Jeff


[PATCH] Ranger : Do not process abnormal ssa-names.

2021-10-15 Thread Andrew MacLeod via Gcc-patches
I've been looking at the pathological time issue ranger has with the 
testcase from, uh..  PR 97623 I think.  I've lost the details, but 
kept the file since it was showing unpleasant behaviour.


Most of the time is spent in callbacks from substitute_and_fold to 
value_on_edge()  dealing with PHI results and arguments.  Turns out, its 
virtually all wasted time dealing with SSA_NAMES with the 
OCCURS_IN_ABNORMAL_PHI flag set..


This patch tells ranger not to consider any SSA_NAMEs which occur in 
abnormal PHIs.  This reduces the memory footprint of all the caches, and 
also has a ripple effect with the new threader code which uses the GORI 
exports and imports tables, making it faster as well as no ssa-name with 
the abnormal flag set will be entered into the tables.


That alone was not quite enough, as all the sheer volume of call backs 
still took time,  so I added checks in the value_of_* class of routines 
used by substitute_and_fold to indicate there is no constant value 
available for any SSA_NAME with that flag set.


On my x86_64 box, before this change, that test case looked like:

tree VRP   :   7.76 (  4%)   0.23 ( 5%)   8.02 
(  4%)   537k (  0%)
tree VRP threader  :   7.20 (  4%)   0.08 (  2%) 7.28 (  
4%)   392k (  0%)
tree Early VRP :  39.22 ( 22%)   0.07 (  2%) 39.44 ( 
22%)  1142k (  0%)


And with this patch , the results are:

 tree VRP   :   7.57 (  6%)   0.26 ( 5%)   7.85 
(  6%)   537k (  0%)
 tree VRP threader  :   0.62 (  0%)   0.02 ( 0%)   0.65 
(  0%)   392k (  0%)
 tree Early VRP :   4.00 (  3%)   0.01 ( 0%)   4.03 
(  3%)  1142k (  0%)


Which is a significant improvement, both for EVRP and the threader..

The patch adjusts the ranger folder, as well as the hybrid folder.

bootstrapped on x86_64-pc-linux-gnu with no regressions and no missed 
cases that I have been able to find.


I don't want to push it quite yet as I wanted feedback to make sure we 
don't actually do anything I'm not aware of with SSA_NAMES which have 
the ABNORMAL_PHI flag set.  Most of the code i can find in VRP and 
vr-values appears to punt, so I presume not even considering those names 
is fine?


This also seems like something that might be worth back-porting, 
especially the hybrid pass parts...


Andrew


>From 146744fcde6a67f759ffc4aa3e8340861e229829 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 7 Oct 2021 10:12:29 -0400
Subject: [PATCH] Ranger : Do not process abnormal ssa-names.

	* gimple-range-fold.h (gimple_range_ssa_p): Don't process names
	that occur in abnormal phis.
	* gimple-range.cc (gimple_ranger::range_on_edge): Return false for
	abnormal and EH edges.
	* gimple-ssa-evrp.c (rvrp_folder::value_of_expr): Ditto.
	(rvrp_folder::value_on_edge): Ditto.
	(rvrp_folder::value_of_stmt): Ditto.
	(hybrid_folder::value_of_expr): Ditto for ranger queries.
	(hybrid_folder::value_on_edge): Ditto.
	(hybrid_folder::value_of_stmt): Ditto.
	* value-query.cc (gimple_range_global): Always return a range if
	the type is supported.
---
 gcc/gimple-range-fold.h |  1 +
 gcc/gimple-range.cc |  4 
 gcc/gimple-ssa-evrp.c   | 39 ---
 gcc/value-query.cc  |  3 ++-
 4 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/gcc/gimple-range-fold.h b/gcc/gimple-range-fold.h
index bc0874b5f31..350e2c4e039 100644
--- a/gcc/gimple-range-fold.h
+++ b/gcc/gimple-range-fold.h
@@ -93,6 +93,7 @@ gimple_range_ssa_p (tree exp)
 {
   if (exp && TREE_CODE (exp) == SSA_NAME &&
   !SSA_NAME_IS_VIRTUAL_OPERAND (exp) &&
+  !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (exp) &&
   irange::supports_type_p (TREE_TYPE (exp)))
 return exp;
   return NULL_TREE;
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 6eb3f71bbd3..85ef9745593 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -180,6 +180,10 @@ gimple_ranger::range_on_edge (irange , edge e, tree name)
   int_range_max edge_range;
   gcc_checking_assert (irange::supports_type_p (TREE_TYPE (name)));
 
+  // Do not process values along abnormal or EH edges.
+  if (e->flags & (EDGE_ABNORMAL|EDGE_EH))
+return false;
+
   unsigned idx;
   if ((idx = tracer.header ("range_on_edge (")))
 {
diff --git a/gcc/gimple-ssa-evrp.c b/gcc/gimple-ssa-evrp.c
index 437f19471f1..7f2055501a0 100644
--- a/gcc/gimple-ssa-evrp.c
+++ b/gcc/gimple-ssa-evrp.c
@@ -137,6 +137,9 @@ public:
 
   tree value_of_expr (tree name, gimple *s = NULL) OVERRIDE
   {
+// Shortcircuit subst_and_fold callbacks for abnormal ssa_names.
+if (TREE_CODE (name) == SSA_NAME && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name))
+  return NULL;
 tree ret = m_ranger->value_of_expr (name, s);
 if (!ret && supported_pointer_equiv_p (name))
   ret = m_pta->get_equiv (name);
@@ -145,6 +148,9 @@ public:
 
   tree value_on_edge (edge e, tree name) OVERRIDE
   {
+// Shortcircuit subst_and_fold callbacks for 

[committed] amdgcn: fix up offload debug linking with LLVM 13

2021-10-15 Thread Andrew Stubbs
This is a follow-up to my previous LLVM13 support patches (the amdgcn 
port uses the LLVM assembler) to fix up a corner case.


With this patch one can now enable debug information in LLVM 13 offload 
binaries. This was trickier than you'd think because the different LLVM 
versions have different attribute interfaces and behaviours and when you 
fix one issue another issue pops up in another case. The root of the 
problem with debug is that mkoffload has to set the ELF flags on the 
early debug binary the same way the assembler will do in all supported 
cases, or else it won't link.


The only known remaining problem with LLVM 13 compatibility is an 
assembler error with kernels that use mapped variables. It only affects 
a few test cases in the testsuite.


LLVMs 10, 11, and 12 remain unsupported.

Andrewamdgcn: fix up offload debug linking with LLVM 13

Between LLVM 9 and LLVM 13 the attribute works differently in several ways,
and this needs to be allowed for in GCC and mkoffload independently.

This patch fixes up mkoffload when debug info is enabled, which is made more
complicated because the configure tests checks whether the attribute option
is accepted silently, but does not check if the assembler actually sets the
ELF flags for that attribute, and mkoffload needs to mimick that behaviour
exactly. The patch therefore removes some of the conditionals.

gcc/ChangeLog:

* config/gcn/gcn-hsa.h (S_FIJI): Set unconditionally.
(S_900): Likewise.
(S_906): Likewise.
* config/gcn/gcn.c: Hard code SRAM ECC settings for old architectures.
* config/gcn/mkoffload.c (ELFABIVERSION_AMDGPU_HSA): Rename to ...
(ELFABIVERSION_AMDGPU_HSA_V3): ... this.
(ELFABIVERSION_AMDGPU_HSA_V4): New.
(SET_SRAM_ECC_UNSUPPORTED): New.
(copy_early_debug_info): Create elf flags to match the other objects.
(main): Just let the attribute flags pass through.

diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index 6a432d17d99f..4fd2f07b9836 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -96,21 +96,10 @@ extern unsigned int gcn_local_sym_hash (const char *name);
 #define X_908 "march=gfx908:;"
 #endif
 
-#ifdef HAVE_GCN_SRAM_ECC_FIJI
-#define S_FIJI
-#else
+/* These targets can't have SRAM-ECC, even if a broken assembler allows it.  */
 #define S_FIJI "!march=*:;march=fiji:;"
-#endif
-#ifdef HAVE_GCN_SRAM_ECC_GFX900
-#define S_900
-#else
 #define S_900 "march=gfx900:;"
-#endif
-#ifdef HAVE_GCN_SRAM_ECC_GFX906
-#define S_906
-#else
 #define S_906 "march=gfx906:;"
-#endif
 #ifdef HAVE_GCN_SRAM_ECC_GFX908
 #define S_908
 #else
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 2e90f327c451..75a9c5766947 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -5226,27 +5226,21 @@ output_file_start (void)
 #ifndef HAVE_GCN_XNACK_FIJI
   use_xnack_attr = false;
 #endif
-#ifndef HAVE_GCN_SRAM_ECC_FIJI
   use_sram_attr = false;
-#endif
   break;
 case PROCESSOR_VEGA10:
   cpu = "gfx900";
 #ifndef HAVE_GCN_XNACK_GFX900
   use_xnack_attr = false;
 #endif
-#ifndef HAVE_GCN_SRAM_ECC_GFX900
   use_sram_attr = false;
-#endif
   break;
 case PROCESSOR_VEGA20:
   cpu = "gfx906";
 #ifndef HAVE_GCN_XNACK_GFX906
   use_xnack_attr = false;
 #endif
-#ifndef HAVE_GCN_SRAM_ECC_GFX906
   use_sram_attr = false;
-#endif
   break;
 case PROCESSOR_GFX908:
   cpu = "gfx908";
diff --git a/gcc/config/gcn/mkoffload.c b/gcc/config/gcn/mkoffload.c
index a3b22d059b96..b2e71ea5aa00 100644
--- a/gcc/config/gcn/mkoffload.c
+++ b/gcc/config/gcn/mkoffload.c
@@ -42,8 +42,10 @@
 
 #undef  ELFOSABI_AMDGPU_HSA
 #define ELFOSABI_AMDGPU_HSA 64
-#undef  ELFABIVERSION_AMDGPU_HSA
-#define ELFABIVERSION_AMDGPU_HSA 1
+#undef  ELFABIVERSION_AMDGPU_HSA_V3
+#define ELFABIVERSION_AMDGPU_HSA_V3 1
+#undef  ELFABIVERSION_AMDGPU_HSA_V4
+#define ELFABIVERSION_AMDGPU_HSA_V4 2
 
 #undef  EF_AMDGPU_MACH_AMDGCN_GFX803
 #define EF_AMDGPU_MACH_AMDGCN_GFX803 0x2a
@@ -77,6 +79,7 @@
 #define SET_SRAM_ECC_ON(VAR) VAR |= EF_AMDGPU_SRAM_ECC_V3
 #define SET_SRAM_ECC_ANY(VAR) SET_SRAM_ECC_ON (VAR)
 #define SET_SRAM_ECC_OFF(VAR) VAR &= ~EF_AMDGPU_SRAM_ECC_V3
+#define SET_SRAM_ECC_UNSUPPORTED(VAR) SET_SRAM_ECC_OFF (VAR)
 #define TEST_SRAM_ECC_ANY(VAR) 0 /* Not supported.  */
 #define TEST_SRAM_ECC_ON(VAR) (VAR & EF_AMDGPU_SRAM_ECC_V3)
 #endif
@@ -94,6 +97,9 @@
 | EF_AMDGPU_FEATURE_SRAMECC_ANY_V4)
 #define SET_SRAM_ECC_OFF(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_SRAMECC_V4) \
 | EF_AMDGPU_FEATURE_SRAMECC_OFF_V4)
+#define SET_SRAM_ECC_UNSUPPORTED(VAR) \
+  VAR = ((VAR & ~EF_AMDGPU_FEATURE_SRAMECC_V4) \
+| EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4)
 #define TEST_SRAM_ECC_ANY(VAR) ((VAR & EF_AMDGPU_FEATURE_SRAMECC_V4) \
== EF_AMDGPU_FEATURE_SRAMECC_ANY_V4)
 #define TEST_SRAM_ECC_ON(VAR) ((VAR & 

Re: [PATCH]AArch64 Lower intrinsics shift to GIMPLE when possible.

2021-10-15 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> This lowers shifts to GIMPLE when the C interpretations of the shift 
> operations
> matches that of AArch64.
>
> In C shifting right by BITSIZE is undefined, but the behavior is defined in
> AArch64.  Additionally negative shifts lefts are undefined in C but defined
> for the register variant of the instruction (SSHL, USHL) as being right 
> shifts.
>
> Since we have a right shift by immediate I rewrite those cases into right 
> shifts
>
> So:
>
> int64x1_t foo3 (int64x1_t a)
> {
>   return vshl_s64 (a, vdup_n_s64(-6));
> }
>
> produces:
>
> foo3:
> sshrd0, d0, 6
> ret
>
> instead of:
>
> foo3:
> mov x0, -6
> fmovd1, x0
> sshld0, d0, d1
> ret
>
> This behavior isn't specifically mentioned for a left shift by immediate, but 
> I
> believe that only the case because we do have a right shift by immediate but 
> not
> a right shift by register.  As such I do the same for left shift by immediate.
>
> The testsuite already has various testcases for shifts (vshl.c etc) so I am 
> not
> adding overlapping tests here.
>
> Out of range shifts like
>
> int64x1_t foo3 (int64x1_t a)
> {
>   return vshl_s64 (a, vdup_n_s64(80));
> }
>
> now get optimized to 0 as well along with undefined behaviors both in C and
> AArch64.

The SSHL results are well-defined for all shift amounts, so we shouldn't
convert them to undefined gimple, even as a temporary step.  E.g.:

int32x4_t foo(int32x4_t x) {
  return vshlq_s32(x, vdupq_n_s32(256));
}

should fold to “x” (if we fold it at all).  Similarly:

int32x4_t foo(int32x4_t x) {
  return vshlq_s32(x, vdupq_n_s32(257));
}

should fold to x << 1 (again if we fold it at all).

For a shift right:

int32x4_t foo(int32x4_t x) {
  return vshlq_s32(x, vdupq_n_s32(-64));
}

is equivalent to:

int32x4_t foo(int32x4_t x) {
  return vshrq_n_s32(x, 31);
}

and so it shouldn't fold to 0.

> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.c
>   (aarch64_general_gimple_fold_builtin): Add ashl, sshl, ushl, ashr,
>   ashr_simd, lshr, lshr_simd.
>   * config/aarch64/aarch64-simd-builtins.def (lshr): Use USHIFTIMM.
>   * config/aarch64/arm_neon.h (vshr_n_u8, vshr_n_u16, vshr_n_u32,
>   vshrq_n_u8, vshrq_n_u16, vshrq_n_u32, vshrq_n_u64): Fix type hack.
>
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/signbit-2.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
> b/gcc/config/aarch64/aarch64-builtins.c
> index 
> f6b41d9c200d6300dee65ba60ae94488231a8a38..e47545b111762b95242d8f8de1a26f7bd11992ae
>  100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -2394,6 +2394,68 @@ aarch64_general_gimple_fold_builtin (unsigned int 
> fcode, gcall *stmt)
>  1, args[0]);
>   gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
>   break;
> +  BUILTIN_VSDQ_I_DI (BINOP, ashl, 3, NONE)
> + {
> +   tree cst = args[1];
> +   tree ctype = TREE_TYPE (cst);
> +   if (INTEGRAL_TYPE_P (ctype)

Nit: redundant test.

> +   && TREE_CODE (cst) == INTEGER_CST)
> + {
> +   wide_int wcst = wi::to_wide (cst);
> +   if (wi::neg_p (wcst, TYPE_SIGN (ctype)))
> + new_stmt =
> +   gimple_build_assign (gimple_call_lhs (stmt),
> +RSHIFT_EXPR, args[0],
> +wide_int_to_tree (ctype,
> +  wi::abs (wcst)));
> +   else
> + new_stmt =
> +   gimple_build_assign (gimple_call_lhs (stmt),
> +LSHIFT_EXPR, args[0], args[1]);
> + }

I think it's a bug that we currently accept out-of-range shift amounts
for vshl{,q}_n.  E.g., for:

#include 

int32x4_t foo(int32x4_t x) {
  return vshlq_n_s32(x, 32);
}

clang gives:

error: argument value 33 is outside the valid range [0, 31]
  return vshlq_n_s32(x, 33);
 ^  ~~

which AIUI is the correct behaviour.

So for this I think we should only fold [0, precision - 1] shifts.
Let's leave improving the error detection as future work. :-)

> + }
> + break;
> +  BUILTIN_VSDQ_I_DI (BINOP, sshl, 0, NONE)
> +  BUILTIN_VSDQ_I_DI (BINOP_UUS, ushl, 0, NONE)
> + {
> +   tree cst = args[1];
> +   tree ctype = TREE_TYPE (cst);
> +   HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE (TREE_TYPE 
> (args[0])));
> +   if (INTEGRAL_TYPE_P (ctype)
> +   && TREE_CODE (cst) == INTEGER_CST)

I don't think this works, since args[1] is a vector rather than
a scalar.  E.g. trying locally:

int32x4_t foo(int32x4_t x) {
  return vshlq_s32(vdupq_n_s32(1), vdupq_n_s32(10));
}


[PATCH] Allow early sets of SSE hard registers from standard_sse_constant_p

2021-10-15 Thread Roger Sayle

My previous patch, which was intended to reduce the differences seen by
the combination of -march=cascadelake and -m32, has additionally found
some more instances where this combination behaves differently to regular
x86_64-pc-linux-gnu.  The middle-end always, and backends usually, use
emit_move_insn to emit/expand move instructions allowing the backend
control over placing things in constant pools, adding REG_EQUAL notes,
and so on.  Several of the AVX512 built-in expanders bypass this logic,
and instead generate moves directly using emit_insn(gen_rtx_SET (dst,src)).

For example, i386-expand.c line 12004 contains:
  for (i = 0; i < 8; i++)
emit_insn (gen_rtx_SET (xmm_regs[i], const0_rtx));

I suspect that in this case, loading of standard_sse_constant_p, my
change to require loading of likely spilled hard registers via a
pseudo is perhaps overly strict, so this patch/fix reallows these
immediate constants values to be loaded directly prior to reload.

If anyone notices a (SPEC benchmark) performance regression with
this patch, I'll propose the more invasive fix to make more use of
emit_move_insn in the backend (and revert this fix), but all things
being equal it's best to leave things the way they previously were.

This patch not only cures the regressions reported by Sunil's
tester, but in combination with the previous patch now has 7 fewer
unexpected failures in the testsuite with -m32 -march=cascadelake.
This patch has also been tested with "make bootstrap" and
"make -k check" on x86_64-pc-linux-gnu with no new failures.

Ok for mainline?
Sorry again for the temporary inconvenience.


2021-10-15  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.c (ix86_hardreg_mov_ok): For vector modes,
allow standard_sse_constant_p immediate constants.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index fb65609..9cc903e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19303,7 +19303,9 @@ ix86_hardreg_mov_ok (rtx dst, rtx src)
   /* Avoid complex sets of likely_spilled hard registers before reload.  */
   if (REG_P (dst) && HARD_REGISTER_P (dst)
   && !REG_P (src) && !MEM_P (src)
-  && !x86_64_immediate_operand (src, GET_MODE (dst))
+  && !(VECTOR_MODE_P (GET_MODE (dst))
+  ? standard_sse_constant_p (src, GET_MODE (dst))
+  : x86_64_immediate_operand (src, GET_MODE (dst)))
   && ix86_class_likely_spilled_p (REGNO_REG_CLASS (REGNO (dst)))
   && !reload_completed)
 return false;


[COMMITED] tree-optimization/102752: Fix determining precission of reduction_var

2021-10-15 Thread Stefan Schulze Frielinghaus via Gcc-patches
While determining the precission of reduction_var an SSA_NAME instead of
its TREE_TYPE is used.  Streamlined with other TREE_TYPE (reduction_var)
uses.

Bootstrapped and regtested on x86 and IBM Z.  Committed as per PR102752.

gcc/ChangeLog:

* tree-loop-distribution.c (reduction_var_overflows_first):
Pass the type of reduction_var as first argument as it is also
done for the load type.
(loop_distribution::transform_reduction_loop): Add missing
TREE_TYPE while determining precission of reduction_var.
---
 gcc/tree-loop-distribution.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index fb9250031b5..583c01a42d8 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -3425,12 +3425,12 @@ generate_strlen_builtin_using_rawmemchr (loop_p loop, 
tree reduction_var,
 
 /* Return true if we can count at least as many characters by taking pointer
difference as we can count via reduction_var without an overflow.  Thus
-   compute 2^n < (2^(m-1) / s) where n = TYPE_PRECISION (reduction_var),
+   compute 2^n < (2^(m-1) / s) where n = TYPE_PRECISION (reduction_var_type),
m = TYPE_PRECISION (ptrdiff_type_node), and s = size of each character.  */
 static bool
-reduction_var_overflows_first (tree reduction_var, tree load_type)
+reduction_var_overflows_first (tree reduction_var_type, tree load_type)
 {
-  widest_int n2 = wi::lshift (1, TYPE_PRECISION (reduction_var));;
+  widest_int n2 = wi::lshift (1, TYPE_PRECISION (reduction_var_type));;
   widest_int m2 = wi::lshift (1, TYPE_PRECISION (ptrdiff_type_node) - 1);
   widest_int s = wi::to_widest (TYPE_SIZE_UNIT (load_type));
   return wi::ltu_p (n2, wi::udiv_trunc (m2, s));
@@ -3654,6 +3654,7 @@ loop_distribution::transform_reduction_loop (loop_p loop)
   && integer_onep (reduction_iv.step))
 {
   location_t loc = gimple_location (DR_STMT (load_dr));
+  tree reduction_var_type = TREE_TYPE (reduction_var);
   /* While determining the length of a string an overflow might occur.
 If an overflow only occurs in the loop implementation and not in the
 strlen implementation, then either the overflow is undefined or the
@@ -3680,8 +3681,8 @@ loop_distribution::transform_reduction_loop (loop_p loop)
  && TYPE_PRECISION (load_type) == TYPE_PRECISION (char_type_node)
  && ((TYPE_PRECISION (sizetype) >= TYPE_PRECISION (ptr_type_node) - 1
   && TYPE_PRECISION (ptr_type_node) >= 32)
- || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))
- && TYPE_PRECISION (reduction_var) <= TYPE_PRECISION 
(sizetype)))
+ || (TYPE_OVERFLOW_UNDEFINED (reduction_var_type)
+ && TYPE_PRECISION (reduction_var_type) <= TYPE_PRECISION 
(sizetype)))
  && builtin_decl_implicit (BUILT_IN_STRLEN))
generate_strlen_builtin (loop, reduction_var, load_iv.base,
 reduction_iv.base, loc);
@@ -3689,8 +3690,8 @@ loop_distribution::transform_reduction_loop (loop_p loop)
   != CODE_FOR_nothing
   && ((TYPE_PRECISION (ptrdiff_type_node) == TYPE_PRECISION 
(ptr_type_node)
&& TYPE_PRECISION (ptrdiff_type_node) >= 32)
-  || (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (reduction_var))
-  && reduction_var_overflows_first (reduction_var, 
load_type
+  || (TYPE_OVERFLOW_UNDEFINED (reduction_var_type)
+  && reduction_var_overflows_first (reduction_var_type, 
load_type
generate_strlen_builtin_using_rawmemchr (loop, reduction_var,
 load_iv.base,
 load_type,
-- 
2.31.1



RE: [PATCH]middle-end convert negate + right shift into compare greater.

2021-10-15 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Richard Biener 
> Sent: Friday, October 15, 2021 10:07 AM
> To: Tamar Christina 
> Cc: Richard Earnshaw ; gcc-
> patc...@gcc.gnu.org; nd 
> Subject: RE: [PATCH]middle-end convert negate + right shift into compare
> greater.
> 
> On Fri, 15 Oct 2021, Tamar Christina wrote:
> 
> > > >
> > > > +/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */
> > > > +(for cst (INTEGER_CST VECTOR_CST)  (simplify
> > > > +  (rshift (negate:s @0) cst@1)
> > > > +   (if (!flag_wrapv)
> > >
> > > Don't test flag_wrapv directly, instead use the appropriate
> > > TYPE_OVERFLOW_{UNDEFINED,WRAPS} predicates.  But I'm not sure
> what
> > > we are protecting against?  Right-shift of signed integers is
> > > implementation- defined and GCC treats it as you'd expect, sign-
> extending the result.
> > >
> >
> > It's protecting against the overflow of the negate on INT_MIN. When
> > wrapping overflows are enabled the results would be wrong.
> 
> But -INT_MIN == INT_MIN in twos-complement so I fail to see the wrong
> result?  That is, both -INT_MIN >> 31 and INT_MIN >> 31 are -1.
> 
> > > > +(with { tree ctype = TREE_TYPE (@0);
> > > > +   tree stype = TREE_TYPE (@1);
> > > > +   tree bt = truth_type_for (ctype); }
> > > > + (switch
> > > > +  /* Handle scalar case.  */
> > > > +  (if (INTEGRAL_TYPE_P (ctype)
> > > > +  && !VECTOR_TYPE_P (ctype)
> > > > +  && !TYPE_UNSIGNED (ctype)
> > > > +  && canonicalize_math_after_vectorization_p ()
> > > > +  && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (stype) - 1))
> > > > +   (convert:bt (gt:bt @0 { build_zero_cst (stype); })))
> > >
> > > I'm not sure why the result is of type 'bt' rather than the original
> > > type of the expression?
> >
> > That was to satisfy some RTL check that expected results of
> > comparisons to always be a Boolean, though for scalar that logically
> > always is the case, I just added it for consistency.
> >
> > >
> > > In that regard for non-vectors we'd have to add the sign extension
> > > from unsigned bool, in the vector case we'd hope the type of the
> > > comparison is correct.  I think in both cases it might be convenient
> > > to use
> > >
> > >   (cond (gt:bt @0 { build_zero_cst (ctype); }) { build_all_ones_cst
> > > (ctype); } { build_zero_cost (ctype); })
> > >
> > > to compute the correct result and rely on (cond ..) simplifications
> > > to simplify that if possible.
> > >
> > > Btw, 'stype' should be irrelevant - you need to look at the
> > > precision of 'ctype', no?
> >
> > I was working under the assumption that both input types must have the
> > same precision, but turns out that assumption doesn't need to hold.
> >
> > New version attached.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no regressions.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * match.pd: New negate+shift pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/signbit-2.c: New test.
> > * gcc.dg/signbit-3.c: New test.
> > * gcc.target/aarch64/signbit-1.c: New test.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd index
> >
> 7d2a24dbc5e9644a09968f877e12a824d8ba1caa..9532cae582e152cae6e22fcce
> 95a
> > 9744a844e3c2 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -38,7 +38,8 @@ along with GCC; see the file COPYING3.  If not see
> > uniform_integer_cst_p
> > HONOR_NANS
> > uniform_vector_p
> > -   bitmask_inv_cst_vector_p)
> > +   bitmask_inv_cst_vector_p
> > +   expand_vec_cmp_expr_p)
> >
> >  /* Operator lists.  */
> >  (define_operator_list tcc_comparison
> > @@ -826,6 +827,42 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  { tree utype = unsigned_type_for (type); }
> >  (convert (rshift (lshift (convert:utype @0) @2) @3))
> >
> > +/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */ (for
> > +cst (INTEGER_CST VECTOR_CST)  (simplify
> > +  (rshift (negate:s @0) cst@1)
> > +   (if (!TYPE_OVERFLOW_WRAPS (type))
> 
> as said, I don't think that's necessary but at least it's now written 
> correctly ;)
> 
> > +(with { tree ctype = TREE_TYPE (@0);
> 
> Instead of 'ctype' you can use 'type' since the type of the expression is the
> same as that of @0
> 
> > +   tree stype = TREE_TYPE (@1);
> > +   tree bt = truth_type_for (ctype);
> > +   tree zeros = build_zero_cst (ctype); }
> > + (switch
> > +  /* Handle scalar case.  */
> > +  (if (INTEGRAL_TYPE_P (ctype)
> > +  && !VECTOR_TYPE_P (ctype)
> 
> INTEGRAL_TYPE_P does not include VECTOR_TYPE_P.
> 
> > +  && !TYPE_UNSIGNED (ctype)
> > +  && canonicalize_math_after_vectorization_p ()
> > +  && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (ctype) - 1))
> > +   (cond (gt:bt @0 { zeros; }) { build_all_ones_cst (ctype); } { 
> > zeros; }))
> > +  /* Handle vector case with a scalar immediate.  */
> > +

Re: [COMMITTED] Do not call range_on_path_entry for SSAs defined within the path

2021-10-15 Thread Aldy Hernandez via Gcc-patches
It's been fixed on trunk.

Aldy

On Fri, Oct 15, 2021, 09:52 Christophe LYON via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

>
> On 14/10/2021 14:21, Aldy Hernandez via Gcc-patches wrote:
> > In the path solver, when requesting the range of an SSA for which we
> > know nothing, we ask the ranger for the range incoming to the path.
> > We do this by asking for all the incoming ranges to the path entry
> > block and unioning them.
> >
> > The problem here is that we're asking for a range on path entry for an
> > SSA which *is* defined in the path, but for which we know nothing
> > about:
> >
> >   some_global.1_2 = some_global;
> >   _3 = (char) some_global.1_2;
> >
> > This request is causing us to ask for range_on_edge of _3 on the
> > incoming edges to the path.  This is a bit of nonsensical request
> > because _3 isn't live on entry to the path, so ranger correctly
> > returns UNDEFINED.  The proper thing is to avoid asking this in the
> > first place.
> >
> > I have added a relevant assert, since it doesn't make sense to call
> > range_on_path_entry for SSAs defined within the path.
> >
> > Tested on x86-64 Linux.
> >
> >   PR 102736
> >
> > gcc/ChangeLog:
> >
> >   PR tree/optimization/102736
> >   * gimple-range-path.cc (path_range_query::range_on_path_entry):
> >   Assert that the requested range is defined outside the path.
> >   (path_range_query::ssa_range_in_phi): Do not call
> >   range_on_path_entry for SSA names that are defined within the
> >   path.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/tree-ssa/pr102736.c: New test.
> > ---
> >   gcc/gimple-range-path.cc |  6 +-
> >   gcc/testsuite/gcc.dg/tree-ssa/pr102736.c | 21 +
> >   2 files changed, 26 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr102736.c
> >
> > diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> > index 422abfddb8f..694271306a7 100644
> > --- a/gcc/gimple-range-path.cc
> > +++ b/gcc/gimple-range-path.cc
> > @@ -134,6 +134,7 @@ path_range_query::defined_outside_path (tree name)
> >   void
> >   path_range_query::range_on_path_entry (irange , tree name)
> >   {
> > +  gcc_checking_assert (defined_outside_path (name));
> > int_range_max tmp;
> > basic_block entry = entry_bb ();
> > bool changed = false;
> > @@ -258,7 +259,10 @@ path_range_query::ssa_range_in_phi (irange , gphi
> *phi)
> >   // Using both the range on entry to the path, and the
> >   // range on this edge yields significantly better
> >   // results.
> > - range_on_path_entry (r, arg);
> > + if (defined_outside_path (arg))
> > +   range_on_path_entry (r, arg);
> > + else
> > +   r.set_varying (TREE_TYPE (name));
> >   m_ranger.range_on_edge (tmp, e_in, arg);
> >   r.intersect (tmp);
> >   return;
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr102736.c
> b/gcc/testsuite/gcc.dg/tree-ssa/pr102736.c
> > new file mode 100644
> > index 000..7e556f01a86
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr102736.c
> > @@ -0,0 +1,21 @@
> > +// { dg-do run }
> > +// { dg-options "-O1 -ftree-vrp" }
> > +
> > +int a, b = -1, c;
> > +int d = 1;
> > +static inline char e(char f, int g) { return g ? f : 0; }
> > +static inline char h(char f) { return f < a ? f : f < a; }
> > +static inline unsigned char i(unsigned char f, int g) { return g ? f :
> f > g; }
> > +void j() {
> > +L:
> > +  c = e(1, i(h(b), d));
> > +  if (b)
> > +return;
> > +  goto L;
> > +}
> > +int main() {
> > +  j();
> > +  if (c != 1)
> > +__builtin_abort ();
> > +  return 0;
> > +}
>
> Hi,
>
>
> The new test fails at execution on arm / aarch64, not sure if you are
> aware of that already?
>
>
> Thanks,
>
> Christophe
>
>
>


Re: [PATCH] PR fortran/102685 - ICE in output_constructor_regular_field, at varasm.c:5514

2021-10-15 Thread Tobias Burnus

Hi Harald, dear all,

On 14.10.21 23:27, Harald Anlauf via Fortran wrote:

the attached patch adds a check for the shape of arrays in derived type
constructors.  This brings it in line with other major brands.
...
In developing the patch I encountered a difficulty with testcase
dec_structure_6.f90, which uses a DEC extension, namelist "old-style
CLIST initializers in STRUCTURE".  I could not figure out how to
determine the shape of the initializer; it seemed to be always zero.
I've added code to accept this, but only under -fdec-structure, and
added a TODO in a comment.  If somebody reading this could give me
a hint to solve end, I would adjust the patch accordingly.


See attached patch – it does initialize the variables similarly to other
shapes in that file, except that it has to take the shape from the LHS
as seemingly (same testfile) having a 1-dim array can be used to
initialize a 2-dim array.

You can approve that patch and integrate it then in your own patch :-)


Regtested on x86_64-pc-linux-gnu.  OK?  Or further comments?


LGTM – with the DECL exception removed from resolve.c.

Thanks,

Tobias

PS: Without the auto-reshape part, a simple 'gfc_array_size (expr,
>shape[0]))" would have been sufficient.
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index d6a22d13451..86adb81da32 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -892,29 +892,32 @@ match_clist_expr (gfc_expr **result, gfc_typespec *ts, gfc_array_spec *as)
   /* Set up expr as an array constructor. */
   if (!scalar)
 {
   expr = gfc_get_array_expr (ts->type, ts->kind, );
   expr->ts = *ts;
   expr->value.constructor = array_head;
 
-  expr->rank = as->rank;
-  expr->shape = gfc_get_shape (expr->rank);
-
   /* Validate sizes.  We built expr ourselves, so cons_size will be
 	 constant (we fail above for non-constant expressions).
 	 We still need to verify that the sizes match.  */
   gcc_assert (gfc_array_size (expr, _size));
   cmp = mpz_cmp (cons_size, as_size);
   if (cmp < 0)
 	gfc_error ("Not enough elements in array initializer at %C");
   else if (cmp > 0)
 	gfc_error ("Too many elements in array initializer at %C");
   mpz_clear (cons_size);
   if (cmp)
 	goto cleanup;
+
+  /* Set the rank/shape to match the LHS as auto-reshape is implied. */
+  expr->rank = as->rank;
+  expr->shape = gfc_get_shape (as->rank);
+  for (int i = 0; i < as->rank; ++i)
+	spec_dimen_size (as, i, >shape[i]);
 }
 
   /* Make sure scalar types match. */
   else if (!gfc_compare_types (>ts, ts)
&& !gfc_convert_type (expr, ts, 1))
 goto cleanup;
 


Re: [committed] libstdc++: Simplify variant access functions

2021-10-15 Thread Maciej Cencora via Gcc-patches
Hi,

variant getter can be implemented in C++17 without using "recursive"
calls, but by generating a list of member pointers and applying them
with fold expression. Here's an example:

https://godbolt.org/z/3vcKjWjPG

Regards,
Maciej


Re: [PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values

2021-10-15 Thread Richard Biener via Gcc-patches
On Fri, 15 Oct 2021, Tamar Christina wrote:

> Hi All,
> 
> During testing after rebasing to commit I noticed a failing testcase with the
> bitmask compare patch.
> 
> Consider the following C++ testcase:
> 
> #include 
> 
> #define A __attribute__((noipa))
> A bool f5 (double i, double j) { auto c = i <=> j; return c >= 0; }
> 
> This turns into a comparison against chars, on systems where chars are signed
> the pattern inserts an unsigned convert such that it's able to do the
> transformation.
> 
> i.e.:
> 
>   # RANGE [-1, 2]
>   # c$_M_value_22 = PHI <-1(3), 0(2), 2(5), 1(4)>
>   # RANGE ~[3, 254]
>   _11 = (unsigned char) c$_M_value_22;
>   _19 = _11 <= 1;
>   # .MEM_24 = VDEF <.MEM_6(D)>
>   D.10434 ={v} {CLOBBER};
>   # .MEM_14 = VDEF <.MEM_24>
>   D.10407 ={v} {CLOBBER};
>   # VUSE <.MEM_14>
>   return _19;
> 
> instead of:
> 
>   # RANGE [-1, 2]
>   # c$_M_value_5 = PHI <-1(3), 0(2), 2(5), 1(4)>
>   # RANGE [-2, 2]
>   _3 = c$_M_value_5 & -2;
>   _19 = _3 == 0;
>   # .MEM_24 = VDEF <.MEM_6(D)>
>   D.10440 ={v} {CLOBBER};
>   # .MEM_14 = VDEF <.MEM_24>
>   D.10413 ={v} {CLOBBER};
>   # VUSE <.MEM_14>
>   return _19;
> 
> This causes much worse codegen under -ffast-math due to phiops no longer
> recognizing the pattern.  It turns out that phiopts spaceship_replacement is
> looking for the exact form that was just changed.
> 
> Trying to get it to recognize the new form is not trivial as the 
> transformation
> doesn't look to work when the thing it's pointing to is itself a phi-node.

What do you mean?  Where it handles the BIT_AND it could also handle
the conversion, no?  The later handling would probably more explicitely
need to distinguish between the BIT_AND and the conversion forms.

Jakub?

> Because of these issues this change delays the replacements until after loop
> opts.  This fixes the failing C++ testcase.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no regressions.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * match.pd: Delay bitmask compare pattern till after loop opts.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> 9532cae582e152cae6e22fcce95a9744a844e3c2..d26e498447fc25a327a42cc6a119c6153d09ba03
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4945,7 +4945,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>icmp (le le gt le gt)
>   (simplify
>(cmp (bit_and:c@2 @0 cst@1) integer_zerop)
> -   (with { tree csts = bitmask_inv_cst_vector_p (@1); }
> +   (if (canonicalize_math_after_vectorization_p ())
> +(with { tree csts = bitmask_inv_cst_vector_p (@1); }
>   (switch
>(if (csts && TYPE_UNSIGNED (TREE_TYPE (@1))
>  && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
> @@ -4954,7 +4955,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  && (cmp == EQ_EXPR || cmp == NE_EXPR)
>  && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
> (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); }
> - (icmp (convert:utype @0) { csts; }
> + (icmp (convert:utype @0) { csts; })
>  
>  /* -A CMP -B -> B CMP A.  */
>  (for cmp (tcc_comparison)
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [patch] Tame if-combining when loop unswitching is enabled

2021-10-15 Thread Richard Biener via Gcc-patches
On Fri, Oct 15, 2021 at 11:15 AM Eric Botcazou via Gcc-patches
 wrote:
>
> Hi,
>
> in order to make it possible to vectorize loops running over arrays in Ada,
> which generally contain index checks, hence control-flow instructions, we rely
> on loop unswitching to generate two copies of the loop, one guarded with a
> global condition (no index check fails in the loop) and vectorizable and one
> with the original index checks and non-vectorizable.  This is achieved by the
> simple trick of prepending the global_condition to the condition of the index
> checks and letting the loop unswitching pass do its magic.
>
> But there is an enemy, namely if-combining, which can turn a simple boolean
> conjunction into something else that loop unswitching cannot deal with, and a
> testcase is attached with 3 slightly different versions of the same issue.
>
> Therefore the attached patch attempts to tame if-combining by reasoning on the
> loop invariant status (really loop depths) of the conditions.
>
> Bootstrapped/regtested on x86-64/Linux, OK for the mainline?

Hmm, I see the issue.

This heuristic probably defeats associating the combined conditions to
"good" order?  That is, it looks to me that eventually teaching unswitching
to unswitch on comparisons [feeding GIMPLE_CONDs] would solve the
issue as well?  Martin was working on generalizing the code to handle
switches so eventually he can look into also handling condition parts.

That said, reassoc does order the outermost loop invariant conditions
in the leaf of the condition chain, no?

So,

   for (i;;)
 {
bool a = inv < 0;
bool b = i > 3;
bool c = a && b;
if (c)
  ...
 }

could be unswitched as


 if (inv < 0)
   for (i;;)
 {
bool a = true;
bool b = i > 3;
bool c = a && b;
if (c)
  ...
 }
 else
   for (i;;)
 {
bool a = false;
bool b = i > 3;
bool c = a && b;
if (c)
  ...
 }

>
> 2021-10-15  Eric Botcazou  
>
> * tree-ssa-ifcombine.c: Include cfgloop.h.
> (operand_loop_depth): New function.
> (ifcombine_ifandif): When loop unswitching is enabled, do not merge
> conditions whose loop invariant status is different.
>
>
> 2021-10-15  Eric Botcazou  
>
> * gnat.dg/vect19.ads, gnat.dg/vect19.adb: New test.
>
> --
> Eric Botcazou


[PATCH] middle-end: fix de-optimizations with bitclear patterns on signed values

2021-10-15 Thread Tamar Christina via Gcc-patches
Hi All,

During testing after rebasing to commit I noticed a failing testcase with the
bitmask compare patch.

Consider the following C++ testcase:

#include 

#define A __attribute__((noipa))
A bool f5 (double i, double j) { auto c = i <=> j; return c >= 0; }

This turns into a comparison against chars, on systems where chars are signed
the pattern inserts an unsigned convert such that it's able to do the
transformation.

i.e.:

  # RANGE [-1, 2]
  # c$_M_value_22 = PHI <-1(3), 0(2), 2(5), 1(4)>
  # RANGE ~[3, 254]
  _11 = (unsigned char) c$_M_value_22;
  _19 = _11 <= 1;
  # .MEM_24 = VDEF <.MEM_6(D)>
  D.10434 ={v} {CLOBBER};
  # .MEM_14 = VDEF <.MEM_24>
  D.10407 ={v} {CLOBBER};
  # VUSE <.MEM_14>
  return _19;

instead of:

  # RANGE [-1, 2]
  # c$_M_value_5 = PHI <-1(3), 0(2), 2(5), 1(4)>
  # RANGE [-2, 2]
  _3 = c$_M_value_5 & -2;
  _19 = _3 == 0;
  # .MEM_24 = VDEF <.MEM_6(D)>
  D.10440 ={v} {CLOBBER};
  # .MEM_14 = VDEF <.MEM_24>
  D.10413 ={v} {CLOBBER};
  # VUSE <.MEM_14>
  return _19;

This causes much worse codegen under -ffast-math due to phiops no longer
recognizing the pattern.  It turns out that phiopts spaceship_replacement is
looking for the exact form that was just changed.

Trying to get it to recognize the new form is not trivial as the transformation
doesn't look to work when the thing it's pointing to is itself a phi-node.

Because of these issues this change delays the replacements until after loop
opts.  This fixes the failing C++ testcase.

Bootstrapped Regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu and no regressions.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* match.pd: Delay bitmask compare pattern till after loop opts.

--- inline copy of patch -- 
diff --git a/gcc/match.pd b/gcc/match.pd
index 
9532cae582e152cae6e22fcce95a9744a844e3c2..d26e498447fc25a327a42cc6a119c6153d09ba03
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4945,7 +4945,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   icmp (le le gt le gt)
  (simplify
   (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
-   (with { tree csts = bitmask_inv_cst_vector_p (@1); }
+   (if (canonicalize_math_after_vectorization_p ())
+(with { tree csts = bitmask_inv_cst_vector_p (@1); }
  (switch
   (if (csts && TYPE_UNSIGNED (TREE_TYPE (@1))
   && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
@@ -4954,7 +4955,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && (cmp == EQ_EXPR || cmp == NE_EXPR)
   && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
(with { tree utype = unsigned_type_for (TREE_TYPE (@1)); }
-   (icmp (convert:utype @0) { csts; }
+   (icmp (convert:utype @0) { csts; })
 
 /* -A CMP -B -> B CMP A.  */
 (for cmp (tcc_comparison)


-- 
diff --git a/gcc/match.pd b/gcc/match.pd
index 9532cae582e152cae6e22fcce95a9744a844e3c2..d26e498447fc25a327a42cc6a119c6153d09ba03 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4945,7 +4945,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   icmp (le le gt le gt)
  (simplify
   (cmp (bit_and:c@2 @0 cst@1) integer_zerop)
-   (with { tree csts = bitmask_inv_cst_vector_p (@1); }
+   (if (canonicalize_math_after_vectorization_p ())
+(with { tree csts = bitmask_inv_cst_vector_p (@1); }
  (switch
   (if (csts && TYPE_UNSIGNED (TREE_TYPE (@1))
 	   && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
@@ -4954,7 +4955,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	   && (cmp == EQ_EXPR || cmp == NE_EXPR)
 	   && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)))
(with { tree utype = unsigned_type_for (TREE_TYPE (@1)); }
-	(icmp (convert:utype @0) { csts; }
+	(icmp (convert:utype @0) { csts; })
 
 /* -A CMP -B -> B CMP A.  */
 (for cmp (tcc_comparison)



[COMMITTED] Make signness explicit in tree-ssa/pr102736.c

2021-10-15 Thread Aldy Hernandez via Gcc-patches
This test is failing on ppc64* due to different default signness for
chars.  Thanks to Richi for quickly pointing out the problem.

Tested on x86-64 Linux and ppc64le Linux.

gcc/testsuite/ChangeLog:

PR testsuite/pr102751
* gcc.dg/tree-ssa/pr102736.c: Make sign explicit.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr102736.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr102736.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr102736.c
index 7e556f01a86..c693a7189dd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr102736.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr102736.c
@@ -3,8 +3,8 @@
 
 int a, b = -1, c;
 int d = 1;
-static inline char e(char f, int g) { return g ? f : 0; }
-static inline char h(char f) { return f < a ? f : f < a; }
+static inline signed char e(signed char f, int g) { return g ? f : 0; }
+static inline signed char h(signed char f) { return f < a ? f : f < a; }
 static inline unsigned char i(unsigned char f, int g) { return g ? f : f > g; }
 void j() {
 L:
-- 
2.31.1



Re: [PATCH]middle-end convert negate + right shift into compare greater.

2021-10-15 Thread Richard Biener via Gcc-patches
On Fri, 15 Oct 2021, Richard Earnshaw wrote:

> On 15/10/2021 10:06, Richard Biener via Gcc-patches wrote:
> > On Fri, 15 Oct 2021, Tamar Christina wrote:
> > 
> 
>  +/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */ (for
>  +cst (INTEGER_CST VECTOR_CST)  (simplify
>  +  (rshift (negate:s @0) cst@1)
>  +   (if (!flag_wrapv)
> >>>
> >>> Don't test flag_wrapv directly, instead use the appropriate
> >>> TYPE_OVERFLOW_{UNDEFINED,WRAPS} predicates.  But I'm not sure what
> >>> we are protecting against?  Right-shift of signed integers is
> >>> implementation-
> >>> defined and GCC treats it as you'd expect, sign-extending the result.
> >>>
> >>
> >> It's protecting against the overflow of the negate on INT_MIN. When
> >> wrapping
> >> overflows are enabled the results would be wrong.
> > 
> > But -INT_MIN == INT_MIN in twos-complement so I fail to see the wrong
> > result?  That is, both -INT_MIN >> 31 and INT_MIN >> 31 are -1.
> 
> Exactly, so transforming the original testcase from (x = -a >> 31) into (x =
> -(a > 0)) is not valid in that case.

Hmm, but we're not doing that.  Actually, we inconsistently handle
the scalar and the vector variant here - maybe the (negate ..)
is missing around the (gt @0 { ...}) of the scalar case.

Btw, I would appreciate testcases for the cases that would go wrong,
indeed INT_MIN would be handled wrong.

Richard.



> R.
> 
> > 
>  +(with { tree ctype = TREE_TYPE (@0);
>  +tree stype = TREE_TYPE (@1);
>  +tree bt = truth_type_for (ctype); }
>  + (switch
>  +  /* Handle scalar case.  */
>  +  (if (INTEGRAL_TYPE_P (ctype)
>  +   && !VECTOR_TYPE_P (ctype)
>  +   && !TYPE_UNSIGNED (ctype)
>  +   && canonicalize_math_after_vectorization_p ()
>  +   && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (stype) - 1))
>  +   (convert:bt (gt:bt @0 { build_zero_cst (stype); })))
> >>>
> >>> I'm not sure why the result is of type 'bt' rather than the original type
> >>> of the
> >>> expression?
> >>
> >> That was to satisfy some RTL check that expected results of comparisons to
> >> always
> >> be a Boolean, though for scalar that logically always is the case, I just
> >> added it
> >> for consistency.
> >>
> >>>
> >>> In that regard for non-vectors we'd have to add the sign extension from
> >>> unsigned bool, in the vector case we'd hope the type of the comparison is
> >>> correct.  I think in both cases it might be convenient to use
> >>>
> >>>(cond (gt:bt @0 { build_zero_cst (ctype); }) { build_all_ones_cst
> >>> (ctype); }
> >>> { build_zero_cost (ctype); })
> >>>
> >>> to compute the correct result and rely on (cond ..) simplifications to
> >>> simplify
> >>> that if possible.
> >>>
> >>> Btw, 'stype' should be irrelevant - you need to look at the precision of
> >>> 'ctype',
> >>> no?
> >>
> >> I was working under the assumption that both input types must have the same
> >> precision, but turns out that assumption doesn't need to hold.
> >>
> >> New version attached.
> >>
> >> Bootstrapped Regtested on aarch64-none-linux-gnu,
> >> x86_64-pc-linux-gnu and no regressions.
> >>
> >> Ok for master?
> >>
> >> Thanks,
> >> Tamar
> >>
> >> gcc/ChangeLog:
> >>
> >>  * match.pd: New negate+shift pattern.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  * gcc.dg/signbit-2.c: New test.
> >>  * gcc.dg/signbit-3.c: New test.
> >>  * gcc.target/aarch64/signbit-1.c: New test.
> >>
> >> --- inline copy of patch ---
> >>
> >> diff --git a/gcc/match.pd b/gcc/match.pd
> >> index
> >> 7d2a24dbc5e9644a09968f877e12a824d8ba1caa..9532cae582e152cae6e22fcce95a9744a844e3c2
> >> 100644
> >> --- a/gcc/match.pd
> >> +++ b/gcc/match.pd
> >> @@ -38,7 +38,8 @@ along with GCC; see the file COPYING3.  If not see
> >>  uniform_integer_cst_p
> >>  HONOR_NANS
> >>  uniform_vector_p
> >> -   bitmask_inv_cst_vector_p)
> >> +   bitmask_inv_cst_vector_p
> >> +   expand_vec_cmp_expr_p)
> >>   
> >>   /* Operator lists.  */
> >>   (define_operator_list tcc_comparison
> >> @@ -826,6 +827,42 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>   { tree utype = unsigned_type_for (type); }
> >>   (convert (rshift (lshift (convert:utype @0) @2) @3))
> >>   +/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */
> >> +(for cst (INTEGER_CST VECTOR_CST)
> >> + (simplify
> >> +  (rshift (negate:s @0) cst@1)
> >> +   (if (!TYPE_OVERFLOW_WRAPS (type))
> > 
> > as said, I don't think that's necessary but at least it's now
> > written correctly ;)
> > 
> >> +(with { tree ctype = TREE_TYPE (@0);
> > 
> > Instead of 'ctype' you can use 'type' since the type of the expression
> > is the same as that of @0
> > 
> >> +  tree stype = TREE_TYPE (@1);
> >> +  tree bt = truth_type_for (ctype);
> >> +  tree zeros = build_zero_cst (ctype); }
> >> + (switch
> >> +  /* Handle scalar case.  */
> >> +  (if (INTEGRAL_TYPE_P (ctype)
> >> + && 

Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-15 Thread Aldy Hernandez via Gcc-patches



On 10/15/21 2:47 AM, Andrew MacLeod wrote:

On 10/14/21 6:07 PM, Martin Sebor via Gcc-patches wrote:

On 10/9/21 12:47 PM, Aldy Hernandez via Gcc-patches wrote:

We seem to be passing a lot of context around in the strlen code.  I
certainly don't want to contribute to more.

Most of the handle_* functions are passing the gsi as well as either
ptr_qry or rvals.  That looks a bit messy.  May I suggest putting all
of that in the strlen pass object (well, the dom walker object, but we
can rename it to be less dom centric)?

Something like the attached (untested) patch could be the basis for
further cleanups.

Jakub, would this line of work interest you?


You didn't ask me but since no one spoke up against it let me add
some encouragement: this is exactly what I was envisioning and in
line with other such modernization we have been doing elsewhere.
Could you please submit it for review?

Martin


I'm willing to bet he didn't submit it for review because he doesn't 
have time this release to polish and track it...  (I think the threader 
has been quite consuming).  Rather, it was offered as a starting point 
for someone else who might be interested in continuing to pursue this 
work...  *everyone* is interested in cleanup work others do :-)


Exactly.  There's a lot of work that could be done in this area, and I'm 
trying to avoid the situation with the threaders where what started as 
refactoring ended up with me basically owning them ;-).


That being said, I there are enough cleanups that are useful on their 
own.  I've removed all the passing around of GSIs, as well as ptr_qry, 
with the exception of anything dealing with the sprintf pass, since it 
has a slightly different interface.


This is patch 0001, which I'm formally submitting for inclusion.  No 
functional changes with this patch.  OK for trunk?


Also, I am PINGing patch 0002, which is the strlen pass conversion to 
the ranger.  As mentioned, this is just a change from an evrp client to 
a ranger client.  The APIs are exactly the same, and besides, the evrp 
analyzer is deprecated and slated for removal.  OK for trunk?


Aldy
>From 152bc3a1dad9a960b7c0c53c65d6690532d9da5a Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Fri, 8 Oct 2021 15:54:23 +0200
Subject: [PATCH] Convert strlen pass from evrp to ranger.

The following patch converts the strlen pass from evrp to ranger,
leaving DOM as the last remaining user.

No additional cleanups have been done.  For example, the strlen pass
still has uses of VR_ANTI_RANGE, and the sprintf still passes around
pairs of integers instead of using a proper range.  Fixing this
could further improve these passes.

Basically the entire patch is just adjusting the calls to range_of_expr
to include context.  The previous context of si->stmt was mostly
empty, so not really useful ;-).

With ranger we are now able to remove the range calculation from
before_dom_children entirely.  Just working with the ranger on-demand
catches all the strlen and sprintf testcases with the exception of
builtin-sprintf-warn-22.c which is due to a limitation of the sprintf
code.  I have XFAILed the test and documented what the problem is.

On a positive note, these changes found two possible sprintf overflow
bugs in the C++ and Fortran front-ends which I have fixed below.

Tested on x86-64 Linux.

gcc/ChangeLog:

	* tree-ssa-strlen.c (compare_nonzero_chars): Pass statement
	context to ranger.
	(get_addr_stridx): Same.
	(get_stridx): Same.
	(get_range_strlen_dynamic): Same.
	(handle_builtin_strlen): Same.
	(handle_builtin_strchr): Same.
	(handle_builtin_strcpy): Same.
	(maybe_diag_stxncpy_trunc): Same.
	(handle_builtin_stxncpy_strncat):
	(handle_builtin_memcpy): Same.
	(handle_builtin_strcat): Same.
	(handle_alloc_call): Same.
	(handle_builtin_memset): Same.
	(handle_builtin_string_cmp): Same.
	(handle_pointer_plus): Same.
	(count_nonzero_bytes_addr): Same.
	(count_nonzero_bytes): Same.
	(handle_store): Same.
	(fold_strstr_to_strncmp): Same.
	(handle_integral_assign): Same.
	(check_and_optimize_stmt): Same.
	(class strlen_dom_walker): Replace evrp with ranger.
	(strlen_dom_walker::before_dom_children): Remove evrp.
	(strlen_dom_walker::after_dom_children): Remove evrp.
	* gimple-ssa-warn-access.cc (maybe_check_access_sizes):
	Restrict sprintf output.

gcc/cp/ChangeLog:

	* ptree.c (cxx_print_xnode): Add more space to pfx array.

gcc/fortran/ChangeLog:

	* misc.c (gfc_dummy_typename): Make sure ts->kind is
	non-negative.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/builtin-sprintf-warn-22.c: XFAIL.
---
 gcc/cp/ptree.c|   2 +-
 gcc/fortran/misc.c|   2 +-
 gcc/gimple-ssa-warn-access.cc |   2 +-
 .../gcc.dg/tree-ssa/builtin-sprintf-warn-22.c |  13 +-
 gcc/tree-ssa-strlen.c | 145 ++
 5 files changed, 92 insertions(+), 72 deletions(-)

diff --git a/gcc/cp/ptree.c b/gcc/cp/ptree.c
index 1dcd764af01..ca7884db39b 100644
--- 

Re: [PATCH]middle-end convert negate + right shift into compare greater.

2021-10-15 Thread Richard Earnshaw via Gcc-patches

On 15/10/2021 10:06, Richard Biener via Gcc-patches wrote:

On Fri, 15 Oct 2021, Tamar Christina wrote:



+/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */ (for
+cst (INTEGER_CST VECTOR_CST)  (simplify
+  (rshift (negate:s @0) cst@1)
+   (if (!flag_wrapv)


Don't test flag_wrapv directly, instead use the appropriate
TYPE_OVERFLOW_{UNDEFINED,WRAPS} predicates.  But I'm not sure what
we are protecting against?  Right-shift of signed integers is implementation-
defined and GCC treats it as you'd expect, sign-extending the result.



It's protecting against the overflow of the negate on INT_MIN. When wrapping
overflows are enabled the results would be wrong.


But -INT_MIN == INT_MIN in twos-complement so I fail to see the wrong
result?  That is, both -INT_MIN >> 31 and INT_MIN >> 31 are -1.


Exactly, so transforming the original testcase from (x = -a >> 31) into 
(x = -(a > 0)) is not valid in that case.


R.




+(with { tree ctype = TREE_TYPE (@0);
+   tree stype = TREE_TYPE (@1);
+   tree bt = truth_type_for (ctype); }
+ (switch
+  /* Handle scalar case.  */
+  (if (INTEGRAL_TYPE_P (ctype)
+  && !VECTOR_TYPE_P (ctype)
+  && !TYPE_UNSIGNED (ctype)
+  && canonicalize_math_after_vectorization_p ()
+  && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (stype) - 1))
+   (convert:bt (gt:bt @0 { build_zero_cst (stype); })))


I'm not sure why the result is of type 'bt' rather than the original type of the
expression?


That was to satisfy some RTL check that expected results of comparisons to 
always
be a Boolean, though for scalar that logically always is the case, I just added 
it
for consistency.



In that regard for non-vectors we'd have to add the sign extension from
unsigned bool, in the vector case we'd hope the type of the comparison is
correct.  I think in both cases it might be convenient to use

   (cond (gt:bt @0 { build_zero_cst (ctype); }) { build_all_ones_cst (ctype); }
{ build_zero_cost (ctype); })

to compute the correct result and rely on (cond ..) simplifications to simplify
that if possible.

Btw, 'stype' should be irrelevant - you need to look at the precision of 
'ctype',
no?


I was working under the assumption that both input types must have the same
precision, but turns out that assumption doesn't need to hold.

New version attached.

Bootstrapped Regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu and no regressions.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* match.pd: New negate+shift pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/signbit-2.c: New test.
* gcc.dg/signbit-3.c: New test.
* gcc.target/aarch64/signbit-1.c: New test.

--- inline copy of patch ---

diff --git a/gcc/match.pd b/gcc/match.pd
index 
7d2a24dbc5e9644a09968f877e12a824d8ba1caa..9532cae582e152cae6e22fcce95a9744a844e3c2
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -38,7 +38,8 @@ along with GCC; see the file COPYING3.  If not see
 uniform_integer_cst_p
 HONOR_NANS
 uniform_vector_p
-   bitmask_inv_cst_vector_p)
+   bitmask_inv_cst_vector_p
+   expand_vec_cmp_expr_p)
  
  /* Operator lists.  */

  (define_operator_list tcc_comparison
@@ -826,6 +827,42 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  { tree utype = unsigned_type_for (type); }
  (convert (rshift (lshift (convert:utype @0) @2) @3))
  
+/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */

+(for cst (INTEGER_CST VECTOR_CST)
+ (simplify
+  (rshift (negate:s @0) cst@1)
+   (if (!TYPE_OVERFLOW_WRAPS (type))


as said, I don't think that's necessary but at least it's now
written correctly ;)


+(with { tree ctype = TREE_TYPE (@0);


Instead of 'ctype' you can use 'type' since the type of the expression
is the same as that of @0


+   tree stype = TREE_TYPE (@1);
+   tree bt = truth_type_for (ctype);
+   tree zeros = build_zero_cst (ctype); }
+ (switch
+  /* Handle scalar case.  */
+  (if (INTEGRAL_TYPE_P (ctype)
+  && !VECTOR_TYPE_P (ctype)


INTEGRAL_TYPE_P does not include VECTOR_TYPE_P.


+  && !TYPE_UNSIGNED (ctype)
+  && canonicalize_math_after_vectorization_p ()
+  && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (ctype) - 1))
+   (cond (gt:bt @0 { zeros; }) { build_all_ones_cst (ctype); } { zeros; }))
+  /* Handle vector case with a scalar immediate.  */
+  (if (VECTOR_INTEGER_TYPE_P (ctype)
+  && !VECTOR_TYPE_P (stype)
+  && !TYPE_UNSIGNED (ctype)
+  && expand_vec_cmp_expr_p (ctype, ctype, { GT_EXPR }))
+   (with { HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE (ctype)); 
}
+   (if (wi::eq_p (wi::to_wide (@1), bits - 1))


You can use element_precision (@0) - 1 in both the scalar and vector case.


+(convert:bt (gt:bt @0 { zeros; })
+  /* Handle vector case with a vector immediate.   */
+  (if (VECTOR_INTEGER_TYPE_P (ctype)
+  && VECTOR_TYPE_P (stype)
+

[committed] openmp: Add support for OMP_PLACES=numa_domains

2021-10-15 Thread Jakub Jelinek via Gcc-patches
Hi!

This adds support for numa_domains abstract name in OMP_PLACES, also new
in OpenMP 5.1.

Way to test this is
OMP_PLACES=numa_domains OMP_DISPLAY_ENV=true LD_PRELOAD=.libs/libgomp.so.1 
/bin/true
and see what it prints on OMP_PLACES line.
For non-NUMA machines it should print a single place that covers all CPUs,
for NUMA machine one place for each NUMA node with corresponding CPUs.

Bootstrapped/regtested on x86_64-linux and i686-linux and tested on
powerpc64le-linux too, committed to trunk.

2021-10-15  Jakub Jelinek  

* env.c (parse_places_var): Handle numa_domains as level 5.
* config/linux/affinity.c (gomp_affinity_init_numa_domains): New
function.
(gomp_affinity_init_level): Use it instead of
gomp_affinity_init_level_1 for level == 5.
* testsuite/libgomp.c/places-5.c: New test.

--- libgomp/env.c.jj2021-10-14 15:25:48.816212823 +0200
+++ libgomp/env.c   2021-10-14 18:53:34.238432698 +0200
@@ -701,6 +701,11 @@ parse_places_var (const char *name, bool
   env += 9;
   level = 4;
 }
+  else if (strncasecmp (env, "numa_domains", 12) == 0)
+{
+  env += 12;
+  level = 5;
+}
   if (level)
 {
   count = ULONG_MAX;
--- libgomp/config/linux/affinity.c.jj  2021-10-14 17:13:12.811302863 +0200
+++ libgomp/config/linux/affinity.c 2021-10-14 19:22:37.820259007 +0200
@@ -355,6 +355,102 @@ gomp_affinity_init_level_1 (int level, i
   free (line);
 }
 
+static void
+gomp_affinity_init_numa_domains (unsigned long count, cpu_set_t *copy,
+char *name)
+{
+  FILE *f;
+  char *nline = NULL, *line = NULL;
+  size_t nlinelen = 0, linelen = 0;
+  char *q;
+  size_t prefix_len = sizeof ("/sys/devices/system/node/") - 1;
+
+  strcpy (name, "/sys/devices/system/node/online");
+  f = fopen (name, "r");
+  if (f == NULL || getline (, , f) <= 0)
+{
+  if (f)
+   fclose (f);
+  return;
+}
+  fclose (f);
+  q = nline;
+  while (*q && *q != '\n' && gomp_places_list_len < count)
+{
+  unsigned long nfirst, nlast;
+
+  errno = 0;
+  nfirst = strtoul (q, , 10);
+  if (errno)
+   break;
+  nlast = nfirst;
+  if (*q == '-')
+   {
+ errno = 0;
+ nlast = strtoul (q + 1, , 10);
+ if (errno || nlast < nfirst)
+   break;
+   }
+  for (; nfirst <= nlast; nfirst++)
+   {
+ sprintf (name + prefix_len, "node%lu/cpulist", nfirst);
+ f = fopen (name, "r");
+ if (f == NULL)
+   continue;
+ if (getline (, , f) > 0)
+   {
+ char *p = line;
+ void *pl = NULL;
+
+ while (*p && *p != '\n')
+   {
+ unsigned long first, last;
+ bool seen = false;
+
+ errno = 0;
+ first = strtoul (p, , 10);
+ if (errno)
+   break;
+ last = first;
+ if (*p == '-')
+   {
+ errno = 0;
+ last = strtoul (p + 1, , 10);
+ if (errno || last < first)
+   break;
+   }
+ for (; first <= last; first++)
+   {
+ if (!CPU_ISSET_S (first, gomp_cpuset_size, copy))
+   continue;
+ if (pl == NULL)
+   {
+ pl = gomp_places_list[gomp_places_list_len];
+ gomp_affinity_init_place (pl);
+   }
+ if (gomp_affinity_add_cpus (pl, first, 1, 0, true))
+   {
+ CPU_CLR_S (first, gomp_cpuset_size, copy);
+ if (!seen)
+   {
+ gomp_places_list_len++;
+ seen = true;
+   }
+   }
+   }
+ if (*p == ',')
+   ++p;
+   }
+   }
+ fclose (f);
+   }
+  if (*q == ',')
+   ++q;
+}
+  free (line);
+  free (nline);
+}
+
 bool
 gomp_affinity_init_level (int level, unsigned long count, bool quiet)
 {
@@ -377,8 +473,11 @@ gomp_affinity_init_level (int level, uns
   copy = gomp_alloca (gomp_cpuset_size);
   strcpy (name, "/sys/devices/system/cpu/cpu");
   memcpy (copy, gomp_cpusetp, gomp_cpuset_size);
-  gomp_affinity_init_level_1 (level, level > 3 ? level : 3, count, copy, name,
- quiet);
+  if (level == 5)
+gomp_affinity_init_numa_domains (count, copy, name);
+  else
+gomp_affinity_init_level_1 (level, level > 3 ? level : 3, count, copy,
+   name, quiet);
   if (gomp_places_list_len == 0)
 {
   if (!quiet)
--- libgomp/testsuite/libgomp.c/places-5.c.jj   2021-10-15 11:31:58.161312737 
+0200
+++ 

[committed] openmp: Add support for OMP_PLACES=ll_caches

2021-10-15 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch implements support for ll_caches abstract name in OMP_PLACES,
which stands for places where logical cpus in each place share the last
level cache.

This seems to work fine for me on x86 and kernel sources show that it is
in common code, but on some machines on CompileFarm the files I'm using,
i.e.
/sys/devices/system/cpu/cpuN/cache/indexN/level
/sys/devices/system/cpu/cpuN/cache/indexN/shared_cpu_list
don't exist, is that because they have too old kernel and newer kernels
are fine or should I implement some fallback methods (which)?
E.g. on gcc112.fsffrance.org I see just shared_cpu_map and not shared_cpu_list
(with shared_cpu_map being harder to parse) and on another box I didn't even
see the cache subdirectories.

Way to test this is
OMP_PLACES=ll_caches OMP_DISPLAY_ENV=true LD_PRELOAD=.libs/libgomp.so.1 
/bin/true
and see what it prints on OMP_PLACES line.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-10-15  Jakub Jelinek  

* env.c (parse_places_var): Handle ll_caches as level 4.
* config/linux/affinity.c (gomp_affinity_find_last_cache_level): New
function.
(gomp_affinity_init_level_1): Handle level 4 as logical cpus sharing
last level cache.
(gomp_affinity_init_level): Likewise.
* testsuite/libgomp.c/places-1.c: New test.
* testsuite/libgomp.c/places-2.c: New test.
* testsuite/libgomp.c/places-3.c: New test.
* testsuite/libgomp.c/places-4.c: New test.

--- libgomp/env.c.jj2021-10-11 12:20:21.926063118 +0200
+++ libgomp/env.c   2021-10-14 15:25:48.816212823 +0200
@@ -696,6 +696,11 @@ parse_places_var (const char *name, bool
   env += 7;
   level = 3;
 }
+  else if (strncasecmp (env, "ll_caches", 9) == 0)
+{
+  env += 9;
+  level = 4;
+}
   if (level)
 {
   count = ULONG_MAX;
--- libgomp/config/linux/affinity.c.jj  2021-07-13 09:50:46.270677237 +0200
+++ libgomp/config/linux/affinity.c 2021-10-14 17:13:12.811302863 +0200
@@ -223,6 +223,46 @@ gomp_affinity_finalize_place_list (bool
   return true;
 }
 
+/* Find the index of the last level cache.  We assume the index
+   of the last level cache is the same for all logical CPUs.
+   Also, if there are multiple caches with the same highest level,
+   assume they have the same shared_cpu_list and pick the last one
+   from them (highest index number).  */
+
+static int
+gomp_affinity_find_last_cache_level (char *name, size_t prefix_len,
+unsigned long cpu)
+{
+  int ret = -1;
+  unsigned long maxval = 0;
+  char *line = NULL;
+  size_t linelen = 0;
+  FILE *f;
+
+  for (int l = 0; l < 128; l++)
+{
+  sprintf (name + prefix_len, "%lu/cache/index%u/level", cpu, l);
+  f = fopen (name, "r");
+  if (f == NULL)
+   break;
+  if (getline (, , f) > 0)
+   {
+ unsigned long val;
+ char *p;
+ errno = 0;
+ val = strtoul (line, , 10);
+ if (!errno && val >= maxval)
+   {
+ ret = l;
+ maxval = val;
+   }
+   }
+  fclose (f);
+}
+  free (line);
+  return ret;
+}
+
 static void
 gomp_affinity_init_level_1 (int level, int this_level, unsigned long count,
cpu_set_t *copy, char *name, bool quiet)
@@ -232,12 +272,29 @@ gomp_affinity_init_level_1 (int level, i
   char *line = NULL;
   size_t linelen = 0;
   unsigned long i, max = 8 * gomp_cpuset_size;
+  int init = -1;
 
   for (i = 0; i < max && gomp_places_list_len < count; i++)
 if (CPU_ISSET_S (i, gomp_cpuset_size, copy))
   {
-   sprintf (name + prefix_len, "%lu/topology/%s_siblings_list",
-i, this_level == 3 ? "core" : "thread");
+   if (level == 4)
+ {
+   if (init == -1)
+ {
+   init = gomp_affinity_find_last_cache_level (name, prefix_len,
+   i);
+   if (init == -1)
+ {
+   CPU_CLR_S (i, gomp_cpuset_size, copy);
+   continue;
+ }
+   sprintf (name + prefix_len,
+"%lu/cache/index%u/shared_cpu_list", i, init);
+ }
+ }
+   else
+ sprintf (name + prefix_len, "%lu/topology/%s_siblings_list",
+  i, this_level == 3 ? "core" : "thread");
f = fopen (name, "r");
if (f == NULL)
  {
@@ -302,7 +359,7 @@ bool
 gomp_affinity_init_level (int level, unsigned long count, bool quiet)
 {
   char name[sizeof ("/sys/devices/system/cpu/cpu/topology/"
-   "thread_siblings_list") + 3 * sizeof (unsigned long)];
+   "thread_siblings_list") + 6 * sizeof (unsigned long)];
   cpu_set_t *copy;
 
   if (gomp_cpusetp)
@@ -320,7 +377,8 @@ gomp_affinity_init_level (int level, uns
   copy = gomp_alloca (gomp_cpuset_size);
   strcpy (name, 

[patch] Tame if-combining when loop unswitching is enabled

2021-10-15 Thread Eric Botcazou via Gcc-patches
Hi,

in order to make it possible to vectorize loops running over arrays in Ada, 
which generally contain index checks, hence control-flow instructions, we rely 
on loop unswitching to generate two copies of the loop, one guarded with a 
global condition (no index check fails in the loop) and vectorizable and one 
with the original index checks and non-vectorizable.  This is achieved by the 
simple trick of prepending the global_condition to the condition of the index 
checks and letting the loop unswitching pass do its magic.

But there is an enemy, namely if-combining, which can turn a simple boolean 
conjunction into something else that loop unswitching cannot deal with, and a 
testcase is attached with 3 slightly different versions of the same issue.

Therefore the attached patch attempts to tame if-combining by reasoning on the 
loop invariant status (really loop depths) of the conditions.

Bootstrapped/regtested on x86-64/Linux, OK for the mainline?


2021-10-15  Eric Botcazou  

* tree-ssa-ifcombine.c: Include cfgloop.h.
(operand_loop_depth): New function.
(ifcombine_ifandif): When loop unswitching is enabled, do not merge
conditions whose loop invariant status is different.


2021-10-15  Eric Botcazou  

* gnat.dg/vect19.ads, gnat.dg/vect19.adb: New test.

-- 
Eric Botcazoudiff --git a/gcc/tree-ssa-ifcombine.c b/gcc/tree-ssa-ifcombine.c
index f93e04aa4df..986084049da 100644
--- a/gcc/tree-ssa-ifcombine.c
+++ b/gcc/tree-ssa-ifcombine.c
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
BRANCH_COST.  */
 #include "fold-const.h"
 #include "cfganal.h"
+#include "cfgloop.h"
 #include "gimple-fold.h"
 #include "gimple-iterator.h"
 #include "gimplify-me.h"
@@ -378,6 +379,19 @@ update_profile_after_ifcombine (basic_block inner_cond_bb,
   outer2->probability = profile_probability::never ();
 }
 
+/* Return the loop depth of GIMPLE operand OP.  */
+
+static int
+operand_loop_depth (tree op)
+{
+  basic_block bb;
+
+  if (TREE_CODE (op) == SSA_NAME && (bb = gimple_bb (SSA_NAME_DEF_STMT (op
+return bb_loop_depth (bb);
+
+  return 0;
+}
+
 /* If-convert on a and pattern with a common else block.  The inner
if is specified by its INNER_COND_BB, the outer by OUTER_COND_BB.
inner_inv, outer_inv and result_inv indicate whether the conditions
@@ -554,6 +568,22 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
 	  HONOR_NANS (gimple_cond_lhs (outer_cond)));
   if (outer_cond_code == ERROR_MARK)
 	return false;
+  /* Do not merge if the loop invariant status of the conditions is not
+	 the same and we'll be unswitching loops downstream.  */
+  if (flag_unswitch_loops)
+	{
+	  const int current_depth
+	= MIN (bb_loop_depth (inner_cond_bb),
+		   bb_loop_depth (outer_cond_bb));
+	  const int inner_depth
+	= MAX (operand_loop_depth (gimple_cond_lhs (inner_cond)),
+		   operand_loop_depth (gimple_cond_rhs (inner_cond)));
+	  const int outer_depth
+	= MAX (operand_loop_depth (gimple_cond_lhs (outer_cond)),
+		   operand_loop_depth (gimple_cond_rhs (outer_cond)));
+	  if ((inner_depth < current_depth) != (outer_depth < current_depth))
+	return false;
+	}
   /* Don't return false so fast, try maybe_fold_or_comparisons?  */
 
   if (!(t = maybe_fold_and_comparisons (boolean_type_node, inner_cond_code,
-- { dg-do compile { target i?86-*-* x86_64-*-* } }
-- { dg-options "-O3 -msse2 -fno-vect-cost-model -fdump-tree-vect-details" }
-- { dg-additional-options "-gnatX" }

package body Vect19 is

   function "+" (X, Y : Varray) return Varray is
  R : Varray (X'Range);
   begin
  for I in X'Range loop
 R(I) := X(I) + Y(I);
  end loop;
  return R;
   end;

   procedure Add (X, Y : Varray; R : out Varray) is
   begin
  for I in X'Range loop
 R(I) := X(I) + Y(I);
  end loop;
   end;

   procedure Add (X, Y : not null access Varray; R : not null access Varray) is
   begin
  for I in X'Range loop
 R(I) := X(I) + Y(I);
  end loop;
   end;

end Vect19;

-- { dg-final { scan-tree-dump-times "vectorized 1 loops" 3 "vect"  } }
package Vect19 is

   type Varray is array (Natural range 1 .. <>) of Long_Float;
   for Varray'Alignment use 16;

   function "+" (X, Y : Varray) return Varray;
   procedure Add (X, Y : Varray; R : out Varray);
   procedure Add (X, Y : not null access Varray; R : not null access Varray);

end Vect19;


RE: [PATCH]middle-end convert negate + right shift into compare greater.

2021-10-15 Thread Richard Biener via Gcc-patches
On Fri, 15 Oct 2021, Tamar Christina wrote:

> > >
> > > +/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */ (for
> > > +cst (INTEGER_CST VECTOR_CST)  (simplify
> > > +  (rshift (negate:s @0) cst@1)
> > > +   (if (!flag_wrapv)
> > 
> > Don't test flag_wrapv directly, instead use the appropriate
> > TYPE_OVERFLOW_{UNDEFINED,WRAPS} predicates.  But I'm not sure what
> > we are protecting against?  Right-shift of signed integers is 
> > implementation-
> > defined and GCC treats it as you'd expect, sign-extending the result.
> > 
> 
> It's protecting against the overflow of the negate on INT_MIN. When wrapping
> overflows are enabled the results would be wrong.

But -INT_MIN == INT_MIN in twos-complement so I fail to see the wrong
result?  That is, both -INT_MIN >> 31 and INT_MIN >> 31 are -1.

> > > +(with { tree ctype = TREE_TYPE (@0);
> > > + tree stype = TREE_TYPE (@1);
> > > + tree bt = truth_type_for (ctype); }
> > > + (switch
> > > +  /* Handle scalar case.  */
> > > +  (if (INTEGRAL_TYPE_P (ctype)
> > > +&& !VECTOR_TYPE_P (ctype)
> > > +&& !TYPE_UNSIGNED (ctype)
> > > +&& canonicalize_math_after_vectorization_p ()
> > > +&& wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (stype) - 1))
> > > +   (convert:bt (gt:bt @0 { build_zero_cst (stype); })))
> > 
> > I'm not sure why the result is of type 'bt' rather than the original type 
> > of the
> > expression?
> 
> That was to satisfy some RTL check that expected results of comparisons to 
> always
> be a Boolean, though for scalar that logically always is the case, I just 
> added it
> for consistency.
> 
> > 
> > In that regard for non-vectors we'd have to add the sign extension from
> > unsigned bool, in the vector case we'd hope the type of the comparison is
> > correct.  I think in both cases it might be convenient to use
> > 
> >   (cond (gt:bt @0 { build_zero_cst (ctype); }) { build_all_ones_cst 
> > (ctype); }
> > { build_zero_cost (ctype); })
> > 
> > to compute the correct result and rely on (cond ..) simplifications to 
> > simplify
> > that if possible.
> > 
> > Btw, 'stype' should be irrelevant - you need to look at the precision of 
> > 'ctype',
> > no?
> 
> I was working under the assumption that both input types must have the same
> precision, but turns out that assumption doesn't need to hold.
> 
> New version attached.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no regressions.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * match.pd: New negate+shift pattern.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/signbit-2.c: New test.
>   * gcc.dg/signbit-3.c: New test.
>   * gcc.target/aarch64/signbit-1.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> 7d2a24dbc5e9644a09968f877e12a824d8ba1caa..9532cae582e152cae6e22fcce95a9744a844e3c2
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -38,7 +38,8 @@ along with GCC; see the file COPYING3.  If not see
> uniform_integer_cst_p
> HONOR_NANS
> uniform_vector_p
> -   bitmask_inv_cst_vector_p)
> +   bitmask_inv_cst_vector_p
> +   expand_vec_cmp_expr_p)
>  
>  /* Operator lists.  */
>  (define_operator_list tcc_comparison
> @@ -826,6 +827,42 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  { tree utype = unsigned_type_for (type); }
>  (convert (rshift (lshift (convert:utype @0) @2) @3))
>  
> +/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */
> +(for cst (INTEGER_CST VECTOR_CST)
> + (simplify
> +  (rshift (negate:s @0) cst@1)
> +   (if (!TYPE_OVERFLOW_WRAPS (type))

as said, I don't think that's necessary but at least it's now
written correctly ;)

> +(with { tree ctype = TREE_TYPE (@0);

Instead of 'ctype' you can use 'type' since the type of the expression
is the same as that of @0

> + tree stype = TREE_TYPE (@1);
> + tree bt = truth_type_for (ctype);
> + tree zeros = build_zero_cst (ctype); }
> + (switch
> +  /* Handle scalar case.  */
> +  (if (INTEGRAL_TYPE_P (ctype)
> +&& !VECTOR_TYPE_P (ctype)

INTEGRAL_TYPE_P does not include VECTOR_TYPE_P.

> +&& !TYPE_UNSIGNED (ctype)
> +&& canonicalize_math_after_vectorization_p ()
> +&& wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (ctype) - 1))
> +   (cond (gt:bt @0 { zeros; }) { build_all_ones_cst (ctype); } { zeros; 
> }))
> +  /* Handle vector case with a scalar immediate.  */
> +  (if (VECTOR_INTEGER_TYPE_P (ctype)
> +&& !VECTOR_TYPE_P (stype)
> +&& !TYPE_UNSIGNED (ctype)
> +&& expand_vec_cmp_expr_p (ctype, ctype, { GT_EXPR }))
> +   (with { HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE 
> (ctype)); }
> + (if (wi::eq_p (wi::to_wide (@1), bits - 1))

You can use element_precision (@0) - 1 in both the scalar and vector case.

> +  (convert:bt (gt:bt @0 { zeros; })
> +  /* Handle vector case with a vector 

Re: [r12-4397 Regression] FAIL: gcc.dg/guality/pr54200.c -Og -DPREVENT_OPTIMIZATION line 20 z == 3 on Linux/x86_64

2021-10-15 Thread Martin Liška

On 10/14/21 21:16, sunil.k.pandey wrote:

FAIL: gcc.dg/guality/pr54200.c  -Og -DPREVENT_OPTIMIZATION  line 20 z == 3


Hello.

I've just verified the assembly is identical before and after the revision.
So it must be a false positive.

Cheers,
Martin


Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2021-10-15 Thread Richard Biener via Gcc-patches
On Tue, 12 Oct 2021, Andre Vieira (lists) wrote:

> Hi Richi,
> 
> I think this is what you meant, I now hide all the unrolling cost calculations
> in the existing target hooks for costs. I did need to adjust 'finish_cost' to
> take the loop_vinfo so the target's implementations are able to set the newly
> renamed 'suggested_unroll_factor'.
> 
> Also added the checks for the epilogue's VF.
> 
> Is this more like what you had in mind?

Not exactly (sorry..).  For the target hook I think we don't want to
pass vec_info but instead another output parameter like the existing
ones.

vect_estimate_min_profitable_iters should then via
vect_analyze_loop_costing and vect_analyze_loop_2 report the unroll
suggestion to vect_analyze_loop which should then, if the suggestion
was > 1, instead of iterating to the next vector mode run again
with a fixed VF (old VF times suggested unroll factor - there's
min_vf in vect_analyze_loop_2 which we should adjust to
the old VF times two for example and maybe store the suggested
factor as hint) - if it succeeds the result will end up in the
list of considered modes (where we now may have more than one
entry for the same mode but a different VF), we probably want to
only consider more unrolling once.

For simplicity I'd probably set min_vf = max_vf = old VF * suggested 
factor, thus take the targets request literally.

Richard.

> 
> gcc/ChangeLog:
> 
>     * config/aarch64/aarch64.c (aarch64_finish_cost): Add class vec_info
> parameter.
>     * config/i386/i386.c (ix86_finish_cost): Likewise.
>     * config/rs6000/rs6000.c (rs6000_finish_cost): Likewise.
>     * doc/tm.texi: Document changes to TARGET_VECTORIZE_FINISH_COST.
>     * target.def: Add class vec_info parameter to finish_cost.
>     * targhooks.c (default_finish_cost): Likewise.
>     * targhooks.h (default_finish_cost): Likewise.
>     * tree-vect-loop.c (vect_determine_vectorization_factor): Use 
> suggested_unroll_factor
>     to increase vectorization_factor if possible.
>     (_loop_vec_info::_loop_vec_info): Add suggested_unroll_factor 
> member.
>     (vect_compute_single_scalar_iteration_cost): Adjust call to
> finish_cost.
>     (vect_determine_partial_vectors_and_peeling): Ensure unrolled loop is
> not predicated.
>     (vect_determine_unroll_factor): New.
>     (vect_try_unrolling): New.
>     (vect_reanalyze_as_main_loop): Also try to unroll when 
> reanalyzing as main loop.
>     (vect_analyze_loop): Add call to vect_try_unrolling and check to
> ensure epilogue
>     is either a smaller VF than main loop or uses partial vectors and
> might be of equal
>     VF.
>     (vect_estimate_min_profitable_iters): Adjust call to finish_cost.
>     (vectorizable_reduction): Make sure to not use 
> single_defuse_cyle when unrolling.
>     * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Adjust call to
> finish_cost.
>     * tree-vectorizer.h (finish_cost): Change to pass new class vec_info
> parameter.
> 
> On 01/10/2021 09:19, Richard Biener wrote:
> > On Thu, 30 Sep 2021, Andre Vieira (lists) wrote:
> >
> >> Hi,
> >>
> >>
>  That just forces trying the vector modes we've tried before. Though I
>  might
>  need to revisit this now I think about it. I'm afraid it might be
>  possible
>  for
>  this to generate an epilogue with a vf that is not lower than that of the
>  main
>  loop, but I'd need to think about this again.
> 
>  Either way I don't think this changes the vector modes used for the
>  epilogue.
>  But maybe I'm just missing your point here.
> >>> Yes, I was refering to the above which suggests that when we vectorize
> >>> the main loop with V4SF but unroll then we try vectorizing the
> >>> epilogue with V4SF as well (but not unrolled).  I think that's
> >>> premature (not sure if you try V8SF if the main loop was V4SF but
> >>> unrolled 4 times).
> >> My main motivation for this was because I had a SVE loop that vectorized
> >> with
> >> both VNx8HI, then V8HI which beat VNx8HI on cost, then it decided to unroll
> >> V8HI by two and skipped using VNx8HI as a predicated epilogue which
> >> would've
> >> been the best choice.
> > I see, yes - for fully predicated epilogues it makes sense to consider
> > the same vector mode as for the main loop anyways (independent on
> > whether we're unrolling or not).  One could argue that with an
> > unrolled V4SImode main loop a predicated V8SImode epilogue would also
> > be a good match (but then somehow costing favored the unrolled V4SI
> > over the V8SI for the main loop...).
> >
> >> So that is why I decided to just 'reset' the vector_mode selection. In a
> >> scenario where you only have the traditional vector modes it might make
> >> less
> >> sense.
> >>
> >> Just realized I still didn't add any check to make sure the epilogue has a
> >> lower VF than the previous loop, though I'm still not sure that could
> >> happen.
> 

Re: [PATCH] libstdc++: Check [ptr, end) and [ptr, ptr+n) ranges with _GLIBCXX_ASSERTIONS

2021-10-15 Thread Jonathan Wakely via Gcc-patches
On Fri, 15 Oct 2021 at 06:19, François Dumont wrote:
>
> On 14/10/21 7:43 pm, Jonathan Wakely wrote:
> > On Thu, 14 Oct 2021 at 18:11, François Dumont  wrote:
> >> Hi
> >>
> >>   On a related subject I am waiting for some feedback on:
> >>
> >> https://gcc.gnu.org/pipermail/libstdc++/2021-August/053005.html
> > I'm concerned that this adds too much overhead for the
> > _GLIBCXX_ASSERTIONS case. It adds function calls which are not
> > necessarily inlined, and which perform arithmetic and comparisons on
> > the arguments. That has a runtime cost which is non-zero.
>
> I thought that limiting the checks to __valid_range would be fine for
> _GLIBCXX_ASSERTIONS. If you do not want any overhead you just don't
> define it.

Then you get no checks at all. The point of _GLIBCXX_ASSERTIONS is to
get *some* checking, without too much overhead. If you are willing to
accept the overhead we already have _GLIBCXX_DEBUG for that.

We could consider a second level of _GLIBCXX_ASSERTIONS=2 that turns
on extra checks, but we need to be careful about adding any
non-trivial checks to _GLIBCXX_ASSERTIONS=1 (which is what is used
today in major linux distributions, to build every C++ program and
library in the OS).


>
> >
> > The patches I sent in this thread have zero runtime cost, because they
> > use the compiler built-in which compiles away to nothing if the sizes
> > aren't known.
> I'll try to find out if it can help for the test case on std::copy which
> I was adding with my proposal.
> >
> >> On 11/10/21 6:49 pm, Jonathan Wakely wrote:
> >>> This enables lightweight checks for the __glibcxx_requires_valid_range
> >>> and __glibcxx_requires_string_len macros  when _GLIBCXX_ASSERTIONS is
> >>> defined.  By using __builtin_object_size we can check whether the end of
> >>> the range is part of the same object as the start of the range, and
> >>> detect problems like in PR 89927.
> >>>
> >>> libstdc++-v3/ChangeLog:
> >>>
> >>>* include/debug/debug.h (__valid_range_p, __valid_range_n): New
> >>>inline functions using __builtin_object_size to check ranges
> >>>delimited by pointers.
> >>>[_GLIBCXX_ASSERTIONS] (__glibcxx_requires_valid_range): Use
> >>>__valid_range_p.
> >>>[_GLIBCXX_ASSERTIONS] (__glibcxx_requires_string_len): Use
> >>>__valid_range_n.
> >>>
> >>>
> >>> The first patch allows us to detect bugs like string("foo", "bar"),
> >>> like in PR 89927. Debug mode cannot currently detect this. The new
> >>> check uses the compiler built-in to detect when the two arguments are
> >>> not part of the same object. This assumes we're optimizing and the
> >>> compiler knows the values of the pointers. If it doesn't, then the
> >>> function just returns true and should inline to nothing.
> >> I see, it does not detect that input pointers are unrelated but as they
> >> are the computed size is >= __sz.
> >>
> >> Isn't it UB to compare unrelated pointers ?
> > Yes, and my patch doesn't compare any pointers, does it?
> >
> +  __UINTPTR_TYPE__ __f = (__UINTPTR_TYPE__)__first;
> +  __UINTPTR_TYPE__ __l = (__UINTPTR_TYPE__)__last;
> +  if (const std::size_t __sz = __builtin_object_size(__first, 3))
> +return __f <= __l && (__l - __f) <= __sz;
>
> Isn't it a comparison ?

It's not comparing pointers, it's comparing integers. To avoid the
unspecified behaviour of comparing unrelated pointers.

>
> But maybe this is what the previous cast is for, I never understood it.
>
> Note that those cast could be moved within the if branch, even if I
> guess that the compiler does it.

At -O1 the casts are zero cost, they don't generate any code. At -O0,
you have so much overhead for every line of code that this doesn't
make much difference! But yes, we could move them into the if
statement.



[PATCH] AVX512FP16: Add *_set1_pch intrinsics.

2021-10-15 Thread dianhong.xu--- via Gcc-patches
From: dianhong xu 

Add *_set1_pch (_Float16 _Complex A) intrinsics.

gcc/ChangeLog:

* config/i386/avx512fp16intrin.h:
(_mm512_set1_pch): New intrinsic.
* config/i386/avx512fp16vlintrin.h:
(_mm256_set1_pch): New intrinsic.
(_mm_set1_pch): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-set1-pch-1a.c: New test.
* gcc.target/i386/avx512fp16-set1-pch-1b.c: New test.
* gcc.target/i386/avx512fp16vl-set1-pch-1a.c: New test.
* gcc.target/i386/avx512fp16vl-set1-pch-1b.c: New test.
---
 gcc/config/i386/avx512fp16intrin.h| 13 +
 gcc/config/i386/avx512fp16vlintrin.h  | 26 +
 .../gcc.target/i386/avx512fp16-set1-pch-1a.c  | 13 +
 .../gcc.target/i386/avx512fp16-set1-pch-1b.c  | 42 ++
 .../i386/avx512fp16vl-set1-pch-1a.c   | 20 +++
 .../i386/avx512fp16vl-set1-pch-1b.c   | 57 +++
 6 files changed, 171 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-set1-pch-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-set1-pch-1b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-set1-pch-1a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16vl-set1-pch-1b.c

diff --git a/gcc/config/i386/avx512fp16intrin.h 
b/gcc/config/i386/avx512fp16intrin.h
index 079ce321c01..17025d68b8e 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -7237,6 +7237,19 @@ _mm512_permutexvar_ph (__m512i __A, __m512h __B)
 (__mmask32)-1);
 }
 
+extern __inline __m512h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_set1_pch (_Float16 _Complex __A)
+{
+  union
+  {
+_Float16 _Complex a;
+float b;
+  } u = { .a = __A};
+
+  return (__m512h) _mm512_set1_ps (u.b);
+}
+
 #ifdef __DISABLE_AVX512FP16__
 #undef __DISABLE_AVX512FP16__
 #pragma GCC pop_options
diff --git a/gcc/config/i386/avx512fp16vlintrin.h 
b/gcc/config/i386/avx512fp16vlintrin.h
index f83a429ba43..1de4513d7f1 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -3315,6 +3315,32 @@ _mm_permutexvar_ph (__m128i __A, __m128h __B)
 (__mmask8)-1);
 }
 
+extern __inline __m256h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_set1_pch (_Float16 _Complex __A)
+{
+  union
+  {
+_Float16 _Complex a;
+float b;
+  } u = { .a = __A };
+
+  return (__m256h) _mm256_set1_ps (u.b);
+}
+
+extern __inline __m128h
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_set1_pch (_Float16 _Complex __A)
+{
+  union
+  {
+_Float16 _Complex a;
+float b;
+  } u = { .a = __A };
+
+  return (__m128h) _mm_set1_ps (u.b);
+}
+
 #ifdef __DISABLE_AVX512FP16VL__
 #undef __DISABLE_AVX512FP16VL__
 #pragma GCC pop_options
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-set1-pch-1a.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16-set1-pch-1a.c
new file mode 100644
index 000..0055193f243
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-set1-pch-1a.c
@@ -0,0 +1,13 @@
+/* { dg-do compile} */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include 
+
+__m512h
+__attribute__ ((noinline, noclone))
+test_mm512_set1_pch (_Float16 _Complex A)
+{
+  return _mm512_set1_pch(A);
+}
+
+/* { dg-final { scan-assembler "vbroadcastss\[ \\t\]+\[^\n\r\]*%zmm\[01\]" } } 
*/
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-set1-pch-1b.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16-set1-pch-1b.c
new file mode 100644
index 000..450d7e37237
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-set1-pch-1b.c
@@ -0,0 +1,42 @@
+/* { dg-do run { target avx512fp16 } } */
+/* { dg-options "-O2 -mavx512fp16" } */
+
+#include
+#include 
+#include 
+
+static void do_test (void);
+
+#define DO_TEST do_test
+#define AVX512FP16
+
+#include 
+#include "avx512-check.h"
+
+static void
+do_test (void)
+{
+ _Float16 _Complex fc = 1.0 + 1.0*I;
+  union
+  {
+_Float16 _Complex a;
+float b;
+  } u = { .a = fc };
+  float ff= u.b;
+
+  typedef union
+  {
+float fp[16];
+__m512h m512h;
+  } u1;
+
+  __m512h test512 = _mm512_set1_pch(fc);
+
+  u1 test;
+  test.m512h = test512;
+  for (int i = 0; i<16; i++)
+  {
+if (test.fp[i] != ff) abort();
+  }
+
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16vl-set1-pch-1a.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16vl-set1-pch-1a.c
new file mode 100644
index 000..4c5624f9935
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16vl-set1-pch-1a.c
@@ -0,0 +1,20 @@
+/* { dg-do compile} */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
+
+#include 
+
+__m256h
+__attribute__ ((noinline, noclone))
+test_mm256_set1_pch (_Float16 _Complex A)
+{
+  return _mm256_set1_pch(A);
+}
+
+__m128h
+__attribute__ ((noinline, noclone))
+test_mm_set1_pch (_Float16 

Re: [PATCH] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS

2021-10-15 Thread Richard Biener via Gcc-patches
On Tue, 28 Sep 2021, Koning, Paul wrote:

> 
> 
> > On Sep 28, 2021, at 2:14 AM, Richard Biener via Gcc-patches 
> >  wrote:
> > 
> > On Tue, Sep 21, 2021 at 4:26 PM Richard Biener via Gcc-patches
> >  wrote:
> >> 
> >> This makes defaults.h choose DWARF2_DEBUG if PREFERRED_DEBUGGING_TYPE
> >> is not specified by the target and errors out if DWARF DWARF is not 
> >> supported.
> >> 
> >> ...
> >> 
> >> This completes the series of deprecating STABS for GCC 12.
> >> 
> >> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >> 
> >> OK for trunk?
> > 
> > Ping.
> 
> pdp11 is fine.

I have now pushed this and the related changes.html update.

Richard.


[PATCH] c/102763 - fix ICE with invalid input to GIMPLE FE

2021-10-15 Thread Richard Biener via Gcc-patches
This fixes an ICE for the failure to verify we're dereferencing a
pointer before throwing that at build_simple_mem_ref.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

2021-10-15  Richard Biener  

PR c/102763
gcc/c/
* gimple-parser.c
(c_parser_gimple_postfix_expression_after_primary): Check
for a pointer do be dereferenced by ->.

gcc/testsuite/
* gcc.dg/gimplefe-error-12.c: New testcase.
---
 gcc/c/gimple-parser.c|  8 
 gcc/testsuite/gcc.dg/gimplefe-error-12.c | 10 ++
 2 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/gimplefe-error-12.c

diff --git a/gcc/c/gimple-parser.c b/gcc/c/gimple-parser.c
index c43ee38a2cf..f3d99355a8e 100644
--- a/gcc/c/gimple-parser.c
+++ b/gcc/c/gimple-parser.c
@@ -1817,6 +1817,14 @@ c_parser_gimple_postfix_expression_after_primary 
(gimple_parser ,
case CPP_DEREF:
  {
/* Structure element reference.  */
+   if (!POINTER_TYPE_P (TREE_TYPE (expr.value)))
+ {
+   c_parser_error (parser, "dereference of non-pointer");
+   expr.set_error ();
+   expr.original_code = ERROR_MARK;
+   expr.original_type = NULL;
+   return expr;
+ }
c_parser_consume_token (parser);
if (c_parser_next_token_is (parser, CPP_NAME))
  {
diff --git a/gcc/testsuite/gcc.dg/gimplefe-error-12.c 
b/gcc/testsuite/gcc.dg/gimplefe-error-12.c
new file mode 100644
index 000..981ff7ba499
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gimplefe-error-12.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple" } */
+
+int get_current ();
+
+__GIMPLE
+void foo()
+{
+  get_current()->flags; /* { dg-error "non-pointer" } */
+}
-- 
2.31.1


Re: [RFC] Don't move cold code out of loop by checking bb count

2021-10-15 Thread Richard Biener via Gcc-patches
On Sat, Oct 9, 2021 at 5:45 AM Xionghu Luo  wrote:
>
> Hi,
>
> On 2021/9/28 20:09, Richard Biener wrote:
> > On Fri, Sep 24, 2021 at 8:29 AM Xionghu Luo  wrote:
> >>
> >> Update the patch to v3, not sure whether you prefer the paste style
> >> and continue to link the previous thread as Segher dislikes this...
> >>
> >>
> >> [PATCH v3] Don't move cold code out of loop by checking bb count
> >>
> >>
> >> Changes:
> >> 1. Handle max_loop in determine_max_movement instead of
> >> outermost_invariant_loop.
> >> 2. Remove unnecessary changes.
> >> 3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in 
> >> can_sm_ref_p.
> >> 4. "gsi_next ();" in move_computations_worker is kept since it caused
> >> infinite loop when implementing v1 and the iteration is missed to be
> >> updated actually.
> >>
> >> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
> >> v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html
> >>
> >> There was a patch trying to avoid move cold block out of loop:
> >>
> >> https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
> >>
> >> Richard suggested to "never hoist anything from a bb with lower execution
> >> frequency to a bb with higher one in LIM invariantness_dom_walker
> >> before_dom_children".
> >>
> >> In gimple LIM analysis, add find_coldest_out_loop to move invariants to
> >> expected target loop, if profile count of the loop bb is colder
> >> than target loop preheader, it won't be hoisted out of loop.
> >> Likely for store motion, if all locations of the REF in loop is cold,
> >> don't do store motion of it.
> >>
> >> SPEC2017 performance evaluation shows 1% performance improvement for
> >> intrate GEOMEAN and no obvious regression for others.  Especially,
> >> 500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is
> >> largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00%
> >> on P8LE.
> >>
> >> gcc/ChangeLog:
> >>
> >> * loop-invariant.c (find_invariants_bb): Check profile count
> >> before motion.
> >> (find_invariants_body): Add argument.
> >> * tree-ssa-loop-im.c (find_coldest_out_loop): New function.
> >> (determine_max_movement): Use find_coldest_out_loop.
> >> (move_computations_worker): Adjust and fix iteration udpate.
> >> (execute_sm_exit): Check pointer validness.
> >> (class ref_in_loop_hot_body): New functor.
> >> (ref_in_loop_hot_body::operator): New.
> >> (can_sm_ref_p): Use for_all_locs_in_loop.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.dg/tree-ssa/recip-3.c: Adjust.
> >> * gcc.dg/tree-ssa/ssa-lim-18.c: New test.
> >> * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
> >> * gcc.dg/tree-ssa/ssa-lim-20.c: New test.
> >> ---
> >>  gcc/loop-invariant.c   | 10 ++--
> >>  gcc/tree-ssa-loop-im.c | 61 --
> >>  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c|  2 +-
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c | 20 +++
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 27 ++
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 25 +
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c | 28 ++
> >>  7 files changed, 165 insertions(+), 8 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
> >>
> >> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> >> index fca0c2b24be..5c3be7bf0eb 100644
> >> --- a/gcc/loop-invariant.c
> >> +++ b/gcc/loop-invariant.c
> >> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool 
> >> always_reached, bool always_executed)
> >> call.  */
> >>
> >>  static void
> >> -find_invariants_bb (basic_block bb, bool always_reached, bool 
> >> always_executed)
> >> +find_invariants_bb (class loop *loop, basic_block bb, bool always_reached,
> >> +   bool always_executed)
> >>  {
> >>rtx_insn *insn;
> >> +  basic_block preheader = loop_preheader_edge (loop)->src;
> >> +
> >> +  if (preheader->count > bb->count)
> >> +return;
> >>
> >>FOR_BB_INSNS (bb, insn)
> >>  {
> >> @@ -1214,8 +1219,7 @@ find_invariants_body (class loop *loop, basic_block 
> >> *body,
> >>unsigned i;
> >>
> >>for (i = 0; i < loop->num_nodes; i++)
> >> -find_invariants_bb (body[i],
> >> -   bitmap_bit_p (always_reached, i),
> >> +find_invariants_bb (loop, body[i], bitmap_bit_p (always_reached, i),
> >> bitmap_bit_p (always_executed, i));
> >>  }
> >>
> >> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> >> index 4b187c2cdaf..655fab03442 100644
> >> --- a/gcc/tree-ssa-loop-im.c
> >> +++ b/gcc/tree-ssa-loop-im.c
> >> @@ -417,6 +417,28 @@ movement_possibility 

[PATCH]AArch64 Lower intrinsics shift to GIMPLE when possible.

2021-10-15 Thread Tamar Christina via Gcc-patches
Hi All,

This lowers shifts to GIMPLE when the C interpretations of the shift operations
matches that of AArch64.

In C shifting right by BITSIZE is undefined, but the behavior is defined in
AArch64.  Additionally negative shifts lefts are undefined in C but defined
for the register variant of the instruction (SSHL, USHL) as being right shifts.

Since we have a right shift by immediate I rewrite those cases into right shifts

So:

int64x1_t foo3 (int64x1_t a)
{
  return vshl_s64 (a, vdup_n_s64(-6));
}

produces:

foo3:
sshrd0, d0, 6
ret

instead of:

foo3:
mov x0, -6
fmovd1, x0
sshld0, d0, d1
ret

This behavior isn't specifically mentioned for a left shift by immediate, but I
believe that only the case because we do have a right shift by immediate but not
a right shift by register.  As such I do the same for left shift by immediate.

The testsuite already has various testcases for shifts (vshl.c etc) so I am not
adding overlapping tests here.

Out of range shifts like

int64x1_t foo3 (int64x1_t a)
{
  return vshl_s64 (a, vdup_n_s64(80));
}

now get optimized to 0 as well along with undefined behaviors both in C and
AArch64.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c
(aarch64_general_gimple_fold_builtin): Add ashl, sshl, ushl, ashr,
ashr_simd, lshr, lshr_simd.
* config/aarch64/aarch64-simd-builtins.def (lshr): Use USHIFTIMM.
* config/aarch64/arm_neon.h (vshr_n_u8, vshr_n_u16, vshr_n_u32,
vshrq_n_u8, vshrq_n_u16, vshrq_n_u32, vshrq_n_u64): Fix type hack.


gcc/testsuite/ChangeLog:

* gcc.target/aarch64/signbit-2.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index 
f6b41d9c200d6300dee65ba60ae94488231a8a38..e47545b111762b95242d8f8de1a26f7bd11992ae
 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -2394,6 +2394,68 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, 
gcall *stmt)
   1, args[0]);
gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
break;
+  BUILTIN_VSDQ_I_DI (BINOP, ashl, 3, NONE)
+   {
+ tree cst = args[1];
+ tree ctype = TREE_TYPE (cst);
+ if (INTEGRAL_TYPE_P (ctype)
+ && TREE_CODE (cst) == INTEGER_CST)
+   {
+ wide_int wcst = wi::to_wide (cst);
+ if (wi::neg_p (wcst, TYPE_SIGN (ctype)))
+   new_stmt =
+ gimple_build_assign (gimple_call_lhs (stmt),
+  RSHIFT_EXPR, args[0],
+  wide_int_to_tree (ctype,
+wi::abs (wcst)));
+ else
+   new_stmt =
+ gimple_build_assign (gimple_call_lhs (stmt),
+  LSHIFT_EXPR, args[0], args[1]);
+   }
+   }
+   break;
+  BUILTIN_VSDQ_I_DI (BINOP, sshl, 0, NONE)
+  BUILTIN_VSDQ_I_DI (BINOP_UUS, ushl, 0, NONE)
+   {
+ tree cst = args[1];
+ tree ctype = TREE_TYPE (cst);
+ HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE (TREE_TYPE 
(args[0])));
+ if (INTEGRAL_TYPE_P (ctype)
+ && TREE_CODE (cst) == INTEGER_CST)
+   {
+ wide_int wcst = wi::to_wide (cst);
+ wide_int abs_cst = wi::abs (wcst);
+ if (wi::eq_p (abs_cst, bits))
+   break;
+
+ if (wi::neg_p (wcst, TYPE_SIGN (ctype)))
+   new_stmt =
+ gimple_build_assign (gimple_call_lhs (stmt),
+  RSHIFT_EXPR, args[0],
+  wide_int_to_tree (ctype, abs_cst));
+ else
+   new_stmt =
+ gimple_build_assign (gimple_call_lhs (stmt),
+  LSHIFT_EXPR, args[0], args[1]);
+   }
+   }
+   break;
+  BUILTIN_VDQ_I (SHIFTIMM, ashr, 3, NONE)
+  VAR1 (SHIFTIMM, ashr_simd, 0, NONE, di)
+  BUILTIN_VDQ_I (USHIFTIMM, lshr, 3, NONE)
+  VAR1 (USHIFTIMM, lshr_simd, 0, NONE, di)
+   {
+ tree cst = args[1];
+ tree ctype = TREE_TYPE (cst);
+ HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE (TREE_TYPE 
(args[0])));
+ if (INTEGRAL_TYPE_P (ctype)
+ && TREE_CODE (cst) == INTEGER_CST
+ && wi::ne_p (wi::to_wide (cst), bits))
+   new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
+   RSHIFT_EXPR, args[0], args[1]);
+   }
+   break;
   BUILTIN_GPF (BINOP, fmulx, 0, ALL)
{
  gcc_assert (nargs == 2);
diff --git 

Re: [COMMITTED] Do not call range_on_path_entry for SSAs defined within the path

2021-10-15 Thread Christophe LYON via Gcc-patches



On 14/10/2021 14:21, Aldy Hernandez via Gcc-patches wrote:

In the path solver, when requesting the range of an SSA for which we
know nothing, we ask the ranger for the range incoming to the path.
We do this by asking for all the incoming ranges to the path entry
block and unioning them.

The problem here is that we're asking for a range on path entry for an
SSA which *is* defined in the path, but for which we know nothing
about:

some_global.1_2 = some_global;
_3 = (char) some_global.1_2;

This request is causing us to ask for range_on_edge of _3 on the
incoming edges to the path.  This is a bit of nonsensical request
because _3 isn't live on entry to the path, so ranger correctly
returns UNDEFINED.  The proper thing is to avoid asking this in the
first place.

I have added a relevant assert, since it doesn't make sense to call
range_on_path_entry for SSAs defined within the path.

Tested on x86-64 Linux.

PR 102736

gcc/ChangeLog:

PR tree/optimization/102736
* gimple-range-path.cc (path_range_query::range_on_path_entry):
Assert that the requested range is defined outside the path.
(path_range_query::ssa_range_in_phi): Do not call
range_on_path_entry for SSA names that are defined within the
path.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr102736.c: New test.
---
  gcc/gimple-range-path.cc |  6 +-
  gcc/testsuite/gcc.dg/tree-ssa/pr102736.c | 21 +
  2 files changed, 26 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr102736.c

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 422abfddb8f..694271306a7 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -134,6 +134,7 @@ path_range_query::defined_outside_path (tree name)
  void
  path_range_query::range_on_path_entry (irange , tree name)
  {
+  gcc_checking_assert (defined_outside_path (name));
int_range_max tmp;
basic_block entry = entry_bb ();
bool changed = false;
@@ -258,7 +259,10 @@ path_range_query::ssa_range_in_phi (irange , gphi *phi)
// Using both the range on entry to the path, and the
// range on this edge yields significantly better
// results.
-   range_on_path_entry (r, arg);
+   if (defined_outside_path (arg))
+ range_on_path_entry (r, arg);
+   else
+ r.set_varying (TREE_TYPE (name));
m_ranger.range_on_edge (tmp, e_in, arg);
r.intersect (tmp);
return;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr102736.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr102736.c
new file mode 100644
index 000..7e556f01a86
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr102736.c
@@ -0,0 +1,21 @@
+// { dg-do run }
+// { dg-options "-O1 -ftree-vrp" }
+
+int a, b = -1, c;
+int d = 1;
+static inline char e(char f, int g) { return g ? f : 0; }
+static inline char h(char f) { return f < a ? f : f < a; }
+static inline unsigned char i(unsigned char f, int g) { return g ? f : f > g; }
+void j() {
+L:
+  c = e(1, i(h(b), d));
+  if (b)
+return;
+  goto L;
+}
+int main() {
+  j();
+  if (c != 1)
+__builtin_abort ();
+  return 0;
+}


Hi,


The new test fails at execution on arm / aarch64, not sure if you are 
aware of that already?



Thanks,

Christophe




RE: [PATCH]middle-end convert negate + right shift into compare greater.

2021-10-15 Thread Tamar Christina via Gcc-patches
> >
> > +/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */ (for
> > +cst (INTEGER_CST VECTOR_CST)  (simplify
> > +  (rshift (negate:s @0) cst@1)
> > +   (if (!flag_wrapv)
> 
> Don't test flag_wrapv directly, instead use the appropriate
> TYPE_OVERFLOW_{UNDEFINED,WRAPS} predicates.  But I'm not sure what
> we are protecting against?  Right-shift of signed integers is implementation-
> defined and GCC treats it as you'd expect, sign-extending the result.
> 

It's protecting against the overflow of the negate on INT_MIN. When wrapping
overflows are enabled the results would be wrong.

> > +(with { tree ctype = TREE_TYPE (@0);
> > +   tree stype = TREE_TYPE (@1);
> > +   tree bt = truth_type_for (ctype); }
> > + (switch
> > +  /* Handle scalar case.  */
> > +  (if (INTEGRAL_TYPE_P (ctype)
> > +  && !VECTOR_TYPE_P (ctype)
> > +  && !TYPE_UNSIGNED (ctype)
> > +  && canonicalize_math_after_vectorization_p ()
> > +  && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (stype) - 1))
> > +   (convert:bt (gt:bt @0 { build_zero_cst (stype); })))
> 
> I'm not sure why the result is of type 'bt' rather than the original type of 
> the
> expression?

That was to satisfy some RTL check that expected results of comparisons to 
always
be a Boolean, though for scalar that logically always is the case, I just added 
it
for consistency.

> 
> In that regard for non-vectors we'd have to add the sign extension from
> unsigned bool, in the vector case we'd hope the type of the comparison is
> correct.  I think in both cases it might be convenient to use
> 
>   (cond (gt:bt @0 { build_zero_cst (ctype); }) { build_all_ones_cst (ctype); }
> { build_zero_cost (ctype); })
> 
> to compute the correct result and rely on (cond ..) simplifications to 
> simplify
> that if possible.
> 
> Btw, 'stype' should be irrelevant - you need to look at the precision of 
> 'ctype',
> no?

I was working under the assumption that both input types must have the same
precision, but turns out that assumption doesn't need to hold.

New version attached.

Bootstrapped Regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu and no regressions.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* match.pd: New negate+shift pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/signbit-2.c: New test.
* gcc.dg/signbit-3.c: New test.
* gcc.target/aarch64/signbit-1.c: New test.

--- inline copy of patch ---

diff --git a/gcc/match.pd b/gcc/match.pd
index 
7d2a24dbc5e9644a09968f877e12a824d8ba1caa..9532cae582e152cae6e22fcce95a9744a844e3c2
 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -38,7 +38,8 @@ along with GCC; see the file COPYING3.  If not see
uniform_integer_cst_p
HONOR_NANS
uniform_vector_p
-   bitmask_inv_cst_vector_p)
+   bitmask_inv_cst_vector_p
+   expand_vec_cmp_expr_p)
 
 /* Operator lists.  */
 (define_operator_list tcc_comparison
@@ -826,6 +827,42 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 { tree utype = unsigned_type_for (type); }
 (convert (rshift (lshift (convert:utype @0) @2) @3))
 
+/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */
+(for cst (INTEGER_CST VECTOR_CST)
+ (simplify
+  (rshift (negate:s @0) cst@1)
+   (if (!TYPE_OVERFLOW_WRAPS (type))
+(with { tree ctype = TREE_TYPE (@0);
+   tree stype = TREE_TYPE (@1);
+   tree bt = truth_type_for (ctype);
+   tree zeros = build_zero_cst (ctype); }
+ (switch
+  /* Handle scalar case.  */
+  (if (INTEGRAL_TYPE_P (ctype)
+  && !VECTOR_TYPE_P (ctype)
+  && !TYPE_UNSIGNED (ctype)
+  && canonicalize_math_after_vectorization_p ()
+  && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (ctype) - 1))
+   (cond (gt:bt @0 { zeros; }) { build_all_ones_cst (ctype); } { zeros; }))
+  /* Handle vector case with a scalar immediate.  */
+  (if (VECTOR_INTEGER_TYPE_P (ctype)
+  && !VECTOR_TYPE_P (stype)
+  && !TYPE_UNSIGNED (ctype)
+  && expand_vec_cmp_expr_p (ctype, ctype, { GT_EXPR }))
+   (with { HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE (ctype)); 
}
+   (if (wi::eq_p (wi::to_wide (@1), bits - 1))
+(convert:bt (gt:bt @0 { zeros; })
+  /* Handle vector case with a vector immediate.   */
+  (if (VECTOR_INTEGER_TYPE_P (ctype)
+  && VECTOR_TYPE_P (stype)
+  && !TYPE_UNSIGNED (ctype)
+  && uniform_vector_p (@1)
+  && expand_vec_cmp_expr_p (ctype, ctype, { GT_EXPR }))
+   (with { tree cst = vector_cst_elt (@1, 0);
+  HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE (ctype)); }
+(if (wi::eq_p (wi::to_wide (cst), bits - 1))
+(convert:bt (gt:bt @0 { zeros; }))
+
 /* Fold (C1/X)*C2 into (C1*C2)/X.  */
 (simplify
  (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c
new file mode 100644
index 

Re: FW: [PING] Re: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) under -ffast-math on aarch64

2021-10-15 Thread Richard Biener via Gcc-patches
On Fri, Sep 24, 2021 at 2:59 PM Jirui Wu via Gcc-patches
 wrote:
>
> Hi,
>
> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577846.html
>
> The patch is attached as text for ease of use. Is there anything that needs 
> to change?
>
> Ok for master? If OK, can it be committed for me, I have no commit rights.

I'm still not sure about the correctness.  I suppose the
flag_fp_int_builtin_inexact && !flag_trapping_math is supposed to guard
against spurious inexact exceptions, shouldn't that be
!flag_fp_int_builtin_inexact || !flag_trapping_math instead?

The comment looks a bit redundant and we prefer sth like

/* (double)(int)x -> trunc (x) if the type of x matches the
expressions FP type.  */

Thanks,
Richard.

> Jirui Wu
>
> -Original Message-
> From: Jirui Wu
> Sent: Friday, September 10, 2021 10:14 AM
> To: Richard Biener 
> Cc: Richard Biener ; Andrew Pinski 
> ; Richard Sandiford ; 
> i...@airs.com; gcc-patches@gcc.gnu.org; Joseph S. Myers 
> 
> Subject: [PING] Re: [Patch][GCC][middle-end] - Generate FRINTZ for 
> (double)(int) under -ffast-math on aarch64
>
> Hi,
>
> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577846.html
>
> Ok for master? If OK, can it be committed for me, I have no commit rights.
>
> Jirui Wu
> -Original Message-
> From: Jirui Wu
> Sent: Friday, September 3, 2021 12:39 PM
> To: 'Richard Biener' 
> Cc: Richard Biener ; Andrew Pinski 
> ; Richard Sandiford ; 
> i...@airs.com; gcc-patches@gcc.gnu.org; Joseph S. Myers 
> 
> Subject: RE: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) 
> under -ffast-math on aarch64
>
> Ping
>
> -Original Message-
> From: Jirui Wu
> Sent: Friday, August 20, 2021 4:28 PM
> To: Richard Biener 
> Cc: Richard Biener ; Andrew Pinski 
> ; Richard Sandiford ; 
> i...@airs.com; gcc-patches@gcc.gnu.org; Joseph S. Myers 
> 
> Subject: RE: [Patch][GCC][middle-end] - Generate FRINTZ for (double)(int) 
> under -ffast-math on aarch64
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, August 20, 2021 8:15 AM
> > To: Jirui Wu 
> > Cc: Richard Biener ; Andrew Pinski
> > ; Richard Sandiford ;
> > i...@airs.com; gcc-patches@gcc.gnu.org; Joseph S. Myers
> > 
> > Subject: RE: [Patch][GCC][middle-end] - Generate FRINTZ for
> > (double)(int) under -ffast-math on aarch64
> >
> > On Thu, 19 Aug 2021, Jirui Wu wrote:
> >
> > > Hi all,
> > >
> > > This patch generates FRINTZ instruction to optimize type casts.
> > >
> > > The changes in this patch covers:
> > > * Generate FRINTZ for (double)(int) casts.
> > > * Add new test cases.
> > >
> > > The intermediate type is not checked according to the C99 spec.
> > > Overflow of the integral part when casting floats to integers causes
> > undefined behavior.
> > > As a result, optimization to trunc() is not invalid.
> > > I've confirmed that Boolean type does not match the matching condition.
> > >
> > > Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master? If OK can it be committed for me, I have no commit rights.
> >
> > +/* Detected a fix_trunc cast inside a float type cast,
> > +   use IFN_TRUNC to optimize.  */
> > +#if GIMPLE
> > +(simplify
> > +  (float (fix_trunc @0))
> > +  (if (direct_internal_fn_supported_p (IFN_TRUNC, type,
> > +  OPTIMIZE_FOR_BOTH)
> > +   && flag_unsafe_math_optimizations
> > +   && type == TREE_TYPE (@0))
> >
> > types_match (type, TREE_TYPE (@0))
> >
> > please.  Please perform cheap tests first (the flag test).
> >
> > + (IFN_TRUNC @0)))
> > +#endif
> >
> > why only for GIMPLE?  I'm not sure flag_unsafe_math_optimizations is a
> > good test here.  If you say we can use undefined behavior of any
> > overflow of the fix_trunc operation what do we guard here?
> > If it's Inf/NaN input then flag_finite_math_only would be more
> > appropriate, if it's behavior for -0. (I suppose trunc (-0.0) == -0.0
> > and thus "wrong") then a && !HONOR_SIGNED_ZEROS (type) is missing
> > instead.  If it's setting of FENV state and possibly trapping on
> > overflow (but it's undefined?!) then flag_trapping_math covers the
> > latter but we don't have any flag for eliding FENV state affecting
> > transforms, so there the kitchen-sink flag_unsafe_math_optimizations might 
> > apply.
> >
> > So - which is it?
> >
> This change is only for GIMPLE because we can't test for the optab support 
> without being in GIMPLE. direct_internal_fn_supported_p is defined only for 
> GIMPLE.
>
> IFN_TRUNC's documentation mentions nothing for zero, NaNs/inf inputs.
> So I think the correct guard is just flag_fp_int_builtin_inexact.
> !flag_trapping_math because the operation can only still raise inexacts.
>
> The new pattern is moved next to the place you mentioned.
>
> Ok for master? If OK can it be committed for me, I have no commit rights.
>
> Thanks,
> Jirui
> > Note there's also the pattern
> >
> > /* Handle cases of two conversions in a row.  */ (for ocvt (convert
> > float
> 

[PATCH] ipa/102762 - fix ICE with invalid __builtin_va_arg_pack () use

2021-10-15 Thread Richard Biener via Gcc-patches
We have to be careful to not break the argument space calculation.
If there's not enough arguments just do not append any.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-15  Richard Biener  

PR ipa/102762
* tree-inline.c (copy_bb): Avoid underflowing nargs.

* gcc.dg/torture/pr102762.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr102762.c | 11 +++
 gcc/tree-inline.c   |  8 +++-
 2 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr102762.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr102762.c 
b/gcc/testsuite/gcc.dg/torture/pr102762.c
new file mode 100644
index 000..67c6b00ccea
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr102762.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* We fail to diagnose the invalid __builtin_va_arg_pack use with -flto.  */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
+
+void log_bad_request();
+void foo(a, b)
+ int a, b;
+{
+  log_bad_request(0, __builtin_va_arg_pack());  /* { dg-error "invalid use" } 
*/
+  foo(0);
+}
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index e292a144967..b2c58ac4c3b 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -2117,7 +2117,13 @@ copy_bb (copy_body_data *id, basic_block bb,
  size_t nargs = nargs_caller;
 
  for (p = DECL_ARGUMENTS (id->src_fn); p; p = DECL_CHAIN (p))
-   nargs--;
+   {
+ /* Avoid crashing on invalid IL that doesn't have a
+varargs function or that passes not enough arguments.  */
+ if (nargs == 0)
+   break;
+ nargs--;
+   }
 
  /* Create the new array of arguments.  */
  size_t nargs_callee = gimple_call_num_args (call_stmt);
-- 
2.31.1


Re: [PATCH] Adjust testcase for O2 vectorization.

2021-10-15 Thread Kewen.Lin via Gcc-patches
on 2021/10/14 下午6:56, Kewen.Lin via Gcc-patches wrote:
> Hi Hongtao,
> 
> on 2021/10/14 下午3:11, liuhongt wrote:
>> Hi Kewen:
>>   Cound you help to verify if this patch fix those regressions
>> for rs6000 port.
>>
> 
> The ppc64le run just finished, there are still some regresssions:
> 
> NA->XPASS: c-c++-common/Wstringop-overflow-2.c  -Wc++-compat   (test for 
> warnings, line 194)
> NA->XPASS: c-c++-common/Wstringop-overflow-2.c  -Wc++-compat   (test for 
> warnings, line 212)
> NA->XPASS: c-c++-common/Wstringop-overflow-2.c  -Wc++-compat   (test for 
> warnings, line 296)
> NA->XPASS: c-c++-common/Wstringop-overflow-2.c  -Wc++-compat   (test for 
> warnings, line 314)
> NA->FAIL: gcc.dg/Wstringop-overflow-21-novec.c (test for excess errors)
> NA->FAIL: gcc.dg/Wstringop-overflow-21-novec.c  (test for warnings, line 18)
> NA->FAIL: gcc.dg/Wstringop-overflow-21-novec.c  (test for warnings, line 29)
> NA->FAIL: gcc.dg/Wstringop-overflow-21-novec.c  (test for warnings, line 45)
> NA->FAIL: gcc.dg/Wstringop-overflow-21-novec.c  (test for warnings, line 55)
> NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c note (test for warnings, line 
> 104)
> NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c note (test for warnings, line 
> 137)
> NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c note (test for warnings, line 
> 19)
> NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c note (test for warnings, line 
> 39)
> NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c note (test for warnings, line 
> 56)
> NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c note (test for warnings, line 
> 70)
> NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c (test for excess errors)
> NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c  (test for warnings, line 116)
> NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c  (test for warnings, line 131)
> NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c  (test for warnings, line 146)
> NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c  (test for warnings, line 33)
> NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c  (test for warnings, line 50)
> NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c  (test for warnings, line 64)
> NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c  (test for warnings, line 78)
> NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c  (test for warnings, line 97)
> PASS->FAIL: c-c++-common/Wstringop-overflow-2.c  -std=gnu++14 (test for 
> excess errors)
> NA->FAIL: c-c++-common/Wstringop-overflow-2.c  -std=gnu++14  (test for 
> warnings, line 229)
> NA->FAIL: c-c++-common/Wstringop-overflow-2.c  -std=gnu++14  (test for 
> warnings, line 230)
> NA->FAIL: c-c++-common/Wstringop-overflow-2.c  -std=gnu++14  (test for 
> warnings, line 331)
> NA->FAIL: c-c++-common/Wstringop-overflow-2.c  -std=gnu++14  (test for 
> warnings, line 332)
> // omitting -std=gnu++17, -std=gnu++2a, -std=gnu++98
> 
> I'll have a look and get back to you tomorrow.
> 

The failure c-c++-common/Wstringop-overflow-2.c is due to that the
current proc check_vect_slp_vnqihi_store_usage is made as "cache"
but it can vary for different input patterns.  For rs6000 the test
for v2qi fails, the cached test result makes v4qi check fail
unexpectedly (should pass).  I adjusted caching for the following users
check_effective_target_vect_slp_v*_store, also refactored a bit.
One trivial change is to add one new argument macro then we can just
compile the corresponding foo* function instead of all, hope it helps
to make the debugging outputs compact.

For the failure Wstringop-overflow-76-novec.c, there is one typo
comparing to the original Wstringop-overflow-76.c.  Guess it failed
on x86 too?  It would be surprising if it passes on x86.
As to the failure Wstringop-overflow-21-novec.c, I confirmed it's
just noise, patching typos caused this failure.

One new round ppc64le testing just finished with below diff and all
previous regressions are fixed without any new regressions.


diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-76-novec.c 
b/gcc/testsuite/gcc.dg/Wstringop-overflow-76-novec.c
index d000b587a65..1132348c5f4 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overflow-76-novec.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-76-novec.c
@@ -82,7 +82,7 @@ void max_d8_p (char *q, int i)
 struct A3_5
 {
   char a3[3];  // { dg-message "at offset 3 into destination object 'a3' of 
size 3" "pr??" { xfail *-*-* } }
-  char a5[5];
+  char a5[5];  // { dg-message "at offset 5 into destination object 'a5' of 
size 5" "note" }
 };

 void max_A3_A5 (int i, struct A3_5 *pa3_5)
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 530c5769614..8736b908ec7 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7584,12 +7584,13 @@ proc 
check_effective_target_vect_element_align_preferred { } {
 # Return zero if the desirable pattern isn't found.
 # It's used by Warray-bounds/Wstringop-overflow testcases which are
 # regressed by O2 vectorization, refer to 

Re: [PATCH] AVX512FP16: Support vector shuffle builtins

2021-10-15 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 15, 2021 at 2:15 PM Hongyu Wang  wrote:
>
> > ix86_expand_vec_perm is only called by (define_expand "vec_perm"
> > which means target, op0 and op1 must existed, and you can drop
> > if(target/op0/op1) stuff.
>
> Yes, dropped.
>
> > Those checks for NULL seems reasonable according to documents,
> > op0,op1,target maybe NULL.
> Thanks for pointing it out, didn't realize the difference between
> these 2 functions.
LGTM.
>
> Updated patch.
>
> Hongtao Liu  于2021年10月15日周五 下午1:54写道:
> >
> > On Fri, Oct 15, 2021 at 1:37 PM Hongyu Wang  wrote:
> > >
> > > > This part seems not related to vector shuffle.
> > > Yes, have separated this part to another patch and checked-in.
> > >
> > > Updated patch. Ok for this one?
> > >
> > > Hongtao Liu via Gcc-patches  于2021年10月14日周四 
> > > 下午2:33写道:
> > > >
> > > > On Thu, Oct 14, 2021 at 10:39 AM Hongyu Wang via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > This patch supports HFmode vector shuffle by creating HImode subreg 
> > > > > when
> > > > > expanding permutation expr.
> > > > >
> > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde{-m32,}
> > > > > OK for master?
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > * config/i386/i386-expand.c (ix86_expand_vec_perm): Convert
> > > > > HFmode input operand to HImode.
> > > > > (ix86_vectorize_vec_perm_const): Likewise.
> > > > > (ix86_expand_vector_init): Allow HFmode for 
> > > > > one_operand_shuffle.
> > > > > * config/i386/sse.md (*avx512bw_permvar_truncv16siv16hi_1_hf):
> > > > > New define_insn.
> > > > > (*avx512f_permvar_truncv8siv8hi_1_hf):
> > > > > Likewise.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > * gcc.target/i386/avx512fp16-builtin_shuffle-1.c: New test.
> > > > > * gcc.target/i386/avx512fp16-pr101846.c: Ditto.
> > > > > * gcc.target/i386/avx512fp16-pr94680.c: Ditto.
> > > > > ---
> > > > >  gcc/config/i386/i386-expand.c | 29 ++-
> > > > >  gcc/config/i386/sse.md| 54 +++-
> > > > >  .../i386/avx512fp16-builtin_shuffle-1.c   | 86 
> > > > > +++
> > > > >  .../gcc.target/i386/avx512fp16-pr101846.c | 56 
> > > > >  .../gcc.target/i386/avx512fp16-pr94680.c  | 61 +
> > > > >  5 files changed, 284 insertions(+), 2 deletions(-)
> > > > >  create mode 100644 
> > > > > gcc/testsuite/gcc.target/i386/avx512fp16-builtin_shuffle-1.c
> > > > >  create mode 100644 
> > > > > gcc/testsuite/gcc.target/i386/avx512fp16-pr101846.c
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-pr94680.c
> > > > >
> > > > > diff --git a/gcc/config/i386/i386-expand.c 
> > > > > b/gcc/config/i386/i386-expand.c
> > > > > index c0924a59efb..0f50ed3b9f8 100644
> > > > > --- a/gcc/config/i386/i386-expand.c
> > > > > +++ b/gcc/config/i386/i386-expand.c
> > > > > @@ -4836,6 +4836,18 @@ ix86_expand_vec_perm (rtx operands[])
> > > > >e = GET_MODE_UNIT_SIZE (mode);
> > > > >gcc_assert (w <= 64);
> > > > >
> > > > > +  if (GET_MODE_INNER (mode) == HFmode)
> > > > > +{
> > > > > +  machine_mode orig_mode = mode;
> > > > > +  mode = mode_for_vector (HImode, w).require ();
> > > > > +  if (target)
> > > > > +   target = lowpart_subreg (mode, target, orig_mode);
> > > > > +  if (op0)
> > > > > +   op0 = lowpart_subreg (mode, op0, orig_mode);
> > > > > +  if (op1)
> > > > > +   op1 = lowpart_subreg (mode, op1, orig_mode);
> > > > > +}
> > > > > +
> > ix86_expand_vec_perm is only called by (define_expand "vec_perm"
> > which means target, op0 and op1 must existed, and you can drop
> > if(target/op0/op1) stuff.
> > > > >if (TARGET_AVX512F && one_operand_shuffle)
> > > > >  {
> > > > >rtx (*gen) (rtx, rtx, rtx) = NULL;
> > > > > @@ -15092,7 +15104,8 @@ ix86_expand_vector_init (bool mmx_ok, rtx 
> > > > > target, rtx vals)
> > > > >   rtx ops[2] = { XVECEXP (vals, 0, 0), XVECEXP (vals, 0, 1) };
> > > > >   if (inner_mode == QImode
> > > > >   || inner_mode == HImode
> > > > > - || inner_mode == TImode)
> > > > > + || inner_mode == TImode
> > > > > + || inner_mode == HFmode)
> > > > This part seems not related to vector shuffle.
> > > > > {
> > > > >   unsigned int n_bits = n_elts * GET_MODE_SIZE 
> > > > > (inner_mode);
> > > > >   scalar_mode elt_mode = inner_mode == TImode ? DImode : 
> > > > > SImode;
> > > > > @@ -21099,6 +21112,20 @@ ix86_vectorize_vec_perm_const (machine_mode 
> > > > > vmode, rtx target, rtx op0,
> > > > >unsigned int i, nelt, which;
> > > > >bool two_args;
> > > > >
> > > > > +  /* For HF mode vector, convert it to HI using subreg.  */
> > > > > +  if (GET_MODE_INNER (vmode) == HFmode)
> > > > > +{
> > > > > +  machine_mode orig_mode = vmode;
> > > > > +  vmode = 

Ping^2: [PATCH v2 0/2] Fix vec_sel code generation and merge xxsel to vsel

2021-10-15 Thread Xionghu Luo via Gcc-patches
Ping^2, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579637.html


On 2021/10/8 09:17, Xionghu Luo via Gcc-patches wrote:
> Ping, thanks.
> 
> 
> On 2021/9/17 13:25, Xionghu Luo wrote:
>> These two patches are updated version from:
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579490.html
>>
>> Changes:
>> 1. Fix alignment error in md files.
>> 2. Replace rtx_equal_p with match_dup.
>> 3. Use register_operand instead of gpc_reg_operand to align with
>>vperm/xxperm.
>> 4. Regression tested pass on P8LE.
>>
>> Xionghu Luo (2):
>>   rs6000: Fix wrong code generation for vec_sel [PR94613]
>>   rs6000: Fold xxsel to vsel since they have same semantics
>>
>>  gcc/config/rs6000/altivec.md  | 84 ++-
>>  gcc/config/rs6000/rs6000-call.c   | 62 ++
>>  gcc/config/rs6000/rs6000.c| 19 ++---
>>  gcc/config/rs6000/vector.md   | 26 +++---
>>  gcc/config/rs6000/vsx.md  | 25 --
>>  gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
>>  gcc/testsuite/gcc.target/powerpc/pr94613.c| 47 +++
>>  7 files changed, 193 insertions(+), 72 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr94613.c
>>
> 

-- 
Thanks,
Xionghu


Re: [PATCH] AVX512FP16: Support vector shuffle builtins

2021-10-15 Thread Hongyu Wang via Gcc-patches
> ix86_expand_vec_perm is only called by (define_expand "vec_perm"
> which means target, op0 and op1 must existed, and you can drop
> if(target/op0/op1) stuff.

Yes, dropped.

> Those checks for NULL seems reasonable according to documents,
> op0,op1,target maybe NULL.
Thanks for pointing it out, didn't realize the difference between
these 2 functions.

Updated patch.

Hongtao Liu  于2021年10月15日周五 下午1:54写道:
>
> On Fri, Oct 15, 2021 at 1:37 PM Hongyu Wang  wrote:
> >
> > > This part seems not related to vector shuffle.
> > Yes, have separated this part to another patch and checked-in.
> >
> > Updated patch. Ok for this one?
> >
> > Hongtao Liu via Gcc-patches  于2021年10月14日周四 
> > 下午2:33写道:
> > >
> > > On Thu, Oct 14, 2021 at 10:39 AM Hongyu Wang via Gcc-patches
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > This patch supports HFmode vector shuffle by creating HImode subreg when
> > > > expanding permutation expr.
> > > >
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde{-m32,}
> > > > OK for master?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * config/i386/i386-expand.c (ix86_expand_vec_perm): Convert
> > > > HFmode input operand to HImode.
> > > > (ix86_vectorize_vec_perm_const): Likewise.
> > > > (ix86_expand_vector_init): Allow HFmode for one_operand_shuffle.
> > > > * config/i386/sse.md (*avx512bw_permvar_truncv16siv16hi_1_hf):
> > > > New define_insn.
> > > > (*avx512f_permvar_truncv8siv8hi_1_hf):
> > > > Likewise.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/i386/avx512fp16-builtin_shuffle-1.c: New test.
> > > > * gcc.target/i386/avx512fp16-pr101846.c: Ditto.
> > > > * gcc.target/i386/avx512fp16-pr94680.c: Ditto.
> > > > ---
> > > >  gcc/config/i386/i386-expand.c | 29 ++-
> > > >  gcc/config/i386/sse.md| 54 +++-
> > > >  .../i386/avx512fp16-builtin_shuffle-1.c   | 86 +++
> > > >  .../gcc.target/i386/avx512fp16-pr101846.c | 56 
> > > >  .../gcc.target/i386/avx512fp16-pr94680.c  | 61 +
> > > >  5 files changed, 284 insertions(+), 2 deletions(-)
> > > >  create mode 100644 
> > > > gcc/testsuite/gcc.target/i386/avx512fp16-builtin_shuffle-1.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-pr101846.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-pr94680.c
> > > >
> > > > diff --git a/gcc/config/i386/i386-expand.c 
> > > > b/gcc/config/i386/i386-expand.c
> > > > index c0924a59efb..0f50ed3b9f8 100644
> > > > --- a/gcc/config/i386/i386-expand.c
> > > > +++ b/gcc/config/i386/i386-expand.c
> > > > @@ -4836,6 +4836,18 @@ ix86_expand_vec_perm (rtx operands[])
> > > >e = GET_MODE_UNIT_SIZE (mode);
> > > >gcc_assert (w <= 64);
> > > >
> > > > +  if (GET_MODE_INNER (mode) == HFmode)
> > > > +{
> > > > +  machine_mode orig_mode = mode;
> > > > +  mode = mode_for_vector (HImode, w).require ();
> > > > +  if (target)
> > > > +   target = lowpart_subreg (mode, target, orig_mode);
> > > > +  if (op0)
> > > > +   op0 = lowpart_subreg (mode, op0, orig_mode);
> > > > +  if (op1)
> > > > +   op1 = lowpart_subreg (mode, op1, orig_mode);
> > > > +}
> > > > +
> ix86_expand_vec_perm is only called by (define_expand "vec_perm"
> which means target, op0 and op1 must existed, and you can drop
> if(target/op0/op1) stuff.
> > > >if (TARGET_AVX512F && one_operand_shuffle)
> > > >  {
> > > >rtx (*gen) (rtx, rtx, rtx) = NULL;
> > > > @@ -15092,7 +15104,8 @@ ix86_expand_vector_init (bool mmx_ok, rtx 
> > > > target, rtx vals)
> > > >   rtx ops[2] = { XVECEXP (vals, 0, 0), XVECEXP (vals, 0, 1) };
> > > >   if (inner_mode == QImode
> > > >   || inner_mode == HImode
> > > > - || inner_mode == TImode)
> > > > + || inner_mode == TImode
> > > > + || inner_mode == HFmode)
> > > This part seems not related to vector shuffle.
> > > > {
> > > >   unsigned int n_bits = n_elts * GET_MODE_SIZE (inner_mode);
> > > >   scalar_mode elt_mode = inner_mode == TImode ? DImode : 
> > > > SImode;
> > > > @@ -21099,6 +21112,20 @@ ix86_vectorize_vec_perm_const (machine_mode 
> > > > vmode, rtx target, rtx op0,
> > > >unsigned int i, nelt, which;
> > > >bool two_args;
> > > >
> > > > +  /* For HF mode vector, convert it to HI using subreg.  */
> > > > +  if (GET_MODE_INNER (vmode) == HFmode)
> > > > +{
> > > > +  machine_mode orig_mode = vmode;
> > > > +  vmode = mode_for_vector (HImode,
> > > > +  GET_MODE_NUNITS (vmode)).require ();
> > > > +  if (target)
> > > > +   target = lowpart_subreg (vmode, target, orig_mode);
> > > > +  if (op0)
> > > > +   op0 = lowpart_subreg (vmode, op0, orig_mode);
> > > > +  if (op1)
> > > > +   op1 = lowpart_subreg