Re: [PATCH v4, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-05-25 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2022/5/26 13:30, HAO CHEN GUI wrote:
> Kewen,
>   Thanks so much for your advice. Just one question about effective-target.
> 
>   For the test cases, it needs both power10_ok and int128 support. I saw some
> existing test cases have these two checks as well. But I wonder if power10_ok
> already covers int128 on powerpc targets? Can we save one check then?
> 

Good question, IMHO the checks are orthogonal, it's doable to disable int128
support by hacking the compiler, the int128 effective-target check then fails
due to missing defined __SIZEOF_INT128__, but power10_ok check isn't able to
catch that, the test case could end up with possible unexpected result without
the explicit int128 check.

To me, the int128 check is to ensure int128 type is available and the
power10_ok check is to ensure the power10 specific instructions are supported. 

BR,
Kewen


Re: [PATCH v4, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-05-25 Thread HAO CHEN GUI via Gcc-patches
Kewen,
  Thanks so much for your advice. Just one question about effective-target.

  For the test cases, it needs both power10_ok and int128 support. I saw some
existing test cases have these two checks as well. But I wonder if power10_ok
already covers int128 on powerpc targets? Can we save one check then?

On 26/5/2022 上午 11:22, Kewen.Lin wrote:
> Hi Haochen,
> 
> on 2022/5/24 16:45, HAO CHEN GUI wrote:
>> Hi,
>>This patch adds V1TI mode into a new mode iterator used in vector
>> comparison and rotation expands. Without the patch, the comparisons
>> between two vector __int128 are converted to scalar comparisons. The
>> code is suboptimal. The patch fixes the issue. Now all comparisons
>> between two vector __int128 generates P10 new comparison instructions.
>> Also the relative built-ins generate the same instructions after gimple
>> folding. So they're added back to the list.
>>
>>   This patch also merges some vector comparison and rotation expands
>> for V1T1 and other vector integer modes as they have the same patterns.
>> The expands for V1TI only are removed.
>>
>>Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
>> Is this okay for trunk? Any recommendations? Thanks a lot.
>>
>> ChangeLog
>> 2022-05-24 Haochen Gui 
>>
>> gcc/
>>  PR target/103316
>>  * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable
>>  gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET,
>>  RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT,
>>  RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI.
>>  * config/rs6000/vector.md (VEC_IC): Define.  Add support for new Power10
>>  V1TI instructions.
> 
> Nit: Maybe "New mode iterator" is better than "Define".
> 
>>  (vec_cmp): Set mode iterator to VEC_IC.
>>  (vec_cmpu): Likewise.
>>  (vector_nlt): Set mode iterator to VEC_IC.
>>  (vector_nltv1ti): Remove.
>>  (vector_gtu): Set mode iterator to VEC_IC.
>>  (vector_gtuv1ti): Remove.
>>  (vector_nltu): Set mode iterator to VEC_IC.
>>  (vector_nltuv1ti): Remove.
>>  (vector_geu): Set mode iterator to VEC_IC.
>>  (vector_ngt): Likewise.
>>  (vector_ngtv1ti): Remove.
>>  (vector_ngtu): Set mode iterator to VEC_IC.
>>  (vector_ngtuv1ti): Remove.
>>  (vector_gtu__p): Set mode iterator to VEC_IC.
>>  (vector_gtu_v1ti_p): Remove.
>>  (vrotl3): Set mode iterator to VEC_IC.  Emit insns for V1TI.
>>  (vrotlv1ti3): Remove.
>>  (vashr3): Set mode iterator to VEC_IC.  Emit insns for V1TI.
>>  (vashrv1ti3): Remove.
>>
>> gcc/testsuite/
>>  PR target/103316
>>  * gcc.target/powerpc/pr103316.c: New.
>>  * gcc.target/powerpc/fold-vec-cmp-int128.c: New.
>>
>> patch.diff
>> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
>> b/gcc/config/rs6000/rs6000-builtin.cc
>> index e925ba9fad9..b67f4e066a8 100644
>> --- a/gcc/config/rs6000/rs6000-builtin.cc
>> +++ b/gcc/config/rs6000/rs6000-builtin.cc
>> @@ -2000,16 +2000,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
>> *gsi)
>>  case RS6000_BIF_VCMPEQUH:
>>  case RS6000_BIF_VCMPEQUW:
>>  case RS6000_BIF_VCMPEQUD:
>> -/* We deliberately omit RS6000_BIF_VCMPEQUT for now, because gimple
>> -   folding produces worse code for 128-bit compares.  */
>> +case RS6000_BIF_VCMPEQUT:
>>fold_compare_helper (gsi, EQ_EXPR, stmt);
>>return true;
>>
>>  case RS6000_BIF_VCMPNEB:
>>  case RS6000_BIF_VCMPNEH:
>>  case RS6000_BIF_VCMPNEW:
>> -/* We deliberately omit RS6000_BIF_VCMPNET for now, because gimple
>> -   folding produces worse code for 128-bit compares.  */
>> +case RS6000_BIF_VCMPNET:
>>fold_compare_helper (gsi, NE_EXPR, stmt);
>>return true;
>>
>> @@ -2021,9 +2019,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>>  case RS6000_BIF_CMPGE_U4SI:
>>  case RS6000_BIF_CMPGE_2DI:
>>  case RS6000_BIF_CMPGE_U2DI:
>> -/* We deliberately omit RS6000_BIF_CMPGE_1TI and RS6000_BIF_CMPGE_U1TI
>> -   for now, because gimple folding produces worse code for 128-bit
>> -   compares.  */
>> +case RS6000_BIF_CMPGE_1TI:
>> +case RS6000_BIF_CMPGE_U1TI:
>>fold_compare_helper (gsi, GE_EXPR, stmt);
>>return true;
>>
>> @@ -2035,9 +2032,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>>  case RS6000_BIF_VCMPGTUW:
>>  case RS6000_BIF_VCMPGTUD:
>>  case RS6000_BIF_VCMPGTSD:
>> -/* We deliberately omit RS6000_BIF_VCMPGTUT and RS6000_BIF_VCMPGTST
>> -   for now, because gimple folding produces worse code for 128-bit
>> -   compares.  */
>> +case RS6000_BIF_VCMPGTUT:
>> +case RS6000_BIF_VCMPGTST:
>>fold_compare_helper (gsi, GT_EXPR, stmt);
>>return true;
>>
>> @@ -2049,9 +2045,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>>  case RS6000_BIF_CMPLE_U4SI:
>>  case RS6000_BIF_CMPLE_2DI:
>>  case RS6000_BIF_CMPLE_U2DI:
>> -  

Re: [PATCH v4, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-05-25 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2022/5/24 16:45, HAO CHEN GUI wrote:
> Hi,
>This patch adds V1TI mode into a new mode iterator used in vector
> comparison and rotation expands. Without the patch, the comparisons
> between two vector __int128 are converted to scalar comparisons. The
> code is suboptimal. The patch fixes the issue. Now all comparisons
> between two vector __int128 generates P10 new comparison instructions.
> Also the relative built-ins generate the same instructions after gimple
> folding. So they're added back to the list.
> 
>   This patch also merges some vector comparison and rotation expands
> for V1T1 and other vector integer modes as they have the same patterns.
> The expands for V1TI only are removed.
> 
>Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-05-24 Haochen Gui 
> 
> gcc/
>   PR target/103316
>   * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable
>   gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET,
>   RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT,
>   RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI.
>   * config/rs6000/vector.md (VEC_IC): Define.  Add support for new Power10
>   V1TI instructions.

Nit: Maybe "New mode iterator" is better than "Define".

>   (vec_cmp): Set mode iterator to VEC_IC.
>   (vec_cmpu): Likewise.
>   (vector_nlt): Set mode iterator to VEC_IC.
>   (vector_nltv1ti): Remove.
>   (vector_gtu): Set mode iterator to VEC_IC.
>   (vector_gtuv1ti): Remove.
>   (vector_nltu): Set mode iterator to VEC_IC.
>   (vector_nltuv1ti): Remove.
>   (vector_geu): Set mode iterator to VEC_IC.
>   (vector_ngt): Likewise.
>   (vector_ngtv1ti): Remove.
>   (vector_ngtu): Set mode iterator to VEC_IC.
>   (vector_ngtuv1ti): Remove.
>   (vector_gtu__p): Set mode iterator to VEC_IC.
>   (vector_gtu_v1ti_p): Remove.
>   (vrotl3): Set mode iterator to VEC_IC.  Emit insns for V1TI.
>   (vrotlv1ti3): Remove.
>   (vashr3): Set mode iterator to VEC_IC.  Emit insns for V1TI.
>   (vashrv1ti3): Remove.
> 
> gcc/testsuite/
>   PR target/103316
>   * gcc.target/powerpc/pr103316.c: New.
>   * gcc.target/powerpc/fold-vec-cmp-int128.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index e925ba9fad9..b67f4e066a8 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -2000,16 +2000,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPEQUH:
>  case RS6000_BIF_VCMPEQUW:
>  case RS6000_BIF_VCMPEQUD:
> -/* We deliberately omit RS6000_BIF_VCMPEQUT for now, because gimple
> -   folding produces worse code for 128-bit compares.  */
> +case RS6000_BIF_VCMPEQUT:
>fold_compare_helper (gsi, EQ_EXPR, stmt);
>return true;
> 
>  case RS6000_BIF_VCMPNEB:
>  case RS6000_BIF_VCMPNEH:
>  case RS6000_BIF_VCMPNEW:
> -/* We deliberately omit RS6000_BIF_VCMPNET for now, because gimple
> -   folding produces worse code for 128-bit compares.  */
> +case RS6000_BIF_VCMPNET:
>fold_compare_helper (gsi, NE_EXPR, stmt);
>return true;
> 
> @@ -2021,9 +2019,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPGE_U4SI:
>  case RS6000_BIF_CMPGE_2DI:
>  case RS6000_BIF_CMPGE_U2DI:
> -/* We deliberately omit RS6000_BIF_CMPGE_1TI and RS6000_BIF_CMPGE_U1TI
> -   for now, because gimple folding produces worse code for 128-bit
> -   compares.  */
> +case RS6000_BIF_CMPGE_1TI:
> +case RS6000_BIF_CMPGE_U1TI:
>fold_compare_helper (gsi, GE_EXPR, stmt);
>return true;
> 
> @@ -2035,9 +2032,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPGTUW:
>  case RS6000_BIF_VCMPGTUD:
>  case RS6000_BIF_VCMPGTSD:
> -/* We deliberately omit RS6000_BIF_VCMPGTUT and RS6000_BIF_VCMPGTST
> -   for now, because gimple folding produces worse code for 128-bit
> -   compares.  */
> +case RS6000_BIF_VCMPGTUT:
> +case RS6000_BIF_VCMPGTST:
>fold_compare_helper (gsi, GT_EXPR, stmt);
>return true;
> 
> @@ -2049,9 +2045,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPLE_U4SI:
>  case RS6000_BIF_CMPLE_2DI:
>  case RS6000_BIF_CMPLE_U2DI:
> -/* We deliberately omit RS6000_BIF_CMPLE_1TI and RS6000_BIF_CMPLE_U1TI
> -   for now, because gimple folding produces worse code for 128-bit
> -   compares.  */
> +case RS6000_BIF_CMPLE_1TI:
> +case RS6000_BIF_CMPLE_U1TI:
>fold_compare_helper (gsi, LE_EXPR, stmt);
>return true;
> 
> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index 4d0797c48f8..3b7a272994f 100644
> 

[PATCH] RISC-V: Add -mtune=thead-c906 to the invoke docs

2022-05-25 Thread Palmer Dabbelt
gcc/ChangeLog

* doc/invoke.texi (RISC-V): Document -mtune=thead-c906.

Signed-off-by: Palmer Dabbelt 
---
 gcc/doc/invoke.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 71098d86313..a584dc6a7f9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -28088,7 +28088,7 @@ Permissible values for this option are: 
@samp{sifive-e20}, @samp{sifive-e21},
 Optimize the output for the given processor, specified by microarchitecture or
 particular CPU name.  Permissible values for this option are: @samp{rocket},
 @samp{sifive-3-series}, @samp{sifive-5-series}, @samp{sifive-7-series},
-@samp{size}, and all valid options for @option{-mcpu=}.
+@samp{thead-c906}, @samp{size}, and all valid options for @option{-mcpu=}.
 
 When @option{-mtune=} is not specified, use the setting from @option{-mcpu},
 the default is @samp{rocket} if both are not specified.
-- 
2.34.1



Re: [PATCH][_GLIBCXX_INLINE_VERSION] Fix std::span pretty printer

2022-05-25 Thread Jonathan Wakely via Gcc-patches
On Thu, 26 May 2022 at 00:34, Jonathan Wakely  wrote:
>
> On Wed, 25 May 2022 at 21:29, François Dumont via Libstdc++
>  wrote:
> >
> > Hi
> >
> >  Here is a patch to fix std::span pretty printer in versioned
> > namespace mode.
> >
> >  Note that there is still a problem with std::atomic after this patch.
> >
> > got: $13 = std::atomic> (empty) = {get() = 0x0}
> > FAIL: libstdc++-prettyprinters/cxx20.cc print spe

Does this fix it?

--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -1734,6 +1734,7 @@ class StdAtomicPrinter:
impl = val['_M_impl']
self.shptr_printer = SharedPointerPrinter(typename, impl)
self.children = self._shptr_children
+self.typename = self.typename.replace(self.value_type.tag, typ)

def _shptr_children(self):
return SmartPtrIterator(self.shptr_printer.pointer)


I'll test it with a versioned-namespace build tomorrow.





> >
> >  libstdc++: [_GLIBCXX_INLINE_VERSION] Fix std::span pretty printer
> >
> >  libstdc++-v3/ChangeLog:
> >
> >  * python/libstdcxx/v6/printers.py (StdSpanPrinter.__init__):
> >  Strip typename from version namespace.
> >
> > Tested under Linux x86_64 _GLIBCXX_INLINE_VERSION mode.
> >
> > Ok to commit ?
>
> OK, thanks.



Re: [PATCH][_GLIBCXX_INLINE_VERSION] Fix std::span pretty printer

2022-05-25 Thread Jonathan Wakely via Gcc-patches
On Wed, 25 May 2022 at 21:29, François Dumont via Libstdc++
 wrote:
>
> Hi
>
>  Here is a patch to fix std::span pretty printer in versioned
> namespace mode.
>
>  Note that there is still a problem with std::atomic after this patch.
>
> got: $13 = std::atomic> (empty) = {get() = 0x0}
> FAIL: libstdc++-prettyprinters/cxx20.cc print spe
>
>  libstdc++: [_GLIBCXX_INLINE_VERSION] Fix std::span pretty printer
>
>  libstdc++-v3/ChangeLog:
>
>  * python/libstdcxx/v6/printers.py (StdSpanPrinter.__init__):
>  Strip typename from version namespace.
>
> Tested under Linux x86_64 _GLIBCXX_INLINE_VERSION mode.
>
> Ok to commit ?

OK, thanks.



Re: [COMMITTED] Tweak some comments.

2022-05-25 Thread Bernhard Reutner-Fischer via Gcc-patches
On 25 May 2022 16:42:05 CEST, Andrew MacLeod via Gcc-patches 
 wrote:
>Just fixed some misspelling and such.

Thanks!


Re: [PATCH] RISC-V: Don't unconditionally add m, a, f, d in arch-canonicalize

2022-05-25 Thread Palmer Dabbelt

On Wed, 25 May 2022 07:00:11 PDT (-0700), gcc-patches@gcc.gnu.org wrote:

Committed, Thanks for fixing my stupid bug :P


IMO this is a good candidate for a backport.


On Wed, May 25, 2022 at 9:26 PM Simon Cook  wrote:


This solves an issue where rv32i, etc. are canonicalized to rv32imafd
since the g->i addition of 'm', 'a', 'f', 'd' is not actually gated by
whether the input was rv32g/rv64g.

gcc/ChangeLog:

* config/riscv/arch-canonicalize: Only add mafd extension if
base was rv32/rv64g.
---
 gcc/config/riscv/arch-canonicalize | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/arch-canonicalize
b/gcc/config/riscv/arch-canonicalize
index 71b2232b29e..fd7651ac491 100755
--- a/gcc/config/riscv/arch-canonicalize
+++ b/gcc/config/riscv/arch-canonicalize
@@ -73,8 +73,8 @@ def arch_canonicalize(arch, isa_spec):
   std_exts = []
   if arch[:5] in ['rv32e', 'rv32i', 'rv32g', 'rv64i', 'rv64g']:
 new_arch = arch[:5].replace("g", "i")
-std_exts = ['m', 'a', 'f', 'd']
 if arch[:5] in ['rv32g', 'rv64g']:
+  std_exts = ['m', 'a', 'f', 'd']
   if not is_isa_spec_2p2:
 extra_long_ext = ['zicsr', 'zifencei']
   else:
--
2.32.1


[PATCH, rs6000] Clean up the option_mask defines (part 3)

2022-05-25 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Clean up the option_mask defines (part 3)

Hi,

Per code review, the MASK_REGNAMES, OPTION_MASK_REGNAMES,
MASK_PROTOTYPE, OPTION_MASK_PROTOTYPE options are not used
elsewhere in the codebase.  Thus it should be safe to remove them.

This includes an update to a nearby comment to hint that most
of the MASK_ options have now been replaced with their
OPTION_MASK_ equivalents.

Regtested OK on power10.  OK for trunk?

diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index dcf632c1f1ad..fe77a343d2e1 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -503,12 +503,13 @@ extern int rs6000_vector_align[];
answers if the arguments are not in the normal range.  */
 #define TARGET_MINMAX  (TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT \
 && (TARGET_P9_MINMAX || !flag_trapping_math))
 
 /* In switching from using target_flags to using rs6000_isa_flags, the options
-   machinery creates OPTION_MASK_ instead of MASK_.  For now map
-   OPTION_MASK_ back into MASK_.  */
+   machinery creates OPTION_MASK_ instead of MASK_.  The MASK_
+   options that have not yet been replaced by their OPTION_MASK_
+   equivalents are defined here.  */
 #define MASK_STRICT_ALIGN  OPTION_MASK_STRICT_ALIGN
 
 #ifndef IN_LIBGCC2
 #define MASK_POWERPC64 OPTION_MASK_POWERPC64
 #endif
@@ -519,18 +520,10 @@ extern int rs6000_vector_align[];
 
 #ifdef TARGET_LITTLE_ENDIAN
 #define MASK_LITTLE_ENDIAN OPTION_MASK_LITTLE_ENDIAN
 #endif
 
-#ifdef TARGET_REGNAMES
-#define MASK_REGNAMES  OPTION_MASK_REGNAMES
-#endif
-
-#ifdef TARGET_PROTOTYPE
-#define MASK_PROTOTYPE OPTION_MASK_PROTOTYPE
-#endif
-
 #ifdef TARGET_MODULO
 #define RS6000_BTM_MODULO  OPTION_MASK_MODULO
 #endif
 
 



Re: [PATCH, rs6000] Clean up the option_mask defines (part 2)

2022-05-25 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Clean up the option_mask defines (part 2)

Hi,
This patch reworks most of the lingering MASK_*
values to OPTION_MASK_* and removes the now redundant defines.

Regtested OK on power10.  OK for trunk?

gcc/
* rs6000.h (RS6000_BTM_VSX, RS6000_BTM_P8_VECTOR, RS6000_BTM_P9_VECTOR,
RS6000_BTM_P9_MISC, RS6000_BTM_HTM, RS6000_BTM_POPCNTD,
RS6000_BTM_DFP, RS6000_BTM_HARD_FLOAT, RS6000_BTM_LDBL128,
RS6000_BTM_FLOAT128, RS6000_BTM_FLOAT128_HW, RS6000_BTM_MMA,
RS6000_BTM_P10): Rework defines to use OPTION_MASK_.
(MASK_DFP, MASK_DIRECT_MOVE, MASK_FLOAT128_KEYWORD,
MASK_FLOAT128_HW, MASK_P8_FUSION, MASK_HARD_FLOAT, MASK_HTM,
MASK_MMA, MASK_MULTIPLE, MASK_NO_UPDATE, MASK_P8_VECTOR,
MASK_P9_VECTOR, MASK_P9_MISC, MASK_POPCNTD, MASK_RECIP_PRECISION,
MASK_SOFT_FLOAT, MASK_UPDATE, MASK_VSX, MASK_POWER10,
MASK_P10_FUSION): Remove unused defines.
* config/rs6000/rs6000-cpus.def (RS6000_CPU): Rework macro calls to
use OPTION_MASK_ defines.
* config/rs6000/darwin.h (TARGET_DEFAULT) Update define to use
OPTION_MASK_MULTIPLE.
* config/rs6000/darwin64-biarch.h (TARGET_DEFAULT): Same.

diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
index 86556ccbbf58..6a8845eb3bb7 100644
--- a/gcc/config/rs6000/darwin.h
+++ b/gcc/config/rs6000/darwin.h
@@ -365,11 +365,11 @@
 /* Default target flag settings.  Despite the fact that STMW/LMW
serializes, it's still a big code size win to use them.  Use FSEL by
default as well.  */
 
 #undef  TARGET_DEFAULT
-#define TARGET_DEFAULT (MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
+#define TARGET_DEFAULT (OPTION_MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
 
 /* Darwin always uses IBM long double, never IEEE long double.  */
 #undef  TARGET_IEEEQUAD
 #define TARGET_IEEEQUAD 0
 
diff --git a/gcc/config/rs6000/darwin64-biarch.h 
b/gcc/config/rs6000/darwin64-biarch.h
index 6a700c61c4c2..6515bcc8bf5a 100644
--- a/gcc/config/rs6000/darwin64-biarch.h
+++ b/gcc/config/rs6000/darwin64-biarch.h
@@ -19,11 +19,11 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
 #undef  TARGET_DEFAULT
 #define TARGET_DEFAULT (MASK_POWERPC64 | MASK_64BIT \
-   | MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
+   | OPTION_MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
 
 #undef DARWIN_ARCH_SPEC
 #define DARWIN_ARCH_SPEC "%{m32:ppc;:ppc64}"
 
 /* Actually, there's really only 970 as an active option.  */
diff --git a/gcc/config/rs6000/rs6000-cpus.def 
b/gcc/config/rs6000/rs6000-cpus.def
index ca78bd8cf89f..4301b1bcb120 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -174,29 +174,31 @@
 
RS6000_CPU (NAME, CPU, FLAGS)
 
where the arguments are the fields of struct rs6000_ptt.  */
 
-RS6000_CPU ("401", PROCESSOR_PPC403, MASK_SOFT_FLOAT)
-RS6000_CPU ("403", PROCESSOR_PPC403, MASK_SOFT_FLOAT | MASK_STRICT_ALIGN)
-RS6000_CPU ("405", PROCESSOR_PPC405, MASK_SOFT_FLOAT | OPTION_MASK_MULHW
-   | OPTION_MASK_DLMZB)
+RS6000_CPU ("401", PROCESSOR_PPC403, OPTION_MASK_SOFT_FLOAT)
+RS6000_CPU ("403", PROCESSOR_PPC403, OPTION_MASK_SOFT_FLOAT | 
MASK_STRICT_ALIGN)
+RS6000_CPU ("405", PROCESSOR_PPC405, OPTION_MASK_SOFT_FLOAT
+   | OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
 RS6000_CPU ("405fp", PROCESSOR_PPC405, OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
-RS6000_CPU ("440", PROCESSOR_PPC440, MASK_SOFT_FLOAT | OPTION_MASK_MULHW
+RS6000_CPU ("440", PROCESSOR_PPC440, OPTION_MASK_SOFT_FLOAT | OPTION_MASK_MULHW
| OPTION_MASK_DLMZB)
 RS6000_CPU ("440fp", PROCESSOR_PPC440, OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
-RS6000_CPU ("464", PROCESSOR_PPC440, MASK_SOFT_FLOAT | OPTION_MASK_MULHW
+RS6000_CPU ("464", PROCESSOR_PPC440, OPTION_MASK_SOFT_FLOAT | OPTION_MASK_MULHW
| OPTION_MASK_DLMZB)
 RS6000_CPU ("464fp", PROCESSOR_PPC440, OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
-RS6000_CPU ("476", PROCESSOR_PPC476, MASK_SOFT_FLOAT | OPTION_MASK_PPC_GFXOPT
-   | OPTION_MASK_MFCRF | OPTION_MASK_POPCNTB | OPTION_MASK_FPRND
-   | OPTION_MASK_CMPB | OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
-RS6000_CPU ("476fp", PROCESSOR_PPC476, OPTION_MASK_PPC_GFXOPT
-   | OPTION_MASK_MFCRF | OPTION_MASK_POPCNTB | OPTION_MASK_FPRND
-   | OPTION_MASK_CMPB | OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
+RS6000_CPU ("476", PROCESSOR_PPC476,
+   OPTION_MASK_SOFT_FLOAT | OPTION_MASK_PPC_GFXOPT | OPTION_MASK_MFCRF
+   | OPTION_MASK_POPCNTB | OPTION_MASK_FPRND | OPTION_MASK_CMPB
+   | OPTION_MASK_MULHW | OPTION_MASK_DLMZB)
+RS6000_CPU ("476fp", PROCESSOR_PPC476,
+   OPTION_MASK_PPC_GFXOPT | OPTION_MASK_MFCRF | OPTION_MASK_POPCNTB
+   | OPTION_MASK_FPRND | OPTION_MASK_CMPB | OPTION_MASK_MULHW
+   | OPTION_MASK_DLMZB)
 RS6000_CPU ("505", PROCESSOR_MPCCORE, 0)
-RS6000_CPU ("601", PROCESSOR_PPC601, MASK_MULTIPLE)

[PATCH][_GLIBCXX_INLINE_VERSION] Fix std::span pretty printer

2022-05-25 Thread François Dumont via Gcc-patches

Hi

    Here is a patch to fix std::span pretty printer in versioned 
namespace mode.


    Note that there is still a problem with std::atomic after this patch.

got: $13 = std::atomic> (empty) = {get() = 0x0}
FAIL: libstdc++-prettyprinters/cxx20.cc print spe

    libstdc++: [_GLIBCXX_INLINE_VERSION] Fix std::span pretty printer

    libstdc++-v3/ChangeLog:

    * python/libstdcxx/v6/printers.py (StdSpanPrinter.__init__):
    Strip typename from version namespace.

Tested under Linux x86_64 _GLIBCXX_INLINE_VERSION mode.

Ok to commit ?

François
diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 0bd793c0897..fafdff3e5c0 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -1687,7 +1687,7 @@ class StdSpanPrinter:
 return '[%d]' % count, (self.begin + count).dereference()
 
 def __init__(self, typename, val):
-self.typename = typename
+self.typename = strip_versioned_namespace(typename)
 self.val = val
 if val.type.template_argument(1) == gdb.parse_and_eval('static_cast(-1)'):
 self.size = val['_M_extent']['_M_extent_value']
@@ -1994,7 +1994,7 @@ class FilteringTypePrinter(object):
 self.enabled = True
 
 class _recognizer(object):
-"The recognizer class for TemplateTypePrinter."
+"The recognizer class for FilteringTypePrinter."
 
 def __init__(self, match, name):
 self.match = match


[PATCH v2] c++: suppress -Waddress warnings with *_cast [PR105569]

2022-05-25 Thread Marek Polacek via Gcc-patches
On Wed, May 18, 2022 at 09:43:47AM -0400, Jason Merrill wrote:
> On 5/16/22 13:06, Marek Polacek wrote:
> > dynamic_cast can legally return nullptr, so I don't think it's helpful
> > for -Waddress to warn for
> > 
> >if (dynamic_cast())
> >  // ...
> > 
> > More generally, it's likely not useful to warn for the artificial
> > POINTER_PLUS_EXPRs created by build_base_path.
> 
> Normally the POINTER_PLUS_EXPR is guarded by if (nonnull).  But
> build_base_path isn't adding that guard in this case because the operand is
> known to be a reference, which cannot be null
> (http://eel.is/c++draft/dcl.ref#5).  So a warning is indicated for this
> testcase, though it would be good to give a more informative one ("comparing
> address of reference to null").

Ah, got it.  How about this patch instead?  Thanks,

-- >8 --
This patch improves the diagnostic for -Waddress when it warns for

  if (dynamic_cast())
// ...

where 'ref' is a reference, which cannot be null.  In particular, it
changes
warning: comparing the result of pointer addition '(((A*)ref) + 
((sizetype)(*(long int*)((& ref)->B::_vptr.B + -24' and NULL
to
warning: comparing address of reference 'ref' to null

PR c++/105569

gcc/cp/ChangeLog:

* typeck.cc (warn_for_null_address): Improve the warning when
the POINTER_PLUS_EXPR's base is of reference type.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Waddress-9.C: New test.
---
 gcc/cp/typeck.cc   | 12 --
 gcc/testsuite/g++.dg/warn/Waddress-9.C | 31 ++
 2 files changed, 41 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Waddress-9.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 385cdf4d733..0837f2484ba 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -4757,8 +4757,16 @@ warn_for_null_address (location_t location, tree op, 
tsubst_flags_t complain)
   tree off = TREE_OPERAND (cop, 1);
   if (!integer_zerop (off)
  && !warning_suppressed_p (cop, OPT_Waddress))
-   warning_at (location, OPT_Waddress, "comparing the result of pointer "
-   "addition %qE and NULL", cop);
+   {
+ tree base = TREE_OPERAND (cop, 0);
+ STRIP_NOPS (base);
+ if (TYPE_REF_P (TREE_TYPE (base)))
+   warning_at (location, OPT_Waddress, "comparing address of "
+   "reference %qE and NULL", base);
+ else
+   warning_at (location, OPT_Waddress, "comparing the result of "
+   "pointer addition %qE and NULL", cop);
+   }
   return;
 }
   else if (CONVERT_EXPR_P (op)
diff --git a/gcc/testsuite/g++.dg/warn/Waddress-9.C 
b/gcc/testsuite/g++.dg/warn/Waddress-9.C
new file mode 100644
index 000..a3654ff1f91
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Waddress-9.C
@@ -0,0 +1,31 @@
+// PR c++/105569
+// { dg-do compile { target c++11 } }
+// { dg-options -Waddress }
+
+class A {};
+
+class B : public virtual A {};
+
+class C : public A {};
+
+int main() {
+B* object = new B();
+B  = *object;
+
+bool b = nullptr == dynamic_cast(); // { dg-warning "comparing 
address of reference .ref. and NULL" }
+bool b4 = nullptr == static_cast(); // { dg-warning "comparing 
address of reference .ref. and NULL" }
+if (dynamic_cast()) // { dg-warning "comparing address of 
reference .ref. and NULL" }
+  {
+  }
+if (static_cast()) // { dg-warning "comparing address of reference 
.ref. and NULL" }
+  {
+  }
+
+auto ptr = dynamic_cast();
+bool b2 = ptr == nullptr;
+
+C* cobject = new C();
+C  = *cobject;
+
+bool b3 = nullptr == dynamic_cast();
+}

base-commit: 34970d08c6297e12f3f9117b6ac19fb2de522e24
-- 
2.36.1



[PATCH, rs6000] Clean up the option_mask defines (part 1)

2022-05-25 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Clean up the option_mask defines

Hi,

We have an assortment of MASK and OPTION_MASK #defines throughout
the rs6000 code, MASK_ALTIVEC and OPTION_MASK_ALTIVEC as an example.

We currently #define the MASK_ entries to their OPTION_MASK_
equivalents so the two names could be used interchangeably.

The mapping is in place from when we switched from using
target_flags to rs6000_isa_flags via
commit 4d9675496a28ef6184f2a9c3ac5e6e3ea63606c1 in 2012.

This patch converts the references for most of the lingering MASK_*
values to OPTION_MASK_*  and removes the now redundant defines.

I have split this into multiple parts due to size.

Regtested OK on power10.  OK for trunk?

gcc/
* rs6000.h (MASK_ALTIVEC, MASK_CMPB, MASK_CRYPTO
MASK_DLMZB, MASK_EABI, MASK_FPRND, MASK_ISEL
MASK_MFCRF, MASK_MULHW, MASK_POPCNTB, MASK_PPC_GFXOPT
MASK_PPC_GPOPT):  Remove defines.
(RS6000_BTM_ALTIVEC, RS6000_BTM_CMPB, RS6000_BTM_CRYPTO,
RS6000_BTM_FRE, RS6000_BTM_FRES, RS6000_BTM_FRSQRTE,
RS6000_BTM_FRSQRTES, RS6000_BTM_CELL) : Redefine using
OPTION_MASK_ instead of MASK_.
* rs6000-cpus.def (RS6000_CPU) Update macro calls to use
OPTION_MASK_ instead of MASK_.
* rs6000.cc (rs6000_darwin_file_start): Update mapping[] table
entries to use OPTION_MASK_PPC_GPOPT, OPTION_MASK_MFCRF,
OPTION_MASK_ALTIVEC instead of their MASK_ variants.
* rs6000-c.cc : Update comment to reference OPTION_MASK_GFXOPT.
* aix71.h (TARGET_DEFAULT): Update define to use OPTION_MASK_
instead of MASK_.
* darwin.h (TARGET_DEFAULT): Same.
* darwin64-biarch.h (TARGET_DEFAULT): Same.
* default64.h (TARGET_DEFAULT): Same.
* eabi.h (TARGET_DEFAULT): Same.
* eabialtivec.h (TARGET_DEFAULT): Same.
* linuxaltivec.h (TARGET_DEFAULT): Same.
* vxworks.h (TARGET_DEFAULT): Same.

diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 57e07bcc65ee..8c2ec5d36375 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -135,13 +135,15 @@ do {  
\
 #include "rs6000-cpus.def"
 #undef RS6000_CPU
 
 #undef  TARGET_DEFAULT
 #ifdef RS6000_BI_ARCH
-#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT | MASK_MFCRF | 
MASK_POWERPC64 | MASK_64BIT)
+#define TARGET_DEFAULT (OPTION_MASK_PPC_GPOPT | OPTION_MASK_PPC_GFXOPT \
+   | OPTION_MASK_MFCRF | MASK_POWERPC64 | MASK_64BIT)
 #else
-#define TARGET_DEFAULT (MASK_PPC_GPOPT | MASK_PPC_GFXOPT | MASK_MFCRF)
+#define TARGET_DEFAULT (OPTION_MASK_PPC_GPOPT | OPTION_MASK_PPC_GFXOPT \
+   | OPTION_MASK_MFCRF)
 #endif
 
 #undef  PROCESSOR_DEFAULT
 #define PROCESSOR_DEFAULT PROCESSOR_POWER7
 #undef  PROCESSOR_DEFAULT64
diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
index b5cef42610f7..86556ccbbf58 100644
--- a/gcc/config/rs6000/darwin.h
+++ b/gcc/config/rs6000/darwin.h
@@ -365,11 +365,11 @@
 /* Default target flag settings.  Despite the fact that STMW/LMW
serializes, it's still a big code size win to use them.  Use FSEL by
default as well.  */
 
 #undef  TARGET_DEFAULT
-#define TARGET_DEFAULT (MASK_MULTIPLE | MASK_PPC_GFXOPT)
+#define TARGET_DEFAULT (MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
 
 /* Darwin always uses IBM long double, never IEEE long double.  */
 #undef  TARGET_IEEEQUAD
 #define TARGET_IEEEQUAD 0
 
diff --git a/gcc/config/rs6000/darwin64-biarch.h 
b/gcc/config/rs6000/darwin64-biarch.h
index 57b0fab084e3..6a700c61c4c2 100644
--- a/gcc/config/rs6000/darwin64-biarch.h
+++ b/gcc/config/rs6000/darwin64-biarch.h
@@ -19,11 +19,11 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
 #undef  TARGET_DEFAULT
 #define TARGET_DEFAULT (MASK_POWERPC64 | MASK_64BIT \
-   | MASK_MULTIPLE | MASK_PPC_GFXOPT)
+   | MASK_MULTIPLE | OPTION_MASK_PPC_GFXOPT)
 
 #undef DARWIN_ARCH_SPEC
 #define DARWIN_ARCH_SPEC "%{m32:ppc;:ppc64}"
 
 /* Actually, there's really only 970 as an active option.  */
diff --git a/gcc/config/rs6000/default64.h b/gcc/config/rs6000/default64.h
index 4bf0feef2f8e..08b58c965d19 100644
--- a/gcc/config/rs6000/default64.h
+++ b/gcc/config/rs6000/default64.h
@@ -27,9 +27,10 @@ along with GCC; see the file COPYING3.  If not see
 #define TARGET_DEFAULT (ISA_2_7_MASKS_SERVER | MASK_POWERPC64 | MASK_64BIT | 
MASK_LITTLE_ENDIAN)
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpower8"
 #else
 #undef TARGET_DEFAULT
-#define TARGET_DEFAULT (MASK_PPC_GFXOPT | MASK_PPC_GPOPT | MASK_MFCRF | 
MASK_POWERPC64 | MASK_64BIT)
+#define TARGET_DEFAULT (OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT \
+   | OPTION_MASK_MFCRF | MASK_POWERPC64 | MASK_64BIT)
 #undef ASM_DEFAULT_SPEC
 #define ASM_DEFAULT_SPEC "-mpower4"
 #endif
diff --git a/gcc/config/rs6000/eabi.h b/gcc/config/rs6000/eabi.h
index 

Re: [0/9] [middle-end] Add param to vec_perm_const hook to specify mode of input operand

2022-05-25 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 26 May 2022 at 00:37, Richard Biener  wrote:
>
>
>
> > Am 25.05.2022 um 21:03 schrieb Prathamesh Kulkarni 
> > :
> >
> > On Wed, 25 May 2022 at 18:27, Richard Biener  
> > wrote:
> >>
> >>> On Tue, May 24, 2022 at 9:22 PM Prathamesh Kulkarni via Gcc-patches
> >>>  wrote:
> >>>
> >>> On Tue, 24 May 2022 at 14:50, Richard Sandiford
> >>>  wrote:
> 
>  Prathamesh Kulkarni  writes:
> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > index c5006afc00d..0a3c733ada9 100644
> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi
> > @@ -6088,14 +6088,18 @@ for the given scalar type @var{type}.  
> > @var{is_packed} is false if the scalar
> > access using @var{type} is known to be naturally aligned.
> > @end deftypefn
> >
> > -@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST 
> > (machine_mode @var{mode}, rtx @var{output}, rtx @var{in0}, rtx 
> > @var{in1}, const vec_perm_indices @var{})
> > +@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST 
> > (machine_mode @var{mode}, machine_mode @var{op_mode}, rtx @var{output}, 
> > rtx @var{in0}, rtx @var{in1}, const vec_perm_indices @var{})
> > This hook is used to test whether the target can permute up to two
> > -vectors of mode @var{mode} using the permutation vector @code{sel}, and
> > -also to emit such a permutation.  In the former case @var{in0}, 
> > @var{in1}
> > -and @var{out} are all null.  In the latter case @var{in0} and 
> > @var{in1} are
> > -the source vectors and @var{out} is the destination vector; all three 
> > are
> > -operands of mode @var{mode}.  @var{in1} is the same as @var{in0} if
> > -@var{sel} describes a permutation on one vector instead of two.
> > +vectors of mode @var{op_mode} using the permutation vector @code{sel},
> > +producing a vector of mode @var{mode}.The hook is also used to emit 
> > such
> 
>  Should be two spaces between “@var{mode}.” and “The”.
> 
> > +a permutation.
> > +
> > +When the hook is being used to test whether the target supports a 
> > permutation,
> > +@var{in0}, @var{in1}, and @var{out} are all null.When the hook is 
> > being used
> 
>  Same here: missing spaces before “When”.
> 
> > +to emit a permutation, @var{in0} and @var{in1} are the source vectors 
> > of mode
> > +@var{op_mode} and @var{out} is the destination vector of mode 
> > @var{mode}.
> > +@var{in1} is the same as @var{in0} if @var{sel} describes a 
> > permutation on one
> > +vector instead of two.
> >
> > Return true if the operation is possible, emitting instructions for it
> > if rtxes are provided.
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index f5efa77560c..f2a527d9c42 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -7596,6 +7596,8 @@ and,
> >  (with
> >   {
> > tree op0 = @0, op1 = @1, op2 = @2;
> > +machine_mode result_mode = TYPE_MODE (type);
> > +machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0));
> >
> > /* Build a vector of integers from the tree mask.  */
> > vec_perm_builder builder;
> > @@ -7703,12 +7705,12 @@ and,
> > 2-argument version.  */
> >  tree oldop2 = op2;
> >  if (sel.ninputs () == 2
> > -|| can_vec_perm_const_p (TYPE_MODE (type), sel, false))
> > +|| can_vec_perm_const_p (result_mode, op_mode, sel, false))
> >op2 = vec_perm_indices_to_tree (TREE_TYPE (op2), sel);
> >  else
> >{
> >  vec_perm_indices sel2 (builder, 2, nelts);
> > - if (can_vec_perm_const_p (TYPE_MODE (type), sel2, false))
> > + if (can_vec_perm_const_p (result_mode, op_mode, sel2, 
> > false))
> >op2 = vec_perm_indices_to_tree (TREE_TYPE (op2), sel2);
> >  else
> >/* Not directly supported with either encoding,
> 
>  Please replace the use of TYPE_MODE here:
> 
> /* See if the permutation is performing a single element
>    insert from a CONSTRUCTOR or constant and use a BIT_INSERT_EXPR
>    in that case.  But only if the vector mode is supported,
>    otherwise this is invalid GIMPLE.  */
> if (TYPE_MODE (type) != BLKmode
> 
>  as well.
> 
>  OK with those changes, thanks.
> >>> Thanks, committed the patch in ae8decf1d2b8329af59592b4fa78ee8dfab3ba5e.
> >>
> >> So the present state allows to ask can_vec_perm_const_p but the
> >> implementation asks for
> >>
> >> 431   if (direct_optab_handler (vec_perm_optab, mode) !=
> >> CODE_FOR_nothing)
> >> 432 return true;
> >>
> >> which then breaks.  Also the VEC_PERMs are not yet valid.  Any reason this
> >> was committed half-way through the review process of 

Re: [PATCH] c++: fix ICE on invalid attributes [PR96637]

2022-05-25 Thread Jason Merrill via Gcc-patches

On 5/25/22 14:49, Marek Polacek wrote:

On Tue, May 24, 2022 at 08:22:22AM -0400, Jason Merrill wrote:

On 4/29/22 10:12, Marek Polacek wrote:

This patch fixes crashes with invalid attributes.  Arguably it could
make sense to assert seen_error() too.


So in this testcase we have TREE_CHAIN of a TREE_LIST pointing to
error_mark_node?  Can we avoid that?


Yes and yes.  Sorry, my previous fix was lousy.  This one is better:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK, thanks.


-- >8 --
When chaining attributes, attr_chainon should be used rather than plain
chainon, so that we don't end up with a TREE_LIST where one of the elements
is error_mark_node, which causes problems.  parser.cc has already been
fixed to use attr_chainon, but decl.cc has not.  Until now.

PR c++/96637

gcc/cp/ChangeLog:

* cp-tree.h (attr_chainon): Declare.
* decl.cc (start_decl): Use attr_chainon.
(grokdeclarator): Likewise.
* parser.cc (cp_parser_statement): No longer static.

gcc/testsuite/ChangeLog:

* g++.dg/parse/error64.C: New test.
---
  gcc/cp/cp-tree.h |  1 +
  gcc/cp/decl.cc   | 19 ++-
  gcc/cp/parser.cc |  2 +-
  gcc/testsuite/g++.dg/parse/error64.C |  4 
  4 files changed, 16 insertions(+), 10 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/parse/error64.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index ba986e892b6..d77fd1eb8a9 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7235,6 +7235,7 @@ extern void inject_this_parameter (tree, cp_cv_quals);
  extern location_t defparse_location (tree);
  extern void maybe_show_extern_c_location (void);
  extern bool literal_integer_zerop (const_tree);
+extern tree attr_chainon (tree, tree);
  
  /* in pt.cc */

  extern tree canonical_type_parameter  (tree);
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 381259cb9cf..b1ea838ce8b 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -5557,7 +5557,7 @@ start_decl (const cp_declarator *declarator,
*pushed_scope_p = NULL_TREE;
  
if (prefix_attributes != error_mark_node)

-attributes = chainon (attributes, prefix_attributes);
+attributes = attr_chainon (attributes, prefix_attributes);
  
decl = grokdeclarator (declarator, declspecs, NORMAL, initialized,

 );
@@ -12728,9 +12728,10 @@ grokdeclarator (const cp_declarator *declarator,
   as a whole.  */
late_attrs = splice_template_attributes (, type);
  returned_attrs = decl_attributes (,
-   chainon (returned_attrs, attrs),
+   attr_chainon (returned_attrs,
+ attrs),
attr_flags);
- returned_attrs = chainon (late_attrs, returned_attrs);
+ returned_attrs = attr_chainon (late_attrs, returned_attrs);
}
  
inner_declarator = declarator->declarator;

@@ -12781,8 +12782,8 @@ grokdeclarator (const cp_declarator *declarator,
  
  	   The optional attribute-specifier-seq appertains to the

   array.  */
-   returned_attrs = chainon (returned_attrs,
- declarator->std_attributes);
+   returned_attrs = attr_chainon (returned_attrs,
+  declarator->std_attributes);
  break;
  
  	case cdk_function:

@@ -13122,9 +13123,9 @@ grokdeclarator (const cp_declarator *declarator,
/* transaction_safe applies to the type, but
   transaction_safe_dynamic applies to the function.  */
if (is_attribute_p ("transaction_safe", tx_qual))
- attrs = chainon (attrs, att);
+ attrs = attr_chainon (attrs, att);
else
- returned_attrs = chainon (returned_attrs, att);
+ returned_attrs = attr_chainon (returned_attrs, att);
  }
if (attrs)
  /* [dcl.fct]/2:
@@ -13438,7 +13439,7 @@ grokdeclarator (const cp_declarator *declarator,
if (returned_attrs)
  {
if (attrlist)
-   *attrlist = chainon (returned_attrs, *attrlist);
+   *attrlist = attr_chainon (returned_attrs, *attrlist);
else
attrlist = _attrs;
  }
@@ -13451,7 +13452,7 @@ grokdeclarator (const cp_declarator *declarator,
/* [dcl.meaning]/1: The optional attribute-specifier-seq following
 a declarator-id appertains to the entity that is declared.  */
if (declarator->std_attributes != error_mark_node)
-   *attrlist = chainon (*attrlist, declarator->std_attributes);
+   *attrlist = attr_chainon (*attrlist, declarator->std_attributes);
else
/* We should have already diagnosed the issue (c++/78344).  */
gcc_assert (seen_error ());
diff --git 

Re: [PATCH] Modula-2: merge proposal/review: 1/9 01.patch-set-01

2022-05-25 Thread Gaius Mulley via Gcc-patches
Richard Biener  writes:

> So is there a reason to have the 'scaffold' separate from the object
> of hello.mod?

Perhaps the major advantage is flexibility?  But no we can by default
produce the scaffold within the object of hello.mod (and give users the
ability to disable scaffold generation should they wish to implement
their own).

> Is there more than a 1:1 relation between a .mod and the 'scaffold'?

yes there is a 1:1 relation between a .mod and the scaffold.  Although
the caveat is that the compiler would need to parse every .def and .mod
imports from the application universe.  Not a major change as gm2 has
the ability to do whole program (application) compiling, so a minor set
of changes to ensure that upon compiling the program module that it also
parses every .def/.mod.

> Why are multiple tools involved here - can the m2 frontend not parse
> imports, reorder runtime modules and generate the 'scaffold' as
> GENERIC IL as part of the translation unit of the .mod file?
> Indirection through emitting C++ source code makes the process a bit
> awkward IMHO.

indeed the m2 front end can parse imports, reorder and generate the
scaffold.

> Unfortunately I have no m2 installation around to look at how complex
> the 'scaffold' is.

the scaffold is really simple for example here it is for hello.mod:

  $ gm2 -c -g -fmakelist hello.mod
  $ cat hello.lst
Storage
SYSTEM
M2RTS
RTExceptions
# libc   11 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/libc.def FOR 'C'
# SYSTEM   11 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/SYSTEM.mod
# StrIO   11 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/StrIO.mod
# StrLib   10 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/StrLib.mod
# ASCII   10 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/ASCII.mod
# NumberIO   10 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/NumberIO.mod
# Indexing   10 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/Indexing.mod
# errno9 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/errno.def
# termios9 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/termios.def
# FIO9 /home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/FIO.mod
# IO8 /home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/IO.mod
# StdIO7 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/StdIO.mod
# Debug6 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/Debug.mod
# SysStorage5 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/SysStorage.mod
# SysExceptions4 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/SysExceptions.def
# M2EXCEPTION4 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/M2EXCEPTION.mod
# Storage4 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/Storage.mod
# RTExceptions3 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/RTExceptions.mod
# M2RTS2 
/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim/M2RTS.mod
# hello1 hello.mod
#
# Initialization order
#
StrIO
StrLib
ASCII
NumberIO
Indexing
errno
termios
FIO
IO
StdIO
Debug
SysStorage
SysExceptions
M2EXCEPTION
hello

   and now to generate the scaffold for a static application:

   $ ~/opt/bin/gm2 -fmakeinit -c -g hello.mod
   $ cat hello_m2.cpp 
extern "C" void exit(int);

extern "C" void RTExceptions_DefaultErrorCatch(void);
extern "C" void _M2_Storage_init (int argc, char *argv[]);
extern "C" void _M2_Storage_finish (void);
extern "C" void _M2_SYSTEM_init (int argc, char *argv[]);
extern "C" void _M2_SYSTEM_finish (void);
extern "C" void _M2_M2RTS_init (int argc, char *argv[]);
extern "C" void _M2_M2RTS_finish (void);
extern "C" void _M2_RTExceptions_init (int argc, char *argv[]);
extern "C" void _M2_RTExceptions_finish (void);
extern "C" void _M2_StrIO_init (int argc, char *argv[]);
extern "C" void _M2_StrIO_finish (void);
extern "C" void _M2_StrLib_init (int argc, char *argv[]);
extern "C" void _M2_StrLib_finish (void);
extern "C" void _M2_ASCII_init (int argc, char *argv[]);
extern "C" void _M2_ASCII_finish (void);
extern "C" void _M2_NumberIO_init (int argc, char *argv[]);
extern "C" void _M2_NumberIO_finish (void);
extern "C" void _M2_Indexing_init (int argc, char *argv[]);
extern "C" void _M2_Indexing_finish (void);
extern "C" void _M2_errno_init (int argc, char *argv[]);
extern "C" void _M2_errno_finish (void);
extern "C" void _M2_termios_init (int argc, char *argv[]);
extern "C" void _M2_termios_finish (void);
extern "C" void _M2_FIO_init (int argc, char *argv[]);
extern "C" void _M2_FIO_finish (void);
extern "C" void _M2_IO_init (int argc, char *argv[]);
extern "C" void _M2_IO_finish (void);
extern "C" void _M2_StdIO_init (int argc, char *argv[]);
extern "C" void _M2_StdIO_finish (void);
extern "C" void _M2_Debug_init (int argc, char *argv[]);
extern "C" void _M2_Debug_finish (void);
extern "C" void _M2_SysStorage_init (int argc, 

Re: [0/9] [middle-end] Add param to vec_perm_const hook to specify mode of input operand

2022-05-25 Thread Richard Biener via Gcc-patches



> Am 25.05.2022 um 21:03 schrieb Prathamesh Kulkarni 
> :
> 
> On Wed, 25 May 2022 at 18:27, Richard Biener  
> wrote:
>> 
>>> On Tue, May 24, 2022 at 9:22 PM Prathamesh Kulkarni via Gcc-patches
>>>  wrote:
>>> 
>>> On Tue, 24 May 2022 at 14:50, Richard Sandiford
>>>  wrote:
 
 Prathamesh Kulkarni  writes:
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index c5006afc00d..0a3c733ada9 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -6088,14 +6088,18 @@ for the given scalar type @var{type}.  
> @var{is_packed} is false if the scalar
> access using @var{type} is known to be naturally aligned.
> @end deftypefn
> 
> -@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST 
> (machine_mode @var{mode}, rtx @var{output}, rtx @var{in0}, rtx @var{in1}, 
> const vec_perm_indices @var{})
> +@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST 
> (machine_mode @var{mode}, machine_mode @var{op_mode}, rtx @var{output}, 
> rtx @var{in0}, rtx @var{in1}, const vec_perm_indices @var{})
> This hook is used to test whether the target can permute up to two
> -vectors of mode @var{mode} using the permutation vector @code{sel}, and
> -also to emit such a permutation.  In the former case @var{in0}, @var{in1}
> -and @var{out} are all null.  In the latter case @var{in0} and @var{in1} 
> are
> -the source vectors and @var{out} is the destination vector; all three are
> -operands of mode @var{mode}.  @var{in1} is the same as @var{in0} if
> -@var{sel} describes a permutation on one vector instead of two.
> +vectors of mode @var{op_mode} using the permutation vector @code{sel},
> +producing a vector of mode @var{mode}.The hook is also used to emit such
 
 Should be two spaces between “@var{mode}.” and “The”.
 
> +a permutation.
> +
> +When the hook is being used to test whether the target supports a 
> permutation,
> +@var{in0}, @var{in1}, and @var{out} are all null.When the hook is being 
> used
 
 Same here: missing spaces before “When”.
 
> +to emit a permutation, @var{in0} and @var{in1} are the source vectors of 
> mode
> +@var{op_mode} and @var{out} is the destination vector of mode @var{mode}.
> +@var{in1} is the same as @var{in0} if @var{sel} describes a permutation 
> on one
> +vector instead of two.
> 
> Return true if the operation is possible, emitting instructions for it
> if rtxes are provided.
> diff --git a/gcc/match.pd b/gcc/match.pd
> index f5efa77560c..f2a527d9c42 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7596,6 +7596,8 @@ and,
>  (with
>   {
> tree op0 = @0, op1 = @1, op2 = @2;
> +machine_mode result_mode = TYPE_MODE (type);
> +machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0));
> 
> /* Build a vector of integers from the tree mask.  */
> vec_perm_builder builder;
> @@ -7703,12 +7705,12 @@ and,
> 2-argument version.  */
>  tree oldop2 = op2;
>  if (sel.ninputs () == 2
> -|| can_vec_perm_const_p (TYPE_MODE (type), sel, false))
> +|| can_vec_perm_const_p (result_mode, op_mode, sel, false))
>op2 = vec_perm_indices_to_tree (TREE_TYPE (op2), sel);
>  else
>{
>  vec_perm_indices sel2 (builder, 2, nelts);
> - if (can_vec_perm_const_p (TYPE_MODE (type), sel2, false))
> + if (can_vec_perm_const_p (result_mode, op_mode, sel2, 
> false))
>op2 = vec_perm_indices_to_tree (TREE_TYPE (op2), sel2);
>  else
>/* Not directly supported with either encoding,
 
 Please replace the use of TYPE_MODE here:
 
/* See if the permutation is performing a single element
   insert from a CONSTRUCTOR or constant and use a BIT_INSERT_EXPR
   in that case.  But only if the vector mode is supported,
   otherwise this is invalid GIMPLE.  */
if (TYPE_MODE (type) != BLKmode
 
 as well.
 
 OK with those changes, thanks.
>>> Thanks, committed the patch in ae8decf1d2b8329af59592b4fa78ee8dfab3ba5e.
>> 
>> So the present state allows to ask can_vec_perm_const_p but the
>> implementation asks for
>> 
>> 431   if (direct_optab_handler (vec_perm_optab, mode) !=
>> CODE_FOR_nothing)
>> 432 return true;
>> 
>> which then breaks.  Also the VEC_PERMs are not yet valid.  Any reason this
>> was committed half-way through the review process of the series?
> Hi Richard,
> I am very sorry about that. I thought the patch was approved, and
> committed it after testing on x86_64 and aarch64.
> Should I revert it ?

No need.

> Um sorry to ask a silly question -- I am not sure why does the patch
> break the above condition ?
> The patch 

Re: [0/9] [middle-end] Add param to vec_perm_const hook to specify mode of input operand

2022-05-25 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 25 May 2022 at 18:27, Richard Biener  wrote:
>
> On Tue, May 24, 2022 at 9:22 PM Prathamesh Kulkarni via Gcc-patches
>  wrote:
> >
> > On Tue, 24 May 2022 at 14:50, Richard Sandiford
> >  wrote:
> > >
> > > Prathamesh Kulkarni  writes:
> > > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > > > index c5006afc00d..0a3c733ada9 100644
> > > > --- a/gcc/doc/tm.texi
> > > > +++ b/gcc/doc/tm.texi
> > > > @@ -6088,14 +6088,18 @@ for the given scalar type @var{type}.  
> > > > @var{is_packed} is false if the scalar
> > > >  access using @var{type} is known to be naturally aligned.
> > > >  @end deftypefn
> > > >
> > > > -@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST 
> > > > (machine_mode @var{mode}, rtx @var{output}, rtx @var{in0}, rtx 
> > > > @var{in1}, const vec_perm_indices @var{})
> > > > +@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST 
> > > > (machine_mode @var{mode}, machine_mode @var{op_mode}, rtx @var{output}, 
> > > > rtx @var{in0}, rtx @var{in1}, const vec_perm_indices @var{})
> > > >  This hook is used to test whether the target can permute up to two
> > > > -vectors of mode @var{mode} using the permutation vector @code{sel}, and
> > > > -also to emit such a permutation.  In the former case @var{in0}, 
> > > > @var{in1}
> > > > -and @var{out} are all null.  In the latter case @var{in0} and 
> > > > @var{in1} are
> > > > -the source vectors and @var{out} is the destination vector; all three 
> > > > are
> > > > -operands of mode @var{mode}.  @var{in1} is the same as @var{in0} if
> > > > -@var{sel} describes a permutation on one vector instead of two.
> > > > +vectors of mode @var{op_mode} using the permutation vector @code{sel},
> > > > +producing a vector of mode @var{mode}.The hook is also used to emit 
> > > > such
> > >
> > > Should be two spaces between “@var{mode}.” and “The”.
> > >
> > > > +a permutation.
> > > > +
> > > > +When the hook is being used to test whether the target supports a 
> > > > permutation,
> > > > +@var{in0}, @var{in1}, and @var{out} are all null.When the hook is 
> > > > being used
> > >
> > > Same here: missing spaces before “When”.
> > >
> > > > +to emit a permutation, @var{in0} and @var{in1} are the source vectors 
> > > > of mode
> > > > +@var{op_mode} and @var{out} is the destination vector of mode 
> > > > @var{mode}.
> > > > +@var{in1} is the same as @var{in0} if @var{sel} describes a 
> > > > permutation on one
> > > > +vector instead of two.
> > > >
> > > >  Return true if the operation is possible, emitting instructions for it
> > > >  if rtxes are provided.
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index f5efa77560c..f2a527d9c42 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -7596,6 +7596,8 @@ and,
> > > >   (with
> > > >{
> > > >  tree op0 = @0, op1 = @1, op2 = @2;
> > > > +machine_mode result_mode = TYPE_MODE (type);
> > > > +machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0));
> > > >
> > > >  /* Build a vector of integers from the tree mask.  */
> > > >  vec_perm_builder builder;
> > > > @@ -7703,12 +7705,12 @@ and,
> > > >  2-argument version.  */
> > > >   tree oldop2 = op2;
> > > >   if (sel.ninputs () == 2
> > > > -|| can_vec_perm_const_p (TYPE_MODE (type), sel, false))
> > > > +|| can_vec_perm_const_p (result_mode, op_mode, sel, false))
> > > > op2 = vec_perm_indices_to_tree (TREE_TYPE (op2), sel);
> > > >   else
> > > > {
> > > >   vec_perm_indices sel2 (builder, 2, nelts);
> > > > - if (can_vec_perm_const_p (TYPE_MODE (type), sel2, false))
> > > > + if (can_vec_perm_const_p (result_mode, op_mode, sel2, 
> > > > false))
> > > > op2 = vec_perm_indices_to_tree (TREE_TYPE (op2), sel2);
> > > >   else
> > > > /* Not directly supported with either encoding,
> > >
> > > Please replace the use of TYPE_MODE here:
> > >
> > > /* See if the permutation is performing a single element
> > >insert from a CONSTRUCTOR or constant and use a BIT_INSERT_EXPR
> > >in that case.  But only if the vector mode is supported,
> > >otherwise this is invalid GIMPLE.  */
> > > if (TYPE_MODE (type) != BLKmode
> > >
> > > as well.
> > >
> > > OK with those changes, thanks.
> > Thanks, committed the patch in ae8decf1d2b8329af59592b4fa78ee8dfab3ba5e.
>
> So the present state allows to ask can_vec_perm_const_p but the
> implementation asks for
>
> 431   if (direct_optab_handler (vec_perm_optab, mode) !=
> CODE_FOR_nothing)
> 432 return true;
>
> which then breaks.  Also the VEC_PERMs are not yet valid.  Any reason this
> was committed half-way through the review process of the series?
Hi Richard,
I am very sorry about that. I thought the patch was approved, and
committed it after testing on x86_64 and aarch64.
Should I revert it ?

Um sorry to ask a 

Re: [PATCH v2] DSE: Use the constant store source if possible

2022-05-25 Thread H.J. Lu via Gcc-patches
On Wed, May 25, 2022 at 2:30 AM Richard Sandiford
 wrote:
>
> Richard Biener via Gcc-patches  writes:
> > On Tue, May 24, 2022 at 10:11 PM H.J. Lu  wrote:
> >>
> >> On Mon, May 23, 2022 at 11:42 PM Richard Biener
> >>  wrote:
> >> >
> >> > On Mon, May 23, 2022 at 8:34 PM H.J. Lu  wrote:
> >> > >
> >> > > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
> >> > > > On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
> >> > > >  wrote:
> >> > > > >
> >> > > > > When recording store for RTL dead store elimination, check if the 
> >> > > > > source
> >> > > > > register is set only once to a constant.  If yes, record the 
> >> > > > > constant
> >> > > > > as the store source.  It eliminates unrolled zero stores after 
> >> > > > > memset 0
> >> > > > > in a loop where a vector register is used as the zero store source.
> >> > > > >
> >> > > > > gcc/
> >> > > > >
> >> > > > > PR rtl-optimization/105638
> >> > > > > * dse.cc (record_store): Use the constant source if the 
> >> > > > > source
> >> > > > > register is set only once.
> >> > > > >
> >> > > > > gcc/testsuite/
> >> > > > >
> >> > > > > PR rtl-optimization/105638
> >> > > > > * g++.target/i386/pr105638.C: New test.
> >> > > > > ---
> >> > > > >  gcc/dse.cc   | 19 ++
> >> > > > >  gcc/testsuite/g++.target/i386/pr105638.C | 44 
> >> > > > > 
> >> > > > >  2 files changed, 63 insertions(+)
> >> > > > >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >> > > > >
> >> > > > > diff --git a/gcc/dse.cc b/gcc/dse.cc
> >> > > > > index 30c11cee034..0433dd3d846 100644
> >> > > > > --- a/gcc/dse.cc
> >> > > > > +++ b/gcc/dse.cc
> >> > > > > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
> >> > > > >
> >> > > > >   if (tem && CONSTANT_P (tem))
> >> > > > > const_rhs = tem;
> >> > > > > + else
> >> > > > > +   {
> >> > > > > + /* If RHS is set only once to a constant, set 
> >> > > > > CONST_RHS
> >> > > > > +to the constant.  */
> >> > > > > + df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> >> > > > > + if (def != nullptr
> >> > > > > + && !DF_REF_IS_ARTIFICIAL (def)
> >> > > > > + && !DF_REF_NEXT_REG (def))
> >> > > > > +   {
> >> > > > > + rtx_insn *def_insn = DF_REF_INSN (def);
> >> > > > > + rtx def_body = PATTERN (def_insn);
> >> > > > > + if (GET_CODE (def_body) == SET)
> >> > > > > +   {
> >> > > > > + rtx def_src = SET_SRC (def_body);
> >> > > > > + if (CONSTANT_P (def_src))
> >> > > > > +   const_rhs = def_src;
> >> > > >
> >> > > > doesn't DSE have its own tracking of stored values?  Shouldn't we
> >> > >
> >> > > It tracks stored values only within the basic block.  When RTL loop
> >> > > invariant motion hoists a constant initialization out of the loop into
> >> > > a separate basic block, the constant store value becomes unknown
> >> > > within the original basic block.
> >> > >
> >> > > > improve _that_ if it is not enough?  I also wonder if you need to
> >> > >
> >> > > My patch extends DSE stored value tracking to include the constant 
> >> > > which
> >> > > is set only once in another basic block.
> >> > >
> >> > > > verify the SET isn't partial?
> >> > > >
> >> > >
> >> > > Here is the v2 patch to check that the constant is set by a non-partial
> >> > > unconditional load.
> >> > >
> >> > > OK for master?
> >> > >
> >> > > Thanks.
> >> > >
> >> > > H.J.
> >> > > ---
> >> > > RTL DSE tracks redundant constant stores within a basic block.  When 
> >> > > RTL
> >> > > loop invariant motion hoists a constant initialization out of the loop
> >> > > into a separate basic block, the constant store value becomes unknown
> >> > > within the original basic block.  When recording store for RTL DSE, 
> >> > > check
> >> > > if the source register is set only once to a constant by a non-partial
> >> > > unconditional load.  If yes, record the constant as the constant store
> >> > > source.  It eliminates unrolled zero stores after memset 0 in a loop
> >> > > where a vector register is used as the zero store source.
> >> > >
> >> > > gcc/
> >> > >
> >> > > PR rtl-optimization/105638
> >> > > * dse.cc (record_store): Use the constant source if the source
> >> > > register is set only once.
> >> > >
> >> > > gcc/testsuite/
> >> > >
> >> > > PR rtl-optimization/105638
> >> > > * g++.target/i386/pr105638.C: New test.
> >> > > ---
> >> > >  gcc/dse.cc   | 22 
> >> > >  gcc/testsuite/g++.target/i386/pr105638.C | 44 
> >> > >  2 files changed, 66 insertions(+)
> >> > >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >> > >
> >> > > diff --git 

Re: [PATCH v2] DSE: Use the constant store source if possible

2022-05-25 Thread H.J. Lu via Gcc-patches
On Wed, May 25, 2022 at 12:30 AM Richard Sandiford
 wrote:
>
> "H.J. Lu via Gcc-patches"  writes:
> > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
> >> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
> >>  wrote:
> >> >
> >> > When recording store for RTL dead store elimination, check if the source
> >> > register is set only once to a constant.  If yes, record the constant
> >> > as the store source.  It eliminates unrolled zero stores after memset 0
> >> > in a loop where a vector register is used as the zero store source.
> >> >
> >> > gcc/
> >> >
> >> > PR rtl-optimization/105638
> >> > * dse.cc (record_store): Use the constant source if the source
> >> > register is set only once.
> >> >
> >> > gcc/testsuite/
> >> >
> >> > PR rtl-optimization/105638
> >> > * g++.target/i386/pr105638.C: New test.
> >> > ---
> >> >  gcc/dse.cc   | 19 ++
> >> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 
> >> >  2 files changed, 63 insertions(+)
> >> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >> >
> >> > diff --git a/gcc/dse.cc b/gcc/dse.cc
> >> > index 30c11cee034..0433dd3d846 100644
> >> > --- a/gcc/dse.cc
> >> > +++ b/gcc/dse.cc
> >> > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
> >> >
> >> >   if (tem && CONSTANT_P (tem))
> >> > const_rhs = tem;
> >> > + else
> >> > +   {
> >> > + /* If RHS is set only once to a constant, set CONST_RHS
> >> > +to the constant.  */
> >> > + df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> >> > + if (def != nullptr
> >> > + && !DF_REF_IS_ARTIFICIAL (def)
> >> > + && !DF_REF_NEXT_REG (def))
> >> > +   {
> >> > + rtx_insn *def_insn = DF_REF_INSN (def);
> >> > + rtx def_body = PATTERN (def_insn);
> >> > + if (GET_CODE (def_body) == SET)
> >> > +   {
> >> > + rtx def_src = SET_SRC (def_body);
> >> > + if (CONSTANT_P (def_src))
> >> > +   const_rhs = def_src;
> >>
> >> doesn't DSE have its own tracking of stored values?  Shouldn't we
> >
> > It tracks stored values only within the basic block.  When RTL loop
> > invariant motion hoists a constant initialization out of the loop into
> > a separate basic block, the constant store value becomes unknown
> > within the original basic block.
> >
> >> improve _that_ if it is not enough?  I also wonder if you need to
> >
> > My patch extends DSE stored value tracking to include the constant which
> > is set only once in another basic block.
> >
> >> verify the SET isn't partial?
> >>
> >
> > Here is the v2 patch to check that the constant is set by a non-partial
> > unconditional load.
> >
> > OK for master?
> >
> > Thanks.
> >
> > H.J.
> > ---
> > RTL DSE tracks redundant constant stores within a basic block.  When RTL
> > loop invariant motion hoists a constant initialization out of the loop
> > into a separate basic block, the constant store value becomes unknown
> > within the original basic block.  When recording store for RTL DSE, check
> > if the source register is set only once to a constant by a non-partial
> > unconditional load.  If yes, record the constant as the constant store
> > source.  It eliminates unrolled zero stores after memset 0 in a loop
> > where a vector register is used as the zero store source.
> >
> > gcc/
> >
> >   PR rtl-optimization/105638
> >   * dse.cc (record_store): Use the constant source if the source
> >   register is set only once.
> >
> > gcc/testsuite/
> >
> >   PR rtl-optimization/105638
> >   * g++.target/i386/pr105638.C: New test.
> > ---
> >  gcc/dse.cc   | 22 
> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 
> >  2 files changed, 66 insertions(+)
> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> >
> > diff --git a/gcc/dse.cc b/gcc/dse.cc
> > index 30c11cee034..af8e88dac32 100644
> > --- a/gcc/dse.cc
> > +++ b/gcc/dse.cc
> > @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
> >
> > if (tem && CONSTANT_P (tem))
> >   const_rhs = tem;
> > +   else
> > + {
> > +   /* If RHS is set only once to a constant, set CONST_RHS
> > +  to the constant.  */
> > +   df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> > +   if (def != nullptr
> > +   && !DF_REF_IS_ARTIFICIAL (def)
> > +   && !(DF_REF_FLAGS (def)
> > +& (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
> > +   && !DF_REF_NEXT_REG (def))
>
> Can we introduce a helper for this?  There are already similar tests
> in ira and loop-iv, and it seems a bit too complex to have to open-code
> each time.

I can use 

[pushed] c++: CTAD with alias and nested template [PR105655]

2022-05-25 Thread Jason Merrill via Gcc-patches
Here, alias_ctad_tweaks expect tsubst_decl of a FUNCTION_DECL to return a
FUNCTION_DECL.  A reasonable expectation, but in this case we were replacing
the template args of the class-scope deduction guide with equivalent args,
so looking in the hash table we found the partial instantiation stored when
instantiating A, which is a TEMPLATE_DECL.  It's fine for that to be
what is stored, but tsubst_function_decl should never return it.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/105655

gcc/cp/ChangeLog:

* pt.cc (build_template_decl): Add assert.
(tsubst_function_decl): Don't return a template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-alias13.C: New test.
---
 gcc/cp/pt.cc  |  6 -
 .../g++.dg/cpp2a/class-deduction-alias13.C| 24 +++
 2 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias13.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 76913cb1409..021af019840 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -5021,6 +5021,8 @@ maybe_update_decl_type (tree orig_type, tree scope)
 static tree
 build_template_decl (tree decl, tree parms, bool member_template_p)
 {
+  gcc_checking_assert (TREE_CODE (decl) != TEMPLATE_DECL);
+
   tree tmpl = build_lang_decl (TEMPLATE_DECL, DECL_NAME (decl), NULL_TREE);
   SET_DECL_LANGUAGE (tmpl, DECL_LANGUAGE (decl));
   DECL_TEMPLATE_PARMS (tmpl) = parms;
@@ -14074,7 +14076,9 @@ tsubst_function_decl (tree t, tree args, tsubst_flags_t 
complain,
{
  hash = hash_tmpl_and_args (gen_tmpl, argvec);
  if (tree spec = retrieve_specialization (gen_tmpl, argvec, hash))
-   return spec;
+   /* The spec for these args might be a partial instantiation of the
+  template, but here what we want is the FUNCTION_DECL.  */
+   return STRIP_TEMPLATE (spec);
}
 }
   else
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias13.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias13.C
new file mode 100644
index 000..0a90a83081b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias13.C
@@ -0,0 +1,24 @@
+// PR c++/105655
+// { dg-do compile { target c++20 } }
+
+template 
+struct A
+{
+  template 
+  struct B
+  {
+B(const L & left, const R & right)
+{}
+  };
+
+  template 
+  B(const L &, const R &) -> B;
+};
+
+template 
+using C = A::B;
+
+int main()
+{
+  C x{0, 0};
+}

base-commit: 850a9ce8bcca59c7efabcdeeca14c5bd905e8363
-- 
2.27.0



Re: [PATCH] c++: fix ICE on invalid attributes [PR96637]

2022-05-25 Thread Marek Polacek via Gcc-patches
On Tue, May 24, 2022 at 08:22:22AM -0400, Jason Merrill wrote:
> On 4/29/22 10:12, Marek Polacek wrote:
> > This patch fixes crashes with invalid attributes.  Arguably it could
> > make sense to assert seen_error() too.
> 
> So in this testcase we have TREE_CHAIN of a TREE_LIST pointing to
> error_mark_node?  Can we avoid that?

Yes and yes.  Sorry, my previous fix was lousy.  This one is better:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
When chaining attributes, attr_chainon should be used rather than plain
chainon, so that we don't end up with a TREE_LIST where one of the elements
is error_mark_node, which causes problems.  parser.cc has already been
fixed to use attr_chainon, but decl.cc has not.  Until now.

PR c++/96637

gcc/cp/ChangeLog:

* cp-tree.h (attr_chainon): Declare.
* decl.cc (start_decl): Use attr_chainon.
(grokdeclarator): Likewise.
* parser.cc (cp_parser_statement): No longer static.

gcc/testsuite/ChangeLog:

* g++.dg/parse/error64.C: New test.
---
 gcc/cp/cp-tree.h |  1 +
 gcc/cp/decl.cc   | 19 ++-
 gcc/cp/parser.cc |  2 +-
 gcc/testsuite/g++.dg/parse/error64.C |  4 
 4 files changed, 16 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/parse/error64.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index ba986e892b6..d77fd1eb8a9 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7235,6 +7235,7 @@ extern void inject_this_parameter (tree, cp_cv_quals);
 extern location_t defparse_location (tree);
 extern void maybe_show_extern_c_location (void);
 extern bool literal_integer_zerop (const_tree);
+extern tree attr_chainon (tree, tree);
 
 /* in pt.cc */
 extern tree canonical_type_parameter   (tree);
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 381259cb9cf..b1ea838ce8b 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -5557,7 +5557,7 @@ start_decl (const cp_declarator *declarator,
   *pushed_scope_p = NULL_TREE;
 
   if (prefix_attributes != error_mark_node)
-attributes = chainon (attributes, prefix_attributes);
+attributes = attr_chainon (attributes, prefix_attributes);
 
   decl = grokdeclarator (declarator, declspecs, NORMAL, initialized,
 );
@@ -12728,9 +12728,10 @@ grokdeclarator (const cp_declarator *declarator,
   as a whole.  */
late_attrs = splice_template_attributes (, type);
  returned_attrs = decl_attributes (,
-   chainon (returned_attrs, attrs),
+   attr_chainon (returned_attrs,
+ attrs),
attr_flags);
- returned_attrs = chainon (late_attrs, returned_attrs);
+ returned_attrs = attr_chainon (late_attrs, returned_attrs);
}
 
   inner_declarator = declarator->declarator;
@@ -12781,8 +12782,8 @@ grokdeclarator (const cp_declarator *declarator,
 
   The optional attribute-specifier-seq appertains to the
   array.  */
-   returned_attrs = chainon (returned_attrs,
- declarator->std_attributes);
+   returned_attrs = attr_chainon (returned_attrs,
+  declarator->std_attributes);
  break;
 
case cdk_function:
@@ -13122,9 +13123,9 @@ grokdeclarator (const cp_declarator *declarator,
/* transaction_safe applies to the type, but
   transaction_safe_dynamic applies to the function.  */
if (is_attribute_p ("transaction_safe", tx_qual))
- attrs = chainon (attrs, att);
+ attrs = attr_chainon (attrs, att);
else
- returned_attrs = chainon (returned_attrs, att);
+ returned_attrs = attr_chainon (returned_attrs, att);
  }
if (attrs)
  /* [dcl.fct]/2:
@@ -13438,7 +13439,7 @@ grokdeclarator (const cp_declarator *declarator,
   if (returned_attrs)
 {
   if (attrlist)
-   *attrlist = chainon (returned_attrs, *attrlist);
+   *attrlist = attr_chainon (returned_attrs, *attrlist);
   else
attrlist = _attrs;
 }
@@ -13451,7 +13452,7 @@ grokdeclarator (const cp_declarator *declarator,
   /* [dcl.meaning]/1: The optional attribute-specifier-seq following
 a declarator-id appertains to the entity that is declared.  */
   if (declarator->std_attributes != error_mark_node)
-   *attrlist = chainon (*attrlist, declarator->std_attributes);
+   *attrlist = attr_chainon (*attrlist, declarator->std_attributes);
   else
/* We should have already diagnosed the issue (c++/78344).  */
gcc_assert (seen_error ());
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 868b8610d60..62aaccda23d 

Re: [wwwdocs] Add C status page

2022-05-25 Thread Joseph Myers
On Tue, 24 May 2022, Marek Polacek via Gcc-patches wrote:

> > And cases where some support in GCC should 
> > definitely be done to consider the feature implemented, even when not 
> > needed for conformance (e.g. the %wN, %wfN printf/scanf formats need 
> > implementing in glibc, and corresponding format checking support needs 
> > implementing in GCC).
> 
> These could be marked as "partially implemented" (yellow color).  Except
> I often don't know which features need extensions like that.

I think something like how c99status.html lists wide character library 
support is appropriate for the Notes (so along the lines of "Library 
feature, no compiler support required. GCC doesn't have %wN, %wfN format 
checking support.").

> > There are also cases where a feature is 
> > substantially there but a more detailed review should be done for how it 
> > matches up to the standard version (e.g. the DFP support based on TR 
> > 24732-2009 could do with such a detailed review for how it matches C2x 
> > requirements).
> 
> Indeed, and that's a hard problem.  I for one could never figure out this
> one.  So I'd leave it in the "?" (red color) state.

I think the approach taken in c99status.html - list the version where 
there is substantial support so code written with the new syntax will 
generally work - is appropriate as regards the choice of version number to 
list.  So that would list GCC 4.3 as the version with DFP support (i.e. 
the entry for TS 18661-2 integration - I don't think listing all the TS 
18661 parts in a single row is right).  And then it might be marked as 
partially implemented.  And the Notes would say (in some order) that (a) 
support based on TR 24732-2009 may not have been fully updated to C2X 
requirements, (b) both language and library features are involved, and 
library support is in libdfp (link to libdfp) but might not be complete, 
(c) there are issues with correct rounding of conversions between binary 
and decimal DFP in some cases (link to bugs 97635, 105669), and issues 
with exceptions and rounding modes similar to those that apply with binary 
floating point, (d) DFP is only supported for a few architectures.

(I don't think the versions in which I made -std=c2x -pedantic allow DFP 
features need to be mentioned at all.)

> > > +
> > > +  Binary literals
> > > +   > > href="http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2549.pdf;>N2549
> > > +  GCC 
> > > 11
> > > +  
> > > +
> > 
> > This is an example of cases where the version where a feature was 
> > supported in GCC as an extension is long before the version where 
> > -pedantic knows about it being supported in a given standard version; 
> > listing the version with the -pedantic change in such cases may not be 
> > very helpful without noting when it was originally implemented.
> 
> Probably here we could just say "Yes" (green color) and make a note in the
> "Notes" column.

Following the same principle as c99status.html for chosing versions to 
list says that the version number should be listed as GCC 4.3.

> > There are 
> > probably other examples in the list.  (There are also examples where GCC 
> > supports the feature but hasn't yet had -pedantic updated accordingly, 
> > e.g. #warning.  And cases where it's largely supported but there are small 
> > differences in the standard version that still need implementing, e.g. 
> > typeof.)
>  
> Yeah, I bet.  It's tricky to decide :/.

(In the case of typeof, "partially implemented" is probably right anyway, 
simply because of the remove_quals variant in C2x (for which there is a 
proposal of renaming to be considered in July).  And then note that typeof 
itself, with some semantic differences, but not remove_quals, was 
implemented in GCC 1.21 or before - similar to how c99status.html handles 
complex numbers, noting that __complex__ was supported in GCC 2.5 but 
_Complex not until GCC 3.0 which is the version listed in the version 
column there.)

> Maybe it's just me, but I find some value in having proposals like that in
> the table too (but it should be "N/A" and gray).  This is what I did for
> N2641.
> 
> What I like about having a table like this is that it makes it clear what
> remains to be implemented, and unimplemented features have a linked PR,
> which makes it easy to see if someone is already planning to implement the
> feature, or if it still unassigned.

I don't object to having non-features listed as N/A, but I think you'll 
have a lot of rows like that.  Library features (that don't need compiler 
support) should be explicitly noted as such when marking them as N/A.

I have rough notes on C2x features needing implementing in GCC and glibc 
(sorted approximately by the relevant commits in the (private) C standard 
git repository), but that isn't a straightforward checklist intended to be 
more widely comprehensible or to map to corresponding PRs, includes cases 
of possible future improvements not actually needed to 

[PATCH] c++: constrained partial spec forward decl [PR96363]

2022-05-25 Thread Patrick Palka via Gcc-patches
Here during cp_parser_single_declaration for #2, we were calling
associate_classtype_constraints for TPL (the primary template type)
before maybe_process_partial_specialization could get a chance to
notice that we're in fact declaring a distinct constrained partial
spec and not redeclaring the primary template.  This caused us to
emit a bogus error about differing constraints b/t the primary template
the current constraints at #2.  This patch fixes this by moving the
call to associate_classtype_constraints after the call to shadow_tag
(which calls maybe_process_partial_specialization) and adjusting
shadow_tag to use the return value of m_p_p_s.

Moreover, if we later try to define a constrained partial specialization
that's been declared earlier (as in the third testcase), then
maybe_new_partial_specialization correctly notices that it's a
redeclaration and returns NULL_TREE.  But we need to also update TYPE to
point to the constrained class type in this case (it'll otherwise
continue to point to the primary template type, eventually leading to a
bogus error).

Bootstrapped and regtested on x86_64-pc-linux-gnu, also tested against
cmcstl and range-v3, does this look OK for trunk?  Since it should only
affect concepts code, I wonder about backporting this for 12.2?

PR c++/96363

gcc/cp/ChangeLog:

* decl.cc (shadow_tag): Use the return value of
maybe_process_partial_specialization.
* parser.cc (cp_parser_single_declaration): Call shadow_tag
before associate_classtype_constraints.
* pt.cc (maybe_new_partial_specialization): Change return type
to bool.  Take 'type' argument by mutable reference.  Set 'type'
to point to the correct constrained specialization when
appropriate.
(maybe_process_partial_specialization): Adjust accordingly.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-partial-spec12.C: New test.
* g++.dg/cpp2a/concepts-partial-spec13.C: New test.
---
 gcc/cp/decl.cc|  3 +-
 gcc/cp/parser.cc  | 12 +++---
 gcc/cp/pt.cc  | 38 ++-
 .../g++.dg/cpp2a/concepts-partial-spec12.C| 10 +
 .../g++.dg/cpp2a/concepts-partial-spec12a.C   | 14 +++
 .../g++.dg/cpp2a/concepts-partial-spec13.C| 16 
 6 files changed, 69 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec12.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec12a.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec13.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 381259cb9cf..c7caa12f061 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -5464,7 +5464,8 @@ shadow_tag (cp_decl_specifier_seq *declspecs)
   if (!t)
 return NULL_TREE;
 
-  if (maybe_process_partial_specialization (t) == error_mark_node)
+  t = maybe_process_partial_specialization (t);
+  if (t == error_mark_node)
 return NULL_TREE;
 
   /* This is where the variables in an anonymous union are
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 868b8610d60..d9e78e1f4cc 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -31811,12 +31811,6 @@ cp_parser_single_declaration (cp_parser* parser,
   if (cp_parser_declares_only_class_p (parser)
  || (declares_class_or_enum & 2))
{
- /* If this is a declaration, but not a definition, associate
-any constraints with the type declaration. Constraints
-are associated with definitions in cp_parser_class_specifier.  */
- if (declares_class_or_enum == 1)
-   associate_classtype_constraints (decl_specifiers.type);
-
  decl = shadow_tag (_specifiers);
 
  /* In this case:
@@ -31838,6 +31832,12 @@ cp_parser_single_declaration (cp_parser* parser,
  else
decl = error_mark_node;
 
+ /* If this is a declaration, but not a definition, associate
+any constraints with the type declaration. Constraints
+are associated with definitions in cp_parser_class_specifier.  */
+ if (declares_class_or_enum == 1)
+   associate_classtype_constraints (TREE_TYPE (decl));
+
  /* Perform access checks for template parameters.  */
  cp_parser_perform_template_parameter_access_checks (checks);
 
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index b45a29926d2..7de9b11bd12 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -874,12 +874,12 @@ check_explicit_instantiation_namespace (tree spec)
   spec, current_namespace, ns);
 }
 
-/* Returns the type of a template specialization only if that
-   specialization needs to be defined. Otherwise (e.g., if the type has
-   already been defined), the function returns NULL_TREE.  */
+/* Returns true if TYPE is a new partial specialization that needs to be
+   set up.  This may also modify TYPE to point to the correct (new or
+   existing) 

Re: [PATCH] tree-optimization/105726 - adjust array bound heuristic

2022-05-25 Thread Martin Sebor via Gcc-patches

On 5/25/22 11:19, Martin Sebor wrote:

On 5/25/22 04:49, Richard Biener wrote:

...

[*] For example, no warning is issued for the following overread:


Scratch that, there is no overread with strncpy.  When there is like
with memcpy, it is diagnosed as it should be.



struct A a;

void g (char *d)
{
   struct B *q = 
   __builtin_strncpy (d, q->a, 123);   // { dg-warning 
"-Wstringop-overread" "pr???" { xfail *-*-* } }

}




Re: [PATCH] tree-optimization/105726 - adjust array bound heuristic

2022-05-25 Thread Martin Sebor via Gcc-patches

On 5/25/22 04:49, Richard Biener wrote:

There's heuristic to detect ptr[1].a[...] out of bound accesses
reasoning that if ptr points to an array of aggregates a trailing
incomplete array has to have size zero.  The following more
thoroughly constrains the cases this applies to avoid false
positive diagnostics.


The code in the area below does look like it's missing some logic
(like what you're adding).  The whole -Wrestrict pass is messy and
could use some cleanup (sorry).  The -Warray-bounds logic should
probably be removed (-Wstringop-overflow should now catch most of
the valid instances it issues, and if it doesn't(*) it should be
enhanced).  The pass could even be moved into the access warning
pass and probably also simplified quite a bit.

For what it's worth, a simple C test case for the same bug is:

struct A
{
  int i;
  struct B { char a[1]; } b;
};

void f (char *d, struct A *p)
{
  struct B *q = >b;
  __builtin_strncpy (d, q->a, 1);   // { dg-bogus "-Warray-bounds" }
}

Martin

[*] For example, no warning is issued for the following overread:

struct A a;

void g (char *d)
{
  struct B *q = 
  __builtin_strncpy (d, q->a, 123);   // { dg-warning 
"-Wstringop-overread" "pr???" { xfail *-*-* } }

}



Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

2022-05-25  Richard Biener  

PR tree-optimization/105726
* gimple-ssa-warn-restrict.cc (builtin_memref::set_base_and_offset):
Constrain array-of-flexarray case more.

* g++.dg/warn/Warray-bounds-27.C: New testcase.
---
  gcc/gimple-ssa-warn-restrict.cc  | 22 
  gcc/testsuite/g++.dg/warn/Warray-bounds-27.C | 16 ++
  2 files changed, 29 insertions(+), 9 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Warray-bounds-27.C

diff --git a/gcc/gimple-ssa-warn-restrict.cc b/gcc/gimple-ssa-warn-restrict.cc
index b678e806da3..734cdd7f5b4 100644
--- a/gcc/gimple-ssa-warn-restrict.cc
+++ b/gcc/gimple-ssa-warn-restrict.cc
@@ -525,7 +525,6 @@ builtin_memref::set_base_and_offset (tree expr)
  {
tree memrefoff = fold_convert (ptrdiff_type_node, TREE_OPERAND (base, 
1));
extend_offset_range (memrefoff);
-  base = TREE_OPERAND (base, 0);
  
if (refoff != HOST_WIDE_INT_MIN

  && TREE_CODE (expr) == COMPONENT_REF)
@@ -538,14 +537,19 @@ builtin_memref::set_base_and_offset (tree expr)
 REFOFF is set to s[1].b - (char*)s.  */
  offset_int off = tree_to_shwi (memrefoff);
  refoff += off;
-   }
-
-  if (!integer_zerop (memrefoff))
-   /* A non-zero offset into an array of struct with flexible array
-  members implies that the array is empty because there is no
-  way to initialize such a member when it belongs to an array.
-  This must be some sort of a bug.  */
-   refsize = 0;
+
+ if (!integer_zerop (memrefoff)
+ && !COMPLETE_TYPE_P (TREE_TYPE (expr))
+ && multiple_of_p (sizetype, memrefoff,
+   TYPE_SIZE_UNIT (TREE_TYPE (base)), true))
+   /* A non-zero offset into an array of struct with flexible array
+  members implies that the array is empty because there is no
+  way to initialize such a member when it belongs to an array.
+  This must be some sort of a bug.  */
+   refsize = 0;
+   }
+
+  base = TREE_OPERAND (base, 0);
  }
  
if (TREE_CODE (ref) == COMPONENT_REF)

diff --git a/gcc/testsuite/g++.dg/warn/Warray-bounds-27.C 
b/gcc/testsuite/g++.dg/warn/Warray-bounds-27.C
new file mode 100644
index 000..06ce089c4b0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Warray-bounds-27.C
@@ -0,0 +1,16 @@
+// PR105726
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+// { dg-options "-O2 -Warray-bounds" }
+
+#include 
+#include 
+
+struct X {
+char pad[4];
+std::array mField;
+};
+
+void encode(char* aBuffer, const X& aMessage) {
+strncpy(aBuffer, aMessage.mField.data(), 1); // { dg-bogus "bounds" }
+}




Vim swap files not ignored

2022-05-25 Thread Bruce Korb via Gcc-patches

Hi,
I don't have the keys for write access anymore. This ought to be 
applied. Odd that it never has been. :)diff --git a/.gitignore b/.gitignore
index 14ee0325176..021a8c74185 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,6 +6,7 @@
 *~
 .#*
 *#
+.*.swp
 
 *.flt
 *.gmo


Re: [COMMITTED] Use infer instead of side-effect for ranges.

2022-05-25 Thread Segher Boessenkool
On Wed, May 25, 2022 at 10:35:31AM -0400, Andrew MacLeod wrote:
> Executive decision made.
> 
> gimple-range-side-effect.{h,cc}   -> gimple-range-infer.{h,cc}
> 
> class stmt_side_effects  -->  class gimple_infer_range
> 
> class side_effect_manager  -->  class infer_range_manager
> 
> various APIs/comments with the term "side effect"  renamed to "infer 
> range" or some variation of tense.

Thank you!


Segher


[PATCH] AArch64: Cleanup option processing code

2022-05-25 Thread Wilco Dijkstra via Gcc-patches

Further cleanup option processing. Remove the duplication of global
variables for CPU and tune settings so that CPU option processing is
simplified even further. Move global variables that need save and 
restore due to target option processing into aarch64.opt. This removes
the need for explicit saving/restoring and unnecessary reparsing of
options.

Bootstrap OK, regress pass, OK for commit?

gcc/
* config/aarch64/aarch64.opt (explicit_tune_core): Rename to
selected_tune.
(explicit_arch): Rename to selected_arch.
(x_aarch64_override_tune_string): Remove.
(aarch64_ra_sign_key): Add as TargetVariable so it gets saved/restored.
(aarch64_override_tune_string): Add Save so it gets saved/restored.
* config/aarch64/aarch64.h (aarch64_architecture_version): Remove.
* config/aarch64/aarch64.cc (aarch64_architecture_version): Remove.
(processor): Remove archtecture_version field.
(selected_arch): Remove global.
(selected_cpu): Remove global.
(selected_tune): Remove global.
(aarch64_ra_sign_key): Move global to aarch64.opt so it is saved.
(aarch64_override_options_internal): Use aarch64_get_tune_cpu.
(aarch64_override_options): Further simplify code to only set
selected_arch and selected_tune globals.
(aarch64_option_save): Remove now that target options are saved.
(aarch64_option_restore): Remove redundant target option restores.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Use
AARCH64_ISA_V9.
* config/aarch64/aarch64-opts.h (aarch64_key_type): Add, moved from...
* config/aarch64/aarch64-protos.h (aarch64_key_type): Remove.
(aarch64_ra_sign_key): Remove.

---

diff --git a/gcc/config/aarch64/aarch64-c.cc b/gcc/config/aarch64/aarch64-c.cc
index 
767ee0c763c56a022089a647c7425afb00644644..3d2fb5ec2ef33e66aaa59d216c53a29737262794
 100644
--- a/gcc/config/aarch64/aarch64-c.cc
+++ b/gcc/config/aarch64/aarch64-c.cc
@@ -82,7 +82,7 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
 {
   aarch64_def_or_undef (flag_unsafe_math_optimizations, "__ARM_FP_FAST", 
pfile);
 
-  builtin_define_with_int_value ("__ARM_ARCH", aarch64_architecture_version);
+  builtin_define_with_int_value ("__ARM_ARCH", AARCH64_ISA_V9 ? 9 : 8);
 
   builtin_define_with_int_value ("__ARM_SIZEOF_MINIMAL_ENUM",
 flag_short_enums ? 1 : 4);
diff --git a/gcc/config/aarch64/aarch64-opts.h 
b/gcc/config/aarch64/aarch64-opts.h
index 
93572fe8330218568435f485a366101eb2c58da9..421648a156a02cc1f43660eddc5de38c2f366a72
 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -98,4 +98,10 @@ enum stack_protector_guard {
   SSP_GLOBAL   /* global canary */
 };
 
+/* The key type that -msign-return-address should use.  */
+enum aarch64_key_type {
+  AARCH64_KEY_A,
+  AARCH64_KEY_B
+};
+
 #endif
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
df311812e8d4b87c0ad8692adacedcd79a8e0f64..dabd047d7ba2c532238720d59ecd59f0f5ba822f
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -672,14 +672,6 @@ enum simd_immediate_check {
   AARCH64_CHECK_MOV  = AARCH64_CHECK_ORR | AARCH64_CHECK_BIC
 };
 
-/* The key type that -msign-return-address should use.  */
-enum aarch64_key_type {
-  AARCH64_KEY_A,
-  AARCH64_KEY_B
-};
-
-extern enum aarch64_key_type aarch64_ra_sign_key;
-
 extern struct tune_params aarch64_tune_params;
 
 /* The available SVE predicate patterns, known in the ACLE as "svpattern".  */
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
f835da33b72f36bbf25a0e1328135411bd8ab4f6..80cfe4b740798072ed2a3d08089ff889943f916a
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -148,9 +148,6 @@
 
 #define PCC_BITFIELD_TYPE_MATTERS  1
 
-/* Major revision number of the ARM Architecture implemented by the target.  */
-extern unsigned aarch64_architecture_version;
-
 /* Instruction tuning/selection flags.  */
 
 /* Bit values used to identify processor capabilities.  */
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
060c67a93b95b717563ed063509e3e31c978d891..6bdfd55bcd6308c71fb6cfccb5922b71830c7c4f
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -306,9 +306,6 @@ static bool aarch64_print_address_internal (FILE*, 
machine_mode, rtx,
aarch64_addr_query_type);
 static HOST_WIDE_INT aarch64_clamp_to_uimm12_shift (HOST_WIDE_INT val);
 
-/* Major revision number of the ARM Architecture implemented by the target.  */
-unsigned aarch64_architecture_version;
-
 /* The processor for which instructions should be scheduled.  */
 enum aarch64_processor aarch64_tune = cortexa53;
 
@@ -2677,7 +2674,6 @@ struct processor
   enum aarch64_processor ident;
 

[Patch] libgomp.texi: Add more to-be-implemented OpenMP 5.2 features / [wwwdocs] p.texi: Add more to-be-implemented OpenMP 5.2 features

2022-05-25 Thread Tobias Burnus

I had a look at the OpenMP 5.2 changes and found some more,
which I think should be listed. Some were rather hidden, two
were only documented in the deprecate part.

I also added the 'begin declare target', which is a variant
to the delimited form of 'declare target' – the non-begin
variant is then deprecated in 5.1. (The begin variant permits
clauses, the other doesn't.)

OK? Comments?

Tobias

PS: For Fortran, 5.1 permitted with capture:
[!$omp end atomic]
namely: the 'end atomic' is optional.
gfortran has implemented this – but 5.2 now made it required again,
treating it as bug. I am not sure whether this should now also be
listed as 'deleted' feature or not. Hmm.
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
libgomp.texi: Add more to-be-implemented OpenMP 5.2 features

libgomp/
	* libgomp.texi (Other new OpenMP 5.1 features): Add
	'begin declare target'.
	(Other new OpenMP 5.2 features): New.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 56724687fb4..7434643dd1d 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -343,6 +343,7 @@ The OpenMP 4.5 specification is fully supported.
 @item Support of structured block sequences in C/C++ @tab Y @tab
 @item @code{unconstrained} and @code{reproducible} modifiers on @code{order}
   clause @tab Y @tab
+@item Support @code{begin/end declare target} syntax in C/C++ @tab N @tab
 @end multitable
 
 
@@ -395,6 +396,23 @@ The OpenMP 4.5 specification is fully supported.
 @item @code{omp_cur_iteration} keyword @tab N @tab
 @end multitable
 
+@unnumberedsubsec Other new OpenMP 5.2 features
+
+@multitable @columnfractions .60 .10 .25
+@headitem Description @tab Status @tab Comments
+@item For Fortran, optional comma between directive and clause @tab N @tab
+@item Conforming device numbers and @code{omp_initial_device} and
+  @code{omp_invalid_device} enum/PARAMETER @tab N @tab
+@item Initial value of @emph{default-device-var} ICV with
+  @code{OMP_TARGET_OFFLOAD=mandatory} @tab N @tab
+@item @emph{interop_types} in any position of the modifier list for the @code{init} clause
+  of the @code{interop} construct @tab N @tab
+@item Deprecate specifying multiple modifiers in the @code{map} clause without
+  comma separator @tab N @tab
+@item Deprecate @code{destroy} clause on the @code{depobj} construct with no
+  argument @tab N @tab
+@end multitable
+
 
 @c -
 @c OpenMP Runtime Library Routines
gcc-13/changes.html - Update OpenMP / projects/gomp - add more to-do items

* gcc-13/changes.html (OpenMP): Add 'nowait' with 'taskwait'.
* projects/gomp/index.html: Add one 5.1 and several 5.2 additional
  to-be-implemented features.

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index c2610412..a262087f 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -40,7 +40,8 @@ a work-in-progress.
   
 The following OpenMP 5.1 features have been added: the
 omp_all_memory reserved locator, the inoutset
-modifier to the depend clause and the
+modifier to the depend clause, the nowait
+clause for the taskwait directive and the
 omp_target_is_accessible, omp_target_memcpy_async,
 omp_target_memcpy_rect_async and
 omp_get_mapped_ptr API routines. Fortran now supports
diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 68531a57..f2b80ad8 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -630,6 +630,11 @@ than listed, depending on resolved corner cases and optimizations.
 GCC13
 
   
+  
+nowait clause in taskwait directive
+GCC13
+
+  
   
 target_device trait in OpenMP Context
 No
@@ -690,11 +695,6 @@ than listed, depending on resolved corner cases and optimizations.
 No
 
   
-  
-nowait clause in taskwait directive
-No
-
-  
   
 present argument to defaultmap clause
 No
@@ -725,6 +725,12 @@ than listed, depending on resolved corner cases and optimizations.
 No
 
   
+  
+Support begin/end declare target syntax in C/C++
+No
+
+  
+  
 
 
 
@@ -875,6 +881,36 @@ than listed, depending on resolved corner cases and optimizations.
 No
 
   
+  
+For Fortran, optional comma between directive and clause
+No
+
+  
+  
+Conforming device numbers and omp_initial_device and omp_invalid_device enum/PARAMETER
+No
+
+  
+  
+interop_types in any position of the modifier list for the init clause of the interop construct
+No
+
+  
+  
+Initial value of default-device-var ICV with OMP_TARGET_OFFLOAD=mandatory
+No
+
+  
+  
+Deprecate specifying multiple modifiers in the map clause without 

Re: [PATCH v3] libstdc++: fix pointer type exception catch [PR105387]

2022-05-25 Thread Jonathan Wakely via Gcc-patches
On Wed, 25 May 2022 at 03:30, Jakob Hasse via Libstdc++
 wrote:
>
> Hello,
>
> two weeks ago I submitted the second version of the patch PR105387 for the 
> bug 105387. Now I added a pointer-to-member exception test just to make sure 
> that it doesn't break in case RTTI is enabled. The test is disabled if RTTI 
> is disabled. I didn't receive any feedback so far regarding the second 
> version of the patch. Is there any issue preventing acceptance?

Just a lack of time to review it properly.

It's on my list.


>
> I ran the conformance tests on libstdc++v3 by running
> make -j 18 check RUNTESTFLAGS=conformance.exp
>
> Results for the current version (only difference is the added 
> pointer-to-member test):
>
> Without RTTI before applying patch:
> === libstdc++ Summary ===
>
> # of expected passes 14560
> # of unexpected failures 5
> # of expected failures 95
> # of unsupported tests 702
>
> Without RTTI after applying patch:
> === libstdc++ Summary ===
>
> # of expected passes 14562
> # of unexpected failures 5
> # of expected failures 95
> # of unsupported tests 703
>
> With RTTI before applying patch:
> === libstdc++ Summary ===
>
> # of expected passes 14598
> # of unexpected failures 2
> # of expected failures 95
> # of unsupported tests 683
>
> With RTTI after applying patch:
> === libstdc++ Summary ===
>
> # of expected passes 14602
> # of unexpected failures 2
> # of expected failures 95
> # of unsupported tests 683
>
> Given that the pointer-to-member test is disabled when RTTI is disabled, the 
> results look logical to me.
>


[pushed] c++: deduction from auto fn [PR105623]

2022-05-25 Thread Jason Merrill via Gcc-patches
Since my patch for PR90451, we defer mark_used of single functions as late
as possible.  And since my r12-1273, we keep BASELINK from lookup around
rather than reconstruct it later.  These both made us try to instantiate g
with a function type that still had 'auto' as its return type.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/105623

gcc/cp/ChangeLog:

* decl2.cc (mark_used): Copy type from fn to BASELINK.
* pt.cc (unify_one_argument): Call mark_single_function.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/auto-fn62.C: New test.
---
 gcc/cp/decl2.cc| 11 ---
 gcc/cp/pt.cc   |  4 
 gcc/testsuite/g++.dg/cpp1y/auto-fn62.C | 14 ++
 3 files changed, 26 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/auto-fn62.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index ae743c8a3df..e72fdf05382 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -5799,10 +5799,15 @@ mark_used (tree decl, tsubst_flags_t complain)
  actually used until after overload resolution.  */
   if (BASELINK_P (decl))
 {
-  decl = BASELINK_FUNCTIONS (decl);
-  if (really_overloaded_fn (decl))
+  tree fns = BASELINK_FUNCTIONS (decl);
+  if (really_overloaded_fn (fns))
return true;
-  decl = OVL_FIRST (decl);
+  fns = OVL_FIRST (fns);
+  if (!mark_used (fns, complain))
+   return false;
+  /* We might have deduced its return type.  */
+  TREE_TYPE (decl) = TREE_TYPE (fns);
+  return true;
 }
 
   if (!DECL_P (decl))
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index b45a29926d2..76913cb1409 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -22643,6 +22643,10 @@ unify_one_argument (tree tparms, tree targs, tree 
parm, tree arg,
  return unify_success (explain_p);
}
 
+ /* Force auto deduction now.  Use tf_none to avoid redundant
+deprecated warning on deprecated-14.C.  */
+ mark_single_function (arg, tf_none);
+
  arg_expr = arg;
  arg = unlowered_expr_type (arg);
  if (arg == error_mark_node)
diff --git a/gcc/testsuite/g++.dg/cpp1y/auto-fn62.C 
b/gcc/testsuite/g++.dg/cpp1y/auto-fn62.C
new file mode 100644
index 000..9c2bff1ccf3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/auto-fn62.C
@@ -0,0 +1,14 @@
+// PR c++/105623
+// { dg-do compile { target c++14 } }
+
+template 
+auto g(T fn) { }
+
+template
+struct base {
+  static auto value() { }
+};
+
+struct S : base {
+  static void f() { g(value); }
+};

base-commit: 1b661f3f5e712c951e774b3b91fffe4dac734cc7
prerequisite-patch-id: cc6e608c68f4eb133f6a153f83dfe4f033544cbd
prerequisite-patch-id: b29700132a341b1334a8e95d8f2809f7122f7181
-- 
2.27.0



[pushed] c++: constexpr returning deallocated ptr

2022-05-25 Thread Jason Merrill via Gcc-patches
In constexpr-new3.C, the f7 function returns a deleted pointer, which we
were happily caching because the new and delete are balanced.  Don't.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Check for
heap vars in the result.
---
 gcc/cp/constexpr.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 1a70fda1dc5..45208478c3f 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -1356,6 +1356,7 @@ static tree cxx_eval_constant_expression (const 
constexpr_ctx *, tree,
  value_cat, bool *, bool *, tree * = 
NULL);
 static tree cxx_fold_indirect_ref (const constexpr_ctx *, location_t, tree, 
tree,
   bool * = NULL);
+static tree find_heap_var_refs (tree *, int *, void *);
 
 /* Attempt to evaluate T which represents a call to a builtin function.
We assume here that all builtin functions evaluate to scalar types
@@ -2965,6 +2966,10 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
  cacheable = false;
  break;
}
+ /* Also don't cache a call that returns a deallocated pointer.  */
+ if (cacheable && (cp_walk_tree_without_duplicates
+   (, find_heap_var_refs, NULL)))
+   cacheable = false;
}
 
/* Rewrite all occurrences of the function's RESULT_DECL with the

base-commit: 1b661f3f5e712c951e774b3b91fffe4dac734cc7
prerequisite-patch-id: cc6e608c68f4eb133f6a153f83dfe4f033544cbd
-- 
2.27.0



[pushed] c++: strict constexpr and local vars

2022-05-25 Thread Jason Merrill via Gcc-patches
A change I was working on made constexpr_searcher.cc start to fail, and when
I looked at it I wondered why it had been accepted before.  This turned out
to be because we try to be more flexible about constant-evaluation of static
initializers, as allowed, but we were wrongly doing the same for non-static
initializers as well.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* constexpr.cc (maybe_constant_init_1): Only pass false for
strict when initializing a variable of static duration.

libstdc++-v3/ChangeLog:

* testsuite/20_util/function_objects/constexpr_searcher.cc: Add
constexpr.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-local4.C: New test.
---
 gcc/cp/constexpr.cc | 12 +---
 gcc/testsuite/g++.dg/cpp1y/constexpr-local4.C   | 17 +
 .../function_objects/constexpr_searcher.cc  |  4 ++--
 3 files changed, 28 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-local4.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 388239ea8a8..1a70fda1dc5 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -8301,9 +8301,15 @@ maybe_constant_init_1 (tree t, tree decl, bool 
allow_non_constant,
   else if (CONSTANT_CLASS_P (t) && allow_non_constant)
 /* No evaluation needed.  */;
   else
-t = cxx_eval_outermost_constant_expr (t, allow_non_constant,
- /*strict*/false,
- manifestly_const_eval, false, decl);
+{
+  /* [basic.start.static] allows constant-initialization of variables with
+static or thread storage duration even if it isn't required, but we
+shouldn't bend the rules the same way for automatic variables.  */
+  bool is_static = (decl && DECL_P (decl)
+   && (TREE_STATIC (decl) || DECL_EXTERNAL (decl)));
+  t = cxx_eval_outermost_constant_expr (t, allow_non_constant, !is_static,
+   manifestly_const_eval, false, decl);
+}
   if (TREE_CODE (t) == TARGET_EXPR)
 {
   tree init = TARGET_EXPR_INITIAL (t);
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-local4.C 
b/gcc/testsuite/g++.dg/cpp1y/constexpr-local4.C
new file mode 100644
index 000..bef62488579
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-local4.C
@@ -0,0 +1,17 @@
+// { dg-do compile { target c++14 } }
+
+struct A
+{
+  int i;
+  constexpr A(int i): i(i) {};
+};
+
+const A a = 42;
+
+constexpr int f()
+{
+  const int j = a.i;   // { dg-message "'a'" }
+  return j;
+}
+
+static_assert (f() == 42,"");  // { dg-error "non-constant" }
diff --git 
a/libstdc++-v3/testsuite/20_util/function_objects/constexpr_searcher.cc 
b/libstdc++-v3/testsuite/20_util/function_objects/constexpr_searcher.cc
index 33012200cb0..17069694c1b 100644
--- a/libstdc++-v3/testsuite/20_util/function_objects/constexpr_searcher.cc
+++ b/libstdc++-v3/testsuite/20_util/function_objects/constexpr_searcher.cc
@@ -28,13 +28,13 @@
 
 #include 
 
-const std::string_view
+constexpr std::string_view
 patt = "World";
 
 constexpr std::string_view
 greet = "Hello, Humongous World of Wonder!!!";
 
-const std::wstring_view
+constexpr std::wstring_view
 wpatt = L"World";
 
 constexpr std::wstring_view

base-commit: 1b661f3f5e712c951e774b3b91fffe4dac734cc7
-- 
2.27.0



Re: [PATCH] Canonicalize X&-Y as X*Y in match.pd when Y is [0,1].

2022-05-25 Thread Koning, Paul via Gcc-patches



On May 25, 2022, at 10:39 AM, Roger Sayle 
mailto:ro...@nextmovesoftware.com>> wrote:


On May 25, 2022, at 7:34 AM, Richard Biener via Gcc-patches mailto:patc...@gcc.gnu.org>> wrote:

On Tue, May 24, 2022 at 3:55 PM Roger Sayle
mailto:ro...@nextmovesoftware.com>> wrote:


"For every pessimization, there's an equal and opposite optimization".

In the review of my original patch for PR middle-end/98865, Richard
Biener pointed out that match.pd shouldn't be transforming X*Y into
X&-Y as the former is considered cheaper by tree-ssa's cost model
(operator count).  A corollary of this is that we should instead be
transforming X&-Y into the cheaper X*Y as a preferred canonical form
(especially as RTL expansion now intelligently selects the
appropriate implementation based on the target's costs).

With this patch we now generate identical code for:
int foo(int x, int y) { return -(x&1) & y; } int bar(int x, int y) {
return (x&1) * y; }

What, if anything, does the target description have to do for "the
appropriate
implementation" to be selected?  For example, if the target has an "AND
with
complement" operation, it's probably cheaper than multiply and would be
the
preferred generated code.

RTL expansion will use an AND and NEG instruction pair if that's cheaper
than the cost of a MULT or a synth_mult sequence.  Even, without the
backend providing an rtx_costs function, GCC will default to AND and NEG
having COSTS_N_INSNS(1), and MULT having COSTS_N_INSNS(4).
But consider the case where y is cloned/inlined/CSE'd to have the
value 2, in which (on many targets) case the LSHIFT is cheaper than
a AND and a NEG.

Alas, I don't believe a existence of ANDN, such as with BMI or SSE, has
any impact on the decision, as this is NEG;AND not NOT;AND.  If you
known of any target that has an "AND with negation" instruction, I'll
probably need to tweak RTL expansion to check for that explicitly.

I don't know of one either (in the two's complement world); I misread the minus 
as tilde in the "before".  Sorry about the mixup.

paul




[COMMITTED] Tweak some comments.

2022-05-25 Thread Andrew MacLeod via Gcc-patches

Just fixed some misspelling and such.

pushed.

Andrew
commit 761cc32e5a1c762f91904d2a86980f106a5bc441
Author: Andrew MacLeod 
Date:   Wed May 25 10:39:31 2022 -0400

Tweak comments.

Adjust some mispellings in comments.

* gimple-range-cache.cc: Adjust comments.
* gimple-range-infer.cc: Adjust comments.
* gimple-range-infer.h: Adjust comments.
* gimple-range.cc: Adjust comments.

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 5d5e2bfd0c3..6e73ac706ac 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -1440,7 +1440,7 @@ ranger_cache::apply_inferred_ranges (gimple *s)
   if (infer.num () == 0)
 return;
 
-  // Do not update the on-netry cache for block ending stmts.
+  // Do not update the on-entry cache for block ending stmts.
   if (stmt_ends_bb_p (s))
 {
   edge_iterator ei;
diff --git a/gcc/gimple-range-infer.cc b/gcc/gimple-range-infer.cc
index 8e25830849f..545d4f2de3d 100644
--- a/gcc/gimple-range-infer.cc
+++ b/gcc/gimple-range-infer.cc
@@ -120,7 +120,7 @@ gimple_infer_range::gimple_infer_range (gimple *s)
 
 // -
 
-// This class is an element in list of infered ranges.
+// This class is an element in the list of infered ranges.
 
 class exit_range
 {
diff --git a/gcc/gimple-range-infer.h b/gcc/gimple-range-infer.h
index 629436753e9..412958fe28e 100644
--- a/gcc/gimple-range-infer.h
+++ b/gcc/gimple-range-infer.h
@@ -26,7 +26,7 @@ along with GCC; see the file COPYING3.  If not see
 
 // This class manages an on-demand summary of inferred ranges for a statement.
 // It can be instantiated as required and provides a list of inferred ranges.
-// New inferred ranges should added in the constructor of this class.
+// New inferred ranges should be added in the constructor of this class.
 
 class gimple_infer_range
 {
@@ -49,7 +49,7 @@ private:
 
 // This class manages a list of inferred ranges for each basic block.
 // As inferences are made, they can be registered to a block and later
-// queried.  WHen constructed with a TRUE flag, immediate uses chains are
+// queried.  When constructed with a TRUE flag, immediate uses chains are
 // followed the first time a name is referenced and block populated if
 // there are any inferred ranges.
 
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 08a9c01e91a..53f4865af3b 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -118,7 +118,7 @@ gimple_ranger::range_of_expr (irange , tree expr, gimple *stmt)
   // If name is defined in this block, try to get an range from S.
   if (def_stmt && gimple_bb (def_stmt) == bb)
 	{
-	  // Declared in ths block, if it has a global set, check for an
+	  // Declared in this block, if it has a global set, check for an
 	  // override from a block walk, otherwise calculate it.
 	  if (m_cache.get_global_range (r, expr))
 	m_cache.block_range (r, bb, expr, false);


RE: [PATCH] Canonicalize X&-Y as X*Y in match.pd when Y is [0,1].

2022-05-25 Thread Roger Sayle
 
> > On May 25, 2022, at 7:34 AM, Richard Biener via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > On Tue, May 24, 2022 at 3:55 PM Roger Sayle
>  wrote:
> >>
> >>
> >> "For every pessimization, there's an equal and opposite optimization".
> >>
> >> In the review of my original patch for PR middle-end/98865, Richard
> >> Biener pointed out that match.pd shouldn't be transforming X*Y into
> >> X&-Y as the former is considered cheaper by tree-ssa's cost model
> >> (operator count).  A corollary of this is that we should instead be
> >> transforming X&-Y into the cheaper X*Y as a preferred canonical form
> >> (especially as RTL expansion now intelligently selects the
> >> appropriate implementation based on the target's costs).
> >>
> >> With this patch we now generate identical code for:
> >> int foo(int x, int y) { return -(x&1) & y; } int bar(int x, int y) {
> >> return (x&1) * y; }
> 
> What, if anything, does the target description have to do for "the
appropriate
> implementation" to be selected?  For example, if the target has an "AND
with
> complement" operation, it's probably cheaper than multiply and would be
the
> preferred generated code.

RTL expansion will use an AND and NEG instruction pair if that's cheaper
than the cost of a MULT or a synth_mult sequence.  Even, without the
backend providing an rtx_costs function, GCC will default to AND and NEG
having COSTS_N_INSNS(1), and MULT having COSTS_N_INSNS(4).
But consider the case where y is cloned/inlined/CSE'd to have the
value 2, in which (on many targets) case the LSHIFT is cheaper than
a AND and a NEG.

Alas, I don't believe a existence of ANDN, such as with BMI or SSE, has
any impact on the decision, as this is NEG;AND not NOT;AND.  If you
known of any target that has an "AND with negation" instruction, I'll
probably need to tweak RTL expansion to check for that explicitly.

The correct way to think about this canonicalization, is that the
default implementation of RTL expansion of a multiply by a 0/1
value is to use NEG/AND, and it's only in the extremely rare cases
where a multiply (or synth_mult sequence) is extremely cheap,
for example a single cycle multiply, where this will be used.

Roger
--




[COMMITTED] Use infer instead of side-effect for ranges.

2022-05-25 Thread Andrew MacLeod via Gcc-patches

On 5/20/22 15:18, Bernhard Reutner-Fischer via Gcc-patches wrote:

On 20 May 2022 16:39:20 CEST, Segher Boessenkool  
wrote:

On Fri, May 20, 2022 at 10:11:32AM +0200, Eric Botcazou wrote:

I suggest 'deduce', 'deduction', 'deducing a range'. What the code is
actually doing is deducing that 'b' in 'a / b' cannot be zero. Function in
GCC might be called like 'deduce_ranges_from_stmt'.

Or "infer", "inference", "inferring a range".

"Infer" is great here, yes!

infer (verb):
  deduce or conclude (something) from evidence and reasoning rather than
  from explicit statements

It has exactly the connotation wanted here, I would say.

Infer, deduct, refine
Sound all plausible, a native speaker should probably help decide the bikeshed 
:-)
thanks,


Executive decision made.

gimple-range-side-effect.{h,cc}   -> gimple-range-infer.{h,cc}

class stmt_side_effects  -->  class gimple_infer_range

class side_effect_manager  -->  class infer_range_manager

various APIs/comments with the term "side effect"  renamed to "infer 
range" or some variation of tense.


Bootstraps from scratch on x86_64-pc-linux-gnu with no regressions.  pushed.

Andrew

commit 156d7d8dbc8d65d3958486bc4112a7279935e47d
Author: Andrew MacLeod 
Date:   Tue May 24 11:32:42 2022 -0400

Use infer instead of side-effect for ranges.

Rename the files and classes to reflect the term infer rather than side-effect.

* Makefile.in (OBJS): Use gimple-range-infer.o.
* gimple-range-cache.cc (ranger_cache::fill_block_cache): Change msg.
(ranger_cache::range_from_dom): Rename var side_effect to infer.
(ranger_cache::apply_inferred_ranges): Rename from apply_side_effects.
* gimple-range-cache.h: Include gimple-range-infer.h.
(class ranger_cache): Adjust prototypes, use infer_range_manager.
* gimple-range-infer.cc: Rename from gimple-range-side-effects.cc.
(gimple_infer_range::*): Rename from stmt_side_effects.
(infer_range_manager::*): Rename from side_effect_manager.
* gimple-range-side-effect.cc: Rename.
* gimple-range-side-effect.h: Rename.
* gimple-range-infer.h: Rename from gimple-range-side-effects.h.
(class gimple_infer_range): Rename from stmt_side_effects.
(class infer_range_manager): Rename from side_effect_manager.
* gimple-range.cc (gimple_ranger::register_inferred_ranges): Rename
from register_side_effects.
* gimple-range.h (register_inferred_ranges): Adjust prototype.
* range-op.h: Adjust comment.
* tree-vrp.cc (rvrp_folder::pre_fold_bb): Use register_inferred_ranges.
(rvrp_folder::post_fold_bb): Use register_inferred_ranges.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 97e5450ecb5..731d8dd2a69 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1410,7 +1410,7 @@ OBJS = \
 	gimple-range-edge.o \
 	gimple-range-fold.o \
 	gimple-range-gori.o \
-	gimple-range-side-effect.o \
+	gimple-range-infer.o \
 	gimple-range-trace.o \
 	gimple-ssa-backprop.o \
 	gimple-ssa-evrp.o \
diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index c726393b380..5d5e2bfd0c3 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -944,7 +944,7 @@ bool
 ranger_cache::edge_range (irange , edge e, tree name, enum rfd_mode mode)
 {
   exit_range (r, name, e->src, mode);
-  // If this is not an abnormal edge, check for side effects on exit.
+  // If this is not an abnormal edge, check for inferred ranges on exit.
   if ((e->flags & (EDGE_EH | EDGE_ABNORMAL)) == 0)
 m_exit.maybe_adjust_range (r, name, e->src);
   int_range_max er;
@@ -1251,12 +1251,12 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
 	}
 
 	  // Regardless of whether we have visited pred or not, if the
-	  // pred has side_effects, revisit this block.
+	  // pred has inferred ranges, revisit this block.
 	  // Don't search the DOM tree.
 	  if (m_exit.has_range_p (name, pred))
 	{
 	  if (DEBUG_RANGE_CACHE)
-		fprintf (dump_file, "side effect: update ");
+		fprintf (dump_file, "Inferred range: update ");
 	  m_update->add (node);
 	}
 
@@ -1317,8 +1317,8 @@ ranger_cache::range_from_dom (irange , tree name, basic_block start_bb,
   basic_block bb;
   basic_block prev_bb = start_bb;
 
-  // Track any side effects seen
-  int_range_max side_effect (TREE_TYPE (name));
+  // Track any inferred ranges seen.
+  int_range_max infer (TREE_TYPE (name));
 
   // Range on entry to the DEF block should not be queried.
   gcc_checking_assert (start_bb != def_bb);
@@ -1332,8 +1332,8 @@ ranger_cache::range_from_dom (irange , tree name, basic_block start_bb,
bb;
prev_bb = bb, bb = get_immediate_dominator (CDI_DOMINATORS, bb))
 {
-  // Accumulate any block exit side effects.
-  m_exit.maybe_adjust_range (side_effect, name, bb);
+  // Accumulate 

Re: [PATCH] RISC-V: Don't unconditionally add m, a, f, d in arch-canonicalize

2022-05-25 Thread Kito Cheng via Gcc-patches
Committed, Thanks for fixing my stupid bug :P


On Wed, May 25, 2022 at 9:26 PM Simon Cook  wrote:
>
> This solves an issue where rv32i, etc. are canonicalized to rv32imafd
> since the g->i addition of 'm', 'a', 'f', 'd' is not actually gated by
> whether the input was rv32g/rv64g.
>
> gcc/ChangeLog:
>
> * config/riscv/arch-canonicalize: Only add mafd extension if
> base was rv32/rv64g.
> ---
>  gcc/config/riscv/arch-canonicalize | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/arch-canonicalize
> b/gcc/config/riscv/arch-canonicalize
> index 71b2232b29e..fd7651ac491 100755
> --- a/gcc/config/riscv/arch-canonicalize
> +++ b/gcc/config/riscv/arch-canonicalize
> @@ -73,8 +73,8 @@ def arch_canonicalize(arch, isa_spec):
>std_exts = []
>if arch[:5] in ['rv32e', 'rv32i', 'rv32g', 'rv64i', 'rv64g']:
>  new_arch = arch[:5].replace("g", "i")
> -std_exts = ['m', 'a', 'f', 'd']
>  if arch[:5] in ['rv32g', 'rv64g']:
> +  std_exts = ['m', 'a', 'f', 'd']
>if not is_isa_spec_2p2:
>  extra_long_ext = ['zicsr', 'zifencei']
>else:
> --
> 2.32.1


Re: [PATCH] Canonicalize X&-Y as X*Y in match.pd when Y is [0,1].

2022-05-25 Thread Koning, Paul via Gcc-patches



> On May 25, 2022, at 7:34 AM, Richard Biener via Gcc-patches 
>  wrote:
> 
> On Tue, May 24, 2022 at 3:55 PM Roger Sayle  
> wrote:
>> 
>> 
>> "For every pessimization, there's an equal and opposite optimization".
>> 
>> In the review of my original patch for PR middle-end/98865, Richard
>> Biener pointed out that match.pd shouldn't be transforming X*Y into
>> X&-Y as the former is considered cheaper by tree-ssa's cost model
>> (operator count).  A corollary of this is that we should instead be
>> transforming X&-Y into the cheaper X*Y as a preferred canonical form
>> (especially as RTL expansion now intelligently selects the appropriate
>> implementation based on the target's costs).
>> 
>> With this patch we now generate identical code for:
>> int foo(int x, int y) { return -(x&1) & y; }
>> int bar(int x, int y) { return (x&1) * y; }

What, if anything, does the target description have to do for "the appropriate 
implementation" to be selected?  For example, if the target has an "AND with 
complement" operation, it's probably cheaper than multiply and would be the 
preferred generated code.

paul




Re: [PATCH v2 09/11] OpenMP 5.0 "declare mapper" support for C++

2022-05-25 Thread Jakub Jelinek via Gcc-patches
On Tue, May 24, 2022 at 04:48:13PM +0200, Jakub Jelinek wrote:
> > This version of the patch improves detection of explicitly-mapped struct
> > accesses which inhibit implicitly-triggered user-defined mappers for a
> > target region.
> 
> Will start with a general comment, from looking at the dumps it seems
> handling the mappers in the FE right away for explicit mapping clauses
> and attaching mapper binding clauses for types that are (or could
> conservatively be, including from the recursive mappers themselves) be
> used in the target body and letting gimplification find those var in detail
> and use mapper binding clauses to actually expand it looks like the right
> approach to me.  As I raised in an earlier patch, a big question is if we
> should do map clause sorting on gimplify_scan_omp_clauses or
> gimplify_adjust_omp_clauses or both...
> The conservative discovery of what types we might need to create mapper
> binding clauses for should be probably done only if
> !processing_template_decl.

Oh, and one very important thing I forgot to say yesterday.
With declare mapper but even the general mapping of aggregate is mapping
of all its members/elements individually, we are going to end up with huge
mapping lists.  We need to undo that at compile time whenever possible,
so if we after the declare mapper handling (from explicit or implicit
mappings) and sorting the mapping clauses see that we have say
struct S { int x, y, z[2], w; } s;
and we see map (tofrom: s.x, s.y, s.z[0], s.z[1], s.w) we should turn
that back into map (tofrom: s).  Basically optimize consecutive mappings
of the same kind to one that covers them together.
Then there is the question of padding bits, if there is reasonably small
padding in between mapped fields we could be mapping the padding too.

Jakub



[PATCH] RISC-V: Don't unconditionally add m,a,f,d in arch-canonicalize

2022-05-25 Thread Simon Cook
This solves an issue where rv32i, etc. are canonicalized to rv32imafd
since the g->i addition of 'm', 'a', 'f', 'd' is not actually gated by
whether the input was rv32g/rv64g.

gcc/ChangeLog:

* config/riscv/arch-canonicalize: Only add mafd extension if
base was rv32/rv64g.
---
 gcc/config/riscv/arch-canonicalize | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/arch-canonicalize
b/gcc/config/riscv/arch-canonicalize
index 71b2232b29e..fd7651ac491 100755
--- a/gcc/config/riscv/arch-canonicalize
+++ b/gcc/config/riscv/arch-canonicalize
@@ -73,8 +73,8 @@ def arch_canonicalize(arch, isa_spec):
   std_exts = []
   if arch[:5] in ['rv32e', 'rv32i', 'rv32g', 'rv64i', 'rv64g']:
 new_arch = arch[:5].replace("g", "i")
-std_exts = ['m', 'a', 'f', 'd']
 if arch[:5] in ['rv32g', 'rv64g']:
+  std_exts = ['m', 'a', 'f', 'd']
   if not is_isa_spec_2p2:
 extra_long_ext = ['zicsr', 'zifencei']
   else:
-- 
2.32.1


Re: [PATCH] libgomp: start using LIKELY/UNLIKELY macros

2022-05-25 Thread Jakub Jelinek via Gcc-patches
On Wed, May 25, 2022 at 03:13:09PM +0200, Martin Liška wrote:
> Btw. I noticed one discrepancy:
> 
> ./libgomp/loop_ull.c:
> 
>   /* Cheap overflow protection.  */
>   if (__builtin_expect ((nthreads | ws->chunk_size_ull)
> < 1ULL << (sizeof (gomp_ull)
>* __CHAR_BIT__ / 2 - 1), 1))
> ws->mode = ws->end_ull < (__LONG_LONG_MAX__ * 2ULL + 1
>   - (nthreads + 1) * ws->chunk_size_ull);
> 
> while:
> 
> ./libgomp/loop.c:
> 
>   /* Cheap overflow protection.  */
>   if (__builtin_expect ((nthreads | ws->chunk_size)
> >= 1UL << (sizeof (long)
>* __CHAR_BIT__ / 2 - 1), 0))
> ws->mode = 0;
> 
> So once, it uses __builtin_expect(..., 1) and second time 
> __builtin_expect(..., 0).
> Is it expected?

Yes.  The comparison is different.  The first case is
or_of_the_two_values < 0x8000ULL which is predicted likely (usually both
number of threads and chunk size are small numbers).
While in the other case it is
or_of_the_two_values >= 0x8000UL (or 0x8000UL for 32-bit arches)
and again, we predict both numbers to be small and so the comparison to be
false.

Jakub



Re: [PATCH] libgomp: start using LIKELY/UNLIKELY macros

2022-05-25 Thread Martin Liška
On 5/25/22 13:55, Jakub Jelinek wrote:
> On Wed, May 25, 2022 at 01:52:46PM +0200, Martin Liška wrote:
>> Similarly to g:22d9c8802add09a93308319fc37dd3a0f1125393, I would like to use
>> {UN,}LIKELY macros in libgomp. If the community is fine, I'm planning doing
>> the same in other GCC's libraries.
> 
> I must say I prefer __builtin_expect over the macros any time.
> 
>   Jakub
> 

I respect that.

Btw. I noticed one discrepancy:

./libgomp/loop_ull.c:

/* Cheap overflow protection.  */
if (__builtin_expect ((nthreads | ws->chunk_size_ull)
  < 1ULL << (sizeof (gomp_ull)
 * __CHAR_BIT__ / 2 - 1), 1))
  ws->mode = ws->end_ull < (__LONG_LONG_MAX__ * 2ULL + 1
- (nthreads + 1) * ws->chunk_size_ull);

while:

./libgomp/loop.c:

/* Cheap overflow protection.  */
if (__builtin_expect ((nthreads | ws->chunk_size)
  >= 1UL << (sizeof (long)
 * __CHAR_BIT__ / 2 - 1), 0))
  ws->mode = 0;

So once, it uses __builtin_expect(..., 1) and second time __builtin_expect(..., 
0).
Is it expected?

Thanks,
Martin


Re: [0/9] [middle-end] Add param to vec_perm_const hook to specify mode of input operand

2022-05-25 Thread Richard Biener via Gcc-patches
On Tue, May 24, 2022 at 9:22 PM Prathamesh Kulkarni via Gcc-patches
 wrote:
>
> On Tue, 24 May 2022 at 14:50, Richard Sandiford
>  wrote:
> >
> > Prathamesh Kulkarni  writes:
> > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > > index c5006afc00d..0a3c733ada9 100644
> > > --- a/gcc/doc/tm.texi
> > > +++ b/gcc/doc/tm.texi
> > > @@ -6088,14 +6088,18 @@ for the given scalar type @var{type}.  
> > > @var{is_packed} is false if the scalar
> > >  access using @var{type} is known to be naturally aligned.
> > >  @end deftypefn
> > >
> > > -@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST 
> > > (machine_mode @var{mode}, rtx @var{output}, rtx @var{in0}, rtx @var{in1}, 
> > > const vec_perm_indices @var{})
> > > +@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST 
> > > (machine_mode @var{mode}, machine_mode @var{op_mode}, rtx @var{output}, 
> > > rtx @var{in0}, rtx @var{in1}, const vec_perm_indices @var{})
> > >  This hook is used to test whether the target can permute up to two
> > > -vectors of mode @var{mode} using the permutation vector @code{sel}, and
> > > -also to emit such a permutation.  In the former case @var{in0}, @var{in1}
> > > -and @var{out} are all null.  In the latter case @var{in0} and @var{in1} 
> > > are
> > > -the source vectors and @var{out} is the destination vector; all three are
> > > -operands of mode @var{mode}.  @var{in1} is the same as @var{in0} if
> > > -@var{sel} describes a permutation on one vector instead of two.
> > > +vectors of mode @var{op_mode} using the permutation vector @code{sel},
> > > +producing a vector of mode @var{mode}.The hook is also used to emit such
> >
> > Should be two spaces between “@var{mode}.” and “The”.
> >
> > > +a permutation.
> > > +
> > > +When the hook is being used to test whether the target supports a 
> > > permutation,
> > > +@var{in0}, @var{in1}, and @var{out} are all null.When the hook is being 
> > > used
> >
> > Same here: missing spaces before “When”.
> >
> > > +to emit a permutation, @var{in0} and @var{in1} are the source vectors of 
> > > mode
> > > +@var{op_mode} and @var{out} is the destination vector of mode @var{mode}.
> > > +@var{in1} is the same as @var{in0} if @var{sel} describes a permutation 
> > > on one
> > > +vector instead of two.
> > >
> > >  Return true if the operation is possible, emitting instructions for it
> > >  if rtxes are provided.
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index f5efa77560c..f2a527d9c42 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -7596,6 +7596,8 @@ and,
> > >   (with
> > >{
> > >  tree op0 = @0, op1 = @1, op2 = @2;
> > > +machine_mode result_mode = TYPE_MODE (type);
> > > +machine_mode op_mode = TYPE_MODE (TREE_TYPE (op0));
> > >
> > >  /* Build a vector of integers from the tree mask.  */
> > >  vec_perm_builder builder;
> > > @@ -7703,12 +7705,12 @@ and,
> > >  2-argument version.  */
> > >   tree oldop2 = op2;
> > >   if (sel.ninputs () == 2
> > > -|| can_vec_perm_const_p (TYPE_MODE (type), sel, false))
> > > +|| can_vec_perm_const_p (result_mode, op_mode, sel, false))
> > > op2 = vec_perm_indices_to_tree (TREE_TYPE (op2), sel);
> > >   else
> > > {
> > >   vec_perm_indices sel2 (builder, 2, nelts);
> > > - if (can_vec_perm_const_p (TYPE_MODE (type), sel2, false))
> > > + if (can_vec_perm_const_p (result_mode, op_mode, sel2, 
> > > false))
> > > op2 = vec_perm_indices_to_tree (TREE_TYPE (op2), sel2);
> > >   else
> > > /* Not directly supported with either encoding,
> >
> > Please replace the use of TYPE_MODE here:
> >
> > /* See if the permutation is performing a single element
> >insert from a CONSTRUCTOR or constant and use a BIT_INSERT_EXPR
> >in that case.  But only if the vector mode is supported,
> >otherwise this is invalid GIMPLE.  */
> > if (TYPE_MODE (type) != BLKmode
> >
> > as well.
> >
> > OK with those changes, thanks.
> Thanks, committed the patch in ae8decf1d2b8329af59592b4fa78ee8dfab3ba5e.

So the present state allows to ask can_vec_perm_const_p but the
implementation asks for

431   if (direct_optab_handler (vec_perm_optab, mode) !=
CODE_FOR_nothing)
432 return true;

which then breaks.  Also the VEC_PERMs are not yet valid.  Any reason this
was committed half-way through the review process of the series?

At least I have a user in the vectorizer ready - allowing more permutes
from existing vectors (of different sizes now) to be SLP vectorized.

Thanks,
Richard.

> Thanks,
> Prathamesh
> >
> > Richard


Re: [PATCH] Add GIMPLE switch support to loop unswitching

2022-05-25 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 25 May 2022, Richard Biener via Gcc-patches wrote:

> I guess we might want to (warn about labels in the "toplevel"
> scope in switch statements).  So warn about
> 
> switch (x)
> {
> case 1:
> bar:
> };

That style is actually used quite some time in GCC itself.  Sometimes with 
statements between 'case 1:' and 'bar:'.

> Maybe we just want to warn when the label is otherwise unused?

We do with -Wunused-labels (part of -Wall).  The testcases simply aren't 
compiled with that.


Ciao,
Michael.


Re: [PATCH] c: Improve build_component_ref diagnostics [PR91134]

2022-05-25 Thread Jakub Jelinek via Gcc-patches
On Tue, May 24, 2022 at 09:59:03AM -0400, David Malcolm wrote:
> > Ideally we'd have an automated check that the fix-it hint fixes the
> > code, but failing that, I like to have at least some DejaGnu test
> > coverage for fix-it hints - something like the tests in
> > gcc.dg/fixits.c
> > or gcc.dg/semicolon-fixits.c, perhaps?

Done, see another mail.

> Also, what does the output from:
>   -fdiagnostics-generate-patch
> look like?  That's usually the best way of checking if we're generating
> good fix-it hints.

--- pr91134.c
+++ pr91134.c
@@ -10,19 +10,19 @@
   struct X *pointer = 
   struct Y *yp = 
   struct X **pointerpointer = 
-  int i = *pointerpointer->member; /* { dg-error "'pointerpointer' is a 
pointer to pointer; did you mean to dereference it before applying '->' to 
it\\\?" } */
+  int i = *(*pointerpointer)->member;  /* { dg-error "'pointerpointer' is a 
pointer to pointer; did you mean to dereference it before applying '->' to 
it\\\?" } */
 /* { dg-begin-multiline-output "" }
int i = *pointerpointer->member;
   ^~
 (*)
{ dg-end-multiline-output "" } */
-  int j = pointer.member;  /* { dg-error "'pointer' is a pointer; 
did you mean to use '->'\\\?" } */
+  int j = pointer->member; /* { dg-error "'pointer' is a pointer; 
did you mean to use '->'\\\?" } */
 /* { dg-begin-multiline-output "" }
int j = pointer.member;
   ^
   ->
{ dg-end-multiline-output "" } */
-  int k = yp->x->member;   /* { dg-error "'yp->x' is a pointer to 
pointer; did you mean to dereference it before applying '->' to it\\\?" } */
+  int k = (*yp->x)->member;/* { dg-error "'yp->x' is a pointer to 
pointer; did you mean to dereference it before applying '->' to it\\\?" } */
 /* { dg-begin-multiline-output "" }
int k = yp->x->member;
 ^~

The second and third change actually fix those issues and make it compile,
the first one will generate a different error, because the right
change in that case is int i = (*pointerpointer)->member;
but guessing that in the compiler would be too hard, when
parsing pointerpointer->member we really don't know that
there is * before it...

Jakub



Re: [PATCH] c: Improve build_component_ref diagnostics [PR91134]

2022-05-25 Thread Jakub Jelinek via Gcc-patches
On Tue, May 24, 2022 at 09:43:54AM -0400, Marek Polacek wrote:
> Consider extending the test like

Done, + added the dg-*-multiline-output stuff to test the fixit hints.

Thanks.

Here is what I've committed:

2022-05-25  Jakub Jelinek  

PR c/91134
gcc/c/
* c-tree.h (build_component_ref): Add ARROW_LOC location_t argument.
* c-typeck.cc (build_component_ref): Likewise.  If DATUM is
INDIRECT_REF and ARROW_LOC isn't UNKNOWN_LOCATION, print a different
diagnostic and fixit hint if DATUM has pointer type.
* c-parser.cc (c_parser_postfix_expression,
c_parser_omp_variable_list): Adjust build_component_ref callers.
* gimple-parser.cc (c_parser_gimple_postfix_expression_after_primary):
Likewise.
gcc/objc/
* objc-act.cc (objc_build_component_ref): Adjust build_component_ref
caller.
gcc/testsuite/
* gcc.dg/pr91134.c: New test.

--- gcc/c/c-tree.h.jj   2022-05-25 11:06:58.381515989 +0200
+++ gcc/c/c-tree.h  2022-05-25 13:57:58.975539087 +0200
@@ -699,7 +699,8 @@ extern struct c_expr convert_lvalue_to_r
 extern tree decl_constant_value_1 (tree, bool);
 extern void mark_exp_read (tree);
 extern tree composite_type (tree, tree);
-extern tree build_component_ref (location_t, tree, tree, location_t);
+extern tree build_component_ref (location_t, tree, tree, location_t,
+location_t);
 extern tree build_array_ref (location_t, tree, tree);
 extern tree build_external_ref (location_t, tree, bool, tree *);
 extern void pop_maybe_used (bool);
--- gcc/c/c-typeck.cc.jj2022-05-25 11:06:58.388515915 +0200
+++ gcc/c/c-typeck.cc   2022-05-25 14:19:16.486336943 +0200
@@ -2457,11 +2457,12 @@ should_suggest_deref_p (tree datum_type)
 /* Make an expression to refer to the COMPONENT field of structure or
union value DATUM.  COMPONENT is an IDENTIFIER_NODE.  LOC is the
location of the COMPONENT_REF.  COMPONENT_LOC is the location
-   of COMPONENT.  */
+   of COMPONENT.  ARROW_LOC is the location of the first -> operand if
+   it is from -> operator.  */
 
 tree
 build_component_ref (location_t loc, tree datum, tree component,
-location_t component_loc)
+location_t component_loc, location_t arrow_loc)
 {
   tree type = TREE_TYPE (datum);
   enum tree_code code = TREE_CODE (type);
@@ -2577,11 +2578,23 @@ build_component_ref (location_t loc, tre
   /* Special-case the error message for "ptr.field" for the case
 where the user has confused "." vs "->".  */
   rich_location richloc (line_table, loc);
-  /* "loc" should be the "." token.  */
-  richloc.add_fixit_replace ("->");
-  error_at (,
-   "%qE is a pointer; did you mean to use %<->%>?",
-   datum);
+  if (TREE_CODE (datum) == INDIRECT_REF && arrow_loc != UNKNOWN_LOCATION)
+   {
+ richloc.add_fixit_insert_before (arrow_loc, "(*");
+ richloc.add_fixit_insert_after (arrow_loc, ")");
+ error_at (,
+   "%qE is a pointer to pointer; did you mean to dereference "
+   "it before applying %<->%> to it?",
+   TREE_OPERAND (datum, 0));
+   }
+  else
+   {
+ /* "loc" should be the "." token.  */
+ richloc.add_fixit_replace ("->");
+ error_at (,
+   "%qE is a pointer; did you mean to use %<->%>?",
+   datum);
+   }
   return error_mark_node;
 }
   else if (code != ERROR_MARK)
--- gcc/c/c-parser.cc.jj2022-05-25 11:06:58.355516262 +0200
+++ gcc/c/c-parser.cc   2022-05-25 13:57:58.980539036 +0200
@@ -9235,8 +9235,9 @@ c_parser_postfix_expression (c_parser *p
if (c_parser_next_token_is (parser, CPP_NAME))
  {
c_token *comp_tok = c_parser_peek_token (parser);
-   offsetof_ref = build_component_ref
- (loc, offsetof_ref, comp_tok->value, comp_tok->location);
+   offsetof_ref
+ = build_component_ref (loc, offsetof_ref, comp_tok->value,
+comp_tok->location, UNKNOWN_LOCATION);
c_parser_consume_token (parser);
while (c_parser_next_token_is (parser, CPP_DOT)
   || c_parser_next_token_is (parser,
@@ -9263,9 +9264,11 @@ c_parser_postfix_expression (c_parser *p
break;
  }
c_token *comp_tok = c_parser_peek_token (parser);
-   offsetof_ref = build_component_ref
- (loc, offsetof_ref, comp_tok->value,
-  comp_tok->location);
+   offsetof_ref
+ = build_component_ref (loc, offsetof_ref,
+comp_tok->value,
+comp_tok->location,
+   

Re: [PATCH v5] c++: ICE with temporary of class type in DMI [PR100252]

2022-05-25 Thread Jason Merrill via Gcc-patches

On 5/24/22 16:21, Marek Polacek wrote:

On Tue, May 24, 2022 at 04:01:37PM -0400, Jason Merrill wrote:

On 5/24/22 09:55, Marek Polacek wrote:

On Tue, May 24, 2022 at 08:36:39AM -0400, Jason Merrill wrote:

On 5/16/22 11:36, Marek Polacek wrote:

+static tree
+replace_placeholders_for_class_temp_r (tree *tp, int *, void *data)
+{
+  tree t = *tp;
+  tree full_expr = *static_cast(data);
+
+  /* We're looking for a TARGET_EXPR nested in the whole expression.  */
+  if (TREE_CODE (t) == TARGET_EXPR
+  && !potential_prvalue_result_of (t, full_expr))
+{
+  tree init = TARGET_EXPR_INITIAL (t);
+  while (TREE_CODE (init) == COMPOUND_EXPR)
+   init = TREE_OPERAND (init, 1);


Hmm, how do we get a COMPOUND_EXPR around a CONSTRUCTOR?


Sadly, that's possible for code like (from nsdmi-aggr18.C)

struct D {
int x = 42;
B b = (true, A{x});
};

where the TARGET_EXPR_INITIAL is
<<< Unknown tree: void_cst >>>, {.x=((struct D *) this)->x, .y=(&)->x}


Hmm, perhaps cp_build_compound_expr should build an additional TARGET_EXPR
around the COMPOUND_EXPR but leave the one inside alone. Feel free to
investigate that if you'd like, or the patch is OK as is.


Sorry, I was unclear.  The whole expression is:

TARGET_EXPR >>, 
{.x=((struct D *) this)->x, .y=(&)->x}>)>

so there *is* a TARGET_EXPR around the COMPOUND_EXPR.


Yes, but that's because cp_build_compound_expr changes 
TARGET_EXPR_INITIAL of the TARGET_EXPR from the CONSTRUCTOR to the 
COMPOUND_EXPR; I'm suggesting it might build a new one instead.



We'd have to build
a TARGET_EXPR around the COMPOUND_EXPR's RHS = the CONSTRUCTOR.  Frankly,
I'm not sure if it's worth the effort.  The while loop is somewhat unsightly
but not too bad.


Agreed.

Jason



Re: [PATCH] AArch64: Prioritise init_have_lse_atomics constructor [PR 105708]

2022-05-25 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> Hi Richard,
>
> I've added a comment - as usual it's just a number. A quick grep in gcc and
> glibc showed that priorities 98-101 are used, so I just went a bit below so it
> has a higher priority than typical initializations.

Thanks.  OK for trunk, and for backports after a trial period.

Richard

> Cheers,
> Wilco
>
> Here is v2:
>
> Increase the priority of the init_have_lse_atomics constructor so it runs
> before other constructors. This improves chances that rr works when LSE
> atomics are supported.
>
> Regress and bootstrap pass, OK for commit?
>
> 2022-05-24  Wilco Dijkstra  
>
> libgcc/
> PR libgcc/105708
> * config/aarch64/lse-init.c: Increase constructor priority.
>
> ---
>
> diff --git a/libgcc/config/aarch64/lse-init.c 
> b/libgcc/config/aarch64/lse-init.c
> index 
> fc875b7fe80e947623e570eac130e7a14b516551..33b97c8d766895cf0101a851e1dc4ed6a1a053d9
>  100644
> --- a/libgcc/config/aarch64/lse-init.c
> +++ b/libgcc/config/aarch64/lse-init.c
> @@ -38,7 +38,9 @@ _Bool __aarch64_have_lse_atomics
>
>  unsigned long int __getauxval (unsigned long int);
>
> -static void __attribute__((constructor))
> +/* Use a higher priority to ensure it runs before user constructors
> +   and library constructors with priority 100. */
> +static void __attribute__((constructor (90)))
>  init_have_lse_atomics (void)
>  {
>unsigned long hwcap = __getauxval (AT_HWCAP);


Re: [PATCH] libgomp: start using LIKELY/UNLIKELY macros

2022-05-25 Thread Jakub Jelinek via Gcc-patches
On Wed, May 25, 2022 at 01:52:46PM +0200, Martin Liška wrote:
> Similarly to g:22d9c8802add09a93308319fc37dd3a0f1125393, I would like to use
> {UN,}LIKELY macros in libgomp. If the community is fine, I'm planning doing
> the same in other GCC's libraries.

I must say I prefer __builtin_expect over the macros any time.

Jakub



[PATCH] libgomp: start using LIKELY/UNLIKELY macros

2022-05-25 Thread Martin Liška
Similarly to g:22d9c8802add09a93308319fc37dd3a0f1125393, I would like to use
{UN,}LIKELY macros in libgomp. If the community is fine, I'm planning doing
the same in other GCC's libraries.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

libgomp/ChangeLog:

* libgomp.h (LIKELY): Define macro.
(UNLIKELY): Likewise.
* affinity-fmt.c (gomp_display_string): Use the macros.
(gomp_display_repeat): Likewise.
* alloc.c (defined): Likewise.
* allocator.c (omp_aligned_alloc): Likewise.
(GOMP_alloc): Likewise.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* config/gcn/bar.c (gomp_barrier_wait_end): Likewise.
(gomp_team_barrier_wait_end): Likewise.
(gomp_team_barrier_wait_final): Likewise.
(gomp_team_barrier_wait_cancel_end): Likewise.
* config/gcn/bar.h (gomp_team_barrier_cancelled): Likewise.
* config/linux/alpha/futex.h (futex_wait): Likewise.
(futex_wake): Likewise.
* config/linux/bar.c (gomp_barrier_wait_end): Likewise.
(gomp_team_barrier_wait_end): Likewise.
(gomp_team_barrier_wait_final): Likewise.
(gomp_team_barrier_wait_cancel_end): Likewise.
* config/linux/bar.h (gomp_team_barrier_cancelled): Likewise.
* config/linux/futex.h (futex_wait): Likewise.
(futex_wake): Likewise.
* config/linux/ia64/futex.h (futex_wait): Likewise.
(futex_wake): Likewise.
* config/linux/lock.c (gomp_tid): Likewise.
* config/linux/mutex.h (gomp_mutex_unlock): Likewise.
* config/linux/powerpc/futex.h (sys_futex0): Likewise.
(futex_wait): Likewise.
(futex_wake): Likewise.
* config/linux/s390/futex.h (futex_wait): Likewise.
(futex_wake): Likewise.
* config/linux/sem.c (gomp_sem_wait_slow): Likewise.
* config/linux/sem.h (gomp_sem_post): Likewise.
* config/linux/sparc/futex.h (futex_wait): Likewise.
(futex_wake): Likewise.
* config/linux/wait.h (do_spin): Likewise.
* config/linux/x86/futex.h (futex_wait): Likewise.
(futex_wake): Likewise.
* config/nvptx/bar.c (futex_wait): Likewise.
* config/nvptx/bar.h (gomp_team_barrier_cancelled): Likewise.
* config/posix/bar.h (gomp_team_barrier_cancelled): Likewise.
* config/posix/pool.h (gomp_get_thread_pool): Likewise.
* config/rtems/bar.c (do_spin): Likewise.
* config/rtems/bar.h (gomp_team_barrier_cancelled): Likewise.
* config/rtems/pool.h (gomp_get_own_thread_pool): Likewise.
(gomp_get_thread_pool): Likewise.
* iter.c (gomp_iter_dynamic_next): Likewise.
(gomp_iter_guided_next): Likewise.
* iter_ull.c (gomp_iter_ull_static_next): Likewise.
(gomp_iter_ull_dynamic_next_locked): Likewise.
(gomp_iter_ull_dynamic_next): Likewise.
(gomp_iter_ull_guided_next_locked): Likewise.
(gomp_iter_ull_guided_next): Likewise.
(gomp_vdebug): Likewise.
(gomp_debug): Likewise.
(gomp_finish_task): Likewise.
(gomp_work_share_init_done): Likewise.
* loop.c (gomp_loop_init): Likewise.
* loop_ull.c (gomp_loop_ull_init): Likewise.
* oacc-int.h (GOACC_PROF_ENABLED): Likewise.
* oacc-profiling.c (_goacc_profiling_dispatch_p): Likewise.
(_goacc_profiling_setup_p): Likewise.
* ordered.c (GOMP_doacross_post): Likewise.
(GOMP_doacross_wait): Likewise.
(GOMP_doacross_ull_post): Likewise.
(GOMP_doacross_ull_wait): Likewise.
* parallel.c (gomp_resolve_num_threads): Likewise.
(GOMP_parallel_end): Likewise.
* plugin/plugin-gcn.c (alloc_by_agent): Likewise.
(gcn_exec): Likewise.
(GOMP_OFFLOAD_free): Likewise.
* plugin/plugin-nvptx.c (nvptx_exec): Likewise.
(nvptx_alloc): Likewise.
(nvptx_free): Likewise.
(GOMP_OFFLOAD_openacc_exec): Likewise.
(GOMP_OFFLOAD_openacc_async_exec): Likewise.
* priority_queue.h (priority_queue_multi_p): Likewise.
(priority_tree_insert): Likewise.
* single.c (GOMP_single_start): Likewise.
* target.c (gomp_copy_host2dev): Likewise.
(gomp_copy_dev2host): Likewise.
(GOMP_target_ext): Likewise.
(GOMP_target_update_ext): Likewise.
(GOMP_target_enter_exit_data): Likewise.
* task.c (gomp_task_handle_depend): Likewise.
(GOMP_task): Likewise.
(gomp_create_target_task): Likewise.
(priority_list_downgrade_task): Likewise.
(gomp_task_run_pre): Likewise.
(gomp_task_run_post_handle_dependers): Likewise.
(gomp_task_run_post_remove_parent): Likewise.
(gomp_barrier_handle_tasks): Likewise.
(GOMP_taskwait): Likewise.
(GOMP_taskwait_depend): Likewise.
(gomp_task_maybe_wait_for_dependencies): 

Re: [patch] [wwwdocs]+[invoke.texi] Update GCN for gfx90a (was: Re: [committed] amdgcn: Add gfx90a support)

2022-05-25 Thread Andrew Stubbs

On 25/05/2022 12:16, Tobias Burnus wrote:

On 25.05.22 11:18, Andrew Stubbs wrote:

On 24/05/2022 17:44, Tobias Burnus wrote:

On 24.05.22 17:31, Andrew Stubbs wrote:

amdgcn: Add gfx90a support
I've deliberately avoided the MI100 and MI200 names because they're 
really not that simple. MI100 is gfx908, but MI150 is gfx906 and MI125 
is gfx900.


(The last two were misread – they should be MI50 and MI25.)

The MI100 and MI200 names show up in the documents:

* '"AMD Instinct MI100" Instruction Set Architecture / Reference Guide'
   
https://developer.amd.com/wp-content/resources/CDNA1_Shader_ISA_14December2020.pdf 


* '"AMD Instinct MI200" Instruction Set Architecture / Reference Guide'
   
https://developer.amd.com/wp-content/resources/CDNA2_Shader_ISA_4February2022.pdf 



And to update the gcc-13/changes.html. 

I can only see one option?


In one patch, I had two  to choose from, in the revised follow-up 
patch,

which was compile (nothing to choose from).

Updated diffs attached. — OK?


OK, sorry for the confusion.


[...] GCN now requires llvm-mc of LLVM 13.0.1 or higher [..]
I'm going to update the Wiki page today. I'm not sure where else it is 
currently documented.


I think that's the only place.


It's done now.

Andrew



Re: [PATCH] Canonicalize X&-Y as X*Y in match.pd when Y is [0,1].

2022-05-25 Thread Richard Biener via Gcc-patches
On Tue, May 24, 2022 at 3:55 PM Roger Sayle  wrote:
>
>
> "For every pessimization, there's an equal and opposite optimization".
>
> In the review of my original patch for PR middle-end/98865, Richard
> Biener pointed out that match.pd shouldn't be transforming X*Y into
> X&-Y as the former is considered cheaper by tree-ssa's cost model
> (operator count).  A corollary of this is that we should instead be
> transforming X&-Y into the cheaper X*Y as a preferred canonical form
> (especially as RTL expansion now intelligently selects the appropriate
> implementation based on the target's costs).
>
> With this patch we now generate identical code for:
> int foo(int x, int y) { return -(x&1) & y; }
> int bar(int x, int y) { return (x&1) * y; }
>
> specifically on x86_64-pc-linux-gnu both use and/neg/and when
> optimizing for speed, but both use and/mul when optimizing for
> size.
>
> One minor wrinkle/improvement is that this patch includes three
> additional optimizations (that account for the change in canonical
> form) to continue to optimize PR92834 and PR94786.

Those are presumably the preceeding patterns which match

(convert? (negate@4 (convert? (cmp@5 @2 @3)))

for the multiplication operand - those should be zero_one_valued_p
maybe with the exception of the conversions?  So are the original
patterns still needed after the canonicalization to a multiplication?

Otherwise looks good to me.

Thanks,
Richard.

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-05-24  Roger Sayle  
>
> gcc/ChangeLog
> * match.pd (match_zero_one_valued_p): New predicate.
> (mult @0 @1): Use zero_one_valued_p for optimization to the
> expression "bit_and @0 @1".
> (bit_and (negate zero_one_valued_p@0) @1): Optimize to MULT_EXPR.
> (plus @0 (mult (minus @1 @0) zero_one_valued_p@2): New transform.
> (minus @0 (mult (minus @0 @1) zero_one_valued_p@2): Likewise.
> (bit_xor @0 (mult (bit_xor @0 @1) zero_one_valued_p@2): Likewise.
>
> gcc/testsuite/ChangeLog
> * gcc.dg/pr98865.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>


Re: [patch] [wwwdocs]+[invoke.texi] Update GCN for gfx90a (was: Re: [committed] amdgcn: Add gfx90a support)

2022-05-25 Thread Tobias Burnus

On 25.05.22 11:18, Andrew Stubbs wrote:

On 24/05/2022 17:44, Tobias Burnus wrote:

On 24.05.22 17:31, Andrew Stubbs wrote:

amdgcn: Add gfx90a support

I've deliberately avoided the MI100 and MI200 names because they're
really not that simple. MI100 is gfx908, but MI150 is gfx906 and MI125
is gfx900.


(The last two were misread – they should be MI50 and MI25.)

The MI100 and MI200 names show up in the documents:

* '"AMD Instinct MI100" Instruction Set Architecture / Reference Guide'
  
https://developer.amd.com/wp-content/resources/CDNA1_Shader_ISA_14December2020.pdf
* '"AMD Instinct MI200" Instruction Set Architecture / Reference Guide'
  
https://developer.amd.com/wp-content/resources/CDNA2_Shader_ISA_4February2022.pdf


And to update the gcc-13/changes.html.

I can only see one option?


In one patch, I had two  to choose from, in the revised follow-up patch,
which was compile (nothing to choose from).

Updated diffs attached. — OK?


[...] GCN now requires llvm-mc of LLVM 13.0.1 or higher [..]

I'm going to update the Wiki page today. I'm not sure where else it is
currently documented.


I think that's the only place.

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
GCN: Add gfx908/gfx90a to -march/-mtune in invoke.texi

gcc/
	* doc/invoke.texi (AMD GCN Options): Add gfx908/gfx90a.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 12f834ff01d..71098d86313 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -19737,7 +19737,6 @@ Set architecture type or tuning for @var{gpu}. Supported values for @var{gpu}
 are
 
 @table @samp
-@opindex fiji
 @item fiji
 Compile for GCN3 Fiji devices (gfx803).
 
@@ -19747,6 +19746,12 @@ Compile for GCN5 Vega 10 devices (gfx900).
 @item gfx906
 Compile for GCN5 Vega 20 devices (gfx906).
 
+@item gfx908
+Compile for CDNA1 Instinct MI100 series devices (gfx908).
+
+@item gfx90a
+Compile for CDNA2 Instinct MI200 series devices (gfx90a).
+
 @end table
 
 @item -msram-ecc=on
gcc-13/changes.html: Add gfx90a to GCN

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index cc0e15eb..c2610412 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -99,6 +99,13 @@ a work-in-progress.
 
 
 
+AMD Radeon (GCN)
+
+  Support for the Instinct MI200 series devices (https://gcc.gnu.org/onlinedocs/gcc/AMD-GCN-Options.html;>
+  gfx90a) has been added.
+
+
 
 
 


[committed] d: add more 'final' and 'override' to gcc/d/*.cc 'visit' impls

2022-05-25 Thread Iain Buclaw via Gcc-patches
Hi,

The first round of adding these missed several more cases in other
files where the Visitor pattern is used in the D front-end.

Bootstrapped on x86_64-linux-gnu, and committed to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* expr.cc: Add "final" and "override" to all "visit" vfunc decls
as appropriate.
* imports.cc: Likewise.
* typeinfo.cc: Likewise.

Signed-off-by: Iain Buclaw 
---
 gcc/d/expr.cc | 110 +++---
 gcc/d/imports.cc  |  26 +--
 gcc/d/typeinfo.cc |  22 +-
 3 files changed, 79 insertions(+), 79 deletions(-)

diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index c259e7df6d5..7edcbc47abc 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -234,7 +234,7 @@ public:
   /* Visitor interfaces, each Expression class should have
  overridden the default.  */
 
-  void visit (Expression *)
+  void visit (Expression *) final override
   {
 gcc_unreachable ();
   }
@@ -243,7 +243,7 @@ public:
  expression is void, then the resulting type is void.  Otherwise
  they are implicitly converted to a common type.  */
 
-  void visit (CondExp *e)
+  void visit (CondExp *e) final override
   {
 tree cond = convert_for_condition (build_expr (e->econd),
   e->econd->type);
@@ -263,7 +263,7 @@ public:
  usual conversions to bring them to a common type before comparison.
  The result type is bool.  */
 
-  void visit (IdentityExp *e)
+  void visit (IdentityExp *e) final override
   {
 tree_code code = (e->op == EXP::identity) ? EQ_EXPR : NE_EXPR;
 Type *tb1 = e->e1->type->toBasetype ();
@@ -328,7 +328,7 @@ public:
  equality or inequality.  Operands go through the usual conversions to 
bring
  them to a common type before comparison.  The result type is bool.  */
 
-  void visit (EqualExp *e)
+  void visit (EqualExp *e) final override
   {
 Type *tb1 = e->e1->type->toBasetype ();
 Type *tb2 = e->e2->type->toBasetype ();
@@ -475,7 +475,7 @@ public:
  exists in an associative array.  The result is a pointer to the
  element, or null if false.  */
 
-  void visit (InExp *e)
+  void visit (InExp *e) final override
   {
 Type *tb2 = e->e2->type->toBasetype ();
 Type *tkey = tb2->isTypeAArray ()->index->toBasetype ();
@@ -490,7 +490,7 @@ public:
 
   /* Build a relational expression.  The result type is bool.  */
 
-  void visit (CmpExp *e)
+  void visit (CmpExp *e) final override
   {
 Type *tb1 = e->e1->type->toBasetype ();
 Type *tb2 = e->e2->type->toBasetype ();
@@ -539,7 +539,7 @@ public:
  expression is void, then the resulting type is void.  Otherwise the
  result is bool.  */
 
-  void visit (LogicalExp *e)
+  void visit (LogicalExp *e) final override
   {
 tree_code code = (e->op == EXP::andAnd) ? TRUTH_ANDIF_EXPR : 
TRUTH_ORIF_EXPR;
 
@@ -571,7 +571,7 @@ public:
   /* Build a binary operand expression.  Operands go through usual arithmetic
  conversions to bring them to a common type before evaluating.  */
 
-  void visit (BinExp *e)
+  void visit (BinExp *e) final override
   {
 tree_code code;
 
@@ -666,7 +666,7 @@ public:
  same type, producing a dynamic array with the result.  If one operand
  is an element type, that element is converted to an array of length 1.  */
 
-  void visit (CatExp *e)
+  void visit (CatExp *e) final override
   {
 Type *tb1 = e->e1->type->toBasetype ();
 Type *tb2 = e->e2->type->toBasetype ();
@@ -745,7 +745,7 @@ public:
   /* Build an assignment operator expression.  The right operand is implicitly
  converted to the type of the left operand, and assigned to it.  */
 
-  void visit (BinAssignExp *e)
+  void visit (BinAssignExp *e) final override
   {
 tree_code code;
 Expression *e1b = e->e1;
@@ -818,7 +818,7 @@ public:
   /* Build a concat assignment expression.  The right operand is appended
  to the left operand.  */
 
-  void visit (CatAssignExp *e)
+  void visit (CatAssignExp *e) final override
   {
 Type *tb1 = e->e1->type->toBasetype ();
 Type *tb2 = e->e2->type->toBasetype ();
@@ -861,7 +861,7 @@ public:
   /* Build an assignment expression.  The right operand is implicitly
  converted to the type of the left operand, and assigned to it.  */
 
-  void visit (AssignExp *e)
+  void visit (AssignExp *e) final override
   {
 /* First, handle special assignment semantics.  */
 
@@ -1146,7 +1146,7 @@ public:
 
   /* Build a throw expression.  */
 
-  void visit (ThrowExp *e)
+  void visit (ThrowExp *e) final override
   {
 tree arg = build_expr_dtor (e->e1);
 this->result_ = build_libcall (LIBCALL_THROW, Type::tvoid, 1, arg);
@@ -1154,7 +1154,7 @@ public:
 
   /* Build a postfix expression.  */
 
-  void visit (PostExp *e)
+  void visit (PostExp *e) final override
   {
 tree result;
 
@@ -1177,7 +1177,7 @@ public:
 
   /* Build an index expression.  */
 
-  void visit (IndexExp *e)
+  void visit 

Re: [PATCH] Add GIMPLE switch support to loop unswitching

2022-05-25 Thread Richard Biener via Gcc-patches
On Wed, 25 May 2022, David Malcolm wrote:

> On Wed, 2022-05-25 at 10:36 +0200, Richard Biener via Gcc-patches
> wrote:
> > This patch adds support to unswitch loops with switch statements
> > based on invariant index.  It furthermore reworks the cost model
> > to allow an overall budget of statements to be created per original
> > loop by all unswitching opportunities in the loop.  Compared to
> > the original all unswitching opportunities in a loop are
> > pre-evaluated before the first transform which will allow future
> > changes to select the most profitable candidates first.
> > 
> > To efficiently support switch statements the pass now uses
> > ranger to simplify switch statements and conditions in loop
> > copies based on ranges extracted from the recorded set of
> > predicates unswitched.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk.
> > 
> > Richard.
> > 
> 
> [...snip...]
> 
> > diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-10.c
> > b/gcc/testsuite/gcc.dg/loop-unswitch-10.c
> > new file mode 100644
> > index 000..5e4f16e2935
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/loop-unswitch-10.c
> > @@ -0,0 +1,56 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-
> > optimized" } */
> > +
> > +int
> > +__attribute__((noipa))
> > +foo(double *a, double *b, double *c, double *d, double *r, int size,
> > int order)
> > +{
> > +  for (int i = 0; i < size; i++)
> > +  {
> > +    double tmp, tmp2;
> > +
> > +    switch(order)
> > +    {
> > +  case 0:
> > +    tmp = -8 * a[i];
> > +    tmp2 = 2 * b[i];
> > +    break;
> > +  case 1: 
> > +    tmp = 3 * a[i] -  2 * b[i];
> > +    tmp2 = 5 * b[i] - 2 * c[i];
> > +    break;
> > +  case 2:
> > +    tmp = 9 * a[i] +  2 * b[i] + c[i];
> > +    tmp2 = 4 * b[i] + 2 * c[i] + 8 * d[i];
> > +    break;
> > +  case 3:
> > +    tmp = 3 * a[i] +  2 * b[i] - c[i];
> > +    tmp2 = b[i] - 2 * c[i] + 8 * d[i];
> > +    break;
> > +  defaut:
> > +    __builtin_unreachable ();
> 
> I'm guessing "defaut:" here is a typo for "default:" i.e. it's an
> unused label named "defaut", when presumably you meant to have an
> unreachable default case.  Does this affect the loop unswitching logic?
> 
> I wonder if we warn for this (or if we can/should?)

I guess we might want to (warn about labels in the "toplevel"
scope in switch statements).  So warn about

switch (x)
{
case 1:
bar:
};

the 'case' might be missing here (if 'bar' is a valid enumeration
constant for example) but not about

switch (x)
{
case 1:
 {
bar:
 }
};

Maybe we just want to warn when the label is otherwise unused?

> Looking at the patch, it seems to also be present in:
>   gcc/testsuite/gcc.dg/loop-unswitch-11.c
>   gcc/testsuite/gcc.dg/loop-unswitch-14.c
> but I might have missed some.

Heh - how did you spot these but me not looking over the patch
multiple times ...

And no, it doesn't change behavior so I just pushed a fix.

Richard.

> 
> 
> Dave
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[PATCH] tree-optimization/105726 - adjust array bound heuristic

2022-05-25 Thread Richard Biener via Gcc-patches
There's heuristic to detect ptr[1].a[...] out of bound accesses
reasoning that if ptr points to an array of aggregates a trailing
incomplete array has to have size zero.  The following more
thoroughly constrains the cases this applies to avoid false
positive diagnostics.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

2022-05-25  Richard Biener  

PR tree-optimization/105726
* gimple-ssa-warn-restrict.cc (builtin_memref::set_base_and_offset):
Constrain array-of-flexarray case more.

* g++.dg/warn/Warray-bounds-27.C: New testcase.
---
 gcc/gimple-ssa-warn-restrict.cc  | 22 
 gcc/testsuite/g++.dg/warn/Warray-bounds-27.C | 16 ++
 2 files changed, 29 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Warray-bounds-27.C

diff --git a/gcc/gimple-ssa-warn-restrict.cc b/gcc/gimple-ssa-warn-restrict.cc
index b678e806da3..734cdd7f5b4 100644
--- a/gcc/gimple-ssa-warn-restrict.cc
+++ b/gcc/gimple-ssa-warn-restrict.cc
@@ -525,7 +525,6 @@ builtin_memref::set_base_and_offset (tree expr)
 {
   tree memrefoff = fold_convert (ptrdiff_type_node, TREE_OPERAND (base, 
1));
   extend_offset_range (memrefoff);
-  base = TREE_OPERAND (base, 0);
 
   if (refoff != HOST_WIDE_INT_MIN
  && TREE_CODE (expr) == COMPONENT_REF)
@@ -538,14 +537,19 @@ builtin_memref::set_base_and_offset (tree expr)
 REFOFF is set to s[1].b - (char*)s.  */
  offset_int off = tree_to_shwi (memrefoff);
  refoff += off;
-   }
-
-  if (!integer_zerop (memrefoff))
-   /* A non-zero offset into an array of struct with flexible array
-  members implies that the array is empty because there is no
-  way to initialize such a member when it belongs to an array.
-  This must be some sort of a bug.  */
-   refsize = 0;
+
+ if (!integer_zerop (memrefoff)
+ && !COMPLETE_TYPE_P (TREE_TYPE (expr))
+ && multiple_of_p (sizetype, memrefoff,
+   TYPE_SIZE_UNIT (TREE_TYPE (base)), true))
+   /* A non-zero offset into an array of struct with flexible array
+  members implies that the array is empty because there is no
+  way to initialize such a member when it belongs to an array.
+  This must be some sort of a bug.  */
+   refsize = 0;
+   }
+
+  base = TREE_OPERAND (base, 0);
 }
 
   if (TREE_CODE (ref) == COMPONENT_REF)
diff --git a/gcc/testsuite/g++.dg/warn/Warray-bounds-27.C 
b/gcc/testsuite/g++.dg/warn/Warray-bounds-27.C
new file mode 100644
index 000..06ce089c4b0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Warray-bounds-27.C
@@ -0,0 +1,16 @@
+// PR105726
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+// { dg-options "-O2 -Warray-bounds" }
+
+#include 
+#include 
+
+struct X {
+char pad[4];
+std::array mField;
+};
+
+void encode(char* aBuffer, const X& aMessage) {
+strncpy(aBuffer, aMessage.mField.data(), 1); // { dg-bogus "bounds" }
+}
-- 
2.35.3


Re: [PATCH] AArch64: Prioritise init_have_lse_atomics constructor [PR 105708]

2022-05-25 Thread Wilco Dijkstra via Gcc-patches
Hi Richard,

I've added a comment - as usual it's just a number. A quick grep in gcc and
glibc showed that priorities 98-101 are used, so I just went a bit below so it
has a higher priority than typical initializations.

Cheers,
Wilco

Here is v2:

Increase the priority of the init_have_lse_atomics constructor so it runs
before other constructors. This improves chances that rr works when LSE
atomics are supported.

Regress and bootstrap pass, OK for commit?

2022-05-24  Wilco Dijkstra  

libgcc/
PR libgcc/105708
* config/aarch64/lse-init.c: Increase constructor priority.

---

diff --git a/libgcc/config/aarch64/lse-init.c b/libgcc/config/aarch64/lse-init.c
index 
fc875b7fe80e947623e570eac130e7a14b516551..33b97c8d766895cf0101a851e1dc4ed6a1a053d9
 100644
--- a/libgcc/config/aarch64/lse-init.c
+++ b/libgcc/config/aarch64/lse-init.c
@@ -38,7 +38,9 @@ _Bool __aarch64_have_lse_atomics
 
 unsigned long int __getauxval (unsigned long int);
 
-static void __attribute__((constructor))
+/* Use a higher priority to ensure it runs before user constructors
+   and library constructors with priority 100. */
+static void __attribute__((constructor (90)))
 init_have_lse_atomics (void)
 {
   unsigned long hwcap = __getauxval (AT_HWCAP);


Re: [PATCH] Add GIMPLE switch support to loop unswitching

2022-05-25 Thread David Malcolm via Gcc-patches
On Wed, 2022-05-25 at 10:36 +0200, Richard Biener via Gcc-patches
wrote:
> This patch adds support to unswitch loops with switch statements
> based on invariant index.  It furthermore reworks the cost model
> to allow an overall budget of statements to be created per original
> loop by all unswitching opportunities in the loop.  Compared to
> the original all unswitching opportunities in a loop are
> pre-evaluated before the first transform which will allow future
> changes to select the most profitable candidates first.
> 
> To efficiently support switch statements the pass now uses
> ranger to simplify switch statements and conditions in loop
> copies based on ranges extracted from the recorded set of
> predicates unswitched.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk.
> 
> Richard.
> 

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-10.c
> b/gcc/testsuite/gcc.dg/loop-unswitch-10.c
> new file mode 100644
> index 000..5e4f16e2935
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/loop-unswitch-10.c
> @@ -0,0 +1,56 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-
> optimized" } */
> +
> +int
> +__attribute__((noipa))
> +foo(double *a, double *b, double *c, double *d, double *r, int size,
> int order)
> +{
> +  for (int i = 0; i < size; i++)
> +  {
> +    double tmp, tmp2;
> +
> +    switch(order)
> +    {
> +  case 0:
> +    tmp = -8 * a[i];
> +    tmp2 = 2 * b[i];
> +    break;
> +  case 1: 
> +    tmp = 3 * a[i] -  2 * b[i];
> +    tmp2 = 5 * b[i] - 2 * c[i];
> +    break;
> +  case 2:
> +    tmp = 9 * a[i] +  2 * b[i] + c[i];
> +    tmp2 = 4 * b[i] + 2 * c[i] + 8 * d[i];
> +    break;
> +  case 3:
> +    tmp = 3 * a[i] +  2 * b[i] - c[i];
> +    tmp2 = b[i] - 2 * c[i] + 8 * d[i];
> +    break;
> +  defaut:
> +    __builtin_unreachable ();

I'm guessing "defaut:" here is a typo for "default:" i.e. it's an
unused label named "defaut", when presumably you meant to have an
unreachable default case.  Does this affect the loop unswitching logic?

I wonder if we warn for this (or if we can/should?)

Looking at the patch, it seems to also be present in:
  gcc/testsuite/gcc.dg/loop-unswitch-11.c
  gcc/testsuite/gcc.dg/loop-unswitch-14.c
but I might have missed some.


Dave



Re: [PATCH] asan: Fix up instrumentation of assignments which are both loads and stores [PR105714]

2022-05-25 Thread Richard Biener via Gcc-patches
On Wed, 25 May 2022, Jakub Jelinek wrote:

> Hi!
> 
> On the following testcase with -Os asan pass sees:
>[local count: 354334800]:
>   # h_21 = PHI 
>   *c.3_5 = *d.2_4;
>   h_15 = h_21 + 1;
>   if (h_15 != 3)
> goto ; [75.00%]
>   else
> goto ; [25.00%]
> 
>[local count: 118111600]:
>   *c.3_5 = MEM[(struct a *) + 12B];
>   _13 = c.3_5->x;
>   return _13;
> It instruments the
>   *c.3_5 = *d.2_4;
> assignment by adding
>   .ASAN_CHECK (7, c.3_5, 4, 4);
>   .ASAN_CHECK (6, d.2_4, 4, 4);
> before it (which later lowers to checking the corresponding shadow
> memory).  But when considering instrumentation of
>   *c.3_5 = MEM[(struct a *) + 12B];
> it doesn't instrument anything, because it sees that *c.3_5 store is
> already instrumented in a dominating block and so there is no need
> to instrument *c.3_5 store again (i.e. add another
>   .ASAN_CHECK (7, c.3_5, 4, 4);
> ).  That is true, but misses the fact that we still want to
> instrument the MEM[(struct a *) + 12B] load.
> 
> The following patch fixes that by changing has_stmt_been_instrumented_p
> to consider both store and load in the assignment if it does both
> (returning true iff both have been instrumented).
> That matches how we handle e.g. builtin calls, where we also perform AND
> of all the memory locs involved in the call.
> 
> I've verified that we still don't add the redundant
>   .ASAN_CHECK (7, c.3_5, 4, 4);
> call but just add
>   _18 = [(struct a *) + 12B];
>   .ASAN_CHECK (6, _18, 4, 4);
> to instrument the load.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.
 
> 2022-05-25  Jakub Jelinek  
> 
>   PR sanitizer/105714
>   * asan.cc (has_stmt_been_instrumented_p): For assignments which
>   are both stores and loads, return true only if both destination
>   and source have been instrumented.
> 
>   * gcc.dg/asan/pr105714.c: New test.
> 
> --- gcc/asan.cc.jj2022-05-12 08:27:56.923834018 +0200
> +++ gcc/asan.cc   2022-05-24 11:39:28.527258357 +0200
> @@ -1285,7 +1285,20 @@ has_stmt_been_instrumented_p (gimple *st
>  
>if (get_mem_ref_of_assignment (as_a  (stmt), ,
>_is_store))
> - return has_mem_ref_been_instrumented ();
> + {
> +   if (!has_mem_ref_been_instrumented ())
> + return false;
> +   if (r_is_store && gimple_assign_load_p (stmt))
> + {
> +   asan_mem_ref src;
> +   asan_mem_ref_init (, NULL, 1);
> +   src.start = gimple_assign_rhs1 (stmt);
> +   src.access_size = int_size_in_bytes (TREE_TYPE (src.start));
> +   if (!has_mem_ref_been_instrumented ())
> + return false;
> + }
> +   return true;
> + }
>  }
>else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
>  {
> --- gcc/testsuite/gcc.dg/asan/pr105714.c.jj   2022-05-24 11:50:26.753570348 
> +0200
> +++ gcc/testsuite/gcc.dg/asan/pr105714.c  2022-05-24 11:51:01.074225766 
> +0200
> @@ -0,0 +1,33 @@
> +/* PR sanitizer/105714 */
> +/* { dg-do run } */
> +/* { dg-skip-if "" { *-*-* } { "*" } { "-Os" } } */
> +/* { dg-shouldfail "asan" } */
> +
> +struct A { int x; };
> +struct A b[2];
> +struct A *c = b, *d = b;
> +int e;
> +
> +int
> +foo ()
> +{
> +  for (e = 0; e < 1; e++)
> +{
> +  int i[1];
> +  i;
> +}
> +  for (int h = 0; h < 3; h++)
> +*c = *d;
> +  *c = *(b + 3);
> +  return c->x;
> +}
> +
> +int
> +main ()
> +{
> +  foo ();
> +  return 0;
> +}
> +
> +/* { dg-output "ERROR: AddressSanitizer: global-buffer-overflow on 
> address.*(\n|\r\n|\r)" } */
> +/* { dg-output "READ of size.*" } */
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[PATCH] asan: Fix up instrumentation of assignments which are both loads and stores [PR105714]

2022-05-25 Thread Jakub Jelinek via Gcc-patches
Hi!

On the following testcase with -Os asan pass sees:
   [local count: 354334800]:
  # h_21 = PHI 
  *c.3_5 = *d.2_4;
  h_15 = h_21 + 1;
  if (h_15 != 3)
goto ; [75.00%]
  else
goto ; [25.00%]

   [local count: 118111600]:
  *c.3_5 = MEM[(struct a *) + 12B];
  _13 = c.3_5->x;
  return _13;
It instruments the
  *c.3_5 = *d.2_4;
assignment by adding
  .ASAN_CHECK (7, c.3_5, 4, 4);
  .ASAN_CHECK (6, d.2_4, 4, 4);
before it (which later lowers to checking the corresponding shadow
memory).  But when considering instrumentation of
  *c.3_5 = MEM[(struct a *) + 12B];
it doesn't instrument anything, because it sees that *c.3_5 store is
already instrumented in a dominating block and so there is no need
to instrument *c.3_5 store again (i.e. add another
  .ASAN_CHECK (7, c.3_5, 4, 4);
).  That is true, but misses the fact that we still want to
instrument the MEM[(struct a *) + 12B] load.

The following patch fixes that by changing has_stmt_been_instrumented_p
to consider both store and load in the assignment if it does both
(returning true iff both have been instrumented).
That matches how we handle e.g. builtin calls, where we also perform AND
of all the memory locs involved in the call.

I've verified that we still don't add the redundant
  .ASAN_CHECK (7, c.3_5, 4, 4);
call but just add
  _18 = [(struct a *) + 12B];
  .ASAN_CHECK (6, _18, 4, 4);
to instrument the load.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-05-25  Jakub Jelinek  

PR sanitizer/105714
* asan.cc (has_stmt_been_instrumented_p): For assignments which
are both stores and loads, return true only if both destination
and source have been instrumented.

* gcc.dg/asan/pr105714.c: New test.

--- gcc/asan.cc.jj  2022-05-12 08:27:56.923834018 +0200
+++ gcc/asan.cc 2022-05-24 11:39:28.527258357 +0200
@@ -1285,7 +1285,20 @@ has_stmt_been_instrumented_p (gimple *st
 
   if (get_mem_ref_of_assignment (as_a  (stmt), ,
 _is_store))
-   return has_mem_ref_been_instrumented ();
+   {
+ if (!has_mem_ref_been_instrumented ())
+   return false;
+ if (r_is_store && gimple_assign_load_p (stmt))
+   {
+ asan_mem_ref src;
+ asan_mem_ref_init (, NULL, 1);
+ src.start = gimple_assign_rhs1 (stmt);
+ src.access_size = int_size_in_bytes (TREE_TYPE (src.start));
+ if (!has_mem_ref_been_instrumented ())
+   return false;
+   }
+ return true;
+   }
 }
   else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
 {
--- gcc/testsuite/gcc.dg/asan/pr105714.c.jj 2022-05-24 11:50:26.753570348 
+0200
+++ gcc/testsuite/gcc.dg/asan/pr105714.c2022-05-24 11:51:01.074225766 
+0200
@@ -0,0 +1,33 @@
+/* PR sanitizer/105714 */
+/* { dg-do run } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-Os" } } */
+/* { dg-shouldfail "asan" } */
+
+struct A { int x; };
+struct A b[2];
+struct A *c = b, *d = b;
+int e;
+
+int
+foo ()
+{
+  for (e = 0; e < 1; e++)
+{
+  int i[1];
+  i;
+}
+  for (int h = 0; h < 3; h++)
+*c = *d;
+  *c = *(b + 3);
+  return c->x;
+}
+
+int
+main ()
+{
+  foo ();
+  return 0;
+}
+
+/* { dg-output "ERROR: AddressSanitizer: global-buffer-overflow on 
address.*(\n|\r\n|\r)" } */
+/* { dg-output "READ of size.*" } */

Jakub



Re: [PATCH v2] DSE: Use the constant store source if possible

2022-05-25 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Tue, May 24, 2022 at 10:11 PM H.J. Lu  wrote:
>>
>> On Mon, May 23, 2022 at 11:42 PM Richard Biener
>>  wrote:
>> >
>> > On Mon, May 23, 2022 at 8:34 PM H.J. Lu  wrote:
>> > >
>> > > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
>> > > > On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
>> > > >  wrote:
>> > > > >
>> > > > > When recording store for RTL dead store elimination, check if the 
>> > > > > source
>> > > > > register is set only once to a constant.  If yes, record the constant
>> > > > > as the store source.  It eliminates unrolled zero stores after 
>> > > > > memset 0
>> > > > > in a loop where a vector register is used as the zero store source.
>> > > > >
>> > > > > gcc/
>> > > > >
>> > > > > PR rtl-optimization/105638
>> > > > > * dse.cc (record_store): Use the constant source if the 
>> > > > > source
>> > > > > register is set only once.
>> > > > >
>> > > > > gcc/testsuite/
>> > > > >
>> > > > > PR rtl-optimization/105638
>> > > > > * g++.target/i386/pr105638.C: New test.
>> > > > > ---
>> > > > >  gcc/dse.cc   | 19 ++
>> > > > >  gcc/testsuite/g++.target/i386/pr105638.C | 44 
>> > > > > 
>> > > > >  2 files changed, 63 insertions(+)
>> > > > >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>> > > > >
>> > > > > diff --git a/gcc/dse.cc b/gcc/dse.cc
>> > > > > index 30c11cee034..0433dd3d846 100644
>> > > > > --- a/gcc/dse.cc
>> > > > > +++ b/gcc/dse.cc
>> > > > > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
>> > > > >
>> > > > >   if (tem && CONSTANT_P (tem))
>> > > > > const_rhs = tem;
>> > > > > + else
>> > > > > +   {
>> > > > > + /* If RHS is set only once to a constant, set CONST_RHS
>> > > > > +to the constant.  */
>> > > > > + df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>> > > > > + if (def != nullptr
>> > > > > + && !DF_REF_IS_ARTIFICIAL (def)
>> > > > > + && !DF_REF_NEXT_REG (def))
>> > > > > +   {
>> > > > > + rtx_insn *def_insn = DF_REF_INSN (def);
>> > > > > + rtx def_body = PATTERN (def_insn);
>> > > > > + if (GET_CODE (def_body) == SET)
>> > > > > +   {
>> > > > > + rtx def_src = SET_SRC (def_body);
>> > > > > + if (CONSTANT_P (def_src))
>> > > > > +   const_rhs = def_src;
>> > > >
>> > > > doesn't DSE have its own tracking of stored values?  Shouldn't we
>> > >
>> > > It tracks stored values only within the basic block.  When RTL loop
>> > > invariant motion hoists a constant initialization out of the loop into
>> > > a separate basic block, the constant store value becomes unknown
>> > > within the original basic block.
>> > >
>> > > > improve _that_ if it is not enough?  I also wonder if you need to
>> > >
>> > > My patch extends DSE stored value tracking to include the constant which
>> > > is set only once in another basic block.
>> > >
>> > > > verify the SET isn't partial?
>> > > >
>> > >
>> > > Here is the v2 patch to check that the constant is set by a non-partial
>> > > unconditional load.
>> > >
>> > > OK for master?
>> > >
>> > > Thanks.
>> > >
>> > > H.J.
>> > > ---
>> > > RTL DSE tracks redundant constant stores within a basic block.  When RTL
>> > > loop invariant motion hoists a constant initialization out of the loop
>> > > into a separate basic block, the constant store value becomes unknown
>> > > within the original basic block.  When recording store for RTL DSE, check
>> > > if the source register is set only once to a constant by a non-partial
>> > > unconditional load.  If yes, record the constant as the constant store
>> > > source.  It eliminates unrolled zero stores after memset 0 in a loop
>> > > where a vector register is used as the zero store source.
>> > >
>> > > gcc/
>> > >
>> > > PR rtl-optimization/105638
>> > > * dse.cc (record_store): Use the constant source if the source
>> > > register is set only once.
>> > >
>> > > gcc/testsuite/
>> > >
>> > > PR rtl-optimization/105638
>> > > * g++.target/i386/pr105638.C: New test.
>> > > ---
>> > >  gcc/dse.cc   | 22 
>> > >  gcc/testsuite/g++.target/i386/pr105638.C | 44 
>> > >  2 files changed, 66 insertions(+)
>> > >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>> > >
>> > > diff --git a/gcc/dse.cc b/gcc/dse.cc
>> > > index 30c11cee034..af8e88dac32 100644
>> > > --- a/gcc/dse.cc
>> > > +++ b/gcc/dse.cc
>> > > @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
>> > >
>> > >   if (tem && CONSTANT_P (tem))
>> > > const_rhs = tem;
>> > > + else
>> > > +   {
>> > > + /* 

[committed] libgomp: Fix occassional hangs with taskwait nowait depend

2022-05-25 Thread Jakub Jelinek via Gcc-patches
Hi!

Richi reported occassional hangs with taskwait-depend-nowait-1.*
tests and I've finally manged to reproduce.  The problem is if
taskwait depend without nowait is encountered soon after
taskwait depend nowait and the former depends on the latter and there
is no other work to do, the taskwait depend without nowait is put
to sleep, but the empty_task optimization in
gomp_task_run_post_handle_dependers wouldn't wake it up in that
case.  gomp_task_run_post_handle_dependers normally does some wakeups
because it schedules more work (another task), which is not the
case of empty_task, but we need to do the wakeups that would be done
upon task completion so that we awake sleeping threads when the
last child is done.
So, the taskwait-depend-nowait-1.* testcase is fixed with the
else if (__builtin_expect (task->parent_depends_on, 0) part of
the patch.
The new testcase can hang on another problem, if the empty task
is the last task of a taskgroup, we need to use atomic store
like elsewhere to decrease the counter to 0, and wake up taskgroup
end if needed.
Yet another spot which can sleep is normal taskwait (without depend),
but I believe nothing needs to be done for that - in that case we
await solely until the children's queue has no tasks, tasks still
waiting for dependencies aren't accounted in that, but the reason
is that if taskwait should wait for something, there needs to be at least
one active child doing something (in the children queue), which then
possibly awakes some of its siblings when the dependencies are met,
or in the empty task case awakes further dependencies, but in any
case the child that finished is still handled as active child and
will awake taskwait at the end if there is nothing further to
do.
Last sleeping case are barriers, but that is handled by ++ret and
awaking the barrier.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed
to trunk.

2022-05-25  Jakub Jelinek  

* task.c (gomp_task_run_post_handle_dependers): If empty_task
is the last task taskwait depend depends on, wake it up.
Similarly if it is the last child of a taskgroup, use atomic
store instead of decrement and awak taskgroup wait if any.
* testsuite/libgomp.c-c++-common/taskwait-depend-nowait-2.c: New test.

--- libgomp/task.c.jj   2022-05-23 22:38:26.381094885 +0200
+++ libgomp/task.c  2022-05-24 18:23:03.054074341 +0200
@@ -1382,10 +1382,30 @@ gomp_task_run_post_handle_dependers (str
{
  if (!parent)
task->parent = NULL;
+ else if (__builtin_expect (task->parent_depends_on, 0)
+  && --parent->taskwait->n_depend == 0
+  && parent->taskwait->in_depend_wait)
+   {
+ parent->taskwait->in_depend_wait = false;
+ gomp_sem_post (>taskwait->taskwait_sem);
+   }
  if (gomp_task_run_post_handle_depend (task, team))
++ret;
  if (taskgroup)
-   taskgroup->num_children--;
+   {
+ if (taskgroup->num_children > 1)
+   --taskgroup->num_children;
+ else
+   {
+ __atomic_store_n (>num_children, 0,
+   MEMMODEL_RELEASE);
+ if (taskgroup->in_taskgroup_wait)
+   {
+ taskgroup->in_taskgroup_wait = false;
+ gomp_sem_post (>taskgroup_sem);
+   }
+   }
+   }
  gomp_finish_task (task);
  free (task);
  continue;
--- libgomp/testsuite/libgomp.c-c++-common/taskwait-depend-nowait-2.c.jj
2022-05-25 09:56:20.131618294 +0200
+++ libgomp/testsuite/libgomp.c-c++-common/taskwait-depend-nowait-2.c   
2022-05-25 10:50:36.781833445 +0200
@@ -0,0 +1,48 @@
+#include 
+#include 
+
+int
+main ()
+{
+  int a[48], b = 1;
+  #pragma omp parallel num_threads (4)
+  {
+#pragma omp barrier
+#pragma omp single
+{
+  int i;
+  for (i = 0; i < 48; ++i)
+   #pragma omp task depend(in: a) shared(a)
+ a[i] = i;
+  for (i = 0; i < 32; ++i)
+   {
+ #pragma omp taskwait depend(inout: a) nowait
+   }
+  #pragma omp taskwait
+  for (i = 0; i < 48; ++i)
+   if (a[i] != i)
+ abort ();
+  for (i = 0; i < 48; ++i)
+   #pragma omp task depend(in: a) shared(a)
+ a[i] = 2 * i + 1;
+  #pragma omp taskgroup
+  {
+   #pragma omp taskwait depend(inoutset: a) nowait
+   #pragma omp taskgroup
+   {
+ #pragma omp taskwait depend(inoutset: a) nowait
+   }
+  }
+  for (i = 0; i < 48; ++i)
+   if (a[i] != 2 * i + 1)
+ abort ();
+  #pragma omp task depend(in: a) shared(a)
+  usleep (5000);
+  #pragma omp taskgroup
+  {
+   #pragma omp taskwait depend(inout: a) nowait
+  }
+}
+  }
+  return 0;
+}

Jakub



Re: [PATCH v2] DSE: Use the constant store source if possible

2022-05-25 Thread Richard Biener via Gcc-patches
On Tue, May 24, 2022 at 10:11 PM H.J. Lu  wrote:
>
> On Mon, May 23, 2022 at 11:42 PM Richard Biener
>  wrote:
> >
> > On Mon, May 23, 2022 at 8:34 PM H.J. Lu  wrote:
> > >
> > > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
> > > > On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > When recording store for RTL dead store elimination, check if the 
> > > > > source
> > > > > register is set only once to a constant.  If yes, record the constant
> > > > > as the store source.  It eliminates unrolled zero stores after memset > > > > > 0
> > > > > in a loop where a vector register is used as the zero store source.
> > > > >
> > > > > gcc/
> > > > >
> > > > > PR rtl-optimization/105638
> > > > > * dse.cc (record_store): Use the constant source if the source
> > > > > register is set only once.
> > > > >
> > > > > gcc/testsuite/
> > > > >
> > > > > PR rtl-optimization/105638
> > > > > * g++.target/i386/pr105638.C: New test.
> > > > > ---
> > > > >  gcc/dse.cc   | 19 ++
> > > > >  gcc/testsuite/g++.target/i386/pr105638.C | 44 
> > > > > 
> > > > >  2 files changed, 63 insertions(+)
> > > > >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> > > > >
> > > > > diff --git a/gcc/dse.cc b/gcc/dse.cc
> > > > > index 30c11cee034..0433dd3d846 100644
> > > > > --- a/gcc/dse.cc
> > > > > +++ b/gcc/dse.cc
> > > > > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
> > > > >
> > > > >   if (tem && CONSTANT_P (tem))
> > > > > const_rhs = tem;
> > > > > + else
> > > > > +   {
> > > > > + /* If RHS is set only once to a constant, set CONST_RHS
> > > > > +to the constant.  */
> > > > > + df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> > > > > + if (def != nullptr
> > > > > + && !DF_REF_IS_ARTIFICIAL (def)
> > > > > + && !DF_REF_NEXT_REG (def))
> > > > > +   {
> > > > > + rtx_insn *def_insn = DF_REF_INSN (def);
> > > > > + rtx def_body = PATTERN (def_insn);
> > > > > + if (GET_CODE (def_body) == SET)
> > > > > +   {
> > > > > + rtx def_src = SET_SRC (def_body);
> > > > > + if (CONSTANT_P (def_src))
> > > > > +   const_rhs = def_src;
> > > >
> > > > doesn't DSE have its own tracking of stored values?  Shouldn't we
> > >
> > > It tracks stored values only within the basic block.  When RTL loop
> > > invariant motion hoists a constant initialization out of the loop into
> > > a separate basic block, the constant store value becomes unknown
> > > within the original basic block.
> > >
> > > > improve _that_ if it is not enough?  I also wonder if you need to
> > >
> > > My patch extends DSE stored value tracking to include the constant which
> > > is set only once in another basic block.
> > >
> > > > verify the SET isn't partial?
> > > >
> > >
> > > Here is the v2 patch to check that the constant is set by a non-partial
> > > unconditional load.
> > >
> > > OK for master?
> > >
> > > Thanks.
> > >
> > > H.J.
> > > ---
> > > RTL DSE tracks redundant constant stores within a basic block.  When RTL
> > > loop invariant motion hoists a constant initialization out of the loop
> > > into a separate basic block, the constant store value becomes unknown
> > > within the original basic block.  When recording store for RTL DSE, check
> > > if the source register is set only once to a constant by a non-partial
> > > unconditional load.  If yes, record the constant as the constant store
> > > source.  It eliminates unrolled zero stores after memset 0 in a loop
> > > where a vector register is used as the zero store source.
> > >
> > > gcc/
> > >
> > > PR rtl-optimization/105638
> > > * dse.cc (record_store): Use the constant source if the source
> > > register is set only once.
> > >
> > > gcc/testsuite/
> > >
> > > PR rtl-optimization/105638
> > > * g++.target/i386/pr105638.C: New test.
> > > ---
> > >  gcc/dse.cc   | 22 
> > >  gcc/testsuite/g++.target/i386/pr105638.C | 44 
> > >  2 files changed, 66 insertions(+)
> > >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
> > >
> > > diff --git a/gcc/dse.cc b/gcc/dse.cc
> > > index 30c11cee034..af8e88dac32 100644
> > > --- a/gcc/dse.cc
> > > +++ b/gcc/dse.cc
> > > @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
> > >
> > >   if (tem && CONSTANT_P (tem))
> > > const_rhs = tem;
> > > + else
> > > +   {
> > > + /* If RHS is set only once to a constant, set CONST_RHS
> > > +to the constant.  */
> > > + df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> > > +   

Re: [patch] [wwwdocs]+[invoke.texi] Update GCN for gfx90a (was: Re: [committed] amdgcn: Add gfx90a support)

2022-05-25 Thread Andrew Stubbs

On 24/05/2022 17:44, Tobias Burnus wrote:

On 24.05.22 17:31, Andrew Stubbs wrote:

amdgcn: Add gfx90a support


Attached is an attempt to update invoke.texi


I've deliberately avoided the MI100 and MI200 names because they're 
really not that simple. MI100 is gfx908, but MI150 is gfx906 and MI125 
is gfx900. I'm not sure what's going on with the MI200 series, now or in 
the future.


And to update the gcc-13/changes.html. Regarding the latter, I have to 
versions – the first is more readable, the latter makes more clear where 
to use it, but reads much worse. – Pick one or suggest a better one.


I can only see one option?

PS: I was thinking of mentioning that GCN now requires llvm-mc of LLVM 
13.0.1 or higher (during build + installed as assembler) but I then 
thought that changes.html is not the best place and there is an error 
during build, stating what is needed.


I'm going to update the Wiki page today. I'm not sure where else it is 
currently documented.


Andrew


[ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2022-05-25 Thread Joel Hutton via Gcc-patches
Ping!

Just checking there is still interest in this. I'm assuming you've been busy 
with release.

Joel

> -Original Message-
> From: Joel Hutton
> Sent: 13 April 2022 16:53
> To: Richard Sandiford 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: [vect-patterns] Refactor widen_plus/widen_minus as internal_fns
> 
> Hi all,
> 
> These patches refactor the widening patterns in vect-patterns to use
> internal_fn instead of tree_codes.
> 
> Sorry about the delay, some changes to master made it a bit messier.
> 
> Bootstrapped and regression tested on aarch64.
> 
> Joel
> 
> > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > index 854cbcff390..4a8ea67e62f 100644
> > > --- a/gcc/tree-vect-patterns.c
> > > +++ b/gcc/tree-vect-patterns.c
> > > @@ -1245,7 +1245,7 @@ vect_recog_sad_pattern (vec_info *vinfo,
> > > static gimple *  vect_recog_widen_op_pattern (vec_info *vinfo,
> > >stmt_vec_info last_stmt_info, tree *type_out,
> > > -  tree_code orig_code, tree_code wide_code,
> > > +  tree_code orig_code, code_helper
> > wide_code_or_ifn,
> >
> > I think it'd be better to keep the original “wide_code” name and try
> > to remove as many places as possible in which switching based on
> > tree_code or internal_fn is necessary.  The recent gimple-match.h
> > patches should help with that, but more routines might be needed.
> 
> Done.
> 
> > > @@ -1309,8 +1310,16 @@ vect_recog_widen_op_pattern (vec_info
> *vinfo,
> > >  2, oprnd, half_type, unprom, vectype);
> > >
> > >tree var = vect_recog_temp_ssa_var (itype, NULL);
> > > -  gimple *pattern_stmt = gimple_build_assign (var, wide_code,
> > > -   oprnd[0], oprnd[1]);
> > > +  gimple *pattern_stmt;
> > > +  if (wide_code_or_ifn.is_tree_code ())
> > > +pattern_stmt = gimple_build_assign (var, wide_code_or_ifn,
> > > + oprnd[0], oprnd[1]);
> > > +  else
> > > +{
> > > +  internal_fn fn = as_internal_fn ((combined_fn) wide_code_or_ifn);
> > > +  pattern_stmt = gimple_build_call_internal (fn, 2, oprnd[0], 
> > > oprnd[1]);
> > > +  gimple_call_set_lhs (pattern_stmt, var);
> > > +}
> >
> > For example, I think we should hide this inside a new:
> >
> >   gimple_build (var, wide_code, oprnd[0], oprnd[1]);
> >
> > that works directly on code_helper, similarly to the new code_helper
> > gimple_build interfaces.
> 
> Done.
> 
> > > @@ -4513,14 +4513,16 @@ vect_gen_widened_results_half (vec_info
> > *vinfo, enum tree_code code,
> > >tree new_temp;
> > >
> > >/* Generate half of the widened result:  */
> > > -  gcc_assert (op_type == TREE_CODE_LENGTH (code));
> > >if (op_type != binary_op)
> > >  vec_oprnd1 = NULL;
> > > -  new_stmt = gimple_build_assign (vec_dest, code, vec_oprnd0,
> > vec_oprnd1);
> > > +  if (ch.is_tree_code ())
> > > +new_stmt = gimple_build_assign (vec_dest, ch, vec_oprnd0,
> > vec_oprnd1);
> > > +  else
> > > +new_stmt = gimple_build_call_internal (as_internal_fn
> > > + ((combined_fn)
> > ch),
> > > +2, vec_oprnd0, vec_oprnd1);
> >
> > Similarly here.  I guess the combined_fn/internal_fn path will also
> > need to cope with null trailing operands, for consistency with the tree_code
> one.
> >
> 
> Done.
> 
> > > @@ -4744,31 +4747,28 @@ vectorizable_conversion (vec_info *vinfo,
> > >&& ! vec_stmt)
> > >  return false;
> > >
> > > -  gassign *stmt = dyn_cast  (stmt_info->stmt);
> > > -  if (!stmt)
> > > +  gimple* stmt = stmt_info->stmt;
> > > +  if (!(is_gimple_assign (stmt) || is_gimple_call (stmt)))
> > >  return false;
> > >
> > > -  if (TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
> > > -return false;
> > > +  if (is_gimple_assign (stmt))
> > > +  {
> > > +code_or_ifn = gimple_assign_rhs_code (stmt);  }  else
> > > +code_or_ifn = gimple_call_combined_fn (stmt);
> >
> > It might be possible to use gimple_extract_op here (only recently added).
> > This would also provide the number of operands directly, instead of
> > needing “op_type”.  It would also provide an array of operands.
> >
> 
> Done.
> 
> > > -  code = gimple_assign_rhs_code (stmt);
> > > -  if (!CONVERT_EXPR_CODE_P (code)
> > > -  && code != FIX_TRUNC_EXPR
> > > -  && code != FLOAT_EXPR
> > > -  && code != WIDEN_PLUS_EXPR
> > > -  && code != WIDEN_MINUS_EXPR
> > > -  && code != WIDEN_MULT_EXPR
> > > -  && code != WIDEN_LSHIFT_EXPR)
> >
> > Is it safe to drop this check independently of parts 2 and 3?
> > (Genuine question, haven't checked in detail.)
> 
> It requires the parts 2 and 3. I've moved that change into this first patch.
> 
> > > @@ -4784,7 +4784,8 @@ vectorizable_conversion (vec_info *vinfo,
> > >  }
> > >
> > >rhs_type = TREE_TYPE (op0);
> > > -  if ((code != FIX_TRUNC_EXPR && code != FLOAT_EXPR)
> > > +  if ((code_or_ifn.is_tree_code () && 

Re: [PATCH] Modula-2: merge proposal/review: 1/9 01.patch-set-01

2022-05-25 Thread Richard Biener via Gcc-patches
On Tue, May 24, 2022 at 5:45 PM Gaius Mulley  wrote:
>
> Richard Biener  writes:
>
> > On Sat, May 21, 2022 at 3:11 AM Gaius Mulley  wrote:
> >>
> >>
> >> Hi,
> >>
> >> Gaius wrote:
> >>
> >> > the changes do raise questions.  The reason for the changes here are to
> >> > allow easy linking for modula-2 users.
> >>
> >> >  $ gm2 hello.mod
> >>
> >> > for example will compile and link with all dependent modules (dependants
> >> > are generated by analysing module source imports).  The gm2 driver will
> >> > add objects and libraries to the link.
> >>
> >> in more detail the gm2 driver does the following:
> >>
> >>   $ gm2 -v hello.mod
> >>
> >> full output below, but to summarise and annotate:
> >>
> >> cc1gm2 generates an assembler file from hello.mod
> >>  as --64 /tmp/cc8BoL3d.s -o hello.o
> >>
> >>  # gm2l generates a list of all dependent modules from parsing all imports
> >>  /home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/gm2l -v \
> >>  -I/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim -o \
> >>  /tmp/ccSMojUb.l hello.mod
> >>
> >>  # gm2lorder reorders the critical runtime modules
> >>  /home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/gm2lorder \
> >> /tmp/ccSMojUb.l -o /tmp/ccHDRdde.lst
> >>
> >>  # gm2lgen generates a C++ scaffold from the reordered module list
> >>  /home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/gm2lgen -fcpp \
> >> /tmp/ccHDRdde.lst -o a-hello_m2.cpp
> >>
> >>  # cc1plus compiles the scaffold
> >>  /home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/cc1plus -v \
> >>  -mtune=generic -march=x86-64 \
> >>  -I/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim \
> >>  -quiet a-hello_m2.cpp -o a-hello_m2.s
> >>  as --64 a-hello_m2.s -o a-hello_m2.o
> >>
> >>  # gm2lcc creates an archive from the list of modules and the scaffold
> >> /home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/gm2lcc \
> >>   -L/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim \
> >>   -ftarget-ar=/usr/bin/ar -ftarget-ranlib=/usr/bin/ranlib \
> >> -fobject-path=/home/gaius/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.0/m2/m2pim \
> >>   --exec --startup a-hello_m2.o --ar -o /tmp/ccNJ60fa.a --mainobject \
> >>   a-hello_m2.o /tmp/ccHDRdde.lst
> >>
> >> /usr/bin/ar rc /tmp/ccNJ60fa.a  hello.o a-hello_m2.o
> >> /usr/bin/ranlib /tmp/ccNJ60fa.a

So is there a reason to have the 'scaffold' separate from the object
of hello.mod?
Is there more than a 1:1 relation between a .mod and the 'scaffold'?  Why
are multiple tools involved here - can the m2 frontend not parse imports,
reorder runtime modules and generate the 'scaffold' as GENERIC IL
as part of the translation unit of the .mod file?  Indirection through emitting
C++ source code makes the process a bit awkward IMHO.

Unfortunately I have no m2 installation around to look at how complex
the 'scaffold' is.

> >> # finally collect2 performs the link from the archive and any default
> >>   libraries
> >>
> >> hope this helps
> >
> > Yes, it does.  So historically when there was complex massaging required
> > like this it was offloaded to a "helper driver".  With -flto there's 
> > lto-wrapper
> > (but here invoked by the linker), with ada there's gnatmake and others
> > and with certain targets collect2 does extra processing producing global
> > CTORs (or for C++ with -frepo even invoked additional compilations).
>
> Hi Richard,
>
> interesting thank you for the information about different languages (yes
> I recall years ago the lang-specs used to be much more complex).  global
> CTORs would work might be helpful, although I wonder if they are overly
> complex for modula-2?  (As we still need to control order).  I guess
> there would be pros/cons for multi-lingual projects.
>
> I wonder whether it could be resolved in the modula-2 front end by
> placing the scaffold generation inside cc1gm2 (which is generated when a
> program module is seen - and/or perhaps forced by a switch if a user
> really wanted to link an implementation module as the application,
> say by -fm2-scaffold).  So by default the proposal would be that
>
>   $ gm2 -c programmodule.mod
>
> generates programmodule.o (which contains main and a scaffold to
> construct/deconstruct every module as well as the user level code).
> There could be a switch to emit the scaffold in C or C++ should users
> want to interfere.
>
> Overall much of the modula-2 code inside /gm2tools would go into cc1gm2
> and many 100s of C lines of code would disappear from the 'gm2' driver
> and the code base would clean up :-).  In the process it would become
> more like the other GCC language drivers.
>
> Some of the gm2 link options could be passed into cc1gm2 (those forcing
> the order of module initialization and user specified scaffold list
> override).  The make dependencies could also be emitted if required
> as cc1gm2 now has knowledge of all imports.
>
> > I do think that this might all belong into the main driver code but then
> > maybe all the different language compilation 

Re: [PATCH] ipa-visibility: Optimize TLS access [PR99619]

2022-05-25 Thread Jan Hubicka via Gcc-patches
> On Mon, 16 May 2022, Alexander Monakov wrote:
> 
> > On Mon, 9 May 2022, Jan Hubicka wrote:
> > 
> > > > On second thought, it might be better to keep the assert, and place the 
> > > > loop
> > > > under 'if (optimize)'?
> > > 
> > > The problem is that at IPA level it does not make sense to check
> > > optimize flag as it is function specific.  (shlib is OK to check it
> > > anywhere since it is global.)
> > > 
> > > So I think we really want to run the code only at the WPA time
> > > (symtab_state>=IPA_SSA) and we want to see what is optimization flag of
> > > those function referring the variable since that is what decided codegen
> > > we will produce.
> > 
> > Perhaps I misunderstood the issue. Are you saying that there might be no -O
> > option on lto1 command line, because lto1 is supposed to take optimization
> > level from function summaries, but during pass_ipa_whole_program_visibility
> > there's no "current function" so 'optimize' is at its default value (zero)?
> > 
> > And the solution is to iterate over referring functions to see if at least
> > one of them satisfies 'opt_for_fn (decl, optimize) > 0'?
> 
> Do you want to see a patch implementing the above solution?
Yes, I think it is correct solution here...

Thanks,
Honza
> 
> Alexander


[PATCH] Add GIMPLE switch support to loop unswitching

2022-05-25 Thread Richard Biener via Gcc-patches
This patch adds support to unswitch loops with switch statements
based on invariant index.  It furthermore reworks the cost model
to allow an overall budget of statements to be created per original
loop by all unswitching opportunities in the loop.  Compared to
the original all unswitching opportunities in a loop are
pre-evaluated before the first transform which will allow future
changes to select the most profitable candidates first.

To efficiently support switch statements the pass now uses
ranger to simplify switch statements and conditions in loop
copies based on ranges extracted from the recorded set of
predicates unswitched.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk.

Richard.

gcc/ChangeLog:

* dbgcnt.def (DEBUG_COUNTER): Add loop_unswitch counter.
* params.opt (max-unswitch-level): Remove.
* doc/invoke.texi (max-unswitch-level): Likewise.
* tree-cfg.c (gimple_lv_add_condition_to_bb): Support not
gimplified expressions.
* tree-ssa-loop-unswitch.c (struct unswitch_predicate): New.
(tree_may_unswitch_on): Rename to ...
(find_unswitching_predicates_for_bb): ... this and handle
switch statements.
(get_predicates_for_bb): Likewise.
(set_predicates_for_bb): Likewise.
(init_loop_unswitch_info): Likewise.
(tree_ssa_unswitch_loops): Prepare stuff before calling
tree_unswitch_single_loop.
(tree_unswitch_single_loop): Rework the function using
pre-computed predicates and with a per original loop cost model.
(merge_last): New.
(add_predicate_to_path): Likewise.
(find_range_for_lhs): Likewise.
(simplify_using_entry_checks): Rename to ...
(evaluate_control_stmt_using_entry_checks): ... this, handle
switch statements and improve simplifications using ranger.
(simplify_loop_version): Rework using
evaluate_control_stmt_using_entry_checks.
(evaluate_bbs): New.
(evaluate_loop_insns_for_predicate): Likewise.
(tree_unswitch_loop): Adjust to allow switch statements and
pass in the edge to unswitch.
(clean_up_after_unswitching): New.
(pass_tree_unswitch::execute): Pass down fun.

gcc/testsuite/ChangeLog:

* gcc.dg/loop-unswitch-7.c: New test.
* gcc.dg/loop-unswitch-8.c: New test.
* gcc.dg/loop-unswitch-9.c: New test.
* gcc.dg/loop-unswitch-10.c: New test.
* gcc.dg/loop-unswitch-11.c: New test.
* gcc.dg/loop-unswitch-12.c: New test.
* gcc.dg/loop-unswitch-13.c: New test.
* gcc.dg/loop-unswitch-14.c: New test.
* gcc.dg/loop-unswitch-15.c: New test.
* gcc.dg/loop-unswitch-16.c: New test.
* gcc.dg/loop-unswitch-17.c: New test.
* gcc.dg/torture/20220518-1.c: New test.
* gcc.dg/torture/20220518-2.c: New test.
* gcc.dg/torture/20220525-1.c: New test.
* gcc.dg/alias-10.c: Adjust.
* gcc.dg/tree-ssa/loop-6.c: Likewise.
* gcc.dg/loop-unswitch-1.c: Likewise.

Co-authored-by: Richard Biener  
---
 gcc/dbgcnt.def|1 +
 gcc/doc/invoke.texi   |3 -
 gcc/params.opt|4 -
 gcc/testsuite/gcc.dg/alias-10.c   |2 +-
 gcc/testsuite/gcc.dg/loop-unswitch-1.c|2 +-
 gcc/testsuite/gcc.dg/loop-unswitch-10.c   |   56 ++
 gcc/testsuite/gcc.dg/loop-unswitch-11.c   |   45 +
 gcc/testsuite/gcc.dg/loop-unswitch-12.c   |   28 +
 gcc/testsuite/gcc.dg/loop-unswitch-13.c   |   35 +
 gcc/testsuite/gcc.dg/loop-unswitch-14.c   |   60 ++
 gcc/testsuite/gcc.dg/loop-unswitch-15.c   |   15 +
 gcc/testsuite/gcc.dg/loop-unswitch-16.c   |   22 +
 gcc/testsuite/gcc.dg/loop-unswitch-17.c   |   24 +
 gcc/testsuite/gcc.dg/loop-unswitch-7.c|   28 +
 gcc/testsuite/gcc.dg/loop-unswitch-8.c|   31 +
 gcc/testsuite/gcc.dg/loop-unswitch-9.c|   27 +
 gcc/testsuite/gcc.dg/torture/20220518-1.c |   39 +
 gcc/testsuite/gcc.dg/torture/20220518-2.c |   14 +
 gcc/testsuite/gcc.dg/torture/20220525-1.c |   33 +
 gcc/testsuite/gcc.dg/tree-ssa/loop-6.c|2 +-
 gcc/tree-cfg.cc   |7 +-
 gcc/tree-ssa-loop-unswitch.cc | 1062 -
 22 files changed, 1288 insertions(+), 252 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-10.c
 create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-11.c
 create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-12.c
 create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-13.c
 create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-14.c
 create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-15.c
 create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-16.c
 create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-17.c
 create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-7.c
 create mode 100644 gcc/testsuite/gcc.dg/loop-unswitch-8.c
 create mode 100644 gcc/testsuite/gcc.dg/loop

Re: [PATCH] aarch64: Fix pac-ret with unusual dwarf in libgcc unwinder [PR104689]

2022-05-25 Thread Richard Sandiford via Gcc-patches
Szabolcs Nagy  writes:
> The 05/13/2022 16:35, Richard Sandiford wrote:
>> Szabolcs Nagy via Gcc-patches  writes:
>> > The RA_SIGN_STATE dwarf pseudo-register is normally only set using the
>> > DW_CFA_AARCH64_negate_ra_state (== DW_CFA_window_save) operation which
>> > toggles the return address signedness state (the default state is 0).
>> > (It may be set by remember/restore_state CFI too, those save/restore
>> > the state of all registers.)
>> >
>> > However RA_SIGN_STATE can be set directly via DW_CFA_val_expression too.
>> > GCC does not generate such CFI but some other compilers reportedly do.
>> >
>> > Note: the toggle operation must not be mixed with other dwarf register
>> > rule CFI within the same CIE and FDE.
>> >
>> > In libgcc we assume REG_UNSAVED means the RA_STATE is set using toggle
>> > operations, otherwise we assume its value is set by other CFI.
>> 
>> AFAIK, this is the first time I've looked at the RA_SIGN_STATE code,
>> so this is probably a naive point/question, but: it seems a bit
>> underhand for the existing code to be using REG_UNSAVED and
>> loc.offset to hold the toggle state.  Would it make sense to add
>> a new enum value for known, pre-evaluated constants?  _Unwind_GetGR
>> would then DTRT for both cases.
>> 
>> That's a comment about the pre-existing code though.  I agree this
>> patch looks like the right fix if we keep to the current approach.
>
> yes, this is a hack. i looked at introducing a generic REG_*
> enum to deal with RA_SIGN_STATE now, but it's a bit awkward:
>
> normally frame state for a reg starts out REG_UNSAVED, which
> should mean 0 value for the RA_SIGN_STATE pseudo register.
>
> when moving up frames the uw context gets copied and updated
> according to the frame state (where REG_UNSAVED normally means
> unmodified copy), this is not right for RA_SIGN_STATE which
> should be reset in the absence of related dwarf ops. we can
> fix this up in target hooks for update context, but we still
> have to special case REG_UNSAVED.
>
> i think introducing a new REG_CONST does not simplify the
> aarch64 target code (we might need further changes to get
> a clean solution).

Ah, yeah, the zero reset would still need to be shoe-horned
in somehow.

OK for the original patch in that case.

Thanks,
Richard


Re: OpenMP patches pending review

2022-05-25 Thread Jakub Jelinek via Gcc-patches
On Wed, May 25, 2022 at 09:25:19AM +0200, Tobias Burnus wrote:
> first – thanks for all the reviews – both quick and though.
> 
> Pending review are:
> 
> * Julian's struct/declare mapper patch set:
> 10/11 – OpenMP: Use OMP_ARRAY_SECTION instead of TREE_LIST for array sections 
> in C FE
> 11/11 - OpenMP: Support OpenMP 5.0 "declare mapper" directives for C

Generally, the same things apply to these C FE patches as to the C++ FE
patches (except that the C++ patches usually included also generic code
changes while the C FE patches didn't).

Jakub



Re: [PATCH v2] DSE: Use the constant store source if possible

2022-05-25 Thread Richard Sandiford via Gcc-patches
"H.J. Lu via Gcc-patches"  writes:
> On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
>> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
>>  wrote:
>> >
>> > When recording store for RTL dead store elimination, check if the source
>> > register is set only once to a constant.  If yes, record the constant
>> > as the store source.  It eliminates unrolled zero stores after memset 0
>> > in a loop where a vector register is used as the zero store source.
>> >
>> > gcc/
>> >
>> > PR rtl-optimization/105638
>> > * dse.cc (record_store): Use the constant source if the source
>> > register is set only once.
>> >
>> > gcc/testsuite/
>> >
>> > PR rtl-optimization/105638
>> > * g++.target/i386/pr105638.C: New test.
>> > ---
>> >  gcc/dse.cc   | 19 ++
>> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 
>> >  2 files changed, 63 insertions(+)
>> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>> >
>> > diff --git a/gcc/dse.cc b/gcc/dse.cc
>> > index 30c11cee034..0433dd3d846 100644
>> > --- a/gcc/dse.cc
>> > +++ b/gcc/dse.cc
>> > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
>> >
>> >   if (tem && CONSTANT_P (tem))
>> > const_rhs = tem;
>> > + else
>> > +   {
>> > + /* If RHS is set only once to a constant, set CONST_RHS
>> > +to the constant.  */
>> > + df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>> > + if (def != nullptr
>> > + && !DF_REF_IS_ARTIFICIAL (def)
>> > + && !DF_REF_NEXT_REG (def))
>> > +   {
>> > + rtx_insn *def_insn = DF_REF_INSN (def);
>> > + rtx def_body = PATTERN (def_insn);
>> > + if (GET_CODE (def_body) == SET)
>> > +   {
>> > + rtx def_src = SET_SRC (def_body);
>> > + if (CONSTANT_P (def_src))
>> > +   const_rhs = def_src;
>> 
>> doesn't DSE have its own tracking of stored values?  Shouldn't we
>
> It tracks stored values only within the basic block.  When RTL loop
> invariant motion hoists a constant initialization out of the loop into
> a separate basic block, the constant store value becomes unknown
> within the original basic block.
>
>> improve _that_ if it is not enough?  I also wonder if you need to
>
> My patch extends DSE stored value tracking to include the constant which
> is set only once in another basic block.
>
>> verify the SET isn't partial?
>> 
>
> Here is the v2 patch to check that the constant is set by a non-partial
> unconditional load.
>
> OK for master?
>
> Thanks.
>
> H.J.
> ---
> RTL DSE tracks redundant constant stores within a basic block.  When RTL
> loop invariant motion hoists a constant initialization out of the loop
> into a separate basic block, the constant store value becomes unknown
> within the original basic block.  When recording store for RTL DSE, check
> if the source register is set only once to a constant by a non-partial
> unconditional load.  If yes, record the constant as the constant store
> source.  It eliminates unrolled zero stores after memset 0 in a loop
> where a vector register is used as the zero store source.
>
> gcc/
>
>   PR rtl-optimization/105638
>   * dse.cc (record_store): Use the constant source if the source
>   register is set only once.
>
> gcc/testsuite/
>
>   PR rtl-optimization/105638
>   * g++.target/i386/pr105638.C: New test.
> ---
>  gcc/dse.cc   | 22 
>  gcc/testsuite/g++.target/i386/pr105638.C | 44 
>  2 files changed, 66 insertions(+)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>
> diff --git a/gcc/dse.cc b/gcc/dse.cc
> index 30c11cee034..af8e88dac32 100644
> --- a/gcc/dse.cc
> +++ b/gcc/dse.cc
> @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
>  
> if (tem && CONSTANT_P (tem))
>   const_rhs = tem;
> +   else
> + {
> +   /* If RHS is set only once to a constant, set CONST_RHS
> +  to the constant.  */
> +   df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
> +   if (def != nullptr
> +   && !DF_REF_IS_ARTIFICIAL (def)
> +   && !(DF_REF_FLAGS (def)
> +& (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
> +   && !DF_REF_NEXT_REG (def))

Can we introduce a helper for this?  There are already similar tests
in ira and loop-iv, and it seems a bit too complex to have to open-code
each time.

Thanks,
Richard

> + {
> +   rtx_insn *def_insn = DF_REF_INSN (def);
> +   rtx def_body = PATTERN (def_insn);
> +   if (GET_CODE (def_body) == SET)
> + {
> +   rtx def_src = SET_SRC (def_body);
> +   if (CONSTANT_P 

OpenMP patches pending review

2022-05-25 Thread Tobias Burnus

Hi all, hi Jakub,

first – thanks for all the reviews – both quick and though.

Pending review are:

* Julian's struct/declare mapper patch set:
10/11 – OpenMP: Use OMP_ARRAY_SECTION instead of TREE_LIST for array sections 
in C FE
11/11 - OpenMP: Support OpenMP 5.0 "declare mapper" directives for C

All others (00/11 to 09/11) have been reviewed yesterday. (Thanks!)

(Whether 10/11 and 11/11 should be better reviewed before or after re-submission
of the revised patches, I don' know.)


* Kwok's metadirectives patches (0/7 .. 7/7) at:
"[PATCH 0/7] openmp: OpenMP metadirectives support   Kwok Cheung Yeung"
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/thread.html#586600

With some follow-up fixes:
(1) "[PATCH] openmp: Metadirective patch fixes"
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589190.html
(2) [PATCH] openmp: Add support for target_device selector set in metadirectives
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589191.html
+ follow up fix to plugin-nvptx in the same thread:
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589269.html
(3) [PATCH] openmp: Add warning when functions containing metadirectives with 
'construct={target}' called directly
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589427.html
(4) Follow-up fix to (6/7) patch for Fortran:
[PATCH 6/7] openmp, fortran: Add Fortran support for parsing metadirectives
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590349.html
(5) [PATCH] openmp: Improve handling of nested OpenMP metadirectives in C and 
C++ (was: Re: [PATCH 1/7] openmp: Add C support for parsing metadirectives)
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590592.html


* A bunch of additional patches are pending, which either small but depend on 
other patches
   or are larger – but I intent to ping them separately. For some overview, see 
the old thread at
  https://gcc.gnu.org/pipermail/gcc/2022-May/238642.html
in particular: memory management + unified-shared-memory and the kernels 
management
and depending on others: omp_get_max_teams, omp_set_num_teams, and 
omp_{gs}et_teams_thread_limit
+ Fortran deep mapping (working but should be better handled chopped in pieces)


Reviewed and now requires to be revised:

* 01/11 to 09/11 of the struct rework/declare mapper patches as remarked above
* by-device environment variables
* Probably some additional patches

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] AArch64: Prioritise init_have_lse_atomics constructor [PR 105708]

2022-05-25 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> Increase the priority of the init_have_lse_atomics constructor so it runs
> before other constructors. This improves chances that rr works when LSE
> atomics are supported.

Can you add a comment above the function explaining why we chose 90
in particular?  I see 100 was originally suggested in the PR.

Thanks,
Richard

> Regress and bootstrap pass, OK for commit?
>
> 2022-05-24  Wilco Dijkstra  
>
> libgcc/
> PR libgcc/105708
> * config/aarch64/lse-init.c: Increase constructor priority.
>
> ---
> diff --git a/libgcc/config/aarch64/lse-init.c 
> b/libgcc/config/aarch64/lse-init.c
> index 
> fc875b7fe80e947623e570eac130e7a14b516551..92d91dfeed77f299aa610d72091499271490
>  100644
> --- a/libgcc/config/aarch64/lse-init.c
> +++ b/libgcc/config/aarch64/lse-init.c
> @@ -38,7 +38,7 @@ _Bool __aarch64_have_lse_atomics
>
>  unsigned long int __getauxval (unsigned long int);
>
> -static void __attribute__((constructor))
> +static void __attribute__((constructor (90)))
>  init_have_lse_atomics (void)
>  {
>unsigned long hwcap = __getauxval (AT_HWCAP);