[PATCH] expr.cc: Optimize if char array initialization consists of all zeros

2022-05-30 Thread Takayuki 'January June' Suwa via Gcc-patches

Hi all,

On some targets, the initialization code for a char array may be split into
two parts even when the initializer consists entirely of zeros:

/* example */
extern void foo(char*);
void test(void) {
  char a[256] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  foo(a);
}

;; Xtensa (xtensa-lx106)
.LC0:
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.zero   246
test:
movi    a9, 0x110
sub     sp, sp, a9
l32r    a3, .LC1
movi.n  a4, 0xa
mov.n   a2, sp
s32i    a0, sp, 268
call0   memcpy
movi    a4, 0xf6
movi.n  a3, 0
addi.n  a2, sp, 10
call0   memset
mov.n   a2, sp
call0   foo
l32i    a0, sp, 268
movi    a9, 0x110
add.n   sp, sp, a9
ret.n

;; H8/300 (-mh -mint32)
.LC0:
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.zero   246
_test:
sub.l   #256,er7
sub.l   er2,er2
add.b   #10,r2l
mov.l   #.LC0,er1
mov.l   er7,er0
jsr @_memcpy
sub.l   er2,er2
add.b   #246,r2l
sub.l   er1,er1
sub.l   er0,er0
add.b   #10,r0l
add.l   er7,er0
jsr @_memset
mov.l   er7,er0
jsr @_foo
add.l   #256,er7
rts

The i386 target (both 32-bit and 64-bit) does not show this behavior.
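
For reference, a hedged sketch of the check the hunk below adds, written
as a stand-alone helper (the actual patch open-codes the loop directly in
store_expr):

/* Hedged sketch: return true iff the first LEN bytes of a STRING_CST's
   data P are all zero, in which case the copy from .rodata can be skipped
   and the whole object cleared instead.  */
static bool
string_all_zeros_p (const char *p, HOST_WIDE_INT len)
{
  for (HOST_WIDE_INT i = 0; i < len; ++i)
    if (p[i] != '\0')
      return false;
  return true;
}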

gcc/ChangeLog:

* expr.cc (store_expr): Add a check whether the initialization content
consists of all zeros.
---
 gcc/expr.cc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 7197996cec7..f94310dc7b9 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6015,6 +6015,7 @@ store_expr (tree exp, rtx target, int call_param_p,
   rtx dest_mem;
   tree str = TREE_CODE (exp) == STRING_CST
 ? exp : TREE_OPERAND (TREE_OPERAND (exp, 0), 0);
+  char ch;

   exp_len = int_expr_size (exp);
   if (exp_len <= 0)
@@ -6032,6 +6033,12 @@ store_expr (tree exp, rtx target, int call_param_p,
}

   str_copy_len = TREE_STRING_LENGTH (str);
+  /* If str contains only zeroes, no need to store to target.  */
+  ch = 0;
+  for (HOST_WIDE_INT i = 0; i < str_copy_len; ++i)
+   ch |= TREE_STRING_POINTER (str)[i];
+  if (ch == 0)
+   str_copy_len = 0;
   if ((STORE_MAX_PIECES & (STORE_MAX_PIECES - 1)) == 0)
{
  str_copy_len += STORE_MAX_PIECES - 1;
--
2.20.1


Re: [PATCH v5, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-05-30 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2022/5/31 09:52, HAO CHEN GUI wrote:
> Hi,
>    This patch adds V1TI mode to a new mode iterator used in the vector
> comparison, shift and rotation expands.  Without the patch, comparisons
> between two vector __int128 values are converted to scalar comparisons and
> the code is suboptimal.  The patch fixes the issue.  Now all comparisons
> between two vector __int128 values generate the new Power10 comparison
> instructions.  The related built-ins also generate the same instructions
> after gimple folding, so they are added back to the folding list.
> 
>   This patch also merges some vector comparison, shift and rotation expands
> for V1TI and the other vector integer modes, as they have similar patterns.
> The V1TI-only expands are removed.
> 
>Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 

OK, thanks!

BR,
Kewen

> ChangeLog
> 2022-05-24 Haochen Gui 
> 
> gcc/
>   PR target/103316
>   * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable
>   gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET,
>   RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT,
>   RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI.
>   * config/rs6000/vector.md (VEC_IC): New mode iterator.  Add support
>   for new Power10 V1TI instructions.
>   (vec_cmp): Set mode iterator to VEC_IC.
>   (vec_cmpu): Likewise.
>   (vector_nlt): Set mode iterator to VEC_IC.
>   (vector_nltv1ti): Remove.
>   (vector_gtu): Set mode iterator to VEC_IC.
>   (vector_gtuv1ti): Remove.
>   (vector_nltu): Set mode iterator to VEC_IC.
>   (vector_nltuv1ti): Remove.
>   (vector_geu): Set mode iterator to VEC_IC.
>   (vector_ngt): Likewise.
>   (vector_ngtv1ti): Remove.
>   (vector_ngtu): Set mode iterator to VEC_IC.
>   (vector_ngtuv1ti): Remove.
>   (vector_gtu__p): Set mode iterator to VEC_IC.
>   (vector_gtu_v1ti_p): Remove.
>   (vrotl3): Set mode iterator to VEC_IC.  Emit insns for V1TI.
>   (vrotlv1ti3): Remove.
>   (vashr3): Set mode iterator to VEC_IC.  Emit insns for V1TI.
>   (vashrv1ti3): Remove.
> 
> gcc/testsuite/
>   PR target/103316
>   * gcc.target/powerpc/pr103316.c: New.
>   * gcc.target/powerpc/fold-vec-cmp-int128.c: New.
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index e925ba9fad9..b67f4e066a8 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -2000,16 +2000,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPEQUH:
>  case RS6000_BIF_VCMPEQUW:
>  case RS6000_BIF_VCMPEQUD:
> -/* We deliberately omit RS6000_BIF_VCMPEQUT for now, because gimple
> -   folding produces worse code for 128-bit compares.  */
> +case RS6000_BIF_VCMPEQUT:
>fold_compare_helper (gsi, EQ_EXPR, stmt);
>return true;
> 
>  case RS6000_BIF_VCMPNEB:
>  case RS6000_BIF_VCMPNEH:
>  case RS6000_BIF_VCMPNEW:
> -/* We deliberately omit RS6000_BIF_VCMPNET for now, because gimple
> -   folding produces worse code for 128-bit compares.  */
> +case RS6000_BIF_VCMPNET:
>fold_compare_helper (gsi, NE_EXPR, stmt);
>return true;
> 
> @@ -2021,9 +2019,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPGE_U4SI:
>  case RS6000_BIF_CMPGE_2DI:
>  case RS6000_BIF_CMPGE_U2DI:
> -/* We deliberately omit RS6000_BIF_CMPGE_1TI and RS6000_BIF_CMPGE_U1TI
> -   for now, because gimple folding produces worse code for 128-bit
> -   compares.  */
> +case RS6000_BIF_CMPGE_1TI:
> +case RS6000_BIF_CMPGE_U1TI:
>fold_compare_helper (gsi, GE_EXPR, stmt);
>return true;
> 
> @@ -2035,9 +2032,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_VCMPGTUW:
>  case RS6000_BIF_VCMPGTUD:
>  case RS6000_BIF_VCMPGTSD:
> -/* We deliberately omit RS6000_BIF_VCMPGTUT and RS6000_BIF_VCMPGTST
> -   for now, because gimple folding produces worse code for 128-bit
> -   compares.  */
> +case RS6000_BIF_VCMPGTUT:
> +case RS6000_BIF_VCMPGTST:
>fold_compare_helper (gsi, GT_EXPR, stmt);
>return true;
> 
> @@ -2049,9 +2045,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case RS6000_BIF_CMPLE_U4SI:
>  case RS6000_BIF_CMPLE_2DI:
>  case RS6000_BIF_CMPLE_U2DI:
> -/* We deliberately omit RS6000_BIF_CMPLE_1TI and RS6000_BIF_CMPLE_U1TI
> -   for now, because gimple folding produces worse code for 128-bit
> -   compares.  */
> +case RS6000_BIF_CMPLE_1TI:
> +case RS6000_BIF_CMPLE_U1TI:
>fold_compare_helper (gsi, LE_EXPR, stmt);
>return true;
> 
> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index 4d0797c48f8..a0d33d2f604 100644
> 

[PATCH v5, rs6000] Add V1TI into vector comparison expand [PR103316]

2022-05-30 Thread HAO CHEN GUI via Gcc-patches
Hi,
   This patch adds V1TI mode to a new mode iterator used in the vector
comparison, shift and rotation expands.  Without the patch, comparisons
between two vector __int128 values are converted to scalar comparisons and
the code is suboptimal.  The patch fixes the issue.  Now all comparisons
between two vector __int128 values generate the new Power10 comparison
instructions.  The related built-ins also generate the same instructions
after gimple folding, so they are added back to the folding list.

  This patch also merges some vector comparison, shift and rotation expands
for V1TI and the other vector integer modes, as they have similar patterns.
The V1TI-only expands are removed.
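
For reference, a hedged sketch (not one of the new testcases) of the kind
of comparison this improves; with the patch it is expected to map to a
single Power10 vector quadword compare instead of scalar code:

typedef __int128 v1ti __attribute__ ((vector_size (16)));

v1ti
eq_v1ti (v1ti a, v1ti b)
{
  return a == b;
}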

   Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2022-05-24 Haochen Gui 

gcc/
PR target/103316
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Enable
gimple folding for RS6000_BIF_VCMPEQUT, RS6000_BIF_VCMPNET,
RS6000_BIF_CMPGE_1TI, RS6000_BIF_CMPGE_U1TI, RS6000_BIF_VCMPGTUT,
RS6000_BIF_VCMPGTST, RS6000_BIF_CMPLE_1TI, RS6000_BIF_CMPLE_U1TI.
* config/rs6000/vector.md (VEC_IC): New mode iterator.  Add support
for new Power10 V1TI instructions.
(vec_cmp): Set mode iterator to VEC_IC.
(vec_cmpu): Likewise.
(vector_nlt): Set mode iterator to VEC_IC.
(vector_nltv1ti): Remove.
(vector_gtu): Set mode iterator to VEC_IC.
(vector_gtuv1ti): Remove.
(vector_nltu): Set mode iterator to VEC_IC.
(vector_nltuv1ti): Remove.
(vector_geu): Set mode iterator to VEC_IC.
(vector_ngt): Likewise.
(vector_ngtv1ti): Remove.
(vector_ngtu): Set mode iterator to VEC_IC.
(vector_ngtuv1ti): Remove.
(vector_gtu__p): Set mode iterator to VEC_IC.
(vector_gtu_v1ti_p): Remove.
(vrotl3): Set mode iterator to VEC_IC.  Emit insns for V1TI.
(vrotlv1ti3): Remove.
(vashr3): Set mode iterator to VEC_IC.  Emit insns for V1TI.
(vashrv1ti3): Remove.

gcc/testsuite/
PR target/103316
* gcc.target/powerpc/pr103316.c: New.
* gcc.target/powerpc/fold-vec-cmp-int128.c: New.


patch.diff
diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index e925ba9fad9..b67f4e066a8 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2000,16 +2000,14 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 case RS6000_BIF_VCMPEQUH:
 case RS6000_BIF_VCMPEQUW:
 case RS6000_BIF_VCMPEQUD:
-/* We deliberately omit RS6000_BIF_VCMPEQUT for now, because gimple
-   folding produces worse code for 128-bit compares.  */
+case RS6000_BIF_VCMPEQUT:
   fold_compare_helper (gsi, EQ_EXPR, stmt);
   return true;

 case RS6000_BIF_VCMPNEB:
 case RS6000_BIF_VCMPNEH:
 case RS6000_BIF_VCMPNEW:
-/* We deliberately omit RS6000_BIF_VCMPNET for now, because gimple
-   folding produces worse code for 128-bit compares.  */
+case RS6000_BIF_VCMPNET:
   fold_compare_helper (gsi, NE_EXPR, stmt);
   return true;

@@ -2021,9 +2019,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 case RS6000_BIF_CMPGE_U4SI:
 case RS6000_BIF_CMPGE_2DI:
 case RS6000_BIF_CMPGE_U2DI:
-/* We deliberately omit RS6000_BIF_CMPGE_1TI and RS6000_BIF_CMPGE_U1TI
-   for now, because gimple folding produces worse code for 128-bit
-   compares.  */
+case RS6000_BIF_CMPGE_1TI:
+case RS6000_BIF_CMPGE_U1TI:
   fold_compare_helper (gsi, GE_EXPR, stmt);
   return true;

@@ -2035,9 +2032,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 case RS6000_BIF_VCMPGTUW:
 case RS6000_BIF_VCMPGTUD:
 case RS6000_BIF_VCMPGTSD:
-/* We deliberately omit RS6000_BIF_VCMPGTUT and RS6000_BIF_VCMPGTST
-   for now, because gimple folding produces worse code for 128-bit
-   compares.  */
+case RS6000_BIF_VCMPGTUT:
+case RS6000_BIF_VCMPGTST:
   fold_compare_helper (gsi, GT_EXPR, stmt);
   return true;

@@ -2049,9 +2045,8 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 case RS6000_BIF_CMPLE_U4SI:
 case RS6000_BIF_CMPLE_2DI:
 case RS6000_BIF_CMPLE_U2DI:
-/* We deliberately omit RS6000_BIF_CMPLE_1TI and RS6000_BIF_CMPLE_U1TI
-   for now, because gimple folding produces worse code for 128-bit
-   compares.  */
+case RS6000_BIF_CMPLE_1TI:
+case RS6000_BIF_CMPLE_U1TI:
   fold_compare_helper (gsi, LE_EXPR, stmt);
   return true;

diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 4d0797c48f8..a0d33d2f604 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -26,6 +26,9 @@
 ;; Vector int modes
 (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])

+;; Vector int modes for comparison, shift and rotation
+(define_mode_iterator VEC_IC [V16QI 

[r13-855 Regression] FAIL: gcc.target/i386/pr78794.c scan-assembler pandn on Linux/x86_64

2022-05-30 Thread skpandey--- via Gcc-patches
On Linux/x86_64,

43201f2c2173894bf7c423cad6da1c21567e06c0 is the first bad commit
commit 43201f2c2173894bf7c423cad6da1c21567e06c0
Author: Roger Sayle 
Date:   Mon May 30 21:20:09 2022 +0100

PR target/70321: Split double word equality/inequality after STV on x86.

caused

FAIL: gcc.target/i386/pr65105-5.c scan-assembler pandn
FAIL: gcc.target/i386/pr78794.c scan-assembler pandn

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r13-855/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr65105-5.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr65105-5.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr78794.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/pr78794.c --target_board='unix{-m32\ 
-march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at skpgkp2 at gmail dot com)


Re: [PATCH v3] DSE: Use the constant store source if possible

2022-05-30 Thread Jeff Law via Gcc-patches




On 5/30/2022 2:28 AM, Richard Sandiford wrote:

Jeff Law via Gcc-patches  writes:

On 5/29/2022 3:43 PM, H.J. Lu wrote:

On Sat, May 28, 2022 at 11:37 AM Jeff Law via Gcc-patches
 wrote:


On 5/26/2022 2:43 PM, H.J. Lu via Gcc-patches wrote:

On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote:

"H.J. Lu"  writes:

On Wed, May 25, 2022 at 12:30 AM Richard Sandiford
 wrote:

"H.J. Lu via Gcc-patches"  writes:

On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:

On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
 wrote:

When recording store for RTL dead store elimination, check if the source
register is set only once to a constant.  If yes, record the constant
as the store source.  It eliminates unrolled zero stores after memset 0
in a loop where a vector register is used as the zero store source.

gcc/

   PR rtl-optimization/105638
   * dse.cc (record_store): Use the constant source if the source
   register is set only once.

gcc/testsuite/

   PR rtl-optimization/105638
   * g++.target/i386/pr105638.C: New test.
---
gcc/dse.cc   | 19 ++
gcc/testsuite/g++.target/i386/pr105638.C | 44 
2 files changed, 63 insertions(+)
create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C

diff --git a/gcc/dse.cc b/gcc/dse.cc
index 30c11cee034..0433dd3d846 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)

 if (tem && CONSTANT_P (tem))
   const_rhs = tem;
+ else
+   {
+ /* If RHS is set only once to a constant, set CONST_RHS
+to the constant.  */
+ df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
+ if (def != nullptr
+ && !DF_REF_IS_ARTIFICIAL (def)
+ && !DF_REF_NEXT_REG (def))
+   {
+ rtx_insn *def_insn = DF_REF_INSN (def);
+ rtx def_body = PATTERN (def_insn);
+ if (GET_CODE (def_body) == SET)
+   {
+ rtx def_src = SET_SRC (def_body);
+ if (CONSTANT_P (def_src))
+   const_rhs = def_src;

doesn't DSE have its own tracking of stored values?  Shouldn't we

It tracks stored values only within the basic block.  When RTL loop
invariant motion hoists a constant initialization out of the loop into
a separate basic block, the constant store value becomes unknown
within the original basic block.


improve _that_ if it is not enough?  I also wonder if you need to

My patch extends DSE stored value tracking to include the constant which
is set only once in another basic block.


verify the SET isn't partial?


Here is the v2 patch to check that the constant is set by a non-partial
unconditional load.

OK for master?

Thanks.

H.J.
---
RTL DSE tracks redundant constant stores within a basic block.  When RTL
loop invariant motion hoists a constant initialization out of the loop
into a separate basic block, the constant store value becomes unknown
within the original basic block.  When recording store for RTL DSE, check
if the source register is set only once to a constant by a non-partial
unconditional load.  If yes, record the constant as the constant store
source.  It eliminates unrolled zero stores after memset 0 in a loop
where a vector register is used as the zero store source.
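
As a hedged illustration (not the actual g++.target/i386/pr105638.C
testcase) of the pattern described above, where the zeroing vector used by
the expanded memset is loop-invariant, gets hoisted out of the loop, and
DSE in the loop body then no longer sees a constant source:

struct S { long a, b, c, d; };

void
zero_all (struct S *p, int n)
{
  for (int i = 0; i < n; ++i)
    __builtin_memset (&p[i], 0, sizeof (struct S));
}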

gcc/

 PR rtl-optimization/105638
 * dse.cc (record_store): Use the constant source if the source
 register is set only once.

gcc/testsuite/

 PR rtl-optimization/105638
 * g++.target/i386/pr105638.C: New test.
---
gcc/dse.cc   | 22 
gcc/testsuite/g++.target/i386/pr105638.C | 44 
2 files changed, 66 insertions(+)
create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C

diff --git a/gcc/dse.cc b/gcc/dse.cc
index 30c11cee034..af8e88dac32 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)

   if (tem && CONSTANT_P (tem))
 const_rhs = tem;
+   else
+ {
+   /* If RHS is set only once to a constant, set CONST_RHS
+  to the constant.  */
+   df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
+   if (def != nullptr
+   && !DF_REF_IS_ARTIFICIAL (def)
+   && !(DF_REF_FLAGS (def)
+& (DF_REF_PARTIAL | DF_REF_CONDITIONAL))
+   && !DF_REF_NEXT_REG (def))

Can we introduce a helper for this?  There are already similar tests
in ira and loop-iv, and it seems a bit too complex to have to open-code
each time.

I can use find_single_def_src in loop-iv.cc:

/* If REGNO has a single definition, return its known value, otherwise return
  null.  */

rtx
find_single_def_src (unsigned int regno)

Yeah, reusing that sounds 

[PATCH] Use cxx11 abi in versioned namespace

2022-05-30 Thread François Dumont via Gcc-patches

Hi

Here is the patch to use cxx11 abi for the versioned namespace mode.

It still suffers from a side effect on the gdb pretty printers, with 3
errors like:


got: type = 
std::unique_ptrstd::__8::char_traits, std::__8::allocator >>[]>>[99]>

FAIL: libstdc++-prettyprinters/80276.cc whatis p4

However, once I apply the version bump, the errors are gone.

Let me know if it is ok to apply all at once or if you prefer to avoid 
the version bump.


François
diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 138bd58d86c..0fe8ce89986 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4449,12 +4449,16 @@ dnl
 AC_DEFUN([GLIBCXX_ENABLE_LIBSTDCXX_DUAL_ABI], [
   GLIBCXX_ENABLE(libstdcxx-dual-abi,$1,,[support two versions of std::string])
   if test x$enable_symvers = xgnu-versioned-namespace; then
-# gnu-versioned-namespace is incompatible with the dual ABI.
-enable_libstdcxx_dual_abi="no"
-  fi
-  if test x"$enable_libstdcxx_dual_abi" != xyes; then
+# gnu-versioned-namespace is incompatible with the dual ABI...
 AC_MSG_NOTICE([dual ABI is disabled])
-default_libstdcxx_abi="gcc4-compatible"
+enable_libstdcxx_dual_abi="no"
+# ... and use the cxx11 one.
+default_libstdcxx_abi="new"
+  else
+if test x"$enable_libstdcxx_dual_abi" != xyes; then
+  AC_MSG_NOTICE([dual ABI is disabled])
+  default_libstdcxx_abi="gcc4-compatible"
+fi
   fi
   GLIBCXX_CONDITIONAL(ENABLE_DUAL_ABI, test $enable_libstdcxx_dual_abi = yes)
 ])
diff --git a/libstdc++-v3/config/locale/dragonfly/monetary_members.cc b/libstdc++-v3/config/locale/dragonfly/monetary_members.cc
index 1265190dad9..5f91fd9b560 100644
--- a/libstdc++-v3/config/locale/dragonfly/monetary_members.cc
+++ b/libstdc++-v3/config/locale/dragonfly/monetary_members.cc
@@ -39,7 +39,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 // This file might be compiled twice, but we only want to define the members
 // of money_base once.
-#if ! _GLIBCXX_USE_CXX11_ABI
+#if ! _GLIBCXX_USE_CXX11_ABI || ! _GLIBCXX_USE_DUAL_ABI
 
   // Construct and return valid pattern consisting of some combination of:
   // space none symbol sign value
@@ -207,7 +207,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 return __ret;
   }
-#endif
+#endif // ! _GLIBCXX_USE_CXX11_ABI || ! _GLIBCXX_USE_DUAL_ABI
 
   template<>
 void
diff --git a/libstdc++-v3/config/locale/generic/monetary_members.cc b/libstdc++-v3/config/locale/generic/monetary_members.cc
index a778a6f402d..6b56e30a568 100644
--- a/libstdc++-v3/config/locale/generic/monetary_members.cc
+++ b/libstdc++-v3/config/locale/generic/monetary_members.cc
@@ -36,7 +36,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 // This file might be compiled twice, but we only want to define the members
 // of money_base once.
-#if ! _GLIBCXX_USE_CXX11_ABI
+#if ! _GLIBCXX_USE_CXX11_ABI || ! _GLIBCXX_USE_DUAL_ABI
 
   // Construct and return valid pattern consisting of some combination of:
   // space none symbol sign value
diff --git a/libstdc++-v3/config/locale/gnu/monetary_members.cc b/libstdc++-v3/config/locale/gnu/monetary_members.cc
index 556575363da..54be63c44d7 100644
--- a/libstdc++-v3/config/locale/gnu/monetary_members.cc
+++ b/libstdc++-v3/config/locale/gnu/monetary_members.cc
@@ -37,7 +37,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 // This file might be compiled twice, but we only want to define the members
 // of money_base once.
-#if ! _GLIBCXX_USE_CXX11_ABI
+#if ! _GLIBCXX_USE_CXX11_ABI || ! _GLIBCXX_USE_DUAL_ABI
 
   // Construct and return valid pattern consisting of some combination of:
   // space none symbol sign value
@@ -205,7 +205,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 return __ret;
   }
-#endif
+#endif // ! _GLIBCXX_USE_CXX11_ABI || ! _GLIBCXX_USE_DUAL_ABI
 
   extern char __narrow_multibyte_chars(const char* s, __locale_t cloc);
 
diff --git a/libstdc++-v3/config/locale/gnu/numeric_members.cc b/libstdc++-v3/config/locale/gnu/numeric_members.cc
index c714d6a544f..ae17cb76c81 100644
--- a/libstdc++-v3/config/locale/gnu/numeric_members.cc
+++ b/libstdc++-v3/config/locale/gnu/numeric_members.cc
@@ -39,7 +39,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   extern char __narrow_multibyte_chars(const char* s, __locale_t cloc);
 
 // This file might be compiled twice, but we only want to define this once.
-#if ! _GLIBCXX_USE_CXX11_ABI
+#if ! _GLIBCXX_USE_CXX11_ABI || ! _GLIBCXX_USE_DUAL_ABI
   char
   __narrow_multibyte_chars(const char* s, __locale_t cloc)
   {
@@ -84,7 +84,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 return '\0';
   }
-#endif
+#endif // ! _GLIBCXX_USE_CXX11_ABI || ! _GLIBCXX_USE_DUAL_ABI
 
   template<>
 void
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 9b94fd71e42..d3c4106a74a 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -75217,13 +75217,18 @@ fi
 
 
   if test x$enable_symvers = xgnu-versioned-namespace; then
-# gnu-versioned-namespace is incompatible with the dual ABI.
-

[PATCH] Fortran: improve runtime error message with ALLOCATE and, ERRMSG= [PR91300]

2022-05-30 Thread Harald Anlauf via Gcc-patches

Hi Tobias,

Am 30.05.22 um 09:33 schrieb Tobias Burnus:

On 28.05.22 22:25, Harald Anlauf via Fortran wrote:

the PR rightfully complained that we did not differentiate errors on
ALLOCATE(...,stat=,errmsg=) for failures from allocation of already
allocated objects or insufficient virtual memory.

It is even worse: The error message states the wrong reason.

The attached patch introduces a new STAT value LIBERROR_NO_MEMORY
that is returned for insufficient virtual memory, and a corresponding
(simple and invariant) ERRMSG: "Insufficient virtual memory".

I think the message is fine – at least on Linux 'virtual memory' is
what's used and it is clear what's meant, even if I stumble a bit over
the wording. (But I do not have a crisp, short & comprehensive wording
myself.)


for reference these are the messages of selected compilers:

ifort: insufficient virtual memory
nag: Out of memory
crayftn: The program was unable to request more memory space.

And since Intel's message for attempting to allocate an already
allocated object was closest to gfortran's version, and Cray is
far too verbose for my taste, I threw mental dice between Intel
and NAG, and Intel won.


(In the PR Janne mentions checking for errno, but since the standard
malloc(3) only mentions ENOMEM as possible error value, and the user
may replace malloc by a special library, I believe that won't be easy
to handle in a general way.)

I concur.  Especially as POSIX and the Linux manpage list ENOMEM
as the only value.

Most compilers I tried (Intel/NAG/Crayftn) behave similarly, except
for Nvidia/Flang, which try to return the size of the allocation in
the error message.

I am not sure that this is worth the effort.

I think it is not needed – especially as we generate the message in the
FE and not in libgfortran. The advantage for the users is that they know
what value has been requested – but they cannot really do much with the
knowledge, either.


Thanks for confirming this.


The testcase should be able to handle 32 and 64 bit systems.
At least that's what I think.


I hope it is – at least on 64bit, I currently expect it to fail
somewhat reliably; with 32bit I think it won't – but that's also handled.

But you could add a -fdump-tree-original + '! { dg-final {
scan-tree-dump*' to do some checking in addition (e.g. whether both
strings appear in the dump; could be more complex, but that's probably
not needed).


Regtested on x86_64-pc-linux-gnu.  OK for mainline?  Suggestions?


LGTM – with considering comments on the testcase.



Fortran: improve runtime error message with ALLOCATE and ERRMSG=


Consider appending [PR91300] in that line – and try to make the
gcc-patches@ and fortran@ lines the same

(Searching for the PR number or case-insensitively for the string in
the mailing list archive will find the message; thus, that's okay.)


OK, will do from now on.  My visual parsing and reading ability of
subject lines is not really positively correlated with their machine-
readability, but then gcc-patches@ is not what I'm reading... ;-)
(I consider it basically a write-only list).


ALLOCATE: generate different STAT,ERRMSG results for failures from
allocation of already allocated objects or insufficient virtual memory.

gcc/fortran/ChangeLog:

  PR fortran/91300
  * libgfortran.h: Define new error code LIBERROR_NO_MEMORY.
  * trans-stmt.cc (gfc_trans_allocate): Generate code for setting
  ERRMSG depending on the STAT result of ALLOCATE.
  * trans.cc (gfc_allocate_using_malloc): Use STAT value of
  LIBERROR_NO_MEMORY in case of failed malloc.

gcc/testsuite/ChangeLog:

  PR fortran/91300
  * gfortran.dg/allocate_alloc_opt_15.f90: New test.
---

...


+  stat1   = -1
+  errmsg1 = ""
+  allocate (array(1), stat=stat1, errmsg=errmsg1)
+  if (stat1   /=  0) stop 1
+  if (errmsg1 /= "") stop 1

Consider initializing errmsg1 and then checking that it has not been
touched. (For completeness, I think we already have such tests.)

+  ! Obtain stat,errmsg for attempt to allocate an allocated object
+  allocate (array(1), stat=stat1, errmsg=errmsg1)
+  if (stat1   ==  0) stop 2
+  if (errmsg1 == "") stop 2

Consider checking (either here or as a third test) for the
gfortran-specific error message.

+  stat2   = -1
+  errmsg2 = ""
+  ! Try to allocate very large object
+  allocate (bigarray(bignumber), stat=stat2, errmsg=errmsg2)
+  if (stat2 /= 0) then
+ print *, "stat  =", stat1
+ print *, "errmsg: ", trim (errmsg1)
+ print *, "stat  =", stat2
+ print *, "errmsg: ", trim (errmsg2)
+ ! Ensure different results for stat, errmsg variables
+ if (stat2   == stat1 ) stop 3
+ if (errmsg2 == "" .or. errmsg2 == errmsg1) stop 4


Likewise for errmsg2


I've adjusted the testcase as suggested and hardened it somewhat
against strange options like -fdefault-integer-8 -fdefault-real-8.
Committed and pushed as attached.

Thanks for the review!

Harald


Thanks,

Tobias

-

Re: [x86 PATCH] Allow SCmode and DImode to be tieable on TARGET_64BIT.

2022-05-30 Thread Uros Bizjak via Gcc-patches
On Mon, May 30, 2022 at 10:12 PM Uros Bizjak  wrote:
>
> On Mon, May 30, 2022 at 3:22 PM Roger Sayle  
> wrote:
> >
> >
> > This patch is a form of insurance policy in case my patch for PR 7061 runs
> > into problems on non-x86 targets; the middle-end can add an extra check
> > that the backend is happy placing SCmode and DImode values in the same
> > register, before creating a SUBREG.  Unfortunately, ix86_modes_tieable_p
> > currently claims this is not allowed(?), even though the default target
> > hook for modes_tieable_p is to always return true [i.e. false can be
> > used to specifically prohibit bad combinations], and the x86_64 ABI
> > passes SCmode values in DImode registers!  This makes the backend's
> > modes_tieable_p hook a little more forgiving, and additionally enables
> > interconversion between SCmode and V2SFmode, and between DCmode and
> > V2DFmode, which opens interesting opportunities in the future.
> >
> > I believe there should currently be no code generation differences
> > with this change.  This patch has been tested on x86_64-pc-linux-gnu
> > with make bootstrap and make -k check, both with and without
> > --target_board=unix{-m32}, with no new failures.  Ok for mainline?
> >
> >
> > 2022-05-30  Roger Sayle  
> >
> > gcc/ChangeLog
> > * config/i386/i386.cc (ix86_modes_tieable_p): Allow SCmode to be
> > tieable with DImode on TARGET_64BIT, and SCmode tieable with
> > V2SFmode, and DCmode with V2DFmode.
>
> I *think* this is OK, but hard to say for sure without some testcases.
> Please note that x86_64 ABI passes SDmode in two separate XMM
> registers.

I meant DCmode here.

Uros.


Re: [x86 PATCH] Allow SCmode and DImode to be tieable on TARGET_64BIT.

2022-05-30 Thread Uros Bizjak via Gcc-patches
On Mon, May 30, 2022 at 3:22 PM Roger Sayle  wrote:
>
>
> This patch is a form of insurance policy in case my patch for PR 7061 runs
> into problems on non-x86 targets; the middle-end can add an extra check
> that the backend is happy placing SCmode and DImode values in the same
> register, before creating a SUBREG.  Unfortunately, ix86_modes_tieable_p
> currently claims this is not allowed(?), even though the default target
> hook for modes_tieable_p is to always return true [i.e. false can be
> used to specifically prohibit bad combinations], and the x86_64 ABI
> passes SCmode values in DImode registers!  This makes the backend's
> modes_tieable_p hook a little more forgiving, and additionally enables
> interconversion between SCmode and V2SFmode, and between DCmode and
> V2DFmode, which opens interesting opportunities in the future.
>
> I believe there should currently be no code generation differences
> with this change.  This patch has been tested on x86_64-pc-linux-gnu
> with make bootstrap and make -k check, both with and without
> --target_board=unix{-m32}, with no new failures.  Ok for mainline?
>
>
> 2022-05-30  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.cc (ix86_modes_tieable_p): Allow SCmode to be
> tieable with DImode on TARGET_64BIT, and SCmode tieable with
> V2SFmode, and DCmode with V2DFmode.

I *think* this is OK, but hard to say for sure without some testcases.
Please note that x86_64 ABI passes SDmode in two separate XMM
registers.

Uros.
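
For reference, a hedged sketch of just the additional tieable pairs the
ChangeLog above names (the real ix86_modes_tieable_p in i386.cc handles
many more mode combinations, so this is illustrative only):

/* Hedged sketch, not the committed hunk.  */
static bool
new_tieable_pairs_p (machine_mode mode1, machine_mode mode2)
{
  if ((mode1 == SCmode && mode2 == DImode)
      || (mode1 == DImode && mode2 == SCmode))
    return TARGET_64BIT;
  if ((mode1 == SCmode && mode2 == V2SFmode)
      || (mode1 == V2SFmode && mode2 == SCmode)
      || (mode1 == DCmode && mode2 == V2DFmode)
      || (mode1 == V2DFmode && mode2 == DCmode))
    return true;
  return false;
}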


Re: [x86 PATCH] PR rtl-optimization/101617: Use neg/sbb in ix86_expand_int_movcc.

2022-05-30 Thread Uros Bizjak via Gcc-patches
On Mon, May 30, 2022 at 7:50 PM Roger Sayle  wrote:
>
>
> This patch resolves PR rtl-optimization/101617 where we should generate
> the exact same code for (X ? -1 : 1) as we do for ((X ? -1 : 0) | 1).
> The cause of the current difference on x86_64 is actually in
> ix86_expand_int_movcc that doesn't know that negl;sbbl can be used
> to create a -1/0 result depending on whether the input is zero/nonzero.
>
> So for Andrew Pinski's test case:
>
> int f1(int i)
> {
>   return i ? -1 : 1;
> }
>
> GCC currently generates:
>
> f1: cmpl    $1, %edi
>     sbbl    %eax, %eax  // x ? 0 : -1
>     andl    $2, %eax    // x ? 0 : 2
>     subl    $1, %eax    // x ? -1 : 1
> ret
>
> but with the attached patch, now generates:
>
> f1: negl    %edi
>     sbbl    %eax, %eax  // x ? -1 : 0
>     orl     $1, %eax    // x ? -1 : 1
> ret
>
> To implement this I needed to add two expanders to i386.md to generate
> the required instructions (in both SImode and DImode) matching the
> pre-existing define_insns of the same name.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-05-30  Roger Sayle  
>
> gcc/ChangeLog
> PR rtl-optimization/101617
> * config/i386/i386-expand.cc (ix86_expand_int_movcc): Add a
> special case (indicated by negate_cc_compare_p) to generate a
> -1/0 mask using neg;sbb.
> * config/i386/i386.md (x86_neg_ccc): New define_expand
> to generate an *x86_neg_ccc instruction.
> (x86_movcc_0_m1_neg): Likewise, a new define_expand to
> generate a *x86_movcc_0_m1_neg instruction.
>
> gcc/testsuite/ChangeLog
> PR rtl-optimization/101617
> * gcc.target/i386/pr101617.c: New test case.

LGTM.

Thanks,
Uros.


[PATCH] i386: Remove constraints when used with constant integer predicates, take 2

2022-05-30 Thread Uros Bizjak via Gcc-patches
i386: Remove constraints when used with constant integer predicates, take 2

const_int_operand and other const*_operand predicates do not need
constraints when the constraint is inherited from the range of the
constant integer predicate.  Remove the constraint in case all
alternatives use the same inherited constraint.

However, when an operand is commutative with a non-constant operand,
it effectively matches e.g. nonimmediate_operand|const_int_operand
rather than just const_int_operand.  We should keep the constraint for
const_int_operands that are in a commutative (%) pair.  See PR 105624.

2022-05-30  Uroš Bizjak  

gcc/ChangeLog:

* config/i386/i386.md: Remove constraints when used with
const_int_operand, const0_operand, const_1_operand, constm1_operand,
const8_operand, const128_operand, const248_operand, const123_operand,
const2367_operand, const1248_operand, const359_operand,
const_4_or_8_to_11_operand, const48_operand, const_0_to_1_operand,
const_0_to_3_operand, const_0_to_4_operand, const_0_to_5_operand,
const_0_to_7_operand, const_0_to_15_operand, const_0_to_31_operand,
const_0_to_63_operand, const_0_to_127_operand, const_0_to_255_operand,
const_0_to_255_mul_8_operand, const_1_to_31_operand,
const_1_to_63_operand, const_2_to_3_operand, const_4_to_5_operand,
const_4_to_7_operand, const_6_to_7_operand, const_8_to_9_operand,
const_8_to_11_operand, const_8_to_15_operand, const_10_to_11_operand,
const_12_to_13_operand, const_12_to_15_operand, const_14_to_15_operand,
const_16_to_19_operand, const_16_to_31_operand, const_20_to_23_operand,
const_24_to_27_operand and const_28_to_31_operand.
* config/i386/mmx.md: Ditto.
* config/i386/sse.md: Ditto.
* config/i386/subst.md: Ditto.
* config/i386/sync.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr105624.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.


p.diff.txt.gz
Description: application/gzip


[x86 PATCH] PR rtl-optimization/101617: Use neg/sbb in ix86_expand_int_movcc.

2022-05-30 Thread Roger Sayle

This patch resolves PR rtl-optimization/101617 where we should generate
the exact same code for (X ? -1 : 1) as we do for ((X ? -1 : 0) | 1).
The cause of the current difference on x86_64 is actually in
ix86_expand_int_movcc that doesn't know that negl;sbbl can be used
to create a -1/0 result depending on whether the input is zero/nonzero.

So for Andrew Pinski's test case:

int f1(int i)
{
  return i ? -1 : 1;
}

GCC currently generates:

f1: cmpl    $1, %edi
    sbbl    %eax, %eax  // x ? 0 : -1
    andl    $2, %eax    // x ? 0 : 2
    subl    $1, %eax    // x ? -1 : 1
ret

but with the attached patch, now generates:

f1: negl    %edi
    sbbl    %eax, %eax  // x ? -1 : 0
    orl     $1, %eax    // x ? -1 : 1
ret

To implement this I needed to add two expanders to i386.md to generate
the required instructions (in both SImode and DImode) matching the
pre-existing define_insns of the same name.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-05-30  Roger Sayle  

gcc/ChangeLog
PR rtl-optimization/101617
* config/i386/i386-expand.cc (ix86_expand_int_movcc): Add a
special case (indicated by negate_cc_compare_p) to generate a
-1/0 mask using neg;sbb.
* config/i386/i386.md (x86_neg_ccc): New define_expand
to generate an *x86_neg_ccc instruction.
(x86_movcc_0_m1_neg): Likewise, a new define_expand to
generate a *x86_movcc_0_m1_neg instruction.

gcc/testsuite/ChangeLog
PR rtl-optimization/101617
* gcc.target/i386/pr101617.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 5cd7b99..36f4698 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -3142,6 +3142,7 @@ ix86_expand_int_movcc (rtx operands[])
   rtx compare_op;
   machine_mode mode = GET_MODE (operands[0]);
   bool sign_bit_compare_p = false;
+  bool negate_cc_compare_p = false;
   rtx op0 = XEXP (operands[1], 0);
   rtx op1 = XEXP (operands[1], 1);
   rtx op2 = operands[2];
@@ -3188,16 +3189,48 @@ ix86_expand_int_movcc (rtx operands[])
   HOST_WIDE_INT cf = INTVAL (op3);
   HOST_WIDE_INT diff;
 
+  if ((mode == SImode
+  || (TARGET_64BIT && mode == DImode))
+ && (GET_MODE (op0) == SImode
+ || (TARGET_64BIT && GET_MODE (op0) == DImode)))
+   {
+ /* Special case x != 0 ? -1 : y.  */
+ if (code == NE && op1 == const0_rtx && ct == -1)
+   {
+ negate_cc_compare_p = true;
+ std::swap (ct, cf);
+ code = EQ;
+   }
+ else if (code == EQ && op1 == const0_rtx && cf == -1)
+   negate_cc_compare_p = true;
+   }
+
   diff = ct - cf;
   /*  Sign bit compares are better done using shifts than we do by using
  sbb.  */
   if (sign_bit_compare_p
+ || negate_cc_compare_p
	  || ix86_expand_carry_flag_compare (code, op0, op1, &compare_op))
{
  /* Detect overlap between destination and compare sources.  */
  rtx tmp = out;
 
-  if (!sign_bit_compare_p)
+ if (negate_cc_compare_p)
+   {
+ if (GET_MODE (op0) == DImode)
+   emit_insn (gen_x86_negdi_ccc (gen_reg_rtx (DImode), op0));
+ else
+   emit_insn (gen_x86_negsi_ccc (gen_reg_rtx (SImode),
+ gen_lowpart (SImode, op0)));
+
+ tmp = gen_reg_rtx (mode);
+ if (mode == DImode)
+   emit_insn (gen_x86_movdicc_0_m1_neg (tmp));
+ else
+   emit_insn (gen_x86_movsicc_0_m1_neg (gen_lowpart (SImode,
+ tmp)));
+   }
+ else if (!sign_bit_compare_p)
{
  rtx flags;
  bool fpcmp = false;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 602dfa7..370df74 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11189,6 +11189,14 @@
   [(set_attr "type" "negnot")
(set_attr "mode" "")])
 
+(define_expand "x86_neg_ccc"
+  [(parallel
+[(set (reg:CCC FLAGS_REG)
+ (ne:CCC (match_operand:SWI48 1 "register_operand")
+ (const_int 0)))
+ (set (match_operand:SWI48 0 "register_operand")
+ (neg:SWI48 (match_dup 1)))])])
+
 (define_insn "*negqi_ext_2"
   [(set (zero_extract:SWI248
  (match_operand:SWI248 0 "register_operand" "+Q")
@@ -20700,6 +20708,12 @@
(set_attr "mode" "")
(set_attr "length_immediate" "0")])
 
+(define_expand "x86_movcc_0_m1_neg"
+  [(parallel
+[(set (match_operand:SWI48 0 "register_operand")
+ (neg:SWI48 (ltu:SWI48 (reg:CCC FLAGS_REG) (const_int 0
+ (clobber (reg:CC 

Re: [PATCH, OpenMP, v2] Implement uses_allocators clause for target regions

2022-05-30 Thread Jakub Jelinek via Gcc-patches
On Mon, May 30, 2022 at 10:43:30PM +0800, Chung-Lin Tang wrote:
> > This feels like you only accept a single allocator in the new syntax,
> > but that isn't my reading of the spec, I'd understand it as:
> > uses_allocators (memspace(omp_high_bw_mem_space), traits(foo_traits) : bar, 
> > baz, qux)
> > being valid too.
> 
> This patch now allows multiple allocators to be specified in new syntax, 
> although I have
> to note that the 5.2 specification of uses_allocators (page 181) specifically 
> says
> "allocator: expression of allocator_handle_type" for the "Arguments" 
> description,
> not a "list" like the allocate clause.

I guess this should be raised on omp-lang then, as to what we really want.
Because the 5.1 syntax definitely allowed multiple allocators.
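
For concreteness, a hedged sketch in C of the two clause forms under
discussion (illustrative names; whether a list is allowed after the colon
in the 5.2 form is exactly the question for omp-lang):

#include <omp.h>

const omp_alloctrait_t my_traits[] = { { omp_atk_alignment, 64 } };

void
f (void)
{
  omp_allocator_handle_t a;

  /* 5.0/5.1 form: a non-predefined allocator carries its traits array.  */
  #pragma omp target uses_allocators (a (my_traits))
  { /* ... omp_alloc (n, a) ... */ }

  /* 5.2 form: modifiers before the colon, allocator(s) after it.  */
  #pragma omp target uses_allocators (traits (my_traits) : a)
  { /* ... */ }
}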

> > It should be only removed if we emit the error (again with break; too).
> > IMHO (see the other mail) we should complain here if it has value 0
> > (the omp_null_allocator case), dunno if we should error or just warn
> > if the value is outside of the range of known predefined identifiers (1..8
> > currently I think). > But, otherwise, IMHO we need to keep it around, 
> > perhaps replacing the
> > CONST_DECL with INTEGER_CST, for the purposes of checking what predefined
> > allocators are used in the region.
> 
> omp_alloc in libgomp does handle the omp_null_allocator case, by converting it
> to something else.

Sure, but the spec says that omp_alloc (42, omp_null_allocator) is invalid
in target regions unless requires dynamic_allocators is seen first.
And uses_allocators (omp_null_allocator) shouldn't make that valid.
> @@ -15651,6 +15653,199 @@ c_parser_omp_clause_allocate (c_parser *parser, 
> tree list)
>return nl;
>  }
>  
> +/* OpenMP 5.0:
> +   uses_allocators ( allocator-list )
> +
> +   allocator-list:
> +   allocator
> +   allocator , allocator-list
> +   allocator ( traits-array )
> +   allocator ( traits-array ) , allocator-list
> +
> +   OpenMP 5.2:
> +
> +   uses_allocators ( modifier : allocator-list )
> +   uses_allocators ( modifier , modifier : allocator-list )
> +
> +   modifier:
> +   traits ( traits-array )
> +   memspace ( mem-space-handle )  */
> +
> +static tree
> +c_parser_omp_clause_uses_allocators (c_parser *parser, tree list)
> +{
> +  location_t clause_loc = c_parser_peek_token (parser)->location;
> +  tree t = NULL_TREE, nl = list;
> +  matching_parens parens;
> +  if (!parens.require_open (parser))
> +return list;
> +
> +  tree memspace_expr = NULL_TREE;
> +  tree traits_var = NULL_TREE;
> +
> +  struct item_tok
> +  {
> +location_t loc;
> +tree id;
> +item_tok (void) : loc (UNKNOWN_LOCATION), id (NULL_TREE) {}
> +  };
> +  struct item { item_tok name, arg; };
> +  auto_vec *modifiers = NULL, *allocators = NULL;
> +  auto_vec *cur_list = new auto_vec (4);

Each vec/auto_vec with a new type brings quite some overhead;
a lot of functions need to be instantiated for it.
I think it would be far easier to use raw token parsing:
  unsigned int pos = 1;
  bool has_modifiers = false;
  while (true)
{
  c_token *tok = c_parser_peek_nth_token_raw (parser, pos);
  if (tok->type != CPP_NAME)
break;
  ++pos;
  if (c_parser_peek_nth_token_raw (parser, pos + 1)->type
  == CPP_OPEN_PAREN)
{
  ++pos;
  if (!c_parser_check_balanced_raw_token_sequence (parser, &pos)
  || c_parser_peek_nth_token_raw (parser, pos)->type
 != CPP_CLOSE_PAREN)
break;
  ++pos;
}
  tok = c_parser_peek_nth_token_raw (parser, pos);
  if (tok->type == CPP_COLON)
{
  has_modifiers = true;
  break;
}
  if (tok->type != CPP_COMMA)
break;
  ++pos;
}
This should (haven't tested it though, so sorry if there are errors)
cheaply determine whether one should or shouldn't parse modifiers;
one can then just parse the modifiers if (has_modifiers) and
then just the list (with ()s inside the list only if (!has_modifiers)).
> @@ -21093,7 +21292,8 @@ c_parser_omp_target_exit_data (location_t loc, 
> c_parser *parser,
>   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IN_REDUCTION) \
>   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_THREAD_LIMIT) \
>   | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_IS_DEVICE_PTR)\
> - | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR))
> + | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_HAS_DEVICE_ADDR)\

Can you please fix up both the IS_DEVICE_PTR and newly added
HAS_DEVICE_ADDR change to have a space before \ ?

> + | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_USES_ALLOCATORS))

> +static tree
> +cp_parser_omp_clause_uses_allocators (cp_parser *parser, tree list)
> +{
> +  location_t clause_loc
> += cp_lexer_peek_token (parser->lexer)->location;
> +  tree t = NULL_TREE, nl = list;
> +  matching_parens parens;
> +  if (!parens.require_open (parser))
> +return list;
> +
> +  tree memspace_expr = NULL_TREE;
> +  tree traits_var = NULL_TREE;
> +
> 

Re: [PATCH] libgompd: Add ompd_get/rel_display_control_vars

2022-05-30 Thread Jakub Jelinek via Gcc-patches
On Fri, May 27, 2022 at 02:04:11AM +0200, Mohamed Atef wrote:
> libgomp/ChangeLog
> 
> 2022-05-27  Mohamed Atef  
> 
> * libgompd.map (ompd_get_display_control_vars,
> ompd_rel_display_control_vars): New global symbol versions.
> * env.c: (gompd_buffer, gompd_env_buff_size): New Variables.
> (dump_icvs): New function.
> (initialize_env): call dump_icvs.
> * ompd-icv.c: (ompd_get_display_control_vars): New function.
> (ompd_rel_display_control_vars): New function.

I don't like this solution.  I know LLVM libomp does it this way,
but that adds complexity to the libgomp library to make it easier
for libgompd, and even worse further bloats the .data section of
every OpenMP program even when debugging isn't enabled (and not just a tiny
bit, but by sizeof (char *) + sizeof (unsigned long) + 500 bytes, which is
significant).

This really should be done on the libgompd side, instead of copying
the gompd_buffer you've added, it should instead copy
gomp_global_icv and whatever else is necessary to print it, then
do that in libgompd.

> diff --git a/libgomp/env.c b/libgomp/env.c
> index 243c6267ef9..173c9271303 100644
> --- a/libgomp/env.c
> +++ b/libgomp/env.c
> @@ -91,6 +91,8 @@ unsigned long gomp_places_list_len;
>  uintptr_t gomp_def_allocator = omp_default_mem_alloc;
>  int gomp_debug_var;
>  int gompd_enabled;
> +char *gompd_buffer;
> +unsigned long gompd_env_buff_size;
>  unsigned int gomp_num_teams_var;
>  int gomp_nteams_var;
>  int gomp_teams_thread_limit_var;
> @@ -1453,6 +1455,187 @@ omp_display_env (int verbose)
>  }
>  ialias (omp_display_env)
>  
> +/* This function dumps all global ICVs into a buffer
> +   in the form "icv-name=icv-value\n", so that OMPD can read the
> +   buffer and display all icvs.  */
> +
> +static void
> +dump_icvs (void)
> +{
> +  static char temp_buffer[500];
> +  char temp_num_str[20];
> +  strcat (temp_buffer, "OMP_DYNAMIC=");
> +  strcat (temp_buffer, gomp_global_icv.dyn_var ? "TRUE\n" : "FALSE\n");
> +  strcat (temp_buffer, "OMP_NESTED=");
> +  strcat (temp_buffer,
> +  gomp_global_icv.max_active_levels_var > 1 ? "TRUE\n" : "FALSE\n");
> +  strcat (temp_buffer, "OMP_NUM_THREADS=");
> +  sprintf (temp_num_str, "%lu\n", gomp_global_icv.nthreads_var);
> +  strcat (temp_buffer, temp_num_str);
> +  strcat (temp_buffer, "OMP_SCHEDULE=");

This is terribly inefficient even when done that way on the libgompd side.
You really don't want every further strcat to walk all the already
previously stored bytes to find the end, and you don't want to hardcode
the size of the buffer, see
https://www.gnu.org/prep/standards/standards.html#Writing-Robust-Programs

So, the buffer should be dynamically allocated, and you should at all times
know how many bytes have been already stored.  One possibility is to
call a printing function twice, once with say NULL as the buffer which will
be a mode in which it just computes the length of each addition and just
returns the total length, then allocate and then pass non-NULL buffer
into which it will actually store it.
Or you could keep some extra bytes allocated in the buffer (longer than,
say, the longest env variable name, an '=' char, the largest representable
long long number and a '\n' char, plus a few bytes extra) and keep checking,
before adding any var that is guaranteed to fit into those extra bytes,
whether there are at least that many bytes left in the buffer, reallocating
otherwise.  But for vars that can have arbitrarily long results (e.g. vars
with various lists), such checking needs to be done before adding any
partial result that can fit.
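
For illustration, a hedged sketch of the first (two-pass) approach, with
purely illustrative names and just two of the variables:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Append "NAME=VAL\n" at POS when BUF is non-NULL; always return the
   number of bytes this entry needs.  */
static size_t
dump_one (char *buf, size_t pos, const char *name, const char *val)
{
  if (buf)
    sprintf (buf + pos, "%s=%s\n", name, val);
  return strlen (name) + strlen (val) + 2;
}

/* One formatting routine serves both passes.  */
static size_t
dump_icvs_buf (char *buf, int dyn_var, unsigned long nthreads_var)
{
  char num[24];
  size_t pos = 0;
  pos += dump_one (buf, pos, "OMP_DYNAMIC", dyn_var ? "TRUE" : "FALSE");
  snprintf (num, sizeof num, "%lu", nthreads_var);
  pos += dump_one (buf, pos, "OMP_NUM_THREADS", num);
  return pos;
}

void
example (void)
{
  /* First pass computes the length, second pass fills the buffer.  */
  size_t len = dump_icvs_buf (NULL, 1, 8);
  char *buf = (char *) malloc (len + 1);
  if (buf)
    dump_icvs_buf (buf, 1, 8);
  free (buf);
}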

Jakub



Re: [PATCH 2/5] Implement generic range temporaries.

2022-05-30 Thread Andrew MacLeod via Gcc-patches

On 5/30/22 09:27, Aldy Hernandez wrote:

Now that we have generic ranges, we need a way to define generic local
temporaries on the stack for intermediate calculations in the ranger
and elsewhere.  We need temporaries analogous to int_range_max, but
for any of the supported types (currently just integers, but soon
integers, pointers, and floats).

The tmp_range object is such a temporary.  It is designed to be
transparently used as a vrange.  It shares vrange's abstract API, and
implicitly casts itself to a vrange when passed around.

The ultimate name will be value_range, but we need to remove legacy
first for that to happen.  Until then, tmp_range will do.
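
For concreteness, a hedged usage sketch (assuming, per the description, a
constructor taking the tree type and implicit conversion to vrange &):

  // Hedged sketch only: compute the range of an SSA name of any
  // supported type into a stack temporary.
  tmp_range r (TREE_TYPE (name));
  if (!ranger->range_of_expr (r, name, stmt))
    return false;
  if (r.undefined_p () || r.varying_p ())
    return false;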

I was going to suggest maybe renaming value_range to legacy_range or
something, and then start using value_range for ranges of any type.
Then it occurred to me that numerous places which use value_range
will/can continue to use value_range going forward, i.e.


value_range vr;
 if (!rvals->range_of_expr (vr, name, stmt))
   return -1;

would be unaffected, so it would be pointless turmoil to rename that and
then rename it back to value_range.


I also notice there are already a few instances of local variables named
tmp_range, which makes renaming annoying.   Perhaps we should use
Value_Range or something like that in the interim for the multi-type
ranges?   Then the rename is trivial down the road, formatting will be
unaffected, and then we're kinda sorta using the end-goal name?


Andrew



Re: [PATCH] Make the default rtx_costs of MULT/DIV variants consistent.

2022-05-30 Thread Jeff Law via Gcc-patches




On 5/30/2022 7:46 AM, Roger Sayle wrote:

GCC's middle-end provides a default cost model for RTL expressions, for
backends that don't specify their own instruction timings, that can be
summarized as multiplications are COSTS_N_INSNS(4), divisions are
COSTS_N_INSNS(7) and all other operations are COSTS_N_INSNS(1).
This patch tweaks the above definition so that fused-multiply-add
(FMA) and high-part multiplications cost the same as regular
multiplications,
or more importantly aren't (by default) considered less expensive.  Likewise
the saturating forms of multiplication and division cost the same as the
regular variants.  These values can always be changed by the target, but
the goal is to avoid RTL expansion substituting a suitable operation with
its saturating equivalent because it (accidentally) looks much cheaper.
For example, PR 89845 is about implementing division/modulus via highpart
multiply, which may accidentally look extremely cheap.

I believe there should be no code generation changes for this patch,
but of course I'm happy to address any adverse changes on rare targets.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-05-30  Roger Sayle  

gcc/ChangeLog
 * rtlanal.cc (rtx_cost) : Treat FMA, SS_MULT, US_MULT,
 SMUL_HIGHPART and UMUL_HIGHPART as having the same cost as MULT.
 : Likewise, SS_DIV and US_DIV have the same default as DIV.

OK.

Jeff


Re: [PATCH, OpenMP, v2] Implement uses_allocators clause for target regions

2022-05-30 Thread Chung-Lin Tang

Hi Jakub,
this is v3 of the uses_allocators patch.

On 2022/5/20 1:46 AM, Jakub Jelinek wrote:

On Tue, May 10, 2022 at 07:29:23PM +0800, Chung-Lin Tang wrote:

@@ -15624,6 +15626,233 @@ c_parser_omp_clause_allocate (c_parser *parser, tree 
list)
return nl;
  }
  
+/* OpenMP 5.2:

+   uses_allocators ( allocator-list )


As uses_allocators is a 5.0 feature already, the above should say
/* OpenMP 5.0:

+
+   allocator-list:
+   allocator
+   allocator , allocator-list
+   allocator ( traits-array )
+   allocator ( traits-array ) , allocator-list
+


And here it should add
   OpenMP 5.2:


Done.


+  if (c_parser_next_token_is (parser, CPP_NAME))
+{
+  c_token *tok = c_parser_peek_token (parser);
+  const char *p = IDENTIFIER_POINTER (tok->value);
+
+  if (strcmp ("traits", p) == 0 || strcmp ("memspace", p) == 0)
+   {
+ has_modifiers = true;
+ c_parser_consume_token (parser);
+ matching_parens parens2;;


Double ;;, should be just ;
But more importantly, it is more complex.
When you see
uses_allocators(traits or
uses_allocators(memspace
it is not given that it has modifiers.  While the 5.0/5.1 syntax had
a restriction that when allocator is not a predefined allocator (and
traits or memspace aren't predefined allocators) it must use ()s with
traits, so
uses_allocators(traits)
uses_allocators(memspace)
uses_allocators(traits,memspace)
are all invalid,
omp_allocator_handle_t traits;
uses_allocators(traits(mytraits))
or
omp_allocator_handle_t memspace;
uses_allocators(memspace(mytraits),omp_default_mem_alloc)
are valid in the old syntax.

So, I'm afraid to find out if the traits or memspace identifier
seen after uses_allocator ( are modifiers or not we need to
peek (for C with c_parser_peek_nth_token_raw) through all the
modifiers whether we see a : and only in that case say they
are modifiers rather than the old style syntax.


The parser parts have been rewritten to allow this kind of use now.
The new code essentially parses lists of "id(id), id(id), ...", possibly
delimited by a ':' separating the modifier and allocator lists.


I don't really like the modifiers handling not done in a loop.
As I said above, there needs to be some check whether there are modifiers or
not, but once we figure out there are modifiers, it should be done in a loop
with say some mask var on which traits have been already handled to diagnose
duplicates, we don't want to do the parsing code twice.


Now everything is done in loops; the new code should be considerably
simpler.


This feels like you only accept a single allocator in the new syntax,
but that isn't my reading of the spec, I'd understand it as:
uses_allocators (memspace(omp_high_bw_mem_space), traits(foo_traits) : bar, 
baz, qux)
being valid too.


This patch now allows multiple allocators to be specified in new syntax, 
although I have
to note that the 5.2 specification of uses_allocators (page 181) specifically 
says
"allocator: expression of allocator_handle_type" for the "Arguments" 
description,
not a "list" like the allocate clause.


+   case OMP_CLAUSE_USES_ALLOCATORS:
+ t = OMP_CLAUSE_USES_ALLOCATORS_ALLOCATOR (c);
+ if (bitmap_bit_p (_head, DECL_UID (t))
+ || bitmap_bit_p (_head, DECL_UID (t))
+ || bitmap_bit_p (_head, DECL_UID (t))
+ || bitmap_bit_p (_head, DECL_UID (t)))


You can't just use DECL_UID before you actually verify it is a variable.
So IMHO this particular if should be moved down somewhat.


Guarded now.


+   {
+ error_at (OMP_CLAUSE_LOCATION (c),
+   "%qE appears more than once in data clauses", t);
+ remove = true;
+   }
+ else
+   bitmap_set_bit (_head, DECL_UID (t));
+ if (TREE_CODE (TREE_TYPE (t)) != ENUMERAL_TYPE
+ || strcmp (IDENTIFIER_POINTER (TYPE_IDENTIFIER (TREE_TYPE (t))),
+"omp_allocator_handle_t") != 0)
+   {
+ error_at (OMP_CLAUSE_LOCATION (c),
+   "allocator must be of % type");
+ remove = true;
+   }


I'd add break; after remove = true;


Added some such breaks.


+ if (TREE_CODE (t) == CONST_DECL)
+   {
+ if (OMP_CLAUSE_USES_ALLOCATORS_MEMSPACE (c)
+ || OMP_CLAUSE_USES_ALLOCATORS_TRAITS (c))
+   error_at (OMP_CLAUSE_LOCATION (c),
+ "modifiers cannot be used with pre-defined "
+ "allocators");
+
+ /* Currently for pre-defined allocators in libgomp, we do not
+require additional init/fini inside target regions, so discard
+such clauses.  */
+ remove = true;
+   }


It should be only removed if we emit the error (again with break; too).
IMHO (see the other mail) we should complain here if it has value 0
(the omp_null_allocator case), dunno if we should error or just warn
if the value is 

[PATCH] Make the default rtx_costs of MULT/DIV variants consistent.

2022-05-30 Thread Roger Sayle

GCC's middle-end provides a default cost model for RTL expressions, for
backends that don't specify their own instruction timings, that can be
summarized as multiplications are COSTS_N_INSNS(4), divisions are
COSTS_N_INSNS(7) and all other operations are COSTS_N_INSNS(1).
This patch tweaks the above definition so that fused-multiply-add
(FMA) and high-part multiplications cost the same as regular
multiplications,
or more importantly aren't (by default) considered less expensive.  Likewise
the saturating forms of multiplication and division cost the same as the
regular variants.  These values can always be changed by the target, but
the goal is to avoid RTL expansion substituting a suitable operation with
its saturating equivalent because it (accidentally) looks much cheaper.
For example, PR 89845 is about implementing division/modulus via highpart
multiply, which may accidentally look extremely cheap.
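
As a hedged illustration of how a backend that does care would override
these defaults (the usual TARGET_RTX_COSTS hook shape, with purely
illustrative timings):

static bool
mytarget_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED,
                    int outer_code ATTRIBUTE_UNUSED, int opno ATTRIBUTE_UNUSED,
                    int *total, bool speed ATTRIBUTE_UNUSED)
{
  switch (GET_CODE (x))
    {
    case FMA:
      *total = COSTS_N_INSNS (4);   /* e.g. as expensive as a multiply.  */
      return false;                 /* Operand costs are still added.  */
    case SMUL_HIGHPART:
    case UMUL_HIGHPART:
      *total = COSTS_N_INSNS (4);
      return false;
    default:
      return false;                 /* Keep the generic default cost.  */
    }
}

#undef TARGET_RTX_COSTS
#define TARGET_RTX_COSTS mytarget_rtx_costs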

I believe there should be no code generation changes for this patch,
but of course I'm happy to address any adverse changes on rare targets.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-05-30  Roger Sayle  

gcc/ChangeLog
* rtlanal.cc (rtx_cost) : Treat FMA, SS_MULT, US_MULT,
SMUL_HIGHPART and UMUL_HIGHPART as having the same cost as MULT.
: Likewise, SS_DIV and US_DIV have the same default as DIV.


Thanks in advance,
Roger
--

diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
index 7c29682..d78cc60 100644
--- a/gcc/rtlanal.cc
+++ b/gcc/rtlanal.cc
@@ -4578,6 +4578,11 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code 
outer_code,
   switch (code)
 {
 case MULT:
+case FMA:
+case SS_MULT:
+case US_MULT:
+case SMUL_HIGHPART:
+case UMUL_HIGHPART:
   /* Multiplication has time-complexity O(N*N), where N is the
 number of units (translated from digits) when using
 schoolbook long multiplication.  */
@@ -4587,6 +4592,8 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code 
outer_code,
 case UDIV:
 case MOD:
 case UMOD:
+case SS_DIV:
+case US_DIV:
   /* Similarly, complexity for schoolbook long division.  */
   total = factor * factor * COSTS_N_INSNS (7);
   break;


[PATCH 5/5] Convert ranger and clients to vrange.

2022-05-30 Thread Aldy Hernandez via Gcc-patches
Finally, the meat of the work.  Convert ranger and associated clients
to vrange.

Everything's relatively mechanical given the previous patches.  I did
include a minor cleanup in the edge code.  There's no need to check
that the type of the switch is an integer as non-integer switches are
invalid.  I verified this with an appropriately coded assert.

Tested on x86-64 & ppc64le Linux.

gcc/ChangeLog:

* gimple-range-cache.cc (ssa_block_ranges::dump): Convert to vrange.
(sbr_vector::sbr_vector): Same.
(sbr_vector::grow): Same.
(sbr_vector::set_bb_range): Same.
(sbr_vector::get_bb_range): Same.
(sbr_sparse_bitmap::sbr_sparse_bitmap): Same.
(sbr_sparse_bitmap::set_bb_range): Same.
(sbr_sparse_bitmap::get_bb_range): Same.
(block_range_cache::set_bb_range): Same.
(block_range_cache::get_bb_range): Same.
(block_range_cache::dump): Same.
(ssa_global_cache::get_global_range): Same.
(ssa_global_cache::set_global_range): Same.
(ssa_global_cache::clear): Same.
(ssa_global_cache::dump): Same.
(ranger_cache::get_global_range): Same.
(ranger_cache::set_global_range): Same.
(ranger_cache::range_of_def): Same.
(ranger_cache::entry_range): Same.
(ranger_cache::exit_range): Same.
(ranger_cache::edge_range): Same.
(ranger_cache::range_of_expr): Same.
(ranger_cache::range_on_edge): Same.
(ranger_cache::block_range): Same.
(ranger_cache::propagate_cache): Same.
(ranger_cache::fill_block_cache): Same.
(ranger_cache::range_from_dom): Same.
* gimple-range-cache.h: Same.
* gimple-range-edge.cc (gimple_outgoing_range::get_edge_range):
Same.
(gimple_outgoing_range::switch_edge_range): Same.
(gimple_outgoing_range::edge_range_p): Same.
* gimple-range-edge.h: Same.
* gimple-range-fold.cc (fur_source::get_operand): Same.
(fur_source::get_phi_operand): Same.
(fur_edge::get_operand): Same.
(fur_edge::get_phi_operand): Same.
(fur_stmt::get_operand): Same.
(fur_stmt::get_phi_operand): Same.
(fur_list::fur_list): Same.
(fur_list::get_operand): Same.
(fur_list::get_phi_operand): Same.
(fold_range): Same.
(adjust_imagpart_expr): Same.
(adjust_realpart_expr): Same.
(gimple_range_adjustment): Same.
(fold_using_range::fold_stmt): Same.
(fold_using_range::range_of_range_op): Same.
(fold_using_range::range_of_address): Same.
(fold_using_range::range_of_phi): Same.
(fold_using_range::range_of_call): Same.
(fold_using_range::range_of_builtin_call): Same.
(fold_using_range::range_of_builtin_int_call): Same.
(fold_using_range::range_of_cond_expr): Same.
(fur_source::register_outgoing_edges): Same.
* gimple-range-fold.h (fold_range): Same.
(gimple_range_type): Same.
(gimple_range_ssa_p): Same.
* gimple-range-gori.cc (gimple_range_calc_op1): Same.
(gimple_range_calc_op2): Same.
(gori_compute::compute_operand_range_switch): Same.
(gori_compute::compute_operand_range): Same.
(gori_compute::logical_combine): Same.
(gori_compute::compute_logical_operands): Same.
(gori_compute::compute_operand1_range): Same.
(gori_compute::compute_operand2_range): Same.
(gori_compute::compute_operand1_and_operand2_range): Same.
(gori_compute::outgoing_edge_range_p): Same.
(gori_compute::condexpr_adjust): Same.
* gimple-range-gori.h (gimple_range_calc_op1): Same.
(gimple_range_calc_op2): Same.
* gimple-range-path.cc (path_range_query::get_cache): Same.
(path_range_query::set_cache): Same.
(path_range_query::range_on_path_entry): Same.
(path_range_query::internal_range_of_expr): Same.
(path_range_query::range_of_expr): Same.
(path_range_query::ssa_range_in_phi): Same.
(path_range_query::range_defined_in_block): Same.
(path_range_query::compute_ranges_in_phis): Same.
(path_range_query::compute_ranges_in_block): Same.
(path_range_query::add_to_imports): Same.
(path_range_query::range_of_stmt): Same.
* gimple-range-path.h: Same.
* gimple-range-side-effect.cc (stmt_side_effects::add_range): Same.
(side_effect_manager::~side_effect_manager): Same.
(side_effect_manager::get_nonzero): Same.
(side_effect_manager::maybe_adjust_range): Same.
(side_effect_manager::add_range): Same.
* gimple-range-side-effect.h: Same.
* gimple-range-tests.cc: Same.
* gimple-range-trace.cc (range_tracer::trailer): Same.
(debug_seed_ranger): Same.
* gimple-range-trace.h: Same.
* gimple-range.cc (gimple_ranger::range_of_expr): Same.

[PATCH 3/5] Convert range-op.* to vrange.

2022-05-30 Thread Aldy Hernandez via Gcc-patches
This patch provides the infrastructure to make range-ops type agnostic.

First, the range_op_handler function has been replaced with an object
of the same name.  It's coded in such a way to minimize changes to the
code base, and to encapsulate the dispatch code.

Instead of:

range_operator *op = range_op_handler (code, type);
if (op)
  op->fold_range (...);

We now do:
range_op_handler op (code, type);
if (op)
  op->fold_range (...);

I've folded gimple_range_handler into the range_op_handler class,
since it's also a query into the range operators.

Instead of:

range_operator *handler = gimple_range_handler (stmt);

We now do:

range_op_handler handler (stmt);

This all has the added benefit of moving all the dispatch code into an
independent class and avoid polluting range_operator (which we'll
further split later when frange and prange come live).

There's this annoying "using" keyword that's been added to each
operator due to hiding rules in C++.  The issue is that we will have
different virtual versions of fold_range() for each combination of
operands.  For example:

// Traditional binary op on irange's.
fold_range (irange &, const irange &, const irange &);
// For POINTER_DIFF_EXPR:
fold_range (irange &, const prange &, const prange &);
// Cast from irange to prange.
fold_range (prange &, const irange &, const irange &);

Overloading virtuals when there are multiple same named methods causes
hidden virtuals warnings from -Woverloaded-virtual, thus the using
keyword.  An alternative would be to have different names:
fold_range_III, fold_range_IPP, fold_range_PII, but that's uglier
still.
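
For reference, the hiding problem can be reproduced with a small
self-contained example (the classes below are simplified stand-ins, not
the real ranger types or signatures):

struct irange { };
struct prange { };

struct range_operator
{
  virtual ~range_operator () { }
  virtual bool fold_range (irange &, const irange &, const irange &) const
  { return false; }
  virtual bool fold_range (irange &, const prange &, const prange &) const
  { return false; }
};

struct operator_pointer_diff : public range_operator
{
  // Without this using-declaration, the override below would hide the
  // irange/irange overload inherited from the base, and
  // -Woverloaded-virtual would warn about it.
  using range_operator::fold_range;
  bool fold_range (irange &, const prange &, const prange &) const override
  { return true; }
};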

Tested on x86-64 & ppc64le Linux.

gcc/ChangeLog:

* gimple-range-edge.cc (gimple_outgoing_range_stmt_p): Adjust for
vrange and convert range_op_handler function calls to use the
identically named object.
* gimple-range-fold.cc (gimple_range_operand1): Same.
(gimple_range_operand2): Same.
(fold_using_range::fold_stmt): Same.
(fold_using_range::range_of_range_op): Same.
(fold_using_range::range_of_builtin_ubsan_call): Same.
(fold_using_range::relation_fold_and_or): Same.
(fur_source::register_outgoing_edges): Same.
* gimple-range-fold.h (gimple_range_handler): Remove.
* gimple-range-gori.cc (gimple_range_calc_op1): Adjust for vrange.
(gimple_range_calc_op2): Same.
(range_def_chain::get_def_chain): Same.
(gori_compute::compute_operand_range): Same.
(gori_compute::condexpr_adjust): Same.
* gimple-range.cc (gimple_ranger::prefill_name): Same.
(gimple_ranger::prefill_stmt_dependencies): Same.
* range-op.cc (get_bool_state): Same.
(class operator_equal): Add using clause.
(class operator_not_equal): Same.
(class operator_lt): Same.
(class operator_le): Same.
(class operator_gt): Same.
(class operator_ge): Same.
(class operator_plus): Same.
(class operator_minus): Same.
(class operator_mult): Same.
(class operator_exact_divide): Same.
(class operator_lshift): Same.
(class operator_rshift): Same.
(class operator_cast): Same.
(class operator_logical_and): Same.
(class operator_bitwise_and): Same.
(class operator_logical_or): Same.
(class operator_bitwise_or): Same.
(class operator_bitwise_xor): Same.
(class operator_trunc_mod): Same.
(class operator_logical_not): Same.
(class operator_bitwise_not): Same.
(class operator_cst): Same.
(class operator_identity): Same.
(class operator_unknown): Same.
(class operator_abs): Same.
(class operator_negate): Same.
(class operator_addr_expr): Same.
(class pointer_or_operator): Same.
(operator_plus::op1_range): Adjust for vrange.
(operator_minus::op1_range): Same.
(operator_mult::op1_range): Same.
(operator_cast::op1_range): Same.
(operator_bitwise_not::fold_range): Same.
(operator_negate::fold_range): Same.
(range_op_handler): Rename to...
(get_handler): ...this.
(range_op_handler::range_op_handler): New.
(range_op_handler::fold_range): New.
(range_op_handler::op1_range): New.
(range_op_handler::op2_range): New.
(range_op_handler::lhs_op1_relation): New.
(range_op_handler::lhs_op2_relation): New.
(range_op_handler::op1_op2_relation): New.
(range_cast): Adjust for vrange.
* range-op.h (range_op_handler): Remove function.
(range_cast): Adjust for vrange.
(class range_op_handler): New.
(get_bool_state): Adjust for vrange.
(empty_range_varying): Same.
(relop_early_resolve): Same.
* tree-data-ref.cc (compute_distributive_range): Same.
  

[PATCH 4/5] Revamp irange_allocator to handle vranges.

2022-05-30 Thread Aldy Hernandez via Gcc-patches
This patch revamps the range allocator to handle generic vrange's.
I've cleaned it up somewhat to make the various things you
can allocate with it obvious.  I've also moved away from overloads to
distinct names when appropriate.

The various entry points are now:

  // Allocate a range of TYPE.
  vrange *alloc_vrange (tree type);
  // Allocate a memory block of BYTES.
  void *alloc (unsigned bytes);
  // Return a clone of SRC.
  template <typename T> T *clone (const T &src);

It is now possible to allocate a clone of an irange, or any future
range types:

  irange *i = allocator.clone <irange> (some_irange);
  frange *f = allocator.clone <frange> (some_frange);

You can actually do so without the <>, but I find it clearer to
specify the vrange type.

So with it you can allocate a specific range type, or vrange, or a
block of memory.

I have rewritten the C style casts to C++ casts, since casts tend to
be hints of problematic designs.  With the C++ casts you can at least
grep for them more easily.  Speaking of which, the next patch, which converts
ranger to vrange, will further clean this space by removing some
unnecessary casts.

Tested on x86-64 Linux and ppc64le Linux.

* gimple-range-cache.cc (sbr_vector::sbr_vector): Adjust for
vrange allocator.
(sbr_vector::grow): Same.
(sbr_vector::set_bb_range): Same.
(sbr_sparse_bitmap::sbr_sparse_bitmap): Same.
(sbr_sparse_bitmap::set_bb_range): Same.
(block_range_cache::~block_range_cache): Same.
(block_range_cache::set_bb_range): Same.
(ssa_global_cache::ssa_global_cache): Same.
(ssa_global_cache::~ssa_global_cache): Same.
(ssa_global_cache::set_global_range): Same.
* gimple-range-cache.h (block_range_cache): Same.
(ssa_global_cache): Same.
* gimple-range-edge.cc
(gimple_outgoing_range::calc_switch_ranges): Same.
* gimple-range-edge.h (gimple_outgoing_range): Same.
* gimple-range-side-effect.cc (side_effect_manager::get_nonzero):
Same.
(side_effect_manager::add_range): Same.
* gimple-range-side-effect.h (class side_effect_manager): Same.
* value-range.h (class irange_allocator): Rename to...
(class vrange_allocator): ...this.
(irange_allocator::irange_allocator): New.
(vrange_allocator::vrange_allocator): New.
(irange_allocator::~irange_allocator): New.
(vrange_allocator::~vrange_allocator): New.
(irange_allocator::get_memory): Rename to...
(vrange_allocator::alloc): ...this.
(vrange_allocator::alloc_vrange): Rename from...
(irange_allocator::allocate): ...this.
(vrange_allocator::alloc_irange): New.
---
 gcc/gimple-range-cache.cc   | 55 +++---
 gcc/gimple-range-cache.h|  4 +-
 gcc/gimple-range-edge.cc|  4 +-
 gcc/gimple-range-edge.h |  2 +-
 gcc/gimple-range-side-effect.cc | 13 --
 gcc/gimple-range-side-effect.h  |  2 +-
 gcc/value-range.h   | 82 +
 7 files changed, 96 insertions(+), 66 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index c726393b380..9c541993fb6 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -75,7 +75,7 @@ ssa_block_ranges::dump (FILE *f)
 class sbr_vector : public ssa_block_ranges
 {
 public:
-  sbr_vector (tree t, irange_allocator *allocator);
+  sbr_vector (tree t, vrange_allocator *allocator);
 
   virtual bool set_bb_range (const_basic_block bb, const irange ) override;
   virtual bool get_bb_range (irange , const_basic_block bb) override;
@@ -86,20 +86,21 @@ protected:
   int_range<2> m_varying;
   int_range<2> m_undefined;
   tree m_type;
-  irange_allocator *m_irange_allocator;
+  vrange_allocator *m_range_allocator;
   void grow ();
 };
 
 
 // Initialize a block cache for an ssa_name of type T.
 
-sbr_vector::sbr_vector (tree t, irange_allocator *allocator)
+sbr_vector::sbr_vector (tree t, vrange_allocator *allocator)
 {
   gcc_checking_assert (TYPE_P (t));
   m_type = t;
-  m_irange_allocator = allocator;
+  m_range_allocator = allocator;
   m_tab_size = last_basic_block_for_fn (cfun) + 1;
-  m_tab = (irange **)allocator->get_memory (m_tab_size * sizeof (irange *));
+  m_tab = static_cast 
+(allocator->alloc (m_tab_size * sizeof (irange *)));
   memset (m_tab, 0, m_tab_size * sizeof (irange *));
 
   // Create the cached type range.
@@ -121,8 +122,8 @@ sbr_vector::grow ()
   int new_size = inc + curr_bb_size;
 
   // Allocate new memory, copy the old vector and clear the new space.
-  irange **t = (irange **)m_irange_allocator->get_memory (new_size
- * sizeof (irange *));
+  irange **t = static_cast 
+(m_range_allocator->alloc (new_size * sizeof (irange *)));
   memcpy (t, m_tab, m_tab_size * sizeof (irange *));
   memset (t + m_tab_size, 0, (new_size - m_tab_size) * sizeof (irange *));
 
@@ -143,7 

[PATCH 2/5] Implement generic range temporaries.

2022-05-30 Thread Aldy Hernandez via Gcc-patches
Now that we have generic ranges, we need a way to define generic local
temporaries on the stack for intermediate calculations in the ranger
and elsewhere.  We need temporaries analogous to int_range_max, but
for any of the supported types (currently just integers, but soon
integers, pointers, and floats).

The tmp_range object is such a temporary.  It is designed to be
transparently used as a vrange.  It shares vrange's abstract API, and
implicitly casts itself to a vrange when passed around.

The ultimate name will be value_range, but we need to remove legacy
first for that to happen.  Until then, tmp_range will do.

Sample usage is as follows.  Instead of:

extern void foo (vrange &);

int_range_max t;
t.set_nonzero (type);
foo (t);

one does:

tmp_range t (type);
t.set_nonzero (type);
foo (t);

You can also delay initialization, for use in loops for example:

tmp_range t;
...
t.init (type);
t.set_varying (type);

Creating an unsupported range type will result in an unsupported_range
object being created, which will trap if anything but set_undefined()
and undefined_p() is called on it.  There's no size penalty for the
unsupported_range, since it's immutable and can be shared across
instances.

Since supports_type_p() is called at construction time for each
temporary, I've removed the non-zero check from this function, which
was mostly unneeded.  I fixed the handful of callers that were
passing null types, and in the process sped things up a bit.

As more range types come about, the tmp_range class will be augmented
to support them by adding the relevant bits in the initialization
code.

Tested on x86-64 & ppc64le Linux.

gcc/ChangeLog:

* gimple-range-fold.h (gimple_range_type): Check type before
calling supports_type_p.
* gimple-range-path.cc (path_range_query::range_of_stmt): Same.
* value-query.cc (range_query::get_tree_range): Same.
* value-range.cc (tmp_range::lower_bound): New.
(tmp_range::upper_bound): New.
(tmp_range::dump): New.
* value-range.h (class tmp_range): New.
(irange::supports_type_p): Do not check if type is non-zero.
---
 gcc/gimple-range-fold.h  |   2 +-
 gcc/gimple-range-path.cc |   2 +-
 gcc/value-query.cc   |   3 +-
 gcc/value-range.cc   |  38 
 gcc/value-range.h| 130 ++-
 5 files changed, 169 insertions(+), 6 deletions(-)

diff --git a/gcc/gimple-range-fold.h b/gcc/gimple-range-fold.h
index 53a5bf85dd4..20cb73dabb9 100644
--- a/gcc/gimple-range-fold.h
+++ b/gcc/gimple-range-fold.h
@@ -81,7 +81,7 @@ gimple_range_type (const gimple *s)
type = TREE_TYPE (type);
}
 }
-  if (irange::supports_type_p (type))
+  if (type && irange::supports_type_p (type))
 return type;
   return NULL_TREE;
 }
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 459d3797da7..66f433dd1d5 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -755,7 +755,7 @@ path_range_query::range_of_stmt (irange , gimple *stmt, 
tree)
 {
   tree type = gimple_range_type (stmt);
 
-  if (!irange::supports_type_p (type))
+  if (!type || !irange::supports_type_p (type))
 return false;
 
   // If resolving unknowns, fold the statement making use of any
diff --git a/gcc/value-query.cc b/gcc/value-query.cc
index 9ccd802457b..26e3858103b 100644
--- a/gcc/value-query.cc
+++ b/gcc/value-query.cc
@@ -249,7 +249,8 @@ range_query::get_tree_range (irange , tree expr, gimple 
*stmt)
   if (UNARY_CLASS_P (expr))
 {
   range_operator *op = range_op_handler (TREE_CODE (expr), type);
-  if (op)
+  tree op0_type = TREE_TYPE (TREE_OPERAND (expr, 0));
+  if (op && irange::supports_type_p (op0_type))
{
  int_range_max r0;
  range_of_expr (r0, TREE_OPERAND (expr, 0), stmt);
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 97ff0614f48..c5f326ab479 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -30,6 +30,42 @@ along with GCC; see the file COPYING3.  If not see
 #include "fold-const.h"
 #include "gimple-range.h"
 
+// Convenience function only available for integers and pointers.
+
+wide_int
+tmp_range::lower_bound () const
+{
+  if (is_a <irange> (*m_vrange))
+    return as_a <irange> (*m_vrange).lower_bound ();
+  gcc_unreachable ();
+}
+
+// Convenience function only available for integers and pointers.
+
+wide_int
+tmp_range::upper_bound () const
+{
+  if (is_a <irange> (*m_vrange))
+    return as_a <irange> (*m_vrange).upper_bound ();
+  gcc_unreachable ();
+}
+
+void
+tmp_range::dump (FILE *out) const
+{
+  if (m_vrange)
+m_vrange->dump (out);
+  else
+fprintf (out, "NULL");
+}
+
+DEBUG_FUNCTION void
+debug (const tmp_range &r)
+{
+  r.dump (stderr);
+  fprintf (stderr, "\n");
+}
+
 // Default implementation when none has been defined.
 
 bool
@@ -186,6 +222,8 @@ unsupported_range::fits_p (const vrange &) const
   

[PATCH 1/5] Implement abstract vrange class.

2022-05-30 Thread Aldy Hernandez via Gcc-patches
This is a series of patches making ranger type agnostic in preparation
for contributing support for other types of ranges (pointers and
floats initially).

The first step in this process is to implement vrange, an abstract
class that will be exclusively used by ranger, and from which all
ranges will inherit.  Vrange provides the minimum operations for
ranger to work.  The current virtual methods are what we've used to
implement frange (floats) and prange (pointers), but we may restrict
the virtual methods further as other ranges come about
(e.g. set_nonnegative() has no meaning for a future string range).

This patchset also provides a mechanism for declaring local type
agnostic ranges that can transparently hold an irange, frange,
prange's, etc, and a dispatch mechanism for range-ops to work with
various range types.  More details in the relevant patches.

FUTURE PLAN
===

The plan after this is to contribute a bare bones implementation for
floats (frange) that will provide relationals, followed by a
separation of integers and pointers (irange and prange).  Once this is
in place, we can further enhance both floats and pointers.  For
example, pointer tracking, pointer plus optimizations, and keeping
track of NaN's, etc.

Once frange and prange come live, all ranger clients will immediately
benefit from these enhancements.  For instance, in our local branch,
the threader is already float aware with regards to relationals.

We expect to wait a few weeks before starting to contribute further
enhancements to give the tree a time to stabilize, and Andrew time to
rebase his upcoming patches  :-P.

NOTES
=

In discussions with Andrew, it has become clear that with vrange
coming about, supports_type_p() is somewhat ambiguous.  Prior to
vrange it has been used to (a) determine if a type is supported by
ranger, (b) as a short-cut for checking if a type is pointer or integer,
as well as (c) to see if a given range can hold a type.  These things
have had the same meaning in irange, but are slightly different with
vrange.  I will address this in a follow-up patch.

Speaking of supported types, we now provide an unsupported_range
for passing around ranges for unsupported types. We've been silently
doing this for a while, in both vr-values by creating VARYING for
unsupported types with error_mark_node end points, and in ranger when
we pass an unsupported range before we realize in range_of_expr that
it's unsupported.  This class just formalizes what we've already been
doing in an irange, while making it explicit that you can't do anything
with these ranges except pass them.  Any other operation traps.

There is no GTY support for vrange yet, as we don't store it long
term.  When we contribute support for global ranges (think
SSA_NAME_RANGE_INFO but for generic ranges), we will include it.  There
was just no need to pollute this patchset with it.

TESTING
===

The patchset has been tested on x86-64 Linux as well as ppc64 Linux.
I have also verified that we fold the same number of conditionals in
evrp as well as thread the same number of paths.  There should be no
user visible changes.

We have also benchmarked the work, with the final numbers being an
*improvement* of 1.92% for evrp, and 0.82% for VRP.  Overall
compilation has a minuscule improvement.  This is despite the extra
indirection level.

The improvements are mostly because of small cleanups required for the
generalization of ranges.  As a sanity check, I stuck kcachegrind on a
few sample .ii files to see where the time was being gained.  Most of
the gain came from gimple_range_global() being 19% faster.  This
function is called a lot, and it was constructing a legacy
value_range, then returning it by value, which the caller then had to
convert to an irange.  This is in line with other pending work:
anytime we get rid of legacy, we gain time.

I will wait a few days before committing to welcome any comments.

gcc/ChangeLog:

* value-range-equiv.cc (value_range_equiv::set): New.
* value-range-equiv.h (class value_range_equiv): Make set method
virtual.
Remove default bitmap argument from set method.
* value-range.cc (vrange::contains_p): New.
(vrange::singleton_p): New.
(vrange::operator=): New.
(vrange::operator==): New.
(irange::fits_p): Move to .cc file.
(irange::set_nonnegative): New.
(unsupported_range::unsupported_range): New.
(unsupported_range::set): New.
(unsupported_range::type): New.
(unsupported_range::set_undefined): New.
(unsupported_range::set_varying): New.
(unsupported_range::dump): New.
(unsupported_range::union_): New.
(unsupported_range::intersect): New.
(unsupported_range::zero_p): New.
(unsupported_range::nonzero_p): New.
(unsupported_range::set_nonzero): New.
(unsupported_range::set_zero): New.
(unsupported_range::set_nonnegative): New.

[x86 PATCH] Allow SCmode and DImode to be tieable on TARGET_64BIT.

2022-05-30 Thread Roger Sayle

This patch is a form of insurance policy in case my patch for PR 7061 runs
into problems on non-x86 targets; the middle-end can add an extra check
that the backend is happy placing SCmode and DImode values in the same
register, before creating a SUBREG.  Unfortunately, ix86_modes_tieable_p
currently claims this is not allowed(?), even though the default target
hook for modes_tieable_p is to always return true [i.e. false can be
used to specifically prohibit bad combinations], and the x86_64 ABI
passes SCmode values in DImode registers!  This makes the backend's
modes_tieable_p hook a little more forgiving, and additionally enables
interconversion between SCmode and V2SFmode, and between DCmode and
V2DFmode, which opens interesting opportunities in the future.

I believe there should currently be no code generation differences
with this change.  This patch has been tested on x86_64-pc-linux-gnu
with make bootstrap and make -k check, both with and without
--target_board=unix{-m32}, with no new failures.  Ok for mainline?


2022-05-30  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.cc (ix86_modes_tieable_p): Allow SCmode to be
tieable with DImode on TARGET_64BIT, and SCmode tieable with
V2SFmode, and DCmode with V2DFmode.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index daa60ac..df5c80d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20141,6 +20141,18 @@ ix86_modes_tieable_p (machine_mode mode1, machine_mode 
mode2)
 return (GET_MODE_SIZE (mode1) == 8
&& ix86_hard_regno_mode_ok (FIRST_MMX_REG, mode1));
 
+  /* SCmode and DImode can be tied.  */
+  if ((mode1 == E_SCmode && mode2 == E_DImode)
+  || (mode1 == E_DImode && mode2 == E_SCmode))
+return TARGET_64BIT;
+
+  /* [SD]Cmode and V2[SD]Fmode modes can be tied.  */
+  if ((mode1 == E_SCmode && mode2 == E_V2SFmode)
+  || (mode1 == E_V2SFmode && mode2 == E_SCmode)
+  || (mode1 == E_DCmode && mode2 == E_V2DFmode)
+  || (mode1 == E_V2DFmode && mode2 == E_DCmode))
+return true;
+
   return false;
 }
 


[PATCH] tree-optimization/105763 - avoid abnormals with ranger

2022-05-30 Thread Richard Biener via Gcc-patches


In unswitching we use ranger to simplify switch statements so we
have to avoid doing anything for abnormals.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2022-05-30  Richard Biener  

PR tree-optimization/105763
* tree-ssa-loop-unswitch.cc (find_unswitching_predicates_for_bb):
Check gimple_range_ssa_p.

* gcc.dg/pr105763.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr105763.c | 21 +
 gcc/tree-ssa-loop-unswitch.cc   |  3 +--
 2 files changed, 22 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr105763.c

diff --git a/gcc/testsuite/gcc.dg/pr105763.c b/gcc/testsuite/gcc.dg/pr105763.c
new file mode 100644
index 000..4c76b17e73d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr105763.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int rl2_decode_png_bit_depth;
+int *rl2_decode_png_p_data;
+void png_destroy_read_struct ();
+int __attribute__((returns_twice)) _setjmp ();
+void rl2_decode_png_row_pointers()
+{
+  unsigned sample_type = 0;
+  _setjmp();
+  switch (rl2_decode_png_bit_depth)
+  case 6:
+sample_type = 7;
+  png_destroy_read_struct();
+  for (;;)
+switch (sample_type)
+case 3:
+case 5:
+  *rl2_decode_png_p_data;
+}
diff --git a/gcc/tree-ssa-loop-unswitch.cc b/gcc/tree-ssa-loop-unswitch.cc
index f32f1a78f00..a55905c2c68 100644
--- a/gcc/tree-ssa-loop-unswitch.cc
+++ b/gcc/tree-ssa-loop-unswitch.cc
@@ -494,8 +494,7 @@ find_unswitching_predicates_for_bb (basic_block bb, class 
loop *loop,
 {
   unsigned nlabels = gimple_switch_num_labels (stmt);
   tree idx = gimple_switch_index (stmt);
-  if (TREE_CODE (idx) != SSA_NAME
- || nlabels < 1)
+  if (!gimple_range_ssa_p (idx) || nlabels < 1)
return;
   /* Index must be invariant.  */
   def = SSA_NAME_DEF_STMT (idx);
-- 
2.35.3



Re: [PATCH v5] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

2022-05-30 Thread Richard Biener via Gcc-patches
On Sun, May 29, 2022 at 5:59 PM Di Zhao OS
 wrote:
>
> Hi, attached is a new version of the patch. The changes are:
> - Skip using temporary equivalences for floating-point values, because
> folding expressions can generate incorrect values. For example,
> operations on 0.0 and -0.0 may have different results.
> - Avoid inserting duplicated back-refs from value-number to predicates.
> - Disable fre in testsuite/g++.dg/pr83541.C .
>
> Summary of the previous versions:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587346.html
>
> Is the patch still considered?

Yes, sorry - I have to catch up after release work but will look again at
this change after some vacation next week.

Richard.

>
> Thanks,
> Di Zhao
>
> ---
>
> Extend FRE with temporary equivalences.
>
> 2022-05-29  Di Zhao  
>
> gcc/ChangeLog:
> PR tree-optimization/101186
> * tree-ssa-sccvn.c (VN_INFO): remove assertions (there could be a
> predicate already).
> (dominated_by_p_w_unex): Moved upward.
> (vn_nary_op_get_predicated_value): Moved upward.
> (is_vn_valid_at_bb): Check if vn_pval is valid at BB.
> (lookup_equiv_head): Lookup the "equivalence head" of given node.
> (lookup_equiv_heads): Lookup the "equivalence head"s of given nodes.
> (vn_tracking_edge): Extracted utility function.
> (init_vn_nary_op_from_stmt): Insert and lookup by "equivalence head"s.
> (vn_nary_op_insert_into): Insert new value at the front.
> (vn_nary_op_insert_pieces_predicated_1): Insert as predicated values
> from pieces.
> (fold_const_from_equiv_heads): Fold N-ary expression of equiv-heads.
> (push_new_nary_ref): Insert a back-reference to vn_nary_op_t.
> (val_equiv_insert): Record temporary equivalence.
> (vn_nary_op_insert_pieces_predicated): Record equivalences instead of
> some predicates; insert back-refs.
> (record_equiv_from_prev_phi_1): Record temporary equivalences 
> generated
> by PHI nodes.
> (record_equiv_from_prev_phi): Given an outgoing edge of a conditional
> expression taken, record equivalences generated by PHI nodes.
> (visit_nary_op): Add lookup previous results of N-ary operations by
> equivalences.
> (insert_related_predicates_on_edge): Some predicates can be computed
> from equivalences, no need to insert them.
> (process_bb): Add lookup predicated values by equivalences.
> (struct unwind_state): Unwind state of back-refs to vn_nary_op_t.
> (do_unwind): Unwind the back-refs to vn_nary_op_t.
> (do_rpo_vn): Update back-reference unwind state.
> * tree-ssa-sccvn.h (struct nary_ref): hold a lists of references to 
> the
> nary map entries.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/pr83541.C: Disable fre.
> * gcc.dg/tree-ssa/pr68619-2.c: Disable fre.
> * gcc.dg/tree-ssa/pr71947-1.c: Disable fre.
> * gcc.dg/tree-ssa/pr71947-2.c: Disable fre.
> * gcc.dg/tree-ssa/pr71947-3.c: Disable fre.
> * gcc.dg/tree-ssa/pr71947-5.c: Disable fre.
> * gcc.dg/tree-ssa/pr71947-7.c: Disable fre.
> * gcc.dg/tree-ssa/pr71947-8.c: Disable fre.
> * gcc.dg/tree-ssa/pr71947-9.c: Disable fre.
> * gcc.dg/tree-ssa/vrp03.c: Disable fre.
> * gcc.dg/tree-ssa/ssa-fre-100.c: New test.
> * gcc.dg/tree-ssa/ssa-fre-101.c: New test.
> * gcc.dg/tree-ssa/ssa-fre-102.c: New test.
> * gcc.dg/tree-ssa/ssa-pre-34.c: New test.


Re: [PATCH] Add divide by zero side effect.

2022-05-30 Thread Richard Biener via Gcc-patches
On Fri, May 27, 2022 at 9:34 PM Andi Kleen via Gcc-patches
 wrote:
>
> Andrew MacLeod via Gcc-patches  writes:
> >
> > diff --git a/gcc/gimple-range-side-effect.cc 
> > b/gcc/gimple-range-side-effect.cc
> > index 2c8c77dc569..548e4bea313 100644
> > --- a/gcc/gimple-range-side-effect.cc
> > +++ b/gcc/gimple-range-side-effect.cc
> > @@ -116,6 +116,23 @@ stmt_side_effects::stmt_side_effects (gimple *s)
> >  walk_stmt_load_store_ops (s, (void *)this, non_null_loadstore,
> > non_null_loadstore);
> >
> > +  if (is_a (s))
> > +{
> > +  switch (gimple_assign_rhs_code (s))
> > + {
> > + case TRUNC_DIV_EXPR:
> > + case CEIL_DIV_EXPR:
> > + case FLOOR_DIV_EXPR:
> > + case ROUND_DIV_EXPR:
> > + case EXACT_DIV_EXPR:
> > +   // Divide means operand 2 is not zero after this stmt.
> > +   if (gimple_range_ssa_p (gimple_assign_rhs2 (s)))
> > + add_nonzero (gimple_assign_rhs2 (s));
>
> Sorry I'm late, but how does this ensure the value is a integer?
> I believe for floating point the assumption is not correct because
> division by zero doesn't necessarily fault.

non-integer divisions use a different TREE_CODE (RDIV_EXPR).
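
For illustration (example mine, not from the patch), the side effect being
recorded is the usual "an executed integer division implies a nonzero
divisor", restricted to the TRUNC/CEIL/FLOOR/ROUND/EXACT_DIV_EXPR codes in
the hunk above; a floating-point division is RDIV_EXPR and gets no such
side effect:

int
f (int a, int b)
{
  int q = a / b;   /* TRUNC_DIV_EXPR: b != 0 is known past this point.  */
  if (b == 0)      /* Ranger can now fold this test away.  */
    return -1;
  return q;
}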

>
> -Andi


[PATCH] Fold truncations of left shifts in match.pd

2022-05-30 Thread Roger Sayle

Whilst investigating PR 55278, I noticed that the tree-ssa optimizers
aren't eliminating the promotions of shifts to "int" as inserted by the
c-family front-ends, instead leaving this simplification to
the RTL optimizers.  This patch allows match.pd to do this itself earlier,
narrowing (T)(X << C) to (T)X << C when the constant C is known to be
valid for the (narrower) type T.

Hence for this simple test case:
short foo(short x) { return x << 5; }

the .optimized dump currently looks like:

short int foo (short int x)
{
  int _1;
  int _2;
  short int _4;

   [local count: 1073741824]:
  _1 = (int) x_3(D);
  _2 = _1 << 5;
  _4 = (short int) _2;
  return _4;
}

but with this patch, now becomes:

short int foo (short int x)
{
  short int _2;

   [local count: 1073741824]:
  _2 = x_1(D) << 5;
  return _2;
}

This is always reasonable as RTL expansion knows how to use
widening optabs if it makes sense at the RTL level to perform
this shift in a wider mode.

Of course, there's often a catch.  The above simplification not only
reduces the number of statements in gimple, but also allows further
optimizations, for example including the perception of rotate idioms
and bswap16.  Alas, optimizing things earlier than anticipated
requires several testsuite changes [though all these tests have
been confirmed to generate identical assembly code on x86_64].
The only significant change is that the vectorization pass previously
wouldn't vectorize rotations if the backend doesn't explicitly provide
an optab for them.  This is curious as if the rotate is expressed as
ior(lshift,rshift) it will vectorize, and likewise RTL expansion will
generate the iorv(lshiftv,rshiftv) sequence if required for a vector
mode rotation.  Hence this patch includes a tweak to the optabs
test in tree-vect-stmts.cc's vectorizable_shift to better reflect
the functionality supported by RTL expansion.
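
As a concrete example (hand-written, not taken from the patch or the
testsuite) of the kind of idiom this affects, a 16-bit rotation by a
constant is typically written as:

unsigned short
rotl16_3 (unsigned short x)
{
  /* Rotate left by 3.  The front end promotes both shifts to int; once
     the shifts can be narrowed back to 16 bits (this patch, combined
     with the pre-existing narrowing folds for the bitwise OR), the
     rotate idiom can be spotted earlier.  */
  return (unsigned short) ((x << 3) | (x >> 13));
}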

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-05-30  Roger Sayle  

gcc/ChangeLog
* match.pd (convert (lshift @1 INTEGER_CST@2)): Narrow integer
left shifts by a constant when the result is truncated, and the
shift constant is well-defined for the narrower mode.
* tree-vect-stmts.cc (vectorizable_shift): Rotations by
constants are vectorizable, if the backend supports logical
shifts and IOR logical operations in the required vector mode.

gcc/testsuite/ChangeLog
* gcc.dg/fold-convlshift-4.c: New test case.
* gcc.dg/optimize-bswaphi-1.c: Update found bswap count.
* gcc.dg/tree-ssa/pr61839_3.c: Shift is now optimized before VRP.
* gcc.dg/vect/vect-over-widen-1-big-array.c: Remove obsolete tests.
* gcc.dg/vect/vect-over-widen-1.c: Likewise.
* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-3.c: Likewise.
* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-4.c: Likewise.


Thanks in advance,
Roger
--

diff --git a/gcc/match.pd b/gcc/match.pd
index c2fed9b..05b34d6 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3637,6 +3637,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && !integer_zerop (@3))
   (lshift (convert @2) @3)))
 
+/* Narrow a lshift by constant.  */
+(simplify
+ (convert (lshift:s@0 @1 INTEGER_CST@2))
+ (if (INTEGRAL_TYPE_P (type)
+  && INTEGRAL_TYPE_P (TREE_TYPE (@0))
+  && TYPE_PRECISION (type) <= TYPE_PRECISION (TREE_TYPE (@0))
+  && !integer_zerop (@2)
+  && wi::ltu_p (wi::to_wide (@2), TYPE_PRECISION (type)))
+  (lshift (convert @1) (convert @2
+
 /* Simplifications of conversions.  */
 
 /* Basic strip-useless-type-conversions / strip_nops.  */
diff --git a/gcc/testsuite/gcc.dg/fold-convlshift-4.c 
b/gcc/testsuite/gcc.dg/fold-convlshift-4.c
new file mode 100644
index 000..001627f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-convlshift-4.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+short foo(short x)
+{
+  return x << 5;
+}
+
+/* { dg-final { scan-tree-dump-not "\\(int\\)" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\(short int\\)" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c 
b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
index d045da9..a5d8bfd 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
@@ -68,4 +68,4 @@ get_unaligned_16_be (unsigned char *p)
 
 
 /* { dg-final { scan-tree-dump-times "16 bit load in target endianness found 
at" 4 "bswap" } } */
-/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 5 
"bswap" } } */
+/* { dg-final { scan-tree-dump-times "16 bit bswap implementation found at" 4 
"bswap" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c

Re: [PATCH] [PR105665] ivopts: check defs of names in base for undefs

2022-05-30 Thread Richard Biener via Gcc-patches
On Sat, May 28, 2022 at 7:52 AM Alexandre Oliva via Gcc-patches
 wrote:
>
>
> The patch for PR 100810 tested for undefined SSA_NAMEs appearing
> directly in the base expression of the potential IV candidate, but
> that's not enough.  The testcase for PR105665 shows an undefined
> SSA_NAME has the same ill effect if it's referenced as an PHI_NODE arg
> in the referenced SSA_NAME.  The variant of that test shows it can be
> further removed from the referenced SSA_NAME.
>
> To avoid deep recursion, precompute SSA_NAMEs deemed unsuitable
> candidates, so that we can skip them with a flag test.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?

I don't think you can rely on TREE_VISITED not being set at the start of the
pass (and you don't clear it either).  So it's probably better to use a
sbitmap.
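
For reference, a minimal sketch of that alternative (fragment only, not a
drop-in patch):

/* Track undefined SSA names in a pass-local bitmap indexed by SSA
   version instead of using TREE_VISITED.  */
auto_sbitmap ssa_undefs (num_ssa_names);
bitmap_clear (ssa_undefs);

/* In mark_ssa_undefs, instead of TREE_VISITED (var) = true:  */
bitmap_set_bit (ssa_undefs, SSA_NAME_VERSION (var));

/* In find_ssa_undef, instead of testing TREE_VISITED (*tp):  */
if (TREE_CODE (*tp) == SSA_NAME
    && bitmap_bit_p (ssa_undefs, SSA_NAME_VERSION (*tp)))
  return *tp;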

I also wonder how you decide that tracking PHIs with (one) uninit arg
is "enough"?  The previous patch indeed is also only somewhat a
hack because the GIMPLE IL doesn't stabilize the "value" of an
SSA default def.  Is it important which edge of the PHI the undef
appears in?  I presume the testcase might have it on the
loop entry edge?  I presume only PHIs in loop headers are to be
considered?

Richard.

>
> for  gcc/ChangeLog
>
> PR tree-optimization/105665
> PR tree-optimization/100810
> * tree-ssa-loop-ivopts.cc (mark_ssa_undefs): Precompute
> unsuitability of SSA_NAMEs in TREE_VISITED.
> (find_ssa_undef): Check the precomputed flag.
> (tree_ssa_iv_optimize): Call mark_ssa_undefs.
>
> for  gcc/testsuite/ChangeLog
>
> PR tree-optimization/105665
> PR tree-optimization/100810
> * gcc.dg/torture/pr105665.c: New.
> ---
>  gcc/testsuite/gcc.dg/torture/pr105665.c |   20 ++
>  gcc/tree-ssa-loop-ivopts.cc |   62 
> ++-
>  2 files changed, 80 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr105665.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr105665.c 
> b/gcc/testsuite/gcc.dg/torture/pr105665.c
> new file mode 100644
> index 0..34cfc65843495
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr105665.c
> @@ -0,0 +1,20 @@
> +/* { dg-do run } */
> +
> +int a, b, c[1], d[2], *e = c;
> +int main() {
> +  int f = 0;
> +  for (; b < 2; b++) {
> +int g;
> +if (f)
> +  g++, b = 40;
> +a = d[b * b];
> +for (f = 0; f < 3; f++) {
> +  if (e)
> +break;
> +  g--;
> +  if (a)
> +a = g;
> +}
> +  }
> +  return 0;
> +}
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index 81b536f930415..d8200f2a53b21 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -3071,13 +3071,70 @@ get_loop_invariant_expr (struct ivopts_data *data, 
> tree inv_expr)
>return *slot;
>  }
>
> -/* Find the first undefined SSA name in *TP.  */
> +/* Mark as TREE_VISITED any SSA_NAMEs that are unsuitable as ivopts
> +   candidates for potentially involving undefined behavior.  */
> +
> +static void
> +mark_ssa_undefs (void)
> +{
> +  auto_vec queue;
> +
> +  unsigned int i;
> +  tree var;
> +  FOR_EACH_SSA_NAME (i, var, cfun)
> +{
> +  if (SSA_NAME_IS_VIRTUAL_OPERAND (var)
> + || ssa_defined_default_def_p (var)
> + || !ssa_undefined_value_p (var, false))
> +   TREE_VISITED (var) = false;
> +  else
> +   {
> + TREE_VISITED (var) = true;
> + queue.safe_push (var);
> + if (dump_file)
> +   fprintf (dump_file, "marking _%i as undef\n",
> +SSA_NAME_VERSION (var));
> +   }
> +}
> +
> +  while (!queue.is_empty ())
> +{
> +  var = queue.pop ();
> +  gimple *stmt;
> +  imm_use_iterator iter;
> +  FOR_EACH_IMM_USE_STMT (stmt, iter, var)
> +   {
> + if (is_gimple_call (stmt) || is_a  (stmt))
> +   continue;
> +
> + def_operand_p defvar;
> + ssa_op_iter diter;
> + FOR_EACH_PHI_OR_STMT_DEF (defvar, stmt, diter, SSA_OP_DEF)
> +   {
> + gcc_checking_assert (is_gimple_assign (stmt)
> +  || is_a  (stmt));
> + tree def = DEF_FROM_PTR (defvar);
> + if (TREE_VISITED (def))
> +   continue;
> + TREE_VISITED (def) = true;
> + queue.safe_push (def);
> + if (dump_file)
> +   fprintf (dump_file, "Marking _%i as undef because of _%i\n",
> +SSA_NAME_VERSION (def), SSA_NAME_VERSION (var));
> +   }
> +   }
> +}
> +}
> +
> +/* Return *TP if it is an SSA_NAME marked with TREE_VISITED, i.e., as
> +   unsuitable as ivopts candidates for potentially involving undefined
> +   behavior.  */
>
>  static tree
>  find_ssa_undef (tree *tp, int *walk_subtrees, void *)
>  {
>if (TREE_CODE (*tp) == SSA_NAME
> -  && ssa_undefined_value_p (*tp, false))
> +  && TREE_VISITED (*tp))
>  return 

Re: [PATCH 5/7] openmp: Add C++ support for parsing metadirectives

2022-05-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Dec 10, 2021 at 05:37:34PM +, Kwok Cheung Yeung wrote:
> From e9bb138d4c3f560e48e408facce2361533685a98 Mon Sep 17 00:00:00 2001
> From: Kwok Cheung Yeung 
> Date: Mon, 6 Dec 2021 22:58:01 +
> Subject: [PATCH 5/7] openmp: Add C++ support for parsing metadirectives
> 
> This adds support for parsing OpenMP metadirectives in the C++ front end.
> 
> 2021-12-10  Kwok Cheung Yeung  
> 
>   gcc/cp/
>   * parser.c (cp_parser_skip_to_end_of_statement): Handle parentheses.
>   (cp_parser_skip_to_end_of_block_or_statement): Likewise.
>   (cp_parser_omp_context_selector): Add extra argument.  Allow
>   non-constant expressions.
>   (cp_parser_omp_context_selector_specification): Add extra argument and
>   propagate to cp_parser_omp_context_selector.
>   (analyze_metadirective_body): New.
>   (cp_parser_omp_metadirective): New.
>   (cp_parser_omp_construct): Handle PRAGMA_OMP_METADIRECTIVE.
>   (cp_parser_pragma): Handle PRAGMA_OMP_METADIRECTIVE.
> ---
>  gcc/cp/parser.c | 425 +++-
>  1 file changed, 417 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> index 6f273bfe21f..afbfe148949 100644
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> @@ -3907,6 +3907,17 @@ cp_parser_skip_to_end_of_statement (cp_parser* parser)
> ++nesting_depth;
> break;
>  
> + case CPP_OPEN_PAREN:
> +   /* Track parentheses in case the statement is a standalone 'for'
> +  statement - we want to skip over the semicolons separating the
> +  operands.  */
> +   ++nesting_depth;
> +   break;
> +
> + case CPP_CLOSE_PAREN:
> +   --nesting_depth;
> +   break;
> +
>   case CPP_KEYWORD:
> if (token->keyword != RID__EXPORT
> && token->keyword != RID__MODULE
> @@ -3996,6 +4007,17 @@ cp_parser_skip_to_end_of_block_or_statement 
> (cp_parser* parser)
> nesting_depth++;
> break;
>  
> + case CPP_OPEN_PAREN:
> +   /* Track parentheses in case the statement is a standalone 'for'
> +  statement - we want to skip over the semicolons separating the
> +  operands.  */
> +   nesting_depth++;
> +   break;
> +
> + case CPP_CLOSE_PAREN:
> +   nesting_depth--;
> +   break;
> +

Like for C FE, I think this is too risky.

>   case CTX_PROPERTY_EXPR:
> -   t = cp_parser_constant_expression (parser);
> +   /* Allow non-constant expressions in metadirectives.  */
> +   t = metadirective_p
> +   ? cp_parser_expression (parser)
> +   : cp_parser_constant_expression (parser);
> if (t != error_mark_node)
>   {
> t = fold_non_dependent_expr (t);
> -   if (!value_dependent_expression_p (t)
> -   && (!INTEGRAL_TYPE_P (TREE_TYPE (t))
> -   || !tree_fits_shwi_p (t)))
> +   if (metadirective_p && !INTEGRAL_TYPE_P (TREE_TYPE (t)))

Like in the other patch, but more importantly, if t is
type_dependent_expression_p, you shouldn't diagnose it, it might be
integral after instantiation.  But it needs to be diagnosed later during
instantiation if it isn't integral then...
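
A tiny self-contained illustration (example mine, not metadirective code):
the expression below is type-dependent, so whether it has integral type is
only known once the template is instantiated.

template <typename T>
T selector_expr (T n)
{
  /* Integral for T = int, but not for T = double, so a parse-time
     "must be integral" diagnostic would be premature.  */
  return n + 1;
}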

> +  cp_token *token = cp_lexer_peek_token (parser->lexer);
> +  bool stop = false;
> +
> +  if (cp_lexer_next_token_is_keyword (parser->lexer, RID_CASE))
> + in_case = true;
> +  else if (cp_lexer_next_token_is_keyword (parser->lexer, RID_LABEL))
> + in_label_decl = true;
> +
> +  switch (token->type)
> + {
> + case CPP_EOF:
> +   break;
> + case CPP_NAME:
> +   if ((!in_case
> +&& cp_lexer_nth_token_is (parser->lexer, 2, CPP_COLON))
> +   || in_label_decl)
> + labels.safe_push (token->u.value);

Similar thing as for C, identifier : can appear in various spots even when
it isn't a label, in C++ even in many more cases.  Say
nested struct definition, struct S : T {}; or (that is for both C and C++)
e.g. bitfields struct V { int v : 13; };

> +   goto add;
> + case CPP_OPEN_BRACE:
> +   ++nesting_depth;
> +   goto add;
> + case CPP_CLOSE_BRACE:
> +   if (--nesting_depth == 0)
> + stop = true;
> +   goto add;
> + case CPP_OPEN_PAREN:
> +   ++bracket_depth;
> +   goto add;
> + case CPP_CLOSE_PAREN:
> +   --bracket_depth;
> +   goto add;
> + case CPP_COLON:
> +   in_case = false;
> +   goto add;
> + case CPP_SEMICOLON:
> +   if (nesting_depth == 0 && bracket_depth == 0)
> + stop = true;
> +   /* Local label declarations are terminated by a semicolon.  */
> +   in_label_decl = false;
> +   goto add;
> + default:
> + add:
> +   tokens.safe_push (*token);
> +   cp_lexer_consume_token (parser->lexer);
> +   if (stop)
> + break;
> +   continue;
> + }
> +  break;
> +}
> +}
> +
> +/* 

Re: [PATCH 4/7] openmp: Add support for streaming metadirectives and resolving them after LTO

2022-05-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Dec 10, 2021 at 05:36:20PM +, Kwok Cheung Yeung wrote:
> 2021-12-10  Kwok Cheung Yeung  
> 
>   gcc/
>   * Makefile.in (OBJS): Add omp-expand-metadirective.o.
>   * gimple-streamer-in.c (input_gimple_stmt): Add case for
>   GIMPLE_OMP_METADIRECTIVE.  Handle metadirective labels.
>   * gimple-streamer-out.c (output_gimple_stmt): Likewise.
>   * omp-expand-metadirective.cc: New.
>   * passes.def: Add pass_omp_expand_metadirective.
>   * tree-pass.h (make_pass_omp_expand_metadirective): New prototype.
> ---
>  gcc/Makefile.in |   1 +
>  gcc/gimple-streamer-in.c|  10 ++
>  gcc/gimple-streamer-out.c   |   6 +
>  gcc/omp-expand-metadirective.cc | 191 
>  gcc/passes.def  |   1 +
>  gcc/tree-pass.h |   1 +
>  6 files changed, 210 insertions(+)
>  create mode 100644 gcc/omp-expand-metadirective.cc
> 
> @@ -151,6 +151,7 @@ input_gimple_stmt (class lto_input_block *ib, class 
> data_in *data_in,
>  case GIMPLE_COND:
>  case GIMPLE_GOTO:
>  case GIMPLE_DEBUG:
> +case GIMPLE_OMP_METADIRECTIVE:
>for (i = 0; i < num_ops; i++)
>   {
> tree *opp, op = stream_read_tree (ib, data_in);
> @@ -188,6 +189,15 @@ input_gimple_stmt (class lto_input_block *ib, class 
> data_in *data_in,
> else
>   gimple_call_set_fntype (call_stmt, stream_read_tree (ib, data_in));
>   }
> +  if (gomp_metadirective *metadirective_stmt
> + = dyn_cast  (stmt))
> + {
> +   gimple_alloc_omp_metadirective (metadirective_stmt);
> +   for (i = 0; i < num_ops; i++)
> + gimple_omp_metadirective_set_label (metadirective_stmt, i,
> + stream_read_tree (ib,
> +   data_in));
> + }

Ah, sorry for the comment about LTO streaming, here it is.

> --- /dev/null
> +++ b/gcc/omp-expand-metadirective.cc
> @@ -0,0 +1,191 @@
> +/* Expand an OpenMP metadirective.
> +
> +   Copyright (C) 2021 Free Software Foundation, Inc.

We have 2022 now...
> +

Missing function comment.
> +static void
> +omp_expand_metadirective (function *fun, basic_block bb)
> +{
> +  gimple *stmt = last_stmt (bb);
> +  vec candidates
> += omp_resolve_metadirective (stmt);
> +
> +  /* This is the last chance for the metadirective to be resolved.  */
> +  if (candidates.is_empty ())
> +gcc_unreachable ();

  gcc_assert (!candidates.is_empty ());
?

> +  /* opt_pass methods: */
> +  virtual bool gate (function *)
> +  {
> +return (flag_openmp);

Useless ()s around it.
But much more importantly, I don't really like this to be a separate pass,
walking the whole IL once more is expensive, even when you restrict it
to just flag_openmp.
Late declare variant resolving is done in the (now a little bit misnamed)
pass_omp_device_lower.
The gate of that pass is right now:
  return (!(fun->curr_properties & PROP_gimple_lomp_dev)
  || (flag_openmp
  && cgraph_node::get (fun->decl)->calls_declare_variant_alt));
so it would be nice to track (conservatively)
whether the current function has any metadirectives
in it which aren't yet resolved (but perhaps the calls_declare_variant_alt
bit could be abused for that too) and in that case also deal with those.
You can surely gather them in the omp-offload.cc pass and then call
a function in your new file to handle that.

Jakub



Re: [PATCH 3/7] openmp: Add support for resolving metadirectives during parsing and Gimplification

2022-05-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Dec 10, 2021 at 05:35:05PM +, Kwok Cheung Yeung wrote:
> 2021-12-10  Kwok Cheung Yeung  
> 
>   gcc/
>   * gimplify.c (expand_omp_metadirective): New.
>   * omp-general.c: Include tree-pretty-print.h.
>   (DELAY_METADIRECTIVES_AFTER_LTO): New macro.
>   (omp_context_selector_matches): Delay resolution of selectors.  Allow
>   non-constant expressions.
>   (omp_dynamic_cond): New.
>   (omp_dynamic_selector_p): New.
>   (sort_variant): New.
>   (omp_get_dynamic_candidates): New.
>   (omp_resolve_metadirective): New.
>   (omp_resolve_metadirective): New.
>   * omp-general.h (struct omp_metadirective_variant): New.
>   (omp_resolve_metadirective): New prototype.
> 
>   gcc/c-family/
>   * c-omp.c (c_omp_expand_metadirective_r): New.
>   (c_omp_expand_metadirective): New.

> --- a/gcc/c-family/c-omp.c
> +++ b/gcc/c-family/c-omp.c
> @@ -3264,8 +3264,49 @@ c_omp_categorize_directive (const char *first, const 
> char *second,
>return NULL;
>  }
>  

Missing function comment.

> +static tree
> +c_omp_expand_metadirective_r (vec 
> ,
> +   hash_map _labels,
> +   unsigned index)
> +{
> +  struct omp_metadirective_variant  = candidates[index];
> +  tree if_block = push_stmt_list ();
> +  if (candidate.directive != NULL_TREE)
> +add_stmt (candidate.directive);
> +  if (candidate.body != NULL_TREE)
> +{
> +  tree *label = body_labels.get (candidate.body);
> +  if (label != NULL)
> + add_stmt (build1 (GOTO_EXPR, void_type_node, *label));
> +  else
> + {
> +   tree body_label = create_artificial_label (UNKNOWN_LOCATION);
> +   add_stmt (build1 (LABEL_EXPR, void_type_node, body_label));
> +   add_stmt (candidate.body);
> +   body_labels.put (candidate.body, body_label);
> + }
> +}
> +  if_block = pop_stmt_list (if_block);
> +
> +  if (index == candidates.length () - 1)
> +return if_block;
> +
> +  tree cond = candidate.selector;
> +  gcc_assert (cond != NULL_TREE);
> +
> +  tree else_block = c_omp_expand_metadirective_r (candidates, body_labels,
> +   index + 1);
> +  tree ret = push_stmt_list ();
> +  tree stmt = build3 (COND_EXPR, void_type_node, cond, if_block, else_block);
> +  add_stmt (stmt);
> +  ret = pop_stmt_list (ret);
> +
> +  return ret;
> +}
> +

Likewise.

>  tree
> -c_omp_expand_metadirective (vec &)
> +c_omp_expand_metadirective (vec 
> )
>  {
> -  return NULL_TREE;
> +  hash_map body_labels;
> +  return c_omp_expand_metadirective_r (candidates, body_labels, 0);
>  }

> --- a/gcc/omp-general.c
> +++ b/gcc/omp-general.c
> @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "data-streamer.h"
>  #include "streamer-hooks.h"
>  #include "opts.h"
> +#include "tree-pretty-print.h"
>  
>  enum omp_requires omp_requires_mask;
>  
> @@ -1253,14 +1254,22 @@ omp_context_name_list_prop (tree prop)
>  }
>  }
>  
> +#define DELAY_METADIRECTIVES_AFTER_LTO { \
> +  if (metadirective_p && !(cfun->curr_properties & PROP_gimple_lomp_dev))
> \
> +return -1;   \

Why this?  Is that just some testing hack (which then the choice of
selectors in the testsuite relies on)?  I don't see why those selectors
shouldn't be resolved as early as possible.

> @@ -2624,16 +2673,189 @@ omp_lto_input_declare_variant_alt (lto_input_block 
> *ib, cgraph_node *node,
>INSERT) = entryp;
>  }
>  

Missing function comment.

> +static int
> +sort_variant (const void * a, const void *b, void *)
> +{
> +  widest_int score1 = ((const struct omp_metadirective_variant *) a)->score;
> +  widest_int score2 = ((const struct omp_metadirective_variant *) b)->score;
> +
> +  if (score1 > score2)
> +return -1;
> +  else if (score1 < score2)
> +return 1;
> +  else
> +return 0;
> +}

Note, resolving at the end of parsing (during gimplification) is still not
during parsing.  For resolving during parsing we'll need another mode, in
which the FEs somehow track e.g. selected OpenMP constructs that surround
the code being currently parsed.  Obviously that can be handled
incrementally.

Jakub



Re: [PATCH 2/7] openmp: Add middle-end support for metadirectives

2022-05-30 Thread Jakub Jelinek via Gcc-patches
On Fri, Dec 10, 2021 at 05:33:25PM +, Kwok Cheung Yeung wrote:
> 2021-12-10  Kwok Cheung Yeung  
> 
>   gcc/
>   * gimple-low.c (lower_omp_metadirective): New.
>   (lower_stmt): Handle GIMPLE_OMP_METADIRECTIVE.
>   * gimple-pretty-print.c (dump_gimple_omp_metadirective): New.
>   (pp_gimple_stmt_1): Handle GIMPLE_OMP_METADIRECTIVE.
>   * gimple-walk.c (walk_gimple_op): Handle GIMPLE_OMP_METADIRECTIVE.
>   (walk_gimple_stmt): Likewise.
>   * gimple.c (gimple_alloc_omp_metadirective): New.
>   (gimple_build_omp_metadirective): New.
>   (gimple_build_omp_metadirective_variant): New.
>   * gimple.def (GIMPLE_OMP_METADIRECTIVE): New.
>   (GIMPLE_OMP_METADIRECTIVE_VARIANT): New.
>   * gimple.h (gomp_metadirective_variant): New.
>   (gomp_metadirective): New.
>   (is_a_helper ::test): New.
>   (is_a_helper ::test): New.
>   (is_a_helper ::test): New.
>   (is_a_helper ::test): New.
>   (gimple_alloc_omp_metadirective): New prototype.
>   (gimple_build_omp_metadirective): New prototype.
>   (gimple_build_omp_metadirective_variant): New prototype.
>   (gimple_has_substatements): Add GIMPLE_OMP_METADIRECTIVE case.
>   (gimple_has_ops): Add GIMPLE_OMP_METADIRECTIVE.
>   (gimple_omp_metadirective_label): New.
>   (gimple_omp_metadirective_set_label): New.
>   (gimple_omp_metadirective_variants): New.
>   (gimple_omp_metadirective_set_variants): New.
>   (CASE_GIMPLE_OMP): Add GIMPLE_OMP_METADIRECTIVE.
>   * gimplify.c (is_gimple_stmt): Add OMP_METADIRECTIVE.
>   (expand_omp_metadirective): New.
>   (gimplify_omp_metadirective): New.
>   (gimplify_expr): Add case for OMP_METADIRECTIVE.
>   * gsstruct.def (GSS_OMP_METADIRECTIVE): New.
>   (GSS_OMP_METADIRECTIVE_VARIANT): New.
>   * omp-expand.c (build_omp_regions_1): Handle GIMPLE_OMP_METADIRECTIVE.
>   (omp_make_gimple_edges): Likewise.
>   * omp-low.c (struct omp_context): Add next_clone field.
>   (new_omp_context): Initialize next_clone field.
>   (clone_omp_context): New.
>   (delete_omp_context): Delete clone contexts.
>   (scan_omp_metadirective): New.
>   (scan_omp_1_stmt): Handle GIMPLE_OMP_METADIRECTIVE.
>   (lower_omp_metadirective): New.
>   (lower_omp_1): Handle GIMPLE_OMP_METADIRECTIVE.
>   * tree-cfg.c (cleanup_dead_labels): Handle GIMPLE_OMP_METADIRECTIVE.
>   (gimple_redirect_edge_and_branch): Likewise.
>   * tree-inline.c (remap_gimple_stmt): Handle GIMPLE_OMP_METADIRECTIVE.
>   (estimate_num_insns): Likewise.
>   * tree-pretty-print.c (dump_generic_node): Handle OMP_METADIRECTIVE.
>   * tree-ssa-operands.c (parse_ssa_operands): Handle
>   GIMPLE_OMP_METADIRECTIVE.

> --- a/gcc/gimple-pretty-print.c
> +++ b/gcc/gimple-pretty-print.c
> @@ -2051,6 +2051,63 @@ dump_gimple_omp_return (pretty_printer *buffer, const 
> gimple *gs, int spc,
>  }
>  }
>  
> +/* Dump a GIMPLE_OMP_METADIRECTIVE tuple on the pretty_printer BUFFER.  */
> +
> +static void
> +dump_gimple_omp_metadirective (pretty_printer *buffer, const gimple *gs,
> +int spc, dump_flags_t flags)
> +{
> +  if (flags & TDF_RAW)
> +{
> +  dump_gimple_fmt (buffer, spc, flags, "%G <%+BODY <%S> >", gs,
> +gimple_omp_body (gs));
> +}

No need for {}s around a single statement.

> +  else
> +{
> +  pp_string (buffer, "#pragma omp metadirective");
> +  newline_and_indent (buffer, spc + 2);
> +
> +  gimple *variant = gimple_omp_metadirective_variants (gs);
> +
> +  for (unsigned i = 0; i < gimple_num_ops (gs); i++)
> + {
> +   tree selector = gimple_op (gs, i);
> +
> +   if (selector == NULL_TREE)
> + pp_string (buffer, "default:");
> +   else
> + {
> +   pp_string (buffer, "when (");
> +   dump_generic_node (buffer, selector, spc, flags, false);
> +   pp_string (buffer, "):");
> + }
> +
> +   if (variant != NULL)
> + {
> +   newline_and_indent (buffer, spc + 4);
> +   pp_left_brace (buffer);
> +   pp_newline (buffer);
> +   dump_gimple_seq (buffer, gimple_omp_body (variant), spc + 6,
> +flags);
> +   newline_and_indent (buffer, spc + 4);
> +   pp_right_brace (buffer);
> +
> +   variant = variant->next;
> + }
> +   else
> + {
> +   tree label = gimple_omp_metadirective_label (gs, i);
> +
> +   pp_string (buffer, " ");
> +   dump_generic_node (buffer, label, spc, flags, false);
> + }
> +
> +   if (i != gimple_num_ops (gs) - 1)
> + newline_and_indent (buffer, spc + 2);

I think it would be better to use a gimple_stmt_iterator to walk the variants,
so no variant->next etc., but gimple_omp_metadirective_variants returning
a gimple_seq instead of gimple * (it is the same thing under the hood),
then
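A minimal sketch of what such an iterator-based walk could look like
(illustrative only, and assuming the accessor were changed to return a
gimple_seq) is:

  /* Illustrative sketch: walk the variant bodies via the sequence
     iterator instead of following variant->next by hand.  */
  gimple_seq variants = gimple_omp_metadirective_variants (gs);
  for (gimple_stmt_iterator gsi = gsi_start (variants);
       !gsi_end_p (gsi); gsi_next (&gsi))
    {
      gimple *variant = gsi_stmt (gsi);
      dump_gimple_seq (buffer, gimple_omp_body (variant), spc + 6, flags);
    }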
  

Re: [PATCH v2, rs6000] Fix ICE on expand bcd__ [PR100736]

2022-05-30 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2022/5/26 15:35, HAO CHEN GUI wrote:
> Hi,
>   This patch fixes the ICE reported in PR100736. It removes the condition
> check of finite math only flag not setting in "*_cc" pattern.
> With or without this flag, we still can use "cror" to check if either
> two bits of CC is set or not for "fp_two" codes. We don't need a reverse
> comparison (implemented by crnot) here when the finite math flag is set,
> as the latency of "cror" and "crnor" are the same.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-05-26 Haochen Gui 
> 
> gcc/
>   * config/rs6000/rs6000.md (*_cc): Remove condition of
>   finite math only flag not setting.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/pr100736.c: New.
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index fdfbc6566a5..a6f9cbc9b8b 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -12995,9 +12995,9 @@ (define_insn_and_split "*_cc"
>[(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
>   (fp_two:GPR (match_operand:CCFP 1 "cc_reg_operand" "y")
> (const_int 0)))]
> -  "!flag_finite_math_only"
> +  ""
>"#"
> -  "&& 1"
> +  ""

Segher added this hunk; I'm not sure if he'd prefer to keep the condition
unchanged and update the expansion side instead.  Looking forward to his comments.  :)

>[(pc)]
>  {
>rtx cc = rs6000_emit_fp_cror (, mode, operands[1]);
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr100736.c 
> b/gcc/testsuite/gcc.target/powerpc/pr100736.c
> new file mode 100644
> index 000..32cb6df6cd9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr100736.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O2 -ffinite-math-only" } */
> +
> +typedef __attribute__ ((altivec (vector__))) unsigned char v;
> +
> +int foo (v a, v b)
> +{
> +  return __builtin_vec_bcdsub_ge (a, b, 0);
> +}
> +
> +/* { dg-final { scan-assembler {\mcror\M} } } */
> 

The case from PR100736 fails with an ICE as reported, so maybe we can remove
this dg-final check: as you noted in the description above, either "cror" or
"crnor" is acceptable, so this extra check could make the test fragile.
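
If an assembler check is still wanted, a single pattern accepting either
mnemonic might do, e.g. (untested suggestion):

  /* { dg-final { scan-assembler {\mcrn?or\M} } } */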

BR,
Kewen


Re: [PATCH v2] libgccjit: allow common objects in $(EXTRA_GCC_OBJS) and $(EXTRA_OBJS)

2022-05-30 Thread Xi Ruoyao via Gcc-patches
Ping.  I'd like to see libgccjit working on LoongArch so I would be able
to submit a Rust port to upstream.

If the result is NACK I'd like to know alternative approaches to fix the
build failure.

I doubt that "j...@gcc.gnu.org" is really used, so I've CC'ed the JIT maintainer
listed in MAINTAINERS.

On Thu, 2022-05-19 at 16:10 +0800, Yang Yujie wrote:
> Hello,
> 
> This patch fixes libgccjit build failure on loongarch* targets,
> and could probably be useful for future ports.
> 
> For now, libgccjit is linked with objects from $(EXTRA_GCC_OBJS) and
> libbackend.a, which contains object files from $(EXTRA_OBJS).
> 
> This effectively forbids any overlap between those two lists, i.e. all
> target-specific shared code between the gcc driver and compiler
> executables must go into gcc/common/config//-common.cc,
> which feels a bit inconvenient when there are a lot of "common" stuff
> that we want to put into separate source files.
> 
> By linking libgccjit with $(EXTRA_GCC_OBJS_EXCLUSIVE), which contains
> all elements from $(EXTRA_GCC_OBJS) but not $(EXTRA_OBJS), this
> problem
> can be alleviated.
> 
> This patch does not affect any other target architecture than
> loongarch,
> and has been bootstrapped and regression-tested on loongarch64-linux-
> gnuf64
> an x86_64-pc-linux-gnu.
> 
> Any recommendations? Please review. Thanks a lot.
> 
> Yujie
> 
> * gcc/jit/ChangeLog:
> 
> * Make-lang.in: only link objects from $(EXTRA_GCC_OBJS)
> that's not in $(EXTRA_OBJS) into libgccjit.
> ---
>  gcc/jit/Make-lang.in | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/jit/Make-lang.in b/gcc/jit/Make-lang.in
> index 6e10abfd0ac..248ec45b729 100644
> --- a/gcc/jit/Make-lang.in
> +++ b/gcc/jit/Make-lang.in
> @@ -157,18 +157,23 @@ LIBGCCJIT_EXTRA_OPTS =
> $(LIBGCCJIT_VERSION_SCRIPT_OPTION) \
>  endif
>  endif
>  
> +# Only link objects from $(EXTRA_GCC_OBJS) that's not already
> +# included in libbackend.a ($(EXTRA_OBJS)).
> +EXTRA_GCC_OBJS_EXCLUSIVE = $(foreach _obj1, $(EXTRA_GCC_OBJS), \
> +   $(if $(filter $(_obj1), $(EXTRA_OBJS)),, $(_obj1)))
> +
>  # We avoid using $(BACKEND) from Makefile.in in order to avoid
> pulling
>  # in main.o
>  $(LIBGCCJIT_FILENAME): $(jit_OBJS) \
> libbackend.a libcommon-target.a libcommon.a \
> $(CPPLIB) $(LIBDECNUMBER) \
> $(LIBDEPS) $(srcdir)/jit/libgccjit.map \
> -   $(EXTRA_GCC_OBJS) $(jit.prev)
> +   $(EXTRA_GCC_OBJS_EXCLUSIVE) $(jit.prev)
> @$(call LINK_PROGRESS,$(INDEX.jit),start)
> +$(LLINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ -shared \
>  $(jit_OBJS) libbackend.a libcommon-target.a libcommon.a \
>  $(CPPLIB) $(LIBDECNUMBER) $(EXTRA_GCC_LIBS) $(LIBS)
> $(BACKENDLIBS) \
> -    $(EXTRA_GCC_OBJS) \
> +    $(EXTRA_GCC_OBJS_EXCLUSIVE) \
>  $(LIBGCCJIT_EXTRA_OPTS)
> @$(call LINK_PROGRESS,$(INDEX.jit),end)
>  

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] PR rtl-optimization/7061: Complex number arguments on x86_64-like ABIs.

2022-05-30 Thread Roger Sayle

This patch addresses the issue in comment #6 of PR rtl-optimization/7061
(a four digit PR number) from 2006 where on x86_64 complex number arguments
are unconditionally spilled to the stack.

For the test cases below:
float re(float _Complex a) { return __real__ a; }
float im(float _Complex a) { return __imag__ a; }

GCC with -O2 currently generates:

re: movq%xmm0, -8(%rsp)
movss   -8(%rsp), %xmm0
ret
im: movq%xmm0, -8(%rsp)
movss   -4(%rsp), %xmm0
ret

with this patch we now generate:

re: ret
im: movq%xmm0, %rax
shrq$32, %rax
movd%eax, %xmm0
ret

[Technically, this shift can be performed on %xmm0 in a single
instruction, but the backend needs to be taught to do that; the
important bit is that the SCmode argument isn't written to the
stack].

The patch itself is to emit_group_store where just before RTL
expansion commits to writing to the stack, we check if the store
group consists of a single scalar integer register that holds
a complex mode value; on x86_64 SCmode arguments are passed in
DImode registers.  If this is the case, we can use a SUBREG to
"view_convert" the integer to the equivalent complex mode.

An interesting corner case that showed up during testing is that
x86_64 also passes HCmode arguments in DImode registers(!), i.e.
using modes of different sizes.  This is easily handled/supported
by first converting to an integer mode of the correct size, and
then generating a complex mode SUBREG of this.  This is similar
in concept to the patch I proposed here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590139.html
which was almost (but not quite) approved here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591139.html

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-05-30  Roger Sayle  

gcc/ChangeLog
PR rtl-optimization/7061
* expr.cc (emit_group_store): For groups that consist of a single
scalar integer register that holds a complex mode value, use
gen_lowpart to generate a SUBREG to "view_convert" to the complex
mode.  For modes of different sizes, first convert to an integer
mode of the appropriate size.

gcc/testsuite/ChangeLog
PR rtl-optimization/7061
* gcc.target/i386/pr7061-1.c: New test case.
* gcc.target/i386/pr7061-2.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 7197996..c9df206 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -2803,10 +2803,26 @@ emit_group_store (rtx orig_dst, rtx src, tree type 
ATTRIBUTE_UNUSED,
{
  machine_mode dest_mode = GET_MODE (dest);
  machine_mode tmp_mode = GET_MODE (tmps[i]);
+ scalar_int_mode imode;
 
  gcc_assert (known_eq (bytepos, 0) && XVECLEN (src, 0));
 
- if (GET_MODE_ALIGNMENT (dest_mode)
+ if (finish == 1
+ && REG_P (tmps[i])
+ && COMPLEX_MODE_P (dest_mode)
+ && SCALAR_INT_MODE_P (tmp_mode)
+ && int_mode_for_mode (dest_mode).exists ())
+   {
+ if (tmp_mode != imode)
+   {
+ rtx tmp = gen_reg_rtx (imode);
+ emit_move_insn (tmp, gen_lowpart (imode, tmps[i]));
+ dst = gen_lowpart (dest_mode, tmp);
+   }
+ else
+   dst = gen_lowpart (dest_mode, tmps[i]);
+   }
+ else if (GET_MODE_ALIGNMENT (dest_mode)
  >= GET_MODE_ALIGNMENT (tmp_mode))
{
  dest = assign_stack_temp (dest_mode,
diff --git a/gcc/testsuite/gcc.target/i386/pr7061-1.c 
b/gcc/testsuite/gcc.target/i386/pr7061-1.c
new file mode 100644
index 000..ce5f6b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr7061-1.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+float re(float _Complex a) { return __real__ a; }
+/* { dg-final { scan-assembler-not "mov" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr7061-2.c 
b/gcc/testsuite/gcc.target/i386/pr7061-2.c
new file mode 100644
index 000..ac33340
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr7061-2.c
@@ -0,0 +1,5 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+float im(float _Complex a) { return __imag__ a; }
+/* { dg-final { scan-assembler-not "movss" } } */
+/* { dg-final { scan-assembler-not "rsp" } } */


Re: [PATCH v3, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-05-30 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2022/5/18 16:52, HAO CHEN GUI wrote:
> Hi,
>   This patch implements optab f[min/max]_optab by xs[min/max]dp on rs6000.
> Tests show that outputs of xs[min/max]dp are consistent with the standard
> of C99 fmin/max.
> 
>   This patch also binds __builtin_vsx_xs[min/max]dp to fmin/max instead
> of smin/max. So the builtins always generate xs[min/max]dp on all
> platforms.
> 
>   Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-05-18 Haochen Gui 
> 
> gcc/
>   PR target/103605
>   * rs6000.md (FMINMAX): New.
>   (minmax_op): New.
>   (f3): New pattern by UNSPEC_FMAX and UNSPEC_FMIN.
>   * rs6000-builtins.def (__builtin_vsx_xsmaxdp): Set pattern to fmaxdf3.
>   (__builtin_vsx_xsmindp): Set pattern to fmindf3.
> 

These changelog entries look wrong to me; they're missing the relative path names.

* config/rs6000/rs6000-builtins.def ...
* config/rs6000/rs6000.md ...

> gcc/testsuite/
>   PR target/103605
>   * gcc.dg/pr103605.c: New.

... and wrong path here.

* gcc.target/powerpc/pr103605.c: New test.

OK with the changelog above fixed.  Thanks!

BR,
Kewen

> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index f4a9f24bcc5..8b735493b40 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1613,10 +1613,10 @@
>  XSCVSPDP vsx_xscvspdp {}
> 
>const double __builtin_vsx_xsmaxdp (double, double);
> -XSMAXDP smaxdf3 {}
> +XSMAXDP fmaxdf3 {}
> 
>const double __builtin_vsx_xsmindp (double, double);
> -XSMINDP smindf3 {}
> +XSMINDP fmindf3 {}
> 
>const double __builtin_vsx_xsrdpi (double);
>  XSRDPI vsx_xsrdpi {}
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index bf85baa5370..197de0838ee 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -158,6 +158,8 @@ (define_c_enum "unspec"
> UNSPEC_HASHCHK
> UNSPEC_XXSPLTIDP_CONST
> UNSPEC_XXSPLTIW_CONST
> +   UNSPEC_FMAX
> +   UNSPEC_FMIN
>])
> 
>  ;;
> @@ -5341,6 +5343,22 @@ (define_insn_and_split "*s3_fpr"
>DONE;
>  })
> 
> +
> +(define_int_iterator FMINMAX [UNSPEC_FMAX UNSPEC_FMIN])
> +
> +(define_int_attr  minmax_op [(UNSPEC_FMAX "max")
> +  (UNSPEC_FMIN "min")])
> +
> +(define_insn "f3"
> +  [(set (match_operand:SFDF 0 "vsx_register_operand" "=wa")
> + (unspec:SFDF [(match_operand:SFDF 1 "vsx_register_operand" "wa")
> +   (match_operand:SFDF 2 "vsx_register_operand" "wa")]
> +   FMINMAX))]
> +"TARGET_VSX"
> +"xsdp %x0,%x1,%x2"
> +[(set_attr "type" "fp")]
> +)
> +
>  (define_expand "movcc"
> [(set (match_operand:GPR 0 "gpc_reg_operand")
>(if_then_else:GPR (match_operand 1 "comparison_operator")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103605.c 
> b/gcc/testsuite/gcc.target/powerpc/pr103605.c
> new file mode 100644
> index 000..e43ac40c2d1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103605.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O1 -mvsx" } */
> +/* { dg-final { scan-assembler-times {\mxsmaxdp\M} 3 } } */
> +/* { dg-final { scan-assembler-times {\mxsmindp\M} 3 } } */
> +
> +#include 
> +
> +double test1 (double d0, double d1)
> +{
> +  return fmin (d0, d1);
> +}
> +
> +float test2 (float d0, float d1)
> +{
> +  return fmin (d0, d1);
> +}
> +
> +double test3 (double d0, double d1)
> +{
> +  return fmax (d0, d1);
> +}
> +
> +float test4 (float d0, float d1)
> +{
> +  return fmax (d0, d1);
> +}
> +
> +double test5 (double d0, double d1)
> +{
> +  return __builtin_vsx_xsmindp (d0, d1);
> +}
> +
> +double test6 (double d0, double d1)
> +{
> +  return __builtin_vsx_xsmaxdp (d0, d1);
> +}


Re: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

2022-05-30 Thread Alexander Monakov via Gcc-patches
On Mon, 30 May 2022, Hongtao Liu wrote:

> On Mon, May 30, 2022 at 3:44 PM Alexander Monakov  wrote:
> >
> > On Mon, 30 May 2022, Hongtao Liu wrote:
> >
> > > On Mon, May 30, 2022 at 2:22 PM Alexander Monakov via Gcc-patches
> > >  wrote:
> > > > >
> > > > > The spill is mainly decided by 3 insns related to r92
> > > > >
> > > > > 283(insn 3 61 4 2 (set (reg/v:SF 92 [ x ])
> > > > > 284(reg:SF 102)) "test3.c":7:1 142 {*movsf_internal}
> > > > > 285 (expr_list:REG_DEAD (reg:SF 102)
> > > > >
> > > > > 288(insn 9 4 12 2 (set (reg:SI 89 [ _11 ])
> > > > > 289(subreg:SI (reg/v:SF 92 [ x ]) 0)) "test3.c":3:36 81 
> > > > > {*movsi_internal}
> > > > > 290 (nil))
> > > > >
> > > > > And
> > > > > 382(insn 28 27 29 5 (set (reg:DF 98)
> > > > > 383(float_extend:DF (reg/v:SF 92 [ x ]))) "test3.c":11:13 163 
> > > > > {*extendsfdf2}
> > > > > 384 (expr_list:REG_DEAD (reg/v:SF 92 [ x ])
> > > > > 385(nil)))
> > > > > 386(insn 29 28 30 5 (s
> > > > >
> > > > > The frequency the for INSN 3 and INSN 9 is not affected, but 
> > > > > frequency of INSN
> > > > > 28 drop from 805 -> 89 after swapping "unlikely" and "likely".  
> > > > > Because of
> > > > > that, GPR cost decreases a lot, finally make the RA choose GPR 
> > > > > instead of MEM.
> > > > >
> > > > > GENERAL_REGS:2356,2356
> > > > > SSE_REGS:6000,6000
> > > > > MEM:4089,4089
> > > >
> > > > But why are SSE_REGS costed so high? r92 is used in SFmode, it doesn't 
> > > > make
> > > > sense that selecting a GPR for it looks cheaper than xmm0.
> > > For INSN3 and INSN 28, SSE_REGS costs zero.
> > > But for INSN 9, it's a SImode move, we have disparaged non-gpr
> > > alternatives in movsi_internal pattern which finally makes SSE_REGS
> > > costs 6 * 1000(1000 is frequency, 6 is move cost between SSE_REGS and
> > > GPR, sse_to_integer/integer_to_sse).
> >
> > But wait, selecting a GPR for r92 makes insn 3 (movsf_internal) an
> > sse-to-integer move, so it should be equally high cost? Not to mention
> > that the use in insn 28 (extendsfdf2) should have higher cost also.
> >
> > Why does GPR cost 2356 instead of 6000 for insn 3 plus extra for insn 28?
> First GPR cost in insn 3 is not necessarily equal to integer_to_sse,
> it's the minimal cost of all alternatives, and one alternative is ?r,
> the cost is 2.
> 
> I think the difference in movsf_internal and movsi_internal for *v and
> ?r make RA finally choose GPR.

I think this is one of the main issues here: if in the end it's the same
'mov %xmmN, ' instruction, only the pattern name is different.

Alexander


Re: [x86 PING] PR target/70321: Split double word equality/inequality after STV.

2022-05-30 Thread Uros Bizjak via Gcc-patches
On Mon, May 30, 2022 at 11:18 AM Uros Bizjak  wrote:
>
> On Mon, May 30, 2022 at 11:11 AM Roger Sayle  
> wrote:
> >
> >
> > Hi Uros,
> > This is a ping of my patch from April, which as you've suggested should be
> > submitted
> > for review even if there remain two missed-optimization regressions on ia32
> > (to
> > allow reviewers to better judge if those fixes are appropriate/the best
> > solution).
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593174.html
> >
> > The executive summary is that the core of this patch is a single pre-reload
> > splitter:
> >
> > (define_insn_and_split "*cmp_doubleword"
> >   [(set (reg:CCZ FLAGS_REG)
> >(compare:CCZ (match_operand: 0 "nonimmediate_operand")
> > (match_operand: 1 "x86_64_general_operand")))]
> >   "ix86_pre_reload_split ()"
> >   "#"
> >   "&& 1"
> >   [(parallel [(set (reg:CCZ FLAGS_REG)
> >   (compare:CCZ (ior:DWIH (match_dup 4) (match_dup 5))
> >(const_int 0)))
> >  (set (match_dup 4) (ior:DWIH (match_dup 4) (match_dup 5)))])]
> >
> > That allows the RTL optimizers to assume the target has a double word
> > equality/inequality comparison during combine, but then split this into
> > an CC setting IOR of the lowpart and highpart just before reload.
> >
> > The intended effect is that for PR target/70321's test case:
> >
> > void foo (long long ixi)
> > {
> >   if (ixi != 14348907)
> > __builtin_abort ();
> > }
> >
> > where with -m32 -O2 GCC previously produced:
> >
> > movl16(%esp), %eax
> > movl20(%esp), %edx
> > xorl$14348907, %eax
> > orl %eax, %edx
> > jne .L3
> >
> > we now produce the slightly improved:
> >
> > movl16(%esp), %eax
> > xorl$14348907, %eax
> > orl 20(%esp), %eax
> > jne .L3
> >
> > Similar improvements are seen with _int128 equality on TARGET_64BIT.
> >
> > The rest of the patch, in fact the bulk of it, is purely to adjust the other
> > parts of the i386 backend that make the assumption that double word
> > equality has been lowered during RTL expansion, including for example
> > STV which turns DImode equality into SSE ptest, which previously
> > explicitly looked for the IOR of lowpart/highpart.
> >
> > This patch has been retested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, with no new failures.  However, when adding
> > --target_board=unix{-m32} there two new missed optimization FAILs
> > both related to pandn.
> > FAIL: gcc.target/i386/pr65105-5.c scan-assembler pandn
> > FAIL: gcc.target/i386/pr78794.c scan-assembler pandn
> >
> > These become the requested test cases for the fix proposed here:
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595390.html
> >
> > OK for mainline, now we're in stage 1?
> >
> >
> > 2022-05-30  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/70321
> > * config/i386/i386-expand.cc (ix86_expand_branch): Don't decompose
> > DI mode equality/inequality using XOR here.  Instead generate a
> > COMPARE for doubleword modes (DImode on !TARGET_64BIT or TImode).
> > * config/i386/i386-features.cc (gen_gpr_to_xmm_move_src): Use
> > gen_rtx_SUBREG when NUNITS is 1, i.e. for TImode to V1TImode.
> > (general_scalar_chain::convert_compare): New function to convert
> > scalar equality/inequality comparison into vector operations.
> > (general_scalar_chain::convert_insn) [COMPARE]: Refactor. Call
> > new convert_compare helper method.
> > (convertible_comparion_p): Update to match doubleword COMPARE
> > of two register, memory or integer constant operands.
> > * config/i386/i386-features.h
> > (general_scalar_chain::convert_compare):
> > Prototype/declare member function here.
> > * config/i386/i386.md (cstore4): Change mode to SDWIM, but
> > only allow new doubleword modes for EQ and NE operators.
> > (*cmp_doubleword): New define_insn_and_split, to split a
> > doubleword comparison into a pair of XORs followed by an IOR to
> > set the (zero) flags register, optimizing the XORs if possible.
> > * config/i386/sse.md (V_AVX): Include V1TI and V2TI in mode
> > iterator;
> > V_AVX is (currently) only used by ptest.
> > (sse4_1 mode attribute): Update to support V1TI and V2TI.
> >
> > gcc/testsuite/ChangeLog
> > PR target/70321
> > * gcc.target/i386/pr70321.c: New test case.
> > * gcc.target/i386/sse4_1-stv-1.c: New test case.

@@ -0,0 +1,10 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2" } */

Ah, you will need explicit -mstv -mno-stackrealign here.

Uros.


Re: [x86 PING] PR target/70321: Split double word equality/inequality after STV.

2022-05-30 Thread Uros Bizjak via Gcc-patches
On Mon, May 30, 2022 at 11:11 AM Roger Sayle  wrote:
>
>
> Hi Uros,
> This is a ping of my patch from April, which as you've suggested should be
> submitted
> for review even if there remain two missed-optimization regressions on ia32
> (to
> allow reviewers to better judge if those fixes are appropriate/the best
> solution).
> https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593174.html
>
> The executive summary is that the core of this patch is a single pre-reload
> splitter:
>
> (define_insn_and_split "*cmp_doubleword"
>   [(set (reg:CCZ FLAGS_REG)
>(compare:CCZ (match_operand: 0 "nonimmediate_operand")
> (match_operand: 1 "x86_64_general_operand")))]
>   "ix86_pre_reload_split ()"
>   "#"
>   "&& 1"
>   [(parallel [(set (reg:CCZ FLAGS_REG)
>   (compare:CCZ (ior:DWIH (match_dup 4) (match_dup 5))
>(const_int 0)))
>  (set (match_dup 4) (ior:DWIH (match_dup 4) (match_dup 5)))])]
>
> That allows the RTL optimizers to assume the target has a double word
> equality/inequality comparison during combine, but then split this into
> an CC setting IOR of the lowpart and highpart just before reload.
>
> The intended effect is that for PR target/70321's test case:
>
> void foo (long long ixi)
> {
>   if (ixi != 14348907)
> __builtin_abort ();
> }
>
> where with -m32 -O2 GCC previously produced:
>
> movl16(%esp), %eax
> movl20(%esp), %edx
> xorl$14348907, %eax
> orl %eax, %edx
> jne .L3
>
> we now produce the slightly improved:
>
> movl16(%esp), %eax
> xorl$14348907, %eax
> orl 20(%esp), %eax
> jne .L3
>
> Similar improvements are seen with _int128 equality on TARGET_64BIT.
>
> The rest of the patch, in fact the bulk of it, is purely to adjust the other
> parts of the i386 backend that make the assumption that double word
> equality has been lowered during RTL expansion, including for example
> STV which turns DImode equality into SSE ptest, which previously
> explicitly looked for the IOR of lowpart/highpart.
>
> This patch has been retested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, with no new failures.  However, when adding
> --target_board=unix{-m32} there two new missed optimization FAILs
> both related to pandn.
> FAIL: gcc.target/i386/pr65105-5.c scan-assembler pandn
> FAIL: gcc.target/i386/pr78794.c scan-assembler pandn
>
> These become the requested test cases for the fix proposed here:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595390.html
>
> OK for mainline, now we're in stage 1?
>
>
> 2022-05-30  Roger Sayle  
>
> gcc/ChangeLog
> PR target/70321
> * config/i386/i386-expand.cc (ix86_expand_branch): Don't decompose
> DI mode equality/inequality using XOR here.  Instead generate a
> COMPARE for doubleword modes (DImode on !TARGET_64BIT or TImode).
> * config/i386/i386-features.cc (gen_gpr_to_xmm_move_src): Use
> gen_rtx_SUBREG when NUNITS is 1, i.e. for TImode to V1TImode.
> (general_scalar_chain::convert_compare): New function to convert
> scalar equality/inequality comparison into vector operations.
> (general_scalar_chain::convert_insn) [COMPARE]: Refactor. Call
> new convert_compare helper method.
> (convertible_comparion_p): Update to match doubleword COMPARE
> of two register, memory or integer constant operands.
> * config/i386/i386-features.h
> (general_scalar_chain::convert_compare):
> Prototype/declare member function here.
> * config/i386/i386.md (cstore4): Change mode to SDWIM, but
> only allow new doubleword modes for EQ and NE operators.
> (*cmp_doubleword): New define_insn_and_split, to split a
> doubleword comparison into a pair of XORs followed by an IOR to
> set the (zero) flags register, optimizing the XORs if possible.
> * config/i386/sse.md (V_AVX): Include V1TI and V2TI in mode
> iterator;
> V_AVX is (currently) only used by ptest.
> (sse4_1 mode attribute): Update to support V1TI and V2TI.
>
> gcc/testsuite/ChangeLog
> PR target/70321
> * gcc.target/i386/pr70321.c: New test case.
> * gcc.target/i386/sse4_1-stv-1.c: New test case.

OK.

Thanks,
Uros.

>
>
> Many thanks in advance,
> Roger
> --
>


[x86 PING] PR target/70321: Split double word equality/inequality after STV.

2022-05-30 Thread Roger Sayle

Hi Uros,
This is a ping of my patch from April, which as you've suggested should be
submitted for review even if there remain two missed-optimization regressions
on ia32 (to allow reviewers to better judge if those fixes are appropriate/the
best solution).
https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593174.html

The executive summary is that the core of this patch is a single pre-reload
splitter:

(define_insn_and_split "*cmp_doubleword"
  [(set (reg:CCZ FLAGS_REG)
   (compare:CCZ (match_operand: 0 "nonimmediate_operand")
(match_operand: 1 "x86_64_general_operand")))]
  "ix86_pre_reload_split ()"
  "#"
  "&& 1"
  [(parallel [(set (reg:CCZ FLAGS_REG)
  (compare:CCZ (ior:DWIH (match_dup 4) (match_dup 5))
   (const_int 0)))
 (set (match_dup 4) (ior:DWIH (match_dup 4) (match_dup 5)))])]

That allows the RTL optimizers to assume the target has a double word
equality/inequality comparison during combine, but then split this into
a CC-setting IOR of the lowpart and highpart just before reload.

The intended effect is that for PR target/70321's test case:

void foo (long long ixi)
{
  if (ixi != 14348907)
__builtin_abort ();
}

where with -m32 -O2 GCC previously produced:

movl16(%esp), %eax
movl20(%esp), %edx
xorl$14348907, %eax
orl %eax, %edx
jne .L3

we now produce the slightly improved:

movl16(%esp), %eax
xorl$14348907, %eax
orl 20(%esp), %eax
jne .L3

Similar improvements are seen with _int128 equality on TARGET_64BIT.

The rest of the patch, in fact the bulk of it, is purely to adjust the other
parts of the i386 backend that make the assumption that double word
equality has been lowered during RTL expansion, including for example
STV which turns DImode equality into SSE ptest, which previously 
explicitly looked for the IOR of lowpart/highpart.

This patch has been retested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, with no new failures.  However, when adding 
--target_board=unix{-m32} there are two new missed optimization FAILs,
both related to pandn.
FAIL: gcc.target/i386/pr65105-5.c scan-assembler pandn
FAIL: gcc.target/i386/pr78794.c scan-assembler pandn

These become the requested test cases for the fix proposed here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595390.html

OK for mainline, now we're in stage 1?


2022-05-30  Roger Sayle  

gcc/ChangeLog
PR target/70321
* config/i386/i386-expand.cc (ix86_expand_branch): Don't decompose
DI mode equality/inequality using XOR here.  Instead generate a
COMPARE for doubleword modes (DImode on !TARGET_64BIT or TImode).
* config/i386/i386-features.cc (gen_gpr_to_xmm_move_src): Use
gen_rtx_SUBREG when NUNITS is 1, i.e. for TImode to V1TImode.
(general_scalar_chain::convert_compare): New function to convert
scalar equality/inequality comparison into vector operations.
(general_scalar_chain::convert_insn) [COMPARE]: Refactor. Call
new convert_compare helper method.
(convertible_comparion_p): Update to match doubleword COMPARE
of two register, memory or integer constant operands.
* config/i386/i386-features.h
(general_scalar_chain::convert_compare):
Prototype/declare member function here.
* config/i386/i386.md (cstore4): Change mode to SDWIM, but
only allow new doubleword modes for EQ and NE operators.
(*cmp_doubleword): New define_insn_and_split, to split a
doubleword comparison into a pair of XORs followed by an IOR to
set the (zero) flags register, optimizing the XORs if possible.
* config/i386/sse.md (V_AVX): Include V1TI and V2TI in mode
iterator;
V_AVX is (currently) only used by ptest.
(sse4_1 mode attribute): Update to support V1TI and V2TI.

gcc/testsuite/ChangeLog
PR target/70321
* gcc.target/i386/pr70321.c: New test case.
* gcc.target/i386/sse4_1-stv-1.c: New test case.


Many thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 5cd7b99..8e9d2b6 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -2317,21 +2317,15 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx 
op1, rtx label)
 case E_DImode:
   if (TARGET_64BIT)
goto simple;
-  /* For 32-bit target DI comparison may be performed on
-SSE registers.  To allow this we should avoid split
-to SI mode which is achieved by doing xor in DI mode
-and then comparing with zero (which is recognized by
-STV pass).  We don't compare using xor when optimizing
-for size.  */
-  if (!optimize_insn_for_size_p ()
- && TARGET_STV
- && (code == EQ || code == NE))
-   {
- op0 = force_reg (mode, gen_rtx_XOR (mode, op0, 

Re: [gcc r12-6398] [Ada] Reduce runtime dependencies on stage1

2022-05-30 Thread Arnaud Charlet via Gcc-patches
> >  In order to build GNAT, the Ada compiler, you need a working GNAT
> > -compiler (GCC version 4.7 or later).
> > +compiler (GCC version 5.1 or later).
> >
> >  This includes GNAT tools such as @command{gnatmake} and
> >  @command{gnatlink}, since the Ada front end is written in Ada and
> 
> I've now tested that, but find that it doesn't work:
> 
> [...]
> gcc -c -g  -gnatpg -gnatwns -gnata -W -Wall -I- -I. -Iada/generated -Iada 
> -I[...]/source-gcc/gcc/ada [...]/source-gcc/gcc/ada/contracts.adb -o 
> ada/contracts.o

Ah, that's unfortunate. I guess on my side I'm using a version of GNAT built
without assertions, so it doesn't fail on this.

> So, what is actually the prerequisite GCC/Ada compiler version to be able
> to build GCC 12+ GCC/Ada?

I'm afraid I don't have more info at this stage; I assume GCC 6.1 would work
(not tested; if someone could confirm, that would be great).

Arno


Re: [PATCH v3] DSE: Use the constant store source if possible

2022-05-30 Thread Richard Sandiford via Gcc-patches
"H.J. Lu"  writes:
> On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote:
>> "H.J. Lu"  writes:
>> > On Wed, May 25, 2022 at 12:30 AM Richard Sandiford
>> >  wrote:
>> >>
>> >> "H.J. Lu via Gcc-patches"  writes:
>> >> > On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
>> >> >> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
>> >> >>  wrote:
>> >> >> >
>> >> >> > When recording store for RTL dead store elimination, check if the 
>> >> >> > source
>> >> >> > register is set only once to a constant.  If yes, record the constant
>> >> >> > as the store source.  It eliminates unrolled zero stores after 
>> >> >> > memset 0
>> >> >> > in a loop where a vector register is used as the zero store source.
>> >> >> >
>> >> >> > gcc/
>> >> >> >
>> >> >> > PR rtl-optimization/105638
>> >> >> > * dse.cc (record_store): Use the constant source if the 
>> >> >> > source
>> >> >> > register is set only once.
>> >> >> >
>> >> >> > gcc/testsuite/
>> >> >> >
>> >> >> > PR rtl-optimization/105638
>> >> >> > * g++.target/i386/pr105638.C: New test.
>> >> >> > ---
>> >> >> >  gcc/dse.cc   | 19 ++
>> >> >> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 
>> >> >> > 
>> >> >> >  2 files changed, 63 insertions(+)
>> >> >> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>> >> >> >
>> >> >> > diff --git a/gcc/dse.cc b/gcc/dse.cc
>> >> >> > index 30c11cee034..0433dd3d846 100644
>> >> >> > --- a/gcc/dse.cc
>> >> >> > +++ b/gcc/dse.cc
>> >> >> > @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
>> >> >> >
>> >> >> >   if (tem && CONSTANT_P (tem))
>> >> >> > const_rhs = tem;
>> >> >> > + else
>> >> >> > +   {
>> >> >> > + /* If RHS is set only once to a constant, set CONST_RHS
>> >> >> > +to the constant.  */
>> >> >> > + df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>> >> >> > + if (def != nullptr
>> >> >> > + && !DF_REF_IS_ARTIFICIAL (def)
>> >> >> > + && !DF_REF_NEXT_REG (def))
>> >> >> > +   {
>> >> >> > + rtx_insn *def_insn = DF_REF_INSN (def);
>> >> >> > + rtx def_body = PATTERN (def_insn);
>> >> >> > + if (GET_CODE (def_body) == SET)
>> >> >> > +   {
>> >> >> > + rtx def_src = SET_SRC (def_body);
>> >> >> > + if (CONSTANT_P (def_src))
>> >> >> > +   const_rhs = def_src;
>> >> >>
>> >> >> doesn't DSE have its own tracking of stored values?  Shouldn't we
>> >> >
>> >> > It tracks stored values only within the basic block.  When RTL loop
>> >> > invariant motion hoists a constant initialization out of the loop into
>> >> > a separate basic block, the constant store value becomes unknown
>> >> > within the original basic block.
>> >> >
>> >> >> improve _that_ if it is not enough?  I also wonder if you need to
>> >> >
>> >> > My patch extends DSE stored value tracking to include the constant which
>> >> > is set only once in another basic block.
>> >> >
>> >> >> verify the SET isn't partial?
>> >> >>
>> >> >
>> >> > Here is the v2 patch to check that the constant is set by a non-partial
>> >> > unconditional load.
>> >> >
>> >> > OK for master?
>> >> >
>> >> > Thanks.
>> >> >
>> >> > H.J.
>> >> > ---
>> >> > RTL DSE tracks redundant constant stores within a basic block.  When RTL
>> >> > loop invariant motion hoists a constant initialization out of the loop
>> >> > into a separate basic block, the constant store value becomes unknown
>> >> > within the original basic block.  When recording store for RTL DSE, 
>> >> > check
>> >> > if the source register is set only once to a constant by a non-partial
>> >> > unconditional load.  If yes, record the constant as the constant store
>> >> > source.  It eliminates unrolled zero stores after memset 0 in a loop
>> >> > where a vector register is used as the zero store source.
>> >> >
>> >> > gcc/
>> >> >
>> >> >   PR rtl-optimization/105638
>> >> >   * dse.cc (record_store): Use the constant source if the source
>> >> >   register is set only once.
>> >> >
>> >> > gcc/testsuite/
>> >> >
>> >> >   PR rtl-optimization/105638
>> >> >   * g++.target/i386/pr105638.C: New test.
>> >> > ---
>> >> >  gcc/dse.cc   | 22 
>> >> >  gcc/testsuite/g++.target/i386/pr105638.C | 44 
>> >> >  2 files changed, 66 insertions(+)
>> >> >  create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>> >> >
>> >> > diff --git a/gcc/dse.cc b/gcc/dse.cc
>> >> > index 30c11cee034..af8e88dac32 100644
>> >> > --- a/gcc/dse.cc
>> >> > +++ b/gcc/dse.cc
>> >> > @@ -1508,6 +1508,28 @@ record_store (rtx body, bb_info_t bb_info)
>> >> >
>> >> > if (tem && CONSTANT_P (tem))
>> >> >   const_rhs = tem;
>> >> > +  

Re: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

2022-05-30 Thread Hongtao Liu via Gcc-patches
On Mon, May 30, 2022 at 3:44 PM Alexander Monakov  wrote:
>
> On Mon, 30 May 2022, Hongtao Liu wrote:
>
> > On Mon, May 30, 2022 at 2:22 PM Alexander Monakov via Gcc-patches
> >  wrote:
> > > >
> > > > The spill is mainly decided by 3 insns related to r92
> > > >
> > > > 283(insn 3 61 4 2 (set (reg/v:SF 92 [ x ])
> > > > 284(reg:SF 102)) "test3.c":7:1 142 {*movsf_internal}
> > > > 285 (expr_list:REG_DEAD (reg:SF 102)
> > > >
> > > > 288(insn 9 4 12 2 (set (reg:SI 89 [ _11 ])
> > > > 289(subreg:SI (reg/v:SF 92 [ x ]) 0)) "test3.c":3:36 81 
> > > > {*movsi_internal}
> > > > 290 (nil))
> > > >
> > > > And
> > > > 382(insn 28 27 29 5 (set (reg:DF 98)
> > > > 383(float_extend:DF (reg/v:SF 92 [ x ]))) "test3.c":11:13 163 
> > > > {*extendsfdf2}
> > > > 384 (expr_list:REG_DEAD (reg/v:SF 92 [ x ])
> > > > 385(nil)))
> > > > 386(insn 29 28 30 5 (s
> > > >
> > > > The frequency the for INSN 3 and INSN 9 is not affected, but frequency 
> > > > of INSN
> > > > 28 drop from 805 -> 89 after swapping "unlikely" and "likely".  Because 
> > > > of
> > > > that, GPR cost decreases a lot, finally make the RA choose GPR instead 
> > > > of MEM.
> > > >
> > > > GENERAL_REGS:2356,2356
> > > > SSE_REGS:6000,6000
> > > > MEM:4089,4089
> > >
> > > But why are SSE_REGS costed so high? r92 is used in SFmode, it doesn't 
> > > make
> > > sense that selecting a GPR for it looks cheaper than xmm0.
> > For INSN3 and INSN 28, SSE_REGS costs zero.
> > But for INSN 9, it's a SImode move, we have disparaged non-gpr
> > alternatives in movsi_internal pattern which finally makes SSE_REGS
> > costs 6 * 1000(1000 is frequency, 6 is move cost between SSE_REGS and
> > GPR, sse_to_integer/integer_to_sse).
>
> But wait, selecting a GPR for r92 makes insn 3 (movsf_internal) an
> sse-to-integer move, so it should be equally high cost? Not to mention
> that the use in insn 28 (extendsfdf2) should have higher cost also.
>
> Why does GPR cost 2356 instead of 6000 for insn 3 plus extra for insn 28?
First, the GPR cost in insn 3 is not necessarily equal to integer_to_sse;
it's the minimal cost over all alternatives, and one alternative is ?r,
whose cost is 2.

I think the difference between movsf_internal and movsi_internal for *v and
?r makes the RA finally choose GPR.
>
> Alexander



-- 
BR,
Hongtao


[Ada] Simplify construction of a path to file

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
Code cleanup; semantics is unaffected.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* osint.adb (Locate_File): Change variable to constant and
initialize it by concatenation of directory, file name and NUL.

diff --git a/gcc/ada/osint.adb b/gcc/ada/osint.adb
--- a/gcc/ada/osint.adb
+++ b/gcc/ada/osint.adb
@@ -1886,13 +1886,13 @@ package body Osint is
   end if;
 
   declare
- Full_Name : String (1 .. Dir_Name'Length + Name'Length + 1);
+ Full_Name :
+   constant String (1 .. Dir_Name'Length + Name'Length + 1) :=
+   Dir_Name.all & Name & ASCII.NUL;
+ --  Use explicit bounds, because Dir_Name might be a substring whose
+ --  'First is not 1.
 
   begin
- Full_Name (1 .. Dir_Name'Length) := Dir_Name.all;
- Full_Name (Dir_Name'Length + 1 .. Full_Name'Last - 1) := Name;
- Full_Name (Full_Name'Last) := ASCII.NUL;
-
  Attr.all := Unknown_Attributes;
 
  if not Is_Regular_File (Full_Name'Address, Attr) then




[Ada] Fix spurious options being inserted in -fdiagnostics-format=json output

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
Without this patch, gnat would use `-gnatw?` as the default option for
some of the default warnings.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* erroutc.adb (Get_Warning_Option): Don't consider `?` as a
valid option switch.

diff --git a/gcc/ada/erroutc.adb b/gcc/ada/erroutc.adb
--- a/gcc/ada/erroutc.adb
+++ b/gcc/ada/erroutc.adb
@@ -367,7 +367,7 @@ package body Erroutc is
   Warn : constant Boolean := Errors.Table (Id).Warn;
   Warn_Chr : constant String (1 .. 2) := Errors.Table (Id).Warn_Chr;
begin
-  if Warn and then Warn_Chr /= "  " then
+  if Warn and then Warn_Chr /= "  " and then Warn_Chr (1) /= '?' then
  if Warn_Chr = "$ " then
 return "-gnatel";
  elsif Warn_Chr (2) = ' ' then




[Ada] Add "option" field to GNAT's -fdiagnostics-format=json output

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
This enables better integration with tools that handle GNAT's output.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* erroutc.ads (Get_Warning_Option): New function returning the
option responsible for a warning if it exists.
* erroutc.adb (Get_Warning_Option): Likewise.
(Get_Warning_Tag): Rely on Get_Warning_Option when possible.
* errout.adb (Output_JSON_Message): Emit option field.

diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb
--- a/gcc/ada/errout.adb
+++ b/gcc/ada/errout.adb
@@ -2170,6 +2170,9 @@ package body Errout is
   --  Do not print continuations messages as children of the current
   --  message if the current message is a continuation message.
 
+  Option : constant String := Get_Warning_Option (E);
+  --  The option that triggered this message.
+
--  Start of processing for Output_JSON_Message
 
begin
@@ -2197,9 +2200,16 @@ package body Errout is
  Write_Str ("}");
   end if;
 
+  Write_Str ("]");
+
+  --  Print message option, if there is one
+  if Option /= "" then
+ Write_Str (",""option"":""" & Option & );
+  end if;
+
   --  Print message content
 
-  Write_Str ("],""message"":""");
+  Write_Str (",""message"":""");
   Write_JSON_Escaped_String (Errors.Table (E).Text);
   Write_Str ();
 


diff --git a/gcc/ada/erroutc.adb b/gcc/ada/erroutc.adb
--- a/gcc/ada/erroutc.adb
+++ b/gcc/ada/erroutc.adb
@@ -359,6 +359,26 @@ package body Erroutc is
   return Cur_Msg;
end Get_Msg_Id;
 
+   
+   -- Get_Warning_Option --
+   
+
+   function Get_Warning_Option (Id : Error_Msg_Id) return String is
+  Warn : constant Boolean := Errors.Table (Id).Warn;
+  Warn_Chr : constant String (1 .. 2) := Errors.Table (Id).Warn_Chr;
+   begin
+  if Warn and then Warn_Chr /= "  " then
+ if Warn_Chr = "$ " then
+return "-gnatel";
+ elsif Warn_Chr (2) = ' ' then
+return "-gnatw" & Warn_Chr (1);
+ else
+return "-gnatw" & Warn_Chr;
+ end if;
+  end if;
+  return "";
+   end Get_Warning_Option;
+
-
-- Get_Warning_Tag --
-
@@ -366,22 +386,19 @@ package body Erroutc is
function Get_Warning_Tag (Id : Error_Msg_Id) return String is
   Warn : constant Boolean := Errors.Table (Id).Warn;
   Warn_Chr : constant String (1 .. 2) := Errors.Table (Id).Warn_Chr;
+  Option   : constant String  := Get_Warning_Option (Id);
begin
-  if Warn and then Warn_Chr /= "  " then
+  if Warn then
  if Warn_Chr = "? " then
 return "[enabled by default]";
  elsif Warn_Chr = "* " then
 return "[restriction warning]";
- elsif Warn_Chr = "$ " then
-return "[-gnatel]";
- elsif Warn_Chr (2) = ' ' then
-return "[-gnatw" & Warn_Chr (1) & ']';
- else
-return "[-gnatw" & Warn_Chr & ']';
+ elsif Option /= "" then
+return "[" & Option & "]";
  end if;
-  else
- return "";
   end if;
+
+  return "";
end Get_Warning_Tag;
 
-


diff --git a/gcc/ada/erroutc.ads b/gcc/ada/erroutc.ads
--- a/gcc/ada/erroutc.ads
+++ b/gcc/ada/erroutc.ads
@@ -493,6 +493,10 @@ package Erroutc is
--  Returns the number of warnings in the Errors table that were triggered
--  by a Compile_Time_Warning pragma.
 
+   function Get_Warning_Option (Id : Error_Msg_Id) return String;
+   --  Returns the warning switch causing this warning message or an empty
+   --  string is there is none..
+
function Get_Warning_Tag (Id : Error_Msg_Id) return String;
--  Given an error message ID, return tag showing warning message class, or
--  the null string if this option is not enabled or this is not a warning.




[Ada] Remove contract duplication in formal doubly linked lists

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
Remove a minor duplication in Post of a function of formal doubly linked
lists.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/a-cfdlli.ads (Insert): Remove the duplication.

diff --git a/gcc/ada/libgnat/a-cfdlli.ads b/gcc/ada/libgnat/a-cfdlli.ads
--- a/gcc/ada/libgnat/a-cfdlli.ads
+++ b/gcc/ada/libgnat/a-cfdlli.ads
@@ -543,15 +543,7 @@ is
Lst   => Length (Container),
Item  => New_Item))
 
---  Container contains Count times New_Item at the end
-
-and M.Constant_Range
-  (Container => Model (Container),
-   Fst   => Length (Container)'Old + 1,
-   Lst   => Length (Container),
-   Item  => New_Item)
-
---  A Count cursors have been inserted at the end of Container
+--  Count cursors have been inserted at the end of Container
 
 and P_Positions_Truncated
   (Positions (Container)'Old,




[Ada] Fix expansion of structural subprogram variants

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
When implementing structural subprogram variants we ignored them in
expansion of the pragma itself, but not in expansion of a recursive
subprogram call. Now fixed.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch6.adb (Check_Subprogram_Variant): Ignore structural
variants.

diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -3298,19 +3298,33 @@ package body Exp_Ch6 is
  Variant_Prag : constant Node_Id :=
Get_Pragma (Current_Scope, Pragma_Subprogram_Variant);
 
+ Pragma_Arg1  : Node_Id;
  Variant_Proc : Entity_Id;
 
   begin
  if Present (Variant_Prag) and then Is_Checked (Variant_Prag) then
 
---  Analysis of the pragma rewrites its argument with a reference
---  to the internally generated procedure.
+Pragma_Arg1 :=
+  Expression (First (Pragma_Argument_Associations (Variant_Prag)));
 
-Variant_Proc :=
-  Entity
-(Expression
-   (First
-  (Pragma_Argument_Associations (Variant_Prag;
+--  If pragma parameter is still an aggregate, it comes from a
+--  structural variant, which is not expanded and ignored for
+--  run-time execution.
+
+if Nkind (Pragma_Arg1) = N_Aggregate then
+   pragma Assert
+ (Chars
+(First
+  (Choices
+ (First (Component_Associations (Pragma_Arg1) =
+  Name_Structural);
+   return;
+end if;
+
+--  Otherwise, analysis of the pragma rewrites its argument with a
+--  reference to the internally generated procedure.
+
+Variant_Proc := Entity (Pragma_Arg1);
 
 Insert_Action (Call_Node,
   Make_Procedure_Call_Statement (Loc,




[Ada] Remove repeated description of support for Address clauses

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
The GNAT behaviour regarding the Ada RM requirement to support Address
clauses for imported subprograms was documented twice: in the section about
packed types (which was a mistake) and in the section about address clauses
(where it belongs).

Cleanup related to the use of packed arrays for bitset operations to
detect uses of uninitialized scalars in GNAT.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* doc/gnat_rm/implementation_advice.rst (Packed Types): Remove
duplicated and wrongly placed paragraph.
* gnat_rm.texi: Regenerate.

diff --git a/gcc/ada/doc/gnat_rm/implementation_advice.rst b/gcc/ada/doc/gnat_rm/implementation_advice.rst
--- a/gcc/ada/doc/gnat_rm/implementation_advice.rst
+++ b/gcc/ada/doc/gnat_rm/implementation_advice.rst
@@ -353,12 +353,6 @@ then values of the type are implicitly initialized to zero. This
 happens both for objects of the packed type, and for objects that have a
 subcomponent of the packed type.
 
-
-  "An implementation should support Address clauses for imported
-  subprograms."
-
-Followed.
-
 .. index:: Address clauses
 
 RM 13.3(14-19): Address Clauses


diff --git a/gcc/ada/gnat_rm.texi b/gcc/ada/gnat_rm.texi
--- a/gcc/ada/gnat_rm.texi
+++ b/gcc/ada/gnat_rm.texi
@@ -14333,14 +14333,6 @@ then values of the type are implicitly initialized to zero. This
 happens both for objects of the packed type, and for objects that have a
 subcomponent of the packed type.
 
-@quotation
-
-“An implementation should support Address clauses for imported
-subprograms.”
-@end quotation
-
-Followed.
-
 @geindex Address clauses
 
 @node RM 13 3 14-19 Address Clauses,RM 13 3 29-35 Alignment Clauses,RM 13 2 6-8 Packed Types,Implementation Advice




[Ada] Add insertion character to Ineffective_Inline_Warnings messages

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
This enables tools that ingest GNAT's output to properly classify these
messages.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* inline.adb (Check_Package_Body_For_Inlining): Add insertion
character.

diff --git a/gcc/ada/inline.adb b/gcc/ada/inline.adb
--- a/gcc/ada/inline.adb
+++ b/gcc/ada/inline.adb
@@ -2804,8 +2804,8 @@ package body Inline is
   elsif Ineffective_Inline_Warnings then
  Error_Msg_Unit_1 := Bname;
  Error_Msg_N
-   ("unable to inline subprograms defined in $??", P);
- Error_Msg_N ("\body not found??", P);
+   ("unable to inline subprograms defined in $?p?", P);
+ Error_Msg_N ("\body not found?p?", P);
  return;
   end if;
end if;




[Ada] Add insertion character for overlay modification warnings

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
This enables tools that ingest GNAT's output to properly classify these
messages.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* freeze.adb (Warn_Overlay): Add 'o' insertion character.
* sem_ch13.adb (Analyze_Attribute_Definition_Clause): Likewise.
* sem_util.adb (Note_Possible_Modifications): Likewise.

diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -789,7 +789,7 @@ package body Freeze is
then
   Error_Msg_Sloc := Sloc (Addr);
   Error_Msg_NE
-("??constant& may be modified via address clause#",
+("?o?constant& may be modified via address clause#",
  Decl, O_Ent);
end if;
 end;
@@ -10638,11 +10638,11 @@ package body Freeze is
  if Present (Old) then
 Error_Msg_Node_2 := Old;
 Error_Msg_N
-  ("default initialization of & may modify &??",
+  ("default initialization of & may modify &?o?",
Nam);
  else
 Error_Msg_N
-  ("default initialization of & may modify overlaid storage??",
+  ("default initialization of & may modify overlaid storage?o?",
Nam);
  end if;
 


diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -6705,7 +6705,7 @@ package body Sem_Ch13 is
 and then not Overlays_Constant (U_Ent)
 and then Address_Clause_Overlay_Warnings
   then
- Error_Msg_N ("??constant overlays a variable", Expr);
+ Error_Msg_N ("?o?constant overlays a variable", Expr);
 
   --  Imported variables can have an address clause, but then
   --  the import is pretty meaningless except to suppress


diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -25893,7 +25893,7 @@ package body Sem_Util is
 
   Error_Msg_Sloc := Sloc (Addr);
   Error_Msg_NE
-("??constant& may be modified via address clause#",
+("?o?constant& may be modified via address clause#",
  N, O_Ent);
end;
 end if;




[Ada] Fix Warn_On_All_Unread_Out_Parameters not being properly tagged

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
This caused tools ingesting GNAT's output to mislabel these messages.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_warn.adb (Warn_On_Useless_Assignment): Fix insertion
character.

diff --git a/gcc/ada/sem_warn.adb b/gcc/ada/sem_warn.adb
--- a/gcc/ada/sem_warn.adb
+++ b/gcc/ada/sem_warn.adb
@@ -4595,7 +4595,7 @@ package body Sem_Warn is
 then
if Warn_On_All_Unread_Out_Parameters then
   Error_Msg_NE
-("?m?& modified by call, but value might not "
+("?.o?& modified by call, but value might not "
  & "be referenced", LA, Ent);
end if;
 else




[Ada] Fix Warn_On_Late_Primitives messages not being properly tagged

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
This caused tools ingesting GNAT's output to mislabel these messages.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_disp.adb (Warn_On_Late_Primitive_After_Private_Extension):
Fix insertion character.

diff --git a/gcc/ada/sem_disp.adb b/gcc/ada/sem_disp.adb
--- a/gcc/ada/sem_disp.adb
+++ b/gcc/ada/sem_disp.adb
@@ -1207,7 +1207,7 @@ package body Sem_Disp is
   Error_Msg_Name_2 := Chars (E);
   Error_Msg_Sloc := Sloc (E);
   Error_Msg_N
-("?j?primitive of type % defined after private extension "
+("?.j?primitive of type % defined after private extension "
  & "% #?", Prim);
   Error_Msg_Name_1 := Chars (Prim);
   Error_Msg_Name_2 := Chars (E);




[Ada] Deconstruct deferred references

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
While cleaning up and modifying code for unreferenced warnings we
removed all calls to Defer_Reference, which was the only routine that
populated the Deferred_References table. Consequently, all the code
related to this table became dead.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* lib-xref.ads (Deferred_Reference_Entry, Defer_Reference,
Process_Deferred_References, Has_Deferred_Reference): Remove
client API.
* lib-xref.adb (Deferred_References, Defer_Reference,
Has_Deferred_Reference, Process_Deferred_References): Remove
implementation.
* frontend.adb, sem_ch11.adb, sem_ch5.adb, sem_res.adb,
sem_util.adb, sem_warn.adb: Remove uses of Deferred_References.

diff --git a/gcc/ada/frontend.adb b/gcc/ada/frontend.adb
--- a/gcc/ada/frontend.adb
+++ b/gcc/ada/frontend.adb
@@ -38,7 +38,6 @@ with Ghost;  use Ghost;
 with Inline; use Inline;
 with Lib;use Lib;
 with Lib.Load;   use Lib.Load;
-with Lib.Xref;
 with Live;   use Live;
 with Namet;  use Namet;
 with Nlists; use Nlists;
@@ -481,7 +480,6 @@ begin
 
 --  Output waiting warning messages
 
-Lib.Xref.Process_Deferred_References;
 Sem_Warn.Output_Non_Modified_In_Out_Warnings;
 Sem_Warn.Output_Unreferenced_Messages;
 Sem_Warn.Check_Unused_Withs;


diff --git a/gcc/ada/lib-xref.adb b/gcc/ada/lib-xref.adb
--- a/gcc/ada/lib-xref.adb
+++ b/gcc/ada/lib-xref.adb
@@ -57,14 +57,6 @@ package body Lib.Xref is
-- Declarations --
--
 
-   package Deferred_References is new Table.Table (
- Table_Component_Type => Deferred_Reference_Entry,
- Table_Index_Type => Int,
- Table_Low_Bound  => 0,
- Table_Initial=> 512,
- Table_Increment  => 200,
- Table_Name   => "Name_Deferred_References");
-
--  The Xref table is used to record references. The Loc field is set
--  to No_Location for a definition entry.
 
@@ -211,21 +203,6 @@ package body Lib.Xref is
   end if;
end Add_Entry;
 
-   -
-   -- Defer_Reference --
-   -
-
-   procedure Defer_Reference (Deferred_Reference : Deferred_Reference_Entry) is
-   begin
-  --  If Get_Ignore_Errors, then we are in Preanalyze_Without_Errors, and
-  --  we should not record cross references, because that will cause
-  --  duplicates when we call Analyze.
-
-  if not Get_Ignore_Errors then
- Deferred_References.Append (Deferred_Reference);
-  end if;
-   end Defer_Reference;
-
---
-- Equal --
---
@@ -1291,21 +1268,6 @@ package body Lib.Xref is
   return E;
end Get_Key;
 
-   
-   -- Has_Deferred_Reference --
-   
-
-   function Has_Deferred_Reference (Ent : Entity_Id) return Boolean is
-   begin
-  for J in Deferred_References.First .. Deferred_References.Last loop
- if Deferred_References.Table (J).E = Ent then
-return True;
- end if;
-  end loop;
-
-  return False;
-   end Has_Deferred_Reference;
-
--
-- Hash --
--
@@ -2753,33 +2715,6 @@ package body Lib.Xref is
   end Output_Refs;
end Output_References;
 
-   -
-   -- Process_Deferred_References --
-   -
-
-   procedure Process_Deferred_References is
-   begin
-  for J in Deferred_References.First .. Deferred_References.Last loop
- declare
-D : Deferred_Reference_Entry renames Deferred_References.Table (J);
-
- begin
-case Known_To_Be_Assigned (D.N) is
-   when True =>
-  Generate_Reference (D.E, D.N, 'm');
-
-   when False =>
-  Generate_Reference (D.E, D.N, 'r');
-
-end case;
- end;
-  end loop;
-
-  --  Clear processed entries from table
-
-  Deferred_References.Init;
-   end Process_Deferred_References;
-
 --  Start of elaboration for Lib.Xref
 
 begin


diff --git a/gcc/ada/lib-xref.ads b/gcc/ada/lib-xref.ads
--- a/gcc/ada/lib-xref.ads
+++ b/gcc/ada/lib-xref.ads
@@ -578,40 +578,6 @@ package Lib.Xref is
--  Export at line 4, that its body is exported to C, and that the link name
--  as given in the pragma is "here".
 
-   -
-   -- Deferred_References --
-   -
-
-   --  Normally we generate references as we go along, but as discussed in
-   --  Sem_Util.Is_LHS, and Sem_Ch8.Find_Direct_Name/Find_Selected_Component,
-   --  we have one case where that is tricky, which is when we have something
-   --  like X.A := 3, where we don't know until we know the type of X whether
-   --  this is a reference (if X is an access type, so what we really have is
-   --  X.all.A := 3) or a modification, where X is not an access type.
-
-   --  

[Ada] Fix -gnatw.f warnings not having the right insertion characters

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
This caused these warnings not to be tagged with the switch causing
them, which is an issue for tools ingesting GNAT's output.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_elab.adb (Process_Conditional_ABE_Access_Taken): Add '.f'
insertion characters.

diff --git a/gcc/ada/sem_elab.adb b/gcc/ada/sem_elab.adb
--- a/gcc/ada/sem_elab.adb
+++ b/gcc/ada/sem_elab.adb
@@ -4958,7 +4958,7 @@ package body Sem_Elab is
  then
 Error_Msg_Name_1 := Attribute_Name (Attr);
 Error_Msg_NE
-  ("??% attribute of & before body seen", Attr, Subp_Id);
+  ("?.f?% attribute of & before body seen", Attr, Subp_Id);
 Error_Msg_N ("\possible Program_Error on later references", Attr);
 
 Output_Active_Scenarios (Attr, New_In_State);




[Ada] Avoid creating a finalization wrapper block for functions

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
This patch fixes a bug whereby if a function body has local objects that
are finalizable, and has a return statement that requires generation of
transient finalizable temps, and there are dynamic-sized objects
requiring pushing/popping the (primary) stack at run time, and the
function body has exception handlers, then incorrect code is
generated. In particular, the transient objects are finalized after they
have been deallocated. This can cause seg faults, corrupted heap, and
the like.
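
To make the failure mode concrete, here is a reduced sketch (illustrative
only, not the original reproducer) that combines all four ingredients:

--  Illustrative only: names and types are invented for this sketch

with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

function Make (N : Natural) return Unbounded_String is
   Buf : String (1 .. N) := (others => ' ');  --  dynamic-sized local object
   Ctl : Unbounded_String;                    --  finalizable (controlled) local
begin
   Ctl := To_Unbounded_String (Buf);
   return Ctl & '!';  --  the return expression builds a transient controlled temp
exception
   when Constraint_Error =>
      return Null_Unbounded_String;           --  exception handler present
end Make;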

Note that if there are no dynamic-sized objects, then the bug does
not occur, because the back end allocates objects of compile-time-known
size based on where they are referenced, rather than which local
block they are declared in.

This patch relies on the fact that an At_End handler and regular
exception handlers CAN coexist in the same handled statement sequence,
which was not true some years ago.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch7.adb (Wrap_HSS_In_Block): Do not create a new block in
the case of function bodies. We include all subprogram bodies,
because it's harmless for procedures. We cannot easily avoid
creating this block in ALL cases, because some transformations
of (e.g.) task bodies end up moving some code such that the
wrong exception handlers apply to that code.
(Build_Finalizer_Call): Remove code for creating a new block.
This was unreachable code, given that Wrap_HSS_In_Block has
already done that, but with the above change to
Wrap_HSS_In_Block, this code becomes reachable, and triggers
essentially the same bug.
* exp_ch7.ads: Adjust comment.

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -341,8 +341,8 @@ package body Exp_Ch7 is
--  Build_Finalizer.
 
procedure Build_Finalizer_Call (N : Node_Id; Fin_Id : Entity_Id);
-   --  N is a construct which contains a handled sequence of statements, Fin_Id
-   --  is the entity of a finalizer. Create an At_End handler which covers the
+   --  N is a construct that contains a handled sequence of statements, Fin_Id
+   --  is the entity of a finalizer. Create an At_End handler that covers the
--  statements of N and calls Fin_Id. If the handled statement sequence has
--  an exception handler, the statements will be wrapped in a block to avoid
--  unwanted interaction with the new At_End handler.
@@ -3722,7 +3722,7 @@ package body Exp_Ch7 is
   --  which belongs to a protected type.
 
   Loc : constant Source_Ptr := Sloc (N);
-  HSS : Node_Id;
+  HSS : Node_Id := Handled_Statement_Sequence (N);
 
begin
   --  Do not perform this expansion in SPARK mode because we do not create
@@ -3732,13 +3732,8 @@ package body Exp_Ch7 is
  return;
   end if;
 
-  --  The At_End handler should have been assimilated by the finalizer
-
-  HSS := Handled_Statement_Sequence (N);
-  pragma Assert (No (At_End_Proc (HSS)));
-
   --  If the construct to be cleaned up is a protected subprogram body, the
-  --  finalizer call needs to be associated with the block which wraps the
+  --  finalizer call needs to be associated with the block that wraps the
   --  unprotected version of the subprogram. The following illustrates this
   --  scenario:
 
@@ -3760,27 +3755,9 @@ package body Exp_Ch7 is
 
   if Is_Prot_Body then
  HSS := Handled_Statement_Sequence (Last (Statements (HSS)));
-
-  --  An At_End handler and regular exception handlers cannot coexist in
-  --  the same statement sequence. Wrap the original statements in a block.
-
-  elsif Present (Exception_Handlers (HSS)) then
- declare
-End_Lab : constant Node_Id := End_Label (HSS);
-Block   : Node_Id;
-
- begin
-Block :=
-  Make_Block_Statement (Loc, Handled_Statement_Sequence => HSS);
-
-Set_Handled_Statement_Sequence (N,
-  Make_Handled_Sequence_Of_Statements (Loc, New_List (Block)));
-
-HSS := Handled_Statement_Sequence (N);
-Set_End_Label (HSS, End_Lab);
- end;
   end if;
 
+  pragma Assert (No (At_End_Proc (HSS)));
   Set_At_End_Proc (HSS, New_Occurrence_Of (Fin_Id, Loc));
 
   --  Attach reference to finalizer to tree, for LLVM use
@@ -5568,10 +5545,10 @@ package body Exp_Ch7 is
procedure Expand_Cleanup_Actions (N : Node_Id) is
   pragma Assert
 (Nkind (N) in N_Block_Statement
-| N_Entry_Body
-| N_Extended_Return_Statement
 | N_Subprogram_Body
-| N_Task_Body);
+| N_Task_Body
+| N_Entry_Body
+| N_Extended_Return_Statement);
 
   Scop : constant Entity_Id := Current_Scope;
 
@@ -5639,18 +5616,14 @@ package body Exp_Ch7 is
   

[Ada] Incorrect unreferenced warnings on null subprograms and formals with aspects

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
This patch corrects an error in the compiler whereby spurious
unreferenced warnings are generated on formal parameters of null generic
subprograms.
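
As an illustrative sketch (not the original reproducer), a unit of the
following shape used to draw the spurious warning on its formal:

--  Illustrative sketch only

generic
   type T is private;
procedure Do_Nothing (Item : T) is null;
--  Before the fix, Item could be flagged as unreferenced even though a
--  null procedure has no body in which it could ever be referenced.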

These changes also fix the parsing of aspects on formal parameters such
that the aspects now get properly set for all formal parameters in a
parameter list.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* par-ch6.adb (P_Formal_Part): Set Aspect_Specifications on all
formals instead of just the last in a formal id list.
* sem_ch6.adb (Analyze_Null_Procedure): Mark expanded null
generic procedures as trivial in order to avoid spurious
unreferenced warnings.

diff --git a/gcc/ada/par-ch6.adb b/gcc/ada/par-ch6.adb
--- a/gcc/ada/par-ch6.adb
+++ b/gcc/ada/par-ch6.adb
@@ -1656,6 +1656,28 @@ package body Ch6 is
 
 P_Aspect_Specifications (Specification_Node, False);
 
+--  Set the aspect specifications for previous Ids
+
+if Has_Aspects (Specification_Node)
+  and then Prev_Ids (Specification_Node)
+then
+   --  Loop through each previous id
+
+   declare
+  Prev_Id : Node_Id := Prev (Specification_Node);
+   begin
+  loop
+ Set_Aspect_Specifications
+   (Prev_Id, Aspect_Specifications (Specification_Node));
+
+ --  Exit when we reach the first parameter in the list
+
+ exit when not Prev_Ids (Prev_Id);
+ Prev_Id := Prev (Prev_Id);
+  end loop;
+   end;
+end if;
+
 if Token = Tok_Right_Paren then
Scan;  -- past right paren
exit Specification_Loop;


diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb
--- a/gcc/ada/sem_ch6.adb
+++ b/gcc/ada/sem_ch6.adb
@@ -2063,6 +2063,11 @@ package body Sem_Ch6 is
  Analyze_Generic_Subprogram_Body (Null_Body, Prev);
  Is_Completion := True;
 
+ --  Mark the newly generated subprogram body as trivial
+
+ Set_Is_Trivial_Subprogram
+   (Defining_Unit_Name (Specification (Null_Body)));
+
  goto Leave;
 
   else




[Ada] Fix illegal Ada in s-dwalin.adb

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
Both the `System.Mmap` and `System.Object_Reader` packages are defining
entities named `Offset` and they are both `use`d at the top of
s-dwalin.adb.

Therefore, the references to `Offset` throughout this file are
ambiguous, and GNAT is supposed to complain. Since it does not for the
moment, we fix the ambiguity by declaring, at the top of the file, a
subtype `Offset` that simply renames `System.Object_Reader.Offset`.
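
A reduced sketch of the kind of ambiguity involved (package names invented
for illustration):

package Pkg_A is
   type Offset is new Integer;
end Pkg_A;

package Pkg_B is
   type Offset is new Long_Integer;
end Pkg_B;

with Pkg_A; use Pkg_A;
with Pkg_B; use Pkg_B;
procedure Demo is
   --  Without help, "Offset" is ambiguous here between Pkg_A.Offset and
   --  Pkg_B.Offset; the local subtype below picks one, which is the same
   --  device used in s-dwalin.adb.
   subtype Offset is Pkg_B.Offset;
   O : Offset;
begin
   null;
end Demo;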

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-dwalin.adb: Add a subtype declaration to fix the
ambiguity.

diff --git a/gcc/ada/libgnat/s-dwalin.adb b/gcc/ada/libgnat/s-dwalin.adb
--- a/gcc/ada/libgnat/s-dwalin.adb
+++ b/gcc/ada/libgnat/s-dwalin.adb
@@ -44,6 +44,8 @@ with System.Storage_Elements;  use System.Storage_Elements;
 
 package body System.Dwarf_Lines is
 
+   subtype Offset is Object_Reader.Offset;
+
function Get_Load_Displacement (C : Dwarf_Context) return Storage_Offset;
--  Return the displacement between the load address present in the binary
--  and the run-time address at which it is loaded (i.e. non-zero for PIE).




[Ada] Incorrect determination of whether an expression is predicate-static

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
The expression given in a Static_Predicate aspect specification is
required to be predicate-static. The implementation of this compile-time
check was incorrect in some cases. There were problems in both
directions: expressions that are not predicate-static were incorrectly
treated as though they were and vice versa. This led both to accepting
invalid code and to rejecting valid code.
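
For illustration (a reduced sketch, not the original test cases), the first
subtype below must be accepted and the second rejected:

package Demo is
   type Day is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);

   subtype Weekend is Day with
     Static_Predicate => Weekend in Sat | Sun;
   --  Predicate-static: a membership test on the current instance with
   --  static choices.

   Limit : Integer := 10;

   subtype Capped is Integer with
     Static_Predicate => Capped < Limit;
   --  Not predicate-static: Limit is a variable, so this must be rejected
   --  (a Dynamic_Predicate would be needed instead).
end Demo;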

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch13.adb (Is_Predicate_Static): Do not generate warnings
about subexpressions of enclosing expressions. Generate warnings
for predicates that are known to be always true or always false,
except in the case where the predicate is expressed as a Boolean
literal. Deal with non-predicate-static expressions that have
been transformed into predicate-static expressions.  Add missing
Is_Type_Ref call to N_Membership_Test case.

diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -180,7 +180,8 @@ package body Sem_Ch13 is
 
function Is_Predicate_Static
  (Expr : Node_Id;
-  Nam  : Name_Id) return Boolean;
+  Nam  : Name_Id;
+  Warn : Boolean := True) return Boolean;
--  Given predicate expression Expr, tests if Expr is predicate-static in
--  the sense of the rules in (RM 3.2.4 (15-24)). Occurrences of the type
--  name in the predicate expression have been replaced by references to
@@ -205,6 +206,11 @@ package body Sem_Ch13 is
--
--  We can't allow this, otherwise we have predicate-static applying to a
--  larger class than static expressions, which was never intended.
+   --
+   --  The Warn parameter is True iff this is not a recursive call. This
+   --  parameter is used to avoid generating warnings for subexpressions and
+   --  for cases where the predicate expression (as originally written by
+   --  the user, before any transformations) is a Boolean literal.
 
procedure New_Put_Image_Subprogram
  (N: Node_Id;
@@ -14000,7 +14006,8 @@ package body Sem_Ch13 is
 
function Is_Predicate_Static
  (Expr : Node_Id;
-  Nam  : Name_Id) return Boolean
+  Nam  : Name_Id;
+  Warn : Boolean := True) return Boolean
is
   function All_Static_Case_Alternatives (L : List_Id) return Boolean;
   --  Given a list of case expression alternatives, returns True if all
@@ -14050,13 +14057,42 @@ package body Sem_Ch13 is
   begin
  return (Nkind (N) = N_Identifier
   and then Chars (N) = Nam
-  and then Paren_Count (N) = 0)
-   or else Nkind (N) = N_Function_Call;
+  and then Paren_Count (N) = 0);
   end Is_Type_Ref;
 
+  --  helper function for recursive calls
+  function Is_Predicate_Static_Aux (Expr : Node_Id) return Boolean is
+(Is_Predicate_Static (Expr, Nam, Warn => False));
+
--  Start of processing for Is_Predicate_Static
 
begin
+  --   Handle cases like
+  -- subtype S is Integer with Static_Predicate =>
+  --   (Some_Integer_Variable in Integer) and then (S /= 0);
+  --   where the predicate (which should be rejected) might have been
+  --   transformed into just "(S /= 0)", which would appear to be
+  --   a predicate-static expression (and therefore legal).
+
+  if Original_Node (Expr) /= Expr then
+
+ --  Emit warnings for predicates that are always True or always False
+ --  and were not originally expressed as Boolean literals.
+
+ return Result : constant Boolean :=
+   Is_Predicate_Static_Aux (Original_Node (Expr))
+ do
+if Result and then Warn and then Is_Entity_Name (Expr) then
+   if Entity (Expr) = Standard_True then
+  Error_Msg_N ("predicate is redundant (always True)?", Expr);
+   elsif Entity (Expr) = Standard_False then
+  Error_Msg_N
+("predicate is unsatisfiable (always False)?", Expr);
+   end if;
+end if;
+ end return;
+  end if;
+
   --  Predicate_Static means one of the following holds. Numbers are the
   --  corresponding paragraph numbers in (RM 3.2.4(16-22)).
 
@@ -14070,6 +14106,7 @@ package body Sem_Ch13 is
   --  for a static membership test.
 
   elsif Nkind (Expr) in N_Membership_Test
+and then Is_Type_Ref (Left_Opnd (Expr))
 and then All_Membership_Choices_Static (Expr)
   then
  return True;
@@ -14115,11 +14152,11 @@ package body Sem_Ch13 is
   --  operand is predicate-static.
 
   elsif (Nkind (Expr) in N_Op_And | N_Op_Or | N_Op_Xor
-  and then Is_Predicate_Static (Left_Opnd (Expr), Nam)
-  and then Is_Predicate_Static (Right_Opnd (Expr), Nam))
+  and then Is_Predicate_Static_Aux (Left_Opnd (Expr))
+  and then Is_Predicate_Static_Aux (Right_Opnd (Expr)))
   

[Ada] Fix expansion of aggregate for discriminated limited extension

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
The presence of the discriminants prevents the values associated with the
components of the parent type from being put into the sub-aggregate built
for the _Parent component.
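
A reduced sketch of the kind of aggregate involved (names invented for
illustration):

package Demo is
   type Parent is tagged limited record
      A : Integer;
   end record;

   type Child (D : Positive) is new Parent with record
      B : String (1 .. D);
   end record;

   Obj : Child := (D => 3, A => 1, B => "abc");
   --  The discriminant D comes first in the component list; expansion must
   --  still move A into the sub-aggregate built for the inherited _Parent
   --  component.
end Demo;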

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_aggr.adb (Expand_Record_Aggregate.Build_Back_End_Aggregate):
Skip the discriminants at the start of the component list before
looking for the components inherited from the parent in the case
of a tagged extension.

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -8474,11 +8474,21 @@ package body Exp_Aggr is
   Parent_Name  : Node_Id;
 
begin
-  --  Remove the inherited component association from the
-  --  aggregate and store them in the parent aggregate
-
   First_Comp   := First (Component_Associations (N));
   Parent_Comps := New_List;
+
+  --  First skip the discriminants
+
+  while Present (First_Comp)
+and then Ekind (Entity (First (Choices (First_Comp
+   = E_Discriminant
+  loop
+ Next (First_Comp);
+  end loop;
+
+  --  Then remove the inherited component association from the
+  --  aggregate and store them in the parent aggregate
+
   while Present (First_Comp)
 and then
   Scope (Original_Record_Component




[Ada] Do not freeze specifically for dispatch tables

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
The left-overs of the old freezing code in Make_DT are now redundant since
the semantic analyzer performs the same task for the 'Access attribute.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_disp.adb (Make_DT): Remove remaining freezing code.

diff --git a/gcc/ada/exp_disp.adb b/gcc/ada/exp_disp.adb
--- a/gcc/ada/exp_disp.adb
+++ b/gcc/ada/exp_disp.adb
@@ -4469,35 +4469,6 @@ package body Exp_Disp is
  Parent_Typ := Full_View (Parent_Typ);
   end if;
 
-  --  Ensure that all the primitives are frozen. This is only required when
-  --  building static dispatch tables: the primitives must be frozen to be
-  --  referenced, otherwise we have problems with the back end. But this is
-  --  not a requirement with nonstatic dispatch tables because in this case
-  --  we generate an empty dispatch table at this point and the extra code
-  --  required to register the primitives in their slot will be generated
-  --  later, when each primitive is frozen (see Freeze_Subprogram).
-
-  if Building_Static_DT (Typ) then
- declare
-F_List: List_Id;
-Prim  : Entity_Id;
-Prim_Elmt : Elmt_Id;
-
- begin
-Prim_Elmt := First_Elmt (Primitive_Operations (Typ));
-while Present (Prim_Elmt) loop
-   Prim   := Node (Prim_Elmt);
-   F_List := Freeze_Entity (Prim, Typ, Do_Freeze_Profile => False);
-
-   if Present (F_List) then
-  Append_List_To (Result, F_List);
-   end if;
-
-   Next_Elmt (Prim_Elmt);
-end loop;
- end;
-  end if;
-
   if not Is_Interface (Typ) and then Has_Interfaces (Typ) then
  declare
 Cannot_Have_Null_Disc : Boolean := False;




[Ada] Do not analyze expression functions for dispatch tables

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
There is no need to analyze expression functions that are primitives of a
tagged type for its dispatch table because they will be analyzed at the end
of the current scope.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_attr.adb (Resolve_Attribute) : Don't analyze
the body of an expression function in the case of a dispatch table.

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -12144,11 +12144,15 @@ package body Sem_Attr is
  elsif Scope (Subp_Id) /= Current_Scope then
 null;
 
+ --  Dispatch tables are not a freeze point either
+
+ elsif Nkind (Parent (N)) = N_Unchecked_Type_Conversion
+   and then Is_Dispatch_Table_Entity (Etype (Parent (N)))
+ then
+null;
+
   --  Analyze the body of the expression function to freeze
-  --  the expression. This takes care of the case where the
-  --  'Access is part of dispatch table initialization and
-  --  the generated body of the expression function has not
-  --  been analyzed yet.
+  --  the expression.
 
  else
 Analyze (Subp_Body);




[Ada] Introduce Opt.CCG_Mode

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
To handle code common to the old and the new CCG code generator.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* gnat1drv.adb, opt.ads, sem_ch7.adb: Introduce CCG_Mode.

diff --git a/gcc/ada/gnat1drv.adb b/gcc/ada/gnat1drv.adb
--- a/gcc/ada/gnat1drv.adb
+++ b/gcc/ada/gnat1drv.adb
@@ -180,6 +180,7 @@ procedure Gnat1drv is
   --  Set all flags required when generating C code
 
   if Generate_C_Code then
+ CCG_Mode := True;
  Modify_Tree_For_C := True;
  Transform_Function_Array := True;
  Unnest_Subprogram_Mode := True;


diff --git a/gcc/ada/opt.ads b/gcc/ada/opt.ads
--- a/gcc/ada/opt.ads
+++ b/gcc/ada/opt.ads
@@ -262,6 +262,9 @@ package Opt is
--  Set to True to build, bind and link all the sources of a project file
--  (switch -B)
 
+   CCG_Mode : Boolean := False;
+   --  Set to True when running as CCG (either via -gnatceg or via -emit-c)
+
Check_Aliasing_Of_Parameters : Boolean := False;
--  GNAT
--  Set to True to detect whether subprogram parameters and function results


diff --git a/gcc/ada/sem_ch7.adb b/gcc/ada/sem_ch7.adb
--- a/gcc/ada/sem_ch7.adb
+++ b/gcc/ada/sem_ch7.adb
@@ -409,7 +409,7 @@ package body Sem_Ch7 is
  --  should occur, so we need to catch all cases where the
  --  subprogram may be inlined by the client.
 
- if not Generate_C_Code
+ if not CCG_Mode
and then (Is_Inlined (Decl_Id)
   or else Has_Pragma_Inline (Decl_Id))
  then
@@ -431,7 +431,7 @@ package body Sem_Ch7 is
  --  unless we generate C code since inlining is then
  --  handled by the C compiler.
 
- if not Generate_C_Code
+ if not CCG_Mode
and then (Is_Inlined (Decl_Id)
   or else Has_Pragma_Inline (Decl_Id))
  then




[Ada] Update proofs of double arithmetic unit after prover changes

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
Changes in GNATprove (translation to Why3 and provers) result in proofs
being much less automatic, and requiring ghost code to detail
intermediate steps. In some cases, it is better to use the new By
mechanism to prove assertions in steps, as this avoids polluting the
proof context for other proofs. For that reason, add units s-spark.ads
and s-spcuop.ads/b to the runtime sources.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* Makefile.rtl: Add new units.
* libgnat/s-aridou.adb (Scaled_Divide): Add ghost code for provers.
* libgnat/s-spcuop.adb: New unit for ghost cut operations.
* libgnat/s-spcuop.ads: New unit for ghost cut operations.
* libgnat/s-spark.ads: New unit.

patch.diff.gz
Description: application/gzip


[Ada] Tweaks to hardening docs

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
Mention when security hardening features are available in other
languages.

Expand the strub documentation a little, for clarity and completeness.

Add missing 'aliased' and attribute to variables in strub example.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* doc/gnat_rm/security_hardening_features.rst: Mention
availability in other languages when applicable.
(Stack Scrubbing): Associate the attribute with types, expand
some comments, fix the example involving access to variables.
* gnat_rm.texi: Regenerate.

diff --git a/gcc/ada/doc/gnat_rm/security_hardening_features.rst b/gcc/ada/doc/gnat_rm/security_hardening_features.rst
--- a/gcc/ada/doc/gnat_rm/security_hardening_features.rst
+++ b/gcc/ada/doc/gnat_rm/security_hardening_features.rst
@@ -31,9 +31,10 @@ option, to affect all subprograms in a compilation, and with a
  --  Before returning, Bar scrubs all call-clobbered registers.
 
 
-For usage and more details on the command-line option, and on the
-``zero_call_used_regs`` attribute, see :title:`Using the GNU Compiler
-Collection (GCC)`.
+For usage and more details on the command-line option, on the
+``zero_call_used_regs`` attribute, and on their use with other
+programming languages, see :title:`Using the GNU Compiler Collection
+(GCC)`.
 
 
 .. Stack Scrubbing:
@@ -44,14 +45,17 @@ Stack Scrubbing
 GNAT can generate code to zero-out stack frames used by subprograms.
 
 It can be activated with the :samp:`Machine_Attribute` pragma, on
-specific subprograms and variables.
+specific subprograms and variables, or their types.  (This attribute
+always applies to a type, even when it is associated with a subprogram
+or a variable.)
 
 .. code-block:: ada
 
  function Foo returns Integer;
  pragma Machine_Attribute (Foo, "strub");
  --  Foo and its callers are modified so as to scrub the stack
- --  space used by Foo after it returns.
+ --  space used by Foo after it returns.  Shorthand for:
+ --  pragma Machine_Attribute (Foo, "strub", "at-calls");
 
  procedure Bar;
  pragma Machine_Attribute (Bar, "strub", "internal");
@@ -61,13 +65,16 @@ specific subprograms and variables.
  Var : Integer;
  pragma Machine_Attribute (Var, "strub");
  --  Reading from Var in a subprogram enables stack scrubbing
- --  of the stack space used by the subprogram.
+ --  of the stack space used by the subprogram.  Furthermore, if
+ --  Var is declared within a subprogram, this also enables
+ --  scrubbing of the stack space used by that subprogram.
 
 
 There are also :switch:`-fstrub` command-line options to control
 default settings.  For usage and more details on the command-line
-option, and on the ``strub`` attribute, see :title:`Using the GNU
-Compiler Collection (GCC)`.
+option, on the ``strub`` attribute, and their use with other
+programming languages, see :title:`Using the GNU Compiler Collection
+(GCC)`.
 
 Note that Ada secondary stacks are not scrubbed.  The restriction
 ``No_Secondary_Stack`` avoids their use, and thus their accidental
@@ -81,16 +88,19 @@ access type designates a non-strub type.
 
 .. code-block:: ada
 
- VI : Integer;
+ VI : aliased Integer;
+ pragma Machine_Attribute (VI, "strub");
  XsVI : access Integer := VI'Access; -- Error.
  UXsVI : access Integer := VI'Unchecked_Access; -- OK,
- -- UXsVI.all does not enable strub in the enclosing subprogram.
+ --  UXsVI does *not* enable strub in subprograms that
+ --  dereference it to obtain the UXsVI.all value.
 
  type Strub_Int is new Integer;
  pragma Machine_Attribute (Strub_Int, "strub");
- VSI : Strub_Int;
- XsVSI : access Strub_Int := VSI'Access; -- OK.
- -- XsVSI.all enables strub in the enclosing subprogram.
+ VSI : aliased Strub_Int;
+ XsVSI : access Strub_Int := VSI'Access; -- OK,
+ --  VSI and XsVSI.all both enable strub in subprograms that
+ --  read their values.
 
 
 Every access-to-subprogram type, renaming, and overriding and
@@ -108,6 +118,9 @@ turned ``callable`` through such an explicit conversion:
 
  type TBar_Callable is access procedure;
  pragma Machine_Attribute (TBar_Callable, "strub", "callable");
+ --  The attribute modifies the procedure type, rather than the
+ --  access type, because of the extra argument after "strub",
+ --  only applicable to subprogram types.
 
  Bar_Callable_Ptr : constant TBar_Callable
 		:= TBar_Callable (TBar'(Bar'Access));
@@ -115,6 +128,7 @@ turned ``callable`` through such an explicit conversion:
  procedure Bar_Callable renames Bar_Callable_Ptr.all;
  pragma Machine_Attribute (Bar_Callable, "strub", "callable");
 
+
 Note that the renaming declaration is expanded to a full subprogram
 body, it won't be just an alias.  Only if it is inlined will it be as
 efficient as a call by dereferencing the access-to-subprogram constant
@@ -162,6 +176,10 @@ respectively.
 They are 

[Ada] Fix typo in comment for functional sets

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
Minor fix in a recently added comment.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/a-cofuse.ads (Empty_Set): Fix typo in comment.

diff --git a/gcc/ada/libgnat/a-cofuse.ads b/gcc/ada/libgnat/a-cofuse.ads
--- a/gcc/ada/libgnat/a-cofuse.ads
+++ b/gcc/ada/libgnat/a-cofuse.ads
@@ -216,7 +216,7 @@ package Ada.Containers.Functional_Sets with SPARK_Mode is
  and Included_Except (Add'Result, Container, Item);
 
function Empty_Set return Set with
-   --  Return an new empty set
+   --  Return a new empty set
 
  Global => null,
  Post   => Is_Empty (Empty_Set'Result);




[Ada] Do not freeze profiles for dispatch tables

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
When static dispatch tables are built for library-level tagged types, the
primitives (the subprograms themselves) are frozen; that's necessary because
their address is taken.  However, their profile, i.e. all the types present
therein, is also frozen, which is not necessary after AI05-019 and is also
inconsistent with the handling of attribute references.
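
As an illustrative sketch (names invented; acceptance of a unit like this
is exactly what the freezing rules at stake govern):

package Demo is
   type Priv is private;

   type T is tagged null record;
   procedure Op (X : T; Y : Priv);  --  untagged type Priv in the profile

   Obj : T;
   --  Freezing T here also freezes its primitive Op, whose address goes
   --  into the static dispatch table; the point of the change below is
   --  that this no longer drags the untagged profile type Priv, whose
   --  full view only appears later, into the freeze.
private
   type Priv is record
      V : Integer;
   end record;
end Demo;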

The change also removes a couple of pragma Inline on subprograms that are
too large for inlining to bring any benefit.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch3.adb (Expand_N_Object_Declaration): Adjust call to Make_DT.
* exp_disp.ads (Building_Static_DT): Remove pragma Inline.
(Building_Static_Secondary_DT): Likewise.
(Convert_Tag_To_Interface): Likewise.
(Make_DT): Remove second parameter.
* exp_disp.adb (Make_DT): Likewise.
(Check_Premature_Freezing): Delete.
Pass Do_Freeze_Profile as False in call to Freeze_Entity.
* freeze.ads (Freezing_Library_Level_Tagged_Type): Delete.
* freeze.adb (Freeze_Profile): Remove obsolete code.
(Freeze_Entity): Tweak comment.

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -6909,9 +6909,9 @@ package body Exp_Ch3 is
 
  begin
 if Is_Concurrent_Type (Base_Typ) then
-   New_Nodes := Make_DT (Corresponding_Record_Type (Base_Typ), N);
+   New_Nodes := Make_DT (Corresponding_Record_Type (Base_Typ));
 else
-   New_Nodes := Make_DT (Base_Typ, N);
+   New_Nodes := Make_DT (Base_Typ);
 end if;
 
 Insert_List_Before (N, New_Nodes);


diff --git a/gcc/ada/exp_disp.adb b/gcc/ada/exp_disp.adb
--- a/gcc/ada/exp_disp.adb
+++ b/gcc/ada/exp_disp.adb
@@ -3660,7 +3660,7 @@ package body Exp_Disp is
--  replaced by gotos which jump to the end of the routine and restore the
--  Ghost mode.
 
-   function Make_DT (Typ : Entity_Id; N : Node_Id := Empty) return List_Id is
+   function Make_DT (Typ : Entity_Id) return List_Id is
   Loc : constant Source_Ptr := Sloc (Typ);
 
   Max_Predef_Prims : constant Int :=
@@ -3678,23 +3678,6 @@ package body Exp_Disp is
   --  offset to the components that reference secondary dispatch tables.
   --  Used to compute the offset of components located at fixed position.
 
-  procedure Check_Premature_Freezing
-(Subp: Entity_Id;
- Tagged_Type : Entity_Id;
- Typ : Entity_Id);
-  --  Verify that all untagged types in the profile of a subprogram are
-  --  frozen at the point the subprogram is frozen. This enforces the rule
-  --  on RM 13.14 (14) as modified by AI05-019. At the point a subprogram
-  --  is frozen, enough must be known about it to build the activation
-  --  record for it, which requires at least that the size of all
-  --  parameters be known. Controlling arguments are by-reference,
-  --  and therefore the rule only applies to untagged types. Typical
-  --  violation of the rule involves an object declaration that freezes a
-  --  tagged type, when one of its primitive operations has a type in its
-  --  profile whose full view has not been analyzed yet. More complex cases
-  --  involve composite types that have one private unfrozen subcomponent.
-  --  Move this check to sem???
-
   procedure Export_DT (Typ : Entity_Id; DT : Entity_Id; Index : Nat := 0);
   --  Export the dispatch table DT of tagged type Typ. Required to generate
   --  forward references and statically allocate the table. For primary
@@ -3733,103 +3716,6 @@ package body Exp_Disp is
   function Number_Of_Predefined_Prims (Typ : Entity_Id) return Nat;
   --  Returns the number of predefined primitives of Typ
 
-  --
-  -- Check_Premature_Freezing --
-  --
-
-  procedure Check_Premature_Freezing
-(Subp: Entity_Id;
- Tagged_Type : Entity_Id;
- Typ : Entity_Id)
-  is
- Comp : Entity_Id;
-
- function Is_Actual_For_Formal_Incomplete_Type
-   (T : Entity_Id) return Boolean;
- --  In Ada 2012, if a nested generic has an incomplete formal type,
- --  the actual may be (and usually is) a private type whose completion
- --  appears later. It is safe to build the dispatch table in this
- --  case, gigi will have full views available.
-
- --
- -- Is_Actual_For_Formal_Incomplete_Type --
- --
-
- function Is_Actual_For_Formal_Incomplete_Type
-   (T : Entity_Id) return Boolean
- is
-Gen_Par : Entity_Id;
-F   : Node_Id;
-
- begin
-if not Is_Generic_Instance (Current_Scope)
-  or 

[Ada] Restore hiding of predefined "=" operator through class-wide type

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
The previous change introduced a backward incompatibility in the handling
of a user-defined "=" operator for a class-wide type of a tagged type:
such an operator used to hide the predefined "=" operator of the tagged
type when the type is private, but after that change it no longer does,
while it still does in the nonprivate case.
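
A reduced sketch of the construct relied upon (names invented for
illustration):

package Pkg is
   type T is tagged private;

   function "=" (L, R : T'Class) return Boolean;
   --  With the hiding behaviour restored here, this declaration hides the
   --  predefined "=" of T, so an equality test between two objects of
   --  type T resolves to this class-wide operator.
private
   type T is tagged record
      V : Integer;
   end record;
end Pkg;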

This hiding is a non-portability, but is fundamentally what the compiler
implements, instead of the RM rule which requires homographs.  The reason
lies in the implementation of the "=" operator in GNAT: internally, there
is not a "=" predefined operator for every nonlimited type, but instead
there is a single, universal "=" predefined operator for all the nonlimited
types.  The consequence is that the hiding rule implemented in GNAT for "="
is effectively that a user-declared symmetrical "=" operator returning
boolean hides the predefined "=" operator of any type that is covered by
the user-declared operator.

Whether it is desirable to implement the exact RM rule in GNAT is to be
discussed, but existing code relies on the non-portability and would thus
need to be changed.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch6.adb (New_Overloaded_Entity): Deal specifically with the
overriding of the "=" operator for tagged types.

diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb
--- a/gcc/ada/sem_ch6.adb
+++ b/gcc/ada/sem_ch6.adb
@@ -12303,7 +12303,34 @@ package body Sem_Ch6 is
--  Work done in Override_Dispatching_Operation, so
--  nothing else needs to be done here.
 
-   null;
+   --  ??? Special case to keep supporting the hiding
+   --  of the predefined "=" operator for a nonlimited
+   --  tagged type by a user-defined "=" operator for
+   --  its class-wide type when the type is private.
+
+   if Chars (E) = Name_Op_Eq then
+  declare
+ Typ : constant Entity_Id
+ := Etype (First_Entity (E));
+ H   : Entity_Id := Homonym (E);
+
+  begin
+ while Present (H)
+   and then Scope (H) = Scope (E)
+ loop
+if Is_User_Defined_Equality (H)
+   and then Is_Immediately_Visible (H)
+   and then Etype (First_Entity (H))
+  = Class_Wide_Type (Typ)
+then
+   Remove_Entity_And_Homonym (E);
+   exit;
+end if;
+
+H := Homonym (H);
+ end loop;
+  end;
+   end if;
 end if;
 
  else




[Ada] Adapt proof of runtime unit s-arit32

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
After changes in GNATprove, adapt proof. Simply move an assertion up
before it is first needed here.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-arit32.adb (Scaled_Divide32): Move assertion up.

diff --git a/gcc/ada/libgnat/s-arit32.adb b/gcc/ada/libgnat/s-arit32.adb
--- a/gcc/ada/libgnat/s-arit32.adb
+++ b/gcc/ada/libgnat/s-arit32.adb
@@ -474,6 +474,7 @@ is
 
   D := Uns64 (Xu) * Uns64 (Yu);
 
+  Lemma_Abs_Mult_Commutation (Big (X), Big (Y));
   pragma Assert (Mult = Big (D));
   Lemma_Hi_Lo (D, Hi (D), Lo (D));
   pragma Assert (Mult = Big_2xx32 * Big (Hi (D)) + Big (Lo (D)));
@@ -508,7 +509,6 @@ is
 
   Lemma_Abs_Div_Commutation (Big (X) * Big (Y), Big (Z));
   Lemma_Abs_Rem_Commutation (Big (X) * Big (Y), Big (Z));
-  Lemma_Abs_Mult_Commutation (Big (X), Big (Y));
   Lemma_Abs_Commutation (X);
   Lemma_Abs_Commutation (Y);
   Lemma_Abs_Commutation (Z);




[Ada] PR ada/105303 Fix use of Assertion_Policy in internal generics unit

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
The internal unit System.Generic_Array_Operations defines only generic
subprograms. Thus, pragma Assertion_Policy inside the spec has no
effect, as each instantiation is only subject to the assertion policy at
the program point of the instantiation. Remove this confusing pragma,
and add the pragma inside each generic body making use of additional
assertions or ghost code, so that running time of instantiations is not
impacted by assertions meant for formal verification.
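
The pattern now used looks like this (a reduced sketch with invented
names; the real occurrences are in s-gearop.adb below):

generic
   type T is range <>;
function Double (X : T) return T;

function Double (X : T) return T is
   --  Placing the pragma inside the generic body means every instance
   --  carries it, so the assertion below stays disabled regardless of
   --  the policy in force at the instantiation point.
   pragma Assertion_Policy (Assert => Ignore);
begin
   pragma Assert (X <= T'Last / 2);
   return X * 2;
end Double;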

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

PR ada/105303
* libgnat/s-gearop.adb: Add pragma Assertion_Policy in generic
bodies making use of additional assertions or ghost code.
* libgnat/s-gearop.ads: Remove confusing Assertion_Policy.

diff --git a/gcc/ada/libgnat/s-gearop.adb b/gcc/ada/libgnat/s-gearop.adb
--- a/gcc/ada/libgnat/s-gearop.adb
+++ b/gcc/ada/libgnat/s-gearop.adb
@@ -32,7 +32,8 @@
 --  Preconditions, postconditions, ghost code, loop invariants and assertions
 --  in this unit are meant for analysis only, not for run-time checking, as it
 --  would be too costly otherwise. This is enforced by setting the assertion
---  policy to Ignore.
+--  policy to Ignore, here for non-generic code, and inside the generic for
+--  generic code.
 
 pragma Assertion_Policy (Pre=> Ignore,
  Post   => Ignore,
@@ -72,6 +73,12 @@ is
--
 
function Diagonal (A : Matrix) return Vector is
+  pragma Assertion_Policy (Pre=> Ignore,
+   Post   => Ignore,
+   Ghost  => Ignore,
+   Loop_Invariant => Ignore,
+   Assert => Ignore);
+
   N : constant Natural := Natural'Min (A'Length (1), A'Length (2));
begin
   return R : Vector (A'First (1) .. A'First (1) + (N - 1))
@@ -126,6 +133,11 @@ is
-
 
procedure Back_Substitute (M, N : in out Matrix) is
+  pragma Assertion_Policy (Pre=> Ignore,
+   Post   => Ignore,
+   Ghost  => Ignore,
+   Loop_Invariant => Ignore,
+   Assert => Ignore);
   pragma Assert (M'First (1) = N'First (1)
and then
  M'Last  (1) = N'Last (1));
@@ -215,6 +227,11 @@ is
   N   : in out Matrix;
   Det : out Scalar)
is
+  pragma Assertion_Policy (Pre=> Ignore,
+   Post   => Ignore,
+   Ghost  => Ignore,
+   Loop_Invariant => Ignore,
+   Assert => Ignore);
   pragma Assert (M'First (1) = N'First (1)
and then
  M'Last  (1) = N'Last (1));
@@ -460,6 +477,11 @@ is
-
 
function L2_Norm (X : X_Vector) return Result_Real'Base is
+  pragma Assertion_Policy (Pre=> Ignore,
+   Post   => Ignore,
+   Ghost  => Ignore,
+   Loop_Invariant => Ignore,
+   Assert => Ignore);
   Sum : Result_Real'Base := 0.0;
 
begin
@@ -479,6 +501,11 @@ is
--
 
function Matrix_Elementwise_Operation (X : X_Matrix) return Result_Matrix is
+  pragma Assertion_Policy (Pre=> Ignore,
+   Post   => Ignore,
+   Ghost  => Ignore,
+   Loop_Invariant => Ignore,
+   Assert => Ignore);
begin
   return R : Result_Matrix (X'Range (1), X'Range (2))
 with Relaxed_Initialization
@@ -524,6 +551,11 @@ is
  (Left  : Left_Matrix;
   Right : Right_Matrix) return Result_Matrix
is
+  pragma Assertion_Policy (Pre=> Ignore,
+   Post   => Ignore,
+   Ghost  => Ignore,
+   Loop_Invariant => Ignore,
+   Assert => Ignore);
begin
   return R : Result_Matrix (Left'Range (1), Left'Range (2))
 with Relaxed_Initialization
@@ -570,6 +602,11 @@ is
   Y : Y_Matrix;
   Z : Z_Scalar) return Result_Matrix
is
+  pragma Assertion_Policy (Pre=> Ignore,
+   Post   => Ignore,
+   Ghost  => Ignore,
+   Loop_Invariant => Ignore,
+   Assert => Ignore);
begin
   return R : Result_Matrix (X'Range (1), X'Range (2))
 with Relaxed_Initialization
@@ -657,6 +694,11 @@ is
  (Left  : 

[Ada] Delete no-longer-used Convert_To_Return_False flag

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
As a result of other recent changes, the Convert_To_Return_False flag
is never set. The flag can be therefore be deleted.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch11.adb (Expand_N_Raise_Expression): Remove
Convert_To_Return_False test.
* gen_il-fields.ads: Remove Convert_To_Return_False field.
* gen_il-gen-gen_nodes.adb: Remove use of
Convert_To_Return_False field.
* sinfo.ads: Remove comment describing Convert_To_Return_False
flag.

diff --git a/gcc/ada/exp_ch11.adb b/gcc/ada/exp_ch11.adb
--- a/gcc/ada/exp_ch11.adb
+++ b/gcc/ada/exp_ch11.adb
@@ -1350,37 +1350,19 @@ package body Exp_Ch11 is
   -- in
   --   raise Constraint_Error;
 
-  --  unless the flag Convert_To_Return_False is set, in which case
-  --  the transformation is to:
-
-  -- do
-  --   return False;
-  -- in
-  --   raise Constraint_Error;
-
   --  The raise constraint error can never be executed. It is just a dummy
   --  node that can be labeled with an arbitrary type.
 
   RCE := Make_Raise_Constraint_Error (Loc, Reason => CE_Explicit_Raise);
   Set_Etype (RCE, Typ);
 
-  if Convert_To_Return_False (N) then
- Rewrite (N,
-   Make_Expression_With_Actions (Loc,
- Actions => New_List (
-   Make_Simple_Return_Statement (Loc,
- Expression => New_Occurrence_Of (Standard_False, Loc))),
-   Expression => RCE));
-
-  else
- Rewrite (N,
-   Make_Expression_With_Actions (Loc,
- Actions => New_List (
-   Make_Raise_Statement (Loc,
- Name   => Name (N),
- Expression => Expression (N))),
-   Expression => RCE));
-  end if;
+  Rewrite (N,
+Make_Expression_With_Actions (Loc,
+  Actions => New_List (
+Make_Raise_Statement (Loc,
+  Name   => Name (N),
+  Expression => Expression (N))),
+Expression => RCE));
 
   Analyze_And_Resolve (N, Typ);
end Expand_N_Raise_Expression;


diff --git a/gcc/ada/gen_il-fields.ads b/gcc/ada/gen_il-fields.ads
--- a/gcc/ada/gen_il-fields.ads
+++ b/gcc/ada/gen_il-fields.ads
@@ -118,7 +118,6 @@ package Gen_IL.Fields is
   Contract_Test_Cases,
   Controlling_Argument,
   Conversion_OK,
-  Convert_To_Return_False,
   Corresponding_Aspect,
   Corresponding_Body,
   Corresponding_Entry_Body,


diff --git a/gcc/ada/gen_il-gen-gen_nodes.adb b/gcc/ada/gen_il-gen-gen_nodes.adb
--- a/gcc/ada/gen_il-gen-gen_nodes.adb
+++ b/gcc/ada/gen_il-gen-gen_nodes.adb
@@ -523,8 +523,7 @@ begin -- Gen_IL.Gen.Gen_Nodes
 
Cc (N_Raise_Expression, N_Subexpr,
(Sy (Name, Node_Id, Default_Empty),
-Sy (Expression, Node_Id, Default_Empty),
-Sm (Convert_To_Return_False, Flag)));
+Sy (Expression, Node_Id, Default_Empty)));
 
Cc (N_Range, N_Subexpr,
(Sy (Low_Bound, Node_Id),


diff --git a/gcc/ada/sinfo.ads b/gcc/ada/sinfo.ads
--- a/gcc/ada/sinfo.ads
+++ b/gcc/ada/sinfo.ads
@@ -1005,12 +1005,6 @@ package Sinfo is
--direct conversion of the underlying integer result, with no regard to
--the small operand.
 
-   --  Convert_To_Return_False
-   --Present in N_Raise_Expression nodes that appear in the body of the
-   --special predicateM function used to test a predicate in the context
-   --of a membership test, where raise expression results in returning a
-   --value of False rather than raising an exception.
-
--  Corresponding_Aspect
--Present in N_Pragma node. Used to point back to the source aspect from
--the corresponding pragma. This field is Empty for source pragmas.
@@ -6932,7 +6926,6 @@ package Sinfo is
   --  Sloc points to RAISE
   --  Name (always present)
   --  Expression (set to Empty if no expression present)
-  --  Convert_To_Return_False
   --  plus fields for expression
 
   ---




[Ada] Add empty constructors to the functional containers

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
This patch adds empty constructors to the functional containers so that
we can use them in expression functions.
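
A usage sketch (names invented; the instantiation assumes the generic's
default parameters):

with Ada.Containers.Functional_Sets;

package Demo with SPARK_Mode is
   package Int_Sets is new Ada.Containers.Functional_Sets
     (Element_Type => Integer);
   use Int_Sets;

   --  Empty_Set can now be called directly inside an expression function
   function Singleton (X : Integer) return Set is (Add (Empty_Set, X));
end Demo;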

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/a-cofuma.ads, libgnat/a-cofuma.adb,
libgnat/a-cofuse.ads, libgnat/a-cofuse.adb,
libgnat/a-cofuve.ads, libgnat/a-cofuve.adb: Add empty
constructors.

diff --git a/gcc/ada/libgnat/a-cofuma.adb b/gcc/ada/libgnat/a-cofuma.adb
--- a/gcc/ada/libgnat/a-cofuma.adb
+++ b/gcc/ada/libgnat/a-cofuma.adb
@@ -130,6 +130,13 @@ package body Ada.Containers.Functional_Maps with SPARK_Mode => Off is
   return True;
end Elements_Equal_Except;
 
+   ---
+   -- Empty_Map --
+   ---
+
+   function Empty_Map return Map is
+  ((others =>  <>));
+
-
-- Get --
-


diff --git a/gcc/ada/libgnat/a-cofuma.ads b/gcc/ada/libgnat/a-cofuma.ads
--- a/gcc/ada/libgnat/a-cofuma.ads
+++ b/gcc/ada/libgnat/a-cofuma.ads
@@ -243,6 +243,14 @@ package Ada.Containers.Functional_Maps with SPARK_Mode is
  and Container <= Add'Result
  and Keys_Included_Except (Add'Result, Container, New_Key);
 
+   function Empty_Map return Map with
+   --  Return an empty Map
+
+ Global => null,
+ Post   =>
+   Length (Empty_Map'Result) = 0
+ and Is_Empty (Empty_Map'Result);
+
function Remove
  (Container : Map;
   Key   : Key_Type) return Map


diff --git a/gcc/ada/libgnat/a-cofuse.adb b/gcc/ada/libgnat/a-cofuse.adb
--- a/gcc/ada/libgnat/a-cofuse.adb
+++ b/gcc/ada/libgnat/a-cofuse.adb
@@ -63,6 +63,13 @@ package body Ada.Containers.Functional_Sets with SPARK_Mode => Off is
function Contains (Container : Set; Item : Element_Type) return Boolean is
  (Find (Container.Content, Item) > 0);
 
+   ---
+   -- Empty_Set --
+   ---
+
+   function Empty_Set return Set is
+  ((others => <>));
+
-
-- Included_Except --
-


diff --git a/gcc/ada/libgnat/a-cofuse.ads b/gcc/ada/libgnat/a-cofuse.ads
--- a/gcc/ada/libgnat/a-cofuse.ads
+++ b/gcc/ada/libgnat/a-cofuse.ads
@@ -215,6 +215,12 @@ package Ada.Containers.Functional_Sets with SPARK_Mode is
  and Container <= Add'Result
  and Included_Except (Add'Result, Container, Item);
 
+   function Empty_Set return Set with
+   --  Return an new empty set
+
+ Global => null,
+ Post   => Is_Empty (Empty_Set'Result);
+
function Remove (Container : Set; Item : Element_Type) return Set with
--  Return a new set containing all the elements of Container except E
 


diff --git a/gcc/ada/libgnat/a-cofuve.adb b/gcc/ada/libgnat/a-cofuve.adb
--- a/gcc/ada/libgnat/a-cofuve.adb
+++ b/gcc/ada/libgnat/a-cofuve.adb
@@ -118,6 +118,13 @@ package body Ada.Containers.Functional_Vectors with SPARK_Mode => Off is
   return False;
end Contains;
 
+   
+   -- Empty_Sequence --
+   
+
+   function Empty_Sequence return Sequence is
+  ((others => <>));
+
--
-- Equal_Except --
--


diff --git a/gcc/ada/libgnat/a-cofuve.ads b/gcc/ada/libgnat/a-cofuve.ads
--- a/gcc/ada/libgnat/a-cofuve.ads
+++ b/gcc/ada/libgnat/a-cofuve.ads
@@ -343,6 +343,12 @@ package Ada.Containers.Functional_Vectors with SPARK_Mode is
--  i.e. it does not contain pointers that could be used to alias mutable
--  data).
 
+   function Empty_Sequence return Sequence with
+   --  Return an empty Sequence
+
+ Global => null,
+ Post   => Length (Empty_Sequence'Result) = 0;
+
---
--  Iteration Primitives --
---




[Ada] Fix new CUDA kernel registration scheme

2022-05-30 Thread Pierre-Marie de Rodat via Gcc-patches
Removal of the previous kernel registration scheme unearthed mistakes in
the new one, which were:
- The new kernel registration code relied on the binder expansion phase,
  which didn't happen because the registration code was already
  generated by the binder.
- The kernel handle passed to CUDA_Register_Function was the first eight
  bytes of the code of the host-side procedure representing the kernel
  rather than its address.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* bindgen.adb (Gen_CUDA_Init): Remove code generating CUDA
definitions.
(Gen_CUDA_Defs): New function, generating definitions
initialized by Gen_CUDA_Init.
(Gen_Output_File_Ada): Call Gen_CUDA_Defs instead of
Gen_CUDA_Init.
(Gen_Adainit): Call Gen_CUDA_Init.

diff --git a/gcc/ada/bindgen.adb b/gcc/ada/bindgen.adb
--- a/gcc/ada/bindgen.adb
+++ b/gcc/ada/bindgen.adb
@@ -311,8 +311,11 @@ package body Bindgen is
procedure Gen_CodePeer_Wrapper;
--  For CodePeer, generate wrapper which calls user-defined main subprogram
 
+   procedure Gen_CUDA_Defs;
+   --  Generate definitions needed in order to register kernels
+
procedure Gen_CUDA_Init;
-   --  When CUDA registration code is needed.
+   --  Generate calls needed in order to register kernels
 
procedure Gen_Elab_Calls (Elab_Order : Unit_Id_Array);
--  Generate sequence of elaboration calls
@@ -1115,6 +1118,8 @@ package body Bindgen is
  WBI ("");
   end if;
 
+  Gen_CUDA_Init;
+
   Gen_Elab_Calls (Elab_Order);
 
   if not CodePeer_Mode then
@@ -1221,10 +1226,10 @@ package body Bindgen is
end Gen_Bind_Env_String;
 
---
-   -- Gen_CUDA_Init --
+   -- Gen_CUDA_Defs --
---
 
-   procedure Gen_CUDA_Init is
+   procedure Gen_CUDA_Defs is
   Unit_Name : constant String :=
 Get_Name_String (Units.Table (First_Unit_Entry).Uname);
   Unit : constant String :=
@@ -1237,7 +1242,7 @@ package body Bindgen is
   WBI ("");
   WBI ("   ");
 
-  WBI ("   function CUDA_Register_Function");
+  WBI ("   procedure CUDA_Register_Function");
   WBI ("  (Fat_Binary_Handle : System.Address;");
   WBI ("   Func : System.Address;");
   WBI ("   Kernel_Name : Interfaces.C.Strings.chars_ptr;");
@@ -1247,7 +1252,7 @@ package body Bindgen is
   WBI ("   Nullptr2 : System.Address;");
   WBI ("   Nullptr3 : System.Address;");
   WBI ("   Nullptr4 : System.Address;");
-  WBI ("   Nullptr5 : System.Address) return Boolean;");
+  WBI ("   Nullptr5 : System.Address);");
   WBI ("   pragma Import");
   WBI (" (Convention => C,");
   WBI ("  Entity => CUDA_Register_Function,");
@@ -1261,8 +1266,8 @@ package body Bindgen is
   WBI ("   Entity => CUDA_Register_Fat_Binary,");
   WBI ("   External_Name => ""__cudaRegisterFatBinary"");");
   WBI ("");
-  WBI ("   function CUDA_Register_Fat_Binary_End");
-  WBI (" (Fat_Binary : System.Address) return Boolean;");
+  WBI ("   procedure CUDA_Register_Fat_Binary_End");
+  WBI (" (Fat_Binary : System.Address);");
   WBI ("   pragma Import");
   WBI (" (Convention => C,");
   WBI ("  Entity => CUDA_Register_Fat_Binary_End,");
@@ -1287,8 +1292,7 @@ package body Bindgen is
   WBI ("  Fat_Binary'Address,");
   WBI ("  System.Null_Address);");
   WBI ("");
-  WBI ("   Fat_Binary_Handle : System.Address :=");
-  WBI (" CUDA_Register_Fat_Binary (Wrapper'Address);");
+  WBI ("   Fat_Binary_Handle : System.Address;");
   WBI ("");
 
   for K in CUDA_Kernels.First .. CUDA_Kernels.Last loop
@@ -1300,9 +1304,9 @@ package body Bindgen is
 --  K_Symbol is a unique identifier used to derive all symbol names
 --  related to kernel K.
 
-Kernel_Addr : constant String := Kernel_Symbol & "_Addr";
---  Kernel_Addr is the name of the symbol representing the address
---  of the host-side procedure of the kernel. The address is
+Kernel_Proc : constant String := Kernel_Symbol & "_Proc";
+--  Kernel_Proc is the name of the symbol representing the
+--  host-side procedure of the kernel. The address is
 --  pragma-imported and then used while registering the kernel with
 --  the CUDA runtime.
 Kernel_String : constant String := Kernel_Symbol & "_String";
@@ -1315,40 +1319,80 @@ package body Bindgen is
 
  begin
 --  Import host-side kernel address.
-WBI ("   " & Kernel_Addr & " : constant System.Address;");
+WBI ("   procedure " & Kernel_Proc & ";");
 WBI ("   pragma Import");
 WBI ("  (Convention=> C,");
-WBI ("   Entity=> " & Kernel_Addr & ",");
+WBI ("   Entity=> " & Kernel_Proc & ",");

Re: [PATCH v3] DSE: Use the constant store source if possible

2022-05-30 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches  writes:
> On 5/29/2022 3:43 PM, H.J. Lu wrote:
>> On Sat, May 28, 2022 at 11:37 AM Jeff Law via Gcc-patches
>>  wrote:
>>>
>>>
>>> On 5/26/2022 2:43 PM, H.J. Lu via Gcc-patches wrote:
 On Thu, May 26, 2022 at 04:14:17PM +0100, Richard Sandiford wrote:
> "H.J. Lu"  writes:
>> On Wed, May 25, 2022 at 12:30 AM Richard Sandiford
>>  wrote:
>>> "H.J. Lu via Gcc-patches"  writes:
 On Mon, May 23, 2022 at 12:38:06PM +0200, Richard Biener wrote:
> On Sat, May 21, 2022 at 5:02 AM H.J. Lu via Gcc-patches
>  wrote:
>> When recording store for RTL dead store elimination, check if the 
>> source
>> register is set only once to a constant.  If yes, record the constant
>> as the store source.  It eliminates unrolled zero stores after 
>> memset 0
>> in a loop where a vector register is used as the zero store source.
>>
>> gcc/
>>
>>   PR rtl-optimization/105638
>>   * dse.cc (record_store): Use the constant source if the 
>> source
>>   register is set only once.
>>
>> gcc/testsuite/
>>
>>   PR rtl-optimization/105638
>>   * g++.target/i386/pr105638.C: New test.
>> ---
>>gcc/dse.cc   | 19 ++
>>gcc/testsuite/g++.target/i386/pr105638.C | 44 
>> 
>>2 files changed, 63 insertions(+)
>>create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C
>>
>> diff --git a/gcc/dse.cc b/gcc/dse.cc
>> index 30c11cee034..0433dd3d846 100644
>> --- a/gcc/dse.cc
>> +++ b/gcc/dse.cc
>> @@ -1508,6 +1508,25 @@ record_store (rtx body, bb_info_t bb_info)
>>
>> if (tem && CONSTANT_P (tem))
>>   const_rhs = tem;
>> + else
>> +   {
>> + /* If RHS is set only once to a constant, set CONST_RHS
>> +to the constant.  */
>> + df_ref def = DF_REG_DEF_CHAIN (REGNO (rhs));
>> + if (def != nullptr
>> + && !DF_REF_IS_ARTIFICIAL (def)
>> + && !DF_REF_NEXT_REG (def))
>> +   {
>> + rtx_insn *def_insn = DF_REF_INSN (def);
>> + rtx def_body = PATTERN (def_insn);
>> + if (GET_CODE (def_body) == SET)
>> +   {
>> + rtx def_src = SET_SRC (def_body);
>> + if (CONSTANT_P (def_src))
>> +   const_rhs = def_src;
> doesn't DSE have its own tracking of stored values?  Shouldn't we
 It tracks stored values only within the basic block.  When RTL loop
 invariant motion hoists a constant initialization out of the loop into
 a separate basic block, the constant store value becomes unknown
 within the original basic block.

> improve _that_ if it is not enough?  I also wonder if you need to
 My patch extends DSE stored value tracking to include the constant 
 which
 is set only once in another basic block.

> verify the SET isn't partial?
>
 Here is the v2 patch to check that the constant is set by a non-partial
 unconditional load.

 OK for master?

 Thanks.

 H.J.
 ---
 RTL DSE tracks redundant constant stores within a basic block.  When 
 RTL
 loop invariant motion hoists a constant initialization out of the loop
 into a separate basic block, the constant store value becomes unknown
 within the original basic block.  When recording store for RTL DSE, 
 check
 if the source register is set only once to a constant by a non-partial
 unconditional load.  If yes, record the constant as the constant store
 source.  It eliminates unrolled zero stores after memset 0 in a loop
 where a vector register is used as the zero store source.

 gcc/

 PR rtl-optimization/105638
 * dse.cc (record_store): Use the constant source if the source
 register is set only once.

 gcc/testsuite/

 PR rtl-optimization/105638
 * g++.target/i386/pr105638.C: New test.
 ---
gcc/dse.cc   | 22 
gcc/testsuite/g++.target/i386/pr105638.C | 44 
 
2 files changed, 66 insertions(+)
create mode 100644 gcc/testsuite/g++.target/i386/pr105638.C

 diff --git a/gcc/dse.cc 

Re: [PATCH] x86: {,v}psadbw have commutative source operands

2022-05-30 Thread Uros Bizjak via Gcc-patches
On Mon, May 30, 2022 at 9:59 AM Jan Beulich  wrote:
>
> On 27.05.2022 11:05, Uros Bizjak wrote:
> > On Fri, May 27, 2022 at 10:13 AM Jan Beulich  wrote:
> >>
> >> Like noticed for gas as well (binutils-gdb commit c8cad9d389b7), the
> >> "absolute difference" aspect of the insns makes their source operands
> >> commutative.
> >
> > You will need to expand via ix86_fixup_binary_operands_no_copy, use
> > register_mmxmem_operand on both input operands and use
> > ix86_binary_operator insn constraint. Please see many examples w/
> > commutative operands throughout .md files.
>
> Hmm, yes, I see. As to the use of ix86_binary_operator_ok(): In
> particular in sse.md I see many uses of
> ix86_fixup_binary_operands_no_copy() in expanders where the
> corresponding insns don't use ix86_binary_operator_ok(), e.g. the
> immediately preceding uavg. Is there a(n) (anti-)pattern?

ix86_fixup_binary_operands was historically used with destructive
two-operand instructions (where one input operand is tied with output
operand). It fixed up expansion to help register allocator a bit, and
brought expansion to the form of two-operand instruction, especially
with memory and immediate operands.

Please note that nowadays RA can fix up the operands by itself, but it
is still beneficial to have e.g. memory operation in the right place
from the beginning.

> My simplistic initial version was based on observations while
> putting together the inverse change for
> vgf2p8affine{,inv}qb_ (commit c0569d342ca4), which
> aren't commutative. Are you suggesting that the remaining (for indeed
> being commutative) vgf2p8mulb_ also is incomplete,
> requiring an expander as well? And maybe the same then in
> v1ti3 for any_logic:V1TI, avx512bw_umulhrswv32hi3,
> or _dp (and likely a few more)?
>
> At least a few pmadd* appear to lack commutativity marking altogether.

Yes, sse.md could use some TLC in this area. I remember doing a review
of these patterns in i386.md a while ago.

Uros.

> Jan
>
> >> --- a/gcc/config/i386/mmx.md
> >> +++ b/gcc/config/i386/mmx.md
> >> @@ -4407,7 +4407,7 @@
> >>
> >>  (define_insn "mmx_psadbw"
> >>[(set (match_operand:V1DI 0 "register_operand" "=y,x,Yw")
> >> -(unspec:V1DI [(match_operand:V8QI 1 "register_operand" "0,0,Yw")
> >> +(unspec:V1DI [(match_operand:V8QI 1 "register_operand" "%0,0,Yw")
> >>   (match_operand:V8QI 2 "register_mmxmem_operand" 
> >> "ym,x,Yw")]
> >>  UNSPEC_PSADBW))]
> >>"(TARGET_MMX || TARGET_MMX_WITH_SSE)
> >> --- a/gcc/config/i386/sse.md
> >> +++ b/gcc/config/i386/sse.md
> >> @@ -19983,7 +19983,7 @@
> >>  (define_insn "_psadbw"
> >>[(set (match_operand:VI8_AVX2_AVX512BW 0 "register_operand" "=x,YW")
> >> (unspec:VI8_AVX2_AVX512BW
> >> - [(match_operand: 1 "register_operand" "0,YW")
> >> + [(match_operand: 1 "register_operand" "%0,YW")
> >>(match_operand: 2 "vector_operand" "xBm,YWm")]
> >>   UNSPEC_PSADBW))]
> >>"TARGET_SSE2"
> >>
> >
>


Re: [0/9] [middle-end] Add param to vec_perm_const hook to specify mode of input operand

2022-05-30 Thread Richard Sandiford via Gcc-patches
(Sorry for the slow reply, was off on Friday)

Richard Biener  writes:
> On Wed, May 25, 2022 at 10:24 PM Prathamesh Kulkarni
>  wrote:
>>
>> On Thu, 26 May 2022 at 00:37, Richard Biener  
>> wrote:
> [...]
>> > x86 now accepts V4SI V8SI permutes because we don’t ask it correctly and 
>> > thus my naive attempt to use the new function API breaks . Not to mention 
>> > the VEC_PERM IL is still rejected. I will wait for the rest of the series 
>> > to be approved and pushed.
>> Hi,
>> I pushed the entire series in ae8decf1d2b8329af59592b4fa78ee8dfab3ba5e
>> after it was approved by Richard S.
>
> Thanks.
>
> Maybe I'm doing it wrong but I now see
>
>   indices.new_vector (mask, second_vec.first == -1U ? 1 : 2, nunits);
>   bool identity_p = indices.series_p (0, 1, 0, 1);
>
> where nunits is 4 and mask {4, 5, 6, 7}, the number of vectors is 1,
> and now indices.series_p (0, 1, 0, 1) returns true despite my input
> vector having 8 elements and 'indices' should select the upper half.
> That's because the function calls clamp() on the encoding but
> clamp() knows nothing about the different nunits of the input vectors.
>
> I suppose vec_perm_indices needs updating to allow for different
> nunits of the input vectors as well?

The final argument to new_vector is supposed to be the number of elements
per input vector, so it sounds like it should be 8 rather than 4 in this
situation.

The number of elements per output vector is taken from the mask argument.
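
For the case you describe, the call would look something like this (a
sketch only, reusing the names from the snippet above; the values in the
comments are the hypothetical ones from your description):

  /* mask encodes {4, 5, 6, 7}; one input vector with 8 elements.  */
  indices.new_vector (mask, 1, 8);
  bool identity_p = indices.series_p (0, 1, 0, 1);  /* now false  */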

Thanks,
Richard

>
> Where else does this change need adjustments to other APIs?
>
> PR101668 has a naiive user of the new capability.  The included
> testcase works OK but trying to expand test coverage quickly
> runs into wrong-code, like for
>
> typedef int v8si __attribute__((vector_size (32)));
> typedef long long v4di __attribute__((vector_size (32)));
>
> void
> bar_s32_s64 (v4di * dst, v8si src)
> {
>   long long tem[8];
>   tem[0] = src[4];
>   tem[1] = src[5];
>   tem[2] = src[6];
>   tem[3] = src[7];
>   dst[0] = *(v4di *) tem;
> }
>
> which I expected to be rejected with -mavx2.
>
> Thanks,
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> > Richard.
>> >
>> > > Thanks,
>> > > Prathamesh
>> > >>
>> > >> At least I have a user in the vectorizer ready - allowing more permutes
>> > >> from existing vectors (of different sizes now) to be SLP vectorized.
>> > >>
>> > >> Thanks,
>> > >> Richard.
>> > >>
>> > >>> Thanks,
>> > >>> Prathamesh
>> > 
>> >  Richard


Re: vec_perm_const hook -- Fix build failure in ARM backend

2022-05-30 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 30 May 2022 at 13:04, Christophe Lyon  wrote:
>
> Hi Prathamesh,
>
>
> On 5/27/22 09:11, Prathamesh Kulkarni via Gcc-patches wrote:
> > Hi,
> > I forgot to adjust the prototype for arm_vectorize_vec_perm_const, which
> > resulted in the following
> > build error:
> >
> > # 00:05:33 make[3]: [Makefile:1787:
> > armv8l-unknown-linux-gnueabihf/bits/largefile-config.h] Error 1
> > (ignored)
> > # 00:10:53 
> > /home/tcwg-buildslave/workspace/tcwg_gnu_4/abe/snapshots/gcc.git~master/gcc/config/arm/arm.cc:299:13:
> > error: ‘bool arm_vectorize_vec_perm_const(machine_mode, rtx, rtx, rtx,
> > const vec_perm_indices&)’ declared ‘static’ but never defined
> > [-Werror=unused-function]
> > # 00:12:22 make[3]: *** [Makefile:2418: arm.o] Error 1
> > # 00:23:34 make[2]: *** [Makefile:5005: all-stage2-gcc] Error 2
> > # 00:23:34 make[1]: *** [Makefile:25739: stage2-bubble] Error 2
> > # 00:23:34 make: *** [Makefile:1072: all] Error 2
> >
> > https://gcc.gnu.org/pipermail/gcc-regression/2022-May/076645.html
> >
> > The attached patch fixes it.
> > OK to commit ?
> >
>
> Doesn't this count as "obvious" ?
OK, pushed.
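
For reference, the fix is simply to add the new input-operand machine_mode
parameter to the forward declaration so that it matches the definition;
roughly (the committed patch has the exact form):

  static bool arm_vectorize_vec_perm_const (machine_mode, machine_mode,
                                            rtx, rtx, rtx,
                                            const vec_perm_indices &);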

Thanks,
Prathamesh
>
> Thanks,
>
> Christophe
>
> > Thanks,
> > Prathamesh


Re: [PATCH] x86: {,v}psadbw have commutative source operands

2022-05-30 Thread Jan Beulich via Gcc-patches
On 27.05.2022 11:05, Uros Bizjak wrote:
> On Fri, May 27, 2022 at 10:13 AM Jan Beulich  wrote:
>>
>> Like noticed for gas as well (binutils-gdb commit c8cad9d389b7), the
>> "absolute difference" aspect of the insns makes their source operands
>> commutative.
> 
> You will need to expand via ix86_fixup_binary_operands_no_copy, use
> register_mmxmem_operand on both input operands and use
> ix86_binary_operator_ok insn constraint. Please see many examples w/
> commutative operands throughout .md files.

Hmm, yes, I see. As to the use of ix86_binary_operator_ok(): In
particular in sse.md I see many uses of
ix86_fixup_binary_operands_no_copy() in expanders where the
corresponding insns don't use ix86_binary_operator_ok(), e.g. the
immediately preceding uavg. Is there a(n) (anti-)pattern?

My simplistic initial version was based on observations while
putting together the inverse change for
vgf2p8affine{,inv}qb_ (commit c0569d342ca4), which
aren't commutative. Are you suggesting that the remaining (for indeed
being commutative) vgf2p8mulb_ also is incomplete,
requiring an expander as well? And maybe the same then in
v1ti3 for any_logic:V1TI, avx512bw_umulhrswv32hi3,
or _dp (and likely a few more)?

At least a few pmadd* appear to lack commutativity marking altogether.

Jan

>> --- a/gcc/config/i386/mmx.md
>> +++ b/gcc/config/i386/mmx.md
>> @@ -4407,7 +4407,7 @@
>>
>>  (define_insn "mmx_psadbw"
>>[(set (match_operand:V1DI 0 "register_operand" "=y,x,Yw")
>> -(unspec:V1DI [(match_operand:V8QI 1 "register_operand" "0,0,Yw")
>> +(unspec:V1DI [(match_operand:V8QI 1 "register_operand" "%0,0,Yw")
>>   (match_operand:V8QI 2 "register_mmxmem_operand" 
>> "ym,x,Yw")]
>>  UNSPEC_PSADBW))]
>>"(TARGET_MMX || TARGET_MMX_WITH_SSE)
>> --- a/gcc/config/i386/sse.md
>> +++ b/gcc/config/i386/sse.md
>> @@ -19983,7 +19983,7 @@
>>  (define_insn "_psadbw"
>>[(set (match_operand:VI8_AVX2_AVX512BW 0 "register_operand" "=x,YW")
>> (unspec:VI8_AVX2_AVX512BW
>> - [(match_operand: 1 "register_operand" "0,YW")
>> + [(match_operand: 1 "register_operand" "%0,YW")
>>(match_operand: 2 "vector_operand" "xBm,YWm")]
>>   UNSPEC_PSADBW))]
>>"TARGET_SSE2"
>>
> 



Re: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

2022-05-30 Thread Alexander Monakov via Gcc-patches
On Mon, 30 May 2022, Hongtao Liu wrote:

> On Mon, May 30, 2022 at 2:22 PM Alexander Monakov via Gcc-patches
>  wrote:
> > >
> > > The spill is mainly decided by 3 insns related to r92
> > >
> > > 283(insn 3 61 4 2 (set (reg/v:SF 92 [ x ])
> > > 284(reg:SF 102)) "test3.c":7:1 142 {*movsf_internal}
> > > 285 (expr_list:REG_DEAD (reg:SF 102)
> > >
> > > 288(insn 9 4 12 2 (set (reg:SI 89 [ _11 ])
> > > 289(subreg:SI (reg/v:SF 92 [ x ]) 0)) "test3.c":3:36 81 
> > > {*movsi_internal}
> > > 290 (nil))
> > >
> > > And
> > > 382(insn 28 27 29 5 (set (reg:DF 98)
> > > 383(float_extend:DF (reg/v:SF 92 [ x ]))) "test3.c":11:13 163 
> > > {*extendsfdf2}
> > > 384 (expr_list:REG_DEAD (reg/v:SF 92 [ x ])
> > > 385(nil)))
> > > 386(insn 29 28 30 5 (s
> > >
> > > The frequency for INSN 3 and INSN 9 is not affected, but the frequency
> > > of INSN 28 drops from 805 -> 89 after swapping "unlikely" and "likely".
> > > Because of that, GPR cost decreases a lot, finally making the RA choose
> > > GPR instead of MEM.
> > >
> > > GENERAL_REGS:2356,2356
> > > SSE_REGS:6000,6000
> > > MEM:4089,4089
> >
> > But why are SSE_REGS costed so high? r92 is used in SFmode, it doesn't make
> > sense that selecting a GPR for it looks cheaper than xmm0.
> For INSN 3 and INSN 28, SSE_REGS costs zero.
> But INSN 9 is an SImode move, and we have disparaged non-GPR
> alternatives in the movsi_internal pattern, which finally makes SSE_REGS
> cost 6 * 1000 (1000 is the frequency, 6 is the move cost between SSE_REGS
> and GPR, sse_to_integer/integer_to_sse).

But wait, selecting a GPR for r92 makes insn 3 (movsf_internal) an
sse-to-integer move, so it should be equally high cost? Not to mention
that the use in insn 28 (extendsfdf2) should have higher cost also.

Why does GPR cost 2356 instead of 6000 for insn 3 plus extra for insn 28?

Alexander


Re: vec_perm_const hook -- Fix build failure in ARM backend

2022-05-30 Thread Christophe Lyon via Gcc-patches

Hi Prathamesh,


On 5/27/22 09:11, Prathamesh Kulkarni via Gcc-patches wrote:

Hi,
I forgot to adjust the prototype for arm_vectorize_vec_perm_const, which
resulted in the following
build error:

# 00:05:33 make[3]: [Makefile:1787:
armv8l-unknown-linux-gnueabihf/bits/largefile-config.h] Error 1
(ignored)
# 00:10:53 
/home/tcwg-buildslave/workspace/tcwg_gnu_4/abe/snapshots/gcc.git~master/gcc/config/arm/arm.cc:299:13:
error: ‘bool arm_vectorize_vec_perm_const(machine_mode, rtx, rtx, rtx,
const vec_perm_indices&)’ declared ‘static’ but never defined
[-Werror=unused-function]
# 00:12:22 make[3]: *** [Makefile:2418: arm.o] Error 1
# 00:23:34 make[2]: *** [Makefile:5005: all-stage2-gcc] Error 2
# 00:23:34 make[1]: *** [Makefile:25739: stage2-bubble] Error 2
# 00:23:34 make: *** [Makefile:1072: all] Error 2

https://gcc.gnu.org/pipermail/gcc-regression/2022-May/076645.html

The attached patch fixes it.
OK to commit ?



Doesn't this count as "obvious" ?

Thanks,

Christophe


Thanks,
Prathamesh


Re: [PATCH] PR fortran/91300 - runtime error message with allocate and errmsg=

2022-05-30 Thread Tobias Burnus

On 28.05.22 22:25, Harald Anlauf via Fortran wrote:

the PR rightfully complained that we did not differentiate errors on
ALLOCATE(...,stat=,errmsg=) for failures from allocation of already
allocated objects or insufficient virtual memory.

It is even worse: The error message states the wrong reason.

The attached patch introduces a new STAT value LIBERROR_NO_MEMORY
that is returned for insufficient virtual memory, and a corresponding
(simple and invariant) ERRMSG: "Insufficient virtual memory".

I think the message is fine – at least on Linux, 'virtual memory' is the
term that's used and it is clear what's meant, even if I stumble a bit over
the wording. (But I do not have a crisp, short & comprehensive wording
myself.)

(In the PR Janne mentions checking for errno, but since the standard
malloc(3) only mentions ENOMEM as possible error value, and the user
may replace malloc by a special library, I believe that won't be easy
to handle in a general way.)

I concur. Especially as POSIX and the Linux man page list ENOMEM as the
only value.

Most compilers I tried (Intel/NAG/Crayftn) behave similarly, except
for Nvidia/Flang, which try to return the size of the allocation in
the error message.

I am not sure that this is worth the effort.

I think it is not needed – especially as we generate the message in the
FE and not in libgfortran. The advantage for the users is that they know
what value has been requested – but they cannot really do much with the
knowledge, either.

The testcase should be able to handle 32 and 64 bit systems.
At least that's what I think.


I hope it is – at least on 64-bit, I currently expect it to fail somewhat
reliably; with 32-bit I think it won't – but that's also handled.

But you could add a -fdump-tree-original + '! { dg-final {
scan-tree-dump*' to do some additional checking (e.g. whether both
strings appear in the dump; this could be more complex, but that's probably
not needed).


Regtested on x86_64-pc-linux-gnu.  OK for mainline?  Suggestions?


LGTM – taking the comments on the testcase below into account.



Fortran: improve runtime error message with ALLOCATE and ERRMSG=


Consider appending [PR91300] to that line – and try to make the
gcc-patches@ and fortran@ subject lines the same.

(Searching for the PR number, or case-insensitively for the string, in the
mailing list archive will find the message; thus, that's okay.)


ALLOCATE: generate different STAT,ERRMSG results for failures from
allocation of already allocated objects or insufficient virtual memory.

gcc/fortran/ChangeLog:

  PR fortran/91300
  * libgfortran.h: Define new error code LIBERROR_NO_MEMORY.
  * trans-stmt.cc (gfc_trans_allocate): Generate code for setting
  ERRMSG depending on result of STAT result of ALLOCATE.
  * trans.cc (gfc_allocate_using_malloc): Use STAT value of
  LIBERROR_NO_MEMORY in case of failed malloc.

gcc/testsuite/ChangeLog:

  PR fortran/91300
  * gfortran.dg/allocate_alloc_opt_15.f90: New test.
---

...


+  stat1   = -1
+  errmsg1 = ""
+  allocate (array(1), stat=stat1, errmsg=errmsg1)
+  if (stat1   /=  0) stop 1
+  if (errmsg1 /= "") stop 1

Consider initializing errmsg1 and then checking that it has not been
touched. (For completeness – I think we already have such tests.)

+  ! Obtain stat,errmsg for attempt to allocate an allocated object
+  allocate (array(1), stat=stat1, errmsg=errmsg1)
+  if (stat1   ==  0) stop 2
+  if (errmsg1 == "") stop 2

Consider checking (either here or as a third test) for the
gfortran-specific error message.

+  stat2   = -1
+  errmsg2 = ""
+  ! Try to allocate very large object
+  allocate (bigarray(bignumber), stat=stat2, errmsg=errmsg2)
+  if (stat2 /= 0) then
+ print *, "stat  =", stat1
+ print *, "errmsg: ", trim (errmsg1)
+ print *, "stat  =", stat2
+ print *, "errmsg: ", trim (errmsg2)
+ ! Ensure different results for stat, errmsg variables
+ if (stat2   == stat1 ) stop 3
+ if (errmsg2 == "" .or. errmsg2 == errmsg1) stop 4


Likewise for errmsg2

Thanks,

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

2022-05-30 Thread Hongtao Liu via Gcc-patches
On Mon, May 30, 2022 at 2:22 PM Alexander Monakov via Gcc-patches
 wrote:
>
> > > In the PR, the spill happens in the initial basic block of the function, 
> > > i.e.
> > > the one with the highest frequency.
> > >
> > > Also as noted in the PR, swapping the 'unlikely' branch to 'likely' 
> > > avoids the spill,
> > > even though it does not affect the frequency of the initial basic block, 
> > > and
> > > makes the block with the use more rarely executed.
> >
> > The spill is mainly decided by 3 insns related to r92
> >
> > 283(insn 3 61 4 2 (set (reg/v:SF 92 [ x ])
> > 284(reg:SF 102)) "test3.c":7:1 142 {*movsf_internal}
> > 285 (expr_list:REG_DEAD (reg:SF 102)
> >
> > 288(insn 9 4 12 2 (set (reg:SI 89 [ _11 ])
> > 289(subreg:SI (reg/v:SF 92 [ x ]) 0)) "test3.c":3:36 81 
> > {*movsi_internal}
> > 290 (nil))
> >
> > And
> > 382(insn 28 27 29 5 (set (reg:DF 98)
> > 383(float_extend:DF (reg/v:SF 92 [ x ]))) "test3.c":11:13 163 
> > {*extendsfdf2}
> > 384 (expr_list:REG_DEAD (reg/v:SF 92 [ x ])
> > 385(nil)))
> > 386(insn 29 28 30 5 (s
> >
> > The frequency for INSN 3 and INSN 9 is not affected, but the frequency
> > of INSN 28 drops from 805 -> 89 after swapping "unlikely" and "likely".
> > Because of that, GPR cost decreases a lot, finally making the RA choose
> > GPR instead of MEM.
> >
> > GENERAL_REGS:2356,2356
> > SSE_REGS:6000,6000
> > MEM:4089,4089
>
> But why are SSE_REGS costed so high? r92 is used in SFmode, it doesn't make
> sense that selecting a GPR for it looks cheaper than xmm0.
For INSN 3 and INSN 28, SSE_REGS costs zero.
But INSN 9 is an SImode move, and we have disparaged non-GPR
alternatives in the movsi_internal pattern, which finally makes SSE_REGS
cost 6 * 1000 (1000 is the frequency, 6 is the move cost between SSE_REGS
and GPR, sse_to_integer/integer_to_sse).
The value of sse_to_integer/integer_to_sse was chosen as 6 (3 times the
GPR move cost) to avoid too many spills between GPR and xmm; we once set
sse_to_integer/integer_to_sse to 2 and generated many movd instructions,
but that finally caused a frequency drop and regressed performance.

Another choice is changing the movsi_internal alternatives from *v to
?v (just like what we did for the GPR alternatives in *movsf_internal);
then the separate change can fix the PR and also eliminate the extra
movd, but it may regress cases elsewhere.
Any thoughts on changing *v to ?v, @Uros Bizjak?
>
> > Dump of 301.ira:
> > 67  a4(r92,l0) costs: AREG:2356,2356 DREG:2356,2356 CREG:2356,2356 
> > BREG:2356,2356 SIREG:2356,2356 DIREG:2356,2356 AD_REGS:2356,2356 
> > CLOBBERED_REGS:2356,2356 Q_REGS:2356,2356 NON_Q_REGS:2356,2356 
> > TLS_GOTBASE_REGS:2356,2356 GENERAL_REGS:2356,2356 SSE_FIRST_REG:6000,6000 
> > NO_REX_SSE_REGS:6000,6000 SSE_REGS:6000,6000 \
> >MMX_REGS:19534,19534 INT_SSE_REGS:19534,19534 ALL_REGS:214534,214534 
> > MEM:4089,4089
> >
> > And although there's no spill, there's an extra VMOVD in the later BB which
> > looks suboptimal(Guess we can stand with that since it's cold.)
>
> I think that falls out of the wrong decision for SSE_REGS cost.
>
> Alexander
>
> >
> > 24vmovd   %eax, %xmm2
> > 25vcvtss2sd   %xmm2, %xmm2, %xmm1
> > 26vmulsd  %xmm0, %xmm1, %xmm0
> > 27vcvtsd2ss   %xmm0, %xmm0, %xmm0
> > >
> > > Do you have a root cause analysis that explains the above?
> > >
> > > Alexander



-- 
BR,
Hongtao


Re: [PATCH] Mips: Enable asynchronous unwind tables with both ASAN and TSAN

2022-05-30 Thread Dimitrije Milosevic
Hi Xi, thanks for pointing this out. I'd definitely say that the 
https://clang.llvm.org/docs/ThreadSanitizer.html documentation is outdated. 
According to 
https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual#supported-platforms
 TSAN is supported on Mips64. Furthermore, there are actual code segments (in 
compiler-rt/lib/tsan/rtl/tsan_platform.h, for example) related to Mips64.
I did, however, add the 64-bit target check. Here is the updated version of
the patch.

gcc/ChangeLog:

* config/mips/mips.cc (mips_option_override): Enable
asynchronous unwind tables with both ASAN and TSAN.

---

 gcc/config/mips/mips.cc | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index e64928f4113..2dce4007678 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -20115,10 +20115,16 @@ mips_option_override (void)
target_flags |= MASK_64BIT;
 }

-  /* -fsanitize=address needs to turn on -fasynchronous-unwind-tables in
- order for tracebacks to be complete but not if any
- -fasynchronous-unwind-table were already specified.  */
-  if (flag_sanitize & SANITIZE_USER_ADDRESS
+  /* -fsanitize=address, and -fsanitize=thread need to turn
+ on -fasynchronous-unwind-tables in order for tracebacks
+ to be complete but not if any -fasynchronous-unwind-table
+ were already specified.  */
+  /* FIXME: HWSAN is currently only available on AArch64,
+  but could also utilize -fasynchronous-unwind-tables.
+ FIXME: We would also like to check if -ffreestanding is passed in.
+  However, it is only available in C-ish frontends.  */
+  if (((flag_sanitize & SANITIZE_USER_ADDRESS)
+  || (TARGET_64BIT && (flag_sanitize & SANITIZE_THREAD)))
   && !global_options_set.x_flag_asynchronous_unwind_tables)
 flag_asynchronous_unwind_tables = 1;

---


From: Xi Ruoyao 
Sent: Saturday, May 28, 2022 12:30 PM
To: Dimitrije Milosevic ; 
gcc-patches@gcc.gnu.org 
Cc: Djordje Todorovic 
Subject: Re: [PATCH] Mips: Enable asynchronous unwind tables with both ASAN and 
TSAN

On Thu, 2022-05-26 at 14:18 +, Dimitrije Milosevic wrote:
> Enable asynchronous unwind tables with both ASAN and TSAN for correct 
> back-trace.
> LLVM currently enables asynchronous unwind tables for: ASAN, HWSAN, TSAN, 
> MSAN, and DFSAN.
> HWSAN is currently available only on AArch64, while MSAN and DFSAN are not 
> available at all.
> Also, LLVM checks that '-ffreestanding' is not passed in. '-ffreestanding' is
> only available in C-family frontends, hence no such check is included.
> TODO: Not sure if any tests should be added.

According to https://clang.llvm.org/docs/ThreadSanitizer.html, TSAN is
not supported on MIPS.  Is this doc outdated?

--
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: nvptx: forward '-v' command-line option to assembler, linker (was: [MentorEmbedded/nvptx-tools] Issue 30: Ignore not-supported sm_* error without --verify (PR #31))

2022-05-30 Thread Tobias Burnus

Hi Thomas,

On 29.05.22 22:49, Thomas Schwinge wrote:

Not sure if that's what you had in mind, but what do you think about the
attached "nvptx: forward '-v' command-line option to assembler, linker"?
OK to push to GCC master branch (after merging

"Put '-v' verbose output onto stderr instead of stdout")?


I was mainly thinking of some way to have it available — which
'-foffload-options=-Wa,-v' already permits on the GCC side. (Once the
nvptx-tools patch actually makes use of the '-v'.)

If I understand your patch correctly, this patch now causes 'gcc -v' to
imply 'gcc -v -Wa,-v'. I think that's okay, since 'gcc -v' already
outputs a lot of lines and those lines can be helpful to understand what
happens and what not.

Tom, your thoughts on this?

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


RE: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

2022-05-30 Thread Alexander Monakov via Gcc-patches
> > In the PR, the spill happens in the initial basic block of the function, 
> > i.e.
> > the one with the highest frequency.
> > 
> > Also as noted in the PR, swapping the 'unlikely' branch to 'likely' avoids 
> > the spill,
> > even though it does not affect the frequency of the initial basic block, and
> > makes the block with the use more rarely executed.
> 
> The spill is mainly decided by 3 insns related to r92
> 
> 283(insn 3 61 4 2 (set (reg/v:SF 92 [ x ])
> 284(reg:SF 102)) "test3.c":7:1 142 {*movsf_internal}
> 285 (expr_list:REG_DEAD (reg:SF 102)
> 
> 288(insn 9 4 12 2 (set (reg:SI 89 [ _11 ])
> 289(subreg:SI (reg/v:SF 92 [ x ]) 0)) "test3.c":3:36 81 
> {*movsi_internal}
> 290 (nil))
> 
> And
> 382(insn 28 27 29 5 (set (reg:DF 98)
> 383(float_extend:DF (reg/v:SF 92 [ x ]))) "test3.c":11:13 163 
> {*extendsfdf2}
> 384 (expr_list:REG_DEAD (reg/v:SF 92 [ x ])
> 385(nil)))
> 386(insn 29 28 30 5 (s
> 
> The frequency for INSN 3 and INSN 9 is not affected, but the frequency of
> INSN 28 drops from 805 -> 89 after swapping "unlikely" and "likely".  Because
> of that, GPR cost decreases a lot, finally making the RA choose GPR instead
> of MEM.
> 
> GENERAL_REGS:2356,2356 
> SSE_REGS:6000,6000
> MEM:4089,4089

But why are SSE_REGS costed so high? r92 is used in SFmode, it doesn't make
sense that selecting a GPR for it looks cheaper than xmm0.

> Dump of 301.ira:
> 67  a4(r92,l0) costs: AREG:2356,2356 DREG:2356,2356 CREG:2356,2356 
> BREG:2356,2356 SIREG:2356,2356 DIREG:2356,2356 AD_REGS:2356,2356 
> CLOBBERED_REGS:2356,2356 Q_REGS:2356,2356 NON_Q_REGS:2356,2356 
> TLS_GOTBASE_REGS:2356,2356 GENERAL_REGS:2356,2356 SSE_FIRST_REG:6000,6000 
> NO_REX_SSE_REGS:6000,6000 SSE_REGS:6000,6000 \
>MMX_REGS:19534,19534 INT_SSE_REGS:19534,19534 ALL_REGS:214534,214534 
> MEM:4089,4089
> 
> And although there's no spill, there's an extra VMOVD in the later BB which
> looks suboptimal(Guess we can stand with that since it's cold.)

I think that falls out of the wrong decision for SSE_REGS cost.

Alexander

> 
> 24vmovd   %eax, %xmm2
> 25vcvtss2sd   %xmm2, %xmm2, %xmm1
> 26vmulsd  %xmm0, %xmm1, %xmm0
> 27vcvtsd2ss   %xmm0, %xmm0, %xmm0
> > 
> > Do you have a root cause analysis that explains the above?
> > 
> > Alexander