date:20210907

[PATCH] [i386] Optimize v4sf reduction.

2021-09-07 Thread liuhongt via Gcc-patches

Hi:
  The optimization is described in the PR.
  The two instruction sequences are almost equally fast, but the optimized
instruction sequence can be one mov instruction shorter on SSE2 and
two mov instructions shorter on SSE3.
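
  For readers who want to follow the data flow, here is a scalar model in
plain C of the SSE2 path that the new reduc_plus_scal_v4sf expander in the
patch emits (shufps with immediate 177, addps, movhlps, addss). The v4sf
struct and the helper names are made up for illustration and are not GCC
code. On SSE3 the patch uses movshdup instead of shufps, which produces
{x1, x1, x3, x3}; lanes 0 and 2 of the sum come out the same.

```c
#include <stddef.h>

/* Hypothetical stand-in for a 4 x float vector register.  */
typedef struct { float e[4]; } v4sf;

/* shufps with immediate 177 (0xB1) and both sources equal:
   swaps the two elements of each adjacent pair.  */
static v4sf shufps_177 (v4sf a)
{
  v4sf r = {{ a.e[1], a.e[0], a.e[3], a.e[2] }};
  return r;
}

/* addps: lane-wise addition.  */
static v4sf addps (v4sf a, v4sf b)
{
  v4sf r;
  for (size_t i = 0; i < 4; i++)
    r.e[i] = a.e[i] + b.e[i];
  return r;
}

/* movhlps dst, src: the low two lanes of dst are replaced by the
   high two lanes of src; the high lanes of dst are kept.  */
static v4sf movhlps (v4sf dst, v4sf src)
{
  dst.e[0] = src.e[2];
  dst.e[1] = src.e[3];
  return dst;
}

/* Model of the emitted sequence.  */
static float reduc_plus_scal_v4sf_model (v4sf x)
{
  v4sf tmp = shufps_177 (x);   /* {x1, x0, x3, x2} */
  x = addps (x, tmp);          /* {x0+x1, x0+x1, x2+x3, x2+x3} */
  tmp = movhlps (tmp, x);      /* lane 0 is now x2+x3 */
  return x.e[0] + tmp.e[0];    /* addss on lane 0 */
}
```

  Note that the additions are reassociated compared with the scalar loop,
which is why the new tests are built with -ffast-math.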

  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.

gcc/ChangeLog:

PR target/101059
* config/i386/sse.md (reduc_plus_scal_<mode>): Split to ..
(reduc_plus_scal_v4sf): .. this, New define_expand.
(reduc_plus_scal_v2df): .. and this, New define_expand.

gcc/testsuite/ChangeLog:

PR target/101059
* gcc.target/i386/sse2-pr101059.c: New test.
* gcc.target/i386/sse3-pr101059.c: New test.
---
 gcc/config/i386/sse.md| 39 +--
 gcc/testsuite/gcc.target/i386/sse2-pr101059.c | 32 +++
 gcc/testsuite/gcc.target/i386/sse3-pr101059.c | 13 +++
 3 files changed, 73 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr101059.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse3-pr101059.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5785e73241c..b8057344a1c 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2874,19 +2874,36 @@ (define_insn "sse3_hv4sf3"
(set_attr "prefix_rep" "1,*")
(set_attr "mode" "V4SF")])
 
-(define_mode_iterator REDUC_SSE_PLUS_MODE
- [(V2DF "TARGET_SSE") (V4SF "TARGET_SSE")])
+(define_expand "reduc_plus_scal_v4sf"
+ [(plus:V4SF
+   (match_operand:SF 0 "register_operand")
+   (match_operand:V4SF 1 "register_operand"))]
+ "TARGET_SSE"
+{
+  rtx vtmp = gen_reg_rtx (V4SFmode);
+  rtx stmp = gen_reg_rtx (SFmode);
+  if (TARGET_SSE3)
+emit_insn (gen_sse3_movshdup (vtmp, operands[1]));
+  else
+emit_insn (gen_sse_shufps (vtmp, operands[1], operands[1], GEN_INT(177)));
 
-(define_expand "reduc_plus_scal_<mode>"
- [(plus:REDUC_SSE_PLUS_MODE
-   (match_operand:<ssescalarmode> 0 "register_operand")
-   (match_operand:REDUC_SSE_PLUS_MODE 1 "register_operand"))]
- ""
+  emit_insn (gen_addv4sf3 (operands[1], operands[1], vtmp));
+  emit_insn (gen_sse_movhlps (vtmp, vtmp, operands[1]));
+  emit_insn (gen_vec_extractv4sfsf (stmp, vtmp, const0_rtx));
+  emit_insn (gen_vec_extractv4sfsf (operands[0], operands[1], const0_rtx));
+  emit_insn (gen_addsf3 (operands[0], operands[0], stmp));
+  DONE;
+})
+
+(define_expand "reduc_plus_scal_v2df"
+ [(plus:V2DF
+   (match_operand:DF 0 "register_operand")
+   (match_operand:V2DF 1 "register_operand"))]
+ "TARGET_SSE"
 {
-  rtx tmp = gen_reg_rtx (<MODE>mode);
-  ix86_expand_reduc (gen_add<mode>3, tmp, operands[1]);
-  emit_insn (gen_vec_extract<mode><ssescalarmodelower> (operands[0], tmp,
-const0_rtx));
+  rtx tmp = gen_reg_rtx (V2DFmode);
+  ix86_expand_reduc (gen_addv2df3, tmp, operands[1]);
+  emit_insn (gen_vec_extractv2dfdf (operands[0], tmp, const0_rtx));
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr101059.c b/gcc/testsuite/gcc.target/i386/sse2-pr101059.c
new file mode 100644
index 000..d155bf5b43c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-pr101059.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -msse2" } */
+/* { dg-require-effective-target sse2 } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse2-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse2_test
+#endif
+
+#include CHECK_H
+
+float
+__attribute__((noipa, optimize("tree-vectorize")))
+foo (float* p)
+{
+  float sum = 0.f;
+  for (int i = 0; i != 4; i++)
+sum += p[i];
+  return sum;
+}
+
+static void
+TEST (void)
+{
+  float p[4] = {1.0f, 2.0f, 3.0f, 4.0f};
+  float res = foo (p);
+  if (res != 10.0f)
+abort();
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse3-pr101059.c b/gcc/testsuite/gcc.target/i386/sse3-pr101059.c
new file mode 100644
index 000..4795e892883
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse3-pr101059.c
@@ -0,0 +1,13 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -ffast-math -msse3" } */
+/* { dg-require-effective-target sse3 } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse3-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse3_test
+#endif
+
+#include "sse2-pr101059.c"
-- 
2.27.0

PING^2 [PATCH] x86: Update memcpy/memset inline strategies for -mtune=generic

2021-09-07 Thread H.J. Lu via Gcc-patches

On Sun, Aug 22, 2021 at 8:28 AM H.J. Lu  wrote:
>
> On Tue, Mar 23, 2021 at 09:19:38AM +0100, Richard Biener wrote:
> > On Tue, Mar 23, 2021 at 3:41 AM Hongyu Wang  wrote:
> > >
> > > > Hongyue, please collect code size differences on SPEC CPU 2017 and
> > > > eembc.
> > >
> > > Here is code size difference for this patch
> >
> > Thanks, nothing too bad although slightly larger impacts than envisioned.
> >
>
> PING.
>
> OK for master branch?
>
> Thanks.
>
> H.J.
>  ---
> Simplify memcpy and memset inline strategies to avoid branches for
> -mtune=generic:
>
> 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
>load and store for up to 16 * 16 (256) bytes when the data size is
>fixed and known.
> 2. Inline only if data size is known to be <= 256.
>a. Use "rep movsb/stosb" with simple code sequence if the data size
>   is a constant.
>b. Use loop if data size is not a constant.
> 3. Use memcpy/memset library function if data size is unknown or > 256.
>
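
To make the three cases in the quoted list concrete, here is a small C
sketch of the kinds of call sites they describe. The function and struct
names are made up for illustration. Which expansion the compiler actually
picks depends on the GCC version and tuning; the comments only restate
the strategy from the list above and are not verified output.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 64-byte payload, used only for illustration.  */
struct packet { unsigned char bytes[64]; };

/* Items 1 and 2a: the size is a compile-time constant (64 bytes),
   so the copy is a candidate for inline expansion.  */
void copy_fixed (struct packet *dst, const struct packet *src)
{
  memcpy (dst, src, sizeof *dst);
}

/* Item 2b: the size is not a constant, but the guard tells the
   compiler it is <= 256, so an inline loop is a candidate.  */
void copy_bounded (unsigned char *dst, const unsigned char *src, size_t n)
{
  if (n <= 256)
    memcpy (dst, src, n);
}

/* Item 3: nothing is known about the size, so the library
   memcpy is expected to be called.  */
void copy_unknown (unsigned char *dst, const unsigned char *src, size_t n)
{
  memcpy (dst, src, n);
}
```

Compiling such a file with -O2 -mtune=generic -S and reading the assembly
is one way to check which strategy was chosen for each function.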

PING:

https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577889.html

-- 
H.J.

Re: [PATCH V2 00/10] Initial support for AVX512FP16

2021-09-07 Thread Hongtao Liu via Gcc-patches

On Wed, Sep 8, 2021 at 10:54 AM Hongtao Liu  wrote:
>
> On Wed, Jul 21, 2021 at 3:43 PM liuhongt  wrote:
> >
> > Hi:
> >   As discussed in [1], this patch supports _Float16 under target sse2
> > and above. Without avx512fp16, the _Float16 type is storage only: all
> > operations are emulated by soft-fp and float instructions. Soft-fp keeps
> > the intermediate result of an operation at 32-bit precision by default,
> > which may lead to inconsistent behavior between soft-fp and avx512fp16
> > instructions; using option -fexcess-precision=standard will force a
> > round back after every operation.
> >
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574112.html
> >
> > There's 10 patches in this series:
> >
> > 1)  Update hf soft-fp from glibc.
> > 2)  [i386] Enable _Float16 type for TARGET_SSE2 and above.
> > 3)  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> > truncations.
> > 4) AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16
> > instructions.
> > 5) AVX512FP16: Support vector init/broadcast/set/extract for FP16.
> > 6) AVX512FP16: Add testcase for vector init and broadcast intrinsics.
> > 7) AVX512FP16: Add tests for vector passing in variable arguments.
> > 8) AVX512FP16: Add ABI tests for xmm.
> > 9) AVX512FP16: Add ABI test for ymm.
> > 10) AVX512FP16: Add abi test for zmm
>
> I'm going to check in patch 4-10 plus [1].
Patch 4 introduces a new failure, which seems to just expose a latent
bug that is already recorded in PR99936 and PR98531.

g++.dg/modules/xtreme-header_b.C -std=c++17 (internal compiler error)

>
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578654.html.
>
>
> >
> > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,} on CLX.
> > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32\ -march=native,\ -march=native} on SPR.
> > >   Pass 300+ new tests under gcc.dg/torture/*float16*
> > >
> > >   On SPR, there are regressions related to FLT_EVAL_METHOD for pr69225-[1234567].c
> > >  since TARGET_AVX512FP16 will set FLT_EVAL_METHOD to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16.
> >
> >  gcc/common/config/i386/cpuinfo.h  |2 +
> >  gcc/common/config/i386/i386-common.c  |   26 +-
> >  gcc/common/config/i386/i386-cpuinfo.h |1 +
> >  gcc/common/config/i386/i386-isas.h|1 +
> >  gcc/config.gcc|2 +-
> >  gcc/config/i386/avx512fp16intrin.h|  225 
> >  gcc/config/i386/cpuid.h   |1 +
> >  gcc/config/i386/i386-builtin-types.def|7 +-
> >  gcc/config/i386/i386-builtins.c   |   23 +
> >  gcc/config/i386/i386-c.c  |2 +
> >  gcc/config/i386/i386-expand.c |  129 +-
> >  gcc/config/i386/i386-isa.def  |1 +
> >  gcc/config/i386/i386-modes.def|   13 +-
> >  gcc/config/i386/i386-options.c|4 +-
> >  gcc/config/i386/i386.c|  238 +++-
> >  gcc/config/i386/i386.h|   28 +-
> >  gcc/config/i386/i386.md   |  304 -
> >  gcc/config/i386/i386.opt  |4 +
> >  gcc/config/i386/immintrin.h   |4 +
> >  gcc/config/i386/sse.md|  395 --
> >  gcc/doc/extend.texi   |   16 +
> >  gcc/doc/invoke.texi   |   10 +-
> >  gcc/lto/lto-lang.c|3 +
> >  gcc/optabs-query.c|   10 +-
> >  gcc/testsuite/g++.dg/other/i386-2.C   |2 +-
> >  gcc/testsuite/g++.dg/other/i386-3.C   |2 +-
> >  gcc/testsuite/g++.target/i386/float16-1.C |8 +
> >  gcc/testsuite/g++.target/i386/float16-2.C |   14 +
> >  gcc/testsuite/g++.target/i386/float16-3.C |   10 +
> >  gcc/testsuite/gcc.target/i386/avx-1.c |2 +-
> >  gcc/testsuite/gcc.target/i386/avx-2.c |2 +-
> >  gcc/testsuite/gcc.target/i386/avx512-check.h  |3 +
> >  .../gcc.target/i386/avx512fp16-10a.c  |   14 +
> >  .../gcc.target/i386/avx512fp16-10b.c  |   25 +
> >  .../gcc.target/i386/avx512fp16-12a.c  |   21 +
> >  .../gcc.target/i386/avx512fp16-12b.c  |   27 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-1a.c |   24 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-1b.c |   32 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-1c.c |   26 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-1d.c |   33 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-1e.c |   30 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-2a.c |   28 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-2b.c |   33 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-2c.c |   36 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-3a.c |   36 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-3b.c |   35 +
> >  gcc/testsuite/gcc.target/i386/avx512fp16-3c.c |   40 +
> >

Re: [PATCH V2 00/10] Initial support for AVX512FP16

2021-09-07 Thread Hongtao Liu via Gcc-patches

On Wed, Jul 21, 2021 at 3:43 PM liuhongt  wrote:
>
> Hi:
>   As discussed in [1], this patch supports _Float16 under target sse2
> and above. Without avx512fp16, the _Float16 type is storage only: all
> operations are emulated by soft-fp and float instructions. Soft-fp keeps
> the intermediate result of an operation at 32-bit precision by default,
> which may lead to inconsistent behavior between soft-fp and avx512fp16
> instructions; using option -fexcess-precision=standard will force a
> round back after every operation.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574112.html
>
> There's 10 patches in this series:
>
> 1)  Update hf soft-fp from glibc.
> 2)  [i386] Enable _Float16 type for TARGET_SSE2 and above.
> 3)  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> truncations.
> 4) AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16
> instructions.
> 5) AVX512FP16: Support vector init/broadcast/set/extract for FP16.
> 6) AVX512FP16: Add testcase for vector init and broadcast intrinsics.
> 7) AVX512FP16: Add tests for vector passing in variable arguments.
> 8) AVX512FP16: Add ABI tests for xmm.
> 9) AVX512FP16: Add ABI test for ymm.
> 10) AVX512FP16: Add abi test for zmm

I'm going to check in patch 4-10 plus [1].


[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578654.html.


>
>   Bootstrapped and regtested on x86_64-linux-gnu{-m32,} on CLX.
>   Bootstrapped and regtested on x86_64-linux-gnu{-m32\ -march=native,\ -march=native} on SPR.
>   Pass 300+ new tests under gcc.dg/torture/*float16*
>
>   On SPR, there are regressions related to FLT_EVAL_METHOD for pr69225-[1234567].c
>  since TARGET_AVX512FP16 will set FLT_EVAL_METHOD to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16.
>
>  gcc/common/config/i386/cpuinfo.h  |2 +
>  gcc/common/config/i386/i386-common.c  |   26 +-
>  gcc/common/config/i386/i386-cpuinfo.h |1 +
>  gcc/common/config/i386/i386-isas.h|1 +
>  gcc/config.gcc|2 +-
>  gcc/config/i386/avx512fp16intrin.h|  225 
>  gcc/config/i386/cpuid.h   |1 +
>  gcc/config/i386/i386-builtin-types.def|7 +-
>  gcc/config/i386/i386-builtins.c   |   23 +
>  gcc/config/i386/i386-c.c  |2 +
>  gcc/config/i386/i386-expand.c |  129 +-
>  gcc/config/i386/i386-isa.def  |1 +
>  gcc/config/i386/i386-modes.def|   13 +-
>  gcc/config/i386/i386-options.c|4 +-
>  gcc/config/i386/i386.c|  238 +++-
>  gcc/config/i386/i386.h|   28 +-
>  gcc/config/i386/i386.md   |  304 -
>  gcc/config/i386/i386.opt  |4 +
>  gcc/config/i386/immintrin.h   |4 +
>  gcc/config/i386/sse.md|  395 --
>  gcc/doc/extend.texi   |   16 +
>  gcc/doc/invoke.texi   |   10 +-
>  gcc/lto/lto-lang.c|3 +
>  gcc/optabs-query.c|   10 +-
>  gcc/testsuite/g++.dg/other/i386-2.C   |2 +-
>  gcc/testsuite/g++.dg/other/i386-3.C   |2 +-
>  gcc/testsuite/g++.target/i386/float16-1.C |8 +
>  gcc/testsuite/g++.target/i386/float16-2.C |   14 +
>  gcc/testsuite/g++.target/i386/float16-3.C |   10 +
>  gcc/testsuite/gcc.target/i386/avx-1.c |2 +-
>  gcc/testsuite/gcc.target/i386/avx-2.c |2 +-
>  gcc/testsuite/gcc.target/i386/avx512-check.h  |3 +
>  .../gcc.target/i386/avx512fp16-10a.c  |   14 +
>  .../gcc.target/i386/avx512fp16-10b.c  |   25 +
>  .../gcc.target/i386/avx512fp16-12a.c  |   21 +
>  .../gcc.target/i386/avx512fp16-12b.c  |   27 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-1a.c |   24 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-1b.c |   32 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-1c.c |   26 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-1d.c |   33 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-1e.c |   30 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-2a.c |   28 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-2b.c |   33 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-2c.c |   36 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-3a.c |   36 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-3b.c |   35 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-3c.c |   40 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-4.c  |   31 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-5.c  |  133 ++
>  gcc/testsuite/gcc.target/i386/avx512fp16-6.c  |   57 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-7.c  |   86 ++
>  gcc/testsuite/gcc.target/i386/avx512fp16-8.c  |   53 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-9a.c |   27 +
>  gcc/testsuite/gcc.target/i386/avx512fp16-9b.c |   49 +
>  .../gcc.target/i386/avx512fp16-vararg-1.c |  122 ++
>

Re: [PATCH] libgcc, i386: Export hf and hc from libgcc_s.so.1

2021-09-07 Thread Hongtao Liu via Gcc-patches

On Wed, Sep 8, 2021 at 8:54 AM Hongtao Liu  wrote:
>
> On Tue, Sep 7, 2021 at 8:29 PM Jakub Jelinek via Gcc-patches
>  wrote:
> >
> > > On Mon, Sep 06, 2021 at 10:58:53AM +0200, Jakub Jelinek via Gcc-patches wrote:
> > > On Mon, Sep 06, 2021 at 08:49:27AM +0100, Iain Sandoe wrote:
> > > > > > Ok.  The *.ver changes are still needed (see above), but that can be
> > > > > > done incrementally.
> > > >
> > > > > I can commit the .ver change if that’s approved, sure - for the record
> > > > > I haven’t checked any targets other than Darwin and Linux.
> > >
> > > The following patch exports it for Linux from config/i386/*.ver where it
> > > IMNSHO belongs, aarch64 already exports some of those at GCC_11* and other
> > > targets might add them at completely different gcc versions.
> > >
> > > > Tested on x86_64-linux, verified the right symbols are exported, ok for trunk?
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux successfully, though
> > actually __divhc3 and __mulhc3 aren't exported, they aren't even compiled
> > into libgcc_s.so.1.  Is that on purpose (large functions very unlikely being
> > used in most of the programs)?  If yes, I'll drop the __divhc3/__mulhc3
> > lines.  If not,
> > LIB2ADD_ST += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
> > should be changed to
> > LIB2ADD += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
> > (untested), or even the LIB2FUNCS_EXCLUDE dropped?
I was worried that, without LIB2FUNCS_EXCLUDE, there would be a duplicate
definition error, but there is none, so I removed it.
> >
> I think it's not, let me try to verify it.
> > > 2021-09-06  Jakub Jelinek  
> > >   Iain Sandoe  
> > >
> > >   * config/i386/libgcc-glibc.ver: Add %inherit GCC_12.0.0 GCC_7.0.0
> > >   and export *hf* and *hc* functions at GCC_12.0.0.
> > >
> > > > --- libgcc/config/i386/libgcc-glibc.ver.jj 2021-01-05 00:13:58.142298913 +0100
> > > > +++ libgcc/config/i386/libgcc-glibc.ver   2021-09-06 10:47:52.244726676 +0200
> > > @@ -194,3 +194,23 @@ GCC_4.8.0 {
> > >__cpu_indicator_init
> > >  }
> > >  %endif
> > > +
> > > +%inherit GCC_12.0.0 GCC_7.0.0
> > > +GCC_12.0.0 {
> > > +  __divhc3
> > > +  __mulhc3
> > > +  __eqhf2
> > > +  __nehf2
> > > +  __extendhfdf2
> > > +  __extendhfsf2
> > > +  __extendhftf2
> > > +  __extendhfxf2
> > > +  __fixhfti
> > > +  __fixunshfti
> > > +  __floattihf
> > > +  __floatuntihf
> > > +  __truncdfhf2
> > > +  __truncsfhf2
> > > +  __trunctfhf2
> > > +  __truncxfhf2
> > > +}
> >
> > Jakub
> >
>
>
> --
> BR,
> Hongtao

  Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
  Ok for trunk?

libgcc/ChangeLog:

* config/i386/t-softfp: Compile __{mul,div}hc3 into
libgcc_s.so.1.
---
 libgcc/config/i386/t-softfp | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/libgcc/config/i386/t-softfp b/libgcc/config/i386/t-softfp
index 2363ea17194..7620cc0cec5 100644
--- a/libgcc/config/i386/t-softfp
+++ b/libgcc/config/i386/t-softfp
@@ -2,9 +2,8 @@ LIB2ADD += $(srcdir)/config/i386/sfp-exceptions.c

 # Replace _divhc3 and _mulhc3.
 libgcc2-hf-functions = _divhc3 _mulhc3
-LIB2FUNCS_EXCLUDE += $(libgcc2-hf-functions)
 libgcc2-hf-extras = $(addsuffix .c, $(libgcc2-hf-functions))
-LIB2ADD_ST += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
+LIB2ADD += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))

 softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
 softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
-- 
2.27.0


-- 
BR,
Hongtao

Re: [PATCH v4] c++: Add gnu::diagnose_as attribute

2021-09-07 Thread Jason Merrill via Gcc-patches


On 7/23/21 4:58 AM, Matthias Kretz wrote:

Hi Jason,

I found a few regressions from the last patch in the meantime. Version 4 of
the patch is attached.

Questions:

1. I simplified the condition for calling dump_template_parms in
dump_function_name. !DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION (t) is
equivalent to DECL_USE_TEMPLATE (t) in this context; implying that
dump_template_parms is unconditionally called with `primary = false`. Or am I
missing something?

2. Given a DECL_TI_ARGS tree, can I query whether an argument was deduced or
explicitly specified? I'm asking because I still consider diagnostics of
function templates unfortunate. `template  void f()` is fine, as is
`void f(T) [with T = float]`, but `void f() [with T = float]` could be better.
I.e. if the template parameter appears somewhere in the function parameter
list, dump_template_parms would only produce noise. If, however, the template
parameter was given explicitly, it would be nice if it could show up
accordingly in diagnostics.

3. When parsing tentatively and the parse is rejected, input_location is not
reset, correct? In the attached patch I therefore made
cp_parser_namespace_alias_definition reset input_location on a failed
tentative parse. But it feels wrong. Shouldn't input_location be restored on
cp_parser_parse_definitely?

--

This attribute overrides the diagnostics output string for the entity it
appertains to. The motivation is to improve QoI for library TS
implementations, where diagnostics have a very bad signal-to-noise ratio
due to the long namespaces involved.

With the attribute, it is possible to solve PR89370 and make
std::__cxx11::basic_string<_CharT, _Traits, _Alloc> appear as
std::string in diagnostic output without extra hacks to recognize the
type in the C++ frontend.

Signed-off-by: Matthias Kretz 

gcc/ChangeLog:

 PR c++/89370
 * doc/extend.texi: Document the diagnose_as attribute.
 * doc/invoke.texi: Document -fno-diagnostics-use-aliases.

gcc/c-family/ChangeLog:

 PR c++/89370
 * c.opt (fdiagnostics-use-aliases): New diagnostics flag.

gcc/cp/ChangeLog:

 PR c++/89370
 * cp-tree.h: Add is_alias_template_p declaration.
 * decl2.c (is_alias_template_p): New function. Determines
 whether a given TYPE_DECL is actually an alias template that is
 still missing its template_info.


I still think you want to share code with get_underlying_template.  For 
the case where the alias doesn't have DECL_TEMPLATE_INFO yet, you can 
compare to current_template_args ().  Or you could do some initial 
processing that doesn't care about templates in the handler, and then do 
more in cp_parser_alias_declaration after the call to grokfield/start_decl.


If you still think you need this function, let's call it 
is_renaming_alias_template or renaming_alias_template_p; using both is_ 
and _p is redundant.  I don't have a strong preference which.



 (is_late_template_attribute): Decls with diagnose_as attribute
 are early attributes only if they are alias templates.


Is there a reason not to apply it early to other templates as well?


 * error.c (dump_scope): When printing the name of a namespace,
 look for the diagnose_as attribute. If found, print the
 associated string instead of calling dump_decl.


Did you decide not to handle this in dump_decl, so we use the 
diagnose_as when referring to the namespace in non-scope contexts as well?



 (dump_decl_name_or_diagnose_as): New function to replace
 dump_decl (pp, DECL_NAME(t), flags) and inspect the tree for the
 diagnose_as attribute before printing the DECL_NAME.



+  if (flag_diagnostics_use_aliases)
+{
+  tree attr = lookup_attribute ("diagnose_as", DECL_ATTRIBUTES (decl));
+  if (attr && TREE_VALUE (attr))
+   {
+ pp_cxx_ws_string (
+   pp, TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr))));


This pattern is used several places outside this function; can we factor 
it into something like


if (maybe_print_diagnose_as (special))
  /* OK */;

?


 (dump_template_scope): New function. Prints the scope of a
 template instance correctly applying diagnose_as attributes and
 adjusting the list of template parms accordingly.



+  const bool tmplate
+   = TYPE_LANG_SPECIFIC (special) && CLASSTYPE_TEMPLATE_INFO (special)
+   && (TREE_CODE (CLASSTYPE_TI_TEMPLATE (special)) != TEMPLATE_DECL
+ || PRIMARY_TEMPLATE_P (CLASSTYPE_TI_TEMPLATE (special)));


CLASSTYPE_SPECIALIZATION_OF_PRIMARY_TEMPLATE_P?


+  tmplate ? &TREE_CHAIN(*parms) : parms, flags);


Missing space before (


+ if (tmplate)
+   TREE_VALUE (*parms) = make_tree_vec (0);


This could use a comment.


 (dump_aggr_type): If the type has a diagnose_as attribute, print
 the associated string instead of printing the original type

Re: [PATCH 2/2] Get rid of all float-int special cases in validate_subreg.

2021-09-07 Thread Hongtao Liu via Gcc-patches

On Wed, Sep 8, 2021 at 7:20 AM Segher Boessenkool
 wrote:
>
> On Fri, Sep 03, 2021 at 05:05:47PM +0200, Andreas Schwab wrote:
> > On Sep 02 2021, Segher Boessenkool wrote:
> > > On Tue, Aug 31, 2021 at 07:17:49PM +0800, liuhongt via Gcc-patches wrote:
> > >>* emit-rtl.c (validate_subreg): Get rid of all float-int
> > >>special cases.
> > >
> > > This caused various regressions on powerpc.  Please revert this until
> > > this can be done safely (the comment this patch deletes says why it can
> > > not be done yet).
> >
> > This also breaks ada on riscv64.
> >
> > s-fatgen.adb: In function 'System.Fat_Flt.Attr_Float.Scaling':
> > s-fatgen.adb:830:8: error: unable to find a register to spill
> > s-fatgen.adb:830:8: error: this is the insn:
> > > (insn 215 321 216 26 (set (reg:SF 88 [ xx.26_39 ])
> > > (mult:SF (reg:SF 190)
> > > (subreg:SF (reg:DI 221 [164]) 0))) "s-fatgen.adb":821:25 17 {mulsf3}
> > >  (expr_list:REG_DEAD (reg:DI 221 [164])
> > > (expr_list:REG_DEAD (reg:SF 190)
> > > (nil))))
> > during RTL pass: reload
>
> It still is broken on rs6000.  This breaks when building SPEC for
> example (but in many more places as well).
>
> This needs to be fixed somehow.
>
> I sent 
> (Message-ID: <20210907230730.gm1...@gate.crashing.org>) that may be a
> start discussing this somewhat.  The idea of the change looks fine, but
> the time isn't ripe for it yet (if it was intentional!)
>
> In the meantime, various targets still are broken.  This needs a real
> fix.  How many *other* targets have been broken, just not detected yet?
riscv64 has reported a related bug.
Other than that, no other target has reported a related regression yet.
>
>
> Segher



-- 
BR,
Hongtao

Re: [PATCH] libgcc, i386: Export hf and hc from libgcc_s.so.1

2021-09-07 Thread Hongtao Liu via Gcc-patches

On Tue, Sep 7, 2021 at 8:29 PM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Mon, Sep 06, 2021 at 10:58:53AM +0200, Jakub Jelinek via Gcc-patches wrote:
> > On Mon, Sep 06, 2021 at 08:49:27AM +0100, Iain Sandoe wrote:
> > > > Ok.  The *.ver changes are still needed (see above), but that can be
> > > > done incrementally.
> > >
> > > I can commit the .ver change if that’s approved, sure - for the record I
> > > haven’t checked any targets other than Darwin and Linux.
> >
> > The following patch exports it for Linux from config/i386/*.ver where it
> > IMNSHO belongs, aarch64 already exports some of those at GCC_11* and other
> > targets might add them at completely different gcc versions.
> >
> > Tested on x86_64-linux, verified the right symbols are exported, ok for trunk?
>
> Bootstrapped/regtested on x86_64-linux and i686-linux successfully, though
> actually __divhc3 and __mulhc3 aren't exported, they aren't even compiled
> into libgcc_s.so.1.  Is that on purpose (large functions very unlikely being
> used in most of the programs)?  If yes, I'll drop the __divhc3/__mulhc3
> lines.  If not,
> LIB2ADD_ST += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
> should be changed to
> LIB2ADD += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
> (untested), or even the LIB2FUNCS_EXCLUDE dropped?
>
I think it's not, let me try to verify it.
> > 2021-09-06  Jakub Jelinek  
> >   Iain Sandoe  
> >
> >   * config/i386/libgcc-glibc.ver: Add %inherit GCC_12.0.0 GCC_7.0.0
> >   and export *hf* and *hc* functions at GCC_12.0.0.
> >
> > --- libgcc/config/i386/libgcc-glibc.ver.jj 2021-01-05 00:13:58.142298913 +0100
> > +++ libgcc/config/i386/libgcc-glibc.ver   2021-09-06 10:47:52.244726676 +0200
> > @@ -194,3 +194,23 @@ GCC_4.8.0 {
> >__cpu_indicator_init
> >  }
> >  %endif
> > +
> > +%inherit GCC_12.0.0 GCC_7.0.0
> > +GCC_12.0.0 {
> > +  __divhc3
> > +  __mulhc3
> > +  __eqhf2
> > +  __nehf2
> > +  __extendhfdf2
> > +  __extendhfsf2
> > +  __extendhftf2
> > +  __extendhfxf2
> > +  __fixhfti
> > +  __fixunshfti
> > +  __floattihf
> > +  __floatuntihf
> > +  __truncdfhf2
> > +  __truncsfhf2
> > +  __trunctfhf2
> > +  __truncxfhf2
> > +}
>
> Jakub
>


-- 
BR,
Hongtao

Re: [PATCH 2/2] Get rid of all float-int special cases in validate_subreg.

2021-09-07 Thread Segher Boessenkool

On Fri, Sep 03, 2021 at 05:05:47PM +0200, Andreas Schwab wrote:
> On Sep 02 2021, Segher Boessenkool wrote:
> > On Tue, Aug 31, 2021 at 07:17:49PM +0800, liuhongt via Gcc-patches wrote:
> >>* emit-rtl.c (validate_subreg): Get rid of all float-int
> >>special cases.
> >
> > This caused various regressions on powerpc.  Please revert this until
> > this can be done safely (the comment this patch deletes says why it can
> > not be done yet).
> 
> This also breaks ada on riscv64.
> 
> s-fatgen.adb: In function 'System.Fat_Flt.Attr_Float.Scaling':
> s-fatgen.adb:830:8: error: unable to find a register to spill
> s-fatgen.adb:830:8: error: this is the insn:
> (insn 215 321 216 26 (set (reg:SF 88 [ xx.26_39 ])
> (mult:SF (reg:SF 190)
> (subreg:SF (reg:DI 221 [164]) 0))) "s-fatgen.adb":821:25 17 {mulsf3}
>  (expr_list:REG_DEAD (reg:DI 221 [164])
> (expr_list:REG_DEAD (reg:SF 190)
> (nil))))
> during RTL pass: reload

It still is broken on rs6000.  This breaks when building SPEC for
example (but in many more places as well).

This needs to be fixed somehow.

I sent 
(Message-ID: <20210907230730.gm1...@gate.crashing.org>) that may be a
start discussing this somewhat.  The idea of the change looks fine, but
the time isn't ripe for it yet (if it was intentional!)

In the meantime, various targets still are broken.  This needs a real
fix.  How many *other* targets have been broken, just not detected yet?

Segher

Re: [PATCH] Fix SFmode subreg of DImode and TImode

2021-09-07 Thread Segher Boessenkool

Hi!

On Tue, Sep 07, 2021 at 03:12:36AM -0400, Michael Meissner wrote:
> [PATCH] Fix SFmode subreg of DImode and TImode
> 
> This patch fixes the breakage in the PowerPC due to a recent change in SUBREG
> behavior.

But what was that change?  And was that intentional?  If so, why wasn't
it documented, was the existing behaviour considered buggy?  But the
documentation agrees with the previous behaviour afaics.

> While it is arguable that the patch that caused the breakage should
> be reverted, this patch should be a bandage to prevent these changes from
> happening again.

NAK.  This patch will likely cause us to generate worse code.  If that
is not the case it will need a long, in-depth explanation of why not.

Sorry.

> I first noticed it in building the Spec 2017 wrf_r and blender_r
> benchmarks.  Once I applied this patch, I also noticed several of the
> tests now pass.
> 
> The core of the problem is we need to treat SUBREG's of SFmode and SImode
> specially on the PowerPC.  This is due to the fact that SFmode values that are
> in the vector and floating point registers are represented as DFmode.  When we
> want to do a direct move between the GPR registers and the vector registers, 
> we
> have to convert the value from the DFmode representation to/from the SFmode
> representation.

The core of the problem is that subreg of pseudos has three meanings:
  -- Paradoxical subregs;
  -- Actual subregs;
  -- "bit_cast" thingies: treat the same bits as something else.  Like
 looking at the bits of a float as its memory image.

Ignoring paradoxical subregs (as well as subregs of mem, which should
have disappeared by now), and subregs of hard registers as well (those
have *different* semantics after all), the other two kinds can be mixed,
and *have to* be mixed, because subregs of subregs are non-canonical.

Is there any reason why not to allow this kind of subreg?

If we want to not allow mixing bit_cast with subregs, we should make it
its own RTL code.
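
As a rough analogy only (this is C, not RTL, and the helper names are made
up), the difference between an "actual subreg" and a "bit_cast" style
subreg, and the mixed form that something like (subreg:SF (reg:DI) 0)
expresses, can be sketched as below. The sketch assumes a little-endian
view where offset 0 is the low part, and that float is a 32-bit IEEE
binary32 value. It deliberately ignores the PowerPC complication from the
quoted text, where SFmode values in vector and floating point registers
are kept in DFmode format.

```c
#include <stdint.h>
#include <string.h>

/* "Actual subreg" analogue: take the low 32 bits of a wider value.  */
static uint32_t low_part (uint64_t wide)
{
  return (uint32_t) wide;
}

/* "bit_cast" analogue: look at the bits of a float as an integer
   of the same size, i.e. its memory image.  */
static uint32_t float_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* Mixed form: narrow a 64-bit integer to its low 32 bits and
   reinterpret those bits as a float, in one conceptual step.  */
static float low_part_as_float (uint64_t wide)
{
  uint32_t lo = low_part (wide);
  float f;
  memcpy (&f, &lo, sizeof f);
  return f;
}
```

In C the mixed form has to be spelled as two steps; a single subreg of a
pseudo does both at once, since subregs of subregs are non-canonical.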

> +  /* In case we are given a SUBREG for a larger type, reduce it to
> +  SImode.  */
> +  if (mode == SFmode && GET_MODE_SIZE (inner_mode) > 4)
> + {
> +   rtx tmp = gen_reg_rtx (SImode);
> +   emit_move_insn (tmp, gen_lowpart (SImode, source));
> +   emit_insn (gen_movsf_from_si (dest, tmp));
> +   return true;
> + }

This makes it two separate insns.  Is that always optimised to code that
is at least as good as before?

Segher

[COMMITTED] gcc: xtensa: fix PR target/102115

2021-09-07 Thread Max Filippov via Gcc-patches

2021-09-07  Takayuki 'January June' Suwa  
gcc/
PR target/102115
* config/xtensa/xtensa.c (xtensa_emit_move_sequence): Add
'CONST_INT_P (src)' to the condition of the block that tries to
eliminate literal when loading integer constant.
---
 gcc/config/xtensa/xtensa.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/xtensa/xtensa.c b/gcc/config/xtensa/xtensa.c
index f4f8f1975c55..8d6755144c12 100644
--- a/gcc/config/xtensa/xtensa.c
+++ b/gcc/config/xtensa/xtensa.c
@@ -1084,7 +1084,8 @@ xtensa_emit_move_sequence (rtx *operands, machine_mode mode)
{
  /* Try to emit MOVI + SLLI sequence, that is smaller
 than L32R + literal.  */
- if (optimize_size && mode == SImode && register_operand (dst, mode))
+ if (optimize_size && mode == SImode && CONST_INT_P (src)
+ && register_operand (dst, mode))
{
  HOST_WIDE_INT srcval = INTVAL (src);
  int shift = ctz_hwi (srcval);
-- 
2.20.1
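
For context, the MOVI + SLLI idea in the patched block can be sketched in
standalone C as follows. This is a hypothetical model, not the code in
xtensa.c: it assumes that MOVI takes a signed 12-bit immediate
(-2048..2047), and the function names are invented. It takes a plain
integer, which is the situation the added CONST_INT_P (src) check
guarantees before INTVAL (src) is used.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count trailing zero bits; the caller guarantees x != 0.  */
static int count_trailing_zeros (uint32_t x)
{
  int n = 0;
  while ((x & 1u) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Assumed MOVI immediate range: signed 12 bits.  */
static bool fits_simm12 (int64_t v)
{
  return v >= -2048 && v <= 2047;
}

/* Try to express VALUE as IMM << SHIFT with IMM loadable by MOVI.
   Returns true and fills *IMM and *SHIFT on success.  A result with
   *SHIFT == 0 means MOVI alone would be enough.  */
static bool split_movi_slli (int32_t value, int32_t *imm, int *shift)
{
  if (value == 0)
    return false;
  int s = count_trailing_zeros ((uint32_t) value);
  /* The low S bits are zero, so this division is exact.  */
  int64_t base = (int64_t) value / ((int64_t) 1 << s);
  if (!fits_simm12 (base))
    return false;
  *imm = (int32_t) base;
  *shift = s;
  return true;
}
```

For example, 0x7ff00000 splits into MOVI 2047 followed by a shift of 20,
while 0x00123450 does not fit and would still need a literal load.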

Re: libgo patch committed: Update to Go1.17rc2 release

2021-09-07 Thread Ian Lance Taylor via Gcc-patches

On Fri, Sep 3, 2021 at 2:00 AM Matthias Klose  wrote:
>
> On 8/31/21 3:24 PM, H.J. Lu via Gcc-patches wrote:
> > On Thu, Aug 12, 2021 at 8:24 PM Ian Lance Taylor via Gcc-patches
> >  wrote:
> >>
> >> This patch updates libgo from the Go1.16.5 release to the Go 1.17rc2
> >> release.  As usual with these version updates, the patch itself is too
> >> large to attach to this e-mail message.  I've attached the changes to
> >> files that are specific to gccgo.  Bootstrapped and ran Go testsuite on
> >> x86_64-pc-linux-gnu.  Committed to mainline.
> >>
> >> Ian
> >
> > This breaks build with x32:
>
> This is PR/102102
>
> Also seen on x86_64-linux-gnu, when configuring with
> --with-multilib-list=m32,m64,mx32

Should be fixed now, I hope.

I don't know how to test this, as my desktop does not support x32 mode.

Ian

libgo patch committed: Use hash32 for p32 variants

2021-09-07 Thread Ian Lance Taylor via Gcc-patches

This libgo patch uses hash32 rather than hash64 for amd64p32 (x32
mode) and mips64p32 and mips64p32le (the n32 ABI).  This should fix PR
102102.  Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
3e86f786c08a5ae8b3153352a1295ab7fe6a4b51
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index e9f38d449a4..c3772694780 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-b3fad6957a04520013197ea7cab11bec3298d552
+e42c7c0216aec70834e8827174458aa4a50169fa
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/go/runtime/hash32.go b/libgo/go/runtime/hash32.go
index 58ae38b200d..0df73035c05 100644
--- a/libgo/go/runtime/hash32.go
+++ b/libgo/go/runtime/hash32.go
@@ -5,8 +5,8 @@
 // Hashing algorithm inspired by
 // wyhash: https://github.com/wangyi-fudan/wyhash/blob/ceb019b530e2c1c14d70b79bfa2bc49de7d95bc1/Modern%20Non-Cryptographic%20Hash%20Function%20and%20Pseudorandom%20Number%20Generator.pdf
 
-//go:build 386 || arm || mips || mipsle || armbe || m68k || nios2 || ppc || riscv || s390 || sh || shbe || sparc
-// +build 386 arm mips mipsle armbe m68k nios2 ppc riscv s390 sh shbe sparc
+//go:build 386 || arm || mips || mipsle || amd64p32 || armbe || m68k || mips64p32 || mips64p32le || nios2 || ppc || riscv || s390 || sh || shbe || sparc
+// +build 386 arm mips mipsle amd64p32 armbe m68k mips64p32 mips64p32le nios2 ppc riscv s390 sh shbe sparc
 
 package runtime
 
diff --git a/libgo/go/runtime/hash64.go b/libgo/go/runtime/hash64.go
index 4b32d515c4b..96ed90b9753 100644
--- a/libgo/go/runtime/hash64.go
+++ b/libgo/go/runtime/hash64.go
@@ -5,8 +5,8 @@
 // Hashing algorithm inspired by
 // wyhash: https://github.com/wangyi-fudan/wyhash
 
-//go:build amd64 || arm64 || mips64 || mips64le || ppc64 || ppc64le || riscv64 || s390x || wasm || alpha || amd64p32 || arm64be || ia64 || mips64p32 || mips64p32le || sparc64
-// +build amd64 arm64 mips64 mips64le ppc64 ppc64le riscv64 s390x wasm alpha amd64p32 arm64be ia64 mips64p32 mips64p32le sparc64
+//go:build amd64 || arm64 || mips64 || mips64le || ppc64 || ppc64le || riscv64 || s390x || wasm || alpha || arm64be || ia64 || sparc64
+// +build amd64 arm64 mips64 mips64le ppc64 ppc64le riscv64 s390x wasm alpha arm64be ia64 sparc64
 
 package runtime

testsuite: Use explicit -ftree-cselim in tests using -fdump-tree-cselim-details

2021-09-07 Thread Joseph Myers

When testing for Nios II (gcc-testresults shows this for various other
targets as well), tests scanning cselim dumps produce an UNRESOLVED
result because those dumps do not exist.

cselim is enabled conditionally by code in toplev.c:

  if (flag_tree_cselim == AUTODETECT_VALUE)
    {
      if (HAVE_conditional_move)
        flag_tree_cselim = 1;
      else
        flag_tree_cselim = 0;
    }

Add explicit -ftree-cselim to dg-options in the affected tests (as
already used by some other tests of cselim dumps) so that this dump
exists on all architectures.
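
The effect of passing the option explicitly can be sketched with a small
stand-alone model of the toplev.c logic quoted above.  The names below are
stand-ins chosen for this sketch (CSELIM_AUTODETECT and resolve_cselim are
not GCC identifiers); the point is only that an explicit setting bypasses
the target-dependent default.

```c
/* Stand-in for GCC's AUTODETECT_VALUE; the numeric value is an assumption
   made for this sketch.  */
#define CSELIM_AUTODETECT 2

/* Model of the quoted logic: FLAG is the user's setting (0, 1, or
   CSELIM_AUTODETECT when no option was given), HAVE_CMOVE says whether
   the target has conditional moves.  Returns the effective setting.  */
int
resolve_cselim (int flag, int have_cmove)
{
  if (flag == CSELIM_AUTODETECT)
    return have_cmove ? 1 : 0;	/* Target-dependent default.  */
  return flag;			/* Explicit -f[no-]tree-cselim wins.  */
}
```

On a target without conditional moves, resolve_cselim (CSELIM_AUTODETECT, 0)
yields 0, so the pass does not run and no dump is written, whereas
resolve_cselim (1, 0) yields 1.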

Tested with no regressions with cross to nios2-elf, where this causes
the tests in question to PASS instead of being UNRESOLVED.  OK to commit?

2021-09-07  Joseph Myers  

* gcc.dg/tree-ssa/pr89430-1.c, gcc.dg/tree-ssa/pr89430-2.c,
gcc.dg/tree-ssa/pr89430-3.c, gcc.dg/tree-ssa/pr89430-4.c,
gcc.dg/tree-ssa/pr89430-5.c, gcc.dg/tree-ssa/pr89430-6.c,
gcc.dg/tree-ssa/pr89430-7-comp-ref.c,
gcc.dg/tree-ssa/pr89430-8-mem-ref-size.c,
gcc.dg/tree-ssa/pr99473-1.c: Use -ftree-cselim.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-1.c
index 8ee1850ac63..d9fb2edf549 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cselim-details" } */
+/* { dg-options "-O2 -ftree-cselim -fdump-tree-cselim-details" } */
 
 unsigned test(unsigned k, unsigned b) {
 unsigned a[2];
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-2.c
index 9b96875ac7a..bb39df2be8e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cselim-details" } */
+/* { dg-options "-O2 -ftree-cselim -fdump-tree-cselim-details" } */
 
 int c;
 unsigned test(unsigned k, unsigned b) {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-3.c
index 0fac9f9b9c7..00166373267 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cselim-details" } */
+/* { dg-options "-O2 -ftree-cselim -fdump-tree-cselim-details" } */
 
 unsigned a[2];
 unsigned test(unsigned k, unsigned b) {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-4.c b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-4.c
index 54b8c11a407..127cbdf3d10 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cselim-details" } */
+/* { dg-options "-O2 -ftree-cselim -fdump-tree-cselim-details" } */
 
 int *p;
 unsigned test(unsigned k, unsigned b) {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-5.c
index b2d04119381..6a00f54b545 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cselim-details" } */
+/* { dg-options "-O2 -ftree-cselim -fdump-tree-cselim-details" } */
 
 int test(int b, int k) {
 struct {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-6.c b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-6.c
index 8d3c4f7cc6a..ecc083ebebe 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cselim-details" } */
+/* { dg-options "-O2 -ftree-cselim -fdump-tree-cselim-details" } */
 
 int test(int b, int k) {
 typedef struct {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-7-comp-ref.c b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-7-comp-ref.c
index c35a2afc70b..4fad2d1eb13 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-7-comp-ref.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-7-comp-ref.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cselim-details" } */
+/* { dg-options "-O2 -ftree-cselim -fdump-tree-cselim-details" } */
 
 typedef union {
   int i;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-8-mem-ref-size.c b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-8-mem-ref-size.c
index f9e66aefb13..5f93112acf7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-8-mem-ref-size.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-8-mem-ref-size.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-cselim-details" } */
+/* { dg-options "-O2 -ftree-cselim -fdump-tree-cselim-details" } */
 
 int *t;
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr99473-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr99473-1.c
index a9fd5427694..0fda5663a80 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr99473-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr99473-1.c
@@ -1,5 +1,5 @@
 /* { dg-do

[PATCH] PR fortran/82314 - ICE in gfc_conv_expr_descriptor, at fortran/trans-array.c:6972

2021-09-07 Thread Harald Anlauf via Gcc-patches

When adding the initializer for an array, we need to make sure that
array bounds are properly simplified if that array is a PARAMETER.
Otherwise the generated initializer could be wrong and screw up
subsequent simplifications, see PR.

The minimal solution is to attempt simplification of array bounds
before adding the initializer as in the attached patch.  (We could
place that part in a helper function if this functionality is
considered useful elsewhere).
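
To make the ordering issue concrete, here is a small stand-alone C model.
It is only a sketch: the types and functions below are hypothetical and are
not gfortran data structures.  It models a bound that is either a constant
or an unsimplified MERGE, and shows that the extent can only be computed
once both bounds have been reduced to constants, which is the order the
patch enforces.

```c
#include <stdbool.h>

/* Hypothetical expression kinds, loosely modelled on the situation in
   the PR; these are not gfortran types.  */
enum expr_kind { EXPR_CONST, EXPR_MERGE };

struct expr
{
  enum expr_kind kind;
  int value;		/* Valid when kind == EXPR_CONST.  */
  int tsource, fsource;	/* MERGE arguments.  */
  bool mask;		/* MERGE mask.  */
};

/* Fold a MERGE with a known mask into a constant, in place.  */
void
reduce_expr (struct expr *e)
{
  if (e->kind == EXPR_MERGE)
    {
      e->value = e->mask ? e->tsource : e->fsource;
      e->kind = EXPR_CONST;
    }
}

/* Return the extent of one dimension, or -1 if a bound has not been
   simplified to a constant yet.  */
int
extent (const struct expr *lower, const struct expr *upper)
{
  if (lower->kind != EXPR_CONST || upper->kind != EXPR_CONST)
    return -1;
  return upper->value - lower->value + 1;
}
```

With bounds corresponding to merge(3,7,.true.) and merge(3,7,.false.),
extent reports failure until reduce_expr has been applied to both, after
which the bounds are 3 and 7 and the extent is 5.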

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald


Fortran - ensure simplification of bounds of array-valued named constants

gcc/fortran/ChangeLog:

PR fortran/82314
* decl.c (add_init_expr_to_sym): For proper initialization of
array-valued named constants the array bounds need to be
simplified before adding the initializer.

gcc/testsuite/ChangeLog:

PR fortran/82314
* gfortran.dg/pr82314.f90: New test.

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index 2e49a673e15..f2e8896b562 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -2169,6 +2169,24 @@ add_init_expr_to_sym (const char *name, gfc_expr **initp, locus *var_locus)
 	  sym->as->type = AS_EXPLICIT;
 	}

+  /* Ensure that explicit bounds are simplified.  */
+  if (sym->attr.flavor == FL_PARAMETER && sym->attr.dimension
+	  && sym->as->type == AS_EXPLICIT)
+	{
+	  for (int dim = 0; dim < sym->as->rank; ++dim)
+	{
+	  gfc_expr *e;
+
+	  e = sym->as->lower[dim];
+	  if (e->expr_type != EXPR_CONSTANT)
+		gfc_reduce_init_expr (e);
+
+	  e = sym->as->upper[dim];
+	  if (e->expr_type != EXPR_CONSTANT)
+		gfc_reduce_init_expr (e);
+	}
+	}
+
   /* Need to check if the expression we initialized this
 	 to was one of the iso_c_binding named constants.  If so,
 	 and we're a parameter (constant), let it be iso_c.
diff --git a/gcc/testsuite/gfortran.dg/pr82314.f90 b/gcc/testsuite/gfortran.dg/pr82314.f90
new file mode 100644
index 000..3a147e22711
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr82314.f90
@@ -0,0 +1,11 @@
+! { dg-do run }
+! PR fortran/82314 - ICE in gfc_conv_expr_descriptor
+
+program p
+  implicit none
+  integer, parameter :: karray(merge(3,7,.true.):merge(3,7,.false.)) = 1
+  integer, parameter :: i = size   (karray)
+  integer, parameter :: l = lbound (karray,1)
+  integer, parameter :: u = ubound (karray,1)
+  if (l /= 3 .or. u /= 7 .or. i /= 5) stop 1
+end

[COMMITTED V2 5/7] bpf: BPF CO-RE support

2021-09-07 Thread David Faust via Gcc-patches

This commit introduces support for BPF Compile Once - Run
Everywhere (CO-RE) in GCC.

gcc/ChangeLog:

* config/bpf/bpf.c: Adjust includes.
(bpf_handle_preserve_access_index_attribute): New function.
(bpf_attribute_table): Use it here.
(bpf_builtins): Add BPF_BUILTIN_PRESERVE_ACCESS_INDEX.
(bpf_option_override): Handle "-mco-re" option.
(bpf_asm_init_sections): New.
(TARGET_ASM_INIT_SECTIONS): Redefine.
(bpf_file_end): New.
(TARGET_ASM_FILE_END): Redefine.
(bpf_init_builtins): Add "__builtin_preserve_access_index".
(bpf_core_compute, bpf_core_get_index): New.
(is_attr_preserve_access): New.
(bpf_expand_builtin): Handle new builtins.
(bpf_core_newdecl, bpf_core_is_maybe_aggregate_access): New.
(bpf_core_walk): New.
(bpf_resolve_overloaded_builtin): New.
(TARGET_RESOLVE_OVERLOADED_BUILTIN): Redefine.
(handle_attr): New.
(pass_bpf_core_attr): New RTL pass.
* config/bpf/bpf-passes.def: New file.
* config/bpf/bpf-protos.h (make_pass_bpf_core_attr): New.
* config/bpf/coreout.c: New file.
* config/bpf/coreout.h: Likewise.
* config/bpf/t-bpf (TM_H): Add $(srcdir)/config/bpf/coreout.h.
(coreout.o): New rule.
(PASSES_EXTRA): Add $(srcdir)/config/bpf/bpf-passes.def.
* config.gcc (bpf): Add coreout.h to extra_headers.
Add coreout.o to extra_objs.
Add $(srcdir)/config/bpf/coreout.c to target_gtfiles.
---
 gcc/config.gcc|   3 +
 gcc/config/bpf/bpf-passes.def |  20 ++
 gcc/config/bpf/bpf-protos.h   |   2 +
 gcc/config/bpf/bpf.c  | 591 ++
 gcc/config/bpf/coreout.c  | 356 
 gcc/config/bpf/coreout.h  | 114 +++
 gcc/config/bpf/t-bpf  |   8 +
 7 files changed, 1094 insertions(+)
 create mode 100644 gcc/config/bpf/bpf-passes.def
 create mode 100644 gcc/config/bpf/coreout.c
 create mode 100644 gcc/config/bpf/coreout.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index e553ef34bc7..e3e9d8f676f 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1525,6 +1525,9 @@ bpf-*-*)
 use_collect2=no
 extra_headers="bpf-helpers.h"
 use_gcc_stdint=provide
+extra_headers="coreout.h"
+extra_objs="coreout.o"
+target_gtfiles="$target_gtfiles \$(srcdir)/config/bpf/coreout.c"
 ;;
 cr16-*-elf)
 tm_file="elfos.h ${tm_file} newlib-stdint.h"
diff --git a/gcc/config/bpf/bpf-passes.def b/gcc/config/bpf/bpf-passes.def
new file mode 100644
index 000..3e961659411
--- /dev/null
+++ b/gcc/config/bpf/bpf-passes.def
@@ -0,0 +1,20 @@
+/* Declaration of target-specific passes for eBPF.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+INSERT_PASS_AFTER (pass_df_initialize_opt, 1, pass_bpf_core_attr);
diff --git a/gcc/config/bpf/bpf-protos.h b/gcc/config/bpf/bpf-protos.h
index aeb512665ed..7ce3386ffda 100644
--- a/gcc/config/bpf/bpf-protos.h
+++ b/gcc/config/bpf/bpf-protos.h
@@ -30,4 +30,6 @@ extern void bpf_print_operand_address (FILE *, rtx);
 extern void bpf_expand_prologue (void);
 extern void bpf_expand_epilogue (void);
 
+rtl_opt_pass * make_pass_bpf_core_attr (gcc::context *);
+
 #endif /* ! GCC_BPF_PROTOS_H */
diff --git a/gcc/config/bpf/bpf.c b/gcc/config/bpf/bpf.c
index 7228978a3a9..01d9c03479e 100644
--- a/gcc/config/bpf/bpf.c
+++ b/gcc/config/bpf/bpf.c
@@ -56,6 +56,24 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "flags.h"
 
+#include "cfg.h" /* needed for struct control_flow_graph used in BB macros */
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
+#include "tree-pass.h"
+#include "tree-iterator.h"
+
+#include "context.h"
+#include "pass_manager.h"
+
+#include "gimplify.h"
+#include "gimplify-me.h"
+
+#include "ctfc.h"
+#include "btf.h"
+
+#include "coreout.h"
+
 /* Per-function machine data.  */
 struct GTY(()) machine_function
 {
@@ -105,6 +123,27 @@ bpf_handle_fndecl_attribute (tree *node, tree name,
   return NULL_TREE;
 }
 
+/* Handle preserve_access_index attribute, which can be applied to structs,
+   unions and classes. Actually adding the attribute to the TYPE_DECL is
+   taken care of for us, so just warn

[COMMITTED V2 6/7] bpf testsuite: Add BPF CO-RE tests

2021-09-07 Thread David Faust via Gcc-patches

This commit adds several tests for the new BPF CO-RE functionality to
the BPF target testsuite.

gcc/testsuite/ChangeLog:

* gcc.target/bpf/core-attr-1.c: New test.
* gcc.target/bpf/core-attr-2.c: Likewise.
* gcc.target/bpf/core-attr-3.c: Likewise.
* gcc.target/bpf/core-attr-4.c: Likewise.
* gcc.target/bpf/core-builtin-1.c: Likewise.
* gcc.target/bpf/core-builtin-2.c: Likewise.
* gcc.target/bpf/core-builtin-3.c: Likewise.
* gcc.target/bpf/core-section-1.c: Likewise.
---
 gcc/testsuite/gcc.target/bpf/core-attr-1.c| 23 +++
 gcc/testsuite/gcc.target/bpf/core-attr-2.c| 21 ++
 gcc/testsuite/gcc.target/bpf/core-attr-3.c| 41 
 gcc/testsuite/gcc.target/bpf/core-attr-4.c| 35 ++
 gcc/testsuite/gcc.target/bpf/core-builtin-1.c | 64 +++
 gcc/testsuite/gcc.target/bpf/core-builtin-2.c | 26 
 gcc/testsuite/gcc.target/bpf/core-builtin-3.c | 26 
 gcc/testsuite/gcc.target/bpf/core-section-1.c | 38 +++
 8 files changed, 274 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-attr-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-attr-2.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-attr-3.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-attr-4.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-2.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-builtin-3.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/core-section-1.c

diff --git a/gcc/testsuite/gcc.target/bpf/core-attr-1.c b/gcc/testsuite/gcc.target/bpf/core-attr-1.c
new file mode 100644
index 000..1af9dc5ea6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/core-attr-1.c
@@ -0,0 +1,23 @@
+/* Basic test for struct __attribute__((preserve_access_index))
+   for BPF CO-RE support.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O0 -dA -gbtf -mco-re" } */
+
+struct S {
+  int a;
+  int b;
+  int c;
+} __attribute__((preserve_access_index));
+
+void
+func (struct S * s)
+{
+  /* 0:2 */
+  int *x = &(s->c);
+
+  *x = 4;
+}
+
+/* { dg-final { scan-assembler-times "ascii \"0:2.0\"\[\t \]+\[^\n\]*btf_aux_string" 1 } } */
+/* { dg-final { scan-assembler-times "bpfcr_type" 1 } } */
diff --git a/gcc/testsuite/gcc.target/bpf/core-attr-2.c b/gcc/testsuite/gcc.target/bpf/core-attr-2.c
new file mode 100644
index 000..25c819a0082
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/core-attr-2.c
@@ -0,0 +1,21 @@
+/* Basic test for union __attribute__((preserve_access_index))
+   for BPF CO-RE support.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O0 -dA -gbtf -mco-re" } */
+
+union U {
+  int a;
+  char c;
+} __attribute__((preserve_access_index));
+
+void
+func (union U *u)
+{
+  /* 0:1 */
+  char *c = &(u->c);
+  *c = 'c';
+}
+
+/* { dg-final { scan-assembler-times "ascii \"0:1.0\"\[\t \]+\[^\n\]*btf_aux_string" 1 } } */
+/* { dg-final { scan-assembler-times "bpfcr_type" 1 } } */
diff --git a/gcc/testsuite/gcc.target/bpf/core-attr-3.c b/gcc/testsuite/gcc.target/bpf/core-attr-3.c
new file mode 100644
index 000..b46549f788c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/core-attr-3.c
@@ -0,0 +1,41 @@
+/* Test for __attribute__((preserve_access_index)) for BPF CO-RE support
+   for nested structure.
+
+   Note that even though struct O lacks the attribute, when accessed as a
+   member of another attributed type, CO-RE relocations should still be
+   generated.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O0 -dA -gbtf -mco-re" } */
+
+struct O {
+  int e;
+  int f;
+};
+
+struct S {
+  int a;
+  struct {
+int b;
+int c;
+  } inner;
+  struct O other;
+} __attribute__((preserve_access_index));
+
+void
+func (struct S *foo)
+{
+  /* 0:1:1 */
+  int *x = &(foo->inner.c);
+
+  /* 0:2:0 */
+  int *y = &(foo->other.e);
+
+  *x = 4;
+  *y = 5;
+}
+
+/* { dg-final { scan-assembler-times "ascii \"0:1:1.0\"\[\t \]+\[^\n\]*btf_aux_string" 1 } } */
+/* { dg-final { scan-assembler-times "ascii \"0:2:0.0\"\[\t \]+\[^\n\]*btf_aux_string" 1 } } */
+
+/* { dg-final { scan-assembler-times "bpfcr_type" 2 } } */
diff --git a/gcc/testsuite/gcc.target/bpf/core-attr-4.c b/gcc/testsuite/gcc.target/bpf/core-attr-4.c
new file mode 100644
index 000..9c0f966b556
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/core-attr-4.c
@@ -0,0 +1,35 @@
+/* Test for BPF CO-RE __attribute__((preserve_access_index)) with accesses on
+   LHS and both LHS and RHS of assignment.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O0 -dA -gbtf -mco-re" } */
+
+struct T {
+  int a;
+  int b;
+  struct U {
+int c;
+struct V {
+  int d;
+  int e[4];
+  int f;
+} v;
+  } u;
+} __attribute__((preserve_access_index));
+
+
+void
+func (struct T *t)
+{
+  /* 0:2:1:1:3 */
+  t->u.v.e[3] = 0xa1;
+
+  /* 0:2:0, 0:0, 0:1 */
+  t->u.c = t->a + t->b;
+}
+
+/* { dg-final { scan-assembler-times "ascii

[COMMITTED V2 7/7] doc: BPF CO-RE documentation

2021-09-07 Thread David Faust via Gcc-patches

Document the new command line options (-mco-re and -mno-co-re), the new
BPF target builtin (__builtin_preserve_access_index), and the new BPF
target attribute (preserve_access_index) introduced with BPF CO-RE.
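
As a rough usage sketch of the two constructs being documented: the
attribute goes on a struct or union definition, and the builtin wraps an
individual access.  The macro names PRESERVE_ATTR and PRESERVE, the
struct names, and the use of __BPF__ as the guard are assumptions made for
this example so that it also compiles for non-BPF targets, where both
constructs collapse to plain accesses.

```c
/* Assumption for this sketch: __BPF__ is defined when compiling for the
   BPF target.  Elsewhere the constructs are replaced by no-ops.  */
#if defined (__BPF__)
# define PRESERVE_ATTR __attribute__ ((preserve_access_index))
# define PRESERVE(expr) __builtin_preserve_access_index (expr)
#else
# define PRESERVE_ATTR
# define PRESERVE(expr) (expr)
#endif

/* Every access to a member of struct S is recorded for CO-RE.  */
struct S
{
  int a;
  int b;
  int c;
} PRESERVE_ATTR;

/* struct T carries no attribute; accesses are recorded only when
   wrapped in the builtin.  */
struct T
{
  int x;
  int y;
};

int
read_c (struct S *s)
{
  return s->c;
}

int
read_y (struct T *t)
{
  return PRESERVE (t->y);
}
```

When building for BPF, the tests in this series compile such code with
-gbtf -mco-re; the program's behaviour is the same with or without the
relocation records.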

gcc/ChangeLog:

* doc/extend.texi (BPF Type Attributes): New node.
Document new preserve_access_index attribute.
Document new preserve_access_index builtin.
* doc/invoke.texi: Document -mco-re and -mno-co-re options.
---
 gcc/doc/extend.texi | 16 
 gcc/doc/invoke.texi | 13 -
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7fb22ed8063..31319f7dd08 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8256,6 +8256,7 @@ attributes.
 * Common Type Attributes::
 * ARC Type Attributes::
 * ARM Type Attributes::
+* BPF Type Attributes::
 * MeP Type Attributes::
 * PowerPC Type Attributes::
 * x86 Type Attributes::
@@ -8830,6 +8831,17 @@ virtual table for @code{C} is not exported.  (You can use
 @code{__attribute__} instead of @code{__declspec} if you prefer, but
 most Symbian OS code uses @code{__declspec}.)
 
+@node BPF Type Attributes
+@subsection BPF Type Attributes
+
+@cindex @code{preserve_access_index} type attribute, BPF
+BPF Compile Once - Run Everywhere (CO-RE) support. When attached to a
+@code{struct} or @code{union} type definition, indicates that CO-RE
+relocation information should be generated for any access to a variable
+of that type. The behavior is equivalent to the programmer manually
+wrapping every such access with @code{__builtin_preserve_access_index}.
+
+
 @node MeP Type Attributes
 @subsection MeP Type Attributes
 
@@ -15467,6 +15479,10 @@ Load 16-bits from the @code{struct sk_buff} packet data pointed by the register
 Load 32-bits from the @code{struct sk_buff} packet data pointed by the register @code{%r6} and return it.
 @end deftypefn
 
+@deftypefn {Built-in Function} void * __builtin_preserve_access_index (@var{expr})
+BPF Compile Once-Run Everywhere (CO-RE) support. Instruct GCC to generate CO-RE relocation records for any accesses to aggregate data structures (struct, union, array types) in @var{expr}. This builtin is otherwise transparent, the return value is whatever @var{expr} evaluates to. It is also overloaded: @var{expr} may be of any type (not necessarily a pointer), the return type is the same. Has no effect if @code{-mco-re} is not in effect (either specified or implied).
+@end deftypefn
+
 @node FR-V Built-in Functions
 @subsection FR-V Built-in Functions
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a9580a08a65..e39dde009ef 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -904,7 +904,7 @@ Objective-C and Objective-C++ Dialects}.
 
 @emph{eBPF Options}
 @gccoptlist{-mbig-endian -mlittle-endian -mkernel=@var{version}
--mframe-limit=@var{bytes} -mxbpf}
+-mframe-limit=@var{bytes} -mxbpf -mco-re -mno-co-re}
 
 @emph{FR30 Options}
 @gccoptlist{-msmall-model  -mno-lsim}
@@ -22635,6 +22635,17 @@ Generate code for a big-endian target.
 @opindex mlittle-endian
 Generate code for a little-endian target.  This is the default.
 
+@item -mco-re
+@opindex mco-re
+Enable BPF Compile Once - Run Everywhere (CO-RE) support. Requires and
+is implied by @option{-gbtf}.
+
+@item -mno-co-re
+@opindex mno-co-re
+Disable BPF Compile Once - Run Everywhere (CO-RE) support. BPF CO-RE
+support is enabled by default when generating BTF debug information for
+the BPF target.
+
 @item -mxbpf
 Generate code for an expanded version of BPF, which relaxes some of
 the restrictions imposed by the BPF architecture:
-- 
2.33.0

[COMMITTED V2 4/7] btf: expose get_btf_id

2021-09-07 Thread David Faust via Gcc-patches

Expose the function get_btf_id, so that it may be used by the BPF
backend. This enables the BPF CO-RE machinery in the BPF backend to
look up BTF type IDs, in order to create CO-RE relocation records.

A prototype is added in ctfc.h.

gcc/ChangeLog:

* btfout.c (get_btf_id): Function is no longer static.
* ctfc.h: Expose it here.
---
 gcc/btfout.c | 2 +-
 gcc/ctfc.h   | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/btfout.c b/gcc/btfout.c
index 8cdd9905fb6..cdc6c6378c0 100644
--- a/gcc/btfout.c
+++ b/gcc/btfout.c
@@ -156,7 +156,7 @@ init_btf_id_map (size_t len)
 /* Return the BTF type ID of CTF type ID KEY, or BTF_INVALID_TYPEID if the CTF
type with ID KEY does not map to a BTF type.  */
 
-static inline ctf_id_t
+ctf_id_t
 get_btf_id (ctf_id_t key)
 {
   return btf_id_map[key];
diff --git a/gcc/ctfc.h b/gcc/ctfc.h
index 14180c1e5de..a0b7e4105a8 100644
--- a/gcc/ctfc.h
+++ b/gcc/ctfc.h
@@ -431,6 +431,7 @@ extern int ctf_add_variable (ctf_container_ref, const char *, ctf_id_t,
 dw_die_ref, unsigned int);
 
 extern ctf_id_t ctf_lookup_tree_type (ctf_container_ref, const tree);
+extern ctf_id_t get_btf_id (ctf_id_t);
 
 /* CTF section does not emit location information; at this time, location
information is needed for BTF CO-RE use-cases.  */
-- 
2.33.0

[COMMITTED V2 2/7] ctfc: externalize ctf_dtd_lookup

2021-09-07 Thread David Faust via Gcc-patches

Expose the function ctf_dtd_lookup, so that it can be used by the BPF
CO-RE machinery. The function is no longer static, and an extern
prototype is added in ctfc.h.

gcc/ChangeLog:

* ctfc.c (ctf_dtd_lookup): Function is no longer static.
* ctfc.h: Analogous change.
---
 gcc/ctfc.c | 2 +-
 gcc/ctfc.h | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/ctfc.c b/gcc/ctfc.c
index 1a6ddb80829..db6ba030301 100644
--- a/gcc/ctfc.c
+++ b/gcc/ctfc.c
@@ -132,7 +132,7 @@ ctf_dtd_insert (ctf_container_ref ctfc, ctf_dtdef_ref dtd)
 
 /* Lookup CTF type given a DWARF die for the type.  */
 
-static ctf_dtdef_ref
+ctf_dtdef_ref
 ctf_dtd_lookup (const ctf_container_ref ctfc, const dw_die_ref type)
 {
   ctf_dtdef_t entry;
diff --git a/gcc/ctfc.h b/gcc/ctfc.h
index 39c527074b5..825570d807e 100644
--- a/gcc/ctfc.h
+++ b/gcc/ctfc.h
@@ -388,7 +388,10 @@ extern bool ctf_type_exists (ctf_container_ref, dw_die_ref, ctf_id_t *);
 
 extern void ctf_add_cuname (ctf_container_ref, const char *);
 
-extern ctf_dvdef_ref ctf_dvd_lookup (const ctf_container_ref, dw_die_ref);
+extern ctf_dtdef_ref ctf_dtd_lookup (const ctf_container_ref ctfc,
+dw_die_ref die);
+extern ctf_dvdef_ref ctf_dvd_lookup (const ctf_container_ref ctfc,
+dw_die_ref die);
 
 extern const char * ctf_add_string (ctf_container_ref, const char *,
uint32_t *, int);
-- 
2.33.0

[COMMITTED V2 3/7] ctfc: add function to lookup CTF ID of a TREE type

2021-09-07 Thread David Faust via Gcc-patches

Add a new function, ctf_lookup_tree_type, to return the CTF type ID
associated with a type via its TREE node. The function is exposed via
a prototype in ctfc.h.

gcc/ChangeLog:

* ctfc.c (ctf_lookup_tree_type): New function.
* ctfc.h: Likewise.
---
 gcc/ctfc.c | 16 
 gcc/ctfc.h |  2 ++
 2 files changed, 18 insertions(+)

diff --git a/gcc/ctfc.c b/gcc/ctfc.c
index db6ba030301..73c118e3d49 100644
--- a/gcc/ctfc.c
+++ b/gcc/ctfc.c
@@ -791,6 +791,22 @@ ctf_add_sou (ctf_container_ref ctfc, uint32_t flag, const char * name,
   return type;
 }
 
+/* Given a TREE_TYPE node, return the CTF type ID for that type.  */
+
+ctf_id_t
+ctf_lookup_tree_type (ctf_container_ref ctfc, const tree type)
+{
+  dw_die_ref die = lookup_type_die (type);
+  if (die == NULL)
+return CTF_NULL_TYPEID;
+
+  ctf_dtdef_ref dtd = ctf_dtd_lookup (ctfc, die);
+  if (dtd == NULL)
+return CTF_NULL_TYPEID;
+
+  return dtd->dtd_type;
+}
+
 /* Check if CTF for TYPE has already been generated.  Mainstay for
de-duplication.  If CTF type already exists, returns TRUE and updates
the TYPE_ID for the caller.  */
diff --git a/gcc/ctfc.h b/gcc/ctfc.h
index 825570d807e..14180c1e5de 100644
--- a/gcc/ctfc.h
+++ b/gcc/ctfc.h
@@ -430,6 +430,8 @@ extern int ctf_add_function_arg (ctf_container_ref, dw_die_ref,
 extern int ctf_add_variable (ctf_container_ref, const char *, ctf_id_t,
 dw_die_ref, unsigned int);
 
+extern ctf_id_t ctf_lookup_tree_type (ctf_container_ref, const tree);
+
 /* CTF section does not emit location information; at this time, location
information is needed for BTF CO-RE use-cases.  */
 
-- 
2.33.0

[COMMITTED V2 1/7] dwarf: externalize lookup_type_die

2021-09-07 Thread David Faust via Gcc-patches

Expose the function lookup_type_die in dwarf2out, so that it can be used
by CTF/BTF when adding BPF CO-RE information. The function is now
non-static, and an extern prototype is added in dwarf2out.h.

gcc/ChangeLog:

* dwarf2out.c (lookup_type_die): Function is no longer static.
* dwarf2out.h: Expose it here.
---
 gcc/dwarf2out.c | 3 +--
 gcc/dwarf2out.h | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 72cd1f51380..9876750e4f9 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -3740,7 +3740,6 @@ static bool remove_AT (dw_die_ref, enum dwarf_attribute);
 static void remove_child_TAG (dw_die_ref, enum dwarf_tag);
 static void add_child_die (dw_die_ref, dw_die_ref);
 static dw_die_ref new_die (enum dwarf_tag, dw_die_ref, tree);
-static dw_die_ref lookup_type_die (tree);
 static dw_die_ref strip_naming_typedef (tree, dw_die_ref);
 static dw_die_ref lookup_type_die_strip_naming_typedef (tree);
 static void equate_type_number_to_die (tree, dw_die_ref);
@@ -5838,7 +5837,7 @@ new_die (enum dwarf_tag tag_value, dw_die_ref parent_die, tree t)
 
 /* Return the DIE associated with the given type specifier.  */
 
-static inline dw_die_ref
+dw_die_ref
 lookup_type_die (tree type)
 {
   dw_die_ref die = TYPE_SYMTAB_DIE (type);
diff --git a/gcc/dwarf2out.h b/gcc/dwarf2out.h
index b2152a53bf9..312a9909784 100644
--- a/gcc/dwarf2out.h
+++ b/gcc/dwarf2out.h
@@ -417,6 +417,7 @@ extern dw_die_ref new_die_raw (enum dwarf_tag);
 extern dw_die_ref base_type_die (tree, bool);
 
 extern dw_die_ref lookup_decl_die (tree);
+extern dw_die_ref lookup_type_die (tree);
 
 extern dw_die_ref dw_get_die_child (dw_die_ref);
 extern dw_die_ref dw_get_die_sib (dw_die_ref);
-- 
2.33.0

[COMMITTED V2 0/7] BPF CO-RE Support

2021-09-07 Thread David Faust via Gcc-patches

[ Changes from V1:

  All patches have been OK'd, but the prerequisite series "Allow means for late
  BTF generation for BPF CO-RE" had not been accepted. Now that that series has
  been applied, this can be pushed with some very minor tweaks:

  - Accommodate rename of option '-mco-re' (was -mcore) in option handling,
    tests and documentation.
  - Set BTF_WITH_CORE_DEBUG write symbols flag where needed. ]

Hello,

This patch series adds support for the BPF Compile Once - Run Everywhere
(BPF CO-RE) mechanism in GCC.

A BPF program is some user code which is injected (via a verifier and loader)
into a running kernel, and executed in kernel context. To do useful work, a BPF
program generally must interact with kernel data structures in some way.
Therefore, BPF programs written in C usually include kernel headers.

This introduces two major portability issues when compiling BPF programs:

   1. Kernel data structures regularly change, with fields added, moved or
      deleted between versions. An eBPF program cannot in general be expected
      to run on any system that does not run the same kernel version as
      the system on which it was compiled.

   2. Included kernel headers (and used data structures) may be internal, not
      exposed in a userspace API, and therefore target-specific. An eBPF
      program compiled on an x86_64 machine will include x86_64 kernel headers.
      The resulting program may not run well (or at all) on machines of
      another architecture.

BPF CO-RE is designed to solve the first issue by leveraging the BPF loader to
adjust references to kernel data structures made by the program as needed,
according to the versions of those structures actually present on the host kernel.

To achieve this, additional information is placed in a ".BTF.ext" section.  This
information tells the loader which references will require adjusting, and how to
perform each necessary adjustment.

For any access to a data structure which may require load-time adjustment,
the following information is recorded (making up a CO-RE relocation record):
- The BTF type ID of the outermost structure which is accessed.
- An access string encoding the accessed member via a series of member and
  array indexes. These indexes are used to look up detailed BTF information
  about the member.
- The offset of the appropriate instruction to patch in the BPF program.
- An integer specifying what kind of relocation to perform.
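
As a rough sketch of the record described above, the four items could be held in a C structure like the one below. The structure, its field names and the helper are illustrative assumptions and are not taken from this patch series; the colon-separated form of the access string ("0:1:2") is likewise assumed here for illustration. The layout actually emitted is whatever the .BTF.ext format consumed by the loader defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory view of one CO-RE relocation record.  */
struct core_relo_sketch
{
  uint32_t insn_off;       /* Offset of the instruction to patch.  */
  uint32_t type_id;        /* BTF type ID of the outermost structure.  */
  uint32_t access_str_off; /* Where the access string lives.  */
  uint32_t kind;           /* What kind of relocation to perform.  */
};

/* Split an access string such as "0:1:2" into member/array indexes.
   Returns the number of indexes stored in OUT (at most MAX), or -1 if
   the string is malformed or has more than MAX components.  */
static int
parse_access_string (const char *s, unsigned *out, size_t max)
{
  assert (s != NULL && out != NULL);
  size_t n = 0;
  while (*s != '\0')
    {
      char *end;
      if (*s < '0' || *s > '9')
        return -1;
      unsigned long v = strtoul (s, &end, 10);
      if (n == max)
        return -1;
      out[n++] = (unsigned) v;
      if (*end == ':')
        {
          s = end + 1;
          if (*s == '\0')
            return -1; /* Trailing colon.  */
        }
      else if (*end == '\0')
        s = end;
      else
        return -1;
    }
  return (int) n;
}
```

A loader-side consumer would walk the resulting indexes through the BTF type named by type_id to find the member being accessed.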

A CO-RE-capable BPF loader reads this information together with the BTF
information of the program, compares it against BTF information of the host
kernel, and determines the appropriate way to patch the specified instruction.

Once all CO-RE relocations are resolved, the program is loaded and verified as
usual. The process can be summarized with the following diagram:

              +------------+
              | C compiler |
              +-----+------+
                    | BPF + BTF + CO-RE relocations
                    v
              +------------+
         +--->| BPF loader |
         |    +-----+------+
         |          | BPF (adapted)
     BTF |          v
         |    +------------+
         +----+   Kernel   |
              +------------+

Note that a single ELF object may contain multiple eBPF programs. As a result, a
single .BTF.ext section can contain CO-RE relocations for multiple programs in
distinct sections.

Many data structure accesses (e.g., those described in the program itself) do
not need to be patched. So, GCC only generates CO-RE information for accesses
marked as being "of interest." To be compatible with LLVM a new BPF target
builtin, __builtin_preserve_access_index, is implemented. Any accesses to
aggregate data structures (structs, unions, arrays) in the argument will have
appropriate CO-RE information generated and output. This builtin is otherwise
transparent - it does not alter the program's functionality in any way.

In addition, a new BPF target attribute preserve_access_index is added.  This
attribute may annotate struct and union type definitions. Any access to a type
with this attribute is automatically "of interest," and will have CO-RE
information generated accordingly.
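
A minimal sketch of how the builtin and the attribute might be used follows. The structure and function names are made up for illustration, and the macros fall back to no-ops on non-BPF compilers purely so that the sketch can be compiled and exercised off-target; on a BPF target the accesses below are the ones that would have CO-RE information generated.

```c
#include <stddef.h>

#if defined (__bpf__) || defined (__BPF__)
# define PRESERVE_ACCESS(expr) __builtin_preserve_access_index (expr)
# define CORE_TYPE __attribute__ ((preserve_access_index))
#else
# include <assert.h> /* Host-only: lets the sketch be exercised off-target.  */
# define PRESERVE_ACCESS(expr) (expr)
# define CORE_TYPE
#endif

/* Stand-in for a kernel structure whose layout may differ on the host
   kernel.  Accesses are only "of interest" when wrapped in the builtin.  */
struct task_sketch
{
  int pid;
  int prio;
};

/* With the attribute, every access through this type is "of interest"
   and needs no explicit builtin call.  */
struct cred_sketch
{
  unsigned uid;
  unsigned gid;
} CORE_TYPE;

/* Explicitly marked access: the builtin is transparent to the result.  */
static int
read_prio (const struct task_sketch *t)
{
  return PRESERVE_ACCESS (t->prio);
}

/* Implicitly marked access, via the type attribute.  */
static unsigned
read_gid (const struct cred_sketch *c)
{
  return c->gid;
}
```

Either way the program computes the same values as it would without the annotations; only the extra relocation information differs.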

Finally, generation of BPF CO-RE information is gated behind a new BPF option,
-mco-re (and its negative, -mno-co-re). Because CO-RE support is intimately tied
to BTF debug information, -gbtf for BPF target implies -mco-re, and -mco-re
requires BTF generation. For cases where BTF information is desired but CO-RE
is not important, it can be disabled with -mno-co-re.

David Faust (7):
  dwarf: externalize lookup_type_die
  ctfc: externalize ctf_dtd_lookup
  ctfc: add function to lookup CTF ID of a TREE type
  btf: expose get_btf_id
  bpf: BPF CO-RE support
  bpf testsuite: Add BPF CO-RE tests
  doc: BPF CO-RE documentation

 gcc/btfout.c  |   2 +-
 gcc/config.gcc|   3 +
 gcc/config/bpf/bpf-passes.def |  20 +

[Committed] Fix fatal typo in gcc.dg/no_profile_instrument_function-attr-2.c

2021-09-07 Thread Hans-Peter Nilsson via Gcc-patches

(Committed as obvious.)

Dejagnu is unfortunately brittle: a syntax error in a
directive can abort the test-run for the current "tool"
(gcc, g++, gfortran), and if you don't check for this
condition or actually read the stdout log yourself, your
tools may make you believe the test was successful without
regressions.  At the very least, always grep for ^ERROR: in
the stdout log!
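
For tooling that post-processes test logs, the same check as the grep above can be sketched in C. This is only an illustration of the idea (the function name is hypothetical): it counts the lines that begin with "ERROR:", so a wrapper can refuse to report success when the count is non-zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the lines in LOG that start with "ERROR:", i.e. the lines that
   grep '^ERROR:' would print.  */
static size_t
count_error_lines (const char *log)
{
  assert (log != NULL);
  size_t count = 0;
  const char *line = log;
  while (*line != '\0')
    {
      if (strncmp (line, "ERROR:", 6) == 0)
        count++;
      /* Advance to the start of the next line, if there is one.  */
      const char *nl = strchr (line, '\n');
      if (nl == NULL)
        break;
      line = nl + 1;
    }
  return count;
}
```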

With r12-3379, the testsuite got such a fatal syntax error,
causing the gcc test-run to abort at (e.g.):

...
FAIL: gcc.dg/memchr.c (test for excess errors)
FAIL: gcc.dg/memcmp-3.c (test for excess errors)
ERROR: (DejaGnu) proc "scan-tree-dump-not\" = foo {\(\)"} optimized" does not exist.
The error code is TCL LOOKUP COMMAND scan-tree-dump-not\"
The info on the error is:
invalid command name "scan-tree-dump-not""
while executing
"::tcl_unknown scan-tree-dump-not\" = foo {\(\)"} optimized"
("uplevel" body line 1)
invoked from within
"uplevel 1 ::tcl_unknown $args"

=== gcc Summary ===

# of expected passes            63740
# of unexpected failures        38
# of unexpected successes       2
# of expected failures          351
# of unresolved testcases       3
# of unsupported tests          662
x/cris-elf/gccobj/gcc/xgcc  version 12.0.0 20210907 (experimental)\
 [master r12-3391-g849d5f5929fc] (GCC)

testsuite:
* gcc.dg/no_profile_instrument_function-attr-2.c: Fix
typo in last change.
---
 gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-2.c b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-2.c
index 472eca88efdd..2e93ee5f6891 100644
--- a/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-2.c
+++ b/gcc/testsuite/gcc.dg/no_profile_instrument_function-attr-2.c
@@ -12,4 +12,4 @@ int bar()
   return foo();
 }
 
-/* { dg-final { scan-tree-dump-not" = foo \\(\\)" "optimized"} } */
+/* { dg-final { scan-tree-dump-not " = foo \\(\\)" "optimized"} } */
-- 
2.11.0

[PATCH] configure: Avoid unnecessary constraints on executables for $build.

2021-09-07 Thread Iain Sandoe

Hi Folks,

So, looking through the various email threads and the PR, I think that
what has happened is :

As the PR points out, our existing PCH model does not work if the compiler
executable is PIE - which manifests on platforms like Darwin (which is PIE
by default) or Linux when configured --enable-default-pie.

H.J’s original patch forces no-PIE onto the compiler executables, and
because of shared code on $host also to the driver etc.

However, the patch also forces no-PIE onto executables that run on
$build (e.g. the generators etc) which is not needed (and breaks bootstrap
for at least one case, albeit one not often tested).

Marcus observed that there was no separation in the treatment of $build
and $host, and a follow-on patch was applied that made the no-PIE change
to $build distinct.

However, IMHO, the correct change there would really be to remove the
code applying no-PIE to $build (where it is not required).

The patch below makes this change and thus fixes the bootstrap regression.

Testing - all supported languages :

x86_64, powerpc64le, powerpc64, aarch64 - linux
x86_64, powerpc, i686 - darwin
aix (nop, since ASLR is not used)
solaris, thanks to Rainer (since the machine on the cfarm doesn’t support
PIE).

All Linux platforms configured --enable-default-pie (without which one
cannot observe this anyway).

crosses and “canadians” (actually all those here are $target=$host).

$build = x86_64-linux, $host=$target = aarch64-linux
$build = powerpcle-linux $host=$target = x86_64-linux
$build= x86_64-darwin, $target = powerpc-darwin
$build = x86_64-darwin $host=$target=aarch64-darwin20

There seems to be a problem in building the native-crossed libgo, but that
is not a result of this patch.  Other than this - nothing untoward is observed
and I’ve manually checked that the $build exes are PIE and the $host ones
are not...

OK for master, and eventually backports?
Iain


[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71934
[2] https://gcc.gnu.org/pipermail/gcc-patches/2015-October/432180.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578169.html

—— commit log

The executables for GCC's c-family compilers must be built with no-PIE
because they use PCH and the current model for this requires that the
exe is always launched at the same address.  Since the other language
compilers share code with the c-family this constraint is also applied
to them.

However, the executables that run on $build (generators, and parsers
for md and def files) need not have any such constraint; they do not
consume PCH files.

This change simplifies the configuration and Makefile content by
removing the code enforcing no-PIE on these exes.  This also fixes a
bootstrap issue with some Darwin versions and clang as the bootstrap
compiler, where -no-PIE causes the correct relocation model to be
switched off leading to invalid user-space code.

Signed-off-by: Iain Sandoe 

gcc/ChangeLog:

* Makefile.in: Remove variables related to applying no-PIE
to the exes on $build.
* configure: Regenerate.
* configure.ac: Remove configuration related to applying
no-PIE to the exes on $build.
---
 gcc/Makefile.in  |  7 ---
 gcc/configure| 18 ++
 gcc/configure.ac | 10 --
 3 files changed, 2 insertions(+), 33 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f0c560fe45b..d0c5ca214c9 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -799,13 +799,8 @@ DIR = ../gcc
 # Native compiler for the build machine and its switches.
 CC_FOR_BUILD = @CC_FOR_BUILD@
 CXX_FOR_BUILD = @CXX_FOR_BUILD@
-NO_PIE_CFLAGS_FOR_BUILD = @NO_PIE_CFLAGS_FOR_BUILD@
-NO_PIE_FLAG_FOR_BUILD = @NO_PIE_FLAG_FOR_BUILD@
 BUILD_CFLAGS= @BUILD_CFLAGS@ $(GENERATOR_CFLAGS) -DGENERATOR_FILE
 BUILD_CXXFLAGS = @BUILD_CXXFLAGS@ $(GENERATOR_CFLAGS) -DGENERATOR_FILE
-BUILD_NO_PIE_CFLAGS = @BUILD_NO_PIE_CFLAGS@
-BUILD_CFLAGS += $(BUILD_NO_PIE_CFLAGS)
-BUILD_CXXFLAGS += $(BUILD_NO_PIE_CFLAGS)
 
 # Native compiler that we use.  This may be C++ some day.
 COMPILER_FOR_BUILD = $(CXX_FOR_BUILD)
@@ -817,8 +812,6 @@ BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS)
 
 # Native linker and preprocessor flags.  For x-fragment overrides.
 BUILD_LDFLAGS=@BUILD_LDFLAGS@
-BUILD_NO_PIE_FLAG = @BUILD_NO_PIE_FLAG@
-BUILD_LDFLAGS += $(BUILD_NO_PIE_FLAG)
 BUILD_CPPFLAGS= -I. -I$(@D) -I$(srcdir) -I$(srcdir)/$(@D) \
-I$(srcdir)/../include @INCINTL@ $(CPPINC) $(CPPFLAGS)
 
diff --git a/gcc/configure b/gcc/configure
index 500e3f68215..87d7b9c435b 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -753,10 +753,6 @@ FGREP
 SED
 LIBTOOL
 collect2
-NO_PIE_FLAG_FOR_BUILD
-NO_PIE_CFLAGS_FOR_BUILD
-BUILD_NO_PIE_FLAG
-BUILD_NO_PIE_CFLAGS
 STMP_FIXINC
 BUILD_LDFLAGS
 BUILD_CXXFLAGS
@@ -13336,24 +13332,14 @@ BUILD_CXXFLAGS='$(ALL_CXXFLAGS)'
 BUILD_LDFLAGS='$(LDFLAGS)'
 STMP_FIXINC=stmp-fixinc
 
-BUILD_NO_PIE_CFLAGS='$(NO_PIE_CFLAGS)'
-BUILD_NO_PIE_FLAG='$(NO_PIE_FLAG)'
-
 # And these apply if build != host, or we

Re: [PATCH v2] c++: Fix cp_tree_equal for template value args using dependent sizeof/alignof/noexcept expressions

2021-09-07 Thread Jakub Jelinek via Gcc-patches

On Tue, Sep 07, 2021 at 04:01:41PM -0400, Jason Merrill via Gcc-patches wrote:
> By the way, please avoid using non-ASCII characters in testcases unless
> that's specifically what you're testing:
> 
> > +// { dg-error "name ‘A::foo’" }
> 
> Here, change the smart quotes to . to match whatever the output locale uses
> for quotes.  And the error comments in dependent-name16.C seem to be on the
> wrong lines.

In the testsuite only 's will match, so it is safe to use those, but . is
fine too.
And the ()s probably need some backslashes, I think 3.

Jakub

Re: [PATCH v2] c++: Fix cp_tree_equal for template value args using dependent sizeof/alignof/noexcept expressions

2021-09-07 Thread Jason Merrill via Gcc-patches

By the way, please avoid using non-ASCII characters in testcases unless 
that's specifically what you're testing:



+// { dg-error "name ‘A::foo’" }


Here, change the smart quotes to . to match whatever the output locale 
uses for quotes.  And the error comments in dependent-name16.C seem to 
be on the wrong lines.


Jason

Re: [PATCH v2] c++: Fix cp_tree_equal for template value args using dependent sizeof/alignof/noexcept expressions

2021-09-07 Thread Jason Merrill via Gcc-patches


On 9/2/21 8:10 PM, Barrett Adair wrote:
Thanks for the feedback, Jason. Coming back to this today, the problem 
appears much deeper than I realized. I've attached another WIP version 
of the patch, including a couple of new test cases based on your 
feedback (for now, please excuse any misformatted dg-error comments).


The dependent-name16.C case demonstrates an error message regression 
caused by this patch; I'm trying to fix the regression. When compiling 
dependent-name16.C and breaking at cp/tree.c:3922, DECL_CONTEXT(t2) is 
NULL because the "g" declaration is still being parsed near the top of 
cp_parser_init_declarator, and context is only set later during that 
function. DECL_CONTEXT(t1), on the other hand, is set because the "f" 
declaration was already parsed.


I'm beginning to believe that a proper solution to this problem would 
require decorating the function template type parameter nodes with more 
function information (e.g. at least scope and name) prior to parsing the 
trailing return type, if not somehow setting the DECL_CONTEXT earlier in 
some form -- am I missing something?


Perhaps comparing DECL_SOURCE_LOCATION would be useful?

But actually, I suspect you're approaching this the wrong way: the 
problem in canon-type-15.C is "internal compiler error: canonical types 
differ for identical types ‘size_c’ and ‘size_c’".


The intent is that template arguments involving with distinct but 
equivalent arguments should themselves be distinct but equivalent: 
different tree nodes, but the same TYPE_CANONICAL.


That isn't happening here: they're different tree nodes and have 
different TYPE_CANONICAL, so they're considered non-equivalent, but 
structurally compare as equivalent, so we get the ICE above.  We need to 
fix TYPE_CANONICAL to match.


I think a straightforward approach would be to make 
any_template_arguments_need_structural_equality_p return true if one of 
the template arguments involves a function parameter, so that 
TYPE_CANONICAL gets set to 0 and we always do structural comparison in 
that context.


Also, in the first place, I'm a little confused why we insert 
dependent-arg instantiations into the specialization/instantiation hash 
tables before any top-level instantiation occurs. From a bird's eye 
view, the benefit/necessity of this design is unclear. Can anyone point 
me to some background reading here?


So that whenever we write e.g. A we get the same type node.

Jason

[PATCH] coroutines: Small cleanups to await_statement_walker [NFC].

2021-09-07 Thread Iain Sandoe

Hi,

This is a small code cleanup patch, but is useful in follow-on work to
fix actual bugs - by making it only one place that we need to consider
the flattening of a statement containing await expressions.

tested on x86-64, powerpc-linux, x86_64-darwin,
OK for master?
thanks
Iain

— commit message

There is no need to make a MODIFY_EXPR for any of the condition
vars that we synthesize.

Expansion of co_return can be carried out independently of any
co_awaits that might be contained which simplifies this.

Where we are rewriting statements to handle await expression
logic, there is no need to carry out any analysis - we just need
to detect the presence of any co_await.

Signed-off-by: Iain Sandoe 

gcc/cp/ChangeLog:

* coroutines.cc (await_statement_walker): Code cleanups.
---
 gcc/cp/coroutines.cc | 121 ---
 1 file changed, 56 insertions(+), 65 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index d2cc2e73c89..27556723b71 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -3412,16 +3412,11 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
   return NULL_TREE;
 }
 
-  /* We have something to be handled as a single statement.  */
-  bool has_cleanup_wrapper = TREE_CODE (*stmt) == CLEANUP_POINT_EXPR;
-  hash_set<tree> visited;
-  awpts->saw_awaits = 0;
-  hash_set<tree> truth_aoif_to_expand;
-  awpts->truth_aoif_to_expand = &truth_aoif_to_expand;
-  awpts->needs_truth_if_exp = false;
-  awpts->has_awaiter_init = false;
+  /* We have something to be handled as a single statement.  We have to handle
+ a few statements specially where await statements have to be moved out of
+ constructs.  */
   tree expr = *stmt;
-  if (has_cleanup_wrapper)
+  if (TREE_CODE (*stmt) == CLEANUP_POINT_EXPR)
 expr = TREE_OPERAND (expr, 0);
   STRIP_NOPS (expr);
 
@@ -3437,6 +3432,8 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
   transforms can be implemented.  */
case IF_STMT:
  {
+   tree *await_ptr;
+   hash_set<tree> visited;
/* Transform 'if (cond with awaits) then stmt1 else stmt2' into
   bool cond = cond with awaits.
   if (cond) then stmt1 else stmt2.  */
@@ -3444,10 +3441,8 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
/* We treat the condition as if it was a stand-alone statement,
   to see if there are any await expressions which will be analyzed
   and registered.  */
-   if ((res = cp_walk_tree (&IF_COND (if_stmt),
-   analyze_expression_awaits, d, &visited)))
- return res;
-   if (!awpts->saw_awaits)
+   if (!(cp_walk_tree (&IF_COND (if_stmt),
+ find_any_await, &await_ptr, &visited)))
  return NULL_TREE; /* Nothing special to do here.  */
 
gcc_checking_assert (!awpts->bind_stack->is_empty());
@@ -3463,7 +3458,7 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
/* We want to initialize the new variable with the expression
   that contains the await(s) and potentially also needs to
   have truth_if expressions expanded.  */
-   tree new_s = build2_loc (sloc, MODIFY_EXPR, boolean_type_node,
+   tree new_s = build2_loc (sloc, INIT_EXPR, boolean_type_node,
 newvar, cond_inner);
finish_expr_stmt (new_s);
IF_COND (if_stmt) = newvar;
@@ -3477,25 +3472,25 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
  break;
case FOR_STMT:
  {
+   tree *await_ptr;
+   hash_set<tree> visited;
/* for loops only need special treatment if the condition or the
   iteration expression contain a co_await.  */
tree for_stmt = *stmt;
/* Sanity check.  */
-   if ((res = cp_walk_tree (&FOR_INIT_STMT (for_stmt),
-   analyze_expression_awaits, d, &visited)))
- return res;
-   gcc_checking_assert (!awpts->saw_awaits);
-
-   if ((res = cp_walk_tree (&FOR_COND (for_stmt),
-   analyze_expression_awaits, d, &visited)))
- return res;
-   bool for_cond_await = awpts->saw_awaits != 0;
-   unsigned save_awaits = awpts->saw_awaits;
-
-   if ((res = cp_walk_tree (&FOR_EXPR (for_stmt),
-   analyze_expression_awaits, d, &visited)))
- return res;
-   bool for_expr_await = awpts->saw_awaits > save_awaits;
+   gcc_checking_assert
+ (!(cp_walk_tree (&FOR_INIT_STMT (for_stmt), find_any_await,
+  &await_ptr, &visited)));
+
+   visited.empty ();
+   bool for_cond_await
+ = cp_walk_tree (&FOR_COND (for_stmt), find_any_await,
+ &await_ptr, &visited);
+
+   visited.empty ();
+   bool for_expr_await
+ = cp_walk_tree (&FOR_EXPR (for_stmt), find_any_await,
+

Re: [patch][version 8]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-09-07 Thread Qing Zhao via Gcc-patches




> On Sep 7, 2021, at 11:57 AM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> Hi, Richard,
> 
> Thanks a lot for your review.
> 
>> On Sep 6, 2021, at 5:16 AM, Richard Biener  wrote:
>> 
>> On Sat, 21 Aug 2021, Qing Zhao wrote:
>> 
>>> Hi,
>>> 
>>> This is the 8th version of the patch for the new security feature for GCC.
>>> I have tested it with bootstrap on both x86 and aarch64, regression testing 
>>> on both x86 and aarch64.
>>> Also tested it with the kernel testing case provided by Kees.
>>> Also compile CPU2017 (running is ongoing), without any issue.
>>> 
>>> Please take a look at this patch and let me know any issues.
>> 
>> +  /* If this DECL is a VLA, a temporary address variable for it has been
>> + created, the replacement for DECL is recorded in DECL_VALUE_EXPR (decl),
>> + we should use it as the LHS of the call.  */
>> +
>> +  tree lhs_call
>> += is_vla ? DECL_VALUE_EXPR (decl) : decl;
>> +  gimplify_assign (lhs_call, call, seq_p);
>> 
>> you shouldn't need to replace the lhs with DECL_VALUE_EXPR of it
>> here, gimplify_assign should take care of that.
> 
> Okay, I see.
> 
> I will change the above sequence simply to the following:
> 
> -  /* If this DECL is a VLA, a temporary address variable for it has been
> - created, the replacement for DECL is recorded in DECL_VALUE_EXPR (decl),
> - we should use it as the LHS of the call.  */
> -
> -  tree lhs_call
> -= is_vla ? DECL_VALUE_EXPR (decl) : decl;
>   gimplify_assign (lhs_call, call, seq_p);
> }
> 

Changes here should be:

-  /* If this DECL is a VLA, a temporary address variable for it has been
- created, the replacement for DECL is recorded in DECL_VALUE_EXPR (decl),
- we should use it as the LHS of the call.  */
-
-  tree lhs_call
-= is_vla ? DECL_VALUE_EXPR (decl) : decl;
-  gimplify_assign (lhs_call, call, seq_p);
+  gimplify_assign (decl, call, seq_p);
 }

Qing

Re: [Patch] Fortran: Handle allocated() with coindexed scalars [PR93834] (was: [PATCH] PR fortran/93834 - [9/10/11/12 Regression] ICE in trans_caf_is_present, at fortran/trans-intrinsic.c:8469)

2021-09-07 Thread Harald Anlauf via Gcc-patches

Hi Tobias,

I think I can follow now what you are thinking, and I also had some
thoughts about what could be done in principle.

I was struggling the way I did because of:

(1) Intel rejects the code in the PR.  For my previous patch,

% ifort coarray_allocated.f90 -coarray
coarray_allocated.f90(8): error #7364: The argument to the ALLOCATED intrinsic cannot be a coindexed object.   [A]
  print *, allocated (a[1]) ! { dg-error "shall not be coindexed" }
--^
compilation aborted for coarray_allocated.f90 (code 1)


(2) F2018: 16.9.11  ALLOCATED (ARRAY) or ALLOCATED (SCALAR)

Arguments.
ARRAY   shall be an allocatable array.
SCALAR  shall be an allocatable scalar.


(3) F2018: 9.4  Scalars

9.4.3 Coindexed named objects
A coindexed-named-object is a named scalar coarray variable followed by an 
image selector.

F2018: 9.5  Arrays (or 9.4  Scalars)

F2018: 9.6 Image selectors
An image selector determines the image index for a coindexed object.


Those are the sources of information I had, and which I interpreted in
the way that even if A is an allocatable coarray (scalar or array),
when adding an image selector, like in A[N], that object would not
satisfy the requirements for the ALLOCATED intrinsic.  It also doesn't
say coarray here.

I really didn't think about the synchronization stuff here.

I also didn't read the section on the ALLOCATE statement, but in fact
there is the following (probably in line with your argument):

9.7.1.2  Execution of an ALLOCATE statement

"The coarray shall not become allocated on an image unless it is
 successfully allocated on all active images in this team."

and the following note which says:

"When an image executes an ALLOCATE statement, communication is not
 necessarily involved apart from any required for synchronization. ..."

So a dirty shortcut could be allowed if the ALLOCATED() is considered valid.

Harald

> Sent: Tuesday, 07 September 2021 at 16:33
> From: "Tobias Burnus" 
> To: "Harald Anlauf" 
> Cc: "fortran" , "gcc-patches" 
> Subject: [Patch] Fortran: Handle allocated() with coindexed scalars [PR93834] 
> (was: [PATCH] PR fortran/93834 - [9/10/11/12 Regression] ICE in 
> trans_caf_is_present, at fortran/trans-intrinsic.c:8469)
>
> Now I actually tested the patch – and fixed some issues.
> 
> OK? – It does add support for 'allocated(a[i])' by treating
> it as 'allocated(a)', as 'a' must be collectively allocated
> ("established") on all images of the team.*
> 
> 'a[i]' is (probably) an allocatable, following Malcolm in
> answer to my question to the J3-list as linked below.
> 
> Tobias
> 
> * Ignoring issues related to failed images. It could
> also be handled by fetching 'a' from the remote
> image, but I am not sure that's better in terms of
> handling failed images.
> 
> PS:
> On 07.09.21 10:02, Tobias Burnus wrote:
> > Hi Harald,
> >
> > I spent about two hours with this yesterday. Now I am still
> > tired but understand more. I think the confusion between the
> > two of us is due to wording and in which directions the
> > thoughts then go:
> >
> >
> > Talking about coindexed, all of a[i], b[i]%c and c%d[i] are
> > coindexed and there are many constraints like "shall not be
> > a coindexed variable" – which then rejects all of those.
> > That's what I was thinking of.
> >
> > I think your starting point is that while ('a' = allocatable)
> >   a, b%a, c[5]%d(1)%a
> > are ALLOCATABLE, adding a subobject reference such as
> >   a(:), b%a(:,:), c[5]%d(1)%a(:,:,:)
> > makes the variable no longer allocatable.
> > I think that's what you were thinking of.
> >
> > We then both argued along those different lines – which caused
> > the confusion as we both thought we talked about the same.
> >
> >
> > While those cases are clear, the question is whether
> >   a[i] or b%a[i]
> > is allocatable or not – assuming that 'a' is a scalar.
> > (For an array, '(:)' has to appear before the image-selector,
> > which in turn makes it nonallocatable.)
> >
> >
> > I tried to pinpoint the words for this in the standard – and
> > failed. I think I need a "how to read the Fortran standard" 101
> > and some long time actually reading it :-(
> >
> > Malcolm has answered me – and he believes (but only offhand) that
> >   a[i]  and  b%a[i]
> > _are_ allocatable. See (6) at
> > https://mailman.j3-fortran.org/pipermail/j3/2021-September/013322.html
> >
> >
> > This implies that
> >   if ( allocated (a[i]) .and. allocated (b%a[i]) ) stop 1
> > is valid.
> >
> > However, I do note that coarray allocatables have to be collectively
> > (de)allocated, therefore
> >   allocated (a[i]) .and. allocated (b%a[i])
> > is equivalent to
> >   allocated (a) .and. allocated (b%a)
> > at least assuming that no image has failed.
> >
> >
> > First: Does this answer all the questions you had and resolve the
> > confusion?
> > Secondly, do you agree about the last bits of the analysis?
> > Thirdly, what do you think of the attached patch?
> >
> > Tobias
> -
>

Re: [PATCH v2] c-family: Add __builtin_assoc_barrier

2021-09-07 Thread Jason Merrill via Gcc-patches


On 9/6/21 8:21 AM, Matthias Kretz wrote:

Hi,

On Tuesday, 20 July 2021 22:22:02 CEST Jason Merrill wrote:

The C++ front end already uses PAREN_EXPR in templates to indicate
parenthesized initializers in cases where that matters for
decltype(auto).  It should be fine to use it for both that and
__builtin_assoc_barrier, but you probably want to distinguish them with
a TREE_LANG_FLAG, and change tsubst_copy_and_build to keep the
PAREN_EXPR in this case.


I reused REF_PARENTHESIZED_P for PAREN_EXPR.


For constexpr you probably just need to add handling to
cxx_eval_constant_expression to evaluate its operand instead.


OK, that was easy.

On Monday, 19 July 2021 14:34:12 CEST Richard Biener wrote:

On Mon, 19 Jul 2021, Matthias Kretz wrote:

tested on x86_64-pc-linux-gnu with no new failures. OK for master?


I think now that PAREN_EXPR can appear in C++ code you need to
adjust some machinery to expect it (constexpr folding?  template stuff?).
I suggest to add some testcases covering templates and constexpr
functions.


Right. I expanded the test.


+@deftypefn {Built-in Function} @var{type} __builtin_assoc_barrier (@var{type} @var{expr})
+This built-in represents a re-association barrier for the floating-point
+expression @var{expr} with operations following the built-in. The expression
+@var{expr} itself can be reordered, and the whole expression @var{expr} can be
+reordered with operations after the barrier.

What operations follow the built-in also applies to operations leading
the builtin?  Maybe "This built-in represents a re-association barrier
for the floating-point expression @var{expr} with the expression
consuming its value."  But I'm not an English speaker - I guess
I'm mostly confused about "follow" here.


With "follow" I meant time / precedence and not that the operation follows
syntactically. So e.g. a + b * c: the addition follows after the
multiplication. It's probably not as precise as it could/should be. Also "the
whole expression @var{expr} can be reordered with operations after the
barrier" probably should say "with operands" not "with operations", right?


I'm not sure if there are better C/C++ language terms describing what
the builtin does, but basically it appears as opaque operand to the
surrounding expression and the surrounding expression is opaque
to the expression inside the parens.


I can't think of any other term that would help here.

Based upon your suggestion, the attached patch now says:
"This built-in inhibits re-association of the floating-point expression
@var{expr} with expressions consuming the return value of the built-in. The
expression @var{expr} itself can be reordered, and the whole expression
@var{expr} can be reordered with operands after the barrier. [...]"
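
To make the wording above concrete, here is a small sketch of the intended use. The macro and function names are made up for illustration. Where the compiler does not provide the builtin, the macro falls back to plain parentheses, which is an assumption that only holds when value-changing options such as -fassociative-math are off.

```c
#include <assert.h>

/* __has_builtin itself is missing from older compilers.  */
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

#if __has_builtin (__builtin_assoc_barrier)
# define ASSOC_BARRIER(expr) __builtin_assoc_barrier (expr)
#else
# define ASSOC_BARRIER(expr) (expr)
#endif

/* (a + b) + c: the inner sum must not be re-associated with c.  */
static double
sum_left (double a, double b, double c)
{
  return ASSOC_BARRIER (a + b) + c;
}

/* a + (b + c): the inner sum must not be re-associated with a.  */
static double
sum_right (double a, double b, double c)
{
  return a + ASSOC_BARRIER (b + c);
}

/* Values chosen so that the grouping is visible in the result: 1.0 is
   far below the rounding granularity of 1e30, so it is absorbed when it
   is added to -1e30 first.  */
static void
check_grouping (void)
{
  assert (sum_left (1e30, -1e30, 1.0) == 1.0);
  assert (sum_right (1e30, -1e30, 1.0) == 0.0);
}
```

The two functions differ only in where the barrier sits, which is exactly the distinction the documentation text is trying to describe.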

New patch attached. OK to push?

---

New builtin to enable explicit use of PAREN_EXPR in C & C++ code.

Signed-off-by: Matthias Kretz 

gcc/testsuite/ChangeLog:

 * c-c++-common/builtin-assoc-barrier-1.c: New test.

gcc/cp/ChangeLog:

 * constexpr.c (cxx_eval_constant_expression): Handle PAREN_EXPR
 via cxx_eval_constant_expression.
 * cp-objcp-common.c (names_builtin_p): Handle
 RID_BUILTIN_ASSOC_BARRIER.
 * cp-tree.h: Adjust TREE_LANG_FLAG documentation to include
 PAREN_EXPR in REF_PARENTHESIZED_P.
 (REF_PARENTHESIZED_P): Add PAREN_EXPR.
 * parser.c (cp_parser_postfix_expression): Handle
 RID_BUILTIN_ASSOC_BARRIER.
 * pt.c (tsubst_copy_and_build): If the PAREN_EXPR is not a
 parenthesized initializer, evaluate by ignoring the PAREN_EXPR.
 * semantics.c (force_paren_expr): Simplify conditionals. Set
 REF_PARENTHESIZED_P on PAREN_EXPR.
 (maybe_undo_parenthesized_ref): Test PAREN_EXPR for
 REF_PARENTHESIZED_P.

gcc/c-family/ChangeLog:

 * c-common.c (c_common_reswords): Add __builtin_assoc_barrier.
 * c-common.h (enum rid): Add RID_BUILTIN_ASSOC_BARRIER.

gcc/c/ChangeLog:

 * c-decl.c (names_builtin_p): Handle RID_BUILTIN_ASSOC_BARRIER.
 * c-parser.c (c_parser_postfix_expression): Likewise.

gcc/ChangeLog:

 * doc/extend.texi: Document __builtin_assoc_barrier.
---
  gcc/c-family/c-common.c   |  1 +
  gcc/c-family/c-common.h   |  2 +-
  gcc/c/c-decl.c|  1 +
  gcc/c/c-parser.c  | 20 
  gcc/cp/constexpr.c|  6 +++
  gcc/cp/cp-objcp-common.c  |  1 +
  gcc/cp/cp-tree.h  | 12 +++--
  gcc/cp/parser.c   | 14 ++
  gcc/cp/pt.c   |  5 +-
  gcc/cp/semantics.c| 23 +++--
  gcc/doc/extend.texi   | 18 +++
  .../c-c++-common/builtin-assoc-barrier-1.c| 48 +++
  12 files changed,

Re: [PATCH] c++: Fix up constexpr evaluation of deleting dtors [PR100495]

2021-09-07 Thread Jason Merrill via Gcc-patches


On 9/7/21 3:56 AM, Jakub Jelinek wrote:

Hi!

We do not save bodies of constexpr clones and instead evaluate the bodies
of the constexpr functions they were cloned from.
I believe that is just fine for constructors because complete vs. base
ctors differ only in classes that have virtual bases and such constructors
aren't constexpr, similarly complete/base destructors.
But as the testcase below shows, for deleting destructors it is not fine,
deleting dtors while marked as clones in fact are just artificial functions
with synthesized body which calls the user destructor and deallocation.

So, either we'd need to evaluate the destructor and afterwards synthesize
and evaluate the deallocation, or we can just save and use the deleting
dtors bodies.  The latter seems much easier to me.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/11.3?


OK.


2021-09-07  Jakub Jelinek  

PR c++/100495
* constexpr.c (maybe_save_constexpr_fundef): Save body even for
constexpr deleting dtors.
(cxx_eval_call_expression): Don't use DECL_CLONED_FUNCTION for
deleting dtors.

* g++.dg/cpp2a/constexpr-new21.C: New test.

--- gcc/cp/constexpr.c.jj   2021-09-01 11:37:41.889557426 +0200
+++ gcc/cp/constexpr.c  2021-09-06 13:05:04.098652914 +0200
@@ -865,7 +865,7 @@ maybe_save_constexpr_fundef (tree fun)
if (processing_template_decl
|| !DECL_DECLARED_CONSTEXPR_P (fun)
|| cp_function_chain->invalid_constexpr
-  || DECL_CLONED_FUNCTION_P (fun))
+  || (DECL_CLONED_FUNCTION_P (fun) && !DECL_DELETING_DESTRUCTOR_P (fun)))
  return;
  
if (!is_valid_constexpr_fn (fun, !DECL_GENERATED_P (fun)))

@@ -2372,7 +2372,7 @@ cxx_eval_call_expression (const constexp
*non_constant_p = true;
return t;
  }
-  if (DECL_CLONED_FUNCTION_P (fun))
+  if (DECL_CLONED_FUNCTION_P (fun) && !DECL_DELETING_DESTRUCTOR_P (fun))
  fun = DECL_CLONED_FUNCTION (fun);
  
if (is_ubsan_builtin_p (fun))

--- gcc/testsuite/g++.dg/cpp2a/constexpr-new21.C.jj 2021-09-06 13:09:59.326484091 +0200
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-new21.C    2021-09-06 13:08:46.574511401 +0200
@@ -0,0 +1,17 @@
+// PR c++/100495
+// { dg-do compile { target c++20 } }
+
+struct S {
+  constexpr virtual ~S () {}
+};
+
+constexpr bool
+foo ()
+{
+  S *p = new S ();
+  delete p;
+  return true;
+}
+
+constexpr bool x = foo ();
+static_assert (x);

Jakub

Re: [PATCH 7/8 v2] coroutines: Make proxy vars for the function arg copies.

2021-09-07 Thread Jason Merrill via Gcc-patches


On 9/5/21 3:50 PM, Iain Sandoe wrote:

Hello Jason,


On 3 Sep 2021, at 15:23, Iain Sandoe  wrote:

On 3 Sep 2021, at 15:07, Jason Merrill via Gcc-patches wrote:

On 9/1/21 6:56 AM, Iain Sandoe wrote:

This adds top level proxy variables for the coroutine frame



+ add_decl_expr (parm_i->copy_var);


Are these getting DECL_VALUE_EXPR somewhere?


Yes - all variables in the outermost bind expression will be allocated a frame 
slot, and gain a DECL_VALUE_EXPR to point to it.


Amended comments to say this.



+   }
+
+  /* Now replace all uses of the parms in the function body with the local
+vars.  */


I think the following old comment still applies to how 'visited' is used, and 
should be adapted here as well:


Yes, will do.

done

OK now?


OK.


thanks
Iain

[PATCH] coroutines: Make proxy vars for the function arg copies.

This adds top level proxy variables for the coroutine frame
copies of the original function args.  These are then available
in the debugger to refer to the frame copies.  We rewrite the
function body to use the copies, since the original parms will
no longer be in scope when the coroutine is running.
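
As a rough illustration of the idea (not the code GCC generates), the sketch below models the frame copies and the proxies by hand. `frame_sketch`, `ramp_sketch` and `actor_sketch` are hypothetical names; references stand in for the proxy variables whose DECL_VALUE_EXPR points at the frame slot.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for the compiler-generated coroutine frame.
struct frame_sketch {
  int n_copy;          // frame copy of parameter 'n'
  std::string s_copy;  // frame copy of parameter 's'
};

// "Ramp": copies the arguments into the frame.  The original parms
// go out of scope as soon as this function returns.
frame_sketch *ramp_sketch (int n, std::string s)
{
  return new frame_sketch{n, std::move (s)};
}

// "Actor": the body is written in terms of proxies named like the
// original parms, each of which refers to the copy in the frame.
std::size_t actor_sketch (frame_sketch *frame)
{
  assert (frame != nullptr);
  int &n = frame->n_copy;
  std::string &s = frame->s_copy;
  return s.size () + static_cast<std::size_t> (n);
}
```

A debugger stopped inside `actor_sketch` can print `n` and `s` directly, which is the effect the proxy variables are meant to give for real coroutines.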

Signed-off-by: Iain Sandoe 

gcc/cp/ChangeLog:

* coroutines.cc (struct param_info): Add copy_var.
(build_actor_fn): Use simplified param references.
(register_param_uses): Likewise.
(rewrite_param_uses): Likewise.
(analyze_fn_parms): New function.
(coro_rewrite_function_body): Add proxies for the fn
parameters to the outer bind scope of the rewritten code.
(morph_fn_to_coro): Use simplified version of param ref.
---
  gcc/cp/coroutines.cc | 251 +--
  1 file changed, 121 insertions(+), 130 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index cd774b78ae0..d2cc2e73c89 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1964,6 +1964,7 @@ transform_await_wrapper (tree *stmt, int *do_subtree, void *d)
  struct param_info
  {
tree field_id; /* The name of the copy in the coroutine frame.  */
+  tree copy_var; /* The local var proxy for the frame copy.  */
vec<tree *> *body_uses; /* Worklist of uses, void if there are none.  */
tree frame_type;   /* The type used to represent this parm in the frame.  */
tree orig_type;/* The original type of the parm (not as passed).  */
@@ -2169,36 +2170,6 @@ build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody,
/* Declare the continuation handle.  */
add_decl_expr (continuation);
  
-  /* Re-write param references in the body, no code should be generated
- here.  */
-  if (DECL_ARGUMENTS (orig))
-{
-  tree arg;
-  for (arg = DECL_ARGUMENTS (orig); arg != NULL; arg = DECL_CHAIN (arg))
-   {
- bool existed;
- param_info &parm = param_uses->get_or_insert (arg, &existed);
- if (!parm.body_uses)
-   continue; /* Wasn't used in the original function body.  */
-
- tree fld_ref = lookup_member (coro_frame_type, parm.field_id,
-   /*protect=*/1, /*want_type=*/0,
-   tf_warning_or_error);
- tree fld_idx = build3_loc (loc, COMPONENT_REF, parm.frame_type,
-actor_frame, fld_ref, NULL_TREE);
-
- /* We keep these in the frame as a regular pointer, so convert that
-  back to the type expected.  */
- if (parm.pt_ref)
-   fld_idx = build1_loc (loc, CONVERT_EXPR, TREE_TYPE (arg), fld_idx);
-
- int i;
- tree *puse;
- FOR_EACH_VEC_ELT (*parm.body_uses, i, puse)
-   *puse = fld_idx;
-   }
-}
-
/* Re-write local vars, similarly.  */
local_vars_transform xform_vars_data
  = {actor, actor_frame, coro_frame_type, loc, local_var_uses};
@@ -3771,11 +3742,11 @@ struct param_frame_data
bool param_seen;
  };
  
-/* A tree-walk callback that records the use of parameters (to allow for
-   optimizations where handling unused parameters may be omitted).  */
+/* A tree walk callback that rewrites each parm use to the local variable
+   that represents its copy in the frame.  */
  
  static tree
-register_param_uses (tree *stmt, int *do_subtree ATTRIBUTE_UNUSED, void *d)
+rewrite_param_uses (tree *stmt, int *do_subtree ATTRIBUTE_UNUSED, void *d)
  {
param_frame_data *data = (param_frame_data *) d;
  
@@ -3783,7 +3754,7 @@ register_param_uses (tree *stmt, int *do_subtree ATTRIBUTE_UNUSED, void *d)

if (TREE_CODE (*stmt) == VAR_DECL && DECL_HAS_VALUE_EXPR_P (*stmt))
  {
tree t = DECL_VALUE_EXPR (*stmt);
-  return cp_walk_tree (&t, register_param_uses, d, NULL);
+  return cp_walk_tree (&t, rewrite_param_uses, d, NULL);
  }
  
if (TREE_CODE (*stmt) != PARM_DECL)

@@ -3797,16 +3768,88 @@ register_param_uses (tree *stmt, int *do_subtree ATTRIBUTE_UNUSED, void *d)
param_info  =

Re: [PATCH 6/8] coroutines: Convert implementation variables to debug-friendly form.

2021-09-07 Thread Jason Merrill via Gcc-patches


On 9/5/21 3:47 PM, Iain Sandoe wrote:

Hello Jason,

The patch below is a squashed version of:

(approved) [PATCH 4/8] coroutines: Make some of the artificial names more debugger-friendly.
           [PATCH 5/8] coroutines: Define and populate accessors for debug state.
           [PATCH 6/8] coroutines: Convert implementation variables to debug-friendly form.
(approved) [PATCH 8/8] coroutines: Make the continue handle visible to debug.


[PATCH 6/8] coroutines: Convert implementation variables to debug-friendly form.


On 3 Sep 2021, at 15:21, Iain Sandoe  wrote:

On 3 Sep 2021, at 15:12, Jason Merrill  wrote:

On 9/3/21 9:56 AM, Iain Sandoe wrote:

On 3 Sep 2021, at 14:52, Jason Merrill  wrote:

On 9/1/21 6:55 AM, Iain Sandoe wrote:



(morph_fn_to_coro): Likewise.


Hmm, this patch doesn't seem to match the description and ChangeLog entry other 
than in the names of the functions changed.

With 20/20 hindsight I should have squashed the (several) patches related to 
the implementation symbols,
I’ll redo the description - essentially, this is just making use of the 
simplification available because we now have pre-defined values for the field 
names.


I can see how that describes a few lines in this patch, but not for instance 
the change to transform_await_expr, which seems to have nothing to do with 
names?


it’s indirect indeed - but the changes we’ve made to the variable handling mean 
that we no longer need to rewrite the
proxy vars into their frame->offset; that is handled by the DECL_VALUE_EXPR (as 
for user’s vars) - the change to transform_await_expr is removing the now defunct 
substitution (and poorly described, sorry).


now stated explicitly in the comments and in the commit log.


So that sounds like it goes with patch 1, rather than 4/5?  But it's 
fine as is.



But yes, moving the changed lines that just use the variables from the previous 
patch into that commit sounds good.  I use rebase -i for that sort of thing all 
the time.


yeah, me too -  I realised too late that this series could have had more 
squashing - if it would make things easier I could do that for the patches 
related to implementation variables .. - which would include patch 8 (but not 
patch 7 which is related to parms only)


squashing the series should make the changes clearer.

[PATCH 5/8] coroutines: Define and populate accessors for debug  state.


+static GTY(()) tree coro_frame_needs_free_field;
+static GTY(()) tree coro_resume_index_field;
+static GTY(()) tree coro_self_handle_field;


Since these are identifiers, not FIELD_DECLs, calling them *_field seems 
misleading.

I could append _id or _name .. they were just getting long already.
(they are names of fields, so that would not be misleading)


_id works for me, either with or without the _field.



Done

OK now?


OK, thanks.


thanks
Iain

==

[PATCH] coroutines: Expose implementation state to the debugger.

In the process of transforming a coroutine into the separate representation
as the ramp function and a state machine, we generate some variables that
are of interest to a user during debugging.  Any variable that is persistent
for the execution of the coroutine is placed into the coroutine frame.

In particular:
   The promise object.
   The function pointers for the resumer and destroyer.
   The current resume index (suspend point).
   The handle that represents this coroutine 'self handle'.
   Any handle provided for a continuation coroutine.
   Whether the coroutine frame is allocated and needs to be freed.

Visibility of some of these has already been requested by end users.

This patch ensures that such variables have names that are usable in a
debugger, but are in the reserved namespace for the implementation (they
all begin with _Coro_).  The identifiers are generated lazily when the
first coroutine is encountered.

We place the variables into the outermost bind expression and then add a
DECL_VALUE_EXPR to each that points to the frame entry.

These changes simplify the handling of the variables in the body of the
function (in particular, the use of the DECL_VALUE_EXPR means that we now
no longer need to rewrite proxies for the promise and coroutine handles into
the frame->offset form).

Partial improvement to debugging (PR c++/99215).

Signed-off-by: Iain Sandoe 

gcc/cp/ChangeLog:

* coroutines.cc (coro_resume_fn_id, coro_destroy_fn_id,
coro_promise_id, coro_frame_needs_free_id, coro_resume_index_id,
coro_self_handle_id, coro_actor_continue_id,
coro_frame_i_a_r_c_id): New.
(coro_init_identifiers): Initialize new name identifiers.
(coro_promise_type_found_p): Use pre-built identifiers.
(struct await_xform_data): Remove unused fields.
(transform_await_expr): Delete code that is now unused.
(build_actor_fn): Simplify interface, use pre-built identifiers and
remove transforms that are no longer needed.

Re: [patch][version 8]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-09-07 Thread Qing Zhao via Gcc-patches

Hi, Richard,

Thanks a lot for your review.

> On Sep 6, 2021, at 5:16 AM, Richard Biener  wrote:
> 
> On Sat, 21 Aug 2021, Qing Zhao wrote:
> 
>> Hi,
>> 
>> This is the 8th version of the patch for the new security feature for GCC.
>> I have tested it with bootstrap on both x86 and aarch64, regression testing 
>> on both x86 and aarch64.
>> Also tested it with the kernel testing case provided by Kees.
>> Also compile CPU2017 (running is ongoing), without any issue.
>> 
>> Please take a look at this patch and let me know any issues.
> 
> +  /* If this DECL is a VLA, a temporary address variable for it has been
> + created, the replacement for DECL is recorded in DECL_VALUE_EXPR (decl),
> + we should use it as the LHS of the call.  */
> +
> +  tree lhs_call
> += is_vla ? DECL_VALUE_EXPR (decl) : decl;
> +  gimplify_assign (lhs_call, call, seq_p);
> 
> you shouldn't need to replace the lhs with DECL_VALUE_EXPR of it
> here, gimplify_assign should take care of that.

Okay, I see.

I will change the above sequence simply to the following:

-  /* If this DECL is a VLA, a temporary address variable for it has been
- created, the replacement for DECL is recorded in DECL_VALUE_EXPR (decl),
- we should use it as the LHS of the call.  */
-
-  tree lhs_call
-= is_vla ? DECL_VALUE_EXPR (decl) : decl;
   gimplify_assign (lhs_call, call, seq_p);
 }


Let me know if this change is not right. 

> +/* Return true if the DECL need to be automaticly initialized by the
> +   compiler.  */
> +static bool
> +is_var_need_auto_init (tree decl)
> +{
> +  if (auto_var_p (decl)
> +  && (opt_for_fn (current_function_decl, flag_auto_var_init)
> 
> maybe I said otherwise at some point but you can test 'flag_auto_var_init'
> directly when not in an IPA pass, no need to use 'opt_for_fn'

Okay, I changed all the usage of “opt_for_fn” back to “flag_auto_var_init” 
since the new transformation is not in an IPA pass:

 is_var_need_auto_init (tree decl)
 {
   if (auto_var_p (decl)
-  && (opt_for_fn (current_function_decl, flag_auto_var_init)
-   > AUTO_INIT_UNINITIALIZED)
+  && (flag_auto_var_init > AUTO_INIT_UNINITIALIZED)
   && (!lookup_attribute ("uninitialized", DECL_ATTRIBUTES (decl))))
 return true;
   return false;
@@ -1941,8 +1934,7 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
   else if (is_var_need_auto_init (decl))
{
  gimple_add_init_for_auto_var (decl,
-   opt_for_fn (current_function_decl,
-   flag_auto_var_init),
+   flag_auto_var_init,
is_vla,
seq_p);
  /* The expanding of a call to the above .DEFERRED_INIT will apply
@@ -1953,8 +1945,7 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
 In order to make the paddings as zeroes for pattern init, We
 should add a call to __builtin_clear_padding to clear the
 paddings to zero in compatiple with CLANG.  */
- if (opt_for_fn (current_function_decl, flag_auto_var_init)
- == AUTO_INIT_PATTERN)
+ if (flag_auto_var_init == AUTO_INIT_PATTERN)
gimple_add_padding_init_for_auto_var (decl, is_vla, seq_p);
}
 }



> 
> +   > AUTO_INIT_UNINITIALIZED)
> +  && (!lookup_attribute ("uninitialized", DECL_ATTRIBUTES (decl))))
> +return true;
> +  return false;
> 
> 
> diff --git a/gcc/tree.c b/gcc/tree.c
> index e923e67b6942..23d7b17774ce 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -9508,6 +9508,22 @@ build_common_builtin_nodes (void)
>   tree tmp, ftype;
>   int ecf_flags;
> 
> +  /* If user requests automatic variables initialization, the builtin
> + BUILT_IN_CLEAR_PADDING is needed.  */
> +  if (flag_auto_var_init > AUTO_INIT_UNINITIALIZED
> +  && !builtin_decl_explicit_p (BUILT_IN_CLEAR_PADDING))
> 
> I think this is prone to fail with LTO and auto-var-init setting
> different in different TUs.  Just build the builtin unconditionally
> when it's not available.

So, change the above to:

[opc@qinzhao-ol8u3-x86 gcc]$ git diff tree.c

diff --git a/gcc/tree.c b/gcc/tree.c
index d014fda16cd1..43624059223b 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -9510,10 +9510,7 @@ build_common_builtin_nodes (void)
   tree tmp, ftype;
   int ecf_flags;
 
-  /* If user requests automatic variables initialization, the builtin
- BUILT_IN_CLEAR_PADDING is needed.  */
-  if (flag_auto_var_init > AUTO_INIT_UNINITIALIZED
-  && !builtin_decl_explicit_p (BUILT_IN_CLEAR_PADDING))
+  if (!builtin_decl_explicit_p (BUILT_IN_CLEAR_PADDING))
 {
   ftype = build_function_type_list (void_type_node,
ptr_type_node,


> 
> +{
> +  ftype = build_function_type_list (void_type_node,
> +   ptr_type_node,
> +

Re: [Patch] libgomp.texi: Add OpenMP Implementation Status

2021-09-07 Thread Jakub Jelinek via Gcc-patches

On Tue, Sep 07, 2021 at 06:07:35PM +0200, Tobias Burnus wrote:
> > depend case from the other clauses.  The depend parsing tries to parse it
> > as the rigid varname followed by optional . field or array section and if
> > that fails, parses it as an expression, verifies it is lvalue and just
> > uses the address of that lvalue as the depend address.
> Is this okay (Y) – or should be better improved (P + comment)? I did use
> "Y".

Let's go with Y.

> > While for map/to/from, I think what we need to do is make the OpenMP array
> > section a new tree code (perhaps C/C++ FE only), ...
> Now split off and marked as "N".
> 
> I did look through some lists – and found range-based for loops (Y) as
> item not in Appendix B. (And fixed one typo)

Bronis didn't want that bullet because there was C++11/C++14 support bullet
and he argued that range based for loops falls into that.
But IMHO range for is significant change that is worth explicit bullet.

> libgomp.texi: Extend OpenMP 5.0 Implementation Status
> 
> libgomp/
> * libgomp.texi (OpenMP Implementation Status): Extend
>   OpenMP 5.0 section.
>   (OpenACC Profiling Interface): Fix typo.

Ok, thanks.

Jakub

Sv: [PATCH 1/2 v2] jit : Generate debug info for variables

2021-09-07 Thread Petter Tomner via Gcc-patches

I realized I still managed to mess up some WS. I have attached a patch that is 
the same, except fixes the WS issue 
underneath.

Regards, Petter

+  FOR_EACH_VEC_ELT (m_globals, i, global)
+rest_of_decl_compilation (global, true, true);


0001-libgccjit-Generate-debug-info-for-variables-WS-fix.patch
Description: 0001-libgccjit-Generate-debug-info-for-variables-WS-fix.patch

Re: [Patch] libgomp.texi: Add OpenMP Implementation Status

2021-09-07 Thread Tobias Burnus


On 07.09.21 13:34, Jakub Jelinek wrote:

On Tue, Sep 07, 2021 at 01:22:43PM +0200, Tobias Burnus wrote:

On 07.09.21 12:49, Jakub Jelinek wrote:

s/taits/traits/ and I'd add /handled/& correctly/

fixed.

+@item C/C++'s lvalue expressions in @code{to}, @code{from}, @code{depend}
+  and @code{map} clause @tab Y @tab

I think this is not implemented yet, at least not in trunk.
We don't allow map(to:foo(234)[:32]) or map(to:bar()->x->y[5].z[3]) etc.

I somehow had the impression that I saw lvalues for 'depend', [...]

We do indeed support it for depend (and affinity), just don't support in
the lvalue case array sections, so perhaps split the
depend case from the other clauses.  The depend parsing tries to parse it
as the rigid varname followed by optional . field or array section and if
that fails, parses it as an expression, verifies it is lvalue and just
uses the address of that lvalue as the depend address.
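
A small sketch of the `depend` case described above, where the list item is an lvalue expression (`*p`) rather than a plain variable name. The function name `produce_then_consume` is hypothetical, and the example assumes a compiler that accepts lvalue expressions in `depend`; built without OpenMP support the pragmas are ignored and the code simply runs sequentially.

```cpp
#include <cassert>

// Two tasks ordered through a dependence on the storage that the
// lvalue expression *p designates; the address of that lvalue is
// what serves as the depend address.
int produce_then_consume (int *p)
{
  assert (p != nullptr);
  int result = 0;
#pragma omp parallel shared(result)
#pragma omp single
  {
#pragma omp task depend(out: *p)
    *p = 42;
#pragma omp task depend(in: *p) shared(result)
    result = *p + 1;
  }
  return result;
}
```

The second task reads `*p` only after the first has written it, because both name the same address in their `depend` clauses.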

Is this okay (Y) – or should be better improved (P + comment)? I did use
"Y".

While for map/to/from, I think what we need to do is make the OpenMP array
section a new tree code (perhaps C/C++ FE only), ...

Now split off and marked as "N".

I did look through some lists – and found range-based for loops (Y) as
item not in Appendix B. (And fixed one typo)

Anything else? Otherwise, maybe we should get this in – and then slowly
improve those lists...

Tobias

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
München; limited liability company (Gesellschaft mit beschränkter Haftung); 
Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; 
Registration court: München, HRB 106955

libgomp.texi: Extend OpenMP 5.0 Implementation Status

libgomp/
* libgomp.texi (OpenMP Implementation Status): Extend
	OpenMP 5.0 section.
	(OpenACC Profiling Interface): Fix typo.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 0ae9c3260ff..64085182620 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -172,8 +172,98 @@ The OpenMP 4.5 specification is fully supported.
 @node OpenMP 5.0
 @section OpenMP 5.0
 
-Partial support of the OpenMP 5.0 specification. The OMPT and the OMPD
-interfaces are unsupported.
+@unnumberedsubsec New features listed in Appendix B of the OpenMP specification
+@c This list is sorted as in OpenMP 5.1's B.3 not as in OpenMP 5.0's B.2
+
+@multitable @columnfractions .60 .10 .25
+@headitem Description @tab Status @tab Comments
+@item Array shaping @tab N @tab
+@item Array sections with non-unit strides in C and C++ @tab N @tab
+@item Iterators @tab Y @tab
+@item @code{metadirective} directive @tab N @tab
+@item @code{declare variant} directive
+  @tab P @tab Only C and C++, simd traits not handled correctly
+@item @emph{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
+  env variable @tab Y @tab
+@item Nested-parallel changes to @emph{max-active-levels-var} ICV @tab Y @tab
+@item @code{requires} directive @tab P
+  @tab Only fulfillable requirement is @code{atomic_default_mem_order}
+@item @code{teams} construct outside an enclosing target region @tab Y @tab
+@item Non-rectangular loop nests @tab Y @tab
+@item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
+@item @code{nonmonotonic} as default loop schedule modifier for worksharing-loop
+  constructs @tab Y @tab
+@item Collapse of associated loops that are imperfectly nested loops @tab N @tab
+@item Clauses @code{if}, @code{nontemporal} and @code{order(concurrent)} in
+  @code{simd} construct @tab Y @tab
+@item @code{atomic} constructs in @code{simd} @tab Y @tab
+@item @code{loop} construct @tab Y @tab
+@item @code{order(concurrent)} clause @tab Y @tab
+@item @code{scan} directive and @code{in_scan} modifier for the
+  @code{reduction} clause @tab Y @tab
+@item @code{in_reduction} clause on @code{task} constructs @tab Y @tab
+@item @code{in_reduction} clause on @code{target} constructs @tab P
+  @tab Only C/C++, @code{nowait} only stub
+@item @code{task_reduction} clause with @code{taskgroup} @tab Y @tab
+@item @code{task} modifier to @code{reduction} clause @tab Y @tab
+@item @code{affinity} clause to @code{task} construct @tab Y @tab Stub only
+@item @code{detach} clause to @code{task} construct @tab Y @tab
+@item @code{omp_fulfill_event} runtime routine @tab Y @tab
+@item @code{reduction} and @code{in_reduction} clauses on @code{taskloop}
+  and @code{taskloop simd} constructs @tab Y @tab
+@item @code{taskloop} construct cancelable by @code{cancel} construct
+  @tab Y @tab
+@item @code{mutexinoutset} @emph{dependence-type} for @code{depend} clause
+  @tab Y @tab
+@item Predefined memory spaces, memory allocators, allocator traits
+  @tab Y @tab Some are only stubs
+@item Memory management routines @tab Y @tab
+@item @code{allocate} directive @tab N @tab
+@item @code{allocate} clause @tab P @tab initial support in C/C++ only
+@item @code{use_device_addr} clause on @code{target data} @tab Y @tab

Re: [Patch] libgfortran: Makefile fix for ISO_Fortran_binding.h

2021-09-07 Thread Tobias Burnus


Now committed as r12-3384-gfc4f0631de806c89a383fd02428a16e91068b9f6

Sorry for the breakage – and thanks for the report on IRC, Richard!

Tobias

On 07.09.21 16:13, Tobias Burnus wrote:

Since the last libgfortran/Makefile.am commit,
  https://gcc.gnu.org/g:13beaf9e8d2d8264c0ad8f6504793fdcf26f3f73
the ISO_Fortran_binding.h file is no longer copied to
$(build)/.../libgfortran/include/ – which breaks in-build-tree testing.

The Makefile does contain the rule:

   ISO_Fortran_binding.h: $(srcdir)/ISO_Fortran_binding.h

but make does not regard this as an invitation to copy it from $srcdir to
$build but just prints:

make: Circular $(srcdir)/ISO_Fortran_binding.h <- $(srcdir)/ISO_Fortran_binding.h dependency dropped.

As we do not actually need the ISO_Fortran_binding.h file in
the $build directory (we just want to have it ready at
$build/include/ for the testsuite runs), the following patch
avoids an extra file in $build and also solves the dependency issue.

I intend to commit it later as obvious, unless anyone has
concerns, comments or a better suggestion.

Tobias

PS: Due to 'gfor_c_HEADERS = ISO_Fortran_binding.h', the 'make install'
file is copied from $srcdir, but that's fine and copying it from
$build/include/ is neither better nor worse.



Re: [PATCH] Abstract PHI and forwarder block checks in jump threader.

2021-09-07 Thread Jeff Law via Gcc-patches





On 9/7/2021 7:23 AM, Aldy Hernandez wrote:



On 9/7/21 2:59 PM, Richard Biener wrote:
On September 7, 2021 12:02:27 PM GMT+02:00, Aldy Hernandez wrote:



On 9/6/21 9:19 AM, Richard Biener wrote:

On Fri, Sep 3, 2021 at 3:59 PM Aldy Hernandez via Gcc-patches wrote:


This patch abstracts out a couple common idioms in the forward
threader that I found useful while navigating the code base.

Tested on x86-64 Linux.

OK?

gcc/ChangeLog:

  * tree-ssa-threadedge.c (has_phis_p): New.
  (forwarder_block_p): New.
  (potentially_threadable_block): Call forwarder_block_p.
  (jump_threader::thread_around_empty_blocks): Call has_phis_p.

  (jump_threader::thread_through_normal_block): Call
  forwarder_block_p.
---
   gcc/tree-ssa-threadedge.c | 25 +++--
   1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index e57f6d3e39c..3db54a199fd 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -95,6 +95,21 @@ jump_threader::thread_through_all_blocks (bool may_peel_loop_headers)
     return m_registry->thread_through_all_blocks (may_peel_loop_headers);
   }

+static inline bool
+has_phis_p (basic_block bb)
+{
+  return !gsi_end_p (gsi_start_phis (bb));


gimple_seq_empty_p (phi_nodes (bb)) should be cheaper.  Do virtual PHIs
count as PHIs for you?


I don't know.  The goal was to abstract some common idioms without
changing existing behavior, but if my abstractions confuse other
readers, perhaps I should revert my patch.

FWIW, my initial motivation here was to merge the path profitability
code between the forward and backward threaders.  It seems the forward
threader is more permissive than the backward threader, even though the
latter can thread more paths than it's allowed (per profitable_path_p).




+}
+
+/* Return TRUE for a forwarder block which is defined as having PHIs
+   but no instructions.  */
+
+static bool
+forwarder_block_p (basic_block bb)


There exists a function with exactly the same signature in cfgrtl.h, likewise
several similar implementations might exist elsewhere.


Ughh, that's definitely not good.



Your definition is also quite odd, not matching what one would expect
(the PHI requirement).  The tree-cfgcleanup.c variant has
tree_forwarder_block_p which is explicit about this.

Btw, gsi_start_nondebug_bb does not ignore labels.


Would a name like empty_block_with_phis_p be more appropriate?


I think so.  That said, my main concern is the clash with the same-named
function.


Agreed.

OK for trunk?

Aldy


p.patch

commit 77ac56456d5db150d6a71eaca918f19d2b478f82
Author: Aldy Hernandez 
Date:   Tue Sep 7 15:20:23 2021 +0200

    Rename forwarder_block_p in threading code to empty_block_with_phis_p.
 
 gcc/ChangeLog:
 
 * tree-ssa-threadedge.c (forwarder_block_p): Rename to...

 (empty_block_with_phis_p): ...this.
 (potentially_threadable_block): Same.
 (jump_threader::thread_through_normal_block): Same.

OK.  I nearly called out a request for a name change.  Guess I should have :-)


jeff

[Patch] Fortran: Handle allocated() with coindexed scalars [PR93834] (was: [PATCH] PR fortran/93834 - [9/10/11/12 Regression] ICE in trans_caf_is_present, at fortran/trans-intrinsic.c:8469)

2021-09-07 Thread Tobias Burnus


Now I actually tested the patch – and fixed some issues.

OK? – It does add support for 'allocated(a[i])' by treating
it as 'allocated(a)', as 'a' must be collectively allocated
("established") on all images of the team.*

'a[i]' is (probably) an allocatable, following Malcolm in
answer to my question to the J3-list as linked below.

Tobias

* Ignoring issues related to failed images. It could
also be handled by fetching 'a' from the remote
image, but I am not sure that's better in terms of
handling failed images.

PS:
On 07.09.21 10:02, Tobias Burnus wrote:

Hi Harald,

I spent about two hours on this yesterday. Now I am still
tired but understand more. I think the confusion between the
two of us is due to wording and in which directions the
thoughts then go:


Talking about coindexed, all of a[i], b[i]%c and c%d[i] are
coindexed and there are many constraints like "shall not be
a coindexed variable" – which then rejects all of those.
That's what I was thinking of.

I think your starting point is that while ('a' = allocatable)
  a, b%a, c[5]%d(1)%a
are ALLOCATABLE, adding a subobject reference such as
  a(:), b%a(:,:), c[5]%d(1)%a(:,:,:)
makes the variable no longer allocatable.
I think that's what you were thinking of.

We then both argued along those different lines – which caused
the confusion as we both thought we were talking about the same thing.


While those cases are clear, the question is whether
  a[i] or b%a[i]
is allocatable or not – assuming that 'a' is a scalar.
(For an array, '(:)' has to appear before the image-selector,
which in turn makes it nonallocatable.)


I tried to pinpoint the words for this in the standard – and
failed. I think I need a "how to read the Fortran standard" 101
and some long time actually reading it :-(

Malcolm has answered me – and he believes (but only offhand) that
  a[i]  and  b%a[i]
_are_ allocatable. See (6) at
https://mailman.j3-fortran.org/pipermail/j3/2021-September/013322.html


This implies that
  if ( allocated (a[i]) .and. allocated (b%a[i]) ) stop 1
is valid.

However, I do note that coarray allocatables have to be collectively
(de)allocated, therefore
  allocated (a[i]) .and. allocated (b%a[i])
is equivalent to
  allocated (a) .and. allocated (b%a)
at least assuming that no image has failed.


First: Does this answer all the questions you had and resolve the
confusion?
Secondly, do you agree about the last bits of the analysis?
Thirdly, what do you think of the attached patch?

Tobias

Fortran: Handle allocated() with coindexed scalars [PR93834]

2021-09-07  Harald Anlauf  
	Tobias Burnus  

While for an allocatable 'array', 'array(:)' and 'array(:)[1]' are
not allocatable, it is believed that not only 'scalar' but also
'scalar[1]' is allocatable.  However, coarrays are collectively
established/allocated; thus, 'allocated(scalar[i])' is equivalent
to 'allocated(scalar)'. [At least when assuming that 'i' does not
refer to a failed image.]

	PR fortran/93834
gcc/fortran/ChangeLog:

	* trans-intrinsic.c (gfc_conv_allocated): Cleanup. Handle
	coindexed scalar coarrays.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray_allocated.f90: New test.

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 46670baae55..c9d1aace33e 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -8887,50 +8887,63 @@ caf_this_image_ref (gfc_ref *ref)
 static void
 gfc_conv_allocated (gfc_se *se, gfc_expr *expr)
 {
-  gfc_actual_arglist *arg1;
   gfc_se arg1se;
   tree tmp;
-  symbol_attribute caf_attr;
+  bool coindexed_caf_comp = false;
+  gfc_expr *e = expr->value.function.actual->expr;
 
   gfc_init_se (&arg1se, NULL);
-  arg1 = expr->value.function.actual;
-
-  if (arg1->expr->ts.type == BT_CLASS)
+  if (e->ts.type == BT_CLASS)
 {
   /* Make sure that class array expressions have both a _data
 	 component reference and an array reference  */
-  if (CLASS_DATA (arg1->expr)->attr.dimension)
-	gfc_add_class_array_ref (arg1->expr);
+  if (CLASS_DATA (e)->attr.dimension)
+	gfc_add_class_array_ref (e);
   /*  whilst scalars only need the _data component.  */
   else
-	gfc_add_data_component (arg1->expr);
+	gfc_add_data_component (e);
 }
 
-  /* When arg1 references an allocatable component in a coarray, then call
+  /* When 'e' references an allocatable component in a coarray, then call
  the caf-library function caf_is_present ().  */
-  if (flag_coarray == GFC_FCOARRAY_LIB && arg1->expr->expr_type == EXPR_FUNCTION
-  && arg1->expr->value.function.isym
-  && arg1->expr->value.function.isym->id == GFC_ISYM_CAF_GET)
-caf_attr = gfc_caf_attr (arg1->expr->value.function.actual->expr);
-  else
-gfc_clear_attr (&caf_attr);
-  if

[Patch] libgfortran: Makefile fix for ISO_Fortran_binding.h

2021-09-07 Thread Tobias Burnus


Since the last libgfortran/Makefile.am commit,
  https://gcc.gnu.org/g:13beaf9e8d2d8264c0ad8f6504793fdcf26f3f73
the ISO_Fortran_binding.h file is no longer copied to
$(build)/.../libgfortran/include/ – which breaks in-build-tree testing.

The Makefile does contain the rule:

   ISO_Fortran_binding.h: $(srcdir)/ISO_Fortran_binding.h

but make does not regard this as an invitation to copy it from $srcdir to $build
but just prints:

make: Circular $(srcdir)/ISO_Fortran_binding.h <- $(srcdir)/ISO_Fortran_binding.h dependency dropped.

As we do not actually need the ISO_Fortran_binding.h file in
the $build directory (we just want to have it ready at
$build/include/ for the testsuite runs), the following patch
avoids an extra file in $build and also solves the dependency issue.

I intend to commit it later as obvious, unless anyone has
concerns, comments or a better suggestion.

Tobias

PS: Due to 'gfor_c_HEADERS = ISO_Fortran_binding.h', the 'make install'
file is copied from $srcdir, but that's fine and copying it from
$build/include/ is neither better nor worse.

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
libgfortran: Makefile fix for ISO_Fortran_binding.h

libgfortran/ChangeLog:

	* Makefile.am (gfor_built_src): Depend on
	include/ISO_Fortran_binding.h not on ISO_Fortran_binding.h.
	(ISO_Fortran_binding.h): Rename make target to ...
	(include/ISO_Fortran_binding.h): ... this.
	* Makefile.in: Regenerate.

diff --git a/libgfortran/Makefile.am b/libgfortran/Makefile.am
index 366198b5938..008f2e7549c 100644
--- a/libgfortran/Makefile.am
+++ b/libgfortran/Makefile.am
@@ -817,7 +817,7 @@ gfor_built_src= $(i_all_c) $(i_any_c) $(i_count_c) $(i_maxloc0_c) \
 $(i_pow_c) $(i_pack_c) $(i_unpack_c) $(i_matmulavx128_c) \
 $(i_spread_c) selected_int_kind.inc selected_real_kind.inc kinds.h \
 $(i_cshift0_c) kinds.inc c99_protos.inc fpu-target.h fpu-target.inc \
-ISO_Fortran_binding.h \
+include/ISO_Fortran_binding.h \
 $(i_cshift1a_c) $(i_maxloc0s_c) $(i_minloc0s_c) $(i_maxloc1s_c) \
 $(i_minloc1s_c) $(i_maxloc2s_c) $(i_minloc2s_c) $(i_maxvals_c) \
 $(i_maxval0s_c) $(i_minval0s_c) $(i_maxval1s_c) $(i_minval1s_c) \
@@ -1076,15 +1076,13 @@ fpu-target.inc: fpu-target.h $(srcdir)/libgfortran.h
 	grep '^#define GFC_FPE_' < $(top_srcdir)/../gcc/fortran/libgfortran.h > $@ || true
 	grep '^#define GFC_FPE_' < $(srcdir)/libgfortran.h >> $@ || true
 
-# Place ISO_Fortran_binding.h also under include/ in the build directory such
+# Place ISO_Fortran_binding.h under include/ in the build directory such
 # that it can be used for in-built-tree testsuite runs without interference of
 # other files in the build dir - like intrinsic .mod files or other .h files.
-ISO_Fortran_binding.h: $(srcdir)/ISO_Fortran_binding.h
+include/ISO_Fortran_binding.h: $(srcdir)/ISO_Fortran_binding.h
 	-rm -f $@
-	cp $(srcdir)/ISO_Fortran_binding.h $@
 	$(MKDIR_P) include
-	-rm -f include/ISO_Fortran_binding.h
-	cp $@ include/ISO_Fortran_binding.h
+	cp $(srcdir)/ISO_Fortran_binding.h $@
 
 ## A 'normal' build shouldn't need to regenerate these
 ## so we only include them in maintainer mode
diff --git a/libgfortran/Makefile.in b/libgfortran/Makefile.in
index a3cb6f4c5ca..5dac04e171e 100644
--- a/libgfortran/Makefile.in
+++ b/libgfortran/Makefile.in
@@ -1382,7 +1382,7 @@ gfor_built_src = $(i_all_c) $(i_any_c) $(i_count_c) $(i_maxloc0_c) \
 $(i_pow_c) $(i_pack_c) $(i_unpack_c) $(i_matmulavx128_c) \
 $(i_spread_c) selected_int_kind.inc selected_real_kind.inc kinds.h \
 $(i_cshift0_c) kinds.inc c99_protos.inc fpu-target.h fpu-target.inc \
-ISO_Fortran_binding.h \
+include/ISO_Fortran_binding.h \
 $(i_cshift1a_c) $(i_maxloc0s_c) $(i_minloc0s_c) $(i_maxloc1s_c) \
 $(i_minloc1s_c) $(i_maxloc2s_c) $(i_minloc2s_c) $(i_maxvals_c) \
 $(i_maxval0s_c) $(i_minval0s_c) $(i_maxval1s_c) $(i_minval1s_c) \
@@ -7042,15 +7042,13 @@ fpu-target.inc: fpu-target.h $(srcdir)/libgfortran.h
 	grep '^#define GFC_FPE_' < $(top_srcdir)/../gcc/fortran/libgfortran.h > $@ || true
 	grep '^#define GFC_FPE_' < $(srcdir)/libgfortran.h >> $@ || true
 
-# Place ISO_Fortran_binding.h also under include/ in the build directory such
+# Place ISO_Fortran_binding.h under include/ in the build directory such
 # that it can be used for in-built-tree testsuite runs without interference of
 # other files in the build dir - like intrinsic .mod files or other .h files.
-ISO_Fortran_binding.h: $(srcdir)/ISO_Fortran_binding.h
+include/ISO_Fortran_binding.h: $(srcdir)/ISO_Fortran_binding.h
 	-rm -f $@
-	cp $(srcdir)/ISO_Fortran_binding.h $@
 	$(MKDIR_P) include
-	-rm -f include/ISO_Fortran_binding.h
-	cp $@ include/ISO_Fortran_binding.h
+	cp $(srcdir)/ISO_Fortran_binding.h $@
 
 @MAINTAINER_MODE_TRUE@$(i_all_c): m4/all.m4

Re: [patch] Fix PR debug/101947

2021-09-07 Thread Eric Botcazou

> ctnode between the two loops isn't used, so I think it is cleaner to just
> use two
>   for (comdat_type_node *ctnode = comdat_type_list; ctnode != NULL;
>ctnode = ctnode->next)
> loops instead of reusing the iterator variable.

Thanks.  IMO it's more readable this way though and it's the same idiom as in 
prune_unused_types and dwarf2out_finish so I'm leaving it as-is.

-- 
Eric Botcazou

Re: [PATCH 04/13] arm: Add GENERAL_AND_VPR_REGS regclass

2021-09-07 Thread Richard Earnshaw via Gcc-patches





On 07/09/2021 13:05, Christophe LYON wrote:


On 07/09/2021 11:42, Richard Earnshaw wrote:



On 07/09/2021 10:15, Christophe Lyon via Gcc-patches wrote:

At some point during the development of this patch series, it appeared
that in some cases the register allocator wants “VPR or general”
rather than “VPR or general or FP” (which is the same thing as
ALL_REGS).  The series does not seem to require this anymore, but it
seems to be a good thing to do anyway, to give the register allocator
more freedom.

2021-09-01  Christophe Lyon  

gcc/
* config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise. Add VPR_REG to ALL_REGS.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 015299c1534..fab39d05916 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1286,6 +1286,7 @@ enum reg_class
    SFP_REG,
    AFP_REG,
    VPR_REG,
+  GENERAL_AND_VPR_REGS,
    ALL_REGS,
    LIM_REG_CLASSES
  };
@@ -1315,6 +1316,7 @@ enum reg_class
    "SFP_REG",    \
    "AFP_REG",    \
    "VPR_REG",    \
+  "GENERAL_AND_VPR_REGS", \
    "ALL_REGS"    \
  }
  @@ -1343,7 +1345,8 @@ enum reg_class
    { 0x, 0x, 0x, 0x0040 }, /* SFP_REG 
*/    \
    { 0x, 0x, 0x, 0x0080 }, /* AFP_REG 
*/    \
    { 0x, 0x, 0x, 0x0400 }, /* VPR_REG. 
*/    \
-  { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS. 
*/    \
+  { 0x5FFF, 0x, 0x, 0x0400 }, /* 
GENERAL_AND_VPR_REGS.  */ \
+  { 0x7FFF, 0x, 0x, 0x040F }  /* ALL_REGS. 
*/    \

  }


You've changed the definition of ALL_REGS here (to include VPR_REG), 
but not really explained why.  Is that the source of the underlying 
issue with the 'appeared' you mention?



I first added VPR_REG to ALL_REGS, but Richard Sandiford suggested I 
create a new GENERAL_AND_VPR_REGS that would be more restrictive. I did 
not remove VPR_REG from ALL_REGS because I thought it was an omission: 
shouldn't ALL_REGS contain all registers?


Surely that should be a separate patch then.

R.






R.



    #define FP_SYSREGS \

Re: [PATCH] Abstract PHI and forwarder block checks in jump threader.

2021-09-07 Thread Aldy Hernandez via Gcc-patches




On 9/7/21 2:59 PM, Richard Biener wrote:

On September 7, 2021 12:02:27 PM GMT+02:00, Aldy Hernandez  
wrote:



On 9/6/21 9:19 AM, Richard Biener wrote:

On Fri, Sep 3, 2021 at 3:59 PM Aldy Hernandez via Gcc-patches
 wrote:


This patch abstracts out a couple common idioms in the forward
threader that I found useful while navigating the code base.

Tested on x86-64 Linux.

OK?

gcc/ChangeLog:

  * tree-ssa-threadedge.c (has_phis_p): New.
  (forwarder_block_p): New.
  (potentially_threadable_block): Call forwarder_block_p.
  (jump_threader::thread_around_empty_blocks): Call has_phis_p.
  (jump_threader::thread_through_normal_block): Call
  forwarder_block_p.
---
   gcc/tree-ssa-threadedge.c | 25 +++--
   1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index e57f6d3e39c..3db54a199fd 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -95,6 +95,21 @@ jump_threader::thread_through_all_blocks (bool 
may_peel_loop_headers)
 return m_registry->thread_through_all_blocks (may_peel_loop_headers);
   }

+static inline bool
+has_phis_p (basic_block bb)
+{
+  return !gsi_end_p (gsi_start_phis (bb));


gimple_seq_empty_p (phi_nodes (bb)) should be cheaper.  Do virtual PHIs
count as PHIs for you?


I don't know.  The goal was to abstract some common idioms without
changing existing behavior, but if my abstractions confuse other
readers, perhaps I should revert my patch.

FWIW, my initial motivation here was to merge the path profitability
code between the forward and backward threaders.  It seems the forward
threader is more permissive than the backward threader, even though the
latter can thread more paths than it's allowed (per profitable_path_p).




+}
+
+/* Return TRUE for a forwarder block which is defined as having PHIs
+   but no instructions.  */
+
+static bool
+forwarder_block_p (basic_block bb)


There exists a function with exactly the same signature in cfgrtl.h, likewise
several similar implementations might exist elsewhere.


Ughh, that's definitely not good.



Your definition is also quite odd, not matching what one would expect
(the PHI requirement).  The tree-cfgcleanup.c variant has
tree_forwarder_block_p which is explicit about this.

Btw, gsi_start_nondebug_bb does not ignore labels.


Would a name like empty_block_with_phis_p be more appropriate?


I think so.  That said, my main concern is the clash with the same named 
function.


Agreed.

OK for trunk?

Aldy

commit 77ac56456d5db150d6a71eaca918f19d2b478f82
Author: Aldy Hernandez 
Date:   Tue Sep 7 15:20:23 2021 +0200

Rename forwarder_block_p in threading code to empty_block_with_phis_p.

gcc/ChangeLog:

* tree-ssa-threadedge.c (forwarder_block_p): Rename to...
(empty_block_with_phis_p): ...this.
(potentially_threadable_block): Same.
(jump_threader::thread_through_normal_block): Same.

diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 3db54a199fd..3c7cdc58b93 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -101,11 +101,10 @@ has_phis_p (basic_block bb)
   return !gsi_end_p (gsi_start_phis (bb));
 }
 
-/* Return TRUE for a forwarder block which is defined as having PHIs
-   but no instructions.  */
+/* Return TRUE for a block with PHIs but no statements.  */
 
 static bool
-forwarder_block_p (basic_block bb)
+empty_block_with_phis_p (basic_block bb)
 {
   return gsi_end_p (gsi_start_nondebug_bb (bb)) && has_phis_p (bb);
 }
@@ -123,7 +122,7 @@ potentially_threadable_block (basic_block bb)
  to the loop header.   We want to thread through them as we can
  sometimes thread to the loop exit, which is obviously profitable.
  The interesting case here is when the block has PHIs.  */
-  if (forwarder_block_p (bb))
+  if (empty_block_with_phis_p (bb))
 return true;
 
   /* If BB has a single successor or a single predecessor, then
@@ -1008,7 +1007,7 @@ jump_threader::thread_through_normal_block (vec *path,
 {
   /* First case.  The statement simply doesn't have any instructions, but
 	 does have PHIs.  */
-  if (forwarder_block_p (e->dest))
+  if (empty_block_with_phis_p (e->dest))
 	return 0;
 
   /* Second case.  */
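
For readers following the naming discussion, the renamed predicate can be sketched outside of GCC as below.  This is only a toy model: toy_stmt, toy_block and the toy_* functions are hypothetical stand-ins, not GCC's basic_block or gimple iterators, and labels are modelled as ordinary statements, in line with the remark that gsi_start_nondebug_bb does not ignore labels.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for GCC's statement and basic_block types.
struct toy_stmt
{
  bool is_debug;   // true for a debug statement
};

struct toy_block
{
  int num_phis;                  // number of PHI nodes in the block
  std::vector<toy_stmt> stmts;   // statements in order, labels included
};

// Counterpart of has_phis_p: does the block carry any PHI node?
static bool
toy_has_phis_p (const toy_block &bb)
{
  assert (bb.num_phis >= 0);
  return bb.num_phis > 0;
}

// Counterpart of empty_block_with_phis_p: the block has PHIs and its
// body holds nothing but debug statements.
static bool
toy_empty_block_with_phis_p (const toy_block &bb)
{
  for (const toy_stmt &s : bb.stmts)
    if (!s.is_debug)
      return false;
  return toy_has_phis_p (bb);
}
```

In this model a block without PHIs is never accepted, which is the property that made the old name forwarder_block_p surprising.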

Re: [PATCH] Abstract PHI and forwarder block checks in jump threader.

2021-09-07 Thread Richard Biener via Gcc-patches

On September 7, 2021 12:02:27 PM GMT+02:00, Aldy Hernandez  
wrote:
>
>
>On 9/6/21 9:19 AM, Richard Biener wrote:
>> On Fri, Sep 3, 2021 at 3:59 PM Aldy Hernandez via Gcc-patches
>>  wrote:
>>>
>>> This patch abstracts out a couple common idioms in the forward
>>> threader that I found useful while navigating the code base.
>>>
>>> Tested on x86-64 Linux.
>>>
>>> OK?
>>>
>>> gcc/ChangeLog:
>>>
>>>  * tree-ssa-threadedge.c (has_phis_p): New.
>>>  (forwarder_block_p): New.
>>>  (potentially_threadable_block): Call forwarder_block_p.
>>>  (jump_threader::thread_around_empty_blocks): Call has_phis_p.
>>>  (jump_threader::thread_through_normal_block): Call
>>>  forwarder_block_p.
>>> ---
>>>   gcc/tree-ssa-threadedge.c | 25 +++--
>>>   1 file changed, 19 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
>>> index e57f6d3e39c..3db54a199fd 100644
>>> --- a/gcc/tree-ssa-threadedge.c
>>> +++ b/gcc/tree-ssa-threadedge.c
>>> @@ -95,6 +95,21 @@ jump_threader::thread_through_all_blocks (bool 
>>> may_peel_loop_headers)
>>> return m_registry->thread_through_all_blocks (may_peel_loop_headers);
>>>   }
>>>
>>> +static inline bool
>>> +has_phis_p (basic_block bb)
>>> +{
>>> +  return !gsi_end_p (gsi_start_phis (bb));
>> 
>> gimple_seq_empty_p (phi_nodes (bb)) should be cheaper.  Do virtual PHIs
>> count as PHIs for you?
>
>I don't know.  The goal was to abstract some common idioms without 
>changing existing behavior, but if my abstractions confuse other 
>readers, perhaps I should revert my patch.
>
>FWIW, my initial motivation here was to merge the path profitability 
>code between the forward and backward threaders.  It seems the forward 
>threader is more permissive than the backward threader, even though the 
>latter can thread more paths than it's allowed (per profitable_path_p).
>
>> 
>>> +}
>>> +
>>> +/* Return TRUE for a forwarder block which is defined as having PHIs
>>> +   but no instructions.  */
>>> +
>>> +static bool
>>> +forwarder_block_p (basic_block bb)
>> 
>> There exists a function with exactly the same signature in cfgrtl.h, likewise
>> several similar implementations might exist elsewhere.
>
>Ughh, that's definitely not good.
>
>> 
>> Your definition is also quite odd, not matching what one would expect
>> (the PHI requirement).  The tree-cfgcleanup.c variant has
>> tree_forwarder_block_p which is explicit about this.
>> 
>> Btw, gsi_start_nondebug_bb does not ignore labels.
>
>Would a name like empty_block_with_phis_p be more appropriate?

I think so.  That said, my main concern is the clash with the same named 
function.

Richard. 

>Aldy
>

Re: [PATCH] libgcc, i386: Export hf and hc from libgcc_s.so.1

2021-09-07 Thread Jakub Jelinek via Gcc-patches

On Mon, Sep 06, 2021 at 10:58:53AM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Mon, Sep 06, 2021 at 08:49:27AM +0100, Iain Sandoe wrote:
> > > Ok.  The *.ver changes are still needed (see above), but that can be done
> > > incrementally.
> > 
> > I can commit the .ver change if that’s approved, sure - for the record I 
> > haven’t checked
> > any targets other than Darwin and Linux.
> 
> The following patch exports it for Linux from config/i386/*.ver where it
> IMNSHO belongs, aarch64 already exports some of those at GCC_11* and other
> targets might add them at completely different gcc versions.
> 
> Tested on x86_64-linux, verified the right symbols are exported, ok for trunk?

Bootstrapped/regtested on x86_64-linux and i686-linux successfully, though
actually __divhc3 and __mulhc3 aren't exported, they aren't even compiled
into libgcc_s.so.1.  Is that on purpose (large functions very unlikely being
used in most of the programs)?  If yes, I'll drop the __divhc3/__mulhc3
lines.  If not,
LIB2ADD_ST += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
should be changed to
LIB2ADD += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
(untested), or even the LIB2FUNCS_EXCLUDE dropped?

> 2021-09-06  Jakub Jelinek  
>   Iain Sandoe  
> 
>   * config/i386/libgcc-glibc.ver: Add %inherit GCC_12.0.0 GCC_7.0.0
>   and export *hf* and *hc* functions at GCC_12.0.0.
> 
> --- libgcc/config/i386/libgcc-glibc.ver.jj	2021-01-05 00:13:58.142298913 +0100
> +++ libgcc/config/i386/libgcc-glibc.ver	2021-09-06 10:47:52.244726676 +0200
> @@ -194,3 +194,23 @@ GCC_4.8.0 {
>__cpu_indicator_init
>  }
>  %endif
> +
> +%inherit GCC_12.0.0 GCC_7.0.0
> +GCC_12.0.0 {
> +  __divhc3
> +  __mulhc3
> +  __eqhf2
> +  __nehf2
> +  __extendhfdf2
> +  __extendhfsf2
> +  __extendhftf2
> +  __extendhfxf2
> +  __fixhfti
> +  __fixunshfti
> +  __floattihf
> +  __floatuntihf
> +  __truncdfhf2
> +  __truncsfhf2
> +  __trunctfhf2
> +  __truncxfhf2
> +}

Jakub

Re: [patch] Fix PR debug/101947

2021-09-07 Thread Jakub Jelinek via Gcc-patches

On Tue, Sep 07, 2021 at 02:06:29PM +0200, Eric Botcazou wrote:
> this is the recent LTO bootstrap failure with Ada enabled.  The compiler now 
> generates DW_OP_deref_type for a unit of the Ada front-end, which means that 
> the offset of base types in the CU must be computed during early DWARF too.
> 
> LTO-bootstrapped/regtested on x86-64/Linux, OK for the mainline?
> 
> 
> 2021-09-07  Eric Botcazou  
> 
>   PR debug/101947
>   * dwarf2out.c (mark_base_types): New overloaded function.
>   (dwarf2out_early_finish): Invoke it on the COMDAT type list as well
>   as the compilation unit, and call move_marked_base_types afterward.

>  /* Comparison function for sorting marked base types.  */
>  
>  static int
> @@ -32697,6 +32731,7 @@ ctf_debug_do_cu (dw_die_ref die)
>  static void
>  dwarf2out_early_finish (const char *filename)
>  {
> +  comdat_type_node *ctnode;
>set_early_dwarf s;
>char dl_section_ref[MAX_ARTIFICIAL_LABEL_BYTES];
>  
> @@ -32792,8 +32827,7 @@ dwarf2out_early_finish (const char *filename)
>note_variable_value (comp_unit_die ());
>for (limbo_die_node *node = cu_die_list; node; node = node->next)
>  note_variable_value (node->die);
> -  for (comdat_type_node *ctnode = comdat_type_list; ctnode != NULL;
> -   ctnode = ctnode->next)
> +  for (ctnode = comdat_type_list; ctnode != NULL; ctnode = ctnode->next)
>  note_variable_value (ctnode->root_die);
>for (limbo_die_node *node = limbo_die_list; node; node = node->next)
>  note_variable_value (node->die);
> @@ -32845,6 +32879,11 @@ dwarf2out_early_finish (const char *filename)
>   location related output removed and some LTO specific changes.
>   Some refactoring might make both smaller and easier to match up.  */
>  
> +  for (ctnode = comdat_type_list; ctnode != NULL; ctnode = ctnode->next)
> +mark_base_types (ctnode->root_die);

ctnode between the two loops isn't used, so I think it is cleaner to just
use two
  for (comdat_type_node *ctnode = comdat_type_list; ctnode != NULL;
   ctnode = ctnode->next)
loops instead of reusing the iterator variable.

Ok for trunk either way.

Jakub
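
The two styles under discussion can be compared on a toy list.  This is a minimal sketch, assuming a hypothetical toy_node type in place of comdat_type_node; both functions walk the list twice and differ only in where the iterator is declared.

```cpp
#include <cassert>

// Hypothetical minimal list node, standing in for comdat_type_node.
struct toy_node
{
  int value;
  toy_node *next;
};

// Style of the patch: one function-scope iterator reused by both loops.
static int
sum_twice_shared_iterator (toy_node *list)
{
  toy_node *n;
  int total = 0;
  for (n = list; n != nullptr; n = n->next)
    total += n->value;
  for (n = list; n != nullptr; n = n->next)
    total += n->value;
  return total;
}

// Style suggested in review: each loop declares its own iterator, so
// nothing is live between the two loops.
static int
sum_twice_scoped_iterators (toy_node *list)
{
  int total = 0;
  for (toy_node *n = list; n != nullptr; n = n->next)
    total += n->value;
  for (toy_node *n = list; n != nullptr; n = n->next)
    total += n->value;
  return total;
}
```

Both variants compute the same result; the choice is one of scope and readability only.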

[patch] Fix PR debug/101947

2021-09-07 Thread Eric Botcazou

Hi,

this is the recent LTO bootstrap failure with Ada enabled.  The compiler now 
generates DW_OP_deref_type for a unit of the Ada front-end, which means that 
the offset of base types in the CU must be computed during early DWARF too.

LTO-bootstrapped/regtested on x86-64/Linux, OK for the mainline?


2021-09-07  Eric Botcazou  

PR debug/101947
* dwarf2out.c (mark_base_types): New overloaded function.
(dwarf2out_early_finish): Invoke it on the COMDAT type list as well
as the compilation unit, and call move_marked_base_types afterward.

-- 
Eric Botcazou
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 4f100606618..d7664277c25 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -30295,6 +30295,40 @@ mark_base_types (dw_loc_descr_ref loc)
 }
 }
 
+/* Stripped-down variant of resolve_addr, mark DW_TAG_base_type nodes
+   referenced from typed stack ops and count how often they are used.  */
+
+static void
+mark_base_types (dw_die_ref die)
+{
+  dw_die_ref c;
+  dw_attr_node *a;
+  dw_loc_list_ref *curr;
+  unsigned ix;
+
+  FOR_EACH_VEC_SAFE_ELT (die->die_attr, ix, a)
+switch (AT_class (a))
+  {
+  case dw_val_class_loc_list:
+	curr = AT_loc_list_ptr (a);
+	while (*curr)
+	  {
+	mark_base_types ((*curr)->expr);
+	curr = &(*curr)->dw_loc_next;
+	  }
+	break;
+
+  case dw_val_class_loc:
+	mark_base_types (AT_loc (a));
+	break;
+
+  default:
+	break;
+  }
+
+  FOR_EACH_CHILD (die, c, mark_base_types (c));
+}
+
 /* Comparison function for sorting marked base types.  */
 
 static int
@@ -32697,6 +32731,7 @@ ctf_debug_do_cu (dw_die_ref die)
 static void
 dwarf2out_early_finish (const char *filename)
 {
+  comdat_type_node *ctnode;
   set_early_dwarf s;
   char dl_section_ref[MAX_ARTIFICIAL_LABEL_BYTES];
 
@@ -32792,8 +32827,7 @@ dwarf2out_early_finish (const char *filename)
   note_variable_value (comp_unit_die ());
   for (limbo_die_node *node = cu_die_list; node; node = node->next)
 note_variable_value (node->die);
-  for (comdat_type_node *ctnode = comdat_type_list; ctnode != NULL;
-   ctnode = ctnode->next)
+  for (ctnode = comdat_type_list; ctnode != NULL; ctnode = ctnode->next)
 note_variable_value (ctnode->root_die);
   for (limbo_die_node *node = limbo_die_list; node; node = node->next)
 note_variable_value (node->die);
@@ -32845,6 +32879,11 @@ dwarf2out_early_finish (const char *filename)
  location related output removed and some LTO specific changes.
  Some refactoring might make both smaller and easier to match up.  */
 
+  for (ctnode = comdat_type_list; ctnode != NULL; ctnode = ctnode->next)
+mark_base_types (ctnode->root_die);
+  mark_base_types (comp_unit_die ());
+  move_marked_base_types ();
+
   /* Traverse the DIE's and add sibling attributes to those DIE's
  that have children.  */
   add_sibling_attributes (comp_unit_die ());
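
The shape of the new mark_base_types overload (visit a DIE's location attributes, then recurse into its children, counting base-type uses) can be sketched as follows.  This is a toy model only: toy_die and its fields are hypothetical, base types are reduced to integer ids, and none of this is the dwarf2out.c data model.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for a DIE: the base-type ids referenced by its
// location expressions, plus its child DIEs.
struct toy_die
{
  std::vector<int> base_type_refs;
  std::vector<toy_die> children;
};

// Walk DIE and all of its descendants, counting how often each base
// type id is referenced.
static void
toy_mark_base_types (const toy_die &die, std::map<int, unsigned> &use_count)
{
  for (int id : die.base_type_refs)
    {
      assert (id >= 0);
      ++use_count[id];
    }
  for (const toy_die &child : die.children)
    toy_mark_base_types (child, use_count);
}
```

Calling it once per root, as the patch does for each COMDAT type unit and then for the compilation unit, accumulates the counts into one table before any sorting or moving step.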

Re: [PATCH 04/13] arm: Add GENERAL_AND_VPR_REGS regclass

2021-09-07 Thread Christophe LYON via Gcc-patches




On 07/09/2021 11:42, Richard Earnshaw wrote:



On 07/09/2021 10:15, Christophe Lyon via Gcc-patches wrote:

At some point during the development of this patch series, it appeared
that in some cases the register allocator wants “VPR or general”
rather than “VPR or general or FP” (which is the same thing as
ALL_REGS).  The series does not seem to require this anymore, but it
seems to be a good thing to do anyway, to give the register allocator
more freedom.

2021-09-01  Christophe Lyon  

gcc/
* config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise. Add VPR_REG to ALL_REGS.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 015299c1534..fab39d05916 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1286,6 +1286,7 @@ enum reg_class
    SFP_REG,
    AFP_REG,
    VPR_REG,
+  GENERAL_AND_VPR_REGS,
    ALL_REGS,
    LIM_REG_CLASSES
  };
@@ -1315,6 +1316,7 @@ enum reg_class
    "SFP_REG",    \
    "AFP_REG",    \
    "VPR_REG",    \
+  "GENERAL_AND_VPR_REGS", \
    "ALL_REGS"    \
  }
  @@ -1343,7 +1345,8 @@ enum reg_class
    { 0x, 0x, 0x, 0x0040 }, /* SFP_REG 
*/    \
    { 0x, 0x, 0x, 0x0080 }, /* AFP_REG 
*/    \
    { 0x, 0x, 0x, 0x0400 }, /* VPR_REG.  
*/    \
-  { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS.  
*/    \
+  { 0x5FFF, 0x, 0x, 0x0400 }, /* 
GENERAL_AND_VPR_REGS.  */ \
+  { 0x7FFF, 0x, 0x, 0x040F }  /* ALL_REGS.  
*/    \

  }


You've changed the definition of ALL_REGS here (to include VPR_REG), 
but not really explained why.  Is that the source of the underlying 
issue with the 'appeared' you mention?



I first added VPR_REG to ALL_REGS, but Richard Sandiford suggested I 
create a new GENERAL_AND_VPR_REGS that would be more restrictive. I did 
not remove VPR_REG from ALL_REGS because I thought it was an omission: 
shouldn't ALL_REGS contain all registers?
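
The containment question can be made concrete with a small sketch.  The masks below are made up for illustration and are not the values from arm.h; toy_regset, toy_subset_p and toy_union are hypothetical helpers, assuming a register class is a four-word bitmask as in REG_CLASS_CONTENTS.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// A register class modelled as a four-word bitmask.
using toy_regset = std::array<std::uint32_t, 4>;

// Made-up masks for illustration only; not the arm.h values.
static const toy_regset TOY_GENERAL = {{0x0000FFFFu, 0u, 0u, 0u}};
static const toy_regset TOY_FP = {{0xFFFF0000u, 0xFFFFFFFFu, 0u, 0u}};
static const toy_regset TOY_VPR = {{0u, 0u, 0u, 0x00000400u}};

// True if every register in A is also in B.
static bool
toy_subset_p (const toy_regset &a, const toy_regset &b)
{
  for (std::size_t i = 0; i < a.size (); ++i)
    if (a[i] & ~b[i])
      return false;
  return true;
}

// Union of two register classes.
static toy_regset
toy_union (const toy_regset &a, const toy_regset &b)
{
  toy_regset out;
  for (std::size_t i = 0; i < out.size (); ++i)
    out[i] = a[i] | b[i];
  assert (toy_subset_p (a, out) && toy_subset_p (b, out));
  return out;
}
```

With these toy masks, a "general or VPR" class is contained in an ALL_REGS built from all three sets, but not in one built from general and FP registers only.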





R.



    #define FP_SYSREGS \

Re: [Patch] libgomp.texi: Add OpenMP Implementation Status

2021-09-07 Thread Jakub Jelinek via Gcc-patches

On Tue, Sep 07, 2021 at 01:22:43PM +0200, Tobias Burnus wrote:
> On 07.09.21 12:49, Jakub Jelinek wrote:
> > On Tue, Sep 07, 2021 at 12:39:01PM +0200, Tobias Burnus wrote:
> > > +@item @code{declare variant} directive @tab P @tab Only C and C++
> > Even for C/C++, we don't handle the simd stuff there (where we should be
> > expecting a different function types in those cases, ones following the
> > declare simd ABIs).  So, essentially it is correct only on
> > non-x86/non-aarch64, because other arches don't have their declare simd
> > ABIs.
> I now use: ", simd taits not handled".

s/taits/traits/ and I'd add /handled/& correctly/

> > > +@item C/C++'s lvalue expressions in @code{to}, @code{from}, @code{depend}
> > > +  and @code{map} clause @tab Y @tab
> > I think this is not implemented yet, at least not in trunk.
> > We don't allow map(to:foo(234)[:32]) or map(to:bar()->x->y[5].z[3]) etc.
> 
> I somehow had the impression that I saw lvalues for 'depend', but while
> they are more complicated, I think I did not see f()[1] etc. – For
> mapping, I did see patches implementing stuff like 'z->x->y[5]' (which
> are pending review) but I think that even those did not use a function
> call on the left. Thus, a simple pilot error.

We do indeed support it for depend (and affinity); we just don't support
array sections in the lvalue case, so perhaps split the
depend case from the other clauses.  The depend parsing tries to parse it
as the rigid varname followed by optional . field or array section and if
that fails, parses it as an expression, verifies it is lvalue and just
uses the address of that lvalue as the depend address.
While for map/to/from, I think what we need to do is make the OpenMP array
section a new tree code (perhaps C/C++ FE only), which is not recognized by
default, only when certain flag is set, set that flag while parsing
map/to/from/depend/affinity operands and clear afterwards, use normal
expression parsing and then verify after parsing that it is only used in the
spots where it can be used.

> I have now added "Stub only" to both the 'close' modifier and, for
> completeness, to the 'affinity' clause. – I think both are
> relatively boring; possibly, with unified shared memory or (for
> affinity) real NUMA memory, they become a tad more useful.

True.

Jakub

Re: [Patch] libgomp.texi: Add OpenMP Implementation Status

2021-09-07 Thread Tobias Burnus


On 07.09.21 12:49, Jakub Jelinek wrote:

On Tue, Sep 07, 2021 at 12:39:01PM +0200, Tobias Burnus wrote:

+@item @code{declare variant} directive @tab P @tab Only C and C++

Even for C/C++, we don't handle the simd stuff there (where we should be
expecting a different function types in those cases, ones following the
declare simd ABIs).  So, essentially it is correct only on
non-x86/non-aarch64, because other arches don't have their declare simd
ABIs.

I now use: ", simd taits not handled".

+@item C/C++'s lvalue expressions in @code{to}, @code{from}, @code{depend}
+  and @code{map} clause @tab Y @tab

I think this is not implemented yet, at least not in trunk.
We don't allow map(to:foo(234)[:32]) or map(to:bar()->x->y[5].z[3]) etc.


I somehow had the impression that I saw lvalues for 'depend', but while
they are more complicated, I think I did not see f()[1] etc. – For
mapping, I did see patches implementing stuff like 'z->x->y[5]' (which
are pending review) but I think that even those did not use a function
call on the left. Thus, a simple pilot error.

-> 'N'


+@item @code{close} @emph{map-type-modifier} @tab Y @tab

But it is stub only, though not sure if we in short term plan something
else.  So perhaps ok.


I have now added "Stub only" to both the 'close' modifier and, for
completeness, to the 'affinity' clause. – I think both are
relatively boring; possibly, with unified shared memory or (for
affinity) real NUMA memory, they become a tad more useful.

Tobias

libgomp.texi: Extend OpenMP 5.0 Implementation Status

libgomp/
* libgomp.texi (OpenMP Implementation Status): Extend
	OpenMP 5.0 section.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 0ae9c3260ff..fedc20b4abe 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -172,8 +172,89 @@ The OpenMP 4.5 specification is fully supported.
 @node OpenMP 5.0
 @section OpenMP 5.0
 
-Partial support of the OpenMP 5.0 specification. The OMPT and the OMPD
-interfaces are unsupported.
+@unnumberedsubsec New features listed in Appendix B of the OpenMP specification
+@c This list is sorted as in OpenMP 5.1's B.3 not as in OpenMP 5.0's B.2
+
+@multitable @columnfractions .60 .10 .25
+@headitem Description @tab Status @tab Comments
+@item Array shaping @tab N @tab
+@item Array sections with non-unit strides in C and C++ @tab N @tab
+@item Iterators @tab Y @tab
+@item @code{metadirective} directive @tab N @tab
+@item @code{declare variant} directive
+  @tab P @tab Only C and C++, simd taits not handled
+@item @emph{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
+  env variable @tab Y @tab
+@item Nested-parallel changes to @emph{max-active-levels-var} ICV @tab Y @tab
+@item @code{requires} directive @tab P
+  @tab Only fulfillable requirement is @code{atomic_default_mem_order}
+@item @code{teams} construct outside an enclosing target region @tab Y @tab
+@item Non-rectangular loop nests @tab Y @tab
+@item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
+@item @code{nonmonotonic} as default loop schedule modifier for worksharing-loop
+  constructs @tab Y @tab
+@item Collapse of associated loops that are imperfectly nested loops @tab N @tab
+@item Clauses @code{if}, @code{nontemporal} and @code{order(concurrent)} in
+  @code{simd} construct @tab Y @tab
+@item @code{atomic} constructs in @code{simd} @tab Y @tab
+@item @code{loop} construct @tab Y @tab
+@item @code{order(concurrent)} clause @tab Y @tab
+@item @code{scan} directive and @code{in_scan} modifier for the
+  @code{reduction} clause @tab Y @tab
+@item @code{in_reduction} clause on @code{task} constructs @tab Y @tab
+@item @code{in_reduction} clause on @code{target} constructs @tab P
+  @tab Only C/C++, @code{nowait} only stub
+@item @code{task_reduction} clause with @code{taskgroup} @tab Y @tab
+@item @code{task} modifier to @code{reduction} clause @tab Y @tab
+@item @code{affinity} clause to @code{task} construct @tab Y @tab Stub only
+@item @code{detach} clause to @code{task} construct @tab Y @tab
+@item @code{omp_fulfill_event} runtime routine @tab Y @tab
+@item @code{reduction} and @code{in_reduction} clauses on @code{taskloop}
+  and @code{taskloop simd} constructs @tab Y @tab
+@item @code{taskloop} construct cancelable by @code{cancel} construct
+  @tab Y @tab
+@item @code{mutexinouset} @emph{dependence-type} for @code{depend} clause
+  @tab Y @tab
+@item Predefined memory spaces, memory allocators, allocator traits
+  @tab Y @tab Some are only stubs
+@item Memory management routines @tab Y @tab
+@item @code{allocate} directive @tab N @tab
+@item @code{allocate} clause @tab P @tab initial support in C/C++ only
+@item

[PATCH] tree-optimization/102226 - fix epilogue vector re-use

2021-09-07 Thread Richard Biener via Gcc-patches

This fixes re-use of the reduction value in epilogue vectorization
when a conversion from/to variable length vectors is required.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-09-07  Richard Biener  

PR tree-optimization/102226
* tree-vect-loop.c (vect_transform_cycle_phi): Record
the converted value for the epilogue PHI use.

* g++.dg/vect/pr102226.cc: New testcase.
---
 gcc/testsuite/g++.dg/vect/pr102226.cc | 29 +++
 gcc/tree-vect-loop.c  |  4 ++--
 2 files changed, 31 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr102226.cc

diff --git a/gcc/testsuite/g++.dg/vect/pr102226.cc 
b/gcc/testsuite/g++.dg/vect/pr102226.cc
new file mode 100644
index 000..ddf5e460c28
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr102226.cc
@@ -0,0 +1,29 @@
+// { dg-do compile }
+// { dg-require-effective-target c++11 }
+// { dg-additional-options "-msve-vector-bits=128" { target aarch64_sve } }
+
+template  struct b { using c = a; };
+template  class> using f = b;
+template  class g>
+using h = typename f::c;
+struct i {
+  template  using k = typename j::l;
+};
+struct m : i {
+  using l = h;
+};
+class n {
+public:
+  char operator[](long o) {
+m::l s;
+return s[o];
+  }
+} p;
+n r;
+int q() {
+  long d;
+  for (long e; e; e++)
+if (p[e] == r[e])
+  d++;
+  return d;
+}
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 0c8d992624b..c9dcc647d2c 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -7755,11 +7755,11 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
  (reduc_info),
);
}
+ if (!useless_type_conversion_p (vectype_out, TREE_TYPE (def)))
+   def = gimple_convert (, vectype_out, def);
  /* Adjust the input so we pick up the partially reduced value
 for the skip edge in vect_create_epilog_for_reduction.  */
  accumulator->reduc_input = def;
- if (!useless_type_conversion_p (vectype_out, TREE_TYPE (def)))
-   def = gimple_convert (, vectype_out, def);
  if (loop_vinfo->main_loop_edge)
{
  /* While we'd like to insert on the edge this will split
-- 
2.31.1

Re: [Patch] libgomp.texi: Add OpenMP Implementation Status

2021-09-07 Thread Jakub Jelinek via Gcc-patches

On Tue, Sep 07, 2021 at 12:39:01PM +0200, Tobias Burnus wrote:
> +@item @code{declare variant} directive @tab P @tab Only C and C++

Even for C/C++, we don't handle the simd stuff there (where we should be
expecting different function types in those cases, ones following the
declare simd ABIs).  So, essentially it is correct only on
non-x86/non-aarch64, because other arches don't have their own declare simd
ABIs.

> +@item C/C++'s lvalue expressions in @code{to}, @code{from}, @code{depend}
> +  and @code{map} clause @tab Y @tab

I think this is not implemented yet, at least not in trunk.
We don't allow map(to:foo(234)[:32]) or map(to:bar()->x->y[5].z[3]) etc.

> +@item @code{close} @emph{map-type-modifier} @tab Y @tab

But it is a stub only, though I am not sure whether we plan anything else
in the short term.  So perhaps ok.

Jakub

Re: [Patch] libgomp.texi: Add OpenMP Implementation Status

2021-09-07 Thread Tobias Burnus


Hi Jakub,

On 07.09.21 10:14, Jakub Jelinek wrote:

libgomp.texi: Add OpenMP Implementation Status

libgomp/
 * libgomp.texi (Enabling OpenMP): Refer to OMP spec in general
 not to 4.5; link to new section.
 (OpenMP Implementation Status): New.

Ok.  I'll try to provide the 5.0 implementation status soon.


Attached is a first go for the 5.0 implementation status, just based on
Appendix B.

It cannot hurt to proofread it, and extending it in a follow-up with
additional items, more details, implementation-specific behavior and
the like is useful and appreciated.

Tobias

libgomp.texi: Extend OpenMP 5.0 Implementation Status

libgomp/
* libgomp.texi (OpenMP Implementation Status): Extend
	OpenMP 5.0 section.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 0ae9c3260ff..e1bfb4839a9 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -172,8 +172,87 @@ The OpenMP 4.5 specification is fully supported.
 @node OpenMP 5.0
 @section OpenMP 5.0
 
-Partial support of the OpenMP 5.0 specification. The OMPT and the OMPD
-interfaces are unsupported.
+@unnumberedsubsec New features listed in Appendix B of the OpenMP specification
+@c This list is sorted as in OpenMP 5.1's B.3 not as in OpenMP 5.0's B.2
+
+@multitable @columnfractions .60 .10 .25
+@headitem Description @tab Status @tab Comments
+@item Array shaping @tab N @tab
+@item Array sections with non-unit strides in C and C++ @tab N @tab
+@item Iterators @tab Y @tab
+@item @code{metadirective} directive @tab N @tab
+@item @code{declare variant} directive @tab P @tab Only C and C++
+@item @emph{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
+  env variable @tab Y @tab
+@item Nested-parallel changes to @emph{max-active-levels-var} ICV @tab Y @tab
+@item @code{requires} directive @tab P
+  @tab Only fulfillable requirement is @code{atomic_default_mem_order}
+@item @code{teams} construct outside an enclosing target region @tab Y @tab
+@item Non-rectangular loop nests @tab Y @tab
+@item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
+@item @code{nonmonotonic} as default loop schedule modifier for worksharing-loop
+  constructs @tab Y @tab
+@item Collapse of associated loops that are imperfectly nested loops @tab N @tab
+@item Clauses @code{if}, @code{nontemporal} and @code{order(concurrent)} in
+  @code{simd} construct @tab Y @tab
+@item @code{atomic} constructs in @code{simd} @tab Y @tab
+@item @code{loop} construct @tab Y @tab
+@item @code{order(concurrent)} clause @tab Y @tab
+@item @code{scan} directive and @code{in_scan} modifier for the
+  @code{reduction} clause @tab Y @tab
+@item @code{in_reduction} clause on @code{task} constructs @tab Y @tab
+@item @code{in_reduction} clause on @code{target} constructs @tab P
+  @tab Only C/C++, @code{nowait} only stub
+@item @code{task_reduction} clause with @code{taskgroup} @tab Y @tab
+@item @code{task} modifier to @code{reduction} clause @tab Y @tab
+@item @code{affinity} clause to @code{task} construct @tab Y @tab
+@item @code{detach} clause to @code{task} construct @tab Y @tab
+@item @code{omp_fulfill_event} runtime routine @tab Y @tab
+@item @code{reduction} and @code{in_reduction} clauses on @code{taskloop}
+  and @code{taskloop simd} constructs @tab Y @tab
+@item @code{taskloop} construct cancelable by @code{cancel} construct
+  @tab Y @tab
+@item @code{mutexinouset} @emph{dependence-type} for @code{depend} clause
+  @tab Y @tab
+@item Predefined memory spaces, memory allocators, allocator traits
+  @tab Y @tab Some are only stubs
+@item Memory management routines @tab Y @tab
+@item @code{allocate} directive @tab N @tab
+@item @code{allocate} clause @tab P @tab initial support in C/C++ only
+@item @code{use_device_addr} clause on @code{target data} @tab Y @tab
+@item @code{ancestor} modifier on @code{device} clause @tab P @tab Reverse offload unsupported
+@item Implicit declare target directive @tab Y @tab
+@item Discontiguous array section with @code{target update} construct
+  @tab N @tab
+@item C/C++'s lvalue expressions in @code{to}, @code{from}, @code{depend}
+  and @code{map} clause @tab Y @tab
+@item Nested @code{declare target} directive @tab Y @tab
+@item Combined @code{master} constructs @tab Y @tab
+@item @code{depend} clause on @code{taskwait} @tab Y @tab
+@item Weak memory ordering clauses on @code{atomic} and @code{flush} construct
+  @tab Y @tab
+@item @code{hint} clause on the @code{atomic} construct @tab Y @tab
+@item @code{depobj} construct and depend objects  @tab Y @tab
+@item Lock hints were renamed to synchronization hints @tab Y @tab
+@item @code{conditional} modifier to @code{lastprivate} clause

Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.

2021-09-07 Thread Jakub Jelinek via Gcc-patches

On Tue, Sep 07, 2021 at 06:08:44PM +0800, Hongtao Liu wrote:
> -On x86 targets with @code{target("sse2")} and above, GCC supports
> half-precision
> -(16-bit) floating point via the @code{_Float16} type which is defined by
> -18661-3:2015. For C++, x86 provide a builtin type named @code{_Float16}
> -which contains same data format as C.
> -
> -Without @option{-mavx512fp16}, @code{_Float16} type is storage only, all
> -operations will be emulated by software emulation and the @code{float}
> -instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep
> -the intermediate result of the operation as 32-bit precision. This may lead
> -to inconsistent behavior between software emulation and AVX512-FP16
> +On x86 targets, without @option{-mavx512fp16}, @code{_Float16} type is

I'd write
  targets with SSE2 enabled, without ...

> +storage only, all operations will be emulated by software emulation and the
> +@code{float} instructions. The default behavior for @code{FLT_EVAL_METHOD} is
> +to keep the intermediate result of the operation as 32-bit precision. This may
> +lead to inconsistent behavior between software emulation and AVX512-FP16
>  instructions.

Ok for trunk with that change, thanks.

Jakub

Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.

2021-09-07 Thread Hongtao Liu via Gcc-patches

On Tue, Sep 7, 2021 at 3:18 PM Jakub Jelinek  wrote:
>
> On Tue, Sep 07, 2021 at 09:52:57AM +0800, Hongtao Liu wrote:
> > Adjust the wording for x86 _Float16 type.
> >
> > gcc/ChangeLog:
> >
> > * doc/extend.texi: (@node Floating Types): Adjust the wording.
> > (@node Half-Precision): Ditto.
> >
> > 1 file changed, 15 insertions(+), 13 deletions(-)
> > gcc/doc/extend.texi | 28 +++-
> >
> > modified   gcc/doc/extend.texi
> > @@ -1076,9 +1076,10 @@ systems where @code{__float128} is supported.  The @code{_Float32}
> >  type is supported on all systems supporting IEEE binary32; the
> >  @code{_Float64} and @code{_Float32x} types are supported on all systems
> >  supporting IEEE binary64.  The @code{_Float16} type is supported on AArch64
> > -systems by default, and on ARM systems when the IEEE format for 16-bit
> > -floating-point types is selected with @option{-mfp16-format=ieee}.
> > -GCC does not currently support @code{_Float128x} on any systems.
> > +systems by default when the IEEE format for 16-bit floating-point types is
>
> The AArch64 case now has the ARM case restriction and ARM is lost.  It
> should be
>
> +systems by default, on ARM systems when the IEEE format for 16-bit
> +floating-point-types is
>
> > +selected with @option{-mfp16-format=ieee} and, for both C and C++, on x86
> > +systems with SSE2 enabled. GCC does not currently support
> > +@code{_Float128x} on any systems.
> >
> >  On the i386, x86_64, IA-64, and HP-UX targets, you can declare complex
> >  types using the corresponding internal complex type, @code{XCmode} for
> > @@ -1108,6 +1109,12 @@ On ARM and AArch64 targets, GCC supports half-precision (16-bit) floating
> >  point via the @code{__fp16} type defined in the ARM C Language Extensions.
> >  On ARM systems, you must enable this type explicitly with the
> >  @option{-mfp16-format} command-line option in order to use it.
> > +On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit)
> > +floating point via the @code{_Float16} type, there are many ways to enable
> > +SSE2, @option{-msse2, -mavx, -mavx512f, ...} on the command line, or various
> > +target attributes.
>
> The ", there are many ways ... attributes" was just meant as explanation for
> the "with SSE2 enabled" wording, not something that should be literally in
> the documentation.  It is documented elsewhere...
>
> > +For C++, x86 provides a builtin type named @code{_Float16} which contains
> > +same data format as C.
> >
> >  ARM targets support two incompatible representations for half-precision
> >  floating-point values.  You must choose one of the representations and
> > @@ -1151,16 +1158,11 @@ calls.
> >  It is recommended that portable code use the @code{_Float16} type defined
> >  by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.
> >
> > -On x86 targets with @code{target("sse2")} and above, GCC supports half-precision
> > -(16-bit) floating point via the @code{_Float16} type which is defined by
> > -18661-3:2015. For C++, x86 provide a builtin type named @code{_Float16}
> > -which contains same data format as C.
> > -
> > -Without @option{-mavx512fp16}, @code{_Float16} type is storage only, all
> > -operations will be emulated by software emulation and the @code{float}
> > -instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep
> > -the intermediate result of the operation as 32-bit precision. This may lead
> > -to inconsistent behavior between software emulation and AVX512-FP16
> > +On x86 targets, without @option{-mavx512fp16}, @code{_Float16} type is
> > +storage only, all operations will be emulated by software emulation and the
> > +@code{float} instructions. The default behavior for @code{FLT_EVAL_METHOD} is
> > +to keep the intermediate result of the operation as 32-bit precision. This may
> > +lead to inconsistent behavior between software emulation and AVX512-FP16
> >  instructions.
> >
> >  @node Decimal Float
>
> Jakub
>
Like this?


1 file changed, 12 insertions(+), 13 deletions(-)
gcc/doc/extend.texi | 25 -

modified   gcc/doc/extend.texi
@@ -1076,9 +1076,10 @@ systems where @code{__float128} is supported.  The @code{_Float32}
 type is supported on all systems supporting IEEE binary32; the
 @code{_Float64} and @code{_Float32x} types are supported on all systems
 supporting IEEE binary64.  The @code{_Float16} type is supported on AArch64
-systems by default, and on ARM systems when the IEEE format for 16-bit
-floating-point types is selected with @option{-mfp16-format=ieee}.
-GCC does not currently support @code{_Float128x} on any systems.
+systems by default, on ARM systems when the IEEE format for 16-bit
+floating-point types is selected with @option{-mfp16-format=ieee} and,
+for both C and C++, on x86 systems with SSE2 enabled. GCC does not currently
+support @code{_Float128x} on any systems.

 On the i386, x86_64, IA-64, and HP-UX targets, you can declare complex
 types using the

Re: [PATCH] Abstract PHI and forwarder block checks in jump threader.

2021-09-07 Thread Aldy Hernandez via Gcc-patches





On 9/6/21 9:19 AM, Richard Biener wrote:

On Fri, Sep 3, 2021 at 3:59 PM Aldy Hernandez via Gcc-patches
 wrote:


This patch abstracts out a couple common idioms in the forward
threader that I found useful while navigating the code base.

Tested on x86-64 Linux.

OK?

gcc/ChangeLog:

 * tree-ssa-threadedge.c (has_phis_p): New.
 (forwarder_block_p): New.
 (potentially_threadable_block): Call forwarder_block_p.
 (jump_threader::thread_around_empty_blocks): Call has_phis_p.
 (jump_threader::thread_through_normal_block): Call
 forwarder_block_p.
---
  gcc/tree-ssa-threadedge.c | 25 +++--
  1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index e57f6d3e39c..3db54a199fd 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -95,6 +95,21 @@ jump_threader::thread_through_all_blocks (bool may_peel_loop_headers)
return m_registry->thread_through_all_blocks (may_peel_loop_headers);
  }

+static inline bool
+has_phis_p (basic_block bb)
+{
+  return !gsi_end_p (gsi_start_phis (bb));


gimple_seq_empty_p (phi_nodes (bb)) should be cheaper.  Do virtual PHIs
count as PHIs for you?


I don't know.  The goal was to abstract some common idioms without 
changing existing behavior, but if my abstractions confuse other 
readers, perhaps I should revert my patch.


FWIW, my initial motivation here was to merge the path profitability 
code between the forward and backward threaders.  It seems the forward 
threader is more permissive than the backward threader, even though the 
latter can thread more paths than it's allowed (per profitable_path_p).





+}
+
+/* Return TRUE for a forwarder block which is defined as having PHIs
+   but no instructions.  */
+
+static bool
+forwarder_block_p (basic_block bb)


There exists a function with exactly the same signature in cfgrtl.h, likewise
several similar implementations might exist elsewhere.


Ughh, that's definitely not good.



Your definition is also quite odd, not matching what one would expect
(the PHI requirement).  The tree-cfgcleanup.c variant has
tree_forwarder_block_p which is explicit about this.

Btw, gsi_start_nondebug_bb does not ignore labels.


Would a name like empty_block_with_phis_p be more appropriate?
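
To make the naming question concrete, here is a toy model of the two predicates under discussion. It is only a sketch: `struct toy_bb` and its counters are hypothetical stand-ins for a real `basic_block`, and the points raised above about virtual PHIs, labels, and debug statements are deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy basic block: just counts, no real GIMPLE.  Hypothetical type.  */
struct toy_bb
{
  int n_phis;   /* number of PHI nodes in the block */
  int n_stmts;  /* number of ordinary (non-PHI) statements */
};

/* True if the block has at least one PHI node.  */
bool
has_phis_p (const struct toy_bb *bb)
{
  return bb->n_phis > 0;
}

/* True for a block that has PHIs but no instructions, i.e. what the
   patch calls forwarder_block_p.  The name states the PHI requirement
   explicitly.  */
bool
empty_block_with_phis_p (const struct toy_bb *bb)
{
  return has_phis_p (bb) && bb->n_stmts == 0;
}
```

With this name, a block with no PHIs and no statements is visibly not matched, which is the part of the definition that was found surprising under the `forwarder_block_p` name.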

Aldy

[PATCH] Come up with section_flag enum.

2021-09-07 Thread Martin Liška


Hi.

I'm planning some refactoring related to 'section *' and I noticed we have
quite ugly mask definitions (of form 1UL << N), where SECTION_FORGET is unused
and

#define SECTION_STYLE_MASK 0x60 /* bits used for SECTION_STYLE */

is actually the OR of 2 other values. What about making that a standard enum
with 1UL << N values?
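
A standalone sketch of the idea, using a subset of the flags: the enumerator names and shift counts follow the patch below, but the helper functions are hypothetical and this is not the contents of gcc/output.h. The style mask is written as the OR of the two style values rather than as a separate magic constant.

```c
#include <assert.h>

/* Illustrative subset of the section flags.  */
enum section_flag
{
  SECTION_UNNAMED = 0,               /* style: unnamed section */
  SECTION_ENTSIZE = (1UL << 8) - 1,  /* low 8 bits: entity size */
  SECTION_CODE = 1UL << 8,           /* contains code */
  SECTION_WRITE = 1UL << 9,          /* data is writable */
  SECTION_NAMED = 1UL << 20,         /* style: named section */
  SECTION_NOSWITCH = 1UL << 21,      /* style: cannot switch to it */

  /* Bits used for the section style, derived from the values above.  */
  SECTION_STYLE_MASK = SECTION_NAMED | SECTION_NOSWITCH
};

/* Hypothetical helper: extract the style bits from a flag word.  */
unsigned long
section_style (unsigned long flags)
{
  return flags & SECTION_STYLE_MASK;
}

/* Hypothetical helper: extract the entity size from a flag word.  */
unsigned long
section_entsize (unsigned long flags)
{
  return flags & SECTION_ENTSIZE;
}
```

Deriving the mask this way means that renumbering the style bits cannot leave the mask out of sync with them.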

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* output.h (enum section_flag): New.
(SECTION_FORGET): Remove.
(SECTION_ENTSIZE): Make it (1UL << 8) - 1.
(SECTION_STYLE_MASK): Define it based on other enum
values.
* varasm.c (switch_to_section): Remove unused handling of
SECTION_FORGET.
---
 gcc/output.h | 85 +---
 gcc/varasm.c |  5 +---
 2 files changed, 48 insertions(+), 42 deletions(-)

diff --git a/gcc/output.h b/gcc/output.h
index 73ca4545f4f..8f6f15308f4 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -365,44 +365,53 @@ extern void default_function_switched_text_sections (FILE 
*, tree, bool);
 extern void no_asm_to_stream (FILE *);
 
 /* Flags controlling properties of a section.  */

-#define SECTION_ENTSIZE 0x000ff/* entity size in section */
-#define SECTION_CODE0x00100/* contains code */
-#define SECTION_WRITE   0x00200/* data is writable */
-#define SECTION_DEBUG   0x00400/* contains debug data */
-#define SECTION_LINKONCE 0x00800   /* is linkonce */
-#define SECTION_SMALL   0x01000/* contains "small data" */
-#define SECTION_BSS 0x02000/* contains zeros only */
-#define SECTION_FORGET  0x04000/* forget that we've entered the 
section */
-#define SECTION_MERGE   0x08000/* contains mergeable data */
-#define SECTION_STRINGS  0x1   /* contains zero terminated strings 
without
-  embedded zeros */
-#define SECTION_OVERRIDE 0x2   /* allow override of default flags */
-#define SECTION_TLS 0x4/* contains thread-local storage */
-#define SECTION_NOTYPE  0x8/* don't output @progbits */
-#define SECTION_DECLARED 0x10  /* section has been used */
-#define SECTION_STYLE_MASK 0x60/* bits used for SECTION_STYLE */
-#define SECTION_COMMON   0x80  /* contains common data */
-#define SECTION_RELRO   0x100  /* data is readonly after relocation 
processing */
-#define SECTION_EXCLUDE  0x200 /* discarded by the linker */
-#define SECTION_RETAIN  0x400  /* retained by the linker.  */
-#define SECTION_LINK_ORDER 0x800   /* section needs link-order.  */
-
-/* NB: The maximum SECTION_MACH_DEP is 0x1000 since AVR needs 4 bits
-   in SECTION_MACH_DEP.  */
-#define SECTION_MACH_DEP 0x1000/* subsequent bits reserved for target 
*/
-
-/* This SECTION_STYLE is used for unnamed sections that we can switch
-   to using a special assembler directive.  */
-#define SECTION_UNNAMED 0x00
-
-/* This SECTION_STYLE is used for named sections that we can switch
-   to using a general section directive.  */
-#define SECTION_NAMED   0x20
-
-/* This SECTION_STYLE is used for sections that we cannot switch to at
-   all.  The choice of section is implied by the directive that we use
-   to declare the object.  */
-#define SECTION_NOSWITCH 0x40
+enum section_flag
+{
+  /* This SECTION_STYLE is used for unnamed sections that we can switch
+ to using a special assembler directive.  */
+  SECTION_UNNAMED = 0,
+
+  SECTION_ENTSIZE = (1UL << 8) - 1,  /* entity size in section */
+  SECTION_CODE = 1UL << 8,   /* contains code */
+  SECTION_WRITE = 1UL << 9,  /* data is writable */
+
+  SECTION_DEBUG = 1UL << 10, /* contains debug data */
+  SECTION_LINKONCE = 1UL << 11,  /* is linkonce */
+  SECTION_SMALL = 1UL << 12, /* contains "small data" */
+  SECTION_BSS = 1UL << 13,   /* contains zeros only */
+  SECTION_MERGE = 1UL << 14, /* contains mergeable data */
+  SECTION_STRINGS = 1UL << 15,   /* contains zero terminated strings
+  without embedded zeros */
+  SECTION_OVERRIDE = 1UL << 16,  /* allow override of default flags */
+  SECTION_TLS = 1UL << 17,   /* contains thread-local storage */
+  SECTION_NOTYPE = 1UL << 18,/* don't output @progbits */
+  SECTION_DECLARED = 1UL << 19,  /* section has been used */
+
+  /* This SECTION_STYLE is used for named sections that we can switch
+ to using a general section directive.  */
+  SECTION_NAMED = 1UL << 20,
+
+  /* This SECTION_STYLE is used for sections that we cannot switch to at
+ all.  The choice of section is implied by the directive that we use
+ to declare the object.  */
+  SECTION_NOSWITCH = 1UL << 21,
+
+  /* bits used for SECTION_STYLE */

Re: [PATCH 04/13] arm: Add GENERAL_AND_VPR_REGS regclass

2021-09-07 Thread Richard Earnshaw via Gcc-patches





On 07/09/2021 10:15, Christophe Lyon via Gcc-patches wrote:

At some point during the development of this patch series, it appeared
that in some cases the register allocator wants “VPR or general”
rather than “VPR or general or FP” (which is the same thing as
ALL_REGS).  The series does not seem to require this anymore, but it
seems to be a good thing to do anyway, to give the register allocator
more freedom.

2021-09-01  Christophe Lyon  

gcc/
* config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise. Add VPR_REG to ALL_REGS.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 015299c1534..fab39d05916 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1286,6 +1286,7 @@ enum reg_class
SFP_REG,
AFP_REG,
VPR_REG,
+  GENERAL_AND_VPR_REGS,
ALL_REGS,
LIM_REG_CLASSES
  };
@@ -1315,6 +1316,7 @@ enum reg_class
"SFP_REG",\
"AFP_REG",\
"VPR_REG",\
+  "GENERAL_AND_VPR_REGS", \
"ALL_REGS"\
  }
  
@@ -1343,7 +1345,8 @@ enum reg_class

{ 0x, 0x, 0x, 0x0040 }, /* SFP_REG */   \
{ 0x, 0x, 0x, 0x0080 }, /* AFP_REG */   \
{ 0x, 0x, 0x, 0x0400 }, /* VPR_REG.  */ \
-  { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS.  */ \
+  { 0x5FFF, 0x, 0x, 0x0400 }, /* GENERAL_AND_VPR_REGS. 
 */ \
+  { 0x7FFF, 0x, 0x, 0x040F }  /* ALL_REGS.  */ \
  }


You've changed the definition of ALL_REGS here (to include VPR_REG), but 
not really explained why.  Is that the source of the underlying issue 
with the 'appeared' you mention?
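
The question is left open in this message. Purely to illustrate what the hunk does to the bit sets, here is a sketch with toy masks (not the real arm.h values, which are not reproduced here), assuming the usual expectation that ALL_REGS contains every other class. All names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define N_WORDS 4

/* A register class as a fixed-size bit set, one bit per hard register.  */
typedef struct { unsigned int w[N_WORDS]; } reg_set;

/* Toy masks, NOT the real arm.h values.  */
static const reg_set TOY_GENERAL = { { 0x00FFu, 0, 0, 0 } };
static const reg_set TOY_FP      = { { 0xFF00u, 0, 0, 0 } };
static const reg_set TOY_VPR     = { { 0, 0, 0, 0x0400u } };

/* Return the union of two register sets.  */
reg_set
reg_set_union (reg_set a, reg_set b)
{
  reg_set r;
  for (int i = 0; i < N_WORDS; i++)
    r.w[i] = a.w[i] | b.w[i];
  return r;
}

/* Return true if every register in A is also in B.  */
bool
reg_set_subset_p (reg_set a, reg_set b)
{
  for (int i = 0; i < N_WORDS; i++)
    if (a.w[i] & ~b.w[i])
      return false;
  return true;
}
```

In this toy setup, "general or VPR" is not a subset of an ALL_REGS that lacks the VPR bit, but it is once the VPR bit is added. Whether that is the actual motivation for the change is not stated in the thread.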


R.


  
  #define FP_SYSREGS \

Re: [PATCH] flag_complex_method: support optimize attribute

2021-09-07 Thread Martin Liška


On 9/6/21 14:16, Richard Biener wrote:

On Mon, Sep 6, 2021 at 1:46 PM Jakub Jelinek  wrote:


On Mon, Sep 06, 2021 at 01:37:46PM +0200, Martin Liška wrote:

--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1323,6 +1323,14 @@ finish_options (struct gcc_options *opts, struct 
gcc_options *opts_set,
= (opts->x_flag_unroll_loops
   || opts->x_flag_peel_loops
   || opts->x_optimize >= 3);
+
+  /* With -fcx-limited-range, we do cheap and quick complex arithmetic.  */
+  if (opts->x_flag_cx_limited_range)
+flag_complex_method = 0;
+
+  /* With -fcx-fortran-rules, we do something in-between cheap and C99.  */
+  if (opts->x_flag_cx_fortran_rules)
+flag_complex_method = 1;


That should then be opts->x_flag_complex_method instead of flag_complex_method.

Ok with that change.


But the C/C++ langhooks also set flag_complex_method so I fail to see how
this helps?  As said I was referring to -fcx-limited-range on the command-line
and -fno-cx-limited-range in the optimize node to undo this which should
get you the langhook setting of flag_complex_method = 2.


You are right, it's even more complicated as -fno-cx-limited-range is target 
specific.
Option handling has been introducing surprises every time ...

The following tested patch should handle it.
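
As a rough sketch of the approach (the opts.c hunk is not shown in this message, so the details are an assumption): remember the language default separately and recompute the method from it whenever options are finished. The struct and `resolve_complex_method` are hypothetical; only the flag names and the 0/1/2 values come from the thread.

```c
#include <assert.h>

/* Hypothetical stand-in for the few gcc_options fields discussed here.  */
struct complex_opts
{
  int flag_cx_limited_range;        /* -fcx-limited-range */
  int flag_cx_fortran_rules;        /* -fcx-fortran-rules */
  int flag_default_complex_method;  /* set by the language hook, 2 for C */
  int flag_complex_method;          /* the resolved method */
};

/* Recompute the complex method from the saved language default, so that
   turning a flag off again (e.g. in an optimize attribute) restores the
   default instead of keeping a stale value.  */
void
resolve_complex_method (struct complex_opts *opts)
{
  opts->flag_complex_method = opts->flag_default_complex_method;

  /* With -fcx-limited-range, do cheap and quick complex arithmetic.  */
  if (opts->flag_cx_limited_range)
    opts->flag_complex_method = 0;

  /* With -fcx-fortran-rules, do something in between cheap and C99.  */
  if (opts->flag_cx_fortran_rules)
    opts->flag_complex_method = 1;
}
```

Starting from the default each time is what lets -fno-cx-limited-range in an optimize node undo a command-line -fcx-limited-range and get back to method 2 for C.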

Ready to be installed?
Thanks,
Martin




Note, I think we want to do much more in finish_options and less in
process_options, anything that is about Optimization options rather than
just the global ones.  Though one needs to be careful with the cases where
the code diagnoses something.

 Jakub

From e88ae14be7c5609a969897b5d09f40709fea8a34 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 3 Sep 2021 10:53:00 +0200
Subject: [PATCH] flag_complex_method: support optimize attribute

gcc/c-family/ChangeLog:

	* c-opts.c (c_common_init_options_struct): Set also
	  x_flag_default_complex_method.

gcc/ChangeLog:

	* common.opt: Add new variable flag_default_complex_method.
	* opts.c (finish_options): Handle flags related to
	  x_flag_complex_method.
	* toplev.c (process_options): Remove option handling related
	to flag_complex_method.

gcc/go/ChangeLog:

	* go-lang.c (go_langhook_init_options_struct): Set also
	  x_flag_default_complex_method.

gcc/lto/ChangeLog:

	* lto-lang.c (lto_init_options_struct): Set also
	  x_flag_default_complex_method.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/compile/attr-complex-method-2.c: New test.
	* gcc.c-torture/compile/attr-complex-method.c: New test.
---
 gcc/c-family/c-opts.c|  1 +
 gcc/common.opt   |  3 +++
 gcc/go/go-lang.c |  1 +
 gcc/lto/lto-lang.c   |  1 +
 gcc/opts.c   | 12 
 .../gcc.c-torture/compile/attr-complex-method-2.c| 10 ++
 .../gcc.c-torture/compile/attr-complex-method.c  | 10 ++
 gcc/toplev.c |  8 
 8 files changed, 38 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/attr-complex-method-2.c
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/attr-complex-method.c

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index fdde082158b..3eaab5e1530 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -222,6 +222,7 @@ c_common_init_options_struct (struct gcc_options *opts)
 
   /* By default, C99-like requirements for complex multiply and divide.  */
   opts->x_flag_complex_method = 2;
+  opts->x_flag_default_complex_method = opts->x_flag_complex_method;
 }
 
 /* Common initialization before calling option handlers.  */
diff --git a/gcc/common.opt b/gcc/common.opt
index 7d69ab5ef7c..6bfe0b74023 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -59,6 +59,9 @@ enum incremental_link flag_incremental_link = INCREMENTAL_LINK_NONE
 Variable
 int flag_complex_method = 1
 
+Variable
+int flag_default_complex_method = 1
+
 ; Language specific warning pass for unused results.
 Variable
 bool flag_warn_unused_result = false
diff --git a/gcc/go/go-lang.c b/gcc/go/go-lang.c
index a01db8dbdcd..c3ae6f012bb 100644
--- a/gcc/go/go-lang.c
+++ b/gcc/go/go-lang.c
@@ -174,6 +174,7 @@ go_langhook_init_options_struct (struct gcc_options *opts)
   /* Default to avoiding range issues for complex multiply and
  divide.  */
   opts->x_flag_complex_method = 2;
+  opts->x_flag_default_complex_method = opts->x_flag_complex_method;
 
   /* The builtin math functions should not set errno.  */
   opts->x_flag_errno_math = 0;
diff --git a/gcc/lto/lto-lang.c b/gcc/lto/lto-lang.c
index 92f499643b5..a014e5884e0 100644
--- a/gcc/lto/lto-lang.c
+++ b/gcc/lto/lto-lang.c
@@ -813,6 +813,7 @@ lto_init_options_struct (struct gcc_options *opts)
  safe choice.  This will pessimize Fortran code with LTO unless
  people specify a complex method manually or use -ffast-math.  */
   opts->x_flag_complex_method

[PATCH 13/13] arm: Convert more MVE/CDE builtins to predicate qualifiers

2021-09-07 Thread Christophe Lyon via Gcc-patches

This patch covers a few non-load/store builtins where we do not use
the  iterator and thus we cannot use .

We need to update the expected code in cde-mve-full-assembly.c because
we now use mve_movv16qi instead of movhi to generate the vmsr
instruction.

2021-09-02  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (CX_UNARY_UNONE_QUALIFIERS): Use
predicate.
(CX_BINARY_UNONE_QUALIFIERS): Likewise.
(CX_TERNARY_UNONE_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
(QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Delete.
* config/arm/arm_mve_builtins.def: Use predicated qualifiers.
* config/arm/mve.md: Use VxBI instead of HI.

gcc/testsuite/
* gcc.target/arm/acle/cde-mve-full-assembly.c: Remove expected '@ movhi'.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index e58580bb828..d725458f1ad 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -344,7 +344,7 @@ static enum arm_type_qualifiers
 arm_cx_unary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate, qualifier_none,
   qualifier_unsigned_immediate,
-  qualifier_unsigned };
+  qualifier_predicate };
 #define CX_UNARY_UNONE_QUALIFIERS (arm_cx_unary_unone_qualifiers)
 
 /* T (immediate, T, T, unsigned immediate).  */
@@ -353,7 +353,7 @@ arm_cx_binary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate,
   qualifier_none, qualifier_none,
   qualifier_unsigned_immediate,
-  qualifier_unsigned };
+  qualifier_predicate };
 #define CX_BINARY_UNONE_QUALIFIERS (arm_cx_binary_unone_qualifiers)
 
 /* T (immediate, T, T, T, unsigned immediate).  */
@@ -362,7 +362,7 @@ arm_cx_ternary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate,
   qualifier_none, qualifier_none, qualifier_none,
   qualifier_unsigned_immediate,
-  qualifier_unsigned };
+  qualifier_predicate };
 #define CX_TERNARY_UNONE_QUALIFIERS (arm_cx_ternary_unone_qualifiers)
 
 /* The first argument (return type) of a store should be void type,
@@ -558,12 +558,6 @@ 
arm_ternop_none_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_NONE_NONE_NONE_IMM_QUALIFIERS \
   (arm_ternop_none_none_none_imm_qualifiers)
 
-static enum arm_type_qualifiers
-arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_none, qualifier_unsigned };
-#define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_ternop_none_none_none_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate };
@@ -616,13 +610,6 @@ 
arm_quadop_unone_unone_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_UNONE_UNONE_NONE_NONE_PRED_QUALIFIERS \
   (arm_quadop_unone_unone_none_none_pred_qualifiers)
 
-static enum arm_type_qualifiers
-arm_quadop_none_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_none, qualifier_none,
-qualifier_unsigned };
-#define QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_quadop_none_none_none_none_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_quadop_none_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none,
@@ -637,13 +624,6 @@ 
arm_quadop_none_none_none_imm_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_NONE_NONE_NONE_IMM_PRED_QUALIFIERS \
   (arm_quadop_none_none_none_imm_pred_qualifiers)
 
-static enum arm_type_qualifiers
-arm_quadop_unone_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
-qualifier_unsigned, qualifier_unsigned };
-#define QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
-  (arm_quadop_unone_unone_unone_unone_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_quadop_unone_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
diff --git a/gcc/config/arm/arm_mve_builtins.def 
b/gcc/config/arm/arm_mve_builtins.def
index bb79edf83ca..0fb53d866ec 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -87,8 +87,8 @@ VAR4 (BINOP_UNONE_UNONE_UNONE, vcreateq_u, v16qi, v8hi, v4si, 
v2di)
 VAR4 (BINOP_NONE_UNONE_UNONE, vcreateq_s, v16qi, v8hi, v4si, v2di)
 VAR3 (BINOP_UNONE_UNONE_IMM, vshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi, v8hi, v4si)
-VAR1 (BINOP_NONE_NONE_UNONE, vaddlvq_p_s, v4si)
-VAR1 (BINOP_UNONE_UNONE_UNONE, vaddlvq_p_u, v4si)
+VAR1 (BINOP_NONE_NONE_PRED, vaddlvq_p_s, v4si)
+VAR1

[PATCH 12/13] arm: Convert more load/store MVE builtins to predicate qualifiers

2021-09-07 Thread Christophe Lyon via Gcc-patches

This patch covers a few builtins where we do not use the 
iterator and thus we cannot use .

However, this introduces a problem for the v2di instructions, because
there is no predicate for this case.  For instance, changing
STRSBS_P_QUALIFIERS breaks mve_vstrdq_scatter_base_p_v2di.
Similarly, this patch introduces problems with:
mve_vldrdq_gather_base_z_v2di
mve_vldrdq_gather_base_wb_z_v2di
mve_vldrdq_gather_base_nowb_z_v2di
mve_vstrdq_scatter_base_wb_p_v2di

2021-09-02  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (STRSBS_P_QUALIFIERS): Use predicate
qualifier.
(STRSBU_P_QUALIFIERS): Likewise.
(LDRGBS_Z_QUALIFIERS): Likewise.
(LDRGBU_Z_QUALIFIERS): Likewise.
(LDRGBWBXU_Z_QUALIFIERS): Likewise.
(LDRGBWBS_Z_QUALIFIERS): Likewise.
(LDRGBWBU_Z_QUALIFIERS): Likewise.
(STRSBWBS_P_QUALIFIERS): Likewise.
(STRSBWBU_P_QUALIFIERS): Likewise.
* config/arm/mve.md: Use VxBI instead of HI.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 06ff9d2278a..e58580bb828 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -738,13 +738,13 @@ arm_strss_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_strsbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_unsigned, qualifier_immediate,
-  qualifier_none, qualifier_unsigned};
+  qualifier_none, qualifier_predicate};
 #define STRSBS_P_QUALIFIERS (arm_strsbs_p_qualifiers)
 
 static enum arm_type_qualifiers
 arm_strsbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned, qualifier_unsigned};
+  qualifier_unsigned, qualifier_predicate};
 #define STRSBU_P_QUALIFIERS (arm_strsbu_p_qualifiers)
 
 static enum arm_type_qualifiers
@@ -780,13 +780,13 @@ arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBS_Z_QUALIFIERS (arm_ldrgbs_z_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ldrgbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBU_Z_QUALIFIERS (arm_ldrgbu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -826,7 +826,7 @@ arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbwbxu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBWBXU_Z_QUALIFIERS (arm_ldrgbwbxu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -842,13 +842,13 @@ arm_ldrgbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbwbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBWBS_Z_QUALIFIERS (arm_ldrgbwbs_z_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ldrgbwbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBWBU_Z_QUALIFIERS (arm_ldrgbwbu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -864,13 +864,13 @@ arm_strsbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_strsbwbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_const,
-  qualifier_none, qualifier_unsigned};
+  qualifier_none, qualifier_predicate};
 #define STRSBWBS_P_QUALIFIERS (arm_strsbwbs_p_qualifiers)
 
 static enum arm_type_qualifiers
 arm_strsbwbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_const,
-  qualifier_unsigned, qualifier_unsigned};
+  qualifier_unsigned, qualifier_predicate};
 #define STRSBWBU_P_QUALIFIERS (arm_strsbwbu_p_qualifiers)
 
 static enum arm_type_qualifiers
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 2f36d47c800..241195909da 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -7294,7 +7294,7 @@ (define_insn "mve_vstrwq_scatter_base_p_v4si"
[(match_operand:V4SI 0 "s_register_operand" "w")
 (match_operand:SI 1 "immediate_operand" "i")
 (match_operand:V4SI 2 "s_register_operand" "w")
-(match_operand:HI 3 "vpr_register_operand" "Up")]
+(match_operand:V4BI 3 "vpr_register_operand" "Up")]
 VSTRWSBQ))
   ]
   "TARGET_HAVE_MVE"
@@ -7383,7 +7383,7 @@ (define_insn "mve_vldrwq_gather_base_z_v4si"
   [(set (match_operand:V4SI 0 "s_register_operand" "=")
(unspec:V4SI

[PATCH 10/13] arm: Convert remaining MVE vcmp builtins to predicate qualifiers

2021-09-07 Thread Christophe Lyon via Gcc-patches

This is mostly a mechanical change, only tested by the intrinsics
expansion tests.

2021-09-02  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (BINOP_UNONE_NONE_NONE_QUALIFIERS):
Delete.
(TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ...
(TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS): ... this.
(TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm_mve_builtins.def (vcmp*q_n_, vcmp*q_m_f): Use new
predicated qualifiers.
* config/arm/mve.md (mve_vcmpq_n_)
(mve_vcmp*q_m_f): Use MVE_VPRED instead of HI.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 6e3638869f1..b3455d87d4f 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -487,12 +487,6 @@ arm_binop_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_NONE_NONE_UNONE_QUALIFIERS \
   (arm_binop_none_none_unone_qualifiers)
 
-static enum arm_type_qualifiers
-arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_none, qualifier_none };
-#define BINOP_UNONE_NONE_NONE_QUALIFIERS \
-  (arm_binop_unone_none_none_qualifiers)
-
 static enum arm_type_qualifiers
 arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_predicate, qualifier_none, qualifier_none };
@@ -553,10 +547,10 @@ 
arm_ternop_unone_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   (arm_ternop_unone_unone_imm_unone_qualifiers)
 
 static enum arm_type_qualifiers
-arm_ternop_unone_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_none, qualifier_none, qualifier_unsigned };
-#define TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_ternop_unone_none_none_unone_qualifiers)
+arm_ternop_pred_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_none, qualifier_none, qualifier_predicate 
};
+#define TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS \
+  (arm_ternop_pred_none_none_pred_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -602,6 +596,13 @@ 
arm_ternop_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS \
   (arm_ternop_unone_unone_unone_pred_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_pred_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned,
+qualifier_predicate };
+#define TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS \
+  (arm_ternop_pred_unone_unone_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
diff --git a/gcc/config/arm/arm_mve_builtins.def 
b/gcc/config/arm/arm_mve_builtins.def
index 58a05e61bd9..91ed2073918 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -118,9 +118,9 @@ VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, veorq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vbicq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vandq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vaddvq_p_u, v16qi, v8hi, v4si)
@@ -142,17 +142,17 @@ VAR3 (BINOP_UNONE_UNONE_NONE, vbrsrq_n_u, v16qi, v8hi, 
v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vshlq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vrshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vqshlq_n_u, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpeqq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE,

[PATCH 09/13] arm: Fix vcond_mask expander for MVE (PR target/100757)

2021-09-07 Thread Christophe Lyon via Gcc-patches

From: Christophe Lyon 

The problem in this PR is that we call VPSEL with a mask of vector
type instead of HImode. This happens because operand 3 in vcond_mask
is the pre-computed vector comparison and has vector type.

This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE,
returning the appropriate VxBI mode when targeting MVE.  In turn, this
implies implementing vec_cmp,
vec_cmpu and vcond_mask_, and we can
move vec_cmp, vec_cmpu and
vcond_mask_ back to neon.md since they are not
used by MVE anymore.  The new * patterns listed above are
implemented in mve.md since they are only valid for MVE. However this
may make maintenance/comparison more painful than having all of them
in vec-common.md.

In the process, we can get rid of the recently added vcond_mve
parameter of arm_expand_vector_compare.

Compared to neon.md's vcond_mask_ before my "arm:
Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH
iterator added in r12-835 (to have V4HF/V8HF support), as well as the
(! || flag_unsafe_math_optimizations) condition which
was not present before r12-834 although SF modes were enabled by VDQW
(I think this was a bug).

Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no
longer need to generate vpsel with vectors of 0 and 1: the masks are
now merged via scalar 'ands' instructions operating on 16-bit masks
after converting the boolean vectors.

In addition, this patch fixes a problem in arm_expand_vcond() where
the result would be a vector of 0 or 1 instead of operand 1 or 2.

Reducing the number of iterations in pr100757-3.c from 32 to 8, we
generate the code below:

float a[32];
float fn1(int d) {
  float c = 4.0f;
  for (int b = 0; b < 8; b++)
    if (a[b] != 2.0f)
      c = 5.0f;
  return c;
}

fn1:
ldr r3, .L3+48
vldr.64 d4, .L3  // q2=(2.0,2.0,2.0,2.0)
vldr.64 d5, .L3+8
vldrw.32 q0, [r3] // q0=a(0..3)
adds r3, r3, #16
vcmp.f32 eq, q0, q2   // cmp a(0..3) == (2.0,2.0,2.0,2.0)
vldrw.32 q1, [r3] // q1=a(4..7)
vmrs r3, P0
vcmp.f32 eq, q1, q2   // cmp a(4..7) == (2.0,2.0,2.0,2.0)
vmrs r2, P0  @ movhi
ands r3, r3, r2   // r3=select(a(0..3)) & select(a(4..7))
vldr.64 d4, .L3+16   // q2=(5.0,5.0,5.0,5.0)
vldr.64 d5, .L3+24
vmsr P0, r3
vldr.64 d6, .L3+32   // q3=(4.0,4.0,4.0,4.0)
vldr.64 d7, .L3+40
vpsel q3, q3, q2 // q3=vcond_mask(4.0,5.0)
vmov.32 r2, q3[1]// keep the scalar max
vmov.32 r0, q3[3]
vmov.32 r3, q3[2]
vmov.f32 s11, s12
vmov s15, r2
vmov s14, r3
vmaxnm.f32  s15, s11, s15
vmaxnm.f32  s15, s15, s14
vmov s14, r0
vmaxnm.f32  s15, s15, s14
vmov r0, s15
bx  lr
.L4:
.align  3
.L3:
.word   1073741824  // 2.0f
.word   1073741824
.word   1073741824
.word   1073741824
.word   1084227584  // 5.0f
.word   1084227584
.word   1084227584
.word   1084227584
.word   1082130432  // 4.0f
.word   1082130432
.word   1082130432
.word   1082130432
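
For readers who want to check the logic of the listing above without
an MVE target, here is a scalar C model of it.  This is only a sketch:
the helper names (vcmp_eq_f32, vpsel_f32, fn1_model) are made up for
illustration and are not part of the patch, and the 4-bits-per-lane
predicate layout is an assumption about how VPR.P0 covers 32-bit
lanes.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4	/* 32-bit lanes in one 128-bit vector.  */

/* Hypothetical model of "vcmp.f32 eq": each 32-bit lane owns 4
   consecutive bits of the 16-bit predicate (assumed layout).  */
static uint16_t
vcmp_eq_f32 (const float *v, float rhs)
{
  uint16_t pred = 0;
  for (int i = 0; i < LANES; i++)
    if (v[i] == rhs)
      pred |= (uint16_t) (0xFu << (4 * i));
  return pred;
}

/* Hypothetical model of "vpsel": pick the lane from ON_TRUE when its
   predicate bits are set, from ON_FALSE otherwise.  */
static void
vpsel_f32 (float *dst, const float *on_true, const float *on_false,
	   uint16_t pred)
{
  for (int i = 0; i < LANES; i++)
    dst[i] = ((pred >> (4 * i)) & 1) ? on_true[i] : on_false[i];
}

/* Scalar model of fn1 for 8 elements, following the listing: two
   compares, one scalar AND of the 16-bit masks, one select, then a
   max reduction over the four lanes.  */
static float
fn1_model (const float *a)
{
  assert (a != 0);

  const float four[LANES] = { 4.0f, 4.0f, 4.0f, 4.0f };
  const float five[LANES] = { 5.0f, 5.0f, 5.0f, 5.0f };

  /* The "ands r3, r3, r2" step: a lane stays set only if both
     a[i] and a[i + 4] are equal to 2.0f.  */
  uint16_t pred = vcmp_eq_f32 (a, 2.0f) & vcmp_eq_f32 (a + LANES, 2.0f);

  float sel[LANES];
  vpsel_f32 (sel, four, five, pred);

  /* The vmaxnm.f32 chain: keep the scalar maximum.  */
  float c = sel[0];
  for (int i = 1; i < LANES; i++)
    if (sel[i] > c)
      c = sel[i];
  return c;
}
```

Calling fn1_model on eight elements all equal to 2.0f gives 4.0f, and
changing any one element gives 5.0f, which matches the original loop.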

2021-09-02  Christophe Lyon  

PR target/100757
gcc/
* config/arm/arm-protos.h (arm_get_mask_mode): New prototype.
(arm_expand_vector_compare): Update prototype.
* config/arm/arm.c (TARGET_VECTORIZE_GET_MASK_MODE): New.
(arm_vector_mode_supported_p): Add support for VxBI modes.
(arm_expand_vector_compare): Remove useless generation of vpsel.
(arm_expand_vcond): Fix select operands.
(arm_get_mask_mode): New.
* config/arm/mve.md (vec_cmp): New.
(vec_cmpu): New.
(vcond_mask_): New.
* config/arm/vec-common.md (vec_cmp)
(vec_cmpu): Move to ...
* config/arm/neon.md (vec_cmp)
(vec_cmpu): ... here
and disable for MVE.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 9b1f61394ad..9e3d71e0c29 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -201,6 +201,7 @@ extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, 
tree, rtx, tree);
 extern bool arm_pad_reg_upward (machine_mode, tree, int);
 #endif
 extern int arm_apply_result_size (void);
+extern opt_machine_mode arm_get_mask_mode (machine_mode mode);
 
 #endif /* RTX_CODE */
 
@@ -372,7 +373,7 @@ extern void arm_emit_coreregs_64bit_shift (enum rtx_code, 
rtx, rtx, rtx, rtx,
 extern bool arm_fusion_enabled_p (tune_params::fuse_ops);
 extern bool arm_valid_symbolic_address_p (rtx);
 extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
-extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool, bool);
+extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool);
 #endif /*

[PATCH 08/13] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates

2021-09-07 Thread Christophe Lyon via Gcc-patches

We make use of qualifier_predicate to describe MVE builtin
prototypes, restricting to auto-vectorizable vcmp* and vpsel builtins,
as they are exercised by the tests added earlier in the series.

Special handling is needed for mve_vpselq because it has a v2di
variant, which has no natural VPR.P0 representation: we keep HImode
for it.

The vector_compare expansion code is updated to use the right VxBI
mode instead of HI for the result.

New mov patterns are introduced to handle the new modes.

2021-09-01  Christophe Lyon 

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (BINOP_PRED_UNONE_UNONE_QUALIFIERS)
(BINOP_PRED_NONE_NONE_QUALIFIERS)
(TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS)
(TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm.c (arm_hard_regno_mode_ok): Handle new VxBI
modes.
(arm_mode_to_pred_mode): New.
(arm_expand_vector_compare): Use the right VxBI mode instead of
HI.
(arm_expand_vcond): Likewise.
* config/arm/arm_mve_builtins.def (vcmpneq_, vcmphiq_, vcmpcsq_)
(vcmpltq_, vcmpleq_, vcmpgtq_, vcmpgeq_, vcmpeqq_, vcmpneq_f)
(vcmpltq_f, vcmpleq_f, vcmpgtq_f, vcmpgeq_f, vcmpeqq_f, vpselq_u)
(vpselq_s, vpselq_f): Use new predicated qualifiers.
* config/arm/iterators.md (MVE_7): New mode iterator.
(MVE_VPRED, MVE_vpred): New attribute iterators.
* config/arm/mve.md (@mve_vcmpq_)
(@mve_vcmpq_f, @mve_vpselq_)
(@mve_vpselq_f): Use MVE_VPRED instead of HI.
(@mve_vpselq_v2di): Define separately.
(mov): New expander for VxBI modes.
(mve_mov): New insn for VxBI modes.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 771759f0cdd..6e3638869f1 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -469,6 +469,12 @@ 
arm_binop_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_UNONE_UNONE_QUALIFIERS \
   (arm_binop_unone_unone_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_pred_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned };
+#define BINOP_PRED_UNONE_UNONE_QUALIFIERS \
+  (arm_binop_pred_unone_unone_qualifiers)
+
 static enum arm_type_qualifiers
 arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_none, qualifier_immediate };
@@ -487,6 +493,12 @@ arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_NONE_NONE_QUALIFIERS \
   (arm_binop_unone_none_none_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_none, qualifier_none };
+#define BINOP_PRED_NONE_NONE_QUALIFIERS \
+  (arm_binop_pred_none_none_qualifiers)
+
 static enum arm_type_qualifiers
 arm_binop_unone_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_none };
@@ -558,6 +570,12 @@ 
arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
   (arm_ternop_none_none_none_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate };
+#define TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS \
+  (arm_ternop_none_none_none_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_immediate, qualifier_unsigned 
};
@@ -577,6 +595,13 @@ 
arm_ternop_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
   (arm_ternop_unone_unone_unone_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
+qualifier_predicate };
+#define TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS \
+  (arm_ternop_unone_unone_unone_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1222cb0d0fe..5f6637d9a5f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -25304,7 +25304,7 @@ arm_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)
 return false;
 
   if (IS_VPR_REGNUM (regno))
-return mode == HImode;
+return mode == HImode || mode == V16BImode || mode == V8BImode || mode == 
V4BImode;
 
   if (TARGET_THUMB1)
 /* For the Thumb we only allow values bigger than SImode in
@@ -30994,6 +30994,19 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, 
rtx new_out, rtx mem,

[PATCH 07/13] arm: Implement MVE predicates as vectors of booleans

2021-09-07 Thread Christophe Lyon via Gcc-patches

This patch implements support for vectors of booleans to support MVE
predicates, instead of HImode.  Since the ABI mandates pred16_t (aka
uint16_t) to represent predicates in intrinsics prototypes, we
introduce a new "predicate" type qualifier so that we can map the
HImode arguments and return values of the relevant builtins to the
appropriate vector of booleans (VxBI).

We have to update test_vector_ops_duplicate, because it iterates using
an offset in bytes, where we would need to iterate in bits: we stop
iterating when we reach the end of the vector of booleans.
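
As an illustration of the HImode/VxBI correspondence described above,
the following C sketch packs per-lane booleans into a 16-bit
predicate.  It assumes the predicate carries one bit per byte of the
128-bit vector, so a lane owns 16 / nlanes consecutive bits (1 for
V16BI, 2 for V8BI, 4 for V4BI).  The names pack_pred16 and pred_lane
are hypothetical and do not exist in GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a 16-bit predicate from NLANES per-lane
   booleans.  Each lane owns 16 / NLANES consecutive bits (assumed
   layout, one predicate bit per vector byte).  */
static uint16_t
pack_pred16 (const int *lane_true, int nlanes)
{
  assert (nlanes == 4 || nlanes == 8 || nlanes == 16);

  int bits_per_lane = 16 / nlanes;
  uint16_t lane_mask = (uint16_t) ((1u << bits_per_lane) - 1);
  uint16_t pred = 0;

  for (int i = 0; i < nlanes; i++)
    if (lane_true[i])
      pred |= (uint16_t) (lane_mask << (i * bits_per_lane));
  return pred;
}

/* Hypothetical helper: read back lane I by testing the lowest bit of
   the group of bits it owns.  */
static int
pred_lane (uint16_t pred, int nlanes, int i)
{
  assert (nlanes == 4 || nlanes == 8 || nlanes == 16);
  assert (i >= 0 && i < nlanes);

  int bits_per_lane = 16 / nlanes;
  return (pred >> (i * bits_per_lane)) & 1;
}
```

With four lanes set to { 1, 0, 0, 1 } the packed value is 0xF00F; the
same 16-bit value read as a 16-lane predicate has lanes 0-3 and 12-15
set.  The bits are the same in both cases, only the lane view changes.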

2021-09-01  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (arm_type_qualifiers): Add 
qualifier_predicate.
(arm_init_simd_builtin_types): Add new simd types.
(arm_init_builtin): Map predicate vectors arguments to HImode.
(arm_expand_builtin_args): Move HImode predicate arguments to VxBI
rtx. Move return value to HImode rtx.
* config/arm/arm-modes.def (V16BI, V8BI, V4BI): New modes.
* config/arm/arm-simd-builtin-types.def (Pred1x16_t,
Pred2x8_t,Pred4x4_t): New.
* simplify-rtx.c (test_vector_ops_duplicate): Avoid going past the
end of the test vector.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 3a9ff8f26b8..771759f0cdd 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -92,7 +92,9 @@ enum arm_type_qualifiers
   qualifier_lane_pair_index = 0x1000,
   /* Lane indices selected in quadtuplets - must be within range of previous
  argument = a vector.  */
-  qualifier_lane_quadtup_index = 0x2000
+  qualifier_lane_quadtup_index = 0x2000,
+  /* MVE vector predicates.  */
+  qualifier_predicate = 0x4000
 };
 
 /*  The qualifier_internal allows generation of a unary builtin from
@@ -1633,6 +1635,13 @@ arm_init_simd_builtin_types (void)
   arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
   arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
 
+  if (TARGET_HAVE_MVE)
+{
+  arm_simd_types[Pred1x16_t].eltype = unsigned_intHI_type_node;
+  arm_simd_types[Pred2x8_t].eltype = unsigned_intHI_type_node;
+  arm_simd_types[Pred4x4_t].eltype = unsigned_intHI_type_node;
+}
+
   for (i = 0; i < nelts; i++)
 {
   tree eltype = arm_simd_types[i].eltype;
@@ -1780,6 +1789,11 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum 
*d,
   if (qualifiers & qualifier_map_mode)
op_mode = d->mode;
 
+  /* MVE Predicates use HImode as mandated by the ABI: pred16_t is unsigned
+short.  */
+  if (qualifiers & qualifier_predicate)
+   op_mode = HImode;
+
   /* For pointers, we want a pointer to the basic type
 of the vector.  */
   if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
@@ -3024,6 +3038,11 @@ arm_expand_builtin_args (rtx target, machine_mode 
map_mode, int fcode,
case ARG_BUILTIN_COPY_TO_REG:
  if (POINTER_TYPE_P (TREE_TYPE (arg[argc])))
op[argc] = convert_memory_address (Pmode, op[argc]);
+
+ /* MVE uses mve_pred16_t (aka HImode) for vectors of predicates.  
*/
+ if (GET_MODE_CLASS (mode[argc]) == MODE_VECTOR_BOOL)
+   op[argc] = gen_lowpart (mode[argc], op[argc]);
+
  /*gcc_assert (GET_MODE (op[argc]) == mode[argc]); */
  if (!(*insn_data[icode].operand[opno].predicate)
  (op[argc], mode[argc]))
@@ -3229,6 +3248,13 @@ constant_arg:
   else
 emit_insn (insn);
 
+  if (GET_MODE_CLASS (tmode) == MODE_VECTOR_BOOL)
+{
+  rtx HItarget = gen_reg_rtx (HImode);
+  emit_move_insn (HItarget, gen_lowpart (HImode, target));
+  return HItarget;
+}
+
   return target;
 }
 
diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index a5e74ba3943..b414a709a62 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -84,6 +84,11 @@ VECTOR_MODE (FLOAT, BF, 2);   /* V2BF.  */
 VECTOR_MODE (FLOAT, BF, 4);   /*V4BF.  */
 VECTOR_MODE (FLOAT, BF, 8);   /*V8BF.  */
 
+/* Predicates for MVE.  */
+VECTOR_BOOL_MODE (V16BI, 16, 2);
+VECTOR_BOOL_MODE (V8BI, 8, 2);
+VECTOR_BOOL_MODE (V4BI, 4, 2);
+
 /* Fraction and accumulator vector modes.  */
 VECTOR_MODES (FRACT, 4);  /* V4QQ  V2HQ */
 VECTOR_MODES (UFRACT, 4); /* V4UQQ V2UHQ */
diff --git a/gcc/config/arm/arm-simd-builtin-types.def 
b/gcc/config/arm/arm-simd-builtin-types.def
index c19a1b6e3eb..d3987985b4c 100644
--- a/gcc/config/arm/arm-simd-builtin-types.def
+++ b/gcc/config/arm/arm-simd-builtin-types.def
@@ -51,3 +51,7 @@
   ENTRY (Bfloat16x2_t, V2BF, none, 32, bfloat16, 20)
   ENTRY (Bfloat16x4_t, V4BF, none, 64, bfloat16, 20)
   ENTRY (Bfloat16x8_t, V8BF, none, 128, bfloat16, 20)
+
+  ENTRY (Pred1x16_t, V16BI, unsigned, 16, uint16, 21)
+  ENTRY (Pred2x8_t, V8BI, unsigned, 8, uint16, 21)
+  ENTRY

[PATCH 06/13] arm: Fix mve_vmvnq_n_ argument mode

2021-09-07 Thread Christophe Lyon via Gcc-patches

The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use the
V_elem iterator instead of HI in mve_vmvnq_n_.

2021-09-03  Christophe Lyon  

gcc/
* config/arm/mve.md (mve_vmvnq_n_): Use V_elem mode
for operand 1.

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index e393518ea88..14d17060290 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_"
 (define_insn "mve_vmvnq_n_"
   [
(set (match_operand:MVE_5 0 "s_register_operand" "=w")
-   (unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
+   (unspec:MVE_5 [(match_operand: 1 "immediate_operand" "i")]
 VMVNQ_N))
   ]
   "TARGET_HAVE_MVE"
-- 
2.25.1

[PATCH 05/13] arm: Add support for VPR_REG in arm_class_likely_spilled_p

2021-09-07 Thread Christophe Lyon via Gcc-patches

VPR_REG is the only register in its class, so it should be handled by
TARGET_CLASS_LIKELY_SPILLED_P.  No test fails without this patch, but
it seems it should be implemented.

2021-09-01  Christophe Lyon  

gcc/
* config/arm/arm.c (arm_class_likely_spilled_p): Handle VPR_REG.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 11dafc70067..1222cb0d0fe 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29307,6 +29307,9 @@ arm_class_likely_spilled_p (reg_class_t rclass)
   || rclass  == CC_REG)
 return true;
 
+  if (TARGET_HAVE_MVE && (rclass == VPR_REG))
+return true;
+
   return false;
 }
 
-- 
2.25.1

[PATCH 01/13] arm: Add new tests for comparison vectorization with Neon and MVE

2021-09-07 Thread Christophe Lyon via Gcc-patches

This patch mainly adds Neon tests similar to existing MVE ones,
to make sure we do not break Neon when fixing MVE.

mve-vcmp-f32-2.c is similar to mve-vcmp-f32.c but uses a conditional
with 2.0f and 3.0f constants to help scan-assembler-times.
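
For reference, the .word values matched by scan-assembler-times are
the IEEE-754 single-precision bit patterns of the constants.  The
small C helper below shows how to compute them; float_bits is a
hypothetical name used only for this illustration, and the sketch
assumes float is a 32-bit IEEE-754 type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the bit pattern of F as an unsigned
   32-bit integer, i.e. the number that shows up in a ".word"
   directive when the constant is placed in a literal pool.  */
static uint32_t
float_bits (float f)
{
  uint32_t bits;

  /* Assumption of this sketch: float is 32 bits wide.  */
  assert (sizeof bits == sizeof f);

  /* memcpy avoids the aliasing problems of a pointer cast.  */
  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

float_bits (2.0f) is 1073741824 (0x40000000) and float_bits (3.0f) is
1077936128 (0x40400000), which are the two values the new test scans
for.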

2021-09-01  Christophe Lyon 

gcc/testsuite/
* gcc.target/arm/simd/mve-vcmp-f32-2.c: New.
* gcc.target/arm/simd/neon-compare-1.c: New.
* gcc.target/arm/simd/neon-compare-2.c: New.
* gcc.target/arm/simd/neon-compare-3.c: New.
* gcc.target/arm/simd/neon-compare-scalar-1.c: New.
* gcc.target/arm/simd/neon-vcmp-f16.c: New.
* gcc.target/arm/simd/neon-vcmp-f32-2.c: New.
* gcc.target/arm/simd/neon-vcmp-f32-3.c: New.
* gcc.target/arm/simd/neon-vcmp-f32.c: New.
* gcc.target/arm/simd/neon-vcmp.c: New.

diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c 
b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
new file mode 100644
index 000..917a95bf141
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
@@ -0,0 +1,32 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+
+#include 
+
+#define NB 4
+
+#define FUNC(OP, NAME) \
+  void test_ ## NAME ##_f (float * __restrict__ dest, float *a, float *b) { \
+int i; \
+for (i=0; i, vcmpgt)
+FUNC(>=, vcmpge)
+
+/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 24 } } */ /* 
Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t1077936128\n} 24 } } */ /* 
Constant 3.0f.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c 
b/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
new file mode 100644
index 000..2e0222a71f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-additional-options "-O3" } */
+
+#include "mve-compare-1.c"
+
+/* 64-bit vectors.  */
+/* vmvn is used by 'ne' comparisons: 3 sizes * 2 (signed/unsigned) * 2
+   (register/zero) = 12.  */
+/* { dg-final { scan-assembler-times {\tvmvn\td[0-9]+, d[0-9]+\n} 12 } } */
+
+/* { 8 bits } x { eq, ne, lt, le, gt, ge }. */
+/* ne uses eq, lt/le only apply to comparison with zero, they use gt/ge
+   otherwise.  */
+/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, d[0-9]+\n} 
4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, #0\n} 4 } } 
*/
+/* { dg-final { scan-assembler-times {\tvclt.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } 
*/
+/* { dg-final { scan-assembler-times {\tvcle.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } 
*/
+/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 
2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } 
*/
+/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 
2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } 
*/
+
+/* { 16 bits } x { eq, ne, lt, le, gt, ge }. */
+/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, d[0-9]+\n} 
4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, #0\n} 4 } 
} */
+/* { dg-final { scan-assembler-times {\tvclt.s16\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcle.s16\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcgt.s16\td[0-9]+, d[0-9]+, d[0-9]+\n} 
2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s16\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcge.s16\td[0-9]+, d[0-9]+, d[0-9]+\n} 
2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s16\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+
+/* { 32 bits } x { eq, ne, lt, le, gt, ge }. */
+/* { dg-final { scan-assembler-times {\tvceq.i32\td[0-9]+, d[0-9]+, d[0-9]+\n} 
4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i32\td[0-9]+, d[0-9]+, #0\n} 4 } 
} */
+/* { dg-final { scan-assembler-times {\tvclt.s32\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcle.s32\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcgt.s32\td[0-9]+, d[0-9]+, d[0-9]+\n} 
2 } } */
+/* {

[PATCH 02/13] arm: Add tests for PR target/100757

2021-09-07 Thread Christophe Lyon via Gcc-patches

These tests currently trigger an ICE which is fixed later in the patch
series.

The pr100757*.c testcases are derived from
gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using
various types and return values different from 0 and 1 so that the
constants cannot be shared with boolean masks.  In addition, since we should not
need these masks, the tests make sure they are not present.

2021-09-01  Christophe Lyon  

gcc/testsuite/
PR target/100757
* gcc.target/arm/simd/pr100757-2.c: New.
* gcc.target/arm/simd/pr100757-3.c: New.
* gcc.target/arm/simd/pr100757-4.c: New.
* gcc.target/arm/simd/pr100757.c: New.

diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c 
b/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
new file mode 100644
index 000..c2262b4d81e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+float a[32];
+int fn1(int d) {
+  int c = 4;
+  for (int b = 0; b < 32; b++)
+if (a[b] != 2.0f)
+  c = 5;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 4 } } */ /* 
Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t4\n} 4 } } */ /* Initial value 
for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t5\n} 4 } } */ /* Possible value 
for c.  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c 
b/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
new file mode 100644
index 000..e604555c04c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+/* Copied from gcc.c-torture/compile/20160205-1.c.  */
+
+float a[32];
+float fn1(int d) {
+  float c = 4.0f;
+  for (int b = 0; b < 32; b++)
+if (a[b] != 2.0f)
+  c = 5.0f;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 4 } } */ /* Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t1084227584\n} 4 } } */ /* Possible value for c (5.0).  */
+/* { dg-final { scan-assembler-times {\t.word\t1082130432\n} 4 } } */ /* Initial value for c (4.0).  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c b/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
new file mode 100644
index 000..c12040c517f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+unsigned int a[32];
+int fn1(int d) {
+  int c = 2;
+  for (int b = 0; b < 32; b++)
+if (a[b])
+  c = 3;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t0\n} 4 } } */ /* 'false' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-times {\t.word\t2\n} 4 } } */ /* Initial value for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t3\n} 4 } } */ /* Possible value for c.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757.c b/gcc/testsuite/gcc.target/arm/simd/pr100757.c
new file mode 100644
index 000..41d6e4e2d7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+int a[32];
+int fn1(int d) {
+  int c = 2;
+  for (int b = 0; b < 32; b++)
+if (a[b])
+  c = 3;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t0\n} 4 } } */ /* 'false' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-times {\t.word\t2\n} 4 } } */ /* Initial value for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t3\n} 4 } } */ /* Possible value for c.  */
-- 
2.25.1

[PATCH 04/13] arm: Add GENERAL_AND_VPR_REGS regclass

2021-09-07 Thread Christophe Lyon via Gcc-patches

At some point during the development of this patch series, it appeared
that in some cases the register allocator wants “VPR or general”
rather than “VPR or general or FP” (which is the same thing as
ALL_REGS).  The series does not seem to require this anymore, but it
seems to be a good thing to do anyway, to give the register allocator
more freedom.
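
To illustrate why a narrower class is a distinct thing from ALL_REGS: register classes are sets of hard registers stored as bitmasks, and "VPR or general" is a strict subset of "VPR or general or FP". A rough C++ sketch in the spirit of REG_CLASS_CONTENTS; all mask values below are made up for illustration and are not the ones from arm.h, and RegSet, reg_union and is_subset are hypothetical helper names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One bit per hard register, packed into 32-bit words.
using RegSet = std::array<std::uint32_t, 4>;

// Union of two register sets.
RegSet reg_union(const RegSet& a, const RegSet& b) {
    RegSet r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = a[i] | b[i];
    return r;
}

// True when every register in 'a' is also in 'b'.
bool is_subset(const RegSet& a, const RegSet& b) {
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] & ~b[i])
            return false;
    return true;
}

void check_reg_classes() {
    // Hypothetical layouts, NOT the real ARM masks.
    const RegSet general{{0x0000ffffu, 0u, 0u, 0u}};
    const RegSet fp{{0xffff0000u, 0xffffffffu, 0u, 0u}};
    const RegSet vpr{{0u, 0u, 0u, 0x00000400u}};

    const RegSet general_and_vpr = reg_union(general, vpr);
    const RegSet all = reg_union(general_and_vpr, fp);

    assert(is_subset(vpr, general_and_vpr));
    assert(is_subset(general_and_vpr, all));
    assert(!is_subset(all, general_and_vpr));  // strictly smaller than "all"
}
```

With only ALL_REGS available as the common superclass, a pseudo that may live in either a general register or VPR would also be allowed in FP registers; the intermediate class lets that constraint be stated exactly.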

2021-09-01  Christophe Lyon  

gcc/
* config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise. Add VPR_REG to ALL_REGS.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 015299c1534..fab39d05916 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1286,6 +1286,7 @@ enum reg_class
   SFP_REG,
   AFP_REG,
   VPR_REG,
+  GENERAL_AND_VPR_REGS,
   ALL_REGS,
   LIM_REG_CLASSES
 };
@@ -1315,6 +1316,7 @@ enum reg_class
   "SFP_REG",   \
   "AFP_REG",   \
   "VPR_REG",   \
+  "GENERAL_AND_VPR_REGS", \
   "ALL_REGS"   \
 }
 
@@ -1343,7 +1345,8 @@ enum reg_class
   { 0x, 0x, 0x, 0x0040 }, /* SFP_REG */\
   { 0x, 0x, 0x, 0x0080 }, /* AFP_REG */\
   { 0x, 0x, 0x, 0x0400 }, /* VPR_REG.  */  \
-  { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS.  */ \
+  { 0x5FFF, 0x, 0x, 0x0400 }, /* GENERAL_AND_VPR_REGS.  */ \
+  { 0x7FFF, 0x, 0x, 0x040F }  /* ALL_REGS.  */ \
 }
 
 #define FP_SYSREGS \
-- 
2.25.1

[PATCH 03/13] arm: Add test for PR target/101325

2021-09-07 Thread Christophe Lyon via Gcc-patches

This test is derived from the one provided in the PR: it is a
compile-only test because I do not have access to anything that could
execute it.  We can switch it to 'dg-do run' later; however, it would
be better to write a new executable test to ensure coverage in case
the tester cannot execute such code (and it will need a new
arm_v8_1m_mve_hw or similar effective-target).
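
For context on what the scan-assembler patterns check: a 16-lane byte compare yields a 16-bit predicate with one bit per lane, and returning it as 'unsigned' requires a zero extension from 16 bits (hence the uxth). Below is a scalar C++ model of that behaviour, as I understand it; cmpeq_i8_mask and foo_model are illustrative names, not MVE intrinsics:

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of a 16-lane byte compare: bit i of the result is set
// when lane i of v equals lane i of w.
std::uint16_t cmpeq_i8_mask(const std::int8_t (&v)[16],
                            const std::int8_t (&w)[16]) {
    std::uint16_t mask = 0;
    for (int i = 0; i < 16; ++i)
        if (v[i] == w[i])
            mask = static_cast<std::uint16_t>(mask | (1u << i));
    return mask;
}

// Model of the test's foo(): the 16-bit predicate is widened to
// 'unsigned', which is a zero extension.
unsigned foo_model(const std::int8_t (&v)[16], const std::int8_t (&w)[16]) {
    return cmpeq_i8_mask(v, w);
}

void check_cmpeq_mask() {
    std::int8_t v[16] = {};
    std::int8_t w[16] = {};
    assert(foo_model(v, w) == 0xffffu);  // all lanes equal

    w[0] = 1;    // lane 0 now differs
    w[15] = -1;  // lane 15 now differs
    assert(foo_model(v, w) == 0x7ffeu);
}
```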

2021-09-01  Christophe Lyon  

gcc/testsuite/
PR target/101325
* gcc.target/arm/simd/pr101325.c: New.

diff --git a/gcc/testsuite/gcc.target/arm/simd/pr101325.c b/gcc/testsuite/gcc.target/arm/simd/pr101325.c
new file mode 100644
index 000..a466683a0b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr101325.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include 
+
+unsigned foo(int8x16_t v, int8x16_t w)
+{
+  return vcmpeqq (v, w);
+}
+/* { dg-final { scan-assembler {\tvcmp.i8  eq} } } */
+/* { dg-final { scan-assembler {\tvmrs\t r[0-9]+, P0} } } */
+/* { dg-final { scan-assembler {\tuxth} } } */
-- 
2.25.1

[PATCH 00/13] ARM/MVE use vectors of boolean for predicates

2021-09-07 Thread Christophe Lyon via Gcc-patches

This patch series addresses PR 100757 and 101325 by representing
vectors of predicates (MVE VPR.P0 register) as vectors of booleans
rather than using HImode.
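
To make the two representations concrete, here is a small C++ sketch for 32-bit lanes. It assumes the usual MVE layout in which P0 holds one bit per byte of the 128-bit vector, so each 32-bit lane owns four adjacent predicate bits; pack_pred32 and lane_active32 are hypothetical names used only for this illustration:

```cpp
#include <cassert>
#include <cstdint>

// View 1: a vector of four booleans, one per 32-bit lane.
// View 2: the same information as a 16-bit integer, as in HImode.
// Each true lane contributes four set bits (one per byte of the lane).
std::uint16_t pack_pred32(const bool (&lanes)[4]) {
    std::uint16_t p0 = 0;
    for (int i = 0; i < 4; ++i)
        if (lanes[i])
            p0 = static_cast<std::uint16_t>(p0 | (0xfu << (4 * i)));
    return p0;
}

// Recover the per-lane boolean from the 16-bit integer view.
bool lane_active32(std::uint16_t p0, int lane) {
    return ((p0 >> (4 * lane)) & 0xfu) == 0xfu;
}

void check_pred32() {
    const bool lanes[4] = {true, false, true, false};
    const std::uint16_t p0 = pack_pred32(lanes);
    assert(p0 == 0x0f0fu);
    for (int i = 0; i < 4; ++i)
        assert(lane_active32(p0, i) == lanes[i]);
}
```

The integer view hides the lane structure from the compiler; treating the predicate as a vector of booleans keeps the per-lane meaning visible, which is what the comparison and vcond_mask patterns need.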

As this implies a lot of mostly mechanical changes, I have tried to
split the patches in a way that should help reviewers, but the split
is a bit artificial.

Patches 1-3 add new tests.

Patches 4-6 are small independent improvements.

Patch 7 implements the predicate qualifier, but does not change any
builtin yet.

Patch 8 is the first of the two main patches, and uses the new
qualifier to describe the vcmp and vpsel builtins that are useful for
auto-vectorization of comparisons.

Patch 9 is the second main patch, which fixes the vcond_mask expander.

Patches 10-13 convert almost all the remaining builtins with HI
operands to use the predicate qualifier.  After these, there are still
a few builtins with HI operands left, about which I am not sure: vctp,
vpnot, load-gather and store-scatter with v2di operands.  In fact,
patches 11/12 update some STR/LDR qualifiers in a way that breaks
these v2di builtins although existing tests still pass.

Christophe Lyon (13):
  arm: Add new tests for comparison vectorization with Neon and MVE
  arm: Add tests for PR target/100757
  arm: Add test for PR target/101325
  arm: Add GENERAL_AND_VPR_REGS regclass
  arm: Add support for VPR_REG in arm_class_likely_spilled_p
  arm: Fix mve_vmvnq_n_ argument mode
  arm: Implement MVE predicates as vectors of booleans
  arm: Implement auto-vectorized MVE comparisons with vectors of boolean
predicates
  arm: Fix vcond_mask expander for MVE (PR target/100757)
  arm: Convert remaining MVE vcmp builtins to predicate qualifiers
  arm: Convert more MVE builtins to predicate qualifiers
  arm: Convert more load/store MVE builtins to predicate qualifiers
  arm: Convert more MVE/CDE builtins to predicate qualifiers

 gcc/config/arm/arm-builtins.c | 228 +++--
 gcc/config/arm/arm-modes.def  |   5 +
 gcc/config/arm/arm-protos.h   |   3 +-
 gcc/config/arm/arm-simd-builtin-types.def |   4 +
 gcc/config/arm/arm.c  | 128 ++-
 gcc/config/arm/arm.h  |   5 +-
 gcc/config/arm/arm_mve_builtins.def   | 746 
 gcc/config/arm/iterators.md   |   5 +
 gcc/config/arm/mve.md | 823 ++
 gcc/config/arm/neon.md|  39 +
 gcc/config/arm/vec-common.md  |  52 --
 gcc/simplify-rtx.c|   7 +
 .../arm/acle/cde-mve-full-assembly.c  | 264 +++---
 .../gcc.target/arm/simd/mve-vcmp-f32-2.c  |  32 +
 .../gcc.target/arm/simd/neon-compare-1.c  |  78 ++
 .../gcc.target/arm/simd/neon-compare-2.c  |  13 +
 .../gcc.target/arm/simd/neon-compare-3.c  |  14 +
 .../arm/simd/neon-compare-scalar-1.c  |  57 ++
 .../gcc.target/arm/simd/neon-vcmp-f16.c   |  12 +
 .../gcc.target/arm/simd/neon-vcmp-f32-2.c |  15 +
 .../gcc.target/arm/simd/neon-vcmp-f32-3.c |  12 +
 .../gcc.target/arm/simd/neon-vcmp-f32.c   |  12 +
 gcc/testsuite/gcc.target/arm/simd/neon-vcmp.c |  22 +
 .../gcc.target/arm/simd/pr100757-2.c  |  20 +
 .../gcc.target/arm/simd/pr100757-3.c  |  20 +
 .../gcc.target/arm/simd/pr100757-4.c  |  19 +
 gcc/testsuite/gcc.target/arm/simd/pr100757.c  |  19 +
 gcc/testsuite/gcc.target/arm/simd/pr101325.c  |  14 +
 28 files changed, 1581 insertions(+), 1087 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-compare-scalar-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f16.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp-f32.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/neon-vcmp.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr100757.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/pr101325.c

-- 
2.25.1

[PATCH] tree-optimization/101555 - avoid redundant alias queries in PRE

2021-09-07 Thread Richard Biener via Gcc-patches

This avoids doing redundant work during PHI translation to invalidate
mems when translating their corresponding VUSE through the block's
virtual PHI node.  All the invalidation work is already done by
prune_clobbered_mems.

This speeds up the compile of the testcase from 275s with PRE
taking 91% of the compile-time down to 43s with PRE taking 16%
of the compile-time.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2021-09-07  Richard Biener  

PR tree-optimization/101555
* tree-ssa-pre.c (translate_vuse_through_block): Do not
perform an alias walk to determine the validity of the
mem at the start of the block which is already guaranteed
by means of prune_clobbered_mems.
(phi_translate_1): Pass edge to translate_vuse_through_block.
---
 gcc/tree-ssa-pre.c | 97 ++
 1 file changed, 37 insertions(+), 60 deletions(-)

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 769aadb2315..08755847f66 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -1237,21 +1237,18 @@ fully_constant_expression (pre_expr e)
   return e;
 }
 
-/* Translate the VUSE backwards through phi nodes in PHIBLOCK, so that
-   it has the value it would have in BLOCK.  Set *SAME_VALID to true
+/* Translate the VUSE backwards through phi nodes in E->dest, so that
+   it has the value it would have in E->src.  Set *SAME_VALID to true
in case the new vuse doesn't change the value id of the OPERANDS.  */
 
 static tree
 translate_vuse_through_block (vec operands,
  alias_set_type set, alias_set_type base_set,
- tree type, tree vuse,
- basic_block phiblock,
- basic_block block, bool *same_valid)
+ tree type, tree vuse, edge e, bool *same_valid)
 {
+  basic_block phiblock = e->dest;
   gimple *phi = SSA_NAME_DEF_STMT (vuse);
   ao_ref ref;
-  edge e = NULL;
-  bool use_oracle;
 
   if (same_valid)
 *same_valid = true;
@@ -1259,59 +1256,40 @@ translate_vuse_through_block (vec operands,
   if (gimple_bb (phi) != phiblock)
 return vuse;
 
-  unsigned int cnt = param_sccvn_max_alias_queries_per_access;
-  use_oracle = ao_ref_init_from_vn_reference (, set, base_set,
- type, operands);
-
-  /* Use the alias-oracle to find either the PHI node in this block,
- the first VUSE used in this block that is equivalent to vuse or
- the first VUSE which definition in this block kills the value.  */
-  if (gimple_code (phi) == GIMPLE_PHI)
-e = find_edge (block, phiblock);
-  else if (use_oracle)
-while (cnt > 0
-  && !stmt_may_clobber_ref_p_1 (phi, ))
-  {
-   --cnt;
-   vuse = gimple_vuse (phi);
-   phi = SSA_NAME_DEF_STMT (vuse);
-   if (gimple_bb (phi) != phiblock)
- return vuse;
-   if (gimple_code (phi) == GIMPLE_PHI)
- {
-   e = find_edge (block, phiblock);
-   break;
- }
-  }
-  else
-return NULL_TREE;
-
-  if (e)
+  /* We have pruned expressions that are killed in PHIBLOCK via
+ prune_clobbered_mems but we have not rewritten the VUSE to the one
+ live at the start of the block.  If there is no virtual PHI to translate
+ through return the VUSE live at entry.  Otherwise the VUSE to translate
+ is the def of the virtual PHI node.  */
+  phi = get_virtual_phi (phiblock);
+  if (!phi)
+return BB_LIVE_VOP_ON_EXIT
+(get_immediate_dominator (CDI_DOMINATORS, phiblock));
+
+  if (same_valid
+  && ao_ref_init_from_vn_reference (, set, base_set, type, operands))
 {
-  if (use_oracle && same_valid)
-   {
- bitmap visited = NULL;
- /* Try to find a vuse that dominates this phi node by skipping
-non-clobbering statements.  */
- vuse = get_continuation_for_phi (phi, , true,
-  cnt, , false, NULL, NULL);
- if (visited)
-   BITMAP_FREE (visited);
-   }
-  else
-   vuse = NULL_TREE;
-  /* If we didn't find any, the value ID can't stay the same.  */
-  if (!vuse && same_valid)
-   *same_valid = false;
-  /* ??? We would like to return vuse here as this is the canonical
- upmost vdef that this reference is associated with.  But during
-insertion of the references into the hash tables we only ever
-directly insert with their direct gimple_vuse, hence returning
-something else would make us not find the other expression.  */
-  return PHI_ARG_DEF (phi, e->dest_idx);
+  bitmap visited = NULL;
+  /* Try to find a vuse that dominates this phi node by skipping
+non-clobbering statements.  */
+  unsigned int cnt = param_sccvn_max_alias_queries_per_access;
+  vuse = get_continuation_for_phi (phi, , true,
+

Re: [Patch] libgomp.texi: Add OpenMP Implementation Status

2021-09-07 Thread Jakub Jelinek via Gcc-patches

On Mon, Sep 06, 2021 at 06:48:25PM +0200, Tobias Burnus wrote:
> Hi Jakub, hi all,
> 
> as the issue came up from time to time, I thought it
> makes sense to add an implementation status.
> 
> I settled on putting it into libgomp.texi instead of
> a webpage or another location.
> 
> The 5.1 section should be fine – except for additional
> items. OpenMP 5.0 has stub wordings and needs to be filled.
> The version in the patch should be sufficient for now
> but it would be nice to have it completed before GCC 12.
> (Or even before the BoF in two weeks or OpenMP Con in
> a week, hmm.)
> 
> OK? Comments? Typo/wording fixes?
> 
> Tobias
> 
> PS: Besides the OpenMP implementation status, also:
> * implementation details/choices could be added.
> * some hints regarding tricks & tips related to OpenMP
>   and in particular offloading
>   (e.g. -foffload-options=-latomic might be needed for nvptx*)
> * user-friendlier documentation how to build the offload compiler.
>   (The offloading wiki page has too many internals and
>   lacks a step-by-step guide.)
> Not all should be in libgomp.texi, however.
> 
> (* Side note: it is mentioned indirectly: as example in
> https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-foffload )
> 
> Currently, the OpenMP/offloading documentation is spread over:
> 
> * libgomp: https://gcc.gnu.org/onlinedocs/libgomp/
> * gcc: https://gcc.gnu.org/onlinedocs/gcc/
>   flags (-fopenmp(-simd), -foffload(-options=), ...)
> * gfortran: OpenMP + OpenMP Modules omp_lib and omp_lib_kinds
>   (and likewise for OpenACC) https://gcc.gnu.org/onlinedocs/gfortran/
> * Project page: https://gcc.gnu.org/projects/gomp/
> * Wiki page: https://gcc.gnu.org/wiki/openmp
>   (likewise: https://gcc.gnu.org/wiki/OpenACC
>   and also: https://gcc.gnu.org/wiki/OpenACC/Implementation%20Status )
> * Offloading: https://gcc.gnu.org/wiki/Offloading
>   plus https://gcc.gnu.org/wiki/nvptx
>   plus configure-time options at https://gcc.gnu.org/install/configure.html
> 
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955

> libgomp.texi: Add OpenMP Implementation Status
> 
> libgomp/
>   * libgomp.texi (Enabling OpenMP): Refer to OMP spec in general
>   not to 4.5; link to new section.
>   (OpenMP Implementation Status): New.

Ok.  I'll try to provide the 5.0 implementation status soon.

Jakub

Re: [Patch] C, C++, Fortran, OpenMP: Add support for 'flush seq_cst' construct

2021-09-07 Thread Jakub Jelinek via Gcc-patches

On Mon, Sep 06, 2021 at 06:08:14PM +0200, Marcel Vollweiler wrote:
> C, C++, Fortran, OpenMP: Add support for 'flush seq_cst' construct.
> 
> This patch adds support for the 'seq_cst' memory order clause on the 'flush'
> directive which was introduced in OpenMP 5.1.
> 
> gcc/c-family/ChangeLog:
> 
>   * c-omp.c (c_finish_omp_flush): Handle MEMMODEL_SEQ_CST.
> 
> gcc/c/ChangeLog:
> 
>   * c-parser.c (c_parser_omp_flush): Parse 'seq_cst' clause on 'flush' 
>   directive.
> 
> gcc/cp/ChangeLog:
> 
>   * parser.c (cp_parser_omp_flush): Parse 'seq_cst' clause on 'flush'
>   directive.
>   * semantics.c (finish_omp_flush): Handle MEMMODEL_SEQ_CST.
> 
> gcc/fortran/ChangeLog:
> 
>   * openmp.c (gfc_match_omp_flush): Parse 'seq_cst' clause on 'flush'
> directive.
>   * trans-openmp.c (gfc_trans_omp_flush): Handle OMP_MEMORDER_SEQ_CST.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/gomp/flush-1.c: Add test case for 'seq_cst'.
>   * c-c++-common/gomp/flush-2.c: Add test case for 'seq_cst'.
>   * g++.dg/gomp/attrs-1.C:  Adapt test to handle all flush clauses.
>   * gfortran.dg/gomp/flush-1.f90:  Add test case for 'seq_cst'.
>   * gfortran.dg/gomp/flush-2.f90:  Add test case for 'seq_cst'.

Just single space after :, not two.

> --- a/gcc/testsuite/g++.dg/gomp/attrs-1.C
> +++ b/gcc/testsuite/g++.dg/gomp/attrs-1.C
> @@ -528,6 +528,12 @@ bar (int d, int m, int i1, int i2, int i3, int p, int 
> *idp, int s,
>;
>[[omp::directive (flush acq_rel)]]
>;
> +  [[omp::directive (flush acquire)]]
> +  ;
> +  [[omp::directive (flush release)]]
> +  ;
> +  [[omp::directive (flush seq_cst)]]
> +  ;
>[[omp::directive (flush (p, f))]]
>;
>[[omp::directive (simd

When changing attrs-1.C, please also add corresponding change to attrs-2.C
too, with commas after the directive name (i.e. flush, acquire etc.).

Ok for trunk with those nits fixed, thanks.

Jakub

Re: [PATCH] PR fortran/93834 - [9/10/11/12 Regression] ICE in trans_caf_is_present, at fortran/trans-intrinsic.c:8469

2021-09-07 Thread Tobias Burnus


Hi Harald,

I spent about two hours on this yesterday. I am still tired,
but I understand more now. I think the confusion between the
two of us is due to wording and the directions in which our
thoughts then go:


Talking about coindexed, all of a[i], b[i]%c and c%d[i] are
coindexed and there are many constraints like "shall not be
a coindexed variable" – which then rejects all of those.
That's what I was thinking of.

I think your starting point is that while ('a' = allocatable)
  a, b%a, c[5]%d(1)%a
are ALLOCATABLE, adding a subobject reference such as
  a(:), b%a(:,:), c[5]%d(1)%a(:,:,:)
makes the variable no longer allocatable.
I think that's what you were thinking of.

We then both argued along those different lines, which caused the
confusion, as we both thought we were talking about the same thing.


While those cases are clear, the question is whether
  a[i] or b%a[i]
is allocatable or not – assuming that 'a' is a scalar.
(For an array, '(:)' has to appear before the image-selector,
which in turn makes it nonallocatable.)


I tried to pinpoint the words for this in the standard – and
failed. I think I need a "how to read the Fortran standard" 101
and some long time actually reading it :-(

Malcolm has answered me – and he believes (but only offhand) that
  a[i]  and  b%a[i]
_are_ allocatable. See (6) at
https://mailman.j3-fortran.org/pipermail/j3/2021-September/013322.html


This implies that
  if ( allocated (a[i]) .and. allocated (b%a[i]) ) stop 1
is valid.

However, I do note that coarray allocatables have to be collectively
(de)allocated, therefore
  allocated (a[i]) .and. allocated (b%a[i])
is equivalent to
  allocated (a) .and. allocated (b%a)
at least assuming that no image has failed.


First: Does this answer all the questions you had and resolve the
confusion?
Secondly, do you agree about the last bits of the analysis?
Thirdly, what do you think of the attached patch?

Tobias

Fortran: Handle allocated() with coindexed scalars [PR93834]

2021-09-07  Harald Anlauf  
	Tobias Burnus  

While for an allocatable 'array', 'array(:)' and 'array(:)[1]' are
not allocatable, it is believed that not only 'scalar' but also
'scalar[1]' is allocatable.  However, coarrays are collectively
established/allocated; thus, 'allocated(scalar[i])' is equivalent
to 'allocated(scalar)'. [At least when assuming that 'i' does not
refer to a failed image.]

	PR fortran/93834
gcc/fortran/ChangeLog:

	* trans-intrinsic.c (gfc_conv_allocated): Cleanup. Handle
	coindexed scalar coarrays.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray_allocated.f90: New test.

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 46670baae55..6a7a86d245a 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -8887,50 +8887,64 @@ caf_this_image_ref (gfc_ref *ref)
 static void
 gfc_conv_allocated (gfc_se *se, gfc_expr *expr)
 {
-  gfc_actual_arglist *arg1;
   gfc_se arg1se;
   tree tmp;
-  symbol_attribute caf_attr;
+  bool coindexed_caf_comp = false;
 
-  gfc_init_se (, NULL);
-  arg1 = expr->value.function.actual;
+  expr = expr->value.function.actual->expr;
 
-  if (arg1->expr->ts.type == BT_CLASS)
+  gfc_init_se (, NULL);
+  if (expr->ts.type == BT_CLASS)
 {
   /* Make sure that class array expressions have both a _data
 	 component reference and an array reference  */
-  if (CLASS_DATA (arg1->expr)->attr.dimension)
-	gfc_add_class_array_ref (arg1->expr);
+  if (CLASS_DATA (expr)->attr.dimension)
+	gfc_add_class_array_ref (expr);
   /*  whilst scalars only need the _data component.  */
   else
-	gfc_add_data_component (arg1->expr);
+	gfc_add_data_component (expr);
 }
 
-  /* When arg1 references an allocatable component in a coarray, then call
+  /* When expr references an allocatable component in a coarray, then call
  the caf-library function caf_is_present ().  */
-  if (flag_coarray == GFC_FCOARRAY_LIB && arg1->expr->expr_type == EXPR_FUNCTION
-  && arg1->expr->value.function.isym
-  && arg1->expr->value.function.isym->id == GFC_ISYM_CAF_GET)
-caf_attr = gfc_caf_attr (arg1->expr->value.function.actual->expr);
-  else
-gfc_clear_attr (_attr);
-  if (flag_coarray == GFC_FCOARRAY_LIB && caf_attr.codimension
-  && !caf_this_image_ref (arg1->expr->value.function.actual->expr->ref))
-tmp = trans_caf_is_present (se, arg1->expr->value.function.actual->expr);
+  if (flag_coarray == GFC_FCOARRAY_LIB && expr->expr_type == EXPR_FUNCTION
+  && expr->value.function.isym
+  && expr->value.function.isym->id == GFC_ISYM_CAF_GET)
+{
+  expr = expr->value.function.actual->expr;
+  if (caf_this_image_ref (expr->ref))
+	coindexed_caf_comp = false;  /* Local access.

[PATCH] c++: Fix up constexpr evaluation of deleting dtors [PR100495]

2021-09-07 Thread Jakub Jelinek via Gcc-patches

Hi!

We do not save bodies of constexpr clones and instead evaluate the bodies
of the constexpr functions they were cloned from.
I believe that is just fine for constructors because complete vs. base
ctors differ only in classes that have virtual bases and such constructors
aren't constexpr, similarly complete/base destructors.
But as the testcase below shows, for deleting destructors it is not fine:
deleting dtors, while marked as clones, are in fact just artificial functions
with a synthesized body which calls the user destructor and the deallocation
function.

So, either we'd need to evaluate the destructor and afterwards synthesize
and evaluate the deallocation, or we can just save and use the deleting
dtors' bodies.  The latter seems much easier to me.
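
As a runtime illustration of the two steps a deleting destructor performs (this is ordinary C++ behaviour, not a model of the constexpr evaluator), the sketch below records the order of calls when deleting through a base pointer. The trace string and the type names are made up for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Records the order in which things happen during 'delete p'.
static std::string trace;

struct S {
    virtual ~S() { trace += "S-dtor;"; }

    // Class-specific allocation functions so the deallocation is observable.
    static void* operator new(std::size_t n) { return ::operator new(n); }
    static void operator delete(void* p) {
        trace += "dealloc;";
        ::operator delete(p);
    }
};

struct T : S {
    ~T() override { trace += "T-dtor;"; }
};

void check_deleting_dtor() {
    trace.clear();
    S* p = new T;
    delete p;  // virtual dispatch to T's deleting destructor
    // First the user-written destructors run, then the deallocation.
    assert(trace == "T-dtor;S-dtor;dealloc;");
}
```

The deallocation step exists only in the deleting variant, so evaluating the body of the function it was cloned from would run the destructors but never free the object.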

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/11.3?

2021-09-07  Jakub Jelinek  

PR c++/100495
* constexpr.c (maybe_save_constexpr_fundef): Save body even for
constexpr deleting dtors.
(cxx_eval_call_expression): Don't use DECL_CLONED_FUNCTION for
deleting dtors.

* g++.dg/cpp2a/constexpr-new21.C: New test.

--- gcc/cp/constexpr.c.jj   2021-09-01 11:37:41.889557426 +0200
+++ gcc/cp/constexpr.c  2021-09-06 13:05:04.098652914 +0200
@@ -865,7 +865,7 @@ maybe_save_constexpr_fundef (tree fun)
   if (processing_template_decl
   || !DECL_DECLARED_CONSTEXPR_P (fun)
   || cp_function_chain->invalid_constexpr
-  || DECL_CLONED_FUNCTION_P (fun))
+  || (DECL_CLONED_FUNCTION_P (fun) && !DECL_DELETING_DESTRUCTOR_P (fun)))
 return;
 
   if (!is_valid_constexpr_fn (fun, !DECL_GENERATED_P (fun)))
@@ -2372,7 +2372,7 @@ cxx_eval_call_expression (const constexp
   *non_constant_p = true;
   return t;
 }
-  if (DECL_CLONED_FUNCTION_P (fun))
+  if (DECL_CLONED_FUNCTION_P (fun) && !DECL_DELETING_DESTRUCTOR_P (fun))
 fun = DECL_CLONED_FUNCTION (fun);
 
   if (is_ubsan_builtin_p (fun))
--- gcc/testsuite/g++.dg/cpp2a/constexpr-new21.C.jj 2021-09-06 13:09:59.326484091 +0200
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-new21.C 2021-09-06 13:08:46.574511401 +0200
@@ -0,0 +1,17 @@
+// PR c++/100495
+// { dg-do compile { target c++20 } }
+
+struct S {
+  constexpr virtual ~S () {}
+};
+
+constexpr bool
+foo ()
+{
+  S *p = new S ();
+  delete p;
+  return true;
+}
+
+constexpr bool x = foo ();
+static_assert (x);

Jakub

Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.

2021-09-07 Thread Jakub Jelinek via Gcc-patches

On Tue, Sep 07, 2021 at 09:52:57AM +0800, Hongtao Liu wrote:
> Adjust the wording for x86 _Float16 type.
> 
> gcc/ChangeLog:
> 
> * doc/extend.texi: (@node Floating Types): Adjust the wording.
> (@node Half-Precision): Ditto.
> 
> 1 file changed, 15 insertions(+), 13 deletions(-)
> gcc/doc/extend.texi | 28 +++-
> 
> modified   gcc/doc/extend.texi
> @@ -1076,9 +1076,10 @@ systems where @code{__float128} is supported.
> The @code{_Float32}
>  type is supported on all systems supporting IEEE binary32; the
>  @code{_Float64} and @code{_Float32x} types are supported on all systems
>  supporting IEEE binary64.  The @code{_Float16} type is supported on AArch64
> -systems by default, and on ARM systems when the IEEE format for 16-bit
> -floating-point types is selected with @option{-mfp16-format=ieee}.
> -GCC does not currently support @code{_Float128x} on any systems.
> +systems by default when the IEEE format for 16-bit floating-point types is

The AArch64 case now has the ARM case restriction and ARM is lost.  It
should be

+systems by default, on ARM systems when the IEEE format for 16-bit
+floating-point-types is

> +selected with @option{-mfp16-format=ieee} and, for both C and C++, on x86
> +systems with SSE2 enabled. GCC does not currently support
> +@code{_Float128x} on any systems.
> 
>  On the i386, x86_64, IA-64, and HP-UX targets, you can declare complex
>  types using the corresponding internal complex type, @code{XCmode} for
> @@ -1108,6 +1109,12 @@ On ARM and AArch64 targets, GCC supports
> half-precision (16-bit) floating
>  point via the @code{__fp16} type defined in the ARM C Language Extensions.
>  On ARM systems, you must enable this type explicitly with the
>  @option{-mfp16-format} command-line option in order to use it.
> +On x86 targets with SSE2 enabled, GCC supports half-precision (16-bit)
> +floating point via the @code{_Float16} type, there are many ways to enable
> +SSE2, @option{-msse2, -mavx, -mavx512f, ...} on the command line, or various
> +target attributes.

The ", there are many ways ... attributes" was just meant as explanation for
the "with SSE2 enabled" wording, not something that should be literally in
the documentation.  It is documented elsewhere...

> +For C++, x86 provides a builtin type named @code{_Float16} which contains
> +same data format as C.
> 
>  ARM targets support two incompatible representations for half-precision
>  floating-point values.  You must choose one of the representations and
> @@ -1151,16 +1158,11 @@ calls.
>  It is recommended that portable code use the @code{_Float16} type defined
>  by ISO/IEC TS 18661-3:2015.  @xref{Floating Types}.
> 
> -On x86 targets with @code{target("sse2")} and above, GCC supports
> half-precision
> -(16-bit) floating point via the @code{_Float16} type which is defined by
> -18661-3:2015. For C++, x86 provide a builtin type named @code{_Float16}
> -which contains same data format as C.
> -
> -Without @option{-mavx512fp16}, @code{_Float16} type is storage only, all
> -operations will be emulated by software emulation and the @code{float}
> -instructions. The default behavior for @code{FLT_EVAL_METHOD} is to keep
> -the intermediate result of the operation as 32-bit precision. This may lead
> -to inconsistent behavior between software emulation and AVX512-FP16
> +On x86 targets, without @option{-mavx512fp16}, @code{_Float16} type is
> +storage only, all operations will be emulated by software emulation and the
> +@code{float} instructions. The default behavior for @code{FLT_EVAL_METHOD} is
> +to keep the intermediate result of the operation as 32-bit precision. This 
> may
> +lead to inconsistent behavior between software emulation and AVX512-FP16
>  instructions.
> 
>  @node Decimal Float

Jakub

[PATCH] Fix SFmode subreg of DImode and TImode

2021-09-07 Thread Michael Meissner via Gcc-patches

[PATCH] Fix SFmode subreg of DImode and TImode

This patch fixes the breakage in the PowerPC due to a recent change in SUBREG
behavior.  While it is arguable that the patch that caused the breakage should
be reverted, this patch should be a bandage to prevent these changes from
happening again.

I first noticed it in building the Spec 2017 wrf_r and blender_r
benchmarks.  Once I applied this patch, I also noticed several of the
tests now pass.

The core of the problem is we need to treat SUBREG's of SFmode and SImode
specially on the PowerPC.  This is due to the fact that SFmode values that are
in the vector and floating point registers are represented as DFmode.  When we
want to do a direct move between the GPR registers and the vector registers, we
have to convert the value from the DFmode representation to/from the SFmode
representation.

By doing this special processing instead of doing the transfer via store and
load, we were able to speed up the math library, which at times wants to use
SFmode values in a union, do logical operations on them (to test exponent
ranges, etc.) and then move them over to use as floating point values.
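
The kind of source code that produces these SFmode/SImode moves looks roughly like the sketch below: a float is reinterpreted as an integer, inspected or modified with integer logic, and moved back. This is only an illustration of the pattern, not code from any math library; it uses memcpy rather than a union because union type punning is not well defined in C++, and float_exponent and abs_via_bits are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Unbiased exponent of a normal, finite 32-bit IEEE float, read from
// the integer view of the value.
int float_exponent(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<int>((bits >> 23) & 0xffu) - 127;
}

// Clear the sign bit with an integer AND, then move the result back
// to the floating point side.
float abs_via_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    bits &= 0x7fffffffu;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

void check_float_int_moves() {
    assert(float_exponent(1.0f) == 0);
    assert(float_exponent(8.0f) == 3);
    assert(float_exponent(0.5f) == -1);
    assert(abs_via_bits(-2.5f) == 2.5f);
}
```

Each memcpy here corresponds to a value crossing between the integer and floating point views; on PowerPC that crossing is where the conversion between the SFmode and DFmode representations has to happen.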

I did a bootstrap build on a little endian power9 system with and without the
patch applied.  There was no regression in the tests.  I'm doing a build on a
big endian power8 system, but it hasn't finished yet as I send this email.  I
will check on the big endian progress tomorrow morning.

The following tests now pass once again with the patch.

C tests:

gcc.c-torture/compile/20071102-1.c
gcc.c-torture/compile/pr55921.c
gcc.c-torture/compile/pr85945.c
gcc.c-torture/execute/complex-3.c
gcc.dg/atomic/c11-atomic-exec-1.c
gcc.dg/atomic/c11-atomic-exec-2.c
gcc.dg/atomic/c11-atomic-exec-4.c
gcc.dg/atomic/c11-atomic-exec-5.c
gcc.dg/c11-atomic-2.c
gcc.dg/pr42475.c
gcc.dg/pr47201.c
gcc.dg/pr48335-1.c
gcc.dg/torture/pr67741.c
gcc.dg/tree-ssa/ssa-dom-thread-10.c
gcc.dg/tsan/pr88030.c
gcc.dg/ubsan/float-cast-overflow-atomic.c
gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c

C++ tests:
==
g++.dg/opt/alias1.C
g++.dg/template/koenig6.C
g++.dg/torture/pr40924.C
tmpdir-g++.dg-struct-layout-1/t001

Fortran tests:
==
gfortran.dg/array_constructor_type_22.f03
gfortran.dg/array_function_6.f90
gfortran.dg/derived_comp_array_ref_7.f90
gfortran.dg/elemental_scalar_args_1.f90
gfortran.dg/elemental_subroutine_1.f90
gfortran.dg/inline_matmul_5.f90
gfortran.dg/inline_matmul_8.f90
gfortran.dg/inline_matmul_9.f90
gfortran.dg/matmul_bounds_6.f90
gfortran.dg/operator_1.f90
gfortran.dg/past_eor.f90
gfortran.dg/pr101121.f
gfortran.dg/pr91552.f90
gfortran.dg/spread_shape_1.f90
gfortran.dg/typebound_operator_3.f03
gfortran.dg/value_1.f90
gfortran.fortran-torture/execute/entry_4.f90
gfortran.fortran-torture/execute/intrinsic_dotprod.f90
gfortran.fortran-torture/execute/intrinsic_matmul.f90

Can I check this fix into the master branch?

2021-09-06  Michael Meissner  

gcc/

* config/rs6000/rs6000.c (rs6000_emit_move_si_sf_subreg): Deal
with SUBREGs of TImode and DImode.
---
 gcc/config/rs6000/rs6000.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b9ebd56c993..7bbf29a3e1c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -10942,6 +10942,16 @@ rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode)
  return true;
}
 
+  /* In case we are given a SUBREG for a larger type, reduce it to
+SImode.  */
+  if (mode == SFmode && GET_MODE_SIZE (inner_mode) > 4)
+   {
+ rtx tmp = gen_reg_rtx (SImode);
+ emit_move_insn (tmp, gen_lowpart (SImode, source));
+ emit_insn (gen_movsf_from_si (dest, tmp));
+ return true;
+   }
+
   if (mode == SFmode && inner_mode == SImode)
{
  emit_insn (gen_movsf_from_si (dest, inner_source));
-- 
2.31.1
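
For readers less familiar with the RTL helpers, the new hunk can be modeled
loosely in plain C as below.  This is only an analogy under stated
assumptions: sf_from_lowpart is a made-up name, a uint64_t stands in for a
DImode source, and the memcpy stands in for the bit-preserving move into an
SFmode value; it does not model the DFmode register format.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the added code path: take the low-order 32 bits of
   a wider integer (compare gen_lowpart (SImode, source)) and reinterpret
   those bits as a float (compare the SImode to SFmode move).  */
static float
sf_from_lowpart (uint64_t wide)
{
  uint32_t low = (uint32_t) wide;   /* keep only the least significant half */
  float result;
  memcpy (&result, &low, sizeof result);
  return result;
}
```

The upper bits of the wider value are ignored, so any 64-bit value whose low
half is 0x3FC00000 yields 1.5f.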


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
