[PING^2][PATCH] rs6000: load high and low part of 128bit vector independently [PR110040]

2024-03-25 Thread jeevitha
Ping!

Please review.

Thanks & Regards
Jeevitha


On 26/02/24 11:13 am, jeevitha wrote:
> Hi All,
> 
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
> 
> PR110040 exposes an issue concerning moves from vector registers to GPRs.
> There are two moves, one for the upper 64 bits and one for the lower
> 64 bits.  In the problematic test case, we are only interested in storing
> the lower 64 bits.  However, the instruction that copies the upper 64 bits
> is still emitted even though it is dead code.  This patch adds a splitter
> that splits the combined move into two separate move instructions so that
> DCE can remove the dead one after splitting.
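A minimal C model of what the splitter changes, assuming the 128-bit source register can be viewed as two 64-bit elements and the destination as a pair of GPRs (REGNO and REGNO + 1). The types and functions below are hypothetical illustrations, not GCC internals: before the split, one move always writes both GPRs; after it, each GPR is written by its own independent extract (as with the two vsx_extract_v2di calls), so an extract whose result is never used is ordinary dead code.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit VSX register as two 64-bit elements.  */
typedef struct { uint64_t elt[2]; } v2di_model;

/* Hypothetical model of the destination GPR pair (REGNO, REGNO + 1).  */
typedef struct { uint64_t first, second; } gpr_pair_model;

/* Before the split: a single V1TI move that always writes both GPRs,
   even when only one of them is used afterwards.  */
static gpr_pair_model
move_v1ti_unsplit (v2di_model src)
{
  gpr_pair_model dst = { src.elt[0], src.elt[1] };
  return dst;
}

/* After the split: each GPR gets its own 64-bit extract, mirroring the
   two vsx_extract_v2di calls with index 0 and index 1.  */
static uint64_t
extract_v2di_model (v2di_model src, int idx)
{
  return src.elt[idx];
}

/* Only element 0 is stored, so only one extract is needed; the extract
   for element 1 would have no uses, which is what DCE can delete.  */
static void
store_element0 (uint64_t *dst, v2di_model src)
{
  *dst = extract_v2di_model (src, 0);
}
```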
> 
> 2024-02-26  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/110040
>   * config/rs6000/vsx.md (split pattern for V1TI to DI move): Defined.
> 
> gcc/testsuite/
>   PR target/110040
>   * gcc.target/powerpc/pr110040-1.c: New testcase.
>   * gcc.target/powerpc/pr110040-2.c: New testcase.
> 
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 6111cc90eb7..78457f8fb14 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -6706,3 +6706,19 @@
>"vmsumcud %0,%1,%2,%3"
>[(set_attr "type" "veccomplex")]
>  )
> +
> +(define_split
> +  [(set (match_operand:V1TI 0 "int_reg_operand")
> +   (match_operand:V1TI 1 "vsx_register_operand"))]
> +  "reload_completed
> +   && TARGET_DIRECT_MOVE_64BIT"
> +   [(pc)]
> +{
> +  rtx op0 = gen_rtx_REG (DImode, REGNO (operands[0]));
> +  rtx op1 = gen_rtx_REG (V2DImode, REGNO (operands[1]));
> +  rtx op2 = gen_rtx_REG (DImode, REGNO (operands[0]) + 1);
> +  rtx op3 = gen_rtx_REG (V2DImode, REGNO (operands[1]));
> +  emit_insn (gen_vsx_extract_v2di (op0, op1, GEN_INT (0)));
> +  emit_insn (gen_vsx_extract_v2di (op2, op3, GEN_INT (1)));
> +  DONE;
> +})
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr110040-1.c b/gcc/testsuite/gcc.target/powerpc/pr110040-1.c
> new file mode 100644
> index 000..fb3bd254636
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr110040-1.c
> @@ -0,0 +1,14 @@
> +/* PR target/110040 */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
> +
> +#include 
> +
> +void
> +foo (signed long *dst, vector signed __int128 src)
> +{
> +  *dst = (signed long) src[0];
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr110040-2.c b/gcc/testsuite/gcc.target/powerpc/pr110040-2.c
> new file mode 100644
> index 000..f3aa22be4e8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr110040-2.c
> @@ -0,0 +1,13 @@
> +/* PR target/110040 */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
> +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
> +
> +#include 
> +
> +void
> +foo (signed int *dst, vector signed __int128 src)
> +{
> +  __builtin_vec_xst_trunc (src, 0, dst);
> +}
> 
> 


Re: [PATCH] sanitizer: [PR110027] Align asan_vec[0] to MAX (alignb, ASAN_RED_ZONE_SIZE)

2024-03-25 Thread Hongtao Liu
On Tue, Mar 26, 2024 at 11:26 AM Hongtao Liu  wrote:
>
> On Mon, Mar 25, 2024 at 8:51 PM Jakub Jelinek  wrote:
> >
> > On Tue, Mar 12, 2024 at 07:57:59PM +0800, liuhongt wrote:
> > > If alignb > ASAN_RED_ZONE_SIZE and offset[0] is not a multiple of
> > > alignb, (base_align_bias - base_offset) may not be aligned to alignb,
> > > which causes a segmentation fault.
> > >
> > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > > Ok for trunk and backport to GCC13?
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR sanitizer/110027
> > >   * cfgexpand.cc (expand_stack_vars): Align frame offset to
> > >   MAX (alignb, ASAN_RED_ZONE_SIZE).
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * g++.dg/asan/pr110027.C: New test.
> > > ---
> > >  gcc/cfgexpand.cc |  2 +-
> > >  gcc/testsuite/g++.dg/asan/pr110027.C | 20 
> > >  2 files changed, 21 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/g++.dg/asan/pr110027.C
> > >
> > > diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> > > index 0de299c62e3..92062378d8e 100644
> > > --- a/gcc/cfgexpand.cc
> > > +++ b/gcc/cfgexpand.cc
> > > @@ -1214,7 +1214,7 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
> > >   {
> > > if (data->asan_vec.is_empty ())
> > >   {
> > > -   align_frame_offset (ASAN_RED_ZONE_SIZE);
> > > +   align_frame_offset (MAX (alignb, ASAN_RED_ZONE_SIZE));
> > > prev_offset = frame_offset.to_constant ();
> > >   }
> > > prev_offset = align_base (prev_offset,
> >
> > This doesn't look correct to me.
> > The above is done just once for the first var partition.  And
> > var partitions are sorted by stack_var_cmp, which puts > MAX_SUPPORTED_STACK_ALIGNMENT
> > alignment vars first (that should be none on x86, the above is quite huge
> > alignment), then on size decreasing and only after that on alignment
> > decreasing.
> >
> > So, try to add some other variable with larger size and smaller alignment
> > to the frame (and make sure it isn't optimized away).
> >
> > alignb above is the alignment of the first partition's var, if
> > align_frame_offset really needs to depend on the var alignment, it probably
> > should be the maximum alignment of all the vars with alignment
> > alignb * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT
>
> When asan_emit_stack_protection allocates the fake stack, it assumes
> the bottom of the stack is also aligned to alignb.  The place that
> violates this is the first var partition, whose offset is only aligned
> to 32 bytes; it should be MAX_SUPPORTED_STACK_ALIGNMENT / BITS_PER_UNIT.
> So I think we need to use MAX (MAX_SUPPORTED_STACK_ALIGNMENT /
> BITS_PER_UNIT, ASAN_RED_ZONE_SIZE) for the first var partition.
It should be MAX (BIGGEST_ALIGNMENT / BITS_PER_UNIT, ASAN_RED_ZONE_SIZE).
MAX_SUPPORTED_STACK_ALIGNMENT is huge.
>
> >
> > > diff --git a/gcc/testsuite/g++.dg/asan/pr110027.C b/gcc/testsuite/g++.dg/asan/pr110027.C
> > > new file mode 100644
> > > index 000..0067781bc89
> > > --- /dev/null
> > > +++ b/gcc/testsuite/g++.dg/asan/pr110027.C
> > > @@ -0,0 +1,20 @@
> > > +/* PR sanitizer/110027 */
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target avx512f_runtime } */
> > > +/* { dg-options "-std=gnu++23 -mavx512f -fsanitize=address -O0 -g -fstack-protector-strong" } */
> > > +
> > > +#include 
> > > +#include 
> > > +
> > > +template 
> > > +using Vec [[gnu::vector_size(W * sizeof(T))]] = T;
> > > +
> > > +auto foo() {
> > > +  Vec<8, int64_t> ret{};
> > > +  return ret;
> > > +}
> > > +
> > > +int main() {
> > > +  foo();
> > > +  return 0;
> > > +}
> > > --
> > > 2.31.1
> >
> > Jakub
> >
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao
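The alignment arithmetic discussed in this thread can be checked with a small C sketch. It assumes a non-negative, upward-growing frame offset for simplicity (GCC's real frame_offset handling, including FRAME_GROWS_DOWNWARD targets, is not modelled) and uses 32 bytes for ASAN_RED_ZONE_SIZE. aligned_first_partition is a hypothetical helper: it aligns to MAX (bound, ASAN_RED_ZONE_SIZE), where bound stands for whichever byte alignment is chosen (alignb in the original patch, BIGGEST_ALIGNMENT / BITS_PER_UNIT in the follow-up suggestion).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of ASAN_RED_ZONE_SIZE, in bytes.  */
#define ASAN_RED_ZONE_SIZE_MODEL 32

/* Round OFFSET up to a multiple of ALIGN, which must be a power of two.  */
static int64_t
align_up (int64_t offset, int64_t align)
{
  assert (align > 0 && (align & (align - 1)) == 0);
  return (offset + align - 1) & -align;
}

/* Hypothetical helper: align the first partition to
   MAX (BOUND, ASAN_RED_ZONE_SIZE) instead of ASAN_RED_ZONE_SIZE alone.  */
static int64_t
aligned_first_partition (int64_t offset, int64_t bound)
{
  int64_t align = bound > ASAN_RED_ZONE_SIZE_MODEL
                  ? bound : ASAN_RED_ZONE_SIZE_MODEL;
  return align_up (offset, align);
}
```

For a variable that needs 64-byte alignment, aligning an offset of 16 to the red-zone size alone gives 32, which is not a multiple of 64; aligning to MAX (64, 32) gives 64.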


Re: [PATCH] sanitizer: [PR110027] Align asan_vec[0] to MAX (alignb, ASAN_RED_ZONE_SIZE)

2024-03-25 Thread Hongtao Liu
On Mon, Mar 25, 2024 at 8:51 PM Jakub Jelinek  wrote:
>
> On Tue, Mar 12, 2024 at 07:57:59PM +0800, liuhongt wrote:
> > If alignb > ASAN_RED_ZONE_SIZE and offset[0] is not a multiple of
> > alignb, (base_align_bias - base_offset) may not be aligned to alignb,
> > which causes a segmentation fault.
> >
> > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > Ok for trunk and backport to GCC13?
> >
> > gcc/ChangeLog:
> >
> >   PR sanitizer/110027
> >   * cfgexpand.cc (expand_stack_vars): Align frame offset to
> >   MAX (alignb, ASAN_RED_ZONE_SIZE).
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/asan/pr110027.C: New test.
> > ---
> >  gcc/cfgexpand.cc |  2 +-
> >  gcc/testsuite/g++.dg/asan/pr110027.C | 20 
> >  2 files changed, 21 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/g++.dg/asan/pr110027.C
> >
> > diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> > index 0de299c62e3..92062378d8e 100644
> > --- a/gcc/cfgexpand.cc
> > +++ b/gcc/cfgexpand.cc
> > @@ -1214,7 +1214,7 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
> >   {
> > if (data->asan_vec.is_empty ())
> >   {
> > -   align_frame_offset (ASAN_RED_ZONE_SIZE);
> > +   align_frame_offset (MAX (alignb, ASAN_RED_ZONE_SIZE));
> > prev_offset = frame_offset.to_constant ();
> >   }
> > prev_offset = align_base (prev_offset,
>
> This doesn't look correct to me.
> The above is done just once for the first var partition.  And
> var partitions are sorted by stack_var_cmp, which puts > MAX_SUPPORTED_STACK_ALIGNMENT
> alignment vars first (that should be none on x86, the above is quite huge
> alignment), then on size decreasing and only after that on alignment
> decreasing.
>
> So, try to add some other variable with larger size and smaller alignment
> to the frame (and make sure it isn't optimized away).
>
> alignb above is the alignment of the first partition's var, if
> align_frame_offset really needs to depend on the var alignment, it probably
> should be the maximum alignment of all the vars with alignment
> alignb * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT

When asan_emit_stack_protection allocates the fake stack, it assumes
the bottom of the stack is also aligned to alignb.  The place that
violates this is the first var partition, whose offset is only aligned
to 32 bytes; it should be MAX_SUPPORTED_STACK_ALIGNMENT / BITS_PER_UNIT.
So I think we need to use MAX (MAX_SUPPORTED_STACK_ALIGNMENT /
BITS_PER_UNIT, ASAN_RED_ZONE_SIZE) for the first var partition.
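Jakub's point about the partition order can be illustrated with qsort. The comparator below is a simplified, hypothetical stand-in that follows only the three ordering keys described in his review (huge-alignment variables first, then decreasing size, then decreasing alignment); GCC's real stack_var_cmp has further tie-breakers. HUGE_ALIGN_MODEL is an arbitrary stand-in for MAX_SUPPORTED_STACK_ALIGNMENT expressed in bytes. The point is that the first partition can be a large but weakly aligned variable, so its alignb need not be the largest alignment in the frame.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for a stack variable: size and alignment in bytes.  */
struct var_model
{
  size_t size;
  size_t align;
};

/* Arbitrary stand-in for MAX_SUPPORTED_STACK_ALIGNMENT, in bytes.  */
#define HUGE_ALIGN_MODEL 4096

/* Order: alignment above HUGE_ALIGN_MODEL first, then decreasing size,
   then decreasing alignment.  */
static int
var_model_cmp (const void *pa, const void *pb)
{
  const struct var_model *a = pa;
  const struct var_model *b = pb;
  int a_huge = a->align > HUGE_ALIGN_MODEL;
  int b_huge = b->align > HUGE_ALIGN_MODEL;
  if (a_huge != b_huge)
    return a_huge ? -1 : 1;
  if (a->size != b->size)
    return a->size > b->size ? -1 : 1;
  if (a->align != b->align)
    return a->align > b->align ? -1 : 1;
  return 0;
}

/* Maximum alignment over all variables not above HUGE_ALIGN_MODEL,
   i.e. the value the review suggests using instead of the first
   partition's alignb.  */
static size_t
max_regular_align (const struct var_model *vars, size_t n)
{
  size_t max_align = 0;
  for (size_t i = 0; i < n; i++)
    if (vars[i].align <= HUGE_ALIGN_MODEL && vars[i].align > max_align)
      max_align = vars[i].align;
  return max_align;
}
```

With a 64-byte, 64-byte-aligned variable and a 256-byte, 8-byte-aligned one, the larger variable sorts first, so the first partition's alignment is 8 while the frame still contains a variable needing 64.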

>
> > diff --git a/gcc/testsuite/g++.dg/asan/pr110027.C b/gcc/testsuite/g++.dg/asan/pr110027.C
> > new file mode 100644
> > index 000..0067781bc89
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/asan/pr110027.C
> > @@ -0,0 +1,20 @@
> > +/* PR sanitizer/110027 */
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target avx512f_runtime } */
> > +/* { dg-options "-std=gnu++23 -mavx512f -fsanitize=address -O0 -g -fstack-protector-strong" } */
> > +
> > +#include 
> > +#include 
> > +
> > +template 
> > +using Vec [[gnu::vector_size(W * sizeof(T))]] = T;
> > +
> > +auto foo() {
> > +  Vec<8, int64_t> ret{};
> > +  return ret;
> > +}
> > +
> > +int main() {
> > +  foo();
> > +  return 0;
> > +}
> > --
> > 2.31.1
>
> Jakub
>


-- 
BR,
Hongtao


[patch, libgfortran] PR107031 - endfile truncates file at wrong position

2024-03-25 Thread Jerry D

Hi all,

There has been a bit of discussion on which way to go on this.

I took a look today, and this trivial patch gives the behavior settled on
in the Fortran Discourse discussion. See the bugzilla for all the relevant
information.

Regression tested on x86-64.

I will do the appropriate changelog.

OK for trunk?

Attached is a new test case and the patch here:

diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
index 2bc05b293f8..d169961f997 100644
--- a/libgfortran/io/file_pos.c
+++ b/libgfortran/io/file_pos.c
@@ -352,7 +352,6 @@ st_endfile (st_parameter_filepos *fpp)
  dtp.common = fpp->common;
  memset (, 0, sizeof (dtp.u.p));
  dtp.u.p.current_unit = u;
- next_record (, 1);
}

   unit_truncate (u, stell (u->s), >common);

! { dg-do run }
! PR107031 Check that endfile truncates at end of record 5.
program test_truncate
integer :: num_rec, tmp, i, nr, j
open(10, file="in.dat", action='readwrite')

do i=1,10
  write(10, *) i
end do

rewind (10)

num_rec = 5
i = 1
ioerr = 0
do while (i <= num_rec .and. ioerr == 0)
read(10, *, iostat=ioerr) tmp
i = i + 1
enddo
endfile(10)
rewind (10)
i = 0
ioerr = 0
do while (i <= num_rec + 1 .and. ioerr == 0)
  read(10, *, iostat=ioerr) j
  i = i + 1
end do
close(10, status='delete')
if (i - 1 /= 5) stop 1
end program test_truncate


Re: [PATCH v2] MIPS: Add MIN/MAX.fmt instructions support for MIPS R6

2024-03-25 Thread YunQiang Su
On Mon, Mar 25, 2024 at 17:46, Jie Mei wrote:
>
> This patch adds the smin/smax RTL patterns for the
> MIN.fmt/MAX.fmt instructions.
>
> Also, since the MIN.fmt/MAX.fmt instructions implement the
> IEEE 754-2008 "minNum" and "maxNum" operations, this
> patch also provides the new "fmin3" and
> "fmax3" patterns.
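A small C sketch of the NaN behavior that separates the IEEE 754-2008 minNum operation from a plain comparison-based minimum, which is why separate fmin/fmax patterns matter in addition to smin/smax (as far as I understand GCC's internals documentation, the result of smin/smax is unspecified when an operand is a NaN). min_num and naive_min are illustrative helpers, not GCC or MIPS code; the sketch treats every NaN as quiet and ignores the sign of zero.

```c
#include <math.h>

/* minNum-style minimum: if one operand is a NaN, return the other.
   Signaling NaNs and the sign of zero are not modelled.  */
static double
min_num (double a, double b)
{
  if (isnan (a))
    return b;
  if (isnan (b))
    return a;
  return a < b ? a : b;
}

/* Comparison-based minimum: the result for a NaN operand depends on
   operand order, because every ordered comparison with a NaN is false.  */
static double
naive_min (double a, double b)
{
  return a < b ? a : b;
}
```

min_num returns 1.0 for both (NaN, 1.0) and (1.0, NaN), while naive_min returns 1.0 for the first order and NaN for the second.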
>
> gcc/ChangeLog:
>
> * config/mips/i6400.md (i6400_fpu_minmax): New
> define_insn_reservation.
> * config/mips/mips.h (ISA_HAS_FMIN_FMAX): Define new macro.
> * config/mips/mips.md (UNSPEC_FMIN): New unspec.
> (UNSPEC_FMAX): Same as above.
> (type): Add fminmax.
> (smin3): Generates MIN.fmt instructions.
> (smax3): Generates MAX.fmt instructions.
> (fmin3): Generates MIN.fmt instructions.
> (fmax3): Generates MAX.fmt instructions.
> * config/mips/p6600.md (p6600_fpu_fabs): Include fminmax
> type.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/mips/mips-minmax.c: New test for MIPS R6.
> ---
>  gcc/config/mips/i6400.md|  6 +++
>  gcc/config/mips/mips.h  |  2 +
>  gcc/config/mips/mips.md | 50 -
>  gcc/config/mips/p6600.md|  2 +-
>  gcc/testsuite/gcc.target/mips/mips-minmax.c | 40 +
>  5 files changed, 97 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/mips/mips-minmax.c
>
> diff --git a/gcc/config/mips/i6400.md b/gcc/config/mips/i6400.md
> index 9f216fe0210..d6f691ee217 100644
> --- a/gcc/config/mips/i6400.md
> +++ b/gcc/config/mips/i6400.md
> @@ -219,6 +219,12 @@
> (eq_attr "type" "fabs,fneg,fmove"))
>"i6400_fpu_short, i6400_fpu_apu")
>
> +;; min, max
> +(define_insn_reservation "i6400_fpu_minmax" 2
> +  (and (eq_attr "cpu" "i6400")
> +   (eq_attr "type" "fminmax"))
> +  "i6400_fpu_short+i6400_fpu_logic")
> +
>  ;; fadd, fsub, fcvt
>  (define_insn_reservation "i6400_fpu_fadd" 4
>(and (eq_attr "cpu" "i6400")
> diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
> index 7145d23c650..5ce984ac99b 100644
> --- a/gcc/config/mips/mips.h
> +++ b/gcc/config/mips/mips.h
> @@ -1259,6 +1259,8 @@ struct mips_cpu_info {
>  #define ISA_HAS_9BIT_DISPLACEMENT  (mips_isa_rev >= 6  \
>  || ISA_HAS_MIPS16E2)
>
> +#define ISA_HAS_FMIN_FMAX  (mips_isa_rev >= 6)
> +
>  /* ISA has data indexed prefetch instructions.  This controls use of
> 'prefx', along with TARGET_HARD_FLOAT and TARGET_DOUBLE_FLOAT.
> (prefx is a cop1x instruction, so can only be used if FP is
> diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
> index b0fb5850a9e..26f758c90dd 100644
> --- a/gcc/config/mips/mips.md
> +++ b/gcc/config/mips/mips.md
> @@ -97,6 +97,10 @@
>UNSPEC_GET_FCSR
>UNSPEC_SET_FCSR
>
> +  ;; Floating-point unspecs.
> +  UNSPEC_FMIN
> +  UNSPEC_FMAX
> +
>;; HI/LO moves.
>UNSPEC_MFHI
>UNSPEC_MTHI
> @@ -370,6 +374,7 @@
>  ;; frsqrt   floating point reciprocal square root
>  ;; frsqrt1  floating point reciprocal square root step1
>  ;; frsqrt2  floating point reciprocal square root step2
> +;; fminmax  floating point min/max
>  ;; dspmac   DSP MAC instructions not saturating the accumulator
>  ;; dspmacsatDSP MAC instructions that saturate the accumulator
>  ;; accext   DSP accumulator extract instructions
> @@ -387,8 +392,8 @@
> 
> prefetch,prefetchx,condmove,mtc,mfc,mthi,mtlo,mfhi,mflo,const,arith,logical,
> shift,slt,signext,clz,pop,trap,imul,imul3,imul3nc,imadd,idiv,idiv3,move,
> fmove,fadd,fmul,fmadd,fdiv,frdiv,frdiv1,frdiv2,fabs,fneg,fcmp,fcvt,fsqrt,
> -   frsqrt,frsqrt1,frsqrt2,dspmac,dspmacsat,accext,accmod,dspalu,dspalusat,
> -   multi,atomic,syncloop,nop,ghost,multimem,
> +   frsqrt,frsqrt1,frsqrt2,fminmax,dspmac,dspmacsat,accext,accmod,dspalu,
> +   dspalusat,multi,atomic,syncloop,nop,ghost,multimem,
> simd_div,simd_fclass,simd_flog2,simd_fadd,simd_fcvt,simd_fmul,simd_fmadd,
> simd_fdiv,simd_bitins,simd_bitmov,simd_insert,simd_sld,simd_mul,simd_fcmp,
> simd_fexp2,simd_int_arith,simd_bit,simd_shift,simd_splat,simd_fill,
> @@ -7971,6 +7976,47 @@
>[(set_attr "move_type" "load")
> (set_attr "insn_count" "2")])
>
> +;;
> +;;  Float point MIN/MAX
> +;;
> +
> +(define_insn "smin3"
> +  [(set (match_operand:SCALARF 0 "register_operand" "=f")
> +   (smin:SCALARF (match_operand:SCALARF 1 "register_operand" "f")
> + (match_operand:SCALARF 2 "register_operand" "f")))]
> +  "ISA_HAS_FMIN_FMAX"
> +  "min.\t%0,%1,%2"
> +  [(set_attr "type" "fminmax")
> +   (set_attr "mode" "")])
> +
> +(define_insn "smax3"
> +  [(set (match_operand:SCALARF 0 "register_operand" "=f")
> +   (smax:SCALARF (match_operand:SCALARF 1 "register_operand" "f")
> + (match_operand:SCALARF 2 "register_operand" "f")))]
> +  "ISA_HAS_FMIN_FMAX"
> +  "max.\t%0,%1,%2"
> +  [(set_attr "type" "fminmax")
> +  (set_attr "mode" 

[PATCH v3] doc: Correction of Tree SSA Passes info.

2024-03-25 Thread Chenghui Pan
Much of the current documentation of the Tree SSA passes has not been
updated for many years.

This patch removes information that is outdated and no longer matches the
current GCC codebase, and fixes some incorrect source-file locations based
on the current codebase and ChangeLogs.

Changes since v1:
* v3: Add reference to related PRs.
* v2: Add correct info for pass_build_alias.

gcc/ChangeLog:

PR rtl-optimization/951
PR tree-optimization/13756
* doc/passes.texi: Correction of Tree SSA Passes info.
---
 gcc/doc/passes.texi | 75 +
 1 file changed, 7 insertions(+), 68 deletions(-)

diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
index b50d3d5635b..b13ad06c5a9 100644
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@@ -450,17 +450,6 @@ The following briefly describes the Tree optimization passes that are
 run after gimplification and what source files they are located in.
 
 @itemize @bullet
-@item Remove useless statements
-
-This pass is an extremely simple sweep across the gimple code in which
-we identify obviously dead code and remove it.  Here we do things like
-simplify @code{if} statements with constant conditions, remove
-exception handling constructs surrounding code that obviously cannot
-throw, remove lexical bindings that contain no variables, and other
-assorted simplistic cleanups.  The idea is to get rid of the obvious
-stuff quickly rather than wait until later when it's more work to get
-rid of it.  This pass is located in @file{tree-cfg.cc} and described by
-@code{pass_remove_useless_stmts}.
 
 @item OpenMP lowering
 
@@ -478,7 +467,7 @@ described by @code{pass_lower_omp}.
 
 If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands
 parallel regions into their own functions to be invoked by the thread
-library.  The pass is located in @file{omp-low.cc} and is described by
+library.  The pass is located in @file{omp-expand.cc} and is described by
 @code{pass_expand_omp}.
 
 @item Lower control flow
@@ -511,15 +500,6 @@ This pass decomposes a function into basic blocks and creates all of
 the edges that connect them.  It is located in @file{tree-cfg.cc} and
 is described by @code{pass_build_cfg}.
 
-@item Find all referenced variables
-
-This pass walks the entire function and collects an array of all
-variables referenced in the function, @code{referenced_vars}.  The
-index at which a variable is found in the array is used as a UID
-for the variable within this function.  This data is needed by the
-SSA rewriting routines.  The pass is located in @file{tree-dfa.cc}
-and is described by @code{pass_referenced_vars}.
-
 @item Enter static single assignment form
 
 This pass rewrites the function such that it is in SSA form.  After
@@ -562,15 +542,6 @@ variables that are used once into the expression that uses them and
 seeing if the result can be simplified.  It is located in
 @file{tree-ssa-forwprop.cc} and is described by @code{pass_forwprop}.
 
-@item Copy Renaming
-
-This pass attempts to change the name of compiler temporaries involved in
-copy operations such that SSA->normal can coalesce the copy away.  When compiler
-temporaries are copies of user variables, it also renames the compiler
-temporary to the user variable resulting in better use of user symbols.  It is
-located in @file{tree-ssa-copyrename.c} and is described by
-@code{pass_copyrename}.
-
 @item PHI node optimizations
 
 This pass recognizes forms of PHI inputs that can be represented as
@@ -581,12 +552,8 @@ It is located in @file{tree-ssa-phiopt.cc} and is described by
 @item May-alias optimization
 
 This pass performs a flow sensitive SSA-based points-to analysis.
-The resulting may-alias, must-alias, and escape analysis information
-is used to promote variables from in-memory addressable objects to
-non-aliased variables that can be renamed into SSA form.  We also
-update the @code{VDEF}/@code{VUSE} memory tags for non-renameable
-aggregates so that we get fewer false kills.  The pass is located
-in @file{tree-ssa-alias.cc} and is described by @code{pass_may_alias}.
+It is located in @file{tree-ssa-structalias.cc} and is described
+by @code{pass_build_alias}.
 
 Interprocedural points-to information is located in
 @file{tree-ssa-structalias.cc} and described by @code{pass_ipa_pta}.
@@ -604,7 +571,7 @@ is described by @code{pass_ipa_tree_profile}.
 This pass implements series of heuristics to guess propababilities
 of branches.  The resulting predictions are turned into edge profile
 by propagating branches across the control flow graphs.
-The pass is located in @file{tree-profile.cc} and is described by
+The pass is located in @file{predict.cc} and is described by
 @code{pass_profile}.
 
 @item Lower complex arithmetic
@@ -653,7 +620,7 @@ in @file{tree-ssa-math-opts.cc} and is described by
 @item Full redundancy elimination
 
 This is a simpler form of PRE that only eliminates redundancies that
-occur on all paths.  It is 

Re: [PATCH v1] doc: Correction of Tree SSA Passes info.

2024-03-25 Thread Chenghui Pan

Seems right. I will reference them.

Also, I think documentation for some passes is still missing; we can add
it in the future.

On 2024/3/26 08:11, Andrew Pinski wrote:

On Sun, Mar 24, 2024 at 8:46 PM Chenghui Pan  wrote:

Much of the current documentation of the Tree SSA passes has not been
updated for many years.

This patch removes information that is outdated and no longer matches the
current GCC codebase, and fixes some incorrect source-file locations based
on the current codebase and ChangeLogs.


This improves the situation for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=951 (and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=13756 ). Maybe it should
include a reference to those 2 also.

Thanks,
Andrew


gcc/ChangeLog:

 * doc/passes.texi: Correction of Tree SSA Passes info.
---
  gcc/doc/passes.texi | 70 -
  1 file changed, 6 insertions(+), 64 deletions(-)

diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
index b50d3d5635b..068036acb7d 100644
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@@ -450,17 +450,6 @@ The following briefly describes the Tree optimization passes that are
  run after gimplification and what source files they are located in.

  @itemize @bullet
-@item Remove useless statements
-
-This pass is an extremely simple sweep across the gimple code in which
-we identify obviously dead code and remove it.  Here we do things like
-simplify @code{if} statements with constant conditions, remove
-exception handling constructs surrounding code that obviously cannot
-throw, remove lexical bindings that contain no variables, and other
-assorted simplistic cleanups.  The idea is to get rid of the obvious
-stuff quickly rather than wait until later when it's more work to get
-rid of it.  This pass is located in @file{tree-cfg.cc} and described by
-@code{pass_remove_useless_stmts}.

  @item OpenMP lowering

@@ -478,7 +467,7 @@ described by @code{pass_lower_omp}.

  If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands
  parallel regions into their own functions to be invoked by the thread
-library.  The pass is located in @file{omp-low.cc} and is described by
+library.  The pass is located in @file{omp-expand.cc} and is described by
  @code{pass_expand_omp}.

  @item Lower control flow
@@ -511,15 +500,6 @@ This pass decomposes a function into basic blocks and creates all of
  the edges that connect them.  It is located in @file{tree-cfg.cc} and
  is described by @code{pass_build_cfg}.

-@item Find all referenced variables
-
-This pass walks the entire function and collects an array of all
-variables referenced in the function, @code{referenced_vars}.  The
-index at which a variable is found in the array is used as a UID
-for the variable within this function.  This data is needed by the
-SSA rewriting routines.  The pass is located in @file{tree-dfa.cc}
-and is described by @code{pass_referenced_vars}.
-
  @item Enter static single assignment form

  This pass rewrites the function such that it is in SSA form.  After
@@ -562,15 +542,6 @@ variables that are used once into the expression that uses them and
  seeing if the result can be simplified.  It is located in
  @file{tree-ssa-forwprop.cc} and is described by @code{pass_forwprop}.

-@item Copy Renaming
-
-This pass attempts to change the name of compiler temporaries involved in
-copy operations such that SSA->normal can coalesce the copy away.  When compiler
-temporaries are copies of user variables, it also renames the compiler
-temporary to the user variable resulting in better use of user symbols.  It is
-located in @file{tree-ssa-copyrename.c} and is described by
-@code{pass_copyrename}.
-
  @item PHI node optimizations

  This pass recognizes forms of PHI inputs that can be represented as
@@ -585,8 +556,7 @@ The resulting may-alias, must-alias, and escape analysis information
  is used to promote variables from in-memory addressable objects to
  non-aliased variables that can be renamed into SSA form.  We also
  update the @code{VDEF}/@code{VUSE} memory tags for non-renameable
-aggregates so that we get fewer false kills.  The pass is located
-in @file{tree-ssa-alias.cc} and is described by @code{pass_may_alias}.
+aggregates so that we get fewer false kills.

  Interprocedural points-to information is located in
  @file{tree-ssa-structalias.cc} and described by @code{pass_ipa_pta}.
@@ -604,7 +574,7 @@ is described by @code{pass_ipa_tree_profile}.
  This pass implements series of heuristics to guess propababilities
  of branches.  The resulting predictions are turned into edge profile
  by propagating branches across the control flow graphs.
-The pass is located in @file{tree-profile.cc} and is described by
+The pass is located in @file{predict.cc} and is described by
  @code{pass_profile}.

  @item Lower complex arithmetic
@@ -653,7 +623,7 @@ in @file{tree-ssa-math-opts.cc} and is described by
  @item Full redundancy elimination

  This is a 

Re: Re: [PATCH] RISC-V: Add initial cost handling for segment loads/stores.

2024-03-25 Thread juzhe.zh...@rivai.ai
I think it's harmless to let this patch into GCC-14.
So LGTM from my side to land this patch in GCC-14.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-03-26 01:07
To: Jeff Law; 钟居哲; gcc-patches; palmer; kito.cheng
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Add initial cost handling for segment loads/stores.
> So where do we stand with this?  Juzhe asked it to be rebased, but I
> don't see a rebased version in my inbox and I don't see anything that
> looks like this on the trunk.
 
I missed this one and figured as we're pretty late in the cycle it can
wait until GCC 15.  Therefore let's call it "deferred".
 
Regards
Robin
 


Re: [PATCH v1] doc: Correction of Tree SSA Passes info.

2024-03-25 Thread Andrew Pinski
On Sun, Mar 24, 2024 at 8:46 PM Chenghui Pan  wrote:
>
> Much of the current documentation of the Tree SSA passes has not been
> updated for many years.
>
> This patch removes information that is outdated and no longer matches the
> current GCC codebase, and fixes some incorrect source-file locations based
> on the current codebase and ChangeLogs.


This improves the situation for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=951 (and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=13756 ). Maybe it should
include a reference to those 2 also.

Thanks,
Andrew

>
> gcc/ChangeLog:
>
> * doc/passes.texi: Correction of Tree SSA Passes info.
> ---
>  gcc/doc/passes.texi | 70 -
>  1 file changed, 6 insertions(+), 64 deletions(-)
>
> diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
> index b50d3d5635b..068036acb7d 100644
> --- a/gcc/doc/passes.texi
> +++ b/gcc/doc/passes.texi
> @@ -450,17 +450,6 @@ The following briefly describes the Tree optimization passes that are
>  run after gimplification and what source files they are located in.
>
>  @itemize @bullet
> -@item Remove useless statements
> -
> -This pass is an extremely simple sweep across the gimple code in which
> -we identify obviously dead code and remove it.  Here we do things like
> -simplify @code{if} statements with constant conditions, remove
> -exception handling constructs surrounding code that obviously cannot
> -throw, remove lexical bindings that contain no variables, and other
> -assorted simplistic cleanups.  The idea is to get rid of the obvious
> -stuff quickly rather than wait until later when it's more work to get
> -rid of it.  This pass is located in @file{tree-cfg.cc} and described by
> -@code{pass_remove_useless_stmts}.
>
>  @item OpenMP lowering
>
> @@ -478,7 +467,7 @@ described by @code{pass_lower_omp}.
>
>  If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands
>  parallel regions into their own functions to be invoked by the thread
> -library.  The pass is located in @file{omp-low.cc} and is described by
> +library.  The pass is located in @file{omp-expand.cc} and is described by
>  @code{pass_expand_omp}.
>
>  @item Lower control flow
> @@ -511,15 +500,6 @@ This pass decomposes a function into basic blocks and creates all of
>  the edges that connect them.  It is located in @file{tree-cfg.cc} and
>  is described by @code{pass_build_cfg}.
>
> -@item Find all referenced variables
> -
> -This pass walks the entire function and collects an array of all
> -variables referenced in the function, @code{referenced_vars}.  The
> -index at which a variable is found in the array is used as a UID
> -for the variable within this function.  This data is needed by the
> -SSA rewriting routines.  The pass is located in @file{tree-dfa.cc}
> -and is described by @code{pass_referenced_vars}.
> -
>  @item Enter static single assignment form
>
>  This pass rewrites the function such that it is in SSA form.  After
> @@ -562,15 +542,6 @@ variables that are used once into the expression that uses them and
>  seeing if the result can be simplified.  It is located in
>  @file{tree-ssa-forwprop.cc} and is described by @code{pass_forwprop}.
>
> -@item Copy Renaming
> -
> -This pass attempts to change the name of compiler temporaries involved in
> -copy operations such that SSA->normal can coalesce the copy away.  When compiler
> -temporaries are copies of user variables, it also renames the compiler
> -temporary to the user variable resulting in better use of user symbols.  It is
> -located in @file{tree-ssa-copyrename.c} and is described by
> -@code{pass_copyrename}.
> -
>  @item PHI node optimizations
>
>  This pass recognizes forms of PHI inputs that can be represented as
> @@ -585,8 +556,7 @@ The resulting may-alias, must-alias, and escape analysis information
>  is used to promote variables from in-memory addressable objects to
>  non-aliased variables that can be renamed into SSA form.  We also
>  update the @code{VDEF}/@code{VUSE} memory tags for non-renameable
> -aggregates so that we get fewer false kills.  The pass is located
> -in @file{tree-ssa-alias.cc} and is described by @code{pass_may_alias}.
> +aggregates so that we get fewer false kills.
>
>  Interprocedural points-to information is located in
>  @file{tree-ssa-structalias.cc} and described by @code{pass_ipa_pta}.
> @@ -604,7 +574,7 @@ is described by @code{pass_ipa_tree_profile}.
>  This pass implements series of heuristics to guess propababilities
>  of branches.  The resulting predictions are turned into edge profile
>  by propagating branches across the control flow graphs.
> -The pass is located in @file{tree-profile.cc} and is described by
> +The pass is located in @file{predict.cc} and is described by
>  @code{pass_profile}.
>
>  @item Lower complex arithmetic
> @@ -653,7 +623,7 @@ in @file{tree-ssa-math-opts.cc} and is described by
>  @item Full redundancy elimination
>
>  This is a simpler form of PRE 

Re: No rule to make target '../libbacktrace/libbacktrace.la', needed by 'libgo.la'. [PR106472]

2024-03-25 Thread Ian Lance Taylor
On Sat, Mar 23, 2024 at 4:32 AM Дилян Палаузов
 wrote:
>
> Can the build experts say what needs to be changed?  The dependencies I added 
> are missing in the build configuration (@if gcc-bootstrap).
>
> I cannot say if libbacktrace should or should not be a bootstrap=true module.

I don't count as a build expert these days, but since GCC itself links
against libbacktrace, my understanding is that the libbacktrace
host_module should be bootstrap=true, just like, say, libcpp.

Ian


[PATCH v3] c++: ICE with noexcept and local specialization, again [PR114349]

2024-03-25 Thread Marek Polacek
On Mon, Mar 25, 2024 at 03:40:10PM -0400, Jason Merrill wrote:
> On 3/22/24 17:30, Marek Polacek wrote:
> > On Thu, Mar 21, 2024 at 05:27:37PM -0400, Jason Merrill wrote:
> > > On 3/21/24 17:01, Marek Polacek wrote:
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > -- >8 --
> > > > Patrick noticed that my r14-9339-gdc6c3bfb59baab patch is wrong;
> > > > we're dealing with a noexcept-spec there, not a noexcept-expr, so
> > > > setting cp_noexcept_operand et al is incorrect.  Back to the drawing
> > > > board then.
> > > > 
> > > > To fix noexcept84.C, we should probably avoid doing push_to_top_level
> > > > in certain cases.  Patrick suggested checking:
> > > > 
> > > > const bool push_to_top = current_function_decl != fn;
> > > > 
> > > > which works, but I'm not sure I follow the logic there.  I also came
> > > > up with
> > > > 
> > > > const bool push_to_top = !decl_function_context (fn);
> > > > 
> > > > which also works.  But ultimately I went with 
> > > > !DECL_TEMPLATE_INSTANTIATED;
> > > > if DECL_TEMPLATE_INSTANTIATED is set, we've already pushed to top level
> > > > if it was necessary in instantiate_body.
> > > 
> > > This sort of thing is what maybe_push_to_top_level is for, does that also
> > > work?
> > 
> > Sadly -- and I should have mentioned that -- no.  maybe_push_to_top_level 
> > asks:
> > 
> >bool push_to_top
> >  = !(current_function_decl
> > && !LAMBDA_FUNCTION_P (d)
> > && decl_function_context (d) == current_function_decl);
> > 
> > here both d and current_function_decl are test()::S::S(), and
> > decl_function_context (d) is test().  (current_function_decl was
> > set to test()::S::S() by an earlier push_access_scope call.)
> > 
> > But I want it to work, and I think using maybe_ would be a way nicer
> > fix.  So what if we don't push to top level if decl_function_context
> > is non-null?  I had to add the LAMBDA_TYPE_P check though: it looks
> > that we always have to push to top level for lambdas, but sometimes
> > we get a lambda's TYPE_DECL, and LAMBDA_FUNCTION_P doesn't catch
> > that.  An example is lambda-nested4.C.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> > Patrick noticed that my r14-9339-gdc6c3bfb59baab patch is wrong;
> > we're dealing with a noexcept-spec there, not a noexcept-expr, so
> > setting cp_noexcept_operand et al is incorrect.  Back to the drawing
> > board then.
> > 
> > To fix noexcept84.C, we should probably avoid doing push_to_top_level
> > in certain cases.  maybe_push_to_top_level didn't work here as-is, so
> > I changed it to not push to top level if decl_function_context is
> > non-null, when we are not dealing with a lambda.
> > 
> > This also fixes c++/114349, introduced by r14-9339.
> > 
> > PR c++/114349
> > 
> > gcc/cp/ChangeLog:
> > 
> > * name-lookup.cc (maybe_push_to_top_level): For a non-lambda,
> > don't push to top level if decl_function_context is non-null.
> > * pt.cc (maybe_instantiate_noexcept): Use maybe_push_to_top_level.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp0x/noexcept85.C: New test.
> > * g++.dg/cpp0x/noexcept86.C: New test.
> > ---
> >   gcc/cp/name-lookup.cc   | 12 ++---
> >   gcc/cp/pt.cc| 11 ++---
> >   gcc/testsuite/g++.dg/cpp0x/noexcept85.C | 33 +
> >   gcc/testsuite/g++.dg/cpp0x/noexcept86.C | 25 +++
> >   4 files changed, 68 insertions(+), 13 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept85.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept86.C
> > 
> > diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
> > index dce4caf8981..4b2b27bdd0d 100644
> > --- a/gcc/cp/name-lookup.cc
> > +++ b/gcc/cp/name-lookup.cc
> > @@ -8664,10 +8664,14 @@ maybe_push_to_top_level (tree d)
> >   {
> > /* Push if D isn't function-local, or is a lambda function, for which 
> > name
> >resolution is already done.  */
> > -  bool push_to_top
> > -= !(current_function_decl
> > -   && !LAMBDA_FUNCTION_P (d)
> > -   && decl_function_context (d) == current_function_decl);
> > +  const bool push_to_top
> > += (LAMBDA_FUNCTION_P (d)
> > +   || (TREE_CODE (d) == TYPE_DECL
> > +  && TREE_TYPE (d)
> > +  && LAMBDA_TYPE_P (TREE_TYPE (d)))
> > +   || !current_function_decl
> > +   || (!decl_function_context (d)
> > +  && decl_function_context (d) != current_function_decl));
> 
> This line seems unnecessary; the case it excludes is when
> decl_function_context and current_function_decl are both null, but if
> current_function_decl is null we already succeeded.
> 
> OK with this line removed.
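Jason's simplification can be checked mechanically. The sketch below is a hypothetical model, not GCC code. It stands in for `current_function_decl` and `decl_function_context (d)` with plain pointers, where NULL means "no enclosing function". It folds both lambda checks into one flag. It then compares the v3 condition against the version without the `!= current_function_decl` test, for every combination of inputs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: CFD stands for current_function_decl, DFC for
   decl_function_context (d); IS_LAMBDA folds in the LAMBDA_FUNCTION_P and
   LAMBDA_TYPE_P checks.  This is the condition as posted in v3.  */
static bool
push_to_top_v3 (bool is_lambda, const void *cfd, const void *dfc)
{
  return is_lambda || !cfd || (!dfc && dfc != cfd);
}

/* The same condition with the redundant "!= current_function_decl" test
   removed, as requested in review.  */
static bool
push_to_top_simplified (bool is_lambda, const void *cfd, const void *dfc)
{
  return is_lambda || !cfd || !dfc;
}

/* Returns true if both forms agree for every combination of a null,
   matching, or different function context.  */
static bool
forms_agree (void)
{
  static int f, g;
  const void *vals[] = { NULL, &f, &g };
  for (int l = 0; l < 2; l++)
    for (int i = 0; i < 3; i++)
      for (int j = 0; j < 3; j++)
        if (push_to_top_v3 (l, vals[i], vals[j])
            != push_to_top_simplified (l, vals[i], vals[j]))
          return false;
  return true;
}
```

The two forms can only differ when both pointers are null. In that case `!cfd` already makes the whole expression true.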

Thanks a lot.  I've verified that the following version passes a
bootstrap/regtest on x86_64-pc-linux-gnu.

-- >8 --
Patrick noticed that my r14-9339-gdc6c3bfb59baab patch is wrong;
we're dealing with a noexcept-spec there, not a noexcept-expr, 

Re: [PATCH] c++: broken direct-init with trailing array member [PR114439]

2024-03-25 Thread Jason Merrill

On 3/25/24 17:59, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
can_init_array_with_p is wrongly saying that the init for 's' here:

   struct S {
 int *list = arr;
 int arr[];
   };

   struct A {
 A() {}
 S s[2]{};
   };

is invalid.  But as process_init_constructor_array says, for "non-constant
initialization of trailing elements with no explicit initializers" we use
a VEC_INIT_EXPR wrapped in a TARGET_EXPR, built in process_init_constructor.

Unfortunately we didn't have a test for this scenario so I didn't
realize can_init_array_with_p must handle it.

PR c++/114439

gcc/cp/ChangeLog:

* init.cc (can_init_array_with_p): Return true for a VEC_INIT_EXPR
wrapped in a TARGET_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/init/array65.C: New test.
---
  gcc/cp/init.cc  |  6 -
  gcc/testsuite/g++.dg/init/array65.C | 38 +
  2 files changed, 43 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/init/array65.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index dbd37d47cbf..a93ce00800c 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -950,12 +950,16 @@ can_init_array_with_p (tree type, tree init)
   mem-initializers of a constructor.  */
if (DECL_DEFAULTED_FN (current_function_decl))
  return true;
-  /* As an extension, we allow copying from a compound literal.  */
if (TREE_CODE (init) == TARGET_EXPR)
  {
init = TARGET_EXPR_INITIAL (init);
+  /* As an extension, we allow copying from a compound literal.  */
if (TREE_CODE (init) == CONSTRUCTOR)
return CONSTRUCTOR_C99_COMPOUND_LITERAL (init);
+  /* VEC_INIT_EXPR is used for non-constant initialization of trailing
+elements with no explicit initializers.  */
+  else if (TREE_CODE (init) == VEC_INIT_EXPR)
+   return true;
  }
  
return false;

diff --git a/gcc/testsuite/g++.dg/init/array65.C 
b/gcc/testsuite/g++.dg/init/array65.C
new file mode 100644
index 000..0b144f45a9d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/array65.C
@@ -0,0 +1,38 @@
+// PR c++/114439
+// { dg-do compile { target c++11 } }
+
+struct S {
+  int *list = arr;
+  __extension__ int arr[];
+};
+
+struct R {
+  int *list = arr;
+  int arr[2];
+};
+
+struct A {
+  A() {}
+  S s[2]{};
+};
+
+struct A2 {
+  A2() {}
+  S s[2]{ {}, {} };
+};
+
+struct B {
+  B() {}
+  R r[2]{};
+};
+
+struct B2 {
+  B2() {}
+  R r[2]{ {}, {} };
+};
+
+struct S1 { S1(); };
+struct S2 {
+  S2() {}
+  S1 a[1] {};
+};

base-commit: 18555b914316e8c1fb11ee821f2ee839d834e58e




[PATCH] c++: broken direct-init with trailing array member [PR114439]

2024-03-25 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
can_init_array_with_p is wrongly saying that the init for 's' here:

  struct S {
int *list = arr;
int arr[];
  };

  struct A {
A() {}
S s[2]{};
  };

is invalid.  But as process_init_constructor_array says, for "non-constant
initialization of trailing elements with no explicit initializers" we use
a VEC_INIT_EXPR wrapped in a TARGET_EXPR, built in process_init_constructor.

Unfortunately we didn't have a test for this scenario so I didn't
realize can_init_array_with_p must handle it.

PR c++/114439

gcc/cp/ChangeLog:

* init.cc (can_init_array_with_p): Return true for a VEC_INIT_EXPR
wrapped in a TARGET_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/init/array65.C: New test.
---
 gcc/cp/init.cc  |  6 -
 gcc/testsuite/g++.dg/init/array65.C | 38 +
 2 files changed, 43 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/init/array65.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index dbd37d47cbf..a93ce00800c 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -950,12 +950,16 @@ can_init_array_with_p (tree type, tree init)
  mem-initializers of a constructor.  */
   if (DECL_DEFAULTED_FN (current_function_decl))
 return true;
-  /* As an extension, we allow copying from a compound literal.  */
   if (TREE_CODE (init) == TARGET_EXPR)
 {
   init = TARGET_EXPR_INITIAL (init);
+  /* As an extension, we allow copying from a compound literal.  */
   if (TREE_CODE (init) == CONSTRUCTOR)
return CONSTRUCTOR_C99_COMPOUND_LITERAL (init);
+  /* VEC_INIT_EXPR is used for non-constant initialization of trailing
+elements with no explicit initializers.  */
+  else if (TREE_CODE (init) == VEC_INIT_EXPR)
+   return true;
 }
 
   return false;
diff --git a/gcc/testsuite/g++.dg/init/array65.C 
b/gcc/testsuite/g++.dg/init/array65.C
new file mode 100644
index 000..0b144f45a9d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/array65.C
@@ -0,0 +1,38 @@
+// PR c++/114439
+// { dg-do compile { target c++11 } }
+
+struct S {
+  int *list = arr;
+  __extension__ int arr[];
+};
+
+struct R {
+  int *list = arr;
+  int arr[2];
+};
+
+struct A {
+  A() {}
+  S s[2]{};
+};
+
+struct A2 {
+  A2() {}
+  S s[2]{ {}, {} };
+};
+
+struct B {
+  B() {}
+  R r[2]{};
+};
+
+struct B2 {
+  B2() {}
+  R r[2]{ {}, {} };
+};
+
+struct S1 { S1(); };
+struct S2 {
+  S2() {}
+  S1 a[1] {};
+};

base-commit: 18555b914316e8c1fb11ee821f2ee839d834e58e
-- 
2.44.0



Re: TARGET_RTX_COSTS and pipeline latency vs. variable-latency instructions (was Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.)

2024-03-25 Thread Jeff Law




On 3/25/24 2:57 PM, Palmer Dabbelt wrote:

On Mon, 25 Mar 2024 13:49:18 PDT (-0700), jeffreya...@gmail.com wrote:



On 3/25/24 2:31 PM, Palmer Dabbelt wrote:

On Mon, 25 Mar 2024 13:27:34 PDT (-0700), Jeff Law wrote:


I'd doubt it's worth the complexity.  Picking some reasonable value gets
you the vast majority of the benefit.   Something like
COSTS_N_INSNS(6) is enough to get CSE to trigger.  So what's left is a
reasonable cost, particularly for the division-by-constant case where we
need a ceiling for synth_mult.


Ya, makes sense.  I noticed our multi-word multiply costs are a bit odd
too (they really only work for 64-bit mul on 32-bit targets), but that's
probably not worth worrying about either.

We do have changes locally that adjust various costs.  One of them is
highpart multiply.  One of the many things to start working through once
gcc-15 opens for development.  Hence my desire to help keep gcc-14 on
track for an on-time release.


Cool.  LMK if there's anything we can do to help on that front.

I think the RISC-V space is in pretty good shape.  Most of the issues
left are either generic or hitting other targets.  While the number of
P1s has been flat or rising, that's more an artifact of the ongoing bug
triage/reprioritization process.  I can only speak for myself, but the
progress in nailing down the slew of bugs thrown into the P1 bucket over
the last few weeks has been great IMHO.


jeff


Re: TARGET_RTX_COSTS and pipeline latency vs. variable-latency instructions (was Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.)

2024-03-25 Thread Palmer Dabbelt

On Mon, 25 Mar 2024 13:49:18 PDT (-0700), jeffreya...@gmail.com wrote:



On 3/25/24 2:31 PM, Palmer Dabbelt wrote:

On Mon, 25 Mar 2024 13:27:34 PDT (-0700), Jeff Law wrote:



I'd doubt it's worth the complexity.  Picking some reasonable value gets
you the vast majority of the benefit.   Something like
COSTS_N_INSNS(6) is enough to get CSE to trigger.  So what's left is a
reasonable cost, particularly for the division-by-constant case where we
need a ceiling for synth_mult.


Ya, makes sense.  I noticed our multi-word multiply costs are a bit odd
too (they really only work for 64-bit mul on 32-bit targets), but that's
probably not worth worrying about either.

We do have changes locally that adjust various costs.  One of them is
highpart multiply.  One of the many things to start working through once
gcc-15 opens for development.  Hence my desire to help keep gcc-14 on
track for an on-time release.


Cool.  LMK if there's anything we can do to help on that front.



Jeff


Re: [PATCH] c++: templated substitution into lambda-expr [PR114393]

2024-03-25 Thread Patrick Palka
On Mon, 25 Mar 2024, Patrick Palka wrote:

> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> for trunk?
> 
> -- >8 --
> 
> The below testcases use a lambda-expr as a template argument and they
> all trip over the below added tsubst_lambda_expr sanity check ultimately
> because current_template_parms is empty, which causes push_template_decl
> to return error_mark_node from the call to begin_lambda_type.  Were it
> not for the sanity check, this silent error_mark_node result would lead to
> nonsensical errors down the line, or to silent breakage.
> 
> In the first testcase, we hit this assert during instantiation of the
> dependent alias template-id c1_t<_Data> from instantiate_template, which
> clears current_template_parms via push_to_top_level.  Similar story for
> the second testcase.  For the third testcase we hit the assert during
> partial instantiation of the member template from instantiate_class_template
> which similarly calls push_to_top_level.
> 
> These testcases illustrate that templated substitution into a lambda-expr
> is not always possible, in particular when we lost the relevant template
> context.  I experimented with recovering the template context by making
> tsubst_lambda_expr fall back to using scope_chain->prev->template_parms if
> current_template_parms is empty which worked but seemed like a hack.  I
> also experimented with preserving the template context by keeping
> current_template_parms set during instantiate_template for a dependent
> specialization which also worked but it's at odds with the fact that we
> cache dependent specializations (and so they should be independent of
> the template context).
> 
> So instead of trying to make such substitution work, this patch uses the
> extra-args mechanism to defer templated substitution into a lambda-expr
> when we lost the relevant template context.
> 
>   PR c++/114393
>   PR c++/107457
>   PR c++/93595
> 
> gcc/cp/ChangeLog:
> 
>   * cp-tree.h (LAMBDA_EXPR_EXTRA_ARGS):
>   (struct GTY):
>   * module.cc (trees_out::core_vals) : Stream
>   LAMBDA_EXPR_EXTRA_ARGS.
>   (trees_in::core_vals) : Likewise.
>   * pt.cc (has_extra_args_mechanism_p):
>   (tsubst_lambda_expr):

Whoops, this version of the patch has an incomplete ChangeLog entry.

> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/lambda-targ2.C: New test.
>   * g++.dg/cpp2a/lambda-targ3.C: New test.
>   * g++.dg/cpp2a/lambda-targ4.C: New test.
> ---
>  gcc/cp/cp-tree.h  |  5 +
>  gcc/cp/module.cc  |  2 ++
>  gcc/cp/pt.cc  | 20 ++--
>  gcc/testsuite/g++.dg/cpp2a/lambda-targ2.C | 19 +++
>  gcc/testsuite/g++.dg/cpp2a/lambda-targ3.C | 12 
>  gcc/testsuite/g++.dg/cpp2a/lambda-targ4.C | 12 
>  6 files changed, 68 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ2.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ3.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ4.C
> 
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index c29a5434492..27100537038 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -1538,6 +1538,10 @@ enum cp_lambda_default_capture_mode_type {
>  #define LAMBDA_EXPR_REGEN_INFO(NODE) \
>(((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->regen_info)
>  
> +/* Like PACK_EXPANSION_EXTRA_ARGS, for lambda-expressions.  */
> +#define LAMBDA_EXPR_EXTRA_ARGS(NODE) \
> +  (((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->extra_args)
> +
>  /* The closure type of the lambda, which is also the type of the
> LAMBDA_EXPR.  */
>  #define LAMBDA_EXPR_CLOSURE(NODE) \
> @@ -1550,6 +1554,7 @@ struct GTY (()) tree_lambda_expr
>tree this_capture;
>tree extra_scope;
>tree regen_info;
> +  tree extra_args;
>vec *pending_proxies;
>location_t locus;
>enum cp_lambda_default_capture_mode_type default_capture_mode : 2;
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index 52c60cf370c..1cd890909e3 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -6312,6 +6312,7 @@ trees_out::core_vals (tree t)
>WT (((lang_tree_node *)t)->lambda_expression.this_capture);
>WT (((lang_tree_node *)t)->lambda_expression.extra_scope);
>WT (((lang_tree_node *)t)->lambda_expression.regen_info);
> +  WT (((lang_tree_node *)t)->lambda_expression.extra_args);
>/* pending_proxies is a parse-time thing.  */
>gcc_assert (!((lang_tree_node *)t)->lambda_expression.pending_proxies);
>if (state)
> @@ -6814,6 +6815,7 @@ trees_in::core_vals (tree t)
>RT (((lang_tree_node *)t)->lambda_expression.this_capture);
>RT (((lang_tree_node *)t)->lambda_expression.extra_scope);
>RT (((lang_tree_node *)t)->lambda_expression.regen_info);
> +  RT (((lang_tree_node *)t)->lambda_expression.extra_args);
>/* 

Re: [PATCH v7 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-03-25 Thread Joseph Myers
On Wed, 20 Mar 2024, Qing Zhao wrote:

> +   the size of the element can be retrived from the result type of the call,
> +   which is the pointer to the array type.  */

Again, start a sentence with an uppercase letter.

> +  /* if not for dynamic object size, return.  */

> +  /* result type is a pointer type to the original flexible array type.  */

Likewise.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: TARGET_RTX_COSTS and pipeline latency vs. variable-latency instructions (was Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.)

2024-03-25 Thread Jeff Law




On 3/25/24 2:31 PM, Palmer Dabbelt wrote:

On Mon, 25 Mar 2024 13:27:34 PDT (-0700), Jeff Law wrote:



I'd doubt it's worth the complexity.  Picking some reasonable value gets
you the vast majority of the benefit.   Something like
COSTS_N_INSNS(6) is enough to get CSE to trigger.  So what's left is a
reasonable cost, particularly for the division-by-constant case where we
need a ceiling for synth_mult.


Ya, makes sense.  I noticed our multi-word multiply costs are a bit odd 
too (they really only work for 64-bit mul on 32-bit targets), but that's 
probably not worth worrying about either.
We do have changes locally that adjust various costs.  One of them is 
highpart multiply.  One of the many things to start working through once 
gcc-15 opens for development.  Hence my desire to help keep gcc-14 on 
track for an on-time release.


Jeff


Re: [PATCH v7 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-03-25 Thread Joseph Myers
On Wed, 20 Mar 2024, Qing Zhao wrote:

> +  /* get the TYPE of the counted_by field.  */

Start comments with an uppercase letter.

> +   The type of the first argument of this function is a POINTER type
> +   to the orignal flexible array type.

s/orignal/original/

> +   If HANDLE_COUNTED_BY is true, check the counted_by attribute and generate
> +   call to .ACCESS_WITH_SIZE. otherwise, ignore the attribute.  */

A sentence should start with an uppercase letter, "Otherwise".

> -  /* Ordinary case; arg is a COMPONENT_REF or a decl.  */
> +  /* Ordinary case; arg is a COMPONENT_REF or a decl,or a call to
> +  .ACCESS_WITH_SIZE.  */

There should be a space after a comma.

> +/* Get the corresponding reference from the call to a .ACCESS_WITH_SIZE.
> + * i.e the first argument of this call. return NULL_TREE otherwise.  */
> +extern tree get_ref_from_access_with_size (tree);

Again, start a sentence with an uppercase letter.

> +case CALL_EXPR:
> +  /* for a call to .ACCESS_WITH_SIZE, check the first argument.  */

Likewise.

> +  /* for a call to .ACCESS_WITH_SIZE, check the first argument.  */

Likewise.

-- 
Joseph S. Myers
josmy...@redhat.com



[PATCH] c++: templated substitution into lambda-expr [PR114393]

2024-03-25 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk?

-- >8 --

The below testcases use a lambda-expr as a template argument and they
all trip over the below added tsubst_lambda_expr sanity check ultimately
because current_template_parms is empty, which causes push_template_decl
to return error_mark_node from the call to begin_lambda_type.  Were it
not for the sanity check, this silent error_mark_node result would lead to
nonsensical errors down the line, or to silent breakage.

In the first testcase, we hit this assert during instantiation of the
dependent alias template-id c1_t<_Data> from instantiate_template, which
clears current_template_parms via push_to_top_level.  Similar story for
the second testcase.  For the third testcase we hit the assert during
partial instantiation of the member template from instantiate_class_template
which similarly calls push_to_top_level.

These testcases illustrate that templated substitution into a lambda-expr
is not always possible, in particular when we lost the relevant template
context.  I experimented with recovering the template context by making
tsubst_lambda_expr fall back to using scope_chain->prev->template_parms if
current_template_parms is empty, which worked but seemed like a hack.  I
also experimented with preserving the template context by keeping
current_template_parms set during instantiate_template for a dependent
specialization, which also worked, but it's at odds with the fact that we
cache dependent specializations (and so they should be independent of
the template context).

So instead of trying to make such substitution work, this patch uses the
extra-args mechanism to defer templated substitution into a lambda-expr
when we lost the relevant template context.

PR c++/114393
PR c++/107457
PR c++/93595

gcc/cp/ChangeLog:

* cp-tree.h (LAMBDA_EXPR_EXTRA_ARGS):
(struct GTY):
* module.cc (trees_out::core_vals) : Stream
LAMBDA_EXPR_EXTRA_ARGS.
(trees_in::core_vals) : Likewise.
* pt.cc (has_extra_args_mechanism_p):
(tsubst_lambda_expr):

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ2.C: New test.
* g++.dg/cpp2a/lambda-targ3.C: New test.
* g++.dg/cpp2a/lambda-targ4.C: New test.
---
 gcc/cp/cp-tree.h  |  5 +
 gcc/cp/module.cc  |  2 ++
 gcc/cp/pt.cc  | 20 ++--
 gcc/testsuite/g++.dg/cpp2a/lambda-targ2.C | 19 +++
 gcc/testsuite/g++.dg/cpp2a/lambda-targ3.C | 12 
 gcc/testsuite/g++.dg/cpp2a/lambda-targ4.C | 12 
 6 files changed, 68 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ4.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index c29a5434492..27100537038 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -1538,6 +1538,10 @@ enum cp_lambda_default_capture_mode_type {
 #define LAMBDA_EXPR_REGEN_INFO(NODE) \
   (((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->regen_info)
 
+/* Like PACK_EXPANSION_EXTRA_ARGS, for lambda-expressions.  */
+#define LAMBDA_EXPR_EXTRA_ARGS(NODE) \
+  (((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->extra_args)
+
 /* The closure type of the lambda, which is also the type of the
LAMBDA_EXPR.  */
 #define LAMBDA_EXPR_CLOSURE(NODE) \
@@ -1550,6 +1554,7 @@ struct GTY (()) tree_lambda_expr
   tree this_capture;
   tree extra_scope;
   tree regen_info;
+  tree extra_args;
   vec *pending_proxies;
   location_t locus;
   enum cp_lambda_default_capture_mode_type default_capture_mode : 2;
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 52c60cf370c..1cd890909e3 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -6312,6 +6312,7 @@ trees_out::core_vals (tree t)
   WT (((lang_tree_node *)t)->lambda_expression.this_capture);
   WT (((lang_tree_node *)t)->lambda_expression.extra_scope);
   WT (((lang_tree_node *)t)->lambda_expression.regen_info);
+  WT (((lang_tree_node *)t)->lambda_expression.extra_args);
   /* pending_proxies is a parse-time thing.  */
   gcc_assert (!((lang_tree_node *)t)->lambda_expression.pending_proxies);
   if (state)
@@ -6814,6 +6815,7 @@ trees_in::core_vals (tree t)
   RT (((lang_tree_node *)t)->lambda_expression.this_capture);
   RT (((lang_tree_node *)t)->lambda_expression.extra_scope);
   RT (((lang_tree_node *)t)->lambda_expression.regen_info);
+  RT (((lang_tree_node *)t)->lambda_expression.extra_args);
   /* lambda_expression.pending_proxies is NULL  */
   ((lang_tree_node *)t)->lambda_expression.locus
= state->read_location (*this);
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 8cf0d5b7a8d..b1a9ee2b385 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -3855,7 +3855,8 @@ has_extra_args_mechanism_p 

Re: [PATCH v7 1/5] Provide counted_by attribute to flexible array member field (PR108896)

2024-03-25 Thread Joseph Myers
On Wed, 20 Mar 2024, Qing Zhao wrote:

> +  /* This attribute only applies to a C99 flexible array member type.  */
> +  else if (! c_flexible_array_member_type_p (TREE_TYPE (decl)))
> +{
> +  error_at (DECL_SOURCE_LOCATION (decl),
> + "%qE attribute is not allowed for a non"
> + " flexible array member field",

"non-flexible" not "non flexible" ("non" shouldn't appear as a word on its 
own).

> +  /* Error when the field is not found in the containing structure.  */
> +  if (!counted_by_field)
> +error_at (DECL_SOURCE_LOCATION (field_decl),
> +   "Argument %qE to the %qE attribute is not a field declaration"
> +   " in the same structure as %qD", fieldname,

Diagnostics should start with a lowercase letter, "argument" not 
"Argument".

> +  if (TREE_CODE (TREE_TYPE (real_field)) != INTEGER_TYPE)
> + error_at (DECL_SOURCE_LOCATION (field_decl),
> +   "Argument %qE to the %qE attribute is not a field declaration"
> +   " with an integer type", fieldname,

Likewise.

Generally checks for integer types should allow any INTEGRAL_TYPE_P, 
rather than just INTEGER_TYPE.  For example, it should be valid to use 
this attribute with a field with _BitInt type.  (It would be fairly 
useless with a _BitInt larger than size_t, but maybe e.g. someone knows 
the size in their code must fit into 24 bits and so uses unsigned 
_BitInt(24) for the field.)

Of course there should be corresponding testcases for _Bool / enum / 
_BitInt count fields.

What happens when there are multiple counted_by attributes on the same 
field?  As far as I can see, all but one end up being ignored (by the code 
that actually uses the attribute).  I think multiple such attributes using 
different identifiers should be diagnosed, even if all the identifiers are 
indeed integer fields in the same structure - it doesn't seem meaningful 
to say that multiple fields give the count of elements.  (Multiple 
attributes with the *same* identifier are probably OK to allow; maybe that 
could arise in code using complicated macros that end up adding the 
attribute more than once.)

> +@cindex @code{counted_by} variable attribute
> +@item counted_by (@var{count})
> +The @code{counted_by} attribute may be attached to the C99 flexible array
> +member of a structure.  It indicates that the number of the elements of the
> +array is given by the field named "@var{count}" in the same structure as the
> +flexible array member.

You shouldn't use ASCII quotes like that in Texinfo (outside @code etc. 
where they represent literal quotes in programming language source code).  
You can say ``@var{count}'' if you wish to quote the name.

> +The field that represents the number of the elements should have an
> +integer type.  Otherwise, the compiler will report a warning and ignore
> +the attribute.
> +When the field that represents the number of the elements is assigned a
> +negative integer value, the compiler will treat the value as zero.

In general it's best for documentation to be in the present tense (so the 
compiler *reports* a warning rather than "will report", *treats* the value 
as zero rather than "will treat").

> +It's the user's responsibility to make sure the above requirements to
> +be kept all the time.  Otherwise the compiler will report warnings,
> +at the same time, the results of the array bound sanitizer and the
> +@code{__builtin_dynamic_object_size} is undefined.

Likewise.
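Here is a minimal usage sketch of the attribute as documented above. It assumes a compiler that implements `counted_by`; elsewhere the macro expands to nothing. `COUNTED_BY`, `struct int_buf` and `int_buf_new` are hypothetical names. The point is that the count field is set before the flexible array is accessed, as the documentation requires.

```c
#include <stdlib.h>

/* Use counted_by only when the compiler advertises it; otherwise the macro
   expands to nothing and DATA is an ordinary flexible array member.  The
   nested #if avoids evaluating __has_attribute where it is undefined.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(field) __attribute__((counted_by(field)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(field)
#endif

struct int_buf {
  size_t count;                  /* number of elements in DATA */
  int data[] COUNTED_BY(count);  /* C99 flexible array member */
};

/* Hypothetical helper: allocate a buffer with N elements and assign COUNT
   before DATA is touched, so the attribute's requirement always holds.  */
static struct int_buf *
int_buf_new (size_t n)
{
  struct int_buf *b = malloc (sizeof *b + n * sizeof b->data[0]);
  if (!b)
    return NULL;
  b->count = n;
  for (size_t i = 0; i < n; i++)
    b->data[i] = (int) i;
  return b;
}
```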

-- 
Joseph S. Myers
josmy...@redhat.com



New German PO file for 'gcc' (version 14.1-b20240218)

2024-03-25 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

https://translationproject.org/latest/gcc/de.po

(This file, 'gcc-14.1-b20240218.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: TARGET_RTX_COSTS and pipeline latency vs. variable-latency instructions (was Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.)

2024-03-25 Thread Palmer Dabbelt

On Mon, 25 Mar 2024 13:27:34 PDT (-0700), Jeff Law wrote:



On 3/25/24 2:13 PM, Palmer Dabbelt wrote:

On Mon, 25 Mar 2024 12:59:14 PDT (-0700), Jeff Law wrote:



On 3/25/24 1:48 PM, Xi Ruoyao wrote:

On Mon, 2024-03-18 at 20:54 -0600, Jeff Law wrote:

+/* Costs to use when optimizing for xiangshan nanhu.  */
+static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* fp_add */
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* fp_mul */
+  {COSTS_N_INSNS (10), COSTS_N_INSNS (20)},    /* fp_div */
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* int_mul */
+  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},    /* int_div */
+  6,    /* issue_rate */
+  3,    /* branch_cost */
+  3,    /* memory_cost */
+  3,    /* fmv_cost */
+  true,    /* slow_unaligned_access */
+  false,    /* use_divmod_expansion */
+  RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH,  /* fusible_ops */
+  NULL,    /* vector cost */



Is your integer division really that fast?  The table above essentially
says that your cpu can do integer division in 6 cycles.


Hmm, I just saw that I've coded an even smaller value for LoongArch CPUs,
so forgive me for "hijacking" this thread...

The problem seems to be that integer division may take a different number
of cycles for different inputs: on LoongArch LA664 I've observed 5 cycles
for some inputs and 39 cycles for other inputs.

So should we use the minimal value, the maximum value, or something in-
between for TARGET_RTX_COSTS and pipeline descriptions?

Yea, early outs are relatively common in the actual hardware
implementation.

The biggest reason to refine the cost of a division is so that we've got
a reasonably accurate cost for division by a constant -- which can often
be done with a multiplication-by-reciprocal sequence.  That sequence
will use mult, add, sub and shadd insns, and you need a reasonable cost
model for those so you can compare against the cost of a hardware
division.
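A minimal C sketch of the multiply-by-reciprocal idea for unsigned 32-bit division by 3 and by 10, using the standard magic constants for those divisors. It is an illustration only, not GCC's expander code; `div3`, `div10` and `check_reciprocal` are hypothetical names. The 64-bit product stands in for the widening or high-part multiply a target would use.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 32-bit division by 3 without a divide instruction.
   0xAAAAAAAB == ceil (2^33 / 3), so (x * 0xAAAAAAAB) >> 33 == x / 3
   for every 32-bit x.  */
static uint32_t
div3 (uint32_t x)
{
  return (uint32_t) (((uint64_t) x * 0xAAAAAAABu) >> 33);
}

/* Same technique for 10: 0xCCCCCCCD == ceil (2^35 / 10).  */
static uint32_t
div10 (uint32_t x)
{
  return (uint32_t) (((uint64_t) x * 0xCCCCCCCDu) >> 35);
}

/* Compare the multiply/shift sequences against real division.  */
static void
check_reciprocal (uint32_t x)
{
  assert (div3 (x) == x / 3);
  assert (div10 (x) == x / 10);
}
```

Whether a sequence like this beats the hardware divide is exactly the decision the int_mul and int_div cost entries feed into.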

So, to answer your question: choose something sensible.  You probably
don't want the fastest case, and you may not want the slowest case.


Maybe we should have some sort of per-bit-set cost hook for mul/div?
Without that we're kind of just guessing at whether the implementation
has early outs based on heuristics used to implicitly generate the cost
models.

Not sure that's really worth the complexity, though...

I'd doubt it's worth the complexity.  Picking some reasonable value gets
you the vast majority of the benefit.   Something like
COSTS_N_INSNS(6) is enough to get CSE to trigger.  So what's left is a
reasonable cost, particularly for the division-by-constant case where we
need a ceiling for synth_mult.
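A rough sketch of why the division cost acts as a ceiling: a synthesized sequence is only worth emitting if it is cheaper than the hardware divide. This is not GCC's synth_mult logic; the helper names and the one-instruction cost for shifts and adds are assumptions. The mul/div costs come from the quoted xiangshan_nanhu_tune_info table, and COSTS_N_INSNS mirrors its definition in GCC's rtl.h.

```c
#include <assert.h>
#include <stdbool.h>

/* GCC's rtl.h defines COSTS_N_INSNS (N) as ((N) * 4); repeated here so
   the sketch compiles on its own.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* int_mul and int_div from the quoted tuning table; ALU_COST (shift,
   add, sub) is an assumed single-instruction cost.  */
enum
{
  MUL_COST = COSTS_N_INSNS (3),
  DIV_COST = COSTS_N_INSNS (6),
  ALU_COST = COSTS_N_INSNS (1)
};

/* Hypothetical helper: cost of a reciprocal sequence made of N_MUL
   multiplies and N_ALU shifts/adds/subs.  */
static int
reciprocal_seq_cost (int n_mul, int n_alu)
{
  return n_mul * MUL_COST + n_alu * ALU_COST;
}

/* The division cost is the ceiling: only a strictly cheaper sequence
   is preferred over the hardware divide.  */
static bool
prefer_reciprocal (int n_mul, int n_alu)
{
  return reciprocal_seq_cost (n_mul, n_alu) < DIV_COST;
}
```

With these numbers a one-multiply, one-shift sequence (cost 16) beats the divide (cost 24), while a sequence needing several fix-up instructions does not, which is why an unrealistically low int_div cost can suppress the reciprocal expansion.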


Ya, makes sense.  I noticed our multi-word multiply costs are a bit odd 
too (they really only work for 64-bit mul on 32-bit targets), but that's 
probably not worth worrying about either.




Jeff


Re: TARGET_RTX_COSTS and pipeline latency vs. variable-latency instructions (was Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.)

2024-03-25 Thread Jeff Law




On 3/25/24 2:13 PM, Palmer Dabbelt wrote:

On Mon, 25 Mar 2024 12:59:14 PDT (-0700), Jeff Law wrote:



On 3/25/24 1:48 PM, Xi Ruoyao wrote:

On Mon, 2024-03-18 at 20:54 -0600, Jeff Law wrote:

+/* Costs to use when optimizing for xiangshan nanhu.  */
+static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* fp_add */
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* fp_mul */
+  {COSTS_N_INSNS (10), COSTS_N_INSNS (20)},    /* fp_div */
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* int_mul */
+  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},    /* int_div */
+  6,    /* issue_rate */
+  3,    /* branch_cost */
+  3,    /* memory_cost */
+  3,    /* fmv_cost */
+  true,    /* slow_unaligned_access */
+  false,    /* use_divmod_expansion */
+  RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH,  /* fusible_ops */
+  NULL,    /* vector cost */



Is your integer division really that fast?  The table above essentially
says that your cpu can do integer division in 6 cycles.


Hmm, I just saw that I've coded an even smaller value for LoongArch CPUs,
so forgive me for "hijacking" this thread...

The problem seems to be that integer division may take a different number
of cycles for different inputs: on LoongArch LA664 I've observed 5 cycles
for some inputs and 39 cycles for other inputs.

So should we use the minimal value, the maximum value, or something in-
between for TARGET_RTX_COSTS and pipeline descriptions?

Yea, early outs are relatively common in the actual hardware
implementation.

The biggest reason to refine the cost of a division is so that we've got
a reasonably accurate cost for division by a constant -- which can often
be done with a multiplication-by-reciprocal sequence.  That sequence
will use mult, add, sub and shadd insns, and you need a reasonable cost
model for those so you can compare against the cost of a hardware
division.

So, to answer your question: choose something sensible.  You probably
don't want the fastest case, and you may not want the slowest case.


Maybe we should have some sort of per-bit-set cost hook for mul/div?
Without that we're kind of just guessing at whether the implementation
has early outs based on heuristics used to implicitly generate the cost
models.


Not sure that's really worth the complexity, though...

I'd doubt it's worth the complexity.  Picking some reasonable value gets
you the vast majority of the benefit.   Something like
COSTS_N_INSNS(6) is enough to get CSE to trigger.  So what's left is a 
reasonable cost, particularly for the division-by-constant case where we 
need a ceiling for synth_mult.


Jeff


Re: TARGET_RTX_COSTS and pipeline latency vs. variable-latency instructions (was Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.)

2024-03-25 Thread Palmer Dabbelt

On Mon, 25 Mar 2024 12:59:14 PDT (-0700), Jeff Law wrote:



On 3/25/24 1:48 PM, Xi Ruoyao wrote:

On Mon, 2024-03-18 at 20:54 -0600, Jeff Law wrote:

+/* Costs to use when optimizing for xiangshan nanhu.  */
+static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* fp_add */
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* fp_mul */
+  {COSTS_N_INSNS (10), COSTS_N_INSNS (20)},    /* fp_div */
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* int_mul */
+  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},    /* int_div */
+  6,    /* issue_rate */
+  3,    /* branch_cost */
+  3,    /* memory_cost */
+  3,    /* fmv_cost */
+  true,    /* slow_unaligned_access */
+  false,    /* use_divmod_expansion */
+  RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH,  /* fusible_ops */
+  NULL,    /* vector cost */



Is your integer division really that fast?  The table above essentially
says that your cpu can do integer division in 6 cycles.


Hmm, I just saw that I've coded an even smaller value for LoongArch CPUs,
so forgive me for "hijacking" this thread...

The problem seems to be that integer division may take a different number
of cycles for different inputs: on LoongArch LA664 I've observed 5 cycles
for some inputs and 39 cycles for other inputs.

So should we use the minimal value, the maximum value, or something in-
between for TARGET_RTX_COSTS and pipeline descriptions?

Yea, early outs are relatively common in the actual hardware
implementation.

The biggest reason to refine the cost of a division is so that we've got
a reasonably accurate cost for division by a constant -- which can often
be done with a multiplication-by-reciprocal sequence.  That sequence
will use mult, add, sub and shadd insns, and you need a reasonable cost
model for those so you can compare against the cost of a hardware
division.

So, to answer your question: choose something sensible.  You probably
don't want the fastest case, and you may not want the slowest case.


Maybe we should have some sort of per-bit-set cost hook for mul/div?
Without that we're kind of just guessing at whether the implementation
has early outs based on heuristics used to implicitly generate the cost
models.


Not sure that's really worth the complexity, though...


Jeff


Re: TARGET_RTX_COSTS and pipeline latency vs. variable-latency instructions (was Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.)

2024-03-25 Thread Jeff Law




On 3/25/24 1:48 PM, Xi Ruoyao wrote:

On Mon, 2024-03-18 at 20:54 -0600, Jeff Law wrote:

+/* Costs to use when optimizing for xiangshan nanhu.  */
+static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* fp_add */
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* fp_mul */
+  {COSTS_N_INSNS (10), COSTS_N_INSNS (20)},    /* fp_div */
+  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* int_mul */
+  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},    /* int_div */
+  6,    /* issue_rate */
+  3,    /* branch_cost */
+  3,    /* memory_cost */
+  3,    /* fmv_cost */
+  true,    /* slow_unaligned_access */
+  false,    /* use_divmod_expansion */
+  RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH,  /* fusible_ops */
+  NULL,    /* vector cost */



Is your integer division really that fast?  The table above essentially
says that your cpu can do integer division in 6 cycles.


Hmm, I just saw that I've coded an even smaller value for LoongArch CPUs,
so forgive me for "hijacking" this thread...

The problem seems to be that integer division may take a different number
of cycles for different inputs: on LoongArch LA664 I've observed 5 cycles
for some inputs and 39 cycles for other inputs.

So should we use the minimal value, the maximum value, or something
in-between for TARGET_RTX_COSTS and pipeline descriptions?

Yea, early outs are relatively common in the actual hardware
implementation.


The biggest reason to refine the cost of a division is so that we've got
a reasonably accurate cost for division by a constant -- which can often
be done with a multiplication-by-reciprocal sequence.  That sequence
will use mult, add, sub and shadd insns, and you need a reasonable cost
model for those so you can compare against the cost of a hardware
division.


So, to answer your question: choose something sensible.  You probably
don't want the fastest case, and you may not want the slowest case.


Jeff


Re: [PATCH v2] c++: direct-init of an array of class type [PR59465]

2024-03-25 Thread Marek Polacek
On Mon, Mar 25, 2024 at 01:39:39PM +0100, Stephan Bergmann wrote:
> On 3/25/24 13:07, Jakub Jelinek wrote:
> > On Mon, Mar 25, 2024 at 12:36:46PM +0100, Stephan Bergmann wrote:
> > > This started to break
> > > 
> > > > $ cat test.cc
> > > > struct S1 { S1(); };
> > > > struct S2 {
> > > >  S2() {}
> > > >  S1 a[1] {};
> > > > };
> > > 
> > > > $ g++ -fsyntax-only test.cc
> > > > test.cc: In constructor ‘S2::S2()’:
> > > > test.cc:3:10: error: invalid initializer for array member ‘S1 S2::a [1]’
> > > >  3 | S2() {}
> > > >|  ^
> > 
> > https://gcc.gnu.org/PR114439 ?
> 
> yes, sorry, missed that already-existing bug tracker issue

I have a patch now, sorry about the breakage.  I'm surprised we had no test
covering this :(.

Marek



TARGET_RTX_COSTS and pipeline latency vs. variable-latency instructions (was Re: [PATCH] RISC-V: Add XiangShan Nanhu microarchitecture.)

2024-03-25 Thread Xi Ruoyao
On Mon, 2024-03-18 at 20:54 -0600, Jeff Law wrote:
> > +/* Costs to use when optimizing for xiangshan nanhu.  */
> > +static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* fp_add */
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* fp_mul */
> > +  {COSTS_N_INSNS (10), COSTS_N_INSNS (20)},    /* fp_div */
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},    /* int_mul */
> > +  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},    /* int_div */
> > +  6,    /* issue_rate */
> > +  3,    /* branch_cost */
> > +  3,    /* memory_cost */
> > +  3,    /* fmv_cost */
> > +  true,    /* slow_unaligned_access */
> > +  false,    /* use_divmod_expansion */
> > +  RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH,  /* fusible_ops */
> > +  NULL,    /* vector cost */

> Is your integer division really that fast?  The table above essentially 
> says that your cpu can do integer division in 6 cycles.

Hmm, I just saw that I've coded an even smaller value for LoongArch CPUs,
so forgive me for "hijacking" this thread...

The problem seems to be that integer division may take a different number
of cycles for different inputs: on LoongArch LA664 I've observed 5 cycles
for some inputs and 39 cycles for other inputs.

So should we use the minimal value, the maximum value, or something in-
between for TARGET_RTX_COSTS and pipeline descriptions?

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] c-family, c++: Handle EXCESS_PRECISION_EXPR in pretty printers

2024-03-25 Thread Jason Merrill

On 3/22/24 05:11, Jakub Jelinek wrote:

Hi!

I've noticed that the c-c++-common/gomp/depobj-3.c test FAILs on i686-linux:
PASS: c-c++-common/gomp/depobj-3.c  -std=c++17  at line 17 (test for warnings, line 15)
FAIL: c-c++-common/gomp/depobj-3.c  -std=c++17  at line 39 (test for warnings, line 37)
PASS: c-c++-common/gomp/depobj-3.c  -std=c++17  at line 43 (test for errors, line 41)
PASS: c-c++-common/gomp/depobj-3.c  -std=c++17  (test for warnings, line 45)
FAIL: c-c++-common/gomp/depobj-3.c  -std=c++17 (test for excess errors)
Excess errors:
/home/jakub/src/gcc/gcc/testsuite/c-c++-common/gomp/depobj-3.c:37:38: warning: the 'destroy' expression ''excess_precision_expr' not supported by dump_expr' should be the same as the 'depobj' argument 'obj' [-Wopenmp]

The following patch replaces that 'excess_precision_expr' not supported by dump_expr
with (float)(((long double)a) + (long double)5)
Still ugly and doesn't actually fix the FAIL (will deal with that
incrementally), but at least valid C/C++ and shows the excess precision
handling in action.
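A small C illustration of what the new pretty-printed form means, assuming `a` is a `float` in the testcase (its declaration is not shown here). On targets where FLT_EVAL_METHOD is 2, such as i686 using x87 arithmetic, `a + 5` is evaluated in long double and only rounded back to float at the end, which is what the printed cast expression spells out. The function names are hypothetical.

```c
#include <assert.h>
#include <float.h>

/* FLT_EVAL_METHOD == 2 means float and double operations are evaluated
   in long double range and precision; 0 means each operation is
   evaluated in its own type.  */
static int
eval_method (void)
{
  return FLT_EVAL_METHOD;
}

/* The expression as written in the source.  */
static float
add5 (float a)
{
  return a + 5;
}

/* The spelled-out form: widen both operands to long double, add, then
   round the result back to float.  */
static float
add5_spelled_out (float a)
{
  return (float) (((long double) a) + (long double) 5);
}
```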

Ok for trunk if this passes bootstrap/regtest?


OK.


2024-03-22  Jakub Jelinek  

gcc/c-family/
* c-pretty-print.cc (pp_c_cast_expression,
c_pretty_printer::expression): Handle EXCESS_PRECISION_EXPR like
NOP_EXPR.
gcc/cp/
* error.cc (dump_expr): Handle EXCESS_PRECISION_EXPR like NOP_EXPR.

--- gcc/c-family/c-pretty-print.cc.jj   2024-01-12 10:07:57.744858004 +0100
+++ gcc/c-family/c-pretty-print.cc  2024-03-22 09:58:56.640001991 +0100
@@ -2327,6 +2327,7 @@ pp_c_cast_expression (c_pretty_printer *
     case FIX_TRUNC_EXPR:
     CASE_CONVERT:
     case VIEW_CONVERT_EXPR:
+    case EXCESS_PRECISION_EXPR:
       if (!location_wrapper_p (e))
 	pp_c_type_cast (pp, TREE_TYPE (e));
       pp_c_cast_expression (pp, TREE_OPERAND (e, 0));
@@ -2753,6 +2754,7 @@ c_pretty_printer::expression (tree e)
     case FIX_TRUNC_EXPR:
     CASE_CONVERT:
     case VIEW_CONVERT_EXPR:
+    case EXCESS_PRECISION_EXPR:
       pp_c_cast_expression (this, e);
       break;
 
--- gcc/cp/error.cc.jj	2024-01-20 12:32:34.157939870 +0100
+++ gcc/cp/error.cc	2024-03-22 10:00:38.259610171 +0100
@@ -2662,6 +2662,7 @@ dump_expr (cxx_pretty_printer *pp, tree
     CASE_CONVERT:
     case IMPLICIT_CONV_EXPR:
     case VIEW_CONVERT_EXPR:
+    case EXCESS_PRECISION_EXPR:
       {
 	tree op = TREE_OPERAND (t, 0);
 


Jakub





Re: [PATCH v2] c++: ICE with noexcept and local specialization, again [PR114349]

2024-03-25 Thread Jason Merrill

On 3/22/24 17:30, Marek Polacek wrote:

On Thu, Mar 21, 2024 at 05:27:37PM -0400, Jason Merrill wrote:

On 3/21/24 17:01, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Patrick noticed that my r14-9339-gdc6c3bfb59baab patch is wrong;
we're dealing with a noexcept-spec there, not a noexcept-expr, so
setting cp_noexcept_operand et al is incorrect.  Back to the drawing
board then.

To fix noexcept84.C, we should probably avoid doing push_to_top_level
in certain cases.  Patrick suggested checking:

const bool push_to_top = current_function_decl != fn;

which works, but I'm not sure I follow the logic there.  I also came
up with

const bool push_to_top = !decl_function_context (fn);

which also works.  But ultimately I went with !DECL_TEMPLATE_INSTANTIATED;
if DECL_TEMPLATE_INSTANTIATED is set, we've already pushed to top level
if it was necessary in instantiate_body.


This sort of thing is what maybe_push_to_top_level is for; does that also
work?


Sadly -- and I should have mentioned that -- no.  maybe_push_to_top_level asks:

   bool push_to_top
     = !(current_function_decl
         && !LAMBDA_FUNCTION_P (d)
         && decl_function_context (d) == current_function_decl);

here both d and current_function_decl are test()::S::S(), and
decl_function_context (d) is test().  (current_function_decl was
set to test()::S::S() by an earlier push_access_scope call.)

But I want it to work, and I think using maybe_ would be a way nicer
fix.  So what if we don't push to top level if decl_function_context
is non-null?  I had to add the LAMBDA_TYPE_P check, though: it looks like
we always have to push to top level for lambdas, but sometimes
we get a lambda's TYPE_DECL, and LAMBDA_FUNCTION_P doesn't catch
that.  An example is lambda-nested4.C.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Patrick noticed that my r14-9339-gdc6c3bfb59baab patch is wrong;
we're dealing with a noexcept-spec there, not a noexcept-expr, so
setting cp_noexcept_operand et al is incorrect.  Back to the drawing
board then.

To fix noexcept84.C, we should probably avoid doing push_to_top_level
in certain cases.  maybe_push_to_top_level didn't work here as-is, so
I changed it to not push to top level if decl_function_context is
non-null, when we are not dealing with a lambda.

This also fixes c++/114349, introduced by r14-9339.

PR c++/114349

gcc/cp/ChangeLog:

* name-lookup.cc (maybe_push_to_top_level): For a non-lambda,
don't push to top level if decl_function_context is non-null.
* pt.cc (maybe_instantiate_noexcept): Use maybe_push_to_top_level.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept85.C: New test.
* g++.dg/cpp0x/noexcept86.C: New test.
---
  gcc/cp/name-lookup.cc                   | 12 ++---
  gcc/cp/pt.cc                            | 11 ++---
  gcc/testsuite/g++.dg/cpp0x/noexcept85.C | 33 +
  gcc/testsuite/g++.dg/cpp0x/noexcept86.C | 25 +++
  4 files changed, 68 insertions(+), 13 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept85.C
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept86.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index dce4caf8981..4b2b27bdd0d 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -8664,10 +8664,14 @@ maybe_push_to_top_level (tree d)
 {
   /* Push if D isn't function-local, or is a lambda function, for which name
      resolution is already done.  */
-  bool push_to_top
-    = !(current_function_decl
-        && !LAMBDA_FUNCTION_P (d)
-        && decl_function_context (d) == current_function_decl);
+  const bool push_to_top
+    = (LAMBDA_FUNCTION_P (d)
+       || (TREE_CODE (d) == TYPE_DECL
+           && TREE_TYPE (d)
+           && LAMBDA_TYPE_P (TREE_TYPE (d)))
+       || !current_function_decl
+       || (!decl_function_context (d)
+           && decl_function_context (d) != current_function_decl));


This line seems unnecessary; the case it excludes is when 
decl_function_context and current_function_decl are both null, but if 
current_function_decl is null we already succeeded.


OK with this line removed.
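Jason's point can be checked mechanically. The sketch below models the trees as plain pointers (an assumption: only null-ness and identity matter here) and exhaustively compares the condition as posted with the version that drops the `decl_function_context (d) != current_function_decl` conjunct. All names are hypothetical stand-ins, not GCC functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for FUNCTION_DECLs: NULL models a null tree, and two
   distinct dummy objects model two different functions.  */
static int fn_a, fn_b;
static const void *const candidates[] = { NULL, &fn_a, &fn_b };

/* Condition as posted.  IS_LAMBDA folds the LAMBDA_FUNCTION_P and
   LAMBDA_TYPE_P tests; CFD is current_function_decl and CTX is
   decl_function_context (d).  */
static bool
push_as_posted (bool is_lambda, const void *cfd, const void *ctx)
{
  return is_lambda || !cfd || (!ctx && ctx != cfd);
}

/* The same condition with the extra conjunct removed.  */
static bool
push_simplified (bool is_lambda, const void *cfd, const void *ctx)
{
  return is_lambda || !cfd || !ctx;
}

/* Exhaustively compare both forms; returns the number of cases checked.  */
static int
check_equivalence (void)
{
  int n = 0;
  for (int l = 0; l < 2; l++)
    for (size_t i = 0; i < 3; i++)
      for (size_t j = 0; j < 3; j++)
        {
          assert (push_as_posted (l, candidates[i], candidates[j])
                  == push_simplified (l, candidates[i], candidates[j]));
          n++;
        }
  return n;
}
```

Both forms agree in all 18 combinations: whenever `ctx` is null, the extra test can only be false when `cfd` is also null, and that case is already accepted by `!cfd`.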


   if (push_to_top)
     push_to_top_level ();
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 8cf0d5b7a8d..7b00a8615d2 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -26855,7 +26855,7 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t complain)
}
else if (push_tinst_level (fn))
{
- push_to_top_level ();
+ const bool push_to_top = maybe_push_to_top_level (fn);
  push_access_scope (fn);
  push_deferring_access_checks (dk_no_deferred);
  input_location = DECL_SOURCE_LOCATION (fn);
@@ -26878,17 +26878,10 @@ maybe_instantiate_noexcept (tree fn, tsubst_flags_t complain)
  if (orig_fn)
++processing_template_decl;
  
-	  ++cp_unevaluated_operand;

- 

C++ Patch ping

2024-03-25 Thread Jakub Jelinek
Hi!

I'd like to ping the
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647445.html
PR111284 P2 patch.

Thanks.

Jakub



Re: [PATCH] libgcc: arm: fix build for FDPIC target

2024-03-25 Thread Max Filippov
On Fri, Mar 22, 2024 at 1:15 PM Max Filippov  wrote:
>
> libgcc/
> * unwind-arm-common.inc (__gnu_personality_sigframe_fdpic): Cast
> last argument of _Unwind_VRS_Set to void *.
> ---
>  libgcc/unwind-arm-common.inc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Build-tested for arm-gnu-uclinuxfdpiceabi, committed as obvious.

-- 
Thanks.
-- Max


Re: [PATCH] RISC-V: Add initial cost handling for segment loads/stores.

2024-03-25 Thread Robin Dapp
> So where do we stand with this?  Juzhe asked it to be rebased, but I
> don't see a rebased version in my inbox and I don't see anything that
> looks like this on the trunk.

I missed this one and figured that, as we're pretty late in the cycle, it
can wait until GCC 15.  Therefore, let's call it "deferred".

Regards
 Robin


Re: [PING][PATCH] libstdc++: atomic: Add missing clear_padding in __atomic_float constructor

2024-03-25 Thread xndcn
Wow, thank you all, you guys!

On Thursday, March 14, 2024, Jonathan Wakely wrote:

> On Fri, 16 Feb 2024 at 15:15, Jonathan Wakely wrote:
> >
> > On Fri, 16 Feb 2024 at 14:10, Jakub Jelinek wrote:
> > >
> > > On Fri, Feb 16, 2024 at 01:51:54PM +, Jonathan Wakely wrote:
> > > > Ah, although __atomic_compare_exchange only takes pointers, the
> > > > compiler replaces that with a call to __atomic_compare_exchange_n
> > > > which takes the newval by value, which presumably uses an 80-bit FP
> > > > register and so the padding bits become indeterminate again.
> > >
> > > __atomic_compare_exchange_n only works with integers, so I guess
> > > it is doing VIEW_CONVERT_EXPR (aka union-style type punning) on the
> > > argument.
> > >
> > > Do you have preprocessed source for the testcase?
> >
> > Sent offlist.
>
> Jakub fixed the compiler, so I've pushed the attached patch now.
>
> Tested x86_64-linux.
>


Re: [PATCH] amdgcn: Add gfx1036 target

2024-03-25 Thread Richard Biener
On Mon, 25 Mar 2024, Richard Biener wrote:

> Add support for the gfx1036 RDNA2 APU integrated graphics devices.  The ROCm
> documentation warns that these may not be supported, but it seems to work
> at least partially.
> 
> x86 host bootstrap/regtest running, target-libgomp testing for the
> offload produces results comparable to those of gfx1030.  The nice
> thing is that gfx1036 is inside every Zen4 desktop CPU (Ryzen 7xxx)
> and testing on that doesn't interfere with a separate GPU used for
> your desktop (where I experienced crashes when using the GPU for both
> offload and graphics).
> 
> I'll note that while gfx1030 works with llvm14, gfx1036 needs llvm15
> as the minimum version for the assembler.
> 
> OK for trunk?
> 
> I'll follow up with the libgomp testing test summary for archival
> purposes.

Here's the result of a

make -k -j4 check-target-libgomp RUNTESTFLAGS="--target_board=unix/-foffload-options=-march=gfx1036"

I had to manually kill some hung processes from the OACC offload
testsuite.  I'll try again later this week with your locking fix
applied to newlib (not sure if that's also employed for non-I/O).

Richard.


cat <<'EOF' |
Native configuration is x86_64-pc-linux-gnu

=== libgomp tests ===


Running target unix/-foffload-options=-march=gfx1036
FAIL: libgomp.c++/../libgomp.c-c++-common/icv-7.c execution test
FAIL: libgomp.c++/../libgomp.c-c++-common/teams-2.c execution test
FAIL: libgomp.c++/../libgomp.c-c++-common/teams-nteams-icv-1.c execution test
FAIL: libgomp.c++/../libgomp.c-c++-common/teams-nteams-icv-2.c execution test
FAIL: libgomp.c++/../libgomp.c-c++-common/teams-nteams-icv-3.c execution test
FAIL: libgomp.c++/../libgomp.c-c++-common/teams-nteams-icv-4.c execution test
FAIL: libgomp.c++/firstprivate-2.C (test for excess errors)
UNRESOLVED: libgomp.c++/firstprivate-2.C compilation failed to produce executable
XPASS: libgomp.c++/target-49.C execution test
FAIL: libgomp.c/../libgomp.c-c++-common/icv-7.c execution test
FAIL: libgomp.c/../libgomp.c-c++-common/teams-2.c execution test
FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-1.c execution test
FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-2.c execution test
FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-3.c execution test
FAIL: libgomp.c/../libgomp.c-c++-common/teams-nteams-icv-4.c execution test
FAIL: libgomp.c/declare-variant-4-gfx1030.c (test for excess errors)
FAIL: libgomp.c/declare-variant-4-gfx1100.c (test for excess errors)
FAIL: libgomp.c/declare-variant-4-gfx900.c (test for excess errors)
FAIL: libgomp.c/declare-variant-4-gfx906.c (test for excess errors)
FAIL: libgomp.c/declare-variant-4-gfx908.c (test for excess errors)
FAIL: libgomp.c/declare-variant-4-gfx90a.c (test for excess errors)
FAIL: libgomp.c/declare-variant-4.c execution test
FAIL: libgomp.c/declare-variant-4.c scan-amdgcn-amdhsa-offload-tree-dump optimized "= gfx[^ ]+ ();"
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/deep-copy-10.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/deep-copy-10.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O2  execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/parallel-dims.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/parallel-dims.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O2  execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-atomic-1-gang.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/private-atomic-1-gang.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O2  execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/static-variable-1.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/static-variable-1.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O2  execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/vprop-2.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/vprop-2.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O2  execution test
FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/vprop.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O2  (test for excess errors)
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/deep-copy-10.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa  -O0  execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/deep-copy-10.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 

[PATCH] c++: fix alias CTAD [PR114377]

2024-03-25 Thread centurion
From b34312d82b236601c348382d30e625558f37d40c Mon Sep 17 00:00:00 2001
From: centurion 
Date: Mon, 25 Mar 2024 01:57:21 +0400
Subject: [PATCH] c++: fix alias CTAD [PR114377]

PR c++/114377

gcc/cp/ChangeLog:

PR c++/114377
* pt.cc (find_template_parameter_info::found): Use TREE_TYPE for
TEMPLATE_DECL instead of DECL_INITIAL.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-alias19.C: New test.
---
 gcc/cp/pt.cc  |  3 ++-
 .../g++.dg/cpp2a/class-deduction-alias19.C| 15 +++
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias19.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 8cf0d5b7a8d..d8a02f1cd7f 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -11032,7 +11032,8 @@ find_template_parameter_info::found (tree parm)
 {
   if (TREE_CODE (parm) == TREE_LIST)
 parm = TREE_VALUE (parm);
-  if (TREE_CODE (parm) == TYPE_DECL)
+  if (TREE_CODE (parm) == TYPE_DECL
+      || TREE_CODE (parm) == TEMPLATE_DECL)
 parm = TREE_TYPE (parm);
   else
 parm = DECL_INITIAL (parm);
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias19.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias19.C
new file mode 100644
index 000..1ea79bd7691
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-alias19.C
@@ -0,0 +1,15 @@
+// PR c++/114377
+// { dg-do compile { target c++20 } }
+
+template  typename Iterator>
+struct K {};
+
+template 
+class Foo {};
+
+template  typename TTP>
+using Bar = Foo>;
+
+void s() {
+Bar(1); // { dg-error "failed|no match" }
+}
-- 
2.44.0




Re: [PING^4] Re: [PATCH] analyzer: deal with -fshort-enums

2024-03-25 Thread Yvan ROUX - foss
Ping!

Rgds,
Yvan

From: Torbjorn SVENSSON - foss
Sent: Friday, March 15, 2024 11:32 AM
To: David Malcolm; Alexandre Oliva
Cc: gcc-patches@gcc.gnu.org; Yvan ROUX - foss
Subject: [PING^3] Re: [PATCH] analyzer: deal with -fshort-enums

Ping!

Kind regards,
Torbjörn

On 2024-03-08 10:14, Torbjorn SVENSSON wrote:
> Ping!
>
> Kind regards,
> Torbjörn
>
> On 2024-02-22 09:51, Torbjorn SVENSSON wrote:
>> Ping!
>>
>> Kind regards,
>> Torbjörn
>>
>> On 2024-02-07 17:21, Torbjorn SVENSSON wrote:
>>> Hi,
>>>
>>> Is it okay to backport 3cbab07b08d2f3a3ed34b6ec12e67727c59d285c to
>>> releases/gcc-13?
>>>
>>> Without this backport, I see these failures on arm-none-eabi:
>>>
>>> FAIL: gcc.dg/analyzer/switch-enum-1.c  (test for bogus messages, line
>>> 26)
>>> FAIL: gcc.dg/analyzer/switch-enum-1.c  (test for bogus messages, line
>>> 44)
>>> FAIL: gcc.dg/analyzer/switch-enum-2.c  (test for bogus messages, line
>>> 34)
>>> FAIL: gcc.dg/analyzer/switch-enum-2.c  (test for bogus messages, line
>>> 52)
>>> FAIL: gcc.dg/analyzer/torture/switch-enum-pr105273-doom-p_floor.c -O0
>>>   (test for bogus messages, line 82)
>>> FAIL: gcc.dg/analyzer/torture/switch-enum-pr105273-doom-p_maputl.c
>>> -O0(test for bogus messages, line 83)
>>>
>>> Kind regards,
>>> Torbjörn
>>>
>>>
>>> On 2023-12-06 23:22, David Malcolm wrote:
 On Wed, 2023-12-06 at 02:31 -0300, Alexandre Oliva wrote:
> On Nov 22, 2023, Alexandre Oliva  wrote:
>
>> Ah, nice, that's a great idea, I wish I'd thought of that!  Will
>> do.
>
> Sorry it took me so long, here it is.  I added two tests, so that,
> regardless of the defaults, we get both circumstances tested, without
> repetition.
>
> Regstrapped on x86_64-linux-gnu.  Also tested on arm-eabi.  Ok to
> install?

 Thanks for the updated patch.

 Looks good to me.

 Dave

>
>
> analyzer: deal with -fshort-enums
>
> On platforms that enable -fshort-enums by default, various switch-enum
> analyzer tests fail, because apply_constraints_for_gswitch doesn't
> expect the integral promotion type cast.  I've arranged for the code
> to cope with those casts.
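For context, a C sketch of the kind of code the switch-enum tests exercise; the enum and function names are hypothetical. With -fshort-enums (which the arm-none-eabi failures reported above suggest is in effect there by default), GCC gives `enum color` the smallest integer type that holds its values, so, per the patch description, the switch index may reach the analyzer as an integral promotion cast of the enum value rather than the enum value itself.

```c
#include <assert.h>

enum color { RED, GREEN, BLUE };

/* A switch that handles every enumerator, like the ones in the
   switch-enum tests.  Per the patch description, with -fshort-enums
   the switch index may be a cast of C to a wider integer type.  */
static int
classify (enum color c)
{
  switch (c)
    {
    case RED:
      return 0;
    case GREEN:
      return 1;
    case BLUE:
      return 2;
    }
  return -1; /* Only reachable for out-of-range values.  */
}

/* 1 byte with -fshort-enums; int-sized on typical targets without it.  */
static unsigned
color_size (void)
{
  return (unsigned) sizeof (enum color);
}
```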
>
>
> for  gcc/analyzer/ChangeLog
>
>  * region-model.cc (has_nondefault_cases_for_all_enum_values_p): Take
>  enumerated type as a parameter.
>  (region_model::apply_constraints_for_gswitch): Cope with
>  integral promotion type casts.
>
> for  gcc/testsuite/ChangeLog
>
>  * gcc.dg/analyzer/switch-short-enum-1.c: New.
>  * gcc.dg/analyzer/switch-no-short-enum-1.c: New.
> ---
>   gcc/analyzer/region-model.cc   |   27 +++-
>   .../gcc.dg/analyzer/switch-no-short-enum-1.c   |  141
>   .../gcc.dg/analyzer/switch-short-enum-1.c  |  140
>   3 files changed, 304 insertions(+), 4 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/switch-no-short-enum-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/analyzer/switch-short-enum-1.c
>
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
> index 2157ad2578b85..6a7a8bc9f4884 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -5387,10 +5387,10 @@ has_nondefault_case_for_value_p (const gswitch *switch_stmt, tree int_cst)
>  has nondefault cases handling all values in the enum.  */
>   static bool
> -has_nondefault_cases_for_all_enum_values_p (const gswitch *switch_stmt)
> +has_nondefault_cases_for_all_enum_values_p (const gswitch *switch_stmt,
> +   tree type)
>   {
> gcc_assert (switch_stmt);
> -  tree type = TREE_TYPE (gimple_switch_index (switch_stmt));
> gcc_assert (TREE_CODE (type) == ENUMERAL_TYPE);
> for (tree enum_val_iter = TYPE_VALUES (type);
> @@ -5426,6 +5426,23 @@ apply_constraints_for_gswitch (const switch_cfg_superedge ,
>   {
> tree index  = gimple_switch_index (switch_stmt);
> const svalue *index_sval = get_rvalue (index, ctxt);
> +  bool check_index_type = true;
> +
> +  /* With -fshort-enum, there may be a type cast.  */
> +  if (ctxt && index_sval->get_kind () == SK_UNARYOP
> +  && TREE_CODE (index_sval->get_type ()) == INTEGER_TYPE)
> +{
> +  const unaryop_svalue *unaryop = as_a 
> (index_sval);
> +  if (unaryop->get_op () == NOP_EXPR
> + && is_a  (unaryop->get_arg ()))
> +   if (const initial_svalue *initvalop = (as_a  initial_svalue *>
> +  (unaryop->get_arg
> (
> + if (TREE_CODE (initvalop->get_type ()) == ENUMERAL_TYPE)
> +   {
> + 

Re: [PATCH] amdgcn: Add gfx1036 target

2024-03-25 Thread Andrew Stubbs

On 25/03/2024 11:27, Richard Biener wrote:

Add support for the gfx1036 RDNA2 APU integrated graphics devices.  The ROCm
documentation warns that these may not be supported, but it seems to work
at least partially.

x86 host bootstrap/regtest running, target-libgomp testing for the
offload produces results comparable to those of gfx1030.  The nice
thing is that gfx1036 is inside every Zen4 desktop CPU (Ryzen 7xxx)
and testing on that doesn't interfere with a separate GPU used for
your desktop (where I experienced crashes when using the GPU for both
offload and graphics).

I'll note that while gfx1030 works with llvm14, gfx1036 needs llvm15
as the minimum version for the assembler.

OK for trunk?


OK.



I'll follow up with the libgomp testing test summary for archival
purposes.  I still see linker errors for testcases using -g
(the ld: error: incompatible mach:
/tmp/ccr0oDpD.mkoffload.dbg.o kind)


This is caused by --with-arch=gfx1036 not being picked up by 
mkoffload. It works fine if you use the default configuration or specify 
-march explicitly. Either way, the bug is not in your patch.


For now, please test like this:

   RUNTESTFLAGS=--target_board=unix/-foffload=-march=gfx1036

Andrew


Thanks,
Richard.

gcc/ChangeLog:

* config.gcc (amdgcn): Add gfx1036 entries.
* config/gcn/gcn-hsa.h (NO_XNACK): Likewise.
(gcn_local_sym_hash): Likewise.
* config/gcn/gcn-opts.h (enum processor_type): Likewise.
(TARGET_GFX1036): New macro.
* config/gcn/gcn.cc (gcn_option_override): Handle gfx1036.
(gcn_omp_device_kind_arch_isa): Likewise.
(output_file_start): Likewise.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Add __gfx1036__.
(TARGET_CPU_CPP_BUILTINS): Rename __gfx1030 to __gfx1030__.
* config/gcn/gcn.opt: Add gfx1036.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1036): New.
(main): Handle gfx1036.
* config/gcn/t-omp-device: Add gfx1036 isa.
* doc/install.texi (amdgcn): Add gfx1036.
* doc/invoke.texi (-march): Likewise.

libgomp/ChangeLog:

* plugin/plugin-gcn.c (EF_AMDGPU_MACH): GFX1036.
(gcn_gfx1103_s): New.
(isa_hsa_name): Handle gfx1036.
(isa_code): Likewise.
(max_isa_vgprs): Likewise.
---
  gcc/config.gcc  |  4 ++--
  gcc/config/gcn/gcn-hsa.h|  6 +++---
  gcc/config/gcn/gcn-opts.h   |  2 ++
  gcc/config/gcn/gcn.cc   | 10 ++
  gcc/config/gcn/gcn.h|  4 +++-
  gcc/config/gcn/gcn.opt  |  3 +++
  gcc/config/gcn/mkoffload.cc |  5 +
  gcc/config/gcn/t-omp-device |  2 +-
  gcc/doc/install.texi|  3 ++-
  gcc/doc/invoke.texi |  3 +++
  libgomp/plugin/plugin-gcn.c |  8 
  11 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 87a5c92b6e3..17873ac2103 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4560,7 +4560,7 @@ case "${target}" in
for which in arch tune; do
eval "val=\$with_$which"
case ${val} in
-   "" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx1030 | gfx1100 | gfx1103)
+   "" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx1030 | gfx1036 | gfx1100 | gfx1103)
# OK
;;
*)
@@ -4576,7 +4576,7 @@ case "${target}" in
TM_MULTILIB_CONFIG=
;;
xdefault | xyes)
-   TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100,gfx1103" | sed "s/${with_arch},\?//;s/,$//"`
+   TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1036,gfx1100,gfx1103" | sed "s/${with_arch},\?//;s/,$//"`
;;
*)
TM_MULTILIB_CONFIG="${with_multilib_list}"
diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index ac32b8a328f..7d6e3141cea 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -90,7 +90,7 @@ extern unsigned int gcn_local_sym_hash (const char *name);
 the ELF flags (e_flags) of that generated file must be identical to those
 generated by the compiler.  */
  
-#define NO_XNACK "march=fiji:;march=gfx1030:;march=gfx1100:;march=gfx1103:;" \
+#define NO_XNACK "march=fiji:;march=gfx1030:;march=gfx1036:;march=gfx1100:;march=gfx1103:;" \
  /* These match the defaults set in gcn.cc.  */ \
  "!mxnack*|mxnack=default:%{march=gfx900|march=gfx906|march=gfx908:-mattr=-xnack};"
  #define NO_SRAM_ECC "!march=*:;march=fiji:;march=gfx900:;march=gfx906:;"
@@ -106,8 +106,8 @@ extern unsigned int gcn_local_sym_hash (const char *name);
  "%{" ABI_VERSION_SPEC "} " \
  "%{" NO_XNACK XNACKOPT "} " \
  "%{" NO_SRAM_ECC SRAMOPT "} " \
- 

Re: [PATCH] amdgcn: Add gfx1036 target

2024-03-25 Thread Richard Biener
On Mon, 25 Mar 2024, Tobias Burnus wrote:

> Richard Biener wrote:
> > I'll follow up with the libgomp testing test summary for archival
> > purposes.  I still see linker errors for testcases using -g
> > (the ld: error: incompatible mach:
> > /tmp/ccr0oDpD.mkoffload.dbg.o kind)
> 
> Hmm, odd. Can you try compiling with -save-temps and look at the relevant files
> with, e.g., readelf -h on the GCN files (e.g. 'readelf -h
> *.xamdgcn-amdhsa.mkoffload.*o')? That should show under "Flags" what the
> program was compiled for.
> 
> We did encounter this issue with LLVM 18 and the solution was to explicitly set
> the version both in the compiler via gcc/config/gcn/gcn-hsa.h's
> 
> #define ABI_VERSION_SPEC "march=fiji:--amdhsa-code-object-version=3;" \
>  "!march=*|march=*:--amdhsa-code-object-version=4"
> 
> and for the debugging data in mkoffload.cc's
> 
>   ehdr.e_ident[8] = (elf_arch == EF_AMDGPU_MACH_AMDGCN_GFX803
>  ? ELFABIVERSION_AMDGPU_HSA_V3
>  : ELFABIVERSION_AMDGPU_HSA_V4);
> 
> But I fail to see why this doesn't work for you - you should get V4 for your
> gfx1036 target.
> 
> Here, ELFABIVERSION_AMDGPU_HSA_V4 is 2 (V1 did not have a number and V2 started
> with 0, hence V3 = 1 etc.)

So, just for the record: the problem was that --with-arch=gfx1036 was not
passed through to mkoffload; using an explicit -foffload-options=-march=gfx1036
fixes that problem.

> What LLVM version did you use for the assembler (llvm-mc)?

I've used llvm15 but lld from llvm14 (don't ask ...)

Richard.


Re: [PATCH] amdgcn: Add gfx1036 target

2024-03-25 Thread Tobias Burnus

Richard Biener wrote:

I'll follow up with the libgomp testing test summary for archival
purposes.  I still see linker errors for testcases using -g
(the ld: error: incompatible mach:
/tmp/ccr0oDpD.mkoffload.dbg.o kind)


Hmm, odd. Can you try compiling with -save-temps and look at the relevant 
files with, e.g., readelf -h on the GCN files (e.g. 'readelf -h 
*.xamdgcn-amdhsa.mkoffload.*o')? That should show under "Flags" what 
the program was compiled for.


We did encounter this issue with LLVM 18 and the solution was to explicitly 
set the version both in the compiler via gcc/config/gcn/gcn-hsa.h's


#define ABI_VERSION_SPEC "march=fiji:--amdhsa-code-object-version=3;" \
 "!march=*|march=*:--amdhsa-code-object-version=4"

and for the debugging data in mkoffload.cc's

  ehdr.e_ident[8] = (elf_arch == EF_AMDGPU_MACH_AMDGCN_GFX803
 ? ELFABIVERSION_AMDGPU_HSA_V3
 : ELFABIVERSION_AMDGPU_HSA_V4);

But I fail to see why this doesn't work for you - you should get V4 for 
your gfx1036 target.


Here, ELFABIVERSION_AMDGPU_HSA_V4 is 2 (V1 did not have a number and V2 
started with 0, hence V3 = 1 etc.)


What LLVM version did you use for the assembler (llvm-mc)?

Tobias


Re: Backport PR91838 and PR110838

2024-03-25 Thread Richard Biener
On Mon, 25 Mar 2024, Andre Vieira (lists) wrote:

> Hi,
> 
> After the backport of PR target/112787, a failure was reported against x86_64;
> it would be fixed by backporting:
> * tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C
> (d1c072a1c3411a6fe29900750b38210af8451eeb)
> * tree-optimization/110838 - less aggressively fold out-of-bound shifts
> (04aa0edcace22a7815cfc57575f1f7b1f166ac10)
> 
> Patches apply cleanly, just one minor git context conflict with includes.
> 
> Bootstrapped and regression tested on aarch64-unknown-linux-gnu and
> x86_64-pc-linux-gnu for gcc-12 and gcc-13 branches.
> 
> OK to backport?

OK.

Thanks,
Richard.


Backport PR91838 and PR110838

2024-03-25 Thread Andre Vieira (lists)

Hi,

After the backport of PR target/112787, a failure was reported against 
x86_64; it would be fixed by backporting:
* tree-optimization/91838 - fix FAIL of g++.dg/opt/pr91838.C 
(d1c072a1c3411a6fe29900750b38210af8451eeb)
* tree-optimization/110838 - less aggressively fold out-of-bound shifts 
(04aa0edcace22a7815cfc57575f1f7b1f166ac10)


Patches apply cleanly, just one minor git context conflict with includes.

Bootstrapped and regression tested on aarch64-unknown-linux-gnu and 
x86_64-pc-linux-gnu for gcc-12 and gcc-13 branches.


OK to backport?

Kind regards,
Andre


Re: GCN: Enable effective-target 'vect_long_mult'

2024-03-25 Thread Andrew Stubbs

On 21/03/2024 10:41, Thomas Schwinge wrote:

Hi!

OK to push the attached "GCN: Enable effective-target 'vect_long_mult'"?
(Or is that not what you'd expect to see for GCN?  I haven't checked the
actual back end code...)


OK.

Andrew



Re: GCN: Enable effective-target 'vect_hw_misalign'

2024-03-25 Thread Andrew Stubbs

On 21/03/2024 10:41, Thomas Schwinge wrote:

Hi!

OK to push the attached
"GCN: Enable effective-target 'vect_hw_misalign'"?  (Or is that not what
you'd expect to see for GCN?  I haven't checked the actual back end
code...)


OK.

Andrew.


Re: [PATCH] sanitizer: [PR110027] Align asan_vec[0] to MAX (alignb, ASAN_RED_ZONE_SIZE)

2024-03-25 Thread Jakub Jelinek
On Tue, Mar 12, 2024 at 07:57:59PM +0800, liuhongt wrote:
> If alignb > ASAN_RED_ZONE_SIZE and offset[0] is not a multiple of
> alignb, (base_align_bias - base_offset) may not be aligned to alignb,
> which caused a segmentation fault.
> 
> Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> Ok for trunk and backport to GCC13?
> 
> gcc/ChangeLog:
> 
>   PR sanitizer/110027
>   * cfgexpand.cc (expand_stack_vars): Align frame offset to
>   MAX (alignb, ASAN_RED_ZONE_SIZE).
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/asan/pr110027.C: New test.
> ---
>  gcc/cfgexpand.cc |  2 +-
>  gcc/testsuite/g++.dg/asan/pr110027.C | 20 
>  2 files changed, 21 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/asan/pr110027.C
> 
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index 0de299c62e3..92062378d8e 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -1214,7 +1214,7 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
>   {
> if (data->asan_vec.is_empty ())
>   {
> -   align_frame_offset (ASAN_RED_ZONE_SIZE);
> +   align_frame_offset (MAX (alignb, ASAN_RED_ZONE_SIZE));
> prev_offset = frame_offset.to_constant ();
>   }
> prev_offset = align_base (prev_offset,

This doesn't look correct to me.
The above is done just once for the first var partition.  And
var partitions are sorted by stack_var_cmp, which puts > 
MAX_SUPPORTED_STACK_ALIGNMENT
alignment vars first (that should be none on x86, the above is quite huge
alignment), then on size decreasing and only after that on alignment
decreasing.

So, try to add some other variable with larger size and smaller alignment
to the frame (and make sure it isn't optimized away).

alignb above is the alignment of the first partition's var, if
align_frame_offset really needs to depend on the var alignment, it probably
should be the maximum alignment of all the vars with alignment
alignb * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT

> diff --git a/gcc/testsuite/g++.dg/asan/pr110027.C b/gcc/testsuite/g++.dg/asan/pr110027.C
> new file mode 100644
> index 000..0067781bc89
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/asan/pr110027.C
> @@ -0,0 +1,20 @@
> +/* PR sanitizer/110027 */
> +/* { dg-do run } */
> +/* { dg-require-effective-target avx512f_runtime } */
> +/* { dg-options "-std=gnu++23 -mavx512f -fsanitize=address -O0 -g -fstack-protector-strong" } */
> +
> +#include 
> +#include 
> +
> +template 
> +using Vec [[gnu::vector_size(W * sizeof(T))]] = T;
> +
> +auto foo() {
> +  Vec<8, int64_t> ret{};
> +  return ret;
> +}
> +
> +int main() {
> +  foo();
> +  return 0;
> +}
> -- 
> 2.31.1

Jakub



Re: [PATCH v2] c++: direct-init of an array of class type [PR59465]

2024-03-25 Thread Stephan Bergmann

On 3/25/24 13:07, Jakub Jelinek wrote:

On Mon, Mar 25, 2024 at 12:36:46PM +0100, Stephan Bergmann wrote:

This started to break


$ cat test.cc
struct S1 { S1(); };
struct S2 {
 S2() {}
 S1 a[1] {};
};



$ g++ -fsyntax-only test.cc
test.cc: In constructor ‘S2::S2()’:
test.cc:3:10: error: invalid initializer for array member ‘S1 S2::a [1]’
 3 | S2() {}
   |  ^


https://gcc.gnu.org/PR114439 ?


yes, sorry, missed that already-existing bugtracker issue



[committed] libstdc++: Fix incorrect macro used in #undef in test

2024-03-25 Thread Jonathan Wakely
Tested aarch64-linux. Pushed to trunk.

-- >8 --

This was a copy & paste error.

libstdc++-v3/ChangeLog:

* testsuite/std/text_encoding/requirements.cc: #undef the
correct macro.
---
 libstdc++-v3/testsuite/std/text_encoding/requirements.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/std/text_encoding/requirements.cc b/libstdc++-v3/testsuite/std/text_encoding/requirements.cc
index a1d5d6baee1..6cd71b68225 100644
--- a/libstdc++-v3/testsuite/std/text_encoding/requirements.cc
+++ b/libstdc++-v3/testsuite/std/text_encoding/requirements.cc
@@ -8,7 +8,7 @@
 # error "Feature-test macro for text_encoding has wrong value in 
"
 #endif
 
-#undef __cpp_lib_expected
+#undef __cpp_lib_text_encoding
 #include 
 #ifndef __cpp_lib_text_encoding
 # error "Feature-test macro for text_encoding missing in "
-- 
2.44.0



RE: [PATCH v1] RISC-V: Allow RVV intrinsic when function target("arch=+v")

2024-03-25 Thread Li, Pan2
Committed, thanks kito.

Pan

-Original Message-
From: Kito Cheng  
Sent: Monday, March 25, 2024 8:04 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH v1] RISC-V: Allow RVV intrinsic when function target("arch=+v")

LGTM, thanks :)

On Mon, Mar 25, 2024 at 3:42 PM  wrote:
>
> From: Pan Li 
>
> This patch allows the RVV intrinsics in functions that carry the
> target("arch=+v") attribute when building with rv64gc.  For example:
>
> vint32m1_t
> __attribute__((target("arch=+v")))
> test_1 (vint32m1_t a, vint32m1_t b, size_t vl)
> {
>   return __riscv_vadd_vv_i32m1 (a, b, vl);
> }
>
> Building with -march=rv64gc -mabi=lp64d -O3 produces asm like the following:
> test_1:
>   .option push
>   .option arch, rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_\
> zifencei2p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0
>   vsetvli zero,a0,e32,m1,ta,ma
>   vadd.vv v8,v8,v9
>   ret
>
> riscv_vector.h must be included in order to use the intrinsic types and
> APIs, and the scope of the attribute must not exceed the function body.
> Meanwhile, to make the RVV types and APIs available under this attribute,
> including riscv_vector.h no longer reports an error when V is not present
> in -march.
>
> The following tests pass with this patch:
> * The full riscv regression test suite.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-c.cc (riscv_pragma_intrinsic): Remove error
> when V is disabled and init the RVV types and intrinsic APIs.
> * config/riscv/riscv-vector-builtins.cc (expand_builtin): Report
> error if V ext is disabled.
> * config/riscv/riscv.cc (riscv_return_value_is_vector_type_p):
> Ditto.
> (riscv_arguments_is_vector_type_p): Ditto.
> (riscv_vector_cc_function_p): Ditto.
> * config/riscv/riscv_vector.h: Remove error if V is disabled.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pragma-1.c: Remove.
> * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-1.c: 
> New test.
> * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-2.c: 
> New test.
> * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-3.c: 
> New test.
> * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-4.c: 
> New test.
> * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-5.c: 
> New test.
> * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-6.c: 
> New test.
> * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-7.c: 
> New test.
> * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-8.c: 
> New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv-c.cc   | 18 +++
>  gcc/config/riscv/riscv-vector-builtins.cc |  5 
>  gcc/config/riscv/riscv.cc | 30 ---
>  gcc/config/riscv/riscv_vector.h   |  4 ---
>  .../gcc.target/riscv/rvv/base/pragma-1.c  |  4 ---
>  .../target_attribute_v_with_intrinsic-1.c |  5 
>  .../target_attribute_v_with_intrinsic-2.c | 18 +++
>  .../target_attribute_v_with_intrinsic-3.c | 13 
>  .../target_attribute_v_with_intrinsic-4.c | 10 +++
>  .../target_attribute_v_with_intrinsic-5.c | 12 
>  .../target_attribute_v_with_intrinsic-6.c | 12 
>  .../target_attribute_v_with_intrinsic-7.c |  9 ++
>  .../target_attribute_v_with_intrinsic-8.c | 23 ++
>  13 files changed, 145 insertions(+), 18 deletions(-)
>  delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pragma-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-8.c
>
> diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
> index edb866d51e4..01314037461 100644
> --- a/gcc/config/riscv/riscv-c.cc
> +++ b/gcc/config/riscv/riscv-c.cc
> @@ -201,14 +201,20 @@ riscv_pragma_intrinsic (cpp_reader *)
>if (strcmp (name, "vector") == 0
>|| strcmp (name, "xtheadvector") == 0)
>  {
> -  if (!TARGET_VECTOR)
> +  if (TARGET_VECTOR)
> +   riscv_vector::handle_pragma_vector ();
> +  else /* 

Re: [PATCH v2] c++: direct-init of an array of class type [PR59465]

2024-03-25 Thread Jakub Jelinek
On Mon, Mar 25, 2024 at 12:36:46PM +0100, Stephan Bergmann wrote:
> On 3/21/24 10:28 PM, Jason Merrill wrote:
> > On 3/21/24 16:48, Marek Polacek wrote:
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > OK.
> 
> This started to break
> 
> > $ cat test.cc
> > struct S1 { S1(); };
> > struct S2 {
> > S2() {}
> > S1 a[1] {};
> > };
> 
> > $ g++ -fsyntax-only test.cc
> > test.cc: In constructor ‘S2::S2()’:
> > test.cc:3:10: error: invalid initializer for array member ‘S1 S2::a [1]’
> > 3 | S2() {}
> >   |  ^

https://gcc.gnu.org/PR114439 ?

Jakub



Re: [PATCH v1] RISC-V: Allow RVV intrinsic when function target("arch=+v")

2024-03-25 Thread Kito Cheng
LGTM, thanks :)

On Mon, Mar 25, 2024 at 3:42 PM  wrote:
>
> From: Pan Li 
>
> This patch allows the RVV intrinsics in functions that carry the
> target("arch=+v") attribute when building with rv64gc.  For example:
>
> vint32m1_t
> __attribute__((target("arch=+v")))
> test_1 (vint32m1_t a, vint32m1_t b, size_t vl)
> {
>   return __riscv_vadd_vv_i32m1 (a, b, vl);
> }
>
> Building with -march=rv64gc -mabi=lp64d -O3 produces asm like the following:
> test_1:
>   .option push
>   .option arch, rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_\
> zifencei2p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0
>   vsetvli zero,a0,e32,m1,ta,ma
>   vadd.vv v8,v8,v9
>   ret
>
> riscv_vector.h must be included in order to use the intrinsic types and
> APIs, and the scope of the attribute must not exceed the function body.
> Meanwhile, to make the RVV types and APIs available under this attribute,
> including riscv_vector.h no longer reports an error when V is not present
> in -march.
>
> The following tests pass with this patch:
> * The full riscv regression test suite.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-c.cc (riscv_pragma_intrinsic): Remove error
> when V is disabled and init the RVV types and intrinsic APIs.
> * config/riscv/riscv-vector-builtins.cc (expand_builtin): Report
> error if V ext is disabled.
> * config/riscv/riscv.cc (riscv_return_value_is_vector_type_p):
> Ditto.
> (riscv_arguments_is_vector_type_p): Ditto.
> (riscv_vector_cc_function_p): Ditto.
> * config/riscv/riscv_vector.h: Remove error if V is disabled.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pragma-1.c: Remove.
> * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-1.c: 
> New test.
> * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-2.c: 
> New test.
> * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-3.c: 
> New test.
> * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-4.c: 
> New test.
> * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-5.c: 
> New test.
> * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-6.c: 
> New test.
> * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-7.c: 
> New test.
> * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-8.c: 
> New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv-c.cc   | 18 +++
>  gcc/config/riscv/riscv-vector-builtins.cc |  5 
>  gcc/config/riscv/riscv.cc | 30 ---
>  gcc/config/riscv/riscv_vector.h   |  4 ---
>  .../gcc.target/riscv/rvv/base/pragma-1.c  |  4 ---
>  .../target_attribute_v_with_intrinsic-1.c |  5 
>  .../target_attribute_v_with_intrinsic-2.c | 18 +++
>  .../target_attribute_v_with_intrinsic-3.c | 13 
>  .../target_attribute_v_with_intrinsic-4.c | 10 +++
>  .../target_attribute_v_with_intrinsic-5.c | 12 
>  .../target_attribute_v_with_intrinsic-6.c | 12 
>  .../target_attribute_v_with_intrinsic-7.c |  9 ++
>  .../target_attribute_v_with_intrinsic-8.c | 23 ++
>  13 files changed, 145 insertions(+), 18 deletions(-)
>  delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pragma-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-8.c
>
> diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
> index edb866d51e4..01314037461 100644
> --- a/gcc/config/riscv/riscv-c.cc
> +++ b/gcc/config/riscv/riscv-c.cc
> @@ -201,14 +201,20 @@ riscv_pragma_intrinsic (cpp_reader *)
>if (strcmp (name, "vector") == 0
>|| strcmp (name, "xtheadvector") == 0)
>  {
> -  if (!TARGET_VECTOR)
> +  if (TARGET_VECTOR)
> +   riscv_vector::handle_pragma_vector ();
> +  else /* Indicates riscv_vector.h is included but v is missing in arch  */
> {
> - error ("%<#pragma riscv intrinsic%> option %qs needs 'V' or "
> -"'XTHEADVECTOR' extension enabled",
> -name);
> - return;
> + /* To make the 

Re: [PATCH v2] c++: direct-init of an array of class type [PR59465]

2024-03-25 Thread Stephan Bergmann

On 3/21/24 10:28 PM, Jason Merrill wrote:

On 3/21/24 16:48, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


This started to break


$ cat test.cc
struct S1 { S1(); };
struct S2 {
S2() {}
S1 a[1] {};
};



$ g++ -fsyntax-only test.cc
test.cc: In constructor ‘S2::S2()’:
test.cc:3:10: error: invalid initializer for array member ‘S1 S2::a [1]’
3 | S2() {}
  |  ^





[wwwdocs] [committed] Add Tokyo 2024 papers.

2024-03-25 Thread Jakub Jelinek
Hi!

I've committed the following patch to add the new CWG papers (and filed
corresponding bugzilla bugs).

diff --git a/htdocs/projects/cxx-status.html b/htdocs/projects/cxx-status.html
index 65030980..bfef2114 100644
--- a/htdocs/projects/cxx-status.html
+++ b/htdocs/projects/cxx-status.html
@@ -135,6 +135,55 @@
https://gcc.gnu.org/PR113800;>No

 
+
+
+   Disallow binding a returned reference to a temporary 
+   https://wg21.link/P2748R5;>P2748R5
+   https://gcc.gnu.org/PR114455;>No
+   
+
+
+   Attributes for structured bindings 
+  https://wg21.link/P0609R3;>P0609R3
+   https://gcc.gnu.org/PR114456;>No
+   __cpp_structured_bindings >= 202403L 
+
+
+   Erroneous behavior for uninitialized reads 
+   https://wg21.link/P2795R5;>P2795R5
+   https://gcc.gnu.org/PR114457;>No
+   
+
+
+   = delete("reason"); 
+   https://wg21.link/P2573R2;>P2573R2
+   https://gcc.gnu.org/PR114458;>No
+   __cpp_deleted_function >= 202403L 
+
+
+   Variadic friends 
+   https://wg21.link/P2893R3;>P2893R3
+   https://gcc.gnu.org/PR114459;>No
+   __cpp_variadic_friend >= 202403L 
+
+
+   Clarifying rules for brace elision in aggregate initialization 
+   https://wg21.link/P3106R1;>P3106R1 (DR) 
+   https://gcc.gnu.org/PR114460;>No
+   
+
+
+   Disallow module declarations to be macros 
+   https://wg21.link/P3034R1;>P3034R1 (DR) 
+   https://gcc.gnu.org/PR114461;>No
+   
+
+
+   Trivial infinite loops are not undefined behavior 
+  https://wg21.link/P2809R3;>P2809R3 (DR) 
+   https://gcc.gnu.org/PR114462;>No
+   
+
 

[PATCH] amdgcn: Add gfx1036 target

2024-03-25 Thread Richard Biener
Add support for the gfx1036 RDNA2 APU integrated graphics devices.  The ROCm
documentation warns that these may not be supported, but it seems to work
at least partially.

x86 host bootstrap/regtest running, target-libgomp testing for the
offload produces results comparable to those of gfx1030.  The nice
thing is that gfx1036 is inside every Zen4 desktop CPU (Ryzen 7xxx)
and testing on that doesn't interfere with a separate GPU used for
your desktop (where I experienced crashes when using the GPU for both
offload and graphics).

I'll note that while gfx1030 works with llvm14, gfx1036 needs llvm15
as the minimum version for the assembler.

OK for trunk?

I'll follow up with the libgomp testing test summary for archival
purposes.  I still see linker errors for testcases using -g
(the ld: error: incompatible mach:
/tmp/ccr0oDpD.mkoffload.dbg.o kind)

Thanks,
Richard.

gcc/ChangeLog:

* config.gcc (amdgcn): Add gfx1036 entries.
* config/gcn/gcn-hsa.h (NO_XNACK): Likewise.
(gcn_local_sym_hash): Likewise.
* config/gcn/gcn-opts.h (enum processor_type): Likewise.
(TARGET_GFX1036): New macro.
* config/gcn/gcn.cc (gcn_option_override): Handle gfx1036.
(gcn_omp_device_kind_arch_isa): Likewise.
(output_file_start): Likewise.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Add __gfx1036__.
(TARGET_CPU_CPP_BUILTINS): Rename __gfx1030 to __gfx1030__.
* config/gcn/gcn.opt: Add gfx1036.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1036): New.
(main): Handle gfx1036.
* config/gcn/t-omp-device: Add gfx1036 isa.
* doc/install.texi (amdgcn): Add gfx1036.
* doc/invoke.texi (-march): Likewise.

libgomp/ChangeLog:

* plugin/plugin-gcn.c (EF_AMDGPU_MACH): GFX1036.
(gcn_gfx1103_s): New.
(isa_hsa_name): Handle gfx1036.
(isa_code): Likewise.
(max_isa_vgprs): Likewise.
---
 gcc/config.gcc  |  4 ++--
 gcc/config/gcn/gcn-hsa.h|  6 +++---
 gcc/config/gcn/gcn-opts.h   |  2 ++
 gcc/config/gcn/gcn.cc   | 10 ++
 gcc/config/gcn/gcn.h|  4 +++-
 gcc/config/gcn/gcn.opt  |  3 +++
 gcc/config/gcn/mkoffload.cc |  5 +
 gcc/config/gcn/t-omp-device |  2 +-
 gcc/doc/install.texi|  3 ++-
 gcc/doc/invoke.texi |  3 +++
 libgomp/plugin/plugin-gcn.c |  8 
 11 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 87a5c92b6e3..17873ac2103 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4560,7 +4560,7 @@ case "${target}" in
for which in arch tune; do
eval "val=\$with_$which"
case ${val} in
-   "" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx1030 | gfx1100 | gfx1103)
+   "" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx1030 | gfx1036 | gfx1100 | gfx1103)
# OK
;;
*)
@@ -4576,7 +4576,7 @@ case "${target}" in
TM_MULTILIB_CONFIG=
;;
xdefault | xyes)
-   TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1100,gfx1103" | sed "s/${with_arch},\?//;s/,$//"`
+   TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1036,gfx1100,gfx1103" | sed "s/${with_arch},\?//;s/,$//"`
;;
*)
TM_MULTILIB_CONFIG="${with_multilib_list}"
diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index ac32b8a328f..7d6e3141cea 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -90,7 +90,7 @@ extern unsigned int gcn_local_sym_hash (const char *name);
the ELF flags (e_flags) of that generated file must be identical to those
generated by the compiler.  */
 
-#define NO_XNACK "march=fiji:;march=gfx1030:;march=gfx1100:;march=gfx1103:;" \
+#define NO_XNACK "march=fiji:;march=gfx1030:;march=gfx1036:;march=gfx1100:;march=gfx1103:;" \
 /* These match the defaults set in gcn.cc.  */ \
 "!mxnack*|mxnack=default:%{march=gfx900|march=gfx906|march=gfx908:-mattr=-xnack};"
 #define NO_SRAM_ECC "!march=*:;march=fiji:;march=gfx900:;march=gfx906:;"
@@ -106,8 +106,8 @@ extern unsigned int gcn_local_sym_hash (const char *name);
  "%{" ABI_VERSION_SPEC "} " \
  "%{" NO_XNACK XNACKOPT "} " \
  "%{" NO_SRAM_ECC SRAMOPT "} " \
- "%{march=gfx1030|march=gfx1100|march=gfx1103:-mattr=+wavefrontsize64} " \
- "%{march=gfx1030|march=gfx1100|march=gfx1103:-mattr=+cumode} " \
+ "%{march=gfx1030|march=gfx1036|march=gfx1100|march=gfx1103:-mattr=+wavefrontsize64} " \
+ "%{march=gfx1030|march=gfx1036|march=gfx1100|march=gfx1103:-mattr=+cumode} " \
  "-filetype=obj"
 

RE: [PATCH] aarch64: Align lrcpc3 FEAT_STRING with /proc/cpuinfo 'Features' entry

2024-03-25 Thread Kyrylo Tkachov
Hi Victor,

> -Original Message-
> From: Victor Do Nascimento 
> Sent: Monday, March 25, 2024 10:59 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Sandiford
> ; Richard Earnshaw
> ; Victor Do Nascimento
> 
> Subject: [PATCH] aarch64: Align lrcpc3 FEAT_STRING with /proc/cpuinfo 'Features' entry
> 
> Due to the Linux kernel exposing the lrcpc3 architectural feature as
> "lrcpc3", this patch corrects the relevant FEATURE_STRING entry in the
> "rcpc3" AARCH64_OPT_FMV_EXTENSION macro, such that the feature can be
> correctly detected when doing native compilation on rcpc3-enabled
> targets.
> 
> Regtested on aarch64-linux-gnu.

Ok but...

> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-option-extensions.def: Fix 'lrcpc3'
>   entry.

This would usually be written as:
* config/aarch64/aarch64-option-extensions.def (rcpc3):
Fix FEATURE_STRING field to "lrcpc3".

Thanks,
Kyrill

> 
> gcc/testsuite/ChangeLog:
> 
>   * testsuite/gcc.target/aarch64/cpunative/info_24: New.
>   * testsuite/gcc.target/aarch64/cpunative/native_cpu_24.c:
>   Likewise.
> ---
>  gcc/config/aarch64/aarch64-option-extensions.def  |  2 +-
>  gcc/testsuite/gcc.target/aarch64/cpunative/info_24|  8 
>  .../gcc.target/aarch64/cpunative/native_cpu_24.c  | 11 +++
>  3 files changed, 20 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/info_24
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_24.c
> 
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
> index 1a3b91c68cf..975e7b84cec 100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -174,7 +174,7 @@ AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
> 
>  AARCH64_FMV_FEATURE("rcpc2", RCPC2, (RCPC))
> 
> -AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (), (), (), "rcpc3")
> +AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (), (), (), "lrcpc3")
> 
>  AARCH64_FMV_FEATURE("frintts", FRINTTS, ())
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_24 b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
> new file mode 100644
> index 000..8d3c16a1091
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
> @@ -0,0 +1,8 @@
> +processor: 0
> +BogoMIPS : 100.00
> +Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp lrcpc3
> +CPU implementer  : 0xfe
> +CPU architecture: 8
> +CPU variant  : 0x0
> +CPU part : 0xd08
> +CPU revision : 2
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_24.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_24.c
> new file mode 100644
> index 000..05dc870885f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_24.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target { { aarch64*-*-linux*} && native } } } */
> +/* { dg-set-compiler-env-var GCC_CPUINFO "$srcdir/gcc.target/aarch64/cpunative/info_23" } */
> +/* { dg-additional-options "-mcpu=native --save-temps " } */
> +
> +int main()
> +{
> +  return 0;
> +}
> +
> +/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto\+rcpc3} } } */
> +/* Test one where rcpc3 is available and so should be emitted.  */
> --
> 2.34.1



[PATCH] aarch64: Align lrcpc3 FEAT_STRING with /proc/cpuinfo 'Features' entry

2024-03-25 Thread Victor Do Nascimento
Due to the Linux kernel exposing the lrcpc3 architectural feature as
"lrcpc3", this patch corrects the relevant FEATURE_STRING entry in the
"rcpc3" AARCH64_OPT_FMV_EXTENSION macro, such that the feature can be
correctly detected when doing native compilation on rcpc3-enabled
targets.
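
To illustrate why the FEATURE_STRING has to match the kernel's spelling exactly, here is a minimal C sketch. It assumes native detection compares whole, space-separated tokens from the /proc/cpuinfo "Features" line; `feature_present` is a hypothetical helper, not the code in GCC's driver. With the old entry, searching for "rcpc3" in a line that only lists "lrcpc3" finds nothing, so the extension is never enabled.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not GCC's driver code): return true if FEATURE
   appears as a whole, space-separated token in a /proc/cpuinfo
   "Features" value such as "fp asimd lrcpc3".  */
static bool
feature_present (const char *features, const char *feature)
{
  size_t len = strlen (feature);
  const char *p = features;

  while (*p != '\0')
    {
      /* Skip the separators between tokens.  */
      while (*p == ' ' || *p == '\t')
        p++;
      /* Measure the current token and compare it as a whole word, so
         "rcpc3" does not match inside "lrcpc3".  */
      size_t tok_len = strcspn (p, " \t\n");
      if (tok_len == len && strncmp (p, feature, len) == 0)
        return true;
      p += tok_len;
      /* Step over a trailing newline so the loop always advances.  */
      if (*p == '\n')
        p++;
    }
  return false;
}
```

With the Features line from info_24, `feature_present (line, "rcpc3")` is false while `feature_present (line, "lrcpc3")` is true.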

Regtested on aarch64-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def: Fix 'lrcpc3'
entry.

gcc/testsuite/ChangeLog:

* testsuite/gcc.target/aarch64/cpunative/info_24: New.
* testsuite/gcc.target/aarch64/cpunative/native_cpu_24.c:
Likewise.
---
 gcc/config/aarch64/aarch64-option-extensions.def  |  2 +-
 gcc/testsuite/gcc.target/aarch64/cpunative/info_24|  8 
 .../gcc.target/aarch64/cpunative/native_cpu_24.c  | 11 +++
 3 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/info_24
 create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_24.c

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 1a3b91c68cf..975e7b84cec 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -174,7 +174,7 @@ AARCH64_OPT_FMV_EXTENSION("rcpc", RCPC, (), (), (), "lrcpc")
 
 AARCH64_FMV_FEATURE("rcpc2", RCPC2, (RCPC))
 
-AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (), (), (), "rcpc3")
+AARCH64_OPT_FMV_EXTENSION("rcpc3", RCPC3, (), (), (), "lrcpc3")
 
 AARCH64_FMV_FEATURE("frintts", FRINTTS, ())
 
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_24 b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
new file mode 100644
index 000..8d3c16a1091
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/info_24
@@ -0,0 +1,8 @@
+processor  : 0
+BogoMIPS   : 100.00
+Features   : fp asimd evtstrm aes pmull sha1 sha2 crc32 asimddp lrcpc3
+CPU implementer: 0xfe
+CPU architecture: 8
+CPU variant: 0x0
+CPU part   : 0xd08
+CPU revision   : 2
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_24.c b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_24.c
new file mode 100644
index 000..05dc870885f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_24.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { { aarch64*-*-linux*} && native } } } */
+/* { dg-set-compiler-env-var GCC_CPUINFO "$srcdir/gcc.target/aarch64/cpunative/info_23" } */
+/* { dg-additional-options "-mcpu=native --save-temps " } */
+
+int main()
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler {\.arch armv8-a\+dotprod\+crc\+crypto\+rcpc3} } } */
+/* Test one where rcpc3 is available and so should be emitted.  */
-- 
2.34.1



Re: [PATCH] libgcc: arm: fix build for FDPIC target

2024-03-25 Thread Christophe Lyon

Hi,

On 3/22/24 21:14, Max Filippov wrote:

libgcc/
* unwind-arm-common.inc (__gnu_personality_sigframe_fdpic): Cast
last argument of _Unwind_VRS_Set to void *.
---
  libgcc/unwind-arm-common.inc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/unwind-arm-common.inc b/libgcc/unwind-arm-common.inc
index 5453f38186b5..576f7e93e8a8 100644
--- a/libgcc/unwind-arm-common.inc
+++ b/libgcc/unwind-arm-common.inc
@@ -248,7 +248,7 @@ __gnu_personality_sigframe_fdpic (_Unwind_State state,
  + ARM_SIGCONTEXT_R0;
  /* Restore regs saved on stack by the kernel.  */
  for (i = 0; i < 16; i++)
-   _Unwind_VRS_Set (context, _UVRSC_CORE, i, _UVRSD_UINT32, sp + 4 * i);
+   _Unwind_VRS_Set (context, _UVRSC_CORE, i, _UVRSD_UINT32, (void *)(sp + 4 * i));


LGTM (but I'm not a maintainer).

Thanks,

Christophe

  
  return _URC_CONTINUE_UNWIND;

  }
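
The cast in this patch is needed because the last parameter of _Unwind_VRS_Set is a `void *`, while `sp + 4 * i` is integer arithmetic on an address held in an integer variable (the cast implies `sp` is integer-typed at this point). Passing an integer where a pointer is expected triggers -Wint-conversion, which GCC 14 made an error by default for C; that is presumably the build failure being fixed. A minimal C sketch of the same pattern, where `read_word` and `read_saved_reg` are hypothetical stand-ins rather than libgcc functions:

```c
#include <stdint.h>

/* Hypothetical stand-in for a function that, like _Unwind_VRS_Set,
   receives a pointer to the value it should use.  */
static uint32_t
read_word (void *valuep)
{
  return *(uint32_t *) valuep;
}

/* SP holds a stack address as an integer.  Passing SP + 4 * I straight
   to a void * parameter would be an implicit integer-to-pointer
   conversion; the explicit cast makes the conversion well-formed.  */
static uint32_t
read_saved_reg (uintptr_t sp, int i)
{
  return read_word ((void *) (sp + 4 * i));
}
```

For example, with `uint32_t regs[4] = {10, 20, 30, 40};`, `read_saved_reg ((uintptr_t) regs, 2)` returns 30.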


Re: [PATCH v2] doc: Correction of Tree SSA Passes info.

2024-03-25 Thread Richard Biener
On Mon, 25 Mar 2024, Chenghui Pan wrote:

> The current documentation of the Tree SSA passes contains many parts
> that have not been updated for many years.
> 
> This patch removes information that is outdated and no longer exists in
> the current GCC codebase, and fixes some incorrect source-file locations
> based on the current codebase and ChangeLogs.

OK

Thanks,
Richard.

> Changes since v1: Add correct info for pass_build_alias.
> 
> gcc/ChangeLog:
> 
>   * doc/passes.texi: Correction of Tree SSA Passes info.
> ---
>  gcc/doc/passes.texi | 75 +
>  1 file changed, 7 insertions(+), 68 deletions(-)
> 
> diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
> index b50d3d5635b..b13ad06c5a9 100644
> --- a/gcc/doc/passes.texi
> +++ b/gcc/doc/passes.texi
> @@ -450,17 +450,6 @@ The following briefly describes the Tree optimization passes that are
>  run after gimplification and what source files they are located in.
>  
>  @itemize @bullet
> -@item Remove useless statements
> -
> -This pass is an extremely simple sweep across the gimple code in which
> -we identify obviously dead code and remove it.  Here we do things like
> -simplify @code{if} statements with constant conditions, remove
> -exception handling constructs surrounding code that obviously cannot
> -throw, remove lexical bindings that contain no variables, and other
> -assorted simplistic cleanups.  The idea is to get rid of the obvious
> -stuff quickly rather than wait until later when it's more work to get
> -rid of it.  This pass is located in @file{tree-cfg.cc} and described by
> -@code{pass_remove_useless_stmts}.
>  
>  @item OpenMP lowering
>  
> @@ -478,7 +467,7 @@ described by @code{pass_lower_omp}.
>  
>  If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands
>  parallel regions into their own functions to be invoked by the thread
> -library.  The pass is located in @file{omp-low.cc} and is described by
> +library.  The pass is located in @file{omp-expand.cc} and is described by
>  @code{pass_expand_omp}.
>  
>  @item Lower control flow
> @@ -511,15 +500,6 @@ This pass decomposes a function into basic blocks and creates all of
>  the edges that connect them.  It is located in @file{tree-cfg.cc} and
>  is described by @code{pass_build_cfg}.
>  
> -@item Find all referenced variables
> -
> -This pass walks the entire function and collects an array of all
> -variables referenced in the function, @code{referenced_vars}.  The
> -index at which a variable is found in the array is used as a UID
> -for the variable within this function.  This data is needed by the
> -SSA rewriting routines.  The pass is located in @file{tree-dfa.cc}
> -and is described by @code{pass_referenced_vars}.
> -
>  @item Enter static single assignment form
>  
>  This pass rewrites the function such that it is in SSA form.  After
> @@ -562,15 +542,6 @@ variables that are used once into the expression that uses them and
>  seeing if the result can be simplified.  It is located in
>  @file{tree-ssa-forwprop.cc} and is described by @code{pass_forwprop}.
>  
> -@item Copy Renaming
> -
> -This pass attempts to change the name of compiler temporaries involved in
> -copy operations such that SSA->normal can coalesce the copy away.  When compiler
> -temporaries are copies of user variables, it also renames the compiler
> -temporary to the user variable resulting in better use of user symbols.  It is
> -located in @file{tree-ssa-copyrename.c} and is described by
> -@code{pass_copyrename}.
> -
>  @item PHI node optimizations
>  
>  This pass recognizes forms of PHI inputs that can be represented as
> @@ -581,12 +552,8 @@ It is located in @file{tree-ssa-phiopt.cc} and is described by
>  @item May-alias optimization
>  
>  This pass performs a flow sensitive SSA-based points-to analysis.
> -The resulting may-alias, must-alias, and escape analysis information
> -is used to promote variables from in-memory addressable objects to
> -non-aliased variables that can be renamed into SSA form.  We also
> -update the @code{VDEF}/@code{VUSE} memory tags for non-renameable
> -aggregates so that we get fewer false kills.  The pass is located
> -in @file{tree-ssa-alias.cc} and is described by @code{pass_may_alias}.
> +It is located in @file{tree-ssa-structalias.cc} and is described
> +by @code{pass_build_alias}.
>  
>  Interprocedural points-to information is located in
>  @file{tree-ssa-structalias.cc} and described by @code{pass_ipa_pta}.
> @@ -604,7 +571,7 @@ is described by @code{pass_ipa_tree_profile}.
>  This pass implements series of heuristics to guess propababilities
>  of branches.  The resulting predictions are turned into edge profile
>  by propagating branches across the control flow graphs.
> -The pass is located in @file{tree-profile.cc} and is described by
> +The pass is located in @file{predict.cc} and is described by
>  @code{pass_profile}.
>  
>  @item Lower complex arithmetic
> @@ -653,7 +620,7 @@ in 

[PATCH v2] doc: Correction of Tree SSA Passes info.

2024-03-25 Thread Chenghui Pan
The current documentation of the Tree SSA passes contains many parts
that have not been updated for many years.

This patch removes information that is outdated and no longer exists in
the current GCC codebase, and fixes some incorrect source-file locations
based on the current codebase and ChangeLogs.

Changes since v1: Add correct info for pass_build_alias.

gcc/ChangeLog:

* doc/passes.texi: Correction of Tree SSA Passes info.
---
 gcc/doc/passes.texi | 75 +
 1 file changed, 7 insertions(+), 68 deletions(-)

diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
index b50d3d5635b..b13ad06c5a9 100644
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@@ -450,17 +450,6 @@ The following briefly describes the Tree optimization passes that are
 run after gimplification and what source files they are located in.
 
 @itemize @bullet
-@item Remove useless statements
-
-This pass is an extremely simple sweep across the gimple code in which
-we identify obviously dead code and remove it.  Here we do things like
-simplify @code{if} statements with constant conditions, remove
-exception handling constructs surrounding code that obviously cannot
-throw, remove lexical bindings that contain no variables, and other
-assorted simplistic cleanups.  The idea is to get rid of the obvious
-stuff quickly rather than wait until later when it's more work to get
-rid of it.  This pass is located in @file{tree-cfg.cc} and described by
-@code{pass_remove_useless_stmts}.
 
 @item OpenMP lowering
 
@@ -478,7 +467,7 @@ described by @code{pass_lower_omp}.
 
 If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands
 parallel regions into their own functions to be invoked by the thread
-library.  The pass is located in @file{omp-low.cc} and is described by
+library.  The pass is located in @file{omp-expand.cc} and is described by
 @code{pass_expand_omp}.
 
 @item Lower control flow
@@ -511,15 +500,6 @@ This pass decomposes a function into basic blocks and creates all of
 the edges that connect them.  It is located in @file{tree-cfg.cc} and
 is described by @code{pass_build_cfg}.
 
-@item Find all referenced variables
-
-This pass walks the entire function and collects an array of all
-variables referenced in the function, @code{referenced_vars}.  The
-index at which a variable is found in the array is used as a UID
-for the variable within this function.  This data is needed by the
-SSA rewriting routines.  The pass is located in @file{tree-dfa.cc}
-and is described by @code{pass_referenced_vars}.
-
 @item Enter static single assignment form
 
 This pass rewrites the function such that it is in SSA form.  After
@@ -562,15 +542,6 @@ variables that are used once into the expression that uses them and
 seeing if the result can be simplified.  It is located in
 @file{tree-ssa-forwprop.cc} and is described by @code{pass_forwprop}.
 
-@item Copy Renaming
-
-This pass attempts to change the name of compiler temporaries involved in
-copy operations such that SSA->normal can coalesce the copy away.  When compiler
-temporaries are copies of user variables, it also renames the compiler
-temporary to the user variable resulting in better use of user symbols.  It is
-located in @file{tree-ssa-copyrename.c} and is described by
-@code{pass_copyrename}.
-
 @item PHI node optimizations
 
 This pass recognizes forms of PHI inputs that can be represented as
@@ -581,12 +552,8 @@ It is located in @file{tree-ssa-phiopt.cc} and is described by
 @item May-alias optimization
 
 This pass performs a flow sensitive SSA-based points-to analysis.
-The resulting may-alias, must-alias, and escape analysis information
-is used to promote variables from in-memory addressable objects to
-non-aliased variables that can be renamed into SSA form.  We also
-update the @code{VDEF}/@code{VUSE} memory tags for non-renameable
-aggregates so that we get fewer false kills.  The pass is located
-in @file{tree-ssa-alias.cc} and is described by @code{pass_may_alias}.
+It is located in @file{tree-ssa-structalias.cc} and is described
+by @code{pass_build_alias}.
 
 Interprocedural points-to information is located in
 @file{tree-ssa-structalias.cc} and described by @code{pass_ipa_pta}.
@@ -604,7 +571,7 @@ is described by @code{pass_ipa_tree_profile}.
 This pass implements series of heuristics to guess propababilities
 of branches.  The resulting predictions are turned into edge profile
 by propagating branches across the control flow graphs.
-The pass is located in @file{tree-profile.cc} and is described by
+The pass is located in @file{predict.cc} and is described by
 @code{pass_profile}.
 
 @item Lower complex arithmetic
@@ -653,7 +620,7 @@ in @file{tree-ssa-math-opts.cc} and is described by
 @item Full redundancy elimination
 
 This is a simpler form of PRE that only eliminates redundancies that
-occur on all paths.  It is located in @file{tree-ssa-pre.cc} and
+occur on all paths.  It is located in @file{tree-ssa-sccvn.cc} and
 

[PATCH v2] MIPS: Add MIN/MAX.fmt instructions support for MIPS R6

2024-03-25 Thread Jie Mei
This patch adds the smin/smax RTL patterns for the
min/max.fmt instructions.

Also, since the min/max.fmt instructions implement the
IEEE 754-2008 "minNum" and "maxNum" operations, this
patch also provides the new "fmin3" and
"fmax3" patterns.
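
The practical difference between the two pattern families is NaN handling. GCC's internals documentation leaves the result of smin/smax unspecified when an operand is a NaN, whereas IEEE 754-2008 minNum (the semantics of C's fmin) returns the non-NaN operand when exactly one input is a quiet NaN. A minimal C reference model, ignoring signed zeros and signalling NaNs; `naive_min` and `min_num` are hypothetical names used only for illustration, and the exact hardware behaviour of MIN.fmt should be checked against the MIPS R6 manual:

```c
#include <math.h>

/* Plain compare-and-select: the result depends on operand order when
   one input is a NaN, because every comparison with a NaN is false.  */
static double
naive_min (double a, double b)
{
  return a < b ? a : b;
}

/* Reference model of minNum: if one operand is a NaN, return the other
   operand (both NaN yields a NaN).  Signed zeros and signalling NaNs are
   not modelled here.  */
static double
min_num (double a, double b)
{
  if (isnan (a))
    return b;
  if (isnan (b))
    return a;
  return a < b ? a : b;
}
```

Here `naive_min (1.0, NAN)` returns NaN while `naive_min (NAN, 1.0)` returns 1.0; `min_num` returns 1.0 in both orders, which is why a separate pattern with minNum semantics is useful.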

gcc/ChangeLog:

* config/mips/i6400.md (i6400_fpu_minmax): New
define_insn_reservation.
* config/mips/mips.h (ISA_HAS_FMIN_FMAX): Define new macro.
* config/mips/mips.md (UNSPEC_FMIN): New unspec.
(UNSPEC_FMAX): Same as above.
(type): Add fminmax.
(smin3): Generates MIN.fmt instructions.
(smax3): Generates MAX.fmt instructions.
(fmin3): Generates MIN.fmt instructions.
(fmax3): Generates MAX.fmt instructions.
* config/mips/p6600.md (p6600_fpu_fabs): Include fminmax
type.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips-minmax.c: New test for MIPS R6.
---
 gcc/config/mips/i6400.md|  6 +++
 gcc/config/mips/mips.h  |  2 +
 gcc/config/mips/mips.md | 50 -
 gcc/config/mips/p6600.md|  2 +-
 gcc/testsuite/gcc.target/mips/mips-minmax.c | 40 +
 5 files changed, 97 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips-minmax.c

diff --git a/gcc/config/mips/i6400.md b/gcc/config/mips/i6400.md
index 9f216fe0210..d6f691ee217 100644
--- a/gcc/config/mips/i6400.md
+++ b/gcc/config/mips/i6400.md
@@ -219,6 +219,12 @@
(eq_attr "type" "fabs,fneg,fmove"))
   "i6400_fpu_short, i6400_fpu_apu")
 
+;; min, max
+(define_insn_reservation "i6400_fpu_minmax" 2
+  (and (eq_attr "cpu" "i6400")
+   (eq_attr "type" "fminmax"))
+  "i6400_fpu_short+i6400_fpu_logic")
+
 ;; fadd, fsub, fcvt
 (define_insn_reservation "i6400_fpu_fadd" 4
   (and (eq_attr "cpu" "i6400")
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 7145d23c650..5ce984ac99b 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1259,6 +1259,8 @@ struct mips_cpu_info {
 #define ISA_HAS_9BIT_DISPLACEMENT  (mips_isa_rev >= 6  \
 || ISA_HAS_MIPS16E2)
 
+#define ISA_HAS_FMIN_FMAX  (mips_isa_rev >= 6)
+
 /* ISA has data indexed prefetch instructions.  This controls use of
'prefx', along with TARGET_HARD_FLOAT and TARGET_DOUBLE_FLOAT.
(prefx is a cop1x instruction, so can only be used if FP is
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index b0fb5850a9e..26f758c90dd 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -97,6 +97,10 @@
   UNSPEC_GET_FCSR
   UNSPEC_SET_FCSR
 
+  ;; Floating-point unspecs.
+  UNSPEC_FMIN
+  UNSPEC_FMAX
+
   ;; HI/LO moves.
   UNSPEC_MFHI
   UNSPEC_MTHI
@@ -370,6 +374,7 @@
 ;; frsqrt   floating point reciprocal square root
 ;; frsqrt1  floating point reciprocal square root step1
 ;; frsqrt2  floating point reciprocal square root step2
+;; fminmax  floating point min/max
 ;; dspmac   DSP MAC instructions not saturating the accumulator
 ;; dspmacsatDSP MAC instructions that saturate the accumulator
 ;; accext   DSP accumulator extract instructions
@@ -387,8 +392,8 @@
prefetch,prefetchx,condmove,mtc,mfc,mthi,mtlo,mfhi,mflo,const,arith,logical,
shift,slt,signext,clz,pop,trap,imul,imul3,imul3nc,imadd,idiv,idiv3,move,
fmove,fadd,fmul,fmadd,fdiv,frdiv,frdiv1,frdiv2,fabs,fneg,fcmp,fcvt,fsqrt,
-   frsqrt,frsqrt1,frsqrt2,dspmac,dspmacsat,accext,accmod,dspalu,dspalusat,
-   multi,atomic,syncloop,nop,ghost,multimem,
+   frsqrt,frsqrt1,frsqrt2,fminmax,dspmac,dspmacsat,accext,accmod,dspalu,
+   dspalusat,multi,atomic,syncloop,nop,ghost,multimem,
simd_div,simd_fclass,simd_flog2,simd_fadd,simd_fcvt,simd_fmul,simd_fmadd,
simd_fdiv,simd_bitins,simd_bitmov,simd_insert,simd_sld,simd_mul,simd_fcmp,
simd_fexp2,simd_int_arith,simd_bit,simd_shift,simd_splat,simd_fill,
@@ -7971,6 +7976,47 @@
   [(set_attr "move_type" "load")
(set_attr "insn_count" "2")])
 
+;;
+;;  Float point MIN/MAX
+;;
+
+(define_insn "smin3"
+  [(set (match_operand:SCALARF 0 "register_operand" "=f")
+   (smin:SCALARF (match_operand:SCALARF 1 "register_operand" "f")
+ (match_operand:SCALARF 2 "register_operand" "f")))]
+  "ISA_HAS_FMIN_FMAX"
+  "min.\t%0,%1,%2"
+  [(set_attr "type" "fminmax")
+   (set_attr "mode" "")])
+
+(define_insn "smax3"
+  [(set (match_operand:SCALARF 0 "register_operand" "=f")
+   (smax:SCALARF (match_operand:SCALARF 1 "register_operand" "f")
+ (match_operand:SCALARF 2 "register_operand" "f")))]
+  "ISA_HAS_FMIN_FMAX"
+  "max.\t%0,%1,%2"
+  [(set_attr "type" "fminmax")
+  (set_attr "mode" "")])
+
+(define_insn "fmin3"
+  [(set (match_operand:SCALARF 0 "register_operand" "=f")
+   (unspec:SCALARF [(use (match_operand:SCALARF 1 "register_operand" "f"))
+(use (match_operand:SCALARF 2 "register_operand" "f"))]
+   

[PATCH] c++/modules: Fix instantiation of imported temploid friends [PR114275]

2024-03-25 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

I'm not 100% sure I've covered all places where this needs to be handled
but so far this passes all the testcases I have.

-- >8 --

This patch fixes a number of issues with the handling of temploid friend
declarations.

The primary issue is that instantiations of friend declarations should
attach the declaration to the same module as the befriending class, by
[module.unit] p7.1 and [temp.friend] p2; this could be a different
module from the current TU, and so needs special handling.

This patch only implements this for class declarations so far, as
function declarations don't seem to be causing any issues with this
currently, but this should probably be revisited for GCC 15.

The other main issue here is that we can't assume that a hidden template
class doesn't exist just because name lookup didn't find a definition
for it: it could be a non-exported entity that we've nevertheless
streamed in from an imported module.  We need to ensure that, when
instantiating friend classes, we return the same TYPE_DECL that we got
from our imports; otherwise 'duplicate_decls' will later (rightfully)
complain that they're different.

PR c++/105320
PR c++/114275

gcc/cp/ChangeLog:

* cp-tree.h (module_may_redeclare_friend_class): New.
(propagate_defining_module): New.
(lookup_imported_hidden_friend): New.
* module.cc (imported_temploid_friends): New.
(trees_out::decl_value): Write imported_temploid_friends.
(trees_in::decl_value): Read it.
(get_originating_module_decl): Handle instantiated friend
classes being attached to a different module.
(module_may_redeclare_friend_class): New.
(propagate_defining_module): New.
(init_modules): Initialize imported_temploid_friends.
* name-lookup.cc (lookup_imported_hidden_friend): New.
* pt.cc (tsubst_friend_class): Lookup imported hidden friends.
Error when redeclaring the type in the wrong module. Propagate
the defining module to the instantiated type.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-friend-10_a.C: New test.
* g++.dg/modules/tpl-friend-10_b.C: New test.
* g++.dg/modules/tpl-friend-10_c.C: New test.
* g++.dg/modules/tpl-friend-11_a.C: New test.
* g++.dg/modules/tpl-friend-11_b.C: New test.
* g++.dg/modules/tpl-friend-12_a.C: New test.
* g++.dg/modules/tpl-friend-12_b.C: New test.
* g++.dg/modules/tpl-friend-12_c.C: New test.
* g++.dg/modules/tpl-friend-12_d.C: New test.
* g++.dg/modules/tpl-friend-12_e.C: New test.
* g++.dg/modules/tpl-friend-12_f.C: New test.
* g++.dg/modules/tpl-friend-13_a.C: New test.
* g++.dg/modules/tpl-friend-13_b.C: New test.
* g++.dg/modules/tpl-friend-13_c.C: New test.
* g++.dg/modules/tpl-friend-13_d.C: New test.
* g++.dg/modules/tpl-friend-13_e.C: New test.
* g++.dg/modules/tpl-friend-9.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/cp-tree.h  |  3 +
 gcc/cp/module.cc  | 96 +++
 gcc/cp/name-lookup.cc | 42 
 gcc/cp/pt.cc  | 20 
 gcc/testsuite/g++.dg/modules/tpl-friend-10.h  |  0
 .../g++.dg/modules/tpl-friend-10_a.C  | 15 +++
 .../g++.dg/modules/tpl-friend-10_b.C  |  5 +
 .../g++.dg/modules/tpl-friend-10_c.C  |  7 ++
 .../g++.dg/modules/tpl-friend-11_a.C  | 14 +++
 .../g++.dg/modules/tpl-friend-11_b.C  |  5 +
 .../g++.dg/modules/tpl-friend-12_a.C  | 10 ++
 .../g++.dg/modules/tpl-friend-12_b.C  |  9 ++
 .../g++.dg/modules/tpl-friend-12_c.C  | 10 ++
 .../g++.dg/modules/tpl-friend-12_d.C  |  8 ++
 .../g++.dg/modules/tpl-friend-12_e.C  |  7 ++
 .../g++.dg/modules/tpl-friend-12_f.C  |  8 ++
 .../g++.dg/modules/tpl-friend-13_a.C  |  7 ++
 .../g++.dg/modules/tpl-friend-13_b.C  |  5 +
 .../g++.dg/modules/tpl-friend-13_c.C  |  7 ++
 .../g++.dg/modules/tpl-friend-13_d.C  |  6 ++
 .../g++.dg/modules/tpl-friend-13_e.C  |  8 ++
 gcc/testsuite/g++.dg/modules/tpl-friend-9.C   | 11 +++
 22 files changed, 303 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-10.h
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-10_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-10_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-10_c.C
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-11_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-11_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-12_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-12_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-12_c.C
 

[GCC] RTEMS: Add multilib configuration for aarch64

2024-03-25 Thread Sebastian Huber
gcc/ChangeLog:

* config.gcc (aarch64-*-rtems*): Add target makefile fragment
t-aarch64-rtems.
* config/aarch64/t-aarch64-rtems: New file.
---
 gcc/config.gcc |  1 +
 gcc/config/aarch64/t-aarch64-rtems | 41 ++
 2 files changed, 42 insertions(+)
 create mode 100644 gcc/config/aarch64/t-aarch64-rtems

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 648b3dc2110..c3b73d05eb7 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1139,6 +1139,7 @@ aarch64*-*-elf | aarch64*-*-fuchsia* | aarch64*-*-rtems*)
 ;;
aarch64-*-rtems*)
tm_file="${tm_file} aarch64/rtems.h rtems.h"
+   tmake_file="${tmake_file} aarch64/t-aarch64-rtems"
;;
esac
case $target in
diff --git a/gcc/config/aarch64/t-aarch64-rtems 
b/gcc/config/aarch64/t-aarch64-rtems
new file mode 100644
index 000..88e07f54551
--- /dev/null
+++ b/gcc/config/aarch64/t-aarch64-rtems
@@ -0,0 +1,41 @@
+# Machine description for AArch64 architecture.
+#  Copyright (C) 2024 Free Software Foundation, Inc.
+#
+#  This file is part of GCC.
+#
+#  GCC is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU General Public License as published by
+#  the Free Software Foundation; either version 3, or (at your option)
+#  any later version.
+#
+#  GCC is distributed in the hope that it will be useful, but
+#  WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+#  General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with GCC; see the file COPYING3.  If not see
+#  .
+
+MULTILIB_OPTIONS  =
+MULTILIB_DIRNAMES =
+
+MULTILIB_OPTIONS  += mabi=ilp32
+MULTILIB_DIRNAMES += ilp32
+
+MULTILIB_OPTIONS  += mno-outline-atomics
+MULTILIB_DIRNAMES += nooa
+
+MULTILIB_OPTIONS  += mcpu=cortex-a53
+MULTILIB_DIRNAMES += a53
+
+MULTILIB_OPTIONS  += mfix-cortex-a53-835769
+MULTILIB_DIRNAMES += fix835769
+
+MULTILIB_OPTIONS  += mfix-cortex-a53-843419
+MULTILIB_DIRNAMES += fix843419
+
+MULTILIB_REQUIRED =
+
+MULTILIB_REQUIRED += mabi=ilp32
+MULTILIB_REQUIRED += mno-outline-atomics/mcpu=cortex-a53/mfix-cortex-a53-835769/mfix-cortex-a53-843419
-- 
2.35.3



Re: [PATCH v1] doc: Correction of Tree SSA Passes info.

2024-03-25 Thread Richard Biener
On Mon, 25 Mar 2024, Chenghui Pan wrote:

> The current documentation of the Tree SSA passes contains many parts
> that have not been updated for many years.
> 
> This patch removes information that is outdated and no longer exists in
> the current GCC codebase, and fixes some incorrect source-file locations
> based on the current codebase and ChangeLogs.
> 
> gcc/ChangeLog:
> 
>   * doc/passes.texi: Correction of Tree SSA Passes info.
> ---
>  gcc/doc/passes.texi | 70 -
>  1 file changed, 6 insertions(+), 64 deletions(-)
> 
> diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
> index b50d3d5635b..068036acb7d 100644
> --- a/gcc/doc/passes.texi
> +++ b/gcc/doc/passes.texi
> @@ -450,17 +450,6 @@ The following briefly describes the Tree optimization passes that are
>  run after gimplification and what source files they are located in.
>  
>  @itemize @bullet
> -@item Remove useless statements
> -
> -This pass is an extremely simple sweep across the gimple code in which
> -we identify obviously dead code and remove it.  Here we do things like
> -simplify @code{if} statements with constant conditions, remove
> -exception handling constructs surrounding code that obviously cannot
> -throw, remove lexical bindings that contain no variables, and other
> -assorted simplistic cleanups.  The idea is to get rid of the obvious
> -stuff quickly rather than wait until later when it's more work to get
> -rid of it.  This pass is located in @file{tree-cfg.cc} and described by
> -@code{pass_remove_useless_stmts}.
>  
>  @item OpenMP lowering
>  
> @@ -478,7 +467,7 @@ described by @code{pass_lower_omp}.
>  
>  If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands
>  parallel regions into their own functions to be invoked by the thread
> -library.  The pass is located in @file{omp-low.cc} and is described by
> +library.  The pass is located in @file{omp-expand.cc} and is described by
>  @code{pass_expand_omp}.
>  
>  @item Lower control flow
> @@ -511,15 +500,6 @@ This pass decomposes a function into basic blocks and creates all of
>  the edges that connect them.  It is located in @file{tree-cfg.cc} and
>  is described by @code{pass_build_cfg}.
>  
> -@item Find all referenced variables
> -
> -This pass walks the entire function and collects an array of all
> -variables referenced in the function, @code{referenced_vars}.  The
> -index at which a variable is found in the array is used as a UID
> -for the variable within this function.  This data is needed by the
> -SSA rewriting routines.  The pass is located in @file{tree-dfa.cc}
> -and is described by @code{pass_referenced_vars}.
> -
>  @item Enter static single assignment form
>  
>  This pass rewrites the function such that it is in SSA form.  After
> @@ -562,15 +542,6 @@ variables that are used once into the expression that uses them and
>  seeing if the result can be simplified.  It is located in
>  @file{tree-ssa-forwprop.cc} and is described by @code{pass_forwprop}.
>  
> -@item Copy Renaming
> -
> -This pass attempts to change the name of compiler temporaries involved in
> -copy operations such that SSA->normal can coalesce the copy away.  When compiler
> -temporaries are copies of user variables, it also renames the compiler
> -temporary to the user variable resulting in better use of user symbols.  It is
> -located in @file{tree-ssa-copyrename.c} and is described by
> -@code{pass_copyrename}.
> -
>  @item PHI node optimizations
>  
>  This pass recognizes forms of PHI inputs that can be represented as
> @@ -585,8 +556,7 @@ The resulting may-alias, must-alias, and escape analysis information
>  is used to promote variables from in-memory addressable objects to
>  non-aliased variables that can be renamed into SSA form.  We also
>  update the @code{VDEF}/@code{VUSE} memory tags for non-renameable
> -aggregates so that we get fewer false kills.  The pass is located
> -in @file{tree-ssa-alias.cc} and is described by @code{pass_may_alias}.
> +aggregates so that we get fewer false kills.

This pass is now located in tree-ssa-structalias.cc and it's named
pass_build_alias instead.

The part of the description that says

"The resulting may-alias, must-alias, and escape analysis information
is used to promote variables from in-memory addressable objects to
non-aliased variables that can be renamed into SSA form.  We also
update the @code{VDEF}/@code{VUSE} memory tags for non-renameable
aggregates so that we get fewer false kills."

is obsolete and should be removed.

The rest of the changes look good to me.

Thanks,
Richard.

>  Interprocedural points-to information is located in
>  @file{tree-ssa-structalias.cc} and described by @code{pass_ipa_pta}.
> @@ -604,7 +574,7 @@ is described by @code{pass_ipa_tree_profile}.
>  This pass implements series of heuristics to guess propababilities
>  of branches.  The resulting predictions are turned into edge profile
>  by propagating branches across the control flow graphs.

[PATCH v1] RISC-V: Allow RVV intrinsic when function target("arch=+v")

2024-03-25 Thread pan2 . li
From: Pan Li 

This patch allows the RVV intrinsics to be used when a function is
attributed with target("arch=+v") and built with rv64gc.  For example:

vint32m1_t
__attribute__((target("arch=+v")))
test_1 (vint32m1_t a, vint32m1_t b, size_t vl)
{
  return __riscv_vadd_vv_i32m1 (a, b, vl);
}

Built with -march=rv64gc -mabi=lp64d -O3, this produces asm like the following:
test_1:
  .option push
  .option arch, rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_\
zifencei2p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0
  vsetvli zero,a0,e32,m1,ta,ma
  vadd.vv v8,v8,v9
  ret

riscv_vector.h must still be included to use the intrinsic types and
APIs, and the scope of this attribute should not exceed the function
body.  Meanwhile, to make the RVV types and APIs available for this
attribute, including riscv_vector.h no longer reports an error when V
is not present in -march.
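The riscv-c.cc change below enables the vector mask temporarily so the RVV types and intrinsic APIs can be registered, and the ChangeLog says expand_builtin now reports an error when V is disabled. The following is a minimal plain-C model of that split between registration and use, offered only as an illustration. All sim_* names are hypothetical stand-ins, not GCC internals, and saving and restoring the whole flag word is one possible choice for this sketch, not necessarily what the patch does.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for target_flags / MASK_VECTOR (not GCC code).  */
#define SIM_MASK_VECTOR (1u << 0)

static unsigned sim_target_flags = 0;   /* global -march=rv64gc: V is off */
static bool sim_builtins_registered = false;

/* Hypothetical registration routine: it only registers the types and
   APIs when the vector bit is set, like code that expects V enabled.  */
static void sim_init_builtins (void)
{
  if (sim_target_flags & SIM_MASK_VECTOR)
    sim_builtins_registered = true;
}

/* Model of the pragma handler: enable V just long enough to register
   everything, then put the global flags back as they were.  */
static void sim_pragma_intrinsic_vector (void)
{
  unsigned saved = sim_target_flags;
  sim_target_flags |= SIM_MASK_VECTOR;
  sim_init_builtins ();
  sim_target_flags = saved;
}

/* Model of the expand-time check: a call is accepted only when the
   function being compiled enables V, e.g. via target("arch=+v").  */
static bool sim_expand_builtin (unsigned fn_flags)
{
  if (!(fn_flags & SIM_MASK_VECTOR))
    {
      fprintf (stderr, "error: builtin requires the V extension\n");
      return false;
    }
  return sim_builtins_registered;
}
```

After sim_pragma_intrinsic_vector () runs, the builtins are registered while the global flags still have V off, so sim_expand_builtin accepts a function whose flags include SIM_MASK_VECTOR and rejects one that does not.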

The following tests pass with this patch:
* The full riscv regression test suite.

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_pragma_intrinsic): Remove error
when V is disabled and init the RVV types and intrinsic APIs.
* config/riscv/riscv-vector-builtins.cc (expand_builtin): Report
error if V ext is disabled.
* config/riscv/riscv.cc (riscv_return_value_is_vector_type_p):
Ditto.
(riscv_arguments_is_vector_type_p): Ditto.
(riscv_vector_cc_function_p): Ditto.
* config/riscv/riscv_vector.h: Remove error if V is disabled.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pragma-1.c: Remove.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-1.c: New test.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-2.c: New test.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-3.c: New test.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-4.c: New test.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-5.c: New test.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-6.c: New test.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-7.c: New test.
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-8.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-c.cc   | 18 +++
 gcc/config/riscv/riscv-vector-builtins.cc |  5 
 gcc/config/riscv/riscv.cc | 30 ---
 gcc/config/riscv/riscv_vector.h   |  4 ---
 .../gcc.target/riscv/rvv/base/pragma-1.c  |  4 ---
 .../target_attribute_v_with_intrinsic-1.c |  5 
 .../target_attribute_v_with_intrinsic-2.c | 18 +++
 .../target_attribute_v_with_intrinsic-3.c | 13 
 .../target_attribute_v_with_intrinsic-4.c | 10 +++
 .../target_attribute_v_with_intrinsic-5.c | 12 
 .../target_attribute_v_with_intrinsic-6.c | 12 
 .../target_attribute_v_with_intrinsic-7.c |  9 ++
 .../target_attribute_v_with_intrinsic-8.c | 23 ++
 13 files changed, 145 insertions(+), 18 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pragma-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-8.c

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index edb866d51e4..01314037461 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -201,14 +201,20 @@ riscv_pragma_intrinsic (cpp_reader *)
   if (strcmp (name, "vector") == 0
   || strcmp (name, "xtheadvector") == 0)
 {
-  if (!TARGET_VECTOR)
+  if (TARGET_VECTOR)
+   riscv_vector::handle_pragma_vector ();
+  else /* Indicates riscv_vector.h is included but v is missing in arch  */
{
- error ("%<#pragma riscv intrinsic%> option %qs needs 'V' or "
-"'XTHEADVECTOR' extension enabled",
-name);
- return;
+ /* To make the rvv types and intrinsic API available for the
+    target("arch=+v") attribute, we need to temporarily enable
+    TARGET_VECTOR, and disable it after everything is initialized.  */
+ target_flags |= MASK_VECTOR;
+
+ riscv_vector::init_builtins ();