Re: [PATCH 13/13] rs6000, remove vector set and vector init built-ins.
Hi, on 2024/4/20 05:18, Carl Love wrote: > rs6000, remove vector set and vector init built-ins. > > The vector init built-ins: > > __builtin_vec_init_v16qi, __builtin_vec_init_v8hi, > __builtin_vec_init_v4si, __builtin_vec_init_v4sf, > __builtin_vec_init_v2di, __builtin_vec_init_v2df, > __builtin_vec_set_v1ti > > perform the same operation as initializing the vector in C code. For > example: > > result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4); > result_v4si = {1, 2, 3, 4}; > > These two constructs were tested and verified they generate identical > assembly instructions with no optimization and -O3 optimization. > > The vector set built-ins: > > __builtin_vec_set_v16qi, __builtin_vec_set_v8hi. > __builtin_vec_set_v4si, __builtin_vec_set_v4sf > > perform the same operation as setting a specific element in the vector in > C code. For example: > > src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index); > src_v4si[index] = int_val; > > The built-in actually generates more instructions than the inline C code > with no optimization but is identical with -O3 optimizations. > > All of the above built-ins that are removed do not have test cases and > are not documented. > > Built-ins __builtin_vec_set_v1ti __builtin_vec_set_v2di, > __builtin_vec_set_v2df are not removed as they are used in function > resolve_vec_insert() in file rs6000-c.cc. I think we can replace these calls with the equivalent gimple codes (early expanding it) and then we can get rid of these instances. BR, Kewen > > The built-ins are removed as they don't provide any benefit over just > using C code. > > gcc/ChangeLog: > * config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi, >__builtin_vec_init_v8hi, __builtin_vec_init_v4si, > __builtin_vec_init_v4sf, __builtin_vec_init_v2di, > __builtin_vec_init_v2df, __builtin_vec_set_v1ti, > __builtin_vec_set_v16qi, __builtin_vec_set_v8hi. 
> __builtin_vec_set_v4si, __builtin_vec_set_v4sf, > __builtin_vec_set_v2di, __builtin_vec_set_v2df, > __builtin_vec_set_v1ti): Remove built-in definitions. > --- > gcc/config/rs6000/rs6000-builtins.def | 42 ++- > 1 file changed, 2 insertions(+), 40 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000-builtins.def > b/gcc/config/rs6000/rs6000-builtins.def > index 19d05b8043a..d04ad4ce7e5 100644 > --- a/gcc/config/rs6000/rs6000-builtins.def > +++ b/gcc/config/rs6000/rs6000-builtins.def > @@ -1115,37 +1115,6 @@ >const signed short __builtin_vec_ext_v8hi (vss, signed int); > VEC_EXT_V8HI nothing {extract} > > - const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, > \ > -signed char, signed char, signed char, signed char, signed char, > \ > -signed char, signed char, signed char, signed char, signed char, > \ > -signed char, signed char, signed char); > -VEC_INIT_V16QI nothing {init} > - > - const vf __builtin_vec_init_v4sf (float, float, float, float); > -VEC_INIT_V4SF nothing {init} > - > - const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \ > - signed int); > -VEC_INIT_V4SI nothing {init} > - > - const vss __builtin_vec_init_v8hi (signed short, signed short, signed > short,\ > - signed short, signed short, signed short, signed short, \ > - signed short); > -VEC_INIT_V8HI nothing {init} > - > - const vsc __builtin_vec_set_v16qi (vsc, signed char, const int<4>); > -VEC_SET_V16QI nothing {set} > - > - const vf __builtin_vec_set_v4sf (vf, float, const int<2>); > -VEC_SET_V4SF nothing {set} > - > - const vsi __builtin_vec_set_v4si (vsi, signed int, const int<2>); > -VEC_SET_V4SI nothing {set} > - > - const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>); > -VEC_SET_V8HI nothing {set} > - > - > ; Cell builtins. 
> [cell] >pure vsc __builtin_altivec_lvlx (signed long, const void *); > @@ -1292,15 +1261,8 @@ >const signed long long __builtin_vec_ext_v2di (vsll, signed int); > VEC_EXT_V2DI nothing {extract} > > - const vsq __builtin_vec_init_v1ti (signed __int128); > -VEC_INIT_V1TI nothing {init} > - > - const vd __builtin_vec_init_v2df (double, double); > -VEC_INIT_V2DF nothing {init} > - > - const vsll __builtin_vec_init_v2di (signed long long, signed long long); > -VEC_INIT_V2DI nothing {init} > - > +;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in > +;; resolve_vec_insert(), rs6000-c.cc >const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>); > VEC_SET_V1TI nothing {set} >
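For readers following the thread, the equivalence claimed in the commit message can be sketched in plain C. This is only an illustration under assumptions: it uses GCC's generic vector extension type rather than the rs6000 `vector signed int`, and the helper names `init_v4si` and `set_v4si` are hypothetical stand-ins, not anything from the patch.

```c
/* Generic GCC/Clang vector type: four 32-bit ints in 16 bytes.
   Assumption: this stands in for the rs6000 v4si type.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Plain C counterpart of __builtin_vec_init_v4si (a, b, c, d):
   a brace initializer builds the vector element by element.  */
static v4si
init_v4si (int a, int b, int c, int d)
{
  v4si v = { a, b, c, d };
  return v;
}

/* Plain C counterpart of __builtin_vec_set_v4si (v, val, index):
   vector subscripting writes a single element.  */
static v4si
set_v4si (v4si v, int val, int index)
{
  v[index] = val;
  return v;
}
```

Calling `set_v4si (init_v4si (1, 2, 3, 4), 42, 2)` would give a vector holding 1, 2, 42, 4, which is the behaviour the removed built-ins duplicated.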
Re: [PATCH 12/13] rs6000, remove __builtin_vsx_xvcmpeqsp built-in
Hi, on 2024/4/20 05:18, Carl Love wrote: > rs6000, remove __builtin_vsx_xvcmpeqsp built-in > > The built-in __builtin_vsx_xvcmpeqsp is a duplicate of the overloaded > vec_cmpeq built-in. The built-in is undocumented. The built-in and > the test cases are removed. > > gcc/ChangeLog: > * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp): > Remove built-in definition. > Ah, you separated this __builtin_vsx_xvcmpeqsp from the one for __builtin_vsx_xvcmpeqsp_p, it's fine, please ignore the comments for considering this __builtin_vsx_xvcmpeqsp in my previous reply to 11/13. > gcc/testsuite/ChangeLog: > * vsx-builtin-3.c (do_cmp): Remove test case for > __builtin_vsx_xvcmpeqsp. > --- > gcc/config/rs6000/rs6000-builtins.def| 3 --- > gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c | 2 -- > 2 files changed, 5 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000-builtins.def > b/gcc/config/rs6000/rs6000-builtins.def > index 2f6149edd5f..19d05b8043a 100644 > --- a/gcc/config/rs6000/rs6000-builtins.def > +++ b/gcc/config/rs6000/rs6000-builtins.def > @@ -1613,9 +1613,6 @@ >const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd); > XVCMPEQDP_P vector_eq_v2df_p {pred} > > - const vf __builtin_vsx_xvcmpeqsp (vf, vf); > -XVCMPEQSP vector_eqv4sf {} > - >const vd __builtin_vsx_xvcmpgedp (vd, vd); > XVCMPGEDP vector_gev2df {} > > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c > b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c > index 35ea31b2616..245893dc0e3 100644 > --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c > @@ -27,7 +27,6 @@ > /* { dg-final { scan-assembler "xvcmpeqdp" } } */ > /* { dg-final { scan-assembler "xvcmpgtdp" } } */ > /* { dg-final { scan-assembler "xvcmpgedp" } } */ > -/* { dg-final { scan-assembler "xvcmpeqsp" } } */ > /* { dg-final { scan-assembler "xvcmpgtsp" } } */ > /* { dg-final { scan-assembler "xvcmpgesp" } } */ > /* { dg-final { scan-assembler "xxsldwi" } 
} */
> @@ -112,7 +111,6 @@ int do_cmp (void)
>    d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
>    d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
>
> -  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
>    f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
>    f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
>    return i;

As with the others in this patch series, I prefer to change it to use
vec_cmpeq here.  OK for trunk with this tweak (also keep the scan there),
thanks!

BR,
Kewen
Re: [PATCH 10/13] rs6000, extend vec_xxpermdi built-in for __int128 args
Hi, on 2024/4/20 05:18, Carl Love wrote: > rs6000, extend vec_xxpermdi built-in for __int128 args > > Add a new overloaded instance for vec_xxpermdi > >__int128 vec_xxpermdi (__int128, __int128, const int); > > Update the documentation to include a reference to the new built-in > instance. > > gcc/ChangeLog: > * config/rs6000/rs6000-builtins.def (vec_xxpermdi): Add new > overloaded built-in instance. > --- > gcc/config/rs6000/rs6000-overload.def | 2 ++ > gcc/doc/extend.texi | 1 + > 2 files changed, 3 insertions(+) > > diff --git a/gcc/config/rs6000/rs6000-overload.def > b/gcc/config/rs6000/rs6000-overload.def > index 5912c9452f4..49962e2f2a2 100644 > --- a/gcc/config/rs6000/rs6000-overload.def > +++ b/gcc/config/rs6000/rs6000-overload.def > @@ -4932,6 +4932,8 @@ > XXPERMDI_4SF XXPERMDI_VF >vd __builtin_vsx_xxpermdi (vd, vd, const int); > XXPERMDI_2DF XXPERMDI_VD > + vsq __builtin_vsx_xxpermdi (vsq, vsq, const int); > +XXPERMDI_1TI XXPERMDI_1TI This actually introduces the signed __int128, considering the other existing ones, I think we want both signed and unsigned. > > [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi] >vsc __builtin_vsx_xxsldwi (vsc, vsc, const int); > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > index 86b8e536dbe..47cf2f3bc8b 100644 > --- a/gcc/doc/extend.texi > +++ b/gcc/doc/extend.texi > @@ -22505,6 +22505,7 @@ void vec_vsx_st (vector bool char, int, vector bool > char *); > void vec_vsx_st (vector bool char, int, unsigned char *); > void vec_vsx_st (vector bool char, int, signed char *); > > +vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int); > vector double vec_xxpermdi (vector double, vector double, const int); > vector float vec_xxpermdi (vector float, vector float, const int); Nit: Considering the existing ones sorted by element size descending, I guess it's better to move the above here (and with the explicit signed and unsigned). And we need a test case for it as well? 
BR, Kewen > vector long long vec_xxpermdi (vector long long, vector long long, const > int);
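As background for the new __int128 overload, the doubleword selection that the xxpermdi instruction performs can be modelled in portable C. This is a rough sketch under assumptions: the `vec128` type and `xxpermdi_model` name are hypothetical, and doublewords are numbered as in the ISA description (element 0 is the most-significant half), which may differ from how elements appear from C on little-endian targets.

```c
#include <stdint.h>

/* Hypothetical 128-bit value seen as two 64-bit doublewords,
   dw[0] being the most-significant half (ISA numbering).  */
typedef struct { uint64_t dw[2]; } vec128;

/* Model of xxpermdi: the high bit of the 2-bit immediate picks a
   doubleword of A for result doubleword 0, the low bit picks a
   doubleword of B for result doubleword 1.  */
static vec128
xxpermdi_model (vec128 a, vec128 b, unsigned dm)
{
  vec128 r;
  r.dw[0] = a.dw[(dm >> 1) & 1];
  r.dw[1] = b.dw[dm & 1];
  return r;
}
```

With an __int128 argument each operand is a single 128-bit element, so the immediate simply chooses which half of each operand ends up in the result.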
Re: [PATCH v2 2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]
On 5/13/24 6:54 PM, Patrick O'Neill wrote:

On 5/13/24 13:28, Jeff Law wrote:

On 5/13/24 12:49 PM, Vineet Gupta wrote:

If the constant used for stack offset can be expressed as sum of two S12
values, the constant need not be materialized (in a reg) and instead
the two S12 bits can be added to instructions involved with frame pointer.
This avoids burning a register and more importantly can often get down to
be 2 insn vs. 3.

The prev patches to generally avoid LUI based const materialization didn't
fix this PR and need this directed fix in function prologue/epilogue
expansion.

This fix doesn't move the needle for SPEC, at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)

 gcc-13.1 release |    gcc 230823    |                  |
                  |  g6619b3d4c15c   |    This patch    |   clang/llvm
---------------------------------------------------------------------------
li   t0,-4096     | li   t0,-4096    | addi sp,sp,-2048 | addi sp,sp,-2048
addi t0,t0,2016   | addi t0,t0,2032  | add  sp,sp,-16   | addi sp,sp,-32
li   a4,4096      | add  sp,sp,t0    | add  a5,sp,a0    | add  a1,sp,16
add  sp,sp,t0     | addi a5,sp,-2032 | sb   zero,0(a5)  | add  a0,a0,a1
li   a5,-4096     | add  a0,a5,a0    | addi sp,sp,2032  | sb   zero,0(a0)
addi a4,a4,-2032  | li   t0, 4096    | addi sp,sp,32    | addi sp,sp,2032
add  a4,a4,a5     | sb   zero,2032(a0) | ret            | addi sp,sp,48
addi a5,sp,16     | addi t0,t0,-2032 |                  | ret
add  a5,a4,a5     | add  sp,sp,t0    |
add  a0,a5,a0     | ret              |
li   t0,4096      |
sd   a5,8(sp)     |
sb   zero,2032(a0)|
addi t0,t0,-2016  |
add  sp,sp,t0     |
ret               |

gcc/ChangeLog:
    PR target/105733
    * config/riscv/riscv.h: New macros for with aligned offsets.
    * config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
    function to split a sum of two s12 values into constituents.
    (riscv_expand_prologue): Handle offset being sum of two S12.
    (riscv_expand_epilogue): Ditto.
    * config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/pr105733.c: New Test.
    * gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
    expect LUI 4096.
    * gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.

@@ -8074,14 +8111,26 @@ riscv_expand_epilogue (int style)
       }
     else
       {
-        if (!SMALL_OPERAND (adjust_offset.to_constant ()))
+        HOST_WIDE_INT adj_off_value = adjust_offset.to_constant ();
+        if (SMALL_OPERAND (adj_off_value))
+          {
+            adjust = GEN_INT (adj_off_value);
+          }
+        else if (SUM_OF_TWO_S12_ALGN (adj_off_value))
+          {
+            HOST_WIDE_INT base, off;
+            riscv_split_sum_of_two_s12 (adj_off_value, &base, &off);
+            insn = gen_add3_insn (stack_pointer_rtx, hard_frame_pointer_rtx,
+                                  GEN_INT (base));
+            RTX_FRAME_RELATED_P (insn) = 1;
+            adjust = GEN_INT (off);
+          }

So this was the hunk that we identified internally as causing problems
with libgomp's testsuite.  We never fully chased it down as this hunk
didn't seem terribly important performance wise -- we just set it aside.
The thing is it looked basically correct to me.  So the failure was
certainly unexpected, but it was consistent.

So I think the question is whether or not the CI system runs the libgomp
testsuite, particularly in the rv64 linux configuration.  If it does, and
it passes, then we're good.  I'm still finding my way around the
configuration, so I don't know if the CI system Edwin & Patrick have built
tests libgomp or not.
I poked around the .sum files in pre/postcommit and we do run tests like:

PASS: c-c++-common/gomp/affinity-2.c (test for errors, line 45)

I was able to find the summary info:

Tests that now fail, but worked before (15 tests):

libgomp: libgomp.fortran/simd7.f90 -O0 execution test
libgomp: libgomp.fortran/task2.f90 -O0 execution test
libgomp: libgomp.fortran/vla2.f90 -O0 execution test
libgomp: libgomp.fortran/vla3.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
libgomp: libgomp.fortran/vla3.f90 -O3 -g execution test
libgomp: libgomp.fortran/vla4.f90 -O1 execution test
libgomp: libgomp.fortran/vla4.f90 -O2 execution test
libgomp: libgomp.fortran/vla4.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
libgomp: libgomp.fortran/vla4.f90 -O3 -g execution test
libgomp: libgomp.fortran/vla4.f90 -Os execution test
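To make the sum-of-two-S12 idea from the patch description concrete, here is a small standalone sketch. It is an assumption-laden illustration, not the code of riscv_split_sum_of_two_s12: the names `fits_s12`, `split_sum_of_two_s12` and the `S12_*` macros are hypothetical, and the range checks are simplified relative to whatever the patch's macros actually test.

```c
#include <stdbool.h>

#define S12_MIN (-2048)
#define S12_MAX 2047
/* Largest 16-byte-aligned positive S12 value, so that sp stays
   aligned after the first addi (assumption for this sketch).  */
#define S12_MAX_ALIGNED 2032

/* True if V fits a signed 12-bit immediate, as taken by addi.  */
static bool
fits_s12 (long v)
{
  return v >= S12_MIN && v <= S12_MAX;
}

/* Try to write VAL as BASE + OFF with both parts in S12 range.
   Returns false when VAL would need a materialized constant.
   A caller is expected to handle values that already fit a single
   S12 immediate before trying this.  */
static bool
split_sum_of_two_s12 (long val, long *base, long *off)
{
  long b = val < 0 ? S12_MIN : S12_MAX_ALIGNED;
  long o = val - b;
  if (!fits_s12 (o))
    return false;
  *base = b;
  *off = o;
  return true;
}
```

For the adjustment of 2064 bytes in the test case, this yields -2048 and -16 for the prologue and 2032 and 32 for the epilogue, matching the pairs of immediates in the "This patch" column of the comparison.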
Re: [PATCH] report message for operator %a on unaddressible exp
Hi, "Kewen.Lin" writes: > Hi, > > on 2024/5/14 11:00, Jiufu Guo wrote: >> Hi, >> >> Thanks a lot for your helpful review! >> >> "Kewen.Lin" writes: >> >>> Hi, >>> >>> on 2024/5/13 10:57, Jiufu Guo wrote: Hi, For PR96866, when gcc print asm code for modifier "%a" which requires an address operand, while the operand is with the constraint "X" which allow non-address form. An error message would be reported to indicate the invalid asm operands. Bootstrap pass on ppc64{,le}. Is this ok for trunk? BR, Jeff(Jiufu Guo) PR target/96866 gcc/ChangeLog: * config/rs6000/rs6000.cc (print_operand_address): gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr96866-1.c: New test. * gcc.target/powerpc/pr96866-2.c: New test. --- gcc/config/rs6000/rs6000.cc | 6 ++ gcc/testsuite/gcc.target/powerpc/pr96866-1.c | 15 +++ gcc/testsuite/gcc.target/powerpc/pr96866-2.c | 10 ++ 3 files changed, 31 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-2.c diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 117999613d8..50943d76f79 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -14659,6 +14659,12 @@ print_operand_address (FILE *file, rtx x) else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST || GET_CODE (x) == LABEL_REF) { + if (this_is_asm_operands && !address_operand (x, VOIDmode)) >>> >>> Do we really need this_is_asm_operands here? >> I understand your point: >> since in function 'print_operand_address' which supports not only user >> asm code. So, it maybe incorrect if 'x' is not an 'address_operand', >> no matter this_is_asm_operands. >> >> Here, 'this_is_asm_operands' is needed because it would be treated as an >> user fault in asm-code (otherwise, internal_error in the compiler). 
> > The called function "output_operand_lossage" already takes different > actions for this_is_asm_operands and !this_is_asm_operands cases, so > for this_is_asm_operands, it goes with error_for_asm and no ICE, no? > > And without this_is_asm_operands, if we adopt constraint X internally > and hit this (it means it's already unexpected), isn't better to see > the ICE instead of going further? Yeap, exactly! "output_operand_lossage" could handle both user 'asm' error and internal_error. So it would be ok to call it directly just for "gcc_assert(TARGET_TOC)" for this "if condition". Like: ``` else if (TARGET_TOC) output_operand_lossage ("invalid expression as operand"); ``` I would refine the patch. Thanks again for your great comments. BR, Jeff(Jiufu) Guo > > BR, > Kewen > >> >> I notice one thing: >> As what we need is emitting error for printing address if the address >> can not be access directly. >> So it would be better to emit message through 'output_operand_lossage' >> just befor gcc_assert(TARGET_TOC). >> >> Thanks a lot for your insight comment! >> >>> + { +output_operand_lossage ("invalid expression as operand"); +return; + } + output_addr_const (file, x); if (small_data_operand (x, GET_MODE (x))) fprintf (file, "@%s(%s)", SMALL_DATA_RELOC, diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-1.c b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c new file mode 100644 index 000..6554a472a11 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c @@ -0,0 +1,15 @@ +/* It's to verify no ICE here, ignore error messages about invalid 'asm'. */ +/* { dg-excess-errors "pr96866-2.c" } */ +/* { dg-options "-fPIC -O2" } */ >>> >>> Nit: If these two options are required, it would be good to have a comment >>> explaining it a bit >>> when it's not obvious. >> >> Good suggestion, thanks! 
>>> + +int x[2]; + +int __attribute__ ((noipa)) +f1 (void) +{ + int n; + int *p = x; + *p++; + __asm__ volatile("ld %0, %a1" : "=r"(n) : "X"(p)); + return n; +} diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-2.c b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c new file mode 100644 index 000..a5ec96f29dd --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c @@ -0,0 +1,10 @@ +/* It's to verify no ICE here, ignore error messages about invalid 'asm'. */ +/* { dg-excess-errors "pr96866-2.c" } */ +/* { dg-options "-fPIC -O2" } */ >>> >>> Ditto. >> Thanks! >> >> BR, >> Jeff(Jiufu) Guo >>> >>> BR, >>> Kewen >>> + +void +f (void) +{ + extern int x; + __asm__ volatile("#%a0" ::"X"(&x)); +}
Re: [PATCH] report message for operator %a on unaddressible exp
Hi, on 2024/5/14 11:00, Jiufu Guo wrote: > Hi, > > Thanks a lot for your helpful review! > > "Kewen.Lin" writes: > >> Hi, >> >> on 2024/5/13 10:57, Jiufu Guo wrote: >>> Hi, >>> >>> For PR96866, when gcc print asm code for modifier "%a" which requires >>> an address operand, while the operand is with the constraint "X" which >>> allow non-address form. An error message would be reported to indicate >>> the invalid asm operands. >>> >>> Bootstrap pass on ppc64{,le}. >>> Is this ok for trunk? >>> >>> BR, >>> Jeff(Jiufu Guo) >>> >>> PR target/96866 >>> >>> gcc/ChangeLog: >>> >>> * config/rs6000/rs6000.cc (print_operand_address): >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/powerpc/pr96866-1.c: New test. >>> * gcc.target/powerpc/pr96866-2.c: New test. >>> >>> --- >>> gcc/config/rs6000/rs6000.cc | 6 ++ >>> gcc/testsuite/gcc.target/powerpc/pr96866-1.c | 15 +++ >>> gcc/testsuite/gcc.target/powerpc/pr96866-2.c | 10 ++ >>> 3 files changed, 31 insertions(+) >>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-1.c >>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-2.c >>> >>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >>> index 117999613d8..50943d76f79 100644 >>> --- a/gcc/config/rs6000/rs6000.cc >>> +++ b/gcc/config/rs6000/rs6000.cc >>> @@ -14659,6 +14659,12 @@ print_operand_address (FILE *file, rtx x) >>>else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST >>>|| GET_CODE (x) == LABEL_REF) >>> { >>> + if (this_is_asm_operands && !address_operand (x, VOIDmode)) >> >> Do we really need this_is_asm_operands here? > I understand your point: > since in function 'print_operand_address' which supports not only user > asm code. So, it maybe incorrect if 'x' is not an 'address_operand', > no matter this_is_asm_operands. > > Here, 'this_is_asm_operands' is needed because it would be treated as an > user fault in asm-code (otherwise, internal_error in the compiler). 
The called function "output_operand_lossage" already takes different actions for this_is_asm_operands and !this_is_asm_operands cases, so for this_is_asm_operands, it goes with error_for_asm and no ICE, no? And without this_is_asm_operands, if we adopt constraint X internally and hit this (it means it's already unexpected), isn't better to see the ICE instead of going further? BR, Kewen > > I notice one thing: > As what we need is emitting error for printing address if the address > can not be access directly. > So it would be better to emit message through 'output_operand_lossage' > just befor gcc_assert(TARGET_TOC). > > Thanks a lot for your insight comment! > >> >>> + { >>> + output_operand_lossage ("invalid expression as operand"); >>> + return; >>> + } >>> + >>>output_addr_const (file, x); >>>if (small_data_operand (x, GET_MODE (x))) >>> fprintf (file, "@%s(%s)", SMALL_DATA_RELOC, >>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-1.c >>> b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c >>> new file mode 100644 >>> index 000..6554a472a11 >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c >>> @@ -0,0 +1,15 @@ >>> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'. >>> */ >>> +/* { dg-excess-errors "pr96866-2.c" } */ >>> +/* { dg-options "-fPIC -O2" } */ >> >> Nit: If these two options are required, it would be good to have a comment >> explaining it a bit >> when it's not obvious. > > Good suggestion, thanks! 
>>
>>> +
>>> +int x[2];
>>> +
>>> +int __attribute__ ((noipa))
>>> +f1 (void)
>>> +{
>>> +  int n;
>>> +  int *p = x;
>>> +  *p++;
>>> +  __asm__ volatile("ld %0, %a1" : "=r"(n) : "X"(p));
>>> +  return n;
>>> +}
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>>> b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>>> new file mode 100644
>>> index 000..a5ec96f29dd
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>>> @@ -0,0 +1,10 @@
>>> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.
>>> */
>>> +/* { dg-excess-errors "pr96866-2.c" } */
>>> +/* { dg-options "-fPIC -O2" } */
>>
>> Ditto.
> Thanks!
>
> BR,
> Jeff(Jiufu) Guo
>>
>> BR,
>> Kewen
>>
>>> +
>>> +void
>>> +f (void)
>>> +{
>>> +  extern int x;
>>> +  __asm__ volatile("#%a0" ::"X"(&x));
>>> +}
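The review point about this_is_asm_operands can be illustrated with a toy model. This is only a sketch: `lossage_model`, `print_address_model` and the `diag_kind` enum are hypothetical names for illustration, and the model captures just the behaviour described in the reply, namely that output_operand_lossage reports a user error for asm operands and an internal error otherwise.

```c
#include <stdbool.h>

/* Hypothetical outcome of printing an operand address.  */
enum diag_kind { DIAG_NONE, DIAG_USER_ERROR, DIAG_INTERNAL_ERROR };

/* Toy stand-in for output_operand_lossage: user asm gets a normal
   error, anything else is treated as a compiler bug (ICE).  */
static enum diag_kind
lossage_model (bool this_is_asm_operands)
{
  return this_is_asm_operands ? DIAG_USER_ERROR : DIAG_INTERNAL_ERROR;
}

/* The check as suggested in the review: no extra guard on
   this_is_asm_operands, because the callee already distinguishes
   the two cases.  */
static enum diag_kind
print_address_model (bool this_is_asm_operands, bool is_address)
{
  if (!is_address)
    return lossage_model (this_is_asm_operands);
  return DIAG_NONE;
}
```

Under this model, guarding the call with this_is_asm_operands would only hide the internal-error path, which is the case where seeing the ICE is arguably preferable.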
Re: [PATCH 9/13] rs6000, remove __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp built-ins
Hi,

on 2024/4/20 05:18, Carl Love wrote:
> rs6000, remove __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp built-ins
>
> The undocumented __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp are
> redundant.  The overloaded vec_neg built-in provides the same
> functionality.  The two built-ins are not documented nor are there any
> test cases for them.
>
> Remove the definitions so users will use the overloaded vec_neg built-in
> which is documented in the PVIPR.

OK, thanks!

BR,
Kewen

>
> gcc/ChangeLog:
>     * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvnegdp,
>     __builtin_vsx_xvnegsp): Remove built-in definitions.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 6 --
>  1 file changed, 6 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-builtins.def
> b/gcc/config/rs6000/rs6000-builtins.def
> index f33564d3d9c..d65c858ac0c 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1763,12 +1763,6 @@
>    const vf __builtin_vsx_xvnabssp (vf);
>      XVNABSSP vsx_nabsv4sf2 {}
>
> -  const vd __builtin_vsx_xvnegdp (vd);
> -    XVNEGDP negv2df2 {}
> -
> -  const vf __builtin_vsx_xvnegsp (vf);
> -    XVNEGSP negv4sf2 {}
> -
>    const vd __builtin_vsx_xvnmadddp (vd, vd, vd);
>      XVNMADDDP nfmav2df4 {}
>
Re: [PATCH] report message for operator %a on unaddressible exp
Hi, Thanks a lot for your helpful review! "Kewen.Lin" writes: > Hi, > > on 2024/5/13 10:57, Jiufu Guo wrote: >> Hi, >> >> For PR96866, when gcc print asm code for modifier "%a" which requires >> an address operand, while the operand is with the constraint "X" which >> allow non-address form. An error message would be reported to indicate >> the invalid asm operands. >> >> Bootstrap pass on ppc64{,le}. >> Is this ok for trunk? >> >> BR, >> Jeff(Jiufu Guo) >> >> PR target/96866 >> >> gcc/ChangeLog: >> >> * config/rs6000/rs6000.cc (print_operand_address): >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/pr96866-1.c: New test. >> * gcc.target/powerpc/pr96866-2.c: New test. >> >> --- >> gcc/config/rs6000/rs6000.cc | 6 ++ >> gcc/testsuite/gcc.target/powerpc/pr96866-1.c | 15 +++ >> gcc/testsuite/gcc.target/powerpc/pr96866-2.c | 10 ++ >> 3 files changed, 31 insertions(+) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-1.c >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-2.c >> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >> index 117999613d8..50943d76f79 100644 >> --- a/gcc/config/rs6000/rs6000.cc >> +++ b/gcc/config/rs6000/rs6000.cc >> @@ -14659,6 +14659,12 @@ print_operand_address (FILE *file, rtx x) >>else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST >> || GET_CODE (x) == LABEL_REF) >> { >> + if (this_is_asm_operands && !address_operand (x, VOIDmode)) > > Do we really need this_is_asm_operands here? I understand your point: since in function 'print_operand_address' which supports not only user asm code. So, it maybe incorrect if 'x' is not an 'address_operand', no matter this_is_asm_operands. Here, 'this_is_asm_operands' is needed because it would be treated as an user fault in asm-code (otherwise, internal_error in the compiler). I notice one thing: As what we need is emitting error for printing address if the address can not be access directly. 
So it would be better to emit the message through 'output_operand_lossage'
just before gcc_assert(TARGET_TOC).

Thanks a lot for your insightful comment!

>
>> +        {
>> +          output_operand_lossage ("invalid expression as operand");
>> +          return;
>> +        }
>> +
>>        output_addr_const (file, x);
>>        if (small_data_operand (x, GET_MODE (x)))
>>          fprintf (file, "@%s(%s)", SMALL_DATA_RELOC,
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>> b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>> new file mode 100644
>> index 000..6554a472a11
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>> @@ -0,0 +1,15 @@
>> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.
>> */
>> +/* { dg-excess-errors "pr96866-2.c" } */
>> +/* { dg-options "-fPIC -O2" } */
>
> Nit: If these two options are required, it would be good to have a comment
> explaining it a bit
> when it's not obvious.

Good suggestion, thanks!

>
>> +
>> +int x[2];
>> +
>> +int __attribute__ ((noipa))
>> +f1 (void)
>> +{
>> +  int n;
>> +  int *p = x;
>> +  *p++;
>> +  __asm__ volatile("ld %0, %a1" : "=r"(n) : "X"(p));
>> +  return n;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>> b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>> new file mode 100644
>> index 000..a5ec96f29dd
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>> @@ -0,0 +1,10 @@
>> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.
>> */
>> +/* { dg-excess-errors "pr96866-2.c" } */
>> +/* { dg-options "-fPIC -O2" } */
>
> Ditto.

Thanks!

BR,
Jeff(Jiufu) Guo

>
> BR,
> Kewen
>
>> +
>> +void
>> +f (void)
>> +{
>> +  extern int x;
>> +  __asm__ volatile("#%a0" ::"X"(&x));
>> +}
Re: [PATCH 8/13] rs6000, remove __builtin_vsx_vperm_* built-ins
Hi, on 2024/4/20 05:18, Carl Love wrote: > rs6000, remove __builtin_vsx_vperm_* built-ins > > The undocumented built-ins: > __builtin_vsx_vperm_16qi_uns, > __builtin_vsx_vperm_1ti, > __builtin_vsx_vperm_1ti_uns, > __builtin_vsx_vperm_2df, > __builtin_vsx_vperm_2di, > __builtin_vsx_vperm_2di_uns, > __builtin_vsx_vperm_4sf, > __builtin_vsx_vperm_4si, > __builtin_vsx_vperm_4si_uns > > are duplicats of the __builtin_altivec_* builtins that are used by > the overloaded vec_perm built-in that is documented in the PVIPR. > > gcc/ChangeLog: > * config/rs6000/rs6000-builtins.def (__builtin_vsx_vperm_16qi_uns, > __builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns, > __builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di, > __builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf, > __builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns): Remove > built-in definitions and comments. > > gcc/testsuite/ChangeLog: > * gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_vperm_16qi_uns, >__builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns, > __builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di, > __builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf, > __builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns): Remove > test cases. > --- > gcc/config/rs6000/rs6000-builtins.def | 33 --- > .../gcc.target/powerpc/vsx-builtin-3.c| 20 --- > 2 files changed, 53 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000-builtins.def > b/gcc/config/rs6000/rs6000-builtins.def > index 3c409d729ea..f33564d3d9c 100644 > --- a/gcc/config/rs6000/rs6000-builtins.def > +++ b/gcc/config/rs6000/rs6000-builtins.def > @@ -1529,39 +1529,6 @@ >const vf __builtin_vsx_uns_floato_v2di (vsll); > UNS_FLOATO_V2DI unsfloatov2di {} > > -; These are duplicates of __builtin_altivec_* counterparts, and are being > -; kept for backwards compatibility. The reason for their existence is > -; unclear. TODO: Consider deprecation/removal at some point. 
> - const vsc __builtin_vsx_vperm_16qi (vsc, vsc, vuc); > -VPERM_16QI_X altivec_vperm_v16qi {} > - > - const vuc __builtin_vsx_vperm_16qi_uns (vuc, vuc, vuc); > -VPERM_16QI_UNS_X altivec_vperm_v16qi_uns {} > - > - const vsq __builtin_vsx_vperm_1ti (vsq, vsq, vsc); > -VPERM_1TI_X altivec_vperm_v1ti {} > - > - const vsq __builtin_vsx_vperm_1ti_uns (vsq, vsq, vsc); > -VPERM_1TI_UNS_X altivec_vperm_v1ti_uns {} > - > - const vd __builtin_vsx_vperm_2df (vd, vd, vuc); > -VPERM_2DF_X altivec_vperm_v2df {} > - > - const vsll __builtin_vsx_vperm_2di (vsll, vsll, vuc); > -VPERM_2DI_X altivec_vperm_v2di {} > - > - const vull __builtin_vsx_vperm_2di_uns (vull, vull, vuc); > -VPERM_2DI_UNS_X altivec_vperm_v2di_uns {} > - > - const vf __builtin_vsx_vperm_4sf (vf, vf, vuc); > -VPERM_4SF_X altivec_vperm_v4sf {} > - > - const vsi __builtin_vsx_vperm_4si (vsi, vsi, vuc); > -VPERM_4SI_X altivec_vperm_v4si {} > - > - const vui __builtin_vsx_vperm_4si_uns (vui, vui, vuc); > -VPERM_4SI_UNS_X altivec_vperm_v4si_uns {} > - >const vss __builtin_vsx_vperm_8hi (vss, vss, vuc); > VPERM_8HI_X altivec_vperm_v8hi {} > > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c > b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c > index 01f35dad713..35ea31b2616 100644 > --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c > @@ -2,7 +2,6 @@ > /* { dg-skip-if "" { powerpc*-*-darwin* } } */ > /* { dg-require-effective-target powerpc_vsx_ok } */ > /* { dg-options "-O2 -mdejagnu-cpu=power7" } */ > -/* { dg-final { scan-assembler "vperm" } } */ > /* { dg-final { scan-assembler "xvrdpi" } } */ > /* { dg-final { scan-assembler "xvrdpic" } } */ > /* { dg-final { scan-assembler "xvrdpim" } } */ > @@ -56,25 +55,6 @@ extern __vector unsigned long long ull[][4]; > extern __vector __bool long bl[][4]; > #endif > > -int do_perm(void) > -{ > - int i = 0; > - > - si[i][0] = __builtin_vsx_vperm_4si (si[i][1], si[i][2], uc[i][3]); i++; > - ss[i][0] = 
__builtin_vsx_vperm_8hi (ss[i][1], ss[i][2], uc[i][3]); i++;
> -  sc[i][0] = __builtin_vsx_vperm_16qi (sc[i][1], sc[i][2], uc[i][3]); i++;
> -  f[i][0] = __builtin_vsx_vperm_4sf (f[i][1], f[i][2], uc[i][3]); i++;
> -  d[i][0] = __builtin_vsx_vperm_2df (d[i][1], d[i][2], uc[i][3]); i++;
> -
> -  si[i][0] = __builtin_vsx_vperm (si[i][1], si[i][2], uc[i][3]); i++;
> -  ss[i][0] = __builtin_vsx_vperm (ss[i][1], ss[i][2], uc[i][3]); i++;
> -  sc[i][0] = __builtin_vsx_vperm (sc[i][1], sc[i][2], uc[i][3]); i++;
> -  f[i][0] = __builtin_vsx_vperm (f[i][1], f[i][2], uc[i][3]); i++;
> -  d[i][0] = __builtin_vsx_vperm (d[i][1], d[i][2], uc[i][3]); i++;
> -
> -  return i;
> -}
> -

I prefer to just replace these __builtin_vsx_vperm calls with vec_perm.
OK with this tweak (also keep the vperm scan that is removed above), thanks!

BR,
Kewen

> int do_xxperm (void)
> {
>   int i
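For context on what the removed built-ins and vec_perm both compute, the byte permute can be modelled in portable C. This is a sketch under assumptions: `vperm_model` is a hypothetical name, bytes are numbered as in the ISA description of vperm (big-endian order), and it does not reflect any element-order adjustment a compiler may apply on little-endian targets.

```c
#include <stdint.h>

/* Byte-wise model of vperm: the low 5 bits of each selector byte
   index into the 32-byte concatenation of A (bytes 0..15) and
   B (bytes 16..31).  */
static void
vperm_model (const uint8_t a[16], const uint8_t b[16],
             const uint8_t sel[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    {
      unsigned idx = sel[i] & 0x1f;
      out[i] = idx < 16 ? a[idx] : b[idx - 16];
    }
}
```

Because the operation is defined purely on bytes, one instruction serves every element type, which is why the per-type __builtin_vsx_vperm_* variants were duplicates of the __builtin_altivec_* ones.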
Re: [PATCH 7/13] rs6000, remove the vec_xxsel built-ins, they are duplicates
Hi, on 2024/4/20 05:18, Carl Love wrote: > rs6000, remove the vec_xxsel built-ins, they are duplicates > > The following undocumented built-ins are covered by the existing overloaded > vec_sel built-in definitions. > > const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc); > same as vsc __builtin_vec_sel (vsc, vsc, vuc); (overloaded vec_sel) > > const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc); > same as vuc __builtin_vec_sel (vuc, vuc, vuc); (overloaded vec_sel) > > const vd __builtin_vsx_xxsel_2df (vd, vd, vd); > same as vd __builtin_vec_sel (vd, vd, vull); (overloaded vec_sel) > > const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll); > same as vsll __builtin_vec_sel (vsll, vsll, vsll); (overloaded vec_sel) > > const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull); > same as vull __builtin_vec_sel (vull, vull, vsll); (overloaded vec_sel) > > const vf __builtin_vsx_xxsel_4sf (vf, vf, vf); > same as vf __builtin_vec_sel (vf, vf, vsi) (overloaded vec_sel) > > const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi); > same as vsi __builtin_vec_sel (vsi, vsi, vbi); (overloaded vec_sel) > > const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui); > same as vui __builtin_vec_sel (vui, vui, vui); (overloaded vec_sel) > > const vss __builtin_vsx_xxsel_8hi (vss, vss, vss); > same as vss __builtin_vec_sel (vss, vss, vbs); (overloaded vec_sel) > > const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus); > same as vus __builtin_vec_sel (vus, vus, vus); (overloaded vec_sel) > > This patch removed the duplicate built-in definitions so users will only > use the documented vec_sel built-in. The __builtin_vsx_xxsel_[4si, 8hi, > 16qi, 4sf, 2df] tests are also removed. > > gcc/ChangeLog: > * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrglw_4si, Typo: __builtin_vsx_xxmrglw_4si, which doesn't belong to this patch. 
> __builtin_vsx_xxsel_16qi, __builtin_vsx_xxsel_16qi_uns, > __builtin_vsx_xxsel_2df, __builtin_vsx_xxsel_2di, > __builtin_vsx_xxsel_2di_uns, __builtin_vsx_xxsel_4sf, > __builtin_vsx_xxsel_4si, __builtin_vsx_xxsel_4si_uns, > __builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_8hi_uns): Remove > built-in definitions. > > gcc/testsuite/ChangeLog: > * gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xxsel_4si, > __builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_16qi, > __builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_2df): Remove test > cases for removed built-ins. > --- > gcc/config/rs6000/rs6000-builtins.def | 30 --- > .../gcc.target/powerpc/vsx-builtin-3.c| 26 > 2 files changed, 56 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000-builtins.def > b/gcc/config/rs6000/rs6000-builtins.def > index 46d2ae7b7cb..3c409d729ea 100644 > --- a/gcc/config/rs6000/rs6000-builtins.def > +++ b/gcc/config/rs6000/rs6000-builtins.def > @@ -1925,36 +1925,6 @@ >const vss __builtin_vsx_xxpermdi_8hi (vss, vss, const int<2>); > XXPERMDI_8HI vsx_xxpermdi_v8hi {} > > - const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc); > -XXSEL_16QI vector_select_v16qi {} > - > - const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc); > -XXSEL_16QI_UNS vector_select_v16qi_uns {} > - > - const vd __builtin_vsx_xxsel_2df (vd, vd, vd); > -XXSEL_2DF vector_select_v2df {} > - > - const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll); > -XXSEL_2DI vector_select_v2di {} > - > - const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull); > -XXSEL_2DI_UNS vector_select_v2di_uns {} > - > - const vf __builtin_vsx_xxsel_4sf (vf, vf, vf); > -XXSEL_4SF vector_select_v4sf {} > - > - const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi); > -XXSEL_4SI vector_select_v4si {} > - > - const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui); > -XXSEL_4SI_UNS vector_select_v4si_uns {} > - > - const vss __builtin_vsx_xxsel_8hi (vss, vss, vss); > -XXSEL_8HI vector_select_v8hi {} > - > - const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus); > 
-XXSEL_8HI_UNS vector_select_v8hi_uns {} > - >const vsc __builtin_vsx_xxsldwi_16qi (vsc, vsc, const int<2>); > XXSLDWI_16QI vsx_xxsldwi_v16qi {} > > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c > b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c > index ff875c55304..01f35dad713 100644 > --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c > @@ -2,7 +2,6 @@ > /* { dg-skip-if "" { powerpc*-*-darwin* } } */ > /* { dg-require-effective-target powerpc_vsx_ok } */ > /* { dg-options "-O2 -mdejagnu-cpu=power7" } */ > -/* { dg-final { scan-assembler "xxsel" } } */ > /* { dg-final { scan-assembler "vperm" } } */ > /* { dg-final { scan-assembler "xvrdpi" } } */ > /* { dg-final { scan-assembler "xvrdpic" } } */ > @@ -57,31 +56,6 @@ extern __vector unsigned long long ull[][4]; > extern __vector __bool long
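
Editorial note: the removed xxsel built-ins and the overloaded vec_sel share one bitwise rule: each result bit comes from the second source where the mask bit is 1, and from the first source where it is 0. The portable sketch below models that rule on four 32-bit lanes. The helper name sel4_u32 is hypothetical and is not a GCC or AltiVec interface; it only illustrates the semantics and says nothing about which instruction the compiler emits.

```c
#include <stdint.h>

/* Portable model of a bitwise select on four 32-bit lanes.
   For every bit position: take the bit from b where mask is 1,
   and from a where mask is 0.  sel4_u32 is a hypothetical helper
   used for illustration only.  */
static void
sel4_u32 (const uint32_t a[4], const uint32_t b[4],
	  const uint32_t mask[4], uint32_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = (a[i] & ~mask[i]) | (b[i] & mask[i]);
}
```

Because the rule is purely bitwise, the element type of the mask does not change the result, which is presumably why one overloaded built-in can cover all the typed variants listed above.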
Re: [PATCH 6/13] rs6000, add overloaded vec_sel with int128 arguments
Hi, on 2024/4/20 05:17, Carl Love wrote: > rs6000, add overloaded vec_sel with int128 arguments > > Extend the vec_sel built-in to take three signed/unsigned int128 arguments > and return a signed/unsigned int128 result. > > Extending the vec_sel built-in makes the existing buit-ins > __builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete. The > patch removes these built-ins. > > The patch adds documentation and test cases for the new overloaded vec_sel > built-ins. > > gcc/ChangeLog: > * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti, > __builtin_vsx_xxsel_1ti_uns): Remove built-in definitions. > * config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded > definitions. > * doc/extend.texi: Add documentation for new vec_sel arguments. > > gcc/testsuite/ChangeLog: > * gcc.target/powerpc/vec_sel_runnable-int128.c: New test file. > --- > gcc/config/rs6000/rs6000-builtins.def | 6 -- > gcc/config/rs6000/rs6000-overload.def | 4 + > gcc/doc/extend.texi | 14 > .../powerpc/vec-sel-runnable-i128.c | 84 +++ > 4 files changed, 102 insertions(+), 6 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c > > diff --git a/gcc/config/rs6000/rs6000-builtins.def > b/gcc/config/rs6000/rs6000-builtins.def > index d09e21a9151..46d2ae7b7cb 100644 > --- a/gcc/config/rs6000/rs6000-builtins.def > +++ b/gcc/config/rs6000/rs6000-builtins.def > @@ -1931,12 +1931,6 @@ >const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc); > XXSEL_16QI_UNS vector_select_v16qi_uns {} > > - const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq); > -XXSEL_1TI vector_select_v1ti {} > - > - const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq); > -XXSEL_1TI_UNS vector_select_v1ti_uns {} > - >const vd __builtin_vsx_xxsel_2df (vd, vd, vd); > XXSEL_2DF vector_select_v2df {} > > diff --git a/gcc/config/rs6000/rs6000-overload.def > b/gcc/config/rs6000/rs6000-overload.def > index 68501c05289..5912c9452f4 100644 > --- a/gcc/config/rs6000/rs6000-overload.def > +++ 
b/gcc/config/rs6000/rs6000-overload.def > @@ -3274,6 +3274,10 @@ > VSEL_2DF VSEL_2DF_B >vd __builtin_vec_sel (vd, vd, vull); > VSEL_2DF VSEL_2DF_U > + vsq __builtin_vec_sel (vsq, vsq, vsq); > +VSEL_1TI VSEL_1TI_S > + vuq __builtin_vec_sel (vuq, vuq, vuq); > +VSEL_1TI_UNS VSEL_1TI_U > ; The following variants are deprecated. >vsll __builtin_vec_sel (vsll, vsll, vsll); > VSEL_2DI_B VSEL_2DI_S > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > index 64a43b55e2d..86b8e536dbe 100644 > --- a/gcc/doc/extend.texi > +++ b/gcc/doc/extend.texi > @@ -23358,6 +23358,20 @@ The programmer is responsible for understanding the > endianness issues involved > with the first argument and the result. > @findex vec_replace_unaligned > > +Vector select > + > +@smallexample > +vector signed __int128 vec_sel (vector signed __int128, > + vector signed __int128, vector signed __int128); > +vector unsigned __int128 vec_sel (vector unsigned __int128, > + vector unsigned __int128, vector unsigned __int128); > +@end smallexample > + > +The overloaded built-in @code{vec_sel} with vector signed/unsigned __int128 > +arguments and returns a vector selecting bits from the two source vectors > based > +on the values of the third input vector. This built-in is an extension of > the > +@code{vec_sel} built-in documented in the PVIPR. > + Why did you place this in a section for ISA 3.1 (Power10)? It doesn't really require this support. The used instance VSEL_1TI and VSEL_1TI_UNS are placed in altivec stanza, so it looks that we should put it under the section "PowerPC AltiVec Built-in Functions on ISA 2.05". And since it's an extension of @code{vec_sel} documented in the PVIPR, I prefer to just mention it's "an extension of the @code{vec_sel} built-in documented in the PVIPR" and omitting the description to avoid possible slightly different wording. 
> Vector Shift Left Double Bit Immediate > @smallexample > @exdent vector signed char vec_sldb (vector signed char, vector signed char, > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c > b/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c > new file mode 100644 > index 000..58eb383e8c3 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c > @@ -0,0 +1,84 @@ > +/* { dg-do run { target power10_hw }} */ > +/* { dg-require-effective-target int128 } */ > +/* { dg-require-effective-target power10_hw } */ As mentioned above, this doesn't require power10, you can specify vmx_hw. (btw removing { target power10_hw } on dg-do run line). > +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */ s/-mdejagnu-cpu=power10/-maltivec/ s/-save-temps// > + > + > +#include > + > + > +#define DEBUG 0 > + > +#if DEBUG > +#include > +void
Re: [PATCH] report message for operator %a on unaddressible exp
Hi,

Thanks for your helpful comments!

Segher Boessenkool writes:

> Hi!
>
> On Mon, May 13, 2024 at 10:57:12AM +0800, Jiufu Guo wrote:
>> For PR96866, when gcc print asm code for modifier "%a" which requires
>> an address operand,
>
> It requires a *memory* operand, and it outputs its address. This is a
> generic modifier btw (not rs6000).

Oh, yes. It outputs the operand's address. I would update the wording to: "which requires an addressable operand".

>> while the operand is with the constraint "X" which
>> allow non-address form. An error message would be reported to indicate
>> the invalid asm operands.
>
> "non-address form"? Every mem has an address.
>
> But 'X' is not memory. What is it at all? Why do we use that when you
> *have to* have mem here?

"X" allows anything. This is the reason why the code is *invalid*. Other constraints ("r"/"m") would be better than "X" for "%a".

> The code you add that tests for address_operand looks wrong. I would
> expect it to test the operand is memory, instead :-)

I understand your concern. There is a subtlety, though: before invoking print_operand_address/output_address, the original operand (which would be 'mem') is stripped to its address. So 'address_operand' is what gets tested for print_operand_address in targets.

I also wonder whether "address_operand" is really needed, because under the condition:

```
else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST || GET_CODE (x) == LABEL_REF)
  {
```

'x' is already known: it can only be SYMBOL_REF, LABEL_REF or CONST. I will update the patch for this.

Thanks for your comments.

BR,
Jeff(Jiufu) Guo

>
>
> Segher
RE: [PATCH] vect: generate suitable convert insn for int -> int, float -> float and int <-> float.
Do you have any advice? BRs, Lin -Original Message- From: Hu, Lin1 Sent: Wednesday, May 8, 2024 9:38 AM To: gcc-patches@gcc.gnu.org Cc: Liu, Hongtao ; ubiz...@gmail.com Subject: [PATCH] vect: generate suitable convert insn for int -> int, float -> float and int <-> float. Hi, all This patch aims to optimize __builtin_convertvector. We want the function can generate more efficient insn for some situations. Like v2si -> v2di. The patch has been bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? BRs, Lin gcc/ChangeLog: PR target/107432 * tree-vect-generic.cc (expand_vector_conversion): Support convert for int -> int, float -> float and int <-> float. (expand_vector_conversion_no_vec_pack): Check if can convert int <-> int, float <-> float and int <-> float, directly. Support indirect convert, when direct optab is not supported. gcc/testsuite/ChangeLog: PR target/107432 * gcc.target/i386/pr107432-1.c: New test. * gcc.target/i386/pr107432-2.c: Ditto. * gcc.target/i386/pr107432-3.c: Ditto. * gcc.target/i386/pr107432-4.c: Ditto. * gcc.target/i386/pr107432-5.c: Ditto. * gcc.target/i386/pr107432-6.c: Ditto. * gcc.target/i386/pr107432-7.c: Ditto. 
--- gcc/testsuite/gcc.target/i386/pr107432-1.c | 234 + gcc/testsuite/gcc.target/i386/pr107432-2.c | 105 + gcc/testsuite/gcc.target/i386/pr107432-3.c | 55 + gcc/testsuite/gcc.target/i386/pr107432-4.c | 56 + gcc/testsuite/gcc.target/i386/pr107432-5.c | 72 +++ gcc/testsuite/gcc.target/i386/pr107432-6.c | 139 gcc/testsuite/gcc.target/i386/pr107432-7.c | 156 ++ gcc/tree-vect-generic.cc | 107 +- 8 files changed, 918 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-5.c create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-6.c create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-7.c diff --git a/gcc/testsuite/gcc.target/i386/pr107432-1.c b/gcc/testsuite/gcc.target/i386/pr107432-1.c new file mode 100644 index 000..a4f37447eb4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr107432-1.c @@ -0,0 +1,234 @@ +/* { dg-do compile } */ +/* { dg-options "-march=x86-64 -mavx512bw -mavx512vl -O3" } */ +/* { dg-final { scan-assembler-times "vpmovqd" 6 } } */ +/* { dg-final { scan-assembler-times "vpmovqw" 6 } } */ +/* { dg-final { scan-assembler-times "vpmovqb" 6 } } */ +/* { dg-final { scan-assembler-times "vpmovdw" 6 { target { ia32 } } } +} */ +/* { dg-final { scan-assembler-times "vpmovdw" 8 { target { ! ia32 } } +} } */ +/* { dg-final { scan-assembler-times "vpmovdb" 6 { target { ia32 } } } +} */ +/* { dg-final { scan-assembler-times "vpmovdb" 8 { target { ! 
ia32 } } +} } */ +/* { dg-final { scan-assembler-times "vpmovwb" 8 } } */ + +#include + +typedef short __v2hi __attribute__ ((__vector_size__ (4))); typedef +char __v2qi __attribute__ ((__vector_size__ (2))); typedef char __v4qi +__attribute__ ((__vector_size__ (4))); typedef char __v8qi +__attribute__ ((__vector_size__ (8))); + +typedef unsigned short __v2hu __attribute__ ((__vector_size__ (4))); +typedef unsigned short __v4hu __attribute__ ((__vector_size__ (8))); +typedef unsigned char __v2qu __attribute__ ((__vector_size__ (2))); +typedef unsigned char __v4qu __attribute__ ((__vector_size__ (4))); +typedef unsigned char __v8qu __attribute__ ((__vector_size__ (8))); +typedef unsigned int __v2su __attribute__ ((__vector_size__ (8))); + +__v2si mm_cvtepi64_epi32_builtin_convertvector(__m128i a) { + return __builtin_convertvector((__v2di)a, __v2si); } + +__m128imm256_cvtepi64_epi32_builtin_convertvector(__m256i a) +{ + return (__m128i)__builtin_convertvector((__v4di)a, __v4si); } + +__m256imm512_cvtepi64_epi32_builtin_convertvector(__m512i a) +{ + return (__m256i)__builtin_convertvector((__v8di)a, __v8si); } + +__v2hi mm_cvtepi64_epi16_builtin_convertvector(__m128i a) +{ + return __builtin_convertvector((__v2di)a, __v2hi); } + +__v4hi mm256_cvtepi64_epi16_builtin_convertvector(__m256i a) +{ + return __builtin_convertvector((__v4di)a, __v4hi); } + +__m128imm512_cvtepi64_epi16_builtin_convertvector(__m512i a) +{ + return (__m128i)__builtin_convertvector((__v8di)a, __v8hi); } + +__v2qi mm_cvtepi64_epi8_builtin_convertvector(__m128i a) +{ + return __builtin_convertvector((__v2di)a, __v2qi); } + +__v4qi mm256_cvtepi64_epi8_builtin_convertvector(__m256i a) +{ + return __builtin_convertvector((__v4di)a, __v4qi); } + +__v8qi mm512_cvtepi64_epi8_builtin_convertvector(__m512i a) +{ + return __builtin_convertvector((__v8di)a, __v8qi); }
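
Editorial note: the tests above exercise __builtin_convertvector through the x86 intrinsic types. The sketch below shows the same built-in on generic GCC vector types, so it does not depend on AVX-512. The typedef and function names are made up for illustration. The narrowing case relies on GCC's modulo behaviour for out-of-range signed conversions, which is an assumption about the compiler rather than something the patch states.

```c
/* Generic vector types built with the GCC vector_size attribute.
   The type names are hypothetical and chosen for this sketch.  */
typedef long long v2di_t __attribute__ ((vector_size (16)));
typedef int v2si_t __attribute__ ((vector_size (8)));
typedef int v4si_t __attribute__ ((vector_size (16)));
typedef float v4sf_t __attribute__ ((vector_size (16)));

/* int -> int: narrow each 64-bit element to 32 bits.  */
static v2si_t
narrow_v2di_to_v2si (v2di_t a)
{
  return __builtin_convertvector (a, v2si_t);
}

/* int -> float: convert each 32-bit integer element to float.  */
static v4sf_t
int_to_float_v4 (v4si_t a)
{
  return __builtin_convertvector (a, v4sf_t);
}
```

Both functions convert element by element, as a C cast would on each lane. Whether the first one becomes a single vpmovqd depends on the target options and on the lowering this patch proposes.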
Re: [PATCHv2] Value range: Add range op for __builtin_isfinite
Hi Aldy, Thanks for your review comments. 在 2024/5/13 19:18, Aldy Hernandez 写道: > On Thu, May 9, 2024 at 10:05 AM Mikael Morin wrote: >> >> Hello, >> >> Le 07/05/2024 à 04:37, HAO CHEN GUI a écrit : >>> Hi, >>>The former patch adds isfinite optab for __builtin_isfinite. >>> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html >>> >>>Thus the builtin might not be folded at front end. The range op for >>> isfinite is needed for value range analysis. This patch adds them. >>> >>>Compared to last version, this version fixes a typo. >>> >>>Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no >>> regressions. Is it OK for the trunk? >>> >>> Thanks >>> Gui Haochen >>> >>> ChangeLog >>> Value Range: Add range op for builtin isfinite >>> >>> The former patch adds optab for builtin isfinite. Thus builtin isfinite >>> might >>> not be folded at front end. So the range op for isfinite is needed for >>> value >>> range analysis. This patch adds range op for builtin isfinite. >>> >>> gcc/ >>> * gimple-range-op.cc (class cfn_isfinite): New. >>> (op_cfn_finite): New variables. >>> (gimple_range_op_handler::maybe_builtin_call): Handle >>> CFN_BUILT_IN_ISFINITE. >>> >>> gcc/testsuite/ >>> * gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c: New test. 
>>> >>> patch.diff >>> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc >>> index 9de130b4022..99c511728d3 100644 >>> --- a/gcc/gimple-range-op.cc >>> +++ b/gcc/gimple-range-op.cc >>> @@ -1192,6 +1192,56 @@ public: >>> } >>> } op_cfn_isinf; >>> >>> +//Implement range operator for CFN_BUILT_IN_ISFINITE >>> +class cfn_isfinite : public range_operator >>> +{ >>> +public: >>> + using range_operator::fold_range; >>> + using range_operator::op1_range; >>> + virtual bool fold_range (irange , tree type, const frange , >>> +const irange &, relation_trio) const override >>> + { >>> +if (op1.undefined_p ()) >>> + return false; >>> + >>> +if (op1.known_isfinite ()) >>> + { >>> + r.set_nonzero (type); >>> + return true; >>> + } >>> + >>> +if (op1.known_isnan () >>> + || op1.known_isinf ()) >>> + { >>> + r.set_zero (type); >>> + return true; >>> + } >>> + >>> +return false; >> I think the canonical API behaviour sets R to varying and returns true >> instead of just returning false if nothing is known about the range. > > Correct. If we know it's varying, we just set varying and return > true. Returning false is usually reserved for "I have no idea". > However, every caller of fold_range() should know to ignore a return > of false, so you should be safe. So it's better to set varying here and return true? > >> >> I'm not sure whether it makes any difference; Aldy can probably tell. >> But if the type is bool, varying is [0,1] which is better than unknown >> range. > > Also, I see you're setting zero/nonzero. Is the return type known to > be boolean, because if so, we usually prefer to one of: The return type is int. For __builtin_isfinite, the result is nonzero when the float is a finite number, 0 otherwise. > > r = range_true () > r = range_false () > r = range_true_and_false (); > > It doesn't matter either way, but it's probably best to use these as > they force boolean_type_node automatically. 
> > I don't have a problem with this patch, but I would prefer the > floating point savvy people to review this, as there are no members of > the ranger team that are floating point experts :). > > Also, I see you mention in your original post that this patch was > needed as a follow-up to this one: > > https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html > > I don't see the above patch in the source tree currently: Sorry, I may not express it clear. I sent a series of patches for review. Some patches depend on others. The patch I mentioned is a patch also under review. Here is the list of the series of patches. Some of them are generic, and others are rs6000 specific. [PATCH] Value Range: Add range op for builtin isinf https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648303.html [patch, rs6000] Implement optab_isinf for SFmode, DFmode and TFmode [PR97786] https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648304.html [Patch] Builtin: Fold builtin_isinf on IBM long double to builtin_isinf on double [PR97786] https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648433.html [PATCH] Optab: add isfinite_optab for __builtin_isfinite https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html [PATCHv2] Value range: Add range op for __builtin_isfinite https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650857.html [PATCH-2, rs6000] Implement optab_isfinite for SFmode, DFmode and TFmode [PR97786] https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649346.html [PATCH-3] Builtin: Fold builtin_isfinite on IBM long double to builtin_isfinite on double [PR97786]
Re: [PATCH 5/13] rs6000, remove duplicated built-ins of vecmergl and vec_mergeh
Hi, on 2024/4/20 05:17, Carl Love wrote: > rs6000, remove duplicated built-ins of vecmergl and vec_mergeh > > The following undocumented built-ins are same as existing documented > overloaded builtins. > > const vf __builtin_vsx_xxmrghw (vf, vf); > same as vf __builtin_vec_mergeh (vf, vf); (overloaded vec_mergeh) > > const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi); > same as vsi __builtin_vec_mergeh (vsi, vsi); (overloaded vec_mergeh) > > const vf __builtin_vsx_xxmrglw (vf, vf); > same as vf __builtin_vec_mergel (vf, vf); (overloaded vec_mergel) > > const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi); > same as vsi __builtin_vec_mergel (vsi, vsi); (overloaded vec_mergel) > > This patch removes the duplicate built-in definitions so only the > documented built-ins will be available for use. The case statements in > rs6000_gimple_fold_builtin are removed as they are no longer needed. The > patch removes the now unused define_expands for vsx_xxmrghw_ and > vsx_xxmrglw_. Ok for trunk, thanks! BR, Kewen > > gcc/ChangeLog: > * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrghw, > __builtin_vsx_xxmrghw_4si, __builtin_vsx_xxmrglw, > __builtin_vsx_xxmrglw_4si, __builtin_vsx_xxsel_16qi): Remove > built-in definition. > * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): > remove case entries RS6000_BIF_XXMRGLW_4SI, > RS6000_BIF_XXMRGLW_4SF, RS6000_BIF_XXMRGHW_4SI, > RS6000_BIF_XXMRGHW_4SF. > * config/rs6000/vsx.md (vsx_xxmrghw_, vsx_xxmrglw_): > Remove unused define_expands. > --- > gcc/config/rs6000/rs6000-builtin.cc | 4 --- > gcc/config/rs6000/rs6000-builtins.def | 12 > gcc/config/rs6000/vsx.md | 41 --- > 3 files changed, 57 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000-builtin.cc > b/gcc/config/rs6000/rs6000-builtin.cc > index ac9f16fe51a..f83d65b06ef 100644 > --- a/gcc/config/rs6000/rs6000-builtin.cc > +++ b/gcc/config/rs6000/rs6000-builtin.cc > @@ -2097,20 +2097,16 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi) > /* vec_mergel (integrals). 
*/ > case RS6000_BIF_VMRGLH: > case RS6000_BIF_VMRGLW: > -case RS6000_BIF_XXMRGLW_4SI: > case RS6000_BIF_VMRGLB: > case RS6000_BIF_VEC_MERGEL_V2DI: > -case RS6000_BIF_XXMRGLW_4SF: > case RS6000_BIF_VEC_MERGEL_V2DF: >fold_mergehl_helper (gsi, stmt, 1); >return true; > /* vec_mergeh (integrals). */ > case RS6000_BIF_VMRGHH: > case RS6000_BIF_VMRGHW: > -case RS6000_BIF_XXMRGHW_4SI: > case RS6000_BIF_VMRGHB: > case RS6000_BIF_VEC_MERGEH_V2DI: > -case RS6000_BIF_XXMRGHW_4SF: > case RS6000_BIF_VEC_MERGEH_V2DF: >fold_mergehl_helper (gsi, stmt, 0); >return true; > diff --git a/gcc/config/rs6000/rs6000-builtins.def > b/gcc/config/rs6000/rs6000-builtins.def > index 5b7237a2327..d09e21a9151 100644 > --- a/gcc/config/rs6000/rs6000-builtins.def > +++ b/gcc/config/rs6000/rs6000-builtins.def > @@ -1904,18 +1904,6 @@ >const signed int __builtin_vsx_xvtsqrtsp_fg (vf); > XVTSQRTSP_FG vsx_tsqrtv4sf2_fg {} > > - const vf __builtin_vsx_xxmrghw (vf, vf); > -XXMRGHW_4SF vsx_xxmrghw_v4sf {} > - > - const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi); > -XXMRGHW_4SI vsx_xxmrghw_v4si {} > - > - const vf __builtin_vsx_xxmrglw (vf, vf); > -XXMRGLW_4SF vsx_xxmrglw_v4sf {} > - > - const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi); > -XXMRGLW_4SI vsx_xxmrglw_v4si {} > - >const vsc __builtin_vsx_xxpermdi_16qi (vsc, vsc, const int<2>); > XXPERMDI_16QI vsx_xxpermdi_v16qi {} > > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index 3d39ae7995f..26560ecc38a 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -4810,47 +4810,6 @@ > } >[(set_attr "type" "vecperm")]) > > -;; V4SF/V4SI interleave > -(define_expand "vsx_xxmrghw_" > - [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") > -(vec_select:VSX_W > - (vec_concat: > - (match_operand:VSX_W 1 "vsx_register_operand" "wa") > - (match_operand:VSX_W 2 "vsx_register_operand" "wa")) > - (parallel [(const_int 0) (const_int 4) > - (const_int 1) (const_int 5)])))] > - "VECTOR_MEM_VSX_P (mode)" > -{ > - rtx (*fun) 
(rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_ > - : gen_altivec_vmrglw_direct_; > - if (!BYTES_BIG_ENDIAN) > -std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > - DONE; > -} > - [(set_attr "type" "vecperm")]) > - > -(define_expand "vsx_xxmrglw_" > - [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") > - (vec_select:VSX_W > - (vec_concat: > - (match_operand:VSX_W 1 "vsx_register_operand" "wa") > - (match_operand:VSX_W 2 "vsx_register_operand" "wa")) > - (parallel [(const_int 2)
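
Editorial note: the removed vsx_xxmrghw expander selects elements 0, 4, 1, 5 from the concatenation of its two inputs. The sketch below models that selection in portable C. The index list for the low variant is cut off in the quoted diff, so {2, 6, 3, 7} is an assumption based on the usual vec_mergel behaviour. All names are hypothetical, and the model ignores the endian-dependent operand swap the expander performs.

```c
#include <stdint.h>

/* Pick four elements from the 8-element concatenation {a, b}
   according to idx.  Hypothetical helper for illustration.  */
static void
select_from_concat (const uint32_t a[4], const uint32_t b[4],
		    const int idx[4], uint32_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = idx[i] < 4 ? a[idx[i]] : b[idx[i] - 4];
}

/* Merge high: indices 0 4 1 5, as in the quoted define_expand.  */
static void
merge_high_w (const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
  static const int idx[4] = { 0, 4, 1, 5 };
  select_from_concat (a, b, idx, out);
}

/* Merge low: indices 2 6 3 7 (assumed, the source is truncated).  */
static void
merge_low_w (const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
  static const int idx[4] = { 2, 6, 3, 7 };
  select_from_concat (a, b, idx, out);
}
```

With a = {a0, a1, a2, a3} and b = {b0, b1, b2, b3}, the high merge yields {a0, b0, a1, b1} and the low merge {a2, b2, a3, b3}, the same interleaving the documented vec_mergeh and vec_mergel overloads describe.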
RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
> That's just a matter of matching the overflow as an additional case no? > i.e. you can add an overload for unsigned_integer_sat_add matching the > IFN_ ADD_OVERFLOW and using the realpart and imagpart helpers. > I think that would be better as it avoid visiting all the statements twice > but also > extends the matching to some __builtin_add_overflow uses and should be fairly > simple. Thanks Tamar, got the point here, will have a try with overload unsigned_integer_sat_add for that. > Yeah, I think that's better than iterating over the statements twice. It > also fits better > In the existing code. Ack, will follow the existing code. Pan -Original Message- From: Tamar Christina Sent: Monday, May 13, 2024 11:03 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; Liu, Hongtao Subject: RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int > > Thanks Tamer for comments. > > > I think OPTIMIZE_FOR_BOTH is better here, since this is a win also when > optimizing for size. > > Sure thing, let me update it in v5. > > > Hmm why do you iterate independently over the statements? The block below > already visits > > Every statement doesn't it? > > Because it will hit .ADD_OVERFLOW first, then it will never hit SAT_ADD as the > shape changed, or shall we put it to the previous pass ? > That's just a matter of matching the overflow as an additional case no? i.e. you can add an overload for unsigned_integer_sat_add matching the IFN_ ADD_OVERFLOW and using the realpart and imagpart helpers. I think that would be better as it avoid visiting all the statements twice but also extends the matching to some __builtin_add_overflow uses and should be fairly simple. 
> > The root of your match is a BIT_IOR_EXPR expression, so I think you just > > need to > change the entry below to: > > > > case BIT_IOR_EXPR: > > match_saturation_arith (, stmt, m_cfg_changed_p); > > /* fall-through */ > > case BIT_XOR_EXPR: > > match_uaddc_usubc (, stmt, code); > > break; > > There are other shapes (not covered in this patch) of SAT_ADD like below > branch > version, the IOR should be one of the ROOT. Thus doesn't > add case here. Then, shall we take case for each shape here ? Both works for > me. > Yeah, I think that's better than iterating over the statements twice. It also fits better In the existing code. Tamar. > #define SAT_ADD_U_1(T) \ > T sat_add_u_1_##T(T x, T y) \ > { \ > return (T)(x + y) >= x ? (x + y) : -1; \ > } > > SAT_ADD_U_1(uint32_t) > > Pan > > > -Original Message- > From: Tamar Christina > Sent: Monday, May 13, 2024 5:10 PM > To: Li, Pan2 ; gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > Liu, Hongtao > Subject: RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned > scalar int > > Hi Pan, > > > -Original Message- > > From: pan2...@intel.com > > Sent: Monday, May 6, 2024 3:48 PM > > To: gcc-patches@gcc.gnu.org > > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina > > ; richard.guent...@gmail.com; > > hongtao@intel.com; Pan Li > > Subject: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned > scalar > > int > > > > From: Pan Li > > > > This patch would like to add the middle-end presentation for the > > saturation add. Aka set the result of add to the max when overflow. > > It will take the pattern similar as below. > > > > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x)) > > > > Take uint8_t as example, we will have: > > > > * SAT_ADD (1, 254) => 255. > > * SAT_ADD (1, 255) => 255. > > * SAT_ADD (2, 255) => 255. > > * SAT_ADD (255, 255) => 255. 
> > > > Given below example for the unsigned scalar integer uint64_t: > > > > uint64_t sat_add_u64 (uint64_t x, uint64_t y) > > { > > return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x)); > > } > > > > Before this patch: > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y) > > { > > long unsigned int _1; > > _Bool _2; > > long unsigned int _3; > > long unsigned int _4; > > uint64_t _7; > > long unsigned int _10; > > __complex__ long unsigned int _11; > > > > ;; basic block 2, loop depth 0 > > ;;pred: ENTRY > > _11 = .ADD_OVERFLOW (x_5(D), y_6(D)); > > _1 = REALPART_EXPR <_11>; > > _10 = IMAGPART_EXPR <_11>; > > _2 = _10 != 0; > > _3 = (long unsigned int) _2; > > _4 = -_3; > > _7 = _1 | _4; > > return _7; > > ;;succ: EXIT > > > > } > > > > After this patch: > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y) > > { > > uint64_t _7; > > > > ;; basic block 2, loop depth 0 > > ;;pred: ENTRY > > _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call] > > return _7; > > ;;succ: EXIT > > } > > > > We perform the tranform during widen_mult because that the sub-expr of > > SAT_ADD will
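
Editorial note: the thread discusses two source shapes for an unsigned saturating add, the branchless form from the patch description and the branch form given as SAT_ADD_U_1. The sketch below writes both for uint8_t so the quoted examples (such as SAT_ADD (1, 255) => 255) can be checked exhaustively. The function names are chosen for this illustration; whether either shape is recognized as .SAT_ADD depends on the patch under review.

```c
#include <stdint.h>

/* Branchless shape: (x + y) | -((x + y) < x).
   On overflow the comparison is 1, its negation is all ones,
   and the OR forces the result to the maximum value.  */
static uint8_t
sat_add_u8_branchless (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return (uint8_t) (sum | (uint8_t) -(uint8_t) (sum < x));
}

/* Branch shape: keep the sum unless it wrapped around.  */
static uint8_t
sat_add_u8_branch (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return sum >= x ? sum : (uint8_t) -1;
}
```

Both functions rely on the same fact: for unsigned operands the truncated sum is smaller than x exactly when the addition overflowed.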
[committed] RISC-V: Fix format issue for trailing operator [NFC]
From: Pan Li

This patch fixes the trailing-operator format issues below.

=== ERROR type #1: trailing operator (4 error(s)) ===
gcc/config/riscv/riscv-vector-builtins.cc:4641:39: if ((exts & RVV_REQUIRE_ELEN_FP_16) &&
gcc/config/riscv/riscv-vector-builtins.cc:4651:39: if ((exts & RVV_REQUIRE_ELEN_FP_32) &&
gcc/config/riscv/riscv-vector-builtins.cc:4661:39: if ((exts & RVV_REQUIRE_ELEN_FP_64) &&
gcc/config/riscv/riscv-vector-builtins.cc:4670:36: if ((exts & RVV_REQUIRE_ELEN_64) &&

This patch passes ./contrib/check_GNU_style.sh, and I double-checked that the original patch has no other format issues.

Committed as format change.

gcc/ChangeLog:

	* config/riscv/riscv-vector-builtins.cc
	(validate_instance_type_required_extensions): Remove the
	operator from the trailing and put it to new line.

Signed-off-by: Pan Li
---
 gcc/config/riscv/riscv-vector-builtins.cc | 16
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
index 3fdb4400d70..c08d87a2680 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -4638,8 +4638,8 @@ validate_instance_type_required_extensions (const rvv_type_info type,
 {
   uint64_t exts = type.required_extensions;
 
-  if ((exts & RVV_REQUIRE_ELEN_FP_16) &&
-    !TARGET_VECTOR_ELEN_FP_16_P (riscv_vector_elen_flags))
+  if ((exts & RVV_REQUIRE_ELEN_FP_16)
+    && !TARGET_VECTOR_ELEN_FP_16_P (riscv_vector_elen_flags))
     {
       error_at (EXPR_LOCATION (exp),
 		"built-in function %qE requires the "
@@ -4648,8 +4648,8 @@ validate_instance_type_required_extensions (const rvv_type_info type,
       return false;
     }
 
-  if ((exts & RVV_REQUIRE_ELEN_FP_32) &&
-    !TARGET_VECTOR_ELEN_FP_32_P (riscv_vector_elen_flags))
+  if ((exts & RVV_REQUIRE_ELEN_FP_32)
+    && !TARGET_VECTOR_ELEN_FP_32_P (riscv_vector_elen_flags))
     {
       error_at (EXPR_LOCATION (exp),
 		"built-in function %qE requires the "
@@ -4658,8 +4658,8 @@ validate_instance_type_required_extensions (const rvv_type_info type,
       return false;
     }
 
-  if ((exts & RVV_REQUIRE_ELEN_FP_64) &&
-    !TARGET_VECTOR_ELEN_FP_64_P (riscv_vector_elen_flags))
+  if ((exts & RVV_REQUIRE_ELEN_FP_64)
+    && !TARGET_VECTOR_ELEN_FP_64_P (riscv_vector_elen_flags))
     {
       error_at (EXPR_LOCATION (exp),
 		"built-in function %qE requires the zve64d or v ISA extension",
@@ -4667,8 +4667,8 @@ validate_instance_type_required_extensions (const rvv_type_info type,
       return false;
     }
 
-  if ((exts & RVV_REQUIRE_ELEN_64) &&
-    !TARGET_VECTOR_ELEN_64_P (riscv_vector_elen_flags))
+  if ((exts & RVV_REQUIRE_ELEN_64)
+    && !TARGET_VECTOR_ELEN_64_P (riscv_vector_elen_flags))
     {
       error_at (EXPR_LOCATION (exp),
 		"built-in function %qE requires the "
-- 
2.34.1
RE: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 scalar
Ack, thanks Jeff, will fix it ASAP.

Pan

-----Original Message-----
From: Jeff Law
Sent: Tuesday, May 14, 2024 2:10 AM
To: Li, Pan2; Kito Cheng; juzhe.zh...@rivai.ai
Cc: gcc-patches
Subject: Re: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 scalar

On 5/13/24 9:00 AM, Li, Pan2 wrote:
> Committed, thanks Juzhe and Kito. Let's wait for a while before backport to
> 14.

Could you fix the formatting nits caught by the CI linter?

=== ERROR type #1: trailing operator (4 error(s)) ===
gcc/config/riscv/riscv-vector-builtins.cc:4641:39: if ((exts & RVV_REQUIRE_ELEN_FP_16) &&
gcc/config/riscv/riscv-vector-builtins.cc:4651:39: if ((exts & RVV_REQUIRE_ELEN_FP_32) &&
gcc/config/riscv/riscv-vector-builtins.cc:4661:39: if ((exts & RVV_REQUIRE_ELEN_FP_64) &&
gcc/config/riscv/riscv-vector-builtins.cc:4670:36: if ((exts & RVV_REQUIRE_ELEN_64) &&

The "&&" needs to come down to the next line, indented like

  if ((exts & RVV_REQUIRE_ELEN_FP_16)
      && !TARGET_VECTOR_...)

I.e., the "&&" indents just inside the first open paren. It looks like all the conditions in validate_instance_type_required_extensions need to be fixed in a similar manner.

Given this is NFC, just post it for the archiver. No need to wait on review.

Jeff
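As a standalone illustration of the rule Jeff describes (break before the operator, with "&&" starting the continuation line one column inside the open paren), here is a minimal sketch. The macro and function names are hypothetical stand-ins, not the identifiers from riscv-vector-builtins.cc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the RVV_REQUIRE_ELEN_* bits.  */
#define REQUIRE_FP_16 (1u << 0)
#define REQUIRE_FP_32 (1u << 1)

/* Return false when a bit required in EXTS is missing from AVAILABLE.  */
static bool
check_ext (uint64_t exts, uint64_t available)
{
  /* GNU style: the line breaks before "&&", and the operator is
     indented one column inside the first open paren of the "if".  */
  if ((exts & REQUIRE_FP_16)
      && !(available & REQUIRE_FP_16))
    return false;

  if ((exts & REQUIRE_FP_32)
      && !(available & REQUIRE_FP_32))
    return false;

  return true;
}
```

Writing the condition with a trailing "&&" instead compiles to the same code; the change is purely about layout, which is why the fix is marked NFC.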
[PATCH] aarch64: Fold vget_low_* intrinsics to BIT_FIELD_REF [PR102171]
This patch folds vget_low_* intrinsics to BIT_FIELD_REF to open up more optimization opportunities for gimple optimizers.

While we are here, we also remove the vget_low_* definitions from arm_neon.h and use the new intrinsics framework.

	PR target/102171

gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.cc (AARCH64_SIMD_VGET_LOW_BUILTINS): New macro to create definitions for all vget_low intrinsics.
	(VGET_LOW_BUILTIN): Likewise.
	(enum aarch64_builtins): Add vget_low function codes.
	(aarch64_general_fold_builtin): Fold vget_low calls.
	* config/aarch64/aarch64-simd-builtins.def: Delete vget_low builtins.
	* config/aarch64/aarch64-simd.md (aarch64_get_low): Delete.
	(aarch64_vget_lo_halfv8bf): Likewise.
	* config/aarch64/arm_neon.h (__attribute__): Delete.
	(vget_low_f16): Likewise.
	(vget_low_f32): Likewise.
	(vget_low_f64): Likewise.
	(vget_low_p8): Likewise.
	(vget_low_p16): Likewise.
	(vget_low_p64): Likewise.
	(vget_low_s8): Likewise.
	(vget_low_s16): Likewise.
	(vget_low_s32): Likewise.
	(vget_low_s64): Likewise.
	(vget_low_u8): Likewise.
	(vget_low_u16): Likewise.
	(vget_low_u32): Likewise.
	(vget_low_u64): Likewise.
	(vget_low_bf16): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/pr113573.c: Replace __builtin_aarch64_get_lowv8hi with vget_low_s16.
	* gcc.target/aarch64/vget_low_2.c: New test.
	* gcc.target/aarch64/vget_low_2_be.c: New test.
Signed-off-by: Pengxuan Zheng --- gcc/config/aarch64/aarch64-builtins.cc| 60 ++ gcc/config/aarch64/aarch64-simd-builtins.def | 5 +- gcc/config/aarch64/aarch64-simd.md| 23 +--- gcc/config/aarch64/arm_neon.h | 105 -- gcc/testsuite/gcc.target/aarch64/pr113573.c | 2 +- gcc/testsuite/gcc.target/aarch64/vget_low_2.c | 30 + .../gcc.target/aarch64/vget_low_2_be.c| 31 ++ 7 files changed, 124 insertions(+), 132 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_low_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_low_2_be.c diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc index 75d21de1401..4afe7c86ae3 100644 --- a/gcc/config/aarch64/aarch64-builtins.cc +++ b/gcc/config/aarch64/aarch64-builtins.cc @@ -658,6 +658,23 @@ static aarch64_simd_builtin_datum aarch64_simd_builtin_data[] = { VREINTERPRET_BUILTINS \ VREINTERPRETQ_BUILTINS +#define AARCH64_SIMD_VGET_LOW_BUILTINS \ + VGET_LOW_BUILTIN(f16) \ + VGET_LOW_BUILTIN(f32) \ + VGET_LOW_BUILTIN(f64) \ + VGET_LOW_BUILTIN(p8) \ + VGET_LOW_BUILTIN(p16) \ + VGET_LOW_BUILTIN(p64) \ + VGET_LOW_BUILTIN(s8) \ + VGET_LOW_BUILTIN(s16) \ + VGET_LOW_BUILTIN(s32) \ + VGET_LOW_BUILTIN(s64) \ + VGET_LOW_BUILTIN(u8) \ + VGET_LOW_BUILTIN(u16) \ + VGET_LOW_BUILTIN(u32) \ + VGET_LOW_BUILTIN(u64) \ + VGET_LOW_BUILTIN(bf16) + typedef struct { const char *name; @@ -697,6 +714,9 @@ typedef struct #define VREINTERPRET_BUILTIN(A, B, L) \ AARCH64_SIMD_BUILTIN_VREINTERPRET##L##_##A##_##B, +#define VGET_LOW_BUILTIN(A) \ + AARCH64_SIMD_BUILTIN_VGET_LOW_##A, + #undef VAR1 #define VAR1(T, N, MAP, FLAG, A) \ AARCH64_SIMD_BUILTIN_##T##_##N##A, @@ -732,6 +752,7 @@ enum aarch64_builtins AARCH64_CRC32_BUILTIN_MAX, /* SIMD intrinsic builtins. */ AARCH64_SIMD_VREINTERPRET_BUILTINS + AARCH64_SIMD_VGET_LOW_BUILTINS /* ARMv8.3-A Pointer Authentication Builtins. 
*/ AARCH64_PAUTH_BUILTIN_AUTIA1716, AARCH64_PAUTH_BUILTIN_PACIA1716, @@ -823,8 +844,37 @@ static aarch64_fcmla_laneq_builtin_datum aarch64_fcmla_lane_builtin_data[] = { && SIMD_INTR_QUAL(A) == SIMD_INTR_QUAL(B) \ }, +#undef VGET_LOW_BUILTIN +#define VGET_LOW_BUILTIN(A) \ + {"vget_low_" #A, \ + AARCH64_SIMD_BUILTIN_VGET_LOW_##A, \ + 2, \ + { SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \ + { SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \ + FLAG_AUTO_FP, \ + false \ + }, + +#define AARCH64_SIMD_VGET_LOW_BUILTINS \ + VGET_LOW_BUILTIN(f16) \ + VGET_LOW_BUILTIN(f32) \ + VGET_LOW_BUILTIN(f64) \ + VGET_LOW_BUILTIN(p8) \ + VGET_LOW_BUILTIN(p16) \ + VGET_LOW_BUILTIN(p64) \ + VGET_LOW_BUILTIN(s8) \ + VGET_LOW_BUILTIN(s16) \ + VGET_LOW_BUILTIN(s32) \ + VGET_LOW_BUILTIN(s64) \ + VGET_LOW_BUILTIN(u8) \ + VGET_LOW_BUILTIN(u16) \ + VGET_LOW_BUILTIN(u32) \ + VGET_LOW_BUILTIN(u64) \ + VGET_LOW_BUILTIN(bf16) + static const aarch64_simd_intrinsic_datum aarch64_simd_intrinsic_data[] = { AARCH64_SIMD_VREINTERPRET_BUILTINS + AARCH64_SIMD_VGET_LOW_BUILTINS }; @@ -3216,6 +3266,9 @@ aarch64_fold_builtin_lane_check (tree arg0, tree arg1, tree arg2) #define VREINTERPRET_BUILTIN(A, B, L) \ case AARCH64_SIMD_BUILTIN_VREINTERPRET##L##_##A##_##B: +#undef VGET_LOW_BUILTIN +#define VGET_LOW_BUILTIN(A) \ + case
Re: [PATCH v2 2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]
On 5/13/24 13:28, Jeff Law wrote:
On 5/13/24 12:49 PM, Vineet Gupta wrote:

If the constant used for stack offset can be expressed as sum of two S12 values, the constant need not be materialized (in a reg) and instead the two S12 bits can be added to instructions involved with frame pointer. This avoids burning a register and more importantly can often get down to be 2 insn vs. 3.

The prev patches to generally avoid LUI based const materialization didn't fix this PR and need this directed fix in function prologue/epilogue expansion.

This fix doesn't move the needle for SPEC, at all, but it is still a win considering gcc generates one insn fewer than llvm for the test ;-)

 gcc-13.1 release   |    gcc 230823      |                    |
                    |   g6619b3d4c15c    |     This patch     |    clang/llvm
--------------------------------------------------------------------------------
 li   t0,-4096      | li   t0,-4096      | addi sp,sp,-2048   | addi sp,sp,-2048
 addi t0,t0,2016    | addi t0,t0,2032    | add  sp,sp,-16     | addi sp,sp,-32
 li   a4,4096       | add  sp,sp,t0      | add  a5,sp,a0      | add  a1,sp,16
 add  sp,sp,t0      | addi a5,sp,-2032   | sb   zero,0(a5)    | add  a0,a0,a1
 li   a5,-4096      | add  a0,a5,a0      | addi sp,sp,2032    | sb   zero,0(a0)
 addi a4,a4,-2032   | li   t0, 4096      | addi sp,sp,32      | addi sp,sp,2032
 add  a4,a4,a5      | sb   zero,2032(a0) | ret                | addi sp,sp,48
 addi a5,sp,16      | addi t0,t0,-2032   |                    | ret
 add  a5,a4,a5      | add  sp,sp,t0      |
 add  a0,a5,a0      | ret                |
 li   t0,4096       |
 sd   a5,8(sp)      |
 sb   zero,2032(a0) |
 addi t0,t0,-2016   |
 add  sp,sp,t0      |
 ret                |

gcc/ChangeLog:

	PR target/105733
	* config/riscv/riscv.h: New macros for with aligned offsets.
	* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New function to split a sum of two s12 values into constituents.
	(riscv_expand_prologue): Handle offset being sum of two S12.
	(riscv_expand_epilogue): Ditto.
	* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/pr105733.c: New Test.
	* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not expect LUI 4096.
	* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.

@@ -8074,14 +8111,26 @@ riscv_expand_epilogue (int style)
     }
   else
     {
-      if (!SMALL_OPERAND (adjust_offset.to_constant ()))
+      HOST_WIDE_INT adj_off_value = adjust_offset.to_constant ();
+      if (SMALL_OPERAND (adj_off_value))
+        {
+          adjust = GEN_INT (adj_off_value);
+        }
+      else if (SUM_OF_TWO_S12_ALGN (adj_off_value))
+        {
+          HOST_WIDE_INT base, off;
+          riscv_split_sum_of_two_s12 (adj_off_value, &base, &off);
+          insn = gen_add3_insn (stack_pointer_rtx, hard_frame_pointer_rtx,
+                                GEN_INT (base));
+          RTX_FRAME_RELATED_P (insn) = 1;
+          adjust = GEN_INT (off);
+        }

So this was the hunk that we identified internally as causing problems with libgomp's testsuite. We never fully chased it down as this hunk didn't seem terribly important performance wise -- we just set it aside. The thing is it looked basically correct to me. So the failure was certainly unexpected, but it was consistent.

So I think the question is whether or not the CI system runs the libgomp testsuite, particularly in the rv64 linux configuration. If it does, and it passes, then we're good. I'm still finding my way around the configuration, so I don't know if the CI system Edwin & Patrick have built tests libgomp or not.

I poked around the .sum files in pre/postcommit and we do run tests like:

PASS: c-c++-common/gomp/affinity-2.c (test for errors, line 45)

I'm not familiar with libgomp so I don't know if that's the same libgomp tests you're referring to.

Patrick

If it isn't run, then we'll need to do a run to test that. I'm set up here to do that if needed. I can just drop this version into our internal tree, trigger an internal CI run and see if it complains :-) If it does complain, then we know where to start investigations.

Jeff
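To make the "sum of two S12 values" idea in this message concrete, here is a minimal sketch of how such a split could work. It is not the body of riscv_split_sum_of_two_s12, which is not shown in this thread; the helper name split_two_s12 and the constants are hypothetical, and the 2032 cap is an assumption taken from the 2032/32 epilogue pair in the "This patch" column, which suggests the first step is kept 16-byte aligned.

```c
#include <stdbool.h>

#define S12_MIN (-2048L)
#define S12_MAX 2047L
/* Assumption: largest positive first step that keeps sp 16-byte aligned.  */
#define S12_MAX_ALIGNED 2032L

/* True when V fits a signed 12-bit immediate, as used by addi.  */
static bool
fits_s12 (long v)
{
  return v >= S12_MIN && v <= S12_MAX;
}

/* Hypothetical helper: split VALUE into *BASE + *OFF, both S12 values.
   Returns false when one addi already suffices or when two are not
   enough, so a caller would fall back to another strategy.  */
static bool
split_two_s12 (long value, long *base, long *off)
{
  if (fits_s12 (value))
    return false;

  long b = value < 0 ? S12_MIN : S12_MAX_ALIGNED;
  long o = value - b;
  if (!fits_s12 (o))
    return false;

  *base = b;
  *off = o;
  return true;
}
```

With these assumptions, an adjustment of -2064 splits into -2048 and -16, and +2064 splits into 2032 and 32, matching the addi pairs quoted for the patched compiler.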
Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading
On Tue, May 14, 2024 at 01:38:49AM +0200, Andrew Pinski wrote: > On Mon, May 13, 2024, 11:41 PM Kees Cook wrote: > > But it makes no sense to warn about: > > > > void sparx5_set (int * ptr, struct nums * sg, int index) > > { > >if (index >= 4) > > warn (); > >*ptr = 0; > >*val = sg->vals[index]; > >if (index >= 4) > > warn (); > >*ptr = *val; > > } > > > > Because at "*val = sg->vals[index];" the actual value range tracking for > > index is _still_ [INT_MIN,INT_MAX]. (Only within the "then" side of the > > "if" statements is the range tracking [4,INT_MAX].) > > > > However, in the case where jump threading has split the execution flow > > and produced a copy of "*val = sg->vals[index];" where the value range > > tracking for "index" is now [4,INT_MAX], is the warning valid. But it > > is only for that instance. Reporting it for effectively both (there is > > only 1 source line for the array indexing) is misleading because there > > is nothing the user can do about it -- the compiler created the copy and > > then noticed it had a range it could apply to that array index. > > > > "there is nothing the user can do about it" is very much false. They could > change warn call into a noreturn function call instead. (In the case of > the Linux kernel panic). There are things the user can do to fix the > warning and even get better code generation out of the compilers. This isn't about warn() not being noreturn. The warn() could be any function call; the jump threading still happens. GCC is warning about a compiler-constructed situation that cannot be reliably fixed on the source side (GCC emitting the warning is highly unstable in these cases), since the condition is not *always* true for the given line of code. If it is not useful to warn for "array[index]" being out of range when "index" is always [INT_MIN,INT_MAX], then it is not useful to warn when "index" MAY be [INT_MIN,INT_MAX] for a given line of code. -Kees -- Kees Cook
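For readers who want to experiment with the case under discussion, here is a self-contained variant of the quoted function. The thread never declares val or warn, so the definitions below are assumptions added only to make the fragment compile; in a real reproduction warn would more likely be an external declaration so the compiler cannot see its body.

```c
struct nums { int vals[4]; };

/* Assumptions: "val" and "warn" are not declared anywhere in the
   thread.  These definitions exist only so the fragment builds.  */
static int warn_calls;
static int val_storage;
static int *val = &val_storage;

static void
warn (void)
{
  warn_calls++;
}

void
sparx5_set (int *ptr, struct nums *sg, int index)
{
  if (index >= 4)
    warn ();
  *ptr = 0;
  /* The one source line with the array access.  After jump threading
     the compiler may hold two copies of it, one of them on the path
     where index >= 4 is known.  */
  *val = sg->vals[index];
  if (index >= 4)
    warn ();
  *ptr = *val;
}
```

Building this with something like gcc -O2 -Warray-bounds is one way to look at the behaviour; whether a diagnostic appears depends on the compiler version and on whether jump threading actually duplicates the block.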
Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading
On Mon, May 13, 2024, 11:41 PM Kees Cook wrote: > On Mon, May 13, 2024 at 02:46:32PM -0600, Jeff Law wrote: > > > > > > On 5/13/24 1:48 PM, Qing Zhao wrote: > > > -Warray-bounds is an important option to enable linux kernal to keep > > > the array out-of-bound errors out of the source tree. > > > > > > However, due to the false positive warnings reported in PR109071 > > > (-Warray-bounds false positive warnings due to code duplication from > > > jump threading), -Warray-bounds=1 cannot be added on by default. > > > > > > Although it's impossible to elinimate all the false positive warnings > > > from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds > > > documentation says "always out of bounds"), we should minimize the > > > false positive warnings in -Warray-bounds=1. > > > > > > The root reason for the false positive warnings reported in PR109071 > is: > > > > > > When the thread jump optimization tries to reduce the # of branches > > > inside the routine, sometimes it needs to duplicate the code and > > > split into two conditional pathes. for example: > > > > > > The original code: > > > > > > void sparx5_set (int * ptr, struct nums * sg, int index) > > > { > > >if (index >= 4) > > > warn (); > > >*ptr = 0; > > >*val = sg->vals[index]; > > >if (index >= 4) > > > warn (); > > >*ptr = *val; > > > > > >return; > > > } > > > > > > With the thread jump, the above becomes: > > > > > > void sparx5_set (int * ptr, struct nums * sg, int index) > > > { > > >if (index >= 4) > > > { > > >warn (); > > >*ptr = 0;// Code duplications since "warn" does > return; > > >*val = sg->vals[index]; // same this line. > > > // In this path, since it's under the > condition > > > // "index >= 4", the compiler knows the > value > > > // of "index" is larger then 4, therefore > the > > > // out-of-bound warning. 
> > >warn (); > > > } > > >else > > > { > > >*ptr = 0; > > >*val = sg->vals[index]; > > > } > > >*ptr = *val; > > >return; > > > } > > > > > > We can see, after the thread jump optimization, the # of branches > inside > > > the routine "sparx5_set" is reduced from 2 to 1, however, due to the > > > code duplication (which is needed for the correctness of the code), we > > > got a false positive out-of-bound warning. > > > > > > In order to eliminate such false positive out-of-bound warning, > > > > > > A. Add one more flag for GIMPLE: is_splitted. > > > B. During the thread jump optimization, when the basic blocks are > > > duplicated, mark all the STMTs inside the original and duplicated > > > basic blocks as "is_splitted"; > > > C. Inside the array bound checker, add the following new heuristic: > > > > > > If > > > 1. the stmt is duplicated and splitted into two conditional paths; > > > + 2. the warning level < 2; > > > + 3. the current block is not dominating the exit block > > > Then not report the warning. > > > > > > The false positive warnings are moved from -Warray-bounds=1 to > > > -Warray-bounds=2 now. > > > > > > Bootstrapped and regression tested on both x86 and aarch64. adjusted > > > -Warray-bounds-61.c due to the false positive warnings. > > > > > > Let me know if you have any comments and suggestions. > > This sounds horribly wrong. In the code above, the warning is correct. > > It's not sensible from a user's perspective. > > If this doesn't warn: > > void sparx5_set (int * ptr, struct nums * sg, int index) > { >*ptr = 0; >*val = sg->vals[index]; >*ptr = *val; > } > > ... because the value range tracking of "index" spans [INT_MIN,INT_MAX], > and warnings based on the value range are silenced if they haven't been > clamped at all. (Otherwise warnings would be produced everywhere: only > when a limited set of values is known is it useful to produce a warning.) 
> > > But it makes no sense to warn about: > > void sparx5_set (int * ptr, struct nums * sg, int index) > { >if (index >= 4) > warn (); >*ptr = 0; >*val = sg->vals[index]; >if (index >= 4) > warn (); >*ptr = *val; > } > > Because at "*val = sg->vals[index];" the actual value range tracking for > index is _still_ [INT_MIN,INT_MAX]. (Only within the "then" side of the > "if" statements is the range tracking [4,INT_MAX].) > > However, in the case where jump threading has split the execution flow > and produced a copy of "*val = sg->vals[index];" where the value range > tracking for "index" is now [4,INT_MAX], is the warning valid. But it > is only for that instance. Reporting it for effectively both (there is > only 1 source line for the array indexing) is misleading because there > is nothing the user can do about it -- the compiler created the copy and > then noticed it had a range it could apply to that array index. > "there is
Re: [PATCH v2 1/3] RISC-V: movmem for RISCV with V extension
On 12/19/23 10:28 PM, Jeff Law wrote: On 12/19/23 02:53, Sergei Lewis wrote: gcc/ChangeLog * config/riscv/riscv.md (movmem): Use riscv_vector::expand_block_move, if and only if we know the entire operation can be performed using one vector load followed by one vector store gcc/testsuite/ChangeLog PR target/112109 * gcc.target/riscv/rvv/base/movmem-1.c: New test So this needs to be regression tested. Given that it only affects RVV, I would suggest testing rv64gcv or rv32gcv. +(define_expand "movmem" + [(parallel [(set (match_operand:BLK 0 "general_operand") + (match_operand:BLK 1 "general_operand")) + (use (match_operand:P 2 "const_int_operand")) + (use (match_operand:SI 3 "const_int_operand"))])] + "TARGET_VECTOR" +{ + if ((INTVAL (operands[2]) >= TARGET_MIN_VLEN/8) + && (INTVAL (operands[2]) <= TARGET_MIN_VLEN) + && riscv_vector::expand_block_move (operands[0], operands[1], + operands[2])) + DONE; + else + FAIL; +}) Just a formatting nit. A space on each side of the '/' operator above. So I've fixed the formatting nit and tested on rv64gc and rv32gcv. I hadn't planned to push it, but muscle memory kicked in and 1/3 has been pushed. I'll be looking at 2/3 and 3/3 tomorrow (or possibly a bit tonight to take advantage of overnight CI runs). jeff
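The size check in the quoted expander can be modelled in plain C as below. This is only a sketch: the function name is hypothetical and min_vlen_bits stands in for TARGET_MIN_VLEN. The lower bound is one vector register in bytes; the upper bound, numerically equal to the VLEN in bits, presumably corresponds to the bytes held by a group of eight registers, though the thread does not spell that out.

```c
#include <stdbool.h>

/* Hypothetical model of the movmem size check: accept only lengths the
   expander is willing to handle with one vector load and one store.  */
static bool
movmem_fits_one_vector_op (long length_bytes, long min_vlen_bits)
{
  return length_bytes >= min_vlen_bits / 8
	 && length_bytes <= min_vlen_bits;
}
```

For a minimum VLEN of 128 bits this accepts copies of 16 to 128 bytes and rejects everything else, in which case the expander FAILs.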
Re: Follow up #1 (was Re: [PATCH v2 1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265])
On 5/13/24 15:47, Jeff Law wrote: >> On 5/13/24 11:49, Vineet Gupta wrote: >>> 500.perlbench_r-0 | 1,214,534,029,025 | 1,212,887,959,387 | >>> 500.perlbench_r-1 |740,383,419,739 | 739,280,308,163 | >>> 500.perlbench_r-2 |692,074,638,817 | 691,118,734,547 | >>> 502.gcc_r-0 |190,820,141,435 | 190,857,065,988 | >>> 502.gcc_r-1 |225,747,660,839 | 225,809,444,357 | <- -0.02% >>> 502.gcc_r-2 |220,370,089,641 | 220,406,367,876 | <- -0.03% >>> 502.gcc_r-3 |179,111,460,458 | 179,135,609,723 | <- -0.02% >>> 502.gcc_r-4 |219,301,546,340 | 219,320,416,956 | <- -0.01% >>> 503.bwaves_r-0|278,733,324,691 | 278,733,323,575 | <- -0.01% >>> 503.bwaves_r-1|442,397,521,282 | 442,397,519,616 | >>> 503.bwaves_r-2|344,112,218,206 | 344,112,216,760 | >>> 503.bwaves_r-3|417,561,469,153 | 417,561,467,597 | >>> 505.mcf_r |669,319,257,525 | 669,318,763,084 | >>> 507.cactuBSSN_r | 2,852,767,394,456 | 2,564,736,063,742 | <+ 10.10% >> The small gcc regression seems like a tooling issue of some sort. >> Looking at the topblocks, the insn sequences are exactly the same, only >> the counts differ and its not obvious why. >> Here's for gcc_r-1. >> >> >> > Block 0 @ 0x170ca, 12 insns, 87854493 times, 0.47%: >> >> 000170ca : >> 170ca: 7179 add sp,sp,-48 >> 170cc: ec26 sd s1,24(sp) >> 170ce: e84a sd s2,16(sp) >> 170d0: e44e sd s3,8(sp) >> 170d2: f406 sd ra,40(sp) >> 170d4: f022 sd s0,32(sp) >> 170d6: 84aa mv s1,a0 >> 170d8: 03200913 li s2,50 >> 170dc: 03d00993 li s3,61 >> 170e0: 8526 mv a0,s1 >> 170e2: 001cd097 auipc ra,0x1cd >> 170e6: bac080e7 jalr -1108(ra) # 1e3c8e >> >> >> > Block 1 @ 0x706d0a, 3 insns, 274713936 times, 0.37%: >> > Block 2 @ 0x1e3c8e, 9 insns, 88507109 times, 0.35%: >> ... 
>> >> < Block 0 @ 0x170ca, 12 insns, 87869602 times, 0.47%: >> < Block 1 @ 0x706d42, 3 insns, 274608893 times, 0.36%: >> < Block 2 @ 0x1e3c94, 9 insns, 88526354 times, 0.35%: >> >> >> FWIW, Greg internally has been looking at some of this and found some >> issues in the bbv tooling, but I wish all of this was shared/upstream >> (QEMU bbv plugin) for people to compare notes and not discover/fix the >> same issues over and again. > Yea, we all meant to coordinate on those plugins. The one we've got had > some problems with hash collisions and when there's a hash collision it > just produces total junk data. I chased a few of these down and fixed > them about a year ago. > > The other thing is qemu will split up blocks based on its internal > notion of a translation page. So if you're looking at block level data > you'll stumble over that as well. This aspect is the most troublesome > problem I'm aware of right now. And these two are exactly what Greg fixed, among others :-) -Vineet
Re: Follow up #1 (was Re: [PATCH v2 1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265])
On 5/13/24 3:13 PM, Vineet Gupta wrote: On 5/13/24 11:49, Vineet Gupta wrote: 500.perlbench_r-0 | 1,214,534,029,025 | 1,212,887,959,387 | 500.perlbench_r-1 |740,383,419,739 | 739,280,308,163 | 500.perlbench_r-2 |692,074,638,817 | 691,118,734,547 | 502.gcc_r-0 |190,820,141,435 | 190,857,065,988 | 502.gcc_r-1 |225,747,660,839 | 225,809,444,357 | <- -0.02% 502.gcc_r-2 |220,370,089,641 | 220,406,367,876 | <- -0.03% 502.gcc_r-3 |179,111,460,458 | 179,135,609,723 | <- -0.02% 502.gcc_r-4 |219,301,546,340 | 219,320,416,956 | <- -0.01% 503.bwaves_r-0|278,733,324,691 | 278,733,323,575 | <- -0.01% 503.bwaves_r-1|442,397,521,282 | 442,397,519,616 | 503.bwaves_r-2|344,112,218,206 | 344,112,216,760 | 503.bwaves_r-3|417,561,469,153 | 417,561,467,597 | 505.mcf_r |669,319,257,525 | 669,318,763,084 | 507.cactuBSSN_r | 2,852,767,394,456 | 2,564,736,063,742 | <+ 10.10% The small gcc regression seems like a tooling issue of some sort. Looking at the topblocks, the insn sequences are exactly the same, only the counts differ and its not obvious why. Here's for gcc_r-1. > Block 0 @ 0x170ca, 12 insns, 87854493 times, 0.47%: 000170ca : 170ca: 7179 add sp,sp,-48 170cc: ec26 sd s1,24(sp) 170ce: e84a sd s2,16(sp) 170d0: e44e sd s3,8(sp) 170d2: f406 sd ra,40(sp) 170d4: f022 sd s0,32(sp) 170d6: 84aa mv s1,a0 170d8: 03200913 li s2,50 170dc: 03d00993 li s3,61 170e0: 8526 mv a0,s1 170e2: 001cd097 auipc ra,0x1cd 170e6: bac080e7 jalr -1108(ra) # 1e3c8e > Block 1 @ 0x706d0a, 3 insns, 274713936 times, 0.37%: > Block 2 @ 0x1e3c8e, 9 insns, 88507109 times, 0.35%: ... < Block 0 @ 0x170ca, 12 insns, 87869602 times, 0.47%: < Block 1 @ 0x706d42, 3 insns, 274608893 times, 0.36%: < Block 2 @ 0x1e3c94, 9 insns, 88526354 times, 0.35%: FWIW, Greg internally has been looking at some of this and found some issues in the bbv tooling, but I wish all of this was shared/upstream (QEMU bbv plugin) for people to compare notes and not discover/fix the same issues over and again. 
Yea, we all meant to coordinate on those plugins. The one we've got had some problems with hash collisions and when there's a hash collision it just produces total junk data. I chased a few of these down and fixed them about a year ago. The other thing is qemu will split up blocks based on its internal notion of a translation page. So if you're looking at block level data you'll stumble over that as well. This aspect is the most troublesome problem I'm aware of right now. Jeff
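The percentages in the table are relative changes in dynamic instruction count. A minimal sketch of that arithmetic, with a hypothetical function name, is:

```c
/* Relative reduction in dynamic instruction count, in percent.
   Positive means the patched build executed fewer instructions.  */
static double
icount_improvement_pct (double before, double after)
{
  return (before - after) / before * 100.0;
}
```

Applied to the 507.cactuBSSN_r row (2,852,767,394,456 before, 2,564,736,063,742 after) this gives roughly 10.1, in line with the figure quoted; the 502.gcc_r rows come out as small negative values, which is the regression being attributed to tooling.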
Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading
On Mon, May 13, 2024 at 02:46:32PM -0600, Jeff Law wrote: > > > On 5/13/24 1:48 PM, Qing Zhao wrote: > > -Warray-bounds is an important option to enable linux kernal to keep > > the array out-of-bound errors out of the source tree. > > > > However, due to the false positive warnings reported in PR109071 > > (-Warray-bounds false positive warnings due to code duplication from > > jump threading), -Warray-bounds=1 cannot be added on by default. > > > > Although it's impossible to elinimate all the false positive warnings > > from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds > > documentation says "always out of bounds"), we should minimize the > > false positive warnings in -Warray-bounds=1. > > > > The root reason for the false positive warnings reported in PR109071 is: > > > > When the thread jump optimization tries to reduce the # of branches > > inside the routine, sometimes it needs to duplicate the code and > > split into two conditional pathes. for example: > > > > The original code: > > > > void sparx5_set (int * ptr, struct nums * sg, int index) > > { > >if (index >= 4) > > warn (); > >*ptr = 0; > >*val = sg->vals[index]; > >if (index >= 4) > > warn (); > >*ptr = *val; > > > >return; > > } > > > > With the thread jump, the above becomes: > > > > void sparx5_set (int * ptr, struct nums * sg, int index) > > { > >if (index >= 4) > > { > >warn (); > >*ptr = 0;// Code duplications since "warn" does return; > >*val = sg->vals[index]; // same this line. > > // In this path, since it's under the condition > > // "index >= 4", the compiler knows the value > > // of "index" is larger then 4, therefore the > > // out-of-bound warning. 
> >warn (); > > } > >else > > { > >*ptr = 0; > >*val = sg->vals[index]; > > } > >*ptr = *val; > >return; > > } > > > > We can see, after the thread jump optimization, the # of branches inside > > the routine "sparx5_set" is reduced from 2 to 1, however, due to the > > code duplication (which is needed for the correctness of the code), we > > got a false positive out-of-bound warning. > > > > In order to eliminate such false positive out-of-bound warning, > > > > A. Add one more flag for GIMPLE: is_splitted. > > B. During the thread jump optimization, when the basic blocks are > > duplicated, mark all the STMTs inside the original and duplicated > > basic blocks as "is_splitted"; > > C. Inside the array bound checker, add the following new heuristic: > > > > If > > 1. the stmt is duplicated and splitted into two conditional paths; > > + 2. the warning level < 2; > > + 3. the current block is not dominating the exit block > > Then not report the warning. > > > > The false positive warnings are moved from -Warray-bounds=1 to > > -Warray-bounds=2 now. > > > > Bootstrapped and regression tested on both x86 and aarch64. adjusted > > -Warray-bounds-61.c due to the false positive warnings. > > > > Let me know if you have any comments and suggestions. > This sounds horribly wrong. In the code above, the warning is correct. It's not sensible from a user's perspective. If this doesn't warn: void sparx5_set (int * ptr, struct nums * sg, int index) { *ptr = 0; *val = sg->vals[index]; *ptr = *val; } ... because the value range tracking of "index" spans [INT_MIN,INT_MAX], and warnings based on the value range are silenced if they haven't been clamped at all. (Otherwise warnings would be produced everywhere: only when a limited set of values is known is it useful to produce a warning.) 
But it makes no sense to warn about: void sparx5_set (int * ptr, struct nums * sg, int index) { if (index >= 4) warn (); *ptr = 0; *val = sg->vals[index]; if (index >= 4) warn (); *ptr = *val; } Because at "*val = sg->vals[index];" the actual value range tracking for index is _still_ [INT_MIN,INT_MAX]. (Only within the "then" side of the "if" statements is the range tracking [4,INT_MAX].) However, in the case where jump threading has split the execution flow and produced a copy of "*val = sg->vals[index];" where the value range tracking for "index" is now [4,INT_MAX], is the warning valid. But it is only for that instance. Reporting it for effectively both (there is only 1 source line for the array indexing) is misleading because there is nothing the user can do about it -- the compiler created the copy and then noticed it had a range it could apply to that array index. This situation makes -Warray-bounds unusable for the Linux kernel (we cannot have false positives says BDFL), but we'd *really* like to have it enabled since it usually finds real bugs. But these false positives can't be fixed on our end. :( So, moving them to -Warray-bounds=2 makes sense as that's the level
[PATCH] RISC-V: add option -m(no-)autovec-segment
LGTM juzhe.zh...@rivai.ai
[pushed] wwwdocs: cxx-dr-status: Replace by
The validator warns about as deprecated; use instead. Pushed. Gerald --- htdocs/projects/cxx-dr-status.html | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/htdocs/projects/cxx-dr-status.html b/htdocs/projects/cxx-dr-status.html index c70cdf21..e29d2407 100644 --- a/htdocs/projects/cxx-dr-status.html +++ b/htdocs/projects/cxx-dr-status.html @@ -19929,7 +19929,7 @@ https://wg21.link/cwg2842;>2842 open - Preferring an initializer_list over a single value + Preferring an initializer_list over a single value - @@ -20062,7 +20062,7 @@ https://wg21.link/cwg2861;>2861 review - dynamic_cast on bad pointer value + dynamic_cast on bad pointer value ? @@ -20097,7 +20097,7 @@ https://wg21.link/cwg2866;>2866 open - Observing the effects of [[no_unique_address] wwwdocs:] + Observing the effects of [[no_unique_address] wwwdocs:] - @@ -20118,7 +20118,7 @@ https://wg21.link/cwg2869;>2869 open - this in local classes + this in local classes - @@ -20167,7 +20167,7 @@ https://wg21.link/cwg2876;>2876 open - Disambiguation of T x = delete("text") + Disambiguation of T x = delete("text") - @@ -20188,7 +20188,7 @@ https://wg21.link/cwg2879;>2879 open - Undesired outcomes with const_cast + Undesired outcomes with const_cast - -- 2.45.0
Re: Re: [PATCH v1 2/3] RISC-V: Implement vectorizable early exit with vcond_mask_len
>> Seems a bit odd on first sight. If all we want to do is to
>> select between two masks why do we need a large Pmode mode?

Since we are lowering

  final mask = vcond_mask_len (mask, 1s, 0s, len, bias)

into:

  vid.v v1
  vcmp v2
  vmsltu.vx v2, v1, len, TUMU

and len is in Pmode, we only allow lowering vcond_mask_len when a vector mode for Pmode exists.

>> So that's basically a mask-move with length? Can't this be done
>> differently? If not, please describe, maybe this is already
>> the shortest way.

We are implementing:

  final mask = mask[i] && i < len ? 1 : 0

It is a mask move with length, but with TUMU. I believe the current approach is the optimal way.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2024-05-14 05:14
To: pan2.li; gcc-patches
CC: rdapp.gcc; juzhe.zhong; kito.cheng; richard.guenther; Tamar.Christina; richard.sandiford
Subject: Re: [PATCH v1 2/3] RISC-V: Implement vectorizable early exit with vcond_mask_len

Hi Pan,

thanks for working on this. In general the patch looks reasonable to me but I'd rather have some more comments about the high-level idea. E.g. cbranch is implemented like aarch64 by xor'ing the bitmasks and comparing the result against zero (so we branch based on mask equality).

> +;; vcond_mask_len

High-level description here instead please.

> +(define_insn_and_split "vcond_mask_len_"
> + [(set (match_operand:VB 0 "register_operand")
> +(unspec: VB [
> + (match_operand:VB 1 "register_operand")
> + (match_operand:VB 2 "const_1_operand")

I guess it works like that because operand[2] is just implicitly used anyway but shouldn't that rather be an all_ones_operand?

> + && riscv_vector::get_vector_mode (Pmode, GET_MODE_NUNITS
> (mode)).exists ()"

Seems a bit odd on first sight. If all we want to do is to select between two masks why do we need a large Pmode mode?

> +rtx ops[] = {operands[0], operands[1], operands[1], cmp, reg,
> operands[4]};

So that's basically a mask-move with length? Can't this be done differently?
If not, please describe, maybe this is already the shortest way. Regards Robin
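As a reading aid, here is a minimal scalar sketch of the operation described in the reply above, final mask = mask[i] && i < len ? 1 : 0.  The function name mask_and_len and the bool-array representation are assumptions made for illustration; this is not the RTL expansion from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scalar reference model (hypothetical helper, not GCC code):
   final_mask[i] is set only if mask[i] is set and lane i lies below len.  */
void
mask_and_len (const bool *mask, size_t nunits, size_t len, bool *final_mask)
{
  assert (mask != NULL && final_mask != NULL);
  for (size_t i = 0; i < nunits; i++)
    final_mask[i] = mask[i] && i < len;
}
```

With mask = {1, 1, 0, 1} and len = 2 the model yields {1, 1, 0, 0}: lane 3 is dropped because it lies at or beyond len.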
Re: [PATCH] RISC-V: Do not allow v0 as dest when merging [PR115068].
Hi, Robin.

I saw vwadd/vwsub.wx have same issue.  Could you change them and add test too ?

Thanks.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2024-05-14 04:15
To: gcc-patches
CC: rdapp.gcc; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
Subject: [PATCH] RISC-V: Do not allow v0 as dest when merging [PR115068].

Hi,

this patch splits the vfw...wf pattern so we do not emit e.g.
vfwadd.wf v0,v8,fa5,v0.t anymore.

Regtested on rv64gcv_zvfh.

Regards
 Robin

gcc/ChangeLog:

	PR target/115068

	* config/riscv/vector.md: Split vfw.wf pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/pr115068-run.c: New test.
	* gcc.target/riscv/rvv/base/pr115068.c: New test.
Re: [PATCH] arm: Force flag_pic for FDPIC
On Mon, Mar 4, 2024 at 12:13 AM Fangrui Song wrote:
>
> From: Fangrui Song
>
> -fno-pic -mfdpic generated code is like regular -fno-pic, not suitable
> for FDPIC (absolute addressing for symbol references and no function
> descriptor).  The sh port simply upgrades -fno-pic to -fpie by setting
> flag_pic.  Let's follow suit.
>
> Link:
> https://inbox.sourceware.org/gcc-patches/20150913165303.gc17...@brightrain.aerifal.cx/
>
> gcc/ChangeLog:
>
>         * config/arm/arm.cc (arm_option_override): Set flag_pic if
>         TARGET_FDPIC.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/arm/fdpic-pie.c: New test.
> ---
>  gcc/config/arm/arm.cc                    |  6 +
>  gcc/testsuite/gcc.target/arm/fdpic-pie.c | 30
>  2 files changed, 36 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/arm/fdpic-pie.c
>
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index 1cd69268ee9..f2fd3cce48c 100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -3682,6 +3682,12 @@ arm_option_override (void)
>        arm_pic_register = FDPIC_REGNUM;
>        if (TARGET_THUMB1)
>         sorry ("FDPIC mode is not supported in Thumb-1 mode");
> +
> +      /* FDPIC code is a special form of PIC, and the vast majority of code
> +        generation constraints that apply to PIC also apply to FDPIC, so we
> +        set flag_pic to avoid the need to check TARGET_FDPIC everywhere
> +        flag_pic is checked.  */
> +      flag_pic = 2;
>      }
>
>    if (arm_pic_register_string != NULL)
> diff --git a/gcc/testsuite/gcc.target/arm/fdpic-pie.c
> b/gcc/testsuite/gcc.target/arm/fdpic-pie.c
> new file mode 100644
> index 000..909db8bce74
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/fdpic-pie.c
> @@ -0,0 +1,30 @@
> +// { dg-do compile }
> +// { dg-options "-O2 -fno-pic -mfdpic" }
> +// { dg-skip-if "-mpure-code and -fPIC incompatible" { *-*-* } {
> "-mpure-code" } }
> +
> +__attribute__((visibility("hidden"))) void hidden_fun(void);
> +void fun(void);
> +__attribute__((visibility("hidden"))) extern int hidden_var;
> +extern int var;
> +__attribute__((visibility("hidden"))) const int ro_hidden_var = 42;
> +
> +// { dg-final { scan-assembler "hidden_fun\\(GOTOFFFUNCDESC\\)" } }
> +void *addr_hidden_fun(void) { return hidden_fun; }
> +
> +// { dg-final { scan-assembler "fun\\(GOTFUNCDESC\\)" } }
> +void *addr_fun(void) { return fun; }
> +
> +// { dg-final { scan-assembler "hidden_var\\(GOT\\)" } }
> +void *addr_hidden_var(void) { return &hidden_var; }
> +
> +// { dg-final { scan-assembler "var\\(GOT\\)" } }
> +void *addr_var(void) { return &var; }
> +
> +// { dg-final { scan-assembler ".LANCHOR0\\(GOT\\)" } }
> +const int *addr_ro_hidden_var(void) { return &ro_hidden_var; }
> +
> +// { dg-final { scan-assembler "hidden_var\\(GOT\\)" } }
> +int read_hidden_var(void) { return hidden_var; }
> +
> +// { dg-final { scan-assembler "var\\(GOT\\)" } }
> +int read_var(void) { return var; }
> --
> 2.44.0.rc1.240.g4c46232300-goog

Ping:)

--
宋方睿
Re: [PATCH v1 2/3] RISC-V: Implement vectorizable early exit with vcond_mask_len
Hi Pan,

thanks for working on this.  In general the patch looks reasonable to me
but I'd rather have some more comments about the high-level idea.
E.g. cbranch is implemented like aarch64 by xor'ing the bitmasks and
comparing the result against zero (so we branch based on mask equality).

> +;; vcond_mask_len

High-level description here instead please.

> +(define_insn_and_split "vcond_mask_len_"
> +  [(set (match_operand:VB 0 "register_operand")
> +    (unspec: VB [
> +     (match_operand:VB 1 "register_operand")
> +     (match_operand:VB 2 "const_1_operand")

I guess it works like that because operand[2] is just implicitly used anyway
but shouldn't that rather be an all_ones_operand?

> +  && riscv_vector::get_vector_mode (Pmode, GET_MODE_NUNITS
> (mode)).exists ()"

Seems a bit odd on first sight.  If all we want to do is to
select between two masks why do we need a large Pmode mode?

> +rtx ops[] = {operands[0], operands[1], operands[1], cmp, reg,
> operands[4]};

So that's basically a mask-move with length?  Can't this be done
differently?  If not, please describe, maybe this is already
the shortest way.

Regards
 Robin
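The cbranch idea mentioned at the top of this review (xor the two bitmasks and compare the result against zero) can be sketched in scalar C roughly as follows.  The 64-bit mask representation, the lanes parameter and the name masks_differ are assumptions for illustration only, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar model: two bitmasks differ iff their XOR has a bit
   set in one of the live lanes.  Bits at or above LANES are ignored.  */
bool
masks_differ (uint64_t a, uint64_t b, unsigned lanes)
{
  assert (lanes >= 1 && lanes <= 64);
  uint64_t live = lanes == 64 ? UINT64_MAX : ((UINT64_C (1) << lanes) - 1);
  return ((a ^ b) & live) != 0;
}
```

A branch on mask equality then amounts to testing this result against zero, which is the shape the review describes.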
Follow up #1 (was Re: [PATCH v2 1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265])
On 5/13/24 11:49, Vineet Gupta wrote:
> 500.perlbench_r-0 | 1,214,534,029,025 | 1,212,887,959,387 |
> 500.perlbench_r-1 |   740,383,419,739 |   739,280,308,163 |
> 500.perlbench_r-2 |   692,074,638,817 |   691,118,734,547 |
> 502.gcc_r-0       |   190,820,141,435 |   190,857,065,988 |
> 502.gcc_r-1       |   225,747,660,839 |   225,809,444,357 | <- -0.02%
> 502.gcc_r-2       |   220,370,089,641 |   220,406,367,876 | <- -0.03%
> 502.gcc_r-3       |   179,111,460,458 |   179,135,609,723 | <- -0.02%
> 502.gcc_r-4       |   219,301,546,340 |   219,320,416,956 | <- -0.01%
> 503.bwaves_r-0    |   278,733,324,691 |   278,733,323,575 | <- -0.01%
> 503.bwaves_r-1    |   442,397,521,282 |   442,397,519,616 |
> 503.bwaves_r-2    |   344,112,218,206 |   344,112,216,760 |
> 503.bwaves_r-3    |   417,561,469,153 |   417,561,467,597 |
> 505.mcf_r         |   669,319,257,525 |   669,318,763,084 |
> 507.cactuBSSN_r   | 2,852,767,394,456 | 2,564,736,063,742 | <+ 10.10%

The small gcc regression seems like a tooling issue of some sort.
Looking at the topblocks, the insn sequences are exactly the same; only
the counts differ, and it's not obvious why.

Here are the top blocks for gcc_r-1.

> Block 0 @ 0x170ca, 12 insns, 87854493 times, 0.47%:

000170ca :
   170ca:	7179		add	sp,sp,-48
   170cc:	ec26		sd	s1,24(sp)
   170ce:	e84a		sd	s2,16(sp)
   170d0:	e44e		sd	s3,8(sp)
   170d2:	f406		sd	ra,40(sp)
   170d4:	f022		sd	s0,32(sp)
   170d6:	84aa		mv	s1,a0
   170d8:	03200913	li	s2,50
   170dc:	03d00993	li	s3,61
   170e0:	8526		mv	a0,s1
   170e2:	001cd097	auipc	ra,0x1cd
   170e6:	bac080e7	jalr	-1108(ra) # 1e3c8e

> Block 1 @ 0x706d0a, 3 insns, 274713936 times, 0.37%:
> Block 2 @ 0x1e3c8e, 9 insns, 88507109 times, 0.35%:
...

< Block 0 @ 0x170ca, 12 insns, 87869602 times, 0.47%:
< Block 1 @ 0x706d42, 3 insns, 274608893 times, 0.36%:
< Block 2 @ 0x1e3c94, 9 insns, 88526354 times, 0.35%:

FWIW, Greg internally has been looking at some of this and found some
issues in the bbv tooling, but I wish all of this was shared/upstream
(QEMU bbv plugin) for people to compare notes and not discover/fix the
same issues over and again.

Thx,
-Vineet
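For reference, the percentage annotations in the table above can be reproduced from the two dynamic instruction counts in each row.  A minimal sketch, assuming the first column is "before" and the second "after"; the helper name insn_delta_percent is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Relative change in dynamic instruction count, in percent.
   Positive means fewer instructions after the patch.  */
double
insn_delta_percent (uint64_t before, uint64_t after)
{
  assert (before != 0);
  return ((double) before - (double) after) * 100.0 / (double) before;
}
```

For the 507.cactuBSSN_r row this gives roughly 10.10, matching the annotation; the 502.gcc_r rows come out as small negative values of a few hundredths of a percent.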
Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading
On 5/13/24 1:48 PM, Qing Zhao wrote:
> -Warray-bounds is an important option to enable the Linux kernel to keep
> array out-of-bound errors out of the source tree.
>
> However, due to the false positive warnings reported in PR109071
> (-Warray-bounds false positive warnings due to code duplication from
> jump threading), -Warray-bounds=1 cannot be enabled by default.
>
> Although it's impossible to eliminate all the false positive warnings
> from -Warray-bounds=1 (See PR104355 Misleading -Warray-bounds
> documentation says "always out of bounds"), we should minimize the
> false positive warnings in -Warray-bounds=1.
>
> The root cause of the false positive warnings reported in PR109071 is:
>
> When the jump threading optimization tries to reduce the number of
> branches inside a routine, it sometimes needs to duplicate code and
> split it into two conditional paths.  For example:
>
> The original code:
>
> void sparx5_set (int * ptr, struct nums * sg, int index)
> {
>   if (index >= 4)
>     warn ();
>   *ptr = 0;
>   *val = sg->vals[index];
>   if (index >= 4)
>     warn ();
>   *ptr = *val;
>
>   return;
> }
>
> With jump threading, the above becomes:
>
> void sparx5_set (int * ptr, struct nums * sg, int index)
> {
>   if (index >= 4)
>     {
>       warn ();
>       *ptr = 0;                // Code duplications since "warn" does return;
>       *val = sg->vals[index];  // same this line.
>                                // In this path, since it's under the condition
>                                // "index >= 4", the compiler knows the value
>                                // of "index" is at least 4, therefore the
>                                // out-of-bound warning.
>       warn ();
>     }
>   else
>     {
>       *ptr = 0;
>       *val = sg->vals[index];
>     }
>   *ptr = *val;
>   return;
> }
>
> We can see that, after the jump threading optimization, the number of
> branches inside the routine "sparx5_set" is reduced from 2 to 1.
> However, due to the code duplication (which is needed for the
> correctness of the code), we get a false positive out-of-bound warning.
>
> In order to eliminate such false positive out-of-bound warnings,
>
> A. Add one more flag for GIMPLE: is_splitted.
> B. During the jump threading optimization, when the basic blocks are
>    duplicated, mark all the STMTs inside the original and duplicated
>    basic blocks as "is_splitted";
> C. Inside the array bound checker, add the following new heuristic:
>
>    If
>        1. the stmt is duplicated and split into two conditional paths;
>      + 2. the warning level < 2;
>      + 3. the current block is not dominating the exit block
>    then do not report the warning.
>
> The false positive warnings are moved from -Warray-bounds=1 to
> -Warray-bounds=2 now.
>
> Bootstrapped and regression tested on both x86 and aarch64.  Adjusted
> -Warray-bounds-61.c due to the false positive warnings.
>
> Let me know if you have any comments and suggestions.

This sounds horribly wrong.  In the code above, the warning is correct.

Jeff
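One way to read the objection above: since warn () returns to its caller, the array access is reached even when index >= 4.  The following is a self-contained sketch of the quoted function under stated assumptions: struct nums is taken to hold a four-element array, val is passed as an extra parameter (the quoted snippet leaves it undeclared), and warn () simply counts its calls.

```c
#include <assert.h>
#include <stddef.h>

struct nums { int vals[4]; };          /* assumption: four elements */

int warn_calls;                        /* illustration only */
void warn (void) { warn_calls++; }     /* note: returns to its caller */

/* Hypothetical helper: would sg->vals[index] be a valid access?  */
int
index_in_bounds (int index)
{
  return index >= 0 && index < 4;
}

void
sparx5_set (int *ptr, int *val, struct nums *sg, int index)
{
  assert (ptr != NULL && val != NULL && sg != NULL);
  if (index >= 4)
    warn ();
  *ptr = 0;
  *val = sg->vals[index];  /* still executed when index >= 4 */
  if (index >= 4)
    warn ();
  *ptr = *val;
}
```

Calling sparx5_set with index 2 is fine; with index 4 the first warn () returns and the read of sg->vals[4] goes past the end of the array, which is the access the compiler diagnoses on the duplicated path.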
Re: [PATCH] RISC-V: add option -m(no-)autovec-segment
On 2/27/24 07:25, Jeff Law wrote:
> On 2/25/24 21:53, Greg McGary wrote:
>> Add option -m(no-)autovec-segment to enable/disable autovectorizer
>> from emitting vector segment load/store instructions.  This is useful for
>> performance experiments.
>>
>> gcc/ChangeLog:
>>         * config/riscv/autovec.md (vec_mask_len_load_lanes,
>> vec_mask_len_store_lanes):
>>           Predicate with TARGET_VECTOR_AUTOVEC_SEGMENT
>>         * gcc/config/riscv/riscv-opts.h (TARGET_VECTOR_AUTOVEC_SEGMENT): New
>> macro.
>>         * gcc/config/riscv/riscv.opt (-m(no-)autovec-segment): New option.
>>         * gcc/tree-vect-stmts.cc (gcc/tree-vect-stmts.cc): Prevent
>> divide-by-zero.
>>         * testsuite/gcc.target/riscv/rvv/autovec/struct/*_noseg*.c,
>>         testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: New tests.
> I don't mind having options to do this kind of selection (we've done
> similar things internally for other RVV features).  But I don't think
> now is the time to be introducing this stuff.  We're in stage4 of the
> development cycle after all.

Ping!  Now that we are back in stage1.

Thx,
-Vineet
Re: [PATCH v2 2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]
On 5/13/24 12:49 PM, Vineet Gupta wrote:
> If the constant used for stack offset can be expressed as sum of two S12
> values, the constant need not be materialized (in a reg) and instead the
> two S12 bits can be added to instructions involved with frame pointer.
> This avoids burning a register and more importantly can often get down
> to be 2 insn vs. 3.
>
> The prev patches to generally avoid LUI based const materialization didn't
> fix this PR and need this directed fix in function prologue/epilogue
> expansion.
>
> This fix doesn't move the needle for SPEC, at all, but it is still a
> win considering gcc generates one insn fewer than llvm for the test ;-)
>
>    gcc-13.1 release   |    gcc 230823       |                    |
>                       |    g6619b3d4c15c    |   This patch       |  clang/llvm
> ----------------------------------------------------------------------------------
> li      t0,-4096      | li    t0,-4096      | addi sp,sp,-2048   | addi sp,sp,-2048
> addi    t0,t0,2016    | addi  t0,t0,2032    | add  sp,sp,-16     | addi sp,sp,-32
> li      a4,4096       | add   sp,sp,t0      | add  a5,sp,a0      | add  a1,sp,16
> add     sp,sp,t0      | addi  a5,sp,-2032   | sb   zero,0(a5)    | add  a0,a0,a1
> li      a5,-4096      | add   a0,a5,a0      | addi sp,sp,2032    | sb   zero,0(a0)
> addi    a4,a4,-2032   | li    t0, 4096      | addi sp,sp,32      | addi sp,sp,2032
> add     a4,a4,a5      | sb    zero,2032(a0) | ret                | addi sp,sp,48
> addi    a5,sp,16      | addi  t0,t0,-2032   |                    | ret
> add     a5,a4,a5      | add   sp,sp,t0      |
> add     a0,a5,a0      | ret                 |
> li      t0,4096       |
> sd      a5,8(sp)      |
> sb      zero,2032(a0) |
> addi    t0,t0,-2016   |
> add     sp,sp,t0      |
> ret                   |
>
> gcc/ChangeLog:
> 	PR target/105733
> 	* config/riscv/riscv.h: New macros for with aligned offsets.
> 	* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
> 	function to split a sum of two s12 values into constituents.
> 	(riscv_expand_prologue): Handle offset being sum of two S12.
> 	(riscv_expand_epilogue): Ditto.
> 	* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.
>
> gcc/testsuite/ChangeLog:
> 	* gcc.target/riscv/pr105733.c: New Test.
> 	* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
> 	expect LUI 4096.
> 	* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
> 	* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
> 	* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
> 	* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
> 	* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
> 	* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.

> @@ -8074,14 +8111,26 @@ riscv_expand_epilogue (int style)
>  	}
>        else
>  	{
> -	  if (!SMALL_OPERAND (adjust_offset.to_constant ()))
> +	  HOST_WIDE_INT adj_off_value = adjust_offset.to_constant ();
> +	  if (SMALL_OPERAND (adj_off_value))
> +	    {
> +	      adjust = GEN_INT (adj_off_value);
> +	    }
> +	  else if (SUM_OF_TWO_S12_ALGN (adj_off_value))
> +	    {
> +	      HOST_WIDE_INT base, off;
> +	      riscv_split_sum_of_two_s12 (adj_off_value, &base, &off);
> +	      insn = gen_add3_insn (stack_pointer_rtx, hard_frame_pointer_rtx,
> +				    GEN_INT (base));
> +	      RTX_FRAME_RELATED_P (insn) = 1;
> +	      adjust = GEN_INT (off);
> +	    }

So this was the hunk that we identified internally as causing problems
with libgomp's testsuite.  We never fully chased it down as this hunk
didn't seem terribly important performance wise -- we just set it aside.
The thing is it looked basically correct to me.  So the failure was
certainly unexpected, but it was consistent.

So I think the question is whether or not the CI system runs the libgomp
testsuite, particularly in the rv64 linux configuration.  If it does, and
it passes, then we're good.  I'm still finding my way around the
configuration, so I don't know if the CI system Edwin & Patrick have
built tests libgomp or not.

If it isn't run, then we'll need to do a run to test that.  I'm set up
here to do that if needed.  I can just drop this version into our
internal tree, trigger an internal CI run and see if it complains :-)

If it does complain, then we know where to start investigations.

Jeff
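To make the "sum of two S12" idea concrete, here is a hedged sketch of how an offset such as the 2064-byte adjustment seen in the "This patch" column (-2048 then -16 in the prologue, 2032 then 32 in the epilogue) could be split.  This is not the patch's riscv_split_sum_of_two_s12; the name split_two_s12_aligned, the choice of 2032 as the positive base and the use of long are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define S12_MIN (-2048L)
#define S12_MAX 2047L
#define ALIGNED_POS_BASE 2032L   /* largest multiple of 16 that fits in S12 */

bool
is_s12 (long v)
{
  return v >= S12_MIN && v <= S12_MAX;
}

/* Hypothetical helper: split VAL into *BASE + *OFF where both parts are
   S12 immediates and *BASE is a multiple of 16.  Returns false if VAL
   already fits in S12 (no split needed) or cannot be covered.  */
bool
split_two_s12_aligned (long val, long *base, long *off)
{
  assert (base != NULL && off != NULL);
  if (is_s12 (val))
    return false;
  long b = val < 0 ? S12_MIN : ALIGNED_POS_BASE;
  long o = val - b;
  if (!is_s12 (o))
    return false;
  *base = b;
  *off = o;
  return true;
}
```

With this sketch 2064 splits into 2032 + 32 and -2064 into -2048 + -16, so both adjustments fit in two add-immediate instructions without a scratch register.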
[PATCH] Fortran: fix bounds check for assignment, class component [PR86100]
Dear all,

the attached patch does two things:

- it fixes a bogus array bounds check when deep-copying a class component
  of a derived type and the class component has rank > 1, the reason being
  that the previous code compared the full size of one side with the size
  of the first dimension of the other

- the bounds-check error message that was generated e.g. by an allocate
  statement with conflicting sizes in the allocation and the source-expr
  will now use an improved abbreviated name pointing to the component
  involved, which was introduced in 14-development.

What I could not resolve: a deep copy may still create no useful array
name in the error message (which I am now unable to trigger).  If someone
sees how to extract it reliably from the tree, please let me know.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

I would like to backport this to 14-branch after a decent delay.

Thanks,
Harald

From e187285dfd83da2f69cfd50854c701744dc8acc5 Mon Sep 17 00:00:00 2001
From: Harald Anlauf
Date: Mon, 13 May 2024 22:06:33 +0200
Subject: [PATCH] Fortran: fix bounds check for assignment, class component
 [PR86100]

gcc/fortran/ChangeLog:

	PR fortran/86100
	* trans-array.cc (gfc_conv_ss_startstride): Use abridged_ref_name
	to generate a more user-friendly name for bounds-check messages.
	* trans-expr.cc (gfc_copy_class_to_class): Fix bounds check for
	rank>1 by looping over the dimensions.

gcc/testsuite/ChangeLog:

	PR fortran/86100
	* gfortran.dg/bounds_check_25.f90: New test.
---
 gcc/fortran/trans-array.cc                    |  7 +++-
 gcc/fortran/trans-expr.cc                     | 40 ++-
 gcc/testsuite/gfortran.dg/bounds_check_25.f90 | 32 +++
 3 files changed, 60 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/bounds_check_25.f90

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index c5b56f4e273..eec62c296ff 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -4911,6 +4911,7 @@ done:
 	  gfc_expr *expr;
 	  locus *expr_loc;
 	  const char *expr_name;
+	  char *ref_name = NULL;

 	  ss_info = ss->info;
 	  if (ss_info->type != GFC_SS_SECTION)
@@ -4922,7 +4923,10 @@ done:

 	  expr = ss_info->expr;
 	  expr_loc = &expr->where;
-	  expr_name = expr->symtree->name;
+	  if (expr->ref)
+	    expr_name = ref_name = abridged_ref_name (expr, NULL);
+	  else
+	    expr_name = expr->symtree->name;

 	  gfc_start_block ();

@@ -5134,6 +5138,7 @@ done:

 	  gfc_add_expr_to_block (, tmp);

+	  free (ref_name);
 	}

       tmp = gfc_finish_block ();
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index e315e2d3370..dfc5b8e9b4a 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -1520,7 +1520,6 @@ gfc_copy_class_to_class (tree from, tree to, tree nelems, bool unlimited)
       stmtblock_t body;
       stmtblock_t ifbody;
       gfc_loopinfo loop;
-      tree orig_nelems = nelems; /* Needed for bounds check.  */

       gfc_init_block ();
       tmp = fold_build2_loc (input_location, MINUS_EXPR,
@@ -1552,27 +1551,32 @@ gfc_copy_class_to_class (tree from, tree to, tree nelems, bool unlimited)
       /* Add bounds check.  */
       if ((gfc_option.rtcheck & GFC_RTCHECK_BOUNDS) > 0 && is_from_desc)
 	{
-	  char *msg;
 	  const char *name = "<>";
-	  tree from_len;
+	  int dim, rank;

 	  if (DECL_P (to))
-	    name = (const char *)(DECL_NAME (to)->identifier.id.str);
-
-	  from_len = gfc_conv_descriptor_size (from_data, 1);
-	  from_len = fold_convert (TREE_TYPE (orig_nelems), from_len);
-	  tmp = fold_build2_loc (input_location, NE_EXPR,
-				  logical_type_node, from_len, orig_nelems);
-	  msg = xasprintf ("Array bound mismatch for dimension %d "
-			   "of array '%s' (%%ld/%%ld)",
-			   1, name);
-
-	  gfc_trans_runtime_check (true, false, tmp, ,
-				   &gfc_current_locus, msg,
-			     fold_convert (long_integer_type_node, orig_nelems),
-			       fold_convert (long_integer_type_node, from_len));
+	    name = IDENTIFIER_POINTER (DECL_NAME (to));

-	  free (msg);
+	  rank = GFC_TYPE_ARRAY_RANK (TREE_TYPE (from_data));
+	  for (dim = 1; dim <= rank; dim++)
+	    {
+	      tree from_len, to_len, cond;
+	      char *msg;
+
+	      from_len = gfc_conv_descriptor_size (from_data, dim);
+	      from_len = fold_convert (long_integer_type_node, from_len);
+	      to_len = gfc_conv_descriptor_size (to_data, dim);
+	      to_len = fold_convert (long_integer_type_node, to_len);
+	      msg = xasprintf ("Array bound mismatch for dimension %d "
+			       "of array '%s' (%%ld/%%ld)",
+			       dim, name);
+	      cond = fold_build2_loc (input_location, NE_EXPR,
+				      logical_type_node, from_len, to_len);
+	      gfc_trans_runtime_check (true, false, cond, ,
+				       &gfc_current_locus, msg,
+				       to_len, from_len);
+	      free (msg);
+	    }
 	}

       tmp = build_call_vec (fcn_type, fcn, args);
diff --git a/gcc/testsuite/gfortran.dg/bounds_check_25.f90
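The difference between the old and the new bounds check in the message above can be illustrated outside the compiler with a small C model.  The extent arrays and both function names are assumptions for illustration; the real code operates on array descriptors via trees.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the old check as described: the total element count of one
   side is compared with the extent of dimension 1 of the other.
   Returns 1 if a mismatch would be reported.  */
int
old_check_reports_mismatch (size_t to_nelems, const size_t *from_extent)
{
  assert (from_extent != NULL);
  return from_extent[0] != to_nelems;
}

/* Model of the per-dimension check: returns the first (1-based)
   dimension whose extents differ, or 0 if the shapes conform.  */
int
first_mismatched_dim (const size_t *from_extent, const size_t *to_extent,
		      int rank)
{
  assert (rank >= 1);
  for (int dim = 1; dim <= rank; dim++)
    if (from_extent[dim - 1] != to_extent[dim - 1])
      return dim;
  return 0;
}
```

For two conforming 2x3 arrays the old model reports a mismatch (6 elements versus extent 2), which is the bogus error, while the per-dimension loop reports none.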
Re: [PATCH v2 1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265]
On 5/13/24 12:49 PM, Vineet Gupta wrote:
> Apologies for the delay in getting this out.  Needed to fix one ICE
> with glibc build and fresh round of testing: both testsuite and SPEC
> runs (which are similar to v1 in terms of Cactu gains, but some more
> minor regressions elsewhere gcc).  Again those seem so small that
> IMHO this should still go in.  I'll investigate those next as well
> as an existing weirdness in glibc tempnam which I spotted during the
> debugging.
>
> Changes since v1 [1]
>  - Tighten the main condition to avoid stack regs as destination
>    (to avoid making them potentially unaligned with -2047 addend:
>     this might be OK execution/ABI wise, but undesirable/ugly still
>     especially when coming from compiler codegen).
>  - Ensure that first alternative is always split
>  - Remove "&& 1" from split condition.  That was tripping up glibc build
>    with illegal operands `add s0, s0, 2048`.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647877.html

> +;; Special case of adding a reg and constant if latter is sum of two S12
> +;; values (in range -2048 to 2047).  Avoid materializing the const and fuse
> +;; into the add (with an additional add for 2nd value).  Makes a 3 insn
> +;; sequence into 2 insn.
> +
> +(define_insn_and_split "*add3_const_sum_of_two_s12"
> +  [(set (match_operand:P         0 "register_operand" "=r,r")
> +	(plus:P (match_operand:P 1 "register_operand" " r,r")
> +		(match_operand:P 2 "const_two_s12"    " MiG,r")))]
> +  "!riscv_reg_frame_related (operands[0])"

So that !riscv_reg_frame_related is my only concern with this patch.
It's a destination, so it *may* be OK.

If it were a source operand, then we'd have to worry about cases where
it was a pseudo with the same value as sp/fp/argp and subsequent copy
propagation replacing the pseudo with sp/fp/argp causing the insn to no
longer match.

Similarly if it were a source operand we'd have to worry about cases
where the pseudo had a registered (or discoverable) equivalence to
sp/fp/argp plus an offset.  IRA/LRA can replace the use with its
equivalence in some of those cases which would have potentially caused
headaches.

But as a destination we really just have to worry about generation in
the prologue/epilogue and for alloca calls.  Those should be the only
places that set one of those special registers.  They're constrained
enough that I think we'll be OK.

I'm very slightly worried about hard register cprop, but I think it
should be safe these days WRT those special registers in the unlikely
event it found an opportunity to propagate them.

So a tentative OK.  If we find this tidbit is problematical in the
future, then what I would suggest is we allow those special registers
and dial-back the aggressiveness on the range of allowed constants.
That would allow the first instruction in the sequence to never create
a mis-aligned sp.  But again, that's only if we need to revisit.

Please wait for CI to report back sane results :-)

Jeff
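The range and alignment concerns discussed above can be checked with a little arithmetic.  A sketch under assumptions: the helper names are invented, 16-byte stack alignment is assumed, and this models only the constant ranges, not the const_two_s12 predicate itself.

```c
#include <assert.h>
#include <stdbool.h>

#define S12_MIN (-2048L)
#define S12_MAX 2047L

bool
is_s12 (long v)
{
  return v >= S12_MIN && v <= S12_MAX;
}

/* True if V does not fit one S12 immediate but is reachable as the sum
   of two of them, i.e. V lies in [-4096, -2049] or [2048, 4094].  */
bool
is_sum_of_two_s12 (long v)
{
  return !is_s12 (v) && v >= 2 * S12_MIN && v <= 2 * S12_MAX;
}

/* Hypothetical model of the intermediate stack pointer after the first
   of the two adds.  */
long
sp_after_first_add (long sp, long first)
{
  assert (is_s12 (first));
  return sp + first;
}
```

Starting from a 16-byte aligned sp, a first addend of 2047 leaves the intermediate value misaligned, whereas 2032 or -2048 keeps it aligned; that is the kind of dial-back on the constant range suggested in the reply.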
[PATCH] RISC-V: Do not allow v0 as dest when merging [PR115068].
Hi,

this patch splits the vfw...wf pattern so we do not emit e.g.
vfwadd.wf v0,v8,fa5,v0.t anymore.

Regtested on rv64gcv_zvfh.

Regards
 Robin

gcc/ChangeLog:

	PR target/115068

	* config/riscv/vector.md: Split vfw.wf pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/pr115068-run.c: New test.
	* gcc.target/riscv/rvv/base/pr115068.c: New test.
---
 gcc/config/riscv/vector.md                    | 20 ++---
 .../gcc.target/riscv/rvv/base/pr115068-run.c  | 28 ++
 .../gcc.target/riscv/rvv/base/pr115068.c      | 29 +++
 3 files changed, 67 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 2a54f78df8e..e408baa809c 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7178,24 +7178,24 @@ (define_insn "@pred_single_widen_sub"
 	(symbol_ref "riscv_vector::get_frm_mode (operands[9])"))])

 (define_insn "@pred_single_widen__scalar"
-  [(set (match_operand:VWEXTF 0 "register_operand" "=vr, vr")
+  [(set (match_operand:VWEXTF 0 "register_operand" "=vd, vd, vr, vr")
 	(if_then_else:VWEXTF
 	  (unspec:
-	    [(match_operand: 1 "vector_mask_operand" "vmWc1,vmWc1")
-	     (match_operand 5 "vector_length_operand" " rK, rK")
-	     (match_operand 6 "const_int_operand" "i, i")
-	     (match_operand 7 "const_int_operand" "i, i")
-	     (match_operand 8 "const_int_operand" "i, i")
-	     (match_operand 9 "const_int_operand" "i, i")
+	    [(match_operand: 1 "vector_mask_operand" " vm, vm,Wc1,Wc1")
+	     (match_operand 5 "vector_length_operand" " rK, rK, rK, rK")
+	     (match_operand 6 "const_int_operand" " i, i, i, i")
+	     (match_operand 7 "const_int_operand" " i, i, i, i")
+	     (match_operand 8 "const_int_operand" " i, i, i, i")
+	     (match_operand 9 "const_int_operand" " i, i, i, i")
 	     (reg:SI VL_REGNUM)
 	     (reg:SI VTYPE_REGNUM)
 	     (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
 	  (plus_minus:VWEXTF
-	    (match_operand:VWEXTF 3 "register_operand" " vr, vr")
+	    (match_operand:VWEXTF 3 "register_operand" " vr, vr, vr, vr")
 	    (float_extend:VWEXTF
 	      (vec_duplicate:
-		(match_operand: 4 "register_operand" "f, f"
-	  (match_operand:VWEXTF 2 "vector_merge_operand" " vu, 0")))]
+		(match_operand: 4 "register_operand" " f, f, f, f"
+	  (match_operand:VWEXTF 2 "vector_merge_operand" " vu, 0, vu, 0")))]
   "TARGET_VECTOR"
   "vfw.wf\t%0,%3,%4%p1"
   [(set_attr "type" "vf")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c
new file mode 100644
index 000..95ec8e06021
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068-run.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-require-effective-target riscv_v_ok } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-std=gnu99" } */
+
+#include
+#include
+
+vfloat64m8_t
+test_vfwadd_wf_f64m8_m (vbool8_t vm, vfloat64m8_t vs2, float rs1, size_t vl)
+{
+  return __riscv_vfwadd_wf_f64m8_m (vm, vs2, rs1, vl);
+}
+
+char global_memory[1024];
+void *fake_memory = (void *) global_memory;
+
+int
+main ()
+{
+  asm volatile ("fence" ::: "memory");
+  vfloat64m8_t vfwadd_wf_f64m8_m_vd = test_vfwadd_wf_f64m8_m (
+    __riscv_vreinterpret_v_i8m1_b8 (__riscv_vundefined_i8m1 ()),
+    __riscv_vundefined_f64m8 (), 1.0, __riscv_vsetvlmax_e64m8 ());
+  asm volatile ("" ::"vr"(vfwadd_wf_f64m8_m_vd) : "memory");
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c
new file mode 100644
index 000..6d680037aa1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115068.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-std=gnu99" } */
+
+#include
+#include
+
+vfloat64m8_t
+test_vfwadd_wf_f64m8_m (vbool8_t vm, vfloat64m8_t vs2, float rs1, size_t vl)
+{
+  return __riscv_vfwadd_wf_f64m8_m (vm, vs2, rs1, vl);
+}
+
+char global_memory[1024];
+void *fake_memory = (void *) global_memory;
+
+int
+main ()
+{
+  asm volatile ("fence" ::: "memory");
+  vfloat64m8_t vfwadd_wf_f64m8_m_vd = test_vfwadd_wf_f64m8_m (
+    __riscv_vreinterpret_v_i8m1_b8 (__riscv_vundefined_i8m1 ()),
+
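As background for the patch above: as far as I understand the RVV rules, a masked instruction must not write a destination register group that overlaps the mask register v0, unless the result is itself a mask or a scalar reduction result, which is why vfwadd.wf v0,v8,fa5,v0.t must not be emitted.  A tiny model of that constraint, with invented helper names and register groups given as a base register number plus LMUL:

```c
#include <assert.h>
#include <stdbool.h>

/* True if the register group [dest, dest + lmul) contains v0.  Groups
   are aligned to LMUL, so this is the case only when dest is 0.  */
bool
dest_group_contains_v0 (unsigned dest, unsigned lmul)
{
  assert (lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
  assert (dest < 32 && dest % lmul == 0);
  return dest == 0;
}

/* Hypothetical legality check for a non-mask-producing instruction:
   unmasked ones may target any group, masked ones must avoid v0.  */
bool
masked_dest_ok (unsigned dest, unsigned lmul, bool masked)
{
  return !masked || !dest_group_contains_v0 (dest, lmul);
}
```

In constraint terms this corresponds to pairing the "vm" mask alternatives with a "vd" destination, as the split pattern in the patch does.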
[wwwdocs] cxx-dr-status: Update from C++ Core Language Issue TOC, Revision 114
Pushed. commit 06c46c88cc02e0dff5f65b41754178fb25fb939e Author: Marek Polacek Date: Mon May 13 16:09:05 2024 -0400 cxx-dr-status: Update from C++ Core Language Issue TOC, Revision 114 diff --git a/htdocs/projects/cxx-dr-status.html b/htdocs/projects/cxx-dr-status.html index a5f45359..2a61cfbd 100644 --- a/htdocs/projects/cxx-dr-status.html +++ b/htdocs/projects/cxx-dr-status.html @@ -15,7 +15,7 @@ This table tracks the implementation status of C++ defect reports in GCC. It is based on C++ Standard Core Language Issue Table of Contents, Revision - 113 (https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_toc.html;>here). + 114 (https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_toc.html;>here). @@ -1652,7 +1652,7 @@ https://wg21.link/cwg233;>233 - drafting + review References vs pointers in UDC overload resolution No https://gcc.gnu.org/PR114697;>PR114697 @@ -3196,7 +3196,7 @@ https://wg21.link/cwg453;>453 - tentatively ready + DR References may only bind to "valid" objects ? @@ -7031,11 +7031,11 @@ ? - + https://wg21.link/cwg1001;>1001 - drafting + review Parameter type adjustment in dependent parameter types - - + ? https://gcc.gnu.org/PR51851;>PR51851 @@ -7292,7 +7292,7 @@ https://wg21.link/cwg1038;>1038 - DR + DRWP Overload resolution of x.static_func ? @@ -8624,6 +8624,7 @@ https://wg21.link/cwg1228;>1228 NAD Copy-list-initialization and explicit constructors + No https://gcc.gnu.org/PR113300;>PR113300 @@ -11916,7 +11917,7 @@ https://wg21.link/cwg1698;>1698 - DR + DRWP Files ending in \ ? @@ -12075,11 +12076,11 @@ ? - + https://wg21.link/cwg1721;>1721 - drafting + review Diagnosing ODR violations for static data members - - + ? @@ -13454,11 +13455,11 @@ N/A - + https://wg21.link/cwg1918;>1918 - open + CD5 friend templates with dependent scopes - - + ? @@ -13644,11 +13645,11 @@ - - + https://wg21.link/cwg1945;>1945 - open + CD5 Friend declarations naming members of class templates in non-templates - - + ? 
@@ -13709,7 +13710,7 @@ https://wg21.link/cwg1954;>1954 - tentatively ready + DR typeid null dereference check in subexpressions ? @@ -14373,11 +14374,11 @@ - - + https://wg21.link/cwg2049;>2049 - drafting + DRWP List initializer in non-type template default argument - - + ? @@ -14410,7 +14411,7 @@ https://wg21.link/cwg2054;>2054 - DR + DRWP Missing description of class SFINAE ? @@ -14746,7 +14747,7 @@ https://wg21.link/cwg2102;>2102 - DR + DRWP Constructor checking in new-expression ? @@ -15797,7 +15798,7 @@ https://wg21.link/cwg2252;>2252 - DR + DRWP Enumeration list-initialization from the same type ? @@ -17069,11 +17070,11 @@ ? - + https://wg21.link/cwg2434;>2434 - open + review Mandatory copy elision vs non-class objects - - + ? @@ -17183,7 +17184,7 @@ https://wg21.link/cwg2450;>2450 - review + DRWP braced-init-list as a template-argument 11 @@ -17244,12 +17245,12 @@ ? - + https://wg21.link/cwg2459;>2459 - drafting + DRWP Template parameter initialization - - - + ? + https://gcc.gnu.org/PR113800;>PR113800 https://wg21.link/cwg2460;>2460 @@ -17365,7 +17366,7 @@ https://wg21.link/cwg2476;>2476 - tentatively ready + DR placeholder-type-specifiers and function declarators ? @@ -17561,7 +17562,7 @@ https://wg21.link/cwg2504;>2504 - DR + DRWP Inheriting constructors from virtual base classes ? @@ -17750,7 +17751,7 @@ https://wg21.link/cwg2531;>2531 - DR + DRWP Static data members redeclared as constexpr ? @@ -17764,7 +17765,7 @@ https://wg21.link/cwg2533;>2533 - review + DR Storage duration of implicitly created objects ? @@ -17855,14 +17856,14 @@ https://wg21.link/cwg2546;>2546 - tentatively ready + DR Defaulted secondary comparison operators defined as deleted ?
[RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading
-Warray-bounds is an important option that lets the Linux kernel keep array out-of-bounds errors out of the source tree.

However, due to the false positive warnings reported in PR109071 (-Warray-bounds false positive warnings due to code duplication from jump threading), -Warray-bounds=1 cannot be turned on by default.

Although it's impossible to eliminate all the false positive warnings from -Warray-bounds=1 (see PR104355, Misleading -Warray-bounds documentation says "always out of bounds"), we should minimize the false positive warnings in -Warray-bounds=1.

The root cause of the false positive warnings reported in PR109071 is:

When the jump threading optimization tries to reduce the number of branches inside the routine, it sometimes needs to duplicate code and split it into two conditional paths. For example:

The original code:

void sparx5_set (int * ptr, struct nums * sg, int index)
{
  if (index >= 4)
    warn ();
  *ptr = 0;
  *val = sg->vals[index];
  if (index >= 4)
    warn ();
  *ptr = *val;

  return;
}

With jump threading, the above becomes:

void sparx5_set (int * ptr, struct nums * sg, int index)
{
  if (index >= 4)
    {
      warn ();
      *ptr = 0;                 // Code duplications since "warn" does return;
      *val = sg->vals[index];   // same this line.
                                // In this path, since it's under the condition
                                // "index >= 4", the compiler knows the value
                                // of "index" is at least 4, therefore the
                                // out-of-bound warning.
      warn ();
    }
  else
    {
      *ptr = 0;
      *val = sg->vals[index];
    }
  *ptr = *val;
  return;
}

We can see that, after the jump threading optimization, the number of branches inside the routine "sparx5_set" is reduced from 2 to 1. However, due to the code duplication (which is needed for the correctness of the code), we get a false positive out-of-bound warning.

In order to eliminate such false positive out-of-bound warnings,

A. Add one more flag for GIMPLE: is_splitted.
B. During the jump threading optimization, when the basic blocks are duplicated, mark all the STMTs inside the original and duplicated basic blocks as "is_splitted";
C. Inside the array bound checker, add the following new heuristic:

If
   1. the stmt is duplicated and splitted into two conditional paths;
+  2. the warning level < 2;
+  3. the current block is not dominating the exit block
Then do not report the warning.

The false positive warnings are moved from -Warray-bounds=1 to -Warray-bounds=2 now.

Bootstrapped and regression tested on both x86 and aarch64. Adjusted Warray-bounds-61.c because of the false positive warnings.

Let me know if you have any comments and suggestions.

Thanks.

Qing

	PR tree-optimization/109071

gcc/ChangeLog:

	* gimple-array-bounds.cc (check_out_of_bounds_and_warn): Add two new
	arguments for the new heuristic to not issue warnings.
	(array_bounds_checker::check_array_ref): Call the new prototype of the
	routine check_out_of_bounds_and_warn.
	(array_bounds_checker::check_mem_ref): Add one new argument for the
	new heuristic to not issue warnings.
	(array_bounds_checker::check_addr_expr): Call the new prototype of the
	routine check_mem_ref, add new heuristic for not issue warnings.
	(array_bounds_checker::check_array_bounds): Call the new prototype of
	the routine check_mem_ref.
	* gimple-array-bounds.h: New prototype of check_mem_ref.
	* gimple.h (struct GTY): Add one new flag is_splitted for gimple.
	(gimple_is_splitted_p): New function.
	(gimple_set_is_splitted): New function.
	* tree-ssa-threadupdate.cc (set_stmts_in_bb_is_splitted): New function.
	(back_jt_path_registry::duplicate_thread_path): Mark all the stmts in
	both original and copied blocks as IS_SPLITTED.

gcc/testsuite/ChangeLog:

	* gcc.dg/Warray-bounds-61.c: Adjust testing case.
	* gcc.dg/pr109071-1.c: New test.
	* gcc.dg/pr109071.c: New test.
--- gcc/gimple-array-bounds.cc | 46 + gcc/gimple-array-bounds.h | 2 +- gcc/gimple.h| 21 +-- gcc/testsuite/gcc.dg/Warray-bounds-61.c | 6 ++-- gcc/testsuite/gcc.dg/pr109071-1.c | 22 gcc/testsuite/gcc.dg/pr109071.c | 22 gcc/tree-ssa-threadupdate.cc| 15 7 files changed, 122 insertions(+), 12 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr109071-1.c create mode 100644 gcc/testsuite/gcc.dg/pr109071.c diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc index 008071cd5464..4a2975623bc1 100644 --- a/gcc/gimple-array-bounds.cc +++ b/gcc/gimple-array-bounds.cc @@ -264,7 +264,9 @@ check_out_of_bounds_and_warn (location_t location, tree ref, tree up_bound, tree up_bound_p1,
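For anyone who wants to reproduce the shape of the problem outside the kernel, the two forms of sparx5_set described in this mail can be written as a compilable sketch. The declarations of struct nums, warn and val, and the two function names, are assumptions added for illustration because the mail does not show them. The second function only hand-simulates what jump threading produces; it is not compiler output.

```c
/* Assumed declarations: the mail does not show them.  */
struct nums { int vals[4]; };

static int warn_count;            /* counts calls to warn () */
static int val_storage;
static int *val = &val_storage;

static void warn (void) { warn_count++; }

/* Shape of the code before jump threading: the same condition is
   tested twice.  */
void sparx5_set_original (int *ptr, struct nums *sg, int index)
{
  if (index >= 4)
    warn ();
  *ptr = 0;
  *val = sg->vals[index];
  if (index >= 4)
    warn ();
  *ptr = *val;
}

/* Hand-written equivalent of the threaded form: one test, with the
   middle statements duplicated into both arms.  In the first arm the
   compiler knows index >= 4, which is where -Warray-bounds reports
   the access.  */
void sparx5_set_threaded (int *ptr, struct nums *sg, int index)
{
  if (index >= 4)
    {
      warn ();
      *ptr = 0;
      *val = sg->vals[index];     /* duplicated copy, known out of bounds */
      warn ();
    }
  else
    {
      *ptr = 0;
      *val = sg->vals[index];
    }
  *ptr = *val;
}
```

For in-range indices both functions behave identically and never call warn (), which is why a warning on the duplicated statement reads as a false positive to the user.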
[to-be-committed][RISC-V] Improve AND with some constants
If we have an AND with a constant operand and the constant operand requires synthesis, then we may be able to generate more efficient code than we do now.

Essentially the need for constant synthesis gives us a budget for alternative ways to clear bits, which zext.w can do for bits 32..63 trivially. So if we clear 32..63 via zext.w, the constant for the remaining bits to clear may be simple enough to use with andi or bclri. That will save us an instruction.

This has been tested in Ventana's CI system as well as my own. I'll wait for the upstream CI tester to report success before committing.

Jeff

gcc/
	* config/riscv/bitmanip.md: Add new splitter for AND with a constant
	that masks off bits 32..63 and needs synthesis.

gcc/testsuite/
	* gcc.target/riscv/zba_zbs_and-1.c: New test.

+++ b/gcc/testsuite/gcc.target/riscv/zba_zbs_and-1.c
diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 724511b6df3..8769a6b818b 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -843,6 +843,40 @@ (define_insn_and_split "*andi_extrabit"
 }
   [(set_attr "type" "bitmanip")])
 
+;; If we have the ZBA extension, then we can clear the upper half of a 64
+;; bit object with a zext.w.  So if we have AND where the constant would
+;; require synthesis of two or more instructions, but 32->64 sign extension
+;; of the constant is a simm12, then we can use zext.w+andi.  If the adjusted
+;; constant is a single bit constant, then we can use zext.w+bclri
+;;
+;; With the mvconst_internal pattern claiming a single insn to synthesize
+;; constants, this must be a define_insn_and_split.
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(and:DI (match_operand:DI 1 "register_operand" "r")
+		(match_operand 2 "const_int_operand" "n")))]
+  "TARGET_64BIT
+   && TARGET_ZBA
+   && !paradoxical_subreg_p (operands[1])
+   /* Only profitable if synthesis takes more than one insn.  */
+   && riscv_const_insns (operands[2]) != 1
+   /* We need the upper half to be zero.  */
+   && (INTVAL (operands[2]) & HOST_WIDE_INT_C (0x)) == 0
+   /* And the the adjusted constant must either be something we can
+      implement with andi or bclri.  */
+   && ((SMALL_OPERAND (sext_hwi (INTVAL (operands[2]), 32))
+	|| (TARGET_ZBS && popcount_hwi (INTVAL (operands[2])) == 31))
+       && INTVAL (operands[2]) != 0x7fff)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (zero_extend:DI (match_dup 3)))
+   (set (match_dup 0) (and:DI (match_dup 0) (match_dup 2)))]
+  "{
+     operands[3] = gen_lowpart (SImode, operands[1]);
+     operands[2] = GEN_INT (sext_hwi (INTVAL (operands[2]), 32));
+   }"
+  [(set_attr "type" "bitmanip")])
+
 ;; IF_THEN_ELSE: test for 2 bits of opposite polarity
 (define_insn_and_split "*branch_mask_twobits_equals_singlebit"
   [(set (pc)
diff --git a/gcc/testsuite/gcc.target/riscv/zba_zbs_and-1.c b/gcc/testsuite/gcc.target/riscv/zba_zbs_and-1.c
new file mode 100644
index 000..23fd769449e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zba_zbs_and-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+
+unsigned long long w32mem_1(unsigned long long w32)
+{
+    return w32 & ~(1U << 0);
+}
+
+unsigned long long w32mem_2(unsigned long long w32)
+{
+    return w32 & ~(1U << 30);
+}
+
+unsigned long long w32mem_3(unsigned long long w32)
+{
+    return w32 & ~(1U << 31);
+}
+
+/* If we do synthesis, then we'd see an addi.  */
+/* { dg-final { scan-assembler-not "addi\t" } } */
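The arithmetic behind the splitter can be checked on the host with a small sketch. It models zext.w as masking to the low 32 bits and then ANDs with the 32->64 sign-extended constant, which is what the split emits. The helper names sext32, is_simm12 and and_via_zext are made up for this illustration and are not GCC functions; the masks used in the checks are the values of the ~(1U << n) expressions in the new test.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of V to 64 bits (the effect of
   sext_hwi (..., 32) in the patch), written without
   implementation-defined casts.  */
static int64_t sext32 (uint64_t v)
{
  uint64_t low = v & 0xffffffffu;
  return (int64_t) (low ^ 0x80000000u) - (int64_t) 0x80000000u;
}

/* True if V fits a 12-bit signed immediate, the range andi accepts.  */
static int is_simm12 (int64_t v)
{
  return v >= -2048 && v <= 2047;
}

/* Model of the two-instruction sequence: zext.w clears bits 32..63,
   then the AND uses the sign-extended constant.  */
static uint64_t and_via_zext (uint64_t x, uint64_t mask)
{
  uint64_t zext = x & 0xffffffffu;           /* zext.w */
  return zext & (uint64_t) sext32 (mask);    /* andi or bclri */
}
```

For any mask whose upper 32 bits are zero, and_via_zext (x, mask) equals x & mask: sign extension only changes bits 32..63 of the constant, and zext.w has already cleared those bits in x. With mask 0xfffffffe the adjusted constant is -2, which fits andi; with 0xbfffffff it does not fit a simm12, but it clears a single bit, which is the bclri case.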
[PATCH v2 1/2] RISC-V: avoid LUI based const materialization ... [part of PR/106265]
Apologies for the delay in getting this out. I needed to fix one ICE with the glibc build and do a fresh round of testing: both testsuite and SPEC runs (which are similar to v1 in terms of Cactu gains, but with some more minor regressions elsewhere, e.g. gcc). Again, those seem so small that IMHO this should still go in. I'll investigate those next, as well as an existing weirdness in glibc tempnam which I spotted during debugging.

Changes since v1 [1]
 - Tighten the main condition to avoid stack regs as destination
   (to avoid making them potentially unaligned with -2047 addend:
    this might be OK execution/ABI wise, but still undesirable/ugly,
    especially when coming from compiler codegen).
 - Ensure that the first alternative is always split.
 - Remove "&& 1" from the split condition. That was tripping up the glibc
   build with illegal operands `add s0, s0, 2048`.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647877.html

---

... if the constant can be represented as sum of two S12 values. The two S12 values could instead be fused with the subsequent ADD insn. This helps
 - avoid an additional LUI insn
 - side benefits of not clobbering a reg

e.g.
                            w/o patch             w/ patch
long                      |                     |
plus(unsigned long i)     | li   a5,4096        |
{                         | addi a5,a5,-2032    | addi a0, a0, 2047
   return i + 2064;       | add  a0,a0,a5       | addi a0, a0, 17
}                         | ret                 | ret

NOTE: In theory not having the const in a standalone reg might seem less CSE friendly, but for the workloads in consideration these materializations come from very late LRA reloads and follow-on GCSE is not doing much currently.

The real benefit however is seen in base+offset computation for array accesses and especially for stack accesses which are finalized late in the optimization pipeline, during LRA register allocation. Often the finalized offsets trigger LRA reloads, resulting in mind-boggling repetition of the exact same insn sequence including LUI based constant materialization.

This shaves off 290 billion dynamic instructions (QEMU icounts) in the SPEC 2017 Cactu benchmark, which is over 10% of the workload. In the rest of the suite, there are an additional 10 billion shaved, with both gains and losses in individual workloads as is usual with compiler changes.

 500.perlbench_r-0 | 1,214,534,029,025 | 1,212,887,959,387 |
 500.perlbench_r-1 |   740,383,419,739 |   739,280,308,163 |
 500.perlbench_r-2 |   692,074,638,817 |   691,118,734,547 |
 502.gcc_r-0       |   190,820,141,435 |   190,857,065,988 |
 502.gcc_r-1       |   225,747,660,839 |   225,809,444,357 | <- -0.02%
 502.gcc_r-2       |   220,370,089,641 |   220,406,367,876 | <- -0.03%
 502.gcc_r-3       |   179,111,460,458 |   179,135,609,723 | <- -0.02%
 502.gcc_r-4       |   219,301,546,340 |   219,320,416,956 | <- -0.01%
 503.bwaves_r-0    |   278,733,324,691 |   278,733,323,575 | <- -0.01%
 503.bwaves_r-1    |   442,397,521,282 |   442,397,519,616 |
 503.bwaves_r-2    |   344,112,218,206 |   344,112,216,760 |
 503.bwaves_r-3    |   417,561,469,153 |   417,561,467,597 |
 505.mcf_r         |   669,319,257,525 |   669,318,763,084 |
 507.cactuBSSN_r   | 2,852,767,394,456 | 2,564,736,063,742 | <+ 10.10%
 508.namd_r        | 1,855,884,342,110 | 1,855,881,110,934 |
 510.parest_r      | 1,654,525,521,053 | 1,654,402,859,174 |
 511.povray_r      | 2,990,146,655,619 | 2,990,060,324,589 |
 519.lbm_r         | 1,158,337,294,525 | 1,158,337,294,529 |
 520.omnetpp_r     | 1,021,765,791,283 | 1,026,165,661,394 |
 521.wrf_r         | 1,715,955,652,503 | 1,714,352,737,385 |
 523.xalancbmk_r   |   849,846,008,075 |   849,836,851,752 |
 525.x264_r-0      |   277,801,762,763 |   277,488,776,427 |
 525.x264_r-1      |   927,281,789,540 |   926,751,516,742 |
 525.x264_r-2      |   915,352,631,375 |   914,667,785,953 |
 526.blender_r     | 1,652,839,180,887 | 1,653,260,825,512 |
 527.cam4_r        | 1,487,053,494,925 | 1,484,526,670,770 |
 531.deepsjeng_r   | 1,641,969,526,837 | 1,642,126,598,866 |
 538.imagick_r     | 2,098,016,546,691 | 2,097,997,929,125 |
 541.leela_r       | 1,983,557,323,877 | 1,983,531,314,526 |
 544.nab_r         | 1,516,061,611,233 | 1,516,061,407,715 |
 548.exchange2_r   | 2,072,594,330,215 | 2,072,591,648,318 |
 549.fotonik3d_r   | 1,001,499,307,366 | 1,001,478,944,189 |
 554.roms_r        | 1,028,799,739,111 | 1,028,780,904,061 |
 557.xz_r-0        |   363,827,039,684 |   363,057,014,260 |
 557.xz_r-1        |   906,649,112,601 |   905,928,888,732 |
 557.xz_r-2        |   509,023,898,187 |   508,140,356,932 |
 997.specrand_fr   |       402,535,577 |       403,052,561 |
 999.specrand_ir   |       402,535,577 |       403,052,561 |

This should still be considered damage control, as the real/deeper fix would be to reduce the number of LRA reloads or CSE/anchor those during LRA constraint sub-pass (re)runs (that's a different PR/114729).

Implementation Details (for posterity)
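As a rough illustration of the "sum of two S12 values" idea, the sketch below splits a constant into two addends that each fit a signed 12-bit immediate. The function split_two_s12 is hypothetical and written only for this note; it is not the predicate or helper the patch adds, and it ignores the extra stack-register restriction mentioned in the v2 changes.

```c
#include <stdbool.h>

/* Try to write VAL as A + B with both parts in the signed 12-bit
   range [-2048, 2047].  Returns false if no such pair exists.  */
bool split_two_s12 (long val, long *a, long *b)
{
  if (val > 2047)
    *a = 2047;            /* take the largest positive step first */
  else if (val < -2048)
    *a = -2048;           /* take the largest negative step first */
  else
    *a = val;             /* a single immediate already fits */
  *b = val - *a;
  return *b >= -2048 && *b <= 2047;
}
```

For the value in the mail's example, 2064 splits into 2047 and 17, matching the two addi instructions shown in the "w/ patch" column.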
[PATCH v2 2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]
If the constant used for the stack offset can be expressed as a sum of two S12 values, the constant need not be materialized (in a reg); instead the two S12 values can be added to the instructions involved with the frame pointer. This avoids burning a register and, more importantly, can often get down to 2 insns vs. 3.

The previous patches to generally avoid LUI based const materialization didn't fix this PR, which needs this directed fix in function prologue/epilogue expansion.

This fix doesn't move the needle for SPEC at all, but it is still a win considering gcc generates one insn fewer than llvm for the test ;-)

   gcc-13.1 release   |      gcc 230823      |                   |
                      |    g6619b3d4c15c     |   This patch      |  clang/llvm
 ---------------------------------------------------------------------------------
 li   t0,-4096        | li   t0,-4096        | addi sp,sp,-2048  | addi sp,sp,-2048
 addi t0,t0,2016      | addi t0,t0,2032      | add  sp,sp,-16    | addi sp,sp,-32
 li   a4,4096         | add  sp,sp,t0        | add  a5,sp,a0     | add  a1,sp,16
 add  sp,sp,t0        | addi a5,sp,-2032     | sb   zero,0(a5)   | add  a0,a0,a1
 li   a5,-4096        | add  a0,a5,a0        | addi sp,sp,2032   | sb   zero,0(a0)
 addi a4,a4,-2032     | li   t0, 4096        | addi sp,sp,32     | addi sp,sp,2032
 add  a4,a4,a5        | sb   zero,2032(a0)   | ret               | addi sp,sp,48
 addi a5,sp,16        | addi t0,t0,-2032     |                   | ret
 add  a5,a4,a5        | add  sp,sp,t0        |
 add  a0,a5,a0        | ret                  |
 li   t0,4096         |
 sd   a5,8(sp)        |
 sb   zero,2032(a0)   |
 addi t0,t0,-2016     |
 add  sp,sp,t0        |
 ret                  |

gcc/ChangeLog:
	PR target/105733
	* config/riscv/riscv.h: New macros for with aligned offsets.
	* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New function
	to split a sum of two s12 values into constituents.
	(riscv_expand_prologue): Handle offset being sum of two S12.
	(riscv_expand_epilogue): Ditto.
	* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
	* gcc.target/riscv/pr105733.c: New Test.
	* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
	expect LUI 4096.
	* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto. Signed-off-by: Vineet Gupta --- gcc/config/riscv/riscv-protos.h | 2 + gcc/config/riscv/riscv.cc | 74 +-- gcc/config/riscv/riscv.h | 7 ++ gcc/testsuite/gcc.target/riscv/pr105733.c | 15 .../riscv/rvv/autovec/vls/spill-1.c | 4 +- .../riscv/rvv/autovec/vls/spill-2.c | 4 +- .../riscv/rvv/autovec/vls/spill-3.c | 4 +- .../riscv/rvv/autovec/vls/spill-4.c | 4 +- .../riscv/rvv/autovec/vls/spill-5.c | 4 +- .../riscv/rvv/autovec/vls/spill-6.c | 4 +- .../riscv/rvv/autovec/vls/spill-7.c | 4 +- 11 files changed, 105 insertions(+), 21 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/pr105733.c diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 706dc204e643..6da6ae4d041f 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -166,6 +166,8 @@ extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *); extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *); extern enum memmodel riscv_union_memmodels (enum memmodel, enum memmodel); extern bool riscv_reg_frame_related (rtx); +extern void riscv_split_sum_of_two_s12 (HOST_WIDE_INT, HOST_WIDE_INT *, + HOST_WIDE_INT *); /* Routines implemented in riscv-c.cc. */ void riscv_cpu_cpp_builtins (cpp_reader *); diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 4067505270e1..4b742489b272 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -4063,6 +4063,32 @@ riscv_split_doubleword_move (rtx dest, rtx src) riscv_emit_move (riscv_subword (dest, true), riscv_subword (src, true)); } } + +/* Constant VAL is known to be sum of two S12 constants. Break it into + comprising BASE and OFF. + Numerically S12 is -2048 to 2047, however it uses the more conservative + range -2048 to 2032 as offsets pertain to stack related registers. 
*/ + +void +riscv_split_sum_of_two_s12 (HOST_WIDE_INT val, HOST_WIDE_INT *base, + HOST_WIDE_INT *off) +{ + if (SUM_OF_TWO_S12_N (val)) +{ + *base = -2048; + *off = val - (-2048); +} + else if (SUM_OF_TWO_S12_P_ALGN (val)) +{ + *base =
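The conservative range mentioned in the comment above (-2048 to 2032) can be illustrated with a host-side sketch. It assumes a 16-byte stack alignment, so the largest aligned positive addi step is 2032 rather than 2047. The function split_stack_adjust is a hypothetical stand-in written for this note; it is not the body of riscv_split_sum_of_two_s12, which is cut off in this excerpt.

```c
#include <stdbool.h>

/* Split a 16-byte-aligned stack adjustment VAL into BASE + OFF, where
   BASE is the largest aligned step one addi can take (-2048 going
   down, 2032 going up) and OFF is the aligned remainder.  Values a
   single addi already handles, or that two steps cannot reach, are
   rejected.  */
bool split_stack_adjust (long val, long *base, long *off)
{
  if (val % 16 != 0)
    return false;
  if (val >= -4096 && val < -2048)
    *base = -2048;
  else if (val > 2032 && val <= 4064)
    *base = 2032;
  else
    return false;
  *off = val - *base;
  return true;
}
```

With the 2064-byte frame from the comparison in this mail, the allocation splits into -2048 and -16 and the release into 2032 and 32, which are the immediates shown in the "This patch" column.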
[PATCH v2 0/2] RISC-V improve stack/array access by constant mat tweak
Hi,

This set of patches helps improve stack/array accesses by improving constant materialization. Details are in the respective patches.

The first patch is the main change, which improves SPEC cactu by 10%.

As discussed/agreed for v1 [1], I've dropped the splitter variant for stack accesses. I also have a few follow-ups which I'll come back to separately.

Thx,
-Vineet

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647874.html

Vineet Gupta (2):
  RISC-V: avoid LUI based const materialization ... [part of PR/106265]
  RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

 gcc/config/riscv/constraints.md                     |  6 ++
 gcc/config/riscv/predicates.md                      |  6 ++
 gcc/config/riscv/riscv-protos.h                     |  3 +
 gcc/config/riscv/riscv.cc                           | 85 +--
 gcc/config/riscv/riscv.h                            | 22 +
 gcc/config/riscv/riscv.md                           | 40 +
 gcc/testsuite/gcc.target/riscv/pr105733.c           | 15
 .../riscv/rvv/autovec/vls/spill-1.c                 |  4 +-
 .../riscv/rvv/autovec/vls/spill-2.c                 |  4 +-
 .../riscv/rvv/autovec/vls/spill-3.c                 |  4 +-
 .../riscv/rvv/autovec/vls/spill-4.c                 |  4 +-
 .../riscv/rvv/autovec/vls/spill-5.c                 |  4 +-
 .../riscv/rvv/autovec/vls/spill-6.c                 |  4 +-
 .../riscv/rvv/autovec/vls/spill-7.c                 |  4 +-
 .../gcc.target/riscv/sum-of-two-s12-const-1.c       | 45 ++
 .../gcc.target/riscv/sum-of-two-s12-const-2.c       | 15
 .../gcc.target/riscv/sum-of-two-s12-const-3.c       | 22 +
 17 files changed, 266 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr105733.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sum-of-two-s12-const-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sum-of-two-s12-const-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sum-of-two-s12-const-3.c

-- 
2.34.1
Re: [PATCH] rs6000: Enable overlapped by-pieces operations
Hi,

on 2024/5/9 15:35, HAO CHEN GUI wrote:
> Hi Kewen,
>   Thanks for your comments.
>
> On 2024/5/9 13:44, Kewen.Lin wrote:
>> Hi,
>>
>> on 2024/5/8 14:47, HAO CHEN GUI wrote:
>>> Hi,
>>>   This patch enables overlapped by-piece operations. On rs6000, default
>>> move/set/clear ratio is 2. So the overlap is only enabled with compare
>>> by-pieces.
>>
>> Thanks for enabling this, did you evaluate if it can help some benchmark?
>
> Tested it with SPEC2017. No obvious performance impact. I think memory
> compare might not be hot enough.
>
> Tested it with my micro benchmark. 5-10% performance gain when compare
> length is 7.

Nice!

>
>>
>>>
>>> Bootstrapped and tested on powerpc64-linux BE and LE with no
>>> regressions. Is it OK for the trunk?
>>>
>>> Thanks
>>> Gui Haochen
>>>
>>> ChangeLog
>>> rs6000: Enable overlapped by-pieces operations
>>>
>>> This patch enables overlapped by-piece operations by defining
>>> TARGET_OVERLAP_OP_BY_PIECES_P to true. On rs6000, default move/set/clear
>>> ratio is 2. So the overlap is only enabled with compare by-pieces.
>>>
>>> gcc/
>>> 	* config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.
>>>
>>> gcc/testsuite/
>>> 	* gcc.target/powerpc/block-cmp-9.c: New.
>>>
>>>
>>> patch.diff
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 6b9a40fcc66..2b5f5cf1d86 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const rs6000_attribute_table[] =
>>>  #undef TARGET_CONST_ANCHOR
>>>  #define TARGET_CONST_ANCHOR 0x8000
>>>
>>> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
>>> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
>>> +
>>>
>>>
>>>  /* Processor table.  */
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>>> new file mode 100644
>>> index 000..b5f51affbb7
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>>> @@ -0,0 +1,11 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
>>
>> Why does it need power8 forced here?
>
> I just want to exclude P7 LE as targetm.slow_unaligned_access return false
> for it and the expand cmpmemsi won't be invoked.
>   I think it over. It's no need. For the sub-targets which library is
> called, l[hb]z won't be generated too.

Thanks for checking, OK with dropping this forced power8.

BR,
Kewen

>
>>
>> BR,
>> Kewen
>>
>>> +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
>>> +
>>> +/* Test if by-piece overlap compare is enabled and following case is
>>> +   implemented by two overlap word loads and compares.  */
>>> +
>>> +int foo (const char* s1, const char* s2)
>>> +{
>>> +  return __builtin_memcmp (s1, s2, 7) == 0;
>>> +}
>>
>
> Thanks
> Gui Haochen
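To make "two overlap word loads and compares" concrete, here is a portable sketch of what an overlapped 7-byte equality test does. The function name equal7_overlapped is made up for this illustration, and the code is a source-level model rather than the RTL the by-pieces expander generates.

```c
#include <stdint.h>
#include <string.h>

/* Equality test for 7 bytes using two overlapping 4-byte loads:
   bytes 0..3 and bytes 3..6.  Byte 3 is compared twice, which is
   harmless for an equality test.  */
int equal7_overlapped (const char *s1, const char *s2)
{
  uint32_t a0, b0, a1, b1;
  memcpy (&a0, s1, 4);        /* first word of each buffer */
  memcpy (&b0, s2, 4);
  memcpy (&a1, s1 + 3, 4);    /* second word, overlapping by one byte */
  memcpy (&b1, s2 + 3, 4);
  return a0 == b0 && a1 == b1;
}
```

Without overlap, the same 7 bytes would be covered by a word, a halfword and a byte access on each side, which is why the test scans for the absence of lhz/lbz.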
Re: [COMMITTED 2/5] Fix ranger when called from SCEV.
On Mon, 2024-05-13 20:19:42 +0200, Jan-Benedict Glaw wrote:
> On Tue, 2024-04-30 17:24:15 -0400, Andrew MacLeod wrote:
> > Bootstrapped on x86_64-pc-linux-gnu with no regressions. pushed.
>
> Starting with this patch (upstream as
> e8ae56a7dc46e39a48017bb5159e4dc672ec7fad, can still be reproduced with
> 0c585c8d0dd85601a8d116ada99126a48c8ce9fd as of May 13th), my CI builds fail for
> csky-elf in all-target-libgcc by falling into an infinite loop:
>
> ../gcc/configure '--with-pkgversion=basepoints/gcc-15-432-g0c585c8d0dd, built at 1715608899' \
> 	--prefix=/tmp/gcc-csky-elf --enable-werror-always --enable-languages=all \
> 	--disable-gcov --disable-shared --disable-threads --target=csky-elf --without-headers
> make V=1 all-gcc
> make V=1 install-strip-gcc
> make V=1 all-target-libgcc

Just to add:

/var/lib/laminar/run/gcc-csky-elf/65/toolchain-build/./gcc/cc1 -quiet \
	-I . -I . -I ../../.././gcc -I ../../../../gcc/libgcc \
	-I ../../../../gcc/libgcc/. -I ../../../../gcc/libgcc/../gcc \
	-I ../../../../gcc/libgcc/../include -imultilib ck801 \
	-iprefix /var/lib/laminar/run/gcc-csky-elf/65/toolchain-build/gcc/../lib/gcc/csky-elf/15.0.0/ \
	-isystem /var/lib/laminar/run/gcc-csky-elf/65/toolchain-build/./gcc/include \
	-isystem /var/lib/laminar/run/gcc-csky-elf/65/toolchain-build/./gcc/include-fixed \
	-MD unwind-dw2-fde.d -MF unwind-dw2-fde.dep -MP -MT unwind-dw2-fde.o \
	-D IN_GCC -D CROSS_DIRECTORY_STRUCTURE -D IN_LIBGCC2 -D inhibit_libc \
	-D HAVE_CC_TLS -D USE_EMUTLS -D HIDE_EXPORTS \
	-isystem /tmp/gcc-csky-elf/csky-elf/include \
	-isystem /tmp/gcc-csky-elf/csky-elf/sys-include \
	-isystem ./include ../../../../gcc/libgcc/unwind-dw2-fde.c -quiet \
	-dumpbase unwind-dw2-fde.c -dumpbase-ext .c -mcpu=ck801 -g -g -g -O2 -O2 -O2 \
	-Wextra -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes \
	-Wmissing-prototypes -Wold-style-definition -fbuilding-libgcc -fno-stack-protector \
	-fexceptions -fvisibility=hidden -o /tmp/cc3SHedS.s

> (gdb) bt
> #0  0x0098f1df in bitmap_list_find_element (head=0x38f2e18, indx=5001) at ../../gcc/gcc/bitmap.cc:375
> #1  bitmap_set_bit (head=0x38f2e18, bit=640244) at ../../gcc/gcc/bitmap.cc:962
> #2  0x00d39cd1 in process_bb_lives (bb=<optimized out>, curr_point=@0x7ffe062c1b2c: 3039473, dead_insn_p=<optimized out>) at ../../gcc/gcc/lra-lives.cc:889
> #3  lra_create_live_ranges_1 (all_p=all_p@entry=true, dead_insn_p=<optimized out>) at ../../gcc/gcc/lra-lives.cc:1416
> #4  0x00d3b810 in lra_create_live_ranges (all_p=all_p@entry=true, dead_insn_p=<optimized out>) at ../../gcc/gcc/lra-lives.cc:1486
> #5  0x00d1a8bd in lra (f=<optimized out>, verbose=<optimized out>) at ../../gcc/gcc/lra.cc:2482
> #6  0x00cd0e18 in do_reload () at ../../gcc/gcc/ira.cc:5973
> #7  (anonymous namespace)::pass_reload::execute (this=<optimized out>) at ../../gcc/gcc/ira.cc:6161
> #8  0x00de6368 in execute_one_pass (pass=pass@entry=0x367c490) at ../../gcc/gcc/passes.cc:2647
> #9  0x00de6c00 in execute_pass_list_1 (pass=0x367c490) at ../../gcc/gcc/passes.cc:2756
> #10 0x00de6c12 in execute_pass_list_1 (pass=0x367b2f0) at ../../gcc/gcc/passes.cc:2757
> #11 0x00de6c39 in execute_pass_list (fn=0x7f24a1c06240, pass=<optimized out>) at ../../gcc/gcc/passes.cc:2767
> #12 0x00a188c6 in cgraph_node::expand (this=0x7f24a1bfaaa0) at ../../gcc/gcc/context.h:48
> #13 cgraph_node::expand (this=0x7f24a1bfaaa0) at ../../gcc/gcc/cgraphunit.cc:1798
> #14 0x00a1a69b in expand_all_functions () at ../../gcc/gcc/cgraphunit.cc:2028
> #15 symbol_table::compile (this=0x7f24a205b000) at ../../gcc/gcc/cgraphunit.cc:2404
> #16 0x00a1ccb8 in symbol_table::compile (this=0x7f24a205b000) at ../../gcc/gcc/cgraphunit.cc:2315
> #17 symbol_table::finalize_compilation_unit (this=0x7f24a205b000) at ../../gcc/gcc/cgraphunit.cc:2589
> #18 0x00f0932d in compile_file () at ../../gcc/gcc/toplev.cc:476
> #19 0x00839648 in do_compile () at ../../gcc/gcc/toplev.cc:2158
> #20 toplev::main (this=this@entry=0x7ffe062c1f2e, argc=<optimized out>, argc@entry=78, argv=<optimized out>, argv@entry=0x7ffe062c2058) at ../../gcc/gcc/toplev.cc:2314
> #21 0x0083ad9e in main (argc=78, argv=0x7ffe062c2058) at ../../gcc/gcc/main.cc:39
>
> (Loop is based in process_bb_lives(), looping in the
> FOR_BB_INSNS_REVERSE_SAFE (bb, curr_insn, next) block starting at
> about line 696.)

Kind regards,
JBG
Re: [COMMITTED 2/5] Fix ranger when called from SCEV.
On Tue, 2024-04-30 17:24:15 -0400, Andrew MacLeod wrote:
> Bootstrapped on x86_64-pc-linux-gnu with no regressions. pushed.

Starting with this patch (upstream as
e8ae56a7dc46e39a48017bb5159e4dc672ec7fad, can still be reproduced with
0c585c8d0dd85601a8d116ada99126a48c8ce9fd as of May 13th), my CI builds fail for
csky-elf in all-target-libgcc by falling into an infinite loop:

../gcc/configure '--with-pkgversion=basepoints/gcc-15-432-g0c585c8d0dd, built at 1715608899' \
	--prefix=/tmp/gcc-csky-elf --enable-werror-always --enable-languages=all \
	--disable-gcov --disable-shared --disable-threads --target=csky-elf --without-headers
make V=1 all-gcc
make V=1 install-strip-gcc
make V=1 all-target-libgcc

(gdb) bt
#0  0x0098f1df in bitmap_list_find_element (head=0x38f2e18, indx=5001) at ../../gcc/gcc/bitmap.cc:375
#1  bitmap_set_bit (head=0x38f2e18, bit=640244) at ../../gcc/gcc/bitmap.cc:962
#2  0x00d39cd1 in process_bb_lives (bb=<optimized out>, curr_point=@0x7ffe062c1b2c: 3039473, dead_insn_p=<optimized out>) at ../../gcc/gcc/lra-lives.cc:889
#3  lra_create_live_ranges_1 (all_p=all_p@entry=true, dead_insn_p=<optimized out>) at ../../gcc/gcc/lra-lives.cc:1416
#4  0x00d3b810 in lra_create_live_ranges (all_p=all_p@entry=true, dead_insn_p=<optimized out>) at ../../gcc/gcc/lra-lives.cc:1486
#5  0x00d1a8bd in lra (f=<optimized out>, verbose=<optimized out>) at ../../gcc/gcc/lra.cc:2482
#6  0x00cd0e18 in do_reload () at ../../gcc/gcc/ira.cc:5973
#7  (anonymous namespace)::pass_reload::execute (this=<optimized out>) at ../../gcc/gcc/ira.cc:6161
#8  0x00de6368 in execute_one_pass (pass=pass@entry=0x367c490) at ../../gcc/gcc/passes.cc:2647
#9  0x00de6c00 in execute_pass_list_1 (pass=0x367c490) at ../../gcc/gcc/passes.cc:2756
#10 0x00de6c12 in execute_pass_list_1 (pass=0x367b2f0) at ../../gcc/gcc/passes.cc:2757
#11 0x00de6c39 in execute_pass_list (fn=0x7f24a1c06240, pass=<optimized out>) at ../../gcc/gcc/passes.cc:2767
#12 0x00a188c6 in cgraph_node::expand (this=0x7f24a1bfaaa0) at ../../gcc/gcc/context.h:48
#13 cgraph_node::expand (this=0x7f24a1bfaaa0) at ../../gcc/gcc/cgraphunit.cc:1798
#14 0x00a1a69b in expand_all_functions () at ../../gcc/gcc/cgraphunit.cc:2028
#15 symbol_table::compile (this=0x7f24a205b000) at ../../gcc/gcc/cgraphunit.cc:2404
#16 0x00a1ccb8 in symbol_table::compile (this=0x7f24a205b000) at ../../gcc/gcc/cgraphunit.cc:2315
#17 symbol_table::finalize_compilation_unit (this=0x7f24a205b000) at ../../gcc/gcc/cgraphunit.cc:2589
#18 0x00f0932d in compile_file () at ../../gcc/gcc/toplev.cc:476
#19 0x00839648 in do_compile () at ../../gcc/gcc/toplev.cc:2158
#20 toplev::main (this=this@entry=0x7ffe062c1f2e, argc=<optimized out>, argc@entry=78, argv=<optimized out>, argv@entry=0x7ffe062c2058) at ../../gcc/gcc/toplev.cc:2314
#21 0x0083ad9e in main (argc=78, argv=0x7ffe062c2058) at ../../gcc/gcc/main.cc:39

(Loop is based in process_bb_lives(), looping in the
FOR_BB_INSNS_REVERSE_SAFE (bb, curr_insn, next) block starting at
about line 696.)

Kind regards,
JBG
Re: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 scalar
On 5/13/24 9:00 AM, Li, Pan2 wrote:
> Committed, thanks Juzhe and Kito. Let's wait for a while before backport to 14.

Could you fix the formatting nits caught by the CI linter?

=== ERROR type #1: trailing operator (4 error(s)) ===
gcc/config/riscv/riscv-vector-builtins.cc:4641:39:  if ((exts & RVV_REQUIRE_ELEN_FP_16) &&
gcc/config/riscv/riscv-vector-builtins.cc:4651:39:  if ((exts & RVV_REQUIRE_ELEN_FP_32) &&
gcc/config/riscv/riscv-vector-builtins.cc:4661:39:  if ((exts & RVV_REQUIRE_ELEN_FP_64) &&
gcc/config/riscv/riscv-vector-builtins.cc:4670:36:  if ((exts & RVV_REQUIRE_ELEN_64) &&

The "&&" needs to come down to the next line, indented like

  if ((exts & RVV_REQUIRE_ELEN_FP_16)
      && !TARGET_VECTOR_.)

I.e., the "&&" indents just inside the first open paren. It looks like all the conditions in validate_instance_type_required_extensions need to be fixed in a similar manner.

Given this is NFC, just post it for the archiver. No need to wait on review.

Jeff
[COMMITTED][GCC12] Backport of 111009 patch.
Same patch for gcc12. bootstraps and passes all tests on x86_64-pc-linux-gnu On 5/9/24 10:32, Andrew MacLeod wrote: As requested, backported the patch for 111009 to resolve incorrect ranges from addr_expr and committed to GCC 13 branch. bootstraps and passes all tests on x86_64-pc-linux-gnu Andrewcommit b5d079c37e9eee15c0bfe34ffcae31e551192777 Author: Andrew MacLeod Date: Fri May 10 13:56:01 2024 -0400 Fix range-ops operator_addr. Lack of symbolic information prevents op1_range from being able to draw the same conclusions as fold_range can. PR tree-optimization/111009 gcc/ * range-op.cc (operator_addr_expr::op1_range): Be more restrictive. * value-range.h (contains_zero_p): New. gcc/testsuite/ * gcc.dg/pr111009.c: New. diff --git a/gcc/range-op.cc b/gcc/range-op.cc index bf95f5fbaa1..2e0d67b70b6 100644 --- a/gcc/range-op.cc +++ b/gcc/range-op.cc @@ -3825,7 +3825,17 @@ operator_addr_expr::op1_range (irange , tree type, const irange , relation_kind rel ATTRIBUTE_UNUSED) const { - return operator_addr_expr::fold_range (r, type, lhs, op2); + if (empty_range_varying (r, type, lhs, op2)) +return true; + + // Return a non-null pointer of the LHS type (passed in op2), but only + // if we cant overflow, eitherwise a no-zero offset could wrap to zero. + // See PR 111009. 
+ if (!contains_zero_p (lhs) && TYPE_OVERFLOW_UNDEFINED (type)) +r = range_nonzero (type); + else +r.set_varying (type); + return true; } diff --git a/gcc/testsuite/gcc.dg/pr111009.c b/gcc/testsuite/gcc.dg/pr111009.c new file mode 100644 index 000..3accd9ac063 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr111009.c @@ -0,0 +1,38 @@ +/* PR tree-optimization/111009 */ +/* { dg-do run } */ +/* { dg-options "-O3 -fno-strict-overflow" } */ + +struct dso { + struct dso * next; + int maj; +}; + +__attribute__((noipa)) static void __dso_id__cmp_(void) {} + +__attribute__((noipa)) +static int bug(struct dso * d, struct dso *dso) +{ + struct dso **p = &d; + struct dso *curr = 0; + + while (*p) { + curr = *p; + // prevent null deref below + if (!dso) return 1; + if (dso == curr) return 1; + + int *a = &curr->maj; + // null deref + if (!(a && *a)) __dso_id__cmp_(); + + p = &curr->next; + } + return 0; +} + +__attribute__((noipa)) +int main(void) { +struct dso d = { 0, 0, }; +bug(&d, 0); +} + diff --git a/gcc/value-range.h b/gcc/value-range.h index d4cba22d540..22f5fc68d7c 100644 --- a/gcc/value-range.h +++ b/gcc/value-range.h @@ -605,6 +605,16 @@ irange::normalize_kind () } } +inline bool +contains_zero_p (const irange &r) +{ + if (r.undefined_p ()) +return false; + + tree zero = build_zero_cst (r.type ()); + return r.contains_p (zero); +} + // Return the maximum value for TYPE. inline tree
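The decision the new op1_range makes can be modelled with a toy interval type. This is only a sketch under stated assumptions: `toy_range`, `operand_range`, `solve_addr_operand` and the `overflow_undefined` flag are names made up here to stand in for irange, the resulting range, operator_addr_expr::op1_range and TYPE_OVERFLOW_UNDEFINED. The early empty_range_varying check from the patch is left out, so the toy should not be fed an undefined range.

```cpp
#include <cstdint>

// Toy stand-in for irange: a closed interval of unsigned values.
struct toy_range
{
  bool undefined;
  std::uint64_t lo, hi;
};

// What we may conclude about the operand of the address expression.
enum class operand_range { nonzero, varying };

// Mirrors contains_zero_p from the patch: an undefined range contains nothing.
inline bool
contains_zero_p (const toy_range &r)
{
  if (r.undefined)
    return false;
  // For a closed unsigned interval, zero is a member only if lo is zero.
  return r.lo == 0;
}

// Given the range of LHS = &op1->field, decide what is known about op1.
inline operand_range
solve_addr_operand (const toy_range &lhs, bool overflow_undefined)
{
  // A nonzero LHS proves a nonzero operand only when the offset
  // addition cannot wrap around to zero.
  if (!contains_zero_p (lhs) && overflow_undefined)
    return operand_range::nonzero;
  return operand_range::varying;
}
```

In this model, a build where pointer arithmetic may wrap (the testcase uses -fno-strict-overflow) corresponds to `overflow_undefined == false`, so the operand stays varying and the null check in `bug` cannot be removed.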
[COMMITTED] c++: Avoid using __array_rank as a variable name [PR115061]
Pushed as obvious. -- >8 -- This patch fixes a compilation error when building GCC using Clang. Since __array_rank is used as a built-in trait name, use rank instead. PR c++/115061 gcc/cp/ChangeLog: * semantics.cc (finish_trait_expr): Use rank instead of __array_rank. Signed-off-by: Ken Matsui --- gcc/cp/semantics.cc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 43b175f92fd..df62e2d80db 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -12914,10 +12914,10 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2) tree val; if (kind == CPTK_RANK) { - size_t __array_rank = 0; + size_t rank = 0; for (; TREE_CODE (type1) == ARRAY_TYPE; type1 = TREE_TYPE (type1)) - ++__array_rank; - val = build_int_cst (size_type_node, __array_rank); + ++rank; + val = build_int_cst (size_type_node, rank); } else val = (trait_expr_value (kind, type1, type2) -- 2.44.0
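For reference, the loop touched by this patch counts how many times an array type can be peeled. A rough user-level analogue (C++17) is sketched below; `array_rank_of` is a name invented for this example, deliberately without leading underscores since such identifiers are reserved for the implementation, which is what made `__array_rank` clash with Clang's built-in trait. The result is cross-checked against `std::rank_v`.

```cpp
#include <cstddef>
#include <type_traits>

// Count array dimensions by peeling one extent per step, in the same
// spirit as the for-loop over ARRAY_TYPE in finish_trait_expr.
template <typename T>
constexpr std::size_t
array_rank_of ()
{
  if constexpr (std::is_array_v<T>)
    return 1 + array_rank_of<std::remove_extent_t<T>> ();
  else
    return 0;
}

// Compile-time cross-checks against the standard trait.
static_assert (array_rank_of<int> () == 0);
static_assert (array_rank_of<int[2][3]> () == std::rank_v<int[2][3]>);
```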
Re: [PATCH] c++: Avoid using __array_rank as a variable name [PR115061]
On Mon, May 13, 2024 at 8:19 AM Marek Polacek wrote: > > On Sun, May 12, 2024 at 11:48:07PM -0700, Ken Matsui wrote: > > This patch fixes a compilation error when building GCC using Clang. > > Since __array_rank is used as a built-in trait name, use rank instead. > > I think you can go ahead and push this patch as obvious, thanks. Oh, I see. Thank you for letting me know! > > > PR c++/115061 > > > > gcc/cp/ChangeLog: > > > > * semantics.cc (finish_trait_expr): Use rank instead of > > __array_rank. > > > > Signed-off-by: Ken Matsui > > --- > > gcc/cp/semantics.cc | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc > > index 43b175f92fd..df62e2d80db 100644 > > --- a/gcc/cp/semantics.cc > > +++ b/gcc/cp/semantics.cc > > @@ -12914,10 +12914,10 @@ finish_trait_expr (location_t loc, cp_trait_kind > > kind, tree type1, tree type2) > >tree val; > >if (kind == CPTK_RANK) > > { > > - size_t __array_rank = 0; > > + size_t rank = 0; > >for (; TREE_CODE (type1) == ARRAY_TYPE; type1 = TREE_TYPE (type1)) > > - ++__array_rank; > > - val = build_int_cst (size_type_node, __array_rank); > > + ++rank; > > + val = build_int_cst (size_type_node, rank); > > } > >else > > val = (trait_expr_value (kind, type1, type2) > > -- > > 2.44.0 > > > > Marek >
[r15-429 Regression] FAIL: experimental/simd/pr109261_constexpr_simd.cc -msse2 -O2 -Wno-psabi (test for excess errors) on Linux/x86_64
On Linux/x86_64, fb1649f8b4ad5043dd0e65e4e3a643a0ced018a9 is the first bad commit commit fb1649f8b4ad5043dd0e65e4e3a643a0ced018a9 Author: Matthias Kretz Date: Mon May 6 12:13:55 2024 +0200 libstdc++: Use __builtin_shufflevector for simd split and concat caused FAIL: experimental/simd/pr109261_constexpr_simd.cc -msse2 -O2 -Wno-psabi (test for excess errors) with GCC configured with ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-429/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check RUNTESTFLAGS="conformance.exp=experimental/simd/pr109261_constexpr_simd.cc --target_board='unix{-m32}'" (Please do not reply to this email; for questions about this report, contact me at haochen dot jiang at intel.com.) (If you run into Cascade Lake related problems, disabling AVX512F on the command line might help.) (However, please make sure that there are no potential problems with AVX512.)
Re: [PATCH v1 3/3] RISC-V: Enable vectorizable early exit test
Hi Pan, > > @@ -4114,6 +4115,7 @@ proc check_effective_target_vect_early_break_hw { } { > || [check_effective_target_arm_v8_neon_hw] > || [check_sse4_hw_available] > || [istarget amdgcn-*-*] > + || [check_effective_target_riscv_v] > }}] > } I believe this should be riscv_v_ok. riscv_v only checks if we can compile. OK with that changed after 2/3 is in. Regards Robin
[PATCH] Match: optimize `a == CST & unary(a)` [PR111487]
This expands the `a == CST & a` optimization to handle more than just casts: it adds the optimization for unary operators. The patch for binary operators will come later. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/111487 gcc/ChangeLog: * match.pd (tcc_int_unary): New operator list. (`a == CST & unary(a)`): New pattern. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/and-unary-1.c: New test. Signed-off-by: Andrew Pinski --- gcc/match.pd| 12 gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c | 61 + 2 files changed, 73 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c diff --git a/gcc/match.pd b/gcc/match.pd index 07e743ae464..3ee28a3d8fc 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -57,6 +57,10 @@ along with GCC; see the file COPYING3. If not see #include "cfn-operators.pd" +/* integer unary operators that return the same type. */ +(define_operator_list tcc_int_unary + abs absu negate bit_not BSWAP POPCOUNT CTZ CLZ PARITY) + /* Define operand lists for math rounding functions {,i,l,ll}FN, where the versions prefixed with "i" return an int, those prefixed with "l" return a long and those prefixed with "ll" return a long long. @@ -5451,6 +5455,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) @2 { build_zero_cst (type); })) +/* `(a == CST) & unary(a)` can be simplified to `(a == CST) & unary(CST)`. */ +(simplify + (bit_and:c (convert@2 (eq @0 INTEGER_CST@1)) +(convert? (tcc_int_unary @3))) + (if (bitwise_equal_p (@0, @3)) + (with { tree inner_type = TREE_TYPE (@3); } + (bit_and @2 (convert (tcc_int_unary (convert:inner_type @1))) + /* Optimize # x_5 in range [cst1, cst2] where cst2 = cst1 + 1 x_5 == cstN ? 
cst4 : cst3 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c b/gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c new file mode 100644 index 000..c157bc11b00 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c @@ -0,0 +1,61 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -fdump-tree-forwprop1-raw -fdump-tree-optimized-raw" } */ +/* unary part of PR tree-optimization/111487 */ + +int abs1(int a) +{ + int b = __builtin_abs(a); + return (a == 1) & b; +} +int absu1(int a) +{ + int b; + b = a > 0 ? -a:a; + b = -b; +return (a == 1) & b; +} + +int bswap1(int a) +{ + int b = __builtin_bswap32(a); + return (a == 1) & b; +} + +int ctz1(int a) +{ + int b = __builtin_ctz(a); + return (a == 1) & b; +} +int pop1(int a) +{ + int b = __builtin_popcount(a); + return (a == 1) & b; +} +int neg1(int a) +{ + int b = -(a); + return (a == 1) & b; +} +int not1(int a) +{ + int b = ~(a); + return (a == 1) & b; +} +int partity1(int a) +{ + int b = __builtin_parity(a); + return (a == 1) & b; +} + + +/* We should optimize out the unary operator for each. + For ctz we can optimize directly to `return 0`. + For bswap1 and not1, we can do the same but not until after forwprop1. */ +/* { dg-final { scan-tree-dump-times "eq_expr, " 7 "forwprop1" } } */ +/* { dg-final { scan-tree-dump-times "eq_expr, " 5 "optimized" } } */ +/* { dg-final { scan-tree-dump-not "abs_expr, " "forwprop1" } } */ +/* { dg-final { scan-tree-dump-not "absu_expr, " "forwprop1" } } */ +/* { dg-final { scan-tree-dump-not "bit_not_expr, " "forwprop1" } } */ +/* { dg-final { scan-tree-dump-not "negate_expr, " "forwprop1" } } */ +/* { dg-final { scan-tree-dump-not "gimple_call <" "forwprop1" } } */ +/* { dg-final { scan-tree-dump-not "bit_and_expr, " "forwprop1" } } */ -- 2.34.1
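The identity behind the new pattern can be checked at the source level. The sketch below uses two of the listed operators (negate and bit_not); the function names and the choice of `CST = 1` are made up for the illustration. Whenever `a == CST` holds, `unary(a)` equals `unary(CST)`; when it does not hold, the left operand is 0 and the AND discards the other side anyway.

```cpp
constexpr int CST = 1;

// Before: the unary operation is applied to the variable.
inline int neg_before (int a) { return (a == CST) & -a; }
inline int not_before (int a) { return (a == CST) & ~a; }

// After: the unary operation is applied to the constant, so it can be
// folded at compile time.
inline int neg_after (int a) { return (a == CST) & -CST; }
inline int not_after (int a) { return (a == CST) & ~CST; }
```

With `CST = 1`, `~CST` is -2, whose low bit is clear, so `not_after` is 0 for every input. This matches the test's remark that some of the cases reduce to returning 0.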
Re: [Patch, aarch64] v3: Preparatory patch to place target independent and,dependent changed code in one file
Hi Ajit, Why did you send three mails for this revision of the patch? If you're going to send a new revision of the patch you should increment the version number and outline the changes / reasons for the new revision. Mostly the comments below are just style nits and things you missed from the last review(s) (please try not to miss so many in the future). On 09/05/2024 17:06, Ajit Agarwal wrote: > Hello Alex/Richard: > > All review comments are addressed. > > Common infrastructure of load store pair fusion is divided into target > independent and target dependent changed code. > > Target independent code is the Generic code with pure virtual function > to interface betwwen target independent and dependent code. > > Target dependent code is the implementation of pure virtual function for > aarch64 target and the call to target independent code. > > Bootstrapped on aarch64-linux-gnu. > > Thanks & Regards > Ajit > > > > aarch64: Preparatory patch to place target independent and > dependent changed code in one file > > Common infrastructure of load store pair fusion is divided into target > independent and target dependent changed code. > > Target independent code is the Generic code with pure virtual function > to interface betwwen target independent and dependent code. > > Target dependent code is the implementation of pure virtual function for > aarch64 target and the call to target independent code. > > 2024-05-09 Ajit Kumar Agarwal > > gcc/ChangeLog: > > * config/aarch64/aarch64-ldp-fusion.cc: Place target > independent and dependent changed code. 
> --- > gcc/config/aarch64/aarch64-ldp-fusion.cc | 542 +++ > 1 file changed, 363 insertions(+), 179 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc > b/gcc/config/aarch64/aarch64-ldp-fusion.cc > index 1d9caeab05d..217790e111a 100644 > --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc > +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc > @@ -138,6 +138,224 @@ struct alt_base >poly_int64 offset; > }; > > +// Virtual base class for load/store walkers used in alias analysis. > +struct alias_walker > +{ > + virtual bool conflict_p (int ) const = 0; > + virtual insn_info *insn () const = 0; > + virtual bool valid () const = 0; > + virtual void advance () = 0; > +}; > + > +enum class writeback{ You missed a nit here. Space before '{'. > + ALL, > + EXISTING > +}; You also missed adding comments for the enum, please see the review for v2: https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651074.html > + > +struct pair_fusion { > + pair_fusion () > + { > +calculate_dominance_info (CDI_DOMINATORS); > +df_analyze (); > +crtl->ssa = new rtl_ssa::function_info (cfun); > + }; > + > + // Given: > + // - an rtx REG_OP, the non-memory operand in a load/store insn, > + // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and > + // - a boolean LOAD_P (true iff the insn is a load), then: > + // return true if the access should be considered an FP/SIMD access. > + // Such accesses are segregated from GPR accesses, since we only want > + // to form pairs for accesses that use the same register file. > + virtual bool fpsimd_op_p (rtx, machine_mode, bool) > + { > +return false; > + } > + > + // Return true if we should consider forming ldp/stp insns from memory > + // accesses with operand mode MODE at this stage in compilation. 
> + virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0; > + > + // Return true iff REG_OP is a suitable register operand for a paired > + // memory access, where LOAD_P is true if we're asking about loads and > + // false for stores. MEM_MODE gives the mode of the operand. > + virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op, > + machine_mode mode) = 0; The comment needs updating since we changed the name of the last param, i.e. s/MEM_MODE/MODE/. > + > + // Return alias check limit. > + // This is needed to avoid unbounded quadratic behaviour when > + // performing alias analysis. > + virtual int pair_mem_alias_check_limit () = 0; > + > + // Returns true if we should try to handle writeback opportunities > + // (not whether there are any). > + virtual bool handle_writeback_opportunities (enum writeback which) = 0 ; Heh, the bit in parens from the v2 review probably doesn't need to go into the comment here. Also you should describe WHICH in the comment. > + > + // Given BASE_MEM, the mem from the lower candidate access for a pair, > + // and LOAD_P (true if the access is a load), check if we should proceed > + // to form the pair given the target's code generation policy on > + // paired accesses. > + virtual bool pair_mem_ok_with_policy (rtx first_mem, bool load_p, > + machine_mode mode) = 0; The name of the first param needs updating in the prototype, i.e. s/first_mem/base_mem/. I think you missed the bit about
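The split under review is the usual abstract-base-class arrangement. A stripped-down sketch follows; every name in it (`pair_fusion_sketch`, `toy_target_fusion`, `should_try_pair`, the `int` stand-in for machine_mode, the limit of 8) is hypothetical and chosen only to show the shape of the interface, not the contents of aarch64-ldp-fusion.cc.

```cpp
// Target-independent part: generic pass logic talks only to this interface.
struct pair_fusion_sketch
{
  virtual ~pair_fusion_sketch () = default;

  // Hooks that each target must provide.
  virtual bool pair_operand_mode_ok_p (int mode) = 0;
  virtual int pair_mem_alias_check_limit () = 0;

  // Hook with a default that a target may override.
  virtual bool fpsimd_op_p (int /*mode*/, bool /*load_p*/) { return false; }

  // Generic driver code calls the hooks without knowing the target.
  bool should_try_pair (int mode, int checks_done)
  {
    return pair_operand_mode_ok_p (mode)
	   && checks_done < pair_mem_alias_check_limit ();
  }
};

// Target-dependent part: a made-up target that only pairs modes 8 and 16.
struct toy_target_fusion : pair_fusion_sketch
{
  bool pair_operand_mode_ok_p (int mode) override
  {
    return mode == 8 || mode == 16;
  }

  int pair_mem_alias_check_limit () override { return 8; }
};
```

Pure virtual hooks force every target to state its policy, while a hook with a default body (like `fpsimd_op_p` above) lets targets without the distinction ignore it.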
[pushed][PR115013][LRA]: Modify register starvation recognition
The following patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115013 Successfully tested and bootstrapped on x86-64. commit 44430ef3d8ba75692efff5f6969d5610134566d3 Author: Vladimir N. Makarov Date: Mon May 13 10:12:11 2024 -0400 [PR115013][LRA]: Modify register starvation recognition My recent patch to recognize reg starvation resulted in a few GCC test failures. The following patch fixes this by using a more accurate starvation calculation and ignoring small reg classes. gcc/ChangeLog: PR rtl-optimization/115013 * lra-constraints.cc (process_alt_operands): Update all_used_nregs only for winreg. Ignore reg starvation for small reg classes. diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc index e945a4da451..92b343fa99a 100644 --- a/gcc/lra-constraints.cc +++ b/gcc/lra-constraints.cc @@ -2674,8 +2674,9 @@ process_alt_operands (int only_alternative) if (early_clobber_p || curr_static_id->operand[nop].type != OP_OUT) { - all_used_nregs - += ira_reg_class_min_nregs[this_alternative][mode]; + if (winreg) + all_used_nregs + += ira_reg_class_min_nregs[this_alternative][mode]; all_this_alternative = (reg_class_subunion [all_this_alternative][this_alternative]); @@ -3250,6 +3251,7 @@ process_alt_operands (int only_alternative) overall += LRA_MAX_REJECT; } if (all_this_alternative != NO_REGS + && !SMALL_REGISTER_CLASS_P (all_this_alternative) && all_used_nregs != 0 && all_reload_nregs != 0 && (all_used_nregs + all_reload_nregs + 1 >= ira_class_hard_regs_num[all_this_alternative]))
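Read on its own, the condition after the patch amounts to the predicate sketched below. The struct and function names are invented for the example, and `small_class` stands in for SMALL_REGISTER_CLASS_P. What LRA does once the condition holds is not visible in the quoted hunk, so it is left out here.

```cpp
// Summary of one alternative, using plain values instead of LRA state.
struct alt_summary
{
  bool has_class;       // all_this_alternative != NO_REGS
  bool small_class;     // stands in for SMALL_REGISTER_CLASS_P
  int used_nregs;       // all_used_nregs (now counted only for winreg operands)
  int reload_nregs;     // all_reload_nregs
  int class_hard_regs;  // ira_class_hard_regs_num[all_this_alternative]
};

// True when the alternative would need about as many registers as the
// class provides, and the class is not a small one.
inline bool
looks_like_reg_starvation (const alt_summary &s)
{
  return (s.has_class
	  && !s.small_class
	  && s.used_nregs != 0
	  && s.reload_nregs != 0
	  && s.used_nregs + s.reload_nregs + 1 >= s.class_hard_regs);
}
```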
Re: [PATCH] c++: Avoid using __array_rank as a variable name [PR115061]
On Sun, May 12, 2024 at 11:48:07PM -0700, Ken Matsui wrote: > This patch fixes a compilation error when building GCC using Clang. > Since __array_rank is used as a built-in trait name, use rank instead. I think you can go ahead and push this patch as obvious, thanks. > PR c++/115061 > > gcc/cp/ChangeLog: > > * semantics.cc (finish_trait_expr): Use rank instead of > __array_rank. > > Signed-off-by: Ken Matsui > --- > gcc/cp/semantics.cc | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc > index 43b175f92fd..df62e2d80db 100644 > --- a/gcc/cp/semantics.cc > +++ b/gcc/cp/semantics.cc > @@ -12914,10 +12914,10 @@ finish_trait_expr (location_t loc, cp_trait_kind > kind, tree type1, tree type2) >tree val; >if (kind == CPTK_RANK) > { > - size_t __array_rank = 0; > + size_t rank = 0; >for (; TREE_CODE (type1) == ARRAY_TYPE; type1 = TREE_TYPE (type1)) > - ++__array_rank; > - val = build_int_cst (size_type_node, __array_rank); > + ++rank; > + val = build_int_cst (size_type_node, rank); > } >else > val = (trait_expr_value (kind, type1, type2) > -- > 2.44.0 > Marek
RE: [PATCH v1 1/3] Vect: Support loop len in vectorizable early exit
> -Original Message- > From: pan2...@intel.com > Sent: Monday, May 13, 2024 3:54 PM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > Tamar Christina ; Richard Sandiford > ; Pan Li > Subject: [PATCH v1 1/3] Vect: Support loop len in vectorizable early exit > > From: Pan Li > > This patch adds early break auto-vectorization support for target which > use length on partial vectorization. Consider this following example: > > unsigned vect_a[802]; > unsigned vect_b[802]; > > void test (unsigned x, int n) > { > for (int i = 0; i < n; i++) > { > vect_b[i] = x + i; > > if (vect_a[i] > x) > break; > > vect_a[i] = x; > } > } > > We use VCOND_MASK_LEN to simulate the generate (mask && i < len + bias). > And then the IR of RVV looks like below: > > ... > _87 = .SELECT_VL (ivtmp_85, POLY_INT_CST [32, 32]); > _55 = (int) _87; > ... > mask_patt_6.13_69 = vect_cst__62 < vect__3.12_67; > vec_len_mask_72 = .VCOND_MASK_LEN (mask_patt_6.13_69, { -1, ... }, \ > {0, ... }, _87, 0); > if (vec_len_mask_72 != { 0, ... }) > goto ; [5.50%] > else > goto ; [94.50%] > > The below tests are passed for this patch: > 1. The riscv fully regression tests. > 2. The aarch64 fully regression tests. > 3. The x86 bootstrap tests. > 4. The x86 fully regression tests. > > gcc/ChangeLog: > > * tree-vect-stmts.cc (vectorizable_early_exit): Add loop len > handling for one or multiple stmt. 
> > Signed-off-by: Pan Li > --- > gcc/tree-vect-stmts.cc | 47 - > - > 1 file changed, 45 insertions(+), 2 deletions(-) > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index 21e8fe98e44..bfd9d66568f 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -12896,7 +12896,9 @@ vectorizable_early_exit (vec_info *vinfo, > stmt_vec_info stmt_info, > ncopies = vect_get_num_copies (loop_vinfo, vectype); > >vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo); > + vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo); >bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo); > + bool len_loop_p = LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo); > >/* Now build the new conditional. Pattern gimple_conds get dropped during > codegen so we must replace the original insn. */ > @@ -12960,12 +12962,11 @@ vectorizable_early_exit (vec_info *vinfo, > stmt_vec_info stmt_info, > { > if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype, > OPTIMIZE_FOR_SPEED)) > - return false; > + vect_record_loop_len (loop_vinfo, lens, ncopies, vectype, 1); > else > vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL); > } > > - >return true; > } > > @@ -13018,6 +13019,25 @@ vectorizable_early_exit (vec_info *vinfo, > stmt_vec_info stmt_info, > stmts[i], &cond_gsi); > workset.quick_push (stmt_mask); > } > + else if (len_loop_p) > + for (unsigned i = 0; i < stmts.length (); i++) > + { > + tree all_ones_mask = build_all_ones_cst (vectype); > + tree all_zero_mask = build_zero_cst (vectype); > + tree len = vect_get_loop_len (loop_vinfo, gsi, lens, ncopies, > + vectype, i, 1); > + signed char cst = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS > (loop_vinfo); > + tree bias = build_int_cst (intQI_type_node, cst); > + tree len_mask > + = make_temp_ssa_name (TREE_TYPE (stmts[i]), NULL, > "vec_len_mask"); > + gcall *call = gimple_build_call_internal (IFN_VCOND_MASK_LEN, 5, > + stmts[i], all_ones_mask, > + all_zero_mask, len, bias); > + gimple_call_set_lhs (call, len_mask); > + 
gsi_insert_before (&cond_gsi, call, GSI_SAME_STMT); > + > + workset.quick_push (len_mask); > + } >else > workset.splice (stmts); > > @@ -13042,6 +13062,29 @@ vectorizable_early_exit (vec_info *vinfo, > stmt_vec_info stmt_info, > new_temp = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, > new_temp, &cond_gsi); > } > + else if (len_loop_p) > + { > + /* len_mask = VCOND_MASK_LEN (compare_mask, ones, zero, len, bias) > + > + which is equivalent to: > + > + len_mask = compare_mask mask && i < len ? 1 : 0 > + */ > + tree all_ones_mask = build_all_ones_cst (vectype); > + tree all_zero_mask = build_zero_cst (vectype); > + tree len > + = vect_get_loop_len (loop_vinfo, gsi, lens, ncopies, vectype, 0, 1); > + signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS > (loop_vinfo); > + tree bias = build_int_cst (intQI_type_node, biasval); > + tree
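A scalar model may help when reading the quoted hunk. The sketch below follows the formula given in the patch description (a lane takes the first value operand only if its mask bit is set and its index is below len + bias); the function names and the use of `std::vector` are choices made for this illustration and are not GCC code.

```cpp
#include <cstddef>
#include <vector>

// Scalar model of VCOND_MASK_LEN (mask, if_true, if_false, len, bias).
// All three vectors are assumed to have the same number of lanes.
std::vector<int>
vcond_mask_len_model (const std::vector<bool> &mask,
		      const std::vector<int> &if_true,
		      const std::vector<int> &if_false,
		      long len, long bias)
{
  std::vector<int> out (mask.size ());
  for (std::size_t i = 0; i < mask.size (); ++i)
    {
      // Lanes at or beyond len + bias are inactive and take if_false.
      bool active = static_cast<long> (i) < len + bias;
      out[i] = (active && mask[i]) ? if_true[i] : if_false[i];
    }
  return out;
}

// The early-exit test: is any lane of the length-limited mask set?
bool
any_lane_set (const std::vector<int> &len_mask)
{
  for (int lane : len_mask)
    if (lane != 0)
      return true;
  return false;
}
```

Passing all-ones and all-zeros as the two value operands, as the patch does, turns the select into "compare mask restricted to the active lanes", so a compare hit in a tail lane beyond the current length cannot trigger the early exit.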
RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
> > Thanks Tamar for comments. > > > I think OPTIMIZE_FOR_BOTH is better here, since this is a win also when > optimizing for size. > > Sure thing, let me update it in v5. > > > Hmm why do you iterate independently over the statements? The block below > already visits > > Every statement doesn't it? > > Because it will hit .ADD_OVERFLOW first, then it will never hit SAT_ADD as the > shape changed, or shall we put it to the previous pass ? > That's just a matter of matching the overflow as an additional case, no? I.e. you can add an overload for unsigned_integer_sat_add matching the IFN_ADD_OVERFLOW and using the realpart and imagpart helpers. I think that would be better as it avoids visiting all the statements twice but also extends the matching to some __builtin_add_overflow uses and should be fairly simple. > > The root of your match is a BIT_IOR_EXPR expression, so I think you just > > need to > change the entry below to: > > > > case BIT_IOR_EXPR: > > match_saturation_arith (&gsi, stmt, m_cfg_changed_p); > > /* fall-through */ > > case BIT_XOR_EXPR: > > match_uaddc_usubc (&gsi, stmt, code); > > break; > > There are other shapes (not covered in this patch) of SAT_ADD like below > branch > version, the IOR should be one of the ROOT. Thus doesn't > add case here. Then, shall we take case for each shape here ? Both works for > me. > Yeah, I think that's better than iterating over the statements twice. It also fits better in the existing code. Tamar. > #define SAT_ADD_U_1(T) \ > T sat_add_u_1_##T(T x, T y) \ > { \ > return (T)(x + y) >= x ? 
(x + y) : -1; \ > } > > SAT_ADD_U_1(uint32_t) > > Pan > > > -Original Message- > From: Tamar Christina > Sent: Monday, May 13, 2024 5:10 PM > To: Li, Pan2 ; gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; > Liu, Hongtao > Subject: RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned > scalar int > > Hi Pan, > > > -Original Message- > > From: pan2...@intel.com > > Sent: Monday, May 6, 2024 3:48 PM > > To: gcc-patches@gcc.gnu.org > > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina > > ; richard.guent...@gmail.com; > > hongtao@intel.com; Pan Li > > Subject: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned > scalar > > int > > > > From: Pan Li > > > > This patch would like to add the middle-end presentation for the > > saturation add. Aka set the result of add to the max when overflow. > > It will take the pattern similar as below. > > > > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x)) > > > > Take uint8_t as example, we will have: > > > > * SAT_ADD (1, 254) => 255. > > * SAT_ADD (1, 255) => 255. > > * SAT_ADD (2, 255) => 255. > > * SAT_ADD (255, 255) => 255. 
> > > > Given below example for the unsigned scalar integer uint64_t: > > > > uint64_t sat_add_u64 (uint64_t x, uint64_t y) > > { > > return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x)); > > } > > > > Before this patch: > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y) > > { > > long unsigned int _1; > > _Bool _2; > > long unsigned int _3; > > long unsigned int _4; > > uint64_t _7; > > long unsigned int _10; > > __complex__ long unsigned int _11; > > > > ;; basic block 2, loop depth 0 > > ;;pred: ENTRY > > _11 = .ADD_OVERFLOW (x_5(D), y_6(D)); > > _1 = REALPART_EXPR <_11>; > > _10 = IMAGPART_EXPR <_11>; > > _2 = _10 != 0; > > _3 = (long unsigned int) _2; > > _4 = -_3; > > _7 = _1 | _4; > > return _7; > > ;;succ: EXIT > > > > } > > > > After this patch: > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y) > > { > > uint64_t _7; > > > > ;; basic block 2, loop depth 0 > > ;;pred: ENTRY > > _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call] > > return _7; > > ;;succ: EXIT > > } > > > > We perform the tranform during widen_mult because that the sub-expr of > > SAT_ADD will be optimized to .ADD_OVERFLOW. We need to try the .SAT_ADD > > pattern first and then .ADD_OVERFLOW, or we may never catch the pattern > > .SAT_ADD. Meanwhile, the isel pass is after widen_mult and then we > > cannot perform the .SAT_ADD pattern match as the sub-expr will be > > optmized to .ADD_OVERFLOW first. > > > > The below tests are passed for this patch: > > 1. The riscv fully regression tests. > > 2. The aarch64 fully regression tests. > > 3. The x86 bootstrap tests. > > 4. The x86 fully regression tests. > > > > PR target/51492 > > PR target/112600 > > > > gcc/ChangeLog: > > > > * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD > > to the return true switch case(s). > > * internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD. > > * match.pd: Add unsigned SAT_ADD match. > > * optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd. > > * tree-ssa-math-opts.cc
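The two source shapes discussed in this thread can be written generically as below. This is a sketch only: the template names are made up, C++17 is assumed, and the explicit casts are there because narrow types such as uint8_t are promoted to int before the addition, so the wrapped sum has to be truncated back to T by hand.

```cpp
#include <cstdint>
#include <type_traits>

// Branchless shape from the patch: (x + y) | -(T)((T)(x + y) < x).
template <typename T>
constexpr T
sat_add_branchless (T x, T y)
{
  static_assert (std::is_unsigned_v<T>, "unsigned types only");
  T sum = static_cast<T> (x + y);         // wraps on overflow
  T overflow = static_cast<T> (sum < x);  // 1 if the sum wrapped, else 0
  // -overflow is all-ones when the sum wrapped, which saturates the OR.
  return static_cast<T> (sum | static_cast<T> (-overflow));
}

// Branch shape quoted in the thread as SAT_ADD_U_1.
template <typename T>
constexpr T
sat_add_branch (T x, T y)
{
  static_assert (std::is_unsigned_v<T>, "unsigned types only");
  T sum = static_cast<T> (x + y);
  return sum >= x ? sum : static_cast<T> (-1);
}
```

Both forms compute the same function; the review discussion is about where in tree-ssa-math-opts.cc each shape should be recognised, not about their semantics.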
RE: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 scalar
Committed, thanks Juzhe and Kito. Let's wait for a while before backport to 14. Pan -Original Message- From: Kito Cheng Sent: Monday, May 13, 2024 10:11 PM To: juzhe.zh...@rivai.ai Cc: Li, Pan2 ; gcc-patches Subject: Re: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 scalar LGTM as well :) On Sat, May 11, 2024 at 3:58 PM juzhe.zh...@rivai.ai wrote: > > LGTM from my side. Wait for kito chime in. > > > juzhe.zh...@rivai.ai > > > From: pan2.li > Date: 2024-05-11 15:54 > To: gcc-patches > CC: juzhe.zhong; kito.cheng; Pan Li > Subject: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 > scalar > From: Pan Li > > For the vfw vx format RVV intrinsic, the scalar type _Float16 also > requires the zvfh extension. Unfortunately, we only check the > vector tree type and miss the scalar _Float16 type checking. For > example: > > vfloat32mf2_t test_vfwsub_wf_f32mf2(vfloat32mf2_t vs2, _Float16 rs1, size_t > vl) > { > return __riscv_vfwsub_wf_f32mf2(vs2, rs1, vl); > } > > It should report some error message like zvfh extension is required > instead of ICE for unreg insn. > > This patch would like to make up such kind of validation for _Float16 > in the RVV intrinsic API. It will report some error like below when > there is no zvfh enabled. > > error: built-in function '__riscv_vfwsub_wf_f32mf2(vs2, rs1, vl)' > requires the zvfhmin or zvfh ISA extension > > PR target/114988 > > Passed the rv64gcv fully regression tests, included c/c++/fortran. > > gcc/ChangeLog: > > * config/riscv/riscv-vector-builtins.cc > (validate_instance_type_required_extensions): New func impl to > validate the intrinisc func type ops. > (expand_builtin): Validate instance type before expand. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/base/pr114988-1.c: New test. > * gcc.target/riscv/rvv/base/pr114988-2.c: New test. 
> > Signed-off-by: Pan Li > --- > gcc/config/riscv/riscv-vector-builtins.cc | 51 +++ > .../gcc.target/riscv/rvv/base/pr114988-1.c| 9 > .../gcc.target/riscv/rvv/base/pr114988-2.c| 9 > 3 files changed, 69 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-1.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-2.c > > diff --git a/gcc/config/riscv/riscv-vector-builtins.cc > b/gcc/config/riscv/riscv-vector-builtins.cc > index 192a6c230d1..3fdb4400d70 100644 > --- a/gcc/config/riscv/riscv-vector-builtins.cc > +++ b/gcc/config/riscv/riscv-vector-builtins.cc > @@ -4632,6 +4632,54 @@ gimple_fold_builtin (unsigned int code, > gimple_stmt_iterator *gsi, gcall *stmt) >return gimple_folder (rfn.instance, rfn.decl, gsi, stmt).fold (); > } > +static bool > +validate_instance_type_required_extensions (const rvv_type_info type, > + tree exp) > +{ > + uint64_t exts = type.required_extensions; > + > + if ((exts & RVV_REQUIRE_ELEN_FP_16) && > +!TARGET_VECTOR_ELEN_FP_16_P (riscv_vector_elen_flags)) > +{ > + error_at (EXPR_LOCATION (exp), > + "built-in function %qE requires the " > + "zvfhmin or zvfh ISA extension", > + exp); > + return false; > +} > + > + if ((exts & RVV_REQUIRE_ELEN_FP_32) && > +!TARGET_VECTOR_ELEN_FP_32_P (riscv_vector_elen_flags)) > +{ > + error_at (EXPR_LOCATION (exp), > + "built-in function %qE requires the " > + "zve32f, zve64f, zve64d or v ISA extension", > + exp); > + return false; > +} > + > + if ((exts & RVV_REQUIRE_ELEN_FP_64) && > +!TARGET_VECTOR_ELEN_FP_64_P (riscv_vector_elen_flags)) > +{ > + error_at (EXPR_LOCATION (exp), > + "built-in function %qE requires the zve64d or v ISA extension", > + exp); > + return false; > +} > + > + if ((exts & RVV_REQUIRE_ELEN_64) && > +!TARGET_VECTOR_ELEN_64_P (riscv_vector_elen_flags)) > +{ > + error_at (EXPR_LOCATION (exp), > + "built-in function %qE requires the " > + "zve64x, zve64f, zve64d or v ISA extension", > + exp); > + return false; > +} > + > + return true; > 
+} > + > /* Expand a call to the RVV function with subcode CODE. EXP is the call > expression and TARGET is the preferred location for the result. > Return the value of the lhs. */ > @@ -4649,6 +4697,9 @@ expand_builtin (unsigned int code, tree exp, rtx target) >return target; > } > + if (!validate_instance_type_required_extensions (rfn.instance.type, exp)) > +return target; > + >return function_expander (rfn.instance, rfn.decl, exp, target).expand (); > } > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-1.c > b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-1.c > new file mode 100644 > index 000..b8474804c88 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-1.c > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */ > + > +#include "riscv_vector.h" > + > +vfloat32mf2_t test_vfwsub_wf_f32mf2(vfloat32mf2_t
[PATCH v1 3/3] RISC-V: Enable vectorizable early exit test
From: Pan Li This patch depends on the 2 patches below. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651459.html https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651460.html Now that vectorizable early exit is supported on RISC-V, we would like to enable the GCC vect tests for vectorizable early exit. vect-early-break_124-pr114403.c fails to vectorize for now, because the 8-byte __builtin_memcpy is not folded into an int64 assignment during ccp1. We will improve that first and mark this test as xfail for RISC-V. The below tests are passed for this patch: 1. The riscv fully regression tests. 2. The aarch64 fully regression tests. 3. The x86 bootstrap tests. 4. The x86 fully regression tests. gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-mask-store-1.c: Add pragma novector as it will have 2 times LOOP VECTORIZED in RISC-V. * gcc.dg/vect/vect-early-break_124-pr114403.c: Xfail for the riscv backend. * lib/target-supports.exp: Add RISC-V backend. Signed-off-by: Pan Li --- gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c | 2 ++ gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c | 2 +- gcc/testsuite/lib/target-supports.exp | 2 ++ 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c index fdd9032da98..2f80bf89e5e 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c +++ b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c @@ -28,6 +28,8 @@ main () if (__builtin_memcmp (x, res, sizeof (x)) != 0) abort (); + +#pragma GCC novector for (int i = 0; i < 32; ++i) if (flag[i] != 0 && flag[i] != 1) abort (); diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c index 51abf245ccb..101ae1e0eaa 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c @@ -2,7 +2,7 @@ /* { dg-require-effective-target vect_early_break_hw } */ /* 
{ dg-require-effective-target vect_long_long } */ -/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { xfail riscv*-*-* } } } */ #include "tree-vect.h" diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 6f5d477b128..adaa5912588 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -4099,6 +4099,7 @@ proc check_effective_target_vect_early_break { } { || [check_effective_target_arm_v8_neon_ok] || [check_effective_target_sse4] || [istarget amdgcn-*-*] + || [check_effective_target_riscv_v] }}] } @@ -4114,6 +4115,7 @@ proc check_effective_target_vect_early_break_hw { } { || [check_effective_target_arm_v8_neon_hw] || [check_sse4_hw_available] || [istarget amdgcn-*-*] + || [check_effective_target_riscv_v] }}] } -- 2.34.1
[PATCH v1 2/3] RISC-V: Implement vectorizable early exit with vcond_mask_len
From: Pan Li This patch depends on below middle-end implementation. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651459.html After we support the loop lens for the vectorizable, we would like to implement the feature for the RISC-V target. Given below example: unsigned vect_a[1923]; unsigned vect_b[1923]; unsigned test (unsigned limit, int n) { unsigned ret = 0; for (int i = 0; i < n; i++) { vect_b[i] = limit + i; if (vect_a[i] > limit) { ret = vect_b[i]; return ret; } vect_a[i] = limit; } return ret; } Before this patch: ... .L8: swa3,0(a5) addiw a0,a0,1 addi a4,a4,4 addi a5,a5,4 beq a1,a0,.L2 .L4: swa0,0(a4) lwa2,0(a5) bleu a2,a3,.L8 ret After this patch: ... .L5: vsetvli a5,a3,e8,mf4,ta,ma vmv1r.v v4,v2 vsetvli t4,zero,e32,m1,ta,ma vmv.v.x v1,a5 vadd.vv v2,v2,v1 vsetvli zero,a5,e32,m1,ta,ma vadd.vv v5,v4,v3 slli a6,a5,2 vle32.v v1,0(t1) vmsltu.vv v1,v3,v1 vcpop.m t4,v1 beq t4,zero,.L4 vmv.x.s a4,v4 .L3: ... The below tests are passed for this patch: 1. The riscv fully regression tests. gcc/ChangeLog: * config/riscv/autovec-opt.md (*vcond_mask_len_popcount_): New pattern of vcond_mask_len_popcount for vector bool mode. * config/riscv/autovec.md (vcond_mask_len_): New pattern of vcond_mask_len for vector bool mode. (cbranch4): New pattern for vector bool mode. * config/riscv/vector-iterators.md: Add new unspec UNSPEC_SELECT_MASK. * config/riscv/vector.md (@pred_popcount): Add VLS mode to popcount pattern. (@pred_popcount): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/early-break-1.c: New test. * gcc.target/riscv/rvv/autovec/early-break-2.c: New test. 
Signed-off-by: Pan Li --- gcc/config/riscv/autovec-opt.md | 33 ++ gcc/config/riscv/autovec.md | 60 +++ gcc/config/riscv/vector-iterators.md | 1 + gcc/config/riscv/vector.md| 18 +++--- .../riscv/rvv/autovec/early-break-1.c | 34 +++ .../riscv/rvv/autovec/early-break-2.c | 37 6 files changed, 174 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/early-break-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/early-break-2.c diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md index 645dc53d868..04f85d8e455 100644 --- a/gcc/config/riscv/autovec-opt.md +++ b/gcc/config/riscv/autovec-opt.md @@ -1436,3 +1436,36 @@ (define_insn_and_split "*n" DONE; } [(set_attr "type" "vmalu")]) + +;; Optimization pattern for early break auto-vectorization +;; vcond_mask_len (mask, ones, zeros, len, bias) + vlmax popcount +;; -> non vlmax popcount (mask, len) +(define_insn_and_split "*vcond_mask_len_popcount_" + [(set (match_operand:P 0 "register_operand") +(popcount:P + (unspec:VB_VLS [ + (unspec:VB_VLS [ + (match_operand:VB_VLS 1 "register_operand") + (match_operand:VB_VLS 2 "const_1_operand") + (match_operand:VB_VLS 3 "const_0_operand") + (match_operand 4 "autovec_length_operand") + (match_operand 5 "const_0_operand")] UNSPEC_SELECT_MASK) + (match_operand 6 "autovec_length_operand") + (const_int 1) + (reg:SI VL_REGNUM) + (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)))] + "TARGET_VECTOR + && can_create_pseudo_p () + && riscv_vector::get_vector_mode (Pmode, GET_MODE_NUNITS (mode)).exists ()" + "#" + "&& 1" + [(const_int 0)] + { +riscv_vector::emit_nonvlmax_insn ( + code_for_pred_popcount (mode, Pmode), + riscv_vector::CPOP_OP, + operands, operands[4]); +DONE; + } + [(set_attr "type" "vector")] +) diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md index aa1ae0fe075..dfa58b8af69 100644 --- a/gcc/config/riscv/autovec.md +++ b/gcc/config/riscv/autovec.md @@ -2612,3 +2612,63 @@ (define_expand 
"rawmemchr" DONE; } ) + +;; = +;; == Early break auto-vectorization patterns +;; = + +;; vcond_mask_len +(define_insn_and_split "vcond_mask_len_" + [(set (match_operand:VB 0 "register_operand") +(unspec: VB [ + (match_operand:VB 1 "register_operand") + (match_operand:VB 2 "const_1_operand") + (match_operand:VB 3 "const_0_operand") + (match_operand 4 "autovec_length_operand") + (match_operand 5 "const_0_operand")] UNSPEC_SELECT_MASK))] + "TARGET_VECTOR + && can_create_pseudo_p () + && riscv_vector::get_vector_mode (Pmode, GET_MODE_NUNITS (mode)).exists ()" + "#" + "&& 1" + [(const_int 0)] + { +machine_mode mode = riscv_vector::get_vector_mode (Pmode, +
[PATCH v1 1/3] Vect: Support loop len in vectorizable early exit
From: Pan Li This patch adds early break auto-vectorization support for target which use length on partial vectorization. Consider this following example: unsigned vect_a[802]; unsigned vect_b[802]; void test (unsigned x, int n) { for (int i = 0; i < n; i++) { vect_b[i] = x + i; if (vect_a[i] > x) break; vect_a[i] = x; } } We use VCOND_MASK_LEN to simulate the generate (mask && i < len + bias). And then the IR of RVV looks like below: ... _87 = .SELECT_VL (ivtmp_85, POLY_INT_CST [32, 32]); _55 = (int) _87; ... mask_patt_6.13_69 = vect_cst__62 < vect__3.12_67; vec_len_mask_72 = .VCOND_MASK_LEN (mask_patt_6.13_69, { -1, ... }, \ {0, ... }, _87, 0); if (vec_len_mask_72 != { 0, ... }) goto ; [5.50%] else goto ; [94.50%] The below tests are passed for this patch: 1. The riscv fully regression tests. 2. The aarch64 fully regression tests. 3. The x86 bootstrap tests. 4. The x86 fully regression tests. gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_early_exit): Add loop len handling for one or multiple stmt. Signed-off-by: Pan Li --- gcc/tree-vect-stmts.cc | 47 -- 1 file changed, 45 insertions(+), 2 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 21e8fe98e44..bfd9d66568f 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -12896,7 +12896,9 @@ vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info, ncopies = vect_get_num_copies (loop_vinfo, vectype); vec_loop_masks *masks = _VINFO_MASKS (loop_vinfo); + vec_loop_lens *lens = _VINFO_LENS (loop_vinfo); bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo); + bool len_loop_p = LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo); /* Now build the new conditional. Pattern gimple_conds get dropped during codegen so we must replace the original insn. 
*/ @@ -12960,12 +12962,11 @@ vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info, { if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype, OPTIMIZE_FOR_SPEED)) - return false; + vect_record_loop_len (loop_vinfo, lens, ncopies, vectype, 1); else vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL); } - return true; } @@ -13018,6 +13019,25 @@ vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info, stmts[i], _gsi); workset.quick_push (stmt_mask); } + else if (len_loop_p) + for (unsigned i = 0; i < stmts.length (); i++) + { + tree all_ones_mask = build_all_ones_cst (vectype); + tree all_zero_mask = build_zero_cst (vectype); + tree len = vect_get_loop_len (loop_vinfo, gsi, lens, ncopies, + vectype, i, 1); + signed char cst = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo); + tree bias = build_int_cst (intQI_type_node, cst); + tree len_mask + = make_temp_ssa_name (TREE_TYPE (stmts[i]), NULL, "vec_len_mask"); + gcall *call = gimple_build_call_internal (IFN_VCOND_MASK_LEN, 5, + stmts[i], all_ones_mask, + all_zero_mask, len, bias); + gimple_call_set_lhs (call, len_mask); + gsi_insert_before (_gsi, call, GSI_SAME_STMT); + + workset.quick_push (len_mask); + } else workset.splice (stmts); @@ -13042,6 +13062,29 @@ vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info, new_temp = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, new_temp, _gsi); } + else if (len_loop_p) + { + /* len_mask = VCOND_MASK_LEN (compare_mask, ones, zero, len, bias) + +which is equivalent to: + +len_mask = compare_mask mask && i < len ? 
1 : 0 + */ + tree all_ones_mask = build_all_ones_cst (vectype); + tree all_zero_mask = build_zero_cst (vectype); + tree len + = vect_get_loop_len (loop_vinfo, gsi, lens, ncopies, vectype, 0, 1); + signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo); + tree bias = build_int_cst (intQI_type_node, biasval); + tree len_mask + = make_temp_ssa_name (TREE_TYPE (new_temp), NULL, "vec_len_mask"); + gcall *call = gimple_build_call_internal (IFN_VCOND_MASK_LEN, 5, + new_temp, all_ones_mask, + all_zero_mask, len, bias); + gimple_call_set_lhs (call, len_mask); + gsi_insert_before (_gsi, call, GSI_SAME_STMT); + new_temp = len_mask; + } } gcc_assert
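For readers unfamiliar with the internal function, the comment in the patch (len_mask = compare_mask && i < len ? 1 : 0) can be modelled lane by lane in plain C. This is only a sketch and not GCC code: the fixed lane count NLANES, the names vcond_mask_len_model and any_lane_set, and the use of unsigned char for mask lanes are all assumptions made for illustration.

```c
#include <stddef.h>

#define NLANES 8 /* hypothetical fixed lane count, for illustration only */

/* Scalar model of
     len_mask = .VCOND_MASK_LEN (cmp_mask, ones, zeros, len, bias)
   Lane i keeps the compare result only while i < len + bias;
   lanes at or beyond the active length are forced to zero.  */
void
vcond_mask_len_model (const unsigned char cmp_mask[NLANES],
                      unsigned char len_mask[NLANES],
                      size_t len, int bias)
{
  for (size_t i = 0; i < NLANES; i++)
    len_mask[i] = (cmp_mask[i] != 0 && (long) i < (long) len + bias) ? 1 : 0;
}

/* Model of the early-exit test "vec_len_mask != { 0, ... }":
   the branch is taken when any lane of the length-limited mask is set.  */
int
any_lane_set (const unsigned char mask[NLANES])
{
  for (size_t i = 0; i < NLANES; i++)
    if (mask[i])
      return 1;
  return 0;
}
```

With a compare mask whose only set lane is lane 5, the model takes the early exit for len = 6 but not for len = 4, which is the point of limiting the mask by the loop length: lanes past the active length must not trigger the break.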
Re: [PATCH v2 2/3] diagnostics: Don't hardcode auto_enable_urls to false for mingw hosts
13 May 2024 1:30:28 pm NightStrike :
> On Thu, May 9, 2024 at 1:03 PM Peter Damianov wrote:
> > Windows terminal and mintty both have support for link escape sequences, and so auto_enable_urls shouldn't be hardcoded to false. For older versions of the windows console, mingw_ansi_fputs's console API translation logic does mangle these sequences, but there's nothing useful it could do even if this weren't the case, so check if the ansi escape sequences are supported at all.
> >
> > conhost.exe doesn't support link escape sequences, but printing them does not cause any problems.
>
> Are there any issues when running under the Wine console, such as when running the testsuite?

I installed wine and tried compiling a file that emits a warning. Unfortunately, yes, gcc emits mangled warnings here. Even simply running this patch under wine causes problems; it's not just wine's conhost.exe. I'm not sure whether it's my fault or wine's. I've attached two screenshots demonstrating exactly what happens.

(I think???) wine should only be advertising that it supports those settings regarding escape sequences if it actually does.

Also, on this machine, wine is nearly unusably slow: it takes multiple seconds to react to a keypress through the wine conhost. I will not be attempting to run the testsuite; I severely doubt it will work.
Re: [PATCH] internal-fn: Do not force vcond operand to reg.
On Mon, May 13, 2024 at 4:14 PM Robin Dapp wrote: > > > What happens if we simply remove all of the force_reg here? > > On x86 I bootstrapped and tested the attached without fallout > (gcc188, so it's no avx512-native machine and therefore limited > coverage). riscv regtest is unchanged. > For aarch64 I would to rely on the pre-commit CI to pick it > up (does that work on sub-threads?). OK if that pre-commit CI works out. Richard. > Regards > Robin > > > gcc/ChangeLog: > > PR middle-end/113474 > > * internal-fn.cc (expand_vec_cond_mask_optab_fn): Remove > force_regs. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/autovec/pr113474.c: New test. > --- > gcc/internal-fn.cc | 3 --- > .../gcc.target/riscv/rvv/autovec/pr113474.c | 13 + > 2 files changed, 13 insertions(+), 3 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index 2c764441cde..4d226c478b4 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -3163,9 +3163,6 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall > *stmt, convert_optab optab) >rtx_op1 = expand_normal (op1); >rtx_op2 = expand_normal (op2); > > - mask = force_reg (mask_mode, mask); > - rtx_op1 = force_reg (mode, rtx_op1); > - >rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); >create_output_operand ([0], target, mode); >create_input_operand ([1], rtx_op1, mode); > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c > b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c > new file mode 100644 > index 000..0364bf9f5e3 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile { target riscv_v } } */ > +/* { dg-additional-options "-std=c99" } */ > + > +void > +foo (int n, int **a) > +{ > + int b; > + for (b = 0; b < n; b++) > +for (long e = 8; e > 0; e--) > + a[b][e] = a[b][e] == 15; > +} > + > +/* { dg-final { scan-assembler "vmerge.vim" } 
} */ > -- > 2.45.0 >
Re: [PATCH] internal-fn: Do not force vcond operand to reg.
> What happens if we simply remove all of the force_reg here? On x86 I bootstrapped and tested the attached without fallout (gcc188, so it's no avx512-native machine and therefore limited coverage). riscv regtest is unchanged. For aarch64 I would to rely on the pre-commit CI to pick it up (does that work on sub-threads?). Regards Robin gcc/ChangeLog: PR middle-end/113474 * internal-fn.cc (expand_vec_cond_mask_optab_fn): Remove force_regs. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr113474.c: New test. --- gcc/internal-fn.cc | 3 --- .../gcc.target/riscv/rvv/autovec/pr113474.c | 13 + 2 files changed, 13 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 2c764441cde..4d226c478b4 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -3163,9 +3163,6 @@ expand_vec_cond_mask_optab_fn (internal_fn, gcall *stmt, convert_optab optab) rtx_op1 = expand_normal (op1); rtx_op2 = expand_normal (op2); - mask = force_reg (mask_mode, mask); - rtx_op1 = force_reg (mode, rtx_op1); - rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); create_output_operand ([0], target, mode); create_input_operand ([1], rtx_op1, mode); diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c new file mode 100644 index 000..0364bf9f5e3 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113474.c @@ -0,0 +1,13 @@ +/* { dg-do compile { target riscv_v } } */ +/* { dg-additional-options "-std=c99" } */ + +void +foo (int n, int **a) +{ + int b; + for (b = 0; b < n; b++) +for (long e = 8; e > 0; e--) + a[b][e] = a[b][e] == 15; +} + +/* { dg-final { scan-assembler "vmerge.vim" } } */ -- 2.45.0
Re: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 scalar
LGTM as well :) On Sat, May 11, 2024 at 3:58 PM juzhe.zh...@rivai.ai wrote: > > LGTM from my side. Wait for kito chime in. > > > juzhe.zh...@rivai.ai > > > From: pan2.li > Date: 2024-05-11 15:54 > To: gcc-patches > CC: juzhe.zhong; kito.cheng; Pan Li > Subject: [PATCH v1] RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 > scalar > From: Pan Li > > For the vfw vx format RVV intrinsic, the scalar type _Float16 also > requires the zvfh extension. Unfortunately, we only check the > vector tree type and miss the scalar _Float16 type checking. For > example: > > vfloat32mf2_t test_vfwsub_wf_f32mf2(vfloat32mf2_t vs2, _Float16 rs1, size_t > vl) > { > return __riscv_vfwsub_wf_f32mf2(vs2, rs1, vl); > } > > It should report some error message like zvfh extension is required > instead of ICE for unreg insn. > > This patch would like to make up such kind of validation for _Float16 > in the RVV intrinsic API. It will report some error like below when > there is no zvfh enabled. > > error: built-in function '__riscv_vfwsub_wf_f32mf2(vs2, rs1, vl)' > requires the zvfhmin or zvfh ISA extension > > PR target/114988 > > Passed the rv64gcv fully regression tests, included c/c++/fortran. > > gcc/ChangeLog: > > * config/riscv/riscv-vector-builtins.cc > (validate_instance_type_required_extensions): New func impl to > validate the intrinisc func type ops. > (expand_builtin): Validate instance type before expand. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/base/pr114988-1.c: New test. > * gcc.target/riscv/rvv/base/pr114988-2.c: New test. 
> > Signed-off-by: Pan Li > --- > gcc/config/riscv/riscv-vector-builtins.cc | 51 +++ > .../gcc.target/riscv/rvv/base/pr114988-1.c| 9 > .../gcc.target/riscv/rvv/base/pr114988-2.c| 9 > 3 files changed, 69 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-1.c > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-2.c > > diff --git a/gcc/config/riscv/riscv-vector-builtins.cc > b/gcc/config/riscv/riscv-vector-builtins.cc > index 192a6c230d1..3fdb4400d70 100644 > --- a/gcc/config/riscv/riscv-vector-builtins.cc > +++ b/gcc/config/riscv/riscv-vector-builtins.cc > @@ -4632,6 +4632,54 @@ gimple_fold_builtin (unsigned int code, > gimple_stmt_iterator *gsi, gcall *stmt) >return gimple_folder (rfn.instance, rfn.decl, gsi, stmt).fold (); > } > +static bool > +validate_instance_type_required_extensions (const rvv_type_info type, > + tree exp) > +{ > + uint64_t exts = type.required_extensions; > + > + if ((exts & RVV_REQUIRE_ELEN_FP_16) && > +!TARGET_VECTOR_ELEN_FP_16_P (riscv_vector_elen_flags)) > +{ > + error_at (EXPR_LOCATION (exp), > + "built-in function %qE requires the " > + "zvfhmin or zvfh ISA extension", > + exp); > + return false; > +} > + > + if ((exts & RVV_REQUIRE_ELEN_FP_32) && > +!TARGET_VECTOR_ELEN_FP_32_P (riscv_vector_elen_flags)) > +{ > + error_at (EXPR_LOCATION (exp), > + "built-in function %qE requires the " > + "zve32f, zve64f, zve64d or v ISA extension", > + exp); > + return false; > +} > + > + if ((exts & RVV_REQUIRE_ELEN_FP_64) && > +!TARGET_VECTOR_ELEN_FP_64_P (riscv_vector_elen_flags)) > +{ > + error_at (EXPR_LOCATION (exp), > + "built-in function %qE requires the zve64d or v ISA extension", > + exp); > + return false; > +} > + > + if ((exts & RVV_REQUIRE_ELEN_64) && > +!TARGET_VECTOR_ELEN_64_P (riscv_vector_elen_flags)) > +{ > + error_at (EXPR_LOCATION (exp), > + "built-in function %qE requires the " > + "zve64x, zve64f, zve64d or v ISA extension", > + exp); > + return false; > +} > + > + return true; > 
+} > + > /* Expand a call to the RVV function with subcode CODE. EXP is the call > expression and TARGET is the preferred location for the result. > Return the value of the lhs. */ > @@ -4649,6 +4697,9 @@ expand_builtin (unsigned int code, tree exp, rtx target) >return target; > } > + if (!validate_instance_type_required_extensions (rfn.instance.type, exp)) > +return target; > + >return function_expander (rfn.instance, rfn.decl, exp, target).expand (); > } > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-1.c > b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-1.c > new file mode 100644 > index 000..b8474804c88 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-1.c > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */ > + > +#include "riscv_vector.h" > + > +vfloat32mf2_t test_vfwsub_wf_f32mf2(vfloat32mf2_t vs2, _Float16 rs1, size_t > vl) > +{ > + return __riscv_vfwsub_wf_f32mf2(vs2, rs1, vl); /* { dg-error {built-in > function '__riscv_vfwsub_wf_f32mf2\(vs2, rs1, vl\)' requires the zvfhmin or > zvfh ISA extension} } */ > +} > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr114988-2.c >
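The validation added by this patch boils down to comparing a set of required feature bits against the enabled ones and reporting the first mismatch. A minimal sketch of that idea follows. It is not the GCC implementation: the REQ_* constants are hypothetical stand-ins for the RVV_REQUIRE_ELEN_* flags, the function name first_missing_extension is invented, and the function returns the extension list as a string instead of calling error_at.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical requirement bits standing in for the RVV_REQUIRE_ELEN_*
   flags tested in the patch; the values are arbitrary.  */
enum
{
  REQ_ELEN_FP_16 = 1u << 0,
  REQ_ELEN_FP_32 = 1u << 1,
  REQ_ELEN_FP_64 = 1u << 2,
  REQ_ELEN_64 = 1u << 3
};

/* Return the extension list for the first requirement that the enabled
   feature set does not satisfy, or NULL when every requirement is met.
   The checks run in the same order as in the patch.  */
const char *
first_missing_extension (uint64_t required, uint64_t enabled)
{
  if ((required & REQ_ELEN_FP_16) && !(enabled & REQ_ELEN_FP_16))
    return "zvfhmin or zvfh";
  if ((required & REQ_ELEN_FP_32) && !(enabled & REQ_ELEN_FP_32))
    return "zve32f, zve64f, zve64d or v";
  if ((required & REQ_ELEN_FP_64) && !(enabled & REQ_ELEN_FP_64))
    return "zve64d or v";
  if ((required & REQ_ELEN_64) && !(enabled & REQ_ELEN_64))
    return "zve64x, zve64f, zve64d or v";
  return NULL;
}
```

In the failing example from the commit message, the _Float16 scalar operand contributes the 16-bit floating-point requirement, which the -march=rv64gcv setting used by the new tests does not satisfy, so the first check fires and the user sees a diagnostic instead of an ICE.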
Re: [EXTERNAL] [COMMITTED] Regenerate cygming.opt.urls and mingw.opt.urls
On Mon, 2024-05-13 at 09:42 -0400, David Malcolm wrote: > On Mon, 2024-05-13 at 11:14 +0200, Mark Wielaard wrote: > > Hi Evgeny, > > > > Adding David to the CC, who might know the details. > > > > On Mon, May 13, 2024 at 08:44:12AM +, Evgeny Karpov wrote: > > > Sunday, May 12, 2024 > > > > > > Thank you for reviewing our changes related to the refactoring of > > > extracting the MinGW implementation from ix64. > > > > > > It was expected to move the MinGW-related files without changes > > > in > > > this commit ("Reuse MinGW from i386 for AArch64") and apply the > > > renaming in a follow-up commit, which has been done in 'Rename > > > "x86 > > > Windows Options" to "Cygwin and MinGW Options"'. > > > > > > The script to update opt.urls files has been used. > > > > > > > diff --git a/gcc/config/mingw/cygming.opt.urls > > > > b/gcc/config/mingw/cygming.opt.urls > > > > index c624e22e4427..af11c4997609 100644 > > > > --- a/gcc/config/mingw/cygming.opt.urls > > > > +++ b/gcc/config/mingw/cygming.opt.urls > > > > @@ -1,4 +1,4 @@ > > > > > > > -; Autogenerated by regenerate-opt-urls.py from > > > > gcc/config/i386/cygming.opt > > > > and generated HTML > > > > +; Autogenerated by regenerate-opt-urls.py from > > > > +gcc/config/mingw/cygming.opt and generated HTML > > > > > > I am not sure why this comment has not been updated. Is it > > > critical > > > or it could be updated next time when it is needed? > > > > Odd that the script didn't update this comment, it really should > > have. > > It might be that running the script through make regenerate-opt- > > urls > > inside the gcc build subdir invokes regenerate-opt-urls.py slightly > > differently so that this line is updated. 
> > It might be a "make" dependencies issue: > "make regenerate-opt-urls" has dependencies on OPT_URLS_HTML_DEPS > which > is currently defined as: > OPT_URLS_HTML_DEPS = $(build_htmldir)/gcc/Option-Index.html \ > $(build_htmldir)/gdc/Option-Index.html \ > $(build_htmldir)/gfortran/Option-Index.html > which might not be enough for the doc changes when moving things > around > that affect other generated html files. > > So when the CI runs "make regenerate-opt-urls" in a pristine build it > will forcibly rerun texinfo to regenerate the docs first, whereas if > you manually run the script in a build directory, you might not be > seeing the latest version of the HTML (especially in the presence of > file moves). > > So I think the Makefile as currently written handles most cases, but > can get it slightly wrong for the case you ran into here (sorry); > fully > refreshing the built docs ought to fix such cases. Specifically, if you have some generated .html files in the $(build_htmldir) from a file that has gone away (due to a move), then I suspect these .html files stick around until you fully delete the $(build_htmldir), and in the meantime they get found by regenerate-opt-urls.py and lead to duplicate entries, leading to differences against a pristine build dir. Dave > > That's my theory of what happened here, anyway. 
> > Dave > > > > > > > mconsole > > > > UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mconsole) > > > > @@ -9,9 +9,8 @@ UrlSuffix(gcc/Cygwin-and-MinGW- > > > > Options.html#index- > > > > mdll) > > > > mnop-fun-dllimport > > > > UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mnop-fun- > > > > dllimport) > > > > > > > > -; skipping UrlSuffix for 'mthreads' due to multiple URLs: > > > > -; duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index- > > > > mthreads-1' > > > > -; duplicate: 'gcc/x86-Options.html#index-mthreads' > > > > +mthreads > > > > +UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1) > > > > > > mthreads has the same issue before applying changes. Has > > > something > > > been changed recently? > > > This is the change in patch series in 'Rename "x86 Windows > > > Options" > > > to "Cygwin and MinGW Options"' commit. > > > > > > ; skipping UrlSuffix for 'mthreads' due to multiple URLs: > > > +; duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index- > > > mthreads- > > > 1' > > > ; duplicate: 'gcc/x86-Options.html#index-mthreads' > > > -; duplicate: 'gcc/x86-Windows-Options.html#index-mthreads-1' > > > > Again, it might be caused by invoking the script by hand vs with > > make > > regenerate-opt-urls.py. I believe with the make option it will > > renumber the suffixes making sure the urls are unique. > > > > BTW. There is a CI buildbot that tries to regenerate all generated > > files, which is how I spotted this: > > https://builder.sourceware.org/buildbot/#/builders/gcc-autoregen > > (It should also sent email to the author of the patch on failure.) > > > > Cheers, > > > > Mark > > >
Re: [EXTERNAL] [COMMITTED] Regenerate cygming.opt.urls and mingw.opt.urls
On Mon, 2024-05-13 at 11:14 +0200, Mark Wielaard wrote: > Hi Evgeny, > > Adding David to the CC, who might know the details. > > On Mon, May 13, 2024 at 08:44:12AM +, Evgeny Karpov wrote: > > Sunday, May 12, 2024 > > > > Thank you for reviewing our changes related to the refactoring of > > extracting the MinGW implementation from ix64. > > > > It was expected to move the MinGW-related files without changes in > > this commit ("Reuse MinGW from i386 for AArch64") and apply the > > renaming in a follow-up commit, which has been done in 'Rename "x86 > > Windows Options" to "Cygwin and MinGW Options"'. > > > > The script to update opt.urls files has been used. > > > > > diff --git a/gcc/config/mingw/cygming.opt.urls > > > b/gcc/config/mingw/cygming.opt.urls > > > index c624e22e4427..af11c4997609 100644 > > > --- a/gcc/config/mingw/cygming.opt.urls > > > +++ b/gcc/config/mingw/cygming.opt.urls > > > @@ -1,4 +1,4 @@ > > > > > -; Autogenerated by regenerate-opt-urls.py from > > > gcc/config/i386/cygming.opt > > > and generated HTML > > > +; Autogenerated by regenerate-opt-urls.py from > > > +gcc/config/mingw/cygming.opt and generated HTML > > > > I am not sure why this comment has not been updated. Is it critical > > or it could be updated next time when it is needed? > > Odd that the script didn't update this comment, it really should > have. > It might be that running the script through make regenerate-opt-urls > inside the gcc build subdir invokes regenerate-opt-urls.py slightly > differently so that this line is updated. It might be a "make" dependencies issue: "make regenerate-opt-urls" has dependencies on OPT_URLS_HTML_DEPS which is currently defined as: OPT_URLS_HTML_DEPS = $(build_htmldir)/gcc/Option-Index.html \ $(build_htmldir)/gdc/Option-Index.html \ $(build_htmldir)/gfortran/Option-Index.html which might not be enough for the doc changes when moving things around that affect other generated html files. 
So when the CI runs "make regenerate-opt-urls" in a pristine build it will forcibly rerun texinfo to regenerate the docs first, whereas if you manually run the script in a build directory, you might not be seeing the latest version of the HTML (especially in the presence of file moves). So I think the Makefile as currently written handles most cases, but can get it slightly wrong for the case you ran into here (sorry); fully refreshing the built docs ought to fix such cases. That's my theory of what happened here, anyway. Dave > > > > mconsole > > > UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mconsole) > > > @@ -9,9 +9,8 @@ UrlSuffix(gcc/Cygwin-and-MinGW- > > > Options.html#index- > > > mdll) > > > mnop-fun-dllimport > > > UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mnop-fun- > > > dllimport) > > > > > > -; skipping UrlSuffix for 'mthreads' due to multiple URLs: > > > -; duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index- > > > mthreads-1' > > > -; duplicate: 'gcc/x86-Options.html#index-mthreads' > > > +mthreads > > > +UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1) > > > > mthreads has the same issue before applying changes. Has something > > been changed recently? > > This is the change in patch series in 'Rename "x86 Windows Options" > > to "Cygwin and MinGW Options"' commit. > > > > ; skipping UrlSuffix for 'mthreads' due to multiple URLs: > > +; duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index-mthreads- > > 1' > > ; duplicate: 'gcc/x86-Options.html#index-mthreads' > > -; duplicate: 'gcc/x86-Windows-Options.html#index-mthreads-1' > > Again, it might be caused by invoking the script by hand vs with make > regenerate-opt-urls.py. I believe with the make option it will > renumber the suffixes making sure the urls are unique. > > BTW. 
There is a CI buildbot that tries to regenerate all generated > files, which is how I spotted this: > https://builder.sourceware.org/buildbot/#/builders/gcc-autoregen > (It should also send email to the author of the patch on failure.) > > Cheers, > > Mark >
RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
Thanks Tamar for the comments.

> I think OPTIMIZE_FOR_BOTH is better here, since this is a win also when
> optimizing for size.

Sure thing, let me update it in v5.

> Hmm why do you iterate independently over the statements? The block below
> already visits
> Every statement doesn't it?

Because it will hit .ADD_OVERFLOW first, and then it will never hit SAT_ADD as the shape has changed. Or shall we put it in the previous pass?

> The root of your match is a BIT_IOR_EXPR expression, so I think you just need
> to change the entry below to:
>
> case BIT_IOR_EXPR:
> match_saturation_arith (, stmt, m_cfg_changed_p);
> /* fall-through */
> case BIT_XOR_EXPR:
> match_uaddc_usubc (, stmt, code);
> break;

There are other shapes of SAT_ADD (not covered in this patch), like the branch version below, so the IOR is only one of the possible roots. That is why I did not add the case here. Then, shall we add a case for each shape here? Both work for me.

#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
  return (T)(x + y) >= x ? (x + y) : -1; \
}

SAT_ADD_U_1(uint32_t)

Pan

-Original Message-
From: Tamar Christina
Sent: Monday, May 13, 2024 5:10 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; Liu, Hongtao
Subject: RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

Hi Pan,

> -Original Message-
> From: pan2...@intel.com
> Sent: Monday, May 6, 2024 3:48 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> ; richard.guent...@gmail.com;
> hongtao@intel.com; Pan Li
> Subject: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned
> scalar
> int
>
> From: Pan Li
>
> This patch would like to add the middle-end presentation for the
> saturation add. Aka set the result of add to the max when overflow.
> It will take the pattern similar as below.
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x)) > > Take uint8_t as example, we will have: > > * SAT_ADD (1, 254) => 255. > * SAT_ADD (1, 255) => 255. > * SAT_ADD (2, 255) => 255. > * SAT_ADD (255, 255) => 255. > > Given below example for the unsigned scalar integer uint64_t: > > uint64_t sat_add_u64 (uint64_t x, uint64_t y) > { > return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x)); > } > > Before this patch: > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y) > { > long unsigned int _1; > _Bool _2; > long unsigned int _3; > long unsigned int _4; > uint64_t _7; > long unsigned int _10; > __complex__ long unsigned int _11; > > ;; basic block 2, loop depth 0 > ;;pred: ENTRY > _11 = .ADD_OVERFLOW (x_5(D), y_6(D)); > _1 = REALPART_EXPR <_11>; > _10 = IMAGPART_EXPR <_11>; > _2 = _10 != 0; > _3 = (long unsigned int) _2; > _4 = -_3; > _7 = _1 | _4; > return _7; > ;;succ: EXIT > > } > > After this patch: > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y) > { > uint64_t _7; > > ;; basic block 2, loop depth 0 > ;;pred: ENTRY > _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call] > return _7; > ;;succ: EXIT > } > > We perform the tranform during widen_mult because that the sub-expr of > SAT_ADD will be optimized to .ADD_OVERFLOW. We need to try the .SAT_ADD > pattern first and then .ADD_OVERFLOW, or we may never catch the pattern > .SAT_ADD. Meanwhile, the isel pass is after widen_mult and then we > cannot perform the .SAT_ADD pattern match as the sub-expr will be > optmized to .ADD_OVERFLOW first. > > The below tests are passed for this patch: > 1. The riscv fully regression tests. > 2. The aarch64 fully regression tests. > 3. The x86 bootstrap tests. > 4. The x86 fully regression tests. > > PR target/51492 > PR target/112600 > > gcc/ChangeLog: > > * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD > to the return true switch case(s). > * internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD. > * match.pd: Add unsigned SAT_ADD match. 
> * optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd. > * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New extern > func decl generated in match.pd match. > (match_saturation_arith): New func impl to match the saturation arith. > (math_opts_dom_walker::after_dom_children): Try match saturation > arith. > > Signed-off-by: Pan Li > --- > gcc/internal-fn.cc| 1 + > gcc/internal-fn.def | 2 ++ > gcc/match.pd | 28 > gcc/optabs.def| 4 ++-- > gcc/tree-ssa-math-opts.cc | 46 > +++ > 5 files changed, 79 insertions(+), 2 deletions(-) > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index 0a7053c2286..73045ca8c8c 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -4202,6 +4202,7 @@
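For readers following the thread, here is a minimal C sketch of the two source shapes under discussion: the branchless form from the patch description and the branch form Pan mentions in his reply. The function names are illustrative and not taken from the patch. This only models the scalar semantics; it makes no claim about how the matcher in tree-ssa-math-opts.cc works.

```c
#include <stdint.h>

/* Branchless shape from the patch description:
   SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x)).
   The names below are hypothetical, chosen for this sketch only.  */
uint8_t
sat_add_u8_branchless (uint8_t x, uint8_t y)
{
  /* x + y is computed in int; the cast wraps it modulo 256.  */
  uint8_t sum = (uint8_t) (x + y);
  /* (sum < x) is 1 exactly when the addition wrapped, so the negated
     value is all-ones on overflow and zero otherwise.  */
  return (uint8_t) (sum | -(uint8_t) (sum < x));
}

/* Branch shape, following the SAT_ADD_U_1 macro quoted in the thread.  */
uint8_t
sat_add_u8_branch (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return sum >= x ? sum : (uint8_t) -1;
}

/* The uint64_t example from the patch description, unchanged.  */
uint64_t
sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
}
```

Both uint8_t variants are meant to agree on every input, which is why each shape needs to be recognised as the same saturating add.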
Re: [PATCH 1/4] rs6000: Make all 128 bit scalar FP modes have 128 bit precision [PR112993]
On Mon, 13 May 2024, Kewen.Lin wrote: > > In fact replacing all of X_TYPE_SIZE with a single hook might be worthwhile > > though this removes the "convenient" defaulting, requiring each target to > > enumerate all standard C ABI type modes. But that might be also a good > > thing. > > > > I guess the main value by extending from floating point types to all is to > unify them? (Assuming that excepting for floating types the others would > not have multiple possible representations like what we faces on 128bit fp). For integer types, giving the number of bits makes sense as an interface - there isn't an issue with different modes. So I think it's appropriate for floating and integer types to have separate hooks - with the one for floating types returning a mode, and the one for integer types returning a number of bits. (And also keep the existing separate hook for _FloatN / _FloatNx modes.) That may also make for more convenient defaults (whether a target has long double wider than double is largely independent of what sizes it uses for integer types). -- Joseph S. Myers josmy...@redhat.com
[PATCH] PR60276 fix for single-lane SLP
When enabling single-lane SLP and not splitting groups, the fix for PR60276 is no longer effective since, for an unknown reason, it exempted pure SLP. The following removes this exemption, making gcc.dg/vect/pr60276.c PASS even with --param vect-single-lane-slp=1. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. PR tree-optimization/60276 * tree-vect-stmts.cc (vectorizable_load): Do not exempt pure_slp grouped loads from the STMT_VINFO_MIN_NEG_DIST restriction. --- gcc/tree-vect-stmts.cc | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 21e8fe98e44..b8a71605f1b 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -9995,8 +9995,7 @@ vectorizable_load (vec_info *vinfo, /* Invalidate assumptions made by dependence analysis when vectorization on the unrolled body effectively re-orders stmts. */ - if (!PURE_SLP_STMT (stmt_info) - && STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0 + if (STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0 && maybe_gt (LOOP_VINFO_VECT_FACTOR (loop_vinfo), STMT_VINFO_MIN_NEG_DIST (stmt_info))) { -- 2.35.3
Re: [PATCH v2 2/3] diagnostics: Don't hardcode auto_enable_urls to false for mingw hosts
13 May 2024 1:30:28 pm NightStrike : > On Thu, May 9, 2024 at 1:03 PM Peter Damianov wrote: > > Windows terminal and mintty both have support for link escape sequences, and so > > auto_enable_urls shouldn't be hardcoded to false. For older versions of the > > windows console, mingw_ansi_fputs's console API translation logic does mangle > > these sequences, but there's nothing useful it could do even if this weren't > > the case, so check if the ansi escape sequences are supported at all. > > > > conhost.exe doesn't support link escape sequences, but printing them does not > > cause any problems. > > Are there any issues when running under the Wine console, such as when running the testsuite? I did not try this. There shouldn't be problems if wine implements ENABLE_VIRTUAL_TERMINAL_PROCESSING correctly, but I agree it would be good to check. Are there instructions anywhere for running the testsuite with wine? Anything specific I need to do?
Re: [PATCH v2 2/3] diagnostics: Don't hardcode auto_enable_urls to false for mingw hosts
On Thu, May 9, 2024 at 1:03 PM Peter Damianov wrote: > > Windows terminal and mintty both have support for link escape sequences, and > so > auto_enable_urls shouldn't be hardcoded to false. For older versions of the > windows console, mingw_ansi_fputs's console API translation logic does mangle > these sequences, but there's nothing useful it could do even if this weren't > the case, so check if the ansi escape sequences are supported at all. > > conhost.exe doesn't support link escape sequences, but printing them does not > cause any problems. Are there any issues when running under the Wine console, such as when running the testsuite?
Re: [PATCH] testsuite: c++: Allow for std::printf in g++.dg/modules/stdio-1_a.H [PR98529]
Hi Nathaniel, >> > There are a couple of other tests that appear to potentially have a >> > similar issue: >> > >> > global-2_a.C >> > 21:// { dg-final { scan-lang-dump-not {Reachable GMF '::printf[^\n']*' >> > added} module } } >> > >> > global-3_a.C >> > 15:// { dg-final { scan-lang-dump-not {Reachable GMF '::printf[^'\n]*' >> > added} module } } >> >> Neither module file contains "Reachable GMF" at all, with ::printf or >> otherwise. >> > > Yes, I think the test is aiming to check that such a declaration is not > added at all, and so that's correct. But if for some reason on some > system it did add "::std::printf" that would be a bug that would not be > caught by this test. Understood. However, the question remains about global-3_a.C, which contains no printf at all. >> > Which I suppose maybe also should be updated in the same way; I guess >> > they don't fail on Solaris because they aren't actually correctly >> > testing what they think they are. >> >> Perhaps, but it would be useful to first understand what those tests are >> supposed to look like. WRT global-3_a.C, printf doesn't occur at all, >> so this may just be a case of copy-and-paste. >> >> Maybe Nathan, who authored the tests, can shed some light. >> >> > Otherwise LGTM. >> >> Thanks. I'll go ahead and commit the patch as is, adjusting the other >> two once it's become clear what they should look like. >> > > Ah, I should have been clearer: I'm not sure I can approve, but I've > CC'd Jason in. Sorry, I already committed the patch. I can revert, of course, if that's inappropriate. OTOH, it could be considered obvious ;-) Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] report message for operator %a on unaddressible exp
Hi, on 2024/5/13 10:57, Jiufu Guo wrote: > Hi, > > For PR96866, when gcc print asm code for modifier "%a" which requires > an address operand, while the operand is with the constraint "X" which > allow non-address form. An error message would be reported to indicate > the invalid asm operands. > > Bootstrap pass on ppc64{,le}. > Is this ok for trunk? > > BR, > Jeff(Jiufu Guo) > > PR target/96866 > > gcc/ChangeLog: > > * config/rs6000/rs6000.cc (print_operand_address): > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr96866-1.c: New test. > * gcc.target/powerpc/pr96866-2.c: New test. > > --- > gcc/config/rs6000/rs6000.cc | 6 ++ > gcc/testsuite/gcc.target/powerpc/pr96866-1.c | 15 +++ > gcc/testsuite/gcc.target/powerpc/pr96866-2.c | 10 ++ > 3 files changed, 31 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-1.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-2.c > > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index 117999613d8..50943d76f79 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -14659,6 +14659,12 @@ print_operand_address (FILE *file, rtx x) >else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST > || GET_CODE (x) == LABEL_REF) > { > + if (this_is_asm_operands && !address_operand (x, VOIDmode)) Do we really need this_is_asm_operands here? > + { > + output_operand_lossage ("invalid expression as operand"); > + return; > + } > + >output_addr_const (file, x); >if (small_data_operand (x, GET_MODE (x))) > fprintf (file, "@%s(%s)", SMALL_DATA_RELOC, > diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-1.c > b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c > new file mode 100644 > index 000..6554a472a11 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c > @@ -0,0 +1,15 @@ > +/* It's to verify no ICE here, ignore error messages about invalid 'asm'. 
*/ > +/* { dg-excess-errors "pr96866-2.c" } */ > +/* { dg-options "-fPIC -O2" } */ Nit: If these two options are required, it would be good to have a comment explaining it a bit when it's not obvious. > + > +int x[2]; > + > +int __attribute__ ((noipa)) > +f1 (void) > +{ > + int n; > + int *p = x; > + *p++; > + __asm__ volatile("ld %0, %a1" : "=r"(n) : "X"(p)); > + return n; > +} > diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-2.c > b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c > new file mode 100644 > index 000..a5ec96f29dd > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c > @@ -0,0 +1,10 @@ > +/* It's to verify no ICE here, ignore error messages about invalid 'asm'. */ > +/* { dg-excess-errors "pr96866-2.c" } */ > +/* { dg-options "-fPIC -O2" } */ Ditto. BR, Kewen > + > +void > +f (void) > +{ > + extern int x; > + __asm__ volatile("#%a0" ::"X"()); > +}
Re: [PATCH] testsuite: c++: Allow for std::printf in g++.dg/modules/stdio-1_a.H [PR98529]
On Mon, May 13, 2024 at 01:59:51PM +0200, Rainer Orth wrote: > Hi Nathaniel, > > > On Mon, May 13, 2024 at 10:40:30AM +0200, Rainer Orth wrote: > >> g++.dg/modules/stdio-1_a.H currently FAILs on Solaris: > >> > >> FAIL: g++.dg/modules/stdio-1_a.H -std=c++17 scan-lang-dump module > >> "Depset:0 decl entity:[0-9]* function_decl:'::printf'" > >> FAIL: g++.dg/modules/stdio-1_a.H -std=c++2a scan-lang-dump module > >> "Depset:0 decl entity:[0-9]* function_decl:'::printf'" > >> FAIL: g++.dg/modules/stdio-1_a.H -std=c++2b scan-lang-dump module > >> "Depset:0 decl entity:[0-9]* function_decl:'::printf'" > >> > >> The problem is that the module file doesn't contain > >> > >> Depset:0 decl entity:95 function_decl:'::printf' > >> > >> as expected by the test, but > >> > >> Depset:0 decl entity:26 function_decl:'::std::printf' > >> > >> This happens because Solaris declares printf in namespace std > >> as allowed by C++11, Annex D, D.5. > >> > >> This patch allows for both forms. > >> > >> Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and > >> x86_64-pc-linux-gnu. > >> > >> Ok for trunk? > >> > >>Rainer > > > > There are a couple of other tests that appear to potentially have a > > similar issue: > > > > global-2_a.C > > 21:// { dg-final { scan-lang-dump-not {Reachable GMF '::printf[^\n']*' > > added} module } } > > > > global-3_a.C > > 15:// { dg-final { scan-lang-dump-not {Reachable GMF '::printf[^'\n]*' > > added} module } } > > neither module file contains "Reachable GMF" at all, with ::printf or > otherwise. > Yes, I think the test is aiming to check that such a declaration is not added at all, and so that's correct. But if for some reason on some system it did add "::std::printf" that would be a bug that would not be caught by this test. > > Which I suppose maybe also should be updated in the same way; I guess > > they don't fail on Solaris because they aren't actually correctly > > testing what they think they are. 
> > Perhaps, but it would be useful to first understand what those tests are > supposed to look like. WRT global-3_a.C, printf doesn't occur at all, > so this may just be a case of copy-and-paste. > > Maybe Nathan, who authored the tests, can shed some light. > > > Otherwise LGTM. > > Thanks. I'll go ahead and commit the patch as is, adjusting the other > two once it's become clear what they should look like. > Ah, I should have been clearer: I'm not sure I can approve, but I've CC'd Jason in. > Rainer > > -- > - > Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] testsuite: c++: Allow for std::printf in g++.dg/modules/stdio-1_a.H [PR98529]
Hi Nathaniel, > On Mon, May 13, 2024 at 10:40:30AM +0200, Rainer Orth wrote: >> g++.dg/modules/stdio-1_a.H currently FAILs on Solaris: >> >> FAIL: g++.dg/modules/stdio-1_a.H -std=c++17 scan-lang-dump module "Depset:0 >> decl entity:[0-9]* function_decl:'::printf'" >> FAIL: g++.dg/modules/stdio-1_a.H -std=c++2a scan-lang-dump module "Depset:0 >> decl entity:[0-9]* function_decl:'::printf'" >> FAIL: g++.dg/modules/stdio-1_a.H -std=c++2b scan-lang-dump module "Depset:0 >> decl entity:[0-9]* function_decl:'::printf'" >> >> The problem is that the module file doesn't contain >> >> Depset:0 decl entity:95 function_decl:'::printf' >> >> as expected by the test, but >> >> Depset:0 decl entity:26 function_decl:'::std::printf' >> >> This happens because Solaris declares printf in namespace std >> as allowed by C++11, Annex D, D.5. >> >> This patch allows for both forms. >> >> Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and >> x86_64-pc-linux-gnu. >> >> Ok for trunk? >> >> Rainer > > There are a couple of other tests that appear to potentially have a > similar issue: > > global-2_a.C > 21:// { dg-final { scan-lang-dump-not {Reachable GMF '::printf[^\n']*' > added} module } } > > global-3_a.C > 15:// { dg-final { scan-lang-dump-not {Reachable GMF '::printf[^'\n]*' > added} module } } Neither module file contains "Reachable GMF" at all, with ::printf or otherwise. > Which I suppose maybe also should be updated in the same way; I guess > they don't fail on Solaris because they aren't actually correctly > testing what they think they are. Perhaps, but it would be useful to first understand what those tests are supposed to look like. WRT global-3_a.C, printf doesn't occur at all, so this may just be a case of copy-and-paste. Maybe Nathan, who authored the tests, can shed some light. > Otherwise LGTM. Thanks. I'll go ahead and commit the patch as is, adjusting the other two once it's become clear what they should look like.
Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH][14 backport] c++: Fix instantiation of imported temploid friends [PR114275]
> > @@ -11751,9 +11767,16 @@ tsubst_friend_class (tree friend_tmpl, tree args) > > if (tmpl != error_mark_node) > > { > > /* The new TMPL is not an instantiation of anything, so we > > -forget its origins. We don't reset CLASSTYPE_TI_TEMPLATE > > +forget its origins. It is also not a specialization of > > +anything. We don't reset CLASSTYPE_TI_TEMPLATE > > for the new type because that is supposed to be the > > corresponding template decl, i.e., TMPL. */ > > + spec_entry elt; > > + elt.tmpl = friend_tmpl; > > + elt.args = CLASSTYPE_TI_ARGS (TREE_TYPE (tmpl)); > > + elt.spec = TREE_TYPE (tmpl); > > + type_specializations->remove_elt (); > > For GCC 14.2 let's guard this with if (modules_p ()); for GCC 15 it can be > unconditional. OK. > > Jason > I'm looking to backport this patch to GCC 14 now that it's been on trunk some time. Here's the patch I'm aiming to add (squashed with the changes from r15-220-gec2365e07537e8) after cherrypicking the prerequisite commit r15-58-g2faf040335f9b4; is this OK? Or should I keep it as two separate commits to make the cherrypicking more obvious? Not entirely sure on the etiquette around this. Bootstrapped and regtested on x86_64-pc-linux-gnu on top of the releases/gcc-14 branch. -- >8 -- This patch fixes a number of issues with the handling of temploid friend declarations. The primary issue is that instantiations of friend declarations should attach the declaration to the same module as the befriending class, by [module.unit] p7.1 and [temp.friend] p2; this could be a different module from the current TU, and so needs special handling. The other main issue here is that we can't assume that just because name lookup didn't find a definition for a hidden class template, that it doesn't exist at all: it could be a non-exported entity that we've nevertheless streamed in from an imported module. 
We need to ensure that when instantiating template friend classes that we return the same TEMPLATE_DECL that we got from our imports, otherwise we will get later issues with 'duplicate_decls' (rightfully) complaining that they're different when trying to merge. This doesn't appear necessary for function templates due to the existing name lookup handling already finding these hidden declarations. PR c++/105320 PR c++/114275 gcc/cp/ChangeLog: * cp-tree.h (propagate_defining_module): Declare. (remove_defining_module): Declare. (lookup_imported_hidden_friend): Declare. * decl.cc (duplicate_decls): Also check if hidden decls can be redeclared in this module. Call remove_defining_module on to-be-freed newdecl. * module.cc (imported_temploid_friends): New. (init_modules): Initialize it. (trees_out::decl_value): Write it; don't consider imported temploid friends as attached to a module. (trees_in::decl_value): Read it for non-discarded decls. (get_originating_module_decl): Follow the owning decl for an imported temploid friend. (propagate_defining_module): New. (remove_defining_module): New. * name-lookup.cc (get_mergeable_namespace_binding): New. (lookup_imported_hidden_friend): New. * pt.cc (tsubst_friend_function): Propagate defining module for new friend functions. (tsubst_friend_class): Lookup imported hidden friends. Check for valid module attachment of existing names. Propagate defining module for new classes. gcc/testsuite/ChangeLog: * g++.dg/modules/tpl-friend-10_a.C: New test. * g++.dg/modules/tpl-friend-10_b.C: New test. * g++.dg/modules/tpl-friend-10_c.C: New test. * g++.dg/modules/tpl-friend-10_d.C: New test. * g++.dg/modules/tpl-friend-11_a.C: New test. * g++.dg/modules/tpl-friend-11_b.C: New test. * g++.dg/modules/tpl-friend-12_a.C: New test. * g++.dg/modules/tpl-friend-12_b.C: New test. * g++.dg/modules/tpl-friend-12_c.C: New test. * g++.dg/modules/tpl-friend-12_d.C: New test. * g++.dg/modules/tpl-friend-12_e.C: New test. 
* g++.dg/modules/tpl-friend-12_f.C: New test. * g++.dg/modules/tpl-friend-13_a.C: New test. * g++.dg/modules/tpl-friend-13_b.C: New test. * g++.dg/modules/tpl-friend-13_c.C: New test. * g++.dg/modules/tpl-friend-13_d.C: New test. * g++.dg/modules/tpl-friend-13_e.C: New test. * g++.dg/modules/tpl-friend-13_f.C: New test. * g++.dg/modules/tpl-friend-13_g.C: New test. * g++.dg/modules/tpl-friend-14_a.C: New test. * g++.dg/modules/tpl-friend-14_b.C: New test. * g++.dg/modules/tpl-friend-14_c.C: New test. * g++.dg/modules/tpl-friend-14_d.C: New test. * g++.dg/modules/tpl-friend-9.C: New test. Signed-off-by: Nathaniel Shead Reviewed-by: Jason Merrill Reviewed-by: Patrick Palka ---
Re: [PATCH] testsuite: c++: Allow for std::printf in g++.dg/modules/stdio-1_a.H [PR98529]
On Mon, May 13, 2024 at 10:40:30AM +0200, Rainer Orth wrote: > g++.dg/modules/stdio-1_a.H currently FAILs on Solaris: > > FAIL: g++.dg/modules/stdio-1_a.H -std=c++17 scan-lang-dump module "Depset:0 > decl entity:[0-9]* function_decl:'::printf'" > FAIL: g++.dg/modules/stdio-1_a.H -std=c++2a scan-lang-dump module "Depset:0 > decl entity:[0-9]* function_decl:'::printf'" > FAIL: g++.dg/modules/stdio-1_a.H -std=c++2b scan-lang-dump module "Depset:0 > decl entity:[0-9]* function_decl:'::printf'" > > The problem is that the module file doesn't contain > > Depset:0 decl entity:95 function_decl:'::printf' > > as expected by the test, but > > Depset:0 decl entity:26 function_decl:'::std::printf' > > This happens because Solaris declares printf in namespace std > as allowed by C++11, Annex D, D.5. > > This patch allows for both forms. > > Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and > x86_64-pc-linux-gnu. > > Ok for trunk? > > Rainer There are a couple of other tests that appear to potentially have a similar issue: global-2_a.C 21:// { dg-final { scan-lang-dump-not {Reachable GMF '::printf[^\n']*' added} module } } global-3_a.C 15:// { dg-final { scan-lang-dump-not {Reachable GMF '::printf[^'\n]*' added} module } } Which I suppose maybe also should be updated in the same way; I guess they don't fail on Solaris because they aren't actually correctly testing what they think they are. Otherwise LGTM. Nathaniel > > -- > - > Rainer Orth, Center for Biotechnology, Bielefeld University > > > 2024-05-13 Rainer Orth > > gcc/testsuite: > PR c++/98529 > * g++.dg/modules/stdio-1_a.H (scan-lang-dump): Allow for > ::std::printf. 
> > diff --git a/gcc/testsuite/g++.dg/modules/stdio-1_a.H > b/gcc/testsuite/g++.dg/modules/stdio-1_a.H > --- a/gcc/testsuite/g++.dg/modules/stdio-1_a.H > +++ b/gcc/testsuite/g++.dg/modules/stdio-1_a.H > @@ -10,5 +10,5 @@ > #endif > // There should be *lots* of depsets (209 for glibc today) > // { dg-final { scan-lang-dump {Writing section:60 } module } } > -// { dg-final { scan-lang-dump {Depset:0 decl entity:[0-9]* > function_decl:'::printf'} module } } > +// { dg-final { scan-lang-dump {Depset:0 decl entity:[0-9]* > function_decl:'(::std)?::printf'} module } } > // { dg-final { scan-lang-dump {Depset:1 binding namespace_decl:'::printf'} > module } }
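As a quick sanity check of the relaxed pattern, the sketch below runs the old and new expressions against the two dump lines quoted in the thread. It uses POSIX extended regular expressions as a stand-in for the Tcl regexps that scan-lang-dump actually evaluates; that substitution is an assumption, and it holds here only because the pattern uses nothing beyond a bracket expression, `*` and an optional group. The helper name and the variable names are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Patterns before and after the patch, copied from the diff.  */
const char *const old_pattern
  = "Depset:0 decl entity:[0-9]* function_decl:'::printf'";
const char *const new_pattern
  = "Depset:0 decl entity:[0-9]* function_decl:'(::std)?::printf'";

/* Dump lines quoted in the thread: the form the test expected and the
   form seen on Solaris.  */
const char *const expected_line
  = "Depset:0 decl entity:95 function_decl:'::printf'";
const char *const solaris_line
  = "Depset:0 decl entity:26 function_decl:'::std::printf'";

/* Return 1 if LINE contains a match for the extended regex PATTERN,
   0 if it does not, and -1 if PATTERN fails to compile.  */
int
dump_line_matches (const char *pattern, const char *line)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

The old pattern matches only the first line, while the new one matches both, which is the behaviour the patch description asks for.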
Re: [PATCHv2] Value range: Add range op for __builtin_isfinite
On Thu, May 9, 2024 at 10:05 AM Mikael Morin wrote: > > Hello, > > Le 07/05/2024 à 04:37, HAO CHEN GUI a écrit : > > Hi, > >The former patch adds isfinite optab for __builtin_isfinite. > > https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html > > > >Thus the builtin might not be folded at front end. The range op for > > isfinite is needed for value range analysis. This patch adds them. > > > >Compared to last version, this version fixes a typo. > > > >Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no > > regressions. Is it OK for the trunk? > > > > Thanks > > Gui Haochen > > > > ChangeLog > > Value Range: Add range op for builtin isfinite > > > > The former patch adds optab for builtin isfinite. Thus builtin isfinite > > might > > not be folded at front end. So the range op for isfinite is needed for > > value > > range analysis. This patch adds range op for builtin isfinite. > > > > gcc/ > > * gimple-range-op.cc (class cfn_isfinite): New. > > (op_cfn_finite): New variables. > > (gimple_range_op_handler::maybe_builtin_call): Handle > > CFN_BUILT_IN_ISFINITE. > > > > gcc/testsuite/ > > * gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c: New test. 
> > > > patch.diff > > diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc > > index 9de130b4022..99c511728d3 100644 > > --- a/gcc/gimple-range-op.cc > > +++ b/gcc/gimple-range-op.cc > > @@ -1192,6 +1192,56 @@ public: > > } > > } op_cfn_isinf; > > > > +//Implement range operator for CFN_BUILT_IN_ISFINITE > > +class cfn_isfinite : public range_operator > > +{ > > +public: > > + using range_operator::fold_range; > > + using range_operator::op1_range; > > + virtual bool fold_range (irange , tree type, const frange , > > +const irange &, relation_trio) const override > > + { > > +if (op1.undefined_p ()) > > + return false; > > + > > +if (op1.known_isfinite ()) > > + { > > + r.set_nonzero (type); > > + return true; > > + } > > + > > +if (op1.known_isnan () > > + || op1.known_isinf ()) > > + { > > + r.set_zero (type); > > + return true; > > + } > > + > > +return false; > I think the canonical API behaviour sets R to varying and returns true > instead of just returning false if nothing is known about the range. Correct. If we know it's varying, we just set varying and return true. Returning false is usually reserved for "I have no idea". However, every caller of fold_range() should know to ignore a return of false, so you should be safe. > > I'm not sure whether it makes any difference; Aldy can probably tell. > But if the type is bool, varying is [0,1] which is better than unknown > range. Also, I see you're setting zero/nonzero. Is the return type known to be boolean, because if so, we usually prefer to one of: r = range_true () r = range_false () r = range_true_and_false (); It doesn't matter either way, but it's probably best to use these as they force boolean_type_node automatically. I don't have a problem with this patch, but I would prefer the floating point savvy people to review this, as there are no members of the ranger team that are floating point experts :). 
Also, I see you mention in your original post that this patch was needed as a follow-up to this one: https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html I don't see the above patch in the source tree currently: Thanks. Aldy > > > + } > > + virtual bool op1_range (frange , tree type, const irange , > > + const frange &, relation_trio) const override > > + { > > +if (lhs.zero_p ()) > > + { > > + // The range is [-INF,-INF][+INF,+INF] NAN, but it can't be > > represented. > > + // Set range to varying > > + r.set_varying (type); > > + return true; > > + } > > + > > +if (!range_includes_zero_p ()) > > + { > > + nan_state nan (false); > > + r.set (type, real_min_representable (type), > > +real_max_representable (type), nan); > > + return true; > > + } > > + > > +return false; > Same here. > > > + } > > +} op_cfn_isfinite; > > + > > // Implement range operator for CFN_BUILT_IN_ > > class cfn_parity : public range_operator > > { >
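To make the fold_range discussion concrete, here is a small stand-alone C model of the decision being reviewed. It is not GCC's frange/irange API: the interval encoding, the enum and the function name are all assumptions made for this sketch. It follows the suggestion in the review of answering "varying" (that is, [0, 1]) when nothing is known, rather than failing.

```c
#include <math.h>
#include <stdbool.h>

/* Possible answers for isfinite over a range of inputs.  Hypothetical
   names; GCC itself would produce an irange here.  */
enum bool_range { BR_FALSE, BR_TRUE, BR_VARYING };

/* Model of folding isfinite over the operand range [lo, hi], where
   MAYBE_NAN says whether NaN is possible in addition to the interval.
   An empty interval (lo > hi) with MAYBE_NAN set models "known NaN".  */
enum bool_range
fold_isfinite (double lo, double hi, bool maybe_nan)
{
  if (lo > hi)
    /* No ordinary values are possible.  If NaN is, the result is
       always false; otherwise stay conservative.  */
    return maybe_nan ? BR_FALSE : BR_VARYING;

  /* Both bounds finite and NaN excluded: every value is finite.  */
  if (!maybe_nan && isfinite (lo) && isfinite (hi))
    return BR_TRUE;

  /* The interval is a single infinity, so every possible value
     (including NaN, if allowed) is non-finite.  */
  if (lo == hi && isinf (lo))
    return BR_FALSE;

  /* Nothing definite is known: the result may be 0 or 1.  */
  return BR_VARYING;
}
```

For example, `fold_isfinite (-1.0, 1.0, false)` gives BR_TRUE, while the same interval with NaN allowed gives BR_VARYING.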
Re: [PATCH] report message for operator %a on unaddressible exp
Hi! On Mon, May 13, 2024 at 10:57:12AM +0800, Jiufu Guo wrote: > For PR96866, when gcc print asm code for modifier "%a" which requires > an address operand, It requires a *memory* operand, and it outputs its address. This is a generic modifier btw (not rs6000). > while the operand is with the constraint "X" which > allow non-address form. An error message would be reported to indicate > the invalid asm operands. "non-address form"? Every mem has an address. But 'X' is not memory. What is it at all? Why do we use that when you *have to* have mem here? The code you add that tests for address_operand looks wrong. I would expect it to test the operand is memory, instead :-) Segher
Re: [PATCH] libstdc++: Use __builtin_shufflevector for simd split and concat
On Tue, 7 May 2024 at 14:42, Matthias Kretz wrote: > > Tested on x86_64-linux-gnu and aarch64-linux-gnu and with Clang 18 on > x86_64-linux-gnu. > > OK for trunk and backport(s)? OK for all. > > -- 8< > > Signed-off-by: Matthias Kretz > > libstdc++-v3/ChangeLog: > > PR libstdc++/114958 > * include/experimental/bits/simd.h (__as_vector): Return scalar > simd as one-element vector. Return vector from single-vector > fixed_size simd. > (__vec_shuffle): New. > (__extract_part): Adjust return type signature. > (split): Use __extract_part for any split into non-fixed_size > simds. > (concat): If the return type stores a single vector, use > __vec_shuffle (which calls __builtin_shufflevector) to produce > the return value. > * include/experimental/bits/simd_builtin.h > (__shift_elements_right): Removed. > (__extract_part): Return single elements directly. Use > __vec_shuffle (which calls __builtin_shufflevector) for all > non-trivial cases. > * include/experimental/bits/simd_fixed_size.h (__extract_part): > Return single elements directly. > * testsuite/experimental/simd/pr114958.cc: New test. > --- > libstdc++-v3/include/experimental/bits/simd.h | 161 +- > .../include/experimental/bits/simd_builtin.h | 152 + > .../experimental/bits/simd_fixed_size.h | 4 +- > .../testsuite/experimental/simd/pr114958.cc | 20 +++ > 4 files changed, 145 insertions(+), 192 deletions(-) > create mode 100644 libstdc++-v3/testsuite/experimental/simd/pr114958.cc > > > -- > ── > Dr. Matthias Kretz https://mattkretz.github.io > GSI Helmholtz Centre for Heavy Ion Research https://gsi.de > stdₓ::simd > ──
[PATCH] Refactor SLP reduction group discovery
The following refactors a bit how we perform SLP reduction group discovery possibly making it easier to have multiple reduction groups later, esp. with single-lane SLP. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. * tree-vect-slp.cc (vect_analyze_slp_instance): Remove slp_inst_kind_reduc_group handling. (vect_analyze_slp): Add the meat here. --- gcc/tree-vect-slp.cc | 67 ++-- 1 file changed, 34 insertions(+), 33 deletions(-) diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index 8c18f5308e2..f34ed54a70b 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -3586,7 +3586,6 @@ vect_analyze_slp_instance (vec_info *vinfo, slp_instance_kind kind, unsigned max_tree_size, unsigned *limit) { - unsigned int i; vec scalar_stmts; if (is_a (vinfo)) @@ -3620,35 +3619,6 @@ vect_analyze_slp_instance (vec_info *vinfo, STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info)) = STMT_VINFO_REDUC_DEF (vect_orig_stmt (scalar_stmts.last ())); } - else if (kind == slp_inst_kind_reduc_group) -{ - /* Collect reduction statements. */ - const vec - = as_a (vinfo)->reductions; - scalar_stmts.create (reductions.length ()); - for (i = 0; reductions.iterate (i, _info); i++) - { - gassign *g; - next_info = vect_stmt_to_vectorize (next_info); - if ((STMT_VINFO_RELEVANT_P (next_info) - || STMT_VINFO_LIVE_P (next_info)) - /* ??? Make sure we didn't skip a conversion around a reduction -path. In that case we'd have to reverse engineer that -conversion stmt following the chain using reduc_idx and from -the PHI using reduc_def. */ - && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def - /* Do not discover SLP reductions for lane-reducing ops, that -will fail later. */ - && (!(g = dyn_cast (STMT_VINFO_STMT (next_info))) - || (gimple_assign_rhs_code (g) != DOT_PROD_EXPR - && gimple_assign_rhs_code (g) != WIDEN_SUM_EXPR - && gimple_assign_rhs_code (g) != SAD_EXPR))) - scalar_stmts.quick_push (next_info); - } - /* If less than two were relevant/live there's nothing to SLP. 
*/ - if (scalar_stmts.length () < 2) - return false; -} else gcc_unreachable (); @@ -3740,9 +3710,40 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size) /* Find SLP sequences starting from groups of reductions. */ if (loop_vinfo->reductions.length () > 1) - vect_analyze_slp_instance (vinfo, bst_map, loop_vinfo->reductions[0], - slp_inst_kind_reduc_group, max_tree_size, - ); + { + /* Collect reduction statements. */ + vec scalar_stmts; + scalar_stmts.create (loop_vinfo->reductions.length ()); + for (auto next_info : loop_vinfo->reductions) + { + gassign *g; + next_info = vect_stmt_to_vectorize (next_info); + if ((STMT_VINFO_RELEVANT_P (next_info) + || STMT_VINFO_LIVE_P (next_info)) + /* ??? Make sure we didn't skip a conversion around a +reduction path. In that case we'd have to reverse +engineer that conversion stmt following the chain using +reduc_idx and from the PHI using reduc_def. */ + && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def + /* Do not discover SLP reductions for lane-reducing ops, that +will fail later. */ + && (!(g = dyn_cast (STMT_VINFO_STMT (next_info))) + || (gimple_assign_rhs_code (g) != DOT_PROD_EXPR + && gimple_assign_rhs_code (g) != WIDEN_SUM_EXPR + && gimple_assign_rhs_code (g) != SAD_EXPR))) + scalar_stmts.quick_push (next_info); + } + if (scalar_stmts.length () > 1) + { + vec roots = vNULL; + vec remain = vNULL; + vect_build_slp_instance (loop_vinfo, slp_inst_kind_reduc_group, + scalar_stmts, roots, remain, + max_tree_size, , bst_map, NULL); + } + else + scalar_stmts.release (); + } } hash_set visited_patterns; -- 2.35.3
RE: [PATCH] Allow patterns in SLP reductions
On Mon, 13 May 2024, Tamar Christina wrote: > > -Original Message- > > From: Richard Biener > > Sent: Friday, May 10, 2024 2:07 PM > > To: Richard Biener > > Cc: gcc-patches@gcc.gnu.org > > Subject: Re: [PATCH] Allow patterns in SLP reductions > > > > On Fri, Mar 1, 2024 at 10:21 AM Richard Biener wrote: > > > > > > The following removes the over-broad rejection of patterns for SLP > > > reductions which is done by removing them from LOOP_VINFO_REDUCTIONS > > > during pattern detection. That's also insufficient in case the > > > pattern only appears on the reduction path. Instead this implements > > > the proper correctness check in vectorizable_reduction and guides > > > SLP discovery to heuristically avoid forming later invalid groups. > > > > > > I also couldn't find any testcase that FAILs when allowing the SLP > > > reductions to form so I've added one. > > > > > > I came across this for single-lane SLP reductions with the all-SLP > > > work where we rely on patterns to properly vectorize COND_EXPR > > > reductions. > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1. > > > > Re-bootstrapped/tested, r15-361-g52d4691294c847 > > Awesome! > > Does this now allow us to write new reductions using patterns? i.e. > widening reductions? Yes (SLP reductions, that is). This is really only for SLP reductions (not SLP reduction chains, not non-SLP reductions). So it's just a corner-case but since with SLP-only non-SLP reductions become SLP reductions with a single lane that was important to fix ;) Richard. > Cheers, > Tamar > > > > Richard. > > > > > Richard. > > > > > > * tree-vect-patterns.cc (vect_pattern_recog_1): Do not > > > remove reductions involving patterns. > > > * tree-vect-loop.cc (vectorizable_reduction): Reject SLP > > > reduction groups with multiple lane-reducing reductions. > > > * tree-vect-slp.cc (vect_analyze_slp_instance): When discovering > > > SLP reduction groups avoid including lane-reducing ones. 
> > > > > > * gcc.dg/vect/vect-reduc-sad-9.c: New testcase. > > > --- > > > gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c | 68 > > > gcc/tree-vect-loop.cc| 15 + > > > gcc/tree-vect-patterns.cc| 13 > > > gcc/tree-vect-slp.cc | 26 +--- > > > 4 files changed, 101 insertions(+), 21 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c > > > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c > > b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c > > > new file mode 100644 > > > index 000..3c6af4510f4 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c > > > @@ -0,0 +1,68 @@ > > > +/* Disabling epilogues until we find a better way to deal with scans. */ > > > +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ > > > +/* { dg-additional-options "-msse4.2" { target { x86_64-*-* i?86-*-* } } > > > } */ > > > +/* { dg-require-effective-target vect_usad_char } */ > > > + > > > +#include > > > +#include "tree-vect.h" > > > + > > > +#define N 64 > > > + > > > +unsigned char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__))); > > > +unsigned char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__))); > > > +int abs (int); > > > + > > > +/* Sum of absolute differences between arrays of unsigned char types. > > > + Detected as a sad pattern. > > > + Vectorized on targets that support sad for unsigned chars. */ > > > + > > > +__attribute__ ((noinline)) int > > > +foo (int len, int *res2) > > > +{ > > > + int i; > > > + int result = 0; > > > + int result2 = 0; > > > + > > > + for (i = 0; i < len; i++) > > > +{ > > > + /* Make sure we are not using an SLP reduction for this. 
*/ > > > + result += abs (X[2*i] - Y[2*i]); > > > + result2 += abs (X[2*i + 1] - Y[2*i + 1]); > > > +} > > > + > > > + *res2 = result2; > > > + return result; > > > +} > > > + > > > + > > > +int > > > +main (void) > > > +{ > > > + int i; > > > + int sad; > > > + > > > + check_vect (); > > > + > > > + for (i = 0; i < N/2; i++) > > > +{ > > > + X[2*i] = i; > > > + Y[2*i] = N/2 - i; > > > + X[2*i+1] = i; > > > + Y[2*i+1] = 0; > > > + __asm__ volatile (""); > > > +} > > > + > > > + > > > + int sad2; > > > + sad = foo (N/2, ); > > > + if (sad != (N/2)*(N/4)) > > > +abort (); > > > + if (sad2 != (N/2-1)*(N/2)/2) > > > +abort (); > > > + > > > + return 0; > > > +} > > > + > > > +/* { dg-final { scan-tree-dump "vect_recog_sad_pattern: detected" "vect" > > > } } */ > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ > > > + > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > > > index 35f1f8c7d42..13dcdba403a 100644 > > > --- a/gcc/tree-vect-loop.cc > > > +++ b/gcc/tree-vect-loop.cc > > > @@ -7703,6 +7703,21
Re: [PATCH] c++: Optimize in maybe_clone_body aliases even when not at_eof [PR113208]
On Fri, May 10, 2024 at 03:59:25PM -0400, Jason Merrill wrote:
> > 2024-05-09  Jakub Jelinek
> >             Jason Merrill
> >
> >     PR lto/113208
> >     * cp-tree.h (maybe_optimize_cdtor): Remove.
> >     * decl2.cc (tentative_decl_linkage): Call maybe_make_one_only
> >     for implicit instantiations of maybe in charge ctors/dtors
> >     declared inline.
> >     (import_export_decl): Don't call maybe_optimize_cdtor.
> >     (c_parse_final_cleanups): Formatting fixes.
> >     * optimize.cc (can_alias_cdtor): Adjust condition, for
> >     HAVE_COMDAT_GROUP && DECL_ONE_ONLY && DECL_WEAK return true even
> >     if not DECL_INTERFACE_KNOWN.
>
> > --- gcc/cp/optimize.cc.jj 2024-04-25 20:33:30.771858912 +0200
> > +++ gcc/cp/optimize.cc 2024-05-09 17:10:23.920478922 +0200
> > @@ -220,10 +220,8 @@ can_alias_cdtor (tree fn)
> >    gcc_assert (DECL_MAYBE_IN_CHARGE_CDTOR_P (fn));
> >    /* Don't use aliases for weak/linkonce definitions unless we can put both
> >       symbols in the same COMDAT group.  */
> > -  return (DECL_INTERFACE_KNOWN (fn)
> > -          && (SUPPORTS_ONE_ONLY || !DECL_WEAK (fn))
> > -          && (!DECL_ONE_ONLY (fn)
> > -              || (HAVE_COMDAT_GROUP && DECL_WEAK (fn))));
> > +  return (DECL_WEAK (fn) ? (HAVE_COMDAT_GROUP && DECL_ONE_ONLY (fn))
> > +                         : (DECL_INTERFACE_KNOWN (fn) && !DECL_ONE_ONLY (fn)));
>
> Hmm, would
>
> (DECL_ONE_ONLY (fn) ? HAVE_COMDAT_GROUP
>  : (DECL_INTERFACE_KNOWN (fn) && !DECL_WEAK (fn)))
>
> make sense instead?  I don't think DECL_WEAK is necessary for COMDAT.

I think it isn't indeed necessary for COMDAT, although e.g. comdat_linkage
will not call make_decl_one_only if !flag_weak.

But I think it is absolutely required for the alias cdtor optimization
in question, because otherwise it would be an ABI change.

Consider an older version of GCC or some other compiler emitting
_ZN6vectorI12QualityValueEC1ERKS1_
and
_ZN6vectorI12QualityValueEC2ERKS1_
symbols not as aliases, each in their own comdat groups, so
.text._ZN6vectorI12QualityValueEC1ERKS1_ in the
_ZN6vectorI12QualityValueEC1ERKS1_ comdat group and
.text._ZN6vectorI12QualityValueEC2ERKS1_ in the
_ZN6vectorI12QualityValueEC2ERKS1_ comdat group.

And then comes GCC with the above patch without the DECL_WEAK check in
there, and decides to use an alias, so
_ZN6vectorI12QualityValueEC1ERKS1_ is an alias to
_ZN6vectorI12QualityValueEC2ERKS1_ and both live in the
.text._ZN6vectorI12QualityValueEC2ERKS1_ section in the
_ZN6vectorI12QualityValueEC5ERKS1_ comdat group.

If you mix TUs with this, the linker can keep one of the section sets
from the _ZN6vectorI12QualityValueEC1ERKS1_ and
_ZN6vectorI12QualityValueEC2ERKS1_ and
_ZN6vectorI12QualityValueEC5ERKS1_ comdat groups.
If there is no .weak for the symbols, this will fail to link: one can
emit it either the old way or the new way but never both, it is part of
an ABI.
While with .weak, mixing it is possible, worst case one gets some unused
code in the linked binary or shared library.  Of course the desirable
case is that there is no mixing and there is no unused code, but if it
happens, no big deal.  Without .weak it is a big deal.

    Jakub
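[Archive note] The two candidate conditions discussed above can be compared mechanically. The following is only a sketch: the struct and function names are hypothetical stand-ins for the DECL_* macros (they are not GCC code), and it assumes a target where HAVE_COMDAT_GROUP is true.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the flags the real macros read off a decl.  */
struct cdtor_flags
{
  bool interface_known;  /* models DECL_INTERFACE_KNOWN (fn) */
  bool weak;             /* models DECL_WEAK (fn) */
  bool one_only;         /* models DECL_ONE_ONLY (fn) */
};

/* Assumption: the target supports COMDAT groups.  */
#define MODEL_HAVE_COMDAT_GROUP true

/* Condition as written in the patch.  */
static bool
alias_ok_patch (struct cdtor_flags f)
{
  return f.weak ? (MODEL_HAVE_COMDAT_GROUP && f.one_only)
                : (f.interface_known && !f.one_only);
}

/* Condition suggested in the review.  */
static bool
alias_ok_suggested (struct cdtor_flags f)
{
  return f.one_only ? MODEL_HAVE_COMDAT_GROUP
                    : (f.interface_known && !f.weak);
}

/* Count the flag combinations on which the two conditions disagree and
   record whether every disagreement is a non-weak one-only decl.  */
static int
count_disagreements (bool *only_nonweak_one_only)
{
  int n = 0;
  *only_nonweak_one_only = true;
  for (int m = 0; m < 8; m++)
    {
      struct cdtor_flags f = { (m & 1) != 0, (m & 2) != 0, (m & 4) != 0 };
      if (alias_ok_patch (f) != alias_ok_suggested (f))
        {
          n++;
          if (f.weak || !f.one_only)
            *only_nonweak_one_only = false;
        }
    }
  return n;
}
```

Under these assumptions the two forms differ only for one-only decls that are not weak, which is the case the ABI argument above is about.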
RE: [PATCH] Allow patterns in SLP reductions
> -Original Message- > From: Richard Biener > Sent: Friday, May 10, 2024 2:07 PM > To: Richard Biener > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] Allow patterns in SLP reductions > > On Fri, Mar 1, 2024 at 10:21 AM Richard Biener wrote: > > > > The following removes the over-broad rejection of patterns for SLP > > reductions which is done by removing them from LOOP_VINFO_REDUCTIONS > > during pattern detection. That's also insufficient in case the > > pattern only appears on the reduction path. Instead this implements > > the proper correctness check in vectorizable_reduction and guides > > SLP discovery to heuristically avoid forming later invalid groups. > > > > I also couldn't find any testcase that FAILs when allowing the SLP > > reductions to form so I've added one. > > > > I came across this for single-lane SLP reductions with the all-SLP > > work where we rely on patterns to properly vectorize COND_EXPR > > reductions. > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1. > > Re-bootstrapped/tested, r15-361-g52d4691294c847 Awesome! Does this now allow us to write new reductions using patterns? i.e. widening reductions? Cheers, Tamar > > Richard. > > > Richard. > > > > * tree-vect-patterns.cc (vect_pattern_recog_1): Do not > > remove reductions involving patterns. > > * tree-vect-loop.cc (vectorizable_reduction): Reject SLP > > reduction groups with multiple lane-reducing reductions. > > * tree-vect-slp.cc (vect_analyze_slp_instance): When discovering > > SLP reduction groups avoid including lane-reducing ones. > > > > * gcc.dg/vect/vect-reduc-sad-9.c: New testcase. 
> > --- > > gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c | 68 > > gcc/tree-vect-loop.cc| 15 + > > gcc/tree-vect-patterns.cc| 13 > > gcc/tree-vect-slp.cc | 26 +--- > > 4 files changed, 101 insertions(+), 21 deletions(-) > > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c > > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c > b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c > > new file mode 100644 > > index 000..3c6af4510f4 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-sad-9.c > > @@ -0,0 +1,68 @@ > > +/* Disabling epilogues until we find a better way to deal with scans. */ > > +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ > > +/* { dg-additional-options "-msse4.2" { target { x86_64-*-* i?86-*-* } } } > > */ > > +/* { dg-require-effective-target vect_usad_char } */ > > + > > +#include > > +#include "tree-vect.h" > > + > > +#define N 64 > > + > > +unsigned char X[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__))); > > +unsigned char Y[N] __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__))); > > +int abs (int); > > + > > +/* Sum of absolute differences between arrays of unsigned char types. > > + Detected as a sad pattern. > > + Vectorized on targets that support sad for unsigned chars. */ > > + > > +__attribute__ ((noinline)) int > > +foo (int len, int *res2) > > +{ > > + int i; > > + int result = 0; > > + int result2 = 0; > > + > > + for (i = 0; i < len; i++) > > +{ > > + /* Make sure we are not using an SLP reduction for this. 
*/ > > + result += abs (X[2*i] - Y[2*i]); > > + result2 += abs (X[2*i + 1] - Y[2*i + 1]); > > +} > > + > > + *res2 = result2; > > + return result; > > +} > > + > > + > > +int > > +main (void) > > +{ > > + int i; > > + int sad; > > + > > + check_vect (); > > + > > + for (i = 0; i < N/2; i++) > > +{ > > + X[2*i] = i; > > + Y[2*i] = N/2 - i; > > + X[2*i+1] = i; > > + Y[2*i+1] = 0; > > + __asm__ volatile (""); > > +} > > + > > + > > + int sad2; > > + sad = foo (N/2, ); > > + if (sad != (N/2)*(N/4)) > > +abort (); > > + if (sad2 != (N/2-1)*(N/2)/2) > > +abort (); > > + > > + return 0; > > +} > > + > > +/* { dg-final { scan-tree-dump "vect_recog_sad_pattern: detected" "vect" } > > } */ > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ > > + > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc > > index 35f1f8c7d42..13dcdba403a 100644 > > --- a/gcc/tree-vect-loop.cc > > +++ b/gcc/tree-vect-loop.cc > > @@ -7703,6 +7703,21 @@ vectorizable_reduction (loop_vec_info loop_vinfo, > >return false; > > } > > > > + /* Lane-reducing ops also never can be used in a SLP reduction group > > + since we'll mix lanes belonging to different reductions. But it's > > + OK to use them in a reduction chain or when the reduction group > > + has just one element. */ > > + if (lane_reduc_code_p > > + && slp_node > > + && !REDUC_GROUP_FIRST_ELEMENT (stmt_info) > > + && SLP_TREE_LANES (slp_node) > 1) > > +{ > > + if (dump_enabled_p ()) > > + dump_printf_loc
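[Archive note] The expected values checked by the quoted testcase can be reproduced with a plain scalar version of its loop. This is a sketch, not the testcase itself: the function names are hypothetical, the dg directives and check_vect are dropped, and the second argument of the call is assumed to be a pointer to the second result, since the archived text shows it elided.

```c
#include <stdlib.h>

#define N 64

static unsigned char X[N];
static unsigned char Y[N];

/* Two interleaved sum-of-absolute-differences reductions, as in the
   quoted foo ().  */
static int
sad_pair (int len, int *res2)
{
  int result = 0;
  int result2 = 0;

  for (int i = 0; i < len; i++)
    {
      result += abs (X[2 * i] - Y[2 * i]);
      result2 += abs (X[2 * i + 1] - Y[2 * i + 1]);
    }

  *res2 = result2;
  return result;
}

/* Same input data as the quoted main ().  */
static void
init_inputs (void)
{
  for (int i = 0; i < N / 2; i++)
    {
      X[2 * i] = i;
      Y[2 * i] = N / 2 - i;
      X[2 * i + 1] = i;
      Y[2 * i + 1] = 0;
    }
}
```

With N = 64 the even lanes sum |2*i - 32| for i in 0..31, which is (N/2)*(N/4) = 512, and the odd lanes sum i for i in 0..31, which is (N/2-1)*(N/2)/2 = 496.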
Re: [EXTERNAL] [COMMITTED] Regenerate cygming.opt.urls and mingw.opt.urls
Hi Evgeny, Adding David to the CC, who might know the details. On Mon, May 13, 2024 at 08:44:12AM +, Evgeny Karpov wrote: > Sunday, May 12, 2024 > > Thank you for reviewing our changes related to the refactoring of > extracting the MinGW implementation from ix64. > > It was expected to move the MinGW-related files without changes in > this commit ("Reuse MinGW from i386 for AArch64") and apply the > renaming in a follow-up commit, which has been done in 'Rename "x86 > Windows Options" to "Cygwin and MinGW Options"'. > > The script to update opt.urls files has been used. > > > diff --git a/gcc/config/mingw/cygming.opt.urls > > b/gcc/config/mingw/cygming.opt.urls > > index c624e22e4427..af11c4997609 100644 > > --- a/gcc/config/mingw/cygming.opt.urls > > +++ b/gcc/config/mingw/cygming.opt.urls > > @@ -1,4 +1,4 @@ > > > -; Autogenerated by regenerate-opt-urls.py from gcc/config/i386/cygming.opt > > and generated HTML > > +; Autogenerated by regenerate-opt-urls.py from > > +gcc/config/mingw/cygming.opt and generated HTML > > I am not sure why this comment has not been updated. Is it critical > or it could be updated next time when it is needed? Odd that the script didn't update this comment, it really should have. It might be that running the script through make regenerate-opt-urls inside the gcc build subdir invokes regenerate-opt-urls.py slightly differently so that this line is updated. > > mconsole > > UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mconsole) > > @@ -9,9 +9,8 @@ UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index- > > mdll) > > mnop-fun-dllimport > > UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mnop-fun-dllimport) > > > > -; skipping UrlSuffix for 'mthreads' due to multiple URLs: > > -; duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1' > > -; duplicate: 'gcc/x86-Options.html#index-mthreads' > > +mthreads > > +UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1) > > mthreads has the same issue before applying changes. 
Has something been > changed recently? > This is the change in patch series in 'Rename "x86 Windows Options" to > "Cygwin and MinGW Options"' commit. > > ; skipping UrlSuffix for 'mthreads' due to multiple URLs: > +; duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1' > ; duplicate: 'gcc/x86-Options.html#index-mthreads' > -; duplicate: 'gcc/x86-Windows-Options.html#index-mthreads-1' Again, it might be caused by invoking the script by hand vs with make regenerate-opt-urls.py. I believe with the make option it will renumber the suffixes making sure the urls are unique. BTW. There is a CI buildbot that tries to regenerate all generated files, which is how I spotted this: https://builder.sourceware.org/buildbot/#/builders/gcc-autoregen (It should also send email to the author of the patch on failure.) Cheers, Mark
RE: [PATCH v4 2/3] VECT: Support new IFN SAT_ADD for unsigned vector int
Hi Pan, > -Original Message- > From: pan2...@intel.com > Sent: Monday, May 6, 2024 3:49 PM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina > ; richard.guent...@gmail.com; > hongtao@intel.com; Pan Li > Subject: [PATCH v4 2/3] VECT: Support new IFN SAT_ADD for unsigned vector int > > From: Pan Li > > This patch depends on below scalar enabling patch: > > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650822.html > > For vectorize, we leverage the existing vect pattern recog to find > the pattern similar to scalar and let the vectorizer to perform > the rest part for standard name usadd3 in vector mode. > The riscv vector backend have insn "Vector Single-Width Saturating > Add and Subtract" which can be leveraged when expand the usadd3 > in vector mode. For example: > > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n) > { > unsigned i; > > for (i = 0; i < n; i++) > out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i])); > } > > Before this patch: > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n) > { > ... > _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]); > ivtmp_58 = _80 * 8; > vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0); > vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0); > vect__7.11_66 = vect__4.7_61 + vect__6.10_65; > mask__8.12_67 = vect__4.7_61 > vect__7.11_66; > vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, > ... }, vect__7.11_66); > .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72); > vectp_x.5_60 = vectp_x.5_59 + ivtmp_58; > vectp_y.8_64 = vectp_y.8_63 + ivtmp_58; > vectp_out.16_75 = vectp_out.16_74 + ivtmp_58; > ivtmp_79 = ivtmp_78 - _80; > ... > } > > After this patch: > void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n) > { > ... 
> _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]); > ivtmp_46 = _62 * 8; > vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0); > vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0); > vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53); > .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54); > ... > } > > The below test suites are passed for this patch. > * The riscv fully regression tests. > * The aarch64 fully regression tests. > * The x86 bootstrap tests. > * The x86 fully regression tests. > > PR target/51492 > PR target/112600 > > gcc/ChangeLog: > > * tree-vect-patterns.cc (gimple_unsigned_integer_sat_add): New func > decl generated by match.pd match. > (vect_recog_sat_add_pattern): New func impl to recog the pattern > for unsigned SAT_ADD. > > Signed-off-by: Pan Li Patch looks good to me, but I cannot approve so I'll pass it on to Richi. Cheers, Tamar > --- > gcc/tree-vect-patterns.cc | 51 +++ > 1 file changed, 51 insertions(+) > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc > index 87c2acff386..8ffcaf71d5c 100644 > --- a/gcc/tree-vect-patterns.cc > +++ b/gcc/tree-vect-patterns.cc > @@ -4487,6 +4487,56 @@ vect_recog_mult_pattern (vec_info *vinfo, >return pattern_stmt; > } > > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree)); > + > +/* > + * Try to detect saturation add pattern (SAT_ADD), aka below gimple: > + * _7 = _4 + _6; > + * _8 = _4 > _7; > + * _9 = (long unsigned int) _8; > + * _10 = -_9; > + * _12 = _7 | _10; > + * > + * And then simplied to > + * _12 = .SAT_ADD (_4, _6); > + */ > + > +static gimple * > +vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo, > + tree *type_out) > +{ > + gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo); > + > + if (!is_gimple_assign (last_stmt)) > +return NULL; > + > + tree res_ops[2]; > + tree lhs = gimple_assign_lhs (last_stmt); > + > + if (gimple_unsigned_integer_sat_add (lhs, res_ops, 
NULL)) > +{ > + tree itype = TREE_TYPE (res_ops[0]); > + tree vtype = get_vectype_for_scalar_type (vinfo, itype); > + > + if (vtype != NULL_TREE && direct_internal_fn_supported_p ( > + IFN_SAT_ADD, vtype, OPTIMIZE_FOR_SPEED)) > + { > + *type_out = vtype; > + gcall *call = gimple_build_call_internal (IFN_SAT_ADD, 2, res_ops[0], > + res_ops[1]); > + > + gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL)); > + gimple_call_set_nothrow (call, /* nothrow_p */ false); > + gimple_set_location (call, gimple_location (last_stmt)); > + > + vect_pattern_detected ("vect_recog_sat_add_pattern", last_stmt); > + return call; > + } > +} > + > + return NULL; > +} > + > /* Detect a signed division by a constant that wouldn't be >
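[Archive note] The loop from the patch description can be checked against a straightforward branching version. This is a sketch in plain C: vec_sat_add_u64 follows the quoted example, while vec_sat_add_u64_ref is a hypothetical reference added here for comparison. Nothing in it depends on the vectorizer or on .SAT_ADD being available.

```c
#include <stdint.h>

/* Branchless form from the patch description: on wrap-around the
   comparison yields 1, negation turns that into all-ones, and the OR
   forces the result to the maximum value.  */
void
vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

/* Hypothetical reference: the same operation with an explicit select.  */
void
vec_sat_add_u64_ref (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    {
      uint64_t sum = x[i] + y[i];
      out[i] = sum < x[i] ? UINT64_MAX : sum;
    }
}
```

The "before" GIMPLE in the quoted message corresponds to the select form (compare plus .VCOND_MASK), and the "after" GIMPLE replaces both statements with a single .SAT_ADD call.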
RE: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int
Hi Pan, > -Original Message- > From: pan2...@intel.com > Sent: Monday, May 6, 2024 3:48 PM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina > ; richard.guent...@gmail.com; > hongtao@intel.com; Pan Li > Subject: [PATCH v4 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned > scalar > int > > From: Pan Li > > This patch would like to add the middle-end presentation for the > saturation add. Aka set the result of add to the max when overflow. > It will take the pattern similar as below. > > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x)) > > Take uint8_t as example, we will have: > > * SAT_ADD (1, 254) => 255. > * SAT_ADD (1, 255) => 255. > * SAT_ADD (2, 255) => 255. > * SAT_ADD (255, 255) => 255. > > Given below example for the unsigned scalar integer uint64_t: > > uint64_t sat_add_u64 (uint64_t x, uint64_t y) > { > return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x)); > } > > Before this patch: > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y) > { > long unsigned int _1; > _Bool _2; > long unsigned int _3; > long unsigned int _4; > uint64_t _7; > long unsigned int _10; > __complex__ long unsigned int _11; > > ;; basic block 2, loop depth 0 > ;;pred: ENTRY > _11 = .ADD_OVERFLOW (x_5(D), y_6(D)); > _1 = REALPART_EXPR <_11>; > _10 = IMAGPART_EXPR <_11>; > _2 = _10 != 0; > _3 = (long unsigned int) _2; > _4 = -_3; > _7 = _1 | _4; > return _7; > ;;succ: EXIT > > } > > After this patch: > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y) > { > uint64_t _7; > > ;; basic block 2, loop depth 0 > ;;pred: ENTRY > _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call] > return _7; > ;;succ: EXIT > } > > We perform the tranform during widen_mult because that the sub-expr of > SAT_ADD will be optimized to .ADD_OVERFLOW. We need to try the .SAT_ADD > pattern first and then .ADD_OVERFLOW, or we may never catch the pattern > .SAT_ADD. 
Meanwhile, the isel pass is after widen_mult and then we > cannot perform the .SAT_ADD pattern match as the sub-expr will be > optmized to .ADD_OVERFLOW first. > > The below tests are passed for this patch: > 1. The riscv fully regression tests. > 2. The aarch64 fully regression tests. > 3. The x86 bootstrap tests. > 4. The x86 fully regression tests. > > PR target/51492 > PR target/112600 > > gcc/ChangeLog: > > * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD > to the return true switch case(s). > * internal-fn.def (SAT_ADD): Add new signed optab SAT_ADD. > * match.pd: Add unsigned SAT_ADD match. > * optabs.def (OPTAB_NL): Remove fixed-point limitation for us/ssadd. > * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New extern > func decl generated in match.pd match. > (match_saturation_arith): New func impl to match the saturation arith. > (math_opts_dom_walker::after_dom_children): Try match saturation > arith. > > Signed-off-by: Pan Li > --- > gcc/internal-fn.cc| 1 + > gcc/internal-fn.def | 2 ++ > gcc/match.pd | 28 > gcc/optabs.def| 4 ++-- > gcc/tree-ssa-math-opts.cc | 46 > +++ > 5 files changed, 79 insertions(+), 2 deletions(-) > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index 0a7053c2286..73045ca8c8c 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn) > case IFN_UBSAN_CHECK_MUL: > case IFN_ADD_OVERFLOW: > case IFN_MUL_OVERFLOW: > +case IFN_SAT_ADD: > case IFN_VEC_WIDEN_PLUS: > case IFN_VEC_WIDEN_PLUS_LO: > case IFN_VEC_WIDEN_PLUS_HI: > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def > index 848bb9dbff3..25badbb86e5 100644 > --- a/gcc/internal-fn.def > +++ b/gcc/internal-fn.def > @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST > | ECF_NOTHROW, first, > DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, > first, > smulhrs, umulhrs, binary) > > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, 
usadd, > binary) > + > DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary) > DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary) > DEF_INTERNAL_COND_FN (MUL, ECF_CONST, smul, binary) > diff --git a/gcc/match.pd b/gcc/match.pd > index d401e7503e6..7058e4cbe29 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3043,6 +3043,34 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > || POINTER_TYPE_P (itype)) >&& wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype)) > > +/* Unsigned Saturation Add */ > +(match (usadd_left_part @0 @1) > + (plus:c @0 @1) > + (if (INTEGRAL_TYPE_P (type) > + && TYPE_UNSIGNED (TREE_TYPE (@0)) > + && types_match (type, TREE_TYPE (@0)) > + && types_match (type,
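[Archive note] The uint8_t examples listed in the patch description can be verified with a direct C rendering of the formula SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x)). This is a sketch; the function name sat_add_u8 is hypothetical, and the explicit casts are there only because C promotes uint8_t operands to int.

```c
#include <stdint.h>

/* Unsigned 8-bit saturating add in the branchless form from the patch
   description.  */
static uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  /* Truncate the promoted int sum back to 8 bits so it wraps.  */
  uint8_t sum = (uint8_t) (x + y);

  /* sum < x is 1 exactly when the addition wrapped; negating and
     truncating gives 0xff in that case and 0 otherwise.  */
  uint8_t mask = (uint8_t) -(uint8_t) (sum < x);

  return (uint8_t) (sum | mask);
}
```

For instance sat_add_u8 (1, 255) wraps to 0, the comparison 0 < 1 holds, and the mask forces the result to 255.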
Re: [pushed 00/21] Various backports to gcc 13 (analyzer, jit, diagnostics)
On Thu, May 09, 2024 at 01:42:15PM -0400, David Malcolm wrote:
> I've pushed the following changes to releases/gcc-13
> as r13-8741-g89feb3557a0188 through r13-8761-gb7a2697733d19a.

Unfortunately many of the commits contained git commit message wording
that update_git_version can't cope with.
Wording like
(cherry picked from commit r14-1664-gfe9771b59f576f)
is wrong,
(cherry picked from commit .)
is reserved solely for what one gets from git cherry-pick -x
(i.e. the full commit hash without anything extra).

I had to ignore the following commits in the ChangeLog generation
because of this:

89feb3557a018893cfe50c2e07f91559bd3cde2b
ccf8d3e3d26c6ba3d5e11fffeed8d64018e9c060
e0c52905f666e3d23881f82dbf39466a24f009f4
b38472ffc1e631bd357573b44d956ce16d94e666
a0b13d0860848dd5f2876897ada1e22e4e681e91
b8c772cae97b54386f7853edf0f9897012bfa90b
810d35a7e054bcbb5b66d2e5924428e445f5fba9
0df1ee083434ac00ecb19582b1e5b25e105981b2
2c688f6afce4cbb414f5baab1199cd525f309fca
60dcb710b6b4aa22ea96abc8df6dfe9067f3d7fe
44968a0e00f656e9bb3e504bb2fa1a8282002015

Can you please add the ChangeLog entries for these by hand (commits
which only touch ChangeLog files are allowed and shouldn't contain
ChangeLog style entry in the commit message)?
Thanks.

    Jakub
RE: [EXTERNAL] [COMMITTED] Regenerate cygming.opt.urls and mingw.opt.urls
Sunday, May 12, 2024 Mark Wielaard wrote: > The new cygming.opt.urls and mingw.opt.urls in the > gcc/config/mingw/cygming.opt.urls directory need to generated by make > regenerate-opt-urls in the gcc subdirectory. They still contained references > to > the gcc/config/i386 directory from which they were copied. > > Fixes: 1f05dfc131c7 ("Reuse MinGW from i386 for AArch64") > Fixes: e8d003736e6c ("Rename "x86 Windows Options" to "Cygwin and > MinGW Options"") > > gcc/ChangeLog: > > * config/mingw/cygming.opt.urls: Regenerate. > * config/mingw/mingw.opt.urls: Likewise. > --- Hello Mark, Thank you for reviewing our changes related to the refactoring of extracting the MinGW implementation from ix64. It was expected to move the MinGW-related files without changes in this commit ("Reuse MinGW from i386 for AArch64") and apply the renaming in a follow-up commit, which has been done in 'Rename "x86 Windows Options" to "Cygwin and MinGW Options"'. The script to update opt.urls files has been used. > gcc/config/mingw/cygming.opt.urls | 7 +++ > gcc/config/mingw/mingw.opt.urls | 2 +- > 2 files changed, 4 insertions(+), 5 deletions(-) > > diff --git a/gcc/config/mingw/cygming.opt.urls > b/gcc/config/mingw/cygming.opt.urls > index c624e22e4427..af11c4997609 100644 > --- a/gcc/config/mingw/cygming.opt.urls > +++ b/gcc/config/mingw/cygming.opt.urls > @@ -1,4 +1,4 @@ > -; Autogenerated by regenerate-opt-urls.py from gcc/config/i386/cygming.opt > and generated HTML > +; Autogenerated by regenerate-opt-urls.py from > +gcc/config/mingw/cygming.opt and generated HTML I am not sure why this comment has not been updated. Is it critical or it could be updated next time when it is needed? 
> > mconsole > UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mconsole) > @@ -9,9 +9,8 @@ UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index- > mdll) > mnop-fun-dllimport > UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mnop-fun-dllimport) > > -; skipping UrlSuffix for 'mthreads' due to multiple URLs: > -; duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1' > -; duplicate: 'gcc/x86-Options.html#index-mthreads' > +mthreads > +UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1) mthreads has the same issue before applying changes. Has something been changed recently? This is the change in patch series in 'Rename "x86 Windows Options" to "Cygwin and MinGW Options"' commit. ; skipping UrlSuffix for 'mthreads' due to multiple URLs: +; duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1' ; duplicate: 'gcc/x86-Options.html#index-mthreads' -; duplicate: 'gcc/x86-Windows-Options.html#index-mthreads-1' Regards, Evgeny > mwin32 > UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mwin32) > diff --git a/gcc/config/mingw/mingw.opt.urls > b/gcc/config/mingw/mingw.opt.urls index f8ee5be6a535..40fb086606b2 > 100644 > --- a/gcc/config/mingw/mingw.opt.urls > +++ b/gcc/config/mingw/mingw.opt.urls > @@ -1,4 +1,4 @@ > -; Autogenerated by regenerate-opt-urls.py from gcc/config/i386/mingw.opt > and generated HTML > +; Autogenerated by regenerate-opt-urls.py from > +gcc/config/mingw/mingw.opt and generated HTML > > mcrtdll= > UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mcrtdll) > -- > 2.39.3
[PATCH] testsuite: c++: Allow for std::printf in g++.dg/modules/stdio-1_a.H [PR98529]
g++.dg/modules/stdio-1_a.H currently FAILs on Solaris:

FAIL: g++.dg/modules/stdio-1_a.H -std=c++17 scan-lang-dump module "Depset:0 decl entity:[0-9]* function_decl:'::printf'"
FAIL: g++.dg/modules/stdio-1_a.H -std=c++2a scan-lang-dump module "Depset:0 decl entity:[0-9]* function_decl:'::printf'"
FAIL: g++.dg/modules/stdio-1_a.H -std=c++2b scan-lang-dump module "Depset:0 decl entity:[0-9]* function_decl:'::printf'"

The problem is that the module file doesn't contain

  Depset:0 decl entity:95 function_decl:'::printf'

as expected by the test, but

  Depset:0 decl entity:26 function_decl:'::std::printf'

This happens because Solaris declares printf in namespace std as allowed
by C++11, Annex D, D.5.

This patch allows for both forms.

Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
x86_64-pc-linux-gnu.

Ok for trunk?

    Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-05-13  Rainer Orth

    gcc/testsuite:
    PR c++/98529
    * g++.dg/modules/stdio-1_a.H (scan-lang-dump): Allow for
    ::std::printf.

diff --git a/gcc/testsuite/g++.dg/modules/stdio-1_a.H b/gcc/testsuite/g++.dg/modules/stdio-1_a.H
--- a/gcc/testsuite/g++.dg/modules/stdio-1_a.H
+++ b/gcc/testsuite/g++.dg/modules/stdio-1_a.H
@@ -10,5 +10,5 @@
 #endif
 // There should be *lots* of depsets (209 for glibc today)
 // { dg-final { scan-lang-dump {Writing section:60 } module } }
-// { dg-final { scan-lang-dump {Depset:0 decl entity:[0-9]* function_decl:'::printf'} module } }
+// { dg-final { scan-lang-dump {Depset:0 decl entity:[0-9]* function_decl:'(::std)?::printf'} module } }
 // { dg-final { scan-lang-dump {Depset:1 binding namespace_decl:'::printf'} module } }
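[Archive note] The relaxed pattern can be tried outside the testsuite. The sketch below uses POSIX <regex.h> with extended syntax, which is an assumption: the testsuite itself matches with Tcl regular expressions, but for this particular pattern the two should agree. The helper name dump_line_matches is hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if LINE matches the pattern from the patched dg-final
   directive.  */
static bool
dump_line_matches (const char *line)
{
  static const char pattern[]
    = "Depset:0 decl entity:[0-9]* function_decl:'(::std)?::printf'";
  regex_t re;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;

  bool matched = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

Both dump lines quoted in the message (the glibc-style '::printf' and the Solaris-style '::std::printf') should match, while a printf in some other namespace should not.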
[COMMITTED] ada: Attributes Put_Image and Object_Size are defined by Ada 2022
From: Piotr Trojanek Recognize references to attributes Put_Image and Object_Size as language-defined in Ada 2022 and implementation-defined in earlier versions of Ada. Other attributes listed in Ada 2022 RM, K.2 and currently implemented in GNAT are correctly categorized. This change only affects code with restriction No_Implementation_Attributes. gcc/ada/ * sem_attr.adb (Attribute_22): Add Put_Image and Object_Size. * sem_attr.ads (Attribute_Imp_Def): Remove Object_Size. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/sem_attr.adb | 4 +++- gcc/ada/sem_attr.ads | 11 --- 2 files changed, 3 insertions(+), 12 deletions(-) diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb index 65442d45a85..b979ffdf0b1 100644 --- a/gcc/ada/sem_attr.adb +++ b/gcc/ada/sem_attr.adb @@ -181,7 +181,9 @@ package body Sem_Attr is (Attribute_Enum_Rep | Attribute_Enum_Val | Attribute_Index| - Attribute_Preelaborable_Initialization => True, + Attribute_Object_Size | + Attribute_Preelaborable_Initialization | + Attribute_Put_Image=> True, others => False); -- The following array contains all attributes that imply a modification diff --git a/gcc/ada/sem_attr.ads b/gcc/ada/sem_attr.ads index 4c9f27043c6..65b7b534711 100644 --- a/gcc/ada/sem_attr.ads +++ b/gcc/ada/sem_attr.ads @@ -373,17 +373,6 @@ package Sem_Attr is -- other composite object passed by reference, there is no other way -- of specifying that a zero address should be passed. - - - -- Object_Size -- - - - - Attribute_Object_Size => True, - -- Type'Object_Size is the same as Type'Size for all types except - -- fixed-point types and discrete types. For fixed-point types and - -- discrete types, this attribute gives the size used for default - -- allocation of objects and components of the size. See section in - -- Einfo ("Handling of Type'Size values") for further details. - - -- Passed_By_Reference -- - -- 2.43.2
[COMMITTED] ada: Fix crash on Compile_Time_Warning in dead code
From: Bob Duff If a pragma Compile_Time_Warning triggers, and the pragma is later removed because it is dead code, then the compiler can return a bad exit code. This causes gprbuild to report "*** compilation phase failed". This is because Total_Errors_Detected, which is declared as Nat, goes negative, causing Constraint_Error. In assertions-off mode, the Constraint_Error is not detected, but the compiler nonetheless reports a bad exit code. This patch prevents that negative count. gcc/ada/ * errout.adb (Output_Messages): Protect against the total going negative. Tested on x86_64-pc-linux-gnu, committed on master. --- gcc/ada/errout.adb | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb index d28a410f47b..c4761bd1bc9 100644 --- a/gcc/ada/errout.adb +++ b/gcc/ada/errout.adb @@ -3399,11 +3399,16 @@ package body Errout is if Warning_Mode = Treat_As_Error then declare -Compile_Time_Pragma_Warnings : constant Int := +Compile_Time_Pragma_Warnings : constant Nat := Count_Compile_Time_Pragma_Warnings; - begin -Total_Errors_Detected := Total_Errors_Detected + Warnings_Detected +Total : constant Int := Total_Errors_Detected + Warnings_Detected - Warning_Info_Messages - Compile_Time_Pragma_Warnings; +-- We need to protect against a negative Total here, because +-- if a pragma Compile_Time_Warning occurs in dead code, it +-- gets counted in Compile_Time_Pragma_Warnings but not in +-- Warnings_Detected. + begin +Total_Errors_Detected := Int'Max (Total, 0); Warnings_Detected := Warning_Info_Messages + Compile_Time_Pragma_Warnings; end; -- 2.43.2
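[Archive note] The arithmetic behind the fix can be modelled in a few lines of C. This is only an illustration with hypothetical names, not a translation of errout.adb: it assumes the counters are plain integers and that the only goal is to keep the promoted error total from going below zero, as Int'Max (Total, 0) does in the patch.

```c
/* Model of the counter update in warnings-as-errors mode: warnings are
   promoted to errors, except info messages and compile-time pragma
   warnings.  The result is clamped at zero.  */
static int
errors_after_promotion (int total_errors, int warnings,
                        int warning_info, int compile_time_pragma_warnings)
{
  int total = total_errors + warnings
              - warning_info - compile_time_pragma_warnings;

  /* A pragma in dead code can be counted in
     compile_time_pragma_warnings but not in warnings, so total may be
     negative here.  */
  return total > 0 ? total : 0;
}
```

With one such dead-code pragma and no other diagnostics, the unclamped total would be -1, which does not fit a natural-number counter; the clamp yields 0 instead.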
[COMMITTED] ada: Refine type of a local variable
From: Piotr Trojanek

Code cleanup; semantics is unaffected.

gcc/ada/

	* sem_util.adb (Has_No_Output): Iteration with First_Formal/Next_Formal
	involves Entity_Ids.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index e9ab6650dac..03055039a1f 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -4203,7 +4203,7 @@ package body Sem_Util is
    -------------------

    function Has_No_Output (Subp : Entity_Id) return Boolean is
-      Param : Node_Id;
+      Param : Entity_Id;

    begin
       --  A function has its result as output
-- 
2.43.2
Re: [PATCH] tree-ssa-math-opts: Pattern recognize yet another .ADD_OVERFLOW pattern [PR113982]
On Mon, 13 May 2024, Jakub Jelinek wrote:

> Hi!
>
> We pattern recognize already many different patterns, and closest to the
> requested one also
>   yc = (type) y;
>   zc = (type) z;
>   x = yc + zc;
>   w = (typeof_y) x;
>   if (x > max)
> where y/z has the same unsigned type and type is a wider unsigned type
> and max is maximum value of the narrower unsigned type.
> But apparently people are creative in writing this in different ways,
> this requests
>   yc = (type) y;
>   zc = (type) z;
>   x = yc + zc;
>   w = (typeof_y) x;
>   if (x >> narrower_type_bits)
>
> The following patch implements that.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Seeing the large matching code I wonder if using a match in match.pd
might be easier to maintain (eh, and I'd still like to somehow see
"inline" match patterns in source files, not sure how, but requiring
some gen* program extracting them).

Thanks,
Richard.

> 2024-05-13  Jakub Jelinek
>
>	PR middle-end/113982
>	* tree-ssa-math-opts.cc (arith_overflow_check_p): Also return 1
>	for RSHIFT_EXPR by precision of maxval if shift result is only
>	used in a cast or comparison against zero.
>	(match_arith_overflow): Handle the RSHIFT_EXPR use case.
>
>	* gcc.dg/pr113982.c: New test.
>
> --- gcc/tree-ssa-math-opts.cc.jj 2024-04-11 09:26:36.318369218 +0200
> +++ gcc/tree-ssa-math-opts.cc 2024-05-10 18:17:08.795744811 +0200
> @@ -3947,6 +3947,66 @@ arith_overflow_check_p (gimple *stmt, gi
>    else
>      return 0;
>
> +  if (maxval
> +      && ccode == RSHIFT_EXPR
> +      && crhs1 == lhs
> +      && TREE_CODE (crhs2) == INTEGER_CST
> +      && wi::to_widest (crhs2) == TYPE_PRECISION (TREE_TYPE (maxval)))
> +    {
> +      tree shiftlhs = gimple_assign_lhs (use_stmt);
> +      if (!shiftlhs)
> +        return 0;
> +      use_operand_p use;
> +      if (!single_imm_use (shiftlhs, &use, &cur_use_stmt))
> +        return 0;
> +      if (gimple_code (cur_use_stmt) == GIMPLE_COND)
> +        {
> +          ccode = gimple_cond_code (cur_use_stmt);
> +          crhs1 = gimple_cond_lhs (cur_use_stmt);
> +          crhs2 = gimple_cond_rhs (cur_use_stmt);
> +        }
> +      else if (is_gimple_assign (cur_use_stmt))
> +        {
> +          if (gimple_assign_rhs_class (cur_use_stmt) == GIMPLE_BINARY_RHS)
> +            {
> +              ccode = gimple_assign_rhs_code (cur_use_stmt);
> +              crhs1 = gimple_assign_rhs1 (cur_use_stmt);
> +              crhs2 = gimple_assign_rhs2 (cur_use_stmt);
> +            }
> +          else if (gimple_assign_rhs_code (cur_use_stmt) == COND_EXPR)
> +            {
> +              tree cond = gimple_assign_rhs1 (cur_use_stmt);
> +              if (COMPARISON_CLASS_P (cond))
> +                {
> +                  ccode = TREE_CODE (cond);
> +                  crhs1 = TREE_OPERAND (cond, 0);
> +                  crhs2 = TREE_OPERAND (cond, 1);
> +                }
> +              else
> +                return 0;
> +            }
> +          else
> +            {
> +              enum tree_code sc = gimple_assign_rhs_code (cur_use_stmt);
> +              tree castlhs = gimple_assign_lhs (cur_use_stmt);
> +              if (!CONVERT_EXPR_CODE_P (sc)
> +                  || !castlhs
> +                  || !INTEGRAL_TYPE_P (TREE_TYPE (castlhs))
> +                  || (TYPE_PRECISION (TREE_TYPE (castlhs))
> +                      > TYPE_PRECISION (TREE_TYPE (maxval))))
> +                return 0;
> +              return 1;
> +            }
> +        }
> +      else
> +        return 0;
> +      if ((ccode != EQ_EXPR && ccode != NE_EXPR)
> +          || crhs1 != shiftlhs
> +          || !integer_zerop (crhs2))
> +        return 0;
> +      return 1;
> +    }
> +
>    if (TREE_CODE_CLASS (ccode) != tcc_comparison)
>      return 0;
>
> @@ -4049,6 +4109,7 @@ arith_overflow_check_p (gimple *stmt, gi
>       _8 = IMAGPART_EXPR <_7>;
>       if (_8)
>     and replace (utype) x with _9.
> +   Or with x >> popcount (max) instead of x > max.
>
>     Also recognize:
>       x = ~z;
> @@ -4481,10 +4542,62 @@ match_arith_overflow (gimple_stmt_iterat
>            gcc_checking_assert (is_gimple_assign (use_stmt));
>            if (gimple_assign_rhs_class (use_stmt) == GIMPLE_BINARY_RHS)
>              {
> -              gimple_assign_set_rhs1 (use_stmt, ovf);
> -              gimple_assign_set_rhs2 (use_stmt, build_int_cst (type, 0));
> -              gimple_assign_set_rhs_code (use_stmt,
> -                                          ovf_use == 1 ? NE_EXPR : EQ_EXPR);
> +              if (gimple_assign_rhs_code (use_stmt) == RSHIFT_EXPR)
> +                {
> +                  g2 = gimple_build_assign (make_ssa_name (boolean_type_node),
> +                                            ovf_use == 1 ? NE_EXPR : EQ_EXPR,
> +                                            ovf, build_int_cst (type, 0));
> +                  gimple_stmt_iterator gsiu = gsi_for_stmt (use_stmt);
> +                  gsi_insert_before (&gsiu, g2, GSI_SAME_STMT);
> +                  gimple_assign_set_rhs_with_ops (&gsiu, NOP_EXPR,
> +
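Editorial note: the two source-level idioms discussed in this thread can be written out in C as below. This is a minimal sketch that assumes 32-bit operands widened to 64 bits. The function names are hypothetical and are not taken from the patch or from gcc.dg/pr113982.c. The third function uses GCC's generic __builtin_add_overflow only as a reference for the expected result. The sketch makes no claim about the code GCC generates for any of them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Already-recognized form: compare the wide sum against the maximum
   value of the narrower type.  */
static bool
add_overflows_cmp (uint32_t y, uint32_t z, uint32_t *w)
{
  uint64_t x = (uint64_t) y + (uint64_t) z;
  *w = (uint32_t) x;
  return x > UINT32_MAX;
}

/* Form requested in PR113982: shift the wide sum right by the precision
   of the narrower type and test the result against zero.  */
static bool
add_overflows_shift (uint32_t y, uint32_t z, uint32_t *w)
{
  uint64_t x = (uint64_t) y + (uint64_t) z;
  *w = (uint32_t) x;
  return (x >> 32) != 0;
}

/* Reference: the generic overflow built-in provided by GCC.  */
static bool
add_overflows_builtin (uint32_t y, uint32_t z, uint32_t *w)
{
  return __builtin_add_overflow (y, z, w);
}
```

For an unsigned 64-bit x, x > UINT32_MAX holds exactly when x >> 32 is nonzero, so the first two functions return the same flag and the same truncated sum for every input.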
[COMMITTED] ada: Remove code that expected pre/post being split into conjuncts
From: Piotr Trojanek

The removed code is no longer needed (and causes assertion failures).
Most likely it should have been using the Split_PPC flag.

gcc/ada/

	* sem_util.adb (Is_Potentially_Unevaluated): Remove code for
	recovering the original structure of expressions with AND THEN.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.adb | 29 ++---
 1 file changed, 2 insertions(+), 27 deletions(-)

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index 1166c68b972..b5c33638b35 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -19582,39 +19582,14 @@ package body Sem_Util is

       --  Local variables

-      Par  : Node_Id;
       Expr : Node_Id;
+      Par  : Node_Id;

    --  Start of processing for Is_Potentially_Unevaluated

    begin
       Expr := N;
-      Par  := N;
-
-      --  A postcondition whose expression is a short-circuit is broken down
-      --  into individual aspects for better exception reporting. The original
-      --  short-circuit expression is rewritten as the second operand, and an
-      --  occurrence of 'Old in that operand is potentially unevaluated.
-      --  See sem_ch13.adb for details of this transformation. The reference
-      --  to 'Old may appear within an expression, so we must look for the
-      --  enclosing pragma argument in the tree that contains the reference.
-
-      while Present (Par)
-        and then Nkind (Par) /= N_Pragma_Argument_Association
-      loop
-         if Is_Rewrite_Substitution (Par)
-           and then Nkind (Original_Node (Par)) = N_And_Then
-         then
-            return True;
-         end if;
-
-         Par := Parent (Par);
-      end loop;
-
-      --  Other cases; 'Old appears within other expression (not the top-level
-      --  conjunct in a postcondition) with a potentially unevaluated operand.
-
-      Par := Parent (Expr);
+      Par  := Parent (Expr);

       while Present (Par)
         and then Nkind (Par) /= N_Pragma_Argument_Association
-- 
2.43.2
[COMMITTED] ada: Revert recent change for Put_Image and Object_Size attributes
From: Piotr Trojanek

Recent change for attribute Object_Size caused spurious errors when
restriction No_Implementation_Attributes is active and attribute
Object_Size is introduced by expansion of dispatching operations.

Temporarily revert that change for a further investigation.

gcc/ada/

	* sem_attr.adb (Attribute_22): Remove Put_Image and Object_Size.
	* sem_attr.ads (Attribute_Imp_Def): Restore Object_Size.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.adb |  4 +---
 gcc/ada/sem_attr.ads | 11 +++
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index b979ffdf0b1..65442d45a85 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -181,9 +181,7 @@ package body Sem_Attr is
      (Attribute_Enum_Rep                     |
       Attribute_Enum_Val                     |
       Attribute_Index                        |
-      Attribute_Object_Size                  |
-      Attribute_Preelaborable_Initialization |
-      Attribute_Put_Image                    => True,
+      Attribute_Preelaborable_Initialization => True,
       others                                 => False);

    --  The following array contains all attributes that imply a modification
diff --git a/gcc/ada/sem_attr.ads b/gcc/ada/sem_attr.ads
index 65b7b534711..4c9f27043c6 100644
--- a/gcc/ada/sem_attr.ads
+++ b/gcc/ada/sem_attr.ads
@@ -373,6 +373,17 @@ package Sem_Attr is
    --  other composite object passed by reference, there is no other way
    --  of specifying that a zero address should be passed.

+   -----------------
+   -- Object_Size --
+   -----------------
+
+   Attribute_Object_Size => True,
+   --  Type'Object_Size is the same as Type'Size for all types except
+   --  fixed-point types and discrete types. For fixed-point types and
+   --  discrete types, this attribute gives the size used for default
+   --  allocation of objects and components of the size. See section in
+   --  Einfo ("Handling of Type'Size values") for further details.
+
    -------------------------
    -- Passed_By_Reference --
    -------------------------
-- 
2.43.2